# Per-Slot LLM Parameters

## Overview

Struckdown supports per-slot control over LLM parameters: temperature, thinking level, model selection, and more. Parameters are set inline using pipe syntax and validated at parse time.

## Supported Parameters

| Parameter | Type | Range / Values | Description |
|-----------|------|----------------|-------------|
| `temperature` | float | 0.0 – 2.0 | Randomness. Lower = more deterministic, higher = more creative |
| `thinking` | string | `off`, `minimal`, `low`, `medium`, `high`, `xhigh` | Extended reasoning level (provider-dependent) |
| `model` | string | any model name | Override the LLM model for this slot |
| `max_tokens` | int | > 0 | Maximum tokens in the response |
| `seed` | int | >= 0 | For reproducible outputs |
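
All of these accept comma-separated `key=value` pairs, so several can be combined on a single slot. For instance (the model name here is purely illustrative):

```
[[respond:answer|temperature=0.6,thinking=low,model=gpt-4o-mini,max_tokens=500,seed=42]]
```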

Additional parameters are supported globally via `extra_kwargs` (not per-slot):

| Parameter | Type | Description |
|-----------|------|-------------|
| `top_p` | float | Nucleus sampling |
| `timeout` | float | Request timeout in seconds |
| `presence_penalty` | float | Penalise tokens already present in the output |
| `frequency_penalty` | float | Penalise frequently occurring tokens |
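
Since these apply globally rather than per-slot, they are passed when invoking struckdown. A minimal sketch, assuming `extra_kwargs` is accepted as a keyword argument of `chatter()` (see Priority Order below):

```python
from struckdown import chatter

# Global LLM settings applied to every slot in the template.
# Assumption: chatter() takes these via an `extra_kwargs` mapping.
result = chatter(
    "Summarise this report: [[respond:summary]]",
    extra_kwargs={"top_p": 0.9, "timeout": 30.0},
)
```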

## Default Temperatures

Each response type has a sensible default temperature:

| Response Type | Default Temp | Rationale |
|---------------|--------------|-----------|
| `extract` | 0.0 | Deterministic verbatim extraction |
| `pick` / `decide` / `bool` | 0.0 | Consistent selection/decision making |
| `int` / `date_rule` | 0.0 | Structured data needs precision |
| `number` / `date` / `time` / `duration` | 0.1 | Slight flexibility for interpretation |
| `think` | 0.5 | Balanced reasoning |
| `respond` / default | 0.7 | Natural but controlled responses |
| `speak` | 0.8 | More conversational variety |
| `poem` | 1.5 | Maximum creativity |

## Per-Slot Syntax

Override parameters on any completion slot using pipe syntax:

```
[[extract:quote|temperature=0.5]]
[[think:reasoning|temperature=0.3]]
[[poem:verse|temperature=1.8]]
[[extract:data|model=gpt-4o-mini]]
[[think:analysis|temperature=0.4,model=gpt-5]]
```

Slot options (like `min`, `max`, `required`) are preserved alongside LLM parameters:

```
[[number:score|min=0,max=100,temperature=0.0]]
[[date:when|required,temperature=0.2]]
```

The parser separates parameters into two groups (see the sketch after this list):

- **LLM parameters** (`temperature`, `thinking`, `model`, `max_tokens`, `seed`): passed to the LLM
- **Slot options** (`min`, `max`, `required`): used by response model factories
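
A hypothetical sketch of that split; struckdown's actual parser internals may differ:

```python
# Keys recognised as LLM parameters (from the table above); everything
# else on a slot is treated as a slot option for the response model.
LLM_PARAM_KEYS = {"temperature", "thinking", "model", "max_tokens", "seed"}

def split_slot_params(raw: dict) -> tuple[dict, dict]:
    """Split parsed key=value pairs into (llm_params, slot_options)."""
    llm_params = {k: v for k, v in raw.items() if k in LLM_PARAM_KEYS}
    slot_options = {k: v for k, v in raw.items() if k not in LLM_PARAM_KEYS}
    return llm_params, slot_options

# e.g. [[number:score|min=0,max=100,temperature=0.0]]
llm, opts = split_slot_params({"min": 0, "max": 100, "temperature": 0.0})
assert llm == {"temperature": 0.0}
assert opts == {"min": 0, "max": 100}
```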

## Thinking / Extended Reasoning

Use `thinking` to enable extended reasoning (chain-of-thought) on models that support it. The `thinking` parameter controls how much reasoning the model performs before producing its answer.

```
[[think:analysis|thinking=high]]
[[think:deep_reasoning|thinking=xhigh,temperature=0.3]]
[[pick:choice|yes,no|thinking=low]]
```

Levels:

- `off`: explicitly disable thinking (for models where it's on by default)
- `minimal`, `low`, `medium`, `high`, `xhigh`: increasing reasoning depth

Omitting `thinking` means struckdown does not interfere; the provider's default behaviour applies. This is distinct from `thinking=off`, which explicitly disables it.
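
A small sketch of these tri-state semantics (illustrative only, not struckdown's internals):

```python
THINKING_LEVELS = ("off", "minimal", "low", "medium", "high", "xhigh")

def thinking_setting(level: str | None) -> dict:
    """None -> send nothing (provider default); 'off' -> explicit disable."""
    if level is None:
        return {}  # struckdown does not interfere
    if level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {level!r}")
    return {"thinking": level}  # explicit setting, including 'off'

assert thinking_setting(None) == {}
assert thinking_setting("off") == {"thinking": "off"}
```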

Provider support: thinking is supported by Claude (Opus, Sonnet with extended thinking), OpenAI o-series models, and other providers via pydantic-ai's unified `ModelSettings.thinking` field. If a provider does not support the requested `thinking` level, the error propagates; it is not silently dropped.

### Example: mixing thinking levels

```
Analyse this document carefully:

First, reason through the key themes:
[[think:reasoning|thinking=high]]

Then pick the dominant theme:
[[pick:theme|politics,economics,culture,science|thinking=off]]

Finally, write a summary:
[[respond:summary|temperature=0.7]]
```

## Streaming

Free-text slots (`respond`, `speak`, `think`, `extract`, `poem`) are streamed token-by-token by default when using the CLI or the async incremental API. Constrained slots (`pick`, `bool`, `int`, etc.) complete atomically.

Streaming is transparent to template authors; no syntax changes are required. It is controlled by the `stream` parameter on `chatter_incremental_async()` (default: `True` for the async API, `False` for the sync wrapper).
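
A minimal usage sketch, assuming the incremental API can be consumed as an async iterator yielding text chunks (the actual return shape may differ):

```python
import asyncio
from struckdown import chatter_incremental_async  # name from the docs; import path assumed

async def main() -> None:
    template = "Summarise the report:\n[[respond:summary]]"
    # Assumption: iterating yields partial output as free-text slots stream.
    async for chunk in chatter_incremental_async(template, stream=True):
        print(chunk, end="", flush=True)

asyncio.run(main())
```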

## Unsupported Parameters

When a parameter is not recognised by struckdown, the default behaviour is to log a warning and drop it:

```
WARNING: Dropped unsupported LLM parameters: top_k, custom_param
```

For stricter handling, enable `strict_params` to raise an error instead:

```python
# Python API
result = chatter(template, strict_params=True)
```

```bash
# CLI
sd chat --strict-params -p template.sd
```

This is useful for catching typos or ensuring all parameters are supported by the current provider.
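
For instance, a test can assert that a typo is rejected (the exact exception type is assumed; struckdown may raise something more specific):

```python
import pytest
from struckdown import chatter

def test_rejects_unknown_llm_params():
    # "temprature" is a deliberate typo that strict_params should catch.
    with pytest.raises(Exception):  # assumption: a more specific error may exist
        chatter("[[extract:data|temprature=0.0]]", strict_params=True)
```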

## Priority Order

LLM parameters are applied in this priority order, from highest to lowest (see the worked example after the list):

1. **Slot-specific overrides**: `[[type:var|temperature=X]]`
2. **Return type defaults**: `ResponseModel.llm_config`
3. **Global `extra_kwargs`**: passed to the `chatter()` function
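
For example, when all three levels specify a temperature (the `extra_kwargs` keyword follows the description above; the outcomes follow the stated priority):

```python
from struckdown import chatter

template = """
[[extract:quote|temperature=0.5]]
[[extract:title]]
[[respond:summary]]
"""

result = chatter(template, extra_kwargs={"temperature": 0.9})

# quote:   0.5 (slot-specific override wins)
# title:   0.0 (extract's return-type default beats global extra_kwargs)
# summary: 0.7 (respond's return-type default beats global extra_kwargs)
```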

## Examples

### Basic usage

```python
from struckdown import chatter

# uses the default temperature for each response type
result = chatter("""
Extract the quote: "Hello world"
[[extract:quote]]

Think about it:
[[think:analysis]]

Be creative:
[[poem:verse]]
""")

# quote uses temp=0.0 (deterministic)
# analysis uses temp=0.5 (balanced)
# verse uses temp=1.5 (creative)
```

### With overrides

```python
result = chatter("""
Extract carefully with slight flexibility:
[[extract:quote|temperature=0.1]]

Think very precisely:
[[think:analysis|temperature=0.2]]

Use a specific model:
[[think:reasoning|model=gpt-4o-mini]]
""")
```

### With thinking

```python
result = chatter("""
Reason deeply about this problem:
[[think:reasoning|thinking=high]]

Quick classification (no extended reasoning needed):
[[pick:category|A,B,C|thinking=off]]
""")
```

### Cost optimisation

```python
result = chatter("""
Simple extraction (cheap, deterministic):
[[extract:data|model=gpt-4o-mini,temperature=0.0]]

Complex reasoning (expensive, careful):
[[think:analysis|model=gpt-5,temperature=0.3,thinking=high]]
""")
```
