# Per-Slot LLM Parameters

## Overview
Struckdown supports per-slot control over LLM parameters – temperature, thinking level, model selection, and more. Parameters are set inline using pipe syntax and validated at parse time.
## Supported Parameters
| Parameter | Type | Range / Values | Description |
|---|---|---|---|
| `temperature` | float | 0.0 – 2.0 | Randomness. Lower = deterministic, higher = creative |
| `thinking` | string | `off`, `minimal`, `low`, `medium`, `high`, `xhigh` | Extended reasoning level (provider-dependent) |
| `model` | string | any model name | Override the LLM model for this slot |
| `max_tokens` | int | > 0 | Maximum tokens in the response |
| `seed` | int | >= 0 | For reproducible outputs |
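Several of these can be combined on one slot; for instance (an illustrative slot, with a made-up variable name):

```
[[respond:answer|temperature=0.7,max_tokens=256,seed=42]]
```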
Additional parameters are supported via `extra_kwargs` (not per-slot):
| Parameter | Type | Description |
|---|---|---|
| `top_p` | float | Nucleus sampling |
| `timeout` | float | Request timeout in seconds |
| `presence_penalty` | float | Penalise tokens already in output |
| `frequency_penalty` | float | Penalise frequent tokens |
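A minimal sketch of applying these globally – this assumes `extra_kwargs` is accepted as a keyword argument by `chatter()` (see Priority Order below):

```python
from struckdown import chatter

# Illustrative: request-level settings applied across the template.
# Assumption: chatter() accepts extra_kwargs as a keyword argument.
result = chatter(
    "Summarise briefly:\n[[respond:summary]]",
    extra_kwargs={"top_p": 0.9, "timeout": 30.0},
)
```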
## Default Temperatures
Each response type has a sensible default temperature:
| Response Type | Default Temp | Rationale |
|---|---|---|
| `extract` | 0.0 | Deterministic verbatim extraction |
| `pick` / `decide` / `bool` | 0.0 | Consistent selection/decision making |
| `int` / `date_rule` | 0.0 | Structured data needs precision |
| `number` / `date` / `time` / `duration` | 0.1 | Slight flexibility for interpretation |
| `think` | 0.5 | Balanced reasoning |
| `respond` / default | 0.7 | Natural but controlled responses |
| `speak` | 0.8 | More conversational variety |
| `poem` | 1.5 | Maximum creativity |
## Per-Slot Syntax
Override parameters on any completion slot using pipe syntax:
```
[[extract:quote|temperature=0.5]]
[[think:reasoning|temperature=0.3]]
[[poem:verse|temperature=1.8]]
[[extract:data|model=gpt-4o-mini]]
[[think:analysis|temperature=0.4,model=gpt-5]]
```
Response-model options (like `min`, `max`, `required`) are preserved alongside LLM parameters:
```
[[number:score|min=0,max=100,temperature=0.0]]
[[date:when|required,temperature=0.2]]
```
The parser separates the two kinds of pipe option (see the sketch below):

- **LLM parameters** (`temperature`, `thinking`, `model`, `max_tokens`, `seed`) – passed to the LLM
- **Slot options** (`min`, `max`, `required`) – used by response model factories
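A hypothetical sketch of that split, keyed on the known LLM parameter names (illustration only, not struckdown's actual parser internals):

```python
# Hypothetical illustration; struckdown's real parser is not shown here.
LLM_PARAMS = {"temperature", "thinking", "model", "max_tokens", "seed"}

def split_options(options: dict) -> tuple[dict, dict]:
    """Split parsed pipe options into LLM parameters and slot options."""
    llm = {k: v for k, v in options.items() if k in LLM_PARAMS}
    slot = {k: v for k, v in options.items() if k not in LLM_PARAMS}
    return llm, slot

# e.g. [[number:score|min=0,max=100,temperature=0.0]]
llm, slot = split_options({"min": 0, "max": 100, "temperature": 0.0})
assert llm == {"temperature": 0.0}     # sent with the LLM call
assert slot == {"min": 0, "max": 100}  # consumed by the response model factory
```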
## Thinking / Extended Reasoning
Use `thinking` to enable extended reasoning (chain-of-thought) on models that support it. The `thinking` parameter controls how much reasoning the model performs before producing its answer.
```
[[think:analysis|thinking=high]]
[[think:deep_reasoning|thinking=xhigh,temperature=0.3]]
[[pick:choice|yes,no|thinking=low]]
```
Levels:

- `off` – explicitly disable thinking (for models where it's on by default)
- `minimal`, `low`, `medium`, `high`, `xhigh` – increasing reasoning depth

Omitting `thinking` means struckdown does not interfere – the provider's default behaviour applies. This is distinct from `thinking=off`, which explicitly disables it.
**Provider support:** Thinking is supported by Claude (Opus, Sonnet with extended thinking), OpenAI o-series models, and other providers via pydantic-ai's unified `ModelSettings.thinking` field. If a provider does not support the requested thinking level, the error propagates – it is not silently dropped.
### Example: mixing thinking levels

```
Analyse this document carefully:

First, reason through the key themes:
[[think:reasoning|thinking=high]]

Then pick the dominant theme:
[[pick:theme|politics,economics,culture,science|thinking=off]]

Finally, write a summary:
[[respond:summary|temperature=0.7]]
```
## Streaming
Free-text slots (`respond`, `speak`, `think`, `extract`, `poem`) are streamed token-by-token by default when using the CLI or the async incremental API. Constrained slots (`pick`, `bool`, `int`, etc.) complete atomically.

Streaming is transparent to template authors – no syntax changes are required. It is controlled by the `stream` parameter on `chatter_incremental_async()` (default: `True` for async, `False` for the sync wrapper).
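A minimal consumer of the incremental API might look like the sketch below. This assumes `chatter_incremental_async` is importable from `struckdown` and yields printable incremental events; the exact event shape is not documented above:

```python
import asyncio
from struckdown import chatter_incremental_async

async def main():
    # Free-text slots arrive token-by-token; constrained slots arrive whole.
    # Assumption: the function returns an async iterator of printable events.
    async for event in chatter_incremental_async(
        "Summarise the report:\n[[respond:summary]]",
        stream=True,  # the default for the async API
    ):
        print(event)

asyncio.run(main())
```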
## Unsupported Parameters
When a parameter is not recognised by struckdown, the default behaviour is to log a warning and drop it:
```
WARNING: Dropped unsupported LLM parameters: top_k, custom_param
```
For stricter handling, enable `strict_params` to raise an error instead:
```python
# Python API
result = chatter(template, strict_params=True)
```

```bash
# CLI
sd chat --strict-params -p template.sd
```
This is useful for catching typos or ensuring all parameters are supported by the current provider.
## Priority Order
LLM parameters are applied in this priority order, highest to lowest (illustrated below):

1. **Slot-specific overrides:** `[[type:var|temperature=X]]`
2. **Return type defaults:** `ResponseModel.llm_config`
3. **Global extra_kwargs:** passed to the `chatter()` function
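A small sketch of the interplay, again assuming `extra_kwargs` is passed as a keyword argument to `chatter()`:

```python
from struckdown import chatter

# Assumption: chatter() accepts extra_kwargs as a keyword argument.
result = chatter(
    "[[think:analysis|temperature=0.2]]\n[[think:followup]]",
    extra_kwargs={"temperature": 0.9},
)
# analysis: temperature=0.2 (slot override, highest priority)
# followup: temperature=0.5 (think's return-type default outranks the global 0.9)
```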
## Examples

### Basic usage
```python
from struckdown import chatter

# uses the default temperature for each type
result = chatter("""
Extract the quote: "Hello world"
[[extract:quote]]

Think about it:
[[think:analysis]]

Be creative:
[[poem:verse]]
""")

# quote uses temp=0.0 (deterministic)
# analysis uses temp=0.5 (balanced)
# verse uses temp=1.5 (creative)
```
### With overrides
```python
result = chatter("""
Extract carefully with slight flexibility:
[[extract:quote|temperature=0.1]]

Think very precisely:
[[think:analysis|temperature=0.2]]

Use a specific model:
[[think:reasoning|model=gpt-4o-mini]]
""")
```
### With thinking
```python
result = chatter("""
Reason deeply about this problem:
[[think:reasoning|thinking=high]]

Quick classification (no extended reasoning needed):
[[pick:category|A,B,C|thinking=off]]
""")
```
### Cost optimisation
```python
result = chatter("""
Simple extraction (cheap, deterministic):
[[extract:data|model=gpt-4o-mini,temperature=0.0]]

Complex reasoning (expensive, careful):
[[think:analysis|model=gpt-5,temperature=0.3,thinking=high]]
""")
```