Struckdown now supports per-completion-slot temperature and model overrides, allowing fine-grained control over LLM parameters for each completion type.
Each response type now has a sensible default temperature automatically applied:
| Response Type | Default Temp | Rationale |
|---|---|---|
| `extract` | 0.0 | Deterministic verbatim extraction |
| `pick`/`decide`/`bool` | 0.0 | Consistent selection/decision making |
| `int`/`date_rule` | 0.0 | Structured data needs precision |
| `number`/`date`/`time`/`duration` | 0.1 | Slight flexibility for interpretation |
| `think` | 0.5 | Balanced reasoning |
| `respond`/default | 0.7 | Natural but controlled responses |
| `speak` | 0.8 | More conversational variety |
| `poem` | 1.5 | Maximum creativity |
You can override temperature and model on any completion slot using pipe syntax:
```
[[extract:quote|temperature=0.5]]
[[think:reasoning|temperature=0.3]]
[[poem:verse|temperature=1.8]]
[[extract:data|model=gpt-4o-mini]]
[[think:analysis|temperature=0.4,model=gpt-5]]
```
Temperature values are validated at parse time, so invalid values fail fast before any LLM call is made.
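In spirit, the check looks something like the sketch below (hypothetical code, not Struckdown's actual implementation; the 0.0 to 2.0 range is an assumption based on common LLM APIs, and it must at least admit the 1.8 used above):

```python
def _validate_temperature(raw: str) -> float:
    """Hypothetical parse-time check: reject non-numeric or out-of-range values."""
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"temperature must be numeric, got {raw!r}") from None
    if not 0.0 <= value <= 2.0:  # assumed bounds, wide enough for poem's 1.8
        raise ValueError(f"temperature must be in [0.0, 2.0], got {value}")
    return value
```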
Options specific to response models (like `min`, `max`, `required`) are preserved:

```
[[number:score|min=0,max=100,temperature=0.0]]
[[date:when|required,temperature=0.2]]
```
The parser intelligently separates the two kinds of options:

- `temperature`, `model`, `max_tokens`, `top_p`, etc. → passed to the LLM
- `min`, `max`, `required` → used by response model factories

**Response models (`return_type_models.py`)**

- `_llm_defaults` TypedDict for type safety
- `__init_subclass__` hook to automatically extract defaults from subclasses
- `default_temperature` field
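A minimal sketch of how such a hook can collect defaults (illustrative only; the real base class and internals in Struckdown will differ):

```python
from typing import Optional, TypedDict

class LLMDefaults(TypedDict, total=False):
    temperature: float
    model: Optional[str]

class ResponseModel:
    default_temperature: float = 0.7  # matches the respond/default row above
    _llm_defaults: LLMDefaults = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Pull the subclass's declared default into a typed dict
        cls._llm_defaults = LLMDefaults(temperature=cls.default_temperature)

class ExtractResponse(ResponseModel):
    default_temperature = 0.0  # deterministic verbatim extraction

print(ExtractResponse._llm_defaults)  # {'temperature': 0.0}
```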
**Parsing (`parsing.py`)**

- `_parse_options()` separates LLM kwargs from model-specific options
- `PromptPart.llm_kwargs` carries the slot-level overrides
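A sketch of that split, assuming comma-separated `key=value` options and the `LLM_PARAM_KEYS` set shown under extensibility below (again illustrative, not the library's actual code):

```python
LLM_PARAM_KEYS = {'temperature', 'model', 'max_tokens', 'top_p', 'top_k'}

def _parse_options(options: str) -> tuple[dict, dict]:
    """Split slot options into (llm_kwargs, model_options)."""
    llm_kwargs, model_options = {}, {}
    for part in options.split(','):
        key, sep, value = part.partition('=')
        key = key.strip()
        parsed = value.strip() if sep else True  # bare flags like `required`
        (llm_kwargs if key in LLM_PARAM_KEYS else model_options)[key] = parsed
    return llm_kwargs, model_options

# [[number:score|min=0,max=100,temperature=0.0]] splits as:
_parse_options("min=0,max=100,temperature=0.0")
# -> ({'temperature': '0.0'}, {'min': '0', 'max': '100'})
```

Values stay strings in this sketch; a real parser would coerce and validate them as described above.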
**Orchestration (`__init__.py`)**

- Merges `response_model._llm_defaults`, `prompt_part.llm_kwargs`, and `extra_kwargs` (slot-specific takes priority)

LLM parameters are applied in this priority order (highest to lowest):

1. Slot-level overrides: `[[type:var|temperature=X]]`
2. Per-type defaults: `ResponseModel._llm_defaults`
3. Global `extra_kwargs` passed to the `chatter()` function
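As a rough illustration of that precedence (a sketch; `merge_llm_params` is a made-up helper, not Struckdown's actual code), the merge can be thought of as dict unpacking where later sources win:

```python
def merge_llm_params(extra_kwargs: dict, type_defaults: dict, slot_kwargs: dict) -> dict:
    """Later sources win: slot overrides beat type defaults beat global extra_kwargs."""
    return {**extra_kwargs, **type_defaults, **slot_kwargs}

# For [[think:analysis|temperature=0.2]] called with extra_kwargs={'model': 'gpt-4o'}:
merge_llm_params({'model': 'gpt-4o'}, {'temperature': 0.5}, {'temperature': 0.2})
# -> {'model': 'gpt-4o', 'temperature': 0.2}
```

With those pieces in place, everyday templates can simply rely on the defaults: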
```python
from struckdown import chatter

# Uses default temperature for each type
result = chatter("""
Extract the quote: "Hello world"
[[extract:quote]]
Think about it:
[[think:analysis]]
Be creative:
[[poem:verse]]
""")
# quote uses temp=0.0 (deterministic)
# analysis uses temp=0.5 (balanced)
# verse uses temp=1.5 (creative)
```
```python
# Override specific slots
result = chatter("""
Extract carefully with slight flexibility:
[[extract:quote|temperature=0.1]]
Think very precisely:
[[think:analysis|temperature=0.2]]
Use a specific model:
[[think:reasoning|model=gpt-4o-mini]]
""")
```
```python
# Mix model-specific options with LLM parameters
result = chatter("""
Score from 0-100, deterministically:
[[number:score|min=0,max=100,temperature=0.0]]
Required date with low temperature:
[[date:deadline|required,temperature=0.1]]
""")
```
✅ All existing templates work without changes
✅ Default temperatures applied automatically
✅ No breaking changes to the API
✅ All 114 existing tests pass
A comprehensive test suite covers this feature in `tests/test_temperature_overrides.py`.
The `LLMDefaults` TypedDict constrains `temperature` to `float` and `model` to `Optional[str]`. Additional LLM parameters can be easily added:
```python
# In parsing.py
LLM_PARAM_KEYS = {
    'temperature', 'model', 'max_tokens',
    'top_p', 'top_k',  # Already supported!
    'frequency_penalty', 'presence_penalty',
}
```
Other provider-specific sampling parameters are potential future additions.
No migration needed! But you can now:

- Drop blanket `extra_kwargs` overrides where the per-type defaults are better

Example optimization:
```python
# Before: everything uses same model/temp
result = chatter(template, extra_kwargs={'temperature': 0.5})

# After: optimize per-slot
result = chatter("""
Simple extraction (cheap, deterministic):
[[extract:data|model=gpt-4o-mini,temperature=0.0]]
Complex reasoning (expensive, careful):
[[think:analysis|model=gpt-5,temperature=0.3]]
""")
```
This feature provides fine-grained, per-slot control over temperature and model choice, with sensible defaults for every response type. The implementation is clean, tested, and ready for production use!