Getting Started with Struckdown
Struckdown lets you extract structured data from text using LLMs. Instead of parsing free-form responses, you define exactly what you want and get validated, typed results.
Installation
# Install UV if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install struckdown
uv tool install git+https://github.com/benwhalley/struckdown
Configuration
Set your LLM credentials:
export LLM_API_KEY="sk-..."
export LLM_API_BASE="https://api.openai.com/v1"
export DEFAULT_LLM="gpt-4o-mini"
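Before running anything, it can help to confirm that all three variables are visible to the process. The following is a minimal sketch that assumes only the variable names shown above; the `missing_settings` helper is hypothetical and is not part of struckdown.

```python
import os

# Variable names taken from the configuration step above.
REQUIRED_VARS = ("LLM_API_KEY", "LLM_API_BASE", "DEFAULT_LLM")


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    # Hypothetical helper for illustration; not a struckdown function.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All LLM settings are present.")
```

Passing a plain dict instead of relying on `os.environ` makes the check easy to exercise in a test.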
Your First Extraction
The core idea: use [[slot]] to mark where the LLM should respond, and get back structured data.
sd chat "The sky is blue. Is this true? [[bool:is_true]]"
Output:
{"is_true": true}
Compare this to a raw LLM call, which would return something like “Yes, that’s correct!” and leave you to parse it yourself.
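To make the contrast concrete, here is a rough Python sketch of both approaches. The `parse_free_text` and `parse_structured` helpers are hypothetical names written for this illustration; neither is part of struckdown.

```python
import json


def parse_free_text(reply):
    """Brittle hand-rolled parser for a free-form yes/no answer."""
    text = reply.strip().lower()
    if text.startswith(("yes", "true", "correct")):
        return True
    if text.startswith(("no", "false", "incorrect")):
        return False
    # Any phrasing we did not anticipate ends up here.
    raise ValueError(f"cannot interpret reply: {reply!r}")


def parse_structured(output):
    """Structured output is plain JSON, so the value arrives already typed."""
    return json.loads(output)["is_true"]


print(parse_free_text("Yes, that's correct!"))
print(parse_structured('{"is_true": true}'))
```

The hand-written parser breaks as soon as the model rephrases its answer (for example "It is indeed blue."), whereas the JSON result needs no interpretation.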
Why Struckdown?
1. Typed Extractions
Get exactly the data type you need:
# Boolean
sd chat "Is Python a programming language? [[bool:answer]]"
# {"answer": true}
# Number with constraints
sd chat "Rate this product (1-10): 'Amazing quality!' [[int:rating|min=1,max=10]]"
# {"rating": 9}
# Pick from options
sd chat "Classify: 'I hate waiting' [[pick:sentiment|positive,negative,neutral]]"
# {"sentiment": "negative"}
# Date extraction
sd chat "Meeting scheduled for next Tuesday [[date:when]]"
# {"when": "2024-01-16"}  (example only; the resolved date depends on when you run it)
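The constraints in the slots above (`min=1,max=10`, a fixed option list) describe the guarantees you can rely on downstream. The sketch below illustrates those semantics in plain Python. It is not struckdown's implementation, the helper names are hypothetical, and treating the bounds as inclusive is an assumption.

```python
def check_int(value, minimum, maximum):
    """Accept only a real integer inside [minimum, maximum] (inclusive, by assumption)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    if not minimum <= value <= maximum:
        raise ValueError(f"{value} is outside {minimum}..{maximum}")
    return value


def check_pick(value, options):
    """Accept only one of the listed options."""
    if value not in options:
        raise ValueError(f"{value!r} is not one of {options}")
    return value


print(check_int(9, 1, 10))
print(check_pick("negative", ["positive", "negative", "neutral"]))
```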
2. Batch Processing
Process hundreds of files with one command:
# Summarise all text files
sd batch *.txt "Summarise in 5 words: [[summary]]" -o summaries.json
# Extract structured data from documents
sd batch documents/*.txt "
Name: [[extract:name]]
Email: [[extract:email]]
Phone: [[extract:phone]]
" -o contacts.csv
# Classify with multiple fields
sd batch reviews/*.txt "
Sentiment: [[pick:sentiment|positive,negative,neutral]]
Urgent: [[bool:urgent]]
Topic: [[pick:topic|billing,support,sales,other]]
" -o classified.xlsx
3. Multi-Step Reasoning
Use <checkpoint> to break complex tasks into stages, saving tokens:
sd chat "
Read this article and identify the main argument:
Main argument: [[extract:argument]]
<checkpoint>
Now critique this argument:
Critique: [[critique]]
" -s article.txt
Everything before <checkpoint> is dropped from the context; only the extracted `{{argument}}` value carries forward. This saves tokens on long documents.
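One way to picture the token saving is the conceptual model below. It is a sketch only, not how struckdown is implemented: `run_stages` and `fake_complete` are hypothetical, and the stub stands in for a real LLM call.

```python
def run_stages(template, document, complete):
    """Run each <checkpoint>-separated stage with a fresh context.

    `complete(prompt)` is a stand-in for an LLM call that returns a dict of
    extracted values. Only those values are passed on to later stages.
    """
    carried = {}
    prompts = []
    for index, stage in enumerate(template.split("<checkpoint>")):
        parts = []
        if index == 0:
            parts.append(document)  # the long source is sent only once
        parts.extend(f"{name}: {value}" for name, value in carried.items())
        parts.append(stage.strip())
        prompt = "\n".join(parts)
        prompts.append(prompt)
        carried.update(complete(prompt))
    return carried, prompts


def fake_complete(prompt):
    # Hypothetical stub: pretends to fill whichever slot the stage asks for.
    if "[[critique]]" in prompt:
        return {"critique": "The claim lacks evidence."}
    return {"argument": "Remote work raises productivity."}


template = "Main argument: [[extract:argument]]\n<checkpoint>\nCritique: [[critique]]"
article = "word " * 2000  # placeholder for a long document
values, prompts = run_stages(template, article, fake_complete)
print(sorted(values))
print(len(prompts[0]), len(prompts[1]))
```

The second prompt contains the short extracted argument but not the article, which is where the saving comes from.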
4. Required Fields and Validation
Ensure you always get what you need:
# ! prefix makes the field required
sd chat "Extract the price: 'Contact us for pricing' [[!number:price]]"
# Will indicate no valid price found rather than guessing
# Pattern matching
sd chat 'Find the module code: "PSYC2001 is great" [[extract:code|pattern="\w{4}\d+"]]'
# {"code": "PSYC2001"}
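It can be useful to try a pattern on sample text before putting it in a slot. The sketch below uses Python's standard `re` module with the same pattern. Whether struckdown applies the pattern as a search or as a full match is not stated here, so using `search` is an assumption, and `find_code` is a hypothetical helper.

```python
import re

# Same pattern as in the slot above: four word characters, then one or more digits.
MODULE_CODE = re.compile(r"\w{4}\d+")


def find_code(text):
    """Return the first substring matching the pattern, or None."""
    match = MODULE_CODE.search(text)
    return match.group(0) if match else None


print(find_code("PSYC2001 is great"))  # PSYC2001
print(find_code("no code here"))       # None
print(find_code("12345"))              # 12345
```

The last call shows that `\w` also matches digits, so a stricter pattern such as `[A-Z]{4}\d{4}` may be a better fit if module codes always have that shape.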
Common Patterns
Extract Structured Data from Files
sd batch invoices/*.pdf "
Invoice number: [[extract:invoice_no]]
Date: [[date:date]]
Total: [[number:total]]
Paid: [[bool:paid]]
" -o invoices.csv
Classify and Route
sd batch emails/*.txt "
Priority: [[pick:priority|high,medium,low]]
Department: [[pick:dept|sales,support,billing,hr]]
Requires response: [[bool:needs_reply]]
" -o routing.json
Chain Operations
Pipe JSON output through multiple processing steps:
sd batch *.txt "Extract company name: [[extract:company]]" | \
sd batch "Find stock ticker: [[extract:ticker]]" -k
The -k flag keeps input fields, so you get both company and ticker in the output.
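Conceptually, keeping input fields amounts to merging each incoming record with the newly extracted values. The sketch below shows that idea in plain Python. The `keep_input_fields` helper is hypothetical, the sample values are made up, and letting new values win when keys clash is an assumption rather than documented behaviour.

```python
def keep_input_fields(record, extracted):
    """Merge new fields into the input record; new values win on clashes (assumed)."""
    return {**record, **extracted}


step_one = {"company": "Acme Corp"}   # made-up output of the first command
step_two = {"ticker": "ACME"}         # made-up output of the second command
print(keep_input_fields(step_one, step_two))
# {'company': 'Acme Corp', 'ticker': 'ACME'}
```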
Web Research
Fetch and process web content:
# Fetch a URL and extract data
sd chat " Extract the main product and price [[extract:product]] [[number:price]]" \
-s https://example.com/product
# Or use the @search action for web search
sd chat "[[@search:results|query='best python testing frameworks']] Summarise the top 3: [[summary]]"
Using Prompt Files
For complex prompts, save them as .sd files:
{# feedback_classifier.sd #}
<system>
You are a customer feedback analyst.
Be objective and precise.
</system>
Customer feedback:
Analysis:
- Sentiment: [[pick:sentiment|positive,negative,neutral,mixed]]
- Topics: [[pick+:topics|product,service,price,delivery,quality]]
- Urgency: [[int:urgency|min=1,max=5]]
- Summary: [[extract:summary]]
Run with:
sd batch feedback/*.txt -p feedback_classifier.sd -o analysis.xlsx
Python API
Use struckdown programmatically:
from struckdown import chatter

# {{review}} is filled in from the context dict
result = chatter("""
Analyse this customer review: {{review}}
Sentiment: [[pick:sentiment|positive,negative,neutral]]
Rating: [[int:rating|min=1,max=5]]
Key points: [[extract+:points]]
""", context={"review": "Great product but shipping was slow"})

# Example values; actual results depend on the model
print(result["sentiment"])  # "positive"
print(result["rating"])     # 4
print(result["points"])     # ["Good product quality", "Slow shipping"]
print(result.total_cost)    # 0.0001 (USD)
For async processing:
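import asyncio

from struckdown import chatter_async


async def process_many(reviews):
    # One task per review; {{r}} is filled in from each context dict
    tasks = [
        chatter_async(
            "Review: {{r}}\nSentiment: [[pick:sentiment|pos,neg]]",
            context={"r": r},
        )
        for r in reviews
    ]
    # gather runs the tasks concurrently and preserves input order
    return await asyncio.gather(*tasks)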
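With many reviews, launching every request at once can run into provider rate limits. A common remedy is to cap concurrency with `asyncio.Semaphore`. The sketch below uses a hypothetical `fake_chatter_async` stub in place of a real LLM call so that it runs on its own; `process_bounded` and the `limit` parameter are illustrative names, not part of struckdown.

```python
import asyncio


async def fake_chatter_async(review):
    """Hypothetical stand-in for an LLM call; swap in the real coroutine."""
    await asyncio.sleep(0.01)
    return {"sentiment": "pos" if "great" in review.lower() else "neg"}


async def process_bounded(reviews, limit=5):
    """Run one task per review, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(review):
        async with semaphore:
            return await fake_chatter_async(review)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(worker(r) for r in reviews))


results = asyncio.run(process_bounded(["Great product", "Slow shipping"], limit=1))
print(results)  # [{'sentiment': 'pos'}, {'sentiment': 'neg'}]
```

A suitable `limit` depends on the provider's rate limits, so treat the default of 5 as a placeholder.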
Next Steps
- Template Syntax – Complete syntax reference
- CLI Reference – All CLI commands and options
- Custom Actions – Extend with Python plugins
- Caching – How caching works