# Model Aliases

Model aliases provide a flexible way to configure different LLM models for different roles within a pipeline without hardcoding model IDs. Pipelines define semantic roles (e.g., `default`, `best`, `embeddings`) that can be assigned different concrete models at runtime.
## Why Use Model Aliases?
When building analysis pipelines, different tasks may benefit from different models:
- Fast, cheap model for routine operations (chunking, initial filtering)
- High-capability model for complex reasoning (theme synthesis, narrative generation)
- Embedding model for semantic similarity and clustering
Hardcoding model names makes pipelines inflexible. Model aliases let pipeline authors define what kind of model is needed, while users choose which specific model to use.
## Defining Aliases in Pipelines

Pipelines define model aliases in the `default_config.models` section:
```yaml
name: thematic_analysis

default_config:
  models:
    default: gpt-4.1-mini          # fast, routine tasks
    best: gpt-4.1                  # complex reasoning
    embeddings: text-embedding-3-large
  seed: 42

nodes:
  - name: extract_codes
    type: Map
    model: default                 # uses the 'default' alias

  - name: synthesize_themes
    type: Transform
    model: best                    # uses the 'best' alias
```
Alias names are arbitrary – you can define any aliases that make sense for your pipeline. Common conventions:
| Alias | Typical Use |
|---|---|
| `default` | General-purpose tasks; used when no alias is specified |
| `best` | High-quality outputs requiring stronger reasoning |
| `cheap` | Cost-sensitive operations, high-volume tasks |
| `fast` | Low-latency operations |
## Using Aliases from the CLI

Override model aliases when running pipelines with the `--model` flag:
```bash
# Set the default model (simple form)
uv run soak run pipeline data/*.txt --model gpt-4.1

# Override specific aliases
uv run soak run pipeline data/*.txt --model default=gpt-4.1-mini --model best=gpt-4.1

# Multiple overrides
uv run soak run pipeline data/*.txt \
  --model default=claude-3-5-sonnet \
  --model best=claude-3-opus \
  --model cheap=gpt-4.1-mini
```
### Syntax
| Form | Effect |
|---|---|
| `--model gpt-4.1` | Sets the `default` alias |
| `--model alias=model` | Sets a specific alias |

You can combine multiple `--model` flags to override several aliases at once.
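To make the two flag forms concrete, here is a minimal Python sketch of how repeated `--model` values could be collected into a single alias map. Note that `parse_model_flags` is a hypothetical helper for illustration, not soak's actual implementation:

```python
def parse_model_flags(flags):
    """Turn repeated --model values into an alias -> model mapping.

    A bare model name sets the 'default' alias; an 'alias=model'
    pair sets that specific alias. Later flags win on conflict.
    """
    overrides = {}
    for flag in flags:
        if "=" in flag:
            alias, model = flag.split("=", 1)
            overrides[alias] = model
        else:
            overrides["default"] = flag
    return overrides


print(parse_model_flags(["gpt-4.1"]))
# {'default': 'gpt-4.1'}
print(parse_model_flags(["default=gpt-4.1-mini", "best=gpt-4.1"]))
# {'default': 'gpt-4.1-mini', 'best': 'gpt-4.1'}
```

Splitting on the first `=` only means model IDs containing `=` would still parse, and the bare form is just shorthand for `default=...`.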
## Using Aliases in the Web UI
When creating a run in the web interface:
- Select a pipeline – the UI loads its defined aliases
- For each alias, a dropdown appears showing available models
- Select which model to use for each role
- The embedding model has its own selector
The UI pre-populates dropdowns with the pipeline’s default values when available.
## How Aliases are Resolved
At execution time:
1. Pipeline defaults – aliases defined in `default_config.models` provide base values
2. User overrides – CLI flags or web UI selections override specific aliases
3. Node lookup – each node's `model` field references an alias name
4. Resolution – the alias is resolved to an actual model ID for API calls
```
Pipeline YAML:   default_config.models.best = "gpt-4.1"
        ↓
User override:   --model best=claude-3-5-sonnet
        ↓
Node definition: model: best
        ↓
Resolved at runtime: "claude-3-5-sonnet"
```
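The precedence above can be sketched in a few lines of Python. This is an illustration of the merge order only; `resolve_model` and its argument names are hypothetical, not part of soak's API:

```python
def resolve_model(alias, pipeline_defaults, user_overrides):
    """Resolve a node's alias to a concrete model ID.

    User overrides take precedence over pipeline defaults;
    an unknown alias is an error rather than a silent fallback.
    """
    merged = {**pipeline_defaults, **user_overrides}
    if alias not in merged:
        raise KeyError(f"unknown model alias: {alias!r}")
    return merged[alias]


defaults = {"default": "gpt-4.1-mini", "best": "gpt-4.1"}
overrides = {"best": "claude-3-5-sonnet"}  # e.g. from --model best=...

print(resolve_model("best", defaults, overrides))
# claude-3-5-sonnet
print(resolve_model("default", defaults, overrides))
# gpt-4.1-mini
```

Merging overrides after defaults (`{**defaults, **overrides}`) is what makes a CLI flag win over the pipeline's baked-in value while leaving untouched aliases at their defaults.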
## Example Pipeline
A complete pipeline using multiple aliases:
```yaml
name: multi_model_analysis

default_config:
  models:
    default: gpt-4.1-mini
    best: gpt-4.1
    cheap: gpt-4.1-mini
    embeddings: text-embedding-3-large
  seed: 1

nodes:
  # Split uses no model (structural operation)
  - name: chunks
    type: Split
    chunk_size: 30000

  # High-volume coding -- use cheap/fast model
  - name: initial_codes
    type: Map
    model: cheap
    inputs: [chunks]

  # Cluster uses embeddings (configured separately)
  - name: grouped_codes
    type: Cluster
    inputs: [initial_codes]

  # Theme synthesis -- use best model for quality
  - name: themes
    type: Transform
    model: best
    inputs: [grouped_codes]

  # Final narrative -- use best model
  - name: narrative
    type: Transform
    model: best
    inputs: [themes]
```
Running with different model configurations:
```bash
# Use defaults from pipeline
uv run soak run multi_model_analysis data/*.txt

# Use Claude for everything
uv run soak run multi_model_analysis data/*.txt \
  --model default=claude-3-5-sonnet \
  --model best=claude-3-5-sonnet \
  --model cheap=claude-3-5-sonnet

# Mix providers
uv run soak run multi_model_analysis data/*.txt \
  --model cheap=gpt-4.1-mini \
  --model best=claude-3-opus
```
## Embeddings

The `embeddings` configuration is separate from the LLM model aliases. It's used by:

- `Cluster` nodes for semantic grouping
- `VerifyQuotes` for quote similarity matching
- The `compare` command for analysis comparison
Set in the pipeline:

```yaml
default_config:
  embeddings: text-embedding-3-large
```

Override from the CLI:

```bash
uv run soak compare --embedding-model text-embedding-3-small *.json
```
## Best Practices

- Use meaningful alias names – `best` and `cheap` are clearer than `model1` and `model2`
- Set sensible defaults – pipelines should work out of the box with their default models
- Document your aliases – add comments explaining what each alias is used for
- Consider cost – assign cheaper models to high-volume operations (Map nodes with many items)
- Test with different models – verify your pipeline works across different providers
## See Also

- CLI Reference – full `--model` flag documentation
- Node Types – which nodes use models
- DAG Architecture – how pipelines execute