soak
Rapid, reproducible qualitative data analysis with LLMs.

Soak helps qualitative researchers define and run analyses quickly. It provides graph-based pipelines for thematic analysis, classification, and structured data extraction from unstructured text. It encourages expert supervision of AI analyses and supports transparent sharing of reproducible research.
Tutorials
- Getting Started – Installation and your first analysis
- Customizing Your Analysis – Adapting prompts and directing the LLM to answer specific research questions
- Working with Results – Understanding codes, themes, and exports; comparing analysis outputs
How-to Guides
Analysis workflows
- Thematic Analysis – Inductive coding and theme generation
- Build a Classifier – Structured classification pipelines
- Ground Truth Validation – Validate against labelled data
Working with data
- Working with Spreadsheet Data – Process CSV and Excel files
- Pre-extraction Workflow – Filter text before analysis
- Adapting Pipelines – Customizing pipeline workflows
Explanations
- What is soak? – Purpose and design philosophy
- DAG Architecture – Why soak uses pipelines, and the execution model
- Node Types – Understanding different processing nodes
- Template System – Jinja2 and struckdown syntax
- Quote Verification – How quote validation works
- Model Aliases – Configuring models for pipeline roles
Reference
- CLI Reference – Command-line interface
- Node Reference – All node types and parameters
- Quote Verification Algorithm – Technical specification
Example outputs
- Thematic analysis (simple) – Analysis of patient interviews
- Thematic analysis (extended) – Same data, different model
- Analysis comparison – Comparing two analyses
- Classifier output – Structured data extraction
Support
- GitHub: github.com/benwhalley/soak
- Issues: github.com/benwhalley/soak/issues