soak

Rapid, reproducible qualitative data analysis with LLMs.

Soak helps researchers rapidly define and run qualitative data analyses. It provides graph-based pipelines for thematic analysis, classification, and structured data extraction from unstructured text. It encourages expert supervision of AI analyses and facilitates transparent sharing of reproducible research.

Tutorials

Getting Started – Installation and your first analysis
Customizing Your Analysis – Adapting prompts and directing the LLM to answer specific research questions
Working with Results – Understanding codes, themes, and exports; comparing analysis outputs

How-to Guides

Analysis workflows

Thematic Analysis – Inductive coding and theme generation
Build a Classifier – Structured classification pipelines
Ground Truth Validation – Validate against labelled data

Working with data

Working with Spreadsheet Data – Process CSV and Excel files
Pre-extraction Workflow – Filter text before analysis
Adapting Pipelines – Customizing pipeline workflows

Explanations

What is soak? – Purpose and design philosophy
DAG Architecture – Why pipelines, and the execution model
Node Types – Understanding different processing nodes
Template System – Jinja2 and struckdown syntax
Quote Verification – How quote validation works
Model Aliases – Configuring models for pipeline roles

Reference

CLI Reference – Command-line interface
Node Reference – All node types and parameters
Quote Verification Algorithm – Technical specification

Example outputs

Thematic analysis (simple) – Analysis of patient interviews
Thematic analysis (extended) – Same data, different model
Analysis comparison – Comparing two analyses
Classifier output – Structured data extraction

Support

GitHub: github.com/benwhalley/soak
Issues: github.com/benwhalley/soak/issues