Quote Verification Approach

Problem

LLMs may hallucinate, paraphrase, or truncate quotes during qualitative analysis. Quote verification locates extracted quotes in source documents and provides confidence metrics.

Method

Two-Stage Hybrid Approach

Stage 1: BM25 (Lexical)

Segments documents into overlapping windows
Builds BM25 index for fast retrieval
Finds best matching window for each quote

Stage 2: Embeddings (Semantic)

Computes cosine similarity between quote and matched span
Validates semantic equivalence

BM25 narrows the search space efficiently; embeddings catch paraphrases.

Ellipsis Handling

Quotes like "beginning ... end" are matched by:

Splitting on ellipsis pattern
Finding BM25 matches for head and tail fragments separately
Reconstructing the span between them
Applying gap constraints to prevent false matches

Windowing

Documents are split into overlapping windows:

Size: 1.1× longest quote (capped at 500 chars)
Overlap: 30% of window size
Position tracking enables source document reconstruction

Matched windows are trimmed to quote boundaries using fuzzy matching:

Finds longest character-level matches
Snaps to word boundaries
Expands to neighbor windows if truncated at boundaries

Output Metrics

Metric	Meaning
`bm25_score`	Lexical match strength
`bm25_ratio`	Match uniqueness (top1/top2)
`cosine_similarity`	Semantic equivalence (0-1)
`match_ratio`	Boundary alignment quality

Interpretation:

High BM25 + High cosine = Verbatim quote
Low BM25 + High cosine = Paraphrase (review)
Low BM25 + Low cosine = Hallucination (reject)

Configuration

BM25 Parameters:

k1=1.5: Term frequency saturation
b=0.4: Length normalization (lower than default to reduce penalties for variable-length quotes)

Trimming:

Method: fuzzy (default), sliding_bm25, or hybrid
min_fuzzy_ratio=0.6: Minimum match quality threshold

Windows:

expand_window_neighbors=1: Search ±N windows if match appears truncated

Validation Workflow

Run VerifyQuotes node
Review aggregate statistics (mean_cosine, n_low_match_confidence)
Inspect low-confidence matches in Excel export (sorted by confidence)
Manually verify or exclude problematic quotes

Limitations

Quotes exceeding 500 characters may fail
English-centric tokenization (NLTK)
Cannot definitively distinguish paraphrases from hallucinations
Embedding computation scales linearly with quote count

Academic Reporting

Algorithm Description:

Quote verification used a two-stage hybrid approach. Source documents were segmented into overlapping windows (1.1× longest quote, 30% overlap). BM25 (k1=1.5, b=0.4) identified candidate spans. Quotes with ellipses were matched by locating head and tail fragments separately. Semantic similarity was computed using [embedding model] with cosine distance. Span boundaries were refined using fuzzy matching and snapped to word boundaries.

Validation:

Matches were validated if cosine similarity exceeded [threshold] and BM25 ratio exceeded [threshold]. [X]% of quotes met these criteria. Low-confidence matches (n=[Y]) were manually reviewed.

Reproducibility:

Report all parameter values
Include aggregate statistics
Archive Excel exports as supplementary materials

Philosophy

This system supports human judgment rather than replacing it:

Efficiently triages thousands of quotes
Provides multiple confidence signals
Surfaces edge cases for expert review
Enables reproducibility through transparent parameters