Comprehensive test suite for validating temporal type extractions (dates, datetimes, times, durations) in Struckdown.
Run the full suite:
uv run python examples/temporal_test_cases.py
See detailed results for each test case:
uv run python examples/temporal_test_cases.py --verbose
# or
uv run python examples/temporal_test_cases.py -v
Stop on the first error (useful for debugging):
uv run python examples/temporal_test_cases.py --stop-on-error
# or
uv run python examples/temporal_test_cases.py -x
Flags can be combined:
uv run python examples/temporal_test_cases.py --verbose --stop-on-error
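If you extend the script, the documented flags map naturally onto `argparse`. The sketch below is an assumption about how these options could be declared. It is not a copy of `temporal_test_cases.py`; only the flag names come from this page.

```python
import argparse


# Hypothetical sketch: one way to declare the documented flags.
# The real temporal_test_cases.py may parse its arguments differently.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run the temporal extraction test suite."
    )
    # -v / --verbose: print detailed results for each test case
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="show detailed results for each test case")
    # -x / --stop-on-error: halt the run at the first error
    parser.add_argument("-x", "--stop-on-error", action="store_true",
                        help="stop at the first error")
    return parser


args = build_parser().parse_args(["--verbose", "-x"])
print(args.verbose, args.stop_on_error)  # True True
```

`argparse` converts `--stop-on-error` into the attribute `stop_on_error`, so a runner loop can check `args.stop_on_error` after each test.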
The test suite includes 38 test cases covering:
- Date lists ([[date*:dates]]) with count quantifiers:
  - [[date{2}:dates]] - exactly 2 dates
  - [[date{1,3}:dates]] - 1 to 3 dates
  - [[date+:dates]] - at least 1 date
  - [[date*:dates]] - zero or more dates

Example output:
================================================================================
TEMPORAL EXTRACTION TEST SUITE
================================================================================
Total test cases: 38
Categories: 12
================================================================================
CATEGORY: Single Dates - Explicit
================================================================================
[1/2] Extract explicit date (Jan 15, 2024)... ✓ PASSED
[2/2] Extract ISO format date... ✓ PASSED
...
================================================================================
TEST SUMMARY
================================================================================
✓ Passed: 38/38
✗ Failed: 0/38
⚠ Errors: 0/38
================================================================================
SUCCESS RATE: 100.0%
================================================================================
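When you write `validate` lambdas for list slots, it can help to check the item count against the quantifier used in the prompt. The sketch below uses two hypothetical helpers, `quantifier_bounds` and `count_ok`, which are not part of Struckdown. They cover only the four quantifier forms listed earlier (`*`, `+`, `{n}`, `{n,m}`).

```python
from __future__ import annotations

import re


# Hypothetical helper (not part of Struckdown): map a quantifier suffix
# to an allowed (min, max) item count. max=None means "no upper bound".
def quantifier_bounds(quant: str) -> tuple[int, int | None]:
    if quant == "*":
        return (0, None)          # zero or more
    if quant == "+":
        return (1, None)          # at least one
    m = re.fullmatch(r"\{(\d+)\}", quant)
    if m:
        n = int(m.group(1))
        return (n, n)             # exactly n
    m = re.fullmatch(r"\{(\d+),(\d+)\}", quant)
    if m:
        return (int(m.group(1)), int(m.group(2)))  # inclusive range
    raise ValueError(f"unsupported quantifier: {quant!r}")


def count_ok(items: list, quant: str) -> bool:
    """True if len(items) satisfies the quantifier's bounds."""
    lo, hi = quantifier_bounds(quant)
    return len(items) >= lo and (hi is None or len(items) <= hi)
```

For example, a validator for `[[date{1,3}:dates]]` could include `count_ok(r["dates"], "{1,3}")` alongside its type checks.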
Tests verify that natural language patterns like “first 2 Tuesdays in September” are:
- dateutil.rrule

Tests verify that:

Tests verify:
- [[date:var]] extracts a single date, and warns if the pattern generates multiple dates
- [[date*:vars]] extracts a list of dates

Tests verify that date_rule extractions receive temporal context hints, ensuring proper year inference.
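The pattern handling described above can be illustrated directly with `dateutil.rrule`. This sketch expands "first n Tuesdays in September" for an explicit year. The year stands in for the temporal context hint, and changing it changes the resulting dates. `as_single` is a hypothetical imitation of the single-value `[[date:var]]` behaviour. Keeping the first date is an assumption here, not documented Struckdown behaviour.

```python
from __future__ import annotations

import warnings
from datetime import date, datetime

from dateutil.rrule import TU, WEEKLY, rrule


# Expand "first n Tuesdays in September" for a given year.
# The explicit `year` argument plays the role of the temporal context hint:
# without it, the same phrase would resolve to different dates.
def first_n_tuesdays_in_september(year: int, n: int = 2) -> list[date]:
    rule = rrule(
        WEEKLY,
        byweekday=TU,
        dtstart=datetime(year, 9, 1),
        until=datetime(year, 9, 30),  # never spill into October
    )
    return [dt.date() for dt in rule][:n]


# Hypothetical imitation of a single-value [[date:var]] slot: return one date
# and warn when the pattern produced several. Keeping the first is assumed.
def as_single(dates: list[date]) -> date:
    if len(dates) > 1:
        warnings.warn(f"pattern generated {len(dates)} dates; keeping the first")
    return dates[0]
```

Here `first_n_tuesdays_in_september(2024)` returns `[date(2024, 9, 3), date(2024, 9, 10)]`, while 2025 gives `[date(2025, 9, 2), date(2025, 9, 9)]`. This is why the year hint matters.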
To add a new test case, append to the TEST_CASES list in temporal_test_cases.py:
{
    "category": "Your Category",
    "description": "Brief description of what's being tested",
    "prompt": "Your prompt with [[date:var]] or other temporal slots",
    "expected_type": date,  # or datetime, time, timedelta, list, etc.
    # r maps slot names (here "var") to the extracted values
    "validate": lambda r: (
        # Your validation logic
        isinstance(r["var"], date)
        and r["var"].month == 9
    ),
},
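You can sanity-check a `validate` lambda by calling it on a hand-built result before running the real suite. In the sketch below, the case content, the slot name `meetings`, the sample result and the helper `is_duration` are all illustrative assumptions. Only the dictionary keys follow the documented `TEST_CASES` shape.

```python
from datetime import date, timedelta

# Hypothetical test case in the documented TEST_CASES shape; the slot name
# "meetings" and the sample result below are made up for illustration.
case = {
    "category": "Date Lists",
    "description": "Extract the first 2 Tuesdays in September",
    "prompt": "Meetings are on the first 2 Tuesdays in September. [[date{2}:meetings]]",
    "expected_type": list,
    "validate": lambda r: (
        isinstance(r["meetings"], list)
        and len(r["meetings"]) == 2
        # weekday() == 1 means Tuesday (Monday is 0)
        and all(isinstance(d, date) and d.month == 9 and d.weekday() == 1
                for d in r["meetings"])
    ),
}


# Durations: the LLM may return a string instead of a timedelta,
# so a tolerant check accepts both.
def is_duration(value) -> bool:
    return isinstance(value, (timedelta, str))


# Exercise the validator on a hand-built result.
fake_result = {"meetings": [date(2024, 9, 3), date(2024, 9, 10)]}
print(case["validate"](fake_result))  # True
```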
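Validation tips:
- Use isinstance(r["varname"], expected_type) for type checking.
- Dates: check .year, .month, .day, .weekday()
- Times: check .hour, .minute
- Durations: accept both timedelta and str (the LLM may return strings)
- Lists: check len() and use all() for element validation

The test suite clears the Struckdown cache before each run. If you still encounter issues, clear the cache manually: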
rm -rf ~/.struckdown/cache
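On systems without `rm` (for example, Windows shells), you can do the same cleanup from Python. This is a minimal sketch that assumes the cache lives at the path shown above. `clear_struckdown_cache` is a hypothetical helper, not part of Struckdown.

```python
import shutil
from pathlib import Path


# Cross-platform equivalent of `rm -rf ~/.struckdown/cache`.
# ignore_errors=True makes it a no-op when the directory does not exist.
def clear_struckdown_cache(
    cache_dir: Path = Path.home() / ".struckdown" / "cache",
) -> None:
    shutil.rmtree(cache_dir, ignore_errors=True)
```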
Exit codes:
- 0: all tests passed
- 1: at least one test failed or an error occurred

This makes the script suitable for CI/CD integration.
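Because the result is reported through the exit code, any CI step that runs the script fails automatically on exit code 1. If you drive the suite from a Python orchestration script instead, a small wrapper can check the code. `suite_passed` is a hypothetical helper that relies only on the documented exit codes.

```python
from __future__ import annotations

import subprocess
import sys


# Run a command and report success using the documented convention:
# exit code 0 means every test passed, anything else means failure/error.
def suite_passed(cmd: list[str]) -> bool:
    completed = subprocess.run(cmd)
    return completed.returncode == 0
```

For this suite, you would call `suite_passed(["uv", "run", "python", "examples/temporal_test_cases.py"])` and then `sys.exit(1)` when it returns `False`.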
Running all 38 tests typically takes 2-5 minutes depending on:
Use --stop-on-error for faster debugging iterations.