Core Testing Tools
SPOUT's core testing suite provides comprehensive validation of all core modules and model configurations. The framework supports both single-model and multi-model test runs.
Core Test Runner
The core test runner validates all SPOUT modules against a single model:
# Windows
./test_core.bat
# Unix
./test_core.sh
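On Unix systems, the script may need to be marked executable before its first run:
chmod +x test_core.sh
./test_core.sh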
Modules Tested
The test suite evaluates all core modules:
- reduce
- expand
- enhance
- search
- mutate
- generate
- iterate
- translate
- converse
- parse
- evaluate
- imagine
Test Process
For each module, the runner (see the sketch after this list):
- Records initial test state
- Executes module tests
- Captures all outputs
- Calculates pass/fail rates
- Generates detailed logs
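In shell terms, the loop looks roughly like the following. This is a minimal sketch, not the shipped script: the run_module_test.sh helper and the exact log layout are assumptions for illustration.
#!/bin/bash
# Sketch of the core test loop; one pass/fail per module for brevity.
MODULES="reduce expand enhance search mutate generate iterate translate converse parse evaluate imagine"
LOG="tests/core_test-$(date +%Y-%m-%d_%H-%M-%S).txt"

pass=0; total=0
for module in $MODULES; do
  total=$((total + 1))
  echo "--- $module ---" >> "$LOG"
  # Hypothetical helper: runs one module's tests, exits non-zero on failure
  if ./run_module_test.sh "$module" >> "$LOG" 2>&1; then
    pass=$((pass + 1))
  fi
done
echo "Overall Pass Rate: $((100 * pass / total))% ($pass/$total)" >> "$LOG"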
Multi-Model Test Runner (Gamut)
The gamut test runner executes core tests across all active models:
# Windows
./test_gamut.bat
# Unix
./test_gamut.sh
Features
- Reads active models from models.ini
- Automatically switches between models
- Tracks per-model performance
- Calculates aggregate statistics
- Generates comprehensive reports
Configuration
Models are configured in spout/config/models.ini:
gpt-3.5-turbo=1
gpt-4=1
claude-2=0 # Disabled model
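Only entries set to 1 are treated as active. A quick way to list them from the shell (a sketch; it assumes one key=value pair per line, as in the example above):
# Print the names of all active (=1) models
awk -F'=' '$2+0 == 1 {print $1}' spout/config/models.ini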
Test Results
Core Test Output
Results are saved to tests/core_test-[timestamp].txt:
Test Results - 2024-03-21_14-30-22
===================
Overall Pass Rate: 95% (57/60)
Module Results:
reduce: 100% (5/5)
expand: 90% (9/10)
enhance: 95% (19/20)
...
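For scripted checks, the headline number can be pulled from the newest results file. A sketch, assuming the filename pattern and output format shown above:
# Find the most recent core test log and print its overall pass rate
latest=$(ls -t tests/core_test-*.txt | head -n 1)
grep 'Overall Pass Rate' "$latest"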
Gamut Test Output
Results are saved to tests/gamut_summary-[timestamp].txt:
Model Performance Summary - 2024-03-21_14-30-22
=================================
Total Models Tested: 3
Total Duration: 0h 15m 45s
Average Duration: 5m 15s
Average Pass Rate: 92%
Models Tested Successfully:
- gpt-3.5-turbo (90%)
- gpt-4 (95%)
- claude-2 (91%)
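The aggregate figures follow from the per-model lines: the average pass rate above is (90 + 95 + 91) / 3 = 92%. A sketch that recomputes it from a summary file, assuming the format shown:
# Average the percentages from lines like "- gpt-4 (95%)"
awk -F'[(%]' '/^- / { sum += $2; n++ } END { if (n) printf "Average Pass Rate: %d%%\n", sum / n }' tests/gamut_summary-*.txt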
Performance Metrics
Individual Tests
Each test captures:
- Execution time
- Response validity
- Error handling
- Token usage
- Response formatting
Aggregate Metrics
The summary includes:
- Overall pass rate
- Per-module statistics
- Total execution time
- Average response time
- Model comparisons
Best Practices
Regular Testing
- Run core tests after updates
- Test new models thoroughly
- Monitor performance trends
- Track error patterns
- Document issues
Test Management
- Archive test results
- Review performance regularly
- Compare model behaviors
- Track long-term trends
- Document anomalies
Error Analysis
- Review failed tests
- Check error patterns
- Validate error handling
- Monitor timeout rates
- Track recovery behavior
Run the gamut test suite when:
- Adding new models
- Updating model configurations
- Making significant changes
- Validating deployments
Core tests are designed to validate both functional correctness and error handling. Failed tests may indicate issues with model configuration, API access, or core module functionality.