Core Testing Tools

SPOUT's core testing suite validates all core modules and model configurations. It provides both a single-model test runner and a multi-model (gamut) runner.

Core Test Runner

The core test runner validates all SPOUT modules against a single model:

# Windows
./test_core.bat

# Unix
./test_core.sh

Modules Tested

The test suite evaluates all core modules:

  1. reduce
  2. expand
  3. enhance
  4. search
  5. mutate
  6. generate
  7. iterate
  8. translate
  9. converse
  10. parse
  11. evaluate
  12. imagine

Test Process

For each module, the runner (see the sketch after this list):

  1. Records initial test state
  2. Executes module tests
  3. Captures all outputs
  4. Calculates pass/fail rates
  5. Generates detailed logs
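
This flow can be illustrated with a short sketch. The run_module_tests helper, module list handling, and log layout below are assumptions made for illustration; the real runner is a batch/shell script and its internals may differ.

# Illustrative sketch of the per-module loop; SPOUT's real runner is a
# batch/shell script, and the helper names here are assumptions.
from datetime import datetime

MODULES = ["reduce", "expand", "enhance", "search", "mutate", "generate",
           "iterate", "translate", "converse", "parse", "evaluate", "imagine"]

def run_module_tests(module):
    """Placeholder: run one module's tests, return a list of (name, passed)."""
    raise NotImplementedError

def run_core_suite(log_path):
    module_lines, passed, total = [], 0, 0
    for module in MODULES:
        # initial test state would be recorded here
        results = run_module_tests(module)        # execute module tests
        ok = sum(1 for _, p in results if p)      # capture pass/fail counts
        passed, total = passed + ok, total + len(results)
        module_lines.append(f"{module}: {ok / len(results):.0%} ({ok}/{len(results)})")
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    with open(log_path, "w") as f:                # generate the detailed log
        f.write(f"Test Results - {stamp}\n===================\n\n")
        f.write(f"Overall Pass Rate: {passed / total:.0%} ({passed}/{total})\n\n")
        f.write("Module Results:\n" + "\n".join(module_lines) + "\n")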

Multi-Model Test Runner (Gamut)

The gamut test runner executes core tests across all active models:

# Windows
./test_gamut.bat

# Unix
./test_gamut.sh

Features

  • Reads active models from models.ini
  • Automatically switches between models
  • Tracks per-model performance
  • Calculates aggregate statistics
  • Generates comprehensive reports

Configuration

Models are configured in spout/config/models.ini, where a value of 1 marks a model as active and 0 disables it:

gpt-3.5-turbo=1
gpt-4=1
claude-2=0  # Disabled model
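
The gamut runner reads this file to decide which models to exercise. Below is a minimal sketch of that lookup; the path and the inline-# comment handling mirror the example above and are assumptions about the real parser.

# Sketch: read spout/config/models.ini and keep only models marked "1".
def load_active_models(path="spout/config/models.ini"):
    active = []
    with open(path) as f:
        for raw in f:
            line = raw.split("#", 1)[0].strip()    # drop inline comments
            if not line or "=" not in line:
                continue
            name, flag = (part.strip() for part in line.split("=", 1))
            if flag == "1":                        # 1 = active, 0 = disabled
                active.append(name)
    return active

# e.g. with the file above: ["gpt-3.5-turbo", "gpt-4"]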

Test Results

Core Test Output

Results are saved to tests/core_test-[timestamp].txt:

Test Results - 2024-03-21_14-30-22
===================

Overall Pass Rate: 95% (57/60)

Module Results:
reduce: 100% (5/5)
expand: 90% (9/10)
enhance: 95% (19/20)
...
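
Because the log is plain text, per-module numbers can be read back out for trend tracking. The sketch below assumes the exact layout shown above; adjust the pattern if the real logs carry extra detail.

import re

# Sketch: extract "module: 95% (19/20)" lines from a core_test log.
LINE = re.compile(r"^(\w+): (\d+)% \((\d+)/(\d+)\)$")

def parse_core_log(path):
    results = {}
    with open(path) as f:
        for line in f:
            m = LINE.match(line.strip())
            if m:
                module, rate, ok, total = m.groups()
                results[module] = {"rate": int(rate), "passed": int(ok), "total": int(total)}
    return results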

Gamut Test Output

Results are saved to tests/gamut_summary-[timestamp].txt:

Model Performance Summary - 2024-03-21_14-30-22
=================================

Total Models Tested: 3
Total Duration: 0h 15m 45s
Average Duration: 5m 15s
Average Pass Rate: 92%

Models Tested Successfully:
- gpt-3.5-turbo (90%)
- gpt-4 (95%)
- claude-2 (91%)
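
In this example, the average pass rate is the mean of the per-model rates, (90% + 95% + 91%) / 3 = 92%, and the average duration is the total duration divided by the model count, 15m 45s / 3 = 5m 15s.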

Performance Metrics

Individual Tests

Each test captures (see the record sketch after this list):

  • Execution time
  • Response validity
  • Error handling
  • Token usage
  • Response formatting
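
One possible shape for a captured result is sketched below; the field names are assumptions that mirror the list above, not SPOUT's actual schema.

from dataclasses import dataclass
from typing import Optional

# Illustrative record for one captured test; field names are assumptions.
@dataclass
class TestResult:
    module: str
    execution_time_s: float      # execution time
    response_valid: bool         # response validity
    error: Optional[str]         # error handling outcome (None if no error)
    tokens_used: int             # token usage
    well_formatted: bool         # response formatting check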

Aggregate Metrics

The summary includes (see the aggregation sketch after this list):

  • Overall pass rate
  • Per-module statistics
  • Total execution time
  • Average response time
  • Model comparisons
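
These aggregates follow directly from the per-test data. The sketch below folds simple (module, passed, seconds) tuples into the listed statistics; model comparisons then come from computing one such summary per model. It is illustrative, not SPOUT's actual reporting code.

from collections import defaultdict

# Illustrative aggregation over (module, passed, seconds) tuples.
def summarize(records):
    per_module = defaultdict(lambda: [0, 0])       # module -> [passed, total]
    total_time = 0.0
    for module, passed, seconds in records:
        per_module[module][0] += int(passed)
        per_module[module][1] += 1
        total_time += seconds
    return {
        "overall_pass_rate": sum(p for p, _ in per_module.values()) / len(records),
        "per_module": {m: p / t for m, (p, t) in per_module.items()},
        "total_time_s": total_time,                # total execution time
        "avg_response_time_s": total_time / len(records),
    }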

Best Practices

Regular Testing

  • Run core tests after updates
  • Test new models thoroughly
  • Monitor performance trends
  • Track error patterns
  • Document issues

Test Management

  • Archive test results
  • Review performance regularly
  • Compare model behaviors
  • Track long-term trends
  • Document anomalies

Error Analysis

  • Review failed tests
  • Check error patterns
  • Validate error handling
  • Monitor timeout rates
  • Track recovery behavior

When to Run Gamut Tests

Run the gamut test suite when:

  • Adding new models
  • Updating model configurations
  • Making significant changes
  • Validating deployments

Core tests are designed to validate both functional correctness and error handling. Failed tests may indicate issues with model configuration, API access, or core module functionality.