Testing

CalcFlow uses four test levels, each targeting a distinct layer of complexity: pure logic, parser contracts, component integration, and numerical regression.

The Four Levels

Marker        Scope                                    Speed       Asserts
unit          Single function or method                <1 ms       Correctness of pure logic
contract      Single BlockParser + minimal fixture     <10 ms      Data structure (not precision)
integration   Multiple components, real output files   10–500 ms   Functional correctness
regression    Same scope as integration                Slow        Exact numerical values
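
Pytest only filters reliably on markers that are registered; unregistered markers trigger warnings. Assuming the project configures pytest through pyproject.toml (the exact location and wording are project-specific, not taken from the source), the registration might look like:

```toml
[tool.pytest.ini_options]
markers = [
    "unit: pure-logic tests of a single function",
    "contract: BlockParser tests against minimal fixtures",
    "integration: full-pipeline tests on real output files",
    "regression: exact numerical value checks",
]
```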

Unit tests

Test individual functions in isolation. No file I/O, no parsers, no external state. Input and expected output are constructed inline.

import pytest

@pytest.mark.unit
def test_atom_validates_symbol():
    from calcflow.common.models import Atom
    with pytest.raises(ValueError):
        Atom(symbol="Xx", x=0.0, y=0.0, z=0.0)

Contract tests

Verify that a BlockParser produces the correct data structure from a minimal fixture. The fixture is the smallest possible snippet of output text that exercises the parser — not a real output file.

MULLIKEN_FIXTURE = """\
 Mulliken charges:
               1
     1  O   -0.8476
     2  H    0.4238
     3  H    0.4238
 Sum of Mulliken charges =   0.00000
"""

@pytest.mark.contract
def test_mulliken_charges_parsed():
    result = parse_qchem_output(MULLIKEN_FIXTURE)
    mulliken = result.get_charges("Mulliken")
    assert mulliken is not None
    assert len(mulliken.charges) == 3
    assert isinstance(mulliken.charges[0], float)

Contract tests are the workhorses of the test suite. Write one for every BlockParser you add.

Integration tests

Test the full parse pipeline against real output files from tests/testing_data/. Assert on functional correctness — the right fields are populated, the right states are found — but do not assert on exact floating-point values.

@pytest.mark.integration
def test_qchem_tddft_sp_parses_excited_states(testing_data_path):
    text = (testing_data_path / "qchem" / "h2o" / "tddft_sp.out").read_text()
    result = parse_qchem_output(text)
    assert result.termination_status == "NORMAL"
    assert result.tddft is not None
    assert len(result.tddft.tddft_states) > 0
    assert result.tddft.tddft_states[0].excitation_energy_ev > 0

Regression tests

Same scope as integration, but assert on exact values using pytest.approx. These tests catch unintended numerical changes, such as a parser refactor that accidentally alters a parsed energy value.

@pytest.mark.regression
def test_qchem_tddft_sp_first_state_energy(testing_data_path):
    text = (testing_data_path / "qchem" / "h2o" / "tddft_sp.out").read_text()
    result = parse_qchem_output(text)
    state = result.tddft.tddft_states[0]
    assert state.excitation_energy_ev == pytest.approx(7.654, abs=1e-3)
    assert state.oscillator_strength  == pytest.approx(0.0821, abs=1e-4)

Directory Structure

The test directory mirrors the source layout:

tests/
    conftest.py              # testing_data_path fixture (shared)
    smoke_test.py            # end-to-end smoke tests

    common/
        test_api_docs.py
        test_input_serialization.py
        test_results_serialization.py

    geometry/
        test_static.py
        test_topology.py
        test_trajectory.py
        test_annotated.py

    io/
        test_peekable.py
        orca/
            orca_builders/
            orca_parsers/
        qchem/
            qchem_builders/
            qchem_parsers/
                tddft/
                adc/

    postprocess/
        test_postprocess.py

    testing_data/
        geometries/           # .xyz files
        orca/h2o/             # .out files
        qchem/h2o/            # .out files

Every test file lives in a directory that mirrors its corresponding source module. Parser tests live under io/<program>/<program>_parsers/, builder tests under io/<program>/<program>_builders/.

Fixtures

testing_data_path

Defined in tests/conftest.py. Returns a Path to tests/testing_data/:

from pathlib import Path

import pytest

@pytest.fixture
def testing_data_path():
    return Path(__file__).parent / "testing_data"

Use it in integration and regression tests:

def test_something(testing_data_path):
    text = (testing_data_path / "qchem" / "h2o" / "sp.out").read_text()

Shared builder fixtures

For builder tests, define common fixtures in a conftest.py at the appropriate level:

@pytest.fixture
def water_geometry():
    return Geometry.from_lines([
        "O    0.000000    0.000000    0.119748",
        "H    0.000000    0.756950   -0.478993",
        "H    0.000000   -0.756950   -0.478993",
    ], source="water")

@pytest.fixture
def minimal_spec():
    return CalculationInput(
        charge=0, spin_multiplicity=1,
        task="energy", level_of_theory="B3LYP", basis_set="def2-SVP"
    )

Testing Input Builders

Builder tests require a different strategy from parser tests. The output is structured text — exact string comparison is brittle because whitespace and keyword ordering may change without affecting correctness.

The correct approach: assert on semantics, not on exact text.

@pytest.mark.contract
def test_tddft_keywords_present(minimal_spec, water_geometry):
    calc   = minimal_spec.set_tddft(nroots=5, singlets=True, triplets=False)
    output = calc.export("qchem", water_geometry)

    assert "cis_n_roots" in output.lower()
    assert "5" in output
    assert "$molecule" in output
    assert "$end" in output

# Anti-pattern: brittle, breaks on any formatting change
@pytest.mark.contract
def test_tddft_exact_output(minimal_spec, water_geometry):
    calc   = minimal_spec.set_tddft(nroots=5, singlets=True, triplets=False)
    output = calc.export("qchem", water_geometry)
    assert output == EXPECTED_EXACT_STRING
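
If a text-level comparison is ever unavoidable, one middle ground (a sketch of a general technique, not something the examples above use) is to normalize whitespace and case before comparing, so cosmetic reflows do not fail the test:

```python
def normalize(text: str) -> str:
    # Collapse all runs of whitespace and lowercase everything,
    # so indentation and line-wrapping changes don't matter.
    return " ".join(text.lower().split())

def test_normalized_comparison():
    generated = "$molecule\n  0 1\n$end\n"
    expected = "$MOLECULE 0 1 $END"
    assert normalize(generated) == normalize(expected)
```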

For regression tests on builders, the most robust approach is round-tripping: generate the input, parse it back into structured data, and assert on the parsed values.
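
The round-trip idea can be sketched with a self-contained toy builder/parser pair; the function names below are illustrative stand-ins for calcflow's real builders and parsers, not its actual API:

```python
import re

def build_molecule_block(charge: int, multiplicity: int) -> str:
    # Toy stand-in for an input builder.
    return f"$molecule\n{charge} {multiplicity}\n$end\n"

def parse_molecule_block(text: str) -> tuple[int, int]:
    # Toy stand-in for the matching parser.
    match = re.search(r"\$molecule\s+(-?\d+)\s+(\d+)", text)
    if match is None:
        raise ValueError("no $molecule block found")
    return int(match.group(1)), int(match.group(2))

def test_molecule_block_round_trip():
    # Generate, re-parse, and compare structured values --
    # formatting can change freely without breaking this test.
    text = build_molecule_block(charge=-1, multiplicity=2)
    assert parse_molecule_block(text) == (-1, 2)
```

Because the assertion targets parsed values rather than raw text, the test survives any formatting change that preserves meaning.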

Running Tests

# All tests
uv run pytest

# By marker
uv run pytest -m unit
uv run pytest -m contract
uv run pytest -m "integration or regression"

# Single file
uv run pytest tests/io/qchem/qchem_parsers/test_qchem_sp_scf.py -v

# Single test
uv run pytest tests/io/qchem/qchem_parsers/test_qchem_sp_scf.py::test_scf_converged -v

# With coverage
uv run pytest --cov=calcflow --cov-report=term-missing

The Testing Pyramid

Aim for this shape:

        / \
       /   \   regression (few, expensive)
      /─────\
     /       \  integration (moderate)
    /─────────\
   /           \ contract (many, fast)
  /─────────────\
 /               \ unit (most, fastest)
/─────────────────\

Most of the safety net comes from unit and contract tests. Integration and regression tests are the final check that everything works end-to-end with real data.