Schema Versioning

CalcFlow serializes CalculationInput and CalculationResult to JSON. Schema versioning ensures that JSON produced by an older version of CalcFlow can still be loaded by a newer version.

Two Version Fields

Every serialized object includes two version fields:

{
  "calcflow_version": "0.5.0",
  "schema_version": 2,
  ...
}

| Field | Type | Purpose |
| --- | --- | --- |
| `calcflow_version` | string (semver) | Provenance: which version of the library produced this dump. Never used in loading logic. |
| `schema_version` | integer | Structural compatibility: drives migration logic on load. |

These are independent. A patch release that fixes a parser bug does not touch schema_version. A minor release that renames a field does.
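
As a rough illustration of how the two fields end up in a dump, the sketch below stamps both onto a payload. The helper `stamp_versions`, the constant `CALCFLOW_VERSION`, and the `final_energy` field are hypothetical names chosen for this example, not CalcFlow's actual API.

```python
import json

# Hypothetical constants for illustration; the real values live in the library.
CALCFLOW_VERSION = "0.5.0"
RESULT_SCHEMA_VERSION = 2


def stamp_versions(payload: dict) -> dict:
    """Return a copy of payload with both version fields set."""
    return {
        **payload,
        "calcflow_version": CALCFLOW_VERSION,      # provenance only
        "schema_version": RESULT_SCHEMA_VERSION,   # drives migration on load
    }


# "final_energy" is a made-up field used only to have something to serialize.
dump = stamp_versions({"final_energy": -76.4})
print(json.dumps(dump, indent=2))
```

A patch release would change only `CALCFLOW_VERSION` here; `RESULT_SCHEMA_VERSION` stays put until the structure of the dump changes.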

Where the Constants Live

# calcflow/common/results.py
RESULT_SCHEMA_VERSION: int = 2

# calcflow/common/input.py
INPUT_SCHEMA_VERSION: int = 1

Each model tracks its own version independently. A change to CalculationResult's schema does not require bumping INPUT_SCHEMA_VERSION.

When to Bump `schema_version`

Bump when an old dump would fail to load correctly with the new code:

Renaming a field
Removing a field
Restructuring a nested object (e.g. changing a flat dict to a nested object)
Changing a field's type (e.g. float to str)
Adding a required field (one with no default)

Do not bump when old dumps remain loadable without changes:

Adding an optional field with a default value
Fixing a parser bug (the schema structure is unchanged)
Changing a default value
Renaming an internal variable that doesn't appear in serialized output

The rule of thumb

If from_dict(old_dump) would raise an error or silently produce wrong data, bump schema_version. If it would work correctly, don't.
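
The rule of thumb can be checked mechanically by loading an old dump with the new code. The sketch below uses two hypothetical dataclasses (`ResultWithOptional` and `ResultRenamed`, neither of which is a CalcFlow class) to contrast a change that needs no bump with one that does.

```python
from dataclasses import dataclass


@dataclass
class ResultWithOptional:
    """Hypothetical model after adding an optional field with a default."""
    energy: float
    converged: bool = True  # new field; old dumps simply omit it

    @classmethod
    def from_dict(cls, data: dict) -> "ResultWithOptional":
        known = {k: v for k, v in data.items() if k in ("energy", "converged")}
        return cls(**known)


@dataclass
class ResultRenamed:
    """Hypothetical model after renaming "energy" to "total_energy"."""
    total_energy: float

    @classmethod
    def from_dict(cls, data: dict) -> "ResultRenamed":
        return cls(total_energy=data["total_energy"])


old_dump = {"energy": -1.5}  # written before either change

# Optional field with a default: the old dump still loads, so no bump.
loaded = ResultWithOptional.from_dict(old_dump)

# Renamed field: the old dump fails to load, so bump and add a migration.
try:
    ResultRenamed.from_dict(old_dump)
    rename_loads = True
except KeyError:
    rename_loads = False
```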

Writing a Migration

Migrations live in the _migrate() class method of each model. They are sequential: v1→v2, then v2→v3, and so on. Never skip a step.

# calcflow/common/results.py

RESULT_SCHEMA_VERSION = 3  # example value, one past the current 2, so that two steps are shown

@classmethod
def _migrate(cls, data: dict) -> dict:
    version = data.get("schema_version", 1)

    if version < 2:
        # v1 -> v2: geometry fields changed from bare atom lists to Geometry objects
        for key in ("input_geometry", "final_geometry"):
            if isinstance(data.get(key), list):
                data[key] = {"comment": "", "atoms": data[key]}
        version = 2

    if version < 3:
        # v2 -> v3: renamed "atomic_charges" list items from "method_name" to "method"
        for charge_dict in data.get("atomic_charges", []):
            if "method_name" in charge_dict:
                charge_dict["method"] = charge_dict.pop("method_name")
        version = 3

    data["schema_version"] = version
    return data
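
To see the two steps above run in sequence, here is a standalone sketch that applies the same logic to a made-up v1 dump. The function name `migrate_result` and the shapes of the atom and charge entries are assumptions for illustration; the deep copy at the top is an addition so the example leaves its input untouched.

```python
import copy


def migrate_result(data: dict) -> dict:
    """Standalone version of the two migration steps, for experimentation."""
    data = copy.deepcopy(data)  # keep the caller's dict untouched
    version = data.get("schema_version", 1)

    if version < 2:
        # v1 -> v2: wrap bare atom lists in a Geometry-style object
        for key in ("input_geometry", "final_geometry"):
            if isinstance(data.get(key), list):
                data[key] = {"comment": "", "atoms": data[key]}
        version = 2

    if version < 3:
        # v2 -> v3: rename "method_name" to "method" in each charge entry
        for charge_dict in data.get("atomic_charges", []):
            if "method_name" in charge_dict:
                charge_dict["method"] = charge_dict.pop("method_name")
        version = 3

    data["schema_version"] = version
    return data


# Hypothetical v1 dump: no schema_version key, bare atom list, old charge key.
v1_dump = {
    "input_geometry": [["H", 0.0, 0.0, 0.0], ["H", 0.0, 0.0, 0.74]],
    "atomic_charges": [{"method_name": "mulliken", "charges": [0.0, 0.0]}],
}

migrated = migrate_result(v1_dump)
```

Because each step is guarded by `version < N`, running the function on an already-migrated dump changes nothing, which makes the chain safe to apply on every load.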

The migration is called by from_dict() before deserializing:

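import copy

@classmethod
def from_dict(cls, data: dict) -> "CalculationResult":
    # Deep copy: migration steps mutate nested dicts and lists, which a
    # shallow dict(data) would still share with the caller's object.
    data = cls._migrate(copy.deepcopy(data))
    # ... deserialize fields ...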
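
How the input is copied matters because migration steps can reach into nested dicts and lists. The following sketch, using a made-up payload, contrasts a shallow copy with a deep copy when a nested key is renamed.

```python
import copy

original = {"atomic_charges": [{"method_name": "mulliken"}]}

# A shallow copy duplicates only the top-level dict; nested dicts are shared.
shallow = dict(original)
entry = shallow["atomic_charges"][0]
entry["method"] = entry.pop("method_name")
shallow_leaked = "method" in original["atomic_charges"][0]

# A deep copy duplicates nested containers too, so the source is untouched.
original = {"atomic_charges": [{"method_name": "mulliken"}]}
deep = copy.deepcopy(original)
entry = deep["atomic_charges"][0]
entry["method"] = entry.pop("method_name")
deep_leaked = "method" in original["atomic_charges"][0]
```

With the shallow copy the rename shows up in the caller's data; with the deep copy it does not. The deep copy costs more for large dumps, so this is a trade-off rather than a fixed rule.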

Detecting Forward Incompatibility

If a dump has a schema_version higher than the current RESULT_SCHEMA_VERSION, the code cannot safely load it: the dump was produced by a newer version of CalcFlow, with a structure this version has no migration for and may misread.

stored_version = data.get("schema_version", 1)
if stored_version > RESULT_SCHEMA_VERSION:
    raise ConfigurationError(
        f"Cannot load result: schema version {stored_version} is newer than "
        f"this version of CalcFlow supports ({RESULT_SCHEMA_VERSION}). "
        f"Please upgrade calcflow."
    )
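
The check above can be wrapped in a small function and exercised on its own. In this sketch `check_loadable` is a hypothetical helper name, `ConfigurationError` is a local stand-in for CalcFlow's exception, and the constant is set to an illustrative value.

```python
RESULT_SCHEMA_VERSION = 3  # illustrative value


class ConfigurationError(Exception):
    """Local stand-in for CalcFlow's exception of the same name."""


def check_loadable(data: dict) -> int:
    """Return the stored schema version, or raise if it is too new."""
    stored_version = data.get("schema_version", 1)
    if stored_version > RESULT_SCHEMA_VERSION:
        raise ConfigurationError(
            f"Cannot load result: schema version {stored_version} is newer than "
            f"this version of CalcFlow supports ({RESULT_SCHEMA_VERSION}). "
            f"Please upgrade calcflow."
        )
    return stored_version


# A dump from a newer release is rejected instead of being misread.
try:
    check_loadable({"schema_version": 4})
    rejected = False
except ConfigurationError:
    rejected = True
```

Running a check like this before any migration step means a too-new dump fails fast with a clear message rather than partway through deserialization.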

Example: Renaming a Field (Bump Required)

Renaming CalculationResult.scf_results to CalculationResult.scf:

1. Bump RESULT_SCHEMA_VERSION from 2 to 3.

2. Add a migration step in _migrate():

if version < 3:
    if "scf_results" in data:
        data["scf"] = data.pop("scf_results")
    version = 3

3. Update to_dict() to write the new key "scf".

Old dumps that still contain "scf_results" are then migrated transparently on load.
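
A quick way to confirm the rename behaves as intended is to run the step against a v2 dump. The sketch below isolates just this step in a hypothetical helper, `migrate_scf_rename`; it assumes the input is already at v2, and the contents of "scf_results" are made up.

```python
def migrate_scf_rename(data: dict) -> dict:
    """Apply only the v2 -> v3 rename step to a copy of a v2 dump."""
    data = dict(data)  # shallow copy suffices: only top-level keys change
    version = data.get("schema_version", 1)
    if version < 3:
        if "scf_results" in data:
            data["scf"] = data.pop("scf_results")
        version = 3
    data["schema_version"] = version
    return data


# Hypothetical v2 dump written before the rename.
old = {"schema_version": 2, "scf_results": {"energy": -76.4, "n_iterations": 12}}
new = migrate_scf_rename(old)
```

A test along these lines, kept next to the migration, guards against a later refactor silently dropping the step.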