MedCAT library and model version compatibility

We are aware that we’ve had a few issues with newer MedCAT versions not being compatible with models created in older versions.
As of release v1.10 this should no longer be the case.

This means we strongly encourage everyone to use the latest version (medcat==1.10 at the time of writing).

This post highlights the kinds of issues we found, which models and library versions they affect, and the steps we have taken to mitigate them.

  • Incompatible spacy model
    • Affects: Loading newer models (after 1.7.0) with an older medcat version (before 1.7.0).
    • Details: A MedCAT model saved as a model pack bundles a spacy model. If that spacy model is newer than the spacy version installed alongside MedCAT, the model fails to load. The spacy version (and the underlying spacy model) was updated in MedCAT v1.7.0. While spacy does not guarantee full backwards compatibility, we have not observed issues using older spacy models with newer spacy.
    • Mitigation: Running the latest version of the MedCAT library.
      • NOTE: We understand this may not always have been an option due to other incompatibilities, but now should be the best time to upgrade.
  • Incompatible config (validation error upon load)
    • Affects: Loading models created prior to MedCAT v1.4.0 with a library version between 1.6.0 and 1.9.3 (inclusive).
    • Details: Originally, the linking filters in the config were set to an empty dict instead of an empty set. This was fixed in the v1.4.0 release. In 1.6.0, validation was added to make sure every config value is of the correct type. As a result, models saved with the old empty-dict value could not be loaded directly.
      • NOTE: There was a workaround in 1.8.0 that allowed users to patch this issue manually. However, that turned out not to be the optimal approach.
    • Mitigation: The newest version now automatically converts the offending value on the fly.
  • Incompatible dill version
    • Affects: Loading newer models (after 1.7.0) with an older medcat version (before 1.7.0).
    • Details: In MedCAT 1.7.0, the dill dependency version constraint was loosened. Because of this, new installs pick up a newer dill version, which changes the way files are saved on disk to some extent. Thus, older dill versions are not able to load newer models.
      • NOTE: This may technically still be an issue when migrating from an older MedCAT version to a newer one. MedCAT still allows the older version of dill, so when upgrading the library the dependency is already met and dill is not upgraded. In such cases, I’d recommend upgrading dill explicitly (pip install --upgrade dill).
    • Mitigation: Running the latest version of the MedCAT library.
  • [NEW: 2024/02/12] Incompatible transformers version
    • Affects: Loading older de-id models in medcat v1.9.3 or v1.10.0
    • Details: The de-id models created in older medcat versions (pre v1.9.3) are packaged with an older transformers model, which is not fully compatible with the newer version of the transformers library.
    • Mitigation: There’s a fix in the works that will be in the next release (after v1.10.0)
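
To make the "Incompatible config" mitigation more concrete, here is a minimal sketch of what converting the offending value on the fly could look like. This is not MedCAT's actual code: the function name and the plain-dict representation of the linking filters are assumptions made purely for illustration.

```python
from typing import Any, Dict


def fix_empty_filter_values(linking_filters: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy in which empty-dict values are replaced by empty sets.

    Hypothetical helper: the filter layout is illustrative and is not
    taken from MedCAT's real config schema.
    """
    fixed: Dict[str, Any] = {}
    for key, value in linking_filters.items():
        if isinstance(value, dict) and not value:
            # Older models stored `{}` where an empty set was intended.
            fixed[key] = set()
        else:
            # Everything else is passed through untouched.
            fixed[key] = value
    return fixed
```

The point of doing this at load time is that type validation then sees a set, so the older model loads without the user having to patch anything by hand.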

The above issues were discovered during testing done over the last month or so.

Some caveats on validation methodology:

  • Only CAT.get_entities was called
    • So it’s possible there are some other issues
  • Tested for 6 different models
    • Models from 2022 and 2023
  • Library versions 1.5.3 to 1.10.0 were used
  • Python 3.9 was used
    • So other versions could have different results
  • For each library version, a fresh environment was used
    • Otherwise some dependency versions may leak from previous installs
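
Since dependency versions leaking between installs is a concern in the list above, it can help to record exactly which versions are present in an environment before testing a model. Below is a small sketch using only the standard library; the helper name is made up for this post, and the package list is simply the set of dependencies discussed above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            report[name] = None
    return report


if __name__ == "__main__":
    # The dependencies mentioned in this post.
    for name, ver in installed_versions(
        ["medcat", "spacy", "dill", "transformers"]
    ).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in both the old and the new environment makes it easy to spot a dependency that was not upgraded along with MedCAT.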

Notes on upgrading MedCAT library:

  • When installing a newer version, we recommend installing with a compatible release clause
    • This allows installation of the latest patch release of a specific minor release
    • e.g. pip install "medcat~=1.10.0"
  • When upgrading to a newer version you may run into some incompatibilities
    • E.g. some dependencies may retain their older versions
      • This is something we will try to address
    • It can be beneficial to start from a fresh environment
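
For anyone unsure what the compatible release clause (~=) actually matches, here is a simplified illustration. It is a sketch only: it handles plain numeric versions such as 1.10.3 and ignores pre-releases, post-releases and the other details of the versioning spec. pip does the real resolution; the function below is hypothetical and only meant to show the idea.

```python
from typing import Tuple


def _parse(ver: str) -> Tuple[int, ...]:
    """Parse a plain numeric version such as '1.10.3' into a tuple of ints."""
    return tuple(int(part) for part in ver.split("."))


def matches_compatible_release(candidate: str, spec: str) -> bool:
    """Simplified check of `candidate ~= spec` for plain numeric versions.

    `~=1.10.0` behaves like `>=1.10.0` combined with `==1.10.*`.
    """
    base = _parse(spec)
    cand = _parse(candidate)
    if len(base) < 2:
        raise ValueError("~= needs at least two version segments")
    # All but the last segment of the spec must match exactly.
    prefix = base[:-1]
    if cand[: len(prefix)] != prefix:
        return False
    # Pad with zeros so that e.g. 1.10 compares equal to 1.10.0.
    width = max(len(base), len(cand))
    cand_padded = cand + (0,) * (width - len(cand))
    base_padded = base + (0,) * (width - len(base))
    return cand_padded >= base_padded


print(matches_compatible_release("1.10.3", "1.10.0"))  # True: newer patch release
print(matches_compatible_release("1.11.0", "1.10.0"))  # False: different minor release
```

In other words, medcat~=1.10.0 picks up later 1.10.x patch releases but will not jump to a different minor release.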

PS:
If you find issues running the latest MedCAT library alongside any model (other than the legacy 0.x models) please do let us know. Only then can we work on fixing the issue.

Just an FYI on library version and de-id model compatibility.
De-id models trained previously (before medcat v1.9.3) do not work with library versions v1.9.3 and v1.10.0.
There is a fix in the works and this will be included in the next release.

PS:
By “does not work” I mean it will not de-identify any documents. This is due to an underlying change in how transformers handles its models. Since the model gets packaged within the model pack, the older model is not fully compatible with the newer version of the library. However, this failure is generally hidden from the user: for some reason it does not raise an exception but fails silently.
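
Because the failure is silent, one way to notice it is to run a few texts that are known to contain identifiers and check whether anything changed. The sketch below is only an illustration: deid_fn is a stand-in for whatever call you use to de-identify a single text, not a MedCAT API, and the helper name is made up.

```python
from typing import Callable, Iterable, List


def find_unchanged_texts(
    deid_fn: Callable[[str], str], texts: Iterable[str]
) -> List[str]:
    """Return the input texts that came back from `deid_fn` unchanged.

    `deid_fn` is a placeholder for your own de-identification call.
    The texts passed in should be known to contain identifiers, so an
    unchanged result suggests that nothing was de-identified.
    """
    return [text for text in texts if deid_fn(text) == text]
```

If every sample text comes back unchanged, the model is most likely affected by the issue described above.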
