Error in MedCATtrainer Project Setup: Missing "spacy_model"


We’re currently in the process of setting up a project in MedCATtrainer. We’ve uploaded the vocab.dat and cdb.dat from the UMLS small model. However, when attempting to open the project for annotation, we encounter the following error:

Our understanding is that the project should use the ‘en_core_web_md’ model, as specified in the config file (cat.general.spacy_model = 'en_core_web_md'). The ‘en_core_web_md’ model is downloaded during the container build, so it should be available inside the container.

We would appreciate any help in resolving this issue. Thank you


hi @tapak - which trainer release version are you trying to run?

I’m using v2.12.3

Did anyone manage to find fixes for this problem? I am experiencing the same issue

I’m fairly certain I know what the issue is.

The earlier medcat model packs shipped with a spacy model named simply spacy_model within the model pack. The same is specified within the config.
When the medcat library loads the model pack, it modifies the value of config.general.spacy_model to point to the unpacked model pack folder (currently line 375 of the medcat.CAT module).
However, due to the way MedCATtrainer is built, it only loads the CDB, not the entire model pack. Thus, this change to the name/path of the spacy model is not made (and it wouldn’t be applicable anyway, since the spacy model wouldn’t be available at that location).
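
To illustrate the difference, here is a rough sketch of the kind of path rewrite that happens when a full model pack is loaded. This is not the actual medcat source; resolve_spacy_model is a hypothetical helper and the paths are made up for the example.

```python
import os


def resolve_spacy_model(model_pack_path: str, configured_name: str) -> str:
    """Hypothetical helper: mimic pointing the configured spacy model
    at a folder inside the unpacked model pack."""
    # Keep only the last path component of the configured value
    # (e.g. "spacy_model") and look for it inside the model pack folder.
    return os.path.join(model_pack_path, os.path.basename(configured_name))


# Loading a full model pack: the bare name becomes a path into the pack.
print(resolve_spacy_model("/tmp/umls_small_pack", "spacy_model"))

# Loading only the CDB skips this step, so the config still holds the
# bare name "spacy_model", which spacy cannot find as an installed model.
```

In MedCATtrainer only the second case applies, which is why the bare name ends up being handed to spacy.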

I’ve come up with the following workaround.

This script renames the spacy model in the CDB config from spacy_model to en_core_web_md (the small UMLS model used version 3.1.0 of this, but newer ones should also work) and then saves the CDB back to disk. This allows MCT to load and use the CDB successfully.
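
A minimal sketch of a script along these lines, assuming medcat v1.x (where CDB.load and cdb.save are available and config.general.spacy_model can be set as an attribute). The function names rename_spacy_model and fix_cdb_file and the file path in the usage note are placeholders for illustration, so adjust them to your setup.

```python
def rename_spacy_model(cdb, new_name="en_core_web_md", old_name="spacy_model"):
    """Point the CDB config at an installed spacy model.

    Returns True if the config was changed, False if it was left alone.
    """
    general = cdb.config.general
    # Only touch the config if it still holds the placeholder name.
    if general.spacy_model == old_name:
        general.spacy_model = new_name
        return True
    return False


def fix_cdb_file(cdb_path, new_name="en_core_web_md"):
    """Load a CDB from disk, rename its spacy model, and save it back."""
    # Imported here so the helper above can be used without medcat installed.
    # Assumption: medcat v1.x API (CDB.load / CDB.save).
    from medcat.cdb import CDB

    cdb = CDB.load(cdb_path)
    changed = rename_spacy_model(cdb, new_name)
    if changed:
        # Overwrites the original file; keep a backup copy if unsure.
        cdb.save(cdb_path)
    return changed
```

Usage would be something like fix_cdb_file("cdb.dat"), after which the modified cdb.dat can be re-uploaded to MedCATtrainer. The model named here must also be installed in the trainer container.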
