Error Report: Training Meta Annotations with CogStack Scripts + MedCAT

Environment

  • MedCAT version: 1.16.0

  • Python version: 3.10

  • Workflow: Training meta_model using CogStack scripts and project-exported annotations.

Steps to Reproduce

  1. Exported project annotations including meta annotations (e.g., Presence).

  2. Attempted to train the meta_model:

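import json
import os

# mc (a MetaCAT instance), meta_model and mctrainer_export_path are assumed to be defined earlier
save_dir_path = "test_meta_" + meta_model  # where to save the meta_model and results
results = mc.train_from_json(mctrainer_export_path, save_dir_path=save_dir_path)

# Save results (the context manager closes the file even if json.dump fails)
results_path = os.path.join(save_dir_path, "meta_" + meta_model + "_results.json")
with open(results_path, "w") as f:
    json.dump(results["report"], f)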

  3. Encountered the following error during training.

Full Error Traceback

Exception                                 Traceback (most recent call last)
Cell In[26], line 13
    results = mc.train_from_json(mctrainer_export_path, save_dir_path=save_dir_path)

File /.../site-packages/medcat/meta_cat.py:162, in MetaCAT.train_from_json(...)
    return self.train_raw(data_loaded, save_dir_path, data_oversampled=data_oversampled)

File /.../site-packages/medcat/meta_cat.py:249, in MetaCAT.train_raw(...)
    category_name = g_config.get_applicable_category_name(data)
    if category_name is None:
        raise Exception(
            "The category name does not exist in this json file. You’ve provided ‘{}’, "
            "while the possible options are: {}. Additionally, ensure the populate the "
            "‘alternative_category_names’ attribute to accommodate for variations."
            .format(category_name, " | ".join(list(data.keys())))
        )

Exception:
The category name does not exist in this json file. You’ve provided ‘None’, while the possible options are: .
Additionally, ensure you populate the ‘alternative_category_names’ attribute to accommodate for variations.

Observations

  • In the MedCAT JSON export I can see:

    "meta_anno_defs": [
      {"name": "Presence", "values": ["False", "Hypothetical", "True"]},
      {"name": "Subject/Experiencer", "values": ["Other", "Patient", "Relative"]},
      {"name": "Time", "values": ["Future", "Past", "Recent"]}
    ],
    "relation_anno_defs": []
    
    
  • In the modelpack, the folder Presence exists as expected.

Issue

Despite having Presence defined in both the JSON export and the modelpack, training fails with:

  • category_name = None

  • Possible options list is empty ([]).

This suggests that:

  • The JSON structure may not match what MetaCAT.train_from_json expects, or

  • The category name mapping (alternative_category_names) is not being resolved correctly.

Hi Samora,

This does indeed look like an issue with the trainer export format you’re using. Neither meta_anno_defs nor relation_anno_defs is a key that the library reads from a trainer export.

What the library expects is a trainer export in the following format:

{
    "projects": [
        {
            "name": "<Proj-name>",
            "id": "<proj-ID>",
            "cuis": "",  # filter for cuis if needed
            "tuis": "",  # filter for type ids if needed
            "documents": [
                {
                    "name": "<Doc-name>",
                    "id": "<doc-ID>",
                    "last_modified": "<last-modified-date>",
                    "text": "<The raw text>",
                    "annotations": [
                        {
                            "id": "<ann-ID>",
                            "cui": "<CUI>",
                            "start": -1,  # start index
                            "end": -1,    # end index
                            "value": "<Annotated Value>",
                            "validated": True,  # whether validated by annotator
                            "meta_anns": {
                                "<Category-name>": {
                                    "name": "<Category-name>",
                                    "value": "<category value>",
                                    "confidence": -1.0,  # the confidence rating
                                },  # and potentially more for other categories
                            }
                        },  # and probably more annotations
                    ]
                },  # and potentially more
            ]
        },  # and potentially more
    ]
}
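
If it helps, here is a rough way to check which meta annotation categories an export actually contains, by walking the projects → documents → annotations → meta_anns nesting shown above. This is only a sketch: the helper name meta_categories_in_export is made up for illustration, and the handling of meta_anns as a list (rather than a dict) is an assumption about older export variants.

```python
from collections import Counter


def meta_categories_in_export(export: dict) -> Counter:
    """Count meta annotation category names found in a trainer export dict."""
    counts = Counter()
    for project in export.get("projects", []):
        for document in project.get("documents", []):
            for annotation in document.get("annotations", []):
                meta_anns = annotation.get("meta_anns") or {}
                if isinstance(meta_anns, dict):
                    # Expected shape: {"<Category-name>": {"name": ..., "value": ...}}
                    counts.update(meta_anns.keys())
                else:
                    # Assumption: some exports store a list of {"name": ..., "value": ...}
                    counts.update(m["name"] for m in meta_anns if "name" in m)
    return counts


# Tiny in-memory example; for a real file, load it with json.load first.
example = {
    "projects": [
        {
            "documents": [
                {
                    "annotations": [
                        {"meta_anns": {"Presence": {"name": "Presence", "value": "True"}}}
                    ]
                }
            ]
        }
    ]
}
print(dict(meta_categories_in_export(example)))
```

An empty result for your file would mean there is no "projects" key, or that none of the annotations carry meta_anns. Either case would be consistent with the empty options list in the error message.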

With that said, if the list of available categories is empty, it’s also possible that the model doesn’t have any MetaCATs loaded. You can check that by printing cat._meta_cats.
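
For a slightly more readable check than printing the list, something along these lines should work. It is a sketch, not library code: loaded_meta_cat_categories is a hypothetical helper, and it assumes each MetaCAT exposes config.general.category_name and config.general.alternative_category_names, which is what the error message refers to.

```python
def loaded_meta_cat_categories(cat) -> list:
    """Return (category_name, alternative_names) for each MetaCAT on the model."""
    summary = []
    for meta_cat in getattr(cat, "_meta_cats", []):
        general = meta_cat.config.general
        name = getattr(general, "category_name", None)
        # Older configs may lack this attribute, so fall back to an empty list
        alternatives = list(getattr(general, "alternative_category_names", None) or [])
        summary.append((name, alternatives))
    return summary


# Usage, assuming a MedCAT 1.x model pack (the path is a placeholder):
# from medcat.cat import CAT
# cat = CAT.load_model_pack("<path-to-modelpack>")
# print(loaded_meta_cat_categories(cat))
```

An empty list means no MetaCATs were loaded with the model pack. Otherwise, compare the names it reports against the category names present in your export.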