Error Report: Training Meta Annotations with CogStack Scripts + MedCAT

Environment

  • MedCAT version: 1.16.0

  • Python version: 3.10

  • Workflow: Training a meta_model using CogStack scripts and annotations exported from a project.

Steps to Reproduce

  1. Exported project annotations including meta annotations (e.g., Presence).

  2. Attempted to train the meta_model:

import json
import os

save_dir_path = "test_meta_" + meta_model  # where to save the meta_model and results
results = mc.train_from_json(mctrainer_export_path, save_dir_path=save_dir_path)

# Save the training report; the with-block closes the file handle afterwards
with open(os.path.join(save_dir_path, "meta_" + meta_model + "_results.json"), "w") as f:
    json.dump(results["report"], f)

  3. Encountered the following error during training.

Full Error Traceback

Exception                                 Traceback (most recent call last)
Cell In[26], line 13
    results = mc.train_from_json(mctrainer_export_path, save_dir_path=save_dir_path)

File /.../site-packages/medcat/meta_cat.py:162, in MetaCAT.train_from_json(...)
    return self.train_raw(data_loaded, save_dir_path, data_oversampled=data_oversampled)

File /.../site-packages/medcat/meta_cat.py:249, in MetaCAT.train_raw(...)
    category_name = g_config.get_applicable_category_name(data)
    if category_name is None:
        raise Exception(
            "The category name does not exist in this json file. You've provided '{}', "
            "while the possible options are: {}. Additionally, ensure the populate the "
            "'alternative_category_names' attribute to accommodate for variations."
            .format(category_name, " | ".join(list(data.keys())))
        )

Exception:
The category name does not exist in this json file. You've provided 'None', while the possible options are: .
Additionally, ensure you populate the 'alternative_category_names' attribute to accommodate for variations.

Observations

  • In the MedCAT JSON export I can see:

    "meta_anno_defs": [
      {"name": "Presence", "values": ["False", "Hypothetical", "True"]},
      {"name": "Subject/Experiencer", "values": ["Other", "Patient", "Relative"]},
      {"name": "Time", "values": ["Future", "Past", "Recent"]}
    ],
    "relation_anno_defs": []
    
    
  • In the modelpack, the folder Presence exists as expected.

Issue

Despite having Presence defined in both the JSON export and the modelpack, training fails with:

  • category_name = None

  • Possible options list is empty ([]).

This suggests that:

  • The JSON structure may not match what MetaCAT.train_from_json expects, or

  • The category name mapping (alternative_category_names) is not being resolved correctly.
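Both possibilities can be illustrated with a simplified model of the lookup. The sketch below is hypothetical: resolve_category_name is not a MedCAT function, and the library's own get_applicable_category_name may behave differently. It assumes the configured category name is tried first, then each entry of alternative_category_names, against the category names found in the parsed training data. When no categories are parsed at all, as the empty options list in the error suggests, the result is None whatever the configuration.

```python
from typing import Iterable, List, Optional


def resolve_category_name(
    category_name: str,
    alternative_names: List[str],
    available_names: Iterable[str],
) -> Optional[str]:
    """Return the first configured name that matches a category in the data.

    Hypothetical re-implementation for illustration only; not MedCAT's code.
    """
    available = set(available_names)
    # Try the configured name first, then the alternatives in the order given.
    for candidate in [category_name, *alternative_names]:
        if candidate in available:
            return candidate
    # No match, including the case where no categories were parsed at all.
    return None


# Nothing parsed from the export: no configuration can produce a match.
print(resolve_category_name("Presence", [], []))  # None

# A hypothetical naming difference bridged by an alternative name.
print(resolve_category_name(
    "Subject", ["Subject/Experiencer"], ["Presence", "Subject/Experiencer", "Time"]
))  # Subject/Experiencer
```

Under this model, alternative_category_names only helps when the data contains categories under a different name; it cannot help when the parsed data is empty.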

Hi Samora,

This does indeed look like an issue with the trainer export format you’re using. Neither meta_anno_defs nor relation_anno_defs is a key that the library reads from the trainer export.

The library expects a trainer export in the following format (written as a Python literal, hence the comments and True):

{
    "projects": [
        {
            "name": "<Proj-name>",
            "id": "<proj-ID>",
            "cuis": "",  # filter for cuis if needed
            "tuis": "",  # filter for type ids if needed
            "documents": [
                {
                    "name": "<Doc-name>",
                    "id": "<doc-ID>",
                    "last_modified": "<last-modified-date>",
                    "text": "<The raw text>",
                    "annotations": [
                        {
                            "id": "<ann-ID>",
                            "cui": "<CUI>",
                            "start": -1,  # start index
                            "end": -1,    # end index
                            "value": "<Annotated Value>",
                            "validated": True,  # whether validated by annotator
                            "meta_anns": {
                                "<Category-name>": {
                                    "name": "<Category-name>",
                                    "value": "<category value>",
                                    "confidence": -1.0,  # the confidence rating
                                },  # and potentially more for other categories
                            }
                        },  # and probably more annotations
                    ]
                },  # and potentially more
            ]
        },  # and potentially more
    ]
}
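Before retraining, it can help to confirm that the export actually contains meta annotations in this layout. The sketch below uses hypothetical helpers (count_meta_ann_categories and load_and_count are not part of MedCAT) that walk projects, documents and annotations and count each meta annotation category found. It assumes meta_anns is a dict keyed by category name as shown above, and defensively also accepts a list of entries. An empty count, or a file without a top-level projects key, would be consistent with the empty options list in the error.

```python
import json
from collections import Counter
from typing import Any, Dict


def count_meta_ann_categories(export: Dict[str, Any]) -> Counter:
    """Count how many annotations carry each meta annotation category.

    Assumes the projects -> documents -> annotations -> meta_anns layout.
    """
    counts: Counter = Counter()
    for project in export.get("projects", []):
        for document in project.get("documents", []):
            for ann in document.get("annotations", []):
                meta_anns = ann.get("meta_anns", {})
                # Dict form (keyed by category name) as in the format above;
                # a plain list of entries is accepted defensively.
                entries = meta_anns.values() if isinstance(meta_anns, dict) else meta_anns
                for entry in entries:
                    counts[entry["name"]] += 1
    return counts


def load_and_count(path: str) -> Counter:
    """Load an export file and count its meta annotation categories."""
    with open(path, encoding="utf-8") as f:
        export = json.load(f)
    if "projects" not in export:
        # A different top-level structure will yield no categories at all.
        print("No top-level 'projects' key; keys found:", sorted(export))
    return count_meta_ann_categories(export)
```

For example, load_and_count(mctrainer_export_path) returning an empty Counter would point at the export rather than the model.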

With that said, if the list of available categories is empty, it’s possible that the model also doesn’t have any MetaCATs loaded. You can check that by printing cat._meta_cats.
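A raw print of cat._meta_cats can be hard to compare against an export, so a small helper that collects each MetaCAT's category name and class mapping may help. This is a sketch under assumptions: summarise_meta_cats is hypothetical, and the attribute names config.general.category_name and config.general.category_value2id are assumed from MedCAT 1.x MetaCAT configs rather than verified against 1.16.0; alternative_category_names is read defensively in case it is absent.

```python
from typing import Any, Dict, List


def summarise_meta_cats(meta_cats: List[Any]) -> List[Dict[str, Any]]:
    """Collect category name, alternative names and classes for each MetaCAT.

    Assumes each item exposes config.general.category_name and
    config.general.category_value2id (assumed MedCAT 1.x attribute names).
    """
    summary = []
    for meta_cat in meta_cats:
        general = meta_cat.config.general
        # getattr with a default tolerates configs without this attribute.
        alternatives = getattr(general, "alternative_category_names", None) or []
        summary.append({
            "category_name": general.category_name,
            "alternative_category_names": list(alternatives),
            "classes": dict(general.category_value2id),
        })
    return summary
```

With a loaded model pack this would be called as summarise_meta_cats(cat._meta_cats). The leading underscore marks _meta_cats as a private attribute, so it may change between versions. Comparing the returned names with the categories present in the export shows whether any of them needs an alternative_category_names entry.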

Hi Mart,

Thanks for the response.

Loading the model in the CogStack supervised training notebook (where training works) and inspecting cat._meta_cats gives:

    [
      {"Category Name": "Presence", "Description": "No description", "Classes": {"False": 2, "Hypothetical": 1, "True": 0}, "Model": "lstm"},
      {"Category Name": "Time", "Description": "No description", "Classes": {"Future": 0, "Past": 2, "Recent": 1}, "Model": "lstm"},
      {"Category Name": "Subject", "Description": "No description", "Classes": {"Other": 0, "Patient": 1}, "Model": "lstm"}
    ]

The issue has arisen when using the meta annotation training script.

The export was produced with MedCATtrainer v2.22.1.

Shubam is looking into the export, so I’ll leave this here for reference.

Thanks again.