Medcat trained models issues

Hi! Thank you for the interest and the questions.

  1. The UMLS big model doesn’t seem to have been trained on the diabetes concept. I don’t know why that would be (it is mentioned quite a lot in the MIMIC-III training data), but given it hasn’t received training, the result is not unexpected: the model doesn’t know the contexts in which to expect the concept.
    • You can check how much training a CUI has received via cat.cdb.cui2count_train[cui], though note that you may want to check whether the CUI is in the dict first, since it won’t be there if it has received no training (see the sketch after this list)
      • If you don’t know the CUI for a concept/name, you can find it from cat.cdb.name2cuis[name] - though do bear in mind that many names are ambiguous and may refer to multiple concepts
    • You can also use cat.cdb.name2count_train in a similar manner for the name
    • Here are 20 of the most-trained concepts for this model, along with their corresponding names and training counts:
      • C0392360 (rationale, indication, with~indication, indication~of~contextual~qualifier, indications, reasons, reason, justification) 1202508
      • C4288581 (notable, noted) 880647
      • C2826258 (subject~continuance, cont) 810844
      • C0205397 (seen) 669220
      • C4745084 (medical~condition) 500584
      • C0587081 (laboratory~report, lab~findings, interpretation~laboratory~test, laboratory~findings, interpretation~laboratory~tests, laboratory~test~observations, laboratory~test~interpretation, laboratory~test~finding, interpretation~of~laboratory~tests, test~result, laboratory~test~observation, labs, interpretation~of~laboratory~test, laboratory~finding, lab~finding, lab~result, laboratory~test~findings, laboratory~test~result) 468121
      • C0043084 (weanings, wean, weaning, ablactation, weaned) 398961
      • C0184666 (admitting, admission, admits, hospital~admissions, admissions, hospital~admission, admit~to~hospital, admissions~hospital, admit, admission~to~hospital, admission~hospital, admitted~hospital, hospitalization~admission, admitted) 388589
      • C1514756 (receiving, receive, received) 386376
      • C1533810 (placement, placed, placement~action, place) 379965
      • C5553941 (aper, specimen~appearance~assessment, specimen~appearance, appear, appearance) 359490
      • C1292718 (is~a, is~a~attribute) 357654
      • C2986914 (nonclinical~study~title, stitle) 356354
      • C0746591 (mitral) 341402
      • C0220825 (evaluation~procedure, evaluations, assessment, evaluate, evaluated, investigation, effectiveness~assessment, evaluation, efficacy~assessment) 334406
      • C2081612 (explanation~of~plan~:~medication, medication~:, plan~:~medication~treatment, plan~:~medication) 324319
      • C0699992 (lasix) 317946
      • C4698386 (intubated) 307567
      • C1707455 (compare, comparison, compared) 304762
      • C2317096 (spo2~saturation~of~peripheral~oxygen, peripheral~oxygen~saturation, spo2, saturation~of~peripheral~oxygen) 292541

    • As such, the following (nonsensical) sentence does work correctly for NER:
      • Hospital admissions have been going up due to lab funding going down
      • {'entities': {0: {'pretty_name': 'Hospital Environment', 'cui': 'C0019994', 'type_ids': ['T073', 'T093'], 'types': ['', ''], 'source_value': 'Hospital', 'detected_name': 'hospital', 'acc': 0.99, 'context_similarity': 0.99, 'start': 0, 'end': 8, 'icd10': [], 'ontologies': ['NCI', 'MEDLINEPLUS', 'SNOMEDCT_US', 'RCD', 'CHV', 'PSY', 'LCH', 'LNC', 'NCI_FDA', 'CSP', 'MTH', 'HL7V3.0', 'MSH', 'LCH_NW', 'NCI_CDISC', 'AOD', 'SNMI'], 'snomed': [], 'id': 0, 'meta_anns': {}}}, 'tokens': []}
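    • Putting the above together, here’s a minimal sketch (not from the MedCAT docs, just an illustration) of how you could inspect the training counts and re-run the NER check yourself. The model pack path is a placeholder, and C0011849 (Diabetes Mellitus) is used purely as an example CUI:
      from medcat.cat import CAT

      # Placeholder path - point this at your downloaded model pack
      cat = CAT.load_model_pack("umls_big_model_pack.zip")

      # How much training has a specific concept received?
      cui = "C0011849"  # Diabetes Mellitus, for illustration
      print(cat.cdb.cui2count_train.get(cui, 0))  # .get avoids a KeyError for untrained CUIs

      # Names are stored lower-cased with '~' joining the tokens, and one name can map to several CUIs
      name = "diabetes~mellitus"
      print(cat.cdb.name2cuis.get(name, []))
      print(cat.cdb.name2count_train.get(name, 0))

      # Roughly how the "most-trained concepts" listing above can be produced
      top = sorted(cat.cdb.cui2count_train.items(), key=lambda kv: kv[1], reverse=True)[:20]
      for top_cui, count in top:
          print(top_cui, cat.cdb.cui2names.get(top_cui, set()), count)

      # The NER check on the nonsensical sentence from the previous bullet
      text = "Hospital admissions have been going up due to lab funding going down"
      print(cat.get_entities(text)["entities"])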
  2. This is a known issue (e.g. MedCAT model for SNOMED-CT).
    • Older models initialised something as a set where a dict was expected
      • And newer versions catch this discrepancy
    • The current fix (for medcat 1.8.0+) is to run the following command (a short usage sketch follows after this list):
      • python -m medcat.utils.versioning fix-config <model_pack_path> <new_model_pack_path>
    • We have patched this in the current development branch but have yet to release it (probably soon in 1.10.0).
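    • As a usage sketch (the paths below are placeholders for your own model pack locations), you would run the command on the affected pack and then load the resulting pack as usual:
      # Placeholder paths - fix-config writes a repaired copy of the pack:
      #   python -m medcat.utils.versioning fix-config snomed_modelpack.zip snomed_modelpack_fixed.zip
      from medcat.cat import CAT

      # The fixed pack then loads like any other model pack
      cat = CAT.load_model_pack("snomed_modelpack_fixed.zip")
      print(cat.get_entities("Patient admitted with type 2 diabetes mellitus")["entities"])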