Medcat trained models issues

Hi! Thank you for the interest and the questions.

  1. The UMLS big model doesn’t seem to have been trained on the diabetes concept. I don’t know why that would be (it is mentioned quite a lot in the MIMIC-III training data), but given it hasn’t received training, the result is not unexpected: the model doesn’t know the contexts in which to expect the concept.
    • You can check how much training a CUI has received via cat.cdb.cui2count_train[cui], though note that you may want to check whether the CUI is in the dict first, since it won’t be there if it has received no training (see the sketch after this list)
      • If you don’t know the CUI for a concept/name, you can find it from cat.cdb.name2cuis[name] - though do bear in mind that many names are ambiguous and may refer to multiple concepts
    • You can also use cat.cdb.name2count_train in a similar manner for the name
    • Here are 20 of the most-trained concepts for this model, along with their corresponding names and training counts:
      • C0392360 (rationale, indication, with~indication, indication~of~contextual~qualifier, indications, reasons, reason, justification) 1202508
      • C4288581 (notable, noted) 880647
      • C2826258 (subject~continuance, cont) 810844
      • C0205397 (seen) 669220
      • C4745084 (medical~condition) 500584
      • C0587081 (laboratory~report, lab~findings, interpretation~laboratory~test, laboratory~findings, interpretation~laboratory~tests, laboratory~test~observations, laboratory~test~interpretation, laboratory~test~finding, interpretation~of~laboratory~tests, test~result, laboratory~test~observation, labs, interpretation~of~laboratory~test, laboratory~finding, lab~finding, lab~result, laboratory~test~findings, laboratory~test~result) 468121
      • C0043084 (weanings, wean, weaning, ablactation, weaned) 398961
      • C0184666 (admitting, admission, admits, hospital~admissions, admissions, hospital~admission, admit~to~hospital, admissions~hospital, admit, admission~to~hospital, admission~hospital, admitted~hospital, hospitalization~admission, admitted) 388589
      • C1514756 (receiving, receive, received) 386376
      • C1533810 (placement, placed, placement~action, place) 379965
      • C5553941 (aper, specimen~appearance~assessment, specimen~appearance, appear, appearance) 359490
      • C1292718 (is~a, is~a~attribute) 357654
      • C2986914 (nonclinical~study~title, stitle) 356354
      • C0746591 (mitral) 341402
      • C0220825 (evaluation~procedure, evaluations, assessment, evaluate, evaluated, investigation, effectiveness~assessment, evaluation, efficacy~assessment) 334406
      • C2081612 (explanation~of~plan~:~medication, medication~:, plan~:~medication~treatment, plan~:~medication) 324319
      • C0699992 (lasix) 317946
      • C4698386 (intubated) 307567
      • C1707455 (compare, comparison, compared) 304762
      • C2317096 (spo2~saturation~of~peripheral~oxygen, peripheral~oxygen~saturation, spo2, saturation~of~peripheral~oxygen) 292541

    • As such, the following (nonsensical) sentence does work correctly for NER:
      • Hospital admissions have been going up due to lab funding going down
      • {'entities': {0: {'pretty_name': 'Hospital Environment', 'cui': 'C0019994', 'type_ids': ['T073', 'T093'], 'types': ['', ''], 'source_value': 'Hospital', 'detected_name': 'hospital', 'acc': 0.99, 'context_similarity': 0.99, 'start': 0, 'end': 8, 'icd10': [], 'ontologies': ['NCI', 'MEDLINEPLUS', 'SNOMEDCT_US', 'RCD', 'CHV', 'PSY', 'LCH', 'LNC', 'NCI_FDA', 'CSP', 'MTH', 'HL7V3.0', 'MSH', 'LCH_NW', 'NCI_CDISC', 'AOD', 'SNMI'], 'snomed': [], 'id': 0, 'meta_anns': {}}}, 'tokens': []}
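    • Putting the above together, here’s a minimal sketch (not from the MedCAT docs, just an illustration) of how you could inspect the training counts and re-run the NER check yourself. The model pack path is a placeholder, and C0011849 (Diabetes Mellitus) is used purely as an example CUI:
      from medcat.cat import CAT

      # Placeholder path - point this at your downloaded model pack
      cat = CAT.load_model_pack("umls_big_model_pack.zip")

      # How much training has a specific concept received?
      cui = "C0011849"  # Diabetes Mellitus, for illustration
      print(cat.cdb.cui2count_train.get(cui, 0))  # .get avoids a KeyError for untrained CUIs

      # Names are stored lower-cased with '~' joining the tokens, and one name can map to several CUIs
      name = "diabetes~mellitus"
      print(cat.cdb.name2cuis.get(name, []))
      print(cat.cdb.name2count_train.get(name, 0))

      # Roughly how the "most-trained concepts" listing above can be produced
      top = sorted(cat.cdb.cui2count_train.items(), key=lambda kv: kv[1], reverse=True)[:20]
      for top_cui, count in top:
          print(top_cui, cat.cdb.cui2names.get(top_cui, set()), count)

      # The NER check on the nonsensical sentence from the previous bullet
      text = "Hospital admissions have been going up due to lab funding going down"
      print(cat.get_entities(text)["entities"])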
  2. This is a known issue (e.g. MedCAT model for SNOMED-CT).
    • Older models initialised something as a set where a dict was expected
      • And newer versions catch this discrepancy
    • The current fix (for medcat 1.8.0+) is to run the following command (a short usage sketch follows after this list):
      • python -m medcat.utils.versioning fix-config <model_pack_path> <new_model_pack_path>
    • We have patched this in the current development branch but have yet to release it (probably soon in 1.10.0).
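    • As a usage sketch (the paths below are placeholders for your own model pack locations), you would run the command on the affected pack and then load the resulting pack as usual:
      # Placeholder paths - fix-config writes a repaired copy of the pack:
      #   python -m medcat.utils.versioning fix-config snomed_modelpack.zip snomed_modelpack_fixed.zip
      from medcat.cat import CAT

      # The fixed pack then loads like any other model pack
      cat = CAT.load_model_pack("snomed_modelpack_fixed.zip")
      print(cat.get_entities("Patient admitted with type 2 diabetes mellitus")["entities"])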