How to map SNOMED IDs to UMLS Semantic Types

I am currently using MedCAT with the SNOMED International (Full SNOMED modelpack trained on MIMIC-III) model. The extracted entities are of the following form:

'pretty_name': 'Diverticulitis',
  'cui': '18126004',
  'type_ids': ['33782986'],
  'types': ['morphologic abnormality'],
  'source_value': 'Diverticulitis',
  'detected_name': 'diverticulitis',
  'acc': 0.99,
  'context_similarity': 0.99,
  'start': 0,
  'end': 14,
  'icd10': [],
  'ontologies': ['SNOMED-CT'],
  'snomed': [],
  'id': 0,

Is there a way to get the UMLS Semantic Types (in this case T047) from the SNOMED IDs (in this case 18126004)? I could look each one up on BioPortal (such as https://bioportal.bioontology.org/ontologies/SNOMEDCT/?p=classes&lang=en&conceptid=http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FSNOMEDCT%2F18126004&jump_to_nav=true for this case), but I want an automated way of doing so. Is there an existing mapping for this task?

The code below shows how I got the entities:

from medcat.cat import CAT
from medcat.utils.preprocess_umls import UMLS

cat = CAT.load_model_pack("mc_modelpack_snomed_int_16_mar_2022_25be3857ba34bdd5.zip")
text = "Diverticulitis, sigmoid colon, with colo-ovarian fistula formation and left ovarian abscess. Mesothelium-lined cyst in parovarian adhesions"

entities_data = cat.get_entities(text, only_cui=False, addl_info=["cui2icd10", "cui2ontologies", "cui2snomed"])
entities = entities_data["entities"]

I am not using the UMLS Full model because it cannot even extract Diverticulitis in my case.
My ultimate goal is to extract any entities of type Disease or Syndrome (UMLS Semantic Types T047 or T050). If I am overcomplicating this, please tell me the right way to do it. Thank you.

Hi!

I’ve looked into this a little bit to figure out why this may be happening.

First of all, I looked at the UMLS model you’re using to figure out why the term wasn’t being picked up.

It turns out the reason is the following:

  1. The term hasn’t seen any training
  2. The term is marked in the CDB as a not preferred name that needs to be disambiguated
    • If it had been a preferred name, it’d have been linked regardless

And the reason it hasn’t been trained is also that the name is marked as not preferred and would thus need to be disambiguated.

Now, you can mark this name/concept pair to be preferred if you like (cat.cdb.name2cuis2status['diverticulitis']['C0012813'] = 'P') but that is not a scalable solution in general.
You can also do this in general for all names that represent only one concept, i.e.:

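def fix_single_cui_names(cdb, old_val='N', new_val='P'):
    """Flip the status of every name that maps to exactly one CUI."""
    changes = 0  # number of statuses actually changed
    for cui2status in cdb.name2cuis2status.values():
        if len(cui2status) == 1:
            # the inner dict maps CUI -> status, so the key is a CUI
            (cui, val), = cui2status.items()
            if val == old_val:
                cui2status[cui] = new_val
                changes += 1
    return changes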

NOTE: This would change the status of 1 026 503 names (out of 3 080 845) from not preferred to preferred in the UMLS model.
However, this is a bad idea, at least in general.
This will force the model to assume that the context is correct for every name that refers to a single concept, even when it’s not actually sure. For instance, the UMLS term C0325112 has the names {'jaguar', 'panthera~onca', 'jaguars'}, and the name jaguar can only refer to the concept C0325112. If the name’s status is changed to preferred (P), the model will confidently identify this word in the sentence “My friend bought a Jaguar for his 50th birthday” as the UMLS term C0325112, i.e. the organism. And that’s simply not correct.
Now, this is a pretty nonsensical example, but I’ve just brought this up to illuminate why this might not be the best thing to do.
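If you are still tempted by a bulk change like this, it may help to look at which name/concept pairs it would touch before modifying anything. Below is a minimal sketch of such a dry run. The function name iter_single_cui_names is made up for illustration, and it works on a plain nested dict of the shape name -> {cui: status}, which is what I am assuming cat.cdb.name2cuis2status looks like. The toy dict is illustrative only (the second and third entries are invented).

```python
from itertools import islice


def iter_single_cui_names(name2cuis2status, old_val="N"):
    """Yield (name, cui) pairs a bulk status flip would affect.

    Read-only: nothing in the mapping is modified.
    """
    for name, cui2status in name2cuis2status.items():
        if len(cui2status) == 1:
            # exactly one CUI for this name: unpack the single (cui, status) pair
            (cui, status), = cui2status.items()
            if status == old_val:
                yield name, cui


# Toy stand-in for cat.cdb.name2cuis2status (hypothetical contents).
toy_name2cuis2status = {
    "jaguar": {"C0325112": "N"},                     # single CUI, not preferred
    "toy~name": {"C0000001": "P"},                   # single CUI, already preferred
    "ambiguous": {"C0000002": "N", "C0000003": "N"}, # two CUIs, left alone
}

# Inspect only the first few candidates instead of changing a million names blindly.
sample = list(islice(iter_single_cui_names(toy_name2cuis2status), 10))
```

Scanning a sample like this for everyday words (such as jaguar) gives a feel for how many false positives the change might introduce.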

The other (also bad!) option is to set cat.config.linking.similarity_threshold = -1.1. That allows any matching concept to be linked even when the current context is perfectly misaligned with (i.e. has the opposite meaning to) that of the trained concept, or when there’s no learned knowledge on a term that needs to be disambiguated (in which case the similarity is set to -1).
This would, somewhat obviously, be a terrible change, since the model would link anything that matches a name of a term in the CDB, with no regard to context. And using context is exactly what the vector context model based approach is supposed to be good at!

Now, if you do want to use the UMLS model for this purpose, you could always gather some manual annotations and perform supervised training on the concepts that the model requires help with (i.e. ones that it hasn’t seen). That way the concept would gain a context vector and the model would be able to disambiguate the term and make sure to only link it if/when appropriate.

The second thing is the mapping from the SNOMED term to the UMLS semantic type.
As far as I know, such mappings do not exist natively.
You could always try to load the raw UMLS using the UMLS class you’ve already imported. Then you can use UMLS.map_umls2snomed and parse the resulting DataFrame to figure out which SNOMED concepts map to which UMLS CUIs. After that, you should be able to extract the corresponding type IDs of those CUIs from the output of UMLS.to_concept_df and/or the original UMLS model.
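As a rough sketch of the same SNOMED -> CUI -> semantic type join without going through MedCAT at all: assuming you have the raw UMLS release files MRCONSO.RRF and MRSTY.RRF (pipe-delimited), something like the following could build the lookup and filter your entities. All function names here are made up for illustration. The source abbreviation SNOMEDCT_US is an assumption about which SNOMED edition your UMLS release contains, and I am also assuming that the "entities" value returned by cat.get_entities is a dict of entity dicts, each with a 'cui' key holding the SNOMED ID.

```python
from collections import defaultdict

# Column positions in the pipe-delimited UMLS release files.
MRCONSO_CUI, MRCONSO_SAB, MRCONSO_CODE = 0, 11, 13
MRSTY_CUI, MRSTY_TUI = 0, 1


def build_snomed2cuis(mrconso_lines, sab="SNOMEDCT_US"):
    """Map each SNOMED code to the set of UMLS CUIs it is listed under."""
    snomed2cuis = defaultdict(set)
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        # keep only rows contributed by the chosen SNOMED source
        if len(fields) > MRCONSO_CODE and fields[MRCONSO_SAB] == sab:
            snomed2cuis[fields[MRCONSO_CODE]].add(fields[MRCONSO_CUI])
    return snomed2cuis


def build_cui2tuis(mrsty_lines):
    """Map each UMLS CUI to its set of semantic type IDs (TUIs)."""
    cui2tuis = defaultdict(set)
    for line in mrsty_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > MRSTY_TUI:
            cui2tuis[fields[MRSTY_CUI]].add(fields[MRSTY_TUI])
    return cui2tuis


def build_snomed2tuis(snomed2cuis, cui2tuis):
    """Join the two lookups: SNOMED code -> union of TUIs over its CUIs."""
    return {
        code: set().union(*(cui2tuis.get(cui, set()) for cui in cuis))
        for code, cuis in snomed2cuis.items()
    }


def keep_entities_with_tuis(entities, snomed2tuis,
                            wanted=frozenset({"T047", "T050"})):
    """Keep entities whose SNOMED ID maps to at least one wanted TUI.

    The default TUIs are the ones named in the question.
    """
    return {
        key: ent for key, ent in entities.items()
        if snomed2tuis.get(ent["cui"], set()) & wanted
    }


# Usage with real files (paths are placeholders):
# with open("MRCONSO.RRF", encoding="utf-8") as f:
#     snomed2cuis = build_snomed2cuis(f)
# with open("MRSTY.RRF", encoding="utf-8") as f:
#     cui2tuis = build_cui2tuis(f)
# snomed2tuis = build_snomed2tuis(snomed2cuis, cui2tuis)
# diseases = keep_entities_with_tuis(entities, snomed2tuis)
```

Note that a single SNOMED code can map to several CUIs, and a CUI can carry several semantic types, which is why the lookup stores sets rather than single values.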

Hope this helps.