Using type IDs with the snomedct model


I am trying to run a set of sentences through the SNOMED CT MedCAT model to get a list of SCTIDs, filtered by type ID.

I am following the example “Annotating documents with the full medCAT pipeline” (viewed through GitHub & BitBucket HTML Preview).

Instead of the model in the example (“”), I am using “mc_modelpack_snomed_int_16_mar_2022…zip”.

To filter by type ID, I am using a TUI for clinical finding (T-02000 Clinical finding (finding)), taken from SNOMED-CT_Analysis/Exploring a SNOMED-CT Release.ipynb at master · tomolopolis/SNOMED-CT_Analysis · GitHub, instead of the ones used in the example (such as T047, T048). This doesn’t work: I get a KeyError for the TUI I used. I suspect the TUI is not recognised as a type_id in the way the UMLS ones in the example are, but I am unsure how to proceed so that I can get SCTIDs based on a type ID. Any suggestions, or has anyone done something similar with the SNOMED CT model?

medcat 1.5.0
python 3.10.5




I’m not fully familiar with what the T-02000 refers to or whether/where it is stored.

But the type_id field of the SNOMED-CT CDB is written here:

The description is simply gathered from the parentheses at the end of the name:

I am not sure whether/where there would be a list of what the type IDs correspond to. But if you find a concept with the correct type-name in the parentheses then you should be able to use that one.
You may have to look into addl_info['cui2original_names'] to find the original names with the brackets.
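As a rough sketch of what “gathered from the parentheses” could look like in practice, the snippet below pulls the trailing bracketed tag out of a concept name. The helper name semantic_tag and the sample dictionary are made up for illustration, and I am assuming addl_info['cui2original_names'] maps each CUI to a collection of name strings; check that against your own CDB.

```python
import re

# Matches a parenthesised tag at the very end of a name, e.g. "(finding)".
_TAG_RE = re.compile(r"\(([^()]+)\)\s*$")


def semantic_tag(name):
    """Return the text inside the final parentheses of a name, or None."""
    match = _TAG_RE.search(name)
    return match.group(1).strip() if match else None


# Illustrative sample only, in the ASSUMED shape of
# cat.cdb.addl_info['cui2original_names'] (CUI -> set of names).
cui2original_names = {
    "404684003": {"Clinical finding (finding)", "Clinical finding"},
}

# Collect the tags seen for each CUI, skipping names without brackets.
tags = {
    cui: {semantic_tag(n) for n in names} - {None}
    for cui, names in cui2original_names.items()
}
```

With a real model pack you would pass in the dictionary from the CDB instead of the sample one.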

Here is a subset of SNOMED TUIs and their possible names that I had saved from something I ran locally (I looked through addl_info['cui2original_names'] for them, but didn’t check too thoroughly):


Thanks so much for this! The subset of type_ids that you shared at the end is exactly what I needed; I just didn’t know where to find them. It’s good to know for the future. Really appreciate your help!


Hi there,

This is so useful! I’m still getting to grips with coding and things. May I ask how you generated this list? Did you reverse the hash function of the Semantic Tags?


Unfortunately I didn’t do anything that exhaustive.

I just had a bunch of annotated data and ran through the CUIs that were annotated, extracting the type from the brackets in the names. There were sometimes multiple names with bracketed parts, though, so it wasn’t entirely straightforward.
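Something along these lines would reproduce that kind of list. This is only a sketch, not the exact script I ran: guess_type_names is a hypothetical helper, the type IDs "tid_A" and "tid_B" are placeholders rather than real SNOMED type IDs, and both input dictionaries are assumed to have the shapes shown (CUI -> set of type IDs, CUI -> set of names). When a type ID sees several different bracketed tags, the most frequent one wins.

```python
import re
from collections import Counter, defaultdict

TAG_RE = re.compile(r"\(([^()]+)\)\s*$")  # trailing "(tag)" in a name


def guess_type_names(cui2type_ids, cui2original_names):
    """Map each type ID to the bracketed tag most often seen on its CUIs."""
    votes = defaultdict(Counter)
    for cui, type_ids in cui2type_ids.items():
        for name in cui2original_names.get(cui, ()):
            match = TAG_RE.search(name)
            if match is None:
                continue  # name without a bracketed tag
            for type_id in type_ids:
                votes[type_id][match.group(1).strip()] += 1
    return {tid: counts.most_common(1)[0][0] for tid, counts in votes.items()}


# Illustrative sample data; the type IDs are placeholders.
cui2type_ids = {
    "404684003": {"tid_A"},
    "22298006": {"tid_B"},
    "73211009": {"tid_B"},
}
cui2original_names = {
    "404684003": {"Clinical finding (finding)", "Clinical finding"},
    "22298006": {"Myocardial infarction (disorder)"},
    "73211009": {"Diabetes mellitus (disorder)", "Diabetes mellitus"},
}

type_names = guess_type_names(cui2type_ids, cui2original_names)
```

Ties between equally common tags are broken arbitrarily here, so it is worth eyeballing the result rather than trusting it blindly.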


Dear @zeljko, can you help with this?


Hi @Hideaki,

I’m not sure which CDB you are using, but most versions have the field cat.cdb.addl_info['type_id2name'], which is a map from TUI (or type_id) to the name. Unfortunately, not all CDBs have it, as it was not standardised. If your CDB does not have this field, please post the CDB name here and I can try to find the mapping.
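For the original filtering question, a minimal sketch of going from a type name to a set of CUIs might look like this. The function cuis_for_type_name and the sample dictionaries are hypothetical, with placeholder type IDs; with a real model you would pass in cat.cdb.addl_info['type_id2name'] and cat.cdb.addl_info['type_id2cuis'] (the latter is the map the tutorial’s filter example reads from), assuming your CDB has both.

```python
def cuis_for_type_name(type_name, type_id2name, type_id2cuis):
    """Collect CUIs for every type ID whose name matches type_name.

    Uses .get() so a type ID missing from type_id2cuis yields no CUIs
    instead of raising a KeyError.
    """
    wanted = type_name.strip().lower()
    cuis = set()
    for type_id, name in type_id2name.items():
        if name.strip().lower() == wanted:
            cuis.update(type_id2cuis.get(type_id, ()))
    return cuis


# Illustrative sample data; "tid_A" / "tid_B" are placeholder type IDs.
type_id2name = {"tid_A": "finding", "tid_B": "disorder"}
type_id2cuis = {
    "tid_A": {"404684003"},
    "tid_B": {"22298006", "73211009"},
}

cui_filter = cuis_for_type_name("finding", type_id2name, type_id2cuis)

# With a loaded model, the tutorial then assigns the set to the linking filter:
# cat.cdb.config.linking['filters']['cuis'] = cui_filter
```

Looking up the type ID by its name first avoids guessing identifiers such as T-02000 that the CDB may not use as keys.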


Thank you, @zeljko. I used cat.cdb.addl_info['type_id2name'] for my SNOMED-CT CDB. It gave me a dictionary, which I used as a lookup to populate my dataframe of CUIs and the percentage of documents in which each CUI is mentioned. I hope I used it correctly.