Using type IDs with the snomedct model

Hello,

I am trying to run a set of sentences through a medcat model to get a list of SCTIDs from the snomed-ct medcat model, based on type IDs.

I am following the tutorial “Annotating documents with the full MedCAT pipeline” (linked through GitHub & BitBucket HTML Preview).

Instead of the model in the example (“medmen_wstatus_2021_oct.zip”), I am using “mc_modelpack_snomed_int_16_mar_2022…zip”.

To filter by type ID, I am using a TUI for clinical finding, T-02000 Clinical finding (finding), taken from SNOMED-CT_Analysis/Exploring a SNOMED-CT Release.ipynb (at master · tomolopolis/SNOMED-CT_Analysis · GitHub), instead of the ones used in the example (such as T047, T048). However, this doesn’t work: I get a KeyError for the TUI I used. I suspect the TUI is not recognised as a type_id in the way the UMLS ones in the example are, but I am unsure how to proceed so that I can get SCTIDs based on a type ID. Any suggestions, or has anyone done something similar with the SNOMED CT model?

versions:
medcat 1.5.0
python 3.10.5

Thanks!

Jaya

Hi,

I’m not fully familiar with what the T-02000 refers to or whether/where it is stored.

But the type_id field of the SNOMED-CT CDB is written here:

The description is simply gathered from the parenthesis of the name:

I am not sure whether/where there would be a list of what the type IDs correspond to. But if you find a concept with the correct type-name in the parentheses then you should be able to use that one.
You may have to look into addl_info['cui2original_names'] to find the original names with the brackets.
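
A minimal sketch of pulling the type name out of the brackets, assuming addl_info['cui2original_names'] maps each CUI to a collection of name strings. The dictionary below is a small hand-written stand-in for that field, and the helper names (semantic_tag, tags_for_cui) are made up for illustration:

```python
import re
from collections import Counter

# Matches a trailing "(tag)" at the very end of a name,
# e.g. "Fever (finding)" -> "finding".
TAG_RE = re.compile(r"\(([^()]+)\)\s*$")


def semantic_tag(name):
    """Return the bracketed type name at the end of `name`, or None."""
    match = TAG_RE.search(name)
    return match.group(1).strip() if match else None


def tags_for_cui(names):
    """Count the bracketed type names found across all names of one concept."""
    return Counter(tag for tag in map(semantic_tag, names) if tag)


# Stand-in for cat.cdb.addl_info['cui2original_names'] (assumed shape:
# CUI -> set of original names). Only a few concepts, for illustration.
cui2original_names = {
    "22298006": {"Myocardial infarction (disorder)", "Heart attack"},
    "404684003": {"Clinical finding (finding)", "Clinical finding"},
    "386661006": {"Fever (finding)", "Fever"},
}

# Keep the most frequent bracketed tag per concept; concepts without
# any bracketed name are skipped.
cui2tag = {}
for cui, names in cui2original_names.items():
    counts = tags_for_cui(names)
    if counts:
        cui2tag[cui] = counts.most_common(1)[0][0]

print(cui2tag)
```

With a loaded model pack, the stand-in dictionary would be replaced by the real addl_info field, provided the CDB has it.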

PS:
Here is a subset of SNOMED TUIs and their possible names, saved from something I ran locally (I looked through addl_info['cui2original_names'] for them, but didn’t check too thoroughly):

Thanks so much for this! The subset of type_ids that you’ve shared at the end is exactly what I needed; I just didn’t know where to find them. So it’s good to know for the future. Really appreciate your help!

Hi there,

This is so useful! I’m still getting to grips with coding and things. May I ask how you generated this list? Did you reverse the hash function of the Semantic Tags?

-Hideaki

Unfortunately I didn’t do anything that exhaustive.

I just had a bunch of annotated data and ran through the CUIs that were annotated, simply extracting the type from the brackets in the names. There were sometimes multiple names with bracketed parts, though, so it wasn’t entirely straightforward.
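
Roughly, that approach could look like the sketch below. It is not the exact script that was run. It assumes a CUI-to-type-IDs mapping (cdb.cui2type_ids in MedCAT 1.x) and the original names from addl_info['cui2original_names']. The CUIs, type IDs and names are hypothetical placeholders, and the multiple-bracket problem is handled here by a simple majority vote per type ID:

```python
import re
from collections import Counter, defaultdict

# Trailing "(tag)" at the end of a name.
TAG_RE = re.compile(r"\(([^()]+)\)\s*$")


def build_type_id2name(annotated_cuis, cui2type_ids, cui2original_names):
    """Guess a name for each type ID from the bracketed parts of concept names.

    Every bracketed tag found on a concept's names counts as one vote for
    each of that concept's type IDs; the most common tag wins.
    """
    votes = defaultdict(Counter)
    for cui in annotated_cuis:
        tags = []
        for name in cui2original_names.get(cui, ()):
            match = TAG_RE.search(name)
            if match:
                tags.append(match.group(1).strip())
        for type_id in cui2type_ids.get(cui, ()):
            votes[type_id].update(tags)
    return {tid: counts.most_common(1)[0][0]
            for tid, counts in votes.items() if counts}


# Hypothetical placeholders standing in for real model data.
annotated_cuis = ["C1", "C2", "C3"]
cui2type_ids = {"C1": {"TID_A"}, "C2": {"TID_A"}, "C3": {"TID_B"}}
cui2original_names = {
    "C1": {"Foo (disorder)", "Foo"},
    "C2": {"Bar (disorder)", "Bar (finding)"},  # two bracketed names
    "C3": {"Baz (procedure)"},
}

type_id2name = build_type_id2name(annotated_cuis, cui2type_ids, cui2original_names)
print(type_id2name)
```

A majority vote is only a heuristic: a type ID seen on very few annotated concepts can easily end up with the wrong name, so the result is worth checking by hand.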

Dear @zeljko, can you help with this?

Hi @Hideaki,

I’m not sure which CDB you are using, but most versions have the field cat.cdb.addl_info['type_id2name'], which is a map from TUI (or type_id) to its name. Unfortunately, not all CDBs have it, as we had not standardised it. If your CDB does not have this field, please post the CDB name here and I can try to find the mapping.
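
As a sketch of how that map could be used for the filtering asked about at the top of the thread: invert type_id2name to find the type ID(s) for a given name, then collect the CUIs for those IDs. The dictionaries below are hypothetical stand-ins for addl_info['type_id2name'] and addl_info['type_id2cuis'], with placeholder IDs and CUIs, and the helper names are made up for illustration:

```python
def type_ids_for_name(type_id2name, wanted):
    """Return all type IDs whose name equals `wanted` (case-insensitive)."""
    wanted = wanted.strip().lower()
    return sorted(tid for tid, name in type_id2name.items()
                  if name.strip().lower() == wanted)


def cuis_for_type_ids(type_id2cuis, type_ids):
    """Union of the CUIs of the given type IDs.

    Unknown IDs are skipped rather than raising a KeyError, which is what
    happens when indexing the dict directly with an ID the CDB lacks.
    """
    cuis = set()
    for tid in type_ids:
        cuis.update(type_id2cuis.get(tid, ()))
    return cuis


# Hypothetical stand-ins; with a loaded model these would come from
# cat.cdb.addl_info['type_id2name'] and cat.cdb.addl_info['type_id2cuis'].
type_id2name = {"TID_A": "disorder", "TID_B": "finding", "TID_C": "procedure"}
type_id2cuis = {"TID_A": {"C1", "C2"}, "TID_B": {"C3"}, "TID_C": {"C4"}}

finding_ids = type_ids_for_name(type_id2name, "Finding")
cui_filter = cuis_for_type_ids(type_id2cuis, finding_ids)
print(finding_ids, cui_filter)

# With a loaded model, the tutorial-style step would then be roughly
#   cat.cdb.config.linking['filters']['cuis'] = cui_filter
# (an assumption based on the tutorial; check it against your MedCAT version).
```

Looking IDs up by name avoids hard-coding an identifier such as T-02000 that the CDB may not know about.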

Thank you, @zeljko. I used cat.cdb.addl_info['type_id2name'] for my SNOMED-CT CDB. This gave me a dictionary, which I used as a lookup to populate my dataframe of CUIs and the percentage of documents in which each CUI is mentioned. I hope I used it correctly.
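
For anyone reading later, a lookup of this kind might look like the sketch below. It uses plain Python rows rather than a dataframe, and all inputs are hypothetical placeholders; going from a CUI to its type name is assumed to need a CUI-to-type-IDs mapping as well as type_id2name:

```python
from collections import Counter


def build_rows(docs, cui2type_ids, type_id2name):
    """One row per CUI: percentage of documents mentioning it, plus type names.

    `docs` is a list with one set of CUIs per document.
    """
    if not docs:
        return []
    doc_counts = Counter(cui for doc in docs for cui in doc)
    rows = []
    for cui, n_docs in sorted(doc_counts.items()):
        names = sorted(type_id2name.get(tid, "unknown")
                       for tid in cui2type_ids.get(cui, ()))
        rows.append({
            "cui": cui,
            "pct_docs": 100.0 * n_docs / len(docs),
            "type_names": names,
        })
    return rows


# Hypothetical placeholders standing in for real annotations and CDB fields.
type_id2name = {"TID_A": "disorder", "TID_B": "finding"}
cui2type_ids = {"C1": {"TID_A"}, "C2": {"TID_B"}}
docs = [{"C1", "C2"}, {"C1"}, set(), {"C1"}]

rows = build_rows(docs, cui2type_ids, type_id2name)
for row in rows:
    print(row)
```

The resulting list of dictionaries can be passed straight to a dataframe constructor if a table is needed.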