Using type IDs with the snomedct model

Jaya · February 23, 2023, 2:40pm

Hello,

I am trying to run a set of sentences through a medcat model to get a list of SCTIDs from the snomed-ct medcat model, based on type IDs.

I am following the example in the tutorial "Annotating documents with the full MedCAT pipeline" (viewed through GitHub & BitBucket HTML Preview).

Instead of the model in the example (“medmen_wstatus_2021_oct.zip”), I am using “mc_modelpack_snomed_int_16_mar_2022…zip”.

To filter by type ID, I am using a TUI for clinical finding, T-02000 Clinical finding (finding), taken from SNOMED-CT_Analysis/Exploring a SNOMED-CT Release.ipynb (tomolopolis/SNOMED-CT_Analysis on GitHub), instead of the ones used in the example (such as T047, T048). But this doesn't work: I get a KeyError for the TUI used. I suspect the TUI is not recognised as a type_id in the way the UMLS ones in the example are, but I am unsure how to proceed so that I can get SCTIDs based on a type ID. Any suggestions, or has anyone done something similar with the snomed-ct model?

versions:
medcat 1.5.0
python 3.10.5

Thanks!

Jaya

mart.ratas · February 23, 2023, 3:37pm

Hi,

I’m not fully familiar with what the T-02000 refers to or whether/where it is stored.

But the type_id field of the SNOMED-CT CDB is written here:

github.com

CogStack/MedCAT/blob/master/medcat/utils/preprocess_snomed.py#L146


      
          
          
        temp_df = active_snomed_df[active_snomed_df['name_status'] == 'P'][[
            'cui', 'name']]
        temp_df['description_type_ids'] = temp_df['name'].str.extract(
            r"\((\w+\s?.?\s?\w+.?\w+.?\w+.?)\)$")
        active_snomed_df = pd.merge(active_snomed_df, temp_df.loc[:, ['cui', 'description_type_ids']],
                                    on='cui',
                                    how='left')
        del temp_df

        # Hash semantic tag to get a 8 digit type_id code
        active_snomed_df['type_ids'] = active_snomed_df['description_type_ids'].apply(
            lambda x: int(hashlib.sha256(str(x).encode('utf-8')).hexdigest(), 16) % 10 ** 8)
        df2merge.append(active_snomed_df)

    return pd.concat(df2merge).reset_index(drop=True)

The description is simply gathered from the parenthesis of the name:

github.com

CogStack/MedCAT/blob/master/medcat/utils/preprocess_snomed.py#L139


      
    active_snomed_df = active_snomed_df.rename(
        columns={'id_x': 'cui', 'term': 'name', 'typeId': 'name_status'})
    active_snomed_df['ontologies'] = 'SNOMED-CT'
    active_snomed_df['name_status'] = active_snomed_df['name_status'].replace(
        ['900000000000003001', '900000000000013009'],
        ['P', 'A'])
    active_snomed_df = active_snomed_df.reset_index(drop=True)

    temp_df = active_snomed_df[active_snomed_df['name_status'] == 'P'][[
        'cui', 'name']]
    temp_df['description_type_ids'] = temp_df['name'].str.extract(
        r"\((\w+\s?.?\s?\w+.?\w+.?\w+.?)\)$")
    active_snomed_df = pd.merge(active_snomed_df, temp_df.loc[:, ['cui', 'description_type_ids']],
                                on='cui',
                                how='left')
    del temp_df

    # Hash semantic tag to get a 8 digit type_id code
    active_snomed_df['type_ids'] = active_snomed_df['description_type_ids'].apply(
        lambda x: int(hashlib.sha256(str(x).encode('utf-8')).hexdigest(), 16) % 10 ** 8)
    df2merge.append(active_snomed_df)

I am not sure whether/where there would be a list of what the type IDs correspond to. But if you find a concept with the correct type-name in the parentheses then you should be able to use that one.
You may have to look into addl_info['cui2original_names'] to find the original names with the brackets.
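For anyone who wants to go from a semantic tag to its type ID directly, the two quoted steps can be reproduced outside MedCAT. This is only a minimal sketch: the regular expression and the hash expression are copied from the snippet above, while the helper names `semantic_tag` and `type_id_for_tag` are made up for illustration. Whether a given model pack stores the resulting IDs as strings or integers is something to check against your own CDB.

```python
import hashlib
import re

# Same pattern as in the quoted preprocess_snomed.py snippet.
TAG_PATTERN = re.compile(r"\((\w+\s?.?\s?\w+.?\w+.?\w+.?)\)$")


def semantic_tag(preferred_name):
    """Return the trailing bracketed tag of a SNOMED name, or None if absent."""
    match = TAG_PATTERN.search(preferred_name)
    return match.group(1) if match else None


def type_id_for_tag(tag):
    """Hash a semantic tag to an integer below 10**8, as the quoted code does."""
    digest = hashlib.sha256(str(tag).encode("utf-8")).hexdigest()
    return int(digest, 16) % 10 ** 8


# Hypothetical usage with the name from the original question.
tag = semantic_tag("Clinical finding (finding)")
print(tag, type_id_for_tag(tag))
```

Because the value is taken modulo 10 ** 8, some IDs come out shorter than 8 digits, so there is no need to reverse the hash: hashing the tag you are interested in and comparing it with the IDs in your model should be enough.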

PS:
Here is a subset of SNOMED TUIs and their possible names, which I saved from something I ran locally. I looked through addl_info['cui2original_names'] for them, but didn't check too thoroughly:

gist.github.com

https://gist.github.com/mart-r/02af4f79f10b56492d21ecda746c7597

A subset of type_ids for the MedCAT 1.2 model based on SNOMED-CT

81102976  : organism  
91187746  : substance 
28321150  : procedure 
37552161  : body structure
9090192   : disorder  
67667581  : finding   
7882689   : qualifier value
33782986  : morphologic abnormality
32816260  : physical object
91776366  : product

This file has been truncated.

Jaya · February 24, 2023, 10:39am

Thanks so much for this! The subset of type_ids that you've shared at the end is exactly what I needed, but I didn't know where to find them, so it's good to know for the future. Really appreciate your help!

Hideaki · April 5, 2023, 10:29am

Hi there,

This is so useful! I’m still getting to grips with coding and things. May I ask how you generated this list? Did you reverse the hash function of the Semantic Tags?

-Hideaki

mart.ratas · April 5, 2023, 10:43am

Unfortunately I didn’t do anything that exhaustive.

I just had a bunch of annotated data and ran through the CUIs that were annotated, and I simply extracted the type from the brackets in the names. There were sometimes multiple names with bracketed parts, though, so it wasn't too straightforward.

Hideaki · April 7, 2023, 1:10pm

Dear @zeljko, can you help with this?

zeljko · April 7, 2023, 1:59pm

Hi @Hideaki,

I'm not sure which CDB you are using. Most versions have the following field: cat.cdb.addl_info['type_id2name'], which is a map from TUI (or type_id) to the name. Unfortunately not all CDBs have this, as we did not have it standardised. If your CDB does not have this field, please post the CDB name here and I can try to find the mapping.
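As a rough sketch of how that map could be used for filtering: look up the type IDs whose name matches, then collect their CUIs. The `type_id2name` key is the one mentioned above; the `type_id2cuis` key is an assumption based on the pattern in the tutorial from the original question and may be missing from some CDBs. The helper name and the toy dictionary are hypothetical, so the code runs without a model loaded.

```python
def cuis_for_type_names(addl_info, wanted_names):
    """Collect CUIs for every type ID whose name is in wanted_names."""
    type_id2name = addl_info["type_id2name"]
    type_id2cuis = addl_info["type_id2cuis"]  # assumed to exist, see lead-in
    wanted = {name.lower() for name in wanted_names}
    cuis = set()
    for type_id, name in type_id2name.items():
        if name.lower() in wanted:
            cuis.update(type_id2cuis.get(type_id, ()))
    return cuis


# Toy stand-in for cat.cdb.addl_info; the values are illustrative only.
toy_addl_info = {
    "type_id2name": {"67667581": "finding", "9090192": "disorder"},
    "type_id2cuis": {"67667581": {"404684003"}, "9090192": {"73211009"}},
}

print(cuis_for_type_names(toy_addl_info, ["finding"]))

# With a loaded model this might be applied the way the tutorial does it:
# cat.cdb.config.linking['filters']['cuis'] = cuis_for_type_names(
#     cat.cdb.addl_info, ["finding"])
```

Matching on the name rather than hard-coding the numeric ID keeps the filter readable and avoids the KeyError that comes from passing an ID the CDB does not contain.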

Hideaki · April 10, 2023, 10:34am

Thank you, @zeljko. I used cat.cdb.addl_info['type_id2name'] for my SNOMED-CT CDB. This gave me a dictionary, which I used as a lookup to populate my dataframe of CUIs and the percentage of documents in which each CUI is mentioned. I hope I used it correctly.
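A lookup along those lines might look like the sketch below. It assumes a CUI-to-type-ID map is available alongside `type_id2name` (in MedCAT this is expected to be `cat.cdb.cui2type_ids`, but check your own CDB); the helper name, the toy dictionaries and the `pct_docs` column are hypothetical, and plain dictionaries stand in for the dataframe.

```python
def type_names_for_cui(cui, cui2type_ids, type_id2name):
    """Return the sorted type names for a CUI, or an empty list if unknown."""
    type_ids = cui2type_ids.get(cui, ())
    return sorted(type_id2name.get(type_id, "unknown") for type_id in type_ids)


# Toy stand-ins for the CDB maps; the values are illustrative only.
toy_cui2type_ids = {"73211009": {"9090192"}}
toy_type_id2name = {"9090192": "disorder"}

# One row per CUI, with the share of documents mentioning it.
rows = [
    {"cui": "73211009", "pct_docs": 12.5},
    {"cui": "000", "pct_docs": 1.0},  # a CUI missing from the maps
]
for row in rows:
    row["type_names"] = type_names_for_cui(
        row["cui"], toy_cui2type_ids, toy_type_id2name)

print(rows)
```

Returning a list rather than a single name allows for CUIs that carry more than one type ID, and for CUIs that are absent from the map.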
