Issue with medcat umls full model

gneuron · April 29, 2024, 6:31am

I’m facing a problem using the full model I got from UMLS.
from medcat.cat import CAT
cat = CAT.load_model_pack('model.zip')
text = "Kidney disfunction"
entities = cat.get_entities(text)
print(entities)
The model loading step hangs, and after some time the terminal shows “killed”.

The medmen_wstatus_2021_oct.zip model works perfectly.

mart.ratas · April 29, 2024, 3:04pm

You’re most likely running out of memory (RAM).
The full UMLS model is an extremely large one.

When I last loaded up the model, it took a whopping 31GB of memory.

EDIT: I just looked at it, and it seems it was “only” 20.8GB for the CAT instance model pack itself. This is the memray flamegraph from January of 2023:
https://mart-r.github.io/UMLSMemory/memray-flamegraph-umls.html
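
A quick pre-flight check can show whether the machine has enough physical RAM before attempting the load. This is only a minimal sketch and not part of MedCAT: `total_ram_bytes` and `enough_ram` are hypothetical helper names, `os.sysconf` with these keys is assumed to be available (it is on Linux), and the 21 GB threshold is simply the 20.8GB figure above rounded up. Total RAM is also an optimistic bound, since other processes use memory too.

```python
import os


def total_ram_bytes() -> int:
    """Return physical RAM in bytes (assumes a Linux-like system with os.sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def enough_ram(required_gb: float, total_bytes: int) -> bool:
    """True if total_bytes covers required_gb gibibytes."""
    return total_bytes >= required_gb * 1024 ** 3


# Example usage:
#     if not enough_ram(21, total_ram_bytes()):
#         print("Probably not enough RAM for the full UMLS model pack")
```

With 8 GB of RAM the check fails, which matches the process being killed during loading.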

gneuron · May 6, 2024, 9:08am

Oh okay, I only have 8 GB of memory.

gneuron · May 13, 2024, 9:00am

Increasing my RAM worked. I was trying to use the full model to get ICD10 codes, but after running it the ICD10 codes array is empty for every entity. I was expecting output similar to the demo page, where ICD10 codes are provided for each entity. Is there an extra step needed for this?

mart.ratas · May 13, 2024, 9:22am

Unfortunately the ICD10 mappings weren’t created and included in the full UMLS model.

If you want these mappings, you can try and find them yourself from the raw UMLS download.
The UMLS preprocessing module might be of help:

https://github.com/CogStack/MedCAT/blob/master/medcat/utils/preprocess_umls.py


from typing import List, Union
import pandas as pd
import tqdm
import os
from typing import Dict

_DEFAULT_COLUMNS: list = [
    "CUI",
    "LAT",
    "TS",
    "LUI",
    "STT",
    "SUI",
    "ISPREF",
    "AUI",
    "SAUI",
    "SCUI",
    "SDUI",
    "SAB",

This file has been truncated.

gneuron · May 13, 2024, 12:03pm

Is there any way to map the entities I get from the model to their corresponding ICD10 codes?
I can’t understand the preprocessing code above. I’m new to Python, so a step-by-step procedure would be very helpful.

mart.ratas · May 13, 2024, 12:55pm

If you download the official UMLS release files and then use the class in the module I’ve provided, you should be able to do the following:

from medcat.utils.preprocess_umls import UMLS
path_to_mrconso = '/path/to/MRCONSO.RRF'  # CHANGE THIS TO REFER TO MRCONSO.RRF
path_to_mrsty = '/path/to/MRSTY.RRF'  # CHANGE THIS TO REFER TO MRSTY.RRF
umls = UMLS(path_to_mrconso, path_to_mrsty)
# get the mappings
umls2icd10 = umls.map_umls2icd10()
# now you should have a pandas DataFrame that should have the UMLS concept IDs (`CUI` column) as well as the ICD10 IDs (`CODE` column).

gneuron · May 13, 2024, 1:01pm

from medcat.cat import CAT
from medcat.utils.preprocess_umls import UMLS

cat = CAT.load_model_pack('Models/medmen_wstatus_2021_oct.zip')
text = "lower back pain, upper abdominal pain, headache"
entities = cat.get_entities(text, only_cui=False, addl_info=['cui2icd10', 'cui2ontologies', 'cui2snomed'])

path_to_mrconso = '/path/to/MRCONSO.RRF'  # CHANGE THIS TO REFER TO MRCONSO.RRF
path_to_mrsty = 'path/to/MRSTY.RRF'. # CHANGE THIS TO REFER TO MRSTY.RRF
umls = UMLS(path_to_mrconso, path_to_mrsty)
# get the mappings
umls2icd10 = umls.map_umls2icd10(cui)  # pass the CUI of each entity to this?

Would this work?

mart.ratas · May 13, 2024, 1:37pm

You don’t need the MedCAT model for this at all.

You’d also need to change the two lines for MRCONSO.RRF and MRSTY.RRF to point to the files you’ve downloaded to your disk (and remove the stray . after the MRSTY.RRF string).

gneuron · May 13, 2024, 4:54pm

I’m not getting the big picture here. My requirement is to get ICD10 codes for a given clinical text using the full UMLS model. The model returns the entities, but their ICD10 codes are empty. How can I get the ICD10 codes? How should I structure my code (that is, where does the code you showed fit in with the code that gets the entities from the model)? What do you mean when you say the model isn’t needed here? Isn’t the model what annotates the text and returns the entities? Is the code you showed part of training the model? If so, do I need to train a model to meet my requirement?

mart.ratas · May 14, 2024, 10:43am

As I said before, the full UMLS model does not have the ICD10 codes embedded in it.
There is no way to extract something from the model that it does not have saved with it.

So in order to get the mappings from UMLS to ICD10, you would need to use the raw UMLS files that are used during preprocessing, in the way that I described above.

So it sounds like what you would need to do is:

1. Get the UMLS → ICD10 mappings (this part does not require a model)
   - Download UMLS (I think 2022AA was used for the full model)
   - Use preprocessing to extract the relevant UMLS → ICD10 mappings
   - Create a direct dict mapping from the CUI column to the CODE column in the pandas.DataFrame
   - Save the mappings dict to disk
2. Add the UMLS → ICD10 mappings to the model
   - Load up the model pack
   - Set mappings at cat.cdb.addl_info['cui2icd10']
   - Save model
3. Use the newly saved model
   - It now has the UMLS → ICD10 mappings
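
The steps above can be sketched roughly as follows. The MedCAT calls are the ones already used in this thread (`UMLS(...)`, `map_umls2icd10()`, `CAT.load_model_pack`, `cat.cdb.addl_info`, `cat.create_model_pack`). Everything else is an assumption made for illustration: the helper names are hypothetical, and JSON is just one convenient way to save the dict to disk.

```python
import json


def build_cui2icd10(cuis, codes):
    """Direct CUI -> ICD10 code dict; a later row for the same CUI overwrites an earlier one."""
    return dict(zip(cuis, codes))


def save_mapping(mapping, path):
    """Save the mappings dict to disk as JSON (format chosen for this sketch)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f)


def load_mapping(path):
    """Load a mappings dict previously written by save_mapping."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def add_icd10_to_model(model_pack_path, mrconso_path, mrsty_path, mapping_path, save_folder):
    # MedCAT is imported here so the helpers above also work without it installed.
    from medcat.cat import CAT
    from medcat.utils.preprocess_umls import UMLS

    # Step 1: extract the mappings from the raw UMLS files (no model needed).
    umls = UMLS(mrconso_path, mrsty_path)
    df = umls.map_umls2icd10()  # DataFrame with CUI and CODE columns
    save_mapping(build_cui2icd10(df["CUI"], df["CODE"]), mapping_path)

    # Step 2: attach the mappings to the model and save a new model pack.
    cat = CAT.load_model_pack(model_pack_path)
    cat.cdb.addl_info["cui2icd10"] = load_mapping(mapping_path)
    cat.create_model_pack(save_folder)
```

Step 3 is then just loading the model pack written to `save_folder` and calling `get_entities` as before.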

gneuron · May 14, 2024, 11:40am

Thank you for your response. It’s clearer now.

gneuron · May 16, 2024, 6:39am

from medcat.cat import CAT
from medcat.utils.preprocess_umls import UMLS

cat = CAT.load_model_pack("Models/medmen_wstatus_2021_oct")

path_to_mrconso = 'Models/MRCONSO.RRF'
path_to_mrsty = 'Models/MRSTY.RRF'
umls = UMLS(path_to_mrconso, path_to_mrsty)
umls2icd10 = umls.map_umls2icd10()
cat.cdb.addl_info['cui2icd10'] = umls2icd10
save_folder = 'Models'
cat.create_model_pack(save_folder)

Based on my understanding, I tried the code above, and it saved a new model in the specified folder. However, I’m still not getting ICD10 codes with the newly created model. Do you see anything wrong in the code? I tried this with the 2019 UMLS full release files.

mart.ratas · May 16, 2024, 10:00am

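The code sets the pandas.DataFrame in the addl_info. But the library expects a dict that maps each concept CUI to its ICD10 codes.

I explicitly mentioned this above as well, in the step “Create a direct dict mapping from the CUI column to the CODE column in the pandas.DataFrame”.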

gneuron · May 16, 2024, 10:03am

How do I do that? Could you please help?

df = umls.map_umls2icd10()
umls2icd10 = dict(zip(df['CUI'], df['CODE']))
cat.cdb.addl_info['cui2icd10'] = umls2icd10

Does this work?

mart.ratas · May 16, 2024, 11:32am

As far as I can tell, that should work.

gneuron · May 16, 2024, 11:36am

It worked. But it seems there are only 13k concepts with ICD10 codes in the 2019 MRCONSO.

mart.ratas · May 16, 2024, 12:32pm

First of all, the full UMLS model was created with the 2022AA release, as I mentioned above. So the 2019 version may have a significant number of differences.

As for why there would be “only 13k concepts”:
Is that based on the actual MRCONSO.RRF file?
If that’s the case, then there’s nothing I can do about it.
However, if you’re referring to the number of concepts that map to ICD10 in UMLS, the issue might be in the way you’ve created your mapping. What you’ve done allows any UMLS CUI to map to only one ICD10 code. There may be some CUIs that should map to more than one ICD10 code.
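
To keep every ICD10 code for a CUI, the rows can be grouped instead of zipped into a one-to-one dict. This is a minimal sketch: `build_multi_cui2icd10` is a hypothetical helper name, and storing the codes as a list per CUI is an assumption here, not something confirmed about what MedCAT expects.

```python
from collections import defaultdict


def build_multi_cui2icd10(cuis, codes):
    """Map each CUI to a list of all its distinct ICD10 codes, in first-seen order."""
    grouped = defaultdict(list)
    for cui, code in zip(cuis, codes):
        if code not in grouped[cui]:  # skip repeated (CUI, CODE) rows
            grouped[cui].append(code)
    return dict(grouped)


# With the DataFrame returned by umls.map_umls2icd10():
#     cat.cdb.addl_info['cui2icd10'] = build_multi_cui2icd10(df['CUI'], df['CODE'])
```

Compared with `dict(zip(...))`, no code is silently overwritten when a CUI appears in more than one row.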

KimTang · May 16, 2024, 12:46pm

I think that is probably correct.

I am working with the 2023AA release right now, and there are also around 13k UMLS CUIs associated with ICD10 codes.

And there are indeed some CUIs that map to multiple ICD10 codes: reducing the DataFrame to the unique rows of the CUI and CODE columns and then dropping duplicate CUIs leaves 11552 rows.

Alternatively, you can see how many concepts with ICD codes are in the current UMLS version in total in the statistics overview:
https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/ICD10/stats.html

So the current UMLS version has 11560 concepts with ICD codes (see “Source Overlap” at the bottom of that page).
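
The same kind of count can be reproduced on any extracted mapping. The sketch below uses plain Python sets so it runs without pandas; `mapping_stats` is a hypothetical helper name, and it takes the CUI and CODE columns as two parallel sequences.

```python
from collections import Counter


def mapping_stats(cuis, codes):
    """Count unique (CUI, CODE) pairs, unique CUIs, and CUIs with more than one code."""
    pairs = set(zip(cuis, codes))  # unique (CUI, CODE) rows
    codes_per_cui = Counter(cui for cui, _ in pairs)
    return {
        "unique_pairs": len(pairs),
        "unique_cuis": len(codes_per_cui),
        "cuis_with_multiple_codes": sum(1 for n in codes_per_cui.values() if n > 1),
    }


# With the DataFrame returned by umls.map_umls2icd10():
#     print(mapping_stats(df['CUI'], df['CODE']))
```

If `unique_pairs` is larger than `unique_cuis`, some CUIs map to several ICD10 codes and a one-to-one dict drops part of the mapping.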

gneuron · May 16, 2024, 1:00pm

Okay. So it doesn’t make a difference whether I use the 2023 UMLS files instead of the 2019 files.
