I'm facing a problem using the full model I got from UMLS.
from medcat.cat import CAT
cat = CAT.load_model_pack('model.zip')
text = "Kidney disfunction"
entities = cat.get_entities(text)
print(entities)
The model loading step gets stuck, and after some time the terminal shows “killed”.
The medmen_wstatus_2021_oct.zip model works perfectly.
Increasing my RAM worked. I was trying to use the full model to get the ICD10 codes, but after running the full model the ICD10 codes array is empty for every entity. I was expecting something similar to the demo page, where ICD10 codes are also provided for each entity. Is there an extra step needed for this?
Is there any way to map the entities that I get from the model to their corresponding ICD10 codes?
I can't understand the preprocessing code above. I am new to Python, so a step-by-step procedure would be very helpful.
If you download the official UMLS release files and then use the class in the module I’ve provided, you should be able to do the following:
from medcat.utils.preprocess_umls import UMLS
path_to_mrconso = '/path/to/MRCONSO.RRF' # CHANGE THIS TO REFER TO MRCONSO.RRF
path_to_mrsty = 'path/to/MRSTY.RRF' # CHANGE THIS TO REFER TO MRSTY.RRF
umls = UMLS(path_to_mrconso, path_to_mrsty)
# get the mappings
umls2icd10 = umls.map_umls2icd10()
# now you should have a pandas DataFrame that should have the UMLS concept IDs (`CUI` column) as well as the ICD10 IDs (`CODE` column).
from medcat.cat import CAT
from medcat.utils.preprocess_umls import UMLS
cat = CAT.load_model_pack('Models/medmen_wstatus_2021_oct.zip')
text = "lower back pain, upper abdominal pain, headache"
entities = cat.get_entities(text, only_cui=False, addl_info=['cui2icd10', 'cui2ontologies', 'cui2snomed'])
path_to_mrconso = '/path/to/MRCONSO.RRF' # CHANGE THIS TO REFER TO MRCONSO.RRF
path_to_mrsty = 'path/to/MRSTY.RRF'. # CHANGE THIS TO REFER TO MRSTY.RRF
umls = UMLS(path_to_mrconso, path_to_mrsty)
# get the mappings
umls2icd10 = umls.map_umls2icd10(cui) #pass cui of each entities to this?
You'd also need to change the two lines for MRCONSO.RRF and MRSTY.RRF so they point to the files you've downloaded to your disk (and remove the . after the string).
I'm not getting the big picture here. My requirement is to get ICD10 codes for a given clinical text using the UMLS full model. The model gives the entities, but the ICD10 codes are empty. How can I get the ICD10 codes? How should I structure my code (for example, where do I use the code you showed alongside the code for getting the entities from the model)? What do you mean when you say the model part isn't needed here? Isn't the model what annotates the text and gives the entities? Is the code you showed part of training the model? If so, do I need to train a model to satisfy my requirement?
As I said before, the full UMLS model does not have the ICD10 codes embedded in it.
There is no way to extract something from the model that it does not have saved with it.
So in order to get the mappings from UMLS to ICD10, you would need to use the raw UMLS files that are used during preprocessing, in the way that I described above.
So it sounds like what you would need to do is:
Get the UMLS → ICD10 mappings (this part does not require a model):
- Download UMLS (I think 2022AA was used for the full model)
- Use preprocessing to extract the relevant UMLS → ICD10 mappings
- Create a direct dict mapping from the CUI column to the CODE column in the pandas.DataFrame
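The last step above, plus looking up the CUIs of annotated entities, can be sketched roughly as follows. This is only an illustration under a few assumptions: the toy DataFrame stands in for the result of umls.map_umls2icd10(), the CUIs and codes in it are made up, the example dict is a hypothetical result shaped like the output of cat.get_entities(text) (an "entities" dict whose values carry a "cui" key), and add_icd10 is a helper name invented here, not part of MedCAT. Each CUI is mapped to a list of codes so that CUIs with several ICD10 codes are not truncated.

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by umls.map_umls2icd10().
# The CUIs and codes are made up for illustration only.
umls2icd10 = pd.DataFrame({
    "CUI": ["C0000001", "C0000001", "C0000002"],
    "CODE": ["A00", "A01", "B00"],
})

# Map each CUI to the sorted list of ICD10 codes it is linked to.
cui2icd10 = (
    umls2icd10[["CUI", "CODE"]]
    .drop_duplicates()
    .groupby("CUI")["CODE"]
    .apply(sorted)
    .to_dict()
)


def add_icd10(entities: dict, mapping: dict) -> dict:
    """Fill the 'icd10' field of every entity from a CUI -> codes mapping.

    Hypothetical helper; assumes a get_entities-style dict.
    """
    for ent in entities.get("entities", {}).values():
        ent["icd10"] = mapping.get(ent["cui"], [])
    return entities


# Hypothetical result shaped like cat.get_entities(text).
example = {
    "entities": {
        0: {"cui": "C0000001", "pretty_name": "Example concept"},
        1: {"cui": "C9999999", "pretty_name": "Unmapped concept"},
    }
}
annotated = add_icd10(example, cui2icd10)
```

With a real model you would pass the result of cat.get_entities(text) instead of the example dict. Entities whose CUI has no ICD10 mapping simply end up with an empty list.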
Based on my understanding, I tried out the above code and it saved a new model in the specified folder. I'm still not getting the ICD10 codes with this newly created model. Do you see anything wrong in the above code? I tried this with the 2019 UMLS full release files.
First of all, the full UMLS model was created with the 2022AA release, as I mentioned above. So the 2019 version may differ significantly.
As for why there would be “only 13k concepts”:
is that based on the actual MRCONSO.RRF file?
If that’s the case, then there’s nothing I can do about it.
However, if you’re referring to the number of concepts that map to ICD10 in UMLS, the issue might be in the way you’ve created your mapping. What you’ve done allows any UMLS CUI to map to only one ICD10 code. There may be some CUIs that should map to more than one ICD10 code.
And there are indeed some CUIs that map to multiple ICD10 codes: after reducing the DataFrame to the unique rows of the CUI and CODE columns, dropping duplicate CUIs leaves 11552 rows.
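To check this on your own mapping, something along these lines should work. It is a small sketch on made-up rows, assuming only that the DataFrame has the CUI and CODE columns mentioned earlier; the variable names are illustrative.

```python
import pandas as pd

# Made-up rows for illustration only; replace with your own mapping DataFrame.
df = pd.DataFrame({
    "CUI": ["C0000001", "C0000001", "C0000001", "C0000002"],
    "CODE": ["A00", "A00", "A01", "B00"],
})

# Unique (CUI, CODE) pairs.
pairs = df[["CUI", "CODE"]].drop_duplicates()

# One row per CUI: this is what a one-to-one dict would keep.
unique_cuis = pairs.drop_duplicates(subset="CUI")

# Number of distinct ICD10 codes per CUI, and the CUIs with more than one.
codes_per_cui = pairs.groupby("CUI")["CODE"].nunique()
multi = codes_per_cui[codes_per_cui > 1]

print(len(pairs), len(unique_cuis), len(multi))
```

If the first number is larger than the second, a plain CUI → CODE dict is silently dropping codes for the CUIs listed in multi.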