How to improve recall and make MedCAT find the correct word combinations?

I have set up a MedCAT system locally with the prebuilt UMLS model (umls_sm_wstatus_2021_oct) and I am looking to find disorders.
I am wondering why the MedCAT system has issues correctly finding phrases like these:

  • premature ventricular contractions (here it finds only the word "contractions", whereas in another place in the same text it is able to find "occasional premature ventricular contractions")
  • known drug allergies (here it does not find anything)
  • acute distress (here it does not find anything)
  • frequent ectopic beats (here it finds only the words "ectopic beats")
  • mild epigastric and right upper quadrant tenderness (here it finds only the word "tenderness")

I have many more examples where I don't quite understand why MedCAT is having issues.

Do I need to tweak the MedCAT setup somehow? It seems that recall in particular is weak… How do I improve this?

Hi @bkakke

So the publicly available models are minimally trained on public data, MIMIC-III if I remember correctly.
Currently MedCAT only returns the "most similar" concept for a span of text.

I would first have a look at cat.cdb.config and see if the prebuilt model's configuration can be optimised for your use case.

If you have access to data then I would conduct some unsupervised training.
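
For reference, a minimal sketch of what an unsupervised training run could look like (the notes file name and the output pack name are placeholders for your own data and paths):

from medcat.cat import CAT

# load the prebuilt model pack (example path)
cat = CAT.load_model_pack('./umls_sm_wstatus_2021_oct/umls_sm_wstatus_2021_oct')

# one clinical note per line in 'my_notes.txt' (placeholder file)
with open('my_notes.txt') as f:
    notes = [line.strip() for line in f if line.strip()]

# unsupervised training updates the concept context vectors and training counts
cat.train(notes)

# save the updated model as a new model pack
cat.create_model_pack('umls_sm_wstatus_retrained')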

Lastly, have you checked out MedCATtrainer? This is a supervised training step for the models, where annotators can label a subsample of documents with UMLS concepts, from which you can train and produce performance metrics. If you haven't already done so, I would highly recommend using this tool.

Hi Anthony,
Yes, I believe it's MIMIC-III - but that's also a relatively big dataset, I think - isn't it?
As I remember, it's something like 2.1 million documents. Do you think a bigger dataset is required?

In cat.cdb.config there are quite a lot of options. Which ones do you recommend tweaking?

I do have some data: the ShARe/CLEF eHealth 2013 dataset, which I use for benchmarking. When you say "I would conduct some unsupervised training", do you mean using MedCATtrainer, or something else?

Yes, that's correct - MIMIC-III. 2.1 million documents is more than enough and should be representative of your dataset (which is based on MIMIC-II, I believe).

So when the model has been run across a large corpus of documents, to use one of your examples, "contractions" would have been encountered significantly more often than "premature ventricular contractions". There is a config setting that weights more frequently encountered concepts (see below). The model will then present the single "most similar" concept for a phrase.

To double-check this you can use cat.cdb.cui2count_train[<cui>] to see how many times the model has encountered the concept during training.
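
For example, something along these lines (assuming the model pack is already loaded as cat; the keys are as in a typical get_entities output):

# compare how often the linked concepts were seen during training
ents = cat.get_entities("occasional premature ventricular contractions")['entities']
for ent in ents.values():
    print(ent['source_value'], ent['cui'], cat.cdb.cui2count_train.get(ent['cui'], 0))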

Configs
The main configs I look at when exploring are:
cat.config.linking['prefer_primary_name']
cat.config.linking['prefer_frequent_concepts'] # make sure to reduce this value when trained on large datasets, as frequent short concepts will otherwise be encountered far more often
cat.config.general['spell_check_len_limit']
cat.config.ner['min_name_len']
cat.config.ner['check_upper_case_names']
cat.config.ner['upper_case_limit_len']

Lastly, what I forgot to mention: if you don't want to retrieve all concepts, try adding a whitelist filter to the model, so that only CUIs present in your list are retrieved:

cat.config.linking['filters'] = {'cuis':<set of cuis here>}
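
For instance, a rough sketch of building a disorder-only whitelist from the CDB's semantic type information (this assumes cui2type_ids is populated in your CDB; T047, "Disease or Syndrome", is just one example type ID):

# keep only concepts whose UMLS semantic type is T047 (Disease or Syndrome)
disorder_cuis = {cui for cui, type_ids in cat.cdb.cui2type_ids.items() if 'T047' in type_ids}
cat.config.linking['filters'] = {'cuis': disorder_cuis}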

For all configs, have a look here.
Finding the right configuration balance for all concepts is not so easy, though. But have an explore and let us know what works best for your use case.

Another option is further training via MedCATtrainer (supervised training) to increase the number of examples the model is exposed to, thus shifting the model's preference from one concept to another.

Thanks for those pointers… as an example, I am trying the following:

from medcat.cat import CAT
cat = CAT.load_model_pack('./umls_sm_wstatus_2021_oct/umls_sm_wstatus_2021_oct')
cat.config.linking['prefer_primary_name'] = 0.35
cat.config.linking['prefer_frequent_concepts'] = 0.05  # reduced, since frequent short concepts will otherwise be preferred
# cat.config.general['spell_check_len_limit']
# cat.config.ner['min_name_len']
# cat.config.ner['check_upper_case_names']
# cat.config.ner['upper_case_limit_len']
text = "21 year old with delayed pp hemorrhage"
print(cat.get_entities(text))

Here it finds only "Hemorrhage", but it is supposed to find "delayed pp hemorrhage". I have tried adjusting all the parameters you suggested. Are you able to make it find "delayed pp hemorrhage" in that sentence?

Sounds like you need to use MedCATtrainer to recognise that novel tri-gram (as well as other synonymous phrases).

The current setup of MedCAT struggles with major changes to the key phrase.

@jthteo, you are right. MedCATtrainer would help here.

Alternatively, if you know a list of potential phrase variations, you can add them straight into the model.
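
A rough sketch of what that could look like (the CUI below is a placeholder - use the real UMLS CUI of the concept you want the phrase linked to):

# register 'delayed pp hemorrhage' as a new name for an existing concept
# and give it an initial training example
cat.add_and_train_concept(cui='C0000000', name='delayed pp hemorrhage')  # placeholder CUI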

Aha, that's interesting.
So, it sounds like I need to use MedCATtrainer more.
Regarding MedCATtrainer, what is the normal workflow for it?
By that I mean, how much text should I annotate, and how do I figure out what text should be annotated?
Is the normal workflow basically that I sit and find all the cases where MedCAT fails and then correct them in MedCATtrainer?

Thank you for your help.

Have a look at the template workflows present here:

Generally, producing an annotated dataset creates a "gold standard" which you can use to train models on and benchmark new models against.

Re. your other questions:

By that I mean, how much text should I annotate, and how do I figure out what text should be annotated?

This is a hard question to answer, as it depends on the number of variations your concept may be represented as, and on their alternative meanings. E.g. the term "Seizure" would require a lot of training because it can be represented as "fit", "attack", "sz", "episode", etc., and these terms may be confused with alternative uses ("The patient is healthy and fit"), which can have a completely different meaning. Whereas "Epilepsy" has few representations, and they do not overlap with other concepts.

"How accurate is accurate enough?" will ultimately depend on your own use case.

Is the normal workflow basically that I sit and find all the cases where MedCAT fails and then correct them in MedCATtrainer?

Yes, that is generally correct. Training concepts which already perform well may not be the best use of one's time.
As part of the workflow, it is always good to first validate the performance of a model across datasets, as documents may be written differently across departments, hospitals etc., cover different varieties of diseases, and use different acronyms and expressions.

Aha, great. Thank you for this clarification :slight_smile:

Follow-up question:
If I already have some annotated documents saved in format X, can I use them to train MedCAT (probably by converting to whatever format MedCAT accepts)?
What I mean is, can I train MedCAT without using the MedCATtrainer UI if I have pre-annotated documents somehow?

Absolutely!

If you want, I can send across a template of an example MedCATtrainer export? You can then copy that format.

Ah yes - that would be great. :slight_smile:

Thank you

@anthony.shek Do you think you could post an example of the format in here and how to import and use it?

Yes, sure. When I have a moment today I'll post something here :slight_smile:

I’ll point out that an example MedCATTrainer export is available on the MedCAT repo (tests->resources):
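
For orientation, the export is a JSON file of projects, documents and annotations. Below is a rough sketch of building one from pre-annotated documents and training on it; the field names should be double-checked against the example export in the repo, the CUI is a placeholder, and depending on your MedCAT version the training method may be called train_supervised or train_supervised_from_json:

import json

mct_export = {
    "projects": [{
        "name": "my_project",
        "documents": [{
            "name": "doc_1",
            "text": "21 year old with delayed pp hemorrhage",
            "annotations": [{
                "cui": "C0000000",              # placeholder CUI
                "value": "delayed pp hemorrhage",
                "start": 17,                     # character offsets into the text
                "end": 38,
                "correct": True
            }]
        }]
    }]
}

with open('mct_export.json', 'w') as f:
    json.dump(mct_export, f)

# supervised training directly from the export-style file
cat.train_supervised(data_path='mct_export.json', nepochs=1)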

Ah great… and then I can just use it directly in the training here: MedCAT/meta_cat.py at master · CogStack/MedCAT · GitHub

correct?