Multiple NLP models or single one

sangeetabose · November 28, 2022, 11:31am

Hi, we are using MedCAT on patients' clinical notes. One of the options being considered is training a generalized model plus multiple specialized models, one per disease group. This would require running multiple models on the same text and deciding which model is the expert for each annotation. We wanted to check whether there is merit in using multiple models, or whether we should stick to a single model. Are there any relevant use cases with CogStack/MedCAT, or papers, that one can refer to?

anthony.shek · November 28, 2022, 4:55pm

The ideal scenario is a single model that has been trained and validated by the respective domain specialists, e.g. cardiologists validating cardiology reports, epileptologists validating epilepsy clinic letters, etc.

Validating and training on all possible concepts at once may not be feasible. So, early on, you may have to create a model fork (e.g. an epilepsy specialist model) because you have only validated it within the epilepsy domain. In my experience, setting up and training models by specialty is easier.

Over time, once you have gathered enough training/validation material across multiple domains from MedCATtrainer, you can retrain a single model.
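Pooling the validated material from several specialties before retraining can be sketched as follows. This is a minimal sketch, assuming each MedCATtrainer export is a JSON file with a top-level "projects" list (check this against your own exports); the function names are hypothetical, and the actual training step is left to whichever supervised-training method your MedCAT version provides.

```python
import json
from pathlib import Path
from typing import Iterable


def load_export(path: Path) -> dict:
    """Read one MedCATtrainer export (JSON) from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def merge_exports(exports: Iterable[dict]) -> dict:
    """Combine several exports into a single training set.

    Assumes each export has a top-level "projects" list, e.g. one project per
    specialty. Projects are copied over whole, so every annotation stays
    attached to the document it came from. The input dicts are not modified.
    """
    merged = {"projects": []}
    for export in exports:
        merged["projects"].extend(export.get("projects", []))
    return merged


if __name__ == "__main__":
    # Hypothetical per-specialty exports using the assumed layout.
    cardiology = {"projects": [{"name": "cardiology", "documents": []}]}
    epilepsy = {"projects": [{"name": "epilepsy", "documents": []}]}
    combined = merge_exports([cardiology, epilepsy])
    print([p["name"] for p in combined["projects"]])  # ['cardiology', 'epilepsy']
    # With real files: merge_exports(load_export(p) for p in paths),
    # then json.dump() the result and feed it to MedCAT's supervised training.
```

Keeping each specialty as its own project (rather than flattening documents) makes it easy to evaluate the retrained single model per domain afterwards.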

I would only deliberately train separate models in two specific circumstances:

If you expect different interpretations of the same span of text. For example, given the text “The pt has Juvenile Myoclonic Epilepsy”, the outputs of two different models could be:

Output model 1: Juvenile → 59223006, Myoclonic → 17450006, Epilepsy → 84757009
Output model 2: Juvenile myoclonic epilepsy → 6204001

If you are annotating and linking with different ontologies/terminologies (e.g. dm+d, SNOMED, UMLS) and/or different versions/releases of the same ontology. In that case I would advise keeping these as separate models, because the same word/phrase span may have different concept representations, which will consequently clash.
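A quick way to see whether two models would clash in either of these situations is to compare their outputs span by span. A minimal sketch, assuming each model's output has been reduced to simple (start, end, cui) tuples (a simplified, hypothetical representation, not MedCAT's own output format), using the epilepsy example above:

```python
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    cui: str    # concept ID the model linked the span to


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the two half-open character spans share at least one character."""
    return a.start < b.end and b.start < a.end


def find_clashes(model_1: List[Annotation],
                 model_2: List[Annotation]) -> List[Tuple[Annotation, Annotation]]:
    """Pair up overlapping annotations from two models that disagree on the concept."""
    return [(a, b) for a in model_1 for b in model_2
            if overlaps(a, b) and a.cui != b.cui]


text = "The pt has Juvenile Myoclonic Epilepsy"


def span(phrase: str, cui: str) -> Annotation:
    """Build an annotation for the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), cui)


model_1 = [span("Juvenile", "59223006"), span("Myoclonic", "17450006"),
           span("Epilepsy", "84757009")]
model_2 = [span("Juvenile Myoclonic Epilepsy", "6204001")]

for a, b in find_clashes(model_1, model_2):
    print(text[a.start:a.end], a.cui, "vs", text[b.start:b.end], b.cui)
```

Here all three token-level concepts from model 1 overlap the single phrase-level concept from model 2, so every one is reported. With two different ontologies almost every overlap would show up, since the concept IDs come from different code systems, which is why keeping those models separate is the simpler option.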

sangeetabose · November 29, 2022, 3:58am

Thanks… this is useful

Jthteo · November 29, 2022, 10:56am

As Anthony says, the ideal approach would be one model which is broadly generalisable across all domains. The direct opposite approach is a model zoo, where there is one model for each task in each domain.

The main risk of over-generalising is acronyms that different specialties or clinicians use with different meanings (e.g. PD meaning Parkinson’s Disease to some and Personality Disorder to others), but this is not very common.
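Before merging specialty models into one, it may help to check how often this actually happens in your own material. A minimal sketch, assuming you can produce a hypothetical abbreviation-to-concept list per specialty (the concept labels below are plain text, not real ontology codes):

```python
from collections import defaultdict
from typing import Dict, List


def ambiguous_acronyms(
    vocabularies: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Find abbreviations that map to different concepts in different specialties.

    `vocabularies` maps a specialty name to an {abbreviation: concept} dict.
    Abbreviations are compared case-insensitively. Returns
    {abbreviation: {concept: [specialties]}} for abbreviations with more than
    one distinct concept.
    """
    senses = defaultdict(lambda: defaultdict(list))
    for specialty, vocab in vocabularies.items():
        for abbrev, concept in vocab.items():
            senses[abbrev.upper()][concept].append(specialty)
    return {
        abbrev: dict(by_concept)
        for abbrev, by_concept in senses.items()
        if len(by_concept) > 1
    }


# Hypothetical per-specialty abbreviation lists, using the PD example above.
vocabularies = {
    "neurology": {"PD": "Parkinson's disease", "MS": "Multiple sclerosis"},
    "psychiatry": {"PD": "Personality disorder"},
}
print(ambiguous_acronyms(vocabularies))
# {'PD': {"Parkinson's disease": ['neurology'], 'Personality disorder': ['psychiatry']}}
```

A short list of genuinely ambiguous abbreviations suggests a single general model is safe, with those few cases handled by targeted fine-tuning or context rules.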

I think the reality is somewhere in between: a broadly generalisable model that works across most domains, with very light, focused fine-tuning for specific tasks.
