Anyone tried using MedCAT on data which is not in English?

anthony.shek · March 31, 2022, 6:07pm

Has anyone tried to use MedCAT on data which is in another language? What are your experiences? Would love to know!

Jthteo · April 1, 2022, 2:43pm

@SanderTan said the following on a GitHub thread, as they had been working with Dutch diacritics:

Indeed, we made models for the Dutch language from public UMLS and SNOMED data. We documented our methods in an internal repo. We’re looking into open-sourcing this; with a few changes it might be fairly easy to generate something similar for other languages. In the meantime, you could look into downloading the UMLS database, loading it into a MySQL database, filtering the concepts for your language and source vocabularies from the MRCONSO table, and putting them in the MedCAT format described at https://github.com/CogStack/MedCAT/tree/master/examples.
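For anyone wanting to try the filtering step described above, here is a rough sketch of how it might look. It is not @SanderTan's method: it reads the pipe-delimited MRCONSO.RRF file directly instead of going through MySQL, which is a simplification on my part. The function names are hypothetical, the sample rows are made up for illustration, and the output column names (`cui`, `name`, `ontologies`) are my assumption about the MedCAT CSV format, so check them against the linked examples before relying on this.

```python
import csv
import io

# Column order of MRCONSO.RRF as given in the UMLS reference manual.
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def filter_mrconso(lines, language, sources):
    """Yield (cui, name, source) for rows in the given language and sources.

    `lines` is any iterable of MRCONSO.RRF lines, e.g. an open file handle.
    """
    wanted = set(sources)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < len(MRCONSO_COLUMNS):
            continue  # skip empty or malformed lines
        row = dict(zip(MRCONSO_COLUMNS, fields))
        # Keep only non-suppressed rows in the requested language/vocabularies.
        if (row["LAT"] == language and row["SAB"] in wanted
                and row["SUPPRESS"] == "N"):
            yield row["CUI"], row["STR"], row["SAB"]


def to_medcat_csv(rows):
    """Render rows as CSV text; the header names are an assumption."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["cui", "name", "ontologies"])
    seen = set()
    for cui, name, source in rows:
        if (cui, name) in seen:
            continue  # drop duplicate concept/name pairs
        seen.add((cui, name))
        writer.writerow([cui, name, source])
    return out.getvalue()


# Made-up sample rows in MRCONSO layout (identifiers are illustrative only).
SAMPLE = [
    "C0011849|DUT|P|L0000001|PF|S0000001|Y|A0000001||||MSHDUT|MH|D003920|Diabetes mellitus|0|N||",
    "C0011849|ENG|P|L0000002|PF|S0000002|Y|A0000002||||MSH|MH|D003920|Diabetes Mellitus|0|N||",
]

if __name__ == "__main__":
    print(to_medcat_csv(filter_mrconso(SAMPLE, "DUT", ["MSHDUT"])))
```

With a real download you would pass an open handle on MRCONSO.RRF (opened as UTF-8) instead of `SAMPLE`, and swap in the language code and source vocabularies you need.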

anthony.shek · April 3, 2022, 5:04pm

Interesting… would love to know more!

I guess as long as there is a terminology available in the respective language and a pre-trained spaCy model for that language, MedCAT should pretty much work out of the box. There is also a multi-language spaCy model available, but I’ve yet to check it out.
