Has anyone tried to use MedCAT on data which is in another language? What are your experiences? Would love to know!

@SanderTan said the following on a GitHub thread, as they had been working with Dutch diacritics:

Indeed, we made models for the Dutch language from public UMLS and SNOMED data. We documented our methods in an internal repo. We’re looking into open-sourcing this; with a few changes it might be fairly easy to generate something similar for other languages. In the meantime, you could look into downloading the UMLS database, loading it into a MySQL database, filtering the concepts for your language and source vocabularies from the MRCONSO table, and putting them in the MedCAT format described at MedCAT/examples at master · CogStack/MedCAT · GitHub.
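To make the filtering step concrete, here is a rough sketch of what it could look like in Python. It is not @SanderTan's method: it skips MySQL and reads the pipe-delimited MRCONSO.RRF rows directly, which comes to the same filter on the language (LAT) and source vocabulary (SAB) columns. The function names, the output column names (`cui`, `name`, `ontologies`) and the sample rows are my own assumptions for illustration; check the MedCAT examples for the exact CSV layout your version expects.

```python
import csv
import io

# Field positions in a MRCONSO.RRF row (pipe-delimited, with a trailing pipe).
CUI, LAT, SAB, STR = 0, 1, 11, 14


def mrconso_to_rows(lines, language="DUT", sources=None):
    """Yield (cui, name, source) for rows in the given language.

    `language` is a UMLS LAT code (e.g. "DUT" for Dutch); `sources` is an
    optional set of SAB codes to keep. Both parameter names are hypothetical.
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= STR:
            continue  # skip malformed or empty lines
        if fields[LAT] != language:
            continue
        if sources is not None and fields[SAB] not in sources:
            continue
        key = (fields[CUI], fields[STR].lower())
        if key in seen:
            continue  # drop case-insensitive duplicate names per concept
        seen.add(key)
        yield fields[CUI], fields[STR], fields[SAB]


def write_concept_csv(rows, out):
    """Write rows as CSV; the header names are an assumption, not confirmed."""
    writer = csv.writer(out)
    writer.writerow(["cui", "name", "ontologies"])
    writer.writerows(rows)


# Invented sample rows in MRCONSO layout, for illustration only.
SAMPLE = [
    "C0000001|DUT|P|L0000001|PF|S0000001|Y|A0000001||||MSHDUT|MH|D000001|hoofdpijn|0|N||",
    "C0000001|ENG|P|L0000002|PF|S0000002|Y|A0000002||||MSH|MH|D000001|Headache|0|N||",
    "C0000001|DUT|S|L0000003|PF|S0000003|N|A0000003||||MDRDUT|PT|10000001|Hoofdpijn|0|N||",
]

if __name__ == "__main__":
    buffer = io.StringIO()
    write_concept_csv(mrconso_to_rows(SAMPLE, language="DUT"), buffer)
    print(buffer.getvalue())
```

For a real run you would pass an open file handle on MRCONSO.RRF instead of `SAMPLE`, and swap `"DUT"` for the LAT code of your own language.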

Interesting… would love to know more!

I guess as long as there is a terminology available in the respective language and a pre-trained spaCy model for that language, MedCAT should pretty much work out of the box. There is a multi-language spaCy model available, but I’ve yet to check it out.