Anyone tried using MedCAT on data which is not in English?

Has anyone tried to use MedCAT on data which is in another language? What are your experiences? Would love to know!

@SanderTan said the following on a GitHub thread, as they had been working with Dutch diacritics:

Indeed, we made models for the Dutch language from public UMLS and SNOMED data. We documented our methods in an internal repo. We're looking into open-sourcing this; with a few changes it might be fairly easy to generate something similar for other languages. In the meantime, you could look into downloading the UMLS database, loading it into a MySQL database, filtering the concepts for your language and source vocabularies from the MRCONSO table, and putting them in the MedCAT format described in the examples directory of the CogStack/MedCAT repository on GitHub.
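
For anyone who wants to try the filtering step described in that quote, here is a rough sketch. It is not the method from the internal repo mentioned above; it swaps the MySQL step for reading the pipe-delimited MRCONSO.RRF file directly, which should give the same rows for a simple language and source filter. The column order is the standard MRCONSO layout from UMLS. The function names, the sample lines, and the output column names (`cui`, `name`, `ontologies`) are my assumptions, so please check them against the MedCAT examples before relying on them.

```python
import csv
import io

# Column order of MRCONSO.RRF (pipe-delimited, each line ends with a trailing pipe).
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def filter_mrconso(lines, language, sources):
    """Yield (cui, name, source) for rows in the given language and source vocabularies.

    Hypothetical helper: `language` is a UMLS LAT code (e.g. "DUT"),
    `sources` is a set of SAB codes (e.g. {"MSHDUT"}).
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        row = dict(zip(MRCONSO_COLUMNS, fields))
        if row.get("LAT") == language and row.get("SAB") in sources:
            yield row["CUI"], row["STR"], row["SAB"]


def write_medcat_csv(rows, out):
    """Write rows as CSV. The header names are an assumption about the MedCAT input format."""
    writer = csv.writer(out)
    writer.writerow(["cui", "name", "ontologies"])
    writer.writerows(rows)


# Made-up sample lines in MRCONSO layout, for illustration only (not real UMLS data).
SAMPLE_LINES = [
    "C0000001|DUT|P|L1|PF|S1|Y|A1||||MSHDUT|MH|D000001|Buikpijn|0|N||",
    "C0000001|ENG|P|L2|PF|S2|Y|A2||||MSH|MH|D000001|Abdominal Pain|0|N||",
    "C0000002|DUT|P|L3|PF|S3|Y|A3||||MDRDUT|PT|10000002|Hoofdpijn|0|N||",
]

buffer = io.StringIO()
write_medcat_csv(filter_mrconso(SAMPLE_LINES, "DUT", {"MSHDUT"}), buffer)
print(buffer.getvalue())
```

With a real UMLS download you would pass an open file handle (opened with `encoding="utf-8"`) instead of `SAMPLE_LINES`, and a file on disk instead of the in-memory buffer.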

Interesting… would love to know more!

I guess as long as there is a terminology available in the respective language and a pre-trained spaCy model for that language, MedCAT should pretty much work out of the box. There is also a multi-language spaCy model available, but I've yet to check it out.