Has anyone tried using MedCAT on data in a language other than English? What were your experiences? Would love to know!
@SanderTan, who has been working with Dutch text and its diacritics, said the following on a GitHub thread:
indeed we made models for Dutch language from public UMLS and SNOMED data. We documented our methods in an internal repo. We’re looking into open-sourcing this, with a few changes it might be fairly easy to generate something similar for other languages. In the meantime, you could look into downloading the UMLS database, loading it in a MySQL database, and filter the concepts for your language and source vocabularies from the MRCONSO table, and put them in the MedCAT format described at https://github.com/CogStack/MedCAT/tree/master/examples.
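As a rough illustration of the filtering step described in that quote, here is a minimal Python sketch. It is not @SanderTan's method: it swaps the MySQL step for reading the pipe-delimited `MRCONSO.RRF` file directly. The column positions follow the standard MRCONSO layout (CUI, LAT, SAB, STR, SUPPRESS). The sample rows and CUIs are made up, the function names are hypothetical, dropping suppressed terms is my own addition, and the `cui,name` header is an assumption about the MedCAT input format, so check it against the examples linked above before relying on it.

```python
import csv
import io

# Column positions in MRCONSO.RRF (pipe-delimited, each row ends with a trailing "|").
CUI, LAT, SAB, STR, SUPPRESS = 0, 1, 11, 14, 16


def filter_mrconso(lines, language, sources):
    """Yield unique (cui, name) pairs for one language and a set of source vocabularies."""
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 17:
            continue  # skip malformed or truncated rows
        if fields[LAT] != language or fields[SAB] not in sources:
            continue
        if fields[SUPPRESS] != "N":
            continue  # assumption: keep only non-suppressed terms
        pair = (fields[CUI], fields[STR])
        if pair not in seen:
            seen.add(pair)
            yield pair


def write_medcat_csv(pairs, out):
    """Write pairs as CSV; the 'cui,name' header is an assumed MedCAT input layout."""
    writer = csv.writer(out)
    writer.writerow(["cui", "name"])
    writer.writerows(pairs)


# Made-up rows in MRCONSO layout, for illustration only.
SAMPLE_ROWS = [
    "C0000001|DUT|P|L1|PF|S1|Y|A1||||MSHDUT|MH|D1|hoofdpijn|0|N||",
    "C0000001|ENG|P|L2|PF|S2|Y|A2||||MSH|MH|D1|Headache|0|N||",
    "C0000002|DUT|P|L3|PF|S3|Y|A3||||ICPCDUT|PT|X1|koorts|0|N||",
    "C0000001|DUT|S|L4|PF|S4|N|A4||||MSHDUT|EN|D1|cefalalgie|0|O||",
]

buffer = io.StringIO()
write_medcat_csv(filter_mrconso(SAMPLE_ROWS, "DUT", {"MSHDUT"}), buffer)
```

For a real run you would pass an open file handle, e.g. `open("MRCONSO.RRF", encoding="utf-8")`, in place of `SAMPLE_ROWS`, and pick the language code and source abbreviations that apply to your terminology.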
Interesting… would love to know more!
I guess that as long as there is a terminology available in the target language and a pre-trained spaCy model for that language, MedCAT should pretty much work out of the box. There is also a multi-language spaCy model available, but I've yet to check it out.