March 24, 2023, 9:28am
Is MedCAT 1.7.0 trained on sentences, or on multi-sentence documents, i.e. documents longer than 100 characters?
I am not entirely sure what you mean by your question. The 1.7.0 release of MedCAT is not a trained model (nor does it contain one); it is a package that provides the tools for training and using a MedCAT model.
The models we have publicly available are listed in the README:
# Medical <img src="https://github.com/CogStack/MedCAT/blob/master/media/cat-logo.png" width=45>oncept Annotation Tool
MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS. Paper on [arXiv](https://arxiv.org/abs/2010.01165).
**Official Docs [here](https://medcat.readthedocs.io/en/latest/)**
**Discussion Forum [discourse](https://discourse.cogstack.org/)**
## Available Models
We have 4 public models available:
1) UMLS Small (A modelpack containing a subset of UMLS (disorders, symptoms, medications...). Trained on MIMIC-III)
2) SNOMED International (Full SNOMED modelpack trained on MIMIC-III)
3) UMLS Dutch v1.10 (a modelpack provided by UMC Utrecht containing [UMLS entities with Dutch names](https://github.com/umcu/dutch-umls) trained on Dutch medical wikipedia articles and a negation detection model [repository](https://github.com/umcu/negation-detection/)/[paper](https://doi.org/10.48550/arxiv.2209.00470) trained on EMC Dutch Clinical Corpus).
4) UMLS Full (>4M concepts, trained self-supervised on MIMIC-III; UMLS v2022AA)
None of these have actually been trained on MedCAT v1.7.0 (though they should be mostly compatible with it).
All the models trained on MIMIC-III were certainly trained on documents that are mostly longer than 100 characters. And I’m pretty sure many of the Dutch Wikipedia articles used for the Dutch UMLS model are also longer than 100 characters.
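In case it helps: whichever model pack you download, MedCAT annotates whole documents rather than isolated sentences. A minimal sketch of loading a model pack and running it on a multi-sentence document, using MedCAT's `CAT.load_model_pack` and `get_entities` (the model pack filename below is a placeholder, not a real download):

```python
# Sketch: annotate a multi-sentence document with a MedCAT model pack.
# The model pack path is a placeholder; substitute your downloaded pack.
text = (
    "Patient presents with chest pain and shortness of breath. "
    "Past medical history includes type 2 diabetes mellitus. "
    "The patient was started on metformin 500 mg twice daily."
)
assert len(text) > 100  # a multi-sentence document, well over 100 characters

try:
    from medcat.cat import CAT

    cat = CAT.load_model_pack("umls_small_modelpack.zip")  # placeholder path
    entities = cat.get_entities(text)  # annotates the whole document at once
    for ent in entities["entities"].values():
        print(ent["pretty_name"], ent["cui"])
except Exception as exc:  # medcat or the model pack may be unavailable
    print(f"Sketch only, could not run MedCAT: {exc}")
```

The point of the sketch is that document length is not an input constraint: `get_entities` takes the full text, so sentence-level vs. document-level splitting is handled for you.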