carrpa
March 24, 2023, 9:28am
Is MedCAT 1.7.0 trained on sentences or on multi-sentence documents, i.e. documents longer than 100 characters?
Hi,
I am not entirely sure what you mean by your question. The 1.7.0 release of medcat is not a trained model (nor does it contain one); it is a package that provides the tools for training and using a MedCAT model.
The models we have publicly available are listed in the README:
# Medical <img src="https://github.com/CogStack/MedCAT/blob/master/media/cat-logo.png" width=45> oncept Annotation Tool
[![Build Status](https://github.com/CogStack/MedCAT/actions/workflows/main.yml/badge.svg?branch=master)](https://github.com/CogStack/MedCAT/actions/workflows/main.yml?query=branch%3Amaster)
[![Documentation Status](https://readthedocs.org/projects/medcat/badge/?version=latest)](https://medcat.readthedocs.io/en/latest/?badge=latest)
[![Latest release](https://img.shields.io/github/v/release/CogStack/MedCAT)](https://github.com/CogStack/MedCAT/releases/latest)
[![pypi Version](https://img.shields.io/pypi/v/medcat.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/medcat/)
MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS. Paper on [arXiv](https://arxiv.org/abs/2010.01165).
**Official Docs [here](https://medcat.readthedocs.io/en/latest/)**
**Discussion Forum [discourse](https://discourse.cogstack.org/)**
## Available Models
We have 4 public models available:
1) UMLS Small (A modelpack containing a subset of UMLS (disorders, symptoms, medications...). Trained on MIMIC-III)
2) SNOMED International (Full SNOMED modelpack trained on MIMIC-III)
3) UMLS Dutch v1.10 (a modelpack provided by UMC Utrecht containing [UMLS entities with Dutch names](https://github.com/umcu/dutch-umls) trained on Dutch medical wikipedia articles and a negation detection model [repository](https://github.com/umcu/negation-detection/)/[paper](https://doi.org/10.48550/arxiv.2209.00470) trained on EMC Dutch Clinical Corpus).
4) UMLS Full. >4MM concepts trained self-supervised on MIMIC-III. v2022AA of UMLS.
None of these were actually trained with medcat v1.7.0 (though they should be mostly compatible with it).
All the models trained on MIMIC-III were certainly trained on documents that are mostly longer than 100 characters. And I'm pretty sure many of the Dutch Wikipedia articles are also longer than 100 characters (referring to the Dutch UMLS model).
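If you want to check how your own documents compare to that 100-character mark, something like the rough sketch below might help. It uses only the Python standard library and is not part of medcat; the function name `share_longer_than` and the sample texts are made up for illustration.

```python
from typing import Iterable

THRESHOLD = 100  # character count mentioned in the question


def share_longer_than(docs: Iterable[str], threshold: int = THRESHOLD) -> float:
    """Return the fraction of documents with more than `threshold` characters."""
    lengths = [len(doc) for doc in docs]
    if not lengths:
        return 0.0
    return sum(length > threshold for length in lengths) / len(lengths)


# Hypothetical sample texts, not taken from MIMIC-III or any real corpus.
sample_docs = [
    "Patient denies chest pain.",                       # short, single sentence
    "Patient admitted with shortness of breath. " * 5,  # longer, multi-sentence
]
print(share_longer_than(sample_docs))
```

Running the same function over your real corpus would give a quick idea of whether it is mostly sentence-sized or document-sized text.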