March 24, 2023, 9:28am
Is MedCAT 1.7.0 trained on sentences, or on multi-sentence documents, i.e. documents longer than 100 characters?
I am not entirely sure what you mean by your question. The 1.7.0 release of MedCAT is not a trained model (nor does it contain one); it is a package that provides the tools for training and using a MedCAT model.
The models we have publicly available are listed in the README:
# Medical <img src="https://github.com/CogStack/MedCAT/blob/master/media/cat-logo.png" width=45>oncept Annotation Tool
MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS. Paper on [arXiv](https://arxiv.org/abs/2010.01165).
**Official Docs [here](https://medcat.readthedocs.io/en/latest/)**
**Discussion Forum [discourse](https://discourse.cogstack.org/)**
## Available Models
We have 4 public models available:
1) UMLS Small (A modelpack containing a subset of UMLS (disorders, symptoms, medications...). Trained on MIMIC-III)
2) SNOMED International (Full SNOMED modelpack trained on MIMIC-III)
3) UMLS Dutch v1.10 (a modelpack provided by UMC Utrecht containing [UMLS entities with Dutch names](https://github.com/umcu/dutch-umls) trained on Dutch medical wikipedia articles and a negation detection model [repository](https://github.com/umcu/negation-detection/)/[paper](https://doi.org/10.48550/arxiv.2209.00470) trained on EMC Dutch Clinical Corpus).
4) UMLS Full (>4M concepts, trained self-supervised on MIMIC-III; UMLS v2022AA)
None of these have actually been trained on MedCAT v1.7.0 (though they should be mostly compatible with it).
All the models trained on MIMIC-III were certainly trained on documents that are mostly longer than 100 characters. And I’m pretty sure many of the Dutch Wikipedia articles used for the Dutch UMLS model are also longer than 100 characters.
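In case it helps: whichever model pack you download, MedCAT annotates whole documents rather than isolated sentences. A minimal sketch of loading a model pack and running it on a multi-sentence document, using MedCAT's `CAT.load_model_pack` and `get_entities` (the model pack filename below is a placeholder, not a real download):

```python
# Sketch: annotate a multi-sentence document with a MedCAT model pack.
# The model pack path is a placeholder; substitute your downloaded pack.
text = (
    "Patient presents with chest pain and shortness of breath. "
    "Past medical history includes type 2 diabetes mellitus. "
    "The patient was started on metformin 500 mg twice daily."
)
assert len(text) > 100  # a multi-sentence document, well over 100 characters

try:
    from medcat.cat import CAT

    cat = CAT.load_model_pack("umls_small_modelpack.zip")  # placeholder path
    entities = cat.get_entities(text)  # annotates the whole document at once
    for ent in entities["entities"].values():
        print(ent["pretty_name"], ent["cui"])
except Exception as exc:  # medcat or the model pack may be unavailable
    print(f"Sketch only, could not run MedCAT: {exc}")
```

The point of the sketch is that document length is not an input constraint: `get_entities` takes the full text, so sentence-level vs. document-level splitting is handled for you.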