I’m interested in running MedCAT to extract all heart disease concepts, as listed in the NHS Digital SNOMED CT Browser, from some clinical text.
How do I load MedCAT with these concepts and extract them from text?
How do I collect training data and fine-tune a model for this use case?
Without going into too much detail, the steps are pretty straightforward: first build a model, then train it (both the unsupervised and the supervised training steps), then extract data.
This Discourse group is pretty responsive, so I tend to post questions here and someone usually helps debug my issues the same day! Anyway:
Create your own model:
- Construct a model: vocab, CDB, config, etc.
- Find a corpus of documents similar to those you need to extract information from, and follow the unsupervised training steps.
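For the heart disease use case specifically, "all heart disease concepts" usually means the Heart disease concept (SNOMED CT 56265001) plus everything below it in the is-a hierarchy. Here is a minimal sketch of collecting that set, assuming you already have a parent-to-children mapping of concept IDs. Where that mapping comes from is an assumption: you could build it from the SNOMED CT relationship file, and some SNOMED-based MedCAT CDBs carry one in `cdb.addl_info`, so check what your model actually contains. The function name `descendants` and the child IDs in the toy hierarchy are hypothetical placeholders.

```python
from collections import deque
from typing import Dict, Iterable, Set


def descendants(pt2ch: Dict[str, Iterable[str]], root: str, include_root: bool = True) -> Set[str]:
    """Return every concept reachable from `root` via parent -> child (is-a) links.

    `pt2ch` maps a parent concept ID to its direct children. This is a
    breadth-first search with a `seen` set, so concepts with several parents
    (common in SNOMED CT) are visited once and a cycle cannot loop forever.
    """
    seen: Set[str] = {root}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in pt2ch.get(parent, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    if not include_root:
        seen.discard(root)
    return seen


# Toy hierarchy with HYPOTHETICAL child IDs; in practice this mapping would
# come from your SNOMED CT release or from your CDB.
HEART_DISEASE = "56265001"  # SNOMED CT: Heart disease (disorder)
toy_pt2ch = {
    HEART_DISEASE: ["child-A", "child-B"],
    "child-A": ["grandchild-A1"],
    "child-B": ["grandchild-A1"],  # second parent: still counted only once
}

heart_cuis = descendants(toy_pt2ch, HEART_DISEASE)
print(sorted(heart_cuis))
# ['56265001', 'child-A', 'child-B', 'grandchild-A1']
```

The resulting set can then serve as a CUI filter so the model only returns these concepts. In MedCAT v1 this sits under the linking config's filters (`config.linking.filters.cuis`), but verify the attribute path against the docs for the version you install.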
Once you have a pre-trained model, it’s time to fine-tune it…
Pre-trained Model:
- Fine-tune the model: use MedCATtrainer to create a labelled dataset, then run supervised training/fine-tuning and train the meta-annotations. Use the same labelling step to create a training dataset for your own customisable meta-annotations.
- Run your model and annotate documents with the full MedCAT pipeline, including meta-annotations.
- Create fancy visualisations of the insights from your extracted data.
- Show off your work to the MedCAT community through this Discourse group.
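For the extraction step, here is a small post-processing sketch: it keeps only the heart disease concepts from a result dict, optionally only mentions with a given meta-annotation value, and counts mentions per concept as input for a visualisation. It assumes the dict shape that `cat.get_entities(text)` returns in MedCAT v1 (an `"entities"` mapping whose values carry `cui`, `pretty_name`, `start`, `end` and `meta_anns`). The helper names are hypothetical, the example result is hand-written, and the meta-annotation name `"Presence"` and its values are hypothetical too, since they depend on which meta-annotation models you train.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional, Set


def heart_disease_mentions(
    result: Dict[str, Any],
    wanted_cuis: Set[str],
    meta_name: Optional[str] = None,
    meta_value: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Keep only entities whose CUI is in `wanted_cuis`, ordered by start offset.

    If `meta_name` is given, also require that meta-annotation's "value"
    to equal `meta_value` (e.g. to drop negated mentions).
    """
    kept = []
    for ent in result.get("entities", {}).values():
        if ent.get("cui") not in wanted_cuis:
            continue
        if meta_name is not None:
            meta = ent.get("meta_anns", {}).get(meta_name, {})
            if meta.get("value") != meta_value:
                continue
        kept.append(ent)
    return sorted(kept, key=lambda e: e.get("start", 0))


def count_by_concept(results: Iterable[Dict[str, Any]], wanted_cuis: Set[str]) -> Counter:
    """Count mentions per concept name across many documents' results."""
    counts: Counter = Counter()
    for result in results:
        for ent in heart_disease_mentions(result, wanted_cuis):
            counts[ent.get("pretty_name", ent["cui"])] += 1
    return counts


# Hand-written, HYPOTHETICAL result in the assumed get_entities shape.
example = {
    "entities": {
        0: {"cui": "child-A", "pretty_name": "Condition A", "start": 40, "end": 51,
            "meta_anns": {"Presence": {"value": "True"}}},
        1: {"cui": "999999", "pretty_name": "Unrelated", "start": 0, "end": 9,
            "meta_anns": {}},
        2: {"cui": "child-B", "pretty_name": "Condition B", "start": 12, "end": 23,
            "meta_anns": {"Presence": {"value": "False"}}},
    }
}
wanted = {"56265001", "child-A", "child-B"}

print([e["pretty_name"] for e in heart_disease_mentions(example, wanted)])
# ['Condition B', 'Condition A']
print([e["pretty_name"] for e in heart_disease_mentions(example, wanted, "Presence", "True")])
# ['Condition A']
```

Filtering as a post-processing step keeps the model's full output, so the same annotated documents can be re-filtered for other concept groups later; the alternative is to restrict the model itself with a CUI filter before running it.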