Thank you for your quick and detailed reply. I realize I forgot to include the details of the training I performed. Here they are: I trained the model on approximately 27k documents (2k + 21k + 2k + 300), all focused on psychiatry, with varied origins and lengths.
I did follow the workflow you mentioned, and I would be very happy to share my files with the community once everything is functional. I also followed your advice to analyze the training process, and it appears to be running correctly. Here is a sample of the training logs for a few documents; perhaps you'll notice something I missed? Despite this, the detected concepts still consistently return an accuracy and context_similarity of 1.
Training log
Maybe annotating name: hypothyroïdie
DEBUG:medcat.ner.vocab_based_annotator:Maybe annotating name: hypothyroïdie
NER detected an entity.
Detected name: hypothyroïdie
Link candidates: ['C0020676']
DEBUG:medcat.ner.vocab_based_annotator:NER detected an entity.
Detected name: hypothyroïdie
Link candidates: ['C0020676']
Maybe annotating name: traitement
DEBUG:medcat.ner.vocab_based_annotator:Maybe annotating name: traitement
NER detected an entity.
Detected name: traitement
Link candidates: ['C2350609']
DEBUG:medcat.ner.vocab_based_annotator:NER detected an entity.
Detected name: traitement
Link candidates: ['C2350609']
Maybe annotating name: traitement~par
DEBUG:medcat.ner.vocab_based_annotator:Maybe annotating name: traitement~par
NER detected an entity.
Detected name: traitement~par
Link candidates: ['C0678054', 'C1112342']
DEBUG:medcat.ner.vocab_based_annotator:NER detected an entity.
Detected name: traitement~par
Link candidates: ['C0678054', 'C1112342']
Maybe annotating name: crampe
DEBUG:medcat.ner.vocab_based_annotator:Maybe annotating name: crampe
NER detected an entity.
Detected name: crampe
Link candidates: ['C0026821', 'C4324339']
DEBUG:medcat.ner.vocab_based_annotator:NER detected an entity.
Detected name: crampe
Link candidates: ['C0026821', 'C4324339']
Maybe annotating name: fourmillements
DEBUG:medcat.ner.vocab_based_annotator:Maybe annotating name: fourmillements
NER detected an entity.
Detected name: fourmillements
Link candidates: ['C0016579']
DEBUG:medcat.ner.vocab_based_annotator:NER detected an entity.
Detected name: fourmillements
Link candidates: ['C0016579']
Maybe annotating name: examen~physique
DEBUG:medcat.ner.vocab_based_annotator:Maybe annotating name: examen~physique
NER detected an entity.
Detected name: examen~physique
Link candidates: ['C0031809']
DEBUG:medcat.ner.vocab_based_annotator:NER detected an entity.
Detected name: examen~physique
Link candidates: ['C0031809']
Maybe annotating name: absence
DEBUG:medcat.ner.vocab_based_annotator:Maybe annotating name: absence
NER detected an entity.
Detected name: absence
Link candidates: ['C0235956', 'C4316903']
DEBUG:medcat.ner.vocab_based_annotator:NER detected an entity.
Detected name: absence
Link candidates: ['C0235956', 'C4316903']
[...]
Updating CUI: C0020676 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0020676 with negative=False
Updating CUI: C2350609 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C2350609 with negative=False
Updating CUI: C2350609, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C2350609, with 0 negative words
Updating CUI: C2350609, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C2350609, with 0 negative words
Updating CUI: C2350609, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C2350609, with 0 negative words
Updating CUI: C2350609, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C2350609, with 0 negative words
Updating CUI: C0016579 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0016579 with negative=False
Updating CUI: C0016579, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0016579, with 0 negative words
Updating CUI: C0016579, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0016579, with 0 negative words
Updating CUI: C0016579, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0016579, with 0 negative words
Updating CUI: C0016579, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0016579, with 0 negative words
Updating CUI: C0031809 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0031809 with negative=False
Updating CUI: C0031809, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0031809, with 0 negative words
Updating CUI: C0031809, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0031809, with 0 negative words
Updating CUI: C0031809, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0031809, with 0 negative words
Updating CUI: C0031809, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0031809, with 0 negative words
Updating CUI: C0751115 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0751115 with negative=False
Updating CUI: C0027853 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0027853 with negative=False
Updating CUI: C0234146 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0234146 with negative=False
Updating CUI: C0151888 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0151888 with negative=False
Updating CUI: C0151888, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0151888, with 0 negative words
Updating CUI: C0151888, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0151888, with 0 negative words
Updating CUI: C0151888, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0151888, with 0 negative words
Updating CUI: C0151888, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0151888, with 0 negative words
Updating CUI: C1260928 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C1260928 with negative=False
Updating CUI: C1260928, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C1260928, with 0 negative words
Updating CUI: C1260928, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C1260928, with 0 negative words
Updating CUI: C1260928, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C1260928, with 0 negative words
Updating CUI: C1260928, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C1260928, with 0 negative words
Updating CUI: C0853374 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0853374 with negative=False
Updating CUI: C0853374, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0853374, with 0 negative words
Updating CUI: C0853374, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0853374, with 0 negative words
Updating CUI: C0853374, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0853374, with 0 negative words
Updating CUI: C0853374, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0853374, with 0 negative words
Updating CUI: C0856592 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856592 with negative=False
Updating CUI: C0235000 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0235000 with negative=False
Updating CUI: C0235000, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0235000, with 0 negative words
Updating CUI: C0235000, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0235000, with 0 negative words
Updating CUI: C0235000, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0235000, with 0 negative words
Updating CUI: C0235000, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0235000, with 0 negative words
Updating CUI: C0240991 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0240991 with negative=False
Updating CUI: C0240991, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0240991, with 0 negative words
Updating CUI: C0240991, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0240991, with 0 negative words
Updating CUI: C0240991, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0240991, with 0 negative words
Updating CUI: C0240991, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0240991, with 0 negative words
Updating CUI: C0541939 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0541939 with negative=False
Updating CUI: C0541939, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0541939, with 0 negative words
Updating CUI: C0541939, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0541939, with 0 negative words
Updating CUI: C0541939, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0541939, with 0 negative words
Updating CUI: C0541939, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0541939, with 0 negative words
Updating CUI: C0200631 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0200631 with negative=False
Updating CUI: C0005778 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0005778 with negative=False
Updating CUI: C0005778, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0005778, with 0 negative words
Updating CUI: C0005778, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0005778, with 0 negative words
Updating CUI: C0005778, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0005778, with 0 negative words
Updating CUI: C0005778, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0005778, with 0 negative words
Updating CUI: C0221423 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0221423 with negative=False
Updating CUI: C0024198 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0024198 with negative=False
Updating CUI: C0856593 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856593 with negative=False
Updating CUI: C0856593, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856593, with 0 negative words
Updating CUI: C0856593, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856593, with 0 negative words
Updating CUI: C0856593, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856593, with 0 negative words
Updating CUI: C0856593, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856593, with 0 negative words
Updating CUI: C0856592 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856592 with negative=False
Updating CUI: C0856592, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856592, with 0 negative words
Updating CUI: C0856592, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856592, with 0 negative words
Updating CUI: C0856592, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856592, with 0 negative words
Updating CUI: C0856592, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856592, with 0 negative words
Updating CUI: C0553794 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0553794 with negative=False
Updating CUI: C0019348 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0019348 with negative=False
Updating CUI: C0019348, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0019348, with 0 negative words
Updating CUI: C0019348, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0019348, with 0 negative words
Updating CUI: C0019348, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0019348, with 0 negative words
Updating CUI: C0019348, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0019348, with 0 negative words
Updating CUI: C0008049 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0008049 with negative=False
Updating CUI: C0856593 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856593 with negative=False
Updating CUI: C0541939 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0541939 with negative=False
Updating CUI: C0856955 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856955 with negative=False
Updating CUI: C0856955, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856955, with 0 negative words
Updating CUI: C0856955, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856955, with 0 negative words
Updating CUI: C0856955, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856955, with 0 negative words
Updating CUI: C0856955, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856955, with 0 negative words
Updating CUI: C0152025 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0152025 with negative=False
Updating CUI: C0152025, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0152025, with 0 negative words
Updating CUI: C0152025, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0152025, with 0 negative words
Updating CUI: C0152025, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0152025, with 0 negative words
Updating CUI: C0152025, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0152025, with 0 negative words
Updating CUI: C0271681 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0271681 with negative=False
Updating CUI: C0271681, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0271681, with 0 negative words
Updating CUI: C0271681, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0271681, with 0 negative words
Updating CUI: C0271681, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0271681, with 0 negative words
Updating CUI: C0271681, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0271681, with 0 negative words
Updating CUI: C0541939 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0541939 with negative=False
Updating CUI: C0541939, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0541939, with 0 negative words
Updating CUI: C0541939, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0541939, with 0 negative words
Updating CUI: C0541939, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0541939, with 0 negative words
Updating CUI: C0541939, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0541939, with 0 negative words
Updating CUI: C0040405 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405 with negative=False
Updating CUI: C0040405 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405 with negative=False
Updating CUI: C0040405, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405, with 0 negative words
Updating CUI: C0040405, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405, with 0 negative words
Updating CUI: C0040405, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405, with 0 negative words
Updating CUI: C0040405, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405, with 0 negative words
Updating CUI: C0040405 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405 with negative=False
Updating CUI: C5208163 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C5208163 with negative=False
Updating CUI: C0497156 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0497156 with negative=False
Updating CUI: C0497156 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0497156 with negative=False
Updating CUI: C0040405 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405 with negative=False
Updating CUI: C0040405, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405, with 0 negative words
Updating CUI: C0040405, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405, with 0 negative words
Updating CUI: C0040405, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405, with 0 negative words
Updating CUI: C0040405, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0040405, with 0 negative words
Updating CUI: C0194884 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0194884 with negative=False
Updating CUI: C0497156 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0497156 with negative=False
Updating CUI: C0497156, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0497156, with 0 negative words
Updating CUI: C0497156, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0497156, with 0 negative words
Updating CUI: C0497156, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0497156, with 0 negative words
Updating CUI: C0497156, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0497156, with 0 negative words
Updating CUI: C0441633 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0441633 with negative=False
Updating CUI: C0024485 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0024485 with negative=False
Updating CUI: C1518156 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C1518156 with negative=False
Updating CUI: C5849587 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C5849587 with negative=False
Updating CUI: C5849587, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C5849587, with 0 negative words
Updating CUI: C5849587, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C5849587, with 0 negative words
Updating CUI: C5849587, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C5849587, with 0 negative words
Updating CUI: C5849587, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C5849587, with 0 negative words
Updating CUI: C0014038 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0014038 with negative=False
Updating CUI: C0338430 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0338430 with negative=False
Updating CUI: C0854581 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0854581 with negative=False
Updating CUI: C0021044 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0021044 with negative=False
Updating CUI: C0856592 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856592 with negative=False
Updating CUI: C0856592, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856592, with 0 negative words
Updating CUI: C0856592, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856592, with 0 negative words
Updating CUI: C0856592, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856592, with 0 negative words
Updating CUI: C0856592, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856592, with 0 negative words
Updating CUI: C0856593 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856593 with negative=False
Updating CUI: C0856593, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856593, with 0 negative words
Updating CUI: C0856593, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856593, with 0 negative words
Updating CUI: C0856593, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856593, with 0 negative words
Updating CUI: C0856593, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0856593, with 0 negative words
Updating CUI: C0041618 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0041618 with negative=False
Updating CUI: C0041618, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0041618, with 0 negative words
Updating CUI: C0041618, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0041618, with 0 negative words
Updating CUI: C0041618, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0041618, with 0 negative words
Updating CUI: C0041618, with 0 negative words
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0041618, with 0 negative words
Updating CUI: C0014038 with negative=False
DEBUG:medcat.linking.vector_context_model:Updating CUI: C0014038 with negative=False
Updating CUI: C0014038, with 0 negative words
[...]
INFO:medcat.cdb:{
"Number of concepts": 62818,
"Number of names": 121285,
"Number of concepts that received training": 7811,
"Number of seen training examples in total": 802217,
"Average training examples per concept": 102.70349507105364
}
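As a quick sanity check on these figures, here is a minimal Python sketch that recomputes the ratios from the summary above. The variable names are my own, and the coverage figure is simply derived from the logged numbers; it is not something MedCAT reports directly.

```python
import json

# Summary values copied verbatim from the INFO:medcat.cdb entry above.
cdb_summary = json.loads("""
{
  "Number of concepts": 62818,
  "Number of names": 121285,
  "Number of concepts that received training": 7811,
  "Number of seen training examples in total": 802217,
  "Average training examples per concept": 102.70349507105364
}
""")

total_concepts = cdb_summary["Number of concepts"]
trained_concepts = cdb_summary["Number of concepts that received training"]
seen_examples = cdb_summary["Number of seen training examples in total"]

# Share of the concept database that received any training at all.
coverage = trained_concepts / total_concepts

# Average over trained concepts only; this matches the logged average.
avg_per_trained = seen_examples / trained_concepts

# Average over every concept in the database, trained or not.
avg_per_concept = seen_examples / total_concepts

print(f"Trained concepts: {trained_concepts}/{total_concepts} ({coverage:.1%})")
print(f"Examples per trained concept: {avg_per_trained:.2f}")
print(f"Examples per concept overall: {avg_per_concept:.2f}")
```

On these numbers, roughly one concept in eight received any training, so the logged average of about 102.7 examples describes only that subset of the concept database.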
Do you think this could be due to a problem in the training phase, or that my training data (in terms of quantity or quality) might be insufficient?
On a related note, I'd also like to ask about contextualization, specifically the detection of present/absent status for each identified concept. In the MedCAT tutorial for this step, it looks like a JSON file generated using medcattrainer is required. Can you confirm whether using this feature in MedCAT necessarily requires manually training a model via medcattrainer? Or is there another way to implement it?
Thanks again for your responsiveness. I’m really looking forward to fully using MedCAT in French and would be happy to share our results with the community.