Context vectors and the 4 context types

Michale_Angelo · October 7, 2022, 4:27pm

A quick question about the context vectors stored in unsupervised_trained_cdb.cui2context_vectors. There appear to be 4 context types: ['xlong', 'long', 'medium', 'short'].

After unsupervised training, the total number of CUIs that have received some form of training (calculated from len(cat.cui2count_train)) is 37521. When I run len(cat.cui2context_vectors), I see that there are also 37521 CUIs there.

However, when I break it down by context type, the counts are not all 37521:
Counter({'xlong': 39228, 'long': 39228, 'medium': 39220, 'short': 39092})
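
For reference, a breakdown like the one above can be computed roughly as follows. This is only a minimal sketch: it assumes cui2context_vectors is a dict mapping each CUI to a dict of {context type: vector}, and it runs on made-up toy data rather than a real CDB. The helper names count_context_types and cuis_missing_types are hypothetical, not part of MedCAT.

```python
from collections import Counter

def count_context_types(cui2context_vectors):
    """Count how many CUIs have a vector for each context type."""
    counts = Counter()
    for vectors_by_type in cui2context_vectors.values():
        # Each inner dict is assumed to be keyed by context type name.
        counts.update(vectors_by_type.keys())
    return counts

def cuis_missing_types(cui2context_vectors, expected_types):
    """Return {cui: [missing context types]} for CUIs lacking any expected type."""
    missing = {}
    for cui, vectors_by_type in cui2context_vectors.items():
        absent = [t for t in expected_types if t not in vectors_by_type]
        if absent:
            missing[cui] = absent
    return missing

# Toy stand-in for cdb.cui2context_vectors (assumed structure, fake CUIs and vectors).
toy_vectors = {
    "C001": {"xlong": [0.1], "long": [0.2], "medium": [0.3], "short": [0.4]},
    "C002": {"xlong": [0.1], "long": [0.2], "medium": [0.3]},
    "C003": {"xlong": [0.1], "long": [0.2]},
}

counts = count_context_types(toy_vectors)
missing = cuis_missing_types(toy_vectors, ["xlong", "long", "medium", "short"])
print(counts)
print(missing)
```

Listing the CUIs that lack a given type, as the second helper does, makes it possible to look at which concepts are affected rather than only the totals.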

In short, my question is: why isn't every concept that receives training assigned a vector for each context type? Is there a specific reason for this?

Thanks in advance!
