Context vectors and the 4 context types

A quick question about the context vectors stored in unsupervised_trained_cdb.cui2context_vectors. There appear to be 4 context types: ['xlong', 'long', 'medium', 'short'].

After unsupervised training, the total number of CUIs that have received some form of training (calculated from len(cat.cui2count_train)) is 37521. When I do len(cat.cui2context_vectors), I see that there are also 37521 CUIs there.

However, when I break it down by context type, the counts are not all 37521:
Counter({'xlong': 39228, 'long': 39228, 'medium': 39220, 'short': 39092})
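
For reference, here is a minimal sketch of how a per-type breakdown like this can be computed. It assumes cui2context_vectors is a dict mapping each CUI to a dict of {context type: vector}; the helper names and the toy data are made up for illustration and are not part of MedCAT.

```python
from collections import Counter

EXPECTED_TYPES = ("xlong", "long", "medium", "short")


def count_context_types(cui2context_vectors):
    """Count how many CUIs have a vector for each context type."""
    counts = Counter()
    for vectors_by_type in cui2context_vectors.values():
        # Each inner dict is assumed to be keyed by context type name.
        counts.update(vectors_by_type.keys())
    return counts


def cuis_missing_types(cui2context_vectors, expected=EXPECTED_TYPES):
    """Return {cui: [missing context types]} for CUIs lacking any expected type."""
    missing = {}
    for cui, vectors_by_type in cui2context_vectors.items():
        absent = [t for t in expected if t not in vectors_by_type]
        if absent:
            missing[cui] = absent
    return missing


# Toy stand-in for cdb.cui2context_vectors (hypothetical CUIs and vectors).
toy = {
    "C0000001": {"xlong": [0.1], "long": [0.2], "medium": [0.3], "short": [0.4]},
    "C0000002": {"xlong": [0.1], "long": [0.2], "medium": [0.3]},
    "C0000003": {"xlong": [0.1], "long": [0.2]},
}

counts = count_context_types(toy)
missing = cuis_missing_types(toy)
print(counts)
print(missing)
```

The second helper lists which CUIs lack which context types, which makes it easier to inspect the concepts behind the differing counts.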

In short, my question is: why isn't every concept that receives training assigned a vector for each context type? Is there a specific reason for this?

Thanks in advance!
