I have a quick question about the context vectors stored in unsupervised_trained_cdb.cui2context_vectors. There appear to be 4 context types: ['xlong', 'long', 'medium', 'short'].
After unsupervised training, the total number of CUIs that have received some form of training (calculated from len(cat.cui2count_train)) is 37521. When I run len(cat.cui2context_vectors), I see that there are also 37521 CUIs there.
However, when I break it down by context type, the counts are not all 37521:
Counter({'xlong': 39228, 'long': 39228, 'medium': 39220, 'short': 39092})
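For reference, a per-context-type breakdown like the one above can be computed roughly as follows. This is only a minimal sketch: it assumes cui2context_vectors is a dict mapping each CUI to a dict of context type to vector, and the toy data and helper names (count_context_types, cuis_missing_types) are hypothetical stand-ins, not part of MedCAT.

```python
from collections import Counter

# Context types mentioned in the question.
EXPECTED_TYPES = ("xlong", "long", "medium", "short")


def count_context_types(cui2context_vectors):
    """Count how many CUIs have a vector stored for each context type."""
    counts = Counter()
    for vectors in cui2context_vectors.values():
        # Each inner dict is assumed to be {context_type: vector}.
        counts.update(vectors.keys())
    return counts


def cuis_missing_types(cui2context_vectors, expected=EXPECTED_TYPES):
    """Return {cui: [missing context types]} for CUIs lacking any expected type."""
    missing = {}
    for cui, vectors in cui2context_vectors.items():
        absent = [t for t in expected if t not in vectors]
        if absent:
            missing[cui] = absent
    return missing


# Toy stand-in for cdb.cui2context_vectors (hypothetical data).
toy_vectors = {
    "C001": {"xlong": [0.1], "long": [0.2], "medium": [0.3], "short": [0.4]},
    "C002": {"xlong": [0.1], "long": [0.2], "medium": [0.3]},
    "C003": {"xlong": [0.1], "long": [0.2]},
}

counts = count_context_types(toy_vectors)
missing = cuis_missing_types(toy_vectors)
print(counts)
print(missing)
```

Listing the CUIs that lack a given context type, as the second helper does, may make it easier to see which concepts are affected.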
In short, why isn't every concept that receives training assigned a vector for each context type? Is there a specific reason for this?
Thanks in advance!