Accessing MedCAT entities' concept embeddings

Hello. Is there a way to access the embeddings of the concepts that have been linked to the CDB?

Dear @zeljko, is there a way to access the context embeddings of the linked entities?

Hi @Hideaki, unfortunately that is not possible. Storing them would consume too much memory on big runs, so the context embeddings are calculated during the Linking phase but not stored anywhere. You can of course access the embeddings of concepts from the CDB.

Thank you, @zeljko. I’ve used cat.cdb.cui2context_vectors on a few CUIs. For most of them, I am able to access the four vector types. However, I get a KeyError with CUIs 840539006 and 91637004. I'm not sure what I’ve done wrong.

Also, may I check where the original embeddings for concepts in the CDB come from?

Hi @Hideaki,

Re the key error: not all CUIs have embeddings. It depends on whether the CUI received any training.
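For anyone hitting the same KeyError, here is a minimal sketch of a lookup that tolerates CUIs without embeddings. It assumes cat.cdb.cui2context_vectors behaves like a plain dict mapping each CUI to a dict of context-type vectors. The stand-in data, the context-type names, and the helper get_context_vectors are made up for illustration and are not part of MedCAT.

```python
from typing import Dict, List, Optional

# Stand-in for cat.cdb.cui2context_vectors: CUI -> {context type -> vector}.
# The CUI "C0000001", the type names, and the numbers are illustrative only.
cui2context_vectors: Dict[str, Dict[str, List[float]]] = {
    "C0000001": {"long": [0.1, 0.2], "short": [0.3, 0.4]},
}


def get_context_vectors(
    mapping: Dict[str, Dict[str, List[float]]], cui: str
) -> Optional[Dict[str, List[float]]]:
    """Return the vectors stored for `cui`, or None if the CUI has none."""
    # dict.get avoids the KeyError raised by mapping[cui] for missing CUIs.
    return mapping.get(cui)


for cui in ["C0000001", "840539006"]:
    vectors = get_context_vectors(cui2context_vectors, cui)
    if vectors is None:
        print(f"{cui}: no context vectors stored")
    else:
        print(f"{cui}: vector types {sorted(vectors)}")
```

With a loaded model you would pass cat.cdb.cui2context_vectors instead of the stand-in dict.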

The original embeddings for concepts in the CDB come from the unsupervised training. Have a look at the MedCAT paper: it explains the training procedure and how the Vocab (word embeddings) is used to make concept embeddings.

Thank you, @zeljko! We received a private model from a different Trust, so I used cat.multiprocessing to annotate and then checked the embeddings of the concepts. I suppose those CUIs did not receive training at the original organisation.

@Hideaki just to double-check whether the concepts have received any training, you can explore:

cat.cdb.cui2count_train['<CUI OF INTEREST>']
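To check several CUIs at once, something like the sketch below could work. It assumes cat.cdb.cui2count_train behaves like a plain dict from CUI to training count, and that a CUI missing from it has had no training. The stand-in data and the helpers training_count and untrained_cuis are hypothetical, not MedCAT functions.

```python
from typing import Dict, List

# Stand-in for cat.cdb.cui2count_train: CUI -> number of training examples.
# The CUI "C0000001" and its count are made up for illustration.
cui2count_train: Dict[str, int] = {"C0000001": 12}


def training_count(counts: Dict[str, int], cui: str) -> int:
    """Return the training count for `cui`, treating a missing CUI as 0."""
    return counts.get(cui, 0)


def untrained_cuis(counts: Dict[str, int], cuis: List[str]) -> List[str]:
    """Return the CUIs from `cuis` that have no recorded training."""
    return [cui for cui in cuis if training_count(counts, cui) == 0]


cuis_to_check = ["C0000001", "840539006", "91637004"]
print(untrained_cuis(cui2count_train, cuis_to_check))
```

With a loaded model you would pass cat.cdb.cui2count_train in place of the stand-in dict.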
