Loopback between contexts and concepts during training

csep · October 24, 2022, 12:24pm

Hi,
We are trying to understand whether, in the concept embeddings MedCAT learns, semantically related concepts can be expected to lie close to each other. Using t-SNE to visualise these embeddings, we already see that concepts in similar categories (e.g. drugs, symptoms, diseases) are close to each other. However, it would be really great if related concepts in different categories were close to each other too (e.g. a disease and a closely related drug).
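To make concrete what we mean by "close", here is a minimal sketch of the check we have in mind, using cosine similarity. The vectors and concept names are made up for illustration; they are not real MedCAT output, and `rank_neighbours` is a hypothetical helper, not part of any library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Made-up 3-d vectors standing in for learned concept embeddings (assumption).
embeddings = {
    "headache": [0.9, 0.1, 0.2],  # a symptom
    "aspirin": [0.7, 0.3, 0.1],   # a drug we would like to see nearby
    "insulin": [0.1, 0.9, 0.3],   # an unrelated drug
}


def rank_neighbours(name, embeddings):
    """Return the other concepts sorted from most to least similar to `name`."""
    target = embeddings[name]
    scored = [
        (other, cosine(target, vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


neighbours = rank_neighbours("headache", embeddings)
```

With real embeddings, the question would be whether a related drug ranks above unrelated concepts from the symptom's own category.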

We understand that the embeddings are learned from the contexts in which these concepts appear, and since drugs and diseases are likely to appear in different contexts, we wonder whether we can expect this to happen at all. Our thinking is that it might happen if these related concepts appear in each other's contexts (at least in the wider ones), in which case their embeddings could slowly converge.

This prompted a question we don’t know the answer to: during training, are the evolving concept embeddings used in the context embeddings in any way? For example, if aspirin (as a token) appears in the context of headache, and the embedding of the aspirin concept is updated, will that change the context vector of this occurrence of headache? Or are context embeddings calculated from a fixed vocabulary that maps tokens to vectors and is never updated?
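In case it helps to pin down the question, here is a toy sketch of the two alternatives as we picture them. It is not MedCAT code: the averaging context vector, the update rule, the learning rate, and all names and numbers are assumptions made purely for illustration.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def context_fixed(tokens, token_vectors):
    """Alternative A: context built only from a fixed token vocabulary."""
    return mean_vector([token_vectors[t] for t in tokens])


def context_loopback(tokens, token_vectors, concept_vectors):
    """Alternative B: a token with a learned concept vector uses that instead."""
    return mean_vector([concept_vectors.get(t, token_vectors[t]) for t in tokens])


def update_concept(concept_vec, context_vec, lr=0.5):
    """Hypothetical update: move the concept vector towards a context vector."""
    return [c + lr * (x - c) for c, x in zip(concept_vec, context_vec)]


# Made-up 2-d vectors (assumption).
token_vectors = {"took": [0.0, 1.0], "aspirin": [1.0, 0.0], "for": [0.0, 0.0]}
concept_vectors = {"aspirin": [0.0, 0.0]}
headache_context = ["took", "aspirin", "for"]  # tokens around "headache"

fixed_before = context_fixed(headache_context, token_vectors)
loop_before = context_loopback(headache_context, token_vectors, concept_vectors)

# The aspirin concept is updated from some other occurrence in the corpus.
concept_vectors["aspirin"] = update_concept(concept_vectors["aspirin"], [1.0, 1.0])

fixed_after = context_fixed(headache_context, token_vectors)
loop_after = context_loopback(headache_context, token_vectors, concept_vectors)

print("fixed vocabulary changed:", fixed_before != fixed_after)
print("loopback changed:", loop_before != loop_after)
```

Under alternative A the context of headache is untouched by the aspirin update, while under alternative B it shifts, which is the behaviour we are asking about.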

Many thanks for the insight.
