We are trying to understand whether semantically related concepts can be expected to lie close to each other in the concept embeddings that MedCAT learns. Visualising these embeddings with t-SNE, we already see that concepts within a category (e.g. drugs, symptoms, diseases) are close to each other; however, it would be very useful if related concepts from different categories were close to each other too (e.g. a disease and a closely related drug).
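To make concrete what we mean by "close", here is a minimal sketch of the check we have in mind. The vectors and concept names are made up for illustration and are not output from MedCAT; `cosine_similarity` and `rank_neighbours` are our own hypothetical helpers, not part of any library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 3-d vectors standing in for learned concept embeddings (assumption).
toy_embeddings = {
    "headache": [0.9, 0.1, 0.2],   # a symptom
    "aspirin": [0.7, 0.3, 0.6],    # a drug we would like to see nearby
    "fracture": [-0.2, 0.9, 0.1],  # an unrelated concept
}


def rank_neighbours(name, embeddings):
    """Return the other concepts sorted from most to least similar."""
    query = embeddings[name]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for concept, score in rank_neighbours("headache", toy_embeddings):
    print(f"{concept}: {score:.3f}")
```

With real embeddings, the hope would be that a related drug ranks above unrelated concepts for a given disease or symptom, even though the two belong to different categories.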
We understand that the embeddings are learned from the contexts in which these concepts appear, and since drugs and diseases are likely to appear in different contexts, we wonder whether we can expect this to happen at all. Our thinking is that it might happen if related concepts appear in each other's contexts (at least in the wider ones), in which case their embeddings could slowly converge.
This prompted a question we don’t know the answer to: during training, are the evolving concept embeddings used in the context embeddings in any way (e.g. if aspirin [as a token] appears in the context of headache and the embedding of the aspirin concept is updated, does that change the context vector of this occurrence of headache?), or are context embeddings calculated from a fixed vocabulary that maps tokens to vectors and is never updated?
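In case it helps to pin down the question, here is a toy sketch of the two scenarios we are contrasting. It is purely illustrative: the averaging of token vectors, the `nudge` update rule, and all names and numbers are our own assumptions, not MedCAT's API, and we do not know which scenario (if either) matches what MedCAT does.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def context_vector(context_tokens, lookup):
    """Average the vectors of the context tokens (an assumed scheme)."""
    return mean_vector([lookup[token] for token in context_tokens])


def nudge(vec, target, lr=0.5):
    """Move vec a fraction lr of the way towards target (assumed update rule)."""
    return [v + lr * (t - v) for v, t in zip(vec, target)]


# Fixed token vocabulary: never changes during training in scenario A.
token_vectors = {"took": [0.0, 1.0], "aspirin": [1.0, 0.0], "for": [0.0, 0.0]}

# Learned concept embedding for aspirin, initialised from its token vector.
concept_vectors = {"aspirin": [1.0, 0.0]}

# Tokens surrounding one occurrence of "headache".
headache_context = ["took", "aspirin", "for"]

# Training updates the aspirin concept embedding somewhere else in the corpus.
concept_vectors["aspirin"] = nudge(concept_vectors["aspirin"], [0.0, 1.0])

# Scenario A: contexts are built from the fixed vocabulary only,
# so the update above does not affect the headache context.
ctx_fixed = context_vector(headache_context, token_vectors)

# Scenario B: concept embeddings override token vectors when available,
# so the update above does change the headache context.
coupled_lookup = {**token_vectors, **concept_vectors}
ctx_coupled = context_vector(headache_context, coupled_lookup)

print("fixed vocabulary:", ctx_fixed)
print("coupled lookup:  ", ctx_coupled)
```

In scenario A the context of headache is the same before and after the aspirin update, so related concepts would only converge if their surrounding tokens overlap; in scenario B the updated aspirin embedding feeds back into headache's context, which is the mechanism we were wondering about.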
Many thanks for the insight.