Reusing Vocabulary files

I got access to your “SNOMED INT enriched with UMLS and trained unsupervised on MIMIC-III” model.

  1. Just to be sure, I want to confirm that it is alright to use its vocabulary to build a new model alongside our existing SNOMED-Canada-based CDB. We created this CDB in-house.
  2. Could you also confirm the corpus that was used to create this vocab, please (UMLS Metathesaurus + Wikipedia)?
  3. I’m curious, do you think creating a new vocabulary using Wikipedia and MIMIC-III together would lead to a noticeable boost in performance?

Thank you

  1. That should be fine. In fact, I’d be surprised if others hadn’t already reused it.
  2. That should indeed be the corpus the Vocab is based on.
  3. It’s hard to tell. You can always try it out if you’re curious. The Vocab is used for context embeddings for concepts, so if you create one that better captures the embeddings of the relevant words (or one that simply has embeddings for more words), it could very well lead to better performance. I’m sure it’s possible to do, though I don’t know whether it’s easy.
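
To make the "embeddings for more words" point concrete, here is a rough sketch in plain Python. It does not use the MedCAT API: the `tokenize`, `coverage`, and `context_embedding` helpers, the toy vocabularies, and the averaging scheme are all assumptions made up for illustration. It only shows why a vocabulary that covers more of the words in your notes gives a context vector built from more information.

```python
import re


def tokenize(text):
    # Simple lowercase word tokenizer; purely illustrative, not MedCAT's tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def coverage(vocab, text):
    """Fraction of tokens in `text` that have an embedding in `vocab`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in vocab for t in tokens) / len(tokens)


def context_embedding(vocab, text):
    """Average the vectors of in-vocabulary tokens; None if no token is covered."""
    vectors = [vocab[t] for t in tokenize(text) if t in vocab]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy vocabularies with made-up 2-dimensional vectors (hypothetical data).
small_vocab = {"patient": [1.0, 0.0], "pain": [0.0, 1.0]}
large_vocab = dict(small_vocab, chest=[1.0, 1.0], reports=[0.5, 0.5])

note = "Patient reports chest pain"
print(coverage(small_vocab, note))           # half of the tokens are covered
print(coverage(large_vocab, note))           # every token is covered
print(context_embedding(large_vocab, note))  # average over all four tokens
```

Running a coverage check like this on a sample of your own clinical text, with the existing vocab and with a candidate Wikipedia + MIMIC-III vocab, would be a cheap first signal before retraining anything.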