Hello, as I understand it, I can create meta-annotation models to, for example, indicate negation as the mc_status model does. I have seen the notebook “Part 4.2 - Supervised Training and Meta-annotations.ipynb”, which essentially does the following:
Hi @bkakke - welcome to the CogStack discourse community!
Answers to your questions:
Yes - we have two alternative implementations so far: one using a Bi-LSTM and another via a Transformer, i.e. BERT. You can extend the API and re-use the base classes, configs etc. for your own implementations here. Each model implementation that actually does the heavy lifting is here
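To illustrate the extension pattern, here is a minimal sketch. The names (`BaseMetaModel`, `predict`, the config dict) are hypothetical and simplified, not MedCAT's exact API; the point is that each model sits behind a shared interface, so a Bi-LSTM, a BERT classifier, or your own implementation is interchangeable inside the MetaCAT wrapper.

```python
from abc import ABC, abstractmethod

class BaseMetaModel(ABC):
    """Hypothetical shared contract for meta-annotation classifiers."""
    def __init__(self, config):
        self.config = config

    @abstractmethod
    def predict(self, token_ids, concept_position):
        """Return a label id for the concept at concept_position,
        given the tokenized context."""

class MajorityClassModel(BaseMetaModel):
    """Trivial stand-in implementation: always predicts the
    configured default label (e.g. 'Affirmed')."""
    def predict(self, token_ids, concept_position):
        return self.config["default_label_id"]

model = MajorityClassModel({"default_label_id": 0})
print(model.predict([12, 7, 99], concept_position=1))  # 0
```

A real implementation would replace `MajorityClassModel` with a trained network, but would keep the same interface so the rest of the pipeline stays unchanged.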
Yes, exactly - the input is the concept that has been identified and extracted by the NER+L MedCAT pipe, plus the surrounding context.
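A quick sketch of what "concept plus surrounding context" means as classifier input. This is illustrative only (the function and window sizes are assumptions, not MedCAT's actual implementation): a fixed number of tokens on each side of the detected concept span is taken as the context.

```python
def context_window(tokens, start, end, cntx_left=5, cntx_right=5):
    """Build a meta-annotation classifier input: cntx_left tokens before
    the concept span [start:end), the span itself, and cntx_right after."""
    left = tokens[max(0, start - cntx_left):start]
    concept = tokens[start:end]
    right = tokens[end:end + cntx_right]
    return left + concept + right

tokens = "the patient shows no sign of diabetes at this time".split()
# "diabetes" is the extracted concept at index 6
print(context_window(tokens, 6, 7, cntx_left=3, cntx_right=2))
# → ['no', 'sign', 'of', 'diabetes', 'at', 'this']
```

The negation cue ("no sign of") lands inside the window, which is exactly the signal a model like mc_status needs.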
Each model could theoretically have its own tokenizer, but in practice you can use the same BBPE (or BertTokenizer, i.e. WordPiece) tokenizer across the different tasks for which you’ll train different MetaCAT models. We use a BBPE or WordPiece tokenizer here because these are more effective tokenization methods for this classification scenario: with BBPE / WordPiece, the vocab is not driven by the clinical terminology but is built directly from the corpus, so sub-word tokens can be used and word vectors learnt, alleviating the OOV problem seen with non-subword methods, e.g. Word2Vec. Sub-word tokenization doesn’t make sense for NER+L, i.e. the MedCAT problem, as there we are ultimately aiming to link full tokens to somewhere in the configured terminology; we don’t care about learning a sub-word latent space that lets us perform some abstract downstream task such as classification, inference etc.
MetaCAT models are configured via the medcat.config_meta_cat.ConfigMetaCAT class. This defines which labels are predicted and what each label maps to in human-readable form. Collecting these annotations (i.e. labelled data) is done via the MedCATtrainer interface.
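As a sketch of the kind of information such a config carries (a plain dict here, with simplified field names rather than the exact ConfigMetaCAT schema): the task name and the mapping between human-readable label values and the integer ids the classifier predicts.

```python
# Illustrative meta-annotation task config, e.g. a negation/status task
meta_config = {
    "category_name": "Status",
    "category_value2id": {"Affirmed": 0, "Negated": 1},
}

# Invert the mapping to decode a model prediction back to its label
id2value = {i: v for v, i in meta_config["category_value2id"].items()}
print(id2value[1])  # → Negated
```

The labelled examples that populate these categories are what you export from a MedCATtrainer project and feed into supervised training.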