Impact of filters on MedCAT annotations

With MedCAT, how do I apply filters when creating particular projects, e.g. CUI-based filters, source value-based filters and specific semantic type filters?
What should I consider in order to improve the annotations?

Hi, Komal!

It can depend on what you’re trying to accomplish exactly.
I’ll start by quoting the comment from CAT.train_supervised:

When filtering, the filters within the CAT model are used first, then the ones from MedCATtrainer (MCT) export filters, and finally the extra_cui_filter (if set). That is to say, the expectation is: extra_cui_filter ⊆ MCT filter ⊆ Model/config filter.

To elaborate a little bit:

  1. Somewhat obviously, no concept can be trained if it’s not included in the CDB.
  2. You can also set up filters in config.linking.filters. There’s some documentation here.
     • You can explicitly allow only a subset of CUIs
     • Or exclude a subset of CUIs
  3. When performing supervised training, the MedCATtrainer export can define its own filters. These are applied on top of the filters in the config.
  4. When performing supervised training, additional/extra CUI filters can be specified via extra_cui_filter. These are applied on top of the previous ones.
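
To make the layering a bit more concrete, here is a minimal plain-Python sketch of the nesting expectation quoted above. It is not MedCAT's own code: the function name effective_cui_filter and the example CUIs are made up for illustration, and treating an empty or missing layer as "no restriction" is an assumption of the sketch.

```python
from typing import Iterable, Optional, Set


def effective_cui_filter(
    model_filter: Optional[Iterable[str]],
    mct_filter: Optional[Iterable[str]] = None,
    extra_cui_filter: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Walk the filter layers from outermost to innermost.

    Assumption of this sketch: an empty or missing layer means
    "no restriction", so an empty result means nothing is filtered.
    """
    layers = (
        ("model/config filter", model_filter),
        ("MCT export filter", mct_filter),
        ("extra_cui_filter", extra_cui_filter),
    )
    current: Set[str] = set()
    for name, layer in layers:
        layer_set = set(layer or ())
        if not layer_set:
            continue  # this layer adds no restriction
        # Each inner layer is expected to be a subset of the outer one.
        if current and not layer_set <= current:
            stray = sorted(layer_set - current)
            raise ValueError(f"{name} has CUIs outside the outer filter: {stray}")
        current = layer_set
    return current


# Hypothetical CUIs, purely for illustration.
model_cuis = {"C0000001", "C0000002", "C0000003"}
mct_cuis = {"C0000001", "C0000002"}
extra_cuis = {"C0000001"}
active = effective_cui_filter(model_cuis, mct_cuis, extra_cuis)
```

The point of the check is that a CUI listed only in an inner layer would not behave the way you might expect, so it is worth validating your filter sets before training.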

With that said, if you’re only working with a small subset of CUIs, you might be better off filtering the CDB itself, i.e. with CDB.filter_by_cui.
The advantage of this approach is that you’d be working with a smaller model (in terms of file size on disk as well as memory footprint while using the model).
The disadvantage could be that you’d need to redo the training if/when you wanted to add new CUIs.
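
As a rough sketch of that approach: the helper below (shrink_cdb is a made-up name) filters a CDB in place and reports which of the requested CUIs were not in it to begin with. It assumes the CDB object exposes filter_by_cui as mentioned above, plus a cui2names mapping as in MedCAT v1; the latter may differ in other versions.

```python
from typing import Iterable, Set


def shrink_cdb(cdb, cuis_to_keep: Iterable[str]) -> Set[str]:
    """Filter `cdb` in place; return the requested CUIs it did not contain."""
    wanted = set(cuis_to_keep)
    # Assumption: MedCAT v1 layout, where cdb.cui2names maps each CUI to its names.
    missing = {cui for cui in wanted if cui not in cdb.cui2names}
    # Keep only the CUIs that actually exist; the CDB is modified in place.
    cdb.filter_by_cui(wanted - missing)
    return missing
```

You would load the CDB before calling this and save it under a new name afterwards, so the full CDB stays available in case you need to add CUIs later. How loading and saving work depends on your MedCAT version.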

If you have any further questions, don’t hesitate to ask.
