MetaCAT - Issue with training when there's more than 2 output classes

As above, I keep getting the error ‘Target 2 is out of bounds’ when trying to train a MetaCAT model with 3 possible outputs. From this thread, IndexError: Target 2 is out of bounds - #12 by Vijaya_kumar - vision - PyTorch Forums, I believe it has to do with the input tensor being the wrong size, but I’m struggling to debug it from the GitHub code. This happens with both the model.train_raw and model.train_from_json functions. Any thoughts?

Thanks

I think the issue is that the meta_cat_config.model.nclasses value is 2 while the number of keys in meta_cat_config.general.category_value2id is 3.
So if you set meta_cat_config.model.nclasses = 3 this should be resolved.
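
For anyone wanting to check this before training, here is a minimal sketch in plain Python. It relies only on the general PyTorch rule that classification targets must lie in the range 0 to nclasses - 1. The helper name check_nclasses and the example mapping are hypothetical and are not part of MedCAT.

```python
def check_nclasses(category_value2id: dict, nclasses: int) -> list:
    """Return the category values whose ID would be out of bounds
    for a model with `nclasses` outputs (valid IDs are 0..nclasses-1)."""
    return sorted(
        name
        for name, idx in category_value2id.items()
        if not 0 <= idx < nclasses
    )


# Hypothetical three-class mapping, for illustration only.
mapping = {"class_a": 0, "class_b": 1, "class_c": 2}

# With nclasses = 2, ID 2 has no matching output: "Target 2 is out of bounds".
print(check_nclasses(mapping, 2))

# With nclasses = 3, every ID fits.
print(check_nclasses(mapping, 3))
```

An empty list means every ID in the mapping fits within the configured number of outputs.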

It seems to struggle when you try to condense several classes into one. In my case this was:

  • config_metacat.general['category_value2id'] = {'patient': 0, 'family': 1, 'other': 1}
  • config_metacat.model['nclasses'] = 2
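
To make the mismatch in this setup concrete, here is a small sketch in plain Python using the mapping above. It only shows that counting keys and counting distinct IDs give different answers for a condensed mapping. It does not reproduce MetaCAT's internals, and it is an assumption on my part that this difference is related to the failure.

```python
# The condensed mapping from the config above: two values share ID 1.
category_value2id = {"patient": 0, "family": 1, "other": 1}

n_keys = len(category_value2id)                        # number of category values
n_distinct_ids = len(set(category_value2id.values()))  # number of distinct IDs
min_outputs = max(category_value2id.values()) + 1      # outputs needed for the largest ID

print(n_keys, n_distinct_ids, min_outputs)  # 3 2 2
```

Any code path that derives the number of classes from the number of keys would see 3 here, while nclasses is 2.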

I’ve had to edit the MetaCAT source code to fix this.

I don’t believe this is something MetaCAT is designed to do. The expectation is a one-to-one mapping from category values to their corresponding IDs.

With that said, since release 1.16 (in a pre-release state right now - so not available on PyPI yet) we will be supporting alternative_category_names. The change was introduced here: Updates for MetaCAT by shubham-s-agarwal · Pull Request #515 · CogStack/MedCAT · GitHub
You can look at the docs here.

Thanks, I’ll have a look. FYI the condensing of classes works with the BERT MetaCAT models, just not the LSTM ones. I’m trying to debug this at the moment as they share the same pipeline so it seems to be a model specific issue.

If they work with something, then that’s just a happy accident. They’re not designed to work that way.