KeyError when running supervised training on an annotations file saved with MedCATrainer

I am getting the following KeyError when calling:
cat2.train_supervised(data_path=ANNOTATIONS_FILE,
                      nepochs=1,
                      reset_cui_count=False,
                      print_stats=True,
                      use_filters=True)

Error:
File ~/project/venv/lib/python3.8/site-packages/medcat/cat.py:869, in CAT.train_supervised(self, data_path, reset_cui_count, nepochs, print_stats, use_filters, terminate_last, use_overlaps, use_cui_doc_limit, test_size, devalue_others, use_groups, never_terminate, train_from_false_positives, extra_cui_filter, checkpoint, is_resumed)
866 train_set, test_set, _, _ = make_mc_train_test(data, self.cdb, test_size=test_size)
868 if print_stats > 0:
→ 869 fp, fn, tp, p, r, f1, cui_counts, examples = self._print_stats(test_set,
870 use_project_filters=use_filters,
871 use_cui_doc_limit=use_cui_doc_limit,
872 use_overlaps=use_overlaps,
873 use_groups=use_groups,

22 if 'irrelevant' in a and a['irrelevant']:
--> 23 i_cuis.add(a['cui'])
24 return i_cuis

KeyError: 'cui'

The annotations file was saved without text, with MedCATrainer 2.5.3.
I am using a model saved with MedCAT 1.5.0 after unsupervised training on local neurology documents, based on a KCL fine-tuned model created in v1.2.8.

I am still stuck on this one, any suggestions please?

I’ve upgraded medcat to 1.7.0 but am getting the same error.
Could this be a problem with how the annotations file was exported from MedCATrainer? I used v2.5.3.

Hi @elenaP - apologies for the delay in response here!

This should have a better error message; I’ll fix that. However, you cannot train MedCAT on a Trainer export saved without text. The text surrounding each concept is used during training to adapt the concept embedding. Can you try exporting the data again - with text - and then train?
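
A quick pre-training check can catch this before `train_supervised` fails. The following is only a minimal sketch: it assumes the export is JSON shaped as projects, each with a list of documents, each with a list of annotations, and `check_export` is a hypothetical helper name, not part of MedCAT. It lists documents that have no text and annotations that have no 'cui' key (the key the first traceback failed on).

```python
import json


def check_export(export: dict) -> dict:
    """Collect documents without text and annotations without a 'cui' key.

    Assumes the layout projects -> documents -> annotations; this layout is
    an assumption about the export file, not something confirmed in the thread.
    """
    problems = {"docs_without_text": [], "anns_without_cui": []}
    for project in export.get("projects", []):
        for doc in project.get("documents", []):
            doc_id = doc.get("id")
            # A missing or empty 'text' field means the export was saved without text
            if not doc.get("text"):
                problems["docs_without_text"].append(doc_id)
            for ann in doc.get("annotations", []):
                # a['cui'] in the traceback raises KeyError for exactly this case
                if "cui" not in ann:
                    problems["anns_without_cui"].append((doc_id, ann.get("id")))
    return problems


# Small in-memory example with made-up ids; for a real file one would use
# export = json.load(open(ANNOTATIONS_FILE)) instead.
example = {
    "projects": [
        {
            "name": "demo",
            "documents": [
                {
                    "id": 1,
                    "text": "Patient has a long history of seizures.",
                    "annotations": [
                        {"id": 10, "cui": "255511005", "start": 14, "end": 18}
                    ],
                },
                {
                    "id": 2,  # no 'text' field at all
                    "annotations": [
                        {"id": 11, "start": 0, "end": 4, "irrelevant": True}
                    ],
                },
            ],
        }
    ]
}

report = check_export(example)
print(json.dumps(report, indent=2))
```

If either list in the report is non-empty, re-exporting with text (or inspecting those annotations) is probably worth doing before training.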

Thanks. I ran it on annotations saved with text this time but still hit a KeyError when training. Let me know if more details would be helpful. It looks like most people don’t have issues with supervised training, so I am wondering whether my fine-tuned model could be the cause. Any suggestions would be much appreciated.

Thank you,
Elena

Hi @elenaP - the same KeyError as the above stacktrace?

Good point, it’s a different KeyError this time:

File ~/project/venv/lib/python3.8/site-packages/medcat/cat.py:944, in CAT.train_supervised(self, data_path, reset_cui_count, nepochs, print_stats, use_filters, terminate_last, use_overlaps, use_cui_doc_limit, test_size, devalue_others, use_groups, never_terminate, train_from_false_positives, extra_cui_filter, retain_extra_cui_filter, checkpoint, retain_filters, is_resumed)
941 train_set, test_set, _, _ = make_mc_train_test(data, self.cdb, test_size=test_size)
943 if print_stats > 0:
→ 944 fp, fn, tp, p, r, f1, cui_counts, examples = self._print_stats(test_set,
945 use_project_filters=use_filters,
946 use_cui_doc_limit=use_cui_doc_limit,
947 use_overlaps=use_overlaps,
948 use_groups=use_groups,
949 extra_cui_filter=extra_cui_filter)
950 if reset_cui_count:
951 # Get all CUIs
952 cuis =

File ~/project/venv/lib/python3.8/site-packages/medcat/cat.py:478, in CAT._print_stats(self, data, epoch, use_project_filters, use_overlaps, use_cui_doc_limit, use_groups, extra_cui_filter)

49 for i_cui in i_cuis:
--> 50 cui_filter.remove(i_cui)
52 return cui_filter

KeyError: '255511005'


It is a concept I marked as incorrect in MedCATrainer.

I have checked and the cui exists in cdb:
cat2.cdb.cui2names['255511005']

{'elongated', 'has~length', 'long', 'long~qualifier~value'}
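
One observation on the traceback itself: the failing call is `cui_filter.remove(i_cui)`, and Python's `set.remove` raises KeyError whenever the element is not in that set. So the lookup that fails is against the filter set, not against the CDB, which would explain why the CUI can exist in `cdb.cui2names` and still trigger the error. The sketch below is an assumption-laden illustration, not MedCAT code: it assumes the filter is built from a comma-separated `cuis` string on the project and that annotations carry an `irrelevant` flag; `irrelevant_cuis_missing_from_filter` is a hypothetical helper, and "C1"/"C2" are placeholder CUIs.

```python
def irrelevant_cuis_missing_from_filter(project: dict) -> set:
    """Return CUIs flagged irrelevant that are not in the project's CUI filter.

    Assumes project['cuis'] is a comma-separated string; this is an
    assumption about the export layout, not confirmed in the thread.
    """
    cuis = project.get("cuis") or ""
    cui_filter = {c.strip() for c in cuis.split(",") if c.strip()}
    irrelevant = {
        ann["cui"]
        for doc in project.get("documents", [])
        for ann in doc.get("annotations", [])
        if ann.get("irrelevant") and "cui" in ann
    }
    # Anything left here would make cui_filter.remove(...) raise KeyError
    return irrelevant - cui_filter


# Made-up project: the filter lists C1 and C2, but an annotation for
# 255511005 (the CUI from the traceback) is flagged irrelevant.
project = {
    "cuis": "C1, C2",
    "documents": [
        {
            "annotations": [
                {"cui": "255511005", "irrelevant": True},
                {"cui": "C1", "irrelevant": True},
                {"cui": "C2", "irrelevant": False},
            ]
        }
    ],
}

missing = irrelevant_cuis_missing_from_filter(project)
print(missing)

# Plain-Python illustration of the difference between remove and discard
cui_filter = {"C1", "C2"}
try:
    cui_filter.remove("255511005")  # element absent: raises KeyError
    remove_raised = False
except KeyError:
    remove_raised = True
cui_filter.discard("255511005")  # element absent: silently does nothing
```

If a check like this reports the CUI, one possible workaround is to add it to the project's CUI filter before exporting, or to try training with `use_filters=False`; whether either is the intended fix is for the maintainers to confirm.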