Key Error when running supervised training on annotations file saved with MedCATrainer

elenaP · February 13, 2023, 2:53pm

I am getting the following KeyError when calling:
cat2.train_supervised(data_path=ANNOTATIONS_FILE,
                      nepochs=1,
                      reset_cui_count=False,
                      print_stats=True,
                      use_filters=True)

Error:
File ~/project/venv/lib/python3.8/site-packages/medcat/cat.py:869, in CAT.train_supervised(self, data_path, reset_cui_count, nepochs, print_stats, use_filters, terminate_last, use_overlaps, use_cui_doc_limit, test_size, devalue_others, use_groups, never_terminate, train_from_false_positives, extra_cui_filter, checkpoint, is_resumed)
866 train_set, test_set, _, _ = make_mc_train_test(data, self.cdb, test_size=test_size)
868 if print_stats > 0:
--> 869 fp, fn, tp, p, r, f1, cui_counts, examples = self._print_stats(test_set,
870 use_project_filters=use_filters,
871 use_cui_doc_limit=use_cui_doc_limit,
872 use_overlaps=use_overlaps,
873 use_groups=use_groups,
…
22 if 'irrelevant' in a and a['irrelevant']:
---> 23 i_cuis.add(a['cui'])
24 return i_cuis

KeyError: 'cui'

The annotations file was saved without text, with MedCATrainer 2.5.3.
I am using a model saved with MedCAT 1.5.0 after unsupervised training on local neurology documents, based on a KCL fine-tuned model created in v1.2.8.

elenaP · March 8, 2023, 8:11pm

I am still stuck on this one, any suggestions please?

elenaP · March 20, 2023, 12:40pm

I’ve upgraded medcat to 1.7.0 but am getting the same error.
Could this be a problem with how the annotations file was exported from MedCATrainer? I used v2.5.3.

tomolopolis · March 23, 2023, 10:33am

Hi @elenaP - apologies for the delay in response here!

This should have a better error message; I’ll fix that. However, you cannot train MedCAT on a Trainer export saved without text. The text surrounding each concept is used in the training process to adapt the concept embedding. Can you try exporting the data again, with text, and then train?
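
A quick way to confirm whether an export includes text before training is to scan it. This is a minimal sketch, assuming the export is JSON laid out as projects -> documents -> annotations, with a 'text' key on each document and a 'cui' key on each annotation. The helper name find_export_problems and the sample data are hypothetical and not part of MedCAT.

```python
def find_export_problems(export):
    """Return a list of problems found in a loaded export dict.

    Assumes the layout projects -> documents -> annotations; adjust the
    keys if your export is structured differently.
    """
    problems = []
    for project in export.get("projects", []):
        for doc in project.get("documents", []):
            doc_name = doc.get("name", "<unnamed>")
            # Training needs the text around each annotated concept.
            if not doc.get("text"):
                problems.append(f"{doc_name}: missing or empty 'text'")
            for i, ann in enumerate(doc.get("annotations", [])):
                # An annotation without 'cui' would fail on a['cui'].
                if "cui" not in ann:
                    problems.append(f"{doc_name}: annotation {i} has no 'cui'")
    return problems


# Made-up in-memory example standing in for a loaded export.
sample = {
    "projects": [
        {
            "documents": [
                {
                    "name": "doc1",
                    "text": "",
                    "annotations": [
                        {"cui": "255511005", "start": 0, "end": 4},
                        {"start": 5, "end": 9},
                    ],
                }
            ]
        }
    ]
}

problems = find_export_problems(sample)
```

To run it on a real file, load the export with json.load(open(ANNOTATIONS_FILE)) and pass the result to the helper; an empty list means neither problem was found.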

elenaP · April 19, 2023, 12:43am

Thanks. I ran it on annotations saved with text this time but still hit a KeyError when training. Let me know if more details would be helpful. It looks like most people don’t have issues with supervised training, so I am wondering if my fine-tuned model could be the cause. Any suggestions would be much appreciated.

Thank you,
Elena

tomolopolis · April 19, 2023, 3:42pm

Hi @elenaP - the same KeyError as the above stacktrace?

elenaP · April 19, 2023, 7:16pm

Good point, it’s a different KeyError this time:

File ~/project/venv/lib/python3.8/site-packages/medcat/cat.py:944, in CAT.train_supervised(self, data_path, reset_cui_count, nepochs, print_stats, use_filters, terminate_last, use_overlaps, use_cui_doc_limit, test_size, devalue_others, use_groups, never_terminate, train_from_false_positives, extra_cui_filter, retain_extra_cui_filter, checkpoint, retain_filters, is_resumed)
941 train_set, test_set, _, _ = make_mc_train_test(data, self.cdb, test_size=test_size)
943 if print_stats > 0:
--> 944 fp, fn, tp, p, r, f1, cui_counts, examples = self._print_stats(test_set,
945 use_project_filters=use_filters,
946 use_cui_doc_limit=use_cui_doc_limit,
947 use_overlaps=use_overlaps,
948 use_groups=use_groups,
949 extra_cui_filter=extra_cui_filter)
950 if reset_cui_count:
951 # Get all CUIs
952 cuis =

File ~/project/venv/lib/python3.8/site-packages/medcat/cat.py:478, in CAT._print_stats(self, data, epoch, use_project_filters, use_overlaps, use_cui_doc_limit, use_groups, extra_cui_filter)
…
49 for i_cui in i_cuis:
---> 50 cui_filter.remove(i_cui)
52 return cui_filter

KeyError: '255511005'

It is a concept I marked as incorrect on MedCATrainer.

I have checked and the cui exists in cdb:
cat2.cdb.cui2names['255511005']

{'elongated', 'has~length', 'long', 'long~qualifier~value'}
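
For context on the second traceback: Python's set.remove raises KeyError when the element is not in the set, while set.discard silently does nothing. The error therefore suggests that the CUI marked as incorrect was not in the filter set being edited, which is a separate question from whether it exists in the CDB. Below is a minimal illustration of that behaviour only. The filter contents and the helper remove_irrelevant are hypothetical, and this is not MedCAT's code.

```python
# Hypothetical filter that does NOT contain the CUI marked as incorrect.
cui_filter = {"11111111"}
i_cuis = {"255511005"}


def remove_irrelevant(cui_filter, i_cuis):
    """Return a copy of cui_filter without any of the CUIs in i_cuis.

    Uses discard, so CUIs that are absent from the filter are ignored.
    """
    result = set(cui_filter)
    for cui in i_cuis:
        result.discard(cui)
    return result


# remove() on a set that lacks the element raises KeyError.
try:
    set(cui_filter).remove("255511005")
    raised = False
except KeyError:
    raised = True

filtered = remove_irrelevant(cui_filter, i_cuis)
```

Checking whether '255511005' is present in the project's filter, rather than only in cdb.cui2names, may help narrow down where the mismatch comes from.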
