Negative accuracy in annotation suggestion?

Hi,
While doing NLP training for a particular concept (with a UMLS CUI filter), there have been multiple instances where the highlighted text shows a negative accuracy for the concept.
What does it mean? How does the number eventually turn out?
[screenshot: negative accuracy]

Hi, any update on this?

I’m also getting something similar. I have a CUI for falling, and it has annotated the word “filling” with -0.2 accuracy.

When I run the same CDB in Python without using the webapp, this entity doesn’t get extracted. I plan on digging into the MedCATtrainer code soon, but I’m here to +1 it.
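For reference, this is roughly how I’m running it outside the webapp (the path and example text below are just placeholders, not from my actual project):

    from medcat.cat import CAT

    # Load the same model pack that was uploaded to MedCATtrainer
    cat = CAT.load_model_pack("path/to/your_model_pack.zip")  # placeholder path

    text = "patient reports a filling sensation"  # made-up example text
    doc = cat(text)
    for ent in doc.ents:
        # context_similarity is the "accuracy" value shown in the trainer
        print(ent.text, ent._.cui, ent._.context_similarity)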

I’ve been trying to get to the bottom of this by looking into the code of MedCATtrainer and MedCAT.

I don’t see MedCATtrainer changing the value anywhere, so I’d be inclined to believe it comes from MedCAT itself.

Looking at MedCAT, I can see that the default value for the Span extension is set to -1.

However, the only way I can see a negative value being set on the span extension (which is where MCT grabs it from) is if
a) config.linking.similarity_threshold is less than 0
or
b) both

  • self.config.linking.similarity_threshold_type = 'dynamic'
  • and cdb.cui2average_confidence[cui] < 0 for the specific CUI

I find it fairly unlikely for a) to be the case in a real world scenario (though it may be worth a double check).

So that’d leave me with b).
Can anyone experiencing this issue verify if they’ve got self.config.linking.similarity_threshold_type set to 'dynamic'?

The value of cdb.cui2average_confidence[cui] can be less than 0 for a CUI if it hasn’t been trained enough (which results in a similarity of -1) or if the CUI has no context vector (again, similarity -1).
It could subsequently rise from -1 if the CUI is then trained enough (more than self.config.linking.train_count_threshold), since at that point a positive similarity would be reported and the cui2average_confidence dict updated.
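If it helps anyone check, something along these lines should show whether either condition holds for your model pack and the CUI in question (the path and CUI are placeholders):

    from medcat.cat import CAT

    cat = CAT.load_model_pack("path/to/your_model_pack.zip")  # placeholder path
    cui = "C0000000"  # placeholder: the CUI showing negative accuracy

    print(cat.config.linking.similarity_threshold)       # condition a) if this is < 0
    print(cat.config.linking.similarity_threshold_type)  # 'static' or 'dynamic'
    print(cat.config.linking.train_count_threshold)
    print(cat.cdb.cui2count_train.get(cui, 0))            # how much this CUI has been trained
    print(cat.cdb.cui2average_confidence.get(cui))        # condition b) if this is < 0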

I’ve looked into this in a little more detail.

The similarity can actually become negative. In principle, the similarity can vary between -1 and 1.
By default, MedCAT uses 4 different context types (short, medium, long, and xlong) with corresponding weights (0.1, 0.4, 0.4, and 0.1, respectively).
For each context type, the learned context vector for the CUI and the context vector from the document are taken. They are converted into unit vectors and the dot product of the two is calculated.
A weighted average of these dot products is (effectively) taken, though it can technically happen that not all context types have a corresponding context vector.
Each dot product can range from -1 to 1. It is -1 if the two unit vectors point in the opposite direction and 1 if they are pointing in the same direction. In general, the dot product will be negative if the angle between the two vectors is greater than 90 degrees.
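To make that concrete, here is a small standalone sketch in plain numpy (not MedCAT code) of a weighted average of per-context-type dot products of unit vectors; with made-up vectors it can easily come out negative:

    import numpy as np

    # Example weights per context type (the defaults mentioned above)
    weights = {"short": 0.1, "medium": 0.4, "long": 0.4, "xlong": 0.1}

    # Made-up learned CUI vectors and document context vectors per context type
    rng = np.random.default_rng(0)
    cui_vecs = {k: rng.normal(size=300) for k in weights}
    doc_vecs = {k: rng.normal(size=300) for k in weights}

    def cos(a, b):
        # Dot product of unit vectors: ranges from -1 to 1
        return np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

    sims = {k: cos(cui_vecs[k], doc_vecs[k]) for k in weights}
    similarity = sum(weights[k] * sims[k] for k in weights) / sum(weights.values())
    print(sims)
    print(similarity)  # negative when the vectors mostly point "away" from each other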

So essentially what the negative value means is that the accuracy/similarity (at least in terms of one/some context type) is quite poor.

But as long as config.linking.similarity_threshold > 0 is set for the model’s config we shouldn’t see these values (other than the edge cases above). Is that the case for the users who are experiencing issues with this?

If you can still see negative values after setting the threshold to greater than 0 (which I’ve not been able to reproduce in my limited testing), I’d need to look into this in more detail.
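For reference, bumping the threshold on an existing model pack and re-saving it would look roughly like this (paths are placeholders):

    from medcat.cat import CAT

    cat = CAT.load_model_pack("path/to/your_model_pack.zip")  # placeholder path
    cat.config.linking.similarity_threshold = 0.2              # anything > 0
    cat.config.linking.similarity_threshold_type = 'static'    # avoids the 'dynamic' edge cases above
    cat.create_model_pack("path/to/output_dir")                # re-save, then re-upload to the trainer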

OK thanks. This explains why the annotation view sometimes comes up with a negative annotation, but when I run the linker with the same model on the command line (or in a Python script) I don’t get those linked results, since by default in that context the threshold is definitely > 0.2 or something like that.

Maybe the threshold is different for default configs for a brand new medcat annotator service?

By default, MedCATtrainer should take the threshold value from the model. And if that doesn’t exist, it defaults to 0.2 as well.

So on paper, it should behave identically to running the same text through a straight up python script.

So the threshold seems to be lower on the trainer for some reason in this case. But I don’t really know why that would be.

In MedCATtrainer please check out the configs. Edit as appropriate.

This config is then loaded into the model’s cdb.config.
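If in doubt, you can confirm what actually ended up in the model after downloading it from the trainer, e.g.:

    from medcat.cat import CAT

    cat = CAT.load_model_pack("path/to/downloaded_model_pack.zip")  # placeholder path
    # cdb.config is the config object that gets loaded as described above
    print(cat.cdb.config.linking.similarity_threshold)
    print(cat.cdb.config.linking.similarity_threshold_type)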