Negative accuracy in annotation suggestion?

Hi,
While doing NLP training for a particular concept (with a UMLS CUI filter), there have been multiple instances where the highlighted text shows a negative accuracy for the concept.
What does it mean? How does the number eventually turn out?
[screenshot: negative accuracy]

Hi, any update on this?

I’m also getting something similar. I have a CUI for falling, and it has annotated the word “filling” with -0.2 accuracy.

When I run the same CDB in Python without using the webapp, this entity doesn’t get extracted. I plan on digging into the MedCATtrainer code soon, but I’m here to +1 it.
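For reference, this is roughly how I’m running it outside the webapp (the path and example text below are just placeholders, not from my actual project):

    from medcat.cat import CAT

    # Load the same model pack that was uploaded to MedCATtrainer
    cat = CAT.load_model_pack("path/to/your_model_pack.zip")  # placeholder path

    text = "patient reports a filling sensation"  # made-up example text
    doc = cat(text)
    for ent in doc.ents:
        # context_similarity is the "accuracy" value shown in the trainer
        print(ent.text, ent._.cui, ent._.context_similarity)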

I’ve been trying to get to the bottom of this by looking into the code of MedCATtrainer and MedCAT.

I don’t see MedCATtrainer changing the value anywhere, so I’d be inclined to believe it comes from MedCAT itself.

Looking at MedCAT, I can see that the default value for the Span extension is set to -1.

However, the only way I can see a negative value being set on the span extension (which is where MCT grabs it from) is if
a) config.linking.similarity_threshold is less than 0
or
b) both

  • self.config.linking.similarity_threshold_type = 'dynamic'
  • and cdb.cui2average_confidence[cui] < 0 for the specific CUI

I find it fairly unlikely for a) to be the case in a real world scenario (though it may be worth a double check).

So that’d leave me with b).
Can anyone experiencing this issue verify if they’ve got self.config.linking.similarity_threshold_type set to 'dynamic'?

The value of cdb.cui2average_confidence[cui] can be less than 0 for a CUI if it hasn’t been trained enough (which results in a similarity of -1) or if the CUI has no context vector (again, similarity -1).
It could subsequently rise from -1 if the CUI is then trained enough (more than self.config.linking.train_count_threshold), since at that point a positive similarity would be reported and the cui2average_confidence dict updated.
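If it helps anyone check, something along these lines should show whether either condition holds for your model pack and the CUI in question (the path and CUI are placeholders):

    from medcat.cat import CAT

    cat = CAT.load_model_pack("path/to/your_model_pack.zip")  # placeholder path
    cui = "C0000000"  # placeholder: the CUI showing negative accuracy

    print(cat.config.linking.similarity_threshold)       # condition a) if this is < 0
    print(cat.config.linking.similarity_threshold_type)  # 'static' or 'dynamic'
    print(cat.config.linking.train_count_threshold)
    print(cat.cdb.cui2count_train.get(cui, 0))            # how much this CUI has been trained
    print(cat.cdb.cui2average_confidence.get(cui))        # condition b) if this is < 0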

I’ve looked into this in a little more detail.

The similarity can actually become negative. In principle, the similarity can vary between -1 and 1.
By default, MedCAT uses 4 different context types (short, medium, long, and xlong) with corresponding weights (0.1, 0.4, 0.4, and 0.1, respectively).
For each context type, the learned context vector for the CUI and the context vector from the document are taken. They are converted into unit vectors and the dot product of the two is calculated.
A weighted average of these dot products is (effectively) taken, though it can technically happen that not all context types have a corresponding context vector.
Each dot product can range from -1 to 1. It is -1 if the two unit vectors point in the opposite direction and 1 if they are pointing in the same direction. In general, the dot product will be negative if the angle between the two vectors is greater than 90 degrees.
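To make that concrete, here is a small standalone sketch in plain numpy (not MedCAT code) of a weighted average of per-context-type dot products of unit vectors; with made-up vectors it can easily come out negative:

    import numpy as np

    # Example weights per context type (the defaults mentioned above)
    weights = {"short": 0.1, "medium": 0.4, "long": 0.4, "xlong": 0.1}

    # Made-up learned CUI vectors and document context vectors per context type
    rng = np.random.default_rng(0)
    cui_vecs = {k: rng.normal(size=300) for k in weights}
    doc_vecs = {k: rng.normal(size=300) for k in weights}

    def cos(a, b):
        # Dot product of unit vectors: ranges from -1 to 1
        return np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

    sims = {k: cos(cui_vecs[k], doc_vecs[k]) for k in weights}
    similarity = sum(weights[k] * sims[k] for k in weights) / sum(weights.values())
    print(sims)
    print(similarity)  # negative when the vectors mostly point "away" from each other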

So essentially what the negative value means is that the accuracy/similarity (at least in terms of one/some context type) is quite poor.

But as long as config.linking.similarity_threshold > 0 is set for the model’s config we shouldn’t see these values (other than the edge cases above). Is that the case for the users who are experiencing issues with this?

If you can still see negative values after setting the threshold to greater than 0 (which I’ve not been able to reproduce in my limited testing), I’d need to look into this in more detail.
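For reference, bumping the threshold on an existing model pack and re-saving it would look roughly like this (paths are placeholders):

    from medcat.cat import CAT

    cat = CAT.load_model_pack("path/to/your_model_pack.zip")  # placeholder path
    cat.config.linking.similarity_threshold = 0.2              # anything > 0
    cat.config.linking.similarity_threshold_type = 'static'    # avoids the 'dynamic' edge cases above
    cat.create_model_pack("path/to/output_dir")                # re-save, then re-upload to the trainer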

OK thanks. This explains why the annotation view sometimes comes up with a negative annotation, but when I run the linker with the same model on the command line (or in a Python script) I don’t get those linked results, since by default in that context the threshold is definitely > 0.2 or something like that.

Maybe the threshold is different for default configs for a brand new medcat annotator service?

By default, MedCATtrainer should take the threshold value from the model. And if that doesn’t exist, it defaults to 0.2 as well.

So on paper, it should behave identically to running the same text through a straight up python script.

So the threshold seems to be lower on the trainer for some reason in this case. But I don’t really know why that would be.

In MedCATtrainer please check out the configs. Edit as appropriate.

This config is then loaded into the model’s cdb.config.
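If in doubt, you can confirm what actually ended up in the model after downloading it from the trainer, e.g.:

    from medcat.cat import CAT

    cat = CAT.load_model_pack("path/to/downloaded_model_pack.zip")  # placeholder path
    # cdb.config is the config object that gets loaded as described above
    print(cat.cdb.config.linking.similarity_threshold)
    print(cat.cdb.config.linking.similarity_threshold_type)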