Hello, after reading both the tutorials and the paper, I noticed that the training setup in the MedCATtutorials sets train_from_false_positives to False, and does the same for the devalue_others variable (which I tried to find in add_name, to no avail so far). Should both variables be set to True if I want to use false-positive training? Or, put differently, which arguments would you suggest setting to True for the best possible training?
Thank you in advance…
There are no universal best settings for “best possible training”. It depends on the use case.
If you’re unsure which option is best for you, I recommend trying out different options and comparing the performance of the various approaches.
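If you go the experimentation route, a minimal sketch of a grid over the two flags could look like this. It assumes you wrap your own training and evaluation in a function (the hypothetical `train_and_score` below, which is not part of MedCAT) that accepts both flags and returns a single score where higher is better. How the flags are actually passed to MedCAT depends on your version, so check the docs for that part.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Keys are (train_from_false_positives, devalue_others) flag pairs.
Settings = Tuple[bool, bool]


def compare_settings(train_and_score: Callable[..., float]) -> Dict[Settings, float]:
    """Score every on/off combination of the two training flags.

    `train_and_score` is a hypothetical callback you write yourself: it should
    train a fresh model with the given flags and return a metric (e.g. F1)
    on held-out data.
    """
    results: Dict[Settings, float] = {}
    for fp, devalue in product([False, True], repeat=2):
        results[(fp, devalue)] = train_and_score(
            train_from_false_positives=fp, devalue_others=devalue
        )
    return results


def best_setting(results: Dict[Settings, float]) -> Settings:
    """Return the flag pair with the highest score (higher is assumed better)."""
    return max(results, key=results.get)


# Placeholder scorer with made-up numbers, only to show the call pattern.
def dummy_train_and_score(train_from_false_positives: bool, devalue_others: bool) -> float:
    return 0.5 + 0.2 * train_from_false_positives + 0.1 * devalue_others


scores = compare_settings(dummy_train_and_score)
print(best_setting(scores))  # (True, True)
```

Train each combination from the same starting model and evaluate on the same held-out set, otherwise the scores are not comparable.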
With that said, devalue_others and train_from_false_positives work quite differently.
With train_from_false_positives, you specify that all false positives in the span should be explicitly trained as negative examples. That is, this option affects the annotations the model got wrong.
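To make the idea concrete, here is a toy sketch of which examples count as false positives: model predictions that do not appear in the human-validated (gold) annotations. The annotation format and concept IDs are hypothetical, and this only illustrates the concept, not MedCAT's implementation.

```python
from typing import List, Set, Tuple

# Hypothetical, simplified annotation format: (start_char, end_char, concept_id).
Annotation = Tuple[int, int, str]


def false_positives(predicted: Set[Annotation], gold: Set[Annotation]) -> List[Annotation]:
    """Predicted annotations that do not appear in the gold set.

    Conceptually, these are what train_from_false_positives would train as
    negative examples.
    """
    return sorted(predicted - gold)


gold = {(0, 8, "C001")}
predicted = {(0, 8, "C001"), (20, 24, "C002"), (30, 35, "C003")}

for start, end, cui in false_positives(predicted, gold):
    print(f"negative example: {cui} at [{start}:{end}]")
```

Note that the correctly predicted (0, 8, "C001") is untouched; only the model's mistakes produce extra negative training signal.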
With devalue_others, however, for each correct annotation all other concepts with the same name are explicitly devalued as part of training on that annotation. Effectively, this means you “get more” out of each annotation, since every incorrect meaning of the name is trained as a negative example.
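A toy sketch of that idea: given one correct annotation, collect the other concepts that share its name, which are the ones that would be devalued. The `name2cuis` lookup here is a plain hypothetical dict with made-up concept IDs, not MedCAT's CDB, and the function is an illustration rather than the library's code.

```python
from typing import Dict, List, Set


def devalued_concepts(name: str, correct_cui: str, name2cuis: Dict[str, Set[str]]) -> List[str]:
    """Other concepts sharing `name` with the correctly annotated concept.

    Conceptually, with devalue_others enabled each of these would be trained
    as a negative example for this name.
    """
    return sorted(name2cuis.get(name, set()) - {correct_cui})


# Hypothetical ambiguous name with three meanings.
name2cuis = {"cold": {"C_COMMON_COLD", "C_LOW_TEMPERATURE", "C_COLD_SENSATION"}}

negatives = devalued_concepts("cold", "C_COMMON_COLD", name2cuis)
print(negatives)  # ['C_COLD_SENSATION', 'C_LOW_TEMPERATURE']
print(f"1 positive + {len(negatives)} negatives from a single annotation")
```

The more meanings a name has, the more negative examples a single annotation yields, which is where the extra value comes from.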
But whether either option will be beneficial really depends on your use case. For instance, some concepts share names with their children (e.g. 73211009 (“Diabetes mellitus”) and 44054006 (“Diabetes mellitus type 2”) both have the name diabetes, yet the latter is a child of the former). In situations like this, devaluing one concept in favour of the other would probably not make sense, since both are likely to appear in similar contexts.
If you know which option should be useful for your specific use case and data, proceed with that knowledge. If you don’t, then as I said, experiment to find out.