Question regarding MedCat and token order "tolerance" mentioned in paper

Good afternoon CogStack team,

thanks for providing access to MedCat and the online demo!
I read in the MedCat paper that token order can be ignored for names of up to two tokens.
However, I could not get that feature to work, either in the online demo or with the SNOMED-CT model downloaded and run locally.

Is this behavior enabled by default, or can I activate it somehow?

I also opened a GitHub issue about this (Issue #344 in CogStack/MedCAT, “Concept not found if token order is slightly changed contrary to mentioned note in paper”) and tried to get in touch by email, but that address seems to be unreachable, so I figured it’s better to bring this up here.

Kind regards,
Kim Tang

Hey @KimTang!

I think the configuration you are looking for is here:
cat.cdb.config.Ner()

The source code is below for you to inspect:

I haven’t used this feature myself, so let us know how it works for you.

Out of curiosity, I used a simple model to check whether reverse word order actually works.
I set the value to True, but it didn’t seem to work, at least not for the model I was using (a UMLS model trained on MIMIC-III).


(The right-hand side uses the correct word order; the left-hand side uses the reversed order.)
In fact, according to the in-code documentation, this is exactly the kind of case it’s supposed to find, yet it doesn’t seem able to.
I’ve double-checked that try_reverse_word_order is in fact True.

Just in case, I also checked the code to make sure the value is actually used, and it looks like it is:

That said, the concept you tried and posted on GitHub was probably never meant to be supported: it has 4 tokens, whereas (as you quoted) the feature is only supposed to work for up to 2 tokens.
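To make the two-token limit concrete, here is a minimal, hypothetical sketch of what a reverse-word-order fallback could look like in a plain dictionary lookup. This is not MedCAT's implementation: NAME2CUI, lookup, MAX_REVERSE_TOKENS and the concept IDs are made up for illustration, and only the flag name mirrors try_reverse_word_order.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical concept dictionary: lowercased name tokens -> concept ID.
# The IDs are placeholders, not real SNOMED-CT or UMLS codes.
NAME2CUI: Dict[Tuple[str, ...], str] = {
    ("heart", "disease"): "C_HEART_DISEASE",
    ("chronic", "obstructive", "pulmonary", "disease"): "C_COPD",
}

# Assumed limit: the reverse-order fallback only applies to spans this short.
MAX_REVERSE_TOKENS = 2


def lookup(tokens: List[str], try_reverse_word_order: bool = True) -> Optional[str]:
    """Return the concept ID for a token span, or None if nothing matches."""
    key = tuple(t.lower() for t in tokens)
    # First try the tokens exactly as they appear in the text.
    if key in NAME2CUI:
        return NAME2CUI[key]
    # Fallback: for short spans only, also try the tokens in reverse order.
    if try_reverse_word_order and len(key) <= MAX_REVERSE_TOKENS:
        return NAME2CUI.get(tuple(reversed(key)))
    return None


print(lookup(["Disease", "heart"]))  # reversed 2-token span is found
print(lookup(["disease", "pulmonary", "obstructive", "chronic"]))  # 4 tokens: not found
```

With two tokens, reversal is the only other possible ordering, which is why a 4-token name like the one in the GitHub issue would not be covered even if the option behaves as documented.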