Detecting mismatch between intent and action

A recent preprint shows the usefulness of detecting the mismatch between what a clinician types in the free text of an EHR and what the clinician actually does in the EHR, e.g. missed order entries or wrongly coded follow-ups:

This is a big issue in many NHS Trusts, which have a backlog of clinical episodes on waiting lists where it is unclear whether further appointments should be sent or not.

Hmm, interesting use case…

Do you know if there were any comparisons to “human” results? Did the annotators' labels differ from the clinical coders'/data entry results?

I suspect that the model's errors could overlap with those of the human comparators (what is unclear to humans is likely unclear to AI). So this would need to be evaluated for its value before being launched into production.

Also this group annotated 3000 documents… impressive stuff!

  1. How do you find these enthusiastic clinicians?
  2. Why so many? Is it the variability in the phrasing of intent? Would it have been better to find e.g. n=50 of each clinical intent outcome?

Hi, thanks for the interest in this paper, and apologies for the delay. We are currently doing a comparison between what the clinician booked into the system (orders, follow-ups) and what the model predicts. Hopefully we can update on this soon. As for the other questions, the annotations were provided by non-clinicians who work in the clinic(s) in which this analysis was done (gastrointestinal services). They volunteered because they are aware of the gap between what clinicians intend to do and actually do, and are keen to plug this gap!

In terms of why we had to do so many documents, it was due to the variety of intents we were trying to detect. Many of them do not appear simultaneously in the same document, so in order to generate enough training data we ended up with that number of documents.
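
For anyone wanting a feel for the comparison mentioned above (model-predicted intents versus what was actually booked), here is a rough Python sketch. It is purely illustrative and not the code used in the preprint: the function name `find_mismatches` and the intent labels are hypothetical, and it assumes both the model output and the EHR record can be reduced to the same set of labels.

```python
def find_mismatches(predicted_intents, recorded_actions):
    """Compare intents detected in free text with actions recorded in the EHR.

    Both arguments are iterables of label strings (hypothetical labels,
    assumed to share one vocabulary).
    """
    predicted = set(predicted_intents)
    recorded = set(recorded_actions)
    return {
        # Intent stated in the letter but no matching action was booked.
        "missed": sorted(predicted - recorded),
        # Action booked but no matching intent found in the letter.
        "unexplained": sorted(recorded - predicted),
    }


# Example: the letter mentions a follow-up and a colonoscopy,
# but only the follow-up was booked.
result = find_mismatches(
    ["follow_up_6_months", "order_colonoscopy"],
    ["follow_up_6_months"],
)
print(result)
```

Episodes with a non-empty "missed" list would be the ones to flag for review; whether the model or the booking is at fault still needs a human check.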