Thinking further, one could also use the pseudo-attention strategy and frame it as text classification, with the entity's center position passed in as an extra input.
Label every token in the sequence chunks with either simple flags or a BIO labeling scheme. Then either take the average over entities that span multiple tokens or add a CRF on top.
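To make the token-labeling idea a bit more concrete, here is a rough sketch in plain Python. It only illustrates the BIO flags and the averaging step for multi-token entities; the CRF variant is not shown. The function names, the span format, and the probability lists are all made up for illustration and are not part of MedCAT's API. The per-token class probabilities are assumed to come from whatever tagger ends up being used.

```python
from typing import List, Sequence, Tuple

# Hypothetical span format: (start, end) token indices, end exclusive.
Span = Tuple[int, int]


def bio_tags(n_tokens: int, spans: Sequence[Span]) -> List[str]:
    """Mark each token as B (begins an entity), I (inside one) or O (outside)."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def average_entity_probs(
    token_probs: Sequence[Sequence[float]], spans: Sequence[Span]
) -> List[List[float]]:
    """Average the per-token class probabilities over each entity span."""
    averaged = []
    for start, end in spans:
        rows = token_probs[start:end]
        n_classes = len(rows[0])
        averaged.append(
            [sum(row[c] for row in rows) / len(rows) for c in range(n_classes)]
        )
    return averaged


# Toy example: "chest pain" is one entity covering tokens 1 and 2.
tokens = ["no", "chest", "pain", "today"]
spans = [(1, 3)]

# Assumed tagger output: one [p(class 0), p(class 1)] pair per token.
token_probs = [[0.5, 0.5], [0.25, 0.75], [0.5, 0.5], [0.5, 0.5]]

print(bio_tags(len(tokens), spans))              # ['O', 'B', 'I', 'O']
print(average_entity_probs(token_probs, spans))  # [[0.375, 0.625]]
```

Taking the argmax of each averaged row would give one label per entity. A CRF on top would replace the averaging step rather than sit alongside it.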
Curious to know your thoughts. If someone suggests a preferred approach for MedCAT, I’m down to take a stab at the implementation. If the BLSTM model is not performant enough for my use case, I’ll probably go ahead and work on an implementation anyway, but I wouldn’t mind some guidance so that there’s a higher likelihood it gets merged in and is helpful for others!