I am trying to get MedCAT to extract dates (e.g. 10.10.2010, 10-10-2010, 10/10/2010, 10th Oct 2010) from unstructured text. spaCy itself extracts dates, but ‘cat.get_entities’ filters them out. I tried adding the timexy module to the MedCAT pipeline.
I believe MedCAT stores all detected entities in doc._.ents and only the non-overlapping ones in doc.ents.
You may need to add a pipe that runs before MedCAT, takes the normal spaCy .ents of the date/time types (DATE, TIME) and copies them into ._.times or some other custom namespace, so that you can recover them later.
Otherwise they will get clobbered when MedCAT overwrites .ents with its CUI entities.
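A minimal sketch of such a pipe, written against plain spaCy 3 so it runs without a MedCAT model pack. Assumptions: the `times` attribute name is arbitrary, `regex_dates` is a hypothetical regex stand-in for a trained spaCy `ner` component (which would normally produce the DATE/TIME entities), and `fake_linker` is a hypothetical stand-in for MedCAT's linker that simply overwrites `doc.ents`.

```python
import re

import spacy
from spacy.language import Language
from spacy.tokens import Doc
from spacy.util import filter_spans

# Custom namespace for date/time spans; the name "times" is an illustrative choice.
if not Doc.has_extension("times"):
    Doc.set_extension("times", default=None)

# Rough patterns for the formats in the question (10.10.2010, 10-10-2010,
# 10/10/2010, 10th Oct 2010). Illustrative only, not a complete date grammar.
DATE_RE = re.compile(
    r"\b\d{1,2}[./-]\d{1,2}[./-]\d{4}\b"
    r"|\b\d{1,2}(?:st|nd|rd|th)?\s+[A-Z][a-z]{2,8}\.?\s+\d{4}\b"
)


@Language.component("regex_dates")
def regex_dates(doc: Doc) -> Doc:
    """Hypothetical stand-in for spaCy's statistical NER: tag regex hits as DATE."""
    spans = []
    for m in DATE_RE.finditer(doc.text):
        # "expand" snaps the character offsets outward to token boundaries.
        span = doc.char_span(m.start(), m.end(), label="DATE", alignment_mode="expand")
        if span is not None:
            spans.append(span)
    doc.ents = filter_spans(list(doc.ents) + spans)
    return doc


@Language.component("stash_dates")
def stash_dates(doc: Doc) -> Doc:
    """Copy DATE/TIME entities into doc._.times so later components can't clobber them."""
    doc._.times = [ent for ent in doc.ents if ent.label_ in ("DATE", "TIME")]
    return doc


@Language.component("fake_linker")
def fake_linker(doc: Doc) -> Doc:
    """Hypothetical stand-in for MedCAT's linker, which replaces doc.ents."""
    doc.ents = []
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("regex_dates")
nlp.add_pipe("stash_dates")  # must run after NER and before the linker
nlp.add_pipe("fake_linker")

doc = nlp("Seen 10.10.2010, 10-10-2010 and 10/10/2010; follow-up 10th Oct 2010.")
print(doc.ents)  # wiped by the stand-in linker
print([ent.text for ent in doc._.times])
```

Because `doc._.times` holds its own list of `Span` objects, reassigning `doc.ents` later does not affect it.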
This line is where the linker resets the .ents attribute.
Here is where the cat is instantiated and components are added to the pipeline. Some of these are spaCy components and some are MedCAT-specific:
Also, I believe MedCAT disables spaCy's NER by default. So you may want to re-enable it, make sure it runs early in the pipeline, and then add your own custom component that reads from the spaCy doc.ents before the MedCAT NER+Linker runs.
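The re-enable-and-reorder step can be sketched with spaCy 3's own pipeline methods (`disabled`, `enable_pipe`, `pipe_names`, `add_pipe(..., before=...)`). `wire_date_capture` is a hypothetical helper, not a MedCAT API. With MedCAT you would presumably call it on the underlying spaCy `Language` object (assumed here to be reachable as `cat.pipe.spacy_nlp`, which may differ between versions) and pass the name of the first MedCAT component as it appears in `nlp.pipe_names`. The demo uses a blank pipeline where an EntityRuler named `"ner"` stands in for spaCy's NER and `clobber_ents` is a hypothetical stand-in for the MedCAT linker.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Illustrative attribute name; any unused name under doc._ would do.
if not Doc.has_extension("date_ents"):
    Doc.set_extension("date_ents", default=None)


@Language.component("copy_date_ents")
def copy_date_ents(doc: Doc) -> Doc:
    """Copy DATE/TIME entities before a later component overwrites doc.ents."""
    doc._.date_ents = [ent for ent in doc.ents if ent.label_ in ("DATE", "TIME")]
    return doc


def wire_date_capture(nlp: Language, ner_name: str, before: str) -> Language:
    """Hypothetical helper: re-enable NER and insert copy_date_ents right before `before`."""
    if ner_name in nlp.disabled:
        nlp.enable_pipe(ner_name)
    names = nlp.pipe_names
    # The copier only helps if NER has already run by the time it executes.
    if names.index(ner_name) > names.index(before):
        raise ValueError(f"{ner_name!r} runs after {before!r}; reorder the pipeline first")
    if "copy_date_ents" not in names:
        nlp.add_pipe("copy_date_ents", before=before)
    return nlp


@Language.component("clobber_ents")
def clobber_ents(doc: Doc) -> Doc:
    """Hypothetical stand-in for MedCAT's linker resetting doc.ents."""
    doc.ents = []
    return doc


nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler", name="ner")  # stand-in for spaCy's NER
ruler.add_patterns([{"label": "DATE", "pattern": "10/10/2010"}])
nlp.add_pipe("clobber_ents")
nlp.disable_pipe("ner")  # mimic a pipeline loaded with NER disabled

wire_date_capture(nlp, ner_name="ner", before="clobber_ents")
doc = nlp("Follow-up on 10/10/2010.")
print(nlp.pipe_names)
print([ent.text for ent in doc._.date_ents])
```

If the NER component was excluded rather than disabled when the spaCy model was loaded, it is not in the pipeline at all and `enable_pipe` will raise; it would then have to be added back, for example with `nlp.add_pipe("ner", source=...)` from a freshly loaded copy of the model.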