Temporal information extraction using MedCAT

I am trying to get MedCAT to extract dates (e.g. 10.10.2010, 10-10-2010, 10/10/2010, 10th Oct 2010) from unstructured text. spaCy itself extracts dates, but 'cat.get_entities' filters them out. I tried adding the timexy module to the MedCAT pipeline:

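from timexy import Timexy  # importing timexy registers the "timexy" factory with spaCy

# The config has to be defined before it is passed to add_pipe
config = {
    "kb_id_type": "timex3",
    "label": "concept",  # default: 'time'
    "overwrite": False,  # default: False
}
cat.pipe.nlp.add_pipe("timexy", config=config)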

But the results were not as expected: the original entity extraction itself got messed up.

Are there any recommended approaches for getting MedCAT to extract dates?

It would be great to get a response on this. We are working on extracting temporal information from unstructured text.

I believe MedCAT stores the detected entities in doc._.ents and then the non-overlapping ones in doc.ents.

You may need to add a pipe that runs before MedCAT, takes the normal spaCy .ents of the date/time type, and puts them into ._.times or some other namespace, so that you can recover them later.

Otherwise they will get clobbered by the MedCAT .ents, which stores the CUI entities.
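A minimal sketch of that idea, using plain spaCy so it runs without MedCAT or a model download. The component names (regex_dates, stash_dates, clobber_ents) and the ._.times extension are made up for illustration. The regex component only stands in for spaCy NER, and clobber_ents only stands in for whatever later step overwrites doc.ents; neither is MedCAT's actual code.

```python
import re

import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Custom extension that will hold the dates; the name "times" is arbitrary.
Doc.set_extension("times", default=None, force=True)

# Matches simple numeric dates such as 10.10.2010, 10-10-2010, 10/10/2010.
DATE_RE = re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{4}\b")


@Language.component("regex_dates")
def regex_dates(doc):
    """Stand-in for spaCy NER: tag simple numeric dates as DATE entities."""
    spans = []
    for match in DATE_RE.finditer(doc.text):
        span = doc.char_span(
            match.start(), match.end(), label="DATE", alignment_mode="expand"
        )
        if span is not None:
            spans.append(span)
    doc.ents = spans
    return doc


@Language.component("stash_dates")
def stash_dates(doc):
    """Copy DATE/TIME entities out of doc.ents before anything overwrites them."""
    doc._.times = [
        (ent.text, ent.start_char, ent.end_char)
        for ent in doc.ents
        if ent.label_ in {"DATE", "TIME"}
    ]
    return doc


@Language.component("clobber_ents")
def clobber_ents(doc):
    """Stand-in for a later component that resets doc.ents."""
    doc.ents = []
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("regex_dates")
nlp.add_pipe("stash_dates")   # must run before the component that resets .ents
nlp.add_pipe("clobber_ents")

doc = nlp("Seen on 10/10/2010 for chest pain.")
print(doc.ents)      # ()
print(doc._.times)   # [('10/10/2010', 8, 18)]
```

With MedCAT, the equivalent would presumably be to add stash_dates to the spaCy pipeline that cat.pipe wraps, positioned before MedCAT's own NER and linker components. Their registered names depend on the MedCAT version, so it is worth printing the pipeline's pipe_names first. The character offsets stored in ._.times can later be lined up against the start/end offsets of the MedCAT entities.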

This line is where the linker resets the .ents attribute

Here is where the cat is instantiated and components are added to the pipeline. Some of these are spaCy components, some of them are MedCAT-specific:

Also, I believe MedCAT disables spaCy NER by default. So you may want to re-enable it and put it earlier in the pipeline, then have your own custom component that reads from spaCy's doc.ents before the MedCAT NER+Linker runs.
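Roughly, the re-enable-and-reorder step could look like this in plain spaCy 3. This is only a sketch of the spaCy side: medcat_stub and stash_dates_stub are placeholder components invented for the example, and the "ner" added here is an untrained one that exists purely so there is something to toggle. How MedCAT actually disables NER, and what its components are called, should be checked against the installed version.

```python
import spacy
from spacy.language import Language


@Language.component("stash_dates_stub")
def stash_dates_stub(doc):
    """Placeholder for a component that copies date entities out of doc.ents."""
    return doc


@Language.component("medcat_stub")
def medcat_stub(doc):
    """Placeholder for the MedCAT NER+Linker components."""
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("ner")           # untrained NER, present only to demonstrate toggling
nlp.add_pipe("medcat_stub")
nlp.disable_pipe("ner")       # mimic a pipeline that ships with NER switched off

print(nlp.disabled)           # ['ner']

# Re-enable NER if it is present but disabled.
if "ner" in nlp.disabled:
    nlp.enable_pipe("ner")

# Put the stash component right after NER, i.e. before the MedCAT placeholder.
nlp.add_pipe("stash_dates_stub", after="ner")

print(nlp.pipe_names)         # ['ner', 'stash_dates_stub', 'medcat_stub']
```

The pipeline is not run on text here because the NER component is untrained. With a real model such as en_core_web_md the NER weights are already loaded, so enabling the pipe is enough, provided it was disabled rather than excluded when the model was loaded.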

Thank you @jkgenser for answering in such detail with code. I am still investigating and will come back with further questions if I have any. Thanks again.