Reenabling pipeline components

plandes · August 15, 2022, 7:51pm

Disabling spaCy components means losing annotations such as sentence boundaries and dependency parse data (i.e. the head tree). What happens when we add the sentencizer and parser spaCy components back?

So far in my testing, nothing “bad” seems to happen. However, can others confirm this doesn’t lead to harmful side effects?

zeljko · August 16, 2022, 6:38pm

Hi @plandes, there should be no harmful effects unless you use NER components from spaCy that somehow clash with the MedCAT NER components. Things like the sentencizer or parser will not affect anything.

plandes · August 16, 2022, 6:56pm

No, those are the only two components I add back. I then merge in non-medical NER from a separate spaCy Language model instance.

However, I have noticed that the sentence boundaries fall in different places than those from a non-MedCAT spaCy language model. Any insight on that?

zeljko · August 16, 2022, 8:04pm

Possibly because we’ve modified the standard spaCy tokenizer and added some other rules, but the differences should be minimal. One thing to check is that the spaCy model you are using is exactly the same as the spaCy model in the MedCAT pipeline, and that all the components that help detect sentence boundaries are enabled.
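A minimal sketch of that check, assuming you compare the two pipelines by component names and by sentence character offsets. The helper names `pipeline_diff` and `boundary_diff` are hypothetical, and the sample values are illustrative only. With spaCy, the inputs would typically come from `nlp.pipe_names` and `[(s.start_char, s.end_char) for s in doc.sents]`.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start_char, end_char) of one sentence


def pipeline_diff(names_a: Sequence[str], names_b: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return the component names found only in A and only in B, in pipeline order."""
    only_a = [n for n in names_a if n not in names_b]
    only_b = [n for n in names_b if n not in names_a]
    return only_a, only_b


def boundary_diff(sents_a: Sequence[Span], sents_b: Sequence[Span]) -> Tuple[List[Span], List[Span]]:
    """Return the sentence spans found only in A and only in B, sorted by offset."""
    set_a, set_b = set(sents_a), set(sents_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Illustrative values, not real pipeline output.
plain_names = ["tok2vec", "tagger", "parser", "ner"]
medcat_names = ["tok2vec", "tagger", "sentencizer", "parser"]
print(pipeline_diff(plain_names, medcat_names))

plain_sents = [(0, 10), (11, 30)]
medcat_sents = [(0, 10), (11, 20), (21, 30)]
print(boundary_diff(plain_sents, medcat_sents))
```

If both differences come back empty for the same text, the two pipelines agree on components and on sentence boundaries. Otherwise the spans that differ show where the tokenizer rules or a missing component changed the split.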

plandes · August 18, 2022, 1:33am

This makes sense. Thank you.

plandes · August 18, 2022, 2:25pm

One more question on this: can you provide an example of text where the spelling is fixed and an abbreviation is expanded, so I can use it in a unit test case with my own setup? Thanks in advance.

zeljko · August 18, 2022, 10:44pm

Try typing “Intracerebral heorrhage and CKD” into the demo app. It will detect the first part even though it is misspelt, and it will also detect CKD even though it is an abbreviation. The text itself will not be touched, but internally the model will ignore the spelling mistakes and also detect abbreviations.
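For a unit test, one possible sketch is to assert on the detected mentions rather than on changed text, since the input is left as-is. This assumes the result layout of `cat.get_entities(text)` in MedCAT v1, that is a dict with an `"entities"` mapping whose values carry `source_value`, `start` and `end`. The helper name `missing_mentions` is hypothetical, and `sample` is a hand-written stand-in, not real model output.

```python
from typing import Any, Dict, Iterable, Set


def missing_mentions(result: Dict[str, Any], expected: Iterable[str]) -> Set[str]:
    """Return the expected mentions that do not appear as an entity source_value."""
    found = {ent["source_value"] for ent in result.get("entities", {}).values()}
    return set(expected) - found


text = "Intracerebral heorrhage and CKD"

# Hand-written stand-in for `cat.get_entities(text)`; replace it with the real call.
sample = {
    "entities": {
        0: {"source_value": "Intracerebral heorrhage", "start": 0, "end": 23},
        1: {"source_value": "CKD", "start": 28, "end": 31},
    }
}

# The misspelt phrase and the abbreviation should both be detected.
print(missing_mentions(sample, ["Intracerebral heorrhage", "CKD"]))

# The offsets should point back into the untouched input text.
for ent in sample["entities"].values():
    assert text[ent["start"]:ent["end"]] == ent["source_value"]
```

In a real test, an empty set from `missing_mentions` would mean both mentions were found. Checking which concept each mention resolves to would need the identifiers from your own model, which are not shown here.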
