Removing a CDB Concept

Is there a way to remove a CDB concept in its entirety (e.g. remove the CUI and all associated information)? Thanks!

Was there a particular reason for removing a concept? Most people use a filter if they are only interested in a single concept or a group of concepts, e.g. all epilepsy terms or all procedures.

It was a manually added concept that we later decided should not have been added.

If it’s not possible to remove one, is there an easy way to set a filter that ignores only that one concept?

The filter is a whitelist: it lists the concepts to keep, so to ignore one concept you whitelist every CUI except that one.
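To make the whitelist idea concrete, here is a minimal plain-Python sketch (not MedCAT code) of turning an "ignore these CUIs" list into the "keep these CUIs" set a whitelist expects. The function name whitelist_excluding is hypothetical. Plain set difference would silently ignore a CUI that isn't in the source set, so the sketch checks for missing CUIs and raises explicitly, much like set.remove does.

```python
from typing import Iterable, Set


def whitelist_excluding(all_cuis: Iterable[str], excluded: Iterable[str]) -> Set[str]:
    """Hypothetical helper: return every CUI in all_cuis except the excluded ones."""
    all_set = set(all_cuis)
    excluded_set = set(excluded)
    missing = excluded_set - all_set
    if missing:
        # Fail loudly rather than silently returning a whitelist that excludes nothing.
        raise KeyError(f"CUIs not found in the source mapping: {sorted(missing)}")
    return all_set - excluded_set


print(sorted(whitelist_excluding({"101", "112", "205"}, ["112"])))  # ['101', '205']
```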

For now you can do the following:

from medcat.cat import CAT

# Create CAT - the main class from medcat used for concept annotation
cat = CAT.load_model_pack('<Name of model_pack here>')

# Collect all SCTIDs (aka CUIs) in the CDB
cui_filter = set(cat.cdb.cui2preferred_name.keys())
cui_filter.remove('<troublesome SCTID concept>')  # the SCTID of the concept you want to exclude

# Add the new whitelist filter to the model
cat.config.linking['filters'] = {'cuis': cui_filter}

# Now use the model as you want, e.g.
# entities = cat.get_entities(text)

# Save the model pack with the filter
cat.create_model_pack(save_dir_path='<DATA_DIR>', model_pack_name='<my_first_medcat_modelpack>')

In the meantime, I’ll make a note to create a function that deletes a concept from the CAT model.

Many thanks, but there is now an odd issue. cui_filter raises a KeyError when I try to remove the troublesome CUI from it. Yet the CUI is still present in cui2names (as an empty set, because I had earlier removed the word it captured using cui2names.remove), and it still captures its associated word in the documents it is run over (the term still appears in get_entities). Do you have any thoughts as to what the issue may be?

The concept will still be in cui2names because it has not been removed from the model.

The filter simply stops the model from annotating it in the documents you provide. Because the filter hasn’t been set correctly, the concept still appears.
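A filter acts on the output, not on the CDB. The toy sketch below uses plain dicts rather than the real medcat CDB class, and annotate is a hypothetical stand-in for the annotation step. It shows a concept that is still present in cui2names yet never reported once the whitelist leaves it out. As a simplifying assumption, an empty whitelist here means "no filtering".

```python
from typing import Dict, List, Set

# Toy stand-in for the CDB: CUI -> set of names (NOT the real medcat CDB class).
cui2names: Dict[str, Set[str]] = {"101": {"seizure"}, "112": {"fit"}}


def annotate(tokens: List[str], cui2names: Dict[str, Set[str]], whitelist: Set[str]) -> List[str]:
    """Hypothetical: return CUIs whose names match tokens, dropping CUIs not in the whitelist."""
    found = [cui for tok in tokens for cui, names in cui2names.items() if tok in names]
    # Assumption for this sketch: an empty whitelist lets everything through.
    return [cui for cui in found if not whitelist or cui in whitelist]


tokens = ["patient", "had", "a", "fit", "and", "seizure"]
print(annotate(tokens, cui2names, whitelist={"101"}))  # ['101']
print("112" in cui2names)  # True: the concept itself was never deleted
```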

Can you either post the full KeyError you are getting here, or alternatively DM me?
Thanks

Hi

When I try to remove it from cui_filter I get the following error:

Cell In[28], line 1
cui_filter.remove('112')

KeyError: '112'

When I look up the CUI in cui2names, the result is:

cat.cdb.cui2names['112']
Out[26]: set()

And as a control, when I look up a CUI that was never there:

cat.cdb.cui2names['blah']
Traceback (most recent call last):

Cell In[27], line 1
cat.cdb.cui2names['blah']

KeyError: 'blah'

Sorry, I forgot to add: when I process the documents, the term that the troublesome CUI captures still appears in .ents.

What does the following return:
'112' in cat.cdb.cui2preferred_name.keys()

This should return: True

If it returns False, something has gone wrong and your CUI has no preferred name.
Can you tell me what cat.cdb.cui2preferred_name['112'] returns?

Can you try changing that line to:
cui_filter = {cui for cui in cat.cdb.cui2names.keys()}
It shouldn’t make a difference, but let’s check first.
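The KeyError fits a situation where the CUI is a key in cui2names (even with an empty name set) but has no entry in cui2preferred_name, so a set built from cui2preferred_name never contains it. Here is a small sketch using toy dicts that are assumed to be in exactly that state; cuis_without_preferred_name is a hypothetical helper, not part of MedCAT.

```python
from typing import Dict, Set


def cuis_without_preferred_name(cui2names: Dict[str, Set[str]],
                                cui2preferred_name: Dict[str, str]) -> Set[str]:
    """Hypothetical helper: CUIs that are keys in cui2names but missing from cui2preferred_name."""
    return set(cui2names) - set(cui2preferred_name)


# Toy data (assumed): '112' exists with an empty name set and no preferred name.
cui2names = {"101": {"seizure"}, "112": set()}
cui2preferred_name = {"101": "Seizure"}

print(cuis_without_preferred_name(cui2names, cui2preferred_name))  # {'112'}

from_preferred = set(cui2preferred_name)  # '112' absent, so .remove('112') would raise KeyError
print("112" in from_preferred)  # False

from_names = set(cui2names)  # '112' present, because an empty set is still a value under that key
from_names.remove("112")
print(sorted(from_names))  # ['101']
```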

Also drop me an email so we can have a chat.

Glad that this has been resolved!

I’ll request a new feature to delete concepts, so that there will be an easy way to do this in the future.

Here is the PR that adds the new feature. Let me know what you think.
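The PR itself isn't reproduced here, so the following is only a hypothetical sketch of what deleting a concept involves. It uses plain dicts named after the CDB fields mentioned in this thread plus an assumed reverse index, name2cuis; remove_concept is not a MedCAT function. The point it illustrates: emptying a CUI's entry in cui2names is not enough if another mapping still links a name to that CUI, which is one plausible way a term can keep being annotated.

```python
from typing import Dict, Set


def remove_concept(cui: str,
                   cui2names: Dict[str, Set[str]],
                   cui2preferred_name: Dict[str, str],
                   name2cuis: Dict[str, Set[str]]) -> None:
    """Hypothetical helper: delete a CUI from a toy CDB held in plain dicts, in place."""
    cui2names.pop(cui, None)
    cui2preferred_name.pop(cui, None)
    # Scan the whole reverse index rather than only the CUI's current names,
    # because those names may already have been stripped while name2cuis still points at the CUI.
    for name in list(name2cuis):
        cuis = name2cuis[name]
        if cui in cuis:
            cuis.discard(cui)
            if not cuis:
                del name2cuis[name]  # drop names that no longer map to any concept


cui2names = {"101": {"seizure"}, "112": set()}          # '112' already had its name removed
cui2preferred_name = {"101": "Seizure", "112": "Fit"}
name2cuis = {"seizure": {"101"}, "fit": {"112"}}         # reverse index still links 'fit' to '112'
remove_concept("112", cui2names, cui2preferred_name, name2cuis)
print(name2cuis)  # {'seizure': {'101'}}
```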

One reason would be this: if I have a particular list of about 3,000 concepts, I might want to take the large UMLS self-supervised model and fine-tune just those 3,000 concepts. Loading the 4M-concept database into memory takes about 20 GB of RAM, whereas my 3,000 UMLS concepts need only a small fraction of that.
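For that use case, subsetting the database rather than filtering it is what would reduce memory, since a filter still keeps every concept loaded. A minimal sketch over toy dicts follows; subset_cdb is hypothetical, not a MedCAT function, and the two maps are simplified stand-ins for the real CDB. Check the CDB API of your MedCAT version for a built-in equivalent.

```python
from typing import Dict, Iterable, Set, Tuple


def subset_cdb(keep: Iterable[str],
               cui2names: Dict[str, Set[str]],
               name2cuis: Dict[str, Set[str]]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Hypothetical helper: build new toy-CDB maps containing only the CUIs in `keep`."""
    keep_set = set(keep)
    new_cui2names = {cui: set(names) for cui, names in cui2names.items() if cui in keep_set}
    new_name2cuis: Dict[str, Set[str]] = {}
    for name, cuis in name2cuis.items():
        kept = cuis & keep_set
        if kept:  # drop names whose concepts were all removed
            new_name2cuis[name] = kept
    return new_cui2names, new_name2cuis


cui2names = {"C1": {"epilepsy"}, "C2": {"asthma"}, "C3": {"fit", "seizure"}}
name2cuis = {"epilepsy": {"C1"}, "asthma": {"C2"}, "fit": {"C3"}, "seizure": {"C3"}}
small_c2n, small_n2c = subset_cdb(["C1", "C3"], cui2names, name2cuis)
print(sorted(small_n2c))  # ['epilepsy', 'fit', 'seizure']
```

Note that the saving only materializes once nothing references the original full maps any more, so they can be garbage-collected.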