MedCAT Trainer configuration

The MedCAT Trainer is taking a long time to load even though it is set to a high configuration. The configuration currently used is 8 CPUs and 46 GB RAM, which seems to be insufficient. Any thoughts on the configuration required?

Hi @Gunavardhan,

8 CPUs and 46 GB RAM should be more than sufficient to run MedCATtrainer.
Can you tell me a few more details about the issue:

  • What exactly is slow or failing? Building the image, loading a project, the admin page, etc.?
  • Have you checked the resources allocated to your Docker instance (CPU, memory, swap, etc.)?
  • If it is slow loading a project, how many documents have you assigned to the project?
  • Which version/tag of MedCATtrainer have you deployed?

Thanks

Hi @anthony.shek

It is slow when loading a project. The project has 546 documents.

medcat-trainer:v2.1.1

Thanks in advance

Hi @Gunavardhan - that’s quite an old version.

The latest is v2.3.0. Could you git pull and re-run docker-compose?

Hi @anthony.shek

Will try and come back to you

Thanks in advance

Please note that the MedCAT library version differs between these versions of the Trainer, so the MedCAT models you currently have loaded into the Trainer may not work with v2.3.0.

Hi All,

Our MedCAT trainer version is 2.3.4. These are our findings:

  1. The loading time of each note in the MedCAT Trainer depends upon the length of that particular note.
  2. In our dataset, the length of a clinical note (character count) ranges from 118 to 122,219. Accordingly, the loading time of a note varies from 2-3 seconds to 5 minutes.
  3. For longer notes, moving from one annotated value to another in a clinical note, or performing any operation on an annotated term, makes the page unresponsive.
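To see which notes in a dataset are likely to be slow, the character counts can be listed before upload. This is only a minimal sketch: it assumes the dataset is a CSV with `name` and `text` columns, and the function name `note_lengths` and the sample rows are hypothetical.

```python
import csv
import io


def note_lengths(csv_text, text_column="text", name_column="name"):
    """Return (name, character_count) pairs, longest note first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lengths = [(row[name_column], len(row[text_column])) for row in reader]
    return sorted(lengths, key=lambda pair: pair[1], reverse=True)


# Hypothetical two-row dataset; a real one would be read from the CSV file.
sample = "name,text\nnote_a,short note\nnote_b,a much longer clinical note\n"

for name, n_chars in note_lengths(sample):
    print(name, n_chars)
```

The longest notes at the top of this list are the ones to test first when checking whether a change improves loading time.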

How can we tackle this issue of huge loading time without reducing the size of the clinical note?

Hi @kashmirabhake,

What are the PC hardware specs that you are running the MedCAT trainer on?

One could run it on a small 4-core, 8 GB machine, but that would be slow when running the full SNOMED terminology MedCAT model, i.e. the 2-3 GB model.

Hi @anthony.shek thank you for your inputs.
Our system configuration is: 8 CPU 46 GB RAM
We need to run the full MedCAT model for our requirements.
Here are some additional findings:
The time taken to load a note depends on two factors:

  1. Length of the individual note
  2. Number of concept IDs to be annotated in a note, irrespective of how often they occur in the note

Our trainers are having difficulty loading a large note and training on it because the page becomes unresponsive. That said, we are only interested in 50 of the 127 semantic types (TUIs). We believe that filtering by semantic type might help us reduce the loading time. Is there a way to filter for only the semantic types we are interested in?

In terms of document loading time: if you don’t set a project filter CUI list, either directly or via a .json file, you’re annotating for ‘all concepts’ as configured within the MedCAT model. This is not advised, both because long documents take ages to load and because it is very painful for a human being to annotate.
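One way to get from a set of semantic types to a CUI filter is sketched below. It is an illustration under assumptions, not the Trainer's own method: `build_cui_filter`, the toy CUIs, and the TUI values are hypothetical, and the CUI-to-type mapping is assumed to have been exported from your model's concept database (how to obtain it depends on the MedCAT version). The sketch also assumes the .json filter file is a plain JSON list of CUIs.

```python
import json


def build_cui_filter(cui_to_type_ids, wanted_type_ids):
    """Return the sorted CUIs that have at least one of the wanted type IDs."""
    wanted = set(wanted_type_ids)
    return sorted(
        cui for cui, type_ids in cui_to_type_ids.items()
        if wanted & set(type_ids)
    )


# Hypothetical toy mapping of CUI -> semantic type IDs (TUIs).
cui_to_type_ids = {
    "C0000001": {"T047"},
    "C0000002": {"T121"},
    "C0000003": {"T047", "T184"},
}

cui_filter = build_cui_filter(cui_to_type_ids, ["T047"])

# Comma-separated form, for entering the filter directly.
print(",".join(cui_filter))

# JSON list form, for a .json filter file (assumed format).
print(json.dumps(cui_filter))
```

With 50 of 127 semantic types selected, the resulting list may still be large, in which case the .json file route is probably easier than pasting the list directly.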

Exactly. In the project annotate entities tab, enter your whitelist concept filter as follows: