MedCATtrainer error when uploading dataset

Hello, I am happily using MedCATtrainer! I am very happy with this tool, thanks for all your efforts! However, when I tried to upload a new, improved dataset to annotate, I got the following error:

Do you have any idea how I can fix this? I believe the data format is correct: two columns, [name, text].

Thank you in advance!

Best, Malin

Not too sure what that error represents without inspecting the logs. @tomolopolis, any ideas?

You are correct in that the file should be a CSV with at least two columns [name, text].
Can you confirm that this is the case?
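
One quick way to check this before uploading is to read just the header row of the file. The snippet below is a minimal sketch using only the Python standard library; the function name check_columns is hypothetical and not part of MedCATtrainer, and it assumes the column names must match exactly (apart from surrounding whitespace).

```python
import csv
import io

# Columns the upload is expected to contain, per the reply above.
REQUIRED_COLUMNS = {"name", "text"}

def check_columns(csv_file):
    """Return the set of required columns missing from the header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty list if the file has no rows at all
    present = {col.strip() for col in header}
    return REQUIRED_COLUMNS - present

# Example with an in-memory file. For a file on disk, pass
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO("name,text\ndoc1,Patient presents with chest pain.\n")
print(check_columns(sample))  # an empty set means both columns are present
```

If the returned set is not empty, it names the columns that still need to be added or renamed.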

If that is not the issue, can you mention which version of MedCATtrainer you are using? Thanks!

Hi @Malin - this looks like a bug in the latest release. I’ve created a GitHub issue here - but to @anthony.shek’s point, do you have the name and text columns in the .csv or .xlsx file you’re trying to upload?

Hi Tom and Anthony,

Thank you for your quick replies! Yes, I can now confirm that I uploaded a .csv with two columns [name, text]. Thank you for creating a GitHub issue. I’ll wait and see what comes out.

Best regards,
Malin

@Malin - I’ve fixed this here - a bug on my part that was actually hiding an environment variable that limits the max size for a dataset upload. The fix will be available in the next release once this is merged in.

By default, the max size is set in /envs/env:

MAX_DATASET_SIZE=10000

To annotate 10k documents is already a little extreme in a single project, but do feel free to edit this env var, or split your dataset into multiple projects.
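
If splitting is the easier route, something along these lines could do it. This is only a sketch under assumptions: it treats MAX_DATASET_SIZE as a count of documents (rows), as the 10k figure above suggests, and the helper names split_rows and split_csv and the output file naming are hypothetical, not part of MedCATtrainer.

```python
import csv

def split_rows(rows, max_docs):
    """Yield consecutive lists of at most max_docs rows."""
    for start in range(0, len(rows), max_docs):
        yield rows[start:start + max_docs]

def split_csv(in_path, out_prefix, max_docs=10000):
    """Write in_path out as numbered CSVs, each with the original header
    and at most max_docs rows. Returns the list of files written."""
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)  # loads the whole file into memory

    out_paths = []
    for i, chunk in enumerate(split_rows(rows, max_docs), start=1):
        out_path = f"{out_prefix}_{i}.csv"
        with open(out_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(chunk)
        out_paths.append(out_path)
    return out_paths
```

For example, split_csv("dataset.csv", "dataset_part", max_docs=10000) would write dataset_part_1.csv, dataset_part_2.csv and so on, and each file can then be uploaded to its own project.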

Great @tomolopolis, thank you for your efforts! Much appreciated :slight_smile: