MedCAT as service without docker

Hi all,

I would like to install MedCAT as a service without Docker and without root access on our HPC cluster. Are any Python modules other than uWSGI needed for this? Or do you have another suggestion?

(I will need to ask the cluster admins for installing extra modules so I would like to know all requirements in advance)

regards,
Tom

Hello,

All required packages are listed here: https://github.com/CogStack/MedCATservice/blob/master/medcat_service/requirements.txt

Run “pip3 install -r ./medcat_service/requirements.txt” from the root of the git repo. Please note that the package versions are updated with every release.

Thanks, I will let you know how it goes.

RHEL 8.7, python 3.9.16

The install with your command seems to work (no errors), but start-service-prod.sh in ~/MedCATservice references ‘gunicorn <…> --config file /cat/config.py’, a path I do not have. Which file should I specify? I tried ~/MedCATservice/config.py, but that gives a Python error (cannot import ‘Dict’ from ‘pydantic.dataclasses’). Leaving the option out gives the same result.

Hey!

It looks like some of the dependencies/requirements (in your case pydantic at the very least) are not available when you’re running the script.

How did you install the dependencies? I assume you ran the command in a python virtual environment. Did you make sure to activate said environment before running the production bash script?

The output of pip freeze could be helpful since it should show you what packages have been installed.

PS:
The production script seems to have been made to work within the docker container, where /cat effectively refers to the root of the repository (according to the Dockerfile). So you should be able to switch out /cat/config.py for just config.py (if you’re running the script from the folder itself) or to the absolute path (which might be ~/MedCATservice/config.py in your case as you mentioned).

This time I used a Python 3.11.3 virtual environment (although I have only one Python version installed).
However, the result is the same; see below for the output of start-service-prod.sh, followed by pip freeze.
I installed the dependencies with “pip3 install -r ./medcat_service/requirements.txt” in ~/MedCATservice and saw no errors there. This seems to be a problem with my Python installation. I will re-install everything (again).
Any Python version recommendations?

(mdcservice) [tgwelter@node011 MedCATservice]$ ./start-service-prod.sh
SERVER_HOST is unset -- setting to default: 0.0.0.0
SERVER_PORT is unset -- setting to default: 5000
SERVER_WORKERS is unset -- setting to default: 1
SERVER_THREADS is unset -- setting to default: 1
SERVER_WORKER_TIMEOUT is unset -- setting to default (sec): 3600
Starting up Flask app using gunicorn server ...
[2023-12-01 23:26:30 +0100] [3383404] [INFO] Starting gunicorn 20.1.0
[2023-12-01 23:26:30 +0100] [3383404] [INFO] Listening at: http://0.0.0.0:5000 (3383404)
[2023-12-01 23:26:30 +0100] [3383404] [INFO] Using worker: sync
[2023-12-01 23:26:30 +0100] [3383409] [INFO] Booting worker with pid: 3383409
[2023-12-01 23:26:30 +0100] [3383409] [INFO] Worker spawned (pid: 3383409)
[2023-12-01 23:26:30 +0100] [3383409] [INFO] APP_CUDA_DEVICE_COUNT device variables not set
[2023-12-01 23:26:36 +0100] [3383409] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
                ^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
                    ^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
           ^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/shared/apps/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/trinity/home/tgwelter/MedCATservice/wsgi.py", line 6, in <module>
    from medcat_service.app import create_app
  File "/trinity/home/tgwelter/MedCATservice/medcat_service/app/__init__.py", line 4, in <module>
    from .app import create_app
  File "/trinity/home/tgwelter/MedCATservice/medcat_service/app/app.py", line 12, in <module>
    from medcat_service.api import api
  File "/trinity/home/tgwelter/MedCATservice/medcat_service/api/__init__.py", line 4, in <module>
    from .api import api
  File "/trinity/home/tgwelter/MedCATservice/medcat_service/api/api.py", line 11, in <module>
    from medcat_service.nlp_service import NlpService
  File "/trinity/home/tgwelter/MedCATservice/medcat_service/nlp_service/__init__.py", line 4, in <module>
    from .nlp_service import MedCatService, NlpService
  File "/trinity/home/tgwelter/MedCATservice/medcat_service/nlp_service/nlp_service.py", line 6, in <module>
    from medcat_service.nlp_processor import MedCatProcessor
  File "/trinity/home/tgwelter/MedCATservice/medcat_service/nlp_processor/__init__.py", line 4, in <module>
    from .medcat_processor import MedCatProcessor
  File "/trinity/home/tgwelter/MedCATservice/medcat_service/nlp_processor/medcat_processor.py", line 10, in <module>
    from medcat.cat import CAT
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/medcat/cat.py", line 23, in <module>
    from medcat.preprocessing.tokenizers import spacy_split_all
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/medcat/preprocessing/tokenizers.py", line 10, in <module>
    from medcat.config import Config
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/medcat/config.py", line 3, in <module>
    from pydantic.dataclasses import Any, Callable, Dict, Optional, Union
ImportError: cannot import name 'Dict' from 'pydantic.dataclasses' (/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/pydantic/dataclasses.py)
[2023-12-01 23:26:36 +0100] [3383409] [INFO] Worker exiting (pid: 3383409)
[2023-12-01 23:26:37 +0100] [3383404] [INFO] Shutting down: Master
[2023-12-01 23:26:37 +0100] [3383404] [INFO] Reason: Worker failed to boot.
(mdcservice) [tgwelter@node011 MedCATservice]$ pip freeze
aiofiles==23.2.1
aiohttp==3.8.3
aiosignal==1.3.1
annotated-types==0.6.0
asttokens==2.4.1
async-timeout==4.0.3
attrs==23.1.0
blinker==1.7.0
blis==0.7.11
catalogue==2.0.10
certifi==2023.11.17
charset-normalizer==2.1.1
click==8.1.7
cloudpathlib==0.16.0
comm==0.2.0
confection==0.1.4
cymem==2.0.8
datasets==2.15.0
decorator==5.1.1
dill==0.3.7
distlib @ file:///trinity/shared/apps/easybuild/build/virtualenv/20.23.1/GCCcore-12.3.0/distlib/distlib-0.3.6
executing==2.0.1
filelock @ file:///trinity/shared/apps/easybuild/build/virtualenv/20.23.1/GCCcore-12.3.0/filelock/filelock-3.12.2
Flask==2.3.2
Flask-Injector==0.14.0
frozenlist==1.4.0
fsspec==2023.10.0
gensim==4.3.2
gunicorn==20.1.0
huggingface-hub==0.19.4
idna==3.6
injector==0.20.1
interchange==2021.0.4
ipython==8.18.1
ipywidgets==8.1.1
itsdangerous==2.1.2
jedi==0.19.1
Jinja2==3.1.2
joblib==1.3.2
jsonpickle==3.0.2
jupyterlab-widgets==3.0.9
langcodes==3.3.0
MarkupSafe==2.1.3
matplotlib-inline==0.1.6
medcat==1.7.3
monotonic==1.6
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
murmurhash==1.0.10
networkx==3.2.1
numpy==1.25.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
packaging==23.2
pandas==2.1.3
pansi==2020.7.3
parso==0.8.3
pexpect==4.9.0
platformdirs @ file:///trinity/shared/apps/easybuild/build/virtualenv/20.23.1/GCCcore-12.3.0/platformdirs/platformdirs-3.8.0
preshed==3.0.9
prompt-toolkit==3.0.41
psutil==5.9.6
ptyprocess==0.7.0
pure-eval==0.2.2
py2neo==2021.2.4
pyarrow==14.0.1
pyarrow-hotfix==0.6
pydantic==2.5.2
pydantic_core==2.14.5
Pygments==2.17.2
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
regex==2023.10.3
requests==2.31.0
safetensors==0.4.1
scikit-learn==1.3.2
scipy==1.9.3
semantic-version==2.10.0
setuptools-rust==1.6.0
simplejson==3.17.6
six==1.16.0
smart-open==6.4.0
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
srsly==2.4.8
stack-data==0.6.3
sympy==1.12
thinc==8.2.1
threadpoolctl==3.2.0
tokenizers==0.15.0
torch==2.1.1
tqdm==4.66.1
traitlets==5.14.0
transformers==4.35.2
triton==2.1.0
typer==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.1.0
virtualenv @ file:///trinity/shared/apps/easybuild/build/virtualenv/20.23.1/GCCcore-12.3.0/virtualenv/virtualenv-20.23.1
wasabi==1.1.2
wcwidth==0.2.12
weasel==0.3.4
Werkzeug==2.3.3
widgetsnbextension==4.0.9
xxhash==3.4.1
yarl==1.9.3

I might be wrong, but given that you’re running pydantic>=2.0 (2.5.2 according to your pip freeze), I’m fairly certain that’s the issue here.

Pydantic 2 has breaking changes compared to 1.x which we (read: I) didn’t anticipate when loosening the dependencies. As such, we should (effectively) depend on pydantic<2.0.
I think we changed that in later versions, but it’s still open for 1.7.3, which is what you’ve got.

So all in all, I’d try to install pydantic<2.0.
Hopefully that fixes the issue.

Thanks for your late evening reply! That could indeed be the problem. The dataclasses.py in the pydantic that I have really does not have ‘Dict’. I just tried Python 3.9.16, but that one has problems getting one of the nvidia sources… I will try other things later but not now, it is too late…

Just to clarify, all you should have to do is run pip3 install "pydantic<2.0" in your virtual environment (the quotes stop the shell from treating < as input redirection).
This will install a compatible pydantic version.

Now, it’s possible that at that point pip complains about package version incompatibility (i.e it might have previously installed another package that we need, but in a version that supports only pydantic>=2.0).
In this case, it’d probably be easier to modify the requirements file and specify pydantic<2.0 in a new line in there (the ordering doesn’t matter). And then re-run the installation based on the requirements file.
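
A minimal sketch of the requirements-file route described above. It works on a scratch copy (the file contents are illustrative stand-ins, not the real requirements.txt) and appends the pin only when no pydantic line exists yet:

```shell
# Scratch stand-in for medcat_service/requirements.txt (contents are illustrative).
req="$(mktemp)"
printf 'Flask==2.3.2\nmedcat==1.7.3\n' > "$req"

# Append the pin only if the file does not already constrain pydantic.
grep -qi '^pydantic' "$req" || echo 'pydantic<2.0' >> "$req"

# Show the resulting file.
cat "$req"
```

Against the real file, the same grep/echo line would be followed by re-running pip3 install -r ./medcat_service/requirements.txt inside the activated virtual environment.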

I actually tried both options already with pydantic==1.10.7 (not ‘<’), but the installation of dependencies gets killed when installing nvidia-cudnn-cu12. I also tried --force-reinstall and a separate install, all with the same result. I was messing about just now and tried with pydantic<2.0 in the requirements file. This installs pydantic 1.10.13, but the install then gets killed at torch-2.1.1.

I think there is something wrong with my specific Python installation? I do not have complete control over it (not root on this machine) but can make virtual environments. I will keep digging some more.

Getting a little bit further. After recreating the virtual environment again, the install of dependencies ran without errors. It was a lot faster this time, perhaps because of caching. However, starting the service gave a new error: AttributeError: 'Flask' object has no attribute 'before_first_request_funcs'. See below for the output, with my requirements file and pip freeze at the end. Any ideas?

(mdcservice) [tgwelter@node011 MedCATservice]$ ./start-service-prod.sh 
SERVER_HOST is unset -- setting to default: 0.0.0.0
SERVER_PORT is unset -- setting to default: 5000
SERVER_WORKERS is unset -- setting to default: 1
SERVER_THREADS is unset -- setting to default: 1
SERVER_WORKER_TIMEOUT is unset -- setting to default (sec): 3600
Starting up Flask app using gunicorn server ...
[2023-12-04 11:24:08 +0100] [3925799] [INFO] Starting gunicorn 20.1.0
[2023-12-04 11:24:08 +0100] [3925799] [INFO] Listening at: http://0.0.0.0:5000 (3925799)
[2023-12-04 11:24:08 +0100] [3925799] [INFO] Using worker: sync
[2023-12-04 11:24:08 +0100] [3925800] [INFO] Booting worker with pid: 3925800
[2023-12-04 11:24:08 +0100] [3925800] [INFO] Worker spawned (pid: 3925800)
[2023-12-04 11:24:08 +0100] [3925800] [INFO] APP_CUDA_DEVICE_COUNT device variables not set
[2023-12-04 11:24:17 +0100] [3925800] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
                ^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
                    ^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
           ^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/shared/apps/easybuild/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/trinity/home/tgwelter/MedCATservice/wsgi.py", line 8, in <module>
    application = create_app()
                  ^^^^^^^^^^^^
  File "/trinity/home/tgwelter/MedCATservice/medcat_service/app/app.py", line 56, in create_app
    flask_injector.FlaskInjector(
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/flask_injector/__init__.py", line 320, in __init__
    process_list(app.before_first_request_funcs, injector)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Flask' object has no attribute 'before_first_request_funcs'
[2023-12-04 11:24:17 +0100] [3925800] [INFO] Worker exiting (pid: 3925800)
[2023-12-04 11:24:18 +0100] [3925799] [INFO] Shutting down: Master
[2023-12-04 11:24:18 +0100] [3925799] [INFO] Reason: Worker failed to boot.

Flask==2.3.2
gunicorn==20.1.0
injector==0.20.1
flask-injector==0.14.0
medcat==1.7.3
setuptools==65.5.1
simplejson==3.17.6
werkzeug==2.3.3
setuptools_rust==1.6.0
pydantic<2.0
(mdcservice) [tgwelter@node011 MedCATservice]$ pip freeze
aiofiles==23.2.1
aiohttp==3.8.3
aiosignal==1.3.1
asttokens==2.4.1
async-timeout==4.0.3
attrs==23.1.0
blinker==1.7.0
blis==0.7.11
catalogue==2.0.10
certifi==2023.11.17
charset-normalizer==2.1.1
click==8.1.7
cloudpathlib==0.16.0
comm==0.2.0
confection==0.1.4
cymem==2.0.8
datasets==2.15.0
decorator==5.1.1
dill==0.3.7
distlib @ file:///trinity/shared/apps/easybuild/build/virtualenv/20.23.1/GCCcore-12.3.0/distlib/distlib-0.3.6
executing==2.0.1
filelock @ file:///trinity/shared/apps/easybuild/build/virtualenv/20.23.1/GCCcore-12.3.0/filelock/filelock-3.12.2
Flask==2.3.2
Flask-Injector==0.14.0
frozenlist==1.4.0
fsspec==2023.10.0
gensim==4.3.2
gunicorn==20.1.0
huggingface-hub==0.19.4
idna==3.6
injector==0.20.1
interchange==2021.0.4
ipython==8.18.1
ipywidgets==8.1.1
itsdangerous==2.1.2
jedi==0.19.1
Jinja2==3.1.2
joblib==1.3.2
jsonpickle==3.0.2
jupyterlab-widgets==3.0.9
langcodes==3.3.0
MarkupSafe==2.1.3
matplotlib-inline==0.1.6
medcat==1.7.3
monotonic==1.6
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
murmurhash==1.0.10
networkx==3.2.1
numpy==1.25.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
packaging==23.2
pandas==2.1.3
pansi==2020.7.3
parso==0.8.3
pexpect==4.9.0
platformdirs @ file:///trinity/shared/apps/easybuild/build/virtualenv/20.23.1/GCCcore-12.3.0/platformdirs/platformdirs-3.8.0
preshed==3.0.9
prompt-toolkit==3.0.41
psutil==5.9.6
ptyprocess==0.7.0
pure-eval==0.2.2
py2neo==2021.2.4
pyarrow==14.0.1
pyarrow-hotfix==0.6
pydantic==1.10.13
Pygments==2.17.2
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
regex==2023.10.3
requests==2.31.0
safetensors==0.4.1
scikit-learn==1.3.2
scipy==1.9.3
semantic-version==2.10.0
setuptools-rust==1.6.0
simplejson==3.17.6
six==1.16.0
smart-open==6.4.0
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
srsly==2.4.8
stack-data==0.6.3
sympy==1.12
thinc==8.2.1
threadpoolctl==3.2.0
tokenizers==0.15.0
torch==2.1.1
tqdm==4.66.1
traitlets==5.14.0
transformers==4.35.2
triton==2.1.0
typer==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.1.0
virtualenv @ file:///trinity/shared/apps/easybuild/build/virtualenv/20.23.1/GCCcore-12.3.0/virtualenv/virtualenv-20.23.1
wasabi==1.1.2
wcwidth==0.2.12
weasel==0.3.4
Werkzeug==2.3.3
widgetsnbextension==4.0.9
xxhash==3.4.1
yarl==1.9.3

Just for fun, I disabled line 320 in /trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/flask_injector/__init__.py (the line from the traceback above), and now the service is running :)
When I submit a query, the error now is:
raise ValueError("Vocabulary (env: APP_MODEL_VOCAB_PATH) not specified").
It is not running out of the box, but I should be able to configure that. I want the Dutch language model anyway.

QUESTION: there are some references to the ‘/cat’ directory in files in the MedCATservice directory tree, mainly related to the /cat/model directory. This has to do with the location of the model files in Docker containers. I suppose I will have to replace them with my own directory to get things running? There are also some mentions in *.pyc (compiled Python) files. Will this be a problem, since I cannot change these?

With respect to the Flask error, it looks like we’d need Flask<2.3.
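
One way to apply that, sketched on a scratch copy of a few of the pins listed earlier in the thread. The sed pattern assumes the real file spells the line exactly as Flask==2.3.2:

```shell
# Scratch stand-in for medcat_service/requirements.txt (three of the pins shown above).
req="$(mktemp)"
printf 'Flask==2.3.2\ngunicorn==20.1.0\nflask-injector==0.14.0\n' > "$req"

# Replace the exact Flask pin with an upper bound.
# The pattern is anchored and case-sensitive, so the flask-injector line is left alone.
sed 's/^Flask==.*/Flask<2.3/' "$req" > "$req.new"

cat "$req.new"
```

Whether the remaining pins resolve cleanly against an older Flask has not been checked here, so pip may still report a conflict when the install is re-run.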

With respect to the /cat directory, you are correct, there are a few mentions of it in the project and you would probably need to change those on your installation.
This is what I found with a quick grep:

./medcat_service/nlp_processor/medcat_processor.py:        with open("/cat/models/data.json", "w") as f:
./medcat_service/nlp_processor/medcat_processor.py:        DATA_PATH = "/cat/models/data.json"
./medcat_service/nlp_processor/medcat_processor.py:        CDB_PATH = "/cat/models/cdb.dat"
./medcat_service/nlp_processor/medcat_processor.py:        VOCAB_PATH = "/cat/models/vocab.dat"
./medcat_service/nlp_processor/medcat_processor.py:            cat.cdb.save_dict("/cat/models/cdb_new.dat")
./envs/env_app:APP_MODEL_CDB_PATH=/cat/models/cdb.dat
./envs/env_app:APP_MODEL_VOCAB_PATH=/cat/models/vocab.dat
./envs/env_app:APP_MODEL_META_PATH_LIST=/cat/models/Status
./envs/env_app:# Respect the same paths as above : /cat/models/model_pack_name.zip
./envs/env_app:# APP_MODEL_CUI_FILTER_PATH=/cat/models/cui_filter.txt
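
For the env file entries, one option is to leave the repo's file untouched and generate a local copy with the Docker prefix rewritten. A sketch, where MODEL_DIR is a hypothetical location and a scratch file stands in for envs/env_app:

```shell
# Hypothetical model directory; adjust to wherever the model files actually live.
MODEL_DIR="$HOME/MedCATservice/models"

# Scratch stand-in for envs/env_app, using two of the lines from the grep output.
envfile="$(mktemp)"
printf 'APP_MODEL_CDB_PATH=/cat/models/cdb.dat\nAPP_MODEL_VOCAB_PATH=/cat/models/vocab.dat\n' > "$envfile"

# Rewrite the Docker-specific prefix; "#" is the sed delimiter because the paths contain "/".
sed "s#/cat/models#$MODEL_DIR#g" "$envfile" > "$envfile.local"

cat "$envfile.local"
```

The hard-coded paths in medcat_processor.py are not covered by this and would still need editing by hand.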

In general, the user should not worry about .pyc files. These are automatically managed by the python interpreter and recompiled when necessary (i.e when the corresponding .py file changes).

Hi @mart.ratas , can you help me once again please?!
The service runs as far as flask and pydantic are concerned. However, the env variables from ./envs/env_app do not seem to be used/set (see error below). When submitting a query via curl, an error is reported that APP_MEDCAT_MODEL_PACK and APP_MODEL_VOCAB_PATH are not specified, although they are.
I recreated the ‘models’ directory to contain the same files and dirs as in my (working) MedCAT Docker container. I cannot find where the env files are sourced/read, and if I source them myself at the beginning of start-service-prod.sh I get the same error. The model pack zip file is the same as in my Docker container.

[2023-12-07 10:06:22,343] [INFO] MedCatProcessor: Initializing MedCAT processor ...
[2023-12-07 10:06:23,608] [INFO] MedCatProcessor: APP_MEDCAT_MODEL_PACK not set, skipping....
[2023-12-07 10:06:23,609] [ERROR] medcat_service.app.app: Exception on /api/process [POST]
Traceback (most recent call last):
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 783, in get
    return self._context[key]
           ~~~~~~~~~~~~~^^^^^
KeyError: <class 'medcat_service.nlp_service.nlp_service.NlpService'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 783, in get
    return self._context[key]
           ~~~~~~~~~~~~~^^^^^
KeyError: <class 'medcat_service.nlp_processor.medcat_processor.MedCatProcessor'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/flask/app.py", line 2529, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/flask_injector/__init__.py", line 89, in wrapper
    return injector.call_with_injection(callable=fun, args=args, kwargs=kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 999, in call_with_injection
    dependencies = self.args_to_inject(
                   ^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 1047, in args_to_inject
    instance: Any = self.get(interface)
                    ^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 943, in get
    result = scope_instance.get(interface, binding.provider).get(self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 785, in get
    provider = InstanceProvider(provider.get(self.injector))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 264, in get
    return injector.create_object(self._cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 966, in create_object
    self.call_with_injection(init, self_=instance, kwargs=additional_kwargs)
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 999, in call_with_injection
    dependencies = self.args_to_inject(
                   ^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 1047, in args_to_inject
    instance: Any = self.get(interface)
                    ^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 943, in get
    result = scope_instance.get(interface, binding.provider).get(self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 785, in get
    provider = InstanceProvider(provider.get(self.injector))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 264, in get
    return injector.create_object(self._cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 966, in create_object
    self.call_with_injection(init, self_=instance, kwargs=additional_kwargs)
  File "/trinity/home/tgwelter/environments/mdcservice/lib/python3.11/site-packages/injector/__init__.py", line 1008, in call_with_injection
    return callable(*full_args, **dependencies)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/MedCATservice/medcat_service/nlp_processor/medcat_processor.py", line 65, in __init__
    self.cat = self._create_cat()
               ^^^^^^^^^^^^^^^^^^
  File "/trinity/home/tgwelter/MedCATservice/medcat_service/nlp_processor/medcat_processor.py", line 240, in _create_cat
    raise ValueError("Vocabulary (env: APP_MODEL_VOCAB_PATH) not specified")
ValueError: Vocabulary (env: APP_MODEL_VOCAB_PATH) not specified
[07/Dec/2023:10:06:23 +0100] [ACCESSS] 127.0.0.1 "POST /api/process HTTP/1.1" 500 "-" "curl/7.61.1"

In normal operation, the env files are read by docker - they’re passed as keyword arguments (refer to the MedCATservice readme).
But since you’re running it without docker, you’ll need to read them in manually.
The problem is that when you source the env files, the environmental variables are only defined within the scope of your terminal session. They are not passed to other scripts you run (i.e they are not accessible when you run the startup script).

So what we need to do is export the environment variables. Perhaps the easiest way to do that is to simply prefix all the variables in the file(s) with export (i.e., APP_NAME=MedCAT becomes export APP_NAME=MedCAT).
After you’ve done that, and have subsequently sourced the files (you probably need both since they’re both passed to docker), the variables/values will be available within the startup script and subsequently the service.
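
A small sketch of the difference, using a scratch file in place of envs/env_app. It also shows set -a, a standard shell option that exports every variable assigned while it is on, as an alternative to editing the env files:

```shell
# Scratch stand-in for envs/env_app: plain KEY=value lines, no "export".
envfile="$(mktemp)"
printf 'APP_NAME=MedCAT\n' > "$envfile"
unset APP_NAME

# 1) Plain sourcing: the variable exists in this shell only, so a child process cannot see it.
. "$envfile"
plain="$(sh -c 'echo "${APP_NAME:-unset}"')"

# 2) With set -a, variables assigned while sourcing are marked for export.
unset APP_NAME
set -a
. "$envfile"
set +a
exported="$(sh -c 'echo "${APP_NAME:-unset}"')"

echo "plain source: $plain"
echo "with set -a:  $exported"
```

The child shell reports unset in the first case and MedCAT in the second, which is the same situation gunicorn is in when it is launched from the startup script.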

Thanks for the suggestion. I thought that sourcing the env files from within start-service-prod.sh would make the variables available, but apparently ‘export’ was still necessary.

However, never-ending fun: my model pack (Dutch SNOMED v1.10) was made with medcat v1.3, whereas the current MedCAT service runs v1.7. This produces a warning, the medcat instance crashes, a new worker is spawned, and no result is returned.

I suppose I will have to make/get a newer model pack, for which I will contact my Dutch colleagues in Utrecht. My tests with older versions of MedCAT service (v0.5) and Python (v3.9) produced yet another bunch of errors (no matching medcat distribution found for py2neo==2021.2.3).

to be continued…

It should be possible to resolve the version incompatibility.

I’m guessing your issue is similar to this one:

If so, you should be able to migrate the model.
Though the code to do so was released in medcat v1.8.0.

So what I would do is the following:

# you might want to do this in another virtual environment in order to avoid messing with the one that otherwise works for you
# or you can just do this on another machine and transfer the model later

# install medcat 1.8
python -m pip install medcat==1.8.2

# do the conversion
python -m medcat.utils.versioning fix-config <model_pack_path> <new_model_pack_path>

Now you should have a new model that should work with medcat 1.7 (there’s nothing that changed between 1.7 and 1.8 in that regard).

I tried as you suggested but it appears that the version warning is not the problem or cause of the crashing medcat: ‘Model pack does not need ugprade’. I will try to get more debug info (the setting in the env file did not do much)

tom@22-002756:~/convert$ python -m medcat.utils.versioning fix-config ~/cogstack-nl/services/nlp-services/applications/medcat/models/umls-dutch-v1-10/model_pack_umls_dutch/ ./newp/
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/tom/convert/packconv/lib/python3.10/site-packages/medcat/utils/versioning.py", line 314, in <module>
    main()
  File "/home/tom/convert/packconv/lib/python3.10/site-packages/medcat/utils/versioning.py", line 308, in main
    fix_config(args)
  File "/home/tom/convert/packconv/lib/python3.10/site-packages/medcat/utils/versioning.py", line 295, in fix_config
    upgrader.upgrade(args.newpath, overwrite=args.overwrite)
  File "/home/tom/convert/packconv/lib/python3.10/site-packages/medcat/utils/versioning.py", line 215, in upgrade
    raise ValueError(f"Model pack does not need ugprade: {self.model_pack_path} "
ValueError: Model pack does not need ugprade: /home/tom/cogstack-nl/services/nlp-services/applications/medcat/models/umls-dutch-v1-10/model_pack_umls_dutch/ since it's at version: (1, 3, 0)

What is the actual error when you run it and it crashes?

I set ./envs/env_app → export APP_LOG_LEVEL=DEBUG, but that does not produce more output (see below). The service starts up with log messages up to and including ‘APP_CUDA_DEVICE_COUNT device variables not set’. After that, a query is submitted, which starts the MedCAT initialization. I also tried setting the spaCy model path in env_medcat to the correct path, with no effect.
To clear the filename argument error I set env_medcat → export KEEP_PUNCT=":|.", with no effect.
The server has approx. 400 GB of RAM available (of 500 GB).

(mdcservice) [tgwelter@node011 MedCATservice]$ ./start-service-prod.sh 
./envs/env_medcat: line 18: .: filename argument required
.: usage: . filename [arguments]
threads = 4
Starting up Flask app using gunicorn server ...
[2023-12-09 12:40:24 +0100] [1390041] [INFO] Starting gunicorn 20.1.0
[2023-12-09 12:40:24 +0100] [1390041] [INFO] Listening at: http://0.0.0.0:5000 (1390041)
[2023-12-09 12:40:24 +0100] [1390041] [INFO] Using worker: gthread
[2023-12-09 12:40:24 +0100] [1390061] [INFO] Booting worker with pid: 1390061
[2023-12-09 12:40:24 +0100] [1390061] [INFO] Worker spawned (pid: 1390061)
[2023-12-09 12:40:24 +0100] [1390061] [INFO] APP_CUDA_DEVICE_COUNT device variables not set
[2023-12-09 12:40:35,221] [DEBUG] MedCatProcessor: APP log level set to : DEBUG
[2023-12-09 12:40:35,221] [DEBUG] MedCatProcessor: MedCAT log level set to : 40
[2023-12-09 12:40:35,221] [INFO] MedCatProcessor: Initializing MedCAT processor ...
[2023-12-09 12:40:36,464] [INFO] MedCatProcessor: Loading model pack...
[2023-12-09 12:40:48,798] [WARNING] medcat.cdb: You have MedCAT version '1.7.3' installed while the CDB was exported by MedCAT version '1.3.0',
which may or may not work. If you experience any compatibility issues, please reinstall MedCAT
or download the compatible model.
[2023-12-09 12:40:56 +0100] [1390041] [WARNING] Worker with pid 1390061 was terminated due to signal 9
[2023-12-09 12:40:56 +0100] [1390144] [INFO] Booting worker with pid: 1390144
[2023-12-09 12:40:56 +0100] [1390144] [INFO] Worker spawned (pid: 1390144)
[2023-12-09 12:40:56 +0100] [1390144] [INFO] APP_CUDA_DEVICE_COUNT device variables not set
[2023-12-09 12:44:35 +0100] [1390041] [INFO] Handling signal: winch
[2023-12-09 12:44:35 +0100] [1390041] [INFO] Handling signal: winch
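
A note on the ‘.: filename argument required’ message at the top of the log: one way this exact message can arise is an unquoted value containing a pipe character in a sourced env file. The shell then parses the line as a pipeline whose second command is the ‘.’ (source) builtin with no file argument. The sketch below reproduces that in isolation. The variable name DEMO_PUNCT is made up for the demonstration, and what actually sits on line 18 of env_medcat is not shown in this thread, so this is only a possible explanation.

```shell
#!/usr/bin/env bash
# Minimal reproduction of the ".: filename argument required" message.
# DEMO_PUNCT is a hypothetical variable name used only for this demo.

# Unquoted: bash reads this as  export DEMO_PUNCT=:  |  .
# so the "." builtin runs without a filename and complains on stderr.
unquoted_err=$(bash -c 'export DEMO_PUNCT=:|.' 2>&1 || true)
echo "unquoted -> ${unquoted_err}"

# Quoted: the whole string, pipe included, becomes the value.
quoted_val=$(bash -c 'export DEMO_PUNCT=":|."; printf "%s" "$DEMO_PUNCT"')
echo "quoted   -> ${quoted_val}"
```

Since quoting KEEP_PUNCT did not clear the message here, it may be worth printing the reported line with sed -n '18p' ./envs/env_medcat to see which statement the shell is actually complaining about, and checking that the edited file is the one the start script sources.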

Setting ./envs/env_medcat → export LOG_LEVEL=DEBUG gives almost the same log output:

(mdcservice) [tgwelter@node011 MedCATservice]$ ./start-service-prod.sh 
./envs/env_medcat: line 18: .: filename argument required
.: usage: . filename [arguments]
threads = 4
Starting up Flask app using gunicorn server ...
[2023-12-09 12:55:19 +0100] [1393190] [INFO] Starting gunicorn 20.1.0
[2023-12-09 12:55:19 +0100] [1393190] [INFO] Listening at: http://0.0.0.0:5000 (1393190)
[2023-12-09 12:55:19 +0100] [1393190] [INFO] Using worker: gthread
[2023-12-09 12:55:19 +0100] [1393191] [INFO] Booting worker with pid: 1393191
[2023-12-09 12:55:19 +0100] [1393191] [INFO] Worker spawned (pid: 1393191)
[2023-12-09 12:55:19 +0100] [1393191] [INFO] APP_CUDA_DEVICE_COUNT device variables not set
[2023-12-09 12:55:53,068] [DEBUG] MedCatProcessor: APP log level set to : DEBUG
[2023-12-09 12:55:53,068] [DEBUG] MedCatProcessor: MedCAT log level set to : DEBUG
[2023-12-09 12:55:53,068] [INFO] MedCatProcessor: Initializing MedCAT processor ...
[2023-12-09 12:55:54,313] [INFO] MedCatProcessor: Loading model pack...
[2023-12-09 12:56:06,799] [WARNING] medcat.cdb: You have MedCAT version '1.7.3' installed while the CDB was exported by MedCAT version '1.3.0',
which may or may not work. If you experience any compatibility issues, please reinstall MedCAT
or download the compatible model.
[2023-12-09 12:56:09 +0100] [1393190] [WARNING] Worker with pid 1393191 was terminated due to signal 9
[2023-12-09 12:56:09 +0100] [1393353] [INFO] Booting worker with pid: 1393353
[2023-12-09 12:56:09 +0100] [1393353] [INFO] Worker spawned (pid: 1393353)
[2023-12-09 12:56:09 +0100] [1393353] [INFO] APP_CUDA_DEVICE_COUNT device variables not set
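
Regarding ‘terminated due to signal 9’ in both logs: signal 9 is SIGKILL, which is sent from outside the worker and cannot be caught, so no Python traceback is expected at any log level. A common source is a memory limit (the kernel OOM killer, or a per-job limit set by a scheduler), but the logs above do not show which. The following is a rough sketch for checking the limits that apply to the current shell. It assumes Linux with cgroups mounted at /sys/fs/cgroup, and the helper name cgroup_memory_limit_file is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: show the memory limits that apply to this shell and to
# anything started from it. Assumes cgroups are mounted at /sys/fs/cgroup.

# Hypothetical helper: given the text of /proc/<pid>/cgroup, print the
# path of the file that holds the memory limit (empty if none is found).
cgroup_memory_limit_file() {
    local entries=$1 line
    # cgroup v1 has a "memory" controller line, e.g. "9:memory:/some/group".
    line=$(printf '%s\n' "$entries" \
        | grep -E '^[0-9]+:([^:]*,)?memory(,[^:]*)?:' | head -n 1 || true)
    if [ -n "$line" ]; then
        printf '/sys/fs/cgroup/memory%s/memory.limit_in_bytes\n' "${line#*:*:}"
        return 0
    fi
    # cgroup v2 has a single unified line, e.g. "0::/some/group".
    line=$(printf '%s\n' "$entries" | grep -E '^0::' | head -n 1 || true)
    if [ -n "$line" ]; then
        printf '/sys/fs/cgroup%s/memory.max\n' "${line#0::}"
    fi
    return 0
}

echo "signal 9 is SIG$(kill -l 9)"
echo "ulimit -v (virtual memory, KiB): $(ulimit -v)"

limit_file=$(cgroup_memory_limit_file "$(cat /proc/self/cgroup 2>/dev/null || true)")
if [ -n "$limit_file" ] && [ -r "$limit_file" ]; then
    echo "cgroup memory limit (${limit_file}): $(cat "$limit_file")"
else
    echo "no readable cgroup memory limit found"
fi
```

Running this in the same session or job that launches ./start-service-prod.sh shows whether a cap far below the node's free RAM is in effect. If one is, that would be consistent with the worker being killed while loading the model pack, though it would not prove it.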