MedCAT as service without docker

It’s odd that it doesn’t show a stack trace along with the terminated-worker message.
You could try adding the --log-level debug command-line argument to force gunicorn to use the debug log level.
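A minimal sketch of what that could look like, assuming the gunicorn invocation lives in start-service-prod.sh and the app module is wsgi:app (both assumptions; keep whatever the script already passes):

# Sketch only: the real command is in start-service-prod.sh and may differ.
# --log-level debug makes gunicorn report worker lifecycle events in detail.
gunicorn --config config.py --log-level debug wsgi:app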

Without seeing the stack trace it’s really difficult to judge what the issue is.

You could also try installing another version of MedCAT to see if it works there. But I don’t really see why that should lead to a different outcome.
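For example, by pinning a different release in the service’s virtual environment (the version below is purely illustrative; a natural candidate would be the version the model pack was created with):

# Illustrative only -- pick whichever MedCAT release you want to test against.
pip install "medcat==1.3.0"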

Reading up (on Google) about gunicorn respawning workers without any trace of what went wrong, it is mentioned that when something else (e.g. the OS) kills the worker process, gunicorn may simply notice that the worker died and respawn a new one. I will ask the system admins of the cluster I am on whether they have settings in place that do not fit our use case; I cannot read all the system logs here.
Is the entire model loaded into RAM (vocab + cdb ~ 1.2 GB)?
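Where kernel logs are readable, an out-of-memory kill usually leaves a trace; a sketch of what to look for (this may need privileges a login node does not grant, so it might be something for the admins to run):

# Look for OOM kills around the time the worker died.
dmesg -T | grep -iE 'out of memory|oom-killer|killed process'
# On systemd-based nodes the kernel journal can be filtered by time:
journalctl -k --since "1 hour ago" | grep -i oom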

With --log-level debug there is no additional information at the time of the crash. Perhaps you, @mart.ratas, can see something in the start-up phase?
I will contact the system admins to see if there is anything in their logs or settings.

(mdcservice) [tgwelter@node012 MedCATservice]$ ./start-service-prod.sh
threads = 4
Starting up Flask app using gunicorn server ...
[2023-12-11 11:20:09 +0100] [2900978] [DEBUG] Current configuration:
  config: config.py
  wsgi_app: None
  bind: ['0.0.0.0:5000']
  backlog: 2048
  workers: 1
  worker_class: sync
  threads: 4
  worker_connections: 1000
  max_requests: 0
  max_requests_jitter: 0
  timeout: 300
  graceful_timeout: 30
  keepalive: 2
  limit_request_line: 4094
  limit_request_fields: 100
  limit_request_field_size: 8190
  reload: False
  reload_engine: auto
  reload_extra_files: []
  spew: False
  check_config: False
  print_config: False
  preload_app: False
  sendfile: None
  reuse_port: False
  chdir: /trinity/home/tgwelter/MedCATservice
  daemon: False
  raw_env: []
  pidfile: None
  worker_tmp_dir: None
  user: 1165
  group: 1165
  umask: 0
  initgroups: False
  tmp_upload_dir: None
  secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
  forwarded_allow_ips: ['127.0.0.1']
  accesslog: -
  disable_redirect_access_to_syslog: False
  access_log_format: %(t)s [ACCESSS] %(h)s "%(r)s" %(s)s "%(f)s" "%(a)s"
  errorlog: -
  loglevel: debug
  capture_output: False
  logger_class: gunicorn.glogging.Logger
  logconfig: None
  logconfig_dict: {}
  syslog_addr: udp://localhost:514
  syslog: False
  syslog_prefix: None
  syslog_facility: user
  enable_stdio_inheritance: False
  statsd_host: None
  dogstatsd_tags:
  statsd_prefix:
  proc_name: None
  default_proc_name: wsgi
  pythonpath: None
  paste: None
  on_starting: <function OnStarting.on_starting at 0x1555465782c0>
  on_reload: <function OnReload.on_reload at 0x155546578400>
  when_ready: <function WhenReady.when_ready at 0x155546578540>
  pre_fork: <function Prefork.pre_fork at 0x155546578680>
  post_fork: <function post_fork at 0x15554657a980>
  post_worker_init: <function PostWorkerInit.post_worker_init at 0x155546578900>
  worker_int: <function WorkerInt.worker_int at 0x155546578a40>
  worker_abort: <function WorkerAbort.worker_abort at 0x155546578b80>
  pre_exec: <function PreExec.pre_exec at 0x155546578cc0>
  pre_request: <function PreRequest.pre_request at 0x155546578e00>
  post_request: <function PostRequest.post_request at 0x155546578ea0>
  child_exit: <function ChildExit.child_exit at 0x155546578fe0>
  worker_exit: <function WorkerExit.worker_exit at 0x155546579120>
  nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x155546579260>
  on_exit: <function OnExit.on_exit at 0x1555465793a0>
  proxy_protocol: False
  proxy_allow_ips: ['127.0.0.1']
  keyfile: None
  certfile: None
  ssl_version: 2
  cert_reqs: 0
  ca_certs: None
  suppress_ragged_eofs: True
  do_handshake_on_connect: False
  ciphers: None
  raw_paste_global_conf: []
  strip_header_spaces: False
[2023-12-11 11:20:09 +0100] [2900978] [INFO] Starting gunicorn 20.1.0
[2023-12-11 11:20:09 +0100] [2900978] [DEBUG] Arbiter booted
[2023-12-11 11:20:09 +0100] [2900978] [INFO] Listening at: http://0.0.0.0:5000 (2900978)
[2023-12-11 11:20:09 +0100] [2900978] [INFO] Using worker: gthread
[2023-12-11 11:20:09 +0100] [2900979] [INFO] Booting worker with pid: 2900979
[2023-12-11 11:20:09 +0100] [2900979] [INFO] Worker spawned (pid: 2900979)
[2023-12-11 11:20:09 +0100] [2900979] [INFO] APP_CUDA_DEVICE_COUNT device variables not set
[2023-12-11 11:20:09 +0100] [2900978] [DEBUG] 1 workers
[2023-12-11 11:20:33 +0100] [2900979] [DEBUG] POST /api/process
[2023-12-11 11:20:33,165] [DEBUG] MedCatProcessor: APP log level set to : DEBUG
[2023-12-11 11:20:33,165] [DEBUG] MedCatProcessor: MedCAT log level set to : DEBUG
[2023-12-11 11:20:33,165] [INFO] MedCatProcessor: Initializing MedCAT processor ...
[2023-12-11 11:20:34,430] [INFO] MedCatProcessor: Loading model pack...
[2023-12-11 11:20:48,652] [WARNING] medcat.cdb: You have MedCAT version '1.7.3' installed while the CDB was exported by MedCAT version '1.3.0',
which may or may not work. If you experience any compatibility issues, please reinstall MedCAT
or download the compatible model.
[2023-12-11 11:20:57 +0100] [2900978] [WARNING] Worker with pid 2900979 was terminated due to signal 9
[2023-12-11 11:20:57 +0100] [2901139] [INFO] Booting worker with pid: 2901139
[2023-12-11 11:20:57 +0100] [2901139] [INFO] Worker spawned (pid: 2901139)
[2023-12-11 11:20:57 +0100] [2901139] [INFO] APP_CUDA_DEVICE_COUNT device variables not set

Yes, the entire model will be loaded into memory. But that shouldn’t really be an issue with the amount of memory you’ve got, should it?
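One way to sanity-check the footprint is to watch the worker’s resident memory while the model pack loads; a rough sketch (substitute the worker pid that gunicorn prints at startup):

# Replace 2900979 with the worker pid from the gunicorn log.
# VmRSS is the resident set size, VmPeak the peak virtual size.
watch -n 1 'grep -E "VmRSS|VmPeak" /proc/2900979/status'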

With that said, since this is an HPC cluster, they might not want you to run heavy applications on the front end (i.e. without going through the workload management system).
If that’s what you’ve been doing, you could try doing the same thing in an interactive job environment, as sketched below.
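Assuming the cluster runs SLURM (an assumption; substitute the equivalent for your scheduler), an interactive session with an explicit memory request might look like:

# Assumes SLURM; use the equivalent for your workload manager.
# Requests 8 GB of RAM and 4 CPUs for a 2-hour interactive shell, then starts the service there.
srun --mem=8G --cpus-per-task=4 --time=02:00:00 --pty bash
./start-service-prod.sh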

Nothing in the log really pops out at me.

Thanks, I contacted the sysadmins. They did not mention that my login environment was restricted in any way, but we will see. Indeed, memory should certainly not be a problem.

@mart.ratas It appears to have to do with the cluster settings, because the service runs on my laptop without any problems. Thanks for the help, I will let you know when it runs on the cluster.

@mart.ratas It was indeed a cluster setting that put a limit on memory usage. Increasing the memory available to the session resolved the issue, and it is working fine now.
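For anyone hitting the same thing, the session’s limits can usually be inspected directly (a sketch; the exact mechanism, whether ulimit, cgroups, or scheduler defaults, depends on how the admins configured the node):

# Soft resource limits for the current shell session; check the memory-related entries.
ulimit -a
# The cgroup the session runs in; a cgroup memory cap is typically what delivers the SIGKILL.
cat /proc/self/cgroup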
regards,
Tom
