MedCAT Trainer Helm Chart for Kubernetes

Hi All,

Our analytics infrastructure is hosted on Microsoft Azure and a large part of it is deployed to Azure Kubernetes Service (AKS).

I have been working on a scalable deployment of MedCAT for annotation on AKS as noted in a previous post.

It made sense to also be able to deploy MedCAT Trainer on AKS. I was unable to find a Helm chart and so have started working on one.

At the time of writing this post, this is the status:

  • SOLR pod works and the UI is reachable through port-forwarding
  • NGINX pod crashes with the error messages below. The nginx container appears to be heavily customised, and I wonder whether the k8s nginx controller could be used instead, with some modifications. I will explore this further when I get a chance.
  • MedCATTrainer pod does not pull the image successfully yet. However, since the image pulled successfully to my laptop when using docker-compose, I suspect this is simply a timeout issue and should be easy enough to fix.

NGINX error message

/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf is not a file or does not exist
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2023/03/20 12:20:34 [emerg] 1#1: host not found in upstream "mct_solr" in /etc/nginx/sites-enabled/medcattrainer:17
nginx: [emerg] host not found in upstream "mct_solr" in /etc/nginx/sites-enabled/medcattrainer:17
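
A possible contributing factor, offered as an assumption rather than a confirmed diagnosis: Kubernetes validates Service names as DNS labels, which cannot contain underscores, so no Service can be called "mct_solr" and the upstream name in the nginx config has nothing to resolve to. The sketch below checks a docker-compose service name against that rule and derives a usable alternative. The helper to_service_name is hypothetical and is not part of MedCATTrainer or Kubernetes.

```python
import re

# Kubernetes validates Service names as RFC 1035 labels: lowercase letters,
# digits and '-', starting with a letter, ending with a letter or digit,
# and at most 63 characters long.
DNS_1035_LABEL = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])?")
MAX_LABEL_LENGTH = 63


def is_valid_service_name(name: str) -> bool:
    """Return True if `name` could be used as a Kubernetes Service name."""
    return len(name) <= MAX_LABEL_LENGTH and DNS_1035_LABEL.fullmatch(name) is not None


def to_service_name(compose_name: str) -> str:
    """Hypothetical helper: derive a Service name from a docker-compose name."""
    # Lowercase, then replace every disallowed character with '-'.
    candidate = re.sub(r"[^a-z0-9-]", "-", compose_name.lower()).strip("-")
    if not is_valid_service_name(candidate):
        raise ValueError(f"cannot derive a valid Service name from {compose_name!r}")
    return candidate


for compose_name in ("mct_solr", "medcattrainer"):
    print(compose_name, is_valid_service_name(compose_name), to_service_name(compose_name))
```

If this is the cause, the upstream host in the nginx site config would need to be changed to whatever name the SOLR Service is given in the chart.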

If someone has already built a working version, please let me know before I travel too far down this rabbit hole.

Else I will post updates as I go along. Contributions welcome.

MedCAT Trainer also starts up successfully.

However, with no exposed port, I am not sure how the other pods talk to this pod.

docker-compose.yml does not specify a port for medcattrainer, while docker-compose-prod.yml and docker-compose-mc0x.yml both expose port 8000.

I have created a new k8s service exposing port 8000 on the MedCATTrainer pod and am port-forwarding to it, but this results in the following blank webpage.

<html lang="en">

<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <link rel="icon" href="/favicon.ico">
    <title>MedCATTrainer</title>
    <link href="/static/css/app.5980f582.css" rel="preload" as="style">
    <link href="/static/css/chunk-vendors.6879fd7c.css" rel="preload" as="style">
    <link href="/static/js/app.15921033.js" rel="preload" as="script">
    <link href="/static/js/chunk-vendors.83ec4c01.js" rel="preload" as="script">
    <link href="/static/css/chunk-vendors.6879fd7c.css" rel="stylesheet">
    <link href="/static/css/app.5980f582.css" rel="stylesheet">
</head>

<body><noscript><strong>We're sorry but MedCATTrainer doesn't work properly without JavaScript enabled. Please enable it
            to continue.</strong></noscript>
    <div id="app"></div>
    <script src="/static/js/chunk-vendors.83ec4c01.js"></script>
    <script src="/static/js/app.15921033.js"></script>
</body>

</html>
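
One way to narrow down a blank page like this is to check whether the assets the page references are actually being served, since the HTML shell loading without its JavaScript bundles would look exactly like this. The sketch below extracts the link and script URLs from the page and requests each one. The base URL http://localhost:8000 is an assumption about how the port-forward was set up, and the names AssetCollector, extract_assets and check_assets are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.request import urlopen


class AssetCollector(HTMLParser):
    """Collect the URLs a page loads via <link href> and <script src>."""

    def __init__(self) -> None:
        super().__init__()
        self.assets: list[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = {"link": "href", "script": "src"}.get(tag)
        if wanted is None:
            return
        url = dict(attrs).get(wanted)
        # Keep first-seen order and skip duplicates (preload + stylesheet).
        if url and url not in self.assets:
            self.assets.append(url)


def extract_assets(page_html: str) -> list[str]:
    """Return the unique asset URLs referenced by the page, in order."""
    collector = AssetCollector()
    collector.feed(page_html)
    return collector.assets


def check_assets(base_url: str, page_html: str, timeout: float = 5.0) -> dict:
    """Map each asset path to its HTTP status, or None if unreachable."""
    results = {}
    for path in extract_assets(page_html):
        try:
            with urlopen(base_url + path, timeout=timeout) as response:
                results[path] = response.status
        except HTTPError as error:
            results[path] = error.code  # e.g. 404 if the asset is not served
        except OSError:
            results[path] = None  # connection refused, timeout, DNS failure
    return results


# Example call, assuming the port-forward listens on localhost:8000:
# check_assets("http://localhost:8000", page_html)
```

If the /static/ paths come back as 404 or unreachable, the pod is serving the page shell but not the static files it depends on.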

Any advice from the MedCAT team will be helpful. I suspect the mct_solr service hard-coded in the nginx container may have something to do with all of this.

But, hey, I have 2 out of 3 working. :)

Hi @vvcb - thanks for the contribution, great stuff!

In the nginx config, the MedCATTrainer API (Django server) is mapped to ‘/’ on port 8000, via the docker-compose service name, i.e. ‘medcattrainer’.

The custom nginx image probably isn’t needed here, as it’s really just a sites-enabled config and a static-content mount, both of which could be done at runtime rather than build time. Hoping to get around to looking into this soon.

The nginx service is exposed externally on port 8001, which is mapped to port 8000.

Hope that helps!

Thanks @tomolopolis.

I now have a working Helm chart that deploys to AKS and most things appear to work.

Details here → Release v0.0.1 · LTHTR-DST/medcat_trainer_helm · GitHub

It took me a little while to figure out that the nginx container was acting as both ingress controller and static content server. I think I have successfully separated these functions and have moved the ingress control function to the Kubernetes nginx-ingress controller that I already have deployed in the cluster.

There is still a lot of tidying up to do, including serving this from a subpath of a host domain.

@tomolopolis, MedCATTrainer uses sqlite3 behind the scenes. I have simulated 2 users annotating simultaneously (1 in incognito mode) and it seems to work. If this is the case, it is a really useful feature. Does MedCATTrainer support multiple simultaneous users out-of-the-box?
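
For context on why I am asking: SQLite allows many readers but only one writer at a time, so a second writer has to wait and fails with "database is locked" if its timeout runs out first. The sketch below shows that behaviour using Python's standard sqlite3 module only. It is not MedCATTrainer's Django code, and the table, column and function names are made up for illustration.

```python
import os
import sqlite3
import tempfile


def simulate_two_writers() -> tuple[bool, int]:
    """Return (second_writer_was_blocked, final_row_count)."""
    with tempfile.TemporaryDirectory() as workdir:
        db_path = os.path.join(workdir, "annotations.sqlite3")
        # A short timeout so the blocked writer gives up quickly.
        user_a = sqlite3.connect(db_path, timeout=0.1)
        user_b = sqlite3.connect(db_path, timeout=0.1)
        try:
            user_a.execute("CREATE TABLE annotation (annotator TEXT, label TEXT)")
            user_a.commit()

            # User A starts a write and holds the write lock until commit.
            user_a.execute("INSERT INTO annotation VALUES ('a', 'correct')")

            blocked = False
            try:
                user_b.execute("INSERT INTO annotation VALUES ('b', 'incorrect')")
            except sqlite3.OperationalError:
                # "database is locked": user A has not committed yet.
                blocked = True
                user_b.rollback()

            user_a.commit()

            if blocked:
                # With the lock released, user B's retry goes through.
                user_b.execute("INSERT INTO annotation VALUES ('b', 'incorrect')")
            user_b.commit()

            (count,) = user_a.execute("SELECT COUNT(*) FROM annotation").fetchone()
            return blocked, count
        finally:
            user_a.close()
            user_b.close()


blocked, count = simulate_two_writers()
print(f"second writer blocked: {blocked}, rows stored: {count}")
```

With two annotators saving occasionally, writes are short enough that collisions like this should be rare, which would be consistent with my test appearing to work.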

If we have several people annotating at the same time, then it will be useful to leverage some of the k8s autoscaling features. However, I don’t think there is an immediate use case for this yet - I doubt anyone has an army of annotators at their disposal!