First post here. Since the question in the title includes cloud requirements, I've added my very early experience here. Apologies for ignoring the NiFi bit in the question.
This is only for running MedCAT using pre-trained models - no Elastic, training, etc. yet.
This approach prioritises fault tolerance, auto-scaling and cost over raw performance - I can't see why I need to annotate 15+ years of clinic letters (6+ million documents, many of them quite long) in a couple of hours when I can run something in the cloud for pennies over days. The same pipeline will get reused for ongoing annotations in near real time.
Limitations:
- Requires a few bits in place in Azure.
- Azure-specific tooling - although it can easily be reconfigured to run on any cloud.
We have a secure, private Azure Kubernetes cluster as part of our Trusted Research Environment.
On-Prem: I have a Python script that runs on an on-prem VM that has the data. It creates gzipped JSON arrays, gives each one a unique name (a simple MD5 hash) and uploads them to Azure Blob Storage. At the same time, the script pushes a message with the blob name onto a raw Azure Storage queue.
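For what it's worth, here is a minimal sketch of that uploader. The container name, queue name and environment variable are placeholders I've made up, not the real ones:

```python
import gzip
import hashlib
import json
import os

from azure.storage.blob import BlobServiceClient
from azure.storage.queue import QueueClient

# Placeholder names - swap in your own storage account details.
CONN_STR = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
RAW_CONTAINER = "raw-docs"
RAW_QUEUE = "raw-queue"

blob_service = BlobServiceClient.from_connection_string(CONN_STR)
raw_queue = QueueClient.from_connection_string(CONN_STR, RAW_QUEUE)


def upload_batch(documents: list) -> str:
    """Gzip a JSON array of documents, name the blob by MD5 hash,
    upload it and enqueue the blob name on the raw queue."""
    payload = gzip.compress(json.dumps(documents).encode("utf-8"))
    blob_name = hashlib.md5(payload).hexdigest() + ".json.gz"

    blob_service.get_blob_client(RAW_CONTAINER, blob_name).upload_blob(
        payload, overwrite=True
    )
    raw_queue.send_message(blob_name)
    return blob_name
```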
AKS:
- I have a Docker container which, when deployed, reads blob names from the raw queue. It sets a visibility timeout of 5 minutes so that the message is not visible to any other worker while it is working on the blob (the worker loop is sketched below, after this list).
- It then loads the corresponding blob from Azure Storage, annotates it with MedCAT and saves the gzipped output JSON into a different container.
- Once annotation has succeeded on a blob, it dequeues the message from the raw queue and pushes a similar message with the blob name into a processed queue.
- Blobs that raise an error get flagged in a third error queue for manual processing. Nothing there so far.
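A rough sketch of that worker loop is below. The container/queue names, the model pack zip path and the per-document schema ({"id": ..., "text": ...}) are assumptions - adapt them to your own layout:

```python
import gzip
import json
import os
import time

from azure.storage.blob import BlobServiceClient
from azure.storage.queue import QueueClient
from medcat.cat import CAT

CONN_STR = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
RAW_CONTAINER, OUT_CONTAINER = "raw-docs", "annotated-docs"            # placeholder names
RAW_QUEUE, DONE_QUEUE, ERROR_QUEUE = "raw-queue", "processed-queue", "error-queue"

blob_service = BlobServiceClient.from_connection_string(CONN_STR)
raw_q = QueueClient.from_connection_string(CONN_STR, RAW_QUEUE)
done_q = QueueClient.from_connection_string(CONN_STR, DONE_QUEUE)
error_q = QueueClient.from_connection_string(CONN_STR, ERROR_QUEUE)

# Load the model pack once per pod - this is what needs the ~7 GB of RAM.
cat = CAT.load_model_pack("mc_modelpack_snomed_int_16_mar_2022_25be3857ba34bdd5.zip")

while True:
    # Claim messages one at a time; each is hidden from other workers for 5 minutes.
    for msg in raw_q.receive_messages(messages_per_page=1, visibility_timeout=300):
        blob_name = msg.content
        try:
            raw = blob_service.get_blob_client(RAW_CONTAINER, blob_name).download_blob().readall()
            documents = json.loads(gzip.decompress(raw))

            # Assumes each document looks like {"id": ..., "text": ...}.
            results = [cat.get_entities(doc["text"]) for doc in documents]

            out = gzip.compress(json.dumps(results).encode("utf-8"))
            blob_service.get_blob_client(OUT_CONTAINER, blob_name).upload_blob(out, overwrite=True)

            raw_q.delete_message(msg)          # dequeue only after success
            done_q.send_message(blob_name)     # mark the blob as processed
        except Exception:
            error_q.send_message(blob_name)    # flag for manual follow-up
            raw_q.delete_message(msg)
    # Queue may be empty - keep polling; KEDA scales the deployment back to 0.
    time.sleep(5)
```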
Pods and Nodes:
I have set up a dedicated node pool with spot VM instances (8 vCPU, 64 GB RAM, ~£0.055/h) that can scale from 0 to 20 nodes. Using Kubernetes taints/tolerations and node affinity labels, we can force our pods to be scheduled on this pool.
Each pod has a minimum request of 0.5 vCPU and 7 GB RAM (after quite a bit of testing, this amount of RAM was adequate for the mc_modelpack_snomed_int_16_mar_2022_25be3857ba34bdd5 model pack) and a maximum limit of 1 vCPU and 16 GB RAM (although there have been no OOM kills so far).
So, each 8 vCPU/64 GB node can easily handle 8 pods.
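The relevant bits of the worker Deployment's pod template look roughly like this - the pool label and image are examples, and the toleration assumes the taint AKS applies to spot pools:

```yaml
# Fragment of the pod template (pool/label/image names are examples).
spec:
  tolerations:
    - key: kubernetes.azure.com/scalesetpriority   # taint AKS puts on spot node pools
      operator: Equal
      value: spot
      effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: agentpool
                operator: In
                values: ["medcatspot"]             # label on the dedicated spot pool
  containers:
    - name: medcat-worker
      image: myregistry.azurecr.io/medcat-worker:latest   # placeholder image
      resources:
        requests:
          cpu: 500m
          memory: 7Gi
        limits:
          cpu: "1"
          memory: 16Gi
```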
Scaling with KEDA (https://keda.sh/)
KEDA monitors the raw queue and, when a certain number of messages has accumulated, triggers the above deployment and scales the number of replicas as required.
I have set it up to trigger at 5 messages and a pod for every 10 messages. So if there are 300 messages, it will try to deploy 30 pods.
Each of these pods will churn through the raw message queue until there are no messages left. KEDA then scales everything back to 0 and the nodes automatically switch off.
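The ScaledObject for this looks roughly like the following (names are placeholders, and maxReplicaCount is just an example worked out as 20 nodes x 8 pods). activationQueueLength is the "trigger at 5 messages" part and queueLength the "one pod per 10 messages" part:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: medcat-worker-scaler          # placeholder name
spec:
  scaleTargetRef:
    name: medcat-worker               # the worker Deployment above
  minReplicaCount: 0                  # scale to zero when the queue is empty
  maxReplicaCount: 160                # example: 20 nodes x 8 pods
  triggers:
    - type: azure-queue
      metadata:
        queueName: raw-queue
        queueLength: "10"             # one replica per 10 messages
        activationQueueLength: "5"    # start scaling from 0 once 5 messages accumulate
        connectionFromEnv: AZURE_STORAGE_CONNECTION_STRING
```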
Costs:
If I had to run this setup with 5 spot nodes (so 40 workers, each with 1 vCPU and 7 GB RAM) for 24 hours a day for an entire month, it would cost me less than £200 (0.055 x 5 x 24 x 30 ≈ £198).
Fault tolerance:
Spot instances are discounted by up to 90% but do sometimes get evicted. But with the combination of storage queues (an unfinished message simply becomes visible again after the 5-minute timeout and gets picked up by another worker) and KEDA, this system seems to slowly grind through anything I throw at it.
Why:
We already had an AKS cluster for our JupyterHub deployment and a few other microservices. I have another project that is compute/GPU intensive that I cannot afford to keep running all the time. So, I was looking for something straightforward(ish) to start playing with KEDA.
We generate about 2500 clinic documents a day. These will only take minutes to annotate and Azure VM billing is per minute. So, it does not make sense to have a beefy VM up and running all the time.
And, when we start throwing our EPR free text, radiology reports, etc. at it, all I have to do is increase the maximum number of allowed nodes in this pool (or spin up a more powerful spot pool).
Apologies for the long post. Not sure how useful this is to anyone - or whether this is even the right way to do it. It was fun nonetheless.