The RIH CIRR

Derek Merck
Rhode Island Hospital and Brown University
Providence, RI

Overview

The RIH Clinical Imaging Research Repository (CIRR) was the initial development site for all of these configurations.

The Admin stack provides

The RIH CIRR stack provides:

  • An Orthanc DICOM archive
  • An Orthanc instance configured as a simple DICOM ingress multiplexer to the archive and 3D workstations
  • An Orthanc instance configured as a DICOM Q/R bridge to the PACS for external data pulls
  • An Orthanc “MockPacs” for testing
  • A PostgreSQL database with bind-mounted storage
  • An attachable service network (“cirr_service_network”)

The bridge service can be manipulated using DIANA watcher scripts to monitor and index the clinical PACS, and to exfiltrate and anonymize large data collections.

Additional project repositories can be added in the “projects” stack.

Usage

Provisioning

  • At RIH, the CIRR runs in production on a pair of 16-core Xeon servers with 200GB of RAM each. One node has an attached iSCSI interface to a 45TB StorSimple. The system handles around one hundred thousand image studies, or about 10 million image instances, per year.
  • For staging, we use two disposable desktop-type machines with 8GB of RAM and about 1TB of disk.
  • For some testing, we use two disposable Atom-based cloud instances with 8GB of RAM and 10GB of disk.
  1. Install Docker CE and docker-compose. Note: ingress routing requires Docker version 18 or later.
  2. Create directories for persistent storage on the node that will support storage-bound operations (PostgreSQL, Splunk). See cloud-init: https://gist.github.com/derekmerck/7b55c34c91954e84aa155e487ffe2e8d
$ mkdir -p /data/{splunk,postgres}
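The directory step can also be wrapped in a small helper so the same layout is reproducible on staging and test machines. This is only a sketch: the function name `make_data_dirs` is hypothetical, and it assumes the base path defaults to /data as in the command above.

```shell
# Hypothetical helper (not part of the repository): create the
# persistent-storage directories under a base path, default /data.
make_data_dirs() {
    base="${1:-/data}"
    for name in splunk postgres; do
        mkdir -p "$base/$name" || return 1
    done
    # List the results so a typo in the base path is easy to spot.
    ls -d "$base/splunk" "$base/postgres"
}
```

Running `make_data_dirs /data` on the storage node is equivalent to the mkdir command above; a different base path can be passed for throwaway test instances.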

Install the admin stack

  1. Set variables for abstractions and secrets. Create a cirr.env file on the master and source it. Note: The Splunk password must be at least 8 characters long, or Splunk will fail to initialize properly.
export DATA_DIR=/data
export PORTAINER_PASSWORD=<hashed pw>
export SPLUNK_PASSWORD=<plain pw>
export SPLUNK_HEC_TOKEN=<TOKEN0-TOKEN0-TOKEN0-TOKEN0>
  2. Install the administrative backend. The admin stack only needs to be deployed once, and then all other stacks can share the same cluster and data management systems.
$ . cirr.env && docker stack deploy -c docker-stacks/admin/admin-stack.yml admin
$ . cirr.env && docker stack deploy -c docker-stacks/admin/splunk-service.yml admin
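Because a short Splunk password only surfaces as a failed initialization after deployment, it can be worth checking the length before deploying. The following is a minimal sketch; `check_splunk_password` is a hypothetical helper, not something shipped with these stacks.

```shell
# Hypothetical pre-flight check (an assumption, not part of the stacks):
# fail early if the Splunk password is shorter than 8 characters.
check_splunk_password() {
    pw="${1-}"
    if [ "${#pw}" -lt 8 ]; then
        echo "SPLUNK_PASSWORD must be at least 8 characters long" >&2
        return 1
    fi
    return 0
}

# Possible use before deploying the admin stack:
#   . cirr.env && check_splunk_password "$SPLUNK_PASSWORD" \
#     && docker stack deploy -c docker-stacks/admin/splunk-service.yml admin
```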

Install the CIRR service stack

  1. Set additional variables for abstractions and secrets
export DATA_DIR=/data
export ORTHANC_PG_DATABASE=orthanc
export ORTHANC_PASSWORD=orthanc
export POSTGRES_PASSWORD=postgres
export MOD_PACS=PACS,10.0.0.1,11112  # aet, ip addr, port format
export MOD_WORKSTATION=TERARECON,10.0.0.2,11112
  2. Start up the service stack
$ . cirr.env && docker stack deploy -c examples/rih-cirr/cirr_v2.yml cirr
  3. Start up a projects stack. The CIRR can have additional Orthanc and DIANA nodes attached to it for DICOM review and automated post-processing tasks.
$ . cirr.env && docker stack deploy -c examples/rih-cirr/projects projects
  4. To access legacy data, a CIRRv1 stack is also available. (Skip this on new servers and for testing.)
$ docker stack deploy -c examples/rih-cirr/cirr_v1.yml cirr1
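The MOD_PACS and MOD_WORKSTATION variables pack three fields (AET, IP address, port) into one comma-separated string, so a malformed value is easy to miss. A sketch of a sanity check follows; the function `parse_modality` and the output variables `MOD_AET`, `MOD_HOST`, and `MOD_PORT` are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: split an "aet,host,port" string into three
# variables and reject values with the wrong shape.
parse_modality() {
    spec="$1"
    case "$spec" in
        *,*,*,*) echo "too many fields: $spec" >&2; return 1 ;;
        *,*,*)   ;;
        *)       echo "expected aet,host,port: $spec" >&2; return 1 ;;
    esac
    MOD_AET="${spec%%,*}"      # text before the first comma
    rest="${spec#*,}"          # everything after the first comma
    MOD_HOST="${rest%%,*}"
    MOD_PORT="${rest#*,}"
    case "$MOD_PORT" in
        ''|*[!0-9]*) echo "port is not numeric: $MOD_PORT" >&2; return 1 ;;
    esac
}
```

For example, `parse_modality "$MOD_PACS"` after sourcing cirr.env sets the three variables, or returns a non-zero status if the value does not have exactly three fields with a numeric port.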

Install a Test Service

  1. Add a mock PACS and random study header generator:
$ docker stack deploy -c docker-stacks/diana-workers/mock-stack.yml mock

Notes

Reset Volumes

Note: volumes created on a node are not removed when the stack is removed. Remove them manually to clear errors about directories not being found.
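One way to find the leftover volumes is to filter the node's volume list by stack name. This sketch assumes the volumes carry the usual `<stack>_` name prefix that `docker stack deploy` gives to volumes it creates; externally defined or explicitly named volumes would not match. The helper name `stack_volumes` is hypothetical.

```shell
# Hypothetical filter: read volume names on stdin and keep only those
# that start with "<stack>_". Prints nothing if there is no match.
stack_volumes() {
    grep -- "^${1}_" || true
}

# Volumes are node-local, so run this on each node that hosted the stack:
#   docker volume ls -q | stack_volumes cirr        # review the list first
#   for v in $(docker volume ls -q | stack_volumes cirr); do
#       docker volume rm "$v"
#   done
```

Reviewing the list before removing anything is deliberate: deleting a volume discards its data.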

Points of Potential Failure

  • The database backend is constrained to a single system with a large disk store. This would benefit from a distributed storage system, like Rexray.
  • The IP address for the bridge is hard-coded into the sending modalities and PACS. They should be using a name with multiple IPs or a non-bound IP that can be reassigned across the cluster as necessary.
  • With a setup of 3 machines, the system is only fault-tolerant against the loss of a single manager node.
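The single-node tolerance follows from how swarm managers keep quorum: a majority of managers must remain reachable, so N managers tolerate the loss of at most (N-1)/2, rounded down. The arithmetic can be sketched as below, assuming all three machines are managers; `tolerated_manager_failures` is a hypothetical name.

```shell
# Hypothetical helper: how many managers can be lost while a majority
# (quorum) of the original N managers is still available.
tolerated_manager_failures() {
    managers="$1"
    echo $(( (managers - 1) / 2 ))
}

tolerated_manager_failures 3   # prints 1
tolerated_manager_failures 5   # prints 2
```

Note that going from 3 to 4 managers does not improve tolerance; an odd number of managers is the usual choice.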

PostgreSQL Config

See http://pgtune.leopard.in.ua for a simple configuration tool. For our servers with 200GB of RAM, I used the following:

max_connections = 200
shared_buffers = 25GB
effective_cache_size = 75GB
work_mem = 128MB
maintenance_work_mem = 2GB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100

These settings did not seem to make much of a difference in performance, however.

Portainer Reset

Fix Portainer showing multiple copies of the same container:

$ docker service rm admin_portainer-agent
$ docker service rm admin_portainer
$ docker stack deploy -c docker-stacks/admin/admin-stack.yml admin

Splunk Config

Don’t forget to turn off acknowledgement in the HEC (HTTP Event Collector); otherwise it will insist on a data channel and requests will fail with HTTP 400 errors.

Testing:

curl -k http://splunk:8088/services/collector -H "Authorization: Splunk $SPLUNK_HEC_TOKEN" -d '{"event":"Hello, World!"}'

Increase the truncation limit (TRUNCATE) for _json sources in:

/opt/splunk/etc/system/local/props.conf

[_json]
TRUNCATE = 500000

The following steps currently have to be done manually:

  • add a DICOM index
  • add a HEC token
  • enable HEC
  • switch off HTTPS for HEC
  • re-deploy with the correct HEC token

I previously did all of these with an Ansible role. A similar approach still needs to be investigated here.