Setting up a FreeSurfer Worker Node on Azure

Acknowledgements: Thanks to Azure guru Peter Barrera at Microsoft for setting up the infrastructure, and to Ezra Wegbreit at PediMind and Ryan Cabeen for their expertise with FreeSurfer.

If you need to do neuro MR post-processing but don't have access to a high-performance computing cluster, it's possible to set up FreeSurfer to run on Microsoft Azure. If you don't have a lab account, this works fine with a free trial account.

This was done as a proof of concept to evaluate how expensive it would be to process the entirety of the 7.5k neuro MR studies collected each year at Lifespan. Processing a subset of the PediMind MR data took an average of 20 hours per case1 on an A6 ($0.7/hour = $14) and 15 hours on a D12 ($0.8/hour = $12). Storage is roughly $1000/TB/yr, or $1/GB/yr. We assume 1 GB total of primary and results storage for each study.2

MR Studies    Compute Cost    Storage Cost    Total
7.5k          $14/study       $1/study/yr     $113k

Our conclusion is that doing a FreeSurfer workup on all Lifespan neuroimaging would cost approximately 7.5k*$15=$113k, plus tech staff salary.

This cost analysis also demonstrates that you can process about 13 cases within the $200 credit budget allocated for an Azure free trial account.
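The arithmetic behind these estimates can be reproduced with a small shell sketch. The function names are illustrative only, and the hourly rates and per-study storage figure are simply the numbers quoted above, not current Azure prices.

```shell
# Back-of-the-envelope cost model using the figures quoted in the text.

# Compute cost per case: hours per case times hourly rate, rounded to dollars.
cost_per_case() {
  awk -v h="$1" -v r="$2" 'BEGIN { printf "%.0f\n", h * r }'
}

# First-year total: studies * (compute + storage dollars per study).
total_cost() {
  echo $(( $1 * ($2 + $3) ))
}

# Whole cases that fit in a fixed credit budget.
cases_in_budget() {
  echo $(( $1 / ($2 + $3) ))
}

echo "A6:  \$$(cost_per_case 20 0.7) per case"
echo "D12: \$$(cost_per_case 15 0.8) per case"
echo "All studies: \$$(total_cost 7500 14 1)"
echo "Free trial:  $(cases_in_budget 200 14 1) cases"
```

Swapping in the D12 figure ($12) for the second argument gives a slightly lower total, since the faster machine more than offsets its higher hourly rate.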

If you want to set up an Azure FreeSurfer environment from scratch, here are the basic steps.

Set Up Azure Infrastructure

  • Log in at http://manage.azure.com
  • Start from gallery with a new Ubuntu Server instance (I used 14.04 LTS) (Edit: Better to use CentOS6)
    • Pick something with at least 4 cores and 28 GB of RAM.3
    • Open an SSH endpoint on port 22
    • Open a Remote Desktop endpoint on port 5899
  • Set up a new storage account4
    • Set up a named container
    • Attach it to your VM
  • You may want to also create a virtual network if you are going to run many nodes and manage them through a head node scheduler.

Basic System Configuration

  • Connect
@local$ ssh my_user@my_freesurfer.cloudapp.net
  • Set the timezone
$ echo "America/New_York" | sudo tee /etc/timezone
America/New_York
$ sudo dpkg-reconfigure --frontend noninteractive tzdata
  • Mount a data drive

See these notes on adding a drive in Linux.

$ # Local storage, 1 TB machine-specific store
$ # Figure out what the disk is named
$ sudo fdisk -l
$ # In this case, it's /dev/sdc
$ # Partition it
$ sudo fdisk /dev/sdc
$ # Follow the instructions at the link above
$ # Format your new partition
$ sudo mkfs -t ext3 /dev/sdc1
$ # Make a folder to mount it in
$ sudo mkdir /media/data
$ # Add the drive into fstab
$ sudo nano /etc/fstab
$ sudo mount -a
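
The fstab step above does not show the line to add. The following is a minimal sketch, assuming the partition is /dev/sdc1, the mount point is /media/data, and the filesystem is ext3, as in this example. The `nofail` option is an assumption added here so that the VM still boots if the disk is detached, and `fstab_entry` is a hypothetical helper name.

```shell
# Print an fstab entry for a device, mount point, and filesystem type.
# Fields: device, mount point, type, options, dump flag, fsck pass.
fstab_entry() {
  printf '%s\t%s\t%s\tdefaults,nofail\t0\t2\n' "$1" "$2" "$3"
}

# Preview the line for the drive used in this walkthrough.
fstab_entry /dev/sdc1 /media/data ext3

# To append it for real:
#   fstab_entry /dev/sdc1 /media/data ext3 | sudo tee -a /etc/fstab
```

Device names such as /dev/sdc are not guaranteed to be stable across reboots, so using the `UUID=...` value reported by `sudo blkid` in place of the device path is often safer.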

Ideally, we would mount a network cifs/smb share here instead.4

  • Set up the LXDE remote host

Azure likes to use Microsoft's own "Remote Desktop Protocol" for screen sharing. xrdp is a remote desktop server that uses vncserver as a backend.

$ sudo apt-get update
$ sudo apt-get install lubuntu-desktop lxde-common xrdp

As root, edit /etc/xrdp/startwm.sh so that the remote window manager invokes LXDE instead of the default X session:

#. /etc/X11/Xsession
. /usr/bin/startlxde

You can add "/etc/xrdp/xrdp.sh start" to /etc/rc.local as well to start it up automatically on system reboot.

You'll probably want to lock down rdp with some better security as well. McKearny suggests deleting the RDP endpoint and using an encrypted ssh tunnel.
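
The ssh tunnel approach can be sketched as follows. This assumes xrdp is listening on its default port 3389 on the VM; the local port 3390, the user name my_user, and the helper name `rdp_tunnel_cmd` are placeholders chosen for illustration.

```shell
# Build (but do not run) the ssh command that forwards a local port
# to the xrdp port on the VM. -N opens the tunnel without a remote shell.
rdp_tunnel_cmd() {
  local target="$1" local_port="${2:-3390}" remote_port="${3:-3389}"
  echo "ssh -N -L ${local_port}:localhost:${remote_port} ${target}"
}

# Print the command for the example host used in this walkthrough.
rdp_tunnel_cmd my_user@my_freesurfer.cloudapp.net
```

With the tunnel open, point the Remote Desktop client at localhost:3390 instead of the public endpoint, so only the SSH port needs to be exposed.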

Set Up FreeSurfer

FreeSurfer is distributed as gzipped binaries.

# Copy binaries over
$ wget -c ftp://surfer.nmr.mgh.harvard.edu/pub/dist/freesurfer/5.3.0/freesurfer-Linux-centos6_x86_64-stable-pub-v5.3.0.tar.gz

It doesn't take long to copy the 4 GB package over, but while you are waiting, you can register for a FreeSurfer license at https://surfer.nmr.mgh.harvard.edu/registration.html.

$ sudo mv freesurfer-Linux-centos6_x86_64-stable-pub-v5.3.0.tar.gz  /usr/local
$ cd /usr/local
$ sudo tar xzvf freesurfer-Linux-centos6_x86_64-stable-pub-v5.3.0.tar.gz
$ # Create a license
$ cd freesurfer
$ # "sudo echo ... > file" fails because the redirect runs as your own user; use tee
$ echo "stuff copied from license" | sudo tee license.txt

Add these lines to ~/.bashrc:

export FREESURFER_HOME=/usr/local/freesurfer
source $FREESURFER_HOME/SetUpFreeSurfer.sh
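
Before going further, it can help to confirm that the install directory looks complete. The check below is a rough sketch, not part of FreeSurfer: `check_freesurfer` is a hypothetical helper, and it assumes the license was saved as license.txt as in the step above.

```shell
# Sanity-check a FreeSurfer install directory.
# Prints "ok" on success, or the first problem found (and returns 1).
check_freesurfer() {
  local home="$1"
  [ -d "$home" ] || { echo "missing directory: $home"; return 1; }
  [ -f "$home/SetUpFreeSurfer.sh" ] || { echo "missing SetUpFreeSurfer.sh in $home"; return 1; }
  [ -s "$home/license.txt" ] || { echo "missing or empty license.txt in $home"; return 1; }
  echo "ok"
}

# Check the location used in this walkthrough unless FREESURFER_HOME is set.
check_freesurfer "${FREESURFER_HOME:-/usr/local/freesurfer}" || true
```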

It's necessary to fix some missing dependencies under Ubuntu.

$ sudo apt-get update
$ sudo apt-get install libgomp1 make
$ cd /usr/lib/x86_64-linux-gnu/
$ sudo ln -s libjpeg.so.8 libjpeg.so.62
$ sudo ln -s libtiff.so.4 libtiff.so.3
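
One way to find out which shared libraries are still unresolved is to run `ldd` against a FreeSurfer binary and look for "not found" lines. This is a sketch: `missing_libs` is a hypothetical helper, and mri_convert is used only as an example of a binary shipped in the bin directory.

```shell
# List shared libraries that the dynamic linker cannot resolve for a binary.
# Prints nothing when every dependency is found.
missing_libs() {
  ldd "$1" 2>/dev/null | grep 'not found' || true
}

# Example: check one of the FreeSurfer binaries.
missing_libs "${FREESURFER_HOME:-/usr/local/freesurfer}/bin/mri_convert"
```

Any library names printed here are candidates for an extra package install or a compatibility symlink like the ones above.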

Installation instructions for some other useful neuro processing tools can be found at Serigado.

Test Run

Terminal session:

@local$ # Get some clinical images
@local$ scp data/test/* my_user@my_freesurfer.cloudapp.net:/media/data/test/
@local$ ssh my_user@my_freesurfer.cloudapp.net
$ # Run it
$ cd /media/data/test
$ export SUBJECTS_DIR=$PWD
$ recon-all -i my_data -s my_sid -all & disown

disown keeps the FreeSurfer job from being torn down if your terminal closes. screen serves the same purpose, with the added advantage of letting you reattach (with screen -r) to a disconnected session.
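
A third option, not used in the original run, is `nohup` with the output sent to a log file, which leaves a record of the job even if the session is lost. The sketch below assumes nothing about FreeSurfer itself; `run_detached` and the log file name are hypothetical.

```shell
# Launch a long-running command immune to hangups, logging stdout and stderr.
# Prints the PID of the background job so it can be checked later.
run_detached() {
  local log="$1"
  shift
  nohup "$@" > "$log" 2>&1 &
  echo $!
}

# Example (commented out so that pasting this block does not start a job):
#   run_detached recon.log recon-all -i my_data -s my_sid -all
#   tail -f recon.log
```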

Finally, if recon-all exits with an error, you can restart it:

$ # Delete the lock if it's present
$ rm -f my_sid/scripts/IsRunning.lh+rh
$ recon-all -s my_sid -make all & disown

Wait for the job to terminate (about 10-20 hours).
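
While waiting, a rough way to tell whether a run has completed is to look at the subject's scripts/recon-all.log. The sketch below assumes recon-all writes a "finished without error" or "exited with ERRORS" line when a run ends; `recon_status` is a hypothetical helper, and it only reports the most recent completed outcome, so a job restarted after a failure will still show "failed" until it finishes.

```shell
# Report the last recorded outcome for a subject directory.
recon_status() {
  local log="$1/scripts/recon-all.log" last
  [ -f "$log" ] || { echo "not-started"; return; }
  # Take the most recent completion line, if any.
  last=$(grep -E 'finished without error|exited with ERRORS' "$log" | tail -n 1)
  case "$last" in
    *"finished without error"*) echo "finished" ;;
    *"exited with ERRORS"*)     echo "failed" ;;
    *)                          echo "running-or-interrupted" ;;
  esac
}

# Example: check the subject used in this walkthrough.
recon_status "${SUBJECTS_DIR:-.}/my_sid"
```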

RDP Session:

  • Start MS Remote Desktop and point it at my_freesurfer.cloudapp.net
  • Alternatively, you can log in to Azure, select the "my_freesurfer" VM and click the "connect" button at the bottom of the screen. This will save an RDP profile that you can double-click to connect.
  • After logging in, disable the screen saver
  • Open a terminal
  • Download and untar the bert demo dataset:
$ sudo mkdir /media/data/tutorial
$ # Change the owner since root probably owns this
$ sudo chown $USER /media/data/tutorial
$ cd /media/data/tutorial
$ wget ftp://surfer.nmr.mgh.harvard.edu/pub/data/bert.recon.tgz
$ tar xvzf bert.recon.tgz
  • Then visualize it with this command:
$ freeview -v $SUBJECTS_DIR/bert/mri/brainmask.mgz \
    -v $SUBJECTS_DIR/bert/mri/aseg.mgz:colormap=lut:opacity=0.2 \
    -f $SUBJECTS_DIR/bert/surf/lh.white:edgecolor=yellow \
    -f $SUBJECTS_DIR/bert/surf/rh.white:edgecolor=yellow \
    -f $SUBJECTS_DIR/bert/surf/lh.pial:annot=aparc:edgecolor=red \
    -f $SUBJECTS_DIR/bert/surf/rh.pial:annot=aparc:edgecolor=red
  • Or try visualizing your own dataset:
$ freeview -v $SUBJECTS_DIR/my_sid/mri/brainmask.mgz

It's painfully slow, but technically it works. The remote lxde session apparently doesn't have complete support for Mesa, so surface visualization pops in and out. (This appears to be fixed using the Gnome desktop under CentOS6.) The idea is that we don't want to have to pull back every segmented dataset for review. We need some way to visually review it online. It's not clear that this will work and there are several discussions online about not being able to get Mesa to work well on Azure.

Instructions to download the full FreeSurfer tutorial data set can be found here.

Using the latest FreeView

There is a much more recent and apparently more reliable "developer" version of FreeView available.

$ wget -c ftp://surfer.nmr.mgh.harvard.edu/pub/dist/freesurfer/freeview/linux_centos6_x86_64/freeview.bin
$ chmod +x freeview.bin
$ # Rename it to freeview-dev so we can keep the old version around
$ sudo mv freeview.bin /usr/local/freesurfer/bin/freeview-dev

freeview-dev, as I'll call it now, relies on the FreeSurfer-built VTK and Qt libraries. Indeed, if there is a general Qt library already installed, it may be missing symbols that freeview-dev needs. It is easiest just to put the FreeSurfer-built libraries in the user's LD_LIBRARY_PATH variable.

$ # Single quotes keep the variables unexpanded until .bashrc is sourced
$ echo 'export LD_LIBRARY_PATH=$FREESURFER_HOME/lib/qt/lib:$FREESURFER_HOME/lib/vtk/lib/vtk-5.6' >> ~/.bashrc
$ source ~/.bashrc

Alternatively, you can add the FreeSurfer-built library paths for VTK and Qt to /etc/ld.so.conf.d/, which will affect all processes for all users. But overriding any package-managed Qt or VTK libraries might lead to more linking trouble later.
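
For the system-wide route, a minimal sketch looks like this. The file name freesurfer.conf and the helper `fs_ld_conf` are hypothetical; the two library paths are the same ones used in the LD_LIBRARY_PATH line above.

```shell
# Print the contents of an ld.so.conf.d file listing the FreeSurfer-built
# Qt and VTK library directories, one per line.
fs_ld_conf() {
  local home="${1:-/usr/local/freesurfer}"
  printf '%s\n' "$home/lib/qt/lib" "$home/lib/vtk/lib/vtk-5.6"
}

# Preview the file contents.
fs_ld_conf

# To apply system-wide (affects every user and process):
#   fs_ld_conf | sudo tee /etc/ld.so.conf.d/freesurfer.conf
#   sudo ldconfig
```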

Set up a template

At this point, you may want to set up a template. You should delete any private study data and the FreeSurfer tutorial data to save space.

  • Shut down the VM from the Azure dashboard
  • Select the "capture" button at the bottom of the dashboard. This prompts you to save the image so that it will show up in your gallery
  • Note that this captures the entirety (or at least the used parts) of both the 30GB main drive and the 1TB storage drive we added.
  • If you haven't installed the Linux equivalent of sysprep, this image will just be a duplicate of the host at that state, including the same host name, root user, etc. You can boot it onto a different architecture with a different service name, but once you log in, it is set up identically.
  • If you do install the Linux sysprep, the capture becomes a template and you can instantiate a new machine with a new hostname, MAC address, and root user from your gallery.
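
The text does not name the "Linux equivalent of sysprep". On Azure Linux images that role is usually played by the Azure Linux Agent's deprovision command, so the sketch below assumes that is what is meant. The `deprovision` wrapper and the CONFIRM variable are hypothetical; the wrapper defaults to a dry run because the real command removes the provisioned user account and host-specific configuration.

```shell
# Generalize the VM before capture. Runs only when CONFIRM=yes is set;
# otherwise it just prints the command it would run.
deprovision() {
  local cmd="waagent -deprovision+user -force"
  if [ "${CONFIRM:-no}" = "yes" ]; then
    sudo $cmd
  else
    echo "dry run: sudo $cmd"
  fi
}

deprovision
```

Run it as the last step before shutting down for capture, for example with `CONFIRM=yes deprovision`, since you will not be able to log back in with the removed account.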

CentOS6

I was quite surprised to discover that FreeSurfer and FreeView both appear to work pretty much "out of the box" under CentOS6. Moreover, the surface visualization, which is a problem using xrdp and lubuntu, appears to be much more stable. (This suggests that we should see if it works with a Debian Gnome window manager.)

Start with a CentOS6.5 x86_64 gallery image and set up FreeSurfer.

Install missing packages

screen is not installed by default.

$ sudo yum install screen

Install xrdp and Gnome Desktop

xrdp installation is slightly different and requires a different desktop manager.6

First you have to disable NetworkManager because it conflicts with Gnome. As root, add this line to the end of /etc/yum.conf:

exclude=NetworkManager*
$ sudo yum clean all
$ sudo yum groupinstall basic-desktop desktop-platform x11 fonts

This takes a while.

$ wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
$ sudo rpm -ivh epel-release-6-8.noarch.rpm
$ sudo yum install xrdp tigervnc-server
$ sudo nano /etc/xrdp/xrdp.sh

Make this edit at the beginning of /etc/xrdp/xrdp.sh, so it can find its binary:

#SBINDIR=/usr/local/sbin
SBINDIR=/usr/sbin

Then start everything up and enable both services at boot.

$ sudo service vncserver start
$ sudo service xrdp start
$ sudo chkconfig xrdp on
$ sudo chkconfig vncserver on



  1. [18.5,22.5] hours 

  2. See http://azure.microsoft.com/en-us/pricing/details/virtual-machines/ 

  3. We started with a 4-CPU/8 GB A4 and it crashed constantly. After upgrading to a 4-CPU/28 GB A6 it ran fine. At that point, top claimed that recon-all's subfunctions were only using about 1 GB of RAM, but the machine would start up with about 50% of the RAM in use(??). 

  4. I am told that if you do this through the new http://portal.azure.com interface, you can enable "files" as a storage type, and then get a cifs share name and access key. Then you can simply mount the container as a network shared drive with something like smb://my_container.my_account.cloudapp.net my_key. This seems much more useful than standing up each machine with a separate virtual disk. 


  6. See http://www.rajinders.com/2014/01/12/installing-gnome-desktop-on-centos-running-in-windows-azure-vm/ and http://ajmatson.net/wordpress/2014/01/install-xrdp-remote-desktop-to-centos-6-5/