Amazon Web Services is amazing

I needed to convert some very large (roughly 40,000 × 40,000 pixel) images from one format to another, and I ran into a problem. My desktop PC just couldn’t handle the memory requirements, despite having 8 GB of RAM and a 64-bit OS. Even at 8 bits per channel, ImageMagick was failing on the JPEG 2000 encoding step of my pipeline. (Details later.)

So I finally decided to try out this whole cloud computing thing for real. I’m very familiar with using other people’s virtual machines, but this was the first time I rented my own: an “i2.8xlarge” instance with 244 GB of RAM and eight 800 GB SSDs. I chose Red Hat Enterprise Linux 7 and used one of their free instances to recompile ImageMagick with the settings that gave the best performance for my situation. Then I upgraded to the real instance and started processing images.

Three hours of EC2 time later, ImageMagick finally finished processing all of the images. The final compressed files were only about 1 GB, but the intermediate files hit hundreds of GB, which would have been painfully slow on traditional spinning hard drives. At peak, the computations used up to 120 GB of RAM at once, plus a whole lot of CPU time across the instance’s 32 virtual CPUs. I was very happy with how relatively painless the process was, given that I already knew how to use Linux and had worked out the bugs on the free instance.
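One way to make sure those intermediate files land on the fast local SSDs rather than a small root volume is to point ImageMagick’s temporary directory at an SSD mount. Here’s a minimal sketch, assuming the `MAGICK_TEMPORARY_PATH` environment variable from ImageMagick’s resource documentation and a hypothetical `/mnt/ssd0` mount point (instance-store SSDs have to be formatted and mounted before use). Judging from my desktop’s resource listing further down (a 7.695 GiB memory limit on an 8 GB machine), the default memory limit tracks physical RAM, so the scratch location is the main thing left to set.

```shell
#!/usr/bin/env bash
# Hypothetical setup: send ImageMagick's disk-backed pixel caches and other
# scratch files to a local SSD. The mount point below is an assumption.
export MAGICK_TEMPORARY_PATH=/mnt/ssd0/magick-tmp

# If ImageMagick is installed, print the resource limits it will use.
if command -v convert >/dev/null 2>&1; then
  convert -list resource
fi
```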

How much did this endeavor cost? $22 and change. For that, I couldn’t even have bought a motherboard capable of handling 128 GB of RAM, let alone the RAM or CPUs needed for this job. I could have downsampled the source images and made the computations trivial, but that would have sacrificed the accuracy of the final result. Plus, I didn’t really want to own or maintain that hardware long-term; I just needed to get through some computations right now.

So anyway, here’s a hearty endorsement of Amazon Web Services and EC2. It worked great. (They paid me nothing to say or write this; in fact, I doubt they even noticed the brief spikes in load on their clusters.)

Image processing details:

The input was a set of histology images with a resolution of about 10 µm × 10 µm.

>identify heart.jpg
heart.jpg JPEG 45717x38257 45717x38257+0+0 8-bit sRGB 155.9MB 0.000u 0:00.001

My desktop PC was nowhere near up to the task. Here are its ImageMagick resource limits:

>identify -list resource
File       Area     Memory        Map       Disk   Thread  Throttle       Time
1536   16.525GB   7.695GiB   15.39GiB  unlimited        4         0  unlimited
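A back-of-the-envelope estimate shows why this machine was struggling. The sketch below assumes ImageMagick 6’s pixel cache stores four samples per pixel (red, green, blue, opacity), one byte each in a Q8 build and two bytes each in the default Q16 build. It counts only the decoded source image, not the rotated and cropped intermediates or the JPEG 2000 encoder’s own working memory.

```shell
#!/usr/bin/env bash
# Rough size of ImageMagick's pixel cache for the decoded source image.
# Assumption: 4 samples per pixel (R, G, B, opacity), 1 byte each for Q8
# and 2 bytes each for Q16.

# Bytes needed to hold one width x height image in the pixel cache.
cache_bytes() {
  local width=$1 height=$2 bytes_per_sample=$3
  echo $(( width * height * 4 * bytes_per_sample ))
}

# Print a byte count in GiB with two decimals (bash has no floating point).
to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f GiB\n", b / (1024 * 1024 * 1024) }'
}

width=45717   # from `identify heart.jpg`
height=38257

for depth in 8 16; do
  bytes=$(cache_bytes "$width" "$height" $(( depth / 8 )))
  echo "Q${depth}: $(to_gib "$bytes") for the decoded source image"
done
```

Under those assumptions, a Q16 build needs about 13 GiB just to hold the decoded JPEG, well past the 7.695 GiB memory limit, and even a Q8 build (about 6.5 GiB) leaves almost no headroom for the rest of the pipeline.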

The end goal was to get the images into JPEG 2000 format, which is where things really get heavy, because the wavelet transform behind JPEG 2000 isn’t cheap on an image this size. Anyway, the commands looked something like this:

>convert -monitor heart.mpc -rotate -45 -crop 33000x10630+24580+10630 +repage -resize 50% heart.jp2
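A quick arithmetic check of the geometry in that command: `WxH+X+Y` is a W-by-H crop window whose top-left corner sits X pixels right and Y pixels down. This sketch assumes `-rotate -45` grows the canvas to the bounding box of the rotated image, about (w + h) · cos 45° on each side; ImageMagick’s exact rounding may differ slightly. It also reuses the four-samples-per-pixel assumption to estimate how big that rotated intermediate is.

```shell
#!/usr/bin/env bash
# Side length of the rotated bounding box, rounded up to a whole pixel.
rotated_side() {
  awk -v w="$1" -v h="$2" 'BEGIN {
    s = (w + h) * sqrt(0.5)
    r = int(s)
    if (r < s) r++
    print r
  }'
}

# Succeeds if a WxH+X+Y crop window lies inside a canvas_w x canvas_h canvas.
# usage: crop_fits W H X Y canvas_w canvas_h
crop_fits() {
  (( $3 + $1 <= $5 && $4 + $2 <= $6 ))
}

src_w=45717
src_h=38257
crop_w=33000
crop_h=10630
crop_x=24580
crop_y=10630

side=$(rotated_side "$src_w" "$src_h")
echo "rotated canvas: about ${side}x${side} pixels"

# Assumption: 4 samples per pixel, 1 byte each (Q8) or 2 bytes each (Q16).
echo "rotated pixel cache: about $(( side * side * 4 / 1024**3 )) GiB (Q8)," \
     "$(( side * side * 8 / 1024**3 )) GiB (Q16)"

if crop_fits "$crop_w" "$crop_h" "$crop_x" "$crop_y" "$side" "$side"; then
  echo "crop window fits inside the rotated canvas"
else
  echo "crop window runs past the rotated canvas" >&2
fi

# -resize 50% halves both dimensions of the cropped region.
echo "final image: $(( crop_w / 2 ))x$(( crop_h / 2 )) pixels"
```

The rotated intermediate alone comes to roughly 13 GiB at Q8 (26 GiB at Q16), about twice the decoded source, which goes a long way toward explaining the memory and scratch-disk appetite; the final JPEG 2000 image works out to 16500x5315 pixels.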