Friday, May 20, 2011

C++ Performance Benchmarking

We are in the process of developing computational models of neural structures with the eventual goal of studying the effects of deep brain stimulation on dystonia. We are starting by reproducing the results of one of the best-recognized models of neural tissue in the deep brain, published in 2004 by Rubin and Terman. That paper draws heavily from an earlier 2002 paper which describes some of the basic neural cell models. I have an exceptionally capable student who has been working with these models for a while now and has successfully re-created each of the individual cell types; we are now working on linking those cells together synaptically and optimizing the simulation process.

The neural models used by Rubin and Terman are Hodgkin-Huxley-type, which means that each of the voltage-gated membrane proteins is modelled by one or more differential equations. For example, in the case of the subthalamic nucleus (STN) neurons, there are seven first-order differential equations that must be solved in parallel.

Numerically, there are any number of methods for solving these differential equations. The simplest are Forward Euler and Runge-Kutta, each of which uses a mathematical approximation of the derivative to iteratively determine the next value of the solution from the current one. More advanced methods such as Runge-Kutta-Fehlberg and Runge-Kutta Prince-Dormand use adaptive time-stepping, which means they exploit the fact that the differential equation can be solved less often when the signal isn't changing much in time. Of course, there is a computational overhead involved in calculating the optimal time-step, which might temper some of the advantage of adaptive time-stepping.

I decided to run a sample simulation on a few different computers to determine how fast the simulations are running. I ran an array of 100 subthalamic neurons (each with seven differential equations) for 2500ms under a variety of conditions. Not that it matters, but the neurons were independent; there was no interneuron connectivity for this test. All code was written in C++ using the GSL library for solving the differential equations.

I ran two tests: (1) which simulation method is better, and (2) how fast are the various computers we are using.

Test 1
Using our fastest computer (a dedicated processing-only workstation; see below for more details), I simulated the 100-neuron model using three different differential equation solvers.

Method                      Timestep Type   Execution Time   Points Generated
Runge-Kutta 4               Fixed           147.7 s          250,000
Runge-Kutta-Fehlberg        Adaptive        4.86 s           10,237
Runge-Kutta Prince-Dormand  Adaptive        5.7 s            4,558

The RKPD method produces by far the fewest data points, but takes about 17% longer to execute than the fastest method, RKF.

Test 2
The simulation was repeated on five different computers; the RKF solver was used in every case.

Computer                     Specs                            Execution Time
Dedicated Linux Workstation  3.2 GHz Quad Core i7, 6 GB RAM   4.86 s
iMac (2009)                  3.06 GHz Core 2 Duo, 4 GB RAM    5.40 s
Mac Mini (2011)              2.4 GHz Core 2 Duo, 2 GB RAM     6.88 s
MacBook Pro (2007)           2.4 GHz Core 2 Duo, 4 GB RAM     7.21 s
Old Laptop*                  2.2 GHz Core 2 Duo, 4 GB RAM     17.45 s

*The "old laptop" simulation was actually run in a virtual Ubuntu box running under Windows on the old laptop...

The first test is interesting because it highlights the tradeoff between generating fewer data points and taking longer to execute. The second test demonstrates that our new dedicated Linux machine is actually quite fast, even when compared with other reasonably fast machines.

It's important to work through some of these issues while the simulations are still relatively small; understanding the tradeoffs now will be very helpful when the simulations grow to thousands or even tens of thousands of neurons.
