Friday, May 20, 2011

C++ Performance Benchmarking

We are in the process of developing computational models of neural structures with the eventual goal of studying the effects of deep brain stimulation on dystonia. We are starting by reproducing the results of one of the best-recognized models of neural tissue in the deep brain, published in 2004 by Rubin and Terman. That paper draws heavily from an earlier 2002 paper which describes some of the basic neural cell models. I have an exceptionally capable student who has been working with these models for a while now and has successfully re-created each of the individual cell types; we are now working on linking those cells together synaptically and optimizing the simulation process.

The neural models used by Rubin and Terman are Hodgkin-Huxley-type, which means that each of the voltage-gated membrane proteins is modelled by one or more differential equations. For example, in the case of the subthalamic nucleus (STN) neurons, there are seven first-order differential equations that must be solved in parallel.

Numerically, there are any number of methods for solving these differential equations. The simplest are Forward Euler and Runge-Kutta, each of which uses a mathematical approximation of the derivative to iteratively determine the next value of the solution from the current one. More advanced methods such as Runge-Kutta-Fehlberg and Runge-Kutta Prince-Dormand use adaptive time-stepping, which means they exploit the fact that the differential equation can be solved less often when the signal isn't changing much in time. Of course, there is a computational overhead involved in calculating the optimal time-step, which might temper some of the advantage of adaptive time-stepping.

I decided to run a sample simulation on a few different computers to determine how fast the simulations are running. I ran an array of 100 subthalamic neurons (each with seven differential equations) for 2500ms under a variety of conditions. Not that it matters, but the neurons were independent; there was no interneuron connectivity for this test. All code was written in C++ using the GSL library for solving the differential equations.

I ran two tests: (1) which simulation method is better, and (2) how fast are the various computers we are using.

Test 1
Using our fastest computer (a dedicated processing-only workstation; see below for more details), I simulated the 100-neuron model using three different differential equation solvers.

Method                      Timestep Type   Execution Time   Points Generated
Runge-Kutta 4               Fixed           147.7 s          250,000
Runge-Kutta-Fehlberg        Adaptive        4.86 s           10,237
Runge-Kutta Prince-Dormand  Adaptive        5.7 s            4,558

The RKPD method produces by far the fewest data points, but takes about 17% longer to execute than the fastest method, RKF.

Test 2
The simulation was repeated on five different computers; the RKF solver was used in every case.

Computer                     Specs                            Execution Time
Dedicated Linux Workstation  3.2 GHz Quad Core i7, 6 GB RAM   4.86 s
iMac (2009)                  3.06 GHz Core 2 Duo, 4 GB RAM    5.40 s
Mac Mini (2011)              2.4 GHz Core 2 Duo, 2 GB RAM     6.88 s
MacBook Pro (2007)           2.4 GHz Core 2 Duo, 4 GB RAM     7.21 s
Old Laptop*                  2.2 GHz Core 2 Duo, 4 GB RAM     17.45 s

*The "old laptop" simulation was actually run in a virtual Ubuntu box running under Windows on the old laptop...

The first test is interesting because it highlights the tradeoff between generating fewer data points and taking longer to execute. The second test demonstrates that our new dedicated Linux machine is actually quite fast, even when compared with other reasonably fast machines.

It's important to work through some of these issues while the simulations are still relatively small; understanding the tradeoffs now will be very helpful when the simulations grow to thousands or even tens of thousands of neurons.
