Wednesday, June 1, 2011

Dirichlet Distribution

I'm taking part in a data-modeling journal club made up of faculty and graduate students. We started with the paper "Modeling individual differences using Dirichlet Processes" by Navarro et al. Part of reading a paper is delving into the mathematics that underpins the authors' research, which is how we wound up having a lively discussion about what the Dirichlet Process is and why it's important.

To help the graduate students visualize the Dirichlet Process, I made a widget in Matlab that plots the joint PDF of a third-order Dirichlet distribution, the finite-dimensional distribution on which the Dirichlet Process is built. The density is colormapped onto a three-dimensional rendering of the sample space, which in this case is the part of the plane x + y + z = 1 that lies in the first octant (x, y, and z all nonnegative).
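For reference, the density on that triangle is f(x, y, z) = Γ(α1 + α2 + α3) / (Γ(α1) Γ(α2) Γ(α3)) · x^(α1 − 1) · y^(α2 − 1) · z^(α3 − 1), where α1, α2, α3 are positive concentration parameters. The post doesn't say which α values the widget uses, and this is not the widget's Matlab code. It is a minimal Python sketch, and the function name `dirichlet_pdf` and the example α = (2, 3, 5) are hypothetical choices for illustration. It works in log space so large α values don't overflow the Gamma function.

```python
import math

def dirichlet_pdf(p, alpha):
    """Density of Dirichlet(alpha) at the probability vector p.

    p must lie strictly inside the simplex: every entry in (0, 1)
    and the entries summing to 1 (e.g. a point on x + y + z = 1).
    """
    if len(p) != len(alpha):
        raise ValueError("p and alpha must have the same length")
    if any(x <= 0 or x >= 1 for x in p) or not math.isclose(sum(p), 1.0):
        raise ValueError("p must lie strictly inside the simplex")
    # log of the normalizing constant Gamma(sum alpha) / prod Gamma(alpha_i)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # log of the kernel prod p_i^(alpha_i - 1)
    log_kernel = sum((a - 1) * math.log(x) for x, a in zip(p, alpha))
    return math.exp(log_norm + log_kernel)

# Density at one point of the plane x + y + z = 1 (hypothetical alpha)
print(dirichlet_pdf((0.2, 0.3, 0.5), (2.0, 3.0, 5.0)))
```

With α = (1, 1, 1) the density is the constant 2 everywhere on the triangle, which is the "flat" case where every spread of probabilities is equally plausible.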

A third-order Dirichlet distribution describes a random variable with three discrete outcomes whose probabilities are unknown. Rather than fixing those probabilities, it assigns a density to every possible combination of them, quantifying how plausible each spread of probabilities across the outcomes is. Note that each of the three unknown probabilities x, y, and z has to be between 0 and 1, and that x + y + z = 1 (because the probabilities of all outcomes in a sample space always sum to one).
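One way to make "possible spread of probabilities" concrete is to draw random probability vectors from the distribution. The sketch below uses the standard construction (draw independent Gamma(αi, 1) variates and divide each by their total), which is a well-known sampling method but not necessarily what the widget does. The name `sample_dirichlet` and the α values are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng=random):
    """Draw one probability vector from Dirichlet(alpha).

    Draws independent Gamma(alpha_i, 1) variates and normalizes them,
    so every entry lands in [0, 1] and the entries sum to 1.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(42)  # seeded so the run is reproducible
for _ in range(3):
    x, y, z = sample_dirichlet((2.0, 3.0, 5.0), rng)
    print(f"x={x:.3f}  y={y:.3f}  z={z:.3f}  sum={x + y + z:.3f}")
```

Each printed triple is one point on the triangle x + y + z = 1; plotting many of them would cluster where the widget's colormap shows high density.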

You can download the widget (including another version for the 2nd-order Dirichlet distribution) by clicking here. From within Matlab, just run "dirch_3" or "dirch_2". The smaller plots at the bottom show the marginal densities of the individual variables.
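The marginals have a closed form: on its own, component i of a Dirichlet(α) distribution follows a Beta(αi, α0 − αi) distribution, where α0 = α1 + α2 + α3. I don't claim this is how the widget computes its marginal plots. It's a minimal Python sketch with hypothetical names (`beta_pdf`, `dirichlet_marginal_pdf`) and example α values.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

def dirichlet_marginal_pdf(x, i, alpha):
    """Marginal density of component i of Dirichlet(alpha) at x.

    Component i alone is Beta(alpha_i, alpha_0 - alpha_i),
    where alpha_0 = sum(alpha).
    """
    a = alpha[i]
    return beta_pdf(x, a, sum(alpha) - a)

alpha = (2.0, 3.0, 5.0)  # hypothetical concentration parameters
for i, name in enumerate("xyz"):
    values = [round(dirichlet_marginal_pdf(t, i, alpha), 3) for t in (0.1, 0.3, 0.5)]
    print(name, values)
```

For the flat case α = (1, 1, 1), each marginal is Beta(1, 2), a straight line falling from 2 at 0 to 0 at 1, even though the joint density is perfectly uniform.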

These demos show only the 2nd- and 3rd-order cases because those are the ones we can render on a computer screen. Mathematically, the Dirichlet distribution extends to any order n, and the same concepts and intuition that apply to the 2nd- and 3rd-order cases carry over, even though the higher-order cases can't be visualized directly.
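When a picture isn't possible, summary numbers still work in any order. For component i of a Dirichlet(α) distribution with α0 = α1 + … + αn, the mean is αi / α0 and the variance is αi (α0 − αi) / (α0² (α0 + 1)). A minimal Python sketch of those formulas follows; the function name `dirichlet_moments` and the 5th-order α vector are hypothetical examples.

```python
def dirichlet_moments(alpha):
    """Return (means, variances) of each component of Dirichlet(alpha).

    Works for any order n = len(alpha):
        E[x_i]   = alpha_i / alpha_0
        Var[x_i] = alpha_i * (alpha_0 - alpha_i) / (alpha_0**2 * (alpha_0 + 1))
    where alpha_0 = sum(alpha).
    """
    a0 = sum(alpha)
    means = [a / a0 for a in alpha]
    variances = [a * (a0 - a) / (a0 ** 2 * (a0 + 1)) for a in alpha]
    return means, variances

# A 5th-order example that can't be drawn as a surface
means, variances = dirichlet_moments([1.0, 2.0, 3.0, 4.0, 5.0])
for i, (m, v) in enumerate(zip(means, variances), start=1):
    print(f"x{i}: mean={m:.3f}  sd={v ** 0.5:.3f}")
```

Scaling every αi by the same factor leaves the means unchanged but shrinks the variances, which is the n-dimensional version of the density in the plots tightening around one point.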
