Help make a ubiquitous model of decision processes

Markov decision processes are mathematical models used to determine the best courses of action when both current circumstances and future consequences are uncertain. They’ve had a huge range of applications — in natural-resource management, manufacturing, operations management, robot control, finance, epidemiology, scientific-experiment design, and tennis strategy, just to name a few.
But analyses involving Markov decision processes (MDPs) usually make some simplifying assumptions. In an MDP, a given decision doesn’t always yield a predictable result; it could yield a range of possible results. And each of those results has a different “value,” meaning the chance that it will lead, ultimately, to a desirable outcome.
Characterizing the value of given decision requires collection of empirical data, which can be prohibitively time consuming, so analysts usually just make educated guesses. That means, however, that the MDP analysis doesn’t guarantee the best decision in all cases.
In the Proceedings of the Conference on Neural Information Processing Systems, published last month, researchers from MIT and Duke University took a step toward putting MDP analysis on more secure footing. They show that, by adopting a simple trick long known in statistics but little applied in machine learning, it’s possible to accurately characterize the value of a given decision while collecting much less empirical data than had previously seemed necessary.
In their paper, the researchers described a simple example in which the standard approach to characterizing probabilities would require the same decision to be performed almost 4 million times in order to yield a reliable value estimate.