Publications
Preprints

Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao
Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue,
Benjamin Van Roy, ``Evaluating Predictive Distributions: Does Bayesian Deep Learning Work?''

Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen,
``Reinforcement Learning, Bit by Bit.''

S. Dong, B. Van Roy, Z. Zhou,
``Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State.''

A. Devraj, B. Van Roy, and K. Xu,
``A Bit Better? Quantifying Information for Bandit Learning.''

D. Russo and B. Van Roy,
``Satisficing in
Time-Sensitive Bandit Learning,'' forthcoming in Mathematics of
Operations Research.

V. Dwaracherla and B. Van Roy,
``Langevin DQN.''
2021

D. Arumugam and B. Van Roy,
``The Value of Information When Deciding What to Learn,''
Advances in Neural Information Processing Systems 34, 2021.

D. Arumugam and B. Van Roy,
``Deciding
What to Learn: A Rate-Distortion Approach,''
ICML, 2021.
2020

Z. Wen, D. Precup, M. Ibrahimi, A. Barreto, B. Van Roy, Satinder Singh,
``On Efficiency in Hierarchical Reinforcement
Learning,'' Advances in Neural Information Processing Systems 33, 2020.

D. Arumugam and B. Van Roy,
``Randomized Value Functions via Posterior State-Abstraction Sampling,''
NeurIPS Workshop on Biological and Artificial Reinforcement Learning, 2020.

V. Dwaracherla, X. Lu, M. Ibrahimi, I. Osband, B. Van Roy, Z. Wen,
``Hypermodels for Exploration,''
ICLR, 2020.

I. Osband, Y. Doron, M. Hessel, J. Aslanides, E. Sezener, A. Saraiva,
K. McKinney, T. Lattimore, C. Szepesvari, S. Singh, B. Van Roy, R. Sutton,
D. Silver, H. Van Hasselt, ``Behavior Suite for Reinforcement Learning,''
ICLR, 2020.
2019

B. Van Roy and S. Dong,
``Comments on the Du-Kakade-Wang-Yang Lower Bounds.''

X. Lu and B. Van Roy,
``Information-Theoretic Confidence Bounds for Reinforcement Learning,''
Advances in Neural Information Processing Systems 32, 2019.

S. Dong, T. Ma, and B. Van Roy,
``On the Performance of Thompson Sampling on Logistic Bandits,''
COLT, 2019.

I. Osband, D. Russo, B. Van Roy, Z. Wen,
``Deep Exploration via
Randomized Value Functions,'' Journal of Machine
Learning Research, Vol. 20, No. 124, pp. 1-62, 2019.
2018

S. Dong and B. Van Roy,
``An Information-Theoretic
Analysis for Thompson Sampling with Many Actions,''
Advances in Neural Information Processing Systems 31, 2018.

M. Dimakopoulou, I. Osband, and B. Van Roy,
``Scalable Coordinated
Exploration in Concurrent Reinforcement Learning,''
Advances in Neural Information Processing Systems 31, 2018.
[demo]

D. Russo, B. Van Roy, A. Kazerouni, I. Osband and Z.
Wen, ``A Tutorial on Thompson Sampling,'' Foundations and Trends in
Machine Learning, Vol. 11, No. 1, pp. 1-96, 2018.
[code]

M. Dimakopoulou and B. Van Roy,
``Coordinated Exploration in
Concurrent Reinforcement Learning,''
Proceedings of The 35th
International Conference on Machine Learning, 2018.
[demo]

D. Russo and B. Van Roy,
``Learning to Optimize Via
Information-Directed Sampling,'' Operations Research,
Vol. 66, No. 1, pp. 230-252, 2018.
2017

X. Lu and B. Van Roy,
``Ensemble Sampling,''
Advances in Neural Information Processing Systems 30, 2017.

A. Kazerouni, M. Ghavamzadeh, Y. Abbasi-Yadkori, B. Van Roy,
``Conservative Contextual
Linear Bandits,''
Advances in Neural Information Processing Systems 30, 2017.

I. Osband and B. Van Roy,
``Why is Posterior Sampling Better than Optimism for Reinforcement Learning?'' Proceedings of The 34th
International Conference on Machine Learning, 2017.

I. Osband and B. Van Roy,
``On Optimistic versus
Randomized Exploration in Reinforcement Learning,'' The Third Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017.

Z. Wen and B. Van Roy,
``Efficient
Exploration and Value Function Generalization in Deterministic
Systems,'' Mathematics of Operations Research, Vol. 42,
No. 3, pp. 762-782, 2017.
2016

I. Osband, C. Blundell, A. Pritzel, B. Van Roy,
``Deep Exploration Via
Bootstrapped DQN,''
Advances in Neural Information Processing Systems 29, 2016.

I. Osband, B. Van Roy, and Z. Wen,
``Generalization and Exploration
Via Randomized Value Functions,'' Proceedings of The 33rd
International Conference on Machine Learning, pp. 2377-2386, 2016.
[supplementary material]

D. Russo and B. Van Roy,
``An Information-Theoretic
Analysis of Thompson Sampling,'' Journal of Machine
Learning Research, Vol. 17, pp. 1-30, 2016.
2015

B. Park and B. Van Roy,
``Adaptive
Execution: Exploration and Learning of Price Impact,'' Operations
Research,
Vol. 63, No. 5, pp. 1058-1076, 2015.
2014

D. Russo and B. Van Roy,
``Learning to Optimize Via Information-Directed Sampling,''
Advances in Neural Information Processing Systems 27, pp. 1583-1591, 2014.

I. Osband and B. Van Roy,
``Model-Based Reinforcement Learning and the Eluder Dimension,''
Advances in Neural Information Processing Systems 27,
pp. 1466-1474, 2014.

I. Osband and B. Van Roy,
``Near-Optimal Reinforcement Learning in Factored MDPs,''
Advances in Neural Information Processing Systems 27, pp. 604-612, 2014.

D. Russo and B. Van Roy,
``Learning
to Optimize Via Posterior Sampling,'' Mathematics
of Operations Research, Vol. 39, No. 4, pp. 1221-1243, 2014.

Y.-H. Kao and B. Van Roy,
``Directed
Principal Component Analysis,'' Operations Research,
Vol. 62, No. 4, pp. 957-972, 2014.
2013

D. Russo and B. Van Roy,
``Eluder Dimension and the Sample Complexity of Optimistic Exploration,''
Advances in Neural
Information Processing Systems 26, pp. 2256-2264, 2013.

I. Osband, D. Russo, and B. Van Roy,
``(More) Efficient Reinforcement
Learning Via Posterior Sampling,'' Advances in Neural
Information Processing Systems 26, pp. 3003-3011, 2013.

Z. Wen and B. Van Roy,
``Efficient
Exploration and Value Function Generalization in Deterministic
Systems,'' Advances in Neural
Information Processing Systems 26, pp. 3021-3029, 2013.

Y.-H. Kao and B. Van Roy,
``Learning
a Factor Model Via Regularized PCA,'' Machine Learning,
Vol. 91, No. 3, pp. 279-303, 2013.
2012

Z. Wen, L. J. Durlofsky, B. Van Roy, and K. Aziz,
``Approximate
Dynamic Programming for Optimizing Oil Production,''
Chapter 25 in Reinforcement Learning and Approximate Dynamic Programming for
Feedback Control, edited by F. L. Lewis and D. Liu, Wiley-IEEE Press, 2012.

M. Ibrahimi, A. Javanmard, and B. Van Roy,
``Efficient
Reinforcement Learning for High Dimensional Linear Systems,''
Advances in Neural Information Processing Systems 25,
MIT Press, 2012.

M. T. Padilla and B. Van Roy,
``Intermediated
Blind Portfolio Auctions,'' Management Science,
Vol. 58, No. 9, pp. 1747-1760, 2012.

C. C. Moallemi, B. Park, and B. Van Roy,
``Strategic
Execution in the Presence of an Uninformed Arbitrageur,''
Journal of Financial Markets, Vol. 15, pp. 361-391, 2012.

A. Chairawongse, S. Kiatsupaibul, S. Tirapat, and B. Van Roy,
``Portfolio
Selection with Qualitative Input,'' Journal
of Banking and Finance, Vol. 36, No. 2, pp. 489-496, 2012.
2011

G. Y. Weintraub, C. L. Benkard, and B. Van Roy,
``Industry
Dynamics: Foundations for Models with an Infinite Number of Firms,''
Journal of Economic Theory, Vol. 146, No. 5, pp. 1965-1994, 2011.

C. C. Moallemi and B. Van Roy, ``Resource
Allocation Via Message Passing,'' INFORMS
Journal on Computing, Vol. 23, No. 2, pp. 205-219, 2011.

J. Han and B. Van Roy, ``Control of
Diffusions Via Linear Programming,'' in Stochastic Programming:
The State of the Art, in Honor of George B. Dantzig, edited by Gerd
Infanger, pp. 329-354, Springer, 2011.

Z. Wen, L. J. Durlofsky, B. Van Roy, and K. Aziz,
``Use of
Approximate Dynamic Programming for Production Optimization,'' SPE
Proceedings, 2011.
2010

B. Van Roy and X. Yan,
``Manipulation
Robustness of Collaborative Filtering,''
Management Science, Vol. 56, No. 11, pp. 1911-1929, 2010.

B. Van Roy,
``On
Regression-Based Stopping Times,'' Discrete Event
Dynamic Systems, Vol. 20, No. 3, pp. 307-324, 2010.

R. Johari, G. Y. Weintraub, and B. Van Roy,
``Investment
and Market Structure in Industries with Congestion,''
Operations Research, Vol. 58, No. 5, 2010, pp. 1303-1317.

C. C. Moallemi and B. Van Roy, ``Convergence of the Min-Sum Algorithm for
Convex Optimization,'' IEEE Transactions on
Information Theory, Vol. 56, No. 4, pp. 2041-2050, 2010.

G. Y. Weintraub, C. L. Benkard, and B. Van Roy,
``Computational
Methods for Oblivious Equilibrium,'' Operations Research,
Vol. 58, No. 4, pp. 1247-1265, 2010.
[Matlab
code (updated July 2012)]

V. F. Farias, C. C. Moallemi, B. Van Roy, and T. Weissman,
``Universal
Reinforcement Learning,'' IEEE Transactions on Information
Theory, Vol. 56, No. 5, pp. 2441-2454, 2010.

V. F. Farias and B. Van Roy,
``Dynamic
Pricing with a Prior on Market Response,'' Operations Research,
Vol. 58, No. 1, pp. 16-29, 2010.
2009

Y. H. Kao, B. Van Roy, and X. Yan,
``Directed
Regression,''
Advances in Neural Information Processing Systems 22,
MIT Press, pp. 889-897, 2009.

B. Van Roy and X. Yan,
``Manipulation-Resistant Collaborative Filtering Systems,''
Proceedings of the Third ACM Conference on Recommender Systems,
pp. 165-172, 2009.

C. C. Moallemi and B. Van Roy,
``Convergence
of Min-Sum Message Passing for Quadratic
Optimization,'' IEEE Transactions on Information
Theory, Vol. 55, No. 5, pp. 2413-2423, 2009.
2008

G. Y. Weintraub, C. L. Benkard, and B. Van Roy,
``Markov
Perfect Industry Dynamics with
Many Firms,'' Econometrica, Vol. 76, No. 6, 2008, pp. 1375-1411.
[Technical Appendix]

X. Yan and B. Van Roy,
``Reputation
Markets,'' Proceedings of the ACM SIGCOMM 2008 Workshop on Economics of
Networks, Systems, and Computation.

H. Permuter, P. Cuff, B. Van Roy, and T. Weissman, ``Capacity of the
Trapdoor Channel with Feedback,'' IEEE
Transactions on Information Theory, Vol. 54, No. 7, pp. 3150-3165,
2008.

C. C. Moallemi, S. Kumar, and B. Van Roy, ``Approximate
and Data-Driven Dynamic
Programming for Queueing Networks,'' 2008.
2007

V. F. Farias and B. Van Roy, ``An
Approximate Dynamic Programming
Approach to Network Revenue Management,'' 2007.

N. O. Keohane, B. Van Roy, and R. J. Zeckhauser,
``Managing
the Quality of a Resource with Stock and Flow Controls,''
Journal of Public Economics, Vol. 91, 2007, pp. 541-569.

B. Van Roy,
``A Short
Proof of Optimality for the MIN Cache Replacement Algorithm,''
Information Processing Letters, Vol. 102, No. 2, pp. 72-73, 2007.
2006
 G. Y. Weintraub, C. L. Benkard, and B. Van Roy,
``Oblivious Equilibrium: A Mean Field Approximation for Large Scale
Dynamic Games,''
Advances in Neural Information Processing Systems 18, MIT Press,
2006.

C. C. Moallemi and B. Van Roy,
``Consensus
Propagation,'' IEEE Transactions on Information Theory,
Vol. 52, No. 11, pp. 4753-4766, 2006.
 C. C. Moallemi and B. Van Roy,
``Consensus Propagation,''
Advances in Neural Information Processing Systems 18, MIT Press,
2006.

D. S. Choi and B. Van Roy,
``A
Generalized Kalman Filter for Fixed Point Approximation
and Efficient Temporal-Difference Learning,''
Discrete Event Dynamic Systems, Vol. 16, No. 2, April 2006.

P. Rusmevichientong, B. Van Roy, and P. W. Glynn,
``A
Non-Parametric Approach to Multi-Product Pricing,''
Operations Research, Vol. 54, No. 1, 2006, pp. 82-98.

P. Rusmevichientong, J. A. Salisbury, L. T. Truss, B. Van Roy, and P. W. Glynn,
``Opportunities
and Challenges in Using Online Preference Data for Vehicle Pricing: A Case
Study at General Motors,'' Journal of Revenue
and Pricing Management, Vol. 5, No. 1, pp. 45-61, 2006.

D. P. de Farias and B. Van Roy,
``A
Cost-Shaping Linear Program for Average-Cost Approximate Dynamic
Programming with Performance Guarantees,''
Mathematics of Operations Research, Vol. 31, No. 3, pp. 597-620,
2006.

V. F. Farias and B. Van Roy,
``Approximation
Algorithms for Dynamic Resource Allocation,''
Operations Research Letters, Vol. 34, No. 2, March 2006,
pp. 180-190.

R. Cogill, M. Rotkowitz, B. Van Roy, S. Lall,
``An
Approximate Dynamic Programming Approach to Decentralized Control
of Stochastic Systems,''
Lecture Notes in Control and Information Sciences,
Springer, Berlin, 2006, Vol. 329, pp. 243-256.

B. Van Roy,
``Performance
Loss Bounds for Approximate Value Iteration with State Aggregation,''
Mathematics of Operations Research, Vol. 31, No. 2, pp. 234-244, 2006.
 B. Van Roy,
``TD(0) Leads to Better Policies than Approximate Value Iteration,''
Advances in Neural Information Processing Systems 18, MIT Press,
2006.

V. F. Farias and B. Van Roy,
``Tetris:
A Study of Randomized Constraint Sampling,''
in Probabilistic and Randomized Methods for Design
Under Uncertainty, G. Calafiore and F. Dabbene, eds., Springer-Verlag,
2006.
2005
 D. P. de Farias and B. Van Roy,
``A Linear Program for Bellman Error Minimization with Performance
Guarantees,''
Advances in Neural Information Processing Systems 17, MIT Press,
2005.
 V. F. Farias, C. C. Moallemi, B. Van Roy, and T. Weissman,
``A Universal Scheme for Learning,''
Proceedings of the IEEE International Symposium on Information Theory,
Adelaide, Australia, September 2005.

X. Yan, P. Diaconis, P. Rusmevichientong, and B. Van Roy,
``Solitaire:
Man Versus Machine,''
Advances in Neural Information Processing Systems 17,
MIT Press, 2005.
2004

R. Cogill, M. Rotkowitz, B. Van Roy, S. Lall,
``An
Approximate Dynamic Programming Approach to Decentralized Control
of Stochastic Systems,''
Proceedings of the Allerton Conference on Communication,
Control, and Computing, 2004, pp. 1040-1049.

D. P. de Farias and B. Van Roy,
``
On Constraint Sampling in the Linear Programming Approach to
Approximate Dynamic Programming,''
Mathematics of Operations Research, Vol. 29, No. 3,
August 2004, pp. 462-478.

H. Zhang, A. Goel, R. Govindan, K. Mason, and B. Van Roy,
``Improving Eigenvector-Based Reputation Systems Against Collusion,''
Workshop on Algorithms and Models for the Web Graph, October 2004.

W. B. Powell and B. Van Roy,
``Approximate
Dynamic Programming for High-Dimensional Dynamic Resource
Allocation Problems,'' in Handbook of Learning and Approximate
Dynamic Programming, edited by
J. Si, A. G. Barto, W. B. Powell, and D. Wunsch,
Wiley-IEEE Press, Hoboken, NJ, 2004, pp. 261-279.

C. C. Moallemi and B. Van Roy,
``Distributed
Optimization in Adaptive Networks,'' Advances in Neural Information
Processing Systems 16, MIT Press, 2004.
[appendix]
2003

D. P. de Farias and B. Van Roy,
``The
Linear Programming Approach to Approximate Dynamic Programming,''
Operations Research, Vol. 51, No. 6, November-December 2003,
pp. 850-865.
 C. C. Moallemi and B. Van Roy,
``Decentralized Protocols for Optimization of Sensor Networks,''
Proceedings of Allerton 2003.
 D. P. de Farias and B. Van Roy,
``Approximate Linear Programming for Average-Cost Dynamic Programming,''
Advances in Neural Information Processing Systems 15, MIT Press,
2003.

B. Van Roy,
``Book
Review: Self-Learning Control of Finite Markov Chains,
by A. S. Poznyak, K. Najim, and E. Gomez-Ramirez,'' Automatica,
Volume 39, Issue 2, February 2003, pp. 373-376.
2002

N. Agarwal, J. Basch, P. Beckmann, P. Bharti, S. Bloebaum, S. Casadei,
A. Chou, P. Enge, W. Fong, N. Hathi, W. Mann, A. Sahai, J. Stone, J.
Tsitsiklis, and B. Van Roy,
``Algorithms
for GPS Operation Indoors and
Downtown,'' GPS Solutions, Vol. 6, No. 3, December, 2002,
pp. 149-160.
 J. N. Tsitsiklis and B. Van Roy,
``
On Average Versus Discounted Reward Temporal-Difference
Learning,'' Machine Learning, Vol. 49, No. 2-3, 2002, pp. 179-191.
2001
 D. S. Choi and B. Van Roy,
``A Generalized Kalman Filter for Fixed Point Approximation
and Efficient Temporal-Difference Learning,''
Proceedings of the International Conference
on Machine Learning, 2001.

P. Rusmevichientong and B. Van Roy,
``A
Tractable POMDP for a Class of Sequencing Problems,''
Proceedings of the Conference on Uncertainty in Artificial
Intelligence, 2001.

B. Van Roy, ``
Neuro-Dynamic Programming: Overview and Recent
Trends,'' in Handbook of Markov Decision
Processes: Methods and Applications,
edited by E. Feinberg and A. Shwartz,
Kluwer, 2001.

J. N. Tsitsiklis and B. Van Roy,
``Regression Methods
for Pricing Complex American-Style Options,''
IEEE Transactions on Neural Networks,
Vol. 12, No. 4 (special issue on computational finance), July 2001,
pp. 694-703.

P. Rusmevichientong and B. Van Roy,
``
An Analysis of Belief Propagation on the Turbo Decoding
Graph with Gaussian Densities,''
IEEE Transactions on Information Theory, Vol. 47,
No. 2, pp. 745-765, 2001.
2000

P. Rusmevichientong and B. Van Roy,
``An Analysis of Turbo Decoding with Gaussian Priors,''
Advances in Neural Information Processing Systems 12, MIT
Press, 2000.

N. O. Keohane, B. Van Roy, and R. J. Zeckhauser,
``The Optimal Harvesting of Environmental Bads,''
Proceedings of the IEEE Conference on Decision and
Control, 2000.

D. P. de Farias and B. Van Roy,
``
On the Existence of Fixed Points for Approximate Value
Iteration and Temporal-Difference Learning,''
Journal of Optimization Theory and Applications,
Vol. 105, No. 3, June, 2000.

D. P. de Farias and B. Van Roy,
``Approximate Value Iteration with Randomized Policies,''
Proceedings of the IEEE Conference on Decision and
Control, 2000.

D. P. de Farias and B. Van Roy,
``Approximate Value Iteration and Temporal-Difference Learning,''
Proceedings of the IEEE Symposium 2000 on Adaptive
Systems for Signal Processing, Communications
and Control, 2000.

D. P. de Farias and B. Van Roy,
``Fixed Points for Approximate Value Iteration and
Temporal-Difference Learning,''
Proceedings of the International Conference
on Machine Learning, 2000.
1999
 J. N. Tsitsiklis and B. Van Roy,
``Average Cost
Temporal-Difference Learning,'' Automatica, Vol. 35,
No. 11, November 1999, pp. 1799-1808.

B. Van Roy, ``Temporal-Difference Learning and Applications in Finance,''
Computational Finance (Proceedings of the Sixth International Conference
on Computational Finance, Leonard N. Stern School
of Business, January 6-8, 1999).
Edited by Y. S. Abu-Mostafa, B. LeBaron, A. W. Lo, and A. S. Weigend.
Cambridge, MA: MIT Press, 1999.
 J. N. Tsitsiklis and B. Van Roy,
``Optimal Stopping of
Markov Processes: Hilbert Space Theory,
Approximation Algorithms, and an
Application to Pricing High-Dimensional
Financial Derivatives,''
IEEE Transactions on Automatic Control,
Vol. 44, No. 10, October 1999, pp. 1840-1851.
1997

J. N. Tsitsiklis and B. Van Roy, ``Average Cost Temporal-Difference
Learning,'' Proceedings of the IEEE Conference on
Decision and Control, 1997.

J. N. Tsitsiklis and B. Van Roy, ``Overview of Neuro-Dynamic
Programming and a Case Study in Optimal Stopping,'' Proceedings of
the IEEE Conference on Decision and Control, 1997.

J. N. Tsitsiklis and B. Van Roy, ``Approximate Solutions to
Optimal Stopping Problems,'' Advances in Neural Information Processing
Systems 9, MIT Press, 1997.
 J. N. Tsitsiklis and B. Van Roy,
``An Analysis of
Temporal-Difference Learning with Function Approximation,''
IEEE Transactions on Automatic Control,
Vol. 42, No. 5, May 1997, pp. 674-690.

J. N. Tsitsiklis and B. Van Roy, ``Analysis of
Temporal-Difference Learning with Function Approximation,''
Advances in Neural Information Processing Systems 9, MIT
Press, 1997.

B. Van Roy, D. P. Bertsekas, Y. Lee, and J. N. Tsitsiklis,
``A Neuro-Dynamic Programming Approach to Retailer Inventory
Management,'' Proceedings of the IEEE Conference on
Decision and Control, 1997.
(full length version)

R. Kennedy, Y. Lee, B. Van Roy, C. Reed, and R. Lippmann,
Solving Data Mining Problems Through Pattern Recognition,
Prentice-Hall, 1997.
1996
J. N. Tsitsiklis and B. Van Roy, ``Feature-Based Methods for
Large Scale Dynamic Programming,'' Machine
Learning, Vol. 22, 1996, pp. 59-94.

B. Van Roy and J. N. Tsitsiklis, ``Stable Linear Approximations
to Dynamic Programming for Stochastic Control Problems with
Local Transitions,'' Advances in Neural Information Processing
Systems 8, MIT Press, 1996.
1995

R. Kennedy, Y. Lee, C. Reed, and B. Van Roy, Solving Pattern
Recognition Problems, Unica,
1995.
Theses

B. Van Roy, ``Learning
and Value Function Approximation in Complex
Decision Processes,'' PhD Thesis, Massachusetts
Institute of Technology, May 1998.

B. Van Roy, ``Feature-Based Methods for Large Scale Dynamic
Programming,'' Master's Thesis, Massachusetts
Institute of Technology, January 1995.

B. Van Roy, ``Differential Cost Functions for
Training Neural Network Pattern Classifiers,'' Bachelor's Thesis,
Massachusetts Institute of Technology, May 1993.