Robust Decision Making
IJCAI 11
Robust Online Optimization of Reward-uncertain MDPs
Kevin Regan and Craig Boutilier
Twenty-Second International Joint Conference on Artificial Intelligence
Imprecise-reward Markov decision processes (IRMDPs) are MDPs in which the reward function is only partially specified (e.g., by some elicitation process). Recent work using minimax regret to solve IRMDPs has shown, despite theoretical intractability, how the set of policies that are nondominated w.r.t. reward uncertainty can be exploited to accelerate regret computation. However, the number of nondominated policies is generally so large as to undermine this leverage. In this paper, we show how the quality of the approximation can be improved online by pruning/adding nondominated policies during reward elicitation, while maintaining computational tractability. Drawing insights from the POMDP literature, we also develop a new anytime algorithm for constructing the set of nondominated policies with provable (anytime) error bounds.
@inproceedings{ijcai2011:rb:a, author = {Kevin Regan and Craig Boutilier}, title = {Robust Online Optimization of Reward-uncertain MDPs}, booktitle = {Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011)}, year = {2011}}
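The central computation above is minimax regret over the set of nondominated policies. As a rough, self-contained illustration (my sketch, not the authors' implementation), the Python snippet below approximates both the reward polytope and the adversary's policy space by finite samples, representing each policy by its state-action occupancy-frequency vector f so that its value under reward vector r is the dot product f @ r.

import numpy as np

def max_regret(f, rewards, policies):
    """Worst-case regret of policy f: an adversary picks both the reward
    and a competing policy from the finite samples."""
    return max((g - f) @ r for r in rewards for g in policies)

def minimax_regret_policy(policies, rewards):
    """Return the candidate policy minimizing worst-case regret."""
    return min(policies, key=lambda f: max_regret(f, rewards, policies))

# Toy instance: three candidate policies over two state-action pairs,
# and two reward vectors sampled from the uncertainty set.
policies = [np.array(v) for v in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5])]
rewards = [np.array(v) for v in ([1.0, 0.0], [0.0, 1.0])]
best = minimax_regret_policy(policies, rewards)
print(best, max_regret(best, rewards, policies))  # [0.5 0.5] 0.5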
IJCAI 11
Eliciting Additive Reward Functions for Markov Decision Processes
Kevin Regan and Craig Boutilier
Twenty-Second International Joint Conference on Artificial Intelligence
Specifying the reward function of a Markov decision process (MDP) can be demanding, requiring human assessment of the precise quality of, and tradeoffs among, various states and actions. However, reward functions often possess considerable structure which can be leveraged to streamline their specification. We develop new, decision-theoretically sound heuristics for eliciting rewards for factored MDPs whose reward functions exhibit additive independence. Since we can often find good policies without complete reward specification, we also develop new (exact and approximate) algorithms for robust optimization of imprecise-reward MDPs with such additive reward. Our methods are evaluated in two domains: autonomic computing and assistive technology.
@inproceedings{ijcai2011:rb:b, author = {Kevin Regan and Craig Boutilier}, title = {Eliciting Additive Reward Functions for Markov Decision Processes}, booktitle = {Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011)}, year = {2011}}
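To make the additive independence assumption concrete, here is a toy sketch (the factor names and values are invented, loosely echoing the autonomic-computing domain): the reward of a factored state decomposes into a sum of small local reward tables, so elicitation can target each table rather than the exponentially large joint reward function.

import itertools

local_rewards = {                 # one small table per state factor
    "load":   {0: 0.0, 1: -0.4},
    "uptime": {0: -1.0, 1: 0.6},
}

def reward(state):
    """Additively independent reward: a sum of the local tables."""
    return sum(table[state[f]] for f, table in local_rewards.items())

for bits in itertools.product([0, 1], repeat=len(local_rewards)):
    state = dict(zip(local_rewards, bits))
    print(state, reward(state))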
AAAI 10
Robust Policy Computation in Reward-uncertain MDPs using Nondominated Policies
Kevin Regan and Craig Boutilier
Twenty-Fourth AAAI Conference on Artificial Intelligence
The precise specification of reward functions for Markov decision processes (MDPs) is often extremely difficult, motivating research into both reward elicitation and the robust solution of MDPs with imprecisely specified reward (IRMDPs). We develop new techniques for the robust optimization of IRMDPs, using the minimax regret decision criterion, that exploit the set of nondominated policies, i.e., policies that are optimal for some instantiation of the imprecise reward function. Drawing parallels to POMDP value functions, we devise a Witness-style algorithm for identifying nondominated policies. We also introduce several new algorithms for computing minimax regret using the nondominated set, and examine both practically and theoretically the impact of approximating this set. Our results suggest that a small subset of the nondominated set can greatly speed up computation, yet yield very tight approximations to minimax regret.
@inproceedings{aaai2010rb, author = {Kevin Regan and Craig Boutilier}, title = {Robust Policy Computation in Reward-uncertain MDPs using Nondominated Policies}, booktitle = {Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2010)}, year = {2010}}
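A policy is nondominated exactly when it is optimal for some feasible reward. As a simple illustration of that test (not the paper's Witness-style algorithm), the linear program below searches for a reward in the box [0, 1]^n under which a candidate occupancy vector f beats every rival in a finite comparison set by the largest possible margin; a nonnegative margin certifies nondominance relative to that set.

import numpy as np
from scipy.optimize import linprog

def nondominated(f, rivals, n):
    """Test whether occupancy vector f is optimal, among the rivals,
    for at least one reward in the box [0, 1]^n."""
    # Variables: the reward r (n entries) followed by the margin delta;
    # maximize delta, i.e., minimize -delta.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # Constraints (g - f) @ r + delta <= 0 force f to beat each rival g
    # by at least delta under the chosen reward.
    A_ub = np.array([np.append(g - f, 1.0) for g in rivals])
    b_ub = np.zeros(len(rivals))
    bounds = [(0.0, 1.0)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.status == 0 and -res.fun >= -1e-9  # tolerance for LP round-off

f = np.array([0.5, 0.5])
rivals = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(nondominated(f, rivals, 2))  # True: f is optimal when r = (1, 1)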
UAI 09
Regret-based Reward Elicitation for Markov Decision Processes
Kevin Regan and Craig Boutilier
The 25th Conference on Uncertainty in Artificial Intelligence
The specification of a Markov decision process (MDP) can be difficult. Reward function specification is especially problematic; in practice, it is often cognitively complex and time-consuming for users to precisely specify rewards. This work casts the problem of specifying rewards as one of preference elicitation and aims to minimize the degree of precision with which a reward function must be specified while still allowing optimal or near-optimal policies to be produced. We first discuss how robust policies can be computed for MDPs given only partial reward information using the minimax regret criterion. We then demonstrate how regret can be reduced by efficiently eliciting reward information using bound queries, using regret-reduction as a means for choosing suitable queries. Empirical results demonstrate that regret-based reward elicitation offers an effective way to produce near-optimal policies without resorting to the precise specification of the entire reward function.
@inproceedings{uai09rb, author = {Kevin Regan and Craig Boutilier}, title = {Regret-based Reward Elicitation for Markov Decision Processes}, booktitle = {The 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009)}, year = {2009}}
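As a concrete picture of bound queries, the simulated loop below (a deliberate simplification) maintains an interval for each reward parameter, asks whether the parameter exceeds the interval's midpoint, and halves the interval according to the answer; the paper's regret-based query selection is stood in for by a crude widest-interval heuristic, and responses come from a hidden true reward.

import numpy as np

rng = np.random.default_rng(0)
true_r = rng.uniform(0.0, 1.0, size=4)          # hidden from the elicitor
lo, hi = np.zeros(4), np.ones(4)                # current reward intervals

for _ in range(12):
    i = int(np.argmax(hi - lo))                 # widest-interval heuristic
    b = (lo[i] + hi[i]) / 2.0                   # bound query: is r_i >= b?
    if true_r[i] >= b:
        lo[i] = b                               # "yes" raises the lower bound
    else:
        hi[i] = b                               # "no" lowers the upper bound

print(np.round(lo, 3), np.round(hi, 3))         # intervals now bracket true_r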
NIPS 08
Regret-based Reward Elicitation for Markov Decision Processes
Kevin Regan and Craig Boutilier
Workshop on Model Uncertainty and Risk in Reinforcement Learning held at the Conference on Neural Information Processing Systems
Traditional methods for finding optimal policies in stochastic, multi-step decision environments require a precise model of both the environment dynamics and the rewards associated with taking actions and the effects of those actions. While dynamics are often easily learnable through observation (and in many domains are stable across different users), reward functions are more problematic. In practice, it is often cognitively complex and time-consuming for users to precisely specify rewards. This work casts the problem of specifying rewards as one of preference elicitation. We will first discuss how robust policies can be computed for Markov decision processes given partial reward information using the minimax regret criterion. We will then show how regret can be reduced by efficiently eliciting reward information using bound queries. Regret-based elicitation of reward offers an efficient way to produce desirable policies without resorting to the precise specification of the entire reward function, and as such, opens up a new avenue for the design of stochastic controllers.
@inproceedings{nips08workshop, author = {Kevin Regan and Craig Boutilier}, title = {Regret-based Reward Elicitation for Markov Decision Processes}, booktitle = {NIPS-08 Workshop on Model Uncertainty and Risk in Reinforcement Learning}, year = {2008}}
ICML 06
An Analytic Solution to Discrete Bayesian Reinforcement Learning
Pascal Poupart and Nikos Vlassis and Jesse Hoey and Kevin Regan
The Twenty-Third International Conference on Machine Learning
Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an online fashion as they interact with their environment. Existing RL algorithms come short of achieving this goal because the amount of exploration required is often too costly and/or too time consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process. Our two main contributions are the analytical derivation that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parameterization.
@inproceedings{icml06, author = {Pascal Poupart and Nikos Vlassis and Jesse Hoey and Kevin Regan}, title = {An Analytic Solution to Discrete Bayesian Reinforcement Learning}, booktitle = {The Twenty-Third International Conference on Machine Learning (ICML 2006)}, year = {2006}}
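The paper's key representational result is that the optimal value function, viewed as a function of the unknown model parameters, is the upper envelope of a set of multivariate polynomials. A minimal sketch of that representation (my reconstruction, not the BEETLE code; the full algorithm performs point-based backups of these polynomials at sampled belief points):

import numpy as np

def poly_eval(poly, theta):
    """Evaluate one multivariate polynomial, stored as a list of
    (coefficient, exponents) monomials, at parameter vector theta."""
    return sum(c * np.prod(theta ** np.array(e)) for c, e in poly)

def value(polys, theta):
    """Upper envelope: the value is the best polynomial at this point."""
    return max(poly_eval(p, theta) for p in polys)

# Two toy polynomials over a 2-dimensional parameter vector.
polys = [
    [(1.0, (1, 0)), (0.5, (0, 2))],   # 1.0*t0 + 0.5*t1^2
    [(0.8, (0, 0)), (0.2, (1, 1))],   # 0.8 + 0.2*t0*t1
]
theta = np.array([0.3, 0.7])
print(value(polys, theta))            # 0.842: the second polynomial wins here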
Learning User Concepts
AAAI 10
Simultaneous Elicitation of Preference Features and Utility
Paolo Viappiani, Craig Boutilier and Kevin Regan
Twenty-Fourth AAAI Conference on Artificial Intelligence
ICML 09
Online Feature Elicitation in Interactive Optimization
Craig Boutilier, Kevin Regan and Paolo Viappiani
The Twenty-Sixth International Conference on Machine Learning
Most models of utility elicitation in decision support and interactive optimization assume a predefined set of "catalog" features over which user preferences are expressed. However, users may differ in the features over which they are most comfortable expressing their preferences. In this work we consider the problem of feature elicitation: a user's utility function is expressed using features whose definitions (in terms of "catalog" features) are unknown. We cast this as a problem of concept learning, but whose goal is to identify only enough about the concept to enable a good decision to be recommended. We describe computational procedures for identifying optimal alternatives w.r.t. minimax regret in the presence of concept uncertainty; and describe several heuristic query strategies that focus on reduction of relevant concept uncertainty.
@inproceedings{icml09brv, author = {Craig Boutilier and Kevin Regan and Paolo Viappiani}, title = {Online Feature Elicitation in Interactive Optimization}, booktitle = {The Twenty-Sixth International Conference on Machine Learning (ICML 2009)}, year = {2009}}
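To illustrate the decision problem, here is a small sketch (the items, concepts, and utility form are all invented): the subjective feature's definition is one of several boolean concepts over catalog features still consistent with the user's answers, an adversary picks the worst consistent concept, and we recommend the item minimizing that worst-case regret.

from itertools import product

items = list(product([0, 1], repeat=3))   # catalog-feature vectors
concepts = [                               # version space of feature definitions
    lambda x: x[0] and not x[2],
    lambda x: x[2] and not x[0],
]
BONUS = 2.0                                # utility bump when the concept holds

def utility(x, c):
    return sum(x) + (BONUS if c(x) else 0.0)

def max_regret(x):
    """Adversary picks the worst consistent concept and a better item."""
    return max(utility(y, c) - utility(x, c) for c in concepts for y in items)

best = min(items, key=max_regret)
print(best, max_regret(best))              # (1, 1, 1) 1.0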
RecSys 09
Preference Elicitation with Subjective Features
Craig Boutilier, Kevin Regan and Paolo Viappiani
The 3rd ACM Conference on Recommender Systems
Utility or preference elicitation is a critical component in many recommender and decision support systems. However, most frameworks for elicitation assume a predefined set of features (e.g., as derived from catalog descriptions) over which user preferences are expressed. Just as user preferences vary considerably, so too can the features over which they are most comfortable expressing these preferences. In this work, we consider preference elicitation in the presence of subjective or user-defined features. We treat the problem of learning a user’s feature definition as one of concept learning, but whose goal is to learn only enough about the concept definition to enable a good decision to be made. This is complicated by the fact that user preferences are unknown. We describe computational procedures for identifying optimal alternatives w.r.t. minimax regret in the presence of both utility and concept uncertainty; and develop several heuristic query strategies that focus on reduction of relevant concept and utility uncertainty. Computational experiments verify the efficacy of these strategies.
@inproceedings{recsys09brv, author = {Craig Boutilier and Kevin Regan and Paolo Viappiani}, title = {Preference Elicitation with Subjective Features}, booktitle = {The 3rd ACM Conference on Recommender Systems (RecSys 2009)}, year = {2009}}
Reputation & Trust
AAAI 06
Bayesian Reputation Modeling in E-Marketplaces Sensitive to Subjectivity, Deception and Change
Kevin Regan and Pascal Poupart and Robin Cohen
The Twenty-First National Conference on Artificial Intelligence
We present a model for buying agents in e-marketplaces to interpret evaluations of sellers provided by other buying agents, known as advisors. The interpretation of seller evaluations is complicated by the inherent subjectivity of each advisor, the possibility that advisors may deliberately provide misleading evaluations to deceive competitors and the dynamic nature of seller and advisor behaviors that may naturally change seller evaluations over time. Using a Bayesian approach, we demonstrate how to cope with subjectivity, deception and change in a principled way. More specifically, by modeling seller properties and advisor evaluation functions as dynamic random variables, buyers can progressively learn a probabilistic model that naturally and "correctly" calibrates the interpretation of seller evaluations without having to resort to heuristics to explicitly detect and filter/discount unreliable seller evaluations. Our model, called BLADE, is shown empirically to achieve lower mean error in the estimation of seller properties when compared to other models for reasoning about advisor ratings of sellers in electronic marketplaces.
@inproceedings{aaai06, author = {Kevin Regan and Pascal Poupart and Robin Cohen}, title = {Bayesian Reputation Modeling in E-Marketplaces Sensitive to Subjectivity, Deception and Change}, booktitle = {The Twenty-First National Conference on Artificial Intelligence (AAAI 2006)}, year = {2006}}
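A stripped-down sketch in the spirit of BLADE (the static setting and all numbers are my assumptions; BLADE additionally models seller and advisor behavior as dynamic variables): seller quality is a discrete latent variable, an advisor's evaluation function is modeled as P(rating | quality), and the buyer updates its belief over quality by Bayes' rule rather than filtering ratings heuristically.

import numpy as np

qualities = np.array([0.2, 0.5, 0.8])        # candidate seller quality levels
prior = np.full(3, 1.0 / 3.0)                # uniform prior over quality

# Hypothetical advisor evaluation function: rows = quality, cols = rating
# (0 = negative, 1 = positive). This advisor rarely rates positively.
advisor = np.array([[0.9, 0.1],
                    [0.6, 0.4],
                    [0.3, 0.7]])

posterior = prior.copy()
for rating in [1, 1, 0, 1]:                  # ratings reported by the advisor
    posterior *= advisor[:, rating]          # likelihood of this rating
    posterior /= posterior.sum()             # renormalize

print(np.round(posterior, 3))                # belief over seller quality
print(qualities @ posterior)                 # posterior mean quality estimate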
PST 05
The Advisor-POMDP: A Principled Approach to Trust through Reputation in Electronic Markets
Kevin Regan and Robin Cohen and Pascal Poupart
Conference on Privacy, Security and Trust
This paper examines approaches to representing uncertainty in reputation systems for electronic markets, with the aim of constructing a decision-theoretic framework for collecting information about selling agents and making purchase decisions in the context of a social reputation system. A selection of approaches to representing reputation using Dempster-Shafer theory and Bayesian probability is surveyed, and a model for collecting and using reputation is developed using a partially observable Markov decision process.
@inproceedings{pst05, author = {Kevin Regan and Robin Cohen and Pascal Poupart}, title = {The Advisor-POMDP: A Principled Approach to Trust through Reputation in Electronic Markets}, booktitle = {Conference on Privacy, Security and Trust (PST 2005)}, year = {2005}}
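The building block of such a model is the POMDP belief update b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). A generic sketch follows (the toy states, matrices, and numbers are mine, not the Advisor-POMDP's actual specification):

import numpy as np

def belief_update(b, T, O, a, o):
    """One Bayes step: predict with the transition model, correct with
    the observation likelihood, then renormalize."""
    predicted = b @ T[a]                  # sum_s T[a][s, s'] * b[s]
    corrected = predicted * O[a][:, o]    # weight by P(o | s', a)
    return corrected / corrected.sum()

# Toy instance: 2 seller states (trustworthy / not), one "query advisors"
# action, 2 observations (good / bad report).
T = {0: np.array([[0.95, 0.05], [0.10, 0.90]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]])}
b = np.array([0.5, 0.5])
b = belief_update(b, T, O, a=0, o=0)      # observe a good report
print(np.round(b, 3))                     # belief shifts toward trustworthy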
JBT 05
Designing Adaptive Buying Agents in Electronic Markets Using Advice from Other Buyers to Model Seller Reputation
Kevin Regan and Robin Cohen
The Journal of Business and Technology
In this paper, we present a model for designing buying agents in electronic marketplaces that adapt by adjusting decisions about which sellers to select for business, based on reputation ratings provided by other buying agents in the marketplace (known as indirect reputation information). The focus of our research is a method for effectively representing and employing this indirect reputation information. In particular, we address the case of buying agents providing deceptive information to other buyers, by having each buyer model not only the reputation of all sellers in the marketplace but also the reputation of each buyer. We also systematically account for differing standards between buyers, in assessing the reputation of sellers. Our approach is to avoid disreputable buyers when seeking advice and then to pass through three phases: adjusting for subjective bias, estimating seller reputability from reputable buyers only and eliminating any wild predictions. Overall, the model presented here builds on a strong foundation of how best to model seller reputation but allows for a suitably cautious integration of a social aspect to the reputation modeling, towards improved purchasing decisions for buyers.
@article{jbt05, author = {Kevin Regan and Robin Cohen}, title = {Designing Adaptive Buying Agents in Electronic Markets Using Advice from Other Buyers to Model Seller Reputation}, journal = {The Journal of Business and Technology}, volume = {1}, year = {2005}}
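A schematic of the three-phase pipeline described in the abstract, with every number and threshold invented purely for illustration:

import numpy as np

ratings = {"a1": 0.9, "a2": 0.4, "a3": 0.95}      # advisors' seller ratings
bias = {"a1": +0.1, "a2": -0.2, "a3": 0.0}        # learned subjective offsets
advisor_rep = {"a1": 0.8, "a2": 0.9, "a3": 0.2}   # reputability of advisors

# Phase 1: adjust for subjective bias; keep only reputable advisors.
adjusted = {a: ratings[a] - bias[a] for a in ratings if advisor_rep[a] >= 0.5}
# Phase 2: estimate seller reputability from those advisors' adjusted ratings.
vals = np.array(list(adjusted.values()))
# Phase 3: eliminate wild predictions (far from the median) before averaging.
kept = vals[np.abs(vals - np.median(vals)) <= 0.3]
print(kept.mean())                                 # estimated seller reputation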
DASUM 05
Sharing models of sellers amongst buying agents in electronic marketplaces
Kevin Regan and Robin Cohen and Thomas Tran
Decentralized Agent Based and Social Approaches to User Modelling workshop
In this paper, we demonstrate the value of decentralized models of sellers in electronic marketplaces, as the basis for purchasing decisions from buyers. We discuss how buying agents can model the reputation of sellers in order to make effective purchases and how these agents can also take advantage of reputation ratings provided by other buying agents in the marketplace, once it is established that each buyer will be independently modelling the sellers. We outline the methods required to make use of reputation ratings of sellers provided by other buyers, including adjustments for possibly different subjective scales and for possible deception in reporting the reputation ratings. In particular, we propose that buying agents model the reputation not only of sellers but of advisors, as part of the overall processing. We describe as well how buyers can most effectively model the set of possible buying advisors in order to apply selective processing of the information provided by these agents, in their purchasing decisions.
@inproceedings{dasum05, author = {Kevin Regan and Robin Cohen and Thomas Tran}, title = {Sharing models of sellers amongst buying agents in electronic marketplaces}, booktitle = {Decentralized Agent Based and Social Approaches to User Modelling workshop (DASUM 2005)}, year = {2005}}
BASEWEB 05
Indirect Reputation Assessment for Adaptive Buying Agents in Electronic Markets
Kevin Regan and Robin Cohen
Business Agents and the Semantic Web workshop
In this paper, we present a model for designing buying agents in electronic marketplaces that adapt by adjusting decisions about which sellers to select for business, based on reputation ratings provided by other buying agents in the marketplace (known as indirect reputation information). The focus of our research is a method for effectively representing and employing this indirect reputation information. In particular, we address the case of buying agents providing deceptive information to other buyers, by having each buyer model not only the reputation of all sellers in the marketplace but also the reputation of each buyer. We also systematically account for differing standards between buyers, in assessing the reputation of sellers. Overall, the model presented here builds on a strong foundation of how best to model seller reputation but allows for a suitably cautious integration of a social aspect to the reputation modelling, towards improved purchasing decisions for buyers.
@inproceedings{baseweb05, author = {Kevin Regan and Robin Cohen}, title = {Indirect Reputation Assessment for Adaptive Buying Agents in Electronic Markets}, booktitle = {Business Agents and the Semantic Web workshop (BASEWEB 2005)}, year = {2005}}