As a trivial example of the non-existence of mutually consistent individual WTP amounts when social influences are important, consider a society of two people with the following preferences for funding a proposed project:
Individual 1 is willing to pay up to the maximum WTP amount that anyone else pays (in this case, just individual 2), so that no one can accuse him of failing to pay his fair share. If no one else pays anything, then individual 1 is willing to pay $100.
Individual 2 is willing to pay what he considers his fair share, namely, the total social benefit of the project (which he defines as the sum of WTPs from everyone else – in this case, just individual 1 – divided by the number of people in society, in this case, 2).
With these preferences, there is no well-defined set of individual WTP amounts. Letting A denote the WTP for individual 1 and B the WTP for individual 2, there is no pair of WTP amounts, ($A, $B), satisfying the individual preference conditions that A = B for B > 0, else A = 100; and B = A/2.
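The non-existence claim can be checked mechanically. The following is a minimal sketch, assuming (purely for illustration) that WTP amounts are restricted to whole cents between $0 and $200. The function names are hypothetical. It applies individual 1's rule to each candidate B and then asks whether individual 2's rule returns the same B.

```python
def wtp_1(b):
    """Individual 1 matches the other's WTP, or pays $100 if the other pays nothing."""
    return b if b > 0 else 100.0


def wtp_2(a):
    """Individual 2 pays the others' total WTP divided by the population size (2)."""
    return a / 2.0


def consistent_pairs(max_cents=20000):
    """Search a grid of candidate B values; the cent grid is an illustrative assumption."""
    pairs = []
    for cents in range(max_cents + 1):
        b = cents / 100.0
        a = wtp_1(b)                      # A implied by individual 1's rule
        if abs(wtp_2(a) - b) < 1e-9:      # does individual 2's rule give back the same B?
            pairs.append((a, b))
    return pairs


print(consistent_pairs())
```

The search comes back empty for the same reason the algebra fails: if B > 0, then A = B and B = A/2 together force A = B = 0, contradicting B > 0; if B = 0, then A = 100, which requires B = 50, contradicting B = 0.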
Multiple Decision Biases Contribute to Learning Aversion
The network of decision biases in Figure 12.1 shows a prominent role for learning aversion, meaning reluctance to seek or use information that might change a decision for the better. The term “learning aversion” (Louis, 2009) is not widely used in decision science. However, we believe it is central to understanding how to avoid premature action and to improve the practice and outcomes of BCA. For example, Table 12.2 summarizes ten well-documented “decision traps,” or barriers to effective decision-making by individuals and organizations, discussed in a popular book (Russo and Schoemaker, 1989). Most of these traps involve failing to take sufficient care to collect, think about, appropriately use, and deliberately learn from relevant information that could improve decisions. Not keeping track of decision results (number 9), failing to make good use of feedback from the real world (number 8), failing to collect relevant information because of overconfidence in one’s own judgment (number 4), and trusting too much in the most readily available ideas and information (number 5) are prominent examples of failure to learn effectively from experience. Although most of the examples in the Decision Traps book (Russo and Schoemaker, 1989) are drawn from the world of business, the same failings are pervasive in applied risk analysis, policy analysis, and BCA. For example, in the Dublin coal-burning ban example previously considered, the original researchers failed to collect relevant information (what happened to mortality rates outside the ban area over the same period?), while expressing great confidence in their own judgment that the correct interpretation of the data was obvious (“could not be more clear”) (Harvard School of Public Health, 2002).
Table 12.2. Ten Decision Traps (from Russo and Schoemaker, 1989)
1) Plunging In – Beginning to gather information and reach conclusions without first taking a few minutes to think about the crux of the issue you’re facing or to think through how you believe decisions like this one should be made.
2) Frame Blindness – Setting out to solve the wrong problem because you have created a mental framework for your decision, with little thought, that causes you to overlook the best options or lose sight of important objectives.
3) Lack of Frame Control – Failing to consciously define the problem in more ways than one or being unduly influenced by the frames of others.
4) Overconfidence in Your Judgment – Failing to collect key factual information because you are too sure of your assumptions and opinions.
5) Shortsighted Shortcuts – Relying inappropriately on “rules of thumb” such as implicitly trusting the most readily available information or anchoring too much on convenient facts.
6) Shooting From The Hip – Believing you can keep straight in your head all the information you’ve discovered, and therefore “winging it” rather than following a systematic procedure when making the final choice.
7) Group Failure – Assuming that with many smart people involved, good choices will follow automatically, and therefore failing to manage the group decision-making process.
8) Fooling Yourself About Feedback – Failing to interpret the evidence from past outcomes for what it really says, either because you are protecting your ego or because you are tricked by hindsight.
9) Not Keeping Track – Assuming that experience will make its lessons available automatically, and therefore failing to keep systematic records to track the results of your decisions and failing to analyze these results in ways that reveal their key lessons.
10) Failure to Audit Your Decision Process – Failing to create an organized approach to understanding your own decision-making, so you remain constantly exposed to all the above mistakes.
Figure 12.1 suggests that such learning aversion is not only a product of overconfidence (which, in turn, might reflect a predilection to consider only information and interpretations that support the views with which one is already endowed, to avoid the loss of those comfortable views and the negative affect associated with such a loss). Hyperbolic discounting and ambiguity aversion are also shown as contributors to learning aversion. Hyperbolic discounting implies that the immediate costs of learning (e.g., costs of having to collect new information that might disconfirm current beliefs, and costs of having to update current beliefs and decision rules that depend on them) may overwhelm (at present) the potential future benefits of being able to make better decisions based on the new information – even if, in retrospect, the potential (but delayed) benefits would be judged much larger than the costs of learning. Ambiguity aversion, as axiomatized by Gilboa and Schmeidler (1989) and others (Maccheroni et al., 2006), implies that a decision-maker will sometimes refuse free information that could improve decisions (Al-Najjar and Weinstein, 2009). For example, in principle, an ambiguity-averse decision-maker might refuse sufficiently informative, free genetic information that is highly relevant for decisions on lifestyle, healthcare planning, and insurance purchasing (Hoy et al., 2014). Empirically, fuller disclosure of scientific uncertainties to women facing cancer treatment choices does not necessarily improve the quality of their decisions (by any measure evaluated), but does significantly reduce their subsequent (post-decision) satisfaction with the decisions that are eventually made (Politi et al., 2011).
BCA facilitates learning-averse decision-making. Its golden rule is to choose the action (from among those being evaluated) that maximizes the expected discounted net benefit. There is no requirement that expected values must be calculated from adequate information, or that more information collection must continue until some optimality condition is satisfied before a final BCA comparison of alternatives is made. In this respect, BCA differs from other normative frameworks, including decision analysis with explicit value-of-information calculations, and optimal statistical decision models (such as the Sequential Probability Ratio Test) with explicit optimal stopping rules and decision boundaries for determining when to stop collecting information and take action. Since learning-averse individuals (Hoy et al., 2014) and organizations (Russo and Schoemaker, 1989) typically do not collect enough information (as judged in hindsight) before acting, prescriptive disciplines should explicitly encourage optimizing information collection and learning as a prelude to evaluating, comparing, and choosing among final decision alternatives (Russo and Schoemaker, 1989). Helping users to overcome learning aversion is therefore a potentially valuable direction for improving the current practice of BCA.
In a collective choice context, learning aversion may be strategically rational if discovering more information about the probable consequences of alternative choices could disrupt a coalition’s agreement on what to do next (Louis, 2009). But collective learning aversion may also arise because of free-rider problems or other gaps between private and public interests.
Example: Information externalities and learning aversion in clinical trials
In clinical trials, a well-known dilemma arises when each individual seeks his or her own self-interest, i.e., the treatment that is expected to be best for his or her own specific case, given presently available information. If everyone uses the same treatment, then the opportunity to learn about potentially better (but possibly worse) treatments may never be taken. Given a choice between a conventional treatment that gives a 51% survival probability with certainty and a new, experimental treatment that is equally likely to give an 80% survival probability or a 20% survival probability, and that will give the same survival probability (whichever it is) to all future patients, each patient might elect the conventional treatment (since 51% > 0.5*0.2 + 0.5*0.8 = 50%). But then it is never discovered whether the new treatment is in fact better. The patient population continues to endure an individual survival probability of 51% for every case, when an 80% survival probability might well be available (with probability 50%). The same remains true even if there are many possible treatment alternatives, so that the probability that at least some of them are better than the current incumbent approaches 100%. Ethical discussions of the principle of clinical equipoise (should a physician prescribe an experimental treatment when there is uncertainty about whether it performs better than a conventional alternative, especially when opinions are divided?) recognize that refusal to experiment with new treatments (possibly due to ambiguity aversion) in each individual case imposes a costly burden from failure to learn on the patient population as a whole, and on each member of it when he or she must choose among options whose benefits have not yet been well studied (Gelfand, 2013).
The principle that maximizing expected benefit in each individual case can needlessly reduce the expected benefit for the entire population is of direct relevance to BCA, as discussed further in the next example.
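The arithmetic of the clinical-trial example can be laid out as a short calculation. This is a minimal sketch using only the survival probabilities stated in the example. The "informed" benchmark is an added assumption: it supposes that the new treatment's true survival probability has somehow been learned, and it ignores the outcomes of the patients whose participation produced that knowledge.

```python
P_CONVENTIONAL = 0.51   # survival probability of the conventional treatment (known)
P_NEW_IF_GOOD = 0.80    # new treatment, favorable case
P_NEW_IF_BAD = 0.20     # new treatment, unfavorable case
PRIOR_GOOD = 0.50       # the two cases are equally likely

# Myopic choice: each patient compares expected survival under current information.
expected_new = PRIOR_GOOD * P_NEW_IF_GOOD + (1 - PRIOR_GOOD) * P_NEW_IF_BAD
myopic_choice = "conventional" if P_CONVENTIONAL > expected_new else "new"

# Assumed benchmark: the uncertainty has been resolved, so each later patient
# receives whichever treatment is actually better.
informed_survival = (PRIOR_GOOD * max(P_NEW_IF_GOOD, P_CONVENTIONAL)
                     + (1 - PRIOR_GOOD) * max(P_NEW_IF_BAD, P_CONVENTIONAL))
gain_per_future_patient = informed_survival - P_CONVENTIONAL

print(f"expected survival, new treatment:     {expected_new:.3f}")
print(f"myopic choice:                        {myopic_choice}")
print(f"expected survival after learning:     {informed_survival:.3f}")
print(f"expected gain per future patient:     {gain_per_future_patient:.3f}")
```

Under these assumptions, the expected survival probability available to future patients after learning is 0.5 × 0.80 + 0.5 × 0.51 = 65.5%, compared with 51% if no one ever tries the new treatment. The gap is the information externality that individually optimal choices leave unclaimed.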
Example: Desirable interventions with uncertain benefits become undesirable when they are scaled up
Many people who would be willing to pay $1 for a 50-50 chance to gain $3 or nothing (expected net value of $1.50 expected benefit - $1 cost = $0.50) might baulk at paying $100,000 for a 50-50 chance to gain $300,000 or nothing. Indeed, for risk-averse decision-makers, scaling up a favorable prospect with uncertain benefits by multiplying both costs and benefits by a large enough factor can make the prospect unacceptable. (As an example, for a decision-maker with an exponential utility function evaluating a prospect with normally distributed benefits having mean M and variance V, the certainty equivalent of n copies of the prospect, where all n of them share a common uncertainty and the same outcome, has the form CE = nM − kn²V, where k reflects the decision-maker's subjective (absolute) risk aversion. Since the first term grows linearly and the second term grows quadratically with the scaling factor n, the certainty equivalent is negative for sufficiently large n, namely for n > M/(kV).) Now consider a local ordinance, such as a ban on coal-burning, that has uncertain health benefits and known implementation costs, such that its certainty equivalent is assessed as positive for a single county. If the same ban is now scaled up to n counties, so that the same known costs and uncertain benefits are replicated n times, then the certainty equivalent will be negative for sufficiently large n. A bet worth taking on a small scale is not worth taking when the stakes are scaled up too many times. Yet top-down regulations that apply the same action (with uncertain benefits) to dozens, hundreds, or thousands of counties or individuals simultaneously, based on an assessment that CE > 0 for each one, imply that essentially the same bet is being made many times, so that the total social CE will be negative if the number of counties or individuals is sufficiently large.
This effect of correlated uncertainties in reducing the net benefits of regulations with uncertain benefits that are widely applied is omitted from BCA calculations that only consider expected values.
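The scaling argument can be made concrete with a short calculation. The sketch below uses the certainty-equivalent formula from the example for perfectly correlated copies. For contrast, it also evaluates the standard result for statistically independent copies, where variances add and the penalty grows only linearly in n. That contrast is an addition for illustration, and the parameter values are hypothetical.

```python
def certainty_equivalent(n, mean, variance, k, correlated=True):
    """Certainty equivalent of n copies under exponential utility and normal benefits.

    correlated=True : all copies share one outcome, total variance n^2 * V.
    correlated=False: copies are independent, total variance n * V (added contrast).
    """
    total_variance = (n ** 2 if correlated else n) * variance
    return n * mean - k * total_variance


def break_even_scale(mean, variance, k):
    """Scale n* = M / (k * V) beyond which the correlated CE is negative."""
    return mean / (k * variance)


# Hypothetical values chosen only to illustrate the shape of the result.
M, V, K = 1.0, 4.0, 0.01

for n in (1, 10, 25, 50, 100):
    ce_corr = certainty_equivalent(n, M, V, K, correlated=True)
    ce_ind = certainty_equivalent(n, M, V, K, correlated=False)
    print(f"n={n:4d}  correlated CE={ce_corr:8.2f}  independent CE={ce_ind:8.2f}")

print("correlated CE turns negative beyond n =", break_even_scale(M, V, K))
```

With these illustrative numbers, a single copy is attractive, the correlated certainty equivalent falls back to zero at n = M/(kV), and it is negative beyond that, while the independent case stays positive at every scale because M > kV.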
The decision biases network in Figure 12.1 has a potentially surprising implication: Real people typically over-estimate highly uncertain benefits and under-estimate highly uncertain costs, and hence are willing to pay too much, for projects (or other proposed changes) with unknown or highly uncertain benefits and/or costs. Intuitively, one might expect exactly the reverse: that ambiguity aversion would reduce the perceived values or net benefits of such projects. But in fact, ambiguity aversion (and other drivers of learning aversion) mainly cuts off the information collection and analyses needed for careful evaluation, comparison, and selection of alternatives, leading to premature and needlessly risky decisions (see Table 12.2). Then, overconfidence and optimism bias take over (Figure 12.1). From the perspective of obtaining desirable outcomes, members of most decision-making groups spend too much time and effort convincing each other that their decisions are sound, and increasing their own confidence that they have chosen well. They spend too little effort seeking and using potentially disconfirming information that could lead to a decision with more desirable outcomes (Russo and Schoemaker, 1989). Moreover, in assessing the likely future outcomes of investments in risky projects, individuals and groups typically do not focus on the worst plausible scenario (e.g., the worst-case probability distribution for completion times of future activities), as theoretical models of ambiguity aversion suggest (Gilboa and Schmeidler, 1989). To the contrary, they tend to assign low subjective probabilities to pessimistic scenarios, and to base plans and expectations on most-favorable, or nearly most-favorable, scenarios (e.g., Newby-Clark et al., 2000).
This tendency toward overly optimistic assessment of both uncertain benefits (too high) and uncertain costs or delays (too low) has been well documented in discussions of optimism bias (and corollaries such as the planning fallacy). For example, it has repeatedly been found that investigators consistently over-estimate the benefits (treatment effects) to be expected from new drugs undergoing randomized clinical trials (e.g., Djulbegovic et al., 2011; Gan et al., 2012); conversely, most people consistently underestimate the time and effort needed to complete complex tasks or projects, such as new drug development (Newby-Clark et al., 2000). These psychological biases are abetted by statistical methods and practices that routinely produce an excess of false positives, incorrectly concluding that interventions have desired or expected effects that, in fact, they do not have, and that cannot later be reproduced (Nuzzo, 2014; Sarewitz, 2012; Lehrer, 2012; Ioannidis, 2005). Simple Bayesian calculations suggest that more than 30% of studies with reported P values of ≤ 0.05 may in fact be reporting false positives (Goodman, 1991). Indeed, tolerance for, and even encouragement of, a high risk of false-positive findings (in order to reduce risk of false negatives and to continue to investigate initially interesting hypotheses) has long been part of the culture of much of epidemiology and public health investigations supposed to be in the public interest (e.g., Rothman, 1990).
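One simple version of such a Bayesian calculation is sketched below. It is not a reproduction of the cited analysis. It treats "P ≤ 0.05" as a yes/no significance event and asks, by Bayes' rule, what share of significant findings are false, given an assumed prior probability that a tested effect is real and an assumed statistical power. Both inputs are hypothetical.

```python
def false_positive_share(prior_true, power, alpha=0.05):
    """Share of statistically significant findings that are false positives.

    prior_true: assumed prior probability that a tested effect is real
    power:      assumed probability of a significant result when the effect is real
    alpha:      probability of a significant result when there is no effect
    """
    true_positives = prior_true * power
    false_positives = (1.0 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)


# Hypothetical priors; power of 0.8 is a conventional design target, assumed here.
for prior in (0.5, 0.2, 0.1):
    share = false_positive_share(prior, power=0.8)
    print(f"prior={prior:.1f}  false-positive share={share:.3f}")
```

Under these assumptions the share of false positives depends strongly on the prior: when only one tested hypothesis in ten is true, more than a third of significant findings are false even with 80% power, and lower power pushes the share higher still.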
The bottom of Figure 12.1 suggests that learning aversion and several related decision biases contribute to a willingness to take costly actions with highly uncertain benefits and/or costs. Other prominent decision biases that favor such willingness to bet on a positive outcome under substantial uncertainty include the following:
Overconfidence in subjective judgments when objective facts or probabilities are not available (Russo and Schoemaker, 1992);
Sunk-cost effect (propensity to throw good money after bad, or escalating commitment to an uncertain project as past investment increases, in preference to stopping and acknowledging failure and the need to move on) (Navarro and Fantino, 2005); and
Optimism bias (e.g., underestimating the probable effort, cost, success probability, or uncertainty to complete a complex undertaking; and overestimating the probable benefits of doing so).
These biases favor premature decisions to pay to achieve uncertain benefits, even in situations where free or inexpensive additional investigation would show that the benefits are in fact almost certainly much less than the costs.
Example: Overconfident Estimation of Health Benefits from Clean Air Regulations
Overconfidence and confirmation biases can be encoded in the modeling assumptions and analytic procedures used to develop estimates of costs and benefits for BCA comparisons. For example, the U.S. EPA (2011a and b) estimated that reducing fine particulate matter (PM2.5) air pollution in the United States has created close to 2 trillion dollars per year of health benefits, mainly from reduced elderly mortality rates. This is vastly greater than the approximately 65 billion dollars per year that EPA estimates for compliance costs, leading the agency to conclude that “The extent to which estimated benefits exceed estimated costs and an in-depth analysis of uncertainties indicate that it is extremely unlikely the costs of 1990 Clean Air Act Amendment programs would exceed their benefits under any reasonable combination of alternative assumptions or methods identified during this study” (emphasis in original). However, the benefits calculation used a quantitative approach to uncertainty analysis based on a Weibull distribution (assessed using expert guesses) for the reduction in mortality rates per unit reduction in PM2.5. The Weibull distribution is a continuous probability distribution that is only defined over non-negative values. Thus, the quantitative uncertainty analysis implicitly assumes a 100% certainty that reducing PM2.5 does in fact cause reductions in mortality rates (the Weibull distribution puts 100% of the probability mass on positive values), in direct proportion to reductions in PM2.5 pollutant levels, even though EPA’s qualitative uncertainty analysis states (correctly) that such a causal relation has not been established.
An alternative uncertainty analysis that assigns a positive probability to each of several discrete uncertainties suggests that “EPA's evaluation of health benefits is unrealistically high, by a factor that could well exceed 1000, and that it is therefore very likely that the costs of the 1990 CAAA [Clean Air Act Amendment] exceed its benefits, plausibly by more than 50-fold. The reasoning involves re-examining specific uncertainties (including model uncertainty, toxicological uncertainty, confounder uncertainty, and uncertainty about what actually affects the timing of death in people) that were acknowledged qualitatively, but whose discrete contributions to uncertainty in health benefits were not quantified, in EPA's cost-benefit analysis” (Cox, 2011). If this analysis is even approximately correct, then EPA’s highly confident conclusion results from an uncertainty analysis that disregarded key sources of uncertainty. It implicitly encodes (via the choice of a Weibull uncertainty distribution) overconfidence and confirmation biases that may have substantially inflated estimated benefits from Clean Air Act regulations by assuming, similar to the Dublin coal ban analysis, that reducing PM2.5 concentrations causes reductions in mortality rates, while downplaying (by setting its subjectively assessed probability to zero) the possibility that this fundamental assumption might be wrong.
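The structural point about the Weibull distribution can be illustrated numerically. The sketch below does not use any of EPA's or Cox's actual inputs: the Weibull parameters and the probability `p_causal` that the assumed causal relation holds are hypothetical placeholders. It shows that Weibull draws are never negative, and that allowing a discrete probability of no causal effect scales the expected benefit down in proportion.

```python
import math
import random


def weibull_mean(scale, shape):
    """Mean of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def mixture_mean(p_causal, scale, shape):
    """Mean effect when the effect is exactly zero with probability 1 - p_causal."""
    return p_causal * weibull_mean(scale, shape)


# Hypothetical parameters in arbitrary units (not EPA's values).
SCALE, SHAPE = 1.0, 1.5
rng = random.Random(0)
samples = [rng.weibullvariate(SCALE, SHAPE) for _ in range(10_000)]

print("smallest sampled effect:", min(samples))          # never below zero
print("mean if causation is certain:", weibull_mean(SCALE, SHAPE))
for p_causal in (1.0, 0.5, 0.1):                          # hypothetical probabilities
    print(f"p_causal={p_causal:.1f}  expected effect={mixture_mean(p_causal, SCALE, SHAPE):.3f}")
```

A Weibull-only analysis corresponds to fixing `p_causal` at 1. Any assessment that assigns the causal assumption a probability below 1 lowers the expected benefit by the same factor, before other discrete uncertainties are considered.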
In the political realm, the costs of regulations (or of projects or other proposed expensive changes) can also be made more palatable to decision-makers by a variety of devices, long known to marketers and politicians and increasingly familiar to behavioral economists, that exploit the decision biases in Figure 12.1 (Poundstone, 2010). Among these are the following: postponing costs by even a little (to exploit hyperbolic discounting, since paying now provokes an adverse reaction that paying even slightly later does not); emphasizing annual costs instead of larger total costs; building in an annual rate increase (so that increases become viewed as part of the status quo, and hence acceptable without further scrutiny); paying from unspecified, obscure, or general funds (e.g., general revenues) rather than from specific accounts (so that trade-offs, opportunity costs and outgoing payments are less salient); adding comparisons to alternatives that no one would want, to make the recommended one seem more acceptable; creating a single decision point for committing to a stream of expenses, rather than instituting multiple review and decision points (e.g., a single yes/no decision, with a limited time window of opportunity, on whether to enact a costly regulation that will last for years, rather than a contingent decision for temporary funding with frequent reviews to ask whether it has now served its purpose and should be discontinued); considering each funding decision in isolation (so that each proposal can be evaluated based on its affect when viewed outside the context of competing uses to which the funds could be put); framing the cost as protecting an endowment, entitlement, or option (i.e., as paying to avoid losing a benefit, rather than as paying to gain it); and comparing expenditures to those of others (e.g., to how much the EU or Japan is spending on something said to be similar).
These and related techniques are widely used in marketing and advertising, as well as by business leaders and politicians seeking to “sell” programs to the public (Gardner, 2009). They are highly effective in persuading consumers to spend money that, in retrospect, they might feel would have been better spent on something else (Ariely, 2009; Poundstone, 2010).
Doing Better: Using Predictable Rational Regret to Improve BCA
Figure 12.1 and the preceding discussion suggest that a variety of decision biases can lead to both individual and collective decision processes that place too little value on collecting relevant information, rely too heavily on uninformed or under-informed judgments (which tend to be over-optimistic and over-confident), and hence systematically over-value prospects with uncertain costs and benefits, creating excessive willingness to gamble on them. One result is predictable disappointment: consistent over-investment in uncertain and costly prospects that, predictably, will be seen in retrospect to have (on average) cost more and delivered less than expected. A second possible adverse outcome is predictable regret: investing limited resources in prospects with uncertain net benefits when, predictably, it will be clear in hindsight that the resources could have been better spent on something else. Standard BCA facilitates these tendencies by encouraging use of current expected values to make choices among alternatives, instead of emphasizing more complex, but potentially less costly (on average), optimal sequential strategies that require waiting, monitoring, and inaction until conditions and information justify costly interventions (Stokey, 2009 for economic investment decisions; Methany et al. 2011 for hospital operations). This section considers how to do better, and what “better” means.
A long-standing tradition in decision analysis and normative theories of rational decision-making complements the principle of maximizing expected utility with various versions of minimizing expected rational regret (e.g., Loomes and Sugden, 1982; Bell, 1985). Formulations of rational regret typically represent it as a measure of the difference between the reward (e.g., net benefit, in a BCA context) that one’s decision actually achieved and the greatest reward that could have been achieved had one made a different (feasible) decision instead (Hart, 2005; Hazan and Kale, 2007). Adjusting decision rules to reduce rational regret plays a crucial role in current machine-learning algorithms, as well as in neurobiological studies of human and animal learning, adaptation, and decision-making, within the general framework of computational reinforcement learning (e.g., Li and Daw, 2011; Schönberg et al., 2007). (By contrast, related concepts such as elation or disappointment (Delquié and Cillo, 1986) reflect differences between expected or predicted rewards and those actually received. They do not necessarily attribute the difference to one’s own decisions, or provide an opportunity to learn how to make more effective decisions.)
Intuitively, instead of prescribing that current decisions should attempt to maximize prospective expected reward (or expected discounted net benefits), rational regret-based theories prescribe that they should be made so that, even in hindsight, one has no reason to change the decision process to increase average rewards. In effect, instead of the advice “Choose the alternative with the greatest expected value or utility,” normative theories of regret give the advice “Think about how, in retrospect, you would want to make decisions in these situations, so that no change in the decision procedure would improve the resulting distribution of outcomes. Then, make decisions that way.” In this context, a no-regret rule (Chang, 2007) is one that, even in retrospect, one would not wish to modify before using again, since no feasible modification would lead to a preferred distribution of future consequences. Equivalently, if various options are available for modifying decision rules to try to improve the frequency distribution of rewards that they generate, then a no-regret rule is one that cannot be improved upon: it is a fixed point of the decision rule-improvement process (Hazan and Kale, 2007). (These concepts apply to what we are calling rational regrets, i.e., to regrets about not making decisions that would have improved reward distributions.)
Example: Rational vs. Irrational Regret
Suppose that a decision maker’s reward (or “payoff,” in the game-theoretic terminology often used) is determined by her choice of an act, A or B, together with a random state of nature (e.g., the outcome of one toss of a fair die, with faces 1-6 being equally likely), revealed only after the choice has been made. Possible payoffs range between 1 and 6.1 dollars, as described by the following table.
State: 1 2 3 4 5 6
Act A: 1 2 3 4 5 6.1
Act B: 2 3 4 5 6 1
Expected utility theory unequivocally prescribes choosing act A (since its probability distribution of rewards stochastically dominates that of act B, as 6.1 > 6), even though act B yields a higher payoff than A 5/6 of the time. Minimizing rational regret also prescribes choosing act A, since any decision rule that prescribes choosing B in this situation (always or sometimes) will yield a payoff frequency distribution that is inferior to (stochastically dominated by) the payoff distribution from always choosing act A. In this simple case, choosing act A and then observing that choosing B would have yielded a higher reward provides no reason for a rational decision-maker to deviate from the optimal strategy of always choosing act A. Thus, minimizing rational regret recommends A, not B.
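The claims in this example can be verified directly from the payoff table. The sketch below uses exact fractions to avoid rounding, and the helper names are hypothetical. It checks how often B beats A state by state, compares the expected payoffs, and tests first-order stochastic dominance by comparing the sorted payoffs of the two acts, which is a valid test here because the six states are equally likely.

```python
from fractions import Fraction

PAYOFF_A = [1, 2, 3, 4, 5, Fraction(61, 10)]   # act A in states 1..6
PAYOFF_B = [2, 3, 4, 5, 6, 1]                  # act B in states 1..6


def mean(payoffs):
    """Expected payoff when all states are equally likely."""
    return sum(Fraction(p) for p in payoffs) / len(payoffs)


def dominates(x, y):
    """First-order stochastic dominance of x over y for equally likely outcomes."""
    xs, ys = sorted(x), sorted(y)
    return all(a >= b for a, b in zip(xs, ys)) and any(a > b for a, b in zip(xs, ys))


# Fraction of states in which B pays more than A in that same state.
share_b_wins = Fraction(sum(b > a for a, b in zip(PAYOFF_A, PAYOFF_B)), len(PAYOFF_A))

print("B beats A in a share of states equal to", share_b_wins)
print("mean payoff of A:", mean(PAYOFF_A), " mean payoff of B:", mean(PAYOFF_B))
print("A dominates B:", dominates(PAYOFF_A, PAYOFF_B))
print("B dominates A:", dominates(PAYOFF_B, PAYOFF_A))
```

The state-by-state comparison and the distributional comparison point in opposite directions, which is the distinction the example draws: B wins in most individual states, yet the distribution of payoffs from A is at least as good at every rank and strictly better at the top.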
Other concepts of regret and regret-avoidance are linked to personality psychology. These include making decisions with low potential for regret to avoid damaging already low self-esteem (Josephs et al., 1992), as well as preferring to avoid learning outcomes in order to avoid possible regrets. From a biological perspective, it has been proposed that the emotion of regret, when used as an error signal to adaptively modify decision rules in individual decision-making, is a “rational emotion” that helps us to learn and adapt decision-making effectively to uncertain and changing environments (e.g., Bourgeois-Gironde, 2010). Although these psychological and biological aspects of regret are important for some kinds of decision-making under risk, it is primarily the concepts of rational regret just discussed that we believe are most useful for improving the practice of BCA. The rest of this section explains how.
Does the shift in perspective from maximizing prospective expected net benefits to minimizing expected retrospective regret make any practical difference in what actions are recommended? Not for homo economicus. For an ideally rational SEU decision-maker, the principle of maximizing SEU, while optimally taking into account future plans (contingent on future events) and the value of information, is already a no-regret rule. But for real decision-makers (whether individuals or groups) who are not able to formulate trustworthy, crisp, agreed-to probabilities for the consequences of each choice, the shift to minimizing regret has several powerful practical advantages over trying to maximize expected net benefits. Among these are the following:
Encourage prospective hindsight analysis. A very practical aid for reducing over-confidence and optimism bias is for decision-makers to imagine that a contemplated project or investment ends badly, and then to figure out what could have caused this and how it might have been prevented. Such “prospective hindsight” or “premortem” exercises have been used successfully in business to help curb under-estimation of costs and over-estimation of benefits when both are highly uncertain (Russo and Schoemaker, 1989). In the domain of regulatory benefit-cost analysis, they prompt questions such as: Suppose that, twenty years from now, we rigorously assess the health benefits and economic costs actually achieved by extending Clean Air Act amendments, and find that the costs were on the order of a trillion dollars (EPA, 2011), but that the projected benefits of reduced mortality rates caused by cleaner air never materialized. How might this have happened? Could it have been discovered sooner? What might we do now or soon to prevent such an outcome? When such questions are asked on a small scale, as in the Dublin coal-ban example, they lead to simple answers, such as to use a control group (people outside the affected area) to determine whether the bans actually produced their predicted effects (HEI, 2013). On a national level a similar openness to the possibility of errors in projections, and vigilance in frequently testing uncertain assumptions against data as the effects of expensive regulations become manifest, might likewise be used to anticipate and prevent the BCA failure scenarios imagined in premortem exercises. 
In the U.S., for example, learning from the experiences of cities, counties, or states (such as California) that are early adopters of policies or initiatives that are later proposed for national implementation provides opportunities to check assumptions against data relatively early, and to modify or optimally slow-roll (Stokey, 2009) the implementation of national-level policies as needed to reduce expected regret.
Increase feedback and learning. Items 8-10 in Table 12.2 describe failures to learn from real-world feedback based on the gaps between what was expected and what actually occurred, or between what was achieved and what could have been achieved by better decisions (if this is known). Formal models of how to adaptively modify decision processes or decision rules to reduce regret – for example, by selecting actions next time a situation is encountered in a Markov decision process, or in a game against nature (with an unpredictable, possibly adversarial, environment) using probabilities that reflect cumulative regret for not having used each action in such situations in the past – require explicitly collecting and analyzing such data (Robards and Sunehag, 2011; Hazan and Kale, 2007). Less formally, continually assessing the outcomes of decisions and how one might have done better, as required by the regret-minimization framework, means that opportunities to learn from experience will more often be exploited instead of missed.
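The idea of choosing actions with probabilities that reflect cumulative regret can be sketched in a few lines. The code below is a regret-matching-style rule applied to the die-toss payoff table from the earlier example. It is an illustration rather than the specific algorithm of any cited paper. It assumes full feedback (after each toss, the payoff of both acts is observed) and, to keep the bookkeeping deterministic given the tosses, it scores each round by the expected payoff of the mixed strategy rather than by a sampled act.

```python
import random

PAYOFFS = {"A": [1, 2, 3, 4, 5, 6.1], "B": [2, 3, 4, 5, 6, 1]}
ACTIONS = list(PAYOFFS)


def regret_matching(rounds, seed=0):
    """Play `rounds` die tosses, weighting each act by its positive cumulative regret."""
    rng = random.Random(seed)
    cum_regret = {a: 0.0 for a in ACTIONS}   # regret for not always having played a
    fixed_totals = {a: 0.0 for a in ACTIONS} # what each fixed act would have earned
    earned = 0.0                             # expected payoff of the strategies used
    probs = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
    for _ in range(rounds):
        positive = {a: max(r, 0.0) for a, r in cum_regret.items()}
        norm = sum(positive.values())
        if norm > 0:
            probs = {a: positive[a] / norm for a in ACTIONS}
        else:
            probs = {a: 1.0 / len(ACTIONS) for a in ACTIONS}  # no regrets yet: uniform
        state = rng.randrange(6)                              # fair die, states 0..5
        payoff = {a: PAYOFFS[a][state] for a in ACTIONS}
        expected = sum(probs[a] * payoff[a] for a in ACTIONS)
        for a in ACTIONS:
            cum_regret[a] += payoff[a] - expected
            fixed_totals[a] += payoff[a]
        earned += expected
    average_regret = (max(fixed_totals.values()) - earned) / rounds
    return probs, average_regret


final_probs, avg_regret = regret_matching(20_000)
print("final mixing probabilities:", final_probs)
print("average regret per round:", avg_regret)
```

The rule requires exactly the record-keeping that the text emphasizes: the payoff actually obtained and the payoff each alternative would have produced must be tracked every round. In this full-feedback setting the average regret relative to the best fixed act shrinks toward zero as the number of rounds grows.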
Increase experimentation and adaptation. An obvious possible limitation of regret-minimization is that one may not know what would have happened if different decisions had been made, or what probabilities of different outcomes would have been induced by different choices (Jaksch et al., 2010). This is the case when relevant probabilities are unknown or ambiguous. It can arise in practice when no states or counties have been early (or late) adopters of a proposed national policy, and so there is no comparison group to reveal what would have happened had it not been adopted. In this case, formal models of regret reduction typically require exploring different decision rules to find out what works best. Such learning strategies (called “on-policy” learning algorithms, since they learn only from experience with the policy actually used, rather than from information about what would have happened if something different had been tried) have been extensively developed and applied successfully to regret reduction in machine learning and game theory (Chang, 2007; Yu et al., 2009; Robards and Sunehag, 2011). They adaptively weed out the policies that are followed by the least desirable consequences, and increase the selection probabilities for policies that are followed by preferred consequences. Many formal models of regret-minimization and no-regret learning strategies (e.g., Chang, 2007; Jaksch et al., 2010 for Markov decision processes) have investigated how best to balance exploration of new decision rules and exploitation of the best ones discovered so far. Under a broad range of conditions, such adaptive selection (via increased probabilities of re-use) of the decision rules that work best empirically soon leads to discovery and adoption of optimal or near-optimal (“no-regret”) decision rules (i.e., maximizing average rewards) (Chang, 2007; Robards and Sunehag, 2011; Hazan and Kale, 2007).
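The balance between exploration and exploitation can be illustrated with a deliberately simple rule. The sketch below uses an epsilon-greedy strategy, a textbook stand-in chosen for brevity and not one of the algorithms in the papers cited above. It learns only from the outcomes of the policies actually tried. The function name, the success probabilities, and the value of `epsilon` are hypothetical.

```python
import random


def epsilon_greedy(success_probs, n_trials, epsilon=0.1, seed=0):
    """Repeatedly choose among candidate policies whose success
    probabilities are unknown to the learner.

    With probability epsilon (or while some policy is still untried),
    explore a policy chosen at random; otherwise exploit the policy with
    the best observed average reward so far.
    Returns the trial counts, the estimated mean rewards, and the average
    regret relative to always using the truly best policy.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    counts = [0] * n
    means = [0.0] * n
    total_reward = 0.0

    for _ in range(n_trials):
        if 0 in counts or rng.random() < epsilon:
            choice = rng.randrange(n)                        # explore
        else:
            choice = max(range(n), key=lambda a: means[a])   # exploit
        # Only the outcome of the chosen policy is observed.
        reward = 1.0 if rng.random() < success_probs[choice] else 0.0
        counts[choice] += 1
        means[choice] += (reward - means[choice]) / counts[choice]
        total_reward += reward

    avg_regret = max(success_probs) - total_reward / n_trials
    return counts, means, avg_regret


# Hypothetical example: three candidate policies with unknown success rates.
trial_counts, estimates, regret = epsilon_greedy([0.2, 0.5, 0.8], n_trials=5000)
print(trial_counts, estimates, regret)
```

In runs of this kind, most trials typically end up allocated to the policy with the best observed results, while a small fraction continues to test the alternatives.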
Of course, translating these mathematical insights from the simplified world of formal decision models (e.g., Markov decision processes with initially unknown transition and reward probabilities and costs of experimentation) to the real world requires caution. But the basic principle that the policies that will truly maximize average net benefits per period (or discounted net benefits, in other formulations) may initially be unknown, and that they should then be discovered via well-designed and closely analyzed trials, has powerful implications for the practice of BCA and policy making. It emphasizes the desirability of conducting, and carefully learning from, pilot programs and trial evaluations (or natural experiments, where available) before rolling out large-scale implementations of regulations or other changes having highly uncertain costs or benefits. In effect, the risk of failure or substantially sub-optimal performance from programs whose assumptions and expectations about costs and benefits turn out to be incorrect can be reduced by small-scale trial-and-error learning, making it unnecessary to gamble that recommendations based on BCA using current information will turn out to coincide with those that will be preferred in hindsight, after key uncertainties are resolved.
Asymptotic optimization of decision rules with initially unknown probabilities for consequences. In formal mathematical models of no-regret reinforcement learning with initially unknown environments and reward probabilities, swift convergence of the prescriptions from empirical regret-minimization algorithms to approximately optimal policies holds even if the underlying process tying decisions to outcome probabilities is unknown or slowly changing (Yu et al., 2009). This makes regret-minimization especially relevant and useful in real-world applications with unknown or uncertain probabilities for the consequences of alternative actions. It also provides a constructive approach for avoiding the fundamental limitations of collective choice mechanisms that require combining the subjective probabilities (or expected values) of different participants in order to make a collective choice (Hylland and Zeckhauser 1979; Nehring, 2007). Instead of trying to reconcile or combine discrepant probability estimates, no-regret learning encourages collecting additional information that will clarify which among competing alternative policies work best. Again, the most important lesson from the formal models is that adaptively modifying policies (i.e., decision rules) to reduce empirical estimates of regret based on multiple small trials can dramatically improve the final choice of policies and the final results produced (e.g., average rewards per period, or discounted net benefits actually achieved). From this perspective, recommending any policy based on analysis and comparison of its expected costs and benefits to those of feasible alternatives will often be inferior to recommending a process of trials and learning to discover what works best. No-regret learning (Chang, 2007) formalizes this intuition.
In summary, adjusting decision processes to reduce empirical estimates of regret, based on actual outcomes following alternative decisions, can lead to much better average rewards or discounted net benefits than other approaches. Real-world examples abound of small-scale trial and error leading to successful adaptation in highly uncertain business, military, and policy environments (e.g., Harford, 2011).
Conclusions
This chapter has argued that a foundational principle of traditional BCA, choosing among proposed alternatives to maximize the expected net present value of net benefits, is not well suited to guide public policy choices when costs or benefits are highly uncertain. In principle, even if preferences are aligned (so that familiar collective choice paradoxes and impossibility results caused by very different individual preferences do not arise) – for example, even if all participants share a common goal of reducing mortality risks – there is no way (barring such extremes as dictatorship) to aggregate sufficiently diverse probabilistic beliefs to avoid selecting outcomes that no one favors (Hylland and Zeckhauser 1979; Nehring, 2007). BCA does not overcome such fundamental limitations in any formulation that requires combining probability estimates from multiple participants to arrive at a collective choice among competing alternatives – including using such probabilities to estimate which alternative has the greatest expected net benefit.
In practice, a variety of well-known decision biases conspire to make subjectively assessed expected value calculations and WTP estimates untrustworthy, with highly uncertain benefits often tending to be over-estimated, and highly uncertain costs tending to be under-estimated. Biases that contribute to unreliable expected net benefit and WTP estimates range from the affect heuristic, which we view as fundamental, to optimism, over-confidence, and confirmation biases, ambiguity aversion, and finally to what we have called learning aversion (Figure 12.1). As a result of this network of biases, it is predictable that projects and proposals with highly uncertain costs and benefits will tend to be over-valued, leading to potentially regrettable decisions, meaning decisions that, in retrospect, and upon rational review, one would want to have made differently. Similar results have been demonstrated for groups and for individuals (Russo and Schoemaker, 1989). The net result is a proclivity to gamble on excessively risky proposals when the benefits and costs are highly uncertain.
To help overcome these difficulties, we have proposed shifting to a different foundation for BCA calculations and procedures: minimizing rational regret. Regret minimization principles have been developed in decision analysis (e.g., Loomes and Sugden, 1982; Bell, 1985) and, more extensively, in more recent machine learning, game theory, and neurobiological models of reinforcement learning (Hart, 2005; Chang, 2007; Hazan and Kale, 2007; Li and Daw, 2011; Schönberg et al., 2007). Although the idealized mathematical models and analyses of these fields are not necessarily directly applicable to real-world BCA settings, they do suggest several practical principles that have proved valuable in improving real-world individual and collective decisions when potential costs and benefits are uncertain enough that the best course of action (given clarity on goals) is not clear. In particular, we propose that BCA under such conditions of high uncertainty can be improved by greater use of prospective hindsight (or “premortem”) analyses to reduce decision biases; explicit data collection and careful retrospective evaluation and comparison of what was actually achieved to what was expected, and to what could have been achieved by different choices (when this can be determined); and deliberate learning and adaptation of decision rules based on the results of multiple small-scale trials in settings for which this is practical. Not all of these principles are applicable in all BCA situations, of course. Whether to build a bridge in a certain location cannot be decided by multiple small-scale trials, for example.
But for many important health, safety, and environmental regulations with substantial costs and substantial uncertainty about benefits, learning from experiences on smaller scales (e.g., from the changes in mortality rates following different histories of pollution reductions in different counties) can powerfully inform and improve BCA analyses that are intended to guide larger-scale (e.g., national) policy-making. The main proposed shift in emphasis is from guessing what will work best (in the sense of maximizing the expected NPV of net benefits, as assessed by experts or other participants in the decision-making process), and then perhaps betting national policies on the answer, to discovering empirically what works best, when it is practical to do so and when the answer is initially highly uncertain.
REFERENCES
Al-Najjar NI, Weinstein J. 2009. The ambiguity aversion literature: A critical assessment. Economics and Philosophy 25 (Special Issue 03): 249–284.
Ariely, D. 2009. Predictably Irrational: The Hidden Forces that Shape Our Decisions. Revised and Expanded Edition. HarperCollins. New York, New York.
Armstrong K, Schwartz JS, Fitzgerald G, Putt M, Ubel PA. 2002. Effect of framing as gain versus loss on understanding and hypothetical treatment choices: survival and mortality curves. Med Decis Making. Jan-Feb;22(1):76-83.
Bell DE. 1985. Putting a premium on regret. Management Science 31(1): 117–120. doi:10.1287/mnsc.31.1.117
Bennett R, Blaney RJP. 2002. Social consensus, moral intensity and willingness to pay to address a farm animal welfare issue. Journal of Economic Psychology 23(4): 501–520.
Bourgeois-Gironde S. 2010. Regret and the rationality of choices. Philos Trans R Soc Lond B Biol Sci. Jan 27;365(1538):249-57. doi: 10.1098/rstb.2009.0163.
Casey JT, Delquie P. 1995. Stated vs. implicit willingness to pay under risk. Organizational behavior and human decision processes. Feb. 61(2): 123-137.
Champ PA, Bishop RC. 2006. Is willingness to pay for a public good sensitive to the elicitation format? Land Economics 82(2): 162–173.
Chang YC. 2007. No regrets about no-regret. Artificial Intelligence 171:434–439
Clancy L, Goodman P, Sinclair H, Dockery DW. 2002. Effect of air-pollution control on death rates in Dublin, Ireland: An intervention study. Lancet. Oct 19;360(9341):1210-4.
Cox LA Jr. 2012. Reassessing the human health benefits from cleaner air. Risk Analysis. May;32(5):816-29.
Dean M, Ortoleva P. 2012. Estimating the relationship between economic preferences: A testing ground for unified theories of behavior. Working Paper, Department of Economics. Brown University. Providence, RI. http://www.econ.brown.edu/fac/Mark_Dean/papers.shtml. Last Retrieved 1 February 2014.