Philippe Jehiel, Jakub Steiner, Selective Sampling with Information-Storage Constraints, The Economic Journal, Volume 130, Issue 630, August 2020, Pages 1753–1781, https://doi.org/10.1093/ej/uez068
Abstract
A memoryless agent can acquire arbitrarily many signals. After each signal observation, she either terminates and chooses an action, or she discards her observation and draws a new signal. By conditioning the probability of termination on the information collected, she controls the correlation between the payoff state and her terminal action. We provide an optimality condition for the emerging stochastic choice. The condition highlights the benefits of selective memory applied to the extracted signals. Implications—obtained in simple examples—include (i) confirmation bias, (ii) speed-accuracy complementarity, (iii) overweighting of rare events, and (iv) salience effect.
Economic agents often acquire information about the state of the economy before making their decisions. The information is typically modelled as a signal that helps the agent refine the distribution of the state and improve decision-making. Often, signals arrive over time and agents can absorb only a small number of them. We capture this information-processing friction by assuming that agents receive as many signals as they wish but can remember only a finite number of them when making their choices. In the simplest setting we analyse, the agent can remember only one signal. The key strategic variable we consider is the agent's option to ignore some signals with positive probability and restart the signal-extraction process. We allow agents to employ an arbitrary stationary decision process that specifies, for each possible signal realisation, the probability with which the agent restarts the process as well as the chosen action in case of termination. We impose no time constraints or costs in the basic formulation, so that the friction comes solely from the agent's limited information-storage capacity.
We ask ourselves: should the agent optimally make her choice as soon as she receives the first signal, whatever its realisation, or could she be better off rerunning the very same information-acquisition process? Can hesitation—selective repetition of a fixed stochastic decision procedure—be welfare-enhancing?
A general insight is that selective rerunning of the primitive decision procedure is typically optimal. To document this most generally, we provide a simple necessary condition satisfied by the optimal rerunning strategy. The result is an interim indifference condition imposed on the agent who has concluded her decision-making with a plan to choose a particular action. Given the recommended action, the agent's posterior expected payoff from implementing this action must be the same as the posterior expected payoff from rerunning the whole decision-making process—the entire sequence of selective repetitions of the primitive signal extraction—and implementing whichever action the second run recommends. We refer to this as the second-thought-free condition.
For illustration, consider a binary decision of whether to make an investment of a fixed size. The agent receives payoff 1 if she invests in the good state of the economy, payoff −1 if she invests in the bad state, and payoff 0 when she does not invest, whatever the state. One of the two states is a priori more likely; for the sake of concreteness, let the prior probability of the good state be 2/3. Both states give rise to a population of good and bad signals, with the share of good signals at 90% in the good state and 10% in the bad state. The agent may draw several signal realisations in sequence but remembers only the last one when making her investment choice. As follows from simple optimisation considerations, assume she invests if and only if the last remembered signal was good. Observe that immediate termination upon the first signal does not generate a second-thought-free choice rule: an agent whose first observed signal was bad prefers to rerun the decision process, since the new run will either lead to not investing again or will produce a signal realisation that conflicts with the first observation and will lead to investing. Since, conditional on two conflicting signals, the a priori more common state is more likely, investing is preferred in this contingency. The agent benefits from having second thoughts when the first observed signal is surprising.
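The Bayesian arithmetic behind this example can be checked directly. The following Python sketch (an illustration, not part of the formal analysis) computes the posterior after one good and one bad signal and the resulting expected payoff of investing:

```python
# Posterior in the introductory investment example: prior 2/3 on the good
# state, good-signal share 90% in the good state and 10% in the bad state.

prior_good = 2 / 3
p_good_signal = {"good": 0.9, "bad": 0.1}  # P(good signal | state)

# Likelihood of observing one good and one bad signal in each state; with a
# symmetric 90/10 experiment these coincide (0.09 in both states), so the
# posterior after two conflicting signals equals the prior.
lik = {s: p_good_signal[s] * (1 - p_good_signal[s]) for s in ("good", "bad")}

num = prior_good * lik["good"]
post_good = num / (num + (1 - prior_good) * lik["bad"])
print(post_good)  # 2/3: posterior = prior

# Expected payoff of investing at this posterior: 2/3*1 + 1/3*(-1) = 1/3 > 0,
# so investing is indeed preferred after two conflicting signals.
print(post_good * 1 + (1 - post_good) * (-1))
```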
We interpret the probability of terminating the decision process after receiving a particular signal as a search intensity targeting that signal. A higher probability of termination at a given information set inflates the likelihood that the agent makes the terminal choice at that set. We show that the failure of the second-thought-free condition under uniform search intensity in the above investment example indicates that, relative to uniform search, the agent benefits from decreasing the search intensity for the bad signal. More generally, the second-thought-free condition follows from the first-order condition imposed on the optimal search intensities.
The model provides microfoundations to a range of behavioural stylised facts. The unifying principle of our behavioural insights is the intuition that the agent targets her search towards the type of evidence that would provide her with more valuable posteriors under the uniform search. This principle generates confirmation bias in the context of the above example, since evidence that confirms the agent's prior leads to a more informed posterior than does evidence that contradicts the prior. An optimally targeted information search also generates speed-accuracy complementarity in the same setting; that is, the accuracy of choice declines with the response time. The effect is generated by the confirmation bias: the agent encountering evidence contradicting her prior is likely to disregard the evidence and to have a second thought. Hence, long response times indicate a surprising state of the world, and the constrained-optimal choice rule commits errors in the surprising state relatively often. Overweighting of rare events occurs in a related setting in which the agent's task is to form a probability belief about an event that is known to be rare, such as a flight accident, by observing a random flight outcome. Since observing a flight accident is far more informative about the probability of future accidents than observing an uneventful flight, the agent optimally biases her search towards eventful flights. In the last behavioural application, we show in a setting with multiple states that distinct states of the world are salient in the sense that they attract the agent's attention (i.e., trigger higher termination rates in our framework). The effect arises because an indistinct perception stimulus that can be generated by several similar states is less informative than a distinct perception stimulus that is most likely generated by a specific distinct state. Hence, the optimal information search targets stimuli indicating distinct states.
Our leading interpretation of the model is in terms of a single person with information-storage limitations but a perfect ability to adjust her termination strategies optimally as a function of what she remembers. These adjustments can be viewed as the result of a successful trial-and-error process or of evolutionary pressures, in which case the adjustments may not be fine-tuned to each specific problem.1 Alternatively, one may think of the termination strategies and the final decisions as being chosen by different persons, with only the one in charge of the final decision subject to information-storage limitations, thereby allowing the termination strategies to be optimally determined.
There is a wide range of studies that propose different ways of modelling optimisation over information structures. Relative to rational-inattention models (Sims, 2003), we provide a procedural microfoundation of our set of feasible information structures (which must be obtained through variations of the termination strategies and as such can straightforwardly be related to the time dimension of the decision process). Perhaps surprisingly, our memoryless agent shares some of the flexibility in her choice of information structures with the sequential-sampling model of Wald (1945), in which there is perfect information aggregation. But our approach allows for a simple derivation of the speed-accuracy complementarity, which is less immediate to obtain with sequential-sampling models (see, however, Fudenberg et al., 2018).
Relative to studies based on finite automata (Hellman and Cover, 1970; Wilson, 2014; Basu and Chatterjee, 2015), our approach yields a simple necessary condition for the optimal choice rule, the second-thought-free condition. This condition arises because our agent chooses the probability of termination at each of her information sets. Such a termination optimisation is absent from the related models with finite automata in which the termination is exogenous (or the objective involves asymptotic performance as time diverges). The second-thought-free condition allows us to characterise the optimal choice rule in the binary settings.
This article belongs to a growing economic literature that explains behavioural stylised facts as the constrained-optimal behaviour of decision-makers facing information processing frictions. For instance, Robson (2001), Rayo and Becker (2007), Netzer (2009), and Khaw et al. (2017) provide microfoundations for risk attitudes; Gabaix and Laibson (2017) endogenise discounting; and Wilson (2014), Compte and Postlewaite (2012), and Leung (2020) establish constrained-optimal ignorance of weakly informative news.2
1. Model
An agent faces a decision under uncertainty. She chooses an action a ∈ A and receives a payoff u(a, θ) in the fixed payoff state θ ∈ Θ drawn from an interior prior π ∈ Δ(Θ). The sets Θ and A are finite. The agent chooses a Blackwell experiment p, where p is a family of conditional signal distributions p(x∣ θ) that depend on θ ∈ Θ. The experiment generates a signal realisation x from a finite space X. The conditional signal distributions are fully mixed: p(x∣ θ) > 0 for all x, θ. We allow the agent to choose among possibly several such experiments and we let |$\mathcal {P}$| denote the exogenous set of experiments from which she chooses. We impose no restrictions on |$\mathcal {P}$| (other than the full-support of each p).
The agent can repeat the selected experiment arbitrarily many times, but she is unable to aggregate the information across the repetitions. Each run of the experiment is a cognition that exhausts the agent’s capacity dedicated to the problem being solved. Once the agent hits the constraint at the end of the experiment, she can continue only after she unclogs her capacity by amnesia.
We model this as follows. The agent can condition the repetition of the experiment on the last observed signal realisation. She chooses a vector β = (βx)x∈X ∈ B = [0, 1]|X|∖{(0, …, 0)} of termination probabilities βx for each signal realisation x; we call β a termination strategy. The agent runs the experiment p for the first time, receives signal realisation x(1) with probability p(x(1) ∣ θ) and terminates the reasoning with probability |$\beta _{x^{(1)}}$|. She restarts with the complementary probability |$1-\beta _{x^{(1)}}$|, receives a signal realisation x(2) from a new run of the process p with probability p(x(2) ∣ θ), terminates with probability |$\beta _{x^{(2)}}$| or restarts with probability |$1-\beta _{x^{(2)}}$|, and continues until she terminates after a random number of repetitions of p; see Figure 1. When the agent chooses distinct βx for different x, she implements the familiar idea of selective memory: some facts and observations are easily forgotten whereas others are remembered and trigger choice. After the agent terminates the reasoning with a terminal signal realisation x, she selects an action a = σ(x) according to an action strategy |$\sigma :X\longrightarrow A$|.3 Let S be the set of all mappings from X to A.
By excluding the termination strategy (0, …, 0) we force the agent to make a decision a ∈ A. Since β ≠ (0, …, 0) and each feasible experiment p generates all signal values with a positive probability in each state, the decision process almost surely eventually terminates.
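As an informal illustration of this decision process, the following Python sketch (with hypothetical binary parameters, not taken from the formal analysis) simulates the restart dynamics. It confirms that the process terminates almost surely and that the terminal-signal frequencies are tilted towards the signal with the higher termination probability:

```python
import random

def run_process(p, beta, theta, rng):
    """One run of the selective-sampling process in state theta:
    draw signals from p(.|theta) until termination; return (signal, rounds)."""
    t = 0
    while True:
        t += 1
        x = 1 if rng.random() < p[theta] else 0  # p[theta] = P(x=1 | theta)
        if rng.random() < beta[x]:               # terminate with prob beta_x
            return x, t

rng = random.Random(0)
p = {1: 0.9, 0: 0.1}   # binary experiment: P(x=1 | theta)
beta = [0.5, 1.0]      # beta[x]: always keep signal 1, discard half of signal 0

# Estimate the terminal-signal distribution in state theta = 1 by simulation;
# the frequency of x = 1 is inflated relative to the one-shot probability 0.9.
draws = [run_process(p, beta, 1, rng) for _ in range(100_000)]
freq_x1 = sum(x for x, _ in draws) / len(draws)
print(freq_x1)  # close to 0.9*1.0 / (0.9*1.0 + 0.1*0.5) ≈ 0.947
```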
The optimisation in (3) can be an outcome of selective pressures that favour successful decision procedures via cultural or biological evolution, or via competition among firms differing in their internal procedures. There are no costs to delaying the decision in our model, but incorporating such costs would not affect our qualitative insights provided they are not too large. We address agents with less severe memory constraints and with exponential time-discounting preferences in Section 6.
2. Optimal Cognition Biases
We now derive a necessary optimality condition that the choice rule solving the repeated-cognition problem must satisfy. Generically, the condition requires the agent to engage in selective information processing—that is, to ignore some signals more often than others.
2.1. Second-Thought-Free Choice Rules
We start with a definition of second-thought-free choice rules. If the agent’s decision process generates such a rule, then she has no incentive to rerun the process regardless of the action recommendation with which the process terminates. Our main result below states that the optimal rule is second-thought-free.
Let r be a generic stochastic choice rule that specifies conditional probabilities r(a ∣ θ) of each action a ∈ A in each state θ ∈ Θ.
The definition requires the agent who terminates with an action plan a to weakly prefer a to forgetting a and choosing whichever action a new run of the decision process will recommend. Although the definition allows for the strict preference against having a second thought, the next lemma shows that if a choice rule is second-thought-free, then the agent is indifferent between terminating and the second thought.
We refer to (5) as the second-thought-free condition.
2.2. Optimality Condition
We provide here a general necessary optimality condition imposed on the stochastic choice rule.
If a choice rule solves the repeated-cognition problem (3), then it is second-thought-free and satisfies (5).
To understand the statement, consider the optimal choice rule r* generated by a process that consists of a random number of repetitions of a primitive cognition p. Once these repetitions of p terminate with a signal realisation x and the agent is about to take an action a = σ(x), then, according to the proposition, she must be indifferent between a, and running the process associated with r* from scratch, where the new run of r* would involve new repetitions of p.
To prove Proposition 1, we introduce an effective experiment s(p, β). While the primitive experiment p(x ∣ θ) specifies the probability that its one run results in signal x, we define s(x∣ θ; p, β) to be the probability that selective repetitions of p according to the termination strategy β terminate with x. Relative to p(x ∣ θ), the effective probabilities s(x ∣ θ; p, β) are inflated for those x at which the agent terminates with a high probability βx.
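For a concrete closed form (our illustration, obtained by summing the geometric restart series): |$s(x\mid \theta ;p,\beta )=\beta _{x}p(x\mid \theta )/\sum _{x^{\prime }}\beta _{x^{\prime }}p(x^{\prime }\mid \theta )$|. The Python sketch below, with illustrative parameters, computes this effective distribution and checks that it is invariant to a rescaling of β:

```python
def effective(p_theta, beta):
    """Effective terminal-signal distribution s(. | theta; p, beta):
    the geometric restart series sums to
    s(x) = beta_x * p(x|theta) / sum_x' beta_x' * p(x'|theta)."""
    weights = [b * q for b, q in zip(beta, p_theta)]
    total = sum(weights)
    return [w / total for w in weights]

p_theta = [0.1, 0.9]   # p(x | theta) for x = 0, 1 in some state theta
beta = [0.5, 1.0]      # termination probabilities beta_0, beta_1

s = effective(p_theta, beta)
print(s)  # [0.05/0.95, 0.9/0.95]: termination inflates the x = 1 probability

# Homogeneity of degree zero in beta: scaling beta leaves s unchanged.
s_scaled = effective(p_theta, [0.25, 0.5])
assert all(abs(a - b) < 1e-12 for a, b in zip(s, s_scaled))
```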
The lemma implies that s(p, β) and hence also r(p, β, σ) are homogeneous of degree zero with respect to β. Thus, since we abstract from the delay costs, for any optimal termination strategy β*, αβ* for α ∈ (0, 1) is optimal too, and it generates the same optimal choice rule r* as β*. This multiplicity of implementation of the optimal choice rule would disappear in a natural approximation of our model with exponential discount factor approaching 1. Such approximation would select the quickest available decision process that implements the optimal feasible rule r*; that is, it would impose that maxx∈Xβx = 1.
Since the objective function in (7) is homogeneous of degree zero with respect to β, we can restrict β to the simplex over the signal set X. This simplex is compact and the objective function in (7) is continuous in β and the p(x ∣ θ); hence the repeated-cognition problem has a solution whenever the set of primitive experiments |$\mathcal {P}$| is compact.
2.2.1. Comment
Our agent can be viewed as having imperfect recall in the sense of Piccione and Rubinstein (1997). Our approach corresponds to their ex ante approach, and the insight of Proposition 1 can be related to the observation in their absent-minded driver example that the ex ante optimal solution is also a (modified) multi-self equilibrium in which the decision problem is viewed as a team composed of multiple selves all sharing the decision-maker’s objective.
3. Analytical Solution of the Binary Setting
The action and state sets are binary: A = Θ = {0, 1}. To avoid a trivial case, we assume that neither action is dominant. Then, without loss of generality, u(a, θ) = uθ > 0 if a = θ and u(a, θ) = 0 otherwise. State θ is drawn from an interior prior π ∈ Δ(Θ). The exogenous set |$\mathcal {P}$| of the feasible statistical experiments is finite, and each |$p\in \mathcal {P}$| delivers a signal x from a finite signal space X with probability p(x ∣ θ). The agent chooses |$p\in \mathcal {P}$|, the termination strategy β = (βx)x∈X and action strategy |$\sigma :X\longrightarrow A$| to maximise the expected payoff.
The first result states that there exists a solution in which the agent ignores all but two signal realisations of the chosen experiment p. That is, she always repeats the experiment upon encountering all but two signals. Roughly, the result follows because it is advantageous to consider only the most informative signal realisations.4
There exists a solution in which the termination probability βx is positive for at most two signal values x ∈ X.
See Appendix for the proofs omitted in the main text.
Based on the lemma, we can, without loss of generality, restrict the signal space X to be binary, and identify it with the action and state spaces, X = A = Θ. Again without loss of generality, we choose signal labels in each experiment in such a way that each experiment |$p\in \mathcal {P}$| satisfies the monotone likelihood ratio property: p(1 ∣ θ)/p(0 ∣ θ) increases in θ. We continue to assume that p(x∣ θ) > 0 for all x and θ.
|$\mathcal {R}_{p,\sigma _I}=\lbrace r:r(1\mid 1)r(0\mid 0)=d_p r(1\mid 0)r(0\mid 1)\rbrace$|.
That is, a rule r can be constructed from p if and only if it preserves the perceptual distance: |$\frac{r(1\,\mid \,1)r(0\,\mid \,0)}{r(0\,\mid \,1)r(1\,\mid \,0)}=d_{p}$| (or if it always selects the same action). By controlling the termination strategy β, the agent trades off the likelihoods r(0 ∣ 0; p, β, σI) and r(1 ∣ 1; p, β, σI) of the correct choice in states 0 and 1, respectively. See Figure 2. The set |$\mathcal {R}_{p,\sigma _{I}}$| of rules accessible from p is compact.
Thanks to the chosen labelling of the signals, the agent can equate her choice to the observed signal without a loss:
For any rule r(p, β, σ) there exists β′ such that the rule r(p, β′, σI) achieves at least as high an expected payoff as r(p, β, σ), where σI is the identity function.
The solution to the repeated-cognition problem in the binary setting exists since the objective is continuous in the choice rule and the agent optimises on the compact set |$\bigcup _{p\in \mathcal {P}}\mathcal {R}_{p,\sigma _I}$| of the rules.
Let |$\overline{p}$| be the experiment with the maximal perceptual distance: |$\overline{p}\in \arg \max _{p\in \mathcal {P}}d_{p}$|, and let |$\overline{d} =\max _{p\in \mathcal {P}}d_{p}$|. In line with the intuition that the agent should go for the most informative experiment, we establish:
There exists a solution to the repeated-cognition problem in which the agent employs the experiment|$\overline{p}$|with the maximal perceptual distance.
The last lemma implies that all details of the set |$\mathcal {P}$| relevant for the solution are summarised in the one-dimensional statistic |$\overline{d}$| that is independent of the payoff function u.5
We are now ready to solve the binary setting. The optimal effective choice rule |$r^*(a\mid \theta )=r(a\mid \theta ;\overline{p}, \beta ^*,\sigma _I)$| consists of four unknown probabilities and it is determined by four conditions: the second-thought-free condition (5), the feasibility condition from Lemma 4, and two normalisation conditions. Let parameter |$R=\frac{\pi _1u_1}{\pi _0u_0}$| measure the relative a priori attractiveness of action 1.
- When |$R\ge \overline{d}$|, then the agent always chooses action 1;
- when |$R\le 1/\overline{d}$|, then the agent always chooses action 0;
- when |$R\in (1/\overline{d},\overline{d})$|, then the agent chooses both actions with positive probabilities and(8)$$\begin{eqnarray} r^{\ast }(1\mid 1)=\frac{\overline{d}R-\sqrt{\overline{d}R}}{(\overline{d} -1)R}\mbox{, }r^{\ast }(0\mid 0)=\frac{\overline{d}-\sqrt{\overline{d}R}}{ \overline{d}-1}, \end{eqnarray}$$(9)$$\begin{eqnarray} \frac{\beta _{1}^{\ast }}{\beta _{0}^{\ast }}=\frac{\overline{d}R-\sqrt{ \overline{d}R}}{\sqrt{\overline{d}R}-R}\frac{\overline{p}(0\mid 1)}{\overline{p}(1\mid 1)}. \end{eqnarray}$$
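As a numerical sanity check (with illustrative values of |$\overline{d}$| and R of our choosing), the interior solution (8) can be verified to satisfy the feasibility constraint r(1 ∣ 1)r(0 ∣ 0) = d r(1 ∣ 0)r(0 ∣ 1) from Lemma 4:

```python
import math

def optimal_rule(d, R):
    """Interior solution (8) for R in (1/d, d): probabilities of the
    correct choice in states 1 and 0, respectively."""
    r11 = (d * R - math.sqrt(d * R)) / ((d - 1) * R)
    r00 = (d - math.sqrt(d * R)) / (d - 1)
    return r11, r00

d, R = 81.0, 2.0   # e.g. a symmetric 90/10 experiment (d = 81), R = 2
r11, r00 = optimal_rule(d, R)
print(r11, r00)

# Feasibility (perceptual-distance constraint):
# r(1|1) r(0|0) must equal d * r(1|0) * r(0|1), with r(1|0) = 1 - r(0|0)
# and r(0|1) = 1 - r(1|1).
lhs = r11 * r00
rhs = d * (1 - r00) * (1 - r11)
assert abs(lhs - rhs) < 1e-9
```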
When the ex ante attractiveness of one of the actions is too strong relative to the perceptual distance of the two states, then the agent always chooses the ex ante attractive action. The decision process is non-trivial for intermediate incentives: the agent engages in repeated cognition and she chooses both actions with positive probabilities.
4. Behavioural Applications
This section presents three behavioural effects illustrated in the binary setting from Section 3: confirmation bias, speed-accuracy complementarity, and overweighting of rare events.
4.1. Confirmation Bias
Psychologists and economists distinguish at least three mechanisms leading to confirmation bias: (i) People search for evidence selectively, targeting the evidence type in accord with their priors, e.g., Nickerson (1998), (ii) they selectively memorise and recall the data supporting their priors, e.g., Oswald and Grosjean (2004), and (iii) they selectively interpret ambiguous evidence, e.g., Rabin and Schrag (1999) and Fryer et al. (2018). We focus on the first two mechanisms and interpret them in light of our optimal repeated-cognition result.
When action |$1$| is a priori more attractive, |$R \in (1, d)$|, and the unique primitive binary experiment is symmetric, |$p(1 \mid 1) = p(0 \mid 0) > 1/2$|, then the agent searches relatively more intensively for signal value |$1$|: |$\beta ^*_1\gt \beta ^*_0$|.
To see the connection to confirmation bias, consider, as in our introductory example, an agent whose task is to announce the realised state of the world: she receives reward u1 = u0 = 1 if she makes the correct announcement and 0 otherwise. The agent finds state θ = 1 a priori more likely than state 0, π1 > π0. Consider the decision process that terminates immediately after the first run of the experiment and chooses the action equal to the observed signal value: β0 = β1 = 1, σ = σI. To establish that such an unbiased process is suboptimal, it suffices to show that it does not satisfy the second-thought-free condition. To see this, we examine the agent who has received the a priori unlikely signal 0 and argue that she benefits from a second thought. Such a surprised agent is better off restarting instead of terminating with action 0: if the new run of the process concludes with signal 0 again, then the second thought will have been inconsequential; if, however, the new run concludes with signal and action 1, then the induced switch from action 0 to 1 is beneficial. This is because when the experiment p is symmetric, then, conditional on two conflicting signals, the a priori more common state 1 is relatively more likely. The agent benefits from a second thought whenever she receives the surprising recommendation, and thus deviates from uniform search in favour of the a priori likely signal value 1.
The optimal strategy resembles the natural process in which selective memory gives rise to confirmation bias. Consider the fastest optimal strategy, letting |$\beta _{1}^{\ast }=1$|. When the agent observes signal 1, which confirms her prior belief, she terminates and immediately announces state 1. But if she is surprised, observing signal 0, which contradicts her prior, then she discards the signal with positive probability |$\beta _{0}^{\ast }$| and repeats the experiment. Although finding the exact optimal value |$\beta _{0}^{\ast }$| may be difficult, the fact that double-checking one's own reasoning upon arriving at a surprising conclusion is a common practice suggests that people are able to deviate from the unbiased information-acquisition process in the payoff-improving direction.
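The payoff gain from such a deviation can be illustrated numerically. The sketch below (illustrative only; it uses the geometric-restart formula for the effective choice probabilities) compares the uniform strategy β0 = β1 = 1 with a confirmation-biased strategy in the introductory example (π1 = 2/3, symmetric 90/10 experiment, u0 = u1 = 1):

```python
# Expected payoff of the stationary process under sigma = identity, computed
# from the effective terminal-signal probabilities
# s(x|theta) = beta_x p(x|theta) / sum_x' beta_x' p(x'|theta).

def payoff(beta0, beta1, q=0.9, pi1=2/3):
    # Probability of the correct choice in each state.
    r11 = beta1 * q / (beta1 * q + beta0 * (1 - q))
    r00 = beta0 * q / (beta0 * q + beta1 * (1 - q))
    return pi1 * r11 + (1 - pi1) * r00

uniform = payoff(1.0, 1.0)   # terminate on the first signal
biased = payoff(0.5, 1.0)    # discard half of the surprising 0-signals
print(uniform, biased)
assert biased > uniform       # the confirmation-biased search pays
```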
4.1.1. Comments
(1) The above insight admits an alternative political-economy interpretation. The two states θ represent left- versus right-wing policy. Consider a right-wing newspaper that targets right-wing readers, viewed as having a prior belief in favour of the right-wing policy. Readers can absorb only one piece of information (the analogue of our information-storage constraint) and the newspaper has to decide which piece of information x, as generated by p(x ∣ θ), to choose as its headline. Our model explains why such newspapers would target their search toward evidence favouring the right-wing policy.6
(2) Meyer (1991) studies optimal biases in a sequential-learning problem of an agent who receives a sequence of signals and, unlike our agent, can aggregate the sequence. Meyer’s main insight is that some asymmetries in the signal structure are optimal. Although optimal asymmetries arise both in her and our frameworks, the two papers study distinct optimisations. While our agent controls termination probabilities in a stationary decision process, Meyer’s agent controls the choice of a Blackwell experiment in each round of a non-stationary process.
4.2. Speed-Accuracy Complementarity
Our model generates the speed-accuracy complementarity effect—a stylised fact stating that delayed choices tend to be less accurate than speedy choices; see the psychology studies of Swensson (1972) and Luce (1986). We establish this effect in the setting from the previous subsection.
Let φ(θ, a, t) be the joint probability distribution of the state θ, chosen action a, and the reaction time t generated by the solution (p, β*, σI) of the repeated-cognition problem.
When action |$1$| is a priori more attractive, |$R \in (1, d)$|, and the unique primitive binary experiment is symmetric, |$p(1 \mid 1) = p(0 \mid 0) > 1/2$|, then the probability |${{\rm {Pr}}}_{\varphi }(a=\theta \mid t)$| of the correct choice decreases with the response time t.
Due to the stationarity of the decision process, the probability of the correct choice conditional on the payoff state is independent of the reaction time: |$\Pr _{\varphi }(a=\theta \mid \theta ,t)=\Pr _{\varphi }(a=\theta \mid \theta )$|. At optimum, this conditional probability of the correct choice is larger in the a priori more attractive state 1 than in the state 0, reflecting the relative weights of the two states in the objective. Overall, unconditionally on the payoff state, the probability Prφ(a = θ ∣t) of the correct choice depends on the response time because t correlates with θ. A long response time indicates that the agent has repeatedly encountered the signal value 0 and has hesitated to terminate. Hence, conditional on large t, the likelihood of the unattractive state 0 becomes high. The longer the agent has hesitated, the more likely it is that she is facing the unattractive state in which she is making more mistakes.
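This argument can be made concrete with an exact computation (illustrative parameters; the termination strategy below is a hypothetical choice, not the optimum). Conditional on θ, the response time is geometric with decision rate fθ = Σx βx p(x ∣ θ) and is independent of the terminal signal, which gives Prφ(a = θ ∣ t) in closed form:

```python
# Exact Pr(a = theta | t) for the stationary process under pi_1 = 2/3, a
# symmetric 90/10 experiment, and the hypothetical strategy beta = (0.5, 1).

pi = {0: 1/3, 1: 2/3}
p = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.1, 0: 0.9}}   # p[theta][x]
beta = {0: 0.5, 1: 1.0}

# Decision rate and per-state accuracy (both independent of t).
f = {th: sum(beta[x] * p[th][x] for x in (0, 1)) for th in (0, 1)}
s_correct = {th: beta[th] * p[th][th] / f[th] for th in (0, 1)}

def pr_correct_given_t(t):
    # Posterior weight of each state given response time t (geometric timing),
    # mixed with the state-conditional accuracy.
    w = {th: pi[th] * (1 - f[th]) ** (t - 1) * f[th] for th in (0, 1)}
    return sum(w[th] * s_correct[th] for th in (0, 1)) / sum(w.values())

acc = [pr_correct_given_t(t) for t in range(1, 6)]
print(acc)
assert all(a > b for a, b in zip(acc, acc[1:]))  # accuracy falls with delay
```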
4.2.1. Comment
The predictions of our model are in line with the evidence on state-recognition problems reported in Ratcliff and McKoon (2008), according to which: (i) the posterior probability of correct recognition is higher when announcing the a priori more likely state, and (ii) late announcements are relatively less precise. This contrasts with the prediction of the traditional Wald model (see Fudenberg et al. (2018) for an elaboration of the Wald model in which the stakes attached to correct recognition are a priori unknown).
4.3. Overweighting of Rare Events
We consider a state-recognition task in which the two actions are a priori equally attractive, R = π1u1/π0u0 = 1, and π0 = π1 = 1/2. In contrast to the previous applications, the distribution of the signal values x = 0, 1 is asymmetric across states. Specifically, the probability of x = 1 in state θ is ρθ ∈ (0, 1) and the probability of x = 0 is 1 − ρθ. We assume, essentially without loss of generality, that ρ0 < ρ1 < 1 − ρ0.7 The a priori probability of the event x = 1 is (ρ0 + ρ1)/2 < 1/2, and thus the event x = 1 is rarer than x = 0. The next result states that, at the optimum, the agent is relatively more likely to discard the more common event x = 0, in agreement with Kahneman and Tversky (1979), who observe that agents tend to overweight rare events.
When the two actions are a priori equally attractive, |$R = 1$|, then the agent is biased in favour of the event |$x = 1$|: |$\beta _{1}^{\ast }\gt \beta _{0}^{\ast }\gt 0$| (and her guess of the state equals the observed signal realisation, i.e., σ = σI).
For illustration, consider a formation of belief over the probability of a flight accident. The accident probability per flight in the safe state of the world is 10−6, whereas it is 10−5 in the dangerous state of the world, and both states are a priori equally likely. The agent can sequentially observe arbitrarily many past flight outcomes, but cannot aggregate the information, and recalls only the last observed flight. She guesses that the state of the world is dangerous if and only if the last observed flight is eventful.
Consider first an agent who always terminates right after the observation of the first data-point. Such an agent benefits from a ‘second thought' whenever she observes an uneventful flight: either the second observed flight will be uneventful, in which case the second thought will have been inconsequential, or the redrawn flight will be eventful and the agent will switch her assessment from the safe to the dangerous state. Such a switch is beneficial since conditional on two contradicting data-points the dangerous state is relatively more likely. Thus, relative to the immediate termination strategy, the agent will benefit from discarding the uneventful flight observations with positive probability.
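The posterior odds behind this argument are easy to compute. The following sketch (using the accident probabilities from the example) shows that, conditional on one eventful and one uneventful flight, the dangerous state is roughly ten times as likely as the safe state:

```python
# Posterior odds of the dangerous state after observing one eventful and one
# uneventful flight; accident probabilities 1e-5 vs 1e-6, equal priors.

rho = {"dangerous": 1e-5, "safe": 1e-6}

# Likelihood of the pair (one accident, one uneventful flight) in each state.
lik = {s: rho[s] * (1 - rho[s]) for s in rho}

# With equal priors the posterior odds equal the likelihood ratio, approx 10,
# so the switch to the dangerous assessment is justified.
odds = lik["dangerous"] / lik["safe"]
post_dangerous = odds / (1 + odds)
print(odds, post_dangerous)  # odds near 10, posterior near 0.91
```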
5. A Tractable Setting with Multiple States
We now present a class of settings with multiple payoff states that admits a simple analytical solution in the form of a system of linear equations for the optimal termination probabilities. Subsection 5.1 applies this solution in a stylised example that explains salience of perceptually distinct states as a second-best adaptation.
The agent faces a perceptual task that requires her to announce a realisation of the state θ ∈ Θ drawn from a fully mixed prior π ∈ Δ(Θ), where 2 < |Θ| < ∞. She is endowed with a primitive perception technology that generates a perceived value θ′ of the state. The primitive perception is informative but noisy: the perceived value θ′ equals the true state θ with a high probability, but mistakes, θ′ ≠ θ, occur sometimes. We view the primitive perception technology as a black-box model of a physiological sensor that generates a noisy impression θ′ of the true state θ. The agent can use the sensor repeatedly but is not able to aggregate the information. She conditions the repetition of the sensor’s use on the most recent perception and announces the terminal perception.
We formalise this perception task as follows. The agent makes an announcement a ∈ A = Θ, where 2 < |Θ| < ∞, and receives payoff u(a, θ) = uθ > 0 if her announcement is correct, a = θ, and u(a, θ) = 0 if a ≠ θ. Each use of the agent’s sensor generates a signal value/perception θ′ ∈ X = Θ, with conditional probabilities p(θ′ ∣ θ) > 0. The set |$\mathcal {P}$| is the singleton {p}. We make the following assumption.
Symmetry: p(θ′ ∣ θ) = p(θ ∣ θ′).
The symmetry assumption leads to a significant simplification of the second-thought-free condition described in Lemma 9 in the Appendix. Additionally, we make the simplifying assumption that the agent uses the identity action strategy σI; she announces the state equal to her last perception. We also assume that the optimal termination probabilities βx are positive for all x ∈ Θ.8 Let r* = r(p, β*, σI) be the optimal feasible choice rule.
The proposition implies that the decision rate |$f_\theta =\sum _{\tilde{\theta }}\beta ^*_{\tilde{\theta }}p(\tilde{\theta }\mid \theta )$| in each state θ is proportional to (πθuθp(θ ∣ θ))1/2 and thus is high in those states that are reliably identified by the primitive experiment and in which the ex ante expected reward for the correct state recognition is high.
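The decision-rate result can be illustrated numerically. The sketch below uses an illustrative symmetric matrix p, a uniform prior, and rewards of our own choosing (none of these numbers are from the paper); it locates the optimal termination strategy by a crude coordinate-ascent grid search and prints the decision rates next to the predicted (πθuθp(θ ∣ θ))1/2 terms. The search is too coarse to prove proportionality, so the printed comparison is indicative only.

```python
import math

# Illustrative symmetric perception matrix p(x | theta); rows index the true
# state and sum to one. Prior is uniform; rewards vary across states.
P = [[0.90, 0.06, 0.04],
     [0.06, 0.88, 0.06],
     [0.04, 0.06, 0.90]]
PRIOR = [1 / 3, 1 / 3, 1 / 3]
U = [1.0, 2.0, 1.5]

def payoff(beta):
    """Expected payoff of the identity announcement rule under termination
    strategy beta, via the renewal formula r(x|th) = beta_x p(x|th) / f_th."""
    total = 0.0
    for th in range(3):
        f = sum(beta[x] * P[th][x] for x in range(3))  # decision rate in state th
        total += PRIOR[th] * U[th] * beta[th] * P[th][th] / f
    return total

def crude_optimum(sweeps=40, grid=200):
    """Coordinate-ascent grid search; the payoff is invariant to rescaling
    beta, so we normalise max(beta) = 1 only at the end."""
    beta = [1.0, 1.0, 1.0]
    for _ in range(sweeps):
        for i in range(3):
            best = max((payoff(beta[:i] + [b] + beta[i + 1:]), b)
                       for b in (j / grid for j in range(1, grid + 1)))
            beta[i] = best[1]
    m = max(beta)
    return [b / m for b in beta]

beta_star = crude_optimum()
rates = [sum(beta_star[x] * P[th][x] for x in range(3)) for th in range(3)]
targets = [math.sqrt(PRIOR[th] * U[th] * P[th][th]) for th in range(3)]
print("decision rates f_theta:", rates)
print("sqrt(pi*u*p(th|th)):  ", targets)  # proposition predicts proportionality
```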
5.1. Salience
Bordalo et al. (2012) interpret salience as directed attention focus. They quote the popular work by Daniel Kahneman (2011):
‘Our mind has a useful capability to focus on whatever is odd, different or unusual.'
The quote states a causal relation between two features of salient phenomena: they are (i) odd, different or unusual, and, because of (i), people benefit from (ii) focusing their attention on such phenomena. Here, we confirm Kahneman’s intuition within our proposed framework. Our microfoundation of the salience effect is related to the insight emerging in psychological research on visual salience. Itti (2007) conceptualises the visual salience effect as attention allocation to a subset of the visual field that is ‘sufficiently different from its surroundings to be worthy of [one’s] attention'. Similarly, in our model, a payoff state is salient if it stands out sufficiently from similar states to be worthy of the focus of the agent’s information search.
For two states θ1 and θ2, we say that θ1 is more distinct than θ2 if, for each other state θ3 ≠ θ1, θ2, p(θ1 ∣ θ3) < p(θ2 ∣ θ3). Suppose for illustration that the perceptual task involves recognition of a colour from the set {azure, indigo, red}. Intuitively, the red colour stands out from this set, and this is captured by the above definition. Assume that the two shades of blue are similar in that the agent’s first impression confuses them in 10% of cases, p(azure ∣ indigo) = p(indigo ∣ azure) = 0.1, but p(θ ∣ red) = p(red ∣ θ) = 0.01 for θ ∈ {azure, indigo}. Then the red colour is more distinct according to our definition than either of the two blue shades.
We focus on the effect stemming from the agent’s differential ability to perceptually discriminate between the states, and thus we abstract from the differences in the ex ante rewards across states; we assume that πθuθ is constant across all states. Additionally, we impose the following assumption:
5.1.1. Sufficient Precision
p(θ ∣ θ) > p(θ′ ∣ θ) for all θ ≠ θ′.
Since the primitive perception technology p is symmetric by assumption, the asymmetry of the optimal terminal perception r* in favour of the distinct state is driven solely by the optimisation of the termination strategy. To gain intuition for the salience of the distinct states, consider a state θ* that is similar to many other states and an agent who always terminates the process after the first round: |$\beta =\mathbf {1}$|. This agent is relatively uninformed whenever she forms perception θ*, since the true state differs from θ* with a sizeable probability. The agent with this indistinct perception θ* would thus benefit from ‘having a second thought'—i.e., from running the primitive perception formation process once again. The optimal termination strategy involves repeating the primitive process with relatively high probability whenever the agent forms a perception of an indistinct state, and this shifts the terminal perception in favour of the distinct states.
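The colour example above can be checked numerically. The sketch below uses the confusion probabilities from the text; the uniform prior with unit rewards (so that πθuθ is constant) and the particular one-parameter termination strategy are our own illustrative assumptions. A one-shot perceiver is much less confident after a blue perception than after a red one, and redrawing blue perceptions with a small probability both raises the expected payoff and shifts the terminal perception toward the distinct colour.

```python
# Confusion probabilities from the text; diagonal entries are the remainders.
COLOURS = ["azure", "indigo", "red"]
P = {
    ("azure", "azure"): 0.89, ("azure", "indigo"): 0.10, ("azure", "red"): 0.01,
    ("indigo", "azure"): 0.10, ("indigo", "indigo"): 0.89, ("indigo", "red"): 0.01,
    ("red", "azure"): 0.01, ("red", "indigo"): 0.01, ("red", "red"): 0.98,
}  # P[(true state, perceived value)]
PRIOR = {c: 1 / 3 for c in COLOURS}  # pi_theta * u_theta constant across states

def posterior_confidence(perceived):
    """P(theta = perceived | terminal perception = perceived) under beta = 1."""
    joint = {th: PRIOR[th] * P[(th, perceived)] for th in COLOURS}
    return joint[perceived] / sum(joint.values())

def payoff_and_red_share(b_blue):
    """Expected payoff and unconditional share of 'red' announcements when the
    agent redraws blue perceptions with probability 1 - b_blue (beta_red = 1)."""
    beta = {"azure": b_blue, "indigo": b_blue, "red": 1.0}
    total, red_share = 0.0, 0.0
    for th in COLOURS:
        f = sum(beta[x] * P[(th, x)] for x in COLOURS)  # decision rate in th
        total += PRIOR[th] * beta[th] * P[(th, th)] / f
        red_share += PRIOR[th] * beta["red"] * P[(th, "red")] / f
    return total, red_share

print(posterior_confidence("red"), posterior_confidence("azure"))
print(payoff_and_red_share(1.0), payoff_and_red_share(0.95))
```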
6. Extensions
In the first subsection, we discuss how our model can accommodate agents with more general memory constraints. Subsection 6.2 accommodates agents who discount future payoffs at an exponential rate.
6.1. Sophisticated Agents
To demonstrate the flexibility of the general model, we now discuss two specific settings. They feature sophisticated agents with non-trivial memory that can be used to aggregate information over several observed signal realisations. Perhaps surprisingly, we show that those settings can in fact be interpreted as special cases of our general model that, at face value, allows only for trivial memory. We show that such accommodation of non-trivial memory is possible via expansion of the set |$\mathcal {P}$| of the primitive experiments. This allows us to establish the generality of the second-thought-free condition.
Moreover, when the state and action spaces are binary, then the setting with sophisticated agents boils down to the simple binary setting as formulated in Section 3, except for the determination of the perceptual-distance parameter |$\overline{d}$|, which is now endogenously determined by the agent’s ability to process information.
(imperfect information aggregation). This setting relaxes the assumption that the agent cannot aggregate information across the repetitions of her reasoning by endowing her with a finite set of memory states that she can use to represent the signal histories. The setting of this example builds on Hellman and Cover (1970) and Wilson (2014). The agent can repeatedly sample from a single statistical experiment that generates signal realisations from a finite signal space. Additionally, the agent is endowed with a finite set of memory states. After each run of the experiment, the agent randomises between terminating and continuing the decision process; in the latter case, she may transition to a new memory state. The termination decisions and the transitions among memory states follow a stationary mixed strategy that conditions on the current memory state and the last observed signal. Once the agent terminates, she maps the last memory state and the last observed signal value to a chosen action. The feasible statistical experiment and the set of memory states specify a set of constructible choice rules, from which the agent chooses the one that maximises her ex ante expected payoff.
The formal specification of this example follows. The agent is endowed with one Blackwell experiment μ(x∣θ) with a finite signal space X and, additionally, with a finite set M of memory states m. After each run of the experiment μ, the agent either terminates or continues with decision-making. If the agent continues, then she transitions from the current memory state to a new memory state and reruns the statistical experiment μ(x ∣ θ). That is, the agent selects a (generalisation of the) termination strategy: |$\gamma :M\times X\longrightarrow \Delta \left( M\cup \lbrace \mathfrak {t}\rbrace \right)$|, where γ(m′ ∣m, x) is the probability that the agent in memory state m who has observed signal realisation x in the last run of the experiment μ continues with the decision-making and transitions to memory state m′, and |$\gamma (\mathfrak {t}\mid m,x)$| is the probability that such an agent terminates. The terminating agent chooses action σ(m, x) that depends both on the current memory state and on the signal realisation observed in the last run of μ. The agent starts the decision-making in the memory state m0. A pair γ, σ induces a θ-dependent Markov chain over the memory states that eventually terminates with choice σ(m, x), where m is the last memory state and x is the last signal realisation observed. Let p(a ∣ θ; γ, σ) be the probability that the agent terminates with the choice a in state θ, and let |$\mathcal {P}_{iia}$| be the set of all stochastic choice rules p that this agent can construct. She selects the choice rule from |$\mathcal {P}_{iia}$| that maximises her ex ante expected payoff.
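A minimal numerical instance of this Markov construction may help. In the sketch below, the state and signal are binary, the signal matches the state with probability 0.7, and the memory stores the last observed signal; the agent terminates only when a new draw confirms the remembered signal. These numbers and this particular (γ, σ) are our own illustration, not an optimal strategy; the code obtains p(a ∣ θ; γ, σ) by fixed-point iteration on the termination and transition equations.

```python
# Signal distribution MU[theta][x]: x equals theta with probability 0.7
# (an illustrative number, not from the paper).
MU = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.3, 1: 0.7}}

def choice_rule(theta, tol=1e-12):
    """P(a | theta) for the 'terminate on a confirming signal' strategy:
    v[m][a] = prob. of eventually choosing a when the remembered signal is m."""
    v = {m: {0: 0.0, 1: 0.0} for m in (0, 1)}
    while True:
        new = {m: {0: 0.0, 1: 0.0} for m in (0, 1)}
        for m in (0, 1):
            for x, px in MU[theta].items():
                if x == m:
                    new[m][x] += px          # confirming signal: terminate, choose x
                else:
                    for a in (0, 1):         # otherwise remember x and redraw
                        new[m][a] += px * v[x][a]
        done = all(abs(new[m][a] - v[m][a]) < tol for m in (0, 1) for a in (0, 1))
        v = new
        if done:
            break
    # The initial memory state is set by the first signal draw.
    return {a: sum(MU[theta][x] * v[x][a] for x in (0, 1)) for a in (0, 1)}

print(choice_rule(1))  # accuracy exceeds the one-shot level of 0.7
```

Requiring a confirming draw trades delay for accuracy: in this instance the probability of the correct choice rises from 0.7 to roughly 0.81 in each state.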
We now demonstrate that this example is a special case of our baseline model. Consider the baseline model with the signal space X = A and the set of the feasible primitive experiments |$\mathcal {P}=\mathcal {P}_{iia}$|. The set |$\mathcal {R}(\mathcal {P}_{iia}) =\lbrace r(p,\beta ,\sigma ):p\in \mathcal {P} _{iia},\beta \in B, \sigma \in S\rbrace$| is then the set of stochastic choice rules that can be constructed as follows. The agent runs any process |$p\in \mathcal {P}_{iia}$|, and observes a signal value/action recommendation a with probability p(a ∣ θ). She terminates with probability βa, according to the termination strategy β = (βa)a∈A, and upon the termination chooses an action a′ = σ(a), where σ ∈ S is any mapping |$A\longrightarrow A$|. She reruns the process p with probability 1 − βa, observes a new action recommendation generated by p, et cetera, until she terminates after a stochastic number of repetitions of the process p.
As it turns out, no new choice rules beyond those from |$\mathcal {P}_{iia}$| can be constructed by these selective repetitions. This follows because the repetitions of the rule |$p\in \mathcal {P}_{iia}$| with the termination strategy β can always be replicated with an appropriate choice of a different rule in |$\mathcal {P}_{iia}$| that, whenever p would terminate with a, restarts the process from scratch with probability 1 − βa. Formally:
|$\mathcal {R}(\mathcal {P} _{iia})=\mathcal {P}_{iia}$|.
According to the lemma, Example 1 is a special case of our baseline model with |$\mathcal {P}=\mathcal {P}_{iia}$| and X = A, since in such a specification of the baseline model, the set of feasible rules coincides with those in Example 1. In particular, the optimal choice rule |$p^*\in \mathcal {P}_{iia}$| solving Example 1 coincides with the optimal rule |$r^*\in \mathcal {R}(\mathcal {P}_{iia})$| solving this specification of the baseline model.
The repeated-cognition problem with |$\mathcal {P}=\mathcal {P}_{iia}$| is purely formal in that the optimal termination probabilities |$\beta ^*_x=1$| for all x ∈ X = A, and thus the agent conducts the optimal process |$p^*\in \mathcal {P}_{iia}$| only once and terminates. Nevertheless, the observation that p* solves the repeated-cognition problem has an important implication.
The choice rule that solves Example 1 (imperfect information aggregation) is second-thought-free.
Wilson (2014) differs from this example mainly in that she assumes exogenous termination probabilities. By adding optimisation over the terminations to the model of Wilson, we gained the partial characterisation of the optimal choice rule with no need to fully solve the problem: one can conclude that the optimal choice rule is second-thought-free without analysing the optimal use of the memory states.
(partial forgetting). The agent of this example can remember up to a fixed finite number of signal realisations generated by a single statistical experiment. In each round of her decision process, she can discard a subset of the currently remembered signal values, extract a new signal realisation, or terminate, where each of these decisions is determined by a stationary mixed strategy that conditions on the currently remembered stock of signal values. The statistical experiment and the maximal number of signals that the agent can remember determine the set of stochastic choice rules that she can construct, from which she chooses the rule that maximises her ex ante expected payoff.
We first formalise this example as follows. Let H be the set of signal histories h of length |h| ≤ N. The agent at a history h can (i) terminate her decision-making, (ii) discard some of the information accumulated, or (iii), if |h| < N, acquire a new signal realisation. (i) An agent terminating at h chooses action σ(h). (ii) An agent who discards some information transitions to a truncation h′ of her current history h.9 (iii) An agent who acquires a new signal realisation transitions to a history hx, where x is the new signal realisation drawn from μ(x∣ θ). The decision-making is governed by a pair of mappings |$\gamma :H\times \Theta \longrightarrow \Delta \left(H\cup \lbrace \mathfrak {t}\rbrace \right)$| and |$\sigma :H\longrightarrow A$|, where γ(h′ ∣h, θ) stands for the probability that the agent at history h in state θ continues decision-making and transitions to h′, and |$\gamma (\mathfrak {t}\mid h,\theta )$| is the probability of termination at history h in state θ. The mapping γ is constrained to satisfy (1) γ(h′ ∣h, θ) is independent of θ if h′ is a truncation of h, (2) |$\gamma (\mathfrak {t}\mid h,\theta )$| is independent of θ, (3) |$\frac{\gamma (hx\,\mid \,h,\theta )}{ \gamma (hx^{\prime }\,\mid \,h,\theta )}=\frac{\mu (x\,\mid \,\theta )}{\mu (x^{\prime }\,\mid \,\theta )}$|, (4) γ(h′ ∣h, θ) = 0 unless h′ is a truncation of h, or h′ = hx for some x ∈ X and |hx| ≤ N. Constraints 1 and 2 require the agent to condition the decision to discard information or to terminate only on her current history independently of the state. Constraint 3 allows the agent to expand her information set only by running the experiment μ(x∣ θ). Constraint 4 restricts each step of information acquisition to one draw from μ(x ∣ θ) or to a partial discarding of the accumulated information. Let p(a ∣ θ; γ, σ) be the probability that the agent who employs (γ, σ) terminates with action a in the state θ. The agent chooses γ and σ to maximise her ex ante expected payoff.
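A tiny instance of this setting with N = 2 can be computed in closed form. The strategy below, like the signal precision of 0.7, is our own illustration rather than the paper's optimum: the agent draws two signals; if they agree she terminates and announces the common value, and if they disagree she discards the whole history and starts again. The renewal argument makes the resulting choice rule a one-line computation.

```python
# Signal distribution MU[theta][x] for a binary example (illustrative numbers).
MU = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.3, 1: 0.7}}

def choice_rule(theta):
    """P(a | theta) for the 'two agreeing draws' strategy with N = 2.
    Restarts only rescale the termination probabilities, so conditioning
    on eventual termination gives the choice rule directly."""
    agree = {a: MU[theta][a] ** 2 for a in (0, 1)}  # both draws equal a
    z = sum(agree.values())                         # prob. of not restarting
    return {a: agree[a] / z for a in (0, 1)}

print(choice_rule(1))  # more accurate than a single draw
```

With precision 0.7, the correct announcement probability rises to 0.49/0.58, roughly 0.84, at the cost of a random number of restarts.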
As with the previous example, let |$\mathcal {R}(\mathcal {P}_{pf})$| be the set of feasible choice rules in our baseline model with the set of feasible primitive experiments |$\mathcal {P}$| identified with |$\mathcal {P} _{pf}$|.
|$\mathcal {R}(\mathcal {P}_{pf})= \mathcal {P}_{pf}$|.
Thus, again, the rule |$p^*\in \mathcal {P}_{pf}$| solving this example and the optimal rule |$r^*\in \mathcal {R}(\mathcal {P}_{pf})$| coincide, and the rule solving the example must therefore be second-thought-free.
The choice rule that solves Example 2 (partial forgetting) is second-thought-free.
Additionally, when the state and action sets are binary, Proposition 2 applies to both examples with |$\overline{d}=\frac{ p^*(1\,\mid \,1)p^*(0\,\mid \,0)}{p^*(1\,\mid \,0)p^*(0\,\mid \,1)}$|, and thus, relative to the baseline setting in which the agent remembers only one signal, the examples have the same solution except for the determination of the endogenous parameter |$\overline{d}$|. Thus, for instance, if state 1 is a priori more attractive than state 0, then the agent is more likely to make the correct choice in state 1 than in state 0; r*(1 ∣ 1) > r*(0 ∣ 0). As in Subsection 4.1, the optimal decision procedure favours the evidence supporting the a priori attractive state.
6.2. Impatient Agents
Our baseline model abstracts from the cost of time in that the agent is only concerned with how the repetitions of the signal extraction affect the correlation of the signal with the state. We now incorporate discounting.
The next result generalises the second-thought-free condition. Let |$r^*_\delta =r_\delta (p^*,\beta ^*,\sigma ^*)$| be the choice rule solving the discounted repeated-cognition problem (12).
The condition has the same interpretation as the second-thought-free condition in the absence of discounting. The left-hand side is the payoff for following the optimal decision process |$r^*_\delta$| summed up across all contingencies that terminate with choice of a. The right-hand side is the payoff that the agent would get across the same contingencies if she restarted the decision process |$r^*_\delta$| instead of terminating.
For illustration, we now revisit the confirmation bias application from Subsection 4.1 with an impatient agent. We find that, unless discounting is too strong, the impatient agent chooses qualitatively the same strategy as the patient one, although the impatient agent speeds up her decision-making by choosing larger termination probabilities.
The setting is as follows. The agent chooses a ∈ {0, 1} and receives u(a, θ) = uθ > 0 if a = θ, and zero reward otherwise. Action 1 is a priori more attractive than action 0; π1u1 > π0u0. The agent has access to a single primitive experiment p that generates signal values in X = {0, 1}. The experiment is symmetric with probabilities p(1 ∣ 1) = p(0 ∣ 0) = α > 1/2. We impose a sufficient-informativeness condition that the signal is sufficiently precise relative to the ex ante attractiveness of action 1: |$\frac{\alpha }{1-\alpha }\gt \frac{\pi _1u_1}{\pi _0u_0}$|.
The agent chooses the action equal to the last observed signal realisation. She terminates her decision-making immediately after she encounters signal realisation |$1$|: |$\beta _{1}^{\ast }=1$|. When |$\delta \in \big( \frac{1}{\alpha +(1-\alpha )R},1\big]$|, the agent who observes |$x = 0$| terminates with an interior probability |$\beta _{0}^{\ast }\in (0,1)$| that decreases in δ. When |$\delta \in \big( 0,\frac{1}{\alpha +(1-\alpha )R}\big)$|, the agent terminates immediately: |$\beta _{0}^{\ast }=\beta _{1}^{\ast }=1$|.
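These comparative statics in δ can be checked numerically. The sketch below fixes β1 = 1 as in the result above and derives the ex ante discounted value of a termination probability β0 from the renewal equations; the derivation and the parameters (α = 0.8, π1u1 = 1.5, π0u0 = 0.5, which satisfy the sufficient-informativeness condition α/(1 − α) > R) are our own illustration. A grid search then locates the maximiser above and below the threshold 1/(α + (1 − α)R).

```python
# Illustrative parameters satisfying alpha/(1-alpha) = 4 > R = 3.
ALPHA = 0.8                            # p(1|1) = p(0|0)
PI_U = {0: 0.5 * 1.0, 1: 0.5 * 3.0}    # pi_theta * u_theta
R = PI_U[1] / PI_U[0]

def value(beta0, delta):
    """Ex ante discounted payoff when beta_1 = 1 and an ignored 0-signal
    restarts the process with a one-round discount delta (our derivation)."""
    # State 1: signal 1 (prob. alpha) ends correctly; a kept 0-signal ends
    # wrongly; an ignored 0-signal restarts, discounted by delta.
    v1 = ALPHA / (1.0 - (1.0 - ALPHA) * (1.0 - beta0) * delta)
    # State 0: a kept 0-signal (prob. alpha * beta0) ends correctly; signal 1
    # ends wrongly; an ignored 0-signal restarts.
    v0 = ALPHA * beta0 / (1.0 - ALPHA * (1.0 - beta0) * delta)
    return PI_U[1] * v1 + PI_U[0] * v0

def best_beta0(delta, grid=1000):
    """Grid maximiser of the discounted value over beta0 in (0, 1]."""
    return max(range(1, grid + 1), key=lambda j: value(j / grid, delta)) / grid

threshold = 1.0 / (ALPHA + (1.0 - ALPHA) * R)   # = 1/1.4, about 0.714
print(best_beta0(0.9), best_beta0(0.5))  # interior above the threshold, 1 below
```

The derivative of the value at β0 = 1 changes sign exactly at δ = 1/(α + (1 − α)R), which is how the threshold in the statement arises.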
7. Summary
Agents who cannot comprehend all the facts available to them benefit from selective attention. We show that agents can implement a targeted information search in a process that resembles the natural phenomenon of hesitation. Like a hesitant person, the agent can, conditional on the action contemplated, decide whether she implements the action or whether she will have a second thought, and run the cognition process once more. Such hesitation can be productive, despite consisting of repetitions of the same stochastic cognition process. By conditioning the probability of the repetition on the conclusion of the reasoning, the agent controls the correlation of her terminal conclusion and the payoff state. The optimal decision process arising in our model exhibits natural hesitation patterns: the agent will have second thoughts—that is, she will repeat her cognition—whenever the expected payoff for the currently favoured choice is inferior to the expected payoff for continuing decision-making. At the optimum, the agent terminating the decision-making must be indifferent between terminating with the currently contemplated action and repeating the process.
In a sense, the condition formalises the concept of reasonable doubt. Abstracting from many considerations, such as information aggregation across jury members, a jury deciding a trial under common law should, if using the optimal decision procedure, be indifferent between declaring a verdict and announcing a hung jury, thereby initiating a retrial.
Let us conclude by reviewing the limitations of our main result. The central assumption—the ability of the agent to freely repeat her decision process—may fail for several reasons. One reason is that the agent may only have access to a limited data set that constrains her to a finite number of repetitions of the primitive decision process, making the optimal termination strategy non-stationary. Another complication arises if the outcomes of distinct runs of the same cognition process are not conditionally independent as assumed in our model; this may arise if some cognition errors are systematic and are likely to emerge in distinct repetitions of the cognition. We conjecture that the second-thought-free condition holds in such a case, with the agent internalising the correlations between the cognition runs.
Appendix A
A.1. Proofs for Section 3
Assume that there exists a solution with βx positive for n > 2 signals x ∈ X. We show that there then exists a solution with n − 1 positive signals. The proposition follows by induction on n.
Let us prove the induction step. Fix the primitive experiment p employed by the agent, let β be an optimal termination strategy for the given p, let X′ be the set of signals with positive βx, and write s(x ∣ θ) as shorthand for the effective experiment s(x ∣ θ; p, β) induced by p and β. Let us abuse notation by letting s(x) = ∑θπθs(x ∣ θ) stand for the unconditional effective probability of x. For x ∈ X′ let qx ∈ Δ(Θ) be the posterior belief upon terminating with x: qx(θ) = πθs(x ∣ θ)/s(x).
Since |X′| > 2 and the state space Θ is binary, there exists a signal x* ∈ X′ such that |$q_{x^*}$| is in the convex hull of the posteriors qx, x ∈ X′∖{x*}. Let μx be the coefficients that decompose |$q_{x^*}$| into qx, x ∈ X′∖{x*}. That is, μ ∈ Δ(X′∖{x*}) such that |$q_{x^*}(\theta )=\sum _{x\in X^{\prime }\setminus \lbrace x^*\rbrace }\mu _x q_x(\theta )$| for all θ ∈ Θ.
The agent’s objective is linear with respect to the choice rule r(p, β, σ). Thus, the optimal rule is the point of tangency of the set |$\mathcal {R}_{\overline{p},\sigma _I}$| of the feasible rules and of an indifference line; see Figure 2. The slope |$\frac{d r\left(0\,\mid \,0;\overline{p},\beta ,\sigma _I\right)}{d r\left(1\,\mid \,1;\overline{p},\beta ,\sigma _I\right)}$| is decreasing in |$r\left(1\mid 1;\overline{p},\beta ,\sigma _I\right)$| and attains value |$-1/\overline{d}$| for |$r\left(1\mid 1;\overline{p},\beta ,\sigma _I\right)=0$|, and value |$-\overline{d}$| for |$r\left(1\mid 1;\overline{p},\beta ,\sigma _I\right)=1$|. Thus, when |$R\lt 1/\overline{d}$| or |$R\gt \overline{d}$|, the problem has a corner solution as specified in statements 1 and 2 of the proposition.
A.2. Proofs for Section 5
The next result is an auxiliary lemma used in the proof of Proposition 4.
Condition (A3) is a strengthening of the second-thought-free condition (5). It requires that the agent who has terminated the decision process with perception θ, and who knows that a second run of the process r* would terminate with value θ′, is indifferent between θ and θ′. This condition is stronger than the second-thought-free condition (5), since (5) requires (A3) to hold only on average across all θ′. The strengthening holds in the special case of a symmetric experiment p.
A.3. Proofs of the Results from Section 6
All rules feasible in |$\mathcal {P}_{iia}$| are feasible in |$\mathcal {R}(\mathcal {P}_{iia})$|: |$\mathcal {R}(\mathcal {P}_{iia})\supset \mathcal {P}_{iia}$|. This is immediate: when βa = 1 for all a ∈ A, then r(p, β, σI) = p for all |$p\in \mathcal {P}_{iia}$|.
It remains to show |$\mathcal {R}(\mathcal {P}_{iia})\subset \mathcal {P}_{iia}$|. Consider |$p(\gamma ,\sigma )\in \mathcal {P}_{iia}$| constructed in the setting of Example 1 by the use of the generalised termination strategy γ(m, x) and the action strategy σ(m, x). Recall that |$r(p(\gamma ,\sigma ),\beta ,\hat{\sigma })$| is the choice rule constructed by repetitions of the rule p(γ, σ) according to the termination strategy β = (βa)a∈A and by applying the action strategy |$\hat{\sigma }:A\longrightarrow A$| upon the termination. We need to show that there exist γ′ and σ′ such that |$r(p(\gamma ,\sigma ),\beta ,\hat{\sigma })=p(\gamma ^{\prime },\sigma ^{\prime })$|. This is indeed so when the termination probability is |$\gamma ^{\prime }(\mathfrak {t}\mid m,x)=\gamma (\mathfrak {t}\mid m,x)\beta _{\sigma (m,x)}$| for m ≠ m0, and the transition probability to the original memory state m0 is |$\gamma ^{\prime }(m_0\mid m,x)=\gamma (m_0\mid m,x)+ \gamma (\mathfrak {t}\mid m,x)\left(1-\beta _{\sigma (m,x)}\right)$|; the latter is the sum of the probability that the original process γ transits to m0 and the probability that the decision process |$r(p(\gamma ,\sigma ),\beta ,\hat{\sigma })$| restarts after a termination of p(γ, σ). Additionally, for all |$\tilde{m}\ne m_0$|, |$\gamma ^{\prime }(\tilde{m}\mid m,x)=\gamma (\tilde{m}\mid m,x)$|. The above choice of γ′ implies that the process p(γ′, σ′) replicates the Markov process over the memory states under |$r(p(\gamma ,\sigma ),\beta ,\hat{\sigma })$|. Finally, to replicate the choices upon termination, we set the action strategy |$\sigma ^{\prime }(m,x)=\hat{\sigma }(\sigma (m,x))$| for all (m, x).
Again, trivially, |$\mathcal {R}(\mathcal {P}_{pf})\supset \mathcal {P}_{pf}$|, since r(p, (1, …, 1), σI) = p for all |$p\in \mathcal {P}_{pf}$|. Additionally, |$\mathcal {R}(\mathcal {P}_{pf})\subset \mathcal {P}_{pf}$|. This is indeed so because for any β = (βa)a∈A and any |$\hat{\sigma }:A\longrightarrow A$|, |$r(p(\gamma ,\sigma ),\beta ,\hat{\sigma })=p(\gamma ^{\prime },\sigma ^{\prime })$| where the termination probability |$\gamma ^{\prime }(\mathfrak {t}\mid h,\theta )=\gamma (\mathfrak {t}\mid h,\theta )\beta _{\sigma (h)}$|, the transition probability to the empty signal history ∅ is set to |$\gamma ^{\prime }(\emptyset \mid h,\theta )=\gamma (\emptyset \mid h,\theta )+ \gamma (\mathfrak {t}\mid h,\theta )\left(1-\beta _{\sigma (h)}\right)$|, and for all |$\tilde{h} \ne \emptyset$|, |$\gamma ^{\prime }(\tilde{h}\mid h,\theta )=\gamma (\tilde{h}\mid h,\theta )$|. Finally, the action strategy is set to |$\sigma ^{\prime }(h)=\hat{\sigma }(\sigma (h))$| for all histories h.
Due to the condition that α/(1 − α) > R, any (β, σ) that leads to a selection of only one action with certainty is dominated by the decision process that terminates after the first round and chooses an action equal to the observed signal value. Thus, both |$\beta ^*_0$| and |$\beta ^*_1$| are positive, and the action strategy is σ*(x) = x or σ*(x) = 1 − x. Let us show that the action strategy σ* must be the identity function σI.
Further, it must hold that |$\beta ^*_0=1$| or |$\beta ^*_1=1$|. Otherwise, if both |$\beta ^*_0\lt 1$| and |$\beta ^*_1\lt 1$|, then the agent could increase both |$\beta ^*_x$| by the same factor. This preserves the conditional action distribution in each state θ and increases the decision rates in both states, and is thus a profitable deviation.
When |$\delta \gt \frac{1}{\alpha + (1-\alpha )R}$|, the condition has an interior solution and the derivative of the value (A8) with respect to β0 at β0 = 1 is negative. Thus, for this range of parameters, the unique |$\beta ^*_0$| satisfying the first-order condition is the interior value that solves the quadratic equation, whose solution decreases in δ.
Notes
This paper has been previously circulated under the title: ‘On Second Thoughts, Selective Memory, and Resulting Behavioral Biases.’ We thank Mark Dean, Andrew Ellis, Alessandro Pavan, Philip Reny, Larry Samuelson, Colin Stewart, Balazs Szentes, colleagues at the University of Edinburgh, the audiences at Bocconi, Queen Mary, Columbia, Ecole Polytechnique, Zurich and St Andrews universities, workshops and conferences in Erice, Alghero, Faro, Gerzensee, New York, Cambridge, Vancouver, and Barcelona, the editor (Gilat Levy), and the two referees for helpful comments. Ludmila Matysková and Jan Šedek provided excellent research assistance. Deborah Nováková and Laura Straková have helped with English. Jakub Steiner has received financial support from the Czech Science Foundation grant 16-00703S and from the ERC grant 770652. Philippe Jehiel thanks the ERC grant no. 742816 for funding.
Footnotes
In the latter case, while the second-thought-free conditions need not be satisfied for each problem in isolation, we would still derive that some degree of selective hesitation is optimal.
Somewhat less related is a literature that explores how exogenous analogy-based and extrapolation-driven errors in learning lead to behavioural biases; see coarse learning in Jehiel (2005) and its application to overoptimism in Jehiel (2018). By contrast, in our approach, the agent optimises the error distribution given the constraints.
We do not allow for mixed action strategies since the optimum can always be achieved with a pure action strategy.
This insight exploits the assumption of perfect patience, since impatient agents would trade off informativeness against delay costs. We conjecture that when exponential discounting is considered, then the result that the agent ignores all but two signal realisations continues to hold for sufficiently patient agents and generic signal structures.
Such summary statistic of |$\mathcal {P}$| continues to exist when 2 < |X| < ∞. For any pair of signal realisations (x, x′) and an experiment p, let |$d_{x,x^{\prime },p}=\frac{p(x\,\mid \,0)p(x^{\prime }\,\mid \,1)}{p(x\,\mid \,1)p(x^{\prime }\,\mid \,0)}$|. Then, |$\overline{d}$| is the maximum of |$d_{x,x^{\prime },p}$| over all ordered pairs (x, x′) and experiments p.
We can always achieve this by relabelling the states θ and the signal values x, unless ρ0 = ρ1 or ρ0 = 1 − ρ1.
These two assumptions are satisfied when p(θ ∣ θ) is sufficiently close to one for each θ.
A truncation is obtained by deleting one or more last elements in h.
Rules ra that always select an action a can be trivially constructed from p and σI by using βa = 1 and βx = 0 for x ≠ a.