Abstract

A memoryless agent can acquire arbitrarily many signals. After each signal observation, she either terminates and chooses an action, or she discards her observation and draws a new signal. By conditioning the probability of termination on the information collected, she controls the correlation between the payoff state and her terminal action. We provide an optimality condition for the emerging stochastic choice. The condition highlights the benefits of selective memory applied to the extracted signals. Implications—obtained in simple examples—include (i) confirmation bias, (ii) speed-accuracy complementarity, (iii) overweighting of rare events, and (iv) salience effect.

Economic agents often acquire information about the state of the economy before making their decisions. The information is typically modelled as a signal that helps the agent refine the distribution of the state and improve decision-making. Often, signals come over time and agents can absorb only a small number of them. We capture this information-processing friction by assuming that agents receive as many signals as they wish but can remember only a finite number of them when making their choices. In the simplest setting we analyse, the agent can only remember one signal. A key strategic variable that we consider is the agent's option to ignore some signals with positive probability and restart the signal-extraction process. We allow agents to employ an arbitrary stationary decision process that specifies, for each possible signal realisation, a probability with which the agent restarts the process as well as the chosen action in case of termination. We impose no time constraints or costs in the basic formulation, so that the friction comes solely from the limited information-storing capacity of the agent.

We ask ourselves: Should the agent optimally make her choice as soon as she receives the first signal whatever the realisation of it is, or could she be better off by rerunning the very same information-acquisition process? Can hesitation—selective repetition of a fixed stochastic decision procedure—be welfare-enhancing?

A general insight is that selective rerunning of the primitive decision procedure is typically optimal. To document this most generally, we provide a simple necessary condition satisfied by the optimal rerunning strategy. The result is an interim indifference condition imposed on the agent who has concluded her decision-making with a plan to choose a particular action. Given the recommended action, the agent’s posterior expected payoff from implementing this action must be the same as the posterior expected payoff from rerunning the whole decision process—all the selective repetitions of the primitive signal extraction—and implementing whichever action the second run of the process recommends. We refer to this as the second-thought-free condition.

For illustration, consider a binary decision of whether to make an investment of a fixed size. The agent receives payoff 1 if she invests in the good state of the economy, payoff −1 if she invests in the bad state, and receives 0 when she does not invest, whatever the state. One of the two states is a priori more likely; for the sake of concreteness, let the prior probability of the good state be 2/3. Both states give rise to a population of good and bad signals, with the share of the good signals at 90% in the good state and 10% in the bad state. The agent draws possibly several signal realisations in sequence but remembers only the last one when making her investment choice. As follows from simple optimisation considerations, assume she invests if and only if the last remembered signal was good. Observe that the decision rule generated by the immediate termination upon the first signal that comes in does not generate a second-thought-free choice rule: an agent whose first observed signal was bad prefers to rerun the decision process, since the new run will either lead to not investing again or will give rise to the signal realisation that conflicts with the first observation and will lead to investing. Since, conditional on two conflicting signals, the a priori more common state is more likely, investing is preferred in this contingency. The agent benefits from having second thoughts when the first observed signal is surprising.
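The arithmetic behind the last claim can be made explicit. The following sketch (an illustration we add here, using the numbers of this example) computes the posterior of the good state after one good and one bad signal and the resulting expected payoff of investing:

```python
# Posterior of the good state after one good and one bad signal.
# Prior and signal precisions are those of the example in the text.
prior_good = 2 / 3
p_good_signal = {"good": 0.9, "bad": 0.1}   # P(good signal | state)

# Likelihood of one good and one bad signal in each state (order does not matter).
lik_good_state = p_good_signal["good"] * (1 - p_good_signal["good"])   # 0.09
lik_bad_state = p_good_signal["bad"] * (1 - p_good_signal["bad"])      # 0.09

posterior_good = prior_good * lik_good_state / (
    prior_good * lik_good_state + (1 - prior_good) * lik_bad_state
)
print(posterior_good)   # 2/3: conflicting signals cancel, so the prior prevails
# Expected payoff of investing at this posterior: 2/3*1 + 1/3*(-1) = 1/3 > 0,
# so the agent holding two conflicting observations prefers to invest.
```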

We interpret the probability of terminating the decision process after receiving a particular signal as a search intensity targeting this particular signal. A higher probability of termination at a given information set inflates the likelihood that the agent makes the terminal choice at that set. We show that the failure of the second-thought-free condition with uniform search intensity in the above investment decision example indicates that, relative to the uniform search, the agent benefits from decreasing the search intensity for the bad signal. More generally, the second-thought-free condition follows from the first-order condition imposed on the optimal search intensities.

The model provides microfoundations to a range of behavioural stylised facts. The unifying principle of our behavioural insights is the intuition that the agent targets her search towards the type of evidence that would provide her with more valuable posteriors under the uniform search. This principle generates confirmation bias in the context of the above example, since evidence that confirms the agent’s prior leads to a more informed posterior than does evidence that contradicts the prior. An optimally targeted information search also generates speed-accuracy complementarity in the same setting; that is, the accuracy of choice declines with the response time. The effect is generated by the confirmation bias: the agent encountering evidence contradicting her prior is likely to disregard the evidence and to have a second thought. Hence, long response times indicate a surprising state of the world, and the constrained-optimal choice rule commits errors in the surprising state relatively often. Overweighting of rare events occurs in a related setting in which the agent’s task is to form a probability belief about an event that is known to be rare, such as a flight accident, by observing a random flight outcome. Since observing a flight accident is far more informative about the probability of future accidents than observing an uneventful flight, the agent optimally biases her search towards eventful flights. In the last behavioural application, we show in a setting with multiple states that distinct states of the world are salient in the sense that they attract the agent’s attention (i.e., trigger higher termination rates in our framework). The effect arises because an indistinct perception stimulus that can be generated by several similar states is less informative than a distinct perception stimulus that is most likely generated by a specific distinct state. Hence, the optimal information search targets stimuli indicating distinct states.

Our leading interpretation of the model is in terms of a single person with information-storage limitations but perfect ability to optimally adjust her termination strategies as a function of what she remembers. These adjustments can be viewed as the result of a successful trial-and-error process or as the result of evolutionary pressures, in which case the adjustments may not be fine-tuned to each specific problem.1 Alternatively, one may think of the termination strategies and the final decisions as being chosen by different persons, with only the one in charge of the final decision being subject to information-storage limitations, thereby allowing the termination strategies to be optimally determined.

A wide range of studies proposes different models of optimisation over information structures. Relative to rational inattention models (Sims, 2003), we provide a procedural microfoundation of the set of feasible information structures (which must be obtained through variations of the termination strategies and as such can straightforwardly be related to the time dimension of the decision process). Perhaps surprisingly, our memoryless agent shares some of the flexibility in her choice of information structures with the sequential-sampling model of Wald (1945), in which there is perfect information aggregation. But our approach allows for a simple derivation of the speed-accuracy complementarity, which is less immediate to obtain with sequential-sampling models (see, however, Fudenberg et al., 2018).

Relative to studies based on finite automata (Hellman and Cover, 1970; Wilson, 2014; Basu and Chatterjee, 2015), our approach yields a simple necessary condition for the optimal choice rule, the second-thought-free condition. This condition arises because our agent chooses the probability of termination at each of her information sets. Such a termination optimisation is absent from the related models with finite automata in which the termination is exogenous (or the objective involves asymptotic performance as time diverges). The second-thought-free condition allows us to characterise the optimal choice rule in the binary setting.

This article belongs to a growing economic literature that explains behavioural stylised facts as the constrained-optimal behaviour of decision-makers facing information processing frictions. For instance, Robson (2001), Rayo and Becker (2007), Netzer (2009), and Khaw et al. (2017) provide microfoundations for risk attitudes; Gabaix and Laibson (2017) endogenise discounting; and Wilson (2014), Compte and Postlewaite (2012), and Leung (2020) establish constrained-optimal ignorance of weakly informative news.2

1. Model

An agent faces a decision under uncertainty. She chooses an action a ∈ A and receives a payoff u(a, θ) in the fixed payoff state θ ∈ Θ drawn from an interior prior π ∈ Δ(Θ). The sets Θ and A are finite. The agent chooses a Blackwell experiment p, where p is a family of conditional signal distributions p(x ∣ θ) that depend on θ ∈ Θ. The experiment generates a signal realisation x from a finite space X. The conditional signal distributions are fully mixed: p(x ∣ θ) > 0 for all x, θ. We allow the agent to choose among possibly several such experiments and we let $\mathcal{P}$ denote the exogenous set of experiments from which she chooses. We impose no restrictions on $\mathcal{P}$ (other than the full support of each p).

The agent can repeat the selected experiment arbitrarily many times, but she is unable to aggregate the information across the repetitions. Each run of the experiment is a cognition that exhausts the agent’s capacity dedicated to the problem being solved. Once the agent hits the constraint at the end of the experiment, she can continue only after she unclogs her capacity by amnesia.

We model this as follows. The agent can condition the repetition of the experiment on the last observed signal realisation. She chooses a vector β = (βx)x∈X ∈ B = [0, 1]^|X| ∖ {(0, …, 0)} of termination probabilities βx for each signal realisation x; we call β a termination strategy. The agent runs the experiment p for the first time, receives signal realisation x(1) with probability p(x(1) ∣ θ) and terminates the reasoning with probability $\beta_{x^{(1)}}$. She restarts with the complementary probability $1-\beta_{x^{(1)}}$, receives a signal realisation x(2) from a new run of the process p with probability p(x(2) ∣ θ), terminates with probability $\beta_{x^{(2)}}$ or restarts with probability $1-\beta_{x^{(2)}}$, and continues until she terminates after a random number of repetitions of p; see Figure 1. When the agent chooses distinct βx for different x, she implements the familiar idea of selective memory: some facts and observations are easily forgotten whereas others are remembered and trigger choice. After the agent terminates the reasoning with a terminal signal realisation x, she selects an action a = σ(x) according to an action strategy $\sigma: X\longrightarrow A$.3 Let S be the set of all mappings from X to A.

Fig. 1.

For each (p, β, σ), the decision process is a Markov chain evolving on the agent’s states of mind, with transition probabilities that depend on the payoff state θ. The chain begins in the state of mind O and transits to states x ∈ X = {x1, x2} with probabilities p(x ∣ θ). The process returns to O with probability 1 − βx, or terminates with choice of a = σ(x) with probability βx.

By excluding the termination strategy (0, …, 0) we force the agent to make a decision a ∈ A. Since β ≠ (0, …, 0) and each feasible experiment p generates all signal values with a positive probability in each state, the decision process almost surely eventually terminates.

The outcomes of distinct runs of p are conditionally independent. Thus, the probability that the agent terminates after t repetitions of the experiment p resulting in the signal history $\mathbf{x}^t=(x^{(1)},\dots,x^{(t)})$ is
$$\begin{eqnarray} \rho \left(\mathbf {x}^t\mid \theta ;p,\beta \right)= \beta _{x^{(t)}}p(x^{(t)}\mid \theta )\prod _{l=1}^{t-1} \left(1-\beta _{x^{(l)}}\right)p\left(x^{(l)}\mid \theta \right). \end{eqnarray}$$
(1)
We let
$$\begin{eqnarray} r(a\mid \theta ;p,\beta ,\sigma )=\sum _{t=1}^\infty \sum _{\mathbf {x}^t:\sigma (x^{(t)})=a}\rho \left(\mathbf {x}^t\mid \theta ;p,\beta \right) \end{eqnarray}$$
(2)
denote the probability that the agent who employs the experiment p, the termination strategy β, and the action strategy σ terminates with action a in state θ. We call r(p, β, σ) ≔ (r(a ∣ θ; p, β, σ))a∈A, θ∈Θ the choice rule. The set of feasible choice rules is $\mathcal{R}(\mathcal{P})=\lbrace r(p,\beta,\sigma): p\in\mathcal{P},\beta\in B, \sigma\in S\rbrace$. Sometimes we abuse notation, omit p, β, σ and write r(a ∣ θ) for the probability of a in state θ under the rule constructed by some p, β, σ.
The repeated-cognition problem is to select, for a given prior π, utility function u and set $\mathcal{P}$, a feasible choice rule r that maximises the expected payoff:
$$\begin{eqnarray} \max _{r\in \mathcal {R}(\mathcal {P})}\sum _{\theta \in \Theta ,a\in A}\pi _{\theta }r(a\mid \theta )u(a,\theta ). \end{eqnarray}$$
(3)

The optimisation in (3) can be an outcome of selective pressures that favour successful decision procedures via cultural or biological evolution, or via competition of firms differing in their internal procedures. There are no costs of delaying the decision in our model, but incorporating such costs would not affect our qualitative insights as long as they are not too large. We address agents with less severe memory constraints and with exponential time-discounting preferences in Section 6.
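For readers who prefer a computational illustration, the following sketch simulates the decision process behind (1)–(3) and estimates the choice rule r(a ∣ θ) by Monte Carlo; the experiment, termination strategy and action strategy below are hypothetical values in the spirit of the introductory example, not part of the formal model.

```python
import random

def simulate_choice(p, beta, sigma, theta, rng):
    """Run the memoryless process once in state theta; return (action, rounds)."""
    t = 0
    while True:
        t += 1
        # Draw a signal from the primitive experiment p(. | theta).
        x = rng.choices(list(p[theta]), weights=list(p[theta].values()))[0]
        # Terminate with probability beta[x]; otherwise forget x and redraw.
        if rng.random() < beta[x]:
            return sigma[x], t

def estimate_rule(p, beta, sigma, n=100_000, seed=0):
    """Monte Carlo estimate of r(a | theta) for every action and state."""
    rng = random.Random(seed)
    counts = {(a, th): 0 for th in p for a in set(sigma.values())}
    for th in p:
        for _ in range(n):
            a, _ = simulate_choice(p, beta, sigma, th, rng)
            counts[(a, th)] += 1
    return {k: v / n for k, v in counts.items()}

# Hypothetical specification: states 'good'/'bad', signals 'g'/'b'.
p = {"good": {"g": 0.9, "b": 0.1}, "bad": {"g": 0.1, "b": 0.9}}
beta = {"g": 1.0, "b": 0.5}            # discard a bad signal half of the time
sigma = {"g": "invest", "b": "pass"}   # invest iff the remembered signal is good
print(estimate_rule(p, beta, sigma))
```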

2. Optimal Cognition Biases

We now derive a necessary optimality condition that the choice rule solving the repeated-cognition problem must satisfy. Generically, the condition requires the agent to engage in selective information processing—that is, to ignore some signals more often than others.

2.1. Second-Thought-Free Choice Rules

We start with a definition of second-thought-free choice rules. If the agent’s decision process generates such a rule, then she has no incentive to rerun the process regardless of the action recommendation with which the process terminates. Our main result below states that the optimal rule is second-thought-free.

Let r be a generic stochastic choice rule that specifies conditional probabilities r(a ∣ θ) of each action a ∈ A in each state θ ∈ Θ.

 
Definition 1.
The choice rule r is second-thought-free with respect to the utility u and prior π if the agent prefers each action recommended by the rule to a new run of the rule r. That is, for each action a chosen with positive probability,
$$\begin{eqnarray} \operatorname{E}_{\alpha }[u(a_1,\theta )\mid a_{1}=a]\ge \operatorname{E}_{\alpha }[u(a_{2},\theta )\mid a_{1}=a], \end{eqnarray}$$
(4)
where the expectations are with respect to the random variables θ and a2, and α(θ, a1, a2) = πθr(a1∣ θ)r(a2∣ θ) is the joint distribution of the state and two actions consecutively generated by r.

The definition requires the agent who terminates with an action plan a to weakly prefer a to forgetting a and choosing whichever action a new run of the decision process will recommend. Although the definition allows for the strict preference against having a second thought, the next lemma shows that if a choice rule is second-thought-free, then the agent is indifferent between terminating and the second thought.
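A minimal numerical check of condition (4) can be written as follows; the dictionary-based representation of π, u and r is an assumption of this sketch, not part of the model.

```python
def second_thought_free(pi, u, r, tol=1e-9):
    """Compare both sides of (4)/(5) for each action chosen with positive probability.

    pi: dict state -> prior; u: dict (action, state) -> payoff;
    r: dict (action, state) -> choice probability r(a | theta)."""
    states, actions = list(pi), {a for (a, _) in r}
    out = {}
    for a in actions:
        mass = sum(pi[th] * r[(a, th)] for th in states)        # Pr(a1 = a)
        if mass <= tol:
            continue
        keep = sum(pi[th] * r[(a, th)] * u[(a, th)] for th in states) / mass
        rerun = sum(pi[th] * r[(a, th)] * r[(a2, th)] * u[(a2, th)]
                    for th in states for a2 in actions) / mass
        out[a] = (keep, rerun)       # the two entries coincide iff (5) holds at a
    return out
```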

 
Lemma 1.
If r is second-thought-free, then (4) is met with equality for each action a chosen with positive probability:
$$\begin{eqnarray} \operatorname{E}_\alpha [u(a_1,\theta )\mid a_1=a]=\operatorname{E}_\alpha [u(a_2,\theta )\mid a_1=a]. \end{eqnarray}$$
(5)

We refer to (5) as the second-thought-free condition.

 
Proof.
If (4) holds with strict inequality for some a chosen with positive probability, then
$$\begin{eqnarray} \operatorname{E}_\alpha \left[u(a_1,\theta )\right] =\operatorname{E}_\alpha \left[\operatorname{E}_\alpha \left[u(a_1,\theta )\mid a_1\right] \right] \gt \operatorname{E}_\alpha \left[\operatorname{E}_\alpha \left[u(a_2,\theta )\mid a_1\right] \right]=\operatorname{E}_\alpha \left[u(a_2,\theta )\right], \end{eqnarray}$$
which contradicts the fact that a1 and a2 are identically distributed conditional on θ and hence Eα[u(a1, θ)] = Eα[u(a2, θ)].

2.2. Optimality Condition

We provide here a general necessary optimality condition imposed on the stochastic choice rule.

 
Proposition 1.

If a choice rule solves the repeated-cognition problem (3), then it is second-thought-free and satisfies (5).

To understand the statement, consider the optimal choice rule r* generated by a process that consists of a random number of repetitions of a primitive cognition p. Once these repetitions of p terminate with a signal realisation x and the agent is about to take an action a = σ(x), then, according to the proposition, she must be indifferent between a and running the process associated with r* from scratch, where the new run of r* would involve new repetitions of p.

To prove Proposition 1, we introduce an effective experiment s(p, β). While the primitive experiment p(x ∣ θ) specifies the probability that a single run of it results in signal x, we define s(x ∣ θ; p, β) to be the probability that selective repetitions of p according to the termination strategy β terminate with x. Relative to p(x ∣ θ), the effective probabilities s(x ∣ θ; p, β) are inflated for those x at which the agent terminates with a high probability βx.

 
Lemma 2.
An agent who employs a primitive experiment p and a termination strategy β terminates with x in state θ with probability:
$$\begin{eqnarray} s(x\mid \theta ;p,\beta )=\frac{\beta _{x}p(x\mid \theta )}{\sum _{x^{\prime }\in X}\beta _{x^{\prime }}p\left( x^{\prime }\mid \theta \right) }. \end{eqnarray}$$
(6)
 
Proof.
Experiment s(p, β) satisfies the recursive formula
$$\begin{eqnarray} s(x\mid \theta ;p,\beta )=\beta _x p(x\mid \theta )+\sum _{x^{\prime }\in X} \left(1-\beta _{x^{\prime }}\right)p\left(x^{\prime }\mid \theta \right) s(x\mid \theta ;p,\beta ). \end{eqnarray}$$
The first summand is the probability that the agent terminates with signal x after the first run of the experiment p. The second summand is the probability that the agent continues after the first run and terminates with x later. Solving for s(x∣ θ; p, β) gives (6).

The lemma implies that s(p, β) and hence also r(p, β, σ) are homogeneous of degree zero with respect to β. Thus, since we abstract from the delay costs, for any optimal termination strategy β*, αβ* for α ∈ (0, 1) is optimal too, and it generates the same optimal choice rule r* as β*. This multiplicity of implementation of the optimal choice rule would disappear in a natural approximation of our model with an exponential discount factor approaching 1. Such an approximation would select the quickest available decision process that implements the optimal feasible rule r*; that is, it would impose that maxx∈X βx = 1.
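The closed form (6) and its degree-zero homogeneity in β are easy to verify numerically; the experiment and termination strategies in this sketch are hypothetical.

```python
def effective_experiment(p, beta):
    """s(x | theta; p, beta) from equation (6)."""
    return {(x, th): beta[x] * p[th][x] / sum(beta[y] * p[th][y] for y in p[th])
            for th in p for x in p[th]}

p = {"good": {"g": 0.9, "b": 0.1}, "bad": {"g": 0.1, "b": 0.9}}
beta = {"g": 1.0, "b": 0.5}
half = {x: 0.5 * b for x, b in beta.items()}   # a uniformly scaled-down strategy

s1, s2 = effective_experiment(p, beta), effective_experiment(p, half)
# Homogeneity of degree zero: scaling beta leaves the effective experiment unchanged.
assert all(abs(s1[k] - s2[k]) < 1e-12 for k in s1)
```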

 
Proof of Proposition 1.
Using (6), we rewrite the objective as follows.
$$\begin{eqnarray} \max _{p\in \mathcal {P},\beta \in B,\sigma \in S}\sum _{\theta \in \Theta ,x\in X}\pi _{\theta }\frac{\beta _{x}p(x\mid \theta )}{\sum _{x^{\prime }\in X}\beta _{x^{\prime }}p\left( x^{\prime }\mid \theta \right) }u(\sigma (x),\theta ). \end{eqnarray}$$
(7)
Let rule r(p*, β*, σ*) solve the repeated-cognition problem. Consider an action a chosen with a positive probability and x such that σ*(x) = a and $\beta^*_x>0$. The constraint βx ≥ 0 is not binding for this x, and the first-order condition of (7) with respect to βx is:
$$\begin{eqnarray} \sum _{\theta \in \Theta } \pi _\theta \frac{s\left(x\mid \theta ;p^*,\beta ^*\right)}{\beta ^*_x} u(a,\theta ) - \sum _{\theta \in \Theta ,x^{\prime }\in X} \pi _\theta s\left(x^{\prime }\mid \theta ;p^*,\beta ^*\right)\frac{s\left(x\mid \theta ;p^*,\beta ^*\right)}{\beta ^*_x} u\left(\sigma ^*(x^{\prime }),\theta \right) &=& \\ \sum _{\theta \in \Theta } \pi _\theta \frac{s\left(x\mid \theta ;p^*,\beta ^*\right)}{\beta ^*_x} u(a,\theta ) - \sum _{\theta \in \Theta ,a^{\prime }\in A} \pi _\theta r\left(a^{\prime }\mid \theta ;p^*,\beta ^*,\sigma ^*\right)\frac{s\left(x\mid \theta ;p^*,\beta ^*\right)}{\beta ^*_x} u\left(a^{\prime },\theta \right) &\ge &0. \end{eqnarray}$$
Multiplication by $\beta^*_x$, summation over all x such that σ*(x) = a, and division by ∑θ πθ r(a ∣ θ; p*, β*, σ*) gives (4). Thus, the terminating agent weakly prefers termination to continuation. Lemma 1 implies the indifference (5).

Since the objective function in (7) is homogeneous of degree zero with respect to β, we can restrict β to the simplex over the signal set X. This simplex is compact and the objective function in (7) is continuous in β and in the probabilities p(x ∣ θ); hence the repeated-cognition problem has a solution whenever the set of the primitive experiments $\mathcal{P}$ is compact.

2.2.1. Comment

Our agent can be viewed as having imperfect recall in the sense of Piccione and Rubinstein (1997). Our approach corresponds to their ex ante approach, and the insight of Proposition 1 can be related to the observation in their absent-minded driver example that the ex ante optimal solution is also a (modified) multi-self equilibrium in which the decision problem is viewed as a team composed of multiple selves all sharing the decision-maker’s objective.

3. Analytical Solution of the Binary Setting

The action and state sets are binary: A = Θ = {0, 1}. To avoid a trivial case, we assume that neither action is dominant. Then, without loss of generality, u(a, θ) = uθ > 0 if a = θ and u(a, θ) = 0 otherwise. State θ is drawn from an interior prior π ∈ Δ(Θ). The exogenous set $\mathcal{P}$ of feasible statistical experiments is finite, and each $p\in\mathcal{P}$ delivers a signal x from a finite signal space X with probability p(x ∣ θ). The agent chooses $p\in\mathcal{P}$, the termination strategy β = (βx)x∈X and the action strategy $\sigma: X\longrightarrow A$ to maximise the expected payoff.

The first result states that there exists a solution in which the agent ignores all but two signal realisations of the chosen experiment p. That is, she always repeats the experiment upon encountering all but two signals. Roughly, the result follows because it is advantageous to consider only the most informative signal realisations.4

 
Lemma 3.

There exists a solution in which the termination probability βx is positive for at most two signal values x ∈ X.

See the Appendix for the proofs omitted from the main text.

Based on the lemma, we can, without loss of generality, restrict the signal space X to be binary, and identify it with the action and state spaces, X = A = Θ. Again without loss of generality, we choose signal labels in each experiment in such a way that each experiment $p\in\mathcal{P}$ satisfies the monotone likelihood ratio property: p(1 ∣ θ)/p(0 ∣ θ) increases in θ. We continue to assume that p(x ∣ θ) > 0 for all x and θ.

Define σI to be the identity function, and let the agent employ the binary experiment p and the action strategy σI. The next lemma characterises the set $\mathcal{R}_{p,\sigma_{I}}=\lbrace r(p,\beta,\sigma_{I}):\beta\in B\rbrace$ of the feasible choice rules that such an agent has access to. To characterise this set, we introduce a parameter that we dub the perceptual distance between states 0 and 1 under the experiment p:
$$\begin{eqnarray} d_{p}=\frac{p(1\mid 1)p(0\mid 0)}{p(0\mid 1)p(1\mid 0)}. \end{eqnarray}$$
The perceptual distance is a summary statistic of the experiment p. The larger it is, the more reliably p discriminates between the two states. The monotone likelihood ratio property of each p implies that dp > 1. The next lemma states that the perceptual distance is preserved under any termination strategy β.
 
Lemma 4.

$\mathcal{R}_{p,\sigma_I}=\lbrace r: r(1\mid 1)r(0\mid 0)=d_p\, r(1\mid 0)r(0\mid 1)\rbrace$.

That is, a rule r can be constructed from p if and only if it preserves the perceptual distance, $\frac{r(1\,\mid\, 1)r(0\,\mid\, 0)}{r(0\,\mid\, 1)r(1\,\mid\, 0)}=d_{p}$ (or if it always selects the same action). By controlling the termination strategy β, the agent trades off the likelihoods r(0 ∣ 0; p, β, σI) and r(1 ∣ 1; p, β, σI) of the correct choice in states 0 and 1, respectively. See Figure 2. The set $\mathcal{R}_{p,\sigma_{I}}$ of rules accessible from p is compact.
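The invariance stated in Lemma 4 can be checked directly: for any termination strategy β, the rule r(p, β, σI) computed from (6) preserves the perceptual distance of p. The experiment used in this sketch is hypothetical.

```python
def rule(p, beta):
    """r(x | theta) under the identity action strategy, via equation (6)."""
    return {(x, th): beta[x] * p[th][x] / sum(beta[y] * p[th][y] for y in p[th])
            for th in p for x in p[th]}

p = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}       # hypothetical binary experiment
d_p = p[1][1] * p[0][0] / (p[1][0] * p[0][1])         # perceptual distance of p
for beta in ({0: 1.0, 1: 1.0}, {0: 0.2, 1: 0.9}, {0: 0.7, 1: 0.05}):
    r = rule(p, beta)
    d_r = r[(1, 1)] * r[(0, 0)] / (r[(1, 0)] * r[(0, 1)])
    assert abs(d_r - d_p) < 1e-9                       # Lemma 4: d_p is preserved
```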

Fig. 2.

Each point in $[0, 1]^2$ on this graph corresponds to a choice rule. The depicted curves are the sets $\mathcal{R}_{p, \sigma_I}$ of the choice rules constructible from experiments p and the action strategy σI. The thick curve corresponds to the experiment $\overline{p}$ with the maximal perceptual distance $\overline{d}$. Since the objective is linear in the choice rule, the indifference curves are downward-sloping lines. The dashed line is the indifference curve tangential to $\mathcal{R}_{\overline{p}, \sigma_I}$. The dot depicts the solution of the repeated-cognition problem.

Fig. 3.

Confirmation bias with discounting. Action 1 is a priori more attractive: π1u1 = 5 × π0u0. The primitive experiment is symmetric: p(1 ∣ 1) = p(0 ∣ 0) = 0.9. The agent terminates immediately when she observes signal value 1, $\beta^*_1=1$. When δ > 0.71, the agent is biased towards state 1: when she encounters signal value 0, she terminates the decision process only with probability $\beta^*_0(\delta)<1$ (the full curve). The dotted line is $\beta^*_0/\beta^*_1$ from the baseline model without discounting.

Thanks to the chosen labelling of the signals, the agent can equate her choice to the observed signal without loss:

 
Lemma 5.

For any rule r(p, β, σ) there exists β′ such that the rule r(p, β′, σI) achieves at least as high an expected payoff as r(p, β, σ), where σI is the identity function.

The solution to the repeated-cognition problem in the binary setting exists since the objective is continuous in the choice rule and the agent optimises over the compact set $\bigcup_{p\in\mathcal{P}}\mathcal{R}_{p,\sigma_I}$ of rules.

Let $\overline{p}$ be the experiment with the maximal perceptual distance, $\overline{p}\in\arg\max_{p\in\mathcal{P}}d_{p}$, and let $\overline{d}=\max_{p\in\mathcal{P}}d_{p}$. In line with the intuition that the agent should go for the most informative experiment, we establish:

 
Lemma 6.

There exists a solution to the repeated-cognition problem in which the agent employs the experiment $\overline{p}$ with the maximal perceptual distance.

The last lemma implies that all details of the set $\mathcal{P}$ relevant for the solution are summarised in the one-dimensional statistic $\overline{d}$ that is independent of the payoff function u.5

We are now ready to solve the binary setting. The optimal effective choice rule $r^*(a\mid\theta)=r(a\mid\theta;\overline{p},\beta^*,\sigma_I)$ consists of four unknown probabilities and is determined by four conditions: the second-thought-free condition (5), the feasibility condition from Lemma 4, and two normalisation conditions. Let the parameter $R=\frac{\pi_1 u_1}{\pi_0 u_0}$ measure the relative a priori attractiveness of action 1.

 
Proposition 2.

  1. When $R\ge\overline{d}$, then the agent always chooses action 1;

  2. when $R\le 1/\overline{d}$, then the agent always chooses action 0;

  3. when $R\in (1/\overline{d},\overline{d})$, then the agent chooses both actions with positive probabilities and
    $$\begin{eqnarray} r^{\ast }(1\mid 1)=\frac{\overline{d}R-\sqrt{\overline{d}R}}{(\overline{d} -1)R}\mbox{, }r^{\ast }(0\mid 0)=\frac{\overline{d}-\sqrt{\overline{d}R}}{ \overline{d}-1}, \end{eqnarray}$$
    (8)
    $$\begin{eqnarray} \frac{\beta _{1}^{\ast }}{\beta _{0}^{\ast }}=\frac{\overline{d}R-\sqrt{ \overline{d}R}}{\sqrt{\overline{d}R}-R}\frac{\overline{p}(0\mid 1)}{\overline{p}(1\mid 1)}. \end{eqnarray}$$
    (9)

When the ex ante attractiveness of one of the actions is too strong relative to the perceptual distance of the two states, then the agent always chooses the ex ante attractive action. The decision process is non-trivial for intermediate incentives: the agent engages in repeated cognition and she chooses both actions with positive probabilities.
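For a concrete illustration, the following sketch evaluates the closed forms (8) and (9) for a state-recognition version of the introductory example (prior 2/3 on state 1, unit rewards, a symmetric experiment with precision 0.9); these parameter values are our own choice, and the rounded outputs are only indicative.

```python
from math import sqrt

def binary_solution(pi1, u1, u0, p11, p00):
    """Closed-form (8)-(9) for the interior case R in (1/d, d)."""
    d = p11 * p00 / ((1 - p11) * (1 - p00))        # perceptual distance
    R = pi1 * u1 / ((1 - pi1) * u0)                # relative attractiveness of action 1
    assert 1 / d < R < d, "interior case only"
    r11 = (d * R - sqrt(d * R)) / ((d - 1) * R)
    r00 = (d - sqrt(d * R)) / (d - 1)
    beta_ratio = (d * R - sqrt(d * R)) / (sqrt(d * R) - R) * (1 - p11) / p11
    return r11, r00, beta_ratio

r11, r00, ratio = binary_solution(pi1=2/3, u1=1.0, u0=1.0, p11=0.9, p00=0.9)
print(round(r11, 3), round(r00, 3), round(ratio, 3))   # approx. 0.933, 0.853, 1.546
```

The ratio above 1 confirms that the search is tilted towards the a priori more attractive signal value 1.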

4. Behavioural Applications

This section presents three behavioural effects illustrated in the binary setting from Section 3: confirmation bias, speed-accuracy complementarity, and overweighting of rare events.

4.1. Confirmation Bias

Psychologists and economists distinguish at least three mechanisms leading to confirmation bias: (i) People search for evidence selectively, targeting the evidence type in accord with their priors, e.g., Nickerson (1998), (ii) they selectively memorise and recall the data supporting their priors, e.g., Oswald and Grosjean (2004), and (iii) they selectively interpret ambiguous evidence, e.g., Rabin and Schrag (1999) and Fryer et al. (2018). We focus on the first two mechanisms and interpret them in light of our optimal repeated-cognition result.

 
Corollary 1.

When action 1 is a priori more attractive, R ∈ (1, d), and the unique primitive binary experiment is symmetric, p(1 ∣ 1) = p(0 ∣ 0) > 1/2, then the agent searches relatively more intensively for signal value 1: $\beta^*_1>\beta^*_0$.

 
Proof.
Since $\beta^*_1/\beta^*_0$ increases in R, it suffices to show that $\beta^*_1/\beta^*_0=1$ when R = 1 and the primitive experiment p is symmetric. Indeed, when R = 1, then by (9),
$$\begin{eqnarray} \frac{\beta ^*_1}{\beta ^*_0}=\sqrt{d}\frac{p(0\mid 1)}{ p(1\mid 1)}=\sqrt{\frac{p(0\mid 0)p(0\mid 1)}{p(1\mid 1) p(1\mid 0)}}=1, \end{eqnarray}$$
where $d=\frac{p(0\,\mid\, 0)p(1\,\mid\, 1)}{p(1\,\mid\, 0)p(0\,\mid\, 1)}$ and the last equality follows from the symmetry of p.

To see the connection to confirmation bias, consider, as in our introductory example, an agent whose task is to announce the realised state of the world: she receives reward u1 = u0 = 1 if she makes the correct announcement and 0 otherwise. The agent finds the state θ = 1 a priori more likely than the state 0, π1 > π0. Consider the decision process that terminates immediately after the first run of the experiment and chooses the action equal to the observed signal value: β0 = β1 = 1, σ = σI. To establish that such an unbiased process is suboptimal, it suffices to show that it does not satisfy the second-thought-free condition. To see this, we examine the agent who has received the a priori unlikely signal 0 and argue that she benefits from a second thought. Such a surprised agent is better off restarting instead of terminating with action 0: if the new run of the process concludes with signal 0 again, then the second thought will have been inconsequential; if, however, the new run concludes with signal and action 1, then the induced switch from action 0 to 1 is beneficial. This is because when the experiment p is symmetric, then, conditional on two conflicting signals, the a priori more common state 1 is relatively more likely. The agent benefits from a second thought whenever she receives the surprising recommendation, and thus will deviate from the uniform search in favour of the a priori likely signal value 1.

The optimal strategy resembles the natural process in which selective memory gives rise to confirmation bias. Consider the fastest optimal strategy, letting $\beta_{1}^{\ast}=1$. When the agent observes signal 1, which confirms her prior belief, she terminates and immediately announces the state 1. But if she is surprised, observing signal 0 that contradicts her prior, then she discards the signal with the positive probability $1-\beta_{0}^{\ast}$ and repeats the experiment. Although finding the exact optimal value $\beta_{0}^{\ast}$ may be difficult, the fact that double-checking one’s own reasoning when one arrives at a surprising conclusion is a common practice suggests that people are able to deviate from the unbiased information-acquisition process in the payoff-improving direction.

4.1.1. Comments

(1) The above insight can receive an alternative political economy interpretation. The two states θ represent left- vs right-wing policy. Consider a right-wing newspaper that targets the right-wing readers, viewed as having a prior belief in favour of the right-wing policy. Readers can only absorb one piece of information (the analogue of our information-storage constraint) and the newspaper has to decide which piece of information x, as generated by p(x ∣ θ), to choose as its headline. Our model explains why such newspapers would target their search toward evidence favouring the right-wing policy.6

(2) Meyer (1991) studies optimal biases in a sequential-learning problem of an agent who receives a sequence of signals and, unlike our agent, can aggregate the sequence. Meyer’s main insight is that some asymmetries in the signal structure are optimal. Although optimal asymmetries arise both in her and our frameworks, the two papers study distinct optimisations. While our agent controls termination probabilities in a stationary decision process, Meyer’s agent controls the choice of a Blackwell experiment in each round of a non-stationary process.

4.2. Speed-Accuracy Complementarity

Our model generates the speed-accuracy complementarity effect—a stylised fact stating that delayed choices tend to be less accurate than speedy choices; see the psychology studies of Swensson (1972) and Luce (1986). We establish this effect in the setting from the previous subsection.

Let φ(θ, a, t) be the joint probability distribution of the state θ, chosen action a, and the reaction time t generated by the solution (p, β*, σI) of the repeated-cognition problem.

 
Corollary 2.

When action 1 is a priori more attractive, R ∈ (1, d), and the unique primitive binary experiment is symmetric, p(1 ∣ 1) = p(0 ∣ 0) > 1/2, then the probability Prφ(a = θ ∣ t) of the correct choice decreases with the response time t.

Due to the stationarity of the decision process, the probability of the correct choice conditional on the payoff state is independent of the reaction time: Prφ(a = θ ∣ θ, t) = Prφ(a = θ ∣ θ). At the optimum, this conditional probability of the correct choice is larger in the a priori more attractive state 1 than in state 0, reflecting the relative weights of the two states in the objective. Overall, unconditionally on the payoff state, the probability Prφ(a = θ ∣ t) of the correct choice depends on the response time because t correlates with θ. A long response time indicates that the agent has repeatedly encountered the signal value 0 and has hesitated to terminate. Hence, conditional on large t, the likelihood of the unattractive state 0 becomes high. The longer the agent has hesitated, the more likely it is that she is facing the unattractive state, in which she makes more mistakes.

 
Proof.
$\beta^*_1>\beta^*_0$ by Corollary 1. We let $f_\theta=\beta^*_1 p(1\mid\theta)+\beta^*_0 p(0\mid\theta)$ denote the probability of termination in each round in state θ. The response time t in state θ is geometrically distributed with the decision rate fθ: $\Pr_\varphi(t\mid\theta)= f_\theta(1-f_\theta)^t$ for t = 0, 1, …. Since p(1 ∣ 1) = p(0 ∣ 0) > p(1 ∣ 0) = p(0 ∣ 1) and $\beta^*_1>\beta^*_0$, the decision rate is higher in state 1 than in state 0: f1 > f0. Thus, the likelihood ratio $\Pr_\varphi(t\mid\theta=1)/\Pr_\varphi(t\mid\theta=0)$ decreases with t, and hence $\Pr_\varphi(\theta=1\mid t)$ decreases in t. The fact that $\beta^*_1>\beta^*_0$ and the symmetry of p imply that the probability of the correct choice is larger in state 1 than in state 0:
$$\begin{eqnarray} r\left(1\mid 1;p,\beta ^*,\sigma _I\right)&=& \frac{\beta ^*_1 p(1\mid 1)}{\beta ^*_0p(0\mid 1)+\beta ^*_1p(1\mid 1)}\gt \frac{\beta ^*_0p(0\mid 0)}{\beta ^*_0 p(0\mid 0)+\beta ^*_1p(1\mid 0)} \\ &=& r\left(0\mid 0;p,\beta ^*,\sigma _I\right). \end{eqnarray}$$
Since $\Pr_\varphi(a=\theta\mid t)= \Pr_\varphi(\theta=1\mid t)\,r(1\mid 1;p,\beta^*,\sigma_I)+\Pr_\varphi(\theta=0\mid t)\,r(0\mid 0;p,\beta^*,\sigma_I)$, the result obtains.
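A simulation sketch of this mechanism illustrates how the estimated accuracy Prφ(a = θ ∣ t) declines with the response time; the termination strategy below is a hypothetical one in which β1 = 1 and β0 is set to the reciprocal of the ratio computed from (9) in the earlier sketch.

```python
import random
from collections import defaultdict

def accuracy_by_time(pi1, p11, beta1, beta0, n=300_000, seed=1):
    """Estimate Pr(a = theta | t) for the symmetric binary recognition task."""
    rng = random.Random(seed)
    hits, totals = defaultdict(int), defaultdict(int)
    for _ in range(n):
        theta = 1 if rng.random() < pi1 else 0
        t = 0
        while True:
            t += 1
            p1 = p11 if theta == 1 else 1 - p11          # P(signal 1 | theta)
            x = 1 if rng.random() < p1 else 0
            if rng.random() < (beta1 if x == 1 else beta0):
                break                                     # terminate, announce a = x
        totals[t] += 1
        hits[t] += (x == theta)
    # Report only response times with enough observations.
    return {t: round(hits[t] / totals[t], 3) for t in sorted(totals) if totals[t] > 2_000}

print(accuracy_by_time(pi1=2/3, p11=0.9, beta1=1.0, beta0=1 / 1.546))
```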

4.2.1. Comment

The predictions of our model are in line with the evidence on state-recognition problems reported in Ratcliff and McKoon (2008), according to which: (i) the posterior probability of correct recognition is higher when announcing the a priori more likely state, and (ii) late announcements are relatively less precise. This is in contrast to the prediction of the traditional Wald model (see Fudenberg et al. (2018) for an elaboration of the Wald model in which the stakes attached to the correct recognition are a priori unknown).

4.3. Overweighting of Rare Events

We consider a state-recognition task in which the two actions are a priori equally attractive, R = π1u1/(π0u0) = 1, and π0 = π1 = 1/2. In contrast to the previous applications, the distribution of the signal values x = 0, 1 is asymmetric across states. Specifically, the probability of x = 1 in state θ is ρθ ∈ (0, 1) and the probability of x = 0 is 1 − ρθ. We assume, essentially without loss of generality, that ρ0 < ρ1 < 1 − ρ0.7 The a priori probability of the event x = 1 is (ρ0 + ρ1)/2 < 1/2, and thus the event x = 1 is relatively rarer than x = 0. The next result states that at the optimum, the agent is relatively more likely to discard the more common event x = 0, in agreement with Kahneman and Tversky (1979), who observe that agents tend to overweight rare events.

 
Corollary 3.

When the two actions are a priori equally attractive, R = 1, then the agent is biased in favour of the event x = 1: $\beta_{1}^{\ast}>\beta_{0}^{\ast}>0$ (and her guess of the state equals the observed signal realisation, i.e., σ = σI).

 
Proof.
This task is a special case of our binary setting with the primitive experiment p(x ∣ θ) = ρθ if x = 1, p(x ∣ θ) = 1 − ρθ if x = 0 and with a priori equally attractive actions, R = 1. Since ρ0 < ρ1, the labelling of the signals satisfies the monotone likelihood ratio property. Since R = 1 ∈ (1/d, d), Proposition 2 implies that the agent’s behaviour is stochastic, both $\beta^*_0$ and $\beta^*_1$ are positive, and the ratio of the search intensities $\beta^*_1/\beta^*_0$ satisfies (9). Since R = 1, (9) simplifies to
$$\begin{eqnarray} \frac{\beta ^*_1}{\beta ^*_0}=\sqrt{d}\frac{p(0\mid 1)}{p(1\mid 1)}=\sqrt{\frac{ p(0\mid 1) p(0\mid 0)}{ p(1\mid 1)p(1\mid 0)}}= \sqrt{\frac{(1-\rho _1)(1-\rho _0)}{\rho _1\rho _0}}. \end{eqnarray}$$
The inequality $\beta^*_1>\beta^*_0$ follows from ρ1 < 1 − ρ0.

For illustration, consider a formation of belief over the probability of a flight accident. The accident probability per flight in the safe state of the world is 10−6, whereas it is 10−5 in the dangerous state of the world, and both states are a priori equally likely. The agent can sequentially observe arbitrarily many past flight outcomes, but cannot aggregate the information, and recalls only the last observed flight. She guesses that the state of the world is dangerous if and only if the last observed flight is eventful.

Consider first an agent who always terminates right after the observation of the first data-point. Such an agent benefits from a ‘second thought' whenever she observes an uneventful flight: either the second observed flight will be uneventful, in which case the second thought will have been inconsequential, or the redrawn flight will be eventful and the agent will switch her assessment from the safe to the dangerous state. Such a switch is beneficial since conditional on two contradicting data-points the dangerous state is relatively more likely. Thus, relative to the immediate termination strategy, the agent will benefit from discarding the uneventful flight observations with positive probability.
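The numbers of this illustration can be traced explicitly; the sketch below computes the posterior of the dangerous state after one eventful and one uneventful observation, and the optimal search-intensity ratio from Corollary 3.

```python
from math import sqrt

rho_safe, rho_danger = 1e-6, 1e-5          # accident probabilities per flight
prior_danger = 0.5

# Posterior of the dangerous state after one eventful and one uneventful flight.
lik_danger = rho_danger * (1 - rho_danger)
lik_safe = rho_safe * (1 - rho_safe)
posterior = prior_danger * lik_danger / (prior_danger * lik_danger
                                         + (1 - prior_danger) * lik_safe)
print(round(posterior, 3))                 # ~0.909: the dangerous state dominates

# Optimal search-intensity ratio from Corollary 3 (R = 1): eventful flights are targeted.
ratio = sqrt((1 - rho_danger) * (1 - rho_safe) / (rho_danger * rho_safe))
print(f"{ratio:.0f}")
```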

5. A Tractable Setting with Multiple States

We now present a class of settings with multiple payoff states that admits a simple analytical solution in the form of a system of linear equations for the optimal termination probabilities. Subsection 5.1 applies this solution in a stylised example that explains salience of perceptually distinct states as a second-best adaptation.

The agent faces a perceptual task that requires her to announce a realisation of the state θ ∈ Θ drawn from a fully mixed prior π ∈ Δ(Θ), where 2 < |Θ| < ∞. She is endowed with a primitive perception technology that generates a perceived value θ′ of the state. The primitive perception is informative but noisy: the perceived value θ′ equals the true state θ with a high probability, but mistakes, θ′ ≠ θ, occur sometimes. We view the primitive perception technology as a black-box model of a physiological sensor that generates a noisy impression θ′ of the true state θ. The agent can use the sensor repeatedly but is not able to aggregate the information. She conditions the repetition of the sensor’s use on the most recent perception and announces the terminal perception.

We formalise this perception task as follows. The agent makes an announcement a ∈ A = Θ, where 2 < |Θ| < ∞, and receives payoff u(a, θ) = uθ > 0 if her announcement is correct, a = θ, and u(a, θ) = 0 if a ≠ θ. Each use of the agent’s sensor generates a signal value/perception θ′ ∈ X = Θ, with conditional probabilities p(θ′ ∣ θ) > 0. The set $\mathcal{P}$ is the singleton {p}. We make the following assumption.

Symmetry: p(θ′ ∣ θ) = p(θ ∣ θ′).

The symmetry assumption leads to a significant simplification of the second-thought-free condition described in Lemma 9 in the Appendix. Additionally, we make the simplifying assumption that the agent uses the identity action strategy σI; she announces the state equal to her last perception. We also assume that the optimal termination probabilities βx are positive for all x ∈ Θ.8 Let r* = r(p, β*, σI) be the optimal feasible choice rule.

 
Proposition 3.
The optimal termination probabilities satisfy the system of linear equations,
$$\begin{eqnarray} \sum _{\tilde{\theta }\in \Theta }\beta ^*_{\tilde{\theta }}\frac{p(\tilde{\theta }\mid \theta )}{\left(\pi _\theta u_\theta p(\theta \mid \theta )\right)^{1/2}}= \sum _{\tilde{\theta }\in \Theta }\beta ^*_{\tilde{\theta }}\frac{p(\tilde{\theta }\mid \theta ^{\prime })}{\left(\pi _{\theta ^{\prime }} u_{\theta ^{\prime }} p(\theta ^{\prime } \mid \theta ^{\prime })\right)^{1/2}} \mbox{ for all }\theta ,\theta ^{\prime }\in \Theta . \end{eqnarray}$$
(10)

The proposition implies that the decision rate $f_\theta=\sum_{\tilde{\theta}}\beta^*_{\tilde{\theta}}p(\tilde{\theta}\mid\theta)$ in each state θ is proportional to (πθuθp(θ ∣ θ))1/2 and thus is high in those states that are reliably identified by the primitive experiment and in which the ex ante expected reward for the correct state recognition is high.

5.1. Salience

Bordalo et al. (2012) interpret salience as directed attention focus. They quote the popular work by Daniel Kahneman (2011):

‘Our mind has a useful capability to focus on whatever is odd, different or unusual.'

The quote states a causal relation between two features of salient phenomena: they are (i) odd, different or unusual, and, because of (i), people benefit from (ii) focusing their attention on such phenomena. Here, we confirm Kahneman’s intuition within our proposed framework. Our microfoundation of the salience effect is related to the insight emerging in psychological research on visual salience. Itti (2007) conceptualises the visual salience effect as attention allocation to a subset of the visual field that is ‘sufficiently different from its surroundings to be worthy of [one’s] attention’. Similarly, in our model, a payoff state is salient if it stands out sufficiently from similar states to be worthy of the focus of the agent’s information search.

For two states θ1 and θ2, we say that θ1 is more distinct than θ2 if for each other state θ3 ≠ θ1, θ2, p(θ1 ∣ θ3) < p(θ2 ∣ θ3). Suppose for illustration that the perceptual task involves recognition of a colour from the set {azure, indigo, red}. Intuitively, the red colour stands out from this set, and this is captured by the above definition. Assume that the two shades of blue are similar in that the agent’s first impression confuses them in 10% of cases, p(azure ∣ indigo) = p(indigo ∣ azure) = 0.1, but p(θ ∣ red) = p(red ∣ θ) = 0.01 for θ ∈ {azure, indigo}. Then the red colour is more distinct according to our definition than either of the two blue shades.

We focus on the effect stemming from the agent’s differential ability to perceptually discriminate between the states, and thus we abstract from the differences in the ex ante rewards across states; we assume that πθuθ is constant across all states. Additionally, we impose the following assumption:

5.1.1. Sufficient Precision

p(θ ∣ θ) > p(θ′ ∣ θ) for all θ ≠ θ′.

 
Proposition 4.
If state θ1 is more distinct than state θ2, then the agent’s terminal perception is biased in favour of the more distinct state θ1 at the expense of the less distinct state θ2:
$$\begin{eqnarray} r^{\ast }(\theta _{1}\mid \theta _{2})\gt r^{\ast }(\theta _{2}\mid \theta _{1}). \end{eqnarray}$$

Since the primitive perception technology p is symmetric by assumption, the asymmetry of the optimal terminal perception r* in favour of the distinct state is driven solely by the optimisation of the termination strategy. To gain intuition for the salience of the distinct states, consider a state θ* that is similar to many other states and an agent who always terminates the process after the first round: β = (1, …, 1). This agent is relatively uninformed whenever she forms perception θ*, since the true state differs from θ* with a sizeable probability. The agent with this indistinct perception θ* would thus benefit from ‘having a second thought’—i.e., from running the primitive perception-formation process once again. The optimal termination strategy involves repeating the primitive process with relatively high probability whenever the agent forms a perception of an indistinct state, and this shifts the terminal perception in favour of the distinct states.
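The colour-recognition example above can be solved explicitly from the linear system (10). The following sketch (assuming uniform πθuθ, as in this subsection, and diagonal entries implied by rows summing to one) recovers the optimal termination strategy and verifies the inequality of Proposition 4.

```python
import numpy as np

# Hypothetical colour-recognition example: two similar shades of blue and a distinct
# red, symmetric sensor, uniform prior and rewards (pi_theta * u_theta constant).
states = ["azure", "indigo", "red"]
p = np.array([[0.89, 0.10, 0.01],     # rows: true state, columns: perceived state
              [0.10, 0.89, 0.01],
              [0.01, 0.01, 0.98]])

# System (10) with the common value on the right-hand side normalised to 1:
# sum_x beta_x * p(x | theta) / sqrt(p(theta | theta)) is equalised across states.
A = p / np.sqrt(np.diag(p))[:, None]
beta = np.linalg.solve(A, np.ones(3))
beta /= beta.max()                     # termination strategies are scale-free

# Terminal perception r*(x | theta) induced by beta.
r = beta * p
r /= r.sum(axis=1, keepdims=True)

print(dict(zip(states, beta.round(3))))            # red has the highest termination rate
i, j = states.index("red"), states.index("azure")
print(bool(r[j, i] > r[i, j]))                     # Proposition 4: True
```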

6. Extensions

In the first subsection, we discuss how our model can accommodate agents with more general memory constraints. Subsection 6.2 accommodates agents who discount future payoffs at an exponential rate.

6.1. Sophisticated Agents

To demonstrate the flexibility of the general model, we now discuss two specific settings. They feature sophisticated agents with non-trivial memory that can be used to aggregate information over several observed signal realisations. Perhaps surprisingly, we show that those settings can in fact be interpreted as special cases of our general model, which at face value allows only for trivial memory. We show that such accommodation of non-trivial memory is possible via an expansion of the set $\mathcal{P}$ of the primitive experiments. This allows us to establish the generality of the second-thought-free condition.

Moreover, when the state and action spaces are binary, then the setting with sophisticated agents boils down to the simple binary setting as formulated in Section 3, except for the determination of the perceptual-distance parameter $\overline{d}$, which is now endogenously determined by the agent’s ability to process information.

 
Example 1

(imperfect information aggregation). This setting relaxes the agent’s inability to aggregate information across the repetitions of her reasoning by endowing her with a finite set of memory states that she can use to represent the signal histories. The setting of this example builds on Hellman and Cover (1970) and Wilson (2014). The agent can repeatedly sample from a single statistical experiment that generates signal realisations from a finite signal space. Additionally, the agent is endowed with a finite set of memory states. After each run of the experiment, the agent randomises between terminating and continuation of the decision process, where in the latter case, she may transition to a new memory state. The termination decisions and the transitions among memory states follow a stationary mixed strategy that conditions on the current memory state and the last observed signal. Once the agent terminates, she maps the last memory state and the last observed signal value to a chosen action. The feasible statistical experiment and the set of memory states specify a set of constructible choice rules, from which the agent chooses the one that maximises her ex ante expected payoff.

The formal specification of this example follows. The agent is endowed with one Blackwell experiment μ(x ∣ θ) with a finite signal space X and, additionally, with a finite set M of memory states m. After each run of the experiment μ, the agent either terminates or continues with decision-making. If the agent continues, then she transitions from the current memory state to a new memory state and reruns the statistical experiment μ(x ∣ θ). That is, the agent selects a (generalisation of the) termination strategy $\gamma: M\times X\longrightarrow \Delta(M\cup\lbrace\mathfrak{t}\rbrace)$, where γ(m′ ∣ m, x) is the probability that the agent in memory state m who has observed signal realisation x in the last run of the experiment μ continues with the decision-making and transitions to memory state m′, and $\gamma(\mathfrak{t}\mid m,x)$ is the probability that such an agent terminates. The terminating agent chooses action σ(m, x) that depends both on the current memory state and on the signal realisation observed in the last run of μ. The agent starts the decision-making in the memory state m0. A pair γ, σ induces a θ-dependent Markov chain over the memory states that eventually terminates with choice σ(m, x), where m is the last memory state and x is the last signal realisation observed. Let p(a ∣ θ; γ, σ) be the probability that the agent terminates with the choice a in state θ, and let $\mathcal{P}_{iia}$ be the set of all stochastic choice rules p that this agent can construct. She selects the choice rule from $\mathcal{P}_{iia}$ that maximises her ex ante expected payoff.

We now demonstrate that this example is a special case of our baseline model. Consider the baseline model with the signal space X = A and the set of the feasible primitive experiments $\mathcal{P}=\mathcal{P}_{iia}$. The set $\mathcal{R}(\mathcal{P}_{iia})=\lbrace r(p,\beta,\sigma): p\in\mathcal{P}_{iia},\beta\in B, \sigma\in S\rbrace$ is then the set of stochastic choice rules that can be constructed as follows. The agent runs any process $p\in\mathcal{P}_{iia}$, and observes a signal value/action recommendation a with probability p(a ∣ θ). She terminates with probability βa, according to the termination strategy β = (βa)a∈A, and upon the termination chooses an action a′ = σ(a), where σ ∈ S is any mapping $A\longrightarrow A$. She reruns the process p with probability 1 − βa, observes a new action recommendation generated by p, et cetera, until she terminates after a stochastic number of repetitions of the process p.

As it turns out, no new choice rules beyond those from $\mathcal{P}_{iia}$ can be constructed by these selective repetitions. This follows because the repetitions of the rule $p\in\mathcal{P}_{iia}$ with the termination strategy β can always be replicated by an appropriate choice of a different rule in $\mathcal{P}_{iia}$ that, whenever p would terminate with a, restarts the process from scratch with probability 1 − βa. Formally:

 
Lemma 7.

$\mathcal{R}(\mathcal{P}_{iia})=\mathcal{P}_{iia}$.

According to the lemma, Example 1 is a special case of our baseline model with $\mathcal{P}=\mathcal{P}_{iia}$ and X = A, since in such a specification of the baseline model, the set of feasible rules coincides with those in Example 1. In particular, the optimal choice rule $p^*\in\mathcal{P}_{iia}$ solving Example 1 coincides with the optimal rule $r^*\in\mathcal{R}(\mathcal{P}_{iia})$ solving this specification of the baseline model.

The repeated-cognition problem with $\mathcal{P}=\mathcal{P}_{iia}$ is purely formal in that the optimal termination probabilities $\beta^*_x=1$ for all x ∈ X = A, and thus the agent conducts the optimal process $p^*\in\mathcal{P}_{iia}$ only once and terminates. Nevertheless, the observation that p* solves the repeated-cognition problem has an important implication.

 
Corollary 4.

The choice rule that solves Example 1 (imperfect information aggregation) is second-thought-free.

Wilson (2014) differs from this example mainly in that she assumes exogenous termination probabilities. By adding optimisation over the terminations to the model of Wilson, we gain a partial characterisation of the optimal choice rule with no need to fully solve the problem: one can conclude that the optimal choice rule is second-thought-free without analysing the optimal use of the memory states.

 
Example 2

(partial forgetting). The agent of this example can remember up to a fixed finite number of signal realisations generated by a single statistical experiment. In each round of her decision process, she can discard a subset of the currently remembered signal values, extract a new signal realisation, or terminate, where each of these decisions is determined by a stationary mixed strategy that conditions on the currently remembered stock of signal values. The statistical experiment and the maximal number of signals that the agent can remember determine the set of stochastic choice rules that she can construct, from which she chooses the rule that maximises her ex ante expected payoff.

We first formalise this example as follows. Let H be the set of signal histories h of length |h| ≤ N. The agent at a history h can (i) terminate her decision-making, (ii) discard some of the information accumulated, or (iii), if |h| < N, acquire a new signal realisation. (i) An agent terminating at h chooses action σ(h). (ii) An agent who discards some information transitions to a truncation h′ of her current history h.9 (iii) An agent who acquires a new signal realisation transitions to a history hx, where x is the new signal realisation drawn from μ(x ∣ θ). The decision-making is governed by a pair of mappings $\gamma: H\times\Theta\longrightarrow \Delta(H\cup\lbrace\mathfrak{t}\rbrace)$ and $\sigma: H\longrightarrow A$, where γ(h′ ∣ h, θ) stands for the probability that the agent at history h in state θ continues decision-making and transitions to h′, and $\gamma(\mathfrak{t}\mid h,\theta)$ is the probability of termination at history h in state θ. The mapping γ is constrained to satisfy (1) γ(h′ ∣ h, θ) is independent of θ if h′ is a truncation of h, (2) $\gamma(\mathfrak{t}\mid h,\theta)$ is independent of θ, (3) $\frac{\gamma(hx\,\mid\, h,\theta)}{\gamma(hx^{\prime}\,\mid\, h,\theta)}=\frac{\mu(x\,\mid\,\theta)}{\mu(x^{\prime}\,\mid\,\theta)}$, (4) γ(h′ ∣ h, θ) = 0 unless h′ is a truncation of h, or h′ = hx for some x ∈ X and |hx| ≤ N. Constraints 1 and 2 require the agent to condition the decision to discard information or to terminate only on her current history, independently of the state. Constraint 3 allows the agent to expand her information set only by running the experiment μ(x ∣ θ). Constraint 4 restricts each step of information acquisition to one draw from μ(x ∣ θ) or to a partial discarding of the accumulated information. Let p(a ∣ θ; γ, σ) be the probability that the agent who employs (γ, σ) terminates with action a in state θ. The agent chooses γ and σ to maximise her ex ante expected payoff.

As with the previous example, let |$\mathcal {R}(\mathcal {P}_{pf})$| be the set of feasible choice rules in our baseline model with the set of feasible primitive experiments |$\mathcal {P}$| identified with |$\mathcal {P} _{pf}$|⁠.

 
Lemma 8.

|$\mathcal {R}(\mathcal {P}_{pf})= \mathcal {P}_{pf}$|⁠.

Thus, again, the rule |$p^*\in \mathcal {P}_{pf}$| solving this example and the optimal rule |$r^*\in \mathcal {R}(\mathcal {P}_{pf})$| coincide, and hence the rule solving the example must be second-thought-free.

 
Corollary 5.

The choice rule that solves Example 2 (partial forgetting) is second-thought-free.

Additionally, when the state and action sets are binary, Proposition 2 applies to both examples with |$\overline{d}=\frac{ p^*(1\,\mid \,1)p^*(0\,\mid \,0)}{p^*(0\,\mid \,1)p^*(1\,\mid \,0)}$|, and thus, relative to the baseline setting in which the agent remembers only one signal, the examples have the same solution except for the determination of the endogenous parameter |$\overline{d}$|. Thus, for instance, if state 1 is a priori more attractive than state 0, then the agent is more likely to make the correct choice in state 1 than in state 0: r*(1 ∣ 1) > r*(0 ∣ 0). As in Subsection 4.1, the optimal decision procedure favours the evidence supporting the a priori attractive state.

6.2. Impatient Agents

Our baseline model abstracts from the cost of time in that the agent is only concerned with how the repetitions of the signal extraction affect the correlation of the signal with the state. We now incorporate discounting.

We continue to study the baseline model from Section 1, except that the agent discounts future payoffs exponentially with the discount factor δ ∈ (0, 1). To accommodate discounting, we redefine the choice rule induced by the experiment p, the termination strategy β and the action strategy σ as follows.
$$\begin{eqnarray} r_{\delta }(a\mid \theta ;p,\beta ,\sigma )=\sum _{t=1}^\infty \sum _{\mathbf {x} ^{t}:\sigma (x_{t})=a}\delta ^{t}\rho \left( \mathbf {x}^{t}\mid \theta ;p,\beta \right) , \end{eqnarray}$$
(11)
where |$\rho \left( \mathbf {x}^{t}\mid \theta ;p,\beta \right)$| defined in (1) is the conditional probability of the signal history |$\mathbf {x}^{t}$|⁠. That is, rδ(a∣ θ; p, β, σ) is the discounted probability of the choice of action a in the state θ. When δ = 1, then (11) coincides with our baseline definition of the choice rule.
The set of feasible discounted rules is |$\mathcal {R}_{\delta }(\mathcal {P} )=\lbrace r_{\delta }(p,\beta ,\sigma ):p\in \mathcal {P},\beta \in B,\sigma \in S\rbrace$|⁠. The discounted repeated-cognition problem is to select a feasible rule rδ that maximises the expected payoff:
$$\begin{eqnarray} \max _{r_{\delta }\in \mathcal {R_{\delta }}(\mathcal {P})}\sum _{\theta \in \Theta ,a\in A}\pi _{\theta }r_{\delta }(a\mid \theta )u(a,\theta ), \end{eqnarray}$$
(12)
where discounting is incorporated in the definition of the feasible rules.

The next result generalises the second-thought-free condition. Let |$r^*_\delta =r_\delta (p^*,\beta ^*,\sigma ^*)$| be the choice rule solving the discounted repeated-cognition problem (12).

 
Proposition 5.
If the termination strategy|$\beta ^*_x\in (0,1)$|is interior for all x such that|$\sigma^*(x) = a$|⁠, then
$$\begin{eqnarray} \sum _{\theta \in \Theta }\pi _\theta u(a,\theta ) r^*_\delta (a\mid \theta )= \delta \sum _{\theta \in \Theta ,a^{\prime }\in A} \pi _\theta u(a^{\prime },\theta )r^*_\delta (a^{\prime }\mid \theta )r^*_\delta (a\mid \theta ). \end{eqnarray}$$
(13)

The condition has the same interpretation as the second-thought-free condition in the absence of discounting. The left-hand side is the payoff for following the optimal decision process |$r^*_\delta$| summed up across all contingencies that terminate with choice of a. The right-hand side is the payoff that the agent would get across the same contingencies if she restarted the decision process |$r^*_\delta$| instead of the termination.

For illustration, we now revisit the confirmation bias application from Subsection 4.1 with an impatient agent. We find that, unless discounting is too strong, the impatient agent chooses qualitatively the same strategy as the patient one, although the impatient agent speeds up her decision-making by choosing larger termination probabilities.

The setting is as follows. The agent chooses a ∈ {0, 1} and receives u(a, θ) = uθ > 0 if a = θ, and zero reward otherwise. Action 1 is a priori more attractive than action 0; π1u1 > π0u0. The agent has access to a single primitive experiment p that generates signal values in X = {0, 1}. The experiment is symmetric with probabilities p(1 ∣ 1) = p(0 ∣ 0) = α > 1/2. We impose a sufficient-informativeness condition that the signal is sufficiently precise relative to the ex ante attractiveness of action 1: |$\frac{\alpha }{1-\alpha }\gt \frac{\pi _1u_1}{\pi _0u_0}$|⁠.

 
Proposition 6.

The agent chooses the action equal to the last observed signal realisation. She terminates her decision-making immediately after she encounters signal realisation|$1$|⁠: |$\beta _{1}^{\ast }=1$|⁠. When |$\delta \in \big( \frac{1}{\alpha +(1-\alpha )R} ,1\big]$|⁠, then the agent who observes |$x = 0$|terminates with an interior probability|$\beta _{0}^{\ast }\in (0,1)$|that decreases in δ. When|$\delta \in \big( 0,\frac{1}{\alpha +(1-\alpha )R}\big)$|⁠, then the agent terminates immediately:|$\beta _{0}^{\ast }=\beta _{1}^{\ast }=1$|⁠.

7. Summary

Agents who cannot comprehend all the facts available to them benefit from selective attention. We show that agents can implement a targeted information search in a process that resembles the natural phenomenon of hesitation. Like a hesitant person, the agent can, conditional on the action contemplated, decide whether she implements the action or whether she will have a second thought and run the cognition process once more. Such hesitation can be productive, despite consisting of repetitions of the same stochastic cognition process. By conditioning the probability of the repetition on the conclusion of the reasoning, the agent controls the correlation of her terminal conclusion and the payoff state. The optimal decision process arising in our model exhibits natural hesitation patterns: the agent will have second thoughts—that is, she will repeat her cognition—whenever the expected payoff for the currently favoured choice is inferior to the expected payoff for continuing decision-making. At the optimum, an agent terminating the decision-making must be indifferent between terminating with the currently contemplated action and repeating the process.

In a sense, the condition formalises the concept of reasonable doubt. Abstracting from many considerations, such as information aggregation across jury members, a jury deciding a trial under common law should, if it uses the optimal decision procedure, be indifferent between declaring a verdict and announcing a hung jury, which initiates a retrial.

Let us conclude by reviewing the limitations of our main result. The central assumption—the ability of the agent to freely repeat her decision process—may fail for several reasons. One reason is that the agent may have access only to a limited data set that constrains her to a finite number of repetitions of the primitive decision process, making the optimal termination strategy non-stationary. Another complication arises if the outcomes of distinct runs of the same cognition process are not conditionally independent, contrary to what our model assumes; this may happen if some cognition errors are systematic and recur across repetitions of the cognition. We conjecture that the second-thought-free condition continues to hold in such a case, with the agent internalising the correlations between the cognition runs.

Appendix A

A.1. Proofs for Section 3

 
Proof of Lemma 3.

Assume that there exists a solution with βx positive for n > 2 signals x ∈ X. We show that there then exists a solution in which at most n − 1 signals receive positive βx; the lemma follows by induction on n.

Let us prove the induction step. Fix the primitive experiment p employed by the agent, let β be an optimal termination strategy for the given p, and let X′ be the set of signals with positive βx; write s(x ∣ θ) as shorthand for the effective experiment s(x ∣ θ; p, β) induced by p and β. Let us abuse notation by letting s(x) = ∑θπθs(x ∣ θ) stand for the unconditional effective probability of x. For x ∈ X′ let qx ∈ Δ(Θ) be the posterior belief upon terminating with x: qx(θ) = πθs(x ∣ θ)/s(x).

Since |X′| > 2 and the state space Θ is binary, there exists a signal x* ∈ X′ such that |$q_{x^*}$| is in the convex hull of the posteriors qx, x ∈ X′∖{x*}. Let μx be the coefficients that decompose |$q_{x^*}$| into qx, x ∈ X′∖{x*}. That is, μ ∈ Δ(X′∖{x*}) such that |$q_{x^*}(\theta )=\sum _{x\in X^{\prime }\setminus \lbrace x^*\rbrace }\mu _x q_x(\theta )$| for all θ ∈ Θ.

We will construct an alternative feasible effective experiment |$\tilde{s}(x\mid \theta )$| with unconditional probabilities of x denoted by |$\tilde{s}(x)$| and the posteriors |$\pi _\theta \tilde{s}(x\mid \theta )/\tilde{s}(x)$| denoted by |$\tilde{q}_x(\theta )$| such that:
$$\begin{eqnarray} \tilde{s}(x)= \left\lbrace \begin{array}{@{}l@{\quad }l@{}}s(x)+s(x^*)\mu _x \mbox{ if }x\in X^{\prime }\setminus \lbrace x^*\rbrace ,\\ 0 \mbox{ otherwise,} \end{array}\right. \end{eqnarray}$$
(A1)
and
$$\begin{eqnarray} \tilde{q}_x(\theta )=q_x(\theta ) \mbox{ for all }x\in X^{\prime }\setminus \lbrace x^*\rbrace ,\theta \in \Theta . \end{eqnarray}$$
(A2)
Since the experiment |$\tilde{s}$| is more informative than s in the sense of the Blackwell comparison (its distribution of posteriors is obtained from that of s by shifting the mass placed on |$q_{x^*}$| onto the posteriors qx that decompose it, a mean-preserving spread), there exists a solution with this alternative feasible effective experiment |$\tilde{s}$|, as needed for the induction step.
It remains to construct |$\tilde{s}$|⁠. Note that if an effective experiment |$s(x\mid \theta ;p,\beta )=\frac{\beta _x p(x\,\mid \,\theta )}{\sum _{x^{\prime }}\beta _{x^{\prime }}p(x^{\prime }\,\mid \,\theta )}$| is induced by some p and β, then for any vector of probabilities |$\tilde{\beta }_x$|⁠, the experiment
$$\begin{eqnarray} \tilde{s}(x\mid \theta )= \frac{\tilde{\beta }_x s(x\mid \theta ;p,\beta )}{\sum _{x^{\prime }\in X}\tilde{\beta }_{x^{\prime }} s(x^{\prime }\mid \theta ;p,\beta )}= \frac{\tilde{\beta }_x\beta _x p(x\mid \theta )}{\sum _{x^{\prime }\in X}\tilde{\beta }_{x^{\prime }}\beta _{x^{\prime }}p(x^{\prime }\mid \theta )} \end{eqnarray}$$
is also feasible, since it is induced by p and |$\beta ^{\prime }=(\tilde{\beta }_{x}\beta _x)_{x\in X}$|⁠.
We claim that if
$$\begin{eqnarray} \tilde{\beta }_x= \left\lbrace \begin{array}{@{}l@{\quad }l@{}}c\left(1+\displaystyle\frac{s(x^*)\mu _x}{s(x)}\right) \mbox{ if }x\in X^{\prime }\setminus \lbrace x^*\rbrace ,\\ 0 \mbox{ otherwise,} \end{array}\right. \end{eqnarray}$$
where c is a constant such that |$\tilde{\beta }_x\in (0,1)$| for all x ∈ X′∖{x*}, then the resulting |$\tilde{s}$| satisfies the properties (A1) and (A2). Let us check:
$$\begin{eqnarray} \tilde{s}(x\mid \theta ) &=& \displaystyle\frac{\tilde{\beta }_x s(x\mid \theta )}{\sum _{x^{\prime }\in X^{\prime }\setminus \lbrace x^*\rbrace }\tilde{\beta }_{x^{\prime }}s(x^{\prime }\mid \theta )} \\ &=& \displaystyle\frac{\tilde{\beta }_x s(x\mid \theta )}{c\left(\sum _{x^{\prime }\in X^{\prime }\setminus \lbrace x^*\rbrace }s(x^{\prime }\mid \theta ) +\sum _{x^{\prime }\in X^{\prime }\setminus \lbrace x^*\rbrace }\displaystyle\frac{s(x^*)\mu _{x^{\prime }}}{s(x^{\prime })}s(x^{\prime }\mid \theta )\right)}\\ &=& \displaystyle\frac{\tilde{\beta }_x s(x\mid \theta )}{c\left(\sum _{x^{\prime }\in X^{\prime }\setminus \lbrace x^*\rbrace }s(x^{\prime }\mid \theta ) +\sum _{x^{\prime }\in X^{\prime }\setminus \lbrace x^*\rbrace }\displaystyle\frac{s(x^*)\mu _{x^{\prime }}}{\pi _\theta }q_{x^{\prime }}(\theta )\right)}\\ &=& \displaystyle\frac{\tilde{\beta }_x s(x\mid \theta )}{c\left(\sum _{x^{\prime }\in X^{\prime }\setminus \lbrace x^*\rbrace }s(x^{\prime }\mid \theta ) +\displaystyle\frac{s(x^*)}{\pi _\theta }q_{x^*} (\theta )\right)}\\ &=& \displaystyle\frac{\tilde{\beta }_x s(x\mid \theta )}{c\left(\sum _{x^{\prime }\in X^{\prime }\setminus \lbrace x^*\rbrace }s(x^{\prime }\mid \theta ) +s(x^*\mid \theta )\right)}\\ &=&\displaystyle\frac{\tilde{\beta }_x s(x\mid \theta )}{c}\\ &=& \left(1+\displaystyle\frac{s(x^*)\mu _x}{s(x)}\right) s(x\mid \theta ). \end{eqnarray}$$
The property (A1) holds since for all xX′∖{x*}:
$$\begin{eqnarray} \tilde{s}(x)=\left(1+\frac{s(x^*)\mu _x}{s(x)}\right)s(x)=s(x)+s(x^*)\mu _x. \end{eqnarray}$$
To establish the property (A2), check that for all x ∈ X′∖{x*} and all θ ∈ Θ:
$$\begin{eqnarray} \tilde{q}_x(\theta )=\displaystyle\frac{\pi _\theta \tilde{s}(x\mid \theta )}{\sum _{\theta ^{\prime }\in \Theta }\pi _{\theta ^{\prime }}\tilde{s}(x\mid \theta ^{\prime })}=\displaystyle\frac{\pi _\theta \left(1+\displaystyle\frac{s(x^*)\mu _x}{s(x)}\right) s(x\mid \theta )}{\sum _{\theta ^{\prime }\in \Theta }\pi _{\theta ^{\prime }}\left(1+\displaystyle\frac{s(x^*)\mu _x}{s(x)}\right) s(x\mid \theta ^{\prime })}=\displaystyle\frac{\pi _\theta s(x\mid \theta )}{\sum _{\theta ^{\prime }\in \Theta }\pi _{\theta ^{\prime }} s(x\mid \theta ^{\prime })} =q_x(\theta ). \end{eqnarray}$$
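For a concrete check of this construction, the following Python sketch (with an arbitrary illustrative three-signal effective experiment and a uniform prior) performs the merging step numerically and verifies the properties (A1) and (A2).

```python
import numpy as np

# Numerical illustration of the merging step in the proof of Lemma 3. The
# three-signal effective experiment below is an arbitrary illustrative choice;
# signal x* = 1 has a posterior in the convex hull of the other two, its mass is
# shifted onto them, and the checks confirm properties (A1) and (A2).
pi = np.array([0.5, 0.5])                  # prior over the binary state
s = np.array([[0.5, 0.1],                  # s[x, theta]: effective experiment
              [0.3, 0.3],
              [0.2, 0.6]])
s_x = s @ pi                               # unconditional signal probabilities s(x)
q = (s * pi) / s_x[:, None]                # posteriors q_x(theta)

x_star, keep = 1, [0, 2]
mu = np.zeros(3)                           # decomposition q_{x*} = sum_x mu_x q_x
mu[2] = (q[x_star, 1] - q[0, 1]) / (q[2, 1] - q[0, 1])
mu[0] = 1 - mu[2]

s_tilde = s.copy()
s_tilde[x_star] = 0.0
for x in keep:                             # reweight the kept signals as in the proof
    s_tilde[x] = (1 + s_x[x_star] * mu[x] / s_x[x]) * s[x]

s_tilde_x = s_tilde @ pi
q_tilde = (s_tilde[keep] * pi) / s_tilde_x[keep][:, None]
assert np.allclose(s_tilde.sum(axis=0), 1.0)                              # still an experiment
assert np.allclose(s_tilde_x[keep], s_x[keep] + s_x[x_star] * mu[keep])   # property (A1)
assert np.allclose(q_tilde, q[keep])                                      # property (A2)
print("merged experiment satisfies (A1) and (A2)")
```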
 
Proof of Lemma 4.
For any positive β,
$$\begin{eqnarray} \displaystyle\frac{r(1\mid 1;p,\beta ,\sigma _I)r(0\mid 0;p,\beta ,\sigma _I)}{r(0\mid 1;p,\beta ,\sigma _I)r(1\mid 0;p,\beta ,\sigma _I)} = \displaystyle\frac{ \displaystyle\frac{\beta _1 p(1\mid 1)}{\sum _x\beta _x p(x\mid 1)} \displaystyle\frac{\beta _0 p(0\mid 0)}{\sum _x\beta _x p(x\mid 0)} }{ \displaystyle\frac{\beta _0 p(0\mid 1)}{\sum _x\beta _x p(x\mid 1)} \displaystyle\frac{\beta _1 p(1\mid 0)}{\sum _x\beta _x p(x\mid 0)} } = \displaystyle\frac{p(1\mid 1)p(0\mid 0)}{p(0\mid 1)p(1\mid 0)}= d_p. \end{eqnarray}$$
Thus, every |$r\in \mathcal {R}_{p,\sigma _I}$| either always selects the same action, or satisfies |$\frac{r(1\,\mid \,1)r(0\,\mid \,0)}{r(0\,\mid \,1)r(1\,\mid \,0)}=d_p$|. Vice versa, if a rule r′ satisfies |$\frac{r^{\prime }(1\,\mid \,1)r^{\prime }(0\,\mid \,0)}{r^{\prime }(0\,\mid \,1)r^{\prime }(1\,\mid \,0)}=d_p$|, then it belongs to |$\mathcal {R}_{p,\sigma _I}$|. To see this, let ra denote the rule that always selects action a. Note that r(1 ∣ 1; p, (β0, β1), σI) depends on the termination strategy only through the ratio β1/β0, is continuous in this ratio, and converges to r1(1 ∣ 1) = 1 and to r0(1 ∣ 1) = 0 as the ratio tends to infinity and to zero, respectively. Thus, there exists β such that r′(1 ∣ 1) = r(1 ∣ 1; p, β, σI). Moreover, there is a unique rule |$\tilde{r}$| that satisfies |$\tilde{r}(1\mid 1)=r^{\prime }(1\mid 1)$| and |$\frac{\tilde{r}(1\,\mid \,1)\tilde{r}(0\,\mid \,0)}{\tilde{r}(0\,\mid \,1)\tilde{r}(1\,\mid \,0)}=d_p$|. Thus, r′ must be r(p, β, σI) and hence constructible from p.10
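The invariance used in this argument is easy to verify numerically. The sketch below (illustrative experiment and termination strategies) confirms that the diagnosticity ratio of the effective choice rule equals d_p for every positive β.

```python
from itertools import product

# Quick numerical check of Lemma 4: the diagnosticity ratio of the effective
# experiment does not depend on the termination strategy beta. Values illustrative.
p = {(1, 1): 0.75, (0, 1): 0.25, (1, 0): 0.30, (0, 0): 0.70}   # p[(x, theta)]
d_p = p[(1, 1)] * p[(0, 0)] / (p[(0, 1)] * p[(1, 0)])

def r(x, theta, beta):
    """Effective choice rule under sigma_I: r(x|theta) = beta_x p(x|theta) / normaliser."""
    return beta[x] * p[(x, theta)] / sum(beta[xp] * p[(xp, theta)] for xp in (0, 1))

for b0, b1 in product((0.2, 0.5, 1.0), repeat=2):
    beta = {0: b0, 1: b1}
    ratio = r(1, 1, beta) * r(0, 0, beta) / (r(0, 1, beta) * r(1, 0, beta))
    assert abs(ratio - d_p) < 1e-12
print(f"d_p = {d_p:.4f}; ratio preserved for all tested beta")
```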
 
Proof of Lemma 5.
The statement is trivial when r(p, β, σ) chooses an action a′ with probability 1, since then we can set |$\beta ^{\prime }_{a^{\prime }}=1$| and |$\beta ^{\prime }_{x}=0$| for x ≠ a′. Accordingly, assume that both actions are chosen with positive probabilities under the rule r(p, β, σ) and σ(x) = 1 − x. For the sake of contradiction, assume that r(p, β, σ) achieves a higher payoff than all rules constructible with p and σI. Then, the payoff difference between the rule r(p, β, σ) and the choice rule that always selects a = 1 must be positive:
$$\begin{eqnarray} \pi _0u_0r(0\mid 0;p,\beta ,\sigma )+\pi _1u_1r(1\mid 1;p,\beta ,\sigma )-\pi _1u_1 &=& \\ \pi _0u_0r\left(1\mid 0;p,\beta ,\sigma _I\right)+\pi _1u_1r\left(0\mid 1;p,\beta ,\sigma _I\right)-\pi _1u_1 &=& \\ \pi _0u_0r\left(1\mid 0;p,\beta ,\sigma _I\right) - \pi _1u_1r\left(1\mid 1;p,\beta ,\sigma _I\right) &\gt & 0, \end{eqnarray}$$
where we have used r(a ∣ θ; p, β, σI) = r(1 − a ∣ θ; p, β, σ) for the first equality. Similarly, the payoff difference between the rule r(p, β, σ) and the rule that always selects a = 0 must be positive:
$$\begin{eqnarray} \pi _0u_0r(0\mid 0;p,\beta ,\sigma )+\pi _1u_1r(1\mid 1;p,\beta ,\sigma )-\pi _0u_0 &=&\\ \pi _0u_0r\left(1\mid 0;p,\beta ,\sigma _I\right)+\pi _1u_1r\left(0\mid 1;p,\beta ,\sigma _I\right)-\pi _0u_0 &=&\\ \pi _1u_1r(0\mid 1;p,\beta ,\sigma _I)-\pi _0u_0 r(0\mid 0;p,\beta ,\sigma _I) &\gt & 0. \end{eqnarray}$$
The last two inequalities imply
$$\begin{eqnarray} \frac{r(1\mid 1;p,\beta ,\sigma _I)}{r(1\mid 0;p,\beta ,\sigma _I)} \lt \frac{\pi _0u_0}{\pi _1u_1}\lt \frac{r(0\mid 1;p,\beta ,\sigma _I)}{r(0\mid 0;p,\beta ,\sigma _I)}, \end{eqnarray}$$
which establishes a contradiction because, by Lemma 4, the rule r(a ∣ θ; p, β, σI) satisfies
$$\begin{eqnarray} \frac{r(1\mid 1;p,\beta ,\sigma _I)r(0\mid 0;p,\beta ,\sigma _I)}{r(1\mid 0;p,\beta ,\sigma _I)r(0\mid 1;p,\beta ,\sigma _I)}=\frac{p(1\mid 1)p(0\mid 0)}{p(1\mid 0)p(0\mid 1)}, \end{eqnarray}$$
and therefore it inherits the monotone likelihood ratio property from p.
 
Proof of Lemma 6.
Consider the choice rule r(p, β, σI) constructed from the experiment p with perceptual distance dp = d, and fix the probability r(0 ∣ 0; p, β, σI) = α of the correct choice in state 0 to a value α ∈ (0, 1). Then, by Lemma 4, the probability r(1 ∣ 1; p, β, σI) of the correct choice in state 1 satisfies
$$\begin{eqnarray} \frac{r(1\mid 1;p,\beta ,\sigma _I)\alpha }{(1-r(1\mid 1;p,\beta ,\sigma _I))(1-\alpha )}=d. \end{eqnarray}$$
For each α, the solution for r(1 ∣ 1; p, β, σI) of this equation increases in d.
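Explicitly, solving the displayed equation for r(1 ∣ 1; p, β, σI) gives r(1 ∣ 1) = d(1 − α)/(α + d(1 − α)), which increases in d; the short sketch below evaluates this closed form for illustrative values of α and d.

```python
# Closed-form solution of the ratio condition in the proof of Lemma 6, shown to
# increase in the perceptual distance d. The values of alpha and d are illustrative.
def r11(alpha, d):
    # r11 * alpha / ((1 - r11) * (1 - alpha)) = d  =>  r11 = d(1-alpha) / (alpha + d(1-alpha))
    return d * (1 - alpha) / (alpha + d * (1 - alpha))

alpha = 0.4
for d in (2.0, 4.0, 8.0):
    print(f"d = {d}: r(1|1) = {r11(alpha, d):.3f}")
```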
 
Proof of Proposition 2.

The agent’s objective is linear with respect to the choice rule r(p, β, σ). Thus, the optimal rule is the point of tangency of the set |$\mathcal {R}_{\overline{p},\sigma _I}$| of the feasible rules and of an indifference line; see Figure 2. The slope |$\frac{d r\left(0\,\mid \,0;\overline{p},\beta ,\sigma _I\right)}{d r\left(1\,\mid \,1;\overline{p},\beta ,\sigma _I\right)}$| is decreasing in |$r\left(1\mid 1;\overline{p},\beta ,\sigma _I\right)$| and attains value |$-1/\overline{d}$| for |$r\left(1\mid 1;\overline{p},\beta ,\sigma _I\right)=0$|⁠, and value |$-\overline{d}$| for |$r\left(1\mid 1;\overline{p},\beta ,\sigma _I\right)=1$|⁠. Thus, when |$R\lt 1/\overline{d}$| or |$R\gt \overline{d}$|⁠, then the problem has the corner solution as specified in statements 1 and 2 of the proposition.

When |$R\in \left(1/\overline{d},\overline{d}\right)$|, then the optimal choice rule |$r^*=r\left(\overline{p},\beta ^*,\sigma _I\right)$| satisfies the feasibility condition |$\frac{r^*(1\,\mid \,1)r^*(0\,\mid \,0)}{r^*(0\,\mid \,1)r^*(1\,\mid \,0)}=\overline{d}$|, the second-thought-free condition (5) (applied to action a = 1):
$$\begin{eqnarray} \pi _1u_1r^*(1\mid 1)=\pi _0u_0r^*(0\mid 0)r^*(1\mid 0) + \pi _1u_1r^*(1\mid 1)r^*(1\mid 1), \end{eqnarray}$$
and two normalisation conditions ∑ar*(a∣ θ) = 1, for θ ∈ {0, 1}. These four conditions jointly imply the explicit solution for the optimal choice rule in (8). The expression (9) for |$\beta ^*_1/\beta ^*_0$| follows from (8) and the condition |$\frac{r^*(1\,\mid \,\theta )}{r^*(0\,\mid \,\theta )}= \frac{\beta _1^*\overline{p}(1\,\mid \,\theta )}{\beta _0^*\overline{p}(0\,\mid \,\theta )}$|⁠.
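A numerical illustration of how these conditions pin down the interior solution: the sketch below (illustrative |$\overline{d}$| and payoff weights satisfying R ∈ (1/|$\overline{d}$|, |$\overline{d}$|)) maximises the expected payoff along the feasibility frontier and confirms that the maximiser satisfies the second-thought-free condition for a = 1, reproducing the asymmetry r*(1 ∣ 1) > r*(0 ∣ 0).

```python
import numpy as np

# Sketch of the system solved in the proof of Proposition 2: on the feasible
# frontier r(1|1) r(0|0) / (r(0|1) r(1|0)) = d_bar, maximise the expected payoff
# and check that the maximiser satisfies the second-thought-free condition for
# a = 1. Parameter values are illustrative and satisfy R in (1/d_bar, d_bar).
d_bar = 9.0
pi_u = {0: 1.0, 1: 1.5}                  # pi_theta * u_theta (illustrative weights)

def r00_on_frontier(r11):
    # solve r11 * r00 = d_bar * (1 - r11) * (1 - r00) for r00
    return d_bar * (1 - r11) / (r11 + d_bar * (1 - r11))

grid = np.linspace(1e-4, 1 - 1e-4, 200_000)
payoff = pi_u[1] * grid + pi_u[0] * r00_on_frontier(grid)
r11 = grid[np.argmax(payoff)]
r00 = r00_on_frontier(r11)

lhs = pi_u[1] * r11                                           # terminate with a = 1
rhs = pi_u[0] * r00 * (1 - r00) + pi_u[1] * r11 * r11         # restart instead
print(f"r*(1|1) = {r11:.4f}, r*(0|0) = {r00:.4f}, STF gap = {lhs - rhs:.2e}")
```

Up to the grid resolution, the reported gap is zero, and the maximiser indeed treats the a priori attractive state more accurately.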

A.2. Proofs for Section 5

The next result is an auxiliary lemma used in the proof of Proposition 4.

 
Lemma 9.
The optimal effective choice rule r* satisfies for any pair of states θ, θ′ ∈ Θ:
$$\begin{eqnarray} \pi _\theta u_\theta r^*(\theta \mid \theta )r^*(\theta ^{\prime }\mid \theta )=r^*(\theta \mid \theta ^{\prime })\pi _{\theta ^{\prime }} u_{\theta ^{\prime }} r^*(\theta ^{\prime }\mid \theta ^{\prime }). \end{eqnarray}$$
(A3)

Condition (A3) is a strengthening of the second-thought-free condition (5). It requires that an agent who has terminated the decision process with perception θ, and who knows that the second run of the process r* would terminate with the value θ′, is indifferent between θ and θ′. The condition is stronger than (5), since (5) requires (A3) to hold only on average across all θ′. The strengthening holds in the special case of a symmetric experiment p.

 
Proof of Lemma 9.
The optimal effective choice rule satisfies the second-thought-free condition (5), equivalent to:
$$\begin{eqnarray} \pi _\theta u_\theta r^*(\theta \mid \theta )=\sum _{\theta ^{\prime }\in \Theta }\pi _{\theta ^{\prime }} u_{\theta ^{\prime }} r^*(\theta \mid \theta ^{\prime })r^*(\theta ^{\prime }\mid \theta ^{\prime })\mbox{ for all }\theta \in \Theta , \end{eqnarray}$$
which after two algebraic steps gives:
$$\begin{eqnarray} \pi _\theta u_\theta r^*(\theta \mid \theta )\big (1-r^*(\theta \mid \theta )\big )=\sum _{\theta ^{\prime }\ne \theta }\pi _{\theta ^{\prime }}u_{\theta ^{\prime }} r^*(\theta \mid \theta ^{\prime })r^*(\theta ^{\prime }\mid \theta ^{\prime })\mbox{ for all }\theta \in \Theta , \end{eqnarray}$$
$$\begin{eqnarray} \sum _{\theta ^{\prime }\ne \theta }\pi _\theta u_\theta r^*(\theta \mid \theta )r^*(\theta ^{\prime }\mid \theta )=\sum _{\theta ^{\prime }\ne \theta }\pi _{\theta ^{\prime }}u_{\theta ^{\prime }} r^*(\theta ^{\prime }\mid \theta ^{\prime })r^*(\theta \mid \theta ^{\prime })\mbox{ for all }\theta \in \Theta . \end{eqnarray}$$
The last system of equations is formally equivalent to the system of balance conditions for a Markov chain. To see this, consider an ergodic Markov chain with transition probabilities from θ to θ′ equal to r*(θ′ ∣ θ). The balance condition for the stationary distribution μ(θ) of this chain is
$$\begin{eqnarray} \sum _{\theta ^{\prime }\ne \theta }\mu (\theta )r^*(\theta ^{\prime }\mid \theta )=\sum _{\theta ^{\prime }\ne \theta }\mu (\theta ^{\prime })r^*(\theta \mid \theta ^{\prime })\mbox{ for all }\theta \in \Theta , \end{eqnarray}$$
and thus, since the balance conditions of an ergodic chain determine its stationary distribution uniquely up to a scale factor, πθuθr*(θ ∣ θ) is proportional to the ergodic probability μ(θ) of the state θ for the chain with transition probabilities r*(θ′ ∣ θ).
Recall that if a Markov chain with transition probabilities m(θ′ ∣ θ) is reversible, then its stationary distribution μ(θ) satisfies detailed balance conditions
$$\begin{eqnarray} \mu (\theta )m(\theta ^{\prime }\mid \theta )=\mu (\theta ^{\prime })m(\theta \mid \theta ^{\prime })\mbox{ for all } \theta \ne \theta ^{\prime }. \end{eqnarray}$$
Thus, it suffices to prove that the probabilities r*(θ′ ∣ θ) constitute a reversible Markov chain.
Recall that a Markov chain m(θ′ ∣ θ) is reversible if and only if it satisfies the Kolmogorov criterion, which requires for all sequences of states θ1, θ2, …, θn,
$$\begin{eqnarray} \frac{m(\theta _2\mid \theta _1)m(\theta _3\mid \theta _2)\dots m(\theta _n\mid \theta _{n-1}) m(\theta _1\mid \theta _n)}{ m(\theta _n\mid \theta _1)m(\theta _{n-1}\mid \theta _n)\dots m(\theta _2\mid \theta _{3}) m(\theta _1\mid \theta _2)}=1. \end{eqnarray}$$
(A4)
The Markov chain with transition probabilities p(θ′ ∣ θ) given by the primitive experiment p satisfies the Kolmogorov criterion (A4) since p is symmetric by assumption. Finally, for any positive termination strategy β, the effective choice rule r(θ′ ∣ θ; p, β, σI) satisfies the Kolmogorov criterion too. This is because |$r(\theta ^{\prime }\mid \theta ;p,\beta ,\sigma _I)=\frac{\beta _{\theta ^{\prime }}p(\theta ^{\prime }\,\mid \,\theta )}{\sum _{\tilde{\theta }}\beta _{\tilde{\theta }}p(\tilde{\theta }\,\mid \,\theta )}$|⁠, and when the expressions for r(θ′ ∣ θ; p, β, σI) are substituted into (A4), then the terms |$\beta _{\theta ^{\prime }}$| and the denominators cancel out, and hence
$$\begin{eqnarray} && \frac{r(\theta _2\mid \theta _1;p,\beta ,\sigma _I)r(\theta _3\mid \theta _2;p,\beta ,\sigma _I)\dots r(\theta _1\mid \theta _n;p,\beta ,\sigma _I)}{ r(\theta _n\mid \theta _1;p,\beta ,\sigma _I)r(\theta _{n-1}\mid \theta _n;p,\beta ,\sigma _I)\dots r(\theta _1\mid \theta _2;p,\beta ,\sigma _I)} \\ && \qquad = \frac{p(\theta _2\mid \theta _1)p(\theta _3\mid \theta _2)\dots p(\theta _1\mid \theta _n)}{ p(\theta _n\mid \theta _1)p(\theta _{n-1}\mid \theta _n)\dots p(\theta _1\mid \theta _2)}=1, \end{eqnarray}$$
as needed.
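The cancellation argument can be checked numerically; the sketch below builds the effective rule from an illustrative symmetric experiment and an arbitrary positive termination strategy and verifies the Kolmogorov criterion (A4) on all three-state cycles.

```python
import numpy as np
from itertools import permutations

# Numerical check of the reversibility argument in the proof of Lemma 9: for a
# symmetric primitive experiment p and any positive termination strategy beta,
# the effective rule r(theta'|theta) satisfies the Kolmogorov criterion (A4).
# The matrices below are illustrative.
p = np.array([[0.70, 0.20, 0.10],
              [0.20, 0.65, 0.15],
              [0.10, 0.15, 0.75]])              # symmetric: p[i, j] = p[j, i]
beta = np.array([0.9, 0.4, 0.6])

r = (beta[None, :] * p) / (p @ beta)[:, None]   # r[theta, theta'] = beta_{theta'} p(theta'|theta) / normaliser

for cycle in permutations(range(3)):            # check (A4) on every 3-cycle
    fwd = np.prod([r[cycle[i], cycle[(i + 1) % 3]] for i in range(3)])
    bwd = np.prod([r[cycle[i], cycle[i - 1]] for i in range(3)])
    assert abs(fwd - bwd) < 1e-12
print("effective rule passes the Kolmogorov criterion for all 3-cycles")
```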
 
Proof of Proposition 3.
Lemma 9 implies for all pairs θ, θ′ ∈ Θ:
$$\begin{eqnarray} \pi _\theta u_\theta r^*(\theta \mid \theta )r^*(\theta ^{\prime }\mid \theta )=r^*(\theta \mid \theta ^{\prime })\pi _{\theta ^{\prime }} u_{\theta ^{\prime }} r^*(\theta ^{\prime }\mid \theta ^{\prime }). \end{eqnarray}$$
By Lemma 2, we can substitute |$r^*(\theta ^{\prime }\mid \theta )=\frac{\beta ^*_{\theta ^{\prime }}p(\theta ^{\prime }\,\mid \,\theta )}{\sum _{\tilde{\theta }}\beta ^*_{\tilde{\theta }}p\left(\tilde{\theta }\,\mid \,\theta \right)}$|⁠, which gives
$$\begin{eqnarray} \frac{\beta ^*_{\theta }\beta ^*_{\theta ^{\prime }}\pi _\theta u_\theta p(\theta \mid \theta )p(\theta ^{\prime }\mid \theta )}{\left(\sum _{\tilde{\theta }}\beta ^*_{\tilde{\theta }}p(\tilde{\theta }\mid \theta )\right)^2} = \frac{\beta ^*_{\theta }\beta ^*_{\theta ^{\prime }}p(\theta \mid \theta ^{\prime })\pi _{\theta ^{\prime }} u_{\theta ^{\prime }} p(\theta ^{\prime }\mid \theta ^{\prime })}{\left(\sum _{\tilde{\theta }}\beta ^*_{\tilde{\theta }}p(\tilde{\theta }\mid \theta ^{\prime })\right)^2}. \end{eqnarray}$$
Using the symmetry of p we get
$$\begin{eqnarray} \frac{\sum _{\tilde{\theta }}\beta ^*_{\tilde{\theta }}p(\tilde{\theta }\mid \theta ^{\prime })}{\sum _{\tilde{\theta }}\beta ^*_{\tilde{\theta }}p(\tilde{\theta }\mid \theta )} = \left(\frac{\pi _{\theta ^{\prime }} u_{\theta ^{\prime }}p(\theta ^{\prime }\mid \theta ^{\prime })}{\pi _\theta u_\theta p(\theta \mid \theta )}\right)^{1/2}, \end{eqnarray}$$
(A5)
which gives (10) after rearrangement.
 
Proof of Proposition 4.
To compare r*(θ1 ∣ θ2) and r*(θ2 ∣ θ1), we write
$$\begin{eqnarray} \displaystyle\frac{r^*(\theta _1\mid \theta _2)}{r^*(\theta _2\mid \theta _1)} = \displaystyle\frac{\displaystyle\frac{\beta ^*_{\theta _1} p(\theta _1\mid \theta _2)}{\sum _{\tilde{\theta }} \beta ^*_{\tilde{\theta }} p(\tilde{\theta }\mid \theta _2) }}{\displaystyle\frac{\beta ^*_{\theta _2} p(\theta _2\mid \theta _1)}{\sum _{\tilde{\theta }} \beta ^*_{\tilde{\theta }} p(\tilde{\theta }\mid \theta _1)}} = \displaystyle\frac{\displaystyle\frac{\beta ^*_{\theta _1} p(\theta _1\mid \theta _2)}{\left(\pi _{\theta _2} u_{\theta _2}p(\theta _2\mid \theta _2)\right)^{1/2}}}{\displaystyle\frac{\beta ^*_{\theta _2} p(\theta _2\mid \theta _1)}{\left(\pi _{\theta _1}u_{\theta _1}p(\theta _1\mid \theta _1)\right)^{1/2}}} = \displaystyle\frac{\beta ^*_{\theta _1}p^{1/2}(\theta _1\mid \theta _1)}{\beta ^*_{\theta _2}p^{1/2}(\theta _2\mid \theta _2)}, \end{eqnarray}$$
where we have used (A5) in the second step, and the symmetry of p and equality of πθuθ across θ in the last step. Define |$\hat{\beta }_{\theta }=\beta ^*_{\theta }p^{1/2}(\theta \mid \theta )$|⁠. We need to prove that if θ1 is more distinct than θ2, then |$\hat{\beta }_{\theta _1}\gt \hat{\beta }_{\theta _2}$|⁠.
By (A5), |$\big (\hat{\beta }_{\theta }\big )_\theta$| satisfy the system of linear equations:
$$\begin{eqnarray} \sum _{\theta ^{\prime }}D_{\theta ^{\prime }\theta }\hat{\beta }_{\theta ^{\prime }}=1 \mbox{ for all } \theta , \end{eqnarray}$$
where |$D_{\theta ^{\prime }\theta }=\frac{p(\theta ^{\prime }\,\mid \,\theta )}{p^{1/2}(\theta ^{\prime }\,\mid \,\theta ^{\prime })\,p^{1/2}(\theta \,\mid \,\theta )}$|. We claim that if θ1 is more distinct than θ2, then |$D_{\theta _3\theta _1}\lt D_{\theta _3\theta _2}$| for all θ3 ≠ θ1, θ2. This follows from p(θ3 ∣ θ1) < p(θ3 ∣ θ2) and from the symmetry of p:
$$\begin{eqnarray} p(\theta _1\mid \theta _1)&=&1-p(\theta _2\mid \theta _1)-\sum _{\theta _3\ne \theta _1,\theta _2}p(\theta _3\mid \theta _1) \gt 1-p(\theta _1\mid \theta _2)-\sum _{\theta _3\ne \theta _1,\theta _2}p(\theta _3\mid \theta _2) \\ &&=p(\theta _2\mid \theta _2), \end{eqnarray}$$
and therefore,
$$\begin{eqnarray} D_{\theta _3\theta _1}= \frac{p(\theta _1\mid \theta _3)}{p^{1/2}(\theta _1\mid \theta _1)p^{1/2}(\theta _3\mid \theta _3)} \lt \frac{p(\theta _2\mid \theta _3)}{p^{1/2}(\theta _2\mid \theta _2)p^{1/2}(\theta _3\mid \theta _3)}=D_{\theta _3\theta _2}. \end{eqnarray}$$
Thus,
$$\begin{eqnarray} D_{\theta _1\theta _1}\hat{\beta }_{\theta _1}+D_{\theta _2\theta _1}\hat{\beta }_{\theta _2}=1-\sum _{\theta _3\ne \theta _1,\theta _2}D_{\theta _3\theta _1}\hat{\beta }_{\theta _3} \gt 1-\sum _{\theta _3\ne \theta _1,\theta _2}D_{\theta _3\theta _2}\hat{\beta }_{\theta _3}= D_{\theta _2\theta _2}\hat{\beta }_{\theta _2}+D_{\theta _1\theta _2}\hat{\beta }_{\theta _1}. \end{eqnarray}$$
Using that Dθθ = 1 and |$D_{\theta \theta ^{\prime }}=D_{\theta ^{\prime }\theta }$|⁠, we have
$$\begin{eqnarray} \hat{\beta }_{\theta _1}+D_{\theta _2\theta _1}\hat{\beta }_{\theta _2} \gt \hat{\beta }_{\theta _2}+D_{\theta _2\theta _1}\hat{\beta }_{\theta _1}. \end{eqnarray}$$
The assumptions of sufficient precision and symmetry of p imply that |$D_{\theta _2\theta _1}\lt 1$|, and thus |$\hat{\beta }_{\theta _1}\gt \hat{\beta }_{\theta _2}$|, as needed.
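The following sketch illustrates this final step numerically: it solves the linear system for β̂ under an illustrative symmetric experiment in which state 0 is more distinct than state 1, and confirms that the more distinct state receives the larger rescaled termination probability.

```python
import numpy as np

# Numerical illustration of the proof of Proposition 4: solve the linear system
# sum_{theta'} D_{theta' theta} beta_hat_{theta'} = 1 and check that the more
# distinct state receives the larger beta_hat. The symmetric experiment below is
# illustrative; state 0 is more distinct than state 1 because p(2|0) < p(2|1).
p = np.array([[0.85, 0.10, 0.05],
              [0.10, 0.75, 0.15],
              [0.05, 0.15, 0.80]])
diag = np.sqrt(np.diag(p))
D = p / np.outer(diag, diag)                 # D_{theta' theta} = p(theta'|theta) / (p^(1/2)(theta'|theta') p^(1/2)(theta|theta))
beta_hat = np.linalg.solve(D, np.ones(3))    # D is symmetric, so no transpose is needed
print("beta_hat =", np.round(beta_hat, 4))
assert beta_hat[0] > beta_hat[1]             # the more distinct state 0 gets the larger beta_hat
```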

A.3. Proofs of the Results from Section 6

 
Proof of Lemma 7.

All rules feasible in |$\mathcal {P}_{iia}$| are feasible in |$\mathcal {R}(\mathcal {P}_{iia})$|: |$\mathcal {R}(\mathcal {P}_{iia})\supset \mathcal {P}_{iia}$|. This is immediate since when βa = 1 for all a ∈ A, then r(p, β, σI) = p for all |$p\in \mathcal {P}_{iia}$|.

It remains to show |$\mathcal {R}(\mathcal {P}_{iia})\subset \mathcal {P}_{iia}$|. Consider |$p(\gamma ,\sigma )\in \mathcal {P}_{iia}$| constructed in the setting of Example 1 by the use of the generalised termination strategy γ(m, x), and the action strategy σ(m, x). Recall that |$r(p(\gamma ,\sigma ),\beta ,\hat{\sigma })$| is the choice rule constructed by repetitions of the rule p(γ, σ) according to the termination strategy β = (βa)a∈A and by applying the action strategy |$\hat{\sigma }:A\longrightarrow A$| upon the termination. We need to show that there exist γ′ and σ′ such that |$r(p(\gamma ,\sigma ),\beta ,\hat{\sigma })=p(\gamma ^{\prime },\sigma ^{\prime })$|. This is indeed so when the termination probability |$\gamma ^{\prime }(\mathfrak {t}\mid m,x)=\gamma (\mathfrak {t}\mid m,x)\beta _{\sigma (m,x)}$| for m ≠ m0, the transition probability to the original memory state m0 is |$\gamma ^{\prime }(m_0\mid m,x)=\gamma (m_0\mid m,x)+ \gamma (\mathfrak {t}\mid m,x)\left(1-\beta _{\sigma (m,x)}\right)$|, which is the sum of the probabilities that the original process γ transits to m0 and that the decision process |$r(p(\gamma ,\sigma ),\beta ,\hat{\sigma })$| restarts after termination of p(γ, σ). Additionally, for all |$\tilde{m}\ne m_0$|, |$\gamma ^{\prime }(\tilde{m}\mid m,x)=\gamma (\tilde{m}\mid m,x)$|. The above choice of γ′ implies that the process p(γ′, σ′) replicates the Markov process over the memory states under |$r(p(\gamma ,\sigma ),\beta ,\hat{\sigma })$|. Finally, to replicate the choices upon terminations, we set the action strategy |$\sigma ^{\prime }(m,x)=\hat{\sigma }(\sigma (m,x))$| for all (m, x).

 
Proof of Lemma 8.

Again, trivially, |$\mathcal {R}(\mathcal {P}_{pf})\supset \mathcal {P}_{pf}$|⁠, since r(p, (1, …, 1), σI) = p for all |$p\in \mathcal {P}_{pf}$|⁠. Additionally, |$\mathcal {R}(\mathcal {P}_{pf})\subset \mathcal {P}_{pf}$|⁠. This is indeed so because for any β = (βa)aA and any |$\hat{\sigma }:A\longrightarrow A$|⁠, |$r(p(\gamma ,\sigma ),\beta ,\hat{\sigma })=p(\gamma ^{\prime },\sigma ^{\prime })$| where the termination probability |$\gamma ^{\prime }(\mathfrak {t}\mid h,\theta )=\gamma (\mathfrak {t}\mid h,\theta )\beta _{\sigma (h)}$|⁠, the transition probability to the empty signal history ∅ is set to |$\gamma ^{\prime }(\emptyset \mid h,\theta )=\gamma (\emptyset \mid h,\theta )+ \gamma (\mathfrak {t}\mid h,\theta )\left(1-\beta _{\sigma (h)}\right)$|⁠, and for all |$\tilde{h} \ne \emptyset$|⁠, |$\gamma ^{\prime }(\tilde{h}\mid h,\theta )=\gamma (\tilde{h}\mid h,\theta )$|⁠. Finally, the action strategy is set to |$\sigma ^{\prime }(h)=\hat{\sigma }(\sigma (h))$| for all histories h.

 
Proof of Proposition 5.
We extend the definition of the effective experiment to the setting with discounting. Let
$$\begin{eqnarray} s_\delta (x\mid \theta ;p,\beta )=\sum _t\sum _{\mathbf {x}^t:x_t=x}\delta ^t\rho \left(\mathbf {x}^t\mid \theta ;p,\beta \right), \end{eqnarray}$$
where |$\rho \left(\mathbf {x}^t\mid \theta ;p,\beta \right)$| is the probability of the signal history |$\mathbf {x}^t$| defined in (1). Thus, sδ(x∣ θ; p, β) is the discounted probability that the agent’s last observed signal value is x. It satisfies the recursion:
$$\begin{eqnarray} s_\delta (x\mid \theta ;p,\beta )=\beta _x p(x\mid \theta )+\delta \left(1-\sum _{x^{\prime }\in X}\beta _{x^{\prime }} p\left(x^{\prime }\mid \theta \right)\right) s_\delta (x\mid \theta ;p,\beta ), \end{eqnarray}$$
(A6)
where the first summand is the probability that the decision process terminates with x in the first round, and the second summand is the discounted probability of the event that the first draw is discarded, which happens with probability |$1-\sum _{x^{\prime }\in X}\beta _{x^{\prime }}p(x^{\prime }\mid \theta )$|, and the restarted process eventually terminates with x. Solving (A6) for sδ gives
$$\begin{eqnarray} s_\delta (x\mid \theta ;p,\beta )=\frac{\beta _x p(x\mid \theta )}{1-\delta +\delta \sum _{x^{\prime }\in X}\beta _{x^{\prime }} p(x^{\prime }\mid \theta )}. \end{eqnarray}$$
The discounted repeated-cognition problem (12) is thus equivalent to
$$\begin{eqnarray} \max _{p\in \mathcal {P},\beta \in B,\sigma \in S}\sum _{\theta \in \Theta ,x\in X} \pi _\theta \frac{\beta _x p(x\mid \theta )}{1-\delta +\delta \sum _{x^{\prime }\in X}\beta _{x^{\prime }} p(x^{\prime }\mid \theta )} u(\sigma (x),\theta ). \end{eqnarray}$$
(A7)
Consider x with an interior termination probability |$\beta ^*_x\in (0,1)$| and let a = σ*(x). The first-order condition of the problem (A7) with respect to βx is:
$$\begin{eqnarray} \sum _{\theta \in \Theta } \pi _\theta \frac{s_\delta (x\mid \theta ;p^*,\beta ^*)}{\beta ^*_x} u(a,\theta ) - \delta \sum _{\theta \in \Theta ,x^{\prime }\in X} \pi _\theta s_\delta (x^{\prime }\mid \theta ;p^*,\beta ^*)\frac{s_\delta (x\mid \theta ;p^*,\beta ^*)}{\beta ^*_x} u(\sigma ^*(x^{\prime }),\theta ) &=& \\ \sum _{\theta \in \Theta } \pi _\theta \frac{s_\delta (x\mid \theta ;p^*,\beta ^*)}{\beta ^*_x} u(a,\theta ) - \delta \sum _{\theta \in \Theta ,a^{\prime }\in A} \pi _\theta r^*_\delta (a^{\prime }\mid \theta ;p^*,\beta ^*,\sigma ^*)\frac{s_\delta (x\mid \theta ;p^*,\beta ^*)}{\beta ^*_x} u(a^{\prime },\theta ) &= &0, \end{eqnarray}$$
where we have summed over all x′ such that σ*(x′) = a′ in the second line. Multiplication by |$\beta ^*_x$| and summation over all x such that σ*(x) = a gives (13).
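As a numerical check of condition (13), the sketch below uses the closed form for sδ, searches over the termination probability β0 (with β1 = 1, as Proposition 6 prescribes in this setting) in an illustrative binary symmetric example, and verifies that the maximiser satisfies (13) for a = 0 up to the grid resolution.

```python
import numpy as np

# Numerical check of the discounted second-thought-free condition (13) in the
# binary symmetric setting of Proposition 6. Parameter values are illustrative.
alpha, delta = 0.8, 0.95          # signal precision and discount factor
pi = {0: 0.4, 1: 0.6}             # prior
u  = {0: 1.0, 1: 1.0}             # rewards for the correct action
p  = {(1, 1): alpha, (0, 1): 1 - alpha, (0, 0): alpha, (1, 0): 1 - alpha}  # p[(x, theta)]

def s_delta(x, theta, beta):
    """Closed form for the discounted effective experiment (solution of (A6))."""
    denom = 1 - delta + delta * sum(beta[xp] * p[(xp, theta)] for xp in (0, 1))
    return beta[x] * p[(x, theta)] / denom

def payoff(beta):
    # sigma = identity: the action equals the terminal signal
    return sum(pi[th] * u[th] * s_delta(th, th, beta) for th in (0, 1))

# grid search over beta_0 with beta_1 = 1
grid = np.linspace(1e-3, 1.0, 10_000)
b0 = grid[np.argmax([payoff({0: b, 1: 1.0}) for b in grid])]
beta = {0: b0, 1: 1.0}
r = {(a, th): s_delta(a, th, beta) for a in (0, 1) for th in (0, 1)}

lhs = pi[0] * u[0] * r[(0, 0)]                                  # terminate with a = 0
rhs = delta * sum(pi[th] * u[th] * r[(a, th)] * r[(0, th)]      # restart instead
                  for th in (0, 1) for a in (0, 1) if a == th)
print(f"beta_0* is approximately {b0:.3f};  LHS = {lhs:.5f},  RHS = {rhs:.5f}")
```

For these parameter values, δ exceeds the threshold of Proposition 6, β0* is interior, and the printed left- and right-hand sides of (13) agree up to the grid error.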
 
Proof of Proposition 6.

Due to the condition that α/(1 − α) > R, any (β, σ) that leads to a selection of only one action with certainty is dominated by the decision process that terminates after the first round and chooses an action equal to the observed signal value. Thus, both |$\beta ^*_0$| and |$\beta ^*_1$| are positive, and the action strategy is σ*(x) = x or σ*(x) = 1 − x. Let us show that the action strategy σ* must be the identity function σI.

Assume for contradiction that σ*(x) = 1 − x. The payoff difference between the rule rδ(p, β*, σ*) and the choice rule that always selects a = 1 must be positive, since the latter is dominated:
$$\begin{eqnarray} \pi _0u_0r_\delta (0\mid 0;p,\beta ^*,\sigma ^*)+\pi _1u_1r_\delta (1\mid 1;p,\beta ^*,\sigma ^*)-\pi _1u_1 &=& \\ \pi _0u_0r_\delta (1\mid 0;p,\beta ^*,\sigma _I)+\pi _1u_1r_\delta (0\mid 1;p,\beta ^*,\sigma _I)-\pi _1u_1 &\le &\\ \pi _0u_0r_\delta (1\mid 0;p,\beta ^*,\sigma _I)-\pi _1u_1r_\delta (1\mid 1;p,\beta ^*,\sigma _I) \gt 0, & & \end{eqnarray}$$
where the first inequality follows from the fact that any discounted choice rule satisfies ∑arδ(a ∣ θ; p, β, σ) ≤ 1. Similarly, the payoff difference between the rule rδ(p, β*, σ*) and the rule that always selects a = 0 must be positive:
$$\begin{eqnarray} \pi _0u_0r_\delta (0\mid 0;p,\beta ^*,\sigma ^*)+\pi _1u_1r_\delta (1\mid 1;p,\beta ^*,\sigma ^*)-\pi _0u_0 &=&\\ \pi _0u_0r_\delta (1\mid 0;p,\beta ^*,\sigma _I)+\pi _1u_1r_\delta (0\mid 1;p,\beta ^*,\sigma _I)-\pi _0u_0 &\le &\\ \pi _1u_1r_\delta (0\mid 1;p,\beta ^*,\sigma _I)-\pi _0u_0 r_\delta (0\mid 0;p,\beta ^*,\sigma _I) \gt 0. & & \end{eqnarray}$$
The last two inequalities imply:
$$\begin{eqnarray} \frac{r_\delta (0\mid 1;p,\beta ^*,\sigma _I)}{r_\delta (0\mid 0;p,\beta ^*,\sigma _I)}\gt \frac{\pi _0u_0}{\pi _1u_1}\gt \frac{r_\delta (1\mid 1;p,\beta ^*,\sigma _I)}{r_\delta (1\mid 0;p,\beta ^*,\sigma _I)}. \end{eqnarray}$$
This establishes a contradiction because, as shown in the proof of Proposition 5, |$r_\delta (x\mid \theta ;p,\beta ^*,\sigma _I)=s_\delta (x\mid \theta ;p,\beta ^*)=\frac{\beta ^*_x p(x\,\mid \,\theta )}{1-\delta +\delta \sum _{x^{\prime }}\beta ^*_{x^{\prime }}p(x^{\prime }\,\mid \,\theta )}$|, and thus
$$\begin{eqnarray} \frac{r_\delta (1\mid 1;p,\beta ^*,\sigma _I)r_\delta (0\mid 0;p,\beta ^*,\sigma _I)}{r_\delta (0\mid 1;p,\beta ^*,\sigma _I)r_\delta (1\mid 0;p,\beta ^*,\sigma _I)} =\frac{p(1\mid 1)p(0\mid 0)}{p(0\mid 1)p(1\mid 0)}\gt 1. \end{eqnarray}$$

Further, it must hold that |$\beta ^*_0=1$| or |$\beta ^*_1=1$|. Otherwise, if both |$\beta ^*_0\lt 1$| and |$\beta ^*_1\lt 1$|, then the agent can scale both |$\beta ^*_x$| up by the same factor. This preserves the conditional action distribution in each state θ and increases the decision rates in both states, and thus it is a profitable deviation.

Additionally, it must be that |$\beta ^*_1=1$|⁠: using the expressions for sδ(θ ∣ θ; p, β) = rδ(θ ∣ θ; p, β, σI), the payoff for σI and (β0, β1) = (β, 1) is
$$\begin{eqnarray} \pi _0u_0\frac{\beta \alpha }{1-\delta +\delta (\beta \alpha +1-\alpha )}+\pi _1u_1\frac{\alpha }{1-\delta +\delta (\alpha +\beta (1-\alpha ))}, \end{eqnarray}$$
(A8)
and payoff for σI and (β0, β1) = (1, β) is
$$\begin{eqnarray} \pi _0u_0\frac{\alpha }{1-\delta +\delta (\alpha +\beta (1-\alpha ))}+\pi _1u_1\frac{\beta \alpha }{1-\delta +\delta (\beta \alpha +1-\alpha )}. \end{eqnarray}$$
(A9)
The assumptions that π1u1 > π0u0 and that α > 1/2 imply that, for any β ∈ (0, 1), (A8) exceeds (A9), as needed.
It therefore remains to find |$\beta ^*_0 \in (0,1]$|⁠. If the optimal value is interior, then it satisfies (13) with a = 0:
$$\begin{eqnarray} \pi _0u_0r_\delta (0\mid 0,p,\beta ^*,\sigma _I) &=& \delta \left(\pi _0u_0r^2_\delta (0\mid 0;p,\beta ^*,\sigma _I)\right. \\ && \left. +\pi _1u_1r_\delta (1\mid 1;p,\beta ^*,\sigma _I)r_\delta (0\mid 1;p,\beta ^*,\sigma _I) \right). \end{eqnarray}$$
After the substitution of |$r_\delta (x\mid \theta ;p,\beta ,\sigma _I)=\frac{\beta _xp(x\,\mid \,\theta )}{1-\delta +\delta \sum _{x^{\prime }}\beta _{x^{\prime }}p(x^{\prime }\,\mid \,\theta )}$|⁠, this condition simplifies into a quadratic equation for |$\beta ^*_0$|⁠. When |$\delta \lt \frac{1}{\alpha + (1-\alpha )R}$|⁠, then this condition does not have an interior solution and the derivative of the value (A8) with respect to β0 at β0 = 1 is positive. Thus, in this case, the unique |$\beta ^*_0$| satisfying the first-order condition is |$\beta ^*_0=1$|⁠.

When |$\delta \gt \frac{1}{\alpha + (1-\alpha )R}$|, then the condition has an interior solution and the derivative of the value (A8) with respect to β0 at β0 = 1 is negative. Thus, for this range of parameters, the unique |$\beta ^*_0$| satisfying the first-order condition is the interior value that solves the quadratic equation, and this solution decreases in δ.
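The sign change of the derivative at the threshold can be illustrated directly: the sketch below evaluates (A8) for illustrative parameter values and shows that the slope with respect to β0 at β0 = 1 is positive just below δ = 1/(α + (1 − α)R) and negative just above it.

```python
# Numerical illustration of the threshold in Proposition 6: the derivative of the
# value (A8) with respect to beta_0 at beta_0 = 1 changes sign at
# delta = 1 / (alpha + (1 - alpha) R). Parameter values are illustrative.
alpha, pi0u0, pi1u1 = 0.8, 0.4, 0.6
R = pi1u1 / pi0u0
threshold = 1 / (alpha + (1 - alpha) * R)

def value_A8(beta0, delta):
    # payoff for sigma_I and (beta_0, beta_1) = (beta0, 1), as in (A8)
    return (pi0u0 * beta0 * alpha / (1 - delta + delta * (beta0 * alpha + 1 - alpha))
            + pi1u1 * alpha / (1 - delta + delta * (alpha + beta0 * (1 - alpha))))

for delta in (threshold - 0.05, threshold + 0.05):
    eps = 1e-6
    slope = (value_A8(1.0, delta) - value_A8(1.0 - eps, delta)) / eps
    print(f"delta = {delta:.3f}: dV/dbeta_0 at beta_0 = 1 is {slope:+.4f}")
```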

Notes

This paper has been previously circulated under the title: ‘On Second Thoughts, Selective Memory, and Resulting Behavioral Biases.’ We thank Mark Dean, Andrew Ellis, Alessandro Pavan, Philip Reny, Larry Samuelson, Colin Stewart, Balazs Szentes, colleagues at the University of Edinburgh, the audiences at Bocconi, Queen Mary, Columbia, Ecole Polytechnique, Zurich and St Andrews universities, workshops and conferences in Erice, Alghero, Faro, Gerzensee, New York, Cambridge, Vancouver, and Barcelona, the editor (Gilat Levy), and the two referees for helpful comments. Ludmila Matysková and Jan Šedek provided excellent research assistance. Deborah Nováková and Laura Straková have helped with English. Jakub Steiner has received financial support from the Czech Science Foundation grant 16-00703S and from the ERC grant 770652. Philippe Jehiel thanks the ERC grant n 742816 for funding.

Footnotes

1

In the latter case, while the second-thought-free conditions need not be satisfied for each problem in isolation, we would still derive that some degree of selective hesitation is optimal.

2

Somewhat less related is a literature that explores how exogenous analogy-based and extrapolation-driven errors in learning lead to behavioural biases; see coarse learning in Jehiel (2005) and its application to overoptimism in Jehiel (2018). By contrast, in our approach, the agent optimises the error distribution given the constraints.

3

We do not allow for mixed action strategies since the optimum can always be achieved with a pure action strategy.

4

This insight exploits the assumption of perfect patience, since impatient agents would trade off informativeness against delay costs. We conjecture that when exponential discounting is considered, then the result that the agent ignores all but two signal realisations continues to hold for sufficiently patient agents and generic signal structures.

5

Such summary statistic of |$\mathcal {P}$| continues to exist when 2 < |X| < ∞. For any pair of signal realisations (x, x′) and an experiment p, let |$d_{x,x^{\prime },p}=\frac{p(x\,\mid \,0)p(x^{\prime }\,\mid \,1)}{p(x\,\mid \,1)p(x^{\prime }\,\mid \,0)}$|⁠. Then, |$\overline{d}$| is the maximum of |$d_{x,x^{\prime },p}$| over all ordered pairs (x, x′) and experiments p.

6

This is to be contrasted with the reputation-based explanation of Gentzkow and Shapiro (2006). See also Calvert (1985), Suen (2004), and Che and Mierendorff (2019) for constrained-optimal media-bias models.

7

We can always achieve this by relabelling the states θ and the signal values x, unless ρ0 = ρ1 or ρ0 = 1 − ρ1.

8

These two assumptions are satisfied when p(θ ∣ θ) is sufficiently close to one for each θ.

9

A truncation is obtained by deleting one or more elements from the end of h.

10

Rules ra that always select an action a can be trivially constructed from p and σI by using βa = 1 and βx = 0 for xa.

References

Basu, P. and Chatterjee, K. (2015). 'On interim rationality, belief formation and learning in decision problems with bounded memory', Unpublished.

Bordalo, P., Gennaioli, N. and Shleifer, A. (2012). 'Salience theory of choice under risk', The Quarterly Journal of Economics, vol. 127(3), pp. 1243–85.

Calvert, R.L. (1985). 'The value of biased information: A rational choice model of political advice', The Journal of Politics, vol. 47(2), pp. 530–55.

Che, Y.K. and Mierendorff, K. (2019). 'Optimal dynamic allocation of attention', American Economic Review, vol. 109(8), pp. 2993–3029.

Compte, O. and Postlewaite, A. (2012). 'Belief formation', Working Paper, University of Pennsylvania, Unpublished.

Fryer, R.G., Harms, P. and Jackson, M.O. (2018). 'Updating beliefs when evidence is open to interpretation: Implications for bias and polarization', Journal of the European Economic Association, vol. 17(5), pp. 1470–501.

Fudenberg, D., Strack, P. and Strzalecki, T. (2018). 'Speed, accuracy, and the optimal timing of choices', American Economic Review, vol. 108(12), pp. 3651–84.

Gabaix, X. and Laibson, D. (2017). 'Myopia and discounting', No. w23254, National Bureau of Economic Research.

Gentzkow, M. and Shapiro, J.M. (2006). 'Media bias and reputation', Journal of Political Economy, vol. 114(2), pp. 280–316.

Hellman, M. and Cover, T.M. (1970). 'Learning with finite memory', Annals of Mathematical Statistics, vol. 41, pp. 765–82.

Itti, L. (2007). 'Visual salience', Scholarpedia, vol. 2(9), p. 3327.

Jehiel, P. (2005). 'Analogy-based expectation equilibrium', Journal of Economic Theory, vol. 123(2), pp. 81–104.

Jehiel, P. (2018). 'Investment strategy and selection bias: An equilibrium perspective on overoptimism', American Economic Review, vol. 108(6), pp. 1582–97. doi:10.1257/aer.20161696.

Kahneman, D. (2011). Thinking, Fast and Slow, New York: Farrar, Strauss, Giroux.

Kahneman, D. and Tversky, A. (1979). 'Prospect theory: An analysis of decision under risk', Econometrica, vol. 47(2), pp. 263–91.

Khaw, M.W., Li, Z. and Woodford, M. (2017). 'Risk aversion as a perceptual bias', No. w23294, National Bureau of Economic Research.

Leung, B.T.K. (2020). 'Limited cognitive ability and selective information processing', Games and Economic Behavior, vol. 120, pp. 345–69.

Luce, R.D. (1986). Response Times: Their Role in Inferring Elementary Mental Organization, New York: Oxford University Press.

Meyer, M.A. (1991). 'Learning from coarse information: Biased contests and career profiles', The Review of Economic Studies, vol. 58(1), pp. 15–41.

Netzer, N. (2009). 'Evolution of time preferences and attitudes toward risk', The American Economic Review, vol. 99(3), pp. 937–55.

Nickerson, R.S. (1998). 'Confirmation bias: A ubiquitous phenomenon in many guises', Review of General Psychology, vol. 2(2), p. 175.

Oswald, M.E. and Grosjean, S. (2004). 'Confirmation bias', in (Pohl, R.F., ed.), Cognitive Illusions: A Handbook on Fallacies and Biases in Thinking, Judgement and Memory, vol. 79.

Piccione, M. and Rubinstein, A. (1997). 'On the interpretation of decision problems with imperfect recall', Games and Economic Behavior, vol. 20(1), pp. 3–24.

Rabin, M. and Schrag, J.L. (1999). 'First impressions matter: A model of confirmatory bias', The Quarterly Journal of Economics, vol. 114(1), pp. 37–82.

Ratcliff, R. and McKoon, G. (2008). 'The diffusion decision model: Theory and data for two-choice decision tasks', Neural Computation, vol. 20(4), pp. 873–922.

Rayo, L. and Becker, G.S. (2007). 'Evolutionary efficiency and happiness', Journal of Political Economy, vol. 115(2), pp. 302–37.

Robson, A.J. (2001). 'The biological basis of economic behavior', Journal of Economic Literature, vol. 39(1), pp. 11–33.

Sims, C.A. (2003). 'Implications of rational inattention', Journal of Monetary Economics, vol. 50(3), pp. 665–90.

Suen, W. (2004). 'The self-perpetuation of biased beliefs', Economic Journal, vol. 114(495), pp. 377–96.

Swensson, R.G. (1972). 'The elusive tradeoff: Speed vs accuracy in visual discrimination tasks', Perception & Psychophysics, vol. 12(1), pp. 16–32.

Wald, A. (1945). 'Sequential tests of statistical hypotheses', The Annals of Mathematical Statistics, vol. 16(2), pp. 117–86.

Wilson, A. (2014). 'Bounded memory and biases in information processing', Econometrica, vol. 82(6), pp. 2257–94.
