Learning and Smooth Stopping

Robin Mason∗        Juuso Välimäki†

12 October 2009

Abstract

We propose a simple model of optimal stopping where the economic environment changes as a result of learning. A primary application of our framework is the problem of how optimally to sell an asset when the demand for that asset is initially uncertain. We distinguish between two versions of the model. In the first version, a lower price is always more informative; in the second, an intermediate price conveys most information to the seller. For the first model, we show that learning leads to a higher posted price by the seller. In the second version, we give sufficient conditions so that learning leads to a higher posted price for an optimistic seller and a lower price for a pessimistic seller.



∗ University of Exeter and CEPR. University of Exeter Business School, Streatham Court, Rennes Drive, Exeter EX4 4PU, UK, [email protected]. Robin Mason acknowledges financial support from the ESRC under Research Grant RES-062-23-0925.
† Helsinki School of Economics and University of Southampton, and HECER. Arkadiankatu 7, FI-00100 Helsinki, [email protected].

1 Introduction

Models of optimal stopping capture the trade-off between known current payoffs and the opportunity cost of future, potentially superior, possibilities. In many of their economic applications, the environment in which the stopping decisions are taken is assumed to be stationary. In the standard job search model, for example, a rejected offer is followed by another draw from the same distribution of offers. See Rogerson, Shimer, and Wright (2005) for a survey. The predictions of these models are at odds with empirical observations. Prices of houses decline as a function of time on the market; see e.g., Merlo and Ortalo-Magne (2004, p. 214). Reservation wages of unemployed workers decline in the duration of the unemployment spell; see e.g., Lancaster and Chesher (1983).

We propose a simple model of optimal stopping where the economic environment changes as a result of learning. From a methodological point of view, our main contribution is the development of a framework that captures essential features of the learning problem and yet is analytically tractable. Rather than using the deterministic model of optimal stopping typically used in optimal search models, we generalise the other canonical model of stopping in economic theory, where stopping occurs probabilistically (as in e.g., models of R&D). We cast the model in the language of a seller of a single asset, but in the concluding section, we point out alternative interpretations for the model.

In the main model, a seller that is initially uncertain about demand for its asset encounters potential buyers randomly. The seller posts a take-it-or-leave-it price; potential buyers arrive according to a Poisson process and observe the posted price. They buy if and only if their valuation exceeds the posted price. The seller is initially uncertain about the probability of encountering a buyer, and the only information that the seller receives is the decision to buy.
The seller updates its beliefs about demand for its good in a Bayesian fashion, becoming more pessimistic about future selling opportunities after each period in which a sale does not occur. The rate at which beliefs about future selling opportunities decline depends on the current posted price. Hence, when choosing the current price of its asset, the seller controls its immediate expected profit, conditional on a sale occurring, as well as its beliefs about future demand if no sale occurs.

This is a variant of the classic problem studied by Karlin (1962) on the optimal policy for selling an asset. Several examples fit into the framework. Consider first the problem of selling a house. The seller has a single item—his house—to sell. The seller puts details of the house in the window of a real estate agent (or a newspaper), including the asking (or reservation) price. It takes time for potential buyers to notice that a house is for sale. Buyers arrive randomly over time and an individual buyer’s valuation is a random draw. A buyer offers to pay the asking price only if it is less than her valuation; otherwise, she makes no offer and continues to search for other houses. We assume that a buyer does not return in the future: hence any buyer faces a static decision problem. At the outset, the seller may be uncertain about how attractive the house is to the other side of the market. Delay in selling the house conveys information about the true demand for the house. In choosing an asking price, the seller affects not just his payoff when he sells his house, but also how much information he receives. (At one extreme, if the seller sets a very high price, no buyer will be willing to buy, regardless of the state of demand.) The seller’s task is to set the asking price for his house over time, as he learns about demand.

The same model can be related to the classic job search example described by e.g., Lippman and McCall (1976). An individual, referred to as the searcher, is seeking employment. Each and every day (until he accepts a job), he posts a job application (which states his reservation wage) to a potential employer. In Lippman and McCall, the searcher generates exactly one job offer each day. In our setting, we suppose that vacancies occur randomly over time. The searcher’s skills are unvarying, but prospective employers do not necessarily value them equally. A job offer is made if and only if an employer has a vacancy on the day that it receives the searcher’s application, and the employer’s valuation of the searcher exceeds the reservation wage. Offers not made immediately are lost (the “sampling without recall” case considered by Lippman and McCall). The searcher accepts the first offer that is made that meets his reservation wage and transits to the permanent state of employment. When an offer is not received, the searcher revises his beliefs about demand for his services, and so adjusts the reservation wage that he states in his future applications.

Even though our model is a standard Bayesian learning model, the belief dynamics may appear unfamiliar at first sight. Conditional on a sale occurring, the game is over and continuation beliefs play no further role. When we describe the evolution of beliefs over time, we are implicitly conditioning on the event that no sale occurred in the current period. Since this event is bad news about the prospect of a future sale, the seller becomes more pessimistic over time. The seller can then control the (downward) drift of its beliefs through its choice of price. In addition, the process of beliefs is effectively deterministic, again because continuation beliefs play a role in only one of the two possible random outcomes. This is in contrast to many learning models in which the continuation beliefs are stochastic and form a martingale, and in which an agent can control only the variance of her beliefs.

We identify two key effects by comparing a model with learning to a model with stationary posted prices. A more pessimistic future implies a lower current value of being a seller and hence a lower current price. We call this feature the controlled stopping effect. As we have already noted, the current price also determines the magnitude of the change in beliefs if no sale occurs. By setting a more informative price, the seller can cause its beliefs to fall further in the event of no sale occurring. A more pessimistic seller has a lower value (to being a seller) than a more optimistic seller—there is a capital loss from learning. Holding fixed the probability of a sale occurring, then, the seller has an incentive to move its price in the direction that minimizes this capital loss. We call this feature the controlled learning effect.

Which effect dominates? Does learning cause the seller to raise or lower its price, relative to the case when no learning occurs?
How does this comparison depend on the level of beliefs (i.e., the degree of optimism or pessimism) of the seller? Finally, does a more pessimistic seller always set a lower price, or do the two effects lead to more complicated dynamics for the price?

A key question for the analysis is whether these two effects work in the same or in opposite directions. The benchmark is a model where the rate of contact between the seller and buyers is fixed at its current expected level. Since the value of continuing as a seller decreases as the seller becomes more pessimistic (in the model with learning), the controlled stopping effect on its own leads to lower optimal posted prices. We then distinguish between two different scenarios. In the first, the seller learns about the Poisson rate of contact with buyers. In this case, a lower price is always more informative. We show that the controlled learning effect dominates. Hence the optimal posted price in this first case exceeds the price posted in the equivalent model with no learning.

In the second case, the seller learns about the distribution from which buyers’ valuations are drawn (but knows the rate of contact with buyers). If all buyers are willing to buy at very low prices and no buyers buy at very high prices, then it is clear that intermediate prices are the most informative. As a result, the controlled learning effect pulls up the price of an optimistic seller and pushes down the price of a pessimistic one. Since the controlled stopping effect always pushes prices downwards, it is clear that the model with learning yields a relatively low price for a pessimistic seller, in comparison to the model with no learning. We also derive sufficient conditions for the model to yield a higher price for an optimistic seller in the learning case. As a result, learning accelerates the decline in price as beliefs become more pessimistic.

A number of papers have studied the problem of a monopolist who is uncertain about the demand it faces and learns about it over time through its pricing experience. In Rothschild (1974), McLennan (1984), Easley and Kiefer (1988), Aghion, Bolton, Harris, and Jullien (1991) and Mirman, Samuelson, and Urbano (1993), the demand curve is fixed and the learning process narrows down the initial uncertainty about it. Rothschild (1974) showed that the cost of experimentation will typically prevent complete learning, even though the true state of demand could otherwise be learned perfectly. Aghion, Bolton, Harris, and Jullien (1991) provide a rigorous assessment of the situations in which the seller will choose to learn perfectly about demand, and of those in which it will not. (In other papers, the object of the learning is not constant, but changes over time in some random way. See, for example, Balvers and Cosimano (1990), Rustichini and Wolinsky (1995) and Keller and Rady (1999).)


The general difficulty in analysing this type of situation is the updating that occurs when the seller observes that the object has failed to sell at a particular price. In some models, this would cause the seller to revise downwards her beliefs on the value buyers place on the object. In general, this updating cannot be modelled in a tractable way. (There are certain distributions of valuations for which updating can be done analytically; but even for these cases, the amount of progress that can be made is minimal.) A notable exception to this is Trefler (1993), who provides a characterization of the expected value of information and the direction of experimentation in several cases of learning by a monopoly seller. But there are several differences between our work and Trefler’s. First, we consider the sale of a single item; as a result, the monopolist faces a stopping problem (albeit one where stopping occurs probabilistically). In Trefler’s model, the true (but unknown) economic environment remains stationary across periods, and the model is thus one of repeated sales. Secondly, Trefler considers only cases where the expected value of information is affected monotonically by the seller’s price. In contrast, we consider both the monotonic (section 3) and non-monotonic (section 4) cases. Finally, due to the ‘smooth’ environment that we consider, the Bellman equations are first-order differential equations; they are therefore particularly easy to analyse. Our paper is also related to the literature on search with learning. Burdett and Vishwanath (1988) analyse an optimal stopping problem when wage offers come from exogenous sources. Crucially, in Burdett and Vishwanath (1988), learning is independent of the current actions and hence the stopping problem is quite different from the one that we propose to study. 
Furthermore, the analysis in Burdett and Vishwanath (1988) is complicated by the fact that new wage offers lead to discrete jumps in the posterior belief, whereas in our model, the evolution of beliefs is smooth (see e.g., equation (9)). There is a small number of papers on search in models where the state of the system changes exogenously. Van den Berg (1990) and Smith (1999) are examples of such models. Anderson and Smith (2006) analyse a matching model where each match results in a current immediate payoff and reveals new information about the productivity of the partners. In all of these papers, beliefs change, i.e., the environment is non-stationary; but the change in beliefs is not affected by the action chosen by the economic agent. Our contribution is to develop a tractable framework that allows agents to affect their learning through their actions.

Finally, the analysis has some resemblance to models of bargaining, e.g., Sobel and Takahashi (1983), Gul, Sonnenschein, and Wilson (1986), and Ausubel and Deneckere (1989). In these models with one-sided incomplete information, the uninformed seller makes all the offers, and the game stops when an offer is accepted. Some of the results are similar to those we find in our analysis; for example, that the optimal selling price falls over time as the seller becomes more pessimistic. Nevertheless, our model is quite different from bargaining. Most obviously, it is not a game, but a decision problem. The buyers in our model are myopic and do not behave strategically. This simplifies the situation considerably and allows us to consider the case where learning is non-monotonic (see section 4).

The paper is organised as follows. Section 2 presents the general model of learning. Section 3 analyses the model in the case of learning about the rate of contact between the seller and buyers. Section 4 deals with the case of learning about valuations. Section 5 discusses alternative interpretations of the model and concludes.

2 Learning to sell

Consider a seller setting a take-it-or-leave-it price for a single unit of a good of known quality. The seller announces at the start of each period the price for that period. If a buyer announces that she is willing to buy at that price, then the seller sells and stops searching. The probability that the seller encounters such a buyer depends on two factors: whether a buyer is present during the period; and whether that buyer is willing to pay the seller’s posted price. The seller does not observe whether a buyer is present: it observes only when a sale occurs. (We comment further on this below.) Each buyer is present (if at all) for one period, and then disappears. We assume that the seller has a zero marginal cost (or valuation of the product). But discounting means that delay in selling the good is costly.

Time is continuous, the seller is infinitely lived and discounts the future at rate r. There are two states of the world: ‘high’ (H) or ‘low’ (L). The seller is initially uncertain about the state and its problem is to post prices to optimize the present value of expected profits. The problem can thus be cast as a continuous-time discounted dynamic program where the belief about the state of the world serves as the state variable. During a time interval of length ∆t, the seller sells in state H, when posting a price p ∈ R+, with probability GH(p)∆t; the probability in state L is GL(p)∆t. Both probabilities are decreasing in p (a higher price makes it less likely that a sale occurs). Let the corresponding derivatives be gL(p) and gH(p), both non-positive. Assume that the distributions GH(·) and GL(·) can be ranked by hazard rate dominance:

Assumption 1 (Hazard rate dominance)

    gH(p)/(1 − GH(p)) ≥ gL(p)/(1 − GL(p))   for all p ∈ R+.

Note that assumption 1 implies first-order stochastic dominance (FOSD), i.e., ∆G(p) ≡ GH(p) − GL(p) ≥ 0. Let ∆g(p) ≡ gH(p) − gL(p). The state variable in this problem is then the seller’s scalar belief π ∈ [0, 1] that the state of the world is high. When a sale occurs (at the posted price), the problem stops. Hence the only event that is relevant for updating the seller’s beliefs is when no sale has occurred in a period. Denote the posterior probability that the distribution is H, after a history that has included no sales, by π. Along the optimal path, sales must occur with strictly positive probability and hence π must be strictly decreasing, by FOSD. The updated posterior after no sale is, by Bayes’ rule,

    π + ∆π = π(1 − GH(p)∆t) / [π(1 − GH(p)∆t) + (1 − π)(1 − GL(p)∆t)],

so that

    ∆π = −π(1 − π)∆G(p)∆t / [1 − G(p, π)∆t] ≤ 0,    (1)

where G(p, π) ≡ πGH(p) + (1 − π)GL(p). Notice that the seller becomes more pessimistic about the state of the world when no sale occurs.

One modelling assumption deserves further comment. We assume that the seller observes only sales, and not the arrivals of buyers. One interpretation of this assumption is that the seller places an advertisement in a newspaper each period. The seller cannot tell if a buyer has seen the advertisement. This assumption ensures two things. First, the probability of stopping, G(p, π)∆t, is of order ∆t. Secondly, the change in beliefs ∆π is also of order ∆t. We shall be interested in the continuous-time limit, as ∆t → 0. This assumption then ensures that the dynamic programming problem yields a differential, rather than a difference, equation. (This is why we use the label “smooth stopping” for our problem.) As a result, we are able to obtain the qualitative features of the optimal price with learning.

First consider the seller’s static problem:

    max_p { G(p, π)p }.

Let g(p, π) ≡ πgH(p) + (1 − π)gL(p) ≤ 0. The first-order condition is

    ΓS(p, π) ≡ r [G(p, π) + g(p, π)p] = 0.

(The r factor appears to aid comparisons later.) Assume that ΓS(p, π) is strictly decreasing in p, for all π ∈ [0, 1]. In this case, the static problem has a unique solution pS(π).
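The updating rule (1) is straightforward to compute. The following sketch uses assumed functional forms (Poisson contact rates λH = 2, λL = 1 times uniform valuations; none of this is imposed by the general model) to illustrate how a run of no-sale periods drags the posterior down, and how a price at the top of the support shuts learning off:

```python
# A sketch of the no-sale Bayesian update (1). The functional forms for
# GH and GL (Poisson rates times uniform valuations) are assumed here.

def G_H(p):
    return 2.0 * max(0.0, 1.0 - p)   # assumed: lambda_H = 2, uniform valuations

def G_L(p):
    return 1.0 * max(0.0, 1.0 - p)   # assumed: lambda_L = 1

def update(pi, p, dt=0.01):
    """Posterior on state H after an interval of length dt with no sale."""
    no_sale_H = 1.0 - G_H(p) * dt
    no_sale_L = 1.0 - G_L(p) * dt
    return pi * no_sale_H / (pi * no_sale_H + (1.0 - pi) * no_sale_L)

pi = 0.5
for _ in range(100):                 # a run of periods in which no sale occurs
    pi = update(pi, p=0.5)
print(pi)                            # beliefs have drifted down from 0.5
```

At p = 1, neither GH nor GL puts any probability on a sale, so the posterior is unchanged: no news is then genuinely no news.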


as a natural point of comparison to the model that incorporates learning i.e., with π changing over time. We refer to this case as the repeated problem. (See also Keller (2007); our repeated benchmark corresponds to his “naive learner” case.) For the repeated problem, the seller’s Bellman equation is  VR (π) = max G(p, π)∆tp + (1 − r∆t)(1 − G(p, π)∆t)VR (π) . p

In the continuous-time limit as ∆t → 0, this becomes  rVR (π) = max G(p, π)(p − VR (π)) . p

(2)

We denote the maximizer of this problem by pR (π)The first-order condition for the repeated problem is ΓR (pR , π) ≡ ΓS (pR (π), π) + G2 (pR (π), π) = 0.

(3)

Since G(p, ·) is decreasing in p, ΓR (p, ·) is decreasing in p; hence there is a unique solution to equation (3). And since G(·, ·) ≥ 0, pR (π) ≥ pS , as expected. We summarise this in the following lemma. Lemma 1 The repeated price is greater than the static price: pR (π) ≥ pS for all π ∈ [0, 1].

For an arbitrary p, we write

    VR(p, π) = G(p, π)p / [r + G(p, π)].

The value function of the repeated problem is then

    VR(π) = max_p VR(p, π) = G(pR(π), π)pR(π) / [r + G(pR(π), π)].
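Because VR(p, π) has the closed form above, both the static and the repeated prices can be found by direct grid search. The sketch below is an illustration of Lemma 1, not part of the formal argument; the discount rate, Poisson rates and uniform valuations are all assumed values:

```python
# Grid-search sketch of the static price p_S and the repeated price p_R,
# using the closed form V_R(p, pi) = G p / (r + G). The Poisson-rate and
# uniform-valuation specification is an illustrative assumption.

import numpy as np

r = 0.1
lam_H, lam_L = 2.0, 1.0
prices = np.linspace(0.0, 1.0, 2001)

def G(p, pi):
    """Expected sale rate at price p under belief pi (uniform valuations)."""
    return (pi * lam_H + (1 - pi) * lam_L) * np.clip(1.0 - p, 0.0, 1.0)

def p_static(pi):
    return prices[np.argmax(G(prices, pi) * prices)]

def p_repeated(pi):
    return prices[np.argmax(G(prices, pi) * prices / (r + G(prices, pi)))]

print(p_static(0.5), p_repeated(0.5))   # Lemma 1: p_R(pi) >= p_S(pi)
```

With these forms the repeated price also rises with the belief π, in line with Proposition 1 below.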

It will be helpful for later arguments to establish the monotonicity of pR(π).

Proposition 1 The repeated price pR(π) is non-decreasing in the level of the seller’s belief π for all π ∈ [0, 1].

Proof. The first-order condition for pR(π) can be written as

    G(p, π) + g(p, π)(p − VR(π)) = 0.

Total differentiation of this equation, along with the second-order condition, means that pR(π) is non-decreasing in π iff

    ∆G(p) − [G(p, π)/g(p, π)]∆g(p) − g(p, π)VR′(π),

evaluated at p = pR(π), is non-negative. Since g(·, ·) ≤ 0 and VR′(·) ≥ 0, the last term −g(p, π)VR′(π) ≥ 0. The first two terms are together positive, by assumption 1.



Consider next the model with learning: the dynamic problem. The crucial difference compared to the repeated problem is that now π evolves over time according to equation (1). The seller anticipates this and adjusts the optimal price to control the current probability of a sale occurring and the future value of sales. The seller’s Bellman equation is now

    VD(π) = max_p { G(p, π)∆t p + (1 − r∆t)(1 − G(p, π)∆t)VD(π + ∆π) }.

In the continuous-time limit as ∆t → 0, this becomes

    rVD(π) = max_p { G(p, π)(p − VD(π)) − π(1 − π)∆G(p)VD′(π) }.    (4)

It is easy to see that in this problem, VD′(π) > 0 almost everywhere. Since the value in any program is realized only at the stopping moment, the value of any fixed policy must be increasing in the arrival rate and hence in π. Hence the optimal policy starting at π yields a higher value when started at state π′ > π, and as a consequence VD(π′) > VD(π). We show in the appendix (see lemma 3) that VD(π) is convex and, as a result, VD(π) is differentiable almost everywhere. After standard manipulations, the Bellman equation can be rewritten as

    VD(π) = max_p { VR(p, π) − [∆G(p)/(r + G(p, π))] π(1 − π)VD′(π) }.    (5)

This latter equation makes clear the connection between the repeated and dynamic problems. The dynamic value function is equal to a repeated value function, VR(pD, π) (evaluated at the optimal dynamic price, pD), minus a term that arises due to learning. Note that −∆G(p)π(1 − π)VD′(π) is the capital change (a loss) of the seller’s program resulting from the evolution of the seller’s posterior. The learning term is the present value of the infinite stream of the capital losses due to learning, using the effective discount rate r + G(p, π).

It is instructive to analyse the behaviour of the learning term more carefully. The factor π(1 − π)VD′(π) in the capital loss term represents the impact of a unit of learning on future profits. It is unaffected by the current choice of p and hence we ignore its effect for the moment. We thus concentrate on the behaviour of

    ∆G(p)/(r + G(p, π))

as a function of p. Its derivative with respect to p has two terms. The first,

    ∆G(p)g(p, π)/(r + G(p, π))² ≤ 0,

measures the effect of p on the modified stopping rate. We call this effect the controlled stopping effect. This effect captures the increasing pessimism of the seller and always has a negative sign. This effect alone leads to a lower posted price by the seller. The second term,

    ∆g(p)/(r + G(p, π)),

measures the change in learning, normalised by the stopping rate, resulting from a change in p. We call this effect the controlled learning effect. The sign of this term is ambiguous, since ∆g(p) may take either sign.

The first-order condition for the dynamic problem is

    ΓR(p, π) − ΓD(p, π)π(1 − π)VD′(π) = 0,    (6)

where

    ΓD(p, π) ≡ −g(p, π)∆G(p) + (r + G(p, π))∆g(p).    (7)

The first term, −g(p, π)∆G(p), represents the controlled stopping effect; the second term, (r + G(p, π))∆g(p), the controlled learning effect. The sign of ΓD(p, π) is crucial for determining whether the optimal price with learning, pD(π), is above or below the repeated price pR(π). If ΓD(p, π) is non-negative for all p and π, then pD(π) ≥ pR(π). If ΓD(p, π) changes sign as p and/or π changes, then the comparison of pD(π) and pR(π) will be more complicated.

An alternative way of viewing this situation is to define the expected continuation value of information (DeGroot (1962)):

    I(p, π) ≡ E[VD(πt+∆t)] − VD(π)
            = −[G(p, π)VD(π) + π(1 − π)∆G(p)VD′(π)]∆t + O((∆t)²).    (8)

Ignoring higher-order terms in ∆t, I(p, π) ≤ 0: the expected value of information is negative. Notice that we are conditioning here on no sale in the period (hence the name continuation value of information), and this is why the value of information is negative. The marginal effect of a small change in price on the expected value of information is

    ∂I(p, π)/∂p = −[g(p, π)VD(π) + π(1 − π)∆g(p)VD′(π)]∆t.

The first term is certainly positive, since g(·, ·) ≤ 0. The sign of the second term is ambiguous. We now consider these two cases, by analysing two versions of this model. In the first, the seller is learning about the arrival rate of buyers; in the second case, there is uncertainty about the distribution of buyers’ valuations.
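To fix ideas before turning to the two versions, the dynamic Bellman equation can be approximated numerically by value iteration on the discrete-time form, applying the update (1) between periods. The sketch below is purely illustrative: the rates, the uniform-valuation specification and all parameter values are assumptions, not part of the model:

```python
# A value-iteration sketch of the dynamic problem: the discrete-time
# Bellman equation with the no-sale update (1) applied between periods.
# Rates and the uniform-valuation form are illustrative assumptions.

import numpy as np

r, dt = 0.1, 0.01
lam_H, lam_L = 2.0, 1.0
pis = np.linspace(0.0, 1.0, 101)        # belief grid
prices = np.linspace(0.0, 1.0, 101)     # price grid
sale = np.clip(1.0 - prices, 0.0, 1.0)  # 1 - F(p) for uniform valuations

V = np.zeros_like(pis)
for _ in range(2000):                   # value iteration
    V_new = np.empty_like(V)
    for i, pi in enumerate(pis):
        lam = pi * lam_H + (1 - pi) * lam_L
        no_sale = 1.0 - lam * sale * dt                    # prob of no sale
        post = pi * (1.0 - lam_H * sale * dt) / no_sale    # Bayes' rule (1)
        cont = np.interp(post, pis, V)                     # V at updated belief
        V_new[i] = np.max(lam * sale * dt * prices
                          + (1.0 - r * dt) * no_sale * cont)
    V = V_new
print(V[0], V[-1])   # the value is increasing in the belief pi
```

At π = 0 and π = 1 the posterior is degenerate, so the scheme collapses to the known-state repeated problem at those endpoints, which gives a convenient check on the iteration.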

3 Learning about the arrival rate of buyers

In this first case, the states of the world are distinguished by the frequency of arrivals of buyers. In the low state L, the Poisson arrival rate is λL > 0. In the high state H, the Poisson arrival rate is λH > λL. Buyers’ valuations are drawn from a distribution F(·) with a density function f(·), so that, conditional on a buyer being present, the probability of the seller’s price p being met is 1 − F(p); the support of the distribution is normalised to the unit interval. Hence GH(p) = λH(1 − F(p)) and GL(p) = λL(1 − F(p)). Assume that F(p) satisfies the monotone likelihood ratio property. This ensures that 1 − F(p) − pf(p) is strictly decreasing in p, so that the objective function in the static problem is concave and therefore the problem has a unique solution. Let λ(π) ≡ πλH + (1 − π)λL.¹

Bayes’ rule, given in equation (1), gives the seller’s posterior after a period when no sale occurred; in the continuous-time limit (∆t → 0), this is

    lim_{∆t→0} ∆π/∆t ≡ π̇(p, π) = −π(1 − π)∆λ(1 − F(p)) < 0,    (9)

where ∆λ ≡ λH − λL > 0. Notice that by setting a high price, the seller decreases the rate at which its posterior goes down, since π̇p(p, π) = π(1 − π)∆λf(p) > 0. The expected continuation value of information is

    I(p, π) = −λ(π)(1 − F(p))[V(π) + π(1 − π)(∆λ/λ(π))V′(π)]∆t,

so that

    ∂I(p, π)/∂p = λ(π)f(p)[V(π) + π(1 − π)(∆λ/λ(π))V′(π)]∆t ≥ 0.

¹It is worth noting that we could allow the overall arrival rate of buyers to respond to the seller’s price, with little change to the analysis. Suppose that the probability of a buyer arriving during an interval ∆t is λiφ(p)∆t, where λi ∈ {λL, λH} and φ(p) is a (known) decreasing function of p. Learning takes place about the value of λi. Suppose further that φ(p)p(1 − F(p)) is concave in p. It is then easy to verify that there are no substantive changes in the analysis.
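Because beliefs conditional on no sale evolve deterministically, equation (9) can be simulated by a simple Euler scheme. The sketch below (uniform valuations F(p) = p and the two arrival rates are assumed for illustration) shows that a higher posted price slows the decline of the posterior:

```python
# Euler simulation of the belief dynamics (9), conditional on no sale:
# pi_dot = -pi (1 - pi) dlam (1 - F(p)). Uniform valuations F(p) = p and
# the arrival rates are illustrative assumptions.

lam_H, lam_L = 2.0, 1.0
dlam = lam_H - lam_L

def belief_path(p, pi0=0.8, T=5.0, dt=1e-3):
    """Terminal belief after time T with no sale, at a constant posted price p."""
    pi = pi0
    for _ in range(int(T / dt)):
        pi += dt * (-pi * (1.0 - pi) * dlam * (1.0 - p))   # equation (9)
    return pi

print(belief_path(p=0.3), belief_path(p=0.9))
# the higher price slows learning, so beliefs fall by less
```

At p = 1 the drift is zero, so the posterior does not move at all: a price at the top of the support generates no information.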


Hence a higher price raises the expected continuation value of information: it slows the seller’s learning and so reduces the expected capital loss from it. Specializing the analysis in the previous section to the current case, we can write the value function in the repeated problem as

    VR(π) = λ(π)(1 − F(pR(π)))pR(π) / [r + λ(π)(1 − F(pR(π)))].

Furthermore, the seller’s dynamic Bellman equation (5) can be written as

    VD(π) = max_p { VR(p, π) − [∆λ(1 − F(p))/(r + λ(π)(1 − F(p)))] π(1 − π)VD′(π) }.    (10)

Because learning takes a particularly simple multiplicative form in this model, it is possible to sign the overall effect of learning unambiguously. Equation (6) shows that the sign of ΓD(p, π) is critical for determining the optimal learning price pD(π). In this model,

    ΓD(p, π) = −r∆λf(p),

which is non-positive for all p. Consequently, the first-order condition is

    ΓR(p, π) + r∆λf(p)π(1 − π)VD′(π) = 0.    (11)

We can also combine equations (10) and (11) to give

    (1 − F(p))²/f(p) = rVD(π)/λ(π),    (12)

with the solution being pD(π). (Since the left-hand side is a monotonic function of p, due to the monotone likelihood ratio property of F(·), there is a unique solution.) Since VD(·) ≥ 0, equation (12) implies that pD(π) ≥ pR(π) for all π.

Hence the price posted by the seller who learns about the state of the world is above the price of the non-learning seller. As a result, the probability that the learning seller makes a sale is below that of the non-learning seller; and this despite the fact that the

learning seller becomes more pessimistic over time about sales. Pessimism on its own would cause the seller to set a lower price, since the seller should be more willing to sell now rather than wait. But the seller is able to slow down the rate at which it becomes more pessimistic by setting a high price. This more than offsets the fall in the seller’s posterior, ensuring a higher price. We summarise this discussion in the following theorem.

Theorem 1 In the case of learning about the arrival rate of buyers, the learning price pD(π) is greater than the repeated price pR(π) for all π ∈ [0, 1].

We can compare this result to Trefler (1993)’s. By Theorem 4 of Trefler, the seller moves its price, relative to the static price pS(π), in the direction that raises the expected value of information. Since a higher price raises the expected value of information, this means that pD(π) ≥ pS(π). This is indeed what we find. We know from lemma 1 that the repeated price is greater than the static price: pR(π) ≥ pS(π); theorem 1 tells us that the dynamic price is higher still. The comparison between the repeated and dynamic prices does not appear in Trefler, since he does not consider the sale of a single asset.

As a result of the ‘smooth’ environment that we consider, the Bellman equation (10) is a first-order ordinary differential equation (ODE). This makes analysis of the seller’s optimal decision a relatively easy matter. The next theorem gives some of its properties; in section 3.1, we solve the ODE explicitly for a particular distribution of buyers’ valuations.

Theorem 2 (Monotonicity) The dynamic learning price, pD(π), and the probability of sale, λ(π)(1 − F(pD(π))), are non-decreasing in the seller’s belief π.

Proof. To prove the first part, note that the first-order condition (12) implies that pD(π) is non-decreasing in π iff rVD(π)/λ(π) is non-decreasing in π. But

    ∂/∂π [rVD(π)/λ(π)] = [r/λ(π)²] [π∆λ(VD′(π) − VD(π)/π) + λL VD′(π)];

by the facts that VD is increasing and convex in π, this is non-negative, and the result follows.

To prove the second part, consider the Bellman equation:

    rVD(π) = max_p λ(π)(1 − F(p)) [ p − VD(π) − π(1 − π)(∆λ/λ(π))VD′(π) ]
           ≡ P(π)Λ(π),

where

    P(π) ≡ λ(π)(1 − F(pD(π))),
    Λ(π) ≡ pD(π) − VD(π) − π(1 − π)(∆λ/λ(π))VD′(π).

Then

    P′(π) = [rVD′(π) − P(π)Λ′(π)] / Λ(π),

assuming that Λ ≠ 0. From the first-order condition,

    Λ(π) = (1 − F(pD(π))) / f(pD(π)).

By assumption, (1 − F(p))/f(p) is decreasing in p. From the first part of the theorem, pD′(π) ≥ 0. This implies that Λ′(π) ≤ 0. Hence P′(π) ≥ 0: i.e., the probability of sale λ(π)(1 − F(pD(π))) increases in π.



We have seen already that the repeated price pR(π) is also increasing in the seller’s belief π: see proposition 1. Hence, in this model where the seller learns about the hazard rate of buyer arrivals, the learning dynamics are qualitatively similar to the dynamics of the repeated problem.

The final result in this section gives the comparative statics of the dynamic price with respect to the discount rate, r.

Proposition 2 The dynamic learning price pD(π) is decreasing in the discount rate r.

Proof. Equation (12) implies that pD(π) is decreasing in r iff rVD(π) is increasing in r. To show the latter, differentiate the Bellman equation (10), using the Envelope theorem:

    ∂(rVD(π))/∂r = max_p { −λ(π)(1 − F(p)) ∂VD(π)/∂r − π(1 − π)∆λ(1 − F(p)) ∂²VD(π)/∂π∂r }.

Basic arguments establish that ∂VD(π)/∂r ≤ 0 and ∂²VD(π)/∂π∂r ≤ 0. The result follows.



The proposition establishes the intuitive property that the dynamic price is monotonically decreasing in the discount rate. An infinitely patient seller (r = 0) will set his price at the upper bound of the support of the valuation distribution. In contrast, a myopic seller (r → ∞) will set his price at the static level, which is clearly lower. The dynamic price moves between these two extremes monotonically in r.²

3.1 Two-point example

We can illustrate these results in a simple case where buyers' valuations can take two values, v̄ > v̲ > 0, with probabilities β ∈ (0, 1) and 1 − β respectively. In this case, we are able to provide explicit solutions for the seller's value function and posted price. For the problem to be interesting, assume that βv̄ < v̲, so that in the static problem the seller would set a price equal to v̲. Also assume that if the seller knows that λ = λ_H, its optimal price is v̄; but if the seller knows that λ = λ_L, its optimal price is v̲. That is, assume
$$\frac{\beta\lambda_H\bar v}{r+\beta\lambda_H}\ \ge\ \frac{\lambda_H\underline v}{r+\lambda_H}\ >\ \frac{\lambda_L\underline v}{r+\lambda_L}\ \ge\ \frac{\beta\lambda_L\bar v}{r+\beta\lambda_L}.$$
By standard arguments, the (dynamic) seller's price strategy then takes a simple form: set the price at v̄ when the posterior is above some cut-off, and at v̲ for posteriors below this level. Let the cut-off level of the posterior be π* ∈ (0, 1). For π ≥ π*, the seller's value function is the solution to the ODE
$$\pi(1-\pi)\Delta\lambda V'(\pi)+\left(\frac{r}{\beta}+\lambda(\pi)\right)V(\pi)-\bar v\,\lambda(\pi)=0.$$

²Perhaps surprisingly, the dynamic price is not monotonic in the arrival rate parameters λ_H and λ_L, a fact confirmed in numerical examples.

The general solution to this ODE is of the form
$$V(\pi)=k_1\left(\frac{1-\pi}{\pi}\right)^{\frac{r+\beta\lambda_H}{\beta\Delta\lambda}}\pi+\frac{\beta\lambda_L\bar v}{r+\beta\lambda_L}+\frac{r\beta\Delta\lambda\,\bar v}{(r+\beta\lambda_L)(r+\beta\lambda_H)}\,\pi$$

for π ≥ π* > 0. The first term represents the option value from learning to the seller. (k₁ is a constant of integration that will be determined below.) It is a convex function of π. For π < π*, the seller sets its price at v̲; its value function is then given by the solution to a second ODE:
$$\pi(1-\pi)\Delta\lambda V'(\pi)+\big(r+\lambda(\pi)\big)V(\pi)-\underline v\,\lambda(\pi)=0.$$

The general solution to this ODE is of the form
$$V(\pi)=k_2\left(\frac{1-\pi}{\pi}\right)^{\frac{r+\lambda_H}{\Delta\lambda}}\pi+\frac{\lambda_L\underline v}{r+\lambda_L}+\frac{r\Delta\lambda\,\underline v}{(r+\lambda_L)(r+\lambda_H)}\,\pi.$$

The first term represents the option value from learning to the seller. It is unbounded as π approaches zero; hence the constant of integration k₂ must equal zero. In words, the seller has no option value when π ≤ π*. The reason is that there is no prospect of the seller's posterior increasing above π*; hence the seller's price is constant (equal to v̲) once the posterior falls below π*. There is then no option for the seller to value. So, for π < π*,

$$V(\pi)=\frac{\lambda_L\underline v}{r+\lambda_L}+\frac{r\Delta\lambda\,\underline v}{(r+\lambda_L)(r+\lambda_H)}\,\pi.$$
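As a sanity check, the linear branch of the value function should satisfy the second ODE exactly (the homogeneous k₂ term having been dropped). A sketch with arbitrary illustrative parameter values:

```python
# Check that V(pi) = lam_L*v/(r+lam_L) + r*dl*v*pi/((r+lam_L)*(r+lam_H))
# solves pi*(1-pi)*dl*V'(pi) + (r + lam(pi))*V(pi) - v*lam(pi) = 0,
# where lam(pi) = lam_L + pi*dl and v is the low valuation.
r, lam_L, lam_H, v = 1.0, 0.5, 2.0, 1.0    # illustrative values only
dl = lam_H - lam_L
D = (r + lam_L) * (r + lam_H)

A = lam_L * v / (r + lam_L)                 # intercept of the linear branch
B = r * dl * v / D                          # slope of the linear branch

def residual(pi):
    V, Vp = A + B * pi, B
    lam = lam_L + pi * dl
    return pi * (1 - pi) * dl * Vp + (r + lam) * V - v * lam

for pi in (0.1, 0.4, 0.7):
    assert abs(residual(pi)) < 1e-12        # the ODE is satisfied identically
```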

There are two remaining parameters to determine: k1 and the optimal boundary π ∗ . Since π ∗ is chosen optimally, value matching and smooth pasting apply; these two boundary conditions are sufficient to determine k1 and π ∗ . The seller’s value function in this case is illustrated in figure 1. It is a strictly convex function of π in the region π ≥ π ∗ ; for π < π ∗ , it is a linear function. The two portions

[Figure 1: The value function with a 2-point valuation distribution: strictly convex for π ≥ π*, linear for π < π*.]

value match and smooth paste at π*, given by

$$\pi^*=\frac{\lambda_L\left[-(\underline v-\beta\bar v)r^2+\big(\beta(\bar v-\underline v)\lambda_L-(\underline v-\beta\bar v)\lambda_H\big)r+\beta(\bar v-\underline v)\lambda_L\lambda_H\right]}{\Delta\lambda\left[-(\underline v-\beta\bar v)r^2+\beta(\bar v-\underline v)(\lambda_L+\lambda_H)r+\beta(\bar v-\underline v)\lambda_L\lambda_H\right]}.$$

Let λ₀ be such that
$$\frac{\beta\lambda_0\bar v}{r+\beta\lambda_0}=\frac{\lambda_0\underline v}{r+\lambda_0},$$
i.e.,
$$\lambda_0=\frac{r(\underline v-\beta\bar v)}{\beta(\bar v-\underline v)}.$$

For a seller who is certain of the arrival rate, the optimal price is v̄ for all λ ≥ λ₀, and v̲ for λ < λ₀. Then define π₀ to be such that π₀λ_H + (1 − π₀)λ_L = λ₀. If π* ≤ π₀, then uncertainty causes the seller to set a higher price in this case with a two-point valuation distribution. Lengthy but straightforward calculation shows that this inequality holds if
$$\frac{\underline v}{r+\lambda_H}<\frac{\beta\bar v}{r+\beta\lambda_H},$$
which was assumed at the outset (it follows from the first inequality assumed above, divided by λ_H). In summary: learning about an unknown arrival rate causes the seller to set a higher price.
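The threshold λ₀ and the corresponding belief π₀ are straightforward to compute. A short sketch with hypothetical parameter values (r = 1, β = 0.5, v̄ = 1.5, v̲ = 1, chosen so that βv̄ < v̲ as required):

```python
# lam0 = r*(v_lo - beta*v_hi) / (beta*(v_hi - v_lo)) is the arrival rate at
# which a fully informed seller is indifferent between posting v_hi and v_lo.
r, beta, v_hi, v_lo = 1.0, 0.5, 1.5, 1.0    # hypothetical values; beta*v_hi < v_lo
lam0 = r * (v_lo - beta * v_hi) / (beta * (v_hi - v_lo))

value_high_price = beta * lam0 * v_hi / (r + beta * lam0)
value_low_price = lam0 * v_lo / (r + lam0)
assert abs(value_high_price - value_low_price) < 1e-12   # indifference at lam0

# pi0 solves pi0*lam_H + (1 - pi0)*lam_L = lam0 (needs lam_L < lam0 < lam_H).
lam_L, lam_H = 0.5, 2.0
pi0 = (lam0 - lam_L) / (lam_H - lam_L)
assert 0 < pi0 < 1
```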

4 Learning about the distribution of buyers' valuations

In this second case, the states of the world are distinguished by the distributions from which a buyer’s valuations are drawn (conditional on a buyer being present). In the high state of the world, valuations are drawn from the distribution FH (·); in the low state, they are drawn from FL (·). We assume now that the corresponding densities fH (·) and fL (·) are continuously differentiable. Let the union of the supports of the distributions be [0, 1] (without loss of generality). Let ∆F (p) ≡ FL (p) − FH (p) > 0 denote the difference between the two distributions; and let ∆f (p) ≡ fL (p) − fH (p) denote the difference between the densities. In both states of the world, buyers arrive randomly according to a Poisson process with parameter λ (the same in both states). Let F (p, π) ≡ πFH (p) + (1 − π)FL (p), and define f (p, π) similarly. Finally, assume that 1−F (p, π)−pf (p, π) is strictly decreasing in p for any π, so that the static selling problem has a unique solution. As in the previous section, Bayes’ rule (equation (1)) gives the seller’s posterior, after a period when no sale occurred; in the continuous-time limit (∆t → 0), this is

$$\dot\pi=-\lambda\pi(1-\pi)\Delta F(p)<0.$$
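The drift can be recovered from the discrete-time Bayes update: after an interval ∆t with no sale at price p, π′ = π(1 − λ∆t(1 − F_H(p)))/(1 − λ∆t(1 − F(p,π))), and (π′ − π)/∆t approaches −λπ(1 − π)∆F(p) as ∆t → 0. A sketch with illustrative distributions (F_H(v) = v², F_L(v) = 2v − v², which also serve as a running example later in this section):

```python
# Discrete-time no-sale Bayes update over a short interval dt, compared
# with the continuous-time drift pidot = -lambda * pi * (1 - pi) * dF(p).
lam, pi, p = 1.0, 0.6, 0.4                   # illustrative values
FH = lambda v: v**2                          # illustrative distributions
FL = lambda v: 2*v - v**2
F = lambda v: pi * FH(v) + (1 - pi) * FL(v)
dF = FL(p) - FH(p)

dt = 1e-6
no_sale_H = 1 - lam * dt * (1 - FH(p))       # P(no sale in dt | H)
no_sale = 1 - lam * dt * (1 - F(p))          # P(no sale in dt)
pi_next = pi * no_sale_H / no_sale           # Bayes' rule after no sale

drift = -lam * pi * (1 - pi) * dF
assert abs((pi_next - pi) / dt - drift) < 1e-4
```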

Notice that the seller again becomes more pessimistic about the state of the world when no sale occurs. Even though the static and the repeated problems can be analysed exactly as in the previous section, the analysis of the dynamic problem turns out to be more complicated. The main difference lies in the first-order condition of the dynamic problem. The

dynamic programming equation is (in the continuous-time limit)
$$V_D(\pi)=\max_p\left\{V_R(p,\pi)-\frac{\Delta F(p)}{1-F(p,\pi)+\delta}\,\pi(1-\pi)V_D'(\pi)\right\},\qquad(13)$$
where
$$V_R(p,\pi)\equiv\frac{\big(1-F(p,\pi)\big)p}{1-F(p,\pi)+\delta}.$$
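For intuition, the repeated-problem price p_R(π) = argmax_p V_R(p,π) can be approximated by a simple grid search; by proposition 1 it is non-decreasing in π. A sketch with illustrative choices δ = 1, F_H(v) = v², F_L(v) = 2v − v²:

```python
# Grid-search the repeated-problem price p_R(pi) = argmax_p V_R(p, pi),
# where V_R(p, pi) = (1 - F(p,pi)) * p / (1 - F(p,pi) + delta).
delta = 1.0                                  # delta = r/lambda, illustrative
FH = lambda v: v**2                          # illustrative distributions
FL = lambda v: 2*v - v**2

def p_R(pi, n=10_000):
    def V_R(p):
        F = pi * FH(p) + (1 - pi) * FL(p)
        return (1 - F) * p / (1 - F + delta)
    grid = [i / n for i in range(n + 1)]
    return max(grid, key=V_R)

# More optimistic beliefs (higher pi puts more weight on F_H, the stronger
# distribution) should lead to a higher posted price.
assert p_R(0.9) > p_R(0.1)
```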

We ignore for the moment the issue of non-concavity of the maximisation problem (but we return to this below). From equation (6), the first-order condition for the dynamic problem is
$$\Gamma_R(p_D(\pi),\pi)-\Gamma_D(p_D(\pi))\,\pi(1-\pi)V_D'(\pi)=0,\qquad(14)$$
where in this model
$$\Gamma_D(p)=\lambda^2 f(p,\pi)\Delta F(p)+\big(r+\lambda(1-F(p,\pi))\big)\lambda\,\Delta f(p)=r\lambda\,\Delta f(p)+\lambda^2\big(f_L(p)(1-F_H(p))-f_H(p)(1-F_L(p))\big).\qquad(15)$$
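The equality of the two expressions for Γ_D in equation (15) is a purely algebraic identity (the π-dependence cancels); this can be confirmed numerically. The distributions below are illustrative:

```python
# Check: lam^2*f*dF + (r + lam*(1-F))*lam*df
#     == r*lam*df + lam^2*(fL*(1-FH) - fH*(1-FL))   for any p and pi.
r, lam = 1.0, 1.0                            # illustrative values
FH, fH = lambda v: v**2, lambda v: 2*v       # illustrative distributions
FL, fL = lambda v: 2*v - v**2, lambda v: 2 - 2*v

def gamma_D_forms(p, pi):
    F = pi * FH(p) + (1 - pi) * FL(p)        # mixture cdf F(p, pi)
    f = pi * fH(p) + (1 - pi) * fL(p)        # mixture density f(p, pi)
    dF, df = FL(p) - FH(p), fL(p) - fH(p)
    lhs = lam**2 * f * dF + (r + lam * (1 - F)) * lam * df
    rhs = r * lam * df + lam**2 * (fL(p) * (1 - FH(p)) - fH(p) * (1 - FL(p)))
    return lhs, rhs

for p in (0.2, 0.5, 0.8):
    for pi in (0.1, 0.6, 0.9):
        lhs, rhs = gamma_D_forms(p, pi)
        assert abs(lhs - rhs) < 1e-12
```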

The sign of Γ_D tells us whether a higher price increases or decreases the present value of the capital losses from learning. This present value is increasing in p when Γ_D is positive, and decreasing in p when Γ_D is negative. As demonstrated in section 2, this effect operates through two channels, seen most clearly in the first line of equation (15): controlled stopping (the term λ²f(p,π)∆F(p)) and controlled learning (the term (r + λ(1 − F(p,π)))λ∆f(p)). It is again easy to see that the controlled stopping effect, on its own, leads to a lower price for the seller: λ²f(p,π)∆F(p) ≥ 0. In fact, this is a feature of all models where continuation beliefs are more pessimistic than current ones and where the action has a negative effect on the probability of stopping. In the case of learning covered in section 3, the controlled learning effect is always positive: a higher price always decreases the learning capital loss. In the case of learning

analysed in this section, if Γ_D(p) had the same sign for all p, then the comparison of p_D(π) and p_R(π) would be equally straightforward. But this is not the case. Γ_D(0) = λ(r + λ)∆f(0) must be non-negative; and Γ_D(1) = rλ∆f(1) must be non-positive (both due to FOSD). Hence Γ_D(p) must cross zero from above at least once. This non-monotonicity arises quite naturally in the model of learning about distributions. The expected value of information is, in this model,
$$I(p,\pi)=-\lambda\big(1-F(p,\pi)\big)\left[V_D(\pi)+\pi(1-\pi)\frac{\Delta F(p)}{1-F(p,\pi)}V_D'(\pi)\right]\Delta t.$$
Therefore
$$\frac{\partial I(p,\pi)}{\partial p}=\left[\lambda f(p,\pi)V_D(\pi)-\lambda\pi(1-\pi)\Delta f(p)V_D'(\pi)\right]\Delta t;$$
unlike in the previous section, the sign of this derivative is ambiguous. In words: price can have a non-monotonic effect on the expected value of information. On an intuitive level, neither very high prices nor very low prices are very informative about the type of distribution: the former are always rejected by buyers and the latter are always accepted. In the rest of this section, we suppose that Γ_D(p) has a single crossing point, i.e., that there is a unique p̂ ∈ (0, 1) such that Γ_D(p̂) = 0, with Γ_D(p) > 0 for all p < p̂ and Γ_D(p) < 0 for all p > p̂. In other words, we assume that there is a unique price that maximises the capital losses from learning. (Things are more complicated when Γ_D(p) has multiple crossing points. But the same arguments apply, so we concentrate on the case of a single crossing point.) As an illustration, figure 2 plots Γ_D(p) for the case where F_L(v) = 2v − v² and F_H(v) = v², with v ∈ [0, 1]. For this example, Γ_D(p) = 2λ(λp² + (1 − 2p)(r + λ)) and p̂ = (r + λ − √(r(r + λ)))/λ. Because Γ_D(p) changes sign, we cannot guarantee that the maximand in equation (13) is quasi-concave. The problem is that learning may be sufficiently important that the seller optimally changes its price as its posterior changes, in order to control the progress of its beliefs. In other words, we cannot ensure in general that the first-order

derivative Γ_R(p,π) − Γ_D(p)π(1−π)V_D′(π) is single downward-crossing in p for any π. The first term Γ_R(p,π) (the first-order derivative from the repeated problem) is single downward-crossing in p. But since the slope of Γ_D is ambiguous (and π(1−π)V_D′(π) ≥ 0), the overall slope of Γ_R(p,π) − Γ_D(p)π(1−π)V_D′(π) cannot be determined in general. If π(1−π)V_D′(π) is sufficiently large (i.e., learning is sufficiently important), then it may be that single downward-crossing is violated.

[Figure 2: Γ_D(p) when F_L(v) = 2v − v² and F_H(v) = v² on [0, 1].]

An example helps to illustrate the issue. Consider the set of distributions F_L(·), F_H(·) given by
$$F_L(p)=1-(1-p)^t,\qquad F_H(p)=p^t$$

for t ≥ 1 and p ∈ [0, 1]. When t = 1, the distributions are identical: the uniform on [0, 1]. As t increases, the distributions diverge. One relevant measure of the distance between the distributions is
$$\int_0^1\big(F_L(p)-F_H(p)\big)\,dp=\frac{t-1}{t+1};$$

this distance tends to 1 as t → ∞.

[Figure 3: The dynamic first-order derivative, plotted for t = 2 and t = 4.]

Figure 3 plots the first-order derivative of the dynamic problem, Γ_R(p,π) − Γ_D(p)π(1−π)V_D′(π), when π = 0.18 and for two different values of t: t = 2 and t = 4.³

³The details of the numerical calculations used in the figure are available from the authors on request. In this figure, we set r/λ = 1. With these distributions, the repeated first-order derivative Γ_R(p,π) can fail to be single downward-crossing (contrary to our maintained assumption) at high values of t. In the results discussed here, we have ensured that Γ_R(p,π) is single downward-crossing.

As is clear from the figure, the first-order derivative is single downward-crossing in p for the lower value of t, but not for the higher value. The intuition is straightforward: when t is large, the distance between the distributions is large, and so learning is relatively important. As a result, the maximand in the dynamic Bellman equation is not quasi-concave. We therefore want to ensure that the value from the repeated problem matters more than the capital losses from learning when setting the dynamic price. If this is the case, then the dynamic first-order derivative is single downward-crossing. Most directly, we need a condition of the form
$$\left|\frac{\partial}{\partial p}V_R(p,\pi)\right|\ \ge\ \left|\frac{\partial}{\partial p}\left(\frac{\Delta F(p)}{1-F(p,\pi)+\delta}\,\pi(1-\pi)V_D'(\pi)\right)\right|.$$
In words, this requires that changing the price has a greater effect on the value of the repeated problem than it does on the capital losses arising from learning. But a condition like this has two weaknesses. First, the condition is actually stronger than is needed. As we show in the proof of proposition 3, in fact all we need is for the condition to hold when

either π is small and p large (i.e., π ≤ π̂ and p ≥ p̂), or vice versa. Secondly, it involves endogenous variables that we cannot solve for: in particular, V_D′(π). Our approach to this second problem is to derive the following bound on the learning effect π(1−π)V_D′(π).

Lemma 2 For any π ∈ [0, 1], the learning effect π(1−π)V_D′(π) is bounded above as follows:
$$\pi(1-\pi)V_D'(\pi)\ \le\ B(\pi)\equiv\min\left\{\frac{1-F(\hat p,\pi)+\delta}{1-F_L(\hat p)+\delta}\,\pi\big(V(1)-V_R(\pi)\big),\ \ \pi(1-\pi)\big(V(1)-V_R(p_R(1),0)\big)\right\},$$
where V(1) ≡ V_R(1) = V_D(1) is the (common) value function when π = 1:
$$V(1)\equiv\max_p\ \frac{\big(1-F_H(p)\big)p}{1-F_H(p)+\delta}.$$

The proof of the lemma is in the appendix. In order to interpret the first part of the bound B(π), note that equation (13) implies that
$$V_D(\pi)\ \ge\ V_R(\pi)-\frac{\Delta F(p_R(\pi))\,\pi(1-\pi)V_D'(\pi)}{1-F(p_R(\pi),\pi)+\delta}\ \ge\ V_R(\pi)-\frac{\Delta F(\hat p)\,\pi(1-\pi)V_D'(\pi)}{1-F(\hat p,\pi)+\delta}.$$
The first inequality comes from maximisation. The second inequality follows because
$$\frac{\Delta F(p)\,\pi(1-\pi)V_D'(\pi)}{1-F(p,\pi)+\delta}$$
is maximised at p̂. The first lower bound for V_D(π) is then the value from the repeated problem, V_R(π), minus the maximised present value of the stream of capital losses in the dynamic problem. This lower bound on V_D(π), combined with convexity of the value function, then gives the upper bound in the lemma on the effect of learning, measured by π(1−π)V_D′(π). Next, we make the following assumption. For this assumption, define π̂ as follows. If p̂ ∈ (p(0), p(1)), then π̂ is such that p_R(π̂) = p̂. If p̂ ≤ p(0), then π̂ = 0; if p̂ ≥ p(1), then

π̂ = 1. (Here, p(1) ≡ p_R(1) = p_D(1) and p(0) ≡ p_R(0) = p_D(0).)

Assumption 2 The first-order derivative Γ_R(p,π) − B(π)Γ_D(p) satisfies the following partial form of single-crossing:
1. If π ≤ π̂, then Γ_R(p,π) − B(π)Γ_D(p) ≤ 0 for all p ≥ p̂.
2. If π > π̂, then Γ_R(p,π) − B(π)Γ_D(p) ≥ 0 for all p ≤ p̂.

In order to see what assumption 2 requires, recall that we are concerned with ensuring that the first-order derivative Γ_R(p,π) − B(π)Γ_D(p) does not have multiple crossing points in such a way that an unambiguous comparison of the dynamic and repeated prices is impossible. We could simply require that the first-order derivative is single downward-crossing in p for all π. But this is actually a stronger condition than we need for theorem 3 below. Figure 4 shows what assumption 2 requires. Note that Γ_R(p,π) − B(π)Γ_D(p) = 0 at the point (p = p̂, π = π̂). The assumption requires, then, that the first-order derivative be single crossing in p only when moving in a south-east/north-west direction in the figure. We do not need to constrain how the derivative behaves to the south-west or north-east in order to get a clear comparison between p_R(π) and p_D(π). We can illustrate the assumption using the previous example. For the sake of brevity, we concentrate on the second part: Γ_R(p,π) − B(π)Γ_D(p) ≥ 0 when π > π̂ and p ≤ p̂. Numerical analysis⁴ shows that for t less than around 3.8, assumption 2 is satisfied; but for higher values of t, it is violated. Figure 5 plots Γ_R(p,π) and B(π)Γ_D(p) for t = 4, when π = 0.18.⁵ The figure shows that the assumption does not hold, since for certain values of p (0.3809 ≤ p ≤ 0.5087), Γ_R(p,π) is less than B(π)Γ_D(p). Again, the intuition is that when the two distributions are sufficiently far apart, learning is relatively very important and quasi-concavity fails. Assumption 2 cannot, unfortunately, be related directly to the primitives of the model (δ and the distributions F_H(·) and F_L(·)). But, as the preceding discussion shows, at least it is easy to verify whether the assumption is satisfied.

⁴We set r/λ = 1. Details are available on request from the authors.
⁵With this value of t, and with r/λ = 1, p̂ = 0.5593 and π̂ = 0.1760.

[Figure 4: Assumption 2. In the (p, π) plane, Γ_R(p,π) − B(π)Γ_D(p) ≥ 0 in the region to the north-west of the point (p̂, π̂), and Γ_R(p,π) − B(π)Γ_D(p) ≤ 0 in the region to the south-east.]

[Figure 5: Failure of Assumption 2. Γ_R(p,π) and B(π)Γ_D(p) plotted against p.]

Armed with the assumption, we are finally able to compare the repeated and dynamic prices.

Theorem 3 Suppose that assumption 2 holds. If the seller's belief is sufficiently high (above π̂), then the dynamic price is greater than the repeated price: p_D(π) > p_R(π). Conversely, if the seller's belief is sufficiently low (below π̂), then the dynamic price is lower than the repeated price: p_D(π) < p_R(π).

Proof. We provide the proof of the second part of the theorem (if π ≤ π̂, then p_D(π) ≤ p_R(π)). The proof of the first part is similar and so is omitted. Since p_R(π) is non-decreasing in π (from proposition 1), if π ≤ π̂, then p̂ ≥ p_R(π). By assumption 2, Γ_R(p,π) − B(π)Γ_D(p) ≤ 0 for all p ≥ p̂. But, for p ≥ p̂, Γ_D(p) ≤ 0 and B(π) ≥ π(1−π)V_D′(π), and so Γ_R(p,π) − B(π)Γ_D(p) ≥ Γ_R(p,π) − π(1−π)V_D′(π)Γ_D(p). Hence Γ_R(p,π) − π(1−π)V_D′(π)Γ_D(p) ≤ 0 for all p ≥ p̂. Therefore p_D(π) < p̂. But for p < p̂, Γ_D(p) ≥ 0, and so Γ_R(p,π) − π(1−π)V_D′(π)Γ_D(p) ≤ Γ_R(p,π). Hence p_D(π) ≤ p_R(π).



Theorem 3 shows the difference between this case, in which Γ_D can be of either sign, and the case considered in section 3, where Γ_D is negative. In the latter case, the dynamic price is always greater than the repeated price, whatever the seller's beliefs. This is because the controlled learning effect always dominates the controlled stopping effect in that case; and controlled learning always calls for a higher price. In the case considered in this section, a higher price can either increase or decrease the capital losses from learning. When a higher price decreases the capital loss (when Γ_D is negative), the dynamic price is above the repeated price. This occurs when the price is relatively high: Γ_D(p) is negative for p ≥ p̂. On the other hand, when a higher price increases the capital loss, the dynamic price is below the repeated price. This occurs when the posterior and the price are relatively low. This is illustrated in figure 6 (in which F_H(v) = v² and F_L(v) = 2v − v², and r/λ = 1), which plots the difference between p_D(π) and p_R(π). It shows that p_D(π) is greater than p_R(π) for π greater than around 0.5. Note also that the absolute difference between p_D(π) and p_R(π) is greater when π is low. In this region, the controlled stopping and

controlled learning effects point in the same direction: both call for a lower price. At higher values of π, the two effects point in opposite directions (and the controlled learning effect dominates). As a result, the difference between the dynamic and repeated prices is smaller.

[Figure 6: The difference between the dynamic and repeated prices, p_D(π) − p_R(π), plotted against π.]

5 Conclusion

We have written the model in terms of a seller learning about demand for its good. The model can easily be generalised to accommodate some other standard economic models. We can consider a job search problem, in which an unemployed worker sets his reservation wage while learning about employment opportunities. By introducing a flow cost for actions, it is possible to transform the model of learning about arrival rates into a model of R&D where the true success probability is initially unknown. Furthermore, there is no particularly compelling reason to concentrate only on Bayesian learning models. The change in posteriors could equally well reflect the accumulation of past actions. Hence this model can be used as a basis for a non-Poisson model of R&D.⁶ The entire paper is written in terms of ever more pessimistic continuation beliefs.

⁶A previous version of this paper includes such an application and is available from the authors upon request.

Obviously, we could have taken an application where continuation beliefs drift upwards conditional on no stopping. As an example, consider the maintenance problem of a machine whose durability is uncertain. (Or the problem of finding the optimal effort level to demand from an agent or seller.) As long as the machine does not break down, beliefs about its longevity become more optimistic. Apart from the obvious change in signs, the controlled stopping and learning effects are present in such models as well. In this paper, we have analysed models of Poisson-type information revelation where, conditional on no stopping, the process of beliefs is deterministic. A useful extension of the model could consider beliefs that are stochastic even conditional on no stopping. Models of controlled Brownian motion would be a natural analytical framework for this. Such an extension would be particularly valuable when extending the model to multi-agent situations. While a multi-agent version of the R&D race with learning seems relatively straightforward (since the game ends at the moment of the first innovation), models of informational externalities do not seem to fit the Poisson framework that well. In order for our methods to work, beliefs must evolve smoothly over time. In a model where beliefs are driven by controlled Brownian motions, observations on other players could be smoothed by observational noise in such a way that the model is still applicable. These models might lead to even higher current prices by the sellers. Since the value function is convex in beliefs, information flowing from the decisions of other players tends to increase the continuation value to any player. Hence this would counteract the controlled stopping effect and push prices up further. We leave the analysis of these extensions to future research.

Proof of lemma 2

To show the first part of the bound, note that from the Bellman equation (13) and maximisation,
$$\delta V_D(\pi)\ \ge\ \big(1-F(p,\pi)\big)\big(p-V_D(\pi)\big)-\Delta F(p)\,\pi(1-\pi)V_D'(\pi)$$


for any p ∈ [0, 1]. In particular, this inequality holds for p = p_R(π). Rearranging, this gives
$$V_D(\pi)\ \ge\ \frac{\big(1-F(p_R(\pi),\pi)\big)p_R(\pi)}{1-F(p_R(\pi),\pi)+\delta}-\frac{\Delta F(p_R(\pi))}{1-F(p_R(\pi),\pi)+\delta}\,\pi(1-\pi)V_D'(\pi)\ =\ V_R(\pi)-\frac{\Delta F(p_R(\pi))}{1-F(p_R(\pi),\pi)+\delta}\,\pi(1-\pi)V_D'(\pi).$$

The following lemma establishes that V_D(π) is convex.

Lemma 3 V_D(π) is convex in π.

Proof. Consider two points π and π′, and let π^λ = λπ + (1 − λ)π′ for λ ∈ [0, 1]. We want to show that for all such π, π′ and π^λ,
$$V_D(\pi^\lambda)\le\lambda V_D(\pi)+(1-\lambda)V_D(\pi').$$
Denote by p^λ the path of optimal prices starting from π^λ. Denote by W^H(p) the expected profit from an arbitrary path of prices p conditional on the true distribution being H, and similarly for W^L(p). Then we have
$$V_D(\pi^\lambda)=\pi^\lambda W^H(p^\lambda)+(1-\pi^\lambda)W^L(p^\lambda).$$
We also know that
$$V_D(\pi)\ge\pi W^H(p^\lambda)+(1-\pi)W^L(p^\lambda),\qquad V_D(\pi')\ge\pi' W^H(p^\lambda)+(1-\pi')W^L(p^\lambda),$$
since p^λ is a feasible price path. Hence
$$\lambda V_D(\pi)+(1-\lambda)V_D(\pi')\ \ge\ \big(\lambda\pi+(1-\lambda)\pi'\big)W^H(p^\lambda)+\big(1-\lambda\pi-(1-\lambda)\pi'\big)W^L(p^\lambda)\ =\ \pi^\lambda W^H(p^\lambda)+(1-\pi^\lambda)W^L(p^\lambda)\ =\ V_D(\pi^\lambda).$$

This proves the claim.



Since V_D(π) is convex, V_D′(π) ≤ (V(1) − V_D(π))/(1 − π). Therefore
$$V_D(\pi)\ \ge\ V_R(\pi)-\frac{\Delta F(p_R(\pi))}{1-F(p_R(\pi),\pi)+\delta}\,\pi\big(V(1)-V_D(\pi)\big).$$
Note that ∆F(p)/(1 − F(p,π) + δ) is maximised at p̂. Hence
$$V_D(\pi)\ \ge\ V_R(\pi)-\frac{\Delta F(\hat p)}{1-F(\hat p,\pi)+\delta}\,\pi\big(V(1)-V_D(\pi)\big).$$

Rearranging this inequality, and using convexity again, means that
$$\pi(1-\pi)V_D'(\pi)\ \le\ \pi\big(V(1)-V_D(\pi)\big)\ \le\ \frac{1-F(\hat p,\pi)+\delta}{1-F_L(\hat p)+\delta}\,\pi\big(V(1)-V_R(\pi)\big),$$

which completes the first part of the proof. For the second half of the bound, note that maximisation implies that
$$V_D(\pi)\ \ge\ \max_p\ \big[\pi V_R(p,1)+(1-\pi)V_R(p,0)\big],$$
with equality at π = 1, since V_D(1) = V_R(1). By convexity of V_D, V_D′(π) ≤ V_D′(1); and since V_D(π) lies above the right-hand side but coincides with it at π = 1, V_D′(1) is bounded above by the derivative of the right-hand side at π = 1. Differentiating at π = 1, using the envelope theorem, gives
$$V_D'(\pi)\ \le\ V_R(1)-V_R(p_R(1),0).$$
This completes the proof.

References

Aghion, P., P. Bolton, C. Harris, and B. Jullien (1991): "Optimal learning by experimentation," Review of Economic Studies, 58, 621–654.
Anderson, A., and L. Smith (2006): "Explosive Convexity of the Value Function in Learning Models," May 21.
Ausubel, L. M., and R. J. Deneckere (1989): "Reputation in Bargaining and Durable Goods Monopoly," Econometrica, 57(3), 511–531.
Balvers, R. J., and T. F. Cosimano (1990): "Actively learning about demand and the dynamics of price adjustment," Economic Journal, 100, 882–898.
Burdett, K., and T. Vishwanath (1988): "Declining Reservation Wages and Learning," Review of Economic Studies, 55, 655–665.
DeGroot, M. (1962): "Uncertainty, Information and Sequential Experiments," Annals of Mathematical Statistics, 33, 404–419.
Easley, D., and N. Kiefer (1988): "Controlling a stochastic process with unknown parameters," Econometrica, 56, 1045–1064.
Gul, F., H. Sonnenschein, and R. Wilson (1986): "Foundations of Dynamic Monopoly and the Coase Conjecture," Journal of Economic Theory, 39(1), 155–190.
Karlin, S. (1962): "Stochastic models and optimal policy for selling an asset," in Studies in Applied Probability and Management Science, ed. by K. J. Arrow, S. Karlin, and H. Scarf, pp. 148–158. Stanford University Press.
Keller, G. (2007): "Passive Learning: a critique by example," Economic Theory, 33, 263–269.
Keller, G., and S. Rady (1999): "Optimal Experimentation in a Changing Environment," Review of Economic Studies, 66, 475–507.
Lancaster, T., and A. Chesher (1983): "An Econometric Analysis of Reservation Wages," Econometrica, 51, 1661–1676.
Lippman, S., and J. McCall (1976): "The Economics of Job Search: A Survey. Part I," Economic Inquiry, 14, 155–189.
McLennan, A. (1984): "Price dispersion and incomplete learning in the long run," Journal of Economic Dynamics and Control, 7, 331–347.
Merlo, A., and F. Ortalo-Magne (2004): "Bargaining over Residential Real Estate: Evidence from England," Journal of Urban Economics, 56, 192–216.
Mirman, L., L. Samuelson, and A. Urbano (1993): "Monopoly Experimentation," International Economic Review, 34, 549–563.
Rogerson, R., R. Shimer, and R. Wright (2005): "Search-Theoretic Models of the Labor Market: A Survey," Journal of Economic Literature, XLIII, 959–988.
Rothschild, M. (1974): "A Two-armed Bandit Theory of Market Pricing," Journal of Economic Theory, 9, 185–202.
Rustichini, A., and A. Wolinsky (1995): "Learning about variable demand in the long run," Journal of Economic Dynamics and Control, 19, 1283–1292.
Smith, L. (1999): "Optimal job search in a changing world," Mathematical Social Sciences, 38, 1–9.
Sobel, J., and I. Takahashi (1983): "A Multistage Model of Bargaining," Review of Economic Studies, 50(3), 411–426.
Trefler, D. (1993): "The Ignorant Monopolist: Optimal Learning with Endogenous Information," International Economic Review, 34(3), 565–581.
Van den Berg, G. (1990): "Nonstationarity in Job Search Theory," Review of Economic Studies, 57, 255–277.
