A Quantitative Model of Dynamic Customer Relationships

Niels Stender
University of Aarhus
December 15, 2006

Abstract

A dynamic discrete choice model of the evolution of customer choice in a multi-product and multi-consumption intensity setting is presented. The transition probability function is formulated conditionally on individual observed and unobserved heterogeneity, using a Hierarchical Bayesian formulation of the latter. The model is used in a study of the customer transaction history of a telecommunications company. Churn and revenue forecasts are derived on an individual level. Estimation is carried out using Metropolis-Hastings MCMC.

Contents

1 Introduction

2 Building Models for Marketing Decisions
2.1 Challenges
2.2 State Dependence
2.3 Individual Level Heterogeneity

3 Model
3.1 Transition Probabilities
3.2 Interpretation
3.3 Related Model Interpretations
3.4 Likelihood and Posterior
3.4.1 Simplified Choice Probabilities
3.4.2 Likelihood and Posterior
3.5 Metropolis-Hastings Simulation
3.5.1 A Prelude - The Accept-Reject Method
3.5.2 The Metropolis-Hastings Algorithm
3.5.3 Complications
3.5.4 Choice of Metropolis-Hastings Subtype
3.5.5 Estimation of Model Parameters
3.6 Sampling

4 Application
4.1 Data
4.1.1 Event Histories
4.2 Multiple Choice Components
4.2.1 Continuation Choice
4.2.2 Product Choice
4.2.3 Consumption Intensity
4.2.4 Choice Components and Transition Data
4.2.5 Interaction Variables
4.2.6 Sociodemographics
4.2.7 Duration Dependence
4.3 More on Estimation

5 Results
5.1 Prediction
5.2 Exit Behavior
5.3 Who Buys ADSL?
5.4 Lift - Critical Values
5.5 Importance of Individual Level Heterogeneity
5.6 Customer Value
5.6.1 Example: Marketing Decision
5.7 Limitations and Future Research
5.8 Conclusion

A Descriptive Statistics
A.1 Sociodemographic Variables
A.2 Correlation Table
A.3 Histograms
A.4 Empirical Exit Probabilities

B Model Statistics
B.1 Parameter Estimates
B.2 Empirical Distribution for Individual Level Coefficients
B.3 Churn Sensitivity
B.4 PSTN to ADSL Transition Sensitivity
B.5 5-year Conditional CLV Prediction

1 Introduction

The formation of an ongoing relationship between a customer and a company is an integral feature of many products and services. The relationship can be dynamic in the sense that consumption intensity and the type of product or service involved change over time. The financial performance of such companies depends decisively on the distribution of type and intensity in the customer base, so an understanding of how and why the relationship changes through time is an important contributor to informed decision making.

Properties of such customer relationships have been studied using a patchwork of marketing research methods such as customer satisfaction, segmentation and churn studies. Recently, there has been an increased focus on creating marketing metrics, among which the concept of customer lifetime value (CLV) calculation is very prominent. The Marketing Science Institute, a non-profit organization, ranked marketing metrics as the most important topic in 2002-04 and third place for 2004-06 in The 2002-2004 Research Priorities (2002) and The 2004-2006 Research Priorities (2004). Business functions such as accounting and budgeting can benefit from revenue forecasts, while financial analysts and risk managers have an interest in customer base pricing. Marketing management executives are trying to link marketing resource allocation decisions to expected changes in customer value.

Schmittlein & Peterson (1994) presented one of the earliest advanced models of relationship duration and revenue, while Reinartz et al. (2003), Donkers et al. (2003) and Kumar et al. (2006) are recent examples. The Kumar et al. paper also presents a brief overview of the recent literature on CLV. A related strain of research is the literature on purchase timing, originating from the challenges posed by retail sector scanner data, such as Manchanda et al. (1999) and Chib et al. (2002).

The purpose of this paper is to explore an individual level multi-product, multi-consumption intensity model with a view to CLV calculations. The point of departure is the customer transaction data that already exist in a company's data archives, in this case, a telecommunications company. The model differs from former contributions in its focus on accommodating individual level heterogeneity along with a relatively large state space. More advanced individual level heterogeneity formulations have gained popularity recently due to the possibilities enabled by the rediscovery of the Metropolis-Hastings algorithm, see Chib & Greenberg (1995).

In section 2, some of the issues in modeling philosophy are presented. Section 3 describes a dynamic choice model, basically a dynamic version of the multinomial logit model, and explains how to specify individual level heterogeneity and how to estimate it. Section 4 estimates a three component choice model tailored to the needs of telecommunications data, investigates aspects of its out-of-sample predictive ability with respect to consumer behavior and concludes by forecasting CLV 5 years ahead.

2 Building Models for Marketing Decisions

In this section it is argued that state dependence and unobserved heterogeneity are key elements of business models of consumer behavior, and that hierarchical Bayesian methodology and modern MCMC estimation are well suited to support the construction of models containing those elements.

2.1 Challenges

It is challenging to model consumer behavior and customer lifetime value. A first problem is the lack of comprehensive data on the environment and on consumers' information sets, such as attributes of competitive offerings, the individual's knowledge of and preference for these attributes, and switching costs. A second problem is the possible lack of stationarity. Competitors simultaneously change the properties of their product offerings, new products are offered, customers gradually learn about new offerings and may even change inherent preferences.

There are very few studies of consumer behavior where the researcher can claim to know most qualitatively important aspects of the information set that the agent is basing his decisions upon. A study by Miravete & Palacios-Huerta (2002) is one of the few exceptions this author is aware of, when excluding the stream of research related to experimental economics.

The mentioned elements spell trouble for those who wish to conduct inference and predict customers' future behavior and value. Rather than constructing an ideal, simple model containing a few variables based on first principles, the modeling of customer behavior calls for a pragmatic approach to the problem. For the modeling of key customer behaviors, in many cases, a business analyst or social scientist has few options but to take the available data as a point of departure. To compensate for the high degree of noise and lack of knowledge, any such model must have a great deal of statistical flexibility and robustness built into it.

When discussing the construction of models for the understanding of customers, several areas from statistics and econometrics intersect. One is that of discrete panel data methods and discrete choice models, while the other is event history analysis with models of durations as a special case. Duration models are concerned with time dependence, specifically how the time a phenomenon has spent occupying a given state affects the probability of continuing to occupy that state.

2.2 State Dependence

Functional form issues aside, the key problems in the modeling of transition data are those of state dependence and unobserved heterogeneity. If the distribution of the state of a service in period t+1 depends on the state in period t or earlier, some form of state dependence is at play. In the absence of state dependence, the sequence of events (0, 1, 0, 1) for a given service would be just as likely as (1, 1, 0, 0). This will not hold in most realistic cases. For a customer having bought into a given service, financial switching costs, through fees required to terminate a contract or to enter into a new one, can induce state dependence. Other arguments for state dependence include indirect costs such as time lost due to the actual transition, costs of search and information, psychological or rational habit formation, learning and perceived risks. Lack of state dependence should only be expected in the extreme case of zero transaction costs, a transparent market, identical products, a constant demand and a type of product that cannot be hoarded.
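The claim about the two sequences can be checked with a small calculation. The sketch below is illustrative only: the function names and the probabilities 0.5 and 0.9 are assumptions chosen for the example, not quantities taken from the paper.

```python
def iid_sequence_prob(seq, p_one):
    """Probability of a 0/1 sequence when periods are independent (no state dependence)."""
    prob = 1.0
    for state in seq:
        prob *= p_one if state == 1 else 1.0 - p_one
    return prob


def markov_sequence_prob(seq, p_initial_one, p_stay):
    """Probability of a 0/1 sequence under a first-order Markov chain in which
    the previous state is repeated with probability p_stay."""
    prob = p_initial_one if seq[0] == 1 else 1.0 - p_initial_one
    for prev, cur in zip(seq, seq[1:]):
        prob *= p_stay if cur == prev else 1.0 - p_stay
    return prob


alternating = (0, 1, 0, 1)
persistent = (1, 1, 0, 0)

# No state dependence: both sequences contain two ones, so they are equally likely.
print(iid_sequence_prob(alternating, 0.5), iid_sequence_prob(persistent, 0.5))
# State dependence (hypothetical p_stay = 0.9): the persistent sequence is far more likely.
print(markov_sequence_prob(alternating, 0.5, 0.9), markov_sequence_prob(persistent, 0.5, 0.9))
```

With independent periods only the number of ones matters, while under state dependence the ordering of the states matters as well.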

2.3 Individual Level Heterogeneity

Two customers with the same observed characteristics vector might display systematically differing behavior over time. This could be due to elements inherent to the laws governing behavior, but it could also stem from the influence of unobserved characteristics. As has been argued in previous sections, it is not only possible that relevant aspects of the customer's characteristics are unobserved, it is to be expected. Unaccounted unobserved heterogeneity can lead to fundamental errors of inference on the relation between observed heterogeneity and behavior, as argued in general by Heckman (1981), Lancaster (1990) and Hsiao (2003) and in particular for marketing data by Keane (1997). Hence, a useful model of customer behavior must address state dependence and unobserved heterogeneity.

In accommodating the stated criteria, the researcher has to make some broad choices in what methodology to use. A nonparametric or parametric formulation? A Frequentist or Bayesian estimation approach? How should unobserved heterogeneity be specified?

Nonparametric methods have been increasingly popular in social sciences, as they lessen the number of assumptions that must be made about the data generating process. The degree of functional form flexibility is maximized at the cost, usually, of computation time and interpretability of model results. Unfortunately, little tractable guidance and few standards exist in the nonparametric statistical and econometric literature about how to formulate, and estimate, models of dynamic phenomena such as consumer behavior observed over time. A parametric approach is thus followed in this paper.

The context in further discussion is implicitly given as that of a discrete choice model, with individual unobserved heterogeneity entering through some linear relationship as an intercept term, though some statements might hold for more complicated and general settings.

Allenby & Rossi (1999) and Rossi & Allenby (2000) outline arguments in favor of a hierarchical Bayesian approach using MCMC methods over a Frequentist approach, when dealing with marketing applications. This paper concurs with their arguments as they are summarized in the following paragraphs.

The Frequentist approach by default has as its objective to estimate some average effect across the population, hence leading the researcher to treat individual unobserved heterogeneity as a nuisance parameter that may skew estimates of population parameters. Note that this is not inherent to the Frequentist approach, but a practice carried out in large parts of the applied literature. In a typical marketing scenario, the anatomy of individual unobserved heterogeneity can have important policy implications and might even be the object of the study. The marketeer will often be at least as interested in identifying likely extreme behavioral patterns among customers, as in understanding average behavior.
Studies of churn can serve as an example: Accurately capturing tail behavior of individual heterogeneity is essential in detecting the most likely churners. Even more challenging, the marketing analyst will often find it useful and profitable to get some estimate of individual level parameters, rather than just estimating the population distribution parameters.

That argument alone will not lead to the use of a Bayesian model formulation. In a Frequentist Fixed Effects approach, individual level parameter estimates can often be derived. The matter is complicated by the common panel data problem that the depth of individual event histories is small relative to the number of cases. Parameters dependent solely on these few datapoints are subject to great variability.

As an analog to specifying a Bayesian prior, the Frequentist can follow the Random Effects approach and introduce a random effects/mixing distribution for individual level parameters and estimate the mixing distribution parameters. In what Allenby and Rossi coin an approximate Bayesian approach, individual level estimates can be derived from the individual level likelihood, conditional on estimated mixing parameter averages. The lowered parameter variance is induced by what is known as parameter shrinkage.

The advantage of introducing a mixing distribution comes at the cost of possible misspecification and the necessity of evaluating possibly complicated integrals. The integrals will often have no closed form solution and must be solved numerically. Another issue is the possibly high number of individual level parameters that would be part of the estimation. Few optimization routines are geared to solve for parameters numbering in the thousands. The misspecification issues are in practice checked through predictive validity and graphic diagnostics, as presented in Allenby & Rossi (1999). As mentioned in section 2.1, a second pillar is to build models that are as flexible as possible.
Thinking of the mixing distribution as a Bayesian prior allows the researcher to avoid the cumbersome integrals and the approximate element in the Random Effects approach, when using modern MCMC methods. The advent of these methods has alleviated the former Achilles' heel of complicated hierarchical Bayesian models: The calculation of the posterior. An example of an MCMC method, the Metropolis-Hastings algorithm, is given in section 3.5.

The choice of a discrete or continuous individual level heterogeneity distribution is discussed next. The use of a discrete point mass heterogeneity distribution for the intercept term has been advocated by Heckman & Singer (1984), when modeling the related problem of durations. It has been the typical choice in many econometric applications. In the marketing science literature, latent class models have been popular. Such models, sometimes implicitly, assume data are generated by agents with unobserved group membership, each group being homogeneous with respect to some behavioral parameters. This yields a discrete heterogeneity distribution with probability mass at sets of parameters.

A different approach is advocated by Allenby & Rossi (1999). They argue that a continuous mixing distribution is called for instead, since the tails of the true heterogeneity distribution are unlikely to be well described by a discrete approach. A continuous distribution is utilized in this paper, since tail behavior is deemed important.

3 Model

Let $N$ denote the number of customers. For each customer $i \in \{1, 2, \dots, N\}$ we observe customer choice, or customer states, $y_{it} \in \Omega$ at discrete intervals $t \in \{0, 1, 2, \dots, T_i\}$. $\{y_{it}\}_{t=0}^{T_i}$ is called an event history. $T_i$ is the maximum time index for customer $i$'s event history. The choice set, or state space, $\Omega$ is a set of discrete values that signify distinct aspects of customer behavior. An example using a choice set of $\Omega = \{0, 1, 2\}$ and its interpretation is given in eq. 1.

\[
y_{it} = \begin{cases} 0 & \text{do nothing} \\ 1 & \text{buy A} \\ 2 & \text{buy B} \end{cases} \tag{1}
\]

More examples of choice sets can be found in section 4.2. Note that the terms choice, behavior and state are used interchangeably for the remainder of the paper. The terms will always be referring to variations in $y_{it}$.

The challenge now is to understand the evolution in event histories conditional on some observed characteristics. In the following sections a framework for modeling and estimation is laid out. Initially, the framework will only treat a one-dimensional choice situation, while the application section will work with three-dimensional choice.

3.1 Transition Probabilities

Attention is now turned to a formulation of transition probabilities for event histories. It is assumed that the probability distribution of $y_{it}$ depends on $y_{i,t-1}$, a vector of exogenous variables $x_{it}$ and some additional parameters. Such a mathematical object is called a discrete semi-Markov chain of order 1 (one). Discrete since $y_{it}$ takes on discrete values. Markovian of order one since the probability distribution of $y_{it}$ depends on $y_{i,t-1}$ but not on $y_{i,t-k}$ for $k \geq 2$. Semi-Markov since the probability distribution of $y_{it}$ is not governed by $y_{i,t-1}$ alone, but may also be influenced by $x_{it}$.

Imposing a Markov order of one is a step that greatly improves mathematical and computational tractability, but it comes at the cost of less model realism. Example: A customer $i$ that displayed a variable consumption pattern of $(y_{i0}, y_{i1}) = (0, 2)$ might be more likely to enter state 0 again in period $t = 2$, when compared to customer $j$ with pattern $(y_{j0}, y_{j1}) = (1, 2)$. To compensate for this phenomenon, it is possible to let time-varying dummy variables enter $x_{it}$, at the cost of being forced to estimate more parameters. The dummy variables entering this particular model are described in a subsequent section.

Customer choice is specified using a random utility model, a standard framework for linking observed customer decisions to customer and choice context characteristics. See Train (2003) for a textbook treatment of discrete choice models and random utility. If for customer $i$ at time $t-1$ we observe state $s$, that is, $y_{i,t-1} = s$, then the utility at time $t$ of choosing state $d$ is $u_{isdt}$. The utility is specified in eq. 2. The choice made at time $t$ is the choice yielding the maximum utility, eq. 3.

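\[
u_{isdt} = \begin{cases} \alpha_{isd} + \beta_{sd}' x_{it} + \varepsilon_{sdt} & \text{for } d \neq s \\ 0 & \text{for } d = s \end{cases} \tag{2}
\]

\[
y_{it} = \arg\max_{d} u_{isdt} \tag{3}
\]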

$\alpha_{isd}$ represents an individual-specific intercept. $\beta_{sd}$ is a vector of coefficients that govern the impact of the observed characteristics vector $x_{it}$ and is of commensurate length. $\beta_{sd}$ is common across all customers, but specific to the states at times $t-1$ and $t$, through indices $s$ and $d$. The random element in utility is introduced through the error term $\varepsilon_{sdt}$, which is assumed to be independently distributed Gumbel/type I extreme value. The first element of $x_{it}$ is set to 1 (one), such that the first element of $\beta_{sd}$ corresponds to the average intercept. This forces the average of $\alpha_{isd}$ to 0 (zero) and is done as a matter of computational convenience.

A hierarchical Bayesian approach is adopted in specifying $\alpha_{isd}$. An argument for this approach is presented in section 2.3. $\alpha_{isd}$ is given a Normal prior $\phi$, eq. 4. The dispersion parameter $\sigma^2$ in eq. 4 is also given a prior, $\pi$, in the form of an Inverse Gamma distribution in eq. 5. The Inverse Gamma with respect to dispersion is conjugate to the zero-mean Normal distribution, in the sense that the resulting conditional posterior of the dispersion is again an Inverse Gamma distribution.

\[
\phi(\alpha \mid \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\alpha^2}{2\sigma^2}\right) \tag{4}
\]

\[
\pi(\sigma^2 \mid \tau^2, \nu) = \frac{\left(\nu\tau^2/2\right)^{\nu/2}}{\Gamma(\nu/2)} \left(\sigma^2\right)^{-(\nu/2 + 1)} \exp\left(-\frac{\nu\tau^2}{2\sigma^2}\right) \tag{5}
\]

where eq. 5 is identical to an Inverse Gamma distribution with shape parameter $\nu/2$ and scale parameter $\nu\tau^2/2$. The hyperparameters, $\tau^2$ and $\nu$, can be interpreted as, respectively, the prior variance and an integer degrees-of-freedom measure of the strength of that belief. Model estimates in subsequent sections are based on $\tau^2 = 0.1$ and $\nu = 1$, a vague prior when compared to the number of observations.
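The generative story of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimation code of the paper: the function names, the coefficient values in `beta` and the characteristics vector `x` are hypothetical, and the symbols follow the notation of eqs. 2 to 5. Only the hyperparameter values 0.1 and 1 are taken from the text.

```python
import math
import random


def draw_inverse_gamma(shape, scale, rng):
    """Draw from Inverse Gamma(shape, scale) as the reciprocal of a Gamma draw.

    random.gammavariate(alpha, beta) uses beta as a scale parameter, so a Gamma
    variable with rate `scale` is obtained with beta = 1 / scale.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def draw_gumbel(rng):
    """Standard Gumbel (type I extreme value) draw by inverting F(x) = exp(-exp(-x))."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choice(prev_state, states, alpha, beta, x, rng):
    """Return the utility-maximising state given the previous state (eqs. 2 and 3).

    alpha[(s, d)] is the individual intercept and beta[(s, d)] the coefficient
    vector. Staying in prev_state has its utility fixed at 0.
    """
    best_state, best_utility = prev_state, 0.0
    for d in states:
        if d == prev_state:
            continue
        systematic = alpha[(prev_state, d)] + sum(
            b * v for b, v in zip(beta[(prev_state, d)], x))
        utility = systematic + draw_gumbel(rng)
        if utility > best_utility:
            best_state, best_utility = d, utility
    return best_state


rng = random.Random(0)
tau2, nu = 0.1, 1                        # hyperparameter values quoted in the text
states = (0, 1, 2)
pairs = [(s, d) for s in states for d in states if d != s]
sigma2 = {p: draw_inverse_gamma(nu / 2, nu * tau2 / 2, rng) for p in pairs}  # eq. 5
alpha = {p: rng.gauss(0.0, math.sqrt(sigma2[p])) for p in pairs}             # eq. 4
beta = {p: [-1.0, 0.5] for p in pairs}   # hypothetical coefficients
x = [1.0, 0.3]                           # hypothetical characteristics, first element is 1
next_state = simulate_choice(0, states, alpha, beta, x, rng)
```

Drawing the dispersion first and the intercepts second mirrors the hierarchy of the prior: the population distribution is drawn once per transition, and each customer's intercepts are then drawn from it.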

3.2 Interpretation

The utility, as specified in eq. 2, is ordinal and can be used for ranking purposes only. Adding or subtracting a constant across a set of choice utilities will not change their ordering. It is therefore common practice to let one choice in the choice set serve as benchmark. In this paper the practice is followed with a slight variation. If $y_{i,t-1} = s$ then at time $t$ the utility of choosing state $s$ again is set to 0 (zero). An element from $\beta_{sd}$ then tells us, on average, how much utility is gained from changing state away from the current state $s$ to state $d$, in units of the current-state utility per unit of the corresponding element in $x_{it}$. It is important to notice, then, that $\beta_{s_1 d}$ is not directly comparable to $\beta_{s_2 d}$ for $s_1 \neq s_2$, since the coefficients work on scales normalized with differing standards.

3.3 Related Model Interpretations

The presented model is equivalent to a series of mixed logit models, a standard model in the discrete choice literature. It can also be shown that the model is equivalent to a series of competing risks models observed at discrete intervals and with hazard function $h_{isd}(t) = \exp\left(\alpha_{isd} + \beta_{sd}' x_{i[t]}\right)$, defining $[x] = \max\{y \in \mathbb{Z} \mid y \leq x\}$. Competing risks models are known from the hazard or duration model literature, such as Lancaster (1990) p. 106.

3.4 Likelihood and Posterior

3.4.1 Simplified Choice Probabilities

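What do the likelihood equations that correspond to eq. 3 look like? To simplify matters, we study a time-static case with a choice set of size three. Note that the notation in this section is simplified with respect to variable indices. From section 3.4.2 onwards, the notation of the previous sections is reintroduced. The utilities for choice 0, 1 and 2 are:

\[
\begin{aligned}
u_0 &= 0 \\
u_1 &= v_1 + \varepsilon_1 \\
u_2 &= v_2 + \varepsilon_2 \\
y &= \arg\max_i u_i
\end{aligned}
\]

The Gumbel/Extreme Value Type I distribution is defined by the cumulative distribution function $F(x) = e^{-e^{-x}}$ and the corresponding density function $f(x) = e^{-x} e^{-e^{-x}}$. Note that $\varepsilon_1$ and $\varepsilon_2$ are independent. The first case is that of $y = 0$:

\[
\begin{aligned}
\Pr(y = 0) &= \Pr(u_0 > u_1 \cap u_0 > u_2) \\
&= \Pr(-v_1 > \varepsilon_1 \cap -v_2 > \varepsilon_2) \\
&= \int_{-\infty}^{-v_1} \int_{-\infty}^{-v_2} f(\varepsilon_1, \varepsilon_2) \, d\varepsilon_2 \, d\varepsilon_1 \\
&= F(-v_1) F(-v_2) \\
&= e^{-e^{v_1} - e^{v_2}}
\end{aligned}
\]

The second and third cases are those of $y = 1$ and $y = 2$:

\[
\begin{aligned}
\Pr(y = 1) &= \Pr(u_1 > u_0 \cap u_1 > u_2) \\
&= \Pr(\varepsilon_1 > -v_1 \cap v_1 - v_2 + \varepsilon_1 > \varepsilon_2) \\
&= \int_{-v_1}^{\infty} \int_{-\infty}^{v_1 - v_2 + \varepsilon_1} f(\varepsilon_1, \varepsilon_2) \, d\varepsilon_2 \, d\varepsilon_1 \\
&= \int_{-v_1}^{\infty} f(\varepsilon_1) F(v_1 - v_2 + \varepsilon_1) \, d\varepsilon_1 \\
&= \int_{-v_1}^{\infty} e^{-\varepsilon_1} e^{-\left(1 + e^{-v_1 + v_2}\right) e^{-\varepsilon_1}} \, d\varepsilon_1 \\
&= \left[ \frac{1}{1 + e^{-v_1 + v_2}} \, e^{-\left(1 + e^{-v_1 + v_2}\right) e^{-\varepsilon_1}} \right]_{-v_1}^{\infty} \\
&= \frac{e^{v_1}}{e^{v_1} + e^{v_2}} \left( 1 - e^{-e^{v_1} - e^{v_2}} \right)
\end{aligned}
\]

Interchange $v_1$ and $v_2$ and we have the solution for $y = 2$. The generalization to larger choice sets is easy but tedious.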
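The closed-form expressions can be checked against simulation. The following sketch, with hypothetical function names and arbitrary values for $v_1$ and $v_2$, draws Gumbel errors, picks the utility-maximising choice and compares the empirical frequencies with the formulas derived above.

```python
import math
import random


def gumbel_draw(rng):
    """Standard Gumbel draw by inverting F(x) = exp(-exp(-x))."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def closed_form_probs(v1, v2):
    """Choice probabilities for u0 = 0, u1 = v1 + e1, u2 = v2 + e2."""
    h1, h2 = math.exp(v1), math.exp(v2)
    p0 = math.exp(-(h1 + h2))
    p1 = (1.0 - p0) * h1 / (h1 + h2)
    p2 = (1.0 - p0) * h2 / (h1 + h2)
    return p0, p1, p2


def simulated_probs(v1, v2, n_draws, seed=0):
    """Empirical choice frequencies from simulated utilities."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n_draws):
        utilities = (0.0, v1 + gumbel_draw(rng), v2 + gumbel_draw(rng))
        counts[utilities.index(max(utilities))] += 1
    return tuple(c / n_draws for c in counts)


exact = closed_form_probs(-0.5, 0.2)
approx = simulated_probs(-0.5, 0.2, n_draws=100000)
print(exact)
print(approx)
```

The two printed tuples should agree up to simulation noise, and the three closed-form probabilities sum to one.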

3.4.2 Likelihood and Posterior

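Let $\beta = \{\beta_{sd}\}_{s \in \Omega, d \in \Omega \setminus s}$ denote the set of common parameters, and $\alpha_i = \{\alpha_{isd}\}_{s \in \Omega, d \in \Omega \setminus s}$ the set of individual level coefficients for customer $i$. Let $\alpha = \{\alpha_i\}_{i \in \{1, \dots, N\}}$ represent the set of individual level model parameters for all customers. Let $\sigma = \{\sigma_{sd}\}_{s \in \Omega, d \in \Omega \setminus s}$ specify the set of parameters controlling the individual level heterogeneity population distribution. Using eq. 2 and a generalization of the equations in the prior section, in particular exchanging $e^{v_i}$ for $h_{isdt}$, we get eq. 6.

\[
\Pr(y_{it} \mid y_{i,t-1}, x_{it}, \alpha_i, \beta) = \Pr(d = y_{it} \mid s = y_{i,t-1}, x_{it}, \alpha_i, \beta) =
\begin{cases}
\left( 1 - e^{-\sum_{j \neq s} h_{isjt}} \right) \dfrac{h_{isdt}}{\sum_{j \neq s} h_{isjt}} & \text{for } d \neq s \\[2ex]
e^{-\sum_{j \neq s} h_{isjt}} & \text{for } d = s
\end{cases} \tag{6}
\]

where $h_{isdt} = \exp\left(\alpha_{isd} + \beta_{sd}' x_{it}\right)$, $i \in \{1, \dots, N\}$, $t \in \{1, \dots, T_i\}$.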
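Eq. 6 translates directly into code. The sketch below is one possible implementation under assumptions: the dictionary layout keyed by $(s, d)$ pairs, the function names and the numerical values are hypothetical and chosen only to illustrate the formula.

```python
import math


def transition_probs(prev_state, states, alpha_i, beta, x):
    """Transition probabilities of eq. 6 for one customer and one period.

    alpha_i[(s, d)] is the individual intercept and beta[(s, d)] the coefficient
    vector. Returns a dict mapping each state d to Pr(y_t = d | y_{t-1} = s).
    """
    hazards = {}
    for d in states:
        if d != prev_state:
            index = alpha_i[(prev_state, d)] + sum(
                b * v for b, v in zip(beta[(prev_state, d)], x))
            hazards[d] = math.exp(index)          # h_isdt
    total = sum(hazards.values())
    stay = math.exp(-total)                       # case d = s
    probs = {prev_state: stay}
    for d, h in hazards.items():
        probs[d] = (1.0 - stay) * h / total       # case d != s
    return probs


def log_likelihood(history, covariates, states, alpha_i, beta):
    """Log-likelihood of one event history y_0, ..., y_T, conditional on y_0."""
    total = 0.0
    for t in range(1, len(history)):
        probs = transition_probs(history[t - 1], states, alpha_i, beta, covariates[t])
        total += math.log(probs[history[t]])
    return total


states = (0, 1, 2)
pairs = [(s, d) for s in states for d in states if d != s]
alpha_i = {p: 0.0 for p in pairs}        # hypothetical individual intercepts
beta = {p: [-2.0, 0.4] for p in pairs}   # hypothetical common coefficients
x = [1.0, 0.5]                           # first element is the constant 1
probs = transition_probs(0, states, alpha_i, beta, x)
```

The returned probabilities sum to one by construction, since the probability of leaving the current state, $1 - e^{-\sum_{j \neq s} h_{isjt}}$, is split across destinations in proportion to their hazards.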

We can now state the model in full. Given a sample of customers $i \in \{1, \dots, N\}$, for all $i$, collect the corresponding observed behavior $\{y_{it}\}_{t=0}^{T_i}$ and characteristics $\{x_{it}\}_{t=0}^{T_i}$ in a set $D$. Collect all parameters in $\theta = \{\alpha, \beta, \sigma\}$. A customer might never touch a specific state, and untouched states do not affect the model likelihood. Let $G_i$ denote the set of states touched by customer $i$, so $G_i = \{s \in \Omega \mid \exists t : y_{it} = s\}$. The model's joint posterior $p$ is given in eq. 7. For the distribution $\pi$, the parameters $\tau^2$ and $\nu$ are fixed, as mentioned in section 3.1.

\[
p(\theta \mid D) \propto \prod_{i=1}^{N} \left( \prod_{t=1}^{T_i} \Pr(y_{it} \mid y_{i,t-1}, x_{it}, \alpha_i, \beta) \right)
\prod_{i=1}^{N} \prod_{\substack{s \in G_i \\ d \in \Omega \setminus s}} \phi(\alpha_{isd} \mid \sigma_{sd})
\prod_{\substack{s \in G_i \\ d \in \Omega \setminus s}} \pi(\sigma_{sd}^2 \mid \tau^2, \nu) \tag{7}
\]

In many classical applications we would now to some degree think of $\alpha$ and $\sigma$ as nuisance parameter vectors and strive to integrate them out. That approach would require the evaluation of complicated integrals and leave us with little knowledge of the individual $\alpha_i$ in $\alpha$. The simulation approach presented in the next section frees us from that task, while giving us the opportunity to learn about $\alpha$.

3.5 Metropolis-Hastings Simulation

3.5.1 A Prelude - The Accept-Reject Method

The popular Metropolis-Hastings algorithm (MH) is used in this paper for estimation of model parameters. The focus is concentrated on a particular version of the MH algorithm that is considered useful for the problem at hand. See Chib & Greenberg (1995), Albert & Chib (1996) or the textbooks Robert & Casella (1999) and Chen et al. (2000) for an introduction. A detailed exposition of the theory behind MCMC methods in general and the Metropolis-Hastings algorithm in particular is outside the scope of this paper, but enough information will be given for the practitioner to find more information and to understand some of the intuition and important issues involved in this type of simulation.

Given a stochastic variable with density $f(z)$ we might wish to infer its mean, or the expectation of some other function $g(z)$. The usual approach is to solve $\int g(z) f(z) \, dz$ analytically. Sometimes this may prove intractable and we will want another approach. An alternative is to simulate realizations $z_1, \dots, z_n$ from $f(z)$ and then simply calculate $\frac{1}{n} \sum_{i=1}^{n} g(z_i)$ for large enough $n$.

The question is now how to generate random draws from the density $f(z)$. Using a computer, it is easy to generate pseudo-random draws from the Uniform distribution. Draws from many standard distributions, such as the Normal distribution, are easy to generate as a transformation of one or more Uniform draws. If no transformation is known for $f(z)$, it is possible to use the Accept-Reject method (AR) as described in Robert & Casella (1999, p. 49). The idea is to find a density $f_2(z)$ that is easier to simulate than $f(z)$, but with identical support, then sample from $f_2(z)$, while rejecting some draws to arrive at the original $f(z)$. Example: If $f(z)$ were defined on $\mathbb{R}$ the Normal distribution could be used. Pick a value $M$ such that $\forall z : f(z) \leq M f_2(z)$ and follow the algorithm in (8).

What is the intuition? If $f(z)$ and $f_2(z)$ were identical, $M$ could be set to 1 and all proposals would be accepted. What about the case of $f(z) \neq f_2(z)$? For a fixed $M$ and $f_2(z)$, it is easy to see that values that are likely under $f(z)$ are more likely to be accepted. Values that are likely under $f(z)$ and unlikely under $f_2(z)$ are even more likely to be accepted. It can be proven that the procedure generates values from $f(z)$:

1. Generate $Z_2 \sim f_2(z)$, $U \sim \text{Uniform}(0, 1)$
2. Accept $Z = Z_2$ if $U \leq \dfrac{f(Z_2)}{M f_2(Z_2)}$
3. Goto 1

(8)
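The three steps in (8) can be sketched as follows. The example is an assumption made for illustration: the target is taken to be a Beta(2, 2) density on $[0, 1]$, the proposal $f_2$ is Uniform(0, 1) and $M = 1.5$, which is the maximum of the target density. None of these choices come from the paper.

```python
import random


def target_density(z):
    """Beta(2, 2) density, used here as a hypothetical target f(z)."""
    return 6.0 * z * (1.0 - z) if 0.0 <= z <= 1.0 else 0.0


def accept_reject_draw(rng, bound=1.5):
    """One draw from target_density using Uniform(0, 1) proposals.

    bound plays the role of M: the target never exceeds M times the proposal
    density, which equals 1 on [0, 1].
    """
    while True:
        candidate = rng.random()                      # step 1: Z2 ~ f2
        u = rng.random()                              #         U ~ Uniform(0, 1)
        if u <= target_density(candidate) / bound:    # step 2: accept
            return candidate
        # step 3: otherwise return to step 1


rng = random.Random(42)
draws = [accept_reject_draw(rng) for _ in range(20000)]
# Monte Carlo estimate of the mean, (1/n) * sum of g(z_i) with g(z) = z.
mean_estimate = sum(draws) / len(draws)
print(mean_estimate)
```

The true mean of the Beta(2, 2) distribution is 0.5, so the printed estimate should be close to that value. On average a fraction $1/M$ of the proposals is accepted, which is why a tight bound $M$ makes the method more efficient.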

3.5.2 The Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm builds on the AR method, with some useful twists. To study a joint density $f(z_1, z_2)$, we can generate a sample from it using only tractable random number generators and two functions that are proportional to $f(z_1, z_2)$ in $z_1$ and $z_2$, respectively.

Metropolis-Hastings Algorithm Example

Let $z_i^{(t)}$ denote the $t$'th value in the generated sequence of draws. The parentheses are included to ensure that the time index is not mistaken for the operation of raising to a power. Let $\hat{z}_1, \hat{z}_2$ denote starting values, preferably not too far from the range of $f(z_1, z_2)$.

1. Set $t = 1$ and initialize $\left(z_1^{(0)}, z_2^{(0)}\right) = (\hat{z}_1, \hat{z}_2)$
2. Draw $w_1 \sim N\left(z_1^{(t-1)}, \sigma_{z_1}^2\right)$ and $u_1 \sim \text{Uniform}(0, 1)$
3. If $u_1 < \dfrac{f\left(w_1 \mid z_2^{(t-1)}\right)}{f\left(z_1^{(t-1)} \mid z_2^{(t-1)}\right)}$ then let $z_1^{(t)} = w_1$ else let $z_1^{(t)} = z_1^{(t-1)}$
4. Draw $w_2 \sim N\left(z_2^{(t-1)}, \sigma_{z_2}^2\right)$ and $u_2 \sim \text{Uniform}(0, 1)$
5. If $u_2 < \dfrac{f\left(w_2 \mid z_1^{(t)}\right)}{f\left(z_2^{(t-1)} \mid z_1^{(t)}\right)}$ then let $z_2^{(t)} = w_2$ else let $z_2^{(t)} = z_2^{(t-1)}$
6. Let $t = t + 1$
7. Goto 2
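The steps above can be sketched in Python. The example makes several assumptions for illustration: the target is an unnormalised bivariate Normal density with correlation 0.5 standing in for $f(z_1, z_2)$, the starting values and proposal standard deviations are arbitrary, and the acceptance test is carried out on the log scale for numerical stability.

```python
import math
import random


def log_target(z1, z2, rho=0.5):
    """Log of an unnormalised bivariate Normal density with correlation rho
    (a hypothetical target standing in for f(z1, z2))."""
    return -(z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (2.0 * (1.0 - rho * rho))


def metropolis_hastings(n_iter, start, proposal_sd, seed=0):
    """Random walk, variable-at-a-time Metropolis-Hastings for log_target.

    Returns the chain of (z1, z2) values and the acceptance rate per variable.
    """
    rng = random.Random(seed)
    z = list(start)
    chain = []
    accepted = [0, 0]
    for _ in range(n_iter):
        for k in range(2):
            proposal = list(z)
            proposal[k] = rng.gauss(z[k], proposal_sd[k])      # steps 2 and 4
            # Ratio of joint densities: the conditional normalisers cancel.
            log_ratio = log_target(*proposal) - log_target(*z)
            # Steps 3 and 5: accept if u < ratio, with u ~ Uniform(0, 1].
            if math.log(1.0 - rng.random()) < log_ratio:
                z = proposal
                accepted[k] += 1
        chain.append(tuple(z))
    rates = [a / n_iter for a in accepted]
    return chain, rates


chain, rates = metropolis_hastings(30000, start=(3.0, -3.0), proposal_sd=(2.0, 2.0))
kept = chain[5000:]                                # discard an initial burn-in phase
mean_z1 = sum(z[0] for z in kept) / len(kept)
print(mean_z1, rates)
```

Only the unnormalised joint density is ever evaluated, which is the practical appeal of the method: neither the normalising constant nor the conditional densities need to be derived.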

The MH algorithm generates a chain of values for each parameter of interest, as does the AR method. A distribution used to generate $w_i$ is called a proposal distribution. When the proposal is a function of the variable value in step $t-1$, Chib & Greenberg (1995) call it a random walk chain MH algorithm. In the MH literature, the proposal distribution is often multivariate such that $(w_1, w_2)$ is drawn in one step. The current approach, in which each variable is updated in turn, is called block-at-a-time or variable-at-a-time. Reasons for selecting these approaches are explained in 3.5.4.

Note the fraction in step 3. Given a full density, the conditional density is $f(z_1 \mid z_2) = f(z_1, z_2) / \int f(z_1, z_2) \, dz_1$, so $f(w_1 \mid z_2) / f(z_1 \mid z_2) = f(w_1, z_2) / f(z_1, z_2)$ since the denominators cancel out. In the last fraction, even more terms might cancel out, since we only need the terms that depend on $w_1$ and $z_1$.

For many practitioners, the Gibbs sampler is familiar. It should be noted that the Gibbs sampler can be considered a special case of the MH algorithm. The Gibbs sampler requires the calculation of the conditional density, thus forcing the user to engage with a potentially intractable integral.

3.5.3 Complications

There are some complications in using the MH algorithm. In contrast to the AR algorithm, the MH chain only converges towards draws sampled from the joint posterior distribution. In other words, the MH algorithm will usually not immediately generate values from the target distribution. A second complication is that the MH chain draws usually cannot be considered independent draws from the target distribution. A third complication is how to specify $\sigma_{z_1}^2$ and $\sigma_{z_2}^2$, often called tuning parameters.

On the convergence issue, the initial values $(\hat{z}_1, \hat{z}_2)$ play a role. If they are on the fringe of the probability surface of the target distribution, convergence will take longer. On criteria for convergence in general, more research is currently being carried out. Jarner & Tweedie (2003) show that a necessary condition for the MH chain to be geometrically ergodic for the target distribution is for the target distribution's tails to converge towards zero at exponential speed. In practice, the researcher will study the chain from above some threshold of $t$ iterations and check if plots of the chains seem stationary. The phase before $t$ is called the burn-in phase. In this paper $t$ is fixed to 20000 iterations in the first model run.

On the dependence issue, except for extreme cases, one can rely on the Law of Large Numbers when calculating statistics based on these dependent chains. It is also a common practice to sample from the chains at points so far from each other that dependence disappears.

With respect to the tuning parameters, adjusting them is a balancing act. Too much variance makes the algorithm reject too many proposals, thus producing a chain of values that hardly changes and does so in jerky jumps. Too little variance can result in too little of the search space being explored, again producing a chain that is not representative of the target distribution. Optimal scaling results exist for a few target distributions and dimensions, but not for the general case. A common practice is to go for an acceptance rate below the optimal rate for a Normal distribution that holds asymptotically in the number of dimensions, around 0.234 as shown in Roberts et al. (1997). In this paper and for relevant parameters, tuning is carried out in the burn-in phase, adjusting individual tuning parameters after 30 to 150 iterations, the exact number chosen at random. A 90% confidence interval is formed around the empirical acceptance rate, based on iterations since the last tuning phase. If the upper bound is below 0.27, the variance is lowered by 50%. If the lower bound is above 0.27, the variance is increased by 50%.

3.5.4 Choice of Metropolis-Hastings Subtype

Why choose the random walk and variable-at-a-time approach? The variable-at-a-time approach has an advantage when facing a large number of variables. It is difficult to optimize tuning parameters with respect to the acceptance level when the number of tuning parameters is high. The issue is similar to the general problem of optimization in the face of many variables. A second argument for the random walk and variable-at-a-time approach is related to the software implementation and model development process. During model development, the model structure can be rather volatile. Parameters are added and removed as more experience is gained with data and new ideas materialize. In that case it can be an advantage to design a plug-in framework to facilitate rapid entry of new parameters and automated adjustment of the MH model, including setup of tuning and proposal distributions. Such a framework is, in the author's experience, easier to build using the two mentioned approaches.

3.5.5 Estimation of Model Parameters

In this section it is described how the model in eq. 7, p. 39 can be simulated using the MH algorithm. The parameters in $\alpha$, $\beta$ and $\sigma^2$ are simulated one at a time. To simulate, we need the functions equivalent to the conditional density functions in section 3.5.2, steps 3 and 5. The time index notation of section 3.5.2 is reused, so $\theta^{(t)}$ denotes the sample of all parameters at index position $t$. Let $\theta^{(0)}$ be the initial values for all parameters. Pick a $z \in \theta$. Then interpret $\theta_{-z}$ as $\theta \setminus z$, the set of all parameters except for $z$. The model posterior given $\theta_{-z}$ and a scalar $z_2$ taking $z$'s place in the model is represented by $p(z_2, \theta_{-z})$.

1. Fix $t = 1$. Let $\hat\theta = \theta^{(t)}$. Elements in $\hat\theta$ at the same position as in $\theta$, say, $\alpha_{isd}$, will be referred to as $\hat\alpha_{isd}$.

2. First we shall examine the simulation of $\alpha$. Fix $i \in \{1, \dots, N\}$, $s$ and $d$ so that $\hat\alpha_{isd}$ corresponds to the first element of $\hat\alpha$. Define a set that picks out transitions for customer $i$ that originate from state $s$. Such a set is $G_{is} = \{t \in \{1, \dots, T_i\} \mid y_{i,t-1} = s\}$. Let $\alpha^{*}_{isd}$ denote a proposal for a draw in the target distribution, carried out as in step 2 of the simplified algorithm. Modify $\Pr(\cdot \mid \cdot)$ from eq. 7 to accommodate the changed parameter representation. Then the conditional density ratio simplifies to eq. 9.

$$\frac{p(\alpha^{*}_{isd}, \hat\theta_{-\hat\alpha_{isd}} \mid D)}{p(\hat\alpha_{isd}, \hat\theta_{-\hat\alpha_{isd}} \mid D)} = \frac{\prod_{t \in G_{is}} \Pr(y_{it} \mid y_{i,t-1}, x_{it}, \alpha^{*}_{isd}, \hat\theta_{-\hat\alpha_{isd}}) \, \phi(\alpha^{*}_{isd} \mid \hat\sigma^2_{sd})}{\prod_{t \in G_{is}} \Pr(y_{it} \mid y_{i,t-1}, x_{it}, \hat\alpha_{isd}, \hat\theta_{-\hat\alpha_{isd}}) \, \phi(\hat\alpha_{isd} \mid \hat\sigma^2_{sd})} \qquad (9)$$

Pick a $u \sim \text{Uniform}(0, 1)$. If $u$ is smaller than the ratio in eq. 9, we accept $\alpha^{*}_{isd}$ and update element $\hat\alpha_{isd}$ in $\hat\theta$ to the new value. If we reject, $\hat\theta$ continues to hold $\hat\alpha_{isd}$ unchanged. Iterate this step for all $i$, $s$ and $d$.

3. In this step, $\beta$ is simulated. Fix $s$ and $d$ so that $\hat\beta_{sd}$ corresponds to the first element of $\hat\beta$. Fix $j$ to the first element in $\hat\beta_{sd}$ and let $\hat\beta_{sd,j}$ denote that element. Define a set $H_s$ that picks out customers that enter state $s$ at any point. Let $H_s = \{i \in \{1, \dots, N\} \mid \exists t : y_{it} = s\}$. The conditional density ratio then simplifies to eq. 10.

$$\frac{p(\beta^{*}_{sd,j}, \hat\theta_{-\hat\beta_{sd,j}} \mid D)}{p(\hat\beta_{sd,j}, \hat\theta_{-\hat\beta_{sd,j}} \mid D)} = \frac{\prod_{i \in H_s} \prod_{t \in G_{is}} \Pr(y_{it} \mid y_{i,t-1}, x_{it}, \beta^{*}_{sd,j}, \hat\theta_{-\hat\beta_{sd,j}})}{\prod_{i \in H_s} \prod_{t \in G_{is}} \Pr(y_{it} \mid y_{i,t-1}, x_{it}, \hat\beta_{sd,j}, \hat\theta_{-\hat\beta_{sd,j}})} \qquad (10)$$

Proceed as in the previous step. Iterate for all $j$, $s$ and $d$.

4. Elements in $\sigma^2$ are simulated using a direct draw from their posterior distributions, eq. 11,

$$\hat\sigma^2_{sd} \mid \cdot \sim \text{Inverse Gamma}\left(\frac{v + \|H_s\|}{2}, \frac{v\tau^2 + \sum_{i \in H_s} \hat\alpha^2_{isd}}{2}\right) \qquad (11)$$

where $\|\cdot\|$ counts the number of elements in a set. Update $\hat\sigma^2_{sd}$ in $\hat\theta$ for each $s$ and $d$.

5. Let $\theta^{(t)} = \hat\theta$. Go to step 2.

When contemplating the issue of computational tractability and speed, it is interesting to note that the number of individual coefficients, $\alpha_{isd}$, is proportional to the number of observed customers, while the size of $\beta$ is independent of the number of customers. Observing the simple anatomy of eq. 9 in step 2, where only the transitions of a single customer enter, the algorithm can be said to scale well in speed with the number of customers.
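The sweep over the three parameter groups can be illustrated with a deliberately small sketch. It is not the implementation used in this paper: it assumes a binary outcome with a logit link and a single covariate instead of the transition probabilities of eq. 7, a flat prior on the aggregate coefficient, and hypothetical names (`mh_sweep`, `step_a`, `step_b`, `v`, `tau2`) chosen for the example only.

```python
import math
import random


def log_lik(y, x, alpha_i, beta):
    """Log-likelihood of one customer's binary outcomes under a logit link."""
    total = 0.0
    for y_t, x_t in zip(y, x):
        eta = alpha_i + beta * x_t
        # y*eta - log(1 + exp(eta)), written so that exp() cannot overflow
        total += y_t * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def accept(log_ratio, rng):
    """MH acceptance: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def mh_sweep(data, alpha, beta, sigma2, step_a, step_b, v, tau2, rng):
    """One variable-at-a-time pass; data[i] = (y, x) for customer i."""
    # Individual terms: only customer i's own observations and the
    # normal prior enter the ratio, everything else cancels.
    for i, (y, x) in enumerate(data):
        prop = alpha[i] + rng.gauss(0.0, step_a)  # random walk proposal
        log_ratio = (log_lik(y, x, prop, beta) - 0.5 * prop ** 2 / sigma2
                     - log_lik(y, x, alpha[i], beta)
                     + 0.5 * alpha[i] ** 2 / sigma2)
        if accept(log_ratio, rng):
            alpha[i] = prop
    # Aggregate coefficient: all customers enter (flat prior assumed here).
    prop = beta + rng.gauss(0.0, step_b)
    log_ratio = sum(log_lik(y, x, a, prop) - log_lik(y, x, a, beta)
                    for (y, x), a in zip(data, alpha))
    if accept(log_ratio, rng):
        beta = prop
    # Variance: direct inverse gamma draw, obtained as scale / Gamma(shape, 1).
    shape = 0.5 * (v + len(alpha))
    scale = 0.5 * (v * tau2 + sum(a * a for a in alpha))
    sigma2 = scale / rng.gammavariate(shape, 1.0)
    return alpha, beta, sigma2
```

Each individual update touches the data of one customer only, which is the property behind the scaling remark above; the update of the aggregate coefficient is the only step whose cost grows with the full sample.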

3.6 Sampling

The model presented so far is implicitly contingent on a special sampling scheme, flow sampling. Random sampling of customers entering a state during an interval of time is called flow sampling, while random sampling of customers already in a state at a fixed point in time is called stock sampling. For a customer base, flow sampling could be implemented by sampling among all customers being activated subsequent to a given date, while stock sampling is identical to indiscriminate sampling among all active customers, ignoring their activation date.

To illustrate the idea of sampling further, consider figure 1. Assume data limitations impose an observation window in the interval $t_0$ to $t_1$. A flow sampling approach would be to sample cases activated after $t_0$. In the figure, only cases A and B qualify for inclusion. A stock sampling approach would sample everything within the window. A, B, D and E would be included.

Figure 1: Event histories - birth and death of customers (cases A to E relative to the observation window $t_0$ to $t_1$)

Stock sampling induces a bias in estimates of parameters. Think of a customer pool of two types of customers, a short-life and a long-life. If new customers flow in at a constant rate and in equal proportion, then in time, the long-life customers will claim a larger proportion of the pool. Usually a model can be corrected to account for this, but it is simpler to utilize flow sampling. See Lancaster (1990), chapters 5 and 8 for a detailed examination of the issue.
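The two-type argument can be checked with a small counting exercise. The sketch below is an illustration under assumed numbers (lifetimes of 6 and 24 months, one activation of each type per month); the function name and its arguments are hypothetical.

```python
def long_life_share(short_life, long_life, window_start, window_end):
    """Share of long-life customers under flow and stock sampling.

    One customer of each type is activated every month from month 0 on.
    Flow sampling keeps customers activated inside the window, stock
    sampling keeps customers active at the fixed time point window_start.
    """
    flow_long = flow_total = stock_long = stock_total = 0
    for start in range(window_end):
        for life in (short_life, long_life):
            end = start + life            # month of termination
            is_long = life == long_life
            if start >= window_start:     # flow: activated in the window
                flow_total += 1
                flow_long += is_long
            if start <= window_start < end:   # stock: active at window_start
                stock_total += 1
                stock_long += is_long
    return flow_long / flow_total, stock_long / stock_total


flow_share, stock_share = long_life_share(6, 24, window_start=48, window_end=60)
```

With these assumed lifetimes the flow sample contains both types in equal proportion, while the stock sample contains them in proportion to their lifetimes, 24 to 6, which is the over-representation of long-life customers described above.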

4 Application

In the following section we shall study the key dynamics in a customer database from a telecommunications service provider. There are multiple products and the products can be consumed at varying intensities. In the study we shall examine event history dynamics, conditioning on sociodemographic variables, time and individual heterogeneity. A general-to-specific elimination is carried out and the predictive qualities of aspects of the model are analyzed. A few special questions shall be considered, namely patterns in data related to customer churn, and patterns associated with customers choosing a high-value ADSL product.

4.1 Data

4.1.1 Event Histories

9218 subscribers are flow sampled by narrowing interest to customers activated in the period from January 2001 to April 2003. Product type and consumption levels are recorded on a monthly basis, so each timestep equals one month. No individual is observed past April 2003. The primary sample is split into two subsamples of equal size at random. One subsample is used for estimation purposes, while the other is used to measure the quality of aspects of model predictions. The median event history length is 15 months.

Figure 2: Histogram of event history lengths (count against history length t in months)

Examining figure 2, event histories of length 18 and 19 months are of special concern, since both are extreme with counts of 744 and 3, respectively. Experts from the marketing intelligence department think it represents an unusual technical glitch: Customers arriving at $t-19$ months were not registered until $t-18$ months ago. The author could choose to model this possibility explicitly, but the impact of censoring, say, 350 customer event histories by one month is probably quite small. It is important to note that the author did not have full access to all available data, nor could he change requirements for new data deliveries indefinitely as new ideas or limitations turned up based on the investigation.

4.2 Multiple Choice Components

This section builds on the framework laid out in previous sections, but instead of one choice component, three choice components are used. A model with n choice components can always be reduced to a one-dimensional model with a large choice set. Using more than one choice component, in this case, simplifies model specification and interpretation. Each component is modeled separately as described in the previous sections. The only form of interaction is through time-varying dummy variables entering the observed characteristics vector $x_{it}$. For each customer, we observe three components of behavior at discrete, monthly intervals $t \in \{0, 1, 2, \dots, T_i\}$.

4.2.1 Continuation Choice

The first component is the overall continuation choice: Will the customer stay or leave? If at time $t$ a customer decides to stay, let $y^c_{it} = 1$. If he chooses to terminate the relationship, let $y^c_{it} = 0$. Eq. 12 represents the customer choice set with respect to continuation.

$$y^c_{it} = \begin{cases} 0 & \text{if termination} \\ 1 & \text{if continuation} \end{cases} \qquad (12)$$

The term churn is used repeatedly in subsequent sections and is identical to customer termination.

4.2.2 Product Choice

The second component is product choice. The customer can choose 1) Public Switched Telephone Network (PSTN), 2) Integrated Services Digital Network (ISDN) or 3) Asymmetric Digital Subscriber Line (ADSL). In more common language that is a fixed phone line, a pre-broadband data connection or a broadband data connection. Let

$$y^p_{it} = \begin{cases} 1 & \text{for PSTN} \\ 2 & \text{for ISDN} \\ 3 & \text{for ADSL} \end{cases}$$

A subscription to ISDN or ADSL implicitly includes an ordinary phone line. For all $t$ where $y^c_{it} = 0$, product choice $y^p_{it}$ is undefined.

4.2.3 Consumption Intensity

The third component is consumption intensity, but could also be called variable revenue. There are two main sources of revenue: Subscription fees and minute usage fees. Subscription fees are a function of $y^p_{it}$, while variable revenue is captured in $y^v_{it}$. It is a discrete measure of aggregate variable revenue generated from enabled services. The number of intervals must be kept low to avoid spending too many parameters estimating the model, so three levels are allowed. Revenue is generated from phone and ISDN minute usage. Phone revenue is a function of minute usage and call destination, while ISDN line revenue usually derives from minute usage to one specific destination. ADSL subscriptions are flat-fee based, so no minute usage fees are incurred with respect to data consumption. A customer on an ADSL subscription is still likely to generate minute usage through fixed phone line minute usage.

$$y^v_{it} = \begin{cases} 1 & \text{for low consumption} \\ 2 & \text{for medium consumption} \\ 3 & \text{for high consumption} \end{cases}$$

What exactly constitutes low, medium and high consumption? The company providing the data requested that bill levels be kept secret. Figure 3 shows the estimated density for log(revenue+1), but the x-axis is blanked out on purpose. The low, medium and high thresholds are set at rather arbitrary levels, the 30% and 80% percentiles of the sample population. A default approach might have been to set the thresholds at 33.3% and 66.6%, but the higher level was chosen to examine if a distinctly higher variable consumption carries any information with respect to the other choice components. Around 15% of customers have a zero-level variable consumption, as indicated by the vertical line.
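As a small illustration of the discretization, the sketch below assigns the three levels from percentile thresholds. The nearest-rank percentile convention and the treatment of observations exactly on a threshold are assumptions made for the example; the paper does not state which convention was used, and the function name is hypothetical.

```python
def consumption_levels(revenue, low_pct=30, high_pct=80):
    """Map monthly variable revenue to levels 1 (low), 2 (medium), 3 (high)."""
    ordered = sorted(revenue)

    def percentile(pct):
        # Nearest-rank percentile using integer arithmetic (ceiling division).
        rank = max(1, -(-pct * len(ordered) // 100))
        return ordered[rank - 1]

    low_cut, high_cut = percentile(low_pct), percentile(high_pct)
    # Observations on a threshold are assigned to the lower level (assumption).
    return [1 if r <= low_cut else 2 if r <= high_cut else 3 for r in revenue]


levels = consumption_levels([5, 0, 12, 40, 7, 0, 95, 18, 22, 60])
```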

Figure 3: Empirical density of monthly log(variable bill+1) for active customers

For all $t$ where $y^c_{it} = 0$, consumption intensity $y^v_{it}$ is undefined.

4.2.4 Choice Components and Transition Data

Table 1 shows data transition frequencies and derived simple Markov probabilities for transitions. The table indicates that it is a rare event to churn and return again as a customer. It might be so, but it also has to do with the time horizon and the difficulty of tracking former customers. Even though the company is putting an effort into the area, it will continue to depend on inherently error-prone processes such as name matching. Due to the low frequency and the ambiguity related to the data, the transition is left out of the model, with the added advantage of lowering model complexity. Transitions from ADSL to ISDN are left out for the same reason.

The transition frequencies for product choice are quite low. Within the available sample window, it is unfortunately a relatively rare event to switch to a new product. The result of this will undoubtedly be that estimates of product choice will be less reliable. The opposite is true for the consumption level transition frequencies.

Each agent gives rise to twelve unobserved heterogeneity terms: one term in the continuation choice component, five in the product choice component and six in the consumption choice component.

4.2.5 Interaction Variables

The choice components do not influence each other directly. Interaction is enabled through a time-varying dummy-variable mechanism. $y^p_{i,t-1}$ is allowed to influence $y^c_{it}$ and $y^v_{it}$. The variable internet$_{it}$ is included in the churn model according to eq. 13, to test if data customers are more loyal and if their consumption level is influenced.

$$\text{internet}_{it} = \begin{cases} 1 & \text{for } y^p_{i,t-1} \in \{2, 3\} \text{ (ISDN, ADSL)} \\ 0 & \text{otherwise} \end{cases} \qquad (13)$$

Transition Frequency

Churn Model
Source \ Destination     Churn      Stay
Churn                     8382        43
Stay                       924     55433

Product Choice
Source \ Destination      PSTN      ISDN      ADSL
PSTN                     41743        57       259
ISDN                        33      5070       106
ADSL                        66        15      8084

Consumption Level
Source \ Destination        LO       MED        HI
LO                        4819      1231       239
MED                       1443     31347      3020
HI                         213      3146      9975

Markov Transition Probabilities

Churn Model
Source \ Destination     Churn      Stay
Churn                   0.9949    0.0051
Stay                    0.0164    0.9836

Product Choice
Source \ Destination      PSTN      ISDN      ADSL
PSTN                    0.9925    0.0014    0.0062
ISDN                    0.0063    0.9733    0.0203
ADSL                    0.0081    0.0018    0.9901

Consumption Level
Source \ Destination        LO       MED        HI
LO                      0.7663    0.1957    0.0380
MED                     0.0403    0.8754    0.0843
HI                      0.0160    0.2359    0.7481

Table 1: Descriptive statistics ordered by choice component. Upper half: Monthly transition frequencies. Lower half: Simple Markov probabilities.
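The simple Markov probabilities in table 1 are row-normalized transition frequencies. A minimal sketch of the calculation, using the churn and product choice frequencies from the table (the function name is hypothetical):

```python
def transition_probabilities(counts):
    """Row-normalize a nested dict counts[source][destination]."""
    probs = {}
    for source, row in counts.items():
        total = sum(row.values())
        probs[source] = {dest: n / total for dest, n in row.items()}
    return probs


churn_counts = {
    "Churn": {"Churn": 8382, "Stay": 43},
    "Stay": {"Churn": 924, "Stay": 55433},
}
product_counts = {
    "PSTN": {"PSTN": 41743, "ISDN": 57, "ADSL": 259},
    "ISDN": {"PSTN": 33, "ISDN": 5070, "ADSL": 106},
    "ADSL": {"PSTN": 66, "ISDN": 15, "ADSL": 8084},
}
churn_probs = transition_probabilities(churn_counts)
product_probs = transition_probabilities(product_counts)
```

These probabilities condition on the source state only; the model of section 3 additionally conditions on observed characteristics and individual heterogeneity.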

Figure 4: Choice components. Panels: Churn Model (Stay = state 1, Churn = state 0), Product Choice (Telephone = state 1, ISDN = state 2, ADSL = state 3), Consumption Level (Low = state 1, Medium = state 2, High = state 3). The inside of dark-grey boxes shows allowed choice transitions. Arrows between boxes illustrate which components are allowed to influence other components.

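In a similar way the variables isdn$_{it}$ and adsl$_{it}$ are included in the consumption level model $y^v_{it}$.

$$\text{isdn}_{it} = \begin{cases} 1 & \text{for } y^p_{i,t-1} = 2 \text{ (ISDN)} \\ 0 & \text{otherwise} \end{cases}$$

$$\text{adsl}_{it} = \begin{cases} 1 & \text{for } y^p_{i,t-1} = 3 \text{ (ADSL)} \\ 0 & \text{otherwise} \end{cases}$$

Customers once subscribing to ISDN or ADSL and then rejecting either might be more or less prone to subscribe again. The dummy variable inettch$_{it}$ ("internet touch") is equal to one if the customer doesn't currently subscribe to ISDN or ADSL, but once did. The variable is included in all three choice components.

$$\text{inettch}_{it} = \begin{cases} 1 & \text{if } y^p_{i,t-1} \notin \{2, 3\} \text{ and } \exists j > 1 : y^p_{i,t-j} \in \{2, 3\} \\ 0 & \text{otherwise} \end{cases}$$

The consumption level $y^v_{i,t-1}$ is allowed to influence $y^c_{it}$ and product choice $y^p_{it}$ through variables vlo$_{it}$ and vhi$_{it}$.

$$\text{vlo}_{it} = \begin{cases} 1 & \text{for } y^v_{i,t-1} = 1 \\ 0 & \text{otherwise} \end{cases}$$

$$\text{vhi}_{it} = \begin{cases} 1 & \text{for } y^v_{i,t-1} = 3 \\ 0 & \text{otherwise} \end{cases}$$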
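The interaction dummies can be derived mechanically from the lagged state histories. The sketch below is one possible reading of the definitions, assuming the product and consumption states of a customer are stored as lists indexed by month; the function name and data layout are hypothetical.

```python
def interaction_dummies(product, consumption, t):
    """Dummies for month t >= 1 from states coded 1..3 in months 0..t-1."""
    p_prev, v_prev = product[t - 1], consumption[t - 1]
    # Some lag j > 1 with an internet product, i.e. months 0..t-2.
    had_internet_before = any(p in (2, 3) for p in product[:t - 1])
    return {
        "internet": int(p_prev in (2, 3)),
        "isdn": int(p_prev == 2),
        "adsl": int(p_prev == 3),
        "inettch": int(p_prev not in (2, 3) and had_internet_before),
        "vlo": int(v_prev == 1),
        "vhi": int(v_prev == 3),
    }


# A customer who held ADSL in month 1 and returned to PSTN afterwards.
example = interaction_dummies(product=[1, 3, 1, 1], consumption=[2, 2, 1, 3], t=3)
```

In the example, the customer no longer holds an internet product in month 2 but did in month 1, so inettch is switched on for month 3 while internet is switched off.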

Figure 4 shows which choice components are allowed to influence other components.

4.2.6 Sociodemographics

Sociodemographic variables from a third party data vendor are included in the analysis. It is very important to understand that variables are recorded on an aggregate level. When referring to, say, the income of a given customer, we are in fact referring to the average income in the customer's neighbourhood. The database partitions the geographic region into cells of varying size, depending on the legislative requirements with respect to privacy. Some measures of income, for instance, require at least 150 households as a calculation base, while age measures are considered much less sensitive and may be based on as few as 20 households. This scheme induces an additional level of noise in the data, but companies do not have legal access to more detailed types of sociodemographics on a large scale.

Sociodemographic variables are grouped into five sets: Age, income, job market status, education and household type. Variables in all groups, except income, are measured in proportions as described in appendix A.1, p. 60 along with descriptive statistics. Within these groups, proportions sum to one, so at least one variable must be excluded in each group for analysis to commence. A correlation table is provided in appendix A.2, p. 61. There are many variable-pairs to compare, so a two-component multidimensional scaling (MDS) on absolute correlations is visualized in figure 5. The variables adult, jhi, enone and hs0 are excluded from further analysis on the grounds of being highly correlated to variables child (-0.83), elong (0.80), eshort (-0.80) and hc2p (-0.87) respectively.

Figure 5: Correlation structure of sociodemographic variables. Plot of two-component multidimensional scaling of absolute correlations.

As seen in the correlation table and figure 5, the variables are in general correlated. A high variance on individual parameter estimates must be expected. A general-to-specific variable elimination scheme will be utilized later on, but given the correlation between demographic variables, a cautious approach must be taken when interpreting variable coefficients.

4.2.7 Duration Dependence

Choice Component         Timepoints
Continuation             1, 4, 12
Product Choice           1, 3, 12, 16
Consumption Intensity    1, 2, 3, 5, 7

Table 2: Time points at which time-dummies are included

The probability of transitions in and out of states might relate to duration of stay in the current state. Learning, switching costs and habit formation are examples of phenomena that could induce duration dependence in the decision to stay or go. The probability of exit from a state as a function of duration of stay is graphed in appendix 3.1, p. A.4. Exit out of the churn component model does mean complete termination of the customer relationship, but "exit" for the remaining choice components should be interpreted as the probability of entering any other state, for example, a customer changing usage minute consumption corresponding to a move from medium level to high level consumption. The continuation component does not display an obvious significant pattern, while product choice with respect to PSTN and ISDN displays a somewhat lowered probability of exit.

Clearly, the consumption intensity/consumption level component displays strong duration dependence. The probability of exit from a given consumption intensity state is lowered considerably the longer a customer consumes at a given level. Duration dependence is accounted for by including time-varying dummy variables. A variable is not included at every time point, since the number of extra parameters to be estimated would be too high. Rather, the empirical exit probabilities are loosely examined for seemingly significant kinks and time-dummies are included accordingly. Table 2 shows the time-dummy insertion points. A time-dummy for time $\bar t$ is defined as in eq. 14.

$$st_{\bar t, t} = \begin{cases} 1 & \text{if } t \le \bar t \\ 0 & \text{if } t > \bar t \end{cases} \qquad (14)$$

4.3 More on Estimation

20,000 iterations were spent calibrating the control parameters of the Metropolis-Hastings algorithm, another 20,000 on burn-in. One variable in each state in each submodel is eliminated using a general-to-specific principle. Then a new cycle of elimination is started using 2000 iterations for tune-in and 2000 for parameter estimates. The process was repeated until no insignificant variables remained at the 10% level.
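The calibration of the control parameters follows the tuning rule of section 3.5.3. A minimal sketch of one tuning decision is given below. The normal-approximation interval for the acceptance rate is an assumption, since the paper does not state how the 90% interval is formed, and the function name is hypothetical.

```python
import math


def tune_variance(variance, accepted, iterations, target=0.27, z=1.645):
    """Adjust a proposal variance from the acceptance count since last tuning.

    z = 1.645 gives a two-sided 90% interval under a normal approximation
    (an assumption made for this sketch).
    """
    rate = accepted / iterations
    half_width = z * math.sqrt(rate * (1.0 - rate) / iterations)
    if rate + half_width < target:    # accepting too rarely: smaller steps
        return variance * 0.5
    if rate - half_width > target:    # accepting too often: larger steps
        return variance * 1.5
    return variance                   # target inside the interval: keep


new_variance = tune_variance(variance=1.0, accepted=5, iterations=100)
```

Because the interval is wide after only 30 to 150 iterations, the rule changes a variance only when the evidence is reasonably clear, which guards against chasing noise in the acceptance rate.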

5 Results

How does one judge the predictive quality of a model? One measure is that of model lift, where the model's predictive ability is compared to that of a completely random prediction. Model lift is now explained. Given a sample of $n$ binary customer types $y_i \in \{0, 1\}$, $i \in \{1, \dots, n\}$, suppose the objective is to identify likely type-0 customers. We know related characteristics $z_i$ and our model scores likely type-0 customers using a function $f(z_i)$. If the sample contains $n_0$ type-0 customers, we extract $n_0$ customers from the full sample, according to a ranking generated by $f(z_i)$. Let $n_{correct}$ designate the number of correct predictions. The lift is the ratio of the proportion of correct predictions in the model, relative to the expected correct predictions by a completely random model.

$$\text{lift} = \frac{n_{correct} / n_0}{n_0 / n}$$

Lift measures for the model's ability to predict churn and predict if a customer buys ADSL are reported in sections 5.2 and 5.3. Section 5.2 gives a detailed example of lift calculation. Note that lift is sometimes calculated using a base different from $n_0$; for instance, lift could be calculated by picking out 10% of the sample and then measuring the ratio of correctness. That approach is not used here.

Parameter estimates for a reduced model, reduced in the sense that insignificant variable coefficients have been eliminated, are reported in appendix B.1, p. 64. Statistics on the individual level heterogeneity parameters are not reported, as they are too numerous, but their empirical distributions are shown in appendix B.2, p. 66. In line with the misspecification methodology recommended by Allenby & Rossi (1999), the empirical distribution is graphed along with the prior distribution, eq. 4, with the posterior average values for $\sigma^2_{sd}$ plugged in. A Shapiro and Wilk's test (SW) for normality is carried out and its p-value displayed in each graph, p (SW). It turns out the unobserved heterogeneity parameter for the continuation model is nearly rejected on a 1% level, questioning the choice of prior for this particular parameter. The empirical distribution seems left-skewed and this is confirmed by a D'Agostino test for skewness, so a better prior should be introduced to accommodate this. A parametric alternative could be, as an example, a Skew-Normal distribution.

5.1 Prediction

It is challenging to interpret coefficient estimates in this complex model. One-period ahead transition sensitivities to variables can be understood directly from coefficient signs, but multiple periods complicate matters by allowing variables to have a direct and indirect effect on the outcome of interest. Further complications are variables exhibiting some degree of multicollinearity, unobserved heterogeneity and states pushing customers into new states with varying degrees of inherent force.

In the following sections the outcome of interest is the probability $p_i$ of a choice component variable equaling $s$ at least once within $n$ periods of time. This is investigated by simulating the state path $n$ periods ahead for each agent, observing if $s$ occurs. Model parameters are simulated simultaneously using the Metropolis-Hastings algorithm, so statistics based on the simulation are not conditional on average estimates, but on the parameters' full distribution. Let $\hat s^{(k)}_{it}$ denote the simulated state value in period $t$ for customer $i$ in simulation run number $k$. Given $K$ simulation runs, the estimated probability of customer $i$ touching state $s$ is given in eq. 15.

$$\hat p_i = \frac{\left| \left\{ k \mid \exists j : \hat s^{(k)}_{i,t+j} = s \right\} \right|}{K} \qquad (15)$$

Eq. 15 is the basis for lift calculations and for sensitivity analysis. Sensitivity analysis, with respect to sociodemographic variables, is carried out by estimating the average $p_i$ conditional on one variable $x_j$ at a time. A local regression method, locfit in the statistical system R, does that for us. To get a rough idea of the sensitivities, the conditional average is noted at the first, second and third quartile of each variable.
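The estimator in eq. 15 is a simple proportion over simulated paths. The sketch below illustrates it with a fixed transition matrix, here the simple Markov probabilities of the churn component from table 1. This is a simplification: in the paper the parameters are redrawn from the MH chain in every run and the transition probabilities depend on covariates and individual heterogeneity. Function names are hypothetical.

```python
import random


def markov_path(start, probs, horizon, rng):
    """Simulate `horizon` steps of a first-order Markov chain from `start`."""
    path, state = [], start
    for _ in range(horizon):
        states = list(probs[state])
        weights = [probs[state][s] for s in states]
        state = rng.choices(states, weights=weights)[0]
        path.append(state)
    return path


def touch_probability(simulate_path, target_state, horizon, runs, rng):
    """Share of simulated paths that visit target_state at least once."""
    hits = 0
    for _ in range(runs):
        hits += target_state in simulate_path(horizon, rng)
    return hits / runs


rng = random.Random(0)
churn_probs = {
    "Stay": {"Stay": 0.9836, "Churn": 0.0164},
    "Churn": {"Stay": 0.0051, "Churn": 0.9949},
}
p_hat = touch_probability(
    lambda h, r: markov_path("Stay", churn_probs, h, r), "Churn", 12, 1000, rng
)
```

Passing the path simulator as a function keeps the estimator independent of the model, so a simulator that draws a fresh parameter vector in each run could be substituted without changing `touch_probability`.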

5.2 Exit Behavior

An out-of-sample prediction of customer termination is carried out in this section. Firstly, the predicted sample did not contribute to the likelihood of any aggregate parameter values. Secondly, individual heterogeneity parameter estimates are based on a truncated version of the individual event history. The agent event history is modified, for an $n$ period ahead prediction, by excluding the remaining $n$ observations from estimation. Individuals with a transaction history length below size $n$ are excluded from comparisons. The lift results are shown in table 3 and sensitivity analysis conditional on sociodemographic variables is presented in appendix B.3, p. 67.

Table 3 describes different horizons at which prediction is carried out. In the 1-step ahead prediction, the first line of the table, $n = 3602$. Among those, it turns out that 92 churn. The model then picks out the 92 it believes are most at risk of churning. It turns out that, among those 92, the model correctly identifies 15. So the correctness ratio is $15/92 \approx 0.163$. If we had picked out 92 from among the 3602 at random, on average, we would get a correctness ratio of $92/3602 \approx 0.026$. This implies that the lift of the model is $0.163/0.026 \approx 6.4$.

horizon   total   actual   predicted   ratio correct   ratio random   lift
1          3602       92          15           0.163          0.026    6.4
2          3494      144          25           0.174          0.041    4.2
3          3402      197          35           0.178          0.058    3.1
6          2990      293          52           0.178          0.098    1.8
12         2233      424         143           0.337          0.190    1.8

Table 3: Lift for out-of-sample predicted churn
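The lift calculation behind table 3 can be written out in a few lines. The sketch below computes lift from the three counts and, as a second hypothetical helper, from model scores and observed types; names are chosen for the example only.

```python
def lift(n, n0, n_correct):
    """Ratio of the model's correctness ratio to that of a random pick."""
    return (n_correct / n0) / (n0 / n)


def lift_from_scores(scores, is_type0):
    """Pick the n0 highest-scored customers and compare with the truth."""
    n, n0 = len(scores), sum(is_type0)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    n_correct = sum(is_type0[i] for i in ranked[:n0])
    return lift(n, n0, n_correct)


one_month_lift = lift(n=3602, n0=92, n_correct=15)   # first line of table 3
```

Rounded to one decimal, `one_month_lift` agrees with the value of 6.4 reported in the first line of table 3.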

Figure 6: Predicted 12 months ahead churn conditional on interaction variables. Point: Proportion estimate. Black line: 95% confidence bands.

The lift values can be compared to the critical values in table 5, p. 5. From that it is clear that the lift values are significant at all horizons. In general, lower risk of churn is associated with areas of higher income, higher proportions of self-employed, high-ranking job positions, educated people living in households as couples or as singles with two or more children. Higher risk of churn is associated with areas of lower income, higher proportions of people with no job market affiliation, little or no formal education, and living in single households. Among the sociodemographic groups, age composition seems to carry little significance. Two effects are highly non-linear. The proportion of single households is associated with higher risk for proportions below the median of 0.42. For values above the median, the association is reversed, though less steep. A similar relation holds for couple households with two or more children.

Customers with low and high telephone consumption are more likely to churn than medium level consumers, low state consumers being much more likely to churn. Product choice does not seem to induce significant differences in churn, even though possession of ADSL or ISDN lowers risk directly: In the continuation component model parameter estimates in appendix B.1, p. 64 the internet10 coefficient is significant and negative. It is seemingly paradoxical that an effect is directly significant, but results in a prediction that says otherwise. Studying the consumption intensity component model, it turns out that ISDN and ADSL play an important role. Leaving the low state is significantly less likely if an internet product is present. Entering state low from state medium is significantly more likely in the presence of an internet product. Entering state low from state high is more likely for ADSL customers.

Put differently, assume we are in period $t$. A customer with consumption level in state medium and product choice in state ISDN/ADSL is less likely to churn than other customers in period $t + 1$. He is, however, more likely to move his consumption level to state low or high, thereby increasing the risk of churn in period $t + 2$. Considering that state low is much riskier than state medium and high, the indirect higher risk absorbs the direct effect.

horizon   total   actual   predicted   ratio correct   ratio random   lift
1          2602       11           0           0.000          0.004    0
2          2541       26           0           0.000          0.010    0
3          2513       55           2           0.036          0.022    1.7
6          2273       86          10           0.116          0.038    3.1
12         1788      125          18           0.144          0.070    2.1

Table 4: Lift for out-of-sample predicted PSTN to ADSL transitions

5.3 Who Buys ADSL?

An out-of-sample prediction of ADSL-buying behavior is carried out in this section; that is, it is studied who among the non-ADSL customers eventually buys into ADSL. The lift results are shown in table 4 and a sensitivity analysis with respect to sociodemographic variables is shown in appendix B.4. The model does not pick up until the horizon reaches three months and beyond. This has to do with the very small fraction of customers that buy into ADSL and noise surrounding the lift measure. Critical values of the lift measure are shown in table 5 in the next section. From that it is clear that the lift is significant for the 6- and 12-month horizons only. It is also clear that measuring lift on the 1-month horizon will carry little information.

Few sociodemographic characteristics have a direct significant impact on the probability of transitions to the ADSL state. Average household income, inch, and the proportion of teenagers in a given area, teen, are the only two directly significant sociodemographic variables as seen in appendix B.1, p. 64, for the product choice model. For ISDN customers, only income is directly significant. However, due to the high correlation in sociodemographic variables, there is an observed significance in several variables correlated with income and teenagers. Conditional on PSTN customers, it is clear from appendix B.4 that higher proportions of children, self-employed, couple households and unclassified households are positively related to PSTN to ADSL transitions. Higher proportions of unemployed, people seeking education, uneducated and single households without children are negatively related.

As seen in figure 7, PSTN customers with high consumption are more likely to get ADSL, while medium and low consumption types display little difference. ISDN customers are much more likely to get ADSL. No time-dummies survive variable elimination, so duration of stay seems to carry no extra information.

5.4 Lift - Critical Values

Figure 7: Predicted 12 months ahead probability of transition to ADSL conditional on interaction variables. Point: Proportion estimate. Black line: 95% confidence bands.

The lift measure can be interpreted as a statistical test for randomness of a selection mechanism. Given the sample size $n$ and the number of successes in the sample $n_0$, let $\tilde n$ denote the number of successes we get when sampling $n_0$ items from among the $n$ items. Under $H_0$ the sampling mechanism is completely random, so $\tilde n$ is Hypergeometrically distributed with probability function given in eq. 16.

$$f(z) = \frac{\binom{n_0}{z} \binom{n - n_0}{n_0 - z}}{\binom{n}{n_0}} \qquad (16)$$

The lift, given $\tilde n$, is

$$l = \frac{\tilde n / n_0}{n_0 / n}.$$

Given the critical values of $\tilde n$ under $H_0$ we can calculate the critical values of $l$. The calculation is carried out and shown in table 5. Example of use: The lift table for exit behavior prediction, table 3, shows a lift of 6.4 at a horizon of 1 month. To reject the $H_0$ of random sampling, the lift should be higher than 1.702, 2.128 or 2.553, depending on the significance level (one-sided 10%, 5% or 2.5%). The critical values for that particular entry were calculated using $n = 3602$ and $n_0 = 92$.

Churn - Critical Lift Values
            Lift Percentile
Horizon     0.9      0.95     0.975
1           1.702    2.128    2.553
2           1.516    1.685    1.853
3           1.403    1.490    1.578
6           1.219    1.289    1.323
12          1.118    1.155    1.180

ADSL - Critical Lift Values
            Lift Percentile
Horizon     0.9      0.95     0.975
1           0.000    0.000    21.504
2           3.759    3.759    7.518
3           2.492    2.492    3.323
6           1.844    1.844    2.151
12          1.373    1.488    1.602

Table 5: Critical value of lift measure at total observations and successes corresponding to table 3 and 4
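The critical values can be obtained directly from the hypergeometric probability function in eq. 16. The sketch below takes the critical count to be the smallest $z$ whose cumulative probability reaches the chosen percentile; this quantile convention is an assumption, although it appears to agree with the 0.9 and 0.95 entries in the first row of the churn panel of table 5. Function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(z, n, n0):
    """Probability of z successes when drawing n0 of n items, n0 successes."""
    return comb(n0, z) * comb(n - n0, n0 - z) / comb(n, n0)


def critical_lift(n, n0, level):
    """Lift at the smallest z with P(Z <= z) >= level under random picks."""
    cdf = 0.0
    for z in range(n0 + 1):
        cdf += hypergeom_pmf(z, n, n0)
        if cdf >= level:
            return (z / n0) / (n0 / n)
    return n / n0   # every pick is a success


churn_1m = [critical_lift(3602, 92, level) for level in (0.90, 0.95)]
```

The binomial coefficients are evaluated with exact integer arithmetic, so the only rounding occurs in the final division and in the running sum.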


Horizon      Lift 1    Lift 2    Ratio
1            6.4       4.9       0.76
2            4.2       3.4       0.81
3            3.1       2.8       0.91
6            1.8       1.9       1.03
12           1.8       1.8       0.99

Table 6: Lift on 12 months ahead churn predictions. "Lift 1": Full model. "Lift 2": A model excluding individual unobserved heterogeneity. "Ratio": Lift 2 divided by Lift 1.

                           $/month
Product Choice
  PSTN                     20
  ISDN                     30
  ADSL                     40
Consumption Intensity
  LO                       0
  MED                      10
  HIGH                     60

Table 7: Revenue per month derived from choice component states
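As a rough illustration of how the state revenues in table 7 can feed into a present-value calculation such as the customer value of section 5.6, the following Python sketch discounts expected monthly revenue over 60 months. Everything beyond the table 7 figures is an assumption made for the example: the three-state chain, its transition probabilities, the summing of product and consumption revenue, and the monthly compounding of the 10% yearly rate are hypothetical and are not taken from the estimated model.

```python
def expected_present_value(start, transition, revenue, months=60, annual_rate=0.10):
    """Expected discounted revenue over `months` for a customer starting in `start`.

    transition[i][j] is the assumed monthly probability of moving from state i to j.
    """
    monthly_factor = (1.0 + annual_rate) ** (-1.0 / 12.0)  # assumed monthly compounding
    dist = [1.0 if s == start else 0.0 for s in range(len(revenue))]
    total = 0.0
    for t in range(1, months + 1):
        # Move the state distribution one month forward.
        dist = [sum(dist[i] * transition[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
        expected_revenue = sum(p * r for p, r in zip(dist, revenue))
        total += expected_revenue * monthly_factor ** t
    return total


# Hypothetical chain: 0 = churned, 1 = PSTN with LO usage, 2 = ADSL with HIGH usage.
# Revenues add the product and consumption components of table 7 (assumption).
revenue = [0.0, 20.0 + 0.0, 40.0 + 60.0]
transition = [
    [1.00, 0.00, 0.00],  # churn treated as absorbing (assumption)
    [0.02, 0.97, 0.01],
    [0.01, 0.01, 0.98],
]

if __name__ == "__main__":
    for state in (1, 2):
        print(state, round(expected_present_value(state, transition, revenue), 2))
```

The sketch only shows the mechanics; the values it prints depend entirely on the made-up transition matrix.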

5.5

Importance of Individual Level Heterogeneity

A second run of the model is done excluding individual unobserved heterogeneity, to see if the complication adds anything to the predictive ability of the model. Table 6 reports lift measures for the alternative model. It is clear that individual unobserved heterogeneity improves the lift of the model, but the relative gain decreases as the forecast horizon grows. The gain disappears when moving beyond a forecast horizon of three months.

The method used to compare out-of-sample lift measures is bound to disfavor a model including individual unobserved heterogeneity. As the forecast horizon grows, the individual event history shrinks in line with the out-of-sample principle. The unobserved heterogeneity estimates, being very dependent on individual event histories, will thus gradually lose valuable information. The extent to which this distortion influences the comparison is unknown, but the method makes it unlikely that the value of unobserved heterogeneity as a modelling element is overestimated.
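The size of the gain can be read off table 6 directly. A small Python sketch, using only the rounded lifts printed in the table (the dictionary and function names are illustrative):

```python
# Rounded lift values copied from table 6, keyed by forecast horizon in months.
lift_full = {1: 6.4, 2: 4.2, 3: 3.1, 6: 1.8, 12: 1.8}
lift_restricted = {1: 4.9, 2: 3.4, 3: 2.8, 6: 1.9, 12: 1.8}


def relative_gain(horizon):
    """Relative lift gain of the full model over the model without
    individual unobserved heterogeneity."""
    return lift_full[horizon] / lift_restricted[horizon] - 1.0


if __name__ == "__main__":
    for horizon in sorted(lift_full):
        print(horizon, f"{relative_gain(horizon):+.1%}")
```

Because the inputs are rounded to one decimal, the results for the 6- and 12-month horizons need not agree exactly with the Ratio column of table 6, which was presumably computed from unrounded lifts.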

5.6

Customer Value

We will now calculate the customer lifetime value (CLV), the present value of customer revenue streams five years into the future, applying a 10% per year discount rate. Each state is assigned a value that is not exactly equal to the true value, due to secrecy restrictions, but close enough for relative values to stay roughly the same. Table 7 reports the values used. The individual values are not reported, but a sensitivity analysis is carried out conditioning on all sociodemographic variables, one at a time. The results are shown in appendix B.5. The figure forms a pattern similar to that of churning customers. Well-off, better-educated customers in couple households are more valuable, as these customers also display lower churn levels. However, many variables associated with significant changes in churn are not significant in association with the 5-year value prediction. Areas with above-average unemployment carry higher churn risk, but not a significantly lower value. Similarly, groups with above-average rates of single households with one child usually go with higher churn rates, but not significantly lower customer value.

5.6.1

Example: Marketing Decision

Imagine a simplified budget decision scenario regarding a marketing campaign designed to win new customers. The revenue \(\pi\) from a given campaign is a function of the response rate per person \(r\), the number of potential customers touched \(n\), the per-person cost of the campaign \(c\) and the customer present value \(v\).

\[
\pi = rnv - cn
\]

\(v\) is a stochastic variable, while \(c\) and \(n\) can be treated as constants. We might have no knowledge of \(r\), but we can find a lower bound that must be met.

\[
E(\pi) > 0 \quad\Rightarrow\quad r_{\min} > \frac{c}{E(v)}
\]

Suppose c = $400. Using numbers from the figure in appendix B.5 we could derive different bounds for different sociodemographic groups. Using only household income inch, we exchange the unconditional expectation \(E(v)\) for a conditional \(E(v \mid \text{inch})\). The expected 5-year present value of a region earning $30,000/year, the first quantile, is $825, while an area earning around $42,000/year, the third quantile, has a present value of $1050. The \(r_{\min}\) for the former region is then 48.5%, and 38.1% for the latter. A campaign expected to yield a response rate above 38.1%, but below 48.5%, should exclude areas where households on average earn less than $30,000/year.

Compare the calculation above to one based on churn rates alone. The expected present value of the high-income group is more than one-quarter higher than that of the low-income group (1050/825 ≈ 1.27). How does this compare to their termination rates? Turning to appendix 5.2, the first table, we see that the low-income group has yearly expected churn rates of approximately 17%, while the high-income group has an expected churn rate of 15%. This translates to 5-year survival rates of 39.4% and 44.4%, respectively. The high-income group is only 1/8 more "valuable" using this measure. Clearly, not just termination rates, but also product mix and consumption intensity play a role here. When studying the figure in appendix B.4, it is seen that high-income households, high inch, are more likely to buy into ADSL. Studying the parameter estimates in appendix B.1, the third table, it is seen that the direct effect of high inch is to increase the probability of moving from medium to high consumption intensity.
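The arithmetic of this example can be retraced with a few lines of Python. The function names are illustrative; the inputs are the figures quoted above (c = $400, present values of $825 and $1050, yearly churn of 17% and 15%), and the survival calculation assumes a constant yearly churn rate compounded over five years.

```python
def min_response_rate(cost_per_person, expected_value):
    """Break-even response rate: r_min = c / E(v)."""
    return cost_per_person / expected_value


def survival_rate(yearly_churn, years=5):
    """Share of customers remaining after `years`, assuming a constant yearly churn rate."""
    return (1.0 - yearly_churn) ** years


if __name__ == "__main__":
    print(f"r_min, low income:  {min_response_rate(400, 825):.1%}")
    print(f"r_min, high income: {min_response_rate(400, 1050):.1%}")
    low, high = survival_rate(0.17), survival_rate(0.15)
    print(f"5-year survival: {low:.1%} vs {high:.1%}, ratio {high / low:.3f}")
    print(f"present value ratio: {1050 / 825:.3f}")
```

Comparing the two ratios shows the point of the paragraph: the gap in present value is roughly twice the gap in survival, so churn alone understates the difference between the groups.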

5.7

Limitations and Future Research

A key limitation in this study has no doubt been the nature of the data. Prediction levels could probably go higher if data were made available on a finer time-scale. A second issue is the variation in the product choice component; transitions are in general quite rare here. A third avenue would be data on competitors' offerings in terms of price and competitive pressure, measured by ad spending week by week. A fourth avenue would be the inclusion of marketing treatment variables on an individual level. Many companies are in the process of building up capabilities of that nature.

Moving on to model formulation, an alternative to discretizing the continuous consumption variable would be to extend the state space and introduce a parametric time-series model of variable consumption, letting relevant parameters depend on state. Consumption revenue is more naturally modeled by a continuous distribution, which would enable the analyst to learn more about the potentially very valuable upper tail. The empirical distribution of minute usage revenue, as has been shown earlier, is highly skewed.

Inflow of new customers is currently ignored, but would be a useful addition for a company to get a more complete picture of the effects of marketing actions. This model accounts for existing customers only.

The presence of large datasets and the lack of knowledge about the functional form of the transition probability and its relation to observable heterogeneity make nonparametric approaches attractive. Few papers exist dealing with nonparametric models and state space representations.

5.8

Conclusion

In this paper we have successfully specified and estimated a model accounting for individual level heterogeneity, even though the number of parameters in the model was very high. This should demonstrate to businesses and researchers alike that it is possible to combine relatively complex models with a relatively large number of observations. Applied marketing researchers and business analysts should find the detailed exposition and application of the Metropolis-Hastings algorithm interesting in its own right, and it may encourage them to harness its power for their own use.

The ability of the model to pick out and profile likely candidates that might terminate their relationship with the company was significantly above that of a random selection procedure. The model performance was not much higher than that of a model without coefficients that vary on the individual level. So it seems that with respect to churn, in this specific data setup, much can be gained simply from modeling average effects, ignoring individual level heterogeneity, which would simplify the model setup considerably. This is a bit disappointing, since the main focus of the model-building effort has been put into allowing for individual level heterogeneity. On the other hand, the model did perform between 10% and 30% better at horizons of three months and below; secondly, it is not possible to know in advance if the individual level matters; thirdly, allowing only the intercept to vary on the individual level might not be the only relevant focus.

The performance of the model with respect to predicting future upsale of a specific high-value product, broadband internet, seemed to pick up on longer horizons, but few variables explained variation in preferences towards ADSL. The sociodemographic variables might simply not hold information that can help the analyst, or the model's functional form was not sufficiently flexible.

The multi-product, multi-intensity model facilitated the examination of rich questions in a unified framework. A question asked in the previous section is whether high-income customers are worth more. The answer was yes, because their churn rates are lower, they are more likely to choose high-value products and their consumption intensity is likely to be higher. This is an example of an actionable insight that many companies currently do not have.

A comprehensive model of customer value is a natural ingredient in the execution of a range of management challenges. A gold standard has not yet emerged, so the field of marketing will continue to search for better metrics of customer value and customer behavior. The field is hampered by the specificity of each company's situation and the typical problems laid out in section 2.1. Any candidate will probably be tailored to the specific business segment, use very flexible and computationally tractable statistical methods and use data from multiple sources.


A

Descriptive Statistics

A.1

Sociodemographic Variables

Title     Description                                              Min    Q1     Q2     Q3     Max

Age Groups
child     Proportion aged <10                                      0.00   0.06   0.10   0.15   0.44
teen      Proportion aged 10-19                                    0.00   0.04   0.08   0.13   0.90
adult     Proportion aged >=20                                     0.09   0.71   0.79   0.87   1.00

Income
inchi     Proportion of households earning >$100k/yr               0.00   0.07   0.16   0.31   0.86
inch      Average household income, normed $100k                   0.08   0.30   0.35   0.42   1.29

Jobmarket Status
jself     Proportion self-employed                                 0.00   0.02   0.05   0.10   0.46
jhi       Proportion managers and workers on the highest level     0.00   0.05   0.10   0.16   0.62
jmed      Proportion being workers on average pay and lower        0.00   0.35   0.42   0.49   0.73
junempl   Proportion unemployed                                    0.00   0.00   0.01   0.02   0.46
jeduseek  Proportion currently seeking an education                0.00   0.17   0.24   0.33   0.99
jpen      Proportion outside the jobmarket                         0.00   0.04   0.08   0.13   0.57

Education
enone     Proportion with no formal education                      0.01   0.14   0.23   0.31   0.57
ecoll     Proportion with the equivalent of a college degree       0.00   0.01   0.04   0.09   0.64
ecomm     Proportion with a commercial background                  0.02   0.26   0.33   0.40   0.65
eshort    Proportion with a short to medium education              0.01   0.12   0.17   0.23   0.68
elong     Proportion with the equivalent of a university degree    0.00   0.01   0.04   0.08   0.60
eunkn     Proportion with an unknown educational background        0.01   0.07   0.10   0.14   0.76

Household Type
hs0       Proportion of singles with no children                   0.01   0.24   0.42   0.56   0.98
hs1       Proportion of singles with one child                     0.00   0.01   0.01   0.03   0.20
hs2p      Proportion of singles with two children or more          0.00   0.01   0.01   0.02   0.25
hc0       Proportion of couples with no children                   0.01   0.24   0.31   0.40   0.65
hc1       Proportion of couples with one child                     0.00   0.03   0.06   0.08   0.23
hc2p      Proportion of couples with two children or more          0.00   0.04   0.10   0.20   0.51
hother    Proportion of unclassified households                    0.00   0.01   0.01   0.02   0.16

Overview of sociodemographic variables. Min: Minimum value. Q1: 25th percentile. Q2: Median. Q3: 75th percentile. Max: Maximum value. Names in italics indicate exclusion from estimation due to perfect multicollinearity.

0.11

0.29

0.2

0.23

0.1

0.1

inch

jself

Jhi

-0.2

61

0.17

hother

0.3

0.42

0.1

hc0

0.54

0.36

hs2p

hc1

0.19

hc2p

0.15

-0.49

hs0

hs1

-0.1

0.23

0.54

0.32

0.09

-0.48

-0.13

0.05

-0.12

eunkn

0.01

elong

0.1

eshort

0.3

-0.39

-0.3

0.18

ecoll

0.12

-0.12

-0.1

-0.07

0.27

0.16

0.28

-0.82

0

ecomm

enone

-0.01

jeduseek

jpen

-0.12

junempl

jmed

-0.02

-0.83

adult

inchi

1

0.36

0.36

teen

teen

1

child

child

-0.24

-0.66

-0.44

-0.15

-0.41

-0.17

0.59

0.15

0.03

-0.07

-0.29

0.41

-0.07

0.08

0.19

0.12

-0.13

-0.05

-0.3

-0.22

-0.35

1

-0.82

-0.83

adult

0.28

0.29

0.07

0.69

0.58

0.65

-0.23

-0.36

-0.74

-0.35

0.48

0.6

0.34

-0.37

-0.58

-0.55

-0.37

-0.3

0.07

0.66

0.53

0.87

1

-0.35

inchi

0.66

0.44

1

0.87

-0.22

0.16

0.2

0.06

0.46

0.43

0.46

-0.22

-0.28

-0.51

-0.28

0.58

0.54

0.17

-0.25

-0.57

-0.44

-0.32

-0.3

-0.02

inch

-0.41

-0.31

-0.24

-0.01

0.24

1

0.44

0.53

-0.3

0.27

0.23

0.12

0.58

0.4

0.56

-0.21

-0.34

-0.61

-0.24

0.16

0.3

0.3

-0.33

-0.22

jself

jhi

-0.04

0.31

0.4

0.43

-0.26

-0.27

-0.39

-0.31

0.8

0.7

-0.14

0.08

-0.74

-0.3

-0.5

-0.08

-0.1

1

0.24

0.66

0.66

-0.05

-0.02

0.1 0.11

0.1

0.03

0.19

0.23

0.26

0.01

0

-0.29

-0.51

-0.27

0.14

0.49

0.06

-0.08

-0.13

-0.62

-0.03

1

-0.1

-0.01

-0.02

0.07

-0.13

jmed

-0.02

-0.26

-0.21

-0.26

0.08

0.08

0.3

0.01

-0.01

-0.08

-0.35

0.57

0.01

0.4

-0.21

1

-0.03

-0.08

-0.24

-0.3

-0.3

0.12

-0.07

-0.12

junempl

-0.06

-0.32

-0.41

-0.39

0.04

0.08

0.44

0.58

-0.32

-0.5

-0.11

-0.29

0.47

-0.09

1

-0.21

-0.62

-0.5

-0.31

-0.32

-0.37

0.19

-0.1

-0.2

jeduseek

-0.3

-0.41

-0.44

-0.55

0.08

-0.12

-0.01

0.04

-0.41

-0.3

-0.56

0.38

0.43

0.46

0.19

-0.09

-0.34

-0.47

0.45

0.31

1

-0.09

0.4

-0.13

jpen

0

0.07

-0.23

-0.33

-0.44

0.33

0.33

0.32

0.27

-0.67

-0.8

-0.03

-0.23

1

0.31

0.47

0.01

-0.08

-0.74

-0.22

-0.57

-0.58

-0.07

0.12

enone

-0.3

-0.16

-0.52

-0.29

-0.28

-0.08

0.05

0.49

-0.09

0.16

0.06

-0.55

1

-0.23

0.45

-0.29

0.57

0.06

0.08

-0.33

-0.25

-0.37

0.41

-0.39

ecoll

0.08

0.46

0.32

0.47

-0.06

-0.15

-0.53

-0.46

-0.39

0.04

1

-0.55

-0.03

-0.47

-0.11

-0.35

0.49

-0.14

0.3

0.17

0.34

-0.29

0.3

0.18

ecomm

0.1

-0.04

0.34

0.39

0.5

-0.27

-0.28

-0.44

-0.49

0.53

1

0.04

0.06

-0.8

-0.34

-0.5

-0.08

0.14

0.7

0.3

0.54

0.6

-0.07

0.01

eshort

-0.02

0.13

0.23

0.2

-0.17

-0.16

-0.17

-0.16

1

0.53

-0.39

0.16

-0.67

-0.09

-0.32

-0.01

-0.27

0.8

0.16

0.58

0.48

0.03

-0.1

0.05

elong

-0.01

-0.27

-0.3

-0.46

0.13

0.11

0.42

1

-0.16

-0.49

-0.46

-0.09

0.27

0.19

0.58

0.01

-0.51

-0.31

-0.24

-0.28

-0.35

0.15

-0.13

-0.12

eunkn

hs0

-0.22

-0.87

-0.7

-0.77

-0.06

0.15

1

0.42

-0.17

-0.44

-0.53

0.49

0.32

0.46

0.44

0.3

-0.29

-0.39

-0.61

-0.51

-0.74

0.59

-0.48

-0.49

hs1

0.14

-0.21

-0.1

-0.45

0.55

1

0.15

0.11

-0.16

-0.28

-0.15

0.05

0.33

0.43

0.08

0.08

0

-0.27

-0.34

-0.28

-0.36

-0.17

0.09

0.19 0.32

0.36

0.2

0.07

0.02

-0.36

1

0.55

-0.06

0.13

-0.17

-0.27

-0.06

-0.08

0.33

0.38

0.04

0.08

0.01

-0.26

-0.21

-0.22

-0.23

-0.41

hs2p

hc0

-0.02

0.49

0.41

1

-0.36

-0.45

-0.77

-0.46

0.2

0.5

0.47

-0.28

-0.44

-0.56

-0.39

-0.26

0.26

0.43

0.56

0.46

0.65

-0.15

0.15

0.1

hc1

0.13

0.58

1

0.41

0.02

-0.1

-0.7

-0.3

0.23

0.39

0.32

-0.29

-0.33

-0.3

-0.41

-0.21

0.23

0.4

0.4

0.43

0.58

-0.44

0.3

0.42

0.54

0.54

0.16

1

0.58

0.49

0.07

-0.21

-0.87

-0.27

0.13

0.34

0.46

-0.52

-0.23

-0.41

-0.32

-0.26

0.19

0.31

0.58

0.46

0.69

-0.66

hc2p

1

0.16

0.13

-0.02

0.2

0.14

-0.22

-0.01

-0.02

-0.04

0.08

-0.16

0.07

0.04

-0.06

-0.02

0.03

-0.04

0.12

0.06

0.07

-0.24

0.23

0.17

hother

A.2 Correlation Table

A.3

Histograms

Histograms for sociodemographic variables. A vertical line marks the median.


A.4

Empirical Exit Probabilities

Empirical exit probability from a given state as a function of duration of stay. X-axis: Time in months. Grey lines: 95% confidence bands.


B

Model Statistics

B.1

Parameter Estimates

Parameter estimates are presented below. Table headings should be interpreted as: Avg: average; SD: standard deviation of parameter; Sig: significance, i.e. a test for zero being outside the confidence interval ("**": yes, outside [2.5%, 97.5%]; "*": outside [5%, 95%]; "-": inside, not significant); Qx: x-quantile. The parameter names in the title column are to be interpreted in a special way. \(\sigma_{sd}\) is represented by "sigma[s][d]", where s and d signify source and destination state; for example, sigma10 in the first table represents \(\sigma_{10}\) in the continuation component model. The first element of the coefficient vector for transition s to d, a constant, is named "constant[s][d]". Coefficients for time-dummy variables (eq. 14) for transition s to d are named "s[t]_[s][d]". Coefficients for sociodemographic variables are named as in appendix A.1, with a postfix signifying the source state s and destination state d: "[name][s][d]".

Continuation Component (0, churn),(1, stay)
Title        Avg        SD       Sig   Q0.5       Q0.025     Q0.05      Q0.95      Q0.975
sigma10      0.4985     0.0780   **    0.4898     0.3854     0.3968     0.6542     0.6898
constant10   -3.8968    0.2577   **    -3.8625    -4.4438    -4.3596    -3.5131    -3.4308
s12_10       0.3345     0.0817   **    0.3338     0.1872     0.2091     0.4720     0.4931
vlo10        1.4436     0.0898   **    1.4417     1.2771     1.2956     1.5915     1.6226
vhi10        0.6948     0.0874   **    0.6954     0.5232     0.5449     0.8370     0.8584
internet10   -0.2849    0.0940   **    -0.2842    -0.4647    -0.4389    -0.1429    -0.1074
Teen10       1.2677     0.5676   **    1.2333     0.0934     0.357      2.2022     2.4470
jmed10       0.8048     0.3645   **    0.8014     0.1629     0.237      1.4075     1.5219
ecomm10      -1.5456    0.5063   **    -1.5595    -2.6212    -2.4313    -0.7822    -0.5926
eshort10     -2.0976    0.5704   **    -2.0853    -3.1628    -3.0467    -1.2135    -1.0323
eunkn10      -1.5766    0.6570   **    -1.6201    -2.7138    -2.5856    -0.5049    -0.2644
inch10       -0.9122    0.4459   **    -0.9275    -1.7056    -1.5824    -0.1318    -0.0389

Product Choice Component (1,PSTN),(2,ISDN),(3,ADSL)
Title        Avg        SD       Sig   Q0.5       Q0.025     Q0.05      Q0.95      Q0.975
sigma12      3.8453     0.4748   **    3.7995     3.0861     3.1515     4.62       4.7297
constant12   -14.7535   1.5472   **    -14.8811   -17.4563   -17.2028   -12.0182   -11.8988
teen12       6.7159     3.7733   *     6.5287     -0.1194    0.8553     13.1094    13.729
inch12       6.0041     1.8473   **    6.1692     2.2538     2.8238     8.7652     9.0687
sigma13      1.9199     0.178    **    1.9016     1.5936     1.6592     2.2424     2.2962
constant13   -7.4215    0.37     **    -7.3776    -8.1827    -8.059     -6.926     -6.8228
vhi13        0.628      0.152    **    0.6204     0.3415     0.3828     0.8688     0.9131
inettch13    0.9399     0.3792   **    0.957      0.1868     0.3062     1.5663     1.6448
inch13       2.1206     0.7945   **    2.2071     0.4225     0.8043     3.2755     3.4872
sigma21      1.1396     0.2467   **    1.1165     0.7664     0.7869     1.5852     1.6735
constant21   -5.6423    0.2764   **    -5.6335    -6.2283    -6.1138    -5.2292    -5.1775
sigma23      1.915      0.2735   **    1.9186     1.4258     1.4725     2.3588     2.4193
constant23   -5.5955    0.397    **    -5.5947    -6.3822    -6.2781    -4.9223    -4.8316
child23      6.3864     2.1342   **    6.3848     2.4604     3.1741     10.1244    10.8509
sigma31      1.1212     0.2905   **    1.1414     0.516      0.6136     1.5706     1.7193
constant31   -3.5775    0.561    **    -3.5224    -4.7046    -4.5593    -2.71      -2.5835
vlo31        -0.931     0.4582   **    -0.9063    -1.904     -1.7511    -0.2687    -0.152
inch31       -4.7501    1.4793   **    -4.8405    -7.2523    -7.0198    -2.2055    -1.8201

Consumption Intensity Component (1,low),(2,medium),(3,high)
Title        Avg        SD       Sig   Q0.5       Q0.025     Q0.05      Q0.95      Q0.975
sigma12      0.7328     0.084    **    0.7307     0.5718     0.5924     0.8744     0.9075
constant12   -2.2348    0.1106   **    -2.2315    -2.4565    -2.4222    -2.0559    -2.026
s1_12        0.6454     0.0904   **    0.6421     0.4655     0.4931     0.7922     0.8119
s2_12        0.3223     0.13     **    0.3118     0.0964     0.1254     0.5666     0.5918
s3_12        0.5216     0.1126   **    0.5209     0.3062     0.3428     0.7153     0.749
adsl12       -0.4471    0.1001   **    -0.4442    -0.6561    -0.6233    -0.2758    -0.2555
teen12       -2.4502    0.6524   **    -2.4517    -3.7968    -3.4847    -1.3905    -1.207
sigma13      1.5936     0.2266   **    1.589      1.1983     1.2336     2.0352     2.1623
constant13   -5.3318    0.4549   **    -5.3073    -6.4524    -6.1831    -4.6774    -4.6178
s1_13        1.2705     0.2373   **    1.278      0.7818     0.8538     1.649      1.6967
s2_13        0.9129     0.568    -     0.8612     -0.1874    -0.0046    1.9328     2.0486
s3_13        -0.9464    0.519    **    -0.91      -2.0462    -1.8466    -0.1491    -0.0484
s5_13        1.0918     0.4617   **    1.0596     0.3226     0.403      1.9252     2.1757
isdn13       -0.5515    0.3334   *     -0.5501    -1.2384    -1.1107    -0.0211    0.0843
adsl13       -1.6051    0.3482   **    -1.5997    -2.3487    -2.2089    -1.0631    -0.9724
sigma21      1.7985     0.0848   **    1.8014     1.6411     1.6658     1.939      1.9666
constant21   -3.9129    0.2094   **    -3.9045    -4.3519    -4.3077    -3.5695    -3.5299
s1_21        0.3508     0.0686   **    0.3484     0.2143     0.24       0.464      0.4866
s5_21        0.3033     0.0952   **    0.305      0.1094     0.1374     0.4553     0.4819
isdn21       0.4407     0.1576   **    0.4412     0.122      0.1744     0.7063     0.7627
adsl21       0.5046     0.121    **    0.5062     0.2704     0.3022     0.6997     0.7393
inettch21    -1.0653    0.5716   **    -1.038     -2.245     -2.0726    -0.2515    -0.0579
jmed21       -1.5004    0.4308   **    -1.5224    -2.2811    -2.1707    -0.7062    -0.5881
hs2p21       4.1931     1.8749   **    4.1284     0.6622     1.2982     7.3462     7.7055
sigma23      1.2674     0.0485   **    1.2694     1.1655     1.1826     1.3486     1.3584
constant23   -4.2892    0.1457   **    -4.297     -4.5216    -4.5009    -4.0504    -4.0022
s1_23        0.304      0.057    **    0.3073     0.189      0.2024     0.3932     0.4084
s2_23        0.1447     0.0724   **    0.1376     0.0148     0.0318     0.2653     0.3059
s3_23        0.1728     0.0706   **    0.1781     0.018      0.0488     0.2868     0.2962
s7_23        0.4411     0.0768   **    0.4486     0.3037     0.3147     0.5665     0.6011
isdn23       0.518      0.0992   **    0.5212     0.3301     0.3548     0.6719     0.7144
teen23       2.2014     0.5052   **    2.1994     1.1854     1.3822     3.0568     3.1808
pen23        2.5148     0.5216   **    2.5166     1.4741     1.6618     3.4371     3.5567
inch23       1.0736     0.3007   **    1.1047     0.4781     0.5509     1.5104     1.5548
sigma31      1.3713     0.3232   **    1.319      0.8367     0.885      2.0209     2.1107
constant31   -8.4085    0.542    **    -8.3677    -9.6074    -9.3699    -7.566     -7.4272
s1_31        0.1436     0.1659   -     0.1488     -0.2073    -0.1274    0.4047     0.4697
s5_31        0.4666     0.261    *     0.4595     -0.0668    0.0382     0.8836     0.9601
adsl31       0.4134     0.2208   *     0.4155     -0.0093    0.0608     0.7877     0.8566
jmed31       3.7239     0.8604   **    3.6425     2.1574     2.3982     5.3874     5.5916
edu_seek31   3.2287     0.7342   **    3.1677     1.889      2.1388     4.5172     4.7266
pen31        7.5359     1.2379   **    7.4957     5.1729     5.5743     9.6053     10.0598
sigma32      0.7255     0.0504   **    0.7224     0.6369     0.6516     0.8158     0.8349
constant32   -2.0982    0.1212   **    -2.0969    -2.3274    -2.3002    -1.9083    -1.8893
s1_32        0.3433     0.0484   **    0.345      0.2527     0.2652     0.42       0.4378
s3_32        0.3038     0.067    **    0.303      0.1627     0.1933     0.4133     0.4305
s7_32        0.6097     0.1069   **    0.6114     0.4051     0.4297     0.7916     0.8169
Isdn32       -0.469     0.0844   **    -0.4691    -0.6334    -0.6097    -0.3329    -0.3058
teen32       -1.4361    0.4694   **    -1.4558    -2.3157    -2.178     -0.674     -0.5654
i10032       -0.2773    0.1698   *     -0.2777    -0.602     -0.5644    -0.013     0.037
hc132        2.0332     0.839    **    2.0345     0.4256     0.6922     3.3167     3.7523
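The Sig column follows mechanically from the four tail quantiles according to the rule stated at the start of this appendix. A minimal Python sketch of that rule follows; the function name is illustrative, and the treatment of a quantile exactly equal to zero is an assumption, since the text does not specify it.

```python
def significance_flag(q025, q05, q95, q975):
    """Return '**' if zero lies outside [Q0.025, Q0.975], '*' if it lies
    outside [Q0.05, Q0.95] only, and '-' otherwise."""
    if q025 > 0 or q975 < 0:
        return "**"
    if q05 > 0 or q95 < 0:
        return "*"
    return "-"


if __name__ == "__main__":
    # Quantiles copied from the tables above.
    print(significance_flag(0.3854, 0.3968, 0.6542, 0.6898))     # sigma10, continuation
    print(significance_flag(-0.1194, 0.8553, 13.1094, 13.729))   # teen12, product choice
    print(significance_flag(-0.1874, -0.0046, 1.9328, 2.0486))   # s2_13, consumption
```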


B.2

Empirical Distribution for Individual Level Coefficients

parameter population distribution. Black line: Kernel estimate of empirical density. Grey line: Normal density. “p(SW)”: p-value of Shapiro and Wilk’s test for normality.


B.3

Churn Sensitivity

Percentage probability of churn in twelve months conditional on sociodemographic variables. Black line: Average risk. Dashed lines: 95% confidence band. Grey vertical lines: First quantile, median and third quantile of sociodemographic variable.

Churn sensitivity. Predicted 12 months ahead churn percentage sensitivity to sociodemographic variables. "m-q1": change in average churn from first quantile to median of variable. "q3-m": change in average churn from median to third quantile of variable. Point: Average change. Grey line surrounding point: 95% confidence band.

B.4

PSTN to ADSL Transition Sensitivity

Transitions from PSTN to ADSL. Sensitivity of 12 months-ahead probability of transition to sociodemographic variables. "m-q1": change in average churn from first quantile to median of variable. "q3-m": change in average churn from median to third quantile of variable. Point: Average change. Grey line surrounding point: 95% confidence band.

B.5

5-year Conditional CLV Prediction

Estimated 5 years ahead average present value of customers, discounted at 10% per year - conditional on value of sociodemographic variable.


References

Albert, J. & Chib, S. (1996), ‘Computation in Bayesian econometrics: An introduction to Markov chain Monte Carlo’, Advances in Econometrics 11, 3–24. Part A.

Allenby, G. M. & Rossi, P. E. (1999), ‘Marketing models of consumer heterogeneity’, Journal of Econometrics 89(1-2), 57–78.

Chen, M.-H., Shao, Q.-M. & Ibrahim, J. G. (2000), Monte Carlo Methods in Bayesian Computation (Springer Series in Statistics), Springer. URL: http://www.amazon.fr/exec/obidos/ASIN/0387989358/citeulike04-21

Chib, S. & Greenberg, E. (1995), ‘Understanding the Metropolis-Hastings algorithm’, The American Statistician 49(4), 327–335.

Chib, S., Seetharaman, P. B. & Strijnev, A. (2002), Analysis of multi-category purchase incidence decisions using IRI market basket data, in ‘Econometric Models in Marketing’, Vol. 16 of Advances in Econometrics: A Research Annual, pp. 57–92.

Donkers, B., Verhoef, P. C. & Jong, M. d. (2003), ‘Predicting customer lifetime value in multi-service industries’.

Heckman, J. J. (1981), Heterogeneity and state dependence, in S. Rosen, ed., ‘Studies in Labor Markets’, University of Chicago Press, pp. 91–139.

Heckman, J. & Singer, B. (1984), ‘A method for minimizing the impact of distributional assumptions in econometric models for duration data’, Econometrica 52(2), 271–320.

Hsiao, C. (2003), Analysis of Panel Data, Econometric Society Monographs, Cambridge University Press.

Jarner, S. F. & Tweedie, R. L. (2003), ‘Necessary conditions for geometric and polynomial ergodicity of random walk-type Markov chains’, Bernoulli 9(4), 559–578. URL: http://www.projecteuclid.org/Dienst/UI/1.0/Summarize/euclid.bj/1066223269

Keane, M. P. (1997), ‘Modeling heterogeneity and state dependence in consumer choice behavior’, Journal of Business and Economic Statistics 15(3), 310–327.

Kumar, V., Lemon, K. N. & Parasuramen, A. (2006), ‘Managing customers for value - an overview and research agenda’, Journal of Service Research 9(2), 87–94.

Lancaster, T. (1990), The Econometric Analysis of Transition Data, Cambridge University Press.

Manchanda, P., Ansari, A. & Gupta, S. (1999), ‘The "shopping basket": A model for multicategory purchase incidence decisions’, Marketing Science 18(2), 95–114.

Miravete, E. J. & Palacios-Huerta, I. (2002), ‘Learning temporal preferences’, CEPR Discussion Papers.

Reinartz, W., Thomas, J. S. & Kumar, V. (2003), ‘Allocating acquisition and retention resources to maximize customer profitability’.

Robert, C. P. & Casella, G. (1999), Monte Carlo Statistical Methods, Springer Texts in Statistics, third edn, Springer.

Roberts, G. O., Gelman, A. & Gilks, W. R. (1997), ‘Weak convergence and optimal scaling of random walk Metropolis algorithms’, Annals of Applied Probability 7(1), 110–120.

Rossi, P. E. & Allenby, G. M. (2000), ‘Statistics and marketing’, Journal of the American Statistical Association 95(450), 635–638.

Schmittlein, D. C. & Peterson, R. A. (1994), ‘Customer base analysis - an industrial purchase process application’, Marketing Science 13(1), 41–67.

The 2002-2004 Research Priorities (2002), Technical report, Marketing Science Institute.

The 2004-2006 Research Priorities (2004), Technical report, Marketing Science Institute.

Train, K. E. (2003), Discrete Choice Models with Simulation, Cambridge University Press.
