IJRIT International Journal of Research in Information Technology, Volume 1, Issue 12, December, 2013, Pg. 177-185

International Journal of Research in Information Technology (IJRIT)

www.ijrit.com

ISSN 2001-5569

Customer Churn Prediction Model using Machine Learning Algorithms

Innocent Mamvura
University of Witwatersrand, Department of Computer Science, Johannesburg, South Africa
[email protected]

Abstract

The paper presents a customer churn study in the telecommunications industry. Churn analysis is used to predict the behaviour of customers who are most likely to take their business to a competitor. The aim of this paper was to determine the extent of customer churn in the telecommunications industry, to identify its causes and effects, and to develop a predictive model for churn using machine learning algorithms. The results showed that the J48 decision tree classifier had the highest prediction accuracy (87.4%), followed by the Radial Basis Function Network (86.2%) and the Naïve Bayes Network (85.8%). Customer Service Calls is the most important predictor of customer churn, followed by International Plan, International Calls and Night Calls. A customer churn prediction model can help managers predict customer behaviour by analysing historical and current data, extracting key data for decision making, and discovering hidden relationships and patterns.

Keywords: Radial Basis Function Network, J48, Naïve Bayes Network

1. Introduction

Customer churn has become a significant problem for organizations in the financial sector, insurance, health care, banking, and the telecommunications industry. It is less expensive to retain a current customer than to find a new one [12]. Acquiring new customers can cost up to five times more than satisfying and retaining existing customers. The aim of this paper was to determine the extent of customer churn in the telecommunications industry, to identify its causes and effects, and to develop a predictive model for churn using machine learning algorithms. Telecommunication firms spend a high percentage of their operating costs on trying to win back customers who have left. Given the highly competitive nature of the telecommunication industry, organizations should be more proactive than reactive when it comes to customer satisfaction. This paper proposes approaches based on the Naïve Bayes Network, J48 and the Radial Basis Function Network to predict customer churn in the telecommunications industry.

The rest of the paper is organized as follows. Section 2 reviews the literature related to customer churn and the data mining techniques used to predict churn in different studies. Section 3 describes the research methodology, and Section 4 presents the experimental results. Finally, the conclusion is provided in Section 5.

2. Related Work

[8] performed experiments using the Naïve Bayes and Bayesian Network algorithms and compared the results with the C4.5 decision tree classifier. They presented a new subset of features categorised as contract-related features, call pattern description features and call pattern change description features. Experimental results show improved prediction rates for all the models used.

[9] combined four different data mining techniques for churn prediction: backpropagation artificial neural networks (ANN), self-organizing maps (SOM), alpha-cut fuzzy c-means (α-FCM), and the Cox proportional hazards regression model. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Type I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model performs significantly better than the two other hierarchical models.

[14] applied a fuzzy multi-criteria classification approach, based on multiple criterion parameters, to identify customer churn. [10] provided a descriptive analysis of how methodological factors contribute to the accuracy of customer churn predictive models. The survey indicated that logistic regression (45%) and decision trees (23%) were the most common estimation techniques, but neural nets (11%), discriminant analysis (9%), cluster analysis (7%), and Bayes (5%) were used as well.

[13] proposed a framework, called Group-First Churn Prediction, which eliminates the a priori requirement of knowing who recently churned. The method looks at customer interactions to predict which groups of subscribers are most prone to churn, before even a single member of the group has churned.

[1] investigated the determinants of customer churn in the Korean mobile telecommunications service market. Results indicate that call quality-related factors influence customer churn; however, customers participating in membership card programs are also more likely to churn. A multinomial logistic regression model on the independent variables was used to analyse the mediation effects of customer status. The logistic results showed that call drop rates and the number of complaints have a significant impact on the probability of churn. Loyalty points have a significant negative impact on the probability of churn: the more loyalty points customers have accumulated, the less likely they are to churn.

[2] proposed a neural network (NN) based approach to predict customer churn in subscriptions to cellular wireless services. The results of the experiments indicate that the neural network based approach can predict customer churn with an accuracy of more than 92%. [6] used logistic regression and tree analysis to predict customer churn. The analysis shows that subscribers who do not have a discounted package have a very high tendency to churn.

3. Materials and Methods

3.1 Datasets

The paper used the churn dataset from the UCI machine learning repository [5]. The churn dataset concerns the customers of a cellular service provider and the data pertinent to the voice calls they make. Customers have a choice of service providers, or companies providing them with cellular network services. The dependent variable, churn, had two values: churners (0) and non-churners (1). The following independent variables were used: Customer Service Calls, International Plan, International Calls, Voice Mail Plan, Day Calls, Night Calls and Eve Calls.

3.2 Methods

3.2.1 J48 Algorithm

The J48 decision tree classifier [7] creates a decision tree based on the attribute values of the available training data. It builds the tree from a labeled training data set using information gain: for each candidate attribute, it examines the information gain that results from splitting the data on that attribute, and the attribute with the highest normalized information gain is used to make the decision. The algorithm then recurs on the smaller subsets. The splitting procedure stops if all instances in a subset belong to the same class, at which point a leaf node is created in the decision tree telling it to choose that class [3].

3.2.2 Naïve Bayes Network

The Naïve Bayes classifier [11] is based on the Bayes rule of conditional probability. It makes use of all the attributes contained in the data, and analyses them individually as though they are equally important and independent of each other.

3.2.3 RBF Network

The radial basis function network uses the k-means clustering algorithm to provide its basis functions, fitting multivariate Gaussian functions to the clusters, and learns a logistic regression or linear regression model on top of them. The network standardizes all numeric attributes to zero mean and unit variance. It consists of an input layer, a hidden layer and an output layer.

3.2.4 Churn Prediction Model

The dataset was imported into a data mining tool known as the Konstanz Information Miner (KNIME) using the File Reader node, which reads data from an ASCII file or URL location and can be configured to read various formats. The holdout method was used to split the data into a training set and a test set. The extracted knowledge was tested against the test set to check the accuracy of the model. Classification algorithms such as the J48 decision tree miner, BayesNet and RBF Network were joined to the Partitioning node, and the Weka Predictor was used to generate predictions from the trained models.
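The splitting criterion described in Section 3.2.1 can be illustrated with a minimal sketch. It assumes categorical attributes only and uses hypothetical toy records; the function names are illustrative, and the actual J48 implementation additionally handles numeric attributes, missing values and pruning.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting on an attribute, normalized by split information."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    info_gain = entropy(labels) - remainder
    # Entropy of the attribute's own value distribution
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy records: international plan flag and class label
plan = ["yes", "yes", "no", "no", "no", "no"]
churn = ["churn", "churn", "stay", "stay", "stay", "churn"]
print(round(gain_ratio(plan, churn), 3))
```

A tree learner of this kind would evaluate such a score for every candidate attribute and split on the one with the highest value.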

Fig 1: The Churning Prediction Model
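The holdout step of the workflow can be sketched in a few lines. This is an illustration only: the paper does not state the split ratio, so the 70/30 split, the random seed, the toy records and the majority-class stand-in model are all assumptions.

```python
import random
from collections import Counter

def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows and split them into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predicted, actual):
    """Fraction of test instances whose predicted class matches the true class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical records: (customer service calls, class label)
rows = [(calls, "churn" if calls >= 7 else "stay") for calls in range(10)]
train, test = holdout_split(rows)

# Stand-in model: always predict the majority class of the training set
majority = Counter(label for _, label in train).most_common(1)[0][0]
predictions = [majority for _ in test]
print(len(train), len(test), accuracy(predictions, [label for _, label in test]))
```

The key property of the holdout method is that the test instances play no part in fitting the model, so the measured accuracy reflects performance on unseen data.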

4. Experimental Results

This section describes the experimental results obtained with the J48, RBF Network and BayesNet classifiers.

J48

The first split is on the Customer Service Calls attribute, and then, at the second level, the splits are on International Plan and International Calls, respectively. In the tree structure, a colon introduces the class label that has been assigned to a particular leaf, followed by the number of instances that reach that leaf, expressed as a decimal number because of the way the algorithm uses fractional instances to handle missing values. The pruned tree shows that customers who make the most customer service calls have a greater chance of churning. Customers making the most international calls also tend to be churners.

Figure 2: J48 Pruned Tree

Table 1.0: J48 Evaluation

The correctly and incorrectly classified instances show the percentage of test instances that were correctly and incorrectly classified. The J48 decision tree classifier has a prediction accuracy of 87.4% (correctly classified instances). Kappa is a chance-corrected measure of agreement between the classifications and the true classes. It is calculated by subtracting the agreement expected by chance from the observed agreement and dividing by the maximum possible agreement beyond chance (one minus the expected agreement). A value greater than 0 means that the classifier predicts better than chance. The J48 decision tree classifier has a Kappa statistic of 0.317.

RBF Network

As described in Section 3.2.3, the radial basis function network uses the k-means clustering algorithm to provide Gaussian basis functions and learns a logistic regression or linear regression model on top of them. The network standardizes all numeric attributes to zero mean and unit variance and consists of an input layer, a hidden layer and an output layer.
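Two ingredients of the RBF network description, standardization and Gaussian hidden units, can be sketched as follows. This is a simplified illustration under stated assumptions: the attribute values are hypothetical, the centers are picked by hand rather than by k-means, and the regression output layer is omitted.

```python
import math
import statistics

def standardize(column):
    """Rescale a numeric column to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(x - mean) / std if std > 0 else 0.0 for x in column]

def gaussian_rbf(point, center, width=1.0):
    """Activation of one hidden unit: a Gaussian bump around its center."""
    sq_dist = sum((p - c) ** 2 for p, c in zip(point, center))
    return math.exp(-sq_dist / (2 * width ** 2))

# Hypothetical values for two numeric attributes
service_calls = [0, 1, 1, 2, 5, 6]
night_calls = [90, 100, 110, 95, 120, 130]
points = list(zip(standardize(service_calls), standardize(night_calls)))

# Hypothetical centers; in the described method they would come from k-means
centers = [points[0], points[-1]]

# Hidden-layer output: one activation per (instance, center) pair
hidden = [[gaussian_rbf(p, c) for c in centers] for p in points]
for row in hidden:
    print([round(a, 3) for a in row])
```

Each instance is thereby re-described by its closeness to the centers, and the output layer fits a regression model on these activations.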


Table 1.1: RBF Evaluation

The Radial Basis Network classifier has a prediction accuracy of 86.2% (correctly classified instances) and a Kappa statistic of 0.2446 meaning that this classifier predicts better than chance.
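The Kappa statistic reported for each classifier can be computed from a confusion matrix. The sketch below uses a hypothetical 2x2 matrix, since the paper's confusion matrices are not reproduced in the text; the function name is illustrative.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: actual, columns: predicted)."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of instances on the diagonal
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the row and column totals
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical confusion matrix, not taken from the paper's results
matrix = [[20, 5],
          [10, 15]]
print(round(cohen_kappa(matrix), 3))
```

A value of 0 corresponds to chance-level agreement and 1 to perfect agreement, which is why values such as those reported here indicate performance better than chance.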

BayesNet

The Naïve Bayes classifier [11] is based on the Bayes rule of conditional probability. It makes use of all the attributes contained in the data, and analyses them individually as though they are equally important and independent of each other.
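The independence assumption described above means that a class score is simply the class prior multiplied by one conditional probability per attribute. A minimal sketch, assuming categorical attributes with two possible values each, add-one smoothing, and hypothetical toy records (none of which come from the paper):

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies within each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict(model, row, n_values=2):
    """Return the class maximizing P(class) * product of P(value | class)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # class prior
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values
            score *= (value_counts[(i, label)][value] + 1) / (n_label + n_values)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical records: (international plan, customer service call level)
rows = [("yes", "high"), ("yes", "high"), ("no", "high"),
        ("no", "low"), ("no", "low"), ("yes", "low")]
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("yes", "high")), predict(model, ("no", "low")))
```

Because each attribute contributes its own factor, no interaction between attributes is modelled, which is the simplification the classifier's name refers to.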


Fig 4: BayesNetwork Tree

Table 1.2: BayesNetwork Evaluation

The BayesNetwork classifier has a prediction accuracy of 85.8% (correctly classified instances) and a Kappa statistic of 0.2816, meaning that this classifier predicts better than chance.

5. Discussions and Conclusion

This study investigated predictors of churning using machine learning algorithms. Based on the predictive models obtained, the J48 decision tree classifier had the highest accuracy (87.4%) and the Naïve Bayes Network had the lowest (85.8%). The Radial Basis Function Network yields an accuracy of around 86.2%. Out of a total of 2334 instances, the number correctly classified ranged from 2003 instances (the lowest score) to 2039 instances (the highest score). Customer Service Calls is the most important field within the current network, followed by International Plan, International Calls and Night Calls. We can therefore conclude that customers who frequently call customer service numbers, have an international calling plan and make the most night calls may become potential churners.

Operators need to invest more in building optimal churn-prediction models centered on customer experience. With these models, operators can make better customer churn predictions, implement more accurate customer retention programs, and ensure revenue is not lost.

Future research will consider hybrid data mining approaches that combine clustering and classification techniques in order to improve the performance of the algorithms. Hybrid approaches typically combine two learning stages, in which the first preprocesses the data and the second produces the final prediction output [4]. Other prediction techniques, such as support vector machines and genetic algorithms, can also be applied. The current churn prediction methodology can be tested in other sectors such as banking, insurance or health care, and the prediction accuracies can be compared.
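The instance counts quoted in the discussion can be cross-checked against the reported accuracies. The sketch below assumes that 2039 and 2003 are the correctly classified counts of the best and weakest classifiers out of 2334 instances; the function name is illustrative.

```python
def accuracy_percent(correct, total):
    """Share of correctly classified instances, as a percentage."""
    return 100 * correct / total

TOTAL_INSTANCES = 2334
for name, correct in [("highest score", 2039), ("lowest score", 2003)]:
    print(f"{name}: {correct}/{TOTAL_INSTANCES} = "
          f"{accuracy_percent(correct, TOTAL_INSTANCES):.1f}%")
# prints:
# highest score: 2039/2334 = 87.4%
# lowest score: 2003/2334 = 85.8%
```

Under that assumption, the counts agree with the 87.4% and 85.8% accuracies reported for J48 and the Naïve Bayes Network.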

6. Bibliography

[1] Ahn, Jae-Hyeon, Sang-Pil Han, and Yung-Seop Lee. "Customer churn analysis: Churn determinants and mediation effects of partial defection in the Korean mobile telecommunications service industry." Telecommunications Policy 30.10 (2006): 552-568.
[2] Anuj Sharma and Dr. Prabin Kumar Panigrahi. "A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services." International Journal of Computer Applications 27(11):26-31, August 2011. Published by Foundation of Computer Science, New York, USA.
[3] Aruna, S., Dr SP Rajagopalan, and L. V. Nandakishore. "Knowledge based analysis of various statistical tools in detecting breast cancer." Computer Science & Information Technology (CS&IT) 2 (2011): 37-45.
[4] C. F. Tsai and Y. H. Lu, "Customer churn prediction by hybrid neural networks," Expert Systems with Applications, vol. 36, no. 10, pp. 12547-12553, 2009.
[5] C. L. Blake and C. J. Merz, Churn Data Set, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html. University of California, Department of Information and Computer Science, Irvine, CA, 1998.
[6] Gürsoy, Şimşek, and Umman Tuğba. "Customer churn analysis in telecommunication sector." Journal of the School of Business Administration, Istanbul University 39.1 (2010): 35-49.
[7] J. Han, M. Kamber and J. Pei, Data Mining Concepts and Techniques, Morgan Kaufmann Publishers, 2012.
[8] Kirui, Clement, et al. "Predicting Customer Churn in Mobile Telephony Industry Using Probabilistic Classifiers in Data Mining." (2013).
[9] Mohammadi, Golshan, Reza Tavakkoli-Moghaddam, and Mehrdad Mohammadi. "Hierarchical Neural Regression Models for Customer Churn Prediction." Journal of Engineering 2013 (2013).
[10] Neslin, Scott A., et al. "Defection detection: Measuring and understanding the predictive accuracy of customer churn models." Journal of Marketing Research (2006): 204-211.
[11] P. N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Pearson Education Asia Inc., 2006.
[12] R. Mattison, The Telco Churn Management Handbook, Oakwood Hills, Illinois: XiT Press, 2005.
[13] Richter, Yossi, Elad Yom-Tov, and Noam Slonim. "Predicting Customer Churn in Mobile Networks through Analysis of Social Groups." SDM. 2010.
[14] Subramaniam, Sakthikumar, Arunkumar Thangavelu, and Hemavathy Ramasubbian. "Fact-An Adaptive Customer Churn Rate Prediction Method Using Fuzzy Multi-Criteria Classification Approach For Decision Making." Asian Journal of Science and Technology 4.11 (2013): 227-233.
