IJRIT International Journal of Research in Information Technology, Volume 1, Issue 12, December, 2013, Pg. 177-185
International Journal of Research in Information Technology (IJRIT)
www.ijrit.com
ISSN 2001-5569
Customer Churn Prediction Model using Machine Learning Algorithms
Innocent Mamvura
University of Witwatersrand, Department of Computer Science, Johannesburg, South Africa
[email protected]
Abstract
The paper presents a customer churn study in the telecommunications industry. Churn analysis is used to predict the behaviour of customers who are most likely to take their business to a competitor. The aim of this paper was to determine the extent of customer churn in the telecommunications industry, to identify its causes and effects, and to develop a predictive model for churn using machine learning algorithms. The results showed that the J48 decision tree classifier had the highest prediction accuracy (87.4%), followed by the Radial Basis Function Network (86.2%) and the Naïve Bayes Network (85.8%). Customer Service Calls is the most important predictor of customer churn, followed by International Plan, International Calls and Night Calls. A customer churn prediction model can help managers predict customer behaviour by analysing historical and current data, extracting key data for decision making, and discovering hidden relationships and patterns.
Keywords: Radial Basis Function Network, J48, Naïve Bayes Network
1. Introduction
Customer churn has become a significant problem for organizations in the financial sector, insurance, health care, banking and the telecommunications industry. It is less expensive to retain a current customer than to find a new one [12]. Acquiring new customers can cost up to five times more than satisfying and retaining existing customers. The aim of this paper was to determine the extent of customer churn in the telecommunications industry, to identify its causes and effects, and to develop a predictive model for churn using machine learning algorithms. Telecommunication firms spend a high percentage of their operating costs trying to win back customers who have left. Given the highly competitive nature of the telecommunications industry, organizations should be more proactive than reactive when it comes to customer satisfaction. This paper proposes approaches based on the Naïve Bayes Network, J48 and the Radial Basis Function Network to predict customer churn in the telecommunications industry. The rest of the paper is organized as follows. Section 2 reviews the literature related to customer churn and the data mining techniques used to predict churn in different studies. Section 3 describes the research methodology, and Section 4 presents the experimental results. Finally, the conclusion is provided in Section 5.
2. Related Work
[8] performed experiments using the Naïve Bayes and Bayesian Network algorithms and compared the results with the C4.5 decision tree classifier. They presented a new subset of features categorised as contract-related features, call pattern description features and call pattern change description features. Experimental results show improved prediction rates for all the models used. [9] combined four different data mining techniques for churn prediction: backpropagation artificial neural networks (ANN), self-organizing maps (SOM), alpha-cut fuzzy c-means (α-FCM), and the Cox proportional hazards regression model. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Type I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model performs significantly better than the two other hierarchical models. [14] applied a fuzzy multi-criteria classification approach, based on multiple criterion parameters, to identify customer churn. [10] provided a descriptive analysis of how methodological factors contribute to the accuracy of customer churn predictive models. The survey indicated that logistic regression (45%) and decision trees (23%) were the most common estimation techniques, but neural nets (11%), discriminant analysis (9%), cluster analysis (7%), and Bayes (5%) were used as well. [13] proposed a framework, called Group-First Churn Prediction, which eliminates the a priori requirement of knowing who recently churned. The method looks at customer interactions to predict which groups of subscribers are most prone to churn, before even a single member of the group has churned. [1] investigated the determinants of customer churn in the Korean mobile telecommunications service market.
Results indicate that call quality-related factors influence customer churn; however, customers participating in membership card programs are also more likely to churn. A multinomial logistic regression model on the independent variables was used to analyse the mediation effects of customer status. The logistic results showed that call drop rates and the number of complaints have a significant impact on the probability of churn. Loyalty points have a significant negative impact on the probability of churn: the more loyalty points customers have accumulated, the less likely they are to churn. [2] proposed a neural network (NN) based approach to predict customer churn in subscriptions to cellular wireless services. The results of their experiments indicate that the neural network based approach can predict customer churn with an accuracy of more than 92%. [6] used logistic regression and tree analysis to predict customer churn. The analysis shows that subscribers who do not have a discounted package have a very high tendency to churn.
3. Materials and Methods
3.1 Datasets
The paper used the churn dataset from the UCI machine learning repository [5]. The dataset describes the customers of a cellular service provider and the data pertinent to the voice calls they make. Customers have a choice of service providers, that is, companies providing them with cellular network services. The dependent variable, churn, had two values: churners (0) and non-churners (1). The following independent variables were used: Customer Service Calls, International Calls, Voice Mail Plan, Day Calls, Night Calls and Eve Calls.
3.2 Methods
3.2.1 J48 Algorithm
The J48 decision tree classifier [7] creates a decision tree based on the attribute values of the available training data. It builds the tree from a labeled training data set using information gain: for each candidate attribute, it examines the information gain that results from splitting the data on that attribute. The attribute with the highest normalized information gain is used to make the split, and the algorithm then recurses on the smaller subsets. The splitting procedure stops when all instances in a subset belong to the same class; a leaf node is then created in the decision tree, telling it to choose that class [3].
3.2.2 Naïve Bayes Network
The Naïve Bayes classifier [11] is based on the Bayes rule of conditional probability. It makes use of all the attributes contained in the data and analyses them individually, as though they are equally important and independent of each other.
3.2.3 RBF Network
The radial basis function (RBF) network uses the k-means clustering algorithm to provide the basis functions, fits multivariate Gaussian functions to the data, and learns a logistic regression or linear regression model on top of them. The network has an input layer, a hidden layer and an output layer, and it standardizes all numeric attributes to zero mean and unit variance.
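The split criterion described in Section 3.2.1 can be illustrated with a short sketch. It is a minimal pure-Python version of normalized information gain (gain ratio) for categorical attributes only, not the Weka J48 implementation. The function names and the toy records are hypothetical and are not taken from the churn dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Normalized information gain of splitting `labels` on a categorical attribute."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Information gain = entropy before the split - weighted entropy after it.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (illustrative only).
intl_plan = ["yes", "yes", "no", "no", "no", "no"]
voice_mail = ["no", "yes", "no", "yes", "no", "yes"]
churned = ["churn", "churn", "stay", "stay", "stay", "churn"]

scores = {
    "intl_plan": gain_ratio(intl_plan, churned),
    "voice_mail": gain_ratio(voice_mail, churned),
}
best = max(scores, key=scores.get)  # attribute chosen for the split
print(best, scores)
```

A full tree learner would apply this selection recursively to each subset and would also handle numeric attributes, missing values and pruning, all of which this sketch omits.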
3.2.4 Churn Prediction Model
The dataset was imported into a data mining tool known as the Konstanz Information Miner (KNIME) using the File Reader node, which can read data from an ASCII file or a URL location and can be configured to read various formats. The holdout method was used to split the data into a training set and a test set. The knowledge extracted from the training set was tested against the test set to check the accuracy of the model. The J48 decision tree, BayesNet and RBF Network classification algorithms were connected to the Partitioning node, and the Weka Predictor node was used to generate predictions from each model.
Fig 1: The Churning Prediction Model
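The holdout procedure used in the workflow can be sketched as follows. This is a minimal illustration and not the KNIME Partitioning node itself. The paper does not state the split ratio, so the `test_fraction` default, the `seed` parameter and the helper names are assumptions made for the example.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=42):
    """Shuffle the records and split them into (train, test) lists.

    The 0.3 default is an assumption; the paper does not give the ratio.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted, actual):
    """Fraction of instances whose predicted label equals the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


train, test = holdout_split(range(10), test_fraction=0.3)
print(len(train), len(test))
```

A model is fitted on `train` only, and `accuracy` is then computed on the predictions for `test`, so the reported figure reflects instances the model has not seen.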
4. Experimental Results
This section describes the experimental results obtained with the J48, RBF Network and BayesNet classifiers.
J48
The first split is on the Customer Service Calls attribute, and then, at the second level, the splits are on International Plan and International Calls, respectively. In the tree structure, a colon introduces the class label that has been assigned to a particular leaf, followed by the number of instances that reach that leaf, expressed as a decimal number because of the way the algorithm uses fractional instances to handle missing values. The pruned tree shows that customers who make the most customer service calls have a greater chance of churning. Customers making the most international calls are also classified as churners.
Figure 2: J48 Pruned Tree
Table 1.0: J48 Evaluation
The correctly and incorrectly classified instances show the percentage of test instances that were correctly and incorrectly classified. The J48 decision tree classifier has a prediction accuracy of 87.4% (correctly classified instances). Kappa is a chance-corrected measure of agreement between the classifications and the true classes. It is calculated by subtracting the agreement expected by chance from the observed agreement and dividing by the maximum possible agreement beyond chance. A value greater than 0 means that the classifier predicts better than chance. The J48 decision tree classifier has a Kappa statistic of 0.317.
RBF Network
The radial basis function network uses the k-means clustering algorithm to provide the basis functions, fits multivariate Gaussian functions to the data, and learns a logistic regression or linear regression model on top of them. The network has an input layer, a hidden layer and an output layer, and it standardizes all numeric attributes to zero mean and unit variance.
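Two ingredients of the RBF network description can be sketched briefly: standardizing a numeric attribute to zero mean and unit variance, and the Gaussian activation of one hidden unit. This is an illustrative sketch, not the Weka RBF network implementation. The centre and `width` values are hypothetical placeholders; in the network they would come from k-means clustering.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Rescale a numeric column to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(x - mu) / sigma for x in column]


def gaussian_rbf(x, centre, width=1.0):
    """Activation of one hidden unit: exp(-||x - c||^2 / (2 * width^2))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * width ** 2))


# Hypothetical attribute values and centre (illustrative only).
calls = standardize([1, 2, 3, 4])
activation = gaussian_rbf([calls[0]], centre=[0.0])
print(calls, activation)
```

The hidden-layer activations computed this way become the inputs of the logistic or linear regression model in the output layer.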
Table 1.1: RBF Evaluation
The Radial Basis Network classifier has a prediction accuracy of 86.2% (correctly classified instances) and a Kappa statistic of 0.2446 meaning that this classifier predicts better than chance.
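The Kappa statistic reported for each classifier can be computed from a confusion matrix as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The sketch below is a minimal pure-Python version. The function name and the example matrix are hypothetical and do not reproduce the confusion matrices of this study.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(size)
    ) / total ** 2
    if expected == 1.0:
        return 0.0  # kappa is undefined here; returning 0.0 is a convention of this sketch
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class confusion matrix (illustrative only).
example = [[20, 5],
           [10, 15]]
print(cohen_kappa(example))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.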
BayesNet
The Naïve Bayes classifier [11] is based on the Bayes rule of conditional probability. It makes use of all the attributes contained in the data and analyses them individually, as though they are equally important and independent of each other.
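The independence assumption described above means that the score of a class is its prior probability multiplied by one conditional probability per attribute. The sketch below illustrates this for categorical attributes. It is not the Weka BayesNet implementation used in the experiments. The Laplace (add-one) smoothing, the function names and the toy records are assumptions introduced for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class with the highest posterior score for `row`."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        # log P(class) + sum over attributes of log P(value | class).
        score = math.log(n / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one smoothing (an assumption) avoids zero probabilities.
            score += math.log((count + 1) / (n + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy records: (international plan, service-call level).
rows = [("yes", "high"), ("yes", "high"), ("no", "low"),
        ("no", "low"), ("no", "high")]
labels = ["churn", "churn", "stay", "stay", "stay"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("yes", "high")), predict(model, ("no", "low")))
```

Working in log space keeps the product of many small probabilities numerically stable; numeric attributes would need a density estimate or discretization, which this sketch leaves out.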
Fig 4: BayesNetwork Tree
Table 1.2: BayesNetwork Evaluation
The BayesNetwork classifier has a prediction accuracy of 85.8% (correctly classified instances) and a Kappa statistic of 0.2816, meaning that this classifier predicts better than chance.
5. Discussions and Conclusion
This study investigated predictors of churn using machine learning algorithms. Based on the predictive models obtained, the J48 decision tree classifier had the highest accuracy (87.4%) and the Naïve Bayes Network had the lowest (85.8%). The Radial Basis Function Network fell between the two, with an accuracy of about 86.2%. Out of a total of 2334 instances, the highest number correctly classified was 2039 and the lowest was 2003. Customer Service Calls is the most important predictor, followed by International Plan, International Calls and Night Calls. We can therefore conclude that customers who frequently call customer service, have an international calling plan and make the most night calls may become potential churners. Operators need to invest more in building optimal churn-prediction models centred on customer experience. With these models, operators can make better customer churn predictions, implement more accurate customer retention programs, and ensure revenue is not lost. Future research will consider hybrid data mining approaches, such as combining clustering and classification techniques, to improve the performance of the algorithms. Hybrid approaches typically combine two learning stages: the first preprocesses the data and the second produces the final prediction output [4]. Other prediction techniques, such as support vector machines and genetic algorithms, can also be applied. The current churn prediction methodology can be tested in other sectors, such as banking, insurance or health care, and the prediction accuracies compared.
6. Bibliography
[1] Ahn, Jae-Hyeon, Sang-Pil Han, and Yung-Seop Lee. "Customer churn analysis: Churn determinants and mediation effects of partial defection in the Korean mobile telecommunications service industry." Telecommunications Policy 30.10 (2006): 552-568.
[2] Anuj Sharma and Dr. Prabin Kumar Panigrahi. Article: A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services. International Journal of Computer Applications 27(11):26-31, August 2011. Published by Foundation of Computer Science, New York, USA.
[3] Aruna, S., Dr SP Rajagopalan, and L. V. Nandakishore. "Knowledge based analysis of various statistical tools in detecting breast cancer." Computer Science & Information Technology (CS&IT) 2 (2011): 37-45.
[4] C. F. Tsai and Y. H. Lu, "Customer churn prediction by hybrid neural networks," Expert Systems with Applications, vol. 36, no. 10, pp. 12547–12553, 2009.
[5] C. L. Blake and C. J. Merz, Churn Data Set, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html. University of California, Department of Information and Computer Science, Irvine, CA, 1998.
[6] Gürsoy, Şimşek, and Umman Tuğba. "Customer churn analysis in telecommunication sector." Journal of the School of Business Administration, Istanbul University 39.1 (2010): 35-49.
[7] J. Han, M. Kamber and J. Pei, Data Mining Concepts and Techniques, Morgan Kaufmann Publishers, 2012.
[8] Kirui, Clement, et al. "Predicting Customer Churn in Mobile Telephony Industry Using Probabilistic Classifiers in Data Mining." (2013).
[9] Mohammadi, Golshan, Reza Tavakkoli-Moghaddam, and Mehrdad Mohammadi. "Hierarchical Neural Regression Models for Customer Churn Prediction." Journal of Engineering 2013 (2013).
[10] Neslin, Scott A., et al. "Defection detection: Measuring and understanding the predictive accuracy of customer churn models." Journal of Marketing Research (2006): 204-211.
[11] P. N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Pearson Education Asia Inc., 2006.
[12] R. Mattison, The Telco Churn Management Handbook, Oakwood Hills, Illinois: XiT Press, 2005.
[13] Richter, Yossi, Elad Yom-Tov, and Noam Slonim. "Predicting Customer Churn in Mobile Networks through Analysis of Social Groups." SDM. 2010.
[14] Subramaniam, Sakthikumar, Arunkumar Thangavelu, and Hemavathy Ramasubbian. "Fact-An Adaptive Customer Churn Rate Prediction Method Using Fuzzy Multi-Criteria Classification Approach For Decision Making." Asian Journal of Science and Technology 4.11 (2013): 227-233.