
Reference Framework for Handling Concept Drift: An Application Perspective

Indrė Žliobaitė and Mykola Pechenizkiy
Eindhoven University of Technology, Eindhoven, the Netherlands

Abstract

In the data mining field, the problem of concept drift has been recognized and actively studied for almost two decades. It refers to changes in the concepts underlying the data, or in the distribution of the data over time. These changes affect the performance of models inferred from the historical data, some of which may no longer be relevant. This paper provides a view of concept drift research from an application perspective. We overview the application areas where the problem of concept drift is relevant. The goal is to provide a reference framework presenting the whole spectrum of problems related to real applications in which handling concept drift is important. Based on this framework we consider the relations between the different groups of methods that handle concept drift and the different types of application tasks. To facilitate this process we categorize the applications based on the properties of the underlying supervised learning tasks. In the discussion we present our vision of the current mismatch between mainstream concept drift research and application needs, and identify promising future research directions in the field from the application perspective.

Keywords: concept drift, applications

1. Introduction

The realism of the perfect-world assumptions often made in machine learning was challenged years ago [1]. One of these challenges relates to the observation that in the real world the data tends to change over time. As a result, model predictions might become less accurate as time passes, or opportunities to improve the accuracy might be missed. Thus the learning models need to be adaptive to the changes.

Preprint submitted to Neurocomputing

December 1, 2010

In predictive analytics, machine learning and data mining, the phenomenon of unexpected change in the underlying data over time is known as concept drift [2, 3, 4]. Changes in the underlying data might occur due to changing personal interests, changes in population or adversary activities, or they can be attributed to the complex nature of the environment. The problem of concept drift is of increasing importance to machine learning and data mining as more and more data is organized in the form of data streams rather than static databases, and it is rather unusual for concepts and data distributions to stay stable over a long period of time.

It is not surprising that the problem of concept drift has been studied in several research communities, including but not limited to pattern mining, machine learning and data mining, data streams, information retrieval, and recommender systems. Different approaches for detecting and handling concept drift have been proposed in the literature, and many of them have already proven their potential in a wide range of application domains, e.g. fraud detection, adaptive system control, user modeling, information retrieval, text mining, biomedicine [5, 6, 7, 8].

Concept drift describes changes in the underlying data distributions over time. But there are different types of changes and there are different types of applications. A 'one size fits all' solution for handling concept drift is hardly possible.

This paper overviews the application areas where the problem of concept drift is relevant. The goal is to provide a view of concept drift research from an application perspective. We look for relations between the methods that handle concept drift and application tasks. We categorize the applications based on the properties of the underlying supervised learning tasks. We present a reference framework that covers a spectrum of problems related to concept drift and motivated by real applications.
We elaborate on a few illustrative examples showing the diversity of application properties that may put constraints on the applicability of some of the generic approaches for handling concept drift.

The paper is organized as follows. In Section 2 we start from a bird's-eye view of the basic setups and the main techniques for handling concept drift in supervised learning tasks. In Section 3 we overview the current testing practice. In Section 4 we identify the main properties that define tasks with concept drift, and in Section 5 we provide a categorization of application areas and tasks based on those properties. In relation to these properties we overview a landscape of application tasks. In Section 6 we elaborate on four application examples that represent different types of tasks. We emphasize in what way the settings differ from the typical concept drift handling scenario that dominates the literature. Section 7 gives our view of the most promising and urgent future research directions from the application perspective and concludes the study.

2. Background

In this section we review what concept drift is, what techniques are used for handling concept drift in online learning, and what the typical setups are within which concept drift handling techniques are being designed.

2.1. Settings

Traditional supervised learning assumes that the training and application data come from the same source, as illustrated in Figure 1 (a). Learning under concept drift brings additional challenges, since the training and testing data are expected to come from different distributions, Figure 1 (b).

Figure 1: Stationary learning and concept drift. Panel (a), stationary: the historical data and the test data come from the same population (source). Panel (b), concept drift: the historical data come from Population 1 and the test data from Population 2.

In online learning scenarios data distributions might change over time. The learning systems have opportunities to periodically retrain using new data. This way they adapt to concept drift. There are two strategic questions to decide when designing adaptive learners. First, how to select the data to form a training set at each time step. Second, how to update the models at every time step.

Table 1: Adaptive learning techniques.

                 single model    multiple models
with triggers    detectors       contextual
evolving         forgetting      dynamic ensembles

2.2. Techniques for handling concept drift

The strategies to form a training set can have trigger mechanisms or be constantly evolving. Trigger mechanisms mean that the incoming data is monitored and described, and adaptive actions are taken based on the alerts of the triggering mechanism. Constantly evolving mechanisms mean that prespecified adaptive actions are periodically repeated, independently of whether a change has happened or not.

The model update can be done as a replacement or as incremental updates. Replacement means that a single model is in operation, and an adaptive action replaces the previous model with a new one. Incremental updates typically mean that an ensemble of individual models is maintained. Some selected models might be retired and new ones built over time, but the main way to achieve adaptivity in this case is via fusion rules or model selection. The model is not replaced at every time step or as soon as a change happens; typically only incremental updates are done. Table 1 illustrates the taxonomy. We will discuss each type of technique in more detail.

The basic forgetting approach uses a training window of a fixed length. A training window refers to a sequence of training instances of length w, which is periodically shifted in time towards the future as more data comes. At time t the instances that fall into the period from t − w to t − 1 are used for training. At time t + 1 the instances from t − w + 1 to t are used, and so on. This is an example of an evolving approach using a single model. The window moves forward independently of whether a change has actually happened or not. Research efforts in adaptive learning with fixed windows focus on determining the window size [3]. A more advanced forgetting approach uses instance weighting, for example [9].

Another type of approach detects a change actively and then discards the training history accordingly. Different methods for detecting a change have been developed, for instance [10, 11, 12, 13]. These approaches use a single learner.
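The fixed-window forgetting approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a method taken from the cited works: the function names `window_indices` and `sliding_window_predictions` are hypothetical, and the mean of the window stands in for whatever learner would be retrained on it.

```python
from statistics import mean


def window_indices(t, w):
    """Indices of the training instances used at time t: t - w, ..., t - 1."""
    start = max(0, t - w)  # early in the stream fewer than w instances exist
    return range(start, t)


def sliding_window_predictions(y, w):
    """Predict y[t] from the window that ends at t - 1.

    The mean of the window is an assumed stand-in for any learner that
    is retrained on the current window at every time step.
    """
    preds = []
    for t in range(1, len(y)):
        train = [y[i] for i in window_indices(t, w)]
        preds.append(mean(train))
    return preds


# Toy stream with a sudden change from level 0 to level 10 at t = 5.
stream = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10]
preds = sliding_window_predictions(stream, w=3)
```

With w = 3 the predictor needs three time steps after the change before the window contains only instances of the new concept, which illustrates why the choice of the window size matters.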

An alternative approach is to use an ensemble of models and achieve adaptivity by manipulating individual model weights. Typically the errors of the individual models are monitored, and the weights are increased after correct predictions and decreased after incorrect ones. Examples of different evolving adaptive ensembles can be found in [14, 15, 16].

Finally, there are ensembles with triggers. In this case triggers are used for model selection, also referred to as dynamic integration. The incoming data is inspected and described based on the input attributes. The model that fits the description most closely is selected. Examples of adapting using ensembles with triggers can be found in [6, 17, 18, 19].

2.3. Applicability of drift handling techniques

We presented four main principles for making learning methods adaptive to changes over time. Several instantiations of techniques built on these principles have been reported in the literature. The question is how to select which method to use.

If we think about change types, single models are suitable and often used when sudden drift is expected. This makes sense, because sudden drift instantly and completely replaces the old concept with a new one. Therefore, the old concept is no longer relevant and can be discarded. Most of the effort is directed towards detection and replacement, and old models are not preserved. If gradual or reoccurring drift is expected, ensemble techniques prevail. Ensemble techniques keep a number of alternative models, which can store past concepts and reuse them if needed.

In general, whether implicitly or explicitly, any method for handling concept drift is tailored to an expected change type. This is by no means a bad approach. However, we want to emphasize that application tasks offer, and require taking into consideration, many more properties.

3. Current practice in the evaluation of concept drift research

To motivate the need for an application-oriented view we survey current practices in evaluating concept drift handling techniques. In recent years attention to streaming non-stationary scenarios has been increasing. To give a picture of the current practice we surveyed journal papers introducing generic methods for handling concept drift. We do not claim the overview to be a full survey; rather, it gives a flavor of the practices in the major data mining journals.

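Table 2: Characterization of journal papers addressing concept drift.

Rows (data used in evaluation): real cases, real drift; benchmark real; simulated drift on real data; only synthetic data; total.
Columns (journals): IDA, TPAMI, DAMI, JMLR, KAIS, PR, TKDE, sum.

IDA: 3, 1, 4, 3, 11
sum: 8 (real cases, real drift), 6 (benchmark real), 12 (simulated drift on real data); 33 in total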

The following major journals were included in the search: JMLR, DAMI, KAIS, TKDE, IDA, TPAMI, PR. We looked for manuscripts that relate to the keywords 'data streams' and/or 'concept drift' and/or 'online learning'. We selected the manuscripts that focus on concept drift scenarios. In total we retrieved 33 relevant articles (footnote 1).

Footnote 1: As of October 2010.

Table 2 summarizes what data was used in evaluation. Some manuscripts use both real and synthetic data. We count a manuscript as a real case if a real case study is claimed to be used (independently of whether there are experiments with synthetic data in addition to that). Less than a quarter of the manuscripts (8) use real cases. The data comes from manufacturing, drilling machines, CD images, logs from training games, interest rates and stock prices, foreign exchange, medical scans and EEG. Few manuscripts are counted under synthetic data, because that category covers only those that use synthetic data and nothing else.

Three datasets are counted as benchmarks: electricity, calendar and forest cover type. We count them as a separate category to emphasize the concentration on fixed problems and the absence of application-oriented case studies. There is another popular benchmark, KDD cup, which we counted as synthetic data, because it is synthetic.

More than a third of the articles (12) used real data but simulated concept drifts, which is in fact synthetic data from the concept drift perspective. The most common explanation is that research needs controlled settings. That is true. But there is a counter-argument that the settings need to be realistic. Simulated drifts can come in a variety of magnitudes, frequencies and configurations. How do we know whether they ever happen in real settings?

Generally, in conference publications the usage of real data is even lower. Clearly there is an important question to consider. Is concept drift a problem after all, if real data experiments are rare? To motivate the research we provide an application-oriented view. We construct a reference framework that covers a spectrum of problems related to concept drift and motivated by real applications.

4. Properties of the tasks

We start constructing the framework by organizing the properties of the tasks that are related to the problem of concept drift. A typical concept drift setting assumes tabular data (a single relation) that is evolving over time. Labels become available immediately after the prediction is cast. Changes happen suddenly. The real tasks where concept drift is relevant can have many more distinct properties. In this section we present a view of how to organize these properties.

We start from three axes, which describe the task, the environment and the operational settings. In the first dimension we combine the type of task to deal with and how the associated data is organized. We highlight three major points here:

• First, there might be different types of tasks under consideration:
ranking: a typical task in recommendation, information retrieval and preference learning systems, in which it is assumed that each item is assigned some relevance score and the top scoring items should be selected;
classification: a typical task in diagnosis and decision support, e.g. antibiotic resistance prediction, e-mail spam classification, news categorization;
prediction: a typical task in demand prediction, resource scheduling optimization, or in general in applications in which predicting the future behavior of people is important, e.g. predictive analytics tasks;
novelty detection: a typical task in fault, fraud, or abnormal behavior detection applications.
Note that by prediction or classification we do not mean regression labels or class labels. These are two orthogonal aspects.
Instead, we emphasize that in prediction the labels are about future events, while in classification the labels represent something that is already present but needs to be diagnosed.

• Second, the input data can have different forms. It can be single or multi-relational, sequential, time series, a general graph or a particular complex structure, bags of instances or a mix. Data instances can be noisy or highly accurate. Relational data can be of low or high dimensionality, have a few or lots of missing values, be almost complete or very sparse, have binary, categorical, ordered or numerical attributes, etc.

• Third, the incoming data can be organized in different ways that affect its availability and accessibility. For example, it can come as a stream or in batches. Data re-access can be allowed, or a single pass over the data can be strictly enforced. There might be randomly or systematically missing values.

All these properties describe the task. The second dimension describes the environment the system operates in. We identify three important aspects here as well.

• First, it is essential to keep in consideration what the source of concept drift is; in other words, why changes are expected. It can be due to changes in individual preferences (a person used to like accordion and jazz music, but not any more), population change (in times of crisis salaries tend to get lower for everyone or prices get higher), adversary actions (fraudulent actions are made on purpose to overcome the system, for example credit card fraud), or the complexity of the environment (in automated vehicle navigation the environment is so complex that it is not feasible to take into account all possibilities deterministically, thus the environment is assumed to be changing).

• Second, it is important to specify what types of changes (drifts) are expected in the future. The drifts can be categorized into sudden, incremental (small sudden steps resulting in a smoother drift), gradual, reoccurring (seasonal), or a combination of different types; see Figure 2.

• Third, it is important to determine how predictable changes can be in the particular task. Concept drift can be totally unpredictable (e.g. many changes in financial markets), somewhat predictable (e.g. thanks to a signal from external early warning systems), or the environment might be identifiable in relation to seasonality or reoccurring contexts (e.g. popularity of special kinds of food products, movies or other goods and services).
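The change types listed above can be illustrated with a small simulation of how the mean of the data source evolves over time. This is only a sketch under stated assumptions: all function names and parameters (`t0`, `t1`, `period`, `old`, `new`) are hypothetical, and the gradual case is modeled, by assumption, as two sources being active during a transition period with the new one sampled with increasing probability.

```python
import random


def sudden(t, t0=50, old=0.0, new=1.0):
    """The old concept is replaced by the new one at time t0."""
    return old if t < t0 else new


def incremental(t, t0=40, t1=60, old=0.0, new=1.0):
    """The mean moves from old to new in small steps between t0 and t1."""
    if t < t0:
        return old
    if t >= t1:
        return new
    return old + (new - old) * (t - t0) / (t1 - t0)


def gradual(t, rng, t0=40, t1=60, old=0.0, new=1.0):
    """Assumed model: between t0 and t1 both sources are active and the
    new one is sampled with linearly increasing probability."""
    p_new = min(1.0, max(0.0, (t - t0) / (t1 - t0)))
    return new if rng.random() < p_new else old


def reoccurring(t, period=25, old=0.0, new=1.0):
    """The two concepts alternate every `period` time steps."""
    return old if (t // period) % 2 == 0 else new


rng = random.Random(0)  # fixed seed so that the gradual series is repeatable
series = {
    "sudden": [sudden(t) for t in range(100)],
    "incremental": [incremental(t) for t in range(100)],
    "gradual": [gradual(t, rng) for t in range(100)],
    "reoccurring": [reoccurring(t) for t in range(100)],
}
```

Such generators are the kind of simulated drift discussed in Section 3; whether their magnitudes and frequencies resemble a given real application has to be checked against that application.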

Figure 2: Types of concept drift: sudden drift, gradual drift, incremental drift and reoccurring contexts, each shown as the mean of the data over time. (c) Gama, Menasalvas, Spiliopoulou, Vakali, Barcelona, 24th Sept. 2010

The third dimension is related to the operational settings of the task. Here we highlight four key factors determining the settings.

• First, the label availability. Labels might become known right away in the next time step (e.g. food sales prediction), they might come with a fixed or variable lag (e.g. in credit scoring bankruptcy happens later, and may happen years after the issue of credit; see footnote 2), or they can be obtained on demand (e.g. interestingness of an article, spam).

• Second, the speed of decision making is relevant when selecting which methods to apply. Some decisions might be needed immediately, the sooner the better (e.g. fraud detection), while others, for instance analytical decisions (e.g. credit scoring), are more flexible and can wait.

• Third, the costs of mistakes, which are important in choosing evaluation metrics. In traditional supervised learning different types of mistakes may have different costs. In concept drift settings, in addition, errors in time might have associated costs (too early or too late prediction of a peak in food sales). While in some applications only the accuracy of predicting the target is of main importance (e.g. in online mass flow prediction), in other applications both accurate and timely identification of change and accurate prediction of the target are important (e.g. in demand prediction).

• Finally, the ground truth labels might be hard (based on clearly defined and accepted rules) or soft (based on personal opinion), or they might not be available at all (impossible or too costly to measure or define in a direct way).

Footnote 2: Typically, though, the horizon of prediction is fixed, say, to one year; thus the labels become known after one year and cannot be known right away.

We summarize the discussed properties in Table 3.

Table 3: Summary of properties of concept drift applications.

Data & task at consideration
  task: detection, classification, prediction, ranking
  input data: time series, relational, graph, bags or mix
  incoming data: stream, batches, collection iterations on demand
  complexity: volume; multiple scans; dimensionality
  missing values: unlikely, random, systematic

Assumptions about changes
  change source: adversary, preferences, population change, complex environment
  change type: sudden, incremental, gradual, reoccurring
  change expectation: unpredictable, predictable, identifiable (meta)

Operational settings
  label availability: real time, on demand, fixed lag, variable lag
  decision speed: real time, analytical
  costs of mistakes: balanced, unbalanced
  ground labels: hard, soft
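To illustrate why the label availability property matters, the following sketch evaluates a simple adaptive predictor when labels arrive with a fixed lag. It is an illustration under stated assumptions only: the function name `evaluate_with_label_lag` and its parameters are hypothetical, and the mean of the most recently arrived labels stands in for any adaptive learner.

```python
from collections import deque


def evaluate_with_label_lag(stream, lag, window=3):
    """Test-then-train evaluation where the label of instance t becomes
    available just before the prediction for instance t + lag (lag >= 1).

    The predictor (an assumed stand-in for any adaptive learner) is the
    mean of the `window` most recent labels that have already arrived.
    Returns the absolute error for every instance in the stream.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    known = deque(maxlen=window)  # labels the learner is allowed to see
    errors = []
    for t, y in enumerate(stream):
        arrived = t - lag  # index of the label that arrives at time t
        if arrived >= 0:
            known.append(stream[arrived])
        pred = sum(known) / len(known) if known else 0.0
        errors.append(abs(y - pred))
    return errors


# Toy stream with a sudden change at t = 5.
stream = [0] * 5 + [10] * 5
errors_next_step = evaluate_with_label_lag(stream, lag=1)
errors_delayed = evaluate_with_label_lag(stream, lag=3)
```

On this toy stream the total error doubles when the lag grows from one to three time steps, because the learner keeps predicting the old concept until labels of the new one arrive.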

The discussed properties are needed in order to determine the type of task, the associated environment and the operational settings we are dealing with. That information is essential to determine what characteristics the modeling system should have, what properties need to be prioritized when designing such a system, and how to evaluate it.

5. Landscape of Applications and Their Properties

Let us look at the application tasks where the problem of concept drift is relevant. We start with a general grouping of the application areas and their characteristics. Then we map the application areas within the three dimensions along which we organized the properties of application tasks.

Table 4: Categorization of applications by type and industry.

Types of applications (columns): Monitoring/control; Personal assistance/personalization; Management and planning; Ubiquitous applications.

Security, Police
  Monitoring/control: fraud detection, insider trading detection, adversary actions detection
  Personal assistance/personalization: --------
  Management and planning: crime volume prediction
  Ubiquitous applications: authentication, intrusion detection

Finance, Banking, Telecom, Credit Scoring, Insurance, Direct Marketing, Retail, Advertising, e-Commerce
  Monitoring/control: monitoring and management of customer segments, bankruptcy prediction
  Personal assistance/personalization: product or service recommendation, including complimentary, related ads
  Management and planning: demand prediction, response rate prediction, budget planning
  Ubiquitous applications: location based services, mobile apps

Education (higher, professional, children, e-Learning); Entertainment, Media
  Monitoring/control: gaming the system, drop out prediction
  Personal assistance/personalization: music, VOD, movie, learning object recommendation, adaptive news access, personalized search
  Management and planning: player-centered game design, learner-centered education
  Ubiquitous applications: virtual reality, simulations

…

5.1. Application areas

We have analyzed major industries where data mining and related approaches either already play an important role or have the potential to do so (footnote 3). We have also analyzed the different types of applications that are relevant to these industries and came up with the categorization in Table 4. We grouped the application areas into four main application blocks: personal assistance and information, management and strategic planning, monitoring and control, and ubiquitous environment applications.

Footnote 3: We used extracts of the ACM classification (http://www.acm.org/about/class/ccs98-html) and KDnuggets polls, e.g. http://www.kdnuggets.com/polls/2010/analytics-data-mining-industries-applications.html, as two reference points for surveying and summarizing application areas and industries.

Monitoring and control mostly relates to detection tasks, which indicate abnormal behavior. It includes detection of adversary activities on the web, in computer networks, telecommunications and financial transactions. In most of these tasks normal behavior is modeled and the goal is to raise an alarm when something abnormal is observed. E-mail and web spam detection, though, can be considered a traditional two-class classification task when labels from both classes are somehow available.

Table 5: Properties within the application areas.

Columns: Monitoring/Control | Personal assistance/Personalization | Management/Strategic planning | Ubiquitous applications

Task
  task: detection | ranking | prediction | classification
  input data: sequential relational time series sequential transactional relational
  incoming: stream batches stream stream iterations
  volume: high | moderate | moderate | high
  multiple scans: no/yes | yes | yes | no
  missing values: random | unlikely | systematic | random

Environment
  change source: adversary | preferences | population | complex environment
  change type: sudden gradual incremental all incremental reoccurring
  expectations: unpredictable unpredictable identifiable unpredictable predictable unpredictable identifiable

Operational settings
  label speed: fixed lag | on demand | real time | fixed lag
  ground labels: hard | soft | hard | hard

Personal assistance and information applications deal with personalized learning, which includes recommender systems, categorization and organization of textual information, customer profiling for marketing, personal mail categorization and spam filtering.

Management and strategic planning includes mostly predictive analytics tasks such as evaluation of creditworthiness, demand prediction, food sales, bus travel time prediction and crime maps.

Ubiquitous environment applications include a wide spectrum of moving and stationary systems that interact with a changing environment, for instance robots, mobile vehicles and smart household appliances.

To make the presentation compact, in each row of Table 4 we placed a group of industries that share common supervised learning tasks (e.g. demand prediction). As can be seen from the table, for each of the industries or groups of industries more than one application type can be relevant. We consider characteristic application examples for each application type in Section 6. Before doing that, we consider the mapping of the properties identified and categorized in Section 4.

Figure 3: Categorization and properties of concept drift applications. The figure places the four application blocks (personal assistance/personalization, management/strategic planning, ubiquitous applications, monitoring and control) along three axes. Task: preferences (IR tasks), prediction (mostly time series), classification (mostly relational), detection (one-to-many). Environment: sudden, incremental (small steps), gradual or combination, reoccurring (seasonal). Operational settings: real time, on demand, fixed lag, later (variable lag).

5.2. Properties of the learning tasks within the application areas

We assign the most likely properties to the respective application areas based on our subjective opinion. We believe that these are the most common properties for a given area, yet acknowledge that contradictory examples within each area can always be found. Thus this is not a hard categorization, but rather an assignment that serves as a guideline for better understanding the landscape of applications. Taking this disclaimer into account, see Table 5 for the properties within the application areas and Figure 3 for the mapping of the properties to the application areas.

In Appendix A we provide a detailed overview and summary of the published research on handling concept drift, mapped to the identified application types.

6. Discussion

Research on concept drift has been rather fragmented. The research problems, although motivated by a belief that handling concept drift is highly important for practical data mining applications, have often been formulated and addressed in artificial and somewhat isolated settings. As a result, we now have generic approaches for detection and handling of concept drift that have been tested primarily on simulated data or on real data with simulated drift. The assumptions behind the expected types of changes and the reasons for changes were not always stated explicitly for these approaches.

Recent studies, however, do highlight the peculiarities of particular applications, give intuition and/or empirical evidence why traditional general-purpose concept drift handling techniques are not expected to perform well, and suggest tailored or more focused techniques suitable for a particular application type.

In this section we provide an elaborate discussion of four particular application examples where concept drift is relevant. We highlight the specific properties of the tasks and how they differ from typical concept drift research scenarios.

6.1. Monitoring and control: Online mass flow estimation

Consider the online mass flow estimation problem in the boiler example [20]. The boiler is fed with fuel from the fuel container (bunker) as depicted in Figure 4. The fuel inside the container is mixed using a mixing screw. There is a feeding screw at the outlet of the container, which transfers the fuel from the container to the boiler. During the burning stage the mass of fuel inside the container decreases (reflected by a decreasing amount of fuel in the data signal). When new fuel is added to the container (while the burning process continues), the fuel feeding stage starts, which is reflected by a rapid mass increase. The problem is to estimate the mass flow online, which can be done by estimating at each moment in time the current amount of fuel in the bunker.
There are three main sources of changes in the signal: First, fuel feeding is manual and non standardized process, which is not necessarily smooth, it can have short interruptions. Each operator can have different habits. Besides, the feeding speed depends on the type of fuel used. Second, the feeding screw rotation adds noise to the measured signal. Besides, fuel particle jamming often happens, slowing down the screw for some seconds and distorting the signal estimate. Therefore, the reported mass inside the bunker is not accurate, the signal contains extreme upward outliers in the original signal, that can be seen in Figure 5. Finally, there is a low amplitude rather periodic 14

Figure 4: The origin of the input mass measurements signal. noise, which is caused by the mechanical rotation of the system parts. These amplitudes may become higher depending on the burning setup. The leaning system should deal with two types of change points: an abrupt change to feeding and slower but still abrupt switch to burning, and asymmetric outliers (see Figure 5 left), oriented upwards, which in online settings can be easily mixed with the changes to feeding. Besides there is a symmetric high frequency signal noise. Algorithmic change detection is not trivial as it might seem from visual inspection of the signal. The asymmetric nature of the outliers would elevate the original signal if approximated directly, since there are no corresponding negative outliers. In other words, the noise and outliers do not sum to zero with respect to the true signal. Besides, there are short burning periods within feeding stages, due to possible pauses in feeding (see Figure 5 right), which depend on human factor. These interruption regimes can vary from 5 to 20 seconds and are difficult to discriminate. In addition, we need to take into account that the mass flow signal may 15

Mass (g)

Mass (g)

Time (s)

Time (s)

Figure 5: Peculiarities in the data: upward outliers (left) and short burning periods within the feeding stage (right). have a nonzero second derivative, i.e. the speed of the mass change depends on the amount of fuel in the container – the more fuel is in the container, the higher is the acceleration, thus the more fuel gets into the screw. The weight of the fuel at higher levels of the tank compresses the fuel in the lower levels and in the screw, and the fuel density is increased. Besides, compression and thus the burning speed depends on the type and quality of the fuel. Thus, on the one hand we need to take data properties into consideration for handling changes in the data. However, on the other hand we can simplify the detection task making the explicit assumptions about the anticipated changes and states in which the system may be. From the evaluation point of view it is important to emphasize that there is no ground truth in this case – it is possible to construct some approximation of the ground truth in the offline settings and use it for the evaluation purposes in online settings (only in the experiments, but not in real operational setting), but the evaluation will be biased to the ground truth approximation process. However, in this application we know how many sudden change points are present and constructing a benchmark dataset we can mark all of them and evaluate how quickly, and how accurately we can detect each of it. We can inspect visually and measure experimentally the effect of skipping a change point, or detecting it too late, or detecting it inaccurately, i.e. before change actually happended, or after it, or the effect of generating a false alarm to the performance of the estimator itself. See Appendix B for a few illustrative examples. 6.2. Strategic management: Food wholesales prediction In food sales prediction a large number of factors affect the demand. Designing an intelligent predictor that would beat a simple moving average 16

baseline across a number of products appears to be a non-trivial task [21]. Sudden, gradual and reoccurring drifts are expected to happen in this domain, and there are numerous reasons that may cause the drift. For two obvious examples see Figure 6.
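To make the baseline concrete, the following is a minimal sketch of a moving average predictor with walk-forward evaluation. The window length, the function names and the sales figures are illustrative assumptions and are not taken from [21]; the sketch only shows how such a baseline reacts to an abrupt drop in sales.

```python
from statistics import mean


def moving_average_forecast(history, window=4):
    """Predict the next value as the mean of the last `window` observations."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return mean(history[-window:])


def evaluate_baseline(series, window=4):
    """Mean absolute error of one-step-ahead forecasts, walking forward in time."""
    errors = []
    for t in range(1, len(series)):
        # Only observations before time t are visible to the predictor.
        prediction = moving_average_forecast(series[:t], window)
        errors.append(abs(series[t] - prediction))
    return mean(errors)


# Hypothetical weekly sales with an abrupt permanent drop (sudden drift).
weekly_sales = [100, 100, 100, 100, 40, 40, 40, 40]
print(evaluate_baseline(weekly_sales, window=4))
```

With a window of four, the forecast needs four steps after the drop before it consists only of post-drift observations; a shorter window adapts faster but is noisier on stable products. Any candidate predictor can be compared with this baseline using the same walk-forward loop.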

Reoccurring and sudden drift in food sales (panel annotation: reoccurring season). (c) Gama, Menasalvas, Spiliopoulou, Vakali – Barcelona, 24th Sept. 2010

Figure 6: Peculiarities in the data: seasonal behavior (top) and abrupt permanent drop in the sales (bottom).

In general, the definition of concept drift in this application is not as obvious as in the boiler or the electricity load prediction examples. Some food sales time series demonstrate chaotic behavior, i.e. the demand is constantly changing and is hardly predictable at all. Other time series may have strong reoccurring patterns, as in the top example in Figure 6. However, since a food sales predictor typically learns not only from the time series itself but from a richer representation (Figure 7), some of these seasonal changes may already be captured by predictive or contextual features and therefore should not be regarded as drift. Sudden changes can also have various causes, for example the discontinuation of the product in some of the shops, as illustrated in the bottom example in Figure 6. In this example labels can be considered hard and become available ‘immediately’, i.e. before we generate the next prediction (unless we have to

Figure 7: Training data sources in food wholesales prediction. Slide title: Challenges in food sales prediction (Zliobaite et al., 2009).

predict a few steps ahead). However, if we blindly formulate the problem as time series prediction, we will not take into account, e.g., out-of-stock cases. Over- and under-prediction, as well as predicting sales with ‘early’ and ‘late’ biases, have different costs associated with storage, products perishing, and lost opportunities.

6.3. Personalized information access: Recommender systems

The interest of the data mining community in the recommender systems domain has been boosted by the NetFlix competition (www.netflixprize.com). One of the lessons learnt from it was that taking temporal dynamics into account is important for building accurate models. Handling concept drift has another set of peculiarities here. Both items and users change over time. Item-side effects include, first of all, changing product perception and popularity. The popularity of some movies is expected to follow seasonal patterns. User-side effects include changing tastes and preferences of customers, some of which may be short-term or contextual and therefore likely to reoccur (mood, activity, company, etc.), a changing perception of the rating scale, a possible change of rater within a household, and similar problems. As suggested in [5], the popular windowing and instance weighting approaches for handling concept drift are not the best choice here, simply because in collaborative filtering the relations between ratings are of main importance for predictive modeling.
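For reference, the two generic approaches mentioned above can be sketched as follows. This is a minimal illustration under our own assumptions (the function names, the exponential decay form and the half-life parameter are hypothetical); it is not the method of [5].

```python
def sliding_window(instances, window_size):
    """Windowing: keep only the most recent `window_size` instances for training."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return instances[-window_size:]


def exponential_weights(timestamps, now, half_life):
    """Instance weighting: an instance loses half of its weight every `half_life` time units."""
    return [0.5 ** ((now - t) / half_life) for t in timestamps]


def weighted_mean(values, weights):
    """Weighted average, e.g. a time-weighted estimate of a user's mean rating."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical ratings of one user: high in the past, low recently.
ratings = [5, 5, 1, 1]
times = [0, 1, 2, 3]
weights = exponential_weights(times, now=3, half_life=1)
print(weighted_mean(ratings, weights))
print(sliding_window(ratings, 2))
```

Both functions treat every rating in isolation: an old rating is dropped or down-weighted regardless of how it relates to other ratings of the same user or item, which is exactly the information that collaborative filtering relies on.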

In this application labels are soft, data comes in batches, and the rating matrix is high-dimensional and extremely sparse, containing only about 1% of non-zero elements (which makes most machine learning predictors inapplicable and boosts the development of advanced collaborative filtering approaches).

6.4. Strategic management: Antibiotic resistance prediction in hospitals

Antibiotic resistance is an important problem, and it is especially difficult with nosocomial infections in hospitals, because pathogens attack critically ill patients who are more vulnerable to infections than the general population and therefore require more antibiotics. The prediction model is based on information about patients, hospitalization, pathogens and the antibiotics themselves. The data arrives in batches, and the labels become available with a variable lag depending on the size of the hospital and the intensity of the patient flow. The size of the data is relatively small both in the number of instances and in the number of features to be considered. The peculiarity of concept drift here is that it may happen for various reasons, in particular because pathogens may develop resistance and share this information with peers in different ways. Consequently, the type and severity of changes may depend on the location in the instance space. Furthermore, the drift is expected to be local and to reflect, e.g., a pathway in the hospital along which the resistance developed and spread. This calls for direct or indirect identification of the regions or subgroups in which concept drift is occurring. Handling concept drift with dynamic integration of classifiers that takes this peculiarity into account was shown to be effective [6].

7. Conclusion

The problem of concept drift has been recognized and studied in several areas of computer science related to data mining research.
Approaches for handling concept drift are rather diverse and have been developed from two sides: theory-oriented and application-oriented. By this we mean that many of the papers published in data mining journals propose new generic approaches for detecting and handling concept drift in a ‘typical’ application setting, while testing these approaches on synthetically generated datasets or on real datasets with artificially introduced drifts. Many other papers have been motivated by the needs (and operational settings) of specific applications and have developed tailored solutions

for addressing these specific problems without questioning the generality of the developed ideas.

In this work we categorized the applications in which handling concept drift is known or expected to be an important component of a supervised learning system. We identified four major types of applications and the associated key properties characterizing the corresponding settings. We hope that our categorization will serve as a reference framework for researchers who are newcomers to the field and for researchers who find it important to discuss in more detail the applicability of a proposed approach to particular supervised learning settings.

We surveyed numerous application examples of handling concept drift in relation to the categorized tasks, and provided a more elaborate discussion of a few characteristic examples, emphasizing the most important application-oriented aspects. Summarizing those, we can speculate that the concept drift research area is likely to refocus further from studying general methods for detecting and handling concept drift to designing more specific, application-oriented approaches that address issues such as delayed labeling, label availability, the cost-benefit trade-off of model updates, and other issues peculiar to a particular type of application. We also anticipate a change in focus from change detection to change description, and from reactive detection and handling of concept drift to proactive prediction of reoccurring contexts and meta-learning.

Acknowledgements

This research is partly supported by NWO. We are thankful to the contributors and participants of the HaCDAIS 2010 workshop held at ECML/PKDD 2010 for their valuable comments and discussions that helped to better shape this work.

References

[1] D. Hand, Classifier technology and the illusion of progress, Statistical Science 21 (2006) 1. [2] A. Tsymbal, The problem of concept drift: Definitions and related work, Tech. rep., Department of Computer Science, Trinity College Dublin, Ireland (2004).

[3] L. Kuncheva, Classifier ensembles for detecting concept change in streaming data: overview and perspectives, in: Proc. 2nd Workshop SUEMA 2008 (ECAI 2008), 2008, pp. 5–10. [4] G. Widmer, Tracking context changes through meta-learning, Machine Learning 27 (3) (1997) 259–286. [5] Y. Koren, Collaborative filtering with temporal dynamics, Commun. ACM 53 (4) (2010) 89–97. [6] A. Tsymbal, M. Pechenizkiy, P. Cunningham, S. Puuronen, Dynamic integration of classifiers for handling concept drift, Information Fusion 9 (1) (2008) 56–68. [7] P. Lindstrom, S. J. Delany, B. M. Namee, Handling concept drift in a text data stream constrained by high labelling cost, in: H. W. Guesgen, R. C. Murray (Eds.), FLAIRS Conference, AAAI Press, 2010. [8] M. Pechenizkiy, J. Bakker, I. Zliobaite, A. Ivannikov, T. Karkkainen, Online mass flow prediction in cfb boilers with explicit detection of sudden concept drift, SIGKDD Explorations 11 (2) (2009) 109–116. [9] R. Klinkenberg, Learning drifting concepts: Example selection vs. example weighting, Intell. Data Anal. 8 (3) (2004) 281–300. [10] J. Gama, P. Medas, G. Castillo, P. Rodrigues, Learning with drift detection, in: Advances In Artificial Intelligence, Proc. of the 17th Brazilian Symposium on Artificial Intelligence (SBIA 2004), Vol. 3171 of LNAI, Springer, 2004, pp. 286–295. [11] M. Leeuwen, A. Siebes, Streamkrimp: Detecting change in data streams, in: Proc. of the 2008 European Conf. on Machine Learning and Knowledge Discovery in Databases: ECML PKDD ’08 Part I, 2008, pp. 672–687. [12] K. Nishida, K. Yamauchi, Detecting concept drift using statistical testing, in: Proc. of Discovery Science, 10th Int. Conf., DS 2007, Vol. 4755 of LNCS, Springer, 2007, pp. 264–269. [13] A. Bifet, R. Gavalda, Learning from time-changing data with adaptive windowing, in: Proc. of SIAM Int. Conf. on Data Mining (SDM’07), SIAM, 2007.

[14] J. Z. Kolter, M. A. Maloof, Learning to detect and classify malicious executables in the wild, Journal of Machine Learning Research 6 (2006) 2721–2744. [15] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, R. Gavalda, New ensemble methods for evolving data streams, in: KDD ’09: Proc. of the 15th ACM SIGKDD int. conf. on Knowledge discovery and data mining, ACM, 2009, pp. 139–148. [16] L. Minku, A. White, X. Yao, The impact of diversity on on-line ensemble learning in the presence of concept drift, IEEE Transactions on Knowledge and Data Engineering 99 (1). [17] I. Zliobaite, Combining time and space similarity for small size learning under concept drift, in: Proc. of ISMIS 2009 - 18th International Symposium on Methodologies for Intelligent Systems, Vol. 5722 of LNCS, 2009, pp. 412–421. [18] I. Katakis, G. Tsoumakas, I. Vlahavas, Tracking recurring contexts using ensemble classifiers: an application to email filtering, Knowledge and Information Systems. [19] I. Zliobaite, J. Bakker, M. Pechenizkiy, Towards context aware sales prediction, in: Proc. of 2009 IEEE International Conference on Data Mining Workshops, Int. Workshop on Domain Driven Data Mining (DDDM09), 2009, pp. 94–99. [20] J. Bakker, M. Pechenizkiy, I. Zliobaite, A. Ivannikov, T. Karkkainen, Handling outliers and concept drift in online mass flow prediction in cfb boilers, in: Proc. of the 3rd Int. Workshop on Knowledge Discovery from Sensor Data (SensorKDD09), 2009, pp. 13–22. [21] I. Zliobaite, J. Bakker, M. Pechenizkiy, Beating the baseline prediction in food sales: How intelligent an intelligent predictor is?, Expert Syst. Appl. under review. [22] T. Lane, C. Brodley, Temporal sequence learning and data reduction for anomaly detection, ACM Trans. Inf. Syst. Secur. 2 (3) (1999) 295–331.


[23] A. Patcha, J. Park, An overview of anomaly detection techniques: Existing solutions and latest technological trends, Comput. Netw. 51 (12) (2007) 3448–3470. [24] M. Masud, J. Gao, L. Khan, J. Han, B. Thuraisingham, A multi-partition multi-chunk ensemble technique to classify concept-drifting data streams, in: Proc. of Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD’09), 2009, pp. 363–375. [25] J. Kim, P. Bentley, U. Aickelin, J. Greensmith, G. Tedesco, J. Twycross, Immune system approaches to intrusion detection – a review, Natural Computing: an international journal 6 (4) (2007) 413–466. [26] R. Yampolskiy, V. Govindaraju, Direct and indirect human computer interaction based biometrics, Journal of computers 2 (10) (2007) 76–88. [27] N. Poh, R. Wong, J. Kittler, F. Roli, Challenges and research directions for adaptive biometric recognition systems, in: Proc. of Advances in Biometrics, Third International Conference, ICB 2009, Vol. 5558 of LNCS, Springer, 2009, pp. 753–764. [28] O. Mazhelis, S. Puuronen, Comparing classifier combining techniques for mobile-masquerader detection, in: ARES ’07: Proc. of the 2nd Int. Conf. on Availability, Reliability and Security, IEEE Computer Society, 2007, pp. 465–472. [29] C. Hilas, Designing an expert system for fraud detection in private telecommunications networks, Expert Syst. Appl. 36 (9) (2009) 11559–11569. [30] S. Delany, P. Cunningham, A. Tsymbal, A comparison of ensemble and case-base maintenance techniques for handling concept drift in spam filtering, in: Proc. of the 19th Int. Conf. on Artificial Intelligence (FLAIRS 2006), AAAI Press, 2006, pp. 340–345. [31] F. Fdez-Riverola, E. Iglesias, F. Diaz, J. Mendez, J. Corchado, Applying lazy learning algorithms to tackle concept drift in spam filtering, Expert Syst. Appl. 33 (1) (2007) 36–48. [32] T. Fawcett, "In vivo" spam filtering: a challenge problem for kdd, SIGKDD Explor. Newsl. 5 (2) (2003) 140–148.

[33] A. Sudjianto, S. Nair, M. Yuan, A. Zhang, D. Kern, F. Cela-Diaz, Statistical methods for fighting financial crimes, Technometrics 52 (1) (2010) 5–19. [34] R. A. Becker, C. Volinsky, A. R. Wilks, Fraud detection in telecommunications: History and lessons learned, Technometrics 52 (1) (2010) 20–33. [35] D. J. Hand, Fraud detection in telecommunications and banking: Discussion of becker, volinsky, and wilks (2010) and sudjianto et al. (2010), Technometrics 52 (1) (2010) 34–38. [36] F. Crespo, R. Weber, A methodology for dynamic data mining based on fuzzy clustering, Fuzzy Sets and Systems 150 (2005) 267–284. [37] J. Moreira, Travel time prediction for the planning of mass transit companies: a machine learning approach, Ph.D. thesis, Faculty of Engineering of University of Porto (2008). [38] J. Zhou, L. Cheng, W. Bischof, Prediction and change detection in sequential data for interactive applications, in: National Conference on Artificial Intelligence (AAAI), AAAI, 2008, pp. 805–810. [39] J. Luo, A. Pronobis, B. Caputo, P. Jensfelt, Incremental learning for place recognition in dynamic environments, in: Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS07), 2007, pp. 721–728. [40] L. Liao, D. Patterson, D. Fox, H. Kautz, Learning and inferring transportation routines, Artif. Intell. 171 (5-6) (2007) 311–331. [41] K. Ku-Mahamud, N. Zakaria, N. Katuk, M. Shbier, Flood pattern detection using sliding window technique, in: Proc. of the 3rd Asia International Conference on Modelling & Simulation, 2009, pp. 45–50. [42] A. Krause, C. Guestrin, Nonmyopic active learning of gaussian processes: an exploration-exploitation approach, in: ICML ’07: Proc. of the 24th int. conf. on Machine learning, ACM, 2007, pp. 449–456. [43] A. Pawling, N. Chawla, G. Madey, Anomaly detection in a mobile communication network, Comput. Math. Organ. Theory 13 (4) (2007) 407–422.

[44] S. Gauch, M. Speretta, A. Chandramouli, A. Micarelli, User profiles for personalized information access, in: The Adaptive Web, Springer Berlin / Heidelberg, 2007, pp. 54–89. [45] D. Widyantoro, J. Yen, Relevant data expansion for learning concept drift from sparsely labeled data, IEEE Trans. on Knowl. and Data Eng. 17 (3) (2005) 401–412. [46] D. Billsus, M. Pazzani, A hybrid user model for news story classification, in: UM ’99: Proc. of the 7th int. conf. on User modeling, Springer-Verlag, 1999, pp. 99–108. [47] G. Lebanon, Y. Zhao, Local likelihood modeling of temporal text streams, in: ICML ’08: Proc. of the 25th int. conf. on Machine learning, ACM, 2008, pp. 552–559. [48] R. Klinkenberg, I. Renz, Adaptive information filtering: Learning drifting concepts, in: Proc. of AAAI-98/ICML-98 workshop Learning for Text Categorization, 1998, pp. 33–40. [49] F. Mourao, L. Rocha, R. Araujo, T. Couto, M. Goncalves, W. Meira, Understanding temporal aspects in document classification, in: WSDM ’08: Proc. of the int. conf. on Web search and web data mining, ACM, 2008, pp. 159–170. [50] I. Katakis, G. Tsoumakas, I. P. Vlahavas, An ensemble of classifiers for coping with recurring contexts in data streams, in: ECAI, Vol. 178 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2008, pp. 763–764. [51] M. Hasan, E. Nantajeewarawat, Towards intelligent and adaptive digital library services, in: ICADL 08: Proc. of the 11th Int. Conf. on Asian Digital Libraries, Springer-Verlag, 2008, pp. 104–113. [52] O. Flasch, A. Kaspari, K. Morik, M. Wurst, Aspect-based tagging for collaborative media organization, in: From Web to Social Web: Discovering and Deploying User and Content Profiles: Workshop on Web Mining, WebMine 2006. Revised Selected and Invited Papers, Vol. 4737 of LNAI, Springer-Verlag, 2007, pp. 122–141.


[53] T. Yamaguchi, Constructing domain ontologies based on concept drift analysis, in: in IJCAI-99. Workshop on Ontologies and Problem-Solving Methods, 1999. [54] J. Scanlan, J. Hartnett, R. Williams, Dynamicweb: Adapting to concept drift and object drift in cobweb, in: AI ’08: Proc. of the 21st Australasian Joint Conf. on Artificial Intelligence, Springer-Verlag, 2008, pp. 454–460. [55] A. da Silva, Y. Lechevallier, F. Rossi, F. de Carvalho, Construction and analysis of evolving data summaries: An application on web usage data, in: ISDA ’07: Proc. of the 7th Int. Conf. on Intelligent Systems Design and Applications, IEEE Computer Society, 2007, pp. 377–380. [56] P. D. Bra, A. Aerts, B. Berden, B. de Lange, B. Rousseau, T. Santic, D. Smits, N. Stash, Aha! the adaptive hypermedia architecture, in: HYPERTEXT ’03: Proc. of the 14th ACM conf. on Hypertext and hypermedia, ACM, 2003, pp. 81–84. [57] M. Black, R. Hickey, Classification of customer call data in the presence of concept drift and noise, in: Soft-Ware 2002: Proc. of the 1st Int. Conf. on Computing in an Imperfect World, Springer-Verlag, 2002, pp. 74–87. [58] N. Lathia, S. Hailes, L. Capra, knn cf: a temporal social network, in: RecSys ’08: Proc. of the 2008 ACM conf. on Recommender systems, ACM, 2008, pp. 227–234. [59] A. Rozsypal, M. Kubat, Association mining in time-varying domains, Intell. Data Anal. 9 (3) (2005) 273–288. [60] Y. Koren, Collaborative filtering with temporal dynamics, in: KDD ’09: Proc. of the 15th ACM SIGKDD int. conf. on Knowledge discovery and data mining, ACM, 2009, pp. 447–456. [61] R. Bell, Y. Koren, C. Volinsky, The bellkor 2008 solution to the netflix prize, online (2008). URL http://www.research.att.com/~volinsky/netflix/


[62] Y. Ding, X. Li, Time weight collaborative filtering, in: CIKM ’05: Proc. of the 14th ACM int. conf. on Information and knowledge management, ACM, 2005, pp. 485–492. [63] P. Kumar, V. Ravi, Bankruptcy prediction in banks and firms via statistical and intelligent techniques - a review, European Journal of Operational Research 180 (1) (2007) 1–28. [64] M. Harries, C. Sammut, K. Horn, Extracting hidden context, Mach. Learn. 32 (2) (1998) 101–126. [65] T. Sung, N. Chang, G. Lee, Dynamics of modeling in data mining: interpretive approach to bankruptcy prediction, J. Manage. Inf. Syst. 16 (1) (1999) 63–85. [66] R. Horta, B. de Lima, C. Borges, Data pre-processing of bankruptcy prediction models using data mining techniques, Online (2009). URL http://blog.campe.com.br/wp-content/uploads/2009/03/ witpress_conf-2.pdf [67] I. Zliobaite, T. Krilavicius, Clan: Clustering for credit risk assessment, An entry to pakdd 2009 data mining competition, Vilnius University and Vytautas Magnus University (2009). [68] R. Giacomini, B. Rossi, Detecting and predicting forecast breakdowns, Working Paper 638, ECB (2006). [69] R. Klinkenberg, Meta-learning, model selection and example selection in machine learning domains with concept drift, in: Proc. of Annual Workshop of the Special Interest Group on Machine Learning, Knowledge Discovery, and Data Mining (FGML-2005) of the German Computer Science Society (GI) Learning - Knowledge Discovery - Adaptivity (LWA-2005), 2005, pp. 64–171. [70] M. Harries, K. Horn, Detecting concept drift in financial time series prediction using symbolic machine learning, in: In Proc. of the 8th Australian joint conf. on artificial intelligence, 1995, pp. 91–98. [71] J. Ekanayake, J. Tappolet, H. C. Gall, A. Bernstein, Tracking concept drift of software projects using defect prediction quality, in: Proc. of


the 6th IEEE International Working Conference on Mining Software Repositories (MSR’09), 2009, pp. 51–60. [72] X. Song, C. Jermaine, S. Ranka, J. Gums, A bayesian mixture model with linear regression mixing proportions, in: KDD ’08: Proc. of the 14th ACM SIGKDD int. conf. on Knowledge discovery and data mining, ACM, 2008, pp. 659–667. [73] M. Kukar, Drifting concepts as hidden factors in clinical studies, in: Proc. of AIME 2003, 9th Conference on Artificial Intelligence in Medicine in Europe, 2003, pp. 355–364. [74] P. Gago, A. Silva, M. Santos, Adaptive decision support for intensive care, in: Proc. of 13th Portuguese Conference on Artificial Intelligence, 2007, pp. 415–425. [75] M. Black, R. Hickey, Detecting and adapting to concept drift in bioinformatics, in: Proc. of Knowledge Exploration in Life Science Informatics, International Symposium, KELSI 2004, Vol. 3303 of LNCS, Springer, 2004, pp. 161–168. [76] G. Forman, Incremental machine learning to reduce biochemistry lab costs in the search for drug discovery, in: 2nd Workshop on Data Mining in Bioinformatics, 2002, pp. 33–36. [77] C. Jermaine, Data mining for multiple antibiotic resistance, online (2008). URL http://www.cise.ufl.edu/~cjermain/DM [78] S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens, A. Aron, J. Diebel, P. Fong, J. Gale, M. Halpenny, G. Hoffmann, K. Lau, C. Oakley, M. Palatucci, V. Pratt, P. Stang, S. Strohband, C. Dupont, L.-E. Jendrossek, C. Koelen, C. Markey, C. Rummel, J. van Niekerk, E. Jensen, P. Alessandrini, G. Bradski, B. Davies, S. Ettinger, A. Kaehler, A. Nefian, P. Mahoney, Winning the darpa grand challenge, Journal of Field Robotics 23 (9) (2006) 661–692. [79] M. Procopio, J. Mulligan, G. Grudic, Learning terrain segmentation with classifier ensembles for autonomous robot navigation in unstructured environments, J. Field Robot. 26 (2) (2009) 145–175.

[80] A. Lattner, A. Miene, U. Visser, O. Herzog, Sequential pattern mining for situation and behavior prediction in simulated robotic soccer, in: RoboCup 2005: Robot Soccer World Cup IX, Vol. 4020 of LNCS, 2006. [81] P. Rashidi, D. Cook, Keeping the resident in the loop: Adapting the smart home to the user, IEEE Trans. on Systems, Man, and Cybernetics, Part A: Systems and Humans 39 (5) (2009) 949–959. [82] D. Anguita, Smart adaptive systems: State of the art and future directions of research, in: Proc. of the 1st European Symp. on Intelligent Technologies, Hybrid Systems and Smart Adaptive Systems, EUNITE 2001, 2001. [83] D. Charles, A. Kerr, M. McNeill, M. McAlister, M. Black, J. Kücklich, A. Moore, K. Stringer, Player-centred game design: Player modelling and adaptive digital games, in: Digital Games Research Conference 2005, Selected Papers Publication, 2005, pp. 285–298. [84] R. Bolton, D. Hand, Statistical fraud detection: A review, Statistical Science 17 (3) (2002) 235–255. [85] S. Donoho, Early detection of insider trading in option markets, in: KDD ’04: Proc. of the 10th ACM SIGKDD int. conf. on Knowledge discovery and data mining, ACM, 2004, pp. 420–429. [86] D. Blei, J. Lafferty, Dynamic topic models, in: ICML ’06: Proc. of the 23rd int. conf. on Machine learning, ACM, 2006, pp. 113–120. [87] C. Wang, D. Blei, D. Heckerman, Continuous time dynamic topic models, in: Uncertainty in Artificial Intelligence [UAI], AUAI Press, 2008, pp. 579–586. [88] Y. Yang, X. Wu, X. Zhu, Mining in anticipation for concept change: Proactive-reactive prediction in data streams, Data Min. Knowl. Discov. 13 (3) (2006) 261–289. [89] J. Kleinberg, Bursty and hierarchical structure in streams, in: KDD ’02: Proc. of the 8th ACM SIGKDD int. conf. on Knowledge discovery and data mining, ACM, 2002, pp. 91–101.


Appendix A. Handling Concept Drift in Application Examples: An Overview of Published Work

Following the categorization of applications presented in this paper, we provide an overview of examples of application tasks found in the research literature.

Monitoring and Control

Monitoring and control applications can typically be characterized as data streams. The data volumes are large and the data needs to be processed in real time. Two types of tasks can be distinguished: prevention of and protection against adversary actions, and monitoring for management purposes.

Monitoring against adversary actions. Monitoring against adversary actions can be seen as an unsupervised learning task or as one-class classification, where the properties of ‘normal behavior’ are well defined, while the properties of attacks can differ and change from case to case. Classes are typically highly imbalanced, with few real attacks.

Computer security. Intrusion detection is one of the typical monitoring problems. The task is to detect unwanted access to computer systems, mainly through a network (e.g. the internet). There are passive intrusion detection systems, which only detect and alert the owner, and active systems, which take protective action. In both cases concept drift relates only to the detection part. The primary source of concept drift is adversary actions: the attackers try to invent new ways of attacking that would overcome the existing security. The secondary source of concept drift is technological progress over time, as more advanced and powerful machines are developed and become accessible to intruders. Besides, ‘normal’ behavior can also change over time. Lane and Brodley [22] explicitly formulated the problem of concept drift in intrusion detection a decade ago. They presented a detection system using instance-based learning. Current research directions and problems in intrusion detection can be found in a general review [23]. Among supervised learning approaches, ensemble techniques have lately been proposed [24]. Artificial immune systems, which mimic the adaptivity of natural immune systems, are also popular techniques for intrusion detection [25].


Biometric authentication. In biometric authentication [26, 27] concept drift can be caused by changing physiological factors, for example a growing beard. As in credit applications, adaptivity of the algorithms should be used with caution here, due to potential adversary behavior. Knowing that the system is adaptive, someone might drive it to adapt in such a way that the security is breached.

Telecommunications. From the research perspective, the mobile masquerader detection problem [28] is closely related to intrusion detection. The goal is to prevent adversaries from gaining unauthorized access to private data. There are again two sources of concept drift: adversary behavior aimed at overcoming the control, and the changing behavior of legitimate users. Fraud detection and prevention in the telecommunication industry [29] is also subject to concept drift for similar reasons.

The problem of spam (content or link) detection [30, 31] also has connections to masquerade detection, as e-mail, web, blogosphere and recommender system spammers try to find ways to mimic normal behavior. However, e-mail spam types are subject to seasonality and to the popularity of topics or merchandise. There is a drift in the amount of spam over time, as well as in the content of the classes [32]. Besides, the personal interpretation of what is spam might differ and change. Therefore the spam detection task is also naturally linked to the personal assistance application type.

Finance. Data mining techniques are employed to monitor streams of financial transactions (credit cards, internet banking) to alert for possible fraud or insider trading. A general view of the fraud detection task with an emphasis on concept drift can be found in [33, 34, 35].

Monitoring for management. Monitoring for management typically uses streaming data from sensors. The tasks are characterized by high volumes of data and real-time decision making, but typically there are no adversary actions.

Transportation. Traffic management systems use data mining to determine traffic conditions or states [36], e.g. car density in a particular area, or accidents. Traffic control centers are the end users of such systems. Transportation systems are dynamic, and traffic patterns change seasonally as well as permanently, thus the systems have to be able to handle concept drift.


Data mining can also be employed for predicting public transportation travel time [37], which is relevant for scheduling and planning. The task is also subject to concept drift due to traffic patterns, human driver factors and irregular seasonality.

Positioning. Concept drift might occur in remote sensing at fixed geographic locations. Interactive road tracking is an image understanding system that assists a cartographer in annotating road segments in aerial photographs [38]. In this problem change detection comes into play when generalizing to different roads over time. In place recognition [39] or activity recognition [40] the dynamics of the environment cause concept drift in the learned models.

Climate patterns, such as floods, are expected to be stationary, but the detection systems have to incorporate irregularly reoccurring contexts. In the light of climate change the systems might benefit from adaptive techniques, for instance sliding window training [41]. In [42] the authors use active learning of a non-stationary Gaussian process for river monitoring.

Industrial monitoring. In production monitoring the human factor can be a source of concept drift. Consider a boiler used for heat production. When the fuel is fed into the system manually, the fuel feeding and burning stages might depend on the individual habits of the boiler operator [20]. The control task is to identify the start and end of the fuel feeding. The techniques need to be equipped with mechanisms to handle concept drift.

In service monitoring the changing behavior of the users can be a source of drift. For example, data mining is used to detect accidents or defects in a telecommunication network [43]. A change in call volumes may be the result of an increased number of people trying to call friends or family to tell them what is happening, or of a decrease in network usage caused by people being unable to use the network. Or the change might be unrelated to the telecommunication network altogether. The fault detection techniques need to be able to handle such anomalies.

Personal Assistance and Information

These applications mainly organize and personalize information. There are no global labels; the labels differ from individual to individual. Applications can be grouped into individual assistance for personal use, and customer profiling for business (marketing).


Personal assistance. Personal assistance applications perform user modeling. They aim to personalize the flow of information; the process is often called information filtering. A rich technical presentation on user modeling can be found in [44]. One of the primary applications of user modeling is the representation of queries, news and blog entries with respect to current user interests. Changing user interests over time are the main cause of concept drift.

A large part of personal assistance applications relate to textual data. The problem of concept drift has been addressed in news story classification [45, 46] and document categorization [47, 48, 49]. The authors of [50] address the issue of reoccurring contexts in the light of changing user interests. Drifting user interests are relevant in building personal assistance in digital libraries [51] or a networked media organizer [52].

In addition, there is a large body of research addressing web personalization and dynamics [53, 54, 55, 56], which is also subject to drifting user interests. In contrast to the end-user text mining discussed above, here mostly interim system data (logs) are mined.

Customer profiling. Customer profiling uses aggregated data from many users. The goal is to segment the customers based on their interests and needs. Since individual interests change over time, data mining techniques should take these changes into account.

Direct marketing is one of the applications. Adaptive data mining methods are used in customer segmentation based on product preferences, for example cars [36], or service usage, for example telecommunications [57]. Lately, in addition to similarity measures between individual customers, social network analysis has been employed in customer segmentation [58]. It is observed that user interests do not evolve simultaneously: users who had similar interests in the past might no longer share interests in the future. The authors model this as an evolving graph.
Adaptivity is also relevant to association rule mining applied to shopping basket identification and analysis [59].

Automatic recommendations can be related to both customer profiling and personal assistance. The visibility of recommender systems research increased rapidly with the Netflix movie recommendation competition. The winners used the temporal aspect as one of the keys to the problem [60, 61]. Three sources of drift were taken into account: movie biases (popularity changes over time), user biases (natural drift of a user's rating scale, which was benchmarked against recent ratings) and changes in user preferences. In earlier works on recommender systems [62] changes over time were handled via time weighting.

Management and strategic planning

Management applications typically relate to predictive analytics tasks: demand prediction, travel time prediction, event prediction (e.g. crime maps, epidemic outbreaks).

Finance. Bankruptcy prediction or individual credit scoring is typically considered to be a stationary problem [63]. However, in these problems concept drift is closely related to a hidden context [64], that is, to changes in a context that is not observed or measured in the original model. The need for different models for bankruptcy prediction under different economic conditions was acknowledged in [65], where such models were proposed. The need for models to be able to deal with non-stationarity has rarely been acknowledged [66]. The problem is that, although concept drift is present, adversaries might exploit full adaptivity of the models. In such a case offline adaptivity, restricted to already seen subtypes of customers, might be a solution [67].

Macroeconomics. Concept drift is relevant in making macroeconomic forecasts [68] and in predicting the phases of a business cycle [69]. The data drifts primarily due to the large number of influencing factors, which cannot all feasibly be included in prediction models. For the same reason financial time series are known to be non-stationary and hence difficult to predict [70].

In business management, in particular software project management, careful planning can be inaccurate if concept drift is not taken into account. In [71] data mining models equipped with concept drift handling techniques are employed for project time prediction.

Biomedical applications. Biomedical applications can be subject to concept drift due to the adaptive nature of microorganisms [72, 6]. The effect of antibiotics on a patient often naturally diminishes over time, since microorganisms mutate and evolutionarily develop antibiotic resistance.
If a patient is treated with antibiotics when it is not necessary, resistance might develop and antibiotics might no longer help when they are really needed. Clinical studies and systems need mechanisms for adapting to changes caused by human demographics [73, 74]. Changes in disease progression can also be triggered by changes in the drug being used [75]. In incremental drug discovery experiments the drift between training and testing sets can be caused by non-uniform sampling [76]. Data mining can be used to discover emerging resistance and to monitor nosocomial infections in hospitals (infections which result from the treatment) [77]. Given patient and microbiology data as input, the task is to model the resistance. The resistance patterns change over time.

Ubiquitous applications

In ubiquitous applications the problem of concept drift is often referred to as a dynamic environment. The objects learn how to interact with the environment, and since the environment is changing, the learners need to be adaptive.

Mobile systems and robotics. Ubiquitous knowledge discovery (UKD) deals with distributed and mobile systems operating in a complex, dynamic and unstable environment. The word ‘ubiquitous’ here means distributed across many places at the same time. Navigation systems, vehicle monitoring, household management systems and music mining are examples of UKD. A winning entry in the 2005 DARPA navigation challenge used online learning to classify road images into drivable and not drivable [78]. The authors used an adaptive mixture of Gaussians: gradual adaptation was achieved by adjusting the internal Gaussians, and rapid adaptation by replacing Gaussians with new ones. The required speed of adaptation depends on the road conditions. Adaptivity to a changing environment has been addressed in robotics [79], for instance in designing a player for robot soccer [80].

Intelligent systems. ‘Smart’ home systems [81] or intelligent household appliances [82] need to be adaptive to a changing environment and changing user needs.

Virtual reality. Virtual reality applications need mechanisms to take concept drift into account. In computer game design [83] adversary actions of the players (cheating) might be one of the drift sources. In flight simulation the strategies and skills differ across users [64].

Summary

In the table below we summarize the discussed applications with concept drift.

| Categories | Applications | References |
|---|---|---|
| **Monitoring and Control** | | |
| against adversaries | computer security: intrusion detection | [22, 24, 25] |
| | telecommunications: intrusion detection, fraud | [28, 29] |
| | finance: fraud, insider trading | [84, 85] |
| for management | transportation: traffic management | [36, 37] |
| | positioning: place, activity recognition | [38, 39, 40] |
| | industrial mon.: boiler control, telecom mon. | [20, 43] |
| **Assistance and Information** | | |
| personal assistance | textual information: news, document classification | [45, 46, 47, 48, 49] |
| | textual information: spam categorization | [30, 31] |
| | web: web personalization | [53, 54, 55, 56] |
| | web: libraries, media | [51, 52] |
| customer profiling | marketing: customer segmentation | [36, 57, 58, 59] |
| | recommender systems: movie recommendations | [60, 61, 62] |
| information | document organization: articles, mail | [86, 87, 88, 89] |
| | economics: macroeconomics, forecasting | [68, 69, 70] |
| | project management: software project mgmt. | [71] |
| **Decision Making** | | |
| finance | creditworthiness: bankruptcy prediction | [65, 66, 67] |
| biomedicine | drug research: antibiotic res., drug disc. | [6, 76, 77] |
| | clinical research: disease monitoring | [73, 74, 75] |
| security | authentication: biometrics | [26, 27] |
| **AI and Robotics** | | |
| | mobile systems: robots, vehicles | [78, 79, 80] |
| | intelligent systems: ‘smart’ home, appliances | [81, 82] |
| | virtual reality: computer games, flight sim. | [83, 64] |

Appendix B. Effects of three different change detection mistakes on the performance of online mass flow estimator

[Figure: False positive case. Axes: Time, s (x 10^4) versus Mass, g (x 10^4).]

[Figure: ‘Early’ bias example. Axes: Time (x 10^4) versus Mass, g (x 10^4). Legend: original signal, predicted signal, detected consumption-to-feeding change, detected feeding-to-consumption change, detected outlier.]

[Figure: ‘Late’ bias example. Axes: Time versus Mass, g (x 10^4). Legend: original signal, predicted signal, detected consumption-to-feeding change, detected feeding-to-consumption change, detected outlier.]
