D2PM: Domain Driven Pattern Mining

Cláudia Antunes
Instituto Superior Técnico

1 Introduction
One of the main unsolved problems in the knowledge discovery process is the inability to use background knowledge. This limitation prevents the process from focusing on user expectations and reduces its usefulness, since results tend to be dominated by already known information. Pattern mining is a paradigmatic case: it aims at discovering patterns that occur frequently in a set of entities, where a pattern is a set of elements that occur according to a specific structure. In recent years, the mining community has turned its attention to aligning discovered information with user expectations, trying to make existing methods domain driven. In this context, one can consider either algorithms dedicated to specific domains, or general methods able to incorporate domain knowledge to guide the mining process.
Throughout the short history of data mining research, constraints have been seen as the most effective way to approach this problem. Indeed, they allow the mining process to focus on sub-spaces where useful information is likely to be found. However, they have seldom been used in real problems, and each application uses its own specific syntax and semantics. Nevertheless, recent work has shown that background knowledge can be used to guide the mining process, in particular in the discovery of frequent sequential patterns (Antunes, et al., 2005). Following those results and the developments in knowledge representation (ontologies in particular), research has centered on using these tools to incorporate background knowledge into the mining process. The Onto4AR framework (Antunes, 2008) is the first framework designed entirely to achieve that goal; it shows good prospects, along with an equal number of challenges.
The D2PM (Domain Driven Pattern Mining) project aims to extend that framework in order to create an environment where any kind of pattern can be discovered using the available background knowledge. This extension will address three aspects of pattern mining: transactional pattern mining, sequential pattern mining, and the consideration of temporal factors in pattern mining. The Onto4AR framework presents a new formulation of pattern mining that redefines the notion of pattern. In particular, it leaves to the user the responsibility of choosing, from a set of pre-defined general constraints, the constraints to apply for identifying patterns. To broaden its application, new kinds of constraints should be developed, including constraints based on the axioms present in the domain ontology. Besides defining these new constraints, efficient algorithms to deal with them are mandatory in order to reduce processing times. To include sequential pattern mining, the goal is to extend the framework so that it deals interchangeably with transactional and sequential patterns. To reach this goal, it is necessary to consider a meta-ontology in which sequential constraints, such as the ones defined by automata, can be easily specified. The last issue is dealing with events, which means dealing with time. Events occur naturally in temporal sequences, where the different states of a set of entities are recorded, but they can also be considered in transactional pattern mining, in order to select different subsets of transactions for the discovery process. The goal here is to enrich the framework with a time ontology to define domain events, which can then be used to constrain the pattern mining process.
Our team has considerable experience and a significant number of publications on pattern mining, and the primary investigator has worked on this topic, as her main research interest, for about twelve years. In particular, these problems were addressed in her Master's work, which deployed a pattern mining application in the Low Vision Consultation at Santa Maria Hospital, as part of a collaboration project between that institution and IST. Following this project, sequential pattern mining methods and the use of constraints were developed and studied in her PhD. Besides the scientific contributions, an application with the developed methods was deployed in a Portuguese retail company, again as part of a collaboration project. In the last year, the primary investigator and investigator Andreia Silva developed a new approach to apply pattern mining to bibliographic data, in the context of TELplus, a project funded by the European Commission (ECP-2006-DILI-510003).
This experience, combined with the team's background in knowledge representation, which stems from the investigators' roots in Artificial Intelligence and from the training of all team members in related subjects, supports a high likelihood of success for the D2PM project.

2 Literature Review
The advances in the field of knowledge discovery and the increase in computational power have created new expectations about data mining results. Nowadays, discovering information is not enough, since the results include trivial, already known information buried among large amounts of output. In the last two decades pattern mining has gained considerable attention, being applied mostly to market basket analysis. Pattern mining (PM) is an unsupervised task that tries to capture existing dependencies among attributes and their values, described as association rules. The problem was first introduced in 1993 (Agrawal, et al., 1993) and is defined as the discovery of “all association rules that have support and confidence greater than a user-specified minimum support and minimum confidence, respectively”. An association rule is an implication of the form A → B, where A and B are propositions (sets of attribute/value pairs), expressing that when A occurs, B also occurs with a certain probability (the rule's confidence). The support of the rule is given by the relative frequency of instances that include A and B simultaneously. These rules are discovered by first identifying the sets of propositions that occur together, usually known as patterns, and then combining them to define the implications. With the advances in this area, we can consider either transactional patterns, corresponding to sets of elements that appear together, or structured patterns, corresponding to sets of elements that follow a specific structure (such as sequences or graphs). The problems of PM are well studied, and most of them derive from the explosion in the number of discovered patterns as the minimum support decreases, combined with the human inability to deal with so many patterns. However, research in PM has mostly focused on developing fast algorithms, and relatively little has been done to control this problem. Indeed, the consensual solution is to incorporate background knowledge (BK) in the mining process (Yang, et al., 2006), but the main efforts in this area have been directed at defining constraint languages specific to pattern mining, which do not help to generalize the use of pattern mining in real situations. It is undeniable that several works claim to include BK in the mining process; however, most of them are simple tools that allow the user to control the process. These tools are a considerable improvement, since they center the process on user expectations and expertise, but they do not represent any knowledge in an explicit form. As a result, only experts can take advantage of them. Other approaches use BK to define interestingness measures that filter out uninteresting patterns, but this is a case-by-case solution and, once more, there is no explicit representation of knowledge. An exception to this picture is the work on inductive databases, which inherits most of the know-how acquired by inductive logic programming research. Nevertheless, the most popular approach is the use of constraints to reduce the scope of the mining process (Bayardo, 2005).
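To make the definitions of support and confidence concrete, here is a minimal Python sketch over a hypothetical toy basket dataset; the items and the function names `support`, `confidence` and `frequent_itemsets` are illustrative and not part of any framework described here. The level-wise enumeration follows the classic Apriori idea of discarding candidates that have an infrequent subset, and counting its results for decreasing minimum supports shows, on a very small scale, the pattern explosion mentioned above.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]


def support(itemset, db):
    """Relative frequency of transactions containing every item of `itemset`.
    Fractions keep the values exact, so threshold comparisons are not
    disturbed by floating-point rounding."""
    itemset = frozenset(itemset)
    return Fraction(sum(1 for t in db if itemset <= t), len(db))


def confidence(antecedent, consequent, db):
    """Confidence of A -> B: support of A and B together divided by support of A."""
    return support(set(antecedent) | set(consequent), db) / support(antecedent, db)


def frequent_itemsets(db, min_sup):
    """Level-wise (Apriori-style) enumeration of all itemsets with support >= min_sup."""
    items = sorted(set().union(*db))
    level = [frozenset([i]) for i in items if support([i], db) >= min_sup]
    result = list(level)
    k = 2
    while level:
        previous = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: a candidate with an infrequent (k-1)-subset cannot be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c, db) >= min_sup]
        result.extend(level)
        k += 1
    return result


print(support({"bread", "milk"}, transactions))        # 3/5
print(confidence({"bread"}, {"milk"}, transactions))   # 3/4
for min_sup in (Fraction(3, 5), Fraction(2, 5), Fraction(1, 5)):
    print(min_sup, len(frequent_itemsets(transactions, min_sup)))  # 3/5 6, then 2/5 7, then 1/5 15
```

Lowering the threshold from 3/5 to 1/5 raises the number of frequent itemsets from 6 to 15 on only five transactions; on real data the growth is far steeper.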
In the constraint-based approach, constraints represent the user's expectations or knowledge and are used to guide the mining process. However, despite their potential, constraints have most of the time been used not to represent BK but to restrict queries; see, for example, the work by Feldman and Hirsh on mining associations in text (Feldman, et al., 1996) and the work on mining frequent patterns (Bayardo, et al., 1999), (Antunes, et al., 2003). Exceptions are few, but they justify good expectations. The use of Bayesian networks to represent BK in order to post-prune already known patterns (Jaroszewicz, et al., 2004) is a great advance. Indeed, these networks capture relations and their probabilities, and can be seen as simplified ontologies (where only concepts and relations are represented). Another case is in the field of sequential PM, where SPIRIT (Garofalakis, et al., 1999) represents knowledge with regular expressions, and (Antunes, et al., 2002) represents it with context-free languages. Additionally, constraint relaxations were proposed to allow for the discovery of unknown patterns (Antunes, et al., 2005). The Inductive Queries for Mining Patterns and Models project, funded by an IST programme (see http://iq.ijs.si/IQ/), tries to define a theoretical framework for data mining, using inductive databases and constrained mining. In particular, it explores declarative forms of querying databases instead of procedural ones. However, its most important results are in the area of clustering, and its usage demonstrations were done almost exclusively in the field of bioinformatics.
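The sketch below illustrates, in Python, how a regular-expression constraint of the kind used by SPIRIT can be checked with a deterministic finite automaton. The expression `a b* c`, the dictionary encoding of the automaton and the function name `accepts` are assumptions made for illustration. SPIRIT itself pushes the constraint into candidate generation to different degrees, whereas this sketch only filters sequences that were already discovered.

```python
# Hypothetical DFA for the regular expression  a b* c  over item symbols.
# State 0: start; state 1: 'a' seen (loops on 'b'); state 2: accepting, 'c' seen.
DFA = {
    "start": 0,
    "accepting": {2},
    "delta": {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2},
}


def accepts(dfa, sequence):
    """Run the automaton over the sequence; a missing transition means rejection."""
    state = dfa["start"]
    for symbol in sequence:
        state = dfa["delta"].get((state, symbol))
        if state is None:
            return False
    return state in dfa["accepting"]


# Keep only the candidate sequential patterns that satisfy the constraint.
candidates = [("a", "c"), ("a", "b", "b", "c"), ("b", "c"), ("a", "b")]
valid = [s for s in candidates if accepts(DFA, s)]
print(valid)  # [('a', 'c'), ('a', 'b', 'b', 'c')]
```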

A parallel effort has been made by our team in the definition of a framework for PM. The Onto4AR framework (Antunes, 2008) is centered on the use of a domain ontology and assumes a new formulation of the PM problem, in which the meaning of an item is clearly defined in the domain of application. The central idea is to provide an environment where PM can be performed by anyone. In this context, pre-defined constraints exist and others can be added, giving the user the ability to control the mining process. When considering PM, the discovery of temporal patterns is another important aspect that calls for attention. Indeed, cyclic (Ozden, et al., 1998) and calendric association rules (Ramaswamy, et al., 1998) were the first proposals to relate content and temporal co-occurrences, identifying association rules that hold only in a portion of the database. However, again, they do not incorporate any knowledge about time in an explicit form. Only more recent work has tried to introduce some knowledge in the discovery of such rules, in particular the OntoInterleaved algorithm (Antunes, 2007). Although this algorithm was defined in the context of a time ontology, it is not applied to structured data, and its performance can likely be improved.
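Cyclic association rules (Ozden, et al., 1998) split the data into consecutive time units and look for rules that hold in every l-th unit starting at offset o, that is, rules with cycle (l, o). The Python sketch below assumes that the per-unit test (whether the rule reaches minimum support and confidence inside each unit) has already been computed into a hypothetical boolean list `holds`. It enumerates candidate cycles by brute force, which is simpler than (and not the same as) the algorithms of the original paper, and then drops cycles that are merely multiples of a shorter one. The function names are illustrative.

```python
def find_cycles(holds, l_min, l_max):
    """Return every cycle (l, o), l_min <= l <= l_max and 0 <= o < l, such that the
    rule holds in all time units i with i % l == o.
    `holds[i]` is True when the rule meets the support/confidence thresholds in unit i."""
    cycles = []
    for length in range(l_min, l_max + 1):
        for offset in range(length):
            units = range(offset, len(holds), length)
            if len(units) > 0 and all(holds[i] for i in units):
                cycles.append((length, offset))
    return cycles


def drop_multiples(cycles):
    """Remove (l, o) when a shorter cycle (l2, o2) with l % l2 == 0 and
    o % l2 == o2 already implies it."""
    return [(l, o) for (l, o) in cycles
            if not any(l2 < l and l % l2 == 0 and o % l2 == o2 for (l2, o2) in cycles)]


# Hypothetical example: 12 monthly units, the rule holds in units 1, 4, 7 and 10.
holds = [i % 3 == 1 for i in range(12)]
all_cycles = find_cycles(holds, 2, 6)
print(all_cycles)                  # [(3, 1), (6, 1), (6, 4)]
print(drop_multiples(all_cycles))  # [(3, 1)]
```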

3 D2PM Framework
Incorporating domain knowledge in the mining process is a problem pursued since the early days of data mining, but only a few approaches have been proposed and none of them solves the problem entirely. The D2PM project proposed here does not intend to solve the entire data mining problem; it aims to propose a solution for pattern mining in the presence of domain knowledge, without introducing new languages or unfamiliar mechanisms. In summary, the central goal of the D2PM project is to develop a framework that supports the entire process of pattern mining guided by domain knowledge, covering its wide range of expressions (from transactional to structured patterns). The project involves the definition of a new framework, D2PM, as an extension of the Onto4AR framework that keeps the knowledge base represented through a domain ontology and provides a set of pre-defined constraints that allow for the guided discovery of information. In addition, it will provide a set of efficient algorithms that use those constraints and the existing knowledge to mine patterns, from transactional to sequential ones, either considering or ignoring temporal aspects.
The previous framework already presents considerable advantages over other approaches (described in the Literature Review section). First, existing knowledge is represented using the most consensual tool from the knowledge representation area: ontologies. These formalisms are able to represent the relevant concepts within a domain and the relationships between them, and can be used to describe that domain, including the properties of each concept. Being generally accepted, domain ontologies now exist in almost all domains, from general to detailed descriptions of particular domains. In this manner, existing ontology libraries can be used to represent the available domain knowledge in a wide range of application areas. Naturally, if there is no ontology for a specific domain, one can be created with the help of a domain expert. The second feature of the Onto4AR framework that deserves attention is the existence of a set of pre-defined constraints to guide the mining process. These are general constraints that can be applied to any domain, and each one imposes a condition on the characteristics of the patterns. In particular, two types of constraints were considered: interestingness measures and content constraints. Content constraints require the existence of a given number of relations among the items in the pattern (from taxonomical to non-taxonomical relations). The framework is prepared to incorporate any new constraint that fits these two categories, as well as new kinds of constraints (for example, temporal constraints), but this requires the development of efficient algorithms that use them. By providing pre-defined constraints, the framework avoids the need for a specific constraint language; and by using domain-independent constraints, it does not threaten the discovery of unknown information. A last advantage worth mentioning is the control of the explosion in the number of discovered patterns. Indeed, by using the domain ontology to define actual constraints, the number of discovered patterns is reduced and, consequently, processing times improve. As a direct result of the reduction in the number and scope of discovered patterns, the alignment between discovered information and user expectations is enhanced.
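As an illustration of a content constraint that requires a minimum number of relations among the items of a pattern, the Python sketch below uses a hypothetical mini-ontology: a taxonomy encoded as child-to-parent links plus a few non-taxonomic relations. The concept names, the rule that two concepts are taxonomically related when their is-a chains share a concept, and the helper names are assumptions for illustration, not the constraint definitions of Onto4AR.

```python
from itertools import combinations

# Hypothetical mini-ontology: is-a links (child -> parent) and non-taxonomic relations.
IS_A = {"whole_milk": "milk", "skim_milk": "milk", "milk": "dairy",
        "butter": "dairy", "bread": "bakery", "jam": "spread"}
RELATIONS = {("bread", "butter"): "eaten_with", ("bread", "jam"): "eaten_with"}


def ancestors(concept):
    """All superconcepts of `concept`, from the direct parent upwards."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def related(x, y):
    """Taxonomic: the is-a chains of x and y (themselves included) share a concept.
    Non-taxonomic: the pair is linked by a relation in either direction."""
    taxonomic = bool(({x} | set(ancestors(x))) & ({y} | set(ancestors(y))))
    return taxonomic or (x, y) in RELATIONS or (y, x) in RELATIONS


def min_relations(k):
    """Content constraint: at least k pairs of items in the pattern are related."""
    def check(pattern):
        return sum(1 for x, y in combinations(sorted(pattern), 2) if related(x, y)) >= k
    return check


constraint = min_relations(2)
print(constraint({"bread", "butter", "jam"}))  # True: bread-butter and bread-jam
print(constraint({"whole_milk", "jam"}))       # False: no related pair
```

Because the constraint only reads the ontology, the same `min_relations` check could in principle be reused with any other domain ontology encoded the same way.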

CHALLENGES AND CONTRIBUTIONS
Despite the advantages pointed out, several major issues have not yet been addressed in the previous framework, and addressing them is the goal of this project. The first contribution of the D2PM project is an integrated environment to apply pattern mining in all of its expressions. The main idea is to extend the referred framework to allow the discovery of transactional, sequential and more general structured patterns, including temporal information. To do this, a pattern should be defined as a generic concept following the same philosophy: a pattern is a bag of instances or concepts that satisfies a set of constraints defined over the ontology that represents the available domain knowledge. This new formulation is based on the definition of the constraints that distinguish the different expressions of patterns, called structural constraints. For example, a transactional pattern corresponds to a bag of instances or concepts with no repetitions that satisfies at least the minimum support constraint.
The second major contribution of the project will be the definition of a set of generic, domain-independent constraints defined over a domain ontology. The goal is to provide a set of pre-defined constraints that can be instantiated, with minor effort, for any domain represented in an ontology. In particular, two kinds of constraints will be considered: content and temporal constraints. While content constraints restrict the items present in discovered patterns, temporal constraints reduce the scope of the data considered for supporting the discovered patterns. Again, the strategy involves extending already proposed constraints, both for transactional (Antunes, 2008) and for sequential patterns ((Garofalakis, et al., 1999) and (Antunes, et al., 2002)), namely through the definition of axiom-based constraints. However, in the latter case, it will be necessary to develop new mechanisms to represent formal languages over the meta-ontology referred above and the ontology that represents the domain knowledge. In terms of temporal constraints, the main goal of the project is to allow the exploration of the dynamic aspects of entities, instead of only their static characteristics. In this context, our framework should provide constraints for temporal issues, such as the existence of cycles, or the fact that some transactions only occur in certain periods of time. Along with temporal constraints, mechanisms for identifying the impact of some events on the behavior of entities would be a considerable advance in the discovery of temporal information.
The third major contribution of this project will be the creation of efficient algorithms to deal with the provided constraints. The first issue to be addressed is that several of those constraints, for example content constraints, are not anti-monotonic. Indeed, efficient pattern mining algorithms generally exploit the anti-monotonic property of the minimum support constraint, and adapting them to deal with other constraints is in some cases very difficult. However, algorithms for constrained sequential pattern mining have shown that, even in the presence of non-anti-monotonic constraints, they keep their efficiency at acceptable levels (Antunes, et al., 2005).
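The difference between anti-monotonic and non-anti-monotonic constraints can be checked by brute force on a small example. A constraint is anti-monotonic when every subset of a pattern that satisfies it also satisfies it. In the hypothetical Python sketch below, the minimum support constraint passes this test, which is what allows Apriori-style algorithms to prune every superset of an infrequent pattern, whereas a content constraint such as "contains a dairy item and a bakery item" fails it. The data, item categories and function names are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical data and item categories.
DB = [{"bread", "milk"}, {"bread", "butter", "milk"}, {"bread", "butter"}, {"milk", "butter"}]
DAIRY, BAKERY = {"milk", "butter"}, {"bread"}
UNIVERSE = {"bread", "milk", "butter"}


def subsets(items):
    """All subsets of `items`, including the empty set and `items` itself."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_anti_monotonic(constraint, universe):
    """True if, for every pattern X over `universe` that satisfies the constraint,
    every subset of X satisfies it as well (exhaustive, so only for tiny universes)."""
    return all(all(constraint(y) for y in subsets(x))
               for x in subsets(universe) if constraint(x))


def min_support(pattern, min_count=2):
    """Minimum support constraint, using an absolute count of transactions."""
    return sum(1 for t in DB if pattern <= t) >= min_count


def dairy_and_bakery(pattern):
    """Content constraint: the pattern must contain a dairy item and a bakery item."""
    return bool(pattern & DAIRY) and bool(pattern & BAKERY)


print(is_anti_monotonic(min_support, UNIVERSE))       # True
print(is_anti_monotonic(dairy_and_bakery, UNIVERSE))  # False: {bread, milk} passes, {milk} fails
```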
Building on the results obtained for constrained sequential pattern mining, new methods will be developed and existing ones adapted, keeping the efforts centered on pattern-growth methods (FP-growth (Han, et al., 2000), PrefixSpan (Pei, et al., 2001), GenPrefixSpan (Antunes, et al., 2003) and PrefixGrowth (Pei, et al., 2002)). In terms of exploring temporal information, the development of new algorithms will be based on the ideas present in GSP (Srikant, et al., 1996) and in the OntoInterleaved algorithm (Antunes, 2007). As a result of the D2PM project, we expect a better formulation of the entire problem of pattern mining, defined in a solid theoretical framework. In addition, we expect a set of efficient methods for incorporating domain knowledge into the core of the pattern mining process, for both content and temporal issues, together with a set of pre-defined constraints that can be used independently of the problem domain. In this manner, it should be possible to apply pattern mining as an effective unsupervised tool to gain information, keeping the focus of the discovery process on user expectations and background knowledge.
3.1 Framework Definition (Task 1)
The goal of this task is to define the new framework (the D2PM framework), considering all the expressions of pattern mining, including temporal aspects of data. This will be achieved by combining the formulation of pattern mining in the Onto4AR framework with the use of events instead of the usual transactions. An event differs from a transaction in that an event is a pair (set of propositions, instant of time) instead of (set of propositions, timestamp). Indeed, replacing the timestamp with a specific instant of time makes it possible to introduce all the temporal information about the occurrence of the event, and to use that information to enrich the mining process. To cover all kinds of patterns, from transactional to structured ones, in the same framework, it is necessary to define the constraints that patterns must satisfy in order to belong to each of those types. These constraints, called structural constraints, will be defined by considering a meta-ontology where the basic concepts related to sequences and other structures are represented. Since sequences are one of the simplest structures, and research on other structures usually builds on the results obtained for sequences, the project will only consider sequences, while keeping the goal of creating a broad, unified framework able to deal with any kind of pattern. In this manner, the result of this task will comprise the basic definitions of all the terms considered in pattern mining, including the ones needed to include temporal information. From a purely theoretical point of view, this task will create an ontology for the process of domain driven pattern mining, where all the concepts and the relations among them are precisely defined, and where the way constraints use the knowledge represented in the domain ontology is also made clear. Due to its nature, the task will be mainly conducted by the primary investigator and the PhD students (Andreia Silva and BI1). Naturally, master students should be involved in the discussion and in the whole decision process, in order to guarantee the actual integration of all aspects (constraints and algorithms) covered by the framework. The development and validation of the new framework will be revisited according to the difficulties identified in the other two tasks. At the end of the task, a paper describing the new formulation of pattern mining should be published in a scientific journal.
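One possible way to represent events as pairs (set of propositions, instant of time), and to derive from them both a sequential view and a temporally filtered transactional view, is sketched below in Python. The `Event` class, its `entity` field, the chosen calendar granules and the helper names are hypothetical; in the D2PM framework such temporal notions would come from a time ontology rather than from hard-coded `datetime` attributes. The `is_transactional` helper encodes only the structural constraint stated earlier for transactional patterns (a bag with no repetitions).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    entity: str              # hypothetical: the entity the event refers to (e.g. a customer)
    propositions: frozenset  # items or attribute/value pairs observed together
    instant: datetime        # the full instant of time, not just an ordering timestamp

    def granules(self):
        """Calendar facts derivable from the instant (hypothetical selection)."""
        return {"year": self.instant.year, "month": self.instant.month,
                "weekday": self.instant.weekday(),  # Monday = 0 ... Sunday = 6
                "hour": self.instant.hour}


def as_sequences(events):
    """Sequential view: for each entity, its sets of propositions ordered by instant."""
    sequences = {}
    for event in sorted(events, key=lambda e: e.instant):
        sequences.setdefault(event.entity, []).append(event.propositions)
    return sequences


def as_transactions(events, keep=lambda event: True):
    """Transactional view restricted to the events accepted by a temporal predicate."""
    return [event.propositions for event in events if keep(event)]


def is_transactional(pattern):
    """Structural constraint for transactional patterns: a bag with no repetitions."""
    return len(set(pattern)) == len(pattern)


events = [
    Event("c1", frozenset({"bread"}), datetime(2009, 3, 7, 10)),           # a Saturday
    Event("c1", frozenset({"milk", "butter"}), datetime(2009, 3, 2, 18)),  # a Monday
    Event("c2", frozenset({"bread", "jam"}), datetime(2009, 3, 8, 9)),     # a Sunday
]
print(as_sequences(events)["c1"])  # the Monday itemset comes before the Saturday one
weekend = as_transactions(events, keep=lambda e: e.granules()["weekday"] >= 5)
print(len(weekend))  # 2
```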
Throughout the design of the D2PM framework, a technical report should be produced each semester, describing the weak points of the framework in covering the requirements of each expression of pattern mining and proposing new redefinitions.
3.2 Definition of Constraints (Task 2)
The second task will be dedicated to the definition of generic domain-independent constraints, defined over a domain ontology. The goal is to provide a set of pre-defined constraints that can be instantiated for any domain represented in an ontology. First, content constraints will be defined, for both transactional and sequential patterns. In particular, axiom-based constraints should be defined, in order to explore one of the most powerful tools in ontologies. These constraints should be useful, for example, for defining the equality among items or the validity of a pattern in the domain context. Additionally, the framework should be able to represent constraints like the ones defined by regular languages (Garofalakis, et al., 1999) and context-free languages (Antunes, et al., 2002). Although these languages are a natural way to represent background knowledge in the presence of sequential data, their definition is nontrivial, and the corresponding automata (DFA, deterministic finite automata, and PDA, push-down automata) tend to be very large and difficult for users to specify. Moreover, adapting them to deal with sequences of itemsets introduces significant difficulties in their usage. For these reasons, the D2PM project should use the meta-ontology referred above to represent the same knowledge, and provide mechanisms to define those constraints in that context.
A second category of constraints to be considered is the set of temporal constraints. Indeed, since the great majority of phenomena occur over time, the analysis of temporal data has been one of the goals of data mining since its beginning. However, traditional data mining operations are not able to deal with the intrinsic dynamic nature of such data, since they usually treat temporal data as unordered collections of events, ignoring temporal information. As seen before, they usually center their attention on the analysis of the data within transactions (intra-transactional analysis), instead of analyzing the relations between different transactions (inter-transactional analysis). In the last decade, the exploration of temporal data, usually called temporal data mining, has received considerable attention in the data mining community (Antunes, et al., 2001). Its main goal is to provide the ability to explore the dynamic aspects of entities, instead of only their static characteristics. In particular, it is desirable to infer cause-effect relations, allowing for an understanding of the evolution of the analyzed entities. In this context, our framework should provide constraints for considering temporal issues, such as the existence of cycles, or the fact that some transactions only occur in certain time intervals. Along with temporal constraints, mechanisms for identifying the impact of some events on the behavior of entities would be a considerable advance in the discovery of temporal information. In addition to these constraints, the framework must guarantee that the main goal of data mining, the discovery of unknown information, is not put at risk by such restrictive constraints (Hipp, et al., 2002).
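Context-free constraints go beyond what a finite automaton can check, because they may require matching arbitrarily nested pairs of events. The Python sketch below recognizes one hypothetical context-free language with a stack, in the way a push-down automaton would: every opening event (such as a hospital admission) must be closed by its matching event in properly nested order, while any other item may appear anywhere. The event names and the function name are assumptions; (Antunes, et al., 2002) works with general context-free grammars, which this hard-coded recognizer does not attempt to cover.

```python
# Hypothetical pairs of opening/closing events; the implied grammar is
#   S -> empty | x S | open_k S close_k S   (x = any other item)
PAIRS = {"admission": "discharge", "start_treatment": "end_treatment"}


def satisfies_nesting(sequence, pairs=PAIRS):
    """Stack-based (push-down) recognizer: opening events push their expected closer,
    closing events must match the top of the stack, and the stack must end empty."""
    closers = set(pairs.values())
    stack = []
    for item in sequence:
        if item in pairs:
            stack.append(pairs[item])
        elif item in closers:
            if not stack or stack.pop() != item:
                return False
        # any other item is a free symbol and leaves the stack unchanged
    return not stack


print(satisfies_nesting(["admission", "exam", "start_treatment",
                         "end_treatment", "discharge"]))                                  # True
print(satisfies_nesting(["admission", "start_treatment", "discharge", "end_treatment"]))  # False
```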
To keep the discovery of unknown information possible under restrictive constraints, a last category of constraints, constraint relaxations (Antunes, et al., 2005), should be considered in the new framework. Note that, although the proposed relaxations cover a wide range of possibilities, they were designed in the context of sequential pattern mining and have to be adapted to transactional data. This task will be coordinated by the PI; content constraints will be studied by Andreia Silva in the context of her PhD dissertation (in the first half of the task), and temporal constraints will be developed by BI1 in the context of his PhD. In the second half of the task, these researchers will be aided by BIC1 and BIC2, respectively, in implementing the mechanisms needed to incorporate those constraints. The results achieved will be used as input for Task 3.
3.3 Pattern Mining Algorithms (Task 3)
The third task is dedicated to the design and development of pattern mining methods that use background knowledge to guide the mining process. These new methods will use the new formulation of the pattern mining problem provided by Task 1 and the structure of constraints defined in Task 2. First, efficient methods for transactional patterns should be developed, in particular methods able to deal efficiently with non-anti-monotonic constraints. The main idea is to extend the FP-growth algorithm (Han, et al., 2000) to deal with content constraints. This work will be carried out by Carlos Jacinto.
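The sketch below is a simplified, hypothetical illustration of pattern growth for transactional patterns in Python: instead of building an FP-tree as FP-growth does, it recursively projects a plain list of transactions on each frequent item. Infrequent extensions are pruned thanks to the anti-monotonicity of support, while the content constraint, which is not assumed to be anti-monotonic, is only used to filter the output. The data, the constraint and the function name `pattern_growth` are assumptions for illustration, not the FP-growth extension planned in this task.

```python
from collections import Counter

# Hypothetical toy basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]


def pattern_growth(db, min_count, constraint=lambda pattern: True):
    """Depth-first pattern growth over projected databases (no FP-tree).
    Items are appended in a fixed (alphabetical) order, so each itemset is produced once.
    Returns {itemset: absolute support} for frequent itemsets satisfying `constraint`."""
    results = {}

    def grow(prefix, projected):
        counts = Counter(item for t in projected for item in t)
        for item in sorted(counts):
            if counts[item] < min_count:
                continue  # anti-monotonicity: no superset of this pattern can be frequent
            pattern = prefix | {item}
            if constraint(pattern):
                results[pattern] = counts[item]  # the constraint only filters the output
            # Project on `item`: keep transactions containing it, restricted to later items.
            grow(pattern, [{i for i in t if i > item} for t in projected if item in t])

    grow(frozenset(), [set(t) for t in db])
    return results


with_butter = pattern_growth(transactions, 2, lambda p: "butter" in p)
for pattern, count in sorted(with_butter.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), count)
```

Pushing the constraint deeper than an output filter (for instance, exploiting the fact that "contains butter" is monotonic) is precisely the kind of refinement that the planned algorithms would need to address.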

In parallel with this work on transactional patterns, existing algorithms that deal with content and temporal constraints in sequential patterns will be adapted to the new context. The challenge here is to consider constraints defined over ontologies instead of constraints defined as automata; this work will be developed by Andreia Silva. The third goal is to develop methods that use temporal information to guide the mining process. The first efforts will be dedicated to the design of methods to identify the impact of events on the evolution of sequences of events. This will be approached by combining a time ontology for representing known events with pattern-growth sequential pattern mining algorithms (such as PrefixSpan (Pei, et al., 2001)), and corresponds to the main goal of André Louçã's master research. Finally, BI1 will adapt the transactional algorithms (developed in the first step of this task) and the OntoInterleaved algorithm (Antunes, 2007) to deal with cyclic and event-based constraints. The design of these methods will be done by the primary investigator and researchers André Louçã and BI1. This task is expected to deliver a set of pattern mining algorithms, implemented as an integrated Java package that uses the packages resulting from the preceding tasks. This package will be accompanied by its documentation and a detailed technical report describing its architecture, methods and most important data structures. Each method will be developed following the spiral methodology, with several iterations of design, implementation and validation. Naturally, each step may produce more than one method, which should be described and published in international conferences. In addition, the methods should be evaluated on small case studies, which should also be published in international conferences.
In conclusion, at the end of the task we expect to have a software package able to mine patterns in transactional and sequential data effectively and efficiently, and at least 6 papers published in international conferences.
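To illustrate the pattern-growth principle behind PrefixSpan (Pei, et al., 2001), the Python sketch below mines sequential patterns by recursively projecting the database on each frequent prefix. It is deliberately simplified and hypothetical: each sequence element is a single item (PrefixSpan handles sequences of itemsets), and neither a time ontology nor any constraint is used. The sequence database and the function name `prefixspan` are illustrative.

```python
from collections import Counter


def prefixspan(db, min_count):
    """Simplified PrefixSpan for sequences of single items.
    Returns {pattern (tuple): number of sequences containing it}."""
    results = {}

    def mine(prefix, projected):
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))  # count each item at most once per sequence
        for item in sorted(counts):
            if counts[item] < min_count:
                continue
            pattern = prefix + (item,)
            results[pattern] = counts[item]
            # Projected database: what follows the first occurrence of `item` in each suffix.
            mine(pattern, [s[s.index(item) + 1:] for s in projected if item in s])

    mine((), [tuple(s) for s in db])
    return results


# Hypothetical sequence database (one sequence per entity).
sequences = [("a", "b", "c"), ("a", "c"), ("b", "c", "a"), ("a", "b")]
print(prefixspan(sequences, 2))
# {('a',): 4, ('a', 'b'): 2, ('a', 'c'): 2, ('b',): 3, ('b', 'c'): 2, ('c',): 3}
```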

4 Project Timeline and Management
4.1 Description of the Management Structure
Project management will be performed exclusively by the primary investigator, given its importance for the success of the project. Its goal is to guarantee that the milestones are achieved without delays and that the quality of the developed work meets the highest standards. This continuous task will be carried out through weekly meetings between the primary investigator and the other researchers, namely the PhD and Master students. In these meetings, the research paths will be discussed and defined, so that problems can be anticipated and solutions defined in advance. The primary investigator will use adequate project management tools (such as Gantt charts, risk analysis and cost estimation) to follow the project's progress. Along with these tools, the primary investigator will draw on her experience advising master students and co-advising undergraduate students: in seven years, she has advised ten dissertations and co-advised four dissertations, all completed on time and almost always graded “Very Good”. Along with the weekly meetings, technical reports and conference papers will be structured and reviewed by the primary investigator, so that quality control is performed in a timely and adequate way. This task will also include the organization of workshops over the three years, with two goals: to build a team in which researchers interact with each other, and to motivate team members. One of the main identified risks is that someone abandons the project, possibly due to a lack of motivation to be involved in a long-term project. To avoid this situation, all team members are, or will be next year, enrolled in related projects in the scope of their PhD and Master dissertations, which minimizes the probability of this risk.
4.2 Milestones
30-06-2010 M1 - D2PM definition
Delivery of the first version of the new framework, where the expressions of patterns are defined according to a set of structural constraints (1st step of Task 1). At this point, all the definitions needed to start the other tasks will be provided. A 1st technical report will be published.
28-02-2011 M2 - Constraints definition
Delivery of the first set of content and temporal constraints, for both transactional and sequential contexts (Task 2). In parallel, the implementation of the 1st version of the framework will be ready (Task 1). This milestone includes the first workshop.
31-08-2011 M3 - First algorithms
Algorithms for transactional pattern mining will be published (Task 3) and the implementation of the proposed constraints will be finished (Task 2). The proposal for sequential and temporal pattern mining algorithms (Task 3) will be presented at this time. A technical report should be presented.
31-03-2012 M4 - D2PM 1st revision
The revision of the framework will be presented, based on the requirements identified in Tasks 2 and 3. The corresponding implementations will also be concluded. A technical report describing the adaptations will be published. The second workshop will be organized.

30-09-2012 M5 - Sequential and Temporal PM algorithms
By M5, all the algorithms will have been proposed, implemented and validated. The last requirements to be considered in the framework will be identified and described in the 4th technical report.

31-12-2012 M6 - Delivery
M6 corresponds to the final delivery of the framework, including its final revision. Its description will be published in the final report and presented at the 3rd workshop. It should include an assessment of the advantages of the framework and point out the remaining open issues.

6 Scientific activity diffusion actions

In February of the 2nd and 3rd years, and at the end of the project, the research team will organize a workshop to present project results and other data mining challenges. These workshops are aimed at the project team and at undergraduate and graduate students of Information Systems and Computer Engineering who are interested in knowledge discovery in databases. During the project, the team will submit scientific papers to international conferences and journals in order to disseminate project results. In this way, the team will contribute to advancing the state of the art in pattern mining in general, and in the use of background knowledge in the mining process in particular.

Bibliography

Agrawal R., Imielinski T. and Swami A. Mining Association Rules between Sets of Items in Large Databases [Conference] // ACM SIGMOD Conf. on Management of Data. - 1993. - pp. 207-216.
Antunes C. An Ontology-based Framework for Mining Patterns in the Presence of Background Knowledge [Conference] // International Conference on Advanced Intelligence. - [s.l.] : Posts and Telecom Press, 2008. - pp. 163-168.
Antunes C. and Oliveira A.L. Inference of Sequential Association Rules Guided by Context-Free Grammars [Conference] // Proc. 6th Int'l Conf. Grammatical Inference. - [s.l.] : Springer, 2002. - pp. 1-13.
Antunes C. and Oliveira A.L. Constraint Relaxations for Discovering Unknown Sequential Patterns [Conference] // Knowledge Discovery in Inductive Databases / ed. Goethals Bart and Siebes Arno. - Pisa, Italy : Springer, 2005. - Vol. LNCS 3377. - pp. 11-32. - ISBN 3-540-25082-4.
Antunes C. and Oliveira A.L. Generalization of Pattern-Growth Methods for Sequential Pattern Mining with Gap Constraints [Conference] // Int'l Conf. on Machine Learning and Data Mining. - [s.l.] : Springer, 2003. - pp. 239-251.
Antunes C. Temporal Pattern Mining Using a Time Ontology [Book Section] // New Trends in Artificial Intelligence / ed. Neves J., Santos M. and Machado J. - Guimarães : [s.n.], 2007. - ISBN 978-989-95618-0-9.
Bayardo R.J. The Hows, Whys, and Whens of Constraints in Itemset and Rule Discovery [Conference] // Proc. of the Workshop on Inductive Databases and Constraint Based Mining. - 2005. - pp. 1-13.
Bayardo R.J., Agrawal R. and Gunopulos D. Constraint-Based Rule Mining in Large, Dense Databases [Conference] // 15th IEEE Int'l Conf. on Data Engineering. - [s.l.] : IEEE Press, 1999. - pp. 188-197.
Feldman R. and Hirsh H. Mining Associations in Text in the Presence of Background Knowledge [Conference] // 2nd International Conference on Knowledge Discovery in Databases and Data Mining. - Portland, Oregon, USA : ACM Press, 1996. - pp. 343-346.
Garofalakis M., Rastogi R. and Shim K. SPIRIT: Sequential Pattern Mining with Regular Expression Constraints [Conference] // Int'l Conf. Very Large Databases (VLDB'99) / ed. Atkinson Malcolm P. [et al.]. - Edinburgh, Scotland : Morgan Kaufmann, 1999. - pp. 223-234. - ISBN 1-55860-615-7.
Han J., Pei J. and Yin Y. Mining Frequent Patterns without Candidate Generation [Conference] // Proc. Int'l Conf. on Management of Data. - [s.l.] : ACM Press, 2000. - pp. 1-12.
Hipp J. and Güntzer U. Is pushing constraints deeply into the mining algorithms really what we want? [Journal] // SIGKDD Explorations. - [s.l.] : ACM Press, 2002. - 1 : Vol. 4. - pp. 50-55.
Jaroszewicz S. and Simovici D. Interestingness of Frequent Itemsets Using Bayesian Networks as Background Knowledge [Conference] // ACM Int'l Conf. on Knowledge Discovery and Data Mining. - [s.l.] : ACM Press, 2004. - pp. 178-186.
Ozden B., Ramaswamy S. and Silberschatz A. Cyclic association rules [Conference] // Int'l Conf. Data Engineering (ICDE 98). - [s.l.] : IEEE Press, 1998. - pp. 412-421.
Pei J. [et al.] PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth [Conference] // Int'l Conf. Data Engineering. - [s.l.] : IEEE Computer Society Press, 2001. - pp. 215-224.
Pei J. and Han J. Constrained frequent pattern mining: a pattern-growth view [Journal] // SIGKDD Explorations. - [s.l.] : ACM Press, 2002. - 1 : Vol. 4. - pp. 31-39.
Ramaswamy S., Mahajan S. and Silberschatz A. On the Discovery of Interesting Patterns in Association Rules [Conference] // 24th Int'l Conf. Very Large Data Bases (VLDB 98). - [s.l.] : Morgan Kaufmann, 1998. - pp. 368-379.
Srikant R. and Agrawal R. Mining Sequential Patterns: generalizations and performance improvements [Conference] // Int'l Conf. Extending Database Technology / ed. Apers Peter M. G., Bouzeghoub Mokrane and Gardarin Georges. - [s.l.] : Springer, 1996. - Vol. LNCS 1057. - pp. 3-17.
Yang Q. and Wu X. 10 Challenging Problems in Data Mining Research [Journal] // Int'l Journal of Information Technology & Decision Making. - [s.l.] : World Scientific Publishing Company, 2006. - 4 : Vol. 5. - pp. 594-604.