Concept Map Mining: A definition and a framework for its evaluation Jorge J. Villalon School of Elec. & Inf. Engineering University of Sydney [email protected]

Abstract

Concept maps are visual representations of knowledge, widely used in educational contexts. We use the term "Concept Map Mining" (CMM) to refer to the automatic extraction of concept maps from documents such as essays. The principles behind CMM have been proposed for applications such as information extraction in specific knowledge domains, the measurement of student understanding and misconceptions based on written essays, and as a preliminary step in creating domain ontologies. Previous work on the automatic extraction of concept maps presents two problems: 1) overly simplistic and varying definitions of concept maps, and 2) the lack of an evaluation framework that can be used to measure the quality of the generated maps. In this paper, we propose a formal definition of the term CMM, with a focus on educational applications. We also propose an evaluation framework that gives researchers a common ground for evaluating the performance of CMM methods.

1

Introduction

Concept Maps (CMs) have been widely used in education as a way to represent students' knowledge [11]. They comprise concepts and their relationships, arranged hierarchically according to the importance of the concepts described. Several studies have shown that concept maps are a valid and reliable medium to represent students' understanding [5, 10], making them a valuable pedagogical tool. CMs are generally used in educational scenarios in two ways: 1) students build a CM following a focus question, or 2) students analyze a previously built CM (usually expert-built). Both approaches have been found to improve student learning outcomes. Another way for students to demonstrate their understanding of a topic is to write a document or academic essay. Essays are considered among the most representative sources of student understanding [4].

Rafael A. Calvo School of Elec. & Inf. Engineering University of Sydney [email protected]

Educational researchers have shown that writing is a task in which higher cognitive functions, such as analysis and synthesis, can be fully developed [4]. CMs have been used as a way to support the process of writing, by providing students with a tool to organize their knowledge before writing the essay. The automatic extraction of CMs from text, or what we are calling "Concept Map Mining" (CMM), could provide new means to use the student knowledge elicited by essays. CMs represent semi-structured information, which allows computers to process them in many ways (for example, by calculating distances between CMs, or by identifying sets of related concepts and/or propositions). In this way, teachers could use CMM results to search for common propositions (a proposition is a concept - relationship - concept triple) and misconceptions among students, or to more easily group students with similar levels of knowledge.

The idea of automatically extracting concept maps has been proposed before, but previous studies expose challenges at two levels: an inconsistent definition of concept maps, and the lack of an evaluation framework. The first problem occurs when authors claim to be creating CMs, but are actually creating what might be better described as semantic networks, with their own unique characteristics, which are not always suitable for educational purposes. The second problem is that each study uses a different method to evaluate its contribution, making it very hard to compare studies. Despite these issues, the automatic construction of CMs, with some caveats, is becoming possible. Due to their importance in education, the impact of such results could be considerable.

In Sections 2 and 3 we discuss previous studies related to CMM, which have contributed to our proposed formal definition for CMM. In Section 4 we present an evaluation framework, and the design of a tool to support the creation of gold standards for CMM. In Section 4.1 we propose a technical implementation. In Section 5 we conclude.

2

Previous work

Until recently, few studies have claimed to extract CMs automatically from documents, particularly following the definition established by Novak in [11]. Alves et al. [1] present a system called TextStorm that extracts raw CMs from text, which are then interactively built upon by students. Their definition of a CM includes concepts and linking words, but no hierarchy among concepts. They evaluated the quality of the generated CMs by manually marking the correctness of propositions extracted from a set of articles, essays and book chapters. Two other studies, by Clariana et al. [3] and by Richardson and Fox [12], describe the construction of CMs, but their maps do not include labeled relationships, so propositions are not formed. Of these, only the Alves study reports an evaluation, which compares the distance to a gold standard map with human scores on the quality of the text. Two other studies, by Valerio et al. [14] and Zouaq et al. [15], create CMs with labeled relationships, but no hierarchy. These studies present promising results, but their evaluations are simplistic: the first study evaluates just the concepts extracted, and the second uses a manual approach like that of Alves. More recently, Chen et al. [2] extracted CMs from sets of academic papers; however, the relationships are not labeled. The evaluation was performed manually by experts in the domain of the corpus, who assessed the quality of the maps.

3

A definition of Concept Map Mining

By our definition, CMM refers to the task of automatically generating a CM from text. If the CM is generated specifically from a written essay, we will refer to the output as an Automatic Concept Map from an Essay (or ACME). CMM is tightly related to the research on CMs and writing. The quality of an ACME will be affected by both the author's knowledge (represented by concepts and relationships) and the author's writing skills. We will follow Novak's definition and quality measures of CMs to define CMM requirements, an approach also followed by Valerio [14]. The various aspects of CMM can then be summarized into four key requirements: Educational utility, Simplicity, Semi-formality and Subjectivity.

Educational utility: The aim of CMM for education is to produce CMs that provide accurate information about students' knowledge; therefore the CMs must follow Novak's definition. That is, they must include concepts, linked by linking words, that form propositions. They must also have a topology: more general concepts should be higher in the map, and more specific concepts on lower levels. Concepts with the same level of generalization should be on the same level of the topology [11]. Therefore, the output of CMM should comprise Concepts, Relationships and a Topology.

Simplicity: An ACME, to be useful, must be the best possible summary of the complete essay, with a restriction on the number of concepts. Novak's definition indicates that a focus question should require no more than 25 concepts. An ACME's main use is for human analysis, giving teachers and students an alternative, structural representation of the students' understanding; hence, simplicity is the first important requirement. This requirement affects CMM in two ways: information must not be redundant, so synonyms and redundant propositions must be avoided; and information loss must be minimized, so the selected propositions should provide good coverage of the contents of the essay. If a document is too big, several ACMEs should be used, in the same way summaries are created per chapter or per part.

Semi-formality: Valerio argues that the relationships within CMs are not formal in the sense of formal semantic networks (see [13]). We agree with this assertion; however, we believe that ACMEs should be semi-formal representations. ACMEs should provide semantic information on top of the raw text, which could eventually be used for more intelligent ways of analyzing the essays. We can think of at least two possibilities: allowing computers to calculate distances between CMs, and identifying similar and opposite propositions. This requirement affects CMM in the way the concepts and linking words are represented: they should use the same words the author used, but in a form that would help information retrieval tasks.

Subjectivity: An ACME represents the author's knowledge and writing skills. In an educational context, we want to infer the student's understanding and perspective on a topic. We argue that the terminology used by the student is also important for assessing the outcome, so ACMEs should represent knowledge in the same way the author did, that is, using the same terms.
If a student uses a certain word to refer to a particular concept, that choice inevitably reflects the student's vocabulary level, so a concept map should retain this information about the student's writing skills. This requirement affects CMM in two ways: the words for the concepts and relations must be extracted literally from the document, and the hierarchy of concepts must reflect the importance of the concepts relative to what was written in the particular document.

Formally, a CM can be defined as a triplet $CM = \{C, R, G\}$, where $C = \{c_1, c_2, \ldots, c_n\}$ is a set of concepts, $R = \{r_1, r_2, \ldots, r_k\}$ is a set of relations between concepts, and $G = \{g_1, g_2, \ldots, g_m\}$ is a sorted set of generalization levels. Each concept $c_i$ corresponds to a word or phrase, and is unique in $C$. Each relation $r_i$ is a triplet of the form $r_i = (c_p, c_q, l_i)$, where $c_p$ and $c_q$ are concepts from $C$, and $l_i$ is the label for the relation $r_i$, which also corresponds to a word or phrase. Each generalization level $g_i$ corresponds to a set of concepts $g_i = \{c_1, c_2, \ldots, c_s\}$ that share the same level of generalization. The set $G$ is ordered in the sense that, for two levels $g_i$ and $g_j$, $g_i$ is more general than $g_j$ if and only if $i < j$.

Figure 1. Concept Map Mining process
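The triplet definition of a CM given above can be sketched as a small data structure. The following is a minimal illustration only: the class and method names (`Relation`, `ConceptMap`, `validate`, `level_of`, `is_more_general`, `propositions`) are hypothetical, and the list index of a level is assumed to play the role of the subscript of $g_i$, with index 0 being the most general level.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Relation:
    """A relation r_i = (c_p, c_q, l_i): two concepts joined by a label."""
    source: str  # c_p
    target: str  # c_q
    label: str   # l_i, the linking word or phrase


@dataclass
class ConceptMap:
    """CM = {C, R, G}; names here are illustrative assumptions."""
    concepts: Set[str]         # C, each concept is unique
    relations: List[Relation]  # R
    levels: List[Set[str]]     # G, ordered from most general (index 0) down

    def validate(self) -> None:
        """Raise ValueError if R or G mention a concept that is not in C."""
        for r in self.relations:
            if r.source not in self.concepts or r.target not in self.concepts:
                raise ValueError(f"relation uses unknown concept: {r}")
        for level in self.levels:
            if not level <= self.concepts:
                raise ValueError(f"level contains unknown concept: {level}")

    def level_of(self, concept: str) -> int:
        """Index of the first generalization level containing the concept."""
        for i, level in enumerate(self.levels):
            if concept in level:
                return i
        raise KeyError(concept)

    def is_more_general(self, a: str, b: str) -> bool:
        """g_i is more general than g_j if and only if i < j."""
        return self.level_of(a) < self.level_of(b)

    def propositions(self) -> Set[Tuple[str, str, str]]:
        """Propositions as (concept, linking phrase, concept) triples."""
        return {(r.source, r.label, r.target) for r in self.relations}


example = ConceptMap(
    concepts={"concept map", "concept", "proposition"},
    relations=[
        Relation("concept map", "concept", "comprises"),
        Relation("concept", "proposition", "forms"),
    ],
    levels=[{"concept map"}, {"concept", "proposition"}],
)
example.validate()
```

Representing propositions as plain triples keeps the map semi-structured, which is what makes set-based comparisons between maps straightforward.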

3.1

General Concept Map Mining Process

The CMM process can be expressed as the proper identification of a concept map CM from a document D. This process has three steps: Concept extraction (CE), identifying the set of concepts C; Relationship extraction (RE), identifying R; and Topology extraction (TE), identifying a generalization G of the set of concepts. CE must be the first step of the process, because C defines R and G. RE and TE are independent, so there is no loss of generality in assuming that RE goes first. By the definition of ACMEs, every term used to describe a concept or a relationship must appear in the document; therefore D itself defines all potential words (or phrases) that could become part of the ACME. We can formalize this idea by defining a document as a triplet $D = \{C_d, R_d, G_d\}$, where $C_d$ corresponds to all the concepts, $R_d$ corresponds to all the propositions, and $G_d$ corresponds to the levels of generalization expressed in the essay. Two sub-tasks for each task in the CMM process are then identified: identification and summarization. The former corresponds to identifying concepts, relations, and a generalization from a document, i.e. identifying $C_d$, $R_d$ and $G_d$. The latter corresponds to identifying the subsets C, R and G that are a good summary of D. Figure 1 shows all the steps in the CMM process as a diagram.
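The constraint that every concept and linking phrase must appear literally in D can be checked mechanically. The sketch below is one possible reading of that constraint, assuming case-insensitive whole-word matching; the function names (`normalize`, `appears_in`, `literal_violations`) and the matching rule are assumptions made for illustration, not part of the definition.

```python
import re
from typing import Iterable, List, Tuple

Proposition = Tuple[str, str, str]  # (concept, linking phrase, concept)


def normalize(text: str) -> str:
    """Lower-case the text and reduce it to words separated by one space."""
    return " ".join(re.findall(r"\w+", text.lower()))


def appears_in(phrase: str, document: str) -> bool:
    """True if the phrase occurs in the document as a run of whole words."""
    return f" {normalize(phrase)} " in f" {normalize(document)} "


def literal_violations(
    document: str,
    concepts: Iterable[str],
    propositions: Iterable[Proposition],
) -> List[str]:
    """Return the concepts and linking phrases that are absent from D."""
    missing = [c for c in concepts if not appears_in(c, document)]
    for _source, label, _target in propositions:
        if not appears_in(label, document):
            missing.append(label)
    return missing


essay = (
    "Concept maps comprise concepts and their relationships. "
    "Concepts form propositions."
)
concepts = ["concept maps", "concepts", "propositions"]
valid = [("concept maps", "comprise", "concepts"),
         ("concepts", "form", "propositions")]
invalid = [("concepts", "are part of", "propositions")]
```

Here `literal_violations(essay, concepts, valid)` returns an empty list, while the `invalid` proposition is flagged because the phrase "are part of" never occurs in the essay.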

4

An evaluation framework for CMM

Evaluating the quality of an ACME is a complex task because it deals with knowledge and meaning that are both highly subjective and context dependent. One research area that deals with the subjective analysis of text is Automatic Essay Grading (AEG). Assessing the quality of a written essay is a highly subjective task, and it has been reported that inter-human agreement on essay grading is not high. To measure the quality of AEG methods, researchers have compared the human-machine agreement to the inter-human agreement. They argue that whenever the human-machine agreement is equal to or better than the inter-human agreement, the performance of the automatic grading system is acceptable [8]. CM assessment has also been reported to be highly subjective; hence we propose the use of human-machine agreement as the quality measure, and inter-human agreement as a baseline.

It is well known that human behavior is highly subjective, especially when it comes to interpretations such as quality or completeness. The subjectivity in our case appears when we ask the subjects whether the CM they constructed reflects the same knowledge expressed in the source essay. It has been reported that different people will construct different concept maps, even if they are answering the same focus question and share the same level of expertise [9].

We propose the creation of a set of human-created concept maps from essays (HCMEs) as a gold standard. This set will define a reference set to which ACMEs can be compared, and inter-human agreement as the baseline for the accuracy. The set must be created by human annotators, who are preferably also experienced concept mappers. They must strictly adhere to the methodology defined for CMM. A distance measure between the CMs must also be defined; we propose the use of the simple distance used by McClure et al. in [10], which has already been validated for CM assessment.

To ensure that the annotators follow the CMM process, we designed a tool for creating HCMEs that fosters the proposed methodology. The tool follows the three main steps in the process, each one with a different interface, sequentially connected. The first interface, for concept extraction, ensures that concepts are expressed using words that appear in the essay. The second, for relationship extraction, allows only concepts from the previous step to be used, and only linking words that appear in the same paragraph in which the two concepts appear. The third interface allows the concepts to be arranged in different levels of generalization. Figure 2 shows the second interface, designed to support RE.
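The acceptance criterion described above, human-machine agreement compared against an inter-human baseline, can be sketched as follows. This is an illustration under stated assumptions: agreement is measured here with a Jaccard overlap of proposition sets, which is a stand-in and not the measure of McClure et al. [10], and all function names are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import List, Set, Tuple

Proposition = Tuple[str, str, str]  # (concept, linking phrase, concept)


def similarity(a: Set[Proposition], b: Set[Proposition]) -> float:
    """Jaccard overlap of two proposition sets (assumed stand-in measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def inter_human_agreement(human_maps: List[Set[Proposition]]) -> float:
    """Baseline: mean pairwise similarity among the human-created maps."""
    return mean(similarity(a, b) for a, b in combinations(human_maps, 2))


def human_machine_agreement(
    machine_map: Set[Proposition], human_maps: List[Set[Proposition]]
) -> float:
    """Mean similarity between the automatic map and each human map."""
    return mean(similarity(machine_map, h) for h in human_maps)


def is_acceptable(
    machine_map: Set[Proposition], human_maps: List[Set[Proposition]]
) -> bool:
    """Acceptable when human-machine agreement reaches the baseline."""
    baseline = inter_human_agreement(human_maps)
    return human_machine_agreement(machine_map, human_maps) >= baseline


# Toy propositions standing in for three annotators and two automatic maps.
p1 = ("concept map", "comprises", "concept")
p2 = ("concept", "forms", "proposition")
p3 = ("essay", "reflects", "understanding")
p4 = ("student", "writes", "essay")
humans = [{p1, p2, p3}, {p1, p2, p4}, {p1, p3, p4}]
good_machine = {p1, p2, p3, p4}
poor_machine = {p1}
```

With this toy data the inter-human baseline is 0.5, so `good_machine` (agreement 0.75) would be accepted and `poor_machine` (agreement of about 0.33) would not. At least two human maps are needed for the baseline to be defined.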

4.1

A suggested implementation

The CMM process implementation has three modules: concept identification, relationship identification, and summarization.

• Concept identification using grammar trees. Nouns, verbs and adjectives are parts of speech in written discourse, and can be identified using part-of-speech (POS) tagging systems.
• Cascading relationship identification. We propose the use of three methods in cascade: first, using regular expressions on the grammar tree, according to the method by Hearst [7]; second, exploiting typed dependencies (grammatical relations) between concepts, as in Zouaq et al. [15]; and finally, finding the verb between the concepts using a grammar tree.
• Concept and relationship summarization. Using Singular Value Decomposition on a term-by-sentence matrix from the essay, each singular vector represents a topic, sorted by the explained variance. Concepts and relationships belonging to the top singular vectors would be considered, as in [6].

Figure 2. Relationship extraction interface
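The cascade idea in the relationship identification module, trying one method after another until a proposition is found, can be illustrated with a heavily simplified sketch. It makes several assumptions that depart from the proposal: it works on raw sentence text rather than on a grammar tree, it implements only one lexical pattern in the style of Hearst ("X such as Y"), it omits the typed-dependency step because that requires a parser, and its last step takes all words between the two concepts rather than only the verb. All function names are hypothetical.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

Proposition = Tuple[str, str, str]  # (concept, linking phrase, concept)
Method = Callable[[str, str, str], Optional[Proposition]]


def hearst_such_as(sentence: str, a: str, b: str) -> Optional[Proposition]:
    """First method: match the lexical pattern '<a> such as <b>'."""
    pattern = rf"\b{re.escape(a)}\b,? such as \b{re.escape(b)}\b"
    if re.search(pattern, sentence, flags=re.IGNORECASE):
        return (a, "such as", b)
    return None


def words_between(sentence: str, a: str, b: str) -> Optional[Proposition]:
    """Fallback: use the words between the two concepts as the link.

    A parser-based version would keep only the verb; this sketch keeps
    the whole span because it has no part-of-speech information.
    """
    pattern = rf"\b{re.escape(a)}\b\s+(.+?)\s+\b{re.escape(b)}\b"
    match = re.search(pattern, sentence, flags=re.IGNORECASE)
    if match:
        return (a, match.group(1), b)
    return None


def extract_relation(
    sentence: str,
    a: str,
    b: str,
    methods: Sequence[Method] = (hearst_such_as, words_between),
) -> Optional[Proposition]:
    """Try each method in order and return the first proposition found."""
    for method in methods:
        proposition = method(sentence, a, b)
        if proposition is not None:
            return proposition
    return None
```

For example, `extract_relation("Concept maps represent student knowledge.", "concept maps", "student knowledge")` falls through the first method and yields the linking word "represent" from the fallback. Passing the methods as a sequence keeps the cascade order explicit and makes it easy to slot in a dependency-based method between the two.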

5

Conclusion

We have proposed a definition and methodology for "Concept Map Mining", the task of automatically extracting concept maps from documents. Our CMM definition is grounded in the educational purposes of concept maps and, as such, imposes four requirements on the CMM process: Educational utility, Simplicity, Semi-formality, and Subjectivity. We have also proposed an evaluation framework for CMM to facilitate comparative study and evaluation. The framework involves the creation of gold standards comprising concept maps extracted by human annotators from a set of essays. The annotators would adhere strictly to the methodology described in the CMM process. Human-machine agreement will provide a measure of the quality of a particular method, and inter-human agreement will provide a baseline.

6

Acknowledgments

This project was supported by Australian Research Council Discovery Project DP0665064.

References

[1] A. O. Alves, F. C. Pereira, and A. Cardoso. Automatic reading and learning from text. In Proceedings of the International Symposium on Artificial Intelligence, 2002.
[2] N.-S. Chen, P. Kinshuk, C.-W. Wei, and H.-J. Chen. Mining e-learning domain concept map from academic articles. Computers & Education, 50:694–698, 2008.
[3] R. B. Clariana and R. Koul. A computer-based approach for translating text into concept map-like representations. In Proceedings of the First International Conference on Concept Mapping, 2004.
[4] J. Emig. Writing as a mode of learning. College Composition and Communication, 28:122–128, 1977.
[5] K.-E. Chang, Y.-T. Sung, and I.-D. Chen. The effect of concept mapping to enhance text comprehension and summarization. The Journal of Experimental Education, 71(1):5–23, 2002.
[6] Y. Gong and X. Liu. Generic text summarization using relevance measure and latent semantic analysis. In SIGIR '01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 19–25. ACM, 2001.
[7] M. A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th Conference on Computational Linguistics, 1992.
[8] M. A. Hearst. The debate on automated essay grading. Intelligent Systems and Their Applications, 15(5):22–37, 2000.
[9] H. Herl, H. O. Jr., G. Chung, and J. Schacter. Reliability and validity of a computer-based knowledge mapping system to measure content understanding. Computers in Human Behavior, 15:315–333, 1999.
[10] J. R. McClure, B. Sonak, and H. K. Suen. Concept map assessment of classroom learning: Reliability, validity, and logistical practicality. Journal of Research in Science Teaching, 36:475–492, 1999.
[11] J. D. Novak and D. B. Gowin. Learning How To Learn. Cambridge University Press, 1984.
[12] R. Richardson and E. A. Fox. Using concept maps in digital libraries as a cross-language resource discovery tool. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries, 2005.
[13] J. F. Sowa. Semantic networks. Wiley, 1987.
[14] A. Valerio and D. Leake. Jump-starting concept map construction with knowledge extracted from documents. In Proceedings of the Second International Conference on Concept Mapping, 2006.
[15] A. Zouaq, R. Nkambou, and C. Frasson. Building domain ontologies from text for educational purposes. Lecture Notes in Computer Science, 4753:393–407, 2007.
