Knowledge-Based Systems 21 (2008) 629–635

Contents lists available at ScienceDirect

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

Sentence recognition using artificial neural networks

Maciej Majewski a,*, Jacek M. Zurada b

a Faculty of Mechanical Engineering, Koszalin University of Technology, Raclawicka 15-17, 75-620 Koszalin, Poland
b Electrical and Computer Engineering Department, University of Louisville, 405 Lutz Hall, Louisville, KY 40292, USA

Article info
Article history: Received 17 June 2007; Accepted 24 March 2008; Available online 8 April 2008.
PACS: 01.30.-y

Abstract
The paper describes an application of artificial neural networks (ANNs) to natural language text reasoning. The task of knowledge discovery in text from a database, represented by a database file consisting of sentences with similar meanings but different lexico-grammatical patterns, was solved with ANNs which recognize the meaning of the text using training files with a limited dictionary. The paper presents algorithms for recognizing the meaning of text from a selected source using 3-layer ANNs. Tests of the new method are also described.
© 2008 Elsevier B.V. All rights reserved.

Keywords: Knowledge discovery; Sentence recognition; Artificial intelligence; Human–computer interaction; Natural language processing

1. Introduction

For linguistic research, there is a need for consciously created and organized collections of data and information that can be used to perform knowledge discovery in text and to evaluate the performance and effectiveness of related tools. Knowledge discovery in text is a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in unstructured text data. These patterns are unknown, hidden or implicit in semi-structured and unstructured collections of text. Below are some knowledge discovery tasks that many subject disciplines are interested in:

- Identification and retrieval of relevant documents from large collections of documents.
- Identification of relevant sections in large documents (passage retrieval).
- Co-reference resolution, i.e., the identification of expressions in texts that refer to the same entity, process or activity.
- Extraction of entities or relationships from text collections.
- Automated characterization of entities and processes in texts.
- Automated construction of ontologies for different domains (e.g., characterization of medical terms).
- Construction of controlled vocabularies from fixed sets of documents for particular domains.

The need to construct controlled vocabularies for subject domains has made terminological extraction from corpora an important process in tasks related to knowledge discovery in text.

The proposed system for knowledge discovery in text uses neural networks for natural language understanding, as shown in Fig. 1. The motivation behind using binary neural networks for knowledge discovery is that they offer the advantage of simple binarization of words and sentences, as well as very fast training and run-time response [14]. The system consists of a selected data source, 3-layer ANNs, network training sets, letter chain recognition algorithms, syntax analysis algorithms, and coding algorithms for words and sentences.

2. The state of the art

* The source code of the implemented ANN is available by emailing the corresponding author.
* Corresponding author. Tel.: +48 94 3478352.
E-mail addresses: [email protected] (M. Majewski), [email protected] (J.M. Zurada).
URL: http://ci.louisville.edu/zurada (J.M. Zurada).
0950-7051/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.knosys.2008.03.053

Knowledge discovery is a growing field. There are many knowledge discovery methodologies in use and under development. Some of these techniques are generic, while others are domain-specific.

Learning algorithms are an integral part of knowledge discovery. Learning techniques may be supervised or unsupervised. In general, supervised learning techniques enjoy a better success rate, as defined in terms of the usefulness of the discovered knowledge. According to [1,3], learning algorithms are complex and generally considered the hardest part of any knowledge discovery technique. Machine discovery is one of the earliest fields that has contributed to knowledge discovery [5]. While machine discovery relies solely on an autonomous approach to information discovery, knowledge discovery typically combines automated approaches with human interaction to assure accurate, useful, and understandable results.

[Fig. 1. Steps involved in proposed knowledge discovery in text: sentences with similar meanings but different lexico-grammatical patterns are drawn from a database (the selected data source) and passed to intelligent text-meaning reasoning mechanisms (neural networks for word recognition, sentence recognition, and syntax analysis), while the human supplies feedback information and knowledge discovery parameters.]

[Fig. 2. Scheme of the proposed system for knowledge discovery in text: sentences in natural language, with similar meanings but different lexico-grammatical patterns, are taken from the selected database; the letter string recognition stage produces recognized letter strings, which the algorithm for coding words turns into coded words (Fig. 3a) for the word recognition module (Fig. 4), trained on an ANN training file of words; the recognized words, as sentence components, pass through sentence syntax analysis, are indexed and coded by the algorithm for coding sentences (Fig. 3b), and enter the sentence recognition module (Fig. 5), trained on ANN training files of meaningful sentences; the recognized text meaning is returned to the human, who supplies feedback information and knowledge discovery parameters (sentence recognition cycle: Fig. 6).]
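As a toy illustration of the pipeline in Fig. 2, the following runnable sketch emulates the modules in plain Python: word recognition by prefix matching stands in for the word ANN, and sentence recognition picks the pattern with minimum Hamming distance between bag-of-words bit vectors, in the spirit of the Hamming network. All function names are mine, and the tiny vocabulary and the two sentence patterns follow the worked example of Fig. 6; this is not the authors' implementation.

```python
# Toy sketch of the Fig. 2 pipeline (illustrative names, not the paper's code).
VOCAB = ["acquired", "camera", "computer", "images", "laboratory",
         "of", "probes", "surveyed", "the", "with"]

SENTENCE_PATTERNS = [
    "the computer acquired images of probes with the laboratory camera",
    "the computer surveyed probes with the camera",
]

def recognize_word(letter_string):
    # letter string -> first matching vocabulary word (stands in for the word ANN)
    for w in VOCAB:
        if w.startswith(letter_string):
            return w
    return None

def code_sentence(words):
    # bag-of-words bit vector over the indexed vocabulary
    return [1 if w in words else 0 for w in VOCAB]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def recognize_sentence(letter_strings):
    words = [recognize_word(s) for s in letter_strings]
    x = code_sentence([w for w in words if w])
    # nearest stored sentence pattern wins, as in one-nearest-neighbour classification
    return min(SENTENCE_PATTERNS,
               key=lambda p: hamming(x, code_sentence(p.split())))

print(recognize_sentence(["comput", "acqui", "imag", "prob", "cam"]))
```

With the letter strings from Fig. 6, the first pattern is closer in Hamming distance and is returned as the recognized meaning.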

[Fig. 3. Inputs of (a) the word recognition module and (b) the sentence recognition module: a word (e.g. 'COMPUTER') is coded as a binary image of b = 26 rows (represented letters A-Z) by a columns (letter positions 1, 2, ..., 14), with bit values {0, 1} mapped to {-1, +1} for the ANN; a sentence is coded analogously as a binary image over its component words (WORD 1, WORD 2, WORD 3, ...) and bit numbers.]
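The input coding of Fig. 3 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the 14-position word width (read from the figure axis), and the exact layout of the sentence image are my own reading of the figure; the {0, 1} -> {-1, +1} mapping is as stated in Fig. 3a.

```python
import numpy as np

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def code_word(word, max_letters=14):
    """Code a word as a bipolar 26 x max_letters binary image (Fig. 3a).
    Each column one-hot codes one letter; 0 bits become -1 for the ANN."""
    img = -np.ones((len(ALPHABET), max_letters), dtype=int)
    for pos, letter in enumerate(word.upper()[:max_letters]):
        img[ALPHABET.index(letter), pos] = 1
    return img

def code_sentence(word_indices, vocab_size, max_words=10):
    """Code a sentence analogously (Fig. 3b): one column per word position,
    one-hot over the indexed vocabulary of sentence component words."""
    img = -np.ones((vocab_size, max_words), dtype=int)
    for pos, idx in enumerate(word_indices[:max_words]):
        img[idx, pos] = 1
    return img

img = code_word("COMPUTER")          # 8 letters -> 8 bits set to +1
s = code_sentence([2, 0, 3], vocab_size=10)
```

Flattening such an image row by row yields the n = 26*a (word) or n = a*b (sentence) bipolar input vector of the networks in Figs. 4 and 5.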


[Fig. 4. Word recognition module: binary images of letter strings (e.g. 'COMPUT'), created by the letter string recognition module as isolated components of the text, enter the network at n = 26*a inputs with x_i in {-1, +1}; a binary distance layer (neurons N11...N1p, weights W), a recursive MAXNET layer (N21...N2p, with self-connections of weight 1), and an output layer (N31...N3p) classify the input into one of p word-pattern classes C1...Cp (word patterns as sentence components), producing the image of the recognized word (e.g. 'COMPUTER').]

[Fig. 5. Overall diagram of the 3-layer neural network of the sentence recognition module: the binary image of a sentence (component words such as ACQUIRED, CAMERA, COMPUTER, IMAGES, PROBES) enters an input layer (N01...N0n) at n = a*b inputs with x_i in {-1, +1}; a binary distance layer (N11...N1p, weights W), a recursive MAXNET layer (N21...N2p), and an output layer (N31...N3p) classify it into one of p classes C1...Cp of possible meaningful sentence patterns (e.g. 'THE COMPUTER ACQUIRED IMAGES OF PROBES WITH THE LABORATORY CAMERA'); the output is the binary image of the recognized sentence.]


There are many different approaches to knowledge discovery techniques [13]. There are quantitative approaches, such as the probabilistic and statistical approaches, while other approaches utilize visualization techniques. There are classification approaches such as Bayesian classification, inductive logic, pattern discovery and decision tree analysis [3,5]. Classification is probably the oldest and most widely used of all the knowledge discovery approaches [4,9,13]; it groups data according to similarities or classes. Pattern detection of important trends is the basis for the deviation and trend analysis approach. Other approaches include genetic algorithms, neural networks, and hybrid approaches that combine two or more techniques [8].

The probabilistic family of knowledge discovery approaches utilizes graphical representation models to compare different knowledge representations [9]. These models are based on probabilities and data independencies. The statistical approach uses rule discovery and is based on data relationships. An inductive learning algorithm can automatically select useful join paths and attributes to construct rules from a database with many relations [4]. This type of induction is used to generalize patterns in the data and to construct rules from the noted patterns.

ANNs may also be used as tools for knowledge discovery. They are particularly useful for pattern recognition and are sometimes grouped with the classification approaches. A hybrid approach to knowledge discovery combines more than one approach and is also called a multi-paradigmatic approach [2]. Although implementation may be more difficult, hybrid tools are able to combine the strengths of various approaches. Some commonly used methods combine visualization techniques, induction, neural networks, and rule-based systems to perform the desired knowledge discovery. Deductive databases and genetic algorithms have also been used in hybrid approaches.

The methods of sentence recognition proposed in the literature lead to interesting new approaches and techniques, but only very few experiments and evaluations of the proposed methods have been reported. In [6], an extended Kohonen feature map was described that is able to store sequences of input patterns; given examples of legal sentences from a regular grammar, the network can learn to simulate a finite-state machine for that grammar. Jiang et al. [7] proposed an algorithm which provides a framework for classifier combination in grammar-guided sentence recognition and is applicable to a variety of different tasks. In [12], a sentence recognition method using word co-occurrence probability was described and compared with a method using a context-free grammar.

[Fig. 6. Illustration of the sentence recognition cycle: for the input sentence 'the computers made an acquisition of a full-color image of a monitored probe using all 12 filters of the digicamera', the letter string recognition module extracts the strings 'comput acqui imag prob cam'; the word recognition module recognizes 'computer acquired images probes camera'; after binary coding, the sentence recognition module finds Hamming distance D_H = 5 to the class representing the sentence 'the computer acquired images of probes with the laboratory camera' and D_H = 12 to the class representing 'the computer surveyed probes with the thermal emission spectrometer'.]

[Fig. 7. Structure of the Hamming neural network as a classifier-expert module for word and sentence recognition: an input layer (N01...N0n, x_i in {-1, +1}), a binary distance layer (N11...N1p, weights W_pn), a MAXNET layer (N21...N2p, weights M_pp, self-connections of weight 1), and an output layer (N31...N3p) yielding the classes C1...Cp.]

3. Description of the method

In the proposed knowledge discovery system, shown in abbreviated form in Fig. 2, sentences are extracted from the database. Individual words, treated here as isolated components of the text, are processed by the letter string recognition and coding algorithms. The coded words are inputs of the neural network for recognizing words (Fig. 3a). The network uses a training file, likewise containing words, and is trained to recognize words as sentence components, with words represented by output neurons (Fig. 4).

In the next stage, the coded words are transferred to the sentence syntax analysis module, which analyses and indexes the words properly before they are processed by the algorithm for coding sentences. The sentences are coded as vectors and then become inputs of the sentence recognition module (Fig. 3b). The module uses a 3-layer Hamming neural network, as shown in Fig. 5, which either recognizes the sentence and finds its meaning or fails to recognize it (Fig. 6). The neural network of this module uses a training file containing patterns of possible meaningful sentences.

Because of the binary input signals, the Hamming neural network, which directly realizes the one-nearest-neighbour classification rule [15], is chosen for both the word recognition module and the sentence recognition module, as shown in Fig. 7. Each training data vector is assigned a single class; during the recognition phase, the single stored vector nearest to the input pattern x is found and its class C_i is returned. There are two main phases in the operation of the expert network: training (initialization) and classification. Training of the binary neural network consists of copying the reference patterns into the weight matrix W_pn, as follows (1):

w_i = x_i,   1 ≤ i ≤ p,   (1)

where p is the number of input pattern vectors x, each of the same length n, and w_i is the i-th row of the matrix W of dimensions (p × n). For a given n, the computation time is linear in the number of input patterns p.

The goal of the recursive layer N_2: is the selection of the winning neuron representing a word or command class. The characteristic feature of this group of neurons is a self-connection of each neuron to itself with weight m_ii = 1 for all 1 ≤ i ≤ p, whereas all other weights are kept negative. Initialization of the layer N_2: consists of assigning negative values to the square matrix M_pp everywhere except the main diagonal. Originally, Lippmann proposed the initialization [10] given in (2).

[Fig. 8. Sentence meaning recognition rate for sets of words recognized earlier: recognition rate R [%] (roughly 88-100%) plotted against data set number n = 1, ..., 12.]


m_kl = -(p - 1)^(-1) + ξ_kl for k ≠ l,   m_kk = 1,   where 1 ≤ k, l ≤ p, p > 1,   (2)

where ξ_kl is a random value for which |ξ_kl| ≪ (p - 1)^(-1). However, it appears that the most efficient and still convergent solution is to set equal weights for all neurons in N_2:, which are then modified at each step t of the classification phase as follows (3):

m_kl = -ε_k(t) = -(p - t)^(-1) for k ≠ l,   m_kk = 1,   where 1 ≤ k, l ≤ p, p > 1,   (3)

where t is the classification time step. In this case, convergence is achieved in p - 1 - r steps, where r > 1 stands for the number of nearest stored vectors in W.

In the classification phase, the group N_1: is responsible for computing the binary distance between the input pattern z and the training patterns already stored in the weights W. Usually this is the Hamming distance (4):

b_i(z, W) = 1 - n^(-1) D_H(z, w_i),   1 ≤ i ≤ p,   (4)

where b_i ∈ [0, 1] is the value of the i-th neuron in the N_1: layer, and D_H(z, w_i) ∈ {0, 1, ..., n} is the Hamming distance between the input pattern z and the i-th stored pattern w_i (the i-th row of W). In the classification stage, the layer N_2: operates recursively to select the winning neuron. This process is governed by Eq. (5).
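The training step (1) and the binary-distance layer (4) can be sketched as follows. This is a reconstruction from the equations above, not the published implementation; the function names and the toy patterns are illustrative.

```python
import numpy as np

def train(patterns):
    """Eq. (1): training copies the p reference patterns (rows, entries +-1)
    directly into the weight matrix W of dimensions p x n."""
    return np.asarray(patterns)                # W, shape (p, n)

def distance_layer(W, z):
    """Eq. (4): b_i = 1 - D_H(z, w_i) / n, with D_H the Hamming distance
    between input z and the i-th stored pattern (i-th row of W)."""
    n = W.shape[1]
    d_h = np.sum(W != z, axis=1)               # Hamming distance per stored pattern
    return 1.0 - d_h / n                       # b_i in [0, 1]

W = train([[1, -1, 1, -1], [-1, 1, -1, 1], [-1, -1, -1, -1]])
b = distance_layer(W, np.array([1, -1, 1, 1]))
# the first stored pattern is nearest (D_H = 1), so b[0] is largest
```

Training is therefore a single copy operation, which is the source of the very fast training mentioned in the introduction.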

[Fig. 9. Sensitivity of (a) word recognition: minimum number of letters L_min of the word being recognized vs. the number of word letters L, with the fit L_min = 0.49 * L^0.93 marked at the 95% level; (b) sentence meaning recognition: minimum number of words W_min of the sentence being recognized vs. the number of sentence component words W, with the fit W_min = 0.62 * W^0.79 marked at the 95% level.]
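The fitted curves, as read from the partly garbled labels of Fig. 9, appear to be L_min = 0.49 * L^0.93 and W_min = 0.62 * W^0.79. Assuming that reading, and rounding up to whole letters and words, a quick numeric check looks as follows (function names are mine):

```python
import math

def min_letters(L):
    """Minimum letters needed of an L-letter word (fit read from Fig. 9a)."""
    return math.ceil(0.49 * L ** 0.93)

def min_words(W):
    """Minimum words needed of a W-word sentence (fit read from Fig. 9b)."""
    return math.ceil(0.62 * W ** 0.79)

# e.g. an 8-letter word such as 'COMPUTER' needs about half its letters,
# consistent with the truncated input 'COMPUT' recognized in Fig. 4
print(min_letters(8), min_words(9))
```

The roughly sublinear exponents mean that longer words and sentences tolerate proportionally more missing input.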


a_i[t+1] = φ( Σ_{j=1}^{p} m_ij a_j[t] ) = φ( a_i[t] + Σ_{j=1, j≠i}^{p} m_ij a_j[t] ),   (5)

where a_i[t] is the output of the i-th neuron of the layer N_2: at iteration t, and φ is the threshold function defined as follows (6):

φ(x) = x for x > 0, and φ(x) = 0 otherwise.   (6)
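The recursive selection of Eqs. (3), (5) and (6) can be sketched as below; this is a minimal reconstruction under the equal-weight schedule of Eq. (3), and the guard for the final steps (where p - t reaches 1) is my own addition to avoid division by zero, not part of the paper.

```python
import numpy as np

def maxnet(b, max_iter=1000):
    """Recursive MAXNET selection: off-diagonal inhibition m = -(p - t)^(-1)
    per Eq. (3), update per Eq. (5), threshold phi(x) = max(x, 0) per Eq. (6);
    iterate until at most one neuron stays positive."""
    a = np.asarray(b, dtype=float)
    p = len(a)
    for t in range(max_iter):
        m = -1.0 / (p - t) if p - t > 1 else -0.5     # guarded weight schedule
        a = np.maximum(a + m * (a.sum() - a), 0.0)    # Eqs. (5)-(6), m_ii = 1
        if np.count_nonzero(a) <= 1:
            break
    return int(np.argmax(a))                          # index of winning class C_i

winner = maxnet([0.75, 0.5, 0.25])
# → 0
```

Fed with the distance-layer outputs b_i, the surviving neuron indexes the recognized word or sentence class.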

Depending on the chosen scheme (2)–(3) of the weights mij in (5), we obtain different dynamics of the classification stage. The iterative process (5) proceeds up to a point where only one neuron has value different than 0 – this neuron is a winner which represents the word or command class. 4. Experimental results The process of data insertion into the database of sentences probes for experiments was supported with the implemented speech interface. The speech recognition engine was a continuous density mixture Gaussian Hidden Markov Model system which uses vector quantization for speeding up the Euclidean distance calculation for probability estimation [11]. The system uses context dependent triphonic cross word acoustic models with speaker normalization based on vocal tract length normalization, channel adaptation using mean Cepstral subtraction and speaker adaptation using Maximum Likelihood Linear Regression. The test dataset consisted of the database of 1500 sentences, files consisting of 522 letter strings, 87 training words and 510 meaningful training sentences. The first test measured the performance of the sentence meaning recognition with the sentence recognition module as a set of words recognized earlier (Fig. 8). As shown in Fig. 9a, the ability of the implemented neural network to recognize a word depends on the number of letters of that word. For best performance, the neural network requires a minimum number of letters of each word being recognized as its input. As shown in Fig. 9b, the ability of the neural network to recognize the sentence depends on the sentence length (wordcount). Similarly, for best sentence recognition, the neural network requires a certain minimum wordcount of the given sentence. 5. Conclusions and perspectives Application of binary neural networks allows for recognition of sentences in natural language with similar meanings but different lexico-grammatical patterns, which can be encountered in docu-

635

ments, texts, vocabularies and databases. The method presented in this paper can be easily extended. In the literature there are very few reports about sentence recognition. The method proposed in this paper is a conceptually new approach to this problem. The experimental results of the proposed method of sentence recognition show its excellent and promising performance, and can be used for further development and experiments. In the future, sentences in natural language will undoubtedly be the most important way of communication between humans and computers. Great progress is made in many fields of science, where communication between humans and computers is an important task. The proposed neural network is both effective and flexible which makes its applications possible. As an interface, it allows for more robustness to human’s errors. The proposed solution also eliminates scarcities of the typical co-operation between humans and computers. References [1] M. Berry, G. Linoff, Mastering Data Mining, John Wiley & Sons, 2000. [2] I. Cloete, J.M. Zurada, Knowledge-Based Neurocomputing, MIT Press, Cambridge, Massachusetts, 2000. [3] M.H. Dunham, Data Mining Introductory and Advanced Topics, Prentice Hall, 2003. [4] J. Han, M. Kamber, Data Mining: Concepts and Techniques, second ed., Morgan Kaufmann, 2006. [5] D.J. Hand, H. Mannila, P. Smyth, Principles of Data Mining, MIT Press, 2000. [6] A. Hoekstra, M. Drossaers, An extended Kohonen feature map for sentence recognition, in: Proceedings of ICANN 1993, Lecture Notes in Computer Science, Springer, 1993, pp. 404–407. [7] X. Jiang, K. Yu, H. Bunke, Classifier combination for grammar-guided sentence recognition, in: Proceedings of MCS 2000, Lecture Notes in Computer Science, Springer, 2000, pp. 383–392. [8] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, Wiley-IEEE Press, 2002. [9] D.T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, Inc., 2004. [10] R. 
Lippman, An Introduction to Computing with Neural Nets, IEEE Transactions on Acoustic, Speech, and Signal Processing, IEEE Signal Processing Society, Piscataway 4 (3) (1987) 4–22. [11] M. Majewski, W. Kacalak, Intelligent human–machine voice communication system, Engineering Mechanics International Journal for Theoretical and Applied Mechanics 12 (3) (2005) 193–200. [12] I. Murase, S. Nakagawa, Sentence recognition method using word cooccurrence probability and its evaluation, in: Proceedings of ICSLP 1990, pp. 1217–1220. [13] P.N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Pearson Addison Wesley, 2005. [14] J.M. Zurada, Introduction to Artificial Neural Systems, PWS Publishing Company, Boston, Massachusetts, 1992. [15] J.M. Zurada, R.J. Marks, C.J. Robinson, Computational Intelligence: Imitating Life, IEEE Press, New York, 1994.

Sentence recognition using artificial neural networksq ...

and organized collections of data and information that can be used .... methods combine visualization techniques, induction, neural net- works, and .... [3] M.H. Dunham, Data Mining Introductory and Advanced Topics, Prentice Hall,. 2003.

275KB Sizes 1 Downloads 206 Views

Recommend Documents

Electromagnetic field identification using artificial neural ... - CiteSeerX
resistive load was used, as the IEC defines. This resistive load (Pellegrini target MD 101) was designed to measure discharge currents by ESD events on the ...

View-invariant action recognition based on Artificial Neural ...
View-invariant action recognition based on Artificial Neural Networks.pdf. View-invariant action recognition based on Artificial Neural Networks.pdf. Open.

Using artificial neural networks to map the spatial ...
and validation and ground control points (GCPs) to allow registration of the ..... also like to thank Dr Arthur Roberts and two anonymous reviewers for their.

Financial Time Series Forecasting Using Artificial Neural ... - CiteSeerX
Keywords: time series forecasting, prediction, technical analysis, neural ...... Journal of Theoretical and Applied Finance, Information Sciences, Advanced ... data of various financial time series: stock markets, companies stocks, bond ratings.

Loudness Model using Artificial Neural Networks
Various 3-layered feed-forward networks were trained using the Quasi Newton. Backpropagation algorithm with 2,500 training epochs for each network, having ...

Using Artificial Neural Network to Predict the Particle ...
B. Model Implementation and Network Optimisation. In this work, a simple model considering multi-layer perception (MLP) based on back propagation algorithm ...

Electromagnetic field identification using artificial neural ...
National Technical University of Athens, 9 Iroon Politechniou Str., 157 80 Athens. 4. National ..... Trigg, Clinical decision support systems for intensive care units: ...

Using artificial neural networks to map the spatial ...
The success here to map bamboo distribution has important ..... anticipated that a binary categorization would reduce data transformation complexity. 3.2.

Loudness Model using Artificial Neural Networks
Universidad Tecnológica de Chile, Brown Norte 290, Santiago, Chile, [email protected] c. Universidad Tecnológica de Chile, Brown Norte 290, Santiago, Chile, [email protected]. English Translation by A. Osses. ABSTRACT: This article prese

Financial Time Series Forecasting Using Artificial Neural ... - CiteSeerX
Faculty of Mathematics and Computer Science. Department of ... Financial prediction is a research active area and neural networks have been proposed as one.

offline handwritten word recognition using a hybrid neural network and ...
network (NN) and Hidden Markov models (HMM) for solving handwritten word recognition problem. The pre- processing involves generating a segmentation ...

Sentence Segmentation Using IBM Word ... - Semantic Scholar
contains the articles from the Xinhua News Agency. (LDC2002E18). This task has a larger vocabulary size and more named entity words. The free parameters are optimized on the devel- opment corpus (Dev). Here, the NIST 2002 test set with 878 sentences

Neural Graph Learning: Training Neural Networks Using Graphs
many problems in computer vision, natural language processing or social networks, in which getting labeled ... inputs and on many different neural network architectures (see section 4). The paper is organized as .... Depending on the type of the grap

Artificial neural networks for automotive air-conditioning systems (2 ...
Artificial neural networks for automotive air-conditioning systems (2).pdf. Artificial neural networks for automotive air-conditioning systems (2).pdf. Open. Extract.

ARTIFICIAL NEURAL NETWORK MODELLING OF THE ...
induction furnace and the data on recovery of various alloying elements was obtained for charge ..... tensile tests of the lab melt ingot has been employed.

Artificial Neural Network for Mobile IDS Solution
We advocate the idea that mobile agents framework enhance the performance of IDS and even offer them new capabilities. Moreover agent systems are used in ...

Impact of Missing Data in Training Artificial Neural ...
by the area under the Receiver Operating .... Custom software in the C language was used to implement the BP-ANN. 4. ROC. Receiver Operating Characteristic.

Prediction of Software Defects Based on Artificial Neural ... - IJRIT
IJRIT International Journal of Research in Information Technology, Volume 2, Issue .... Software quality is the degree to which software possesses attributes like ...