Knowledge-Based Systems 21 (2008) 629–635

Contents lists available at ScienceDirect

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

Sentence recognition using artificial neural networks

Maciej Majewski a,*, Jacek M. Zurada b

a Faculty of Mechanical Engineering, Koszalin University of Technology, Raclawicka 15-17, 75-620 Koszalin, Poland
b Electrical and Computer Engineering Department, University of Louisville, 405 Lutz Hall, Louisville, KY 40292, USA

Article info
Article history: Received 17 June 2007; Accepted 24 March 2008; Available online 8 April 2008.
PACS: 01.30.-y

Abstract
The paper describes an application of artificial neural networks (ANNs) to natural language text reasoning. The task of knowledge discovery in text from a database, represented by a database file consisting of sentences with similar meanings but different lexico-grammatical patterns, was solved with ANNs which recognize the meaning of the text using training files with a limited dictionary. The paper presents algorithms for recognizing the meaning of text from a selected source using 3-layer ANNs. Tests of the new method are also described.
© 2008 Elsevier B.V. All rights reserved.

Keywords: Knowledge discovery; Sentence recognition; Artificial intelligence; Human–computer interaction; Natural language processing

1. Introduction

For linguistic research, there is a need for consciously created and organized collections of data and information that can be used to perform knowledge discovery in text and to evaluate the performance and effectiveness of related tools. Knowledge discovery in text is a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in unstructured text data. These patterns are unknown, hidden or implicit in semi-structured and unstructured collections of text. Below are some knowledge discovery tasks that many subject disciplines are interested in:

- Identification and retrieval of relevant documents from large collections of documents.
- Identification of relevant sections in large documents (passage retrieval).
- Co-reference resolution, i.e., the identification of expressions in texts that refer to the same entity, process or activity.
- Extraction of entities or relationships from text collections.
- Automated characterization of entities and processes in texts.
- Automated construction of ontologies for different domains (e.g., characterization of medical terms).
- Construction of controlled vocabularies from fixed sets of documents for particular domains.

The need to construct controlled vocabularies for subject domains has made terminological extraction from corpora an important process in tasks related to knowledge discovery in text.

The proposed system for knowledge discovery in text uses neural networks for natural language understanding, as shown in Fig. 1. The motivation behind using binary neural networks for knowledge discovery is that they offer the advantage of simple binarization of words and sentences, as well as very fast training and run-time response [14]. The system consists of a selected data source, 3-layer ANNs, network training sets, letter chain recognition algorithms, syntax analysis algorithms, and coding algorithms for words and sentences.

2. The state of the art

* The source code of the implemented ANN is available by emailing the corresponding author.
* Corresponding author. Tel.: +48 94 3478352.
E-mail addresses: [email protected] (M. Majewski), [email protected] (J.M. Zurada).
URL: http://ci.louisville.edu/zurada (J.M. Zurada).
0950-7051/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.knosys.2008.03.053

Knowledge discovery is a growing field. There are many knowledge discovery methodologies in use and under development. Some of these techniques are generic, while others are domain-specific.

Learning algorithms are an integral part of knowledge discovery. Learning techniques may be supervised or unsupervised. In general, supervised learning techniques enjoy a better success rate, as defined in terms of the usefulness of the discovered knowledge. According to [1,3], learning algorithms are complex and generally considered the hardest part of any knowledge discovery technique. Machine discovery is one of the earliest fields that has contributed to knowledge discovery [5]. While machine discovery relies solely on an autonomous approach to information discovery, knowledge discovery typically combines automated approaches with human interaction to assure accurate, useful, and understandable results.

[Fig. 1. Steps involved in proposed knowledge discovery in text: sentences with similar meanings but different lexico-grammatical patterns are drawn from a database (the selected data source) and passed to intelligent text-meaning reasoning mechanisms (neural networks for word recognition, sentence recognition, and syntax analysis), while the human supplies feedback information and knowledge discovery parameters.]

[Fig. 2. Scheme of the proposed system for knowledge discovery in text: sentences in natural language, with similar meanings but different lexico-grammatical patterns, are taken from the selected database; the letter string recognition stage produces recognized letter strings, which the algorithm for coding words turns into coded words (Fig. 3a) for the word recognition module (Fig. 4), trained on an ANN training file of words; the recognized words, as sentence components, pass through sentence syntax analysis, are indexed and coded by the algorithm for coding sentences (Fig. 3b), and enter the sentence recognition module (Fig. 5), trained on ANN training files of meaningful sentences; the recognized text meaning is returned to the human, who supplies feedback information and knowledge discovery parameters (sentence recognition cycle: Fig. 6).]
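As a toy illustration of the pipeline in Fig. 2, the following runnable sketch emulates the modules in plain Python: word recognition by prefix matching stands in for the word ANN, and sentence recognition picks the pattern with minimum Hamming distance between bag-of-words bit vectors, in the spirit of the Hamming network. All function names are mine, and the tiny vocabulary and the two sentence patterns follow the worked example of Fig. 6; this is not the authors' implementation.

```python
# Toy sketch of the Fig. 2 pipeline (illustrative names, not the paper's code).
VOCAB = ["acquired", "camera", "computer", "images", "laboratory",
         "of", "probes", "surveyed", "the", "with"]

SENTENCE_PATTERNS = [
    "the computer acquired images of probes with the laboratory camera",
    "the computer surveyed probes with the camera",
]

def recognize_word(letter_string):
    # letter string -> first matching vocabulary word (stands in for the word ANN)
    for w in VOCAB:
        if w.startswith(letter_string):
            return w
    return None

def code_sentence(words):
    # bag-of-words bit vector over the indexed vocabulary
    return [1 if w in words else 0 for w in VOCAB]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def recognize_sentence(letter_strings):
    words = [recognize_word(s) for s in letter_strings]
    x = code_sentence([w for w in words if w])
    # nearest stored sentence pattern wins, as in one-nearest-neighbour classification
    return min(SENTENCE_PATTERNS,
               key=lambda p: hamming(x, code_sentence(p.split())))

print(recognize_sentence(["comput", "acqui", "imag", "prob", "cam"]))
```

With the letter strings from Fig. 6, the first pattern is closer in Hamming distance and is returned as the recognized meaning.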

[Fig. 3. Inputs of (a) the word recognition module and (b) the sentence recognition module: a word (e.g. 'COMPUTER') is coded as a binary image of b = 26 rows (represented letters A-Z) by a columns (letter positions 1, 2, ..., 14), with bit values {0, 1} mapped to {-1, +1} for the ANN; a sentence is coded analogously as a binary image over its component words (WORD 1, WORD 2, WORD 3, ...) and bit numbers.]
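The input coding of Fig. 3 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the 14-position word width (read from the figure axis), and the exact layout of the sentence image are my own reading of the figure; the {0, 1} -> {-1, +1} mapping is as stated in Fig. 3a.

```python
import numpy as np

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def code_word(word, max_letters=14):
    """Code a word as a bipolar 26 x max_letters binary image (Fig. 3a).
    Each column one-hot codes one letter; 0 bits become -1 for the ANN."""
    img = -np.ones((len(ALPHABET), max_letters), dtype=int)
    for pos, letter in enumerate(word.upper()[:max_letters]):
        img[ALPHABET.index(letter), pos] = 1
    return img

def code_sentence(word_indices, vocab_size, max_words=10):
    """Code a sentence analogously (Fig. 3b): one column per word position,
    one-hot over the indexed vocabulary of sentence component words."""
    img = -np.ones((vocab_size, max_words), dtype=int)
    for pos, idx in enumerate(word_indices[:max_words]):
        img[idx, pos] = 1
    return img

img = code_word("COMPUTER")          # 8 letters -> 8 bits set to +1
s = code_sentence([2, 0, 3], vocab_size=10)
```

Flattening such an image row by row yields the n = 26*a (word) or n = a*b (sentence) bipolar input vector of the networks in Figs. 4 and 5.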


[Fig. 4. Word recognition module: binary images of letter strings (e.g. 'COMPUT'), created by the letter string recognition module as isolated components of the text, enter the network at n = 26*a inputs with x_i in {-1, +1}; a binary distance layer (neurons N11...N1p, weights W), a recursive MAXNET layer (N21...N2p, with self-connections of weight 1), and an output layer (N31...N3p) classify the input into one of p word-pattern classes C1...Cp (word patterns as sentence components), producing the image of the recognized word (e.g. 'COMPUTER').]

[Fig. 5. Overall diagram of the 3-layer neural network of the sentence recognition module: the binary image of a sentence (component words such as ACQUIRED, CAMERA, COMPUTER, IMAGES, PROBES) enters an input layer (N01...N0n) at n = a*b inputs with x_i in {-1, +1}; a binary distance layer (N11...N1p, weights W), a recursive MAXNET layer (N21...N2p), and an output layer (N31...N3p) classify it into one of p classes C1...Cp of possible meaningful sentence patterns (e.g. 'THE COMPUTER ACQUIRED IMAGES OF PROBES WITH THE LABORATORY CAMERA'); the output is the binary image of the recognized sentence.]


There are many different approaches to knowledge discovery techniques [13]. There are quantitative approaches, such as the probabilistic and statistical approaches, while other approaches utilize visualization techniques. There are classification approaches such as Bayesian classification, inductive logic, pattern discovery and decision tree analysis [3,5]. Classification is probably the oldest and most widely used of all the knowledge discovery approaches [4,9,13]; it groups data according to similarities or classes. Pattern detection of important trends is the basis for the deviation and trend analysis approach. Other approaches include genetic algorithms, neural networks, and hybrid approaches that combine two or more techniques [8].

The probabilistic family of knowledge discovery approaches utilizes graphical representation models to compare different knowledge representations [9]. These models are based on probabilities and data independencies. The statistical approach uses rule discovery and is based on data relationships. An inductive learning algorithm can automatically select useful join paths and attributes to construct rules from a database with many relations [4]. This type of induction is used to generalize patterns in the data and to construct rules from the noted patterns.

ANNs may also be used as tools for knowledge discovery. They are particularly useful for pattern recognition and are sometimes grouped with the classification approaches. A hybrid approach to knowledge discovery combines more than one approach and is also called a multi-paradigmatic approach [2]. Although implementation may be more difficult, hybrid tools are able to combine the strengths of various approaches. Some commonly used methods combine visualization techniques, induction, neural networks, and rule-based systems to perform the desired knowledge discovery. Deductive databases and genetic algorithms have also been used in hybrid approaches.

The methods of sentence recognition proposed in the literature lead to interesting new approaches and techniques, but only very few experiments and evaluations of the proposed methods have been reported. In [6], an extended Kohonen feature map was described that is able to store sequences of input patterns; given examples of legal sentences from a regular grammar, the network can learn to simulate a finite-state machine for that grammar. Jiang et al. [7] proposed an algorithm which provides a framework for classifier combination in grammar-guided sentence recognition and is applicable to a variety of different tasks. In [12], a sentence recognition method using word co-occurrence probability was described and compared with a method using a context-free grammar.

[Fig. 6. Illustration of the sentence recognition cycle: for the input sentence 'the computers made an acquisition of a full-color image of a monitored probe using all 12 filters of the digicamera', the letter string recognition module extracts the strings 'comput acqui imag prob cam'; the word recognition module recognizes 'computer acquired images probes camera'; after binary coding, the sentence recognition module finds Hamming distance D_H = 5 to the class representing the sentence 'the computer acquired images of probes with the laboratory camera' and D_H = 12 to the class representing 'the computer surveyed probes with the thermal emission spectrometer'.]

[Fig. 7. Structure of the Hamming neural network as a classifier-expert module for word and sentence recognition: an input layer (N01...N0n, x_i in {-1, +1}), a binary distance layer (N11...N1p, weights W_pn), a MAXNET layer (N21...N2p, weights M_pp, self-connections of weight 1), and an output layer (N31...N3p) yielding the classes C1...Cp.]

3. Description of the method

In the proposed knowledge discovery system, shown in abbreviated form in Fig. 2, sentences are extracted from the database. Individual words, treated here as isolated components of the text, are processed by the letter string recognition and coding algorithms. The coded words are inputs of the neural network for recognizing words (Fig. 3a). The network uses a training file, likewise containing words, and is trained to recognize words as sentence components, with words represented by output neurons (Fig. 4).

In the next stage, the coded words are transferred to the sentence syntax analysis module, which analyses and indexes the words properly before they are processed by the algorithm for coding sentences. The sentences are coded as vectors and then become inputs of the sentence recognition module (Fig. 3b). The module uses a 3-layer Hamming neural network, as shown in Fig. 5, which either recognizes the sentence and finds its meaning or fails to recognize it (Fig. 6). The neural network of this module uses a training file containing patterns of possible meaningful sentences.

Because of the binary input signals, the Hamming neural network, which directly realizes the one-nearest-neighbour classification rule [15], is chosen for both the word recognition module and the sentence recognition module, as shown in Fig. 7. Each training data vector is assigned a single class; during the recognition phase, the single stored vector nearest to the input pattern x is found and its class C_i is returned. There are two main phases in the operation of the expert network: training (initialization) and classification. Training of the binary neural network consists of copying the reference patterns into the weight matrix W_pn, as follows (1):

w_i = x_i,   1 ≤ i ≤ p,   (1)

where p is the number of input pattern vectors x, each of the same length n, and w_i is the i-th row of the matrix W of dimensions (p × n). For a given n, the computation time is linear in the number of input patterns p.

The goal of the recursive layer N_2: is the selection of the winning neuron representing a word or command class. The characteristic feature of this group of neurons is a self-connection of each neuron to itself with weight m_ii = 1 for all 1 ≤ i ≤ p, whereas all other weights are kept negative. Initialization of the layer N_2: consists of assigning negative values to the square matrix M_pp everywhere except the main diagonal. Originally, Lippmann proposed the initialization [10] given in (2).

[Fig. 8. Sentence meaning recognition rate for sets of words recognized earlier: recognition rate R [%] (roughly 88-100%) plotted against data set number n = 1, ..., 12.]


m_kl = -(p - 1)^(-1) + ξ_kl for k ≠ l,   m_kk = 1,   where 1 ≤ k, l ≤ p, p > 1,   (2)

where ξ_kl is a random value for which |ξ_kl| ≪ (p - 1)^(-1). However, it appears that the most efficient and still convergent solution is to set equal weights for all neurons in N_2:, which are then modified at each step t of the classification phase as follows (3):

m_kl = -ε_k(t) = -(p - t)^(-1) for k ≠ l,   m_kk = 1,   where 1 ≤ k, l ≤ p, p > 1,   (3)

where t is the classification time step. In this case, convergence is achieved in p - 1 - r steps, where r > 1 stands for the number of nearest stored vectors in W.

In the classification phase, the group N_1: is responsible for computing the binary distance between the input pattern z and the training patterns already stored in the weights W. Usually this is the Hamming distance (4):

b_i(z, W) = 1 - n^(-1) D_H(z, w_i),   1 ≤ i ≤ p,   (4)

where b_i ∈ [0, 1] is the value of the i-th neuron in the N_1: layer, and D_H(z, w_i) ∈ {0, 1, ..., n} is the Hamming distance between the input pattern z and the i-th stored pattern w_i (the i-th row of W). In the classification stage, the layer N_2: operates recursively to select the winning neuron. This process is governed by Eq. (5).
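The training step (1) and the binary-distance layer (4) can be sketched as follows. This is a reconstruction from the equations above, not the published implementation; the function names and the toy patterns are illustrative.

```python
import numpy as np

def train(patterns):
    """Eq. (1): training copies the p reference patterns (rows, entries +-1)
    directly into the weight matrix W of dimensions p x n."""
    return np.asarray(patterns)                # W, shape (p, n)

def distance_layer(W, z):
    """Eq. (4): b_i = 1 - D_H(z, w_i) / n, with D_H the Hamming distance
    between input z and the i-th stored pattern (i-th row of W)."""
    n = W.shape[1]
    d_h = np.sum(W != z, axis=1)               # Hamming distance per stored pattern
    return 1.0 - d_h / n                       # b_i in [0, 1]

W = train([[1, -1, 1, -1], [-1, 1, -1, 1], [-1, -1, -1, -1]])
b = distance_layer(W, np.array([1, -1, 1, 1]))
# the first stored pattern is nearest (D_H = 1), so b[0] is largest
```

Training is therefore a single copy operation, which is the source of the very fast training mentioned in the introduction.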

[Fig. 9. Sensitivity of (a) word recognition: minimum number of letters L_min of the word being recognized vs. the number of word letters L, with the fit L_min = 0.49 * L^0.93 marked at the 95% level; (b) sentence meaning recognition: minimum number of words W_min of the sentence being recognized vs. the number of sentence component words W, with the fit W_min = 0.62 * W^0.79 marked at the 95% level.]
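The fitted curves, as read from the partly garbled labels of Fig. 9, appear to be L_min = 0.49 * L^0.93 and W_min = 0.62 * W^0.79. Assuming that reading, and rounding up to whole letters and words, a quick numeric check looks as follows (function names are mine):

```python
import math

def min_letters(L):
    """Minimum letters needed of an L-letter word (fit read from Fig. 9a)."""
    return math.ceil(0.49 * L ** 0.93)

def min_words(W):
    """Minimum words needed of a W-word sentence (fit read from Fig. 9b)."""
    return math.ceil(0.62 * W ** 0.79)

# e.g. an 8-letter word such as 'COMPUTER' needs about half its letters,
# consistent with the truncated input 'COMPUT' recognized in Fig. 4
print(min_letters(8), min_words(9))
```

The roughly sublinear exponents mean that longer words and sentences tolerate proportionally more missing input.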


a_i[t+1] = φ( Σ_{j=1}^{p} m_ij a_j[t] ) = φ( a_i[t] + Σ_{j=1, j≠i}^{p} m_ij a_j[t] ),   (5)

where a_i[t] is the output of the i-th neuron of the layer N_2: at iteration t, and φ is the threshold function defined as follows (6):

φ(x) = x for x > 0, and φ(x) = 0 otherwise.   (6)
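The recursive selection of Eqs. (3), (5) and (6) can be sketched as below; this is a minimal reconstruction under the equal-weight schedule of Eq. (3), and the guard for the final steps (where p - t reaches 1) is my own addition to avoid division by zero, not part of the paper.

```python
import numpy as np

def maxnet(b, max_iter=1000):
    """Recursive MAXNET selection: off-diagonal inhibition m = -(p - t)^(-1)
    per Eq. (3), update per Eq. (5), threshold phi(x) = max(x, 0) per Eq. (6);
    iterate until at most one neuron stays positive."""
    a = np.asarray(b, dtype=float)
    p = len(a)
    for t in range(max_iter):
        m = -1.0 / (p - t) if p - t > 1 else -0.5     # guarded weight schedule
        a = np.maximum(a + m * (a.sum() - a), 0.0)    # Eqs. (5)-(6), m_ii = 1
        if np.count_nonzero(a) <= 1:
            break
    return int(np.argmax(a))                          # index of winning class C_i

winner = maxnet([0.75, 0.5, 0.25])
# → 0
```

Fed with the distance-layer outputs b_i, the surviving neuron indexes the recognized word or sentence class.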

Depending on the chosen scheme (2)–(3) of the weights mij in (5), we obtain different dynamics of the classification stage. The iterative process (5) proceeds up to a point where only one neuron has value different than 0 – this neuron is a winner which represents the word or command class. 4. Experimental results The process of data insertion into the database of sentences probes for experiments was supported with the implemented speech interface. The speech recognition engine was a continuous density mixture Gaussian Hidden Markov Model system which uses vector quantization for speeding up the Euclidean distance calculation for probability estimation [11]. The system uses context dependent triphonic cross word acoustic models with speaker normalization based on vocal tract length normalization, channel adaptation using mean Cepstral subtraction and speaker adaptation using Maximum Likelihood Linear Regression. The test dataset consisted of the database of 1500 sentences, files consisting of 522 letter strings, 87 training words and 510 meaningful training sentences. The first test measured the performance of the sentence meaning recognition with the sentence recognition module as a set of words recognized earlier (Fig. 8). As shown in Fig. 9a, the ability of the implemented neural network to recognize a word depends on the number of letters of that word. For best performance, the neural network requires a minimum number of letters of each word being recognized as its input. As shown in Fig. 9b, the ability of the neural network to recognize the sentence depends on the sentence length (wordcount). Similarly, for best sentence recognition, the neural network requires a certain minimum wordcount of the given sentence. 5. Conclusions and perspectives Application of binary neural networks allows for recognition of sentences in natural language with similar meanings but different lexico-grammatical patterns, which can be encountered in docu-

635

ments, texts, vocabularies and databases. The method presented in this paper can be easily extended. In the literature there are very few reports about sentence recognition. The method proposed in this paper is a conceptually new approach to this problem. The experimental results of the proposed method of sentence recognition show its excellent and promising performance, and can be used for further development and experiments. In the future, sentences in natural language will undoubtedly be the most important way of communication between humans and computers. Great progress is made in many fields of science, where communication between humans and computers is an important task. The proposed neural network is both effective and flexible which makes its applications possible. As an interface, it allows for more robustness to human’s errors. The proposed solution also eliminates scarcities of the typical co-operation between humans and computers. References [1] M. Berry, G. Linoff, Mastering Data Mining, John Wiley & Sons, 2000. [2] I. Cloete, J.M. Zurada, Knowledge-Based Neurocomputing, MIT Press, Cambridge, Massachusetts, 2000. [3] M.H. Dunham, Data Mining Introductory and Advanced Topics, Prentice Hall, 2003. [4] J. Han, M. Kamber, Data Mining: Concepts and Techniques, second ed., Morgan Kaufmann, 2006. [5] D.J. Hand, H. Mannila, P. Smyth, Principles of Data Mining, MIT Press, 2000. [6] A. Hoekstra, M. Drossaers, An extended Kohonen feature map for sentence recognition, in: Proceedings of ICANN 1993, Lecture Notes in Computer Science, Springer, 1993, pp. 404–407. [7] X. Jiang, K. Yu, H. Bunke, Classifier combination for grammar-guided sentence recognition, in: Proceedings of MCS 2000, Lecture Notes in Computer Science, Springer, 2000, pp. 383–392. [8] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, Wiley-IEEE Press, 2002. [9] D.T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, Inc., 2004. [10] R. 
Lippman, An Introduction to Computing with Neural Nets, IEEE Transactions on Acoustic, Speech, and Signal Processing, IEEE Signal Processing Society, Piscataway 4 (3) (1987) 4–22. [11] M. Majewski, W. Kacalak, Intelligent human–machine voice communication system, Engineering Mechanics International Journal for Theoretical and Applied Mechanics 12 (3) (2005) 193–200. [12] I. Murase, S. Nakagawa, Sentence recognition method using word cooccurrence probability and its evaluation, in: Proceedings of ICSLP 1990, pp. 1217–1220. [13] P.N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Pearson Addison Wesley, 2005. [14] J.M. Zurada, Introduction to Artificial Neural Systems, PWS Publishing Company, Boston, Massachusetts, 1992. [15] J.M. Zurada, R.J. Marks, C.J. Robinson, Computational Intelligence: Imitating Life, IEEE Press, New York, 1994.

Sentence recognition using artificial neural networksq ...

and organized collections of data and information that can be used .... methods combine visualization techniques, induction, neural net- works, and .... [3] M.H. Dunham, Data Mining Introductory and Advanced Topics, Prentice Hall,. 2003.

275KB Sizes 1 Downloads 206 Views

Recommend Documents

Electromagnetic field identification using artificial neural ... - CiteSeerX
resistive load was used, as the IEC defines. This resistive load (Pellegrini target MD 101) was designed to measure discharge currents by ESD events on the ...

View-invariant action recognition based on Artificial Neural ...
View-invariant action recognition based on Artificial Neural Networks.pdf. View-invariant action recognition based on Artificial Neural Networks.pdf. Open.

Using artificial neural networks to map the spatial ...
and validation and ground control points (GCPs) to allow registration of the ..... also like to thank Dr Arthur Roberts and two anonymous reviewers for their.

Financial Time Series Forecasting Using Artificial Neural ... - CiteSeerX
Keywords: time series forecasting, prediction, technical analysis, neural ...... Journal of Theoretical and Applied Finance, Information Sciences, Advanced ... data of various financial time series: stock markets, companies stocks, bond ratings.

Loudness Model using Artificial Neural Networks
Various 3-layered feed-forward networks were trained using the Quasi Newton. Backpropagation algorithm with 2,500 training epochs for each network, having ...

Using Artificial Neural Network to Predict the Particle ...
B. Model Implementation and Network Optimisation. In this work, a simple model considering multi-layer perception (MLP) based on back propagation algorithm ...

Electromagnetic field identification using artificial neural ...
National Technical University of Athens, 9 Iroon Politechniou Str., 157 80 Athens. 4. National ..... Trigg, Clinical decision support systems for intensive care units: ...

Using artificial neural networks to map the spatial ...
The success here to map bamboo distribution has important ..... anticipated that a binary categorization would reduce data transformation complexity. 3.2.

Loudness Model using Artificial Neural Networks
Universidad Tecnológica de Chile, Brown Norte 290, Santiago, Chile, [email protected] c. Universidad Tecnológica de Chile, Brown Norte 290, Santiago, Chile, [email protected]. English Translation by A. Osses. ABSTRACT: This article prese

Financial Time Series Forecasting Using Artificial Neural ... - CiteSeerX
Faculty of Mathematics and Computer Science. Department of ... Financial prediction is a research active area and neural networks have been proposed as one.

offline handwritten word recognition using a hybrid neural network and ...
network (NN) and Hidden Markov models (HMM) for solving handwritten word recognition problem. The pre- processing involves generating a segmentation ...

Sentence Segmentation Using IBM Word ... - Semantic Scholar
contains the articles from the Xinhua News Agency. (LDC2002E18). This task has a larger vocabulary size and more named entity words. The free parameters are optimized on the devel- opment corpus (Dev). Here, the NIST 2002 test set with 878 sentences

Neural Graph Learning: Training Neural Networks Using Graphs
many problems in computer vision, natural language processing or social networks, in which getting labeled ... inputs and on many different neural network architectures (see section 4). The paper is organized as .... Depending on the type of the grap

Artificial neural networks for automotive air-conditioning systems (2 ...
Artificial neural networks for automotive air-conditioning systems (2).pdf. Artificial neural networks for automotive air-conditioning systems (2).pdf. Open. Extract.

ARTIFICIAL NEURAL NETWORK MODELLING OF THE ...
induction furnace and the data on recovery of various alloying elements was obtained for charge ..... tensile tests of the lab melt ingot has been employed.

Artificial Neural Network for Mobile IDS Solution
We advocate the idea that mobile agents framework enhance the performance of IDS and even offer them new capabilities. Moreover agent systems are used in ...

Impact of Missing Data in Training Artificial Neural ...
by the area under the Receiver Operating .... Custom software in the C language was used to implement the BP-ANN. 4. ROC. Receiver Operating Characteristic.

Prediction of Software Defects Based on Artificial Neural ... - IJRIT
IJRIT International Journal of Research in Information Technology, Volume 2, Issue .... Software quality is the degree to which software possesses attributes like ...