A System for Recognition of Biological Patterns in Toxins Using Computational Intelligence

Bernardo Penna Resende de Carvalho, Thais Melo Mendes, Ricardo de Souza Ribeiro, Ricardo Fortuna, José Marcos Veneroso and Maurício de Alvarenga Mudado

Abstract: This work presents an innovative way to find biological patterns in toxins in order to classify them according to their biological functions. Based on relevant biological information (a toxin database), a system was developed that uses computational intelligence to discover novel patterns within the primary and secondary structures of a set of toxins. The discovered patterns make it possible to differentiate these toxins by their function: binding to specific channels for sodium, calcium or potassium ions. The classification rules are built from a given toxin database, which is pre-processed according to the presence of a signal peptide or propeptide in the primary sequence and complemented with the predicted secondary structures, the physical and chemical characteristics and the water affinity information. The best patterns obtained are combined to generate a final rule. All the experiments were performed using 802 toxin primary sequences labeled by channel function, obtained from two public databases, ATDB and Tox-Prot. After the system is used to solve three different binary classification problems, one for each ion channel, a committee is used to obtain the final classification label for each toxin. The committee reached a classification accuracy of 80%, with correctness of 97%, 67% and 55% for sodium, potassium and calcium channels, respectively.

Corresponding author: [email protected]. Bernardo Penna, Thais Melo, Ricardo Ribeiro, Ricardo Fortuna and José Marcos are with ABG Information Systems. Maurício is with Fundação Ezequiel Dias. The authors would like to thank FAPEMIG for the support in the development of this work (process APQ-6751-4.01/07).

I. INTRODUCTION

There are many venomous animals in nature, such as scorpions and snakes, whose venoms are usually dangerous to humans. These venoms are composed of several substances: amino acids, biogenic amines, peptides, proteins, enzymes, nucleotides, lipids and others [1]. Many of these components, usually called toxins, have different mechanisms of action and different targets in organisms, causing physiological consequences such as paralysis or necrosis. The information gathered in current venom research has brought the possibility of using toxins as biopharmaceuticals or pharmacological tools [2]. Some molecules have potential applications in agriculture and medicine, and others are promising pharmacological tools for understanding the physiology of organisms, organs, cells and even cellular components, such as ion channels [3].

Some toxins are able to increase the permeation of ions through cellular membranes by binding to ion channels. Ion channels are proteins normally involved in the functioning of the nervous system of many organisms. More specifically, they participate in the generation, modulation and transduction of nervous impulses by permeating ions such as sodium, potassium, calcium and chloride through nerve cells. Malfunctioning of ion channels can cause many diseases, such as cardiac arrhythmias, migraines, epilepsy and chronic pain.

The toxins that bind to ion channels have an overall structural similarity, but some differences among their primary, secondary and tertiary structures define their target: binding to specific channels for sodium, potassium, calcium or chloride ions [4]. The primary structure (amino acid sequence) of a toxin is usually not sufficient to determine its function, so a similarity search between a novel toxin and a database of known toxins is not enough to determine its putative homologous function. On the other hand, the secondary and tertiary structures, which are more closely related to the 3D form of the toxin, are strongly associated with its function. However, they are much more difficult to obtain (often by X-ray crystallography experiments), so they have to be predicted by software (with a low estimation error) from the primary structure.

Some patterns in the primary structure that can differentiate the function of toxins have already been identified, such as cysteine residues found in specific positions [4] and structural motifs [5]. With new emerging biotechnologies, the rate of toxin discovery is rising. However, the known patterns are not sufficient to determine the specific function of each novel toxin. Therefore, the discovery of new patterns and signatures that relate the primary and secondary structures to the function of a toxin would bring great progress to this area. In this work we set up an intelligent system that is able to discover novel patterns within the primary and secondary structures from a toxin database, together with relevant biological information.
The patterns obtained make it possible to differentiate these toxins by their specific binding to different types of ion channels: sodium, potassium or calcium. The five steps of the system are explained in detail in Sec. II: primary sequence pre-processing, secondary structure prediction, sequence alignment, pattern recognition and rule combination. The system uses the SignalP [6], ProP [7], SSpro4 [8], ClustalW [9] and GPLab [10] software packages, most of them based on computational intelligence techniques. Sec. III presents the results achieved by the system, and Sec. IV concludes the work and points to some possible future developments.

II. THE INTELLIGENT SYSTEM FOR RECOGNITION OF BIOLOGICAL PATTERNS

Fig. 1 presents the system pipeline, which takes a database of toxin primary sequences as input and classifies each sequence according to its biological function, while also providing interpretable information through the rules it generates.

Fig. 1. Pipeline of the developed pattern recognition system: primary sequence pre-processing, secondary structure prediction, sequence alignment, pattern recognition and rule combination.

A. Pre-processing of primary structure to use as the input to the system

Before the primary structures are used to generate other relevant biological information, they must be analyzed in order to obtain the mature proteins, i.e., the primary sequences without signal peptide and/or propeptide. The signal peptide is a region of amino acids that directs proteins to be transported to various cell organelles. The propeptide is another sequence present in many inactive proteins, which can be turned into an active form by post-translational modification. Some toxin databases already provide information about the presence of these regions in many toxins. However, several toxins remain uncharacterized, so software is needed to predict these regions in those sequences.

SignalP [6] is used to predict the presence of a signal peptide, and ProP [7] to predict the existence of a propeptide, in each toxin of the database. SignalP is one of the most popular methods for predicting signal peptides. It consists of two different predictors, based on Neural Network [11] and Hidden Markov Model [12] techniques. SignalP produces both classification and cleavage site assignment, while most similar methods only classify proteins as secretory or non-secretory. ProP uses an internal database of cleavage sites in the border region between signal peptide, propeptide and mature protein to apply a Neural Network [11] technique for predicting the existence of a propeptide. With the results of SignalP and ProP, the predicted signal peptides and propeptides can be removed from the primary toxin sequences at the indicated cleavage point, using Perl [13] scripts.

B. Prediction of secondary structure and deduction of relevant biological information to be used together with the primary structure for pattern recognition

The secondary structures of toxins, which are related to their three-dimensional structures, are predicted with the software SSpro4 [8], one of the most widely used predictors of secondary structure. Its algorithm is based on Bayesian Networks [14] meshed with a Neural Network parametrization to accelerate belief propagation and learning. SSpro4 uses evolutionary information from homologous proteins by taking PSI-BLAST [15] profiles as part of its input. Homologous structures are also taken into consideration and are used in combination with the prediction output to improve accuracy. SSpro4 takes a given toxin primary sequence as input and generates a string as output, with the same length as the primary structure. This string is composed of three different characters representing the predicted secondary structure for each site: random coil, alpha-helix and extended beta-sheet.

Each amino acid is also classified into categories according to its physical and chemical characteristics and according to its water affinity. For the first criterion, the possible categories are neutral, basic, acid, aliphatic, cyclic, aromatic, with thiol group and with carboxamide group. For the second criterion, the two categories are hydrophobic and hydrophilic. As this information is important to determine the overall structure of proteins, some scripts were developed in Perl [13] to create two new sets of inputs: physical and chemical characteristics, and water affinity information.

After the prediction of secondary structure and the deduction of biological information, there are four data groups to be used in the next step in order to find biological patterns: primary structure (PRI), secondary structure (SEC), physical and chemical characteristics (PCC) and water affinity information (WAI). The sequences of each group are represented by strings.
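The derivation of the PCC and WAI strings from a primary sequence can be illustrated with a minimal Python sketch. The original scripts were written in Perl and the full residue-to-category assignment is not listed in the text, so the tables `PCC_CODES` and `WAI_CODES`, the one-character codes and the function `encode` below are hypothetical and deliberately partial.

```python
# Hypothetical, partial category tables: the paper does not give the full
# residue-to-category assignment, so these entries are illustrative only.
PCC_CODES = {
    "G": "n", "A": "n",            # neutral
    "K": "b", "R": "b", "H": "b",  # basic
    "D": "a", "E": "a",            # acid
    "C": "t",                      # with thiol group
    "N": "x", "Q": "x",            # with carboxamide group
}
WAI_CODES = {
    "A": "o", "V": "o", "L": "o",  # hydrophobic
    "K": "i", "D": "i", "N": "i",  # hydrophilic
}


def encode(primary: str, table: dict, unknown: str = "?") -> str:
    """Map each residue to a one-character category code, preserving length."""
    return "".join(table.get(residue, unknown) for residue in primary)


primary = "GKCDNA"                 # toy primary sequence (PRI)
pcc = encode(primary, PCC_CODES)   # physical and chemical characteristics
wai = encode(primary, WAI_CODES)   # water affinity information
print(pcc, wai)
```

Because every residue is mapped to exactly one character, the derived strings keep the length of the primary sequence, which is the property the later alignment step relies on.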
All of these strings have the same length as the primary structure, because each character of a string encodes the information associated with the corresponding amino acid in the primary structure.

C. Alignment of primary structure and replication of the generated gaps into other kind of sequences

The primary structures of toxins often have different sizes, i.e., each one consists of a different number of amino acids. Because the pattern recognition performed by the system generates rules in the form of binary trees, all samples must have the same size. This is achieved by aligning the primary sequences, introducing gaps in regions of dissimilarity (e.g. regions of insertions and deletions of amino acids that occurred during the molecular evolution of toxins). The software ClustalW [9] is used to perform a global alignment of all toxin primary structures in the input data. The global alignment groups similar regions of the amino acid sequences and opens gaps at insertion/deletion sites. Besides equalizing the size of the samples, the alignment is important because it groups similar information, which increases the performance of the pattern recognizer used in the next step.

A Perl script is used to replicate the gaps generated by the alignment of the primary structures into the other three sets of sequences (SEC, PCC and WAI). This is necessary in order to keep all four groups of sequences with the same size and all sites of corresponding information aligned.

D. Recognition of biological patterns and generation of classification rules

This step aims to find relevant biological patterns in all four groups created in the earlier steps. Classification rules are built (as binary tree structures) in order to detect patterns in those sequences. These patterns can classify toxins according to their functions and also provide novel information to biologists. The software GPLab [10] is used to find patterns in the PRI, SEC, PCC and WAI groups separately. GPLab is based on Genetic Programming [16], an evolutionary algorithm for finding patterns, which in this work are represented by binary trees that classify the biological function.
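To make the binary-tree representation concrete, the following Python sketch evaluates a small rule on one aligned sample. It is only an illustration under stated assumptions: the tuple encoding of the tree, the 0-based positions, the one-character feature codes and the function `evaluate` are hypothetical and are not GPLab's internal format.

```python
def evaluate(rule, sample):
    """Recursively evaluate a binary-tree rule on one aligned sample.

    Leaves are ("eq", group, position, code): a pattern comparison that tests
    whether the aligned string of `group` has `code` at `position`.
    Internal nodes are ("and", left, right) or ("or", left, right).
    """
    op = rule[0]
    if op == "eq":
        _, group, position, code = rule
        return sample[group][position] == code
    left = evaluate(rule[1], sample)
    right = evaluate(rule[2], sample)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    raise ValueError(f"unknown node type: {op}")


# Toy aligned sample: "-" marks a gap replicated from the primary alignment.
sample = {"PRI": "GC-K", "PCC": "nt-b"}

# (PCC[1] is thiol AND PRI[3] is lysine) OR PRI[0] is alanine
rule = ("or",
        ("and", ("eq", "PCC", 1, "t"), ("eq", "PRI", 3, "K")),
        ("eq", "PRI", 0, "A"))

print(evaluate(rule, sample))
```

Each leaf corresponds to one pattern comparison, so the number of leaves gives the rule complexity in this sketch.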
The evolution of the individuals in Genetic Programming is driven by a fitness function, which here is the classification error to be iteratively decreased. Two operators simulate the evolutionary process: crossover and mutation. Crossover is applied to an individual by switching one of its nodes with a node from another individual in the population. With a tree-based representation, replacing a node means replacing the whole branch. Mutation affects a single individual in the population, by replacing a whole node of the selected individual or just the information of one node. Since each individual in Genetic Programming is represented by a tree, its solution corresponds to a program that decides whether a toxin binds to a given ion channel.

E. Combination of classification rules found

Since each binary tree created by GPLab gives biological patterns related to only one of the four groups (PRI, SEC, PCC or WAI), it is important to combine these rules in order to obtain a more general rule that represents relevant information from the majority of these groups. The combination process is performed by a script developed in Matlab [17], in which the best rules found by GPLab (Tabs. I, II and III) are put together in a combinatorial way, using the logic operators AND and OR to associate them. For each possible combination, the script evaluates the fitness of the new rule (the sum of the classification errors over all samples) and compares it with the best one found so far, until only the best composed rule remains (Figs. 2, 3 and 4). The script returns the combination that best classifies each ion channel function against the others, i.e., the one whose classification error is the smallest.

TABLE I
BEST RULES FOUND FOR THE NA X ALL CLASSIFICATION USING GPLAB, WITH FINAL INDIVIDUAL FITNESS AND NUMBER OF PATTERNS

Rule  Group  Fitness  Patterns  + error (%)  - error (%)
1     PRI    138      3         30.8         2.3
2     PRI    153      3         28.2         9.0
3     SEC    158      1         32.5         5.7
4     SEC    148      2         24.8         11.3
5     PCC    146      2         28.4         7.0
6     PCC    145      2         23.2         12.4
7     WAI    160      1         34.6         3.9
8     WAI    156      2         32.2         5.4

TABLE II
BEST RULES FOUND FOR THE CA X ALL CLASSIFICATION USING GPLAB, WITH FINAL INDIVIDUAL FITNESS AND NUMBER OF PATTERNS

Rule  Group  Fitness  Patterns  + error (%)  - error (%)
1     PRI    100      3         48.2         1.6
2     PRI    119      4         46.4         2.2
3     SEC    149      1         84.9         1.2
4     SEC    122      3         59.6         3.6
5     PCC    129      3         34.3         11.2
6     PCC    136      3         79.5         0.6
7     WAI    112      4         51.8         4.1
8     WAI    124      3         65.7         2.3

III. RESULTS AND DISCUSSION

All the experiments were performed using 802 toxin primary sequences from scorpion, spider, conus, wasp, bee and sea anemone, labeled with Na, Ca or K channel functions. Of these, 746 sequences (93% of the data) were obtained from ATDB [18] and the remaining 56 sequences (7%) from Tox-Prot [19]. The dataset is composed of 419 samples with Na channel function, 166 with Ca channel function and 217 with K channel function.

TABLE III
BEST RULES FOUND FOR THE K X ALL CLASSIFICATION USING GPLAB, WITH FINAL INDIVIDUAL FITNESS AND NUMBER OF PATTERNS

Rule  Group  Fitness  Patterns  + error (%)  - error (%)
1     PRI    101      2         28.6         6.6
2     PRI    154      2         67.7         1.2
3     SEC    110      2         36.4         5.3
4     SEC    115      2         30.4         8.3
5     PCC    118      4         39.6         5.4
6     PCC    115      1         29.0         8.8
7     WAI    125      1         34.1         8.6
8     WAI    110      2         39.2         4.2

This is a typical multi-class classification problem, which can be handled with a one-versus-all decomposition. With this strategy, the developed system was used to solve three different binary problems, and a committee is then used to select the best answer:

• Classification of Na channel samples (positive class) against Ca and K ones (negative class).
• Classification of Ca channel samples (positive class) against Na and K ones (negative class).
• Classification of K channel samples (positive class) against Na and Ca ones (negative class).
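The one-versus-all decomposition above can be sketched in a few lines of Python. The function name `one_versus_all` and the toy label list are illustrative assumptions; the encoding of the positive class as 1 and the negative class as 0 is also an assumption, chosen to match a fitness defined as a sum of absolute differences.

```python
def one_versus_all(labels, positive):
    """Return binary targets: 1 for the positive class, 0 for all others."""
    return [1 if label == positive else 0 for label in labels]


# Toy channel labels for four toxins (illustrative only).
labels = ["Na", "Ca", "K", "Na"]

# One binary problem per ion channel.
problems = {channel: one_versus_all(labels, channel)
            for channel in ("Na", "Ca", "K")}
print(problems)
```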

Each of the three problems above was divided into four runs of the system, one for each sequence group, i.e., PRI, SEC, PCC and WAI. Since GPLab is an inherently stochastic technique, because the population is randomly initialized at the first generation, several experiments were made in order to find the best rules. After many runs of GPLab, it was possible to build Tabs. I, II and III, which show the best two rules for each sequence group obtained in the three classification problems presented. The information shown in these tables is:

• Rule: reference index.
• Group: sequence group used to create the rule. It can be PRI, SEC, PCC or WAI.
• Fitness: fitness value obtained with GPLab. Since the fitness is the classification error $\sum_{i=1}^{802} |\mathrm{predicted}_i - \mathrm{desired}_i|$, a smaller fitness means a better result.
• Patterns: number of pattern comparisons used by the rule, i.e., the complexity of the rule.
• + error (%): percentage of misclassification of this rule for the positive class.
• - error (%): percentage of misclassification of this rule for the negative class.
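The fitness and the two error columns can be computed as in the following Python sketch, which assumes that predictions and desired outputs are encoded as 1 (positive class) and 0 (negative class). The function names and the toy data are illustrative, not taken from the original Matlab code.

```python
def fitness(predicted, desired):
    """Classification error: sum of |predicted_i - desired_i| over all samples."""
    return sum(abs(p - d) for p, d in zip(predicted, desired))


def class_errors(predicted, desired):
    """Return (+ error %, - error %): misclassification rate within each class."""
    positives = [p for p, d in zip(predicted, desired) if d == 1]
    negatives = [p for p, d in zip(predicted, desired) if d == 0]
    pos_error = 100.0 * sum(1 for p in positives if p != 1) / len(positives)
    neg_error = 100.0 * sum(1 for p in negatives if p != 0) / len(negatives)
    return pos_error, neg_error


# Toy example: 4 positive and 4 negative samples.
desired = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 1]

print(fitness(predicted, desired))       # number of misclassified samples
print(class_errors(predicted, desired))  # per-class error percentages
```

Under this 0/1 encoding the fitness is simply the number of misclassified samples. As a consistency check on the tables, rule 1 of Tab. I gives about 0.308 × 419 + 0.023 × 383 ≈ 138 errors, which agrees with its reported fitness.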

Tab. I shows that the fitness values of the Na x all classification are worse (higher) than those of the other problems, mostly because the positive class (Na channel function) accounts for 52% of the data. The best fitness obtained for this problem is 138, for a PRI group rule with 3 patterns, which also has the best negative class error (2.3%). On the other hand, the worst fitness in this table is 160, for rule 7 of the WAI group, which uses one pattern and has the biggest positive class error (34.6%).

The results for the Ca x all classification are presented in Tab. II, where the best rule reached a fitness of 100 with the PRI group. The worst fitness (149) was obtained by rule 3, using the SEC group, with a positive class error of almost 85% and just one pattern. Both the best and the worst negative class errors were obtained with the PCC group, where rules 6 and 5 have 0.6% and 11.2% respectively. Rules 2 and 7 use the biggest number of patterns for this classification problem: 4 patterns each.

TABLE IV
RULE COMBINATIONS FOUND FOR EACH CLASSIFICATION PROBLEM

Class  Rules           Fitness  Pat.  + error (%)  - error (%)
Na     (4 and 6) or 7  118      5     21.0         7.8
Ca     1 or 6          87       6     44.0         2.2
K      (6 and 7) or 2  90       4     30.0         4.3
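The combinatorial search that produces combined rules such as those in Tab. IV can be sketched in Python as follows. This is a minimal sketch under assumptions: the original script was written in Matlab and its exact enumeration order is not described, so the left-to-right chaining of up to three rules, the function `best_combination` and the toy prediction vectors are hypothetical.

```python
from itertools import permutations, product

OPS = {"and": lambda a, b: a & b, "or": lambda a, b: a | b}


def fitness(predicted, desired):
    """Sum of absolute differences between predictions and desired outputs."""
    return sum(abs(p - d) for p, d in zip(predicted, desired))


def best_combination(rule_preds, desired, max_rules=3):
    """Try chains of rules joined by AND/OR and keep the lowest fitness."""
    best = None
    for k in range(1, max_rules + 1):
        for names in permutations(rule_preds, k):
            for ops in product(OPS, repeat=k - 1):
                pred = rule_preds[names[0]]
                expr = str(names[0])
                for op, name in zip(ops, names[1:]):
                    pred = [OPS[op](p, q) for p, q in zip(pred, rule_preds[name])]
                    expr = f"({expr} {op} {name})"
                score = fitness(pred, desired)
                if best is None or score < best[0]:
                    best = (score, expr)
    return best


# Toy 0/1 predictions of three rules on six samples (illustrative only).
desired = [1, 1, 1, 0, 0, 0]
rule_preds = {
    1: [1, 1, 0, 0, 0, 0],
    2: [0, 0, 1, 0, 0, 0],
    3: [1, 1, 1, 1, 1, 1],
}
print(best_combination(rule_preds, desired))
```

In this toy case no single rule is perfect, but the OR of rules 1 and 2 classifies every sample correctly, which mirrors how a combined rule can reach a lower fitness than any of its components.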

One can see from Tab. III that the best (101) and worst (154) fitness values were obtained with the PRI group, which also has the best and worst positive class errors, respectively 28.6% and 67.7%. The biggest number of patterns (4) was used by rule 5 (PCC group) for this classification problem.

Tab. IV corresponds to the last step of the developed system, the combination process. After the evaluation of all possible combinations of the rules presented in Tabs. I, II and III, the best rule for each classification problem is given. For instance, Ca classification rules 1 and 6 were combined and reached a fitness of 87, better than the best value of 100 in Tab. II, while K classification rules 6, 7 and 2 were combined into a rule with a fitness of 90, also better than all results in Tab. III.

For each combined rule shown in Tab. IV, all positions used by the detected patterns were analyzed in the correctly classified toxins that bind to Na, Ca and K channels. Although the importance of a feature cannot be assessed separately (without taking into account the whole classification rule), the frequency of appearance of a given feature indicates that it carries relevant information for the correctness of the classification.
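The frequency analysis described above amounts to counting, at a given aligned position, which features occur most often among the correctly classified toxins. A minimal Python sketch follows; the function `top_features`, the one-character feature codes and the toy sequences are assumptions made for illustration.

```python
from collections import Counter


def top_features(sequences, position, n=2):
    """Return the n most frequent features at one aligned position, with percentages."""
    counts = Counter(seq[position] for seq in sequences)
    total = len(sequences)
    return [(feature, 100.0 * count / total)
            for feature, count in counts.most_common(n)]


# Toy aligned PCC strings of correctly classified toxins ("-" is a gap).
sequences = ["-t", "bt", "-t", "-n"]

print(top_features(sequences, 0))  # first and second most frequent at position 0
print(top_features(sequences, 1))  # first and second most frequent at position 1
```

The two entries returned for each position play the role of the first and second most frequent features reported for each pattern position.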

Fig. 2. Best combination of rules found for Na classification after the last step of the intelligent system: 5 pattern comparisons using 3 groups

TABLE V
FREQUENCY OF PATTERNS FOUND IN NA X ALL CLASSIFICATION. *PATTERN 5 IS COMPARED WITH HYDROPHOBIC FEATURE ON FIG. 2

Pattern  Position   1st freq. (%)       2nd freq. (%)
1        SEC (81)   random coil (53.4)  gap (30.3)
1        SEC (100)  gap (98.2)          random coil (1.6)
2        SEC (109)  beta sheet (39.8)   gap (34.5)
2        SEC (238)  gap (98.1)          random coil (1.9)
3        PCC (49)   gap (45.2)          basic (17.8)
3        PCC (81)   thiol group (65.1)  gap (30.3)
4        PCC (108)  thiol group (56.0)  gap (29.2)
4        PCC (109)  gap (34.5)          neutral (32.3)
5*       WAI (68)   gap (58.0)          hydrophobic (40.1)

Fig. 3. Best combination of rules found for Ca classification after the last step of the intelligent system: 6 pattern comparisons using 2 groups

Fig. 4. Best combination of rules found for K classification after the last step of the intelligent system: 4 pattern comparisons using 3 groups

In the classification rule for Na binding toxins (Fig. 2 and Tab. V), some positions (e.g. 81 and 109) appear more than once, suggesting their importance for the classification. Some features occur more often at certain positions and deserve particular attention: beta sheet and random coil for the SEC group; thiol, basic and neutral for the PCC group; and hydrophobic for the WAI group.

The thiol group is found most frequently (65% and 56%) at positions 81 and 108. The amino acid cysteine (the only one with a thiol group) is very important for the 3D structure of toxins. Cysteines form disulfide bridges that define the final folding of the structure and therefore affect its function. Positions 109 and 81 most frequently use beta-sheet and random coil secondary structures (40% and 53%, respectively), while alpha helices do not appear to be used. Several studies show that the alpha helix is not directly involved in the pharmacological activity of Na binding toxins and is probably more involved in the stabilization of the structure [20]. Basic amino acids are used with lower frequency (18%) at position 49. They seem to be of crucial importance to the binding of toxins to the Na channel [21]. Several studies show that basic residues at some positions of the toxin are directly involved in its activity [20]. The increase in toxicity to mammals during evolution appears to be related to the global rise of positive charges in these molecules (basic amino acids are positively charged) [22].

TABLE VI
FREQUENCY OF PATTERNS FOUND IN CA X ALL CLASSIFICATION. *PATTERN 2 IS COMPARED WITH GLYCINE AND **PATTERN 4 IS COMPARED WITH NEUTRAL FEATURE ON FIG. 3

Pattern  Position   1st freq. (%)      2nd freq. (%)
1        PRI (161)  glycine (17.5)     aspartate (15.7)
1        PRI (171)  asparagine (24.8)  gap (22.4)
2*       PRI (50)   gap (32.9)         serine (17.8)
3        PRI (58)   gap (98.0)         tyrosine (1.3)
3        PRI (60)   gap (98.3)         phenylalanine (1.1)
4**      PCC (31)   gap (97.2)         neutral (1.5)
5        PCC (132)  gap (96.8)         aliphatic (2.0)
5        PCC (187)  gap (98.3)         neutral (1.4)
6        PCC (132)  gap (96.8)         aliphatic (2.0)
6        PCC (134)  gap (63.2)         carboxamide (11.3)

Neutral and hydrophobic amino acids are frequently found (32% and 40%) at positions 109 and 68, respectively. Neutral amino acids (glycine and alanine) are also hydrophobic. It is known that superficial hydrophobic residues are conserved in several toxins, probably because they interact with the channel [23]. Glycine residues are frequently conserved to facilitate the formation of some secondary structures, like turns, and some foldings of the polypeptide chain that can achieve favorable conformational angles [24].

Fig. 3 and Tab. VI present the classification rule for Ca binding toxins, where one can see the most frequently used features: glycine, aspartate, asparagine and serine, each with more than 15%, for the PRI group, and aliphatic (serine and threonine) for the PCC group. Glycine and aspartate are frequently used at position 161 (18% and 16%). Aspartate is negatively charged, and it has already been demonstrated that negative and positive charges are important for the toxin to achieve the correct electrostatic potential for its biological activity [25]. The structural importance of glycine has already been discussed. Asparagine, which frequently occurs at position 171 (25%), is highly conserved in scorpion toxins that act on calcium channels (a conclusion drawn from visual analysis of the alignment). Moreover, asparagine and threonine, when present at certain positions of the toxin, perform structural roles in the stabilization of the molecule [26]. In scorpion toxins, asparagine is also important to keep the proper orientation of the N-terminal portion of the molecule [27]. Position 50 of the PRI group frequently uses the feature serine (18%), which appears to be important for toxin function when present at specific positions of the toxin [28].

TABLE VII
FREQUENCY OF PATTERNS FOUND IN K X ALL CLASSIFICATION. *PATTERN 3 IS COMPARED WITH GLUTAMINE FEATURE ON FIG. 4

Pattern  Position   1st freq. (%)       2nd freq. (%)
1        PCC (67)   gap (54.8)          thiol group (31.2)
1        PCC (107)  gap (46.2)          basic (20.8)
2        WAI (21)   gap (65.9)          hydrophilic (31.3)
2        WAI (109)  hydrophobic (69.4)  gap (28.1)
3*       PRI (37)   gap (63.3)          aspartate (7.0)
4        PRI (26)   gap (46.2)          valine (18.3)
4        PRI (32)   cysteine (90.0)     lysine (3.4)

In the classification rule for K channel binding toxins (Fig. 4 and Tab. VII), the following features are most frequently used: cysteine and valine (PRI); thiol group and basic (PCC); hydrophilic and hydrophobic (WAI). Valine residues, most frequently used at position 26, are important to form a superficial hydrophobic surface on the structure, which is related to the binding to the channel, and also to stabilize the molecule [29]. Aromatic amino acids are also hydrophobic and, as already discussed, are important to form the hydrophobic surface. The importance of the following features has already been discussed: thiol group and cysteine, most used at positions 67 and 32 with frequencies of 31% and 90%; hydrophobic amino acids, frequently used at position 109 with a frequency of 69%; and basic amino acids, most used at position 107 (frequency of 21%). The importance of the hydrophilic feature (position 21, with a frequency of 31%) for determining the function of K channel binding toxins is still unknown; only experimental tests could biologically confirm the relevance of this pattern.

The high usage of the gap feature inserted by ClustalW reflects a low global similarity between the sequences. This means that several toxin sequences have amino acids at positions that could not be aligned with the others, a fact that was also revealed by the system.

After the development of the intelligent system and its application to three different binary classification problems (Na, Ca and K), it is necessary to give a single label to each toxin. One possible way to do this is to use a classification committee as follows:

1) A given toxin is evaluated according to the rules generated for each classification problem separately (Figs. 2, 3 and 4).
2) A weight is associated with each one of these three rules, according to the + error column of Tab. IV.
3) The best classification result among the three rules above for this toxin corresponds to its assigned class.

The weight for each rule is given by the positive class correctness, according to the + error column of Tab. IV:

• Na: 100 - 21 = 79, which corresponds to 39% of the total.
• Ca: 100 - 44 = 56, which corresponds to 27% of the total.
• K: 100 - 30 = 70, which corresponds to 34% of the total.
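The weight computation above, together with one possible reading of the committee decision, can be sketched in Python. The weights follow directly from the + error column of Tab. IV. The decision function `committee_label`, however, is an assumption: the text does not fully specify how the "best classification result" is selected, so this sketch simply picks the highest-weighted rule among those that fire and returns no label when none fires.

```python
# Positive class errors (%) of the combined rules, taken from Tab. IV.
pos_error = {"Na": 21.0, "Ca": 44.0, "K": 30.0}

# Positive class correctness and normalized weights.
correctness = {c: 100.0 - e for c, e in pos_error.items()}
total = sum(correctness.values())
weights = {c: round(v / total, 2) for c, v in correctness.items()}
print(weights)


def committee_label(fired):
    """Hypothetical decision: highest-weighted class among the rules that fired."""
    candidates = [c for c, hit in fired.items() if hit]
    if not candidates:
        return None  # assumption: no label when no rule fires
    return max(candidates, key=weights.get)


# Example: the Ca and K rules both accept the toxin, the Na rule rejects it.
print(committee_label({"Na": False, "Ca": True, "K": True}))
```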

TABLE VIII
FINAL RESULT OF THE SYSTEM: MULTI-CLASSIFICATION USING COMMITTEE FOR NA, CA AND K CLASSIFIERS

Class  Classifier weight  Number of toxins  Accuracy (%)
Na     0.39               419               97.0
Ca     0.27               166               55.0
K      0.34               217               67.0
Total  1.00               802               80.0
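The total accuracy in Tab. VIII is consistent with the per-class values weighted by the number of toxins in each class, as the short Python check below illustrates. The variable names are illustrative; the numbers are those of Tab. VIII.

```python
# Number of toxins and per-class accuracy (%) from Tab. VIII.
counts = {"Na": 419, "Ca": 166, "K": 217}
accuracy = {"Na": 97.0, "Ca": 55.0, "K": 67.0}

# Expected number of correctly labeled toxins and overall accuracy.
correct = sum(counts[c] * accuracy[c] / 100.0 for c in counts)
overall = 100.0 * correct / sum(counts.values())
print(round(correct), round(overall, 1))
```

This corresponds to roughly 643 correctly labeled toxins out of 802, i.e. about 160 errors.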

Using the weights 0.39, 0.27 and 0.34 for the rules of Na, Ca and K respectively, the classification error of the committee is 160, which corresponds to almost 20% of the data. The committee of rules reached a total classification accuracy of 80% for the 802 toxins according to Tab. VIII, with correctness of 97%, 55% and 67% for classes Na, Ca and K respectively.

IV. CONCLUSIONS

The primary sequence of a toxin is usually not enough to determine its function. This work presents an innovative way to find biological patterns in toxins in order to classify them according to their biological functions using computational intelligence. The discovered patterns make it possible to differentiate these toxins by their function: specific binding to different types of ion channels, namely sodium, potassium or calcium. The intelligent system is composed of five steps: primary sequence pre-processing, secondary structure prediction, sequence alignment, pattern recognition and rule combination.

All the experiments were performed using 802 toxin primary sequences from scorpion, spider, conus, wasp, bee and sea anemone. The dataset is composed of 419 samples with Na channel function, 166 with Ca channel function and 217 with K channel function. After the system is used to solve three different binary classification problems, one for each ion channel, a committee is applied to the database to obtain the final classification label for each toxin.

Even with a low level of similarity among the sequences, shown by the frequency of gaps, the developed system achieved a total error of 20% of the data, which suggests that the system is robust, although it can still be improved. Some fraction of the classification error may be explained by incorrect or even ambiguous annotations. The intelligent system will be incorporated into a previously developed database structure [30] in order to make information processing easier.
Other biological functions can be added together with PRI, SEC, PCC and WAI groups to improve the system capability of recognize different patterns. It is also possible to increase the number of computational intelligence techniques used by the system in order to get better classification results. R EFERENCES [1] C. R. Diniz, ”Chemical and pharmacologic aspects of Tityinae venoms”. Handbook Experimental Pharmacology, Berlin Springer-Verlag, c.14, 379-394., 48, 1978. [2] H. Jackson and T. N. Parks, ”Spider toxins: recent applications in neurobiology”, Ann. Rev. Neurosci, 12:405-414, 1989. [3] M. V. Gomez, E. Kalapothakis, C. Guatimosin and M. A. M. Prado, ”Phoneutria nigriventer venom: a cocktail of toxins that affect ion channels”, Cell. Molec. Neurobiol, 22: 579-588, 2002. [4] M. H. de Lima, S. G. Figueiredo, A. M. Pimenta, D. M. Santos, M. H. Borges, M. N. Cordeiro, M. Richardson, L. C. Oliveira, M. Stankiewicz and M. Pelhate, ”Peptides of arachnid venoms with insecticidal activity targeting sodium channels”, Comp. Biochem. Physiol. C. Toxicol. Pharmacology, 146: 264-279, 2007. [5] S. Kozlov and E. Grishin, ”Classification of spider neurotoxins using structural motifs by primary structure features. Single residue distribution analysis and pattern analysis techniques” Toxin 46, 672-686, 2005.

[6] J. D. Bendtsen, H. Nielsen, G. von Heijne and S. Brunak, "Improved Prediction of Signal Peptides: SignalP 3.0", Journal Mol. Biol., 340, 783-795, 2004.
[7] P. Duckert, S. Brunak and N. Blom, "Prediction of proprotein convertase cleavage sites", Protein Engineering, Design and Selection, vol. 17, no. 1, pp. 107-112, 2004.
[8] G. Pollastri, D. Przybylski, B. Rost and P. Baldi, "Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles", Proteins, 47, 228-235, 2002.
[9] J. D. Thompson, D. G. Higgins and T. J. Gibson, "Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice", Nucleic Acids Res, v.22, p.4673-4680, 1994.
[10] S. Silva, "GPLAB - A Genetic Programming Toolbox for MATLAB", ECOS - Evolutionary and Complex Systems Group, University of Coimbra, Portugal, 2007.
[11] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning internal representations by error propagation", Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1986.
[12] L. Baum, T. Petrie, G. Soules and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains", Ann Math Statistics, 1970.
[13] T. Christiansen and N. Torkington, "Perl Cookbook", 2nd Edition, 2003.
[14] J. Pearl, "Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning", Proceedings of the 7th Conference of the Cognitive Science Society, University of California, Irvine, CA, pp. 329-334, August 15-17, 1985.
[15] S. F. Altschul, D. J. Lipman, W. Miller, T. L. Madden, A. A. Schaffer, J. Zhang and Z. Zhang, "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Research, 25(17), December, 1997.
[16] J. R. Koza, "Genetic Programming: On the Programming of Computers by Means of Natural Selection", MIT Press, 1992.
[17] C. B. Moler, J. N. Little and S. Bangert, "MATLAB User's Guide", The MathWorks Inc, Sherborn, Massachusetts, 1992.
[18] Q. Y. He, Q. Z. He, X. C. Deng, L. Yao, E. Meng, Z. H. Liu and S. P. Liang, "ATDB: a uni-database platform for animal toxins", Nucleic Acids Research, 1-5, 2007.
[19] F. Jungo and A. Bairoch, "Tox-Prot, the toxin protein annotation program of the Swiss-Prot protein knowledgebase", Toxicon, 45:293-301, 2005.
[20] E. Blanc, O. Hassani, S. Meonier, P. Mansuelle, F. Sampieri, H. Rochat and H. Darbon, "H-NMR-derived secondary structure and overall fold of a natural anatoxin from the scorpion Androctonus australis Hector", Eur. J. Biochem, 247, 1118-1126, 1997.
[21] J. C. Rogers, Y. Qu, T. N. Tanada, T. Scheuer and W. A. Catterall, "Molecular determinants of high affinity binding of scorpion toxin and sea anemone toxin in the S3-S4 extracellular loop in domain IV of the sodium channel subunit", J. Biol. Chem, 271(27), 15950-15962, 1996.
[22] N. Srairi-Abid, P. Mansuelle, T. Mejri, H. Karovi, H. Rochat, F. Sampieri and M. el Ayeb, "Purification, characterization and molecular modelling of two toxin-like proteins from the Androctonus australis Hector venom", Eur. J. Biochem, 267, 5614-5620, 2000.
[23] C. Devaux, P. Fourquet and C. Granier, "A conserved sequence region of scorpion toxins rendered immunogenic induces broadly cross-reactive neutralizing antibodies", Eur. J. Biochem, 242, 727-735, 1996.
[24] I. Polikarpov, M. S. M. Junior, S. Marangoni, M. H. Toyama and A. Teplyakov, "Crystal structure of neurotoxins Ts1 from Tityus serrulatus provides insights into the specificity and toxicity of scorpion toxins", Journal Mol. Biol, 290, 175-184, 1999.
[25] N. Zilberberg, O. Froy, E. Loret, S. Cestele, D. Arad, D. Gordon and M. Gurevitz, "Identification of structural elements of a scorpion neurotoxin important for receptor site recognition", Journal Biol. Chem., 273(23), 14810-14816, 1997.
[26] D. Housset, C. H. Rochat, J. P. Astier and J. C. Fontecilla-Camps, "Crystal structure of toxin II from the scorpion Androctonus australis Hector refined at 1.3 Å resolution", Journal Biol. Chem., 238, 88-103, 1994.
[27] H. M. Li, D. C. Wang, Z. H. Zeng, L. Jin and R. Q. Hu, "Crystal structure of an acidic neurotoxin from scorpion Buthus martensii Karsch at 1.85 Å resolution", Journal Mol. Biol., 261, 415-431, 1996.
[28] E. S. Calderón-Aranda, B. Slisko, E. J. York, G. B. Gurrola, J. M. Stewart and L. D. Possani, "Mapping of an epitope recognized by a neutralizing monoclonal antibody specific to toxin Cn2 from the scorpion Centruroides noxius, using discontinuous synthetic peptides", Eur. Journal Biochem, 264, 746-755, 1999.
[29] S. Mouhat, B. Jouirou, A. Mosbah, M. de Waard and J. M. Sabatier, "Diversity of folds in animal toxins acting on ion channels", Biochem. Journal, 238, 717-726, 2004.
[30] R. S. Ribeiro, D. M. Teixeira, R. A. Soares, T. M. Mendes, M. H. Borges, M. N. Cordeiro, M. Richardson, M. A. Mudado, B. P. R. Carvalho and S. V. Carvalho, "FUTDB - Funed Toxin Data Base: a database to store data for experiments of protein and toxin purification", IV Conference of the Brazilian Association for Bioinformatics and Computational Biology, X-Meeting 2008, Salvador, Brazil, 2008.
