2D-RNA-Coupling Numbers: A New Computational Chemistry Approach to Link Secondary Structure Topology with Biological Function

HUMBERTO GONZÁLEZ-DÍAZ,1 GUILLERMÍN AGÜERO-CHAPIN,2 JAVIER VARONA,3 REINALDO MOLINA,4 GIOVANNA DELOGU,2 LOURDES SANTANA,1 EUGENIO URIARTE,1 GIANNI PODDA2

1 Department of Organic Chemistry, University of Santiago de Compostela, Santiago de Compostela 15782, Spain
2 Dipartimento Farmaco Chimico Tecnologico, Universitá Degli Studi di Cagliari, Cagliari 09124, Italy
3 Biomedicine Unit, FES Iztacala, UNAM, Los Reyes Iztacala, Tlalnepantla DF 54090, Mexico
4 REQUIMTE, Facultade de Ciências, Universidade do Porto, 4169-007 Porto, Portugal

Received 23 August 2006; Revised 6 October 2006; Accepted 9 October 2006
DOI 10.1002/jcc.20576
Published online 5 February 2007 in Wiley InterScience (www.interscience.wiley.com).

Abstract: Methods for predicting the function of proteins, DNA, or RNA and mapping it onto sequence often rely on bioinformatics alignment approaches rather than on chemical structure. Consequently, it is of interest to develop computational chemistry approaches based on molecular descriptors. In this sense, many researchers have used sequence-coupling numbers, and our group extended them to 2D protein representations. However, no coupling numbers have been reported for 2D-RNA topology graphs, which are highly branched and contain useful information. Here, we use a computational chemistry scheme: (a) transforming sequences into RNA secondary structures, (b) defining and calculating new 2D-RNA-coupling numbers, (c) seeking a structure-function model, and (d) mapping biological function onto the folded RNA. As an example, we studied the 1-aminocyclopropane-1-carboxylic acid (ACC) oxidases, known as ACOs, which control fruit ripening and are therefore important for the biotechnology industry. First, we calculated τk(2D-RNA) values for a set of 90 folded RNAs, including 28 transcripts of ACO and control sequences. Afterwards, we compared the classification performance of 10 different classifiers implemented in the software WEKA. In particular, the logistic equation ACO = 23.8 · τ1(2D-RNA) + 41.4 predicts ACOs with 98.9%, 98.0%, and 97.8% accuracy in training, leave-one-out, and 10-fold cross-validation, respectively. With this equation we then predicted ACO function for a sequence isolated in this work from Coffea arabica (GenBank accession DQ218452). The τ1(2D-RNA) descriptor also compares favorably with other descriptors. This equation allows us to map the codification of ACO activity onto different mRNA topology features. The present computational chemistry approach is general and could be extended to connect RNA secondary structure topology to other functions. © 2007 Wiley Periodicals, Inc.

J Comput Chem 28: 1049–1056, 2007

Key words: RNA secondary structure; molecular descriptors; sequence-function relationships; coupling numbers; linear classifiers; machine learning algorithms

Correspondence to: H. González-Díaz; e-mail: [email protected] or [email protected]
Contract/grant sponsor: Xunta de Galiza; contract/grant numbers: PXIB20304PR, BTF20302PR

González-Díaz et al. • Vol. 28, No. 6 • Journal of Computational Chemistry

Introduction

Structural genomics projects aim to provide a sharp increase in the number of structures of functionally unannotated, and largely unstudied, proteins together with their respective DNA and RNA sequences. Algorithms and tools capable of deriving information about function are therefore very useful. However, current methods for predicting protein function mostly rely on identifying a similar protein of known function. Earlier works demonstrated that, by representing proteins with simple numeric attributes (molecular descriptors), it is possible to assign function annotations to proteins.1–4 The method did not rely on detecting similarity to another protein and could be applied to any protein for which the attributes could be calculated.4 In principle, 1D, 2D, and 3D structural parameters developed for small molecules could be applied to nucleic acids to overcome this problem. For instance, Marrero-Ponce et al. have reported the use of molecular descriptors to predict RNA and protein properties without using alignment techniques.5,6 The simplest of these indices are the 1D ones. However, 1D molecular structure descriptors have already been largely explored by other researchers. Particularly outstanding are the contributions of Chou et al. and Cai et al., who authored many interesting articles on the use of sequence-order-coupling numbers to encode pseudo-amino acid composition.7–27 On the other hand, the use of 3D descriptors presupposes detailed, or at least approximate, knowledge of the 3D structure.28–30 In the case of proteins and DNA, one alternative is the use of nonrealistic but very useful 2D sequence representations such as those of Randić et al.31 Afterwards, one can calculate graph invariants of the sequence graph and use them in sequence structure-function studies.32 Some researchers prefer to refer to these studies as Quantitative Structure-Activity Relationships (QSAR), which are methods that use numeric indices of the structure of small molecules and macromolecules to predict biological properties.33–37 In general, the development and application of new sequence representations is an active field of computational chemistry. For instance, one can cite the seminal works of Liao et al.38–40 on DNA sequences or of Liu and Wang41 on proteins. Yao et al.42 and Liao et al.43–47 also reported interesting representations for RNA, although representations of this kind are less common, possibly because of the more branched nature of RNA secondary structure topology.
However, in the case of RNAs, accurate and fast methods can be applied to estimate the folded secondary structure.48,49 Consequently, a solution we have proposed is to scale the QSAR problem up to RNA by first transforming sequences into 2D secondary structures.50 One may then calculate molecular descriptors of the RNA secondary structure and use them for QSAR studies.51 One of these indices, called the electrostatic potential, has been shown to correlate with the biological function of biopolymers.52,53 In the present study, we selected as an example a QSAR study of the RNAs of the family of 1-aminocyclopropane-1-carboxylic acid (ACC) oxidases, known as ACOs. This plant enzyme participates in ethylene production from S-adenosylmethionine via ACC, the immediate precursor of ethylene. Ethylene plays an important role in plant growth and development, including the control of germination, senescence, floral fading, and fruit ripening. Therefore, ACOs are of major importance in fruit ripening and hence in plant molecular biology and biotechnology. In many plant species the ACO enzyme is encoded by a multigene family whose members are differentially regulated. ACO action is considered one of the rate-limiting steps in ethylene production. Large losses of fruits and vegetables are incurred annually because of ethylene's effects on plant senescence.54–56 In particular, the coffee plant (Coffea arabica cv. caturra rojo) presents a rapid fruit senescence process that causes significant losses because the fruits fall to the soil; it is also known that ripened coffee fruits on the soil can favor the growth of some grubs that directly affect the plants. This underlines the importance of studying these ACOs with QSAR techniques.57 First, we introduce and calculate for the first time the 2D-RNA-coupling numbers τk(2D-RNA) for a set of 90 RNA secondary structures, including 28 ACC oxidases and 62 control sequences.
We retain herein the symbol τ, which is the symbol used for the sequence-order coupling numbers (these can be seen as the "1D parents" of the present indices), but highlight the fact that we are working with 2D-RNA structures. Afterwards, we carry out experiments with 10 different classifiers implemented in the software WEKA to predict ACO RNA function for different plants. The classifiers are compared and one of them is selected. The selected classifier is then used for function annotation and back-projection mapping of the sequence-function relationship for a new ACO sequence isolated from Coffea arabica and reported for the first time in this work. The selected model also allows predicting the contribution of different structural features of the RNA secondary structure to ACO activity.

Methods

Calculating RNA Secondary Structure Parameters

The approach used here to calculate the 2D-RNA coupling numbers has been implemented in our in-house software MARCH-INSIDE (MARkovian CHemicals IN SIlico DEsign). MARCH-INSIDE takes as input the connectivity table (ct) files generated with the software RNAstructure, which estimates the secondary structure based on RNA folding free energy rules.58 The method determines τk(2D-RNA) for a folded RNA secondary structure as the average number of links of all nucleotides placed at topological distance k from each nucleotide. Including in the calculation the properties of nodes at increasingly larger topological distance k gives the analogy to sequence coupling numbers; the difference here is that the nodes included do not form a sequence but a 2D structure. We calculate this average with a Markov model, as previously reported. Specifically, for each j-th nucleotide we multiply the valence of its node in the graph (δj) by the absolute probability ᴬpk(j) of reaching this nucleotide when moving from any other place at topological distance k, and we sum over all n nucleotides. The valence takes the values δj = 3 for hydrogen-bonded nucleotides, δj = 2 for free nucleotides that are not chain terminal, and δj = 1 for free chain-terminal nucleotides:

$$\tau_k(\mathrm{RNA}) = \sum_{j=1}^{n} {}^{A}p_k(j)\,\delta_j \qquad (1)$$
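As an illustration of Eq. (1), the following Python sketch computes τk for a toy secondary structure. It is not the MARCH-INSIDE implementation, and all function names are hypothetical. It rests on three assumptions: the valence is taken as the node degree (which reproduces the values 3, 2, and 1 stated above, but gives 2 for a hydrogen-bonded chain-terminal nucleotide); the one-step transition probabilities follow the pattern of the examples in Table 1; and the initial absolute probabilities ᴬp0(j) are uniform, because the text does not specify them.

```python
def build_graph(n, pairs):
    """Neighbour sets for n nucleotides (0-based) linked by the backbone
    and by the hydrogen-bonded pairs listed in `pairs`."""
    nbrs = [set() for _ in range(n)]
    for i in range(n - 1):                 # backbone link i -- i+1
        nbrs[i].add(i + 1)
        nbrs[i + 1].add(i)
    for i, j in pairs:                     # base pair i -- j
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs


def valences(nbrs):
    """Valence delta_j, assumed here to be the node degree."""
    return [len(s) for s in nbrs]


def transition_matrix(nbrs):
    """One-step matrix: p(i -> j) = delta_j divided by the sum of delta
    over i itself and its neighbours (pattern of the Table 1 examples)."""
    delta = valences(nbrs)
    n = len(nbrs)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        reachable = nbrs[i] | {i}
        total = sum(delta[m] for m in reachable)
        if total == 0:                     # isolated single node
            P[i][i] = 1.0
            continue
        for j in reachable:
            P[i][j] = delta[j] / total
    return P


def coupling_number(n, pairs, k, p0=None):
    """tau_k = sum_j p_k(j) * delta_j, with p_k = p_0 * P^k."""
    nbrs = build_graph(n, pairs)
    delta = valences(nbrs)
    P = transition_matrix(nbrs)
    # Assumption: uniform initial absolute probabilities unless p0 is given.
    p = list(p0) if p0 is not None else [1.0 / n] * n
    for _ in range(k):
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sum(pj * dj for pj, dj in zip(p, delta))


if __name__ == "__main__":
    # Hypothetical toy hairpin: 8 nucleotides with pairs (0,7) and (1,6).
    for k in (0, 1, 2):
        print(k, round(coupling_number(8, [(0, 7), (1, 6)], k), 4))
```

With a 20-nucleotide chain in which only the u2–a19 pair is declared, `transition_matrix(build_graph(20, [(1, 18)]))[0][1]` evaluates to 3/(3 + 1) = 0.75, which matches the form of the c1 → u2 probability given in Table 1.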

In particular, evaluating the above equation for k = 0 gives the zero-rank 2D-RNA coupling number τ0(RNA), and for k = 1 the first-rank τ1(RNA), or contact coupling number, used in this work (see Table 1 for details). A free executable of MARCH-INSIDE that calculates total and local coupling numbers from connectivity tables (ct files) is available to the public upon request to the corresponding author at [email protected].

Experimental Studies

Table 1. Elements for the Definition of 2D-RNA Coupling Numbers τk(2D-RNA).

Fragment of one RNA secondary structure predicted for the sequence c1ucgauuuuaaauuuuugau20 (a)

Examples of one-step transition probabilities:

$${}^{1}p_{u_2,a_{19}} = \frac{\delta_{a_{19}}}{\delta_{a_{19}}+\delta_{u_2}+\delta_{c_1}+\delta_{c_3}}, \qquad {}^{1}p_{u_2,c_1} = \frac{\delta_{c_1}}{\delta_{c_1}+\delta_{u_2}+\delta_{c_3}+\delta_{a_{19}}}$$

$${}^{1}p_{a_{19},u_2} = \frac{\delta_{u_2}}{\delta_{u_2}+\delta_{a_{19}}+\delta_{g_{18}}+\delta_{u_{20}}}, \qquad {}^{1}p_{c_1,u_2} = \frac{\delta_{u_2}}{\delta_{u_2}+\delta_{c_1}}$$

Matrix form of the coupling numbers:

$$\tau_k(\text{2D-RNA}) = \begin{bmatrix} {}^{A}p_0(c_1) & {}^{A}p_0(u_2) & \cdots & {}^{A}p_0(u_{20}) \end{bmatrix} \left[{}^{1}\Pi\right]^{k} \begin{bmatrix} \delta_{c_1} \\ \delta_{u_2} \\ \vdots \\ \delta_{u_{20}} \end{bmatrix} = \sum_{j=1}^{n} {}^{A}p_k(j)\,\delta_j$$

Here ¹Π is the matrix of one-step transition probabilities: its entries are the ¹p values for nucleotides that are linked in the secondary structure (for example ¹p(c1,u2), ¹p(u2,u2), ¹p(u2,c3), and ¹p(a19,u20)) and 0 otherwise.

(a) These codes are those used in the classical representation of nucleic acid sequences. Please note that there are only four letters, "a, t, g, c", for a DNA sequence, with "u" used instead of "t" in the case of RNA. The letters represent the different classes of nucleic acid bases, and the number placed immediately after a base indicates, when used, the position of the base in the sequence.

These procedures follow three well-established protocols and techniques already published in the available scientific literature55,59:

1. Genomic DNA Isolation: Coffee leaf tissue was ground in liquid nitrogen using a precooled mortar and pestle. Genomic DNA was isolated from leaves following a previously described protocol. The pellet was resuspended in 300 μL of water at 50 °C. The DNA solution was purified using a Qiagen Tip-500 column as per the manufacturer's instructions (Qiagen GmbH, Germany).
2. PCR Amplification: PCR from leaf tissue was performed using 200 ng of genomic DNA. The reaction mixture was composed of 2.5 U Taq Pol (Gibco), 1 mM of each dNTP, 1.5 mM MgCl2, and 2 μM of the forward primer 5′-CTG TTY CAR GAY GAY AAR GT-3′ and of the reverse primer 5′-GCG NAG YTT CAT RTA RTC YTC-3′, in 1× Taq Pol buffer (Gibco), up to a total volume of 50 μL. The reaction was completed in three steps using a Perkin Elmer 2400 thermocycler, which was programmed as follows: 5 min of initial template denaturation at 94 °C; then 1 min of template denaturation at 94 °C, 2 min of primer annealing at 55 °C with an increment of 0.1 °C, and 2 min of primer extension at 72 °C, for 30 cycles; plus a final extension step at 72 °C for 5 min. The PCR product was run on a 1% agarose gel until visual band separation.
3. Cloning and Sequencing: The PCR reaction showed an intense band corresponding to the expected size (~400 bp), which was purified using the GEL Band Purification kit (Amersham Pharmacia Biotech). This PCR product was cloned into pGEM®-T Easy (Promega, USA), and recombinants were selected by the white/blue colony criterion using XL1-Blue competent cells. Sequencing was carried out on the same cloning vector using M13 phage primers by the MWG sequencing service (MWG-Biotech, Ebersberg, Germany).
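The two primers in the PCR step are degenerate: they contain IUPAC ambiguity codes (Y, R, N) and therefore each stands for a pool of concrete oligonucleotides. The short Python sketch below, with hypothetical helper names, counts and enumerates that pool. It uses only the standard IUPAC code table and the primer strings quoted above; it is an illustration, not part of the published protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

FORWARD = "CTG TTY CAR GAY GAY AAR GT"      # 5'->3', as quoted in the PCR step
REVERSE = "GCG NAG YTT CAT RTA RTC YTC"     # 5'->3', as quoted in the PCR step


def clean(primer):
    """Remove the codon-style spacing and normalise the case."""
    return primer.replace(" ", "").upper()


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in clean(primer):
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in clean(primer)]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(degeneracy(FORWARD), degeneracy(REVERSE))   # prints: 32 64
```

The forward primer carries five two-fold positions (2^5 = 32 variants), and the reverse primer carries one four-fold and four two-fold positions (4 × 2^4 = 64 variants).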

Results and Discussion

Comparison of Different Classifiers

In Table 2 we summarize the overall results of the knowledge mining analysis with the machine learning algorithms implemented in the WEKA software.60 As shown there, all eight algorithms find rules that discriminate between ACO and other sequences with accuracies higher than 96% in training and 93% in 10-fold cross-validation. In all cases there is a strong relationship between predicted and observed classification, expressed by kappa (κ) statistic values higher than 0.9; a value of κ = 1 indicates perfect discrimination.61 Given the high efficiency of all the algorithms, we selected the logistic regression function as the best one found, taking into consideration not only predictability but also simplicity and the possibility of back-projection:

1. Predictability: The logistic regression function presented the highest predictability, with a leave-one-out accuracy of 98% and a 10-fold cross-validation accuracy of 97.8%.


Table 2. Summary of Results for Different QSAR Studies of ACOs.

Method | Rules | nr (a) | ce (b) | % (c) | %cv (d) | κ (e) | Ref. (f)

Comparing different classifiers
JRip | R1: if τ1(RNA) ≥ 0.18 ⇒ aco; R2: otherwise ⇒ no | 2 | 89 | 98.9 | 95.5 | 0.97 | w
Part | R1: if τ1(RNA) ≤ 0.10 ⇒ no; R2: otherwise ⇒ aco | 2 | 89 | 98.9 | 93.3 | 0.97 | w
OneR | R1: if τ1(RNA) ≤ −0.16 ⇒ no; R2: otherwise ⇒ aco | 2 | 87 | 96.7 | 94.4 | 0.92 | w
Nnge | R1: if −0.15 ≤ τ1(RNA) ≤ −0.11 ⇒ no; R2: if −6.08 ≤ τ1(RNA) ≤ −0.16 ⇒ no; R3: if τ1(RNA) = −0.167 ⇒ aco; R4: if 0.18 ≤ τ1(RNA) ≤ 3.41 ⇒ aco | 4 | 90 | 100 | 95.5 | 1.00 | w
Conjunctive rule | R1: if τ1(RNA) ≤ 0.03 ⇒ no; R2: otherwise ⇒ aco | 2 | 89 | 98.9 | 94.4 | 0.97 | w
REPTree | R1: if τ1(RNA) ≤ 0.02 ⇒ no; R2: otherwise ⇒ aco | 2 | 89 | 98.9 | 95.5 | 0.97 | w
Random tree | R1: if τ1(RNA) ≥ 0.04 ⇒ aco; R2: else if τ1(RNA) < −0.17 ⇒ no; R3: else if τ1(RNA) < −0.16 ⇒ aco; R4: else if τ1(RNA) ≥ −0.16 ⇒ no | 4 | 90 | 100 | 95.5 | 1.00 | w
Logistic | R1: aco = 24.0 τ1(RNA) + 41.4 | 1 | 89 | 98.9 | 97.8 | 0.97 | w

Comparing different molecular descriptors
Logistic | R1: aco = 24.0 τ1(RNA) + 41.4 | 1 | 89 | 98.9 | 97.8 | 0.97 | w
Logistic | R1: aco = 23.8 ξ1 + 41.4 | 1 | 89 | 98.9 | 97.8 | 0.97 | w
LDA | R1: aco = 2.88 Y1 + 2.11 Y − 9.04 | 1 | 31 | 81.1 | 85.7 | – | 35

w means this work, and Ref. 35 was listed with the other references.
(a) Number of rules. (b) Number of correctly classified sequences. (c) Training accuracy. (d) 10-fold cross-validation accuracy. (e) κ statistic. (f) Reference.

2. Simplicity: We measured this aspect in terms of the number of rules (nr) that the algorithm built to discriminate ACO from other sequences. The logistic regression function was also the simplest, using only 1 rule.
3. Back-projection: This aspect expresses the possibility of back-projecting a QSAR model, that is, of mapping it backwards from the variable space to the structure. This property allows one to calculate the effect of every substructural element on the biological function, given the model.62,63 Because the coupling numbers τk(2D-RNA) are additive, one can calculate local τk(2D-RNA) values for different stems, loops, or other secondary structure features. These local τk(2D-RNA) values are then substituted into the QSAR equation to estimate the effect of substructures on function. Thus, the logistic regression function can also be back-projected.

Comparison of Different Molecular Descriptors Using Linear Classifiers

In addition, we compared τ1(2D-RNA) with other molecular descriptors used earlier, or in this work, for the same problem. We compared only linear classifiers in order to minimize the effect of the classification approach and focus on the effect of the molecular descriptor. The τ1(2D-RNA) is simpler to calculate than the molecular descriptors called entropies (Yk), which were used in a previously reported model.57 The τ1(2D-RNA) logistic model also compares very favorably with this other model in terms of the number of sequences studied, the number of variables in the model, and accuracy. We also compared the coupling number with the electrostatic potential index ξ1. We obtained very similar results with τ1(2D-RNA) and ξ1 (see Table 2). The molecular descriptor ξ1 is similar to τ1(2D-RNA) but assigns to each nucleotide the absolute sum of the electrostatic charges of all the atoms of the nucleotide. Consequently, unlike τ1(2D-RNA), which only measures connectivity, the ξ1 index is more complicated to calculate.28,52,64–66 In any case, we can explain the similar results for both indices by the fact that both are calculated for a large RNA secondary structure. As a result, the weight of each particular nucleotide is less important than the topology of the RNA as a whole. In addition, in the case of RNA the determination of the secondary structure already takes into account the nature of the nucleotides, making it unnecessary to differentiate between nucleotides in the calculation of the descriptors. Table 3 lists the names and τ1(2D-RNA) values for all the sequences used in this work.
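The logistic rule of Table 2 and the back-projection idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the per-nucleotide terms and the stem/loop partition are invented numbers, and the contribution of a region is taken to be the slope times its local coupling number, since the text does not say how the intercept is apportioned among substructures.

```python
import math


def logistic_probability(descriptor, slope, intercept):
    """Logistic link applied to the linear score slope * descriptor + intercept."""
    return 1.0 / (1.0 + math.exp(-(slope * descriptor + intercept)))


def local_coupling(terms, region):
    """Local coupling number: sum of the per-nucleotide terms
    p_k(j) * delta_j over the nucleotides of one region."""
    return sum(terms[j] for j in region)


def back_projection(terms, regions, slope):
    """Linear-score contribution of each region.
    Assumption: contribution = slope * local coupling number."""
    return {name: slope * local_coupling(terms, idx)
            for name, idx in regions.items()}


if __name__ == "__main__":
    SLOPE, INTERCEPT = 24.0, 41.4                 # logistic rule for tau_1 in Table 2
    terms = [0.10, 0.20, 0.15, 0.05]              # hypothetical per-nucleotide terms
    regions = {"stem": [0, 1], "loop": [2, 3]}    # hypothetical partition
    total = sum(terms)
    contrib = back_projection(terms, regions, SLOPE)
    # Additivity: regional contributions add up to the whole-molecule linear term.
    print(math.isclose(sum(contrib.values()), SLOPE * total))
    print(logistic_probability(total, SLOPE, INTERCEPT))
```

Because the descriptor is a sum over nucleotides, any partition of the structure into regions yields contributions that add up to the linear term of the whole molecule, which is what makes a color-coded map of the secondary structure possible.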


Table 3. Names and τ1(2D-RNA) Standardized Values for All Sequences.

Name | τ1

ACC oxidases
Rumex palustris | 3.409
Fagus sylvatica (European beech) | 2.269
Carica papaya (papaya) 1 | 2.055
Musa acuminata | 2.060
Nicotiana glutinosa | 1.892
Pyrus pyrifolia ppaoxb mRNA | 1.936
Malus sylvestris | 1.632
Pelargonium x hortorum | 1.129
Passiflora edulis | 1.206
Gossypium barbadense | 1.114
Nicotiana tabacum | 0.937
Psidium guajava | 0.989
Pisum sativum (pea) | 0.864
Sorghum bicolor (sorghum) | 0.662
Betula pendula | 0.572
Malus x domestica 1 | 0.602
Carica papaya (papaya) 4 | 0.522
Prunus persica (peach) | 0.488
Populus euramericana | 0.485
Malus domestica | 0.468
Oryza sativa | 0.317
Diospyros kaki (kaki persimmon) 1 | 0.287
Carica papaya (papaya) 3 | 0.285
Actinidia deliciosa | 0.202
Helianthus annuus | 0.225
Carica papaya (papaya) | 0.181
Rosa roxburghii | −0.168
Vigna angularis (adzuki bean) | 0.185

Non-ACC oxidases
Gonolabis marginalis mitoch | −0.148
Rat u1 small nuclear RNA gene | −0.170
E. crassus DNA RNA 5′-triphosphatase | −0.172
P. multimicronucleatum telomerase | −0.174
Tetrahymena capricornis telom RNA | −0.180
Mcs6 RNA (Mycoplasma capricolum) | −0.172
Nesogaster lewisi mitoch. gen | −0.192
Aichi virus genomic RNA | −0.187
Tetrahymena paravorax telom gene | −0.256
Mcs4 RNA (Mycoplasma capricolum) | −0.269
RNA polymerase sigma factor | −0.273
Cytophaga sp. 16s rib | −0.288
RNAb3114 | −0.297
Rat gata-1 gene | −0.289
Js1 bacterium 16s | −0.323
Tetrahymena australis telom RNA | −0.326

Molecular biology experimental techniques have allowed our group to study sequences relevant to biochemistry.32,67 The experimental techniques used herein allowed us to isolate, for the first time, a possible partial ACO DNA sequence from Coffea arabica cv. caturra rojo. The concentration of the genomic DNA solution, measured at 260 nm in a spectrophotometer, was 6.06 μg/μL, and its integrity was checked on a 0.8% agarose gel. Sequencing of the band showed a fragment of 423 bp, and its nucleotide sequence is already published in the GenBank database (http://www.ncbi.nlm.nih.gov) with accession DQ218452.


Table 3. (Continued)

Name | τ1

Monascus pilosus 18s rRNA | −0.326
Chlamydospora 18s rRNA | −0.327
Cordyceps militaris genes 18s rRNA | −0.331
Burkholderia sp. Pj310 16s rRNA | −0.340
Comamonas sp. Pj111 16s rRNA | −0.339
Alpha proteobacterium 16s rRNA | −0.344
Tag c4 16s rRNA | −0.360
Porphyridium p. 18s | −0.351
Bos taurus | −0.351
Crenarchaeote 16s | −0.360
Gama proteobac 16s rRNA2 | −0.359
Cytophaga sp. 16s ribo | −0.372
16s ribosomal RNA | −0.372
Rhodella violacea gen for 18s rRNA | −0.372
Actinomyc bact 16s rRNA5 | −0.366
Burkholderia sp. Pj431 16s | −0.389
Actinomycetes bacterium 16s rRNA2 | −0.376
Gram+ bact 16s RNA2 | −0.395
G proteobacterium 16s r | −0.397
Dixonielloa grisea 18s r | −0.393
Alpha proteobac 16s rRNA | −0.402
Dwarf guajaba poligalacturonase | −0.409
RNAb3042 | −0.407
Gama proteobact 16s rRNA | −0.401
Tick-borne encephalitis virus gene | −0.423
G proteobact 16s rRNA | −0.417
Actinomyc bact 16s rRNA3 | −0.412
Gram+ bact 16s rRNA | −0.461
Human telomerase RNA | −0.459
Gram+ bact 16s rRNA3 | −0.494
Rat histidine transporter | −0.581
Bacillus sp. 16s rRNA | −0.643
Actinomycetes bacterium 16s rRNA | −0.366
Actinomyc bact 16s rRNA4 | −0.370
Babesia sp gen for small RNA | −0.404
Proreus simulans mitoch. gene | −0.158
Pseudomonas sp. Pj311 16s | −0.358
P. caudatum telomerase | −0.301
Burkholderia sp. Pj604 16s | −0.292
Rana dybowskii 12s rRNA | −0.363
Halophilic bacterium tag f1 gene | −0.384
Bacillus brevis | −0.404
African clawed frog | −0.419
House mouse | −6.083
Rana chensinensis 12s | −0.355
Tetrahymena borealis telom RNA | −0.109

Figure 1 shows the fruits of one Coffea arabica plant as well as the electrophoresis image for the experiment in which this novel sequence was isolated. To illustrate the practical use of the model, we predicted the biological activity of this novel sequence isolated in this work:

1. First, we transformed the DNA sequence into the RNA sequence.
2. Second, we derived the 2D-RNA structure using the software RNAstructure.
3. Third, using the software MARCH-INSIDE we calculated the electrostatic potential ξ1 for the RNA secondary structure.
4. Later, using the logistic equation reported in Table 2, we found a very high conditional probability (0.92) of being an ACO sequence.
5. Next, using the software MARCH-INSIDE we calculated local τk(2D-RNA) values for some substructural features of the RNA secondary structure.
6. Afterwards, we substituted the local electrostatic potentials into the logistic equation and calculated the contribution of each substructure to the ACO activity.
7. Finally, we built the back-projection map by drawing the secondary structure of the RNA and using a color scale to rank the contribution of each fragment to the biological activity.

After this analysis, we can conclude that the model predicts a high probability that the new DNA sequence encodes an ACO protein. We must point out that the present approach does not need any alignment procedure. Therefore, the present method can be used as a complementary approach to alignment techniques such as BLAST-like procedures.68 Alignment-independent techniques as alternatives are currently an active field of research in molecular biology. Predictions with BLAST analysis and with the MARCH-INSIDE approach coincide very well. Both methods identify the sequence as an ACO and predict a high positive contribution for the region between 200 and 350 bp (see Figures 2 and 3). There is also a nonmatching region from approximately 70 to 170 bp in the BLAST analysis that is suspected to be a low-contribution or intron region in our DNA sequence. Therefore, once we turn the coffee DNA sequence into RNA, this region in particular will not contribute to ACO function. This hypothesis is coherent with the positive contribution predicted for this region with the RNA back-projection map. All these facts confirm the utility of MARCH-INSIDE as an additional tool for alignment-independent QSAR studies.4,69 In closing, in this work we introduce an alignment-independent method for function annotation of ACOs, which are important plant sequences. The result may be of interest for computational chemists who deal with computer-aided approaches to describe macromolecular structures, as well as for researchers in plant molecular biology, biochemistry, bioinformatics, and biotechnology. The practical use of the method was demonstrated with the isolation and prediction, for the first time, of a new ACO sequence from Coffea arabica cv. caturra rojo.

Figure 1. (I) Fruits of Coffea arabica. (II) PCR reaction results: (a) negative control without genomic DNA, (b) 1 kb ladder (Gibco BRL), (c) PCR reaction with degenerate primers using genomic DNA of Coffea arabica.

Figure 2. Results of the polynucleotide sequence alignment with the BLAST approach.

Figure 3. Back-projection map for a fragment of the secondary structure of the RNA transcript of the DNA sequence isolated from Coffea arabica.

Conclusions

This study emphasizes the definition of new structural parameters for RNA secondary structure-function relationship studies, a branch of computational chemistry that is currently less studied. This work demonstrates that the generalization of the sequence coupling numbers τk to RNA secondary structure coupling numbers τk(RNA) is a promising approach in the aforementioned sense. The study also shows the possibility of back-projecting the models derived with τk(RNA) to map the contribution of different RNA topologies to the biological function. Finally, the study demonstrates that, in the case of RNA, the determination of the secondary structure already takes into account the nature of the nucleotides, making it unnecessary to differentiate between nucleotides in the calculation of the descriptors. All of these facts validate the use of τk(RNA) as a new and promising tool for RNA structure-function computational chemistry analysis, complementary to bioinformatics techniques based on sequence alignment.

Acknowledgments

The authors thank Drs. P. Collazo and L. Valderrama, from CINVESTAV, Irapuato, Guanajuato, México, for primer design. H. González-Díaz acknowledges a two-month contract as guest professor from the Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Spain (August–September 2006).

References

1. O'Sullivan, O.; Suhre, K.; Abergel, C.; Higgins, D. G.; Notredame, C. J Mol Biol 2004, 340, 385.
2. Schafferhans, A.; Klebe, G. J Mol Biol 2001, 307, 407.
3. Kiel, C.; Wohlgemuth, S.; Rousseau, F.; Schymkowitz, J.; Ferkinghoff-Borg, J.; Wittinghofer, F.; Serrano, L. J Mol Biol 2005, 348, 759.
4. Dobson, P. D.; Doig, A. J. J Mol Biol 2003, 330, 771.
5. Marrero Ponce, Y.; Castillo Garit, J. A.; Nodarse, D. Bioorg Med Chem 2005, 13, 3397.
6. Marrero-Ponce, Y.; Medina-Marrero, R.; Castillo-Garit, J. A.; Romero-Zaldivar, V.; Torrens, F.; Castro, E. A. Bioorg Med Chem 2005, 13, 3003.
7. Chou, K. C.; Cai, Y. D. J Proteome Res 2006, 5, 316.
8. Chou, K. C.; Cai, Y. D. J Cell Biochem 2004, 91, 1197.
9. Chou, K. C.; Cai, Y. D. Proteins 2003, 53, 282.
10. Chou, K. C.; Cai, Y. D. J Cell Biochem 2003, 90, 1250.
11. Cai, Y. D.; Zhou, G. P.; Chou, K. C. J Theor Biol 2005, 234, 145.
12. Cai, Y. D.; Liu, X. J.; Xu, X. B.; Chou, K. C. J Cell Biochem 2002, 84, 343.
13. Cai, Y. D.; Liu, X. J.; Xu, X. B.; Chou, K. C. Mol Cell Biol Res Commun 2000, 4, 230.
14. Cai, Y. D.; Lin, S. L. Biochim Biophys Acta 2003, 1648, 127.
15. Cai, Y. D.; Chou, K. C. J Theor Biol 2006, 238, 395.
16. Cai, Y. D.; Chou, K. C. Biochem Biophys Res Commun 2004, 323, 425.
17. Liu, H.; Yang, J.; Wang, M.; Xue, L.; Chou, K. C. Protein J 2005, 24, 385.
18. Pan, Y. X.; Li, D. W.; Duan, Y.; Zhang, Z. Z.; Xu, M. Q.; Feng, G. Y.; He, L. Acta Biochim Biophys Sin (Shanghai) 2005, 37, 88.
19. Chou, K. C. Proteins 2001, 43, 246.
20. Chou, K. C. Biochem Biophys Res Commun 2000, 278, 477.
21. Pan, Y. X.; Zhang, Z. Z.; Guo, Z. M.; Feng, G. Y.; Huang, Z. D.; He, L. J Protein Chem 2003, 22, 395.
22. Demeler, B.; Zhou, G. W. Nucleic Acids Res 1991, 19, 1593.
23. Blom, N.; Gammeltoft, S.; Brunak, S. J Mol Biol 1999, 294, 1351.
24. Bologna, G.; Yvon, C.; Duvaud, S.; Veuthey, A. L. Proteomics 2004, 4, 1626.
25. Brunak, S.; Engelbrecht, J.; Knudsen, S. J Mol Biol 1991, 220, 49.
26. Yang, Z. R.; Wang, L.; Young, N.; Trudgian, D.; Chou, K. C. Curr Protein Pept Sci 2005, 6, 479.
27. Chou, K. C.; Cai, Y. D. J Biol Chem 2002, 277, 45765.
28. Gonzalez-Diaz, H.; Molina, R.; Uriarte, E. FEBS Lett 2005, 579, 4297.
29. Gonzalez-Diaz, H.; Molina, R.; Uriarte, E. Bioorg Med Chem Lett 2004, 14, 4691.
30. Ramos de Armas, R.; Gonzalez-Diaz, H.; Molina, R.; Uriarte, E. Proteins 2004, 56, 715.
31. Randić, M.; Vračko, M.; Nandy, A.; Basak, S. C. J Chem Inf Comput Sci 2000, 40, 1235.
32. Agüero-Chapin, G.; Gonzalez-Diaz, H.; Molina, R.; Varona-Santos, J.; Uriarte, E.; Gonzalez-Diaz, Y. FEBS Lett 2006, 580, 723.
33. Perez, M. A.; Sanz, M. B.; Torres, L. R.; Avalos, R. G.; Gonzalez, M. P.; Gonzalez-Diaz, H. Eur J Med Chem 2004, 39, 905.
34. Marrero Ponce, Y.; Cabrera Perez, M. A.; Romero Zaldivar, V.; Gonzalez-Diaz, H.; Torrens, F. J Pharm Pharm Sci 2004, 7, 186.
35. Perez Gonzalez, M.; Dias, L. C.; Helguera, A. M.; Rodriguez, Y. M.; de Oliveira, L. G.; Gomez, L. T.; Gonzalez-Diaz, H. Bioorg Med Chem 2004, 12, 4467.
36. Gonzalez-Diaz, H.; Uriarte, E. Biopolymers 2005, 77, 296.
37. Ramos de Armas, R.; Gonzalez-Diaz, H.; Molina, R.; Perez Gonzalez, M.; Uriarte, E. Bioorg Med Chem 2004, 12, 4815.
38. Liao, B.; Ding, K. J Comput Chem 2005, 26, 1519.
39. Liao, B.; Wang, T. M. J Comput Chem 2004, 25, 1364.
40. Liao, B.; Xiang, X.; Zhu, W. J Comput Chem 2006, 27, 1196.
41. Liu, L.; Wang, T. J Comput Chem 2006, 27, 1119.
42. Yao, Y.-H.; Nan, X.-Y.; Wang, T.-M. J Comput Chem 2005, 26, 1339.
43. Yu-Hua, Y.; Liao, B.; Tian-Ming, W. J Mol Struct (Theochem) 2005, 755, 131.
44. Liao, B.; Wang, T. J Biomol Struct Dynam 2004, 21, 827.
45. Liao, B.; Ding, K.; Wang, T. J Biomol Struct Dynam 2005, 22, 455.
46. Liao, B.; Luo, J.; Li, R.; Zhu, W. Int J Quantum Chem 2006, 106, 1749.
47. Zhu, W.; Liao, B.; Ding, K. J Mol Struct (Theochem) 2005, 757, 193.
48. Hoen, P. A.; Out, R.; Commandeur, J. N.; Vermeulen, N. P.; van Batenburg, F. H.; Manoharan, M.; van Berkel, T. J.; Biessen, E. A.; Bijsterbosch, M. K. RNA 2002, 8, 1572.
49. Yang, S. P.; Song, S. T.; Tang, Z. M.; Song, H. F. Acta Pharmacol Sin 2003, 24, 897.
50. Gonzalez-Diaz, H.; de Armas, R. R.; Molina, R. Bioinformatics 2003, 19, 2079.
51. Gonzalez-Diaz, H.; Perez-Bello, A.; Uriarte, E.; Gonzalez-Diaz, Y. Bioorg Med Chem Lett 2006, 16, 547.
52. Saiz-Urra, L.; Gonzalez-Diaz, H.; Uriarte, E. Bioorg Med Chem 2005, 13, 3641.
53. Schleinkofer, K.; Wiedemann, U.; Otte, L.; Wang, T.; Krause, G.; Oschkinat, H.; Wade, R. C. J Mol Biol 2004, 344, 865.
54. Castellano, J. M.; Vioque, B. Plant Growth Regulation 2002, 38, 203.
55. Dellaporta, S. L.; Wood, J.; Hicks, J. B. Plant Mol Biol Reporter 1983, 1, 19.
56. Giovannoni, J. Ann Rev Plant Physiol Plant Mol Biol 2001, 52, 725.
57. Gonzalez-Diaz, H.; Aguero-Chapin, G.; Varona-Santos, J.; Molina, R.; de la Riva, G.; Uriarte, E. Bioorg Med Chem Lett 2005, 15, 2932.
58. Mathews, D. H.; Sabina, J.; Zuker, M.; Turner, D. H. J Mol Biol 1999, 288, 911.
59. Mason, M. G.; Botella, J. R. J Plant Physiol 1997, 24, 239.
60. Witten, I. H.; Frank, E. WEKA (Waikato Environment for Knowledge Analysis), Data mining software in Java. 2000. http://www.es.waikato.ac.nz/ml/weka/.
61. Witten, I. H.; Frank, E. Data Mining: Practical Machine Learning Tools and Techniques; 2nd edition; Morgan Kaufmann Publishers, 2005, pp. 265–320; p. 525.
62. Stiefl, N.; Baumann, K. J Med Chem 2003, 46, 1390.
63. Gonzalez-Diaz, H.; Torres-Gomez, L. A.; Guevara, Y.; Almeida, M. S.; Molina, R.; Castanedo, N.; Santana, L.; Uriarte, E. J Mol Model (Online) 2005, 11, 116.
64. Gonzalez-Diaz, H.; Uriarte, E. Bioorg Med Chem Lett 2005, 15, 5088.
65. Gonzalez-Diaz, H.; Perez-Bello, A.; Uriarte, E.; Gonzalez-Diaz, Y. Bioorg Med Chem Lett 2006, 16, 547.
66. Gonzalez-Diaz, H.; Sanchez-Gonzalez, A.; Gonzalez-Diaz, Y. J Inorg Biochem 2006, 100, 1290.
67. Vazquez-Padron, R. I.; de la Riva, G.; Aguero-Chapin, G.; Silva, Y.; Pham, S. M.; Soberon, M.; Bravo, A.; Aïtouchea, A. FEBS Lett 2004, 570, 30.
68. Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Nucleic Acids Res 1997, 25, 3389.
69. Dobson, P. D.; Doig, A. J. J Mol Biol 2005, 345, 187.
