A polynomial time supertree construction method

Jaikishan Jalan
Iowa State University, Ames, IA, USA
[email protected]

Abstract

In this paper, I propose a polynomial time supertree construction method which guarantees that the constructed supertree contains certain specially structured clusters that are present in all the input trees. We show that, unlike Min Cut, the constructed supertree does not always preserve the nesting property. The algorithm runs in time polynomial in the total number of taxa present in the input trees. We also compare the results with existing techniques such as MRP, MRF, Min Cut and Modified Min Cut, and show that the constructed supertree differs from the ones obtained by those methods.

1 Introduction

A supertree is a rooted evolutionary tree constructed from smaller phylogenies that share some, but not necessarily all, taxa (leaf nodes). Thus, a supertree can help us understand the relationships among a larger set of taxa than occurs in any single input tree. In addition to helping synthesize hypotheses of relationships among larger sets of taxa, supertrees can suggest optimal strategies for taxon sampling, can reveal emerging patterns in the large knowledge base of phylogenies currently in the literature, and can provide useful tools for comparative biologists, who frequently have information about variation across much broader sets of taxa than those found in any one tree.

Given a set of phylogenetic input trees, the main objective of a supertree problem is to preserve information from the input trees and to derive novel relationships among the input taxa. A supertree is a solution to a supertree problem. The most widely used supertree algorithm in phylogenetics is Matrix Representation using Parsimony (MRP). MRP encodes the input trees as binary characters, and the supertree is constructed from the resulting data matrix using a parsimony tree building method. The MRP supertree problem is NP-complete, and hence no polynomial time algorithm is known for it. Similarly, supertree construction by flipping is NP-complete [5]. Min Cut and Modified Min Cut [1] run in polynomial time and also preserve the nesting property, but they say nothing about the strict consensus property. To the best of my knowledge, there is to date no polynomial time supertree construction method that maintains the strict consensus property in the supertree. In this paper, I propose a supertree construction method such that the supertree contains the clusters (with a special property of the structure of the subtree below them) that are present in all the input trees. This algorithm is attractive because it can potentially scale to large problems.

2 Proposed Algorithm

2.1 Terminologies

In this section we define the terminology used by the algorithm.

Definition 1: A profile is a set of rooted phylogenetic X-trees with overlapping taxon sets and possibly incompatible information. Each tree in the profile has a leaf set.

Definition 2: A data pair is a pair of taxa drawn from the leaf sets of the trees in the profile. Its weight is defined in terms of two quantities measured in an input tree: the depth of the least common ancestor of the pair from the root of the tree, and the maximum path length of a leaf node from the root of the tree.
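The two quantities that enter the weight of a data pair can be computed directly from a tree. The sketch below is illustrative only: it assumes trees are written as nested Python tuples with strings as leaves, and the function names are hypothetical. It deliberately stops short of combining the two quantities into a weight.

```python
def leaf_paths(tree, path=()):
    """Map each leaf name to the tuple of child indices leading to it from the root.

    Assumption: a tree is a nested tuple and a leaf is a string.
    """
    if isinstance(tree, str):
        return {tree: path}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (index,)))
    return paths


def lca_depth(tree, a, b):
    """Depth (number of edges from the root) of the least common ancestor of a and b."""
    paths = leaf_paths(tree)
    depth = 0
    # The LCA sits at the end of the longest common prefix of the two root paths.
    for step_a, step_b in zip(paths[a], paths[b]):
        if step_a != step_b:
            break
        depth += 1
    return depth


def max_leaf_depth(tree):
    """Maximum path length from the root to any leaf."""
    return max(len(path) for path in leaf_paths(tree).values())


example = ((("a", "b"), "c"), "d")
print(lca_depth(example, "a", "b"), lca_depth(example, "a", "d"), max_leaf_depth(example))
```

In the caterpillar used here, the pair at the bottom has the deepest least common ancestor and the pair that straddles the root has depth 0.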

2.2 Algorithm

In this section, we propose an algorithm that generates a supertree containing every leaf node that appears in the input trees. The supertree also contains all the clusters, with a special property of the subtree below them, that are present in all the input trees.

The algorithm starts by generating the data pairs over the profile. A pair is generated only when there exists at least one input tree that contains both of its taxa, and its weight is calculated as per Definition 2 in Section 2.1. The algorithm sorts these data pairs in descending order of weight. It then picks the data pairs from the sorted sequence one at a time and combines each with the partially constructed tree, depending on its weight. If both nodes of the data pair are already contained in the partial tree, the algorithm ignores the pair. If the partial tree contains exactly one node of the data pair, the algorithm compares the weight of the data pair with the weight of the data pair which introduced the common node in one of the previous iterations. If the weights are the same, the algorithm adds the new node to the least common ancestor of the nodes of the earlier pair. If the weight of the current pair is less than the weight of the earlier pair, the algorithm takes the child of the root of the partial tree which contains the common node, creates a temporary node whose left child is that child and whose right child is the new node, and replaces that child with the temporary node in the partial tree. If neither node of the data pair is contained in the partial tree, the algorithm replaces the partial tree by a single node whose left child is the partial tree and whose right child is the data pair. The algorithm terminates only when all the leaf nodes present in the input trees are contained in the partial tree.
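The pair generation step can be sketched as follows. This is a minimal illustration, assuming trees are nested Python tuples with strings as leaves; `leaves` and `data_pairs` are hypothetical names, not part of the original method description.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf names of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def data_pairs(profile):
    """Generate every pair of taxa that occurs together in at least one input tree."""
    pairs = set()
    for tree in profile:
        # Sorting the leaf names gives each unordered pair one canonical form.
        for a, b in combinations(sorted(leaves(tree)), 2):
            pairs.add((a, b))
    return sorted(pairs)


profile = [(("a", "b"), "c"), (("b", "c"), "d")]
print(data_pairs(profile))
```

Taxa that never share an input tree, such as a and d in this profile, produce no data pair, so the number of pairs never exceeds n(n-1)/2 for n taxa.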

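Algorithm:
1. Sort all possible data pairs in descending order of their weights.
2. Initialize the partial tree as empty.
3. For each data pair in sorted order, let k be the number of nodes of the pair that are already contained in the partial tree.
3.1 If k = 0 then replace the partial tree by a single node whose left child is the partial tree and whose right child is the data pair.
3.2 If k = 1 then let q be the data pair which previously introduced the common node into the partial tree.
3.2.1 If the weight of the current pair equals the weight of q then add the new node of the current pair to the least common ancestor of the nodes of q.
3.2.2 If the weight of the current pair is less than the weight of q then let C be the child of the root of the partial tree which contains the common node. Make a new node whose left child is C and whose right child is the new node of the current pair, and replace C by this new node in the partial tree.
3.4 If k = 2 then do nothing.
3.5 If the partial tree contains all the leaf nodes of the input trees then return the partial tree.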

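4. End

Clearly, the tree returned by the algorithm contains every leaf node of the input trees. This can be proved by contradiction. Suppose it does not. Then there exists at least one leaf node, say x, which is not present in the returned tree. The algorithm starts by generating all possible data pairs over the given profile. Hence at least one data pair containing x must be generated, assuming that x shares an input tree with at least one other taxon. If the partial tree does not contain x when this pair is processed, then two cases can happen:

Neither node of the pair is in the partial tree: in this case, the algorithm would have included x in step 3.1. Hence when the algorithm has terminated, the tree has x. This is a contradiction.

Exactly one node of the pair is in the partial tree: in this case, the algorithm would have included x in step 3.2. Hence when the algorithm has terminated, the tree has x. This is a contradiction.

Hence, the returned tree contains all the leaf nodes present in the input trees.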

The above algorithm runs in polynomial time. The outer loop iterates at most once per data pair, and n taxa give at most n(n-1)/2 data pairs. During the iteration, the algorithm maintains an array which records the taxa that have been added to the partial tree. Hence the intersection between the partial tree and a data pair can be computed in a constant amount of time. The least common ancestor can be found using the algorithm of Bender and Farach-Colton [4]. Hence the running time of the algorithm is polynomial in the number of taxa.
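The merging loop of Section 2.2 can be sketched as below. This is one possible reading, not the author's implementation. It assumes that weights are supplied by the caller as (weight, taxon, taxon) triples, that the partial tree is a nested Python list, and that the tree is kept under an artificial root so that the first cherry is itself a child of the root. All names are hypothetical, and the least common ancestor is found by a plain recursive search rather than by the algorithm of [4].

```python
def leaves(node):
    """Return the set of leaf names below a node (leaves are strings)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def lca_node(node, a, b):
    """Return the deepest internal node (a list) whose leaves include both a and b."""
    for child in node:
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return lca_node(child, a, b)
    return node


def build_supertree(weighted_pairs):
    root = []            # children of an artificial root (assumption of this sketch)
    introduced_by = {}   # taxon -> (weight, a, b) of the pair that added it
    # Step 1: process pairs in descending order of weight (ties keep input order).
    for w, a, b in sorted(weighted_pairs, key=lambda p: -p[0]):
        present = [t for t in (a, b) if t in introduced_by]
        if len(present) == 2:
            continue                      # step 3.4: both taxa already placed
        if len(present) == 0:             # step 3.1: join old tree and new cherry
            cherry = [a, b]
            if not root:
                root = [cherry]
            else:
                old = root[0] if len(root) == 1 else root
                root = [old, cherry]
            introduced_by[a] = introduced_by[b] = (w, a, b)
            continue
        # Step 3.2: exactly one taxon of the pair is already placed.
        old_taxon = present[0]
        new_taxon = b if old_taxon == a else a
        w_q, qa, qb = introduced_by[old_taxon]
        if w == w_q:                      # step 3.2.1: attach at the LCA of the earlier pair
            lca_node(root, qa, qb).append(new_taxon)
        else:                             # step 3.2.2: wrap the root child holding old_taxon
            i = next(i for i, c in enumerate(root) if old_taxon in leaves(c))
            root[i] = [root[i], new_taxon]
        introduced_by[new_taxon] = (w, a, b)
    return root[0] if len(root) == 1 else root


pairs = [(3, "a", "b"), (2, "a", "c"), (2, "b", "c"),
         (1, "c", "d"), (1, "a", "d"), (1, "b", "d")]
print(build_supertree(pairs))
```

With these illustrative weights, which mimic a single caterpillar input tree, the sketch rebuilds the caterpillar one taxon at a time. Because the pairs are sorted in descending order, a pair sharing one taxon with the tree can never be heavier than the pair that introduced that taxon, so only the two sub-cases of step 3.2 arise.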

3 Properties

In this section, we review some of the properties that the supertree generated by the algorithm is guaranteed to preserve, and also those which it might not always preserve. We show that the constructed supertree is guaranteed to contain all the cherries which are present in all the input trees. We also show that every cluster which is present in all the input trees, and whose subtree is caterpillar shaped, is also present in the supertree. Finally, we conclude this section by giving a counterexample to prove that the algorithm does not preserve the nesting property.

Lemma 1: Every cherry that is present in all the input trees is also present in the supertree.

Proof: Let C be a cluster of size 2 which is present in all the input trees. Consider the input trees in which the root of C is a child of the root of the tree. Since C is a cluster of size 2, it occurs only as a cherry in each tree. Therefore the pair formed by the two taxa of C has a higher weight than any other pair that contains one of them.

Let T be the tree partially constructed by the algorithm before either taxon of C was added. As shown above, the pair formed by the two taxa of C has the highest weight among all the pairs which can possibly introduce either taxon. Therefore, this pair will be added to T.
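To check the statement about cherries on concrete inputs, the cherries shared by every tree of a profile can be listed with a short helper. This is a sketch assuming nested-tuple trees with strings as leaves; `cherries` and `common_cherries` are hypothetical names.

```python
def cherries(tree):
    """Return the cherries of a nested-tuple tree as frozensets of two leaf names."""
    if isinstance(tree, str):
        return set()
    found = set()
    # A cherry is an internal node with exactly two children, both of them leaves.
    if len(tree) == 2 and all(isinstance(child, str) for child in tree):
        found.add(frozenset(tree))
    for child in tree:
        found |= cherries(child)
    return found


def common_cherries(profile):
    """Return the cherries that occur in every tree of the profile."""
    per_tree = [cherries(tree) for tree in profile]
    return set.intersection(*per_tree) if per_tree else set()


profile = [((("a", "b"), "c"), "d"), (("a", "b"), ("c", "e"))]
print(common_cherries(profile))
```

Every cherry returned by `common_cherries` is one that the supertree is expected to contain.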

Lemma 2: Every cluster that is present in all the input trees, and whose subtree is caterpillar shaped, is also present in the supertree.

Proof: Consider a caterpillar shaped cluster which is present in all the input trees. Consider the input trees in which the root of the cluster is a child of the root of the tree. In these trees, a pair formed by two taxa of the cluster has a higher weight than a pair formed by one taxon of the cluster and one taxon outside it. Therefore, in general, we can say that the same relation holds for all such pairs in all the input trees. - [1]

Consider a partially built tree which does not contain any node from the cluster, and consider the first data pair picked by the algorithm which introduces a node from the cluster. It will always be a pair formed by two taxa of the cluster, since the weight of any pair formed by one taxon of the cluster and one taxon outside it is less.

Now consider all the pairs formed by two taxa of the cluster. Clearly, from the definition of the weighting scheme, these pairs are ordered by weight according to their position in the caterpillar, with the pair at the bottom of the caterpillar having the highest weight. - [2]

Let T be the partial supertree built by the algorithm which does not contain any node from the cluster. From [1] and [2], the pair at the bottom of the caterpillar will be the first pair from the cluster added to T. Some time later during the execution, the algorithm tries to add another node from the cluster. From [2], the second highest pair consists of the next taxon up the caterpillar together with one of the two taxa of the bottom pair. Again, from [1], it is the highest weighted pair containing that next taxon. Therefore, the algorithm will choose one of these two pairs. In either case, the algorithm will run step 3.2 and form the cluster consisting of the three lowest taxa of the caterpillar. The same argument can be extended, and by the end the supertree will contain the whole cluster.

The example below shows a caterpillar shaped cluster being preserved. However, the algorithm might not necessarily preserve other types of cluster; in the example, one such cluster is not preserved.
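Whether a cluster is caterpillar shaped can be tested mechanically. The sketch below assumes nested-tuple trees with strings as leaves and takes "caterpillar shaped" to mean that every internal node has at most one child that is not a leaf. That definition is an assumption of this example, and the function names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def is_caterpillar(tree):
    """True if every internal node has at most one non-leaf child (assumed definition)."""
    if isinstance(tree, str):
        return True
    internal = [child for child in tree if not isinstance(child, str)]
    if len(internal) > 1:
        return False
    return all(is_caterpillar(child) for child in internal)


def caterpillar_clusters(tree):
    """Return the leaf sets of all subtrees that are caterpillar shaped."""
    if isinstance(tree, str):
        return set()
    found = set()
    if is_caterpillar(tree):
        found.add(frozenset(leaves(tree)))
    for child in tree:
        found |= caterpillar_clusters(child)
    return found


tree = ((("a", "b"), "c"), ("d", "e"))
print(is_caterpillar(tree), caterpillar_clusters(tree))
```

Intersecting `caterpillar_clusters` over all the input trees gives the clusters that the supertree is expected to contain.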

Lemma 3: The supertree constructed by the algorithm will not always preserve the nesting property.

Proof: To prove the above claim, it is sufficient to show that there exists an input for which the supertree obtained by the algorithm does not preserve the nesting property. Consider the two input trees shown in Fig 1, and let A and B be two sets of taxa from these trees. Clearly, A is nested in B in both the input trees, but A is not nested in B in the output tree.

Thus, we conclude that the supertree constructed by the proposed algorithm always contains the cherries, and the clusters whose subtree is caterpillar shaped, that are contained in all the input trees. We also showed that the supertree may not always preserve the nesting property.

4 Comparison with MC, MMC, MRF and MRP

In this section, we compare the results obtained from the proposed algorithm with those of some existing techniques: MC, MMC, MRF and MRP.

4.1 MC and MMC

In this section we compare our supertree with Min Cut and Modified Min Cut. Consider the two input trees T1 and T2 as shown in the figure below. The figure also shows the supertrees produced by MC, by MMC and by our algorithm. We can clearly see the difference in the structure of the trees. We leave it as future work to draw conclusions from the comparison. The main intention here is to show that our algorithm behaves differently from the MC and MMC algorithms.

4.2 MRF and MRP

In this section, we compare the supertree obtained from the proposed algorithm with MRP and MRF. Consider Input Tree 1 and Input Tree 2 as shown in the figure below. The figure also shows the MRP and MRF trees generated using Rainbow and the supertree obtained from our algorithm. Again, we leave it as future work to draw conclusions from the comparison.

5 References

[1] Roderic D. M. Page. Modified mincut supertrees.
[2] Olaf R. P. Bininda-Emonds (2004). The evolution of supertrees. Trends in Ecology and Evolution, Vol. 19, No. 6, June 2004.
[3] Mike Steel, Andreas W. M. Dress, Sebastian Böcker (2000). Simple but Fundamental Limitations on Supertree and Consensus Tree Methods. Systematic Biology, Vol. 49, No. 2 (Jun., 2000), pp. 363-368.
[4] Michael A. Bender, Martín Farach-Colton (2000). The LCA Problem Revisited.
[5] Duhong Chen, Oliver Eulenstein, David Fernández-Baca, Michael Sanderson (2006). Minimum-Flip Supertrees: Complexity and Algorithms. IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 3, No. 2, April-June 2006, pp. 165-173.
