Phylogenetics


Wen-Hsiung Li, University of Chicago, Chicago, Illinois, USA

Phylogenetics concerns the development of methods for reconstructing evolutionary relationships among organisms and the application of such methods to reconstruct phylogenetic trees using morphological or molecular data.


Online posting date: 30th April 2008

Introduction

Phylogenetics involves the development of methods for reconstructing evolutionary relationships among organisms, as well as the application of these methods to reconstruct phylogenetic trees for the organisms under study. Phylogenetic trees are constructed using morphological or molecular data. Here we are concerned only with molecular data; the subject matter is usually referred to as ‘molecular phylogenetics’ or ‘molecular systematics’. Note that one can apply the same methodology to reconstruct evolutionary relationships among genes.

The molecular approach to systematics was initiated at the beginning of the twentieth century by Nuttall (1904), who used serological cross-reactions to study phylogenetic relationships among various groups of animals. However, extensive use of molecular data in phylogenetic studies did not begin until the early 1960s, after the introduction of protein sequencing, protein electrophoresis, and other molecular techniques into the field. Protein sequence data allowed, for the first time, investigation of long-term evolution such as the relationships among mammalian orders or even more distantly related taxa (Fitch and Margoliash, 1967). On the other hand, less expensive and more expedient methods, such as protein electrophoresis, hybridization of deoxyribonucleic acid (DNA) to DNA and immunological methods, although less accurate than protein sequencing, were extensively used to study the phylogenetic relationships among populations or closely related species.

The advent of various recombinant DNA techniques since the 1970s has led to a rapid accumulation of DNA sequence data, stimulating much interest in molecular systematics. Later, the development of the polymerase chain reaction (PCR) method made systematic studies even easier, resulting in an even higher level of activity in phylogenetic reconstruction. Now, various genome sequencing projects have produced tremendous amounts of sequence data, which have stimulated even more intense phylogenetic study.

ELS subject area: Evolution and Diversity of Life How to cite: Li, Wen-Hsiung (April 2008) Phylogenetics. In: Encyclopedia of Life Sciences (ELS). John Wiley & Sons, Ltd: Chichester. DOI: 10.1002/9780470015902.a0005135.pub2

Molecular data have proved useful for studying not only the phylogenetic relationships among closely related populations or species, such as the relationships among human populations (Cann et al., 1987; Vigilant et al., 1991; Yu et al., 2002) and those between humans and apes, but also very ancient evolutionary occurrences such as the origin of mitochondria and chloroplasts (Cedergren et al., 1988; Giovannoni et al., 1988) and the divergence of phyla and kingdoms (Woese, 1987; Baldauf et al., 2000). In the future, most phylogenetic issues are likely to be resolved by molecular data. We may eventually be able to fulfill Darwin’s dream of having ‘a fairly true genealogical tree of each great kingdom of Nature’. To take advantage of molecular data, however, one must understand the methodology of molecular phylogenetics. We shall therefore present the basic principles of tree reconstruction and examples of phylogenetic trees.

Terminologies

A phylogenetic tree is a graph composed of nodes and branches, in which only one branch connects any two adjacent nodes (Figure 1). The nodes represent the taxonomic units, and the branches define the relationships among the units in terms of descent and ancestry. The branches of a tree are also known as edges. The branching pattern of a tree is called the topology. The branch length usually represents the number of changes that have occurred in that branch. As the taxonomic units represented by the nodes can be species, populations, individuals or genes, they are simply referred to as operational taxonomic units (OTUs).

Figure 1 illustrates two common ways of drawing a phylogenetic tree. In Figure 1a, the branches are unscaled; their lengths are not proportional to the number of changes, which are indicated on the branches. This presentation allows us to line up the extant OTUs and also to place the nodes representing divergence events on a timescale in which the times of divergence are known or have been estimated. In Figure 1b, the branches are scaled, that is, their lengths are proportional to the number of changes.

Phylogenetic trees can be either rooted or unrooted (Figure 2). In a rooted tree, there exists a particular node, called the ‘root’ (Figure 2a), from which a unique path leads to any other node. The direction of each path corresponds to evolutionary time, and the root is the common ancestor of all the OTUs under study. An unrooted tree is a tree that

ENCYCLOPEDIA OF LIFE SCIENCES & 2008, John Wiley & Sons, Ltd. www.els.net

only specifies the relationships among the OTUs but does not define the evolutionary path (Figure 2b). Unrooted trees do not make assumptions or require knowledge about common ancestors. The majority of tree-making methods yield unrooted trees. To root an unrooted tree, we usually need an outgroup, that is, an OTU for which external information, such as paleontological evidence, clearly indicates that it branched off earlier than the taxa under study. The root is then placed between the outgroup and the node connecting it to the other OTUs. For example, if OTU E in Figure 2b is known to be the outgroup, then the root can be placed as in Figure 2a.

There are two types of molecular data: characters and distances. A character can be a nucleotide (or an amino acid) at a site in a DNA (protein) sequence, or the presence or absence of a deletion or insertion at a site. That is, each nucleotide (amino acid) site in a DNA (protein) sequence can be considered a character site. For example, consider the following alignment of four DNA sequences, in which each position can be considered a character site:

Sequence 1: GACTGGTAC–A
Sequence 2: GATTGGTAC–A
Sequence 3: GATAGGCACTA
Sequence 4: GACAAGCACTA

The character at the third site is C in sequences 1 and 4, but T in sequences 2 and 3. The gap at position 10 can be due to either a deletion in the common ancestor of the first two sequences or an insertion in the common ancestor of the third and fourth sequences.

The other type of data is distance data, which are computed from amino acid or DNA sequence data. These are also called ‘distance matrix’ data, because the distances are usually presented in matrix form. As described below, some methods use character state data, whereas others use distance data.

Figure 1 Two alternative representations of a phylogenetic tree for five OTUs. (a) Unscaled branches: extant OTUs are lined up, and nodes are positioned proportionally to times of divergence. (b) Scaled branches: lengths of branches are proportional to the numbers of molecular changes. 1 unit: a nucleotide substitution or amino acid substitution. (Reproduced with permission from Li and Graur, 1991.)

Figure 2 Rooted (a) and unrooted (b) phylogenetic trees. Arrows indicate the unique path leading from the root (R) to OTU D. (Reproduced with permission from Li and Graur, 1991.)

Methods of Tree Reconstruction

Numerous tree reconstruction methods have been proposed (Felsenstein, 1988; Li, 1997; Nei and Kumar, 2000). Here we discuss the three types of methods that have been the most commonly used in the literature. These are (1) distance matrix methods, (2) maximum parsimony methods and (3) maximum likelihood methods.

Distance matrix methods

In distance matrix methods, evolutionary distances (usually numbers of nucleotide or amino acid substitutions between sequences) are computed for all pairs of taxa, and a phylogenetic tree is constructed by using an algorithm based on some functional relationships among the distance values. As an illustration of many basic principles and concepts in phylogenetics, we shall first describe the simplest method of tree reconstruction. We shall then describe the most commonly used distance method.

The simplest method for tree reconstruction is the unweighted pair-group method with arithmetic mean (UPGMA). It employs a sequential clustering algorithm, in which local topological relationships are inferred in order of decreasing similarity and a phylogenetic tree is built in a stepwise manner. That is, we first identify the two OTUs that are most similar to each other (have the shortest distance) and treat them as a new single OTU. Such an OTU is referred to as a composite OTU. Subsequently, from among the new group of OTUs, we identify the pair with the highest similarity, and so on, until only two OTUs are left.

Consider a case of four OTUs. The pairwise evolutionary distances are given by the following matrix:

OTU    1      2      3
2      d12
3      d13    d23
4      d14    d24    d34

Figure 3 Diagram illustrating the stepwise construction of a phylogenetic tree for four OTUs, according to the UPGMA method. (Reproduced with permission from Li and Graur, 1991.)

In this matrix, dij represents the distance between OTUs i and j. The first two OTUs to be clustered are the ones with the smallest distance. Let us assume that d12 has the smallest value. Then, OTUs 1 and 2 are the first to be clustered, and the branching point is positioned at a distance of d12/2 substitutions (Figure 3a). After the first clustering, OTUs 1 and 2 are treated as a single composite OTU, and a new distance matrix is computed.

OTU    (12)      3
3      d(12)3
4      d(12)4    d34

Figure 4 Unrooted trees with (a) four OTUs and (b) five OTUs. (Reproduced with permission from Li and Graur, 1991.)

In this matrix, d(12)3 = (d13 + d23)/2 and d(12)4 = (d14 + d24)/2. In other words, the distance between a simple OTU and a composite OTU is the average of the distances between the simple OTU and the constituent OTUs of the composite OTU. If d(12)3 turns out to be the smallest distance in the new matrix, then OTU 3 will be joined to the composite OTU (12) with a branching node at d(12)3/2 (Figure 3b). The final step consists of clustering the last OTU (OTU 4) with the composite OTU (123). The root of the entire tree is positioned at d(123)4/2 = [(d14 + d24 + d34)/3]/2. The final tree inferred is shown in Figure 3c.

In the UPGMA method, the distance between two composite OTUs is computed as the arithmetic mean of the pairwise distances between the constituent OTUs of the two composite OTUs. For example, the distance between the composite OTUs (ij) and (mn) is

$$d_{(ij)(mn)} = \frac{d_{im} + d_{in} + d_{jm} + d_{jn}}{4} \qquad [1]$$
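The clustering steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the article: the function name `upgma`, the data layout (a dictionary keyed by pairs of OTU labels) and the four-OTU distances in the usage lines are all assumptions made for the example.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of OTU labels.
    dist:  dict mapping frozenset({a, b}) -> distance between simple OTUs a and b.
    Returns a list of (cluster_a, cluster_b, node_height) in joining order,
    where each cluster is a tuple of OTU labels.
    """
    clusters = [(name,) for name in names]

    def d(c1, c2):
        # Arithmetic mean over all pairs of constituent OTUs (eqn [1] generalized).
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    joins = []
    while len(clusters) > 1:
        # Pick the pair of (simple or composite) OTUs with the smallest distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: d(*pair))
        joins.append((c1, c2, d(c1, c2) / 2))  # node placed at half the distance
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return joins

# Hypothetical distances for four OTUs (not taken from the article).
toy = {("1", "2"): 2, ("1", "3"): 4, ("2", "3"): 4,
       ("1", "4"): 6, ("2", "4"): 6, ("3", "4"): 6}
toy_dist = {frozenset(pair): value for pair, value in toy.items()}
print([height for _, _, height in upgma(["1", "2", "3", "4"], toy_dist)])  # [1.0, 2.0, 3.0]
```

Averaging over the original pairwise distances at every step, rather than over the entries of the reduced matrix, gives every pair of simple OTUs the same weight, which is the sense of ‘unweighted’ in the method's name.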

It may now become clear why the method is called the ‘unweighted pair-group method with arithmetic mean’: the distance between composite OTUs is the arithmetic mean of the individual distances, and all pairs of OTUs involved are given the same weight. The most serious drawback of UPGMA is that it assumes a constant rate of evolution among all the evolutionary lineages under study. As this assumption usually does not hold well, the method often performs poorly.

A much better method is the neighbor-joining method (Saitou and Nei, 1987). First we explain the concept of ‘neighbors’. In an unrooted tree, two OTUs are said to be neighbors if they are connected through a single internal node. For example, in Figure 4a, A and B are neighbors and so are C and D. Conversely, for four OTUs with unknown relationships, if we can decide which pairs are neighbors, then the tree topology is decided. For example, if we decide that OTUs A and B are a pair of neighbors, then we obtain the tree in Figure 4a. In Figure 4b, neither A and C nor B and C are neighbors. However, if we combine OTUs A and B into one composite OTU, then the composite OTU (AB) and OTU C become a new pair of neighbors.

The neighbor-joining (NJ) method finds neighbors sequentially that may minimize the total length of the tree. It starts with a starlike tree (Figure 5a), in which there is no clustering of OTUs. The first step is to separate a pair of OTUs (e.g. 1 and 2) from all the others (Figure 5b). In this tree there is only one interior branch, that is, the branch connecting nodes X and Y, where X is the common node for OTUs 1 and 2 and Y is the common node for the others (3, 4, ..., n). For this tree, the sum of all branch lengths is

$$S_{12} = \frac{1}{2(n-2)} \sum_{k=3}^{n} (d_{1k} + d_{2k}) + \frac{1}{2} d_{12} + \frac{1}{n-2} \sum_{3 \le i < j \le n} d_{ij} \qquad [2]$$

Any pair of OTUs can take the positions of 1 and 2 in the tree, and there are n(n − 1)/2 ways of choosing them (Figure 5c). Among all possible pairs of OTUs, the one that gives the smallest sum of branch lengths is chosen. This pair of OTUs is then regarded as a single OTU, and the arithmetic mean distances between OTUs are computed to form a new distance matrix. The next pair of OTUs that gives the smallest sum of branch lengths is then chosen. This procedure is continued until all n − 3 interior branches are found. At this stage, only three OTUs are left. For three OTUs, there is only one possible unrooted tree topology.

Maximum parsimony methods

Maximum parsimony methods use character state data. The principle of maximum parsimony is to search for a tree that requires the smallest number of evolutionary changes to explain the differences observed among the OTUs under study. Such a tree is called a maximum parsimony tree. Often, more than one tree with the same minimum number of changes is found, so that no unique tree can be inferred. The method discussed below was first developed for amino acid sequence data (Eck and Dayhoff, 1966) and was later modified for use on nucleotide sequence data (Fitch, 1977).

We will start with the definition of informative sites. A nucleotide site is phylogenetically informative only if it favors some trees over others. To illustrate the distinction between informative and noninformative sites, consider the following four hypothetical sequences:

            Site
Sequence    1   2   3   4   5*  6   7*  8   9*
1           A   A   G   A   G   T   G   C   A
2           A   G   C   C   G   T   G   C   G
3           A   G   A   T   A   C   C   C   A
4           A   G   A   G   A   T   C   C   G

Figure 5 (a) A starlike tree with no hierarchical structure. (b) A tree in which OTUs 1 and 2 are clustered. (c) Any pair of OTUs can take the positions of 1 and 2 in the tree, and there are n(n − 1)/2 ways of choosing them (see text). (Reproduced with permission from Saitou and Nei, 1987, by permission of Oxford University Press.)

Figure 6 Three possible unrooted trees for four DNA sequences that have been used to choose the most parsimonious tree, shown for (a) site 3, (b) site 4 and (c) site 5. The terminal nodes indicate the nucleotide type at homologous positions in the extant species. Each dot on a branch means a substitution is inferred on that branch. Note that the nucleotides at the two internal nodes of each tree represent one possible reconstruction from among several alternatives. For example, the nucleotide at both the internal nodes of tree III (c) (bottom right) can be G instead of A. In this case, the two substitutions will be positioned on the branches leading to species 3 and 4. However, the minimum number of required substitutions remains the same. (Reproduced with permission from Li and Graur, 1991.)

There are three possible unrooted trees for four OTUs (Figure 6). Site 1 is not informative, because all sequences at this site have A, so that no change is required to explain differences among OTUs in any of the three possible trees. At site 2, sequence 1 has A, while all other sequences have G, and so a simple assumption is that the nucleotide has changed from G to A in the lineage leading to sequence 1. This site also is not informative, because each of the three possible trees requires one change. For site 3, each of the three possible trees requires two changes (Figure 6a), so it is also not informative. In Figure 6a, if we assume that the nucleotide at the node connecting OTUs 1 and 2 in tree I is C instead of G, the number of changes required for the tree is still two. Figure 6b shows that for site 4, each of the three trees requires three changes; thus, site 4 is also noninformative. For site 5, tree I requires only one change, whereas trees II and III each require two changes (Figure 6c). Therefore, this site is informative.

From these examples, we see that a site is informative only when there are at least two different kinds of nucleotides at the site, each of which is represented in at least two of the sequences under study. In the sequences above, the informative sites (5, 7 and 9) are indicated by an asterisk. For these sites, tree I requires one, one and two changes respectively; tree II requires two, two and one changes; and tree III requires two, two and two changes. Thus, tree I is chosen because it requires the smallest number of changes (four) at the informative sites.

Note that in this simple example, only three informative sites are available. Although tree I is favored, it is not statistically better than the other two trees. To achieve statistical significance, a tree has to be supported by many more informative sites than the others.

When the number of OTUs under study is larger than four, the situation becomes more complicated, because there are many more possible trees to consider and because inferring the number of substitutions for each alternative tree becomes more tedious. However, the basic principle remains simple: to infer the minimum number of substitutions required for a given tree (see Li, 1997).
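The site-by-site counting in the four-sequence example can be checked with a short Python sketch. It uses the nine-site hypothetical sequences given above and counts changes with a set-intersection rule of the kind usually attributed to Fitch; that choice, and all function names, are assumptions made for illustration rather than the exact procedure of the cited references.

```python
# Sequences 1-4 from the nine-site example (one string per sequence).
SEQS = {1: "AAGAGTGCA", 2: "AGCCGTGCG", 3: "AGATACCCA", 4: "AGAGATCCG"}

# The three unrooted topologies for four OTUs, written as two pairs of neighbors.
TREES = {"I": ((1, 2), (3, 4)), "II": ((1, 3), (2, 4)), "III": ((1, 4), (2, 3))}

def fitch_join(a, b):
    """Combine two state sets; return (new_set, cost) by the set-intersection rule."""
    common = a & b
    return (common, 0) if common else (a | b, 1)

def changes(tree, site):
    """Minimum number of substitutions at one site (0-based) on a four-OTU tree."""
    (p, q), (r, s) = tree
    left, c1 = fitch_join({SEQS[p][site]}, {SEQS[q][site]})
    right, c2 = fitch_join({SEQS[r][site]}, {SEQS[s][site]})
    _, c3 = fitch_join(left, right)  # change, if any, on the interior branch
    return c1 + c2 + c3

def is_informative(site):
    """Informative if the three trees do not all need the same number of changes."""
    return len({changes(t, site) for t in TREES.values()}) > 1

n_sites = len(SEQS[1])
informative = [s + 1 for s in range(n_sites) if is_informative(s)]
totals = {name: sum(changes(t, s - 1) for s in informative) for name, t in TREES.items()}
print(informative)  # [5, 7, 9]
print(totals)       # {'I': 4, 'II': 5, 'III': 6}
```

With only four OTUs the three topologies can be enumerated by hand, as here; for larger data sets the number of candidate trees grows rapidly and a search strategy is needed, which this sketch does not attempt.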

Maximum likelihood methods

In maximum likelihood methods, one searches for the maximum likelihood (ML) value of the character state configurations among the sequences under study for each possible tree. The tree with the largest ML value is chosen as the preferred tree. The ML method requires a probabilistic model for the process of nucleotide substitution. That is, we must specify the transition probability from one nucleotide state to another in a time interval in each branch. For example, consider the one-parameter model with the substitution rate parameter $\alpha$ per site per unit time. Assume that the nucleotide at a given site is i at time 0. Then, the probability that the nucleotide at time t is i is given by

$$p_{ii}(t) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t} \qquad [3]$$

and the probability that the nucleotide at time t is j, j ≠ i, is given by

$$p_{ij}(t) = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t} \qquad [4]$$

For the formulas, see Li (1997). See also: Evolutionary Distance

The next step is to set up the likelihood function. Let us use the case of four sequences (taxa) as an example. For simplicity, we assume a constant rate of substitution and consider the hypothetical tree given in Figure 7. The likelihood function for a nucleotide site with nucleotides i, j, k and l in sequences 1, 2, 3 and 4, respectively, can be computed as follows. If the nucleotide at the ancestral node (the root) was x, the probability of having nucleotide l in sequence 4 is $p_{xl}(t_1+t_2+t_3)$. This is because $t_1+t_2+t_3$ is the total amount of time between the two nodes. Similarly, the probability of having nucleotide y at the common ancestral node of sequences 1, 2 and 3 is $p_{xy}(t_1)$, and so on. Thus, given x, y and z at the ancestral node and the two internal nodes, the probability of observing i, j, k and l at the tips of the tree is equal to

$$p_{xl}(t_1+t_2+t_3)\, p_{xy}(t_1)\, p_{yk}(t_2+t_3)\, p_{yz}(t_2)\, p_{zi}(t_3)\, p_{zj}(t_3) \qquad [5]$$

Figure 7 Model tree for the derivation of the likelihood function under a constant rate of evolution.

All of the above transition probabilities can be computed by using eqns [3] and [4]. Since in practice we do not know the ancestral nucleotide, we can only assign a probability $g_x$, which is usually assumed to be the frequency of nucleotide x in the sequence. Noting that x, y and z can be any of the four nucleotides, we sum over all possibilities and obtain the following likelihood function:

$$h(i,j,k,l) = \sum_{x} g_x\, p_{xl}(t_1+t_2+t_3) \sum_{y} p_{xy}(t_1)\, p_{yk}(t_2+t_3) \sum_{z} p_{yz}(t_2)\, p_{zi}(t_3)\, p_{zj}(t_3) \qquad [6]$$

Note that this likelihood function represents the probability of observing the configuration of nucleotides i, j, k and l in sequences 1, 2, 3 and 4 for the given hypothetical tree in Figure 7. If the hypothetical tree is different from that in Figure 7, then the likelihood function is different from that of eqn [6]. That is, the likelihood function depends on the hypothetical tree.

So far, we have considered a single site. The likelihood for all sites is the product of the likelihoods for individual sites if all the nucleotide sites evolve independently. For a given set of data, one computes the ML value for each tree topology. This procedure is essentially to find the branch lengths that give the largest value for the likelihood function. Finally, one chooses the topology with the highest ML value as the best tree, which is called the ML tree. For a more detailed treatment of ML methods, see Nei and Kumar (2000).
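As a rough illustration of how eqns [3], [4] and [6] fit together, the following Python sketch evaluates the single-site likelihood for the tree of Figure 7 by brute-force summation over the three unknown ancestral nucleotides. The function names, the default of equal ancestral frequencies (0.25 each) and the numerical values in the usage lines are assumptions for the example; it evaluates the likelihood for given branch times and does not search for the branch lengths that maximize it.

```python
import math

NUCS = "ACGT"

def p(i, j, t, alpha):
    """One-parameter transition probability from i to j in time t (eqns [3] and [4])."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(i, j, k, l, t1, t2, t3, alpha, g=None):
    """Likelihood of tip nucleotides i, j, k, l on the tree of Figure 7 (eqn [6])."""
    g = g or {x: 0.25 for x in NUCS}  # assumed equal ancestral frequencies
    total = 0.0
    for x in NUCS:          # nucleotide at the root
        for y in NUCS:      # ancestor of sequences 1, 2 and 3
            for z in NUCS:  # ancestor of sequences 1 and 2
                total += (g[x] * p(x, l, t1 + t2 + t3, alpha)
                          * p(x, y, t1, alpha) * p(y, k, t2 + t3, alpha)
                          * p(y, z, t2, alpha) * p(z, i, t3, alpha) * p(z, j, t3, alpha))
    return total

def log_likelihood(seqs, t1, t2, t3, alpha):
    """Sum of per-site log-likelihoods, assuming sites evolve independently."""
    return sum(math.log(site_likelihood(a, b, c, d, t1, t2, t3, alpha))
               for a, b, c, d in zip(*seqs))

# Hypothetical branch times and rate, applied to four short illustrative sequences.
seqs = ["AAGAGTGCA", "AGCCGTGCG", "AGATACCCA", "AGAGATCCG"]
print(log_likelihood(seqs, t1=1.0, t2=1.0, t3=1.0, alpha=0.1))
```

Working with the sum of log-likelihoods rather than the product of likelihoods is a common numerical convenience; the topology and branch times that maximize one also maximize the other.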


Man’s Closest Relatives


Who are man’s closest evolutionary relatives? This question has intrigued biologists for centuries. For instance, Darwin (1871) suggested that the African apes, the chimpanzee (Pan) and the gorilla (Gorilla), are man’s closest relatives and that man’s evolutionary origins were to be found in Africa. However, Darwin’s view fell into disfavor for various reasons, and for a long time taxonomists believed that the genus Homo was only distantly related to the apes; thus, Homo was given a family of its own, the Hominidae (Figure 8a). Chimpanzees, gorillas and orangutans (Pongo) were usually placed in a separate family, the Pongidae (Figure 8a). The gibbons (Hylobates) were either classified separately or with the Pongidae (Figure 8b). Goodman (1963) recognized that this systematic arrangement is anthropocentric, because placing the various apes in one family and humans in another implies that the apes share a more recent common ancestry with each other than with humans. When Homo was put in the same clade with a living ape, it was usually with the Asian ape, the orangutan (Figure 8c). By using a serological precipitation method, Goodman (1962) was able to demonstrate that humans, chimpanzees and gorillas constitute a natural clade (Figure 8d), with orangutans and gibbons having diverged from the other apes at much earlier dates. However, serological studies, electrophoretic studies and amino acid sequences could not resolve the evolutionary relationships among humans and the African apes, and the so-called human–gorilla–chimpanzee trichotomy continued to be an extremely controversial issue. The first large set of DNA sequence data for resolving the controversy was obtained by Miyamoto et al. (1987). In the following discussion, this data set, and that of Maeda et al. (1988), will be used to illustrate the distance methods and maximum parsimony methods described above. The ML method is more complex and not discussed here. 
The following matrix shows the number of nucleotide substitutions per 100 sites between each pair of the following species: humans (H), chimpanzees (C), gorillas (G), orangutans (O) and rhesus monkeys (R).

OTU    H      C      G      O
C      1.45
G      1.51   1.57
O      2.98   2.94   3.04
R      7.51   7.55   7.39   7.10


Figure 8 Four alternative phylogenies and classifications of modern apes and humans (Hominoidea). (a,b) Traditional classifications setting humans apart. (c) Clustering of humans with the orangutan. (d) Cumulative molecular as well as morphological evidence before the DNA era favored this classification. H: human; C: chimpanzee; G: gorilla; O: orangutan; B: gibbon. (Reproduced with permission from Li and Graur, 1991.)

Let us first apply the UPGMA method to these distances. The distance between humans and chimpanzees is the shortest (dHC=1.45). Therefore, we join these two OTUs first and place the node at 1.45/2= 0.73 (Figure 9a).


Table 1 Q values for the neighbor-joining method

Step 1             Step 2
Q12 = −22.61       Q12 = −13.57
Q13 = −22.43       Q13 = −13.48
Q14 = −20.57       Q1(45) = −13.42
Q15 = −20.47       Q23 = −13.42
Q23 = −22.31       Q2(45) = −13.48
Q24 = −20.75       Q3(45) = −13.57
Q25 = −20.41
Q34 = −20.45
Q35 = −20.89
Q45 = −24.31

OTUs 1, 2, 3, 4 and 5 are human, chimpanzee, gorilla, orangutan and rhesus monkey, respectively. (Reproduced with permission from Li and Graur, 1991.)

Figure 9 Phylogenetic tree for humans, chimpanzees, gorillas, orangutans and rhesus monkeys inferred from the (a) UPGMA method and (b) the neighbor-joining method. (Reproduced with permission from Li and Graur, 1991.)

Following the procedure described above, we compute the distances between the composite OTU (HC) and each of the other species, and obtain a new distance matrix:

OTU    (HC)   G      O
G      1.54
O      2.96   3.04
R      7.53   7.39   7.10
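As a check on the first column of this matrix, each distance involving the composite OTU (HC) is the arithmetic mean of the two corresponding entries in the original five-species matrix (the subscript notation below is introduced only for this illustration):

$$d_{(HC)G} = \frac{1.51 + 1.57}{2} = 1.54, \qquad d_{(HC)O} = \frac{2.98 + 2.94}{2} = 2.96, \qquad d_{(HC)R} = \frac{7.51 + 7.55}{2} = 7.53$$

The remaining entries (3.04, 7.39 and 7.10) do not involve H or C and are carried over unchanged.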

Since (HC) and G are separated by the shortest distance, they are the next to be joined together, and the connecting node is placed at 1.54/2 = 0.77. Continuing the process, we obtain the tree in Figure 9a. Note that the estimated branching node for H and C is very close to that for (HC) and G, so the above data did not provide a conclusive resolution of the branching order. Much more data have now become available, and humans and chimpanzees are now believed to be each other's closest relatives (Chen and Li, 2001).

We now use the neighbor-joining method. Instead of eqn [2], Studier and Keppler (1988) found the following simpler criterion:

$$Q_{12} = (n-2)\, d_{12} - \sum_{i=1}^{n} d_{1i} - \sum_{i=1}^{n} d_{2i}$$
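The criterion is easy to evaluate directly. The following Python sketch computes Q for every pair in the five-species matrix given earlier, picks the pair with the smallest value, and repeats the calculation after that pair has been merged into a composite OTU by taking arithmetic means of distances. The function names and data layout are assumptions made for the example, and the sketch covers only these two selection steps, not branch-length estimation.

```python
from itertools import combinations

# Substitutions per 100 sites, from the five-species matrix in the text.
# OTUs 1-5 are human, chimpanzee, gorilla, orangutan and rhesus monkey.
D = {
    (1, 2): 1.45, (1, 3): 1.51, (2, 3): 1.57, (1, 4): 2.98, (2, 4): 2.94,
    (3, 4): 3.04, (1, 5): 7.51, (2, 5): 7.55, (3, 5): 7.39, (4, 5): 7.10,
}

def dist(d, a, b):
    """Symmetric lookup, with the distance from an OTU to itself equal to 0."""
    if a == b:
        return 0.0
    return d[(a, b)] if (a, b) in d else d[(b, a)]

def q_values(otus, d):
    """Q_ij = (n - 2) d_ij - sum_k d_ik - sum_k d_jk for every pair of OTUs."""
    n = len(otus)
    row = {a: sum(dist(d, a, k) for k in otus) for a in otus}
    return {(a, b): (n - 2) * dist(d, a, b) - row[a] - row[b]
            for a, b in combinations(otus, 2)}

def merge(otus, d, a, b):
    """Replace a and b by a composite OTU whose distances are arithmetic means."""
    new = (a, b)
    rest = [k for k in otus if k not in (a, b)]
    d2 = {(x, y): dist(d, x, y) for x, y in combinations(rest, 2)}
    for k in rest:
        d2[(k, new)] = (dist(d, a, k) + dist(d, b, k)) / 2
    return rest + [new], d2

otus = [1, 2, 3, 4, 5]
q1 = q_values(otus, D)
best = min(q1, key=q1.get)          # pair with the smallest Q
print(best, round(q1[best], 2))     # (4, 5) -24.31
otus2, D2 = merge(otus, D, *best)
q2 = q_values(otus2, D2)
for pair, q in q2.items():
    print(pair, round(q, 2))
```

Expanding eqn [2] shows that Q12 equals 2(n − 2)S12 minus twice the sum of all pairwise distances. The subtracted term is the same for every pair, so choosing the pair with the smallest Q is equivalent to choosing the pair with the smallest sum of branch lengths.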

Denoting the five species in the original distance matrix by species 1, 2, ..., 5, and using the dij values in the distance matrix, we have Q12 = −22.61. In a similar manner, we obtain the other Q values under step 1 in Table 1. Since Q45 is the smallest, we take OTUs 4 and 5 (orangutan and rhesus monkey) as the first pair of neighbors. When OTUs 4 and 5 are regarded as a composite OTU, the new distance matrix becomes

OTU    H      C      G
C      1.45
G      1.51   1.57
(OR)   5.25   5.25   5.22
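The entries in the (OR) row follow from the original five-species matrix by averaging the distances to orangutan and to rhesus monkey (subscript notation introduced here for illustration only):

$$d_{H(OR)} = \frac{2.98 + 7.51}{2} = 5.245 \approx 5.25, \qquad d_{C(OR)} = \frac{2.94 + 7.55}{2} = 5.245 \approx 5.25, \qquad d_{G(OR)} = \frac{3.04 + 7.39}{2} = 5.215 \approx 5.22$$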

Using the new distance matrix and the same procedure as above, we obtain the Q values under step 2 in Table 1. Since Q12 and Q3(45) are the smallest, we choose OTUs 1 and 2 as one pair of neighbors and OTUs 3 and (45) as another pair of neighbors. The final tree obtained by this method is given in Figure 9b.

One might wonder why in this method the rhesus monkey and the orangutan were chosen as the first pair of neighbors. The reason is as follows. First, the tree inferred by the neighbor-joining method is an unrooted tree in which rhesus monkey and orangutan are in fact a pair of neighbors. Second, this pair was chosen before the human–chimpanzee pair because the data indicated that it is a stronger pair of neighbors than the human–chimpanzee pair.

Finally, let us consider the maximum parsimony method. For simplicity, let us consider only humans, chimpanzees, gorillas and orangutans. Table 2 shows the informative sites for the region of 10.2 kb that includes the η-globin pseudogene and its surrounding regions (Koop et al., 1986; Miyamoto et al., 1987; Maeda et al., 1988). For each site, the hypothesis supported is given in the last column. If we consider base changes only, then there are 15 informative sites, of which eight support the human–chimpanzee clade (hypothesis I), four support the chimpanzee–gorilla clade (hypothesis II) and three support the human–gorilla clade (hypothesis III). In addition, all four informative sites involving a gap support the human–chimpanzee clade. Therefore, the human–chimpanzee clade is chosen as the most likely representation of the true phylogeny.

See also: Ancient DNA: Phylogenetic Applications; Evolutionary Distance; Gene Trees and Species Trees


Table 2 Informative sites among sequences of human, chimpanzee, gorilla and orangutan

Site (a)     Human    Chimpanzee  Gorilla  Orangutan  Hypothesis supported (b)
Data from Miyamoto et al. (1987)
34           A        G           A        G          III
560          C        C           A        A          I
1287         * (c)    *           T        T          I
1338         G        G           A        A          I
3057–3060    ****     ****        TAAT     TAAT       I
3272         T        T           *        *          I
4473         C        C           T        T          I
5153         A        C           C        A          II
5156         A        G           G        A          II
5480         G        G           T        T          I
6368         C        T           C        T          III
6808         C        T           T        C          II
6971         G        G           T        T          I
Data from Maeda et al. (1988)
127–132      ******   ******      AATATA   AATATA     I
1472         G        G           A        A          I
D2131        A        A           G        G          I
D2224        A        G           A        G          III
2341         G        C           G        C          III
2635         G        G           A        A          I

(a) Site numbers correspond to those given in the original sources. The total length of the sequence used is 10.2 kb.
(b) Hypotheses: I, human and chimpanzee in one clade; II, chimpanzee and gorilla in one clade; III, human and gorilla in one clade.
(c) Each asterisk denotes the deletion of a nucleotide at the site.
Modified from Williams and Goodman (1989) and Li and Graur (1991). See Li (1997).

References Baldauf SL, Roger AJ, Wenk-Siefert I and Doolittle WF (2000) A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290: 972–977. Cann RL, Stoneking M and Wilson AC (1987) Mitochondrial DNA and human evolution. Nature 325: 31–36. Cedergren R, Gray MW, Abel Y and Sankoff D (1988) The evolutionary relationships among known life forms. Journal of Molecular Evolution 28: 98–112. Chen FC and Li WH (2001) Genomic divergence between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. American Journal of Human Genetics 68: 444–456. Darwin C (1871) The Descent of Man and Selection in Relation to Sex. London: John Murray. Eck RV and Dayhoff MO (1966) Atlas of Protein Sequence and Structure. Silver Spring, MD: National Biomedical Research Foundation. Felsenstein J (1988) Phylogenies from molecular sequences: inference and reliability. Annual Review of Genetics 22: 521–565. Fitch WM (1977) On the problem of discovering the most parsimonious tree. American Naturalist 111: 223–257. Fitch WM and Margoliash E (1967) Construction of phylogenetic trees. Science 155: 279–284.

8

Giovannoni SJ, Turner S, Olsen GJ et al. (1988) Evolutionary relationships among cyanobacteria and green chloroplasts. Journal of Bacteriology 170: 3584–3592.
Goodman M (1962) Immunochemistry of the primates and primate evolution. Annals of the New York Academy of Sciences 102: 219–234.
Goodman M (1963) Serological analysis of the systematics of recent hominoids. Human Biology 35: 377–424.
Koop BF, Goodman M, Xu P, Chan K and Slightom JL (1986) Primate η-globin DNA sequences and man’s place among the great apes. Nature 319: 234–238.
Li WH (1997) Molecular Evolution. Sunderland, MA: Sinauer Associates.
Li WH and Graur D (1991) Fundamentals of Molecular Evolution. Sunderland, MA: Sinauer.
Maeda N, Wu CI, Bliska J and Reneke J (1988) Molecular evolution of intergenic DNA in higher primates: pattern of DNA changes, molecular clock and evolution of repetitive sequences. Molecular Biology and Evolution 5: 1–20.
Miyamoto MM, Slightom JL and Goodman M (1987) Phylogenetic relationships of humans and African apes from DNA sequences in the β-globin region. Science 238: 369–373.
Nei M and Kumar S (2000) Molecular Evolution and Phylogenetics. New York: Oxford University Press.
Nuttall GHF (1904) Blood Immunity and Blood Relationship. Cambridge, UK: Cambridge University Press.
Saitou N and Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4: 406–425.
Studier JA and Keppler KJ (1988) A note on the neighbor-joining algorithm of Saitou and Nei. Molecular Biology and Evolution 5: 729–731.
Vigilant L, Stoneking M, Harpending H, Hawkes K and Wilson AC (1991) African populations and the evolution of human mitochondrial DNA. Science 253: 1503–1507.
Woese CR (1987) Bacterial evolution. Microbiological Reviews 51: 221–271.
Yu N, Fu YX and Li WH (2002) DNA polymorphism in a worldwide sample of human X chromosomes. Molecular Biology and Evolution 19: 2131–2141.

Further Reading
Kosiol C, Bofkin L and Whelan S (2006) Phylogenetics by likelihood: evolutionary modeling as a tool for understanding the genome. Journal of Biomedical Informatics 39: 51–61.

ENCYCLOPEDIA OF LIFE SCIENCES © 2008, John Wiley & Sons, Ltd. www.els.net
