J Mol Evol (2006) 63:153–164 DOI: 10.1007/s00239-005-0163-7

Getting the Proto-Pax by the Tail
Eugene Vorobyov, Jürgen Horst
Institut für Humangenetik, UKM, Vesaliusweg 12-14, D-48149, Münster, Germany
Received: 4 July 2005 / Accepted: 31 January 2006 [Reviewing Editor: Dr. Juergen Brosius]

Abstract. Pax genes encode transcription factors governing the determination of different cell types and even organs in the development of multicellular animals. Pax proteins are characterized by the presence of three evolutionarily conserved elements: two DNA-binding domains, the paired domain (PD) and paired-type homeodomain (PtHD), and the short octopeptide sequence (OP) located between PD and PtHD. PD is the defining feature of this class of genes, while OP and/or PtHD may be divergent or absent in some members of the family. Phylogenetic analyses of the PD and PtHD sequences do not distinguish which particular type of the extant Pax genes more resembles the ancestral type. Here we present evidence for the existence of a fourth evolutionarily conserved domain in the Pax proteins, the paired-type homeodomain tail (PHT). Our data also imply that the hypothetical proto-Pax protein most probably exhibited a complex structure, PD-OP-PtHD-PHT, which has been retained in the extant proteins Pax3/7 of the ascidia and lancelet, and Pax7 of the vertebrates. Finally, based on structural considerations, a scenario for the evolutionary emergence of the proto-Pax gene is proposed.

Key words: Pax genes, Homeodomain, Paired domain

The sequences reported in this paper have been deposited in the GenBank database under accession numbers AY235576 and DQ322591. Correspondence to: Jürgen Horst; email: [email protected]

Introduction

Pax genes encode transcription factors that play fundamental roles in the development of multicellular animals (Metazoa). These genes are specific to the animal lineage and so far have not been found in unicellular organisms, fungi, or plants. In general, Pax proteins are characterized by the presence of three conserved elements: two DNA-binding domains, the paired domain (PD) and homeodomain (HD), and the short octopeptide sequence (OP) located between PD and HD. The paired domain, named after its first identification in the Drosophila gene paired (Frigerio et al. 1986), is the defining feature of this class of genes, while OP and HD may be dispensable. The HD is a characteristic of another large family of ancient transcription factors, which evolved before the split of plants, fungi, and Metazoa (for review see Gehring et al. 1994). The HD of Pax proteins belongs to a distinct class of HD-containing genes and is referred to as the paired-type homeodomain (PtHD) (Galliot et al. 1999).

In vertebrates the Pax gene family consists of four groups of paralogous genes, Pax1/9, Pax3/7, Pax2/5/8, and Pax4/6. The genes of each group originated via duplication from four corresponding progenitor genes in an animal lineage leading to vertebrates. Four Pax genes, representing the respective progenitor types, are found as unique genes in the basal chordates, the lancelet (Cephalochordata) (Holland et al. 1995, 1999; Glardon et al. 1998; Kozmik et al. 1999), and ascidia (Urochordata), with the exception that ascidia have two Pax2/5/8 copies that resulted from a taxon-specific gene duplication (Wada et al. 2003). Data obtained from the completely sequenced genomes of D. melanogaster and C. elegans, which represent the ecdysozoan clade of the Protostomia, also support the grouping of Pax genes into four main types, although they reveal the presence of taxon-specific Pax forms (Jun et al. 1988; Hobert and Ruvkun 1999). Identification of the Pax genes in diploblastic animals indicates a very early origin of the Pax2/5/8 and Pax3/7 gene types (cnidarian PaxB and PaxD, respectively) and also reveals the existence of additional genes, PaxA and PaxC, which may represent other ancient Pax types that were lost in the ancestor of chordates (Miller et al. 2000).

Currently, there is insufficient sequence information on the Pax genes of nonchordate animals to allow a reconstruction of the complete evolutionary history of this family. Moreover, on the level of individual proteins it is not yet possible to assign clear orthology for some of the currently known Pax members found in Protostomia and Deuterostomia. The picture of Pax gene evolution is confusing, due partly to the presence of gene duplications and diversifications, which could, in fact, be taxon-specific. Another complication arises from the complex structure of the Pax proteins. The conserved domains PD, OP, and PtHD within one protein could have evolved independently in different organisms under different evolutionary pressures and then could have been independently reduced or even lost in different lineages.

Two alternative scenarios have been proposed to explain the evolution of Pax genes. One is based on the assumption that the first Pax gene contained only PD (represented by PaxA/neuro) and the second Pax gene appeared as a result of fusion of this PD with an HD-containing gene (Catmull et al. 1998; Miller et al. 2000). Such capturing events could have happened more than once and given rise to different primary Pax types (Galliot and Miller 2000).
The other scenario considers only one capturing event followed by gene duplications that gave rise to the distinct Pax forms (Balczarek et al. 1997; Hoshiyama et al. 1998; Miller et al. 2000). In this model the PaxA gene is supposed to represent not the progenitor type but a remnant form missing its HD.

Recently, it has been proposed that the PD might originate from a DNA transposase of the Tc1/mariner transposons (Breitling and Gerber 2000). Previously, it had been noted that the Tc1 DNA-binding domain and the first half of the PD have a similar helix-turn-helix structure and share certain invariant amino acid residues (Franz et al. 1994; Vos and Plasterk 1994; Ivics et al. 1996). The superfamily of Tc1/mariner transposons has an ancient origin and is exceptionally widespread in living organisms, ranging from protozoa to vertebrates. That makes Tc1 a good candidate for a paired domain progenitor. Breitling and Gerber (2000) used the Tc1 sequences as an outgroup for the reconstruction of the Pax phylogenetic tree and found that the most probable position of the root favoured a basal dichotomy between PaxD/3/7 plus Pax1/9 and the rest of the family (Fig. 1). These data support the second proposed evolutionary scenario. However, sequences of the Tc1 DNA-binding domain and PD are so divergent that their comparison does not allow distinguishing which particular type of the currently existing Pax genes more resembles the hypothetical ancestral type.

Here we present evidence for the existence of an additional domain, termed the paired-type homeodomain tail (PHT), which was conserved in some of the Pax proteins. Furthermore, we infer that the first Pax gene most probably exhibited a complex structure, PD-OP-PtHD-PHT, which has been retained in the extant genes Pax3/7 of the ascidia and lancelet, and Pax7 of the vertebrates. Finally, based on structural considerations, we propose a scenario for the evolutionary history of the proto-Pax gene.

Materials and Methods

Analysis of the Last Coding Exon of the Amphioxus Pax2/5/8 Gene

Independent samples of the Amphioxus genomic DNA were kindly provided by A. Perevozchikov and A. Karabinos. A DNA fragment containing the putative PHT sequence was amplified with primers designed on the basis of the AmphiPax2/5/8 cDNA (accession no. AF053762): 5′-CTATGTCAGGCAGTGATTACTCAT and 5′-CATCCTGTTACCACCTTGTCA. The PCR products were gel-purified and subjected to direct automated sequencing using the BigDye Terminator Cycle Sequencing kit (ABI).

Fig. 1. Simplified phylogenetic tree of the Pax family according to the scenario proposed by Breitling and Gerber (2000) and schematic representation of the Pax protein structure. [Tree graphic not reproduced. Tip labels: Tc1 transposases; Pax1/9/meso; PaxD/3/7/gooseberry/paired; PaxB/2/5/8/sparkling; PaxC; PaxA/neuro; Pax4/6/eyeless. Domains drawn: PD, OP, HD.] The groups are defined by sequence similarity and domain organization: Pax1/9 contain no HD; PaxA/C/4/6, no OP; and Pax2/5/8, a partial and diverged HD. Representatives of the PaxA and -C types are not found in the Chordata. PaxB, exemplified by the cnidarian and echinoderm genes, encodes the complete HD and is considered here as a probable paralogue of Pax2/5/8. Tc1 transposase is used as an outgroup that places the root between PaxD/1/9/3/7 and the rest of Pax types. PaxA/B/C/D: cnidarian genes; Pax1/9/3/7/2/5/8/4/6: genes of vertebrates; meso/gooseberry/paired/sparkling/neuro/eyeless: genes of Drosophila. PD, paired domain; OP, octopeptide; HD, homeodomain.

Gene    Taxon     Sp.  Exon 8              | Exon 9
Pax3/7  Tunicate  Cs   TTATSHNSYASCQYSPYGQ | ASGDY---GSAGIAALRMKSREHSAALGLIPVGAG--PSI-QHAY
Pax3/7  Tunicate  Ci   SS.A..Q............ | .....---.................SI......G.--..V-....
Pax3/7  Tunicate  Hr   QYH.T..PF......G... | .AS..---EN...T................S....GAT.M-.P..
Pax3/7  Lancelet  Bf   SSSAHAY.MDGSWVQGANS | TDFNS---N.L......Q............Q.AG.---AMA.-..
Pax7    Lamprey   Pm   GYGLEPMP-.AY..GQ... | TAA..LAKNVGLGGQR.L.LG....V...LQ.A-----ETG.-..
Pax7    Fish      Dr   SYSVDPVT-.GY...Q... | TAV..LAKNVSLSTQR...LGD...V...LQ.------ETG.-..
Pax7    Chick     Gg   SYSVDPV--.GY..GQ... | TAV..LTKNVSLSTQR...LG....V...L..------ETG.-..
Pax7    Mouse     Mm   GYSVDPV--.GY...Q... | TAV..LAKNVSLSTQR...LG....V...L..------ETG.-..
PAX7    Human     Hs   GYSVDPV--.GY..GQ... | TAV..LAKNVSLSTQR...LG....V...L..------ETG.-..

Fig. 2. Comparison of the Pax3/7 and Pax7 C-terminal protein regions of the genes from Urochordata (tunicate), Cephalochordata (lancelet), and Vertebrata species. Pax3/7 of Ciona savignyi (Cs) is used as a reference sequence. Identical amino acid residues and gaps in the alignment are shown by dots and dashes, respectively. (Bf) Branchiostoma floridae; (Ci) Ciona intestinalis; (Cs) Ciona savignyi; (Dr) Danio rerio; (Gg) Gallus gallus; (Hr) Halocynthia roretzi; (Hs) Homo sapiens; (Mm) Mus musculus; (Pm) Petromyzon marinus. Sequence identifiers are as follows: Pax3/7(Cs), scaffold 317; Pax3/7(Ci), scaffold 209; Pax3/7(Hr), BA12289; Pax3/7(Bf), AAF89581; Pax7(Pm), AAL04156; Pax7(Dr), AAC41255; Pax7(Gg), BAA23005; Pax7(Mm), AAG16663; PAX7(Hs), DQ322591.

Sequence Search and Phylogenetic Analysis

The nucleic acid and protein sequences were collected from databases and analyzed using the NCBI server http://www.ncbi.nlm.nih.gov and the HUSAR/UWGCG computer program package http://genome.dkfz-heidelberg.de/. The search for the PtHD and PHT sequences in the fish genomes was done using the following resources: Fugu rubripes, http://www.ncbi.nlm.nih.gov/BLAST/Genome/fugu.html; and Tetraodon nigroviridis, http://www.bioinformatics.tll.org.sg/cgi-bin/GloBLAST/service?rm=blastF&db=tetraodon. Genes of Ciona intestinalis and Ciona savignyi were analyzed using the servers http://genome.jgi-psf.org/ciona4/ciona4.home.html and http://www.broad.mit.edu/annotation/ciona, respectively, and the Ciona intestinalis cDNA resource http://ghost.zool.kyoto-u.ac.jp/indexr1.html. The Drosophila genome and genes were analyzed using a server developed by the FlyBase Consortium, http://flybase.net/genes/.

The amino acid sequences were first aligned with the ClustalW program (http://www.ebi.ac.uk/clustalw) and then adjusted manually. Phylogenetic analyses were performed using the 60-amino acid (aa) residues of HDs, 128-aa residues of PDs, and 21-aa residues of PHTs. The following methods and software were used for the evolutionary analyses: minimum evolution (Rzhetsky and Nei 1992) and neighbor-joining (Saitou and Nei 1987) methods with the JTT substitution model (Jones et al. 1992) and the complete-deletion option and bootstrap analysis (Felsenstein 1985) implemented in the MEGA software (Kumar et al. 2004).

In order to reduce the number of HD sequences for the phylogenetic analysis presented in Fig. 6 we used the protein consensus sequences representing the respective groups of orthologous genes. The consensus sequences were inferred manually, based on the ClustalW comparisons of orthologues, by replacing residues that are not conserved with a symbol (?), and the residues of similar amino acids in the respective positions with the ones found most frequently. The pairwise deletion option was used for the tree constructions of the consensus sequences. All phylogenetic trees presented in this study show only topology. Lists of sequences, accession numbers of which are not provided, are available from the first author upon request.
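The consensus rule described in Materials and Methods (a `?` at non-conserved positions, the most frequent residue where all residues are similar) can be illustrated with a minimal Python sketch. The similarity groups below are an assumption made for illustration only; the text does not state which residues were treated as similar, and the consensus sequences in the study were inferred manually rather than by a program.

```python
from collections import Counter

# ASSUMPTION: hypothetical similarity groups, chosen only for illustration.
# The paper does not list the groups used for its manual consensus.
SIMILAR_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"), set("DE"),
    set("ST"), set("NQ"), set("AG"), set("C"), set("P"),
]

def consensus(aligned):
    """Column-wise consensus of equal-length, pre-aligned sequences.

    A column whose residues are identical or all fall in one similarity
    group yields its most frequent residue; any other column (including
    columns that contain gaps) yields '?'.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        residues = set(column)
        if any(residues <= group for group in SIMILAR_GROUPS):
            out.append(Counter(column).most_common(1)[0][0])
        else:
            out.append("?")
    return "".join(out)

# Toy input, not taken from the paper.
print(consensus(["KRSIL", "KKSVL", "KRTAL"]))  # KRS?L
```

With a different choice of similarity groups the `?` positions would change, which is one reason the published consensus sequences should be taken from the authors rather than regenerated this way.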

Results and Discussion

The C-Termini of Pax3/7 and Pax7 Contain an Evolutionarily Conserved Domain

The Pax3 and Pax7 genes arose via duplication in an animal lineage leading to vertebrates. In order to distinguish which of these genes retained more features of their hypothetical ancestor, we compared all available Pax3 and Pax7 sequences with the single Pax3/7 genes of Urochordata and Cephalochordata species, which did not duplicate and as such could have evolved less far from their common progenitor. The sequence encoded by the last coding exon of the vertebrate Pax7, but not Pax3, was found to share a high similarity with the corresponding C-terminal motif of Pax3/7. This suggests that Pax3 lost the respective exon after the duplication. As shown in Fig. 2, the C-termini encoded by the last exons of the Pax3/7 genes of "lower" chordate animals and Pax7 of vertebrates share a short conserved region.

To gain insight into the nature and significance of this conserved sequence, we performed a search in the protein database. The search revealed a number of targets showing a high similarity that suggests their possible homology. Importantly, all these targets were found to be located in the C-termini of the paired-type homeodomain proteins. A literature search revealed that this C-terminal sequence was previously described as a conserved part of the Otp, aristaless, and Rx gene family members and consequently was named the OAR domain (Simeone et al. 1994; Furukawa et al. 1997; Miura et al. 1997) or paired tail (Mathers et al. 1997) or C-peptide (Galliot et al. 1999). The latter article was devoted to evolutionary analysis of all PtHD-containing genes, including Pax, that were known at that time. In that study the C-peptide motif was also found in the Pax3/7 protein of Halocynthia roretzi and as a putative sequence in the PaxB protein of Hydra littoralis (Galliot et al. 1999). Thus, the presence of this domain in the Pax proteins could indicate their phylogenetic relation to a certain subfamily of the homeobox-containing genes, which in turn may shed light on the evolutionary origin of the Pax genes. However, this subject has not been closely investigated so far.

Genomic Quest for the C-Terminal Domain

In order to extract more information about the C-terminal domain we performed a global search in all currently available sequence databases and several sequenced genomes. For an initial search we used the BlastP program with a very high expectation value (e = 10000) and a set of probes representing C-termini of the well-known paired-type HD proteins. Afterwards, a growing number of identified targets allowed us to define positions of the invariant amino acid residues and to use this preliminary consensus as a more sensitive probe. Applying the TblastN program, we analyzed genomes of seven organisms: human, mouse, two species of fish (Fugu rubripes and Tetraodon nigroviridis), tunicate (Ciona intestinalis), fruit fly (Drosophila melanogaster), and nematode (Caenorhabditis elegans). In each case, when a related sequence was found, the analysis was complemented with a search in the surrounding area for the presence of a homeodomain. All the genomic sequences found were then translated in order to obtain the reading frames harbouring the corresponding targets. To determine the identity and colocalization of the resulting partial proteins, their sequences were compared with the nonredundant and EST databases. Finally, we collected and analyzed the potential candidate sequences from genomes of each organism.

In support of previous observations, this domain was found to be associated exclusively with homeodomains of the paired type, to which the Pax HD belongs. In particular, this sequence was always located downstream of the homeodomains very close to or in the C-termini of these proteins and often encoded by a separate exon. Regarding this exclusive specificity, we prefer to term this domain the paired-type homeodomain tail (PHT), based on the modification of a previous name.
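The step of translating genomic hits to recover the reading frame that harbours a target can be sketched in Python as a six-frame translation followed by a motif scan. This is only an illustration of the idea, not the TblastN procedure used in the study. The probe below is the central stretch (positions 8 to 15) of the PHT search pattern given later in this section, and the DNA string is a made-up example that encodes the peptide LRMKAKEH.

```python
import re

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate from the first base; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_hits(dna, motif):
    """Yield (strand, frame, protein_offset, peptide) for every motif hit."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            for match in motif.finditer(protein):
                yield strand, frame, match.start(), match.group()

# Probe: core of the PHT pattern (positions 8-15), written as a regex.
PROBE = re.compile(r"LR[LMAR][KR]A[KRQL][EQK][HF]")

# HYPOTHETICAL DNA, constructed here to encode LRMKAKEH; not a real locus.
CODING = "CTGCGTATGAAAGCTAAGGAACAT"

for hit in six_frame_hits("A" + CODING + "TAA", PROBE):
    print(hit)
```

A real search would also need to handle introns, ambiguous bases, and statistical scoring, which is why alignment tools were used in the study rather than a plain pattern scan.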
The consensus of the PHT consists of 21 amino acid residues, RxSSIAxLRLKAKEHSxxLxx, in which 7 positions are almost invariant (S4, A6, L8, R9, K11, A12, and H15; underlined), 6 positions are gene type-specific (designated "x"), and the other 8 are most often occupied by residues of similar properties. The gene type-specific residues are conserved among the orthologous proteins of different organisms, although they do not show a common physicochemical property at the respective positions of different gene types. A more advanced form of the PHT consensus, suitable for a search in the Protein Information Resource (PIR) (Wu et al. 2003), is the following: [RKCM]-x-[STNA]-S-[IVL]-[AE]-x-L-R-[LMAR]-[KR]-A-[KRQL]-[EQK]-[HF]-[ASTI]-x-x-[LVIMG]-x-x (possible variations of amino acid residues are given in brackets).

In human, mouse, Fugu, and Tetraodon we distinguish 10 PtHD gene types, apart from Pax, that are characterized by the presence of PHT: Al/Arx, Alx/Cart, Drg11, Rx, Chx10, Prx, OG12, Otp, Mbx, and Ptx (named according to the most studied representatives) (Fig. 3). The term "gene type" signifies here a certain gene or a group of paralogous genes. In the vertebrates most of these gene types are represented by two or more paralogues. In Ciona, definite orthologues of the genes of each type, except Drg11, were found as single genes; however, the PHT sequence could be detected in only one gene, Ptx. In Drosophila, seven PHT-bearing genes were identified: Al/Arx, Drg11, Rx, Chx10, OG12, Otp, and Ptx. Among those, orthologues of Chx10, OG12, and Drg11 were identified de novo using as "fingerprints" their PHTs (found in the downstream exons) matching C-termini of the corresponding chordate genes (PtHDs are CG15782, CG5369, and CG2808 and PHTs are CG15783, CG13141, and CG10017, respectively; Suppl. 1).

No PHT sequences could be detected in the nematode genome. In order to exclude the possibility that PHT was undetected due to its high divergence in this animal, we compared the full-length PtHD proteins of C. elegans to each other.
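The bracketed PHT search pattern given above can be turned into an ordinary regular expression. The sketch below is one possible way to do this in Python and is not the PIR search tool itself; the helper names are hypothetical. The test sequence is the Drosophila OG12 PHT as read from Fig. 4B, and the near-invariant positions are those listed in the text (S4, A6, L8, R9, K11, A12, H15).

```python
import re

# Pattern as given in the text, with every position separated by '-'.
PIR_PATTERN = (
    "[RKCM]-x-[STNA]-S-[IVL]-[AE]-x-L-R-[LMAR]-[KR]-A-"
    "[KRQL]-[EQK]-[HF]-[ASTI]-x-x-[LVIMG]-x-x"
)

# Near-invariant positions from the text (1-based).
INVARIANT = {4: "S", 6: "A", 8: "L", 9: "R", 11: "K", 12: "A", 15: "H"}

def pattern_to_regex(pattern):
    """Convert a '-'-separated pattern to a regex; 'x' means any residue."""
    parts = ["." if element == "x" else element for element in pattern.split("-")]
    return re.compile("".join(parts))

def invariant_matches(pht):
    """Count how many of the seven near-invariant positions a 21-aa PHT keeps."""
    if len(pht) != 21:
        raise ValueError("expected a 21-residue PHT")
    return sum(pht[pos - 1] == aa for pos, aa in INVARIANT.items())

PHT_REGEX = pattern_to_regex(PIR_PATTERN)

og12_dm = "KSSSIADLRMKAKKHSESLGL"  # OG12-Dm PHT, as read from Fig. 4B
print(bool(PHT_REGEX.fullmatch(og12_dm)), invariant_matches(og12_dm))
```

A sequence that deviates at a restricted position fails the full pattern even when most near-invariant positions are kept, so a looser screen such as `invariant_matches` may be a useful complement when looking for divergent tails.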
Assuming that evolution of this domain, being under functional constraint, should have proceeded coordinately in the paralogous genes, it could be expected that the proteins would exhibit a certain homology in their C-termini (a quasi-PHT), even if it was dissimilar to the primary PHT sequence. The comparison revealed no similarity in the sequences downstream of HD (data not shown), suggesting the complete loss of the PHT domain in genes of the worm. This conclusion was also supported by comparison of all its full-length PtHD proteins with their orthologues in the database. There were no similarities found between these proteins and others of this type aside from their HD and OP sequences, implying that only these domains were conserved in the respective proteins of the nematode.

Fig. 3. Schematic representation of the protein structure of the PtHD gene types and their presence in genomes of vertebrates, tunicate, fly, and nematode. [Schematic not reproduced. Columns: Vertebrata (human, mouse, fish); Urochordata (Ciona intestinalis); Arthropoda (D. melanogaster); Nematoda (C. elegans). Domains drawn: OP, PtHD, PHT. Gene types shown: Prx/Pmx, Rx, Chx10, Al/Arx, Alx/Cart, OG12, Otp, Ptx, Mbx, Drg11, Gsc, Anf, Otx, Arix, Unc4. Genes identified de novo are marked in the original figure.]

Identification of PHT in Pax2/5/8 and PaxA

The search for PHT, inspired by the finding of a related sequence in the Pax3/7 and Pax7 genes, resulted in a collection of samples (Suppl. 2). A subsequent comparative analysis of these sequences revealed that the PHT of Pax3/7 is more similar to PHTs of the OG12 and Mbx proteins than to others (Figs. 4A, B, and D). However, the PHTs of Pax7 proteins of higher animals show some divergence in the first amino-terminal residues (Figs. 2 and 4B). This fact indicates that the progenitor gene, which gave rise to the extant Pax3 and Pax7 genes of vertebrates, contained a genuine PHT domain, which was later inherited only by Pax7 and acquired a specific modification.

Extending the search to available sequences of other organisms, we found PHTs in the Pax2/5/8 genes of lancelet (Branchiostoma floridae) and tunicates (Phallusia mammilata and Ciona intestinalis) and in the PaxA genes of two species of Hydra. The PHT of the lancelet Pax2/5/8 gene was first detected as a putative sequence in the 3′-end of its mRNA, located some nucleotides downstream of the reading frame (RF) predicted for this gene (accession no. AF053762). In order to confirm this RF we sequenced the corresponding genomic region, which was obtained by PCR from two independent DNA samples of lancelet. We found a two-nucleotide gap in the 3′-end of the previously reported sequence, leading to a premature termination of the respective RF. Translation of the corrected sequence produced a prolonged C-terminus, ending with a PHT (accession no. AY235576). As shown in Fig. 4 (A, C, and D), the PHT of the lancelet Pax2/5/8 is most related to PHTs of the OG12 and Mbx proteins. The PHT of the ascidian Pax2/5/8 is slightly divergent at four of its N-terminal amino acid residues, contains an insertion of serine between position 16 and position 17, and also shows similarities to the PHT sequences of Mbx and OG12 proteins (Fig. 4D).

The PHTs of the PaxA genes of two hydra species terminate the respective proteins, retaining only 15 amino-terminal residues of the consensus. The PaxA PHTs are more divergent compared to others but still retain six of the eight invariant residues, including one change to a similar residue, in the core region (Fig. 4C). This is an intriguing case: considering the absence of the PHT sequences in the currently known Pax proteins of other cnidarian species, one can assume that there must be a special reason for the PaxA gene of Hydra, lacking the HD domain, to keep those residues preserved. There are two other Pax genes that demonstrate the possible "independence" of PHT from HD. The PHT-containing Pax2/5/8 genes of B. floridae and P. mammilata show a highly divergent and a totally absent HD sequence, respectively (Figs. 4D and 5). This fact indicates that the function of PHT may be required even in the absence of an HD and could perhaps be associated with the function of PD.

Our analysis does not support the presence of PHT in the PaxB gene of Hydra littoralis, which was previously considered as encoding the putative C-peptide (Galliot et al. 1999). The putative sequence does not match the PHT consensus and the respective region is not conserved in other PaxB orthologues from Cnidaria and Porifera (data not shown). In conclusion, these data reveal that Pax genes of different types encode an additional evolutionarily conserved domain, characteristic of other PtHD genes, suggesting their common evolutionary antecedents.

Fig. 4. PHT sequences of the Pax proteins. A Unrooted tree showing relationships between the PHTs of the Pax and other PtHD-containing proteins. [Tree graphic and bootstrap values not reproduced. Tip labels: Pax2/5/8A-Phm-CAB96396; Mbx-Gg-Cont10.311; Dmbx-Bf-AAT66431; Pax2/5/8-Bf-AAF89581; OG12b-Mm-AAC52831; SHOXa-Hs-CAA72299; OG12-Ol-AAC05613; Shox-Bf-AAL83210; OG12-Apm-XP_394583; OG12-Dm-CG13141; Pax3/7-Hr-BA12289; Pax3/7-Cs-Sc317; Ptx-Dm-O18400; PTX2-Hs-NP_000316; Prx1-Gg-Q05437; Prx2-Gg-Q90963; Arx-Hs-Q96QS3; Alx-Sp-AAP34698; Arx-Dr-O42115; Al-Dm-Q06453; Al-Ag-XP_317481; Rx2a-Xl-O42567; Rx-Hs-Q9Y2V3; Rx1-Xl-O42201; Rx3-Dr-O42358; Rx-Dm-Q9W2Q1; Drg11-Hs-XP_060970; Drg11-Rn-NP_665710; Drg11-Dm-CG10017; Otp-Sk-AAP79292; Otp-Pv-AAM33145; Otp-Pl-O76971; Chx10-Hs-P58304; Chx10-Dm-CG15783; ALX4-Hs-Q9H161; Alx4-Gg-AAC61772; CART1-Hs-CAD90155; Cart1-Gg-XP_425445.] The PHTs of Pax3/7, Mbx, and OG12 proteins cluster together and form a group separated from others. In particular, this group can be characterized by the last three common carboxy-terminal residues, LGL. This fact may indicate a common ancestry of these genes, although their convergent evolution cannot be excluded. The tree was constructed using the neighbor-joining method with the JTT substitution model. Orthologous sequences in the Pax, Mbx, and OG12 cluster are marked by identical symbols. Newly identified PHTs of the Drosophila OG12, Drg11, and Chx10 proteins are shown in boldface. B Comparison of the PHT sequences of Pax3/7, Pax7, OG12, and Mbx. OG12 of Drosophila is used as a reference sequence. C Comparison of PHTs of the Pax2/5/8, PaxA, OG12, and Alx/Cart proteins. OG12 of mouse is used as a reference sequence. The underlined residues are almost invariant in the PHT consensus. The frames surround the respective conserved positions in all sequences. D Comparison of PHTs of the Pax2/5/8, Pax3/7, OG12, Mbx, and Alx/Cart proteins. OG12 of human is used as a reference sequence. Identical amino acid residues and gaps in the alignment are shown by dots and dashes, respectively. Different reference sequences are taken in order to facilitate the visual comparison. The PHT sequences of Pax, Mbx, and OG12 are distinguished from others by sharing the last three residues, LGL. The PHTs of Alx/Cart, which are less related to those of Pax, are shown here to stress the similarities among the PHTs of Pax, Mbx, and OG12. (Ag) Anopheles gambiae; (Apm) Apis mellifera; (Bf) Branchiostoma floridae; (Ci) Ciona intestinalis; (Cs) Ciona savignyi; (Dm) Drosophila melanogaster; (Dr) Danio rerio; (Fr) Fugu rubripes; (Gg) Gallus gallus; (Hr) Halocynthia roretzi; (Hs) Homo sapiens; (Mm) Mus musculus; (Ol) Oryzias latipes; (Pl) Paracentrotus lividus; (Pv) Patella vulgata; (Phm) Phallusia mammilata; (Rn) Rattus norvegicus; (Sk) Saccoglossus kowalevskii; (Sp) Strongylocentrotus purpuratus; (Tn) Tetraodon nigroviridis; (Xl) Xenopus laevis.

Fig. 4. Continued. Alignment panels B, C, and D. The first row of each panel is the reference sequence; dots mark residues identical to it, dashes mark gaps, and an asterisk marks a stop.

B
OG12-Dm-CG13141          KSSSIADLRMKAKKHSESLGL
Og12b-Mm-AAC52831        .N.......L.....AAA...
SHOTb-Hs-CAA05342        .N.......L.....AAA...
SHOXa-Hs-CAA72299        .N.......L..R..A.A...
Shox-Bf-AAL83210         .NN......L..R..A.A...
Pax3/7-Hr-BA12289        ENAG.TA....SRE..AA...
Pax3/7-Cs-Sc317          G.AG..A....SRE..AA...
Pax3/7-Ci-Sc209          G.AG..A....SRE..A.I..
Pax3/7-Bf-AAF89581       N.LG..A..Q.SRE..AA...
Pax7-Mm-AAG16663         NV.LSTQR...LGE..AV...
Dmbx-Bf-AAT66431         QIN..ES...R.RQ.AAA..I
MbxL-Dr-AAM90587         .TT..EN..LR..Q.AA....
Mbx-Gg-Cont10.311        .TT..EN..LR..Q.AA....
MbxL-Hs-AAM90590         .TT..EN..LR..Q.AA....

C
Og12b-Mm-AAC52831        KNSSIADLRLKAKKHAAALGL
SHOTb-Hs-CAA05342        .....................
OG12-Fr-Sc2448           ................E....
SHOXa-Hs-CAA72299        ............R...E....
Shox-Bf-AAL83210         ..N.........R...E....
OG12-Dm-CG13141          .S.......M.....SES...
Pax2/5/8-Bf-AAF89581     .DI.L....M...E.T.....
PaxA-Hl/Hm-AAB58290      MGQ.M.A..GGHVS.GN*
Alx4-Tn-CAG08730         .S....A..M...E.S..ISW
Alx4-Gg-AAC61772         .S....A..M...E.S..ISW
ALX4-Hs-Q9H161           .T....A..M...E.S..ISW
Cart1-Xl-Q91574          RS....V..M...E.T.NISW
CART1-Hs-CAD90155        RS....V..M...E.T.NISW

D (one extra column after position 16 holds the serine insertion of the ascidian Pax2/5/8)
MbxL-Hs-AAM90590         KTTSIENLRLRAKQHA-ASLGL
MbxA-Dr-AAL58532         ................-.....
Mbx-Gg-Cont10.311        ................-.....
Dmbx-Bf-AAT66431         QIN...S..M..R...-.A..I
Pax2/5/8A-Phm-CAB96396   EKNE.V...M.T.D.SS.PV..
Og12b-Mm-AAC52831        .NS..AD...K..K..-.A...
SHOTb-Hs-CAA05342        .NS..AD...K..K..-.A...
OG12-Fr-Sc2448           .NS..AD...K..K..-EA...
SHOXa-Hs-CAA72299        .NS..AD...K.RK..-EA...
Shox-Bf-AAL83210         .NN..AD...K.RK..-EA...
OG12-Dm-CG13141          .SS..AD..MK..K.S-E....
Pax2/5/8-Bf-AAF89581     .DI.LAD..MK..E.T-.A...
Pax3/7-Cs-Sc317          GSAG.AA..MKSRE.S-.A...
Pax3/7-Hr-BA12289        ENAG.TA..MKSRE.S-.A...
Pax3/7-Ci-Sc209          GSAG.AA..MKSRE.S-..I..
Pax3/7-Bf-AAF89581       NSLG.AA..QKSRE.S-.A...
Alx4-Tn-CAG08730         .SS..AA..MK..E.S-.AISW
Alx4-Gg-AAC61772         .SS..AA..MK..E.S-.AISW
ALX4-Hs-Q9H161           ..S..AA..MK..E.S-.AISW
Cart1-Xl-Q91574          RSS..AV..MK..E.T-.NISW
CART1-Hs-CAD90155        RSS..AV..MK..E.T-.NISW
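The dot notation used in the alignment panels of Figs. 2 and 4 (a dot means "same residue as the reference") can be expanded back into full sequences with a few lines of Python. This is only a reading aid, sketched under the assumption that each row has the same length as its reference; the function names are hypothetical. The example uses the Drosophila OG12 reference and the Ciona savignyi Pax3/7 row as read from Fig. 4B.

```python
def expand_row(reference, row):
    """Replace each '.' in an alignment row with the reference residue."""
    if len(reference) != len(row):
        raise ValueError("row and reference must have the same length")
    return "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))

def identical_positions(row):
    """Number of columns marked identical to the reference."""
    return row.count(".")

reference = "KSSSIADLRMKAKKHSESLGL"   # OG12-Dm, reference of Fig. 4B
pax37_cs = "G.AG..A....SRE..AA..."    # Pax3/7-Cs row of Fig. 4B

print(expand_row(reference, pax37_cs))               # GSAGIAALRMKSREHSAALGL
print(identical_positions(pax37_cs), "of", len(pax37_cs))  # 12 of 21
```

The expanded Pax3/7 sequence agrees with the corresponding stretch of the Ciona savignyi row in Fig. 2, which is a quick consistency check between the two figures.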

The Ancestor of Pax Genes Encoded Four Domains: PD, OP, PtHD, and PHT

Based on the comparison of the paired domain sequences, the Pax proteins can be divided into two supergroups: PaxD/3/7/1/9 and the rest of the family, represented by the Pax6/4, PaxA/neuro, and PaxB/2/5/8 groups (Fig. 5). This classification is strongly supported by the phylogenetic analyses performed in other studies in which different computational methods were employed and different sets of sequences investigated (Sun et al. 1997; Balczarek et al. 1997; Hoshiyama et al. 1998; Wada et al. 1998; Gröger et al. 2000). In all the phylogenetic trees obtained, PaxD/3/7/1/9 is placed into a basal position. This basal position is also supported by the analysis where the Tc1 transposase sequence (the putative PD progenitor) was used as an outgroup (Breitling and Gerber 2000).

Fig. 5. Unrooted phylogenetic tree of the paired domain sequences supporting the subdivision of the Pax gene family into two supergroups. [Tree graphic, bootstrap values, and domain schematics (PD, OP, HD, PHT) not reproduced. Tip labels: PAX5-Hs-NP_057953; PAX2-Hs-NP_003978; Pax2/5/8-Bf-AAC12733; Sparkling-Dm-AAB86598; Pax2/5/8-Phm-CAB96396; PAX8-Hs-NP_003457; Pax2/5/8-Pl-AAB70245; PaxB-Hl-AAB58291; PaxA-Hl-AAB58290; PaxA-Cq-AAB58292; PaxA-Am-AF053458; PaxC-Am-AAC15711; Poxn-Dm-CAA41721; Pax6-Bf-CAA11364; Eyeless-Dm-O18381; PAX6-Hs-NP_000271; Pax6-Lo-AAB40616; Toy-Dm-AAD31712; Pax6-Pl-A57374; PAX4-Hs-NP_006184; Paired-Dm-P06601; Gsbd-Dm-P09082; Gsbp-Dm-P09083; PAX7-Hs-NP_002575; PAX3-Hs-NP_000429; Pax3/7-Bf-AAF89581; Pax3/7-Hr-BA12289; PaxD-Am-AAF64461; Pax1/9-Pl-AAB69868; PAX1-Hs-NP_006183; PAX9-Hs-NP_006185; Pax1/9-Bf-AAA81364; Pax1/9-Hr-BAA74835; Pax1/9-Ci-BAA74829; Poxm-Dm-S06950; Hypothetical proto-Pax.] The tree was obtained using the neighbor-joining method with the JTT substitution model and the subsequent bootstrap test of phylogeny. The PHT-bearing Pax genes are marked in boldface, and their respective protein structures are shown. (Am) Acropora millepora; (Bf) Branchiostoma floridae; (Ci) Ciona intestinalis; (Cq) Chrysaora quinquecirrha; (Dm) Drosophila melanogaster; (Hl) Hydra littoralis; (Hr) Halocynthia roretzi; (Hs) Homo sapiens; (Lo) Loligo opalescens; (Pl) Paracentrotus lividus; (Phm) Phallusia mammilata.

On the other hand, PaxA/neuro and PaxB/2/5/8 most probably evolved from a common

Fig. 6. Phylogenetic tree of the main types of homeodomain sequences. [Tree graphic, bootstrap values, and domain schematics (OP, HD, PHT) not reproduced. Tip labels in reading order: Wariai-Dd; Anf; Prx/Pmx; Rx; Alx/Cart; Chx10; Al/Arx; OG12; Unc4; Ptx; Mbx; Gsc; Drg11; Otx; Otp; Arix; Antp; Hox; Xlox; Evx; Mox; Barx; Emx; Not; Dll; Lbx; Gsx; Msx; Gbx; Tlx; NK1; NK2; Hex; Cdx; Dbx; En; Hbx2-Dd; Six-type1; Six-type2; Exd-d/I; Ath1-At-d/i; Meis; KN; P53147-Yc-d/i. Group annotations: "PtHD genes"; "only in Metazoa"; "in Plants, Fungi and Metazoa".] The protein structures of the genes discussed in the text are shown. The consensus sequences representing the respective groups of orthologues were used for the tree construction (see Materials and Methods). The HD sequences of genes of plants and fungi were used as an outgroup. The tree was obtained by the neighbor-joining method with the JTT substitution model and supported by the bootstrap test of phylogeny. Hbx2 and Wariai are genes of the slime mold, Dictyostelium discoideum.

Fig. 7. The scenario for the emergence of the proto-Pax and the subsequent evolution of the Pax genes. One of the preexisting OP-HD genes (see Fig. 6) duplicated and gave rise to a number of OP-PtHD genes. Next, one of the OP-PtHD genes acquired a PHT domain and was also expanded by duplications. Among the currently existing genes, the triple-domain structure, OP-PtHD-PHT, is present only in the Al/Arx, Chx10, and Rx genes (the Prx gene type also contains all three domains but is found only in the chordates and, as such, may represent a chordate-specific type). The last step in the formation of the first Pax gene was the insertion of a Tc1-like transposase in front of an OP-PtHD-PHT gene. The resulting proto-Pax, Tc1/PD-OP-PtHD-PHT, duplicated and gave rise to the two currently existing supergroups of the Pax family, PaxD/3/7/1/9 and the rest of the family, represented by the PaxB/2/5/8, PaxA/neuro, and Pax6/4 gene types (see Fig. 5). The subsequent evolution of the Pax gene types was mainly based on divergence and loss of some of these four domains of the progenitor. Only Pax3/7 and Pax7 of chordates preserved the full ancestral structure. In contrast to other studies, we do not consider PaxC a distinct gene type. PaxC is found only in the coral A. millepora and is most likely a Cnidaria-specific gene. The PD sequence of PaxC clusters with sequences of PaxA proteins from Cnidaria species, rather than with any Pax gene type from other animals (Fig. 5). The PD sequences of PaxC and PaxA share several unique diagnostic residues that differentiate them from the PDs of other Pax proteins (Y21, M22, C32, E38, and L44). Therefore, we distinguish only three gene types in Cnidaria: PaxA/C, PaxB, and PaxD. Five Pax gene types are present in triploblastic animals, except that PaxA/poxneuro is lost in Chordata.
[Scheme not reproduced. Stages shown: Urmetazoa, Cnidaria, Triploblasts.]

ancestor that had already separated from the progenitor of PaxD/3/7/1/9. These data imply that the PaxD/3/7/1/9 group arose from one of the two genes produced by duplication of a proto-Pax. Since the PHT domain is present in proteins of both supergroups (Pax3/7 and PaxA/2/5/8), it is reasonable to conclude that this domain was also present in their ancestor, the proto-Pax (Fig. 5). Based on this assumption, the structure of the first Pax gene can be deduced. This hypothetical gene most likely encoded a protein consisting of four domains, PD-OP-PtHD-PHT, which are characteristic of the extant Pax3/7 and Pax2/5/8 proteins of the lancelet. Thus, it seems that the proto-Pax gene had this complex structure from its very beginning, and the further evolution of its descendants can be explained by the subsequent loss of some of these preexisting domains. Among the known Pax proteins, the ancestral-like structure is retained only in Pax3/7 of the ascidia and lancelet and Pax7 of the vertebrates (Pax2/5/8 of the lancelet contains a highly divergent HD). Pax genes from many other taxa are yet to be discovered. However, it is important to note that the currently known Pax genes of nonchordate animals do not exhibit the full set of proto-Pax domains, suggesting that, in terms of structure, they do not represent the ancestral forms.
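The inference in the preceding paragraph can be illustrated with a minimal Python sketch. It assumes a loss-only model (a domain, once lost, is never regained) and assigns a domain to the ancestor when at least one member of each supergroup carries it. The domain sets for Pax3/7 and Pax2/5/8 of the lancelet follow the text; those for Pax1/9 and Pax6 are simplified assumptions added for illustration and are not taken from this section. All function and variable names are hypothetical.

```python
from functools import reduce

# Domain composition per gene, grouped by supergroup.
SUPERGROUPS = {
    "PaxD/3/7/1/9": {
        "Pax3/7 (lancelet)": {"PD", "OP", "PtHD", "PHT"},    # from the text
        "Pax1/9": {"PD", "OP"},                              # assumption, for illustration
    },
    "PaxA/B/2/5/8/4/6": {
        "Pax2/5/8 (lancelet)": {"PD", "OP", "PtHD", "PHT"},  # from the text (HD highly divergent)
        "Pax6": {"PD", "PtHD"},                              # assumption, for illustration
    },
}


def domains_in_group(members):
    """Union of the domains carried by any member of one supergroup."""
    return set().union(*members.values())


def infer_ancestor(supergroups):
    """Domains found in every supergroup are assigned to the common ancestor."""
    per_group = [domains_in_group(members) for members in supergroups.values()]
    return reduce(set.intersection, per_group)


def losses(supergroups, ancestor):
    """Domains of the inferred ancestor that each extant gene no longer carries."""
    return {
        gene: ancestor - domains
        for members in supergroups.values()
        for gene, domains in members.items()
    }


ancestor = infer_ancestor(SUPERGROUPS)
lost = losses(SUPERGROUPS, ancestor)

print("inferred ancestor:", sorted(ancestor))
for gene, missing in lost.items():
    print(gene, "lost:", sorted(missing) or "nothing")
```

Under these assumptions the inferred ancestor carries all four domains, and every other architecture is explained by loss alone. A model that also allowed independent domain gains would need a different reconstruction method.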

Assembling the Proto-Pax, Based on Structural Considerations

The fact that the proto-Pax could have appeared as a gene bearing a complex structure, composed of conserved elements pertaining to other gene families, demands explanation. The oldest domain of the hypothetical proto-Pax protein is the HD (found in all eukaryotes), and it was presumably the central conserved element around which the other domains evolved. The paired-type HD represents one of the two superfamilies that are specific to animals (Fig. 6). The other superfamily consists of genes that were most likely derived from the four related ancient clusters, EHGbox, Extended HOX, NKL, and paraHOX (Pollard and Holland 2000). The octopeptide sequence, also known as the Eh1 domain, can be found in genes of both superfamilies (S. Smith and Jaynes 1996; Muhr et al. 2001), suggesting its ancient association with the HD (PtHD genes with OP: Gsc, Al/Arx, Rx, Prx, Chx10, and Hes/Anf; non-PtHD genes with OP: En, Not, Gsx, Msx, Tlx, Dbx, NK1, NK2, Emx, and Gbx). On the other hand, the PHT is a property of the PtHD genes only. Therefore, it is most plausible that one of the genes exhibiting the structure OP-PtHD acquired PHT and gave rise to an OP-PtHD-PHT gene that later expanded by duplications (OP-PtHD-PHT genes: Al/Arx, Rx, Chx10, and Prx). In turn, this would imply that the PtHD-PHT genes are secondary and represent remnants missing the OP sequence (PtHD-PHT genes: Mbx, Ptx, Otp, Drg11, OG12, and Alx/Cart). Definite orthologues of the chordate genes that structurally represent all these intermediate steps are found in the Cnidaria species: Gsc (OP-PtHD); Otx (PtHD only); Al/Arx (OP-PtHD-PHT); and Mbx and Otp (PtHD-PHT) (Broun et al. 1999; K. Smith et al. 1999, 2000; Muller et al. 1999; Bridge et al. 2000), although the cnidarian genes preserved only their PtHD sequences (an exception is Al of Hydra, which retained OP).

The last step that led to the birth of the proto-Pax gene, according to this scenario, was the insertion of a Tc1-like transposon in front of one of the OP-PtHD-PHT genes, followed by conversion of the transposase to PD (Fig. 7).

Distinct Pax genes, representing basic gene types of this family (except for Pax4/6), are also found in the Porifera and Cnidaria species, suggesting that the evolutionary events described above took place long before the radiation of the Metazoa. It is intriguing that the PtHDs of these genes had evolved and acquired their extant shape already in primitive animals, and must have preceded the emergence of the morphological diversity of the animal kingdom. It seems that the identity of the different types of the PtHD domain was fixed during pre-Cambrian time and that the subsequent explosive evolution of animal body plans had nearly no effect on the PtHD sequences. Considering this fact, one can conclude that sequence information on PtHDs from currently existing animals of all phyla would, on its own, never be enough for the direct reconstruction of their phylogeny. Phylogenetic analysis of PtHDs shows that most of them are quite distant from each other; even the homeodomains of Pax genes are not clustered together on the evolutionary tree but, rather, are joined to PtHDs of different non-Pax types (data not shown). Therefore, the structural approach applied in this study presents an additional opportunity that can allow large-scale reconstruction of the pedigree of PtHD-encoding genes.

In conclusion, identification of the PHT domain in the Pax proteins provides one more clue that will help to reconstruct their evolutionary history. More PtHD and Pax genes are yet to be identified in animals of other taxa, outside the chordates, ecdysozoans, and cnidarians. Nevertheless, we now predict that the animals directly ancestral to the chordates most probably retain a Pax gene encoding the full proto-Pax-like structure.
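The structural classification used in this section can be sketched in a few lines of Python. The sketch groups the PtHD gene types named in the text by domain architecture and lists the domain gained at each step of the proposed scenario. Hes/Anf is entered as OP-PtHD because the text lists it among the OP-bearing genes but not among the PHT-bearing ones; that assignment, and all identifiers in the code, are assumptions made for illustration.

```python
from collections import defaultdict

# Domain architectures (N- to C-terminal) of PtHD gene types named in the text.
ARCHITECTURES = {
    "Gsc": ("OP", "PtHD"),
    "Hes/Anf": ("OP", "PtHD"),          # assumption: has OP, not listed with PHT
    "Otx": ("PtHD",),
    "Al/Arx": ("OP", "PtHD", "PHT"),
    "Rx": ("OP", "PtHD", "PHT"),
    "Chx10": ("OP", "PtHD", "PHT"),
    "Prx": ("OP", "PtHD", "PHT"),
    "Mbx": ("PtHD", "PHT"),
    "Ptx": ("PtHD", "PHT"),
    "Otp": ("PtHD", "PHT"),
    "Drg11": ("PtHD", "PHT"),
    "OG12": ("PtHD", "PHT"),
    "Alx/Cart": ("PtHD", "PHT"),
}

# Successive architectures in the proposed scenario, ending with the proto-Pax.
SCENARIO = [
    ("OP", "PtHD"),
    ("OP", "PtHD", "PHT"),
    ("PD", "OP", "PtHD", "PHT"),
]


def group_by_architecture(table):
    """Map each architecture string, e.g. 'OP-PtHD-PHT', to its sorted gene list."""
    groups = defaultdict(list)
    for gene, architecture in table.items():
        groups["-".join(architecture)].append(gene)
    return {name: sorted(genes) for name, genes in groups.items()}


def gained(previous, current):
    """Domains present in the current step but absent from the previous one."""
    return [domain for domain in current if domain not in previous]


groups = group_by_architecture(ARCHITECTURES)
for name, genes in groups.items():
    print(f"{name}: {', '.join(genes)}")

for previous, current in zip(SCENARIO, SCENARIO[1:]):
    print("-".join(previous), "->", "-".join(current), "gains", gained(previous, current))
```

In this representation the PtHD-PHT group is the OP-PtHD-PHT architecture minus OP, which matches the interpretation of these genes as secondary.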

Acknowledgments. We are grateful to C.E. Cook and J. Schmitz for critical reading of the manuscript, valuable comments and helpful discussions. We also thank A. Perevozchikov and A. Karabinos for providing the Amphioxus genomic DNA. This study was supported by the Deutsche Forschungsgemeinschaft. E.V. dedicates this paper to I. Vorobyova.

References

Balczarek KA, Lai ZC, Kumar S (1997) Evolution and functional diversification of the paired box (Pax) DNA-binding domains. Mol Biol Evol 14:829–842
Breitling R, Gerber JK (2000) Origin of the paired domain. Dev Genes Evol 210:644–650
Bridge DM, Stover NA, Steele RE (2000) Expression of a novel receptor tyrosine kinase gene and a paired-like homeobox gene provides evidence of differences in patterning at the oral and aboral ends of hydra. Dev Biol 220:253–262
Broun M, Sokol S, Bode HR (1999) Cngsc, a homologue of goosecoid, participates in the patterning of the head, and is expressed in the organizer region of Hydra. Development 126:5245–5254
Catmull J, Hayward DC, McIntyre NE, Reece-Hoyes JS, Mastro R, Callaerts P, Ball EE, Miller DJ (1998) Pax-6 origins: implications from the structure of two coral pax genes. Dev Genes Evol 208:352–356
Felsenstein J (1985) Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791
Franz G, Loukeris TG, Dialektaki G, Thompson CR, Savakis C (1994) Mobile Minos elements from Drosophila hydei encode a two-exon transposase with similarity to the paired DNA-binding domain. Proc Natl Acad Sci USA 91:4746–4750
Frigerio G, Burri M, Bopp D, Baumgartner S, Noll M (1986) Structure of the segmentation gene paired and the Drosophila PRD gene set as part of a gene network. Cell 47:735–746
Furukawa T, Kozak CA, Cepko CL (1997) Rax, a novel paired-type homeobox gene, shows expression in the anterior neural fold and developing retina. Proc Natl Acad Sci USA 94:3088–3093
Galliot B, Miller D (2000) Origin of the anterior patterning. Trends Genet 16:1–4
Galliot B, de Vargas C, Miller D (1999) Evolution of homeobox genes: Q50 Paired-like genes founded the Paired class. Dev Genes Evol 209:186–197
Gehring W, Affolter M, Bürglin T (1994) Homeodomain proteins. Annu Rev Biochem 63:487–526
Glardon S, Holland LZ, Gehring WJ, Holland ND (1998) Isolation and developmental expression of the amphioxus Pax-6 gene (AmphiPax-6): insights into eye and photoreceptor evolution. Development 125:2701–2710
Gröger H, Callaerts P, Gehring WJ, Schmid V (2000) Characterization and expression analysis of an ancestor-type Pax gene in the hydrozoan jellyfish Podocoryne carnea. Mech Dev 94:157–169
Hobert O, Ruvkun G (1999) Pax genes in Caenorhabditis elegans. Trends Genet 15:214–216
Holland ND, Holland LZ, Kozmik Z (1995) An amphioxus Pax gene, AmphiPax-1, expressed in embryonic endoderm, but not in mesoderm: implications for the evolution of class I paired box genes. Mol Marine Biol Biotechnol 4:206–214
Holland LZ, Schubert M, Kozmik Z, Holland ND (1999) AmphiPax3/7, an amphioxus paired box gene: insights into chordate myogenesis, neurogenesis, and the possible evolutionary precursor of definitive vertebrate neural crest. Evol Dev 1:153–165
Hoshiyama D, Suga H, Iwabe N, Koyanagi M, Nikoh N, Kuma K, Matsuda F, Honjo T, Miyata T (1998) Sponge Pax cDNA related to Pax-2/5/8 and ancient gene duplications in the Pax family. J Mol Evol 47:640–648
Ivics Z, Izsvak Z, Minter A, Hackett PB (1996) Identification of functional domains and evolution of Tc1-like transposable elements. Proc Natl Acad Sci USA 93:5008–5013
Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8:275–282
Jun S, Wallen RV, Goriely A, Kalionis B, Desplan C (1998) Lune/eye gone, a Pax-like protein, uses a partial paired domain and a homeodomain for DNA recognition. Proc Natl Acad Sci USA 95:13720–13725
Kozmik Z, Holland ND, Kalousova A, Paces J, Schubert M, Holland LZ (1999) Characterization of an amphioxus paired box gene, AmphiPax2/5/8: developmental expression patterns in optic support cells, nephridium, thyroid-like structures and pharyngeal gill slits, but not in the midbrain-hindbrain boundary region. Development 126:1295–1304
Kumar S, Tamura K, Nei M (2004) MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform 5:150–163
Mathers PH, Grinberg A, Mahon KA, Jamrich M (1997) The Rx homeobox gene is essential for vertebrate eye development. Nature 387:603–607
Miller DJ, Hayward DC, Reece-Hoyes JS, Scholten I, Catmull J, Gehring WJ, Callaerts P, Larsen JE, Ball EE (2000) Pax gene diversity in the basal cnidarian Acropora millepora (Cnidaria, Anthozoa): implications for the evolution of the Pax gene family. Proc Natl Acad Sci USA 97:4475–4480
Miura H, Yanazawa M, Kato K, Kitamura K (1997) Expression of a novel aristaless related homeobox gene Arx in the vertebrate telencephalon, diencephalon and floor plate. Mech Dev 65:99–109
Muhr J, Andersson E, Persson M, Jessell TM, Ericson J (2001) Groucho-mediated transcriptional repression establishes progenitor cell pattern and neuronal fate in the ventral neural tube. Cell 104:861–873
Muller P, Yanze N, Schmid V, Spring J (1999) The homeobox gene Otx of the jellyfish Podocoryne carnea: role of a head gene in striated muscle and evolution. Dev Biol 216:582–594
Pollard SL, Holland PW (2000) Evidence for 14 homeobox gene clusters in human genome ancestry. Curr Biol 10:1059–1062
Rzhetsky A, Nei M (1992) A simple method for estimating and testing minimum evolution trees. Mol Biol Evol 9:945–967
Saitou N, Nei M (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Simeone A, D’Apice MR, Nigro V, Casanova J, Graziani F, Acampora D, Avantaggiato V (1994) Orthopedia, a novel homeobox-containing gene expressed in the developing CNS of both mouse and Drosophila. Neuron 13:83–101
Smith KM, Gee L, Blitz IL, Bode HR (1999) CnOtx, a member of the Otx gene family, has a role in cell movement in hydra. Dev Biol 212:392–404
Smith KM, Gee L, Bode HR (2000) HyAlx, an aristaless-related gene, is involved in tentacle formation in hydra. Development 127:4743–4752
Smith ST, Jaynes JB (1996) A conserved region of engrailed, shared among all en-, gsc-, Nk1-, Nk2- and msh-class homeoproteins, mediates active transcriptional repression in vivo. Development 122:3141–3150
Sun H, Rodin A, Zhou Y, Dickinson DP, Harper DE, Hewett-Emmett D, Li WH (1997) Evolution of paired domains: isolation and sequencing of jellyfish and hydra Pax genes related to Pax-5 and Pax-6. Proc Natl Acad Sci USA 94:5156–5161
Vos JC, Plasterk RH (1994) Tc1 transposase of Caenorhabditis elegans is an endonuclease with a bipartite DNA binding domain. EMBO J 13:6125–6132
Wada H, Saiga H, Satoh N, Holland PW (1998) Tripartite organization of the ancestral chordate brain and the antiquity of placodes: insights from ascidian Pax-2/5/8, Hox and Otx genes. Development 125:1113–1122
Wada S, Tokuoka M, Shoguchi E, Kobayashi K, Di Gregorio A, Spagnuolo A, Branno M, Kohara Y, Rokhsar D, Levine M, Saiga H, Satoh N, Satou Y (2003) A genomewide survey of developmentally relevant genes in Ciona intestinalis II. Genes for homeobox transcription factors. Dev Genes Evol 213:222–234
Wu CH, Yeh LS, Huang H, Arminski L, Castro-Alvear J, Chen Y, Hu Z, Kourtesis P, Ledley RS, Suzek BE, Vinayaka CR, Zhang J, Barker WC (2003) The Protein Information Resource. Nucleic Acids Res 31:345–347

Supplement 1. [Graphics not reproduced. (A) Schematic representations of the reading frames for Chx10, OG12, and Drg11. (B) Tree of PtHD sequences with the leaf labels CHX10-Hs, CG15782-Chx10-Dm, VSX1-Hs, CART1-Hs, ALX4-Hs, ALX3-Hs, Al-Dm, ARX-Hs, Arx-Ci, UNC4-Hs, Unc4-Dm, Rx-Dm, RX-Hs, DRG11-Hs, CG2808-Drg11-Dm, Arix-Hs, PHDP(Arix)-Dm, PRX1-Hs, PRX2-Hs, Prx-Ci, CG5369-OG12-Dm, OG12-Bf, SHOTa(OG12)-Hs, PTX1-Hs, Ptx-Dm, MBX-Hs, Manacle(Mbx)-Hl, GSC1-Hs, GSC2-Hs, Gsc-Hl, Gsc-Dm, Otp-Dm, OTP-Hs, Otx-Hl, Otx-Dm, OTX2-Hs.]

Supplement 2.

Sequence                  PHT
Arx-Hs-Q96QS3             RASSIAALRLKAKEHAAQLTQ
Arx-Mm-O35085             RASSIAALRLKAKEHAAQLTQ
Arx-Xl-AAN05413           RASSIAALRLKAKEHAAQLTQ
Arx2-Xl-AAS91656          RASSIAALRLKAKEHAAQLTQ
Arx-Dr-O42115             RASSIAALRLKAKEHSAQLTQ
Al-Apm-XP_392557          RTNSIASLRLKAREYELHLEM
Al-Dm-Q06453              RTSSIAALRLKAREHELKLEL
Al-Ag-XP_317481           RSSSIAALRLKAREHELRLEM

Chx10-Hs-P58304           RENSIAVLRAKAQEHSTKVLG
Chx10-Mm-AAH58806         RENSIAALRAKAQEHSTKVLG
Chx10-Gg-Q9IAL1           RENSIAALRAKAQEHSTKVLG
Chx10/Vsx2-Ol-Q9I9A3      RENSIAALRAKAQEHSAKVLG
Chx10-Dr-U62898           RENSIAALRAKAQEHSAKVLG
Chx10-Apm-XP_394790       RNESIACLRAKAQQHQLQLSL
Chx10-Dm-CG15783          RNNSIACLRAKAQEHQARLLN

Rx-Hs-Q9Y2V3              RNSSIAALRLKAKEHIQAIGK
Rx-Mm-AAC53129            RNSSIAALRLKAKEHIQAIGK
Rx1-Gg-Q9PVY0             RSSSIASLRMKAKEHIQTIDK
Rx2-Gg-BAA84749           RNTSIASLRMKAKEHIQSIGK
Rx1-Xl-O42201             RNNSIASLRMKAKEHIQFIGK
Rx2a-Xl-O42567            RNNSIASLRMKAKEHIQSFGK
Rx1-Asm-Q9I9D5            RSSSIASLRMKAKEHIQSMDK
Rx1-Dr-O42356             RSSSIAALRMKAKEHIQSMDK
Rx2-Dr-O42357             RSSSIAALRMKAKEHIQSMDK
Rx3-Dr-O42358             RNTSIASLRMKAKEHIQSIGK
Rx-Ol-Q9I9A2              RNSSIAALRMKAKEHIQSMDK
Rx3-Ol-CAC69975           RNSSIASLRMKAKEHIQSFGK
Rx-Sk-AAP79282            RSSSIVSLRMKAKEHIENLGK
Rx-Apm-XP_394144          RTTSIQALRMRAKEHVESITK
Rx-Dm-Q9W2Q1              RSNSIATLRIKAKEHLDNLNK
Rx-Pd-AAU20320            RSTSIVSLRMRAKEHMKAKEH

Drg11-Hs-XP_060970        RTASVATLRMKAREHSEAVLQ
Drg11-Rn-NP_665710        RTASVAALRMKAREHSEAVLQ
Drg11-Gg-XP_426514        RTASVAALRMKAREHSEAVLQ
Drg11-Dr-CR762493         RTASVAALRMKAREHSEAVLQ
Drg11-Dm-CG10017          RSNSVAELRRKAQEHSAALLQ

Otp-Hs-NP_115485          RGTSIASLRRKALEHTVSMSF
Otp-Mm-O09113             RGTSIASLRRKALEHTVSMSF
Otp-Dr-AAH76366           RGTSIASLRRKALEHTVSMSF
Otp-Sk-AAP79292           RGTSIAQLRRKALEHSVTLNG
Otp-He-AAS00591           RGTSIASLRRKALEHAASLNG
Otp-Lv-AAR17090           RGTSIASLRRKALEHAASLNG
Otp-Pl-O76971             RGTSIASLRRK-LEHAASLNG
Otp-Pv-AAM33145           RGTSIAALRRKALEHSACLTG
Otp-Ag-XP_313975          RGHSIAALRRRASELNQTMPS
Otp-Dm-P56672             TLHSIAALRRRASELNAIPSY

ALX4-Hs-Q9H161            KTSSIAALRMKAKEHSAAISW
Alx4-Gg-AAC61772          KSSSIAALRMKAKEHSAAISW
Alx4-Tn-CAG08730          KSSSIAALRMKAKEHSAAISW
Alx-Sp-AAP34698           RTNSIAALRLRAKEHSSVMGM
CART1-Hs-CAD90155         RSSSIAVLRMKAKEHTANISW
Cart1-Gg-XP_425445        RSSSIAVLRMKAKEHAANISW
Cart1-Xl-Q91574           RSSSIAVLRMKAKEHTANISW

Prx1-Mm-P63013            MANSIANLRLKAKEYSLQRNQ
Prx2-Mm-Q06348            MANSIASLRLKAKEFSLHHSQ
Prx2-Hs-Q99811            MANSIASLRLKAKEFSLHHSQ
Prx1-Gg-Q05437            MANSIANLRLKAKEYSLQRNQ
Prx2-Gg-Q90963            MANSIASLRLKAKEFSLHQNQ

PTX1-Hs-AAH03685          CNSSLASLRLKSKQHS-SFGYG
PTX2-Hs-NP_000316         CNSSLASLRLKAKQHS-SFGYA
PTX3-Hs-O75364            CNSSLASLRLKAKQHA-SFSYP
Ptx2-Mm-P97474            CNSSLASLRLKAKQHS-SFGYA
Ptx1-Gg-AAC23684          CNSSLASLRLKSKQHS-SFGYS
Ptx2-Gg-O93385            CNSSLASLRLKAKQHS-SFGYA
Ptx1-Xl-CAC12834          CNSSLASLRLKSKQHS-TFGYS
Ptx2-Xl-Q9PWR3            CNSSLASLRLKAKQHS-SFGYA
Ptx3-Xl-AAG15383          CNSSLASLRLKAKQHA-NFTYP
Ptx2-Dr-Q9W5Z2            CNSSLASLRLKAKQHS-SFGYA
Ptx-Bb-AAF03901           CNSSIAALRLKAKQHS-TSVAS
Ptx-Bs-AAT75269           CNSSIASLRLKAKQHSDYMNYP
Ptx-Ci-CAD27490           CASSLTSLRLKAKQHN-PVSPY
Ptx-Dm-O18400             MSSSIATLRLKAKQHA-SAGFG

SHOTb-Hs-CAA05342         KNSSIADLRLKAKKHAAALGL
OG12b-Mm-AAC52831         KNSSIADLRLKAKKHAAALGL
SHOXa-Hs-CAA72299         KNSSIADLRLKARKHAEALGL
OG12-Gg-XP_416865         KNSSIADLRLKARKHAEALGL
OG12-Fr-Sc2448            KNSSIADLRLKAKKHAEALGL
OG12-Ol-AAC05613          KNSSIADLRLKARKHTEALGL
Shox-Bf-AAL83210          KNNSIADLRLKARKHAEALGL
OG12-Dm-CG13141           KSSSIADLRMKAKKHSESLGL
OG12-Apm-XP_394583        KNSSIADLRLKARRHQEALGL

MbxL-Hs-AAM90590          KTTSIENLRLRAKQHAASLGL
Mbx-Gg-Cont10.311         KTTSIENLRLRAKQHAASLGL
MbxA-Dr-AAL58532          KTTSIENLRLRAKQHAASLGL
Mbx-Tn-CAF95838           KTTSIENLRLRAKQHAASLGL
Dmbx-Bf-AAT66431          QINSIESLRMRARQHAAALGI

PaxA-Hl-AAB58290          MGQSMAALRGGHVSHGN---
PaxA-Hm-BAA36344          MGQSMAALRGGHVSHGN----

Pax2/5/8-Bf-AAF89581      KDISLADLRMKAKEHT-AALGL
Pax2/5/8A-Phm-CAB96396    EKNEIVNLRMRTKDHSSAPVGL
Pax2/5/8-Ci-BAC41498      NTGKVLNLRG--KEHP-ATLEM

PAX7-Hs-AL021528          NVSLSTQRRMKLGEHSAVLGL
Pax7-Mm-AAG16663          NVSLSTQRRMKLGEHSAVLGL
Pax7-Gg-BAA23005          NVSLSTQRRMKLGEHSAVLGL
Pax7-Dr-AAC41255          NVSLSTQRRMKLGDHSAVLGL
Pax3/7-Bf-AAF89581        NSLGIAALRQKSREHSAALGL
Pax3/7-Hr-BAA12289        ENAGITALRMKSREHSAALGL
Pax3/7-Cs-Sc317           GSAGIAALRMKSREHSAALGL
Pax3/7-Ci-Sc209           GSAGIAALRMKSREHSASIGL
Pax7-Pm-AAL04156          NVGLGGQRRLKLGEHSAVLGL

Legends to Supplementary Material

Supplement 1. Identification of the reading frames containing HD and PtHD of the Drosophila genes that are orthologous to the Chx10, OG12, and Drg11 genes of chordates. (A) Schematic representations showing the reading frames and their genomic locations. The sequences containing PHTs are shown in red. Chx10: CG15782-CG15783, chromosome arm X, cytogenetic map 5A3, scaffold AE003434; OG12: CG5369-CG13141, chromosome arm 2L, cytogenetic map 31D9, scaffold AE003628; Drg11: CG2808-CG10017, chromosome arm 2L, cytogenetic map 24B1, scaffold AE003579. The diagrams were obtained using a server developed by the FlyBase Consortium (http://flybase.net/genes/). The PHT sequences of these proteins are given in Supplement 2, and their identity is supported by the phylogenetic analysis presented in Fig. 3A. (B) Unrooted neighbor-joining tree of the PtHDs. The Drosophila sequences CG15782, CG5369, and CG2808 are orthologs of Chx10, OG12, and Drg11, respectively. (Bf) Branchiostoma floridae, (Ci) Ciona intestinalis, (Dm) Drosophila melanogaster, (Hl) Hydra littoralis, (Hs) Homo sapiens.

Supplement 2. Collection of the PHT sequences from 13 PtHD gene types of different organisms. (Ag) Anopheles gambiae, (Apm) Apis mellifera, (Asm) Astyanax mexicanus, (Bb) Branchiostoma belcheri, (Bf) Branchiostoma floridae, (Bs) Botryllus schlosseri, (Ci) Ciona intestinalis, (Cs) Ciona savignyi, (Dm) Drosophila melanogaster, (Dr) Danio rerio, (Fr) Fugu rubripes, (Gg) Gallus gallus, (He) Heliocidaris erythrogramma, (Hl) Hydra littoralis, (Hm) Hydra magnipapillata, (Hr) Halocynthia roretzi, (Hs) Homo sapiens, (Lv) Lytechinus variegatus, (Mm) Mus musculus, (Ol) Oryzias latipes, (Pd) Platynereis dumerilii, (Pl) Paracentrotus lividus, (Pm) Petromyzon marinus, (Pv) Patella vulgata, (Phm) Phallusia mammilata, (Rn) Rattus norvegicus, (Sk) Saccoglossus kowalevskii, (Sp) Strongylocentrotus purpuratus, (Tn) Tetraodon nigroviridis, (Xl) Xenopus laevis.
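As a rough illustration of how conservation within the PHT can be read off such a collection, the following Python sketch marks the positions that are identical across five gapless sequences taken from Supplement 2, one per gene type. The choice of sequences and the helper name are assumptions made for this example; the sketch is not the procedure used in the paper to define the domain, and a fuller treatment would align all sequences, including those with gaps.

```python
# Five gapless 21-residue PHT sequences copied from Supplement 2.
PHT_SEQUENCES = {
    "Arx-Hs-Q96QS3": "RASSIAALRLKAKEHAAQLTQ",
    "Chx10-Mm-AAH58806": "RENSIAALRAKAQEHSTKVLG",
    "Rx-Hs-Q9Y2V3": "RNSSIAALRLKAKEHIQAIGK",
    "Drg11-Rn-NP_665710": "RTASVAALRMKAREHSEAVLQ",
    "Otp-Hs-NP_115485": "RGTSIASLRRKALEHTVSMSF",
}


def invariant_pattern(sequences):
    """Return a string with the shared residue at invariant columns and '.' elsewhere.

    The sequences are assumed to be pre-aligned and of equal length.
    """
    sequences = list(sequences)
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must have equal length")
    return "".join(
        column[0] if len(set(column)) == 1 else "."
        for column in zip(*sequences)
    )


pattern = invariant_pattern(PHT_SEQUENCES.values())
print(pattern)  # prints R..S.A.LR.KA.EH......
print(sum(c != "." for c in pattern), "of", len(pattern), "positions invariant")
```

For this subset, nine of the 21 positions are invariant, and all of them lie in the first 15 residues.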
