‘‘Contrasting Patterns of Selection at Pinus pinaster Ait. Drought Stress Candidate Genes as Revealed by Genetic Differentiation Analyses’’ Emmanuelle Eveno,* Carmen Collada,  M. Angeles Guevara,  Vale´rie Le´ger,* Alvaro Soto,  Luis Dı´az,  Patrick Le´ger,* Santiago C. Gonza´lez-Martı´nez,  M. Teresa Cervera,  Christophe Plomion,* and Pauline H. Garnier-Ge´re´* *Institut National de la Recherche Agronomique, Unite´ Mixte de Recherche 1202 Biodiversity Genes & Communities, Cestas, France; and  Department of Forest Systems and Resources, Forest Research Institute, Centro de Investigacio´n Forestral-Instituto Nacional de Investigacio´ y Tecnologı´a Agraria y Alimentaria, Madrid, Spain The importance of natural selection for shaping adaptive trait differentiation among natural populations of allogamous tree species has long been recognized. Determining the molecular basis of local adaptation remains largely unresolved, and the respective roles of selection and demography in shaping population structure are actively debated. Using a multilocus scan that aims to detect outliers from simulated neutral expectations, we analyzed patterns of nucleotide diversity and genetic differentiation at 11 polymorphic candidate genes for drought stress tolerance in phenotypically contrasted Pinus pinaster Ait. populations across its geographical range. We compared 3 coalescent-based methods: 2 frequentist-like, including 1 approach specifically developed for biallelic single nucleotide polymorphisms (SNPs) here and 1 Bayesian. Five genes showed outlier patterns that were robust across methods at the haplotype level for 2 of them. Two genes presented higher FST values than expected (PR-AGP4 and erd3), suggesting that they could have been affected by the action of diversifying selection among populations. In contrast, 3 genes presented lower FST values than expected (dhn-1, dhn2, and lp3-1), which could represent signatures of homogenizing selection among populations. A smaller proportion of outliers were detected at the SNP level suggesting the potential functional significance of particular combinations of sites in drought-response candidate genes. The Bayesian method appeared robust to low sample sizes, flexible to assumptions regarding migration rates, and powerful for detecting selection at the haplotype level, but the frequentist-like method adapted to SNPs was more efficient for the identification of outlier SNPs showing low differentiation. Population-specific effects estimated in the Bayesian method also revealed populations with lower immigration rates, which could have led to favorable situations for local adaptation. Outlier patterns are discussed in relation to the different genes’ putative involvement in drought tolerance responses, from published results in transcriptomics and association mapping in P. pinaster and other related species. These genes clearly constitute relevant candidates for future association studies in P. pinaster.

Introduction Natural selection is a powerful force that promotes adaptive phenotypic differentiation between natural populations for fitness-related traits (Schemske 1984; Endler 1986; Linhart and Grant 1996; Kittelson and Maron 2001; Latta 2003; Kawecki and Ebert 2004). Variation in climatic conditions, such as rainfall or temperature, is thought to confer strong selective pressure on plant local adaptation to different environments (Joshi et al. 2001; Thomas 2005). This can generate clinal variation at fitness-related traits (Huey et al. 2000; Garcı´a-Gil et al. 2003; Jump et al. 2006). In particular, for outcrossing species, the persistence and repetition of clines at particular genes illustrate the action of spatially varying selection that counterbalances the homogenizing effects of gene flow (Rehfeldt et al. 1999; Storz 2002; Baines et al. 2004). In tree species, and despite interaction with demographic effects, the role of natural selection in promoting local adaptation has been largely demonstrated in common garden experiments (provenance, progeny, and clonal tests) (e.g., Hurme et al. [1997] on bud set and frost resistance in Pinus sylvestris; Garnier-Ge´re´ and Ades [2001] on growth and survival in Eucalyptus delegatensis; Aitken and Hannerz [2001] on cold adaptation in conifers; Gonza´lez-Martı´nez et al. [2002] on survival and growth for Pinus pinaster).

Key words: drought stress, candidate genes, adaptive evolution, Pinus pinaster. E-mail: [email protected]. Mol. Biol. Evol. 25(2):417–437. 2008 doi:10.1093/molbev/msm272 Advance Access publication December 7, 2007 Ó The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

At the molecular level, it is much less clear how selection affects patterns of nucleotide variation as demographic factors, such as drift and variation in population size, can produce similar patterns (Tajima 1989; Fay and Wu 1999; Galtier et al. 2000; Nielsen 2001; Heuertz et al. 2006). Unraveling the molecular basis of adaptation thus still remains a challenging task and, despite being a long-term goal of population genetics and evolutionary biology (Howe and Brunner 2005; Wright and Gaut 2005), is still a largely untouched area (Schlo¨tterer 2002a; Ehrenreich and Purugganan 2006). Fundamental questions include the nature, number, and location of genes or mutations involved in adaptation (Barton and Keightley 2002) and also the distribution and magnitude of their allelic effects on phenotypes (Storz 2005; Wright and Gaut 2005). Traditionally, the approach used to search for genes underlying adaptive trait variation has been to map quantitative trait loci (QTLs), based on significant statistical associations between molecular markers and trait variation, either in known pedigrees (Borevitz and Chory 2004) or more rarely in undomesticated natural populations (Slate 2005). This approach suffers from large confidence intervals (CIs) around the estimated QTL location, which potentially include tens to hundreds of different candidate genes (Flint and Mott 2001; Christians and Keightley 2002; Masle et al. 2005). This is due to the large blocks of linkage disequilibrium (LD) between markers, generated by the small number of recombination events that typically occurs in 2 or 3 generations’ pedigrees. In the past decade, progress in both public genomic resources and population genomics methods has dramatically

418 Eveno et al.

changed the prospects to better understand the genetic basis of adaptation (Ford 2002; Luikart et al. 2003; Wright and Gaut 2005). In tree species, there has been a considerable increase in genes’ sequence data (Kado et al. 2003; Brown et al. 2004; Krutovsky and Neale 2005; Gonza´lez-Martı´nez, Krutovsky, and Neale 2006). Also, the development of a large body of coalescence-based methods allowing analysis and comparison of nucleotide diversity patterns within and among species (or populations) enables insightful studies of adaptation at the molecular level to be performed (Charlesworth et al. 2001; Nielsen 2005; Sabeti et al. 2006). Intense debates are currently being raised regarding which demographical or historical processes and which forms of selection mainly affect population genomic patterns of variation (Fay and Wu 2000; Andofalto 2001; Hellmann et al. 2003; Haddrill et al. 2005). Ideally, population genomics requires to study as many genes or markers along chromosomes as possible, with the aim of distinguishing locus-specific effects from genomewide patterns due to species structure and demographic histories (Cavalli-Sforza 1966; Schlo¨tterer 2002a; Luikart et al. 2003). These locus-specific effects could either show a higher than expected differentiation due to diversifying selection between populations or a lower than expected differentiation due to balanced selection across populations. There has been a recent interest in studies involving a larger sampling of distinct populations for the search of adaptive evolution in plants (Schaal and Olsen 2000; Schlo¨tterer 2002b) as local adaptation may occur on an appropriate time scale for selection detection with population genomics methods (Wright and Gaut 2005). Many tree species would constitute good models for this purpose due to their large-sized natural populations with little confounding substructure and domestication history, outcrossing mating systems leading to high gene flow, sufficient nucleotide diversity, and long history of ecological and quantitative genetics research (Howe and Brunner 2005; reviewed in Savolainen and Pyha¨ja¨rvi 2007). In conifers in particular, genome scans are not yet feasible due to their large genome size (e.g., ;25.5 pg/C for P. pinaster as estimated by Chagne´ et al. [2002], i.e., about 170 times the size of the Arabidopsis genome) and rapid decay of LD (reviewed in Neale and Savolainen 2004), calling for candidate multigene approaches, which have been advocated in recent reviews (Tabor et al. 2002; Vasema¨gi and Primmer 2005). In such approaches, ‘‘neutral’’ expectations accounting for structure and demographic history can be both model and empirically based, valuing previous information available on neutral markers in the targeted species (Beaumont and Nichols 1996; Storz 2005). In this study, we report a multilocus scan of differentiation among phenotypically contrasted populations of P. pinaster using recently identified structural candidate genes for water-deficit response, which were identified from expression studies (Dubos and Plomion 2003; Dubos et al. 2003; Watkinson et al. 2003; Gonza´lez-Martı´nez et al. 2006). Looking across the species full range, physiological and gene expression studies in P. pinaster showed large differences in the genes and mechanisms involved in drought tolerance (Nguyen-Queyrens and Bouchet-Lannat 2003;

Dubos and Plomion 2003; Dubos et al. 2003), whereas a QTL mapping study (Brendel et al. 2002) identified 4 QTLs in a French west coast population for d13C, a time integrated estimate of water use efficiency (Farquhar et al. 1989). Pinus pinaster also presents a fragmented geographic distribution, with a relatively high genetic differentiation among populations at putative neutral markers (mean GST varying from 0.10 to 0.17 for rangewide studies using different markers: Petit et al. 1995, 2005; Bucci et al. 2007), compared with other conifer species (e.g., mean GST of around 5% in P. sylvestris [Waldmann et al. 2005]). Considering phenotypic variation in P. pinaster, higher levels of population divergence have been observed on adaptive traits such as height and survival (QST of around 0.7 from multisite Spanish provenance trials, Gonza´lez-Martı´nez et al. 2002) that could be explained by contrasted ecological conditions and local adaptation (Alı´a et al. 1995, 1997). The discrepancy between molecular and quantitative levels of differentiation suggests the importance of diversifying selection among populations (McKay and Latta 2002; Le Corre and Kremer 2003). Sampling in phenotypically distinct populations across the species full range, we compared different coalescentbased approaches for testing whether nucleotide differentiation patterns at both single nucleotide polymorphisms (SNPs) within genes and whole-candidate genes could depart from neutral patterns and result from natural selection. One approach was based on an infinite island model and produced simulated null FST distributions conditional on heterozygosity, using frequentist-like tests to infer outlier patterns (Beaumont and Nichols 1996). Null distributions were shown by the authors to be robust to a large range of population structure models and interlocus variation for mutation rates, but we found that this method was less robust in the case of biallelic loci, especially for limited sample sizes, and thus proposed instead a specific method for SNPs using similar coalescent simulations. These 2 methods, however, assume an island model of migration, thus another approach based on a hierarchical ‘‘Bayesian’’ model was applied (Beaumont and Balding 2004), in order to accommodate the common situation of heterogeneous migration rates among natural populations. We demonstrated significant outlier patterns at 5 genes potentially involved in different mechanisms of drought stress response, which can be interpreted in terms of divergent or homogenizing selection, variation in migration rates, and population isolation.

Materials and Methods Populations’ Sampling and Climatic Data Seeds were collected from 20 to 30 trees in 24 populations from France, Spain, Morocco, and Tunisia, which are representative of the natural fragmented distribution of P. pinaster (fig. 1). Among them, 10 populations were chosen for DNA sequencing due to their contrasting climates, as described by a set of 6 variables based on observations for 23 years for temperature and 37 years for rainfall (except for northern African populations, table 1). Each of

Drought Stress Genes Differentiation in Pinus pinaster 419

FIG. 1.—Distribution of Pinus pinaster (gray area, after Burban and Petit [2003]). Ten native populations chosen for sequencing on the basis of contrasted climatic conditions are indicated by an arrow.

these variables was chosen so that it explained less than 30% of variation of any other, based on Pearson correlation coefficient (table 1). The annual total mean rainfall (ANMR) ranges from 348 mm for Oria (Spain) to 1343 mm for Pineta (Corsica). A total rainfall of the driest month

(RDRYM) of 2 mm for Oria and 3 mm for Pinia (Corsica) illustrates more severe summer drought periods for these populations (compared with an average of 11.5 mm for the others). The variation in ANMT between populations is consistent with climates along their latitudinal range: from

Table 1 Geographical Coordinates and Climatic Data for 10 Contrasted Pinus pinaster Populations Sampling Location Pleucadec Mimizan San Cipriano Pinia Pineta Coca Arenas Oria Tabarka Tamrabta Mean a

Longitudea

Latitude (N)b

Elevation (m)

ANMR (mm)c

RDRYM (mm)d

ANMT (°)e

ANMT Variance (°)f

MTCM (°)g

MTWM (°)h

2°20# (W) 1°18# (W) 8°42# (W) 9°29# (E) 9°02# (E) 4°05# (W) 5°08# (W) 2°37# (W) 8°54# (E) 5°0# (W)

47°47# 44°08# 42°08# 42°01# 41°58# 41°22# 40°02# 37°52# 36°65# 34°0#

80 37 310 10 750 788 1,359 1,232 66 1,600

716 995 661 705 1,343 428 957 348 999 763 792

11 16 18 3 12 7 5 2 n.a.i 7 9

11.6 13.8 15.1 n.a.i 12.2 12.2 13.8 12.5 17.9 17.2 14.0

0.32 0.45 0.26 n.a.i 2.18 0.26 0.65 0.23 n.a.i n.a.i 0.62

4.0 6.3 6.9 n.a.i 4.1 3.2 4.8 3.9 n.a.i n.a.i 4.7

19.0 21.2 24.3 n.a.i 21.4 22.5 24.4 22.9 n.a.i n.a.i 22.2

Longitude East (E) or West (W). Latitude North (N). c ANMR averaged across the years 1964 to 2001 for French, Corsican (Me´te´oFrance, http://www.meteofrance.com/) and Spanish (Cancio AF, personal communication) populations with on average 10% of missing years. ANMR estimated from 1933 to 1963 for Tamrabta population (Destremau et al. 1976) and from 1901 to 1990 for Tabarka population (Socie´te´ Centrale pour l’Equipement du territoire Tunisie 2005). d RDRYM averaged across the years. e Annual mean temperature averaged across the years 1978 to 2001 for French, Corsican, and Spanish populations (same source data than ANMR) with on average less than 1% of missing years. ANMT estimated on 18 years in a close station, Meknes, for Tamrabta population (http://www.weatherbase.com/) and from 1901 to 1990 for Tabarka population (same source data than ANMR). f Variance of ANMT across the years. g Mean temperature of the coldest month averaged across the years. h Mean temperature of the warmest month averaged across the years. i Data not available. b

420 Eveno et al.

averages of 11.6° for Pleucadec (North–West of France) to 17.2° and 17.9° for Tamrabta (Morroco) and Tabarka (Tunisia), respectively. The Pineta population in Corsica shows a larger variance of ANMT across the years compared with other populations (between 3 and 8 times higher). Finally, the mean temperatures of the coldest and warmest months (MTCM and MTWM) are, respectively, lower for Coca and Oria and higher for Arenas de San Pedro (‘‘Arenas’’ hereafter) and San Cipriano than for other populations (table 1). Candidate Gene Selection, Sequencing, and Polymorphisms Detection A set of 13 candidate genes (table 2) were chosen considering previous gene expression studies using various drought stress treatments in maritime pine and other conifer species (Chang et al. 1996; Padmanabhan et al. 1997; Costa et al. 1998; Dubos and Plomion 2003; Chaumeil 2006), nucleotide diversity analyses in related species (Gonza´lezMartı´nez et al. 2006), and also for their putative role during drought stress responses: 1) cell wall reinforcement for 2 lignification genes, CCoAOMT and COMT, 2) maintenance of osmotic balance under dehydration conditions related to sugar metabolism for Glucan and Ino3, 3) protection of cytoplasmic structures for dhn-1 and dhn-2, 4) signal transduction processes induced by stress conditions for pp2c, and 5) degradation of proteins damaged by drought effects for rd21A. Publicly available sequences were blasted in gene databases such as National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST/) and The Institute for Genomic Research (TIGR) (http://compbio.dfci. harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb5pine) to search for homologsinP. pinasterororthologs inclose species(Pinus taeda, Pinus radiata, and P. sylvestris) (table 2). After alignment with Bioedit v. 7.0.5 (http://www.mbio.ncsu.edu/BioEdit/bioedit.html) to get an overview of gene structure and prediction ofin silico polymorphisms, primers were designed, using bothPrimer3 (http://frodo.wi.mit.edu/cgi-bin/primer3/ primer3_www.cgi) and OligoCalc (http://www.basic. northwestern.edu/biotools/OligoCalc.html) for polymerase chain reaction (PCR) amplification of candidate genes fragments with an expected length of 500 to 900 base pairs (bp) (supplementary table S1, Supplementary Material online). These fragments preferentially included the 3# untranslated regions (UTRs) to avoid amplification of different multigene family members or covered (for short length genes such as dehydrins) the full length of the gene. DNA was extracted from haploid tissue of megagametophyte seeds as described by Plomion et al. (1995) for a discovery panel of 5 to 31 individuals in each of the 10 contrasted populations. Purified PCR products were sequenced using either the DYEnamic ET Terminator kit (Amersham Biosciences Inc., Uppsala, Sweden) or the BigDye Terminator v 3.1 Cycle Sequencing kit (Applied Biosystems, Foster City, CA) and separated using capillary electrophoresis (MegaBACE 1000, Amersham Biosciences Inc., or ABI 3130 XL, ABI 3700, and ABI 3730, Applied Biosystems). Base calling was performed from the raw chromatograms using Sequence Analyser (Amersham Biosciences Inc.) or Seqscape v. 2.5 (Applied Biosystems).

Sequence alignments and quality attribution were processed with either CodonCode Aligner v. 1.5.1 (Codon Code Corporation, Dedham, MA) or Chromas Pro ver. 2.31 (Technelysium Pty Ltd, Tewantin, Queensland, Australia). All detected polymorphisms (SNPs and insertion–deletions [indels]) were visually checked in CodonCode and further validated using Phred scores (quality threshold) above 20. Nuclear Microsatellites (nuSSRs) Genotyping and Diversity Estimates Putatively neutral nuSSRs were used to account for genetic differentiation caused by demographic and other processes not related with selection (e.g., genetic drift resulting from geographic isolation or population expansion); thus, no detailed analysis of nuSSRs diversity is presented here. Rather, their use was for identifying homogeneous gene pools and computation of genetic differentiation estimates that could represent a neutral reference. Eight polymorphic nuSSRs were selected from those previously developed by Chagne´ et al. (2004): NZPR413, NZPR1078, ctg64; Mariette et al. (2001): ctg275, FRPP91, FRPP94, ITPH4516, and Guevara et al. (2005): A6F03. Criteria of choice included their localization to different linkage groups, their allelic richness (at least 4 alleles), and their potential for multiplex amplification. Genotyping was performed on genomic DNA isolated from needles from the complete set of 24 sampled populations (using 20–30 individuals per population and primer pairs as described in supplementary table S2 [Supplementary Material online]). Genetic diversity and inbreeding coefficient (FIS) were estimated as detailed in supplementary tables and figures (Supplementary Material online) : mean diversity estimates were consistent with previously reported values in either pedigrees or different samples of natural provenances (Mariette et al. 2001; Derory et al. 2002; Chagne´ et al. 2004; Guevara et al. 2005) (supplementary table S2, Supplementary Material online). FIS values were close to zero (mean 50.07 across the 24 populations) except for FRPP91 and ITPH4516, which presented significant heterozygote deficits within several populations (supplementary table S2, Supplementary Material online). Populations were clustered using 2 different methods in order to define homogeneous gene pools: one was based on the pairwise FST matrix among populations (table 3) and the other was based on the Bayesian algorithm proposed in Structure v. 2.0 (Pritchard et al. 2000). More details on the methods are given in supplementary tables and figures (Supplementary Material online). Both clustering methods gave consistent outputs and allowed the identification of 6 groups (table 4 and supplementary figs. S1 and S2 [Supplementary Material online]): Corsica (group C), continental France (group F), northwestern and central Spain (group S) and 3 clusters with single populations: Oria (southern Spain, group O), Tamrabta (Morocco, group M), and Tabarka (Tunisia, group T). Candidate Genes and SNPs Statistical Analyses Molecular Diversity and Differentiation For each candidate gene, nucleotide diversity was estimated with both hp (Nei 1987), based on the average

Table 2 Description of the 13 Candidate Genes

Gene Putative Arabinogalactan/ proline-rich protein Put. Arabinogalactan/ glycin-rich protein

Caffeoyl-CoA-Omethyltransferase Putative Caffeic-Omethyltransferase Putative Glucan-endo-1,3beta-glucosidase precursor Putative Inositol-3-phosphate synthase

Gene Code

Accession / TIGR or GenBank Homologsa

AM502293/BX255640

Downregulation by osmotic stress in Pinus pinaster rootsc Downregulation by osmotic stress in P.pinaster rootsc

CCoAOMT

AM502291/TC73846

Upregulation by water stress in P.pinaster needlesd

COMT

AM774402/TC66403

PR-AGP4 GRP3

AM501931/TC67837

Glucan

AM502290/TC66277

Ino3

AM774401/TC64990

dhn-1

AM502292/TC57901

Dehydrin 2 Early responsive to dehydration 3

dhn-2

EU020010/TC58333

erd3

EU020011/TC65051

Protein phosphatase 2C Cystein protease (pseudotzain)

pp2c

EU020014/TC62065

rd21A

EU020015/TC72918

lp3-1

EU020012/TC73436

lp3-3

EU020013/TC73474

a

Upregulation by osmotic stress in P.pinaster rootsc

Upregulation by wounding, cold and drought stress in Picea glaucah Upregulation by water stress in Pinus taeda rootsi

Upregulation by water stress in P. taeda rootsno Upregulation by water stress in P. taeda rootsn

Length Screened (bp)

Functional Category

Putative Role in Drought Response

b

N

Total

Coding Region

Noncoding Region 5# UTR

Cell wall formation

Unknown

208

1,630

360

Cell wall formation

Unknown Cell wall thickening and maintenance of intracellular osmotic pressionef Cell wall thickening and maintenance of intracellular osmotic pressionef Osmotic adjustment related to sugars metabolismg Osmotic adjustment related to sugars metabolismg

172

450

341

204

1,519

465

708

346

20

1,400

1019

35

346

158

1,357

161

1,004

24

859

479

380

132

828

668

56

513

402

111

Unknown Signal transduction pathwayjkl Damage repair (protein degradation)m

58

796

554

201

53

591

414

177

54

942

561

Unknown

57

412

146

Unknown

54 1,250

407 11,704

240 5,810

Lignin biosynthesis

Lignin biosynthesis Carbohydrate metabolism (starch and sucrose metabolism) Raffinose family oligosaccharides biosynthesis Drought stress (belong to LEA) gene family) Drought stress (belong to LEA gene family) Putative methyltransferase with SAM-binding domain Phosphatase similar to abi1 in Arabidopsis thaliana Cystein protease similar to rd21A in A. thaliana Drought stress (belong to the ASR [abscisic-, stress-, and ripeninginduced] gene family) Drought stress(belong to the ASR gene family)

Cytoplasmic structures protectiong Cytoplasmic structures protectiong

88

Intron 708

3# UTR 474 109

44

186

192

116

41

195

266

1,061

167 3,751

1,082

Accession number of submitted sequences/Contig or accession number of homologs in TIGR (http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb5pine) or GenBank (http://www.psc.edu/general/software/packages/ genbank/genbank.html) databases. b Number of haploid sequences. c Dubos et al. (2003). d Costa et al. (1998). e Cruz et al. (1992). f Joyce et al. (1983). g Ramanjalu and Bartels (2002). h Richard et al. (2000). i As shown by expression profiles obtained using the Magic Gene Discovery tool (Laboratory for Genomics and Bioinformatics, University of Georgia; http://www.fungen.org). j Bray (1997). k Shinozaki and Yamaguchi-Shinozaki (2000). l Rodriguez (1998). m Ingram and Bartels (1996). n Padmanabhan et al. (1997). o Chang et al. (1996).

Drought Stress Genes Differentiation in Pinus pinaster 421

Dehydrin 1

ABA- and WDSinduced-gene-1 ABA- and WDSinduced-gene-3 Total

Evidence of Differential Expression under Drought Treatments in Conifer Species

422 Eveno et al.

Table 3 Matrix of Pairwise FST Averaged across 8 Nuclear SSRs between the 10 Populations Population

Pineta

Pinia

Mimizan

Pleucadec

Arenas

Coca

San Cipriano

Oria

Tamrabta

Tabarka

Pineta Pinia Mimizan Pleucadec Arenas Coca San Cipriano Oria Tamrabta Tabarka

* 0.038 0.308 0.275 0.202 0.201 0.250 0.173 0.319 0.333

* 0.253 0.220 0.148 0.145 0.178 0.119 0.265 0.303

* 0.015 0.072 0.060 0.112 0.113 0.296 0.316

* 0.046 0.039 0.077 0.099 0.271 0.293

* 0.001 0.045 0.060 0.233 0.200

* 0.045 0.049 0.224 0.222

* 0.072 0.247 0.247

* 0.156 0.183

* 0.289

*

number of pairwise differences between sequences, and hS (Watterson 1975), based on the number of segregating sites, on the total number of polymorphic sites, nonsynonymous and silent sites (including synonymous sites in coding regions and sites in noncoding regions). Both SNPs and indels were included in diversity estimates, with indels in coding regions classified as nonsynonymous sites. Haplotype diversity was estimated using Hd statistics of Nei (1987). Diversity estimates were computed in Arlequin v. 3.01 (Excoffier et al. 2005), which allows for missing data, and DNAsp v. 4.10 (Rozas et al. 2003), which does not. Additionally, the number of small linkage disequilibrium blocks (SLDB), defined as contiguous sets of nucleotides in complete LD that included at least 2 polymorphisms, was described for each gene, with their mean length and mean number of SNPs. Possible departures of hP values from those expected under neutrality were tested using coalescent simulations under both no and free recombination in Proseq v. 2.91 (Filatov 2002), which allows for subdivided populations (10 demes were used). Molecular genetic differentiation was then assessed among the 10 populations for each candidate gene at both haplotype and SNP levels, with an analysis of molecular variance (AMOVA) on the distance (number of differences) matrix between ‘‘haplotypes’’ or sequence pairs, using Arlequin. In the AMOVA analysis, the among-population component of total variation represents the nucleotide differentiation estimate NST (Nei and Kumar 2000) that accounts for genetic relatedness between haplotypes in a general framework allowing for missing data. Classical Weir and Cockerham (1984) FST parameters were also estimated along the genes for each polymorphic site having a minor allele frequency (MAF) above 5% across populations, and also for each gene by considering each haplotype as an independent allele (after excluding missing data).

Around 40,000 independent loci were generated for 100 demes (as an approximation to the infinite island model), using the mean FST values observed in nuSSRs as the neutral expectation. The correspondence between simulated and observed FST means was checked with a precision of þ/ 0.01%. We considered a 2-sided test at an a% significance level (a up to 10% not to be too conservative) for detecting either signatures of diversifying selection by identifying outliers showing FST values greater than the (1½a)% quantile of the simulated null distribution, or for detecting balancing, or homogenizing selection from outliers with FST values lower than the ½a% quantile of the distribution. The ½a% and (1½a)% quantiles were computed following Beaumont and Nichols (1996). Infinite allele and stepwise mutation models (IAMs and SMMs, respectively) were considered to be well suited to candidate genes (assuming independent haplotypes) and nuSSRs (which were also tested for neutrality using this approach), respectively. Both models produced very similar distributions for our experimental sample sizes (10–80 gametes) but were not suited to SNPs, mostly biallelic, for which we designed a method based on coalescent simulations that were performed with Simcoal2 ver. 2.1.2 (Laval and Excoffier 2004). In this method, biallelic SNPs and a similar island migration model to that used in Fdist2 were simulated. We will refer to this as the ‘‘FstSNP’’ method. We calculated the ½a% and (1½a)% quantiles for 11 discrete classes of heterozygosity, each covering a heterozygosity range of 0.05, using the R package (http://www.r-project. org/). Because different numbers of haploid sequences were available for different genes, simulations were applied to: 1)

Neutrality Tests Based on Genetic Differentiation

Gene Pools

To search for possible signatures of natural selection, we applied a first approach (implemented in ‘‘Fdist2,’’ Beaumont and Nichols 1996), that derives the expected neutral distribution of FST values for many loci conditional on their heterozygosity (and thus accounting for variation in allele frequency), using coalescent simulations under a symmetrical island migration model and assuming migrationdrift equilibrium. We will refer to it as the Fdist2 method.

Table 4 Matrix of Pairwise FST Averaged across 8 Nuclear SSRs between the 6 Homogeneous Gene Pools

C F S O M T

C

F

S

O

M

T

* 0.258 0.166 0.154 0.295 0.316

* 0.052 0.105 0.275 0.293

* 0.052 0.216 0.200

* 0.156 0.183

* 0.289

*

NOTE.—C: Corsica, F: continental France, S: northwestern and central Spain, O: Oria, M: Tamrabta, T: Tabarka.

Drought Stress Genes Differentiation in Pinus pinaster 423

a first group of genes (PR-AGP4, CCoAOMT, GRP3, Glucan, and dhn-1) for which the sample size ranged from 9 to 14 gametes per population (whether using Fdist2 without missing data in sequences or FstSNP allowing for missing data) and 2) a second group (erd3, dhn-2, lp3-1, lp3-3, pp2c, and rd21A) for which the sample size was around 6 gametes per population. Loci (genes and SNPs) with FST values falling outside the ½a% or the (1½a)% quantiles of the envelope boundaries were considered as significant outliers. SNPs in complete LD were excluded from the analyses to limit redundant information, as well as SNPs with a MAF below 5%. To account for multiple testing in the 2 frequentist methods (Fdist2 and FstSNP), adjusted P values were first computed using the Bonferroni procedure where the family-wise error rate (FWER) is divided by the number of tests performed (Sokal and Rohlf 1995). This procedure only controls the chance of making even a single type I error and is recognized to be overly conservative as the overall interpretation might be greatly affected by a large number of type II errors (Verhoeven et al. 2005). A recent method based on the false discovery rate (FDR) allows instead to control the FDR (expected proportion of false positives among all significant tests) under a certain level, giving individual q values measures of significance (Storey and Tibshirani 2003). This method was shown to be more powerful than methods controlling the FWER and has thus been applied using the Qvalue R library (http://faculty. washington.edu/;jstorey/qvalue/). To account for likely departures from a symmetrical migration model in our sampling of populations (see, e.g., Gonza´lez-Martı´nez et al. 2007 for Oria population), a third approach was also used. This approach was based on a hierarchical Bayesian model and implemented via Markov Chain Monte Carlo (MCMC) simulations (Beaumont and Balding 2004). The underlying method for estimating differentiation allows for heterogeneous migration rates among populations, which could have resulted from reduced immigration in some isolated populations, and is thus robust to a number of demographic models in structured populations (Beaumont 2005). After having expressed the likelihood for the allele counts as a function of FSTij, defined for each locus i and population j, the method consists in regressing log[FSTij/(1FSTij)] onto a locus effect (ai) and a population effect (bj) (interaction effects [kij] were first tested and found not significant in the full data set), which posterior distributions are obtained from approximately 2,000 uncorrelated outputs of the total MCMC outputs (including at least 14,000 iterations [up to 22,000] and at least 27,0000 iterations [up to 450,000], respectively, for the burn-in and post-burn-in periods). The mean and standard deviation of prior distributions were set by default at 0 and 1 for ai, and 2 (in order to have FSTij as a growing function of bj) and 1 for bj. For each locus, ai was said to be significant at a P value when its 1-tailed 100[1(P value)]% posterior interval excluded zero (Beaumont and Balding 2004). Significant and positive values of ai can then be interpreted as diversifying selection among populations, whereas negative values would be indicative of spatially homogenizing selection (e.g., balancing selection in each population). Similarly, a population effect was considered to be significant if 100[1(P value)]%

of its posterior distribution was shifted from the prior mean. A positive (respectively negative) outlier could be due to a population showing less gene flow (respectively more gene flow) comparing with others or to one with a smaller effective population size Ne (respectively larger Ne). Compared with the previous approach, loci corresponding to different sample sizes could be analyzed simultaneously because the method accounts for sample sizes variation at each locus. Different values of standard deviations were tested for the ai prior, but this did not affect the results much, suggesting that the data were sufficiently informative to produce ai robust posterior distributions. All Bayesian analyses were performed using the program ‘‘BayesFst’’ (Balding DJ, personal communication); thus, the method is referred to as BayesFst. A specific parameter proposed in the program to account for prior information on an average correlation level between loci was also used, but results obtained with setting it to 0 or to 0.2 (for a weak overall correlations among SNPs within and between genes) were the same with our data. Additionally, the Bayesian approach presents the advantage of dealing with the multiple testing problem by accounting of how plausible the null hypothesis is through the choice of prior distributions. It will result precisely for each particular locus or SNP, in a P value that is defined as the probability that the null hypothesis is true, using probability as a direct measure of uncertainty (Shoemaker et al. 1999; O’Hagan and Luce 2003). Finally, we also performed the same analysis after having grouped ‘‘close’’ haplotypes belonging to the same lineages (i.e., distant from one another by a small number of mutations and no recombination) for each candidate gene. The objective here was first to increase the precision on each new haplotype frequency estimate (by reducing the haplotype number) and second to get a Bayesian estimate of differentiation for each gene that would be closer to a NST estimate due to the pooling of less distant haplotypes and thus that would better fit the original view of FST as an inbreeding coefficient along with what is proposed in the Bayesian method. An arbitrary threshold of around 10% differences (number of mutation differences between pairs of haplotypes relative to the total number of mutations) was used in a first step to pool haplotypes. This rule corresponded to a maximum of 6 mutations’ differences (for PR-AGP4) and a minimum of one mutation’s difference for GRP3, lp3-3, and dhn-2. Classical Weir and Cockerham’s FST estimates were also computed at each gene with the newly defined haplotypes.

Results Nucleotide sequence data were obtained for 13 candidate genes spanning over 11 kb, with about half from coding regions (table 2). Among them, one gene fragment (Ino3 . 850 bp sequenced) presented only one SNP in 24 gametes and another (COMT, 1,400 bp sequenced for 20 gametes) was monomorphic. In the 11 remaining genes, 302 SNPs and 36 indels (from which 71 and 7 were singletons, respectively) were detected, representing on average one polymorphic site per 28 bp. Approximately 20% of the SNPs identified were nonsynonymous (table 5).

424 Eveno et al.

Table 5 Number of Identified Segregating Sites, Nucleotide and Haplotype Diversities, and SLDB at 11 Polymorphic Candidate Genes across the 10 Populations Across All Sites Gene code PR-AGP4 CCoAOMT GRP3 Glucan dhn-1 erd3 dhn-2 lp3-1 lp3-3 pp2c rd21A Mean Total

Nb

Lc

197 179 168 158 121 58 56 72 54 53 54

1,549 1,515 576 1,357 818 795 440 407 406 590 914

1,155

9,367

SNPs (singl.) 90 41 13 36 21 12 18 10 9 9 43

Indels (singl.)

(4) (8) (3) (4) (4) (7) (10) (2) (1) (4) (24)

5 9 (1) 0 9 (1) 2 (1) 1 (1) 3 (1) 2 0 3 (2) 2

302 (71)

36 (7)

Silent Sitesa

Nonsynonymous Sites hpd

6.55 2.70 2.69 6.13*(þ) 7.02*(þ) 1.51 8.36 9.20*(þ) 4.41 2.28 9.45 5.48

hsd (þ)

10.03* 4.70 3.96 4.31 4.78 3.57 10.68*(þ) 6.08 4.93 4.54 11.11 6.24

Le

SNPs (singl.)

Indels (singl.)

397 336 251 73 399 426 265 116 187 315 432

13 7 (4) 4 0 11 (2) 4 (2) 5 (3) 2 (2) 2 3 (1) 11 (9)

1 0 0 1 (1) 0 1 (1) 0 0 0 0

62 (23)

3 (2)

3,197

hpd

Le

SNPs (singl.)

Indels (singl.)

4.13 1.33 3.30 n.a.i 5.77 1.63 4.45 n.a.i 1.50 1.12 2.02 2.81

986 637 185 1,054 220 369 175 291 219 275 482

70 (12) 21 (2) 4 28 (4) 10 (2) 8 (5) 13 (7) 8 7 (1) 7 (3) 35 (15)

4 2 0 9 (1) 1 1 (1) 2 2 0 3 (2) 2

4,893

211 (50)

26 (4)

hpd 10.17 7.12 5.35 9.60 6.99 1.37 14.20 12.53 6.89 3.60 15.92 8.52

Nh (singl.)f 24 20 18 12 10 12 17 24 8 11 15

(12) (10) (7) (8) (6) (7) (11) (15) (2) (5) (9)

Hdg

SLDBh

0.702 0.755 0.848 0.576 0.727 0.498 0.847 0.874 0.767 0.770 0.850 0.747

15/3/46 3/2/27 0/0/0 8/3/78 2/2/38 2/2/59 2/2/30 0/0/0 1/2/61 2/2/53 7/3/37

169 (91)

NOTE.—singl; singleton. a Silent sites include synonymous polymorphisms in coding regions and polymorphisms in noncoding regions. b Total number of haploid sequences analyzed. c Total analyzed length in base pair. d Nucleotide diversity hp (Nei 1987) and hs (Watterson 1975) per site (103). Departures from neutrality were tested using coalescent simulations in a population subdivided with 10 demes and no recombination, using Proseq v2.91. ***P , 0.001, **P , 0.01, *P , 0.05. () and (þ): observed nucleotide diversity, respectively, lower and higher than expected under neutrality. e Estimated length in base pair. f Number of haplotypes. g Haplotype diversity (Nei 1987). h SLDB defined as contiguous sets of nucleotides including at least 2 polymorphisms in complete LD: number of blocks/mean number of SNPs per block/mean length in base pair per block. i Nucleotide diversity not estimated because of a length of coding sequence lower than 150 bp.

Drought Stress Genes Differentiation in Pinus pinaster 425

Patterns of Nucleotide Polymorphism The average nucleotide diversity, hp, across polymorphic fragments was 5.48 (103) and varied from 1.51 (103) for erd3 to 9.45 (103) for rd21A (table 5). This mean value dropped to 4.63 (103) when including the 2 monomorphic genes. Based on 95% CIs built using the variance formula provided by Tajima (1983) and assuming normal distributions, diversity estimates at all genes largely overlapped (data not shown), thus even hp values from genes at range extremes could not be considered as significantly different. Nucleotide diversity at silent sites was higher (between 2- and 8-fold) than diversity at nonsynonymous sites for 7 genes out of the 9 where the comparison was possible. This general trend is consistent with what is expected in functional regions because amino acid changes are likely to be constrained by background selection of variable intensity depending on the region (Li and Graur 1991). Remarkably, 2 genes (erd3 and dhn-1) showed similar values for both types of diversity, which was due to 2 nonsynonymous SNPs with relatively high MAF (;17–20%) in erd3 and a high proportion (11 out of 21) of nonsynonymous polymorphic sites in dhn-1. The average of hS was slightly higher (6.24 103) than hp: values ranged from 3.57 (103) for erd3 to 11.11 (103) for rd21A, with corresponding 95% CIs also overlapping. Significant spatial genetic structure associated with the current fragmented distribution of maritime pine has been reported for various marker systems (Petit et al. 1995; Mariette et al. 2002; Burban and Petit 2003; Bucci et al. 2007). As a consequence, classical neutrality tests based on allelic frequency spectrum were not applied to the overall sample of data because similar patterns could arise from both various demographic and natural selection effects and could lead to incorrect data interpretations (Hein et al. 2004). Instead, given the large variation observed for hp and hs values among genes, possible nucleotide diversity departures from values expected under neutrality were tested with coalescent simulations assuming an island model in subdivided populations and no recombination. Five genes out of 10 presented values that were significantly higher than expected: Glucan, dhn-1, and lp3-1 for hp and PR-AGP4 and dhn-2 for hS. Haplotype diversity estimates (Hd) among genes ranged from 0.498 for erd3 to 0.874 for lp3-1 with a mean value of 0.75 (table 5). The high values of both Hd and hp for lp3-1 can be explained by the presence of a nuSSR in the sequenced part of the 3# UTR, a case that is usually associated with a higher mutation rate (Brohede and Ellegren 1999). Excluding the microsatellite region, hp dropped to 6.19 (103) for lp3-1 and was no longer an outlier. Overall, the number of contiguous sets of SNPs in complete LD was very low and their size very small (table 5). This is consistent with previous conifer studies (Gonza´lez-Martı´nez et al. 2006; Heuertz et al. 2006) and illustrates the variation and generally very rapid LD decay along single genes, (see also supplementary fig. S3A and B [Supplementary Material online] for LD estimates among polymorphic SNPs across genes in all populations). Comparing genes, however, lower values of Hd, and thus stronger haplotype structure, were observed for 3 of them (PR-AGP4, Glucan, and erd3), consistently with

their higher mean length of SLDB, ranging from 46 bp for PR-AGP4 to 78 bp for glucan (table 5). The opposite trend was observed in 2 genes (GRP3 and lp3-1). These patterns could be due in part to the effects of genetic differentiation among populations, but all genes were not affected in a similar manner; thus, other forces might be acting. Genetic Differentiation at Control Loci and Candidate Genes Throughout the text, FST refers to the classical estimate of Weir and Cockerham (1984), considering either the haplotype (or allelic) level for each locus (gene or nuSSR) or biallelic SNP loci, whereas NST refers to the estimate of Nei and Kumar (2000) only at the haplotype level (see Materials and Methods). For the control loci (8 nuSSRs), the mean FST value observed for the 10 populations was 0.15, with NZPR413 and NZPR1078 showing the largest and lowest estimates, respectively (supplementary table S2, Supplementary Material online). Compared with estimates observed across the full set of 24 populations, the FST range and average were slightly inflated across the 10 populations selected for candidate gene sequencing (supplementary table S2, Supplementary Material online). However, FST values among the 6 homogeneous gene pools were very similar, whether including all 24 populations or only the 10 selected populations (table 4 and supplementary table S3B [Supplementary Material online]), confirming that the genetic structure in this contrasted subset was representative of the whole species range. For the candidate genes, the mean FST value for the 10 populations was similar to that for nuSSRs (;0.14, table 6), although a larger range was observed (from 0.002 for dhn-2 to 0.33 for lp3-3, table 6), in particular toward the lower values. A large variation was also found for FST values at polymorphic SNPs (with MAF .5%) along each gene, the highest range being observed for PR-AGP4 (from 0.01 to 0.64) and erd3 (from 0.07 to 0.59). Comparing different estimates of genetic differentiation at nuSSRs, the mean FST value was comparable with the mean RST value (defined in supplementary Materials and methods, Supplementary Material online) estimated among the 6 gene pools but lower among the 10 populations (mean RST of 0.22 compared with mean FST of 0.15, supplementary table S2 [Supplementary Material online]). At candidate genes, no strong difference was observed between FST and NST mean values across the 10 populations. However, the FST value was half the NST value for GRP3 (0.16 compared with 0.27 in table 6), and this difference was significant at P , 0.006 using an allele permutation test implemented in the SPAGeDi software v. 1.2 (Hardy and Vekemans 2002). This means that a phylogeographical pattern exists for this gene due to a correspondence between the phylogeny of haplotypes and their geographical distribution (see also supplementary fig. S4, Supplementary Material online). Outlier Detection All analyses were performed first considering either the 6 homogeneous gene pools defined from nuSSRs data or the 10 populations separately. The results obtained were

426 Eveno et al.

Table 6 Molecular Differentiation Estimates for 11 Candidate Genes across the 10 Populations (excluding missing data) Candidate genes PR-AGP4 CCoAOMT GRP3 Glucan dhn-1 erd3 dhn-2 lp3-1 lp3-3 pp2c rd21A Total Mean

Na 11.1 8.6 9.5 8.6 10.3 7 5.6 7.2 5.4 5.3 5.4

STb 85 31 8 37 20 13 21 12 9 12 45 293 26.6

NST g

0.210*** 0.178*** 0.273*** 0.078 0.051 0.256*** 0.018 0.088* 0.215*** 0.065 0.146* 0.134

NHc

SNRd

FST

Range of Variatione

NHGf

FSTHG

25 20 17 10 11 12 17 24 8 10 15 167 15.2

19 13 7 5 7 4 6 9 6 4 12 92 8.4

0.249*** 0.140*** 0.158*** 0.080 0.027 0.306*** 0.002 0.058* 0.330*** 0.055 0.154***

0.003–0.636 0.00–0.312 0.106–0.346 0.036–0.096 0.000–0.088 0.068–0.587 0.047–0.089 0.010–0.177 0.038–0.265 0.066–0.141 0.010–0.299

9 7 10 4 7 10 7 17 6 7 5 89 8.1

0.318*** 0.168** 0.187*** 0.077 0.029 0.286*** 0.037 0.062* 0.225** 0.023 0.127*

0.137

0.128

a

Mean number of analyzed haploid sequences per population, excluding missing data. Total number of polymorphisms excluding missing data. c Number of haplotypes used for the estimation of FST. d Number of polymorphisms excluding rare ones (MAF ,5%) and those in complete LD. e Minimum–maximum values of FST at each SNP, excluding rare polymorphisms and those in complete LD. f Number of haplogroups used for the estimation of FSTHG; g Permutation tests: ***P , 0.001, **P , 0.01, *P , 0.05. b

very similar, in terms of proportions and nature of outliers, whether at the haplotype or SNPs levels. Despite a drop in sample sizes of 25–50% from the 6 gene pools to the 10 population sampling, we did not observe any apparent loss in power for outlier detection across the 3 methods, conversely to what we had assumed initially. In some cases, outliers were even masked by the clustering of populations. Therefore, only the 10 populations analysis is presented here, and results using the 6-groups clustering are given as supplementary tables S4 and S5 (Supplementary Material online).

Following the Fdist2 approach, one nuSSR, NZPR413, was a clear outlier above the 95% quantile upper bound (supplementary fig. S5B [Supplementary Material online], sample size of 48 gametes). The neutral envelopes were thus simulated again after excluding this marker (using the new FST mean value of 0.138). All remaining FST values for nuSSRs fell within the 5% and 95% envelope bounds (fig. 2) and showed a smaller range of mean withinpopulation diversity values (HS) than that of candidate genes, especially because of the high HS values for lp3-1 and dhn-2 (fig. 2).

FIG. 2.—Distribution of observed FST values for each gene, for SNPs within genes, and for nuSSRs, as a function of their mean heterozygosities (HS), across the 10 populations that constitute the sequencing panel (FST for simulations of 0.138). The 5% and 95% quantiles of the simulated neutral envelopes are represented by continuous and dotted lines (corresponding to a 2-tailed global test at 10%), respectively for the Fdist2 and FstSNP methods. Black lines represent the boundaries of envelopes for the first set of genes (with an average of 14 gametes for SNPs and 9 gametes for genes excluding missing data), and gray lines those for the second set of genes (with 6 gametes for both SNPs and genes on average). Outlier genes detected with Fdist2 are in bold, and outlier SNPs detected with FstSNP are indicated by arrows.

Drought Stress Genes Differentiation in Pinus pinaster 427

Table 7 Outliers Detected at the Haplotype Level across 10 Contrasted Populations, Using the Fdist2, FstSNP, or BayesFst Methods Fdist2

BayesFst HGa

Gene PR-AGP4 erd3 GRP3 dhn-2 lp3-1 dhn-1

FST 0.249 0.306 0.158 0.002 0.058 0.027

P Valuec

q Valued

0.01

0.09

0.06

0.27

Hb

P Valuec

aie

SDf

0.04 0.10 0.09 0.03 0.11 0.02

0.67 1.29 0.63 1.14 0.58 1.12

0.47 0.67 0.40 0.68 0.57 0.60

P Valuec

aie

SDf

0.04

1.22

0.58

0.05 0.05 0.01

0.39 0.75 1.28

0.63 0.60 0.58

a

Groups of haplotypes including haplotypes with less than 10% of polymorphisms differences relative to the total number of polymorphisms along each gene (see details in the text). b All haplotypes are considered separately as independent alleles. c Significance level (P value) for detected outliers (in bold for P , 0.05). d q values are the positive FDR analogues of P values. They were obtained for each outlier gene, by considering 11 independent hypotheses corresponding to the 11 candidate genes, and for each SNP, by considering all 94 hypotheses corresponding to the 94 polymorphisms tested. e Mode of the posterior distribution of ai. f Standard deviation of the posterior distribution of ai.

At the haplotype level, the 2 dehydrin genes dhn1 and dhn-2, presented, out of the 11 genes analyzed, lower FST estimates than the 5% lower bound (fig. 2 and table 7). When computing q values analogs to P values, as if in a multiple testing framework controlling for the FDR while randomly repeating the same experiment, dhn-1 was still significant at 9%. Using BayesFst, these negative outlier patterns were supported along with that of lp3-1 at 5% based on the posterior distributions of locus-effect parameters ai, and using either raw data or grouping close haplotypes (respectively ‘‘H’’ or ‘‘HG’’ analyses in columns 5 and 8 of table 7, see also ai posterior distributions for detected outliers in supplementary fig. S6 [Supplementary Material online]). Three additional outliers were also identified with higher differentiation values than expected either with the H or HG analyses: PR-AGP4 and erd3 at 4% and GRP3 at 9% (table 7, positive values of ai). Envelopes of the FST distribution were also simulated specifically for SNPs with the FstSNP method. For HS values ranging from 0.1 to 0.4 (maximum HS value of 0.5 for biallelic markers), wider neutral expected areas were produced (due essentially to higher upper bounds) than those defined with the Fdist2 method (fig. 2). The observed variance of simulated FST values is thus larger toward lower heterozygosity corresponding to alleles closer to fixation. Generally, across both FstSNP and BayesFst, outlier patterns at the SNP level were consistent with those observed at the haplotype level (PR-AGP4, erd3, dhn-1, and dhn-2, tables 7 and 8), but distinct outlier patterns or stronger outliers were also found at others (CCoAOMT and GRP3). This result would be expected in a situation where many SNPs are not in complete LD, some being independent even at short distances within genes (supplementary fig. S3A and B, Supplementary Material online). Out of 94 nonredundant SNPs with MAF above 5%, 15 were found as outliers using significance levels of 10%, but only 2 positive outliers were detected by both FstSNP and BayesFst methods, and none of the 8 outliers detected with FstSNP remained significant after correction for multiple testing using q values (table 7). It should be noted also that with the BayesFst method, only

positive outlier SNPs were detected but none with negative ai estimates (even at 10%), comparing with the FstSNP method for which more outliers identified fell below the simulated bounds. For PR-AGP4, about half of the SNPs (12 out of 20) were detected as positive outliers (FST values higher than expected P values ranging from 4% to 9%), consolidating the pattern observed at the haplotype level. Interestingly, most were located in the first large intron or in the 3# UTR region (fig. 3), whereas the one nonsynonymous mutation was not detected as a significant outlier. Among those 12 outlier SNPs, all except SNP S525 were correlated (r2 varying from 0.5 to 0.9), although not in complete LD. For GRP3, which was an outlier at the gene level at 9% with BayesFst, only one synonymous SNP (out of 6) was a positive outlier with BayesFst (S164, table 8), whereas the nonsynonymous mutation with an FST of around 30% (S110, table 8) was not significant with either method (FstSNP or BayesFst). In contrast, the one nonsynonymous SNP for erd3 (S42, table 8) was the only outlier (out of 4). For negative outlier SNPs, based on P values, the dhn1 and dhn-2 genes detected at the haplotype level also had a few SNPs having FST values lower than expected with the FstSNP method (table 8), one nonsynonymous SNP from dhn-1 being the strongest (S725, table 8). Although not detected as an outlier at the haplotype level, CCoAOMT presented 2 strong negative SNP outliers, one nonsynonymous (I832, table 8), which were in LD (r2 5 0.8). No correlation was observed among SNPs from different genes. For all detected outliers with BayesFst, whether at the gene or SNP levels, the standard deviations of the ai posterior distributions were all reduced and comprised between 0.5 and 0.7, comparing with the prior distribution values of one. Contribution of Populations to Outlier Patterns Another interest of the Bayesian method used here was the possibility to account for heterogeneity in migration rates by estimating and testing population effects (bj) to determine if they were significantly different from prior mean values. Significant population effects would be indicative

428 Eveno et al.

Table 8 Outliers Detected at the SNP Level for SNPs Having a Heterozygosity Greater Than 0.1 across 10 Contrasted Populations, Using the Fdist2, FstSNP, or BayesFst Methods SNP PR-AGP4 S182 S331 S472 S525 S666 S762 S1177 S1181 S1409 S1480 S1488 GRP3 S110 S164 S633 erd3 S42 CCoAOMT I832 S980 dhn-2 S202 S332 S406 lp3-1 S176 dhn-1 S310 S725

a

Gene Region/Type of Mutation Intron/ Intron/ Intron/ Intron/ Intron/ Intron/ Exon/ns Exon/s 3# UTR/ 3#UTR/ 3#UTR/ Exon/ns Exon/s 3# UTR/ Exon/ns Exon/ns Exon/ns Exon/ns Intron/ Exon/s 3# UTR/ Exon/s Exon/ns

FST 0.383 0.398 0.382 0.550 0.341 0.339 0.333 0.350 0.365 0.394 0.391 0.285 0.347 0.277 0.587 0.034 0.031 0.047 0.056 0.045 0.174 0.063 0.043

Fdist2 P Valueb 0.08 0.06 0.08 0.02 0.09 0.09 0.08 0.08 0.06 0.08 0.08 0.06 0.10 0.04 0.01 0.02 0.04 0.07 0.08 0.02

FstSNP b

P Value

0.07

BayesFst c

q Value

0.51

0.06 0.01 0.02

0.49 0.33 0.33

0.05 0.04 0.06 0.01

0.49 0.49 0.49 0.33

b

P Value

aid

SDe

0.05 0.05 0.08 0.08 0.09 0.08

0.95 0.98 0.62 0.74 0.95 0.73

0.50 0.49 0.50 0.62 0.55 0.52

0.04

0.97

0.52

0.04

1.21

0.6

0.03

1.55

0.71

a

s: synonymous polymorphism, ns: nonsynonymous polymorphism. b Significance level (P value) for detected outliers (in bold for P , 0.05). c q values are the positive FDR analogues of P values. They were obtained for each outlier gene, by considering 11 independent hypotheses corresponding to the 11 candidate genes, and for each SNP, by considering all 94 hypotheses corresponding to the 94 polymorphisms tested. d Mode of the posterior distribution of ai. e Standard deviation of the posterior distribution of ai.

of distinct migration rates from other populations, different effective population sizes, or particular mating patterns. Using data from the 10 populations separately, bj values were significantly higher than expected (at 5%) for Tamrabta (Morocco), Tabarka (Tunisia), and Pleucadec (northern population in France), whereas they were significantly lower and negative for Oria and Coca (Spain) (fig. 4). These

results illustrate the overall contribution of populations to outlier patterns, with the Moroccan and Tunisian populations generally showing the strongest differentiation from the others and from each other at nuSSRs (table 3), and across most genes (supplementary table S6, Supplementary Material online). These contributions were observed especially for PR-AGP4, erd3, and GRP3, which exhibit the

FIG. 3.—Schematic representation of PR-AGP4 haplotype structure (singletons excluded) and occurrence of each principal haplotype in the different gene pools (C: Corsica, F: continental France, S: northwestern and central Spain, O: Oria, M: Tamrabta, T: Tabarka). Each base is represented by a different color A (black), T (dark gray), C (light gray), G (white), deletion (-). Exons are represented by boxes and noncoding regions (5# UTR, 3# UTR, and intronic regions) by a line. SNP positions detected as outliers at 10 % with FstSNP or BayesFst methods are indicated by arrows. A nonsynonymous SNP not detected as an outlier but showing a FST value of 0.38 among the 6 gene pools is indicated in brackets.

Drought Stress Genes Differentiation in Pinus pinaster 429

FIG. 4.—Mode values of the posterior distributions of bj for each population along a latitudinal gradient. Significance levels for outlier populations are indicated in brackets. The dotted line represents the value of the prior mean bj distribution set at 2 in the BayesFst method.

strongest pairwise FST values with other populations (supplementary table S6A, B, and D, Supplementary Material online), with values often above 0.4. It is remarkable than even for negative gene outliers with FST values close to zero overall, Pleucadec and Tabarka populations can show substantial pairwise differentiation with other populations (e.g., between Tabarka and other populations for dhn-1 [supplementary table S6H, Supplementary Material online]; between Pleucadec and 4 other populations for pp2c [supplementary table S6G, Supplementary Material online]; and between Tamrabta and Arenas, Pleucadec and Arenas or Tabarka for dhn-2 [supplementary table S6E, Supplementary Material online]). The variation in bj parameter estimates also confirmed the contrasted patterns of immigration rates in the different populations, indicating that Tabarka, Tamrabta, and Pleucadec might have been more isolated, a result consistent with their geographic locations. Conversely, the Coca and Oria Spanish populations had negative and significant bj values, which illustrate their contribution to the overall negative outlier patterns (fig. 4). This could be due either to higher immigration rates from other populations or to their having larger effective population sizes. It is worth noting that for PR-AGP4, erd3, and GRP3, which are positive gene outliers, Oria showed pairwise FST values close to zero with either Tamrabta or Tabarka, which are also among the positive outliers for the population effect (supplementary tables S6A, B, and D, Supplementary Material online). This could thus be consistent with more exchanges (past or present) from these populations and Oria, despite their higher isolation to other populations.

Discussion Outlier Patterns Detected in Drought-Response Candidate Genes This work reports nucleotide diversity and differentiation estimates for 11 polymorphic candidate genes for drought stress tolerance in contrasted, rangewide P. pinaster population samples. Using a multilocus scan of differentiation

based on haploid sequence data, we compared 3 different methods that aimed to detect outliers from simulated neutral expectations: 1) the Beaumont and Nichols (1996) method (referred to as Fdist2 method), 2) an extension of this approach that we specifically developed for biallelic SNPs (the FstSNP method), and 3) a Bayesian method (Beaumont and Balding 2004), referred to as the BayesFst method. Outliers were identified at the haplotype level for 5 genes using a 5% threshold, which were robust across methods for 2 of them (dhn1 and dhn2). Two genes presented a higher differentiation (FST values) than expected (PR-AGP4 and erd3), suggesting that they could have been affected by the action of diversifying selection among homogeneous gene pools and populations. In contrast, 3 genes presented a lower genetic differentiation than expected (dhn-1, dhn2, and lp3-1), which could represent signatures of homogenizing selection among populations and/or balancing selection within populations. Outlier patterns at the SNP level were consistent with those observed at the haplotype level for the 5 outliers identified, but others like CCoAOMT presented distinct patterns. Even if several nonsynonymous sites (for erd3, dhn1, or CcoAOMT) might be of potential interest by themselves, particular combinations of SNPs might better represent candidate genes functional importance. Overall, we detected a higher proportion of outliers at the haplotype level (45% i.e., 5 out of 11) compared with that at the SNP level (6% and 4% for positive and negative outliers, respectively, and 11% in total out of 94 informative SNPs at P , 5% combining results from the FstSNP and the BayesFst methods). This points out that particular combinations of sites in haplotypes can have a greater functional significance than single sites, but it is not possible to conclude on which test, either haplotype-based or SNP-based, is more powerful for outlier detection. The power of haplotype-based method could, for example, depends on the structure of LD existing among the SNPs. In addition, nonsynonymous mutations may not alone harbor higher functional importance than those in nontranslated regions because other SNPs located in promoter or intronic regions can also be affected by selection (Shabalina et al. 2001). In this study, the proportion of outlier SNPs

430 Eveno et al.

among those that were nonsynonymous and those that were silent was similar (22% and 15%, respectively). The proportion of outliers reported here at the haplotype level was greater than those reported in previous published surveys using FST-based outlier detection methods for studying adaptive divergence, in which no more than 6% of outliers were detected (Wilding et al. 2001; Campbell and Bernatchez 2004; Achere´ et al. 2005; Bonin et al. 2006). The main difference with our study is that the cited surveys focused on anonymous markers (mostly Amplified fragment length Polymorphisms [AFLPs]), instead of candidate genes. In recent genome scans involving geneassociated markers, the frequency of outliers identified was indeed higher than at anonymous markers: 12% of 78 expressed sequence tag-based microsatellites in Vasema¨gi et al. (2005) and 21% of SNPs from expressional candidate genes for abiotic stress in Scotti-Saintagne et al. (2004), compared with 9% of outliers identified on average for other marker types (genomic nuSSRs and AFLPs). Focusing directly on strong functional and expressional candidate genes in our study might have increased the likelihood of detecting outliers, providing further empirical support for the candidate gene approach. Although the observation of both positive and negative gene outlier patterns seems more consistent with locus-specific effects due to different types of selection, rather than with demographic effects which would have affected the genome in a more homogenous manner, it is necessary to examine how these demographic factors have been considered in the different models used. Both population sizes and migration rates are accounted for in the metapopulation model underlying BayesFst, and both Fdist2 and FstSNP are conservative in the sense that they define rather large ‘‘neutral envelopes’’ likely to encompass much of the variation associated with demographic and sampling effects. Moreover, even for low Hs values, the lower expected distribution with Fdist2 was shown to be very robust to a mixed colonization model with different numbers of founders (Beaumont and Nichols 1996). For the 3 negative outliers identified in this study (dhn-1, dhn-2, and lp3-1), no clinal variation was observed overall and no differentiation was detected between populations pairs, except between Tamrabta/Pleucadec with Arenas and Pleucadec with Tabarka (supplementary tables S6E, F, and H, Supplementary Material online) for dhn-2 due to differences in particular haplotypes, between Tabarka and other populations for dhn-1 (but with much lower values [around 0.2] than for positive outliers) due to one missing haplotype in Tabarka, and between Oria and several populations for lp3-1. Otherwise, we observe the maintenance of most haplotypes for these 3 genes in all populations, even for those separated by large geographic distances (cf. most pairwise values below zero in supplementary tables S6 [Supplementary Material online], in contrast with much stronger pairwise FST values at nuSSRs [table 3]). Thus, from the general assumption that the effects at these 3 genes could be at least partly due to homogenizing selection, a few mechanisms can be invoked. First, balancing selection at the molecular level favoring the maintenance of alleles at intermediate frequencies could result from the strong heterogeneity of environmental condi-

tions across years with recurrent events of more severe drought periods. Balanced polymorphisms at genes involved in adaptive traits may thus be a consequence of this large temporal heterogeneity that would alternatively favor one haplotype over another, one being more advantageous during events of severe drought for example. Such interannual variability in climate was proposed by Jump et al. (2006) to interpret the patterns of balanced polymorphisms at a putative locus linked to temperature effects in Fagus Sylvatica. This heterogeneity may also be combined with spatial heterogeneity (various levels or trophic and edaphic conditions) leading to disruptive selection within stands and increasing allelic diversity at genes involved in the trait variation (Rueffler et al. 2006). One way to test among alternative hypotheses would be to collect genotypic data and estimate FIS coefficients to see whether heterozygotes are or are not in excess in natural populations for these particular genes. It will be interesting also in future association studies to see whether effects on phenotypic variation are observed for specific alleles, particular combinations of alleles at one locus (in heterozygous state) or at several loci interacting and involved in the adaptive traits targeted. Factors Affecting the Efficiency of Detecting Selection in the Different Methods Tested All methods used were based on coalescent simulations, but each presents its own set of hypotheses and limitations and raises a number of important issues related to robustness to low sample sizes and departures from demographic and mutation model assumptions, to their power to detect a particular type of selection, and to the levels of experimental differentiation considered as neutral. Beaumont and Nichols (1996) showed, by simulation, that the joint distributions of FST and heterozygosity (HS) were robust to variation in sample size when above 50 haploid individuals. However, even if FST distributions were broader for sample sizes below 20, they were still informative. In our study, the drop in sample sizes from 9 to 6 gametes in the Fdist2 method (corresponding to the 2 sets of genes studied) or from 14 to 6 gametes in the FstSNP method, affected the 90% upper boundary of the neutral simulated envelope by around 5 units of FST values (in %) on average for HS lower than 0.5 and a bit less for higher HS (fig. 2). Although this did not greatly affect our results (only erd3 might have been a stronger outlier with bigger sample size at the haplotype level with Fdist2), we can envisage that for genomic scans of larger regions or loci, small sample sizes could miss outliers; therefore, we recommend a minimum of 20 haploids for such studies. Our analysis also shows that the effect of the mutation model in coalescent simulations was important, and not entirely accounted for by the Fdist2 method, justifying the use of simulations that explicitly model biallelic SNPs (FstSNP method). For the range of HS values observed at biallelic loci, the simulated neutral envelope was far broader (especially the upper bound for HS below 0.25) than when using an IAM or a SMM (fig. 2). This is consistent with the Beaumont and Nichols (1996) test of sensitivity to mutation rates, which already showed that for lower mutation rates,

Drought Stress Genes Differentiation in Pinus pinaster 431

and HS values below 0.2, upper boundaries were inflated, but not as much as what we observed using the FstSNP simulations. Thus, a larger variance of FST values for biallelic markers with low diversity HS can result in reduced power for detecting positive outliers. Comparing positive SNP outliers detected with either BayesFst or FstSNP methods, P values lower than 5% were observed in more than half the cases with BayesFst, whereas only 2 outliers were significant at 10% with FstSNP (tables 7 and 8). This highlights the reasonable discrimination power of the Bayesian method, even for low sample sizes, for detecting directionally selected loci, as noted in Beaumont and Balding (2004) simulations. However, the same authors also showed, with a mean FST of 0.1, the low power of their method for detecting negative outliers in the biallelic case, except for the stronger selection coefficients tested (;10%). In our study with a mean FST of around 0.14, none of the 4 negative SNP outliers detected with FstSNP were also identified with BayesFst. At the haplotype level, however, the Bayesian method was shown to give more precise FST estimates as the number of alleles increased from 2 to 8 (Balding 2003), which might lead to greater discriminating power in our case for the haplotype or gene level versus the SNP level. This is illustrated in the example data set used by Beaumont and Balding (2004), which led to identifying 5 multiallelic loci under balancing selection for a mean FST of 0.17 that were not detected by the Fdist2 method. In our study for genes with a number of haplotypes varying from 8 to 25 (table 6), 2 more positive outliers were detected with BayesFst that otherwise fell within the 10% boundaries of the Fdist2 envelope. Another critical issue is that of multiple hypotheses testing, considering that in classical frequentist methods, a separate test is performed at each locus, so the number of false positives is bound to increase with the number of SNPs tested. Referring to the 2 frequentist methods Fdist2 and FstSNP, the proportion of outliers at the haplotype level drops to 9% with a type I error rate of 5% (18% with a risk of 10%), whereas at the SNP level, this proportion falls just above this risk. No test remained significant using FDR q values, except for dhn-1 at 9%. However, because SNPs included in our data set have not been chosen randomly but have been identified from carefully selected expressional and functional candidate genes, it seems more adequate to assess the results on a per gene basis. Doing that for candidate genes for which outlier SNPs were identified (PR-AGP4, erd3, CCoAOMT, and dhn1), the proportions of outlier SNPs are above the type I error rates (15%, 25%, 14%, and 10%, respectively, at a 5% error rate). Among all the SNPs detected as outliers, only those within PRAGP4 were correlated with various strengths except one, as well as the 2 outliers for CcoAOMT (r2 ; 0.08), which explains their correlated test outputs. However, no correlation was observed among the outlier SNPs from the different genes. With respect to the Fdist2 and FstSNP methods, an advantage of the Bayesian analysis is that, through the a priori distributions, it avoids the classical problem of multiple testing encountered in frequentist-based methods (Shoemaker et al. 1999; O’Hagan and Luce 2003). Considering this difference in interpretation, and also that demography in

BayesFst is accounted for in a more flexible way than in Fdist2, outliers identified in common with both methods reinforce the likelihood that these genes might have been affected by natural selection and that the multiple testing correction attempted in our case might have reduced statistical power too strongly. Finally, the assumption of a constant FST due to homogeneous migration rates among populations might be the strongest assumption in the Beaumont and Nichols (1996) model because it was shown to be robust to a range of departure from the assumption of migrations-drift equilibrium. In reality, migration rates between natural populations of maritime pine are probably heterogeneous, considering the pairwise FST matrix between populations (table 3). Also, it is likely that populations isolated by geographic barriers such as Oria (Gonza´lez-Martı´nez et al. 2007) could have different or lower effective population sizes. The Bayesian approach is thus an interesting alternative because it accounts for heterogeneity in population size or migration rates and allows testing of population-specific effects in the regression step. This might explain differences observed in P values for positive SNP outliers detected with either BayesFst or FstSNP. It is therefore difficult to make recommendations as to which method is more appropriate and efficient for detecting selection based on genetic differentiation, in particular because we do not know which are the ‘‘true’’ loci under selection. We do have, however, a preference for the Bayesian method of Beaumont and Balding (2004) due to its efficiency in our case despite small sample sizes, the flexibility of assumptions concerning demographic parameters and population effects, which might account to nonequilibrium situations, and the fact that it deals effectively with the problem of multiple testing. Given the probability meaning of P values, they should not be set at too low levels to avoid being overly conservative, especially in cases of small sample sizes. Its main limitation, illustrated in our case, might be its lack of power to detect negative outliers for biallelic SNPs, which calls for either much larger samples or the application of another method for comparison. The FstSNP method appears to be an interesting alternative accounting for SNPs frequency distribution and typically large FST variances. Functional Interpretation of Candidate Genes Showing Outlier Patterns Seeking the causes for outlier behavior will ultimately lead to functional genomics studies in your species of interest (Storz 2005), but information can also be obtained from homologous genes in other species. The question is whether the interpretation that we proposed in terms of past or still current selection pressures for the 5 outliers identified can be linked to their putative implications in drought responses. We try below to summarize available knowledge on these genes, acknowledging the fact that genomic studies of drought stress in plants reveal large numbers of potential candidates that ultimately constitute complex regulatory networks (Seki et al. 2001, 2007; Ramanjalu and Bartels 2002; Street et al. 2006). Concerning the dehydrin genes potentially submitted to homogenizing selection (dhn-1 and dhn-2), which

432 Eveno et al.

showed both an absence of lineage differentiation among populations, a high nucleotide diversity, and numerous nonsynonymous mutations for dhn-1, a large body of results over the past 10 years have shown their strong link with drought stress response providing also insights into their functional role (Dure 1993; Ingram and Bartels 1996; Bray 1997; Ramanjalu and Bartels 2002): dhn-1 and dhn2 both encode dehydrin proteins belonging to the late embryogenesis abundant (LEA) gene family, known to allow plant protective reactions against dehydration. Their highly hydrophilic nature plays a role in the maintenance of protein or membrane structure, sequestration of ions, binding of water, and thus helps to maintain the minimum cellular water requirements. In conifers, increased transcript accumulation for the Picea glauca dhn-1 homolog was observed in needle (and possibly roots) tissues after applying stress for 48 h (Richard et al. 2000), and in P. taeda, a dhn-2 homolog (based on BlastN searches) was overexpressed in needles under mild drought stress (Watkinson et al. 2003). Evidence of a Populus dhn-1 homolog overexpression under different abiotic stresses, including drought, has also been shown (Caruso et al. 2002). Less is known about the role of lp3-1, belonging to a gene family called ASR (ABA/water stress/ripening-induced) first described in tomato (Thompson and Corlett 1995), but this gene was shown to be overexpressed in stems under progressively applied water stress, in order to be closer to field conditions (Padmanabhan et al. 1997). This gene also showed a significant genetic association with an increase in proportion of latewood in P. taeda, although just caused by a few extreme clones (Gonza´lez-Martı´nez, Wheeler, et al. 2007). The consistent overexpression of these 3 genes in various experiments of drought stress or different species might be compatible with a homogenizing mode of selection action that could be linked to more or less frequent recurrent events of drought experienced in maritime pine natural populations. The erd3 gene had the lowest diversity among all genes studied and a very high FST value for one nonsynonymous SNP that was fixed for one allele in half the populations. This pattern can be compared with what was observed by Gonza´lez-Martı´nez et al. (2006) in a large unstructured population of P. taeda, with the erd3 gene homolog showing also low diversity and a few nonsynonymous mutations, with an excess of rare mutations. Alternative explanations proposed in P. taeda were a selective sweep or a signature of past population expansion. In P. pinaster, there is no simple latitudinal gradient at the nonsynonymous outlier SNP or one that would simply be consistent with a higher level of drought stress on the basis of climatic variables: the same allele that is fixed in Arenas receiving a high level of rainfall is also fixed in Tabarka having much lower rainfall. No additional information was found in the literature on this gene’s putative function. It was originally annotated in P. taeda on a cDNA clone from roots recovering from drought, by similarity with dehydration-responsive genes identified in Oryza sativa, and could be a methyltransferase with S-adenosyl Methionin (SAM)-binding domain (and led to the annotation of the erd3 complete coding sequence in Arabidopsis thaliana (TAIR:AT4G19120 in http://www.arabidopsis.org/).

For the other positive outlier PR-AGP4, most outlier SNPs are located in noncoding regions (intron and 3# UTR). This gene also presents a high diversity and strong haplotype structure across populations with 4 less frequent haplotypes being population private and many sites belonging to small blocks of LD (fig. 3, supplementary fig. S3 [Supplementary Material online]). If we consider separately populations from the western range (France, Spain, and Morocco) and from the eastern range (Corsica and Tunisia), SNP allelic frequencies (except S525) follow a latitudinal gradient from South to North. The latitude itself is linked to a decrease in temperature (correlations of 0.67, Pr , 5% with the annual mean rainfall ANMT, and of 0.71 with the mean temperature of the wettest month, MTWM, in supplementary table S7 [Supplementary Material online]) and to an increase in RDRYM (correlation of 0.5, not significant at 5%, in supplementary table S7 [Supplementary Material online]). The implication of this gene in drought stress response in maritime pine would be consistent with differential expression studies in both P. pinaster and P. taeda. Using osmotic stress in P. Pinaster, Dubos et al. (2003) revealed a strong downregulation of this gene in roots (greater than that observed in needles). In P. taeda, the homolog ptaAGP4 was identified as strongly expressed in differentiating xylem of bent trees compared with other tissues (Zhang et al. 2000). Moreover, from an expression study in the same species (Yang et al. 2005), ptaAGP4 was shown to have higher expression in one ecotype originating from a region with very low precipitation (Lost Pines in TX) compared with the ecotype from South Louisiana with wetter conditions. Interestingly, in controlled drought stress conditions, the gene was strongly repressed for both ecotypes (Yang et al. 2005), consistent with what has been observed with osmotic stress in French Atlantic (humid) P. pinaster populations. Although the functional role of this gene is not yet clear, drought stress has been shown to cause alterations in the chemical composition and physiological properties of the cell wall (e.g., wall extensibility) (Ingram and Bartels 1996). We may therefore assume a potential role for this gene in cell differentiation/elongation and growth, that would be repressed in hydroponic stress treatments, and could have evolved differentially in populations submitted to different environmental conditions. For both erd3 and PR-AGP4 candidate genes, functional genomics studies will be needed to unravel their mode of action. It might also be of interest to extend the sequencing to 5# UTR regulating regions that might contain functionally important sites for the gene expression and might have affected the diversity patterns observed. Indeed, polymorphisms localized in promoter region have been shown to be directly involved in trait variation (e.g., McGuire et al. 1994 for disease traits in humans; Andersen et al. 2005 for the gene Dwarf8 in maize) or to have been the target of positive selection that resulted in a change of regulation (e.g., the tb1 domestication gene in maize, Wang et al. 1999). Insights into Local Adaptation of Populations from Contrasted Environmental Conditions The observation of 2 positive outliers (PR-AGP4 and erd3) among the 11 genes analyzed is consistent with the

Drought Stress Genes Differentiation in Pinus pinaster 433

proposed interpretation that local adaptation, already demonstrated at the phenotypic level for traits like growth or water use efficiency (Alı´a et al. 1995, 1997; Garnier-Ge´re´ et al. unpublished results), could have led to divergent selection at genes potentially involved in adaptive traits, despite high gene flow. Local adaptation can result from many different environmental pressures (e.g., high temperature, soil fertility, and heterogeneity of rainfall), which might interact across seasons and lead not only to different responses in each population, but also to different genes being more or less strongly affected by selection for a similar response. The overall contribution of particular populations to positive outlier patterns was further illustrated by the variation in the bj population-specific parameter estimates from the Bayesian analysis (fig. 4). The Tunisian and Moroccan populations (both presenting the highest annual mean temperature, around 17°) have a preponderant role in these patterns, as well as a French northern population (Pleucadec). These populations could either be more isolated, thus experiencing less gene flow from others, or have smaller effective population sizes (Ne). Moroccan stands have been shown to exhibit low diversity using chloroplast microsatellites and allozymes (Wahid et al. 2004; Bucci et al. 2007) due to the typically small and fragmented populations sampled which are severely threatened by overexploitation and lack of natural regeneration (Wahid et al. 2004) and thus might be of relatively small Ne. However, the population studied here (Tamrabta) covers hundreds of hectares (ha) with over 200 trees per ha. Similarly, the Tabarka and Pleucadec populations can be considered as having reasonable population sizes due to large surrounding stands. Thus, the combination of low immigration from populations occurring in different environments, and relatively strong selection pressures at the extreme of the range are necessary to explain their higher differentiation with other populations. This could have generated a favorable situation for adaptive divergence to occur efficiently at particular genes such as PR-AGP4, erd3, and GRP3. Beaumont and Balding (2004) showed through simulations that appreciable discrimination of positively selected loci was obtained when the selection rate was 5-fold higher than the migration rate, in a situation where FST was lower than in our case, so this might constitute a conservative estimate for selection coefficients at positive outliers detected. Selection coefficients in the range of 0.01 to 0.04 would be sufficient to allow the detection of outliers with the methods employed here in case of genes with relatively high diversity, on the basis of simulations in a comparable situation by Hoffman et al. (2006). In contrast for Hs values below 0.4 and for an upper confidence boundary reaching 0.4 for FST, we may assume that selection effects must have been fairly strong to produce outliers, in particular for PR-AGP4 S525 and erd3 S42. For other genes with weaker selection coefficients, we could hypothesize that gene flow could have interacted to mask particular effects of local adaptation at the molecular level. Conversely, Coca and Oria (in central and southern spain, respectively) behaved as negative population outliers from their bj parameter values (fig. 4), indicating that they would either experience more gene flow from other populations or have a larger population size. For Oria, this might

be consistent with the fact that it would belong to former glacial refugial areas in southeastern Spain (suggested initially by Salvador et al. [2000]). Substantial Chloroplast simple-sequence repeat variation has been found in this population (12 haplotypes and genetic diversity values of 0.96; our unpublished results) despite being relatively well isolated from nearby populations (Gonza´lez-Martı´nez, et al. 2007). Coca negative population outlier pattern may originate from admixture between surrounding gene pools in this population of large effective population size. Perspectives and Conclusion Although confirmatory evidence may come from functional studies, other sources of evidence can strengthen assumptions that selection might have influenced or caused the outlier patterns observed (Storz 2005; Vasema¨gi and Primmer 2005). Additional evidence of clinal variation and possible links to the climatic conditions in which the populations have evolved for PR-AGP4 has been mentioned above, but it might not be easy to disentangle which environmentally mediated factors had the strongest effect on putative candidate genes or to discern whether these effects are confounded with demographic effects. Studying differentiation patterns at a finer spatial scale along an environmental gradient could avoid historical effects, but one needs also to carefully evaluate alternative neutral scenarios. More insight from the respective roles of demography and selection can come from tests performed within larger and unstructured populations, which might be more powerful than FST-based tests to detect recent selective events (Fay and Wu 2000; Biswas and Akey 2006). Complex patterns of selection could also lead to intermediate levels of FST and go undetected as values would fall within the neutral expected envelopes. Indeed, it has been shown theoretically that in species with high gene flow, selection for adaptive divergence at polygenic traits might be more associated with covariances of allele frequencies across populations than with high FST at individual loci coding for a trait (Latta 2003; Le Corre and Kremer 2003). This would call for a shift toward multilocus methods to identify selection at the molecular level, which might be envisaged in the near future with sequence data from larger samples of genes and populations in nonmodel species. Despite their robustness to a large range of demographic scenarios (Beaumont 2005; Nielsen 2005), the FST parameters used here are estimated from allele frequencies considered as equally distant (unordered alleles), following Weir and Cockerham (1984). With the NST estimate of population differentiation on sequence data, information on the genealogical distances among haplotypes is integrated and can enable assessment of the phylogeographical structure. The NST estimate might be also more appropriate to capture Wright’s original definition of FST as an inbreeding coefficient (the probability of alleles that are identicalby-descent being combined in zygotes) and thus provide a more accurate estimation of Nm despite a larger variance (Neigel 2002). Among the genes studied, GRP3 showed a significantly higher NST than FST value, which could be due to a strong signal of phylogeographic structure. The geographic distribution of haplotype frequencies in

434 Eveno et al.

the different populations at this locus illustrates fairly well the pattern detected and is comparable with similar distributions observed previously for both mitochondrial DNA and chloroplastDNA markers (Burban and Petit 2003; Bucci et al. 2007; see also supplementary fig. S4 [Supplementary Material online]). How this type of pattern may be affected by selection is an open question and suggests an interest in developing methods based on NST for searching outliers when sequence data on candidate genes are available. This exploratory study provided criteria from FSTbased methods that allowed candidate genes potentially affected by selection to be identified. One way to validate these predicted functional roles is to perform association studies between their allelic variation and drought stress response in maritime pine natural or breeding populations. All markers, whether genes, single sites, or haplotypes, will have to be tested for association with phenotypic variation as it is not clear what will their respective impact be on the power to detect true associations (Buntjer et al. 2005). We plan to test associations between relevant SNPs and several traits related to water use efficiency (e.g., carbon isotope discrimination, growth, and biomass) in a common garden experiment. Comparing differentiation between populations at the phenotypic level (QST) and candidate gene level (FST) or mapping these genes in maritime pine pedigrees to look for colocalization with targeted traits’ QTLs, would also provide useful evidence for their putative involvement in drought tolerance trait variation (Brendel et al. 2002). Recently published results in conifers (Brown et al. 2004; Neale and Savolainen 2004; Howe and Brunner 2005; Gonza´lez-Martı´nez, Wheeler, et al. 2007) with similar reproductive and ecological characteristics (extensive natural populations, little confounding substructure), genome structure similarities (Chagne´ et al. 2004), comparable levels of nucleotide diversity, and rapid decay of LD across genes, suggest the potential efficiency of candidate gene association studies in this group of species. This study also illustrates that conifer species (despite their large genome sizes) can be good models for studying the molecular basis of adaptive divergence in natural populations. Supplementary Material Supplementary tables S1–S7 and figures S1–S6 are available at Molecular Biology Evolution online (http:// www.mbe.oxfordjournals.org/). Acknowledgments We thank David Balding for providing the ‘‘BayesFst’’ program and further assistance on its application to our data, Marı´a Luisa Lafuente and Rau´l Ferna´ndez for technical assistance, Ricardo Alia and Jean Brach for help with populations sampling and discussions on material phenotypic variation, Christian Burban for help in acquisition and interpreting climatic data on southern populations, 2 anonymous reviewers and Josquin Tibbits for his helpful comments and discussion on the manuscript. This work and E. Eveno scholarship were funded by the European

TREESNIPs project no. 836501. The work of S. C. Gonza´lez-Martı´nez was supported by a ‘Ramo´n y Cajal’ fellowship RC02-2941. Part of the sequence data analyzed here were obtained at the Genotyping and Sequencing facility of Bordeaux (grants from the Conseil Re´gional d’Aquitaine no. 20030304002FA and 20040305003FA and from the European Union, FEDER no. 2003227).

Literature Cited Achere´ V, Favre JM, Besnard G, Jeandroz S. 2005. Genomic organization of molecular differentiation in Norway spruce (Picea abies). Mol Ecol. 14:3191–3201. Aitken SN, Hannerz M. 2001. Genecology and gene resource management strategies for conifer cold hardiness. In: Bigras FJ, Columbo SJ, editors. Conifer cold hardiness. Dordrecht (The Netherlands): Kluwer Academic Publishers. p. 23–53. Alı´a R, Gil L, Pardos J. 1995. Performance of 43 Pinus pinaster Ait. provenances on 5 locations in central Spain. Silvae Genet. 44:75–81. Alı´a R, Moro J, Denis JB. 1997. Performance of Pinus pinaster Ait. provenances in Spain: interpretation of the genotypeenvironment interaction. Can J For Res. 27:1548–1559. Andersen JR, Schrag T, Melchinger AE, Zein I, Lubberstedt T. 2005. Validation of Dwarf8 polymorphisms associated with flowering time in elite European inbred lines of maize (Zea mays L.). Theor Appl Genet. 111:206–217. Andofalto P. 2001. Adaptive hitchhiking effects on genome variability. Curr Opin Genet Dev. 11:635–641. Baines JF, Das A, Mousset S, Stephan W. 2004. The role of natural selection in genetic differentiation of worldwide populations of Drosophila ananassae. Genetics. 168: 1987–1998. Balding DJ. 2003. Likelihood-based inference for genetic correlation coefficients. Theor Popul Biol. 63:221–230. Barton NH, Keightley PD. 2002. Understanding quantitative genetic variation. Nat Rev Genet. 3:11–20. Beaumont MA. 2005. Adaptaion and speciation: What can FST tell us? Trends Ecol. Evol. 20:435–440. Beaumont MA, Balding DJ. 2004. Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol. 13:969–980. Beaumont MA, Nichols RA. 1996. Evaluating loci for use in the genetic analysis of population structure. Proc R Soc Lond. B. 263:1619–1626. Biswas S, Akey JM. 2006. Genomic insights into positive selection. Trends in Genetics. 22:437–446. Bonin A, Taberlet P, Miaud C, Pompanon F. 2006. Explorative genome scan to detect candidate loci for adaptation along a gradient of altitude in the common frog (Rana temporaria). Mol Biol Evol. 23:773–783. Borevitz JO, Chory J. 2004. Genomic tools for QTL analysis and gene discovery. Curr Opin Plant Biol. 7:132–136. Bray EA. 1997. Plant responses to water deficit. Trends Plant Sci. 2:48–54. Brendel O, Pot D, Plomion C, Rozenberg P, Guehl JM. 2002. Genetic parameters and QTL analysis of delta C-13 and ring width in maritime pine. Plant Cell Environ. 25:945–953. Brohede J, Ellegren H. 1999. Microsatellite evolution: polarity of substitutions within repeats and neutrality of flanking sequences. Proc R Soc Lond B Biol Sci. 266:825–833. Brown GR, Gill GP, Kuntz RJ, Langley CH, Neale DB. 2004. Nucleotide diversity and linkage disequilibrium in loblolly pine. Proc Natl Acad Sci USA. 101:15255–15260.

Drought Stress Genes Differentiation in Pinus pinaster 435

Bucci G, Gonza´lez-Martı´nez SC, Le Provost G, Plomion C, Ribeiro MM, Sebastiani F, Alı´a R, Vendramin GG. 2007. Range-wide phylogeography and gene zones in Pinus pinaster Ait. revealed by chloroplast microsatellite markers. Mol Ecol. 16:2137–2153. Buntjer JB, Sorensen AP, Peleman JD. 2005. Haplotype diversity: the link between statistical and biological association. Trends Plant Sci. 10(10):466–471. Burban C, Petit RJ. 2003. Phylogeography of maritime pine inferred with organelle markers having contrasted inheritance. Mol Ecol. 12:1487–1495. Campbell D, Bernatchez L. 2004. Generic scan using AFLP markers as a means to assess the role of directional selection in the divergence of sympatric whitefish ecotypes. Mol Biol Evol. 21:945–956. Caruso A, Morabito D, Delmotte F, Kahlem G, Carpin S. 2002. Dehydrin induction during drought and osmotic stress in Populus. Plant Physiol Biochem. 40:1033–1042. Cavalli-Sforza LL. 1966. Population structure and human evolution. Proc R Soc Lond B. 164:362–379. Chagne´ D, Chaumeil P, Ramboer A, et al. (12 co-authors). 2004. Cross-species transferability and mapping of genomic and cDNA SSRs in pines. Theor Appl Genet. 109:1204–1214. Chagne´ D, Lalanne C, Madur D, et al. (12 co-authors). 2002. A high density linkage map of Pinus pinaster based on AFLPs. Ann For Sci. 59:627–636. Chang S, Puryear JD, Dias MADL, Funkhouser EA, Newton RJ, Cairney J. 1996. Gene expression under water deficit in loblolly pine (Pinus taeda): isolation and characterization of cDNA clones. Physiol Plant. 97:139–148. Charlesworth D, Charlesworth B, McVean GA. 2001. Genome sequences and evolutionary biology, a two-way interaction. Trends Ecol Evol. 16(5):235–242. Chaumeil P. 2006. Plasticite´ mole´culaire de deux e´cotypes de pin maritime soumis a` un stress osmotique PhD Thesis. University of Bordeaux-I, Bordeaux, France. Christians JK, Keightley PD. 2002. Genetic architecture: dissecting the genetic basis of phenotypic variation. Curr Biol. 12:415–416. Costa P, Bahrman N, Frigerio JM, Kremer A, Plomion C. 1998. Water-deficit-responsive proteins in maritime pine. Plant Mol Biol. 38:587–596. Cruz RT, Jordan WR, Drew MC. 1992. Structural changes and associated reduction of hydraulic conductance in roots of Sorghum bicolour L. following exposure to water deficit. Plant Physiol. 99:203–212. Derory J, Mariette S, Gonzale´z-Martı´nez SC, Chagne´ D, Madur D, Gerber S, Brach J, Persyn F, Ribeiro MM, Plomion C. 2002. What can nuclear microsatellites tell us about maritime pine genetic resources conservation and provenance certification strategies? Ann For Sci. 59:699–708. Destremau DX, Jolly H, Thari T. 1976. Contribution a` la connaissance des provenances de Pinus pinaster. Ann Rech For Maroc. 16:101–153. Dubos C, Le Provost G, Pot D, Salin F, Lalane C, Madur D, Frigerio JM, Plomion C. 2003. Identification and characterization of water-stress-responsive genes in hydroponically grown maritime pine (Pinus pinaster) seedlings. Tree Physiol. 23:169–179. Dubos C, Plomion C. 2003. Identification of water-deficit responsive genes in maritime pine (Pinus pinaster Ait.) roots. Plant Mol Biol. 51:249–262. Dure L III. 1993. Structural motifs in Lea proteins. In: close TJ, Bray EA, editors. Plant responses to cellular dehydration during environmental stress. Vol. 10. Rockville (MD): American Society of Plant Physiologists. p. 91–103. Ehrenreich IM, Purugganan MD. 2006. The molecular genetic basis of plant adaptation. Am J Bot. 93:953–962.

Endler JA. 1986. Natural selection in the wild. Princeton (NJ): Princeton University Press. Excoffier L, Laval G, Schneider S. 2005. Arlequin ver. 3.0: an integrated software package for population genetics data analysis. Evol Bioinform Online. 1:47–50. Farquhar GD, Ehleringer JR, Hubick KT. 1989. Carbon isotope discrimination and photosynthesis. Annu Rev Plant Physiol Plant Mol Biol. 40:503–537. Fay JC, Wu C-I. 1999. A human population bottleneck is not incompatible with the discordance between patterns of mitochondrial vs. nuclear DNA variation. Mol Biol Evol. 16:1003–1006. Fay JC, Wu C-I. 2000. Hitchhiking under positive Darwinian selection. Genetics. 155:1405–1413. Filatov DA. 2002. ProSeq: a software for preparation and evolutionary analysis of DNA sequence data sets. Mol Ecol Notes. 2:621–624. Flint J, Mott R. 2001. Finding the molecular basis of quantitative traits: successes and pitfalls. Nat Rev Genet. 2:437–445. Ford MJ. 2002. Applications of selective neutrality tests to molecular ecology. Mol Ecol. 11:1245–1262. Galtier N, Depaulis F, Barton NH. 2000. Detecting bottlenecks and selective sweeps from DNA sequence polymorphism. Genetics. 155(2):981–987. Garcı´a-Gil MR, Mikkonen M, Savolainen O. 2003. Nucleotide diversity at two phytochrome loci along a latitudinal cline in Pinus sylvestris. Mol Ecol. 12:1195–1206. Garnier-Ge´re´ PH, Ades PK. 2001. Environmental surrogates for predicting and conserving adaptive genetic variability in tree species. Conserv Biol. 15:1632–1644. Gonza´lez-Martı´nez SC, Alı´a R, Gil L. 2002. Population genetic structure in a Mediterranean pine (Pinus pinaster Ait.): a comparison of allozyme markers and quantitative traits. Heredity. 89:199–206. Gonza´lez-Martı´nez SC, Ersoz E, Brown GR, Wheeler NC, Neale DB. 2006. DNA sequence variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda L. Genetics. 172:1915–1926. Gonza´lez-Martı´nez SC, Go´mez A, Carrio´n JS, Agu´ndez D, Alı´a R, Gil L. 2007. Spatial genetic structure of an explicit glacial refugium of maritime pine (Pinus pinaster Aiton) in southeastern Spain. In: Weiss S, Ferrand N, editors. Phylogeography in southern European refugia: evolutionary perspectives on the origins of European biodiversity. Kluwer Academic Publishers: London. p. 257–269. Gonza´lez-Martı´nez SC, Krutovsky KV, Neale DB. 2006. Foresttree population and genomics and adaptive evolution. New Phytol. 170:227–238. Gonza´lez-Martı´nez SC, Wheeler NC, Rezos E, Nelson CD, Neale DB. 2007. Association genetics in Pinus taeda L.I. wood property traits. Genetics. 175:399–409. Guevara MA, Chagne´ D, Almeida MH, et al. (16 co-authors). 2005. Isolation and characterization of nuclear microsatellite loci in Pinus pinaster Ait. Mol Ecol Notes. 5:57–59. Haddrill PR, Thornton KR, Charlesworth B, Andolfatto P. 2005. Multilocus pattern of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Genome Res. 15:790–799. Hardy OJ, Vekemans X. 2002. SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes. 2:618–620. Hein J, Schierup M, Wiuf C. 2004. Gene genealogies, variation and evolution: a primer in coalescent theory. New York: Oxford University Press. Hellmann I, Ebersberger I, Ptak SE, Pa¨a¨bo S, Przeworski M. 2003. A neutral explanation for the correlation of diversity

436 Eveno et al.

with recombination rates in humans. Am J Hum Genet. 72:1527–1535. Heuertz M, De Paoli E, Ka¨llman T, Larsson H, Jurman I, Morgante M, Lascoux M, Gyllenstrand N. 2006. Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history of norway spruce [Picea abies (L.) Karst]. Genetics. 174:2095–2105. Hoffman EA, Schueler FW, Jones AG, Blouin MS. 2006. An analysis of selection on a colour polymorphism in the northern leopard frog. Mol Ecol. 15:2627–2641. Howe GT, Brunner AM. 2005. An evolving approach for understanding plan adaptation. New Phytol. 167:1–5. Huey RB, Gilchrist GW, Carlson ML, Berrigan D, Serra L. 2000. Rapid evolution of a geographic cline in size in an introduced fly. Science. 287:308–309. Hurme P, Repo T, Savolainen O, Pa¨a¨kko¨nen T. 1997. Climatic adaptation of bud set and frost hardiness in Scots pine (Pinus sylvestris). Can J For Res. 27:716–723. Ingram J, Bartels D. 1996. The molecular basis of dehydration tolerance in plants. Annu Rev Plant Physiol Plant Mol Biol. 47:377–403. Joshi J, Schmid B, Caldeira MC, et al. (18 co-authors). 2001. Local adaptation enhances performance of common plant species. Ecol Lett. 4:536–544. Joyce DC, Aspinall D, Edwards GR. 1983. Water deficit and the growth and anatomy of the radish fleshy axis. New Phytol. 93:439–446. Jump AS, Hunt JM, Martinez-Izquierdo JA, Penuelas J. 2006. Natural selection and climate change: temperature-linked spatial and temporal trends in gene frequency in Fagus sylvatica. Mol Ecol. 15:3469–3480. Kado T, Yoshimaru H, Tsumura Y, Tachida H. 2003. DNA variation in a conifer, Cryptomeria japonica (Cupressaceae sensu lato). Genetics. 164:1547–1559. Kawecki TJ, Ebert D. 2004. Conceptual issues in local adaptation. Ecol Lett. 7:1225–1241. Kittelson PM, Maron JL. 2001. Fine-scale genetically based differentiation of life-history traits in the perennial shrub lupinus arboreus. Evolution. 55:2429–2438. Krutovsky KV, Neale DB. 2005. Nucleotide diversity and linkage disequilibrium in cold-hardiness- and wood quality-related candidate genes in Douglas fir. Genetics. 171:2029–2041. Latta RG. 2003. Gene flow, adaptive population divergence and comparative population structure across loci. New Phytol. 161:51–58. Laval G, Excoffier L. 2004. SIMCOAL 2.0: a program to simulate genomic diversity over large recombining regions in a subdivided population with a complex history. Bioinformatics. 20:2485–2487. Le Corre V, Kremer A. 2003. Comparison of genetic variability at neutral markers, QTLs and adaptive trait in a subdivided population: effects of selection intensity, variance of local optima and migration. Genetics. 164:1205–1219. Li WH, Graur D. 1991. Fundamentals of molecular evolution. Sunderland (MA): Sinauer Associates. Linhart YB, Grant MC. 1996. Evolutionary significance of local genetic differentiation in plants. Annu Rev Ecol Syst. 27:237–277. Luikart G, England PR, Tallmon D, Jordan S, Taberlet P. 2003. The power and promise of population genomics: from genotyping to genome typing. Nat Rev Genet. 4:981–994. Mariette S, Chagne´ D, Decroocq S, Giovanni G, Lalanne C, Madur D, Plomion C. 2001. Microsatellite markers for Pinus pinaster Ait. Ann For Sci. 58:203–206. Mariette S, Le Corre V, Austerlitz F, Kremer A. 2002. Sampling within the genome for measuring within population diversity: trade-offs between markers. Mol Ecol. 11:1145–1156.

Masle J, Gilmore SR, Farquhar GD. 2005. The ERECTA gene regulates plant transpiration efficiency in Arabidopsis. Nature. 436:866–870. McGuire W, Hill AV, Allsopp CE, Greenwood BM, Kwiatkowski D. 1994. Variation in the TNF-alpha promoter region associated with susceptibility to cerebral malaria. Nature. 371:508–511. McKay JK, Latta RG. 2002. Adaptive population divergence: markers, QTL and traits. Trends Ecol Evol. 17:285–291. Neale DB, Savolainen O. 2004. Association genetics of complex traits in conifers. Trends Plant Sci. 97:325–330. Nei M. 1987. Molecular Evolutionary Genetics. New York: Columbia University Press. Nei M, Kumar S. 2000. Molecular evolution and phylogenetics. New York: Oxford University Press. Neigel JE. 2002. Is FST obsolete? Conserv Genet. 3:167–173. Nguyen-Queyrens A, Bouchet-Lannat F. 2003. Osmotic adjustment in three-year-old seedlings of five provenances of maritime pine (Pinus pinaster) in response to drought. Tree Physiol. 23:397–404. Nielsen R. 2001. Statistical tests of selective neutrality in the age of genomics. Heredity. 86:641–647. Nielsen R. 2005. Molecular signatures of natural selection. Annu Rev Genet. 39:197–218. O’Hagan A, Luce BR. 2003. A primer on Bayesian statistics in health economics and outcomes research. London: Medtap International Inc. Padmanabhan V, Dias DMAL, Newton RJ. 1997. Expression analysis of a gene family in loblolly pine (Pinus taeda L.) induced by water deficit stress. Plant Mol Biol. 35:801–807. Petit RJ, Bahrman N, Baradat PH. 1995. Comparison of genetic differentiation in maritime pine (Pinus pinaster Ait.) estimated using isozyme, total protein and terpenic loci. Heredity. 75:382–389. Petit RJ, Duminil J, Fineschi S, Hampe A, Salvini D, Vendramin GG. 2005. Comparative organization of chloroplast, mitochondrial and nuclear diversity in plant populations. Mol Ecol. 14:689–701. Plomion C, Bahrman N, Durel CE, O’Malley DM. 1995. Genomic analysis in Pinus pinaster (Maritime pine) using RAPD and protein markers. Heredity. 74:661–668. Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics. 155:945–959. Ramanjalu S, Bartels D. 2002. Drought- and dessication-induced modulation of gene expression in plants. Plant Cell Environ. 25:141–151. Rehfeldt GE, Ying CC, Spittlehouse DL, Hamilton DA Jr. 1999. Genetic responses to climate in Pinus contorta: niche breadth, climate change and reforestation. Ecol Monogr. 69:375–407. Richard S, Morency MJ, Drevet C, Jouanin L, Se´guin A. 2000. Isolation and characterization of a dehydrin gene from white spruce induced upon wounding, drought and cold stresses. Plant Mol Biol. 43:1–10. Rodriguez PL. 1998. Protein phosphatase 2C (PP2C) function in higher plants. Plant Mol Biol. 38:919–927. Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 19:2496–2497. Rueffler C, Van Dooren TJM, Leimar O, Abrams PA. 2006. Disruptive selection then what? Trends Ecol Evol. 21(5):238–245. Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science. 12:1614–1620. Salvador L, Alı´a R, Agu´ndez D, Gil L. 2000. Genetic variation and migration pathways of maritime pine (Pinus pinaster Ait) in the Iberian peninsula. Theor Appl Genet. 100:89–95.

Drought Stress Genes Differentiation in Pinus pinaster 437

Savolainen O, Pyha¨ja¨rvi T. 2007. Genomic diversity in forest trees. Curr Opin Plant Biol. 10:1–6. Schaal BA, Olsen KM. 2000. Gene genealogies and population variation in plants. Proc Natl Acad Sci USA. 97:7024–7029. Schemske DW. 1984. Population structure and local selection in Impatiens pallida (Balsaminaceae), a selfing annual. Evolution. 38:817–832. Schlo¨tterer C. 2002a. Towards a molecular characterization of adaptation in local populations. Curr Opin Genet Dev. 12:683–687. Schlo¨tterer C. 2002b. A microsatellite-based multilocus screen for the identification of local selective sweeps. Genetics. 160: 753–763. Scotti-Saintagne C, Mariette S, Porth I, Goicoechea PG, Barreneche T, Bode´ne`s C, Burg K, Kremer A. 2004. Genome scanning for interspecific differentiation between two closely related oak species [Quercus robur L. and Q. petraea (Matt.) Liebl.]. Genetics. 168:1615–1626. Seki M, Narusaka M, Abe H, Kasuga M, YamaguchiShinozaki K, Carninci P, Hayashizaki Y, Shinozaki K. 2001. Monitoring the expression pattern of 1300 Arabidopsis genes under drought and cold stresses by using a full-length cDNA microarray. Plant Cell. 13:61–72. Seki M, Umezawa T, Urano K, Shinozaki K. 2007. Regulatory metabolic networks in drought stress responses. Curr Opin Plant Biol. 10:296–302. Shabalina SA, Ogurtsov AY, Kondrashov VA, Kondrashov AS. 2001. Selective constraint in intergenic regions of human and mouse genomes. Trends Genet. 17:373–376. Shinozaki K, Yamaguchi-Shinozaki K. 2000. Molecular responses to dehydration and low temperature: differences and cross-talk between two stress signalling pathways. Curr Opin Plant Biol. 3:217–233. Shoemaker J, Painter I, Weir BS. 1999. Bayesian statistics in genetics. Trends Genet. 15:354–358. Slate J. 2005. Quantitative trait locus mapping in natural populations: progress, caveats and future directions. Mol Ecol. 14:363–379. Socie´te´ Centrale pour l’Equipement du Territoire- Tunisie. 2005. Integrated management for water resources in the North of Tunisia. Final Report for the Tunisian Republic agriculture and hydraulic resources ministry. Sokal RR, Rohlf FJ. 1995. Biometry. 3rd ed. San Francisco (CA): Freeman. Storey JD, Tibshirani R. 2003. Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 100: 9440–9445. Storz JF. 2002. Contrasting patterns of divergence in quantitative traits and neutral DNA markers: analysis of clinal variation. Mol Ecol. 11:2537–2551. Storz JF. 2005. Using genome scans of DNA polymorphism to infer adaptive population divergence. Mol Ecol. 14:671–688. Street NR, Skogstrom O, Tucker J, Rodriguez-Acosta M, Nilsson P, Jansson S, Taylor G. 2006. The genetics and genomics of the drought response in Populus. Plant J. 48: 321–341. Tabor HK, Risch NJ, Myers RM. 2002. Opinion: candidate-gene approaches for studying complex genetic traits: practical considerations. Nat Rev Genet. 3:391–397. Tajima F. 1983. Evolutionary relationship of DNA sequences in finite populations. Genetics. 105:437–460.

Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 123:585–595. Thomas CD. 2005. Recent evolutionary effects of climate change. In: Lovejoy TE, Hannah L, editors. Climate change and biodiversity. New Haven (Connecticut): Yale University Press. p. 75–88. Thompson AJ, Corlett JE. 1995. mRNA levels of four tomato (L. esculentum Mill L.) genes related to fluctuating soil water status. Plant Cell Environ. 18:773–780. Vasema¨gi A, Nilsson J, Primmer CR. 2005. Expressed sequence tag-linked microsatellites as a source of gene-associated polymorphisms for detecting signatures of divergent selection in atlantic salmon (Salmo salar L). Mol Biol Evol. 22: 1067–1076. Vasema¨gi A, Primmer CR. 2005. Challenges for identifying functionally important genetic variation: the promise of combining complementary research strategies. Mol Ecol. 14: 3623–3642. Verhoeven KJF, Simonsen KL, McIntyre LM. 2005. Implementing false discovery rate control: increasing your power. Oikos. 108:643–647. Wahid N, Gonza´lez-Martı´nez SC, El Hadrami I, Boulli A. 2004. Genetic structure and variability of natural populations of maritime pine (Pinus pinaster Aiton) in Morocco. Silvae Genet. 53:93–99. Waldmann P, Garcı´a-Gil MR, Sillanpa¨a¨ MJ. 2005. Comparing Bayesian estimates of genetic differentiation of molecular markers and quantitative traits: an application to Pinus sylvestris. Heredity. 94:623–629. Wang R-L, Stec A, Hey J, Lukens L, Doebley J. 1999. The limits of selection during maize domestication. Nature. 398: 236–239. Watkinson JI, Sioson AA, Vasquez-Robinet C, et al. (14 coauthors). 2003. Photosynthetic acclimation is reflected in specific patterns of gene expression in drought-stressed loblolly pine. Plant Physiol. 133:1702–1716. Watterson GA. 1975. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 7:256–276. Weir BS, Cockerham CC. 1984. Estimating F-statistics for the analysis of population structure. Evolution. 38:1358–1370. Wilding CS, Butlin RK, Grahame J. 2001. Differential gene exchange between parapatric morphs of Littorina saxatilis detected using AFLP markers. J Evol Biol. 14:611–619. Wright SI, Gaut BS. 2005. Molecular population genetics and the search for adaptive evolution in plants. Mol Biol Evol. 22: 506–519. Yang S-H, Wang H, Sathyan P, Stasolla C, Loopstra CA. 2005. Real-time RT-PCR analysis of loblolly pine (Pinus taeda) arabinogalactan-protein-like genes. Physiol Plant. 124: 91–106. Zhang YI, Sederoff RR, Allona I. 2000. Differential expression of genes encoding cell wall proteins in vascular tissues from vertical and bent loblolly pine trees. Tree Physiol. 20: 457–466.

Marcy Uyenoyama, Associate Editor Accepted December 4, 2007

Contrasting Patterns of Selection at Pinus pinaster Ait ...

France; and |Department of Forest Systems and Resources, Forest Research Institute, Centro de .... such approaches, ''neutral'' expectations accounting for.

460KB Sizes 2 Downloads 375 Views

Recommend Documents

Pinus pinaster
associations in situ in an unstructured natural population of maritime pine (eastern Iberian. Peninsula) under a mixed-effects model framework. RR-BLUP was used to build predictive models for serotiny in this region. Model prediction power outside th

Pinus pinaster Aiton
kernels were fitted to both genetic and ecological data in a widespread Mediterranean pine, .... understanding of this key Mediterranean pine, an Intensive. Study Plot (ISP) was ...... Nathan by the Israel Science Foundation (Grants 474/02 and.

Pinus pinaster Aiton
1942, data not shown). As in the previous .... all individuals within the subset of data. The statistical sig- ... gauge fitness under the optimal growing conditions of.

Pinus pinaster Aiton
change was analyzed using migration matrix models and maximum likelihood .... southern Spain based on ecological and historical data (Alía et al. 1996).

Pinus pinaster Aiton, Pinaceae - Semantic Scholar
polymorphisms) can be used as predictors of maladaptation to climate in maritime pine (Pinus pinaster Aiton), an outcrossing long-lived ... Geographically defined molecular proxies for climate adaptation will thus critically enhance the predictive po

a case study in a Pinus pinaster Ait. progeny trial - Springer Link
... 6 October 2008 /Revised: 13 April 2009 /Accepted: 26 April 2009 /Published online: 19 May 2009. © Springer-Verlag 2009. Abstract The management of a genetic improvement ... in a clonal seed orchard and to study how deviations from.

Pinus pinaster Aiton, Pinaceae - Semantic Scholar
13: 132–138. Wagner, A., L. Donaldson, H. Kim, L. Phillips, H. Flint et al.,. 2009 Suppression of 4-Coumarate-CoA ligase in the coniferous gymnosperm Pinus radiata. Plant Physiol. 149: 370–383. Westbrook, J. W., M. F. Resende, Jr., P. Munoz, A. R

AIT
5 days ago - Key Cash Flow Statement Data. 2016. 2017F. 2018F. 2019F. Net Income. 429. 475. 553. 612. Depreciation & Amortization. 142. 164. 179. 192.

Contrasting relatedness patterns in bottlenose dolphins - NCBI
Published online 17 January 2003. Contrasting ..... super-alliance members might prefer to associate with males to which they are more closely related. .... Behav. 31, 667–682. Vehrencamp, S. L. 1983b Optimal degree of skew in cooperat-.

Growth patterns and carbon balance of Pinus radiata ...
protocol (Schulze et al., 2002), thus reliable tools are required for quantifying ... with forest growth data provided by stem analysis; .... in Table 2. As for the comparison with stand growth data, .... large uncertainties in areas with high convec

nuclear microsatellites reveal contrasting patterns of ...
species life history (breeding system, modes of seed and pol- .... genetic clusters in humans, where genetic differences among .... ulations in the input file.

Contrasting evolutionary patterns in populations of demersal sharks ...
Oct 24, 2017 - DOI 10.1007/s00227-017-3254-2. ORIGINAL PAPER. Contrasting evolutionary patterns in populations of demersal sharks throughout the western Mediterranean. Sergio Ramírez‑Amaro1,2. · Antonia Picornell1 · Miguel Arenas3,4,5 · Jose A.

SELECTION AND COMBINATION OF ... - Research at Google
Columbia University, Computer Science Department, New York. † Google Inc., Languages Modeling Group, New York. ABSTRACT. While research has often ...

Unified and Contrasting Cuts in Multiple Graphs - Research at Google
Aug 13, 2015 - ing wide scale applications from social networks to medical imaging. A popular analysis is to cut the graph so that the disjoint ..... number of graphs (typically 10s or at most 100s). In ad- .... google.com/site/chiatungkuo/. 10. 20.

COMPARATIVE WOOD ANATOMY OF PINUS ...
74100 Bartin, Turkey [E-mail: [email protected]]. SUMMARY. Pinus sylvestris L. subsp. hamata (Steven) Fomin var. compactaTosun is quite different ...

Contrasting Neurocomputational Models of Value ...
3.1 Illustration of the effect parameterization method. . . . . . . . . . . . 35 ... 3.16 The magnitude of the compromise effect for a symmetric value function. 50.

AIT - Application for Approval.pdf
Page 1 of 1. 1560 Broadway, Suite 1350, Denver, CO 80202 P 303.894.7800 F 303.894.7693 www.dora.colorado.gov/professions. APPLICATION FOR ...

Natural Selection Simulation at PHET.pdf
natural de envejecimiento, la crisis de la mediana edad o una crisis profesional. Whoops! There was a problem loading this page. Retrying... Whoops! There was a problem loading this page. Retrying... Natural Selection Simulation at PHET.pdf. Natural

Detecting Argument Selection Defects - Research at Google
CCS Concepts: • Software and its engineering → Software defect analysis; .... In contrast, our analysis considers arbitrary identifier names and does not require ...

Natural Selection Simulation at PHET.pdf
Access the simulation and explore the settings. Answer the following. questions. 1. What are some VARIABLES that you have control over in the. simulation? 2.

Contrasting vulnerability.pdf
implementing them in a Geographic Information System (GIS) with. a 10-m spatial resolution. We used the average ratio of precipita- tion (P) to potential ...

Online Selection of Diverse Results - Research at Google
large stream of data coming from a variety of sources. In this pa- per, we ... while simultaneously covering different regions of the world, pro- viding a good mix of ...

Contrasting Neurocomputational Models of Value ...
The discrimination between the two theories will be based on computational explo- ... my own except where explicitly stated otherwise in the text. This work has not .... 4.7 Performance of DFT and LCA in a choice setup with four options sim-.