Molecular Ecology (2013) 22, 5877–5889
Genetic signatures of natural selection in response to air pollution in red spruce (Picea rubens, Pinaceae) S T A N I S L A V B A S H A L K H A N O V , * 1 A N D R E W J . E C K E R T † and O M P . R A J O R A * *Faculty of Forestry and Environmental Management, University of New Brunswick, PO Box 44000, 28 Dineen Drive, Fredericton, NB E3B 5A3, Canada, †Department of Biology, Virginia Commonwealth University, Richmond, VA 23284, USA
Abstract One of the most important drivers of local adaptation for forest trees is climate. Coupled to these patterns, however, are human-induced disturbances through habitat modification and pollution. The confounded effects of climate and disturbance have rarely been investigated with regard to selective pressure on forest trees. Here, we have developed and used a population genetic approach to search for signals of selection within a set of 36 candidate genes chosen for their putative effects on adaptation to climate and humaninduced air pollution within five populations of red spruce (Picea rubens Sarg.), distributed across its natural range and air pollution gradient in eastern North America. Specifically, we used FST outlier and environmental correlation analyses to highlight a set of seven single nucleotide polymorphisms (SNPs) that were overly correlated with climate and levels of sulphate pollution after correcting for the confounding effects of population history. Use of three age cohorts within each population allowed the effects of climate and pollution to be separated temporally, as climate-related SNPs (n = 7) showed the strongest signals in the oldest cohort, while pollution-related SNPs (n = 3) showed the strongest signals in the youngest cohorts. These results highlight the usefulness of population genetic scans for the identification of putatively nonneutral evolution within genomes of nonmodel forest tree species, but also highlight the need for the development and application of robust methodologies to deal with the inherent multivariate nature of the genetic and ecological data used in these types of analyses. Keywords: air pollution, climate, environmental change, local adaptation, Picea rubens, signatures of natural selection Received 17 June 2013; accepted 17 September 2013
Introduction Understanding the genetic basis of adaptation in natural populations and the corresponding genetic effects of environmental and climatic change to patterns of locally adapted phenotypes is a fundamental goal in ecological genetics (Barrett & Hoekstra 2011). Forest tree populations are adapted to their local site and climate conditions (Savolainen et al. 2007), with this process primarily affected by the interaction between the homogenizing Correspondence: Om P. Rajora, Fax: 506-453-3538; E-mail: [email protected]
1 Present address: Maritime Provinces Higher Education Commission, 82 Westmorland Street, Suite 401, PO Box 6000, Fredericton, NB E3B 5H1, Canada © 2013 John Wiley & Sons Ltd
effects of gene flow and diversifying effects of natural selection. Despite the potential for extensive long distance gene flow, forest tree populations are locally adapted and are capable of quick adaptation and evolutionary response to selection pressures (Savolainen et al. 2007; Kremer et al. 2012). For example, forest trees exhibit strong geographical variation in fitness-related phenotypic traits that is often correlated with environmental variables, which is evidence for local adaptation (Endler 1986). The identification of the genes underlying these locally adapted phenotypes, however, has been elusive. With the recent development of genomewide resources (e.g. Parchman et al. 2012; Birol et al. 2013; Nystedt et al. 2013), especially single nucleotide polymorphisms (SNPs), there is increasing interest in identifying genes underlying local adaptation and the effects of
5878 S . B A S H A L K H A N O V , A . J . E C K E R T and O . P . R A J O R A environmental and climatic changes on natural plant populations (Neale & Kremer 2011). Patterns of local adaptation within natural plant populations are geographically based so that other geographically based patterns such as neutral population structure, historical demography and phenotypic plasticity can confound inference of local adaptation from surveys of genetic diversity (Gienapp et al. 2008; Barrett & Hoekstra 2011). Studying selection and adaptation in long-lived organisms, such as forest trees, faces additional challenges: overlapping generations that violate assumptions underlying many methods of inference, long reproduction cycles that limit the feasibility of controlled multi-generation selection and breeding experiments, and poorly annotated genomes further complicate the identification of genes affected by nonneutral processes. Despite these limitations, however, evidence for local adaptation is well developed for forest trees at both the phenotypic and molecular levels (Savolainen et al. 2007; Neale & Kremer 2011), and genomewide resources are under development for many forest tree species (e.g. Mackay et al. 2012). The existence of overlapping generations within natural populations of forest trees, however, can be utilized to provide unparalleled opportunities to examine the impacts of anthropogenic disturbances on relatively short timescales, especially when the local effective population size is large or the environment is highly heterogeneous in space and time (Lourenco et al. 2013). Global climate change and environmental pollution have become important driving forces of evolution in temperate and boreal ecosystems over the last century (Scholz & Bergmann 1984; Barnes et al. 1999; Longauer et al. 2001; Aitken et al. 2008). Red spruce (Picea rubens Sarg.) is an excellent system for investigating the genetic effects of air pollution within natural plant populations. The natural range of this species extends from eastern Canada throughout the northeastern United States (US) and into the southeastern United States along the mountaintops of the Appalachian Mountains. This species is sensitive to air pollution, and its southern high-elevation populations have been severely affected by air pollution in the second half of the 20th century (Siccama et al. 1982; Johnson 1983; Johnson & Siccama 1984; DeHayes et al. 1991; Dehayes & Hawley 1992). The northern populations have largely remained unaffected by air pollution. Exposure to high concentrations of sulphate in cloud water and higher frequency of cloud events at mountaintop locations are believed to be the primary contributors to stand damage in the southern portion of red spruce range. At high-elevation sites, exposure to acid clouds explains up to 59.5% of variance in annual tree growth, whereas climatic factors explain only 7.6% of this variance (Webster et al. 2004);
thus, pollution has had a clear effect on the survival components of fitness. There is a multiplicity of effects by air pollution on plants (Seyyednejad et al. 2011). At the molecular level, these effects revolve around general plant stress responses to reactive oxygen species (ROS; e.g. Li & Yi 2012) and, for sulphate specifically, involve the complex biochemical mechanisms required for sulphur assimilation and thiol metabolism (Yi et al. 2010). With regard to red spruce, exposure to sulphate depositions affects cold acclimation and winter dormancy mechanisms, which leads to extensive winter frost damage to the currentyear needles that in turn results in defoliation, reduced growth rates and ultimately tree death (DeHayes et al. 1991). High concentrations of sulphate in cloud fog damage cuticular wax in needles, leading to increased transpiration during the winter and subsequent drought stress (Esch & Mengel 1998). The same factors – frost damage and drought stress – are the two primary causes of seedling mortality in red spruce (Blum 1990). Much of the adaptive genetic response to anthropogenic sulphate deposition by red spruce populations thus could be confounded with adaptive responses to cold and drought stresses. To identify allele frequency changes associated with anthropogenic sulphate deposition, we investigated genetic variation, structure and divergence between three age classes sampled from within five natural populations of red spruce growing at different air pollution levels. We disentangled the effects of genetic structure, demography and climate-related geographical variation from that of air pollution through the use of amongcohort comparisons based upon SNPs from candidate genes and putatively neutral simple sequence repeats (SSRs). Specifically, we tested the prediction that age classes are expected to show differential correlations with pollution levels due to the fact that the oldest trees in the stand had passed the most sensitive developmental stage (seedling) long before the anthropogenic sulphate deposition reached its maximum in 1960–1970s and thus were likely less affected by air pollution, whereas younger trees had to survive under polluted conditions. Using an environmental association approach (see Vasem€ agi 2006), we identified three SNPs that were overly associated with sulphate pollution levels primarily in young trees. In addition, seven SNPs were overly associated with climate variables, including the three outliers with respect to pollution class, but significant correlations were primarily observed in old trees. Thus, we provide the first evidence for putative signatures of directional selection in response to anthropogenic air pollution in natural plant populations and discuss the broad implications of these results. © 2013 John Wiley & Sons Ltd
A I R P O L L U T I O N A N D L O C A L A D A P T A T I O N I N R E D S P R U C E 5879
Materials and methods Population sampling Five pristine natural old-growth populations of red spruce were sampled across the species range in areas receiving various levels of sulphate deposition (Fig. 1; Table S1, Supporting information). Stands without significant human intervention documented in the past were selected so that confounding effects of harvesting and silviculture activities could be avoided. The annual data for non-sea-salt sulphate deposition were obtained from the National Atmospheric Chemistry (NAtChem) Precipitation Chemistry Database (Ro et al. 2009). Although the absolute levels of non-sea-salt SO42 depositions varied over time, the high-elevation sites in Southern Appalachians consistently demonstrated lower pH values and higher SO42 contents in cloud water (Anderson et al. 2006). At the same time, a significant spatial and temporal variation in acid fog exposure and temperature regimes is expected within each site. It would be extremely difficult to account for all possible within-site variation for the environmental conditions. Thus, the WV, TN and NY red spruce populations were considered polluted, and the NB and NS populations served as unpolluted control.
Fig. 1 Sampling locations and the estimated non-sea-sulphate deposition levels (source: Canadian National Atmospheric Chemistry Database, Science and Technology Branch, Environment Canada). Populations are abbreviated as follows: TN (Mt. Clingmans Dome, Tennessee, 35˚35′N, 083˚28′W, 1800 m, polluted); WV (Gaudineer Knob, West Virginia, 38˚38′N, 079˚50′W, 1242 m, polluted); NY (Mt. Rusk, New York, 42˚12′N, 074˚15′ W, 1100 m, polluted); NB (Loch Alva Lake, New Brunswick, 45˚16′N, 66˚18′W, 150 m, nonpolluted); NS (Abraham Lake, Nova Scotia, 45˚09′N, 062˚36′W, 196 m, nonpolluted). © 2013 John Wiley & Sons Ltd
Within each population, 60 trees were sampled in each of the three age groups: (i) old trees (the oldest trees in the stand), (ii) young trees (2–4 cm DBH) and (iii) seedlings (7–12 cm tall). The approximate tree ages for the three classes were >100 years, 22–30 years and 2–4 years, respectively. Trees were randomly sampled with a minimum spacing of 50 m between individuals to minimize effects of family structure. Minimum spacing between trees was calculated based on the average seed dispersal distance (Blum 1990). One population of black spruce (Picea mariana Mill.) from Goose Bay, Labrador, Newfoundland, (NL) (53°17′N, 60°25′W) was included as a control to ascertain possible hybridization effects in the northern parts of red spruce range. This experimental design thus allows for the disentangling of signatures of air pollution-driven selection from the confounding effects of demography, population structure, hybridization and other evolutionary forces. Needles were collected from each of the total 900 red spruce trees sampled. Genomic DNA was extracted using a high-throughput magnetic fishing protocol (Bashalkhanov & Rajora 2008).
Discovery and genotyping of single nucleotide polymorphisms (SNPs) Expressed sequence tags (ESTs) for 96 candidate genes potentially involved in the adaptive response to cold and drought stresses were selected from the existing publications, conifer EST databases and GenBank (Namroud et al. 2008; Wegrzyn et al. 2008). Primer pairs for 63 genes yielded PCR amplicons in red spruce. Discovery of SNPs was performed on a test panel of 48 individuals randomly selected from range-wide collections of red spruce and 12 black spruce individuals. Fragments smaller than 400 bp were screened using the high-resolution melting assay on a LightScanner HR1 (Idaho Technology Inc., Salt Lake City, Utah). Larger fragments were assayed by EcoTILLING (McCallum et al. 2000) using the SniPer Eco-Mix Kit (Frontier Genomics, Auke Bay, Alaska). Polymorphic fragments were identified and then sequenced by Agencourt Bioscience (Beverly, Massachusetts). The resulting sequences were quality-trimmed with PHRED threshold score of 20 and aligned. Overall, 182 SNPs in 63 genes were identified, and 61 of these SNPs in 36 genes were selected for subsequent genotyping based on their meeting the criteria for multiplex assay for genotyping. SNP genotyping was carried out using the iPLEX Gold assay (Sequenom Inc., San Diego, California). Only allele calls with >99% confidence and loci with >95% successful calls were retained for the analysis. As a result, 33 SNP markers in 25 genes passed quality control and were polymorphic in the red spruce samples (Tables S2, S3,
5880 S . B A S H A L K H A N O V , A . J . E C K E R T and O . P . R A J O R A Supporting information). Annotations for these SNPs were derived from alignments between the sequences of amplicons used to discover the SNPs and the putative homologous candidate gene in Arabidopsis thaliana L. All of the sampled individuals were also genotyped using nine EST and genomic DNA-based microsatellite markers developed earlier in Rajora laboratory at Dalhousie University (SSRs; Table S4, Supporting information). These SSRs were used as a reference set of neutral markers with which to assess the rate of false positives for tests of neutrality and environmental correlations.
Data analysis Genetic diversity and population structure. Standard metrics of genetic diversity were calculated for the SSR and SNP genotype data for populations and age classes classified into populations – observed (Hobs) and expected (Hexp) heterozygosity, and Wright’s F-statistics. Population structure was assessed in two ways. First, hierarchical F-statistics were estimated using the variance components method of Yang (1998), as implemented in the hierfstat library (Goudet 2005) available in the R computing environment (R Development Core Team 2012). Age classes were classified into populations, and populations were classified into pollution classes (polluted vs. nonpolluted). Analyses were conducted separately for SSRs and SNPs and included only populations of red spruce. Second, overall population structure was assessed using the Bayesian clustering algorithm available in the program STRUCTURE, version 2.3.1 (Falush et al. 2003). The number of populations (K) was assumed to range from 2 to 10, and for each value of K, five independent runs of the Markov Chain Monte Carlo (MCMC) sampler were initiated for 1.0 9 106 steps after a burn-in of 1.0 9 105 steps. For all analyses, trees from all age classes, only SSR data, the control population of black spruce, and the admixture model with correlated allele frequencies, were used. Resulting admixture coefficients (q-values) for each value of K were averaged across replicated runs using the program CLUMPP, ver. 1.1.2 (Jakobsson & Rosenberg 2007), and visualized using standard graphical functions in R. The best value of K was assumed to be the number of observed populations (K = 6), including the control population of black spruce, although inspection of the plot of the log-probability of the data against K and use of the DK method (Evanno et al. 2005) were also employed. Tests of neutrality and environmental correlations. Two forms of outlier tests were employed to assess the nonneutrality of SNP markers. First, we employed a
hierarchical analysis of FST outliers following the logic of Excoffier et al. (2009) using the SNP data. We employed a hierarchical island model of population structure where age classes were classified into populations and populations were classified into pollution classes (i.e. two populations were placed within the nonpolluted class, and three populations were placed within the polluted class). Parameters for the hierarchical island model were estimated pairwise for each level in the hierarchy from the equilibrium relationship between FST and the migration rate using the SSR data after which 1.0 9 106 SNPs were simulated using ms (Hudson 2002). The estimated migration rates used for the simulations were further adjusted until the multilocus FST for the simulated data did not differ from the observed value for the SSRs by more than 5%. For each simulated data set, hierarchical fixation indices were estimated using the hierfstat library in R. The sets of 1.0 9 106 hierarchical fixation indices were used as null distributions to identify outliers for the observed hierarchical fixation indices at a two-tailed significance threshold of P = 0.05. Second, we employed an environmental correlation approach that incorporates population structure covariates to account for the confounding issue of population history on genotype–environment correlations (see Vasem€ agi 2006). The general framework employed follows the logistic regression approach given by Joost et al. (2007), but incorporates population structure covariates. Thus, a logistic regression model was constructed for each SNP, where the count of the minor allele for each individual was the response variable and admixture coefficients (xk = 1 through xk = 5) from the program STRUCTURE, pollution classes (xpollution) and a climate variable (PC1 from a PCA using 19 bioclimatic variables, xclimate) were the predictor variables (i.e. the full model was: logit P = b0 + b1xk = 1 + b2xk = 2 + b3xk = 3 + b4xk = 4 + b5xk = 5 + b6xpollution + b7xclimate). Note that one of the clusters from the STRUCTURE analysis was dropped, as the q-values for an individual tree sum to one. The effect of pollution classes on allele counts for each SNP was assessed using a likelihood ratio test (LRT) comparing a logistic regression model with STRUCTURE admixture coefficients and pollution class as predictors to a model with only STRUCTURE admixture coefficients as predictors. The large-sample approximation to the LRT based on the chi-square distribution with one degree of freedom was used to obtain a P-value for each SNP. Separate tests for each SNP were conducted for all trees and for trees grouped by age class. Multiple tests were accounted for using a minimum P-value permutation approach utilizing the SSR data as neutral controls. Specifically, pollution class was randomized with respect to SSR genotypes and STRUCTURE © 2013 John Wiley & Sons Ltd
A I R P O L L U T I O N A N D L O C A L A D A P T A T I O N I N R E D S P R U C E 5881 admixture coefficients 10 000 times. For each of the randomized data sets, 33 SSR alleles (i.e. coded as 0, 1 and 2 genotypes based on counts of the allele in each individual) were selected randomly and tested for effects due to pollution class as described previously. The minimum P-value for the set of 33 tests for each of the 10 000 randomized data sets was retained, and the 5% quantile of this set of 10 000 minimum P-values was used as the multiple-test corrected significance level. The same permutation analysis was conducted using the SNP data only, and the multiple-test corrected significance level was similar to that of the SSR alleles (results not shown). Lastly, we used the previously described framework to also test for the effects of climate on allele counts for each SNP. Climate data were obtained from generic gridfiles (2.5 arc-minute resolution), available at the WorldClim database website (http://www.worldclim. org/current), containing interpolated values for 19 bioclimatic variables. Given the latitude and longitude of each population (Table S1, Supporting information), the raster and extract functions from the raster library in R were used to retrieve population-specific climate data. Trees from the same population were assigned the same climate data, and environmental correlations were identified as described previously for pollution classes using LRTs. The 19 bioclimatic variables were first subjected to principal components analysis (PCA) to reduce dimensionality using the prcomp function in R, and scores on the first principal component (PC), which accounted for 62.85% of the total variance, were used in the downstream analyses. This PC was comprised equitably of most bioclimatic variables, with 13 of the 19 bioclimatic variables having absolute values of loadings on this PC between 0.20 and 0.30 (Fig. S1, Supporting information). All tests were conducted using the glm function assuming a binomial error distribution and a logit link function as employed in the stats library of R. Population structure covariates were obtained from the STRUCTURE results assuming K = 6, with one of the six clusters being dropped randomly conditional on it not being the cluster identified as corresponding to introgression from black spruce. Simulations were used to assess the ability of the aforementioned method to correct for the effects of population structure. Specifically, we simulated 1000 sets of 33 SNPs in the program ms assuming 4Neu = 2 (per locus) and an island model comprised of five populations where 4Nem = 27. The value of 4Nem was selected to match that expected for the Fpopulation,total listed in Table 1 for the SNPs at mutation–drift equilibrium. From each of the five populations, 50 diploid individuals were sampled. For each of the simulated data sets, a STRUCTURE analysis was conducted assuming K = 5. © 2013 John Wiley & Sons Ltd
Table 1 Multilocus variance components and hierarchical population structure with regard to pollution class for red spruce populationsa. Bootstrap confidence (95%, n = 10 000 replicates across loci) intervals are located in parentheses Hierarchical fixation index
Fpopulation,pollution Fage class,pollution
0.0091 (0.0021–0.0211) 0.0651 (0.0349–0.1355)
0.01374 (0.0043–0.0251) 0.0147 (0.0054–0.0254) 0.0010 (0.0005–0.0026)
Estimates exclude the SNP RPRS13b located in a chloroplast locus due to the haploid and nonrecombining nature of the chloroplast genome.
Climate data were then assigned to each of the five populations based on the observed values from the original data set (i.e. observed climate values for the three polluted populations were assigned randomly to the three simulated polluted populations). For each set of 33 SNPs, we counted the number of genotype–environment tests (i.e. the effect of pollution corrected for climate), conducted as described previously with and without population structure corrections in the model, where the observed P-value was less than the multipletest corrected value. This count divided by the number of tests (i.e. 33) was used as the false-positive rate. The mean of the simulated distribution of this false-positive rate was used as the estimate of the false-positive rate.
Results We investigated genetic variation between three age classes within five populations of red spruce growing at different air pollution levels. Genetic variation and population structure were assessed using nine nuclear microsatellites and 33 SNPs sampled from five populations, each with three age classes, classified into polluted (TN, WV, NY) vs. unpolluted (NS, NB) populations. Two forms of neutrality tests were employed using the 33 SNPs – (i) an outlier analysis of hierarchical population structure and (ii) logistic regression models describing correlations of allele counts for each of the SNPs with pollution class.
Genetic diversity and population structure Expected heterozygosity (Hexp) across SSR markers within populations ranged from 0.000 to 0.898, with a mean (1 standard deviation [SD]) across loci and populations of 0.521 (0.329). For SNPs, Hexp ranged from 0.005 to 0.429, with a mean (1 SD) of 0.141 (0.132)
5882 S . B A S H A L K H A N O V , A . J . E C K E R T and O . P . R A J O R A across loci and populations. Expected heterozygosity across loci and marker types exhibited trends across age classes (Kruskal–Wallis tests: P < 0.05), with heterozygosity increasing with age, but not among populations (Kruskal–Wallis tests: P > 0.50) nor between pollution classes (Wilcox rank sum tests: P > 0.25), although the means within pollution classes did decrease for polluted sites for SSRs (polluted: 0.579, unpolluted: 0.618), but not SNPs (polluted: 0.140, unpolluted: 0.140). There was, however, an increased abundance of monomorphic SNPs in the polluted stands (Fig. S2, Supporting information). In general, observed heterozygosities (Hobs) were lower than expected based upon observed allele frequencies and Hardy–Weinberg equilibrium (Table S5, Supporting information). Trends similar to those for Hexp were also observed for Hobs across age classes, populations and pollution classes. As for Hexp, this resulted in overall values of Wright’s F-statistic that were positive. Values of F, however, were not significantly different across age classes, populations or pollution classes (Kruskal–Wallis tests and Wilcox rank sum tests: P > 0.20). An analysis of population structure based on hierarchical, multilocus F-statistics, revealed substructuring at
the level of pollution classes, populations and age classes (F-statistics: 0.0157–0.0798; Table 1, see also Table S6 (Supporting information) for variance components). Within pollution classes, populations and age classes were also substructured (F-statistics: 0.0091–0.0651; Table 1), as were age classes classified into populations (F-statistics: 0.0010–0.0566; Table 1). Most hierarchical F-statistics for both marker types were significantly greater than zero, with only Fpopulation,pollution (SSRs) and Fage class,population (SNPs) having their 95% bootstrap confidence intervals include zero (Table 1). Magnitudes of these indices were similar across marker types, although indices were approximately 1.4-fold larger for SNPs relative to SSRs at the level of pollution classes and populations and approximately 2.5-fold larger for SSRs relative to SNPs at the level of age classes. Regardless of marker type, these analyses were consistent with population substructuring at the level of pollution class, populations and age classes. Analysis of population structure using only SSRs and the Bayesian clustering algorithm employed in the program STRUCTURE revealed qualitatively similar patterns to those observed from the hierarchical F-statistics (Fig. 2). Most individuals were admixed within populations and age classes, with the exception being
Fig. 2 Patterns of population structure as assessed using nine SSRs and the Bayesian clustering algorithm in the program STRUCTURE reveal moderate substructuring of allele frequencies among pollution classes, populations and age classes, as well as introgression of black spruce alleles into nonpolluted populations (NB & NS). The reference black spruce population is labelled as NL in each panel. © 2013 John Wiley & Sons Ltd
A I R P O L L U T I O N A N D L O C A L A D A P T A T I O N I N R E D S P R U C E 5883 the old age class in the TN population. This pattern was apparent for all values of K and is expected for the level of structure revealed using hierarchical F-statistics. Inspection of the plot of K vs. DK revealed that the optimal value of K was at K = 2 (Fig. S3, Supporting information), which corresponded roughly to polluted vs. nonpolluted populations, although the plot of K vs. the likelihood levelled off closer to K = 6 (Fig. S1, Supporting information). This disparity is expected under scenarios of hierarchical population structure (Evanno et al. 2005), which is again the likely scenario for the sampled red spruce populations based on hierarchical F-statistics. Inclusion of a pure black spruce population (NL), moreover, indicated that, at all values of K, red spruce trees from the two populations comprising the nonpolluted sites (NB and NS) contained admixed trees with black spruce at a higher frequency than populations sampled at the polluted sites (TN, WV and NY). This was also observed in the patterns of linkage disequilibrium between SNPs, with most values of r2 larger than 0.20 being observed in these two, nonpolluted populations (Fig. S2, Supporting information). Within the polluted populations, LD was virtually nonexistent. Thus, admixture with black spruce was confounded with classification of sites as polluted vs. nonpolluted and may have partly contributed to allele frequency differences between these site categories. However, much of this confounded admixture effect was disentangled
by examining allele frequency changes among three age cohorts within populations (see below).
Tests of neutrality and environmental correlations Simulations under a hierarchical island model of population structure were used to assess magnitudes of hierarchical F-statistics as outliers (Table 2, Fig. 3). In general, these outliers had hierarchical F-statistics that were 3-fold or greater than the average across loci and the average predicted from simulations. A total of four unique SNPs were outliers for Fpopulation,total: RPRS64a, RPRS76b, RPRS77a and RPRS81b. Analysis of the correlation between pollution classes and allele counts for each of the 33 SNPs while accounting for population structure revealed 21 SNPs for which the P-value associated with a likelihood ratio test comparing models with and without effects due to pollution was less than a = 0.05 (Table 2). This is 12.7-fold larger than the expected number of false positives at a = 0.05 (21 observed vs. 1.65 expected). Application of similar tests considering one age class at a time resulted in 6 (seedlings), 9 (mature) and 12 (old) tests significant using a = 0.05. Again, these were 3.7- to 7.2-fold larger than the expected number of false positives using a = 0.05. In comparison, applications of similar models to SSR alleles (n = 121) resulted in 12 or less significant tests for all trees and trees segregated by age class. This is at most
Table 2 Summary of outliers for hierarchical population structure, correlations to pollution class and correlations to climate. Bolded values are significant using an a = 0.05, which was corrected for multiple tests for the correlation analyses (see Materials and Methods). Values are either FST (population structure) or P-values associated with a likelihood ratio test of models with and without the listed effect (correlation analyses) RPRS13b* Population structure NA Fpopulation,total Pollution correlations All 0.008 Seedling 0.068 Mature 0.015 Old 0.127 Climate correlations All 2.11e-13 Seedling 0.066 Mature 3.64e-05 Old 1.91e-09 Pollution | Climate correlations† All 0.134 Seedling 0.095 Mature 0.241 Old 0.308
2.31e-07 1.01e-04 5.93e-04 0.125
0.002 0.096 0.201 0.016
3.55e-06 3.90e-04 0.062 0.032
0.005 0.041 0.001 0.012
1.45e-09 1.34e-06 1.85e-04 0.272
0.847 0.361 0.463 0.585
2.85e-07 0.111 0.002 2.13e-04
4.55e-13 8.22e-05 8.62e-05 3.68-05
1.75e-07 0.088 0.078 1.83e-05
7.70e-05 0.036 0.002 0.003
8.74e-10 0.003 1.00e-04 6.84e-06
9.75e-06 0.034 0.014 0.001
5.33e-05 9.09e-05 0.004 0.441
0.389 0.446 0.675 0.765
9.34e-04 8.84e-05 0.322 0.289
0.115 0.237 0.054 0.267
3.01e-06 7.76e-05 0.009 0.560
0.660 0.422 0.580 0.712
*This SNP is located in a chloroplast locus and was thus excluded from FST analyses due to the haploid and nonrecombining nature of the chloroplast genome. † This represents a test of a model with both climate and pollution relative to a model with just climate. © 2013 John Wiley & Sons Ltd
0.12 0.08 0.04 0.00
Hierarchical F-statistic (Fpopulation,total)
5884 S . B A S H A L K H A N O V , A . J . E C K E R T and O . P . R A J O R A
Expected heterozygosity (He/[1-Fpopulation,total]) Fig. 3 Outlier SNPs with respect to hierarchical population structure as defined using a hierarchical island model parameterized using the SSR data.
only 2.0-fold larger than the expected number of false positives. Thus, there is an enrichment of signals for correlations to pollution class in the SNP data relative to the SSR data. To correct for multiple tests, we employed a minimum P-value approach based on randomization and subsampling of SSR alleles (see Materials and Methods). After such corrections, 3, 3, 2 and 0 SNPs remained significant for tests involving all, seedling, mature and old trees, respectively (Table 2). The significant SNPs were RPRS34a (all, seedling and mature trees), RPRS76b (all and seedling trees) and RPRS81b (all, seedling and mature trees) and were located in genes encoding 4coumarate-CoA ligase 2 (RPRS34a), heat-shock protein (RPRS76b) and an ascorbate peroxidase (RPRS81b). Two of these three SNPs (RPRS76b and RPRS81b) were also outliers with respect to Fpopulation,total (Fig. 3, Table 2). All three of these SNPs were noncoding, with two located in 3′UTR positions (RPRS34a and RPRS81b) and the other being intronic (RPRS76b), Thus, SNPs significantly affected by pollution class, after correcting for population structure and multiple tests, were primarily observed in seedling and mature trees. The method employed, moreover, appears to control quite well for false positives, as simulated data achieved a false-positive rate close to a = 0.05 (Fig. S4, Supporting information). Many of the candidate genes for response to pollution were also good candidates for response to climate, and pollution levels were confounded with climate differences (Wilcox rank sum test: P < 0.05). We also searched therefore for environmental correlations with climate variables. Application of logistic regression models predicting SNP allele counts using population structure covariates and/or climate, as described previously for
pollution classes, resulted in 23 SNPs for which the P-value associated with a likelihood ratio test comparing models with and without effects due to climate was less than a = 0.05. This is 13.9-fold larger than the expected number of false positives at a = 0.05 (23 observed vs. 1.65 expected). Application of similar tests considering one age class at a time resulted in 3 (seedlings), 9 (mature) and 11 (old) tests significant using a = 0.05. Again, these were 5.5- to 6.7-fold larger than the expected number of false positives using a = 0.05. In comparison, applications of similar models to SSR alleles (n = 121) resulted in six or less significant tests for all trees and trees segregated by age class. Thus, there is an enrichment of signals for correlations to climate in the SNP data relative to the SSR data. Of the 23 tests with P < 0.05, seven remained significant after multiple test corrections (Table 2). Of these seven SNPs, six were located in noncoding positions (five in UTRs and one in an intron), while one was located in a nonsynonymous position. The latter produced a proline– histidine polymorphism within an MYB transcription factor (RPRS77a). Four of these seven SNPs were also the four outliers with respect to Fpopulation,total. Notably, the effects across age classes were reversed with respect to pollution class, so that most of the significant effects were observed in old trees and the number of significant correlations decreased in mature and seedling trees. Of the seven outliers with respect to climate correlations, three of them were the outliers with respect to pollution class. The effects at these SNPs, however, were reversed with respect to age class, and their effects remained significant after accounting for climate (Table 2).
Discussion We have provided evidence of spatially variable selection due to anthropogenic sulphate deposition in natural red spruce stands. Specifically, we identified three SNPs that are very strongly associated with pollution class relative to neutral SSRs for young trees. This association remained even after accounting for the confounding effects of climate. These three SNPs were also associated with climate, although the association was primarily observed for old trees. Four additional SNPs were associated with climate, again primarily for old trees. These results are informative about the operation of selection in natural tree populations and to the conservation of red spruce populations in the face of continued pollution and climate change.
Anthropogenic pollution, climate change and the fate of red spruce populations Red spruce is adapted to moist and cool conditions, as reflected in its increasingly high-elevation habitat at © 2013 John Wiley & Sons Ltd
A I R P O L L U T I O N A N D L O C A L A D A P T A T I O N I N R E D S P R U C E 5885 southern latitudes. It also exhibits relatively little phenotypic variation in traits facilitating a large niche breadth (Morgenstern et al. 1981) and has relatively low genetic diversity within populations (Hawley & DeHayes 1994; Rajora et al. 2000). Predicted climate change will alter the availability of this habitat type at southern latitudes, suggesting local extirpation of mountaintop populations once local migration is unfeasible (Aitken et al. 2008). In addition, the area of most stability and potential for migration to track changing climate variables northward is overlapping with that of black spruce, where hybridization between the two species is common (Morgenstern & Farrar 1964; Gordon 1976; Fig. 2). As shown here, however, loci within the genome of red spruce have responded to climate change historically, and sulphate pollution more recently, despite its relatively low phenotypic and neutral genetic diversity that suggested this would be unlikely. Similar results were found for populations of sapphire cress (Boechera fecunda (Rollins) Dorn.), which were locally adapted to climate despite low marker diversity (McKay et al. 2001). This is expected theoretically because the probability of local adaptation depends upon the interaction of the local effective population size (Ne), which can be much larger than the range-wide and long-term Ne (see Karasov et al. 2010), and the strength of local selection pressures as they affect the genes underlying the adaptive trait. The evolutionary fate of red spruce is complicated by different evolutionary dynamics across its range. In the north, adaptation and migration to changing climates is confounded with hybridization with black spruce, which can be a potent evolutionary driver (Lewontin & Birch 1966), while in the south, suitable habitats are restricted to increasingly higher elevations thus promoting smaller populations that are increasingly isolated. Continued environmental pollution exerts additional pressure on southern populations. Genetic effects of air pollution within natural populations have been studied in several boreal and temperate forest tree species with contradictory results (e.g. Scholz & Bergmann 1984; Bergmann & Scholz 1987; Geburek et al. 1987; Barnes et al. 1999; Longauer et al. 2001, 2004). The present study, however, clearly identified three SNPs in three genes as putatively involved in acidification stress response in red spruce populations. Unlike previous studies, we monitored allele frequencies at candidate loci in pre- and postselection age cohorts as opposed to un-stratified populations. More work is needed, however, to better understand how allele frequencies at these loci will be affected in the context of the demographic changes that will accompany responses by these populations to continued climate change. © 2013 John Wiley & Sons Ltd
Evolutionary genetics of local adaptation for red spruce Here, we have documented climate, as represented by the first PC from a PCA using 19 bioclimatic variables (Hijmans et al. 2005), as an important driver for local adaptation among populations of red spruce. The seven SNPs correlated with this climate PC reside in genes with functions related to gene regulation (MYB transcription factor, RPRS77a), molecular chaperone activity (heatshock protein, RPRS76b), stress-induced calcium signalling (calmodulin-like protein, RPRS64a), regulation of signalling pathways (terminal-flower like protein, RPRS95c), peroxidase activity (ascorbate peroxidase, RPRS81b), lignin and flavonoid biosynthesis (4-coumarate–CoA ligase 2 (4CL), RPRS34a) and chlorophyll biosynthesis (light-independent protochlorophyllide reductase, RPRS13b). Interestingly, light-independent protochlorophyllide reductase is encoded in the chloroplast genome, and its identification as putatively nonneutral is relatively novel (see Budar & Roux 2011). Polymorphisms in genes with these molecular functions tend to be correlated with environmental variables in model plants and forest trees (Abe et al. 1997; Amasino 2010; Lindberg et al. 2012), which lends credence to their identification here as climate responsive. A definitive interpretation of these functions as climate responsive, however, would be premature without further experimental evidence (Barrett & Hoekstra 2011; Pavlidis et al. 2012). A subset of these SNPs was also associated with pollution levels in young trees, which is consistent with the onset of anthropogenic sulphate deposition (Anderson et al. 2006). Specifically, RPRS34a, RPRS76b and RPRS81b were associated with pollution levels, which encoded a 4CL protein, a heat-shock protein and an ascorbate peroxidase, respectively. The molecular functions of these proteins are sensible for responses to sulphate deposition. Ascorbate peroxidases are wellestablished parts of the antioxidant system that controls reactive oxygen species (ROS) in plants. As an example, the activity and expression of cytosolic ascorbate peroxidase were found to be enhanced in response to sulphur dioxide exposure in Arabidopsis thaliana (Kubo et al. 1995). In addition, heat-shock proteins (HSPs) are molecular chaperons that play a key role in maintaining cellular homeostasis under optimal and stressful conditions (Wang et al. 2004). Abiotic stresses, such as those caused by sulphate pollution, can affect protein stability and HSPs help in the stabilization of proteins and membranes. For example, significant increase in HSP expression was observed in Pinus pinaster Ait. in response to environmental pollution (Acquaviva et al. 2012). Therefore, the observation that SNPs within ascorbate peroxidase and HSP genes in red spruce are correlated with air
5886 S . B A S H A L K H A N O V , A . J . E C K E R T and O . P . R A J O R A pollution is consistent with the role these proteins play in the abiotic stress response. Correlation of 4CL with air pollution is also consistent with its functional biological role in phenylpropanoid pathway, which leads to the synthesis of phytochemicals essential for the survival of plants. Of most interest, however, is the observation that the three SNPs found associated with sulphate pollution are also associated with climate in older trees, which points to a model of selection upon standing variation. In support of this hypothesis, the standardized allele qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ 2Ni frequency deviation (SAFD ¼ ðpi pg Þ pg ð1p , where gÞ Ni is the sample size for the young cohort in population i, pg is the global allele frequency for minor allele for old cohorts across all populations, pi is the allele frequency for the minor allele in the young cohort of population i; see Christiansen et al. (1976) for these three SNPs changed significantly in younger trees relative to the global allele frequencies in old trees, especially in the polluted populations. The direction of allele frequency change, moreover, was often in opposite directions between polluted and nonpolluted populations, yet in the same direction across populations within either category (results not shown). The sole exception was SNP RPRS76b, which exhibited a strong clinal pattern for the frequency of the minor allele. When taken together with the results illustrated in Table 2, these results are consistent with patterns expected from local adaptation via divergent selection upon standing variation. As the candidate loci for SNP screening were selected from genes presumably involved in cold and drought stress response, it is possible that natural selection imposed by climate fluctuations after the last glacial maximum had increased allele frequencies at these loci to appreciable frequencies, so that when sulphate deposition was initiated, they were able to provide the necessary genetic variance upon which selection could act. Such models are commonly used to explain evolutionary responses by quantitative traits to selection (Barrett & Schluter 2008) and should be common evolutionary responses by tree populations (Savolainen et al. 2007). More work, however, is needed to understand fully the complex intersection of genetic architecture affected by selection due to multiple causes. The use of multiple age cohorts, though, is a promising start to understand this complexity.
Limitations and future directions Although the results presented here are sensible, strongly supported, and are logically consistent based on theoretical expectations, there are several limitations to the analyses. First, a small number of markers were used to
discover outliers. This limits the strength of conclusions that can be made about the full genetic architecture of adaptive responses to environmental pressures by red spruce. Second, a small number of populations were sampled, and although range-wide, this number limits the ability to draw general inferences about responses to spatially varying selection pressures (de Mita et al. 2013). Third, we utilized two different classes of markers that differ in levels of standing genetic diversity as the result of differing mutation rates. This could bias results due to differing levels of power in genotype–environment correlations (but see Materials and Methods – in order to minimize this bias, microsatellite alleles were coded as 0, 1, 2 depending upon the count of alleles similar to that for SNP alleles), homoplasy and biases when applying expectations of FST as a function of heterozygosity. Use of SSR markers, however, has been demonstrated to be more powerful than SNPs for descriptions of population structure and demography (Hamblin et al. 2007), which is quite central to this study. Furthermore, the observed patterns here illustrate that FST was similar between SSRs and SNPs. One exception, however, was the F-statistic for age classes classified into populations (Fage class,population, see Table 1), which was higher than among-population differentiation for the SSRs. This could reflect some form of natural selection, within-stand demography change or a combination of both, but further work would be needed to explain this pattern. Lastly, the structure of red spruce’s range makes detection of outliers difficult as population structure and environmental gradients, including changing climates and pollution levels, are confounded across latitude (see Eckert et al. 2010; Holliday et al. 2010), which suggests that future work should carefully consider sampling design to decouple these confounding factors as much as is feasible (Sork et al. 2013). The vast majority of these limitations can be addressed in future work, especially given that conifer genomics has progressed to the point of draft genomes and genomewide data sets (Parchman et al. 2012; Birol et al. 2013; Nystedt et al. 2013). As such, systems of natural populations amendable to asking and answering fundamental questions within evolutionary biology will move to the forefront of empirical research. Red spruce is one such system with which to tease apart the complex interactions of multiple sources of selection, and as such, we have established that natural forest tree populations are likely adapting to rapid environmental change and anthropogenic pollution.
Acknowledgements The research was funded by the Canada Research Chair Program (CRC950-201869) funds and the Natural Sciences and © 2013 John Wiley & Sons Ltd
A I R P O L L U T I O N A N D L O C A L A D A P T A T I O N I N R E D S P R U C E 5887 Engineering Research Council of Canada Discovery Grant RGPIN 170651 to O.P. Rajora. S. Bashalkhanov was supported by the University of New Brunswick start-up funds provided to O.P. Rajora and a Canadian Forest Service graduate student’s supplemental stipend. David DeKoeyer (Agriculture and Agri-Food Canada) provided access to the Idaho Technology LightScanner instrument for mutation screening. Sulphate deposition data and maps used in this publication were provided by the Canadian National Atmospheric Chemistry (NAtChem) Database (Science and Technology Branch, Environment Canada). The results reported in this manuscript are based on a part of the Ph.D. thesis of Stanislav Bashalkhanov supervised by the Principal Investigator Om P. Rajora. We thank Loren Rieseberg and anonymous reviewers for their valuable comments and suggestions on the previous version of this manuscript.
References Abe H, Yamaguchi-Shinozaki K, Urao T et al. (1997) Role of Arabidopsis MYC and MYB homologs in drought- and abscisic acid-regulated gene expression. The Plant Cell, 9, 1859– 1868. Acquaviva R, Venella L, Sorrenti V et al. (2012) Biochemical modifications in Pinus pinaster Ait. as a result of environmental pollution. Environmental Science and Pollution Research, 19, 3850–3855. Aitken SN, Yeaman S, Holliday JA, Wang T, Curtis-Mclane S (2008) Adaptation, migration or extirpation: climate change outcomes for tree populations. Evolutionary Applications, 1, 95–111. Amasino R (2010) Seasonal and developmental timing of flowering. The Plant Journal, 61, 1001–1013. Anderson JB, Baumgardner RE, Sandra EG (2006) Trends in cloud water sulfate and nitrate as measured at two mountain sites in the Eastern United States: regional contributions and temporal changes compared with regional changes in emissions, 1986-1999. Atmospheric Environment, 40, 4423–4437. Barnes J, Bender J, Lyons T, Borland A (1999) Natural and man-made selection for air pollution resistance. Journal of Experimental Botany, 50, 1423–1435. Barrett RDH, Hoekstra H (2011) Molecular spandrels: tests of adaptation at the genetic level. Nature Reviews Genetics, 12, 767–780. Barrett RDH, Schluter D (2008) Adaptation from standing genetic variation. Trends in Ecology and Evolution, 23, 38–44. Bashalkhanov S, Rajora OP (2008) Protocol: a high-throughput DNA extraction system suitable for conifers. Plant Methods, 4, 20. Bergmann F, Scholz F (1987) The impact of air pollution on the genetic structure of Norway spruce. Silvae Genetica, 36, 80– 83. Birol I, Raymond A, Jackman SD et al. (2013) Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics, 29, 1492–1497. doi:10.1093/bioinformatics/btt1/78. Blum BM (1990) Red spruce. In: Burns RM, Honkala BH (tech. coords.) Silvics of North America, vol 1, pp, 250–259. Agric. Handb. No. 654. USDA For. Serv., Washington, DC. Budar F, Roux F (2011) The role of organelle genomes in plant adaptation. Plant Signaling and Behavior, 6, 635–639.
© 2013 John Wiley & Sons Ltd
Christiansen FB, Frydenberg O, Hjorth JP, Simonsen V (1976) Genetics of Zoarces populations. IX. Geographic variation at the three phosphoglucomutase loci. Hereditas, 83, 245–256. Dehayes DH, Hawley GJ (1992) Genetic implications in the decline of red spruce. Water Air and Soil Pollution, 62, 233–248. DeHayes DH, Thornton FC, Waite CE, Ingle MA (1991) Ambient cloud deposition reduces cold tolerance of red spruce seedlings. Canadian Journal of Forest Research, 21, 1292–1295. Eckert AJ, van Heerwaarden J, Wegrzyn JL et al. (2010) Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae). Genetics, 185, 969–982. Endler JA (1986) Natural Selection in the Wild. Princeton University Press, Princeton, NJ. Esch A, Mengel K (1998) Combined effects of acid mist and frost drought on the water status of young spruce trees (Picea abies). Environmental and Experimental Botany, 39, 57–65. Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology, 14, 2611–2620. Excoffier L, Hofer T, Foll M (2009) Detecting loci under selection in a hierarchically structured population. Heredity, 103, 285–298. Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics, 164, 1567–1587. Geburek T, Scholz F, Knabe W, Vornweg A (1987) Genetic studies by isozyme gene loci on tolerant and sensitivity in an air polluted Pinus sylvestris field trial. Silvae Genetica, 36, 49–53. Gienapp P, Teplitsky C, Alho JS, Mills JA, Merila J (2008) Climate change and evolution: disentangling environmental and genetic responses. Molecular Ecology, 17, 167–178. Gordon AG (1976) The taxonomy and genetics of Picea rubens and its relationship to Picea mariana. Canadian Journal of Botany, 54, 781–813. Goudet J (2005) Hierfstat, a package for R to compute and test hierarchical F-statistics. Molecular Ecology Notes, 5, 184–186. Hamblin MT, Warburton ML, Buckler ES (2007) Empirical comparison of simple sequence repeats and single nucleotide polymorphisms in assessment of maize diversity and relatedness. PLoS ONE, 2, e1367. Hawley GJ, DeHayes DH (1994) Genetic diversity and population structure of red spruce (Picea rubens). Canadian Journal of Botany, 72, 1778–1786. Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965–1978. Holliday JA, Ritland K, Aitken SA (2010) Widespread, ecologically relevant genetic markers developed from association mapping of climate-related traits in Sitka spruce (Picea sitchensis). New Phytologist, 188, 501–514. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics, 18, 337–338. Jakobsson M, Rosenberg NA (2007) CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics, 23, 1801–1806. Johnson AH (1983) Red spruce decline in the Northwestern U.S.: hypothesis regarding the role of acid rain. Journal of the Air Pollution Control Association, 33, 1049–1054.
5888 S . B A S H A L K H A N O V , A . J . E C K E R T and O . P . R A J O R A Johnson AH, Siccama TG (1984) Decline of red spruce in northern Appalachians: assessing the possible role of acid deposition. TAPPI Journal, 67, 68–72. Joost S, Bonin A, Bruford MW et al. (2007) A Spatial Analysis Method (SAM) to detect candidate loci for selection: towards a landscape genomics approach to adaptation. Molecular Ecology, 16, 3955–3969. Karasov T, Messer PW, Petrov DA (2010) Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLoS Genetics, 6, e1000924. Kremer A, Ronce O, Robledo-Arnuncio JJ et al. (2012) Long-distance gene flow and adaptation of forest trees to rapid climate change. Ecology Letters, 15, 378–392. Kubo A, Saji H, Tanaka K, Kondo N (1995) Expression of Arabidopsis cytosolic ascorbate peroxidase gene in response to ozone or sulfur dioxide. Plant Molecular Biology, 29, 479–489. Lewontin RC, Birch LC (1966) Hybridization as a source of variation for adaptation to new environments. Evolution, 20, 315–336. Li L, Yi H (2012) Effect of sulfur dioxide on ROS production, gene expression and antioxidant enzyme activity in Arabidopsis plants. Plant Physiology and Biochemistry, 58, 53–56. Lindberg S, Kader MDA, Yemelyanov V (2012) Calcium signaling in plant cells under environmental stress. In: Environmental Adaptations and Stress Tolerance of Plants in the Era of Climate Change(eds Ahmad P & Prasad MNV), pp. 325–360. Springer, NY, USA. Longauer R, Gomory D, Paule L et al. (2001) Selection effects of air pollution on gene pools of Norway spruce, European silver fir, and European beech. Environmental Pollution, 115, 405–411. Longauer R, Gomory D, Paule L et al. (2004) Genetic effects of air pollution on forest tree species of the Carpathian Mountains. Environmental Pollution, 130, 85–92. Lourenco JM, Glemin S, Galtier N (2013) The rate of molecular adaptation in a changing environment. Molecular Biology and Evolution, 30, 1292–1301. Mackay J, Dean JF, Plomion C et al. (2012) Towards decoding the conifer giga-genome. Plant Molecular Biology, 80, 555–569. McCallum CM, Comai L, Greene EA, Henikoff S (2000) Targeting Induced Local Lesions IN Genomes (TILLING) for plant functional genomics. Plant Physiology, 123, 439–442. McKay JK, Bishop JG, Lin J-Z, Richards JH, Sala A, MitchellOlds T (2001) Local adaptation across a climatic gradient despite small effective population size in the rare sapphire rockcress. Proceedings of the Royal Society of London B: Biological Sciences, 268, 1715–721. de Mita S, Thuillet A-C, Gay L et al. (2013) Detecting selection along environmental gradients: analysis of eight methods and their effectiveness for outbreeding and selfing populations. Molecular Ecology, 22, 1383–1399. doi:10.1111/mec.12182. Morgenstern EK, Farrar JL (1964) Introgressive hybridization in red and black spruce. Univ. Toronto, Fac. For. Tech. Rep. 4. Toronto, ON. 46 pp. Morgenstern EK, Corriveau A, Fowler DP (1981) A provenance test of red spruce in nine environments in eastern Canada. Canadian Journal of Forest Research, 11, 124–131. Namroud M-R, Beaulieu J, Juge N, Laroche J, Bousquet J (2008) Scanning of genome for single nucleotide polymorphisms involved in the adaptive population differentiation in white spruce. Molecular Ecology, 17, 3599–3616.
Neale DB, Kremer A (2011) Forest tree genomics: growing resources and applications. Nature Reviews Genetics, 12, 111– 122. Nystedt B, Street N, Wetterbom A et al. (2013) The Norway spruce genome sequence and conifer genome evolution. Nature, 497, 579–584. doi:10.1038/nature12211. Parchman TL, Gompert Z, Mudge J, Schilkey FD, Benkman CW, Buerkle CA (2012) Genome-wide association genetics of an adaptive trait in lodgepole pine. Molecular Ecology, 21, 2991–3005. Pavlidis P, Jensen JD, Stephen W, Stamatakis A (2012) A critical assessment of storytelling: gene ontology categories and the importance of validating genomic scans. Molecular Biology and Evolution, 29, 3237–3246. R Development Core Team (2012) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL: http://www.R-project.org/. Rajora OP, Mosseler A, Major JE (2000) Indicators of population viability in red spruce, Picea rubens. II. Genetic diversity, population structure, and mating behaviour. Canadian Journal of Botany, 78, 941–956. Ro CU, Vet RJ, Narayan J (2009) Analyzed data fields from the National Atmospheric Chemistry database (NAtChem) and analysis facility (ed. Air Quality Research Division SaTB, Environment Canada). Savolainen O, Pyhajarvi T, Knurr T (2007) Gene flow and local adaptation in trees. Annual Review of Ecology, Evolution and Systematics, 38, 595–619. Scholz F, Bergmann F (1984) Selection pressure by air pollution as studied by isozyme-gene-systems in Norway spruce exposed to sulphur dioxide. Silvae Genetica, 33, 238–241. Seyyednejad SM, Niknejad M, Koochak H (2011) A review of some different effects of air pollution on plants. Research Journal of Environmental Sciences, 5, 302–309. Siccama TG, Bliss M, Vogelmann HW (1982) Decline of red spruce in the Green Mountains of Vermont. Bulletin of the Torrey Botanical Club, 109, 162–168. Sork VL, Aitken SN, Dyer RJ, Eckert AJ, Legendre P, Neale DB (2013) Putting the landscape into the genomics of forest trees: approaches for understanding local adaptation and population responses to a changing climate. Tree Genetics and Genomes, 9, 901–911. Vasem€ agi A (2006) The adaptive hypothesis of clinal variation revisited: single-locus clines as a result of spatially restricted gene flow. Genetics, 173, 2411–2414. Wang W, Vinocur B, Shoseyov O, Altman A (2004) Role of plant heat-shock proteins and molecular chaperones in the abiotic stress response. Trends in Plant Science, 9, 1360–1385. Webster KL, Creed IF, Nicholas NS, Van Miegroet HJ (2004) Exploring interactions between pollutant emissions and climatic variability in growth of red spruce in the Great Smoky Mountains National Park. Water, Air, and Soil Pollution, 159, 225–248. Wegrzyn J, Lee JM, Tearse BR, Neale DB (2008) TreeGenes: a forest tree genome database. International Journal of Plant Genomics, 2008, 412875. Yang RC (1998) Estimating hierarchical F-statistics. Evolution, 52, 950–956. Yi H, Galant A, Ravilious E, Preuss ML, Jez JM (2010) Sensing sulfur conditions: simple to complex protein regulatory
© 2013 John Wiley & Sons Ltd
A I R P O L L U T I O N A N D L O C A L A D A P T A T I O N I N R E D S P R U C E 5889 mechanisms in plant thiol metabolism. Molecular Plant, 3, 269–279.
Table S1 Red spruce populations sampled, their geographic locations, and the site classes. Table S2 EST loci (RPRS) annotations and number of SNPs assayed in Picea rubens.
S. Bashalkhanov conceived the project idea, including the use of multiple age cohorts, developed the experimental design in consultation with O.P. Rajora, conducted the field and laboratory work and contributed to the writing and revision of the article. A.J. Eckert contributed to the data analysis, writing and revision of the article. O.P. Rajora contributed to the conception, experimental design, provided overall guidance and research directions and contributed to the writing and revision of the article.
Table S3 Flanking sequences of targeted SNPs in red spruce. Table S4 Microsatellite primers sequences, amplification conditions, amplicon size range, and number of alleles in Picea rubens. Table S5 Genetic diversity parameters in three age classes in polluted (TN, WV, NY) and control (NB, NS) populations. Table S6 Multilocus variance components with regard to pollution class for red spruce populations. Fig. S1 Loadings on the first principal component (PC) from a principal components analysis (PCA) of 19 bioclimatic variables obtained for each of the five sampled populations.
Data accessibility SNP and SSR genotypes, Population and climate data and R scripts: uploaded to Dryad. doi:10.5061/dryad. 10j72. Flanking regions of SNP and SSR primers: uploaded as online supporting information.
Supporting information Additional supporting information may be found in the online version of this article.
© 2013 John Wiley & Sons Ltd
Fig. S2 Patterns of pairwise linkage disequilibrium (LD) as measured by the sqaured allelic correlation coefficient (r2) within age classes and populations of red spruce for 33 SNPs. Fig. S3 Results from the program STRUCTURE with regard to the number of genetic clusters (K). Fig. S4 Simulation-based estimates (n = 1000 sets of 33 SNPs) of the false positive rate associated with the modified logistic regression method of Joost et al. (2007).