The Plan • How genomic data reflect past selection • Codon substitution based tests • Site-frequency spectrum tests • Complications • Polymorphism vs. Divergence tests • Extended haplotype tests • Hard vs. soft sweeps • Patterns of inter-population divergence
Attributes of DNA sequence data that reflect past selection • Patterns of synonymous and nonsynonymous variation • Site frequency spectrum • Aberrant patterns of linkage disequilibrium • Exceptionally long haplotypes • Pattern of polymorphism out of kilter with levels of divergence • Exceptional divergence between populations.
Testing human-specific accelerated divergence by maximum-likelihood.
Substitution at synonymous and nonsynonymous sites
dN = rate of nonsynonymous substitutions per nonsynonymous site (those that change the encoded amino acid)
dS = rate of synonymous substitutions per synonymous site (“silent” sites)
ω = dN/dS
Simple calculation of dN/dS is not very informative • Human and chimp genes differ by only a few nucleotides. • dN/dS can be spuriously large if dS is small by chance. • A formal statistical test is needed to determine whether dN is significantly greater than dS.
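One common form of such a formal test is a likelihood ratio test between two nested codon models, for example a null with ω fixed at 1 against an alternative in which ω is estimated. The sketch below assumes that setup. The log-likelihood values are hypothetical placeholders for numbers that a codon-model fitting program would report, and the chi-square approximation with one degree of freedom is an assumption about the pair of models being compared.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test for two nested models that differ by one parameter.

    lnl_null and lnl_alt are maximized log-likelihoods (hypothetical inputs here).
    Returns the test statistic and its asymptotic p-value (chi-square, 1 df).
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic < 0:
        raise ValueError("the alternative model cannot fit worse than the nested null")
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one gene
stat, p = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-998.0)
print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

A small p-value here says only that the two models differ in fit; whether ω is above or below 1 has to be read from the estimate itself.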
Comparative Evolution of Sociality in Bees
Hollis Woodard & Brielle Fischman
Gene Robinson
What molecular changes are associated with social evolution?
Molecular changes: gene gain and loss; regulatory sequence changes; coding sequence changes
Social traits (figure labels): caste dimorphism; cooperative brood care; social feeding; pheromone signaling
Bee phylogeny
Null Hypothesis :
Alternative Hypothesis 1: ωSocial ≠ ωSolitary
Alternative Hypothesis 2: ωHighly Eusocial ≠ ωPrimitively Eusocial ≠ ωSolitary
Alternative Hypothesis 3: ωHoney bees ≠ ωOther bees
Testing the ability of flies to learn
Dunce mutant flies are slow learners (and this is the gene that evolved fast in social bees!)
Inference of selection from branch-specific substitution rate acceleration
Kosiol et al. (2008) PLoS Genet. 4:e1000144
Gene ontology terms with inflated signs of selection
Kosiol et al. (2008) PLoS Genet. 4:e1000144
Excess amino acid substitutions in Ig variable domain
Kosiol et al. (2008) PLoS Genet. 4:e1000144
The SNP site frequency spectrum and natural selection
• Purifying selection: excess of rare alleles • Sweep in progress: excess of common alleles • Heterozygous advantage: excess of intermediate-frequency alleles
Sample size = 30 alleles
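To make "excess" concrete, it helps to have a neutral baseline to compare against. Under the standard neutral model with constant population size, the expected number of sites at which the derived allele appears i times in a sample is proportional to 1/i. The sketch below tabulates that expectation for a sample of 30 alleles and bins a set of derived-allele counts into an observed spectrum. The function names and the example counts are illustrative assumptions, not taken from the slides.

```python
def expected_neutral_sfs(n):
    """Expected proportion of segregating sites with derived count i = 1..n-1,
    under the standard neutral model (proportional to 1/i)."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def observed_sfs(derived_counts, n):
    """Bin per-site derived-allele counts into an unfolded spectrum.
    Sites that are fixed (count n) or absent (count 0) are not polymorphic and are skipped."""
    sfs = [0] * (n - 1)
    for count in derived_counts:
        if 0 < count < n:
            sfs[count - 1] += 1
    return sfs


n = 30
neutral = expected_neutral_sfs(n)
print(f"expected share of singletons: {neutral[0]:.3f}")
print(f"expected share at count 15:   {neutral[14]:.3f}")

# Hypothetical derived-allele counts at six sites
print(observed_sfs([1, 1, 2, 29, 30, 0], n))
```

Comparing the observed bins with the neutral proportions shows which frequency classes are over-represented, subject to the caveats about ascertainment and demography.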
Caveats in using SFS to infer selection • Ascertainment bias in SNP identification • Admixture / population stratification • Demographic history confounds the tests
Estimating genetic diversity (θ) within populations
θ = 4Nμ, estimated as a function of the number of polymorphic sites in a population (S):
$$\hat{\theta}_W = \frac{S}{\sum_{i=1}^{n-1} 1/i}$$
n = number of alleles
A second estimate of 4Nμ
$$\bar{k} = \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} k_{ij}}{\binom{n}{2}}$$
Allele 1: ACTGGCTGAACTT
Allele 2: ACTGGTTGAACTT
Allele 3: GCTGGTTGAACCT
$$\bar{k} = \frac{k_{12} + k_{23} + k_{13}}{\binom{3}{2}} = \frac{k_{12} + k_{23} + k_{13}}{3}$$
k_ij = number of differences between alleles i and j; n = number of alleles
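The two estimates can be checked by hand on the three example alleles. The sketch below counts pairwise differences, averages them, and also computes the estimate based on the number of polymorphic sites (S divided by the sum of 1/i for i from 1 to n-1). Both values are per sequence rather than per site, and the function names are illustrative.

```python
from itertools import combinations

ALLELES = ["ACTGGCTGAACTT", "ACTGGTTGAACTT", "GCTGGTTGAACCT"]


def pairwise_differences(a, b):
    """Number of positions at which two aligned alleles differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_difference(alleles):
    """Average number of differences over all pairs of alleles."""
    pairs = list(combinations(alleles, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


def segregating_sites(alleles):
    """Number of alignment columns with more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*alleles))


def watterson_theta(alleles):
    """Estimate of 4N*mu from the number of polymorphic sites."""
    n = len(alleles)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(alleles) / a_n


print(pairwise_differences(ALLELES[0], ALLELES[1]))  # k12
print(pairwise_differences(ALLELES[1], ALLELES[2]))  # k23
print(pairwise_differences(ALLELES[0], ALLELES[2]))  # k13
print(mean_pairwise_difference(ALLELES))
print(watterson_theta(ALLELES))
```

With only three alleles every polymorphic site is a singleton, so the two estimates are forced to agree. Larger samples are needed before the estimates can diverge.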
Testing neutrality
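Tajima’s D statistic is proportional to the difference between the two estimates of 4Nμ: $D \propto \bar{k} - S / \sum_{i=1}^{n-1} (1/i)$, where $\bar{k}$ is the mean number of pairwise differences and S is the number of polymorphic sites.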
Under neutrality, Tajima’s D ≈ 0
See also Fay and Wu’s H and Fu and Li’s F tests
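For reference, the full statistic divides that difference by an estimate of its standard deviation. The sketch below follows the normalizing constants from Tajima (1989) as they are usually written. The inputs (sample size, number of polymorphic sites, mean pairwise difference) are hypothetical summary values rather than data from the slides.

```python
import math


def tajimas_d(n, num_segregating, mean_pairwise_diff):
    """Tajima's D from summary statistics.

    n: number of sampled alleles
    num_segregating: number of polymorphic sites (S)
    mean_pairwise_diff: average number of pairwise differences
    """
    if n < 4:
        # The variance term vanishes for very small samples, so D is degenerate.
        raise ValueError("need at least 4 alleles")
    if num_segregating < 1:
        raise ValueError("need at least one polymorphic site")
    s = num_segregating
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_diff - s / a1) / math.sqrt(variance)


# Hypothetical samples of 10 alleles with 10 polymorphic sites
print(tajimas_d(10, 10, 2.0))  # few pairwise differences: excess of rare variants
print(tajimas_d(10, 10, 5.0))  # many pairwise differences: excess of intermediate variants
```

The sign follows the spectrum: an excess of rare variants pulls the mean pairwise difference below S/a1 and makes D negative, while an excess of intermediate-frequency variants makes it positive.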
Testing neutrality Under purifying selection, Tajima’s D < 0
Examples of large mtDNA breaks within species
Population subdivision produces excess rare variants
Ascertainment bias
Difference in SFS of phase I HapMap vs Perlegen SNPs arose from differences in SNP discovery.
Using SFS for inference of selection • Avoid ascertainment problem by using full sequence data • Correct for admixture
$$\Pr[x_l^{(i)} = (j, v)] = 2 \left(\sum_k p_{klj}\, q_k^{(i)}\right) \left(\sum_k p_{klv}\, q_k^{(i)}\right)$$
– where $q_k^{(i)}$ is the proportion of individual i’s genome that came from population k, and $p_{klj}$ is the frequency of allele j at locus l in population k
• Correct for demographic history • Use composite likelihood to fit growth and bottleneck parameters to silent sites (assumed to be neutral) • Use these demographic parameters to correct SFS
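The admixture correction above can be sketched numerically. Each allele's frequency in an admixed individual is the ancestry-weighted average of the source-population frequencies, and the probability of a heterozygous genotype (j, v) is twice the product of the two averages. The ancestry proportions and allele frequencies below are hypothetical, and reading the factor of 2 as applying to the case where j and v are different alleles is an assumption.

```python
def admixed_allele_frequency(ancestry, pop_freqs):
    """Ancestry-weighted allele frequency: sum over populations k of q_k * p_k."""
    if len(ancestry) != len(pop_freqs):
        raise ValueError("one frequency is needed per source population")
    return sum(q * p for q, p in zip(ancestry, pop_freqs))


def heterozygote_probability(ancestry, freqs_j, freqs_v):
    """Probability of genotype (j, v) with j != v for one individual at one locus."""
    return 2.0 * admixed_allele_frequency(ancestry, freqs_j) * admixed_allele_frequency(
        ancestry, freqs_v
    )


# Hypothetical individual with half of the genome from each of two populations
q = [0.5, 0.5]
p_j = [0.2, 0.6]  # frequency of allele j in populations 1 and 2
p_v = [0.8, 0.4]  # frequency of allele v in populations 1 and 2
print(heterozygote_probability(q, p_j, p_v))
```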
Rasmus Nielsen
European autosomal and X site frequency spectra (figure; panels: Replacements, Silent; axis ticks 1 to 15)
Note that X has more high frequency derived sites
Radical nonsynonymous changes have greater skew to SFS
Seeking evidence for balancing selection: Excess intermediate frequency alleles
Andres et al. (2009) Mol. Biol. Evol. 26:2755–2764
Site Frequency Spectra for ERAP2
Andres et al. (2009) Mol. Biol. Evol. 26:2755–2764
ERAP2 haplotype network (figure; legend: ancestral, Yoruba, Luhya, Palestinian, Gujarati, Han, Toscani)
Andres et al. (2009) Mol. Biol. Evol. 26:2755–2764
Selective sweeps leave troughs of diversity (figure: nucleotide diversity vs. position along chromosome)
Kim and Stephan (2002) Genetics 160: 765–777
Bottlenecks and pseudo-sweeps • Bottleneck + growth leaves sweep-like troughs of diversity
Bottleneck simulation: 10 kb, = 0.01/site, r = 0.1/site, tr = 0.004, d = 0.015 and f = 0.03
Selective sweeps and allele age
Coop et al. (2009) Plos Genet 5:e1000500
Complications • Demography (growth, migration) • Population substructure • Kinship among sampled individuals
Solution • First fit demographic model by Approximate Bayesian Computation and infer selection after accommodating growth. • Infer population substructure by applying Principal Components Analysis, and using PCs in model fits. • Infer kinship by IBD methods and use kinship matrix in subsequent models.
Contrasting polymorphism with interspecific divergence
McDonald-Kreitman Test

              Divergent   Polymorphic
Synonymous    n11         n12
Replacement   n21         n22

Counts of divergent sites and SNPs
G = 816.03, P < 2 × 10^-16
Bustamante et al. (2005) Nature 437:1153-1157.
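The G statistic for a 2 × 2 table of this form can be computed directly. The sketch below is a plain G-test of independence without continuity or Williams correction. The counts are illustrative only, since the table behind the G value reported above is not reproduced in the slides.

```python
import math


def g_test_2x2(table):
    """G-test of independence for a 2x2 table laid out as
    [[n11, n12], [n21, n22]] = [[syn divergent, syn polymorphic],
                                [rep divergent, rep polymorphic]].
    Returns (G, p-value) using a chi-square with 1 df."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:
                g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Illustrative counts (not from the slides)
g, p = g_test_2x2([[17, 42], [7, 2]])
print(f"G = {g:.2f}, p = {p:.4f}")
```

In this made-up table replacement changes are over-represented among divergent sites relative to polymorphic ones, which is the direction expected under adaptive fixation of amino acid changes.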
McDonald-Kreitman Poisson random field model

              Divergent            Polymorphic
Synonymous    2 s                  4Nμ Σ(1/i)
Replacement   F( , , γ = 2Ns)      G(4Nμ, γ = 2Ns)

Fit the above parameters by MCMC
Sawyer and Hartl (1992) Genetics 132:1161
Posterior density for 2Ns (obtained for each individual gene)
(histogram; x-axis: sampled value of 2Ns, −1.0 to 2.5; y-axis: frequency, 0 to 300)
The above indicates that Pr(γ > 0) > 0.97, where γ = 2Ns
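A posterior probability of this kind is typically read off the MCMC output as the fraction of sampled values that fall above zero. The sketch below assumes the sampler's draws of 2Ns for one gene are available as a list. The sample values are hypothetical.

```python
def posterior_prob_positive(samples):
    """Fraction of posterior samples of 2Ns that are greater than zero."""
    if not samples:
        raise ValueError("need at least one posterior sample")
    return sum(value > 0 for value in samples) / len(samples)


# Hypothetical posterior draws of 2Ns for one gene
draws = [-0.2, 0.5, 1.0, 1.5]
print(posterior_prob_positive(draws))
```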
Positive & negative selection inferred from human polymorphism
Inferences from human-chimp tests: outlier processes

Biological process                          Number of genes   p-value
Immunity and defense                        566               0.0000
Sensory perception                          274               0.0000
Protein biosynthesis                        178               0.0000
Chemosensory perception                     173               0.0000
Olfaction                                   155               0.0000
T-cell mediated immunity                    145               0.0000
B-cell- and antibody-mediated immunity      107               0.0000
Gametogenesis                               65                0.0002
Spermatogenesis and motility                31                0.0003
Translational regulation                    28                0.0003
Biological process unclassified             4127              0.0005
Inhibition of apoptosis                     37                0.0044
2Ns is correlated with dN/dS (r = 0.377, P < 10^-9)
Note! Can detect significance even if dN/dS << 1.
Positively selected
Negatively selected
Many negatively selected genes are associated with known Mendelian disorders
Huntingtin has Pr(γ < 0) greater than 99%, where γ = 2Ns
As a class, transcription factors show strong positive selection
39/240 transcription factors had Pr(γ > 0) > 97.5%
Drosophila pachea and senita cactus
Biosynthesis of ecdysone
Neverland
D. pachea must use lathosterol to make ecdysone
nvd mutations unique to D. pachea
dN/dS of nvd along the phylogeny
Reduced polymorphism in neighborhood of nvd
Polymorphism statistics for D. pachea
nvd vs. “control” regions of the genome
• nvd region has reduced nucleotide diversity and recombination rate. • Site frequency spectrum is skewed toward rare variants, consistent with recovery from a sweep.
Hudson-Kreitman-Aguadé test results
nvd has only 5.6% of the polymorphism expected given its level of divergence.
Omega statistic indicates a recent selective sweep
Kim and Nielsen 2004; Jensen et al. 2007
Extended haplotype in neighborhood of nvd
Virginie Orgogozo
Lang et al. (2012) Science 337:1658-1661.
Extreme allele frequency differences are strongly correlated with mean FST
Coop et al. (2009) Plos Genet 5:e1000500
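As a reminder of what FST measures at a single SNP, the sketch below uses the simple heterozygosity form, FST = (H_T − H_S) / H_T, for two populations weighted equally. This is a textbook simplification and an assumption here, not necessarily the estimator used in the cited study. The allele frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """FST at one biallelic SNP from allele frequencies in two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity in the pooled population
    if h_total == 0.0:
        return 0.0  # monomorphic in both populations: no differentiation to measure
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(0.1, 0.9))  # large frequency difference
print(fst_two_populations(0.0, 1.0))  # fixed difference
```

Larger allele frequency differences give larger per-SNP values, which is why extreme frequency differences and mean FST move together.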
Exceptional divergence in allele frequency found in the 1000 Genomes Project

Gene       Trait
SLC24A5    skin pigmentation
SLC45A2    skin pigmentation
DARC       immunity
TLR1       immunity
ADH1B      alcohol metabolism
EDAR       hair morphology
ABCC11     ear wax consistency
But the 1000 genomes project finds thousands of other SNPs with high divergence between populations
Genes with strong inter-population differentiation and unusually long haplotypes
Derived allele is out-of-Africa
Coop et al. (2009) Plos Genet 5:e1000500
Derived allele is European and central Asian
Coop et al. (2009) Plos Genet 5:e1000500
Derived allele is east Asian
Coop et al. (2009) Plos Genet 5:e1000500
Two-dimensional site-frequency spectrum
Yi et al. (2010) Science 329:75-78
Full exome sequencing of 50 Han and 50 Tibetans identifies EPAS1 in altitude acclimation
Yi et al. (2010) Science 329:75-78
Conclusions • Evidence for natural selection is rampant. • Signatures of selection arise from patterns of polymorphism and divergence. • Tests probe different times in the past when selection could have acted. Extended haplotype tests are for very recent events. • Positive selection is especially common in genes involved in immunity, defense, and perception. • Genes that have a signature of recent selection generally display larger allele frequency differences between populations. • Power of tests is improved with additional genome sequences of related species.