Signals of adaptation in genomes

Andrew Clark

Cornell University

The Plan
• How genomic data reflect past selection
• Codon substitution based tests
• Site-frequency spectrum tests
• Complications
• Polymorphism vs. divergence tests
• Extended haplotype tests
• Hard vs. soft sweeps
• Patterns of inter-population divergence

Attributes of DNA sequence data that reflect past selection
• Patterns of synonymous and nonsynonymous variation
• Site frequency spectrum
• Aberrant patterns of linkage disequilibrium
• Exceptionally long haplotypes
• Pattern of polymorphism out of kilter with levels of divergence
• Exceptional divergence between populations


Testing human-specific accelerated divergence by maximum likelihood

[Figure: tree relating human, chimp, and mouse, with branch-specific nonsynonymous rates dN1 and dN2]

Mouse-human split: 80-110 MYA
Human-chimp split: 4.6-6 MYA

Clark et al. (2003) Science 302:1960-1963

Substitution at synonymous and nonsynonymous sites

dN = rate of nonsynonymous substitutions per nonsynonymous site (those that change the encoded amino acid)

dS = rate of synonymous substitutions per synonymous site (“silent” sites)

ω = dN/dS

Simple calculation of dN/dS is not very informative
• Human and chimp genes differ by only a few nucleotides.
• dN/dS can be spuriously large if dS is small by chance.
• A formal statistical test is needed to determine whether dN is significantly greater than dS.

Comparative Evolution of Sociality in Bees

Hollis Woodard & Brielle Fischman

Gene Robinson

What molecular changes are associated with social evolution?

• Gene gain and loss
• Regulatory sequence changes ✦
• Coding sequence changes ✦

Social traits: caste dimorphism, cooperative brood care, social feeding, pheromone signaling

Bee phylogeny

Null Hypothesis: a single ω shared by all lineages

Alternative Hypothesis 1: ωSocial ≠ ωSolitary

Alternative Hypothesis 2: ωHighly Eusocial ≠ ωPrimitively Eusocial ≠ ωSolitary

Alternative Hypothesis 3: ωHoney bees ≠ ωOther bees
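Nested hypotheses about ω like these are commonly compared with a likelihood-ratio test. The sketch below is only illustrative: it assumes the log-likelihoods were obtained elsewhere from a codon-model fit, the function name `lrt_pvalue` is hypothetical, and only 1 or 2 extra parameters are supported so that the chi-square tail has a closed form.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float, extra_params: int) -> float:
    """P-value of a likelihood-ratio test between two nested models.

    lnl_null, lnl_alt: maximized log-likelihoods (assumed to come from a
    codon-model fit done elsewhere).
    extra_params: number of additional free parameters in the alternative
    model (e.g. 1 when a single extra omega is added, 2 for two extra).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0
    if extra_params == 1:
        # Upper tail of chi-square with 1 degree of freedom.
        return math.erfc(math.sqrt(stat / 2.0))
    if extra_params == 2:
        # Upper tail of chi-square with 2 degrees of freedom.
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only covers 1 or 2 extra parameters")
```

With one extra ω, a log-likelihood gain of about 1.92 units sits at the conventional 5% threshold; larger gains give smaller p-values.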

Testing the ability of flies to learn

Dunce mutant flies are slow learners (and this is the gene that evolved fast in social bees!)

Inference of selection from branch-specific substitution rate acceleration

Kosiol et al. (2008) PLoS Genet. 4:e1000144

Gene ontology terms with inflated signs of selection

Kosiol et al. (2008) PLoS Genet. 4:e1000144

Excess amino acid substitutions in Ig variable domain

Kosiol et al. (2008) PLoS Genet. 4:e1000144


The SNP site frequency spectrum and natural selection

• Purifying selection → excess of rare alleles
• Sweep in progress → excess of common alleles
• Heterozygous advantage → excess of intermediate-frequency alleles

Sample size = 30 alleles
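As a concrete illustration of what a site frequency spectrum is, the sketch below tabulates an unfolded SFS from a small set of haplotypes. It assumes each haplotype is a string of 0/1 characters with 1 marking the derived allele; the function name and the toy input are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes: list[str]) -> list[int]:
    """Unfolded SFS from equal-length 0/1 strings (1 = derived allele).

    Returns a list whose entry k-1 is the number of sites at which the
    derived allele is carried by exactly k of the n sampled alleles,
    for k = 1 .. n-1. Monomorphic sites are ignored.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    n = len(haplotypes)
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    derived_counts = Counter(
        sum(h[site] == "1" for h in haplotypes) for site in range(length)
    )
    return [derived_counts.get(k, 0) for k in range(1, n)]


# Toy sample of 4 alleles and 4 sites (hypothetical data).
toy_sfs = site_frequency_spectrum(["1000", "1100", "0010", "0000"])
```

Here `toy_sfs` has two singleton sites and one doubleton site; an excess in the first entries relative to neutral expectation is the "excess rare" pattern.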

Caveats in using SFS to infer selection
• Ascertainment bias in SNP identification
• Admixture / population stratification
• Demographic history confounds the tests

Estimating genetic diversity (θ) within populations

“Watterson’s theta”

\hat{\theta}_W = \frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}}

Allele 1: ACTGGCTGAACTT
Allele 2: ACTGGTTGAACTT
Allele 3: GCTGGTTGAACCT
          *    *     *
S = 3

\hat{\theta}_W is a function of the number of polymorphic sites in a population (S)
n = number of alleles
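A small sketch of the Watterson calculation on the three example alleles; the function names are hypothetical. With S = 3 and n = 3 the denominator is 1 + 1/2 = 1.5, so the estimate for the whole 13-bp region is 2.0.

```python
def segregating_sites(alleles: list[str]) -> int:
    """Number of alignment columns with more than one nucleotide (S)."""
    return sum(len(set(column)) > 1 for column in zip(*alleles))


def watterson_theta(alleles: list[str]) -> float:
    """Watterson's estimator: S divided by the harmonic sum over 1..n-1."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two alleles")
    harmonic = sum(1.0 / i for i in range(1, n))
    return segregating_sites(alleles) / harmonic


example_alleles = ["ACTGGCTGAACTT", "ACTGGTTGAACTT", "GCTGGTTGAACCT"]
theta_w = watterson_theta(example_alleles)
```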

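A second estimate of 4Nμ

\hat{\theta}_k = \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} k_{ij}}{\binom{n}{2}}

Allele 1: ACTGGCTGAACTT
Allele 2: ACTGGTTGAACTT
Allele 3: GCTGGTTGAACCT

\hat{\theta}_k = \frac{k_{12} + k_{23} + k_{13}}{\binom{3}{2}} = \frac{k_{12} + k_{23} + k_{13}}{3}

k_{ij} = number of differences between the pair of alleles i and j
n = number of alleles

This is the mean number of pairwise differences, often written π.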
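The pairwise-difference estimate can be checked on the same three alleles. In this sketch (function names hypothetical), k12 = 1, k23 = 2 and k13 = 3, so the mean over the three pairs is 2.0.

```python
from itertools import combinations


def pairwise_differences(a: str, b: str) -> int:
    """Number of positions at which two aligned alleles differ (k_ij)."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_differences(alleles: list[str]) -> float:
    """Average of k_ij over all n-choose-2 pairs of alleles."""
    pairs = list(combinations(alleles, 2))
    if not pairs:
        raise ValueError("need at least two alleles")
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


three_alleles = ["ACTGGCTGAACTT", "ACTGGTTGAACTT", "GCTGGTTGAACCT"]
theta_k = mean_pairwise_differences(three_alleles)
```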



Testing neutrality

Tajima’s D statistic is proportional to:

\hat{\theta}_k - \hat{\theta}_W

Under neutrality, Tajima’s D ≈ 0



See also Fay and Wu’s H and Fu and Li’s F tests
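For reference, a sketch of the full Tajima's D calculation, using the normalizing constants from Tajima (1989). The function name is hypothetical. Note that with only three alleles the two estimators are always equal, so the statistic is undefined; the sketch therefore requires n ≥ 4.

```python
import math


def tajimas_d(mean_pairwise_diff: float, num_segregating: int, n: int) -> float:
    """Tajima's D from the mean pairwise difference, S, and sample size n."""
    if n < 4:
        raise ValueError("D is degenerate for fewer than 4 alleles")
    if num_segregating < 1:
        raise ValueError("D is undefined without segregating sites")
    s = num_segregating
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: pairwise-difference estimate minus Watterson's estimate.
    numerator = mean_pairwise_diff - s / a1
    # Denominator: estimated standard deviation of the numerator.
    return numerator / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants inflates S relative to the mean pairwise difference and pushes D below zero; an excess of intermediate-frequency variants does the opposite.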

Testing neutrality
Under purifying selection, Tajima’s D < 0

Population subdivision produces excess rare variants
Examples of large mtDNA breaks within species

Ascertainment bias

Difference in SFS of phase I HapMap vs Perlegen SNPs arose from differences in SNP discovery.

Using SFS for inference of selection
• Avoid ascertainment problem by using full sequence data
• Correct for admixture

  \Pr[x_l^{(i)} = (j, v)] = 2 \left( \sum_k p_{klj} \, q_k^{(i)} \right) \left( \sum_k p_{klv} \, q_k^{(i)} \right)

  where q_k^{(i)} is the proportion of individual i’s genome that came from population k, and p_{klj} is the frequency of allele j at locus l in population k
• Correct for demographic history
• Use composite likelihood to fit growth and bottleneck parameters to silent sites (assumed to be neutral)
• Use these demographic parameters to correct SFS
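The admixture correction above can be sketched numerically. The function name and the input values are hypothetical; the heterozygote case follows the formula on the slide, while the homozygote case (the square of the ancestry-weighted frequency) is an added assumption of random union of alleles.

```python
def admixed_genotype_probability(
    ancestry: list[float],
    freq_j: list[float],
    freq_v: list[float],
    homozygous: bool = False,
) -> float:
    """Genotype probability for one individual at one locus.

    ancestry: q_k, the fraction of the individual's genome from population k.
    freq_j, freq_v: frequencies of alleles j and v in each population k.
    homozygous: set True when j and v are the same allele (assumed case,
    not given on the slide).
    """
    if abs(sum(ancestry) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    weighted_j = sum(q * p for q, p in zip(ancestry, freq_j))
    weighted_v = sum(q * p for q, p in zip(ancestry, freq_v))
    if homozygous:
        return weighted_j ** 2
    return 2.0 * weighted_j * weighted_v


# Hypothetical individual with equal ancestry from two populations.
het_prob = admixed_genotype_probability([0.5, 0.5], [0.2, 0.6], [0.8, 0.4])
```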

Rasmus Nielsen

European autosomal and X site frequency spectra
[Figure: site frequency spectra for replacement and silent sites, frequency classes 1 to 15]

Note that X has more high frequency derived sites

Radical nonsynonymous changes have greater skew to SFS

Seeking evidence for balancing selection: Excess intermediate frequency alleles

Andres et al. (2009) Mol. Biol. Evol. 26:2755–2764

Site Frequency Spectra for ERAP2

Andres et al. (2009) Mol. Biol. Evol. 26:2755–2764

ERAP2 haplotype network
[Figure: haplotype network with the ancestral haplotype marked; populations: Yoruba, Luhya, Palestinian, Gujarati, Han, Toscani]

Andres et al. (2009) Mol. Biol. Evol. 26:2755–2764

Selective sweeps leave troughs of diversity
[Figure: nucleotide diversity plotted against position along chromosome]

Kim and Stephan (2002) Genetics 160: 765–777

Bottlenecks and pseudo-sweeps
• Bottleneck + growth leaves sweep-like troughs of diversity

[Figure: θ_H, π, and θ_W along the simulated region]

Bottleneck simulation: 10 kb, θ = 0.01/site, r = 0.1/site, tr = 0.004, d = 0.015 and f = 0.03

Selective sweeps and allele age

Coop et al. (2009) PLoS Genet. 5:e1000500

Complications
• Demography (growth, migration)
• Population substructure
• Kinship among sampled individuals

Solution
• First fit a demographic model by Approximate Bayesian Computation and infer selection after accommodating growth.
• Infer population substructure by applying Principal Components Analysis, and use the PCs in model fits.
• Infer kinship by IBD methods and use the kinship matrix in subsequent models.


Contrasting polymorphism with interspecific divergence

McDonald-Kreitman Test

              Divergent   Polymorphic
Synonymous    n11         n12
Replacement   n21         n22

Counts of divergent sites and SNPs

G = 816.03, P < 2 × 10^-16
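The G statistic for a 2×2 McDonald-Kreitman table can be computed directly. This is a minimal sketch with a hypothetical function name and hypothetical counts (not the counts behind the G value reported above); the p-value uses the chi-square distribution with 1 degree of freedom and no continuity correction.

```python
import math


def g_test_2x2(table: list[list[int]]) -> tuple[float, float]:
    """G-test of independence on a 2x2 table of counts.

    table[0] = [synonymous divergent, synonymous polymorphic]
    table[1] = [replacement divergent, replacement polymorphic]
    Returns (G, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:
                g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Hypothetical counts with an excess of replacement divergence.
g_stat, p_val = g_test_2x2([[20, 40], [30, 10]])
```

A table whose rows are proportional (the same divergent-to-polymorphic ratio for both site classes) gives G = 0, which is the neutral expectation.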

Bustamante et al. (2005) Nature 437:1153-1157.

McDonald-Kreitman Poisson random field model

              Divergent             Polymorphic
Synonymous    2 … s                 4Nμ Σ(1/i)
Replacement   F(…, …, γ = 2Ns)      G(4Nμ, γ = 2Ns)

Fit the above parameters by MCMC
Sawyer and Hartl (1992) Genetics 132:1161

Posterior density for 2Ns (obtained for each individual gene)

[Figure: histogram; x-axis: sampled value of 2Ns (-1.0 to 2.5); y-axis: frequency (0 to 300)]

The above indicates that Pr(γ > 0) > 0.97
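A posterior probability such as Pr(γ > 0) is typically read off the MCMC output as the fraction of sampled values above zero. A minimal sketch, with a hypothetical function name and made-up draws:

```python
def posterior_prob_positive(samples: list[float]) -> float:
    """Fraction of posterior draws of 2Ns that are greater than zero."""
    if not samples:
        raise ValueError("need at least one posterior sample")
    return sum(value > 0 for value in samples) / len(samples)


# Hypothetical posterior draws of 2Ns for one gene.
prob_positive = posterior_prob_positive([-0.2, 0.5, 1.0, 1.5])
```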

Positive & negative selection inferred from human polymorphism

Inferences from human-chimp tests: outlier processes

Biological process                         Number of genes   p-value
Immunity and defense                       566               0.0000
Sensory perception                         274               0.0000
Protein biosynthesis                       178               0.0000
Chemosensory perception                    173               0.0000
Olfaction                                  155               0.0000
T-cell mediated immunity                   145               0.0000
B-cell- and antibody-mediated immunity     107               0.0000
Gametogenesis                              65                0.0002
Spermatogenesis and motility               31                0.0003
Translational regulation                   28                0.0003
Biological process unclassified            4127              0.0005
Inhibition of apoptosis                    37                0.0044

2Ns is correlated with dN/dS (r = 0.377, P < 10^-9)

Note! Can detect significance even if dN/dS << 1.

Positively selected

Negatively selected

Many negatively selected genes are associated with known Mendelian disorders
Huntingtin has Pr(γ < 0) greater than 99%

As a class, transcription factors show strong positive selection

39/240 transcription factors had Pr(γ > 0) > 97.5%


Drosophila pachea and senita cactus

Biosynthesis of ecdysone


Neverland

D. pachea must use lathosterol to make ecdysone

nvd mutations unique to D. pachea

dN/dS of nvd along the phylogeny

Reduced polymorphism in neighborhood of nvd

Polymorphism statistics for D. pachea

nvd vs. “control” regions of the genome


• nvd region has reduced nucleotide diversity and recombination rate.
• Site frequency spectrum skewed toward rare variants → recovery from sweep.

Hudson-Kreitman-Aguadé test results

nvd has only 5.6% of the polymorphism expected given its level of divergence.

Omega statistic indicates a recent selective sweep

Kim and Nielsen 2004; Jensen et al. 2007

Extended haplotype in neighborhood of nvd

Virginie Orgogozo

Lang et al. (2012) Science 337:1658-1661.


Extreme allele frequency differences are strongly correlated with mean FST
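For a single biallelic SNP, FST between two populations can be illustrated with the heterozygosity-based definition (H_T − H_S) / H_T. The sketch below assumes equal weighting of the two populations and ignores sampling error; it is one simple definition, not necessarily the estimator used in the cited study, and the function name is hypothetical.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """FST for one biallelic SNP from allele frequencies in two populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # site is monomorphic in the pooled sample
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total
```

Identical frequencies give 0, a fixed difference gives 1, and frequencies of 0.1 versus 0.9 give 0.64, showing how large frequency differences translate into high FST.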

Coop et al. (2009) PLoS Genet. 5:e1000500

Exceptional divergence in allele frequency found in the 1000 genomes project

Gene      Trait
SLC24A5   skin pigmentation
SLC45A2   skin pigmentation
DARC      immunity
TLR1      immunity
ADH1B     alcohol metabolism
EDAR      hair morphology
ABCC11    ear wax consistency

But the 1000 genomes project finds thousands of other SNPs with high divergence between populations

Genes with strong inter-population differentiation and unusually long haplotypes

Derived allele is out-of-Africa

Coop et al. (2009) PLoS Genet. 5:e1000500

Derived allele is European and central Asian

Coop et al. (2009) PLoS Genet. 5:e1000500

Derived allele is east Asian

Coop et al. (2009) PLoS Genet. 5:e1000500

Two-dimensional site-frequency spectrum

Yi et al. (2010) Science 329:75-78
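A two-dimensional site frequency spectrum simply cross-tabulates, site by site, the derived allele count in one population against the count in the other. A minimal sketch with a hypothetical function name and made-up counts:

```python
def joint_sfs(
    counts_pop1: list[int], counts_pop2: list[int], n1: int, n2: int
) -> list[list[int]]:
    """Two-dimensional SFS as an (n1 + 1) x (n2 + 1) grid of site counts.

    counts_pop1[s], counts_pop2[s]: derived allele counts at site s in
    samples of n1 and n2 alleles from the two populations.
    grid[a][b] is the number of sites with count a in population 1 and
    count b in population 2.
    """
    if len(counts_pop1) != len(counts_pop2):
        raise ValueError("both populations must cover the same sites")
    grid = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for c1, c2 in zip(counts_pop1, counts_pop2):
        grid[c1][c2] += 1
    return grid


# Three hypothetical sites, two sampled alleles per population.
toy_grid = joint_sfs([0, 2, 2], [1, 2, 0], n1=2, n2=2)
```

Sites far from the diagonal of the grid, such as the one at `toy_grid[2][0]`, are those with large frequency differences between the two populations.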

Full exome sequencing of 50 Han and 50 Tibetans identifies EPAS1 in altitude acclimation

Yi et al. (2010) Science 329:75-78

Conclusions
• Evidence for natural selection is rampant.
• Signatures of selection arise from patterns of polymorphism and divergence.
• Tests probe different times in the past when selection could have acted. Extended haplotype tests are for very recent events.
• Positive selection is especially common in genes involved in immunity, defense, and perception.
• Genes that have a signature of recent selection generally display larger allele frequency differences between populations.
• Power of tests is improved with additional genome sequences of related species.
