
Conserv Genet DOI 10.1007/s10592-009-0032-9

RESEARCH ARTICLE

The use of approximate Bayesian computation in conservation genetics and its application in a case study on yellow-eyed penguins

Joao S. Lopes • Sanne Boessenkool

Received: 3 September 2009 / Accepted: 11 December 2009 / © Springer Science+Business Media B.V. 2009

Abstract The inference of demographic parameters from genetic data has become an integral part of conservation studies. A group of Bayesian methods developed originally in population genetics, known as approximate Bayesian computation (ABC), has been shown to be particularly useful for the estimation of such parameters. These methods do not need to evaluate likelihood functions analytically and can therefore be used even when assuming complex models. In this paper we describe the ABC approach and identify specific parts of its algorithm that are the subject of intensive study aimed at further expanding its usability. Furthermore, we discuss applications of this Bayesian algorithm in conservation studies, providing insights into the potential of these tools. Finally, we present a case study in which we use a simple Isolation-Migration model to estimate a number of demographic parameters of two populations of yellow-eyed penguins (Megadyptes antipodes) in New Zealand. The resulting estimates confirm our current understanding of M. antipodes dynamics and demographic history, and provide new insights into the expansion this species has undergone during the last centuries.

Keywords Approximate Bayesian computation · Historical demography · Likelihood-free · Isolation-Migration model · Megadyptes antipodes · Population genetics

Abbreviations
ABC Approximate Bayesian computation
MCMC Markov chain Monte Carlo

Electronic supplementary material The online version of this article (doi:10.1007/s10592-009-0032-9) contains supplementary material, which is available to authorized users.

J. S. Lopes (corresponding author) School of Biological Sciences, University of Reading, Reading RG6 6AJ, UK e-mail: [email protected] URL: http://www.reading.ac.uk/*sar05sal

S. Boessenkool Department of Zoology, University of Otago, Great King Street, Dunedin 9016, New Zealand

Present Address: S. Boessenkool National Centre for Biosystematics, Natural History Museum, University of Oslo, Blindern, P.O. Box 1172, 0318 Oslo, Norway

Introduction

In the last decade the use of population genetics methods in conservation studies has grown remarkably. In fact, technologies in genomics, systematics and population biology have become practically ubiquitous in conservation biology (DeSalle and Amato 2004). Various types of genetic analyses can reveal the patterns that are relevant when managing endangered populations. Furthermore, in areas such as population monitoring, genetic approaches may provide a cheaper, more sensitive and more reliable means than traditional methods (Schwartz et al. 2007). In 1986, Allendorf et al. enumerated the study areas in which genetic tools would be most advantageous. These areas included studies to minimize inbreeding and loss of genetic diversity, to establish pedigrees and parentage relations, to calculate hybridization levels, to detect invasive species, to better define population connectivity or to


improve identification of potentially endangered populations. Later, Avise (1996) identified certain fields in conservation that would arise or would greatly benefit from the influx of population genetics methods. Such fields comprise the reconstruction of population histories, the identification of significant units for conservation, the forensic tracking of individuals and the policing of illegal trade. Nowadays, although still struggling to define its place in the context of environmental conservation, conservation genetics is an established field (Amos and Balmford 2001). Its importance is undeniable and, as noted by Frankham (2005), the adverse effects of ignoring genetic factors in conservation are potentially significant. These effects can be as varied as underestimation of extinction risks, adoption of inappropriate recovery strategies, use of inappropriate populations for reintroductions, misdiagnosis of fragmented populations (Frankham et al. 2002) and hybridization between different evolutionarily significant units or taxa resulting in reduced fitness (Frankham 2005).

The importance in conservation of demographic inference, a particular branch of population genetics, has been noted for some time. As early as 1988, Lande published a controversial article pointing out that demographic factors (i.e. population sizes and the connectivity between populations) could be more important for understanding extinction than any of the genetic factors that could be incorporated into a theory of conservation biology. Indeed, demographics alone can provide a global perspective on the set of events that structure species and communities relevant to conservation (Barnosky et al. 2001). Understanding the contribution of such events is important for making predictions about the future of species (Waits et al. 1998; Allendorf and Luikart 2007).
Moreover, demographic tools can detect early warning signs for conservation, such as population subdivision, local extinction and loss of connectivity, through the genetic signature left by these events (Riddle et al. 2008).

Initially, demographic methods in conservation genetics relied on classic population genetic measures such as FST (Wright 1950) to characterize the levels of variation and genetic contact between related populations. More recently, however, these classical measures have been criticized for being imprecise (Zhang and Hewitt 2003; Neigel 2002). With the increased availability of molecular data, the use of coalescent-based methods for statistical analysis of populations has grown in popularity. In this context, a statistical framework known as approximate Bayesian computation (ABC) has been developed and used to estimate parameters such as gene flow or effective population sizes (Schwartz et al. 2007).

Along with demographic studies, an understanding of landscape dynamics and area-based conservation will be vital to help


conservation decisions. According to DeSalle and Amato (2004), the goal of conservation biology should be, in fact, the long-term health of geographic regions with a particular conservation interest. Spatial methods are crucial in the assessment of connectivity in the formation of reserves and in determining the genetic impact of introductions of non-endemic populations and reintroductions of endemics. The use of these methods can, however, be problematic since they can be computationally very demanding. Again, the ABC framework has been shown to be a good solution to that problem (see for example Estoup et al. 2004).

This paper is divided into two major sections: the first describes the ABC approach, its use in conservation genetics, and recent advances in the development of the method; the second presents an application of ABC to a dataset of two populations of yellow-eyed penguins (Megadyptes antipodes) in New Zealand (Boessenkool et al. 2009a, b) for which we estimate effective population sizes, time since divergence and migration rates.

Approximate Bayesian computation methods

In the early 1990s the use of likelihood-based statistics, such as Markov chain Monte Carlo methods (MCMC), to perform accurate statistical inferences from molecular genetic data underwent an increase in popularity (Wilson and Balding 1998). These methods had some limitations, however, in particular the requirement of using explicit likelihood calculations. Even when such calculations were possible, the application of MCMC methods could be computationally prohibitive. Attempts to use increasing amounts of data and models of greater complexity led to the development of approximate Bayesian computation (ABC) methods (Marjoram and Tavaré 2006).

When writing about ABC methods three papers should be mentioned: the first description of a true ABC method by Pritchard et al. (1999) and two papers that extended the approach and gave the technique its name: the study by Beaumont et al. (2002) that explicitly recognised the technique as a method of conditional density estimation and provided refinements based on local linear regression; and the embedding of the ABC method within MCMC by Marjoram et al. (2003). However, the path to these methods as a simulation-based approach had been laid down previously by other papers, particularly Tavaré et al. (1997) and Weiss and von Haeseler (1998). The initial development of ABC depended on the ability to rapidly simulate genetic data from different demographic scenarios based on coalescent theory (Kingman 1982; Hudson 1983; Tajima 1983), and early applications in population genetics focused on inferring demographic history (Pritchard et al. 1999; Estoup et al. 2001; Estoup and Clegg 2003; Excoffier


et al. 2005). As its efficiency and robustness were recognized, its popularity increased and the method started to be used in a wide variety of fields (Lopes and Beaumont 2009). The application of ABC in areas related to population genetics has included phylogeographic studies (Hickerson et al. 2006; Leaché et al. 2007; Legras et al. 2007), conservation studies (Chan et al. 2006; Evans et al. 2008; Aspi et al. 2009), epidemiological studies (Shriner et al. 2006; Tanaka et al. 2006; Toni et al. 2009) and studies in ecology (Jabot and Chaves 2009). The method has also been applied to other fields, such as theoretical statistics (Padon 2008), protein structure evolution (Ratmann et al. 2007; Grelaud et al. 2009), stereology (Bortot et al. 2007), chromosomal evolution (Koerich et al. 2008), biochemical signalling pathways (Toni and Stumpf 2009), meteorology (Joshi 2007) and hatching management (Slabbert et al. 2009).

ABC is characterized by two main features: the use of summary statistics to summarise data, which significantly reduces the size of the data to handle; and the use of Monte Carlo simulations, which avoids the need for explicit likelihood functions. These characteristics confer great flexibility on this statistical analysis, allowing for the analysis of complex models, the use of large data sets and the estimation of parameters that would otherwise be intractable.

Standard ABC-rejection/regression approach

The ABC-rejection and ABC-regression methods are arguably the most widely used in population genetics (Beaumont et al. 2002; Wilkinson 2008; Blum 2009). The ABC-rejection algorithm (Pritchard et al. 1999) is as follows:

(1) Sample a (vector-valued) parameter, $\Phi$, from the prior: $\Phi_i \sim p(\Phi)$;
(2) Simulate data, $D$, given $\Phi_i$: $D_i \sim p(D \mid \Phi_i)$;
(3) Summarize $D_i$ with a set of chosen summary statistics to obtain $S_i$; go to (1) until $N$ sample points from the joint distribution $p(S, \Phi)$ have been created;
(4) Accept the points whose $S_i$ is within a distance $\delta$ from $s'$, the real data summarized by the same set of summary statistics: $|S_i - s'| < \delta$.

In steps (1) to (3) we simulate independent pairs $(\Phi_i, S_i)$, $i = 1, 2, \ldots, N$, where each $\Phi_i$ is an independent draw from the prior distribution $p(\Phi)$, and the $S_i$ are values that summarize simulated values of $D$ with $\Phi = \Phi_i$. In this case, the $(\Phi_i, S_i)$ are random draws from the joint density. An estimate of the posterior distribution $p(\Phi \mid S = s')$ is obtained in step (4), the rejection step. It is assumed that the $\Phi_i$ for which $|S_i - s'|$ is small form an approximation of a random sample from the desired posterior distribution $p(\Phi \mid D = d')$.

In the ABC-regression method (Beaumont et al. 2002) two innovations are proposed at step (4): smooth weighting and regression adjustment. The idea is to improve the sampling of the posterior density by weighting each $\Phi_i$ according to its distance from the real data, $|S_i - s'|$, and to adjust the $\Phi_i$ using local-linear regression to weaken the effect of that discrepancy. This algorithm can be described as:

(4) Apply a weighting scheme to the points according to their distance to $s'$ and perform a weighted linear multiple regression with the points assigned non-zero weight. Adjust the points according to $\Phi_i^* = \hat{\Phi}(s') + R_i$, where $\hat{\Phi}(s')$ is the value fitted by the regression at $s'$ and $R_i$ is the residual of the $i$th point.
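The rejection steps (1) to (4) and the regression variant of step (4) can be illustrated with a minimal sketch. It is not the implementation used in any of the cited studies: the toy model (inferring the mean of a normal distribution with unit variance from the sample mean), the uniform prior bounds, the Epanechnikov weighting and all function names are assumptions chosen only to keep the example short and dependent on nothing beyond the Python standard library.

```python
import random
import statistics


def simulate_summary(theta, n_obs, rng):
    """Steps (2)-(3): simulate n_obs draws from Normal(theta, 1), summarise by the mean."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n_obs))


def abc_rejection(s_obs, n_sims, delta, n_obs, rng):
    """Steps (1)-(4): keep the (theta, S) pairs whose summary lies within delta of s_obs."""
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)            # (1) toy uniform prior (assumption)
        s = simulate_summary(theta, n_obs, rng)   # (2) and (3)
        if abs(s - s_obs) < delta:                # (4) rejection step
            accepted.append((theta, s))
    return accepted


def regression_adjust(accepted, s_obs, delta):
    """Regression variant of step (4) for a single parameter and a single statistic.

    Accepted points receive Epanechnikov weights 1 - (distance / delta)^2, a weighted
    straight line is fitted of theta on S, and each theta is moved along that line
    to the observed summary s_obs.
    """
    weights = [1.0 - ((s - s_obs) / delta) ** 2 for _, s in accepted]
    w_sum = sum(weights)
    mean_s = sum(w * s for w, (_, s) in zip(weights, accepted)) / w_sum
    mean_t = sum(w * t for w, (t, _) in zip(weights, accepted)) / w_sum
    cov = sum(w * (s - mean_s) * (t - mean_t) for w, (t, s) in zip(weights, accepted))
    var = sum(w * (s - mean_s) ** 2 for w, (_, s) in zip(weights, accepted))
    slope = cov / var
    adjusted = [t - slope * (s - s_obs) for t, s in accepted]
    return adjusted, weights


if __name__ == "__main__":
    rng = random.Random(1)
    points = abc_rejection(s_obs=1.3, n_sims=20000, delta=0.3, n_obs=20, rng=rng)
    adjusted, weights = regression_adjust(points, s_obs=1.3, delta=0.3)
    print(len(points), statistics.fmean(t for t, _ in points), statistics.fmean(adjusted))
```

In this toy setting the adjusted values are expected to be less dispersed than the raw accepted values, which is the practical reason the regression step tolerates a wider acceptance distance.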

Beaumont et al. (2002) establish the relation between these two algorithms (ABC-rejection and ABC-regression), pointing out that the rejection method can be viewed as a special case of the local-linear regression approach when using a uniform kernel and a local-constant regression. Also, as $\delta$ tends to zero, the ABC-regression and ABC-rejection methods become equivalent.

Developments on the ABC method

Several approaches have been suggested to stretch the application of ABC to more complex situations. Its statistical properties have also been the subject of study in recent papers (Wilkinson 2008; Blum 2009) and comparisons with the corresponding full-likelihood methods have been performed (Beaumont et al. 2002; Hickerson et al. 2006; Sousa et al. 2009). The choice of the summary statistics that are used in an ABC analysis is still a matter of debate. Some attempts to provide a methodology for this choice have been proposed (Hamilton et al. 2005; Joyce and Marjoram 2008) but these are far from commanding general agreement. Alternatively, a proposed solution is to compare the errors obtained from the use of different sets (Hickerson et al. 2006; Neuenschwander et al. 2008). This empirical technique is, however, not very satisfactory since its result cannot be generalized to the whole parameter space of even a particular model, but only to the explored region that is selected a priori.

Another concern in ABC is the exploration of the space of the prior. When assuming a very wide prior this exploration demands a great number of simulations, which may cause the simulation time to reach prohibitive values [although parallel computation can be easily implemented for ABC methods (e.g. Excoffier et al. 2005)]. For this reason Sisson et al. (2007) suggested a sequential approach based on the work by Del Moral et al. (2006) that explores the space using consecutive steps that take previous posterior distributions as the new priors. Beaumont et al. (2009) note a bias in the original description by Sisson et al. (2007), and propose a


modified algorithm based on importance sampling arguments (see also Toni et al. 2009). This sequential approach is promising and has now been applied in recent studies (Peters et al. 2008; Toni et al. 2009; Del Moral et al. 2008; Toni and Stumpf 2009).

The final aim of a Bayesian method is to obtain the posterior distribution. In ABC we aim to estimate this conditional density by sampling the simulated data that are closest to the data under study. The distance from the studied data used to accept or reject the simulated points, $\delta$, determines the acceptance rate. With the regression step (Beaumont et al. 2002) we can use a larger value of $\delta$, so that significantly fewer simulations are needed to obtain the same number of accepted points. The concern regarding the method of Beaumont et al. (2002) was that it assumes a linear relation between the data and the parameters in the vicinity of the value of the real data. This assumption has now been relaxed with the use of a quadratic adjustment, which provides some advantages (Blum 2009). A nonlinear conditional heteroscedastic regression has also been proposed, coupled with a conditional density estimation using importance sampling, which greatly reduces the computational demand of the ABC methodology (Blum and François 2009).
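The sequential idea discussed above, in which accepted points from one step seed the proposals of the next while the tolerance shrinks, can be sketched as follows. This is an illustrative sketch written in the spirit of the importance-sampling correction, not a transcription of any published algorithm: the toy model (mean of a unit-variance normal), the uniform prior bounds, the tolerance schedule, the Gaussian perturbation kernel with variance set to twice the weighted variance of the previous step, and every function name are assumptions.

```python
import math
import random
import statistics

PRIOR_LO, PRIOR_HI = -5.0, 5.0   # toy uniform prior bounds (assumption)


def simulate_summary(theta, n_obs, rng):
    """Simulate n_obs draws from Normal(theta, 1) and summarise by the sample mean."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n_obs))


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def sequential_abc(s_obs, tolerances, n_particles, n_obs, rng):
    """Run one rejection step per tolerance, reusing the previous accepted points."""
    prior_density = 1.0 / (PRIOR_HI - PRIOR_LO)
    particles, weights, kernel_var = [], [], None
    for step, delta in enumerate(tolerances):
        if step > 0:
            mean = sum(w * p for w, p in zip(weights, particles))
            kernel_var = 2.0 * sum(w * (p - mean) ** 2 for w, p in zip(weights, particles))
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            if step == 0:
                theta = rng.uniform(PRIOR_LO, PRIOR_HI)      # first step: plain prior draw
            else:
                centre = rng.choices(particles, weights=weights)[0]
                theta = rng.gauss(centre, math.sqrt(kernel_var))
                if not PRIOR_LO <= theta <= PRIOR_HI:
                    continue                                  # zero prior density
            if abs(simulate_summary(theta, n_obs, rng) - s_obs) >= delta:
                continue                                      # rejected at this tolerance
            if step == 0:
                weight = 1.0
            else:
                # Importance weight: prior density over the proposal mixture density.
                mixture = sum(w * normal_pdf(theta, p, kernel_var)
                              for w, p in zip(weights, particles))
                weight = prior_density / mixture
            new_particles.append(theta)
            new_weights.append(weight)
        total = sum(new_weights)
        particles, weights = new_particles, [w / total for w in new_weights]
    return particles, weights


if __name__ == "__main__":
    pts, wts = sequential_abc(1.3, [1.0, 0.5, 0.2], 200, 20, random.Random(1))
    print(sum(w * p for w, p in zip(wts, pts)))
```

The weights are what distinguish this from simply reusing a posterior as the next prior: without them, repeatedly narrowing the proposals would overstate the precision of the final sample.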

Available software

As stated above, the first applications of ABC were on the estimation of historical demographic parameters. In these studies it was essential to use software to simulate genetic trees using the coalescent. In 2002, Richard Hudson published an extremely flexible coalescent simulator called ms. Subsequently, the use of ABC analysis increased considerably and many studies used ms (Haddrill et al. 2005; Hickerson et al. 2005; Wright et al. 2005; Thornton and Andolfatto 2006; De Mita et al. 2007; François et al. 2008; Legrand et al. 2009). The ms software has now been pipelined in the msBayes package (Hickerson et al. 2007). This package comprises several scripts that allow the user to run a complete ABC analysis. It has been used especially in phylogeographic studies (Hickerson et al. 2006; Rosenblum et al. 2007; Hickerson and Meyer 2008; Carnaval et al. 2009). In 2004 Laval and Excoffier released the coalescent simulator SIMCOAL2.0. This software simulates genetic trees using the discrete-generation coalescent approach. Although this approach involves slow simulations, it allows greater flexibility when defining demographic models. This software has been applied to ABC studies fairly frequently (Excoffier et al. 2005; Fagundes et al. 2007; Kayser et al. 2008; Patin et al. 2009). A user-friendly interface for SIMCOAL to perform ABC analysis on temporally spaced samples has also been developed (Serial SimCoal, Anderson et al. 2005) and applied in several studies (e.g. Chan et al. 2006; Fabre et al. 2009). More recently two other packages have been developed: DIY ABC (Cornuet et al. 2008) and popABC (Lopes et al. 2009). These programs have been developed mainly to provide a fast and user-friendly way to run ABC studies, in particular for model-choice inferences. DIY ABC has been shown to be of importance when analysing microsatellite data (Verdu et al. 2009), while popABC is particularly useful for multilocus phylogeographic approaches with both STR and SNP data (Palero et al. 2009). Another program that was recently developed and has been used fairly frequently is ONeSAMP (Tallmon et al. 2008). This software can only deal with a single Wright-Fisher population but has a very friendly web-based interface. For this reason it has already been extensively used to calculate effective population sizes (Witzenberger and Hochkirch 2008; Aspi et al. 2009; Johnson et al. 2009; Slabbert et al. 2009). Table 1 summarizes the existing software developed to apply approximate Bayesian methods. The table also presents the "flavour" of the ABC algorithm provided and whether the software has a graphic interface to guide the user.

The use of ABC in conservation genetics

Population genetics tools have become an integral part of conservation studies. These tools are particularly useful to quantify parameters that affect endangered populations such as inbreeding, effective population sizes, minimum viable population sizes, levels of genetic variation and gene flow (DeSalle and Amato 2004). ABC methods have proven particularly useful for the estimation of such parameters due to their flexibility and robustness. Furthermore, ABC methods seem suitable for studying complex population models of particular importance in conservation genetics (Table 2).

Spatial dispersal models, for example, while of importance for conservation studies, are too complex for traditional statistical approaches. Dispersal patterns can affect the persistence of local populations, species extinction rates, the evolution of species ranges, synchrony of population size changes, and many other important ecological properties necessary for good conservation planning (Whitlock and McCauley 1999). Spatial dispersal models can be used to evaluate the viability of particular populations, but they can also be useful to predict the ecological behaviour of invasive species. One of the first applications of ABC in conservation was to infer the spatial expansion dynamics of the invasive species Bufo marinus. The geographic range of this invasive toad has been expanding in eastern and northern Australia since the first isolates became

Table 1 Available software to perform ABC computation analysis

Software        Algorithm       Graphic interface  References
ms              Simulator       No                 Hudson (2002)
SIMCOAL 2.0     Simulator       No                 Laval and Excoffier (2004)
Serial SimCoal  ABC-regression  No                 Anderson et al. (2005)
MIMAR           MCMC–ABC        No                 Becquet and Przeworski (2007)
msBayes         ABC-regression  No                 Hickerson et al. (2007)
DIY ABC         ABC-regression  Yes                Cornuet et al. (2008)
Rejector        ABC-rejection   No                 Jobin and Mountain (2008)
ONeSAMP         ABC-rejection   Yes                Tallmon et al. (2008)
popABC          ABC-rejection   Yes                Lopes et al. (2009)

ABC approximate Bayesian computation, MCMC Markov chain Monte Carlo

Table 2 Examples of conservation genetics studies using an ABC analysis

Study                              Data                       Model                          Algorithm
Estoup et al. (2004)               Bufo marinus               Spatial expanding populations  ABC-regression
Miller et al. (2005)               Diabrotica virgifera       Invasion scenarios             ABC-rejection
Chan et al. (2006)                 Ctenomys sociabilis        Population bottleneck          ABC-regression
Evans et al. (2008)                Bufo celebensis            Habitat fragmentation          ABC-rejection
Topp and Winker (2008)             Five landbirds species     Simultaneous divergence        ABC-regression
Witzenberger and Hochkirch (2008)  Gryllus campestris         Animal reintroduction          ABC-regression
Aspi et al. (2009)                 Canis lupus                Gene flow between populations  ABC-regression
Carnaval et al. (2009)             Three tree-frogs species   Simultaneous divergence        ABC-regression
Johnson et al. (2009)              Haliaeetus vociferoides    Population bottleneck          ABC-regression
Voje et al. (2009)                 Four bush-cricket species  Simultaneous divergence        ABC-regression

ABC approximate Bayesian computation

established in 1960. Five different demographic models of expansion were compared in order to assess which one was statistically best supported (Estoup et al. 2004). Another example of a study on invasion scenarios using ABC was performed by Miller et al. (2005), who studied the introduction routes of the western corn rootworm (Diabrotica virgifera virgifera) into Europe. Their findings revealed at least three independent introductions instead of a single one, an insight that may be decisive for preventing such invasions in the future (Estoup et al. 2004).

Detecting and understanding the mechanisms that drive shifts in population sizes are valuable when establishing conservation plans. Past occurrences of bottlenecks, in particular, can have a profound effect on a population’s genetic diversity and adaptability. Chan et al. (2006) applied an ABC method to estimate the timing and severity of a bottleneck in an endemic subterranean rodent (Ctenomys sociabilis). Their work revealed a 99.7% decline of the population about 3,000 years ago. Another demographic study on bottlenecks using ABC has been presented by Johnson et al. (2009), who studied a critically endangered Madagascar population of fish-eagle (Haliaeetus vociferoides). Their aim was to determine if the low genetic diversity of that population was due to an ancient

bottleneck or a more recent one. The results showed that the studied population has maintained a considerably small size for hundreds of thousands of years.

One of the most recurrent scenarios in conservation genetics is the occurrence of habitat fragmentation. This fragmentation decreases the potential for interactions between populations, which can lead to genetic erosion. Habitat fragmentation models are of particular importance in the present climatic situation. Evans et al. (2008) applied an ABC algorithm to provide evidence for habitat fragmentation in populations of Celebes toads (Bufo celebensis) living on the Indonesian island of Sulawesi. The results suggest that other species on the same island might be suffering from similar habitat fragmentation.

Understanding the processes between related populations, and particularly establishing the connectivity between populations, is essential for achieving success in conservation strategies. Typically, two processes affecting related populations are studied: the divergence time of the populations and the migration events between them. Topp and Winker (2008) studied the existence of simultaneous divergence in four landbird species of the Queen Charlotte Islands from their respective mainland populations, which suggested that these four species shared a glacial-refugium history. Using an ABC methodology it was revealed that


actually several colonization events took place. The results indicated the importance of the Queen Charlotte Islands for the conservation and management of birds in northwestern North America. A similar study was performed by Carnaval et al. (2009) in tree-frog species from the Brazilian Atlantic forest. Their study indicated that the central region of the forest served as a large climatic refugium for neotropical species in the late Pleistocene, setting new priorities for conservation in Brazil. Another study using the same simultaneous divergence model was performed by Voje et al. (2009), who used a group of bush-cricket species inhabiting both the forest and the savannah of East African mountains. These mountains are characterised by an exceptionally high biodiversity. Voje and co-workers explained this richness of species by a climatically induced retraction of the forest to higher altitudes about 0.8 million years ago, which would have promoted vicariant speciation in the mountain zone. This finding reinforced the importance of conserving the forest patches in the region.

Given the importance of gene flow, a great deal of effort has been put towards measuring it and its consequences from a conservation perspective (e.g. Whitlock and McCauley 1999; Wilson and Rannala 2003; Berry et al. 2004; Paetkau et al. 2004). Work by Aspi et al. (2009) aimed to quantify migration between populations of the Russian wolf (Canis lupus). They used an ABC method to calculate effective population sizes in order to obtain migration values in terms of number of migrants per generation.

Inbreeding depression and loss of genetic diversity are considered important processes affecting small populations that can increase a population’s extinction risk (Frankham 2005). These processes now receive considerable attention when establishing new populations by means of translocation or re-introductions. Witzenberger and Hochkirch (2008) tried to minimise the loss of genetic diversity of a single, endangered population of field crickets (Gryllus campestris) in northern Germany by translocating groups of individuals to form isolated populations. Their aim was to create several genetically distinct populations. They used ABC tools to calculate the effective population size of these populations at different time points, and showed that two populations translocated 10 years earlier had already formed unique genetic clusters.

The studies presented above provide good insight into the potential of ABC methods in conservation analyses. As shown, ABC can consider a wide variety of models with some degree of complexity. Below we present a case study applying an ABC tool to a typical Isolation-Migration model to estimate several demographic parameters (e.g. effective population size, migration rates, divergence time) of two populations of the endangered yellow-eyed penguin (Megadyptes antipodes). This dataset is particularly suitable


to illustrate the use of ABC in a conservation context because (1) the yellow-eyed penguin is an endangered species subject to intensive conservation management, (2) this penguin is only found in the two studied populations and our dataset therefore covers the complete worldwide distribution of the species, (3) several of the estimated parameters have been investigated in previous research to which we compare our results, and (4) we are able to estimate several previously unknown parameters, which improve our understanding of the demographic history of yellow-eyed penguins and thereby provide information relevant for the future conservation management of this species.

Demographic study on populations of yellow-eyed penguins (Megadyptes antipodes)

Introduction

The yellow-eyed penguin (Megadyptes antipodes) is an iconic species of special interest for New Zealand’s wildlife tourism and conservation management programs. Currently, the world’s population, approximately 5,500 individuals, breeds on and around the South Island of New Zealand, and on the subantarctic Auckland and Campbell Islands (Fig. 1; Marchant and Higgins 1990; McKinlay 2001). The species has been classified as endangered by the IUCN (EN B2b(iii)c(iv)) based on its confined breeding range, the decline in suitable habitat and extreme fluctuations in numbers (Birdlife International 2008). Yellow-eyed penguins on the South Island were thought to be a declining remnant of a previously abundant and widespread population, and conservation efforts have largely focused on predator trapping and revegetation of coastal habitat on this island. Recent genetic and morphological analyses of modern and subfossil penguin specimens have suggested, however, that the South Island population was founded just a few hundred years ago when this penguin expanded its range from the subantarctic islands (Boessenkool et al. 2009a). This expansion is thought to have taken place after the demise of M. antipodes’ sister species, M. waitaha, following Polynesian arrival in New Zealand, but before European settlement in the eighteenth century (Boessenkool et al. 2009a). Despite the recent range expansion, contemporary migration rates between the subantarctic and the South Island are estimated to be less than 2% and these areas are consequently recognized as two separate populations (i.e. subantarctic and South Island; Boessenkool et al. 2009b). Nowadays, around 40% of yellow-eyed penguins are found on and around the South Island (McKinlay 2001), but the effective population size (Ne) of this population is estimated to be in the low hundreds only, raising concern for the population’s

Fig. 2 Assumed population model of the historical evolution of the two yellow-eyed penguin populations

We have chosen to use an ABC method instead of an available MCMC-based approach (IM, Hey and Nielsen 2004) because of the difference in the way the two methods deal with the variance of mutation rate across loci. IM uses a complex system of geometric means that does not take into account different marker types (i.e. sequence data and microsatellite data), whereas the ABC method assumes a prior for the mutation rate of each data type and uses a hierarchical system that deals with inter-locus variation very naturally (as explained below). For this reason, we believe that the ABC method can assume a mutation model that suits the data better.
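One way to picture such a hierarchical treatment of mutation rates is sketched below. The exact scheme is not specified in this section, so everything here is an assumption made for illustration: a mean log10 rate is drawn once per marker type from a normal hyperprior, and each locus then receives its own rate drawn around that type-level mean. The hyperparameter values and the function name are hypothetical and are not the priors used in the case study.

```python
import random


def draw_locus_rates(n_loci, mean_log10, sd_log10_mean, sd_log10_locus, rng):
    """Draw per-locus mutation rates for one marker type from a two-level prior.

    Level 1: the marker-type mean (on the log10 scale) is drawn from a normal hyperprior.
    Level 2: each locus rate is drawn around that mean, giving inter-locus variation.
    """
    type_mean = rng.gauss(mean_log10, sd_log10_mean)
    return [10.0 ** rng.gauss(type_mean, sd_log10_locus) for _ in range(n_loci)]


rng = random.Random(42)
# Illustrative (hypothetical) hyperparameters, one set per data type.
microsat_rates = draw_locus_rates(12, mean_log10=-3.3, sd_log10_mean=0.3,
                                  sd_log10_locus=0.5, rng=rng)
sequence_rates = draw_locus_rates(1, mean_log10=-7.0, sd_log10_mean=0.3,
                                  sd_log10_locus=0.5, rng=rng)
```

Because each data type has its own hyperprior, microsatellite and sequence loci never share a mean rate, while loci of the same type are still allowed to differ from one another.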


Fig. 1 Map of the South and Subantarctic Islands of New Zealand. The grey line represents the current breeding range of Megadyptes antipodes

maintenance of genetic diversity and future adaptive potential (Boessenkool et al. this issue). In the present study we have applied an ABC method to a dataset of 12 microsatellite loci and 813 bp of the first hypervariable region of the mitochondrial DNA of yellow-eyed penguins from the South Island and subantarctic populations. We estimate a number of demographic parameters (e.g. divergence time, effective population size, migration rate) assuming a simple Isolation-Migration model (Nielsen and Wakeley 2001) in which two populations have diverged from a single ancestral population in the past (Fig. 2). The estimated parameters are evaluated in light of our current understanding of the yellow-eyed penguin population status and demographic history as revealed by previous studies on this species.

Materials and methods

Sampling and genetic analyses

Yellow-eyed penguins were sampled on and around the South Island (N = 249) and the subantarctic Campbell and Auckland Islands (N = 101) of New Zealand. Details of sampling methods can be found in Boessenkool et al. (2009b). DNA was extracted and purified using 40 µg proteinase K in 5% Chelex (Biorad: Walsh et al. 1991). Samples were genotyped at 12 microsatellite loci previously developed for yellow-eyed penguins (Man03, Man08, Man13, Man21, Man22, Man27, Man39, Man47, Man50, Man51, Man54, Man55; Boessenkool et al. 2008). Primer sequences, polymerase chain reaction (PCR) conditions and visualisation of microsatellite alleles are described in Boessenkool et al. (2008). An 813 bp fragment of the hypervariable region I of the mitochondrial control region was sequenced for a subset of 100 samples (60 from the South Island, 40 from the subantarctic) using primers L-Man_CR4 and H-Man-CR7 as described in Boessenkool et al. (2009a, b). Sample sizes and diversity

123

Conserv Genet

indices per locus for each population can be found in the Supplementary Information. Statistical methods We used a standard ABC-regression method to calculate historical demographic parameters of the two populations (South Island and subantarctic, Boessenkool et al. 2009b) of yellow-eyed penguins. We adopted an ‘‘IsolationMigration’’ model (Nielsen and Wakeley 2001) which assumes that two populations diverged from an ancestral population some time in the past and these populations may have been connected by migration since splitting (Fig. 2). As explained above, the ABC approach involves two steps: a rejection step and a regression adjustment and weighting step. For the rejection step we need to evaluate the closeness of simulated data to the observed data. To assess this closeness, a Euclidian distance was computed between the entire set of normalized summary statistics and the normalized summary statistics calculated from the data (‘normalised’ statistics are transformed to have zero mean and unit variance.) A certain percentage of the closest simulated data was then accepted (as in Beaumont 2008). In the second step the accepted data was given a weight between zero and one using an Epanechnikov kernel. This kernel declines quadratically, giving considerably more weight to the accepted data that are close to the observed data (e.g. Beaumont et al. 2002; Hickerson et al. 2006). All parameter values were transformed on a log scale before imposing the linear regression in an attempt to reduce its heteroscedasticity (Beaumont et al. 2002; Estoup et al. 2004). After the regression, the adjusted values were backtransformed taking their exponential so that posterior densities were expressed in the original scale. The simulated data was obtained using a standard coalescent process (Hudson 1990; Nordborg 2001). We used the stepwise mutation model (Kimura and Ohta 1978) for the microsatellite data and an infinite sites model (Kimura 1969) for sequence data. Hamilton et al. 
(2005) suggested running several hundreds of thousands to millions of simulations in an ABC method depending on the population model complexity. In our analysis 5,000,000 values of the summary statistics sets were generated. Using a tolerance of 0.001 this resulted in 5,000 points from which the parameters were estimated. We opted to use the mode of the posterior distributions as a point estimate of the parameters following previous studies by Hamilton et al. (2005) and Beaumont (2008). We used the popABC program to perform the ABC algorithm up until the rejection step (Lopes et al. 2009). The regression step was then performed in the version 2.5.0 of the package R (Ihaka and Gentleman 1996) using a script developed by Beaumont (makepd.r, www.rubic.rdg.ac.uk/*mab/). In order to obtain posterior

123

density estimations from the adjusted samples we used the locfit function (Loader 1996). The priors for all demographic parameters were uniform distributions bound between specified minimum and maximum values (Table 3). The maximum values for the prior distributions do not affect the results (data not shown). Population sizes were measured in number of individuals, migration rates were measured as the proportion of migrants in a population per generation (7.7 years, see Boessenkool et al. this issue) and the time of divergence was measured in years. To account for uncertainty in mutation rates we used broad priors and treated these variables as nuisance parameters (Table 3). For the mitochondrial control region sequences minimum and maximum values of the distribution were calculated using mutation rates previously estimated for penguins (Lambert et al. 2004). The variation in mutation rate between microsatellite loci was accounted for by using a hierarchical Bayesian framework (Storz and Beaumont 2002). The mutation rates for each of the twelve loci were drawn from a lognormal distribution of base 10 (the prior) with its mean sampled from a normal distribution (hyper-prior) and the standard deviation set to a fixed value, 0.5. Similar approaches have been taken in Pritchard et al. (1999), Excoffier et al. (2005), Hickerson et al. (2006) and Beaumont (2008). The summary statistics were chosen according to their success in previous ABC studies (Beaumont 2008). For the mitochondrial data we used the number of haplotypes; number of segregating sites and the average number of pairwise differences. The microsatellite data were summarized using allele number, heterozygosity and variance in allele length. The statistics for the microsatellite data were averaged across the 12 loci. 
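The rejection and regression steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the popABC or makepd.r implementation: `toy_simulate` is a hypothetical stand-in for a coalescent simulation, only one parameter is inferred (an effective population size with the Uniform (10, 4000) prior), two made-up summary statistics replace the set used in the study, the helper names are our own, and the number of simulations and the tolerance are scaled down so that the example runs quickly.

```python
import math
import random


def toy_simulate(ne, rng):
    """Hypothetical stand-in for a coalescent simulator: two noisy statistics of Ne."""
    return [math.log(ne) + rng.gauss(0.0, 0.3), math.sqrt(ne) + rng.gauss(0.0, 2.0)]


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def abc_regression(observed, n_sims=20000, tolerance=0.01, seed=1):
    """Return regression-adjusted parameter values and their kernel weights."""
    rng = random.Random(seed)
    params = [rng.uniform(10, 4000) for _ in range(n_sims)]  # draws from the prior
    stats = [toy_simulate(p, rng) for p in params]
    k = len(observed)

    # Normalise every statistic with the mean and sd of the simulated values.
    means = [sum(s[j] for s in stats) / n_sims for j in range(k)]
    sds = [math.sqrt(sum((s[j] - means[j]) ** 2 for s in stats) / n_sims)
           for j in range(k)]

    def z(s):
        return [(s[j] - means[j]) / sds[j] for j in range(k)]

    z_obs = z(observed)
    z_sim = [z(s) for s in stats]
    dist = [math.dist(zs, z_obs) for zs in z_sim]  # Euclidean distances

    # Rejection step: keep the closest fraction `tolerance` of the simulations.
    n_keep = max(2, round(tolerance * n_sims))
    kept = sorted(range(n_sims), key=dist.__getitem__)[:n_keep]
    delta = dist[kept[-1]]  # largest accepted distance

    # Epanechnikov kernel: weight declines quadratically from 1 to 0 at delta.
    w = [1.0 - (dist[i] / delta) ** 2 for i in kept]

    # Weighted linear regression of log(parameter) on (statistics - observed).
    x = [[1.0] + [a - b for a, b in zip(z_sim[i], z_obs)] for i in kept]
    y = [math.log(params[i]) for i in kept]
    xtwx = [[sum(wi * xi[r] * xi[c] for wi, xi in zip(w, x)) for c in range(k + 1)]
            for r in range(k + 1)]
    xtwy = [sum(wi * xi[r] * yi for wi, xi, yi in zip(w, x, y)) for r in range(k + 1)]
    beta = solve(xtwx, xtwy)

    # Shift each accepted value to the observed statistics, then back-transform.
    adjusted = [math.exp(yi - sum(b * v for b, v in zip(beta[1:], xi[1:])))
                for xi, yi in zip(x, y)]
    return adjusted, w
```

Calling `abc_regression([math.log(1000.0), math.sqrt(1000.0)])` returns 200 adjusted values with their weights. In the analysis reported here the same logic is applied with 5,000,000 simulations and a tolerance of 0.001, which gives 5,000 accepted points.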
For the calculation of the distances, the values of each summary statistic were normalized by subtracting its mean and dividing by its standard deviation so that the different units of the summary statistics did not affect the results. All these statistics were computed for each population individually and for both populations pooled together. Hence, the Euclidean distances were computed from a total of 18 normalized summary statistics (six statistics for each of the two populations and for the pooled sample).

Table 3 Prior distributions used for the analysis of the Megadyptes antipodes data

Symbol   Description                                  Prior distribution
Demographic parameters
Ne1      South Island effective population size       Uniform (10, 4000)
Ne2      Subantarctic effective population size       Uniform (10, 6000)
NeA      Ancestor effective population size           Uniform (10, 6000)
m1       South Island migration rate                  Uniform (0, 0.05)
m2       Subantarctic migration rate                  Uniform (0, 0.05)
t        Divergence time                              Uniform (10, 1000)
Mutation parameters
μSTR     Mutation rate for microsatellites            Lognormal (0mSTR, 0.5)
μSNP     Mutation rate for mtDNA                      Normal (0.00422, 0.00158)
0mSTR    Mean of mutation rate for microsatellites    Normal (-4, 0.5)

Results and discussion

The estimated posterior distributions for all parameters are shown in Fig. 3 and summarised in Table 4. Effective population size (Ne) estimates for the two M. antipodes populations are substantially different, with the subantarctic population having an estimated Ne an order of magnitude higher than the South Island population. Temporal Ne for the South Island population has been estimated to lie in the low hundreds using modern and museum specimens sampled approximately 100 years apart (Boessenkool et al. this issue). Although these estimates are not directly comparable because they concern different time scales, they all strongly suggest that the Ne of the South Island population is low and that there is concern for this population's long-term adaptive potential. For the subantarctic population Ne is estimated to be ~1,200.

Fig. 3 a Prior (black) and posterior distributions of the South Island effective population size (grey, Ne1). b Prior (black) and posterior distributions of the Subantarctic effective population size (full grey line, Ne2) and the effective population size of the ancestor population (dashed grey line, NeA). c Prior (black) and posterior distributions of the migration rate into the South Island population (full grey line, m1) and the Subantarctic population (dashed grey line, m2). d Prior (black) and posterior distributions of the divergence time between the South Island and the Subantarctic populations (grey, t). Panel x-axes: South Island population size (a), Subantarctic population size (b), Migration rates (c), Time of divergence (d); y-axes: Probability density

Table 4 Mode and quantiles of the posterior distributions of the estimated demographic parameters for the M. antipodes data

Symbol   Description                               Mode    0.025 quantile   0.975 quantile
Ne1      South Island effective population size    10      10               159
Ne2      Subantarctic effective population size    1226    10               3328
NeA      Ancestor effective population size        1190    10               4424
m1       South Island migration rate               0       0                0.041
m2       Subantarctic migration rate               0.015   0.001            0.031
t        Divergence time                           574     96               1000
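Posterior summaries of the kind reported in Table 4 (mode, 0.025 and 0.975 quantiles) can be computed from a set of posterior sample values along the following lines. This is only a sketch: the study used the locfit function for density estimation, whereas the code below swaps in a plain, unweighted Gaussian kernel density estimate evaluated on a grid, and both the function name `posterior_summary` and the sample values are invented for illustration.

```python
import math
import random
import statistics


def posterior_summary(samples, grid_size=200):
    """Return (mode, 0.025 quantile, 0.975 quantile) of a posterior sample."""
    xs = sorted(samples)
    n = len(xs)
    # Normal-reference rule-of-thumb bandwidth for a Gaussian kernel.
    h = 1.06 * statistics.stdev(xs) * n ** (-0.2)

    def density(x):
        kernel_sum = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in xs)
        return kernel_sum / (n * h * math.sqrt(2 * math.pi))

    # Evaluate the density on a regular grid; the highest grid point is the mode.
    lo, hi = xs[0], xs[-1]
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    mode = max(grid, key=density)

    def quantile(q):
        # Simple empirical quantile: the sorted value at position q * n.
        return xs[min(n - 1, int(q * n))]

    return mode, quantile(0.025), quantile(0.975)


rng = random.Random(42)
toy_posterior = [rng.gauss(1200.0, 300.0) for _ in range(2000)]  # made-up sample
mode, q_low, q_high = posterior_summary(toy_posterior)
```

The mode depends on the bandwidth and grid resolution, so values obtained this way would not be expected to reproduce Table 4 exactly even when applied to the same adjusted samples.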


Interestingly, the subantarctic Ne estimate is remarkably similar to our estimate of Ne for the ancestral yellow-eyed penguin population (NeA = 1,190). Previous research has suggested that yellow-eyed penguins expanded to the South Island from the subantarctic islands just 500 years ago, following the extinction of their sister taxon M. waitaha (Boessenkool et al. 2009a). Our estimate of the divergence time between the South Island and subantarctic populations (t = 574 years) fits well with the previously estimated timing of this event. Furthermore, the similarity between the subantarctic Ne and the ancestral Ne suggests that the M. antipodes population on the subantarctic islands has remained stable over the past ~500 years. The expansion to the South Island has thus resulted not only in a range expansion, but also in an increase in the total world population size of this species.

Currently, reliable census size (Nc) estimates of the subantarctic population are not available, although conservation management assumes a census size of approximately 3,000 individuals (McKinlay 2001). Since wildlife populations typically have an Ne/Nc ratio of 0.10–0.11 (Frankham 1995) and a similar ratio has been found for M. antipodes (Boessenkool et al. this issue), our estimated Ne of ~1,200 for the subantarctic population suggests that the census size of this population may be higher than is currently assumed.

In our ABC analysis the estimated migration rate from the South Island to the subantarctic population is slightly higher than vice versa, although both are very low. The contemporary migration rate from South Island to subantarctic has previously been estimated to be zero using assignment analyses (Pritchard et al. 2000; Paetkau et al. 2004), while the migration rate from subantarctic to South Island was estimated to be 1.6% (Boessenkool et al. 2009b). Prevalent migration from south to north (i.e. from subantarctic to South Island) agrees with the expansion history of M. antipodes and with results from long-term banding studies (see Boessenkool et al. 2009b). However, the subantarctic population has more unique alleles than the South Island population, allowing for easier detection of migrants using assignment tests (Boessenkool et al. 2009b), and this difference in the level of genetic diversity has potentially resulted in an underestimate of the migration rate into South Island. Nevertheless, migration rates in both directions are low and the previous conclusion that these populations should be considered demographically independent (Boessenkool et al. 2009b) is supported by the migration rates estimated from our ABC analysis.

Overall, the ABC analysis has provided estimates of demographic parameters that fit well with our current understanding of M. antipodes demographic history and population status. Our results confirm the recent splitting of the South Island and subantarctic populations, the low migration rates between the two populations and the low Ne of the South Island population (Boessenkool et al. 2009a, b, this issue). The ABC analysis has also provided two new insights that are of importance to yellow-eyed penguin conservation management. First, the subantarctic population numbers appear to have been relatively stable during the last ~500 years, and second, our Ne estimates suggest that the census size of this population is likely larger than is currently believed. These findings are promising, and reinforce the importance of incorporating monitoring and management of the subantarctic population into the yellow-eyed penguin recovery plan. Although the expansion of yellow-eyed penguins to South Island has resulted in an increase in the species' range and abundance, the South Island population remains unstable and vulnerable; therefore, the importance of the original subantarctic population for the security of the species as a whole should not be underestimated.

The present study illustrates how ABC is a promising tool in conservation genetics, allowing estimation of a variety of parameters that are of great value for conservation managers. Importantly, this method is easy to apply and runs relatively fast, even when using multiple loci with a relatively low number of alleles, as is the case with the M. antipodes dataset and is typical for most endangered species. The speed of the analyses is a potential advantage over full likelihood-based methods (e.g. MCMC approaches) that estimate demographic population parameters, such as those implemented in the program IM (Hey and Nielsen 2004). Furthermore, it is feasible to add parameters and fit more complex models, such as a model assuming population expansion (see for example Anderson et al. 2005).
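The census-size argument made earlier in this section can be made explicit with a short calculation. The sketch below simply divides the subantarctic Ne estimate of ~1,200 by the Ne/Nc ratios of 0.10–0.11 cited from Frankham (1995), and converts the estimated divergence time into generations using the 7.7-year generation time. The function name is our own and the result is a rough illustration, since it ignores the wide posterior interval around Ne.

```python
def implied_census_range(ne, ratio_low=0.10, ratio_high=0.11):
    """Census sizes implied by an effective size and a range of Ne/Nc ratios."""
    # A larger Ne/Nc ratio implies a smaller census size, so the bounds swap.
    return ne / ratio_high, ne / ratio_low


nc_min, nc_max = implied_census_range(1200)  # Ne of about 1,200 (subantarctic)
generations = 574 / 7.7                      # divergence time in generations

print(round(nc_min), round(nc_max), round(generations, 1))
```

Under these assumptions the implied census size is roughly 10,900 to 12,000 individuals, compared with the approximately 3,000 assumed by conservation management, and the divergence time of 574 years corresponds to about 75 generations.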

Conclusions

In the present situation of global climate change, habitat destruction or alteration and expansion of invasive species, we expect a number of species to go extinct (Riddle et al. 2008). Now, more than ever, it is necessary to investigate extinction risks and to lay down recovery plans for threatened species. These can only be considered scientifically credible when they take the underlying genetic factors into consideration (Frankham 2005). In the last few years ABC methods have proven to be particularly suitable for integrating genetics into conservation studies, in particular through the use of demographic models. The application of the ABC framework to demographic models has already reached a certain degree of maturity, as shown by the numerous demographic studies published using this statistical framework.

In this paper we aimed to present the ABC methodology as a powerful tool already available to conservationists. With that in mind we reviewed several conservation studies using this Bayesian tool and we presented new work employing an ABC method to explore conservation-related issues in a system of two populations of yellow-eyed penguins in New Zealand. As stated by DeSalle and Amato (2004), the major challenge for modern-day conservation geneticists is to incorporate genetics in a more efficient and better-defined way into conservation decision-making in a complex social, cultural and political context. We believe that ABC methods, with their simplicity and robustness, can play a part in helping to overcome this challenge.

Acknowledgements

We would like to thank the organizers of the Trondheim ConGen meeting, Kuke Bijlsma, Volker Loeschcke and Joop Ouborg, for creating a scientific environment that allowed for the creation of this collaborative paper. We are especially grateful to Mark Beaumont, whose insightful comments helped to improve the previous version of the paper considerably. We would also like to thank two anonymous reviewers who helped increase the quality of the manuscript. J.L. is funded by EPSRC grant EP/C533550/1, by Fundacao Ciencia e Tecnologia grant SFRH/BD/43588/2008 and by the "ESF Science Networking Programme ConGen". The University of Otago supported S.B. and provided funding for the genetic analyses of yellow-eyed penguins.

References

Allendorf FW, Luikart G (2007) Conservation and the genetics of populations. Mammalia 2007:189–197
Allendorf FW, Leary RF, Soule ME (1986) Conservation biology: the science of scarcity and diversity. Sinauer Associates, Sunderland
Amos W, Balmford A (2001) When does conservation genetics matter? Heredity 87:257–265
Anderson CNK, Ramakrishnan U, Chan YL, Hadly EA (2005) Serial SimCoal: a population genetics model for data from multiple populations and points in time. Oxford University Press, Oxford, pp 1733–1734
Aspi J, Roininen E, Kiiskilä J, Ruokonen M, Kojola I, Bljudnik L, Danilov P, Heikkinen S, Pulliainen E (2009) Genetic structure of the northwestern Russian wolf populations and gene flow between Russia and Finland. Conserv Genet 10:815–826
Avise JC (1996) The scope of conservation genetics. In: Avise JC, Hamrick JL (eds) Conservation genetics: case histories from nature. Chapman & Hall, New York, pp 1–9
Barnosky AD, Hadly EA, Maurer BA, Christie MI (2001) Temperate terrestrial vertebrate faunas in North and South America: interplay of ecology, evolution, and geography with biodiversity. Conserv Biol 15:658
Beaumont M (2008) Joint determination of topology, divergence time, and immigration in population trees. In: Matsumura S, Forster P, Renfrew C (eds) Simulations, genetics, and human prehistory. McDonald Institute for Archaeological Research, Cambridge, pp 135–154
Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian computation in population genetics. Genetics 162:2025–2035
Beaumont M, Cornuet JM, Marin JM, Robert CP (2009) Adaptivity for ABC algorithms: the ABC-PMC scheme. Biometrika. doi:10.1093/biomet/asp052
Berry O, Tocher MD, Sarre SD (2004) Can assignment tests measure dispersal? Mol Ecol 13:551–561

Birdlife International (2008) Species factsheet: Megadyptes antipodes. In: IUCN (ed) 2007 IUCN red list of threatened species. http://www.iucnredlist.org
Blum M (2009) Approximate Bayesian computation: a non-parametric perspective. Arxiv preprint arXiv:0904.0635
Blum MGB, Francois O (2009) Non-linear regression models for approximate Bayesian computation. Stat Comput. doi:10.1007/s11222-009-9116-0
Boessenkool S, King TM, Seddon PJ, Waters JM (2008) Isolation and characterization of microsatellite loci from the yellow-eyed penguin (Megadyptes antipodes). Mol Ecol Resour 8:1043–1045
Boessenkool S, Austin JJ, Worthy TH, Scofield P, Cooper A, Seddon PJ, Waters JM (2009a) Relict or colonizer? Extinction and range expansion of penguins in southern New Zealand. Proc Biol Sci 276:815
Boessenkool S, Star B, Waters JM, Seddon PJ (2009b) Multilocus assignment analyses reveal multiple units and rare migration events in the recently expanded yellow-eyed penguin (Megadyptes antipodes). Mol Ecol 18:2390–2400
Boessenkool S, Star B, Seddon PJ, Waters JM (this issue) Temporal genetic samples indicate small effective population size of the endangered yellow-eyed penguin. Conserv Genet. doi:10.1007/s10592-009-9988-8
Bortot P, Coles SG, Sisson SA (2007) Inference for stereological extremes. J Am Stat Assoc 102:84–92
Carnaval AC, Hickerson MJ, Haddad CFB, Rodrigues MT, Moritz C (2009) Stability predicts genetic diversity in the Brazilian Atlantic Forest hotspot. Science 323:785
Chan YL, Anderson CNK, Hadly EA (2006) Bayesian estimation of the timing and severity of a population bottleneck from ancient DNA. PLoS Genet 2:e59
Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713
De Mita S, Ronfort J, McKhann HI, Poncet C, El Malki R, Bataillon T (2007) Investigation of the demographic and selective forces shaping the nucleotide diversity of genes involved in nod factor signaling in Medicago truncatula. Genetics 177:2123
Del Moral P, Doucet A, Jasra A (2006) Sequential Monte Carlo samplers. J R Stat Soc B 68:411–436
Del Moral P, Doucet A, Jasra A (2008) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Working paper, Department of Statistics, University of British Columbia
DeSalle R, Amato G (2004) The expansion of conservation genetics. Nat Rev Genet 5:702–712
Estoup A, Clegg SM (2003) Bayesian inferences on the recent island colonization history by the bird Zosterops lateralis lateralis. Mol Ecol 12:657–674
Estoup A, Wilson IJ, Sullivan C, Cornuet JM, Moritz C (2001) Inferring population history from microsatellite and enzyme data in serially introduced cane toads, Bufo marinus. Genetics 159:1671–1687
Estoup A, Beaumont M, Sennedot F, Moritz C, Cornuet JM (2004) Genetic analysis of complex demographic scenarios: spatially expanding populations of the cane toad, Bufo marinus. Evolution 58:2021–2036
Evans BJ, McGuire JA, Brown RM, Andayani N, Supriatna J (2008) A coalescent framework for comparing alternative models of population structure with genetic data: evolution of Celebes toads. Biol Lett 4:430
Excoffier L, Estoup A, Cornuet JM (2005) Bayesian analysis of an admixture model with mutations and arbitrarily linked markers. Genetics 169:1727–1738
Fabre V, Condemi S, Degioanni A (2009) Genetic evidence of geographical groups among Neanderthals. PLoS One 4(4):e5151

Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, Bonatto SL, Excoffier L (2007) Statistical evaluation of alternative models of human evolution. Proc Natl Acad Sci USA 104:17614
François O, Blum MGB, Jakobsson M, Rosenberg NA (2008) Demographic history of European populations of Arabidopsis thaliana. PLoS Genet 4(5):e1000075
Frankham R (1995) Effective population size/adult population size ratios in wildlife—a review. Genet Res 66:95–107
Frankham R (2005) Genetics and extinction. Biol Conserv 126:131–140
Frankham R, Ballou JD, Briscoe DA (2002) Introduction to conservation genetics. Cambridge University Press, Cambridge
Grelaud A, Robert CP, Marin JM (2009) ABC methods for model choice in Gibbs random fields. C R Math. doi:10.1016/j.crma.2008.12.009
Haddrill PR, Thornton KR, Charlesworth B, Andolfatto P (2005) Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Cold Spring Harbor Laboratory Press, New York, pp 790–799
Hamilton G, Currat M, Ray N, Heckel G, Beaumont M, Excoffier L (2005) Bayesian estimation of recent migration rates after a spatial expansion. Genetics 170:409–417
Hey J, Nielsen R (2004) Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics 167:747–760
Hickerson MJ, Meyer CP (2008) Testing comparative phylogeographic models of marine vicariance and dispersal using a hierarchical Bayesian approach. BMC Evol Biol 8:322
Hickerson MJ, Dolman G, Moritz C (2005) Comparative phylogeographic summary statistics for testing simultaneous vicariance. Mol Ecol 15:209–223
Hickerson MJ, Stahl EA, Lessios HA (2006) Test for simultaneous divergence using approximate Bayesian computation. Evolution 60:2435–2453
Hickerson MJ, Stahl E, Takebayashi N (2007) msBayes: pipeline for testing comparative phylogeographic histories using hierarchical approximate Bayesian computation. BMC Bioinformatics 8:268
Hudson RR (1983) Properties of a neutral allele model with intragenic recombination. Theor Popul Biol 23:183–201
Hudson RR (1990) Gene genealogies and the coalescent process. Oxf Surv Evol Biol 7:1–44
Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338
Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. J Comput Graph Stat 5:299–314
Jabot F, Chave J (2009) Inferring the parameters of the neutral theory of biodiversity using phylogenetic information and implications for tropical forests. Ecol Lett 12:239–248
Jobin MJ, Mountain JL (2008) REJECTOR: software for population history inference from genetic data via a rejection algorithm. Bioinformatics 24:2936
Johnson JA, Tingay RE, Culver M, Hailer F, Clarke ML, Mindell DP (2009) Long-term survival despite low genetic diversity in the critically endangered Madagascar fish-eagle. Mol Ecol 18:54–63
Joshi S (2007) Estimating selection coefficient using the ancestral selection graph. In: Department of Biological Science. The Florida State University, Tallahassee
Joyce P, Marjoram P (2008) Approximately sufficient statistics and Bayesian computation. Stat Appl Genet Mol Biol 7:26
Kayser M, Lao O, Saar K, Brauer S, Wang X, Nürnberg P, Trent RJ, Stoneking M (2008) Genome-wide analysis indicates more Asian than Melanesian ancestry of Polynesians. Am J Hum Genet 82:194–198

Kimura M (1969) The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61:893–903
Kimura M, Ohta T (1978) Stepwise mutation model and distribution of allelic frequencies in a finite population. Proc Natl Acad Sci USA 75:2868–2872
Kingman JF (1982) The coalescent. Stoch Process Appl 13:235–248
Koerich LB, Wang X, Clark AG, Carvalho AB (2008) Low conservation of gene content in the Drosophila Y chromosome. Nature 456:949–951
Lambert DM, Ritchie PA, Millar CD, Holland B, Drummond AJ, Baroni C (2004) Rates of evolution in ancient DNA from Adélie penguins. Science 295:2270–2273
Lande R (1988) Genetics and demography in biological conservation. Science 241:1455
Laval G, Excoffier L (2004) SIMCOAL 2.0: a program to simulate genomic diversity over large recombining regions in a subdivided population with a complex history. Oxford University Press, Oxford, pp 2485–2487
Leaché AD, Crews SC, Hickerson MJ (2007) Two waves of diversification in mammals and reptiles of Baja California revealed by hierarchical Bayesian analysis. Biol Lett 3:646
Legrand D, Tenaillon M, Matyot P, Gerlach J (2009) Species-wide genetic variation and demographic history of Drosophila sechellia, a species lacking population structure. Genetics. doi:10.1534/genetics.108.092080
Legras J, Merdinoglu D, Cornuet JM, Karst F (2007) Bread, beer and wine: Saccharomyces cerevisiae diversity reflects human history. Mol Ecol 16:2091–2102
Loader CR (1996) Local likelihood density estimation. Ann Stat 24:1602–1618
Lopes JS, Beaumont M (2009) ABC: a useful Bayesian tool for the analysis of population data. Infect Genet Evol. doi:10.1016/j.meegid.2009.10.010
Lopes JS, Balding D, Beaumont MA (2009) PopABC: a program to infer historical demographic parameters. Bioinformatics. doi:10.1093/bioinformatics/btp487
Marchant S, Higgin PJ (1990) Handbook of Australian, New Zealand and Antarctic birds. Oxford University Press, Melbourne, Australia
Marjoram P, Tavaré S (2006) Modern computational approaches for analysing molecular genetic variation data. Nat Rev Genet 7:759–770
Marjoram P, Molitor J, Plagnol V, Tavaré S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA 100:15324–15328
McKinlay B (2001) Hoiho (Megadyptes antipodes) recovery plan 2000–2025. Department of Conservation, Wellington
Miller N, Estoup A, Toepfer S, Bourguet D, Lapchin L, Derridj S, Kim KS, Reynaud P, Furlan L, Guillemaud T (2005) Multiple transatlantic introductions of the western corn rootworm. Science 310:992
Neigel JE (2002) Is FST obsolete? Conserv Genet 3:167–173
Neuenschwander S, Largiader CR, Ray N, Currat M, Vonlanthen P, Excoffier L (2008) Colonization history of the Swiss Rhine basin by the bullhead (Cottus gobio): inference under a Bayesian spatially explicit framework. Mol Ecol 17:757–772
Nielsen R, Wakeley J (2001) Distinguishing migration from isolation: a Markov chain Monte Carlo approach. Genetics 158:885–896
Nordborg M (2001) Coalescent theory. In: Balding DJ, Bishop M, Cannings C (eds) Handbook of statistical genetics. Wiley, Chichester, pp 602–635
Padon S (2008) Computational methods for complex problems in extreme value theory. In: Dipartimento di Scienze Statistiche. Universita degli Studi di Padova, Padova

Paetkau D, Slade R, Burden M, Estoup A (2004) Genetic assignment methods for the direct, real-time estimation of migration rate: a simulation-based exploration of accuracy and power. Mol Ecol 13:55–65
Palero F, Lopes JS, Abelló P, Macpherson E, Pascual M, Beaumont MA (2009) Rapid radiation in spiny lobsters (Palinurus spp.) as revealed by classic and ABC methods using mtDNA and microsatellite data. BMC Evol Biol 9:263
Patin E, Laval G, Barreiro LB, Salas A, Semino O, Santachiara-Benerecetti S, Kidd KK, Kidd JR, Van der Veen L, Hombert JM (2009) Inferring the demographic history of African farmers and Pygmy hunter-gatherers using a multilocus resequencing data set. PLoS Genet 5(4):e1000448
Peters GW, Fan Y, Sisson SA (2008) On sequential Monte Carlo, partial rejection control and approximate Bayesian computation. Arxiv preprint arXiv:0808.3466v1
Pritchard JK, Seielstad MT, Perez-Lezaun A, Feldman MW (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol 16:1791–1798
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
Ratmann O, Jørgensen O, Hinkley T, Stumpf M, Richardson S, Wiuf C (2007) Using likelihood-free inference to compare evolutionary dynamics of the protein networks of H. pylori and P. falciparum. PLoS Comput Biol 3:e230
Riddle BR, Dawson MN, Hadly EA, Hafner DJ, Hickerson MJ, Mantooth SJ, Yoder AD (2008) The role of molecular genetics in sculpting the future of integrative biogeography. Progr Phys Geogr 32:173
Rosenblum EB, Hickerson MJ, Moritz C (2007) A multilocus perspective on colonization accompanied by selection and gene flow. Evolution 61:2971–2985
Schwartz MK, Luikart G, Waples RS (2007) Genetic monitoring as a promising tool for conservation and management. Trends Ecol Evol 22:25–33
Shriner D, Liu Y, Nickle DC, Mullins JI (2006) Evolution of intrahost HIV-1 genetic diversity during chronic infection. Evolution 60:1165–1176
Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci USA 104:1760
Slabbert R, Bester AE, D'Amato ME (2009) Analyses of genetic diversity and parentage within a South African hatchery of the Abalone Haliotis midae Linnaeus using microsatellite markers. J Shellfish Res 28:369–375
Sousa VM, Fritz M, Beaumont MA, Chikhi L (2009) Approximate Bayesian computation (ABC) without summary statistics: the case of admixture. Genetics. doi:10.1534/genetics.108.098129
Storz JF, Beaumont MA (2002) Testing for genetic evidence of population expansion and contraction: an empirical analysis of microsatellite DNA variation using a hierarchical Bayesian model. Evolution 56:154–166
Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437–460
Tallmon DA, Koyuk A, Luikart G, Beaumont MA (2008) ONeSAMP: a program to estimate effective population size using approximate Bayesian computation. Mol Ecol Resour 8:299–301

Tanaka MM, Francis AR, Luciani F, Sisson SA (2006) Using approximate Bayesian computation to estimate tuberculosis transmission parameters from genotype data. Genetics 173:1511–1520
Tavaré S, Balding DJ, Griffiths RC, Donnelly P (1997) Inferring coalescence times from DNA sequence data. Genetics 145:505–518
Thornton K, Andolfatto P (2006) Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster. Genetics 172:1607–1619
Toni T, Stumpf MPH (2009) Parameter inference and model selection in signaling pathway models. In: Topics in computational biology, Methods in molecular biology series. Humana Press, Totowa
Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MPH (2009) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6:187–202
Topp CM, Winker K (2008) Genetic patterns of differentiation among five landbird species from the Queen Charlotte Islands, British Columbia. Auk 125:461–472
Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Théry S, Froment A, Le Bomin S, Gessain A, Hombert JM (2009) Origins and genetic diversity of Pygmy hunter-gatherers from western Central Africa. Curr Biol 19:312–318
Voje KL, Hemp C, Flagstad O, Saetre GP, Stenseth N (2009) Climatic change as an engine for speciation in flightless Orthoptera species inhabiting African mountains. Mol Ecol 18:93–108
Waits LP, Talbot SL, Ward RH, Shields GF (1998) Mitochondrial DNA phylogeography of the North American brown bear and implications for conservation. Conserv Biol 408–417
Walsh PS, Metzger DA, Higuchi R (1991) Chelex 100 as a medium for simple extraction of DNA for PCR-based typing from forensic material. Biotechniques 10:506–513
Weiss G, von Haeseler A (1998) Inference of population history using a likelihood approach. Genetics 149:1539–1546
Whitlock MC, McCauley DE (1999) Indirect measures of gene flow and migration: FST ≠ 1/(4Nm+1). Heredity 82:117–125
Wilson IJ, Balding DJ (1998) Genealogical inference from microsatellite data. Genetics 150:499–510
Wilson GA, Rannala B (2003) Bayesian inference of recent migration rates using multilocus genotypes. Genetics 163:1177–1191
Witzenberger KA, Hochkirch A (2008) Genetic consequences of animal translocations: a case study using the field cricket, Gryllus campestris L. Biol Conserv 141:3059–3068
Wright S (1950) Genetical structure of populations. Nature 166:247–249
Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, McMullen MD, Gaut BS (2005) The effects of artificial selection on the maize genome. American Association for the Advancement of Science, Washington, DC, pp 1310–1314
Zhang DX, Hewitt GM (2003) Nuclear DNA analyses in genetic studies of populations: practice, problems and prospects. Mol Ecol 12:563–584
