8th World Congress on Genetics Applied to Livestock Production, August 13-18, 2006, Belo Horizonte, MG, Brasil

DECOMPOSING INBREEDING AND COANCESTRY INTO ANCESTRAL COMPONENTS

M. Sargolzaei¹ and J.J. Colleau²

¹ Course of Environmental Management Science, Graduate School of Science and Technology, Niigata University, Niigata 950-2181, Japan
² INRA Station de Génétique Quantitative et Appliquée, 78352 Jouy en Josas Cédex, France

INTRODUCTION

It has been widely proposed that the genetic diversity of populations, or equivalently their genetic drift, be monitored using overall parameters derived from genealogical data. The most frequently cited are the average inbreeding coefficient ($\bar{F}$; Wright, 1922), the average coancestry ($\bar{f}$; Malécot, 1948), the effective population size or, equivalently, the inbreeding rate per unit of time (Wright, 1931), the founder genome equivalent (Lacy, 1989), the effective number of ancestors (Boichard et al., 1997) and the effective number of non-founders (Caballero and Toro, 2000). Among them, the most appropriate seems to be $\bar{f}$ (Lacy, 1995; Caballero and Toro, 2000). Some parameters explicitly use the detailed contributions of genes of ancestors and others, such as $\bar{F}$ and $\bar{f}$, do not. However, in the latter case, carrying out this partition can help to understand who contributed the most to drift and when.

Wright’s (1922) path counting method yields the detailed contributions of genes of nodal common ancestors (NCAs, i.e. ancestors who form an inbreeding loop) to individual inbreeding coefficients ($F$). However, if an NCA is a direct descendant of another NCA, the path counting method provides only a marginal contribution for the older one. For monitoring purposes, Lacy et al. (1996) and Lacy (1997) decomposed $F$s into components due to each founder. Caballero and Toro (2000) proposed a general method for partitioning $\bar{f}$ into components due to founders and non-founder ancestors. The aim of the present article is to describe another method for decomposing $F$, $\bar{F}$ and $\bar{f}$ into ancestral components. This method exploits the analytical link between the path counting method and the method of Caballero and Toro (2000).

MATERIAL AND METHODS

Path counting and upward exploration methods. Wright (1922) presented a basic formula for computing $F$, which is based on computation of contributions of genes of NCAs.
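The formula itself is not reproduced in this text. For reference, the form in which Wright’s (1922) path formula is usually quoted is given below; the symbols $X$ (individual of interest), $A$ (common ancestor) and $n_1$, $n_2$ (numbers of generations from the sire and from the dam of $X$ back to $A$ along one path) are notation chosen here for illustration and may differ from the authors’ own.

$$F_X = \sum_{A} \sum_{\text{paths through } A} \left(\tfrac{1}{2}\right)^{n_1 + n_2 + 1} (1 + F_A)$$

Because $F_A$ can be written with the same formula, the term $(1 + F_A)$ can be expanded again, which is the recursion that the decomposition into NCA contributions relies on.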
With Wright’s (1922) formula, inbreeding coefficients of the NCAs can be expressed in the same way and therefore the formula can be expanded recursively. After expansion, each term is the contribution of genes of the corresponding NCA (regardless of its inbreeding) to the inbreeding of the individual of interest. NCAs are a subset of the common ancestors, i.e. of the ancestors who appear on both sides of the pedigree of the individual of interest. Non-NCAs contribute to inbreeding through the NCAs. If an NCA is a direct descendant of another NCA, then some fraction of the contribution of genes of the older is explained by the younger, leaving only a marginal contribution of genes for the older NCA.

The same results as with Wright’s method can be recovered by using an upward exploration method, which is easier to implement, especially when the number of paths is very large, and is well suited to the partitioning of $\bar{F}$ and $\bar{f}$ for a batch of individuals. This method is tabular and first considers, at the last generation, the table of frequencies of a given pair of labelled genes. Recursive backward exploration of these genes leads to older genes, for which a new frequency table is calculated, until founder genes are reached. The corresponding algorithm is as follows:

1) Define a table of 3 columns.
2) For the individuals of interest, set columns 1, 2 and 3 of the first rows to the minimum parents’ ID, the maximum parents’ ID and 0.5, respectively.
3) Expand combinations of genes backwards into combinations of older genes and calculate the frequencies of these combinations.
4) When a combination involves the same gene, an NCA is found and its contribution is stored.

Note that the minimum ID is always placed in column 1 for faster access to already existing combinations of genes. The upward exploration algorithm is fast compared with the path counting algorithm but might be expensive in terms of storage for large populations with long histories. This algorithm can be improved, especially to save storage. For this purpose, we first decompose $F$, $\bar{F}$ and $\bar{f}$, for a general pedigree, into contributions of Mendelian sampling variances (MSVs) of ancestors (Caballero and Toro, 2000) by an indirect method (Colleau, 2002), and then, using the contributions of MSVs, we develop a fast method to obtain efficiently the results of the path counting or the upward exploration method.

Decomposing $F$, $\bar{F}$ and $\bar{f}$ into contributions of MSVs of ancestors. Using the factorization of the numerator relationship matrix ($\mathbf{A} = \mathbf{T}\mathbf{B}\mathbf{T}'$) and relying on the fact that $\mathbf{A}^{-1}$ is sparse, Colleau (2002) developed a fast indirect method to multiply $\mathbf{A}$ by any vector in order to compute the average relationships within and between groups as well as $F$ (i.e. $\mathbf{A}\mathbf{x} = \mathbf{r}$, where $\mathbf{x}$ contains 1s at the positions of the individuals of interest and 0s elsewhere). This multiplication is done by Gaussian elimination of $\mathbf{T}'^{-1}\mathbf{z} = \mathbf{x}$ and $\mathbf{T}^{-1}\mathbf{r} = \mathbf{B}\mathbf{z}$ in turn, where $\mathbf{z} = \mathbf{T}'\mathbf{x}$ and $\mathbf{B}$ is a diagonal matrix containing MSVs. Due to the special sparse structure of $\mathbf{T}^{-1}$ (i.e. containing –0.5s at positions where individuals are linked to their parents and 1s on the diagonal), solutions can be obtained by tracing the pedigree up and down.
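The up-and-down tracing of the indirect method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the seven-animal pedigree `PED`, the function names and the dense tabular matrix used as a check are all introduced here, and animals are assumed to be numbered so that parents precede their progeny, with 0 standing for an unknown parent.

```python
import numpy as np

# Hypothetical pedigree: animal -> (sire, dam); 0 = unknown parent.
# Animals are numbered so that parents always have smaller IDs than progeny.
PED = {1: (0, 0), 2: (0, 0), 3: (1, 2), 4: (1, 2), 5: (3, 4), 6: (3, 5), 7: (5, 6)}

def relationship_matrix(ped):
    """Dense A by the tabular method; used here only to check the sketch."""
    n = len(ped)
    A = np.zeros((n + 1, n + 1))  # row/column 0 represents an unknown parent
    for i in range(1, n + 1):
        s, d = ped[i]
        A[i, i] = 1.0 + 0.5 * A[s, d]
        for j in range(1, i):
            A[i, j] = A[j, i] = 0.5 * (A[j, s] + A[j, d])
    return A

def mendelian_sampling_variances(ped, A):
    """Diagonal of B: b_ii = 1 - 0.25 * (a_ss + a_dd), with a = 0 for an unknown parent."""
    b = np.zeros(len(ped) + 1)
    for i, (s, d) in ped.items():
        b[i] = 1.0 - 0.25 * (A[s, s] + A[d, d])
    return b

def indirect_Ax(ped, b, x):
    """Return r = A x = T B T' x without forming A, T or their inverses."""
    n = len(ped)
    z = np.array(x, dtype=float)
    for i in range(n, 0, -1):      # trace up: solve T'^{-1} z = x
        s, d = ped[i]
        if s:
            z[s] += 0.5 * z[i]
        if d:
            z[d] += 0.5 * z[i]
    r = b * z
    for i in range(1, n + 1):      # trace down: solve T^{-1} r = B z
        s, d = ped[i]
        r[i] += 0.5 * (r[s] + r[d])
    return r

A = relationship_matrix(PED)
b = mendelian_sampling_variances(PED, A)
x = np.zeros(len(PED) + 1)
x[[6, 7]] = 1.0                    # individuals of interest
r = indirect_Ax(PED, b, x)         # relationships of every animal with the group {6, 7}
```

In this sketch the MSVs are taken from the dense matrix for brevity; in a large population they would be computed from inbreeding coefficients obtained without forming $\mathbf{A}$.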
Now suppose that we are interested in obtaining the vector $\mathbf{m}$, which contains the contributions of MSVs of ancestors to $F$ of an individual of interest with sire $i$ and dam $j$. Then, from the indirect method, it follows that the relationship between individuals $i$ and $j$, when $x_i$ is set to 1, is:

$$r_j = t_{ij} b_{jj} + 0.5\,(r_{s_j} + r_{d_j}) \;\Rightarrow\; a_{ij} = t_{ij} b_{jj} + 0.5\,(a_{i s_j} + a_{i d_j}),$$

where $s_j$ and $d_j$ denote the sire and the dam of $j$. Each relationship can also be expressed in the same way and therefore we recursively obtain:

$$a_{ij} = t_{ij} b_{jj} + 0.5\,\bigl(t_{i s_j} b_{s_j s_j} + 0.5\,(\cdots + \cdots) + t_{i d_j} b_{d_j d_j} + 0.5\,(\cdots + \cdots)\bigr)$$

After expanding the relationship terms up to the founders and algebraic manipulation, it can be easily shown that the numerical coefficient of each term is equal to the fraction of genes that each ancestor passed to individual $j$ (‘contribution’ of this ancestor according to James and McBride, 1958). Then, we have:

$$a_{ij} = \sum_{k=1}^{\max(i,j)} t_{ik}\, t_{jk}\, b_{kk},$$

where $t_{ik} t_{jk} b_{kk}$ is the contribution of the MSV of ancestor $k$ to $a_{ij}$, and thus $m_k = 0.5\, t_{ik} t_{jk} b_{kk}$. Extending to a group of individuals is straightforward. The $\bar{F}$ for a group can be expressed as

$$\bar{F}_g = 0.5 \Bigl( \sum_{j \in G} a_{s_j d_j} \Bigr) n_g^{-1},$$

where $G$ is the set of ID numbers for the group considered and $n_g$ is the number of individuals in the group, and thus we get

$$m_k = 0.5 \Bigl( \sum_{j \in G} t_{s_j k}\, t_{d_j k}\, b_{kk} \Bigr) n_g^{-1}.$$

Note that, for decomposing $\bar{F}$ into $\mathbf{m}$, the indirect method should be run (2 × the number of sires of the group) times, because the sum corresponding to a given sire can be obtained from two back explorations of the pedigrees. In the vector $\mathbf{x}$, 1s are introduced at the location pertaining to the sire for the first exploration and at the locations pertaining to the corresponding mates for the second exploration.
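The decomposition of $F$ and $\bar{F}$ into MSV contributions can be sketched as follows. This is only an illustration under assumptions made here: the pedigree `PED` and all function names are hypothetical, animals are numbered so that parents precede progeny (0 = unknown parent), and the inbreeding coefficients needed for the MSVs are computed with the same decomposition rather than with a dedicated fast algorithm.

```python
import numpy as np

# Hypothetical pedigree: animal -> (sire, dam); 0 = unknown; parents precede progeny.
PED = {1: (0, 0), 2: (0, 0), 3: (1, 2), 4: (1, 2), 5: (3, 4), 6: (3, 5), 7: (5, 6)}

def trace_up(ped, x):
    """z = T'x: pass the weights in x up the pedigree, halving at each generation."""
    z = np.array(x, dtype=float)
    for i in range(len(ped), 0, -1):
        for parent in ped[i]:
            if parent:
                z[parent] += 0.5 * z[i]
    return z

def unit(ped, i):
    """Vector x with a single 1 at position i (all zeros if i is unknown)."""
    x = np.zeros(len(ped) + 1)
    if i:
        x[i] = 1.0
    return x

def inbreeding_and_msv(ped):
    """F_i = 0.5 * sum_k t_sk t_dk b_kk, with b_ii built from the parents' F."""
    n = len(ped)
    F, b = np.zeros(n + 1), np.zeros(n + 1)
    for i in range(1, n + 1):
        s, d = ped[i]
        b[i] = 1.0 - 0.25 * sum(1.0 + F[p] for p in (s, d) if p)
        F[i] = 0.5 * np.sum(trace_up(ped, unit(ped, s)) * trace_up(ped, unit(ped, d)) * b)
    return F, b

def msv_contributions_individual(ped, b, j):
    """m_k = 0.5 t_sk t_dk b_kk for the sire s and dam d of individual j."""
    s, d = ped[j]
    return 0.5 * trace_up(ped, unit(ped, s)) * trace_up(ped, unit(ped, d)) * b

def msv_contributions_group(ped, b, group):
    """m for the mean F of a group, using two upward explorations per sire."""
    m = np.zeros(len(ped) + 1)
    for s in {ped[j][0] for j in group}:
        x_mates = np.zeros(len(ped) + 1)
        for j in group:
            if ped[j][0] == s:
                x_mates[ped[j][1]] += 1.0   # one entry per progeny of this sire
        m += 0.5 * trace_up(ped, unit(ped, s)) * trace_up(ped, x_mates) * b
    return m / len(group)

F, b = inbreeding_and_msv(PED)
m7 = msv_contributions_individual(PED, b, 7)        # sums to F of animal 7
m_group = msv_contributions_group(PED, b, [5, 6, 7])  # sums to the mean F of the group
```

Grouping the progeny by sire is what limits the number of upward explorations to two per sire, whatever the number of mates.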

This method is much more efficient than treating $F$s as self-coancestries (i.e. $1 + F$s), for which

$$m_k = 0.5 \Bigl( \sum_{j \in G} t_{jk}^2 \Bigr) b_{kk}\, n_g^{-1},$$

which requires tracing back the pedigree $n_g$ times. Moreover, $\bar{f}$ for a group can be obtained by

$$\bar{f}_g = 0.5 \Bigl( \sum_{k \in G} r_k \Bigr) n_g^{-2},$$

when the elements of vector $\mathbf{x}$ corresponding to the group considered are set to 1, whence we have

$$m_k = 0.5 \Bigl( \sum_{j \in G} t_{jk} \Bigr)^{2} b_{kk}\, n_g^{-2}.$$

Note that the term $\sum_{j \in G} t_{jk}$ can be readily obtained by solving $\mathbf{T}'^{-1}\mathbf{z} = \mathbf{x}$, which requires tracing back the pedigree only once, in contrast with the previous situation.

The analytical link between contributions of genes of NCAs and contributions of MSVs of ancestors. Let the known vector $\mathbf{u}$ represent the results of the path counting method for an individual of interest. We now trace back the origins of one established occurrence of inbreeding. A gene sampled from the sire has a genetic value of expectation $0.5\, g_{s_i}$ and of sampling variance equal to $0.25\,(1 - F_{s_i})$, where $g_{s_i}$ is the genetic value of the sire. Genetic variance is 1. Then, if we sample the same paternal gene twice, the expectation of the sum of the genetic values is $g_{s_i}$, its sampling variance is equal to $1 - F_{s_i}$ and genetic variance is 2. The same is true when the gene sampled twice comes from the dam, thus we have:

$$0.25\,\bigl(\mathrm{var}(g_{s_i}) + \mathrm{var}(g_{d_i})\bigr) + 0.5 - 0.25\,(F_{s_i} + F_{d_i}) = 1$$
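As a quick check of this identity, assume (this is the usual expression for an additive trait with unit base genetic variance, and is not stated explicitly in the text) that $\mathrm{var}(g) = 1 + F$ for an individual with inbreeding coefficient $F$. Substituting for the sire $s_i$ and the dam $d_i$ gives:

$$0.25\,\bigl[(1 + F_{s_i}) + (1 + F_{d_i})\bigr] + 0.5 - 0.25\,(F_{s_i} + F_{d_i}) = 0.5 + 0.25\,(F_{s_i} + F_{d_i}) + 0.5 - 0.25\,(F_{s_i} + F_{d_i}) = 1$$

The inbreeding terms cancel, so the two parental variances and the Mendelian sampling term always add up to the single established occurrence of inbreeding.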

This identity is quite general, and it shows that an inbreeding coefficient (here 1, for the known occurrence of inbreeding) can be decomposed into variance components. The term $0.5 - 0.25\,(F_{s_i} + F_{d_i})$ corresponds to the MSV of individual $i$. Both parental genetic variances can be recursively decomposed into a weighted sum of MSVs. Now let vector $\mathbf{c}_i$ represent the coefficients of this development. If inbreeding through an individual $i$ is of probability $u_i$, then the contribution of the individual to vector $\mathbf{m}$ is $u_i \mathbf{c}_i$ and finally $\mathbf{m} = \mathbf{C}\mathbf{u}$, where matrix $\mathbf{C}$ gathers all the involved coefficients and is an upper triangular matrix with $c_{ij} = 0.25\,(t_{s_j i}^2 + t_{d_j i}^2)\, b_{ii}$ and $c_{ii} = b_{ii}$.

Vector $\mathbf{u}$ can be obtained from $\mathbf{u} = \mathbf{C}^{-1}\mathbf{m}$, but $\mathbf{C}^{-1}$ cannot be readily obtained and thus its practical use is limited to small populations. However, since $\mathbf{C}$ is upper triangular, $\mathbf{u}$ can be obtained from $\mathbf{m}$ by Gaussian elimination of $\mathbf{m} = \mathbf{C}\mathbf{u}$. In this situation, the Gaussian elimination can be performed efficiently because each column of $\mathbf{C}$ can be formed one at a time by tracing back the pedigrees of the two parents involved. As mentioned earlier, $\mathbf{u}$ contains the (marginal) contributions of genes of NCAs only to the statistic of interest ($F$, $\bar{F}$ or $\bar{f}$) and allows one to understand when and by whom inbreeding was created (bottlenecks).

Finally, the contributions of genes of founders (Lacy et al., 1996; Lacy, 1997) can be established by taking into account the founder genes conveyed by the NCAs. Considering that $t_{jk}$ is the proportion of genes contributed by founder $k$ to NCA $j$, the contribution of genes of founder $k$ to $F$, $\bar{F}$ or $\bar{f}$ through NCA $j$ is $t_{jk} u_j$, and thus the direct contribution of genes of founder $k$ is $\mathbf{t}_k'\mathbf{u}$, where $\mathbf{t}_k$ is column $k$ of matrix $\mathbf{T}$. Finally, we have $\mathbf{v} = \mathbf{T}'\mathbf{u}$, where $\mathbf{v}$ is the vector of direct contributions of founders.

Simulation study.
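The link $\mathbf{m} = \mathbf{C}\mathbf{u}$ can be illustrated on a small example. The sketch below is not the CFC implementation: the pedigree `PED`, the function names and the tolerance are assumptions made here, animals are numbered so that parents precede progeny (0 = unknown parent), and the upward exploration routine is one reading of the four-step algorithm, included only as a cross-check.

```python
import numpy as np

# Hypothetical pedigree: animal -> (sire, dam); 0 = unknown; parents precede progeny.
PED = {1: (0, 0), 2: (0, 0), 3: (1, 2), 4: (1, 2), 5: (3, 4), 6: (3, 5), 7: (5, 6)}

def gene_fractions(ped, x):
    """z = T'x, obtained by tracing the pedigree upwards from the entries of x."""
    z = np.array(x, dtype=float)
    for i in range(len(ped), 0, -1):
        for parent in ped[i]:
            if parent:
                z[parent] += 0.5 * z[i]
    return z

def row_of_T(ped, i):
    """Fractions of the genes of individual i received from each ancestor (t_ii = 1)."""
    x = np.zeros(len(ped) + 1)
    if i:
        x[i] = 1.0
    return gene_fractions(ped, x)

def inbreeding_and_msv(ped):
    """Inbreeding coefficients F and MSVs b (diagonal of B), oldest animal first."""
    n = len(ped)
    F, b = np.zeros(n + 1), np.zeros(n + 1)
    for i in range(1, n + 1):
        s, d = ped[i]
        b[i] = 1.0 - 0.25 * sum(1.0 + F[p] for p in (s, d) if p)
        F[i] = 0.5 * np.sum(row_of_T(ped, s) * row_of_T(ped, d) * b)
    return F, b

def nca_contributions(ped, b, m):
    """Solve m = C u for u, forming one column of C at a time (youngest first)."""
    rest, u = np.array(m, dtype=float), np.zeros(len(m))
    for j in range(len(ped), 0, -1):
        if abs(rest[j]) < 1e-12:
            continue
        u[j] = rest[j] / b[j]                      # c_jj = b_jj
        s, d = ped[j]
        col = 0.25 * (row_of_T(ped, s) ** 2 + row_of_T(ped, d) ** 2) * b
        col[j] = b[j]
        rest -= col * u[j]                         # remove what NCA j explains
    return u

def upward_exploration(ped, i):
    """Cross-check: table of (min ID, max ID) -> frequency, youngest pair expanded first."""
    u = np.zeros(len(ped) + 1)
    s, d = ped[i]
    table = {(min(s, d), max(s, d)): 0.5} if s and d else {}
    while table:
        lo, hi = max(table, key=lambda pair: pair[1])
        freq = table.pop((lo, hi))
        sire_hi, dam_hi = ped[hi]
        if lo == hi:                               # same individual twice: an NCA
            u[hi] += freq
            new_pairs = [(sire_hi, dam_hi)]
        else:                                      # replace the younger by its parents
            new_pairs = [(lo, sire_hi), (lo, dam_hi)]
        for a, c in new_pairs:
            if a and c:
                key = (min(a, c), max(a, c))
                table[key] = table.get(key, 0.0) + 0.5 * freq
    return u

F, b = inbreeding_and_msv(PED)
sire, dam = PED[7]
m = 0.5 * row_of_T(PED, sire) * row_of_T(PED, dam) * b   # MSV contributions to F of animal 7
u = nca_contributions(PED, b, m)                          # marginal NCA contributions
v = gene_fractions(PED, u)                                # T'u; founder entries are the founder contributions
founders = [i for i, (s, d) in PED.items() if s == 0 and d == 0]
```

For animal 7 of this example pedigree, both routes assign the inbreeding to NCAs 5, 3, 1 and 2, and the founder entries of `v` add up to the inbreeding coefficient.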
A population consisting of 15 discrete generations of 250 sires and 1000 dams (2 progeny per dam) with random selection and mating was simulated to compare the developed method with the upward exploration method. The times in seconds (and memory in MB) required for decomposing $\bar{F}$ and $\bar{f}$ of the last generation were 174.7 (31.6) and 611.5 (61.5), respectively, with the upward exploration method, and 2.3 (1.2) and 2.2 (1.1), respectively, with the developed method. The efficiency of the upward exploration algorithm depends greatly on the structure of pedigrees.

CONCLUSION

An analytical link between contributions of genes of NCAs (Wright, 1922) and contributions of MSVs of ancestors was established. It was shown that, by using an intermediate matrix which is easy to form, contributions of MSVs of ancestors to $F$, $\bar{F}$ or $\bar{f}$ can be efficiently transformed into contributions of genes of NCAs and founders in large populations. The method is potentially useful for monitoring purposes, since two populations with equal $\bar{f}$ could have very different pedigree structures. The method is implemented in CFC software (Sargolzaei et al., 2006), which is available at: http://agrews.agr.niigata-u.ac.jp/~iwsk/cfc.html.

REFERENCES

Boichard, D., Maignel, L. and Verrier, E. (1997) Genet. Sel. Evol. 29: 5-23.
Caballero, A. and Toro, M. A. (2000) Genet. Res. 75: 331-343.
Colleau, J.J. (2002) Genet. Sel. Evol. 34: 409-421.
James, J.W. and McBride, G. (1958) J. Genet. 56: 55-62.
Lacy, R.C. (1989) Zoo Biol. 8: 111-124.
Lacy, R.C. (1995) Zoo Biol. 14: 565-578.
Lacy, R.C., Alaks, G. and Walsh, A. (1996) Evolution 50: 2187-2200.
Lacy, R.C. (1997) Evolution 51: 1025.
Malécot, G. (1948) «Les Mathématiques de l’hérédité». Masson et Cie, Paris, France.
Sargolzaei, M., Iwaisaki, H. and Colleau, J.J. (2006) Proc. 8th WCGALP, CD-Rom comm.
Wright, S. (1922) Am. Nat. 15: 330-338.
Wright, S. (1931) Genetics 16: 97-159.
