Construction of Bacterial Genomic Libraries

arrayed clones, filters for screening by hybridization, or DNA pools for PCR-based screening

Important factors to be considered when constructing or working with libraries for genomewide analysis are summarized below. For detailed protocols on library construction using the cosmid system, see Evans et al. (this volume); for bacteriophage P1 vectors, see Sternberg (this volume); and for BACs, see Birren et al. (this volume).

CHARACTERISTICS OF DIFFERENT BACTERIAL CLONING SYSTEMS

The diversity of bacterial cloning systems available for use in various strains of E. coli enables investigators to choose the cloning system best suited for specific experiments. Table 1 summarizes a variety of vectors used in bacterial cloning systems and describes the features of these vectors commonly used in genome analysis.

Until the development of the newer cloning systems, cosmids were widely used for cloning large-insert DNA fragments in E. coli. Cosmids combine the high efficiency of in vitro packaging into bacteriophage λ with the advantages of a small vector (e.g., a plasmid), because the DNA recircularizes to a plasmid form within the host cell. Cosmids allow the cloning of 15-45 kb of genomic DNA and have been used to build physical maps composed of overlapping clones for the human genome and the genomes of other organisms. Cosmids exist as multiple copies per cell, which allows recovery of large amounts of DNA during DNA preparation but can lead to rearrangements of the cloned DNA (Kim et al. 1992). Procedures for the use of cosmids in genome analysis are described in Evans (this volume) (see also Evans et al. 1992). The fosmid vector (Kim et al. 1992) contains cos sites for bacteriophage λ-mediated packaging but, like BACs, is derived from a single-copy plasmid (F factor). The fosmid vector allows construction of libraries bearing cosmid-sized inserts that are more stably maintained during propagation than those of conventional cosmids. The insert size in cosmid and fosmid clones is limited by the constraints of packaging the DNA into the bacteriophage λ head.

The P1 cloning system (Sternberg 1990; Pierce et al. 1992) packages cloned DNA in the larger bacteriophage P1 head, allowing cloned fragments to be as large as 100 kb (see Sternberg, this volume). The P1 vector is a single-copy vector, offering the advantage of insert stability. Small modifications to the P1 vector, and a different method of delivering recombinant molecules into the host E. coli cells (electroporation instead of transfection), led to the development of the PAC system (Ioannou et al. 1994) and the cloning of DNA fragments hundreds of kilobases in size. Similarly, the BAC system (Shizuya et al. 1992), which uses an F factor-derived vector, has been used to construct libraries of stably propagated DNA fragments as large as 300 kb.
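The insert-size limits quoted above can be collected into a small lookup, which may help when shortlisting a system for a target insert size. This is only a sketch: the dictionary and function names are illustrative assumptions, only the upper limits stated in the text are used (lower limits and packaging constraints are ignored), and PAC is left out because the text gives no single figure for it here.

```python
# Upper insert-size limits (kb) as quoted in the text. The names and the
# structure of this table are illustrative assumptions, not from the source.
MAX_INSERT_KB = {
    "cosmid": 45,
    "fosmid": 45,  # described as carrying cosmid-sized inserts
    "P1": 100,
    "BAC": 300,
}


def systems_for_insert(insert_kb):
    """Return the systems whose quoted upper limit accommodates insert_kb."""
    return sorted(
        name for name, limit in MAX_INSERT_KB.items() if insert_kb <= limit
    )


print(systems_for_insert(40))   # every listed system can carry 40 kb
print(systems_for_insert(120))  # only the BAC limit exceeds 120 kb
```

A shortlist of this kind says nothing about insert stability or copy number, which the text treats as equally important criteria.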

AVERAGE INSERT SIZE AND REPRESENTATION OF THE GENOME

In many genomic cloning strategies, DNA sources and preparations can be manipulated to enrich for specific regions or targets of interest. However, for

are used for cloning are not uniformly distributed in genomic DNA. Second, there are regions within genomic sequences that are poorly maintained in bacterial host cells. For these reasons, it is usually necessary to construct libraries to a greater depth than suggested by the Poisson distribution.

The depth of genome coverage required for a genomic library depends on the purpose of the library. As seen in Table 2, it is very likely that a 5x library would be adequate for finding one or more clones for a locus of interest, since more than 99% of the genome should be present. In a project where only a few clones per locus are needed, the additional reagents and effort required to construct a larger 10x library are not warranted. However, in walking or mapping, where each clone in a series of clones must be identified, the chance of finding no clones at some point in the path is greater (the chance of completing the path being the product of the probabilities of finding each of the clones). Thus, on a statistical basis, e.g., when constructing clone contigs from a 5x BAC library, gaps are expected, on average, every 10 Mb. Libraries of greater depth (>10x) not only provide an increased likelihood of building continuous coverage over long paths, but also allow flexibility in selecting specific paths (e.g., a path of minimally overlapping clones for sequencing). Thus, the appropriate number of clones in a library should be the minimal number of clones giving a genome coverage likely to meet the necessary requirements.

Table 2 Probability of Having One or More Clones/Locus within a Library as a Function of Library Size

Genome coverage (x)    Probability of one or more clones (%)
0.5                    39.3
1                      63.2
2                      86.4
3                      95.0
4                      98.2
5                      99.3
6                      99.75
7                      99.91
9                      99.99
10                     99.995

The probability of finding n clones from a library of coverage c is $P(n) = c^{n} e^{-c} / n!$. The probabilities shown are those of finding one or more clones [$1 - P(0)$] for libraries with genome coverages 0.5 to 10.

Table 3 Number of Clones Required for 7.5x Genomic Coverage

Genome size (Mb)    Average insert size (kb)    No. of clones    No. of 384-well plates
20                  50                          3000             8
100                 100                         7500             20
3000                50                          450,000          1172
3000                150                         150,000          391
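The coverage probabilities of Table 2, and the argument about clone paths, can be checked with a few lines of Python. This is a minimal sketch that assumes clones are Poisson distributed over the genome and that loci along a path are independent; both are idealizations, and the function names are illustrative.

```python
import math


def p_at_least_one(coverage):
    """Poisson probability of one or more clones at a locus: 1 - P(0)."""
    return 1.0 - math.exp(-coverage)


def p_complete_path(coverage, n_loci):
    """Probability that every one of n_loci independent loci is represented."""
    return p_at_least_one(coverage) ** n_loci


# Single-locus probabilities for a range of library coverages.
for c in (0.5, 1, 2, 3, 4, 5, 10):
    print(f"{c:>4}x  {100 * p_at_least_one(c):.3f}%")

# Chance of finding a clone at each of 100 successive loci.
print(f"5x:  {p_complete_path(5, 100):.3f}")
print(f"10x: {p_complete_path(10, 100):.3f}")
```

Under these assumptions a 5x library that is more than 99% complete at any single locus still has only about an even chance of covering 100 successive loci, which illustrates why deeper libraries are preferred for walking and mapping.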

Constructing and screening highly redundant libraries for large genomes requires generating and handling many clones. This increases the cost and effort for their production, storage, and manipulation. Table 3 summarizes the numbers of clones required to produce 7.5x coverage libraries for organisms of different genome sizes. Libraries for organisms with smaller genomes require small numbers of clones. For instance, a 7.5x fosmid library for an organism with a 20 Mb genome could be stored in a set of eight 384-well microtiter plates. A 7.5x human (genome size of 3000 Mb) BAC library with an average insert size of 150 kb would occupy hundreds of microtiter plates; a 7.5x human cosmid library would occupy more than 1000 plates.
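The arithmetic behind these clone and plate counts is simple enough to sketch. The function names below are illustrative assumptions; the calculation assumes one clone per well and treats coverage as total cloned DNA divided by genome size.

```python
import math


def clones_required(genome_mb, insert_kb, coverage=7.5):
    """Clones needed so that total cloned DNA equals coverage x genome size."""
    return math.ceil(coverage * genome_mb * 1000 / insert_kb)


def plates_required(n_clones, wells_per_plate=384):
    """Microtiter plates needed to array n_clones, one clone per well."""
    return math.ceil(n_clones / wells_per_plate)


# A 3000 Mb genome cloned with 150 kb average inserts at 7.5x coverage.
n = clones_required(3000, 150)
print(n, plates_required(n))

# A 20 Mb genome with 50 kb average inserts (an assumed insert size).
n = clones_required(20, 50)
print(n, plates_required(n))
```

Because the clone count scales inversely with insert size, tripling the average insert from 50 kb to 150 kb cuts the number of plates to be stored and screened by a factor of three.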

DNA SOURCE

A genomic library reflects the genome from which it was made. The selection of particular individuals, strains, or cell lines as a DNA source can enhance characteristics of the resulting libraries. For instance, to clone the sequences adjacent to a specific translocation breakpoint, one might choose to construct the library from the DNA of the individual with that breakpoint. However, in the context of genome analysis, the DNA source is most often chosen for its representation of the "normal" configuration of the genome, rather than for the presence or absence of specific attributes. Even in the latter case, important issues must be considered, e.g., the use of transformed cell lines. If a cell line has been extensively passaged, it is likely to have accumulated mutations and rearrangements that deviate from the normal state of that genome, and thus an early passage cell line would be a better choice.

The sex of the organism that will serve as the DNA source may also be a factor. In mammals, twice as many clones must be screened for X chromosome loci when DNA from a male is used, compared to when DNA from a female is used. Y chromosome loci will not be present in DNA from females. If it is likely that clones may be used later in functional knockout or homologous recombination experiments, there may be advantages to using DNA derived from the same animal strain as the recipient cell.

For the human genome, natural variation becomes significant. To obtain a library that minimizes variation, DNA from a consanguineous individual could be used; for a specific chromosome, the chromosome could be flow-sorted from a monochromosomal somatic cell hybrid. On the other hand, to maximize the variations that can be identified from the clones in the library, DNA from multiple individuals could be pooled.

Finally, ethical and legal issues must also be considered. For large-scale human DNA sequencing, guidelines have been issued for recruitment and protection of DNA donors. For information on these guidelines, see the National Center for Human Genome Research/Department of Energy (NCHGR-DOE) Guidance on Human Subjects Issues in Large-Scale DNA Sequencing at the following World Wide Web site: http://www.ornl.gov/TechResources/Human_Genome/archive/nchgrdoe.html
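The effect of donor sex on the representation of sex-chromosome loci, noted above, can be made concrete with a short sketch. It assumes an XX female or XY male mammalian donor, library coverage expressed relative to autosomal loci, and the Poisson model for clone representation; the function names and chromosome labels are illustrative.

```python
import math


def locus_coverage(library_coverage, chromosome, donor_sex):
    """Effective coverage of a locus, relative to autosomal coverage."""
    if chromosome == "autosome":
        return library_coverage
    if chromosome == "X":
        # One X in males versus two in females halves the representation.
        return library_coverage if donor_sex == "female" else library_coverage / 2
    if chromosome == "Y":
        # No Y chromosome in female DNA; a single copy in males.
        return 0.0 if donor_sex == "female" else library_coverage / 2
    raise ValueError(f"unknown chromosome class: {chromosome}")


def p_locus_present(library_coverage, chromosome, donor_sex):
    """Poisson probability that at least one clone covers the locus."""
    return 1.0 - math.exp(-locus_coverage(library_coverage, chromosome, donor_sex))


print(locus_coverage(5, "X", "male"))    # half the autosomal coverage
print(locus_coverage(5, "X", "female"))  # same as autosomal coverage
print(p_locus_present(5, "Y", "female")) # Y loci are absent
```

In this model a 5x library from a male donor behaves like a 2.5x library for X and Y loci, so roughly twice as many clones must be screened to reach the same expectation as for an autosomal locus.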


AMOUNT AND QUALITY OF DNA REQUIRED

The quantity of DNA needed to construct a library will vary depending on the cloning system chosen and the extent of genome coverage needed (see Table 4). As library construction often involves several rounds of DNA preparation, size-selection, ligation, and transformation, with optimization of each step, it is important to be able to obtain large amounts of DNA initially or to be able to repeat the DNA preparation as needed. Different bacterial cloning vectors can accept different size ranges of genomic insert DNA: from 15-45 kb for fosmids/cosmids to as much as 300 kb for BACs/PACs. This has important implications for the preparation of the source DNA. In general, the average size of fragments from the purified source DNA must be five to six times larger than the size of fragments required in the ligation step. In particular, the preparation of DNA for the construction of BAC and PAC libraries uses source DNAs isolated in agarose plugs (Shizuya et al. 1992; Ioannou et al. 1994).

THE BACTERIAL HOST STRAIN

Certain DNA sequences may be difficult to clone and/or maintain in a bacterial host. Many E. coli mutations have been identified that improve the ability of a strain to serve as a host for foreign DNA. The high level of repetitive sequences in genomes of higher organisms means that multiple copies of these repetitive sequences will be

Table 4 Cloning Methods Suitable for Generating Genomic Libraries Using Different DNA Preparations

Source DNA preparation                             Size          Appropriate cloning system for source DNA
Genomic DNA in agarose                             >1 Mb         BAC, PAC, P1, fosmid, cosmid
Genomic DNA in aqueous solution                    100-200 kb    P1, fosmid, cosmid, bacteriophage λ, M13, plasmids
Genomic DNA in aqueous solution                    50-100 kb     bacteriophage λ, M13, plasmids
Purified chromosomal DNA or isolated YAC DNA       >200 kb       fosmid, cosmid, bacteriophage λ, M13, plasmids

present within individual clones. Wild-type E. coli strains are highly proficient in recombination, which causes deletions and rearrangements through intramolecular recombination (and, in the case of multicopy vectors, intermolecular recombination as well) at these repeated sequences. Poor clonability also results from incompatibility of the foreign DNA with the natural host restriction and modification systems. The choice of host strain is an important factor for the efficiency of transfection or transformation, and for the quantity and quality of DNA obtained from cultures. To reduce the ability of the bacterial host to modify cloned DNA fragments in such ways, several host strains with mutant genotypes are now available for library construction (see Table 5). In general, DH5α and DH10B are useful for cosmid libraries, and DH10B has been used for BAC and fosmid library construction. Table 6 summarizes some of the well-known E. coli genes that are directly or indirectly involved with DNA recombination processes and host restriction and modification, or otherwise influence the suitability of a strain to serve as a host for cloned fragments.


Table 6 E. coli Mutations That Affect the Ability to Clone Mammalian DNA in Different Host Strains

Mutation        Description
recA            The major gene involved in homologous recombination. Mutation increases the stability of DNA that would otherwise be prone to deletions or rearrangements (e.g., DNA containing repetitive sequences).
recB, recC      These genes encode exonuclease V and are also involved in recombination. Mutation increases the stability of DNA that would otherwise be prone to deletions or rearrangements (e.g., DNA containing repetitive sequences).
recD            The gene encodes a component of the RecBC enzyme complex and is necessary for the expression of the enzyme's nucleolytic activities. Mutation increases the stability of DNA that would otherwise be prone to deletions or rearrangements (e.g., DNA containing repetitive sequences).
[illegible]     The gene encodes a component of the recF recombination pathway. The mutation improves the growth of recB and recC mutants.
sbcB            The gene encodes exonuclease I, a component of the recF pathway. The sbcB mutation improves the growth of recB and recC mutants.
hsdR            The gene encodes a subunit of the restriction enzyme EcoK. hsdR mutants can methylate DNA but are unable to restrict foreign DNA at the appropriate recognition site.
hsdM            The gene encodes a subunit of E. coli methylase, which methylates the adenine residues in EcoK restriction sites. The EcoK restriction enzyme fails to recognize DNA methylated at adenine residues.
hsdS            The gene encodes a protein subunit that confers site specificity to both HsdM and HsdR proteins. hsdS mutants are incapable of both methylation and restriction.
mcrA            The gene encodes a restriction enzyme. Mutation prevents restriction of DNA methylated at the sequence 5'-C*CGG-3', thereby allowing more efficient cloning of DNA containing methylcytosine residues.
mcrB, mcrC      The mcrB and mcrC genes encode the two subunits of the McrBC restriction endonuclease. Mutation prevents restriction at methylcytosine at the 5'-G*C-3' sequence and thus allows more efficient cloning of DNA containing 5-methylcytosine or 5-hydroxymethylcytosine.
mrr             The gene encodes a restriction enzyme. Mutation prevents restriction of DNA methylated at the sequences 5'-G*AC-3' or 5'-C*AG-3', thereby allowing more efficient cloning of DNA containing methyladenine.
endA            The gene encodes endonuclease I. DNA purified from endA mutants is of a higher yield and has increased stability due to the absence of this nuclease.
deoR            The gene encodes the repressor of the deo operon. A mutation in this gene allows for increased transformation efficiency of large plasmids.
lacZΔM15        This deleted gene encodes an inactive form of β-galactosidase that lacks amino acids 11-41. Some cloning vectors encode an amino-terminal α-fragment of β-galactosidase. This allows α-complementation with the host lacZΔM15 product to form an active enzyme. The cloning site(s) of such vectors lies within the α-fragment-encoding region. Therefore, nonrecombinants form blue colonies on medium containing X-gal, whereas recombinants are white.

mcrA, mcrBC, and mrr all restrict DNA modified by CpG methylase.
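When comparing candidate hosts for methylated mammalian DNA, the footnote to Table 6 suggests a simple check: which of the McrA, McrBC, and Mrr systems remain intact in a given strain. The sketch below assumes a genotype supplied as a set of mutated gene names and treats a mutation in any one gene as inactivating its whole system; the genotypes in the example are hypothetical and do not describe any real strain.

```python
# Restriction systems named in the table footnote, mapped to the genes that
# encode them. Treating a mutation in any one gene as inactivating the whole
# system is a simplifying assumption of this sketch.
METHYL_RESTRICTION_SYSTEMS = {
    "McrA": {"mcrA"},
    "McrBC": {"mcrB", "mcrC"},
    "Mrr": {"mrr"},
}


def active_methyl_restriction(mutated_genes):
    """Return the methylation-dependent restriction systems still intact."""
    mutated = set(mutated_genes)
    return sorted(
        system
        for system, genes in METHYL_RESTRICTION_SYSTEMS.items()
        if not (genes & mutated)
    )


# Hypothetical genotypes, for illustration only.
print(active_methyl_restriction({"recA", "endA"}))
print(active_methyl_restriction({"recA", "mcrA", "mcrB", "mrr"}))
```

An empty result indicates that none of the three systems should restrict CpG-methylated inserts in that host; any system still listed is a possible cause of poor recovery of methylated DNA.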

DNA libraries using E. coli hosts. The ease of application, cloning capacity, and reliability of bacterial cloning systems make them useful for most genome analysis experiments, including large scale physical mapping and genomic sequencing.

