Genome Research 6: 71-79, 1996 The Drosophila Genome Project
Gerald M. Rubin Drosophila Genome Center, Department of Molecular and Cell Biology and Howard Hughes Medical Institute, Berkeley CA, 94720-3200
The Drosophila melanogaster genome is 165 Mb, of which about 120 Mb is euchromatic. The genome is organized in 4 chromosome pairs (Figure 1A) and is estimated to contain 10,000-12,000 genes (G. Rubin and G. Miklos, unpubl.). The Drosophila research community, both through the efforts of "genome projects" and of individual investigators, has accumulated a vast amount of data on the genetic and molecular organization of the Drosophila genome as well as on the structure, expression and function of individual genes. In this review I will attempt to summarize the current state of genome research in Drosophila. The groups whose work is summarized are listed in Table 1.
Table 1: Drosophila Research Community
|Group||Senior Members||Contacts||Major Funding Source|
|Berkeley Drosophila Genome Project (BDGP)||Drosophila Genome Center (DGC)
Lawrence Berkeley National Lab. Human Genome Center (LBNL)
Howard Hughes Medical Inst. (HHMI)
Department of Energy (DOE)
|DGC||G. Rubin and S. Lewis (UC Berkeley)
A. Spradling (Carnegie Inst. of Wash.)
M. Palazzolo and C. Martin (LBNL)
D. Hartl (Harvard) until 7/95. I. Kiss (Szeged, Hungary) is a collaborator on the gene disruption project.
|LBNL||M. Narla, M. Palazzolo, C. Martin, J. Jaklevic and F. Eeckman||http://genome.lbl.gov||DOE|
|HHMI||G. Rubin and A. Spradling||HHMI|
|European Drosophila Mapping Consortium||M. Ashburner (Cambridge)
D. Glover and R. Saunders (Dundee)
J. Modolell, CSIC (Madrid)
F. Kafatos, EMBL (Heidelberg)
K. Louis, B. Savakis, and I.Siden-Kiamos, IMBB (Heraklion)
|email@example.com||Fondation Schlumberger (Paris) MRC (UK)
European Community Fundacion Ramon Areces
|Karpen||G. Karpen, Salk Institute (La Jolla)||firstname.lastname@example.org||NCHGR|
|University of Alberta, Canada||J. Locke, A. Ahmed, J. Bell, H. McDermid, D. Nash, D. Pilgrim, K. Roy and R. Hodgetts||email@example.com||Canadian Genome Analysis and Technology Program|
|McGill University, Canada||P. Lasko and B. Suter||Paul_Lasko@maclan.mcgill.ca||Canadian Genome Analysis and Technology Program|
|Duncan||I. Duncan (Washington Univ.)||Duncan@biodec.wustl.edu||NCHGR|
|FlyBase||W. Gelbart (Harvard); M. Ashburner (Cambridge, UK); T. Kaufman and K. Matthews (Indiana Univ.)||http://morgan.harvard.edu/
The origins of genome research in Drosophila
Drosophila has been a leading organism for genome research for over 80 years. The concept that recombination frequencies could be used to order genes on a linear map was first demonstrated, and the first genetic maps were constructed, using Drosophila in 1913 (Sturtevant, 1913). Since then Drosophila has remained the metazoan with the most accurate and complete genetic map. The first physical maps (that is, maps that relate genetic functions to physical locations on chromosomes) were Bridges' polytene chromosome maps that, while made nearly 60 years ago (Bridges, 1937), had a resolution of +/- 100 kb. The polytene chromosome maps have allowed hundreds of genes to be placed in small physical intervals by classical cytogenetic methods. In the early 1970s, when recombinant DNA methods were developed, Drosophila was the only organism with a physical map of its genome. This map, together with the sensitive and precise mapping that could be accomplished by in situ hybridization to polytene chromosomes (Pardue et al., 1970), enabled a number of pioneering studies to be carried out in David Hogness' laboratory in the mid-1970s. Among these were the first chromosomal mapping of cloned unique and dispersed repetitive DNA segments (Wensink et al., 1974; Rubin et al., 1976) and the development of procedures for screening clones by colony hybridization, for assembling large chromosomal contigs, and for positional cloning (Grunstein and Hogness, 1975; Bender et al., 1979). Dozens of Drosophila genes that had been identified by mutations with interesting developmental or physiological phenotypes were positionally cloned in the early 1980s.
Physical Maps
The physical map based on the polytene chromosomes, while invaluable in providing an unambiguously correct reference map, is a cytogenetic map and thus cannot meet the needs of the research community for a clone-based map.
The first systematic attempts to make clone-based physical maps of the entire Drosophila genome were the YAC-based maps of the Hartl and Duncan groups and the cosmid-based maps of the European Drosophila Mapping Consortium (EDMC). In 1992, the Berkeley Drosophila Genome Project (BDGP) began construction of a P1-based STS-content map. These maps are all cross-referenced to one another through the polytene chromosome map; moreover, the EDMC and BDGP maps contain additional cross-references through STSs.
YAC-based Maps
The YAC maps were constructed by carrying out in situ hybridization to polytene chromosomes with individual YACs. Hartl's group (Garza et al., 1989; Ajioka et al., 1991) mapped 1193 YAC clones with an average insert size of 207 kb and Duncan's group (Cai et al., 1994) mapped 855 euchromatic YACs with an average size of 211 kb. Together these YAC maps cover about 90% of the euchromatic genome for the autosomes, and 80% for the X; however, overlaps between YACs have not been confirmed by molecular methods. The distribution of clones appears to be essentially random over most of the euchromatic genome; however, there are a few regions for which no or very few YAC clones were recovered.
Cosmid-based Maps
The overall approach being used by the EDMC to construct a cosmid-based map consists of producing individual contig maps, each representing a single chromosomal division (about 1 Mb), that are then complemented by the inclusion of STS markers generated from the ends of the inserts of mapped cosmids (Siden-Kiamos et al., 1990). In brief, an arrayed cosmid library is screened with a probe generated by micro-dissection of polytene chromosomes and the DNA from positive clones is fingerprinted. After computer-assisted ordering of overlapping cosmids into contigs, a representative set of cosmids from each contig is mapped by in situ hybridization to polytene chromosomes to verify the integrity of the contig and the map localization.
STS markers are then produced from the ends of the inserts of selected cosmids, with the goal of producing an STS map with markers spaced, on average, every 35-40 kb; however, the STSs themselves are not used in the construction of the map. Integration of the map with the genetic map is achieved by hybridization of cloned genes to the arrayed cosmids or by the information provided by the STS markers. The map of the X-chromosome is the most advanced, covering about 62% of the euchromatic portion of the chromosome and including ~560 STS markers (Madueno et al., 1995). Maps of the autosomes are anticipated to reach similar coverage within the next few months. Roughly 1300 STS markers with an average length of 400 bp have been determined by cosmid end-sequencing. Eight percent of these STSs have been found to represent either known Drosophila genes or P1 clones and STS markers from the BDGP (providing links to other existing maps), while 3 percent of the STS markers have strong similarities to genes from other organisms, identifying the Drosophila homologues. A cosmid-based mapping effort focused on the small fourth chromosome is being carried out at the University of Alberta, Canada. The euchromatic region of this chromosome contains ~70 genes distributed over ~1.2 Mb of DNA. Interestingly, the interspersion pattern of the repeated DNA component of this region is unlike most Drosophila gene-rich regions and more closely resembles the short period interspersion class of repeats found in mammalian DNA. The Alberta group is using a technique they call cross-screening that relies on an array of pairwise cross-hybridization tests performed on a single blot to determine clone overlaps rapidly and that may be particularly well-suited to mapping repeat-rich DNA (Locke et al., 1996).
P1-based Maps
The BDGP is constructing a map based on P1 clones using a combination of in situ hybridization (Figure 1B) and STS content mapping.
The first step was the generation of a framework map based on polytene chromosome in situ hybridization of 2467 P1 clones with an average insert size of 80 kb (Smoller et al., 1991; Hartl et al., 1994). This map provides about 70 percent coverage of the euchromatic genome. The second step of map construction uses STS markers designed from the ends of genomic inserts of individual P1 clones from the framework map. By direct sequencing of the vector/insert junctions of P1 genomic clones, two STS markers per clone can be generated that are separated by the length of the insert, ~80 kb. Over 2300 such P1-end derived STSs have been mapped to date. When the pair of markers is mapped to the library, the resulting contigs extend bi-directionally from the mapping clone and cover on average 200 kb of the genome. This average contig size exceeds that provided by non-random or single-end mapping strategies. Furthermore, computer simulations indicate the number of markers required to assign all clones in the library to contigs will be minimized by using this so-called double-end clone-limited approach (Palazzolo et al. 1991). This phase has been completed; nearly all the 2300 euchromatic P1 clones in the framework map have been assigned to contigs. As of October 1995, 649 contigs cover the genome with an STS localized on average every 50 kb (Kimmerly et al., 1996). The BDGP estimates that about 10% of the euchromatic genome remains unmapped. The position and approximate size of these gaps are known because all contigs have been localized on the polytene chromosome map. Clones not yet assigned to contigs (2200 of the 9216 clones in the 5-hit P1 mapping library) are currently being used as a source of STS markers that will fill in the bulk of the final 10% of the euchromatic genome. Individual clones are selected, mapped by in situ hybridization and used to generate STSs. 
This phase of the project should be completed during 1996, at which point coverage is expected to be about 98%. Moreover, many of the remaining gaps in the map are likely to be due to overlaps that have not yet been detected rather than to uncloned regions. Subsequent efforts will focus on contig closure, first by screening a larger 10-hit library with contig end probes. This project has provided a critical test of the utility of the P1 cloning system and will result in the first whole genome to be mapped based on a library constructed with large inserts in a vector that is maintained in E. coli as a single copy plasmid. To provide a more direct link between the physical map and the genetic map, the organized set of P1 clones in this physical map is being used as a substrate for further STS content mapping in which the STSs are derived from markers that are also genetically mapped, including individual genes that have been cloned and sequenced by the research community (350 STSs mapped) and the sites of insertion of P-elements that disrupt essential genes (270 STSs mapped).
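The contig-building step of STS content mapping described above can be illustrated with a short sketch. It assumes only that two clones testing positive for the same STS overlap, and it groups clones into contigs with a union-find structure. The clone and STS names are hypothetical, and this is not the BDGP's own software, which also orders the clones within each contig.

```python
from collections import defaultdict


def build_contigs(sts_hits):
    """Group clones into contigs from STS content data.

    sts_hits maps an STS name to the set of clone names that tested
    positive for it. Clones sharing an STS are assumed to overlap.
    Returns a sorted list of contigs, each a sorted list of clone names.
    """
    parent = {}

    def find(clone):
        # Register unseen clones as their own root, then walk to the root.
        parent.setdefault(clone, clone)
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path halving
            clone = parent[clone]
        return clone

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for clones in sts_hits.values():
        ordered = sorted(clones)
        if not ordered:
            continue
        find(ordered[0])  # make sure single-clone hits are registered
        for other in ordered[1:]:
            union(ordered[0], other)

    groups = defaultdict(set)
    for clone in list(parent):
        groups[find(clone)].add(clone)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data: each P1 clone contributes a left-end and a right-end STS.
example_hits = {
    "P1-A.left": {"P1-A", "P1-B"},
    "P1-A.right": {"P1-A", "P1-C"},
    "P1-D.left": {"P1-D"},
    "P1-D.right": {"P1-D", "P1-E"},
}
print(build_contigs(example_hits))
```

In this toy data set, the two end-derived markers of clone P1-A pull in one neighbour on each side, which is the bi-directional contig extension that the double-end approach relies on.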
Figure 1. Drosophila chromosomes. (A) Metaphase chromosomes. (B) A portion of a polytene chromosome showing a 4 Mb region from the tip of chromosome arm 3L. The average band visible in polytene chromosome preparations contains about 25 kb of DNA. The bracket indicates the signal obtained by in situ hybridization of an 80 kb P1 clone.
Heterochromatin
One of the most striking and enigmatic aspects of genome organization in multicellular eukaryotes is the division of chromosomes into euchromatic and heterochromatic regions. Heterochromatin is distinguished from euchromatin by its paucity of genes, tightly compacted chromatin structure throughout the cell cycle, unusual staining properties, replication late in S phase, and high content of repetitive sequences. In Drosophila, about one-quarter of the genome is heterochromatic, including the centric one-quarter of the X, 2nd and 3rd chromosomes and most of the Y and fourth chromosomes (Gatti and Pimpinelli, 1992). Essential functional components are contained within heterochromatic regions, including centromeres, telomeres, ribosomal RNA genes and 30-50 protein coding genes. Progress in understanding the molecular structure and composition of heterochromatin has been limited since much of this DNA, in particular the simple sequence satellite DNA, cannot be stably cloned in existing cosmid, YAC or P1 vectors. However, data obtained while constructing the P1 framework map suggest that the P1 library may contain a substantial proportion of non-satellite heterochromatic sequences (Hartl et al., 1994). Karpen and his collaborators have focused on analyzing the structure and function of heterochromatic regions of the Dp1187 minichromosome, a deletion derivative of the X-chromosome that is only about 1.3 Mb in length (Karpen and Spradling, 1990). Recently, Dp1187 deletion derivatives were generated by irradiation mutagenesis and their structures were determined from PFGE and DNA blot hybridization analyses. Minichromosome derivatives with one break in the euchromatin and one break in the heterochromatin provided single copy entry points for detailed pulsed-field restriction mapping of previously inaccessible regions of centric heterochromatin.
The map revealed the presence of three large complex "islands" containing middle-repetitive and/or single-copy sequences that are separated by inter-island "seas" of satellite sequences (see Figure 2; Le et al., 1995). Pulsed-field DNA blot analysis demonstrated that in general Drosophila heterochromatin is composed of alternating blocks of complex DNA and simple satellite DNA, each hundreds of kilobases in length. The blocks of complex DNA themselves have considerable substructure and contain many transposable element insertions. The major conclusion from these studies is that a surprising and significant amount of substructure is present deep within Drosophila centric heterochromatin. The presence of repeated DNA has made molecular-genetic analyses of higher eukaryotic centromeres and other heterochromatic inheritance elements extremely difficult. Numerous molecular and cytological studies have associated satellite DNAs with centromeres in mammals, but the exact function of satellite DNAs in mammalian inheritance is unclear, in large part because the transmission behavior of molecularly defined components has not been assayed directly. Analyses of the meiotic and mitotic transmission behavior of Dp1187 deletion derivatives have localized sequences necessary for chromosome inheritance within the centric heterochromatin. The essential core of the centromere is contained within a 220 kb region that includes significant amounts of complex DNA (see Figure 2; Murphy and Karpen, 1995).
Figure 2. Molecular structure of Dp1187 centric heterochromatin. Sequences to the left of the euchromatin/heterochromatin boundary (position 0 kb) include X-tip euchromatin (solid line) and the subtelomeric heterochromatin (gray box); the 1 Mb of centric heterochromatin is shown as a gray box (0 to +1000 kb). Black boxes are the islands of complex DNA (Tahiti, Moorea, and Bora Bora), which are digested with numerous restriction enzymes that recognize 6 bp sites. Gray boxes in the centric region are blocks that predominantly contain satellite DNA repeats. The approximate locations for some satellites are indicated, where known (1.688=359 bp; 1.672=AATAT). Gray bars within Bora Bora indicate the presence of satellite DNA that separates this island into 4 or more "mini-islands" (Sun and Karpen, unpubl.). The locations of the centromere essential core and redundant flanking regions are indicated below.
Genomic Sequencing
The BDGP is currently the only group doing production-scale Drosophila genomic sequencing. A two-year pilot project involving a sequencing team of seven individuals has just concluded. During this time over 2 Mb of genomic sequence have been completed and deposited in the public databases. The regions sequenced include the Bithorax (~350 kb; Martin et al., 1995) and Antennapedia (~430 kb) homeotic gene complexes, as well as about 1.5 Mb from the 34D-36A genomic region. Although these regions have been heavily studied, their sequences have led to unexpected observations; in the Bithorax complex, for example, a glucose transporter gene was discovered in the midst of the complex of homeotic genes. Unlike most genome sequencing projects, which employ a shotgun sequencing strategy, the BDGP is using a directed approach to DNA sequencing that has been developed and implemented by the Palazzolo and Martin group at LBNL. This strategy is diagrammed in Figure 3. The approach offers a number of potential advantages: it requires a reduced amount of sequencing; the assembly can be performed in an automated fashion; the assembly is based on redundant information, which improves the accuracy of the final assembly; and the robust nature of the procedures makes them highly amenable to automation. The BDGP was awarded a three-year grant from the NCHGR in December 1995 that allows for a four-fold increase over the next 18 months in funding devoted to production sequencing; technological advances and economies of scale should allow a substantially greater relative increase in output. They anticipate that the 120 Mb euchromatic genome can be completed in 5 to 7 years (depending on future funding levels).
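The scale-up implied by these figures can be checked with a little arithmetic. The sketch below treats the pilot output as exactly 2 Mb over 2 years (the text says "over 2 Mb") and assumes a constant sequencing rate for the remainder of the project; both are simplifying assumptions, not figures from the BDGP.

```python
PILOT_MB = 2.0          # "over 2 Mb" finished in the pilot, taken as exactly 2 (assumption)
PILOT_YEARS = 2.0       # duration of the pilot project
EUCHROMATIC_MB = 120.0  # size of the euchromatic genome


def required_rate(total_mb, done_mb, years):
    """Average Mb per year needed to finish the remaining sequence in `years`."""
    return (total_mb - done_mb) / years


pilot_rate = PILOT_MB / PILOT_YEARS  # about 1 Mb per year for a team of seven

for years in (5, 7):
    rate = required_rate(EUCHROMATIC_MB, PILOT_MB, years)
    fold = rate / pilot_rate
    print(f"{years} years: {rate:.1f} Mb/year, about {fold:.0f}-fold the pilot rate")
```

Under these assumptions the required output is roughly 17 to 24 times the pilot rate, which is why the text looks to technological advances and economies of scale in addition to the four-fold increase in funding.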
Figure 3. Directed sequencing strategy. The strategy has the following four steps: (1) A P1-based physical map that provides a set of minimally overlapping clones that represent the genome is generated using an STS content mapping strategy. (2) DNA from individual P1 clones is sheared to an average size of 3 kb and subcloned into a plasmid vector, and a set of minimally overlapping 3 kb subclones is identified. Initial approaches to generating this set of 3 kb subclones involved using a PCR-based screening method to identify minimally overlapping clones from three-dimensional pools of 960 different clones. Recently, a strategy developed by Bruce Kimmel (unpubl.) has been used in which 192 unique subclones are selected and both ends are then sequenced. The end-specific sequence information is used to build contigs of 3 kb clones, which are used as transposon targets and completely sequenced. Subsequent rounds of contig-building and subclone sequencing are employed until the 80 kb insert of the P1 is contiguous. (3) γδ transposons are mobilized into the 3 kb target sequences by appropriate bacterial matings. Each clone contains an independent insertion and the insertions in a set of clones are mapped by PCR, using γδ element and vector primers. (4) A minimal set of clones with transposons spaced about every 400 bp (indicated by shaded triangles) is selected and sequenced to give the complete double-stranded sequence of each 3 kb insert. Binding sites for the sequencing primers are provided by sequences present near the termini of the transposon.
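Step (4) amounts to choosing, from the PCR-mapped insertions, a small set whose spacing never exceeds about 400 bp. One simple way to do this is a greedy pass that always takes the farthest insertion still within reach; the sketch below illustrates that idea. It is a hypothetical illustration rather than the LBNL group's software, the insertion positions are invented, and treating the two ends of the insert as already covered by vector primers is an assumption.

```python
def select_minimal_insertions(positions, insert_length, max_gap=400):
    """Choose a small set of transposon insertions spanning an insert.

    positions: mapped insertion sites (bp from the left vector junction).
    insert_length: length of the subclone insert in bp.
    max_gap: largest allowed distance between neighbouring primer sites,
             counting both insert ends as primer sites (assumption).

    Greedy rule: from the current site, jump to the farthest insertion
    that is no more than max_gap away.
    """
    sites = sorted(set(positions))
    chosen = []
    current = 0  # left end of the insert
    i = 0
    while insert_length - current > max_gap:
        best = None
        while i < len(sites) and sites[i] - current <= max_gap:
            best = sites[i]
            i += 1
        if best is None:
            raise ValueError(
                f"no insertion within {max_gap} bp after position {current}"
            )
        chosen.append(best)
        current = best
    return chosen


# Hypothetical insertion sites mapped in one 3 kb subclone.
mapped_sites = [150, 380, 420, 700, 790, 1100, 1180, 1500,
                1890, 1950, 2300, 2650, 2700]
print(select_minimal_insertions(mapped_sites, insert_length=3000))
```

For this example, eight of the thirteen mapped insertions are enough to keep every gap at or below 400 bp. The function raises an error when the mapped insertions leave a larger gap, which in practice would call for mapping more insertions in that subclone.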
Biological Annotation of the Genomic Sequence
A key use of the sequence information from the canonical model organisms, such as Drosophila, will be to help interpret the sequence of the human genome. Simply determining the DNA sequence of the genome would be sufficient for comparison with the genomes of other species to identify similarities between genes or protein domains among species. But such similarities are inherently intellectually sterile, unless the biological functions of the genes have been established for one or more of the species being compared. If the model organism genome projects are to be maximally useful in assigning functions to human DNA sequences, they will need to utilize the powerful tools for determining gene function that are available to them so that not only the sequences of the genes, but also their biological functions, are determined. Among the model organisms, Drosophila is particularly well-suited for this role. In terms of evolutionary conservation of sequence similarity, Drosophila is the closest of the invertebrate model organisms to humans (Sidow and Thomas, 1994). Moreover, in terms of morphological, physiological, and behavioral complexity Drosophila is by far the closest to humans of these model organisms, yet its genome is not substantially bigger than those of the least complex metazoans. Finally, the large Drosophila research community (about one researcher per 2 genes) has provided a wealth of information and understanding unusual in its depth and intellectual breadth. These workers have already extensively characterized about 1000, or 10%, of all Drosophila genes in terms of sequence, gene structure, expression pattern and biological function. Through the efforts of many laboratories over the past 30 years, approximately 25% of the genome has been subjected to saturation mutagenesis experiments, which attempt to identify all the genes that can mutate to an easily detectable phenotype.
These studies lead to an estimate of 4,000 for the number of genes whose functions are essential for viability, or about one third of the total gene number. One of the best characterized regions is the 1.8 Mb in polytene divisions 34D-36A (Ashburner et al., 1990). The sequence of this region is nearing completion by the BDGP and an attempt is being made to correlate open reading frames and transcription units with genetic loci (see Figure 4). Paul Lasko and Beat Suter, at McGill University, are extending the work of the Wright laboratory (Stathakis et al., 1995) in the genetic and molecular analysis of a similarly sized genomic segment comprising polytene regions 37 and 38, and this will be an early target for the BDGP sequencing efforts. The analysis of these regions (genomic sequence, transcript map, expression data, and mutational analysis) should provide a detailed view of the genomic organization of typical euchromatic regions. Indeed, early attempts at such biological annotation of genomic regions began in the 1980s and provided some of the first indications that the number of transcription units would greatly exceed the number of genes that could be identified by mutational analysis (Bossy et al., 1984). As part of its efforts to develop and apply tools for large-scale functional analysis, the BDGP has undertaken a novel gene disruption project based on mutagenesis by transposable element insertion (Spradling et al., 1995). Transposable elements provide a powerful tool for correlating genetic and molecular information because they generate a simple, reproducible lesion upon insertion that can be detected much more easily than damage produced by other mutagens. In D. melanogaster the P transposable element has been particularly useful because it moves with high frequency but can be tightly controlled by limiting the availability of an element-encoded transposase.
The initial goal of the project is to establish a large collection of Drosophila strains that each contain a single genetically engineered P transposable element insertion that mutates a different gene, in a genome free of other P elements. Moreover, the inserted P elements in BDGP lines carry enhancer-traps that can be used to efficiently acquire information about the expression pattern of disrupted genes. The strains in the current collection disrupt 20-25 percent of essential genes, provide information on their expression patterns, and link the genetic, cytogenetic and physical maps of the Drosophila genome at ~100 kb intervals.
Figure 4. A prototype of the BDGP annotated sequence display. This figure is a screen dump taken from the BDGP's prototype Web genome browser applet, written in Java (G. Helt, unpubl.; for more information contact firstname.lastname@example.org). Correlated genetic map and sequence analysis are shown for a portion of P1 DS02740 (~83 kb; GenBank acc. no. L49408). The scale is in kb. Drosophila genes that were previously sequenced by members of the Drosophila research community are shown in black. Lethal P-element insertions whose exact locations were mapped (Spradling et al., 1995) are indicated by the blue-tipped vertical bars. Note that one P element insertion appears to inactivate the gene encoding the L4 ribosomal protein and the other that encoding the ipa-6d homolog. Five Drosophila genes that mutate to detectable phenotypes map between these two P element insertions (J. Roote and M. Ashburner, pers. comm.); three can be assigned to specific transcripts, leaving two complementation groups and five transcripts unassigned. Similarities of conceptually translated regions to known proteins (BLASTX with a significance cutoff of P=1.0e-8) are shown as red boxes. Results from two gene prediction programs are also shown. Exons predicted by Drosophila GRAIL (Xu et al., 1995) are shown as purple boxes. Exons predicted by a Drosophila-specific version of Genefinder are shown as green boxes. These results were further filtered to eliminate certain gene predictions that were completely "shadowed" by higher-score predictions on the opposite strand. Based on analysis of the database searches and computational gene predictions, primers were designed to probe cDNA libraries for the most likely gene candidates, which are annotated as red arrows. cDNAs were found for all of these candidate transcripts (with the sole exception of the grey arrow on the left; L. Hong, D. Harvey and G. Rubin, unpubl.). These cDNAs are currently being sequenced.
The cDNAs are labelled when a significant similarity has been detected by BLASTX: L4 (bacterial ribosomal protein L4, P=4.2e-23); ZK418 (predicted gene from C. elegans, P=6.2e-9); GMF (human glial maturation factor, P=3.0e-43); mkr2 (mouse CNS, P=9.7e-44); ipa-6d (B. subtilis ORF, P=7.2e-12). Significant homologies of the known genes are also worth noting: cactus (human bcl-3 [ikb family], P=2.7e-31); fizzy (human p55cdc, P=5.5e-167); cornichon (hypothetical yeast protein, P=5.0e-15); SED5 (rat syntaxin, P=9.7e-46).
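The "shadow" filter mentioned in the caption can be sketched as follows. The caption says only that certain shadowed predictions were eliminated, so the rule used here (drop a prediction when a higher-scoring prediction on the opposite strand spans it completely) is one plausible reading and not the BDGP's documented criterion. The `Prediction` record and the example coordinates and scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    start: int    # first base of the predicted exon or gene
    end: int      # last base (inclusive)
    strand: str   # "+" or "-"
    score: float  # program-specific score; higher is better


def drop_shadowed(predictions):
    """Remove predictions fully contained in a higher-scoring prediction
    on the opposite strand (assumed interpretation of "shadowed")."""
    kept = []
    for p in predictions:
        shadowed = any(
            q.strand != p.strand
            and q.score > p.score
            and q.start <= p.start
            and p.end <= q.end
            for q in predictions
        )
        if not shadowed:
            kept.append(p)
    return kept


# Hypothetical predictions along a stretch of genomic sequence.
example = [
    Prediction(100, 900, "+", 0.9),
    Prediction(200, 600, "-", 0.4),    # inside the first, opposite strand
    Prediction(1000, 1500, "-", 0.7),
    Prediction(1200, 1400, "-", 0.2),  # same strand as its neighbour, so kept
]
for prediction in drop_shadowed(example):
    print(prediction)
```

Only the second prediction is removed in this example; overlapping predictions on the same strand are left for other evidence, such as the cDNA screens described in the caption, to resolve.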
Databases
There are two main databases for Drosophila genome information: FlyBase and the BDGP database. In addition there are numerous specialized databases dealing with many aspects of Drosophila anatomy, gene expression and gene function; a list of these other resources can be found at http://www-leland.stanford.edu/~ger/drosophila.html. FlyBase is the central database for Drosophila genetic information. It captures information from the literature, from the major genome projects and through bulk data provided by sequence and bibliographic data banks. The major genomic data sets in FlyBase include genetic information on genes, alleles, chromosomal aberrations and transposons, as well as molecular information on contigs, chromosomal walks, transcripts and proteins. FlyBase data can be accessed through the World Wide Web (http://morgan.harvard.edu/ or http://www.embl-ebi.ac.uk/flybase/) or by gopher server (flybase.bio.indiana.edu). Data available from the BDGP home page (http://www.fruitfly.org/) include data on the P-element gene-disruption project and monthly updates of the P1-based physical map. A sequence display is under development that presents genomic sequence determined by the BDGP, annotated with the results of homology searches, gene prediction programs and cDNA sequence and expression analyses. A prototype of this display is shown in Figure 4. The BDGP and FlyBase have collaborated to produce the Encyclopaedia of Drosophila, a database and graphical user interface that uses a version of ACEDB (R. Durbin and J. Thierry-Mieg, unpubl.) customized for Drosophila (S. Lewis and C. Harmon, unpubl.) to present an integrated view of much of the BDGP and FlyBase data. The EofD is available as a Macintosh-compatible CD-ROM or by ftp in Macintosh or UNIX versions.
ACKNOWLEDGEMENTS
I thank my Drosophila colleagues for communicating their results and future plans, and A. Spradling and M. Palazzolo for comments on the manuscript.
REFERENCES
Ajioka, J. W., D. A. Smoller, R. W. Jones, J. P. Carulli, A. E. C. Vellek, D. Garza, A. J. Link, I. W. Duncan and D. L. Hartl. 1991. Drosophila genome project: One-hit coverage in yeast artificial chromosomes. Chromosoma 100: 495-509.
Ashburner, M., P. Thompson, J. Roote, P. Lasko, Y. Grau, M. El Messal, S. Roth, and P. Simpson. 1990. The genetics of a small autosomal region of Drosophila melanogaster containing the structural gene for alcohol dehydrogenase. Genetics 126: 679-694.
Bender, W., P. Spierer, and D. Hogness. 1979. Gene isolation by chromosomal walking. J. Supramol. Struct. 8: 32.
Bossy, B., L.M.C. Hall, and P. Spierer. 1984. Genetic activity along 315 kb of the Drosophila chromosome. EMBO J. 3: 2537-2541.
Bridges, C. 1937. Correspondence between linkage maps and salivary chromosome structure, as illustrated in the tip of chromosome 2R of Drosophila melanogaster. Cytologia Fujii Jubilee Volume: 745-755.
Cai, H., P. Kiefel, J. Yee, and I. Duncan. 1994. A yeast artificial chromosome clone map of the Drosophila genome. Genetics 136: 1385-1401.
Garza, D., J. W. Ajioka, D. T. Burke and D. L. Hartl. 1989. Mapping the Drosophila genome with yeast artificial chromosomes. Science 246: 641-646.
Gatti, M., and S. Pimpinelli. 1992. Functional elements in Drosophila melanogaster heterochromatin. Annu. Rev. Genet. 26: 239-275.
Grunstein, M. and D.S. Hogness. 1975. Colony hybridization: a method for the isolation of cloned DNAs that contain a specific gene. Proc. Natl. Acad. Sci. USA 72: 3961-3965.
Hartl, D. L., D. I. Nurminsky, R.W. Jones, and E. R. Lozovskaya. 1994. Genome structure and evolution in Drosophila: Applications of the framework P1 map. Proc. Natl. Acad. Sci. USA 91: 6824-6829.
Karpen, G.H. and A.C. Spradling. 1990. Reduced DNA polytenization of a minichromosome region undergoing position-effect variegation in Drosophila. Cell 63: 97-107.
Kimmerly, W., K. Stultz, S. Lewis, K. Lewis, V. Lustre, R. Romero, J. Benke, D. Sun, G. Shirley, C. Martin, M. Palazzolo. 1996. A P1-based physical map of the Drosophila euchromatic genome (submitted).
Le, M.-H., D. Duricka, and G.H. Karpen. 1995. Islands of complex DNA are widespread in Drosophila melanogaster centric heterochromatin. Genetics 141: 283-303.
Locke, J., G. Rairdan, H. McDermid, D. Nash, D. Pilgrim, J. Bell, K. Roy and R. Hodgetts. 1996. Cross-screening: a new method to assemble clones rapidly and unambiguously into contigs (submitted).
Madueno, E., G. Papagiannakis, G.A. Rimmington, R.D.C. Saunders, C. Savakis, I. Siden-Kiamos, G. Skavdis, L. Spanos, J. Trenear, P. Adam, M. Ashburner, P. Benos, V.N. Bolshakov, D. Coulson, D.M. Glover, S. Herrmann, F.C. Kafatos, C. Louis, T. Majerus, J. Modolell. 1995. A physical map of the X chromosome of Drosophila melanogaster: Cosmid contigs and sequence tagged sites. Genetics 139: 1631-1647.
Martin, C.H., C.A. Mayeda, C.A. Davis, C.L. Ericsson, J.D. Knafels, D.R. Mathog, S.E. Celniker, E.B. Lewis, and M.J. Palazzolo. 1995. Complete sequence of the bithorax complex of Drosophila. Proc. Natl. Acad. Sci. USA 92: 8398-8402.
Murphy, T. and G.H. Karpen. 1995. Localization of centromere function in a Drosophila minichromosome. Cell 82: 599-609.
Palazzolo, M.J., S.A. Sawyer, C.H. Martin, D.A. Smoller, and D.L. Hartl. 1991. Optimized strategies for STS selection in genome mapping. Proc. Natl. Acad. Sci. USA 88: 8034-8038.
Pardue, M.L., S.A. Gerbi, R.A. Eckhardt, and J.G. Gall. 1970. Cytological localization of DNA complementary to ribosomal RNA in polytene chromosomes of Diptera. Chromosoma 29: 268-290.
Rubin, G.M., D.J. Finnegan, and D.S. Hogness. 1976. The chromosomal arrangement of coding sequences in a family of repeated genes. Prog. Nucleic Acid Res. Molec. Biol. 19: 221-226.
Siden-Kiamos, I., R.D.C. Saunders, L. Spanos, T. Majerus, J. Trenear, C. Savakis, C. Louis, D.M. Glover, M. Ashburner, F.C. Kafatos. 1990. Towards a physical map of the Drosophila melanogaster genome: mapping of cosmid clones within defined genomic divisions. Nucleic Acids Res. 18: 6261-6270.
Sidow, A. and W.K. Thomas. 1994. A molecular evolutionary framework for eukaryotic model organisms. Curr. Biol. 4: 596-603.
Smoller, D. A., D. Petrov and D. L. Hartl. 1991. Characterization of bacteriophage P1 library containing inserts of Drosophila DNA of 75-100 kilobase pairs. Chromosoma 100: 487-494.
Spradling, A. C., D. M. Stern, I. Kiss, J. Roote, T. Laverty and G.M. Rubin. 1995. Gene disruptions using P transposable elements: An integral component of the Drosophila genome project. Proc. Natl. Acad. Sci. USA 92: 10824-10830.
Stathakis, D. G., E. S. Pentz, M. E. Freeman, J. Kullman, G. R. Hankins, N. J. Pearlson and T. R. F. Wright. 1995. The genetic and molecular organization of the Dopa decarboxylase gene cluster of Drosophila melanogaster. Genetics 141: 629-655.
Sturtevant, A. H. 1913. The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. J. Exp. Zool. 14: 43-59.
Wensink, P.C., D.J. Finnegan, J.E. Donelson, and D.S. Hogness. 1974. A system for mapping DNA sequences in the chromosomes of Drosophila melanogaster. Cell 3: 315-325.
Xu, Y., G. Helt, J.R. Einstein, G. Rubin, and E.C. Uberbacher. 1995. Drosophila GRAIL: An intelligent system for gene recognition in Drosophila DNA sequences. Proc. First Int. Symp. on Intelligence in Neural and Biological Systems. 128-135.