BDGP News Archive

The BDGP has submitted 21,620 ESTs from the new AT adult testes cDNA library, and these clones are now available. Click [HERE] for more information.

JUN-06-01 Update to Single Nucleotide Polymorphism Map



As the genomic sequence is finished to high quality, it will appear [HERE] as Release 2.5. The page will be updated regularly as more sequence is finished.


Release 2 of the annotated genomic sequence has been made public. Read Release 2 Notes [HERE].


The Drosophila Gene Collection r1.0 has been released. Find out more [HERE].


The release of the DGC has been delayed. A low titer of phage contamination has been found in a few wells of a small number of the plates. We are currently analyzing all of the plates, and will do our best to eliminate this contamination. We now plan to distribute the 4534 clones in the pOT vector before the end of August. The remaining clones in the pBluescript vector will follow in a few weeks. Please contact [email protected] if you have any questions.


The GadFly, BFD, and cDNA/EST databases are once again functioning. We have been having recurring problems with our Informix server software that have been difficult to debug. We appreciate your patience, and thank everyone who has notified us about problems with the databases.


Download the DGC

We recently reported the analysis of >80,000 ESTs to select a set of 5,849 non-redundant cDNA clones (Rubin et al., Science 287: 2222). This is not a complete set for the Drosophila genome, which is reported to have 13,601 genes, but more cDNA clones will be added to later versions of the DGC.

To facilitate the widespread distribution of these cDNA clones to the community, we will donate copies of the collection to 50 laboratories selected by the National Drosophila Board. The 4534 clones in the pOT vector will be distributed before the end of June. The remaining clones in the pBluescript vector will follow in a few weeks. These clones will also be made available to commercial distributors. As soon as these laboratories and commercial distributors are determined, we will post their contact information on this site. Individual laboratories who wish to obtain a copy should read the letter from the Fly Board. Commercial distributors who wish to obtain a copy should contact [email protected].


The annotated genomic sequence can now be queried by sequence similarity at the BDGP BLAST server. In addition, users can search against the annotations' predicted cDNA and protein sequences, and these data (as well as the CDS and local genomic sequence for the annotations) can be downloaded from our Sequence download page.

We have also updated the table of Predicted genes grouped by sequence similarity. These sets of similar genes, created by BLASTing all of the fly transcript sequences against each other, can be browsed by molecular function.
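As a rough illustration of how all-against-all BLAST hits can be turned into sets of similar genes, here is a minimal Python sketch. It assumes single-linkage grouping (any two sequences connected by a chain of significant hits fall into the same set) and an E-value cutoff of 1e-10. Both are assumptions made for this example, as are the function name and the `(query, subject, evalue)` tuple layout; the grouping procedure actually used for the table is not described here.

```python
def group_by_similarity(hits, max_evalue=1e-10):
    """Group sequence names into sets linked by significant pairwise hits.

    hits: iterable of (query, subject, evalue) tuples, e.g. parsed from
          tabular all-vs-all BLAST output (hypothetical input layout).
    max_evalue: hits with a larger E-value are ignored (assumed cutoff).
    Returns a list of sorted name lists, one per group.
    """
    parent = {}

    def find(name):
        # Union-find lookup with path halving.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for query, subject, evalue in hits:
        root_q, root_s = find(query), find(subject)  # registers both names
        if query != subject and evalue <= max_evalue and root_q != root_s:
            parent[root_q] = root_s  # merge the two groups

    groups = {}
    for name in list(parent):
        groups.setdefault(find(name), set()).add(name)
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda members: members[0])
```

Single linkage is simple but can chain loosely related multi-domain proteins into one large set, so a stricter criterion may be preferable in practice.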


The BDGP and Celera Genomics report the sequencing and annotation of the euchromatic genome of Drosophila melanogaster. The results appear in the March 24 issue of the journal Science.

The annotation was done in a unique collaboration between Celera, BDGP, and other members of the scientific community. The results of the annotation are stored in GadFly, the FlyBase Genome Annotation Database of Drosophila. This new database and chromosome arm sequence can be queried by gene name, cytological region, molecular function, or protein domain. The annotated genome can also be browsed graphically with our new Java display tool GeneScene.

The genomic sequence and annotations are preliminary. The BDGP has a plan to systematically finish the sequence to high quality and refine and improve the annotations with FlyBase over the next year. At this time, ~92% of the genome is in contigs larger than 30kb, and ~78% in contigs greater than 100kb; most gaps are small (3kb or less) and due to genomic repeats, such as transposons. These contigs are ordered and oriented with respect to the genome: >95% of the euchromatic sequence is in 14 large scaffolds, and is freely available on our sequence download page. The sequence of all the scaffolds (including heterochromatin), predicted transcripts, and predicted proteins are also available on that page.
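For readers who want to compute the same kind of contiguity figures on their own assemblies, the following Python sketch shows one way to get the fraction of sequence in contigs above a size threshold. The function name and the optional `genome_size` argument are hypothetical, and the sketch assumes the denominator is either the summed contig length or a genome size you supply; how the percentages quoted above were computed is not stated.

```python
def fraction_in_large_contigs(contig_lengths, min_length, genome_size=None):
    """Fraction of sequence contained in contigs longer than min_length.

    contig_lengths: iterable of contig lengths in bp.
    min_length: size threshold in bp (e.g. 30_000 or 100_000).
    genome_size: optional denominator in bp; defaults to the summed
                 contig length (an assumption of this sketch).
    """
    lengths = list(contig_lengths)
    total = genome_size if genome_size is not None else sum(lengths)
    if total <= 0:
        raise ValueError("total length must be positive")
    covered = sum(length for length in lengths if length > min_length)
    return covered / total
```

Calling it with thresholds of `30_000` and `100_000` on a list of contig lengths gives figures comparable in form to the ~92% and ~78% reported above.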

The exon-intron structure of annotations will often be incorrect initially, but full-length sequencing of cDNAs corresponding to these genes should provide the correct gene structures. We recommend repeating sequence similarity searching yourself using BLAST. Because the sequence will be in flux, we ask that you record short molecular sequence tags, e.g., 30bp of unique sequence from your region of interest, rather than recording the coordinates of a region based on absolute numbers.
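The sequence-tag recommendation can be sketched in Python as follows: pick a 30bp window from your region of interest that occurs exactly once in the sequence, store that string, and later find it again in an updated sequence even if the coordinates have shifted. The function names are hypothetical, and the sketch makes two simplifying assumptions: it searches the forward strand only, and it requires an exact match.

```python
def occurs_once(sequence, tag):
    """True if tag occurs exactly once in sequence (overlaps included)."""
    first = sequence.find(tag)
    return first != -1 and sequence.find(tag, first + 1) == -1


def find_unique_tag(sequence, start, end, tag_len=30):
    """Return the first tag_len-bp window starting in [start, end) that is
    unique within sequence, or None if no such window exists."""
    end = min(end, len(sequence))
    for i in range(start, end - tag_len + 1):
        tag = sequence[i:i + tag_len]
        if occurs_once(sequence, tag):
            return tag
    return None


def locate_tag(sequence, tag):
    """Return the 0-based position of tag in sequence, or -1 if absent."""
    return sequence.find(tag)
```

If the tag is not found after a sequence update, the region itself was probably revised, and a similarity search with BLAST is a more tolerant way to locate it again.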

Our efforts will be greatly facilitated if the public reports changes to the sequence and annotation to us using our new Update Form; we will make these comments public in the annotation reports.


Celera Genomics has finished its shotgun sequencing of the D. melanogaster genome, and the BDGP has begun the task of gap filling. Over 45 megabases of this sequence are already available from the GenBank/EMBL/DDBJ database. All of the Celera sequence will be available, without annotation, by December 31, 1999. New sequence covering any gaps will be released to GenBank by the BDGP as soon as it is available. The annotated sequence will be released to the sequence database on publication, which is expected in the first quarter of 2000.

One aspect of the annotation is matching this sequence against Drosophila genes whose sequences are already known. FlyBase has a file of 2540 complete or partial gene sequences. These are freely available on our sequence download page as na_embl.dros.

We are aware that many laboratories have sequences of genes that have not yet been submitted to the nucleotide sequence databases. We encourage you to submit these sequences. This will have three advantages: (1) you will get the credit for having identified and sequenced the gene first; (2) we will be able to include the identification of these genes on the annotated sequence when it is published; and (3) if your sequence is of a cDNA, it will help us get the correct gene structure.


The BDGP has sequenced a complete 2.9-megabase region of the Drosophila melanogaster genome and exhaustively analyzed it. The complete results are available on this page, as well as a Java applet for interactive browsing of the annotations on the Adh region.


View the Scaffold Sequence

The BDGP and Celera Genomics are working together to complete the sequence of the Drosophila melanogaster genome. Celera has produced whole-genome shotgun data to an average depth of 10X coverage. In addition to 26.5 Mb of completed genome sequence, the BDGP is producing a BAC-based genome physical map and defining a tiling path of overlapping BAC and P1 clones spanning the euchromatic portion of the genome. BDGP is generating low-coverage (~1.5X) shotgun sequence of each BAC and P1 clone in the tiling path that has not already been sequenced to greater coverage or completion. This "scaffold sequence" is being used to assist in assembly and finishing of the whole-genome shotgun data.

These new tables show work in progress and will be updated regularly.


A table lists human inherited disease genes and the Drosophila sequences similar to them by BLAST. In many cases a known Drosophila gene has been identified; in some cases, new Drosophila sequences strikingly similar to the human gene are found in the Drosophila EST and genomic sequence databases.


Results of the community-wide experiment to assess gene prediction on long eukaryotic genomic sequences have been published and are available on our web site. They were presented at ISMB99.


The BDGP and Celera Genomics have signed a Memorandum of Understanding (MOU) outlining how they will work together on sequencing the Drosophila Genome. This collaboration was first announced in the June 5th issue of Science. The goal of this collaboration is to produce a high-quality, publicly available sequence of Drosophila euchromatin by the end of this year.