cDNA & EST FAQ

How were the cDNA libraries made?

What are "IP" or "MIP" clones and what library are these clones from?

What are "FI" clones and what library are these clones from?

Does the BDGP have an ORFeome project?

How long are the cDNAs? Is the submitted sequence the entire sequence of the cDNA?

When will my cDNA clone be full-length sequenced?

What is a "BcDNA?"

Are there sequences in the cDNA database that are not in NCBI's Genbank database?

How do I find a cDNA for my gene?

Is there additional information available for cDNA clones?

How do I request a cDNA clone?

How do I request an aliquot of a cDNA library?

How are the cDNA libraries sequenced?

How are the cDNA clones numbered?

What is the strategy for full-insert sequencing?

What is an EST?

What was the purpose of the EST project?

What is a clot? Can I view EST sequences that are homologous to each other?

Have the ESTs been mapped by in situs to polytene chromosomes?

What was the production rate of the ESTs?

How were RNA in situ hybridizations to embryos with the CK library performed?

How are the sequence data processed and what is the quality of the sequence?

What is the sequence representation of the library?

What is the Drosophila Gene Collection?

How do I obtain the Drosophila Gene Collection?

In what format is the Drosophila Gene Collection?

What does "Not DGC Clone" mean?


cDNA

How were the cDNA libraries made?

AT, CK, FI, GH, GM, HL, IP, LD, LP, MIP, RE, RH, SD, TA, TB, UT

1. AT pOTB7 Plasmid Library (made by Ling Hong with mRNA from J. Pringle and M. Fuller)
The AT (adult testes) library was made from RNA extracted from Drosophila adult male testes. mRNA source: adult male testes and seminal vesicles hand-dissected from 0-3-day-old non-isogenic Ore-R males from a population cage, polyA+ selected twice, kindly provided by J. Pringle and M. Fuller. cDNA made using Stratagene ZAP-cDNA synthesis kit; oligo(dT) primed with XhoI site at end of primer for first strand synthesis; EcoRI adapter on 5' ends of clones; cDNA size fractionated on Sephacryl S-500--approximately 1-6kb. cDNA directionally cloned directly into EcoRI/XhoI-digested pOTB7 plasmid. cDNAs were transformed into DH5-alpha strain (Gibco/BRL) or, for AT121nn clones, into DH5-alpha TonA.

2. CK Library (made by Casey Kopczynski)
mRNA source--rough endoplasmic reticulum (RER) from 8-16hr embryos from a non-isogenic Oregon R population cage. RER selection was used to bias the library toward transcripts encoding secreted and membrane-associated proteins; polyA+ selected twice [note: only 70% of the cDNAs appear to be from secreted or membrane-associated proteins by Northern blotting; the rest are cytoplasmic or nuclear proteins]. Oligo(dT) primed with PstI site at end of primer for first strand synthesis; normalized to genomic DNA beads; KS complementary sequence put at 5' end of each cDNA; HindIII/XmnI adapter on 5' ends of clones; cDNA directionally cloned into HindIII/PstI-digested pBluescript SK(+) vector. See the Detailed Protocol.

3. FI Clones
Please see the question below.

4. GH ZAPII Library (made by Ling Hong)
The GH library is separate from the HL library. Since many of the HL clones were not full length and others lacked inserts, we chose to remake the head library. mRNA source--adult heads, from an isogenic y; cn bw sp strain, polyA+ selected once. Oligo(dT) primed with XhoI site at end of primer for first strand synthesis; EcoRI adapter on 5' ends of clones; size fractionated on Sephacryl S-500--approximately 1-6kb. cDNA directionally cloned into EcoRI/XhoI-digested Stratagene Uni-Zap XR vector, which allows in vivo excision and recircularization of pBluescript SK(+/-) plasmid.

5. GH pOT2 Plasmid Library (made by Ling Hong)
The GH library is separate from the HL library. Since many of the HL clones were not full length and others lacked inserts, we chose to remake the head library. mRNA source--adult heads, from an isogenic y; cn bw sp strain, polyA+ selected once. Oligo(dT) primed with XhoI site at end of primer for first strand synthesis; EcoRI adapter on 5' ends of clones; size fractionated on Sephacryl S-500--approximately 1-6kb. cDNA directionally cloned into EcoRI/XhoI-digested pOT2 plasmid. ~500,000 colonies were plated out and a mass plasmid DNA prep was done using Qiagen. Plasmids containing the cDNA inserts were size selected on a Sephacryl S-500 column in order to exclude plasmids with no insert. Early elutions were collected and transformed into DH5-alpha strain (Gibco/BRL).

6. GM Library (made by Ling Hong)
mRNA source--ovaries, stages 1-6 of oogenesis, from a non-isogenic Oregon R strain P2 population cage, polyA+ selected once, RNA provided by Allan Spradling. cDNA made using Stratagene ZAP-cDNA synthesis kit; oligo(dT) primed with XhoI site at end of primer for first strand synthesis; EcoRI adapter on 5' ends of clones; size fractionated on Sephacryl S-500--approximately 1-6kb. For clones GM01001-GM11496, cDNA directionally cloned into EcoRI/XhoI-digested Stratagene Uni-Zap XR vector, which allows in vivo excision and recircularization of pBluescript SK(+/-) plasmid. For clones GM12101 and higher, cDNA directionally cloned into EcoRI/XhoI-digested pOT2 plasmid.

7. HL Library (made by Ling Hong)
mRNA source--adult heads, from an isogenic y; cn bw sp strain, polyA+ selected once. cDNA made using Stratagene ZAP-cDNA synthesis kit; oligo(dT) primed with XhoI site at end of primer for first strand synthesis; EcoRI adapter on 5' ends of clones; size fractionated on Sephacryl S-500--approximately 1-6kb. For clones HL01001-HL07796, cDNA directionally cloned into EcoRI/XhoI-digested Stratagene Uni-Zap XR vector, which allows in vivo excision and recircularization of pBluescript SK(+/-) plasmid. For clones HL07801 and higher, cDNA directionally cloned into EcoRI/XhoI-digested pOT2 plasmid.

8. IP and MIP clones
Please see the question below.

9. LD Library (made by Ling Hong)
mRNA source--0-22hr embryos, from an isogenic y; cn bw sp strain, polyA+ selected twice. cDNA made using Stratagene ZAP-cDNA synthesis kit; oligo(dT) primed with XhoI site at end of primer for first strand synthesis; EcoRI adapter on 5' ends of clones; size fractionated on Sephacryl S-500--approximately 1-6kb. For clones LD01001-LD21096, cDNA directionally cloned into EcoRI/XhoI-digested Stratagene Uni-Zap XR vector, which allows in vivo excision and recircularization of pBluescript SK(+/-) plasmid. For clones LD21101 and higher, cDNA directionally cloned into EcoRI/XhoI-digested pOT2 plasmid.

10. LP ZAPII Library (made in December 1998 by Ling Hong)
mRNA source--varying stages of larvae and early pupae, from an isogenic y; cn bw sp strain, polyA+ selected once. Oligo(dT) primed with XhoI site at end of primer for first strand synthesis; EcoRI adapter on 5' ends of clones; size fractionated on Sephacryl S-500--approximately 1-6kb. cDNA directionally cloned into EcoRI/XhoI-digested Stratagene Uni-Zap XR vector, which allows in vivo excision and recircularization of pBluescript SK(+/-) plasmid.

11. LP pOT2 Plasmid Library (made by Ling Hong)
mRNA source--varying stages of larvae and early pupae, from an isogenic y; cn bw sp strain, polyA+ selected once. Oligo(dT) primed with XhoI site at end of primer for first strand synthesis; EcoRI adapter on 5' ends of clones; size fractionated on Sephacryl S-500--approximately 1-6kb. cDNA directionally cloned into EcoRI/XhoI-digested pOT2 plasmid. ~500,000 colonies were plated out and a mass plasmid DNA prep was done using Qiagen. Plasmids containing the cDNA inserts were size selected on a Sephacryl S-500 column in order to exclude plasmids with no insert. Early elutions were collected and transformed into DH5-alpha strain (Gibco/BRL).

12. RE pFlc1 Plasmid Library (kindly generated by Piero Carninci at RIKEN with mRNA from Ling Hong)
The RE (Riken embryo) library was made from RNA extracted from Drosophila 0-22hr mixed stage isogenic y; cn bw sp strain embryos, polyA+ selected twice, RNA made by Ling Hong. cDNA was synthesized by priming with the oligo(dT) primed adapter (5'-GAGAGAGAGAGGATCCAATACTGGAGAGTTTTTTTTTTTTTTTTVN-3'). The first strand was synthesized in the presence of trehalose, which increases full-length cDNA synthesis (Carninci, P. et al., PNAS 95: 520-524; Carninci, P. et al., Methods Enzymol. 303: 19-44). Subsequently, full-length cDNA was selected with the biotinylated cap-trapper (Carninci, P. et al., Genomics 37: 327-336). A linker was then ligated to the single-stranded cDNA following the published protocol (Shibata, Y., et al., Biotechniques, in press). Subsequently, the cDNA was normalized by using RoT=1.0 as published (Carninci, P., et al., Genome Res. 10: 1617-1630). Second strand cDNA was primed with the (5'-AGAGAGAGAGCTCGAGCTCTAATAAGGTGACACTATAGAACCA-3') primer. After restriction digestion of the hemimethylated cDNA with BamHI and XhoI, the cDNA was cloned in the lambda FLC-I vector. Subsequently, the library was bulk-excised into pFLC-I plasmid as described (Carninci, P., et al., Genomics, 77:79-90). cDNAs were transformed into DH5-alpha TonA strain.

13. RH pFlc1 Plasmid Library (kindly generated by Piero Carninci at RIKEN with mRNA from Ling Hong)
The RH (Riken head) library was made from RNA extracted from Drosophila adult heads, from the isogenic y; cn bw sp strain, polyA+ selected once, RNA made by Ling Hong. cDNA was synthesized by priming with the oligo(dT) primed adapter (5'-GAGAGAGAGAGGATCCAATACTGGAGAGTTTTTTTTTTTTTTTTVN-3'). The first strand was synthesized in the presence of trehalose, which increases full-length cDNA synthesis (Carninci, P. et al., PNAS 95: 520-524; Carninci, P. et al., Methods Enzymol. 303: 19-44). Subsequently, full-length cDNA was selected with the biotinylated cap-trapper (Carninci, P. et al., Genomics 37: 327-336). A linker was then ligated to the single-stranded cDNA following the published protocol (Shibata, Y., et al., Biotechniques, in press). Subsequently, the cDNA was normalized by using RoT=1.0 as published (Carninci, P., et al., Genome Res. 10: 1617-1630). Second strand cDNA was primed with the (5'-AGAGAGAGAGCTCGAGCTCTAATAAGGTGACACTATAGAACCA-3') primer. After restriction digestion of the hemimethylated cDNA with BamHI and XhoI, the cDNA was cloned in the lambda FLC-I vector. Subsequently, the library was bulk-excised into pFLC-I plasmid as described (Carninci, P., et al., Genomics, 77:79-90). cDNAs were transformed into DH5-alpha TonA strain.
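The RE and RH protocols depend on restriction sites carried by the two primers. As a minimal sketch (not part of the BDGP pipeline), the Python below scans both primer sequences, copied from the text above, for the BamHI (GGATCC) and XhoI (CTCGAG) recognition sequences. The helper name `find_sites` is hypothetical, and the degenerate bases (V, N) at the 3' end of the oligo(dT) adapter are simply left in the string, where they never match.

```python
import re

# Recognition sequences (5'->3'). Both are palindromic, so scanning the
# top strand alone finds every site.
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}

# Primers copied from the RE/RH protocol text (V and N are degenerate bases).
FIRST_STRAND = "GAGAGAGAGAGGATCCAATACTGGAGAGTTTTTTTTTTTTTTTTVN"
SECOND_STRAND = "AGAGAGAGAGCTCGAGCTCTAATAAGGTGACACTATAGAACCA"


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for enzymes with a site in seq."""
    hits = {}
    for name, motif in sites.items():
        # A zero-width lookahead also reports overlapping occurrences.
        positions = [m.start() for m in re.finditer(f"(?={motif})", seq)]
        if positions:
            hits[name] = positions
    return hits


for label, primer in [("oligo(dT) adapter", FIRST_STRAND),
                      ("second-strand primer", SECOND_STRAND)]:
    print(label, find_sites(primer))
```

The scan finds a single BamHI site in the oligo(dT) adapter and a single XhoI site in the second-strand primer, i.e. one site at each end of the cDNA, matching the BamHI/XhoI digestion described above.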

14. SD pOT2 Plasmid Library (made in March 1999 by Ling Hong)
The SD library was made from RNA extracted from Drosophila tissue culture cells. mRNA source: non-isogenic Schneider L2 cells, polyA+ selected once. Oligo(dT) primed with XhoI site at end of primer for first strand synthesis; EcoRI adapter on 5' ends of clones; size fractionated on Sephacryl S-500--approximately 1-6kb. cDNA directionally cloned into EcoRI/XhoI-digested pOT2 plasmid. ~300,000 colonies were plated out and a mass plasmid DNA prep was done using Qiagen. Plasmids containing the cDNA inserts were size selected on a Sephacryl S-500 column in order to exclude plasmids with no insert. Early elutions were collected and transformed into DH5-alpha strain (Gibco/BRL).

15. TA pOTB7D plasmid library (Mark Stapleton and Charles Yu)

The TA (Total Adult) cDNA library was made from mixed male and female adults of the isogenic y; cn bw sp strain. Total RNA was processed with CIP and TAP in order to ligate an RNA adapter (RLM oligo: GCUGAUGGCGAUGAAUGAACACUGCGUUUGCUGGCUUUGAUGAAA) to decapped mRNA using T4 RNA ligase. This processing step is similar to the preparation of RNA for 5' RACE. Components of the FirstChoice RLM-RACE Kit (Ambion) were used in this processing step. mRNA was then isolated using the PolyAPurist Mag isolation system (Ambion). First strand synthesis was carried out using PowerScript Reverse Transcriptase and an oligo dT primer adapter (5'-ATTCTAGAGGCCGAGGCGGCCGACATG-d(T)30VN-3') from the Creator cDNA Library Construction kit (Clontech). Second-strand synthesis was carried out with Advantage 2 enzyme using 5 cycles of PCR with the first strand primer and a primer that anneals within the RLM oligo (AGTCGGCCTTGTCGGCCCTGCGTTTGCTGGCTTTGATG). Double-stranded cDNA was digested with SfiI and ds-cDNA > 200bp was purified using the Prep-A-Gene Gel Extraction Kit (BioRad). Purified cDNA was then directionally ligated into the DraIII digested pOTB7D vector. cDNAs were transformed into DH5-alpha.

16. TB pOTB7D plasmid library (Mark Stapleton and Charles Yu)

The TB (Total Adult) cDNA library was made from mixed male and female adults of the isogenic y; cn bw sp strain. Total RNA was processed with CIP and TAP in order to ligate an RNA adapter (RLM oligo: GCUGAUGGCGAUGAAUGAACACUGCGUUUGCUGGCUUUGAUGAAA) to decapped mRNA using T4 RNA ligase. This processing step is similar to the preparation of RNA for 5' RACE. Components of the FirstChoice RLM-RACE Kit (Ambion) were used in this processing step. mRNA was then isolated using the PolyAPurist Mag isolation system (Ambion). First strand synthesis was carried out using Thermoscript Reverse Transcriptase (Invitrogen) and an oligo dT primer adapter (5'-ATTCTAGAGGCCGAGGCGGCCGACATG-d(T)30VN-3') from the Creator cDNA Library Construction kit (Clontech). Second-strand synthesis was carried out using 5 cycles of PCR with the first strand primer and a primer that anneals within the RLM oligo (AGTCGGCCTTGTCGGCCCTGCGTTTGCTGGCTTTGATG). Double-stranded cDNA was digested with SfiI, fractionated on a 3% NuSieve 3:1 agarose gel (Lonza Biosciences), and ds-cDNA > 1.5kb was purified using the Prep-A-Gene Gel Extraction Kit (BioRad). Purified cDNA was then directionally ligated into the DraIII digested pOTB7D vector. cDNAs were transformed into DH5-alpha.
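The TA and TB constructions rely on SfiI sites carried by the primers and on the second-strand primer matching the ligated RLM adapter. The sketch below checks both, under stated assumptions: it takes SfiI's recognition sequence to be GGCCNNNNNGGCC, converts the RNA adapter to DNA by replacing U with T, and uses hypothetical helper names (`sfii_sites`, `longest_shared_suffix`). It is an illustration, not a BDGP tool.

```python
import re

# SfiI recognizes GGCCNNNNNGGCC; the lookahead lets overlapping matches through.
SFII = re.compile(r"(?=(GGCC[ACGT]{5}GGCC))")

# Sequences copied from the TA/TB protocol text.
OLIGO_DT_ADAPTER = "ATTCTAGAGGCCGAGGCGGCCGACATG" + "T" * 30 + "VN"
SECOND_STRAND_PRIMER = "AGTCGGCCTTGTCGGCCCTGCGTTTGCTGGCTTTGATG"
RLM_OLIGO_RNA = "GCUGAUGGCGAUGAAUGAACACUGCGUUUGCUGGCUUUGAUGAAA"


def sfii_sites(seq):
    """Return (0-based start, 5-nt spacer) for each SfiI site in seq."""
    return [(m.start(), m.group(1)[4:9]) for m in SFII.finditer(seq)]


def longest_shared_suffix(primer, target):
    """Return (length, sequence) of the longest 3' end of primer found in target."""
    for k in range(len(primer), 0, -1):
        if primer[-k:] in target:
            return k, primer[-k:]
    return 0, ""


rlm_dna = RLM_OLIGO_RNA.replace("U", "T")  # RNA adapter written as DNA
print(sfii_sites(OLIGO_DT_ADAPTER))
print(sfii_sites(SECOND_STRAND_PRIMER))
print(longest_shared_suffix(SECOND_STRAND_PRIMER, rlm_dna))
```

Each primer carries exactly one SfiI site, with different spacers (GAGGC in the oligo dT adapter, TTGTC in the second-strand primer). Because SfiI cuts within the spacer, the two ends of the cDNA receive different overhangs, which is consistent with the directional ligation described above. The last 21 nt of the second-strand primer match the RLM adapter exactly.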

17. UT pOTB7 Plasmid Library (made by Ling Hong with mRNA from the M. Fuller lab)
The UT (adult testes) stem cell enriched library was made from adult males overexpressing a UAS-unpaired transgene in germ cells from the nos-vp16 Gal4 driver. The UAS-unpaired transgenic flies were in a y,w genetic background and the nos-vp16 Gal4 driver flies were in a w background. Additionally, a UAS-gfp transgene was recombined onto the nosGAL4 chromosome. mRNA source: RNA was isolated from hand-dissected testes from 0-3-day-old males, polyA+ selected twice, kindly provided by the lab of M. Fuller. cDNA made using Stratagene ZAP-cDNA synthesis kit; oligo(dT) primed with XhoI site at end of primer for first strand synthesis; EcoRI adapter on 5' ends of clones; cDNA size fractionated on Sephacryl S-500--approximately 1-6kb. cDNA directionally cloned directly into EcoRI/XhoI-digested pOTB7 plasmid. cDNAs were transformed into DH5-alpha TonA.

What are "IP" or "MIP" clones and what library are these clones from?

cDNA clones submitted to GenBank that have a BDGP identifier starting with IP or MIP have been isolated from a directed screening process. Clones isolated from this screen have been PCR amplified using as template either an individual plasmid cDNA library or a mixture of up to 6 different cDNA libraries, which may include any of the libraries listed above. For further details please see the iPCR Screen page.

What are "FI" clones and what library are these clones from?

Clones submitted to GenBank that have a BDGP identifier starting with FI were generated via site-directed mutagenesis. FI clones are made from cDNAs that carry some type of experimentally derived error, including reverse transcriptase (RT) errors and chimeras (cDNAs that were ligated together in the library construction step, or clones that were recombined to generate a full-length protein). The original cDNA clone was modified so that the protein translation of the new FI clone would match the annotation that was current when the mutagenesis primers were designed. FI clones were full-length sequenced to ensure that no new errors were introduced during mutagenesis.

Does the BDGP have an ORFeome project?

Yes. We have been funded to generate proteomic resources in the form of ORF collections from our Gold Collection. Please see the Universal Proteomics Resources page for details and progress of the project.

How long are the cDNAs? Is the submitted sequence the entire sequence of the cDNA?

We have devoted considerable effort to generating high quality libraries in which (with the exception of the CK library) a significant percentage of clones are full length. To provide a rough estimate of the degree to which we succeeded in synthesizing full length clones, we examined the start sites of 100 of our LD cDNAs, each of which corresponds to a different gene for which previous researchers had deposited a putatively full-length cDNA sequence in GenBank. In 75 out of 100 cases our cDNA clone was within 100 bases of the longest cDNA in GenBank for that gene. In 29 out of 100 cases our cDNA was longer than the longest cDNA in GenBank. We have reanalyzed the clones that are represented in the DGC and estimate that 80% of them extend beyond the 5' start site, while 34.5% are longer than the corresponding clone in the GenBank test set (see Table 1 of the Science paper).

We have also determined the size of clones in the DGC by PCR using vector primers that flank the cDNA insert. The sizing protocol is available separately, and the average insert size for each library is listed at the bottom of the accompanying size table; the average for the entire collection is 2.2kb.

When will my cDNA clone be full-length sequenced?

We have finished full-insert sequencing of releases 1.0 and 2.0 of the Drosophila Gene Collection and are currently working on release 3.0. We submit sequenced clones to GenBank, but unfortunately we cannot tell you when an individual clone will be done.

What is a "BcDNA?"

"BcDNA" stands for "Berkeley cDNA." These are cDNA clones that have been full-length sequenced. The name "BcDNA" is followed by the ID of the particular cDNA clone (e.g. BcDNA:GH04753).

Are there sequences in the cDNA database that are not in NCBI's Genbank database?

There is usually no real lag time between when a sequence appears in our cDNA database and when it is submitted to NCBI's dbEST. However, 3' EST sequences for cDNA clones that are not in the DGC are in our cDNA database but not in GenBank. We are working to submit these as soon as possible.

How do I find a cDNA for my gene?

  1. Go to the FlyBase annotation database and query with the gene symbol, then click on the individual gene report as needed. The FlyBase associated cDNA clones are listed in the section Stocks & Reagents under 'cDNA Clones'.
  2. If you cannot locate a cDNA clone in FlyBase, search for the gene in the UCSC Genome Browser and look for clones in the mRNA and EST tracks.

Is there additional information available for cDNA clones?

We generally submit EST and cDNA sequence to GenBank as it is generated, but we do not submit our alignments to the genome or other supplemental information. Further details of the clones and some data not yet submitted to GenBank are available through our public web site: Clone Report

How do I request a cDNA clone?

Please see our Materials web page.

How do I request an aliquot of a cDNA library?

To obtain an aliquot of a BDGP cDNA library, look up a lab near you on the list of labs which already have samples from us. These labs have agreed to share aliquots with others in their area.

How are the cDNA libraries sequenced?

Each EST is generated by sequencing the 5', and sometimes the 3', end of an individual cDNA clone.

CK Library: We sequenced all cDNAs that showed some pattern of expression by in situ hybridization to embryos and some that were uniformly expressed, but none that were not expressed. Selected cDNAs were sequenced from both ends, and others were sequenced entirely using walk-in primers. Sequencing was done either with Pharmacia Autoread Sequencing kits using Cy-5-labelled primers, run on a Pharmacia ALF Express automated DNA sequencer, or with ABI Prism Dye Terminator Cycle Sequencing Ready Reaction kits with AmpliTaq DNA Polymerase, run on an ABI Prism 373 automated DNA sequencer. Universal or T7 primers were used to read the 5' end, and reverse or T3 primers were used to read the 3' end of the cDNAs.

All other libraries: We are currently using dye terminator sequencing (ABI Big Dye). The reactions are sequenced on ABI 3730xl capillary DNA sequencers.

How are the cDNA clones numbered?

BDGP cDNA clone IDs are always seven characters: two letters followed by five digits. With the exception of the CK library, the following clone numbering rules are always followed:

BDGP cDNA plate numbers begin at 10 and are counted consecutively through the last plate in the library. Plate numbers less than 100 must be padded with a leading zero to make the required 3 digits. Wells are numbered from 1 to 96, beginning at well A1 and going from left to right, row by row, down to well H12. Well numbers less than 10 must also be padded with a leading zero to make the required 2 digits. So, for example, the clone ID LD07623 is interpreted as LD library, plate 76, well 23 (B11).
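The numbering rules fit in a few lines of code. The following is a minimal Python sketch that assumes only the rules stated above (two-letter prefix, three-digit plate starting at 10, two-digit well, wells counted row by row across a 96-well plate with 12 columns). `parse_clone_id` and `make_clone_id` are hypothetical names; CK clones, or any ID that does not fit the pattern, are rejected.

```python
import re

ROWS = "ABCDEFGH"  # 8 rows x 12 columns = 96 wells


def parse_clone_id(clone_id):
    """Split a (non-CK) clone ID into (library, plate, well number, well name)."""
    m = re.fullmatch(r"([A-Z]{2})([0-9]{3})([0-9]{2})", clone_id)
    if m is None:
        raise ValueError(f"not two letters + five digits: {clone_id!r}")
    library, plate, well = m.group(1), int(m.group(2)), int(m.group(3))
    if plate < 10 or not 1 <= well <= 96:
        raise ValueError(f"plate or well out of range: {clone_id!r}")
    # Wells are counted row by row: A1..A12 are 1-12, B1..B12 are 13-24, ...
    row, col = divmod(well - 1, 12)
    return library, plate, well, f"{ROWS[row]}{col + 1}"


def make_clone_id(library, plate, well_name):
    """Inverse of parse_clone_id, e.g. ("LD", 76, "B11") -> "LD07623"."""
    row = ROWS.index(well_name[0].upper())  # ValueError if not A-H
    col = int(well_name[1:])
    if not 1 <= col <= 12:
        raise ValueError(f"column out of range: {well_name!r}")
    # Zero-pad the plate to 3 digits and the well number to 2 digits.
    return f"{library}{plate:03d}{row * 12 + col:02d}"
```

For example, `parse_clone_id("LD07623")` returns `("LD", 76, 23, "B11")`, reproducing the worked example above, and `make_clone_id("LD", 76, "B11")` gives back `"LD07623"`.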

What is the strategy for full-insert sequencing?

It is a directed approach based on a commercially available in vitro transposition system. The insert sizes of all 5,849 DGC clones have been determined and sorted into four size categories (<1.4 kb, 1.4-3.0 kb, 3.0-4.5 kb, and >4.5 kb). We estimate that 72% of the DGC clones (1.4-4.5 kb) will be sequenced from mapped PCR products, 23% (<1.4 kb) will be sequenced by oligo walking, and 4% (>4.5 kb) will be sequenced from plasmid-prepared template derived from non-mapped transposon-bearing cDNAs. Experiments suggest that 70-80% of the clones that receive transposons will finish to high quality without any extra work; the remaining 20-30% will be finished to high quality using custom oligos. We prefer this in vitro system to other commercial in vitro systems for several reasons. One major feature is that it allows us to sequence directly from the mapped transposon PCR products using nested primers. Using PCR products as sequencing templates provides a significant cost reduction compared with sequencing from plasmid-prepped templates. In addition, sequencing mapped products inherently provides an added layer of quality assurance, because the sequencing template itself is visualized.
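The size-based routing described above amounts to a simple decision rule. The sketch below assumes only the size bins given here. Which bin an exact boundary value (1.4 or 4.5 kb) belongs to is not stated, so the choice made in the code is an assumption, and the function names and example sizes are hypothetical.

```python
from collections import Counter


def sequencing_strategy(insert_kb):
    """Route a DGC clone to a full-insert strategy by insert size (kb).

    Assumption: 1.4 kb counts as "1.4-3.0 kb" and 4.5 kb as "3.0-4.5 kb";
    the FAQ only gives the ranges.
    """
    if insert_kb <= 0:
        raise ValueError("insert size must be positive")
    if insert_kb < 1.4:
        return "oligo walking"
    if insert_kb <= 4.5:
        # Covers both the 1.4-3.0 kb and 3.0-4.5 kb size categories.
        return "mapped transposon PCR products"
    return "plasmid template, unmapped transposon"


def plan(sizes_kb):
    """Count how many clones would be routed to each strategy."""
    return Counter(sequencing_strategy(s) for s in sizes_kb)


print(plan([0.9, 2.2, 3.8, 5.1, 1.2]))
```

With the hypothetical sizes `[0.9, 2.2, 3.8, 5.1, 1.2]`, `plan` assigns two clones to oligo walking, two to mapped transposon PCR products, and one to plasmid template.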

ESTs

What is an EST?

An EST is an Expressed Sequence Tag. It is the sequence of the 5' or 3' end of a cDNA that is used to rapidly identify expressed genes in the genome.

What was the purpose of the EST project?

Please see our EST project page and our Publications.

Our long term goal is to generate a transcript map that provides information on the intron-exon structure, alternative splicing, and transcription start and stop sites, by sequencing cDNAs and comparing them to the genomic sequence.

What is a clot? Can I view EST sequences that are homologous to each other?

The clones in the LD, HL, GM, GH, and LP libraries have been compared to other clones in these libraries using BLAST, and ESTs that are homologous to each other have been grouped into "clots" using Phrap. When you query for a clone, the clot report will indicate when the clot was assembled, the length of the consensus sequence from the sequence alignment, the cDNA clones that make up the clot, any homologies the consensus sequence shows to other genes by BLAST similarity searching, and the consensus sequence itself.
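This FAQ does not spell out how BLAST hits are turned into clots, and Phrap's assembly and consensus calling are not reproduced here. As a rough sketch of the grouping step only, the Python below assumes single-linkage clustering: any two ESTs joined by a chain of BLAST hits land in the same clot. It uses a union-find structure; the function name and clone names are hypothetical.

```python
from collections import defaultdict


def group_into_clots(clones, homologous_pairs):
    """Group clones so that any chain of pairwise hits ends up in one clot.

    homologous_pairs must only reference names listed in clones.
    """
    parent = {c: c for c in clones}

    def find(c):
        # Follow parent links to the root, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in homologous_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    clots = defaultdict(list)
    for c in clones:
        clots[find(c)].append(c)
    # Sort members and clots so the result does not depend on merge order.
    return sorted(sorted(members) for members in clots.values())
```

For example, `group_into_clots(["a", "b", "c", "d", "e"], [("a", "c"), ("c", "e")])` returns `[["a", "c", "e"], ["b"], ["d"]]`. Single linkage is the simplest choice; note that a single spurious hit is enough to merge two otherwise unrelated groups.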

Have the ESTs been mapped by in situs to polytene chromosomes?

No, they have not. But they can be mapped to the genomic sequence using BLAST.

How were RNA in situ hybridizations to embryos with the CK library performed?

See the RNA In Situ Hybridization protocol by Jasprien Noordermeer and Casey Kopczynski.

How are the sequence data processed and what is the quality of the sequence?

CK Library:

The sequences were trimmed and edited manually using Sequencher 3.1 software. When the 5' and 3' sequences overlapped, contigs were constructed. All submitted sequences have an error rate of 3% or less.

LD, HL, GM, GH and LP Libraries:

BDGP EST sequencing read lengths average 546 bp, with high quality sequence averaging 447 bp.

The sequences are first processed by trimming the vector sequences (about 60 bp) from their 5' ends. The 3' end of the sequence is trimmed based on two quality cutoffs. The first cutoff marks the end of the so-called "high quality" sequence. We are using identical criteria to those being used in the HHMI/WashU Mouse EST Project, kindly provided by L. Hillier. By comparing our EST sequences to those in the database in cases where the corresponding gene has been sequenced, we estimate that the "high quality" sequence is more than 99% accurate. Finally, we trim the sequence at a point where we estimate the accuracy falls below 97%. The sequence, with an indication of where the "high quality" sequence ends, is then submitted to NCBI's dbEST. After removal of vector, our submitted sequences average 546 bp with an average "high quality" length of 447 bp.
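The exact HHMI/WashU criteria are not given here, so the following Python is only a sketch of the two-cutoff idea, built on swapped-in assumptions. Per-base Phred quality values Q are converted to error probabilities 10^(-Q/10); the 5' vector is removed as a fixed 60 bases (the approximate length quoted above, whereas real vector trimming would locate the vector by sequence); and each 3' cutoff is the first position at which the mean error over a 20-base window exceeds 1% (the "high quality" limit) or 3% (the 97% accuracy limit). The function names and the window size are hypothetical.

```python
def phred_error(q):
    """Phred quality Q -> estimated per-base error probability, 10**(-Q/10)."""
    return 10 ** (-q / 10)


def first_bad_window(errors, start, window, max_error):
    """First index >= start whose next `window` bases (fewer near the 3' end)
    have a mean error above max_error; len(errors) if there is none."""
    for i in range(start, len(errors)):
        chunk = errors[i:i + window]
        if sum(chunk) / len(chunk) > max_error:
            return i
    return len(errors)


def trim_read(quals, vector_len=60, window=20, hq_accuracy=0.99, min_accuracy=0.97):
    """Return (start, hq_end, end) as 0-based, half-open cut points."""
    errors = [phred_error(q) for q in quals]
    start = min(vector_len, len(errors))  # drop the 5' vector
    hq_end = first_bad_window(errors, start, window, 1 - hq_accuracy)
    end = first_bad_window(errors, start, window, 1 - min_accuracy)
    return start, hq_end, end


read = [10] * 60 + [30] * 100 + [17] * 50 + [10] * 50
print(trim_read(read))
```

On this synthetic read (60 Q10 bases, then 100 at Q30, 50 at Q17 and 50 at Q10), `trim_read` returns `(60, 150, 193)`: the high-quality region ends as the window moves into the Q17 stretch, and the final cut falls a little before the Q10 tail because each window looks 20 bases ahead. Any window above 3% error is also above 1%, so the final cutoff can never fall before the high-quality cutoff.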

What is the sequence representation of the library?

We performed our sequencing with libraries that had not been normalized to remove abundant sequences. Nevertheless, our ESTs represent 65% of all Drosophila genes.

What was the production rate of the ESTs?

We produced and submitted between 3000 and 4000 ESTs per month.

DGC

What is the Drosophila Gene Collection?

Please see our DGC page.

How do I obtain the Drosophila Gene Collection?

Please see our Materials page.

In what format is the Drosophila Gene Collection?

The DGCr1.0 was released in two formats. Some labs and the commercial resources were furnished with the arrays in 384-well formatted plates of bacterial glycerol stocks. Other labs received the arrays in 96-well formatted plates of plasmid DNA stocks. The DGCr2.0 was released in the 384-well plates of bacterial glycerol stocks.

What does "Not DGC Clone" mean?

These are wells that contain cDNAs that are duplicates or are not part of the DGC published in the Rubin et al. paper. Please see our list Wells Identified with "Not DGC Clone".