This directory contains real splice sites and faked splice sites. The
faked splice sites were selected from a window of +/- 40 bp around the
actual 5' and 3' splice sites.

The data are taken from
ftp://www-hgc.lbl.gov/pub/genesets/Drosophila/multi_exon_GB.sets.

This data set was created to build different splice site models.

3faked_100_100.fa.gz       3' faked "AG" sites
3real_100_100.fa.gz        3' real splice sites
5faked_100_100.fa.gz       5' faked "GT" sites
5real_100_100.fa.gz        5' real splice sites

Both the 5' and 3' splice site data sets have 100 bp of the exon (5'
sites) or intron (3' sites) followed by 100 bp of the following intron
or exon, respectively.

Additional splice sites were extracted the same way from
ftp://www-hgc.lbl.gov/pub/genesets/Drosophila/multi_exon_GB_all.sets.

Add3faked_100_100.fa.gz    3' faked "AG" sites
Add3real_100_100.fa.gz     3' real splice sites
Add5faked_100_100.fa.gz    5' faked "GT" sites
Add5real_100_100.fa.gz     5' real splice sites

===================
Martin Reese, 10jul99
mgreese@lbl.gov