Engineering issues in annotation
What public sequence data sets are needed?
- What are the mechanics of obtaining public sequence databases?
- Are curated data sets available, or do you need to set up a means of maintaining your own (for repeats, insertions, organism of interest)?
How do you achieve computational throughput?
- Workstation farm, or simply a big, powerful box?
- Job flow control
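As one illustration of job flow control, the sketch below runs a batch of analysis jobs across a fixed pool of workers and resubmits any that fail. It is only a minimal example under stated assumptions: `run_jobs`, `worker`, `max_workers`, and `max_retries` are hypothetical names, jobs are assumed to be hashable identifiers (such as sequence chunk names), and a real workstation farm would more likely sit behind a batch scheduler than a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_jobs(jobs, worker, max_workers=4, max_retries=2):
    """Run worker(job) for each job, retrying failures up to max_retries times.

    Returns (results, failed): results maps job -> return value,
    failed maps job -> the last exception raised for that job.
    """
    results, failed = {}, {}
    pending = list(jobs)
    attempt = 0
    while pending and attempt <= max_retries:
        failed = {}  # only failures from the latest round matter
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {pool.submit(worker, job): job for job in pending}
            for fut in as_completed(futures):
                job = futures[fut]
                try:
                    results[job] = fut.result()
                except Exception as exc:  # record the failure, retry next round
                    failed[job] = exc
        pending = list(failed)
        attempt += 1
    return results, failed
```

Here `worker` would typically wrap one run of an analysis program on one chunk of sequence. Jobs that still fail after the last retry are returned separately so they can be inspected rather than silently dropped.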
What do you do with the results?
- Homogenize results into a single format?
- Filter results for significance and redundancy
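The two result-handling steps above can be sketched as follows: hits from different programs are converted into one common record, then filtered by significance and redundancy. Everything here is an assumption made for illustration: the `Hit` fields, the six-column tab-delimited layout read by `from_tabular`, the `1e-5` E-value cutoff, and the containment rule used to define "redundant" are hypothetical, not a description of any particular tool's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str    # program that produced the hit (label chosen by the caller)
    query: str     # sequence being annotated
    subject: str   # database entry that was matched
    start: int     # start of the match on the query
    end: int       # end of the match on the query
    score: float
    evalue: float


def from_tabular(line, source):
    """Parse one line of a hypothetical layout: query, subject, start, end, score, evalue."""
    query, subject, start, end, score, evalue = line.rstrip("\n").split("\t")
    return Hit(source, query, subject, int(start), int(end), float(score), float(evalue))


def filter_hits(hits, max_evalue=1e-5):
    """Keep significant hits, then drop hits contained in a better-scoring hit.

    A hit counts as redundant if an already-kept hit with the same query and
    subject covers its whole query interval.
    """
    significant = [h for h in hits if h.evalue <= max_evalue]
    significant.sort(key=lambda h: h.score, reverse=True)  # best hits first
    kept = []
    for h in significant:
        redundant = any(
            k.query == h.query and k.subject == h.subject
            and k.start <= h.start and h.end <= k.end
            for k in kept
        )
        if not redundant:
            kept.append(h)
    return kept
```

Each additional program would need its own small parser returning `Hit` records, so that filtering and any later steps only ever deal with one format.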