This chapter revisits the history of sequencing methods and their advancements, focusing mainly on the accuracy of sequences deposited in primary public databases. The sources and frequency of errors, the errors arising from sequencing and sequence assembly, and the resulting sequence quality are discussed. The quality of sequencing pipelines and the error rates of next-generation sequencing (NGS) data are reviewed, along with some tools and techniques for overcoming errors. Sequence uncertainties in primary public databases are addressed with reference to HIV-1B sequences, and sequence ambiguities are highlighted along with annotations based on the reference genome (HXB2). Sequences produced by different sequencing technologies contain ambiguities, and it is very difficult to distinguish true variants from errors. This raises concerns about data collection efforts and about inferences derived from error-prone DNA-sequencing technologies. Future studies should handle such sequences with caution, especially when analyzing mutations to understand pathogenesis, drug resistance, and geographical variation.
All Science Journal Classification (ASJC) codes
- Immunology and Microbiology (all)