Sequence accuracy in primary databases: A case study on HIV-1B

Balaji Seetharaman, Akash Ramachandran, Krittika Nandy, Shapshak Paul

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

This chapter revisits the history of sequencing methods and their advancement, focusing mainly on the accuracy of sequences deposited in primary public databases. It discusses the sources and frequency of errors, including those introduced during sequencing and sequence assembly, as well as sequence quality. The quality of sequencing pipelines and the error rates of next-generation sequencing (NGS) data are reviewed, along with some tools and techniques for overcoming errors. Sequence uncertainties in primary public databases are addressed with reference to HIV-1B sequences, and the sequence ambiguities are highlighted along with annotations based on the reference genome (HXB2). Sequences produced by different sequencing technologies contain ambiguities, and it is very difficult to distinguish true variants from errors. This raises concerns about data collection efforts and about inferences derived from error-prone DNA-sequencing technologies. Future studies should handle such sequences with caution, especially when analyzing mutations to understand pathogenesis, drug resistance, and geographical variation.

Original language: English
Title of host publication: Global Virology II - HIV and NeuroAIDS
Publisher: Springer New York
Pages: 779-822
Number of pages: 44
ISBN (Electronic): 9781493972906
ISBN (Print): 9781493972883
DOIs: https://doi.org/10.1007/978-1-4939-7290-6_32
Publication status: Published - 01-01-2017

Fingerprint

HIV
Databases
Technology
DNA Sequence Analysis
Drug Resistance
Uncertainty
Research Design
History
Genome
Mutation

All Science Journal Classification (ASJC) codes

  • Medicine (all)
  • Immunology and Microbiology (all)
  • Neuroscience (all)

Cite this

Seetharaman, B., Ramachandran, A., Nandy, K., & Paul, S. (2017). Sequence accuracy in primary databases: A case study on HIV-1B. In Global Virology II - HIV and NeuroAIDS (pp. 779-822). Springer New York. https://doi.org/10.1007/978-1-4939-7290-6_32
Seetharaman, Balaji; Ramachandran, Akash; Nandy, Krittika; Paul, Shapshak. / Sequence accuracy in primary databases: A case study on HIV-1B. Global Virology II - HIV and NeuroAIDS. Springer New York, 2017. pp. 779-822
@inbook{ea2bc670510f40a79d61bb96793f3f73,
title = "Sequence accuracy in primary databases: A case study on HIV-1B",
abstract = "This chapter revisits the history of sequencing methods and their advancements. It mainly focuses on the accuracy of the deposited sequences in primary public databases. The source of errors, frequency, errors due to sequencing, and sequence assembly, and their quality are discussed. The quality of sequencing pipelines and error rates of the next-generation sequencing (NGS) data are reviewed. Some tools and techniques to overcome errors are also reviewed. Sequence uncertainties in primary public databases are addressed with reference to HIV-1B sequences. The sequence ambiguities are highlighted along with annotations based on the reference genome (HXB2). There are ambiguities in sequences produced by different sequencing technologies and it is very difficult to distinguish true variants from the errors. This alarms data collection efforts and inferences derived from error-prone DNA-sequencing technologies. Future studies should be cautious in handling such sequences especially on analyzing mutations to understand pathogenesis, drug resistance, and geographical variations.",
author = "Balaji Seetharaman and Akash Ramachandran and Krittika Nandy and Shapshak Paul",
year = "2017",
month = "1",
day = "1",
doi = "10.1007/978-1-4939-7290-6_32",
language = "English",
isbn = "9781493972883",
pages = "779--822",
booktitle = "Global Virology II - HIV and NeuroAIDS",
publisher = "Springer New York",
address = "United States",

}

Seetharaman, B, Ramachandran, A, Nandy, K & Paul, S 2017, Sequence accuracy in primary databases: A case study on HIV-1B. in Global Virology II - HIV and NeuroAIDS. Springer New York, pp. 779-822. https://doi.org/10.1007/978-1-4939-7290-6_32

Sequence accuracy in primary databases: A case study on HIV-1B. / Seetharaman, Balaji; Ramachandran, Akash; Nandy, Krittika; Paul, Shapshak.

Global Virology II - HIV and NeuroAIDS. Springer New York, 2017. p. 779-822.

TY - CHAP

T1 - Sequence accuracy in primary databases

T2 - A case study on HIV-1B

AU - Seetharaman, Balaji

AU - Ramachandran, Akash

AU - Nandy, Krittika

AU - Paul, Shapshak

PY - 2017/1/1

Y1 - 2017/1/1

N2 - This chapter revisits the history of sequencing methods and their advancements. It mainly focuses on the accuracy of the deposited sequences in primary public databases. The source of errors, frequency, errors due to sequencing, and sequence assembly, and their quality are discussed. The quality of sequencing pipelines and error rates of the next-generation sequencing (NGS) data are reviewed. Some tools and techniques to overcome errors are also reviewed. Sequence uncertainties in primary public databases are addressed with reference to HIV-1B sequences. The sequence ambiguities are highlighted along with annotations based on the reference genome (HXB2). There are ambiguities in sequences produced by different sequencing technologies and it is very difficult to distinguish true variants from the errors. This alarms data collection efforts and inferences derived from error-prone DNA-sequencing technologies. Future studies should be cautious in handling such sequences especially on analyzing mutations to understand pathogenesis, drug resistance, and geographical variations.

UR - http://www.scopus.com/inward/record.url?scp=85042603897&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85042603897&partnerID=8YFLogxK

U2 - 10.1007/978-1-4939-7290-6_32

DO - 10.1007/978-1-4939-7290-6_32

M3 - Chapter

AN - SCOPUS:85042603897

SN - 9781493972883

SP - 779

EP - 822

BT - Global Virology II - HIV and NeuroAIDS

PB - Springer New York

ER -

Seetharaman B, Ramachandran A, Nandy K, Paul S. Sequence accuracy in primary databases: A case study on HIV-1B. In Global Virology II - HIV and NeuroAIDS. Springer New York. 2017. p. 779-822. https://doi.org/10.1007/978-1-4939-7290-6_32