Whole genome prediction of bladder cancer risk with the Bayesian LASSO

Evangelina López De Maturana, Stephen J. Chanok, Antoni C. Picornell, Nathaniel Rothman, Jesús Herranz, M. Luz Calle, Montserrat García-Closas, Gaëlle Marenne, Angela Brand, Adonina Tardón, Alfredo Carrato, Debra T. Silverman, Manolis Kogevinas, Daniel Gianola, Francisco X. Real, Núria Malats

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

To build a predictive model for urothelial carcinoma of the bladder (UCB) risk that combines genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes for 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented, using (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to nonsmokers only, to male smokers, and to extreme phenotypes to potentiate the UCB genetic component. The predictive ability of each model was evaluated by the area under the ROC curve (AUC) in a 10-fold cross-validation scenario. Smoking status showed the highest predictive ability for UCB risk (AUCtest = 0.62), whereas the AUC of the model with all genetic variants was poorer (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved by 15%. This study represents a first attempt to build a predictive model for UCB risk that combines genomic and nongenomic data and applies state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, and relatively low statistical power may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data and suggests that findings from this type of data have limited translational potential for public health interventions.
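The evaluation scheme described in the abstract (AUC under 10-fold cross-validation) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the rank-based AUC computation, the choice to average the per-fold AUCs, and the toy `fit_mean_direction` / `predict_direction` stand-in model are all assumptions made for illustration. The Bayesian threshold model itself is not implemented here.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; labels must be 0 or 1."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(lab for _, lab in pairs)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one control")
    rank_sum_pos = sum(r for r, (_, lab) in zip(ranks, pairs) if lab == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def k_fold_indices(n, k=10, seed=None):
    """Split indices 0..n-1 into k folds; shuffle first when a seed is given."""
    idx = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validated_auc(labels, features, fit, predict, k=10, seed=None):
    """Fit on k-1 folds, score the held-out fold, and average the test AUCs.

    Averaging per-fold AUCs is an assumption of this sketch.
    """
    fold_aucs = []
    for test in k_fold_indices(len(labels), k, seed):
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [predict(model, features[i]) for i in test]
        fold_aucs.append(auc([labels[i] for i in test], scores))
    return sum(fold_aucs) / len(fold_aucs)


# Hypothetical stand-in model (NOT the Bayesian threshold model): it only
# learns whether cases have a higher or lower mean covariate than controls.
def fit_mean_direction(xs, ys):
    case_mean = sum(x for x, y in zip(xs, ys) if y == 1) / sum(ys)
    ctrl_mean = sum(x for x, y in zip(xs, ys) if y == 0) / (len(ys) - sum(ys))
    return 1.0 if case_mean >= ctrl_mean else -1.0


def predict_direction(direction, x):
    return direction * x
```

An AUC of 0.5 corresponds to ranking cases and controls no better than chance, which is the scale on which the reported values of 0.62 and 0.53 should be read.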

Original language: English
Pages (from-to): 467-476
Number of pages: 10
Journal: Genetic Epidemiology
Volume: 38
Issue number: 5
DOI: https://doi.org/10.1002/gepi.21809
Publication status: Published - 2014

Fingerprint

Urinary Bladder Neoplasms
Urinary Bladder
Aptitude
Genome
Carcinoma
Area Under Curve
Smoking
Phenotype
Inborn Genetic Diseases
ROC Curve
Quality Control
Single Nucleotide Polymorphism
Public Health
Genotype
Population

All Science Journal Classification (ASJC) codes

  • Epidemiology
  • Genetics (clinical)

Cite this

De Maturana, E. L., Chanok, S. J., Picornell, A. C., Rothman, N., Herranz, J., Calle, M. L., ... Malats, N. (2014). Whole genome prediction of bladder cancer risk with the Bayesian LASSO. Genetic Epidemiology, 38(5), 467-476. https://doi.org/10.1002/gepi.21809
De Maturana, Evangelina López ; Chanok, Stephen J. ; Picornell, Antoni C. ; Rothman, Nathaniel ; Herranz, Jesús ; Calle, M. Luz ; García-Closas, Montserrat ; Marenne, Gaëlle ; Brand, Angela ; Tardón, Adonina ; Carrato, Alfredo ; Silverman, Debra T. ; Kogevinas, Manolis ; Gianola, Daniel ; Real, Francisco X. ; Malats, Núria. / Whole genome prediction of bladder cancer risk with the Bayesian LASSO. In: Genetic Epidemiology. 2014 ; Vol. 38, No. 5. pp. 467-476.
@article{a3a98edf5df74fc18b91f90ce97c41c2,
title = "Whole genome prediction of bladder cancer risk with the {B}ayesian {LASSO}",
abstract = "To build a predictive model for urothelial carcinoma of the bladder (UCB) risk combining both genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes from 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented including: (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to only nonsmokers, to male smokers, and to extreme phenotypes to potentiate the UCB genetic component. The area under the ROC curve allowed evaluating the predictive ability of each model in a 10-fold cross-validation scenario. Smoking status showed the highest predictive ability of UCB risk (AUCtest = 0.62). On the other hand, the AUC of all genetic variants was poorer (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved 15{\%}. This study represents a first attempt to build a predictive model for UCB risk combining both genomic and nongenomic data and applying state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, as well as a relatively small statistical power, may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data, and suggests the limited translational potential of findings from this type of data into public health interventions.",
author = "{De Maturana}, {Evangelina L{\'o}pez} and Chanok, {Stephen J.} and Picornell, {Antoni C.} and Nathaniel Rothman and Jes{\'u}s Herranz and Calle, {M. Luz} and Montserrat Garc{\'i}a-Closas and Ga{\"e}lle Marenne and Angela Brand and Adonina Tard{\'o}n and Alfredo Carrato and Silverman, {Debra T.} and Manolis Kogevinas and Daniel Gianola and Real, {Francisco X.} and N{\'u}ria Malats",
year = "2014",
doi = "10.1002/gepi.21809",
language = "English",
volume = "38",
pages = "467--476",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "5",

}

De Maturana, EL, Chanok, SJ, Picornell, AC, Rothman, N, Herranz, J, Calle, ML, García-Closas, M, Marenne, G, Brand, A, Tardón, A, Carrato, A, Silverman, DT, Kogevinas, M, Gianola, D, Real, FX & Malats, N 2014, 'Whole genome prediction of bladder cancer risk with the Bayesian LASSO', Genetic Epidemiology, vol. 38, no. 5, pp. 467-476. https://doi.org/10.1002/gepi.21809

Whole genome prediction of bladder cancer risk with the Bayesian LASSO. / De Maturana, Evangelina López; Chanok, Stephen J.; Picornell, Antoni C.; Rothman, Nathaniel; Herranz, Jesús; Calle, M. Luz; García-Closas, Montserrat; Marenne, Gaëlle; Brand, Angela; Tardón, Adonina; Carrato, Alfredo; Silverman, Debra T.; Kogevinas, Manolis; Gianola, Daniel; Real, Francisco X.; Malats, Núria.

In: Genetic Epidemiology, Vol. 38, No. 5, 2014, p. 467-476.

TY - JOUR

T1 - Whole genome prediction of bladder cancer risk with the Bayesian LASSO

AU - De Maturana, Evangelina López

AU - Chanok, Stephen J.

AU - Picornell, Antoni C.

AU - Rothman, Nathaniel

AU - Herranz, Jesús

AU - Calle, M. Luz

AU - García-Closas, Montserrat

AU - Marenne, Gaëlle

AU - Brand, Angela

AU - Tardón, Adonina

AU - Carrato, Alfredo

AU - Silverman, Debra T.

AU - Kogevinas, Manolis

AU - Gianola, Daniel

AU - Real, Francisco X.

AU - Malats, Núria

PY - 2014

Y1 - 2014

N2 - To build a predictive model for urothelial carcinoma of the bladder (UCB) risk combining both genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes from 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented including: (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to only nonsmokers, to male smokers, and to extreme phenotypes to potentiate the UCB genetic component. The area under the ROC curve allowed evaluating the predictive ability of each model in a 10-fold cross-validation scenario. Smoking status showed the highest predictive ability of UCB risk (AUCtest = 0.62). On the other hand, the AUC of all genetic variants was poorer (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved 15%. This study represents a first attempt to build a predictive model for UCB risk combining both genomic and nongenomic data and applying state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, as well as a relatively small statistical power, may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data, and suggests the limited translational potential of findings from this type of data into public health interventions.

UR - http://www.scopus.com/inward/record.url?scp=84902986027&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84902986027&partnerID=8YFLogxK

U2 - 10.1002/gepi.21809

DO - 10.1002/gepi.21809

M3 - Article

VL - 38

SP - 467

EP - 476

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - 5

ER -

De Maturana EL, Chanok SJ, Picornell AC, Rothman N, Herranz J, Calle ML et al. Whole genome prediction of bladder cancer risk with the Bayesian LASSO. Genetic Epidemiology. 2014;38(5):467-476. https://doi.org/10.1002/gepi.21809