Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines

01 Pubblicazione su rivista
Ellrott Kyle, Bailey Matthew H., Saksena Gordon, Covington Kyle R., Kandoth Cyriac, Stewart Chip, Hess Julian, Ma Singer, Chiotti Kami E., Mclellan Michael, Sofia Heidi J., Hutter Carolyn, Getz Gad, Wheeler David, Ding Li, Caesar-Johnson Samantha J., Demchok John A., Felau Ina, Kasapi Melpomeni, Ferguson Martin L., Hutter Carolyn M., Sofia Heidi J., Tarnuzzer Roy, Wang Zhining, Yang Liming, Zenklusen Jean C., Zhang Jiashan (Julia), Chudamani Sudha, Liu Jia, Lolla Laxmi, Naresh Rashi, Pihl Todd, Sun Qiang, Wan Yunhu, Wu Ye, Cho Juok, Defreitas Timothy, Frazer Scott, Gehlenborg Nils, Getz Gad, Heiman David I., Kim Jaegil, Lawrence Michael S., Lin Pei, Meier Sam, Noble Michael S., Saksena Gordon, Voet Doug, Zhang Hailei, Bernard Brady, Chambwe Nyasha, Dhankani Varsha, Knijnenburg Theo, Kramer Roger, Leinonen Kalle, Liu Yuexin, Miller Michael, Reynolds Sheila, Shmulevich Ilya, Thorsson Vesteinn, Zhang Wei, Akbani Rehan, Broom Bradley M., Hegde Apurva M., Ju Zhenlin, Kanchi Rupa S., Korkut Anil, Li Jun, Liang Han, Ling Shiyun, Liu Wenbin, Lu Yiling, Mills Gordon B., Ng Kwok-Shing, Rao Arvind, Ryan Michael, Wang Jing, Weinstein John N., Zhang Jiexin, Abeshouse Adam, Armenia Joshua, Chakravarty Debyani, Chatila Walid K., de Bruijn Ino, Gao Jianjiong, Gross Benjamin E., Heins Zachary J., Kundra Ritika, La Konnor, Ladanyi Marc, Luna Augustin, Nissan Moriah G., Ochoa Angelica, Phillips Sarah M., Reznik Ed, Sanchez-Vega Francisco, Sander Chris, Schultz Nikolaus, Sheridan Robert, Sumer S. Onur, Sun Yichao, Taylor Barry S., Wang Jioajiao, Zhang Hongxin, Anur Pavana, Peto Myron, Spellman Paul, Benz Christopher, Stuart Joshua M., Wong Christopher K., Yau Christina, Hayes D. Neil, Parker Null, Wilkerson Matthew D., Ally Adrian, Balasundaram Miruna, Bowlby Reanne, Brooks Denise, Carlsen Rebecca, Chuah Eric, Dhalla Noreen, Holt Robert, Jones Steven J. M., Kasaian Katayoon, Lee Darlene, Ma Yussanne, Marra Marco A., Mayo Michael, Moore Richard A., Mungall Andrew J., Mungall Karen, Robertson A. Gordon, Sadeghi Sara, Schein Jacqueline E., Sipahimalani Payal, Tam Angela, Thiessen Nina, Tse Kane, Wong Tina, Berger Ashton C., Beroukhim Rameen, Cherniack Andrew D., Cibulskis Carrie, Gabriel Stacey B., Gao Galen F., Ha Gavin, Meyerson Matthew, Schumacher Steven E., Shih Juliann, Kucherlapati Melanie H., Kucherlapati Raju S., Baylin Stephen, Cope Leslie, Danilova Ludmila, Bootwalla Moiz S., Lai Phillip H., Maglinte Dennis T., Van Den Berg David J., Weisenberger Daniel J., Auman J. Todd, Balu Saianand, Bodenheimer Tom, Fan Cheng, Hoadley Katherine A., Hoyle Alan P., Jefferys Stuart R., Jones Corbin D., Meng Shaowu, Mieczkowski Piotr A., Mose Lisle E., Perou Amy H., Perou Charles M., Roach Jeffrey, Shi Yan, Simons Janae V., Skelly Tara, Soloway Matthew G., Tan Donghui, Veluvolu Umadevi, Fan Huihui, Hinoue Toshinori, Laird Peter W., Shen Hui, Zhou Wanding, Bellair Michelle, Chang Kyle, Covington Kyle, Creighton Chad J., Dinh Huyen, Doddapaneni Harshavardhan, Donehower Lawrence A., Drummond Jennifer, Gibbs Richard A., Glenn Robert, Hale Walker, Han Yi, Hu Jianhong, Korchina Viktoriya, Lee Sandra, Lewis Lora, Li Wei, Liu Xiuping, Morgan Margaret, Morton Donna, Muzny Donna, Santibanez Jireh, Sheth Margi, Shinbrot Eve, Wang Linghua, Wang Min, Wheeler David A., Xi Liu, Zhao Fengmei, Hess Julian, Appelbaum Elizabeth L., Bailey Matthew, Cordes Matthew G., Ding Li, Fronick Catrina C., Fulton Lucinda A., Fulton Robert S., Kandoth Cyriac, Mardis Elaine R., Mclellan Michael D., Miller Christopher A., Schmidt Heather K., Wilson Richard K., Crain Daniel, Curley Erin, Gardner Johanna, Lau Kevin, Mallery David, Morris Scott, Paulauskis Joseph, Penny Robert, Shelton Candace, Shelton Troy, Sherman Mark, Thompson Eric, Yena Peggy, Bowen J
ISSN: 2405-4712

The Cancer Genome Atlas (TCGA) cancer genomicsdataset includes over 10,000 tumor-normal exomepairs across 33 different cancer types, in total >400TB of raw data files requiring analysis. Here wedescribe the Multi-Center Mutation Calling in Multi-ple Cancers project, our effort to generate a compre-hensive encyclopedia of somatic mutation calls forthe TCGA data to enable robust cross-tumor-typeanalyses. Our approach accounts for varianceand batch effects introduced by the rapid advance-ment of DNA extraction, hybridization-capture,sequencing, and analysis methods over time. Wepresent best practices for applying an ensemble ofseven mutation-calling algorithms with scoring andartifact filtering. The dataset created by this analysisincludes 3.5 million somatic variants and forms thebasis for PanCan Atlas papers. The results havebeen made available to the research communityalong with the methods used to generate them.This project is the result of collaboration from a num-ber of institutes and demonstrates how team sciencedrives extremely large genomics projects.

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma