TY - JOUR
T1 - A benchmarking study of individual somatic variant callers and voting-based ensembles for whole-exome sequencing
AU - Guille, Arnaud
AU - Adélaïde, José
AU - Finetti, Pascal
AU - Andre, Fabrice
AU - Birnbaum, Daniel
AU - Mamessier, Emilie
AU - Bertucci, François
AU - Chaffanet, Max
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - By identifying somatic mutations, whole-exome sequencing (WES) has become a technology of choice for the diagnosis and guiding treatment decisions in many cancers. Despite advances in the field of somatic variant detection and the emergence of sophisticated tools incorporating machine learning, accurately identifying somatic variants remains challenging. Each new somatic variant caller is often accompanied by claims of superior performance compared to predecessors. Furthermore, most comparative studies focus on a limited set of tools and reference datasets, leading to inconsistent results and making it difficult for laboratories to select the optimal solution. Our study comprehensively evaluated 20 somatic variant callers across four reference WES datasets. We subsequently assessed the performance of ensemble approaches by exploring all possible combinations of these callers, generating 8178 and 1013 combinations for single-nucleotide variants (SNVs) and indels, respectively, with varying voting thresholds. Our analysis identified five high-performing individual somatic variant callers: Muse, Mutect2, Dragen, TNScope, and NeuSomatic. For somatic SNVs, an ensemble combining LoFreq, Muse, Mutect2, SomaticSniper, Strelka, and Lancet outperformed the top-performing caller (Dragen) by >3.6% (mean F1 score = 0.927). Similarly, for somatic indels, an ensemble of Mutect2, Strelka, Varscan2, and Pindel outperformed the best individual caller (Neusomatic) by >3.5% (mean F1 score = 0.867). By considering the computational costs of each combination, we were able to identify an optimal solution involving four somatic variant callers, Muse, Mutect2, and Strelka for the SNVs and Mutect2, Strelka, and Varscan2 for the indels, enabling accurate and cost-effective somatic variant detection in whole exome.
AB - By identifying somatic mutations, whole-exome sequencing (WES) has become a technology of choice for the diagnosis and guiding treatment decisions in many cancers. Despite advances in the field of somatic variant detection and the emergence of sophisticated tools incorporating machine learning, accurately identifying somatic variants remains challenging. Each new somatic variant caller is often accompanied by claims of superior performance compared to predecessors. Furthermore, most comparative studies focus on a limited set of tools and reference datasets, leading to inconsistent results and making it difficult for laboratories to select the optimal solution. Our study comprehensively evaluated 20 somatic variant callers across four reference WES datasets. We subsequently assessed the performance of ensemble approaches by exploring all possible combinations of these callers, generating 8178 and 1013 combinations for single-nucleotide variants (SNVs) and indels, respectively, with varying voting thresholds. Our analysis identified five high-performing individual somatic variant callers: Muse, Mutect2, Dragen, TNScope, and NeuSomatic. For somatic SNVs, an ensemble combining LoFreq, Muse, Mutect2, SomaticSniper, Strelka, and Lancet outperformed the top-performing caller (Dragen) by >3.6% (mean F1 score = 0.927). Similarly, for somatic indels, an ensemble of Mutect2, Strelka, Varscan2, and Pindel outperformed the best individual caller (Neusomatic) by >3.5% (mean F1 score = 0.867). By considering the computational costs of each combination, we were able to identify an optimal solution involving four somatic variant callers, Muse, Mutect2, and Strelka for the SNVs and Mutect2, Strelka, and Varscan2 for the indels, enabling accurate and cost-effective somatic variant detection in whole exome.
KW - NGS
KW - benchmark
KW - combination
KW - ensemble
KW - somatic
KW - variant caller
KW - voting
UR - http://www.scopus.com/inward/record.url?scp=85215833029&partnerID=8YFLogxK
U2 - 10.1093/bib/bbae697
DO - 10.1093/bib/bbae697
M3 - Article
AN - SCOPUS:85215833029
SN - 1467-5463
VL - 26
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 1
M1 - bbae697
ER -