TY - JOUR
T1 - Integrating multiomics and prior knowledge
T2 - a study of the Graphnet penalty impact
AU - Chegraoui, Hamza
AU - Guillemot, Vincent
AU - Rebei, Amine
AU - Gloaguen, Arnaud
AU - Grill, Jacques
AU - Philippe, Cathy
AU - Frouin, Vincent
N1 - Publisher Copyright:
© The Author(s) 2023. Published by Oxford University Press.
PY - 2023/8/1
Y1 - 2023/8/1
AB - Motivation: In the field of oncology, statistical models are used for the discovery of candidate factors that influence the development of the pathology or its outcome. These statistical models can be designed in a multiblock framework to study the relationship between different multiomic data, and variable selection is often achieved by imposing constraints on the model parameters. A priori graph constraints have been used in the literature as a way to improve feature selection in the model, yielding more interpretability. However, it is still unclear how these graphs interact with the models and how they impact the feature selection. Additionally, with the availability of different graphs encoding different information, one can wonder how the choice of the graph meaningfully impacts the results obtained. Results: We proposed to study the graph penalty impact on a multiblock model. Specifically, we used the SGCCA as the multiblock framework. We studied the effect of the penalty on the model using the TCGA-LGG dataset. Our findings are 3-fold. We showed that the graph penalty increases the number of selected genes from this dataset, while selecting genes already identified in other works as pertinent biomarkers in the pathology. We demonstrated that using different graphs leads to different though consistent results, but that graph density is the main factor influencing the obtained results. Finally, we showed that the graph penalty increases the performance of the survival prediction from the model-derived components and the interpretability of the results. Availability and implementation: Source code is freely available at https://github.com/neurospin/netSGCCA
UR - http://www.scopus.com/inward/record.url?scp=85166700385&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btad454
DO - 10.1093/bioinformatics/btad454
M3 - Article
C2 - 37490467
AN - SCOPUS:85166700385
SN - 1367-4803
VL - 39
JO - Bioinformatics
JF - Bioinformatics
IS - 8
M1 - btad454
ER -