Yan receives NSF grant to develop novel tools for mining complex biological data
July 25, 2018
Jingwen Yan, assistant professor of bioinformatics in the BioHealth Informatics Department, has been awarded a two-year, $174,831 CISE Research Initiation Initiative (CRII) grant by the National Science Foundation. Yan is principal investigator and will work with several bioinformatics Ph.D. students on the project.
Multi-omic data and systems biology networks, spanning the genome, transcriptome, proteome, and metabolome, hold the potential for discovering new disease mechanisms and markers for designing new drugs and treatment protocols. However, the scale and complexity of these data sets have made them very difficult to harness. Yan proposes to develop a novel, data-intensive computational framework to analyze and integrate the data and networks, create predictive models, and reveal their associations.
“Instead of exploring individual genetic markers followed by enrichment analysis, our new model allows researchers to directly detect disease-related genes with close functional connections in one step. Therefore, it has a great potential to yield novel discoveries in disease research,” Yan said.
Creating powerful, advanced tools
Yan’s project, entitled “Computational Methods to Mine Multi-omic Data for Systems Biology of Complex Diseases,” is expected to result in new methods and tools that effectively integrate heterogeneous, high-dimensional -omic data. Strategic extraction of knowledge in this manner will pave the way for further understanding of biological mechanisms and provide a valuable resource for disease research, with far-reaching economic and societal impacts.
The extracted knowledge can also suggest surrogate biomarkers for therapeutic trials, supporting the design of new drugs and treatment protocols.
The team will first develop scalable and efficient algorithms to solve the model, and then apply and evaluate it using a real biomedical data set, the NIH Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort. They will analyze genome-wide association study (GWAS) data, microarray and protein expression data from cerebrospinal fluid and blood, metabolite concentrations, cognitive performance, and brain imaging measures.
Specifically, Yan’s project will address data-intensive analysis issues of scalability, multi-source data integration, and network analysis that have challenged conventional methodologies. Yan’s approach, which integrates large-scale machine learning, network science, and data-intensive computing, offers an innovative solution. “Integrative -omics holds great promise to illuminate the causal pathways from genotype to phenotype,” she said.
This material is based upon work supported by the National Science Foundation under Grant No. 1755836. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.