School of Informatics and Computing

Sun Kim, PhD: Computational Methods for Making Genome Projects Affordable


Acquiring genome sequence data has become affordable because of recent advances in sequencing technology. Analyzing genome data, however, has not. To make genome projects affordable for individual research labs, it is important to develop computational tools that automate many steps of a genome project. In this talk, Sun Kim will present computational methods for two important steps in a genome project. The first step is genome assembly validation. The most widely used strategy for genome sequencing is to determine the sequences of randomly sampled, short DNA fragments and then use a computational tool to assemble them into longer genome sequences. The assembly process is complicated by repetitive sequences in the genome, which often lead to incorrect assembly of DNA fragments. Validating the assembled contigs is quite difficult and often involves manual inspection. For years, Dr. Kim’s group has been developing computational methods for assembly validation. In this talk, Dr. Kim presents the group's most recent work, which uses a machine learning approach. The second step Dr. Kim will discuss is the validation of gene function annotation. Transferring annotations from genes similar to a target gene is common practice but often results in incorrect annotations. Thus, manual inspection of the annotations of “all” genes is required for a genome project, and this step usually takes a few scientists from months to a year. To speed up the genome annotation process, the group developed a novel scoring scheme for annotation quality, called the Annotation Confidence Score, using a genome comparison approach. The score is computed by combining sequence similarity and textual annotation similarity using a modified version of a logistic curve. (Joint work with Justin Choi, Youngik Yang, Haixu Tang, and Don Gilbert.)
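The abstract does not give the formula behind the Annotation Confidence Score, so the following is only an illustrative sketch of the general idea of combining a sequence similarity and a textual annotation similarity through a logistic curve. Everything in it is an assumption made for illustration: the function names, the token-overlap (Jaccard) measure of text similarity, the equal weighting, and the midpoint and steepness of the curve. It is not the group's actual scoring scheme.

```python
import math


def logistic(x, midpoint=0.5, steepness=10.0):
    """Standard logistic curve; midpoint and steepness are illustrative choices."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def token_jaccard(text_a, text_b):
    """Assumed text similarity: Jaccard overlap of lowercase word sets."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def confidence_score(seq_identity, annotation_a, annotation_b, weight=0.5):
    """Hypothetical score in (0, 1) for an annotation transferred between two genes.

    seq_identity: fraction of identical aligned residues, in [0, 1].
    annotation_a, annotation_b: free-text annotations of the two genes.
    weight: assumed relative weight of sequence vs. text similarity.
    """
    if not 0.0 <= seq_identity <= 1.0:
        raise ValueError("seq_identity must be in [0, 1]")
    text_similarity = token_jaccard(annotation_a, annotation_b)
    combined = weight * seq_identity + (1.0 - weight) * text_similarity
    return logistic(combined)


# Similar sequences with agreeing annotations score high;
# dissimilar sequences with unrelated annotations score low.
high = confidence_score(0.95, "DNA polymerase III subunit alpha",
                        "DNA polymerase III alpha subunit")
low = confidence_score(0.30, "hypothetical protein",
                       "ABC transporter permease")
```

A score of this kind could be used to rank genes so that curators inspect the low-confidence annotations first rather than reviewing every gene by hand.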


Sun Kim is Chair of Faculty Division C, Director of the Center for Bioinformatics Research, and an Associate Professor in the School of Informatics and Computing; an Associated Faculty member at the Center for Genomics and Bioinformatics; and an Adjunct Associate Professor in the Medical Sciences Program at Indiana University, Bloomington. He also holds an international collaborating professorship in Bio and Brain Engineering and an adjunct professorship in Computer Science at KAIST, Korea. Prior to joining Indiana University in 2001, he worked at DuPont Central Research as Senior Computer Scientist from 1998 to 2001, and at the University of Illinois at Urbana-Champaign from 1997 to 1998 as Director of Bioinformatics and Postdoctoral Fellow at the Biotechnology Center and as a Visiting Assistant Professor of Animal Sciences. Sun Kim is a recipient of the Outstanding Junior Faculty Award at Indiana University in 2004, a US National Science Foundation CAREER Award (DBI-0237901) from 2003 to 2008, an Achievement Award at DuPont Central Research in 2000, and a best poster award at the Beckman Institute Symposium on Bioinformatics, Structure, and Function, University of Illinois at Urbana-Champaign, in 1997. He actively contributes to the bioinformatics community, serving on the editorial boards of several journals, including as co-editor-in-chief of the International Journal of Data Mining and Bioinformatics, and as vice chair for education of the IEEE Computer Society Technical Committee on Bioinformatics. He has co-organized many scientific meetings, including the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), as a program co-chair in 2008 and a conference co-chair in 2009. Sun Kim received a BS, MS, and Ph.D. in Computer Science from Seoul National University, Korea Advanced Institute of Science and Technology (KAIST), and the University of Iowa, respectively.