Ph.D. in Data Science

The Ph.D. in Data Science prepares graduates to develop and evaluate novel approaches to collecting, organizing, managing, and extracting knowledge and insights from massive, complex, distributed, heterogeneous datasets. Graduates will learn to define and investigate relevant research problems in data science.

Finding Novel Solutions to Data Research Problems

The program prepares students to address data research problems with inventive and creative solutions that generate new knowledge through studies that demonstrate a high degree of intellectual merit and the potential for broader impact. The Ph.D. curriculum also prepares students to make research contributions that advance the theory and practice of data science.

The program hones students’ ability to:

  • Define, create, adapt, and apply rigorous research methods
  • Communicate research findings effectively to peers through scholarly, peer-reviewed publications that appear in international venues
  • Define, conduct, and manage a research project that involves several people and interdisciplinary expertise
  • Contribute to writing research grant proposals aimed at securing external funding to support research activities
  • Understand and address ethical and professional issues related to their research, including approval processes and certification for human-subject research

Deep technical skills and the ability to formulate and test hypotheses using massive and heterogeneous data provide the foundation for graduates who can become successful researchers either in academic settings or in industrial research and development laboratories.

Careers in Data Science

The degree leads to positions within academia that include research, research support, and tenure-track positions in major universities.

Positions in industry include:

  • Data scientist
  • Director of research
  • Senior data analyst
  • Strategic innovation manager

Plan of Study

Data Science Core (24 cr.)

Methods Courses (18 cr.)

May include up to 6 credit hours of INFO-I 790 Informatics Research Rotation.

Specialization (18 cr.)

  1. Disciplinary Affinities (0–6 cr.)
  2. Minor (12–18 cr.)

The student must complete a minor within a domain appropriate to the chosen specialization and/or research area. All courses must be graduate-level and taken outside the Data Science program.

Qualifying Examination, Written and Oral

A student must successfully complete a written and oral qualifying examination before the fifth semester of the program. The written exam has a breadth part and a depth part. The breadth part covers the program’s core courses. The depth part additionally covers material from the student’s research.

The oral exam takes place shortly after the student passes the written exam. The oral exam is based on the student’s response to the written exam and the core courses. Both the written and oral exams are prepared and evaluated by faculty in the school who are familiar with the content of the core courses.

The student must pass both the written exam and the oral exam before advancing to candidacy. A student who does not pass one part on the first attempt may retake that part once, but only one of the two exams may be retaken. For further details, consult the data science program director.


Dissertation (30 cr.)

A dissertation is a written elaboration of original research that makes creative contributions to the student’s chosen area of specialization. The student will enroll multiple times in INFO-I 890 Thesis Readings and Research (1–12 cr.) while completing the dissertation. All requirements must be completed within seven years of passing the qualifying exams. The dissertation process includes the following components:

  • Proposal: This is an in-depth oral review undertaken by students who have made significant progress in their research. The proposal will be defended at a public colloquium. The student must complete the proposal within one year of passing the qualifying exams.
  • Defense: The student must defend the dissertation in an open seminar scheduled when the doctoral research is almost complete.

Please refer to the IUPUI Graduate School Bulletin for more details on the dissertation process.

Learning Outcomes

Students will demonstrate competency in research:

  • Critically evaluate the published scholarly record.
  • Critically apply the theories and methodologies of data science to new research in their primary area of study.
  • Apply appropriate principles, frameworks, and models to evaluate and interpret the frontiers of knowledge in their primary area of study.
  • Demonstrate expository and oral communication skills appropriate to a Ph.D., publishing and presenting work in their field.
  • Critique data practices for ethical issues, including discriminatory practices, power imbalances, and invasions of privacy.
  • Demonstrate advanced competency in data science tools and techniques, applied statistical analysis, and a domain area relevant to their area of specialization.
  • Develop a record of relevant scholarship.
  • Demonstrate an ability to conduct independent, original research with a depth of knowledge in the chosen area of specialization.

Students will demonstrate competency in data analytics:

  • Design and execute ethical research using quantitative and experimental methods.
  • Organize, visualize, and analyze large, complex datasets using descriptive statistics and graphs to make decisions.
  • Apply inferential statistics, predictive analytics, and data mining to informatics-related fields.
  • Analyze datasets with supervised learning methods for function approximation, classification, and forecasting, and with unsupervised learning methods for dimensionality reduction and clustering.
  • Identify, assess, and select appropriately among data analytics methods and models for solving a particular real-world problem, weighing their advantages and disadvantages.
  • Write programs to perform data analytics on large, complex datasets.

Students will demonstrate competency in data management and infrastructure:

  • Design and implement relational databases using commercial database management systems according to database concepts and theory.
  • Diagram a relational database design based on an identified scenario.
  • Produce database queries using SQL.
  • Perform database administration tasks.
  • Describe the data management activities associated with the data lifecycle.
  • Overcome difficulties in managing very large datasets, both structured and unstructured, using nonrelational data storage and retrieval (NoSQL), parallel algorithms, and cloud computing.
  • Apply the MapReduce programming model to data-driven discovery and scalable data processing for scientific applications.