INFO-I 415 Introduction to Data Analytics for Informatics
Prerequisites: PBHL-B 302 (or another approved statistics course)
This course applies statistical learning methods for data mining and for inferential and predictive analytics to informatics-related fields. The course also covers techniques for exploring and visualizing data, assessing model accuracy, and weighing the merits of different methods for a given real-world application. Together, these methods provide an essential toolset for transforming large, complex informatics datasets into actionable knowledge.
- Analyze datasets with the following supervised learning methods: for functional approximation, multiple linear regression, splines, and local regression; for classification, logistic regression, linear discriminant analysis, decision trees, bagging, random forests, boosting, and support vector machines.
- Analyze datasets with the following unsupervised learning methods: for dimensionality reduction, principal components analysis; for grouping, k-means clustering and hierarchical clustering.
- Explore, transform, and visualize large, complex datasets with graphs in R.
- Solve real-world problems by adapting and applying statistical learning methods to large, complex datasets.
- Identify and select appropriately among statistical learning methods for a particular real-world problem; analyze each method with respect to a given dataset or research question in terms of modeling accuracy and the bias-variance tradeoff; perform model assessment (i.e., estimate test error rates) and selection by resampling: cross-validation and bootstrapping; identify overfitting and underfitting; perform model selection and regularization by subset selection and shrinkage methods: ridge regression and Lasso; explain the relative advantages and disadvantages of each statistical learning method for the real-world problem.
- Write programs to perform data analytics on large, complex datasets in R.
- Analyze data from case studies in informatics-related fields (e.g., digital media, human-computer interaction, health informatics, bioinformatics, and business intelligence).
This course is not being offered this semester.