INFO-I 415 Introduction to Data Analytics for Informatics
Prerequisites: ECON E270 or PBHL B300 or PSY B305 or SPEA K300 or STAT 30100 or STAT 35000
This course applies statistical learning methods for data mining and inferential and predictive analytics to informatics-related fields. The course also covers techniques for exploring and visualizing data, assessing model accuracy, and weighing the merits of different methods for a given real-world application. It provides an essential toolset for transforming large, complex informatics datasets into actionable knowledge.
- Analyze datasets with the following supervised learning methods: for function approximation, multiple linear regression, splines, and local regression; for classification, logistic regression, linear discriminant analysis, decision trees, bagging, random forests, boosting, and support vector machines.
- Analyze datasets with the following unsupervised learning methods: for dimensionality reduction, principal components analysis; for grouping, k-means clustering and hierarchical clustering.
- Explore, transform, and visualize large, complex datasets with graphs in R.
- Solve real-world problems by adapting and applying statistical learning methods to large, complex datasets.
- Identify and select appropriately among statistical learning methods for a particular real-world problem; analyze each method with respect to a given dataset or research question in terms of modeling accuracy and the bias-variance tradeoff; perform model assessment (i.e., estimate test error rates) and selection by resampling: cross-validation and bootstrapping; identify overfitting and underfitting; perform model selection and regularization by subset selection and shrinkage methods: ridge regression and Lasso; explain the relative advantages and disadvantages of each statistical learning method for the real-world problem.
- Write programs to perform data analytics on large, complex datasets in R.
- Analyze data from case studies in informatics-related fields (e.g., digital media, human-computer interaction, health informatics, bioinformatics, and business intelligence).