Brief introduction to Statistical Learning: Regression versus Classification; Linear Regression: Simple and Multiple Linear Regression; Classification: Logistic Regression, Discriminant Analysis; Resampling Methods: Cross-Validation, the Bootstrap; Regularization: Subset Selection, Ridge Regression, the Lasso, Principal Components and Partial Least Squares Regression; Nonlinear Models: Polynomial Regression, Splines, Generalized Additive Models; Tree-Based Models: Decision Trees, Random Forest, Boosting; Support Vector Machines; Unsupervised Learning: Principal Component Analysis, Clustering Methods.

For further information see the academic catalog: IAM557

Course Objectives

By the end of the course, students will have learned:

  • the fundamentals of Statistical Learning, regression and classification
  • linear and nonlinear regressions including splines
  • Generalised Additive Models for both regression and classification problems
  • regularisation techniques including Ridge regression and the Lasso
  • tree-based methods for regression and classification
  • Support Vector Machines, which are widely used in the data science and machine learning communities
  • the difference between supervised and unsupervised learning methods

Course Learning Outcomes

Students who pass the course satisfactorily will be able to:

  • present data together with its descriptive analysis
  • distinguish between regression and classification problems
  • apply regression or classification algorithms to solve related problems
  • code their own algorithms for specific applications in Statistical and Machine Learning
  • understand the fundamentals of Support Vector Machines and apply them to specific problems
  • distinguish between supervised and unsupervised learning methods in related applications

Instructional Methods

The following instructional methods will be used to achieve the course objectives: lectures, questioning, discussion, group work, and simulation.

Tentative Weekly Outline

  1. Brief introduction to Statistical Learning
    1. Regression versus Classification
  2. Linear Regression
    1. Simple and Multiple Linear Regression
  3. Classification
    1. Logistic Regression
    2. Discriminant Analysis (Linear and Quadratic)
  4. Resampling Methods
    1. Cross-Validation
    2. the Bootstrap
  5. Regularisation
    1. Subset Selection
    2. Ridge Regression
    3. the Lasso
    4. Principal Components Regression
    5. Partial Least Squares Regression
  6. Nonlinear Models
    1. Polynomial and Splines
    2. Generalised Additive Models
  7. Tree-Based Models
    1. Decision Trees
    2. Random Forest
    3. Boosting
  8. Support Vector Machines
  9. Unsupervised Learning
    1. Principal Component Analysis
    2. Clustering Methods

Course Textbook(s)

  • Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning - with Applications in R, 1st ed., Springer, 2013 (Corrected at 8th printing 2017)

Course Material(s) and Reading(s)

Books (Textbook):

  • Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer, 2009 (Corrected at 12th printing 2017)

Books (Supplementary):

  • Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012
  • Peter Harrington, Machine Learning in Action, Manning Publications Co., 2012
  • Charu C. Aggarwal, Neural Networks and Deep Learning: A Textbook, Springer, 2018
  • G. Jay Kerns, Introduction to Probability and Statistics Using R, 1st ed., 2015
  • Robert V. Hogg, Elliot A. Tanis, Dale Zimmerman, Probability and Statistical Inference, 9th ed., 2015
  • Larry Wasserman, All of Statistics - A Concise Course in Statistical Inference, 2004
  • W. N. Venables, D. M. Smith, and the R Core Team, An Introduction to R - Notes on R: A Programming Environment for Data Analysis and Graphics, Version 3.4.2 (2017-09-28)

Resources:

  • The R Project for Statistical Computing: https://www.r-project.org/
  • Python: https://www.python.org/
  • RStudio: https://www.rstudio.com/
  • Anaconda: https://www.anaconda.com/

Supplementary Readings / Resources / E-Resources

Readings

You are encouraged to read the documentation for each resource below:

  • The R Project for Statistical Computing: https://www.r-project.org/
  • Python: https://www.python.org/