Brief introduction to Statistical Learning: Regression versus Classification; Linear Regression: simple and multiple Linear Regression; Classification: Logistic Regression, Discriminant Analysis; Resampling Methods: Cross-Validation, the Bootstrap; Regularization: Subset Selection, Ridge Regression, the Lasso, Principal Components and Partial Least Squares Regression; Nonlinear Models: Polynomial, Splines, Generalized Additive Models; Tree-Based Models: Decision Trees, Random Forest, Boosting; Support Vector Machines; Unsupervised Learning: Principal Component Analysis, Clustering Methods.

For further information see the academic catalog: IAM557

### Course Objectives

By the end of the course, students will have learned:

- the fundamentals of Statistical Learning, regression and classification
- linear and nonlinear regressions including splines
- Generalised Additive Models for both regression and classification problems
- regularisation techniques including Ridge regression and the Lasso
- tree-based methods for regression and classification
- Support Vector Machines, which are widely used in the Data Science and Machine Learning community
- the difference between supervised and unsupervised learning methods

### Course Learning Outcomes

Students who pass the course satisfactorily will be able to:

- present the data and its descriptive analysis
- distinguish between regression and classification problems
- apply regression or classification algorithms to solve related problems
- code their own algorithms for specific applications in Statistical and Machine Learning
- understand the fundamentals of Support Vector Machines and apply them to specific problems
- distinguish between supervised and unsupervised learning methods in related applications

### Instructional Methods

The following instructional methods will be used to achieve the course objectives: lecture, questioning, discussion, group work, and simulation.

### Tentative Weekly Outline

- Brief introduction to Statistical Learning
  - Regression versus Classification

- Linear Regression
  - Simple and multiple Linear Regression

- Classification
  - Logistic Regression
  - Discriminant Analysis (Linear and Quadratic)

- Resampling Methods
  - Cross-Validation
  - the Bootstrap

- Regularisation
  - Subset Selection
  - Ridge Regression
  - the Lasso
  - Principal Components Regression
  - Partial Least Squares Regression

- Nonlinear Models
  - Polynomial and Splines
  - Generalised Additive Models

- Tree-Based Models
  - Decision Trees
  - Random Forest
  - Boosting

- Support Vector Machines

- Unsupervised Learning
  - Principal Component Analysis
  - Clustering Methods

### Course Textbook(s)

- Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning: with Applications in R, 1st ed., Springer, 2013 (Corrected at 8th printing 2017)

### Course Material(s) and Reading(s)

#### Books (Textbook):

- Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer, 2009 (Corrected at 12th printing 2017)

#### Books (Supplementary):

- Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012
- Peter Harrington, Machine Learning in Action, Manning Publications Co., 2012
- Charu C. Aggarwal, Neural Networks and Deep Learning: A Textbook, Springer, 2018
- G. Jay Kerns, Introduction to Probability and Statistics Using R, 1st ed., 2015
- Robert V. Hogg, Elliot A. Tanis, Dale Zimmerman, Probability and Statistical Inference, 9th ed., 2015
- Larry Wasserman, All of Statistics - A Concise Course in Statistical Inference, 2004
- W. N. Venables, D. M. Smith, and the R Core Team, An Introduction to R - Notes on R: A Programming Environment for Data Analysis and Graphics, Version 3.4.2 (2017-09-28)

#### Resources:

- *The R Project for Statistical Computing*: https://www.r-project.org/
- *Python*: https://www.python.org/
- *RStudio*: https://www.rstudio.com/
- *Anaconda*: https://www.anaconda.com/

### Supplementary Readings / Resources / E-Resources

#### Readings

You are encouraged to read the *documentation* of each resource below:

- *The R Project for Statistical Computing*: https://www.r-project.org/
- *Python*: https://www.python.org/