Academic year 2023/2024
- Course ID
- Robert Birke (Lecturer)
- Mirko Polato (Lecturer)
- 1st year
- Teaching period
- First semester
- Related or integrative
- Course disciplinary sector (SSD)
- INF/01 - informatics
- Formal authority
- Type of examination
- Written and oral
- Familiarity with the basic concepts of linear algebra, calculus, probability, and statistics.
- A basic knowledge of the Python programming language.
Course summary
Course objectives
The course is part of the Master's Degree in Artificial Intelligence for Biomedicine and Healthcare. It contributes to the objectives of the degree by providing the theoretical and practical knowledge needed to perform real data science tasks on different types of data and to reason about the properties of the Machine Learning (and Data Mining) models and algorithms used to solve specific learning/mining tasks.
The course's first focus is data, which is essential to any learning/mining task. The course introduces the main techniques to handle data, perform data cleaning and pre-processing, and assess data quality.
Starting from data, the course teaches the differences between tasks and models and introduces the students to popular Machine Learning models.
Specifically, the course provides an overview of the main supervised and unsupervised learning tasks, ranging from classification to clustering algorithms, discussed theoretically and practically.
Particular attention will be given to the practical aspects by introducing the students to some of the most popular Python libraries for data science (e.g., numpy, pandas, scikit-learn).
Finally, the course will introduce the students to the main concepts of privacy-preserving and federated learning.
Expected learning outcomes
Through the course, students will acquire the knowledge and skills to perform real data science tasks on different types of data and the ability to reason about the properties of the models and algorithms used to solve specific learning/mining tasks.
Knowledge and understanding
Students will master some of the main concepts in Data Mining and Machine Learning.
Applying knowledge and understanding
Students will be able to apply what they have learned, using a modern programming language and its libraries, to solve real data science tasks (such as classifying examples into a set of given classes or clustering data into meaningful groups).
Making judgments
Students will learn to judge the suitability of a given model, or the properties of a learning algorithm, for a given data set. These abilities will be refined by practicing with different Machine Learning and Data Mining methods applied to real-case problems.
Learning skills
Students will be given the opportunity to test and self-assess their own knowledge and skills via quizzes and polls.
Program
- Introduction to the course
- Data
  - types of data
  - data quality
  - data pre-processing
  - similarity and dissimilarity
  - avoiding false discoveries
- Classification
  - basic concepts
  - decision trees
  - model evaluation
- Advanced Classification
  - rule-based classifiers
  - nearest neighbors classifier
  - support-vector machines
  - neural networks
- Clustering
  - overview
  - k-means
  - hierarchical clustering
  - DBSCAN
  - cluster evaluation
  - graph-based clustering
- Anomaly detection
  - clustering-based approach
  - one-class classification
- Privacy and Federated Learning
The course will be held in person.
The course is mainly lecture-based, with some laboratory sessions.
Lectures will cover the theoretical aspects (17 lectures, 34 hours), while the laboratory sessions will focus on the practical aspects (7 laboratory sessions, 14 hours).
During the lectures, students are encouraged to participate in live polls, Q&A, and quizzes (through tools like Mentimeter, Kahoot, Slido, etc.) to test their understanding and to check for any misconceptions and/or biases.
Laboratory sessions will use Python notebooks (e.g., Google Colaboratory). Small assignments will be given to the students to further develop their understanding of the presented concepts and algorithms.
Learning assessment methods
All exams will be held in person.
The final exam is divided into two parts: (i) a written test and (ii) an oral examination:
i) The written test will be composed of open questions, each covering a different course topic. The written test will assess the understanding of the presented concepts and the student's ability to solve small practical problems. The written test will be graded up to 32 points.
ii) The oral examination will test the student's ability to communicate the results of a data analysis session and to reason about the properties of the models and algorithms that are used to solve specific learning/mining tasks. The oral examination also allows the student to elaborate on the answers given in the written test. The oral examination will be graded up to 32 points.
To pass the exam, students need to obtain at least 18 points in each of the written and oral tests.
The final grade will take into account the results of both.
Suggested readings and bibliography
- Introduction to Data Mining
- Year of publication:
- Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar