ML Topics Breakdown

Preface

Hello, fellas! I decided to create this document to help you with your ML exam preparation. I will be adding more topics and questions as fast as I can... I hope this document will be helpful to you!

Note!

This document was created with substantial assistance from ChatGPT.
So if you find any mistakes, please let me know. I will be happy to fix them.
The most important thing is for you to check the correctness of the answers and not rely on them blindly!

How to use this document

The Q&A format is as follows:

Q&A example

Question 1

See the question here...

Answer 1!

and see the answer here...

NOTE! You can hide the answer by clicking the fold icon on the right side of the answer block title.

Here is more info about this feature of foldable content: Obsidian foldable callouts.

So you can use this document as a set of flashcards. It's a good idea to hide the answer and try to answer the question yourself first; only after that, open the answer and check whether you are right. But do as you like, of course.

Table of Contents

Topics are clickable and will take you to the corresponding section :)

  1. Supervised Learning

    • Nearest Neighbour Classifier
    • Linear regression
    • Decision trees
    • Overfitting
    • Train/validation split
    • Cross‑validation algorithm
    • Classification vs. regression
  2. Unsupervised Learning

    • Principal Component Analysis (PCA)
    • UMAP
    • t‑SNE
    • K‑means
    • Hierarchical clustering
    • DBSCAN
  3. Deep Learning

    • Artificial neuron
    • Forward pass
    • Activation functions
    • Gradient descent
    • Backpropagation algorithm
    • Vanishing gradients
    • Training loop
    • Convolutional neural networks
  4. Regularisation

    • L1 & L2 regularisation
    • Lasso regression
    • Ridge regression
    • Weight decay
    • Dropout
    • Data augmentation
  5. Ensemble Learning

    • Bagging
    • Random forest
    • Boosting
    • XGBoost
    • Stacking & blending
  6. Performance Metrics

    • Accuracy
    • Recall, precision & F1‑score
    • Confusion matrix
    • MSE & RMSE
    • ROC & AUC

Example questions from Professor Roman Mishchenko

Write a formula for the Logistic Regression hypothesis. Describe what ML task we can solve using this model. Draw a simple schematic representation of this model architecture and describe the possible range of outputs from this model.
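As a quick self-check for this question, here is a minimal Python sketch of the logistic regression hypothesis h(x) = σ(wᵀx + b). The function names and example weights are mine, just for illustration; note how the sigmoid constrains the output to the range (0, 1), which is why the model suits binary classification.

```python
import math

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def logistic_hypothesis(w, x, b):
    # h(x) = sigmoid(w . x + b): the estimated probability that x
    # belongs to the positive class.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Illustrative weights: z = 0.5*2.0 + (-1.0)*1.0 + 0.0 = 0, so h(x) = 0.5.
p = logistic_hypothesis([0.5, -1.0], [2.0, 1.0], 0.0)  # -> 0.5
```

Architecturally this is a single artificial neuron: a weighted sum of the inputs plus a bias, passed through a sigmoid activation.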


Explain the principles behind dropout and batch normalization in deep neural networks. For each, describe:
(a) What problem it addresses
(b) How it works mathematically or algorithmically
(c) Its effect on training dynamics. Also, compare them to L2 regularization. In which situations would you combine or avoid these methods?
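To make part (b) concrete, here is a toy sketch (my own simplified version, not a full answer) of the two mechanisms: inverted dropout, which zeroes each unit with probability p_drop during training and rescales the survivors so the expected activation is unchanged, and batch normalization for a single unit, with the learnable scale/shift parameters (γ, β) omitted for brevity.

```python
import random

def dropout(activations, p_drop, training=True):
    # Inverted dropout: during training, drop each unit with probability
    # p_drop and scale the kept units by 1/(1 - p_drop); at test time the
    # layer is the identity, so no rescaling is needed at inference.
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

def batch_norm(batch, eps=1e-5):
    # Normalise one unit's pre-activations across the batch to zero mean
    # and unit variance (learnable gamma and beta left out for brevity).
    m = sum(batch) / len(batch)
    var = sum((x - m) ** 2 for x in batch) / len(batch)
    return [(x - m) / (var + eps) ** 0.5 for x in batch]
```

Note the contrast with L2 regularization: L2 penalizes large weights directly in the loss, while dropout injects noise into activations and batch norm reparameterizes them, which is why their effects on training dynamics differ.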


Define precision, recall, and F1-score with formulas.
A spam filter labels 100 emails as spam, of which 80 are actual spam, but it misses 20 spam emails.
(a) Compute precision, recall, and F1-score.
(b) Discuss the tradeoff between precision and recall in this context.
(c) How would changing the classification threshold affect these metrics?
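For part (a), the numbers in the question translate into TP = 80 (correctly flagged spam), FP = 20 (the other flagged emails), and FN = 20 (missed spam). A small Python helper makes the arithmetic easy to verify:

```python
def precision_recall_f1(tp, fp, fn):
    # precision = TP / (TP + FP), recall = TP / (TP + FN),
    # F1 = harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Spam filter from the question: 100 emails flagged, 80 of them real spam
# (TP=80, FP=20), and 20 spam emails missed (FN=20).
p, r, f1 = precision_recall_f1(80, 20, 20)  # -> 0.8, 0.8, 0.8
```

Here precision and recall happen to be equal (both 0.8), so F1 is also 0.8; raising the classification threshold would typically trade recall away for higher precision, and vice versa.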