Machine learning principles

There are many universal principles and basic concepts in machine learning. This post gives brief answers to some common questions that may help your understanding and interview preparation. Thanks for stopping by.

What’s the trade-off between bias and variance?

Reference Bias is the difference between the average prediction of your model and the true values; high bias usually means the model is too simple and misses the relevant structure in the data. Variance measures how much individual predictions deviate from the average prediction, i.e., how sensitive the model is to the particular training set it sees. High variance usually means the model is overfitting and will generalize poorly to test data, while high bias with low variance usually means it is underfitting. A typical decomposition of the expected error is: Err(x) = Bias^2 + Variance + Irreducible Error.
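
A rough way to see this numerically (synthetic data and a deliberately simple model, chosen only for this sketch) is to refit an estimator on many fresh training sets and inspect its predictions at a fixed point:

```python
import numpy as np

# Minimal sketch (synthetic data): estimate bias and variance of a degree-1
# polynomial fit to a non-linear target by refitting on many fresh training
# sets and inspecting the predictions at a fixed test point.
rng = np.random.default_rng(0)
true_fn = lambda x: np.sin(2 * np.pi * x)
x_test = 0.3

predictions = []
for _ in range(500):
    x = rng.uniform(0, 1, 30)
    y = true_fn(x) + rng.normal(0, 0.2, 30)
    coeffs = np.polyfit(x, y, 1)              # deliberately too simple a model
    predictions.append(np.polyval(coeffs, x_test))

predictions = np.array(predictions)
bias_sq = (predictions.mean() - true_fn(x_test)) ** 2
variance = predictions.var()
print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```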

What is the curse of dimensionality?

Reference Normally, the more features we use, the higher the likelihood that we can separate the classes perfectly on the training set. But using too many features results in overfitting: the classifier starts learning exceptions that are specific to the training data and do not generalize well when new data is encountered. As the dimensionality increases, the feature space grows exponentially, the training data becomes sparse, and a larger percentage of it resides in the corners of the feature space. In that case we should consider reducing the dimensionality of the features, for example with Principal Component Analysis (PCA).
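
A small numeric illustration of this sparsity effect (synthetic uniform data, purely an assumption for the sketch): as the dimension grows, almost no points remain near the center of the unit hypercube.

```python
import numpy as np

# Minimal sketch: fraction of uniformly sampled points that fall within
# distance 0.5 of the center of the unit hypercube, as the dimension grows.
rng = np.random.default_rng(0)

for d in [1, 2, 5, 10, 50, 100]:
    points = rng.uniform(size=(100_000, d))
    inside = np.linalg.norm(points - 0.5, axis=1) <= 0.5
    print(f"dim={d:>3}: {inside.mean():.4f} of points lie near the center")
```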

Explain over- and under-fitting and how to combat them?

Overfitting is when a model fits a specific set of training data too closely, including its noise, so it fails to generalize to other data. Underfitting occurs when a statistical model cannot adequately capture the underlying structure of the data; it would occur, for example, when fitting a linear model to non-linear data. Overfitting can be combated with more training data, regularization, cross-validation, early stopping, or a simpler model; underfitting is combated with a more expressive model or better features.
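
The classic demonstration is fitting polynomials of increasing degree to noisy data (the data and degrees below are illustrative assumptions, not from the post): a low degree underfits, a very high degree overfits.

```python
import numpy as np

# Minimal sketch: train/test error of polynomial fits to noisy sine data.
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 30))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 30)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in [1, 4, 15]:                      # underfit, good fit, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:>2}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```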

What is regularization, why do we use it, and give some examples of common methods?

Reference Regularization, originally described for regression, constrains/regularizes or shrinks the coefficient estimates towards zero. In other words, this technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting. Common examples are L1 regularization (Lasso), L2 regularization (Ridge), and, for neural networks, dropout and early stopping.
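
A minimal scikit-learn sketch (synthetic data, illustrative alpha values) comparing an unregularized linear model with its L2- and L1-regularized counterparts; Lasso's L1 penalty drives many coefficients exactly to zero.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Minimal sketch: effect of L1/L2 regularization on linear regression coefficients.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge (L2)", Ridge(alpha=1.0)),
                    ("Lasso (L1)", Lasso(alpha=1.0))]:
    model.fit(X, y)
    n_zero = (abs(model.coef_) < 1e-6).sum()
    print(f"{name:<10}  zeroed coefficients: {n_zero} / {len(model.coef_)}")
```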

Explain Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE)?

Both methods aim at dimensionality reduction and are widely used for visualization. PCA decomposes the data into its most significant orthogonal components, so that each data point can be represented as a weighted combination of the principal components. The process is: normalize the data, compute the covariance matrix, obtain the eigenvectors and eigenvalues (e.g., via SVD), sort the eigenvalues from large to small, and project the data onto the eigenvectors selected by the largest eigenvalues. t-SNE, contrary to PCA, is not a linear projection but a probabilistic technique: it tries to find the best low-dimensional representation of the input data by matching pairwise-similarity distributions between the high-dimensional and low-dimensional spaces.
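
Both are available in scikit-learn; here is a hedged sketch on the bundled digits dataset (the dataset and hyperparameters are illustrative choices, not part of the original answer).

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Minimal sketch: project 64-dimensional digit images down to 2D for plotting.
X, y = load_digits(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print("PCA embedding shape:  ", X_pca.shape)   # (1797, 2)
print("t-SNE embedding shape:", X_tsne.shape)  # (1797, 2)
```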

What is data normalization and why do we need it?

Data normalization puts all features on a comparable scale so that no single feature dominates. It is used to rescale values into a specific range and helps ensure better convergence during training (e.g., backpropagation). In general, it boils down to subtracting the mean of each feature and dividing by its standard deviation.
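
A minimal sketch of that z-score normalization, computed per feature (toy numbers chosen for illustration):

```python
import numpy as np

# Minimal sketch: standardize each feature (column) to mean 0 and std 1.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm)  # every column now has mean 0 and standard deviation 1
```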

Explain the benefits of dimensionality reduction?

(1) Reduce the storage space needed. (2) Speed up computation: fewer dimensions mean less computing. (3) Remove redundant features. (4) Make visualization easier. (5) Reduce the risk of overfitting caused by too many features or too complex a model.

How do you handle missing or corrupted data in a dataset?

You could identify the missing/corrupted entries in a dataset and either drop the affected rows or columns, or replace them with another value, such as the mean, the median, or a placeholder.
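
A minimal pandas sketch (hypothetical toy data) showing the two options mentioned above:

```python
import numpy as np
import pandas as pd

# Minimal sketch: drop rows with missing values, or impute them instead.
df = pd.DataFrame({"age": [25, np.nan, 40, 31],
                   "income": [50_000, 62_000, np.nan, 58_000]})

dropped = df.dropna()            # option 1: drop rows containing missing values
imputed = df.fillna(df.mean())   # option 2: impute with the column mean

print(dropped)
print(imputed)
```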

Explain decision tree and random forest.

Reference A decision tree is a flowchart-like tree model that leads to a prediction. To build it, all features are considered and different split points are tried and tested using a cost function; the split with the lowest cost is selected, and the procedure is repeated recursively (recursive binary splitting). A random forest is an ensemble of decision trees: it builds multiple trees and merges their predictions to get a more accurate and stable result. Random forests also add randomness while growing the trees: instead of searching for the most important feature among all features when splitting a node, each split searches for the best feature among a random subset of features. This results in wider diversity among the trees, which generally yields a better model.
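
A minimal scikit-learn sketch (the iris dataset and hyperparameters are illustrative choices) comparing a single tree with a forest:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Minimal sketch: cross-validated accuracy of one tree vs. an ensemble of trees.
X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("decision tree accuracy:", cross_val_score(tree, X, y, cv=5).mean())
print("random forest accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```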

What are the popular clustering algorithms?

Reference K-Means clustering, Mean-Shift clustering, density-based spatial clustering (DBSCAN), Expectation-Maximization (EM) clustering with Gaussian Mixture Models (GMM), and hierarchical agglomerative clustering.
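
Two of these are a one-liner each in scikit-learn; a minimal sketch on synthetic blobs (the data and hyperparameters are illustrative assumptions):

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Minimal sketch: run k-means and DBSCAN on the same synthetic 2D data.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

print("k-means clusters found:", len(set(kmeans_labels)))
print("DBSCAN clusters found: ", len(set(dbscan_labels) - {-1}))
```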

Good luck!