## Top 10 machine learning models beginners must learn

Machine learning is the study of computer algorithms that improve automatically through experience. It is a subset of artificial intelligence. Machine learning algorithms build a mathematical model from training data (a data sample) and then use that model to make predictions or decisions without being explicitly programmed to do so.

A beginner can feel overwhelmed when starting out in machine learning; from the surface, it can seem complex and tedious. However, as one delves deeper into the field and gains more knowledge and insight, machine learning becomes a very interesting dimension to explore.

At Cognixia, we see many participants become so engaged in machine learning as they progress through the training sessions that it changes their perspective and gives them a different way to look at information. Over the years, having interacted with numerous participants, subject matter experts, and trainers, we have put together the list of top 10 machine learning algorithms below to ease the jitters and help beginners on their path into the world of machine learning.

## What are the top 10 machine learning algorithms every beginner must know about?

• Linear Regression
• Logistic Regression
• Classification and Regression Trees
• Naïve Bayes
• K-Nearest Neighbors
• Apriori
• K-means
• Principal Component Analysis
• Bagging with Random Forests

Linear Regression

If the entire gist of machine learning had to be summarized in one sentence, it would be “to quantify the relationship between input variables and output variables”. That is the simple idea behind machine learning. In linear regression, the relationship between the input variable (x) and the output variable (y) is represented by the equation:

y = a + bx

The goal of the linear regression algorithm is to find the values of the coefficients ‘a’ and ‘b’. Graphically, ‘a’ is the intercept and ‘b’ is the slope of the line. The aim is to fit a line that lies as close as possible to most of the points, thereby reducing the error, which is represented by the distance between the y value of each data point and the line.

This is one of the simplest and most basic algorithms in machine learning, and everyone working in the field should be well-versed in it.
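As a rough illustration, the coefficients can be found in closed form. The sketch below assumes the error being reduced is the sum of squared vertical distances (ordinary least squares), which the text does not state explicitly; the function name `fit_line` and the data are made up for the example.

```python
import numpy as np

def fit_line(x, y):
    """Return (a, b) for the line y = a + b*x that minimizes squared error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    a = y_mean - b * x_mean
    return a, b

# Illustrative data (assumed): points lying exactly on y = 1 + 2x.
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
a, b = fit_line(x, y)
```

With noisy data the same function returns the line with the smallest total squared distance to the points rather than an exact fit.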

Logistic Regression

Binary classification, where a prediction is made about whether a particular event will occur or not, calls for discrete values rather than continuous ones. This is where the logistic regression algorithm is very useful. While linear regression yields continuous values, logistic regression yields discrete values after a transformation function is applied. The algorithm gets its name from that transformation function, the logistic function:

$$h(x) = \frac{1}{1 + e^{-x}}$$

The logistic regression equation:

$$p(x) = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$$

can be transformed into:

$$\ln\left(\frac{p(x)}{1 - p(x)}\right) = b_0 + b_1 x$$

As a result, an S-shaped curve is obtained. The output of a logistic regression is the probability of the default class, and since it is a probability, its value ranges between 0 and 1. The value denotes the probability of occurrence of the particular event.
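The relationship between the logistic function, the predicted probability, and the log-odds can be checked numerically. This is a minimal sketch; the coefficient values `b0` and `b1` are hypothetical and chosen only for illustration.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, b0, b1):
    """Probability of the default class for input x."""
    return logistic(b0 + b1 * x)

def log_odds(p):
    """Inverse of the logistic function: recovers b0 + b1*x from p."""
    return math.log(p / (1.0 - p))

b0, b1 = -1.0, 0.5  # hypothetical coefficients, not fitted to any data
p = predict_probability(3.0, b0, b1)
# A discrete prediction is obtained by thresholding, here at 0.5 (an assumed cutoff).
predicted_class = 1 if p >= 0.5 else 0
```

Taking `log_odds(p)` returns `b0 + b1 * 3.0`, which is the linear form of the model.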

Classification and Regression Trees

Commonly abbreviated as CART, Classification and Regression Trees are a type of decision tree. Here, the non-terminal nodes are the root node and the internal nodes, while the terminal nodes are the leaf nodes. Each non-terminal node represents a single input variable (x) and a splitting point on that variable, while each terminal node represents the output variable (y).
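To make the idea of a splitting point concrete, the sketch below searches for the best threshold on a single input variable. It assumes Gini impurity as the split criterion, which is a common choice for classification trees but is not named in the text; the function names and the data are illustrative.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best rule 'x <= threshold'."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (threshold, score)
    return best

# Illustrative data (assumed): one input variable, two classes.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["no", "no", "no", "yes", "yes", "yes"]
threshold, score = best_split(xs, ys)
```

A full tree repeats this search on each resulting subset until the leaves are pure enough or a stopping rule is reached.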

Naïve Bayes

This algorithm is again about probabilities. Here, the probability of an event occurring is calculated given that another event has already taken place. The algorithm uses Bayes' Theorem to calculate the probability of the hypothesis. The theorem can be written as:

P(h|d)= (P(d|h) P(h)) / P(d)

Where:

• P(h|d) = Posterior probability. The probability of hypothesis h being true, given the data d. Under the naïve independence assumption, P(h|d) is proportional to P(d1|h) P(d2|h) … P(dn|h) P(h)
• P(d|h) = Likelihood. The probability of data d given that the hypothesis h is true.
• P(h) = Class prior probability. The probability of hypothesis h being true (irrespective of the data)
• P(d) = Predictor prior probability. Probability of the data (irrespective of the hypothesis)

Wondering why the algorithm is called ‘naïve’? Well, that is because it assumes that all the variables would be independent of each other, which, considering the real world, is quite a naïve assumption to make.
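The calculation can be sketched in a few lines. The priors and likelihoods below are invented numbers for a hypothetical spam filter, and `naive_bayes_posterior` is an illustrative name rather than a library function.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Posterior P(h|d) for each hypothesis h.

    priors:      {h: P(h)}
    likelihoods: {h: {feature: P(feature | h)}}
    observed:    features present in the data d
    """
    scores = {}
    for h, prior in priors.items():
        score = prior
        # Naive assumption: features are independent given h,
        # so their likelihoods are simply multiplied together.
        for feature in observed:
            score *= likelihoods[h][feature]
        scores[h] = score
    evidence = sum(scores.values())  # P(d), the normalizing constant
    return {h: s / evidence for h, s in scores.items()}

# Hypothetical numbers, for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "offer": 0.4},
    "ham": {"free": 0.1, "offer": 0.2},
}
posterior = naive_bayes_posterior(priors, likelihoods, ["free", "offer"])
```

The hypothesis with the largest posterior is the predicted class.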

K-Nearest Neighbors

For most algorithms, the data set is split into a training set and a test set. For the K-Nearest Neighbors (KNN) algorithm, however, the entire data set is used as the training set. When a new instance arrives, the algorithm goes through the entire data set to find the ‘k’ instances that are closest to it. The output is the mean of their outcomes for regression problems or the mode for classification problems. The value of ‘k’ is specified by the user. To calculate the similarity between instances, measures such as Euclidean distance and Hamming distance are used.
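A minimal sketch of the procedure, assuming numeric features and Euclidean distance; the function name `knn_predict` and the toy data are illustrative.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3, task="classification"):
    """Predict the outcome for `query` from its k nearest training instances."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training instance.
    distances = np.sqrt(((train_x - query) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    neighbors = [train_y[i] for i in nearest]
    if task == "regression":
        return float(np.mean(neighbors))         # mean of the outcomes
    return Counter(neighbors).most_common(1)[0][0]  # mode of the outcomes

# Illustrative data (assumed): two well-separated groups.
points = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

predicted_label = knn_predict(points, labels, [2, 2], k=3)
predicted_value = knn_predict(points, values, [2, 2], k=3, task="regression")
```

Because no model is built in advance, every prediction scans the whole data set, which is why KNN can be slow on large data.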

Apriori

The Apriori algorithm is used to mine frequent item sets from transactional databases and then generate association rules from them. It is a type of unsupervised learning algorithm and is very commonly used in market basket analysis. Association rules are generated once the thresholds for support and confidence are crossed.

For an association rule X → Y, three measures are used: support, confidence, and lift. The support measure helps cut down the number of candidate item sets that need to be considered during frequent item set generation, and it is guided by the Apriori principle. According to the Apriori principle, if an item set is frequent, then all of its subsets are also frequent.
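The three measures can be computed directly from a list of transactions. The sketch below uses their standard definitions: support is the fraction of transactions containing both X and Y, confidence is that support divided by the support of X, and lift is that support divided by the product of the supports of X and Y. The transactions and function names are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def rule_metrics(transactions, x, y):
    """Support, confidence, and lift for the association rule X -> Y."""
    s_xy = support(transactions, set(x) | set(y))
    s_x = support(transactions, x)
    s_y = support(transactions, y)
    return {
        "support": s_xy,
        "confidence": s_xy / s_x,
        "lift": s_xy / (s_x * s_y),
    }

# Illustrative market baskets (assumed data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
metrics = rule_metrics(transactions, {"bread"}, {"milk"})
```

The Apriori principle shows up here as a simple inequality: `support(transactions, {"bread"})` can never be smaller than `support(transactions, {"bread", "milk"})`, so any item set containing an infrequent subset can be discarded without counting it.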

K-means

This is also an unsupervised machine learning algorithm. It is an iterative algorithm that groups similar data into clusters. It calculates the centroids of k clusters and assigns each data point to the cluster whose centroid is nearest to it.

Principal Component Analysis

Principal Component Analysis (PCA) makes data easier to explore and visualize by reducing the number of variables. To do this, the maximum variance in the data is captured in a new coordinate system whose axes are called principal components. Each component is a linear combination of the original variables, and the components are orthogonal to one another, which means that the correlation between them is zero.
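One way to sketch this is through the eigenvectors of the covariance matrix. This is only one of several equivalent approaches, and the function name `pca` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca(data, n_components):
    """Project `data` onto its top principal components.

    Returns (projected data, component directions, variance per component).
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    # eigh is used because the covariance matrix is symmetric;
    # it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]
    return centered @ components, components, eigenvalues[order]

# Synthetic data (assumed): three variables, two of them strongly related.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
data = np.hstack([
    base,
    2 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])
projected, components, variances = pca(data, 2)
```

The columns of `components` are orthogonal, and the columns of `projected` are uncorrelated, with the first carrying the largest share of the variance.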

The first principal component represents the direction of maximum variability in the data. The second principal component captures as much of the remaining variance as possible while being uncorrelated with the first. Similarly, all successive principal components (PC3, PC4 and so on) capture the remaining variance while being uncorrelated with the previous components.

Bagging with Random Forests

Bagging is an ensemble learning technique, and more specifically a parallel ensemble technique. The first step is to create multiple training sets from the data using the bootstrap sampling method. Each training set consists of random subsamples drawn from the original data set. Each training set is the same size as the original data set, though some records are repeated and some are omitted in each one.

After the bootstrap sampling is done, step 2 is to create multiple models using the same algorithm on the different generated training sets. This is where Random Forests play a role. In Random Forests, a random subset of features is considered when searching for the best split. Combining bagging with splits on random feature subsets reduces the correlation among the predictions of the individual trees.

In this way, in bagging with Random Forests, each tree is constructed using a random sample of records and each split is built with a random sample of predictors. 
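The two sources of randomness can be sketched as follows. This shows only the sampling steps, not a full Random Forest; the function names, the sizes, and the choice of three features per split are assumptions for illustration.

```python
import numpy as np

def bootstrap_indices(n_records, rng):
    """Sample record indices with replacement, same size as the original set."""
    return rng.integers(0, n_records, size=n_records)

def random_feature_subset(n_features, n_selected, rng):
    """Pick the features a single split is allowed to consider."""
    return rng.choice(n_features, size=n_selected, replace=False)

rng = np.random.default_rng(42)
n_records, n_features = 100, 9  # hypothetical data set dimensions

# One bootstrap sample of records per tree.
samples = [bootstrap_indices(n_records, rng) for _ in range(5)]
# One random subset of predictors per split.
features = random_feature_subset(n_features, 3, rng)

# Records that never appear in a sample are the ones omitted from it.
omitted = [n_records - len(np.unique(s)) for s in samples]
```

Each tree would then be trained on its own bootstrap sample, drawing a fresh feature subset at every split, and the trees' predictions would be averaged or put to a vote.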