Perhaps the best explanation of machine learning’s impact comes from Jeff Bezos in his 2016 letter to Amazon shareholders:
“But much of what we do with machine learning happens beneath the surface. Machine learning drives our algorithms for demand forecasting, product search ranking, product and deals recommendations, merchandising placements, fraud detection, translations, and much more. Though less visible, much of the impact of machine learning will be of this type – quietly but meaningfully improving core operations.”
From FinTech startups to Fortune 100 retail giants, organizations of all sizes and across a range of industries continue to invest in machine learning projects. This growing adoption, along with optimistic forecasts for the field, speaks to the power of machine learning models and how they are, as Bezos says, “quietly but meaningfully improving core operations.”
In this post, we’ll give you a simple overview of how various machine learning models work and how they’re being used today to solve real-world problems.
Machine learning model types: Supervised vs. unsupervised
Most machine learning models fall into two key categories: supervised and unsupervised. There is a third category, reinforcement learning, but it is used less often in practice today, so most ML models in use are either supervised or unsupervised.
When choosing a specific type of model, the key distinction to consider is whether your data is labeled or unlabeled. Supervised learning models train on labeled data and unsupervised learning models train on unlabeled data, while reinforcement learning models learn from feedback (rewards or penalties) they receive while interacting with an environment rather than from a fixed dataset. This will make more sense in the next section.
Supervised machine learning
In a supervised learning problem, the training data is labeled: each input example is paired with the correct output, so the model can learn what is right or wrong and improve over successive iterations. A simple example of supervised learning is training an algorithm to detect which emails are spam and which aren’t. The person providing the sample data knows the answers; the goal is to train the model to generalize from those examples so that it can label new emails on its own.
Supervised learning models are further sub-categorized as regression and classification models, which we’ll expand on shortly.
Unsupervised machine learning
Unsupervised machine learning models learn patterns from unlabeled data through techniques such as clustering and association. With unsupervised learning, you don’t necessarily know the ground truth or the type of answer to expect. Instead, the model looks for similarities and differences within large volumes of data to help you gain valuable insights. One example in practice is recommender systems, which group people with similar viewing or reading habits and use those insights to recommend similar shows, movies, or books.
Because unsupervised learning has no labels to guide it, you’ll often need considerably more data to build powerful models than you would with supervised learning. However, a big benefit of unsupervised learning is that it requires less human intervention: in a supervised approach, labeling the data appropriately can take a significant amount of time.
Reinforcement machine learning
Reinforcement machine learning models learn by trial and error. Instead of learning from a labeled dataset, an agent takes actions in an environment and receives positive or negative feedback (rewards or penalties) for the decisions it makes on its own; people typically design how those rewards are assigned. The agent’s goal is to maximize the total reward it receives over time.
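To make the trial-and-error idea concrete, here is a minimal Python sketch of an epsilon-greedy agent on a multi-armed bandit, one of the simplest reinforcement learning settings (it has repeated choices but no changing states). The three payout probabilities, the 10% exploration rate (`epsilon`), and the number of steps are hypothetical values chosen for illustration; real reinforcement learning problems are far richer.

```python
import random


def epsilon_greedy_bandit(payout_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which action ("arm") pays out most often.

    payout_probs: hypothetical chance that each arm returns a reward of 1.
    Returns (estimated average reward per arm, number of pulls per arm).
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    estimates = [0.0] * n_arms  # running average reward for each arm
    pulls = [0] * n_arms

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit best guess
        # Feedback from the environment: reward 1 (positive) or 0 (no reward).
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        pulls[arm] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]

    return estimates, pulls


estimates, pulls = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(pulls)), key=lambda a: pulls[a])
print(best_arm, pulls)
```

Occasional random exploration is what lets the agent discover that the third arm pays best; after that, it mostly exploits that arm to maximize its total reward.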
Because it learns through trial and error, reinforcement learning often needs a very large number of attempts and can take a long time to train. However, its ability to discover answers on its own is also why many see it as one of the most promising paths for advancing AI and ML.
Machine learning models explained
Now let’s go over the following popular machine learning models and some examples:
Regression (Supervised learning)
Classification (Supervised learning)
Clustering (Unsupervised learning)
Before we dive in, keep in mind that some models can solve more than one type of problem. For example, K-nearest neighbor and random forest can be used for both regression and classification problems.
Regression
Regression models predict continuous values: the target variable is a number that can take fractional values, such as a price or a temperature. A regression model aims to find the relationship that best describes how a target outcome variable (the dependent variable) depends on one or more predictors (the independent variables). Examples of regression models include linear regression and K-nearest neighbor.
Linear Regression
In linear regression, the assumption is that a linear relationship exists between the independent variables and the dependent variable; the question is how to find the specific line that best fits the data.
The best-fitting line is the one with the least error, typically measured as the sum of the squared vertical distances between the predicted values and the actual values (the least squares method).
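The least squares idea can be sketched in a few lines of Python for the single-predictor case. The `fit_line` and `sum_squared_error` helpers and the monthly sales numbers are hypothetical illustrations, not data from this post; libraries such as scikit-learn handle multiple predictors and the numerical details for you.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y is approximated by slope * x + intercept.

    The closed-form solution picks the slope and intercept that minimize
    the sum of squared vertical distances between predicted and actual y.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """Total squared distance between each actual y and the line's prediction."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical monthly sales figures, made up for illustration.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 15.0, 16.0, 19.0]
slope, intercept = fit_line(months, sales)
print(slope, intercept, sum_squared_error(months, sales, slope, intercept))
```

For this made-up data the fitted slope is roughly 2.2 and the intercept roughly 7.8, and any other line (for example, slope 2 and intercept 8) produces a larger sum of squared errors.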
Real-world applications of Linear Regression
Linear regression can be used for trend forecasting, such as estimating sales, or for estimating an effect, such as predicting how changes in the market might affect a stock price.
K-Nearest Neighbor
K-nearest neighbor (KNN) is a relatively easy-to-understand model that works for both classification and regression. To classify a new, unlabeled data point, KNN looks at the labeled data points nearest to it. The variable K denotes the number of neighboring data points that KNN considers. For example, if K=1, KNN would use just the single nearest data point to decide the label of the new point. If K=3, KNN would look at the 3 nearest data points and assign the label held by the majority of them. For regression, KNN instead averages the target values of the K nearest neighbors.
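A minimal sketch of KNN in plain Python, assuming Euclidean distance and small in-memory lists of `(point, target)` pairs. The points, labels, and helper names are hypothetical. It shows both uses described above: a majority vote for classification and an average for regression.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k training examples closest to `query` (Euclidean distance)."""
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]


def knn_classify(train, query, k=3):
    """Classification: majority vote among the labels of the k nearest neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: average of the numeric targets of the k nearest neighbors."""
    targets = [target for _, target in k_nearest(train, query, k)]
    return sum(targets) / len(targets)


# Hypothetical labeled points in two groups.
train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((6, 6), "blue"), ((7, 6), "blue"), ((6, 7), "blue")]
print(knn_classify(train, (2, 2), k=3), knn_classify(train, (5, 5), k=3))

# Hypothetical one-feature regression data.
train_reg = [((1,), 1.0), ((2,), 2.0), ((3,), 3.0), ((10,), 10.0)]
print(knn_regress(train_reg, (2,), k=3))
```

Choosing an odd K for a two-class problem, as here, avoids tied votes.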
Classification
In classification models, the model aims to correctly assign new data to predefined classes. When there are two possible classes (e.g. yes/no, true/false), it’s referred to as binary classification. If there are more than two possible classes, it’s referred to as multi-class classification. Common types of classification models are Logistic Regression, Naive Bayes, Decision Tree, Random Forest, and Neural Networks.
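Logistic Regression
Despite its name, logistic regression is popular for classification problems, particularly binary classification. Unlike linear regression, which predicts a continuous value, logistic regression predicts the probability that something is true or false. Instead of fitting a straight line, it fits an S-shaped curve (the logistic, or sigmoid, function), and a threshold such as 0.5 turns that probability into a class.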
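A minimal sketch of logistic regression for one predictor, fitted with plain batch gradient descent on the log loss. The data (hypothetical hours studied versus pass/fail), the learning rate, and the epoch count are illustrative assumptions; production libraries use more robust optimizers and add regularization.

```python
import math


def sigmoid(z):
    """The S-shaped logistic curve: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # For log loss, the gradient w.r.t. the linear score is (prediction - label).
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_class(w, b, x, threshold=0.5):
    """Turn the predicted probability into a yes (1) / no (0) answer."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical data: hours studied vs. whether a student passed (1) or not (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_regression(hours, passed)
print([round(sigmoid(w * x + b), 2) for x in hours])
```

The printed probabilities rise along the S-curve from near 0 for few hours to near 1 for many hours, and the 0.5 threshold falls between 4 and 5 hours for this data.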
Naive Bayes
Naive Bayes models are a family of supervised learning algorithms, based on Bayes’ theorem, that make an oversimplified assumption to keep calculations simple. The naive assumption is that individual features are independent of each other given the class, meaning that once you know the class, one feature tells you nothing about another. This assumption is often untrue.
Despite this naive assumption, naive Bayes is a relatively fast classification algorithm and works surprisingly well on many real-world problems.
Real-world application of naive Bayes
Naive Bayes models are commonly used in spam detection. In the context of emails, the naive assumption is that words occur independently of each other. In other words, the model would likely not recognize that the word “free” in the phrase “I was trying to free up time” is used differently than in the phrase “buy crypto for free”.
To train a naive Bayes model, the algorithm runs on training data labeled spam or not spam. It then estimates the probability that a specific word appears in spam emails. Say that, after training on 300 emails, it found the word “free” in 200 of the spam emails. Now, if a new email contains the word “free”, the likelihood of it being classified as spam increases. The model combines the individual probabilities of the words in an email (multiplying them, per the independence assumption) to determine the likelihood that the email is spam.
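The training and scoring steps above can be sketched as a tiny Bernoulli naive Bayes classifier, a variant that models whether each word appears in an email. The six example emails are made up, and the add-one (Laplace) smoothing is an assumption added so that a word never seen in one class does not zero out the whole product. Scores are added in log space, which is equivalent to multiplying probabilities but avoids numerical underflow.

```python
import math


def tokenize(text):
    """Set of lowercase words in an email (presence only, not counts)."""
    return set(text.lower().split())


def train_naive_bayes(emails):
    """emails: list of (text, label) pairs. Returns {label: (prior, {word: P(word present | label)})}."""
    vocab, doc_counts, word_counts = set(), {}, {}
    for text, label in emails:
        words = tokenize(text)
        vocab |= words
        doc_counts[label] = doc_counts.get(label, 0) + 1
        counts = word_counts.setdefault(label, {})
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    model = {}
    for label, n_docs in doc_counts.items():
        # Add-one smoothing: (emails with the word + 1) / (emails in class + 2).
        word_probs = {w: (word_counts[label].get(w, 0) + 1) / (n_docs + 2) for w in vocab}
        model[label] = (n_docs / len(emails), word_probs)
    return model


def classify(model, text):
    """Pick the label with the highest log prior plus per-word log probabilities.

    Naive assumption: each word's presence is independent of the others given
    the label, so per-word probabilities multiply (add, in log space).
    """
    words = tokenize(text)
    best_label, best_score = None, -math.inf
    for label, (prior, word_probs) in model.items():
        score = math.log(prior)
        for word, p in word_probs.items():
            score += math.log(p) if word in words else math.log(1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training emails.
emails = [
    ("buy crypto for free", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize now", "spam"),
    ("meeting moved to friday", "ham"),
    ("lunch on friday", "ham"),
    ("project update attached", "ham"),
]
model = train_naive_bayes(emails)
print(classify(model, "free crypto prize"), classify(model, "friday project meeting"))
```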
Neural Networks
A neural network is commonly compared to a human brain because its nodes work somewhat like neurons. A neural network consists of an input layer, one or more hidden layers, and an output layer. The input layer can consist of one or more independent variables, and the output layer consists of one or more dependent variables.
Looking at the picture below, you’d call each circle a node (or a neuron). In the training process, data is fed into the nodes of the first layer. (Keep in mind that the data fed into a neural network can also be image data, such as pixels.)
The first layer, or input layer, passes the data on to the hidden layer. Each hidden node computes a weighted sum of its inputs, adds a bias, and applies an activation function before passing the result along.
The hidden layers perform most of the neural network’s calculations before passing their results to the output layer, so by the time an input reaches the output layer, it has gone through many calculations. In a classification network, the output layer typically converts its results into probabilities, one per class, and the class with the highest probability is the network’s estimated output.
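The forward pass described above can be sketched in plain Python for a tiny network with 2 inputs, 3 hidden nodes, and 2 output classes. The weights and biases are arbitrary hypothetical numbers rather than trained values, and ReLU and softmax are common choices, assumed here, for the hidden activation and the output layer.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: each node computes a weighted sum plus a bias."""
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + b
        for node_weights, b in zip(weights, biases)
    ]


def relu(values):
    """Activation function: keep positive values, replace negatives with 0."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    hidden = relu(dense(inputs, hidden_w, hidden_b))     # hidden layer
    probs = softmax(dense(hidden, output_w, output_b))   # output layer
    predicted_class = max(range(len(probs)), key=lambda i: probs[i])
    return probs, predicted_class


# Hypothetical, untrained weights: 3 hidden nodes x 2 inputs, 2 classes x 3 hidden nodes.
hidden_w = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
hidden_b = [0.0, 0.1, 0.2]
output_w = [[1.0, -1.0, 0.5], [-0.5, 1.2, 0.3]]
output_b = [0.0, 0.0]
probs, predicted_class = forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b)
print(probs, predicted_class)
```

Training, described next, is the process of adjusting these weights and biases so that the probabilities match the actual outputs more closely.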
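To train the model, we compare its estimated output with the actual output so it can measure how wrong it was, and then adjust its weights accordingly (typically using backpropagation and gradient descent). Repeating this over many examples gradually improves its predictions.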
Although neural networks can take anywhere from hours to months to train, they have powerful real-world applications and form the foundation of deep learning.
Real-world applications of Neural Networks
Neural networks can be used in the following ways:
To satisfy Know Your Customer (KYC) and Anti-Money Laundering (AML) regulations through sophisticated identity verification in the form of facial recognition systems.
To forecast stock prices by identifying trends.
To prevent theft and other crimes through CCTV surveillance systems powered by deep learning, particularly neural networks.
Decision Tree and Random Forest
Although they’re two different models, understanding a decision tree will help you understand how random forest models work.
Decision trees and random forests are among the easier models to understand visually because they look like a flowchart. A decision tree is made up of decision nodes and leaves: decision nodes are where the data splits, and leaves are where the output is determined. In a classification problem, the decision nodes ask questions such as “credit history > 5 years?” or “credit score > 600?”, and the leaves determine the final output, such as “accept” or “reject.”
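A decision tree can be represented as nested decision nodes and leaves. This sketch hard-codes a tree using the two example questions above; the order of the questions and the leaf outcomes are illustrative assumptions, since a real tree would learn its splits from training data.

```python
# Decision nodes ask a yes/no question about one feature; leaves hold the output.
# Hypothetical tree: question order and leaf outcomes are illustrative only.
credit_tree = {
    "feature": "credit_history_years", "threshold": 5,
    "yes": {
        "feature": "credit_score", "threshold": 600,
        "yes": {"leaf": "accept"},
        "no": {"leaf": "reject"},
    },
    "no": {"leaf": "reject"},
}


def predict(tree, applicant):
    """Walk from the root, answering each node's question, until a leaf is reached."""
    node = tree
    while "leaf" not in node:
        answer = applicant[node["feature"]] > node["threshold"]
        node = node["yes"] if answer else node["no"]
    return node["leaf"]


print(predict(credit_tree, {"credit_history_years": 8, "credit_score": 720}))
```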
One drawback of a single decision tree is its accuracy: it tends to overfit the training data. This is where random forest comes in. With a decision tree, we rely on the decision-making process of one individual tree. With a random forest, we instead rely on many decision trees and use a “majority wins” vote (or, for regression, an average) to combine their outputs. Rather than building one model from all of the data, a random forest trains each tree on a random sample of the data and lets each tree consider only a random subset of the features, producing many different models.
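To show the “majority wins” idea, here is a deliberately tiny random forest sketch in plain Python. Each “tree” is a single-split stump trained on a bootstrap sample (rows drawn with replacement) and restricted to one randomly chosen feature. That simplification is an assumption for brevity: full implementations such as scikit-learn’s `RandomForestClassifier` grow deeper trees and sample candidate features at every split. The data set and its two numeric features are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Find the single split on `feature` that misclassifies the fewest rows."""
    base = majority(labels)
    # Fallback when no split helps: predict the majority label everywhere.
    best = (sum(y != base for y in labels), None, base, base)
    values = sorted(set(row[feature] for row in rows))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def stump_predict(stump, row):
    feature, threshold, left_label, right_label = stump
    if threshold is None:
        return left_label  # constant prediction: no useful split was found
    return left_label if row[feature] <= threshold else right_label


def fit_random_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw rows with replacement, so each tree sees different data.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature: this stump may only split on one randomly chosen feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def forest_predict(forest, row):
    """Each tree votes; the most common answer wins."""
    return majority([stump_predict(stump, row) for stump in forest])


# Hypothetical applicants described by two numeric features.
rows = [(1, 2), (2, 1), (3, 3), (7, 8), (8, 7), (9, 9)]
labels = ["reject", "reject", "reject", "accept", "accept", "accept"]
forest = fit_random_forest(rows, labels)
print(forest_predict(forest, (2, 2)), forest_predict(forest, (8, 8)))
```

Any single stump can be misled by an unlucky bootstrap sample, but the vote across 25 of them smooths out those individual mistakes.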
Clustering
Clustering algorithms group similar data points and identify patterns. Unlike classification and regression models, clustering models don’t predict a specific target outcome. One type of clustering model is K-means.
K-means Clustering
K-means clustering groups data points into K clusters, assigning each data point to the cluster whose center (centroid) is nearest, so that similar data points end up together. In the picture below, the circles represent clusters, whereas the x marks represent data points, each described by multiple features.
Each data point in K-means clustering can belong to only one cluster, unlike in another clustering method, Fuzzy C-means, which allows a data point to belong to more than one cluster to varying degrees. As a very simplified analogy, think of Fuzzy C-means as a Venn diagram with overlapping circles and K-means as two circles side by side that do not intersect.
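The difference between hard and fuzzy membership can be sketched for a single data point. The hard assignment gives the nearest centroid full membership, while the fuzzy version uses the standard Fuzzy C-means membership formula with fuzzifier `m = 2` (a common default, assumed here) to spread membership across clusters. The centroids and points are hypothetical.

```python
import math


def hard_assignment(point, centroids):
    """K-means style: the point belongs fully to its nearest cluster only."""
    distances = [math.dist(point, c) for c in centroids]
    nearest = distances.index(min(distances))
    return [1.0 if j == nearest else 0.0 for j in range(len(centroids))]


def fuzzy_memberships(point, centroids, m=2.0):
    """Fuzzy C-means style: a degree of membership in every cluster, summing to 1.

    Standard FCM membership with fuzzifier m > 1:
    u_j = 1 / sum over k of (d_j / d_k) ** (2 / (m - 1)).
    """
    distances = [math.dist(point, c) for c in centroids]
    if 0.0 in distances:  # the point sits exactly on a centroid
        return hard_assignment(point, centroids)
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in distances) for d_j in distances]


centroids = [(0.0, 0.0), (4.0, 0.0)]
print(hard_assignment((1.0, 0.0), centroids))
print(fuzzy_memberships((1.0, 0.0), centroids))
```

For the point (1, 0), the hard assignment is [1.0, 0.0], while the fuzzy memberships come out to roughly 0.9 and 0.1: mostly in the first cluster, but partly in the second.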
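In K-means, the goal is to find the best positions for the centroids (a centroid is the point marking the center of a cluster). The algorithm alternates between assigning each data point to its nearest centroid and moving each centroid to the mean of its assigned points, until the centroids stop moving.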
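The alternating assign-and-update loop (often called Lloyd’s algorithm) can be sketched in plain Python. The farthest-first initialization is a deterministic choice assumed here for simplicity; libraries such as scikit-learn use randomized schemes like k-means++ and multiple restarts. The customer coordinates are hypothetical, for example two numeric features per customer.

```python
import math


def farthest_first_init(points, k):
    """Start with the first point, then repeatedly add the point farthest
    from the centroids chosen so far (a simple deterministic initialization)."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def mean(points):
    """Coordinate-wise average of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iters=100):
    centroids = farthest_first_init(points, k)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, assignments


# Hypothetical customers described by two numeric features.
customers = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, assignments = k_means(customers, k=2)
print(centroids, assignments)
```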
Real-world application of K-means clustering
K-means can be used to segment your customers into K clusters for better targeting. For example, customers can be segmented by purchase history, interests, and which pages of your site they’ve clicked on.
Moving forward with machine learning models
Libraries like TensorFlow and scikit-learn have democratized machine learning by simplifying the implementation of machine learning algorithms. These off-the-shelf algorithms allow anyone interested in tackling real-world problems with machine learning models to do so within months rather than years. This has increased competition, which in turn has raised the bar for machine learning experts.
From enhancing your marketing strategy to simplifying regulatory compliance, there’s a machine learning model for practically every business problem, and an increasing number of businesses are hiring machine learning experts to take advantage of this.
Although some models are better than others at solving specific problems, it’s common not to know which supervised learning model is best for a given problem until you evaluate the candidates using a procedure such as a train-test split of the data set.
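Here is a minimal sketch of that procedure: shuffle the data, hold out 25% as a test set, fit each candidate model on the training portion only, and compare accuracy on the held-out portion. The two candidate models (a majority-class baseline and 1-nearest-neighbor), the 25% split, and the synthetic data are hypothetical choices for illustration.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle the example indices, then hold out a fraction of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def majority_baseline(train_rows, train_labels):
    """Candidate model 1: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda row: most_common


def one_nearest_neighbor(train_rows, train_labels):
    """Candidate model 2: copy the label of the closest training example."""
    def predict(row):
        nearest = min(range(len(train_rows)), key=lambda i: math.dist(train_rows[i], row))
        return train_labels[nearest]
    return predict


def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


# Hypothetical data: two well-separated groups of 10 points each.
rows = [(i, i) for i in range(10)] + [(i + 20, i + 20) for i in range(10)]
labels = [0] * 10 + [1] * 10
(train_rows, train_labels), (test_rows, test_labels) = train_test_split(rows, labels)

scores = {}
for name, make_model in [("majority", majority_baseline), ("1-NN", one_nearest_neighbor)]:
    model = make_model(train_rows, train_labels)            # fit on training data only
    scores[name] = accuracy(model, test_rows, test_labels)  # score on held-out data
print(scores)
```

Scoring on data the model never saw during training is the point of the split: it estimates how each candidate would perform on new data rather than rewarding memorization.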
It’s also common to use multiple models to solve a problem. For example, see how Cloud Brigade uses linear regression and K-means clustering to help public policymakers better respond to COVID-19.