Evolution of Machine Learning

Machine learning is a vast and rapidly growing field. As a subset of Artificial Intelligence (AI), machine learning is an approach in which systems learn from data, identify patterns, and make decisions. It helps people solve tasks that are too complex to code imperatively. Historically, the field suffered major setbacks because of insufficient computing capability. With the rise of cloud computing technologies and the availability of massive amounts of computational power, the potential of machine learning has been unleashed.

Types of Machine Learning Algorithms

The goal of machine learning is to make computers learn from the data that you give them. Instead of writing code that describes the action the computer should take, you provide an algorithm that adapts based on examples of intended behavior. The resulting program, consisting of the algorithm and its associated learned parameters, is called a trained model. In other words, machine learning algorithms build a model from sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning. Each type suits different goals.
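The idea of a trained model as "algorithm plus learned parameters" can be made concrete with a minimal sketch. The example below assumes the simplest possible setting, a straight line fitted by ordinary least squares; the function names fit_line and predict and the toy data are illustrative choices, not part of this primer.

```python
def fit_line(xs, ys):
    """Learn a slope and an intercept from examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the learned parameters


def predict(params, x):
    """Apply the learned parameters to a new input."""
    slope, intercept = params
    return slope * x + intercept


# Training data: examples of the intended behavior (here generated by y = 2x + 1).
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

model = fit_line(train_x, train_y)   # "training" produces the parameters
print(model)
print(predict(model, 10.0))          # prediction for an input not seen in training
```

Nothing in the code states the rule "multiply by two and add one"; the parameters are recovered from the examples, which is the sense in which the program is not explicitly programmed.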

In supervised learning, the model learns from labeled training data. In machine learning, data labeling is the process of identifying raw data (images, text files, videos, etc.) and adding one or more meaningful and informative labels to provide context so that a machine learning model can learn from it. Supervised learning can be divided into two subgroups: regression and classification. Regression is the problem of estimating or predicting a continuous quantity. One example is predicting income from social factors such as age and education. Classification assigns observations to discrete categories rather than continuous quantities. The simplest case is binary classification, in which there are only two possible categories. One example is "Dogs vs. Cats" image classification.
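As a rough illustration of learning from labeled data, the sketch below implements a one-nearest-neighbor classifier for a binary "dog vs. cat" task. It is only one of many possible classification methods, and the two features (weight and shoulder height) and all data values are hypothetical; a real image classifier would work on pixel data instead.

```python
import math


def nearest_neighbor_label(train_points, train_labels, query):
    """Return the label of the training point closest to `query`."""
    best_index = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    return train_labels[best_index]


# Hypothetical labeled training data: (weight in kg, shoulder height in cm).
train_points = [
    (4.0, 23.0), (3.5, 25.0), (5.0, 24.0),     # cats
    (20.0, 55.0), (30.0, 60.0), (25.0, 58.0),  # dogs
]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

# Classify two observations that were not in the training data.
small_animal = nearest_neighbor_label(train_points, train_labels, (4.5, 22.0))
large_animal = nearest_neighbor_label(train_points, train_labels, (28.0, 57.0))
print(small_animal, large_animal)
```

The labels are what make this supervised: every training point carries the answer the model is expected to reproduce.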

Unsupervised learning is a type of machine learning that looks for hidden patterns or intrinsic structures in a dataset without pre-existing labels. Unsupervised learning models are used in several ways. The most common is cluster analysis, the process of finding similarities among unlabeled data points and grouping similar points together. Clusters are defined by a measure of similarity: the model is trained on unlabeled data by grouping observations that are close under that measure. The trained model can then assign a new observation to one of the clusters. For example, we may want to segment a population into smaller groups with similar demographics, such as level of education or age.
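The clustering idea described above can be sketched with k-means, one common clustering algorithm, using Euclidean distance as the similarity measure. The data, the choice of two clusters, and the initial centroids are all assumptions made for illustration; in practice the features would usually be rescaled first so that age does not dominate the distance.

```python
import math


def assign(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and updating centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[assign(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


# Hypothetical unlabeled data: (age, years of education).
people = [(22, 12), (25, 13), (24, 12), (58, 18), (61, 17), (60, 19)]

# Train: start from two of the observations and refine.
centroids = k_means(people, centroids=[people[0], people[3]])

# Use the trained model to place a new observation in a cluster.
new_person = (27, 14)
cluster = assign(new_person, centroids)
print(centroids, cluster)
```

No labels are supplied anywhere; the two groups emerge only from how close the observations are to each other.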

Reinforcement learning is useful when the solution space is enormous or infinite, and it typically applies when the machine can be thought of as an agent interacting with its environment. The agent interacts with an uncertain, potentially complex environment to achieve a goal through trial and error. Consider an agent with an internal state that is placed in an environment. The agent receives rewards for desired actions and penalties for undesired ones, and it updates its internal state to maximize the total reward. If there is a start state and an end state with multiple paths in between, reinforcement learning searches for a path that maximizes the reward. Examples include self-driving vehicles and AlphaGo.
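The reward-driven loop described above can be sketched with tabular Q-learning, one standard reinforcement learning algorithm, on a deliberately tiny environment. Everything here is an assumption made for illustration: a corridor of five states with the start at one end and the end state at the other, a reward of 1.0 for reaching the end, a penalty of 0.01 for every other move, and the particular learning rate, discount, and exploration settings.

```python
import random

N_STATES = 5           # states 0..4; state 0 is the start, state 4 is the end
ACTIONS = [-1, +1]     # index 0 moves left, index 1 moves right
GOAL = N_STATES - 1


def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True    # reward for reaching the end state
    return next_state, -0.01, False     # small penalty for any other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # the agent's internal state
    for _ in range(episodes):
        state = 0
        for _ in range(100):                   # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))                           # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])   # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])                 # update
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for every non-terminal state.
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:GOAL]]
print(policy)
```

The agent is never told that moving right is correct. It discovers this because paths that reach the end state sooner accumulate more reward.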

Machine Learning Workflow

The diagram below gives a high-level overview of the stages in an ML workflow.

Figure 1: A high-level overview of the stages in an ML workflow.

In the pipeline above, most of the time is spent on data preparation before modeling, because most machine learning algorithms require data to be formatted in a very specific way before they can yield useful insights. The quality of the data is crucial to the quality of the machine learning model. For example, missing or invalid values can lead to less accurate or even misleading outcomes. Good data preparation produces clean, well-curated data, which leads to more practical and accurate model outcomes. Let's therefore take a closer look at data to lay the foundation for a high-quality ML model.
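As a small illustration of one data preparation step, the sketch below replaces missing or invalid entries in a numeric column with the mean of the valid entries. Mean imputation is only one of several possible strategies, and the function name clean_column, the valid range, and the sample "age" column are hypothetical.

```python
import math


def clean_column(values, lower, upper):
    """Replace missing or out-of-range entries with the mean of the valid ones.

    Assumes at least one entry in `values` is valid.
    """
    def is_valid(v):
        return v is not None and not math.isnan(v) and lower <= v <= upper

    valid = [v for v in values if is_valid(v)]
    mean = sum(valid) / len(valid)
    return [v if is_valid(v) else mean for v in values]


# Hypothetical "age" column with a missing value (None) and an invalid value (-5).
ages = [34, None, 29, -5, 41]
cleaned = clean_column(ages, lower=0, upper=120)
print(cleaned)
```

Left untreated, the None entry would break most numeric routines, and the value -5 would silently distort any statistic computed from the column.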

Data

Data has to be converted into an appropriate representation so that machines can learn the patterns within it. Understanding the different data types helps us identify the correct preprocessing techniques and convert the data appropriately. It also enables us to choose the most suitable visualizations and uncover hidden knowledge. In this primer, we introduce the following two classifications of data: