In the last part we mentioned that machine learning includes supervised learning and unsupervised learning. In supervised learning we label the data before training, while in unsupervised learning no labels are needed.
From the perspective of classical machine learning, supervised learning covers Regression and Classification, while the most common form of unsupervised learning is Clustering.
Regression may be familiar from a college statistics course. The label predicted by a regression model is a numeric value with a practical meaning, for example:
· The number of cups of hot coffee sold on a given day, based on the temperature, rainfall, and windspeed. The number of cups of coffee is the “label” we are going to predict.
· The selling price of a property based on its size in square feet, the number of bedrooms it contains, and socio-economic metrics for its location. Here the selling price is the label we are going to predict.
Common regression models include linear regression, polynomial regression, support vector regression, decision tree regression, and random forest regression.
Example of a cubic polynomial regression, which is a type of linear regression. Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y | x) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered to be a special case of multiple linear regression.
By Skbkekas - Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=6457163
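The point in the caption above can be made concrete in a few lines. The sketch below, with made-up data, fits a cubic polynomial by ordinary least squares: the model is nonlinear in x but linear in the weights, which is exactly why polynomial regression is a special case of multiple linear regression.

```python
import numpy as np

# Hypothetical data drawn around a cubic curve y = 1 - 2x + 0.5x^3 plus noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.5, size=x.size)

# Design matrix with columns [1, x, x^2, x^3]: nonlinear features of x,
# but the model y = X @ w is still linear in the unknown weights w
X = np.vander(x, N=4, increasing=True)

# Solve the ordinary linear least-squares problem for w
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # estimated [w0, w1, w2, w3], close to [1, -2, 0, 0.5]
```

Any linear-regression solver works here unchanged; only the feature columns are polynomial.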
Classification is what we usually see in modern AI applications: the label predicted by the model represents a category, or class.
· Whether an email is spam or not, based on features such as the frequency of certain words and the sender’s email address. True or false is the label we are going to predict in this model. (Binary Classification)
· The genre of a movie (comedy, horror, romance, adventure, or science fiction) based on its cast, director, and budget. The genre is the label we are going to predict. (Multiclass Classification)
Common classification models include logistic regression, decision trees, random forests, support vector machines, naïve Bayes, and k-nearest neighbors.
Example of a linear SVM (support vector machine). The red line indicates a maximum-margin hyperplane and the yellow area demonstrates margins for an SVM trained with samples from two classes. Samples on the margin are called the support vectors.
By Larhmam - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=73710028
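To illustrate binary classification before the detailed code in the next part, here is a minimal sketch using scikit-learn on synthetic data. The two generated features are stand-ins for real spam signals such as word frequencies; the labels 0 and 1 play the role of "not spam" and "spam".

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a spam dataset: 2 numeric features, binary label
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Fit a logistic regression classifier and check held-out accuracy
clf = LogisticRegression().fit(X_train, y_train)
print(clf.score(X_test, y_test))  # fraction of correct predictions
```

Swapping `LogisticRegression` for `SVC`, `DecisionTreeClassifier`, or `KNeighborsClassifier` changes only one line, which is part of what makes scikit-learn convenient for comparing the models listed above.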
Clustering is a form of unsupervised machine learning in which observations are grouped into clusters based on similarities in their data values, or features. In a clustering model, the label is the cluster to which the observation is assigned, based only on its features.
· Customer Segmentation based on their spending behavior, considering annual income and annual spending on a scale from 1 to 100.
A common clustering method is k-means. There are also other forms of unsupervised learning, such as Principal Component Analysis (PCA).
Example of k-means clustering.
By I, Weston.pace, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=2463076
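A short sketch of the customer-segmentation example above, with fabricated data: each row is a customer described by annual income and a spending score, and k-means assigns each one to a cluster using only those features, with no labels provided.

```python
import numpy as np
from sklearn.cluster import KMeans

# Fabricated customers: [annual income (k$), spending score 1-100]
rng = np.random.default_rng(1)
low  = rng.normal([30, 20], 5, size=(50, 2))   # low income, low spending
high = rng.normal([80, 80], 5, size=(50, 2))   # high income, high spending
X = np.vstack([low, high])

# No labels are given; k-means groups the rows into k=2 clusters
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:5])        # cluster assignment of the first 5 customers
print(km.cluster_centers_)   # the two cluster centers it discovered
```

Here the "label" is the cluster index the model itself invents, which is the key difference from the supervised examples above.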
Deep learning uses a different algorithmic architecture from classical statistical methods. It is an advanced form of machine learning that tries to emulate how the human brain learns: the key is an artificial neural network that simulates the electrochemical activity of biological neurons using mathematical functions. Keep in mind that it can be applied to both supervised and unsupervised learning, although its theoretical basis may not be as robust as that of the classical methods above.
Example of a simple neural network.
By MultiLayerNeuralNetwork_english.png: Chrislbderivative work: — HELLKNOWZ ▎TALK ▎enWP TALK - MultiLayerNeuralNetwork_english.png, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=11397827
Suppose the inputs are x1, x2, x3; then the input layer is x = [x1, x2, x3].
The hidden layer and the output layer are each computed from the previous layer and a set of weights w as the input data is fed forward through the network. The output layer produces a predicted y. A loss function then compares the predicted y with the real y to produce a loss value, the weights are adjusted according to that loss, and the process repeats until the loss is minimized and the model’s results are acceptable.
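The loop just described can be sketched in plain numpy. Everything here is an illustrative assumption: the data, the single 4-unit hidden layer, the tanh activation, and the learning rate. The structure, though, is exactly the one above: forward pass, loss, weight adjustment, repeat.

```python
import numpy as np

# Fabricated data: 100 samples of inputs [x1, x2, x3] and a target y to learn
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.tanh(X @ np.array([0.5, -0.5, 0.3]))[:, None]   # the "real y"

W1 = rng.normal(scale=0.5, size=(3, 4))  # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 1))  # hidden -> output weights
lr = 0.1

for step in range(1000):
    # Forward pass: feed the input data through the network
    h = np.tanh(X @ W1)              # hidden layer activations
    y_hat = h @ W2                   # predicted y

    # Loss function compares the predicted y with the real y
    loss = np.mean((y_hat - y) ** 2)

    # Backpropagation: gradient of the loss for each weight matrix
    grad_out = 2 * (y_hat - y) / len(X)
    gW2 = h.T @ grad_out
    gW1 = X.T @ ((grad_out @ W2.T) * (1 - h ** 2))

    # Adjust the weights to reduce the loss, then repeat
    W2 -= lr * gW2
    W1 -= lr * gW1

print(loss)  # much smaller than at the start of training
```

Frameworks such as PyTorch or TensorFlow automate the gradient computation, but the underlying loop is the same.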
We will focus on classification, so in the next part we will explore Python code for various applications.
Please feel free to contact me if you have any comments.
Stay tuned for the next part, coming next month!