You made a machine learning model…now what?

Rachel Beery
4 min read · Nov 25, 2020


Machine learning: you’ve heard about it on social media and in the news as a tool that powers robots and, of course, teaches machines. Its possibilities are broad, and many of them are still unexplored. Machine learning is a field of study concerned with algorithms that learn from examples. Anyone can create a machine learning model in just a few simple steps; the more important part, however, is understanding the results we get from the models we create.

Below you can see a diagram of an example machine learning process: splitting the data, completing feature engineering, building a model, evaluating that model, and then deciding whether it can be improved. You’ve succeeded in your quest to create a machine learning model, so the most important part of the process is now understanding what you just did and building more models to find the one with the best result for your dataset.

It is essential to make sure that the function you’ve written for your machine learning model returns all the information you will need to interpret it. The key outputs are train accuracy, test accuracy, cross-validation recall, and a confusion matrix. The example confusion matrix below shows how the accuracies are calculated. In the example, the model is trying to predict whether an email is spam. The first accuracy in the right column is the train accuracy, the accuracy below it is the test accuracy, and the overall accuracy is in the bottom right.
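To make the confusion matrix concrete, here is a minimal sketch in plain Python that counts its four cells for a spam classifier. The function name `confusion_counts` and the labels are hypothetical, invented purely for illustration; they are not the data behind the diagram.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="spam"):
    """Count the four cells of a binary confusion matrix."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            counts["tp"] += 1  # spam correctly flagged as spam
        elif actual != positive and predicted == positive:
            counts["fp"] += 1  # normal email wrongly flagged as spam
        elif actual == positive and predicted != positive:
            counts["fn"] += 1  # spam that slipped through
        else:
            counts["tn"] += 1  # normal email correctly left alone
    return counts


# Hypothetical labels, made up for this example only.
actual = ["spam", "spam", "ham", "ham", "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham", "ham", "spam"]

cells = confusion_counts(actual, predicted)
print("predicted spam | predicted ham")
print(f"actual spam: {cells['tp']:>3} | {cells['fn']:>3}")
print(f"actual ham:  {cells['fp']:>3} | {cells['tn']:>3}")
```

Every metric discussed in this post can be derived from these four counts.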

For this example, I will use the output from a project in which we analyzed a dataset from the Seattle police department. In this dataset, we are trying to understand when, where, why, and by whom search and seizure is used to stop someone who looks suspicious.

Below is an example of the model output:

Train Accuracy: 0.589

Test Accuracy: 0.597

Cross-Validation Recall: 0.744

Before you get overwhelmed by all of the output, keep in mind that this is a binary classification, which means that our model uses an algorithm to decide whether each example we feed it is one thing or the other. For this project, we are analyzing police data to find the features that best predict whether an individual was “arrested” or “not arrested”.

First, we want to understand the train and test accuracies. To see how these values are calculated, refer to the diagram below. Machine learning model accuracy is the measurement used to determine which model is best at identifying relationships and patterns between variables in a dataset based on the input, or training, data. It is calculated as the number of correct classifications divided by the total number of classifications. The train accuracy is the accuracy of a model on the examples it was built on. The test accuracy is the accuracy of a model on examples it hasn’t yet seen.
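The accuracy calculation can be sketched in a few lines of Python. The helper names `accuracy` and `accuracy_gap` are hypothetical, and the short label lists are made up; only the 0.589 and 0.597 figures come from the output shown earlier.

```python
def accuracy(y_true, y_pred):
    """Correct classifications divided by total classifications."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)


def accuracy_gap(train_acc, test_acc):
    """Train minus test accuracy; a large positive gap hints at overfitting."""
    return train_acc - test_acc


# Hypothetical labels: 1 = "arrested", 0 = "not arrested".
example = accuracy([1, 0, 0, 1, 0], [1, 0, 1, 1, 0])

# Train and test accuracies taken from the model output above.
gap = accuracy_gap(0.589, 0.597)
print(f"example accuracy: {example:.2f}, train/test gap: {gap:+.3f}")
```

When the train accuracy is far above the test accuracy, the model has probably memorized its training examples. Here the two values are nearly equal, which suggests the model is not overfitting, although both are low in absolute terms.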

Our cross-validation recall measures how well the model’s recall generalizes to new data. Recall is the share of actual positive cases that the model correctly identifies. Cross-validation gives an indication of how well the learner will do when it is asked to make predictions for data it has not seen yet. The problem it addresses is that a model scored on the same data it was trained on looks better than it really is. The way to overcome this problem is to use only a subset of the data set when training a learner. When training is done, the data that was held out can be used to test the performance of the learned model on new data. This is the basic idea behind a whole class of model evaluation methods called cross-validation.
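The hold-out idea can be sketched as k-fold cross-validation in plain Python. Everything here is an assumption made for illustration: the one-feature data is invented, the "model" is a simple threshold halfway between the class means rather than any of the models from the project, and the folds are interleaved instead of shuffled. In practice a library routine such as scikit-learn’s `cross_val_score` with `scoring="recall"` would normally do this work.

```python
def recall(y_true, y_pred):
    """True positives divided by all actual positives."""
    tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def fit_threshold(xs, ys):
    """Toy learner: threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_val_recall(xs, ys, k=3):
    """Train on k-1 folds, score recall on the held-out fold, repeat k times."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))  # every k-th example is held out
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        threshold = fit_threshold(train_x, train_y)
        held_out = sorted(test_idx)
        preds = [1 if xs[i] > threshold else 0 for i in held_out]
        truth = [ys[i] for i in held_out]
        scores.append(recall(truth, preds))
    return scores


# Hypothetical data: one numeric feature and a binary label.
xs = [1.0, 8.0, 2.0, 9.0, 3.0, 7.0, 4.0, 6.0, 4.5]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 1]

scores = cross_val_recall(xs, ys, k=3)
mean_recall = sum(scores) / len(scores)
print("recall per fold:", scores)
print("cross-validation recall:", round(mean_recall, 3))
```

Each example is used for testing exactly once, so the averaged recall reflects performance on data the model did not see during training.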

Confusion Matrix calculating Accuracies

Once we understand the results of our model and its performance, we can decide to try other models. Other binary classification models used in this project include Logistic Regression, Bayes Classification, Multinomial Bayes Classification, Decision Trees, Random Forest, and XGBoost. It is essential to try many different models before deciding which one yields the best results.

Machine learning models can be vitally important for research in all sectors, including for-profit businesses, non-profit organizations, governmental organizations, and academia. By creating a better and more accurate model, we can make better decisions that affect the choices of these businesses and organizations. In the case of police department data, analyzing why arrests happen, who makes them, and where they occur can produce results that influence how we train police forces in the future. With knowledge of why and how arrests occur, we can optimize the resources that police departments have and make recommendations based on which features most influence our machine learning model. Improving our model’s accuracy helps save considerable amounts of time, money, and resources.

Sources:

https://www.datarobot.com/wiki/accuracy/#:~:text=Machine%20learning%20model%20accuracy%20is,input%2C%20or%20training%2C%20data.

Written by Rachel Beery

Flatiron Data Science Program Student
