What problem am I solving?
Predicting customer satisfaction using machine learning.
Some basic stats about the data set:
import pandas as pd
df = pd.read_csv('train.csv')
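A few quick stats, such as size, missing values and class balance, can be pulled from `df` with a small helper. This is a minimal sketch: the helper name `basic_stats` and the label column name `TARGET` are assumptions, and the demo runs on a tiny made-up frame instead of `train.csv`.

```python
import pandas as pd


def basic_stats(df: pd.DataFrame, target_col: str) -> dict:
    """Return a few quick summary numbers for a labelled data set."""
    counts = df[target_col].value_counts()
    return {
        "n_rows": df.shape[0],
        "n_columns": df.shape[1],
        # Total number of missing cells across the whole frame
        "n_missing": int(df.isna().sum().sum()),
        # Fraction of rows in each class of the (assumed) target column
        "class_share": (counts / counts.sum()).to_dict(),
    }


# Tiny made-up stand-in for train.csv; the column names are hypothetical
toy = pd.DataFrame({
    "var1": [1, 2, 3, 4],
    "var2": [0.5, None, 0.2, 0.9],
    "TARGET": [0, 0, 0, 1],
})
stats = basic_stats(toy, "TARGET")
print(stats)
```

On the real data the call would be `basic_stats(df, "TARGET")`, assuming the label column really has that name.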
The journey thus far:
Splitting the data set into a new training set and test set:
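from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
# Use 46,020 rows for training and hold out 30,000 rows for testing
train, test = train_test_split(df, train_size=46020, test_size=30000)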
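The split above is random and not stratified, so the share of 1s can drift slightly between the two parts. One hedged way to check this, and to hold the ratio fixed, is the `stratify` argument of scikit-learn's `train_test_split`. The toy data, its 4% positive rate and the `TARGET` column name are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up imbalanced data: 40 positives out of 1,000 rows (4%)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "x": rng.normal(size=1000),
    "TARGET": [1] * 40 + [0] * 960,
})

# Integer sizes behave as in the split above;
# stratify keeps the class ratio the same in both parts.
train, test = train_test_split(
    toy, train_size=600, test_size=400, random_state=0, stratify=toy["TARGET"]
)

train_share = train["TARGET"].mean()  # fraction of 1s in the training part
test_share = test["TARGET"].mean()    # fraction of 1s in the test part
print(len(train), len(test), train_share, test_share)
```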
After fitting a logistic regression model on the new training set and using it to predict the labels of the test set, the resulting confusion matrix looked like this:
[[28823     4]
 [ 1173     0]]
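The logistic regression step described above (fit, predict on the held-out part, tabulate the results) can be sketched as follows. This is a minimal sketch on synthetic data from `make_classification` with roughly 4% positives, so its numbers will not match the matrix above; settings such as `max_iter=1000` and `random_state=0` are assumptions, not the author's choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced stand-in for the real data (about 4% positives)
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.96], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

# Fit on the training part, then predict the held-out part
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are the actual classes (0, 1); columns are the predicted classes (0, 1)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
```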
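Reading the confusion matrix (scikit-learn puts the actual classes in rows and the predicted classes in columns, ordered 0 then 1): 28,823 customers labelled 0 were correctly predicted as 0, 4 were wrongly predicted as 1, all 1,173 customers labelled 1 were predicted as 0, and not a single 1 was predicted correctly. The accuracy of 96.08% (28,823 of 30,000) looks good, but only about 3.9% of the test rows are 1s, so simply predicting 0 for everyone would score 96.09%, slightly higher.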
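A few summary metrics make the confusion matrix easier to read. The sketch below recomputes them from its four cells, assuming scikit-learn's layout (actual classes in rows, predicted classes in columns, both ordered 0, 1).

```python
import numpy as np

# Confusion matrix from the test set, in scikit-learn's layout:
# rows = actual class (0, 1), columns = predicted class (0, 1)
cm = np.array([[28823, 4],
               [1173, 0]])

tn, fp, fn, tp = cm.ravel()
total = cm.sum()

accuracy = (tn + tp) / total        # share of all predictions that were right
baseline = (tn + fp) / total        # accuracy of always predicting 0
recall_1 = tp / (tp + fn)           # share of actual 1s the model caught
positive_share = (tp + fn) / total  # how rare class 1 is in the test set

print(f"accuracy       = {accuracy:.4f}")
print(f"always-0 score = {baseline:.4f}")
print(f"recall for 1s  = {recall_1:.4f}")
print(f"share of 1s    = {positive_share:.4f}")
```

Here accuracy comes out just below the always-0 baseline while recall for the 1s is zero, which is why accuracy alone is a misleading score on data this imbalanced.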
- Further tweaking of the training and test set sizes did not yield better predictions for the 1s (ones): the model kept classifying the 0s almost perfectly but got the 1s wrong.
Follow me in Part 2 of this journey.