Predict Customer Satisfaction with ML

As part of my data science course at General Assembly, I have the opportunity to do a machine learning project, which is super awesome.

What problem am I solving?

Predicting customer satisfaction using machine learning.

Some basic stats about the data set:
import pandas as pd
df = pd.read_csv('train.csv')
print(df.shape)  # (number of rows, number of columns)
(76020, 361)

The journey thus far:

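Splitting the data set into a new training and test set:
# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, train_size=46020, test_size=30000)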

After running logistic regression on the new training set above and using the trained model to predict values for the test set, the resulting confusion matrix looked like this:
[[28823     4]
 [ 1173     0]]
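For anyone who wants to try this step, here is a minimal sketch of fitting logistic regression and building a confusion matrix. It is not the code behind the numbers in this post: it runs on synthetic, imbalanced data from make_classification, and the names X, y and model, as well as the stratify and max_iter settings, are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: roughly 4% of rows belong to class 1.
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.96, 0.04], random_state=0
)

# stratify=y keeps the share of 1s the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# Fit on the training set, then predict labels for the held-out test set.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, predictions, labels=[0, 1])
print(cm)
```

Passing labels=[0, 1] fixes the row and column order, so the bottom-right cell is always the count of correctly predicted 1s.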


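Reading the matrix with rows as actual classes and columns as predicted classes (the scikit-learn convention): of the 28,827 actual 0s in the test set, 28,823 were predicted correctly and 4 were predicted as 1; of the 1,173 actual 1s, none were predicted correctly.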

  • Further tweaking of the train and test set sizes did not yield any better predictions for the 1s (ones): the model predicted nearly all of the 0s correctly but got every one of the 1s wrong.
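To put numbers on this, the sketch below computes a few standard metrics directly from the confusion matrix in this post. It assumes the scikit-learn layout (rows are actual classes, columns are predicted classes); the variable names are only for illustration.

```python
# Confusion matrix from the post; rows = actual class, columns = predicted class
# (scikit-learn layout, assumed here).
cm = [[28823, 4],
      [1173, 0]]

tn, fp = cm[0]  # actual 0s: predicted 0, predicted 1
fn, tp = cm[1]  # actual 1s: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
# Guard against division by zero when a class is never present or never predicted.
recall = tp / (tp + fn) if (tp + fn) else 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0
# Accuracy of a "model" that always predicts 0.
baseline = (tn + fp) / total

print(f"accuracy:  {accuracy:.4f}")
print(f"recall:    {recall:.4f}")
print(f"precision: {precision:.4f}")
print(f"baseline:  {baseline:.4f}")
```

The accuracy comes out at about 96%, but recall for the 1s is zero, and always predicting 0 would score slightly higher still. On data this imbalanced, accuracy alone says very little about how well the 1s are being found.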

Follow me on Part 2 of this journey



About austiine's internet home

I am a software engineer working with ThoughtWorks Sydney. I have worked with Java, Python, Ruby, Javascript and mobile technologies (iOS and Android). I am passionate about programming languages and machine learning.
