Challenge 21 Megathread

For any and all questions relating to challenge 21. :point_down:

For a tutorial on how to use Jupyter Notebook, we put together this video:

Still have questions? Read all the FAQs here.

If you would like to download the data set used in this challenge, click here.

To continue to play around with the datasets in a Jupyter environment, click here.

Just saying… you only actually need to do Challenge 2 to get the answer.

2 Likes

That was a fun one :smiley:

3 Likes

This time doing 1/3 was actually enough to get the correct answer haha.

Summary

#Q1
df[‘Critic_Score’].fillna(df[‘Critic_Score’].median(), inplace=True)
df[‘Critic_Score’].corr(df[‘User_Score’])
#0.4785731396832123

#Q2
retroGames = df[df[‘Year_of_Release’]<2000]
retroGames.sort_values(“Global_Sales”, ascending=False).head()
top5Retro = retroGames.head()
plt.figure()
plt.title(“Top 5 Retro Games by Global Sales (pre-2000 games)”)
plt.bar(top5Retro[‘Name’],top5Retro[‘Global_Sales’])
plt.xticks(rotation = ‘vertical’)
plt.show()

#Q3
df[‘Critic_Count’].fillna(df[‘Critic_Count’].mean(), inplace=True)
df[‘User_Count’].fillna(df[‘User_Count’].median(), inplace=True)
df[‘User_Score’] = df[‘User_Score’] * 10 #critic is 0 to 100, user 0 to 10
dfStats = df.describe()
df[‘Aggregate_Score’] = (df[‘Critic_Count’] * df[‘Critic_Score’] + df[‘User_Count’] * df[‘User_Score’]) / (df[‘Critic_Count’] + df[‘User_Count’])
df[“Aggregate_Score”].describe()

filterDF = df[(df[‘Year_of_Release’]<2000) & (df[“Publisher”] != ‘Nintendo’)]
top5 = filterDF.sort_values(“Aggregate_Score”,ascending=False).head()

plt.figure()
plt.title(“Top 5 Retro Games non-Nintendo by Agg Score (pre-2000 games)”)
plt.barh(y = top5[‘Name’],width = top5[‘Aggregate_Score’])
plt.show()

2 Likes

Can anyone suggest an efficient way of cleaning up all the columns? I did .info() to see what columns included nulls and then line by line .fillna(), (eg. df['column'] = df['column'].fillna(method = 'ffill') x 7 or 8). I feel like a smarter person would be able to accomplish this without copy/pasting so much…

Warning: Don’t click on the answer until you’ve saved anything you want to save from your code.

Like mini-putt, that last hole eats your ball.

2 Likes

Could technically do a quick loop of the columns that you need to fill. Though I’m sure someone has a better solution than this:

colsToFill = [‘col1’, ‘col2’, ‘col3’, …]
for col in colsToFill:
df[col] = df[col].fillna(method=‘ffill’)

numericalCols = [‘col1’,‘col2’]
for col in numericalCols:
df[col] = df[col].fillna(df[col].median())

1 Like

their solution to Question 3 is pretty sketchy …

why would the Critic_Count & User_Count be filled in with the mean & median of the Critic_Score …

#Fill in with the mean 
df['Critic_Count'] = df['Critic_Count'].fillna(value = df.Critic_Score.mean())

#Fill in with the median 
df['User_Count'] = df['User_Count'].fillna(value = df.Critic_Score.median())

I got questions 1 & 2 right, but couldn’t get 3, and now I know why.

Nice catch, but I assume that’s a typo - I used the actual User_Score.median() and got the answer correct. The trick is:
User Score is out of 10 and Critic Score is out of 100 so for your Aggregate_Score, make sure you scale one or the other. Metal Gear Solid then comes out on top.

4 Likes

Bluebird8203 You just got twisted around somehow. They wanted to fill Critic_Count with the mean of Critic_Count and full User_Count with the median of User_Count, not fill the counts with the mean or median of the Critic_Score.

I misread this a bit and filled both counts with the respective means and got the same answer. I did get messed up interpreting what they meant by not published by Nintendo before 2000, but eventually got it…

That’s not my code, that’s the code from LL in the solution.

I did fill the columns with the correct data, but I probably did something wrong with the AggScore.

My apologies, I haven’t read any of the solutions. Contributors usually have multiple ways to solve the problems anyway. That is weird that LHL would have written that as a solution, so then they got really twisted around or mixed up cutting & pasting something.

If you didn’t normalize the scores you would get the wrong answer - like you I got the first 2, but it took a few tries to match the third answer in the only logical answer set. I didn’t notice until after a few versions that the critic score was out of 100 and the users score was out of 10.

1 Like

You can pass a dictionary with the column name and the method inside the fillna method, e.g.:

df_filled = df.fillna({‘Critic_Count’:df[‘Critic_Count’].mean(), ‘User_Count’:df[‘User_Count’].median(),
‘Critic_Score’:df[‘User_Score’].median(), ‘User_Score’:df[‘User_Score’].median()})

3 Likes

Here’s my solution, Q3 took some time
df.info()
import seaborn as sns
df[‘Critic_Score’].fillna((df[‘Critic_Score’].median()), inplace=True)
df.info()
df1 = (df[df[‘Year_of_Release’]<2000]).sort_values(‘Global_Sales’, ascending = False)
df[‘Critic_Count’].fillna((df[‘Critic_Count’].mean()), inplace=True)
df[‘User_Count’].fillna((df[‘User_Count’].median()), inplace=True)
df[‘AggregateScore’] = ((df[‘Critic_Count’]*df[‘Critic_Score’]) + (df[‘User_Count’]*df[‘User_Score’]*10))/((df[‘Critic_Count’]+df[‘User_Count’]))
df2 = df.loc[(df[‘Publisher’] != ‘Nintendo’) & (df[‘Year_of_Release’] < 2000) ]
df3 = (df2.sort_values([‘AggregateScore’], ascending = False)).head()
ax = sns.barplot(x=“Name”, y=“AggregateScore”, data=df3)
ax.set_xticklabels(ax.get_xticklabels(),rotation=30)
print("The answer to first question is ", df[‘Critic_Score’].corr(df[‘User_Score’]))
print("The answer to second question is ",’\n’, df1[‘Name’].iloc[0:5])
print("The answer to final question is ",’\n’, df3[‘Name’].iloc[0])

1 Like

Video Solution: https://youtu.be/AvCtSGWbbjI

4 Likes

Any DataFrame which puts Metal Gear Solid #1 is alright with me!

Will there be some link or verification of participating to share on LinkedIn say?
@tim

And does anyone recommend a continuing rolling challenge site or service?

5 Likes

It turns out that I had accidentally multiplied the User_Count & Credit_Count, rather than adding when calculating the AggScore.
Once I fixed that, I did get Metal Gear Solid forQ3.