For any and all questions relating to challenge 21.

For a tutorial on how to use Jupyter Notebook, we put together this video:

Still have questions? Read all the FAQs here.

To continue to play around with the datasets in a Jupyter environment, click here.

Just sayingā¦ you only actually need to do Challenge 2 to get the answer.

2 Likes

That was a fun one

3 Likes

This time doing 1/3 was actually enough to get the correct answer haha.

Summary

#Q1
df[āCritic_Scoreā].fillna(df[āCritic_Scoreā].median(), inplace=True)
df[āCritic_Scoreā].corr(df[āUser_Scoreā])
#0.4785731396832123

#Q2
retroGames = df[df[āYear_of_Releaseā]<2000]
plt.figure()
plt.title(āTop 5 Retro Games by Global Sales (pre-2000 games)ā)
plt.bar(top5Retro[āNameā],top5Retro[āGlobal_Salesā])
plt.xticks(rotation = āverticalā)
plt.show()

#Q3
df[āCritic_Countā].fillna(df[āCritic_Countā].mean(), inplace=True)
df[āUser_Countā].fillna(df[āUser_Countā].median(), inplace=True)
df[āUser_Scoreā] = df[āUser_Scoreā] * 10 #critic is 0 to 100, user 0 to 10
dfStats = df.describe()
df[āAggregate_Scoreā] = (df[āCritic_Countā] * df[āCritic_Scoreā] + df[āUser_Countā] * df[āUser_Scoreā]) / (df[āCritic_Countā] + df[āUser_Countā])
df[āAggregate_Scoreā].describe()

filterDF = df[(df[āYear_of_Releaseā]<2000) & (df[āPublisherā] != āNintendoā)]

plt.figure()
plt.title(āTop 5 Retro Games non-Nintendo by Agg Score (pre-2000 games)ā)
plt.barh(y = top5[āNameā],width = top5[āAggregate_Scoreā])
plt.show()

2 Likes

Can anyone suggest an efficient way of cleaning up all the columns? I did .info() to see what columns included nulls and then line by line .fillna(), (eg. `df['column'] = df['column'].fillna(method = 'ffill')` x 7 or 8). I feel like a smarter person would be able to accomplish this without copy/pasting so muchā¦

Warning: Donāt click on the answer until youāve saved anything you want to save from your code.

Like mini-putt, that last hole eats your ball.

2 Likes

Could technically do a quick loop of the columns that you need to fill. Though Iām sure someone has a better solution than this:

colsToFill = [ācol1ā, ācol2ā, ācol3ā, ā¦]
for col in colsToFill:
df[col] = df[col].fillna(method=āffillā)

numericalCols = [ācol1ā,ācol2ā]
for col in numericalCols:
df[col] = df[col].fillna(df[col].median())

1 Like

their solution to Question 3 is pretty sketchy ā¦

why would the Critic_Count & User_Count be filled in with the mean & median of the Critic_Score ā¦

``````#Fill in with the mean
df['Critic_Count'] = df['Critic_Count'].fillna(value = df.Critic_Score.mean())

#Fill in with the median
df['User_Count'] = df['User_Count'].fillna(value = df.Critic_Score.median())
``````

I got questions 1 & 2 right, but couldnāt get 3, and now I know why.

Nice catch, but I assume thatās a typo - I used the actual User_Score.median() and got the answer correct. The trick is:
User Score is out of 10 and Critic Score is out of 100 so for your Aggregate_Score, make sure you scale one or the other. Metal Gear Solid then comes out on top.

4 Likes

Bluebird8203 You just got twisted around somehow. They wanted to fill Critic_Count with the mean of Critic_Count and full User_Count with the median of User_Count, not fill the counts with the mean or median of the Critic_Score.

I misread this a bit and filled both counts with the respective means and got the same answer. I did get messed up interpreting what they meant by not published by Nintendo before 2000, but eventually got itā¦

Thatās not my code, thatās the code from LL in the solution.

I did fill the columns with the correct data, but I probably did something wrong with the AggScore.

My apologies, I havenāt read any of the solutions. Contributors usually have multiple ways to solve the problems anyway. That is weird that LHL would have written that as a solution, so then they got really twisted around or mixed up cutting & pasting something.

If you didnāt normalize the scores you would get the wrong answer - like you I got the first 2, but it took a few tries to match the third answer in the only logical answer set. I didnāt notice until after a few versions that the critic score was out of 100 and the users score was out of 10.

1 Like

You can pass a dictionary with the column name and the method inside the fillna method, e.g.:

df_filled = df.fillna({āCritic_Countā:df[āCritic_Countā].mean(), āUser_Countā:df[āUser_Countā].median(),
āCritic_Scoreā:df[āUser_Scoreā].median(), āUser_Scoreā:df[āUser_Scoreā].median()})

3 Likes

Hereās my solution, Q3 took some time
df.info()
import seaborn as sns
df[āCritic_Scoreā].fillna((df[āCritic_Scoreā].median()), inplace=True)
df.info()
df1 = (df[df[āYear_of_Releaseā]<2000]).sort_values(āGlobal_Salesā, ascending = False)
df[āCritic_Countā].fillna((df[āCritic_Countā].mean()), inplace=True)
df[āUser_Countā].fillna((df[āUser_Countā].median()), inplace=True)
df[āAggregateScoreā] = ((df[āCritic_Countā]*df[āCritic_Scoreā]) + (df[āUser_Countā]*df[āUser_Scoreā]*10))/((df[āCritic_Countā]+df[āUser_Countā]))
df2 = df.loc[(df[āPublisherā] != āNintendoā) & (df[āYear_of_Releaseā] < 2000) ]
df3 = (df2.sort_values([āAggregateScoreā], ascending = False)).head()
ax = sns.barplot(x=āNameā, y=āAggregateScoreā, data=df3)
ax.set_xticklabels(ax.get_xticklabels(),rotation=30)
print("The answer to first question is ", df[āCritic_Scoreā].corr(df[āUser_Scoreā]))
print("The answer to second question is ",ā\nā, df1[āNameā].iloc[0:5])
print("The answer to final question is ",ā\nā, df3[āNameā].iloc[0])

1 Like

Video Solution: https://youtu.be/AvCtSGWbbjI

4 Likes

Any DataFrame which puts Metal Gear Solid #1 is alright with me!

Will there be some link or verification of participating to share on LinkedIn say?
@tim

And does anyone recommend a continuing rolling challenge site or service?

5 Likes

It turns out that I had accidentally multiplied the User_Count & Credit_Count, rather than adding when calculating the AggScore.
Once I fixed that, I did get Metal Gear Solid forQ3.