Challenge 13 Megathread

For any and all questions relating to challenge 13. 👇

For a tutorial on how to use Jupyter Notebook, we put together this video:

Still have questions? Read all the FAQs here.

If you would like to download the data set used in this challenge, click here.

To continue to play around with the datasets in a Jupyter environment, click here.

Anyone have a cleaner sol?
I didn’t really see how we had to use sort for this problem. Sort might have made grabbing the rows where quality was equal to 7 or 8 easier but this seemed more like a filtering problem.

quality_filter = wine_df['quality'] >= 8
qf = wine_df[quality_filter]
rsf = qf['residual sugar'] > 5
qrsf = qf[rsf]

quality_filter2 = wine_df['quality'] == 8
qf2 = wine_df[quality_filter2]
quality_filter3 = wine_df['quality'] == 7
qf3 = wine_df[quality_filter3]
qf4 = pd.concat([qf2, qf3])
citric_filter = qf4['citric acid'] < 0.4
qcf4 = qf4[citric_filter]

Here’s mine, I combined multiple filters using and logic:

wine_df[(wine_df['quality']>=8) & (wine_df['residual sugar']>=5)]
wine_df[(wine_df['quality'].isin([7,8])) & (wine_df['citric acid']<0.4)].count()
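For anyone following along, here is a minimal sketch of the same and-logic on a small made-up frame (the six rows below are hypothetical, not the challenge data). Each comparison sits in its own parentheses because `&` binds more tightly than `>=` or `<` in Python. The toy data includes a wine with residual sugar of exactly 5 to show that `> 5` and `>= 5` can select different rows.

```python
import pandas as pd

# Hypothetical toy rows standing in for the challenge's wine_df
wine_df = pd.DataFrame({
    "quality": [5, 7, 8, 8, 6, 7],
    "residual sugar": [1.9, 2.6, 6.4, 5.0, 5.5, 2.0],
    "citric acid": [0.00, 0.45, 0.30, 0.52, 0.10, 0.39],
})

# Build each boolean mask separately, then combine element-wise with &
high_quality = wine_df["quality"] >= 8
above_5 = wine_df["residual sugar"] > 5
at_least_5 = wine_df["residual sugar"] >= 5

strict = wine_df[high_quality & above_5]        # sugar strictly above 5
inclusive = wine_df[high_quality & at_least_5]  # also keeps sugar == 5

# Q2-style filter: quality 7 or 8, citric acid below 0.4
q2 = wine_df[wine_df["quality"].isin([7, 8]) & (wine_df["citric acid"] < 0.4)]

print(strict.index.tolist(), inclusive.index.tolist(), len(q2))
```

Which of the two comparisons is right depends on how the question is worded, so it is worth checking against the prompt before submitting.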


wine_df.sort_values(by=['quality','residual sugar'], ascending = False)

wine_df[(wine_df['quality']>6) & (wine_df['citric acid']<0.4)]['quality'].count()


I thought the biggest block means the hardest puzzle but it turned out to be an easy one


Video Solution:

For Q1 I just used two filters
wine_df[ wine_df['quality']>=8 ][ wine_df['residual sugar']>5 ]

For Q2 I used a filter for the citric acid content and a condition in the groupby quality >= 7 with count()
This groups the quality as False and True, with the True counts as the number of wines needed.
wine_df[ wine_df['citric acid']<0.4 ].groupby( wine_df['quality']>=7 ).count()
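A minimal sketch of the grouping-by-a-condition idea on made-up data (the rows are hypothetical). Two choices here are mine rather than the post's: the boolean key is computed from the already filtered frame, so nothing relies on index alignment, and `.size()` is used so the result is one number per group rather than one count per column.

```python
import pandas as pd

# Hypothetical toy rows standing in for the challenge's wine_df
wine_df = pd.DataFrame({
    "quality": [5, 7, 8, 8, 6, 7],
    "residual sugar": [1.9, 2.6, 6.4, 5.0, 5.5, 2.0],
    "citric acid": [0.00, 0.45, 0.30, 0.52, 0.10, 0.39],
})

# Step 1: keep only the low citric acid wines
low_citric = wine_df[wine_df["citric acid"] < 0.4]

# Step 2: group by a True/False key (is quality at least 7?)
# and count the rows that fall into each group
counts = low_citric.groupby(low_citric["quality"] >= 7).size()

# Turn the result into a plain dict keyed by False / True
by_flag = counts.to_dict()
print(by_flag)
```

The entry under `True` is the number of low citric acid wines with quality 7 or higher.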


My intuition was to just use filtering for both questions but the suggestion to use value_counts may produce a cleaner solution for Q2.


wine_df[(wine_df['quality'] >= 8) & (wine_df['residual sugar'] >= 5)]

You could sort by quality descending instead of filtering but that just seems silly.


wine_df[(wine_df['citric acid'] < 0.4)]['quality'].value_counts()
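To get from that per-quality tally to a single number for Q2, one option is to pick out the 7 and 8 entries and add them up. The sketch below uses made-up rows, and `reindex(..., fill_value=0)` is my own choice so that a quality level with no matching wines counts as zero instead of raising a `KeyError`.

```python
import pandas as pd

# Hypothetical toy rows; note that no quality-8 wine has citric acid < 0.4
wine_df = pd.DataFrame({
    "quality": [5, 7, 8, 6, 7],
    "citric acid": [0.00, 0.45, 0.52, 0.10, 0.39],
})

# Tally the quality scores among the low citric acid wines
counts = wine_df[wine_df["citric acid"] < 0.4]["quality"].value_counts()

# Select the 7 and 8 entries, treating a missing level as zero, then add them
answer = int(counts.reindex([7, 8], fill_value=0).sum())
print(answer)
```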


My approach:

wine_df_quality_res_filter = wine_df['quality'] >= 8
wine_df_qa_res = wine_df[wine_df_quality_res_filter].sort_values(by=['residual sugar'], ascending=False)

wine_quality_df = wine_df[wine_df['quality'] >= 7]
wine_df_quality_res_filter2 = wine_quality_df['citric acid'] < 0.4
wine_quality_df_acid = wine_quality_df[wine_df_quality_res_filter2]
print(wine_quality_df_acid['citric acid'].value_counts().sum())

My answer is given below:

# Answer of question 1
filtered_df = wine_df.loc[(wine_df['quality'] >= 8) & (wine_df['residual sugar'] > 5)]

# Answer of question 2
filtered_df2 = wine_df.loc[((wine_df['quality'] == 8) | (wine_df['quality'] == 7)) & (wine_df['citric acid'] < 0.4)]

I tried to make a filter that would check for both criteria at the same time, but wasn’t having any luck, so I originally kept them separate. After playing around a bit more, I found another method that gave me the exact same result.

# Q1 - quality of 8 or higher and a residual sugar level above 5?
# separate filters
Q = wine_df['quality'] >= 8
Qdf = wine_df[Q]

# the second mask is built from Qdf, so Qdf has to exist first
RS = Qdf["residual sugar"] > 5
QRSdf = Qdf[RS]


# Q1 - alternate
QRSdf = wine_df[(wine_df['quality'] >= 8) & (wine_df['residual sugar'] > 5)]

# Q2 - quality of 8 and 7 and a citric acid level below 0.4?
df = wine_df[wine_df["quality"] >= 7]
df = df[df["citric acid"] < 0.4]

Had no idea why I would need to use .value_counts() but the solution is actually quite clever. I just used three filters.

I like using query for filtering and agree that sorting is not a good fit for the problem as phrased.

# Q1
wine_df.query('quality >= 8 and `residual sugar` > 5').index.tolist()

# Q2
wine_df.query('`citric acid` < 0.4')['quality'].value_counts()[[7,8]].sum()

My method:


Get a sense for the data and see that the maximum for quality is 8

wine_df.sort_values(by=['quality', 'residual sugar'], ascending=False)
Can now see the top quality wines which satisfy our “8 or more” condition (8 is the max, remember) and look for sugars > 5, of which there are two.
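The sort-and-inspect approach can be sketched on a few made-up rows (hypothetical data, not the challenge file). Sorting by both columns in descending order puts the highest-quality wines first and, within each quality level, the sweetest first, so the rows of interest can be read off the top.

```python
import pandas as pd

# Hypothetical toy rows standing in for the challenge's wine_df
wine_df = pd.DataFrame({
    "quality": [5, 7, 8, 8, 6, 7],
    "residual sugar": [1.9, 2.6, 6.4, 5.0, 5.5, 2.0],
})

# Sort by quality first, then by residual sugar, both descending
ranked = wine_df.sort_values(by=["quality", "residual sugar"], ascending=False)

# The first few rows are the best wines, sweetest first
top = ranked.head(3)
print(top)
```

This works well for eyeballing a small table. On a larger one, a filter is less error-prone than reading rows by eye.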


df1 = wine_df[wine_df['quality']>=7]
for quality of 7 or higher

(df1['citric acid']<0.4).value_counts()
returns a bool map where the True value gives you the answer.
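A small sketch of that last step on made-up rows (hypothetical data): calling `value_counts()` on a boolean Series tallies the `True` and `False` entries, and since `True` counts as 1, `.sum()` gives the `True` tally directly.

```python
import pandas as pd

# Hypothetical toy rows standing in for the challenge's wine_df
wine_df = pd.DataFrame({
    "quality": [5, 7, 8, 8, 7, 7],
    "citric acid": [0.00, 0.45, 0.30, 0.52, 0.39, 0.10],
})

# Keep quality 7 or higher, then flag the low citric acid wines
df1 = wine_df[wine_df["quality"] >= 7]
flags = df1["citric acid"] < 0.4

# Tally of True / False flags
tally = flags.value_counts()

# Same True count without the tally: True sums as 1, False as 0
n_true = int(flags.sum())
print(tally.to_dict(), n_true)
```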

print(wine_df[(wine_df['quality']>=8) & (wine_df['residual sugar']>5)].index.values)
print(len(wine_df[(wine_df['quality']>=7) & (wine_df['quality']<=8) & (wine_df['citric acid']<0.4)].values))

Is this challenge really about sort and value count if many people used a faster and easier solution with less code that doesn’t use sort or value count?


My Solution:
df1 = wine_df[(wine_df['quality']>=8) & (wine_df['residual sugar']>5)]
df2 = wine_df[(wine_df['citric acid']<0.4)]

Hi All!

I am loving seeing all the different ways people are solving these questions and the discussion around different approaches! One of the great things about data analytics problems is the many different routes that can be taken to arrive at the solutions. .sort_values() and .value_counts() are two functions that can be added to the toolbox as we work through this 21-day challenge. They can also be helpful in breaking up your code into digestible steps. Sometimes, even if I am able to achieve something in a single line of code, I will break it up into a few steps to make it easier to read for someone else stepping into my code. Remember, there’s never just one right way to do something, so have fun and try what works for you in your problem-solving process!

Enjoy the rest of your Monday everyone!


The lesson is that there are many solutions, and the corollary is that the solution suggested by the problem is rarely the best one.