Challenge 12 Megathread

For any and all questions relating to challenge 12. :point_down:

For a tutorial on how to use Jupyter Notebook, we put together this video:

Still have questions? Read all the FAQs here.

If you would like to download the data set used in this challenge, click here.

To continue to play around with the datasets in a Jupyter environment, click here.

Not sure if this has already been posted here, but for anyone who is already familiar with SQL, there is a site that converts SQL to pandas and is super helpful in understanding how to query data using the pandas library:


There are some helpful functions like groupby and pivot_table that can help display the data after applying your filter.

The main idea is using a filter:
df[df.AveragePrice >= 2] which limits your dataset to only weeks where the average price was $2 or more for any given type/location.

To go a step further, I’ve used pivot table, which might be familiar for excel users to show the average price for conventional and organic, and then grabbed only the years where both exist with amount of $2 or more for at least one week in the year.

df0=df[df.AveragePrice >= 2].pivot_table(index=['year'],columns=['type'], values=['AveragePrice'], aggfunc='mean' ).reset_index()
df0[(df0.AveragePrice.conventional) > 0 & (>0)]['year'].values

Anyone have a cleaner sol?

import numpy as np
filt = df['AveragePrice'] > 2
filtered_df = df[filt]
grouped_filt = filtered_df.groupby(['type'])
conv = grouped_filt.get_group('conventional')['year'].unique()
org = grouped_filt.get_group('organic')['year'].unique()
print(np.intersect1d(conv, org))
by_year = df.groupby(['year','type'])

by_year['AveragePrice'].max() > 2

df.groupby([‘year’, ‘type’]).max()



I managed to solve this one using only the article provided, & building on skills from previous days.
I normally stumble on some tiny piece of the code, but didn’t this time.

filter = df['AveragePrice'] > 2.00
df = df[filter]

I also found a solution that works on 1 line to challenge myself a bit more.


Video Solution:


This is another challenge that doesn’t really require the suggested skill. It’s enough to use groupby.


Optimism. No one would farm without it.

We can find the years that have months where the average price spikes high, but beware, those years may not have been the best years for average price overall. Recommendation? More analysis.

filtered_df.groupby(["type", "year"])["AveragePrice"].mean()
df.groupby(["type", "year"])["AveragePrice"].mean()

My Solution:
df.groupby([‘year’, ‘type’])[‘AveragePrice’].max() > 2 (if are True for both (Organic,conventional) this years part of your answer)

1 Like

It does feel like some concepts are being introduced in reverse order. I would have expected filtering to be introduced before groupby().

1 Like

I’m surprised LL classified this question as “hard”.
That said, I think the question " in which year or years did both conventional and organic avocados have had average prices above $2?" could have been a bit more clear, as there is a column called “AveragePrice” and we had to find the annual average of AveragePrice.

1 Like

I think this challenge is weird.
How do I calculate the average price for one year? Because they’re a lot of transaction in a year and each transaction have Average Price so I think it should be
df.groupby(["type", "year"])["AveragePrice"].mean()

But the command above returns all the results below 2$.
Remove all transaction has AveragePrice below 2$ is not the correct answer


When I first read this I thought we were looking for years where the average annual price was over $2 for both types. However using group by with type & years and taking the mean shows no years with the annual average over $2. Using this line where df is the whole dataset. (this is an average of prices in the year, not a weighted average price by production)
df_annual = df.groupby([‘type’,‘year’]).mean()

So the challenge is to find years where the price of both types had a price over $2, so I just applied the same group by to the filtered case which returns the years where samples exceeding $2 with the average of those prices.
filtered_annual = filtered_df.groupby([‘type’,‘year’]).mean()
This returns the average price of entries in a year that were above $2 (whether that is one entry or many), not the full annual average price as above. filtered_df is the same as given in the problem.


Not the best resolution but the one I found.

import pandas as pd

df = pd.read_csv('avocado.csv', index_col = 0)

df_avg_over2 = df['AveragePrice'] > 2
df2 = df[df_avg_over2]

df2.groupby(['type', 'year'])['AveragePrice'].mean()

I tried both of your methods and didn’t come up with an answer I could comprehend. I am completely new to this so it was most definitely operator error or operator misunderstanding than your code.
These are the answers I got from your codes. Please help me understand. Thanks!

esdpw, The second result is showing the counts (number of samples) of the group by applied to the filtered dataframe. So in 2016, there are 4 entries for conventional avocadoes that are over $2 and 360 entries for organic ones over $2. 2016 and 2017 are the only years that show results for both types of avocadoes. Far more entries for organic avocadoes over $2.
I’m not sure what aggregating function you used in the first result, I’m thinking you used minimum, or nothing.
If you change count() to mean() you get the average of the prices over $2 and the average of the corresponding values for volume & small to xlarge bags. What it doesn’t give you is an appreciation for the frequency of higher prices for organic avocadoes.
If you change the order of ‘type’ and ‘year’ you get the same results with the rows in a different order, convention for 2016 & 2017 first, then organic for all 4 years. Just some practice to see how you can present the table differently.

1 Like