Challenge 11 Megathread

For any and all questions relating to challenge 11. :point_down:

For a tutorial on how to use Jupyter Notebook, we put together this video:

Still have questions? Read all the FAQs here.

If you would like to download the data set used in this challenge, click here.

To continue to play around with the datasets in a Jupyter environment, click here.

It’s a pretty boring evening, just waiting for today’s challenge to drop… :slight_smile:

2 Likes

This one was a bit tricky if you don’t read about the multi-level index:
groupedDF = df.groupby(['year', 'region']).mean()
groupedDF.loc[(2017, 'Albany')]['AveragePrice']

4 Likes
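Side note: you can collapse the two-step lookup into one by handing .loc both the row key and the column name. A minimal sketch (numeric_only=True keeps newer pandas versions from raising on the string columns):

grouped = df.groupby(['year', 'region']).mean(numeric_only=True)
grouped.loc[(2017, 'Albany'), 'AveragePrice']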

Video Solution: https://youtu.be/SDF_TyaFKhQ

Half-way. Congratulations to everyone for making it this far.

5 Likes

I couldn’t get the ‘loc’ to work like the example in the hint, but by looking at @Andrei’s solution I now understand why.

I used get_group

Summary

df.groupby(['region', 'year']).get_group(('Albany', 2017))['AveragePrice'].mean()

6 Likes

The multi-level index is the most annoying thing ever. Always makes me want to go back to R

1 Like
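If the MultiIndex bugs you, one way to sidestep it entirely is to keep the group keys as ordinary columns; a quick sketch using as_index=False (calling reset_index() after the groupby works too):

# Keep 'region' and 'year' as regular columns instead of a MultiIndex
flat = df.groupby(['region', 'year'], as_index=False).mean(numeric_only=True)
flat[(flat['region'] == 'Albany') & (flat['year'] == 2017)]['AveragePrice']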

Ahhh, that’s why I was stuck, I was executing those backwards. Thanks!

I got the same thing at first, after thinking I had to return how much was made for that year. Second, I declared groupby with region first, but then when calling loc I put 2017 first and got errors. I wanted to rip out what little hair I have left :confounded:
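To make that ordering trap concrete: the tuple you pass to .loc has to follow the same order as the list you gave groupby. A minimal sketch:

grouped = df.groupby(['region', 'year']).mean(numeric_only=True)

# Works: the tuple matches the groupby order (region first, then year)
grouped.loc[('Albany', 2017)]['AveragePrice']

# KeyError: the levels are reversed
# grouped.loc[(2017, 'Albany')]['AveragePrice']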

Here’s mine. It didn’t work until I realized I’d goofed the order of the rows and columns in .loc[] and .groupby(); I had them opposite to one another. It’s late. Here’s the working thing.

Summary
import pandas as pd

df = pd.read_csv('avocado.csv', index_col=0)
df.head()
# group by region and year
Avos = df.groupby(['region', 'year']).mean()
# look up the row for Albany, 2017
Avos.loc[('Albany', 2017)]['AveragePrice']

Also, shout-out to everyone hammering this out on a Friday night to keep the head start.

I ended up with a one line do-it-all:

(df.groupby(['region', 'year']).mean()).loc[('Albany', 2017)]

4 Likes

dfx = df[df.region == "Albany"].groupby(['year']).mean()
dfx.AveragePrice[2017]

3 Likes
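Heads-up for anyone on a newer pandas: since 2.0, mean() no longer silently drops non-numeric columns, so the groupby snippets in this thread can raise a TypeError unless you pass numeric_only=True (or select the one column you need before aggregating):

# On pandas >= 2.0, request the numeric columns explicitly
dfx = df[df.region == "Albany"].groupby(['year']).mean(numeric_only=True)
dfx.AveragePrice[2017]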

My solution = just flip the order

df.groupby(['region', 'year']).mean()

Then it shows the price right off the bat.
It’s definitely the most concise solution I came up with.

2 Likes

Anyone else catch that this isn’t actually taking the weighted average price? It’s merely taking the average of the prices (i.e. the sum of the AveragePrice column divided by the number of rows).

This doesn’t take into account that certain dates have proportionately greater volumes than others and therefore shouldn’t carry the same weight.

9 Likes
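A volume-weighted version might look like the sketch below (it assumes the dataset’s Total Volume column is the number of avocados sold each week):

albany_2017 = df[(df['region'] == 'Albany') & (df['year'] == 2017)]

# Weight each week's average price by that week's volume
(albany_2017['AveragePrice'] * albany_2017['Total Volume']).sum() / albany_2017['Total Volume'].sum()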

df17A = df[df["year"] == 2017]

C_A = df17A.groupby(["region"])["AveragePrice"].mean()
C_A["Albany"]

Another solution is to utilize pandas datetime indexing:

# Parse the Date column and make it the index
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
# Partial string indexing: '2017' selects every row from that year
albany_2017 = df.loc['2017']
albany_2017 = albany_2017[albany_2017['region'] == "Albany"]
average_price = albany_2017['AveragePrice'].mean()

df.groupby(['region', 'year']).mean().loc[('Albany', 2017)]['AveragePrice']

A quick tip (since a few people have gotten tripped up by the grouping order giving them confusing results): it’s much easier to find what you’re looking for when you first eliminate everything you don’t care about.
df.loc[(df['region'] == 'Albany') & (df['year'] == 2017)]
Then you can use groupby() and mean() to aggregate the results to a single row, and it typically won’t matter which column you group by first.

In SQL you’d typically use a WHERE clause to limit the selected rows before the GROUP BY, and only use a HAVING clause afterwards to adjust the results. You can tackle it with basically the same approach in Pandas.
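If you like the SQL framing, DataFrame.query reads almost like a WHERE clause; a sketch:

# The WHERE step, then the aggregation
df.query("region == 'Albany' and year == 2017")['AveragePrice'].mean()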

Spoilers

df.where(df['region'] == 'Albany').where(df['year'] == 2017).groupby(['region', 'year']).mean()
which gives the same result, since where() keeps the non-matching rows as NaN and groupby() then drops those NaN keys:
df.loc[(df['region'] == 'Albany') & (df['year'] == 2017)].groupby(['region', 'year']).mean()

3 Likes