For any and all questions relating to challenge 11.

For a tutorial on how to use Jupyter Notebook, we put together this video:

Still have questions? Read all the FAQs here.

To continue to play around with the datasets in a Jupyter environment, click here.

Itâs a pretty boring evening, just waiting for todayâs challenge to dropâŚ

2 Likes

This one was a bit tricky if you don't read about the multi-level index:

```
groupedDF = df.groupby(['year', 'region']).mean()
groupedDF.loc[(2017, 'Albany')]['AveragePrice']
```
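For anyone puzzled by the tuple order: the MultiIndex levels come out in the same order you passed to groupby, so the `.loc` key has to match. A minimal sketch with made-up toy numbers (not the real avocado data):

```python
import pandas as pd

# Toy stand-in for the avocado data (made-up prices).
df = pd.DataFrame({
    "year": [2017, 2017, 2016],
    "region": ["Albany", "Albany", "Albany"],
    "AveragePrice": [1.25, 1.75, 1.30],
})

# groupby(['year', 'region']) builds the MultiIndex in that same order,
# so the .loc key must be (year, region), not (region, year).
groupedDF = df.groupby(["year", "region"]).mean()
print(groupedDF.loc[(2017, "Albany")]["AveragePrice"])  # 1.5
```

Swapping the tuple to `('Albany', 2017)` raises a KeyError here, which is the usual symptom of mismatched ordering.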

4 Likes

Video Solution: https://youtu.be/SDF_TyaFKhQ

Half-way. Congratulations to everyone for making it this far.

5 Likes

I couldnât get the âlocâ to work like the example in the hint, but now by looking at @Andrei solution I understand why.

I used get_group

Summary

`df.groupby(['region', 'year']).get_group(('Albany', 2017))['AveragePrice'].mean()`

6 Likes

The multi-level index is the most annoying thing ever. It always makes me want to go back to R.
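If the MultiIndex is the pain point, one workaround is pandas' `as_index=False` option, which keeps the grouping keys as plain columns so there is no multi-level index at all. A sketch with toy data:

```python
import pandas as pd

# Made-up mini-dataset with the same columns the challenge uses.
df = pd.DataFrame({
    "region": ["Albany", "Albany", "Boston"],
    "year": [2017, 2017, 2017],
    "AveragePrice": [1.25, 1.75, 1.20],
})

# as_index=False leaves 'region' and 'year' as ordinary columns,
# so you can filter with plain boolean indexing instead of .loc tuples.
flat = df.groupby(["region", "year"], as_index=False).mean()
row = flat[(flat["region"] == "Albany") & (flat["year"] == 2017)]
print(float(row["AveragePrice"].iloc[0]))  # 1.5
```

(Calling `reset_index()` on an already-grouped frame gets you to the same place.)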

1 Like

Ahhh, thatâs why I was stuck, I was executing those backwards. Thanks!

I got the same thing at first, after thinking I had to return how much was made for that year. Second, I declared groupby with region first, but then when calling it in loc I passed 2017 first and got errors. I wanted to rip out what little hair I have left.

hereâs mine. didnât work until i realized that i goofed the order for the rows and column in .loc() and .groupby; i had them opposite to one another. its late. hereâs the working thing.

Summary
```
df = pd.read_csv('avocado.csv', index_col=0)
# group by region and year
Avos = df.groupby(['region', 'year']).mean()
# locate Albany and 2017
Avos.loc[('Albany', 2017)]['AveragePrice']
```

Also s/o to everyone hammering this out on a Friday night to keep the head start.

I ended up with a one-line do-it-all:

```
(df.groupby(['region', 'year']).mean()).loc[('Albany', 2017)]
```

4 Likes
```
dfx = df[df.region == "Albany"].groupby(['year']).mean()
dfx.AveragePrice[2017]
```

3 Likes

My solution = just flip the order

```
df.groupby(['region', 'year']).mean()
```

Then it shows the price right off the bat.
It is definitely the most concise solution I made.

2 Likes

Anyone else catch that this isn't actually taking the weighted average price? It's merely taking the average of the prices (i.e. the sum of the average-price column divided by the number of average prices).

This isn't taking into account that certain dates have proportionately greater volumes than others and therefore should not have the same weighting.

9 Likes

```
df17A = df[df['year'] == 2017]

C_A = df17A.groupby(['region'])['AveragePrice'].mean()
C_A['Albany']
```

Another solution is to utilize pandas datetime indexing:

```
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
albany_2017 = df.loc['2017']
albany_2017 = albany_2017[albany_2017['region'] == "Albany"]
average_price = albany_2017['AveragePrice'].mean()
```

df.groupby([âregionâ, âyearâ]).mean().loc[(âAlbanyâ,2017)][âAveragePriceâ]

A quick tip (since a few people have gotten tripped up over the grouping order giving them difficult results): it is much easier to find what you're looking for when you first eliminate everything you don't care about.
`df.loc[(df['region'] == 'Albany') & (df['year'] == 2017)]`
Then you can use groupby() and mean() to aggregate the results to a single row, and it typically won't matter which column you group by first.
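A sketch of the filter-first pattern on toy data (made-up prices), showing that the grouping order no longer matters once the rows are narrowed down:

```python
import pandas as pd

# Made-up rows; only one survives the filter.
df = pd.DataFrame({
    "region": ["Albany", "Albany", "Boston"],
    "year": [2017, 2016, 2017],
    "AveragePrice": [1.25, 1.75, 1.10],
})

# Filter first (the WHERE step), then aggregate.
filtered = df.loc[(df["region"] == "Albany") & (df["year"] == 2017)]

# Either grouping order returns the same single aggregated value.
a = filtered.groupby(["region", "year"]).mean()["AveragePrice"].iloc[0]
b = filtered.groupby(["year", "region"]).mean()["AveragePrice"].iloc[0]
print(a, b)  # 1.25 1.25
```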

In SQL youâd typically use a `WHERE` clause to limit the selected rows before the `GROUP BY`, and only use a `HAVING` clause afterwards to adjust the results. You can tackle it with basically the same approach in Pandas.

Spoilers

`df.where(df['region'] == 'Albany').where(df['year'] == 2017).groupby(['region', 'year']).mean()`
which is the same as:
`df.loc[(df['region'] == 'Albany') & (df['year'] == 2017)].groupby(['region', 'year']).mean()`

3 Likes