Challenge 11 Megathread

For any and all questions relating to Challenge 11 :point_down: post away!

This question was difficult and took me longer than I expected.

There are a few recipes for doing this question. (If there are more, feel free to reply and let me know.)

One way

  1. List the min and max price of each neighborhood. The function suggested in the tutorial might be helpful here.
  2. Use pandas DataFrame indexing (by column name or by position) to subtract the min column from the max column.
  3. Sort the values, then only show the few biggest.
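A minimal sketch of this first recipe, on made-up toy data. It assumes the challenge's DataFrame has `neighborhood` and `price` columns (the names used in code later in this thread), and that the tutorial's function is `agg`, which later posts mention.

```python
import pandas as pd

# Hypothetical toy data standing in for the challenge's DataFrame.
df = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C"],
    "price": [100, 400, 80, 150, 200],
})

# Step 1: min and max price per neighborhood (one row per neighborhood).
stats = df.groupby("neighborhood")["price"].agg(["min", "max"])

# Step 2: subtract the min column from the max column.
stats["range"] = stats["max"] - stats["min"]

# Step 3: sort from biggest to smallest range and keep only the top few.
top = stats.sort_values("range", ascending=False).head(2)
print(top)
```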

Second Way

  1. Instead of listing the min and max of each neighborhood, find a function that can calculate the range (i.e., do step 1 AND step 2 of the previous recipe at once). It doesn’t have to be a pandas function…it could be from another data science library that starts with an ‘N’.
  2. Sort the values, then show the few biggest.
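One way the second recipe could look, assuming the ‘N’ library is NumPy: `np.ptp` ("peak to peak") returns max minus min in a single call. The toy data and the `neighborhood`/`price` column names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the challenge's DataFrame.
df = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C"],
    "price": [100, 400, 80, 150, 200],
})

# np.ptp computes max - min; each group's prices are converted to a
# NumPy array first so the call works the same across library versions.
price_range = df.groupby("neighborhood")["price"].agg(lambda s: np.ptp(s.to_numpy()))

# Sort descending and show the biggest few.
biggest = price_range.sort_values(ascending=False).head(2)
print(biggest)
```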

I hope this was helpful without giving away too much.

Update: See @YGW 's post further down for a third way!


A Third Way:
(Directly Find Row Instead of Sorting)

  1. Make a new “difference” column in the DataFrame. This can be done using one of the ways you describe. (For how to add a column, see here, and/or see the code example below.)

Edit1: This link gives a more complete explanation of how to add a difference column.

  2. Directly find the row with the max value using loc and idxmax, instead of sorting. (See here for how.)

  3. Display the result using the print() function or the head() method (after using a filter).
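The three steps of this third way could be sketched as follows. The toy data and column names are assumptions; `idxmax` returns the index label of the largest value, and `loc` then fetches that whole row without sorting anything.

```python
import pandas as pd

# Hypothetical toy data standing in for the challenge's DataFrame.
df = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C"],
    "price": [100, 400, 80, 150, 200],
})
stats = df.groupby("neighborhood")["price"].agg(["min", "max"])

# 1. New "difference" column derived from existing columns.
stats["difference"] = stats["max"] - stats["min"]

# 2. idxmax gives the label (here, the neighborhood) of the largest
#    difference; loc looks up that row directly.
best_label = stats["difference"].idxmax()
best_row = stats.loc[best_label]

# 3. Display the result.
print(best_label)
print(best_row)
```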

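```python
import pandas as pd

# HOW TO ADD A NEW COLUMN
# General pattern:
#   df["NAME_FOR_NEW_COLUMN"] = LIST_OR_SERIES_FOR_NEW_COLUMN
# Replace NAME_FOR_NEW_COLUMN with any name for your new column.
# Replace LIST_OR_SERIES_FOR_NEW_COLUMN with a list or Series that is
# the new column you want to add (one value per row).
# The new column can be derived from the existing columns.
# For example, with toy data:
df = pd.DataFrame({"max": [400, 150], "min": [100, 80]})
df["difference"] = df["max"] - df["min"]
```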

Which Is Better: Directly Finding Max (Or Min) Through a Search Function, or Sorting First?

How To Search For an Item Directly
You can use the idxmax method together with the loc indexer. See here!

Which Is Better
Not sure! But maybe this is how we can figure it out:

One way to decide between the two would be to compare their efficiency. From what I gather (from Quora and Stack Overflow), sorting a DataFrame by a column costs roughly O(n log n), while a single idxmax scan costs O(n). So asking the DataFrame to sort its values by some column before we search through it is:

  • less efficient if we are going to search through the DataFrame only a few times, since one linear scan for the max (or min) beats a full sort, and
  • more efficient if we have a long DataFrame AND we are going to search through it many times (for example, repeatedly pulling the top few rows), since the one-time sorting cost is spread over all of those searches
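To check this on your own machine, here is a rough timing sketch comparing the two approaches on a hypothetical DataFrame of 100,000 random "difference" values. The sizes and repeat count are arbitrary assumptions, and the timings will vary from machine to machine, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data, used only for timing.
rng = np.random.default_rng(0)
stats = pd.DataFrame({"difference": rng.integers(0, 1_000_000, size=100_000)})


def via_sort():
    # Sorts every row (about O(n log n)), then keeps the first one.
    return stats.sort_values("difference", ascending=False).head(1)


def via_idxmax():
    # One pass (O(n)) to find the label of the largest value, then loc.
    return stats.loc[[stats["difference"].idxmax()]]


print("sort + head:", timeit.timeit(via_sort, number=20))
print("loc + idxmax:", timeit.timeit(via_idxmax, number=20))
```

Both functions return a one-row DataFrame with the same maximum value; if several rows tie for the max, they may pick different rows.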

Thoughts?

Does anyone have thoughts on which is better, or on whether it even matters which method we use (sort or direct search)?


Great explanations guys!! Thanks for always helping.


Is there an aggregate function under groupby that can provide the range for each categorical value in a column?
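As far as I know, pandas has no built-in "range" string for `agg`, but `agg` accepts any function that reduces a group to one value. A sketch using named aggregation on toy data (the `price_range` output name and the data are hypothetical):

```python
import pandas as pd

# Hypothetical toy data standing in for the challenge's DataFrame.
df = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C"],
    "price": [100, 400, 80, 150, 200],
})

# Named aggregation: output column name = (input column, function).
ranges = df.groupby("neighborhood").agg(
    price_range=("price", lambda s: s.max() - s.min())
)
print(ranges)
```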


Good solution loc+idxmax :+1:
I was tired of using only sort_values and head(1).
It will be my first option from now on!


Hi all, I will be moderating the forum for the next 2 hours. Please let me know if you face any barriers in attempting to complete today’s challenge. Also, please remember not to share any answers on this forum.

Denver

I’m stumped on this one. I’ve used the .agg function as described to list the max and min prices in columns. Everything I’ve tried gives a “not callable” error. How do I “call” the columns to subtract them?
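A common cause of a “not callable” error (an assumption about what happened here) is using parentheses instead of square brackets to pick a column: parentheses try to call the DataFrame like a function. A sketch on toy data:

```python
import pandas as pd

# Hypothetical toy data standing in for the challenge's DataFrame.
df = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C"],
    "price": [100, 400, 80, 150, 200],
})
grouped = df.groupby("neighborhood")["price"].agg(["max", "min"])

try:
    grouped("max")  # parentheses: tries to *call* the DataFrame
except TypeError as err:
    print(err)  # 'DataFrame' object is not callable

# Square brackets select a column; the two Series subtract row by row.
price_range = grouped["max"] - grouped["min"]
print(price_range)
```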

I had such a busy day and was only able to do this at 8 PM my time. I used your first solution and got it in 10 minutes :laughing:

I had to make 3 variables (PriceMax, PriceMin, and Sorted). PriceMax holds the aggregated max price, PriceMin holds the aggregated min price, and Sorted is max minus min. Then I sorted the Sorted variable.
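That three-variable approach might look like this, using the poster's variable names on hypothetical toy data. Because both Series are indexed by neighborhood, the subtraction lines values up by label rather than by position.

```python
import pandas as pd

# Hypothetical toy data standing in for the challenge's DataFrame.
df = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C"],
    "price": [100, 400, 80, 150, 200],
})

# Each is a Series indexed by neighborhood.
PriceMax = df.groupby("neighborhood")["price"].max()
PriceMin = df.groupby("neighborhood")["price"].min()

# Max minus min per neighborhood, biggest range first.
Sorted = (PriceMax - PriceMin).sort_values(ascending=False)
print(Sorted.head(2))
```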

This one is pretty frustrating (I’m a Python noob). I’m grouping by neighbourhood and taking the max() of price, then sorting in descending order, like we did in yesterday’s challenge. So I can get the max and min values, but I don’t know how to use DataFrame indexing to subtract one from the other.

For example, I have this code:

```python
groupedMax = df.groupby('neighborhood')[['price']].max()
maxPrice = groupedMax.sort_values('price', ascending=False)
```

But I’m not sure what to do next :slight_smile:

I ran into the same trouble. A little digging led me to this. If you take their example of:

```python
df.groupby(['neighborhood']).agg({'price': ['mean', 'max']})
```

you can access the columns like this:
```python
df.groupby(['neighborhood']).agg({'price': ['mean', 'max']})['price']['mean']  # for the mean column
df.groupby(['neighborhood']).agg({'price': ['mean', 'max']})['price']['max']  # for the max column
```
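Building on that example (swapping mean for min, which the range needs), the groupby only has to run once if you store its result. The columns form a two-level index, so a tuple selects the same column as the chained brackets. Toy data and column names are assumptions.

```python
import pandas as pd

# Hypothetical toy data standing in for the challenge's DataFrame.
df = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C"],
    "price": [100, 400, 80, 150, 200],
})

# Run the groupby once and keep the result.
stats = df.groupby(["neighborhood"]).agg({"price": ["min", "max"]})
print(stats.columns.tolist())  # [('price', 'min'), ('price', 'max')]

# Chained brackets and a tuple reach the same column.
max_chained = stats["price"]["max"]
max_tuple = stats[("price", "max")]

# Range per neighborhood.
price_range = stats[("price", "max")] - stats[("price", "min")]
print(price_range)
```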

A possible syntax: df['new_column'] = df['old_column_1'] - df['old_column_2'], if that’s what you’re referring to.

I finally got it. Thank you, @Android451 and @JD2022!

Thanks @Denverdias !

I found that syntax, but I’m not sure how to make the ‘old’ columns. If I do:

```python
df["maxPrice"] = df.groupby(['neighborhood']).agg({'price': ['max']})
```

I just get NaN in that new column.

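The NaN values appear because the grouped result is indexed by neighborhood, so its labels don’t match df’s row index. df["maxPrice"] = df.groupby('neighborhood')['price'].transform('max') would be a better way of phrasing that, since transform returns one value per original row.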
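A small sketch, on hypothetical toy data, that reproduces the all-NaN column and contrasts it with `transform`. The `maxPrice_misaligned` column name is made up for the demonstration.

```python
import pandas as pd

# Hypothetical toy data standing in for the challenge's DataFrame.
df = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C"],
    "price": [100, 400, 80, 150, 200],
})

# Indexed by "A", "B", "C", while df is indexed 0..4, so assigning it
# as a column finds no matching labels and fills every row with NaN.
per_group = df.groupby("neighborhood")["price"].max()
df["maxPrice_misaligned"] = per_group

# transform("max") gives each row its own neighborhood's max,
# keeping df's original row index.
df["maxPrice"] = df.groupby("neighborhood")["price"].transform("max")
print(df)
```

For the challenge itself (one range per neighborhood), the aggregated result is usually what you want to keep as its own DataFrame; transform is handy when every listing should carry its group's value.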

Any way to download the data and play with it after the challenges?

Check out this post! It explains:

  1. How to download the data
  2. How to upload it to a Google Colab account, where you can use a very similar setup to what we see in these challenges (i.e., a Jupyter notebook). The Colab account is free!

Do you know whom I should reach out to if I got the email saying I missed last night’s challenge even though I completed it? I was working on it past 9:30 PST, so it should still have counted…