Challenge 8 Megathread

For any and all questions relating to Challenge 8 :point_down: post away!

I think this challenge was better explained than others, but I’ll add a few things.

  • With a single label or position, df.loc[] and df.iloc[] return a row (as a Series) which can be indexed just like a data frame can, e.g.
import pandas as pd

df = pd.read_csv('paris_landmarks.csv')
#return the row at index position 10
item_10 = df.iloc[10]
item_10
>landmark      Pantheon
>queue_time           5
>price                5
>Name: 10, dtype: object

#return the price value of the row at index position 10
item_10['price']
>5
  • Don’t call print() to output data frames in Jupyter Notebooks. Simply ending a cell with df will produce a rendered table that is much easier to read. Jupyter knows best.

Picture1

Picture2
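
Here is a minimal sketch of the first point above. It uses a small made-up table in place of paris_landmarks.csv, so the rows and values are invented for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for paris_landmarks.csv; the values are invented.
df = pd.DataFrame({
    "landmark": ["Eiffel Tower", "Louvre", "Pantheon"],
    "queue_time": [60, 45, 5],
    "price": [25, 15, 5],
})

# .iloc selects by position and .loc selects by index label.
# Both are used with square brackets, not parentheses.
row_by_position = df.iloc[2]
row_by_label = df.loc[2]

# A single row comes back as a Series, so column names work as keys.
price = row_by_position["price"]
print(price)
```

With the default index, the position and the label happen to be the same number, so both lines return the same row. That stops being true once the index is changed or the rows are filtered.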


Great to be in day 8!!!

Let’s keep up the momentum. Ask many questions here, and we will all think through them together.


Although not part of the challenge, how can we do the min, max, etc for a certain group of data from the entire dataset? For example, if I wanted to only look at the queue times for a price “15” in the data set (or vice versa… let’s say if Dot was on a budget), how could I write that out without looking at the entire dataset? Will we see this use in a challenge later on?

I’m assuming this will come up at some point in future challenges, but pandas’ .loc indexer can be layered/nested/stacked with other operations to allow you to do this.

So for example, if you run df['price'], pandas will return just the price column with its values.

If you run df['price'] == 15 (or df['price'].eq(15), or any of the other comparison functions such as greater than or less than, listed in the left panel of this page), you’ll get a Series of True and False values that shows whether each row’s price is equal to 15 or not.

You can nest that in .loc and run df.loc[df['price'].eq(15)] to return a dataframe with just the rows where the price is 15.

Since what was created in that step is just another dataframe, you can work with it in the same way that you would any other dataframe. So you can keep tacking things on to the beginning and end.

So df.loc[df['price'].eq(15)]['queue_time'] gives you the queue time column for landmarks where the price is 15, and df.loc[df['price'].eq(15)]['queue_time'].max() would give you the max queue time for landmarks where the price is 15.
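
The same chain can be broken into named steps, which may make it easier to see what each piece returns. This is only a sketch on invented data; the landmark names and numbers below are placeholders, not values from the challenge dataset.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration.
df = pd.DataFrame({
    "landmark": ["A", "B", "C", "D"],
    "queue_time": [30, 10, 45, 20],
    "price": [15, 5, 15, 10],
})

mask = df["price"].eq(15)            # boolean Series, one True/False per row
subset = df.loc[mask]                # data frame with only the rows where price is 15
queue_times = subset["queue_time"]   # queue_time column of those rows
max_queue = queue_times.max()        # single number: the longest queue at that price

print(max_queue)
```

The other summary methods (min(), mean() and so on) can be swapped in for max() on the last step in the same way.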

Unfortunately, that doesn’t show you what the landmark is, but it does give you a single numerical value for the max queue time. In general, if you have a single numerical value and want to isolate rows where a column has values that are equal (==) to that value, can you think of a way to pull those rows out of the original dataframe?


Although I haven’t tested this out, would this work? For a data frame, df[df$price == 15, ]. I’m not sure whether this could work within the nested dataframes in your example, but I think this can pull the rows in which the price is 15?

I think @adeelstra 's post answered this pretty succinctly. You might have already seen it, but if not, it’s worth a look.

Hi all, I will be moderating the forum for the next 2 hours. Please let me know if you face any barriers in attempting to complete today’s challenge. Also, please remember not to share any answers on this forum.

Denver

You can use: df[['landmark', 'queue_time']][df['price'] == 15] to get the landmark and queue time for an exact price of 15, or use <= 15 to include any locations cheaper than 15. This format is useful when there are many features (columns) and you only want a few.

or

df[df['price'] == 15] to show all of the columns
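
For comparison, here is a rough sketch of the chained form next to the equivalent single .loc call, which takes the row filter and the column list together. The data is invented for illustration and the variable names are just placeholders.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration.
df = pd.DataFrame({
    "landmark": ["A", "B", "C", "D"],
    "queue_time": [30, 10, 45, 20],
    "price": [15, 5, 15, 25],
})

cols = ["landmark", "queue_time"]

# Chained form: pick the columns, then filter the rows.
chained = df[cols][df["price"] == 15]

# Same selection in one step: row filter first, then the column list.
one_step = df.loc[df["price"] == 15, cols]

# Swap == for <= to include everything priced at 15 or less.
budget = df.loc[df["price"] <= 15, cols]

print(one_step)
print(budget)
```

Both forms return the same rows for reading. The single .loc call is usually the safer habit, since chained selections can misbehave if you later try to assign values into the result.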

OK, sorry, I saw that you found this after I wrote the reply :slight_smile: