Challenge 10 Megathread

For any and all questions relating to Challenge 10 :point_down: post away!

This question is badly worded. I hope this write-up helps others that were as confused as I was.
The question asks:
Which neighborhood has the highest average property price and the highest size_in_sqft?
What it should say:
Which neighborhood has the highest average property price and the highest average size_in_sqft?

The way the question is worded, you might think that they are looking for:

  1. Neighbourhood with the highest AVERAGE price
  2. Neighbourhood with the highest MAX size_in_sqft.

But they are actually looking for:

  1. Neighbourhood with the highest AVERAGE price
  2. Neighbourhood with the highest AVERAGE size_in_sqft.
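
To see why that one word matters, here is a minimal sketch on made-up data (not the challenge dataset; the `neighborhood` and `price` column names and all values are assumptions for illustration). The neighborhood with the biggest single property is not necessarily the one with the biggest average size:

```python
import pandas as pd

# Hypothetical toy data, NOT the challenge dataset.
df = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B"],
    "price": [100, 300, 250, 250],
    "size_in_sqft": [500, 6000, 4000, 5000],
})

grouped = df.groupby("neighborhood")

avg_price = grouped["price"].mean()        # average price per neighborhood
max_size = grouped["size_in_sqft"].max()   # largest single property per neighborhood
avg_size = grouped["size_in_sqft"].mean()  # average size per neighborhood

# idxmax() returns the index label (here, the neighborhood) of the largest value.
print("highest average price:", avg_price.idxmax())
print("highest MAX size:", max_size.idxmax())
print("highest AVERAGE size:", avg_size.idxmax())
```

In this toy data the MAX reading and the AVERAGE reading point to different neighborhoods, which is exactly the trap in the question wording.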

This is not the first time Lighthouse Labs has made a mistake in a question or failed to proofread it, which makes me distrust the challenges. If I have two chances to get the right answer and both are wasted because the question is badly worded, Lighthouse Labs has messed up.

40 Likes

Thanks for sharing the correction. I was bashing my head trying combinations of filters to figure it out!

2 Likes

I also eventually came to this conclusion after seeing that none of the answers matched what I had when using the max size in sqft. Turns out one word makes all the difference!

1 Like

A few more things. This question will require you to use:

  • a_df.groupby(A)[[B]], where A is the name of the column you want to group by, and B is the name (or comma-separated names) of the column(s) you want in the output after grouping by A. groupby() is further explained in the Challenge 10 tutorial and the docs can be found here

  • a_df.mean() - Calculates the mean along the chosen axis (column by column by default). Called on the result of a groupby, it gives the mean per group. You can read more here

  • a_df.sort_values(A) - Will sort a data frame by values in the column name A. The default is ascending (smallest at the top). There might be a way to change this… read more here

  • a_df.head(K) - Returns only the first K rows of the data frame. Extremely useful when trying to understand a dataset without printing out the whole thing. K defaults to 5 but can be changed as needed. See here for more details.
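
As a rough sketch of how these four calls chain together, here is an example on made-up data (the `neighborhood` and `price` column names and every value are assumptions, not the challenge dataset). It uses the default ascending sort, so it is not the challenge answer:

```python
import pandas as pd

# Hypothetical toy data, NOT the challenge dataset.
a_df = pd.DataFrame({
    "neighborhood": ["X", "X", "Y", "Y", "Z"],
    "price": [10, 20, 40, 60, 30],
    "size_in_sqft": [100, 300, 200, 200, 500],
})

# Group by one column, keep two columns, and average them within each group.
means = a_df.groupby("neighborhood")[["price", "size_in_sqft"]].mean()

# Sort by the averaged price (ascending by default) and keep the first 2 rows.
top_by_price = means.sort_values("price").head(2)
print(top_by_price)
```

Note that after the groupby, the grouping column becomes the index of the result, so each row is labelled by its neighborhood.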

10 Likes

I enjoyed playing around with different functions in this challenge, but the way the question was worded really buggered me up.

Which neighborhood has the highest average property price and the highest size_in_sqft?

I think it would have been clearer if it was written as:
1. Which neighborhood has the highest average property price? and;
2. Which neighborhood has the highest size_in_sqft?

That is unless I am still misunderstanding the question, because the sqft answer seems very low to be considered the ‘highest size_in_sqft’.

I would have stopped choosing the top 2 that I thought fit the bill of being BOTH the highest price and sqft… but alas I failed my first challenge and feel silly that I misunderstood the question because of the wording. As soon as I saw the worked solution I was like ooooohhhhhh, I had to do the sqft separately.

Is there a way I can continue to play with that dataset of Dubai properties anywhere else, or is it locked away now that I have finished? I ask because I don’t remember the answer even being high up on the sqft list. I saw sizes way up in the 6000s of sqft and the final answer seemed way below that…

2 Likes

Hello everyone!

Kindly drop your questions and discuss amongst yourselves without sharing answers.

Thank you for sharing. The inaccurate wording has been a challenge, but overall I am figuring out how to use pandas, so it’s still a learning experience.

4 Likes

@amariela Great to know that you are learning from the challenge.

Thanks everyone for pushing me in the right direction. The wording definitely had me running in circles.

1 Like

Thank you. I read it the other way too!

2 Likes

Sorting in pandas:

Example data:

print(df)

        0          1     2
0   354.7      April   4.0
1    55.4     August   8.0
2   176.5   December  12.0
3    95.5   February   2.0
4    85.6    January   1.0
5   152.0       July   7.0
6   238.7       June   6.0
7   104.8      March   3.0
8   283.5        May   5.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0

Sorting by month:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

Source:
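
For anyone who wants to try the sort themselves, here is a minimal sketch that rebuilds the first few rows of the example above. It assumes the column labels are the strings '0', '1' and '2', as the `df.sort_values('2')` call suggests. It also shows the `ascending` parameter of `sort_values`, which flips the order:

```python
import pandas as pd

# First four rows of the example data; string column labels are an assumption.
df = pd.DataFrame({
    "0": [354.7, 55.4, 176.5, 95.5],
    "1": ["April", "August", "December", "February"],
    "2": [4.0, 8.0, 12.0, 2.0],
})

# Default: ascending order (smallest month number first).
by_month = df.sort_values("2")

# ascending=False reverses the order (largest month number first).
by_month_desc = df.sort_values("2", ascending=False)

print(by_month)
print(by_month_desc)
```

`sort_values` returns a new data frame and keeps the original row labels, which is why the index in the output above looks shuffled.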

Question is worded so terribly…

3 Likes

Came here to say this! Had to click on the hint, which clarified what was actually needed. Silver lining is I got to practice a lot of ways to analyze data frames while looking for what I was getting wrong.

1 Like

Got it! It took a while to figure it out.

It would be nice if the question text could be updated to stop all the confusion and frustration. Thanks!

Hi Everyone!

How To Keep Playing With The Data Sets

If you want to know how to keep playing with the data sets after the challenge has been completed/submitted, check out bruvark’s awesome workaround at the link below.

The suggested workaround looks daunting and complicated at first, but it’s actually very straightforward. I can try to help anyone who has trouble with this!

1 Like

When using groupby, if the output is not displayed grouped by the values in the column, what could I be doing wrong? That is, it did not group values by neighborhood. The output displayed each neighborhood value as a separate row, instead of grouping them by the split constraint.
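
For reference, one possible cause (a guess, since the code that produced this is not shown) is that no aggregation was applied after the groupby. A sketch on made-up data, with assumed column names:

```python
import pandas as pd

# Hypothetical toy data, NOT the challenge dataset.
df = pd.DataFrame({
    "neighborhood": ["A", "B", "A", "B"],
    "price": [100, 200, 300, 400],
})

# groupby() alone only sets up the groups; nothing is combined yet.
grouped = df.groupby("neighborhood")

# head() on a groupby returns the first rows of each group as ordinary rows,
# so every property still appears on its own line.
per_row = grouped.head()

# An aggregation such as mean() collapses each group into a single row.
per_group = grouped[["price"]].mean()

print(per_row)
print(per_group)
```

If the output looks like `per_row` rather than `per_group`, adding an aggregation (mean, max, count, and so on) after the groupby may be what is missing.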

1 Like

There is! Check out the below link for a workaround. It seems complicated or difficult at first, but it’s actually very straightforward! I can help if you want/need any.

Yes, the lack of proofreading is insane. You’d think that would be really important in this area. Anyway, I really appreciate all your helpful comments in these forums.