Challenge 19 Megathread

For any and all questions relating to Challenge 19 :point_down: post away!

2 more days to go!
A good series of exercises for doing data analysis from scratch! :slight_smile:

I’m lost in this challenge; something is missing. I followed the hints, but my output doesn’t seem to be the right answer. Also, the second hint doesn’t make sense: why do we need to filter by Month? Can anyone please point me in the right direction or tell me what I’m missing? Thanks in advance.

Let’s go through the questions here:

  1. What is the origin_city_name which was used by the most people for travelling to Vancouver?
  2. According to our database, how many people travelled from that city?

These two questions can be treated as one: from which origin_city_name did the most people travel to Vancouver, and how many of them are there in the given dataset?

Keyword: travelling to Vancouver.
Q1. Which column in the dataset indicates that?
Q2. What value represents Vancouver in the column found in the previous step?

Once you have figured out Q1 and Q2, group by origin_city_name and you get the answer.
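To make the "filter, then group" idea concrete, here is a minimal sketch on invented toy data. The column names come from this thread; the rows and the destination spelling `"Vancouver, XX"` are placeholders, so check the real value in your own dataframe first.

```python
import pandas as pd

# Toy data: column names follow the thread, every value is invented.
flights = pd.DataFrame({
    "ORIGIN_CITY_NAME": ["Seattle, WA", "Seattle, WA", "Denver, CO", "Seattle, WA"],
    "DEST_CITY_NAME": ["Vancouver, XX", "Vancouver, XX", "Vancouver, XX", "Portland, OR"],
    "PASSENGERS": [120, 80, 150, 300],
})

# Q1/Q2: keep only rows whose destination is Vancouver.
# "Vancouver, XX" is a placeholder spelling, not the real dataset value.
to_vancouver = flights[flights["DEST_CITY_NAME"] == "Vancouver, XX"]

# Group by origin city and add up passengers only.
passengers_by_origin = to_vancouver.groupby("ORIGIN_CITY_NAME")["PASSENGERS"].sum()

top_city = passengers_by_origin.idxmax()   # origin with the most passengers
top_count = passengers_by_origin.max()     # how many travelled from it
print(top_city, top_count)
```

`idxmax()` answers the first question and `max()` the second, which is why the two can be handled together.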

  3. Use a histogram to plot the probability distribution of distances for all routes in June 2021.
    Keyword: June 2021.
    Which column(s) represent June 2021, i.e. month 6 of 2021?

Once you have found them, filter to get the relevant dataframe, then use a histogram to plot the probability distribution of distances for all routes.
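A rough sketch of the June 2021 step, assuming the dataframe has `YEAR`, `MONTH` and `DISTANCE` columns (those names are a guess; use whatever your dataset actually calls them). The data is invented, and the plotting helper `plot_distance_histogram` is a hypothetical name used only for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: YEAR / MONTH / DISTANCE are assumed column names, values invented.
routes = pd.DataFrame({
    "YEAR": [2021, 2021, 2021, 2021, 2020],
    "MONTH": [6, 6, 6, 5, 6],
    "DISTANCE": [100.0, 250.0, 400.0, 900.0, 1200.0],
})

# Keep only June 2021: both conditions must hold.
june_2021 = routes[(routes["YEAR"] == 2021) & (routes["MONTH"] == 6)]

# density=True normalises the histogram so the bar areas sum to 1.
density, bin_edges = np.histogram(june_2021["DISTANCE"], bins=3, density=True)

def plot_distance_histogram(distances, bins=30):
    """Draw the same normalised histogram with matplotlib."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it
    plt.hist(distances, bins=bins, density=True)
    plt.xlabel("Distance")
    plt.ylabel("Probability density")
    plt.show()
```

Calling `plot_distance_histogram(june_2021["DISTANCE"])` draws the chart. Filtering on the month alone would also pick up June of other years if the data spans more than one year, which is why the year is checked as well.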


I don’t know why, but I’m now getting a really weird error: after filtering the data on DEST_CITY_NAME, it returns an empty dataset. The same filtering worked before, and now it doesn’t.

Try refreshing your challenge page and put the code in again. That happened to me but my code worked fine after the refresh.

Are you using the right value in the dataset to get Vancouver?
Check the value first!
Hint: since Vancouver is only part of the value in the column you picked, there is something else in there too.

Thanks, will try that now!!

Thanks, it worked after refreshing the browser!!

Thanks, I am working on it now. I was thinking something similar: Q1 and Q2 should be the same.

Sometimes being from B.C. presents an advantage, like knowing that the airport code for Vancouver is YVR instead of guessing the format of the destination city!


My suggestion: make sure you know the names of the origin and destination cities. You could use something like unique on the column to see all the names…

You could also sort from high to low.
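For anyone unsure what that looks like, a small sketch with made-up rows (only the column names are taken from this thread):

```python
import pandas as pd

# Toy data: values are invented, "Vancouver, XX" is a placeholder spelling.
flights = pd.DataFrame({
    "ORIGIN_CITY_NAME": ["Seattle, WA", "Denver, CO", "Seattle, WA"],
    "DEST_CITY_NAME": ["Vancouver, XX", "Portland, OR", "Vancouver, XX"],
    "PASSENGERS": [120, 300, 80],
})

# List every distinct destination name so you can see the exact spelling.
dest_names = sorted(flights["DEST_CITY_NAME"].unique())
print(dest_names)

# Sort from high to low on passenger count.
ranked = flights.sort_values("PASSENGERS", ascending=False)
print(ranked.head())
```

Looking at the distinct names first saves guessing the format of the city value.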

So sad that question 3 is just for show. That was a more analytical question.

Great use of all the skills we’ve learned!

Hello everyone!!! I see that we are all enjoying today’s challenge. Let’s keep up the conversation: ask questions, give suggestions, and answer questions.

Morning! Just curious: why ask three questions when only two are involved in the challenge responses?

I thought my filtering was wrong, but it turns out the 'DEST_AIRPORT_ID' got summed up when I did .groupby(['ORIGIN_CITY_NAME']).sum('PASSENGERS'). Is there a way to ONLY sum up the number of passengers without also summing up the 'DEST_AIRPORT_ID' and any other column I may have unintentionally summed up?

If you are interested in only the sum of passengers then you could use np.sum on that particular column.
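Another option, sketched here on invented data, is to select the column on the grouped object before calling `.sum()`. As far as I can tell, the argument passed to `.sum()` on a groupby is not a column selector, so every numeric column still gets added up.

```python
import pandas as pd

# Toy data: column names from the thread, values invented.
flights = pd.DataFrame({
    "ORIGIN_CITY_NAME": ["Seattle, WA", "Seattle, WA", "Denver, CO"],
    "DEST_AIRPORT_ID": [11111, 11111, 11111],
    "PASSENGERS": [120, 80, 150],
})

# Summing the whole grouped frame also adds up the ID column, which is meaningless.
all_summed = flights.groupby("ORIGIN_CITY_NAME").sum()

# Selecting PASSENGERS first sums that column only and returns a Series.
passengers_only = flights.groupby("ORIGIN_CITY_NAME")["PASSENGERS"].sum()
print(passengers_only)
```

Using `[["PASSENGERS"]]` (double brackets) instead would keep the result as a one-column dataframe.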

Fun challenge,

For the first question I didn’t know the exact name assigned to Vancouver, so I googled the code of Vancouver airport (YVR) and filtered by DEST to see what the name of the city was in the df.

Then I just filtered by DEST_CITY_NAME with the right name of the city (I didn’t filter by ID because there may be another airport with a different ID in the same city).

Anyway, it may help somebody. Only two days left! :slight_smile:


I’m not so used to airports’ ICAO/IATA codes.
So I solved it using partial matches on “Vancouver” with .match() / .contains().


There may be a better way but I had success with using the below format to filter for values that contain “Vancouver”
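One way such a filter might look, as a sketch on invented data rather than the poster’s exact code. It also shows why an exact comparison against just `"Vancouver"` can come back empty when the stored value has more text in it.

```python
import pandas as pd

# Toy data: "Vancouver, XX" is a placeholder spelling, values are invented.
flights = pd.DataFrame({
    "DEST_CITY_NAME": ["Vancouver, XX", "Portland, OR", "Vancouver, XX"],
    "PASSENGERS": [120, 300, 80],
})

# Exact match fails because no value is exactly "Vancouver".
exact = flights[flights["DEST_CITY_NAME"] == "Vancouver"]

# Partial match keeps every row whose value contains the substring.
# na=False treats missing city names as non-matches instead of raising.
partial = flights[flights["DEST_CITY_NAME"].str.contains("Vancouver", na=False)]
print(len(exact), len(partial))
```

A partial match may also pick up other cities with “Vancouver” in the name, so it is worth checking the distinct values that matched.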