Challenge 10 Megathread

For any and all questions relating to challenge 10. :point_down:

For a tutorial on how to use Jupyter Notebook, we put together this video:

Still have questions? Read all the FAQs here.

If you would like to download the data set used in this challenge, click here.

To continue to play around with the datasets in a Jupyter environment, click here.

completed, and correctly - this built well on top of yesterday’s challenge!
But wondering one thing:
I’ve entered the 2020 rows manually, and would prefer to have it automatically look those up. How would you write code to search the column for the 2020 entries? Or is that beyond our scope at this point?

df['Total Revenue'][155:167]
4 Likes

A couple options:

  • extract the year out of the month column into a new “Year” column, then filter for 2020, use .apply for this to do some text manipulation
  • convert the whole Month column into proper datetime format using pd.to_datetime

Here are a couple of my solutions:
Method 1:

df[‘Total Revenue’] = df[‘Number of Cows’] * df[‘Monthly milk production: pounds per cow’] * df[‘Price_Per_Pound’]
df[‘Year’] = df[‘Month’].apply(lambda x: x.split("-")[0])
df[df[‘Year’]==‘20’][‘Total Revenue’].sum()

Method 2:

df[‘Total Revenue’] = df[‘Number of Cows’] * df[‘Monthly milk production: pounds per cow’] * df[‘Price_Per_Pound’]
df[‘Date’] = pd.to_datetime(df[‘Month’], format="%y-%b")
df[df[‘Date’].dt.year==2020][‘Total Revenue’].sum()

11 Likes

This is my solution


I found the year 2020 by checking if the value of month contains ‘20-’ would that be good solution?

7 Likes

looks fine to me - at least for the way the data is in this table.
I also like your use of the colon/comma after the total_revenue_2020 to get it to format as $202,149.76

2 Likes

excellent, thanks! i’m learning so many things! :smiley:

1 Like

I could’ve done better by using some date functions to work with dates, but I’m pretty happy with this how little code this ended up being:

eligible_year = df['Month'].str[:2] == '20'
df[eligible_year]['Total Revenue'].sum()
2 Likes

For this, and challenge 7, I used the string split function to create a new column for the year, and then grouping by that:

df[['Year','Month']] = df.Month.str.split("-",expand=True,)
df.groupby(by=['year']).sum()

Does anyone else find it a little weird that the milk is measured in pounds, instead of Litres ?

3 Likes

Thanks, I googled it, I didn’t know how to do that :wink:

1 Like

I like your way better than mine. Its so clean

1 Like

The .apply() with a lambda in this case was probably overkill. I usually use that for more complex string manipulation. The way conste11ations did it above is perfectly fine, using str[:2].

In any case I usually like to convert my “text” dates into real dates so I can later filter by year, month, day, etc. so method 2 is probably better.

After some research and finding the .str(x:y) and seeing thats it includes x and is up to but not including y I used it to make a new series for the year. I was a little confused before I found out that little detail because i assumed that it included the next number :fearful:

Summary
  import pandas as pd

df = pd.read_csv('milk_32.csv')
df = df.drop(columns = ['Unnamed: 0'])

#rename this because its too long "inplace=True" so it changes actual column
df.rename(columns={"Monthly milk production: pounds per cow":"Milk/Cow"}, inplace=True)

#maths to fill columns
df['Total Milk Production'] = df['Milk/Cow'] * df['Number of Cows']
df['Total Revenue'] = df['Total Milk Production'] * df['Price_Per_Pound']

#make new column for year
df['Year'] = df['Month'].str[:2]

#Add df['Total Revenue'] WHERE  df['Year'] == 20
profit = df['Total Revenue'][df['Year'] == '20'].sum()
print("Total Revenue for 2020: ${}".format(profit))
1 Like

So simple, when i googled it stack overflow wanted to import itertools or loop over the number

A solution leveraging the given hint:

df['Total Milk Production'] = df['Monthly milk production: pounds per cow']*df['Number of Cows']
df['Total Revenue'] = df['Total Milk Production']*df['Price_Per_Pound']
requested_revenue = df['Total Revenue'][155:167].sum()
print(requested_revenue)
3 Likes

The solution is given below:

df['Total Revenue'] = df['Number of Cows'] * df['Monthly milk production: pounds per cow'] * df['Price_Per_Pound']
df['Date'] = pd.to_datetime(df['Month'], format="%y-%b")
df.loc[df['Date'].dt.year == 2020, 'Total Revenue'].sum()
1 Like

This is the fastest method:

df.loc[df['Month'].str.slice(0,2) == '20', 'Total Revenue']

4 Likes

Alternative method compared to slicing:

df[‘Year’] = df[‘Month’].str[0:2]
df2 = df.groupby(‘Year’)[‘Total Revenue’].sum()

You can then take a look at what the revenue value is in df2.

2 Likes

This definitely works for this data.

Working with dates in a nonstandard string format as the dataframe provides is bad practice, and it’s kind of weird that that’s how they’ve given the data to us. The month field should either be a real date/datetime object or there should be separate year and month fields, both as integers. With columns like that, it would be trivial to slice for what you want. Of course, in the real world, you’re going to get messy data like this all the time, but that’s why the first step is always data cleaning. Manually slicing the dataframe by index after inspecting it with tail, like they suggest in the hint, is bad practice.

3 Likes