Challenge 9 Megathread

For any and all questions relating to challenge 9. :point_down:

For a tutorial on how to use Jupyter Notebook, we put together this video:

Still have questions? Read all the FAQs here.

If you would like to download the data set used in this challenge, click here.

To continue to play around with the datasets in a Jupyter environment, click here.

1 Like

Okay here we go, you’re actually making me read the docs now.

11 Likes

This was surprisingly easy. If you follow the two lines of code and the methods they describe in the explanation, getting to the solution is pretty straightforward. This was one of their easier and less long-winded problems.

Here’s what I did in case you want to know:

Summary

df.median()
df['Monthly milk production: pounds per cow'] = df['Monthly milk production: pounds per cow'].fillna(value = 755.5)
df['Number of Cows'] = df['Number of Cows'].fillna(method = 'ffill')
df.describe()

6 Likes

I think this challenge is good, but I find the language imprecise. It is not the actual rows that are missing, but cells, i.e. values within the rows.
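
To make the rows-versus-cells distinction concrete, here is a minimal sketch. The four-row table is made up for illustration and only borrows the column names from the challenge; it is not the real milk_2.csv data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the challenge data (assumption, not the real file)
df = pd.DataFrame({
    "Monthly milk production: pounds per cow": [589.0, np.nan, 640.0, 656.0],
    "Number of Cows": [10.0, 10.0, np.nan, 11.0],
})

# Every row is still present; only individual cells are empty
total_rows = len(df)

# Count the empty cells across the whole table
missing_cells = int(df.isna().sum().sum())

# Count the rows that contain at least one empty cell
rows_with_gap = int(df.isna().any(axis=1).sum())

print(total_rows, missing_cells, rows_with_gap)
```

Here all four rows exist, but two of them each have one empty cell, which is what fillna() and ffill() then fill in.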

10 Likes

I also found this fairly easy, needed a little extra googling to get the ffill to work though.
I forgot that the describe function gives all the basic analytics, and did them all separately.

I am curious why df.ffill() didn’t work on its own, but did once I added inplace=True?

df.median()
df['Monthly milk production: pounds per cow'] = df['Monthly milk production: pounds per cow'].fillna(value = 755.5)
df.ffill(inplace=True)
df['Monthly milk production: pounds per cow'].mean()
df['Monthly milk production: pounds per cow'].std()
df['Number of Cows'].mean()

1 Like

This was a nice challenge. Made me read up on a few docs.

My solution:

monthly_milk_median = df['Monthly milk production: pounds per cow'].median()
print("Monthly Milk Production Median: " + str(monthly_milk_median))

df['Monthly milk production: pounds per cow'] = df['Monthly milk production: pounds per cow'].fillna(value = monthly_milk_median)
df['Number of Cows'] = df['Number of Cows'].fillna(method = 'ffill')

df.isnull().sum(axis = 0) # check to make sure the fills worked

print("The average monthly milk production is " + str(df['Monthly milk production: pounds per cow'].mean()) + " pounds per cow.")
print("The stdev for monthly milk production is " + str(df['Monthly milk production: pounds per cow'].std()) + " pounds per cow.")
print("The avg number of cows used is " + str(df['Number of Cows'].mean()) + ".")
1 Like

After much trial and error to get the gist of everything, here’s where I’m at.

Spoilers Ahead!

Summary
import pandas as pd

df = pd.read_csv('milk_2.csv')

milk = "Monthly milk production: pounds per cow"

median_milk = df[milk].median()
df[milk] = df[milk].fillna(value = median_milk)

df['Number of Cows'] = df['Number of Cows'].ffill(axis = 0)

print("Q1: The average monthly milk prod is {}".format(round(df[milk].mean(), 4)))
print("Q2: The standard deviation for monthly milk prod is {}".format(round(df[milk].std(), 4)))
print("Q3: The average number of cows is {}".format(round(df['Number of Cows'].mean(), 4)))

When you call the ffill method on a dataframe, the method returns a filled copy and leaves the original unchanged. When you pass True to the “inplace” parameter, it modifies the dataframe itself instead. Alternatively, you could have used the fillna() dataframe method with “ffill” (as a string) passed as the “method” parameter.

Many dataframe methods end up returning a copy, so you can either write over the original dataframe like so
df = df.method()
or use the “inplace” parameter like so
df.method(inplace = True)
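
As a rough illustration of the copy-versus-in-place behaviour, here is a minimal sketch on a made-up single-column dataframe (the values are assumptions, not the challenge data):

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the challenge file
df = pd.DataFrame({"Number of Cows": [10.0, np.nan, 12.0, np.nan]})

# Calling ffill() without keeping the result leaves df unchanged
df.ffill()
missing_after_bare_call = int(df["Number of Cows"].isna().sum())

# Assigning the returned copy back is what actually updates df
df = df.ffill()
missing_after_assignment = int(df["Number of Cows"].isna().sum())

print(missing_after_bare_call, missing_after_assignment)
```

The bare call still leaves two missing values, while the reassigned version leaves none. As far as I know, recent pandas releases (2.1 and later) also warn that fillna(method='ffill') is deprecated in favour of calling ffill() directly, so the reassignment form may be the safer habit.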

2 Likes

Clean and simple. Follow the code in the challenge setup and finish with df.describe() to have a clean stat overview of the filled-in table.
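
A minimal sketch of that flow, assuming a tiny made-up table with shortened, hypothetical column names ("milk" and "cows") rather than the real milk_2.csv:

```python
import numpy as np
import pandas as pd

# Made-up table; "milk" and "cows" are hypothetical short column names
df = pd.DataFrame({
    "milk": [600.0, np.nan, 700.0, 800.0],
    "cows": [10.0, np.nan, 12.0, 12.0],
})

# Fill the milk gap with the column median, then forward-fill the cow counts
df["milk"] = df["milk"].fillna(df["milk"].median())
df["cows"] = df["cows"].ffill()

# describe() returns count, mean, std, min, quartiles and max per column
summary = df.describe()

milk_mean = summary.loc["mean", "milk"]
milk_std = summary.loc["std", "milk"]
cows_mean = summary.loc["mean", "cows"]

print(summary)
```

The three answers the challenge asks for can be read straight off the "mean" and "std" rows of the summary instead of being computed one at a time.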

Wasn’t so hard today.

import pandas as pd
df = pd.read_csv('milk_2.csv')
# Median of the milk column, used to fill its missing values
median_number = df['Monthly milk production: pounds per cow'].median()
df['Monthly milk production: pounds per cow'] = df['Monthly milk production: pounds per cow'].fillna(value = median_number)
df['Number of Cows'] = df['Number of Cows'].fillna(method= 'ffill')
df['Monthly milk production: pounds per cow'].mean()
df['Monthly milk production: pounds per cow'].std()
df['Number of Cows'].mean()
2 Likes

Thanks for sharing what inplace=True is. I feel like the pandas docs don’t really explain what all the parameters are that well. I’ve had better luck with here and Stack Overflow :sleepy:

2 Likes

I just wish the one column name wasn’t so long. For these past few examples I’ve resorted to making a variable with it as the string so I didn’t have to keep typing it out :skull_and_crossbones:

3 Likes

Anyone looking to use the data file locally can find it here.

You can access the online datafile directly like this:

import pandas as pd
url = 'https://gist.githubusercontent.com/pbeens/8332f72c84a2f21b77ac116ba0da0eec/raw/c5e2bdc7a3de071fed46f575af62ecec5d65a087/LL-21DDC2021-milk_2.csv'
df = pd.read_csv(url)
2 Likes

I wanted to limit typing out the column names as well in my experimentation, so I added a little line to adjust mine at the top (took the idea from someone yesterday):

import pandas as pd
df = pd.read_csv('milk_2.csv')
col = df.columns.values
df[col[2]]=df[col[2]].ffill()
df[col[1]]=df[col[1]].fillna(value = df[col[1]].median())

print("Q1.",round(df[col[1]].mean(),4),"(Average",col[1],")")
print("Q2.",round(df[col[1]].std(),4),"(Standard Deviation",col[1],")")
print("Q3.",round(df[col[2]].mean(),4),"(Average",col[2],")")

3 Likes

We have also included links to the data files, as well as a Jupyter environment to play around with the datasets, at the top of each challenge megathread.

1 Like

Video Solution: https://youtu.be/wC7aj9dpkUY

3 Likes

Pandas dataframes rock!

I noticed that df.isnull().sum() works the exact same as df.isnull().sum(axis = 0). Can someone explain what the axis=0 was supposed to do and why it might have been included in the example?
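
For anyone wondering the same thing, here is a small sketch of what axis does in sum(), using a made-up table with hypothetical column names. For DataFrame.sum, axis=0 is the default, so writing it out only makes the direction explicit.

```python
import numpy as np
import pandas as pd

# Made-up table with a few missing cells (hypothetical column names)
df = pd.DataFrame({
    "milk": [589.0, np.nan, 640.0],
    "cows": [np.nan, np.nan, 12.0],
})

missing = df.isnull()

# axis=0 sums down each column; it is also the default
per_column_default = missing.sum()
per_column_explicit = missing.sum(axis=0)

# axis=1 sums across each row instead
per_row = missing.sum(axis=1)

print(per_column_explicit)
print(per_row)
```

The first two results are identical (missing values per column), while axis=1 gives the number of missing values in each row.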