Simple Example of Scatter Plots in Python, using Crop Yields data

Scatter Plots and Crop Yields

Scatter plots help us visualize "blobby" (clustered) data, which helps us think about grouping: for example, whether unsupervised clustering such as K-means would find clear groups, or whether a distance-based classifier such as K-Nearest Neighbors (a supervised method) would separate the classes well.
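A minimal sketch of the grouping idea: plain K-means (Lloyd's algorithm) in NumPy on two made-up blobs, not on the crop data. The function names and the farthest-point initialization are illustrative choices; in practice a library implementation such as scikit-learn's `KMeans` would be the usual option.

```python
import numpy as np

def init_centroids(x, k, seed=0):
    """Farthest-point initialization: start from a random point, then repeatedly
    add the point that is farthest from every centroid chosen so far."""
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        dist_to_nearest = np.min([np.linalg.norm(x - c, axis=1) for c in centroids], axis=0)
        centroids.append(x[dist_to_nearest.argmax()])
    return np.array(centroids)

def kmeans(x, k, n_iter=20, seed=0):
    """Lloyd's algorithm: alternate between assigning each point to its nearest
    centroid and moving each centroid to the mean of its assigned points."""
    centroids = init_centroids(x, k, seed)
    for _ in range(n_iter):
        # Distances from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to its cluster mean (keep it if the cluster is empty)
        centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

# Two made-up, well-separated blobs of 50 points each
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(points, k=2)
```

Passing `c=labels` to a scatter plot of `points[:, 0]` against `points[:, 1]` would color each blob by its assigned cluster.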

Scatter plots are also useful for supervised learning, because they show correlation between variables, which can guide feature and target selection. They can also reveal features that are correlated with each other, which matters: if one column can be calculated from another, we should remove one of the two from our regression analysis. If a plot shows an exponential relationship, a logarithmic transform can make it roughly linear.
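A minimal sketch of both checks on made-up data: `DataFrame.corr()` flags a column that is simply another column times two, and `np.log` turns an exponential column into one that is exactly linear in `x`. The column names and the 0.99 threshold are illustrative assumptions, not values from this post.

```python
import numpy as np
import pandas as pd

# Made-up example data (not from the Kaggle file)
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 50)
df = pd.DataFrame({
    "x": x,
    "x_doubled": 2 * x,           # can be calculated exactly from x
    "noise": rng.normal(size=50),  # unrelated column
    "y_exp": np.exp(0.5 * x),      # grows exponentially with x
})

# Pairwise Pearson correlations between all columns
corr = df.corr()

# Flag feature pairs that are (almost) perfectly correlated
threshold = 0.99
cols = corr.columns
redundant = [(a, b) for i, a in enumerate(cols) for b in cols[i + 1:]
             if abs(corr.loc[a, b]) > threshold]

# A log transform turns the exponential curve into a straight line in x
df["log_y"] = np.log(df["y_exp"])
```

Dropping one column of each flagged pair (here `x_doubled`) before fitting a regression avoids giving the model two copies of the same information.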

In this example, I look at some Kaggle data on crop yields.

First, I import the data using read_csv, then clean the data to get to a more concise data set.
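One likely (but assumed) source of columns such as "Unnamed: 12": when a CSV header contains an empty field, for example because of a trailing comma, pandas names that column "Unnamed: <position>". The sketch below uses a small in-memory CSV with assumed column names in place of the Kaggle file.

```python
import io

import pandas as pd

# A tiny stand-in for the Kaggle file; column names are assumed for illustration.
# The trailing comma on every line adds an extra, nameless field.
csv_text = """Rain Fall (mm),Fertilizer,Yield,
1230.0,80.0,12.0,
480.0,60.0,8.0,
"""

crop_df = pd.read_csv(io.StringIO(csv_text))
print(crop_df.columns.tolist())
```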

I need to remove several empty columns, which I do using the syntax:

del crop_df["Unnamed: 12"]
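When a file has many empty columns, deleting them one at a time with `del` gets repetitive. A sketch that drops every column whose name starts with "Unnamed" in a single call, on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two empty "Unnamed" columns left over from the CSV
crop_df = pd.DataFrame({
    "Rainfall": [1230.0, 480.0],
    "Yield": [12.0, 8.0],
    "Unnamed: 12": [np.nan, np.nan],
    "Unnamed: 13": [np.nan, np.nan],
})

# Collect every column whose name starts with "Unnamed" and drop them together
unnamed_cols = [c for c in crop_df.columns if c.startswith("Unnamed")]
crop_df = crop_df.drop(columns=unnamed_cols)

# Alternative: drop any column that is entirely empty, whatever its name
# crop_df = crop_df.dropna(axis=1, how="all")
```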

I rename several columns using the syntax:
crop_df = crop_df.rename(columns={'Rain Fall (mm)':'Rainfall'})
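`rename` accepts a dict with any number of old-to-new pairs, so several columns can be renamed in one call. Apart from 'Rain Fall (mm)', the raw column names below are hypothetical.

```python
import pandas as pd

# Hypothetical raw column names; only 'Rain Fall (mm)' appears in this post
crop_df = pd.DataFrame(columns=["Rain Fall (mm)", "Nitrogen (N)", "Phosphorus (P)", "Yield (Q/acre)"])

# One dict maps each long original name to a shorter one
new_names = {
    "Rain Fall (mm)": "Rainfall",
    "Nitrogen (N)": "Nitrogen",
    "Phosphorus (P)": "Phosphorus",
    "Yield (Q/acre)": "Yield",
}
crop_df = crop_df.rename(columns=new_names)
```

By default, `rename` silently ignores keys that are not column names; passing `errors="raise"` turns a misspelled key into an error instead.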

I use dropna() to remove rows with missing values.
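A small sketch, on made-up data, contrasting the default `dropna()`, which removes any row with at least one missing value, with `subset=`, which only checks the listed columns:

```python
import numpy as np
import pandas as pd

crop_df = pd.DataFrame({
    "Rainfall": [1230.0, np.nan, 480.0, 650.0],
    "Nitrogen": [80.0, 70.0, np.nan, 60.0],
    "Yield": [12.0, 9.5, 8.0, np.nan],
})

# Drop every row that has a missing value in any column (the default how="any")
complete_rows = crop_df.dropna()

# Only require the Yield column to be present
has_yield = crop_df.dropna(subset=["Yield"])
```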

I make a binary (categorical) column to flag rows with higher- or lower-than-average yield:

crop_df['Yield_Mean'] = np.where(crop_df['Yield'] > 9, 1, 0)

This places a 1 in the new column where the yield is greater than 9 (the cutoff used here for "above average") and a 0 otherwise. It assumes NumPy has been imported as np.
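The cutoff of 9 is hard-coded. If the intent is "above the average yield", one option is to compute the mean from the data itself, as in this sketch on made-up values:

```python
import numpy as np
import pandas as pd

crop_df = pd.DataFrame({"Yield": [6.0, 8.0, 10.0, 12.0, 14.0]})  # made-up values

# Use the column's actual mean as the cutoff instead of a fixed number
mean_yield = crop_df["Yield"].mean()
crop_df["Yield_Mean"] = np.where(crop_df["Yield"] > mean_yield, 1, 0)
```

`(crop_df["Yield"] > mean_yield).astype(int)` produces the same 0/1 column without NumPy.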

To create a scatter plot, I use:

crop_df.plot.scatter(x='Rainfall', y='Yield', legend=False)

I repeat this with other columns, including the nutrients:

crop_df.plot.scatter(x='Nitrogen', y='Yield', legend=False)
crop_df.plot.scatter(x='Phosphorus', y='Yield', legend=False)
crop_df.plot.scatter(x='Fertilizer', y='Yield', legend=False)
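Instead of producing each plot separately, the same scatter plots can be drawn side by side by passing each subplot's Axes through the `ax` parameter of `plot.scatter`. A sketch on a few made-up rows (the column names follow this post; the values do not come from the Kaggle data):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows using the column names from the post
crop_df = pd.DataFrame({
    "Rainfall":   [480.0, 650.0, 900.0, 1230.0],
    "Nitrogen":   [60.0, 70.0, 75.0, 80.0],
    "Phosphorus": [18.0, 20.0, 22.0, 25.0],
    "Fertilizer": [50.0, 55.0, 70.0, 80.0],
    "Yield":      [8.0, 9.0, 10.5, 12.0],
})

features = ["Rainfall", "Nitrogen", "Phosphorus", "Fertilizer"]
fig, axes = plt.subplots(1, len(features), figsize=(16, 4))

# One scatter per feature, each against Yield, drawn into its own subplot
for feature, ax in zip(features, axes):
    crop_df.plot.scatter(x=feature, y="Yield", ax=ax, legend=False)
    ax.set_title(f"Yield vs. {feature}")

fig.tight_layout()
# plt.show()  # uncomment in an interactive session
```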


These plots help us spot correlations between each column and the yield.
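To put a number on what the scatter plots show, `DataFrame.corr()` computes pairwise Pearson correlations. A sketch on made-up rows that ranks each column by the strength of its correlation with Yield (Pearson correlation only captures linear relationships):

```python
import pandas as pd

crop_df = pd.DataFrame({
    "Rainfall":   [480.0, 650.0, 900.0, 1230.0, 700.0],
    "Nitrogen":   [60.0, 70.0, 75.0, 80.0, 65.0],
    "Fertilizer": [80.0, 50.0, 70.0, 55.0, 60.0],
    "Yield":      [8.0, 9.0, 10.5, 12.0, 9.5],
})  # made-up values

# Correlation of every column with Yield, strongest (by absolute value) first
yield_corr = (
    crop_df.corr()["Yield"]
    .drop("Yield")
    .sort_values(key=abs, ascending=False)
)
print(yield_corr)
```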

We can also color the points by the high/low yield flag, which shows two variables and the yield category in one plot:

import matplotlib.pyplot as plt

# Color the points by the value of Yield_Mean (1 = high yield, 0 = low yield)
crop_df.plot.scatter(x='Phosphorus', y='Nitrogen',
                     c=['C0' if c == 1 else 'C1' for c in crop_df.Yield_Mean])

# Plot the low-yield points (Yield_Mean == 0) first, then the high-yield points (1)
# Setting color to 'none' gives open circles; edgecolor sets the outline color
_, ax = plt.subplots()

for cat_value, color in [(0, 'C1'), (1, 'C0')]:
    subset_df = crop_df[crop_df.Yield_Mean == cat_value]
    ax.scatter(subset_df.Phosphorus, subset_df.Nitrogen, color='none', edgecolor=color)

ax.set_xlabel('Phosphorus')
ax.set_ylabel('Nitrogen')
ax.legend(["Low Yield", "High Yield"])

plt.show()
#python #plots #graph #scatter					
