Simple Example of Scatter Plots in Python, using Crop Yields data

Scatter Plots and Crop Yields

Scatter plots can help us visualize blobby data, which helps us think about grouping: unsupervised clustering such as K-means, or neighbor-based methods such as K-Nearest Neighbors (strictly a supervised method, since it predicts from labeled neighbors). Scatter plots are also useful for supervised learning, because they let us look at correlation among variables, which helps with feature and target selection. They can also reveal features that are strongly correlated with each other, which is important: if we can calculate one column from another column in the data, we should remove one of the two columns from our regression analysis. If a plot looks exponential, we can apply a logarithmic transformation to make the relationship closer to linear.

In this example, I look at some Kaggle data on crop yields. First, I import the data using read_csv, then clean it to get a more concise data set. I remove several empty columns, using the syntax:

    del crop_df["Unnamed: 12"]

I rename several columns, using the syntax:

    crop_df = crop_df.rename(columns={'Rain Fall (mm)': 'Rainfall'})

I use dropna() to remove rows with missing values.

I make a categorical column to indicate rows with higher and lower than average yield:

    crop_df['Yield_Mean'] = np.where(crop_df['Yield'] > 9, 1, 0)

This places a 1 in the new column where the yield is greater than 9, and a 0 everywhere else.

To create a scatter plot, I use:

    crop_df.plot.scatter(x='Rainfall', y='Yield', legend=False)

I try several iterations of this with different columns, including the nutrients:

    crop_df.plot.scatter(x='Nitrogen', y='Yield', legend=False)
    crop_df.plot.scatter(x='Phosphorus', y='Yield', legend=False)
    crop_df.plot.scatter(x='Fertilizer', y='Yield', legend=False)

This helps us find correlations across columns.
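The scatter plots give a visual sense of correlation. A minimal sketch of how the same ideas could be checked numerically with pandas is below. It assumes a small synthetic DataFrame (hypothetical values, not the Kaggle data) in which Rainfall_cm is deliberately computable from Rainfall and Yield deliberately grows exponentially with Nitrogen. DataFrame.corr() computes pairwise Pearson correlations; the 0.95 cutoff for "nearly redundant" is an arbitrary assumption; and comparing the correlation before and after np.log shows how a log transform can straighten an exponential relationship.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for crop_df: synthetic values, NOT the Kaggle data.
rng = np.random.default_rng(0)
n = 200
rainfall = rng.uniform(400, 1300, n)
nitrogen = rng.uniform(60, 90, n)
df = pd.DataFrame({
    "Rainfall": rainfall,
    "Nitrogen": nitrogen,
    # Deliberately redundant: the same rainfall in centimetres, computable from Rainfall.
    "Rainfall_cm": rainfall / 10,
    # Deliberately exponential in Nitrogen, with about 5% multiplicative noise.
    "Yield": np.exp(0.1 * nitrogen) * rng.lognormal(0.0, 0.05, n),
})

# Pearson correlation between every pair of columns.
corr = df.corr()
print(corr.round(2))

# Among the features (not the target), flag pairs with |r| > 0.95 as nearly redundant;
# one column of each flagged pair is a candidate to drop before a regression.
features = ["Rainfall", "Nitrogen", "Rainfall_cm"]
redundant = [
    (a, b)
    for i, a in enumerate(features)
    for b in features[i + 1:]
    if abs(corr.loc[a, b]) > 0.95
]
print(redundant)  # -> [('Rainfall', 'Rainfall_cm')]

# An exponential relationship is curved, so its linear correlation is weaker;
# taking the log of Yield brings it close to a straight line.
raw_r = df["Nitrogen"].corr(df["Yield"])
log_r = df["Nitrogen"].corr(np.log(df["Yield"]))
print(f"r(Nitrogen, Yield) = {raw_r:.3f}, r(Nitrogen, log Yield) = {log_r:.3f}")
```

On the real data, the same calls could be applied to crop_df after cleaning; where to draw the line for "too correlated" remains a judgment call.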
We can create a colorful plot of multiple values. The first plot colors each point by the value of Yield_Mean; the second draws the low-yield points first and then the high-yield points, as open circles with a legend:

    import matplotlib.pyplot as plt

    # Color the points by the value of Yield_Mean ('C0' = high yield, 'C1' = low yield)
    crop_df.plot.scatter(x='Phosphorus', y='Nitrogen',
                         c=['C0' if c == 1 else 'C1' for c in crop_df.Yield_Mean])

    # Plot first the data points with Yield_Mean of 0, then those with Yield_Mean of 1.
    # Setting color to 'none' gives open circles.
    _, ax = plt.subplots()
    for catValue, color in (0, 'C1'), (1, 'C0'):
        subset_df = crop_df[crop_df.Yield_Mean == catValue]
        ax.scatter(subset_df.Phosphorus, subset_df.Nitrogen, color='none', edgecolor=color)
    ax.set_xlabel('Phosphorus')
    ax.set_ylabel('Nitrogen')
    ax.legend(["Low Yield", "High Yield"])
    plt.show()
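Instead of calling plot.scatter once per column pair, pandas can draw every pairwise scatter plot in one grid with pandas.plotting.scatter_matrix. The sketch below assumes a synthetic stand-in for the cleaned crop_df (hypothetical values and column choice), and, as a variation on the fixed cutoff of 9, computes the Yield_Mean flag from the data's own mean yield. Points use the same 'C0'/'C1' coloring as the plots above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line to display plots interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical stand-in for the cleaned crop_df: synthetic values, not the Kaggle data.
rng = np.random.default_rng(1)
n = 150
crop_df = pd.DataFrame({
    "Rainfall": rng.uniform(400, 1300, n),
    "Nitrogen": rng.uniform(60, 90, n),
    "Phosphorus": rng.uniform(15, 25, n),
})
# Synthetic yield that rises with rainfall and nitrogen, plus noise.
crop_df["Yield"] = (0.005 * crop_df["Rainfall"]
                    + 0.05 * crop_df["Nitrogen"]
                    + rng.normal(0, 0.5, n))

# Flag above-average yield using the data's own mean instead of a fixed cutoff.
crop_df["Yield_Mean"] = np.where(crop_df["Yield"] > crop_df["Yield"].mean(), 1, 0)

# Same colors as before: 'C0' for high-yield rows, 'C1' for low-yield rows.
colors = ["C0" if c == 1 else "C1" for c in crop_df["Yield_Mean"]]

# One scatter plot per pair of columns, with histograms on the diagonal.
axes = scatter_matrix(
    crop_df[["Rainfall", "Nitrogen", "Phosphorus", "Yield"]],
    c=colors, alpha=0.7, figsize=(8, 8), diagonal="hist",
)
plt.savefig("crop_scatter_matrix.png")  # or plt.show() with an interactive backend
```

scatter_matrix returns a grid of matplotlib Axes (4 x 4 here), so individual panels can still be relabeled or adjusted afterwards.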