This tutorial assumes you have basic knowledge and understanding of;
Matplotlib is a Python library for visualizing data, and it is widely used in machine learning work.
With it we can plot histograms, line graphs, pie charts, scatter plots, and more.
Let's dive right into using matplotlib.
We'll start by importing the libraries we need: pandas, NumPy, and Matplotlib.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Let's start off with some basic examples.
Let's say that we want to plot a curve of x-squared.
We can start by creating a list of x values and then computing the square of each one.
x = list(range(-5, 6))
y = [j ** 2 for j in x]
We can then plot a graph of y against x.
plt.title('A graph of x-squared') # This gives a title to our graph
plt.xlabel('x-values') # This gives a label to the horizontal axis of our graph
plt.ylabel('y-values') # This gives a label to the vertical axis of our graph
plt.plot(x, y)
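With only eleven integer points, the curve above is drawn as straight segments between neighbouring values. Here is a minimal sketch of a smoother version, assuming it is acceptable to sample x with NumPy's linspace. The names x_smooth and y_smooth and the choice of 200 points are illustrative and not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

# 200 evenly spaced x values between -5 and 5 (the count is an arbitrary choice)
x_smooth = np.linspace(-5, 5, 200)
y_smooth = x_smooth ** 2  # element-wise square of the whole array

plt.plot(x_smooth, y_smooth)
plt.title('A graph of x-squared')
plt.xlabel('x-values')
plt.ylabel('y-values')
```

Using a NumPy array instead of a list lets us square every value in one expression, without a list comprehension.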
Let's say we want to gain some insight into how a student spends their day.
We can begin by collecting some data, say for a week, recording the hours the student spends sleeping, eating, working and playing each day.
We can create a dataframe from such data.
df = pd.DataFrame({
'sleeping': [7,8,5,9,7,10,9],
'eating': [3,3,4,3,2,1,1],
'working': [10,8,7,10,9,2,2],
'playing': [4,1,2,3,4,6,6]
}, index = [1,2,3,4,5,6,7]
)
# We can then give our index a name
df.index.name = 'days'
df
Raw data like this is hard to interpret. We can, however, draw a stack plot to visualize it.
To do this, we use the stackplot(x, y1, y2, ...) function, passing the x values first and then each series of y values to be stacked.
plt.figure(figsize=(10,5)) # This creates the figure on which the stackplot is drawn
plt.stackplot(
df.index, df['sleeping'],
df['eating'], df['working'], df['playing'],
labels=['Sleeping', 'Eating', 'Working', 'Playing'] # The labels help us show the legend on the stack plot
)
plt.title('How the student spends each day')
plt.ylabel('Hours')
plt.xlabel('Days')
plt.legend(loc='upper right') # This shows the legend on the top-right corner of the graph.
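Listing every column by hand gets tedious as the number of activities grows. As a sketch of an alternative, stackplot also accepts a single 2D array in which each row is one series, so the transposed DataFrame can be passed in one go. The names hours and layers are illustrative. The DataFrame is rebuilt here so that the snippet runs on its own.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'sleeping': [7, 8, 5, 9, 7, 10, 9],
    'eating': [3, 3, 4, 3, 2, 1, 1],
    'working': [10, 8, 7, 10, 9, 2, 2],
    'playing': [4, 1, 2, 3, 4, 6, 6]
}, index=[1, 2, 3, 4, 5, 6, 7])

# Transpose so that each row of the array is one activity across the seven days
hours = df.T.to_numpy()

plt.figure(figsize=(10, 5))
# One call stacks all four activities; the column names become the legend labels
layers = plt.stackplot(df.index, hours, labels=list(df.columns))
plt.legend(loc='upper right')
```

This draws the same stack as the explicit version, with the layers in the DataFrame's column order.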
We can go ahead and visualize how the student spends a single day using a pie chart.
Each activity on the chosen day becomes one slice of the pie.
So we select the data for a given day and pass it to the pie() function.
plt.figure(figsize=(5,5))
plt.pie(
x=df.loc[3], # This sets the x values to the values of day 3
labels=df.columns, # This sets the column headers as the labels of the pie chart
startangle=90, # This rotates the start of the first slice 90 degrees counterclockwise from the x-axis. It is optional.
shadow=True, # This draws a shadow beneath the pie chart. It is optional.
explode=(0.1,0,0.1,0), # This pulls the first and third slices away from the centre. It is optional.
autopct='%1.1f%%' # This adds the percentages in the slices.
)
plt.title('How the student spends a day')
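The percentages that autopct prints are each slice's share of the day's total. As a quick check, here is a minimal sketch that computes them by hand for day 3, using the day 3 values from the DataFrame above. The names day3 and shares are illustrative.

```python
import pandas as pd

# Hours recorded on day 3 in the DataFrame above
day3 = pd.Series({'sleeping': 5, 'eating': 4, 'working': 7, 'playing': 2})

# Each slice is the activity's hours divided by the day's total, as a percentage
shares = day3 / day3.sum() * 100
print(shares.round(1))  # sleeping 27.8, eating 22.2, working 38.9, playing 11.1
```

The slices always add up to 100% of the hours recorded for that day, not of a full 24 hours.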
A scatter plot can be useful for revealing the correlation between two variables. For the scatter plot, let us import a dataset containing a company's salaries for employees at different position levels.
You can download the dataset we are going to use.
df = pd.read_csv(
'Position_Salaries.csv'
)
Let's take a glance at how our data looks.
# This time, let us look at the first 3 rows
df.head(3)
We can get interesting insights by visualizing this dataset.
Let's find out how the salaries relate to the position levels.
This can be achieved by plotting the salary against the level.
We use the scatter(x, y) function and provide the relevant x and y values. Note that we are using matplotlib.pyplot, which we imported as plt.
plt.figure(figsize=(10, 5))
plt.scatter(df['Level'], df['Salary'])
This is a very interesting insight. We can easily see that the salary increases as the position level increases in that company.
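A scatter plot shows the trend visually. To put a number on it, one option is the Pearson correlation coefficient, which pandas computes with Series.corr. The sketch below uses a few made-up placeholder rows with the same Level and Salary column names, because the real contents of Position_Salaries.csv are not reproduced here. The names sample and r are illustrative.

```python
import pandas as pd

# Made-up placeholder rows, NOT the real Position_Salaries.csv values
sample = pd.DataFrame({
    'Level': [1, 2, 3, 4, 5],
    'Salary': [40000, 50000, 65000, 90000, 130000],
})

# Series.corr uses the Pearson method by default; values near 1 mean a strong positive linear relationship
r = sample['Level'].corr(sample['Salary'])
print(round(r, 3))
```

With the real dataset loaded, the same call would be df['Level'].corr(df['Salary']).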
We can also plot bar graphs.
plt.figure(figsize=(10, 5))
# Let's create the first set of bars
x = [2,4,6,8,10]
y = [2,3,1,4,5]
# Let's create the second set of bars
x2 = [1,3,5,9,7]
y2 = [7,8,2,5,2]
plt.bar(x,y,label='Bars1') # This plots the first set of bars
plt.bar(x2,y2,label='Bars2') # This plots the second set of bars
plt.xlabel('x-values')
plt.ylabel('y-values')
plt.title('Interesting bar graph\nCheck it out')
plt.legend()
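In the example above the two sets of bars never collide, because they sit at different x positions. If both series shared the same x positions, the second plt.bar call would draw over the first. One common workaround, sketched here under the assumption that side-by-side bars are what we want, is to shift each series by half a bar width. The names categories and width and the value 0.4 are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = np.arange(5)  # positions 0..4 for five hypothetical groups
y = [2, 3, 1, 4, 5]
y2 = [7, 8, 2, 5, 2]
width = 0.4                # assumed bar width; two bars then span 0.8 of each slot

plt.figure(figsize=(10, 5))
# Shift the first series left and the second right so the bars sit side by side
bars1 = plt.bar(categories - width / 2, y, width=width, label='Bars1')
bars2 = plt.bar(categories + width / 2, y2, width=width, label='Bars2')
plt.xticks(categories)     # keep one tick per group, centred between the two bars
plt.legend()
```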