[1,5]: Communication

The line of code you are going to write most often is:

import matplotlib.pyplot as plt

That gives you access to the world of matplotlib.
Plotting is an art, just like person-to-person communication. How to present one's views to an outside audience could fill an entire degree course. Always try to put yourself in the shoes of whoever is going to see your plot.
This means that you should not take things for granted. You have put a lot of work into your data and have ended up internalizing things that are far from obvious to a first-time reader.
Here you will find the tools needed to build a foundation for drawing your data, which is the final part of this brief set of lessons.
Matplotlib is a graphics library for 2D plotting. It is mainly used through the pyplot module.
Repetita iuvant (repetition helps):

import matplotlib.pyplot as plt

This tells the Python interpreter to import the pyplot module of the matplotlib library and call it plt. The alias is just a convention, yet whenever you look for code snippets around the internet it will be imported like this. Always (just like “numpy as np” and “pandas as pd”).
The library is object-oriented, that is, you will be instantiating objects that are provided through pyplot.
The object hierarchy is the following:
Artist: almost everything is an Artist; this is the top-level object, which you will likely never instantiate directly.
Figure: it keeps track of its child AXES (not Axis).
Axes: a single plot, that is, the region of the image that holds the data space. An Axes contains two (or three, for 3D plots) Axis objects.
Axis: the axis objects of the plot. The tick locations are determined by a Locator and the tick labels by a Formatter. You will mostly use Axes methods to modify these elements, though.
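The hierarchy above can be checked directly from Python. The following is a minimal sketch, not part of the original lesson; the names fig, ax and checks are chosen only for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.ticker import Formatter, Locator

# plt.subplots() with no arguments creates one Figure holding one Axes
fig, ax = plt.subplots()

checks = {
    # Figure, Axes and Axis all derive from Artist
    "figure_is_artist": isinstance(fig, Artist),
    "axes_is_artist": isinstance(ax, Artist),
    "xaxis_is_artist": isinstance(ax.xaxis, Artist),
    # the Figure keeps track of its child Axes
    "axes_in_figure": ax in fig.axes,
    # each Axis places its ticks with a Locator and labels them with a Formatter
    "has_locator": isinstance(ax.xaxis.get_major_locator(), Locator),
    "has_formatter": isinstance(ax.xaxis.get_major_formatter(), Formatter),
}
print(checks)
```

Every entry of the dictionary should come out as True, which mirrors the list above: one Figure, its Axes, and one Axis object per direction.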
The simplest way to plot data is to call pyplot functions directly, without explicitly creating figures or axes.

 import matplotlib.pyplot as plt
 import numpy as np
 #Fixing the seed is needed for reproducibility when generating pseudo-random data
 np.random.seed(0)  #any fixed integer works; 0 is an arbitrary choice
 #Let's simulate some log-shaped data
 x = np.arange(1.0, 100.0, 5.0)
 y = np.log(x) + np.random.rand(len(x))
 plt.scatter(x, y)

Although this is already nice as it is, we can play with the options: adding a third parameter to the scatter function gives us control over the size of the dots.
The variable passed as size should be a single number or a list (or numpy array) with the same length as x and y.

 #Points whose x is a multiple of 6 get size 0 (they disappear); the others
 #are scaled according to their remainder modulo 6.
 s = x % 6 * 50
 plt.scatter(x, y, s)
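The third positional argument of scatter is the keyword argument s, and matplotlib interprets it as the marker area in points squared. A short sketch of the scalar and array forms follows; the data and the names uniform, varied and double_width are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1.0, 100.0, 5.0)
y = np.log(x)

fig, ax = plt.subplots()
# a scalar size applies to every marker
uniform = ax.scatter(x, y, s=40)
# an array gives one size per point; it must have the same length as x and y
sizes = np.linspace(10, 200, len(x))
varied = ax.scatter(x, y + 1, s=sizes)

# s is an area: to double the width of a marker, multiply its size by four
double_width = ax.scatter(x, y + 2, s=4 * 40)
```

Because s is an area, sizes that grow linearly look less dramatic than one might expect; keep this in mind when a size is meant to encode a quantity.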

Another useful parameter is alpha, which sets the transparency (the alpha channel) of the points. Combine it with the color (c) parameter and you have partial control over the RGBA values.

 plt.scatter(x, y, s, c='g', alpha=0.3)

However, a scalar alpha applies the same transparency to every point, in contrast to the size parameter, which can vary point by point. Older Matplotlib releases accept only a scalar here; recent ones (3.4 and later, as far as I know) also accept an array.
There is a way to take full control of this aspect that does not depend on the version, although it is not straightforward: pass an RGBA matrix to the color parameter.
The matrix must have shape (len(x), 4), with one row of RGBA values per point. Here is a way to plot individual alphas:

 rgba_colors = np.zeros((len(x),4))
 # for green the second column needs to be one
 rgba_colors[:,1] = 1.0
 # the fourth column is your alphas
 alphas= np.random.rand(len(x))
 rgba_colors[:, 3] = alphas
 plt.scatter(x, y, s, c=rgba_colors)

Individual alphas can enhance or reduce the focus on some elements based on a criterion of your choice.
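As an illustration of such a criterion, the sketch below makes each point more opaque the further it lies above the noise-free curve log(x). The criterion, the 0.2 to 1.0 alpha range and the variable names are assumptions made for this example only.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

rng = np.random.default_rng(0)  # fixed seed, arbitrary value
x = np.arange(1.0, 100.0, 5.0)
y = np.log(x) + rng.random(len(x))

# hypothetical criterion: distance of each point from the noise-free curve
residual = y - np.log(x)

# start from the same green for every point: one RGBA row per point
rgba_colors = np.tile(to_rgba("green"), (len(x), 1))
# map the residual to an alpha between 0.2 (faint) and 1.0 (opaque)
rgba_colors[:, 3] = np.clip(0.2 + 0.8 * residual, 0.0, 1.0)

fig, ax = plt.subplots()
ax.scatter(x, y, s=80, c=rgba_colors)
```

The lower bound of 0.2 keeps even the least relevant points faintly visible; setting it to 0 would hide them completely.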

Figure and Axes

The previous examples call the plotting functions directly, without using the other objects matplotlib provides. But why bother with complicated objects if a simple plot is perfectly fine without them?
Because sometimes you need more control over the image. The main use, and the one covered here, is drawing a single figure with several subplots.
We can call the function subplots, which returns a Figure object and a 2D array of Axes objects:

 #subplots arguments are the dimensions of the subplot grid
 #can be one-dimensional (e.g. subplots(3))
 fig, axarr = plt.subplots(3, 2)
 #axarr will be a 3x2 2D array with the corresponding axes

 #let's draw something sinusoidal from 0 to 2π, with 400 points,
 #this can be accomplished with the function linspace from numpy
 x = np.linspace(0, 2 * np.pi, 400)
 y = np.sin(x)
 #since axarr is a 2D array we can select its elements with [i, j]
 # or with [i][j], both notations are fine
 axarr[0][0].scatter(x, y, s=1)
 axarr[0, 1].scatter(x, y**2, s=1)
 axarr[1, 0].hist(y)
 axarr[1, 1].hist(y**2)
 axarr[2, 1].plot(x, y + y**2 + y**3)
 #we can leave one or more blank
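A side note on what subplots returns: the shape of the array depends on the grid you ask for. The sketch below uses separate, throwaway figures (the names fig_grid, fig_col, fig_one and fig_sq are illustrative) and the squeeze parameter of subplots.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig_grid, ax_grid = plt.subplots(3, 2)  # 2D array of Axes, shape (3, 2)
fig_col, ax_col = plt.subplots(3)       # 1D array of Axes, shape (3,)
fig_one, ax_one = plt.subplots()        # a single Axes, not an array

# squeeze=False always returns a 2D array, whatever the grid
fig_sq, ax_sq = plt.subplots(3, squeeze=False)  # shape (3, 1)

# .flat walks over every Axes of the grid, row by row
for index, ax in enumerate(ax_grid.flat):
    ax.set_title(f"panel {index}")
```

Passing squeeze=False is handy when the grid size is computed at run time, because the indexing code stays the same. None of this changes the 3x2 figure drawn above, which is the one discussed next.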

But this is too messy:

  • some ticks are covered
  • no labels are present

“What am I seeing?”
To the greatest extent possible, the plot should speak for itself.

 axarr[0, 0].set_title('Axes [0,0] is sin(x)')
 axarr[0, 1].set_title('Axes [0,1] is (sin(x))^2')
 axarr[1, 0].set_title('histogram of sin(x)')
 axarr[1, 1].set_title('histogram of (sin(x))^2')
 axarr[2, 0].set_title('Axes [2,0]. I am intentionally blank\n with a very long title,\nbetter use \\n!')
 axarr[2, 1].set_title('sin(x)+sin(x)^2+sin(x)^3')
 #adjust the spacing so that titles and tick labels do not overlap
 fig.tight_layout()

Calling tight_layout adjusts the spacing between subplots so that titles and labels do not overlap (and do not fall outside the figure boundaries).
Be aware, however, that long titles like the one in [2, 0] can force the other plots to shrink.
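The effect of tight_layout can also be measured rather than judged by eye. The following sketch compares the vertical gap between the first two rows of a grid before and after the call; the helper row_gap and the placeholder titles are assumptions made for this example.

```python
import matplotlib.pyplot as plt

fig, axarr = plt.subplots(3, 2)
for ax in axarr.flat:
    ax.set_title("a title\nspanning two lines")
    ax.set_xlabel("x label")


def row_gap(axes_grid):
    """Vertical gap between rows 0 and 1, as a fraction of the figure height."""
    upper = axes_grid[0, 0].get_position()
    lower = axes_grid[1, 0].get_position()
    return upper.y0 - lower.y1


gap_before = row_gap(axarr)
fig.tight_layout()  # recomputes the subplot spacing
gap_after = row_gap(axarr)
print(gap_before, gap_after)
```

The gap grows to make room for the two-line titles and the x labels, and each plot shrinks accordingly. This is the trade-off mentioned above.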

Shared axis

Depending on the situation, subplots can share one or both axes, which can improve readability.

 fig, axarr=plt.subplots(2, sharex=True)
 #sharey is a valid parameter too
 fig2, axarr2=plt.subplots(2, sharey=True)
 #by default, subplots are considered on different rows
 fig3, axarr3=plt.subplots(1, 2, sharey=True)
 #this should make more sense for a shared y-axis
 #put some labels
 axarr[0].set_title("sharing X axis")
 axarr[1].set_xlabel("X values (with these units)")
 axarr2[0].set_title("sharing Y axis,\nbad disposition")
 axarr2[0].set_ylabel("common Y values")
 axarr2[0].set_xlabel("X values for first plot")
 axarr2[1].set_xlabel("X values for second plot")
 fig3.suptitle("sharing Y axis,\ngood disposition")
 axarr3[0].set_ylabel("common Y values (with these units)")
 axarr3[0].set_xlabel("X values for first plot")
 axarr3[1].set_xlabel("X values for second plot")

Labels were needed in different places depending on the plot. Always keep in mind what you want the final result to be, and plan accordingly.
Notice that in the third figure we used the Figure method suptitle instead of the Axes method set_title. This places the title in between the two plots. You can imagine the Figure object as a container for its Axes, and figure.suptitle writes at the top of that container.
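To see what sharing an axis means in practice, and how the two kinds of title differ, here is a minimal sketch. The sine and cosine data and the names left, right, shared_limits and figure_title are chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 400)
fig, (left, right) = plt.subplots(1, 2, sharey=True)
left.plot(x, np.sin(x))
right.plot(x, 3 * np.cos(x))

# sharing: changing the y limits on one Axes changes them on the other
left.set_ylim(-5, 5)
shared_limits = right.get_ylim()

# titles: set_title belongs to one Axes, suptitle to the whole Figure
left.set_title("left panel")
figure_title = fig.suptitle("one title for both panels")
```

After this runs, right has the same y limits as left even though only left was modified, while the title set on left does not appear on right.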