import pandas as pd
import numpy as np
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
Data Visualization
df = sm.datasets.get_rdataset("mtcars", "datasets", cache = True).data
df.head()
(index) | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
Seaborn seems to be the most efficient way to get decent-looking exploratory plots in a hurry.
Line Plot
sns.lineplot(df, x = "mpg", y = "disp")
<AxesSubplot: xlabel='mpg', ylabel='disp'>
Line Plot by factor
Use the hue argument to break out factors into separate lines.
sns.lineplot(df, x = "mpg", y = "disp", hue = "am")
<AxesSubplot: xlabel='mpg', ylabel='disp'>
Line plot with linear trend
The mean and a 95% confidence interval are drawn automatically when several rows share an x value, as they do once you melt the df.
df_long = pd.melt(df, id_vars = "cyl", value_vars = "mpg")
df_long
(index) | cyl | variable | value |
---|---|---|---|
0 | 6 | mpg | 21.0 |
1 | 6 | mpg | 21.0 |
2 | 4 | mpg | 22.8 |
3 | 6 | mpg | 21.4 |
4 | 8 | mpg | 18.7 |
5 | 6 | mpg | 18.1 |
6 | 8 | mpg | 14.3 |
7 | 4 | mpg | 24.4 |
8 | 4 | mpg | 22.8 |
9 | 6 | mpg | 19.2 |
10 | 6 | mpg | 17.8 |
11 | 8 | mpg | 16.4 |
12 | 8 | mpg | 17.3 |
13 | 8 | mpg | 15.2 |
14 | 8 | mpg | 10.4 |
15 | 8 | mpg | 10.4 |
16 | 8 | mpg | 14.7 |
17 | 4 | mpg | 32.4 |
18 | 4 | mpg | 30.4 |
19 | 4 | mpg | 33.9 |
20 | 4 | mpg | 21.5 |
21 | 8 | mpg | 15.5 |
22 | 8 | mpg | 15.2 |
23 | 8 | mpg | 13.3 |
24 | 8 | mpg | 19.2 |
25 | 4 | mpg | 27.3 |
26 | 4 | mpg | 26.0 |
27 | 4 | mpg | 30.4 |
28 | 8 | mpg | 15.8 |
29 | 6 | mpg | 19.7 |
30 | 8 | mpg | 15.0 |
31 | 4 | mpg | 21.4 |
sns.lineplot(df_long, x = "cyl", y = "value")
<AxesSubplot: xlabel='cyl', ylabel='value'>
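To check what the line is summarizing, the per-group mean can be computed directly with pandas. This is only a sketch: it uses the first eight rows of the long table shown above (an illustrative subset, not the full data), so its numbers will not match the plot exactly.

```python
import pandas as pd

# First eight rows of the melted table above (illustrative subset only)
sample = pd.DataFrame({
    "cyl":   [6, 6, 4, 6, 8, 6, 8, 4],
    "value": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4],
})

# lineplot's default estimator is the mean of y at each x value
group_means = sample.groupby("cyl")["value"].mean()
print(group_means)
```

Each point on the plotted line corresponds to one of these group means, computed over all rows for that cyl value.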
Or, use lmplot to fit a linear model like you’d get with geom_smooth(method = lm).
sns.lmplot(df, x = "mpg", y = "disp")
<seaborn.axisgrid.FacetGrid at 0x7f5d567270d0>
Box Plot
sns.boxplot(df, x = "cyl", y = "mpg")
<AxesSubplot: xlabel='cyl', ylabel='mpg'>
Histogram
sns.histplot(df, x = "wt")
<AxesSubplot: xlabel='wt', ylabel='Count'>
Density Plot
sns.kdeplot(df, x = "disp")
<AxesSubplot: xlabel='disp', ylabel='Density'>
Heatmap
sns.heatmap(df[["disp", "hp"]])
<AxesSubplot: >
Multiple Variable Plots
KDE
sns.kdeplot(df.loc[:, ["mpg", "wt"]])
<AxesSubplot: ylabel='Density'>
Lineplot
sns.lineplot(df.loc[:, ["mpg", "wt"]])
<AxesSubplot: >
Faceting
# create three empty spots
grid = sns.FacetGrid(data = df, col = "cyl", col_wrap=2)
# puts a histogram on each of them
grid.map(sns.histplot, "wt")
<seaborn.axisgrid.FacetGrid at 0x7f5d55cf2590>
The initial display is automatic. If you want to show the same plot again, access the figure property of the object.
# just typing it out gives the object metadata
grid
<seaborn.axisgrid.FacetGrid at 0x7f5d55cf2590>
grid.figure
Tweaking Plots
Axis Labels
The plot we made of weight and mpg had mostly unusable x tick labels. Let’s revisit it.
p_line = sns.lineplot(df.loc[:, ["mpg", "wt"]])
p_line.figure
The syntax is a little awkward. Essentially there is a set method, and you use a get method to retrieve the labels to pass into it, specifying a rotation.
# pass what the get method returns into the set method
p_line.set_xticklabels(p_line.get_xticklabels(), rotation = 45)
p_line.figure
/tmp/ipykernel_9407/667538073.py:2: UserWarning: FixedFormatter should only be used together with FixedLocator
p_line.set_xticklabels(p_line.get_xticklabels(), rotation = 45)
They still conflict a little. We can make them a little smaller overall. The technique is the same, just setting a different property.
p_line.set_xticklabels(p_line.get_xticklabels(), size = 5)
p_line.figure
/tmp/ipykernel_9407/3729791072.py:1: UserWarning: FixedFormatter should only be used together with FixedLocator
p_line.set_xticklabels(p_line.get_xticklabels(), size = 5)
Title
p_line.set(title = "0_o")
p_line.figure
Color Schemes
Discrete
Seaborn lets you preview a color palette by passing its name to sns.color_palette.
sns.color_palette("dark")
The plotting functions will then have arguments for color scheming:
p_box = sns.boxplot(df, x = "cyl", y = "mpg", palette = "dark")
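Outside a notebook there is no swatch preview, but the object returned by sns.color_palette is still a list of RGB tuples that can be inspected directly. A brief sketch; the variable names are illustrative:

```python
import seaborn as sns

dark = sns.color_palette("dark")           # list-like palette of RGB tuples
dark_three = sns.color_palette("dark", 3)  # request only three colors

print(len(dark), len(dark_three))
print(dark.as_hex())  # hex strings, convenient for reuse elsewhere
```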
Continuous
sns.color_palette("mako", as_cmap = True)
sns.heatmap(df[["disp", "hp"]], cmap = "mako")
<AxesSubplot: >