I’m sorting some of my plotting code into a reusable module, and I’ve just discovered how cool pairgrids are. Between Matplotlib and Seaborn there isn’t much you can’t do, but there’s so much of it that for Python beginners like me it can take some time to discover it all.

For initial data exploration, facetgrids are very useful. We make some kind of x, y plot, like money_spent against time_of_day, and then split it by some other facet, like day_of_week, to give us seven plots instead of one. I used them in my initial Titanic exploration to split by passenger class.

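import seaborn as sns
# one histogram of Survived per passenger class (histplot supersedes the deprecated distplot)
g = sns.FacetGrid(data=df, col="Pclass")
g.map(sns.histplot, "Survived", bins=[0, 1, 2])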

Pairgrids let us choose a variable of interest (like Survived) and plot it against all the other variables at once, so the relationships in the data show up very quickly. With the data carved up the way I have it so far, we can also see that the relationships between certain variables are very different depending on who (or rather what) you are.

Children with a large number of siblings don’t do well. Women start to do badly as their family size increases, but for men it doesn’t make that much difference. This is already pointing us towards some of the interactions we’re going to need to add to our model.
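The code that carves the passengers into children, women and men isn't shown here, so what follows is only a rough sketch of one way it could be done. The `classify_who` helper, the `Who` column name and the age cut-off of 16 are all assumptions made for illustration, not something taken from the original analysis.

```python
import math

CHILD_AGE_LIMIT = 16  # assumed cut-off for "child"; pick whatever suits your analysis


def classify_who(age, sex):
    """Return 'child', 'woman' or 'man' for a single passenger.

    A missing age (None or NaN) is treated as adult, which is a simplification.
    """
    age_known = age is not None and not math.isnan(age)
    if age_known and age < CHILD_AGE_LIMIT:
        return "child"
    return "woman" if sex == "female" else "man"


# Hypothetical usage with the Titanic DataFrame (column names as in the Kaggle data):
#   df["Who"] = [classify_who(a, s) for a, s in zip(df["Age"], df["Sex"])]
#   for who, group in df.groupby("Who"):
#       ...  # draw one pairgrid per group
```

Keeping the rule in a small plain function makes it easy to change the cut-off later and to check the edge cases (like missing ages) without touching any plotting code.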

import numpy as np
import seaborn as sns

def LogisticPairPlot(df, y, exclude_cols=None):
  # never plot the target against itself, and skip any excluded columns
  skip = set(exclude_cols or []) | {y}

  # keep only the numeric columns, since regplot can't fit the others
  numeric_cols = df.select_dtypes(include=np.number).columns
  values = [col for col in numeric_cols if col not in skip]

  # one row (y) against every remaining column, each with a logistic fit
  # (logistic=True needs statsmodels installed)
  g = sns.PairGrid(df, y_vars=[y], x_vars=values)
  g.map(sns.regplot, logistic=True, ci=None)
  return g

LogisticPairPlot(df, "Survived", ["PassengerId"])