allenfrostline

How to Beautify a DataFrame?


2020-03-29

In this example we try to visualize a dataframe more intuitively by coloring its backgrounds based on values.

import pandas as pd
import matplotlib.colors as clr

First we read data from data.csv. This is expected to be a sample dataset containing two columns, both representing daily returns (in percentage). Eventually we’d like to visualize the weekly returns by coloring green for gains and red for losses.

data = pd.read_csv('data.csv', index_col=0, parse_dates=[0])
data.head()

It turns out that the data is stored in percentage strings. So before any other steps we convert them to floats first.

data = data.apply(lambda x: x.str.rstrip('%').astype(float) / 100, axis=0)
data.head()

To make the table look better, we change the column and index names.

data.columns = ['Return 1', 'Return 2']
data.index.name = 'Date'
data.head()

Resample to weekly, closed on market close of every Friday. Notice here both closed and label are set to 'right' by default. We specify them explicitly here just for the sake of clarity.

data = data.resample('W-Fri', closed='right', label='right').sum()
data

Style Added: Same Height of Index and Column Names. As the name indicates we want index name to be of the same row as columns.

data.reset_index().style.hide_index()

Style Added: Percentage Format. we format the data back to percentages with dates in %Y-%m-%d style.

data.reset_index().style \
    .format({'Date': '{:%Y-%m-%d}', 'Return 1' :'{:.2%}', 'Return 2' :'{:.2%}'}) \
    .hide_index()

Style Added: Colored Background. We color green for gains and red for losses. Specifically, the deeper color a cell has, the more significant that week’s gain/loss is.

data.reset_index().style \
    .format({'Date': '{:%Y-%m-%d}', 'Return 1' :'{:.2%}', 'Return 2' :'{:.2%}'}) \
    .hide_index() \
    .background_gradient('RdYlGn')

Just a Little Better. Well… The default RdYlGn doesn’t look well in our case. Instead we can customize our own palette. What we want is the colormap to be centered around white with two ends red and green equally spreaded. Moreover, we want the column names to be larger, columns to be wider with fixed widths, and yes, a caption that is bold and italic.

def normalized(val):
    vmax = data.max().max()
    vmin = data.min().min()
    V = max(abs(vmax), abs(vmin))
    return val / V / 2 + .5  # ranges from 0 to 1, centered around .5

def background_gradient(val):
    if not isinstance(val, float): return f'background: white'
    cmap = clr.LinearSegmentedColormap.from_list('RdGn', [(0, 'red'), (.5, 'white'), (1, 'green')])
    color = clr.rgb2hex(cmap(normalized(val)))
    return f'background: {color}'

styles = [
    {
        'selector': 'th',
        'props': [('font-size', '120%'), ('text-align', 'center'), ('width', '150px')]
    },
    {
        'selector': 'td',
        'props': [('text-align', 'center')]
    },
    {
        'selector':'caption',
        'props': [('caption-side', 'bottom'), ('color', 'black'), ('font-style', 'italic'), ('font-weight', 'bold')]
    }
]

data.reset_index().style \
    .format({'Date': '{:%Y-%m-%d}', 'Return 1' :'{:.2%}', 'Return 2' :'{:.2%}'}) \
    .hide_index() \
    .applymap(background_gradient) \
    .set_caption('Table 1.1: Weekly Returns, Calculated on Friday Market Close') \
    .set_table_styles(styles)

The table is so far finished and ready for any presentation or online illustration – as far as I think. If you are not satisfied, go for Pandas’ official documentation to check out more features.