In this notebook we are going to look at building a trading strategy backtest based on mean-reverting, co-integrated pairs of assets (stocks and ETFs). To restate the theory in terms of US equities: assets that are statistically co-integrated move together in such a way that when their prices start to diverge by a certain amount (i.e. the spread between the two assets' prices increases), we expect that divergence to eventually revert back to its mean. In that situation we would look to sell the outperforming stock and buy the underperforming stock, on the notion that the underperformer will eventually recover and rise in price towards the outperformer, or conversely that the outperformer will in time suffer the same downward pressure as the underperformer and fall in relative value.
Hence, pairs trading is a market-neutral trading strategy, enabling traders to profit in virtually any market condition: bull markets, bear markets, or sideways markets.
In our search for co-integrated assets, economic theory suggests we are more likely to find pairs that are driven by the same factors or similar business practices. After all, it is logical to expect two assets in the same industry, offering similar products, to be at the mercy of the same general ups and downs of the volatile market.
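To make the mechanics concrete before we get to real data, here is a minimal sketch (on made-up prices, with a hedge ratio that is fixed by construction, so purely illustrative) of how a spread and its rolling z-score flag a divergence; the actual backtest below estimates the hedge ratio dynamically with a Kalman filter:
import numpy as np
import pandas as pd
# Toy prices for two co-integrated assets (synthetic data, not real quotes)
np.random.seed(0)
common = np.cumsum(np.random.normal(0, 1, 250)) + 100
asset_a = pd.Series(0.8 * common + np.random.normal(0, 0.5, 250))
asset_b = pd.Series(common + np.random.normal(0, 0.5, 250))
hedge_ratio = 0.8  # fixed by construction here; estimated dynamically later in the notebook
spread = asset_a - hedge_ratio * asset_b
zscore = (spread - spread.rolling(30).mean()) / spread.rolling(30).std()
# A large positive z-score says asset_a looks rich versus asset_b: sell asset_a, buy asset_b,
# and unwind when the spread reverts towards its rolling mean (and vice versa for a large negative z-score)
print(zscore.tail())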
So what is a Kalman Filter? Well, this site (http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/) explains it and states the following:
You can use a Kalman filter in any place where you have uncertain information about some dynamic system, and you can make an educated guess about what the system is going to do next. Even if messy reality comes along and interferes with the clean motion you guessed about, the Kalman filter will often do a very good job of figuring out what actually happened. And it can take advantage of correlations between crazy phenomena that you maybe wouldn’t have thought to exploit!
Kalman filters are ideal for systems which are continuously changing. They have the advantage that they are light on memory (they don’t need to keep any history other than the previous state), and they are very fast, making them well suited for real time problems and embedded systems.
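As a minimal sketch of that idea (not from the article above, just an illustration), a one-dimensional Kalman filter can be written as a single predict/update step that carries forward nothing but the previous state estimate and its variance:
def kalman_step(prev_mean, prev_var, measurement, process_var=0.01, measurement_var=1.0):
    # Predict: assume the hidden state follows a random walk, so uncertainty grows
    pred_mean = prev_mean
    pred_var = prev_var + process_var
    # Update: blend the prediction with the new measurement, weighted by the Kalman gain
    kalman_gain = pred_var / (pred_var + measurement_var)
    new_mean = pred_mean + kalman_gain * (measurement - pred_mean)
    new_var = (1 - kalman_gain) * pred_var
    return new_mean, new_var

# Example: filter a few noisy price observations one at a time
mean, var = 100.0, 1.0
for price in [100.2, 100.5, 99.8, 100.1]:
    mean, var = kalman_step(mean, var, price)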
So let's start by importing the relevant modules we will need for our strategy backtest:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.cm as cm
from scipy import stats
import datetime as dt
import pandas as pd
import math
import os.path
import time
import json
import requests
import pandas_market_calendars as mcal
from datetime import timedelta, datetime
from dateutil import parser
import seaborn as sns
import matplotlib as mpl
import quantstats as qs
import statsmodels.api as sm
from pykalman import KalmanFilter
from math import sqrt
import warnings
import ffn
import pyfolio as pf
from pandas_datareader import data as web
pd.set_option('display.max_columns', None)
warnings.filterwarnings('ignore')
symbols = ['GDX','GDXJ','GLD', 'AAPL','GOOGL', 'FB','TWTR','AMD',
'NVDA','CSCO', 'ORCL', 'ATVI', 'TTWO', 'EA', 'HYG',
'LQD', 'JNK', 'SLV', 'USLV', 'SIVR', 'USO', 'UWT',
'QQQ', 'SPY', 'VOO', 'VDE', 'VTI', 'EMLP', 'VDC',
'FSTA', 'KXI', 'IBB', 'VHT','VNQ', 'IYR', 'MSFT',
'PG', 'TMF', 'UPRO', 'WFC', 'JPM', 'GS', 'CVX',
'XOM', 'INTC', 'COST', 'WMT', 'T', 'VZ', 'CMCSA', 'AMZN']
def get_symbols(symbols, data_source, ohlc, begin_date=None, end_date=None):
    out = []
    new_symbols = []
    for symbol in symbols:
        df = web.DataReader(symbol, data_source, begin_date, end_date)
        df = df[ohlc]
        new_symbols.append(symbol)
        out.append(df.astype('float'))
    data = pd.concat(out, axis=1)
    data.columns = new_symbols
    data = data.dropna(axis=1)
    return data
start = pd.Timestamp('2014-01-01')
end = pd.Timestamp('2020-03-05')
prices = get_symbols(symbols,data_source='yahoo',ohlc='Close',\
begin_date=start,end_date=end)
Let's inspect and plot the resulting DataFrame of price data as a quick sanity check that we have what we need:
combo = prices.copy()
combo.index = pd.DatetimeIndex(combo.index)
combo.head()
combo.info()
num_stocks = len(combo.columns)
print('Number of Stocks =', num_stocks)
n_secs = len(combo.columns)
colors = cm.rainbow(np.linspace(0, 1, n_secs))
combo.div(combo.iloc[0, :]).plot(color=colors, figsize=(12, 6))  # Normalize prices
plt.title('All Stocks Normalized Price Series')
plt.xlabel('Date')
plt.ylabel('Price (USD$)')
plt.grid(b=None, which=u'major', axis=u'both')
plt.legend(bbox_to_anchor=(1.01, 1.1), loc='upper left', ncol=1)
plt.show();
The most common test for Pairs Trading is the cointegration test. Cointegration is a statistical property of two or more time-series variables which indicates if a linear combination of the variables is stationary.
Stationary process: a process whose statistical parameters, such as the mean and variance, do not change over time.
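As a rough illustration of what the cointegration test is checking, one could regress one price series on another and run an Augmented Dickey-Fuller test on the residual to see whether that linear combination looks stationary (the coint() helper we use below wraps a similar Engle-Granger style procedure). A sketch, picking GDX and GLD purely as an example pair from the data we just downloaded:
from statsmodels.tsa.stattools import adfuller
x = combo['GDX']
y = combo['GLD']
# OLS hedge ratio, then test whether the residual "spread" is stationary
hedge = sm.OLS(y, sm.add_constant(x)).fit()
spread = y - hedge.params.iloc[1] * x
adf_stat, p_value, *_ = adfuller(spread.dropna())
print('ADF p-value:', round(p_value, 4))  # a small p-value suggests the spread is stationary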
Ok so it looks from the chart as if we have downloaded price data for around 50
assets; this should be more than enough to find at least a couple of co-integrated pairs
to run our backtest over.
We will now define a quick function that runs through our assets, combining them into pairs one by one and running a co-integration test on each pair. The results are stored in a matrix that we initialise beforehand, which we can then plot as a heatmap. Also, if the co-integration test meets our threshold of statistical significance (in our case 5%), that pair of tickers is stored in a list for later retrieval.
# NOTE CRITICAL LEVEL HAS BEEN SET TO 5% FOR COINTEGRATION TEST
def find_cointegrated_pairs(dataframe, critical_level=0.05):
    n = dataframe.shape[1]  # the number of columns in the dataframe
    pvalue_matrix = np.ones((n, n))  # initialise the matrix of p-values
    keys = dataframe.columns  # get the column names
    pairs = []  # initialise the list of cointegrated pairs
    for i in range(n):
        for j in range(i+1, n):  # for j bigger than i
            stock1 = dataframe[keys[i]]  # obtain the price series of "stock1"
            stock2 = dataframe[keys[j]]  # obtain the price series of "stock2"
            result = sm.tsa.stattools.coint(stock1, stock2)  # run the cointegration test
            pvalue = result[1]  # get the p-value
            pvalue_matrix[i, j] = pvalue
            if pvalue < critical_level:  # if the p-value is less than the critical level
                pairs.append((keys[i], keys[j], pvalue))  # record the pair and its p-value
    return pvalue_matrix, pairs
Let’s now run our data through our function, save the results and plot the heatmap:
df = combo
binance_symbols = df.columns
# Set up the split point for our "training data" on which to perform the co-integration test (the remaining data will be fed to our backtest function)
split = int(len(df) * 0.3)
# Run our dataframe (up to the split point) of ticker price data through our co-integration function and store results
pvalue_matrix, pairs = find_cointegrated_pairs(df[:split])
# Convert our matrix of stored results into a DataFrame
pvalue_matrix_df = pd.DataFrame(pvalue_matrix)
# Use Seaborn to plot a heatmap of our results matrix
sns.clustermap(pvalue_matrix_df, xticklabels=binance_symbols,yticklabels=binance_symbols, figsize=(12, 12))
plt.title('Stock P-value Matrix')
plt.tight_layout()
plt.show();
So we can see from the very dark squares that it looks as though there are indeed a few pairs of assets whose co-integration p-value is below the 5% threshold hardcoded into the function we defined. To see more explicitly which pairs these are, let's print out the list of stored pairs that was part of the function results we saved:
for pair in pairs:
    print("Asset {} and Asset {} has a co-integration score of {}".format(pair[0], pair[1], round(pair[2], 4)))
We will now use the “pykalman” module to set up a couple of functions: one that applies a Kalman filter to smooth each price series, and one that runs a Kalman filter regression on those smoothed series to estimate a dynamic hedge ratio between the two assets.
def KalmanFilterAverage(x):
    # Construct a Kalman filter
    kf = KalmanFilter(transition_matrices=[1],
                      observation_matrices=[1],
                      initial_state_mean=0,
                      initial_state_covariance=1,
                      observation_covariance=1,
                      transition_covariance=0.01)
    # Use the observed values of the price to get a rolling mean
    state_means, _ = kf.filter(x.values)
    state_means = pd.Series(state_means.flatten(), index=x.index)
    return state_means
# Kalman filter regression
def KalmanFilterRegression(x, y):
    delta = 1e-3
    trans_cov = delta / (1 - delta) * np.eye(2)  # how much the random walk wiggles
    obs_mat = np.expand_dims(np.vstack([[x], [np.ones(len(x))]]).T, axis=1)
    kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,  # y is 1-dimensional, (alpha, beta) is 2-dimensional
                      initial_state_mean=[0, 0],
                      initial_state_covariance=np.ones((2, 2)),
                      transition_matrices=np.eye(2),
                      observation_matrices=obs_mat,
                      observation_covariance=2,
                      transition_covariance=trans_cov)
    # Use the observations y to get running estimates and errors for the state parameters
    state_means, state_covs = kf.filter(y.values)
    return state_means
def half_life(spread):
    spread_lag = spread.shift(1)
    spread_lag.iloc[0] = spread_lag.iloc[1]
    spread_ret = spread - spread_lag
    spread_ret.iloc[0] = spread_ret.iloc[1]
    spread_lag2 = sm.add_constant(spread_lag)
    model = sm.OLS(spread_ret, spread_lag2)
    res = model.fit()
    halflife = int(round(-np.log(2) / res.params.iloc[1], 0))
    if halflife <= 0:
        halflife = 1
    return halflife
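The formula above comes from treating the spread as a mean-reverting (Ornstein-Uhlenbeck style) process: regressing the change in the spread on its lagged level gives a mean-reversion coefficient, and -ln(2) divided by that coefficient is the expected time for a deviation to decay halfway back to the mean. As a quick sanity check on synthetic data (a toy AR(1) series, not part of the strategy), a reversion speed of roughly 0.1 per day should give a half-life of about 7 days:
np.random.seed(42)
toy_spread = [0.0]
for _ in range(1000):
    # AR(1): each day the spread keeps 90% of its previous value, i.e. reverts at ~0.1 per day
    toy_spread.append(0.9 * toy_spread[-1] + np.random.normal(0, 1))
print('Estimated half-life:', half_life(pd.Series(toy_spread)))  # expect roughly -ln(2) / (-0.1) ≈ 7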
Now let us define our main “backtest” function that we will run our data through. The function takes one pair of tickers at a time and returns several outputs, namely the DataFrame of cumulative returns, the Sharpe Ratio and the Compound Annual Growth Rate (CAGR). Once we have defined our function, we can iterate over our list of pairs and feed the relevant data, pair by pair, into the function, storing the outputs for each pair for later use and retrieval.
def backtest(df, s1, s2):
    #############################################################
    # INPUT:
    # DataFrame of prices (df)
    # s1: the symbol of asset one
    # s2: the symbol of asset two
    # x: the price series of asset one
    # y: the price series of asset two
    # OUTPUT:
    # df1['cum rets']: cumulative returns in pandas data frame
    # sharpe: Sharpe ratio
    # CAGR: Compound Annual Growth Rate
    x = df[s1]
    y = df[s2]
    # Run regression (including Kalman Filter) to find hedge ratio and then create spread series
    df1 = pd.DataFrame({'y': y, 'x': x})
    df1.index = pd.to_datetime(df1.index)
    state_means = KalmanFilterRegression(KalmanFilterAverage(x), KalmanFilterAverage(y))
    df1['hr'] = - state_means[:, 0]
    df1['spread'] = df1.y + (df1.x * df1.hr)
    # calculate half life
    halflife = half_life(df1['spread'])
    # calculate z-score with window = half life period
    meanSpread = df1.spread.rolling(window=halflife).mean()
    stdSpread = df1.spread.rolling(window=halflife).std()
    df1['zScore'] = (df1.spread - meanSpread) / stdSpread
    ##############################################################
    # trading logic
    entryZscore = 1.25
    exitZscore = -0.08
    # set up num units long
    df1['long entry'] = ((df1.zScore < - entryZscore) & (df1.zScore.shift(1) > - entryZscore))
    df1['long exit'] = ((df1.zScore > - exitZscore) & (df1.zScore.shift(1) < - exitZscore))
    df1['num units long'] = np.nan
    df1.loc[df1['long entry'], 'num units long'] = 1
    df1.loc[df1['long exit'], 'num units long'] = 0
    df1.loc[df1.index[0], 'num units long'] = 0
    df1['num units long'] = df1['num units long'].fillna(method='pad')
    # set up num units short
    df1['short entry'] = ((df1.zScore > entryZscore) & (df1.zScore.shift(1) < entryZscore))
    df1['short exit'] = ((df1.zScore < exitZscore) & (df1.zScore.shift(1) > exitZscore))
    df1['num units short'] = np.nan
    df1.loc[df1['short entry'], 'num units short'] = -1
    df1.loc[df1['short exit'], 'num units short'] = 0
    df1.loc[df1.index[0], 'num units short'] = 0
    df1['num units short'] = df1['num units short'].fillna(method='pad')
    # set up totals: num units and returns
    df1['numUnits'] = df1['num units long'] + df1['num units short']
    df1['spread pct ch'] = (df1['spread'] - df1['spread'].shift(1)) / ((df1['x'] * abs(df1['hr'])) + df1['y'])
    df1['port rets'] = df1['spread pct ch'] * df1['numUnits'].shift(1)
    df1['cum rets'] = df1['port rets'].cumsum()
    df1['cum rets'] = df1['cum rets'] + 1
    ##############################################################
    try:
        sharpe = ((df1['port rets'].mean() / df1['port rets'].std()) * sqrt(252))
    except ZeroDivisionError:
        sharpe = 0.0
    ##############################################################
    start_val = 1
    end_val = df1['cum rets'].iat[-1]
    start_date = df1.iloc[0].name
    end_date = df1.iloc[-1].name
    days = (end_date - start_date).days
    CAGR = (end_val / start_val) ** (252.0 / days) - 1
    df1[s1 + " " + s2 + '_cum_rets'] = df1['cum rets']
    return df1[s1 + " " + s2 + '_cum_rets'], sharpe, CAGR
So now let's run our full list of pairs through our backtest function, printing out some results along the way; finally, after storing the equity curve for each pair, we produce a chart that plots each curve.
results = []
for pair in pairs:
    rets, sharpe, CAGR = backtest(df[split:], pair[0], pair[1])
    results.append(rets)
    print("The pair {} and {} produced a Sharpe Ratio of {} and a CAGR of {}".format(pair[0], pair[1],
                                                                                     round(sharpe, 2),
                                                                                     round(CAGR, 4)))
rets0 = pd.concat(results, axis=1)
rets0.plot(figsize=(12,6),legend=True)
plt.legend(bbox_to_anchor=(1.01, 1.1), loc='upper left', ncol=1)
plt.grid(b=None, which=u'major', axis=u'both')
plt.title('Pairs Returns')
plt.xlabel('Date')
plt.ylabel('Returns');
Now we run a few extra lines of code to combine and equally weight the individual equity curves, and then plot our final combined equity curve:
filename = 'pairs_rets.csv'
rets0.to_csv(filename)
#concatenate together the individual equity curves into a single DataFrame
results_df = pd.concat(results,axis=1).dropna()
#equally weight each equity curve by dividing each by the number of pairs held in the DataFrame
results_df /= len(results_df.columns)
#sum up the equally weighted equity curves to get our final equity curve
final_res = results_df.sum(axis=1)
# square-root-of-sample-size rule of thumb for the number of bins in the returns distribution
print('Bin Count =', np.sqrt(len(final_res)))
Pair_Rets = ffn.to_returns(final_res)
Pair_Rets = pd.DataFrame(Pair_Rets)
Pair_Rets = Pair_Rets.fillna(0)
Pair_Rets.columns = ['Pairs_Returns']
fig, (ax1, ax2) = plt.subplots(1,2,figsize=(12, 5))
sns.distplot(Pair_Rets, hist = True, kde = True, bins=35,
hist_kws = {'linewidth': 1, 'alpha':.5},
label='Pairs Returns', color='#4b91fa', ax=ax1)
ax1.axvline(x=Pair_Rets.Pairs_Returns.mean(), color='#ff0000', linewidth=1.25, linestyle='dashed', label='Returns Mean')
ax1.set_title('Pairs Returns Distribution')
ax1.margins(0.001)
ax1.set_xlabel('Returns (%)')
ax1.set_ylabel('Density')
stats.probplot(Pair_Rets.Pairs_Returns, plot=ax2)
plt.tight_layout()
plt.show();
Pair_Rets.Pairs_Returns.describe()
perf = final_res.calc_stats()
num_pairs = len(results_df.columns)
print('Number of Pairs =', num_pairs)
# set SPY as benchmark
bench = df.loc[str(Pair_Rets.index[0]):str(Pair_Rets.index[-1])].SPY.pct_change().dropna()
Pair_Rets0 = Pair_Rets.loc[str(bench.index[0]):str(bench.index[-1])]
fig = pf.create_returns_tear_sheet(Pair_Rets.Pairs_Returns, benchmark_rets=bench)
plt.figure(figsize=(14, 7))
pf.plot_perf_stats(returns=Pair_Rets.Pairs_Returns, factor_returns=bench)
plt.show();
The Sharpe Ratio is very good. Also take into consideration that trading costs were not factored into these performance stats. The testing period ran from August 2015 until March 2020, looking only at the returns of pairs identified as co-integrated in the training period, traded with the 1.25 / -0.08 z-score entry/exit logic and combined into an equally weighted portfolio over the test period. Cumulative returns were almost 58% over roughly four and a half years, with limited drawdown.
qs.extend_pandas()
stock = Pair_Rets.Pairs_Returns
#qs.reports.html(stock, "SPY", title='Kalman Filter Pairs Strategy')