Stock Price Prediction using Linear Regression

The notebook linear_regression.ipynb contains examples for the prediction of stock prices using OLS with statsmodels and sklearn, as well as ridge and lasso models.

It is designed to run as a notebook on the Quantopian research platform.

How to run this notebook

This notebook is written for the Quantopian research environment.

Imports

In [3]:
import pandas as pd
import numpy as np
from time import time
import talib
import re
from statsmodels.api import OLS
from sklearn.metrics import mean_squared_error
from scipy.stats import spearmanr, pearsonr
from sklearn.linear_model import LinearRegression, Ridge, RidgeCV, Lasso, LassoCV, LogisticRegression
from sklearn.preprocessing import StandardScaler

from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline, factors, filters, classifiers
from quantopian.pipeline.data.builtin import USEquityPricing

from quantopian.pipeline.factors import (Latest, 
                                         Returns, 
                                         AverageDollarVolume, 
                                         SimpleMovingAverage,
                                         EWMA,
                                         BollingerBands,
                                         CustomFactor,
                                         MarketCap,
                                        SimpleBeta)

from quantopian.pipeline.filters import QTradableStocksUS, StaticAssets
from quantopian.pipeline.data.quandl import fred_usdontd156n as libor
from empyrical import max_drawdown, sortino_ratio

import seaborn as sns
import matplotlib.pyplot as plt

Data Sources

In [4]:
################
# Fundamentals #
################

# Morningstar fundamentals (2002 - Ongoing)
# https://www.quantopian.com/help/fundamentals
from quantopian.pipeline.data import Fundamentals

#####################
# Analyst Estimates #
#####################

# Earnings Surprises - Zacks (27 May 2006 - Ongoing)
# https://www.quantopian.com/data/zacks/earnings_surprises
from quantopian.pipeline.data.zacks import EarningsSurprises
from quantopian.pipeline.factors.zacks import BusinessDaysSinceEarningsSurprisesAnnouncement

##########
# Events #
##########

# Buyback Announcements - EventVestor (01 Jun 2007 - Ongoing)
# https://www.quantopian.com/data/eventvestor/buyback_auth
from quantopian.pipeline.data.eventvestor import BuybackAuthorizations
from quantopian.pipeline.factors.eventvestor import BusinessDaysSinceBuybackAuth

# CEO Changes - EventVestor (01 Jan 2007 - Ongoing)
# https://www.quantopian.com/data/eventvestor/ceo_change
from quantopian.pipeline.data.eventvestor import CEOChangeAnnouncements

# Dividends - EventVestor (01 Jan 2007 - Ongoing)
# https://www.quantopian.com/data/eventvestor/dividends
from quantopian.pipeline.data.eventvestor import (
    DividendsByExDate,
    DividendsByPayDate,
    DividendsByAnnouncementDate,
)
from quantopian.pipeline.factors.eventvestor import (
    BusinessDaysSincePreviousExDate,
    BusinessDaysUntilNextExDate,
    BusinessDaysSinceDividendAnnouncement,
)

# Earnings Calendar - EventVestor (01 Jan 2007 - Ongoing)
# https://www.quantopian.com/data/eventvestor/earnings_calendar
from quantopian.pipeline.data.eventvestor import EarningsCalendar
from quantopian.pipeline.factors.eventvestor import (
    BusinessDaysUntilNextEarnings,
    BusinessDaysSincePreviousEarnings
)

# 13D Filings - EventVestor (01 Jan 2007 - Ongoing)
# https://www.quantopian.com/data/eventvestor/_13d_filings
from quantopian.pipeline.data.eventvestor import _13DFilings
from quantopian.pipeline.factors.eventvestor import BusinessDaysSince13DFilingsDate

#############
# Sentiment #
#############

# News Sentiment - Sentdex Sentiment Analysis (15 Oct 2012 - Ongoing)
# https://www.quantopian.com/data/sentdex/sentiment
from quantopian.pipeline.data.sentdex import sentiment

Prepare the Data

We need to select a universe of equities and a time horizon, build and transform alpha factors that we will use as features, calculate forward returns that we aim to predict, and potentially clean our data.

Time horizon

In [5]:
# trading days per period
MONTH = 21
YEAR = 12 * MONTH
In [7]:
START = '2017-01-01'
END = '2018-12-31'

Universe

We will use equity data for the years 2014 and 2015 from a custom Q50US universe that uses built-in filters, factors, and classifiers to select the 50 stocks with the highest average dollar volume of the last 200 trading days filtered by additional default criteria (see Quantopian docs linked on GitHub for detail). The universe dynamically updates based on the filter criteria so that, while there are 100 stocks at any given point, there may be more than 50 distinct equities in the sample:

In [8]:
def Q50US():
    return filters.make_us_equity_universe(
        target_size=50,
        rankby=factors.AverageDollarVolume(window_length=200),
        mask=filters.default_us_equity_universe_mask(),
        groupby=classifiers.fundamentals.Sector(),
        max_group_weight=0.3,
        smoothing_func=lambda f: f.downsample('month_start'),
    )
In [9]:
# UNIVERSE = StaticAssets(symbols(['MSFT', 'AAPL']))
UNIVERSE = Q50US()

Factor Transformations

In [10]:
class AnnualizedData(CustomFactor):
    # Get the sum of the last 4 reported values
    window_length = 260

    def compute(self, today, assets, out, asof_date, values):
        for asset in range(len(assets)):
            # unique asof dates indicate availability of new figures
            _, filing_dates = np.unique(asof_date[:, asset], return_index=True)
            quarterly_values = values[filing_dates[-4:], asset]
            # ignore annual windows with <4 quarterly data points
            if len(~np.isnan(quarterly_values)) != 4:
                out[asset] = np.nan
            else:
                out[asset] = np.sum(quarterly_values)
In [11]:
class AnnualAvg(CustomFactor):
    window_length = 252
    
    def compute(self, today, assets, out, values):
        out[:] = (values[0] + values[-1])/2
In [12]:
def run_pipeline_chunks(pipe, start_date, end_date, chunks_len = None):
    
    chunks  = []
    current = pd.Timestamp(start_date)
    end     = pd.Timestamp(end_date)
    step    = pd.Timedelta(weeks=26) if chunks_len is None else chunks_len
    
    start_pipeline_timer = time()
    
    while current <= end:
        
        current_end = current + step
        if current_end > end:
            current_end = end
        
        start_timer = time()
        print 'Running pipeline:', current, ' - ', current_end
        results = run_pipeline(pipe, current.strftime("%Y-%m-%d"), current_end.strftime("%Y-%m-%d"))
        chunks.append(results)
        
        # pipeline returns more days than requested (if no trading day), so get last date from the results
        current_end = results.index.get_level_values(0)[-1].tz_localize(None)
        current = current_end + pd.Timedelta(days=1)
        
        end_timer = time()
        print "Time to run this chunk of the pipeline %.2f secs" % (end_timer - start_timer)
        
    end_pipeline_timer = time()
    print "Time to run the entire pipeline %.2f secs" % (end_pipeline_timer - start_pipeline_timer)
    return pd.concat(chunks)
In [13]:
def factor_pipeline(factors):
    start = time()
    pipe = Pipeline({k: v(mask=UNIVERSE).rank() for k, v in factors.items()},
                    screen=UNIVERSE)
    result = run_pipeline_chunks(pipe, start_date=START, end_date=END)
    return result, time() - start

Factor Library

Value Factors

In [14]:
class ValueFactors:
    """Definitions of factors for cross-sectional trading algorithms"""
    
    @staticmethod
    def PriceToSalesTTM(**kwargs):
        """Last closing price divided by sales per share"""        
        return Fundamentals.ps_ratio.latest

    @staticmethod
    def PriceToEarningsTTM(**kwargs):
        """Closing price divided by earnings per share (EPS)"""
        return Fundamentals.pe_ratio.latest
 
    @staticmethod
    def PriceToDilutedEarningsTTM(mask):
        """Closing price divided by diluted EPS"""
        last_close = USEquityPricing.close.latest
        diluted_eps = AnnualizedData(inputs = [Fundamentals.diluted_eps_earnings_reports_asof_date,
                                               Fundamentals.diluted_eps_earnings_reports],
                                     mask=mask)
        return last_close / diluted_eps

    @staticmethod
    def PriceToForwardEarnings(**kwargs):
        """Price to Forward Earnings"""
        return Fundamentals.forward_pe_ratio.latest
    
    @staticmethod
    def DividendYield(**kwargs):
        """Dividends per share divided by closing price"""
        return Fundamentals.trailing_dividend_yield.latest

    @staticmethod
    def PriceToFCF(mask):
        """Price to Free Cash Flow"""
        last_close = USEquityPricing.close.latest
        fcf_share = AnnualizedData(inputs = [Fundamentals.fcf_per_share_asof_date,
                                             Fundamentals.fcf_per_share],
                                   mask=mask)
        return last_close / fcf_share

    @staticmethod
    def PriceToOperatingCashflow(mask):
        """Last Close divided by Operating Cash Flows"""
        last_close = USEquityPricing.close.latest
        cfo_per_share = AnnualizedData(inputs = [Fundamentals.cfo_per_share_asof_date,
                                                 Fundamentals.cfo_per_share],
                                       mask=mask)        
        return last_close / cfo_per_share

    @staticmethod
    def PriceToBook(mask):
        """Closing price divided by book value"""
        last_close = USEquityPricing.close.latest
        book_value_per_share = AnnualizedData(inputs = [Fundamentals.book_value_per_share_asof_date,
                                              Fundamentals.book_value_per_share],
                                             mask=mask)        
        return last_close / book_value_per_share


    @staticmethod
    def EVToFCF(mask):
        """Enterprise Value divided by Free Cash Flows"""
        fcf = AnnualizedData(inputs = [Fundamentals.free_cash_flow_asof_date,
                                       Fundamentals.free_cash_flow],
                             mask=mask)
        return Fundamentals.enterprise_value.latest / fcf

    @staticmethod
    def EVToEBITDA(mask):
        """Enterprise Value to Earnings Before Interest, Taxes, Deprecation and Amortization (EBITDA)"""
        ebitda = AnnualizedData(inputs = [Fundamentals.ebitda_asof_date,
                                          Fundamentals.ebitda],
                                mask=mask)

        return Fundamentals.enterprise_value.latest / ebitda

    @staticmethod
    def EBITDAYield(mask):
        """EBITDA divided by latest close"""
        ebitda = AnnualizedData(inputs = [Fundamentals.ebitda_asof_date,
                                          Fundamentals.ebitda],
                                mask=mask)
        return USEquityPricing.close.latest / ebitda
In [15]:
VALUE_FACTORS = {
    'DividendYield'            : ValueFactors.DividendYield,
    'EBITDAYield'              : ValueFactors.EBITDAYield,
    'EVToEBITDA'               : ValueFactors.EVToEBITDA,
    'EVToFCF'                  : ValueFactors.EVToFCF,
    'PriceToBook'              : ValueFactors.PriceToBook,
    'PriceToDilutedEarningsTTM': ValueFactors.PriceToDilutedEarningsTTM,
    'PriceToEarningsTTM'       : ValueFactors.PriceToEarningsTTM,
    'PriceToFCF'               : ValueFactors.PriceToFCF,
    'PriceToForwardEarnings'   : ValueFactors.PriceToForwardEarnings,
    'PriceToOperatingCashflow' : ValueFactors.PriceToOperatingCashflow,
    'PriceToSalesTTM'          : ValueFactors.PriceToSalesTTM,
}
In [16]:
value_factors, t = factor_pipeline(VALUE_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
value_factors.info()
Running pipeline: 2017-01-01 00:00:00  -  2017-07-02 00:00:00
Pipeline Execution Time: 49.51 Seconds
Time to run this chunk of the pipeline 52.05 secs
Running pipeline: 2017-07-04 00:00:00  -  2018-01-02 00:00:00
/venvs/py35/lib/python3.5/site-packages/numpy/lib/arraysetops.py:200: FutureWarning: In the future, NAT != NAT will be True rather than False.
  flag = np.concatenate(([True], aux[1:] != aux[:-1]))
Pipeline Execution Time: 37.55 Seconds
Time to run this chunk of the pipeline 38.93 secs
Running pipeline: 2018-01-03 00:00:00  -  2018-07-04 00:00:00
Pipeline Execution Time: 38.12 Seconds
Time to run this chunk of the pipeline 39.53 secs
Running pipeline: 2018-07-06 00:00:00  -  2018-12-31 00:00:00
Pipeline Execution Time: 38.06 Seconds
Time to run this chunk of the pipeline 39.46 secs
Time to run the entire pipeline 169.98 secs
Pipeline run time 169.99 secs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25100 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 11 columns):
DividendYield                19739 non-null float64
EBITDAYield                  21929 non-null float64
EVToEBITDA                   21929 non-null float64
EVToFCF                      25005 non-null float64
PriceToBook                  25100 non-null float64
PriceToDilutedEarningsTTM    24985 non-null float64
PriceToEarningsTTM           24804 non-null float64
PriceToFCF                   25100 non-null float64
PriceToForwardEarnings       25080 non-null float64
PriceToOperatingCashflow     25100 non-null float64
PriceToSalesTTM              25100 non-null float64
dtypes: float64(11)
memory usage: 2.3+ MB

Momentum

In [17]:
class MomentumFactors:
    """Custom Momentum Factors"""
    class PercentAboveLow(CustomFactor):
        """Percentage of current close above low 
        in lookback window of window_length days
        """
        inputs = [USEquityPricing.close]
        window_length = 252

        def compute(self, today, assets, out, close):
            out[:] = close[-1] / np.min(close, axis=0) - 1

    class PercentBelowHigh(CustomFactor):
        """Percentage of current close below high 
        in lookback window of window_length days
        """
        
        inputs = [USEquityPricing.close]
        window_length = 252
            
        def compute(self, today, assets, out, close):
            out[:] = close[-1] / np.max(close, axis=0) - 1

    @staticmethod
    def make_dx(timeperiod=14):
        class DX(CustomFactor):
            """Directional Movement Index"""
            inputs = [USEquityPricing.high, 
                      USEquityPricing.low, 
                      USEquityPricing.close]
            window_length = timeperiod + 1
            
            def compute(self, today, assets, out, high, low, close):
                out[:] = [talib.DX(high[:, i], 
                                   low[:, i], 
                                   close[:, i], 
                                   timeperiod=timeperiod)[-1] 
                          for i in range(len(assets))]
        return DX  

    @staticmethod
    def make_mfi(timeperiod=14):
        class MFI(CustomFactor):
            """Money Flow Index"""
            inputs = [USEquityPricing.high, 
                      USEquityPricing.low, 
                      USEquityPricing.close,
                      USEquityPricing.volume]
            window_length = timeperiod + 1
            
            def compute(self, today, assets, out, high, low, close, vol):
                out[:] = [talib.MFI(high[:, i], 
                                    low[:, i], 
                                    close[:, i],
                                    vol[:, i],
                                    timeperiod=timeperiod)[-1] 
                          for i in range(len(assets))]
        return MFI           

    @staticmethod
    def make_oscillator(fastperiod=12, slowperiod=26, matype=0):
        class PPO(CustomFactor):
            """12/26-Day Percent Price Oscillator"""
            inputs = [USEquityPricing.close]
            window_length = slowperiod

            def compute(self, today, assets, out, close_prices):
                out[:] = [talib.PPO(close,
                                    fastperiod=fastperiod,
                                    slowperiod=slowperiod, 
                                    matype=matype)[-1]
                         for close in close_prices.T]
        return PPO

    @staticmethod
    def make_stochastic_oscillator(fastk_period=5, slowk_period=3, slowd_period=3, 
                                   slowk_matype=0, slowd_matype=0):                
        class StochasticOscillator(CustomFactor):
            """20-day Stochastic Oscillator """
            inputs = [USEquityPricing.high, 
                      USEquityPricing.low, 
                      USEquityPricing.close]
            outputs = ['slowk', 'slowd']
            window_length = fastk_period * 2
            
            def compute(self, today, assets, out, high, low, close):
                slowk, slowd = [talib.STOCH(high[:, i],
                                            low[:, i],
                                            close[:, i], 
                                            fastk_period=fastk_period,
                                            slowk_period=slowk_period, 
                                            slowk_matype=slowk_matype, 
                                            slowd_period=slowd_period, 
                                            slowd_matype=slowd_matype)[-1] 
                                for i in range(len(assets))]

                out.slowk[:], out.slowd[:] = slowk[-1], slowd[-1]
        return StochasticOscillator
    
    @staticmethod
    def make_trendline(timeperiod=252):                
        class Trendline(CustomFactor):
            inputs = [USEquityPricing.close]
            """52-Week Trendline"""
            window_length = timeperiod

            def compute(self, today, assets, out, close_prices):
                out[:] = [talib.LINEARREG_SLOPE(close, 
                                   timeperiod=timeperiod)[-1] 
                          for close in close_prices.T]
        return Trendline
In [18]:
MOMENTUM_FACTORS = {
    'Percent Above Low'            : MomentumFactors.PercentAboveLow,
    'Percent Below High'           : MomentumFactors.PercentBelowHigh,
    'Price Oscillator'             : MomentumFactors.make_oscillator(),
    'Money Flow Index'             : MomentumFactors.make_mfi(),
    'Directional Movement Index'   : MomentumFactors.make_dx(),
    'Trendline'                    : MomentumFactors.make_trendline()
}
In [19]:
momentum_factors, t = factor_pipeline(MOMENTUM_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
momentum_factors.info()
Running pipeline: 2017-01-01 00:00:00  -  2017-07-02 00:00:00
Pipeline Execution Time: 5.02 Seconds
Time to run this chunk of the pipeline 6.43 secs
Running pipeline: 2017-07-04 00:00:00  -  2018-01-02 00:00:00
Pipeline Execution Time: 5.06 Seconds
Time to run this chunk of the pipeline 6.47 secs
Running pipeline: 2018-01-03 00:00:00  -  2018-07-04 00:00:00
Pipeline Execution Time: 4.99 Seconds
Time to run this chunk of the pipeline 6.44 secs
Running pipeline: 2018-07-06 00:00:00  -  2018-12-31 00:00:00
Pipeline Execution Time: 5.07 Seconds
Time to run this chunk of the pipeline 6.40 secs
Time to run the entire pipeline 25.75 secs
Pipeline run time 25.75 secs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25100 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 6 columns):
Directional Movement Index    25100 non-null float64
Money Flow Index              25100 non-null float64
Percent Above Low             25018 non-null float64
Percent Below High            25018 non-null float64
Price Oscillator              25100 non-null float64
Trendline                     25018 non-null float64
dtypes: float64(6)
memory usage: 1.3+ MB

Efficiency Factors

In [20]:
class EfficiencyFactors:

    @staticmethod
    def CapexToAssets(mask):
        """Capital Expenditure divided by Total Assets"""
        capex = AnnualizedData(inputs = [Fundamentals.capital_expenditure_asof_date,
                                         Fundamentals.capital_expenditure],
                                     mask=mask)   
        assets = Fundamentals.total_assets.latest
        return - capex / assets

    @staticmethod
    def CapexToSales(mask):
        """Capital Expenditure divided by Total Revenue"""
        capex = AnnualizedData(inputs = [Fundamentals.capital_expenditure_asof_date,
                                         Fundamentals.capital_expenditure],
                                     mask=mask)   
        revenue = AnnualizedData(inputs = [Fundamentals.total_revenue_asof_date,
                                         Fundamentals.total_revenue],
                                     mask=mask)         
        return - capex / revenue
  
    @staticmethod
    def CapexToFCF(mask):
        """Capital Expenditure divided by Free Cash Flows"""
        capex = AnnualizedData(inputs = [Fundamentals.capital_expenditure_asof_date,
                                         Fundamentals.capital_expenditure],
                                     mask=mask)   
        free_cash_flow = AnnualizedData(inputs = [Fundamentals.free_cash_flow_asof_date,
                                         Fundamentals.free_cash_flow],
                                     mask=mask)         
        return - capex / free_cash_flow

    @staticmethod
    def EBITToAssets(mask):
        """Earnings Before Interest and Taxes (EBIT) divided by Total Assets"""
        ebit = AnnualizedData(inputs = [Fundamentals.ebit_asof_date,
                                         Fundamentals.ebit],
                                     mask=mask)   
        assets = Fundamentals.total_assets.latest
        return ebit / assets
    
    @staticmethod
    def CFOToAssets(mask):
        """Operating Cash Flows divided by Total Assets"""
        cfo = AnnualizedData(inputs = [Fundamentals.operating_cash_flow_asof_date,
                                         Fundamentals.operating_cash_flow],
                                     mask=mask)   
        assets = Fundamentals.total_assets.latest
        return cfo / assets 
    
    @staticmethod
    def RetainedEarningsToAssets(mask):
        """Retained Earnings divided by Total Assets"""
        retained_earnings = AnnualizedData(inputs = [Fundamentals.retained_earnings_asof_date,
                                         Fundamentals.retained_earnings],
                                     mask=mask)   
        assets = Fundamentals.total_assets.latest
        return retained_earnings / assets
In [21]:
EFFICIENCY_FACTORS = {
    'CFO To Assets' :EfficiencyFactors.CFOToAssets,
    'Capex To Assets' :EfficiencyFactors.CapexToAssets,
    'Capex To FCF' :EfficiencyFactors.CapexToFCF,
    'Capex To Sales' :EfficiencyFactors.CapexToSales,
    'EBIT To Assets' :EfficiencyFactors.EBITToAssets,
    'Retained Earnings To Assets' :EfficiencyFactors.RetainedEarningsToAssets
    }
In [22]:
efficiency_factors, t = factor_pipeline(EFFICIENCY_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
efficiency_factors.info()
Running pipeline: 2017-01-01 00:00:00  -  2017-07-02 00:00:00
Pipeline Execution Time: 9.52 Seconds
Time to run this chunk of the pipeline 10.96 secs
Running pipeline: 2017-07-04 00:00:00  -  2018-01-02 00:00:00
Pipeline Execution Time: 10.30 Seconds
Time to run this chunk of the pipeline 11.66 secs
Running pipeline: 2018-01-03 00:00:00  -  2018-07-04 00:00:00
Pipeline Execution Time: 10.36 Seconds
Time to run this chunk of the pipeline 11.82 secs
Running pipeline: 2018-07-06 00:00:00  -  2018-12-31 00:00:00
Pipeline Execution Time: 10.47 Seconds
Time to run this chunk of the pipeline 11.92 secs
Time to run the entire pipeline 46.36 secs
Pipeline run time 46.37 secs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25100 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 6 columns):
CFO To Assets                  25005 non-null float64
Capex To Assets                23566 non-null float64
Capex To FCF                   23566 non-null float64
Capex To Sales                 23566 non-null float64
EBIT To Assets                 22369 non-null float64
Retained Earnings To Assets    25005 non-null float64
dtypes: float64(6)
memory usage: 1.3+ MB

Risk Factors

In [23]:
class RiskFactors:

    @staticmethod
    def LogMarketCap(mask):
        """Log of Market Capitalization log(Close Price * Shares Outstanding)"""
        return np.log(MarketCap(mask=mask))
 
    class DownsideRisk(CustomFactor):
        """Mean returns divided by std of 1yr daily losses (Sortino Ratio)"""
        inputs = [USEquityPricing.close]
        window_length = 252

        def compute(self, today, assets, out, close):
            ret = pd.DataFrame(close).pct_change()
            out[:] = ret.mean().div(ret.where(ret<0).std())

    @staticmethod
    def MarketBeta(**kwargs):
        """Slope of 1-yr regression of price returns against index returns"""
        return SimpleBeta(target=symbols('SPY'), regression_length=252) 

    class DownsideBeta(CustomFactor):
        """Slope of 1yr regression of returns on negative index returns"""
        inputs = [USEquityPricing.close]
        window_length = 252

        def compute(self, today, assets, out, close):
            t = len(close)
            assets = pd.DataFrame(close).pct_change()
            
            start_date = (today - pd.DateOffset(years=1)).strftime('%Y-%m-%d')
            spy = get_pricing('SPY', 
                              start_date=start_date, 
                              end_date=today.strftime('%Y-%m-%d')).reset_index(drop=True)
            spy_neg_ret = (spy
                           .close_price
                           .iloc[-t:]
                           .pct_change()
                           .pipe(lambda x: x.where(x<0)))
    
            out[:] = assets.apply(lambda x: x.cov(spy_neg_ret)).div(spy_neg_ret.var())         

    class Vol3M(CustomFactor):
        """3-month Volatility: Standard deviation of returns over 3 months"""

        inputs = [USEquityPricing.close]
        window_length = 63

        def compute(self, today, assets, out, close):
            out[:] = np.log1p(pd.DataFrame(close).pct_change()).std()
In [24]:
RISK_FACTORS = {
    'Log Market Cap' : RiskFactors.LogMarketCap,
    'Downside Risk'  : RiskFactors.DownsideRisk,
    'Index Beta'     : RiskFactors.MarketBeta,
     #'Downside Beta'  : RiskFactors.DownsideBeta,    
    'Volatility 3M'  : RiskFactors.Vol3M,    
}
In [25]:
risk_factors, t = factor_pipeline(RISK_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
risk_factors.info()
Running pipeline: 2017-01-01 00:00:00  -  2017-07-02 00:00:00
Pipeline Execution Time: 10.42 Seconds
Time to run this chunk of the pipeline 12.55 secs
Running pipeline: 2017-07-04 00:00:00  -  2018-01-02 00:00:00
Pipeline Execution Time: 10.37 Seconds
Time to run this chunk of the pipeline 11.74 secs
Running pipeline: 2018-01-03 00:00:00  -  2018-07-04 00:00:00
Pipeline Execution Time: 10.65 Seconds
Time to run this chunk of the pipeline 12.03 secs
Running pipeline: 2018-07-06 00:00:00  -  2018-12-31 00:00:00
Pipeline Execution Time: 10.49 Seconds
Time to run this chunk of the pipeline 11.87 secs
Time to run the entire pipeline 48.19 secs
Pipeline run time 55.26 secs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25100 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 4 columns):
Downside Risk     25100 non-null float64
Index Beta        25100 non-null float64
Log Market Cap    25100 non-null float64
Volatility 3M     25100 non-null float64
dtypes: float64(4)
memory usage: 980.5+ KB

Growth Factors

In [26]:
def growth_pipeline():
    revenue = AnnualizedData(inputs = [Fundamentals.total_revenue_asof_date,
                                       Fundamentals.total_revenue],
                             mask=UNIVERSE)
    eps = AnnualizedData(inputs = [Fundamentals.diluted_eps_earnings_reports_asof_date,
                                       Fundamentals.diluted_eps_earnings_reports],
                             mask=UNIVERSE)    

    return Pipeline({'Sales': revenue,
                     'EPS': eps,
                     'Total Assets': Fundamentals.total_assets.latest,
                     'Net Debt': Fundamentals.net_debt.latest},
                    screen=UNIVERSE)
In [27]:
start_timer = time()
growth_factors = run_pipeline(growth_pipeline(), start_date=START, end_date=END)

for col in growth_factors.columns:
    for month in [3, 12]:
        new_col = col + ' Growth {}M'.format(month)
        kwargs = {new_col: growth_factors[col].pct_change(month*MONTH).groupby(level=1).rank()}        
        growth_factors = growth_factors.assign(**kwargs)
print('Pipeline run time {:.2f} secs'.format(time() - start_timer))
growth_factors.info()
Pipeline Execution Time: 20.40 Seconds
Pipeline run time 24.33 secs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25100 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 12 columns):
EPS                        24985 non-null float64
Net Debt                   23832 non-null float64
Sales                      25005 non-null float64
Total Assets               25100 non-null float64
EPS Growth 3M              24922 non-null float64
EPS Growth 12M             24733 non-null float64
Net Debt Growth 3M         23772 non-null float64
Net Debt Growth 12M        23595 non-null float64
Sales Growth 3M            24942 non-null float64
Sales Growth 12M           24753 non-null float64
Total Assets Growth 3M     25037 non-null float64
Total Assets Growth 12M    24848 non-null float64
dtypes: float64(12)
memory usage: 2.5+ MB

Quality Factors

In [28]:
class QualityFactors:
    
    @staticmethod
    def AssetTurnover(mask):
        """Sales divided by average of year beginning and year end assets"""

        assets = AnnualAvg(inputs=[Fundamentals.total_assets],
                           mask=mask)
        sales = AnnualizedData([Fundamentals.total_revenue_asof_date,
                                Fundamentals.total_revenue], mask=mask)
        return sales / assets
  
    @staticmethod
    def CurrentRatio(mask):
        """Total current assets divided by total current liabilities"""

        assets = Fundamentals.current_assets.latest
        liabilities = Fundamentals.current_liabilities.latest
        return assets / liabilities
    
    @staticmethod
    def AssetToEquityRatio(mask):
        """Total current assets divided by common equity"""

        assets = Fundamentals.current_assets.latest
        equity = Fundamentals.common_stock.latest
        return assets / equity    

    
    @staticmethod
    def InterestCoverage(mask):
        """EBIT divided by interest expense"""

        ebit = AnnualizedData(inputs = [Fundamentals.ebit_asof_date,
                                        Fundamentals.ebit], mask=mask)  
        
        interest_expense = AnnualizedData(inputs = [Fundamentals.interest_expense_asof_date,
                                        Fundamentals.interest_expense], mask=mask)
        return ebit / interest_expense

    @staticmethod
    def DebtToAssetRatio(mask):
        """Total Debts divided by Total Assets"""

        debt = Fundamentals.total_debt.latest
        assets = Fundamentals.total_assets.latest
        return debt / assets
    
    @staticmethod
    def DebtToEquityRatio(mask):
        """Total Debts divided by Common Stock Equity"""

        debt = Fundamentals.total_debt.latest
        equity = Fundamentals.common_stock.latest
        return debt / equity    

    @staticmethod
    def WorkingCapitalToAssets(mask):
        """Current Assets less Current liabilities (Working Capital) divided by Assets"""

        working_capital = Fundamentals.working_capital.latest
        assets = Fundamentals.total_assets.latest
        return working_capital / assets
 
    @staticmethod
    def WorkingCapitalToSales(mask):
        """Current Assets less Current liabilities (Working Capital), divided by Sales"""

        working_capital = Fundamentals.working_capital.latest
        sales = AnnualizedData([Fundamentals.total_revenue_asof_date,
                                Fundamentals.total_revenue], mask=mask)        
        return working_capital / sales          
       
        
    class MertonsDD(CustomFactor):
        """Merton's Distance to Default """
        
        inputs = [Fundamentals.total_assets,
                  Fundamentals.total_liabilities, 
                  libor.value, 
                  USEquityPricing.close]
        window_length = 252

        def compute(self, today, assets, out, tot_assets, tot_liabilities, r, close):
            mertons = []

            for col_assets, col_liabilities, col_r, col_close in zip(tot_assets.T, tot_liabilities.T,
                                                                     r.T, close.T):
                vol_1y = np.nanstd(col_close)
                numerator = np.log(
                        col_assets[-1] / col_liabilities[-1]) + ((252 * col_r[-1]) - ((vol_1y ** 2) / 2))
                mertons.append(numerator / vol_1y)

            out[:] = mertons            
In [29]:
QUALITY_FACTORS = {
    'AssetToEquityRatio'    : QualityFactors.AssetToEquityRatio,
    'AssetTurnover'         : QualityFactors.AssetTurnover,
    'CurrentRatio'          : QualityFactors.CurrentRatio,
    'DebtToAssetRatio'      : QualityFactors.DebtToAssetRatio,
    'DebtToEquityRatio'     : QualityFactors.DebtToEquityRatio,
    'InterestCoverage'      : QualityFactors.InterestCoverage,
    'MertonsDD'             : QualityFactors.MertonsDD,
    'WorkingCapitalToAssets': QualityFactors.WorkingCapitalToAssets,
    'WorkingCapitalToSales' : QualityFactors.WorkingCapitalToSales,
}
    
In [30]:
quality_factors, t = factor_pipeline(QUALITY_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
quality_factors.info()
Running pipeline: 2017-01-01 00:00:00  -  2017-07-02 00:00:00
Pipeline Execution Time: 34.88 Seconds
Time to run this chunk of the pipeline 36.23 secs
Running pipeline: 2017-07-04 00:00:00  -  2018-01-02 00:00:00
Pipeline Execution Time: 32.94 Seconds
Time to run this chunk of the pipeline 34.32 secs
Running pipeline: 2018-01-03 00:00:00  -  2018-07-04 00:00:00
Pipeline Execution Time: 32.99 Seconds
Time to run this chunk of the pipeline 34.36 secs
Running pipeline: 2018-07-06 00:00:00  -  2018-12-31 00:00:00
Pipeline Execution Time: 32.61 Seconds
Time to run this chunk of the pipeline 33.97 secs
Time to run the entire pipeline 138.88 secs
Pipeline run time 138.89 secs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25100 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 9 columns):
AssetToEquityRatio        22464 non-null float64
AssetTurnover             24985 non-null float64
CurrentRatio              22464 non-null float64
DebtToAssetRatio          25080 non-null float64
DebtToEquityRatio         24551 non-null float64
InterestCoverage          20461 non-null float64
MertonsDD                 25100 non-null float64
WorkingCapitalToAssets    22464 non-null float64
WorkingCapitalToSales     22369 non-null float64
dtypes: float64(9)
memory usage: 1.9+ MB

Payout Factors

In [31]:
class PayoutFactors:

    @staticmethod
    def DividendPayoutRatio(mask):
        """Dividends Per Share divided by Earnings Per Share"""

        dps = AnnualizedData(inputs = [Fundamentals.dividend_per_share_earnings_reports_asof_date,
                                        Fundamentals.dividend_per_share_earnings_reports], mask=mask)  
        
        eps = AnnualizedData(inputs = [Fundamentals.basic_eps_earnings_reports_asof_date,
                                        Fundamentals.basic_eps_earnings_reports], mask=mask)
        return dps / eps
    
    @staticmethod
    def DividendGrowth(**kwargs):
        """Annualized percentage DPS change"""        
        return Fundamentals.dps_growth.latest    
In [32]:
PAYOUT_FACTORS = {
    'Dividend Payout Ratio': PayoutFactors.DividendPayoutRatio,
    'Dividend Growth': PayoutFactors.DividendGrowth
}
In [33]:
payout_factors, t = factor_pipeline(PAYOUT_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
payout_factors.info()
Running pipeline: 2017-01-01 00:00:00  -  2017-07-02 00:00:00
Pipeline Execution Time: 6.58 Seconds
Time to run this chunk of the pipeline 7.93 secs
Running pipeline: 2017-07-04 00:00:00  -  2018-01-02 00:00:00
Pipeline Execution Time: 5.69 Seconds
Time to run this chunk of the pipeline 7.02 secs
Running pipeline: 2018-01-03 00:00:00  -  2018-07-04 00:00:00
Pipeline Execution Time: 5.63 Seconds
Time to run this chunk of the pipeline 7.05 secs
Running pipeline: 2018-07-06 00:00:00  -  2018-12-31 00:00:00
Pipeline Execution Time: 5.88 Seconds
Time to run this chunk of the pipeline 7.27 secs
Time to run the entire pipeline 29.27 secs
Pipeline run time 29.28 secs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25100 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 2 columns):
Dividend Growth          19558 non-null float64
Dividend Payout Ratio    19418 non-null float64
dtypes: float64(2)
memory usage: 588.3+ KB

Profitability Factors

In [34]:
class ProfitabilityFactors:
    
    @staticmethod
    def GrossProfitMargin(mask):
        """Gross Profit divided by Net Sales"""

        gross_profit = AnnualizedData([Fundamentals.gross_profit_asof_date,
                              Fundamentals.gross_profit], mask=mask)  
        sales = AnnualizedData([Fundamentals.total_revenue_asof_date,
                                Fundamentals.total_revenue], mask=mask)
        return gross_profit / sales   
    
    @staticmethod
    def NetIncomeMargin(mask):
        """Net income divided by Net Sales"""

        net_income = AnnualizedData([Fundamentals.net_income_income_statement_asof_date,
                              Fundamentals.net_income_income_statement], mask=mask)  
        sales = AnnualizedData([Fundamentals.total_revenue_asof_date,
                                Fundamentals.total_revenue], mask=mask)
        return net_income / sales   
In [35]:
PROFITABIILTY_FACTORS = {
    'Gross Profit Margin': ProfitabilityFactors.GrossProfitMargin,
    'Net Income Margin': ProfitabilityFactors.NetIncomeMargin,
    'Return on Equity': Fundamentals.roe.latest,
    'Return on Assets': Fundamentals.roa.latest,
    'Return on Invested Capital': Fundamentals.roic.latest
}
In [36]:
profitability_factors, t = factor_pipeline(PAYOUT_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
payout_factors.info()
Running pipeline: 2017-01-01 00:00:00  -  2017-07-02 00:00:00
Pipeline Execution Time: 5.69 Seconds
Time to run this chunk of the pipeline 7.78 secs
Running pipeline: 2017-07-04 00:00:00  -  2018-01-02 00:00:00
Pipeline Execution Time: 5.78 Seconds
Time to run this chunk of the pipeline 7.20 secs
Running pipeline: 2018-01-03 00:00:00  -  2018-07-04 00:00:00
Pipeline Execution Time: 5.68 Seconds
Time to run this chunk of the pipeline 7.09 secs
Running pipeline: 2018-07-06 00:00:00  -  2018-12-31 00:00:00
Pipeline Execution Time: 6.63 Seconds
Time to run this chunk of the pipeline 8.04 secs
Time to run the entire pipeline 30.12 secs
Pipeline run time 30.13 secs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25100 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 2 columns):
Dividend Growth          19558 non-null float64
Dividend Payout Ratio    19418 non-null float64
dtypes: float64(2)
memory usage: 588.3+ KB

Build Dataset

Get Returns

We will test predictions for various lookahead periods to identify the best holding periods that generate the best predictability, measured by the information coefficient.

More specifically, we compute returns for 1, 5, 10, 20 and 60 days using the built-in Returns function, resulting in over 25,000 observations for the universe of 100 stocks over two years (that include approximately 252 trading days each)

In [37]:
lookahead = [1, 5, 10, 20, 60]
returns = run_pipeline(Pipeline({'Returns{}D'.format(i): Returns(inputs=[USEquityPricing.close], 
                                          window_length=i+1, mask=UNIVERSE) for i in lookahead},
                                screen=UNIVERSE),
                       start_date=START, 
                       end_date=END)
return_cols = ['Returns{}D'.format(i) for i in lookahead]
returns.info()
Pipeline Execution Time: 12.53 Seconds
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25100 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 5 columns):
Returns10D    25100 non-null float64
Returns1D     25100 non-null float64
Returns20D    25100 non-null float64
Returns5D     25100 non-null float64
Returns60D    25100 non-null float64
dtypes: float64(5)
memory usage: 1.1+ MB

We will use over 50 features that cover a broad range of factors based on market, fundamental, and alternative data. The notebook also includes custom transformations to convert fundamental data that is typically available in quarterly reporting frequency to rolling annual totals or averages to avoid excessive season fluctuations.

Once the factors have been computed we combine them using pd.concat(), assign index names, and create a categorical variable that identifies the asset for each data point:

In [38]:
data = pd.concat([returns,
                 value_factors,
                 momentum_factors,
                 quality_factors,
                 payout_factors,
                 growth_factors,
                 efficiency_factors,
                 risk_factors], axis=1).sortlevel()
data.index.names = ['date', 'asset']
In [39]:
data['stock'] = data.index.get_level_values('asset').map(lambda x: x.asset_name)
data.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25100 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 56 columns):
Returns10D                     25100 non-null float64
Returns1D                      25100 non-null float64
Returns20D                     25100 non-null float64
Returns5D                      25100 non-null float64
Returns60D                     25100 non-null float64
DividendYield                  19739 non-null float64
EBITDAYield                    21929 non-null float64
EVToEBITDA                     21929 non-null float64
EVToFCF                        25005 non-null float64
PriceToBook                    25100 non-null float64
PriceToDilutedEarningsTTM      24985 non-null float64
PriceToEarningsTTM             24804 non-null float64
PriceToFCF                     25100 non-null float64
PriceToForwardEarnings         25080 non-null float64
PriceToOperatingCashflow       25100 non-null float64
PriceToSalesTTM                25100 non-null float64
Directional Movement Index     25100 non-null float64
Money Flow Index               25100 non-null float64
Percent Above Low              25018 non-null float64
Percent Below High             25018 non-null float64
Price Oscillator               25100 non-null float64
Trendline                      25018 non-null float64
AssetToEquityRatio             22464 non-null float64
AssetTurnover                  24985 non-null float64
CurrentRatio                   22464 non-null float64
DebtToAssetRatio               25080 non-null float64
DebtToEquityRatio              24551 non-null float64
InterestCoverage               20461 non-null float64
MertonsDD                      25100 non-null float64
WorkingCapitalToAssets         22464 non-null float64
WorkingCapitalToSales          22369 non-null float64
Dividend Growth                19558 non-null float64
Dividend Payout Ratio          19418 non-null float64
EPS                            24985 non-null float64
Net Debt                       23832 non-null float64
Sales                          25005 non-null float64
Total Assets                   25100 non-null float64
EPS Growth 3M                  24922 non-null float64
EPS Growth 12M                 24733 non-null float64
Net Debt Growth 3M             23772 non-null float64
Net Debt Growth 12M            23595 non-null float64
Sales Growth 3M                24942 non-null float64
Sales Growth 12M               24753 non-null float64
Total Assets Growth 3M         25037 non-null float64
Total Assets Growth 12M        24848 non-null float64
CFO To Assets                  25005 non-null float64
Capex To Assets                23566 non-null float64
Capex To FCF                   23566 non-null float64
Capex To Sales                 23566 non-null float64
EBIT To Assets                 22369 non-null float64
Retained Earnings To Assets    25005 non-null float64
Downside Risk                  25100 non-null float64
Index Beta                     25100 non-null float64
Log Market Cap                 25100 non-null float64
Volatility 3M                  25100 non-null float64
stock                          25100 non-null object
dtypes: float64(55), object(1)
memory usage: 10.9+ MB

Visualizing missing values

In [40]:
# Craete sorted dataframe of numeric_features with missing_count
missing_values0 = data.isnull().sum(axis=0).reset_index()
missing_values0.columns = ['column_name', 'missing_count']
missing_values0 = missing_values0.loc[missing_values0['missing_count']>0]
missing_values0 = missing_values0.sort_values(by='missing_count')
In [41]:
# Get percantage of total NaNs numeric_features
total0 = data.isnull().sum().sort_values(ascending=False)
percent0 = (data.isnull().sum()/data.isnull().count()).sort_values(ascending=False)
missing_data0 = pd.concat([total0, percent0], axis=1,join='outer', keys=['Total Missing Count', '% of Total Observations'])
missing_data0.index.name =' Numeric Feature'

missing_data0.head(len(data.columns))
Out[41]:
Total Missing Count % of Total Observations
Numeric Feature
Dividend Payout Ratio 5682 0.226375
Dividend Growth 5542 0.220797
DividendYield 5361 0.213586
InterestCoverage 4639 0.184821
EVToEBITDA 3171 0.126335
EBITDAYield 3171 0.126335
EBIT To Assets 2731 0.108805
WorkingCapitalToSales 2731 0.108805
AssetToEquityRatio 2636 0.105020
CurrentRatio 2636 0.105020
WorkingCapitalToAssets 2636 0.105020
Capex To Sales 1534 0.061116
Capex To FCF 1534 0.061116
Capex To Assets 1534 0.061116
Net Debt Growth 12M 1505 0.059960
Net Debt Growth 3M 1328 0.052908
Net Debt 1268 0.050518
DebtToEquityRatio 549 0.021873
EPS Growth 12M 367 0.014622
Sales Growth 12M 347 0.013825
PriceToEarningsTTM 296 0.011793
Total Assets Growth 12M 252 0.010040
EPS Growth 3M 178 0.007092
Sales Growth 3M 158 0.006295
EPS 115 0.004582
PriceToDilutedEarningsTTM 115 0.004582
AssetTurnover 115 0.004582
CFO To Assets 95 0.003785
Retained Earnings To Assets 95 0.003785
Sales 95 0.003785
EVToFCF 95 0.003785
Percent Below High 82 0.003267
Percent Above Low 82 0.003267
Trendline 82 0.003267
Total Assets Growth 3M 63 0.002510
DebtToAssetRatio 20 0.000797
PriceToForwardEarnings 20 0.000797
Returns1D 0 0.000000
Returns20D 0 0.000000
Returns5D 0 0.000000
PriceToBook 0 0.000000
PriceToFCF 0 0.000000
Returns60D 0 0.000000
stock 0 0.000000
PriceToOperatingCashflow 0 0.000000
PriceToSalesTTM 0 0.000000
Directional Movement Index 0 0.000000
Money Flow Index 0 0.000000
Price Oscillator 0 0.000000
Volatility 3M 0 0.000000
MertonsDD 0 0.000000
Total Assets 0 0.000000
Downside Risk 0 0.000000
Index Beta 0 0.000000
Log Market Cap 0 0.000000
Returns10D 0 0.000000
In [42]:
ind0 = np.arange(missing_values0.shape[0])
width0 = 0.1
fig, ax = plt.subplots(figsize=(13,5))
colors0 = sns.color_palette('Set2', len(ind0))
rects0 = ax.bar(ind0, missing_values0.missing_count.values, color=colors0)
ax.set_xticks(ind0)
ax.set_xticklabels(missing_values0.column_name.values, rotation='vertical')
ax.set_ylabel("Count")
ax.set_title("Missing Observations Count")
ax.margins(0.001)
plt.show()

Remove columns and rows with less than 80% of data availability

In a next step, we remove rows and columns that lack more than 20 percent of the observations, resulting in a loss of six percent of the observations and 5 columns:

In [43]:
rows_before, cols_before = data.shape
data = (data
        .dropna(axis=1, thresh=int(len(data)*.8))
        .dropna(thresh=int(len(data.columns) * .8)))
#data = data.fillna(data.median())
data = data.bfill().ffill()
rows_after, cols_after = data.shape
print('{:,d} rows and {:,d} columns dropped'.format(rows_before-rows_after, cols_before-cols_after))
1,571 rows and 3 columns dropped

At this point, we have 51 features and the categorical identifier of the stock:

In [44]:
data.sort_index(1).info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 23529 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 53 columns):
AssetToEquityRatio             23529 non-null float64
AssetTurnover                  23529 non-null float64
CFO To Assets                  23529 non-null float64
Capex To Assets                23529 non-null float64
Capex To FCF                   23529 non-null float64
Capex To Sales                 23529 non-null float64
CurrentRatio                   23529 non-null float64
DebtToAssetRatio               23529 non-null float64
DebtToEquityRatio              23529 non-null float64
Directional Movement Index     23529 non-null float64
Downside Risk                  23529 non-null float64
EBIT To Assets                 23529 non-null float64
EBITDAYield                    23529 non-null float64
EPS                            23529 non-null float64
EPS Growth 12M                 23529 non-null float64
EPS Growth 3M                  23529 non-null float64
EVToEBITDA                     23529 non-null float64
EVToFCF                        23529 non-null float64
Index Beta                     23529 non-null float64
InterestCoverage               23529 non-null float64
Log Market Cap                 23529 non-null float64
MertonsDD                      23529 non-null float64
Money Flow Index               23529 non-null float64
Net Debt                       23529 non-null float64
Net Debt Growth 12M            23529 non-null float64
Net Debt Growth 3M             23529 non-null float64
Percent Above Low              23529 non-null float64
Percent Below High             23529 non-null float64
Price Oscillator               23529 non-null float64
PriceToBook                    23529 non-null float64
PriceToDilutedEarningsTTM      23529 non-null float64
PriceToEarningsTTM             23529 non-null float64
PriceToFCF                     23529 non-null float64
PriceToForwardEarnings         23529 non-null float64
PriceToOperatingCashflow       23529 non-null float64
PriceToSalesTTM                23529 non-null float64
Retained Earnings To Assets    23529 non-null float64
Returns10D                     23529 non-null float64
Returns1D                      23529 non-null float64
Returns20D                     23529 non-null float64
Returns5D                      23529 non-null float64
Returns60D                     23529 non-null float64
Sales                          23529 non-null float64
Sales Growth 12M               23529 non-null float64
Sales Growth 3M                23529 non-null float64
Total Assets                   23529 non-null float64
Total Assets Growth 12M        23529 non-null float64
Total Assets Growth 3M         23529 non-null float64
Trendline                      23529 non-null float64
Volatility 3M                  23529 non-null float64
WorkingCapitalToAssets         23529 non-null float64
WorkingCapitalToSales          23529 non-null float64
stock                          23529 non-null object
dtypes: float64(52), object(1)
memory usage: 9.7+ MB

Data Exploration

First lets take a look at the individual distributions of all our data.

In [45]:
data.hist(bins=25, figsize=(22,22))
plt.show()

It is always a good idea to check the relationship between your features and target variable. Here we will look at a scatter plot of the 60 day target variable along with the p-value, r2 score and mean IC (information coefficient) for each feature.

In [50]:
tmp = data.drop(['Returns1D','Returns5D','Returns10D','Returns20D'], axis=1)
tmp.reset_index(level=['asset'], inplace=True, drop=True)
tmp.head()
Out[50]:
Returns60D EBITDAYield EVToEBITDA EVToFCF PriceToBook PriceToDilutedEarningsTTM PriceToEarningsTTM PriceToFCF PriceToForwardEarnings PriceToOperatingCashflow ... Capex To Assets Capex To FCF Capex To Sales EBIT To Assets Retained Earnings To Assets Downside Risk Index Beta Log Market Cap Volatility 3M stock
date
2017-01-03 00:00:00+00:00 0.029940 5.0 11.0 18.0 31.0 14.0 1063.0 16.0 1161.0 18.0 ... 29.0 22.0 28.0 40.0 26.0 24.0 3645.0 50.0 11.0 APPLE INC
2017-01-03 00:00:00+00:00 0.165562 34.0 24.0 15.0 50.0 35.0 2345.0 18.0 1833.0 16.0 ... 24.0 27.0 13.0 13.0 34.0 23.0 4675.0 17.0 16.0 BOEING CO
2017-01-03 00:00:00+00:00 0.066606 32.0 36.0 49.0 35.0 40.0 2788.0 49.0 2400.0 48.0 ... 25.0 44.0 26.0 31.0 47.0 6.0 2556.0 19.0 43.0 BRISTOL-MYERS SQUIBB CO
2017-01-03 00:00:00+00:00 0.099725 36.0 39.0 37.0 46.0 46.0 3390.0 40.0 1883.0 41.0 ... 12.0 12.0 14.0 27.0 28.0 10.0 5036.0 14.0 44.0 CELGENE CORP
2017-01-03 00:00:00+00:00 0.055036 13.0 10.0 39.0 21.0 28.0 1892.0 34.0 2227.0 14.0 ... 38.0 46.0 42.0 17.0 18.0 46.0 2461.0 29.0 14.0 COMCAST CORP

5 rows × 49 columns

In [47]:
def r2(x, y):
    return pearsonr(x, y)[0] ** 2
In [54]:
count = 0
for i, feature in enumerate(list(tmp), 1):
    count += 1
    
    if(feature == 'Returns60D'):
        print()
        
    else:
        print('{} # {}'.format(feature, count))
        plt.figure(figsize=(8,5))
        
        cm = plt.get_cmap('jet')
        colors = np.linspace(0.1, 1, len(tmp))
                             
        sc = plt.scatter(tmp[feature], tmp['Returns60D'], s=25, c=colors, cmap=cm, 
                 edgecolor='k', alpha=0.3, label='Price Data')
        
        j = sns.regplot(tmp[feature], tmp['Returns60D'], data=tmp, scatter=False, 
                line_kws={'color':'k','lw':2, 'linestyle':'dashed'})
    
        cb = plt.colorbar(sc)
        cb.ax.set_yticklabels([str(p) for p in tmp[::len(tmp)//9].index],
                         fontdict = {'fontsize': 10,
                                     'fontweight': 'medium'})
    
        plt.xlabel('{}'.format(feature), size=10, labelpad=10, fontsize=10, fontweight='medium')
        plt.ylabel('Returns60D', size=10, labelpad=10, fontsize=10, fontweight='medium')
        plt.grid(False)
        ic, pval = spearmanr(tmp[feature], tmp['Returns60D'])
        R2 = r2(tmp[feature], tmp['Returns60D'])
        plt.title('r2 = {}, IC = {}, P-Value = {}'.format(round(R2,4), round(ic,4), pval))
            
        for j in range(2):
            plt.tick_params(axis='x', labelsize=10)
            plt.tick_params(axis='y', labelsize=10)
            
        plt.show()
        
        if(count == len(tmp.columns)-1):
            break
EBITDAYield # 2
EVToEBITDA # 3
EVToFCF # 4
PriceToBook # 5
PriceToDilutedEarningsTTM # 6
PriceToEarningsTTM # 7
PriceToFCF # 8
PriceToForwardEarnings # 9
PriceToOperatingCashflow # 10
PriceToSalesTTM # 11
Directional Movement Index # 12
Money Flow Index # 13
Percent Above Low # 14
Percent Below High # 15
Price Oscillator # 16
Trendline # 17
AssetToEquityRatio # 18
AssetTurnover # 19
CurrentRatio # 20
DebtToAssetRatio # 21
DebtToEquityRatio # 22
InterestCoverage # 23
MertonsDD # 24
WorkingCapitalToAssets # 25
WorkingCapitalToSales # 26
EPS # 27
Net Debt # 28
Sales # 29
Total Assets # 30
EPS Growth 3M # 31
EPS Growth 12M # 32
Net Debt Growth 3M # 33
Net Debt Growth 12M # 34
Sales Growth 3M # 35
Sales Growth 12M # 36
Total Assets Growth 3M # 37
Total Assets Growth 12M # 38
CFO To Assets # 39
Capex To Assets # 40
Capex To FCF # 41
Capex To Sales # 42
EBIT To Assets # 43
Retained Earnings To Assets # 44
Downside Risk # 45
Index Beta # 46
Log Market Cap # 47
Volatility 3M # 48

For linear regression models, it is important to explore the correlation among the features to identify multicollinearity issues, and to check the correlation between the features and the target. The notebook contains a seaborn clustermap that shows the hierarchical structure of the feature correlation matrix. It identifies a small number of highly correlated clusters.

In [55]:
g = sns.clustermap(data.drop(['stock'] + return_cols, axis=1).corr(), square=True)
g.ax_heatmap.set_yticklabels(g.ax_heatmap.get_yticklabels(), rotation=0)
plt.title('Correlation of all_features',y=1, x=5,size=20)
plt.show();

Dummy encoding of categorical variables

We need to convert the categorical stock variable into a numeric format so that the linear regression can process it. For this purpose, we use dummy encoding that creates individual columns for each category level and flags the presence of this level in the original categorical column with an entry of 1, and 0 otherwise. The pandas function get_dummies() automates dummy encoding. It detects and properly converts columns of type objects as illustrated next. If you need dummy variables for columns containing integers, for instance, you can identify them using the keyword columns:

In [56]:
X = pd.get_dummies(data.drop(return_cols, axis=1), prefix_sep='_')
X.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 23529 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Columns: 116 entries, EBITDAYield to stock_WELLS FARGO & CO(NEW)
dtypes: float64(116)
memory usage: 21.0+ MB

Creating forward returns

The goal is to predict returns over a given holding period. Hence, we need to align the features with return values with the corresponding return data point 1, 5, 10, 20 or 60 days into the future for each equity. We achieve this by combining the pandas .groupby() method with the .shift() method as follows:

In [57]:
y = data.loc[:, return_cols]
shifted_y = []
for col in y.columns:
    t = int(re.search(r'\d+', col).group(0))
    shifted_y.append(y.groupby(level='asset')['Returns{}D'.format(t)].shift(-t).to_frame(col))
y = pd.concat(shifted_y, axis=1)
y.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 23529 entries, (2017-01-03 00:00:00+00:00, Equity(24 [AAPL])) to (2018-12-31 00:00:00+00:00, Equity(51157 [DD]))
Data columns (total 5 columns):
Returns1D     23460 non-null float64
Returns5D     23184 non-null float64
Returns10D    22839 non-null float64
Returns20D    22149 non-null float64
Returns60D    19508 non-null float64
dtypes: float64(5)
memory usage: 1.1+ MB
In [59]:
ax = sns.boxplot(y[return_cols])
ax.set_title('Return Distriubtions');

Linear Regression

Statsmodels

We can estimate a linear regression model using OLS with statsmodels. We select a forward return, for example for a 1-day holding period, remove outliers below the 2.5% and above the 97.5% percentiles, and fit the model accordingly:

In [60]:
target = 'Returns1D'
model_data = pd.concat([y[[target]], X], axis=1).dropna()
model_data = model_data[model_data[target].between(model_data[target].quantile(.025), 
                                                   model_data[target].quantile(.975))]

model = OLS(endog=model_data[target], exog=model_data.drop(target, axis=1))
trained_model = model.fit()
trained_model.summary()
Out[60]:
OLS Regression Results
Dep. Variable: Returns1D R-squared: 0.010
Model: OLS Adj. R-squared: 0.005
Method: Least Squares F-statistic: 2.052
Date: Tue, 03 Mar 2020 Prob (F-statistic): 5.34e-10
Time: 18:53:42 Log-Likelihood: 66771.
No. Observations: 22286 AIC: -1.333e+05
Df Residuals: 22174 BIC: -1.324e+05
Df Model: 111
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
EBITDAYield 1.065e-05 3.65e-05 0.291 0.771 -6.09e-05 8.22e-05
EVToEBITDA 3.372e-05 3.29e-05 1.025 0.306 -3.08e-05 9.82e-05
EVToFCF 8.299e-07 2.12e-05 0.039 0.969 -4.06e-05 4.23e-05
PriceToBook 9.263e-06 1.96e-05 0.473 0.637 -2.92e-05 4.77e-05
PriceToDilutedEarningsTTM 3.505e-05 1.34e-05 2.615 0.009 8.78e-06 6.13e-05
PriceToEarningsTTM -6.485e-07 2.41e-07 -2.694 0.007 -1.12e-06 -1.77e-07
PriceToFCF 4.116e-06 2.55e-05 0.161 0.872 -4.58e-05 5.41e-05
PriceToForwardEarnings -1.585e-06 4.28e-07 -3.700 0.000 -2.42e-06 -7.45e-07
PriceToOperatingCashflow 1.644e-06 1.39e-05 0.119 0.906 -2.55e-05 2.88e-05
PriceToSalesTTM -1.978e-06 8.37e-07 -2.362 0.018 -3.62e-06 -3.37e-07
Directional Movement Index 2.858e-06 5.75e-06 0.497 0.619 -8.41e-06 1.41e-05
Money Flow Index -6.4e-06 7.6e-06 -0.842 0.400 -2.13e-05 8.49e-06
Percent Above Low 3.01e-05 1.77e-05 1.702 0.089 -4.57e-06 6.48e-05
Percent Below High -1.671e-05 1.09e-05 -1.527 0.127 -3.82e-05 4.74e-06
Price Oscillator 1.143e-05 8.17e-06 1.399 0.162 -4.58e-06 2.74e-05
Trendline -4.609e-05 1.48e-05 -3.115 0.002 -7.51e-05 -1.71e-05
AssetToEquityRatio 1.675e-07 6.15e-07 0.272 0.785 -1.04e-06 1.37e-06
AssetTurnover -9.595e-05 5.46e-05 -1.757 0.079 -0.000 1.11e-05
CurrentRatio -8.405e-08 8.02e-07 -0.105 0.917 -1.66e-06 1.49e-06
DebtToAssetRatio -3.235e-07 5.16e-07 -0.627 0.531 -1.34e-06 6.88e-07
DebtToEquityRatio -4.524e-07 3.96e-07 -1.142 0.253 -1.23e-06 3.24e-07
InterestCoverage 1.216e-05 2.59e-05 0.470 0.639 -3.86e-05 6.29e-05
MertonsDD 0.0003 7.99e-05 3.894 0.000 0.000 0.000
WorkingCapitalToAssets 1.473e-06 9.75e-07 1.511 0.131 -4.39e-07 3.39e-06
WorkingCapitalToSales -0.0001 5.17e-05 -1.984 0.047 -0.000 -1.26e-06
EPS 7.201e-08 1.38e-07 0.523 0.601 -1.98e-07 3.42e-07
Net Debt -2.383e-14 1.04e-14 -2.299 0.022 -4.41e-14 -3.51e-15
Sales -1.756e-14 1.76e-14 -0.998 0.318 -5.21e-14 1.69e-14
Total Assets 9.465e-15 8.61e-15 1.100 0.271 -7.4e-15 2.63e-14
EPS Growth 3M 8.152e-07 6.98e-07 1.168 0.243 -5.52e-07 2.18e-06
EPS Growth 12M 2.448e-07 7.33e-07 0.334 0.738 -1.19e-06 1.68e-06
Net Debt Growth 3M 1.158e-06 8.66e-07 1.338 0.181 -5.39e-07 2.86e-06
Net Debt Growth 12M 7.935e-07 8.06e-07 0.985 0.325 -7.86e-07 2.37e-06
Sales Growth 3M 1.347e-06 8.79e-07 1.532 0.125 -3.76e-07 3.07e-06
Sales Growth 12M -1.057e-06 7.49e-07 -1.412 0.158 -2.52e-06 4.1e-07
Total Assets Growth 3M -2.06e-06 1e-06 -2.059 0.040 -4.02e-06 -9.85e-08
Total Assets Growth 12M -4.765e-07 8.19e-07 -0.582 0.561 -2.08e-06 1.13e-06
CFO To Assets 5.146e-05 3e-05 1.715 0.086 -7.36e-06 0.000
Capex To Assets 1.494e-05 4.9e-05 0.305 0.760 -8.1e-05 0.000
Capex To FCF -7.414e-06 2.15e-05 -0.344 0.731 -4.96e-05 3.48e-05
Capex To Sales -3.405e-05 4.34e-05 -0.785 0.433 -0.000 5.1e-05
EBIT To Assets -3.421e-05 3.34e-05 -1.023 0.306 -9.98e-05 3.13e-05
Retained Earnings To Assets -0.0001 5.34e-05 -1.960 0.050 -0.000 -2.71e-09
Downside Risk 4.683e-05 1.73e-05 2.705 0.007 1.29e-05 8.08e-05
Index Beta -1.026e-07 1.26e-07 -0.817 0.414 -3.49e-07 1.43e-07
Log Market Cap -7.124e-06 3.96e-05 -0.180 0.857 -8.48e-05 7.06e-05
Volatility 3M -2.913e-05 1.13e-05 -2.575 0.010 -5.13e-05 -6.95e-06
stock_ABBVIE INC -0.0007 0.006 -0.104 0.917 -0.013 0.012
stock_ADVANCED MICRO DEVICES INC 0.0149 0.005 3.280 0.001 0.006 0.024
stock_ALLERGAN PLC 0.0050 0.006 0.886 0.376 -0.006 0.016
stock_ALTABA INC 0.0135 0.006 2.312 0.021 0.002 0.025
stock_AMAZON.COM INC 0.0103 0.006 1.620 0.105 -0.002 0.023
stock_AMGEN INC 0.0114 0.004 2.695 0.007 0.003 0.020
stock_APPLE INC 0.0182 0.006 3.167 0.002 0.007 0.029
stock_APPLIED MATERIALS INC 0.0146 0.005 2.716 0.007 0.004 0.025
stock_AT&T INC. COM 0.0040 0.006 0.645 0.519 -0.008 0.016
stock_Alphabet Inc. Cl A 0.0069 0.007 0.994 0.320 -0.007 0.021
stock_BERKSHIRE HATHAWAY INC CL-B 0.0029 0.008 0.349 0.727 -0.013 0.019
stock_BOEING CO 0.0147 0.004 3.432 0.001 0.006 0.023
stock_BOOKING HOLDINGS INC 0.0106 0.007 1.540 0.124 -0.003 0.024
stock_BRISTOL-MYERS SQUIBB CO 0.0196 0.005 3.758 0.000 0.009 0.030
stock_BROADCOM INC 0.0005 0.006 0.079 0.937 -0.012 0.013
stock_CATERPILLAR INC 0.0141 0.005 3.046 0.002 0.005 0.023
stock_CELGENE CORP 0.0145 0.005 2.724 0.006 0.004 0.025
stock_CHARTER COMMUNICATIONS INC 5.498e-06 0.006 0.001 0.999 -0.012 0.012
stock_CHEVRON CORPORATION 0.0050 0.007 0.750 0.453 -0.008 0.018
stock_CISCO SYSTEMS INC 0.0109 0.004 2.868 0.004 0.003 0.018
stock_CITIGROUP -0.0050 0.016 -0.320 0.749 -0.036 0.026
stock_COCA-COLA CO 0.0150 0.006 2.660 0.008 0.004 0.026
stock_COMCAST CORP 0.0132 0.004 3.019 0.003 0.005 0.022
stock_COSTCO WHOLESALE CORP 0.0146 0.005 2.909 0.004 0.005 0.024
stock_CVS HEALTH CORP 0.0090 0.005 1.636 0.102 -0.002 0.020
stock_DELTA AIR LINES INC -0.0007 0.007 -0.099 0.921 -0.014 0.012
stock_DUPONT DE NEMOURS INC -0.0040 0.007 -0.604 0.546 -0.017 0.009
stock_EXXON MOBIL CORPORATION 0.0096 0.008 1.265 0.206 -0.005 0.025
stock_FACEBOOK INC 0.0062 0.007 0.911 0.362 -0.007 0.019
stock_FORD MOTOR CO(NEW) 0.0095 0.005 1.824 0.068 -0.001 0.020
stock_FREEPORT-MCMORAN INC -0.0049 0.005 -0.960 0.337 -0.015 0.005
stock_GENERAL ELECTRIC CO 0.0092 0.005 1.823 0.068 -0.001 0.019
stock_GENERAL MOTORS CO -0.0030 0.007 -0.418 0.676 -0.017 0.011
stock_GILEAD SCIENCES INC 0.0112 0.005 2.174 0.030 0.001 0.021
stock_GOLDMAN SACHS GROUP INC -0.0021 0.009 -0.229 0.819 -0.020 0.016
stock_HOME DEPOT INC 0.0157 0.006 2.619 0.009 0.004 0.028
stock_INTEL CORP 0.0107 0.005 2.242 0.025 0.001 0.020
stock_INTL BUSINESS MACHINES CORP 0.0114 0.006 2.071 0.038 0.001 0.022
stock_JOHNSON AND JOHNSON 0.0134 0.005 2.558 0.011 0.003 0.024
stock_LOWES COMPANIES INC 0.0114 0.005 2.163 0.031 0.001 0.022
stock_MASTERCARD INC 0.0070 0.008 0.924 0.355 -0.008 0.022
stock_MCDONALDS CORP 0.0180 0.007 2.736 0.006 0.005 0.031
stock_MERCK & CO INC 0.0108 0.005 2.205 0.027 0.001 0.020
stock_MICRON TECHNOLOGY INC 0.0067 0.005 1.361 0.174 -0.003 0.016
stock_MICROSOFT CORP 0.0108 0.005 2.073 0.038 0.001 0.021
stock_MORGAN STANLEY -0.0056 0.008 -0.663 0.507 -0.022 0.011
stock_NETFLIX INC 0.0091 0.006 1.436 0.151 -0.003 0.021
stock_NIKE INC CL-B 0.0094 0.005 1.832 0.067 -0.001 0.019
stock_NVIDIA CORP 0.0108 0.007 1.547 0.122 -0.003 0.024
stock_NXP SEMICONDUCTOR NV -0.0035 0.006 -0.595 0.552 -0.015 0.008
stock_ORACLE CORP 0.0094 0.005 1.850 0.064 -0.001 0.019
stock_PAYPAL HLDGS INC COM W.I. 0.0039 0.006 0.622 0.534 -0.008 0.016
stock_PFIZER INC 0.0080 0.005 1.590 0.112 -0.002 0.018
stock_PROCTER & GAMBLE CO 0.0114 0.006 2.024 0.043 0.000 0.022
stock_QUALCOMM INC 0.0085 0.005 1.613 0.107 -0.002 0.019
stock_SALESFORCE.COM INC 0.0061 0.006 0.953 0.341 -0.006 0.019
stock_SCHLUMBERGER LTD. 0.0092 0.005 1.688 0.091 -0.001 0.020
stock_SQUARE INC CLASS A COM STK 0.0098 0.007 1.360 0.174 -0.004 0.024
stock_STARBUCKS CORPORATION 0.0110 0.006 1.813 0.070 -0.001 0.023
stock_TESLA INC 0.0035 0.006 0.537 0.591 -0.009 0.016
stock_TWITTER INC 0.0010 0.006 0.150 0.881 -0.012 0.014
stock_UNION PAC CORP 0.0088 0.006 1.405 0.160 -0.003 0.021
stock_UNITED STATES STEEL CP -0.0005 0.005 -0.104 0.918 -0.010 0.009
stock_UNITEDHEALTH GROUP INC 0.0084 0.006 1.389 0.165 -0.003 0.020
stock_VERIZON COMMUNICATIONS 0.0022 0.006 0.351 0.726 -0.010 0.015
stock_VISA INC 0.0053 0.006 0.844 0.399 -0.007 0.018
stock_WALMART INC 0.0124 0.010 1.248 0.212 -0.007 0.032
stock_WALT DISNEY CO 0.0144 0.005 2.925 0.003 0.005 0.024
stock_WELLS FARGO & CO(NEW) -0.0050 0.017 -0.302 0.763 -0.038 0.028
Omnibus: 227.023 Durbin-Watson: 1.440
Prob(Omnibus): 0.000 Jarque-Bera (JB): 340.744
Skew: -0.106 Prob(JB): 1.02e-74
Kurtosis: 3.568 Cond. No. 2.14e+14

The summary is available in the notebook to save some space due to the large number of variables. The diagnostic statistics show that, given the high p-value on the Jarque—Bera statistic, the hypothesis that the residuals are normally distributed cannot be rejected.

However, the Durbin—Watson statistic is low at 1.4 so we can reject the null hypothesis of no autocorrelation comfortably at the 5% level. Hence, the standard errors are likely positively correlated. If our goal were to understand which factors are significantly associated with forward returns, we would need to rerun the regression using robust standard errors (a parameter in statsmodels .fit() method), or use a different method altogether such as a panel model that allows for more complex error covariance.

In [61]:
target = 'Returns5D'
model_data = pd.concat([y[[target]], X], axis=1).dropna()
model_data = model_data[model_data[target].between(model_data[target].quantile(.025), 
                                                   model_data[target].quantile(.975))]

model = OLS(endog=model_data[target], exog=model_data.drop(target, axis=1))
trained_model = model.fit()
trained_model.summary()
Out[61]:
OLS Regression Results
Dep. Variable: Returns5D R-squared: 0.035
Model: OLS Adj. R-squared: 0.030
Method: Least Squares F-statistic: 7.214
Date: Tue, 03 Mar 2020 Prob (F-statistic): 1.26e-102
Time: 18:55:13 Log-Likelihood: 47052.
No. Observations: 22024 AIC: -9.388e+04
Df Residuals: 21912 BIC: -9.298e+04
Df Model: 111
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
EBITDAYield 1.679e-05 8.8e-05 0.191 0.849 -0.000 0.000
EVToEBITDA 0.0001 7.98e-05 1.461 0.144 -3.99e-05 0.000
EVToFCF 2.269e-05 5e-05 0.454 0.650 -7.53e-05 0.000
PriceToBook -7.907e-06 4.68e-05 -0.169 0.866 -9.96e-05 8.38e-05
PriceToDilutedEarningsTTM 0.0001 3.2e-05 3.699 0.000 5.57e-05 0.000
PriceToEarningsTTM -2.226e-06 5.73e-07 -3.886 0.000 -3.35e-06 -1.1e-06
PriceToFCF 6.027e-05 6.04e-05 0.998 0.318 -5.81e-05 0.000
PriceToForwardEarnings -6.108e-06 1.02e-06 -5.992 0.000 -8.11e-06 -4.11e-06
PriceToOperatingCashflow 5.133e-05 3.33e-05 1.542 0.123 -1.39e-05 0.000
PriceToSalesTTM -1.487e-05 2.03e-06 -7.325 0.000 -1.88e-05 -1.09e-05
Directional Movement Index 1.01e-05 1.36e-05 0.740 0.459 -1.67e-05 3.68e-05
Money Flow Index -5.022e-05 1.81e-05 -2.771 0.006 -8.57e-05 -1.47e-05
Percent Above Low 3.52e-05 4.25e-05 0.829 0.407 -4.8e-05 0.000
Percent Below High -1.059e-05 2.6e-05 -0.407 0.684 -6.16e-05 4.04e-05
Price Oscillator 2.426e-05 1.95e-05 1.247 0.213 -1.39e-05 6.24e-05
Trendline -9.599e-05 3.53e-05 -2.716 0.007 -0.000 -2.67e-05
AssetToEquityRatio 1.551e-06 1.5e-06 1.031 0.303 -1.4e-06 4.5e-06
AssetTurnover -0.0007 0.000 -5.625 0.000 -0.001 -0.000
CurrentRatio 6.008e-07 1.91e-06 0.314 0.754 -3.15e-06 4.35e-06
DebtToAssetRatio -3.042e-06 1.24e-06 -2.459 0.014 -5.47e-06 -6.17e-07
DebtToEquityRatio -4.271e-07 9.38e-07 -0.455 0.649 -2.27e-06 1.41e-06
InterestCoverage 0.0001 6.2e-05 1.728 0.084 -1.44e-05 0.000
MertonsDD 0.0012 0.000 6.126 0.000 0.001 0.002
WorkingCapitalToAssets 4.527e-06 2.33e-06 1.944 0.052 -3.73e-08 9.09e-06
WorkingCapitalToSales -0.0003 0.000 -2.515 0.012 -0.001 -6.86e-05
EPS 3.508e-08 3.25e-07 0.108 0.914 -6.03e-07 6.73e-07
Net Debt -8.762e-14 2.46e-14 -3.565 0.000 -1.36e-13 -3.94e-14
Sales -7.219e-14 4.18e-14 -1.726 0.084 -1.54e-13 9.78e-15
Total Assets -2.175e-14 2.06e-14 -1.057 0.290 -6.21e-14 1.86e-14
EPS Growth 3M 2.419e-06 1.65e-06 1.463 0.144 -8.22e-07 5.66e-06
EPS Growth 12M 6.959e-07 1.74e-06 0.401 0.689 -2.71e-06 4.1e-06
Net Debt Growth 3M 9.662e-07 2.05e-06 0.471 0.638 -3.06e-06 4.99e-06
Net Debt Growth 12M 5.194e-06 1.91e-06 2.721 0.007 1.45e-06 8.94e-06
Sales Growth 3M 7.809e-06 2.09e-06 3.732 0.000 3.71e-06 1.19e-05
Sales Growth 12M -4.402e-06 1.78e-06 -2.478 0.013 -7.88e-06 -9.2e-07
Total Assets Growth 3M -9.123e-06 2.38e-06 -3.829 0.000 -1.38e-05 -4.45e-06
Total Assets Growth 12M -1.891e-07 1.94e-06 -0.097 0.922 -3.99e-06 3.62e-06
CFO To Assets 0.0002 7.17e-05 2.460 0.014 3.59e-05 0.000
Capex To Assets -7.353e-05 0.000 -0.631 0.528 -0.000 0.000
Capex To FCF -7.539e-05 5.16e-05 -1.462 0.144 -0.000 2.57e-05
Capex To Sales -5.82e-05 0.000 -0.563 0.573 -0.000 0.000
EBIT To Assets -0.0001 8.06e-05 -1.664 0.096 -0.000 2.39e-05
Retained Earnings To Assets -0.0005 0.000 -3.977 0.000 -0.001 -0.000
Downside Risk 0.0001 4.14e-05 3.447 0.001 6.15e-05 0.000
Index Beta -2.031e-08 2.99e-07 -0.068 0.946 -6.07e-07 5.67e-07
Log Market Cap 8.93e-05 9.49e-05 0.941 0.347 -9.68e-05 0.000
Volatility 3M -0.0001 2.69e-05 -5.227 0.000 -0.000 -8.77e-05
stock_ABBVIE INC 0.0331 0.016 2.134 0.033 0.003 0.064
stock_ADVANCED MICRO DEVICES INC 0.1025 0.011 9.448 0.000 0.081 0.124
stock_ALLERGAN PLC 0.0447 0.013 3.312 0.001 0.018 0.071
stock_ALTABA INC 0.0804 0.014 5.714 0.000 0.053 0.108
stock_AMAZON.COM INC 0.0815 0.015 5.357 0.000 0.052 0.111
stock_AMGEN INC 0.0766 0.010 7.557 0.000 0.057 0.096
stock_APPLE INC 0.1291 0.014 9.423 0.000 0.102 0.156
stock_APPLIED MATERIALS INC 0.0966 0.013 7.504 0.000 0.071 0.122
stock_AT&T INC. COM 0.0602 0.015 4.041 0.000 0.031 0.089
stock_Alphabet Inc. Cl A 0.0754 0.017 4.521 0.000 0.043 0.108
stock_BERKSHIRE HATHAWAY INC CL-B 0.0645 0.020 3.261 0.001 0.026 0.103
stock_BOEING CO 0.0988 0.010 9.642 0.000 0.079 0.119
stock_BOOKING HOLDINGS INC 0.0790 0.017 4.774 0.000 0.047 0.112
stock_BRISTOL-MYERS SQUIBB CO 0.1201 0.013 9.604 0.000 0.096 0.145
stock_BROADCOM INC 0.0292 0.015 1.926 0.054 -0.001 0.059
stock_CATERPILLAR INC 0.0901 0.011 8.148 0.000 0.068 0.112
stock_CELGENE CORP 0.0909 0.013 7.119 0.000 0.066 0.116
stock_CHARTER COMMUNICATIONS INC 0.0330 0.015 2.173 0.030 0.003 0.063
stock_CHEVRON CORPORATION 0.0595 0.016 3.734 0.000 0.028 0.091
stock_CISCO SYSTEMS INC 0.0781 0.009 8.575 0.000 0.060 0.096
stock_CITIGROUP 0.0993 0.037 2.666 0.008 0.026 0.172
stock_COCA-COLA CO 0.1011 0.013 7.512 0.000 0.075 0.127
stock_COMCAST CORP 0.0867 0.010 8.307 0.000 0.066 0.107
stock_COSTCO WHOLESALE CORP 0.0823 0.012 6.828 0.000 0.059 0.106
stock_CVS HEALTH CORP 0.0663 0.013 5.050 0.000 0.041 0.092
stock_DELTA AIR LINES INC 0.0228 0.016 1.396 0.163 -0.009 0.055
stock_DUPONT DE NEMOURS INC 0.0064 0.016 0.407 0.684 -0.024 0.037
stock_EXXON MOBIL CORPORATION 0.0778 0.018 4.268 0.000 0.042 0.114
stock_FACEBOOK INC 0.0592 0.016 3.649 0.000 0.027 0.091
stock_FORD MOTOR CO(NEW) 0.0640 0.013 5.108 0.000 0.039 0.089
stock_FREEPORT-MCMORAN INC 0.0118 0.012 0.969 0.333 -0.012 0.036
stock_GENERAL ELECTRIC CO 0.0708 0.012 5.862 0.000 0.047 0.094
stock_GENERAL MOTORS CO 0.0208 0.017 1.203 0.229 -0.013 0.055
stock_GILEAD SCIENCES INC 0.0754 0.012 6.088 0.000 0.051 0.100
stock_GOLDMAN SACHS GROUP INC 0.0577 0.022 2.618 0.009 0.015 0.101
stock_HOME DEPOT INC 0.1050 0.014 7.295 0.000 0.077 0.133
stock_INTEL CORP 0.0838 0.011 7.335 0.000 0.061 0.106
stock_INTL BUSINESS MACHINES CORP 0.0864 0.013 6.560 0.000 0.061 0.112
stock_JOHNSON AND JOHNSON 0.0947 0.012 7.583 0.000 0.070 0.119
stock_LOWES COMPANIES INC 0.0850 0.013 6.709 0.000 0.060 0.110
stock_MASTERCARD INC 0.0774 0.018 4.280 0.000 0.042 0.113
stock_MCDONALDS CORP 0.1203 0.016 7.625 0.000 0.089 0.151
stock_MERCK & CO INC 0.0823 0.012 6.999 0.000 0.059 0.105
stock_MICRON TECHNOLOGY INC 0.0625 0.012 5.285 0.000 0.039 0.086
stock_MICROSOFT CORP 0.0912 0.012 7.345 0.000 0.067 0.116
stock_MORGAN STANLEY 0.0401 0.020 1.998 0.046 0.001 0.079
stock_NETFLIX INC 0.0849 0.015 5.626 0.000 0.055 0.114
stock_NIKE INC CL-B 0.0721 0.012 5.875 0.000 0.048 0.096
stock_NVIDIA CORP 0.0873 0.017 5.213 0.000 0.054 0.120
stock_NXP SEMICONDUCTOR NV 0.0063 0.014 0.445 0.656 -0.021 0.034
stock_ORACLE CORP 0.0743 0.012 6.099 0.000 0.050 0.098
stock_PAYPAL HLDGS INC COM W.I. 0.0415 0.015 2.755 0.006 0.012 0.071
stock_PFIZER INC 0.0681 0.012 5.687 0.000 0.045 0.092
stock_PROCTER & GAMBLE CO 0.0855 0.013 6.364 0.000 0.059 0.112
stock_QUALCOMM INC 0.0669 0.013 5.275 0.000 0.042 0.092
stock_SALESFORCE.COM INC 0.0447 0.015 2.885 0.004 0.014 0.075
stock_SCHLUMBERGER LTD. 0.0748 0.013 5.719 0.000 0.049 0.100
stock_SQUARE INC CLASS A COM STK 0.0672 0.017 3.857 0.000 0.033 0.101
stock_STARBUCKS CORPORATION 0.0829 0.015 5.676 0.000 0.054 0.111
stock_TESLA INC 0.0564 0.016 3.639 0.000 0.026 0.087
stock_TWITTER INC 0.0309 0.016 1.984 0.047 0.000 0.061
stock_UNION PAC CORP 0.0677 0.015 4.492 0.000 0.038 0.097
stock_UNITED STATES STEEL CP 0.0325 0.011 2.833 0.005 0.010 0.055
stock_UNITEDHEALTH GROUP INC 0.0665 0.014 4.602 0.000 0.038 0.095
stock_VERIZON COMMUNICATIONS 0.0492 0.015 3.227 0.001 0.019 0.079
stock_VISA INC 0.0493 0.015 3.260 0.001 0.020 0.079
stock_WALMART INC 0.0907 0.024 3.843 0.000 0.044 0.137
stock_WALT DISNEY CO 0.0982 0.012 8.329 0.000 0.075 0.121
stock_WELLS FARGO & CO(NEW) 0.1137 0.040 2.857 0.004 0.036 0.192
Omnibus: 184.550 Durbin-Watson: 1.445
Prob(Omnibus): 0.000 Jarque-Bera (JB): 207.748
Skew: -0.180 Prob(JB): 7.73e-46
Kurtosis: 3.311 Cond. No. 2.16e+14
In [62]:
target = 'Returns10D'
model_data = pd.concat([y[[target]], X], axis=1).dropna()
model_data = model_data[model_data[target].between(model_data[target].quantile(.025), 
                                                   model_data[target].quantile(.975))]

model = OLS(endog=model_data[target], exog=model_data.drop(target, axis=1))
trained_model = model.fit()
trained_model.summary()
Out[62]:
OLS Regression Results
Dep. Variable: Returns10D R-squared: 0.065
Model: OLS Adj. R-squared: 0.060
Method: Least Squares F-statistic: 13.48
Date: Tue, 03 Mar 2020 Prob (F-statistic): 6.01e-232
Time: 18:55:22 Log-Likelihood: 38817.
No. Observations: 21697 AIC: -7.741e+04
Df Residuals: 21585 BIC: -7.652e+04
Df Model: 111
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
EBITDAYield 7.783e-05 0.000 0.616 0.538 -0.000 0.000
EVToEBITDA 0.0001 0.000 1.283 0.200 -7.77e-05 0.000
EVToFCF 9.141e-05 7.19e-05 1.271 0.204 -4.95e-05 0.000
PriceToBook -6.278e-06 6.71e-05 -0.093 0.926 -0.000 0.000
PriceToDilutedEarningsTTM 0.0002 4.6e-05 3.581 0.000 7.46e-05 0.000
PriceToEarningsTTM -3.701e-06 8.17e-07 -4.531 0.000 -5.3e-06 -2.1e-06
PriceToFCF 0.0002 8.81e-05 2.333 0.020 3.29e-05 0.000
PriceToForwardEarnings -9.837e-06 1.46e-06 -6.736 0.000 -1.27e-05 -6.97e-06
PriceToOperatingCashflow 4.107e-05 4.79e-05 0.857 0.391 -5.28e-05 0.000
PriceToSalesTTM -2.549e-05 2.95e-06 -8.638 0.000 -3.13e-05 -1.97e-05
Directional Movement Index 1.305e-05 1.95e-05 0.670 0.503 -2.51e-05 5.12e-05
Money Flow Index -2.746e-05 2.6e-05 -1.057 0.290 -7.84e-05 2.35e-05
Percent Above Low 0.0001 6.07e-05 1.848 0.065 -6.82e-06 0.000
Percent Below High -0.0001 3.71e-05 -3.034 0.002 -0.000 -3.99e-05
Price Oscillator -3.908e-05 2.78e-05 -1.405 0.160 -9.36e-05 1.54e-05
Trendline -0.0001 5.07e-05 -2.284 0.022 -0.000 -1.65e-05
AssetToEquityRatio -7.667e-06 2.24e-06 -3.427 0.001 -1.21e-05 -3.28e-06
AssetTurnover -0.0013 0.000 -7.146 0.000 -0.002 -0.001
CurrentRatio -1.802e-08 2.73e-06 -0.007 0.995 -5.38e-06 5.34e-06
DebtToAssetRatio -7.141e-06 1.78e-06 -4.008 0.000 -1.06e-05 -3.65e-06
DebtToEquityRatio 2.58e-06 1.34e-06 1.930 0.054 -3.98e-08 5.2e-06
InterestCoverage 0.0002 8.85e-05 2.202 0.028 2.14e-05 0.000
MertonsDD 0.0017 0.000 6.066 0.000 0.001 0.002
WorkingCapitalToAssets 8.778e-06 3.32e-06 2.643 0.008 2.27e-06 1.53e-05
WorkingCapitalToSales -0.0002 0.000 -1.359 0.174 -0.001 0.000
EPS 9.946e-08 4.62e-07 0.215 0.830 -8.07e-07 1.01e-06
Net Debt -1.205e-13 3.51e-14 -3.435 0.001 -1.89e-13 -5.18e-14
Sales -1.271e-13 6.01e-14 -2.113 0.035 -2.45e-13 -9.2e-15
Total Assets -1.131e-13 2.96e-14 -3.818 0.000 -1.71e-13 -5.51e-14
EPS Growth 3M -9.277e-07 2.36e-06 -0.394 0.694 -5.55e-06 3.69e-06
EPS Growth 12M 7.67e-06 2.47e-06 3.099 0.002 2.82e-06 1.25e-05
Net Debt Growth 3M 5.532e-06 2.93e-06 1.890 0.059 -2.06e-07 1.13e-05
Net Debt Growth 12M 4.352e-06 2.72e-06 1.601 0.109 -9.77e-07 9.68e-06
Sales Growth 3M 1.324e-05 2.99e-06 4.423 0.000 7.37e-06 1.91e-05
Sales Growth 12M -7.601e-06 2.54e-06 -2.996 0.003 -1.26e-05 -2.63e-06
Total Assets Growth 3M -2.12e-05 3.41e-06 -6.227 0.000 -2.79e-05 -1.45e-05
Total Assets Growth 12M 3.644e-06 2.76e-06 1.318 0.187 -1.77e-06 9.06e-06
CFO To Assets 0.0003 0.000 3.115 0.002 0.000 0.001
Capex To Assets -0.0003 0.000 -1.800 0.072 -0.001 2.68e-05
Capex To FCF -0.0003 7.49e-05 -4.638 0.000 -0.000 -0.000
Capex To Sales 7.507e-05 0.000 0.506 0.613 -0.000 0.000
EBIT To Assets -0.0004 0.000 -3.340 0.001 -0.001 -0.000
Retained Earnings To Assets -0.0007 0.000 -3.698 0.000 -0.001 -0.000
Downside Risk 0.0001 5.93e-05 1.979 0.048 1.15e-06 0.000
Index Beta -7.441e-07 4.29e-07 -1.734 0.083 -1.59e-06 9.71e-08
Log Market Cap 8.94e-05 0.000 0.654 0.513 -0.000 0.000
Volatility 3M -0.0003 3.82e-05 -8.173 0.000 -0.000 -0.000
stock_ABBVIE INC 0.1070 0.023 4.750 0.000 0.063 0.151
stock_ADVANCED MICRO DEVICES INC 0.2085 0.016 13.293 0.000 0.178 0.239
stock_ALLERGAN PLC 0.1225 0.020 6.216 0.000 0.084 0.161
stock_ALTABA INC 0.1742 0.020 8.524 0.000 0.134 0.214
stock_AMAZON.COM INC 0.2081 0.022 9.438 0.000 0.165 0.251
stock_AMGEN INC 0.1426 0.015 9.771 0.000 0.114 0.171
stock_APPLE INC 0.2677 0.020 13.599 0.000 0.229 0.306
stock_APPLIED MATERIALS INC 0.1794 0.019 9.613 0.000 0.143 0.216
stock_AT&T INC. COM 0.1585 0.021 7.411 0.000 0.117 0.200
stock_Alphabet Inc. Cl A 0.1624 0.024 6.751 0.000 0.115 0.210
stock_BERKSHIRE HATHAWAY INC CL-B 0.1909 0.028 6.714 0.000 0.135 0.247
stock_BOEING CO 0.1961 0.015 13.309 0.000 0.167 0.225
stock_BOOKING HOLDINGS INC 0.1703 0.024 7.079 0.000 0.123 0.217
stock_BRISTOL-MYERS SQUIBB CO 0.2217 0.018 12.285 0.000 0.186 0.257
stock_BROADCOM INC 0.1052 0.022 4.780 0.000 0.062 0.148
stock_CATERPILLAR INC 0.1783 0.016 11.208 0.000 0.147 0.210
stock_CELGENE CORP 0.1739 0.019 9.393 0.000 0.138 0.210
stock_CHARTER COMMUNICATIONS INC 0.1155 0.022 5.230 0.000 0.072 0.159
stock_CHEVRON CORPORATION 0.1430 0.023 6.229 0.000 0.098 0.188
stock_CISCO SYSTEMS INC 0.1531 0.013 11.659 0.000 0.127 0.179
stock_CITIGROUP 0.3293 0.054 6.148 0.000 0.224 0.434
stock_COCA-COLA CO 0.1929 0.019 9.943 0.000 0.155 0.231
stock_COMCAST CORP 0.1900 0.015 12.610 0.000 0.160 0.219
stock_COSTCO WHOLESALE CORP 0.1971 0.018 11.240 0.000 0.163 0.231
stock_CVS HEALTH CORP 0.1568 0.019 8.241 0.000 0.119 0.194
stock_DELTA AIR LINES INC 0.0963 0.025 3.887 0.000 0.048 0.145
stock_DUPONT DE NEMOURS INC 0.0564 0.023 2.464 0.014 0.012 0.101
stock_EXXON MOBIL CORPORATION 0.2037 0.026 7.731 0.000 0.152 0.255
stock_FACEBOOK INC 0.1599 0.024 6.786 0.000 0.114 0.206
stock_FORD MOTOR CO(NEW) 0.1596 0.018 8.774 0.000 0.124 0.195
stock_FREEPORT-MCMORAN INC 0.0475 0.018 2.692 0.007 0.013 0.082
stock_GENERAL ELECTRIC CO 0.1538 0.017 8.855 0.000 0.120 0.188
stock_GENERAL MOTORS CO 0.1043 0.025 4.176 0.000 0.055 0.153
stock_GILEAD SCIENCES INC 0.1656 0.018 9.178 0.000 0.130 0.201
stock_GOLDMAN SACHS GROUP INC 0.1814 0.032 5.711 0.000 0.119 0.244
stock_HOME DEPOT INC 0.2126 0.021 10.235 0.000 0.172 0.253
stock_INTEL CORP 0.1726 0.016 10.501 0.000 0.140 0.205
stock_INTL BUSINESS MACHINES CORP 0.1696 0.019 8.953 0.000 0.132 0.207
stock_JOHNSON AND JOHNSON 0.1867 0.018 10.385 0.000 0.151 0.222
stock_LOWES COMPANIES INC 0.1934 0.018 10.496 0.000 0.157 0.229
stock_MASTERCARD INC 0.1787 0.026 6.800 0.000 0.127 0.230
stock_MCDONALDS CORP 0.2329 0.023 10.232 0.000 0.188 0.278
stock_MERCK & CO INC 0.1612 0.017 9.511 0.000 0.128 0.194
stock_MICRON TECHNOLOGY INC 0.1324 0.017 7.773 0.000 0.099 0.166
stock_MICROSOFT CORP 0.1953 0.018 10.930 0.000 0.160 0.230
stock_MORGAN STANLEY 0.1729 0.029 5.985 0.000 0.116 0.230
stock_NETFLIX INC 0.1770 0.022 8.124 0.000 0.134 0.220
stock_NIKE INC CL-B 0.1705 0.018 9.561 0.000 0.136 0.205
stock_NVIDIA CORP 0.1939 0.024 7.982 0.000 0.146 0.242
stock_NXP SEMICONDUCTOR NV 0.0248 0.020 1.211 0.226 -0.015 0.065
stock_ORACLE CORP 0.1484 0.018 8.458 0.000 0.114 0.183
stock_PAYPAL HLDGS INC COM W.I. 0.1184 0.022 5.399 0.000 0.075 0.161
stock_PFIZER INC 0.1455 0.017 8.420 0.000 0.112 0.179
stock_PROCTER & GAMBLE CO 0.1759 0.019 9.085 0.000 0.138 0.214
stock_QUALCOMM INC 0.1318 0.018 7.202 0.000 0.096 0.168
stock_SALESFORCE.COM INC 0.1316 0.023 5.801 0.000 0.087 0.176
stock_SCHLUMBERGER LTD. 0.1419 0.019 7.528 0.000 0.105 0.179
stock_SQUARE INC CLASS A COM STK 0.1734 0.025 6.842 0.000 0.124 0.223
stock_STARBUCKS CORPORATION 0.1879 0.021 8.866 0.000 0.146 0.229
stock_TESLA INC 0.1441 0.023 6.376 0.000 0.100 0.188
stock_TWITTER INC 0.0898 0.023 3.970 0.000 0.045 0.134
stock_UNION PAC CORP 0.1431 0.022 6.534 0.000 0.100 0.186
stock_UNITED STATES STEEL CP 0.0888 0.017 5.364 0.000 0.056 0.121
stock_UNITEDHEALTH GROUP INC 0.1739 0.021 8.315 0.000 0.133 0.215
stock_VERIZON COMMUNICATIONS 0.1368 0.022 6.235 0.000 0.094 0.180
stock_VISA INC 0.1369 0.022 6.205 0.000 0.094 0.180
stock_WALMART INC 0.2118 0.034 6.236 0.000 0.145 0.278
stock_WALT DISNEY CO 0.1907 0.017 11.246 0.000 0.158 0.224
stock_WELLS FARGO & CO(NEW) 0.3702 0.057 6.463 0.000 0.258 0.482
Omnibus: 94.078 Durbin-Watson: 1.485
Prob(Omnibus): 0.000 Jarque-Bera (JB): 96.866
Skew: -0.148 Prob(JB): 9.24e-22
Kurtosis: 3.140 Cond. No. 2.19e+14
In [63]:
target = 'Returns20D'
model_data = pd.concat([y[[target]], X], axis=1).dropna()
model_data = model_data[model_data[target].between(model_data[target].quantile(.025), 
                                                   model_data[target].quantile(.975))]

model = OLS(endog=model_data[target], exog=model_data.drop(target, axis=1))
trained_model = model.fit()
trained_model.summary()
Out[63]:
OLS Regression Results
Dep. Variable: Returns20D R-squared: 0.121
Model: OLS Adj. R-squared: 0.116
Method: Least Squares F-statistic: 26.22
Date: Tue, 03 Mar 2020 Prob (F-statistic): 0.00
Time: 18:55:33 Log-Likelihood: 31350.
No. Observations: 21041 AIC: -6.248e+04
Df Residuals: 20930 BIC: -6.159e+04
Df Model: 110
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
EBITDAYield 0.0002 0.000 0.919 0.358 -0.000 0.001
EVToEBITDA -5.045e-05 0.000 -0.309 0.757 -0.000 0.000
EVToFCF -0.0002 9.94e-05 -1.826 0.068 -0.000 1.33e-05
PriceToBook 0.0004 9.31e-05 3.968 0.000 0.000 0.001
PriceToDilutedEarningsTTM 9.955e-05 6.36e-05 1.566 0.117 -2.51e-05 0.000
PriceToEarningsTTM -2.532e-06 1.13e-06 -2.238 0.025 -4.75e-06 -3.14e-07
PriceToFCF 0.0007 0.000 5.296 0.000 0.000 0.001
PriceToForwardEarnings -1.204e-05 2.05e-06 -5.871 0.000 -1.61e-05 -8.02e-06
PriceToOperatingCashflow -0.0001 6.7e-05 -1.795 0.073 -0.000 1.11e-05
PriceToSalesTTM -4.341e-05 4.17e-06 -10.406 0.000 -5.16e-05 -3.52e-05
Directional Movement Index 2.41e-06 2.68e-05 0.090 0.928 -5.01e-05 5.49e-05
Money Flow Index -2.112e-05 3.58e-05 -0.589 0.556 -9.14e-05 4.91e-05
Percent Above Low 0.0003 8.51e-05 3.431 0.001 0.000 0.000
Percent Below High -0.0002 5.1e-05 -4.178 0.000 -0.000 -0.000
Price Oscillator -0.0002 3.83e-05 -4.801 0.000 -0.000 -0.000
Trendline -0.0004 7.17e-05 -5.846 0.000 -0.001 -0.000
AssetToEquityRatio -3.229e-05 3.59e-06 -9.005 0.000 -3.93e-05 -2.53e-05
AssetTurnover -0.0025 0.000 -9.605 0.000 -0.003 -0.002
CurrentRatio 1.858e-05 3.77e-06 4.925 0.000 1.12e-05 2.6e-05
DebtToAssetRatio -5.841e-06 2.49e-06 -2.347 0.019 -1.07e-05 -9.62e-07
DebtToEquityRatio 4.026e-06 1.79e-06 2.244 0.025 5.09e-07 7.54e-06
InterestCoverage 0.0003 0.000 2.652 0.008 8.57e-05 0.001
MertonsDD 0.0019 0.000 4.809 0.000 0.001 0.003
WorkingCapitalToAssets -1.648e-06 4.59e-06 -0.359 0.719 -1.06e-05 7.34e-06
WorkingCapitalToSales -0.0002 0.000 -1.013 0.311 -0.001 0.000
EPS 6.914e-08 6.33e-07 0.109 0.913 -1.17e-06 1.31e-06
Net Debt -2.441e-13 4.83e-14 -5.053 0.000 -3.39e-13 -1.49e-13
Sales -3.896e-13 8.34e-14 -4.670 0.000 -5.53e-13 -2.26e-13
Total Assets -1.922e-13 4.16e-14 -4.622 0.000 -2.74e-13 -1.11e-13
EPS Growth 3M -1.237e-06 3.21e-06 -0.385 0.700 -7.53e-06 5.06e-06
EPS Growth 12M 1.412e-05 3.38e-06 4.180 0.000 7.5e-06 2.07e-05
Net Debt Growth 3M 1.352e-05 4.02e-06 3.360 0.001 5.63e-06 2.14e-05
Net Debt Growth 12M -2.704e-07 3.71e-06 -0.073 0.942 -7.55e-06 7.01e-06
Sales Growth 3M 6.932e-06 4.11e-06 1.688 0.091 -1.12e-06 1.5e-05
Sales Growth 12M -1.695e-05 3.49e-06 -4.854 0.000 -2.38e-05 -1.01e-05
Total Assets Growth 3M -2.236e-05 4.67e-06 -4.788 0.000 -3.15e-05 -1.32e-05
Total Assets Growth 12M 5.351e-06 3.77e-06 1.419 0.156 -2.04e-06 1.27e-05
CFO To Assets 0.0005 0.000 3.502 0.000 0.000 0.001
Capex To Assets -0.0001 0.000 -0.572 0.568 -0.001 0.000
Capex To FCF -0.0006 0.000 -6.108 0.000 -0.001 -0.000
Capex To Sales -0.0003 0.000 -1.529 0.126 -0.001 8.83e-05
EBIT To Assets -0.0008 0.000 -4.957 0.000 -0.001 -0.000
Retained Earnings To Assets -0.0006 0.000 -2.407 0.016 -0.001 -0.000
Downside Risk 0.0003 8.24e-05 3.751 0.000 0.000 0.000
Index Beta -2.491e-06 5.95e-07 -4.186 0.000 -3.66e-06 -1.32e-06
Log Market Cap -0.0007 0.000 -3.882 0.000 -0.001 -0.000
Volatility 3M -0.0006 5.21e-05 -10.580 0.000 -0.001 -0.000
stock_ABBVIE INC 0.2743 0.032 8.530 0.000 0.211 0.337
stock_ADVANCED MICRO DEVICES INC 0.3327 0.022 15.123 0.000 0.290 0.376
stock_ALLERGAN PLC 0.3125 0.029 10.803 0.000 0.256 0.369
stock_ALTABA INC 0.3601 0.030 12.160 0.000 0.302 0.418
stock_AMAZON.COM INC 0.5132 0.032 16.202 0.000 0.451 0.575
stock_AMGEN INC 0.2495 0.021 12.168 0.000 0.209 0.290
stock_APPLE INC 0.5436 0.027 19.860 0.000 0.490 0.597
stock_APPLIED MATERIALS INC 0.3109 0.026 11.754 0.000 0.259 0.363
stock_AT&T INC. COM 0.3547 0.030 11.830 0.000 0.296 0.413
stock_Alphabet Inc. Cl A 0.3619 0.034 10.678 0.000 0.296 0.428
stock_BERKSHIRE HATHAWAY INC CL-B 0.4227 0.040 10.549 0.000 0.344 0.501
stock_BOEING CO 0.3774 0.021 18.391 0.000 0.337 0.418
stock_BOOKING HOLDINGS INC 0.3707 0.034 10.749 0.000 0.303 0.438
stock_BRISTOL-MYERS SQUIBB CO 0.3894 0.025 15.427 0.000 0.340 0.439
stock_BROADCOM INC 0.2757 0.032 8.596 0.000 0.213 0.339
stock_CATERPILLAR INC 0.3072 0.022 13.856 0.000 0.264 0.351
stock_CELGENE CORP 0.3051 0.026 11.612 0.000 0.254 0.357
stock_CHARTER COMMUNICATIONS INC 0.3042 0.032 9.456 0.000 0.241 0.367
stock_CHEVRON CORPORATION 0.3198 0.032 9.901 0.000 0.257 0.383
stock_CISCO SYSTEMS INC 0.2843 0.018 15.432 0.000 0.248 0.320
stock_CITIGROUP 0.6436 0.075 8.550 0.000 0.496 0.791
stock_COCA-COLA CO 0.3543 0.027 13.030 0.000 0.301 0.408
stock_COMCAST CORP 0.3847 0.021 18.024 0.000 0.343 0.427
stock_COSTCO WHOLESALE CORP 0.4029 0.025 15.851 0.000 0.353 0.453
stock_CVS HEALTH CORP 0.3563 0.027 13.133 0.000 0.303 0.409
stock_DELTA AIR LINES INC 3.85e-15 4e-16 9.635 0.000 3.07e-15 4.63e-15
stock_DUPONT DE NEMOURS INC 0.1967 0.033 5.900 0.000 0.131 0.262
stock_EXXON MOBIL CORPORATION 0.5052 0.038 13.346 0.000 0.431 0.579
stock_FACEBOOK INC 0.4170 0.034 12.250 0.000 0.350 0.484
stock_FORD MOTOR CO(NEW) 0.3447 0.026 13.095 0.000 0.293 0.396
stock_FREEPORT-MCMORAN INC 0.0939 0.026 3.624 0.000 0.043 0.145
stock_GENERAL ELECTRIC CO 0.3223 0.024 13.180 0.000 0.274 0.370
stock_GENERAL MOTORS CO 0.3088 0.036 8.584 0.000 0.238 0.379
stock_GILEAD SCIENCES INC 0.3491 0.026 13.375 0.000 0.298 0.400
stock_GOLDMAN SACHS GROUP INC 0.3878 0.045 8.656 0.000 0.300 0.476
stock_HOME DEPOT INC 0.4013 0.029 13.750 0.000 0.344 0.459
stock_INTEL CORP 0.3356 0.023 14.581 0.000 0.291 0.381
stock_INTL BUSINESS MACHINES CORP 0.3014 0.026 11.411 0.000 0.250 0.353
stock_JOHNSON AND JOHNSON 0.3613 0.025 14.342 0.000 0.312 0.411
stock_LOWES COMPANIES INC 0.3902 0.028 14.145 0.000 0.336 0.444
stock_MASTERCARD INC 0.4190 0.038 11.108 0.000 0.345 0.493
stock_MCDONALDS CORP 0.4225 0.032 13.183 0.000 0.360 0.485
stock_MERCK & CO INC 0.3025 0.024 12.707 0.000 0.256 0.349
stock_MICRON TECHNOLOGY INC 0.2494 0.024 10.359 0.000 0.202 0.297
stock_MICROSOFT CORP 0.4024 0.025 16.038 0.000 0.353 0.452
stock_MORGAN STANLEY 0.4144 0.041 10.103 0.000 0.334 0.495
stock_NETFLIX INC 0.3352 0.031 10.931 0.000 0.275 0.395
stock_NIKE INC CL-B 0.3826 0.026 14.962 0.000 0.332 0.433
stock_NVIDIA CORP 0.4142 0.035 11.956 0.000 0.346 0.482
stock_NXP SEMICONDUCTOR NV 0.1085 0.029 3.733 0.000 0.052 0.165
stock_ORACLE CORP 0.2726 0.025 11.051 0.000 0.224 0.321
stock_PAYPAL HLDGS INC COM W.I. 0.3442 0.032 10.806 0.000 0.282 0.407
stock_PFIZER INC 0.3056 0.024 12.541 0.000 0.258 0.353
stock_PROCTER & GAMBLE CO 0.3366 0.027 12.374 0.000 0.283 0.390
stock_QUALCOMM INC 0.2471 0.026 9.587 0.000 0.197 0.298
stock_SALESFORCE.COM INC 0.3645 0.034 10.743 0.000 0.298 0.431
stock_SCHLUMBERGER LTD. 0.2542 0.027 9.590 0.000 0.202 0.306
stock_SQUARE INC CLASS A COM STK 0.3965 0.037 10.832 0.000 0.325 0.468
stock_STARBUCKS CORPORATION 0.3913 0.030 12.885 0.000 0.332 0.451
stock_TESLA INC 0.3418 0.033 10.480 0.000 0.278 0.406
stock_TWITTER INC 0.2506 0.033 7.668 0.000 0.187 0.315
stock_UNION PAC CORP 0.2574 0.031 8.192 0.000 0.196 0.319
stock_UNITED STATES STEEL CP 0.1976 0.023 8.452 0.000 0.152 0.243
stock_UNITEDHEALTH GROUP INC 0.4199 0.030 13.995 0.000 0.361 0.479
stock_VERIZON COMMUNICATIONS 0.3210 0.031 10.383 0.000 0.260 0.382
stock_VISA INC 0.3678 0.032 11.457 0.000 0.305 0.431
stock_WALMART INC 0.5010 0.048 10.542 0.000 0.408 0.594
stock_WALT DISNEY CO 0.3417 0.024 14.414 0.000 0.295 0.388
stock_WELLS FARGO & CO(NEW) 0.7425 0.081 9.215 0.000 0.585 0.900
Omnibus: 1.051 Durbin-Watson: 1.622
Prob(Omnibus): 0.591 Jarque-Bera (JB): 1.032
Skew: -0.004 Prob(JB): 0.597
Kurtosis: 3.033 Cond. No. 1.13e+16
In [78]:
target = 'Returns60D'
model_data = pd.concat([y[[target]], X], axis=1).dropna()
model_data = model_data[model_data[target].between(model_data[target].quantile(.025), 
                                                   model_data[target].quantile(.975))]

model = OLS(endog=model_data[target], exog=model_data.drop(target, axis=1))
trained_model = model.fit()
trained_model.summary()
Out[78]:
OLS Regression Results
Dep. Variable: Returns60D R-squared: 0.330
Model: OLS Adj. R-squared: 0.326
Method: Least Squares F-statistic: 87.12
Date: Tue, 03 Mar 2020 Prob (F-statistic): 0.00
Time: 19:10:50 Log-Likelihood: 20353.
No. Observations: 18532 AIC: -4.050e+04
Df Residuals: 18427 BIC: -3.967e+04
Df Model: 104
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
EBITDAYield -0.0014 0.000 -4.671 0.000 -0.002 -0.001
EVToEBITDA 0.0009 0.000 3.089 0.002 0.000 0.001
EVToFCF -1.45e-05 0.000 -0.082 0.934 -0.000 0.000
PriceToBook 0.0004 0.000 2.395 0.017 6.95e-05 0.001
PriceToDilutedEarningsTTM 0.0003 0.000 2.475 0.013 5.38e-05 0.000
PriceToEarningsTTM -1.291e-05 1.84e-06 -7.015 0.000 -1.65e-05 -9.3e-06
PriceToFCF 0.0004 0.000 1.702 0.089 -5.63e-05 0.001
PriceToForwardEarnings 3.503e-06 3.42e-06 1.026 0.305 -3.19e-06 1.02e-05
PriceToOperatingCashflow 0.0013 0.000 11.886 0.000 0.001 0.002
PriceToSalesTTM -0.0001 7.16e-06 -17.728 0.000 -0.000 -0.000
Directional Movement Index 0.0001 4.25e-05 2.790 0.005 3.53e-05 0.000
Money Flow Index -0.0001 5.69e-05 -1.978 0.048 -0.000 -1.02e-06
Percent Above Low -0.0007 0.000 -4.871 0.000 -0.001 -0.000
Percent Below High -0.0003 8.16e-05 -3.093 0.002 -0.000 -9.24e-05
Price Oscillator -0.0002 6.1e-05 -2.929 0.003 -0.000 -5.92e-05
Trendline -0.0001 0.000 -1.227 0.220 -0.000 8.86e-05
AssetToEquityRatio -9.895e-05 7.81e-06 -12.671 0.000 -0.000 -8.36e-05
AssetTurnover -0.0011 0.000 -2.630 0.009 -0.002 -0.000
CurrentRatio 6.704e-05 6.25e-06 10.727 0.000 5.48e-05 7.93e-05
DebtToAssetRatio 7.513e-06 4.17e-06 1.800 0.072 -6.67e-07 1.57e-05
DebtToEquityRatio 3.7e-05 2.78e-06 13.298 0.000 3.15e-05 4.25e-05
InterestCoverage 0.0010 0.000 4.694 0.000 0.001 0.001
MertonsDD 0.0081 0.001 10.836 0.000 0.007 0.010
WorkingCapitalToAssets -4.368e-05 7.33e-06 -5.960 0.000 -5.8e-05 -2.93e-05
WorkingCapitalToSales 0.0005 0.000 1.211 0.226 -0.000 0.001
EPS 8.932e-07 9.91e-07 0.902 0.367 -1.05e-06 2.83e-06
Net Debt 2.238e-13 7.6e-14 2.944 0.003 7.48e-14 3.73e-13
Sales -8.609e-13 1.46e-13 -5.909 0.000 -1.15e-12 -5.75e-13
Total Assets -1.029e-12 6.99e-14 -14.719 0.000 -1.17e-12 -8.92e-13
EPS Growth 3M -2.457e-05 5.08e-06 -4.840 0.000 -3.45e-05 -1.46e-05
EPS Growth 12M 4.763e-05 5.32e-06 8.951 0.000 3.72e-05 5.81e-05
Net Debt Growth 3M 3.316e-05 6.39e-06 5.188 0.000 2.06e-05 4.57e-05
Net Debt Growth 12M -2.444e-05 5.82e-06 -4.196 0.000 -3.59e-05 -1.3e-05
Sales Growth 3M 2.616e-05 6.41e-06 4.082 0.000 1.36e-05 3.87e-05
Sales Growth 12M -4.77e-05 5.6e-06 -8.517 0.000 -5.87e-05 -3.67e-05
Total Assets Growth 3M -6.509e-05 7.33e-06 -8.884 0.000 -7.95e-05 -5.07e-05
Total Assets Growth 12M 4.811e-05 6e-06 8.013 0.000 3.63e-05 5.99e-05
CFO To Assets 0.0007 0.000 2.969 0.003 0.000 0.001
Capex To Assets 0.0008 0.000 2.246 0.025 0.000 0.002
Capex To FCF -0.0007 0.000 -3.635 0.000 -0.001 -0.000
Capex To Sales -0.0019 0.000 -5.613 0.000 -0.002 -0.001
EBIT To Assets -0.0020 0.000 -7.434 0.000 -0.003 -0.001
Retained Earnings To Assets -0.0028 0.000 -6.452 0.000 -0.004 -0.002
Downside Risk 4.662e-06 0.000 0.035 0.972 -0.000 0.000
Index Beta -6.051e-07 1e-06 -0.603 0.546 -2.57e-06 1.36e-06
Log Market Cap -0.0025 0.000 -8.077 0.000 -0.003 -0.002
Volatility 3M -0.0008 8.12e-05 -9.621 0.000 -0.001 -0.001
stock_ABBVIE INC 0.2845 0.057 4.984 0.000 0.173 0.396
stock_ADVANCED MICRO DEVICES INC 0.4686 0.036 12.885 0.000 0.397 0.540
stock_ALLERGAN PLC 0.5821 0.055 10.557 0.000 0.474 0.690
stock_ALTABA INC 0.6958 0.060 11.574 0.000 0.578 0.814
stock_AMAZON.COM INC 0.8690 0.057 15.126 0.000 0.756 0.982
stock_AMGEN INC 0.6328 0.035 18.123 0.000 0.564 0.701
stock_APPLE INC 1.2741 0.045 28.046 0.000 1.185 1.363
stock_APPLIED MATERIALS INC 0.6336 0.045 14.066 0.000 0.545 0.722
stock_AT&T INC. COM 0.7319 0.051 14.372 0.000 0.632 0.832
stock_Alphabet Inc. Cl A 0.7279 0.058 12.455 0.000 0.613 0.842
stock_BERKSHIRE HATHAWAY INC CL-B 1.0465 0.069 15.076 0.000 0.910 1.183
stock_BOEING CO 0.7933 0.034 23.622 0.000 0.727 0.859
stock_BOOKING HOLDINGS INC 0.6028 0.062 9.773 0.000 0.482 0.724
stock_BRISTOL-MYERS SQUIBB CO 0.7739 0.041 18.698 0.000 0.693 0.855
stock_BROADCOM INC 0.3869 0.060 6.403 0.000 0.268 0.505
stock_CATERPILLAR INC 0.4849 0.036 13.444 0.000 0.414 0.556
stock_CELGENE CORP 0.5661 0.045 12.449 0.000 0.477 0.655
stock_CHARTER COMMUNICATIONS INC 0.4604 0.060 7.654 0.000 0.343 0.578
stock_CHEVRON CORPORATION 0.5744 0.056 10.298 0.000 0.465 0.684
stock_CISCO SYSTEMS INC 0.6528 0.031 21.381 0.000 0.593 0.713
stock_CITIGROUP 2.4625 0.127 19.386 0.000 2.214 2.712
stock_COCA-COLA CO 0.6655 0.046 14.603 0.000 0.576 0.755
stock_COMCAST CORP 0.7347 0.037 19.805 0.000 0.662 0.807
stock_COSTCO WHOLESALE CORP 0.6179 0.048 12.944 0.000 0.524 0.711
stock_CVS HEALTH CORP 0.5030 0.048 10.477 0.000 0.409 0.597
stock_DELTA AIR LINES INC 1.413e-16 1.82e-16 0.777 0.437 -2.15e-16 4.98e-16
stock_DUPONT DE NEMOURS INC -2.832e-14 1.69e-15 -16.710 0.000 -3.16e-14 -2.5e-14
stock_EXXON MOBIL CORPORATION 1.0116 0.069 14.734 0.000 0.877 1.146
stock_FACEBOOK INC 0.7773 0.063 12.383 0.000 0.654 0.900
stock_FORD MOTOR CO(NEW) -1.347e-14 8.33e-16 -16.171 0.000 -1.51e-14 -1.18e-14
stock_FREEPORT-MCMORAN INC -0.0459 0.070 -0.657 0.511 -0.183 0.091
stock_GENERAL ELECTRIC CO 0.7030 0.042 16.829 0.000 0.621 0.785
stock_GENERAL MOTORS CO 0.4385 0.065 6.697 0.000 0.310 0.567
stock_GILEAD SCIENCES INC 0.6985 0.048 14.655 0.000 0.605 0.792
stock_GOLDMAN SACHS GROUP INC 1.0887 0.077 14.091 0.000 0.937 1.240
stock_HOME DEPOT INC 0.6477 0.049 13.249 0.000 0.552 0.744
stock_INTEL CORP 0.6943 0.038 18.075 0.000 0.619 0.770
stock_INTL BUSINESS MACHINES CORP 0.5630 0.044 12.912 0.000 0.478 0.648
stock_JOHNSON AND JOHNSON 0.8033 0.042 19.006 0.000 0.720 0.886
stock_LOWES COMPANIES INC -2.13e-14 1.26e-15 -16.857 0.000 -2.38e-14 -1.88e-14
stock_MASTERCARD INC 0.7014 0.068 10.284 0.000 0.568 0.835
stock_MCDONALDS CORP 0.6714 0.054 12.382 0.000 0.565 0.778
stock_MERCK & CO INC 0.5968 0.040 14.900 0.000 0.518 0.675
stock_MICRON TECHNOLOGY INC 0.4950 0.041 12.108 0.000 0.415 0.575
stock_MICROSOFT CORP 0.8790 0.042 20.831 0.000 0.796 0.962
stock_MORGAN STANLEY 1.2267 0.072 17.003 0.000 1.085 1.368
stock_NETFLIX INC 0.5024 0.052 9.603 0.000 0.400 0.605
stock_NIKE INC CL-B 0.6079 0.045 13.406 0.000 0.519 0.697
stock_NVIDIA CORP 0.7372 0.061 12.127 0.000 0.618 0.856
stock_NXP SEMICONDUCTOR NV -0.0892 0.053 -1.694 0.090 -0.192 0.014
stock_ORACLE CORP 0.5648 0.042 13.516 0.000 0.483 0.647
stock_PAYPAL HLDGS INC COM W.I. 0.5716 0.060 9.516 0.000 0.454 0.689
stock_PFIZER INC 0.6791 0.042 16.186 0.000 0.597 0.761
stock_PROCTER & GAMBLE CO 0.6083 0.046 13.200 0.000 0.518 0.699
stock_QUALCOMM INC 0.4412 0.044 10.103 0.000 0.356 0.527
stock_SALESFORCE.COM INC -6.452e-16 4.19e-17 -15.395 0.000 -7.27e-16 -5.63e-16
stock_SCHLUMBERGER LTD. 0.4628 0.045 10.268 0.000 0.374 0.551
stock_SQUARE INC CLASS A COM STK 0.4099 0.067 6.136 0.000 0.279 0.541
stock_STARBUCKS CORPORATION 0.5641 0.054 10.441 0.000 0.458 0.670
stock_TESLA INC 0.4191 0.061 6.882 0.000 0.300 0.538
stock_TWITTER INC 0.3256 0.061 5.358 0.000 0.206 0.445
stock_UNION PAC CORP 3.857e-16 3.43e-17 11.244 0.000 3.18e-16 4.53e-16
stock_UNITED STATES STEEL CP 0.1177 0.039 2.985 0.003 0.040 0.195
stock_UNITEDHEALTH GROUP INC 0.6853 0.054 12.781 0.000 0.580 0.790
stock_VERIZON COMMUNICATIONS 0.4796 0.054 8.964 0.000 0.375 0.584
stock_VISA INC 0.7022 0.060 11.671 0.000 0.584 0.820
stock_WALMART INC 0.8216 0.082 9.975 0.000 0.660 0.983
stock_WALT DISNEY CO 0.6716 0.039 17.116 0.000 0.595 0.748
stock_WELLS FARGO & CO(NEW) 2.4242 0.147 16.448 0.000 2.135 2.713
Omnibus: 120.425 Durbin-Watson: 1.754
Prob(Omnibus): 0.000 Jarque-Bera (JB): 134.684
Skew: -0.154 Prob(JB): 5.67e-30
Kurtosis: 3.281 Cond. No. 1.13e+16

Linear Models for Prediction: sklearn

Since sklearn is tailored towards prediction, we will evaluate the linear regression model based on its predictive performance using cross-validation.

Custom Time Series Cross-Validation

Our data consists of grouped time series data that requires a custom cross-validation function to provide the train and test indices that ensure that the test data immediately follows the training data for each equity and we do not inadvertently create a look-ahead bias or leakage.

We can achieve this using the following function that returns a generator yielding pairs of train and test dates. The set of train dates that ensure a minimum length of the training periods. The number of pairs depends on the parameter nfolds. The distinct test periods do not overlap and are located at the end of the period available in the data. After a test period is used, it becomes part of the training data that grow in size accordingly:

In [83]:
def time_series_split(d=model_data, nfolds=5, min_train=21):
    """Generate train/test dates for nfolds 
    with at least min_train train obs
    """
    train_dates = d[:min_train].tolist()
    n = int(len(dates)/(nfolds + 1)) + 1
    test_folds = [d[i:i + n] for i in range(min_train, len(d), n)]
    for test_dates in test_folds:
        if len(train_dates) > min_train:
            yield train_dates, test_dates
        train_dates.extend(test_dates)

Select Features and Target

We need to select the appropriate return series (we will use a 60-day holding period) and remove outliers. We will also convert returns to log returns as follows:

In [84]:
target = 'Returns60D'
outliers = .01
model_data = pd.concat([y[[target]], X], axis=1).dropna().reset_index('asset', drop=True)
model_data = model_data[model_data[target].between(*model_data[target].quantile([outliers, 1-outliers]).values)] 

model_data[target] = np.log1p(model_data[target])
features = model_data.drop(target, axis=1).columns
dates = model_data.index.unique()

print(model_data.info())
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 19116 entries, 2017-01-03 to 2018-10-03
Columns: 117 entries, Returns60D to stock_WELLS FARGO & CO(NEW)
dtypes: float64(117)
memory usage: 17.2 MB
None
In [87]:
model_data[target].describe()
Out[87]:
count    19116.000000
mean         0.022847
std          0.109096
min         -0.349525
25%         -0.043959
50%          0.033707
75%          0.092706
max          0.377204
Name: Returns60D, dtype: float64
In [88]:
idx = pd.IndexSlice

Train/Test Split

We will use 250 folds to generally predict about 2 days of forward returns following the historical training data that will gradually increase in length.

In [86]:
nfolds = 250
    
for train_dates, test_dates in time_series_split(dates, nfolds=nfolds):
        
    print('{} train_dates from {} - {}'.format(len(train_dates), str(train_dates[0]), 
                                                   str(train_dates[len(train_dates)-1])))
        
    print('{} test_dates from {} - {}'.format(len(test_dates),str(test_dates[0]), 
                                                    str(test_dates[len(test_dates)-1])))
    print()
23 train_dates from 2017-01-03 00:00:00+00:00 - 2017-02-03 00:00:00+00:00
2 test_dates from 2017-02-06 00:00:00+00:00 - 2017-02-07 00:00:00+00:00

25 train_dates from 2017-01-03 00:00:00+00:00 - 2017-02-07 00:00:00+00:00
2 test_dates from 2017-02-08 00:00:00+00:00 - 2017-02-09 00:00:00+00:00

27 train_dates from 2017-01-03 00:00:00+00:00 - 2017-02-09 00:00:00+00:00
2 test_dates from 2017-02-10 00:00:00+00:00 - 2017-02-13 00:00:00+00:00

29 train_dates from 2017-01-03 00:00:00+00:00 - 2017-02-13 00:00:00+00:00
2 test_dates from 2017-02-14 00:00:00+00:00 - 2017-02-15 00:00:00+00:00

31 train_dates from 2017-01-03 00:00:00+00:00 - 2017-02-15 00:00:00+00:00
2 test_dates from 2017-02-16 00:00:00+00:00 - 2017-02-17 00:00:00+00:00

33 train_dates from 2017-01-03 00:00:00+00:00 - 2017-02-17 00:00:00+00:00
2 test_dates from 2017-02-21 00:00:00+00:00 - 2017-02-22 00:00:00+00:00

35 train_dates from 2017-01-03 00:00:00+00:00 - 2017-02-22 00:00:00+00:00
2 test_dates from 2017-02-23 00:00:00+00:00 - 2017-02-24 00:00:00+00:00

37 train_dates from 2017-01-03 00:00:00+00:00 - 2017-02-24 00:00:00+00:00
2 test_dates from 2017-02-27 00:00:00+00:00 - 2017-02-28 00:00:00+00:00

39 train_dates from 2017-01-03 00:00:00+00:00 - 2017-02-28 00:00:00+00:00
2 test_dates from 2017-03-01 00:00:00+00:00 - 2017-03-02 00:00:00+00:00

41 train_dates from 2017-01-03 00:00:00+00:00 - 2017-03-02 00:00:00+00:00
2 test_dates from 2017-03-03 00:00:00+00:00 - 2017-03-06 00:00:00+00:00

43 train_dates from 2017-01-03 00:00:00+00:00 - 2017-03-06 00:00:00+00:00
2 test_dates from 2017-03-07 00:00:00+00:00 - 2017-03-08 00:00:00+00:00

45 train_dates from 2017-01-03 00:00:00+00:00 - 2017-03-08 00:00:00+00:00
2 test_dates from 2017-03-09 00:00:00+00:00 - 2017-03-10 00:00:00+00:00

47 train_dates from 2017-01-03 00:00:00+00:00 - 2017-03-10 00:00:00+00:00
2 test_dates from 2017-03-13 00:00:00+00:00 - 2017-03-14 00:00:00+00:00

49 train_dates from 2017-01-03 00:00:00+00:00 - 2017-03-14 00:00:00+00:00
2 test_dates from 2017-03-15 00:00:00+00:00 - 2017-03-16 00:00:00+00:00

51 train_dates from 2017-01-03 00:00:00+00:00 - 2017-03-16 00:00:00+00:00
2 test_dates from 2017-03-17 00:00:00+00:00 - 2017-03-20 00:00:00+00:00

53 train_dates from 2017-01-03 00:00:00+00:00 - 2017-03-20 00:00:00+00:00
2 test_dates from 2017-03-21 00:00:00+00:00 - 2017-03-22 00:00:00+00:00

55 train_dates from 2017-01-03 00:00:00+00:00 - 2017-03-22 00:00:00+00:00
2 test_dates from 2017-03-23 00:00:00+00:00 - 2017-03-24 00:00:00+00:00

57 train_dates from 2017-01-03 00:00:00+00:00 - 2017-03-24 00:00:00+00:00
2 test_dates from 2017-03-27 00:00:00+00:00 - 2017-03-28 00:00:00+00:00

59 train_dates from 2017-01-03 00:00:00+00:00 - 2017-03-28 00:00:00+00:00
2 test_dates from 2017-03-29 00:00:00+00:00 - 2017-03-30 00:00:00+00:00

61 train_dates from 2017-01-03 00:00:00+00:00 - 2017-03-30 00:00:00+00:00
2 test_dates from 2017-03-31 00:00:00+00:00 - 2017-04-03 00:00:00+00:00

63 train_dates from 2017-01-03 00:00:00+00:00 - 2017-04-03 00:00:00+00:00
2 test_dates from 2017-04-04 00:00:00+00:00 - 2017-04-05 00:00:00+00:00

65 train_dates from 2017-01-03 00:00:00+00:00 - 2017-04-05 00:00:00+00:00
2 test_dates from 2017-04-06 00:00:00+00:00 - 2017-04-07 00:00:00+00:00

67 train_dates from 2017-01-03 00:00:00+00:00 - 2017-04-07 00:00:00+00:00
2 test_dates from 2017-04-10 00:00:00+00:00 - 2017-04-11 00:00:00+00:00

69 train_dates from 2017-01-03 00:00:00+00:00 - 2017-04-11 00:00:00+00:00
2 test_dates from 2017-04-12 00:00:00+00:00 - 2017-04-13 00:00:00+00:00

71 train_dates from 2017-01-03 00:00:00+00:00 - 2017-04-13 00:00:00+00:00
2 test_dates from 2017-04-17 00:00:00+00:00 - 2017-04-18 00:00:00+00:00

73 train_dates from 2017-01-03 00:00:00+00:00 - 2017-04-18 00:00:00+00:00
2 test_dates from 2017-04-19 00:00:00+00:00 - 2017-04-20 00:00:00+00:00

75 train_dates from 2017-01-03 00:00:00+00:00 - 2017-04-20 00:00:00+00:00
2 test_dates from 2017-04-21 00:00:00+00:00 - 2017-04-24 00:00:00+00:00

77 train_dates from 2017-01-03 00:00:00+00:00 - 2017-04-24 00:00:00+00:00
2 test_dates from 2017-04-25 00:00:00+00:00 - 2017-04-26 00:00:00+00:00

79 train_dates from 2017-01-03 00:00:00+00:00 - 2017-04-26 00:00:00+00:00
2 test_dates from 2017-04-27 00:00:00+00:00 - 2017-04-28 00:00:00+00:00

81 train_dates from 2017-01-03 00:00:00+00:00 - 2017-04-28 00:00:00+00:00
2 test_dates from 2017-05-01 00:00:00+00:00 - 2017-05-02 00:00:00+00:00

83 train_dates from 2017-01-03 00:00:00+00:00 - 2017-05-02 00:00:00+00:00
2 test_dates from 2017-05-03 00:00:00+00:00 - 2017-05-04 00:00:00+00:00

85 train_dates from 2017-01-03 00:00:00+00:00 - 2017-05-04 00:00:00+00:00
2 test_dates from 2017-05-05 00:00:00+00:00 - 2017-05-08 00:00:00+00:00

87 train_dates from 2017-01-03 00:00:00+00:00 - 2017-05-08 00:00:00+00:00
2 test_dates from 2017-05-09 00:00:00+00:00 - 2017-05-10 00:00:00+00:00

89 train_dates from 2017-01-03 00:00:00+00:00 - 2017-05-10 00:00:00+00:00
2 test_dates from 2017-05-11 00:00:00+00:00 - 2017-05-12 00:00:00+00:00

91 train_dates from 2017-01-03 00:00:00+00:00 - 2017-05-12 00:00:00+00:00
2 test_dates from 2017-05-15 00:00:00+00:00 - 2017-05-16 00:00:00+00:00

93 train_dates from 2017-01-03 00:00:00+00:00 - 2017-05-16 00:00:00+00:00
2 test_dates from 2017-05-17 00:00:00+00:00 - 2017-05-18 00:00:00+00:00

95 train_dates from 2017-01-03 00:00:00+00:00 - 2017-05-18 00:00:00+00:00
2 test_dates from 2017-05-19 00:00:00+00:00 - 2017-05-22 00:00:00+00:00

97 train_dates from 2017-01-03 00:00:00+00:00 - 2017-05-22 00:00:00+00:00
2 test_dates from 2017-05-23 00:00:00+00:00 - 2017-05-24 00:00:00+00:00

99 train_dates from 2017-01-03 00:00:00+00:00 - 2017-05-24 00:00:00+00:00
2 test_dates from 2017-05-25 00:00:00+00:00 - 2017-05-26 00:00:00+00:00

101 train_dates from 2017-01-03 00:00:00+00:00 - 2017-05-26 00:00:00+00:00
2 test_dates from 2017-05-30 00:00:00+00:00 - 2017-05-31 00:00:00+00:00

103 train_dates from 2017-01-03 00:00:00+00:00 - 2017-05-31 00:00:00+00:00
2 test_dates from 2017-06-01 00:00:00+00:00 - 2017-06-02 00:00:00+00:00

105 train_dates from 2017-01-03 00:00:00+00:00 - 2017-06-02 00:00:00+00:00
2 test_dates from 2017-06-05 00:00:00+00:00 - 2017-06-06 00:00:00+00:00

107 train_dates from 2017-01-03 00:00:00+00:00 - 2017-06-06 00:00:00+00:00
2 test_dates from 2017-06-07 00:00:00+00:00 - 2017-06-08 00:00:00+00:00

109 train_dates from 2017-01-03 00:00:00+00:00 - 2017-06-08 00:00:00+00:00
2 test_dates from 2017-06-09 00:00:00+00:00 - 2017-06-12 00:00:00+00:00

111 train_dates from 2017-01-03 00:00:00+00:00 - 2017-06-12 00:00:00+00:00
2 test_dates from 2017-06-13 00:00:00+00:00 - 2017-06-14 00:00:00+00:00

113 train_dates from 2017-01-03 00:00:00+00:00 - 2017-06-14 00:00:00+00:00
2 test_dates from 2017-06-15 00:00:00+00:00 - 2017-06-16 00:00:00+00:00

115 train_dates from 2017-01-03 00:00:00+00:00 - 2017-06-16 00:00:00+00:00
2 test_dates from 2017-06-19 00:00:00+00:00 - 2017-06-20 00:00:00+00:00

117 train_dates from 2017-01-03 00:00:00+00:00 - 2017-06-20 00:00:00+00:00
2 test_dates from 2017-06-21 00:00:00+00:00 - 2017-06-22 00:00:00+00:00

119 train_dates from 2017-01-03 00:00:00+00:00 - 2017-06-22 00:00:00+00:00
2 test_dates from 2017-06-23 00:00:00+00:00 - 2017-06-26 00:00:00+00:00

121 train_dates from 2017-01-03 00:00:00+00:00 - 2017-06-26 00:00:00+00:00
2 test_dates from 2017-06-27 00:00:00+00:00 - 2017-06-28 00:00:00+00:00

123 train_dates from 2017-01-03 00:00:00+00:00 - 2017-06-28 00:00:00+00:00
2 test_dates from 2017-06-29 00:00:00+00:00 - 2017-06-30 00:00:00+00:00

125 train_dates from 2017-01-03 00:00:00+00:00 - 2017-06-30 00:00:00+00:00
2 test_dates from 2017-07-03 00:00:00+00:00 - 2017-07-05 00:00:00+00:00

127 train_dates from 2017-01-03 00:00:00+00:00 - 2017-07-05 00:00:00+00:00
2 test_dates from 2017-07-06 00:00:00+00:00 - 2017-07-07 00:00:00+00:00

129 train_dates from 2017-01-03 00:00:00+00:00 - 2017-07-07 00:00:00+00:00
2 test_dates from 2017-07-10 00:00:00+00:00 - 2017-07-11 00:00:00+00:00

131 train_dates from 2017-01-03 00:00:00+00:00 - 2017-07-11 00:00:00+00:00
2 test_dates from 2017-07-12 00:00:00+00:00 - 2017-07-13 00:00:00+00:00

133 train_dates from 2017-01-03 00:00:00+00:00 - 2017-07-13 00:00:00+00:00
2 test_dates from 2017-07-14 00:00:00+00:00 - 2017-07-17 00:00:00+00:00

135 train_dates from 2017-01-03 00:00:00+00:00 - 2017-07-17 00:00:00+00:00
2 test_dates from 2017-07-18 00:00:00+00:00 - 2017-07-19 00:00:00+00:00

137 train_dates from 2017-01-03 00:00:00+00:00 - 2017-07-19 00:00:00+00:00
2 test_dates from 2017-07-20 00:00:00+00:00 - 2017-07-21 00:00:00+00:00

139 train_dates from 2017-01-03 00:00:00+00:00 - 2017-07-21 00:00:00+00:00
2 test_dates from 2017-07-24 00:00:00+00:00 - 2017-07-25 00:00:00+00:00

141 train_dates from 2017-01-03 00:00:00+00:00 - 2017-07-25 00:00:00+00:00
2 test_dates from 2017-07-26 00:00:00+00:00 - 2017-07-27 00:00:00+00:00

143 train_dates from 2017-01-03 00:00:00+00:00 - 2017-07-27 00:00:00+00:00
2 test_dates from 2017-07-28 00:00:00+00:00 - 2017-07-31 00:00:00+00:00

145 train_dates from 2017-01-03 00:00:00+00:00 - 2017-07-31 00:00:00+00:00
2 test_dates from 2017-08-01 00:00:00+00:00 - 2017-08-02 00:00:00+00:00

147 train_dates from 2017-01-03 00:00:00+00:00 - 2017-08-02 00:00:00+00:00
2 test_dates from 2017-08-03 00:00:00+00:00 - 2017-08-04 00:00:00+00:00

149 train_dates from 2017-01-03 00:00:00+00:00 - 2017-08-04 00:00:00+00:00
2 test_dates from 2017-08-07 00:00:00+00:00 - 2017-08-08 00:00:00+00:00

151 train_dates from 2017-01-03 00:00:00+00:00 - 2017-08-08 00:00:00+00:00
2 test_dates from 2017-08-09 00:00:00+00:00 - 2017-08-10 00:00:00+00:00

153 train_dates from 2017-01-03 00:00:00+00:00 - 2017-08-10 00:00:00+00:00
2 test_dates from 2017-08-11 00:00:00+00:00 - 2017-08-14 00:00:00+00:00

155 train_dates from 2017-01-03 00:00:00+00:00 - 2017-08-14 00:00:00+00:00
2 test_dates from 2017-08-15 00:00:00+00:00 - 2017-08-16 00:00:00+00:00

157 train_dates from 2017-01-03 00:00:00+00:00 - 2017-08-16 00:00:00+00:00
2 test_dates from 2017-08-17 00:00:00+00:00 - 2017-08-18 00:00:00+00:00

159 train_dates from 2017-01-03 00:00:00+00:00 - 2017-08-18 00:00:00+00:00
2 test_dates from 2017-08-21 00:00:00+00:00 - 2017-08-22 00:00:00+00:00

161 train_dates from 2017-01-03 00:00:00+00:00 - 2017-08-22 00:00:00+00:00
2 test_dates from 2017-08-23 00:00:00+00:00 - 2017-08-24 00:00:00+00:00

163 train_dates from 2017-01-03 00:00:00+00:00 - 2017-08-24 00:00:00+00:00
2 test_dates from 2017-08-25 00:00:00+00:00 - 2017-08-28 00:00:00+00:00

165 train_dates from 2017-01-03 00:00:00+00:00 - 2017-08-28 00:00:00+00:00
2 test_dates from 2017-08-29 00:00:00+00:00 - 2017-08-30 00:00:00+00:00

167 train_dates from 2017-01-03 00:00:00+00:00 - 2017-08-30 00:00:00+00:00
2 test_dates from 2017-08-31 00:00:00+00:00 - 2017-09-01 00:00:00+00:00

169 train_dates from 2017-01-03 00:00:00+00:00 - 2017-09-01 00:00:00+00:00
2 test_dates from 2017-09-05 00:00:00+00:00 - 2017-09-06 00:00:00+00:00

171 train_dates from 2017-01-03 00:00:00+00:00 - 2017-09-06 00:00:00+00:00
2 test_dates from 2017-09-07 00:00:00+00:00 - 2017-09-08 00:00:00+00:00

173 train_dates from 2017-01-03 00:00:00+00:00 - 2017-09-08 00:00:00+00:00
2 test_dates from 2017-09-11 00:00:00+00:00 - 2017-09-12 00:00:00+00:00

175 train_dates from 2017-01-03 00:00:00+00:00 - 2017-09-12 00:00:00+00:00
2 test_dates from 2017-09-13 00:00:00+00:00 - 2017-09-14 00:00:00+00:00

177 train_dates from 2017-01-03 00:00:00+00:00 - 2017-09-14 00:00:00+00:00
2 test_dates from 2017-09-15 00:00:00+00:00 - 2017-09-18 00:00:00+00:00

179 train_dates from 2017-01-03 00:00:00+00:00 - 2017-09-18 00:00:00+00:00
2 test_dates from 2017-09-19 00:00:00+00:00 - 2017-09-20 00:00:00+00:00

181 train_dates from 2017-01-03 00:00:00+00:00 - 2017-09-20 00:00:00+00:00
2 test_dates from 2017-09-21 00:00:00+00:00 - 2017-09-22 00:00:00+00:00

183 train_dates from 2017-01-03 00:00:00+00:00 - 2017-09-22 00:00:00+00:00
2 test_dates from 2017-09-25 00:00:00+00:00 - 2017-09-26 00:00:00+00:00

185 train_dates from 2017-01-03 00:00:00+00:00 - 2017-09-26 00:00:00+00:00
2 test_dates from 2017-09-27 00:00:00+00:00 - 2017-09-28 00:00:00+00:00

187 train_dates from 2017-01-03 00:00:00+00:00 - 2017-09-28 00:00:00+00:00
2 test_dates from 2017-09-29 00:00:00+00:00 - 2017-10-02 00:00:00+00:00

189 train_dates from 2017-01-03 00:00:00+00:00 - 2017-10-02 00:00:00+00:00
2 test_dates from 2017-10-03 00:00:00+00:00 - 2017-10-04 00:00:00+00:00

191 train_dates from 2017-01-03 00:00:00+00:00 - 2017-10-04 00:00:00+00:00
2 test_dates from 2017-10-05 00:00:00+00:00 - 2017-10-06 00:00:00+00:00

193 train_dates from 2017-01-03 00:00:00+00:00 - 2017-10-06 00:00:00+00:00
2 test_dates from 2017-10-09 00:00:00+00:00 - 2017-10-10 00:00:00+00:00

195 train_dates from 2017-01-03 00:00:00+00:00 - 2017-10-10 00:00:00+00:00
2 test_dates from 2017-10-11 00:00:00+00:00 - 2017-10-12 00:00:00+00:00

197 train_dates from 2017-01-03 00:00:00+00:00 - 2017-10-12 00:00:00+00:00
2 test_dates from 2017-10-13 00:00:00+00:00 - 2017-10-16 00:00:00+00:00

199 train_dates from 2017-01-03 00:00:00+00:00 - 2017-10-16 00:00:00+00:00
2 test_dates from 2017-10-17 00:00:00+00:00 - 2017-10-18 00:00:00+00:00

201 train_dates from 2017-01-03 00:00:00+00:00 - 2017-10-18 00:00:00+00:00
2 test_dates from 2017-10-19 00:00:00+00:00 - 2017-10-20 00:00:00+00:00

203 train_dates from 2017-01-03 00:00:00+00:00 - 2017-10-20 00:00:00+00:00
2 test_dates from 2017-10-23 00:00:00+00:00 - 2017-10-24 00:00:00+00:00

205 train_dates from 2017-01-03 00:00:00+00:00 - 2017-10-24 00:00:00+00:00
2 test_dates from 2017-10-25 00:00:00+00:00 - 2017-10-26 00:00:00+00:00

207 train_dates from 2017-01-03 00:00:00+00:00 - 2017-10-26 00:00:00+00:00
2 test_dates from 2017-10-27 00:00:00+00:00 - 2017-10-30 00:00:00+00:00

209 train_dates from 2017-01-03 00:00:00+00:00 - 2017-10-30 00:00:00+00:00
2 test_dates from 2017-10-31 00:00:00+00:00 - 2017-11-01 00:00:00+00:00

211 train_dates from 2017-01-03 00:00:00+00:00 - 2017-11-01 00:00:00+00:00
2 test_dates from 2017-11-02 00:00:00+00:00 - 2017-11-03 00:00:00+00:00

213 train_dates from 2017-01-03 00:00:00+00:00 - 2017-11-03 00:00:00+00:00
2 test_dates from 2017-11-06 00:00:00+00:00 - 2017-11-07 00:00:00+00:00

215 train_dates from 2017-01-03 00:00:00+00:00 - 2017-11-07 00:00:00+00:00
2 test_dates from 2017-11-08 00:00:00+00:00 - 2017-11-09 00:00:00+00:00

217 train_dates from 2017-01-03 00:00:00+00:00 - 2017-11-09 00:00:00+00:00
2 test_dates from 2017-11-10 00:00:00+00:00 - 2017-11-13 00:00:00+00:00

219 train_dates from 2017-01-03 00:00:00+00:00 - 2017-11-13 00:00:00+00:00
2 test_dates from 2017-11-14 00:00:00+00:00 - 2017-11-15 00:00:00+00:00

221 train_dates from 2017-01-03 00:00:00+00:00 - 2017-11-15 00:00:00+00:00
2 test_dates from 2017-11-16 00:00:00+00:00 - 2017-11-17 00:00:00+00:00

223 train_dates from 2017-01-03 00:00:00+00:00 - 2017-11-17 00:00:00+00:00
2 test_dates from 2017-11-20 00:00:00+00:00 - 2017-11-21 00:00:00+00:00

225 train_dates from 2017-01-03 00:00:00+00:00 - 2017-11-21 00:00:00+00:00
2 test_dates from 2017-11-22 00:00:00+00:00 - 2017-11-24 00:00:00+00:00

227 train_dates from 2017-01-03 00:00:00+00:00 - 2017-11-24 00:00:00+00:00
2 test_dates from 2017-11-27 00:00:00+00:00 - 2017-11-28 00:00:00+00:00

229 train_dates from 2017-01-03 00:00:00+00:00 - 2017-11-28 00:00:00+00:00
2 test_dates from 2017-11-29 00:00:00+00:00 - 2017-11-30 00:00:00+00:00

231 train_dates from 2017-01-03 00:00:00+00:00 - 2017-11-30 00:00:00+00:00
2 test_dates from 2017-12-01 00:00:00+00:00 - 2017-12-04 00:00:00+00:00

233 train_dates from 2017-01-03 00:00:00+00:00 - 2017-12-04 00:00:00+00:00
2 test_dates from 2017-12-05 00:00:00+00:00 - 2017-12-06 00:00:00+00:00

235 train_dates from 2017-01-03 00:00:00+00:00 - 2017-12-06 00:00:00+00:00
2 test_dates from 2017-12-07 00:00:00+00:00 - 2017-12-08 00:00:00+00:00

237 train_dates from 2017-01-03 00:00:00+00:00 - 2017-12-08 00:00:00+00:00
2 test_dates from 2017-12-11 00:00:00+00:00 - 2017-12-12 00:00:00+00:00

239 train_dates from 2017-01-03 00:00:00+00:00 - 2017-12-12 00:00:00+00:00
2 test_dates from 2017-12-13 00:00:00+00:00 - 2017-12-14 00:00:00+00:00

241 train_dates from 2017-01-03 00:00:00+00:00 - 2017-12-14 00:00:00+00:00
2 test_dates from 2017-12-15 00:00:00+00:00 - 2017-12-18 00:00:00+00:00

243 train_dates from 2017-01-03 00:00:00+00:00 - 2017-12-18 00:00:00+00:00
2 test_dates from 2017-12-19 00:00:00+00:00 - 2017-12-20 00:00:00+00:00

245 train_dates from 2017-01-03 00:00:00+00:00 - 2017-12-20 00:00:00+00:00
2 test_dates from 2017-12-21 00:00:00+00:00 - 2017-12-22 00:00:00+00:00

247 train_dates from 2017-01-03 00:00:00+00:00 - 2017-12-22 00:00:00+00:00
2 test_dates from 2017-12-26 00:00:00+00:00 - 2017-12-27 00:00:00+00:00

249 train_dates from 2017-01-03 00:00:00+00:00 - 2017-12-27 00:00:00+00:00
2 test_dates from 2017-12-28 00:00:00+00:00 - 2017-12-29 00:00:00+00:00

251 train_dates from 2017-01-03 00:00:00+00:00 - 2017-12-29 00:00:00+00:00
2 test_dates from 2018-01-02 00:00:00+00:00 - 2018-01-03 00:00:00+00:00

253 train_dates from 2017-01-03 00:00:00+00:00 - 2018-01-03 00:00:00+00:00
2 test_dates from 2018-01-04 00:00:00+00:00 - 2018-01-05 00:00:00+00:00

255 train_dates from 2017-01-03 00:00:00+00:00 - 2018-01-05 00:00:00+00:00
2 test_dates from 2018-01-08 00:00:00+00:00 - 2018-01-09 00:00:00+00:00

257 train_dates from 2017-01-03 00:00:00+00:00 - 2018-01-09 00:00:00+00:00
2 test_dates from 2018-01-10 00:00:00+00:00 - 2018-01-11 00:00:00+00:00

259 train_dates from 2017-01-03 00:00:00+00:00 - 2018-01-11 00:00:00+00:00
2 test_dates from 2018-01-12 00:00:00+00:00 - 2018-01-16 00:00:00+00:00

261 train_dates from 2017-01-03 00:00:00+00:00 - 2018-01-16 00:00:00+00:00
2 test_dates from 2018-01-17 00:00:00+00:00 - 2018-01-18 00:00:00+00:00

263 train_dates from 2017-01-03 00:00:00+00:00 - 2018-01-18 00:00:00+00:00
2 test_dates from 2018-01-19 00:00:00+00:00 - 2018-01-22 00:00:00+00:00

265 train_dates from 2017-01-03 00:00:00+00:00 - 2018-01-22 00:00:00+00:00
2 test_dates from 2018-01-23 00:00:00+00:00 - 2018-01-24 00:00:00+00:00

267 train_dates from 2017-01-03 00:00:00+00:00 - 2018-01-24 00:00:00+00:00
2 test_dates from 2018-01-25 00:00:00+00:00 - 2018-01-26 00:00:00+00:00

269 train_dates from 2017-01-03 00:00:00+00:00 - 2018-01-26 00:00:00+00:00
2 test_dates from 2018-01-29 00:00:00+00:00 - 2018-01-30 00:00:00+00:00

271 train_dates from 2017-01-03 00:00:00+00:00 - 2018-01-30 00:00:00+00:00
2 test_dates from 2018-01-31 00:00:00+00:00 - 2018-02-01 00:00:00+00:00

273 train_dates from 2017-01-03 00:00:00+00:00 - 2018-02-01 00:00:00+00:00
2 test_dates from 2018-02-02 00:00:00+00:00 - 2018-02-05 00:00:00+00:00

275 train_dates from 2017-01-03 00:00:00+00:00 - 2018-02-05 00:00:00+00:00
2 test_dates from 2018-02-06 00:00:00+00:00 - 2018-02-07 00:00:00+00:00

277 train_dates from 2017-01-03 00:00:00+00:00 - 2018-02-07 00:00:00+00:00
2 test_dates from 2018-02-08 00:00:00+00:00 - 2018-02-09 00:00:00+00:00

279 train_dates from 2017-01-03 00:00:00+00:00 - 2018-02-09 00:00:00+00:00
2 test_dates from 2018-02-12 00:00:00+00:00 - 2018-02-13 00:00:00+00:00

281 train_dates from 2017-01-03 00:00:00+00:00 - 2018-02-13 00:00:00+00:00
2 test_dates from 2018-02-14 00:00:00+00:00 - 2018-02-15 00:00:00+00:00

283 train_dates from 2017-01-03 00:00:00+00:00 - 2018-02-15 00:00:00+00:00
2 test_dates from 2018-02-16 00:00:00+00:00 - 2018-02-20 00:00:00+00:00

285 train_dates from 2017-01-03 00:00:00+00:00 - 2018-02-20 00:00:00+00:00
2 test_dates from 2018-02-21 00:00:00+00:00 - 2018-02-22 00:00:00+00:00

287 train_dates from 2017-01-03 00:00:00+00:00 - 2018-02-22 00:00:00+00:00
2 test_dates from 2018-02-23 00:00:00+00:00 - 2018-02-26 00:00:00+00:00

289 train_dates from 2017-01-03 00:00:00+00:00 - 2018-02-26 00:00:00+00:00
2 test_dates from 2018-02-27 00:00:00+00:00 - 2018-02-28 00:00:00+00:00

291 train_dates from 2017-01-03 00:00:00+00:00 - 2018-02-28 00:00:00+00:00
2 test_dates from 2018-03-01 00:00:00+00:00 - 2018-03-02 00:00:00+00:00

293 train_dates from 2017-01-03 00:00:00+00:00 - 2018-03-02 00:00:00+00:00
2 test_dates from 2018-03-05 00:00:00+00:00 - 2018-03-06 00:00:00+00:00

295 train_dates from 2017-01-03 00:00:00+00:00 - 2018-03-06 00:00:00+00:00
2 test_dates from 2018-03-07 00:00:00+00:00 - 2018-03-08 00:00:00+00:00

297 train_dates from 2017-01-03 00:00:00+00:00 - 2018-03-08 00:00:00+00:00
2 test_dates from 2018-03-09 00:00:00+00:00 - 2018-03-12 00:00:00+00:00

299 train_dates from 2017-01-03 00:00:00+00:00 - 2018-03-12 00:00:00+00:00
2 test_dates from 2018-03-13 00:00:00+00:00 - 2018-03-14 00:00:00+00:00

301 train_dates from 2017-01-03 00:00:00+00:00 - 2018-03-14 00:00:00+00:00
2 test_dates from 2018-03-15 00:00:00+00:00 - 2018-03-16 00:00:00+00:00

303 train_dates from 2017-01-03 00:00:00+00:00 - 2018-03-16 00:00:00+00:00
2 test_dates from 2018-03-19 00:00:00+00:00 - 2018-03-20 00:00:00+00:00

305 train_dates from 2017-01-03 00:00:00+00:00 - 2018-03-20 00:00:00+00:00
2 test_dates from 2018-03-21 00:00:00+00:00 - 2018-03-22 00:00:00+00:00

307 train_dates from 2017-01-03 00:00:00+00:00 - 2018-03-22 00:00:00+00:00
2 test_dates from 2018-03-23 00:00:00+00:00 - 2018-03-26 00:00:00+00:00

309 train_dates from 2017-01-03 00:00:00+00:00 - 2018-03-26 00:00:00+00:00
2 test_dates from 2018-03-27 00:00:00+00:00 - 2018-03-28 00:00:00+00:00

311 train_dates from 2017-01-03 00:00:00+00:00 - 2018-03-28 00:00:00+00:00
2 test_dates from 2018-03-29 00:00:00+00:00 - 2018-04-02 00:00:00+00:00

313 train_dates from 2017-01-03 00:00:00+00:00 - 2018-04-02 00:00:00+00:00
2 test_dates from 2018-04-03 00:00:00+00:00 - 2018-04-04 00:00:00+00:00

315 train_dates from 2017-01-03 00:00:00+00:00 - 2018-04-04 00:00:00+00:00
2 test_dates from 2018-04-05 00:00:00+00:00 - 2018-04-06 00:00:00+00:00

317 train_dates from 2017-01-03 00:00:00+00:00 - 2018-04-06 00:00:00+00:00
2 test_dates from 2018-04-09 00:00:00+00:00 - 2018-04-10 00:00:00+00:00

319 train_dates from 2017-01-03 00:00:00+00:00 - 2018-04-10 00:00:00+00:00
2 test_dates from 2018-04-11 00:00:00+00:00 - 2018-04-12 00:00:00+00:00

321 train_dates from 2017-01-03 00:00:00+00:00 - 2018-04-12 00:00:00+00:00
2 test_dates from 2018-04-13 00:00:00+00:00 - 2018-04-16 00:00:00+00:00

323 train_dates from 2017-01-03 00:00:00+00:00 - 2018-04-16 00:00:00+00:00
2 test_dates from 2018-04-17 00:00:00+00:00 - 2018-04-18 00:00:00+00:00

325 train_dates from 2017-01-03 00:00:00+00:00 - 2018-04-18 00:00:00+00:00
2 test_dates from 2018-04-19 00:00:00+00:00 - 2018-04-20 00:00:00+00:00

327 train_dates from 2017-01-03 00:00:00+00:00 - 2018-04-20 00:00:00+00:00
2 test_dates from 2018-04-23 00:00:00+00:00 - 2018-04-24 00:00:00+00:00

329 train_dates from 2017-01-03 00:00:00+00:00 - 2018-04-24 00:00:00+00:00
2 test_dates from 2018-04-25 00:00:00+00:00 - 2018-04-26 00:00:00+00:00

331 train_dates from 2017-01-03 00:00:00+00:00 - 2018-04-26 00:00:00+00:00
2 test_dates from 2018-04-27 00:00:00+00:00 - 2018-04-30 00:00:00+00:00

333 train_dates from 2017-01-03 00:00:00+00:00 - 2018-04-30 00:00:00+00:00
2 test_dates from 2018-05-01 00:00:00+00:00 - 2018-05-02 00:00:00+00:00

335 train_dates from 2017-01-03 00:00:00+00:00 - 2018-05-02 00:00:00+00:00
2 test_dates from 2018-05-03 00:00:00+00:00 - 2018-05-04 00:00:00+00:00

337 train_dates from 2017-01-03 00:00:00+00:00 - 2018-05-04 00:00:00+00:00
2 test_dates from 2018-05-07 00:00:00+00:00 - 2018-05-08 00:00:00+00:00

339 train_dates from 2017-01-03 00:00:00+00:00 - 2018-05-08 00:00:00+00:00
2 test_dates from 2018-05-09 00:00:00+00:00 - 2018-05-10 00:00:00+00:00

341 train_dates from 2017-01-03 00:00:00+00:00 - 2018-05-10 00:00:00+00:00
2 test_dates from 2018-05-11 00:00:00+00:00 - 2018-05-14 00:00:00+00:00

343 train_dates from 2017-01-03 00:00:00+00:00 - 2018-05-14 00:00:00+00:00
2 test_dates from 2018-05-15 00:00:00+00:00 - 2018-05-16 00:00:00+00:00

345 train_dates from 2017-01-03 00:00:00+00:00 - 2018-05-16 00:00:00+00:00
2 test_dates from 2018-05-17 00:00:00+00:00 - 2018-05-18 00:00:00+00:00

347 train_dates from 2017-01-03 00:00:00+00:00 - 2018-05-18 00:00:00+00:00
2 test_dates from 2018-05-21 00:00:00+00:00 - 2018-05-22 00:00:00+00:00

349 train_dates from 2017-01-03 00:00:00+00:00 - 2018-05-22 00:00:00+00:00
2 test_dates from 2018-05-23 00:00:00+00:00 - 2018-05-24 00:00:00+00:00

351 train_dates from 2017-01-03 00:00:00+00:00 - 2018-05-24 00:00:00+00:00
2 test_dates from 2018-05-25 00:00:00+00:00 - 2018-05-29 00:00:00+00:00

353 train_dates from 2017-01-03 00:00:00+00:00 - 2018-05-29 00:00:00+00:00
2 test_dates from 2018-05-30 00:00:00+00:00 - 2018-05-31 00:00:00+00:00

355 train_dates from 2017-01-03 00:00:00+00:00 - 2018-05-31 00:00:00+00:00
2 test_dates from 2018-06-01 00:00:00+00:00 - 2018-06-04 00:00:00+00:00

357 train_dates from 2017-01-03 00:00:00+00:00 - 2018-06-04 00:00:00+00:00
2 test_dates from 2018-06-05 00:00:00+00:00 - 2018-06-06 00:00:00+00:00

359 train_dates from 2017-01-03 00:00:00+00:00 - 2018-06-06 00:00:00+00:00
2 test_dates from 2018-06-07 00:00:00+00:00 - 2018-06-08 00:00:00+00:00

361 train_dates from 2017-01-03 00:00:00+00:00 - 2018-06-08 00:00:00+00:00
2 test_dates from 2018-06-11 00:00:00+00:00 - 2018-06-12 00:00:00+00:00

363 train_dates from 2017-01-03 00:00:00+00:00 - 2018-06-12 00:00:00+00:00
2 test_dates from 2018-06-13 00:00:00+00:00 - 2018-06-14 00:00:00+00:00

365 train_dates from 2017-01-03 00:00:00+00:00 - 2018-06-14 00:00:00+00:00
2 test_dates from 2018-06-15 00:00:00+00:00 - 2018-06-18 00:00:00+00:00

367 train_dates from 2017-01-03 00:00:00+00:00 - 2018-06-18 00:00:00+00:00
2 test_dates from 2018-06-19 00:00:00+00:00 - 2018-06-20 00:00:00+00:00

369 train_dates from 2017-01-03 00:00:00+00:00 - 2018-06-20 00:00:00+00:00
2 test_dates from 2018-06-21 00:00:00+00:00 - 2018-06-22 00:00:00+00:00

371 train_dates from 2017-01-03 00:00:00+00:00 - 2018-06-22 00:00:00+00:00
2 test_dates from 2018-06-25 00:00:00+00:00 - 2018-06-26 00:00:00+00:00

373 train_dates from 2017-01-03 00:00:00+00:00 - 2018-06-26 00:00:00+00:00
2 test_dates from 2018-06-27 00:00:00+00:00 - 2018-06-28 00:00:00+00:00

375 train_dates from 2017-01-03 00:00:00+00:00 - 2018-06-28 00:00:00+00:00
2 test_dates from 2018-06-29 00:00:00+00:00 - 2018-07-02 00:00:00+00:00

377 train_dates from 2017-01-03 00:00:00+00:00 - 2018-07-02 00:00:00+00:00
2 test_dates from 2018-07-03 00:00:00+00:00 - 2018-07-05 00:00:00+00:00

379 train_dates from 2017-01-03 00:00:00+00:00 - 2018-07-05 00:00:00+00:00
2 test_dates from 2018-07-06 00:00:00+00:00 - 2018-07-09 00:00:00+00:00

381 train_dates from 2017-01-03 00:00:00+00:00 - 2018-07-09 00:00:00+00:00
2 test_dates from 2018-07-10 00:00:00+00:00 - 2018-07-11 00:00:00+00:00

383 train_dates from 2017-01-03 00:00:00+00:00 - 2018-07-11 00:00:00+00:00
2 test_dates from 2018-07-12 00:00:00+00:00 - 2018-07-13 00:00:00+00:00

385 train_dates from 2017-01-03 00:00:00+00:00 - 2018-07-13 00:00:00+00:00
2 test_dates from 2018-07-16 00:00:00+00:00 - 2018-07-17 00:00:00+00:00

387 train_dates from 2017-01-03 00:00:00+00:00 - 2018-07-17 00:00:00+00:00
2 test_dates from 2018-07-18 00:00:00+00:00 - 2018-07-19 00:00:00+00:00

389 train_dates from 2017-01-03 00:00:00+00:00 - 2018-07-19 00:00:00+00:00
2 test_dates from 2018-07-20 00:00:00+00:00 - 2018-07-23 00:00:00+00:00

391 train_dates from 2017-01-03 00:00:00+00:00 - 2018-07-23 00:00:00+00:00
2 test_dates from 2018-07-24 00:00:00+00:00 - 2018-07-25 00:00:00+00:00

393 train_dates from 2017-01-03 00:00:00+00:00 - 2018-07-25 00:00:00+00:00
2 test_dates from 2018-07-26 00:00:00+00:00 - 2018-07-27 00:00:00+00:00

395 train_dates from 2017-01-03 00:00:00+00:00 - 2018-07-27 00:00:00+00:00
2 test_dates from 2018-07-30 00:00:00+00:00 - 2018-07-31 00:00:00+00:00

397 train_dates from 2017-01-03 00:00:00+00:00 - 2018-07-31 00:00:00+00:00
2 test_dates from 2018-08-01 00:00:00+00:00 - 2018-08-02 00:00:00+00:00

399 train_dates from 2017-01-03 00:00:00+00:00 - 2018-08-02 00:00:00+00:00
2 test_dates from 2018-08-03 00:00:00+00:00 - 2018-08-06 00:00:00+00:00

401 train_dates from 2017-01-03 00:00:00+00:00 - 2018-08-06 00:00:00+00:00
2 test_dates from 2018-08-07 00:00:00+00:00 - 2018-08-08 00:00:00+00:00

403 train_dates from 2017-01-03 00:00:00+00:00 - 2018-08-08 00:00:00+00:00
2 test_dates from 2018-08-09 00:00:00+00:00 - 2018-08-10 00:00:00+00:00

405 train_dates from 2017-01-03 00:00:00+00:00 - 2018-08-10 00:00:00+00:00
2 test_dates from 2018-08-13 00:00:00+00:00 - 2018-08-14 00:00:00+00:00

407 train_dates from 2017-01-03 00:00:00+00:00 - 2018-08-14 00:00:00+00:00
2 test_dates from 2018-08-15 00:00:00+00:00 - 2018-08-16 00:00:00+00:00

409 train_dates from 2017-01-03 00:00:00+00:00 - 2018-08-16 00:00:00+00:00
2 test_dates from 2018-08-17 00:00:00+00:00 - 2018-08-20 00:00:00+00:00

411 train_dates from 2017-01-03 00:00:00+00:00 - 2018-08-20 00:00:00+00:00
2 test_dates from 2018-08-21 00:00:00+00:00 - 2018-08-22 00:00:00+00:00

413 train_dates from 2017-01-03 00:00:00+00:00 - 2018-08-22 00:00:00+00:00
2 test_dates from 2018-08-23 00:00:00+00:00 - 2018-08-24 00:00:00+00:00

415 train_dates from 2017-01-03 00:00:00+00:00 - 2018-08-24 00:00:00+00:00
2 test_dates from 2018-08-27 00:00:00+00:00 - 2018-08-28 00:00:00+00:00

417 train_dates from 2017-01-03 00:00:00+00:00 - 2018-08-28 00:00:00+00:00
2 test_dates from 2018-08-29 00:00:00+00:00 - 2018-08-30 00:00:00+00:00

419 train_dates from 2017-01-03 00:00:00+00:00 - 2018-08-30 00:00:00+00:00
2 test_dates from 2018-08-31 00:00:00+00:00 - 2018-09-04 00:00:00+00:00

421 train_dates from 2017-01-03 00:00:00+00:00 - 2018-09-04 00:00:00+00:00
2 test_dates from 2018-09-05 00:00:00+00:00 - 2018-09-06 00:00:00+00:00

423 train_dates from 2017-01-03 00:00:00+00:00 - 2018-09-06 00:00:00+00:00
2 test_dates from 2018-09-07 00:00:00+00:00 - 2018-09-10 00:00:00+00:00

425 train_dates from 2017-01-03 00:00:00+00:00 - 2018-09-10 00:00:00+00:00
2 test_dates from 2018-09-11 00:00:00+00:00 - 2018-09-12 00:00:00+00:00

427 train_dates from 2017-01-03 00:00:00+00:00 - 2018-09-12 00:00:00+00:00
2 test_dates from 2018-09-13 00:00:00+00:00 - 2018-09-14 00:00:00+00:00

429 train_dates from 2017-01-03 00:00:00+00:00 - 2018-09-14 00:00:00+00:00
2 test_dates from 2018-09-17 00:00:00+00:00 - 2018-09-18 00:00:00+00:00

431 train_dates from 2017-01-03 00:00:00+00:00 - 2018-09-18 00:00:00+00:00
2 test_dates from 2018-09-19 00:00:00+00:00 - 2018-09-20 00:00:00+00:00

433 train_dates from 2017-01-03 00:00:00+00:00 - 2018-09-20 00:00:00+00:00
2 test_dates from 2018-09-21 00:00:00+00:00 - 2018-09-24 00:00:00+00:00

435 train_dates from 2017-01-03 00:00:00+00:00 - 2018-09-24 00:00:00+00:00
2 test_dates from 2018-09-25 00:00:00+00:00 - 2018-09-26 00:00:00+00:00

437 train_dates from 2017-01-03 00:00:00+00:00 - 2018-09-26 00:00:00+00:00
2 test_dates from 2018-09-27 00:00:00+00:00 - 2018-09-28 00:00:00+00:00

439 train_dates from 2017-01-03 00:00:00+00:00 - 2018-09-28 00:00:00+00:00
2 test_dates from 2018-10-01 00:00:00+00:00 - 2018-10-02 00:00:00+00:00

441 train_dates from 2017-01-03 00:00:00+00:00 - 2018-10-02 00:00:00+00:00
1 test_dates from 2018-10-03 00:00:00+00:00 - 2018-10-03 00:00:00+00:00

OLS Linear Regression

Each iteration obtains the appropriate training and test dates from our custom cross-validation function, selects the corresponding features and targets, and then trains and predicts accordingly.

We capture the root mean squared error as well as the Spearman rank correlation between actual and predicted values:

In [89]:
nfolds = 250
lr = LinearRegression()

test_results, result_idx, preds = [], [], pd.DataFrame()
for train_dates, test_dates in time_series_split(dates, nfolds=nfolds):
    
    X_train = model_data.loc[idx[train_dates], features]
    y_train = model_data.loc[idx[train_dates], target]
    lr.fit(X=X_train, y=y_train)
    
    X_test = model_data.loc[idx[test_dates], features]
    y_test = model_data.loc[idx[test_dates], target]
    y_pred = lr.predict(X_test)
    
    rmse = np.sqrt(mean_squared_error(y_pred=y_pred, y_true=y_test))
    ic, pval = spearmanr(y_pred, y_test)
    
    test_results.append([rmse, ic, pval])
    preds = preds.append(y_test.to_frame('actuals').assign(predicted=y_pred))
    result_idx.append(train_dates[-1])
In [90]:
test_result = pd.DataFrame(test_results, columns=['rmse', 'ic', 'pval'], index=result_idx)

Results

We have captured the test predictions from the 250 folds and can compute both the overall and a 21-day rolling average:

In [91]:
fig, axes = plt.subplots(nrows=2)
rolling_result = test_result.rolling(21).mean().dropna()
rolling_result[['ic', 'pval']].plot(ax=axes[0], title='Information Coefficient')
axes[0].axhline(test_result.ic.mean(), lw=1, ls='--', color='k')
rolling_result[['rmse']].plot(ax=axes[1], title='Root Mean Squared Error')
axes[1].axhline(test_result.rmse.mean(), lw=1, ls='--', color='k')
plt.tight_layout();

For the entire period, we see that the Information Coefficient measured by the rank correlation of actual and predicted returns is positive and statistically significant:

In [93]:
preds_cleaned = preds[(preds.predicted.between(*preds.predicted.quantile([.001, .999]).values))]
sns.jointplot(x='actuals', y='predicted', data=preds_cleaned, stat_func=spearmanr, kind='reg');

Regularization

For the ridge regression, we need to tune the regularization parameter with the keyword alpha that corresponds to the λ we used previously. We will try 21 values from 10-5 to 105 in logarithmic steps.

Ridge Regression: L2 Penalty

The scale sensitivity of the ridge penalty requires us to standardize the inputs using the StandardScaler. Note that we always learn the mean and the standard deviation from the training set using the .fit_transform() method and then apply these learned parameters to the test set using the .transform() method.

In [94]:
nfolds = 250
alphas = np.logspace(-5, 5, 11)
scaler = StandardScaler()
ridge_result, ridge_coeffs = pd.DataFrame(), pd.DataFrame()
for i, alpha in enumerate(alphas):
    print alpha, 
    coeffs, test_results = [], []
    lr_ridge = Ridge(alpha=alpha)
    for train_dates, test_dates in time_series_split(dates, nfolds=nfolds):

        X_train = model_data.loc[idx[train_dates], features]
        y_train = model_data.loc[idx[train_dates], target]
        lr_ridge.fit(X=scaler.fit_transform(X_train), y=y_train)
        coeffs.append(lr_ridge.coef_)

        X_test = model_data.loc[idx[test_dates], features]
        y_test = model_data.loc[idx[test_dates], target]
        y_pred = lr_ridge.predict(scaler.transform(X_test))

        rmse = np.sqrt(mean_squared_error(y_pred=y_pred, y_true=y_test))
        ic, pval = spearmanr(y_pred, y_test)
        
        test_results.append([train_dates[-1], rmse, ic, pval, alpha])
        preds = preds.append(y_test)
        
    test_results = pd.DataFrame(test_results, columns=['date', 'rmse', 'ic', 'pval', 'alpha'])
    ridge_result = ridge_result.append(test_results)
    ridge_coeffs[alpha] = np.mean(coeffs, axis=0)
0 1 2 3 4 5 6 7 8 9 10 
In [95]:
ridge_result.describe()
Out[95]:
rmse ic pval alpha
count 2310.000000 2310.000000 2.310000e+03 2310.000000
mean 0.101400 0.495750 4.474752e-02 10101.010101
std 0.169797 0.243367 1.533533e-01 28576.156942
min 0.035280 -0.442096 3.416974e-43 0.000010
25% 0.073405 0.344474 1.695359e-12 0.001000
50% 0.091405 0.526362 1.853071e-07 1.000000
75% 0.110802 0.670100 1.301841e-03 1000.000000
max 6.052030 0.944032 9.886506e-01 100000.000000

Significance of Information Coefficients - p-value Distribution

In [96]:
plt.figure(figsize=(8,5))
sns.distplot(ridge_result.pval, bins=30, norm_hist=True);
In [97]:
ridge_result_sig = ridge_result[(ridge_result.pval < .05) & (ridge_result.alpha.between(10**-5, 10**5))]
ridge_result_sig_alpha = ridge_result_sig.groupby('alpha')
In [98]:
ridge_coeffs_main = ridge_coeffs.filter(ridge_result_sig.alpha.unique())

Ridge Path

We can now plot the information coefficient obtained for each hyperparameter value and also visualize how the coefficient values evolve as the regularization increases. The results show that we get the highest IC value for a value of λ=10. For this level of regularization, the right-hand panel reveals that the coefficients have been already significantly shrunk compared to the (almost) unconstrained model with λ=10-5:

In [99]:
ridge_result.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2310 entries, 0 to 209
Data columns (total 5 columns):
date     2310 non-null datetime64[ns, UTC]
rmse     2310 non-null float64
ic       2310 non-null float64
pval     2310 non-null float64
alpha    2310 non-null float64
dtypes: datetime64[ns, UTC](1), float64(4)
memory usage: 108.3 KB
In [111]:
best_ic = ridge_result_sig_alpha['ic'].mean().max()
best_alpha = ridge_result_sig_alpha['ic'].mean().idxmax()
In [101]:
fig, axes = plt.subplots(ncols=2, sharex=True)

ridge_result.groupby('alpha')['ic'].mean().plot(logx=True, title='Information Coefficient', ax=axes[0])
axes[0].axhline(ridge_result.groupby('alpha').ic.mean().median())
axes[0].axvline(x=ridge_result.groupby('alpha').ic.mean().idxmax(), c='darkgrey', ls='--')
axes[0].set_xlabel('Regularization')
axes[0].set_ylabel('Information Coefficient')

ridge_coeffs_main.T.plot(legend=False, logx=True, title='Ridge Path', ax=axes[1])
axes[1].set_xlabel('Regularization')
axes[1].set_ylabel('Coefficients')
axes[1].axvline(x=ridge_result.groupby('alpha').ic.mean().idxmax(), c='darkgrey', ls='--')
fig.tight_layout();

Top Coefficients

The standardization of the coefficients allows us to draw conclusions about their relative importance by comparing their absolute magnitude. The most relevant coefficients are:

In [102]:
model_coeffs = ridge_coeffs_main.loc[:, best_alpha]
model_coeffs.index = features
model_coeffs.abs().sort_values().plot.barh(title='Top Factors', figsize=(10,23));

CV Result Distribution

In [103]:
ax = sns.boxplot(y='ic', x='alpha', data=ridge_result_sig)
plt.xticks(rotation=90);

Lasso Regression

The lasso implementation looks very similar to the ridge model we just ran. The main difference is that lasso needs to arrive at a solution using iterative coordinate descent whereas ridge can rely on a closed-form solution:

In [104]:
nfolds = 250
alphas = np.logspace(-8, -2, 13)
scaler = StandardScaler()

lasso_results, lasso_coeffs = pd.DataFrame(), pd.DataFrame()
for i, alpha in enumerate(alphas):
    print i,
    coeffs, test_results = [], []
    lr_lasso = Lasso(alpha=alpha)
    for i, (train_dates, test_dates) in enumerate(time_series_split(dates, nfolds=nfolds)):
        X_train = model_data.loc[idx[train_dates], features]
        y_train = model_data.loc[idx[train_dates], target]
        lr_lasso.fit(X=scaler.fit_transform(X_train), y=y_train)
        
        X_test = model_data.loc[idx[test_dates], features]
        y_test = model_data.loc[idx[test_dates], target]
        y_pred = lr_lasso.predict(scaler.transform(X_test))

        rmse = np.sqrt(mean_squared_error(y_pred=y_pred, y_true=y_test))
        ic, pval = spearmanr(y_pred, y_test)
        
        coeffs.append(lr_lasso.coef_)
        test_results.append([train_dates[-1], rmse, ic, pval, alpha])
    test_results = pd.DataFrame(test_results, columns=['date', 'rmse', 'ic', 'pval', 'alpha'])
    lasso_results = lasso_results.append(test_results)
    lasso_coeffs[alpha] = np.mean(coeffs, axis=0)
0 
/venvs/py35/lib/python3.5/site-packages/sklearn/linear_model/coordinate_descent.py:444: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations
  ConvergenceWarning)
1 2 3 4 5 6 7 8 9 10 11 12 
In [105]:
lasso_results.groupby('alpha').mean()
Out[105]:
rmse ic pval
alpha
1.000000e-08 0.140951 0.540975 0.026469
3.162278e-08 0.140938 0.540965 0.026447
1.000000e-07 0.140900 0.540997 0.026423
3.162278e-07 0.140778 0.540987 0.026408
1.000000e-06 0.140402 0.540919 0.026480
3.162278e-06 0.139275 0.540936 0.026595
1.000000e-05 0.136238 0.540973 0.027882
3.162278e-05 0.126921 0.539762 0.031418
1.000000e-04 0.096590 0.530632 0.039712
3.162278e-04 0.090903 0.506456 0.060761
1.000000e-03 0.093151 0.474378 0.058909
3.162278e-03 0.096868 0.400865 0.084559
1.000000e-02 0.101567 0.269554 0.155904
In [106]:
ax = sns.boxplot(y='ic', x='alpha', data=lasso_results)
plt.xticks(rotation=90);

Cross-validated information coefficient and Lasso Path

As before, we can plot the average information coefficient for all test sets used during cross-validation. We see again that regularization improves the IC over the unconstrained model, delivering the best out-of-sample result at a level of λ=10-5. The optimal regularization value is quite different from ridge regression because the penalty consists of the sum of the absolute, not the squared values of the relatively small coefficient values. We can also see that for this regularization level, the coefficients have been similarly shrunk, as in the ridge regression case:

In [107]:
fig, axes = plt.subplots(ncols=2, sharex=True)

lasso_results.groupby('alpha')['ic'].mean().plot(logx=True, title='Information Coefficient', ax=axes[0])
axes[0].axhline(lasso_results.groupby('alpha')['ic'].mean().median())
axes[0].axvline(x=lasso_results.groupby('alpha')['ic'].mean().idxmax(), c='darkgrey', ls='--')
axes[0].set_xlabel('Regularization')
axes[0].set_ylabel('Information Coefficient')

lasso_coeffs.T.plot(legend=False, logx=True, title='Lasso Path', ax=axes[1])
axes[1].set_xlabel('Regularization')
axes[1].set_ylabel('Coefficients')
axes[1].axvline(x=lasso_results.groupby('alpha')['ic'].mean().idxmax(), c='darkgrey', ls='--')
fig.tight_layout();

In sum, ridge and lasso will produce similar results. Ridge often computes faster, but lasso also yields continuous features subset selection by gradually reducing coefficients to zero, hence eliminating features.

In [ ]: