The notebook linear_regression.ipynb contains examples for the prediction of stock prices using OLS with statsmodels and sklearn, as well as ridge and lasso models.
It is designed to run as a notebook on the Quantopian research platform.
import pandas as pd
import numpy as np
from time import time
import talib
import re
from statsmodels.api import OLS
from sklearn.metrics import mean_squared_error
from scipy.stats import spearmanr, pearsonr
from sklearn.linear_model import LinearRegression, Ridge, RidgeCV, Lasso, LassoCV, LogisticRegression
from sklearn.preprocessing import StandardScaler
from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline, factors, filters, classifiers
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import (Latest,
Returns,
AverageDollarVolume,
SimpleMovingAverage,
EWMA,
BollingerBands,
CustomFactor,
MarketCap,
SimpleBeta)
from quantopian.pipeline.filters import QTradableStocksUS, StaticAssets
from quantopian.pipeline.data.quandl import fred_usdontd156n as libor
from empyrical import max_drawdown, sortino_ratio
import seaborn as sns
import matplotlib.pyplot as plt
################
# Fundamentals #
################
# Morningstar fundamentals (2002 - Ongoing)
# https://www.quantopian.com/help/fundamentals
from quantopian.pipeline.data import Fundamentals
#####################
# Analyst Estimates #
#####################
# Earnings Surprises - Zacks (27 May 2006 - Ongoing)
# https://www.quantopian.com/data/zacks/earnings_surprises
from quantopian.pipeline.data.zacks import EarningsSurprises
from quantopian.pipeline.factors.zacks import BusinessDaysSinceEarningsSurprisesAnnouncement
##########
# Events #
##########
# Buyback Announcements - EventVestor (01 Jun 2007 - Ongoing)
# https://www.quantopian.com/data/eventvestor/buyback_auth
from quantopian.pipeline.data.eventvestor import BuybackAuthorizations
from quantopian.pipeline.factors.eventvestor import BusinessDaysSinceBuybackAuth
# CEO Changes - EventVestor (01 Jan 2007 - Ongoing)
# https://www.quantopian.com/data/eventvestor/ceo_change
from quantopian.pipeline.data.eventvestor import CEOChangeAnnouncements
# Dividends - EventVestor (01 Jan 2007 - Ongoing)
# https://www.quantopian.com/data/eventvestor/dividends
from quantopian.pipeline.data.eventvestor import (
DividendsByExDate,
DividendsByPayDate,
DividendsByAnnouncementDate,
)
from quantopian.pipeline.factors.eventvestor import (
BusinessDaysSincePreviousExDate,
BusinessDaysUntilNextExDate,
BusinessDaysSinceDividendAnnouncement,
)
# Earnings Calendar - EventVestor (01 Jan 2007 - Ongoing)
# https://www.quantopian.com/data/eventvestor/earnings_calendar
from quantopian.pipeline.data.eventvestor import EarningsCalendar
from quantopian.pipeline.factors.eventvestor import (
BusinessDaysUntilNextEarnings,
BusinessDaysSincePreviousEarnings
)
# 13D Filings - EventVestor (01 Jan 2007 - Ongoing)
# https://www.quantopian.com/data/eventvestor/_13d_filings
from quantopian.pipeline.data.eventvestor import _13DFilings
from quantopian.pipeline.factors.eventvestor import BusinessDaysSince13DFilingsDate
#############
# Sentiment #
#############
# News Sentiment - Sentdex Sentiment Analysis (15 Oct 2012 - Ongoing)
# https://www.quantopian.com/data/sentdex/sentiment
from quantopian.pipeline.data.sentdex import sentiment
We need to select a universe of equities and a time horizon, build and transform alpha factors that we will use as features, calculate forward returns that we aim to predict, and potentially clean our data.
# trading days per period
MONTH = 21
YEAR = 12 * MONTH
START = '2017-01-01'
END = '2018-12-31'
We will use equity data for 2017 and 2018 from a custom Q50US universe that uses built-in filters, factors, and classifiers to select the 50 stocks with the highest average dollar volume over the last 200 trading days, subject to additional default criteria (see the Quantopian docs linked on GitHub for details). The universe updates dynamically based on the filter criteria so that, while there are 50 stocks at any given point in time, there may be more than 50 distinct equities in the sample:
def Q50US():
return filters.make_us_equity_universe(
target_size=50,
rankby=factors.AverageDollarVolume(window_length=200),
mask=filters.default_us_equity_universe_mask(),
groupby=classifiers.fundamentals.Sector(),
max_group_weight=0.3,
smoothing_func=lambda f: f.downsample('month_start'),
)
# UNIVERSE = StaticAssets(symbols(['MSFT', 'AAPL']))
UNIVERSE = Q50US()
class AnnualizedData(CustomFactor):
# Get the sum of the last 4 reported values
window_length = 260
def compute(self, today, assets, out, asof_date, values):
for asset in range(len(assets)):
# unique asof dates indicate availability of new figures
_, filing_dates = np.unique(asof_date[:, asset], return_index=True)
quarterly_values = values[filing_dates[-4:], asset]
# ignore annual windows with <4 quarterly data points
            if np.count_nonzero(~np.isnan(quarterly_values)) != 4:
out[asset] = np.nan
else:
out[asset] = np.sum(quarterly_values)
class AnnualAvg(CustomFactor):
window_length = 252
def compute(self, today, assets, out, values):
out[:] = (values[0] + values[-1])/2
def run_pipeline_chunks(pipe, start_date, end_date, chunks_len = None):
chunks = []
current = pd.Timestamp(start_date)
end = pd.Timestamp(end_date)
step = pd.Timedelta(weeks=26) if chunks_len is None else chunks_len
start_pipeline_timer = time()
while current <= end:
current_end = current + step
if current_end > end:
current_end = end
start_timer = time()
        print('Running pipeline: {} - {}'.format(current, current_end))
results = run_pipeline(pipe, current.strftime("%Y-%m-%d"), current_end.strftime("%Y-%m-%d"))
chunks.append(results)
# pipeline returns more days than requested (if no trading day), so get last date from the results
current_end = results.index.get_level_values(0)[-1].tz_localize(None)
current = current_end + pd.Timedelta(days=1)
end_timer = time()
print "Time to run this chunk of the pipeline %.2f secs" % (end_timer - start_timer)
end_pipeline_timer = time()
print "Time to run the entire pipeline %.2f secs" % (end_pipeline_timer - start_pipeline_timer)
return pd.concat(chunks)
def factor_pipeline(factors):
start = time()
pipe = Pipeline({k: v(mask=UNIVERSE).rank() for k, v in factors.items()},
screen=UNIVERSE)
result = run_pipeline_chunks(pipe, start_date=START, end_date=END)
return result, time() - start
class ValueFactors:
"""Definitions of factors for cross-sectional trading algorithms"""
@staticmethod
def PriceToSalesTTM(**kwargs):
"""Last closing price divided by sales per share"""
return Fundamentals.ps_ratio.latest
@staticmethod
def PriceToEarningsTTM(**kwargs):
"""Closing price divided by earnings per share (EPS)"""
return Fundamentals.pe_ratio.latest
@staticmethod
def PriceToDilutedEarningsTTM(mask):
"""Closing price divided by diluted EPS"""
last_close = USEquityPricing.close.latest
diluted_eps = AnnualizedData(inputs = [Fundamentals.diluted_eps_earnings_reports_asof_date,
Fundamentals.diluted_eps_earnings_reports],
mask=mask)
return last_close / diluted_eps
@staticmethod
def PriceToForwardEarnings(**kwargs):
"""Price to Forward Earnings"""
return Fundamentals.forward_pe_ratio.latest
@staticmethod
def DividendYield(**kwargs):
"""Dividends per share divided by closing price"""
return Fundamentals.trailing_dividend_yield.latest
@staticmethod
def PriceToFCF(mask):
"""Price to Free Cash Flow"""
last_close = USEquityPricing.close.latest
fcf_share = AnnualizedData(inputs = [Fundamentals.fcf_per_share_asof_date,
Fundamentals.fcf_per_share],
mask=mask)
return last_close / fcf_share
@staticmethod
def PriceToOperatingCashflow(mask):
"""Last Close divided by Operating Cash Flows"""
last_close = USEquityPricing.close.latest
cfo_per_share = AnnualizedData(inputs = [Fundamentals.cfo_per_share_asof_date,
Fundamentals.cfo_per_share],
mask=mask)
return last_close / cfo_per_share
@staticmethod
def PriceToBook(mask):
"""Closing price divided by book value"""
last_close = USEquityPricing.close.latest
book_value_per_share = AnnualizedData(inputs = [Fundamentals.book_value_per_share_asof_date,
Fundamentals.book_value_per_share],
mask=mask)
return last_close / book_value_per_share
@staticmethod
def EVToFCF(mask):
"""Enterprise Value divided by Free Cash Flows"""
fcf = AnnualizedData(inputs = [Fundamentals.free_cash_flow_asof_date,
Fundamentals.free_cash_flow],
mask=mask)
return Fundamentals.enterprise_value.latest / fcf
@staticmethod
def EVToEBITDA(mask):
"""Enterprise Value to Earnings Before Interest, Taxes, Deprecation and Amortization (EBITDA)"""
ebitda = AnnualizedData(inputs = [Fundamentals.ebitda_asof_date,
Fundamentals.ebitda],
mask=mask)
return Fundamentals.enterprise_value.latest / ebitda
@staticmethod
def EBITDAYield(mask):
"""EBITDA divided by latest close"""
ebitda = AnnualizedData(inputs = [Fundamentals.ebitda_asof_date,
Fundamentals.ebitda],
mask=mask)
return USEquityPricing.close.latest / ebitda
VALUE_FACTORS = {
'DividendYield' : ValueFactors.DividendYield,
'EBITDAYield' : ValueFactors.EBITDAYield,
'EVToEBITDA' : ValueFactors.EVToEBITDA,
'EVToFCF' : ValueFactors.EVToFCF,
'PriceToBook' : ValueFactors.PriceToBook,
'PriceToDilutedEarningsTTM': ValueFactors.PriceToDilutedEarningsTTM,
'PriceToEarningsTTM' : ValueFactors.PriceToEarningsTTM,
'PriceToFCF' : ValueFactors.PriceToFCF,
'PriceToForwardEarnings' : ValueFactors.PriceToForwardEarnings,
'PriceToOperatingCashflow' : ValueFactors.PriceToOperatingCashflow,
'PriceToSalesTTM' : ValueFactors.PriceToSalesTTM,
}
value_factors, t = factor_pipeline(VALUE_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
value_factors.info()
class MomentumFactors:
"""Custom Momentum Factors"""
class PercentAboveLow(CustomFactor):
"""Percentage of current close above low
in lookback window of window_length days
"""
inputs = [USEquityPricing.close]
window_length = 252
def compute(self, today, assets, out, close):
out[:] = close[-1] / np.min(close, axis=0) - 1
class PercentBelowHigh(CustomFactor):
"""Percentage of current close below high
in lookback window of window_length days
"""
inputs = [USEquityPricing.close]
window_length = 252
def compute(self, today, assets, out, close):
out[:] = close[-1] / np.max(close, axis=0) - 1
@staticmethod
def make_dx(timeperiod=14):
class DX(CustomFactor):
"""Directional Movement Index"""
inputs = [USEquityPricing.high,
USEquityPricing.low,
USEquityPricing.close]
window_length = timeperiod + 1
def compute(self, today, assets, out, high, low, close):
out[:] = [talib.DX(high[:, i],
low[:, i],
close[:, i],
timeperiod=timeperiod)[-1]
for i in range(len(assets))]
return DX
@staticmethod
def make_mfi(timeperiod=14):
class MFI(CustomFactor):
"""Money Flow Index"""
inputs = [USEquityPricing.high,
USEquityPricing.low,
USEquityPricing.close,
USEquityPricing.volume]
window_length = timeperiod + 1
def compute(self, today, assets, out, high, low, close, vol):
out[:] = [talib.MFI(high[:, i],
low[:, i],
close[:, i],
vol[:, i],
timeperiod=timeperiod)[-1]
for i in range(len(assets))]
return MFI
@staticmethod
def make_oscillator(fastperiod=12, slowperiod=26, matype=0):
class PPO(CustomFactor):
"""12/26-Day Percent Price Oscillator"""
inputs = [USEquityPricing.close]
window_length = slowperiod
def compute(self, today, assets, out, close_prices):
out[:] = [talib.PPO(close,
fastperiod=fastperiod,
slowperiod=slowperiod,
matype=matype)[-1]
for close in close_prices.T]
return PPO
@staticmethod
def make_stochastic_oscillator(fastk_period=5, slowk_period=3, slowd_period=3,
slowk_matype=0, slowd_matype=0):
class StochasticOscillator(CustomFactor):
"""20-day Stochastic Oscillator """
inputs = [USEquityPricing.high,
USEquityPricing.low,
USEquityPricing.close]
outputs = ['slowk', 'slowd']
window_length = fastk_period * 2
def compute(self, today, assets, out, high, low, close):
                stoch = [talib.STOCH(high[:, i],
                                     low[:, i],
                                     close[:, i],
                                     fastk_period=fastk_period,
                                     slowk_period=slowk_period,
                                     slowk_matype=slowk_matype,
                                     slowd_period=slowd_period,
                                     slowd_matype=slowd_matype)
                         for i in range(len(assets))]
                # talib.STOCH returns (slowk, slowd) arrays; keep the latest value per asset
                out.slowk[:] = [k[-1] for k, d in stoch]
                out.slowd[:] = [d[-1] for k, d in stoch]
return StochasticOscillator
@staticmethod
def make_trendline(timeperiod=252):
        class Trendline(CustomFactor):
            """52-Week Trendline"""
            inputs = [USEquityPricing.close]
window_length = timeperiod
def compute(self, today, assets, out, close_prices):
out[:] = [talib.LINEARREG_SLOPE(close,
timeperiod=timeperiod)[-1]
for close in close_prices.T]
return Trendline
MOMENTUM_FACTORS = {
'Percent Above Low' : MomentumFactors.PercentAboveLow,
'Percent Below High' : MomentumFactors.PercentBelowHigh,
'Price Oscillator' : MomentumFactors.make_oscillator(),
'Money Flow Index' : MomentumFactors.make_mfi(),
'Directional Movement Index' : MomentumFactors.make_dx(),
'Trendline' : MomentumFactors.make_trendline()
}
momentum_factors, t = factor_pipeline(MOMENTUM_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
momentum_factors.info()
class EfficiencyFactors:
@staticmethod
def CapexToAssets(mask):
"""Capital Expenditure divided by Total Assets"""
capex = AnnualizedData(inputs = [Fundamentals.capital_expenditure_asof_date,
Fundamentals.capital_expenditure],
mask=mask)
assets = Fundamentals.total_assets.latest
return - capex / assets
@staticmethod
def CapexToSales(mask):
"""Capital Expenditure divided by Total Revenue"""
capex = AnnualizedData(inputs = [Fundamentals.capital_expenditure_asof_date,
Fundamentals.capital_expenditure],
mask=mask)
revenue = AnnualizedData(inputs = [Fundamentals.total_revenue_asof_date,
Fundamentals.total_revenue],
mask=mask)
return - capex / revenue
@staticmethod
def CapexToFCF(mask):
"""Capital Expenditure divided by Free Cash Flows"""
capex = AnnualizedData(inputs = [Fundamentals.capital_expenditure_asof_date,
Fundamentals.capital_expenditure],
mask=mask)
free_cash_flow = AnnualizedData(inputs = [Fundamentals.free_cash_flow_asof_date,
Fundamentals.free_cash_flow],
mask=mask)
return - capex / free_cash_flow
@staticmethod
def EBITToAssets(mask):
"""Earnings Before Interest and Taxes (EBIT) divided by Total Assets"""
ebit = AnnualizedData(inputs = [Fundamentals.ebit_asof_date,
Fundamentals.ebit],
mask=mask)
assets = Fundamentals.total_assets.latest
return ebit / assets
@staticmethod
def CFOToAssets(mask):
"""Operating Cash Flows divided by Total Assets"""
cfo = AnnualizedData(inputs = [Fundamentals.operating_cash_flow_asof_date,
Fundamentals.operating_cash_flow],
mask=mask)
assets = Fundamentals.total_assets.latest
return cfo / assets
@staticmethod
def RetainedEarningsToAssets(mask):
"""Retained Earnings divided by Total Assets"""
retained_earnings = AnnualizedData(inputs = [Fundamentals.retained_earnings_asof_date,
Fundamentals.retained_earnings],
mask=mask)
assets = Fundamentals.total_assets.latest
return retained_earnings / assets
EFFICIENCY_FACTORS = {
'CFO To Assets' :EfficiencyFactors.CFOToAssets,
'Capex To Assets' :EfficiencyFactors.CapexToAssets,
'Capex To FCF' :EfficiencyFactors.CapexToFCF,
'Capex To Sales' :EfficiencyFactors.CapexToSales,
'EBIT To Assets' :EfficiencyFactors.EBITToAssets,
'Retained Earnings To Assets' :EfficiencyFactors.RetainedEarningsToAssets
}
efficiency_factors, t = factor_pipeline(EFFICIENCY_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
efficiency_factors.info()
class RiskFactors:
@staticmethod
def LogMarketCap(mask):
"""Log of Market Capitalization log(Close Price * Shares Outstanding)"""
return np.log(MarketCap(mask=mask))
class DownsideRisk(CustomFactor):
"""Mean returns divided by std of 1yr daily losses (Sortino Ratio)"""
inputs = [USEquityPricing.close]
window_length = 252
def compute(self, today, assets, out, close):
ret = pd.DataFrame(close).pct_change()
out[:] = ret.mean().div(ret.where(ret<0).std())
@staticmethod
def MarketBeta(**kwargs):
"""Slope of 1-yr regression of price returns against index returns"""
return SimpleBeta(target=symbols('SPY'), regression_length=252)
class DownsideBeta(CustomFactor):
"""Slope of 1yr regression of returns on negative index returns"""
inputs = [USEquityPricing.close]
window_length = 252
def compute(self, today, assets, out, close):
t = len(close)
assets = pd.DataFrame(close).pct_change()
start_date = (today - pd.DateOffset(years=1)).strftime('%Y-%m-%d')
spy = get_pricing('SPY',
start_date=start_date,
end_date=today.strftime('%Y-%m-%d')).reset_index(drop=True)
spy_neg_ret = (spy
.close_price
.iloc[-t:]
.pct_change()
.pipe(lambda x: x.where(x<0)))
out[:] = assets.apply(lambda x: x.cov(spy_neg_ret)).div(spy_neg_ret.var())
class Vol3M(CustomFactor):
"""3-month Volatility: Standard deviation of returns over 3 months"""
inputs = [USEquityPricing.close]
window_length = 63
def compute(self, today, assets, out, close):
out[:] = np.log1p(pd.DataFrame(close).pct_change()).std()
RISK_FACTORS = {
'Log Market Cap' : RiskFactors.LogMarketCap,
'Downside Risk' : RiskFactors.DownsideRisk,
'Index Beta' : RiskFactors.MarketBeta,
#'Downside Beta' : RiskFactors.DownsideBeta,
'Volatility 3M' : RiskFactors.Vol3M,
}
risk_factors, t = factor_pipeline(RISK_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
risk_factors.info()
def growth_pipeline():
revenue = AnnualizedData(inputs = [Fundamentals.total_revenue_asof_date,
Fundamentals.total_revenue],
mask=UNIVERSE)
eps = AnnualizedData(inputs = [Fundamentals.diluted_eps_earnings_reports_asof_date,
Fundamentals.diluted_eps_earnings_reports],
mask=UNIVERSE)
return Pipeline({'Sales': revenue,
'EPS': eps,
'Total Assets': Fundamentals.total_assets.latest,
'Net Debt': Fundamentals.net_debt.latest},
screen=UNIVERSE)
start_timer = time()
growth_factors = run_pipeline(growth_pipeline(), start_date=START, end_date=END)
for col in growth_factors.columns:
for month in [3, 12]:
new_col = col + ' Growth {}M'.format(month)
kwargs = {new_col: growth_factors[col].pct_change(month*MONTH).groupby(level=1).rank()}
growth_factors = growth_factors.assign(**kwargs)
print('Pipeline run time {:.2f} secs'.format(time() - start_timer))
growth_factors.info()
class QualityFactors:
@staticmethod
def AssetTurnover(mask):
"""Sales divided by average of year beginning and year end assets"""
assets = AnnualAvg(inputs=[Fundamentals.total_assets],
mask=mask)
sales = AnnualizedData([Fundamentals.total_revenue_asof_date,
Fundamentals.total_revenue], mask=mask)
return sales / assets
@staticmethod
def CurrentRatio(mask):
"""Total current assets divided by total current liabilities"""
assets = Fundamentals.current_assets.latest
liabilities = Fundamentals.current_liabilities.latest
return assets / liabilities
@staticmethod
def AssetToEquityRatio(mask):
"""Total current assets divided by common equity"""
assets = Fundamentals.current_assets.latest
equity = Fundamentals.common_stock.latest
return assets / equity
@staticmethod
def InterestCoverage(mask):
"""EBIT divided by interest expense"""
ebit = AnnualizedData(inputs = [Fundamentals.ebit_asof_date,
Fundamentals.ebit], mask=mask)
interest_expense = AnnualizedData(inputs = [Fundamentals.interest_expense_asof_date,
Fundamentals.interest_expense], mask=mask)
return ebit / interest_expense
@staticmethod
def DebtToAssetRatio(mask):
"""Total Debts divided by Total Assets"""
debt = Fundamentals.total_debt.latest
assets = Fundamentals.total_assets.latest
return debt / assets
@staticmethod
def DebtToEquityRatio(mask):
"""Total Debts divided by Common Stock Equity"""
debt = Fundamentals.total_debt.latest
equity = Fundamentals.common_stock.latest
return debt / equity
@staticmethod
def WorkingCapitalToAssets(mask):
"""Current Assets less Current liabilities (Working Capital) divided by Assets"""
working_capital = Fundamentals.working_capital.latest
assets = Fundamentals.total_assets.latest
return working_capital / assets
@staticmethod
def WorkingCapitalToSales(mask):
"""Current Assets less Current liabilities (Working Capital), divided by Sales"""
working_capital = Fundamentals.working_capital.latest
sales = AnnualizedData([Fundamentals.total_revenue_asof_date,
Fundamentals.total_revenue], mask=mask)
return working_capital / sales
class MertonsDD(CustomFactor):
"""Merton's Distance to Default """
inputs = [Fundamentals.total_assets,
Fundamentals.total_liabilities,
libor.value,
USEquityPricing.close]
window_length = 252
def compute(self, today, assets, out, tot_assets, tot_liabilities, r, close):
mertons = []
for col_assets, col_liabilities, col_r, col_close in zip(tot_assets.T, tot_liabilities.T,
r.T, close.T):
vol_1y = np.nanstd(col_close)
numerator = np.log(
col_assets[-1] / col_liabilities[-1]) + ((252 * col_r[-1]) - ((vol_1y ** 2) / 2))
mertons.append(numerator / vol_1y)
out[:] = mertons
QUALITY_FACTORS = {
'AssetToEquityRatio' : QualityFactors.AssetToEquityRatio,
'AssetTurnover' : QualityFactors.AssetTurnover,
'CurrentRatio' : QualityFactors.CurrentRatio,
'DebtToAssetRatio' : QualityFactors.DebtToAssetRatio,
'DebtToEquityRatio' : QualityFactors.DebtToEquityRatio,
'InterestCoverage' : QualityFactors.InterestCoverage,
'MertonsDD' : QualityFactors.MertonsDD,
'WorkingCapitalToAssets': QualityFactors.WorkingCapitalToAssets,
'WorkingCapitalToSales' : QualityFactors.WorkingCapitalToSales,
}
quality_factors, t = factor_pipeline(QUALITY_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
quality_factors.info()
class PayoutFactors:
@staticmethod
def DividendPayoutRatio(mask):
"""Dividends Per Share divided by Earnings Per Share"""
dps = AnnualizedData(inputs = [Fundamentals.dividend_per_share_earnings_reports_asof_date,
Fundamentals.dividend_per_share_earnings_reports], mask=mask)
eps = AnnualizedData(inputs = [Fundamentals.basic_eps_earnings_reports_asof_date,
Fundamentals.basic_eps_earnings_reports], mask=mask)
return dps / eps
@staticmethod
def DividendGrowth(**kwargs):
"""Annualized percentage DPS change"""
return Fundamentals.dps_growth.latest
PAYOUT_FACTORS = {
'Dividend Payout Ratio': PayoutFactors.DividendPayoutRatio,
'Dividend Growth': PayoutFactors.DividendGrowth
}
payout_factors, t = factor_pipeline(PAYOUT_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
payout_factors.info()
class ProfitabilityFactors:
@staticmethod
def GrossProfitMargin(mask):
"""Gross Profit divided by Net Sales"""
gross_profit = AnnualizedData([Fundamentals.gross_profit_asof_date,
Fundamentals.gross_profit], mask=mask)
sales = AnnualizedData([Fundamentals.total_revenue_asof_date,
Fundamentals.total_revenue], mask=mask)
return gross_profit / sales
@staticmethod
def NetIncomeMargin(mask):
"""Net income divided by Net Sales"""
net_income = AnnualizedData([Fundamentals.net_income_income_statement_asof_date,
Fundamentals.net_income_income_statement], mask=mask)
sales = AnnualizedData([Fundamentals.total_revenue_asof_date,
Fundamentals.total_revenue], mask=mask)
return net_income / sales
PROFITABILITY_FACTORS = {
    'Gross Profit Margin': ProfitabilityFactors.GrossProfitMargin,
    'Net Income Margin': ProfitabilityFactors.NetIncomeMargin,
    # wrapped in lambdas so factor_pipeline can call them with a mask argument
    'Return on Equity': lambda mask: Fundamentals.roe.latest,
    'Return on Assets': lambda mask: Fundamentals.roa.latest,
    'Return on Invested Capital': lambda mask: Fundamentals.roic.latest
}
profitability_factors, t = factor_pipeline(PROFITABILITY_FACTORS)
print('Pipeline run time {:.2f} secs'.format(t))
profitability_factors.info()
We will test predictions for various lookahead periods to identify the holding periods that generate the best predictability, measured by the information coefficient.
More specifically, we compute returns for 1, 5, 10, 20, and 60 days using the built-in Returns factor, resulting in over 25,000 observations for the universe of 50 stocks over two years of approximately 252 trading days each.
lookahead = [1, 5, 10, 20, 60]
returns = run_pipeline(Pipeline({'Returns{}D'.format(i): Returns(inputs=[USEquityPricing.close],
window_length=i+1, mask=UNIVERSE) for i in lookahead},
screen=UNIVERSE),
start_date=START,
end_date=END)
return_cols = ['Returns{}D'.format(i) for i in lookahead]
returns.info()
We will use over 50 features that cover a broad range of factors based on market, fundamental, and alternative data. The notebook also includes custom transformations to convert fundamental data, which is typically reported at quarterly frequency, to rolling annual totals or averages in order to avoid excessive seasonal fluctuations.
Once the factors have been computed, we combine them using pd.concat(), assign index names, and create a categorical variable that identifies the asset for each data point:
data = pd.concat([returns,
value_factors,
momentum_factors,
quality_factors,
payout_factors,
growth_factors,
efficiency_factors,
risk_factors], axis=1).sortlevel()
data.index.names = ['date', 'asset']
data['stock'] = data.index.get_level_values('asset').map(lambda x: x.asset_name)
data.info()
# Create sorted dataframe of features with missing_count
missing_values0 = data.isnull().sum(axis=0).reset_index()
missing_values0.columns = ['column_name', 'missing_count']
missing_values0 = missing_values0.loc[missing_values0['missing_count']>0]
missing_values0 = missing_values0.sort_values(by='missing_count')
# Get percentage of total NaNs per feature
total0 = data.isnull().sum().sort_values(ascending=False)
percent0 = (data.isnull().sum()/data.isnull().count()).sort_values(ascending=False)
missing_data0 = pd.concat([total0, percent0], axis=1,join='outer', keys=['Total Missing Count', '% of Total Observations'])
missing_data0.index.name = 'Numeric Feature'
missing_data0.head(len(data.columns))
ind0 = np.arange(missing_values0.shape[0])
width0 = 0.1
fig, ax = plt.subplots(figsize=(13,5))
colors0 = sns.color_palette('Set2', len(ind0))
rects0 = ax.bar(ind0, missing_values0.missing_count.values, color=colors0)
ax.set_xticks(ind0)
ax.set_xticklabels(missing_values0.column_name.values, rotation='vertical')
ax.set_ylabel("Count")
ax.set_title("Missing Observations Count")
ax.margins(0.001)
plt.show()
Next, we remove rows and columns that are missing more than 20 percent of their observations, resulting in the loss of about six percent of the rows and five columns:
rows_before, cols_before = data.shape
data = (data
.dropna(axis=1, thresh=int(len(data)*.8))
.dropna(thresh=int(len(data.columns) * .8)))
#data = data.fillna(data.median())
data = data.bfill().ffill()
rows_after, cols_after = data.shape
print('{:,d} rows and {:,d} columns dropped'.format(rows_before-rows_after, cols_before-cols_after))
At this point, we have 51 features and the categorical identifier of the stock:
data.sort_index(1).info()
First, let's take a look at the individual distributions of all our features.
data.hist(bins=25, figsize=(22,22))
plt.show()
It is always a good idea to check the relationship between your features and the target variable. Here, we will plot each feature against the 60-day forward return, together with the r2 score, the information coefficient (IC), and its p-value for each feature.
tmp = data.drop(['Returns1D','Returns5D','Returns10D','Returns20D'], axis=1)
tmp.reset_index(level=['asset'], inplace=True, drop=True)
tmp.head()
def r2(x, y):
return pearsonr(x, y)[0] ** 2
count = 0
for i, feature in enumerate(list(tmp), 1):
count += 1
if(feature == 'Returns60D'):
print()
else:
print('{} # {}'.format(feature, count))
plt.figure(figsize=(8,5))
cm = plt.get_cmap('jet')
colors = np.linspace(0.1, 1, len(tmp))
sc = plt.scatter(tmp[feature], tmp['Returns60D'], s=25, c=colors, cmap=cm,
edgecolor='k', alpha=0.3, label='Price Data')
j = sns.regplot(tmp[feature], tmp['Returns60D'], data=tmp, scatter=False,
line_kws={'color':'k','lw':2, 'linestyle':'dashed'})
cb = plt.colorbar(sc)
cb.ax.set_yticklabels([str(p) for p in tmp[::len(tmp)//9].index],
fontdict = {'fontsize': 10,
'fontweight': 'medium'})
plt.xlabel('{}'.format(feature), size=10, labelpad=10, fontsize=10, fontweight='medium')
plt.ylabel('Returns60D', size=10, labelpad=10, fontsize=10, fontweight='medium')
plt.grid(False)
ic, pval = spearmanr(tmp[feature], tmp['Returns60D'])
R2 = r2(tmp[feature], tmp['Returns60D'])
plt.title('r2 = {}, IC = {}, P-Value = {}'.format(round(R2,4), round(ic,4), pval))
for j in range(2):
plt.tick_params(axis='x', labelsize=10)
plt.tick_params(axis='y', labelsize=10)
plt.show()
if(count == len(tmp.columns)-1):
break
For linear regression models, it is important to explore the correlation among the features to identify multicollinearity issues, and to check the correlation between the features and the target. The notebook contains a seaborn clustermap that shows the hierarchical structure of the feature correlation matrix. It identifies a small number of highly correlated clusters.
g = sns.clustermap(data.drop(['stock'] + return_cols, axis=1).corr(), square=True)
g.ax_heatmap.set_yticklabels(g.ax_heatmap.get_yticklabels(), rotation=0)
plt.title('Correlation of all_features',y=1, x=5,size=20)
plt.show();
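Beyond pairwise correlations, we can quantify multicollinearity with variance inflation factors. The following is a minimal sketch using statsmodels on the same feature set as the clustermap; the threshold of 10 mentioned in the comment is a common rule of thumb, not part of the notebook:
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
# same feature set as the clustermap above
feature_data = data.drop(['stock'] + return_cols, axis=1).dropna()
vif_input = sm.add_constant(feature_data)  # VIF regressions need an intercept
# VIF_i = 1 / (1 - R2_i), where R2_i regresses feature i on all other features
vif = pd.Series([variance_inflation_factor(vif_input.values, i)
                 for i in range(1, vif_input.shape[1])],  # skip the constant
                index=feature_data.columns, name='VIF')
# values above ~10 are commonly flagged as problematic
print(vif.sort_values(ascending=False).head(10))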
We need to convert the categorical stock variable into a numeric format so that the linear regression can process it. For this purpose, we use dummy encoding, which creates an individual column for each category level and flags the presence of that level with an entry of 1, and 0 otherwise. The pandas function get_dummies() automates dummy encoding. It detects and properly converts columns of type object, as illustrated next. If you need dummy variables for columns containing integers, for instance, you can identify them using the keyword columns (a hypothetical sketch follows the next cell):
X = pd.get_dummies(data.drop(return_cols, axis=1), prefix_sep='_')
X.info()
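As a hypothetical illustration of the columns keyword (the sector_code column below is made up and not part of this dataset):
# explicitly dummy-encode an integer-coded column
sector_demo = pd.DataFrame({'sector_code': [101, 102, 101],
                            'close': [10., 12., 9.]})
pd.get_dummies(sector_demo, columns=['sector_code'], prefix='sector')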
The goal is to predict returns over a given holding period. Hence, we need to align each observation of the features with the return realized over the subsequent 1, 5, 10, 20, or 60 days for each equity. We achieve this by combining the pandas .groupby() method with the .shift() method as follows:
y = data.loc[:, return_cols]
shifted_y = []
for col in y.columns:
t = int(re.search(r'\d+', col).group(0))
shifted_y.append(y.groupby(level='asset')['Returns{}D'.format(t)].shift(-t).to_frame(col))
y = pd.concat(shifted_y, axis=1)
y.info()
ax = sns.boxplot(y[return_cols])
ax.set_title('Return Distributions');
We can estimate a linear regression model using OLS with statsmodels. We select a forward return, for example for a 1-day holding period, remove outliers below the 2.5th and above the 97.5th percentiles, and fit the model accordingly:
target = 'Returns1D'
model_data = pd.concat([y[[target]], X], axis=1).dropna()
model_data = model_data[model_data[target].between(model_data[target].quantile(.025),
model_data[target].quantile(.975))]
model = OLS(endog=model_data[target], exog=model_data.drop(target, axis=1))
trained_model = model.fit()
trained_model.summary()
We do not reproduce the full summary here because of the large number of variables; it is available in the notebook. The diagnostic statistics show that, given the high p-value on the Jarque-Bera statistic, the hypothesis that the residuals are normally distributed cannot be rejected.
However, the Durbin-Watson statistic is low at 1.4, so we can comfortably reject the null hypothesis of no autocorrelation at the 5% level. Hence, the standard errors are likely positively correlated. If our goal were to understand which factors are significantly associated with forward returns, we would need to rerun the regression using robust standard errors (a parameter of the statsmodels .fit() method), or use a different method altogether, such as a panel model that allows for a more complex error covariance structure.
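As a minimal sketch of that adjustment (not part of the original notebook), we could refit the same model with heteroskedasticity- and autocorrelation-consistent (HAC) standard errors via the cov_type argument of .fit(); the five-day lag length below is an illustrative assumption:
# illustrative only: same design matrix, but HAC (Newey-West) standard errors
robust_model = OLS(endog=model_data[target],
                   exog=model_data.drop(target, axis=1))
robust_result = robust_model.fit(cov_type='HAC', cov_kwds={'maxlags': 5})
robust_result.summary()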
target = 'Returns5D'
model_data = pd.concat([y[[target]], X], axis=1).dropna()
model_data = model_data[model_data[target].between(model_data[target].quantile(.025),
model_data[target].quantile(.975))]
model = OLS(endog=model_data[target], exog=model_data.drop(target, axis=1))
trained_model = model.fit()
trained_model.summary()
target = 'Returns10D'
model_data = pd.concat([y[[target]], X], axis=1).dropna()
model_data = model_data[model_data[target].between(model_data[target].quantile(.025),
model_data[target].quantile(.975))]
model = OLS(endog=model_data[target], exog=model_data.drop(target, axis=1))
trained_model = model.fit()
trained_model.summary()
target = 'Returns20D'
model_data = pd.concat([y[[target]], X], axis=1).dropna()
model_data = model_data[model_data[target].between(model_data[target].quantile(.025),
model_data[target].quantile(.975))]
model = OLS(endog=model_data[target], exog=model_data.drop(target, axis=1))
trained_model = model.fit()
trained_model.summary()
target = 'Returns60D'
model_data = pd.concat([y[[target]], X], axis=1).dropna()
model_data = model_data[model_data[target].between(model_data[target].quantile(.025),
model_data[target].quantile(.975))]
model = OLS(endog=model_data[target], exog=model_data.drop(target, axis=1))
trained_model = model.fit()
trained_model.summary()
Since sklearn is tailored towards prediction, we will evaluate the linear regression model based on its predictive performance using cross-validation.
Our data consists of grouped time series, which requires a custom cross-validation function to provide train and test indices that ensure the test data immediately follows the training data for each equity, so that we do not inadvertently introduce look-ahead bias or leakage.
We achieve this with the following function, which returns a generator yielding pairs of train and test dates. The train dates guarantee a minimum length for the training period, and the number of pairs depends on the parameter nfolds. The distinct test periods do not overlap and are located at the end of the period available in the data. After a test period has been used, it becomes part of the training data, which grows in size accordingly:
def time_series_split(d, nfolds=5, min_train=21):
"""Generate train/test dates for nfolds
with at least min_train train obs
"""
train_dates = d[:min_train].tolist()
    n = int(len(d)/(nfolds + 1)) + 1
test_folds = [d[i:i + n] for i in range(min_train, len(d), n)]
for test_dates in test_folds:
if len(train_dates) > min_train:
yield train_dates, test_dates
train_dates.extend(test_dates)
We need to select the appropriate return series (we will use a 60-day holding period) and remove outliers. We will also convert returns to log returns as follows:
target = 'Returns60D'
outliers = .01
model_data = pd.concat([y[[target]], X], axis=1).dropna().reset_index('asset', drop=True)
model_data = model_data[model_data[target].between(*model_data[target].quantile([outliers, 1-outliers]).values)]
model_data[target] = np.log1p(model_data[target])
features = model_data.drop(target, axis=1).columns
dates = model_data.index.unique()
print(model_data.info())
model_data[target].describe()
idx = pd.IndexSlice
We will use 250 folds so that each test period covers roughly two days of forward returns following the historical training data, which gradually increases in length.
nfolds = 250
for train_dates, test_dates in time_series_split(dates, nfolds=nfolds):
print('{} train_dates from {} - {}'.format(len(train_dates), str(train_dates[0]),
str(train_dates[len(train_dates)-1])))
print('{} test_dates from {} - {}'.format(len(test_dates),str(test_dates[0]),
str(test_dates[len(test_dates)-1])))
print()
Each iteration obtains the appropriate training and test dates from our custom cross-validation function, selects the corresponding features and targets, and then trains and predicts accordingly.
We capture the root mean squared error as well as the Spearman rank correlation between actual and predicted values:
nfolds = 250
lr = LinearRegression()
test_results, result_idx, preds = [], [], pd.DataFrame()
for train_dates, test_dates in time_series_split(dates, nfolds=nfolds):
X_train = model_data.loc[idx[train_dates], features]
y_train = model_data.loc[idx[train_dates], target]
lr.fit(X=X_train, y=y_train)
X_test = model_data.loc[idx[test_dates], features]
y_test = model_data.loc[idx[test_dates], target]
y_pred = lr.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_pred=y_pred, y_true=y_test))
ic, pval = spearmanr(y_pred, y_test)
test_results.append([rmse, ic, pval])
preds = preds.append(y_test.to_frame('actuals').assign(predicted=y_pred))
result_idx.append(train_dates[-1])
test_result = pd.DataFrame(test_results, columns=['rmse', 'ic', 'pval'], index=result_idx)
We have captured the test predictions from the 250 folds and can compute both the overall averages and 21-day rolling averages of the RMSE and the information coefficient:
fig, axes = plt.subplots(nrows=2)
rolling_result = test_result.rolling(21).mean().dropna()
rolling_result[['ic', 'pval']].plot(ax=axes[0], title='Information Coefficient')
axes[0].axhline(test_result.ic.mean(), lw=1, ls='--', color='k')
rolling_result[['rmse']].plot(ax=axes[1], title='Root Mean Squared Error')
axes[1].axhline(test_result.rmse.mean(), lw=1, ls='--', color='k')
plt.tight_layout();
For the entire period, we see that the information coefficient, measured by the rank correlation of actual and predicted returns, is positive and statistically significant:
preds_cleaned = preds[(preds.predicted.between(*preds.predicted.quantile([.001, .999]).values))]
sns.jointplot(x='actuals', y='predicted', data=preds_cleaned, stat_func=spearmanr, kind='reg');
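To make that figure explicit, we can also compute the pooled rank correlation directly from the captured predictions (the same spearmanr call used inside the loop, applied to all folds at once):
# overall IC pooled across all test folds
ic_all, pval_all = spearmanr(preds.actuals, preds.predicted)
print('Overall IC: {:.4f} (p-value: {:.6f})'.format(ic_all, pval_all))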
For the ridge regression, we need to tune the regularization parameter with the keyword alpha, which corresponds to the λ we used previously. We will try 11 values from 10^-5 to 10^5 in logarithmic steps.
The scale sensitivity of the ridge penalty requires us to standardize the inputs using the StandardScaler. Note that we always learn the mean and the standard deviation from the training set using the .fit_transform() method and then apply these learned parameters to the test set using the .transform() method.
nfolds = 250
alphas = np.logspace(-5, 5, 11)
scaler = StandardScaler()
ridge_result, ridge_coeffs = pd.DataFrame(), pd.DataFrame()
for i, alpha in enumerate(alphas):
    print(alpha)
coeffs, test_results = [], []
lr_ridge = Ridge(alpha=alpha)
for train_dates, test_dates in time_series_split(dates, nfolds=nfolds):
X_train = model_data.loc[idx[train_dates], features]
y_train = model_data.loc[idx[train_dates], target]
lr_ridge.fit(X=scaler.fit_transform(X_train), y=y_train)
coeffs.append(lr_ridge.coef_)
X_test = model_data.loc[idx[test_dates], features]
y_test = model_data.loc[idx[test_dates], target]
y_pred = lr_ridge.predict(scaler.transform(X_test))
rmse = np.sqrt(mean_squared_error(y_pred=y_pred, y_true=y_test))
ic, pval = spearmanr(y_pred, y_test)
test_results.append([train_dates[-1], rmse, ic, pval, alpha])
preds = preds.append(y_test)
test_results = pd.DataFrame(test_results, columns=['date', 'rmse', 'ic', 'pval', 'alpha'])
ridge_result = ridge_result.append(test_results)
ridge_coeffs[alpha] = np.mean(coeffs, axis=0)
ridge_result.describe()
plt.figure(figsize=(8,5))
sns.distplot(ridge_result.pval, bins=30, norm_hist=True);
ridge_result_sig = ridge_result[(ridge_result.pval < .05) & (ridge_result.alpha.between(10**-5, 10**5))]
ridge_result_sig_alpha = ridge_result_sig.groupby('alpha')
ridge_coeffs_main = ridge_coeffs.filter(ridge_result_sig.alpha.unique())
We can now plot the information coefficient obtained for each hyperparameter value and also visualize how the coefficient values evolve as the regularization increases. The results show that we get the highest IC value for λ = 10. For this level of regularization, the right-hand panel reveals that the coefficients have already been shrunk significantly compared to the (almost) unconstrained model with λ = 10^-5:
ridge_result.info()
best_ic = ridge_result_sig_alpha['ic'].mean().max()
best_alpha = ridge_result_sig_alpha['ic'].mean().idxmax()
fig, axes = plt.subplots(ncols=2, sharex=True)
ridge_result.groupby('alpha')['ic'].mean().plot(logx=True, title='Information Coefficient', ax=axes[0])
axes[0].axhline(ridge_result.groupby('alpha').ic.mean().median())
axes[0].axvline(x=ridge_result.groupby('alpha').ic.mean().idxmax(), c='darkgrey', ls='--')
axes[0].set_xlabel('Regularization')
axes[0].set_ylabel('Information Coefficient')
ridge_coeffs_main.T.plot(legend=False, logx=True, title='Ridge Path', ax=axes[1])
axes[1].set_xlabel('Regularization')
axes[1].set_ylabel('Coefficients')
axes[1].axvline(x=ridge_result.groupby('alpha').ic.mean().idxmax(), c='darkgrey', ls='--')
fig.tight_layout();
Because we standardized the inputs, we can draw conclusions about the relative importance of the factors by comparing the absolute magnitude of their coefficients. The most relevant coefficients are:
model_coeffs = ridge_coeffs_main.loc[:, best_alpha]
model_coeffs.index = features
model_coeffs.abs().sort_values().plot.barh(title='Top Factors', figsize=(10,23));
ax = sns.boxplot(y='ic', x='alpha', data=ridge_result_sig)
plt.xticks(rotation=90);
The lasso implementation looks very similar to the ridge model we just ran. The main difference is that the lasso needs to arrive at a solution using iterative coordinate descent, whereas ridge regression can rely on a closed-form solution:
nfolds = 250
alphas = np.logspace(-8, -2, 13)
scaler = StandardScaler()
lasso_results, lasso_coeffs = pd.DataFrame(), pd.DataFrame()
for i, alpha in enumerate(alphas):
    print(i)
coeffs, test_results = [], []
lr_lasso = Lasso(alpha=alpha)
for i, (train_dates, test_dates) in enumerate(time_series_split(dates, nfolds=nfolds)):
X_train = model_data.loc[idx[train_dates], features]
y_train = model_data.loc[idx[train_dates], target]
lr_lasso.fit(X=scaler.fit_transform(X_train), y=y_train)
X_test = model_data.loc[idx[test_dates], features]
y_test = model_data.loc[idx[test_dates], target]
y_pred = lr_lasso.predict(scaler.transform(X_test))
rmse = np.sqrt(mean_squared_error(y_pred=y_pred, y_true=y_test))
ic, pval = spearmanr(y_pred, y_test)
coeffs.append(lr_lasso.coef_)
test_results.append([train_dates[-1], rmse, ic, pval, alpha])
test_results = pd.DataFrame(test_results, columns=['date', 'rmse', 'ic', 'pval', 'alpha'])
lasso_results = lasso_results.append(test_results)
lasso_coeffs[alpha] = np.mean(coeffs, axis=0)
lasso_results.groupby('alpha').mean()
ax = sns.boxplot(y='ic', x='alpha', data=lasso_results)
plt.xticks(rotation=90);
As before, we can plot the average information coefficient for all test sets used during cross-validation. We see again that regularization improves the IC over the unconstrained model, delivering the best out-of-sample result at a level of λ = 10^-5. The optimal regularization value is quite different from the ridge case because the penalty consists of the sum of the absolute, not the squared, values of the relatively small coefficients. We can also see that, for this regularization level, the coefficients have been similarly shrunk, as in the ridge regression case:
fig, axes = plt.subplots(ncols=2, sharex=True)
lasso_results.groupby('alpha')['ic'].mean().plot(logx=True, title='Information Coefficient', ax=axes[0])
axes[0].axhline(lasso_results.groupby('alpha')['ic'].mean().median())
axes[0].axvline(x=lasso_results.groupby('alpha')['ic'].mean().idxmax(), c='darkgrey', ls='--')
axes[0].set_xlabel('Regularization')
axes[0].set_ylabel('Information Coefficient')
lasso_coeffs.T.plot(legend=False, logx=True, title='Lasso Path', ax=axes[1])
axes[1].set_xlabel('Regularization')
axes[1].set_ylabel('Coefficients')
axes[1].axvline(x=lasso_results.groupby('alpha')['ic'].mean().idxmax(), c='darkgrey', ls='--')
fig.tight_layout();
In sum, ridge and lasso produce similar predictive results. Ridge often computes faster, but the lasso additionally performs continuous feature subset selection by gradually shrinking coefficients to exactly zero, thereby eliminating features.
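That sparsity claim can be checked directly from the coefficient paths stored above. The following is a minimal sketch (assuming the ridge_coeffs and lasso_coeffs frames built in the loops above); since the stored values are fold averages, we count near-zero rather than exactly zero coefficients:
# count coefficients shrunk to (near) zero at each regularization level;
# lasso drives coefficients exactly to zero, ridge only shrinks them towards zero
print('Ridge: near-zero coefficients per alpha')
print((ridge_coeffs.abs() < 1e-10).sum())
print('Lasso: near-zero coefficients per alpha')
print((lasso_coeffs.abs() < 1e-10).sum())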