Time Series Cross-Validation

A key assumption for time series cross-validation is the independent and identical (iid) distribution of the samples available for training. For financial data, this is often not the case. Financial data is neither independently nor identically distributed because of serial correlation and time-varying standard deviation, also known as heteroskedasticity.

Time series cross-validation with sklearn

  • The TimeSeriesSplit in the sklearn.model_selection module aims to address the linear order of time-series data.
  • To address time dependency, the sklearn.model_selection.TimeSeriesSplit object implements a walk-forward test with an expanding training set, where subsequent training sets are supersets of past training sets.
In [1]:
from sklearn.model_selection import TimeSeriesSplit
from sklearn.model_selection import KFold
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import seaborn as sns 

import sklearn
from sklearn.linear_model import ElasticNet
from sklearn.multioutput import MultiOutputRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

np.random.seed(1338)
cmap_data = plt.cm.Paired
cmap_cv = plt.cm.coolwarm
n_splits = 5
In [2]:
import pandas as pd
from pandas_datareader import data as web
import warnings
warnings.filterwarnings('ignore')

Data

  • Generate 100 random data points with 3 different imbalanced classes for X and y data.
In [3]:
# random data points
n_points = 100
n_features = 10
X = np.random.randn(n_points, n_features)

# imbalanced  classes 
percentiles_classes = [.2, .3, .5]
y = np.hstack([[ii] * int(n_points * perc)
               for ii, perc in enumerate(percentiles_classes)])
In [4]:
X = pd.DataFrame(X)
X.head()
Out[4]:
0 1 2 3 4 5 6 7 8 9
0 0.303396 0.069053 -1.369947 -1.735424 0.920390 -0.673286 0.311303 1.659909 -0.389927 0.246704
1 1.262869 -0.071432 0.217658 -1.038711 -0.577300 0.225155 -1.114151 -0.212118 -0.116820 -0.232433
2 1.010172 -1.522601 -0.038992 -0.495301 -1.265117 1.309476 -0.048075 0.658444 1.178183 0.924793
3 -0.136163 -0.432714 -1.412182 -1.726017 -0.127448 1.360709 1.804914 -0.831119 -0.752473 -0.867475
4 -1.098774 -0.015860 1.763406 -1.095152 0.657602 0.853470 -1.340776 1.257925 1.407447 -1.928222
In [5]:
y
Out[5]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
In [6]:
df = X.copy()
df['class'] = y

Visualize data

In [7]:
total = len(df)

plt.figure(figsize=(13,5))
plt.subplot(121)
g = sns.countplot(x='class', data=df)
g.set_title("class Count", fontsize=14)
g.set_ylabel('Count', fontsize=14)
for p in g.patches:
    height = p.get_height()
    g.text(p.get_x()+p.get_width()/2.,
            height + 1.5,
            '{:1.2f}%'.format(height/total*100),
            ha="center", fontsize=14, fontweight='bold')
plt.margins(y=0.1)
plt.show()
In [8]:
def plot_cv_indices(cv, X, y, ax, n_splits, lw=10):
    """Create a sample plot for indices of a cross-validation object."""

    # Generate the training/testing visualizations for each CV split
    for ii, (tr, tt) in enumerate(cv.split(X=X, y=y, groups=None)):
        # Fill in indices with the training/test groups
        indices = np.array([np.nan] * len(X))
        indices[tt] = 1
        indices[tr] = 0

        # Visualize the results
        ax.scatter(range(len(indices)), [ii + .5] * len(indices),
                   c=indices, marker='_', lw=lw, cmap=cmap_cv,
                   vmin=-.2, vmax=1.2)
        
    # Plot the data classes and groups at the end
    ax.scatter(range(len(X)), [ii + 1.5] * len(X),
               c=y, marker='_', lw=lw, cmap=cmap_data)

    # Formatting
    yticklabels = list(range(n_splits)) + ['class']
    ax.set(yticks=np.arange(n_splits+2) + .5, yticklabels=yticklabels,
           xlabel='Sample index', ylabel="CV iteration",
           ylim=[n_splits+1.2, -.1], xlim=[0, 100])
    ax.set_title('{}'.format(type(cv).__name__), fontsize=15)
    return ax

Blocked and Time Series Split Cross-Validation

Blocked cross-validation works by adding margins at two positions. The first is between the training and validation folds in order to prevent the model from observing lag values which are used twice, once as an estimator (regressor) and another as a response. The second is between the folds used at each iteration in order to prevent the model from memorizing patterns from one iteration to the next.

  • Sci-kit learn gives the luxury to define new types of splitters as long as you abide by its splitter API and inherit from the base splitter.
In [9]:
class BlockingTimeSeriesSplit():
    def __init__(self, n_splits):
        self.n_splits = n_splits
    
    def get_n_splits(self, X, y, groups):
        return self.n_splits
    
    def split(self, X, y=None, groups=None):
        n_samples = len(X)
        k_fold_size = n_samples // self.n_splits
        indices = np.arange(n_samples)

        margin = 0
        for i in range(self.n_splits):
            start = i * k_fold_size
            stop = start + k_fold_size
            mid = int(0.5 * (stop - start)) + start
            yield indices[start: mid], indices[mid + margin: stop]

Plot TimeSeriesSplit vs BlockingTimeSeriesSplit

In [10]:
cvs = [TimeSeriesSplit, BlockingTimeSeriesSplit]

for cv in cvs:
    this_cv = cv(n_splits=n_splits)
    fig, ax = plt.subplots(figsize=(10, 5))
    plot_cv_indices(this_cv, X, y, ax, n_splits)

    ax.legend([Patch(color=cmap_cv(.8)), Patch(color=cmap_cv(.02))],
            ['Testing set', 'Training set'], loc=(1.02, .8))
    plt.tight_layout()
    fig.subplots_adjust(right=.7)
        
plt.show()

The two split methods are depicted above. The horizontal axis is the training set size while the vertical axis represents the cross-validation iterations. The folds used for training are depicted in blue and the folds used for validation are depicted in orange. The final horizontal bar are the three class labels for the response variable.

Crypto Data Set

I have obtained ETH/USD exchange prices up to the year 2020 from cryptodatadownload

In [11]:
df = pd.read_csv('Gemini_ETHUSD_d.csv', skiprows=1, parse_dates=True, index_col='Date')
df = df.sort_index().drop('Symbol', axis=1)
df.head()
Out[11]:
Open High Low Close Volume ETH Volume USD
Date
2016-05-09 12.00 12.00 9.36 9.98 1317.90 12885.06
2016-05-10 9.98 9.98 9.36 9.68 672.06 6578.20
2016-05-11 9.68 10.47 9.68 10.43 3052.51 30978.11
2016-05-12 10.43 12.00 9.92 10.20 2072.56 22183.39
2016-05-13 10.20 11.59 10.20 10.69 1769.71 18923.55

Here, I have used a lag of 58 days for regressors and a target of 58 days for responses. That is, given the past 58 days closing price/volume features forecast the next 58 days. Then the resulting nan rows are dropped to handle missing values.

In [12]:
STEPS = 9
In [13]:
for i in np.arange(1 ,STEPS):
    col_name = '{}d_Fwd_Close'.format(i)
    df[col_name] = df['Close'].shift(-i)
    
df = df.dropna()

Next, we split the data frame into two one for the regressors and the other for the responses. And then split both into two one for training and the other for testing.

In [14]:
Features = 6

X = df.iloc[:, :Features]
y = df.iloc[:, Features:]

split = int(len(df) * 0.7)

X_train = X[:split]
y_train = y[:split]

X_test = X[split:]
y_test = y[split:]
In [15]:
X.head()
Out[15]:
Open High Low Close Volume ETH Volume USD
Date
2016-05-09 12.00 12.00 9.36 9.98 1317.90 12885.06
2016-05-10 9.98 9.98 9.36 9.68 672.06 6578.20
2016-05-11 9.68 10.47 9.68 10.43 3052.51 30978.11
2016-05-12 10.43 12.00 9.92 10.20 2072.56 22183.39
2016-05-13 10.20 11.59 10.20 10.69 1769.71 18923.55
In [16]:
y.head()
Out[16]:
1d_Fwd_Close 2d_Fwd_Close 3d_Fwd_Close 4d_Fwd_Close 5d_Fwd_Close 6d_Fwd_Close 7d_Fwd_Close 8d_Fwd_Close
Date
2016-05-09 9.68 10.43 10.20 10.69 10.25 10.06 11.37 12.23
2016-05-10 10.43 10.20 10.69 10.25 10.06 11.37 12.23 13.30
2016-05-11 10.20 10.69 10.25 10.06 11.37 12.23 13.30 14.50
2016-05-12 10.69 10.25 10.06 11.37 12.23 13.30 14.50 13.90
2016-05-13 10.25 10.06 11.37 12.23 13.30 14.50 13.90 13.97

Model Design

Let’s define a method that creates an elastic net model from sci-kit learn and since we are going to forecast more than one future time step, we will use a multi-output regressor wrapper that trains a separate model for each target time step. This is a simple strategy for extending regressors that do not natively support multi-target regression. However, this introduces more demand for computation resources.

Elastic Net regression introduces both L1-regularization and L2-regularization to resolve overfitting and are also known as Lasso and Ridge regression respectively. Due to the trade offs of both Lasso and Ridge regression, Elastic Net regression was introduced to mix the two models. As a result, some of the variables coefficients are set to zero as per L1-norm and some others are penalized or shrunken as per the L2-norm.

This model combines the best from both regressors and the result is a more stable, robust, and sparse model. As a consequence, there are more parameters to be tuned.

In [17]:
def build_model(_alpha, _l1_ratio):
    estimator = ElasticNet(
        alpha=_alpha,
        l1_ratio=_l1_ratio,
        fit_intercept=True,
        normalize=False,
        precompute=False,
        max_iter=16,
        copy_X=True,
        tol=0.1,
        warm_start=False,
        positive=False,
        random_state=None,
        selection='random'
    )

    return MultiOutputRegressor(estimator, n_jobs=4)
In [18]:
sklearn.metrics.SCORERS.keys()
Out[18]:
dict_keys(['explained_variance', 'r2', 'max_error', 'neg_median_absolute_error', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'accuracy', 'roc_auc', 'balanced_accuracy', 'average_precision', 'neg_log_loss', 'brier_score_loss', 'adjusted_rand_score', 'homogeneity_score', 'completeness_score', 'v_measure_score', 'mutual_info_score', 'adjusted_mutual_info_score', 'normalized_mutual_info_score', 'fowlkes_mallows_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'jaccard', 'jaccard_macro', 'jaccard_micro', 'jaccard_samples', 'jaccard_weighted'])

Time series splitter

In [19]:
model = build_model(_alpha=1.0, _l1_ratio=0.3)
tscv = TimeSeriesSplit(n_splits=5)
rmse = np.sqrt(-cross_val_score(model, X_train, y_train, cv=tscv, scoring='neg_mean_squared_error'))
R2 = cross_val_score(model, X_train, y_train, cv=tscv, scoring='r2')

print(f"RMSE: {rmse.mean()} (+/- {rmse.std()}")
print(f"\nR2: {R2.mean()} (+/- {R2.std()}")
RMSE: 56.75297352172712 (+/- 39.52438387000396

R2: 0.6965199572225272 (+/- 0.21016671547882793

Blocking time series splitter

In [20]:
btscv = BlockingTimeSeriesSplit(n_splits=5)
rmse = np.sqrt(-cross_val_score(model, X_train, y_train, cv=btscv, scoring='neg_mean_squared_error'))
R2 = cross_val_score(model, X_train, y_train, cv=btscv, scoring='r2')

print(f"RMSE: {rmse.mean()} (+/- {rmse.std()}")
print(f"\nR2: {R2.mean()} (+/- {R2.std()}")
RMSE: 40.759129519772294 (+/- 33.4684769189527

R2: 0.5749318796458167 (+/- 0.3367378026776988

Notice how the loss is different among the different splitters. In order to interpret the results correctly, let’s put it to test by using grid search cross-validation to find the optimal values for both regularization parameter alpha and the ratio that controls how much norm contributes to the regularization.

GridSearchCV

GridSearchCV works by exhaustively searching all the possible combinations of the model’s parameters, but it makes use of a loss function to guide the selection of the values to be tried at each iteration. That is solving a minimization optimization problem. However, in SciKit Learn it explicitly tries all the possible combinations which makes it very computationally expensive.

When cross-validation is used in the inner loop of the grid search, it is called grid search cross-validation. Hence, the optimization objective becomes minimizing the loss obtained on each of the k folds.

In [21]:
def plot_grid_search(cv_results, grid_param_1, grid_param_2, name_param_1, name_param_2, best_params):
    
    # Get Test Scores Mean and std for each grid search
    scores_mean = cv_results['mean_test_score']
    scores_mean = np.array(scores_mean).reshape(len(grid_param_2),len(grid_param_1))

    scores_sd = cv_results['std_test_score']
    scores_sd = np.array(scores_sd).reshape(len(grid_param_2),len(grid_param_1))

    # Plot Grid search scores
    _, ax = plt.subplots(1,1)

    # Param1 is the X-axis, Param 2 is represented as a different curve (color line)
    for idx, val in enumerate(grid_param_2):
        ax.plot(grid_param_1, scores_mean[idx,:], '-o', label= name_param_2 + ': ' + str(val))

    ax.set_title(f"Grid Search Best Params: {best_params}", fontsize=12, fontweight='medium')
    ax.set_xlabel(name_param_1, fontsize=12)
    ax.set_ylabel('CV Average Score', fontsize=12)
    ax.legend(loc="best", fontsize=15)
    ax.grid('on')
    ax.legend(bbox_to_anchor=(1.02, 1.02))

Time series splitter

In [22]:
model.get_params().keys()
Out[22]:
dict_keys(['estimator__alpha', 'estimator__copy_X', 'estimator__fit_intercept', 'estimator__l1_ratio', 'estimator__max_iter', 'estimator__normalize', 'estimator__positive', 'estimator__precompute', 'estimator__random_state', 'estimator__selection', 'estimator__tol', 'estimator__warm_start', 'estimator', 'n_jobs'])
In [23]:
params = {
    'estimator__alpha':(0.1, 0.3, 0.5, 0.7, 0.9),
    'estimator__l1_ratio':(0.1, 0.3, 0.5, 0.7, 0.9)
}
In [24]:
scores = []
for i in range(30):
    model = build_model(_alpha=1.0, _l1_ratio=0.3)

    finder = GridSearchCV(
        estimator=model,
        param_grid=params,
        scoring='r2',
        n_jobs=4,
        iid=False,
        refit=True,
        cv=tscv,  # change this to the splitter subject to test
        verbose=1,
        pre_dispatch=8,
        error_score=-999,
        return_train_score=True
        )

    finder.fit(X_train, y_train)

    best_params = finder.best_params_
    best_score = round(finder.best_score_,4)
    scores.append(best_score)
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.7s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.4s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.3s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.1s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.3s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.3s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.3s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.1s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished

Blocking time series splitter

In [25]:
scores0 = []
for i in range(30):
    model = build_model(_alpha=1.0, _l1_ratio=0.3)
    
    finder0 = GridSearchCV(
        estimator=model,
        param_grid=params,
        scoring='r2',
        n_jobs=4,
        iid=False,
        refit=True,
        cv=btscv,  # change this to the splitter subject to test
        verbose=1,
        pre_dispatch=8,
        error_score=-999,
        return_train_score=True
        )

    finder0.fit(X_train, y_train)

    best_params0 = finder0.best_params_
    best_score0 = round(finder0.best_score_,4)
    scores0.append(best_score)
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.3s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.3s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.1s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.1s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.3s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.1s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.5s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed:   10.2s finished
In [26]:
finder0.cv_results_.keys()
Out[26]:
dict_keys(['mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time', 'param_estimator__alpha', 'param_estimator__l1_ratio', 'params', 'split0_test_score', 'split1_test_score', 'split2_test_score', 'split3_test_score', 'split4_test_score', 'mean_test_score', 'std_test_score', 'rank_test_score', 'split0_train_score', 'split1_train_score', 'split2_train_score', 'split3_train_score', 'split4_train_score', 'mean_train_score', 'std_train_score'])

Results

Grid-search cross-validation was run 30 times in order to objectively measure the consistency of the results obtained using each splitter. This way we can evaluate the effectiveness and robustness of the cross-validation method on the time series. As for the k-fold cross-validation, the parameters suggested were close to uniform. Meaning, it did not really help in discriminating the optimal parameters since all values were either equally good or bad.

In [27]:
scores1 = pd.DataFrame(scores)
bs = round(float(scores1.mean()),4)
print(f'\nTime series splitter best score: {bs}')

plot_grid_search(finder.cv_results_, params['estimator__l1_ratio'], params['estimator__alpha'], 
                 'l1_ratio', 'alpha', best_params)
Time series splitter best score: 0.714
In [28]:
scores01 = pd.DataFrame(scores0)
bs0 = round(float(scores01.mean()),4)
print(f'\nBlocking time series splitter best score: {bs0}')

plot_grid_search(finder0.cv_results_, params['estimator__l1_ratio'], params['estimator__alpha'], 
                 'l1_ratio', 'alpha', best_params0)
Blocking time series splitter best score: 0.7245

In both the cases of time series split cross-validation and blocked cross-validation, we have obtained a clear indication of the optimal values for both parameters. In case of blocked cross-validation, the results were even more discriminative as there is a clearer and more consistent drop off as l1 ratio values increase with respect to alpha values.

In [29]:
preds = pd.DataFrame(finder.predict(X_test), columns=df.iloc[:, Features:].columns)
preds.head()
Out[29]:
1d_Fwd_Close 2d_Fwd_Close 3d_Fwd_Close 4d_Fwd_Close 5d_Fwd_Close 6d_Fwd_Close 7d_Fwd_Close 8d_Fwd_Close
0 121.124623 121.133478 120.734223 119.695900 129.216379 124.016077 124.117680 125.920169
1 123.084550 122.862757 122.151594 120.750844 126.333414 125.406969 125.501370 124.268494
2 120.085939 123.001682 121.977905 116.692940 122.701168 125.025789 125.193181 121.832818
3 120.354948 119.790353 118.580918 115.806696 121.439701 121.306727 121.573581 119.934876
4 121.711769 123.922155 122.820337 117.599744 124.967223 125.522764 125.706135 123.185545
In [30]:
fig, ax = plt.subplots(figsize=(12,5))
ax.scatter(preds.index, y_test['8d_Fwd_Close'], color='b', alpha=0.5, label='Actual', s=50)
ax.scatter(preds.index, preds['8d_Fwd_Close'], color='r', alpha=0.5, label='Perdicted', s=50)
ax.set_xticklabels(df[split:].index.strftime('%Y-%m-%d'))
ax.set_title('8d_Fwd_Close')
ax.legend()
plt.show()

After obtaining the optimal values for the models parameters, we can train the model and evaluate it on the testing set. The results, as depicted in the plot above, indicate smooth capture of the trend and minimum error rate.

In [31]:
model = build_model(_alpha=1.0, _l1_ratio=0.3)
rmse = np.sqrt(-cross_val_score(model, X_test, y_test, cv=tscv, scoring='neg_mean_squared_error'))
R2 = cross_val_score(model, X_test, y_test, cv=tscv, scoring='r2')

print(f"RMSE: {rmse.mean()} (+/- {rmse.std()}")
print(f"\nR2: {R2.mean()} (+/- {R2.std()}")
RMSE: 23.23101772055727 (+/- 7.725460399491488

R2: 0.37692919884847687 (+/- 0.4119088912942127

Trianing Dates for BlockingTimeSeriesSplit

In [32]:
btss = BlockingTimeSeriesSplit(n_splits=12)

for tr_idx, val_idx in btss.split(X, y):

    X_tr, X_vl = X.iloc[tr_idx], X.iloc[val_idx]
        
    print('Train:')
    print(X_tr.info())
    print()
    print('Test:')
    print(X_vl.info())
    print()
Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2016-05-09 to 2016-07-05
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2016-07-06 to 2016-09-01
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2016-09-02 to 2016-10-29
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2016-10-30 to 2016-12-26
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2016-12-27 to 2017-02-22
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-02-23 to 2017-04-21
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-04-22 to 2017-06-18
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-06-19 to 2017-08-15
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-08-16 to 2017-10-12
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-10-13 to 2017-12-09
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-12-10 to 2018-02-05
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-02-06 to 2018-04-04
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-04-05 to 2018-06-01
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-06-02 to 2018-07-29
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-07-30 to 2018-09-25
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-09-26 to 2018-11-22
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-11-23 to 2019-01-19
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-01-20 to 2019-03-18
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-03-19 to 2019-05-15
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-05-16 to 2019-07-12
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-07-13 to 2019-09-08
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-09-09 to 2019-11-05
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-11-06 to 2020-01-02
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2020-01-03 to 2020-02-29
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Trianing Dates for TimeSeriesSplit

In [33]:
tss = TimeSeriesSplit(n_splits=23)

for tr_idx, val_idx in tss.split(X, y):

    X_tr, X_vl = X.iloc[tr_idx], X.iloc[val_idx]
        
    print('Train:')
    print(X_tr.info())
    print()
    print('Test:')
    print(X_vl.info())
    print()
Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 68 entries, 2016-05-09 to 2016-07-15
Data columns (total 6 columns):
Open          68 non-null float64
High          68 non-null float64
Low           68 non-null float64
Close         68 non-null float64
Volume ETH    68 non-null float64
Volume USD    68 non-null float64
dtypes: float64(6)
memory usage: 3.7 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2016-07-16 to 2016-09-11
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 126 entries, 2016-05-09 to 2016-09-11
Data columns (total 6 columns):
Open          126 non-null float64
High          126 non-null float64
Low           126 non-null float64
Close         126 non-null float64
Volume ETH    126 non-null float64
Volume USD    126 non-null float64
dtypes: float64(6)
memory usage: 6.9 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2016-09-12 to 2016-11-08
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 184 entries, 2016-05-09 to 2016-11-08
Data columns (total 6 columns):
Open          184 non-null float64
High          184 non-null float64
Low           184 non-null float64
Close         184 non-null float64
Volume ETH    184 non-null float64
Volume USD    184 non-null float64
dtypes: float64(6)
memory usage: 10.1 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2016-11-09 to 2017-01-05
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 242 entries, 2016-05-09 to 2017-01-05
Data columns (total 6 columns):
Open          242 non-null float64
High          242 non-null float64
Low           242 non-null float64
Close         242 non-null float64
Volume ETH    242 non-null float64
Volume USD    242 non-null float64
dtypes: float64(6)
memory usage: 13.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-01-06 to 2017-03-04
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 300 entries, 2016-05-09 to 2017-03-04
Data columns (total 6 columns):
Open          300 non-null float64
High          300 non-null float64
Low           300 non-null float64
Close         300 non-null float64
Volume ETH    300 non-null float64
Volume USD    300 non-null float64
dtypes: float64(6)
memory usage: 16.4 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-03-05 to 2017-05-01
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 358 entries, 2016-05-09 to 2017-05-01
Data columns (total 6 columns):
Open          358 non-null float64
High          358 non-null float64
Low           358 non-null float64
Close         358 non-null float64
Volume ETH    358 non-null float64
Volume USD    358 non-null float64
dtypes: float64(6)
memory usage: 19.6 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-05-02 to 2017-06-28
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 416 entries, 2016-05-09 to 2017-06-28
Data columns (total 6 columns):
Open          416 non-null float64
High          416 non-null float64
Low           416 non-null float64
Close         416 non-null float64
Volume ETH    416 non-null float64
Volume USD    416 non-null float64
dtypes: float64(6)
memory usage: 22.8 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-06-29 to 2017-08-25
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 474 entries, 2016-05-09 to 2017-08-25
Data columns (total 6 columns):
Open          474 non-null float64
High          474 non-null float64
Low           474 non-null float64
Close         474 non-null float64
Volume ETH    474 non-null float64
Volume USD    474 non-null float64
dtypes: float64(6)
memory usage: 25.9 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-08-26 to 2017-10-22
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 532 entries, 2016-05-09 to 2017-10-22
Data columns (total 6 columns):
Open          532 non-null float64
High          532 non-null float64
Low           532 non-null float64
Close         532 non-null float64
Volume ETH    532 non-null float64
Volume USD    532 non-null float64
dtypes: float64(6)
memory usage: 29.1 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-10-23 to 2017-12-19
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 590 entries, 2016-05-09 to 2017-12-19
Data columns (total 6 columns):
Open          590 non-null float64
High          590 non-null float64
Low           590 non-null float64
Close         590 non-null float64
Volume ETH    590 non-null float64
Volume USD    590 non-null float64
dtypes: float64(6)
memory usage: 32.3 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2017-12-20 to 2018-02-15
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 648 entries, 2016-05-09 to 2018-02-15
Data columns (total 6 columns):
Open          648 non-null float64
High          648 non-null float64
Low           648 non-null float64
Close         648 non-null float64
Volume ETH    648 non-null float64
Volume USD    648 non-null float64
dtypes: float64(6)
memory usage: 35.4 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-02-16 to 2018-04-14
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 706 entries, 2016-05-09 to 2018-04-14
Data columns (total 6 columns):
Open          706 non-null float64
High          706 non-null float64
Low           706 non-null float64
Close         706 non-null float64
Volume ETH    706 non-null float64
Volume USD    706 non-null float64
dtypes: float64(6)
memory usage: 38.6 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-04-15 to 2018-06-11
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 764 entries, 2016-05-09 to 2018-06-11
Data columns (total 6 columns):
Open          764 non-null float64
High          764 non-null float64
Low           764 non-null float64
Close         764 non-null float64
Volume ETH    764 non-null float64
Volume USD    764 non-null float64
dtypes: float64(6)
memory usage: 41.8 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-06-12 to 2018-08-08
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 822 entries, 2016-05-09 to 2018-08-08
Data columns (total 6 columns):
Open          822 non-null float64
High          822 non-null float64
Low           822 non-null float64
Close         822 non-null float64
Volume ETH    822 non-null float64
Volume USD    822 non-null float64
dtypes: float64(6)
memory usage: 45.0 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-08-09 to 2018-10-05
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 880 entries, 2016-05-09 to 2018-10-05
Data columns (total 6 columns):
Open          880 non-null float64
High          880 non-null float64
Low           880 non-null float64
Close         880 non-null float64
Volume ETH    880 non-null float64
Volume USD    880 non-null float64
dtypes: float64(6)
memory usage: 48.1 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-10-06 to 2018-12-02
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 938 entries, 2016-05-09 to 2018-12-02
Data columns (total 6 columns):
Open          938 non-null float64
High          938 non-null float64
Low           938 non-null float64
Close         938 non-null float64
Volume ETH    938 non-null float64
Volume USD    938 non-null float64
dtypes: float64(6)
memory usage: 51.3 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2018-12-03 to 2019-01-29
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 996 entries, 2016-05-09 to 2019-01-29
Data columns (total 6 columns):
Open          996 non-null float64
High          996 non-null float64
Low           996 non-null float64
Close         996 non-null float64
Volume ETH    996 non-null float64
Volume USD    996 non-null float64
dtypes: float64(6)
memory usage: 54.5 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-01-30 to 2019-03-28
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1054 entries, 2016-05-09 to 2019-03-28
Data columns (total 6 columns):
Open          1054 non-null float64
High          1054 non-null float64
Low           1054 non-null float64
Close         1054 non-null float64
Volume ETH    1054 non-null float64
Volume USD    1054 non-null float64
dtypes: float64(6)
memory usage: 57.6 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-03-29 to 2019-05-25
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1112 entries, 2016-05-09 to 2019-05-25
Data columns (total 6 columns):
Open          1112 non-null float64
High          1112 non-null float64
Low           1112 non-null float64
Close         1112 non-null float64
Volume ETH    1112 non-null float64
Volume USD    1112 non-null float64
dtypes: float64(6)
memory usage: 60.8 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-05-26 to 2019-07-22
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1170 entries, 2016-05-09 to 2019-07-22
Data columns (total 6 columns):
Open          1170 non-null float64
High          1170 non-null float64
Low           1170 non-null float64
Close         1170 non-null float64
Volume ETH    1170 non-null float64
Volume USD    1170 non-null float64
dtypes: float64(6)
memory usage: 64.0 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-07-23 to 2019-09-18
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1228 entries, 2016-05-09 to 2019-09-18
Data columns (total 6 columns):
Open          1228 non-null float64
High          1228 non-null float64
Low           1228 non-null float64
Close         1228 non-null float64
Volume ETH    1228 non-null float64
Volume USD    1228 non-null float64
dtypes: float64(6)
memory usage: 67.2 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-09-19 to 2019-11-15
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1286 entries, 2016-05-09 to 2019-11-15
Data columns (total 6 columns):
Open          1286 non-null float64
High          1286 non-null float64
Low           1286 non-null float64
Close         1286 non-null float64
Volume ETH    1286 non-null float64
Volume USD    1286 non-null float64
dtypes: float64(6)
memory usage: 70.3 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2019-11-16 to 2020-01-12
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

Train:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1344 entries, 2016-05-09 to 2020-01-12
Data columns (total 6 columns):
Open          1344 non-null float64
High          1344 non-null float64
Low           1344 non-null float64
Close         1344 non-null float64
Volume ETH    1344 non-null float64
Volume USD    1344 non-null float64
dtypes: float64(6)
memory usage: 73.5 KB
None

Test:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58 entries, 2020-01-13 to 2020-03-10
Data columns (total 6 columns):
Open          58 non-null float64
High          58 non-null float64
Low           58 non-null float64
Close         58 non-null float64
Volume ETH    58 non-null float64
Volume USD    58 non-null float64
dtypes: float64(6)
memory usage: 3.2 KB
None

In [ ]: