Data distributions and transformations

  • Why scale your data?
    • Many machine learning algorithms perform better or converge faster when features are on a relatively similar scale and/or close to normally distributed. Scaling and standardizing can help put features into a more digestible form for an algorithm.

Visualizing and describing

Get a snapshot of the composition of the data

In [1]:
from sklearn import datasets
import pandas as pd
import numpy as np

boston = datasets.load_boston()
X, y = boston.data, boston.target
df = pd.DataFrame(data=boston.data, columns=boston.feature_names)

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 13 columns):
CRIM       506 non-null float64
ZN         506 non-null float64
INDUS      506 non-null float64
CHAS       506 non-null float64
NOX        506 non-null float64
RM         506 non-null float64
AGE        506 non-null float64
DIS        506 non-null float64
RAD        506 non-null float64
TAX        506 non-null float64
PTRATIO    506 non-null float64
B          506 non-null float64
LSTAT      506 non-null float64
dtypes: float64(13)
memory usage: 51.5 KB
In [2]:
# display first 5 rows of df 
df.head()
Out[2]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33

Pandas describe function to produce some quick descriptive statistics.

In [3]:
# percentile list 
perc =[0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95] 

df.describe(percentiles = perc, include = [np.number])
Out[3]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
count 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000
mean 3.613524 11.363636 11.136779 0.069170 0.554695 6.284634 68.574901 3.795043 9.549407 408.237154 18.455534 356.674032 12.653063
std 8.601545 23.322453 6.860353 0.253994 0.115878 0.702617 28.148861 2.105710 8.707259 168.537116 2.164946 91.294864 7.141062
min 0.006320 0.000000 0.460000 0.000000 0.385000 3.561000 2.900000 1.129600 1.000000 187.000000 12.600000 0.320000 1.730000
5% 0.027910 0.000000 2.180000 0.000000 0.409250 5.314000 17.725000 1.461975 2.000000 222.000000 14.700000 84.590000 3.707500
10% 0.038195 0.000000 2.910000 0.000000 0.427000 5.593500 26.950000 1.628300 3.000000 233.000000 14.750000 290.270000 4.680000
25% 0.082045 0.000000 5.190000 0.000000 0.449000 5.885500 45.025000 2.100175 4.000000 279.000000 17.400000 375.377500 6.950000
50% 0.256510 0.000000 9.690000 0.000000 0.538000 6.208500 77.500000 3.207450 5.000000 330.000000 19.050000 391.440000 11.360000
75% 3.677083 12.500000 18.100000 0.000000 0.624000 6.623500 94.075000 5.188425 24.000000 666.000000 20.200000 396.225000 16.955000
90% 10.753000 42.500000 19.580000 0.000000 0.713000 7.151500 98.800000 6.816600 24.000000 666.000000 20.900000 396.900000 23.035000
95% 15.789150 80.000000 21.890000 1.000000 0.740000 7.587500 100.000000 7.827800 24.000000 666.000000 21.000000 396.900000 26.807500
max 88.976200 100.000000 27.740000 1.000000 0.871000 8.780000 100.000000 12.126500 24.000000 711.000000 22.000000 396.900000 37.970000

Skew: the degree of distortion (asymmetry) relative to a normal distribution.

  • For example, if the response variable in a house-price regression is skewed, the model will be trained on a much larger number of moderately priced homes and will be less likely to successfully predict the price of the most expensive houses. The concept is the same as training a model on imbalanced categorical classes. If the values of a certain independent variable (feature) are skewed, then depending on the model, the skewness may violate model assumptions (e.g. logistic regression) or impair the interpretation of feature importance.
In [4]:
# find skew using pandas 
df.skew().sort_values(ascending=False)
Out[4]:
CRIM       5.223149
CHAS       3.405904
ZN         2.225666
DIS        1.011781
RAD        1.004815
LSTAT      0.906460
NOX        0.729308
TAX        0.669956
RM         0.403612
INDUS      0.295022
AGE       -0.598963
PTRATIO   -0.802325
B         -2.890374
dtype: float64

Shapiro-Wilk test:

  • We can objectively assess whether a variable departs from normality using the Shapiro-Wilk test, which tests the null hypothesis that a sample x1, ..., xn came from a normally distributed population. A p-value below 0.05 therefore indicates a significant departure from normality (a quick check with scipy is sketched below).
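
As a quick check outside of Yellowbrick, scipy exposes the test directly. A minimal sketch, assuming the df loaded above; the two columns are chosen just for illustration:

from scipy.stats import shapiro

# W close to 1 suggests normality; a small p-value rejects the null of normality
for col in ['RM', 'CRIM']:
    W, p = shapiro(df[col])
    print(f'{col}: W = {W:.4f}, p-value = {p:.2e}')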

Rank 1D from Yellowbrick

  • A one-dimensional ranking of features utilizes a ranking algorithm that takes into account only a single feature at a time (e.g. histogram analysis). By default we utilize the Shapiro-Wilk algorithm to assess the normality of the distribution of instances with respect to the feature. A barplot is then drawn showing the relative ranks of each feature.
In [5]:
from yellowbrick.features import Rank1D
import matplotlib.pyplot as plt
import seaborn as sns 
sns.set(style="darkgrid", color_codes=True)

rnk1 = Rank1D(algorithm='shapiro')
rnk1.fit(X, y)           # Fit the data to the visualizer
rnk1.transform(X)
plt.close()

rnk1_df = pd.DataFrame(rnk1.ranks_, index=df.columns, columns=['Rank'])
rnk1_df = rnk1_df.sort_values('Rank', ascending=False)

f, axes = plt.subplots(1,2 ,figsize=(12,5))
sns.barplot(x='Rank', y=rnk1_df.index, data=rnk1_df, ax=axes[0])
axes[0].set_title('Features')
axes[0].set_xlabel('Shapiro Rank')
axes[0].grid(False)

sns.barplot(x=df.skew().abs().sort_values(), y=df.skew().abs().sort_values().index, ax=axes[1])
axes[1].set_title('Features')
axes[1].set_xlabel('Absolute Skew')
axes[1].grid(False)

plt.tight_layout()
plt.show();

Function to produce more descriptive statistics.

In [6]:
from scipy import stats

def resumetable(df):
    print(f"Dataset Shape: {df.shape}")
    summary = pd.DataFrame(df.dtypes,columns=['dtypes'])
    summary = summary.reset_index()
    summary['Name'] = summary['index']
    summary = summary[['Name','dtypes']]
    summary['Missing'] = df.isnull().sum().values    
    summary['Uniques'] = df.nunique().values
    summary['First Value'] = df.loc[0].values
    summary['Second Value'] = df.loc[1].values
    summary['Third Value'] = df.loc[2].values

    # Shannon entropy (in bits) of each feature's value distribution
    for name in summary['Name'].value_counts().index:
        summary.loc[summary['Name'] == name, 'Entropy'] = round(stats.entropy(df[name].value_counts(normalize=True), base=2),2) 

    return summary
In [7]:
resumetable(df)
Dataset Shape: (506, 13)
Out[7]:
Name dtypes Missing Uniques First Value Second Value Third Value Entropy
0 CRIM float64 0 504 0.00632 0.02731 0.02729 8.98
1 ZN float64 0 26 18.00000 0.00000 0.00000 1.95
2 INDUS float64 0 76 2.31000 7.07000 7.07000 5.03
3 CHAS float64 0 2 0.00000 0.00000 0.00000 0.36
4 NOX float64 0 81 0.53800 0.46900 0.46900 6.00
5 RM float64 0 446 6.57500 6.42100 7.18500 8.74
6 AGE float64 0 356 65.20000 78.90000 61.10000 8.05
7 DIS float64 0 412 4.09000 4.96710 4.96710 8.57
8 RAD float64 0 9 1.00000 2.00000 2.00000 2.74
9 TAX float64 0 66 296.00000 242.00000 242.00000 4.83
10 PTRATIO float64 0 46 15.30000 17.80000 17.80000 4.43
11 B float64 0 357 396.90000 396.90000 392.83000 7.21
12 LSTAT float64 0 455 4.98000 9.14000 4.03000 8.77

View individual feature distributions

In [8]:
df.hist(figsize=(11,11), grid=False);

Empirical Cumulative Distribution

In [9]:
# visualising ECDF
from mlxtend.plotting import ecdf

fig = plt.figure(figsize=(13, 40))

for i, feature in enumerate(df.columns, 1):
    plt.subplot(len(df.columns), 3, i)
    ecdf(df[feature])
    plt.title(f'{feature}', fontsize=12, fontweight='medium')
    plt.grid(False)
    plt.tick_params(axis='both', labelsize=8)

plt.tight_layout()       
plt.show()
In [10]:
# visualising scaled ECDF and KDE
from sklearn.preprocessing import StandardScaler

num_lines = len(df.columns)
colors = [plt.cm.jet(i) for i in np.linspace(0, 1, num_lines)]

from pylab import rcParams 
from cycler import cycler
rcParams['axes.prop_cycle'] = cycler('color', colors)
rcParams['axes.grid'] = False

z0 = df.values
z1 = StandardScaler().fit_transform(z0)
df_scld = pd.DataFrame(z1, columns=df.columns)

f, axes = plt.subplots(1,2 ,figsize=(16,5))

for i in df_scld.columns:
    ecdf(df_scld[i], ax=axes[0], ecdf_marker='.')
    sns.kdeplot(df_scld[i], ax=axes[1])
    
axes[0].set_title('Standardized Features ECDF')
axes[1].set_title('Standardized Features KDE')
axes[0].legend(list(df_scld.columns))
axes[1].legend(list(df_scld.columns))
plt.show()

View feature boxplot

In [11]:
f, ax = plt.subplots(figsize=(8, 5))
ax.set_xscale("log")
sns.boxplot(data=df , orient="h", palette='Set1', ax=ax)
plt.xlabel('Log')
plt.show();

Standardization, or mean removal and variance scaling:

  • Standardization of datasets is a common requirement for many machine learning estimators. Models might behave badly if the individual features do not more or less look like standard normally distributed data: Gaussian with zero mean and standard deviation of 1.
In [12]:
from sklearn import preprocessing

X_train = np.array([[ 1., -1.,  2.],
                     [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])

scaler = preprocessing.StandardScaler().fit(X_train)
print(scaler.transform(X_train))
[[ 0.         -1.22474487  1.33630621]
 [ 1.22474487  0.         -0.26726124]
 [-1.22474487  1.22474487 -1.06904497]]

Scaling features to a range:

  • An alternative standardization is scaling features to lie between a given minimum and maximum value, often between zero and one, or so that the maximum absolute value of each feature is scaled to unit size. This can be achieved using MinMaxScaler or MaxAbsScaler, respectively.

Transform features by scaling each feature to a given range.

In [13]:
from sklearn.preprocessing import MinMaxScaler

data = [[-1, 2], 
        [-0.5, 6], 
        [0, 10], 
        [1, 18]]

scaler = MinMaxScaler()
scaler.fit(data)
print(scaler.transform(data))
[[0.   0.  ]
 [0.25 0.25]
 [0.5  0.5 ]
 [1.   1.  ]]

Scale each feature by its maximum absolute value

In [14]:
from sklearn.preprocessing import MaxAbsScaler

X = [[ 1., -1.,  2.],
     [ 2.,  0.,  0.],
     [ 0.,  1., -1.]]

transformer = MaxAbsScaler().fit(X)
print(transformer.transform(X))
[[ 0.5 -1.   1. ]
 [ 1.   0.   0. ]
 [ 0.   1.  -0.5]]

Scaling data with outliers

  • If your data contains many outliers, scaling using the mean and variance of the data is likely to not work very well. In these cases, you can use robust_scale and RobustScaler as drop-in replacements instead. They use more robust estimates for the center and range of your data.

  • RobustScaler removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th quantile) and the 3rd quartile (75th quantile). Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Median and interquartile range are then stored to be used on later data using the transform method.

In [15]:
from sklearn.preprocessing import RobustScaler

X = [[ 1., -2.,  2.],
     [ -2.,  1.,  3.],
     [ 4.,  1., -2.]]

transformer = RobustScaler(with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0)).fit(X)
print(transformer.transform(X))
[[ 0.  -2.   0. ]
 [-1.   0.   0.4]
 [ 1.   0.  -1.6]]
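
The fitted scaler keeps the per-feature median and IQR in its center_ and scale_ attributes, so later data is scaled with the statistics learned from X rather than its own. A minimal sketch continuing from the transformer above; X_new is just a hypothetical new sample:

# statistics learned from X and stored on the fitted transformer
print(transformer.center_)   # per-feature medians
print(transformer.scale_)    # per-feature interquartile ranges

# a new sample is centred and scaled with those stored statistics
X_new = [[0., 0., 0.]]
print(transformer.transform(X_new))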

Function to identify outliers (values beyond 3 standard deviations from the mean)

In [16]:
def CalcOutliers(df_num, limit = 3): 

    # calculating mean and std of the array
    data_mean, data_std = np.mean(df_num), np.std(df_num)

    # setting the cut line for both higher and lower values
    # You can change this value
    cut = data_std * limit

    # calculating the higher and lower cut values
    lower, upper = data_mean - cut, data_mean + cut

    # creating arrays of lower, higher and total outlier values 
    outliers_lower = [x for x in df_num if x < lower]
    outliers_higher = [x for x in df_num if x > upper]
    outliers_total = [x for x in df_num if x < lower or x > upper]

    # array without outlier values
    outliers_removed = [x for x in df_num if x > lower and x < upper]

    print('Total outlier observations: %d' % len(outliers_total)) # total number of outliers on both sides
    print("Total percentage of Outliers: ", round((len(outliers_total) / len(outliers_removed) )*100, 4)) # outliers as a percentage of the non-outlier points
    print('Identified lowest outliers: %d' % len(outliers_lower)) # number of outliers below the lower cut
    print('Identified upper outliers: %d' % len(outliers_higher)) # number of outliers above the upper cut
    
    if len(outliers_higher) > 0:
        drp_upper = np.amin(np.array(outliers_higher), axis=0)
        print(f'Drop upper outliers >= {drp_upper}')
        
    if len(outliers_lower) > 0:        
        drp_lower = np.amax(np.array(outliers_lower), axis=0)
        print(f'Drop lower outliers <= {drp_lower}')

    return 
In [17]:
for i in df.columns:
    print(f'\nCalculating outliers for {i}...')
    CalcOutliers(df[i], limit = 3)
Calculating outliers for CRIM...
Total outlier observations: 8
Total percentage of Outliers:  1.6064
Identified lowest outliers: 0
Identified upper outliers: 8
Drop upper outliers >= 37.6619

Calculating outliers for ZN...
Total outlier observations: 14
Total percentage of Outliers:  2.8455
Identified lowest outliers: 0
Identified upper outliers: 14
Drop upper outliers >= 82.5

Calculating outliers for INDUS...
Total outlier observations: 0
Total percentage of Outliers:  0.0
Identified lowest outliers: 0
Identified upper outliers: 0

Calculating outliers for CHAS...
Total outlier observations: 35
Total percentage of Outliers:  7.431
Identified lowest outliers: 0
Identified upper outliers: 35
Drop upper outliers >= 1.0

Calculating outliers for NOX...
Total outlier observations: 0
Total percentage of Outliers:  0.0
Identified lowest outliers: 0
Identified upper outliers: 0

Calculating outliers for RM...
Total outlier observations: 8
Total percentage of Outliers:  1.6064
Identified lowest outliers: 4
Identified upper outliers: 4
Drop upper outliers >= 8.398
Drop lower outliers <= 4.138

Calculating outliers for AGE...
Total outlier observations: 0
Total percentage of Outliers:  0.0
Identified lowest outliers: 0
Identified upper outliers: 0

Calculating outliers for DIS...
Total outlier observations: 5
Total percentage of Outliers:  0.998
Identified lowest outliers: 0
Identified upper outliers: 5
Drop upper outliers >= 10.5857

Calculating outliers for RAD...
Total outlier observations: 0
Total percentage of Outliers:  0.0
Identified lowest outliers: 0
Identified upper outliers: 0

Calculating outliers for TAX...
Total outlier observations: 0
Total percentage of Outliers:  0.0
Identified lowest outliers: 0
Identified upper outliers: 0

Calculating outliers for PTRATIO...
Total outlier observations: 0
Total percentage of Outliers:  0.0
Identified lowest outliers: 0
Identified upper outliers: 0

Calculating outliers for B...
Total outlier observations: 25
Total percentage of Outliers:  5.1975
Identified lowest outliers: 25
Identified upper outliers: 0
Drop lower outliers <= 81.33

Calculating outliers for LSTAT...
Total outlier observations: 5
Total percentage of Outliers:  0.998
Identified lowest outliers: 0
Identified upper outliers: 5
Drop upper outliers >= 34.37

Non-linear transformations

  • Two types of transformations are available: quantile transforms and power transforms. Both quantile and power transforms are based on monotonic transformations of the features and thus preserve the rank of the values along each feature.
  • Quantile transforms map the features to follow a uniform or a normal distribution. For a given feature, this transformation tends to spread out the most frequent values and reduces the impact of (marginal) outliers; it is therefore a robust preprocessing scheme (a minimal sketch follows this list).
  • Power transforms are a family of parametric transformations that aim to map data from any distribution to as close to a Gaussian distribution as possible.
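
A minimal sketch of the quantile transform on one skewed feature of the df loaded above (n_quantiles is set to the number of rows to avoid a warning); a fuller comparison with power transforms appears further below:

from sklearn.preprocessing import QuantileTransformer

qt_uniform = QuantileTransformer(n_quantiles=len(df), output_distribution='uniform', random_state=0)
crim_uniform = qt_uniform.fit_transform(df[['CRIM']])

# ranks are preserved, but the transformed values spread evenly over [0, 1]
print(pd.Series(crim_uniform.ravel()).describe())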

Normalization:

  • Normalize samples individually to unit norm.

  • Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of the other samples so that its norm (l1 or l2) equals one.

  • This transformer works with both dense numpy arrays and scipy.sparse matrices (use the CSR format if you want to avoid the burden of a copy / conversion). An l1 variant is sketched after the l2 example below.

In [18]:
from sklearn.preprocessing import Normalizer

X = [[4, 1, 2, 2],
     [1, 3, 9, 3],
     [5, 7, 5, 1]]

transformer = Normalizer(norm='l2').fit(X)  
print(transformer.transform(X))
[[0.8 0.2 0.4 0.4]
 [0.1 0.3 0.9 0.3]
 [0.5 0.7 0.5 0.1]]
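
For comparison, the same rows under the l1 norm, where each row is divided by the sum of its absolute values; a minimal sketch reusing X from above:

transformer_l1 = Normalizer(norm='l1').fit(X)
print(transformer_l1.transform(X))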

Box-Cox Transformations

  • Real-world data often contains heavily skewed features. Transformation techniques are useful to stabilize variance, make the data more normally distributed and improve the validity of measures of association.
  • The difficulty with the Box-Cox transformation is estimating lambda. This value depends on the data at hand, so when cross validating on out-of-sample data make sure to estimate lambda from the training dataset only (a minimal scipy sketch follows this list).
  • Other common transformations include the log and square-root transformations.
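
A minimal scipy sketch of that last point about lambda, assuming a simple (non-random) split of one strictly positive feature from the df loaded above:

from scipy import stats

# hypothetical train/test split of a strictly positive feature
train, test = df['LSTAT'].iloc[:400], df['LSTAT'].iloc[400:]

# estimate lambda on the training portion only
_, lmbda_train = stats.boxcox(train)
print('lambda estimated on the training data:', round(lmbda_train, 2))

# reuse that lambda when transforming the held-out portion
test_bc = stats.boxcox(test, lmbda=lmbda_train)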

In [19]:
from sklearn.preprocessing import PowerTransformer
from sklearn.preprocessing import QuantileTransformer
from sklearn.model_selection import train_test_split

N_SAMPLES = 1000
FONT_SIZE = 14
BINS = 30

rng = np.random.RandomState(304)
bc = PowerTransformer(method='box-cox')
yj = PowerTransformer(method='yeo-johnson')
# n_quantiles is set to the training set size rather than the default value
# to avoid a warning being raised by this example
qt = QuantileTransformer(n_quantiles=500, output_distribution='normal',
                         random_state=rng)
size = (N_SAMPLES, 1)

# lognormal distribution
X_lognormal = rng.lognormal(size=size)

# chi-squared distribution
dfx = 3
X_chisq = rng.chisquare(df=dfx, size=size)

# weibull distribution
a = 50
X_weibull = rng.weibull(a=a, size=size)

# gaussian distribution
loc = 100
X_gaussian = rng.normal(loc=loc, size=size)

# uniform distribution
X_uniform = rng.uniform(low=0, high=1, size=size)

# bimodal distribution
loc_a, loc_b = 100, 105
X_a, X_b = rng.normal(loc=loc_a, size=size), rng.normal(loc=loc_b, size=size)
X_bimodal = np.concatenate([X_a, X_b], axis=0)

# create plots
distributions = [
    ('Lognormal', X_lognormal),
    ('Chi-squared', X_chisq),
    ('Weibull', X_weibull),
    ('Gaussian', X_gaussian),
    ('Uniform', X_uniform),
    ('Bimodal', X_bimodal)
]

colors = ['#D81B60', '#0188FF', '#FFC107',
          '#B7A2FF', '#000000', '#2EC5AC']

fig, axes = plt.subplots(nrows=8, ncols=3, figsize=(12, 18))
axes = axes.flatten()

axes_idxs = [(0, 3, 6, 9), (1, 4, 7, 10), (2, 5, 8, 11), (12, 15, 18, 21),
             (13, 16, 19, 22), (14, 17, 20, 23)]

axes_list = [(axes[i], axes[j], axes[k], axes[l])
             for (i, j, k, l) in axes_idxs]

for distribution, color, axes in zip(distributions, colors, axes_list):
    name, X = distribution
    X_train, X_test = train_test_split(X, test_size=.5)

    # perform power transforms and quantile transform
    X_trans_bc = bc.fit(X_train).transform(X_test)
    lmbda_bc = round(bc.lambdas_[0], 2)
    X_trans_yj = yj.fit(X_train).transform(X_test)
    lmbda_yj = round(yj.lambdas_[0], 2)
    X_trans_qt = qt.fit(X_train).transform(X_test)

    ax_original, ax_bc, ax_yj, ax_qt = axes

    ax_original.hist(X_train, color=color, bins=BINS)
    ax_original.set_title(name, fontsize=FONT_SIZE)
    ax_original.tick_params(axis='both', which='major', labelsize=FONT_SIZE)

    for ax, X_trans, meth_name, lmbda in zip(
            (ax_bc, ax_yj, ax_qt),
            (X_trans_bc, X_trans_yj, X_trans_qt),
            ('Box-Cox', 'Yeo-Johnson', 'Quantile transform'),
            (lmbda_bc, lmbda_yj, None)):
        ax.hist(X_trans, color=color, bins=BINS, )
        title = 'After {}'.format(meth_name)
        
        if lmbda is not None:
            title += '\n' + r'$\lambda$ = {}'.format(lmbda)
        ax.set_title(title, fontsize=FONT_SIZE)
        ax.tick_params(axis='both', which='major', labelsize=FONT_SIZE)
        ax.set_xlim([-3.5, 3.5])

plt.tight_layout()
plt.show()

View Transformations with boxplot and theoretical quantiles

In [20]:
def plotting_3_chart(df, title = 'plot'):
        
    ## Create a customized chart with the given figsize. 
    fig = plt.figure(constrained_layout=True, figsize=(8,5))
    ## creating a grid of 3 cols and 3 rows. 
    grid = gridspec.GridSpec(ncols=3, nrows=3, figure=fig)

    ## Customizing the histogram grid. 
    ax1 = fig.add_subplot(grid[0, :2])
    ## Set the title. 
    ax1.set_title(f'{title} distribution')
    ## Plot the histogram with a fitted normal curve. 
    sns.distplot(df, norm_hist=True, ax = ax1, fit=stats.norm, bins=30)
    ax1.legend(('normal', f'{title}'))

    # customizing the QQ_plot. 
    ax2 = fig.add_subplot(grid[1, :2])
    ## Set the title. 
    ax2.set_title('QQ_plot')
    ## Plotting the QQ_Plot. 
    stats.probplot(df, plot = ax2)

    ## Customizing the Box Plot. 
    ax3 = fig.add_subplot(grid[:, 2])
    ## Set title. 
    ax3.set_title('Box Plot')
    ## Plotting the box plot. 
    sns.boxplot(df, orient='v', ax = ax3)
    
    plt.show();
In [21]:
import pylab 
import matplotlib.gridspec as gridspec
sns.set(style="darkgrid", color_codes=True)

# Create a dummy array, skewed to the left
x = stats.loggamma.rvs(3, size=700) + 3

# How is the distribution for x?
plotting_3_chart(x, title = 'no transformation')

# What happens with a log transformation?
plotting_3_chart(pd.Series(np.log(x)), title = 'log')

# What happens with a sqrt transformation?
plotting_3_chart(pd.Series(np.sqrt(x)), title = 'sqrt')

# And what about a Box-Cox transformation?
x_bc, lmda = stats.boxcox(x)
plotting_3_chart(pd.Series(x_bc), title = 'box-cox')

print("lambda parameter for Box-Cox Transformation is {}".format(lmda))
lambda parameter for Box-Cox Transformation is 1.979697721508571

View non-transformed features vs. features after a Box-Cox transformation and robust scaling

In [22]:
from scipy.special import boxcox1p
from scipy.stats import boxcox_normmax
import warnings
warnings.filterwarnings('ignore')

boston = datasets.load_boston()
X, y = boston.data, boston.target
df = pd.DataFrame(data=boston.data, columns=boston.feature_names)

df_bc = df.copy()

# Reduce skew in each feature with a Box-Cox (boxcox1p) transformation
for i in df_bc.columns:
    df_bc[i] = boxcox1p(df_bc[i], boxcox_normmax(df_bc[i] + 1))
In [23]:
# Scale robustly using the median and IQR to reduce the influence of outliers
transformer = RobustScaler(with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0))
z1 = transformer.fit_transform(df_bc.values)
df_bc = pd.DataFrame(z1, columns=df_bc.columns)
In [24]:
rnk1 = Rank1D(algorithm='shapiro')
rnk1.fit(X, y)           
rnk1.transform(X)
plt.close()

rnk1_df = pd.DataFrame(rnk1.ranks_, index=df.columns, columns=['Rank'])
rnk1_df = rnk1_df.sort_values('Rank', ascending=False)

X0 = df_bc

rnk10 = Rank1D(algorithm='shapiro')
rnk10.fit(X0, y)           
rnk10.transform(X0)
plt.close()

rnk1_df0 = pd.DataFrame(rnk10.ranks_, index=df_bc.columns, columns=['Rank'])
rnk1_df0 = rnk1_df0.sort_values('Rank', ascending=False)

f, axes = plt.subplots(1,2 ,figsize=(12,5))
sns.barplot(x='Rank', y=rnk1_df.index, data=rnk1_df, ax=axes[0])
axes[0].set_title('Features')
axes[0].set_xlabel('Shapiro Rank')
axes[0].grid(False)

sns.barplot(x='Rank', y=rnk1_df0.index, data=rnk1_df0, ax=axes[1])
axes[1].set_title('BoxCox + RobustScaler Features')
axes[1].set_xlabel('Shapiro Rank')
axes[1].grid(False)

plt.tight_layout()
plt.show();

Discretization

  • Discretization (otherwise known as quantization or binning) provides a way to partition continuous features into discrete values. Certain datasets with continuous features may benefit from discretization, because it can transform a dataset of continuous attributes into one with only nominal attributes. One-hot encoded discretized features can make a model more expressive while maintaining interpretability (a one-hot variant is sketched after the example below).

KBinsDiscretizer discretizes features into k bins

In [25]:
X = np.array([[ -3., 5., 15 ],
              [  0., 6., 14 ],
              [ -1., 4., 18 ],
              [  6., 3., 11 ]])

est = preprocessing.KBinsDiscretizer(n_bins=[4, 3, 2], encode='ordinal').fit(X)
print(est.transform(X))
[[0. 2. 1.]
 [2. 2. 0.]
 [1. 1. 1.]
 [3. 0. 0.]]
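
The same bins with one-hot output, as mentioned above: each original column expands into one indicator column per bin, so the transformed matrix has 4 + 3 + 2 = 9 columns. A minimal sketch reusing X:

est_onehot = preprocessing.KBinsDiscretizer(n_bins=[4, 3, 2], encode='onehot-dense').fit(X)
print(est_onehot.transform(X))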

Feature binarization: the process of thresholding numerical features to get boolean values. This can be useful for downstream probabilistic estimators that assume the input data is distributed according to a multivariate Bernoulli distribution.

In [26]:
X = [[ 1., -1.,  2.],
     [ 2.,  0.,  0.],
     [ 0.,  1., -1.]]

binarizer = preprocessing.Binarizer().fit(X)  # fit does nothing

print(binarizer.transform(X))
[[1. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]]

Generating polynomial features

  • It is often useful to add complexity to a model by considering nonlinear features of the input data. A simple and common method is polynomial features, which capture the features’ higher-order and interaction terms (an interaction-only variant is sketched after the example below).

Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree.

In [27]:
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(20).reshape(5, 4)

print('Normal:')
print(pd.DataFrame(X))

poly = PolynomialFeatures(degree=2)

print('\nPoly:')         
print(pd.DataFrame(poly.fit_transform(X)))
Normal:
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19

Poly:
     0     1     2     3     4      5      6      7      8      9     10  \
0  1.0   0.0   1.0   2.0   3.0    0.0    0.0    0.0    0.0    1.0    2.0   
1  1.0   4.0   5.0   6.0   7.0   16.0   20.0   24.0   28.0   25.0   30.0   
2  1.0   8.0   9.0  10.0  11.0   64.0   72.0   80.0   88.0   81.0   90.0   
3  1.0  12.0  13.0  14.0  15.0  144.0  156.0  168.0  180.0  169.0  182.0   
4  1.0  16.0  17.0  18.0  19.0  256.0  272.0  288.0  304.0  289.0  306.0   

      11     12     13     14  
0    3.0    4.0    6.0    9.0  
1   35.0   36.0   42.0   49.0  
2   99.0  100.0  110.0  121.0  
3  195.0  196.0  210.0  225.0  
4  323.0  324.0  342.0  361.0  
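
Since the note above mentions interaction terms specifically, interaction_only=True keeps the bias column, the original features and the cross terms while dropping the pure powers; a minimal sketch reusing X:

poly_int = PolynomialFeatures(degree=2, interaction_only=True)
print(pd.DataFrame(poly_int.fit_transform(X)))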