Why start with the pixel value rather than a texture metric for this image?
Because it gives the best split of the input data.
How do we pick the node that gives the best split?
Use Gini impurity → pick the split that maximizes the Gini gain.
Gini impurity is the probability of incorrectly classifying a randomly chosen element in the dataset if it were randomly labeled according to the class distribution in the dataset. It's calculated as
G = \sum_{i=1}^{C} p(i)\,(1 - p(i))
where C is the number of classes and p(i) is the probability of randomly picking an element of class i.
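For instance, with 10 pixels split evenly between two classes, G = 1 - (0.5^2 + 0.5^2) = 0.5. A minimal sketch (toy labels, not from the original example) of how the Gini gain of a candidate split is computed and used to choose the best split:

import numpy as np

def gini_impurity(labels):
    # G = sum_i p(i) * (1 - p(i)), which equals 1 - sum_i p(i)^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(parent, left, right):
    # Gain = impurity of the parent node minus the weighted impurity of the two children
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

parent = ['A'] * 5 + ['B'] * 5         # 5 pixels of each class -> G = 0.5
left   = ['A'] * 4 + ['B'] * 1         # mostly class A         -> G = 0.32
right  = ['A'] * 1 + ['B'] * 4         # mostly class B         -> G = 0.32
print(gini_gain(parent, left, right))  # 0.5 - 0.32 = 0.18; the split with the largest gain wins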
Primary disadvantage of decision trees:
They often suffer from overfitting → the model works well on training data but fails on new data, leading to low accuracy.
Random Forest to the rescue!
A random forest works around the shortcomings of a single decision tree.
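A minimal sketch of the difference in behaviour (synthetic toy data, not the microscopy features used below): a single unpruned tree typically scores perfectly on its own training data but noticeably worse on held-out data, while a forest of trees built on bootstrapped samples and random feature subsets usually generalizes better.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)        # single tree
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)  # ensemble

print("Tree   train/test:", tree.score(X_train, y_train), tree.score(X_test, y_test))
print("Forest train/test:", forest.score(X_train, y_train), forest.score(X_test, y_test))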
60 - How to use Random Forest in Python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import cv2    # Needed for the Gabor/Canny feature extraction below
import glob   # Needed to loop over the images to segment
# All features generated must match the way features are generated for TRAINING.
# Feature1 is our original image pixels
df = pd.DataFrame()    # Dataframe to hold one feature per column
img2 = img.reshape(-1)
df['Original Image'] = img2

# Generate Gabor features
num = 1  # To count numbers up in order to give Gabor features a label in the data frame
kernels = []
for theta in range(2):  # Define number of thetas
    theta = theta / 4. * np.pi
    for sigma in (1, 3):  # Sigma with 1 and 3
        for lamda in np.arange(0, np.pi, np.pi / 4):  # Range of wavelengths
            for gamma in (0.05, 0.5):  # Gamma values of 0.05 and 0.5
                gabor_label = 'Gabor' + str(num)  # Label Gabor columns as Gabor1, Gabor2, etc.
                ksize = 9
                kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta, lamda, gamma, 0, ktype=cv2.CV_32F)
                kernels.append(kernel)
                # Now filter the image and add values to a new column
                fimg = cv2.filter2D(img2, cv2.CV_8UC3, kernel)
                filtered_img = fimg.reshape(-1)
                df[gabor_label] = filtered_img  # Labels columns as Gabor1, Gabor2, etc.
                print(gabor_label, ': theta =', theta, ': sigma =', sigma, ': lamda =', lamda, ': gamma =', gamma)
                num += 1  # Increment for gabor column label
# Generate Gabor features
num = 1
kernels = []
for theta in range(2):
    theta = theta / 4. * np.pi
    for sigma in (1, 3):
        for lamda in np.arange(0, np.pi, np.pi / 4):
            for gamma in (0.05, 0.5):
                gabor_label = 'Gabor' + str(num)
                ksize = 9
                kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta, lamda, gamma, 0, ktype=cv2.CV_32F)
                kernels.append(kernel)
                # Now filter image and add values to new column
                fimg = cv2.filter2D(img2, cv2.CV_8UC3, kernel)
                filtered_img = fimg.reshape(-1)
                df[gabor_label] = filtered_img  # Modify this to add new column for each gabor
                num += 1

########################################
# Generate OTHER FEATURES and add them to the data frame
# Feature 3 is canny edge
edges = cv2.Canny(img, 100, 200)  # Image, min and max values
edges1 = edges.reshape(-1)
df['Canny Edge'] = edges1  # Add column to original dataframe
from skimage.filters import roberts, sobel, scharr, prewitt
# Feature 4 is Roberts edge
edge_roberts = roberts(img)
edge_roberts1 = edge_roberts.reshape(-1)
df['Roberts'] = edge_roberts1
path = "images/Train_images/*.tif" for file in glob.glob(path): print(file) # just stop here to see all file names printed img = cv2.imread(file, 0) # Call the feature extraction function. X = feature_extraction(img) result = loaded_model.predict(X) segmented = result.reshape((img.shape))
    name = file.split("e_")
    cv2.imwrite('images/Segmented/' + name[1], segmented)
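matplotlib is imported above but not used in these excerpts; a quick, optional way to inspect one result (here simply the last segmented image left over from the loop):

plt.imshow(segmented, cmap='jet')  # Visualize the last segmented image from the loop
plt.show()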
67b - Feature based image segmentation using traditional machine learning. -Multi-training images-
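The feature-extraction snippets below assume an outer loop that reads each training image into a temporary dataframe df and accumulates everything in image_dataset. That loop is not shown in the excerpts, so here is a hedged sketch of what it could look like; the folder name images/train_images/ is an assumption, and the blocks that follow would sit inside this loop (their indentation is omitted in the excerpts).

import os
import cv2
import numpy as np
import pandas as pd
from scipy import ndimage as nd    # 'nd' is used for the median/variance filters below

image_dataset = pd.DataFrame()     # Dataframe to capture features of all training images

img_path = "images/train_images/"  # Assumed folder; adjust to your own training-image location
for image in os.listdir(img_path):   # Iterate through each training image
    print(image)
    df = pd.DataFrame()            # Temporary dataframe to capture features of the current image
    input_img = cv2.imread(img_path + image)
    # ... the RGB check, pixel values, Gabor, Canny, Median and Variance features below are
    #     computed here for each image, and df is appended to image_dataset at the end.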
# Check if the input image is RGB or grey and convert to grey if RGB
if input_img.ndim == 3 and input_img.shape[-1] == 3:
    img = cv2.cvtColor(input_img, cv2.COLOR_BGR2GRAY)
elif input_img.ndim == 2:
    img = input_img
else:
    raise Exception("The module works only with grayscale and RGB images!")
################################################################
# START ADDING DATA TO THE DATAFRAME
# Add pixel values to the data frame
pixel_values = img.reshape(-1)
df['Pixel_Value'] = pixel_values  # Pixel value itself as a feature
df['Image_Name'] = image          # Capture image name as we read multiple images
############################################################################
# Generate Gabor features
num = 1  # To count numbers up in order to give Gabor features a label in the data frame
kernels = []
for theta in range(2):  # Define number of thetas
    theta = theta / 4. * np.pi
    for sigma in (1, 3):  # Sigma with 1 and 3
        for lamda in np.arange(0, np.pi, np.pi / 4):  # Range of wavelengths
            for gamma in (0.05, 0.5):  # Gamma values of 0.05 and 0.5
                gabor_label = 'Gabor' + str(num)  # Label Gabor columns as Gabor1, Gabor2, etc.
                ksize = 9
                kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta, lamda, gamma, 0, ktype=cv2.CV_32F)
                kernels.append(kernel)
                # Now filter the image and add values to a new column
                fimg = cv2.filter2D(img, cv2.CV_8UC3, kernel)
                filtered_img = fimg.reshape(-1)
                df[gabor_label] = filtered_img  # Labels columns as Gabor1, Gabor2, etc.
                print(gabor_label, ': theta=', theta, ': sigma=', sigma, ': lamda=', lamda, ': gamma=', gamma)
                num += 1  # Increment for gabor column label

########################################
# Generate OTHER FEATURES and add them to the data frame
# CANNY EDGE
edges = cv2.Canny(img, 100, 200)  # Image, min and max values
edges1 = edges.reshape(-1)
df['Canny Edge'] = edges1  # Add column to original dataframe
from skimage.filters import roberts, sobel, scharr, prewitt
# MEDIAN with size=3
median_img = nd.median_filter(img, size=3)
median_img1 = median_img.reshape(-1)
df['Median s3'] = median_img1
# VARIANCE with size=3
variance_img = nd.generic_filter(img, np.var, size=3)
variance_img1 = variance_img.reshape(-1)
df['Variance s3'] = variance_img1  # Add column to original dataframe
######################################
# Update dataframe for images to include details for each image in the loop
image_dataset = image_dataset.append(df)  # Note: DataFrame.append was removed in pandas 2.0; use pd.concat([image_dataset, df]) there
STEP 2: READ LABELED IMAGES (MASKS) AND CREATE ANOTHER DATAFRAME WITH LABEL VALUES AND LABEL FILE NAMES
mask_dataset = pd.DataFrame() # Create dataframe to capture mask info.
mask_path = "images/train_masks/" for mask in os.listdir(mask_path): # iterate through each file to perform some action print(mask)
    df2 = pd.DataFrame()  # Temporary dataframe to capture info for each mask in the loop
    input_mask = cv2.imread(mask_path + mask)
    # Check if the input mask is RGB or grey and convert to grey if RGB
    if input_mask.ndim == 3 and input_mask.shape[-1] == 3:
        label = cv2.cvtColor(input_mask, cv2.COLOR_BGR2GRAY)
    elif input_mask.ndim == 2:
        label = input_mask
    else:
        raise Exception("The module works only with grayscale and RGB images!")
    # Add pixel values to the data frame
    label_values = label.reshape(-1)
    df2['Label_Value'] = label_values
    df2['Mask_Name'] = mask
    mask_dataset = mask_dataset.append(df2)  # Update mask dataframe with all the info from each mask (use pd.concat on pandas >= 2.0)
STEP 3: GET DATA READY FOR RANDOM FOREST (or other classifier) - COMBINE BOTH DATAFRAMES INTO A SINGLE DATASET
dataset = pd.concat([image_dataset, mask_dataset], axis=1) # Concatenate both image and mask datasets
# If you expect image and mask names to be the same this is where we can perform a sanity check
# dataset['Image_Name'].equals(dataset['Mask_Name'])

# If we do not want to include pixels with value 0
# e.g. sometimes unlabeled pixels may be given a value of 0.
dataset = dataset[dataset.Label_Value != 0]
# Assign training features to X and labels to Y
# Drop columns that are not relevant for training (non-features)
X = dataset.drop(labels=["Image_Name", "Mask_Name", "Label_Value"], axis=1)
# Assign label values to Y (our prediction)
Y = dataset["Label_Value"].values
# Split data into train and test to verify accuracy after fitting the model.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=20)
STEP 4: Define the classifier and fit a model with our training data
# Import training classifier
from sklearn.ensemble import RandomForestClassifier

# Instantiate model with n number of decision trees
model = RandomForestClassifier(n_estimators=50, random_state=42)
# Train the model on training data
model.fit(X_train, y_train)
STEP 5: Accuracy check
from sklearn import metrics
prediction_test = model.predict(X_test)

# Check accuracy on test dataset.
print("Accuracy = ", metrics.accuracy_score(y_test, prediction_test))
STEP 6: SAVE MODEL FOR FUTURE USE
# You can store the model for future use. In fact, this is how you do machine learning:
# train on training images, validate on test images, and deploy the model on unknown images.

# Save the trained model as a pickle string to disk for future use
import pickle
model_name = "sandstone_model"
pickle.dump(model, open(model_name, 'wb'))
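When the saved model is needed again (for example in the glob loop earlier that calls loaded_model.predict on new images), it can be read back with pickle; a minimal sketch:

import pickle

# Load the trained model back from disk
loaded_model = pickle.load(open("sandstone_model", 'rb'))

# loaded_model behaves exactly like the original RandomForestClassifier object,
# e.g. result = loaded_model.predict(X) on features extracted from a new image.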