Hand Gesture Recognition with a CNN (Convolutional Neural Network) and OpenCV

OpenCV, Machine Learning, Artificial Intelligence, Python, CNN. 不靠譜的貓, 2019-04-07

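To build an SLR (sign language recognition) system, we need three things:

A machine learning dataset
A machine learning model (we will use a CNN)
A platform to apply the model (we will use OpenCV)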

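1) Dataset

The sign language dataset can be downloaded here (https://www.kaggle.com/datamunge/sign-language-mnist).

Our machine learning dataset contains many images for 24 letters of the American Sign Language alphabet (J and Z are excluded). Each image is 28x28 pixels, which means 784 pixels per image.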
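As a quick check of the numbers above, here is a minimal sketch that lists the 24 letters and the pixel count per image. The names `EXCLUDED`, `LETTERS_PRESENT`, `IMAGE_SIDE` and `PIXELS_PER_IMAGE` are hypothetical and only serve this illustration.

```python
import string

# Assumption: J and Z are the only letters left out, as stated above.
EXCLUDED = {"J", "Z"}
LETTERS_PRESENT = [c for c in string.ascii_uppercase if c not in EXCLUDED]

# Each image is a 28x28 grid, stored as one flat row of pixel values.
IMAGE_SIDE = 28
PIXELS_PER_IMAGE = IMAGE_SIDE * IMAGE_SIDE

print(len(LETTERS_PRESENT))  # 24
print(PIXELS_PER_IMAGE)      # 784
```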

Loading the machine learning dataset

To load the dataset, use the following Python code:

import keras
import numpy as np
import pandas as pd
import cv2
from matplotlib import pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from keras.optimizers import SGD

# Each CSV row holds one image: a 'label' column followed by the pixel values.
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Separate the integer labels from the pixel values.
y_train = train['label'].values
y_test = test['label'].values
X_train = train.drop(['label'], axis=1)
X_test = test.drop(['label'], axis=1)

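Our dataset is in CSV (comma-separated values) format. X_train and X_test contain the pixel values of each image, and y_train and y_test contain the image labels. You can inspect the machine learning dataset with the following Python code: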

# info() prints its summary directly; head() returns the first rows.
X_train.info()
X_test.info()
print(X_train.head(n=2))
print(X_test.head(n=2))

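Preprocessing

X_train and X_test hold every image as a flat array of pixel values, and we rebuild the images from these values. Our images are 28x28, so we must reshape each array of 784 values into a 28x28 grid. To do that, we use the following code: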

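# Rebuild each flat row of 784 pixel values as a 28x28 image.
X_train = np.array(X_train.iloc[:, :])
X_train = np.array([np.reshape(i, (28, 28)) for i in X_train])
X_test = np.array(X_test.iloc[:, :])
X_test = np.array([np.reshape(i, (28, 28)) for i in X_test])

# One-hot encode the integer labels; each label indexes one of 26 classes.
num_classes = 26
y_train = np.array(y_train).reshape(-1)
y_test = np.array(y_test).reshape(-1)
y_train = np.eye(num_classes)[y_train]
y_test = np.eye(num_classes)[y_test]

# Add the channel axis that Conv2D expects. -1 infers the number of images
# (27455 for training and 7172 for testing).
X_train = X_train.reshape((-1, 28, 28, 1))
X_test = X_test.reshape((-1, 28, 28, 1))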
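The reshape and one-hot steps above can be tried on a tiny synthetic array before running them on the real CSV data. This is only a sketch: the arrays `flat` and `labels` are made-up stand-ins for the pixel rows and label column.

```python
import numpy as np

# Three fake "images", each a flat row of 784 pixel values (synthetic data).
flat = np.arange(3 * 784).reshape(3, 784)
labels = np.array([0, 2, 24])

# Reshape to (samples, height, width, channels); -1 infers the sample count.
images = flat.reshape(-1, 28, 28, 1)

# Row k of the identity matrix is the one-hot vector for label k.
one_hot = np.eye(26)[labels]

print(images.shape)   # (3, 28, 28, 1)
print(one_hot.shape)  # (3, 26)
```

The reshape is row-major, so the first 28 values of a flat row become the top row of the image.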

Now we can use this dataset to train our machine learning model.

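2) Building and training the model

We will use a CNN (convolutional neural network) to recognize the letters, implemented with Keras.

The Python implementation of the machine learning model is as follows: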

classifier = Sequential()
# First convolution block: 8 filters over the 28x28 grayscale input.
classifier.add(Conv2D(filters=8, kernel_size=(3,3), strides=(1,1), padding='same', input_shape=(28,28,1), activation='relu', data_format='channels_last'))
classifier.add(MaxPooling2D(pool_size=(2,2)))
# Second convolution block, with Dropout for regularization.
classifier.add(Conv2D(filters=16, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
classifier.add(Dropout(0.5))
classifier.add(MaxPooling2D(pool_size=(4,4)))
# Flatten the feature maps before the fully connected layers.
classifier.add(Flatten())
classifier.add(Dense(128, activation='relu'))
classifier.add(Dense(26, activation='softmax'))

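Our model consists of Conv2D and MaxPooling layers, followed by fully connected (Dense) layers.

The first Conv2D (convolution) layer takes an input image of shape (28,28,1). The last fully connected layer gives us 26 outputs, one per letter class.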
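To see how the 28x28 input shrinks on its way through the network, the sketch below traces the spatial size layer by layer. It assumes the settings used in the model code (padding='same' with stride 1 for the convolutions, and the Keras default of non-overlapping pooling windows). The helper names `conv_same` and `max_pool` are hypothetical.

```python
def conv_same(size, stride=1):
    # With padding='same', the output size is ceil(input / stride).
    return -(-size // stride)

def max_pool(size, pool):
    # MaxPooling2D defaults to padding='valid' with stride equal to the pool size.
    return size // pool

size = 28
size = conv_same(size)    # first Conv2D, 8 filters   -> 28
size = max_pool(size, 2)  # MaxPooling2D (2,2)        -> 14
size = conv_same(size)    # second Conv2D, 16 filters -> 14
size = max_pool(size, 4)  # MaxPooling2D (4,4)        -> 3
features = size * size * 16

print(size, features)  # 3 144
```

So the second pooling layer outputs a 3x3 feature map with 16 channels, or 144 values per image.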

We use Dropout after the second Conv2D layer to regularize training.
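For intuition, here is a minimal NumPy sketch of what a dropout layer does during training. It is not the Keras implementation: the function `dropout` is a hypothetical stand-in that follows the usual "inverted dropout" scheme, where surviving units are rescaled so the expected activation stays the same.

```python
import numpy as np

def dropout(x, rate, rng):
    # Keep each unit with probability (1 - rate), then rescale the survivors
    # so the expected value of the output matches the input.
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((4, 4))
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)  # every entry is either 0.0 or 2.0
```

At prediction time the layer passes its input through unchanged, so no units are dropped when the model runs on webcam frames.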

We use the softmax activation function in the last layer.
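Softmax turns the raw scores of the last layer into probabilities that sum to 1, which is why the largest output can be read as the predicted class. A small NumPy sketch, with made-up scores:

```python
import numpy as np

def softmax(logits):
    # Subtracting the maximum keeps exp() numerically stable
    # without changing the result.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])  # illustrative values only
probs = softmax(scores)
print(probs, probs.sum())
```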

The layers and output shapes of the finished model can be listed with classifier.summary().

We must compile and fit the machine learning model. To do that, we use the following Python code:

classifier.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, epochs=50, batch_size=100)

We compile our model with the SGD optimizer. You can also reduce the number of epochs to 25.
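To get a feel for what halving the epochs means, the sketch below counts the weight updates for both settings. It assumes the 27455 training images from the preprocessing step and the batch size of 100 passed to fit. The variable names are hypothetical.

```python
import math

train_samples = 27455  # training-set size from the preprocessing step
batch_size = 100

# The last, smaller batch still counts as one step.
steps_per_epoch = math.ceil(train_samples / batch_size)
for epochs in (50, 25):
    print(epochs, "epochs ->", epochs * steps_per_epoch, "weight updates")
```

Fewer epochs cut training time roughly in proportion, so it is worth checking the test accuracy again after changing this value.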

Finally, to check the accuracy:

accuracy = classifier.evaluate(x=X_test,y=y_test,batch_size=32)
print("Accuracy: ",accuracy[1])
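Because the model is compiled with metrics=['accuracy'], evaluate returns the loss followed by the accuracy, which is why the code reads accuracy[1]. The metric itself can be sketched in NumPy as the share of images whose highest-probability class matches the one-hot label. The function name and the sample arrays below are made up for illustration.

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    # Compare the index of the true class with the index of the top prediction.
    true_cls = np.argmax(y_true_one_hot, axis=1)
    pred_cls = np.argmax(y_prob, axis=1)
    return float(np.mean(true_cls == pred_cls))

y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.1, 0.6, 0.3]])
print(accuracy(y_true, y_prob))  # 0.75
```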

To save the trained machine learning model, we can use:

classifier.save('CNNmodel.h5')

3) OpenCV

The following Python implementation is an example; adjust it to your needs.

Importing the Python libraries and loading the model

import cv2
import numpy as np
from keras.models import load_model

# Load the model saved at the end of the training step.
model = load_model('CNNmodel.h5')

Helper functions

def crop_image(image, x, y, width, height):
    # Return the region of interest whose top-left corner is at (x, y).
    return image[y:y + height, x:x + width]

# Class indices of the letters this demo reports (A is 0, B is 1, ...).
LETTERS = {0: 'A', 1: 'B', 2: 'C', 3: 'D', 8: 'I', 11: 'L',
           14: 'O', 20: 'U', 21: 'V', 22: 'W', 24: 'Y'}

def prediction(pred):
    # Print the letter for a predicted class; other classes print nothing.
    if pred in LETTERS:
        print(LETTERS[pred])

def keras_process_image(img):
    # Resize to the 28x28 model input; cv2.resize takes dsize as (width, height).
    image_x = 28
    image_y = 28
    img = cv2.resize(img, (image_x, image_y), interpolation=cv2.INTER_AREA)
    return img
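One detail of crop_image is worth knowing: NumPy slicing silently clips a region that extends past the frame, so the crop can come out smaller than requested. The sketch below shows this on a blank array standing in for a webcam frame, assuming a 640x480 resolution. `clamp_roi` is a hypothetical helper, not part of the original code, that shifts the region so it fits.

```python
import numpy as np

def crop_image(image, x, y, width, height):
    return image[y:y + height, x:x + width]

def clamp_roi(frame_shape, x, y, width, height):
    # Hypothetical helper: shrink and shift the ROI so it lies inside the frame.
    frame_h, frame_w = frame_shape[:2]
    width = min(width, frame_w)
    height = min(height, frame_h)
    x = max(0, min(x, frame_w - width))
    y = max(0, min(y, frame_h - height))
    return x, y, width, height

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # assumed 640x480 frame
print(crop_image(frame, 300, 300, 300, 300).shape)  # (180, 300, 3)
roi = clamp_roi(frame.shape, 300, 300, 300, 300)
print(crop_image(frame, *roi).shape)                # (300, 300, 3)
```

The later resize to 28x28 still works on a clipped crop, but it stretches the hand shape, so a square region is the safer choice.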

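Prediction

We must predict the letter from the input image. Our model outputs an integer rather than a letter, because the labels are given as integers (A is 0, B is 1, C is 2, and so on).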

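def keras_predict(model, image):
    data = np.asarray(image, dtype="int32")
    # model.predict returns one probability vector per image in the batch.
    pred_probab = model.predict(data)[0]
    # The index of the largest probability is the predicted class.
    pred_class = list(pred_probab).index(max(pred_probab))
    return max(pred_probab), pred_class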
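The step from a probability vector to a letter can also be written without a lookup table. The sketch below is one way to do it, assuming class index 0 is 'A', 1 is 'B', and so on. `decode` is a hypothetical helper and the probabilities are made up.

```python
import numpy as np

def decode(pred_probab):
    # Hypothetical helper: pick the most likely class and map it to a letter,
    # assuming class index 0 is 'A', 1 is 'B', and so on.
    pred_class = int(np.argmax(pred_probab))
    return float(pred_probab[pred_class]), chr(ord('A') + pred_class)

probs = np.zeros(26)
probs[11] = 0.9
probs[0] = 0.1
print(decode(probs))  # (0.9, 'L')
```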

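Creating the window

We must create a window that takes input from our webcam. The image we feed to the model should be a 28x28 grayscale image, because we trained our model on 28x28 images. Example code: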

def main():
    # Open the webcam once, before the loop.
    cam_capture = cv2.VideoCapture(0)
    try:
        while True:
            ok, image_frame = cam_capture.read()
            if not ok:
                break
            # Select ROI
            im2 = crop_image(image_frame, 300, 300, 300, 300)
            image_grayscale = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)
            image_grayscale_blurred = cv2.GaussianBlur(image_grayscale, (15, 15), 0)
            im3 = cv2.resize(image_grayscale_blurred, (28, 28), interpolation=cv2.INTER_AREA)

            # Shape the image as a batch of one: (1, 28, 28, 1).
            im4 = np.resize(im3, (28, 28, 1))
            im5 = np.expand_dims(im4, axis=0)
            pred_probab, pred_class = keras_predict(model, im5)
            prediction(pred_class)

            # Display cropped image
            cv2.imshow("Image2", im2)
            cv2.imshow("Image3", image_grayscale_blurred)
            # Press q to quit.
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break
    finally:
        # Release the camera and close the windows on exit.
        cam_capture.release()
        cv2.destroyAllWindows()

# Warm up the model with a blank image before starting the loop.
keras_predict(model, np.zeros((1, 28, 28, 1), dtype=np.uint8))
if __name__ == '__main__':
    main()

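Our machine learning model reaches an accuracy of about 94% on the test set, so it should recognize the letters reliably in most cases.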
