Code Walkthrough: A Comprehensive Introduction to Convolutional Neural Networks

Artificial intelligence, image processing, CNN. 讀芯術, 2019-06-18

About 8,600 characters in total; estimated study time 40 minutes or more

Image source: pexels.com

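Neural networks are made up of neurons that have weights and biases. Training adjusts these weights and biases so that the network learns a good model. Each neuron receives a set of inputs, processes them in some way, and outputs a value. A neural network built with many layers is called a deep neural network, and the branch of artificial intelligence that deals with such networks is called deep learning.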

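The main drawback of ordinary neural networks is that they ignore the structure of the input data. All data is converted into a one-dimensional array before being fed into the network. This works for ordinary data, but it causes difficulties with images.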

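A grayscale image is a 2D structure, and the spatial arrangement of its pixels carries a lot of hidden information. Ignoring that information discards many potential patterns. This is why convolutional neural networks (CNNs) were brought into image processing: a CNN takes the 2D structure of the image into account.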

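CNNs are also made up of neurons with weights and biases. These neurons receive input data, process it, and output a result. The goal of the network is to map the raw image data at the input layer to the correct class at the output layer. The difference between ordinary neural networks and CNNs lies in the types of layers used and the way the input data is handled. A CNN assumes that its input is an image, which lets it extract image-specific properties and makes it more efficient at processing images. So how is a CNN built?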

CNN architecture

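With an ordinary neural network, the input data must be converted into a single vector. This vector is the input to the network and passes through its layers. In these layers, every neuron is connected to all neurons in the previous layer. Note that neurons within the same layer are not connected to each other; they connect only to neurons in adjacent layers. The last layer of the network is the output layer, which represents the final output.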

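If this structure is used for image processing, it quickly becomes unmanageable. Take a dataset of 256x256 RGB images. Because each image has three color channels, there are 256 * 256 * 3 = 196,608 weights. Note that this is for a single neuron! Each layer has many neurons, so the number of weights grows rapidly. The model therefore has a huge number of parameters to tune during training, which makes it complex and slow to train. Connecting every neuron to every neuron in the previous layer is called full connectivity, and it clearly does not suit image processing.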
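
The arithmetic above can be sketched in a few lines. The dense layer width of 1,000 neurons and the 5x5, 32-filter convolutional layer are illustrative assumptions chosen only to show the difference in scale.

```python
# Weights needed by ONE fully connected neuron that sees a 256x256 RGB image
height, width, channels = 256, 256, 3
weights_per_neuron = height * width * channels

# Assumed layer width, for illustration only
num_neurons = 1000
dense_weights = weights_per_neuron * num_neurons

# A convolutional layer reuses one small filter at every image position.
# Assumed: 5x5 filters spanning all 3 channels, 32 filters in the layer.
filter_size, num_filters = 5, 32
conv_weights = filter_size * filter_size * channels * num_filters

print(weights_per_neuron)  # 196608
print(dense_weights)       # 196608000
print(conv_weights)        # 2400
```

Bias terms are left out of both counts to keep the comparison simple.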

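A CNN explicitly takes the structure of the image into account when processing data. Neurons in a CNN are arranged in three dimensions: width, height, and depth. Each neuron in the current layer is connected to a small patch of the previous layer's output, which is like overlaying an NxN filter on the input image. This is in contrast to a fully connected layer, where every neuron is connected to all neurons in the previous layer.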

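Because a single filter cannot capture all the nuances of an image, the process is repeated with several filters (say M of them) to make sure all the details are captured. These M filters act as feature extractors. Looking at the outputs of these filters shows the features the layer extracts, such as edges and corners. This holds for the initial layers of a CNN. As the image progresses through the layers of the network, the later layers extract higher-level features.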
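
As a rough illustration of a filter acting as a feature extractor, the sketch below slides a small filter over a toy image in plain Python. The function name `convolve_2d`, the toy image, and the 2x2 edge filter are all hypothetical and chosen for readability; a real CNN learns its filter values during training.

```python
def convolve_2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    map of dot products. The kernel is not flipped, so strictly speaking
    this is a cross-correlation, which is what CNN layers compute."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Toy 4x5 grayscale image: dark on the left, bright on the right
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve_2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 2, 0, 0]: the filter fires only at the edge
```

The output is non-zero only where the window straddles the dark-to-bright boundary, which is the sense in which a filter "detects" an edge.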

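Types of layers in a CNN

Now that the architecture of a CNN is clear, let us look at the types of layers used to build one. CNNs typically use the following types of layers: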

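· Input layer: takes the raw image data as input.

· Convolutional layer: computes the convolutions between the neurons and the various patches of the input.

A convolutional layer essentially computes the dot product between the weights and a small patch of the previous layer's output.

· Activation layer: applies an activation function to the output of the previous layer, typically something like max(0, x). This layer adds non-linearity to the network so that it can generalize well to any type of function.

· Pooling layer: downsamples the output of the previous layer, producing a structure with smaller dimensions. Pooling helps keep only the prominent parts as the image moves through the network. Max pooling is the most commonly used form; it picks the maximum value in a given KxK window.

· Fully connected layer: computes the output scores in the last layer. The output has size 1x1xL, where L is the number of classes in the training dataset.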
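
The activation and pooling layers in the list above can be sketched in plain Python. The helper names `relu` and `max_pool` and the 4x4 input values are made up for illustration, and the pooling helper assumes the input height and width are multiples of the window size.

```python
def relu(feature_map):
    """Apply max(0, x) to every element."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, k=2):
    """Keep the largest value in each non-overlapping k x k window.
    Assumes the height and width are multiples of k."""
    pooled = []
    for r in range(0, len(feature_map), k):
        row = []
        for c in range(0, len(feature_map[0]), k):
            window = [feature_map[r + i][c + j]
                      for i in range(k) for j in range(k)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Made-up output of a convolutional layer
conv_output = [
    [ 1, -3,  2,  0],
    [-1,  4, -2, -5],
    [ 0, -6,  7,  1],
    [ 3, -2, -1, -4],
]

activated = relu(conv_output)   # negative values become 0
pooled = max_pool(activated)    # 4x4 shrinks to 2x2

print(activated)
print(pooled)  # [[4, 2], [3, 7]]
```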

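Going from the input layer to the output layer of the network, the input image is transformed from pixel values into final class scores. Many different CNN architectures have been proposed, and this is an active area of research. The accuracy and robustness of a model depend on many factors: the types of layers, the depth of the network, the arrangement of the different layer types within the network, the functions chosen for each layer, the training data, and so on.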

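Building a perceptron-based linear regressor

Next, let us see how to build a linear regression model using a perceptron.

This article uses TensorFlow, a popular deep learning package that is widely used to build real-world systems. This section gets familiar with how it works. Install the package before using it. The code below relies on the TensorFlow 1.x graph and session API (for example tf.Session and tf.placeholder), so it assumes a 1.x installation.

Once it is installed, create a new Python program and import the following packages: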

import numpy as np 
import matplotlib.pyplot as plt 
import tensorflow as tf

We will generate some data points and fit a model to them. Define the number of data points to generate:

# Define the number of points to generate 
num_points = 1200

Define the parameters used to generate the data. We use the linear model y = mx + c:

# Generate the data based on equation y = mx + c
data = []
m = 0.2
c = 0.5
for i in range(num_points):
    # Generate 'x'
    x = np.random.normal(0.0, 0.8)

Add some noise so that the data shows some variation:

    # Generate some noise
    noise = np.random.normal(0.0, 0.04)

Compute the value of y using the following equation:

    # Compute 'y'
    y = m*x + c + noise
    data.append([x, y])

After the loop finishes, separate the data into input and output variables:

# Separate x and y
x_data = [d[0] for d in data]
y_data = [d[1] for d in data]

Plot the data:

# Plot the generated data 
plt.plot(x_data, y_data, 'ro') 
plt.title('Input data') 
plt.show()

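Generate the weight and bias for the perceptron. The weight is drawn from a uniform random number generator, and the bias is set to zero: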

# Generate weights and biases 
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0)) 
b = tf.Variable(tf.zeros([1]))

Define the equation using the TensorFlow variables:

# Define equation for 'y' 
y = W * x_data + b

Define the loss function used during training. The optimizer will try to make the value of the loss as small as possible:

# Define how to compute the loss 
loss = tf.reduce_mean(tf.square(y - y_data))

Define the gradient descent optimizer and tell it which loss to minimize:

# Define the gradient descent optimizer 
optimizer = tf.train.GradientDescentOptimizer(0.5) 
train = optimizer.minimize(loss)
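
To show roughly what the optimizer does with this loss, here is a hand-written gradient descent loop in plain Python, without TensorFlow. It is a sketch under assumptions: it regenerates similar synthetic data with the standard `random` module and a fixed seed, and it uses the analytic gradients of the mean squared error instead of TensorFlow's automatic differentiation.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic data in the same spirit as above: y = m*x + c + noise
m, c = 0.2, 0.5
xs = [random.gauss(0.0, 0.8) for _ in range(1200)]
ys = [m * x + c + random.gauss(0.0, 0.04) for x in xs]

W = random.uniform(-1.0, 1.0)  # weight starts at a random value
b = 0.0                        # bias starts at zero
learning_rate = 0.5
n = len(xs)

losses = []
for step in range(10):
    # Prediction error for every point under the current W and b
    errors = [W * x + b - y for x, y in zip(xs, ys)]
    losses.append(sum(e * e for e in errors) / n)
    # Gradients of the mean squared error with respect to W and b
    grad_W = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    # Step against the gradient
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

print('W =', W, 'b =', b, 'loss =', losses[-1])
```

After a handful of steps W and b should land near the values of m and c used to generate the data.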

All the variables are in place, but they have not been initialized yet. Initialize them:

# Initialize all the variables
init = tf.global_variables_initializer()

Start a TensorFlow session and run the initializer in it:

# Start the tensorflow session and run it 
sess = tf.Session() 
sess.run(init)

Start training:

# Start iterating
num_iterations = 10
for step in range(num_iterations):
    # Run the session
    sess.run(train)

Print the training progress. As the iterations proceed, the loss keeps decreasing:

    # Print the progress
    print('\nITERATION', step+1)
    print('W =', sess.run(W)[0])
    print('b =', sess.run(b)[0])
    print('loss =', sess.run(loss))

Plot the generated data and overlay the predicted model on top. In this case, the model is a line:

    # Plot the input data
    plt.plot(x_data, y_data, 'ro')
    # Plot the predicted output line
    plt.plot(x_data, sess.run(W) * x_data + sess.run(b))

Set the plot parameters:

    # Set plotting parameters
    plt.xlabel('Dimension 0')
    plt.ylabel('Dimension 1')
    plt.title('Iteration ' + str(step+1) + ' of ' + str(num_iterations))
    plt.show()

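The complete code is given in the file linear_regression.py. Running the code first shows a window with the input data:

Closing this window starts the training process. The first iteration looks like this:

As you can see, the line is completely off from the model. Close this window to move on to the next iteration:

The line looks better, but it is still off from the model. Close this window and keep iterating:

The line is getting closer to the true model. If you keep iterating like this, the model gets better. The eighth iteration looks as follows:

The line fits the data well. You will see the following on the terminal:

After training completes, you will see the following on the terminal: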

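Building an image classifier using a single-layer neural network

Let us see how to create a single-layer neural network with TensorFlow and use it to build an image classifier. The system uses the MNIST image dataset, a dataset of images of handwritten digits. The goal is to build a classifier that correctly identifies the digit in each image.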


Create a new Python program and import the following packages:

import argparse 
import tensorflow as tf 
from tensorflow.examples.tutorials.mnist import input_data

Define a function to parse the input arguments:

def build_arg_parser():
    parser = argparse.ArgumentParser(
        description='Build a classifier using MNIST data')
    parser.add_argument('--input-dir', dest='input_dir', type=str,
                        default='./mnist_data', help='Directory for storing data')
    return parser

Define the main function and parse the input arguments:

if __name__ == '__main__':
    args = build_arg_parser().parse_args()

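Extract the MNIST image data. The one_hot flag specifies that the labels use one-hot encoding. This means that if there are n classes, the label for a given data point is an array of length n. Each element of this array corresponds to a particular class. To specify a class, the value at the corresponding index is set to 1 and all other values are 0:

    # Get the MNIST data
    mnist = input_data.read_data_sets(args.input_dir, one_hot=True)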
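
The one-hot encoding described above can be sketched in a few lines of plain Python. The helper name `one_hot` is hypothetical; it only illustrates the label layout and is not how the MNIST loader is implemented.

```python
def one_hot(label, num_classes):
    """Return a list of length num_classes with a 1 at index `label`
    and 0 everywhere else."""
    encoded = [0] * num_classes
    encoded[label] = 1
    return encoded

# The digit 3 out of 10 possible digits
print(one_hot(3, 10))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```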

The images in the dataset are 28 x 28 pixels. Convert each one into a one-dimensional array to create the input layer:

    # The images are 28x28, so create the input layer
    # with 784 neurons (28x28=784)
    x = tf.placeholder(tf.float32, [None, 784])

Create a single-layer neural network with weights and biases. There are 10 distinct digits in the dataset. The input layer has 784 neurons and the output layer has 10:

    # Create a layer with weights and biases. There are 10 distinct
    # digits, so the output layer should have 10 classes
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))

Create the equation used for training:

    # Create the equation for 'y' using y = W*x + b
    y = tf.matmul(x, W) + b

Define the loss function and the gradient descent optimizer:

    # Define the entropy loss and the gradient descent optimizer
    y_loss = tf.placeholder(tf.float32, [None, 10])
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=y_loss, logits=y))
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
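
For intuition about what a softmax cross-entropy loss measures, the following plain-Python sketch computes it for a single example. The three-class scores and the helper names `softmax` and `cross_entropy` are made up for illustration; the TensorFlow op works on whole batches and is implemented differently.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest score avoids overflow in exp()
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot_label, probabilities):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_label, probabilities))

logits = [2.0, 1.0, 0.1]  # made-up scores for 3 classes
label = [1, 0, 0]         # the true class is index 0
probs = softmax(logits)
loss = cross_entropy(label, probs)

print(probs)
print(loss)
```

The loss shrinks toward 0 as the probability of the true class approaches 1, which is why minimizing it pushes the network toward correct predictions.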

Initialize all the variables:

    # Initialize all the variables
    init = tf.global_variables_initializer()

Create a TensorFlow session and run the initializer:

    # Create a session
    session = tf.Session()
    session.run(init)

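Start the training process. Training proceeds in batches: run the optimizer on the current batch, then move on to the next batch for the next iteration. The first step in each iteration is to get the next batch of images to train on:

    # Start training
    num_iterations = 1200
    batch_size = 90
    for _ in range(num_iterations):
        # Get the next batch of images
        x_batch, y_batch = mnist.train.next_batch(batch_size)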

Run the optimizer on this batch of images:

        # Train on this batch of images
        session.run(optimizer, feed_dict={x: x_batch, y_loss: y_batch})

After the training process ends, compute the accuracy using the test dataset:

    # Compute the accuracy using test data
    predicted = tf.equal(tf.argmax(y, 1), tf.argmax(y_loss, 1))
    accuracy = tf.reduce_mean(tf.cast(predicted, tf.float32))
    print('\nAccuracy =', session.run(accuracy, feed_dict={
        x: mnist.test.images,
        y_loss: mnist.test.labels}))
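
The accuracy computation compares the index of the highest score with the index of the 1 in the one-hot label. A plain-Python sketch of the same idea follows, using made-up scores for four images and three classes; the `argmax` helper is hypothetical.

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up class scores and one-hot labels for 4 images, 3 classes
scores = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.3, 0.4, 0.9], [0.8, 0.1, 0.2]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

# A prediction is correct when both argmax indices agree
correct = [argmax(s) == argmax(t) for s, t in zip(scores, labels)]
accuracy = sum(correct) / len(correct)

print(correct)   # [True, True, False, True]
print(accuracy)  # 0.75
```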

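The complete code is given in the file single_layer.py. Running it downloads the data into a folder named mnist_data inside the current folder. This is the default; to change it, pass a different folder with the --input-dir argument. After running the code, the terminal shows the following output:

As printed on the terminal, the accuracy of the model is 92.1%.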

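Building an image classifier using a convolutional neural network

The image classifier in the previous section did not perform well: reaching 92.1% on the MNIST dataset is relatively easy. How can a convolutional neural network (CNN) achieve higher accuracy? This section builds an image classifier on the same dataset, using a CNN instead of a single-layer neural network.

Create a new Python program and import the following packages: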

import argparse 
import tensorflow as tf 
from tensorflow.examples.tutorials.mnist import input_data

Define a function to parse the input arguments:

def build_arg_parser():
    parser = argparse.ArgumentParser(
        description='Build a CNN classifier using MNIST data')
    parser.add_argument('--input-dir', dest='input_dir', type=str,
                        default='./mnist_data', help='Directory for storing data')
    return parser

Define a function to create the values for the weights in each layer:

def get_weights(shape):
    data = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(data)
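
As documented for TensorFlow 1.x, tf.truncated_normal draws from a normal distribution but drops and re-picks any value more than two standard deviations from the mean. The plain-Python sketch below imitates that behaviour with a simple retry loop; the function is a hypothetical stand-in written for illustration, not TensorFlow's implementation.

```python
import random

def truncated_normal(stddev, rng):
    """Draw from a zero-mean normal distribution, re-drawing any value
    that falls more than two standard deviations from the mean."""
    while True:
        value = rng.gauss(0.0, stddev)
        if abs(value) <= 2.0 * stddev:
            return value

rng = random.Random(0)  # fixed seed, for a repeatable sketch
samples = [truncated_normal(0.1, rng) for _ in range(1000)]

# Every sample lies within [-0.2, 0.2] when stddev is 0.1
print(min(samples), max(samples))
```

Starting from small, bounded weights like these avoids a few extreme initial values dominating the early training steps.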

Define a function to create the values for the biases in each layer:

def get_biases(shape):
    data = tf.constant(0.1, shape=shape)
    return tf.Variable(data)

Define a function to create a layer based on the input shape:

def create_layer(shape):
    # Get the weights and biases
    W = get_weights(shape)
    b = get_biases([shape[-1]])

    return W, b

Define a function to perform 2D convolution:

def convolution_2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1],
                        padding='SAME')

Define a function to perform a 2x2 max pooling operation:

def max_pooling(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

Define the main function and parse the input arguments:

if __name__ == '__main__':
    args = build_arg_parser().parse_args()

Extract the MNIST image data:

    # Get the MNIST data
    mnist = input_data.read_data_sets(args.input_dir, one_hot=True)

Create the input layer with 784 neurons:

    # The images are 28x28, so create the input layer
    # with 784 neurons (28x28=784)
    x = tf.placeholder(tf.float32, [None, 784])

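The CNN makes use of the 2D structure of the image, so reshape x into a 4D tensor in which the second and third dimensions specify the image dimensions:

    # Reshape 'x' into a 4D tensor
    x_image = tf.reshape(x, [-1, 28, 28, 1])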

Create the first convolutional layer, which extracts 32 features for each 5x5 patch of the image:

    # Define the first convolutional layer
    W_conv1, b_conv1 = create_layer([5, 5, 1, 32])

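Convolve the image with the weight tensor created in the previous step, then add the bias tensor. Then apply the rectified linear unit (ReLU) function to the output:

    # Convolve the image with weight tensor, add the
    # bias, and then apply the ReLU function
    h_conv1 = tf.nn.relu(convolution_2d(x_image, W_conv1) + b_conv1)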

Apply the 2x2 max pooling operator to the output of the previous step:

    # Apply the max pooling operator
    h_pool1 = max_pooling(h_conv1)

Create the second convolutional layer, which computes 64 features for each 5x5 patch:

    # Define the second convolutional layer
    W_conv2, b_conv2 = create_layer([5, 5, 32, 64])

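Convolve the output of the previous layer with the weight tensor created in the previous step, then add the bias tensor. Then apply the rectified linear unit (ReLU) function to the output:

    # Convolve the output of previous layer with the
    # weight tensor, add the bias, and then apply
    # the ReLU function
    h_conv2 = tf.nn.relu(convolution_2d(h_pool1, W_conv2) + b_conv2)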

Apply the 2x2 max pooling operator to the output of the previous step:

    # Apply the max pooling operator
    h_pool2 = max_pooling(h_conv2)

The image size has now been reduced to 7x7. Create a fully connected layer with 1024 neurons:

    # Define the fully connected layer
    W_fc1, b_fc1 = create_layer([7 * 7 * 64, 1024])
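
The 7x7 figure follows from the strides used so far. With 'SAME' padding, the output length along each spatial dimension is the input length divided by the stride, rounded up. The sketch below traces the image size through the two convolution and pooling stages; the helper name `same_padding_output` is hypothetical.

```python
import math

def same_padding_output(size, stride):
    """Output length of a 'SAME'-padded operation: ceil(size / stride)."""
    return math.ceil(size / stride)

size = 28
size = same_padding_output(size, 1)  # conv1, stride 1 -> 28
size = same_padding_output(size, 2)  # pool1, stride 2 -> 14
size = same_padding_output(size, 1)  # conv2, stride 1 -> 14
size = same_padding_output(size, 2)  # pool2, stride 2 -> 7

# conv2 produces 64 feature maps, each size x size
flattened = size * size * 64

print(size, flattened)  # 7 3136
```

This is where the 7 * 7 * 64 input width of the fully connected layer comes from.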

Reshape the output of the previous layer:

    # Reshape the output of the previous layer
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])

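Multiply the output of the previous layer by the weight tensor of the fully connected layer, then add the bias tensor. Then apply the rectified linear unit (ReLU) function to the output:

    # Multiply the output of previous layer by the
    # weight tensor, add the bias, and then apply
    # the ReLU function
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)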

To reduce overfitting, create a dropout layer. Create a TensorFlow placeholder for the probability value, which specifies the probability that a neuron's output is kept during dropout:

    # Define the dropout layer using a probability placeholder
    # for all the neurons
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
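
A rough plain-Python picture of dropout with a keep probability follows. It assumes the common "inverted dropout" convention, in which surviving values are scaled by 1 / keep_prob so that the expected output stays the same; the `dropout` helper here is a hypothetical illustration, not the TensorFlow implementation.

```python
import random

def dropout(activations, keep_prob, rng):
    """Keep each value with probability keep_prob, scaling survivors by
    1 / keep_prob; every other value is set to 0."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)  # fixed seed, for a repeatable sketch
activations = [1.0] * 10000
dropped = dropout(activations, 0.5, rng)

zeroed = sum(1 for d in dropped if d == 0.0)
mean_output = sum(dropped) / len(dropped)
print(zeroed)       # roughly half of the 10000 values
print(mean_output)  # close to 1.0 thanks to the rescaling

# With keep_prob = 1.0 nothing is dropped
print(dropout([1.0, 2.0], 1.0, rng))  # [1.0, 2.0]
```

Setting the keep probability to 1.0 turns dropout off, which is what an evaluation pass needs.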

Define the readout layer with 10 output neurons, corresponding to the 10 classes in the dataset, and compute the output:

    # Define the readout layer (output layer)
    W_fc2, b_fc2 = create_layer([1024, 10])
    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

Define the loss function and the optimizer:

    # Define the entropy loss and the optimizer
    y_loss = tf.placeholder(tf.float32, [None, 10])
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=y_loss, logits=y_conv))
    optimizer = tf.train.AdamOptimizer(1e-4).minimize(loss)

Define how the accuracy is computed:

    # Define the accuracy computation
    predicted = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_loss, 1))
    accuracy = tf.reduce_mean(tf.cast(predicted, tf.float32))

Create a session, initialize the variables, and run the initializer:

    # Create and run a session
    sess = tf.InteractiveSession()
    init = tf.global_variables_initializer()
    sess.run(init)

Start the training process:

    # Start training
    num_iterations = 21000
    batch_size = 75
    print('\nTraining the model...')
    for i in range(num_iterations):
        # Get the next batch of images
        batch = mnist.train.next_batch(batch_size)

Print the accuracy every 50 iterations:

        # Print progress
        if i % 50 == 0:
            cur_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_loss: batch[1], keep_prob: 1.0})
            print('Iteration', i, ', Accuracy =', cur_accuracy)

Run the optimizer on the current batch:

        # Train on the current batch
        optimizer.run(feed_dict={x: batch[0], y_loss: batch[1], keep_prob: 0.5})

After training ends, compute the accuracy using the test dataset:

    # Compute accuracy using test data
    print('Test accuracy =', accuracy.eval(feed_dict={
        x: mnist.test.images, y_loss: mnist.test.labels,
        keep_prob: 1.0}))

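Running the code gives the following output on the terminal:

As the iterations continue, the accuracy keeps increasing, as shown in the following screenshot:

With the output in hand, we can see that the accuracy of the convolutional neural network is far higher than that of the simple neural network.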
