In my previous post, “Setting up Deep Learning in Windows : Installing Keras with Tensorflow-GPU”, I ran cifar-10.py, an object recognition task that uses a shallow 3-layer convolutional neural network (CNN) on the CIFAR-10 image dataset. It achieved 76% accuracy.
In this blog post, we will demonstrate how to achieve 90% accuracy on the CIFAR-10 object recognition task with the help of the following concepts:
1. Deep Network Architecture
2. Data Augmentation
CIFAR-10 Task – Object Recognition in Images
CIFAR-10 is an established computer-vision dataset used for object recognition. It consists of 60,000 color images of size 32×32 in 10 classes, with 6,000 images per class. The official data is split into 50,000 training images and 10,000 test images. The label classes in the dataset are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.
The classes are completely mutually exclusive. The dataset was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Let us visualize a few images from the test set using the Python snippet given below.
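from matplotlib import pyplot
from keras.datasets import cifar10

def show_imgs(X):
    # display the first 16 images of X in a 4x4 grid
    pyplot.figure(1)
    k = 0
    for i in range(0, 4):
        for j in range(0, 4):
            pyplot.subplot2grid((4, 4), (i, j))
            # scipy.misc.toimage was removed in SciPy 1.2, so rescale
            # each image to [0, 1] and let imshow draw it directly
            img = X[k].astype('float32')
            pyplot.imshow((img - img.min()) / (img.max() - img.min() + 1e-7))
            k = k + 1
    # show the plot
    pyplot.show()

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
show_imgs(x_test[:16])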
The output plot would look like this:
1. Deep Neural Network Architecture
We saw previously that the shallow architecture achieved only 76% accuracy. So the idea here is to build a deep neural architecture, as opposed to the shallow one, which was not able to learn the features of objects accurately.
The advantage of multiple layers is that they can learn features at various levels of abstraction. For example, if you train a deep CNN to classify images, you will find that the first layer trains itself to recognize very basic things like edges, the next layer recognizes collections of edges such as shapes, the next recognizes collections of shapes like wheels, legs, tails and faces, and the next learns even higher-order features like whole objects (truck, ship, dog, frog etc.). Multiple layers are much better at generalizing because they learn all the intermediate features between the raw input and the high-level classification. At the same time, a few important aspects need to be taken care of to prevent over-fitting. Deep CNNs are harder to train because:
a) The data requirement increases as the network becomes deeper.
b) Regularization becomes important as the number of parameters (weights) increases, so that the weights move from memorizing features towards generalizing them.
Having said that, we will build a 6-layer convolutional neural network followed by a flatten layer. The output layer is a dense layer of 10 nodes (as there are 10 classes) with softmax activation. Here is a model summary:
Layer (type)             Output Shape          Param #
conv2d_1 (Conv2D)        (None, 32, 32, 32)    896
batch_normalization_1    (None, 32, 32, 32)    128
conv2d_2 (Conv2D)        (None, 32, 32, 32)    9248
batch_normalization_2    (None, 32, 32, 32)    128
max_pooling2d_1          (None, 16, 16, 32)    0
dropout_1 (Dropout)      (None, 16, 16, 32)    0
conv2d_3 (Conv2D)        (None, 16, 16, 64)    18496
batch_normalization_3    (None, 16, 16, 64)    256
conv2d_4 (Conv2D)        (None, 16, 16, 64)    36928
batch_normalization_4    (None, 16, 16, 64)    256
max_pooling2d_2          (None, 8, 8, 64)      0
dropout_2 (Dropout)      (None, 8, 8, 64)      0
conv2d_5 (Conv2D)        (None, 8, 8, 128)     73856
batch_normalization_5    (None, 8, 8, 128)     512
conv2d_6 (Conv2D)        (None, 8, 8, 128)     147584
batch_normalization_6    (None, 8, 8, 128)     512
max_pooling2d_3          (None, 4, 4, 128)     0
dropout_3 (Dropout)      (None, 4, 4, 128)     0
flatten_1 (Flatten)      (None, 2048)          0
dense_1 (Dense)          (None, 10)            20490
Total params: 309,290
Trainable params: 308,394
Non-trainable params: 896
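The parameter counts in the summary can be checked by hand. The sketch below is plain Python arithmetic, not a Keras call; the helper names are made up for illustration. It assumes 3×3 kernels with a bias per filter, and four values per channel for batch normalization, of which two (the moving mean and variance) are not trainable.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # one kernel_h x kernel_w x in_channels kernel per filter, plus one bias per filter
    return kernel_h * kernel_w * in_channels * filters + filters

def batchnorm_params(channels):
    # (trainable, non-trainable): gamma and beta are learned,
    # the moving mean and moving variance are only tracked
    return 2 * channels, 2 * channels

def dense_params(inputs, units):
    # one weight per input-unit pair, plus one bias per unit
    return inputs * units + units

trainable = 0
non_trainable = 0
in_channels = 3  # RGB input
for filters in (32, 32, 64, 64, 128, 128):
    trainable += conv2d_params(3, 3, in_channels, filters)
    bn_trainable, bn_fixed = batchnorm_params(filters)
    trainable += bn_trainable
    non_trainable += bn_fixed
    in_channels = filters

# after three 2x2 poolings the 32x32 input is 4x4 with 128 channels
trainable += dense_params(4 * 4 * 128, 10)

print(trainable + non_trainable, trainable, non_trainable)
# 309290 308394 896
```

The 896 non-trainable parameters are exactly the batch-normalization moving statistics, two per channel across the six normalization layers.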
Building a convolutional neural network involves four major blocks, shown below.
Convolution layer ==> Pooling layer ==> Flattening layer ==> Dense/Output layer
A convolution layer here is a set of three operations: convolution, activation and batch normalization. Sometimes a Dropout layer is placed after pooling for regularization. Also, multiple dense layers can be placed after the flattening layer, before the final output dense layer. These are some general trends I have come across in designing CNN architectures.
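To see how these blocks shape the data, here is a small sketch that traces the tensor shape through the network described in the summary. It is plain Python rather than Keras, the helper names are hypothetical, and it assumes 'same' padding with stride 1 for convolutions and 2×2 max pooling.

```python
def conv_same(shape, filters):
    # 'same' padding with stride 1 keeps height and width; only the channel count changes
    height, width, _ = shape
    return (height, width, filters)

def max_pool(shape, pool=2):
    # non-overlapping pooling divides height and width by the pool size
    height, width, channels = shape
    return (height // pool, width // pool, channels)

shape = (32, 32, 3)  # one CIFAR-10 image
shapes = []
for filters in (32, 64, 128):
    shape = conv_same(shape, filters)
    shape = conv_same(shape, filters)
    shape = max_pool(shape)
    shapes.append(shape)

flat_units = shape[0] * shape[1] * shape[2]
print(shapes, flat_units)
# [(16, 16, 32), (8, 8, 64), (4, 4, 128)] 2048
```

Activation, batch normalization and dropout leave the shape unchanged, which is why they are skipped here.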
2. Data Augmentation
Keras provides an ImageDataGenerator class that generates batches of tensor image data with real-time data augmentation. The data is looped over (in batches) indefinitely. The image data is generated by transforming the actual training images through rotation, cropping, shifts, shear, zoom, flips, reflection, normalization etc. The code snippet below shows how to initialize the image data generator class.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=90,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)
datagen.fit(x_train)
To learn more about all the possible arguments (transformations) and methods of this class, refer to the Keras documentation. For example, the flow(x, y) method takes numpy data and label arrays and yields batches of augmented/normalized data in an infinite loop. Below is a Python snippet for visualizing images generated by the flow method of the ImageDataGenerator class.
from matplotlib import pyplot as plt

# Configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(x_train, y_train, batch_size=9):
    # Show 9 images in a 3x3 grid
    for i in range(0, 9):
        plt.subplot(330 + 1 + i)
        # each image is already 32x32x3; rescale it to [0, 1] for imshow
        # (scipy.misc.toimage is no longer available in SciPy 1.2+)
        img = X_batch[i]
        plt.imshow((img - img.min()) / (img.max() - img.min() + 1e-7))
    # show the plot
    plt.show()
    break
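To make the idea of an endless, randomly transformed batch stream concrete, here is a minimal NumPy sketch. It is not how ImageDataGenerator is implemented: the function name, the toy data and the wrap-around shift are assumptions made for illustration, and only horizontal flips and horizontal shifts are covered.

```python
import numpy as np

def augmented_batches(x, y, batch_size, max_shift=3, seed=0):
    """Yield shuffled batches forever, randomly flipping and shifting each image."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    while True:  # loop over the data indefinitely, one reshuffle per pass
        order = rng.permutation(n)
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            batch = x[idx].copy()  # never modify the original images
            for i in range(batch_size):
                if rng.random() < 0.5:
                    batch[i] = batch[i][:, ::-1, :].copy()  # horizontal flip
                dx = int(rng.integers(-max_shift, max_shift + 1))
                # horizontal shift; np.roll wraps pixels around, a simplification
                batch[i] = np.roll(batch[i], dx, axis=1)
            yield batch, y[idx]

# toy stand-in for CIFAR-10: 20 random 32x32 RGB images with 10 labels
x_demo = np.random.default_rng(1).integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)
y_demo = np.arange(20) % 10

gen = augmented_batches(x_demo, y_demo, batch_size=8)
xb, yb = next(gen)
print(xb.shape, yb.shape)  # (8, 32, 32, 3) (8,)
```

Because every pass draws fresh random transformations, the network rarely sees exactly the same image twice, which is what makes augmentation act like extra training data.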
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Given below are a few techniques that were proposed fairly recently and have become the general norm in convolutional neural networks.
Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. Reducing the number of active parameters at each training step has a regularizing effect. Dropout has shown improvements in the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
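The mechanism fits in a few lines of NumPy. This is a sketch of the common "inverted dropout" formulation, in which surviving units are rescaled during training so that nothing needs to change at inference time; the function name and toy input are illustrative assumptions, not Keras internals.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the survivors."""
    if not training or rate == 0.0:
        return activations  # identity at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True for units that survive
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((1000, 64), dtype=np.float32)
dropped = dropout(a, rate=0.2, training=True, rng=rng)

print(np.mean(dropped == 0.0))  # fraction of dropped units, close to 0.2
print(dropped.mean())           # expected activation is preserved, close to 1.0
```

A rate of 0.2 corresponds to Dropout(0.2) in Keras, where the argument is the fraction of units to drop.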
The kernel_regularizer argument lets you apply penalties on layer parameters during optimization. These penalties are incorporated into the loss function that the network optimizes. In the convolutional layers used here, this argument is simply L2 regularization of the weights. It penalizes peaky weights and makes sure that all the inputs are considered. During a gradient descent parameter update, L2 regularization means that every weight is shrunk in proportion to its own value, which is why it is also called weight decay.
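The link between the L2 penalty and weight decay can be verified numerically. The sketch below assumes a penalty of the form weight_decay * sum(w**2) added to the loss, and a plain gradient descent step; the function name and the numbers are made up for illustration. With adaptive optimizers such as RMSprop the correspondence is only approximate.

```python
import numpy as np

def sgd_step_with_l2(w, grad_data, lr, weight_decay):
    # penalty = weight_decay * sum(w**2), so its gradient is 2 * weight_decay * w
    return w - lr * (grad_data + 2.0 * weight_decay * w)

w = np.array([1.0, -2.0, 0.5])
grad_data = np.array([0.1, 0.0, -0.3])  # gradient of the data loss alone
lr, weight_decay = 0.001, 1e-4

updated = sgd_step_with_l2(w, grad_data, lr, weight_decay)

# equivalent form: shrink every weight by a constant factor, then take the usual step
decayed = (1.0 - 2.0 * lr * weight_decay) * w - lr * grad_data

print(np.allclose(updated, decayed))  # True
```

The shrink factor (1 - 2 * lr * weight_decay) is the same for every weight, so larger weights lose more in absolute terms at each step.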
BatchNormalization normalizes the activations of the previous layer at each batch, i.e. it applies a transformation that keeps the mean activation close to 0 and the activation standard deviation close to 1. It addresses the problem of internal covariate shift. It also acts as a regularizer, in some cases eliminating the need for Dropout. Batch normalization achieves the same accuracy with fewer training steps, thus speeding up the training process.
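A training-mode forward pass of batch normalization for channels-last feature maps might look like the following NumPy sketch. It covers only the per-batch normalization and the learned scale and shift (gamma and beta); the moving averages used at inference are left out, and the function name and toy data are assumptions for illustration.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each channel over the batch and spatial axes, then scale and shift."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps guards against division by zero
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# a batch of 64 feature maps of size 8x8 with 32 channels, far from zero mean
x = rng.normal(loc=5.0, scale=3.0, size=(64, 8, 8, 32))

out = batch_norm_train(x, gamma=np.ones(32), beta=np.zeros(32))
print(out.mean(axis=(0, 1, 2))[:3])  # each close to 0
print(out.std(axis=(0, 1, 2))[:3])   # each close to 1
```

With gamma set to 1 and beta to 0 the output is simply the standardized input; during training the network is free to learn other values for both.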
Having gone through a brief description of all the concepts we are going to use, let’s look at the full Python implementation of the object recognition task on the CIFAR-10 dataset.
import keras
from keras.models import Sequential
from keras.utils import np_utils
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Dense, Activation, Flatten, Dropout, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D
from keras.datasets import cifar10
from keras import regularizers
from keras.callbacks import LearningRateScheduler
import numpy as np

def lr_schedule(epoch):
    # step schedule; the larger threshold is checked first so both steps are reachable
    lrate = 0.001
    if epoch > 100:
        lrate = 0.0003
    elif epoch > 75:
        lrate = 0.0005
    return lrate

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# z-score normalization with training-set statistics
mean = np.mean(x_train, axis=(0,1,2,3))
std = np.std(x_train, axis=(0,1,2,3))
x_train = (x_train-mean)/(std+1e-7)
x_test = (x_test-mean)/(std+1e-7)

# one-hot encode the labels
num_classes = 10
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)

weight_decay = 1e-4
model = Sequential()

# block 1: two 32-filter convolutions
model.add(Conv2D(32, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay), input_shape=x_train.shape[1:]))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(32, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

# block 2: two 64-filter convolutions
model.add(Conv2D(64, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(64, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.3))

# block 3: two 128-filter convolutions
model.add(Conv2D(128, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(128, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))

# classifier
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))

model.summary()

# data augmentation
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    )
datagen.fit(x_train)

# training
batch_size = 64

opt_rms = keras.optimizers.RMSprop(lr=0.001, decay=1e-6)
model.compile(loss='categorical_crossentropy', optimizer=opt_rms, metrics=['accuracy'])
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=x_train.shape[0] // batch_size, epochs=125,
                    verbose=1, validation_data=(x_test, y_test),
                    callbacks=[LearningRateScheduler(lr_schedule)])

# save to disk
model_json = model.to_json()
with open('model.json', 'w') as json_file:
    json_file.write(model_json)
model.save_weights('model.h5')

# testing: evaluate() returns [loss, accuracy]
scores = model.evaluate(x_test, y_test, batch_size=128, verbose=1)
print('\nTest result: %.3f loss: %.3f' % (scores[1]*100, scores[0]))
The output of the above Python implementation of the object recognition task is shown below:
As you can see, accuracy on the test set reached approximately 90%. One can fine-tune the model further and train it for more epochs to go past 90%.
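When training for more epochs, the learning-rate schedule is one natural thing to adjust. Here is a minimal, table-driven sketch of a step schedule using the same three rates as the training script (0.001, 0.0005 and 0.0003). The function name and the milestones argument are hypothetical; the thresholds are listed largest first so that every step is reachable.

```python
def step_lr(epoch, base=0.001, milestones=((100, 0.0003), (75, 0.0005))):
    # milestones are (threshold, rate) pairs, ordered from largest threshold to smallest
    for threshold, rate in milestones:
        if epoch > threshold:
            return rate
    return base

print([step_lr(e) for e in (0, 75, 76, 100, 101, 124)])
# [0.001, 0.001, 0.0005, 0.0005, 0.0003, 0.0003]
```

A function with this signature (epoch in, rate out) is what a Keras LearningRateScheduler callback expects, so extending the run only needs extra entries in the table.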
More on Results
Let’s look at a few images from the test set and the object classes the trained CNN predicts for them. We will call the show_imgs(X) function defined in the first section, “CIFAR-10 Task – Object Recognition in Images”, to display 16 images in a 4×4 grid. The trained CNN model is then loaded from disk and used to predict the object class of the first 16 test images.
The images must be z-score (mean-std) normalized, because that is how they were preprocessed during training. Z-score normalization matters because it puts the feature values in a similar range, which keeps the gradients from going out of control and lets a single global learning rate work for all weights.
import numpy as np
from keras.datasets import cifar10
from keras.models import model_from_json

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# mean-std normalization
mean = np.mean(x_train, axis=(0,1,2,3))
std = np.std(x_train, axis=(0,1,2,3))
x_train = (x_train-mean)/(std+1e-7)
x_test = (x_test-mean)/(std+1e-7)

show_imgs(x_test[:16])

# Load trained CNN model
with open('model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
model = model_from_json(loaded_model_json)
model.load_weights('model.h5')

labels = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']

# pick the class with the highest softmax probability for each image
indices = np.argmax(model.predict(x_test[:16]), 1)
print([labels[x] for x in indices])
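The normalization step is worth isolating, since the same training-set statistics have to be reused at prediction time. Below is a small NumPy sketch on random stand-in data rather than CIFAR-10; the helper names zscore_fit and zscore_apply are made up for illustration.

```python
import numpy as np

def zscore_fit(x_train):
    # one global mean and standard deviation over all images, pixels and channels
    return x_train.mean(), x_train.std()

def zscore_apply(x, mean, std, eps=1e-7):
    # eps avoids division by zero for constant inputs
    return (x - mean) / (std + eps)

rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(100, 32, 32, 3)).astype('float32')
test = rng.integers(0, 256, size=(20, 32, 32, 3)).astype('float32')

mean, std = zscore_fit(train)
train_n = zscore_apply(train, mean, std)
test_n = zscore_apply(test, mean, std)  # reuse the training statistics, never refit on test data

print(abs(float(train_n.mean())) < 1e-3, abs(float(train_n.std()) - 1.0) < 1e-3)
# True True
```

The test set ends up only approximately zero-mean and unit-variance, which is expected: it is scaled the way the network saw its training data, not by its own statistics.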
I hope the tutorial was easy to follow, as I have tried to keep it short and simple. Beginners who are interested in convolutional neural networks can start with this application. In short, you have learnt how to implement the following concepts with Python and Keras.
- Plotting images with matplotlib.
- Z-score (mean-std normalization) of images.
- Building a deep Convolutional Neural Network.
- Applying batch normalization.
- Regularization : Dropout & Kernel regularizers.
- Data Augmentation : ImageDataGenerator in Keras.
- Saving & Loading DNN models (JSON format).
The full Python implementation of the object recognition task with ~90% accuracy on the CIFAR-10 dataset is available on GitHub.
If you liked the post, follow this blog to get updates about the upcoming articles. Also, share this article so that it can reach out to the readers who can actually gain from this. Please feel free to discuss anything regarding the post. I would love to hear feedback from you.
Happy deep learning 🙂