In my previous post “Setting up Deep Learning in Windows : Installing Keras with Tensorflow-GPU”, I ran cifar-10.py, an object recognition task using a shallow 3-layer convolutional neural network (CNN) on the CIFAR-10 image dataset, and achieved 76% accuracy.

In this blog post, we will demonstrate how to achieve 90% accuracy in the object recognition task on the CIFAR-10 dataset with the help of the following concepts:
1. Deep Network Architecture
2. Data Augmentation
3. Regularization

CIFAR-10 Task – Object Recognition in Images

CIFAR-10 is an established computer-vision dataset used for object recognition. The CIFAR-10 data consists of 60,000 (32×32) color images in 10 classes, with 6000 images per class. There are 50,000 training images and 10,000 test images in the official data. The label classes in the dataset are:

  • airplane
  • automobile
  • bird
  • cat
  • deer
  • dog
  • frog
  • horse
  • ship
  • truck

The classes are completely mutually exclusive. The dataset was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Let us visualize a few of the test-set images using the Python snippet below.

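from matplotlib import pyplot
from keras.datasets import cifar10

def show_imgs(X):
    # plot the first 16 images of X in a 4x4 grid
    pyplot.figure(1)
    k = 0
    for i in range(0,4):
        for j in range(0,4):
            pyplot.subplot2grid((4,4),(i,j))
            # scipy.misc.toimage was removed in SciPy 1.2; instead, rescale
            # each image to [0, 1] so that both raw uint8 pixels and
            # z-score normalized floats are displayed correctly
            img = X[k].astype('float32')
            img = (img - img.min()) / (img.max() - img.min() + 1e-7)
            pyplot.imshow(img)
            pyplot.axis('off')
            k = k+1
    # show the plot
    pyplot.show()

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
show_imgs(x_test[:16])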

The output plot would look like this:

[Figure: the first 16 CIFAR-10 test images in a 4×4 grid]

1. Deep Neural Network Architecture

We saw previously that the shallow architecture achieved only 76% accuracy. So the idea here is to build a deep neural architecture instead, since the shallow one was not able to learn the features of objects accurately.

The advantage of multiple layers is that they can learn features at various levels of abstraction. For example, if you train a deep CNN to classify images, you will typically find that the first layer learns to recognize very basic things like edges, the next layer learns collections of edges such as shapes, the next learns collections of shapes like wheels, legs, tails and faces, and later layers learn even higher-order features like whole objects (trucks, ships, dogs, frogs, etc.). Multiple layers tend to generalize better because they learn all the intermediate features between the raw input and the high-level classification. At the same time, a few important aspects need to be taken care of to prevent overfitting. Deep CNNs are harder to train because:

a) Data requirements increase as the network becomes deeper.
b) Regularization becomes more important as the number of parameters (weights) increases, to steer the learned weights away from memorizing features and towards generalizing them.

With that in mind, we will build a 6-layer convolutional neural network followed by a flatten layer. The output layer is a dense layer of 10 nodes (one per class) with softmax activation. Here is the model summary (the parameter-free Activation layers that follow each convolution are omitted):

_________________________________________________________________________
Layer (type)                                Output Shape          Param #
=========================================================================
conv2d_1 (Conv2D)                           (None, 32, 32, 32)    896
batch_normalization_1 (BatchNormalization)  (None, 32, 32, 32)    128
conv2d_2 (Conv2D)                           (None, 32, 32, 32)    9248
batch_normalization_2 (BatchNormalization)  (None, 32, 32, 32)    128
max_pooling2d_1 (MaxPooling2D)              (None, 16, 16, 32)    0
dropout_1 (Dropout)                         (None, 16, 16, 32)    0
conv2d_3 (Conv2D)                           (None, 16, 16, 64)    18496
batch_normalization_3 (BatchNormalization)  (None, 16, 16, 64)    256
conv2d_4 (Conv2D)                           (None, 16, 16, 64)    36928
batch_normalization_4 (BatchNormalization)  (None, 16, 16, 64)    256
max_pooling2d_2 (MaxPooling2D)              (None, 8, 8, 64)      0
dropout_2 (Dropout)                         (None, 8, 8, 64)      0
conv2d_5 (Conv2D)                           (None, 8, 8, 128)     73856
batch_normalization_5 (BatchNormalization)  (None, 8, 8, 128)     512
conv2d_6 (Conv2D)                           (None, 8, 8, 128)     147584
batch_normalization_6 (BatchNormalization)  (None, 8, 8, 128)     512
max_pooling2d_3 (MaxPooling2D)              (None, 4, 4, 128)     0
dropout_3 (Dropout)                         (None, 4, 4, 128)     0
flatten_1 (Flatten)                         (None, 2048)          0
dense_1 (Dense)                             (None, 10)            20490
=========================================================================
Total params: 309,290
Trainable params: 308,394
Non-trainable params: 896
_________________________________________________________________________
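The Param # column can be checked by hand. Assuming the standard Keras parameter layouts, a 3×3 Conv2D layer has 3·3·C_in·C_out weights plus C_out biases, a BatchNormalization layer stores four values per channel (gamma, beta, moving mean and moving variance, the last two not trainable), and a Dense layer has n_in·n_out weights plus n_out biases. The plain-Python sketch below recomputes the totals; the helper names are illustrative, not Keras functions.

```python
def conv2d_params(k, c_in, c_out):
    """k x k kernels: k*k*c_in*c_out weights plus one bias per output channel."""
    return k * k * c_in * c_out + c_out

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for every channel."""
    return 4 * channels

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

# (input channels, output channels) of the six 3x3 conv layers in the summary
conv_layers = [(3, 32), (32, 32), (32, 64), (64, 64), (64, 128), (128, 128)]

conv_total = sum(conv2d_params(3, c_in, c_out) for c_in, c_out in conv_layers)
bn_total = sum(batchnorm_params(c_out) for _, c_out in conv_layers)
dense_total = dense_params(4 * 4 * 128, 10)  # flattened 4x4x128 map -> 10 classes

total = conv_total + bn_total + dense_total
non_trainable = bn_total // 2  # moving mean and variance are not trained
trainable = total - non_trainable

print(total, trainable, non_trainable)  # 309290 308394 896
```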

Building a convolutional neural network mainly involves the four blocks shown below.

Convolution layer ==> Pooling layer ==> Flattening layer ==> Dense/Output layer

[Figure: Typical block diagram of a CNN]

Here, a convolution layer is a set of three operations: convolution, activation and batch normalization. A Dropout layer is sometimes placed after pooling for regularization. Multiple dense layers can also be placed after the flattening layer, before the final dense output layer. These are some general trends I have come across in designing CNN architectures.
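The shapes in the model summary follow from two rules, assuming padding='same' with stride 1 for the convolutions: each convolution keeps the height and width and only sets the number of channels, and each 2×2 max-pooling halves the height and width. The plain-Python sketch below (the function name is hypothetical) traces the three convolution/pooling stages and shows where the 2048 inputs of the flatten layer come from.

```python
def trace_shapes(input_hw=32, input_ch=3, stage_filters=(32, 64, 128), pool=2):
    """Follow (height, width, channels) through conv-conv-pool stages.

    Assumes 'same' padding with stride 1 (the convolutions preserve the
    spatial size) and non-overlapping pool x pool max-pooling.
    """
    h = w = input_hw
    c = input_ch
    shapes = []
    for filters in stage_filters:
        c = filters                    # two 'same' convolutions: only channels change
        shapes.append(("conv", (h, w, c)))
        h, w = h // pool, w // pool    # max-pooling halves height and width
        shapes.append(("pool", (h, w, c)))
    flat = h * w * c                   # Flatten collapses the last feature map
    return shapes, flat

shapes, flat = trace_shapes()
for name, shape in shapes:
    print(name, shape)
print("flatten", flat)  # flatten 2048
```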

2. Data Augmentation

Keras has an ImageDataGenerator class that generates batches of tensor image data with real-time data augmentation. The data is looped over (in batches) indefinitely. New images are generated by transforming the actual training images with rotations, shifts, shears, zooms, flips, normalization, etc. The code snippet below shows how to initialize the image data generator.

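from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rotation_range=90,
                             width_shift_range=0.1, height_shift_range=0.1,
                             horizontal_flip=True)
# fit() computes dataset statistics; it is only required when featurewise_center,
# featurewise_std_normalization or zca_whitening is enabled
datagen.fit(x_train)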
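To make the transformations less abstract, here is a NumPy-only sketch of two of them, a horizontal flip and a horizontal (width) shift, applied to a tiny fake image. This is an illustration, not ImageDataGenerator's actual implementation, which also handles rotation, interpolation and several fill modes; the function names are hypothetical, and the shift fills uncovered columns by repeating the edge column, similar in spirit to the default fill_mode='nearest'.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror an (H, W, C) image left-to-right."""
    return img[:, ::-1, :]

def shift_width(img, pixels):
    """Shift an (H, W, C) image right by `pixels` columns (left if negative),
    filling the uncovered columns by repeating the nearest edge column."""
    if pixels == 0:
        return img.copy()
    w = img.shape[1]
    if pixels > 0:
        padded = np.pad(img, ((0, 0), (pixels, 0), (0, 0)), mode="edge")
        return padded[:, :w, :]
    padded = np.pad(img, ((0, 0), (0, -pixels), (0, 0)), mode="edge")
    return padded[:, -pixels:, :]

def random_augment(img, max_shift_frac=0.1, rng=None):
    """Randomly flip and shift, loosely mimicking horizontal_flip=True and
    width_shift_range=0.1 (a shift of up to 10% of the width)."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    max_shift = int(max_shift_frac * img.shape[1])
    return shift_width(img, int(rng.integers(-max_shift, max_shift + 1)))

# A fake 2x4 RGB "image" whose pixel value equals its column index
img = np.tile(np.arange(4).reshape(1, 4, 1), (2, 1, 3))
print(horizontal_flip(img)[0, :, 0])  # [3 2 1 0]
print(shift_width(img, 1)[0, :, 0])   # [0 0 1 2]
print(shift_width(img, -1)[0, :, 0])  # [1 2 3 3]
```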

To learn about all the possible arguments (transformations) and methods of this class, refer to the Keras documentation. For example, the flow(x, y) method takes NumPy data and label arrays and generates batches of augmented/normalized data. It yields batches indefinitely, in an infinite loop. Below is a Python snippet for visualizing images generated with the flow method of the ImageDataGenerator class.

from matplotlib import pyplot as plt
# Retrieve one batch of 9 augmented images
for X_batch, y_batch in datagen.flow(x_train, y_train, batch_size=9):
    # Show the 9 images in a 3x3 grid
    for i in range(0, 9):
        plt.subplot(330 + 1 + i)
        # x_train holds raw 0-255 pixels, so the augmented (float) images
        # can be displayed after casting back to uint8
        plt.imshow(X_batch[i].astype('uint8'))
    # show the plot
    plt.show()
    # flow() loops forever, so stop after the first batch
    break

[Figure: 3×3 grid of augmented CIFAR-10 training images generated with flow()]
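Because flow() never stops on its own, the visualization loop above needs its break, and training needs steps_per_epoch to define how many batches make up an epoch. The NumPy sketch below shows that pattern with a hypothetical, simplified generator: it shuffles and batches the data forever but performs no augmentation.

```python
import numpy as np

def infinite_batches(x, y, batch_size, rng=None):
    """Yield shuffled (x, y) batches forever, reshuffling after each pass.

    A simplified stand-in for ImageDataGenerator.flow(): no augmentation,
    and the last, smaller batch of each pass is yielded as-is.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]

x = np.arange(10).reshape(10, 1)
y = np.arange(10)
gen = infinite_batches(x, y, batch_size=4, rng=np.random.default_rng(0))

# The generator never raises StopIteration; the caller decides when to stop
sizes = [len(next(gen)[0]) for _ in range(6)]
print(sizes)  # [4, 4, 2, 4, 4, 2]
```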

If you want more insight into, or visualizations of, data augmentation by image generation in Keras, you may like to read the blog posts here and here.

3. Regularization

Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Below are a few techniques that were proposed in recent years and have become standard practice in convolutional neural networks.

Dropout is one technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. Because each training step updates a different, randomly thinned network, units cannot rely on the presence of particular other units, which has a regularizing effect. Dropout has shown improvements in the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets [1].
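A minimal NumPy sketch of the idea, using the common "inverted dropout" formulation (to my knowledge also what Keras's Dropout layer does): during training each unit is zeroed with probability rate and the surviving units are scaled by 1/(1 - rate), so the expected activation is unchanged and the layer is simply the identity at test time. The function name is illustrative.

```python
import numpy as np

def dropout(x, rate, training, rng=None):
    """Inverted dropout: zero each unit with probability `rate` during training
    and scale the surviving units by 1 / (1 - rate); identity at inference."""
    if not training or rate == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep_mask = rng.random(x.shape) >= rate  # True for units that survive
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones((1000, 100))
y = dropout(x, rate=0.4, training=True, rng=rng)
print(float((y == 0).mean()))  # fraction of dropped units, close to 0.4
print(float(y.mean()))         # mean activation stays close to 1.0
print(np.array_equal(dropout(x, 0.4, training=False), x))  # True
```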

The kernel_regularizer argument applies penalties on layer parameters during optimization; these penalties are added to the loss function that the network optimizes. In the convolutional layers here, kernel_regularizer=regularizers.l2(weight_decay) is simply L2 regularization of the weights. It penalizes peaky weights and encourages the network to make use of all of its inputs rather than relying on a few. During the gradient-descent parameter update, the L2 term shrinks every weight towards zero in proportion to its value, which is why it is called weight decay.
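The "decay" can be made explicit. regularizers.l2(weight_decay) adds weight_decay · Σw² to the loss, whose gradient with respect to w is 2 · weight_decay · w. The NumPy sketch below, with made-up numbers and a hypothetical function name, checks that one plain gradient-descent step on the penalized loss equals first shrinking every weight by the factor (1 - 2 · lr · weight_decay) and then taking the ordinary step. The training code in this post uses RMSprop rather than plain gradient descent, so there the shrinkage is not exactly this factor.

```python
import numpy as np

def sgd_step_with_l2(w, grad_loss, lr, weight_decay):
    """One gradient-descent step on loss + weight_decay * sum(w**2)."""
    grad_penalty = 2.0 * weight_decay * w  # derivative of the L2 term
    return w - lr * (grad_loss + grad_penalty)

w = np.array([0.5, -2.0, 3.0])
grad_loss = np.array([0.1, 0.0, -0.2])  # made-up gradient of the data loss
lr, weight_decay = 0.1, 1e-4

updated = sgd_step_with_l2(w, grad_loss, lr, weight_decay)
# Same update written as "decay every weight, then take the usual step"
decayed = (1.0 - 2.0 * lr * weight_decay) * w - lr * grad_loss
print(np.allclose(updated, decayed))  # True
```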

BatchNormalization normalizes the activations of the previous layer over each batch, i.e. it applies a transformation that keeps the mean activation close to 0 and the activation standard deviation close to 1. It was proposed to address the problem of internal covariate shift. It also acts as a regularizer, in some cases eliminating the need for Dropout. In the original paper, batch normalization reached the same accuracy with fewer training steps, thus speeding up training [2].
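A minimal NumPy sketch of the training-time computation, assuming an (N, H, W, C) batch: each channel is standardized with the mean and variance taken over the batch and spatial axes, then scaled by a learned gamma and shifted by a learned beta. It leaves out the moving averages that Keras's BatchNormalization keeps for inference (the non-trainable parameters in the model summary). The function name and the eps value of 1e-3 (which I believe is the Keras default) are assumptions.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization for an (N, H, W, C) tensor.

    Each channel is normalized with the mean and variance computed over the
    batch and spatial axes, then scaled by gamma and shifted by beta.
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# A fake batch of activations with a shifted, stretched distribution
x = rng.normal(loc=5.0, scale=3.0, size=(64, 8, 8, 16))
gamma = np.ones(16)   # learned scale, starts at 1
beta = np.zeros(16)   # learned shift, starts at 0

y = batch_norm_train(x, gamma, beta)
print(y.mean(axis=(0, 1, 2))[:3])  # per-channel means, approximately 0
print(y.std(axis=(0, 1, 2))[:3])   # per-channel stds, approximately 1
```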

With these concepts briefly described, let's look at the full Python implementation of the object recognition task on the CIFAR-10 dataset.

import keras
from keras.models import Sequential
from keras.utils import to_categorical
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Dense, Activation, Flatten, Dropout, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D
from keras.datasets import cifar10
from keras import regularizers
from keras.callbacks import LearningRateScheduler
import numpy as np

def lr_schedule(epoch):
    # step schedule: test the later threshold first, otherwise the
    # epoch > 100 branch could never be reached
    lrate = 0.001
    if epoch > 100:
        lrate = 0.0003
    elif epoch > 75:
        lrate = 0.0005
    return lrate

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

#z-score
mean = np.mean(x_train,axis=(0,1,2,3))
std = np.std(x_train,axis=(0,1,2,3))
x_train = (x_train-mean)/(std+1e-7)
x_test = (x_test-mean)/(std+1e-7)

num_classes = 10
# one-hot encode the integer class labels
y_train = to_categorical(y_train,num_classes)
y_test = to_categorical(y_test,num_classes)

weight_decay = 1e-4
model = Sequential()
model.add(Conv2D(32, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay), input_shape=x_train.shape[1:]))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(32, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

model.add(Conv2D(64, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(64, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.3))

model.add(Conv2D(128, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(128, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))

model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))

model.summary()

#data augmentation
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    )
datagen.fit(x_train)

#training
batch_size = 64

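# RMSprop optimizer; newer Keras versions use learning_rate= instead of lr= and no
# longer accept the original decay=1e-6 argument, so it is omitted here
opt_rms = keras.optimizers.RMSprop(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=opt_rms, metrics=['accuracy'])
# fit() accepts the augmenting generator directly (fit_generator is deprecated)
model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
          steps_per_epoch=x_train.shape[0] // batch_size, epochs=125,
          verbose=1, validation_data=(x_test, y_test),
          callbacks=[LearningRateScheduler(lr_schedule)])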
#save to disk
model_json = model.to_json()
with open('model.json', 'w') as json_file:
    json_file.write(model_json)
model.save_weights('model.h5') 
 
#testing
scores = model.evaluate(x_test, y_test, batch_size=128, verbose=1)
print('\nTest accuracy: %.3f%%  loss: %.3f' % (scores[1]*100, scores[0]))

The output of the above Python implementation for the object recognition task is shown below:

[Figure: Test result on CIFAR-10 dataset (~90%)]

As you can see, the test-set accuracy reached approximately 90%. One can fine-tune the model further and train it for more epochs to try to go past 90%.

More on Results

Let’s look at a few images from the test set and the object classes the trained CNN predicts for them. We call the show_imgs(X) function defined in the first section, “CIFAR-10 Task – Object Recognition in Images”, to display 16 images in a 4×4 grid. Then the trained CNN model is loaded from disk and used to predict the object class of the first 16 test images.
The images must be z-score (mean-std) normalized with the training-set statistics, because that is how they were preprocessed during training. Z-score normalization matters because it puts the features in a similar range, which keeps the gradients under control and lets a single global learning rate work for all weights.
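The key point is that the statistics come from the training set only and are reused unchanged for any new images. A NumPy sketch, with random pixel values standing in for the CIFAR-10 arrays and a hypothetical function name:

```python
import numpy as np

def zscore_fit_transform(x_train, x_test, eps=1e-7):
    """Normalize both sets with the training set's global mean and std."""
    mean = np.mean(x_train, axis=(0, 1, 2, 3))  # one scalar over all pixels and channels
    std = np.std(x_train, axis=(0, 1, 2, 3))
    return (x_train - mean) / (std + eps), (x_test - mean) / (std + eps), mean, std

rng = np.random.default_rng(0)
# Random 0-255 pixel values standing in for CIFAR-10 images (N, 32, 32, 3)
x_train = rng.integers(0, 256, size=(500, 32, 32, 3)).astype("float32")
x_test = rng.integers(0, 256, size=(100, 32, 32, 3)).astype("float32")

x_train_n, x_test_n, mean, std = zscore_fit_transform(x_train, x_test)
print(float(x_train_n.mean()), float(x_train_n.std()))  # approximately 0 and 1
```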

import numpy as np
from keras.datasets import cifar10
from keras.models import model_from_json

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# display the first 16 raw test images (show_imgs is defined in the first section)
show_imgs(x_test[:16])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# mean-std normalization with the training-set statistics, as during training
mean = np.mean(x_train,axis=(0,1,2,3))
std = np.std(x_train,axis=(0,1,2,3))
x_train = (x_train-mean)/(std+1e-7)
x_test = (x_test-mean)/(std+1e-7)

# Load trained CNN model
with open('model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
model = model_from_json(loaded_model_json)
model.load_weights('model.h5')

labels = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']

indices = np.argmax(model.predict(x_test[:16]), 1)
print([labels[x] for x in indices])
Output:
['cat', 'ship', 'ship', 'airplane', 'frog', 'frog', 'automobile', 'frog', 'cat', 'automobile', 'airplane', 'truck', 'dog', 'horse', 'truck', 'ship']
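Mapping predictions to names is just an argmax over the ten class probabilities in each row. In isolation, with made-up softmax outputs standing in for model.predict:

```python
import numpy as np

labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
          'dog', 'frog', 'horse', 'ship', 'truck']

# Made-up softmax outputs for 3 images (each row sums to 1 over the 10 classes)
probs = np.full((3, 10), 0.02)
probs[0, 3] = 0.82  # most likely 'cat'
probs[1, 8] = 0.82  # most likely 'ship'
probs[2, 0] = 0.82  # most likely 'airplane'

indices = np.argmax(probs, axis=1)  # index of the highest score in each row
predicted = [labels[i] for i in indices]
print(predicted)  # ['cat', 'ship', 'airplane']
```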

References

[1] Dropout: A Simple Way to Prevent Neural Networks from Overfitting
[2] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Concluding Remarks

I hope the tutorial was easy to follow, as I have tried to keep it short and simple. Beginners who are interested in convolutional neural networks can start with this application. In short, you have learned how to implement the following concepts with Python and Keras:

  1. Plotting images with matplotlib.
  2. Z-score (mean-std normalization) of images.
  3. Building a deep Convolutional Neural Network.
  4. Applying batch normalization.
  5. Regularization : Dropout & Kernel regularizers.
  6. Data Augmentation : ImageDataGenerator in Keras.
  7. Saving & Loading DNN models (JSON format).

The full Python implementation of the object recognition task with ~90% accuracy on the CIFAR-10 dataset can be found on GitHub here.

If you liked the post, follow this blog to get updates about upcoming articles. Also, share this article so that it can reach readers who can benefit from it. Please feel free to discuss anything regarding the post; I would love to hear your feedback.

Happy deep learning 🙂
