In the previous blog-post, we demonstrated transfer learning using the feature-extraction technique: generating features with a pre-trained network and training a classifier on top of them. Another strategy for transfer learning is to not only replace and retrain the classifier on top of the Convolutional Neural Network (CNN) on the new data-set, but also to fine-tune the weights of the pre-trained network by continuing the back-propagation. There are two ways to do transfer learning.

  1. Feature extraction from a pre-trained model, then training a classifier on top of it.
  2. Fine-tuning a pre-trained model, keeping the learnt weights as initial parameters.

This blog-post showcases the implementation of transfer learning using the second way: fine-tuning a pre-trained model.

Transfer Learning Rules

Readers will find the following rules for transfer learning in numerous blogs on the internet.

Consider that the new data-set is similar to the original data-set used for pre-training. In this scenario, there are two cases.

  1. If the new data-set is very small, it’s better to train only the final layers of the network to avoid overfitting, keeping all other layers fixed. So remove the final layers of the pre-trained network, add new layers for the required classes, and retrain only the new layers after freezing all the others. This case is demonstrated in this blog-post.
  2. If the new data-set is large, retrain the whole network, initializing the weights from the pre-trained model.

Consider that the new data-set is very different from the original data-set. In this scenario, the following is recommended.

  1. If the new data-set is very small, it is good to fix the earlier layers and retrain the rest. The earlier layers of a ConvNet contain more generic features (e.g. edge detectors or color-blob detectors), while later layers become progressively more specific to the details of the classes in the original data-set. The earlier layers can therefore help to extract low-level descriptors of the new data (see the freezing sketch right after this list).
  2. If the new data-set has a large amount of data, we can retrain the whole network with weights initialized from the pre-trained network.
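Whichever case applies, the mechanics in Keras are the same: mark layers as frozen or trainable before compiling. Below is a minimal sketch of the pattern; the cut-off n_frozen is purely illustrative, not a recommendation.

from keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

n_frozen = 10  # hypothetical cut-off; pick it according to the rules above
for layer in base.layers[:n_frozen]:
    layer.trainable = False  # keep the generic early features fixed
for layer in base.layers[n_frozen:]:
    layer.trainable = True   # fine-tune the later, task-specific layers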

Problem Description

In this blog-post, we will use a data-set containing 16,643 food images grouped into 11 major food categories to demonstrate transfer learning. This is a food-image classification task. The 11 categories are:

  • Bread
  • Dairy product
  • Dessert
  • Egg
  • Fried food
  • Meat
  • Noodles/Pasta
  • Rice
  • Seafood
  • Soup
  • Vegetable/Fruit

The Food-11 data-set is divided into three parts: training, validation and evaluation. A file-naming convention is used in which the ID prefix 0-10 refers to the 11 food categories respectively (e.g. 9_1339.jpg is an image of class 9, Soup). The data-set can be downloaded from here.

Let’s start with the Python code for transfer learning with the fine-tuning technique.

  • Import Library
  • Reading Data
    • Create labels
    • Train, Validation and Test Distribution
    • Sample Images
  • Feature Extraction
  • Fine Tuning VGG16 : Transfer Learning
    • Fine Tuning Full Network
      • Loading chopped VGG16 Model
      • Building Model
      • Accuracy and Loss Plot
      • Test Evaluation
    • Fine Tuning Half Network
      • Load Half Tuned VGG16 Model
      • Building Model
      • Accuracy and Loss Plot
      • Test Evaluation

Import Library

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importing Keras libraries
from keras.utils import np_utils
from keras.optimizers import Adam
from keras.models import Sequential
from keras.applications import VGG16
from keras.callbacks import ModelCheckpoint
from keras.applications import imagenet_utils
from keras.preprocessing.image import load_img, img_to_array
from keras.layers import Input, Dense, Dropout, GlobalAveragePooling2D

# Importing sklearn libraries
from sklearn.metrics import confusion_matrix, accuracy_score

import warnings
warnings.filterwarnings('ignore')

Reading Food-11 Dataset

train = [os.path.join("Food-11/training",img) for img in os.listdir("Food-11/training")]
val = [os.path.join("Food-11/validation",img) for img in os.listdir("Food-11/validation")]
test = [os.path.join("Food-11/evaluation",img) for img in os.listdir("Food-11/evaluation")]
len(train),len(val),len(test)
(9866, 3430, 3347)

The numbers of images in the training, validation and test sets are 9866, 3430 and 3347 respectively.

train[0:5]
['Food-11/training/9_1339.jpg',
 'Food-11/training/2_1351.jpg',
 'Food-11/training/1_170.jpg',
 'Food-11/training/6_31.jpg',
 'Food-11/training/8_558.jpg']

Create labels

Here, we create labels for all the images from their file names and convert them into one-hot encoded vectors, as required by the final layer of the neural network.

train_y = [int(img.split("/")[-1].split("_")[0]) for img in train]
val_y = [int(img.split("/")[-1].split("_")[0]) for img in val]
test_y = [int(img.split("/")[-1].split("_")[0]) for img in test]
num_classes = 11
# Convert class labels in one hot encoded vector
y_train = np_utils.to_categorical(train_y, num_classes)
y_val = np_utils.to_categorical(val_y, num_classes)
y_test = np_utils.to_categorical(test_y, num_classes)
train_y[0:10]
[9, 2, 1, 6, 8, 6, 0, 9, 0, 9]
y_train[0:10]
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]], dtype=float32)

Train, Validation and Test Distribution

The distribution of training images across the 11 food categories is shown in the plot below.

print("Training data available in 11 classes")
print([train_y.count(i) for i in range(0,11)])

food_classes = ('Bread','Dairy product','Dessert','Egg','Fried food','Meat',
           'Noodles/Pasta','Rice','Seafood', 'Soup', 'Vegetable/Fruit')

y_pos = np.arange(len(food_classes))
counts = [train_y.count(i) for i in range(0,11)]

plt.barh(y_pos, counts, align='center', alpha=0.5)
plt.yticks(y_pos, food_classes)
plt.xlabel('Counts')
plt.title('Train Data Class Distribution')

plt.show()
Training data available in 11 classes
[994, 429, 1500, 986, 848, 1325, 440, 280, 855, 1500, 709]
Distribution of training data across the 11 classes of the Food-11 data-set
print("Validation data available in 11 classes")
[val_y.count(i) for i in range(0,11)]
Validation data available in 11 classes
[362, 144, 500, 327, 326, 449, 147, 96, 347, 500, 232]
print("Test data available in 11 classes")
[test_y.count(i) for i in range(0,11)]
Test data available in 11 classes
[368, 148, 500, 335, 287, 432, 147, 96, 303, 500, 231]

Sample Images

A few sample images from the Food-11 data-set are shown below.

def show_imgs(X):
    plt.figure(figsize=(8, 8))
    k = 0
    # plot the first 16 images of the given set in a 4x4 grid
    for i in range(0,4):
        for j in range(0,4):
            image = load_img(X[k], target_size=(224, 224))
            plt.subplot2grid((4,4),(i,j))
            plt.imshow(image)
            k = k+1
    # show the plot
    plt.show()
show_imgs(train)
Sample images from the Food-11 data-set

Feature Extraction

The features here are nothing but the raw pixel values of each image, zero-centered on each color channel with respect to the ImageNet data-set, without scaling.
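For intuition, imagenet_utils.preprocess_input in its default ('caffe') mode is roughly equivalent to the following NumPy sketch; the constants are the standard ImageNet per-channel means.

import numpy as np

def caffe_preprocess_sketch(x):
    # x: float array of shape (h, w, 3) in RGB channel order
    x = x[..., ::-1]                                  # RGB -> BGR
    return x - np.array([103.939, 116.779, 123.68])  # zero-center, no scaling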

def create_features(dataset):

    x_scratch = []

    # loop over the images
    for imagePath in dataset:

        # load the input image, resized to 224x224 pixels
        image = load_img(imagePath, target_size=(224, 224))
        image = img_to_array(image)

        # preprocess the image by (1) expanding the dimensions and
        # (2) subtracting the mean RGB pixel intensity from the
        # ImageNet dataset
        image = np.expand_dims(image, axis=0)
        image = imagenet_utils.preprocess_input(image)

        # add the image to the batch
        x_scratch.append(image)

    x = np.vstack(x_scratch)
    return x
The extracted features for each data-set (train, val and test) keep the shape of a processed raw input image, i.e. (224, 224, 3).
train_x = create_features(train)
val_x = create_features(val)
test_x = create_features(test)
print(train_x.shape)
print(val_x.shape)
print(test_x.shape)
(9866, 224, 224, 3)
(3430, 224, 224, 3)
(3347, 224, 224, 3)

Fine Tuning VGG16 : Transfer Learning

Two experiments are performed. First, we fine-tune the full VGG16 network, starting from the weights of the pre-trained model. Second, we fine-tune only half of the model, freezing the initial half of the convolution layers of the VGG16 model.

VGG16 model architecture

Fine Tuning Full Network

For this purpose, we chop off the final 3 dense layers of the pre-trained VGG16 model and add new dense layers, with a final output layer for the required 11 classes. We do not freeze any layer but update all the weights of the VGG16 model.

Loading chopped VGG16 Model

Here we see how to load the already trained VGG16 model with its top layers chopped off in Keras. The loaded model extends only up to the last max-pool layer of the VGG16 architecture; the dense layers are chopped off by setting the parameter include_top=False.

# Creating a checkpointer
checkpointer = ModelCheckpoint(filepath='scratchmodel.best.hdf5',
                               verbose=1,save_best_only=True)
# load the VGG16 network
print("[INFO loading network...")
model_vgg = VGG16(weights="imagenet", include_top=False, input_shape=train_x.shape[1:])
model_vgg.summary()
[INFO] loading network...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

Building Model

The Python snippet below builds a sequential model in Keras by adding dense layers on top of the loaded VGG16 model.

model_transfer_full = Sequential()
model_transfer_full.add(model_vgg)
model_transfer_full.add(GlobalAveragePooling2D())
model_transfer_full.add(Dropout(0.2))
model_transfer_full.add(Dense(100, activation='relu'))
model_transfer_full.add(Dense(11, activation='softmax'))
model_transfer_full.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Model)                (None, 7, 7, 512)         14714688  
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 100)               51300     
_________________________________________________________________
dense_2 (Dense)              (None, 11)                1111      
=================================================================
Total params: 14,767,099
Trainable params: 14,767,099
Non-trainable params: 0
_________________________________________________________________
One very important point is to keep the learning rate very small. Imagine that the network has already been trained for many hours with lots of data at a specific learning rate; we are only trying to fine-tune it with a few more iterations on the new data-set. Therefore, we keep the learning rate smaller than the one used for training the base network. Also, one should not train for too many iterations if the new data-set is small, as the model may overfit.
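If overfitting is a worry, an EarlyStopping callback could also be added next to the checkpointer (not used in this post); a minimal sketch:

from keras.callbacks import EarlyStopping

# stop training once val_loss has not improved for 3 consecutive epochs
early_stopper = EarlyStopping(monitor='val_loss', patience=3, verbose=1)
# then pass callbacks=[checkpointer, early_stopper] to model.fit(...)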
opt = Adam(lr=0.00001)
model_transfer_full.compile(loss='categorical_crossentropy', optimizer=opt,metrics=['accuracy'])
history = model_transfer_full.fit(train_x, y_train, batch_size=32, epochs=10,
          validation_data=(val_x, y_val), callbacks=[checkpointer],verbose=1, shuffle=True)
Train on 9866 samples, validate on 3430 samples
Epoch 1/10
9866/9866 [==============================] - 89s 9ms/step - loss: 2.0969 - acc: 0.3394 - val_loss: 1.2539 - val_acc: 0.5837

Epoch 00001: val_loss improved from inf to 1.25395, saving model to scratchmodel.best.hdf5
Epoch 2/10
9866/9866 [==============================] - 87s 9ms/step - loss: 1.0994 - acc: 0.6451 - val_loss: 0.8644 - val_acc: 0.7236

Epoch 00002: val_loss improved from 1.25395 to 0.86440, saving model to scratchmodel.best.hdf5
Epoch 3/10
9866/9866 [==============================] - 87s 9ms/step - loss: 0.7368 - acc: 0.7619 - val_loss: 0.6650 - val_acc: 0.7927

Epoch 00003: val_loss improved from 0.86440 to 0.66498, saving model to scratchmodel.best.hdf5
.
.
Epoch 10/10
9866/9866 [==============================] - 88s 9ms/step - loss: 0.1246 - acc: 0.9607 - val_loss: 0.6008 - val_acc: 0.8563

Epoch 00010: val_loss did not improve from 0.52079
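Note that the checkpointer saved the weights with the lowest validation loss (0.52079, reached in one of the elided epochs), while the model in memory holds the last epoch’s weights. To evaluate the best checkpoint instead, the saved weights could be reloaded first:

# optionally restore the best weights saved by the ModelCheckpoint callback
model_transfer_full.load_weights('scratchmodel.best.hdf5')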

Accuracy and Loss Plot

def plot_accuracy_loss(history):
    fig = plt.figure(figsize=(10,5))

    plt.subplot(1, 2, 1)
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.ylim([0, 1])

    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper right')
    plt.show()

plot_accuracy_loss(history)
Accuracy and cross-entropy loss for the fully fine-tuned VGG16

Test Evaluation: Full Fine-tuned VGG16

The following code evaluates the model fine-tuned over the full network. The confusion matrix on the test set for all 11 classes is shown below.

preds = np.argmax(model_transfer_full.predict(test_x), axis=1)
print("\nAccuracy on Test Data: ", accuracy_score(test_y, preds))
print("\nNumber of correctly identified imgaes: ",
      accuracy_score(test_y, preds, normalize=False),"\n")
confusion_matrix(test_y, preds, labels=range(0,11))
Accuracy on Test Data:  0.8769046907678518

Number of correctly identified images:  2935 
array([[313,   1,  27,  13,   3,   7,   0,   0,   1,   2,   1],
       [  5, 106,  29,   1,   0,   4,   0,   1,   0,   2,   0],
       [ 10,   9, 445,   6,   2,   7,   0,   2,  12,   7,   0],
       [ 21,   3,  25, 282,   0,   1,   0,   2,   1,   0,   0],
       [ 13,   1,  19,   2, 234,   9,   1,   0,   1,   5,   2],
       [ 12,   0,  31,   7,  11, 366,   0,   1,   2,   1,   1],
       [  0,   0,   0,   2,   1,   0, 143,   0,   0,   1,   0],
       [  0,   0,   2,   0,   0,   0,   0,  94,   0,   0,   0],
       [  5,   0,  23,   3,   1,   6,   0,   1, 262,   1,   1],
       [  3,   0,  14,   0,   1,   1,   0,   0,   4, 476,   1],
       [  3,   1,   7,   2,   0,   0,   2,   1,   1,   0, 214]])
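Beyond the raw confusion matrix, per-class precision and recall are easy to obtain with sklearn; a short sketch using the food_classes tuple defined earlier:

from sklearn.metrics import classification_report

print(classification_report(test_y, preds, target_names=list(food_classes)))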

Fine Tuning Half Network

In this experiment, we only update the weights of the later half of the pre-trained model. Here too, we chop off the final 3 dense layers of the pre-trained VGG16 model and add the required dense layers, with a final output layer for 11 classes.

Fine-tuning the VGG16 model

Load Half Tuned VGG16 Model

Here we see how to freeze the initial layers of the pre-trained VGG16 model in Keras. Again, the loaded model extends only up to the last max-pool layer of the VGG16 architecture.

# load the VGG16 network
print("[INFO] loading network...")
model_vgg = VGG16(weights="imagenet", include_top=False, input_shape=train_x.shape[1:])

# Freeze the layers except the last 9 layers
for layer in model_vgg.layers[:-9]:
    layer.trainable = False

# Check the trainable status of the individual layers
for layer in model_vgg.layers:
    print(layer.name, layer.trainable)
[INFO] loading network...
input_2 False
block1_conv1 False
block1_conv2 False
block1_pool False
block2_conv1 False
block2_conv2 False
block2_pool False
block3_conv1 False
block3_conv2 False
block3_conv3 False
block3_pool True
block4_conv1 True
block4_conv2 True
block4_conv3 True
block4_pool True
block5_conv1 True
block5_conv2 True
block5_conv3 True
block5_pool True

Building Model

The Python snippet below builds a sequential model in Keras by adding dense layers on top of the loaded, half-trainable VGG16 model.

model_transfer_half = Sequential()
model_transfer_half.add(model_vgg)
model_transfer_half.add(GlobalAveragePooling2D())
model_transfer_half.add(Dropout(0.2))
model_transfer_half.add(Dense(100, activation='relu'))
model_transfer_half.add(Dense(11, activation='softmax'))
model_transfer_half.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Model)                (None, 7, 7, 512)         14714688  
_________________________________________________________________
global_average_pooling2d_2 ( (None, 512)               0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 100)               51300     
_________________________________________________________________
dense_4 (Dense)              (None, 11)                1111      
=================================================================
Total params: 14,767,099
Trainable params: 13,031,611
Non-trainable params: 1,735,488
_________________________________________________________________
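As a quick sanity check, the non-trainable count reported above should equal the total number of parameters in the frozen layers; assuming model_vgg as loaded above:

# sum the parameters of the frozen layers (everything except the last 9)
print(sum(layer.count_params() for layer in model_vgg.layers[:-9]))  # 1735488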
opt = Adam(lr=0.00001)
model_transfer_half.compile(loss='categorical_crossentropy', optimizer=opt,metrics=['accuracy'])
history = model_transfer_half.fit(train_x, y_train, batch_size=32, epochs=10,
          validation_data=(val_x, y_val), callbacks=[checkpointer],verbose=1, shuffle=True)
Train on 9866 samples, validate on 3430 samples
Epoch 1/10
9866/9866 [==============================] - 50s 5ms/step - loss: 2.0475 - acc: 0.3971 - val_loss: 1.1239 - val_acc: 0.6379

Epoch 00001: val_loss did not improve from 0.52079
Epoch 2/10
9866/9866 [==============================] - 49s 5ms/step - loss: 0.9886 - acc: 0.6817 - val_loss: 0.7377 - val_acc: 0.7720

Epoch 00002: val_loss did not improve from 0.52079
.
.
Epoch 9/10
9866/9866 [==============================] - 50s 5ms/step - loss: 0.1036 - acc: 0.9679 - val_loss: 0.5388 - val_acc: 0.8536

Epoch 00009: val_loss did not improve from 0.52079
Epoch 10/10
9866/9866 [==============================] - 49s 5ms/step - loss: 0.0807 - acc: 0.9730 - val_loss: 0.5999 - val_acc: 0.8531

Epoch 00010: val_loss did not improve from 0.52079

Accuracy and Loss Plot

plot_accuracy_loss(history)
Accuracy and cross-entropy loss for the half-tuned VGG16

Test Evaluation: Half Network VGG16

The following code evaluates the half-network fine-tuned VGG16 model. The confusion matrix on the test set for all 11 classes is shown below.

preds = np.argmax(model_transfer_half.predict(test_x), axis=1)
print("\nAccuracy on Test Data: ", accuracy_score(test_y, preds))
print("\nNumber of correctly identified imgaes: ",
      accuracy_score(test_y, preds, normalize=False),"\n")
confusion_matrix(test_y, preds, labels=range(0,11))
Accuracy on Test Data:  0.8843740663280549

Number of correctly identified images:  2960 
array([[287,   2,  30,  24,  14,   3,   0,   1,   2,   4,   1],
       [  1, 117,  15,   2,   1,   3,   0,   3,   1,   3,   2],
       [  4,  14, 434,  10,   5,   6,   0,   3,  10,   7,   7],
       [ 16,   4,  18, 285,   2,   3,   1,   2,   1,   1,   2],
       [  3,   2,  13,   1, 254,   9,   0,   1,   2,   2,   0],
       [ 10,   0,  18,   3,  14, 378,   0,   1,   4,   1,   3],
       [  0,   0,   1,   1,   1,   0, 143,   0,   0,   1,   0],
       [  0,   0,   1,   0,   0,   0,   0,  95,   0,   0,   0],
       [  1,   0,  22,   4,   0,   5,   0,   1, 265,   1,   4],
       [  2,   3,   7,   2,   1,   0,   0,   0,   0, 485,   0],
       [  0,   0,   6,   1,   0,   1,   0,   3,   2,   1, 217]])
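To see which categories account for the half-tuned model’s small gain, per-class recall can be read off the diagonal of either confusion matrix; a short sketch:

cm = confusion_matrix(test_y, preds, labels=range(0,11))
recall_per_class = cm.diagonal() / cm.sum(axis=1)  # correct / actual, per class
for name, recall in zip(food_classes, recall_per_class):
    print("%-16s %.3f" % (name, recall))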

At The End

Hope it was an easy tutorial to go through. The blog-post clearly illustrates the advantage of transfer learning for smaller data-set applications. Also, we saw that the half fine-tuned VGG16 model gives a small improvement in test accuracy over the fully fine-tuned model on the Food-11 data-set. This was the second post on transfer learning techniques for the image classification task.

You can get the full python implementation of this blog-post in a Jupyter Notebook from GitHub link here.

Beginners in computer vision and deep learning can start with this application. We encourage beginners to download the data-set and reproduce the results. Readers can discuss in the comments if any part needs a more explicit explanation.

If you liked the post, follow this blog to get updates about upcoming articles. Also, share it so that it can reach readers who would actually benefit from it. Please feel free to discuss anything regarding the post; I would love to hear your feedback.

Happy Deep Learning 🙂