In the previous blog-post, we demonstrated transfer learning using the feature extraction technique, where a classifier is trained on the generated features. Another strategy for transfer learning is to not only replace and retrain the classifier on top of the Convolutional Neural Network (CNN) on the new data-set, but also to fine-tune the weights of the pre-trained network by continuing backpropagation. There are two ways to do transfer learning:
- Feature Extraction from pre-trained model and then training a classifier on top of it.
- Fine tuning a pre-trained model keeping learnt weights as initial parameters.
This blog-post showcases the implementation of transfer learning using the second way, “Fine tuning a pre-trained model”.
Transfer Learning Rules
Readers will find the following rules of thumb for transfer learning in numerous blogs on the internet.
Suppose the new data-set is very similar to the original data-set used for pre-training. In this scenario, there are two cases.
- If the new data-set is very small, it’s better to train only the final layers of the network and keep all other layers fixed, to avoid overfitting. Remove the final layers of the pre-trained network, add new layers for the required classes, freeze all other layers and retrain only the new layers. Demonstrated in this blog-post.
- If the new data-set is very large, retrain the whole network with initial weights from the pre-trained model.
Now suppose the new data-set is very different from the original data-set. In this scenario, the following is recommended.
- If the new data-set is very small, fix the earlier layers and retrain the rest of the layers. The earlier layers of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors), while later layers become progressively more specific to the details of the classes contained in the original data-set. The earlier layers can therefore still help to extract low-level descriptors of the new data.
- If the new data-set is large, retrain the whole network with weights initialized from the pre-trained network.
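The four rules above can be condensed into a small lookup. The sketch below is only an illustration: the function name `choose_strategy` and its two boolean arguments are hypothetical, and in practice "similar" and "small" are judgment calls rather than yes/no answers.

```python
def choose_strategy(similar_to_original: bool, small_dataset: bool) -> str:
    """Return the rule of thumb listed above for a given scenario.

    Both arguments are hypothetical simplifications of what are
    really judgment calls about the new data-set.
    """
    if not small_dataset:
        # Large data-set: retrain everything, starting from pre-trained weights.
        return "fine-tune the whole network from pre-trained weights"
    if similar_to_original:
        # Small and similar: only the newly added final layers are trained.
        return "freeze pre-trained layers and train only the new final layers"
    # Small and different: keep the generic early features, retrain the rest.
    return "freeze the earlier layers and retrain the remaining layers"


for similar in (True, False):
    for small in (True, False):
        print(f"similar={similar}, small={small}: {choose_strategy(similar, small)}")
```

The two experiments later in this post correspond to the "whole network" and "freeze the earlier layers" branches.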
Problem Description
In this blog-post, we will use a data-set containing 16643 food images grouped into 11 major food categories to demonstrate transfer learning. This is a food image classification task. The 11 categories are:
- Bread
- Dairy product
- Dessert
- Egg
- Fried food
- Meat
- Noodles/Pasta
- Rice
- Seafood
- Soup
- Vegetable/Fruit
The Food-11 dataset is divided into three parts: training, validation and evaluation. Each image file name starts with its class ID (for example, 9_1339.jpg), where IDs 0-10 refer to the 11 food categories in the order listed above. The data-set can be downloaded from here.
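As a minimal sketch of this naming convention, the snippet below maps a file path to its class ID and category name. The helper `class_id_from_path` and the constant `FOOD_CLASSES` are names introduced here for illustration. The example path is one of the file names listed later in this post.

```python
import os

# Category names in ID order, as listed above (ID 0 = Bread, ..., ID 10 = Vegetable/Fruit).
FOOD_CLASSES = (
    "Bread", "Dairy product", "Dessert", "Egg", "Fried food", "Meat",
    "Noodles/Pasta", "Rice", "Seafood", "Soup", "Vegetable/Fruit",
)


def class_id_from_path(path: str) -> int:
    """Extract the class ID from a file name of the form '<id>_<index>.jpg'."""
    file_name = os.path.basename(path)
    return int(file_name.split("_")[0])


example = "Food-11/training/9_1339.jpg"
class_id = class_id_from_path(example)
print(class_id, FOOD_CLASSES[class_id])
```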
Let’s start with the Python code for transfer learning with the fine-tuning technique.
- Import Library
- Reading Data
- Create labels
- Train, Validation and Test Distribution
- Sample Images
- Feature Extraction
- Fine Tuning VGG16 : Transfer Learning
  - Fine Tuning Full Network
    - Loading chopped VGG16 Model
    - Building Model
    - Accuracy and Loss Plot
    - Test Evaluation
  - Fine Tuning Half Network
    - Load Half Tuned VGG16 Model
    - Building Model
    - Accuracy and Loss Plot
    - Test Evaluation
Import Library
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importing Keras libraries
from keras.utils import np_utils
from keras.optimizers import Adam
from keras.models import Sequential
from keras.applications import VGG16
from keras.callbacks import ModelCheckpoint
from keras.applications import imagenet_utils
from keras.preprocessing.image import load_img, img_to_array
from keras.layers import Input, Dense, Dropout, GlobalAveragePooling2D

# Importing sklearn libraries
from sklearn.metrics import confusion_matrix, accuracy_score

import warnings
warnings.filterwarnings('ignore')
Reading Food-11 Dataset
train = [os.path.join("Food-11/training", img) for img in os.listdir("Food-11/training")]
val = [os.path.join("Food-11/validation", img) for img in os.listdir("Food-11/validation")]
test = [os.path.join("Food-11/evaluation", img) for img in os.listdir("Food-11/evaluation")]
(9866, 3430, 3347)
The training, validation and test sets contain 9866, 3430 and 3347 images respectively.
['Food-11/training/9_1339.jpg', 'Food-11/training/2_1351.jpg', 'Food-11/training/1_170.jpg', 'Food-11/training/6_31.jpg', 'Food-11/training/8_558.jpg']
Create labels
Here, we create labels for all the images and convert them into one-hot encoded vectors. This format is required to train the final (output) layer of the neural network with the categorical cross-entropy loss.
train_y = [int(img.split("/")[-1].split("_")[0]) for img in train]
val_y = [int(img.split("/")[-1].split("_")[0]) for img in val]
test_y = [int(img.split("/")[-1].split("_")[0]) for img in test]

num_classes = 11

# Convert class labels in one hot encoded vector
y_train = np_utils.to_categorical(train_y, num_classes)
y_val = np_utils.to_categorical(val_y, num_classes)
y_test = np_utils.to_categorical(test_y, num_classes)
[9, 2, 1, 6, 8, 6, 0, 9, 0, 9]
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]], dtype=float32)
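To make the encoding concrete, here is a minimal NumPy sketch that produces the same kind of one-hot matrix. The function `one_hot` is written for illustration and is not the Keras implementation. The five labels are the first five training labels shown above.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a float32 matrix with a single 1 per row, at the label's index."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Fancy indexing sets encoded[row, label] = 1 for every row at once.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


labels = [9, 2, 1, 6, 8]
encoded = one_hot(labels, num_classes=11)
print(encoded.shape)
print(encoded.argmax(axis=1))  # argmax recovers the integer labels
```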
Train, Validation and Test Distribution
The distribution of the training images across the 11 categories is shown in the bar chart below.
print("Training data available in 11 classes")
print([train_y.count(i) for i in range(0,11)])

food_classes = ('Bread','Dairy product','Dessert','Egg','Fried food','Meat',
                'Noodles/Pasta','Rice','Seafood', 'Soup', 'Vegetable/Fruit')

y_pos = np.arange(len(food_classes))
counts = [train_y.count(i) for i in range(0,11)]

plt.barh(y_pos, counts, align='center', alpha=0.5)
plt.yticks(y_pos, food_classes)
plt.xlabel('Counts')
plt.title('Train Data Class Distribution')
Training data available in 11 classes
[994, 429, 1500, 986, 848, 1325, 440, 280, 855, 1500, 709]

print("Validation data available in 11 classes")
[val_y.count(i) for i in range(0,11)]

print("Test data available in 11 classes")
[test_y.count(i) for i in range(0,11)]
Sample Images
A few sample images from the Food-11 data-set are shown below.
def show_imgs(X):
    # Plot the first 16 images of X in a 4x4 grid
    plt.figure(figsize=(8, 8))
    k = 0
    for i in range(0,4):
        for j in range(0,4):
            image = load_img(X[k], target_size=(224, 224))
            plt.subplot2grid((4,4),(i,j))
            plt.imshow(image)
            k = k+1
    # show the plot
    plt.show()

show_imgs(train)

Feature Extraction
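The features here are simply the pixel values of each image, with each color channel zero-centered with respect to the ImageNet dataset and without scaling. This is what imagenet_utils.preprocess_input does in its default mode, which also converts the images from RGB to BGR.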
def create_features(dataset):
    x_scratch = []

    # loop over the images
    for imagePath in dataset:
        # load the input image and image is resized to 224x224 pixels
        image = load_img(imagePath, target_size=(224, 224))
        image = img_to_array(image)

        # preprocess the image by (1) expanding the dimensions and
        # (2) subtracting the mean RGB pixel intensity from the
        # ImageNet dataset
        image = np.expand_dims(image, axis=0)
        image = imagenet_utils.preprocess_input(image)

        # add the image to the batch
        x_scratch.append(image)

    x = np.vstack(x_scratch)
    return x
train_x = create_features(train)
val_x = create_features(val)
test_x = create_features(test)

print(train_x.shape)
print(val_x.shape)
print(test_x.shape)
(9866, 224, 224, 3)
(3430, 224, 224, 3)
(3347, 224, 224, 3)
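For readers curious about what the preprocessing step does to the pixel values, the following NumPy sketch approximates the default ("caffe") mode of `preprocess_input`: it reverses the channel order from RGB to BGR and subtracts the per-channel ImageNet means. The function name `caffe_style_preprocess` is made up for this sketch, and a dummy all-white batch stands in for real images.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by Keras' "caffe" mode.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_style_preprocess(batch_rgb):
    """Approximate the default preprocessing on a batch of RGB images.

    batch_rgb: array of shape (n, height, width, 3) with values in [0, 255].
    """
    batch = np.asarray(batch_rgb, dtype=np.float32)
    batch_bgr = batch[..., ::-1]           # reverse channel order: RGB -> BGR
    return batch_bgr - IMAGENET_MEAN_BGR   # zero-center each channel, no scaling


# Dummy all-white images, not real Food-11 data.
dummy = np.full((2, 224, 224, 3), 255.0, dtype=np.float32)
out = caffe_style_preprocess(dummy)
print(out.shape)
print(out[0, 0, 0])  # 255 minus each channel mean
```

Because no scaling is applied, the resulting values stay on the 0-255 scale, shifted so that each channel is roughly centered on zero.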
Fine Tuning VGG16 : Transfer Learning
Two experiments are performed. First, we re-train (fine-tune) the full VGG16 network, starting from the weights of the pre-trained model. Second, we fine-tune only part of the model by freezing the initial half of the convolution layers of VGG16.

Fine Tuning Full Network
For this purpose, we chop off the final 3 dense layers of the pre-trained VGG16 model and add new dense layers, ending in an output layer for the required 11 classes. We do not freeze any layer, so training updates all the weights of the VGG16 model.
Loading chopped VGG16 Model
Here we see how to load the already trained VGG16 model in Keras with its top layers chopped off. The loaded model extends only up to the last max-pool layer of the VGG16 architecture. The dense layers are dropped by setting the parameter include_top=False.
# Creating a checkpointer
checkpointer = ModelCheckpoint(filepath='', verbose=1, save_best_only=True)

# load the VGG16 network
print("[INFO loading network...")
model_vgg = VGG16(weights="imagenet", include_top=False, input_shape=train_x.shape[1:])
model_vgg.summary()
[INFO loading network...
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
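The parameter counts in the summary can be checked by hand. The sketch below assumes the standard VGG16 design of 3x3 kernels and reads the input and output channel counts off the output shapes above. The names `VGG16_CONV_CHANNELS` and `conv_params` are introduced here for illustration.

```python
# (input_channels, output_channels) for the 13 convolution layers of VGG16,
# read off the summary above; every kernel is assumed to be 3x3.
VGG16_CONV_CHANNELS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def conv_params(in_channels, out_channels, kernel_size=3):
    """Weights (k * k * in * out) plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


per_layer = [conv_params(i, o) for i, o in VGG16_CONV_CHANNELS]
total = sum(per_layer)
print(per_layer)
print(total)
```

Pooling layers and the input layer contribute no parameters, so the sum over the convolution layers should match the "Total params" line.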
Building Model
The Python snippet below builds a sequential model in Keras by adding dense layers on top of the loaded VGG16 model.
model_transfer_full = Sequential()
model_transfer_full.add(model_vgg)
model_transfer_full.add(GlobalAveragePooling2D())
model_transfer_full.add(Dropout(0.2))
model_transfer_full.add(Dense(100, activation='relu'))
model_transfer_full.add(Dense(11, activation='softmax'))
model_transfer_full.summary()
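The `GlobalAveragePooling2D` layer is what turns the (7, 7, 512) output of the VGG16 base into a 512-dimensional vector for the dense layers. A rough NumPy equivalent is sketched below. The function name `global_average_pool` is hypothetical, and random values stand in for real feature maps.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Average each channel over its spatial positions.

    feature_maps: array of shape (batch, height, width, channels).
    Returns an array of shape (batch, channels).
    """
    return np.asarray(feature_maps).mean(axis=(1, 2))


# Random values stand in for the (7, 7, 512) output of the VGG16 base.
rng = np.random.default_rng(0)
features = rng.random((4, 7, 7, 512))
pooled = global_average_pool(features)
print(pooled.shape)
```

Compared with flattening the 7 x 7 x 512 feature map, this keeps the first dense layer small: it sees 512 inputs instead of 25088.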
opt = Adam(lr=0.00001)
model_transfer_full.compile(loss='categorical_crossentropy',
                            optimizer=opt, metrics=['accuracy'])

history = model_transfer_full.fit(train_x, y_train,
                                  batch_size=32, epochs=10,
                                  validation_data=(val_x, y_val),
                                  callbacks=[checkpointer],
                                  verbose=1, shuffle=True)
Train on 9866 samples, validate on 3430 samples
Epoch 1/10
9866/9866 [==============================] - 89s 9ms/step - loss: 2.0969 - acc: 0.3394 - val_loss: 1.2539 - val_acc: 0.5837
Epoch 00001: val_loss improved from inf to 1.25395, saving model to
Epoch 2/10
9866/9866 [==============================] - 87s 9ms/step - loss: 1.0994 - acc: 0.6451 - val_loss: 0.8644 - val_acc: 0.7236
Epoch 00002: val_loss improved from 1.25395 to 0.86440, saving model to
Epoch 3/10
9866/9866 [==============================] - 87s 9ms/step - loss: 0.7368 - acc: 0.7619 - val_loss: 0.6650 - val_acc: 0.7927
Epoch 00003: val_loss improved from 0.86440 to 0.66498, saving model to
.
.
Epoch 10/10
9866/9866 [==============================] - 88s 9ms/step - loss: 0.1246 - acc: 0.9607 - val_loss: 0.6008 - val_acc: 0.8563
Epoch 00010: val_loss did not improve from 0.52079
Accuracy and Loss Plot
def plot_accuracy_loss(history):
    fig = plt.figure(figsize=(10,5))

    # Left panel: training and validation accuracy per epoch
    plt.subplot(1, 2, 1)
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.ylim([0, 1])

    # Right panel: training and validation loss per epoch
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper right')

plot_accuracy_loss(history)

Test Evaluation: Full Fine-tuned VGG16
The following code evaluates the fully fine-tuned model. The confusion matrix on the test set for all 11 classes is shown below.
preds = np.argmax(model_transfer_full.predict(test_x), axis=1)
print("\nAccuracy on Test Data: ", accuracy_score(test_y, preds))
print("\nNumber of correctly identified images: ",
      accuracy_score(test_y, preds, normalize=False), "\n")
confusion_matrix(test_y, preds, labels=range(0,11))
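A confusion matrix is easier to read once it is reduced to one number per class. The sketch below computes per-class recall, assuming rows are true labels and columns are predictions, which is the layout scikit-learn's `confusion_matrix` returns. The function name `per_class_recall` and the 3-class matrix are made up for illustration and are not results from this post.

```python
import numpy as np


def per_class_recall(conf_matrix):
    """Recall per class: diagonal count divided by the row total.

    Assumes rows are true labels and columns are predicted labels.
    """
    conf_matrix = np.asarray(conf_matrix, dtype=float)
    row_totals = conf_matrix.sum(axis=1)
    # Guard against division by zero for classes with no true samples.
    safe_totals = np.where(row_totals == 0, 1.0, row_totals)
    return np.diag(conf_matrix) / safe_totals


# Made-up 3-class matrix, for illustration only.
toy = [[8, 1, 1],
       [2, 6, 2],
       [0, 0, 10]]
recall = per_class_recall(toy)
overall_accuracy = np.trace(np.asarray(toy)) / np.sum(toy)
print(recall)
print(overall_accuracy)
```

Applied to the 11 x 11 matrix from the cell above, the same function would show which food categories the model confuses most often.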
Fine Tuning Half Network
In this experiment, we only modify the weights of the later half of the pre-trained model. Here too, we chop off the final 3 dense layers of the pre-trained VGG16 model and add the required dense layers, ending in an output layer for 11 classes.

Load Half Tunable VGG16 Model
Here we see how to freeze the initial layers of the pre-trained VGG16 model in Keras. The loaded model extends only up to the last max-pool layer of the VGG16 architecture.
# load the VGG16 network
print("[INFO] loading network...")
model_vgg = VGG16(weights="imagenet", include_top=False, input_shape=train_x.shape[1:])

# Freeze the layers except the last 9 layers
for layer in model_vgg.layers[:-9]:
    layer.trainable = False

# Check the trainable status of the individual layers
for layer in model_vgg.layers:
    print(layer, layer.trainable)
[INFO] loading network...
False
False
False
False
False
False
False
False
False
False
True
True
True
True
True
True
True
True
True
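The printed output only shows the trainable flags, so it is not obvious which layers they belong to. The sketch below applies the same `[:-9]` slice to the layer names taken from the VGG16 summary earlier in this post. The list `VGG16_LAYER_NAMES` is written out by hand for illustration, and the input layer's name may differ when the model is loaded a second time.

```python
# Layer names of the VGG16 base in order, copied from the summary shown earlier.
VGG16_LAYER_NAMES = [
    "input_1",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

frozen = VGG16_LAYER_NAMES[:-9]     # same slice as in the freezing loop
trainable = VGG16_LAYER_NAMES[-9:]  # the last 9 layers stay trainable

print(len(frozen), "frozen, last one:", frozen[-1])
print(len(trainable), "trainable, first one:", trainable[0])
```

Under these assumptions, everything up to and including block3_conv3 is frozen, and training only updates the convolution layers of blocks 4 and 5 (pooling layers have no weights).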
Building Model
The Python snippet below builds a sequential model in Keras by adding dense layers on top of the loaded, half-trainable VGG16 model.
model_transfer_half = Sequential()
model_transfer_half.add(model_vgg)
model_transfer_half.add(GlobalAveragePooling2D())
model_transfer_half.add(Dropout(0.2))
model_transfer_half.add(Dense(100, activation='relu'))
model_transfer_half.add(Dense(11, activation='softmax'))
model_transfer_half.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
vgg16 (Model)                (None, 7, 7, 512)         14714688
_________________________________________________________________
global_average_pooling2d_2 ( (None, 512)               0
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_3 (Dense)              (None, 100)               51300
_________________________________________________________________
dense_4 (Dense)              (None, 11)                1111
=================================================================
Total params: 14,767,099
Trainable params: 13,031,611
Non-trainable params: 1,735,488
_________________________________________________________________
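The split between trainable and non-trainable parameters can be reproduced with a little arithmetic. The sketch below copies the per-layer counts of the frozen convolution layers (blocks 1 to 3) from the VGG16 summary earlier in this post. The helper `dense_params` and the constant names are introduced here for illustration.

```python
def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


VGG16_BASE_PARAMS = 14_714_688  # total from the VGG16 summary

# Parameter counts of the frozen convolution layers (block1 to block3),
# copied from the VGG16 summary.
FROZEN_PARAMS = [1792, 36928, 73856, 147584, 295168, 590080, 590080]

head = dense_params(512, 100) + dense_params(100, 11)
total = VGG16_BASE_PARAMS + head
non_trainable = sum(FROZEN_PARAMS)
trainable = total - non_trainable
print(head, total, non_trainable, trainable)
```

Freezing the first three blocks removes only about 12% of the parameters from training, because most of the VGG16 weights sit in blocks 4 and 5.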
opt = Adam(lr=0.00001)
model_transfer_half.compile(loss='categorical_crossentropy',
                            optimizer=opt, metrics=['accuracy'])

history = model_transfer_half.fit(train_x, y_train,
                                  batch_size=32, epochs=10,
                                  validation_data=(val_x, y_val),
                                  callbacks=[checkpointer],
                                  verbose=1, shuffle=True)
Train on 9866 samples, validate on 3430 samples
Epoch 1/10
9866/9866 [==============================] - 50s 5ms/step - loss: 2.0475 - acc: 0.3971 - val_loss: 1.1239 - val_acc: 0.6379
Epoch 00001: val_loss did not improve from 0.52079
Epoch 2/10
9866/9866 [==============================] - 49s 5ms/step - loss: 0.9886 - acc: 0.6817 - val_loss: 0.7377 - val_acc: 0.7720
Epoch 00002: val_loss did not improve from 0.52079
.
.
Epoch 9/10
9866/9866 [==============================] - 50s 5ms/step - loss: 0.1036 - acc: 0.9679 - val_loss: 0.5388 - val_acc: 0.8536
Epoch 00009: val_loss did not improve from 0.52079
Epoch 10/10
9866/9866 [==============================] - 49s 5ms/step - loss: 0.0807 - acc: 0.9730 - val_loss: 0.5999 - val_acc: 0.8531
Epoch 00010: val_loss did not improve from 0.52079
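Note that this log reports "did not improve from 0.52079" from the very first epoch, and 0.52079 is the best validation loss of the full-network run. A plausible explanation, stated here as an assumption, is that the same `checkpointer` object was passed to both training runs and kept its best value from the first one. The sketch below mimics that behaviour with a hypothetical `BestTracker` class. It is not the Keras implementation, and the loss values are the ones visible in the two logs (the elided epochs are left out).

```python
class BestTracker:
    """Tiny stand-in for the 'save only when val_loss improves' logic.

    This class is hypothetical and is not the Keras ModelCheckpoint code.
    """

    def __init__(self):
        self.best = float("inf")

    def update(self, val_loss):
        """Return True (and remember the value) if val_loss is a new best."""
        if val_loss < self.best:
            self.best = val_loss
            return True
        return False


tracker = BestTracker()

# Full-network run: first three epochs plus the best value reported in its log.
first_run = [1.25395, 0.86440, 0.66498, 0.52079]
first_flags = [tracker.update(v) for v in first_run]

# Half-network run reuses the same tracker: values visible in the log above.
second_run = [1.1239, 0.7377, 0.5388, 0.5999]
second_flags = [tracker.update(v) for v in second_run]

print(first_flags)
print(second_flags)
print(tracker.best)
```

If this is what happened, no checkpoint was saved for the half-network model. Creating a fresh checkpointer with its own file path for each experiment would avoid this.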
Accuracy and Loss Plot

Test Evaluation: Half Network VGG16
The following code evaluates the half-network fine-tuned VGG16 neural network (NN) model. The confusion matrix on the test set for all 11 classes is shown below.
preds = np.argmax(model_transfer_half.predict(test_x), axis=1)
print("\nAccuracy on Test Data: ", accuracy_score(test_y, preds))
print("\nNumber of correctly identified images: ",
      accuracy_score(test_y, preds, normalize=False), "\n")
confusion_matrix(test_y, preds, labels=range(0,11))
At The End
We hope this tutorial was easy to follow. The blog-post illustrates the advantage of transfer learning for applications with smaller data-sets. We also saw that the half fine-tuned VGG16 model gives a small improvement in test accuracy over the fully fine-tuned model on the Food-11 data-set. This was the second post on transfer learning techniques for image classification.
You can get the full python implementation of this blog-post in a Jupyter Notebook from GitHub link here.
Beginners in computer vision and deep learning can start with this application. We encourage them to download the data-set and reproduce the results. Readers can use the comments to ask for further explanation of any part.
