The idea of transfer learning comes from a curious phenomenon: many deep neural networks trained on natural images learn similar features. In the initial layers these are textures, corners, edges and color blobs. Such initial-layer features appear not to be specific to a particular data-set or task; they are general in that they apply to many data-sets and tasks. They seem to occur regardless of the exact cost function and natural-image data-set. We call these initial-layer features general, and they can be transferred to learn a specific data-set.
Introduction
In transfer learning, we first train a base network on a base data-set and task, and then transfer the learned features to a second, target network to be trained on a target data-set and task. This process tends to work if the features are general, that is, suitable to both base and target tasks, rather than specific to the base task.
Earlier, I wrote a couple of blog-posts on training an entire Convolutional Neural Network (CNN) model on a sufficiently large data-set. You can read those posts here and here. In practice, very few people train an entire CNN from scratch because it is relatively rare to have a data-set of sufficient size. Instead, it is common to pre-train a CNN on a very large data-set (e.g. ImageNet, which contains 1.2 million images in 1000 categories), and then use the pre-trained model either as an initialization or as a fixed feature extractor for the task of interest.
There are two ways to do transfer learning.
- Feature Extraction from pre-trained model and then training a classifier on top of it.
- Fine tuning the pre-trained model keeping learnt weights as initial parameters.
This blog-post showcases the implementation of transfer learning using the first approach: “Feature Extraction from pre-trained model and training a classifier using extracted features”.
Problem Description
In this blog-post, we will use a data-set containing 16643 food images grouped into 11 major food categories to demonstrate transfer learning. This is a food image classification task. The 11 categories are:
- Bread
- Dairy product
- Dessert
- Egg
- Fried food
- Meat
- Noodles/Pasta
- Rice
- Seafood
- Soup
- Vegetable/Fruit
The Food-11 data-set is divided into three parts: training, validation and evaluation. Each image file name starts with a class ID, where IDs 0-10 refer to the 11 food categories in the order listed above. The data-set can be downloaded from here.
Let's start with the Python code for transfer learning with the feature extraction technique.
- Import Library
- Reading Data
- Create labels
- Train, Validation and Test Distribution
- Sample Images
- Features Extraction
- CNN Model Training : Baseline
- Test Evaluation
- Transfer Learning CNN : VGG16 Features
- Test Evaluation
- Logistic Regression: VGG16 Features
- Test Evaluation
Import Library
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Importing sklearn libraries
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
# Importing hypopt library for grid search
from hypopt import GridSearch
# Importing Keras libraries
from keras.utils import np_utils
from keras.models import Sequential
from keras.applications import VGG16
from keras.applications import imagenet_utils
from keras.callbacks import ModelCheckpoint
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.layers import Dense, Conv2D, MaxPooling2D
from keras.layers import Dropout, Flatten, GlobalAveragePooling2D
import warnings
warnings.filterwarnings('ignore')
Reading Food-11 Dataset
train = [os.path.join("Food-11/training",img) for img in os.listdir("Food-11/training")]
val = [os.path.join("Food-11/validation",img) for img in os.listdir("Food-11/validation")]
test = [os.path.join("Food-11/evaluation",img) for img in os.listdir("Food-11/evaluation")]
len(train),len(val),len(test)
The numbers of images in the training, validation and test sets are 9866, 3430 and 3347 respectively.
train[0:5]
Create labels
Here, we create labels for all the images and convert them into one-hot encoded vectors. The one-hot form is what the final softmax layer of the neural network is trained against.
train_y = [int(img.split("/")[-1].split("_")[0]) for img in train]
val_y = [int(img.split("/")[-1].split("_")[0]) for img in val]
test_y = [int(img.split("/")[-1].split("_")[0]) for img in test]
num_classes = 11
# Convert class labels into one-hot encoded vectors
y_train = np_utils.to_categorical(train_y, num_classes)
y_val = np_utils.to_categorical(val_y, num_classes)
y_test = np_utils.to_categorical(test_y, num_classes)
train_y[0:10]
y_train[0:10]
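As a rough illustration of what the label code above does, the sketch below parses the class ID from a file name and builds the one-hot matrix with plain NumPy. The file names are made up for the example (the `<classID>_<index>.jpg` pattern is an assumption inferred from the parsing code), and `os.path.basename` is swapped in for `split("/")` so the parsing follows the platform's path separator.

```python
import os
import numpy as np

def label_from_path(path):
    """Return the integer class ID encoded before the first underscore."""
    return int(os.path.basename(path).split("_")[0])

def one_hot(labels, num_classes):
    """Turn a list of integer labels into a (n_samples, num_classes) 0/1 matrix."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Hypothetical file names following the assumed "<classID>_<index>.jpg" pattern
paths = ["Food-11/training/0_12.jpg",
         "Food-11/training/10_3.jpg",
         "Food-11/training/4_7.jpg"]
labels = [label_from_path(p) for p in paths]
encoded = one_hot(labels, num_classes=11)
print(labels)
print(encoded.shape)
```

Each row of `encoded` has a single 1 in the column of its class, which is the same layout `np_utils.to_categorical` produces.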
Train, Validation and Test Distribution
The distribution of training images across the 11 categories is shown in the plot below.
print("Training data available in 11 classes")
print([train_y.count(i) for i in range(0,11)])
food_classes = ('Bread','Dairy product','Dessert','Egg','Fried food','Meat',
'Noodles/Pasta','Rice','Seafood', 'Soup', 'Vegetable/Fruit')
y_pos = np.arange(len(food_classes))
counts = [train_y.count(i) for i in range(0,11)]
plt.barh(y_pos, counts, align='center', alpha=0.5)
plt.yticks(y_pos, food_classes)
plt.xlabel('Counts')
plt.title('Train Data Class Distribution')
plt.show()
print("Validation data available in 11 classes")
[val_y.count(i) for i in range(0,11)]
print("Test data available in 11 classes")
[test_y.count(i) for i in range(0,11)]
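The class counts are not uniform. One common way to compensate, and the heuristic scikit-learn documents for `class_weight='balanced'` (which the logistic regression later in this post uses), is to weight each class inversely to its frequency. A minimal sketch with toy labels, not the real Food-11 counts:

```python
import numpy as np

def balanced_class_weights(labels, num_classes):
    """Weight each class by n_samples / (num_classes * class_count).

    Assumes every class appears at least once; a class with zero
    samples would cause a division by zero.
    """
    counts = np.bincount(labels, minlength=num_classes)
    return len(labels) / (num_classes * counts)

# Toy labels: class 0 is three times as common as class 1
toy_labels = [0, 0, 0, 1]
weights = balanced_class_weights(toy_labels, num_classes=2)
print(weights)
```

The rarer class receives the larger weight, so its errors count for more during training.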
Sample Images
A few sample images from the Food-11 data-set are shown below.
def show_imgs(X):
    plt.figure(figsize=(8, 8))
    k = 0
    for i in range(0,4):
        for j in range(0,4):
            # load the k-th image from the list that was passed in
            image = load_img(X[k], target_size=(224, 224))
            plt.subplot2grid((4,4),(i,j))
            plt.imshow(image)
            k = k+1
    # show the plot
    plt.show()
show_imgs(train)
Features Extraction
We will see how to load the pre-trained VGG16 model with its top layers chopped off. The VGG16 model architecture looks like below.
The loaded model extends only up to the last max-pool layer of the VGG16 architecture. We can chop off the dense layers by setting the parameter include_top=False.
# load the VGG16 network
print("[INFO] loading network...")
# chop the top dense layers, include_top=False
model = VGG16(weights="imagenet", include_top=False)
model.summary()
def create_features(dataset, pre_model):
    x_scratch = []
    # loop over the images
    for imagePath in dataset:
        # load the input image and image is resized to 224x224 pixels
        image = load_img(imagePath, target_size=(224, 224))
        image = img_to_array(image)
        # preprocess the image by (1) expanding the dimensions and
        # (2) subtracting the mean RGB pixel intensity from the
        # ImageNet dataset
        image = np.expand_dims(image, axis=0)
        image = imagenet_utils.preprocess_input(image)
        # add the image to the batch
        x_scratch.append(image)
    x = np.vstack(x_scratch)
    features = pre_model.predict(x, batch_size=32)
    features_flatten = features.reshape((features.shape[0], 7 * 7 * 512))
    return x, features, features_flatten
Three types of features are extracted for each of the train, val and test sets:
- Processed raw input image (224, 224, 3). Used for training CNN from scratch.
- Features extracted from last Convolution layer (7, 7, 512) of pre-trained VGG16. Used for transfer learning.
- Flattened extracted features from VGG16 (25088). Used for training a logistic regression model.
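The (7, 7, 512) and 25088 figures follow from the architecture: the VGG16 convolutions use 'same' padding, so only the five 2x2 max-pool layers shrink the image. A small sketch of that arithmetic (the function name is made up for this example):

```python
def vgg16_feature_shape(height, width, num_pools=5, channels=512):
    """Shape of the output of the VGG16 convolutional base.

    Each of the five 2x2 max-pools halves height and width;
    the last convolutional block has 512 channels.
    """
    for _ in range(num_pools):
        height, width = height // 2, width // 2
    return height, width, channels

shape = vgg16_feature_shape(224, 224)
flattened = shape[0] * shape[1] * shape[2]
print(shape, flattened)
```

A different input size would change the spatial part of the shape, so the hard-coded `7 * 7 * 512` in `create_features` only holds for 224x224 inputs.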
train_x, train_features, train_features_flatten = create_features(train, model)
val_x, val_features, val_features_flatten = create_features(val, model)
test_x, test_features, test_features_flatten = create_features(test, model)
print(train_x.shape, train_features.shape, train_features_flatten.shape)
print(val_x.shape, val_features.shape, val_features_flatten.shape)
print(test_x.shape, test_features.shape, test_features_flatten.shape)
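Note that `create_features` stacks every preprocessed image into one array before calling `predict`. Assuming float32 pixels, the 9866 training images alone take about 5.9 GB, which can exhaust memory on smaller machines. The sketch below shows one possible alternative that only holds one batch of raw images at a time. It is framework-agnostic: `load_fn` and `predict_fn` are hypothetical stand-ins, not part of the original code.

```python
import numpy as np

def estimate_float32_bytes(num_images, height=224, width=224, channels=3):
    """Memory needed to hold all images at once as float32 (4 bytes per value)."""
    return num_images * height * width * channels * 4

def extract_in_batches(paths, load_fn, predict_fn, batch_size=32):
    """Load and featurize images one batch at a time.

    load_fn(path) must return one preprocessed image array;
    predict_fn(batch) must return one feature row per image.
    Only `batch_size` raw images are held in memory at any moment.
    """
    feature_chunks = []
    for start in range(0, len(paths), batch_size):
        batch = np.stack([load_fn(p) for p in paths[start:start + batch_size]])
        feature_chunks.append(predict_fn(batch))
    return np.concatenate(feature_chunks, axis=0)

# Stand-ins so the sketch runs without Keras or the data-set (hypothetical)
fake_load = lambda path: np.full((4, 4, 3), float(len(path)), dtype="float32")
fake_predict = lambda batch: batch.mean(axis=(1, 2))  # one 3-value row per image

features = extract_in_batches(["a.jpg", "bb.jpg", "ccc.jpg"],
                              fake_load, fake_predict, batch_size=2)
print(features.shape)
print(estimate_float32_bytes(9866) / 1e9, "GB for the training images")
```

In this post's setting, `load_fn` would wrap `load_img`, `img_to_array` and `imagenet_utils.preprocess_input`, and `predict_fn` would be `model.predict`. The trade-off is that the raw image array needed by the baseline CNN is no longer returned.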
CNN Model Training : Baseline
As a baseline model, we train a CNN model from scratch on only the limited number of images in the Food-11 data-set.
# Creating a checkpointer
checkpointer = ModelCheckpoint(filepath='scratchmodel.best.hdf5',
verbose=1,save_best_only=True)
# Building up a Sequential model
model_scratch = Sequential()
model_scratch.add(Conv2D(32, (3, 3), activation='relu',input_shape = train_x.shape[1:]))
model_scratch.add(MaxPooling2D(pool_size=(2, 2)))
model_scratch.add(Conv2D(64, (3, 3), activation='relu'))
model_scratch.add(MaxPooling2D(pool_size=(2, 2)))
model_scratch.add(Conv2D(64, (3, 3), activation='relu'))
model_scratch.add(MaxPooling2D(pool_size=(2, 2)))
model_scratch.add(Conv2D(128, (3, 3), activation='relu'))
model_scratch.add(MaxPooling2D(pool_size=(2, 2)))
model_scratch.add(GlobalAveragePooling2D())
model_scratch.add(Dense(64, activation='relu'))
model_scratch.add(Dense(11, activation='softmax'))
model_scratch.summary()
model_scratch.compile(loss='categorical_crossentropy', optimizer='adam',
metrics=['accuracy'])
#Fitting the model on the train data and labels.
history = model_scratch.fit(train_x, y_train,
batch_size=32, epochs=10,
verbose=1, callbacks=[checkpointer],
validation_data=(val_x, y_val), shuffle=True)
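To get a feel for how small this baseline is, the feature-map size and parameter count reported by `model_scratch.summary()` can be worked out by hand. The sketch below assumes the Keras defaults used above: 'valid' padding and stride 1 for `Conv2D`, and non-overlapping 2x2 pooling. The helper names are made up for this example.

```python
def conv_valid(size, kernel=3):
    """Output size of a stride-1 convolution without padding."""
    return size - kernel + 1

def conv_params(kernel, in_ch, out_ch):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units

size, in_ch, total = 224, 3, 0
for out_ch in (32, 64, 64, 128):
    size = conv_valid(size)                  # Conv2D(out_ch, (3, 3))
    total += conv_params(3, in_ch, out_ch)
    size = size // 2                         # MaxPooling2D(pool_size=(2, 2))
    in_ch = out_ch

# GlobalAveragePooling2D collapses size x size x 128 to 128 values
total += dense_params(128, 64) + dense_params(64, 11)
print(size, total)
```

The network ends with a 12x12x128 feature map and roughly 139 thousand trainable parameters, far fewer than VGG16.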
def plot_acc_loss(history):
    # older Keras versions log 'acc', newer ones log 'accuracy'
    acc_key = 'acc' if 'acc' in history.history else 'accuracy'
    fig = plt.figure(figsize=(10,5))
    plt.subplot(1, 2, 1)
    plt.plot(history.history[acc_key])
    plt.plot(history.history['val_' + acc_key])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    # the second curve is the validation loss, not the test loss
    plt.legend(['train', 'validation'], loc='upper right')
    plt.show()
plot_acc_loss(history)
Test Evaluation: Baseline CNN
The following code evaluates the baseline model. The confusion matrix for the 11 classes can be seen below.
preds = np.argmax(model_scratch.predict(test_x), axis=1)
print("\nAccuracy on Test Data: ", accuracy_score(test_y, preds))
print("\nNumber of correctly identified images: ",
accuracy_score(test_y, preds, normalize=False),"\n")
confusion_matrix(test_y, preds, labels=range(0,11))
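A confusion matrix is easier to read once it is reduced to per-class numbers. In scikit-learn's layout, rows are true classes and columns are predicted classes, so the diagonal divided by the row sums gives the recall of each class. A minimal sketch on a toy 3-class matrix (the numbers are invented, not results from this post):

```python
import numpy as np

def per_class_recall(cm):
    """Fraction of each true class (row) that was predicted correctly."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

# Toy confusion matrix: rows = true class, columns = predicted class
toy_cm = [[8, 1, 1],
          [2, 6, 2],
          [0, 5, 5]]
recall = per_class_recall(toy_cm)
overall_accuracy = np.trace(np.asarray(toy_cm)) / np.sum(toy_cm)
print(recall, overall_accuracy)
```

The same function can be applied to the 11x11 matrix returned by `confusion_matrix` to see which food categories the model confuses most.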
Transfer Learning CNN : VGG16 Features
Now, we use the features extracted from the last max-pooling layer of VGG16 as input to a shallow neural network. This technique is known as transfer learning with feature extraction.
model_transfer = Sequential()
model_transfer.add(GlobalAveragePooling2D(input_shape=train_features.shape[1:]))
model_transfer.add(Dropout(0.2))
model_transfer.add(Dense(100, activation='relu'))
model_transfer.add(Dense(11, activation='softmax'))
model_transfer.summary()
model_transfer.compile(loss='categorical_crossentropy', optimizer='adam',
metrics=['accuracy'])
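# Use a separate checkpoint file so the baseline model's best weights are not
# overwritten (the file name is chosen here for illustration)
checkpointer = ModelCheckpoint(filepath='transfermodel.best.hdf5',
verbose=1,save_best_only=True)
history = model_transfer.fit(train_features, y_train, batch_size=32, epochs=10,
validation_data=(val_features, y_val), callbacks=[checkpointer],
verbose=1, shuffle=True)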
plot_acc_loss(history)
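The first layer of the shallow network, `GlobalAveragePooling2D`, averages each of the 512 channels over the 7x7 spatial grid, so every image is reduced to a 512-value vector before the dense layers. The NumPy sketch below reproduces that step on random stand-in data and counts the parameters of the two `Dense` layers that follow.

```python
import numpy as np

def global_average_pool(features):
    """Average over the two spatial axes of a (batch, height, width, channels) array."""
    return features.mean(axis=(1, 2))

# Random stand-in for a batch of two VGG16 feature maps
rng = np.random.default_rng(0)
fake_features = rng.random((2, 7, 7, 512))
pooled = global_average_pool(fake_features)
print(pooled.shape)

# Trainable parameters of Dense(100) and Dense(11) after the pooling
dense_params = (512 * 100 + 100) + (100 * 11 + 11)
print(dense_params)
```

Only these roughly 52 thousand parameters are trained; the VGG16 weights that produced the features stay fixed.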
Test Evaluation: Transfer-learnt Neural Network Classifier
The following code evaluates the shallow neural network (NN) model trained on the transferred features. The confusion matrix for the 11 classes can be seen below.
preds = np.argmax(model_transfer.predict(test_features), axis=1)
print("\nAccuracy on Test Data: ", accuracy_score(test_y, preds))
print("\nNumber of correctly identified images: ",
accuracy_score(test_y, preds, normalize=False),"\n")
confusion_matrix(test_y, preds, labels=range(0,11))
Logistic Regression: VGG16 Features
Finally, we use the flattened extracted features (7x7x512 = 25088 features) from the last max-pooling layer of VGG16 as input to a logistic regression (LR) classifier. The validation set is used for tuning the hyper-parameters of the logistic regression model. It was found that a regularized logistic model performs better than the default LR model. Here, we use the hypopt PyPI package to perform a grid search over the hyper-parameters.
param_grid = [{'C': [0.1,1,10],'solver': ['newton-cg','lbfgs']}]
# Grid-search all parameter combinations using a validation set.
opt = GridSearch(model = LogisticRegression(class_weight='balanced', multi_class="auto",
max_iter=200, random_state=1),param_grid = param_grid)
opt.fit(train_features_flatten, train_y, val_features_flatten, val_y, scoring = 'accuracy')
print(opt.get_best_params())
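Conceptually, a grid search against a fixed validation set trains one model per parameter combination and keeps the one that scores best on the held-out data. The sketch below shows that loop in plain Python. It is not hypopt's implementation; `fit_fn` and `score_fn` are hypothetical callbacks, and the toy scoring function is invented so the example runs without any data.

```python
from itertools import product

def grid_search_with_validation(param_grid, fit_fn, score_fn):
    """Try every parameter combination and keep the best validation score.

    fit_fn(params) trains a model on the training set and returns it;
    score_fn(model) returns its score on the held-out validation set.
    """
    names = sorted(param_grid)
    best_params, best_score, best_model = None, float("-inf"), None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        model = fit_fn(params)
        score = score_fn(model)
        if score > best_score:
            best_params, best_score, best_model = params, score, model
    return best_params, best_score, best_model

# Toy stand-ins: the "model" is just its parameters and the score peaks at C=1
toy_grid = {"C": [0.1, 1, 10], "solver": ["newton-cg", "lbfgs"]}
fit = lambda params: params
score = lambda model: -abs(model["C"] - 1)
best_params, best_score, _ = grid_search_with_validation(toy_grid, fit, score)
print(best_params, best_score)
```

With the grid used above, this means six logistic regression fits on 25088-dimensional features, which is why this step can take a while.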
Test Evaluation: Logistic Regression Classifier
The following code evaluates the logistic regression model on the test set. The confusion matrix for the 11 classes can be seen below.
opt.score(test_features_flatten, test_y)
preds = opt.predict(test_features_flatten)
print("\nAccuracy on Test Data: ", accuracy_score(test_y, preds))
print("\nNumber of correctly identified images: ",
accuracy_score(test_y, preds, normalize=False),"\n")
confusion_matrix(test_y, preds, labels=range(0,11))