This blog post presents a demonstration of emotion recognition on faces detected in real-time video or in still images.

Introduction

A facial emotion recognition system is a two-step process: face detection (locating the bounding box) in an image, followed by emotion detection on the detected face. The following two techniques are used for these respective tasks.

  1. Haar feature-based cascade classifiers : It detects frontal faces in an image reliably, runs in real time, and is faster than most other face detectors. This blog post uses the implementation from OpenCV.
  2. Xception CNN model (Mini_Xception, 2017) : We will train a classification CNN architecture which takes the detected face (48×48 pixels) as input and predicts the probabilities of 7 emotions in the output layer.

Data-set

One can download the facial expression recognition (FER) data-set from the Kaggle challenge here. The data consists of 48×48 pixel grayscale images of faces. The faces have been automatically registered so that each face is more or less centered and occupies about the same amount of space in every image. The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).

The training set consists of 35,888 examples. train.csv contains two columns, “emotion” and “pixels”. The “emotion” column contains a numeric code ranging from 0 to 6, inclusive, for the emotion present in the image. The “pixels” column contains a quoted string for each image, whose contents are space-separated pixel values in row-major order.
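
For a quick sanity check of the file layout described above, a minimal sketch like the following can be used (it assumes the combined fer2013/fer2013.csv file used later in this post; the exact column set may differ slightly between the Kaggle files):

import pandas as pd

# load the FER2013 csv and inspect its structure
data = pd.read_csv('fer2013/fer2013.csv')
print(data.shape)                      # number of examples and columns
print(data.columns.tolist())           # should include the 'emotion' and 'pixels' columns described above
print(data['emotion'].value_counts())  # class distribution over the 7 emotion codes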

Loading FER Data-set

The code below loads the data-set and pre-processes the images for feeding them to the CNN model. There are two definitions in the code snippet:

1. def load_fer2013 : It reads the csv file and converts the pixel sequence of each row into an image of dimension 48×48. It returns the faces and the emotion labels.

2. def preprocess_input : It is a standard way to pre-process images by scaling them between -1 and 1. The image is first scaled to [0,1] by dividing it by 255. Then, subtracting 0.5 and multiplying by 2 changes the range to [-1,1], which has been found to be a better input range for neural network models in computer vision problems.

import pandas as pd
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

dataset_path = 'fer2013/fer2013.csv'
image_size = (48, 48)

def load_fer2013():
    data = pd.read_csv(dataset_path)
    pixels = data['pixels'].tolist()
    width, height = 48, 48
    faces = []
    for pixel_sequence in pixels:
        # convert the space-separated pixel string into a 48x48 image
        face = [int(pixel) for pixel in pixel_sequence.split(' ')]
        face = np.asarray(face).reshape(width, height)
        face = cv2.resize(face.astype('uint8'), image_size)
        faces.append(face.astype('float32'))
    faces = np.asarray(faces)
    faces = np.expand_dims(faces, -1)                    # add channel dimension: (N, 48, 48, 1)
    emotions = pd.get_dummies(data['emotion']).values    # one-hot encoded emotion labels
    return faces, emotions

def preprocess_input(x, v2=True):
    x = x.astype('float32')
    x = x / 255.0        # scale to [0, 1]
    if v2:
        x = x - 0.5      # shift to [-0.5, 0.5]
        x = x * 2.0      # scale to [-1, 1]
    return x

faces, emotions = load_fer2013()
faces = preprocess_input(faces)
xtrain, xtest, ytrain, ytest = train_test_split(faces, emotions, test_size=0.2, shuffle=True)

Five sample expressions for each of the 7 emotions in the data-set can be seen below.

Originally, in the data-set provided at the Kaggle link, each image is given as a string: a 1×2304 row vector, which is the 48×48 image flattened in row-major order. The strings in the .csv file can be converted into images using the code in the GitHub link here.
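
As a rough sketch of that conversion (assuming the same fer2013.csv path as above and an existing output directory named samples/; both names are only illustrative), each pixel string can be reshaped and written to disk like this:

import pandas as pd
import numpy as np
import cv2

data = pd.read_csv('fer2013/fer2013.csv')
for i, row in data.head(5).iterrows():
    # reshape the 1x2304 pixel string back into a 48x48 grayscale image
    pixels = np.asarray(row['pixels'].split(' '), dtype='uint8').reshape(48, 48)
    cv2.imwrite('samples/emotion_{}_{}.png'.format(row['emotion'], i), pixels)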

Training CNN model : Mini Xception

Here comes the exciting part: an architecture which is comparatively small yet achieves almost state-of-the-art performance for classifying emotions on this data-set. The architecture below was proposed by Octavio Arriaga et al. in this paper.

Proposed Mini_Xception architecture for emotion classification

One can notice that the center block is repeated 4 times in the design. This architecture differs from the most common CNN architectures, like the one used in the blog post here. Common architectures use fully connected layers at the end, where most of the parameters reside, and they use standard convolutions. Modern CNN architectures such as Xception instead leverage the combination of two of the most successful experimental assumptions in CNNs: the use of residual modules and depth-wise separable convolutions.

There are various techniques worth keeping in mind while building a deep neural network, and they are applicable to most computer vision problems. Below are a few of those techniques, all of which are used while training the CNN model below.

  1. Data Augmentation : More data is generated from the training set by applying transformations. It is needed when the training set is not large enough for the model to learn a good representation. The image data is generated by transforming the actual training images with rotations, crops, shifts, shears, zooms, flips, reflections, normalization, etc.
  2. Kernel_regularizer : It allows applying penalties on layer parameters during optimization. These penalties are incorporated into the loss function that the network optimizes. The argument used in the convolution layers below is nothing but L2 regularization of the weights. It penalizes peaky weights and makes sure that all the inputs are considered.
  3. BatchNormalization : It normalizes the activations of the previous layer at each batch, i.e. it applies a transformation that keeps the mean activation close to 0 and the activation standard deviation close to 1. It addresses the problem of internal covariate shift, acts as a regularizer (in some cases eliminating the need for Dropout), and speeds up the training process.
  4. Global Average Pooling : It reduces each feature map to a scalar value by taking the average over all elements in the feature map. The averaging operation forces the network to extract global features from the input image.
  5. Depthwise Separable Convolution : These convolutions are composed of two different layers: depth-wise convolutions and point-wise convolutions. Depth-wise separable convolutions reduce the computation compared to standard convolutions by reducing the number of parameters (see the parameter-count sketch after this list). A very nice and visual explanation of the difference between standard and depth-wise separable convolutions is given in the paper.

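To make the parameter saving of point 5 concrete, here is a small sketch (the layer sizes are made up purely for illustration) comparing a standard convolution against a depth-wise separable one in Keras:

from keras.models import Sequential
from keras.layers import Conv2D, SeparableConv2D

# standard 3x3 convolution mapping 64 channels to 128 channels
standard = Sequential([Conv2D(128, (3, 3), padding='same', input_shape=(48, 48, 64))])
# depth-wise separable 3x3 convolution with the same input/output shape
separable = Sequential([SeparableConv2D(128, (3, 3), padding='same', input_shape=(48, 48, 64))])

print(standard.count_params())   # 3*3*64*128 weights + 128 biases = 73,856
print(separable.count_params())  # 3*3*64 depth-wise + 64*128 point-wise + 128 biases = 8,896
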
The Python code below implements the above architecture in Keras.

from keras.callbacks import CSVLogger, ModelCheckpoint, EarlyStopping
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from keras.layers import Activation, Convolution2D, Dropout, Conv2D
from keras.layers import AveragePooling2D, BatchNormalization
from keras.layers import GlobalAveragePooling2D
from keras.models import Sequential
from keras.layers import Flatten
from keras.models import Model
from keras.layers import Input
from keras.layers import MaxPooling2D
from keras.layers import SeparableConv2D
from keras import layers
from keras.regularizers import l2
import pandas as pd
import cv2
import numpy as np

# parameters
batch_size = 32
num_epochs = 110
input_shape = (48, 48, 1)
verbose = 1
num_classes = 7
patience = 50
base_path = 'models/'
l2_regularization=0.01

# data generator
data_generator = ImageDataGenerator(
                        featurewise_center=False,
                        featurewise_std_normalization=False,
                        rotation_range=10,
                        width_shift_range=0.1,
                        height_shift_range=0.1,
                        zoom_range=.1,
                        horizontal_flip=True)

# model parameters
regularization = l2(l2_regularization)

# base
img_input = Input(input_shape)
x = Conv2D(8, (3, 3), strides=(1, 1), kernel_regularizer=regularization, use_bias=False)(img_input)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(8, (3, 3), strides=(1, 1), kernel_regularizer=regularization, use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)

# module 1
residual = Conv2D(16, (1, 1), strides=(2, 2), padding='same', use_bias=False)(x)
residual = BatchNormalization()(residual)
x = SeparableConv2D(16, (3, 3), padding='same', kernel_regularizer=regularization, use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = SeparableConv2D(16, (3, 3), padding='same', kernel_regularizer=regularization, use_bias=False)(x)
x = BatchNormalization()(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
x = layers.add([x, residual])

# module 2
residual = Conv2D(32, (1, 1), strides=(2, 2), padding='same', use_bias=False)(x)
residual = BatchNormalization()(residual)
x = SeparableConv2D(32, (3, 3), padding='same', kernel_regularizer=regularization, use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = SeparableConv2D(32, (3, 3), padding='same', kernel_regularizer=regularization, use_bias=False)(x)
x = BatchNormalization()(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
x = layers.add([x, residual])

# module 3
residual = Conv2D(64, (1, 1), strides=(2, 2),padding='same', use_bias=False)(x)
residual = BatchNormalization()(residual)
x = SeparableConv2D(64, (3, 3), padding='same',kernel_regularizer=regularization,use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = SeparableConv2D(64, (3, 3), padding='same',kernel_regularizer=regularization,use_bias=False)(x)
x = BatchNormalization()(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
x = layers.add([x, residual])

# module 4
residual = Conv2D(128, (1, 1), strides=(2, 2),padding='same', use_bias=False)(x)
residual = BatchNormalization()(residual)
x = SeparableConv2D(128, (3, 3), padding='same',kernel_regularizer=regularization,use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = SeparableConv2D(128, (3, 3), padding='same',kernel_regularizer=regularization,use_bias=False)(x)
x = BatchNormalization()(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
x = layers.add([x, residual])
x = Conv2D(num_classes, (3, 3), padding='same')(x)
x = GlobalAveragePooling2D()(x)
output = Activation('softmax',name='predictions')(x)

model = Model(img_input, output)
model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])
model.summary()

# callbacks
log_file_path = base_path + '_emotion_training.log'
csv_logger = CSVLogger(log_file_path, append=False)
early_stop = EarlyStopping('val_loss', patience=patience)
reduce_lr = ReduceLROnPlateau('val_loss', factor=0.1, patience=int(patience/4), verbose=1)
trained_models_path = base_path + '_mini_XCEPTION'
model_names = trained_models_path + '.{epoch:02d}-{val_acc:.2f}.hdf5'
model_checkpoint = ModelCheckpoint(model_names, 'val_loss', verbose=1,save_best_only=True)
callbacks = [model_checkpoint, csv_logger, early_stop, reduce_lr]

model.fit_generator(data_generator.flow(xtrain, ytrain,batch_size),
                        steps_per_epoch=len(xtrain) // batch_size,
                        epochs=num_epochs, verbose=1, callbacks=callbacks,
                        validation_data=(xtest,ytest))

The model gives 65-66% accuracy on the validation set while training. The CNN model learns representations of the emotions from the training images. Below are a few epochs of the training process with a batch size of 64.
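
Once training finishes, the held-out accuracy can be checked directly (a minimal sketch, reusing the xtest/ytest split and the compiled model defined above):

# evaluate the trained model on the held-out split
loss, accuracy = model.evaluate(xtest, ytest, verbose=0)
print('validation loss: {:.4f}, validation accuracy: {:.4f}'.format(loss, accuracy))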

Training and validation loss over the training epochs

Testing the Model

On Images

While performing tests on the trained model, I felt that the model detects the emotion of a face as neutral if the expression is not made distinguishable enough. The model gives the probability of each emotion class in the output layer of the trained mini_xception CNN model. Below are 18 facial expressions taken from Google Images to validate the trained model.

In order to detect emotion in a single image, one can execute the python code below.

from keras.preprocessing.image import img_to_array
from keras.models import load_model
import imutils
import cv2
import numpy as np
import sys

# parameters for loading data and images
detection_model_path = 'haarcascade_files/haarcascade_frontalface_default.xml'
emotion_model_path = 'models/_mini_XCEPTION.106-0.65.hdf5'
img_path = sys.argv[1]

# hyper-parameters for bounding boxes shape
# loading models
face_detection = cv2.CascadeClassifier(detection_model_path)
emotion_classifier = load_model(emotion_model_path, compile=False)
EMOTIONS = ["angry","disgust","scared", "happy", "sad", "surprised","neutral"]

# reading the frame: a color copy for drawing and a grayscale copy for detection and classification
orig_frame = cv2.imread(img_path)
frame = cv2.imread(img_path, 0)
faces = face_detection.detectMultiScale(frame, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30), flags=cv2.CASCADE_SCALE_IMAGE)

if len(faces) > 0:
    # pick the largest detected face; detections are (x, y, w, h) tuples, so area is w * h
    faces = sorted(faces, reverse=True, key=lambda x: x[2] * x[3])[0]
    (fX, fY, fW, fH) = faces
    # extract the face region of interest and resize it to the network input size
    roi = frame[fY:fY + fH, fX:fX + fW]
    roi = cv2.resize(roi, (48, 48))
    roi = roi.astype("float") / 255.0
    roi = img_to_array(roi)
    roi = np.expand_dims(roi, axis=0)
    # predict emotion probabilities and take the most likely class
    preds = emotion_classifier.predict(roi)[0]
    emotion_probability = np.max(preds)
    label = EMOTIONS[preds.argmax()]
    cv2.putText(orig_frame, label, (fX, fY - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
    cv2.rectangle(orig_frame, (fX, fY), (fX + fW, fY + fH), (0, 0, 255), 2)

cv2.imshow('test_face', orig_frame)
cv2.imwrite('test_output/'+img_path.split('/')[-1],orig_frame)
if (cv2.waitKey(2000) & 0xFF == ord('q')):
    sys.exit("Thanks")
cv2.destroyAllWindows()

On Video

In order to detect emotion from a webcam feed, one can execute the Python code here.
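
The per-frame logic is the same as in the single-image script above. A minimal sketch of the webcam loop (assuming the same cascade and model paths and the same EMOTIONS list as above) could look like this:

import cv2
import numpy as np
from keras.models import load_model

face_detection = cv2.CascadeClassifier('haarcascade_files/haarcascade_frontalface_default.xml')
emotion_classifier = load_model('models/_mini_XCEPTION.106-0.65.hdf5', compile=False)
EMOTIONS = ["angry", "disgust", "scared", "happy", "sad", "surprised", "neutral"]

camera = cv2.VideoCapture(0)
while True:
    ret, frame = camera.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detection.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
    for (fX, fY, fW, fH) in faces:
        # classify each detected face exactly as in the single-image script
        roi = cv2.resize(gray[fY:fY + fH, fX:fX + fW], (48, 48)).astype("float") / 255.0
        roi = np.expand_dims(np.expand_dims(roi, axis=-1), axis=0)
        label = EMOTIONS[emotion_classifier.predict(roi)[0].argmax()]
        cv2.putText(frame, label, (fX, fY - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
        cv2.rectangle(frame, (fX, fY), (fX + fW, fY + fH), (0, 0, 255), 2)
    cv2.imshow('emotion recognition', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
camera.release()
cv2.destroyAllWindows()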

References

The demonstration code has been adapted from the following sources.

[1] https://github.com/oarriaga/face_classification
[2] https://github.com/omar178/Emotion-recognition
[3] O. Arriaga et al., Real-time Convolutional Neural Networks for Emotion and Gender Classification

If you liked the post, follow this blog to get updates about upcoming articles. Also, share it so that it reaches readers who can actually gain from it. Please feel free to discuss anything regarding the post. I would love to hear your feedback.

Happy deep learning 🙂