Face verification and identification systems have become very popular in computer vision with the advancement of deep learning models such as Convolutional Neural Networks (CNNs). A few weeks ago, I decided to explore face recognition using deep-learning-based models. This blog post demonstrates building a face recognition system from scratch.


A face recognition system is a two-step process: face detection (finding a bounding box around each face in the image) followed by face identification (identifying the person in each detected bounded face). The following two techniques are used for these respective tasks.

  1. Multi-Task Cascaded Convolutional Networks (MTCNN, 2016): detects all the faces in an image and puts a bounding box around each of them.
  2. FaceNet CNN model (FaceNet, 2015): generates an embedding (a 512-dimensional feature vector in the pre-trained model used here) of the detected bounded face, which is then matched against the embeddings of the training faces of people in the database. This model is used to identify the person in the detected face.

Figure: Face recognition system pipeline
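To make the two stages concrete, here is a minimal sketch of how they chain together. It is only an illustration: the function `recognize_faces` and the three callables passed to it are hypothetical names, not part of the FaceNet repository, and the toy stand-ins exist only so the sketch runs without any model files.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def recognize_faces(
    image,
    detect_faces: Callable[[object], List[Box]],
    embed_face: Callable[[object, Box], Sequence[float]],
    identify: Callable[[Sequence[float]], str],
) -> List[Tuple[Box, str]]:
    """Detect every face, then embed and identify each one."""
    results = []
    for box in detect_faces(image):            # stage 1: detection (MTCNN's role)
        embedding = embed_face(image, box)     # stage 2a: embedding (FaceNet's role)
        results.append((box, identify(embedding)))  # stage 2b: matching
    return results


# Toy stand-ins (hypothetical) so the sketch runs on its own.
def fake_detector(image):
    return [(10, 10, 50, 50), (60, 20, 100, 60)]


def fake_embedder(image, box):
    return [float(box[0]), float(box[1])]


def fake_identifier(embedding):
    return "person_a" if embedding[0] < 30 else "person_b"


print(recognize_faces("photo.jpg", fake_detector, fake_embedder, fake_identifier))
```

Passing the stages in as callables keeps the sketch independent of any particular detector or embedding model.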

Before moving ahead, let us understand the difference between the verification and identification tasks.

1. Face Verification

It answers the question of person verification, i.e. whether the detected face belongs to a particular person. For example, we may need to verify a person by matching the detected face against his/her stored historical facial images.

Verification is implemented with a threshold (an empirical value) on a score. The score is the Euclidean distance between the embedding vectors of the two faces in question. If the score is below the threshold, the pair is considered a match (positive); otherwise it is considered a non-match. A low score means the detected face is close to the stored historical face of the person (and hence verified), while a high score means the two faces are different.
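The thresholding rule can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the FaceNet repository: the function names are my own, and the three-dimensional vectors are made-up stand-ins for real 512-dimensional embeddings.

```python
import math


def euclidean_distance(emb_a, emb_b):
    """Euclidean (L2) distance between two embedding vectors."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))


def is_same_person(emb_a, emb_b, threshold):
    """Verified (True) when the distance falls below the threshold."""
    return euclidean_distance(emb_a, emb_b) < threshold


# Made-up toy embeddings, for illustration only.
stored = [0.6, 0.8, 0.0]
probe_close = [0.8, 0.6, 0.0]
probe_far = [-0.6, -0.8, 0.0]

print(is_same_person(stored, probe_close, threshold=1.1))  # True
print(is_same_person(stored, probe_far, threshold=1.1))    # False
```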

FaceNet achieved accuracies of 98.87% ± 0.15 and 99.63% ± 0.09 under two different settings on the LFW face verification task. The selected optimal threshold was 1.24. Labeled Faces in the Wild (LFW) is the de facto academic test set for face verification; it contains more than 13,000 labelled facial images collected from the web, covering 5,749 people, 1,680 of whom have two or more images.
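A threshold like this is chosen by measuring verification accuracy on labelled pairs. The sketch below shows the idea in simplified form: it scores a few candidate thresholds on one set of pairs and keeps the best. The function names are hypothetical and the distances and labels are made-up toy values; a real evaluation would select the threshold on separate training splits and report accuracy on held-out pairs.

```python
def verification_accuracy(distances, same_labels, threshold):
    """Fraction of pairs classified correctly at the given threshold."""
    if len(distances) != len(same_labels):
        raise ValueError("need one label per distance")
    correct = sum(
        (d < threshold) == bool(same)
        for d, same in zip(distances, same_labels)
    )
    return correct / len(distances)


def best_threshold(distances, same_labels, candidates):
    """Candidate with the highest accuracy (first one wins on ties)."""
    return max(
        candidates,
        key=lambda t: verification_accuracy(distances, same_labels, t),
    )


# Toy data: distance for each face pair and whether it is the same person.
toy_distances = [0.4, 0.7, 0.9, 1.2, 1.3, 1.5]
toy_same = [True, True, True, False, True, False]

chosen = best_threshold(toy_distances, toy_same, candidates=[0.8, 1.1, 1.4])
print(chosen, verification_accuracy(toy_distances, toy_same, chosen))
```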

2. Face Identification

It answers the question of who the person in a detected face is. For example, we may need to identify the person in a detected face against a database of 1000 people.

Identification can be implemented by training a simple multi-class classifier such as K-NN or SVM on the face embeddings generated by FaceNet. As this post progresses, we will see how to train a face classifier on our own dataset of people.
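As a rough illustration of classifying embeddings, here is a tiny K-NN classifier in plain Python. Note that this is K-NN, not the SVM trained later in this post; the function name `knn_identify`, the two-dimensional vectors and the person names are all made up for the example.

```python
import math
from collections import Counter


def knn_identify(query, train_embeddings, train_labels, k=3):
    """Label a query embedding by majority vote among its k nearest neighbours."""
    neighbours = sorted(
        (math.sqrt(sum((q - e) ** 2 for q, e in zip(query, emb))), label)
        for emb, label in zip(train_embeddings, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Made-up 2-D "embeddings" of two people (real ones are 512-D).
train_embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # alice
    [1.0, 1.0], [0.9, 1.0], [1.0, 0.9],   # bob
]
train_labels = ["alice", "alice", "alice", "bob", "bob", "bob"]

print(knn_identify([0.2, 0.1], train_embeddings, train_labels))  # alice
print(knn_identify([0.8, 0.9], train_embeddings, train_labels))  # bob
```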

You may also be interested in reading this paper to see identification and verification accuracy on 1 million faces (the MegaFace benchmark). For both of the tasks above, a face detector has to run first in order to find the bounded faces in the image.

Setting up Environment

Let us set up a virtual environment on a Linux-based (Ubuntu) system for this demonstration; the same can easily be done on Windows. A virtual environment is a tool that keeps the dependencies required by different projects in separate places by creating isolated Python environments for them, so it is a good idea to create one for a demonstration like this.

You can find requirements.txt here. This file lists the packages needed for the face recognition experiments in this blog post.

>>> virtualenv Face_ID
>>> cd Face_ID
>>> source bin/activate
>>> pip install -r requirements.txt

In this blog post, the FaceNet and MTCNN code is taken from David Sandberg’s FaceNet implementation, found here. To use it, we need to do the following steps.

  1. Download this GitHub repository. Keep the folders of this repository in the Face_ID directory created with virtual environment.
  2. Download the pre-trained model from here. Keep the pre-trained model directory in Face_ID/facenet/src/.

Face Verification

Let us take four images and see how we can compare them in terms of the Euclidean distance between the embeddings generated by the FaceNet model. We will use Face_ID/facenet/src/compare.py to get the distances.

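>>> python facenet/src/compare.py facenet/src/20180402-114759/ \
    facenet/dataset/test-images/bradley.jpeg facenet/dataset/test-images/hritik.jpeg \
    facenet/dataset/test-images/mark1.jpeg facenet/dataset/test-images/mark.jpeg \
    --image_size 160 --margin 32 --gpu_memory_fraction 0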

0: facenet/dataset/test-images/bradley.jpeg
1: facenet/dataset/test-images/hritik.jpeg
2: facenet/dataset/test-images/mark1.jpeg
3: facenet/dataset/test-images/mark.jpeg

Distance matrix

        0       1       2       3
0  0.0000  1.1274  1.4113  1.4722
1  1.1274  0.0000  1.4643  1.4246
2  1.4113  1.4643  0.0000  0.6942
3  1.4722  1.4246  0.6942  0.0000

This simple example shows that the Euclidean distance between embeddings is low for two faces of the same person (0.6942 between images 2 and 3; green), high for faces of clearly different people (roughly 1.41 to 1.47; red) and in between, neither high nor low, for similar-looking faces (1.1274 between images 0 and 1; amber).

With a threshold of 1.1, all of the above samples are verified correctly. Readers are encouraged to go through the Python code of compare.py. Note that a face detector (the MTCNN model) runs first to extract the bounded face from each of the above images; the distances are then calculated between the embeddings that the FaceNet CNN model extracts from those bounded faces.
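To see the threshold at work, the sketch below applies it to the distance matrix printed above. The matrix values and image names are copied from that output; the helper `matching_pairs` is a hypothetical name introduced just for this illustration.

```python
names = ["bradley", "hritik", "mark1", "mark"]
distance_matrix = [
    [0.0000, 1.1274, 1.4113, 1.4722],
    [1.1274, 0.0000, 1.4643, 1.4246],
    [1.4113, 1.4643, 0.0000, 0.6942],
    [1.4722, 1.4246, 0.6942, 0.0000],
]


def matching_pairs(names, matrix, threshold):
    """Return every pair of images whose distance is below the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if matrix[i][j] < threshold:
                pairs.append((names[i], names[j]))
    return pairs


print(matching_pairs(names, distance_matrix, threshold=1.1))
# prints [('mark1', 'mark')]
```

Raising the threshold to 1.2 would also match bradley with hritik (distance 1.1274), which shows how sensitive verification is to the chosen value.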

Face Identification

Let us train a face recognition model on our own dataset. We will train a classifier (SVM) on the faces of 6 people and then run face recognition on images or videos. The face identification experiment consists of the following steps.

  1. Dataset Preparation
    Collect at least 10 images per person and keep them in Face_ID/facenet/dataset/raw. As of now, you will find raw image folders of 6 people at that path. Some of the sample images are shown below.
  2. Face Detection
    Run the face detection and alignment algorithm, i.e. the MTCNN-based model, to extract only the bounded faces from all images and prepare an aligned directory. The output is saved in the Face_ID/facenet/dataset/aligned path.

    >>> python facenet/src/align/align_dataset_mtcnn.py \
        facenet/dataset/raw facenet/dataset/aligned \
        --image_size 160 --margin 32
    # Read about the parameters in the code of align_dataset_mtcnn.py

    Output bounded faces from MTCNN are shown below:

  3. Training Faces
    First, the script generates a 512-dimensional embedding vector for each of the 10 faces of every individual. It then trains a multi-class support vector machine (SVM) classifier on the generated vectors.

    >>> python facenet/src/classifier.py TRAIN \
        facenet/dataset/aligned facenet/src/20180402-114759/ \
        --batch_size 1000 --min_nrof_images_per_class 10 \
        --nrof_train_images_per_class 10 --use_split_dataset
    # Read about the parameters in the code of classifier.py
  4. Face Recognition
    We are ready to run face recognition on test images. The test images can be found in the path Face_ID/facenet/dataset/test-images/

    >>> python facenet/src/face_recognition_image.py facenet/dataset/test-images/1.jpg
    >>> python facenet/src/face_recognition_image.py facenet/dataset/test-images/2.jpg
    >>> python facenet/src/face_recognition_image.py facenet/dataset/test-images/test1.jpg
    >>> python facenet/src/face_recognition_image.py facenet/dataset/test-images/test2.jpg
    >>> python facenet/src/face_recognition_image.py facenet/dataset/test-images/test3.jpg
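Since training above is run with --min_nrof_images_per_class 10, it can help to check the dataset from step 1 before running detection and training. The following is a small sketch of such a check, assuming one sub-folder per person inside the dataset directory; the function names and the list of image suffixes are my own choices, not part of the repository.

```python
from pathlib import Path

# Assumed set of image file extensions; adjust to your own dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_person(dataset_dir):
    """Map each person's sub-folder name to the number of image files in it."""
    counts = {}
    for person_dir in sorted(Path(dataset_dir).iterdir()):
        if person_dir.is_dir():
            counts[person_dir.name] = sum(
                1
                for f in person_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def people_below_minimum(dataset_dir, minimum=10):
    """Names of people who have fewer than `minimum` images."""
    counts = count_images_per_person(dataset_dir)
    return [name for name, n in counts.items() if n < minimum]
```

For example, calling people_below_minimum("facenet/dataset/raw") from the Face_ID directory would list the people who still need more images; an empty list means every folder meets the minimum.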

Beginners are encouraged to play with the Python code to get a much better understanding of the algorithms. After running the commands above, you will get the following results (1.jpg and 2.jpg).

Here, we can see that all the faces are detected with bounding boxes, but none of them is recognized as one of the 6 people we trained on. A threshold of 0.43 is applied to the predicted probabilities (from the SVM's predict_proba() method) for each bounded face. In the examples below (test1.jpg, test2.jpg, test3.jpg), on the other hand, the bounded faces have been recognized.
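The rejection rule can be sketched as follows. This is an illustration under assumptions, not the code from face_recognition_image.py: the function name, the "Unknown" label and the use of a strict less-than comparison are my own choices, and the probabilities and class names are made up.

```python
def label_from_probabilities(probabilities, class_names, threshold=0.43):
    """Return the most probable class, or "Unknown" if it is not confident enough."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best_index] < threshold:
        return "Unknown"
    return class_names[best_index]


# Made-up class names and probabilities for illustration.
class_names = ["person_a", "person_b", "person_c"]

print(label_from_probabilities([0.20, 0.30, 0.50], class_names))  # person_c
print(label_from_probabilities([0.30, 0.35, 0.35], class_names))  # Unknown
```

Without such a threshold, a classifier trained on 6 people would assign every detected face to one of those 6, including faces of strangers.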


With the threshold of 0.43, one of the faces in test1.jpg could not be identified as ‘Bug’ (the person's nickname :-p). There are two possible reasons. First, the face has not been detected well (the right eye is not covered by the bounding box). Second, it is a low-resolution image; recognition accuracy is likely to be better with a higher-resolution image. Likewise, I ran face recognition on a short recorded video of my friends.

>>> python facenet/src/face_recognition_video.py 

Concluding Remarks

I hope the tutorial was easy to follow, as I have tried to keep it simple and reproducible. Beginners who are interested in image analytics/computer vision can start with this application.

You might be wondering about the mathematics behind the models used here, MTCNN and FaceNet. You are encouraged to study these models from the references section. Apart from that, there are many further experiments that can be done.

  1. Experiment on a larger scale with the pre-trained model. The example in this blog post has 6 people in the database; one could set up an experiment with 100 people in the dataset. If you are interested in doing so, do reach out to me.
  2. Similarly, experiments can be done with more images per person, as 10 images may not be sufficient for good accuracy when there is a large number of people in the database. Also, SVM may not perform well with a large number of classes; other multi-class classifiers such as K-NN can be tried out.
  3. A fair study of accuracy against the resolution of the test images can be done. In my experience here, a high-resolution image performs better than a low-resolution one, since the number of pixels captured in the bounded face affects recognition.
  4. For face recognition in video, people usually implement tracking of each bounded face in order to smooth the results and filter out wrong identifications in a few abrupt frames in between. Face tracking has to be implemented for this.
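As an illustration of the smoothing idea in the last point, the sketch below takes a majority vote over the most recent per-frame labels of a single face. It assumes the face has already been tracked across frames (the tracking itself is not shown); the class name `LabelSmoother` and the window size are hypothetical choices, and the frame labels are made up.

```python
from collections import Counter, deque


class LabelSmoother:
    """Majority vote over the last `window` labels of one tracked face."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)  # old labels drop out automatically

    def update(self, label):
        """Add the newest per-frame label and return the smoothed label."""
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]


# Made-up per-frame predictions with one abrupt wrong frame in between.
frame_labels = ["Bug", "Bug", "Unknown", "Bug", "Bug"]
smoother = LabelSmoother(window=5)
smoothed = [smoother.update(label) for label in frame_labels]
print(smoothed)
# prints ['Bug', 'Bug', 'Bug', 'Bug', 'Bug']
```

A larger window suppresses more noise but reacts more slowly when a different person actually enters the tracked position.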

I will probably discuss the architecture of these CNN models and their Python implementations in another blog post some other time. You can get the full Python and TensorFlow code for this experiment from the GitHub link here.


All the concepts, explorations and demonstration code have been taken from these sources.

[1] https://github.com/davidsandberg/facenet
[2] https://github.com/AISangam/Facenet-Real-time-face-recognition-using-deep-learning-Tensorflow
[3] https://arxiv.org/pdf/1503.03832.pdf
[4] https://arxiv.org/pdf/1604.02878.pdf

If you liked the post, follow this blog to get updates about upcoming articles. Also, share it so that it can reach readers who can actually benefit from it. Please feel free to discuss anything regarding the post. I would love to hear feedback from you.

Happy deep learning 🙂