Face Recognition using Python, Dlib and OpenCV

While exploring and researching the underlying concepts behind deep neural networks, I got burnt out by the boring (but necessary) mathematical concepts (gradients, logistic regression, cross entropy, Lagrange multipliers, etc.) and algorithms. To get back on my feet, I needed to build something practical, quickly, so that I wouldn't lose interest in the topic. I did some research and eventually found a gem on GitHub: a custom Python module called "face_recognition" by ageitgey, which he calls "the world's simplest facial recognition API for Python". What really caught my eye was his way of using the dlib and OpenCV libraries to capture a video stream and recognize faces in every frame. It was very impressive, but relatively slow. The problem was that the main process performs every step on every frame of the video stream: capturing the frame, locating faces, encoding the faces to compare with known faces, and decorating each detected face with a rectangle and label. The result is very choppy video and, most importantly, it was not acceptable for my kids ;). I had to do something to impress my kids with my own implementation.

I have to start with a disclaimer. This post was inspired by ageitgey's API, but I am not using his library. Don't get me wrong! His face recognition API (wrapper classes around dlib) is really good, and I highly recommend it for those who want to quickly try face recognition with his code. In addition, I am using the HOG+SVM face detector that comes with dlib instead of the CNN (Convolutional Neural Network) detector, because I think the CNN version is too slow to detect faces in a live video stream; at least, it is not suitable for real-time face detection on my hardware ;). My code is written for those who want to learn how to use dlib and OpenCV directly.

I am going to explain how to use the very powerful dlib library with the pre-trained face recognition ResNet model v1 and, at the same time, improve the performance of live face recognition using Python's threading module. By the way, you can find the entire code (236 lines) at https://github.com/jpark7ca/face_recognition. Let's get started.

First, you need to install Python 3 along with the dlib, OpenCV, NumPy and setuptools modules (cmake is required to build dlib). Once Python 3 is installed, follow the steps below to install the additional modules using pip.


$ pip3 install numpy
$ pip3 install cmake
$ pip3 install dlib
$ pip3 install setuptools
$ pip3 install opencv-python
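
To verify that everything installed correctly, you can run a quick import check (a minimal sanity test I am adding here; it is not part of the repo):


# Quick sanity check: all three libraries should import and report a version
import numpy as np
import cv2
import dlib

print("numpy:", np.__version__)
print("opencv:", cv2.__version__)
print("dlib:", dlib.__version__)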


If you have cloned my repo, you will find a file called "face_recognition_webcam_mt.py". The first section imports the modules needed for the face recognition process.


import numpy as np
import dlib
from pkg_resources import resource_filename
import cv2
from threading import Thread


The face recognition process consists of the following steps: a) load and encode face images (known faces) that will be recognized by this process, b) capture an image (a frame from the live video stream), c) locate a face in the image, d) encode the detected face, and finally, e) compare this encoded face with the encoded known faces.

To load images from the file system (Step a), I have created a static method called load_image in the FaceRecognition class, using the OpenCV library. It is very simple and self-explanatory, as shown below.


@staticmethod
def load_image(file, pixeltype=cv2.IMREAD_COLOR):
    # Read the image file with OpenCV and return it as a numpy array
    _image = cv2.imread(file, pixeltype)
    return np.array(_image)


I am going to create a list of known faces, as shown below. In this example, two image files are loaded, and the numpy array of each image is passed to another static method called face_encodings.

known_face_encodings = [
    FaceRecognition.face_encodings(FaceRecognition.load_image("person1.jpg"))[0],
    FaceRecognition.face_encodings(FaceRecognition.load_image("person2.jpg"))[0]
]


The face_encodings static method is the heart of the face recognition process. First, the detector returned by dlib.get_frontal_face_detector() finds the human faces in the image and returns a bounding rectangle around each one (Step c).


@staticmethod
def face_encodings(image, upsample=1, jitter=1):

...

# HOG+SVM based frontal face detector bundled with dlib
face_detector = dlib.get_frontal_face_detector()
_raw_face_locations = face_detector(image, upsample)
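
The detector returns a list of dlib.rectangle objects. If you want to inspect the raw coordinates, you can read the corners like this (a small illustration I am adding; rect_to_tuple is a hypothetical helper, not part of the repo):


def rect_to_tuple(rect):
    # dlib.rectangle exposes its corners via left()/top()/right()/bottom()
    return rect.left(), rect.top(), rect.right(), rect.bottom()

for face_location in _raw_face_locations:
    print(rect_to_tuple(face_location))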


Then, dlib.shape_predictor() is called with a 5-point landmarking model, which identifies the corners of the eyes and the bottom of the nose. The author of this model says it "is trained on the dlib 5-point face landmark dataset, which consists of 7198 faces". See the dlib-models repository on GitHub for details.


# Locate the 5-point landmark model file under the repo's models directory
predictor_5_model_location = resource_filename(__name__, "models/shape_predictor_5_face_landmarks.dat")

pose_predictor = dlib.shape_predictor(predictor_5_model_location)
# Find the landmarks for each detected face
_raw_landmarks = [pose_predictor(image, face_location) for face_location in _raw_face_locations]
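
Each returned object is a dlib.full_object_detection. If you are curious about what the predictor actually found, you can inspect the individual points (illustration only, not in the original code):


for landmarks in _raw_landmarks:
    # num_parts is 5 for this model; each part is a dlib.point with x and y
    for i in range(landmarks.num_parts):
        p = landmarks.part(i)
        print(i, p.x, p.y)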


Finally, the face_encodings method instantiates the dlib.face_recognition_model_v1 class with a pre-trained model (a ResNet with 29 convolutional layers) and calls compute_face_descriptor() to generate the encoded face image (a 128-dimensional feature vector) as a numpy array (Step d).


# Locate the pre-trained face recognition model under the repo's models directory
face_recognition_model_location = resource_filename(__name__, "models/dlib_face_recognition_resnet_model_v1.dat")

face_encoder = dlib.face_recognition_model_v1(face_recognition_model_location)

# Compute a 128-dimensional descriptor for each detected face
return [np.array(face_encoder.compute_face_descriptor(image, raw_landmark, jitter))
        for raw_landmark in _raw_landmarks]
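
Each encoding is a plain numpy vector, so you can check its shape directly (a quick illustration, assuming at least one face was found in the image):


encoding = FaceRecognition.face_encodings(FaceRecognition.load_image("person1.jpg"))[0]
print(encoding.shape)   # (128,)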


The face recognition process determines whether a target face has been detected by comparing a captured, encoded image (via the face_encodings method) with the known encoded images. This is done with the compare_encodings static method in the FaceRecognition class (Step e).


@staticmethod
def compare_encodings(known_encodings, encoding_check, tolerance=0.5):
    # A known face matches when its distance to the candidate is within tolerance
    return list(FaceRecognition.encoding_distance(known_encodings, encoding_check)
                <= tolerance)
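
The encoding_distance method is not shown above. Here is a minimal sketch of what it can look like, assuming it computes the Euclidean distance between the candidate encoding and each known encoding (the standard metric for dlib's face descriptors; the exact body in the repo may differ):


@staticmethod
def encoding_distance(known_encodings, encoding_check):
    # Euclidean (L2) distance between the candidate and each known 128-d encoding
    if len(known_encodings) == 0:
        return np.empty(0)
    return np.linalg.norm(np.asarray(known_encodings) - encoding_check, axis=1)


Two encodings of the same person are typically less than 0.6 apart, which is why the tolerance of 0.5 above makes for a fairly strict match.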


With that, the heavy lifting of the face recognition is done, and as you can see, it is really simple with a powerful library like dlib ;)

In my code, there are two additional threads running alongside the main thread: WebcamVideoStream and FaceRecognitionProcess. The FaceRecognitionProcess thread is responsible for detecting, encoding and comparing a detected face with the known faces, as explained previously. The WebcamVideoStream thread continuously captures frames from the video stream (Step b). Basically, the FaceRecognitionProcess gets a frame from WebcamVideoStream, and the main thread shows (renders) the processed frame on your computer screen.
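
For reference, here is a minimal sketch of what the WebcamVideoStream thread can look like (the class name matches my code, but the body below is a simplified illustration rather than the exact code from the repo):


from threading import Thread  # already imported at the top of the file
import cv2

class WebcamVideoStream:
    def __init__(self, src=0):
        # Open the webcam and grab the first frame
        self.stream = cv2.VideoCapture(src)
        self.grabbed, self.frame = self.stream.read()
        self.stopped = False

    def start(self):
        # Run update() in a daemon thread so it never blocks the main thread
        Thread(target=self.update, daemon=True).start()
        return self

    def update(self):
        # Keep grabbing frames until stop() is called
        while not self.stopped:
            self.grabbed, self.frame = self.stream.read()
        self.stream.release()

    def read(self):
        # Always return the most recent frame
        return self.frame

    def stop(self):
        self.stopped = True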

The reason for creating multiple threads is to keep the program from being delayed by the code that captures a frame from your webcam (I/O) and then encodes and compares a captured face with the known faces before rendering the processed frame. If your program runs face recognition (encoding and comparing) on every captured frame before displaying it, which takes considerable time, you will see choppy video on your screen. If you instead let the program show captured frames regardless of whether the face recognition process has finished, the render function can display images smoothly. Keep in mind that the rectangle drawn around a detected face will lag slightly when a person makes a sudden move.
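
In the same spirit, here is a minimal sketch of the FaceRecognitionProcess thread. Again, the class name and constructor arguments match my code, but the body is a simplified illustration, and recognize() is a hypothetical placeholder standing in for the detect/encode/compare steps built from the FaceRecognition methods described above:


class FaceRecognitionProcess:
    def __init__(self, capture, known_encodings, known_names, fx=0.25, fy=0.25):
        self.capture = capture
        self.known_encodings = known_encodings
        self.known_names = known_names
        self.fx, self.fy = fx, fy
        self.face_locations = []   # read by the main loop
        self.face_names = []       # read by the main loop
        self.stopped = False

    def start(self):
        Thread(target=self.update, daemon=True).start()
        return self

    def update(self):
        while not self.stopped:
            frame = self.capture.read()
            # Downscale the frame so detection and encoding run faster
            small = cv2.resize(frame, (0, 0), fx=self.fx, fy=self.fy)
            # recognize() is a hypothetical placeholder for the detect/encode/
            # compare steps using the FaceRecognition methods shown earlier
            self.face_locations, self.face_names = self.recognize(small)

    def stop(self):
        self.stopped = True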

Note that if you use a Mac, the OpenCV render call (cv2.imshow) must stay in the main process due to a known OpenCV limitation on macOS: GUI calls do not work reliably from background threads.

In the main routine, I encode two faces and save them in a list, and create another list with the names of the known faces.


known_face_encodings = [
    FaceRecognition.face_encodings(FaceRecognition.load_image("person1.jpg"))[0],
    FaceRecognition.face_encodings(FaceRecognition.load_image("person2.jpg"))[0]
]

known_face_names = [
    "John Doe",
    "Jane Roe"
]


Then, start two threads for video capturing and processing (face recognition).


video_capture = WebcamVideoStream(src=0).start()

video_process = FaceRecognitionProcess(capture=video_capture,
                                       known_encodings=known_face_encodings,
                                       known_names=known_face_names,
                                       fx=scale_factor,
                                       fy=scale_factor).start()
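
The fx/fy values shrink each frame before detection, which is a large part of the speed-up. scale_factor and its inverse r_scale_factor (used below to map the detected coordinates back onto the full-size frame) are defined earlier in the script; with a quarter-size detection frame, for example, they would look like this (the value 0.25 is illustrative):


scale_factor = 0.25                     # detect on quarter-size frames
r_scale_factor = int(1 / scale_factor)  # map coordinates back to full size (4)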


Finally, create a loop that gets a frame from the video_capture thread, gets the list of detected faces and names from the video_process thread, draws a rectangle and label around each face on the current frame, and shows the result on your screen. The loop repeats this process until a key is pressed to terminate the program.


while True:

        if video_capture.stopped:
            video_capture.stop()
            break

        frame = video_capture.read()

        # Display the results
        locations = video_process.face_locations
        names = video_process.face_names

        for (left, top, right, bottom), name in zip(locations, names):
            # Scale up to the original size
            top *= r_scale_factor
            right *= r_scale_factor
            bottom *= r_scale_factor
            left *= r_scale_factor

            # Draw a box around the detected face  - BGR
            cv2.rectangle(frame, (left, top), (right, bottom), (244, 134, 66), 3)

            # Draw a label with a name
            cv2.rectangle(frame, (left-2, top - 35), (right+2, top), (244, 134, 66), cv2.FILLED)

            font = cv2.FONT_HERSHEY_DUPLEX
            cv2.putText(frame, name, (left + 6, top - 6), font, 1.0, (255, 255, 255), 1)

        # Display the resulting image
        cv2.imshow('Video', frame)

        # Hit 'q' on the keyboard to quit!
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
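
After the loop exits, the threads and the window should be cleaned up. A minimal version of that teardown, assuming FaceRecognitionProcess exposes a stop() method like the capture thread does (the repo's exact cleanup may differ slightly):


# Stop the worker threads and close the preview window
video_process.stop()
video_capture.stop()
cv2.destroyAllWindows()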
  

You can see the entire code at https://github.com/jpark7ca/face_recognition. Happy Coding!
