At its most basic, facial recognition, which grew out of the application of mathematical techniques to the study of the human brain, has four elements to its technology: face detection, face alignment, feature extraction, and face matching.
As generally understood, a facial recognition system, like other biometric systems, operates in one or both of two modes once a face is detected:
Face authentication, or verification, involves a one-to-one match that compares a query face image against the single enrolled face image of the identity being claimed. Verifying an individual at a self-service immigration gate using an e-passport is one typical application. Matching your own face to unlock your phone is another.
Face identification, or recognition, on the other hand, involves one-to-many matching that compares a query face against multiple faces in an enrollment database, in order to associate the identity of the query face with one of those in the database. In some identification applications, one simply needs to find the most similar face. In a background check, or in face identification done for an investigatory lead, the requirement goes beyond finding similar faces: typically, a confidence threshold is specified and all faces whose similarity score exceeds that threshold are reported, as sketched below.
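To make the two modes concrete, here is a minimal sketch. It assumes faces have already been converted into fixed-length embedding vectors by some upstream model; the function names and the 0.6 threshold are illustrative assumptions, not any vendor’s actual method or values:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two face embeddings, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(query_emb: np.ndarray, enrolled_emb: np.ndarray,
           threshold: float = 0.6) -> bool:
    """One-to-one verification: accept the claimed identity only if the
    query face is similar enough to the single enrolled face."""
    return cosine_similarity(query_emb, enrolled_emb) >= threshold

def identify(query_emb: np.ndarray,
             gallery: dict[str, np.ndarray],
             threshold: float = 0.6) -> list[tuple[str, float]]:
    """One-to-many identification: score the query face against every
    enrolled face and report all candidates whose similarity clears the
    confidence threshold, highest score first."""
    scored = [(identity, cosine_similarity(query_emb, emb))
              for identity, emb in gallery.items()]
    hits = [(identity, s) for identity, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

In a verification gate, the system answers yes or no for one claimed identity; in an investigatory-lead search, everything identify() returns above the threshold would be reported for review.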
Note: The performance of a face recognition system depends heavily on factors such as lighting, facial pose, expression, age span, hair, facial wear, source device, image modifications, and motion, in addition to the diversity of the data the algorithm was trained on.
Facial recognition technology (FRT), though, has long been at the center of heated debates about privacy, fairness, and accuracy. Much of this scrutiny emerged following early academic and governmental studies, including the widely cited 2018 “Gender Shades” paper by MIT Media Lab’s Dr. Joy Buolamwini, co-authored by Dr. Timnit Gebru, which found that commercial facial analysis algorithms disproportionately misclassified women and persons of color: error rates were as high as 34.7% for darker-skinned women, compared with a maximum of 0.8% for lighter-skinned men.
Those findings were later reinforced by a 2019 NIST study that empirically demonstrated significant demographic differentials across algorithms. NIST evaluated 189 mostly commercial algorithms from 99 developers.
That NIST study used four large datasets of photographs collected in U.S. governmental applications then in operation: domestic mugshots, application photos for immigration benefits, visa applicant photos, and photos of persons crossing the U.S. border. Together, these comprised about 18.27 million images of about 8.49 million people.
There were two inherent issues with these studies. The first was dataset size.
Even MIT Media Lab’s oft-quoted landmark study, which tested three commercially available systems and was reportedly the first of its kind to have gender parity in its dataset, used just 1,270 publicly available faces: those of lawmakers from a diverse set of countries (from Africa to the Nordic nations) with a high proportion of women holding office.
While fairly large by U.S. or western European standards, that dataset remained small by the standards of algorithmic testing done at scale in Asia. Second, NIST itself admitted that while the first three sets (mugshots, application photos for immigration benefits, and visa applicant photos) had “good compliance with image capture standards,” the last set, photos of persons crossing the border, “did not, given constraints on capture duration and environment.”
FR applications may be divided into two broad categories in terms of a subject’s cooperation:
In the cooperative case, the subject is willing to present his or her face in a proper way (for example, in a frontal pose, with a neutral expression and open eyes). In the noncooperative case, which is more typical in security and surveillance applications, the subject is generally unaware of being identified.
According to the Handbook of Face Recognition, “In terms of distance between the face and the camera, near field face recognition (less than 1 meter) for cooperative applications is the least difficult problem, whereas far field noncooperative applications is the most challenging.”
Applications in between these two categories also exist: for example, face-based access control at a distance, where the subject is willing to cooperate but is unable to present his or her face in a favorable position relative to the camera. Such cases may still challenge a system, even though they are easier than identifying a noncooperative subject. However, according to the Handbook’s editors, in almost all cases ambient illumination is the foremost challenge for most FR systems.
Despite these caveats, the concerns raised in 2018 and 2019 were valid at the time. The FRT landscape, however, has changed significantly since then. Advances in machine learning, the adoption of global and more balanced training datasets, and the development of stronger algorithmic architectures have greatly reduced the demographic bias problem.
Biometrica is not a facial recognition company. We do use NIST-approved third-party providers that return results from an FRT search query run against our UMbRA database, which is 100% law-enforcement-sourced. At no point do we have access to biometric templates or biometric identifiers from that query, which the NIST-approved third party conducts in a secure, isolated black-box environment. We have also gone a step further by intentionally designing our systems and solutions around privacy, human oversight, and the prevention of mass surveillance. This explainer outlines both the historical challenges and how we, and the broader industry, have adapted.
Early systems were trained on datasets like “Labeled Faces in the Wild (LFW)” or “CelebA,” which were heavily skewed towards white male celebrities, with some datasets reportedly as much as 75% male and 80% white.
Women, children, African American, Indigenous, Asian, and other non-white populations were highly underrepresented.
The architectures available at the time were less capable of learning from limited or imbalanced data.
Early algorithms also struggled in low-light conditions or with poor-quality images, which disproportionately affected marginalized communities. Logically, if algorithms were trained on datasets that skewed heavily male and heavily white, and on poor-quality pictures, it would be harder for those algorithms to recognize people outside those groups, particularly women and persons of color.
What does this mean? In simple terms, if we humans have inherent biases and a predisposition to believe ill of people of a certain race or ethnicity, without effective algorithms and algorithmic accountability, appropriate training, or a system of checks and balances in place, our biases will be reflected in our decision-making process.
But things did change, and algorithms got better as they were exposed to more and more diverse data. FR algorithms also grew stronger as developers came to understand that training on a time series of faces (a series of photos of an individual taken over a period of time, rather than many photos taken at the same time) made for a better algorithm. The lesson, again: algorithms could be trained.
Critically: NIST found that algorithms developed in Asian countries did not show the same biases against Asian faces, directly tying diversity of training data to performance.
Since 2019, the industry has made remarkable progress:
The most recent NIST results demonstrate:
Facial recognition is most effective when it combines human intelligence and machine intelligence, because the two balance each other out. Algorithmic intelligence lets us take huge amounts of data, sift through it, and process it at great speed to produce a handful of potentially life-saving results in real time, while eliminating some inherent human biases from decision-making.
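As a hypothetical sketch of that division of labor (not a description of Biometrica’s actual pipeline; the names, threshold, and shortlist size are invented for illustration), the machine narrows a large gallery to a small, high-confidence shortlist, and nothing is released without explicit human confirmation:

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Candidate:
    identity: str
    score: float
    human_confirmed: bool = False  # set only by a trained human reviewer

def machine_shortlist(query_emb: np.ndarray,
                      gallery: dict[str, np.ndarray],
                      threshold: float = 0.7,
                      max_candidates: int = 5) -> list[Candidate]:
    """Machine step: scan the whole gallery at speed, but surface only a
    small, high-confidence shortlist for human review."""
    q = query_emb / np.linalg.norm(query_emb)
    scored = [Candidate(name, float(np.dot(q, emb / np.linalg.norm(emb))))
              for name, emb in gallery.items()]
    hits = [c for c in scored if c.score >= threshold]
    hits.sort(key=lambda c: c.score, reverse=True)
    return hits[:max_candidates]  # a handful of results, never the full ranking

def alert_releasable(candidate: Candidate) -> bool:
    """Human step: no alert goes out on an algorithmic score alone;
    an explicit reviewer confirmation is always required."""
    return candidate.human_confirmed
```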
Humans, in turn, can sift through that handful of results, validate them, and make discerning choices that take other factors into consideration, ensuring both algorithmic accountability and humaneness, and reducing the chance of a false positive, or a false negative, that would negatively impact an individual’s life and disproportionately harm disadvantaged communities. With that in mind, Biometrica implemented the following protocols:
The legitimate criticism leveled against early FRT systems changed the trajectory of this technology. However, it is a mistake to assume that those challenges persist at the same scale today. Algorithms have improved, datasets have become global, and safeguards — including mandatory human review and verification — have matured.
Biometrica believes that dismissing responsible, privacy-protective solutions out of fear of outdated shortcomings denies communities (especially those that are economically disadvantaged and historically underrepresented in access to affordable technology solutions), and law enforcement agencies in those communities and beyond, access to powerful tools for finding missing persons, preventing violence, protecting vulnerable populations, balancing out human bias, stopping human trafficking, and ensuring accountability.
Our system does not enable mass surveillance. It enables focused, lawful, and rights-respecting alerts — nothing more, nothing less.