Datasets & Code


Web-Scale Face Recognition

In this work, we further analyze the problem of face auto-tagging. With millions of users and billions of photos, web-scale face recognition is a challenging task that demands speed, accuracy, and scalability. We propose a novel Linearly Approximated Sparse Representation-based Classification (LASRC) algorithm that uses linear regression to perform sample selection for l1-minimization, thus harnessing the speed of least-squares and the robustness of SRC methods.
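The idea of using a fast least-squares fit to pick samples before running ℓ1-minimization can be sketched roughly as follows. This is only an illustration under my own assumptions, not the released LASRC code: the function names `ista` and `select_then_l1_classify`, the ridge term, the choice of `k`, the iterative soft-thresholding solver, and the per-class residual rule are all hypothetical choices made for the example.

```python
import numpy as np

def ista(A, y, mu, n_iter=200):
    """Approximately minimize 0.5*||y - A x||^2 + mu*||x||_1
    with iterative soft-thresholding (assumed solver, for illustration)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * mu, 0.0)   # soft threshold
    return x

def select_then_l1_classify(A, labels, y, k=10, ridge=1e-3, mu=1e-2):
    """A: (d, n) dictionary with one unit-norm training sample per column.
    labels: length-n class labels. y: (d,) unit-norm query."""
    labels = np.asarray(labels)
    n = A.shape[1]
    # Step 1: regularized least squares over the whole dictionary (fast).
    coef = np.linalg.solve(A.T @ A + ridge * np.eye(n), A.T @ y)
    # Step 2: keep only the k samples with the largest coefficient magnitude.
    keep = np.argsort(-np.abs(coef))[:k]
    sub, sub_labels = A[:, keep], labels[keep]
    # Step 3: sparse coding restricted to the selected samples.
    x = ista(sub, y, mu)
    # Step 4: pick the class whose samples reconstruct y with least error.
    best, best_res = None, np.inf
    for c in np.unique(sub_labels):
        mask = sub_labels == c
        res = np.linalg.norm(y - sub[:, mask] @ x[mask])
        if res < best_res:
            best, best_res = c, res
    return best
```

The expensive ℓ1 step only ever sees `k` columns, which is where the speedup would come from in a sketch like this; how `k` and the regularization weights should be set in practice is not something this example tries to answer.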

Project Page

Code and Data Coming Soon...

Facebook Face Recognition

This work evaluates face recognition applied to the real-world application of Facebook. Because papers usually present results in terms of accuracy on constrained face datasets, it is difficult to assess how they would work on natural data in a real-world application. We present a method to automatically gather and extract face images from Facebook, resulting in over 60,000 faces representing over 500 users. From these natural face datasets, we evaluate a variety of well-known face recognition algorithms (PCA, LDA, ICA, SVMs) against holistic performance metrics of accuracy, speed, memory usage, and storage size.

Project Page


Facebook Picture and Tag Downloader
Matlab Face Recognition Evaluator
Face Extractor

Outlier Detection

Outlier detection has received significant attention in many applications, such as detecting credit card fraud or network intrusions. We propose Attribute Value Frequency (AVF), a fast and scalable outlier detection strategy for categorical data. AVF scales linearly with the number of data points and attributes, and relies on a single data scan.
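As a rough illustration of the frequency-based scoring idea, here is a minimal sketch. It assumes each data point is a tuple of categorical values and scores a point by the average frequency of its attribute values, so that points made of rare values score lowest. The function names `avf_scores` and `lowest_scoring` are hypothetical and this is not the project's released code.

```python
from collections import Counter

def avf_scores(rows):
    """Return one score per row; lower scores suggest outliers.
    rows: list of equal-length tuples of categorical values."""
    if not rows:
        return []
    n_attrs = len(rows[0])
    # Counting pass: how often does each value occur in each attribute?
    counts = [Counter() for _ in range(n_attrs)]
    for row in rows:
        for j, value in enumerate(row):
            counts[j][value] += 1
    # Score each point by the mean frequency of its attribute values.
    return [sum(counts[j][v] for j, v in enumerate(row)) / n_attrs
            for row in rows]

def lowest_scoring(rows, k):
    """Indices of the k rows with the lowest scores."""
    scores = avf_scores(rows)
    order = sorted(range(len(rows)), key=lambda i: scores[i])
    return order[:k]
```

For example, with rows `("red", "small")`, `("red", "small")`, `("red", "large")`, `("blue", "small")`, `("green", "huge")`, the last row is built entirely from values that occur once, so it receives the lowest score. The work is proportional to the number of points times the number of attributes.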

Project Page



Class Projects and Other

MACH Filter

MACH stands for Maximum Average Correlation Height. It is a type of correlation filter that learns an optimal filter from a set of positive and negative training data. The implementations include the standard filter as well as a variant that removes the need for negative training data, allowing the filter to generalize from the positive data of interest alone.

Download (Matlab)

Diffusion Maps

Diffusion maps is a dimensionality reduction technique that learns a low-dimensional representation of non-linear data. It is based on a Markov transition model, which lets it capture varying degrees of relatedness between points and how that relatedness propagates over time. I found this method to work very well on simulated data, but to be very sensitive to parameters on more realistic data. As with any graph-based method, you need to design the relationship graph carefully to discover the intended relationships.
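A minimal sketch of the basic construction is shown below. It assumes a Gaussian kernel with bandwidth `epsilon` on a fully connected graph, which is one common choice and not necessarily what the Matlab download uses; the function name `diffusion_map` and its parameters are hypothetical. The bandwidth is exactly the kind of parameter the results tend to be sensitive to.

```python
import numpy as np

def diffusion_map(points, epsilon, n_components=2, t=1):
    """points: (n, d) array. Returns (n, n_components) diffusion coordinates."""
    # Pairwise squared Euclidean distances.
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
    # Gaussian affinity: the relationship graph (assumed fully connected).
    kernel = np.exp(-sq / epsilon)
    degree = kernel.sum(axis=1)
    # The Markov matrix is P = D^-1 K. Work with the symmetric matrix
    # D^-1/2 K D^-1/2, which has the same eigenvalues, so eigh can be used.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Convert back to right eigenvectors of P.
    psi = eigvecs / np.sqrt(degree)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale the
    # rest by eigenvalue^t, where t is the number of diffusion steps.
    idx = slice(1, n_components + 1)
    return psi[:, idx] * eigvals[idx] ** t
```

Larger `t` shrinks coordinates with small eigenvalues faster, which is how the propagation through time shows up in the embedding.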

Download (Matlab)


Eigenfaces

This is an implementation of the popular eigenfaces technique for dimensionality reduction, which is Principal Components Analysis (PCA) applied to face recognition. I use Nearest-Neighbor to find the closest match; more sophisticated classifiers can be used in more complicated scenarios.
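The pipeline can be sketched in a few lines. This is a minimal illustration rather than the Matlab download itself: it assumes face images have already been cropped, resized to a common size, and flattened into rows, and the function names `fit_eigenfaces`, `project`, and `nearest_neighbor` are hypothetical.

```python
import numpy as np

def fit_eigenfaces(train, n_components):
    """train: (n_samples, n_pixels) array of flattened face images.
    Returns the mean face and the top principal axes (the eigenfaces)."""
    mean = train.mean(axis=0)
    centered = train - mean
    # SVD of the centered data: the rows of vt are the principal axes,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(faces, mean, components):
    """Map flattened faces into the low-dimensional eigenface space."""
    return (faces - mean) @ components.T

def nearest_neighbor(train_proj, train_labels, query_proj):
    """Label each query with the label of its closest training face
    (Euclidean distance in eigenface space)."""
    diffs = train_proj[None, :, :] - query_proj[:, None, :]
    dists = np.linalg.norm(diffs, axis=2)
    return [train_labels[i] for i in dists.argmin(axis=1)]
```

A typical use would fit on the training faces, project both training and query faces with the same mean and components, and then call `nearest_neighbor` on the projections.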

Download (Matlab)
