{"id":230,"date":"2021-02-17T03:01:25","date_gmt":"2021-02-17T03:01:25","guid":{"rendered":"http:\/\/enriquegortiz.com\/wordpress\/enriquegortiz-staging\/?post_type=fw-portfolio&#038;p=230"},"modified":"2022-12-18T18:43:07","modified_gmt":"2022-12-18T18:43:07","slug":"video-face-recognition","status":"publish","type":"fw-portfolio","link":"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/project\/video-face-recognition\/","title":{"rendered":"Video Face Recognition"},"content":{"rendered":"<section class=\"fw-main-row \"  >\n\t<div class=\"fw-container\">\n\t\t<div class=\"row\">\n\t\r\n\r\n<div class=\" col-xs-12 col-sm-12 \">\r\n    <div id=\"col_inner_id-69ea5b9014df5\" class=\"fw-col-inner\" data-paddings=\"0px 0px 0px 0px\">\r\n\t\t<p><strong><a href=\"http:\/\/www.enriquegortiz.com\">E.G. Ortiz<\/a><\/strong>, <a href=\"http:\/\/www.alan-wright.com\">A. Wright<\/a>, and <a href=\"http:\/\/crcv.ucf.edu\/people\/faculty\/shah.html\">M. Shah<\/a>. <a href=\"http:\/\/www.enriquegortiz.com\/publications\/VFR_MSSRC.pdf\"><em>\"Face Recognition in Movie Trailers via Mean Sequence Spare Representation-based Classification\"<\/em><\/a>.\u00a0IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.<\/p><h1>Motivation<img loading=\"lazy\" decoding=\"async\" class=\"alignright size-full wp-image-231\" src=\"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/files\/2021\/02\/videoface_teaser.jpg\" alt=\"\" width=\"525\" height=\"330\" srcset=\"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/files\/2021\/02\/videoface_teaser.jpg 525w, https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/files\/2021\/02\/videoface_teaser-300x189.jpg 300w\" sizes=\"auto, (max-width: 525px) 100vw, 525px\" \/><\/h1><p>Movie interest is largely correlated to the actors in a movie making annotation of all occurrences of cast members within a movie essential. 
This work addresses the difficult problem of identifying a video face track against a dictionary of still face images of many people, while rejecting unknown individuals. We employ a large database of still images from the Internet to perform complete video face recognition, from face tracking to face track identification.<\/p><h1>Face Tracking<\/h1><p>Our method performs the difficult task of face tracking based on face detections extracted using the high-performance <a href=\"http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0262885605001605\">SHORE face detector<\/a>. We generate tracks using two metrics: one spatial and one appearance-based. The spatial metric computes the percent overlap of the current bounding box with the previous one. The appearance metric computes a histogram intersection over the local bounding box, which can handle abrupt changes in the scene and the face. We compare each new face detection to existing tracks; if both the location and appearance metrics indicate a match, the face is added to the track; otherwise a new track is created. Finally, we use a global histogram over the entire frame, encoding scene information, to detect scene boundaries, and we terminate a track after 20 consecutive frames with no matching detection.<\/p><h1>Mean Sequence Sparse\u00a0Representation-based Classification<\/h1><p>In recent years, <a href=\"http:\/\/research.microsoft.com\/pubs\/132810\/PAMI-Face.pdf\">Sparse Representation-based Classification<\/a>\u00a0(SRC)\u00a0has received much attention due to its high precision and ability to handle occlusions. More recently, we found that, combined with several features, SRC works well for real-world face recognition and excels at rejecting unknown identities (see <a href=\"http:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/research\/face-recognition-2\/face-recognition\">Face Recognition for Web-Scale Datasets<\/a>). 
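The track-association step described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the bounding-box format, normalized-histogram representation, and the two thresholds are assumptions made for the example.

```python
import numpy as np

def overlap(a, b):
    """Percent overlap (intersection over union) of two boxes (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def hist_intersection(h1, h2):
    """Histogram intersection of two normalized appearance histograms."""
    return np.minimum(h1, h2).sum()

def assign_detection(det_box, det_hist, tracks, s_thresh=0.3, a_thresh=0.5):
    """Append a detection to the best-matching live track, else start a new one.

    tracks: list of dicts {'box': ..., 'hist': ..., 'faces': [...]}
    s_thresh / a_thresh are illustrative values, not the paper's.
    """
    best, best_score = None, 0.0
    for t in tracks:
        s = overlap(det_box, t['box'])          # spatial metric
        a = hist_intersection(det_hist, t['hist'])  # appearance metric
        if s >= s_thresh and a >= a_thresh and s + a > best_score:
            best, best_score = t, s + a
    if best is None:                            # no track matched: start one
        best = {'faces': []}
        tracks.append(best)
    best['box'], best['hist'] = det_box, det_hist
    best['faces'].append(det_box)
    return best
```

A full tracker would also carry the per-frame global histogram for scene-cut detection and a 20-frame no-detection counter per track, which are omitted here for brevity.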
Now, given a face track \\( Y = [y_1,y_2,...,y_M]\\) with \\(M\\) frames, we make the strong assumption that all frames share a single coefficient vector \\(x\\): since every frame belongs to the same person, each should intuitively be linearly represented by the same people in the dictionary. Based on this assumption we obtain the following formulation:<\/p><p style=\"text-align: center;\">\\( \\hat{x}_{\\ell_1} = \\min_x \\sum_{i=1}^M \\| y_i - Ax \\|_2^2 + \\lambda \\| x \\|_1 \\),<\/p><p>in which we minimize the summed residual error between every frame \\(y_i\\) and the linear combination \\(Ax\\) while encouraging sparsity in \\(x\\). By expanding the least-squares form of the residual error, we find the interesting result that the problem reduces to one over the mean face track vector:<\/p><p style=\"text-align: center;\">\\(\\hat{x}_{\\ell_1} = \\min_x \\| \\bar{y} - Ax \\|_2^2 + \\lambda \\| x \\|_1\\),<\/p><p style=\"text-align: left;\">where \\( \\bar{y} = \\sum_{i=1}^M y_i \/ M \\). 
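The reduction above means MSSRC amounts to one \u21131-regularized least-squares solve on the mean frame, followed by SRC's per-class residual rule. A minimal sketch is below; the ISTA solver and the values of lam and iters are assumptions for illustration, since the formulation does not prescribe a particular optimizer.

```python
import numpy as np

def mssrc(Y, A, labels, lam=0.1, iters=300):
    """Mean Sequence SRC sketch.

    Y: d x M face track (one column per frame feature vector).
    A: d x n dictionary of still-image features, labels[j] = identity of column j.
    Collapses Y to its mean ybar, solves min_x (1/2)||ybar - A x||_2^2 + lam*||x||_1
    by ISTA (an assumed solver), then picks the class whose atoms best
    reconstruct ybar (SRC's decision rule).
    """
    ybar = Y.mean(axis=1)                        # the mean face track vector
    labels = np.asarray(labels)
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - ybar) / L         # gradient step on the smooth term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    # Smallest per-class reconstruction residual wins.
    residuals = {c: np.linalg.norm(ybar - A[:, labels == c] @ x[labels == c])
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get), x
```

Rejection of unknown identities would additionally threshold the winning residual (or a sparsity-concentration index), which this sketch omits.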
This formulation yields at least a 5x speedup over a naive frame-by-frame application of SRC, with the exact gain depending on the average length of the input face tracks.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-full wp-image-341\" src=\"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/files\/2021\/02\/trailers-1024x477-1.jpg\" alt=\"\" width=\"1024\" height=\"477\" srcset=\"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/files\/2021\/02\/trailers-1024x477-1.jpg 1024w, https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/files\/2021\/02\/trailers-1024x477-1-300x140.jpg 300w, https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/files\/2021\/02\/trailers-1024x477-1-768x358.jpg 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/p><h1>Movie Trailer Face Dataset<\/h1><p>We built our Movie Trailer Face Dataset from 113 movie trailers of 2010 releases, collected from YouTube, that contained celebrities present in our supplemented PubFig+10 dataset. These videos were then processed to generate face tracks using the method described above. The resulting dataset contains 3,585 face tracks, 63% of which belong to unknown identities (not present in PubFig+10) and 37% to known ones.<\/p><h1>Video Face Recognition Toolbox<\/h1><p>To allow benchmarking of future methods on our data or other custom datasets, we provide a Video Face Recognition Toolbox. The toolbox contains implementations of the tested algorithms (NN, SVM, L2, SRC, and MSSRC). 
There are two principal scripts:<\/p><ul><li><strong>trailerExperiments:<\/strong> Entry point for running the recognition methods.<\/li><li><strong>trailerResults:<\/strong> Consolidates all results and outputs PR curves.<\/li><\/ul><h1>Video Presentation<\/h1><p>http:\/\/www.youtube.com\/watch?v=2GqUu6EViVE<\/p>\t<\/div>\r\n<\/div>\r\n<\/div>\n\n\t<\/div>\n<\/section>\n\n","protected":false},"featured_media":231,"template":"","fw-portfolio-category":[33],"class_list":["post-230","fw-portfolio","type-fw-portfolio","status-publish","has-post-thumbnail","hentry","fw-portfolio-category-research"],"_links":{"self":[{"href":"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/wp-json\/wp\/v2\/fw-portfolio\/230","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/wp-json\/wp\/v2\/fw-portfolio"}],"about":[{"href":"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/wp-json\/wp\/v2\/types\/fw-portfolio"}],"version-history":[{"count":5,"href":"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/wp-json\/wp\/v2\/fw-portfolio\/230\/revisions"}],"predecessor-version":[{"id":352,"href":"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/wp-json\/wp\/v2\/fw-portfolio\/230\/revisions\/352"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/wp-json\/wp\/v2\/media\/231"}],"wp:attachment":[{"href":"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/wp-json\/wp\/v2\/media?parent=230"}],"wp:term":[{"taxonomy":"fw-portfolio-category","embeddable":true,"href":"https:\/\/enriquegortiz.com\/wordpress\/enriquegortiz\/wp-json\/wp\/v2\/fw-portfolio-category?post=230"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}