Andrew Ng – Building High-Level Features Using Large Scale Unsupervised Learning (Video+Paper)


Video: Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning

Presented by Andrew Ng
March 7, 2011

Despite machine learning’s numerous successes, applying it to a new problem usually means spending a long time hand-designing the input representation for that specific problem. This is true for applications in vision, audio, text/NLP, and other areas. To address this, researchers have recently developed “unsupervised feature learning” and “deep learning” algorithms that can automatically learn feature representations from unlabeled data, thus bypassing much of this time-consuming engineering. Building on such ideas as sparse coding and deep belief networks, these algorithms can exploit large amounts of unlabeled data (which is cheap and easy to obtain) to learn a good feature representation. These methods have also surpassed the previous state-of-the-art on a number of problems in vision, audio, and text. In this talk, I describe some of the key ideas behind unsupervised feature learning and deep learning, describe a few algorithms, and present case studies.

The Bay Area Vision Meeting (BAVM) is an informal gathering (without printed proceedings) of academic and industry researchers with an interest in computer vision and related areas. The goal is to build community among vision researchers in the San Francisco Bay Area; however, visitors and travelers from afar are also encouraged to attend and present. New research, previews of work to be shown at upcoming vision conferences, reviews of not-well-publicized work, and descriptions of “work in progress” are all welcome.

(Source: YouTube | GoogleTechTalks)


Paper: Building High-Level Features Using Large Scale Unsupervised Learning

Authors: Quoc Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg Corrado, Jeff Dean, Andrew Ng

International Conference on Machine Learning (2012)
Publication Year: 2012


We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200×200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 20,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.
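To make the term "sparse autoencoder" in the abstract concrete, here is a minimal single-layer sketch in plain Python. It is an illustration under stated assumptions, not the paper's model: it uses a tiny linear encoder with tied weights and a smooth L1 sparsity penalty on the hidden activations, and it leaves out the pooling, local contrast normalization, local connectivity, nine layers, and distributed asynchronous SGD that the abstract describes. All function names, the toy data, and the hyperparameters are hypothetical choices made for this example.

```python
import math
import random

# Toy unlabeled data (assumption for illustration): four 4-dimensional vectors.
DATA = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]

def init_weights(hidden, dim, seed=0):
    """Small random weight matrix W with shape (hidden, dim)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]

def encode(W, x):
    """Hidden code h = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def decode(W, h):
    """Reconstruction r = W^T h (tied weights: the decoder reuses W)."""
    return [sum(W[j][i] * h[j] for j in range(len(W))) for i in range(len(W[0]))]

def loss(W, x, lam, eps=1e-3):
    """0.5 * ||W^T W x - x||^2 + lam * sum_j sqrt(h_j^2 + eps)."""
    h = encode(W, x)
    r = decode(W, h)
    rec = 0.5 * sum((ri - xi) ** 2 for ri, xi in zip(r, x))
    sparsity = sum(math.sqrt(hj * hj + eps) for hj in h)
    return rec + lam * sparsity

def mean_loss(W, data, lam):
    return sum(loss(W, x, lam) for x in data) / len(data)

def sgd_step(W, x, lam, lr, eps=1e-3):
    """One in-place stochastic gradient step on a single example x."""
    h = encode(W, x)
    r = decode(W, h)
    e = [ri - xi for ri, xi in zip(r, x)]  # reconstruction error
    for j, row in enumerate(W):
        # dL/dh_j: error flowing back through the decoder, plus sparsity term.
        g = sum(row[i] * e[i] for i in range(len(x)))
        g += lam * h[j] / math.sqrt(h[j] ** 2 + eps)
        for i in range(len(x)):
            # W[j][i] appears in both the decoder (h_j * e_i) and encoder (g * x_i).
            row[i] -= lr * (h[j] * e[i] + g * x[i])

def train(data, hidden=2, lam=0.01, lr=0.02, epochs=300, seed=0):
    """Plain single-machine SGD over the data; returns the learned weights."""
    W = init_weights(hidden, len(data[0]), seed)
    for _ in range(epochs):
        for x in data:
            sgd_step(W, x, lam, lr)
    return W

if __name__ == "__main__":
    before = mean_loss(init_weights(2, 4, seed=0), DATA, 0.01)
    after = mean_loss(train(DATA), DATA, 0.01)
    print("mean loss before training:", before)
    print("mean loss after training: ", after)
```

No labels appear anywhere in the training loop: the only signal is how well each input can be reconstructed from a sparse hidden code, which is the sense in which the learned features are "unsupervised".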

Full Text (pdf)


(Source: Google Research)
