Recognizing visual objects in images, and actions in videos, are important
problems in computer vision, with many applications in security, commerce, human-computer interaction,
content-based video retrieval, visual surveillance, analysis of sports events and more.
Recognition is mainly divided into two parts: category recognition (classification) and
detection/localization. The goal of category recognition is to classify a given object
(or action) into one of several pre-specified categories, while object (action) detection is meant to
separate objects (actions) of interest from the background in a target image or video. Typically,
learning-based approaches train a generative or discriminative parametric model for each
category from training examples. These methods require a large number of training examples, can result
in over-fitting of parameters, and do not scale well with the number of object (or action) categories.
We have developed a framework where problems such as generic object detection, action detection, and
action category classification can be solved in a unified setting from a single example (i.e., without training).
In a related effort, we have also developed a method that can accurately detect salient objects or actions in
visual data without any background or prior knowledge.
Here is a recent talk that summarizes these ideas. For additional results
and graphic explanations, please visit the project webpages for object detection,
action recognition, and saliency detection.
Also, please consult the relevant publications below.