
About
Publications: 81
Reads: 63,903
Citations: 16,119
Publications (81)
Computer Vision is driven by the many datasets which can be used for training or evaluating novel methods. However, each dataset has a different set of class labels, visual definitions of classes, images following a specific distribution, annotation protocols, etc. In this paper we explore the automatic discovery of visual-semantic relations between l...
Transferability metrics are a maturing field with increasing interest, aiming to provide heuristics for selecting the most suitable source models to transfer to a given target dataset, without fine-tuning them all. However, existing works rely on custom experimental setups which differ across papers, leading to inconsistent conclusions about w...
We address the problem of ensemble selection in transfer learning: Given a large pool of source models we want to select an ensemble of models which, after fine-tuning on the target training set, yields the best performance on the target test set. Since fine-tuning all possible ensembles is computationally prohibitive, we aim at predicting performa...
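To make the selection setting concrete, here is a minimal sketch of greedy ensemble selection, assuming a hypothetical predict_score function that estimates post-fine-tuning performance of a candidate ensemble (a stand-in for the performance-prediction step, not the paper's actual method):

```python
def greedy_ensemble(models, predict_score, k=3):
    """Greedily grow an ensemble of up to k source models using a predicted
    (not fine-tuned) performance estimate, avoiding the cost of fine-tuning
    every possible ensemble. `models` are hashable model identifiers."""
    ensemble = []
    candidates = set(models)
    while candidates and len(ensemble) < k:
        # Pick the candidate that most improves the predicted ensemble score.
        best = max(candidates, key=lambda m: predict_score(ensemble + [m]))
        ensemble.append(best)
        candidates.remove(best)
    return ensemble
```

This replaces an exponential search over all subsets with k linear passes over the model pool, which is what makes selection tractable when fine-tuning each candidate is prohibitive.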
Transfer learning has become a popular method for leveraging pre-trained models in computer vision. However, without performing computationally expensive fine-tuning, it is difficult to quantify which pre-trained source models are suitable for a specific target task, or, conversely, which tasks a pre-trained source model can be easily adapted to...
Transfer learning enables re-using knowledge learned on a source task to help learn a target task. A simple form of transfer learning is common in current state-of-the-art computer vision models, i.e. pre-training a model for image classification on the ILSVRC dataset, and then fine-tuning it on any target task. However, previous systematic studies...
This paper proposes to make a first step towards compatible and hence reusable network components. Rather than training networks for different tasks independently, we adapt the training process to produce network components that are compatible across tasks. In particular, we split a network into two components, a feature extractor and a target tas...
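A minimal PyTorch sketch of such a split, with an illustrative fixed feature interface (the 256-d size and layer choices are assumptions for illustration, not the paper's design):

```python
import torch.nn as nn

class Extractor(nn.Module):
    """Backbone producing a fixed-size feature interface (here 256-d)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
    def forward(self, x):
        return self.net(x)

class Head(nn.Module):
    """Task-specific classifier consuming the shared feature interface."""
    def __init__(self, feat_dim=256, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)
    def forward(self, f):
        return self.fc(f)

# Compatibility means any trained Extractor can be paired with any Head
# trained against the same 256-d interface:
model = nn.Sequential(Extractor(), Head(num_classes=10))
```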
Transfer learning enables re-using knowledge learned on a source task to help learn a target task. A simple form of transfer learning is common in current state-of-the-art computer vision models, i.e. pre-training a model for image classification on the ILSVRC dataset, and then fine-tuning it on any target task. However, previous systematic studies...
In interactive object segmentation a user collaborates with a computer vision model to segment an object. Recent works employ convolutional neural networks for this task: Given an image and a set of corrections made by the user as input, they output a segmentation mask. These approaches achieve strong performance by training on large datasets but t...
We propose Localized Narratives, a new form of multimodal image annotation connecting vision and language. We ask annotators to describe an image with their voice while simultaneously hovering their mouse over the region they are describing. Since the voice and the mouse pointer are synchronized, we can localize every single word in the descriptio...
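Since every spoken word is time-stamped and the mouse trace is synchronized with the voice, each word can be grounded by the trace points recorded alongside it. A small sketch of that alignment step, assuming simple (word, start, end) and (t, x, y) tuples:

```python
def localize_words(words, trace):
    """Ground each spoken word by averaging the mouse-trace points that fall
    inside the word's utterance interval.
    words: [(word, t_start, t_end)]; trace: [(t, x, y)] with synchronized t."""
    grounded = []
    for word, t0, t1 in words:
        pts = [(x, y) for t, x, y in trace if t0 <= t <= t1]
        if pts:
            cx = sum(x for x, _ in pts) / len(pts)
            cy = sum(y for _, y in pts) / len(pts)
            grounded.append((word, (cx, cy)))
    return grounded
```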
This paper makes a first step towards compatible and hence reusable network components. Rather than training networks for different tasks independently, we adapt the training process to produce network components that are compatible across tasks. We propose and compare several different approaches to accomplish compatibility. Our experiments on CIF...
We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection. The images have a Creative Commons Attribution license that allows sharing and adapting the material, and they have been collected from Flickr without a predefined list of class names or tags, lead...
We propose Localized Narratives, an efficient way to collect image captions with dense visual grounding. We ask annotators to describe an image with their voice while simultaneously hovering their mouse over the region they are describing. Since the voice and the mouse pointer are synchronized, we can localize every single word in the description....
Many machine vision applications, such as semantic segmentation and depth prediction, require predictions for every pixel of the input image. Models for such problems usually consist of encoders which decrease spatial resolution while learning a high-dimensional representation, followed by decoders which recover the original input resolution and resu...
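A minimal sketch of such an encoder-decoder for dense per-pixel prediction in PyTorch (layer sizes are illustrative, not any specific model from the paper):

```python
import torch.nn as nn

# Encoder halves spatial resolution twice; decoder upsamples back so the
# output has one vector of `num_classes` logits per input pixel.
def dense_predictor(num_classes=21):
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),             # H/2
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),            # H/4
        nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # H/2
        nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),     # H
    )
```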
In interactive object segmentation a user collaborates with a computer vision model to segment an object. Recent works rely on convolutional neural networks to predict the segmentation, taking the image and the corrections made by the user as input. By training on large datasets they offer strong performance, but they keep model parameters fixed at...
This paper aims to reduce the time to annotate images for the panoptic segmentation task, which requires annotating segmentation masks and class labels for all object instances and stuff regions. We formulate our approach as a collaborative process between an annotator and an automated assistant agent who take turns to jointly annotate an image usi...
We address the task of interactive full image annotation, where the goal is to produce accurate segmentations for all object and stuff regions in an image. To this end we propose an interactive, scribble-based annotation framework which operates on the whole image to produce segmentations for all regions. This enables the annotator to focus on the...
We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection. The images have a Creative Commons Attribution license that allows sharing and adapting the material, and they have been collected from Flickr without a predefined list of class names or tags, lead...
We introduce Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every object and background region in an image. Fluid annotation is based on three principles: (I) Strong Machine-Learning aid. We start from the output of a strong neural network model, which the annotator can edit by corr...
We introduce Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every object and background region in an image. Fluid Annotation starts from the output of a strong neural network model, which the annotator can edit by correcting the labels of existing regions, adding new regions to cov...
A growing body of recent work focuses on the challenging problem of scene understanding using a variety of cross-modal methods which fuse techniques from image and text processing. In this paper, we develop representations for the semantics of scenes by explicitly encoding the objects detected in them and their spatial relations. We represent image...
We introduce Intelligent Annotation Dialogs for bounding box annotation. We train an agent to automatically choose a sequence of actions for a human annotator to produce a bounding box in a minimal amount of time. Specifically, we consider two actions: box verification [37], where the annotator verifies a box generated by an object detector, and ma...
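The trained agent aside, the underlying trade-off can be illustrated with a one-step expected-cost rule: verify the detector's box when its success probability makes verification cheaper in expectation than drawing from scratch. The times below are illustrative placeholders, not measurements from the paper:

```python
def choose_action(p_box_ok, t_verify=1.8, t_draw=7.0):
    """Pick the action with the lower expected annotation time (seconds).
    p_box_ok: detector's probability that its proposed box is acceptable.
    If verification fails, the annotator still has to draw, so verification
    costs t_verify + (1 - p_box_ok) * t_draw in expectation."""
    expected_verify = t_verify + (1.0 - p_box_ok) * t_draw
    return "verify" if expected_verify < t_draw else "draw"
```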
Feature extraction and encoding represent two of the most crucial steps in an action recognition system. For building a powerful action recognition pipeline it is important that both steps are efficient and at the same time provide reliable performance. This work proposes a new approach for feature extraction and encoding that allows us to obtain r...
Many machine vision applications require predictions for every pixel of the input image (for example semantic segmentation, boundary detection). Models for such problems usually consist of encoders which decrease spatial resolution while learning a high-dimensional representation, followed by decoders which recover the original input resolution and...
We propose to revisit knowledge transfer for training object detectors on target classes with only weakly supervised training images. We present a unified knowledge transfer framework based on training a single neural network multi-class object detector over all source classes, organized in a semantic hierarchy. This provides proposal scoring funct...
Manually annotating object bounding boxes is central to building computer vision datasets, and it is very time consuming (annotating ILSVRC [53] took 35s for one high-quality box [62]). It involves clicking on imaginary corners of a tight box around the object. This is difficult as these corners are often outside the actual object and several adjus...
Many machine vision applications require predictions for every pixel of the input image (for example semantic segmentation, boundary detection). Models for such problems usually consist of encoders which decrease spatial resolution while learning a high-dimensional representation, followed by decoders which recover the original input resolution and...
Training object class detectors typically requires a large set of images with objects annotated by bounding boxes. However, manually drawing bounding boxes is very time consuming. In this paper we greatly reduce annotation time by proposing center-click annotations: we ask annotators to click on the center of an imaginary bounding box which tightly...
Semantic classes can be either things (objects with a well-defined shape, e.g. car, person) or stuff (amorphous background regions, e.g. grass, sky). While many classification and detection works focus on thing classes, less attention has been given to stuff classes. Nonetheless, stuff classes are important as they allow us to explain important asp...
We propose a novel method for semantic segmentation, the task of labeling each pixel in an image with a semantic class. Our method combines the advantages of the two main competing paradigms. Methods based on region classification offer proper spatial support for appearance measurements, but typically operate in two separate stages, none of which t...
We propose a novel method for semantic segmentation, the task of labeling each pixel in an image with a semantic class. Our method combines the advantages of the two main competing paradigms. Methods based on region classification offer proper spatial support for appearance measurements, but typically operate in two separate stages, none of which t...
Besides appearance information, a video contains temporal evolution, which is an important and useful source of information about its content. Many video representation approaches are based on the motion information within the video. The common approach to extract the motion information is to compute the optical flow from the vertical and...
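For reference, dense optical flow between two consecutive frames can be computed with OpenCV's Farneback method (file names are placeholders); the resulting per-pixel (dx, dy) field is the usual basis for motion descriptors:

```python
import cv2

prev_gray = cv2.cvtColor(cv2.imread("frame_0.png"), cv2.COLOR_BGR2GRAY)
next_gray = cv2.cvtColor(cv2.imread("frame_1.png"), cv2.COLOR_BGR2GRAY)

# Dense optical flow: a 2-channel field of per-pixel (dx, dy) displacements.
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)
# Polar form (magnitude, direction) is often what motion histograms bin over.
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
```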
Training object class detectors typically requires a large set of images in which objects are annotated by bounding-boxes. However, manually drawing bounding-boxes is very time consuming. We propose a new scheme for training object detectors which only requires annotators to verify bounding-boxes produced automatically by the learning algorithm. Ou...
This paper proposes a novel framework for Relevance Feedback based on the Fisher Kernel (FK). Specifically, we train a Gaussian Mixture Model (GMM) on the top retrieval results (without supervision) and use this to create a FK representation, which is therefore specialized in modelling the most relevant examples. We use the FK representation to exp...
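A minimal sketch of this idea using scikit-learn: fit a GMM on the top-ranked results without supervision, then compute the Fisher-vector gradient with respect to the GMM means (the standard diagonal-covariance formulation; feature dimensions and component count below are placeholders):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_wrt_means(X, gmm):
    """Fisher-vector gradient w.r.t. the GMM means (diagonal covariances):
    G_k = 1/(N*sqrt(pi_k)) * sum_n gamma_nk * (x_n - mu_k) / sigma_k."""
    gamma = gmm.predict_proba(X)                       # (N, K) responsibilities
    N = X.shape[0]
    diff = (X[:, None, :] - gmm.means_[None]) / np.sqrt(gmm.covariances_)[None]
    G = (gamma[:, :, None] * diff).sum(axis=0)         # (K, D)
    G /= N * np.sqrt(gmm.weights_)[:, None]
    return G.ravel()

# Fit the GMM on the top-ranked results only (unsupervised), as in the
# relevance-feedback setting described above:
top_results = np.random.randn(50, 64)                  # placeholder features
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(top_results)
fv = fisher_vector_wrt_means(top_results, gmm)
```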
Classical Bag-of-Words methods represent videos by modeling the variation of local visual descriptors throughout the video. This approach mixes variation in time and space indiscriminately, even though these dimensions are fundamentally different. Therefore, in this paper we present a novel method for video representation which explicitly captures t...
Semantic segmentation is the task of assigning a class-label to each pixel in an image. We propose a region-based semantic segmentation framework which handles both full and weak supervision, and addresses three common problems: (1) Objects occur at multiple scales and therefore we should use regions at multiple scales. However, these regions are o...
When artists express their feelings through the artworks they create, it is believed that the resulting works transform into objects with “emotions” capable of conveying the artists' mood to the audience. There is little to no dispute about this belief: Regardless of the artwork, genre, time, and origin of creation, people from different background...
Intuitively, the appearance of true object boundaries varies from image to image. Hence the usual monolithic approach of training a single boundary predictor and applying it to all images regardless of their content is bound to be suboptimal. In this paper we therefore propose situational object boundary detection: We first define a variety of situ...
The current state-of-the-art in video classification is based on Bag-of-Words using local visual descriptors. Most commonly these are histogram of oriented gradients (HOG), histogram of optical flow (HOF) and motion boundary histograms (MBH) descriptors. While such an approach is very powerful for classification, it is also computationally expensive....
The current state-of-the-art in video classification is based on Bag-of-Words using local visual descriptors. Most commonly these are histogram of oriented gradients (HOG), histogram of optical flow (HOF) and motion boundary histograms (MBH) descriptors. While such an approach is very powerful for classification, it is also computationally expensive....
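Common to these pipelines is the Bag-of-Words encoding step: local descriptors (HOG, HOF, MBH) are quantized against a k-means codebook and pooled into a normalized word histogram. A minimal sketch with placeholder descriptor data:

```python
import numpy as np
from sklearn.cluster import KMeans

def bow_encode(descriptors, codebook):
    """Quantize local descriptors to their nearest visual word and return
    an L1-normalized word histogram for the whole video."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Codebook learned offline on descriptors pooled from training videos:
train_desc = np.random.randn(10000, 96)   # placeholder HOG/HOF/MBH-like data
codebook = KMeans(n_clusters=256, n_init=4).fit(train_desc)
video_repr = bow_encode(np.random.randn(500, 96), codebook)
```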
This paper presents a novel method to generate a hypothesis set of class-independent object regions. It has been shown that such object regions can be used to focus computer vision techniques on the parts of an image that matter most leading to significant improvements in both object localisation and semantic segmentation in recent years. Of course...
The current state-of-the-art in Video Classification is based on Bag-of-Words using local visual descriptors. Most commonly these are Histogram of Oriented Gradient (HOG) and Histogram of Optical Flow (HOF) descriptors. While such a system is very powerful for classification, it is also computationally expensive. This paper addresses the problem of c...
State-of-the-art bottom-up saliency models often assign high saliency values at or near high-contrast edges, whereas people tend to look within the regions delineated by those edges, namely the objects. To resolve this inconsistency, in this work we estimate saliency at the level of coherent image regions. According to object-based attention theory...
In video, global features are often used for reasons of computational efficiency, where each global feature captures the information of a single video frame. But frames in video change over time, so an important question is: how can we meaningfully aggregate frame-based features in order to preserve the variation in time? In this paper we propose to use...
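One simple aggregation that preserves some temporal variation (shown purely to illustrate the problem, not necessarily the method proposed here) is to split the frame sequence into segments and concatenate per-segment mean and standard deviation, rather than averaging everything away:

```python
import numpy as np

def aggregate_frames(frame_feats, n_segments=3):
    """frame_feats: (T, D) per-frame global features, with T >= n_segments.
    Concatenate per-segment mean and std so variation over time survives."""
    segments = np.array_split(frame_feats, n_segments, axis=0)
    parts = [np.concatenate([s.mean(0), s.std(0)]) for s in segments]
    return np.concatenate(parts)   # shape: (n_segments * 2 * D,)
```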
In this work we propose an efficient method for activity recognition in a daily living scenario. At feature level, we propose a method to extract and combine low- and high-level information and we show that the performance of body pose estimation (and consequently of activity recognition) can be significantly improved. Particularly, we propose an a...
This paper addresses the problem of generating possible object locations for use in object recognition. We introduce selective search which combines the strength of both an exhaustive search and segmentation. Like segmentation, we use the image structure to guide our sampling process. Like exhaustive search, we aim to capture all possible object lo...
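Selective search is available as a ready-made implementation in OpenCV's contrib modules, which is a convenient way to reproduce the box-proposal step (the image path is a placeholder):

```python
import cv2  # requires opencv-contrib-python for cv2.ximgproc

img = cv2.imread("image.jpg")
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()   # or switchToSelectiveSearchQuality()
boxes = ss.process()               # (N, 4) array of x, y, w, h proposals
print(f"{len(boxes)} candidate object locations")
```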
This paper proposes a novel approach to relevance feedback based on the Fisher Kernel representation in the context of multimodal video retrieval. The Fisher Kernel representation describes a set of features as the derivative with respect to the log-likelihood of the generative probability distribution that models the feature distribution. In the c...
This paper addresses the problem of human action recognition. Typically, visual action recognition systems need visual training examples for all actions that one wants to recognize. However, the total number of possible actions is staggering as not only are there many types of actions but also many possible objects for each action type. Normally, v...
In this paper we propose a novel approach to the task of salient object detection. In contrast to previous salient object detectors that are based on a spotlight attention theory, we follow an object-based attention theory and incorporate the notion of an object directly into our saliency measurements. Particularly, we consider proto-objects as uni...
In this paper, we propose a novel semi-supervised feature analyzing framework for multimedia data understanding and apply it to three different applications: image annotation, video concept detection and 3-D motion data analysis. Our method is built upon two advancements of the state of the art: (1) ℓ2,1-norm regularized feature selection which ca...
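For reference, the ℓ2,1 norm sums the ℓ2 norms of the rows of a weight matrix; penalizing it drives whole rows to zero, which is what makes it suitable for feature selection:

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: sum over rows of the row-wise l2 norms. A regularizer on
    this quantity zeroes out entire rows of W, i.e. discards features."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()
```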
Most artworks are explicitly created to evoke a strong emotional response. Over the centuries there have been several art movements which employed different techniques to convey emotional expression through artworks. Yet people have always been consistently able to read the emotional messages, even from the most abstract paintings. Can a machine learn...
The current trend in image analysis and multimedia is to use information extracted from text and text processing techniques to help vision-related tasks, such as automated image annotation and generating semantically rich descriptions of images. In this work, we claim that image analysis techniques can "return the favor" to the text processing comm...
The number of web images has been explosively growing due to the development of network and storage technology. These images make up a large amount of current multimedia data and are closely related to our daily life. To efficiently browse, retrieve and organize the web images, numerous approaches have been proposed. Since the semantic concepts of...
Since high-level events in images (e.g. “dinner”, “motorcycle stunt”, etc.) may not be directly correlated with their visual appearance, low-level visual features do not carry enough semantics to classify such events satisfactorily. This paper explores a fully compositional approach for event based image retrieval which is able to overcome this sho...
This demo showcases our system which classifies a collection of pictures into events and its individual images into subevents. We reach this goal by analyzing visual features with an efficient implementation of a Bag-of-Words method and by leveraging time information both for events and sub-events. The system allows the user to analyze a collecti...
The visual extent of an object reaches beyond the object itself. This is a long-standing fact in psychology and is reflected in image retrieval techniques which aggregate statistics from the whole image in order to identify the object within. However, it is unclear to what degree and how the visual extent of an object affects classification perform...
The visual extent of an object reaches beyond the object itself. This is a long-standing fact in psychology and is reflected in image retrieval techniques which aggregate statistics from the whole image in order to identify the object within. However, it is unclear to what degree and how the visual extent of an object affects classification perform...
The aim of this paper is threefold: (a) to introduce a dataset for the recognition of events and sub-events in photographs taken by common users; (b) to propose event-based classification to achieve a more accurate labeling of event-related photo collections; (c) to use time clustering information to improve the sub-event recognition in an efficien...
The explosive growth of digital images requires effective methods to manage these images. Among various existing methods, automatic image annotation has proved to be an important technique for image management tasks, e.g., image retrieval over large-scale image databases. Automatic image annotation has been widely studied during recent years and a...
For object recognition, the current state-of-the-art is based on exhaustive search. However, to enable the use of more expensive features and classifiers and thereby progress beyond the state-of-the-art, a selective search strategy is needed. Therefore, we adapt segmentation as a selective search by reconsidering segmentation: We propose to generat...
This demo showcases our realtime implementation of concept classification using the Bag-of-Words method embedded within MediaTable, our interactive categorization tool for large multimedia collections. MediaTable allows the users to open images from disk or download these directly from the internet. Each image is then processed using the Bag-of-Wor...
As datasets grow increasingly large in content-based image and video retrieval, computational efficiency of concept classification is important. This paper reviews techniques to accelerate concept classification, where we show the trade-off between computational efficiency and accuracy. As a basis, we use the Bag-of-Words algorithm that in the 2008...
We start from the state-of-the-art Bag of Words pipeline that in the 2008 benchmarks of TRECvid and PASCAL yielded the best performance scores. We have contributed to that pipeline, which now forms the basis to compare various fast alternatives for all of its components: (i) For descriptor extraction we propose a fast algorithm to densely sample SI...
This paper discusses the question: Can we improve the recognition of objects by using their spatial context? We start from Bag-of-Words models and use the Pascal 2007 dataset. We use the rough object bounding boxes that come with this dataset to investigate the fundamental gain context can bring. Our main contributions are: (I) The result of Zhang...
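To illustrate the object-versus-context separation that the ground-truth boxes enable, here is an idealized sketch that splits visual-word statistics into inside-box and outside-box histograms (assuming a dense per-location word map, which is a simplification for illustration):

```python
import numpy as np

def object_context_bow(word_map, box, vocab_size):
    """word_map: (H, W) integer visual-word index per pixel/patch location.
    box: (x0, y0, x1, y1). Returns concatenated inside/outside histograms,
    so a classifier can weigh object and context statistics separately."""
    mask = np.zeros(word_map.shape, dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = True
    inside = np.bincount(word_map[mask].ravel(), minlength=vocab_size)
    outside = np.bincount(word_map[~mask].ravel(), minlength=vocab_size)
    return np.concatenate([inside, outside]).astype(float)
```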
This paper discusses the question: Can we improve the recognition of objects by using their spatial context? We start from Bag-of-Words models and use the Pascal 2007 dataset. We use the rough object bounding boxes that come with this dataset to investigate the fundamental gain context can bring. Our main contributions are: (I) The result of Zhang...
In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiments focus on increasing the robustness of a small se...
In this paper we describe our TRECVID 2007 experiments. The MediaMill team participated in two tasks: concept detection and search. For concept detection we extract region-based image features, on grid, keypoint, and segmentation level, which we combine with various supervised learners. In addition, we explore the utility of temporal image featu...
In this paper we propose a model for the representation of stories in a story database. The use of such a database will enable computational story generation systems to learn from previous stories and associated user feedback, in order to create believable stories with dramatic plots that invoke an emotional response from users. Some of the disting...