This is the home page of MVSE, a three-year research project funded by EPSRC (EP/V002740/2) and undertaken by a team from Queen’s University Belfast, Ulster University, University of Surrey, University of Cambridge, and the BBC.


Searching large video archives, such as BBC TV programmes, effectively and efficiently is a significant challenge. Search is typically done via keyword queries over pre-defined metadata such as titles, tags and viewers' notes. However, it is difficult to use keywords to find specific moments in a video, for example where a particular speaker talks about a specific topic at a particular location. Video search by examples is a desirable approach in this scenario, as it allows users to search with one or more examples of the content they are interested in, without having to express that interest in keywords. However, video search by examples is notoriously challenging, and its performance is still poor.

To improve search performance, multiple modalities should be considered: image, sound, voice and text. Each modality provides a separate search cue, so combining multiple cues should identify more relevant content. This is multimodal video search by examples (MVSE). In this project we will study efficient, effective, scalable and robust MVSE where video archives are large, historical and dynamic, and the modalities are person (face or voice), context, and topic. The aim is to develop a framework for MVSE and validate it through the development of a prototype search tool. Such a tool will be useful for organisations such as the BBC and the British Library, which maintain large collections of video archives and want to provide search for their own staff as well as for the public. It will also be useful for companies such as YouTube, which host videos from the public and want to enable video search by examples. We will address key challenges in video segmentation, content representation, hashing, ranking and fusion.
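As a rough illustration of how separate cues might be combined, the sketch below ranks video segments by a weighted average of per-modality cosine similarities (a simple score-level fusion). This is not the project's method: the function names, the equal weights, the segment identifiers and the tiny two-dimensional feature vectors are all assumptions made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuse_and_rank(query, archive, weights):
    """Rank archive segments by a weighted average of per-modality similarities.

    Hypothetical helper: modalities missing from the query or from a segment
    are skipped, and the remaining weights are renormalised.
    """
    ranked = []
    for segment_id, features in archive.items():
        score, total_weight = 0.0, 0.0
        for modality, weight in weights.items():
            if modality in query and modality in features:
                score += weight * cosine(query[modality], features[modality])
                total_weight += weight
        ranked.append((segment_id, score / total_weight if total_weight else 0.0))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Toy data (assumed): one feature vector per modality for the query example
# and for each archived segment.
query = {"person": [1.0, 0.0], "topic": [0.0, 1.0]}
archive = {
    "seg_a": {"person": [1.0, 0.0], "topic": [0.0, 1.0]},  # matches both cues
    "seg_b": {"person": [1.0, 0.0], "topic": [1.0, 0.0]},  # matches person only
    "seg_c": {"person": [0.0, 1.0], "topic": [1.0, 0.0]},  # matches neither cue
}
weights = {"person": 1.0, "topic": 1.0}  # equal weights, an arbitrary choice

ranked = fuse_and_rank(query, archive, weights)
for segment_id, score in ranked:
    print(segment_id, round(score, 2))
```

In this toy setting the segment that agrees with the query on both person and topic ranks above the one that agrees on person alone, which is the intuition behind using multiple cues. A practical system would also need learned representations, hashing for scale, and a more careful fusion strategy.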