Image Retrieval Tuned for Small Objects

An image is worth more than a thousand words. Have you ever seen a mushroom and wondered whether it is edible? Or tried to find a suitable spare part for a machine with only a picture of the current part to go on? Both of these problems could be solved with image retrieval. Wapice Data Scientist Jesper studied how image retrieval can be tuned for small objects in order to enable completely new application areas.

Image retrieval is a form of information retrieval. It answers the question “How can we find relevant images from a massive database using only a single example of the desired image?” This has several advantages over text-based image search. For example, when images of a certain object are wanted, describing that object in text can be very difficult. Consider which of the following queries is easier to make and which brings more precise search results.

Chair

What Is Image Retrieval?

Image retrieval is a fundamental and well-known problem in computer vision. The goal is to find relevant images based on an example of a relevant image. Applications of such a system include reverse image search and finding a particular object in a web shop or catalogue based on an image of that object. This concept is shown below.

Figure: Potential retrieval results for different queries. The top row shows potential retrieval results with current methods when the image is cluttered. The second row shows the correct and desired retrieval results using this method.
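
To make the idea concrete, here is a minimal sketch of retrieval as a similarity ranking. It assumes every image has already been turned into a feature vector by some model, and it ranks the database by cosine similarity to the query. The file names, the three-dimensional toy vectors and the function names are made up for illustration, and plain Python lists stand in for the arrays a real system would use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def retrieve(query_vec, database, top_k=3):
    """Return the top_k (image_id, score) pairs, most similar first.

    `database` maps a hypothetical image id to its precomputed feature vector.
    """
    scored = [(image_id, cosine_similarity(query_vec, vec))
              for image_id, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy features; real descriptors typically have hundreds of dimensions.
database = {
    "chair_01.jpg": [0.9, 0.1, 0.0],
    "chair_02.jpg": [0.8, 0.2, 0.1],
    "table_01.jpg": [0.1, 0.9, 0.2],
}
results = retrieve([1.0, 0.1, 0.0], database, top_k=2)
print([image_id for image_id, _ in results])
```

With these toy vectors the two chair images rank above the table image. In practice the quality of the ranking depends almost entirely on how well the feature vectors describe the object of interest, which is exactly where cluttered images cause trouble.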

The most widely cited methods were developed for images of buildings that cover most of the frame, which makes the retrieval task easier than it is for small objects against varying backgrounds. When the object is small or surrounded by a lot of clutter, retrieving it with existing methods can be difficult. Additionally, most image retrieval benchmarks use manually annotated bounding boxes for the query object.

How to Tune it for Small Objects?

This problem can be tackled with object detection. More specifically, since no additional training data is desired, weakly supervised object detection is used to determine the regions of the image where the retrieval object lies. This means that the same training data can be used for both the object detection and the retrieval phases. No extra annotation is needed either, as everything can be done with image-level labels.

Figure: Actual image retrieval results. The image highlighted on the left is used for query. The top row shows how the whole image can be used for retrieval while the bottom row shows how weakly supervised object detection can help identify small objects from an image.
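
The article does not specify which weakly supervised detection method was used. As one illustrative possibility, a model trained only on image-level labels can produce an activation map showing where the evidence for a class is strongest, and a box can be drawn around the strongest region. The sketch below assumes such a map is already available at the same resolution as the image. The function names and the 0.5 threshold are assumptions, not the author's implementation.

```python
def bbox_from_activation(activation, threshold=0.5):
    """Box around all cells with activation >= threshold * peak value.

    Returns (top, left, bottom, right) with inclusive indices,
    or None if the map contains no positive activation.
    """
    peak = max(max(row) for row in activation)
    if peak <= 0:
        return None
    cutoff = threshold * peak
    rows = [r for r, row in enumerate(activation)
            if any(value >= cutoff for value in row)]
    cols = [c for c in range(len(activation[0]))
            if any(row[c] >= cutoff for row in activation)]
    return (rows[0], cols[0], rows[-1], cols[-1])


def crop(image, box):
    """Cut the boxed region out of a 2D image given as a list of rows."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 4x4 activation map with a small object in the lower right area.
activation = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.2, 0.9, 0.7],
    [0.0, 0.1, 0.8, 1.0],
    [0.0, 0.0, 0.1, 0.2],
]
image = [[r * 4 + c for c in range(4)] for r in range(4)]

box = bbox_from_activation(activation)
object_region = crop(image, box)
```

Only the cropped region would then be passed on to feature extraction and retrieval, so the background clutter no longer dominates the query.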

How Can it Be Used?

The most straightforward method is a mobile app that lets the user take pictures of objects and scenes. These pictures can either be analyzed directly on the phone, if the database is small enough, or sent to a cloud or on-premises server for analysis. A hybrid is also possible: the picture is analyzed on the phone and only the analysis result is sent forward. Then only the features that are meaningful to the system travel over the internet, and the picture itself is never exposed, which prevents data leaks.
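
The hybrid variant can be sketched roughly as follows: the device computes a descriptor locally and serializes only that descriptor for sending. The intensity histogram used here is just a placeholder for whatever feature extractor the real system runs, and the payload layout and function names are hypothetical.

```python
import json


def extract_features(pixels, bins=4):
    """Placeholder on-device extractor: a normalized intensity histogram.

    `pixels` is a flat list of integer intensities in the range 0..255.
    A real system would run a learned model here instead.
    """
    if not pixels:
        raise ValueError("cannot extract features from an empty image")
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [count / len(pixels) for count in counts]


def build_payload(pixels):
    """Serialize only the descriptor; the raw pixels never enter the payload."""
    return json.dumps({"features": extract_features(pixels)})


pixels = [0, 10, 70, 130, 200, 255, 250, 64]
payload = build_payload(pixels)
print(payload)
```

The server then runs the database search on the received features alone, so the payload stays small and the original picture remains on the phone.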

Another option is to run the entire system on an edge device. For this, the camera could be attached to a maneuverable stand and the images taken at the press of a button. The database is stored on the edge device, where the images are also analyzed. This method does not require internet access.

Where Can it Be Used?

There are several applications for image retrieval, both in industry and in consumer software. The most straightforward application is connecting the method to an existing database. For example, the method could be trained to find images from a web shop or a spare part catalogue. Alternatively, the method can be used in place of image classification, with the advantage that the number of classes does not need to be decided in advance and can be extended later simply by adding more images with class information to the database. Additionally, it could be used to find specific metadata for an unknown image, using the metadata stored for similar images.
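
Using retrieval in place of classification can be illustrated with a nearest-neighbour vote: the query receives the most common label among its closest database images, and a new class is introduced simply by adding labelled images, with no retraining. This is a minimal sketch with made-up two-dimensional features and a hypothetical `RetrievalClassifier` class, not the system described in the article.

```python
import math
from collections import Counter


class RetrievalClassifier:
    """Labels a query by majority vote among its nearest database images."""

    def __init__(self):
        self._entries = []  # list of (feature vector, class label)

    def add(self, features, label):
        """Store one labelled image; a new label creates a new class."""
        self._entries.append((list(features), label))

    def classify(self, features, k=3):
        if not self._entries:
            raise ValueError("the database is empty")
        nearest = sorted(self._entries,
                         key=lambda entry: math.dist(features, entry[0]))[:k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


clf = RetrievalClassifier()
clf.add([0.9, 0.1], "bolt")
clf.add([0.8, 0.2], "bolt")
clf.add([0.1, 0.9], "washer")
clf.add([0.2, 0.8], "washer")

query = [0.52, 0.48]
before = clf.classify(query)  # only "bolt" and "washer" exist so far

# Extend the system with a new class by adding images, without retraining.
clf.add([0.5, 0.5], "nut")
clf.add([0.55, 0.45], "nut")
after = clf.classify(query)
```

Here the same query is first assigned to the closest existing class and then, once two images of the new class are in the database, to the new class. The metadata lookup mentioned above works the same way, with stored metadata returned instead of a class label.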

A model specifically fine-tuned for small objects is relevant when even more automation is desired. With this model, the end user can find a particular object from a catalogue, even if the original image is cluttered with different kinds of objects. This might be the case when the target object is in its usual environment.

Application areas

Find the model number, spec sheet, etc. from a single image of a component

Verify in logistics chains that all objects from point A are found at point B using only images of the objects

Sort relevant images into folders

Use an image to find similar spare parts

Find intellectual property rights violations

Find a diagnosis for a medical condition

Find suggestions for similar items in a shopping catalog

Find the matching instance of an object in another image of the same scene, e.g. in 3D stereo vision or a multi-camera setup

and more...