Computer vision consulting tasks often involve classification problems, where the goal is to train a deep learning neural network to assign a given image to one of several discrete classes.
Typical examples are classifying images of animals, food, and so on.
A classic problem of this kind is classifying images as either cat or dog, see e.g. https://www.kaggle.com/c/dogs-vs-cats
Transfer learning
In cases like this, one often takes advantage of transfer learning. This means that one significantly shortens the development time needed to train a neural network for a particular computer vision problem by starting from a network that was pre-trained on some other computer vision problem.
It is common to use models pre-trained on well-known and well-researched problems. Examples of pre-trained computer vision models are VGG and Inception.
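To make this concrete, here is a minimal sketch of what transfer learning can look like in Keras for the cats-vs-dogs problem: freeze a pre-trained VGG16 base and train only a small classifier head on top. The dataset variable `train_ds` is just a placeholder, not part of any library.

```python
# Minimal transfer-learning sketch: reuse VGG16 features pre-trained
# on ImageNet, train only a small new head for cat vs. dog.
import tensorflow as tf
from tensorflow.keras.applications import VGG16

# Load VGG16 without its original classification head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained convolutional layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary: cat vs. dog
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# `train_ds` is a placeholder for a tf.data.Dataset of (image, label) pairs:
# model.fit(train_ds, epochs=5)
```

Because only the small head is trained, this typically converges in a fraction of the time that training the whole network from scratch would take.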
Geo location from photos
Recently, as part of computer vision consulting, I came across quite a unique computer vision problem, involving a very interesting kind of classification from images, where the result is a pair of location coordinates, latitude and longitude.
In other words, given an image, the deep learning network tries to determine the physical location where the image was taken, outputting a pair of numbers for latitude and longitude.
Various researchers have taken up this challenge. Several years ago, researchers at Google were among the first, with their PlaNet solution:
https://arxiv.org/abs/1602.05314
At first sight, the problem looks very difficult: one can easily find a picture whose location is hard to determine. However, many images contain a lot of information due to the presence of landmarks, typical vegetation, weather, architectural features, and the like.
The approach taken by the PlaNet solution, and by another solution that we will describe shortly, is to partition the surface of the earth into thousands of cells and then train a classifier over those cells on a large set of geotagged images. One huge source of geotagged images is, for example, Flickr.
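Note that PlaNet uses Google's adaptive S2 cells rather than a uniform grid, but the core idea can be illustrated with a much simpler fixed latitude/longitude partition. The following sketch is my own simplification, not PlaNet's code: coordinates map to a class index for training, and a predicted class maps back to the cell centre as the location estimate.

```python
# Simplified illustration of the geo-cell idea (PlaNet itself uses
# adaptive S2 cells; this fixed 5-degree grid is just for intuition).
CELLS_LON = 72   # 72 x 36 = 2592 cells of 5 x 5 degrees
CELLS_LAT = 36

def latlon_to_cell(lat: float, lon: float) -> int:
    """Map a coordinate to a class index for the classifier."""
    row = min(int((lat + 90.0) / 180.0 * CELLS_LAT), CELLS_LAT - 1)
    col = min(int((lon + 180.0) / 360.0 * CELLS_LON), CELLS_LON - 1)
    return row * CELLS_LON + col

def cell_to_latlon(cell: int) -> tuple[float, float]:
    """Return the cell centre as the predicted location."""
    row, col = divmod(cell, CELLS_LON)
    lat = (row + 0.5) / CELLS_LAT * 180.0 - 90.0
    lon = (col + 0.5) / CELLS_LON * 360.0 - 180.0
    return lat, lon

# Ibiza, Spain (~38.91 N, 1.43 E) falls into one cell; the centre of
# that cell is what a classifier over cells would effectively predict.
cell = latlon_to_cell(38.91, 1.43)
print(cell, cell_to_latlon(cell))
```

This reframing turns geolocation into an ordinary image classification problem, which is exactly what makes the transfer learning techniques above applicable.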
Another interesting approach is the one taken by the team from the Leibniz Information Centre for Science and Technology (TIB), Hannover, and the L3S Research Center, Leibniz Universität Hannover, in Germany.
Their approach is similar to PlaNet's: they also divide the whole earth into cells, but they add a special decision layer that takes the scene content into account, i.e. whether the image shows an indoor, natural, or urban setting.
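I won't reproduce their architecture here, but the role of such a decision layer can be sketched roughly as follows: weight the geo-cell predictions of scene-specific models by the predicted scene probabilities. This is my own simplified illustration; the actual TIB model differs in detail, and the probability values below are made up.

```python
import numpy as np

# Illustrative sketch of a scene-aware decision layer (the real TIB
# model differs in detail): mix per-scene geo-cell predictions,
# weighted by the predicted probability of each scene type.
NUM_CELLS = 2592
SCENES = ["indoor", "natural", "urban"]

def combine(scene_probs: np.ndarray, cell_probs_per_scene: np.ndarray) -> int:
    """scene_probs: shape (3,); cell_probs_per_scene: shape (3, NUM_CELLS).
    Returns the index of the most likely geo cell."""
    combined = scene_probs @ cell_probs_per_scene  # mixture over scenes
    return int(np.argmax(combined))

# Hypothetical example: the image looks mostly 'natural', so the
# natural-scene model dominates the final cell prediction.
scene_probs = np.array([0.05, 0.80, 0.15])
cell_probs = np.random.default_rng(0).dirichlet(np.ones(NUM_CELLS), size=3)
print(combine(scene_probs, cell_probs))
```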
I set up their library https://github.com/TIBHannover/GeoEstimation and can confirm that it works, with surprisingly good results.
The team has also put out an online version of their model and you can check it out here:
https://tibhannover.github.io/GeoEstimation/
If I send this image to the photo geolocation tool:
The deep learning tool correctly places the image in the Mediterranean region (its true location is Ibiza, Spain).