Art History vs Computer Vision


SIFT Tests

Text analysis & processing is a common tool in literary studies. Distant reading is a way of seeing a corpus of work in a new light: by aggregating large amounts of data we can find new patterns and connections that inform and expand our close-reading interpretations. Text analysis is now commonplace in textual studies, yet distant reading is not a common occurrence in the fields of visual studies.

If we were able to apply these methods to a body of visual works, we might be able to make new connections between seemingly disparate artists or genres. Like literary works, an artwork (especially pre-modern art) is often composed of recurring tropes or motifs. These features may range from a gesture to a symbolic emblem. The difficult problem, of course, is that unlike in the medium of text, the artist encodes these tropes in their own visual style/manner/technique. (I know this is an overly clinical assessment, and my years of art history study could argue otherwise, but let's set that aside for now.)

My first goal is to get some level of feature detection. I'm not looking for context recognition, just some kind of pattern matching. I had read a lot about the fairly recent SIFT (Scale-Invariant Feature Transform) technique and thought it might be a good option for what I was after. Having little experience with computer vision, I found some existing implementations of SIFT and wanted to run some tests. For these demos I used the VLFeat tool and a modified version of Jan Erik Solem's Python implementation.
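For anyone wanting to reproduce this kind of test, here is a minimal sketch of loading the keypoints that VLFeat's `sift` command-line tool writes out. It assumes the plain-text layout that Solem's reader also expects: one keypoint per line, with x, y, scale and orientation followed by the 128 descriptor values. The function name `read_features` is hypothetical.

```python
import io
import numpy as np

def read_features(source):
    """Load a VLFeat-style .sift text file into (locations, descriptors).

    Assumed layout (one keypoint per row): x, y, scale, orientation,
    followed by 128 descriptor values.
    """
    # ndmin=2 keeps a single-keypoint file as a 1 x 132 array instead of 1-D
    data = np.loadtxt(source, ndmin=2)
    if data.shape[1] != 4 + 128:
        raise ValueError(f"expected 132 columns per keypoint, got {data.shape[1]}")
    locs = data[:, :4]    # frame: x, y, scale, orientation
    descs = data[:, 4:]   # 128-dimensional SIFT descriptor
    return locs, descs

# Usage with an in-memory "file" holding two fake keypoints
row = " ".join(["10 20 2.5 0.1"] + ["0"] * 128)
locs, descs = read_features(io.StringIO(row + "\n" + row + "\n"))
print(locs.shape, descs.shape)  # (2, 4) (2, 128)
```

Keeping locations and descriptors in separate arrays makes it easy to match on descriptors alone and only use the locations afterwards for drawing.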

Just to get a baseline, I ran Giotto's Arena Chapel Last Judgement and a detail of the same work through the script:

In these examples the two images are placed side by side, with any matching features found by the SIFT implementation connected by a red line.

As you can see, the program had no problem finding the detail within the full work.
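A side-by-side match display like this can be prepared by stacking the two images horizontally and shifting every right-hand keypoint by the width of the left image. A minimal numpy sketch, assuming grayscale arrays and keypoint locations whose first two columns are x and y; the function names are hypothetical, and the actual drawing of red lines (for example with matplotlib) is left out.

```python
import numpy as np

def append_images(im1, im2):
    """Place two grayscale images side by side, padding the shorter one with zeros."""
    rows = max(im1.shape[0], im2.shape[0])
    pad1 = np.zeros((rows - im1.shape[0], im1.shape[1]), dtype=im1.dtype)
    pad2 = np.zeros((rows - im2.shape[0], im2.shape[1]), dtype=im2.dtype)
    return np.hstack([np.vstack([im1, pad1]), np.vstack([im2, pad2])])

def match_segments(locs1, locs2, matches, width1):
    """Line endpoints (x1, y1, x2, y2) in the combined image for each match.

    matches[i] is the index in locs2 matched to keypoint i of locs1, or -1.
    Keypoints from the right image are shifted right by the left image's width.
    """
    segs = []
    for i, j in enumerate(matches):
        if j >= 0:
            x1, y1 = locs1[i, 0], locs1[i, 1]
            x2, y2 = locs2[j, 0] + width1, locs2[j, 1]
            segs.append((x1, y1, x2, y2))
    return segs

# Usage with tiny placeholder images and keypoints
left, right = np.ones((4, 3)), np.ones((2, 5))
combined = append_images(left, right)          # shape (4, 8)
segs = match_segments(np.array([[1.0, 2.0], [0.0, 0.0]]),
                      np.array([[4.0, 1.0]]), [0, -1], left.shape[1])
# segs == [(1.0, 2.0, 7.0, 1.0)]
```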

Next I wanted to try two separate works with a similar composition and similar elements. I picked two emblems due to their simpler line composition:

There are some matches, but nothing that would make sense in our context. Again, I'm not expecting the SIFT algorithm to say “A-ha! Two Cupids!” and draw a line between them. I was hoping that perhaps the repeated verticality of the cage's bars might be a similar enough feature to trigger a match, or something along those lines.

At this point I started playing with the peak and edge thresholds to increase the number of features detected, but still got no encouraging results. Although it looked like this wasn't going to work as is, I ran a few more sets of images. I selected two sets by Picasso, collage and etching:

Again, no real positive matches.
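The peak and edge thresholds correspond to the two keypoint filters described in Lowe's SIFT paper, which VLFeat exposes as `--peak-thresh` and `--edge-thresh`. The sketch below applies simplified versions of both tests to one difference-of-Gaussians (DoG) image at an already-located extremum; real implementations also refine the extremum's position and scale first. The function name `passes_thresholds` and its default values are illustrative assumptions.

```python
import numpy as np

def passes_thresholds(dog, y, x, peak_thresh=0.0, edge_thresh=10.0):
    """Apply SIFT-style peak (contrast) and edge tests to a DoG extremum at (y, x)."""
    d = dog[y, x]
    # Peak test: discard extrema whose absolute DoG response is too weak
    if abs(d) < peak_thresh:
        return False
    # 2x2 spatial Hessian from central finite differences
    dxx = dog[y, x + 1] + dog[y, x - 1] - 2.0 * d
    dyy = dog[y + 1, x] + dog[y - 1, x] - 2.0 * d
    dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
           - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy ** 2
    # Curvatures of opposite sign (or one flat direction): edge-like, reject
    if det <= 0:
        return False
    # Edge test: reject if the ratio of principal curvatures exceeds edge_thresh
    r = edge_thresh
    return trace ** 2 / det < (r + 1) ** 2 / r

# Usage: a round peak survives, a strongly elongated one is treated as an edge
yy, xx = np.mgrid[0:11, 0:11]
blob = np.exp(-((yy - 5) ** 2 + (xx - 5) ** 2) / 4.0)
ridge = np.exp(-((xx - 5) ** 2) / 2.0 - ((yy - 5) ** 2) / 200.0)
print(passes_thresholds(blob, 5, 5))    # True
print(passes_thresholds(ridge, 5, 5))   # False
```

Lowering `peak_thresh` or raising `edge_thresh` lets more keypoints through, which is the direction of the tweaks described above.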

My initial results make sense: these are not photographs, and this implementation is essentially looking for near-identical local image patches. The problem, then, is not feature detection; SIFT really excels at that regardless of the image. For example, returning to our emblems, here are the detected features on the left emblem image:

In order to get the results we are expecting, we need to make the matching of features less precise, or at least a little more tolerant.

If we increase the ratio used to accept possible matches (how much closer a feature's best candidate must be than its second-best), in addition to tweaking the peak and edge thresholds, we increase the number of matching features:

I’ve randomized the line colors to make it a little easier to follow.

While the number of false positives has increased, we are also getting some encouraging results, such as features of the cage matching each other. This improvement carries over to the other test sets as well, yet each medium or style requires its own adjustments to the ratio/peak/edge settings. This makes sense, as a Vincent van Gogh painting and an emblem etching are composed quite differently.
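The ratio being adjusted here is the nearest-neighbour ratio test from Lowe's SIFT paper: a feature is matched only if its best candidate is clearly closer than its second-best. Solem's implementation measures closeness as the angle between normalised descriptors; the sketch below follows that idea, with `match_descriptors` a hypothetical name and `dist_ratio=0.6` an assumed default. It assumes `desc2` has at least two rows and no all-zero descriptors.

```python
import numpy as np

def match_descriptors(desc1, desc2, dist_ratio=0.6):
    """For each row of desc1, return the index of its match in desc2, or -1.

    A match is accepted only if the angle to the nearest descriptor is less
    than dist_ratio times the angle to the second-nearest one.
    """
    d1 = desc1 / np.linalg.norm(desc1, axis=1, keepdims=True)
    d2 = desc2 / np.linalg.norm(desc2, axis=1, keepdims=True)
    # Angle between every pair of unit vectors; clip guards against rounding
    angles = np.arccos(np.clip(d1 @ d2.T, -1.0, 1.0))
    matches = np.full(len(d1), -1, dtype=int)
    for i, row in enumerate(angles):
        nearest, second = np.argsort(row)[:2]
        if row[nearest] < dist_ratio * row[second]:
            matches[i] = nearest
    return matches

# Usage with toy 3-D "descriptors"
targets = np.eye(3)
queries = np.array([[1.0, 0.0, 0.0],    # identical to target 0
                    [1.0, 1.0, 0.0],    # equally close to targets 0 and 1
                    [1.2, 1.0, 0.0]])   # only somewhat closer to target 0
print(match_descriptors(queries, targets).tolist())                  # [0, -1, -1]
print(match_descriptors(queries, targets, dist_ratio=0.9).tolist())  # [0, -1, 0]
```

Raising `dist_ratio` toward 1 accepts more ambiguous pairs, which is why both the useful matches and the false positives go up.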

Ultimately I would like to build a system where a visual search for elements of one work can be performed on a corpus of images. For example, if we were interested in cage symbolism, we could search a library of images and hopefully come up with something like this for each occurrence of a cage-like element:

as opposed to

However, even in this drastically different juxtaposition (cage and tulip) we get a false positive.
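The visual search imagined above could start as simply as ranking a corpus by how many descriptors from a query region pass a ratio test against each image. A brute-force sketch, assuming descriptors are already extracted into numpy arrays; it uses the Euclidean form of Lowe's ratio test rather than Solem's angle-based variant, and `count_matches`, `rank_corpus`, `ratio=0.8` and the image names are hypothetical.

```python
import numpy as np

def count_matches(query, candidate, ratio=0.8):
    """Count query descriptors that pass a Euclidean nearest-neighbour ratio test."""
    if len(candidate) < 2:
        return 0
    # Pairwise Euclidean distances, shape (len(query), len(candidate))
    dists = np.linalg.norm(query[:, None, :] - candidate[None, :, :], axis=2)
    two_nearest = np.sort(dists, axis=1)[:, :2]
    return int(np.sum(two_nearest[:, 0] < ratio * two_nearest[:, 1]))

def rank_corpus(query, corpus, ratio=0.8):
    """Return (image name, match count) pairs, most matches first."""
    scores = {name: count_matches(query, descs, ratio) for name, descs in corpus.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Usage with synthetic descriptors standing in for real SIFT output
rng = np.random.default_rng(0)
cage = rng.random((20, 128))   # descriptors from a query region
corpus = {
    "emblem_with_cage": np.vstack([cage + 0.01, rng.random((30, 128))]),
    "emblem_with_tulip": rng.random((50, 128)),
}
print(rank_corpus(cage, corpus)[0][0])   # emblem_with_cage
```

As the cage/tulip pair shows, raw match counts still include false positives, so a score like this could only serve as a first filter.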

I put together a demo tool that lets you define your own region on one of these two emblems and compare it with the other:

I limited it to only two emblems because of the server CPU time required to perform the comparison, but the demo should communicate how it could be expanded to compare against a larger image corpus. SIFT comparison seems like it could be useful in building such a system, but it would need a much more advanced and custom implementation than the stock methods I applied in these examples.
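A user-defined region like the one in the demo can be handled by keeping only the keypoints that fall inside the selected rectangle and matching just those descriptors against the other image. A sketch, assuming an axis-aligned rectangle in pixel coordinates and location rows that start with x, y (as in VLFeat frames); the function name is hypothetical and this is not necessarily how the demo itself works.

```python
import numpy as np

def features_in_region(locs, descs, x0, y0, x1, y1):
    """Keep only keypoints whose (x, y) centre lies inside the rectangle.

    locs rows are assumed to start with x, y; descs rows align with locs.
    """
    x, y = locs[:, 0], locs[:, 1]
    inside = (x >= x0) & (x <= x1) & (y >= y0) & (y <= y1)
    return locs[inside], descs[inside]

# Usage: one keypoint inside a 20 x 20 box, one outside it
locs = np.array([[5.0, 5.0, 1.0, 0.0], [50.0, 5.0, 1.0, 0.0]])
descs = np.arange(2 * 128, dtype=float).reshape(2, 128)
sub_locs, sub_descs = features_in_region(locs, descs, 0, 0, 20, 20)
print(len(sub_locs))  # 1
```

Restricting the query this way also shrinks the number of descriptor comparisons, which matters given the CPU cost of brute-force matching.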