Library of Congress & Flickr Commons

Analysis of user interactions on 40,000 images

Oct 6 2025

Intro

Flickr Commons is a program to bring the visual collections of cultural heritage organizations to new audiences, getting these resources in front of people where they already are online rather than leaving them siloed on an institution’s own website or not online at all. It was a pretty groundbreaking project: the Library of Congress was the first participant, and it now has over 40,000 photos on Flickr. The program continues today under the Flickr Foundation. The project started in 2008 and there is a lot of information about it: this webcast, a project report, and a 2024 impact report. While the project predates my time at the Library by a decade and my job at LC has nothing to do with these collections, I was really compelled by the idea of having potentially 17 years of data about interactions between the public and these materials. This post analyzes and visualizes that data.

Data + Code

I’ll be using data from the public Flickr API that I harvested back in 2024 (I unreliably work on too many personal projects for years, and then eventually something causes me to finish one, like being furloughed in a US government shutdown). This is all public data, and the code I use to do everything on this page can be found in this GitHub repo.

Interactions

The comments are the big thing with this project. They are the largest interaction surface between the public and the photos. With over 95,000 comments made on the photos over the 17 years, I had a lot of questions about what people are saying. To organize them I built embeddings for all 95K comments using the Google Gemini gemini-embedding-001 model. This produces a 3072-dimensional vector for each comment, which I then reduced to two dimensions and clustered to build communities of comments. I then sent a random sample from each community to an LLM to label the group based on the actual text of the comments. Here are the communities:

Category ID  Count      Category Label
================================================================================
0            4464       Historical Context and Identification with Links
1            2782       Aesthetic Praise and Admiration
2            1973       Biographical & Historical Wikimedia Contributions
3            3765       Factual Historical Annotation
4            5553       Historical Detail Identification and Contextualization
5            678        Explore Congratulations
6            595        See Also
7            1402       Historical Subject Identification and Details
8            2529       Location and Status Verification
9            202        Aesthetic Feedback
10           4754       LC Staff Thanks for Metadata Improvement
11           1204       Non-English Compliments
12           629        Flickr Group Invitations
13           1770       Crowdsourced Historical Data Refinement
14           4230       Historical Performing Artist Biographical Documentation
15           3469       Sourced Historical Details and Context
16           335        Flickr Group Invitations
17           1687       Flickr Group Invitations
18           3315       Location Verification and Contemporary Comparison
19           1054       Cross-referencing and Linked Information
20           466        External Content Feature Notification
21           6199       Observations on Period Appearance
22           2806       LC Staff Thanks for Contributions
23           954        Wikidata Zone 💪
24           2571       Factual Correction and Archival Enhancement
25           1620       Historical Factual Identification and Context
26           2130       Historical Baseball Identification and Contextualization
27           296        Flickr Group Invitations
28           3402       Factual contributions and corrections
29           4184       Factual Identification and Historical Context
30           661        Group Invitations
31           270        Identification and Biographical Information of Historical Figures
32           7373       Historical Photo Annotation
33           5265       Biographical and Genealogical Identification
34           372        Flickr Group Invitations
35           5487       Praise
36           4659       Historical Annotation
37           223        Compliments
================================================================================
Total:       95328    
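The labeling step described before the table depends on drawing a random sample from each community before handing it to an LLM. As a rough sketch only (not the code from the repo), that sampling could look like the following. The function name, the sample size `k`, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_per_cluster(comments, cluster_ids, k=20, seed=0):
    """Return {cluster_id: up to k randomly chosen comments}.

    `comments` and `cluster_ids` are parallel sequences: the i-th comment
    belongs to the i-th cluster id. Both names are hypothetical.
    """
    by_cluster = defaultdict(list)
    for text, cid in zip(comments, cluster_ids):
        by_cluster[cid].append(text)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        # Small clusters are returned whole rather than raising an error.
        cid: rng.sample(texts, min(k, len(texts)))
        for cid, texts in sorted(by_cluster.items())
    }
```

Each sampled list would then be placed in a prompt asking for a short label. Sampling keeps the prompt small even for the largest communities, at the cost of the label reflecting only what happened to be drawn.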

I then created a visualization to display all of these comments. Once it has loaded you can hover over a comment to see its text, and clicking takes you to that comment on Flickr. It probably does not work well on mobile. Explore the communities below or Open in new tab

The comments aggregate into interesting communities, the majority of them related to contributing some information about the photo. That can be in the form of context, facts, “See also” links, links out to the Wikimedia ecosystem, and others. There are also large communities of comments praising the photo or making related aesthetic remarks, in addition to more meta Flickr comments.

I really love that there are some hyper-specialized communities within the broader group, like the red cluster in the lower left of the graph: about 2,000 comments on “Historical Baseball Identification and Contextualization.” It is just a community of comments adding info about baseball-related images.

Another one that caught my attention is the “Location and Status Verification” cluster. These are typically comments on a John Margolies photograph noting that what is depicted is still there or, better yet, giving a set of GPS coordinates or a link to a Google Maps page that geolocates the image in the present-day world.
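Pulling coordinates out of comments like these can be sketched with a regular expression. This is a minimal illustration under stated assumptions, not the extraction used for this post: it only handles decimal "lat, lon" pairs (including the pair that follows `@` in a Google Maps URL), and the pattern and function name are hypothetical.

```python
import re

# Decimal "lat, lon" pairs such as "35.0853, -106.6056", or the pair after "@"
# in a Google Maps URL. The lookbehind stops a match from starting in the
# middle of a longer number.
COORD_RE = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def extract_coordinates(comment):
    """Return every plausible (lat, lon) pair found in the comment text."""
    pairs = []
    for lat_s, lon_s in COORD_RE.findall(comment):
        lat, lon = float(lat_s), float(lon_s)
        # Discard number pairs that cannot be geographic coordinates.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            pairs.append((lat, lon))
    return pairs
```

Coordinates written in degrees, minutes, and seconds, shortened map links, and place names without coordinates would all need separate handling.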

I took all of these comments, extracted the geo information from them, and made a little site for browsing all of the photos that had location comments added, with the ability to view each one in Google Maps Street View:

Open in new tab

Here is a demo video of the site:

Comments

Overall there were a lot of comments made on these photos on the Flickr site. Here is a histogram for the whole 17 years:

Here is a list of the top 50 photos with the most comments:

Photo Comments
Woman aircraft worker, Vega Aircraft Corporation, Burbank, Calif. (LOC) 415
[Irish spinner and spinning wheel. County Galway, Ireland] (LOC) 412
Negro boy near Cincinnati, Ohio (LOC) 410
Destitute pea pickers in California. Mother of seven children (LOC) 341
Operating a hand drill at Vultee-Nashville, woman is working on a "Vengeance" dive bomber (LOC) 257
Worker at carbon black plant, Sunray, Texas (LOC) 208
This girl in a glass house is putting finishing touches on the bombardier nose section (LOC) 177
[Svartisen, Nordland, Norway] (LOC) 168
[Bondhus glacier and lake, Hardanger Fjord, Handanger, Norway] (LOC) 162
Poster for a side show at the Vermont state fair, Rutland (LOC) 154
[Germany Schaefer, Washington AL (baseball)] (LOC) 148
Chicago, Illinois. In the waiting room of the Union Station (LOC) 143
Carpenter at work on Douglas Dam, Tennessee (TVA) (LOC) 143
A carpenter at the TVA's new Douglas dam on the French Broad River (LOC) 128
Col. Villa (LOC) 125
[Kongen og Dronningen, Bispen, Norway] (LOC) 117
[The Tivoli park, Copenhagen, Denmark] (LOC) 117
Crane operator at TVA's Douglas Dam, Tennessee (LOC) 117
Women at work on bomber, Douglas Aircraft Company, Long Beach, Calif. (LOC) 115
Women are trained to do precise and vital engine installation work (LOC) 114
[Arcade, Rotterdam, Holland] (LOC) 110
[Grand Grocery Co.], Lincoln, Neb. (LOC) 110
[Copenhagen, Helsingborg, Sweden] (LOC) 108
G. Washington's teeth (LOC) 106
[Abraham Lincoln, Congressman-elect from Illinois. Three-quarter length portrait] (LOC) 102
Tank driver, Ft. Knox, Ky. (LOC) 93
General view of one of the classification yards of the Chicago and Northwestern Railroad (LOC) 90
[General view, Ålesund, Norway] (LOC) 86
House, Houston, Texas (LOC) 86
[Display of home-canned food] (LOC) 84
Tank commander, Ft. Knox, Ky. (LOC) 83
[Blue grotto, Capri Island, Italy] (LOC) 81
Bayou Bourbeau plantation, a Farm Security Administration co-operative (LOC) 79
[Fantoft Church, Bergen, Norway] (LOC) 76
Children in the tenement district, Brockton, Mass. (LOC) 75
Woman putting on her lipstick in a park with Union Station behind (LOC) 74
"Backstage" at the "girlie" show at the Vermont state fair, Rutland (LOC) 74
Dr. Schreiber of San Augustine giving a typhoid innoculation (LOC) 72
[Rope Bridge, Carrick-a-Rede. County Antrim, Ireland] (LOC) 68
Japanese-American camp, war emergency evacuation, [Tule Lake Relocation Center] (LOC) 67
[Gols Church, with Hovenstuen and Staburet, Christiania, Norway] (LOC) 63
Burgess (LOC) 62
Smoke stacks (LOC) 61
Shepherd with his horse and dog on Gravelly Range, Madison County, Mont. (LOC) 61
Children gathering potatoes on a large farm, vicinity of Caribou, Aroostook County, Me. (LOC) 60
[Glenariff. County Antrim, Ireland] (LOC) 56
[Portrait of Billie Holiday, Downbeat, New York, N.Y., ca. Feb. 1947] (LOC) 55
[Swallow Falls II, Fairy Glen, Bettws-y-Coed (i.e. Betws), Wales] (LOC) 54
Summer scene, N.Y. (LOC) 54
[Electric phosphate smelting furnace used in the making of elemental phosphorous] (LOC) 51

Pareto Principle

There are other types of interactions, like Tags (users adding a folksonomy) and Notes (comments on a region of the image). I was curious whether the “80/20” rule applied to all types of user interaction. The idea is basically that 80 percent of the work/effort/whatever of something comes from 20 percent of the people involved. I have always associated this metric with crowdsourcing, since I learned about it during my NYPL Labs days across all of our crowdsourcing projects. It is formally called the Pareto principle, and before I ran the numbers I knew for certain it would hold for this data, because it always does for crowdsourcing. But would it be 20% or smaller? Here is an interface to explore the data:

The data shows that for comments, only 11% of users made 80% of all comments. For Notes it is a little larger, with 14% making most of them, but for Tags it is extreme, with 1% of users responsible for 80% of all tags. The Tags figure is skewed by one single user adding over 70,000 tags, and I’m not sure if this was an automated process. Like all crowdsourcing projects, these interactions follow the trend of a small core group of individuals who are really passionate about the material.
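The percentages above answer one question: what is the smallest share of users whose combined activity reaches 80% of the total? A minimal sketch of that calculation, assuming the input is a mapping from user id to number of actions (the function and parameter names are hypothetical):

```python
def share_of_users_for(actions_by_user, target=0.8):
    """Smallest fraction of users whose combined actions reach `target` of the total.

    `actions_by_user` maps a user id to that user's number of comments,
    notes, or tags.
    """
    counts = sorted(actions_by_user.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0

    running = 0
    # Add users from most to least active until the target share is reached.
    for n_users, count in enumerate(counts, start=1):
        running += count
        if running >= target * total:
            return n_users / len(counts)
    return 1.0
```

Passing a `collections.Counter` built from the author id of every comment would give the comment figure. A single extremely prolific account, like the tagger mentioned above, pulls the result toward zero on its own.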

Tags vs Subjects

Each LC photo on Flickr actually has a MARC record created for it in the LC catalog, which is pretty amazing. Of course there are a lot of useful things that can be in a MARC record, including Subject Headings. I wanted to explore two areas using subjects (LCSH): one being the popularity of subjects, and the other the relationship between Flickr Tags and the subject headings.

The first is the popularity of particular subjects, meaning which LCSH headings had the most interaction. To do this I just pulled the topical subfields ($a) for each photo and added up the interactions. This interface lets you explore that data; it also includes popularity based on subcollection:

Open in new tab

I did not see any obvious patterns. I think it mostly comes down to collection strength: there are a lot of World War photographs, so that is going to be a popular topic. Likewise for collection popularity, the majority of the photos come from the Bain Collection, so it will have the most interactions, but it is still interesting to see them listed out and sorted by interactions.
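Adding up interactions per topical subfield can be sketched as below. This is an illustration only: it assumes headings arrive as strings in the `$a ... $y ...` form, and that "interactions" is a single number per photo (for example comments plus notes plus tags). Both are assumptions, and all names are hypothetical.

```python
import re
from collections import Counter

# Captures the text of each $a subfield, up to the next "$" or the end of the string.
SUBFIELD_A_RE = re.compile(r"\$a\s*([^$]+)")


def topical_terms(heading):
    """Extract the $a values from a heading string, minus trailing punctuation."""
    return [m.strip().rstrip(".").strip() for m in SUBFIELD_A_RE.findall(heading)]


def interactions_by_topic(photos):
    """Sum interaction counts per topical term across photos.

    `photos` is an iterable of (headings, interaction_count) pairs, where
    `headings` is the list of heading strings on one photo's record.
    """
    totals = Counter()
    for headings, interactions in photos:
        # Count each term once per photo even if several headings repeat it.
        terms = {t for h in headings for t in topical_terms(h)}
        for term in terms:
            totals[term] += interactions
    return totals
```

Calling `interactions_by_topic(...).most_common(20)` would then give a ranked list of the kind the interface shows.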

The other area with subject headings I thought would be interesting is Folksonomy vs Vocabulary. The LCSH headings come from the photo record and the Tags come from Flickr users. I was curious if there are any patterns between what LCSH headings would be assigned vs what the Flickr users are adding to the images.

Open in new tab

Let’s look at one example, photos with the LCSH heading “Miniature golf”:

Miniature golf

You can see the authorized heading “Miniature golf” and its one variant, “Golf, Miniature,” compared to the Tags Flickr users have added. There are a couple of useful variants among them, such as “mini golf” and “put-put,” that would probably make good additions to the LCSH variant labels. But most of the Tags are image-specific, identifying mini golf locations or features. Some of the tags, like Dinosaurs, do have LCSH equivalents that were added to the MARC record. Here is a list of all LCSH headings added to the miniature golf images:

  • $a Log cabins $y 1980-1990.
  • $a Dinosaurs $y 1990-2000.
  • $a Lighthouses $y 2000-2010.
  • $a Miniature golf $y 1980-1990.
  • $a Sharks $y 1980-1990.
  • $a Lighthouses $y 1970-1980.
  • $a Sculpture $y 1980-1990.
  • $a Horses.
  • $a Tipis $y 1980-1990.
  • $a Octopuses $y 1980-1990.
  • $a Pagodas $y 1980-1990.
  • $a Pirates $y 1980-1990.
  • $a Lighthouses $y 1990-2000.
  • $a Lighthouses $y 1980-1990.
  • $a Miniature golf $y 2000-2010.
  • $a Merry-go-rounds $y 1980-1990.
  • $a Totem poles $y 1980-1990.
  • $a Sculpture $y 1970-1980.
  • $a Miniature golf $y 1970-1980.
  • $a Skulls $y 1980-1990.
  • $a Trading posts $y 1980-1990.
  • $a Signs (Notices) $y 1980-1990.
  • $a Rabbits $y 1980-1990.
  • $a Signs (Notices) $y 1990.
  • $a Obelisks $y 1980-1990.
  • $a Windmills $y 1970-1980.
  • $a Dinosaurs $y 1980-1990.
  • $a Roller coasters $y 1980-1990.
  • $a Signs (Notices) $y 2000-2010.
  • $a Tipis $y 1970-1980.
  • $a Miniature golf $y 1990.
  • $a Drive-in theaters $y 1980-1990.
  • $a Sculpture $y 1990-2000.
  • $a Castles & palaces $y 1980-1990.
  • $a Dinosaurs $y 1970-1980.
  • $a Houses $y 1980-1990.
  • $a Lighthouses $y 1990.
  • $a Swine $y 1980-1990.
  • $a Waiters $y 1990.
  • $a Shoes $y 1980-1990.
  • $a Frogs $y 1980-1990.
  • $a Barns $y 1980-1990.
  • $a Windmills $y 2000-2010.
  • $a Signs (Notices) $y 1970-1980.
  • $a Miniature golf $y 1990-2000.
  • $a Whales $y 1980-1990.
  • $a Windmills $y 1980-1990.

These are all the unique headings. Each photo would have at least an $a Miniature golf and a $y, and some of them would also have a descriptive feature $a like Frogs or Windmills. LCSH does provide the ability to do this more fine-grained topical assignment, and the cataloger has utilized it, so in that case I don’t see an advantage to the Folksonomy. But for some of the name-based Tags identifying specific golf courses it is nice, because a Name Authority would not be created for those within the library ecosystem, and tagging lets users identify entities ad hoc without the overhead of authority control. Again, nothing super surprising here: Folksonomies work well for some things, but it really comes down to applying the correct system and how consistently it is applied.
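One simple way to compare user tags against a heading’s labels is to normalize both sides and split the tags into those that already match an authorized or variant label and those that do not. The sketch below is an assumption about how such a comparison could be done, not the method behind the interface above, and the names are hypothetical.

```python
import re


def normalize(label):
    """Lowercase and drop everything except ASCII letters and digits.

    Note: this also drops accented letters, which is a simplification.
    """
    return re.sub(r"[^a-z0-9]", "", label.lower())


def compare_tags_to_heading(labels, tags):
    """Split tags into those matching a heading label and those that do not.

    `labels` holds the authorized heading plus its variant labels.
    """
    known = {normalize(label) for label in labels}
    matched = sorted(t for t in tags if normalize(t) in known)
    unmatched = sorted(t for t in tags if normalize(t) not in known)
    return matched, unmatched
```

The unmatched list is where candidate variant labels such as “mini golf” would surface, mixed in with the image-specific tags.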

Wiki* Integration

The last important point I wanted to explore is how these images fit into the internet’s knowledge system, specifically Wikimedia projects. If you explore the comment graph above you will notice a lot of links to Wikipedia and Wikidata. Those projects allow users to immediately act upon the information they are creating while interacting with the photos. A lot of comments are things like “I just created a Wikipedia article for them,” resulting in the collection seeding information into the internet’s knowledge graph. It also helps that these images are public domain and are being ingested into Wikimedia Commons, allowing them to be included in any Wikipedia or Wikidata resources created. In my mind, tying the resources to Wiki* projects as much as possible provides a huge boost to users’ ability to contribute back.

I wanted to try to explore how these images fit into the Wiki ecosystem so I took all the comments that had a Wikipedia or Wikidata link in them and built a large navigable graph interface to display the photos and the Wiki* resources they are attached to.

You can watch this video of me interacting with the graph or explore it yourself:

Demo Video

Open in new tab

The images are populated first, then the Wiki* links are resolved and those entities are added to the graph as well. So it builds a large interconnected network based on what Flickr users thought was important to tag in the image. I don’t really think of this as a good interface, but I wanted to show the magnitude of information and how empowered users are able to place resources into that context. This graph was built only from the Wiki* links Flickr users added to their comments, and it represents just around a quarter of the total images.
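The first stage of a graph like this, finding Wikipedia and Wikidata links in comments and attaching them to photos, can be sketched as follows. This is a hedged illustration rather than the repo’s code: the URL pattern only covers standard desktop `*.wikipedia.org/wiki/...` and `www.wikidata.org/wiki/Q...` links, and the identifier format and function names are assumptions.

```python
import re
from collections import defaultdict

WIKI_LINK_RE = re.compile(
    r"https?://(?:([a-z\-]+)\.wikipedia\.org/wiki/([^\s<>]+)"
    r"|www\.wikidata\.org/wiki/(Q\d+))"
)


def wiki_entities(comment):
    """Return identifiers like 'wikipedia:en:Billie_Holiday' or 'wikidata:Q42'."""
    found = []
    for lang, title, qid in WIKI_LINK_RE.findall(comment):
        if qid:
            found.append(f"wikidata:{qid}")
        else:
            # Drop sentence punctuation that often trails a pasted URL.
            found.append(f"wikipedia:{lang}:{title.rstrip('.,;')}")
    return found


def build_graph(comments):
    """Map each photo id to the set of wiki entities linked from its comments.

    `comments` is an iterable of (photo_id, comment_text) pairs.
    """
    graph = defaultdict(set)
    for photo_id, text in comments:
        for entity in wiki_entities(text):
            graph[photo_id].add(entity)
    return dict(graph)
```

Photos that share an entity become connected through it, which is what produces the interconnected network. Resolving each identifier to a label or image would be a separate lookup against the Wikimedia APIs.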

That brings us to the end of what I looked at with this data. I think this is a really interesting project dataset because of its longevity and the range of user activities it encompasses.