LC & Flickr Commons

Intro

Flickr Commons is a program to bring the visual collections of cultural heritage organizations to new audiences. The goal is to get these resources in front of people where they already are online, as opposed to leaving them siloed on an institution's own website or not online at all. It was a pretty groundbreaking project, and the Library of Congress was the first participant, with over 40,000 photos now on Flickr. The program started in 2008 and continues today under the Flickr Foundation. There is a lot of information about the project: this webcast, a project report, and a 2024 impact report. While the project predates my time at the library by a decade, and I have nothing to do with these collections in my job at LC, I was really compelled by having potentially 17 years of data about interactions between the public and these materials. This post analyzes and visualizes that data.

Data + Code

I’ll be using data from the public Flickr API that I harvested back in 2024 (I unreliably work on too many personal projects for years, and then eventually something causes me to finish one, like being furloughed in a US government shutdown). This is all public data, and the code I use to do everything on this page can be found in this GitHub repo.
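For readers who want to pull similar data themselves, here is a minimal sketch of how a request for one photo's comments might be built against the public Flickr REST API. It is an illustration, not the code from the repo: the helper name `comments_url`, the placeholder API key, and the photo ID are all made up, and the sketch only builds the URL without sending it.

```python
from urllib.parse import urlencode

# REST endpoint of the public Flickr API.
FLICKR_REST = "https://api.flickr.com/services/rest/"


def comments_url(api_key: str, photo_id: str) -> str:
    """Build a flickr.photos.comments.getList request URL (hypothetical helper)."""
    params = {
        "method": "flickr.photos.comments.getList",
        "api_key": api_key,
        "photo_id": photo_id,
        "format": "json",
        "nojsoncallback": 1,  # ask for plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST + "?" + urlencode(params)


# Placeholder key and photo ID, for illustration only.
print(comments_url("YOUR_API_KEY", "1234567890"))
```

A real harvest would loop over every photo in the account, fetch each URL, and store the responses, ideally with some rate limiting.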

Interactions

The comments are the big thing with this project. They are the largest interaction surface between the public and the photos. With over 95,000 comments made on the photos over the 17 years, I had a lot of questions about what people are saying. To organize them I built embeddings for all 95K comments using the Google Gemini gemini-embedding-001 model. This produces a 3072-dimensional vector for each comment, which I then reduced to two dimensions and clustered to build communities of comments. I then sent a random sample from each community to an LLM, which assigned the community a label based on the actual text of the comments. Here are the communities:

Category ID	Count	Category Label
0	4464	Historical Context and Identification with Links
1	2782	Aesthetic Praise and Admiration
2	1973	Biographical & Historical Wikimedia Contributions
3	3765	Factual Historical Annotation
4	5553	Historical Detail Identification and Contextualization
5	678	Explore Congratulations
6	595	See Also
7	1402	Historical Subject Identification and Details
8	2529	Location and Status Verification
9	202	Aesthetic Feedback
10	4754	LC Staff Thanks for Metadata Improvement
11	1204	Non-English Compliments
12	629	Flickr Group Invitations
13	1770	Crowdsourced Historical Data Refinement
14	4230	Historical Performing Artist Biographical Documentation
15	3469	Sourced Historical Details and Context
16	335	Flickr Group Invitations
17	1687	Flickr Group Invitations
18	3315	Location Verification and Contemporary Comparison
19	1054	Cross-referencing and Linked Information
20	466	External Content Feature Notification
21	6199	Observations on Period Appearance
22	2806	LC Staff Thanks for Contributions
23	954	Wikidata Zone 💪
24	2571	Factual Correction and Archival Enhancement
25	1620	Historical Factual Identification and Context
26	2130	Historical Baseball Identification and Contextualization
27	296	Flickr Group Invitations
28	3402	Factual contributions and corrections
29	4184	Factual Identification and Historical Context
30	661	Group Invitations
31	270	Identification and Biographical Information of Historical Figures
32	7373	Historical Photo Annotation
33	5265	Biographical and Genealogical Identification
34	372	Flickr Group Invitations
35	5487	Praise
36	4659	Historical Annotation
37	223	Compliments
Total: 95328
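The sampling step, where a random handful of comments from each community goes to the LLM for labeling, can be sketched roughly like this. The function name, the sample size `k`, the fixed seed, and the toy comments are all assumptions for illustration; the post does not say how many comments were sampled per community.

```python
import random
from collections import defaultdict


def sample_per_cluster(comments, labels, k=25, seed=0):
    """Return up to k randomly chosen comments for each cluster ID."""
    by_cluster = defaultdict(list)
    for text, cluster_id in zip(comments, labels):
        by_cluster[cluster_id].append(text)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        cid: rng.sample(texts, min(k, len(texts)))
        for cid, texts in sorted(by_cluster.items())
    }


# Toy data standing in for the real comments and their cluster assignments.
comments = ["Great shot!", "Beautiful", "This is Main St.", "See also the other view", "Taken in 1912"]
labels = [1, 1, 8, 6, 3]
samples = sample_per_cluster(comments, labels, k=1)
print(samples)
```

Each list of sampled comments would then be placed into a prompt asking the model for a short label describing what the comments have in common.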

I then created a visualization to display all of these comments. Once it has loaded you can hover over a comment to see its text, and clicking it will take you to that comment on Flickr. It probably does not work well on mobile. Explore the communities below or Open in new tab

Photo	Comments
Woman aircraft worker, Vega Aircraft Corporation, Burbank, Calif. (LOC)	415
[Irish spinner and spinning wheel. County Galway, Ireland] (LOC)	412
Negro boy near Cincinnati, Ohio (LOC)	410
Destitute pea pickers in California. Mother of seven children (LOC)	341
Operating a hand drill at Vultee-Nashville, woman is working on a "Vengeance" dive bomber (LOC)	257
Worker at carbon black plant, Sunray, Texas (LOC)	208
This girl in a glass house is putting finishing touches on the bombardier nose section (LOC)	177
[Svartisen, Nordland, Norway] (LOC)	168
[Bondhus glacier and lake, Hardanger Fjord, Handanger, Norway] (LOC)	162
Poster for a side show at the Vermont state fair, Rutland (LOC)	154

Photo	Comments
[Germany Schaefer, Washington AL (baseball)] (LOC)	148
Chicago, Illinois. In the waiting room of the Union Station (LOC)	143
Carpenter at work on Douglas Dam, Tennessee (TVA) (LOC)	143
A carpenter at the TVA's new Douglas dam on the French Broad River (LOC)	128
Col. Villa (LOC)	125
[Kongen og Dronningen, Bispen, Norway] (LOC)	117
[The Tivoli park, Copenhagen, Denmark] (LOC)	117
Crane operator at TVA's Douglas Dam, Tennessee (LOC)	117
Women at work on bomber, Douglas Aircraft Company, Long Beach, Calif. (LOC)	115
Women are trained to do precise and vital engine installation work (LOC)	114
[Arcade, Rotterdam, Holland] (LOC)	110
[Grand Grocery Co.], Lincoln, Neb. (LOC)	110
[Copenhagen, Helsingborg, Sweden] (LOC)	108
G. Washington's teeth (LOC)	106
[Abraham Lincoln, Congressman-elect from Illinois. Three-quarter length portrait] (LOC)	102
Tank driver, Ft. Knox, Ky. (LOC)	93
General view of one of the classification yards of the Chicago and Northwestern Railroad (LOC)	90
[General view, Ålesund, Norway] (LOC)	86
House, Houston, Texas (LOC)	86
[Display of home-canned food] (LOC)	84
Tank commander, Ft. Knox, Ky. (LOC)	83
[Blue grotto, Capri Island, Italy] (LOC)	81
Bayou Bourbeau plantation, a Farm Security Administration co-operative (LOC)	79
[Fantoft Church, Bergen, Norway] (LOC)	76
Children in the tenement district, Brockton, Mass. (LOC)	75
Woman putting on her lipstick in a park with Union Station behind (LOC)	74
"Backstage" at the "girlie" show at the Vermont state fair, Rutland (LOC)	74
Dr. Schreiber of San Augustine giving a typhoid innoculation (LOC)	72
[Rope Bridge, Carrick-a-Rede. County Antrim, Ireland] (LOC)	68
Japanese-American camp, war emergency evacuation, [Tule Lake Relocation Center] (LOC)	67
[Gols Church, with Hovenstuen and Staburet, Christiania, Norway] (LOC)	63
Burgess (LOC)	62
Smoke stacks (LOC)	61
Shepherd with his horse and dog on Gravelly Range, Madison County, Mont. (LOC)	61
Children gathering potatoes on a large farm, vicinity of Caribou, Aroostook County, Me. (LOC)	60
[Glenariff. County Antrim, Ireland] (LOC)	56
[Portrait of Billie Holiday, Downbeat, New York, N.Y., ca. Feb. 1947] (LOC)	55
[Swallow Falls II, Fairy Glen, Bettws-y-Coed (i.e. Betws), Wales] (LOC)	54
Summer scene, N.Y. (LOC)	54
[Electric phosphate smelting furnace used in the making of elemental phosphorous] (LOC)	51

Pareto Principle

There are other types of interactions, like Tags, where users add a folksonomy, and Notes, which are comments on a region of the image. I was curious if the “80/20” rule applied to all types of user interaction. The idea is basically that 80 percent of the work/effort/whatever of something comes from 20 percent of the people involved. I have always associated this metric with crowdsourcing since I learned about it during my NYPL Labs days with all of our crowdsourcing projects. It is formally called the Pareto principle, and before I ran the numbers I was sure it would hold for this data, because it always does for crowdsourcing. But would it be 20% or smaller? Here is an interface to explore the data:

The data shows that for comments only 11% of users made 80% of all comments. For Notes the share is a little larger, with 14% of users making 80% of them, but for Tags it is extreme, with 1% of users responsible for 80% of all tags. The Tags figure is skewed by a single user adding over 70,000 tags, and I’m not sure if this was an automated process. Like all crowdsourcing projects, these interactions follow the trend of a small core group of individuals who are really passionate about the material.
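The calculation behind those percentages is simple enough to sketch. This is one way it could be done, not necessarily how the interface computes it: sort users from most to least active, then count how many it takes to reach 80% of the total. The function name and the toy counts are made up for illustration.

```python
def share_of_users_for(counts, target_pct=80):
    """Smallest fraction of users whose combined activity reaches target_pct percent of the total."""
    ordered = sorted(counts, reverse=True)  # most active users first
    total = sum(ordered)
    if total == 0:
        return 0.0
    running = 0
    for n_users, count in enumerate(ordered, start=1):
        running += count
        # Integer comparison avoids floating point rounding at the threshold.
        if running * 100 >= target_pct * total:
            return n_users / len(ordered)
    return 1.0


# Toy data: one very heavy user and nine lighter ones (100 interactions total).
per_user = [70, 10, 5, 5, 3, 2, 2, 1, 1, 1]
print(share_of_users_for(per_user))
```

With these toy counts the top two of ten users account for 80 of 100 interactions, so the function returns 0.2. A single extremely heavy user, like the one with over 70,000 tags, pulls that fraction down sharply.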

Tags vs Subjects

Each LC photo on Flickr actually has a MARC record created for it in the LC catalog, which is pretty amazing. Of course there are a lot of useful things that can be in a MARC record, including Subject Headings. I wanted to explore two areas using subjects (LCSH): one is the popularity of subjects, and the other is the relationship between Flickr Tags and the subject headings.

The first is the popularity of particular subjects, meaning which LCSH headings had the most interaction. To do this I just pulled the topical subfields ($a) for each photo and added up the interactions. This interface lets you explore that data, and it also includes popularity by subcollection:

Open in new tab

I did not see any obvious patterns. I think it mostly comes down to collection strength: there are a lot of World War photographs, so that is going to be a popular topic. Likewise for collection popularity, the majority of the photos come from the Bain Collection, so it will have the most interactions, but it is still interesting to see them listed out and sorted by interactions.
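The tallying described above, pulling the $a headings for each photo and adding up the interactions, might look something like this minimal sketch. The record layout, the field names, and the counts are invented, and treating "interactions" as comments plus tags plus notes is my assumption for the example.

```python
from collections import Counter


def interactions_by_subject(photos):
    """Sum interaction counts under every topical ($a) heading on each photo."""
    totals = Counter()
    for photo in photos:
        # Assumption: an "interaction" is any comment, tag, or note.
        interactions = photo["comments"] + photo["tags"] + photo["notes"]
        # A set keeps a heading repeated on one record from being counted twice.
        for heading in set(photo["subjects"]):
            totals[heading] += interactions
    return totals


# Toy records with invented counts.
photos = [
    {"subjects": ["Miniature golf", "Dinosaurs"], "comments": 4, "tags": 10, "notes": 1},
    {"subjects": ["Miniature golf"], "comments": 2, "tags": 3, "notes": 0},
]
for heading, total in interactions_by_subject(photos).most_common():
    print(heading, total)
```

Because every heading on a photo gets credit for all of that photo's interactions, headings that appear on many photos rise to the top, which fits the collection-strength effect noted above.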

The other area with subject headings I thought would be interesting is Folksonomy vs Vocabulary. The LCSH headings come from the photo record and the Tags come from Flickr users. I was curious if there are any patterns between which LCSH headings are assigned and what the Flickr users are adding to the images.

Open in new tab

Let’s look at one example: photos with the LCSH heading “Miniature golf”

Miniature golf

You can see the authorized heading “Miniature golf”, which has one variant, “Golf, Miniature”, compared to the Tags Flickr users have added. There are a couple of useful variants among the Tags, such as “mini golf” and “put-put”, that would probably make good additions to the LCSH variant labels. But most of the Tags are image-specific, identifying mini golf locations or features. Some of the tags, like Dinosaurs, do have LCSH equivalents, and those were added to the MARC record. Here is a list of all LCSH headings added to the miniature golf images:

$a Log cabins $y 1980-1990.
$a Dinosaurs $y 1990-2000.
$a Lighthouses $y 2000-2010.
$a Miniature golf $y 1980-1990.
$a Sharks $y 1980-1990.
$a Lighthouses $y 1970-1980.
$a Sculpture $y 1980-1990.
$a Horses.
$a Tipis $y 1980-1990.
$a Octopuses $y 1980-1990.
$a Pagodas $y 1980-1990.
$a Pirates $y 1980-1990.
$a Lighthouses $y 1990-2000.
$a Lighthouses $y 1980-1990.
$a Miniature golf $y 2000-2010.
$a Merry-go-rounds $y 1980-1990.
$a Totem poles $y 1980-1990.
$a Sculpture $y 1970-1980.
$a Miniature golf $y 1970-1980.
$a Skulls $y 1980-1990.
$a Trading posts $y 1980-1990.
$a Signs (Notices) $y 1980-1990.
$a Rabbits $y 1980-1990.
$a Signs (Notices) $y 1990.
$a Obelisks $y 1980-1990.
$a Windmills $y 1970-1980.
$a Dinosaurs $y 1980-1990.
$a Roller coasters $y 1980-1990.
$a Signs (Notices) $y 2000-2010.
$a Tipis $y 1970-1980.
$a Miniature golf $y 1990.
$a Drive-in theaters $y 1980-1990.
$a Sculpture $y 1990-2000.
$a Castles & palaces $y 1980-1990.
$a Dinosaurs $y 1970-1980.
$a Houses $y 1980-1990.
$a Lighthouses $y 1990.
$a Swine $y 1980-1990.
$a Waiters $y 1990.
$a Shoes $y 1980-1990.
$a Frogs $y 1980-1990.
$a Barns $y 1980-1990.
$a Windmills $y 2000-2010.
$a Signs (Notices) $y 1970-1980.
$a Miniature golf $y 1990-2000.
$a Whales $y 1980-1990.
$a Windmills $y 1980-1990.

These are all the unique headings. Each photo would have at least an $a Miniature golf and a $y, and some of them would also have a descriptive-feature $a like Frogs or Windmills. LCSH does provide the ability to do this more fine-grained topical assignment, and the cataloger has used it. So in that case I don’t see an advantage to the Folksonomy. But the Name-based Tags identifying specific golf courses are nice, because a Name Authority would not be created for those within the library ecosystem, so Tags make it possible to identify entities ad hoc without the overhead of authority control. Again, nothing super surprising here: Folksonomies work well for some things, but it really comes down to applying the correct system and how consistently it is applied.
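To make the comparison concrete, here is a rough sketch of how heading strings like the ones listed above could be split into subfields and checked against user tags. The function names and the normalization rule (lowercase, keep only ASCII letters and digits) are my own assumptions, and the tag list is a toy sample.

```python
import re


def parse_subfields(heading: str):
    """Split a string like '$a Dinosaurs $y 1990-2000.' into (code, value) pairs."""
    pairs = re.findall(r"\$(\w)\s+([^$]+)", heading)
    # The trailing period is treated as punctuation, not as part of the value.
    return [(code, value.strip().rstrip(".")) for code, value in pairs]


def normalize(term: str) -> str:
    """Lowercase and keep only ASCII letters and digits, so 'Mini-Golf' matches 'minigolf'."""
    return re.sub(r"[^a-z0-9]", "", term.lower())


def split_tags(tags, headings):
    """Separate tags that match a topical ($a) heading from tags that add something new."""
    topics = {
        normalize(value)
        for heading in headings
        for code, value in parse_subfields(heading)
        if code == "a"
    }
    matched = [t for t in tags if normalize(t) in topics]
    new = [t for t in tags if normalize(t) not in topics]
    return matched, new


headings = ["$a Miniature golf $y 1980-1990.", "$a Dinosaurs $y 1980-1990.", "$a Horses."]
tags = ["miniature golf", "mini golf", "put-put", "dinosaurs"]  # toy sample of tags
print(split_tags(tags, headings))
```

Tags that land in the "new" list are the interesting ones: either candidate variant labels like “mini golf” or names of specific places that have no authority record.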

Wiki* Integration

The last important point I wanted to explore is how these images fit into the internet’s knowledge system, specifically Wikimedia projects. If you explore the comment graph above you will notice a lot of links to Wikipedia and Wikidata. Those projects allow users to immediately act on the information they are creating while interacting with the photos. A lot of comments are things like “I just created a Wikipedia article for them”, which results in the collection seeding information into the internet’s knowledge graph. It also helps that these images are public domain and are being ingested into Wikimedia Commons, allowing them to be included in any Wikipedia or Wikidata resources created. In my mind it provides a huge boost to the user’s ability to contribute back if the resources are tied to Wiki* projects as much as possible.

I wanted to explore how these images fit into the Wiki ecosystem, so I took all the comments that had a Wikipedia or Wikidata link in them and built a large navigable graph interface to display the photos and the Wiki* resources they are attached to.
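Finding those links is mostly a pattern-matching job. Here is a minimal sketch, assuming the comments are available as (photo ID, comment text) pairs; the regular expression, the function name, and the sample comments are illustrative and not taken from the actual graph code.

```python
import re
from collections import defaultdict

# Matches Wikipedia article URLs in any language edition and Wikidata item URLs.
WIKI_LINK = re.compile(
    r"https?://(?:[a-z-]+\.)?"
    r"(?:wikipedia\.org/wiki/[^\s\"'<>]+|wikidata\.org/wiki/Q\d+)"
)


def build_graph(comments):
    """Map each photo ID to the set of Wikipedia/Wikidata URLs found in its comments."""
    graph = defaultdict(set)
    for photo_id, text in comments:
        for url in WIKI_LINK.findall(text):
            graph[photo_id].add(url)
    return dict(graph)


# Toy comments with made-up photo IDs.
comments = [
    ("photo-1", 'See <a href="https://en.wikipedia.org/wiki/Billie_Holiday">Wikipedia</a>'),
    ("photo-2", "Also https://www.wikidata.org/wiki/Q42 for the item"),
    ("photo-3", "Lovely photo!"),
]
print(build_graph(comments))
```

Each photo-to-URL pair becomes an edge in the graph. A real version would also need to trim trailing punctuation from bare URLs and merge links that point at the same article.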

You can watch this video of me interacting with the graph or explore it yourself:

Demo Video

Matt Miller

Library of Congress & Flickr Commons

Analysis of user interactions on 40,000 images

Oct 6 2025
