Email Visualization

Visualizing 14,000 Released Epstein Emails

Nov 24 2025

Browsing the 20,000 Epstein documents released by the House Oversight Committee using tools people have built, such as this searchable interface or even a network graph, is interesting but still kind of opaque. I was thinking about what a browsable interface for thousands of emails would look like. If you have time as the x-axis and people as the y-axis you could "plot" all of them at once. I'm kind of obsessed with the idea of seeing everything at once while still being able to access the individual parts. I took the released emails and built an interface based on this idea. I like it as an experiment: it gives a digital collection a visible chronological dimension and a way to wrap your mind around the scope and extent of the documents. You can see the shape of the communication between Epstein and the people who wanted to talk to him, and it gives you a way to slice into it.
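Just to make the layout idea concrete, here is a toy Python sketch of that time-by-person plot. The records.json file and its date/person fields are placeholders for whatever extracted data you have, and the real interface is a custom web page, not a matplotlib chart.

# Toy sketch of the layout: time on the x-axis, one row per person.
# "records.json" and its fields are placeholders, not the real data files.
import json
from datetime import datetime
import matplotlib.pyplot as plt

records = json.load(open("records.json"))
people = sorted({r["person"] for r in records})
row = {p: i for i, p in enumerate(people)}

xs = [datetime.strptime(r["date"], "%Y-%m-%d") for r in records]
ys = [row[r["person"]] for r in records]

fig, ax = plt.subplots(figsize=(12, 0.25 * len(people)))
ax.scatter(xs, ys, s=4)
ax.set_yticks(range(len(people)))
ax.set_yticklabels(people, fontsize=6)
ax.set_xlabel("Date")
plt.tight_layout()
plt.show()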

Data

Quick note on the data: yes, there are 20K documents, but only around 5,700 of them are emails. These emails are not digital data files, they are PICTURES of an email. So imagine doing a “print to pdf” of each Gmail message and then taking a screenshot of that. These pictures are of email threads; if you do the work to take all those threads apart, you end up with around 14,000 individual emails, of which about 12,600 were sent or received directly by Epstein. Let’s jump into the visualization first; the second part of the blog documents the process if you are interested.

Open it in a new window (recommended; it's not meant for mobile either), use the embedded version below, or click the video link to see a video demo of the viz.

View video demo
Open in New Window

Process

As I mentioned above, there are 20K files but not all of them are emails. To sort the thousands of documents I used the vision model Qwen3-VL (quantized 8B), which runs nicely locally on my 4090. I sent each image to the model and had it return whether the document was an email, and if not, what kind of document it was. Those results can be seen here (###_results.json), which might be useful to someone else. Once I had narrowed the document set down to about 5,700 images, I sent each one to Gemini 2.5 Pro to extract data in this format:

{
  "sender": "Jeevacation",
  "senderEmail": "jeevacation@gmail.com",
  "senderGuess": "Jeffrey Epstein",
  "receiver": "Gmax",
  "receiverEmail": "gmax1@ellmax.com",
  "date": "2010-11-10-20-53",
  "subject": "Re:",
  "summary": "The sender sarcastically lists a wide variety of luxurious foods and...",
  "messageType": "Reply"
}
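The per-image extraction call looks something like this minimal sketch using the google-genai Python SDK; the prompt wording, file names, and SDK details are simplified placeholders rather than the exact pipeline, and the earlier Qwen3-VL sorting pass has the same one-image-in, one-answer-out shape, just pointed at a local endpoint.

# Sketch of the per-image extraction call (google-genai SDK assumed;
# prompt wording and file names are placeholders, not the exact pipeline).
import json
from google import genai
from google.genai import types

client = genai.Client()  # expects GEMINI_API_KEY in the environment

PROMPT = (
    "This image is a scanned email thread. Extract the most recent message as JSON "
    "with keys: sender, senderEmail, senderGuess, receiver, receiverEmail, "
    "date (YYYY-MM-DD-HH-MM), subject, summary, messageType."
)

def extract_email(image_path: str) -> dict:
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
            PROMPT,
        ],
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return json.loads(response.text)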

There were already datasets available from people who ran OCR on the released files and dumped all the images to text. But I think using the original image retains the structure and makes it easier to pull apart the many nested emails, rather than working from a jumble of OCR text.

For a lot of the emails where the name was totally redacted, I asked the model to guess the sender based on other context clues in the email. For example, there are emails where the name is redacted but the signature says "Sent from XYZ's iPhone!", so it can fill in the blank. Or sometimes on long email chains the initial sender's name would be redacted, but nested messages from earlier in the conversation were unredacted, so you can connect those emails together. Once the data was in this form there was some cleanup to map name variations onto a single person, and then I connected each record back to the original documents published by the Oversight Committee on Dropbox.
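That cleanup is mostly unglamorous string normalization, along these lines (the alias table here is illustrative, not the full mapping):

# Illustrative alias cleanup: collapse name/address variants onto one canonical
# contact so the same person doesn't show up as several rows in the viz.
# The alias table is made up for illustration, not the real mapping.
ALIASES = {
    "jeevacation": "Jeffrey Epstein",
    "jeevacation@gmail.com": "Jeffrey Epstein",
    "jeffrey epstein": "Jeffrey Epstein",
}

def canonical_name(record: dict) -> str:
    for key in ("senderGuess", "sender", "senderEmail"):
        value = (record.get(key) or "").strip().lower()
        if value in ALIASES:
            return ALIASES[value]
    return record.get("senderGuess") or record.get("sender") or "Unknown"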

Overall I think it is a nice experiment. I don't think it will reveal anything new, as the documents have been pretty well combed through. But I do like the idea of being able to see the "shape" of the emails for each contact: when contact started, how long it lasted, when the busy periods were, and so on. I was thinking of adding some timeline overlays, say major political events or other timelines, to give chronological context for what was happening in the world when the emails were being sent. Otherwise I think it is a decent approach to presenting a really opaque and hard-to-dissect dataset, something I think we'll be seeing more examples of in the future.
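Those per-contact shapes are cheap to summarize once the records are extracted; a rough sketch (the emails.json file name and the way the contact is derived are simplified placeholders):

# Sketch of summarizing each contact's "shape": first email, last email,
# total count, and busiest month. Assumes records shaped like the JSON above;
# "emails.json" and the contact derivation are simplified placeholders.
import json
import pandas as pd

records = json.load(open("emails.json"))
df = pd.DataFrame(records)

# Treat whichever side isn't Epstein as the contact for that email.
df["contact"] = df.apply(
    lambda r: r["receiver"] if r.get("senderGuess") == "Jeffrey Epstein" else r["sender"],
    axis=1,
)
df["ts"] = pd.to_datetime(df["date"], format="%Y-%m-%d-%H-%M", errors="coerce")
df = df.dropna(subset=["ts"])
df["month"] = df["ts"].dt.to_period("M")

summary = df.groupby("contact").agg(
    first=("ts", "min"), last=("ts", "max"), total=("ts", "size")
)
monthly = df.groupby(["contact", "month"]).size()
summary["busiest_month"] = (
    monthly.groupby(level="contact").idxmax().map(lambda ix: str(ix[1]))
)
print(summary.sort_values("total", ascending=False).head(20))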

Code and data on GitHub.