We built an AI tool for our podcast obsession
TL;DR: re:verb listens to podcasts and emails us the key moments, takes, and quotes—making sure we never miss a great podcast again. It’s fun, useful, and a glimpse into the multimodal future of AI. Try it out at reverb.email!
Every week, I want to listen to a couple dozen podcasts, but I only have time for a couple.
When I’m about to wash the dishes, I usually end up scanning recent episodes from my go-to podcasts and popping on the first one that looks interesting. But I know I’m missing so much this way—I’m told there’s more to the world than just the Ezra Klein Show.
In November, we built an AI tool that listens to episodes of the Joe Rogan podcast and analyzes them. That project got us thinking: Could we use what we learned to help us broaden our podcast diet—and maybe even learn a thing or two from the episodes we don’t have time to put on?
We teamed up with our fabulous friend Andrew McGill to find out. (This project absolutely wouldn’t have happened without Andrew—you can read his great writeup here.)
What we ended up building together has changed the way I listen to podcasts. Instead of spending 20 minutes figuring out that an episode isn’t for me, I can pick the ones I actually care about—and then I really pay attention. We’ve heard from some of our friends that it’s helped them make sense of their listening, too. So today, we’re making it available to anyone who wants to try it out!
What we built
re:verb emails you smart summaries of new podcast episodes as soon as they come out. You can check it out or sign up at reverb.email.
re:verb’s take on a recent Ezra Klein episode
You can think of them as useful alerts for your favorite podcasts: Every time a new episode is released, our little team of AI agents listens to it, takes notes, listens again to make sure it got everything right, and then sends you a newsletter that tells you about the episode. You’ll get a topline tl;dr, the top highlights from the episode, and the best quote.
Each newsletter is snappy—you can scan it in seconds—and if it grabs your interest, you can jump straight to the episode list in Apple Podcasts or Spotify to play or queue it up. Even if you don’t have time to listen, you might learn a quick something from the podcast highlights we email you.
One friend who listens to some 20 podcasts a week (!) said re:verb “has genuinely improved my overall podcast diet.” Another said it helps them pick the episodes they’ll listen to more carefully. We hope you’ll find it useful, too!
Our challenges along the way
We worked hard to build a system that listens deeply and thoroughly to an entire episode, but doesn’t get lost in the weeds when it’s telling you what happened—even if the episode is several hours long!
🤓 Nerd Mode: How it works
Here’s what happens when a new episode comes out (there’s a rough code sketch of the whole flow after this list):
- We have a large language model listen to the episode in 20-minute chunks, for a “pre-analysis” step. We’ve prompted the pre-analysis agents to listen really closely to their assigned chunk and take excellent notes for their senior colleague, the newsletter-writing agent.
- Once each of these parallel pre-analyses is done, we combine all the pre-analysis agents’ notes into one long cheat sheet and send it to our newsletter writer AI, which also listens to the whole episode again.
- The newsletter writer double-checks the characters, quotes, and other details, then it zooms way out to choose the most important bits of the podcast for the final newsletter.
- Once the newsletter is ready, the system sends it to every user who’s signed up for that podcast.
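For the curious, here’s a rough sketch of that flow in Python. The helper names (pre_analyze, write_newsletter, build_newsletter) and the stubbed-out model calls are ours for illustration, not re:verb’s actual code; the shape of it (parallel pre-analysis over chunks, then one combining pass over the full episode) mirrors the steps above.

```python
# Rough sketch of the pipeline above. pre_analyze() and write_newsletter()
# stand in for the two model calls described in the steps; the names and
# stubbed bodies are illustrative, not the actual re:verb implementation.
from concurrent.futures import ThreadPoolExecutor


def pre_analyze(chunk_path: str) -> str:
    """One pre-analysis agent: listen closely to a 20-minute chunk and take notes."""
    ...  # call the model with the chunk audio and a note-taking prompt


def write_newsletter(cheat_sheet: str, episode_path: str) -> str:
    """The newsletter writer: re-listen to the full episode with the notes in hand."""
    ...  # call the model with the full audio plus the combined notes


def build_newsletter(episode_path: str, chunk_paths: list[str]) -> str:
    # Fan out: each chunk gets its own pre-analysis pass, run in parallel.
    with ThreadPoolExecutor() as pool:
        notes = list(pool.map(pre_analyze, chunk_paths))

    # Combine all the agents' notes into one long cheat sheet.
    cheat_sheet = "\n\n".join(notes)

    # The writer double-checks details against the full audio and zooms out
    # to pick the most important bits for the final newsletter.
    return write_newsletter(cheat_sheet, episode_path)
```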
All of this happens automatically, around the clock. Each individual run takes a couple minutes, tops, and costs somewhere around 15 cents an episode.
We’re using Google’s Gemini 2.0 Flash model for both analysis steps, because it’s remarkably capable, cheap, and fast. (But we’re not stuck with Google: We can easily swap in a new model anytime, if another one proves to be a better choice later.)
We learned a lot from this project. The biggest lesson: Language models’ facility with multimedia is improving super fast.
Less than six months ago, when we built Roganbot, we had to transcribe every episode before analyzing it. This made things slow and expensive. But since then, a new class of large language models has become widely available: multimodal models that understand images, video, and audio natively, with no conversion step. Gemini is a multimodal model, so we can pass audio files to it directly, bringing costs and compute time way down.
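To make that concrete, here’s a minimal sketch of handing an audio file straight to Gemini using Google’s google-generativeai Python SDK, with no transcription step. The file name, prompt text, and model string are placeholders rather than our production setup.

```python
# Minimal sketch: send an audio file directly to a multimodal Gemini model.
# Assumes the google-generativeai SDK and an API key in GOOGLE_API_KEY;
# the prompt here is illustrative, not our actual prompt.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

audio_file = genai.upload_file("episode_chunk.mp3")   # upload once, reference by handle
model = genai.GenerativeModel("gemini-2.0-flash")

response = model.generate_content([
    "Listen closely to this podcast segment and take detailed notes: "
    "key arguments, who said what, and the best verbatim quotes.",
    audio_file,
])
print(response.text)
```

Because the model is just a string and a client library here, swapping in a different multimodal model later is a relatively contained change.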
Even though Gemini can ingest audio natively, it’s far from a perfectly attentive listener. We found that the model often pays extra-close attention to the beginning and end of a long audio clip, and sometimes leaves out important details from the middle—kind of like a distracted human listener! That’s why we came up with the 20-minute chunking method we described above: with these smaller audio files, Gemini stays on task.
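Splitting the audio itself is the easy part. Here’s one way to do it with pydub (our choice for illustration; re:verb may slice episodes differently):

```python
# A rough sketch of slicing an episode into 20-minute chunks with pydub.
# (pydub is an assumption on our part; it also needs ffmpeg installed for mp3.)
from pydub import AudioSegment

CHUNK_MS = 20 * 60 * 1000  # 20 minutes, in milliseconds

episode = AudioSegment.from_file("episode.mp3")
chunk_paths = []
for i, start in enumerate(range(0, len(episode), CHUNK_MS)):
    path = f"chunk_{i:02d}.mp3"
    episode[start:start + CHUNK_MS].export(path, format="mp3")
    chunk_paths.append(path)
```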
Finally, the variety of podcasts we’ve included in this first round of re:verb—from straightforward reporting shows like The Daily to casual hangouts like the Kelce Brothers’ podcast—made it challenging to get re:verb’s newsletter-writing voice just right. Language models are easy to sway, and we had to work hard to keep Gemini from adopting the voice and even the point of view of the episode it’s synthesizing.
What this all means
Multimodal models let us process and analyze video and audio in ways we never could before, which means broadcasters and podcasters can unlock and reuse their archives in new ways—or give each segment new life by synthesizing, summarizing, or repackaging it.
We’re thinking about how this all changes the future of audio and video journalism—but we’re also thinking about how to make re:verb better, such as adding more podcasts, and perhaps designing a weekly episode summary for less podcast-hungry users.
We’d love for you to try it out. Sign up at reverb.email! If you’ve got ideas, feedback, or want to work with us, you can reach us at [email protected].