Trying Out Google's Illuminate AI: From Paper to Podcast - July 20, 2024

Background

I recently got into the beta for Google's new AI tool, Illuminate. It takes a paper (currently arXiv only) and turns it into a 5-ish minute audio file in the style of a podcast. Now, I wouldn't consider myself an avid paper reader, but I do enjoy listening to podcasts, so I thought I'd give it a try.

Google's Illuminate AI Home Page

Experience

The home page already has a moderate selection of pre-generated podcasts, mostly made up of papers on machine learning and AI, plus a few books like The Great Gatsby and Alice's Adventures in Wonderland.

Nonetheless, I wanted to try my own selections out, which you can do through the Generate tab.

Generating is a pretty simple process: you give Illuminate your PDF's URL, and after a couple of minutes your audio file is ready. You also get 5 submissions a day, which was fine for me just trying it out.
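
Since the input is a PDF URL, here is a small helper I'd use to turn an arXiv abstract link into its PDF link. This is just a sketch: the `/abs/` and `/pdf/` paths are arXiv's own URL layout, but `arxiv_pdf_url` is my own hypothetical function, the paper ID in the usage example is a placeholder, and I'm assuming (not confirming) that Illuminate wants the `/pdf/` form.

```python
from urllib.parse import urlparse


def arxiv_pdf_url(link: str) -> str:
    """Turn an arXiv abstract (or PDF) link into the canonical PDF link."""
    parsed = urlparse(link)
    if parsed.netloc not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arXiv link: {link}")

    # Abstract pages live under /abs/<id>, PDFs under /pdf/<id>.
    kind, _, paper_id = parsed.path.lstrip("/").partition("/")
    if kind not in ("abs", "pdf") or not paper_id:
        raise ValueError(f"unrecognized arXiv path: {parsed.path}")

    # Older PDF links carry a ".pdf" suffix; drop it for a consistent result.
    if paper_id.endswith(".pdf"):
        paper_id = paper_id[: -len(".pdf")]

    return f"https://arxiv.org/pdf/{paper_id}"


# Placeholder ID, not one of the papers from this post.
print(arxiv_pdf_url("https://arxiv.org/abs/1234.56789"))
```

Anything that isn't an arXiv link raises a `ValueError`, which matches the current arXiv-only limitation.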

Google's Illuminate AI Generate Page

Results

Paper 1: Computer Science Paper

C++ design patterns for low-latency applications including high-frequency trading
Generated audio file for Paper 1

Paper 2: Biology Paper

BRCA Gene Mutations in dbSNP: A Visual Exploration of Genetic Variants
Generated audio file for Paper 2

Paper 3: Physics Paper

Attaching Theories of Consciousness to Bohmian Quantum Mechanics
Generated audio file for Paper 3

Thoughts

Audio Style

I really enjoyed the podcast style Google chose for these summaries. The way the two hosts go back and forth and explain concepts to each other is quite engaging. It almost feels as if they are explaining things to you directly, which is a big part of why I enjoy listening to (human) podcasts so much.

However, there are sometimes weird pauses, odd pronunciations (like "Cplus" in Paper 1), and vocal inflections that are a bit off. Still, I think these are minor issues that don't really hurt the listening experience or detract from the content.

Content

The content of the summaries was pretty good, and I felt they covered the main points of the papers while explaining them in a way that was easy to understand. This would probably be a great tool for people who want to stay up to date with the latest research but don't have the time to read papers daily. In this way it feels reminiscent of products like Headway and Blinkist, but on demand.

That said, the summaries are quite high level and don't go into nearly as much detail as the papers themselves. This is to be expected given the time constraints, but I think it would be nice to have a bit more depth.

User Experience

Generating the audio is pretty straightforward so no complaints there. My main feedback here would be that I wish you could upload your own PDFs instead of having to provide an arXiv link. Not all the papers I read are on arXiv and limiting it to only papers means books, blog posts, etc. can't be summarized. I think this is a pretty simple thing to implement given they already have the examples of books on the home page.

Another element that would be nice to have control over is the length of the output. 5 minutes is quite short, especially for longer papers. I think being able to choose the length would allow for more detailed discussions, but I'm also not sure how this would work given context lengths and however they implemented the summarization.
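
To put rough numbers on how tight 5 minutes is, here is a back-of-the-envelope sketch. Everything in it is an assumption on my part: the 150 words-per-minute speaking rate is a common rule of thumb for conversational speech, the 9,000-word paper is a made-up example, and none of this reflects how Illuminate actually budgets its scripts.

```python
WORDS_PER_MINUTE = 150  # assumed conversational speaking rate, not measured


def script_word_budget(minutes: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Approximate number of spoken words that fit in an episode of this length."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return round(minutes * wpm)


def compression_ratio(paper_words: int, minutes: float, wpm: int = WORDS_PER_MINUTE) -> float:
    """How many words of paper each spoken word has to stand in for."""
    return paper_words / script_word_budget(minutes, wpm)


paper_words = 9_000  # hypothetical paper length
for minutes in (5, 15):
    budget = script_word_budget(minutes)
    ratio = compression_ratio(paper_words, minutes)
    print(f"{minutes} min: about {budget} words, {ratio:.0f}x compression")
```

Under these assumptions, a 5-minute episode has room for about 750 words, so a 9,000-word paper gets squeezed roughly 12x, while a 15-minute episode would only need about 4x. That is the kind of headroom a length setting could buy.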

Future Work

Seeing how natural the voices are makes me want this level of quality in interactions with current LLMs. I think it would be really cool to have a conversational AI that could explain any concept to you in a podcast style. Current LLMs seem to ramble on too much, or explain things in a way that's suited for text output, not a conversation. As the processing power and inference speed of these models increase, I think this is a very real possibility.

Similarly, I think it would be cool to have a feature where you could ask questions about the paper and have the AI answer them. Maybe this could be text prompts that generate new audio files with the answer included. Or maybe it could be a live chat feature where you ask questions and hear the answers live. Regardless of the means, this would make it far more of a learning tool than an afternoon pastime.

Overall, I think this tool is a really cool beta and I'm excited to see if Google takes it further. Or doesn't :(