SpeechPrune: Context-aware Token Pruning
for Speech Information Retrieval

Yueqian Lin1*, Yuzhe Fu1*, Jingyang Zhang1, Yudong Liu1
Jianyi Zhang1, Jingwei Sun1, Hai "Helen" Li1, Yiran Chen1
*Equal Contribution
1Duke University

Abstract

We introduce Speech Information Retrieval (SIR), a new long-context task for Speech Large Language Models (Speech LLMs), and present SPIRAL, a 1,012-sample benchmark that tests a model's ability to extract critical details from spoken inputs of approximately 90 seconds. While current Speech LLMs excel at short-form tasks, they struggle with the computational and representational demands of longer audio sequences. To address this limitation, we propose SpeechPrune, a training-free token pruning strategy that uses speech-text similarity and approximated attention scores to efficiently discard irrelevant tokens. On SPIRAL, at a pruning rate of 20%, SpeechPrune improves accuracy by 29% over the original model and by up to 47% over the random pruning model. SpeechPrune maintains network performance even at a pruning rate of 80%. This approach highlights the potential of token-level pruning for efficient and scalable long-form speech understanding.

Method

We propose a training-free token pruning strategy, SpeechPrune, that uses speech-text similarity and approximated attention scores to efficiently prune irrelevant tokens.
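
The similarity-based part of this idea can be illustrated with a minimal sketch. It assumes that speech tokens and the text query tokens have already been embedded into a shared vector space. The helper name `prune_by_similarity`, the max-over-text-tokens scoring rule, and the `keep_ratio` parameter are illustrative assumptions rather than the authors' implementation, and the approximated-attention stage is not shown.

```python
import numpy as np

def prune_by_similarity(speech_emb, text_emb, keep_ratio):
    """Hypothetical helper: keep the speech tokens most similar to the text query."""
    # L2-normalize rows so that dot products become cosine similarities.
    s = speech_emb / np.linalg.norm(speech_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = s @ t.T                       # shape: (num_speech_tokens, num_text_tokens)
    scores = sim.max(axis=1)            # assumed rule: best text match per speech token
    k = max(1, int(round(keep_ratio * len(scores))))
    # Take the k highest-scoring tokens, then sort indices to restore temporal order.
    keep = np.sort(np.argsort(scores)[-k:])
    return keep, speech_emb[keep]

# Toy data standing in for real embeddings (10 speech tokens, 3 text tokens, dim 8).
rng = np.random.default_rng(0)
speech = rng.normal(size=(10, 8))
text = rng.normal(size=(3, 8))
kept_idx, kept = prune_by_similarity(speech, text, keep_ratio=0.8)  # pruning rate 0.2
```

Here a pruning rate of 0.2 corresponds to `keep_ratio=0.8`; restoring temporal order after selection is a design choice of this sketch so that the surviving tokens still form a valid sequence.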

Results

We evaluate the performance of SpeechPrune on the SPIRAL dataset. The main results are shown below.

Method     PR    TF ↓    PT ↓   TM ↓    SA ↓   SPIRAL ↑   SPIRAL-H ↑
Original   -     12.2    779    13.40   0.19   60.38%     0%
RAP        0.2   10.06   662    13.32   0.15   42.49%     21.45%
RAC                                            65.71%     48.13%
Ours                                           89.23%     81.64%
RAP        0.4   7.93    511    13.24   0.11   42.89%     22.19%
RAC                                            62.45%     41.90%
Ours                                           85.97%     76.43%
RAP        0.6   5.79    419    13.17   0.07   42.39%     21.45%
RAC                                            58.20%     35.41%
Ours                                           75.89%     63.77%
RAP        0.8   3.66    278    13.09   0.04   45.26%     23.19%
RAC                                            55.83%     33.67%
Ours                                           62.45%     46.15%
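
As a quick arithmetic check, the accuracy gains at a pruning rate of 0.2 can be recomputed from the SPIRAL column of the table. The variable names below are illustrative; the numbers are copied from the Original, RAP, and Ours rows. The differences are absolute (percentage-point) gains and come out to roughly 29 and 47.

```python
# SPIRAL accuracy (%) copied from the results table.
original = 60.38   # "Original" row (no pruning)
rap = 42.49        # "RAP" row at PR = 0.2
ours = 89.23       # "Ours" row at PR = 0.2

# Absolute differences in accuracy, in percentage points.
gain_vs_original = round(ours - original, 2)
gain_vs_rap = round(ours - rap, 2)
print(gain_vs_original, gain_vs_rap)
```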

Samples from SPIRAL

Example Entry 1 - Single-Speaker Lecture

Metadata:
main topic: "Performing Arts"
subtopic: "Opera"
transcript type: "lecture"
id: "lecture_3"

Audio samples: Full Audio, Key Sentence

Transcript and test question:
spk_1: "Opera, as we know, is a rich blend of music, drama, and visual arts, all coming together on one stage. It's fascinating to think about how it evolved over centuries. You might be familiar with the likes of Verdi and Wagner, but did you know that the world's longest opera lasts about 18 hours? That's Wagner's 'Der Ring des Nibelungen'. But let's dive into something a bit more recent and, uh, perhaps less known. Just last month, an opera called 'The Arctic Light' premiered in Norway, and it featured a choir made up entirely of holograms. This innovative approach not only saves on production costs but also allows for performances in places where it's logistically challenging to gather large numbers of singers. It's a fascinating time for opera, as technology begins to play a more significant role in how stories are told. Now, while traditionalists might resist these changes, many argue that they open up new possibilities for creative expression. And, um, it's this blend of history and modernity that makes opera such an exciting field within the performing arts today."
Q: "What innovative feature did the opera 'The Arctic Light' showcase in its performance?"
A. A holographic choir
B. A 24-hour performance
C. A rotating stage
D. An all-animal cast
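
An entry like this one can be represented as a small record. The sketch below is only an illustration: the class name `SpiralEntry`, its field names, and the `format_prompt` helper are assumptions that mirror the labels shown on this page, and the released dataset's actual schema may differ. The transcript field holds a one-sentence excerpt to keep the example short.

```python
from dataclasses import dataclass

@dataclass
class SpiralEntry:
    # Hypothetical fields mirroring the labels on this page, not an official schema.
    main_topic: str
    subtopic: str
    transcript_type: str
    entry_id: str
    transcript: str
    question: str
    choices: dict  # option letter -> option text

    def format_prompt(self) -> str:
        """Render the question and its options as a multiple-choice prompt."""
        lines = [self.question]
        lines += [f"{letter}. {text}" for letter, text in self.choices.items()]
        return "\n".join(lines)

entry = SpiralEntry(
    main_topic="Performing Arts",
    subtopic="Opera",
    transcript_type="lecture",
    entry_id="lecture_3",
    # Excerpt only; the full transcript appears in the entry above.
    transcript=("Just last month, an opera called 'The Arctic Light' premiered in "
                "Norway, and it featured a choir made up entirely of holograms."),
    question=("What innovative feature did the opera 'The Arctic Light' "
              "showcase in its performance?"),
    choices={
        "A": "A holographic choir",
        "B": "A 24-hour performance",
        "C": "A rotating stage",
        "D": "An all-animal cast",
    },
)
prompt = entry.format_prompt()
```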

Example Entry 2 - Multi-Speaker Meeting

Metadata:
main topic: "Pharmaceutical"
subtopic: "Pharmacovigilance"
transcript type: "meeting"
id: "meeting_3"

Audio samples: Full Audio, Key Sentence

Transcript and test question:
spk_1: "Let's dive into today's topic on pharmacovigilance. I recently attended a conference where they mentioned that the FDA receives over two million adverse drug event reports each year."

spk_2: "That's a staggering number! But, how do they manage to analyze all those reports effectively? I mean, it must require a lot of resources and technology."

spk_3: "Exactly, and actually, they use a system called FAERS, the FDA Adverse Event Reporting System. It's crucial for signal detection and helps identify potential safety concerns."

spk_4: "Interesting. Did you know that just last month, they implemented a new AI tool to improve the accuracy of these analyses? It's supposed to significantly cut down processing time."

spk_2: "Oh, I wasn't aware of that. It's fascinating how AI is being integrated into healthcare. Does anyone know if it's shown any results yet?"

spk_1: "Well, I spoke with someone from the FDA who mentioned that it's already reduced false positives by 15% since its implementation. It's promising but still in the early stages."

spk_3: "That's promising indeed. It goes to show that technology is really transforming pharmacovigilance. We should definitely keep an eye on how this develops."

spk_5: "Absolutely, and it might be worth considering how we can incorporate similar tools in our processes. Staying updated with these advancements is key for us."
Q: "What percentage reduction in false positives has the FDA's new AI tool achieved?"
A. 10%
B. 15%
C. 20%
D. 25%

Example Entry 3 - Multi-Speaker Conversation

Metadata:
main topic: "Environment"
subtopic: "Renewable Energy"
transcript type: "conversation"
id: "conversation_188"

Audio samples: Full Audio, Key Sentence

Transcript and test question:
spk_1: "Did you hear about the new solar farm in Nevada? It's supposed to be one of the largest in the country."

spk_2: "Yeah, I read something about that. But I'm more interested in the offshore wind projects. They seem to be gaining traction lately."

spk_1: "Offshore wind is fascinating. I actually saw a report that said a single turbine can power up to 1,500 homes. That's pretty impressive."

spk_3: "Speaking of impressive, did you know that Iceland generates about 99% of its electricity from renewable sources? That's mainly geothermal and hydropower."

spk_2: "Iceland is a great example. But, let's not forget that not all countries have such resources. It's a bit more challenging for places without natural geothermal activity."

spk_4: "True, but innovation is key. I met a researcher today who mentioned that in 2022, the cost of lithium-ion batteries dropped by 89% compared to a decade ago. That could really boost the adoption of renewable energy."

spk_1: "That's an incredible drop! Lower battery costs can definitely make renewable energy more accessible and practical."

spk_3: "And it could also help with energy storage solutions. Storing energy efficiently is half the battle in renewable energy."

spk_4: "Exactly, and with improved storage, we can better manage supply and demand. It's exciting to see where this will lead us in the next few years."
Q: "By what percentage did the cost of lithium-ion batteries drop in 2022 compared to a decade ago?"
A. 75%
B. 89%
C. 50%
D. 30%