Caching Strategies
Prompt:
Text to Generate: I don't really care what you call me. I've been a silent spectator, watching species evolve, empires rise and fall. But always remember, I am mighty and enduring.
| Approach | Cache Schedule (🟧 = Compute Attn Layer, 🟩 = Compute FFN Layer, ⬜ = Cached Layer) | Generated Audio | Inference Time (s) |
|---|---|---|---|
| No Cache | ![]() | | 6.71 |
| Original Cache Schedule | ![]() | | 5.09 |
| Cache Attn Only | ![]() | | 5.64 |
| Unified Schedule (Attn Base) | ![]() | | 5.15 |
| Cache FFN Only | ![]() | | 6.24 |
| Unified Schedule (FFN Base) | ![]() | | 4.64 |
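
To make the comparison easier to read, the inference times can be converted into speedup factors relative to the No Cache run. The snippet below is only a small sketch of that arithmetic: the timings are copied from the table, while the `speedups` helper and its `baseline` parameter are illustrative names that do not come from the original page.

```python
# Inference times in seconds, copied from the Caching Strategies table.
times = {
    "No Cache": 6.71,
    "Original Cache Schedule": 5.09,
    "Cache Attn Only": 5.64,
    "Unified Schedule (Attn Base)": 5.15,
    "Cache FFN Only": 6.24,
    "Unified Schedule (FFN Base)": 4.64,
}


def speedups(times, baseline="No Cache"):
    """Return baseline_time / time for every approach (hypothetical helper)."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


if __name__ == "__main__":
    for name, factor in speedups(times).items():
        print(f"{name:30s} {times[name]:5.2f} s  {factor:4.2f}x")
```

On these single-utterance timings, the Unified Schedule (FFN Base) row has the lowest inference time and therefore the largest speedup factor.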
Caching Thresholds
Prompts and texts are taken from the Seed-TTS demo page.
| Prompt | Text | 32 NFE (No Cache) | 32 NFE (α = 0.15) | 32 NFE (α = 0.25) |
|---|---|---|---|---|
| | I don't really care what you call me. I've been a silent spectator, watching species evolve, empires rise and fall. But always remember, I am mighty and enduring. | Inference Time = 6.71 s | Inference Time = 5.22 s | Inference Time = 4.14 s |
| | Perhaps they are driven by the delicious blend of flavors, or it could be the appealing visual presentation. At the end of the day, our choices in food reflect our personal preferences and sometimes, even our lifestyle or belief system. | Inference Time = 9.85 s | Inference Time = 8.73 s | Inference Time = 7.58 s |
| | Your safety and the pack's reputation are at stake. Your bravery is admirable, but sometimes bravery is knowing when to retreat. Please, consider returning with me. We can work out a plan, but only if you're willing to listen. | Inference Time = 10.41 s | Inference Time = 9.26 s | Inference Time = 7.62 s |

| Prompt | Text | 16 NFE (No Cache) | 16 NFE (α = 0.30) | 16 NFE (α = 0.50) |
|---|---|---|---|---|
| | I don't really care what you call me. I've been a silent spectator, watching species evolve, empires rise and fall. But always remember, I am mighty and enduring. | Inference Time = 3.41 s | Inference Time = 2.65 s | Inference Time = 1.92 s |
| | Perhaps they are driven by the delicious blend of flavors, or it could be the appealing visual presentation. At the end of the day, our choices in food reflect our personal preferences and sometimes, even our lifestyle or belief system. | Inference Time = 5.22 s | Inference Time = 4.54 s | Inference Time = 3.64 s |
| | Your safety and the pack's reputation are at stake. Your bravery is admirable, but sometimes bravery is knowing when to retreat. Please, consider returning with me. We can work out a plan, but only if you're willing to listen. | Inference Time = 5.15 s | Inference Time = 4.51 s | Inference Time = 3.64 s |
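
One way to summarize the threshold tables is the average relative reduction in inference time for each α, taken over the three utterances. The sketch below does that arithmetic with the timings listed above. Averaging over utterances is a summarization choice made here for illustration, not a metric reported on the page, and `mean_reduction` is a hypothetical helper name.

```python
from statistics import mean

# Timings in seconds, one entry per utterance, copied from the tables above.
# Keys other than "no_cache" are the caching thresholds (alpha).
timings = {
    "32 NFE": {
        "no_cache": [6.71, 9.85, 10.41],
        0.15: [5.22, 8.73, 9.26],
        0.25: [4.14, 7.58, 7.62],
    },
    "16 NFE": {
        "no_cache": [3.41, 5.22, 5.15],
        0.30: [2.65, 4.54, 4.51],
        0.50: [1.92, 3.64, 3.64],
    },
}


def mean_reduction(baseline, cached):
    """Average of (1 - cached / baseline) across utterances (hypothetical helper)."""
    return mean(1 - c / b for b, c in zip(baseline, cached))


if __name__ == "__main__":
    for setting, runs in timings.items():
        for alpha, cached in runs.items():
            if alpha == "no_cache":
                continue
            saved = mean_reduction(runs["no_cache"], cached)
            print(f"{setting}, alpha = {alpha:.2f}: {saved:.1%} less time on average")
```

In both NFE settings, the larger threshold gives the larger average reduction on these three utterances.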
Ablation of Caching Steps
Prompts and texts are those chosen for the user study.
| Prompt | Text | 32 NFE (α = 0.15) | 24 NFE (No Cache) |
|---|---|---|---|
| | The album received many good reviews and entered the charts at high positions. | Inference Time = 1.98 s | Inference Time = 1.88 s |
| | The area was swirling in dust so intense that it hid the moon from view. | Inference Time = 1.62 s | Inference Time = 1.57 s |
| | Real schools, secondary schools giving a general practical education. | Inference Time = 1.80 s | Inference Time = 1.62 s |

| Prompt | Text | 16 NFE (α = 0.30) | 12 NFE (No Cache) |
|---|---|---|---|
| | He enters the hotel room but finds that everyone already escaped. | Inference Time = 1.05 s | Inference Time = 0.98 s |
| | He did find it, soon after dawn, and not far from the sand pits. | Inference Time = 0.92 s | Inference Time = 0.84 s |
| | The beam was bent down, perpendicular to the magnetic field. | Inference Time = 1.08 s | Inference Time = 1.04 s |