Sound demos for VibSpeech

1. Overall Performance of VibSpeech

These audio samples are recovered by VibSpeech from captured narrowband vibration signals. The experimental setup is described in Section 7.1.

Raw (NB) refers to the raw narrowband signals derived from mmWave signals.
Reconstructed (WB) refers to the audio reconstructed by VibSpeech.
Ground-Truth refers to the original audio played by the loudspeaker.

Raw (NB) Reconstructed (WB) Ground-Truth Text
#1 Go Do You Hear
#2 We Are Glad To Welcome His Gospel
#3 At This Moment The Whole Soul Of The Old Man Seemed Centred In His Eyes Which Became Bloodshot
#4 One Minute A Voice Said

2. With/Without Vocal-tract Features (Speaker Embedding) of the Victim

These samples demonstrate the performance of VibSpeech when the attacker does or does not have a short utterance from the victim for speaker embedding extraction.

Raw (NB) refers to the raw narrowband signals derived from mmWave signals.
Reconstructed (#0) refers to the audio reconstructed without speaker embeddings of Spk-A (female).
Reconstructed (#1) refers to the audio reconstructed with speaker embeddings of Spk-A. Audio for speaker embedding extraction: Audio#1
Reconstructed (#2) refers to the audio reconstructed with speaker embeddings of Spk-A. Audio for speaker embedding extraction: Audio#2
Ground-Truth refers to the original audio played by the loudspeaker.

Raw (NB) Reconstructed (#0) Reconstructed (#1) Reconstructed (#2) Ground-Truth Text
Spk-A It's Delightful To Hear It In A London Theatre

Raw (NB) refers to the raw narrowband signals derived from mmWave signals.
Reconstructed (#0) refers to the audio reconstructed without speaker embeddings of Spk-B (male).
Reconstructed (#1) refers to the audio reconstructed with speaker embeddings of Spk-B. Audio for speaker embedding extraction: Audio#1
Reconstructed (#2) refers to the audio reconstructed with speaker embeddings of Spk-B. Audio for speaker embedding extraction: Audio#2
Ground-Truth refers to the original audio played by the loudspeaker.

Raw (NB) Reconstructed (#0) Reconstructed (#1) Reconstructed (#2) Ground-Truth Text
Spk-B There Was Nothing Said About The Sort Of Accommodation Which Would Be Provided

3. Extension to IMU-based Vibration Side Channel

These audio samples are recovered by applying VibSpeech to IMU-measured vibration signals (Section 8). The experimental setup is shown in Figure 23.

Raw (NB) Reconstructed (WB) Ground-Truth Text
#1 Go Do You Hear
#2 We Are Glad To Welcome His Gospel
#3 At This Moment The Whole Soul Of The Old Man Seemed Centred In His Eyes Which Became Bloodshot
#4 It Is Obviously Unnecessary For Us