Watermark Stress Test

This is an educational demonstration of state-of-the-art audio watermark performance under codec processing. Upload any (speech) audio file to test watermark performance before and after processing with a low-bitrate neural codec [1].

For this demo, we use the AudioSeal [2] watermark, which is well documented, open source, and provides state-of-the-art localized detection performance. Both the watermark and codec operate at a 16 kHz sample rate, so they only touch content below the 8 kHz Nyquist limit; all frequencies above 8 kHz are left unaltered. To ensure consistent watermark performance, we normalize audio to -16 LUFS and downmix to mono prior to embedding.
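The preprocessing described above (downmix to mono, then scale to a target level) can be sketched as follows. This is a minimal illustration and not the demo's actual code: the function names are hypothetical, and plain RMS level in dBFS is used as a simplified stand-in for a true LUFS measurement, which would additionally apply the K-weighting and gating defined in ITU-R BS.1770.

```python
import math

def downmix_to_mono(channels):
    """Average equal-length channels sample by sample (hypothetical helper)."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0).

    Assumption: a simplified stand-in for LUFS, with no K-weighting or gating.
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def normalize_level(samples, target_db=-16.0):
    """Apply one constant gain so the RMS level matches target_db."""
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # silence: no gain can reach the target
    gain = 10.0 ** ((target_db - current) / 20.0)
    return [s * gain for s in samples]

# One second of a 440 Hz stereo tone at the 16 kHz rate used by the demo.
sr = 16000
left = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
right = [0.25 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]

mono = downmix_to_mono([left, right])
normalized = normalize_level(mono)
```

Applying a single constant gain keeps the signal's dynamics intact, so only the overall level the watermark embedder sees changes. For real loudness normalization, a BS.1770-compliant meter would replace `rms_dbfs`.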

[1] https://github.com/jasonppy/VoiceCraft
[2] https://github.com/facebookresearch/audioseal

The citation info for our corresponding paper is:

@inproceedings{deepwatermarksareshallow,
    author = {Patrick O'Reilly and Zeyu Jin and Jiaqi Su and Bryan Pardo},
    title = {Deep Audio Watermarks are Shallow: Limitations of Post-Hoc Watermarking Techniques for Speech},
    booktitle = {ICLR Workshop on GenAI Watermarking},
    year = {2025}
}

For the VoiceCraft codec:

@article{voicecraft,
    author={Puyuan Peng and Po-Yao Huang and Daniel Li and Abdelrahman Mohamed and David Harwath},
    year={2024},
    title={VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild},
    journal={arXiv preprint arXiv:2403.16973v1},
}

And for the AudioSeal watermark:

@article{audioseal,
  title={Proactive Detection of Voice Cloning with Localized Watermarking},
  author={San Roman, Robin and Fernandez, Pierre and Elsahar, Hady and D{\'e}fossez, Alexandre and Furon, Teddy and Tran, Tuan},
  journal={International Conference on Machine Learning (ICML)},
  year={2024}
}