Audio samples for "SETransformer: Transformer Speech enhancement"

Authors: WeiWei Yu, Jian Zhou, HuaBin Wang, Liang Tao, Hon Keung Kwan
Abstract: Recurrent neural networks (RNNs), such as long short-term memory (LSTM) networks and gated recurrent units (GRUs), are important architectures in speech enhancement because of their powerful sequence learning. However, they suffer from a serious drawback: their sequential computation cannot be parallelized. Therefore, many non-recurrent sequence models built on convolution and attention operations have recently been proposed. Notably, models with multi-head attention, such as the Transformer, have proven highly effective on many natural language processing (NLP) tasks. However, they are non-trivial to apply to speech enhancement because they rely heavily on position embeddings, which require considerable design effort. In this paper, we propose the SETransformer, which combines the advantages of RNNs and the multi-head attention mechanism while avoiding their respective drawbacks. Experiments show that, compared with the standard Transformer and the LSTM baseline, the proposed attention approach consistently achieves better speech quality (perceptual evaluation of speech quality, PESQ) and intelligibility (short-time objective intelligibility, STOI) under unseen noise conditions.
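Why position embeddings matter can be made concrete. Self-attention with no positional information is permutation-equivariant: shuffling the input frames only shuffles the outputs, so the layer cannot tell frame order. A recurrent layer placed in front of attention breaks this symmetry, because each recurrent state depends on the frames that came before it. The sketch below illustrates that general principle only; it is not the SETransformer architecture, whose details are not given on this page. It assumes single-head attention with identity query/key/value projections and a toy linear recurrence standing in for an LSTM or GRU layer; the helper names (`self_attention`, `simple_rnn`, `permute`, `close`) and the `decay` parameter are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product self-attention.

    Hypothetical simplification: queries, keys and values are the input
    frames themselves (identity projections), and no position embeddings
    are added.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, seq)) for i in range(d)])
    return out


def simple_rnn(seq, decay=0.5):
    """Toy recurrence h_t = decay * h_{t-1} + x_t (stand-in for an LSTM/GRU).

    Each state mixes in all earlier frames, so it depends on their order.
    """
    h = [0.0] * len(seq[0])
    states = []
    for x in seq:
        h = [decay * hi + xi for hi, xi in zip(h, x)]
        states.append(h)
    return states


def permute(seq, order):
    """Reorder frames according to the index list `order`."""
    return [seq[i] for i in order]


def close(a, b, tol=1e-9):
    """True if two frame sequences match element-wise within `tol`."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


random.seed(0)
# Five random 4-dimensional "feature frames" (hypothetical data).
frames = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)]
order = [4, 2, 0, 3, 1]

# Attention alone: attending over shuffled frames equals shuffling the outputs.
attn_equivariant = close(
    self_attention(permute(frames, order)),
    permute(self_attention(frames), order),
)

# Recurrence first: the outputs change beyond a mere reshuffle,
# so frame-order information reaches the attention layer.
rnn_attn_equivariant = close(
    self_attention(simple_rnn(permute(frames, order))),
    permute(self_attention(simple_rnn(frames)), order),
)

print(attn_equivariant, rnn_attn_equivariant)
```

The first check should hold (attention alone is order-blind) and the second should not (the recurrence injects order), which is the property that lets a recurrent component replace hand-designed position embeddings in principle.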

Comparison of the clean reference, the noisy input, and the outputs of the LSTM, Transformer, and SETransformer models:

Each cell (A to E) is an audio clip of that sample.

No.  Sample       Clean  Noisy  LSTM  Transformer  SETransformer
1.   test001.wav  A      B      C     D            E
2.   test002.wav  A      B      C     D            E
3.   test003.wav  A      B      C     D            E
4.   test004.wav  A      B      C     D            E
5.   test005.wav  A      B      C     D            E
6.   test006.wav  A      B      C     D            E