35 speakers, including 17 native and 18 non-native speakers
The MODALITY corpus consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. Because every utterance is labelled, the corpus can be used to develop an audio-visual speech recognition (AVSR) system. The recordings made in noisy conditions can be used to test the robustness of speech recognition systems.
The MODALITY audio-visual corpus for multimodal automatic speech recognition. Copyright © Multimedia Systems Department, Gdańsk University of Technology.
Distribution and usage of this corpus are allowed under the following conditions:
Stereoscopic video capture, enabling depth images for further analysis
Over 30 hours of high-quality audio-visual material
Audio gathered from a microphone array
Includes recordings in clean and noisy conditions
Includes isolated commands and continuous sentences
Contains hand-made label files as ground truth for AVSR algorithms