What is the MODALITY Corpus?

The MODALITY corpus consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. Because every utterance is labelled, the corpus can be used to develop audio-visual speech recognition (AVSR) systems. The recordings made in noisy conditions can be used to test the robustness of speech recognition systems.

License

The MODALITY audio-visual corpus for multimodal automatic speech recognition. Copyright © Multimedia Systems Department, Gdańsk University of Technology.

Distribution and usage of this corpus are allowed under the following conditions:

  1. The corpus is provided as it is. The authors do not warrant that the corpus will be free from errors or will be suitable for any particular purpose.
  2. The authors of the corpus are not responsible for any direct or indirect problems that may be caused to the user of this corpus.
  3. The use of the corpus is limited to research and educational purposes only.
  4. Any work (e.g., journal articles, technical reports, conference papers, etc.) resulting from the use of the MODALITY corpus must cite the following papers:

    Czyzewski, A., Kostek, B., Bratoszewski, P. et al. J Intell Inf Syst (2017) 49: 167. https://doi.org/10.1007/s10844-016-0438-z

    Jachimski, D. & Czyżewski, A. A comparative study of English viseme recognition methods and algorithms. Multimed Tools Appl (2018) 77: 16495. https://doi.org/10.1007/s11042-017-5217-5

    Kawaler, M. & Czyżewski, A. Speech database including facial expressions recorded with the Face Motion Capture system. J Intell Inf Syst (2019) 53: 381. https://doi.org/10.1007/s10844-019-00547-y

Corpus Features

  1. 35 speakers: 17 native and 18 non-native speakers.
  2. 2.1 TB of high-quality audio-visual material.
  3. Full HD / 100 FPS video capture.
  4. 8 PCM audio streams gathered from a microphone array.
  5. Different recording conditions: recordings in both clean and noisy conditions.
  6. Commands/sentences: both isolated commands and continuous sentences.
  7. Labeled material: hand-made label files serve as ground truth for AVSR algorithms.
  8. Time-of-Flight camera recordings: depth images available for further analysis.

What do I need to get started?

  1. Fast connection
  2. A lot of disk space
  3. VLC Media Player
  4. Many ideas
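
To put the first two requirements in perspective, the following minimal sketch estimates how long the 2.1 TB listed under Corpus Features would take to transfer. It assumes decimal terabytes, a sustained link speed, and no protocol overhead; the function name `download_hours` and the two example link speeds are illustrative assumptions, not figures published for the corpus.

```python
def download_hours(size_tb: float, mbit_per_s: float) -> float:
    """Hours needed to transfer size_tb terabytes at mbit_per_s megabits per second."""
    size_bits = size_tb * 1e12 * 8            # decimal terabytes -> bits
    seconds = size_bits / (mbit_per_s * 1e6)  # megabits per second -> bits per second
    return seconds / 3600


# Hypothetical link speeds, chosen only for illustration.
for rate in (100, 1000):
    print(f"{rate} Mbit/s: {download_hours(2.1, rate):.1f} h")
```

Under these assumptions the full corpus takes roughly two days at 100 Mbit/s and under five hours at 1 Gbit/s, so downloading only the recordings needed for a given experiment may be the more practical route.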