Welcome to the South African Centre for Digital Language Resources website

South African Directory Enquiries (SADE) Name Corpus

Availability: Available for download

R0.00

Quick Overview

Audio and tagged orthographic transcriptions of South African names produced by first-language speakers of 4 languages: Afrikaans, English, isiZulu, Sesotho.

Audio and tagged orthographic transcriptions of South African names produced by first-language speakers of 4 languages: Afrikaans, English, isiZulu, Sesotho. Utterances are tagged with speaker language, word language, speaker identity, speaker gender, broad phonemic pronunciation and pronunciation modality ('intended language').

Additional Information

Contact persons and email addresses: Marelie H. Davel (marelie.davel@gmail.com)
Affiliations: North-West University, Molo Afrika Speech Technologies, IntSyst Labs CC
Licensing: Creative Commons Attribution 3.0 Unported License (CC BY 3.0)
Licensing details: http://creativecommons.org/licenses/by/3.0/
Names of principal developers: Charl van Heerden, Marelie Davel, Oluwapelumi Giwa, J.W.F. Thirion
Media type: Speech
ISLRN: No
Category: Multilingual speech corpora: annotated
Annotation details: All utterances are orthographically transcribed.
Citation information:
1. Jan W.F. Thirion, Charl van Heerden, Oluwapelumi Giwa and Marelie H. Davel, "The South African Directory Enquiries (SADE) corpus", Language Resources and Evaluation.
2. J.W.F. Thirion, M.H. Davel and E. Barnard, "Multilingual pronunciations of proper names in a Southern African corpus", in Proc. PRASA, Pretoria, November 2012, http://www.prasa.org/proceedings/2012/prasa2012-17.pdf.
Description of background and purpose: Audio and tagged orthographic transcriptions of South African names produced by first-language speakers of 4 languages: Afrikaans, English, isiZulu, Sesotho. The corpus was created to support research on multilingual name pronunciation, and to train acoustic models that support automatic speech recognition of multilingual names. Tags were partially auto-generated (see paper) and partially corrected by hand.
Distribution: RMA (www.rma.nwu.ac.za)
Source: Telephone recordings
Stratum (structure of data): No
Size (number of tokens/duration): 13h56m09s (40 speakers, each producing 400 utterances, 16,000 utterances in total)
File size: 494 MB (zipped)
Specialised software required: No
Maturity: Released
Verification and proof of quality: No
Compatibility with standards: Some informal guidelines (in-house)
Details of documentation available: Installation contains Readme document
Standards compliance details: N/A
Contributors: Anina Lambrechts, Bulelwa Matjene, Charl van Heerden, Etienne Barnard, J.W.F. Thirion, Marelie H. Davel, Nadia Barnard, Oluwapelumi Giwa, Sarina le Roux, and various language practitioners from 'The Translation World'.
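
The size figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the corpus distribution: it assumes the quoted duration covers all 16,000 utterances, and the derived averages are illustrative values that the listing itself does not state. The function name duration_to_seconds is hypothetical.

```python
# Figures quoted in the "Size" entry of this listing.
SPEAKERS = 40
UTTERANCES_PER_SPEAKER = 400
TOTAL_DURATION = "13h56m09s"


def duration_to_seconds(text: str) -> int:
    """Convert a string such as '13h56m09s' to whole seconds."""
    hours, rest = text.split("h")
    minutes, rest = rest.split("m")
    seconds = rest.rstrip("s")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds)


# Total utterance count implied by speakers x utterances per speaker.
total_utterances = SPEAKERS * UTTERANCES_PER_SPEAKER
total_seconds = duration_to_seconds(TOTAL_DURATION)

# Derived averages (assumption: the duration spans every utterance).
mean_utterance_seconds = total_seconds / total_utterances
mean_speaker_minutes = total_seconds / SPEAKERS / 60

print(f"utterances: {total_utterances}")
print(f"total audio: {total_seconds} s")
print(f"mean utterance length: {mean_utterance_seconds:.2f} s")
print(f"mean audio per speaker: {mean_speaker_minutes:.1f} min")
```

Under that assumption, the figures work out to roughly 3 seconds of audio per utterance and about 21 minutes per speaker, which is consistent with short spoken names recorded over the telephone.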
