Welcome to the South African Centre for Digital Language Resources website

NCHLT Afrikaans Named Entity Annotated Corpus

Availability: Available for download

Price: R0.00

Quick Overview

Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags.

Additional Information

Contact persons and email addresses: Roald Eiselen (Roald.Eiselen@nwu.ac.za)
Affiliations: North-West University, Centre for Text Technology (CTexT)
Licensing: Creative Commons Attribution 2.5 South Africa License
Licensing details: http://creativecommons.org/licenses/by/2.5/za/legalcode
Names of principal developers: Gerhard van Huyssteen, Martin Puttkammer, E.B. Trollip, J.C. Liversage, Roald Eiselen
Media type: Text
ISLRN: 063-007-581-338-4
Category: Monolingual text corpora: Annotated
Annotation details: Details provided in documentation.
Citation information: Eiselen, R. 2016. Government domain named entity recognition for South African languages. Proceedings of the 10th Language Resource and Evaluation Conference, Portorož, Slovenia.
Description of background and purpose: Named entity annotated data from the NCHLT Text Resource Development: Phase II Project, annotated with PERSON, LOCATION, ORGANISATION and MISCELLANEOUS tags. Each language contains at least 15,000 tokens categorised as one of the entity name classes.
Distribution: RMA: www.rma.nwu.ac.za
Source: Based on documents from the South African government domain crawled from gov.za websites and collected from various language units.
Stratum (structure of data): Details provided in documentation.
Size (number of tokens/duration): 25,881 annotated tokens (estimated 230,000 total tokens)
File size: 24.5 MB (zipped)
Specialised software required: N/A
Maturity: Released
Verification and proof of quality: Language identification on source files; cleanup on corpora. Manually verified by language expert.
Compatibility with standards: A common standard and fully compliant
Details of documentation available: Readme included; project report available on request.
Standards compliance details: No
Contributors: No
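
The listing above gives the entity classes and token counts but leaves the file layout to the bundled documentation. As a rough illustration only, the sketch below tallies entity-tagged tokens under an assumed layout: one token per line, a tab, then a tag such as B-ORG or I-ORG, with OUT marking tokens outside any entity. The tag names, the OUT label, the sample sentence and the function name count_entity_tokens are all hypothetical; check the Readme for the actual format before adapting it.

```python
from collections import Counter

# Hypothetical sample in an assumed "token<TAB>tag" layout; the real file
# layout is described in the corpus Readme and may differ.
SAMPLE = """\
Die\tOUT
Departement\tB-ORG
van\tI-ORG
Arbeid\tI-ORG
in\tOUT
Pretoria\tB-LOC
.\tOUT
"""


def count_entity_tokens(text, outside_tag="OUT"):
    """Count tokens per entity class, skipping tokens tagged as outside."""
    counts = Counter()
    total = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:  # blank lines separate sentences in this assumed layout
            continue
        token, tag = line.rsplit("\t", 1)
        total += 1
        if tag != outside_tag:
            # Drop an assumed B-/I- prefix so B-ORG and I-ORG count together.
            counts[tag.split("-", 1)[-1]] += 1
    return counts, total


counts, total = count_entity_tokens(SAMPLE)
annotated = sum(counts.values())
print(dict(counts), f"{annotated}/{total} tokens annotated")
```

Applied to the full corpus, a tally of this kind could be compared against the size given above: 25,881 annotated tokens out of an estimated 230,000, or roughly 11% of all tokens.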
