Explore - Welsh Digital Grid

CorCenCC

CorCenCC is a freely accessible digital collection (a ‘corpus’) of Welsh language samples gathered from real-life communication. You can explore CorCenCC to find out how people really use Welsh: for instance, how often a specific word is used, or which words are most frequent in specific kinds of communication. CorCenCC contains over 11 million words from written, spoken and electronic Welsh language sources, drawn from a range of genres, language varieties (regional and social) and contexts. Every word in the corpus is tagged with grammatical and semantic information (the latter relating to themes and topics), and each language excerpt is accompanied by information about where it comes from (e.g. text type, speaker location).