Corpora
represent the basic form of language resources and are therefore an
indispensable foundation for language technology research for each
natural language.
Croatian corpora
- Croatian
National Corpus, Zagreb (the largest Croatian corpus)
- Croatian-English parallel
corpus, Zagreb
- Croatian-Slovenian parallel
corpus, Zagreb-Ljubljana
- One-million corpus of the Croatian
literary language (compiled by M. Moguš, Institute of Linguistics,
Faculty of Philosophy, University of Zagreb)
- Corpus of texts by Croatian medieval,
Renaissance and Baroque Writers (Institute of Linguistics,
Faculty of Philosophy, University of Zagreb)
- Corpus of Croatian elementary- and
high-school textbooks (Department of Information Science, Faculty of
Philosophy, University of Zagreb)
- Intratext
collection of religious texts in Croatian
- The Croatian Conference of Bishops - Old Testament and New Testament
- Croatian
Language Online Repository (Institute of Croatian Language and
Linguistics)
- Silvije Strahimir
Kranjčević (comprehensive on-line collection)
- Croatian
Child Language Corpus (Child Language Data Exchange System - CHILDES)
- Kur'an
WWW as corpus
Available corpora of other
languages
Bosnian
Bulgarian
Czech
Danish
Dutch
English
- British
National Corpus, Oxford (the first national corpus ever made, 100
Mw);
Alternative search - VIEW:
Variation in English Words and Phrases
- Bank of
English, Birmingham (the largest corpus of English, over 400 Mw)
- American National Corpus
- Brown
corpus (the first modern, computer-readable, general, million-token
corpus, American English)
- LOB Corpus
(Lancaster-Oslo/Bergen corpus, the British copy of the Brown corpus)
- TEC Corpus (Translational
English Corpus)
- ICAME (International Computer
Archive of Modern and Medieval English), Bergen
- Oxford
Text Archive (Oxford archive of digital texts, not solely in
English)
- Croatian-English Parallel
Corpus, Zagreb
- English-Norwegian Parallel
Corpus, Oslo
- Slovenian-English
Parallel Corpus, Ljubljana (compiled within the ELAN Project)
- Corpus
of Spoken, Professional American English
- The York-Helsinki Parsed Corpus of Old English Poetry
- The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of
Old English
- Evrokorpus (corpus
of translations of European Union legislation)
- SVEZ-IJS
ACQUIS (Slovenian-English parallel corpus)
- SCRIBE (Spoken
Corpus of British English)
- European Parliament Proceedings Parallel Corpus
- Scottish Corpus of Text and Speech
- Croco
(German-English parallel corpus)
- CzEng (Czech-English
parallel corpus)
Estonian
Ethiopian languages
French
Gaelic languages
German
Greek
Finnish
Hebrew
Hungarian
Indian languages
Italian
Lithuanian
Malayan
Norwegian
Polish
- PELCRA (Polish
and English Language Corpora for Research and Applications)
- IPI PAN (Corpus of Polish)
Portuguese
Romanian
- MULTEXT-East:
multilingual text tools and corpora for Central and Eastern European
Languages
Russian
Serbian
Slovak
Slovenian
Sorbian
Spanish
Swedish
Turkish
Other corpora
Other lists of corpora