Institute of Linguistics
Faculty of Philosophy
University of Zagreb

Croatian National Corpus



Language technology tools

Tools for Croatian


Other tools

Text editors and converters

  • Emacs (fully programmable text editor)
  • 2XML (tool for conversion of HTML/RTF texts into XML)
  • Softleks (highly specialized text editor for compiling dictionaries)

Language resources markup

  • SGML (Standard Generalized Markup Language)
  • XML (Extensible Markup Language)
  • TEI (Text Encoding Initiative: home page)
  • TEI recommendations for corpus annotation
  • TEI Pizza Chef (online tool for building customized TEI DTDs)
  • CES (Corpus Encoding Standard, SGML)
  • XCES (Corpus Encoding Standard, XML)
  • TMX format specification
  • XT (XSLT processor by James Clark)
  • O'Reilly XML.com
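To illustrate what corpus markup in the spirit of TEI or XCES looks like in practice, here is a minimal sketch that reads a tagged sentence from XML using Python's standard `xml.etree.ElementTree`. The fragment is a simplified, hypothetical TEI-style example: real TEI P5 documents declare the TEI namespace and a full header, which are omitted here, and the `lemma`/`pos` values and tag names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-style fragment (no namespace, no teiHeader):
# one sentence <s> made of word tokens <w> with lemma and POS attributes.
SAMPLE = """<text>
  <body>
    <p>
      <s n="1">
        <w lemma="korpus" pos="N">Korpus</w>
        <w lemma="biti" pos="V">je</w>
        <w lemma="velik" pos="A">velik</w>
        <pc>.</pc>
      </s>
    </p>
  </body>
</text>"""


def read_tokens(xml_text):
    """Return (form, lemma, pos) triples for every <w> element, in document order."""
    root = ET.fromstring(xml_text)
    return [(w.text, w.get("lemma"), w.get("pos")) for w in root.iter("w")]


for form, lemma, pos in read_tokens(SAMPLE):
    print(f"{form}\t{lemma}\t{pos}")
```

Punctuation is kept in a separate `<pc>` element, so iterating over `<w>` alone yields only word tokens.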

Word lists, frequency dictionaries, concordances, text statistics and analysis
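The basic operations behind word lists, frequency dictionaries and concordances can be sketched in a few lines of Python. This is an illustrative sketch, not one of the tools linked here: the tokenizer is a naive assumption (lowercased runs of word characters), and `kwic` produces simple Key Word In Context lines with a fixed number of context tokens.

```python
import re
from collections import Counter

# Unicode-aware in Python 3, so Croatian letters such as č, ć, đ, š, ž count as word characters.
WORD = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercased word tokens, dropping punctuation."""
    return WORD.findall(text.lower())


def frequency_list(tokens):
    """(word, count) pairs, most frequent first; ties keep first-seen order."""
    return Counter(tokens).most_common()


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)


def kwic(tokens, keyword, width=3):
    """Key Word In Context lines with `width` tokens of context on each side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{token}] {right}")
    return lines


text = "Korpus je zbirka tekstova. Korpus se koristi u računalnoj lingvistici."
tokens = tokenize(text)
print(frequency_list(tokens)[:3])
print(round(type_token_ratio(tokens), 2))
for line in kwic(tokens, "korpus", width=2):
    print(line)
```

Right-aligning the left context keeps the keyword in a fixed column, which is the usual layout of a printed concordance.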

Taggers

  • TnT (Trigrams'n'Tags, statistical trigram tagger)
  • WinBrill (rule-based tagger for Windows)
  • QTAG (language independent tagger)
  • Czech Morphology (Johns Hopkins University)
  • CLAWS (tagger used for tagging the British National Corpus)
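Statistical taggers such as TnT model tag sequences with a hidden Markov model and choose the most probable tag sequence with Viterbi decoding. The sketch below is a deliberately simplified bigram version of that idea, written for illustration only. TnT itself uses a second-order (trigram) model with more careful smoothing and unknown-word handling. The add-one smoothing, the toy English training data and the `train`/`viterbi` names are assumptions of this example, and the input sentence must be non-empty.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the beginning of a sentence


def train(tagged_sents):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one (Laplace) smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit):
    """Most probable tag sequence under the bigram HMM (Viterbi decoding)."""
    words = [w.lower() for w in words]
    tags = list(emit)
    n_tags = len(tags)
    n_words = len({w for c in emit.values() for w in c}) + 1  # +1 slot for unseen words
    # best[i][tag] = (log score of best path ending in `tag` at position i, previous tag)
    best = [{t: (log_prob(trans[START], t, n_tags) + log_prob(emit[t], words[0], n_words), None)
             for t in tags}]
    for w in words[1:]:
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] + log_prob(trans[p], t, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_prob(emit[t], w, n_words), prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]


TRAINING = [
    [("the", "DET"), ("dog", "N"), ("barks", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
trans, emit = train(TRAINING)
print(viterbi(["The", "cat", "barks"], trans, emit))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied over a long sentence.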

Syntactic parsers

Treebanks

Semantic networks

Aligners

Machine and machine-aided translation

Development environments