Search
As of version 2.0, HNK is accessible and searchable with the Bonito client program.
Installation of Bonito
After downloading Bonito for your operating system (Windows, Linux/Unix/MacOS) from its official website (http://www.textforge.cz/download), you will have to install it. In Windows it is sufficient to unzip the distribution zip file into a directory (e.g. c:\Program Files\Bonito), make a shortcut on the desktop and start the program. To install and run Bonito under Linux/Unix/MacOS, you will need Tcl/Tk version 8.2 or higher.
Setting options in Bonito
When Bonito is started for the first time, the connection parameters and user account options should be set.
Setting connection parameters
Connection parameters are available through menu Manager -> Connection:
During the testing period of HNK v 2.0, provisional access is granted to all users (user name: gost; no password). If your computer is already connected to the Internet, pressing OK should establish the connection to the HNK server (a button with the name of the default subcorpus will then appear in the upper right corner of the Bonito window).
Setting user options
User options are available through menu Manager -> Options:
If you want to store the settings for the current program session only, confirm the changes with Apply. If you confirm with Save, the settings are remembered not only for this session but also for all subsequent program starts. New option settings take effect only the next time the program is started.
Selecting the corpus
Before any search, the desired corpus or subcorpus should be selected by pressing the button in the upper right corner of the main program window:
Simple queries
Query criteria, i.e. the keyword whose concordance you want to obtain, are entered in the New query row at the top of the main program window:
Pressing the arrow at the far right of the input row opens the list of previous queries for selection.
Examples of simple queries:
- glava
gives concordance of the word glava
- glavo luda
gives concordance with the two consecutive words glavo luda as keyword
- glava benzinskoga motora
gives concordance with the three consecutive words glava benzinskoga motora as keyword
Complex queries
Queries with regular expressions
- glav.
gives concordance of all words beginning with glav, with exactly one following character
- glav.*
gives concordance of all words beginning with glav, with 0 or more following characters
- glav.+
gives concordance of all words beginning with glav, with 1 or more following characters
- glav(a|e|i|u|o|om|ama)
gives concordance of all words beginning with glav and ending with any of the alternative endings in parentheses, i.e. glava, glave, glavi, glavu, glavo, glavom, glavama
- glava?
gives concordance of the word glava and of the form shorter by the final character, i.e. glav
- "glava" | "noga" | "ruka" or [word="glava"] | [word="noga"] | [word="ruka"]
gives concordance in which the keyword is any one of three distinct words: glava, noga, ruka
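The behaviour of these patterns can be tried out offline. The following is a minimal sketch in Python, not Bonito's own engine: it assumes that a pattern has to match a whole token (which is how the examples above read), and the word list and the helper name `matching` are made up for illustration.

```python
import re

# Hypothetical sample of word forms; not taken from HNK.
WORDS = ["glav", "glava", "glave", "glavi", "glavom", "glavama", "glavica", "noga"]

def matching(pattern, words=WORDS):
    """Return the words that the pattern matches as a whole token."""
    compiled = re.compile(pattern)
    return [w for w in words if compiled.fullmatch(w)]

print(matching("glav."))                    # exactly one character after glav
print(matching("glav.*"))                   # zero or more characters after glav
print(matching("glav.+"))                   # one or more characters after glav
print(matching("glav(a|e|i|u|o|om|ama)"))   # only the listed endings
print(matching("glava?"))                   # final a is optional
print(matching("glava|noga|ruka"))          # any one of the alternatives
```

With this sample, `glav.+` leaves out the bare form glav, while `glav.*` includes it.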
Table with examples of usage of basic regular expressions
| Expression | Description | Example | Expected results |
|---|---|---|---|
| . | dot denotes any single character | glav. | glava, glave, glavu |
| * | asterisk denotes zero or more repetitions of the preceding character | glav* | gla, glav, glavv, glavvv, glavvvv, ... |
| + | plus sign denotes one or more repetitions of the preceding character | glav+ | glav, glavv, glavvv, glavvvv, ... |
| {x} | braces denote exactly x repetitions of the preceding character | glav.{3} | glavama, glavica, glavicu, glavice, glavici, glavara, glavare, glavari, ... |
| {x,y} | a range in braces denotes between x and y repetitions of the preceding character | glav{1,4} | glav, glavv, glavvv, glavvvv |
| {x,y} | (as above) | .{5,10} | all strings between 5 and 10 characters in length |
| \| | vertical bar denotes the logical operation OR | "glava"\|"ruka" | glava, ruka |
| [ ] | brackets denote a set or range of characters, any one of which may occur at that position | glav[aeiou] | glava, glave, glavi, glavo, glavu |
| [ ] | (as above) | [g-k]lava | glava, hlava, ilava, jlava, klava |
| ( ) | parentheses are used for grouping subexpressions | (G\|g)lava | Glava, glava |
| ( ) | (as above) | ([Gg]\|[Pp])lava | Glava, glava, Plava, plava |
| (?i) | this expression makes the match ignore case | (?i)glava | Glava, glava |
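A point that is easy to miss in the table is that `*`, `+` and braces repeat only the single character right before them. The sketch below checks a few of the table's examples in Python; it is an approximation that assumes whole-token matching, and the helper name `whole_token` is hypothetical.

```python
import re

def whole_token(pattern, token):
    """True if the pattern matches the entire token (assumed matching mode)."""
    return re.fullmatch(pattern, token) is not None

checks = {
    # * and + repeat only the preceding character, here "v"
    ("glav*", "gla"): whole_token("glav*", "gla"),        # zero "v" is allowed
    ("glav+", "gla"): whole_token("glav+", "gla"),        # at least one "v" needed
    ("glav+", "glavvv"): whole_token("glav+", "glavvv"),
    ("glav*", "glava"): whole_token("glav*", "glava"),    # trailing "a" is not covered
    # braces count repetitions of the preceding character
    ("glav.{3}", "glavica"): whole_token("glav.{3}", "glavica"),
    ("glav.{3}", "glava"): whole_token("glav.{3}", "glava"),
    # brackets give a set or range for one position
    ("[g-k]lava", "hlava"): whole_token("[g-k]lava", "hlava"),
    # (?i) switches off case sensitivity
    ("(?i)glava", "Glava"): whole_token("(?i)glava", "Glava"),
    ("glava", "Glava"): whole_token("glava", "Glava"),
}

for (pattern, token), result in checks.items():
    print(f"{pattern!r} vs {token!r}: {result}")
```

To match "any continuation" after glav, the dot has to be combined with the quantifier, as in `glav.*`.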
If you would like to use the dot ., plus sign +, asterisk *, parentheses ( ), braces { } or brackets [ ] literally, each of these characters should be escaped with \.
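When query strings are prepared outside Bonito (for example, from a word list), the escaping rule can be applied automatically. The following is a small sketch under that assumption; `escape_literal` is a hypothetical helper, and it covers only the characters listed above.

```python
import re

# Characters listed above as needing a backslash to be read literally.
# Extending this set (e.g. with "?" or "|") is a one-line change.
SPECIAL = set(".+*(){}[]")

def escape_literal(text):
    """Prefix every listed special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

print(escape_literal("glava."))    # glava\.
print(escape_literal("(glava)"))   # \(glava\)

# Check with Python's regex engine: the escaped dot matches only a real dot.
print(re.fullmatch(escape_literal("glava."), "glava.") is not None)   # True
print(re.fullmatch(escape_literal("glava."), "glavax") is not None)   # False
```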
Queries with structural tags in text
- <head> "glava"
gives concordance of word glava appearing at the very beginning of any heading
- "glava" within <head>
gives concordance of word glava appearing anywhere within any heading
- "glava" within <head type="pn">
gives concordance of word glava appearing anywhere within headings of type "pn", i.e. subheadings
- <head> "[0-9]+"
gives concordance of numbers at the very beginning of any heading
- ".*ba" </head>
gives concordance of all words which end in -ba at the very end of any heading
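The difference between a tag placed before the word, a tag placed after it, and `within` can be modelled on a toy document. The sketch below is only an illustration in Python, not how Bonito evaluates queries: the document, its structure names and the three helper functions are all hypothetical.

```python
import re

# Hypothetical document as (structure name, tokens) pairs; not real HNK data.
DOC = [
    ("head", ["glava", "obitelji"]),
    ("p", ["glava", "ga", "boli"]),
    ("head", ["12", "nova", "borba"]),
    ("head", ["bolna", "glava"]),
]

def heads(doc):
    """Token lists of all headings in the document."""
    return [tokens for name, tokens in doc if name == "head"]

def at_start(pattern, doc=DOC):
    """Roughly <head> "pattern": the first token of a heading."""
    return [t[0] for t in heads(doc) if t and re.fullmatch(pattern, t[0])]

def within(pattern, doc=DOC):
    """Roughly "pattern" within <head>: any token inside a heading."""
    return [tok for t in heads(doc) for tok in t if re.fullmatch(pattern, tok)]

def at_end(pattern, doc=DOC):
    """Roughly "pattern" </head>: the last token of a heading."""
    return [t[-1] for t in heads(doc) if t and re.fullmatch(pattern, t[-1])]

print(at_start("glava"))    # only the heading that begins with glava
print(within("glava"))      # both headings containing glava, not the paragraph
print(at_start("[0-9]+"))   # a number opening a heading
print(at_end(".*ba"))       # a word in -ba closing a heading
```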
The list of structural tags used in the corpus is available through menu Corpus -> Information summary or with the shortcut ctrl+I:
Queries with lemma or morphosyntactic description
Notice: This type of query will be available for the whole HNK in v 2.5, scheduled for spring 2006. In the meantime, it can be tested on the cw2000 subcorpus only.
- [lemma="glava"]
gives concordance of all word-forms of lemma glava
- [msd="Ncm.*"]
gives concordance of all word-forms of all common (c) nouns (N) of masculine gender (m)
- [msd="Af.*"] [lemma="glava"]
gives concordance of all word-forms of lemma glava with any qualificative (f) adjective (A) in front of it
- [msd="S.*"] [msd="Nc..g"]
gives concordance of all prepositions (S) followed by common (c) noun (N) in genitive (g)
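Each pair of square brackets in these queries describes one token position, and the regular expression is tested against the named attribute of that token. A minimal Python sketch of this idea follows. The token list and its MSD values are invented for illustration (they follow the N/A/S letter positions described above but are not taken from HNK), and `find` is a hypothetical helper, not part of Bonito.

```python
import re

# Hypothetical annotated tokens; MSD values are illustrative only.
TOKENS = [
    {"word": "bez",    "lemma": "bez",   "msd": "Spsg"},
    {"word": "glave",  "lemma": "glava", "msd": "Ncfsg"},
    {"word": "velika", "lemma": "velik", "msd": "Afpfsn"},
    {"word": "glava",  "lemma": "glava", "msd": "Ncfsn"},
    {"word": "grada",  "lemma": "grad",  "msd": "Ncmsg"},
]

def find(query, tokens=TOKENS):
    """Return matching word sequences.

    query is a list with one dict per token position, mapping an
    attribute name to a regular expression that must match it fully.
    """
    hits = []
    for start in range(len(tokens) - len(query) + 1):
        window = tokens[start:start + len(query)]
        if all(re.fullmatch(pattern, token[attr])
               for conditions, token in zip(query, window)
               for attr, pattern in conditions.items()):
            hits.append(" ".join(token["word"] for token in window))
    return hits

print(find([{"lemma": "glava"}]))                      # [lemma="glava"]
print(find([{"msd": "Ncm.*"}]))                        # [msd="Ncm.*"]
print(find([{"msd": "Af.*"}, {"lemma": "glava"}]))     # [msd="Af.*"] [lemma="glava"]
print(find([{"msd": "S.*"}, {"msd": "Nc..g"}]))        # [msd="S.*"] [msd="Nc..g"]
```

In `Nc..g` the two dots stand for the positions between the noun type and the case (gender and number in the examples above), so the case letter is tested whatever those two values are.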
An explanation and detailed description of the morphosyntactic tags (MSD) used in HNK can be found on the official web pages of the MULTEXT-East recommendations.
The list of attributes used in the corpus (i.e. lemma, MSD etc.) is available through menu Corpus -> Information summary or with the shortcut ctrl+I (see illustration above).
Additional information in concordances
Besides the text, the following additional information can be displayed in concordances: source reference, token attributes, structural tags.
Displaying source abbreviation
The source reference can be displayed in concordance through menu selection View -> References or with shortcut F4.
In HNK the source reference is encoded as doc.file, so this reference should be selected in order to display the source abbreviation at the beginning of each concordance row.
Displaying token attribute
The token attribute can be displayed in concordance through menu selection View -> Attributes or with shortcut F5.
In HNK token attributes are encoded as lemma for lemmas and msd for morphosyntactic descriptions. The selected attributes are displayed either next to the keyword (Only in KWIC) or next to each token in the concordance (For all positions).
Displaying structural tags
The structural tags can be displayed in concordance through menu selection View -> Structures or with shortcut F6.
All structural tags used in selected corpus are available for selection. By default they are displayed in green.
Printing and saving the concordance
Concordances can be printed (Concordance -> Print) or saved to a file (Concordance -> Save to file, or shortcut F2). When printing, note that Bonito automatically selects the default printer with all its default settings and does not let you change them interactively (depending on the size of the context, a printer and paper of size A3 will often be needed). The alternative is to save the concordance to a file, which can then be opened and reformatted in another program.
When saving the concordance to a file, it is advisable to select the utf-8 character encoding, since it best preserves characters from different character sets. Files in this universal encoding can be opened by almost all current text editors.
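A file saved in utf-8 can also be processed by a script, provided the same encoding is named when the file is opened. The sketch below shows this in Python. Since the layout of the saved file is not described here, each line is treated as opaque text, and the file name, sample lines and the function `read_concordance` are assumptions made for the demonstration.

```python
import tempfile
from pathlib import Path

def read_concordance(path):
    """Read a saved concordance as UTF-8 and return its non-empty lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]

# Stand-in for a file saved from Bonito; the content is invented.
sample = "što je glava kuće\n\nčovjek bez glave\n"

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "concordance.txt"   # hypothetical file name
    demo_path.write_text(sample, encoding="utf-8")
    lines = read_concordance(demo_path)

print(lines)
```

Naming the encoding explicitly avoids depending on the system default, which is where characters such as č, ć, đ, š and ž are most often damaged.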
Bonito documentation
Additional features of Bonito are described in its manual, which can be opened in HTML format through the menu Help -> Documentation. The same manual is also available in PDF format in the doc subdirectory of the directory where Bonito has been installed (e.g. c:\Program Files\Bonito\doc).
