Some methodological remarks 1
what kind of data do we get from corpora?
3 basic types
- evidence: is the linguistic unit we are looking for there?
- frequency: if it’s there, how often?
- relation: if it’s there more than once, is there any recognizable relation with other units? Are there different relations or only one?
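The three types of data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus is invented, the helper name describe is hypothetical, and "relation" is reduced to the simplest possible case, namely which word directly follows the unit.

```python
from collections import Counter

# Toy corpus, invented for illustration (not real corpus data).
corpus = "the cat sat on the mat and the dog sat on the rug"
tokens = corpus.split()


def describe(target, tokens):
    """Return (evidence, frequency, relation) for one unit.

    Hypothetical helper: 'relation' is simplified here to the
    counts of words that directly follow the target.
    """
    attested = target in tokens          # evidence: is it there?
    frequency = tokens.count(target)     # frequency: how often?
    followers = Counter(                 # relation: what follows it?
        nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == target
    )
    return attested, frequency, followers


for unit in ("sat", "the", "bird"):
    attested, frequency, followers = describe(unit, tokens)
    print(unit, attested, frequency, dict(followers))
```

In this toy corpus "sat" has only one relation of this kind (it is always followed by "on"), "the" has several different ones, and "bird" is not attested at all, so frequency and relation are empty for it.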
what do we count?
- phonemes/graphemes (+ their combinations = syllables?)
- morphemes (+ their combinations = words)
- words (+ their combinations = syntagms)
- syntagms (+ their combinations = clauses, sentences)
- meanings...
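A rough sketch of counting at some of these levels is given here. It rests on simplifying assumptions: letters stand in for graphemes, whitespace-separated tokens stand in for words, and adjacent word pairs (bigrams) stand in very loosely for syntagms. Morphemes and meanings are left out because they cannot be counted from raw text without annotation or further analysis. The example sentence is invented.

```python
from collections import Counter

# Invented example sentence, not real corpus data.
text = "the cat sat on the mat"

# Graphemes: approximated by single letters, spaces ignored.
grapheme_counts = Counter(ch for ch in text if ch.isalpha())

# Words: approximated by whitespace-separated tokens.
tokens = text.split()
word_counts = Counter(tokens)

# Word combinations: adjacent pairs as a crude stand-in for syntagms.
bigram_counts = Counter(zip(tokens, tokens[1:]))

print(grapheme_counts.most_common(3))
print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
```

The same text yields different counts depending on the level chosen, so the unit of counting has to be fixed before any frequencies are compared.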