The descriptors are called tags, and the automatic assignment of descriptors
to the given tokens is called tagging.
The process of assigning a part of speech to a given word is called
part-of-speech tagging, commonly referred to as POS tagging. Parts of speech
include nouns, verbs, adverbs, adjectives, pronouns, conjunctions, and their
sub-categories.
A tagger (POS tagger) is a piece of software that reads text and assigns a
part of speech, such as noun, verb, or adjective, to each word (and other
token). It uses different kinds of information, such as dictionaries, lexicons,
and rules. Dictionaries list the category or categories of a particular word;
that is, a word may belong to more than one category. For example, "run" is
both a noun and a verb, so taggers use probabilistic information to resolve
this ambiguity.
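One simple way to use probabilistic information for an ambiguous word such as "run" is to pick the tag it carries most often in labelled data. The sketch below assumes a tiny hand-made list of (word, tag) pairs; the data, the function names `build_lexicon` and `most_frequent_tag`, and the default tag for unknown words are all illustrative assumptions, not part of any particular tagger.

```python
from collections import Counter, defaultdict

# Hypothetical hand-labelled (word, tag) pairs, invented for illustration.
TAGGED = [
    ("run", "V"), ("run", "V"), ("run", "N"),
    ("dog", "N"), ("quickly", "ADV"),
]

def build_lexicon(pairs):
    """Count how often each word appears with each tag."""
    lexicon = defaultdict(Counter)
    for word, tag in pairs:
        lexicon[word.lower()][tag] += 1
    return lexicon

def most_frequent_tag(lexicon, word, default="N"):
    """Return the tag seen most often with this word.

    Unknown words fall back to `default` (an assumption of this sketch).
    """
    counts = lexicon.get(word.lower())
    if not counts:
        return default
    return counts.most_common(1)[0][0]

LEXICON = build_lexicon(TAGGED)
```

Here `LEXICON["run"]` records both categories (N and V), and `most_frequent_tag(LEXICON, "run")` returns "V" because that tag is more frequent in the toy data. A lookup like this ignores the surrounding words, which is why practical taggers also model context.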
There are mainly
two types of taggers:
Rule-based – Uses
hand-written rules to resolve tag ambiguity.
Stochastic – Taggers are either HMM-based, choosing the tag sequence that
maximizes the product of word likelihood and tag sequence probability, or
cue-based, using decision trees or maximum entropy models to combine
probabilistic features.
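The HMM-based approach can be sketched with the Viterbi algorithm, which finds the tag sequence maximizing the product of word likelihoods P(word | tag) and tag transition probabilities P(tag | previous tag). The following is a minimal sketch under stated assumptions: a bigram model, a two-tag tagset, and hypothetical probability tables (`START`, `TRANS`, `EMIT`) invented for illustration rather than estimated from a corpus.

```python
import math

TAGS = ["N", "V"]

# Hypothetical probabilities chosen for illustration only.
START = {"N": 0.7, "V": 0.3}                      # P(tag at sentence start)
TRANS = {"N": {"N": 0.3, "V": 0.7},               # P(next tag | current tag)
         "V": {"N": 0.8, "V": 0.2}}
EMIT = {"N": {"dogs": 0.5, "run": 0.1, "races": 0.4},    # P(word | tag)
        "V": {"dogs": 0.05, "run": 0.6, "races": 0.35}}

def viterbi(words, tags=TAGS, start=START, trans=TRANS, emit=EMIT, floor=1e-6):
    """Return the most probable tag sequence for `words` under the toy HMM.

    Unseen words get a small `floor` probability (an assumption of this sketch).
    Log probabilities are summed to avoid numeric underflow.
    """
    if not words:
        return []

    def logp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t]: best log score of any tag sequence for words[:i+1] ending in t
    best = [{t: logp(start, t) + logp(emit[t], words[0]) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best sequence
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + logp(trans[p], t))
            best[i][t] = (best[i - 1][prev] + logp(trans[prev], t)
                          + logp(emit[t], words[i]))
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

With these toy numbers, `viterbi(["dogs", "run"])` returns `["N", "V"]`: "run" is tagged as a verb because the N-to-V transition and the verb likelihood of "run" together outweigh the noun reading. A real tagger would estimate the tables from a tagged corpus and use a full tagset.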
A tagger chooses the relevant tags to attach to the words from a set of tags
called the tagset.
Every tagger is
given a standard tagset. The tagset may be coarse, such as N (Noun), V (Verb),
ADJ (Adjective), ADV (Adverb), PREP (Preposition), CONJ (Conjunction), or
fine-grained, such as NNOM (Noun-Nominative), NSOC (Noun-Sociative), VFIN (Verb
Finite), VNFIN (Verb Nonfinite), and so on. Most taggers use only fine-grained
tagsets.
Examples of English (Treebank)
tags are shown below.