Tagging: the automatic assignment of the descriptors to

Tagging:

Descriptors are called tags, and the automatic assignment of
descriptors to the given tokens is called tagging.

POS Tagging

The process of assigning a part of speech to a given word is called
part-of-speech tagging, commonly referred to as POS tagging. Parts of
speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions,
and their sub-categories.

POS Tagger

A part-of-speech tagger (POS tagger) is a piece of software that reads
text and assigns a part of speech, such as noun, verb, or adjective, to
each word (and other token). It draws on different kinds of information,
such as dictionaries, lexicons, and rules. A dictionary may list more
than one category for a particular word; for example, "run" is both a
noun and a verb. To resolve this ambiguity, taggers use probabilistic
information.
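One simple way to picture this use of probabilistic information is a most-frequent-tag lookup. The sketch below is only an illustration: the lexicon, its counts, and the function names `candidate_tags` and `most_frequent_tag` are hypothetical, not taken from any real tagger.

```python
from collections import Counter

# Hypothetical toy lexicon: each word maps to how often it was seen with
# each tag in some imaginary annotated corpus. The counts are made up.
LEXICON = {
    "run": Counter({"V": 7, "N": 3}),
    "dog": Counter({"N": 9, "V": 1}),
    "quickly": Counter({"ADV": 5}),
}

def candidate_tags(word):
    """Return every tag the lexicon lists for a word (empty list if unknown)."""
    return sorted(LEXICON.get(word.lower(), Counter()))

def most_frequent_tag(word, default="N"):
    """Resolve ambiguity by picking the tag with the highest count."""
    counts = LEXICON.get(word.lower())
    if not counts:
        return default  # assumption: unknown words default to noun
    return counts.most_common(1)[0][0]

print(candidate_tags("run"))     # the word is ambiguous between two tags
print(most_frequent_tag("run"))  # the more frequent tag is chosen
```

This baseline ignores context entirely, so it tags "run" the same way in every sentence; context-aware taggers improve on it.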

There are two main types of taggers:

Rule-based taggers use hand-written rules to resolve tag ambiguity.

Stochastic taggers are either HMM-based, choosing the tag sequence that
maximizes the product of word likelihood and tag sequence probability, or
cue-based, using decision trees or maximum entropy models to combine
probabilistic features.
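The HMM-based approach can be sketched with the Viterbi algorithm, a standard dynamic-programming method for finding the most probable tag sequence. This is a minimal sketch under stated assumptions: the two-tag tagset, every probability in `START`, `TRANS`, and `EMIT`, and the `unk` fallback for unseen words are invented for illustration.

```python
# Toy HMM parameters; all probabilities are invented for illustration.
TAGS = ["N", "V"]
START = {"N": 0.6, "V": 0.4}                 # P(tag at sentence start)
TRANS = {"N": {"N": 0.3, "V": 0.7},          # P(next tag | previous tag)
         "V": {"N": 0.8, "V": 0.2}}
EMIT = {"N": {"dogs": 0.5, "run": 0.1},      # P(word | tag), word likelihood
        "V": {"dogs": 0.05, "run": 0.6}}

def viterbi(words, unk=1e-6):
    """Return the tag sequence with the highest product of word
    likelihoods and tag transition probabilities."""
    if not words:
        return []
    # best[i][t] = (probability of the best path ending in tag t, previous tag)
    best = [{t: (START[t] * EMIT[t].get(words[0], unk), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            column[t] = max(
                (best[-1][p][0] * TRANS[p][t] * EMIT[t].get(word, unk), p)
                for p in TAGS
            )
        best.append(column)
    # Backtrack from the most probable final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["dogs", "run"]))
```

With these toy numbers, "run" after "dogs" is tagged as a verb because the noun-to-verb transition and the verb emission together outweigh the alternatives. A real tagger would estimate the probabilities from an annotated corpus and work with log probabilities to avoid underflow on long sentences.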

Tagset

A tagger chooses the relevant tags to attach to words from a set of tags
called a tagset.

Every tagger is given a standard tagset. The tagset may be coarse, such
as N (Noun), V (Verb), ADJ (Adjective), ADV (Adverb), PREP (Preposition),
CONJ (Conjunction), or fine-grained, such as NNOM (Noun-Nominative),
NSOC (Noun-Sociative), VFIN (Verb Finite), VNFIN (Verb Nonfinite), and so
on. Most taggers use only a fine-grained tagset.
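The relationship between the two granularities can be sketched as a lookup from fine-grained tags to coarse ones. The tag names come from the text above, but the mapping table and the `to_coarse` helper are assumptions inferred from the tag names, and the tokens `w1` to `w3` are placeholders.

```python
# Fine-grained tags from the text mapped to their coarse categories.
# The mapping itself is an assumption inferred from the tag names.
FINE_TO_COARSE = {
    "NNOM": "N",   # Noun-Nominative
    "NSOC": "N",   # Noun-Sociative
    "VFIN": "V",   # Verb Finite
    "VNFIN": "V",  # Verb Nonfinite
}

def to_coarse(tagged_tokens):
    """Replace each fine-grained tag with its coarse tag.

    Tags that are not in the mapping (for example, tags that are
    already coarse) are kept unchanged.
    """
    return [(word, FINE_TO_COARSE.get(tag, tag)) for word, tag in tagged_tokens]

# Placeholder tokens; only the tags matter here.
print(to_coarse([("w1", "NNOM"), ("w2", "VFIN"), ("w3", "ADJ")]))
```

Going from fine to coarse is a simple lookup, while the reverse direction loses information and cannot be done without extra context.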

Examples of English (Treebank) tags are shown below.