Part of Speech (PoS) Tagging
Tagging is a kind of classification that may be defined as the automatic assignment of a descriptor to the tokens. Here the descriptor is called a tag, which may represent semantic information, one of the parts of speech, and so on.
Now, if we talk about Part-of-Speech (PoS) tagging, then it may be defined as the process of assigning one of the parts of speech to the given word. It is generally called POS tagging. In simple words, we can say that POS tagging is the task of labelling each word in a sentence with its appropriate part of speech. We already know that parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories.
Most POS tagging falls under Rule-based POS tagging, Stochastic POS tagging and Transformation-based tagging.
Rule-based POS Tagging
One of the oldest techniques of tagging is rule-based POS tagging. Rule-based taggers use a dictionary or lexicon to get possible tags for each word. If the word has more than one possible tag, then rule-based taggers use hand-written rules to identify the correct tag. Disambiguation can also be performed in rule-based tagging by analyzing the linguistic features of a word along with its preceding and following words. For example, suppose the preceding word of a word is an article; then the word must be a noun.
As the name suggests, all such information in rule-based POS tagging is coded in the form of rules. These rules may be context-pattern rules or regular expressions compiled into finite-state automata.
We can also understand rule-based POS tagging by its two-step architecture:

First step - In the first step, it uses a dictionary to assign each word a list of potential parts of speech.

Second step - In the second step, it uses large lists of hand-written disambiguation rules to winnow down the list to a single part of speech for each word.
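The two steps above can be sketched as follows. The tiny lexicon, the default tag for unknown words, and the single disambiguation rule (the article-then-noun rule mentioned earlier) are illustrative assumptions, not a real tagger:

```python
# Step 1: a dictionary assigns each word a list of potential parts of speech.
LEXICON = {
    "the": ["DET"],
    "can": ["NOUN", "VERB", "AUX"],
    "rusted": ["VERB", "ADJ"],
}

def candidate_tags(word):
    # Default guess for unknown words is an assumption for this sketch.
    return LEXICON.get(word.lower(), ["NOUN"])

# Step 2: hand-written rules winnow the candidates down to one tag per word.
def disambiguate(words):
    tags = []
    for i, word in enumerate(words):
        candidates = candidate_tags(word)
        # Rule from the text: if the preceding word is an article/determiner,
        # the current word must be a noun.
        if len(candidates) > 1 and i > 0 and tags[i - 1] == "DET" and "NOUN" in candidates:
            tags.append("NOUN")
        else:
            tags.append(candidates[0])
    return tags

print(disambiguate(["The", "can", "rusted"]))  # ['DET', 'NOUN', 'VERB']
```

Here the ambiguous word "can" is resolved to a noun only because it follows a determiner; without that rule the tagger would simply pick the first candidate.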
Properties of Rule-Based POS Tagging
Rule-based POS taggers possess the following properties:

These taggers are knowledge-driven taggers.

The rules in rule-based POS tagging are built manually.

The information is coded in the form of rules.

We have a limited number of rules, approximately around 1000.

Smoothing and language modeling are defined explicitly in rule-based taggers.
Stochastic POS tagging
Another technique of tagging is Stochastic POS tagging. Now, the question that arises here is which model can be stochastic. The model that includes frequency or probability (statistics) can be called stochastic. Any number of different approaches to the problem of part-of-speech tagging can be referred to as stochastic tagging.
The simplest stochastic taggers apply the following approaches for POS tagging:
Word Frequency approach
In this approach, the stochastic taggers disambiguate the words based on the probability that a word occurs with a particular tag. We can also say that the tag most frequently encountered with the word in the training set is the one assigned to an ambiguous instance of that word. The main issue with this approach is that it may yield an invalid sequence of tags.
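A minimal sketch of this approach, assuming a toy training set; note that because each word is tagged in isolation, nothing prevents an invalid tag sequence:

```python
from collections import Counter, defaultdict

# Toy (word, tag) training pairs; purely illustrative.
training = [
    ("time", "NOUN"), ("flies", "VERB"), ("time", "NOUN"),
    ("flies", "NOUN"), ("flies", "VERB"), ("like", "PREP"),
]

# Count how often each word occurs with each tag in the training set.
counts = defaultdict(Counter)
for word, tag in training:
    counts[word][tag] += 1

def most_frequent_tag(word):
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NOUN"  # fallback for unseen words (an assumption of this sketch)

print(most_frequent_tag("flies"))  # VERB (seen twice as VERB, once as NOUN)
```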
Tag sequence probabilities
This is another approach of stochastic tagging, where the tagger calculates the probability of a given sequence of tags occurring. It is also called the n-gram approach. It is called so because the best tag for a given word is determined by the probability at which it occurs with the previous n tags.
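A rough sketch of the bigram (n = 2) case; the probability table below is an illustrative assumption, not estimated from any real corpus:

```python
# P(tag | previous tag) for a handful of tag pairs; "<s>" marks sentence start.
# These numbers are invented for illustration.
BIGRAM = {
    ("<s>", "DET"): 0.8, ("<s>", "NOUN"): 0.2,
    ("DET", "NOUN"): 0.9, ("DET", "ADJ"): 0.1,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
}

def sequence_probability(tags):
    """Score a candidate tag sequence as the product of bigram probabilities."""
    prob = 1.0
    prev = "<s>"
    for tag in tags:
        prob *= BIGRAM.get((prev, tag), 0.0)  # unseen pairs get probability 0
        prev = tag
    return prob

print(sequence_probability(["DET", "NOUN", "VERB"]))  # 0.8 * 0.9 * 0.7 ≈ 0.504
```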
Properties of Stochastic POS Tagging
Stochastic POS taggers possess the following properties:

This POS tagging is based on the probability of a tag occurring.

It requires a training corpus.

There would be no probability for the words that do not exist in the corpus.

It uses a different testing corpus (other than the training corpus).

It is the simplest POS tagging because it chooses the most frequent tags associated with a word in the training corpus.
Transformation-based Tagging
Transformation-based tagging is also called Brill tagging. It is an instance of transformation-based learning (TBL), which is a rule-based algorithm for automatic tagging of POS to the given text. TBL allows us to have linguistic knowledge in a readable form, and it transforms one state to another state by using transformation rules.
It draws inspiration from both of the taggers explained before: rule-based and stochastic. If we see a similarity between the rule-based and transformation taggers, then, like rule-based, it also uses rules that specify which tags need to be assigned to which words. On the other hand, if we see a similarity between the stochastic and transformation taggers, then, like stochastic, it is a machine learning technique in which the rules are automatically induced from data.
How Transformation Based Learning (TBL) Works
In order to understand the working and concept of transformation-based taggers, we need to understand the working of transformation-based learning. Consider the following steps to understand the working of TBL:

Start with the solution - TBL usually starts with some solution to the problem and works in cycles.

Most beneficial transformation chosen - In each cycle, TBL will choose the most beneficial transformation.

Apply to the problem - The transformation chosen in the last step will be applied to the problem.
The algorithm will stop when the transformation selected in step 2 does not add more value or there are no more transformations to select. This kind of learning is best suited for classification tasks.
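The cycle above can be sketched as follows. The toy corpus, the baseline solution, and the two rule templates are illustrative assumptions; a real Brill tagger uses many more templates and a large training corpus:

```python
from collections import Counter
from itertools import product

# Toy gold-standard (word, tag) data; purely illustrative.
gold = [("the", "DET"), ("can", "NOUN"), ("can", "VERB"), ("fly", "VERB")]
words = [w for w, _ in gold]
gold_tags = [t for _, t in gold]
tagset = sorted(set(gold_tags))

# Start with a solution: tag everything with the overall most frequent tag.
tags = [Counter(gold_tags).most_common(1)[0][0]] * len(words)

def errors(tags):
    return sum(1 for t, g in zip(tags, gold_tags) if t != g)

def apply_rule(tags, rule):
    out = list(tags)
    for i in range(len(out)):
        if rule[0] == "word" and words[i] == rule[1]:
            out[i] = rule[2]  # template: "tag word W as T"
        elif rule[0] == "prev" and i > 0 and out[i - 1] == rule[1] and out[i] == rule[2]:
            out[i] = rule[3]  # template: "change F to T when previous tag is P"
    return out

# Candidate transformations generated from the two templates.
rules = [("word", w, t) for w in set(words) for t in tagset]
rules += [("prev", p, f, t) for p, f, t in product(tagset, repeat=3)]

# Each cycle: choose the most beneficial transformation and apply it; stop
# when no transformation reduces the number of errors any further.
while True:
    best, gain = None, 0
    for rule in rules:
        g = errors(tags) - errors(apply_rule(tags, rule))
        if g > gain:
            best, gain = rule, g
    if best is None:
        break
    tags = apply_rule(tags, best)

print(tags)  # ['DET', 'NOUN', 'VERB', 'VERB']
```

On this toy data, two learned transformations ("tag 'the' as DET" and "change VERB to NOUN after DET") are enough to recover the gold tags from the baseline.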
Benefits of Transformation Based Learning (TBL)
The benefits of TBL are as follows 

We learn a small, simple set of rules, and these rules are sufficient for tagging.

Development as well as debugging is very easy in TBL because the learned rules are easy to understand.

The complexity of tagging is reduced because in TBL there is an interweaving of machine-learned and human-generated rules.

The transformation-based tagger is much faster than the Markov-model tagger.
Disadvantages of Transformation Based Learning (TBL)
The disadvantages of TBL are as follows 

Transformationbased learning (TBL) does not provide tag probabilities.

In TBL, the training time is very long, especially on large corpora.
Hidden Markov Model (HMM) POS Tagging
Before digging deep into HMM POS tagging, we must understand the concept of a Hidden Markov Model (HMM).
Hidden Markov model
An HMM model may be defined as a doubly-embedded stochastic model, where the underlying stochastic process is hidden. This hidden stochastic process can only be observed through another set of stochastic processes that produces the sequence of observations.
Example
For example, a sequence of hidden coin tossing experiments is done, and we see only the observation sequence consisting of heads and tails. The actual details of the process, such as the number of coins used and the order in which they are selected, are hidden from us. By observing this sequence of heads and tails, we can build several HMMs to explain the sequence. Following is one form of Hidden Markov Model for this problem:
We assumed that there are two states in the HMM, and each of the states corresponds to the selection of a different biased coin. The following matrix gives the state transition probabilities:
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
Here,

a_{ij} = probability of transition from state i to state j.

a_{11} + a_{12} = 1 and a_{21} + a_{22} = 1

P_{1} = probability of heads of the first coin, i.e. the bias of the first coin.

P_{2} = probability of heads of the second coin, i.e. the bias of the second coin.
We can also create an HMM model assuming that there are 3 or more coins.
In this way, we can characterize HMM by the following elements 

N, the number of states in the model (in the above example N = 2, only two states).

M, the number of distinct observations that can appear with each state (in the above example M = 2, i.e., H or T).

A, the state transition probability distribution (the matrix A in the above example).

P, the probability distribution of the observable symbols in each state (in our example P1 and P2).

I, the initial state distribution.
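The five elements above can be written out for the two-coin example. The concrete probability values below are illustrative assumptions:

```python
import random

N = 2                     # number of hidden states (coin 1, coin 2)
M = 2                     # number of distinct observations (H or T)
A = [[0.7, 0.3],          # A[i][j] = P(next state j | current state i);
     [0.4, 0.6]]          # each row sums to 1 (a11 + a12 = 1, a21 + a22 = 1)
P = [0.9, 0.2]            # P[i] = probability of heads for coin i (its bias)
I = [0.5, 0.5]            # initial state distribution

def generate(length, seed=0):
    """Sample an observation sequence; the coin choices themselves stay hidden."""
    rng = random.Random(seed)
    state = 0 if rng.random() < I[0] else 1
    observations = []
    for _ in range(length):
        observations.append("H" if rng.random() < P[state] else "T")
        state = 0 if rng.random() < A[state][0] else 1
    return "".join(observations)

print(generate(10))
```

Only the H/T string is visible to an observer; recovering which coin produced each toss is exactly the kind of hidden-state inference an HMM supports.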
Using HMM for POS Tagging
The process of POS tagging is the process of finding the sequence of tags which is most likely to have generated a given word sequence. We can model this POS process by using a Hidden Markov Model (HMM), where tags are the hidden states that produce the observable output, i.e., the words.
Mathematically, in POS tagging, we are always interested in finding the tag sequence (C) that maximizes:
P(C | W)
Where,
C = C_{1}, C_{2}, C_{3}, ..., C_{T}

W = W_{1}, W_{2}, W_{3}, ..., W_{T}
On the other side of the coin, the fact is that we need a lot of statistical data to reasonably estimate such kinds of sequences. However, to simplify the problem, we can apply some mathematical transformations along with some assumptions.
Using an HMM to do POS tagging is a special case of Bayesian inference. Hence, we will start by restating the problem using Bayes' rule, which says that the above-mentioned conditional probability is equal to:
(PROB(C_{1}, ..., C_{T}) * PROB(W_{1}, ..., W_{T} | C_{1}, ..., C_{T})) / PROB(W_{1}, ..., W_{T})
We can eliminate the denominator in all these cases because we are interested in finding the sequence C that maximizes the above value. This will not affect our answer. Now, our problem reduces to finding the sequence C that maximizes:
PROB(C_{1}, ..., C_{T}) * PROB(W_{1}, ..., W_{T} | C_{1}, ..., C_{T}) (1)
Even after reducing the problem to the above expression, it would require a large amount of data. We can make reasonable independence assumptions about the two probabilities in the above expression to overcome the problem.
First assumption
The probability of a tag depends on the previous one (bigram model), or the previous two (trigram model), or the previous n tags (n-gram model), which, mathematically, can be explained as follows:
PROB(C_{1}, ..., C_{T}) = Π_{i=1..T} PROB(C_{i} | C_{i-n+1}, ..., C_{i-1}) (n-gram model)
PROB(C_{1}, ..., C_{T}) = Π_{i=1..T} PROB(C_{i} | C_{i-1}) (bigram model)
The start of a sentence can be taken into account by assuming an initial probability for each tag.
PROB(C_{1} | C_{0}) = PROB_{initial}(C_{1})
Second assumption
The second probability in equation (1) above can be approximated by assuming that a word appears in a category independent of the words in the preceding or succeeding categories, which can be explained mathematically as follows:
PROB(W_{1}, ..., W_{T} | C_{1}, ..., C_{T}) = Π_{i=1..T} PROB(W_{i} | C_{i})
Now, on the basis of the above two assumptions, our goal reduces to finding a sequence C which maximizes:
Π_{i=1..T} PROB(C_{i} | C_{i-1}) * PROB(W_{i} | C_{i})
Now the question that arises here is: has converting the problem to the above form really helped us? The answer is yes, it has. If we have a large tagged corpus, then the two probabilities in the above formula can be calculated as:
PROB(C_{i} = VERB | C_{i-1} = NOUN) = (# of instances where VERB follows NOUN) / (# of instances where NOUN appears) (2)
PROB(W_{i} | C_{i}) = (# of instances where W_{i} appears with tag C_{i}) / (# of instances where C_{i} appears) (3)
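Putting equations (2) and (3) to work, the following sketch estimates both probabilities from a tiny tagged corpus (an illustrative assumption) and then searches for the maximizing tag sequence with the Viterbi algorithm:

```python
from collections import Counter

# Toy tagged corpus; purely illustrative.
corpus = [
    [("the", "DET"), ("can", "NOUN"), ("rusted", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("can", "VERB"), ("run", "VERB")],
]

# Count tag bigrams, (word, tag) emissions, and tag occurrences.
trans, emit, tag_count = Counter(), Counter(), Counter()
for sentence in corpus:
    prev = "<s>"                      # pseudo-tag for the initial probability
    for word, tag in sentence:
        trans[(prev, tag)] += 1
        emit[(word, tag)] += 1
        tag_count[tag] += 1
        prev = tag
    tag_count["<s>"] += 1

def P_trans(prev, tag):               # equation (2)
    return trans[(prev, tag)] / tag_count[prev]

def P_emit(word, tag):                # equation (3)
    return emit[(word, tag)] / tag_count[tag]

def viterbi(words):
    """Find the tag sequence maximizing Π P(Ci | Ci-1) * P(Wi | Ci)."""
    tags = [t for t in tag_count if t != "<s>"]
    best = {"<s>": (1.0, [])}         # state -> (probability, best tag path)
    for word in words:
        new = {}
        for tag in tags:
            # Pick the best predecessor state for this tag at this position.
            p, path = max(
                (prob * P_trans(prev, tag) * P_emit(word, tag), path + [tag])
                for prev, (prob, path) in best.items()
            )
            new[tag] = (p, path)
        best = new
    return max(best.values())[1]

print(viterbi(["the", "can", "run"]))  # ['DET', 'NOUN', 'VERB']
```

Note that "can" is tagged NOUN here even though it also appears as a VERB in the corpus, because the bigram probability P(NOUN | DET) dominates; this is exactly the sequence information that the plain word-frequency approach discards.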