About Punjabi Resources - Online Punjabi Grammar Checker

A brief description of the online Punjabi resources available on this website for learning Punjabi grammar concepts is provided below; a detailed description can be found here.

Grammar Checker

Grammar checking is one of the basic functions of a typical word processing application. It means ensuring that a given piece of text follows the grammar rules of the language in which it is written. This is an entirely automated grammar checking system for the Punjabi language which, to the best of our knowledge, is the first such elaborate attempt at grammar checking for an Indian language. It performs grammar checking at the phrase and clause levels using the grammatical information carried by part-of-speech tags in the form of feature-value pairs. It can detect a number of grammatical errors in formal Punjabi texts. For most of the detected errors, it provides a list of suggestions. It also provides detailed error information that explains the error rule applied, along with the incorrect and desired values of the grammatical categories involved. For example, ਦੋ ਸੋਹਣਾ ਮੁੰਡੇ ਜਾਂਦੇ ਹਨ is an incorrect sentence, and ਦੋ ਸੋਹਣੇ ਮੁੰਡੇ ਜਾਂਦੇ ਹਨ is the corrected form suggested by this grammar checker.
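The idea of checking agreement through feature-value pairs can be illustrated with a minimal sketch. This is not the grammar checker's actual implementation: the toy lexicon, the feature values assigned to each word, and the names Token, tag, and check_adjective_noun_agreement are all assumptions made for illustration. It only checks that an adjective agrees with the noun that immediately follows it, and it reports the incorrect and desired values for each mismatched category.

```python
from dataclasses import dataclass, field

# Hypothetical toy lexicon: word -> (part of speech, feature-value pairs).
# The feature values are illustrative assumptions, not the system's real data.
LEXICON = {
    "ਸੋਹਣਾ": ("adjective", {"gender": "masculine", "number": "singular", "case": "direct"}),
    "ਸੋਹਣੇ": ("adjective", {"gender": "masculine", "number": "plural", "case": "direct"}),
    "ਮੁੰਡੇ": ("noun", {"gender": "masculine", "number": "plural", "case": "direct"}),
}

# Grammatical categories that must match between an adjective and its noun.
AGREEMENT_FEATURES = ("gender", "number", "case")


@dataclass
class Token:
    word: str
    pos: str = "unknown"
    features: dict = field(default_factory=dict)


def tag(sentence):
    """Split on whitespace and attach lexicon information to each word."""
    tokens = []
    for word in sentence.split():
        pos, features = LEXICON.get(word, ("unknown", {}))
        tokens.append(Token(word, pos, dict(features)))
    return tokens


def check_adjective_noun_agreement(tokens):
    """Report every feature on which an adjective disagrees with the next noun."""
    errors = []
    for adjective, noun in zip(tokens, tokens[1:]):
        if adjective.pos != "adjective" or noun.pos != "noun":
            continue
        for name in AGREEMENT_FEATURES:
            incorrect = adjective.features.get(name)
            desired = noun.features.get(name)
            if incorrect != desired:
                errors.append({
                    "word": adjective.word,
                    "category": name,
                    "incorrect": incorrect,
                    "desired": desired,
                })
    return errors


INCORRECT = "ਦੋ ਸੋਹਣਾ ਮੁੰਡੇ ਜਾਂਦੇ ਹਨ"
CORRECTED = "ਦੋ ਸੋਹਣੇ ਮੁੰਡੇ ਜਾਂਦੇ ਹਨ"

print(check_adjective_noun_agreement(tag(INCORRECT)))  # one error, in the "number" category
print(check_adjective_noun_agreement(tag(CORRECTED)))  # no errors
```

In this sketch the first sentence yields a single error for the adjective (singular where plural is desired), while the corrected sentence yields none. A real checker would also have to handle ambiguous words and longer-distance agreement, which this toy example ignores.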

Uses:

Finding out whether a sentence you have written follows the rules of Punjabi grammar.
Learning from your mistakes - for every detected error, suggestions and the reason for the error are provided in the 'error details' box.

Note: This system is still in BETA state, therefore some correct sentences may be marked as incorrect and some incorrect ones may be flagged as correct. There can be many reasons for this: some words (such as person or place names) are unknown to the system, some words are ambiguous, or the system has performed a wrong analysis.

Morphological Analyzer and Generator

The purpose of a morphological analyzer is to return the root word and the grammatical information for all the possible word classes (parts of speech) of a given word. For Punjabi nominal word classes, it should return gender, number, person, and case. For Punjabi verbs, tense, aspect, and modality are required in addition to gender, number, and person. A morphological generator does exactly the reverse of a morphological analyzer. Given a root word and its grammatical information (including word class), a typical morphological generator will generate the word form, or surface form, for that root word. This system uses a full-form lexicon for analysis and generation.
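A full-form lexicon stores every word form explicitly, so analysis and generation both reduce to lookups. The following is a minimal sketch under stated assumptions, not the system's actual data or code: the entry layout, the function names analyze and generate, and the number and case values assigned to each form (other than the singular vocative reading of ਮੁੰਡਿਆ) are chosen here for illustration.

```python
from collections import defaultdict

ROOT = "ਮੁੰਡਾ"

# Hypothetical full-form entries: (surface form, root, word class, features).
# Feature values are illustrative assumptions.
FULL_FORMS = [
    ("ਮੁੰਡਾ", ROOT, "noun", {"number": "singular", "case": "direct"}),
    ("ਮੁੰਡੇ", ROOT, "noun", {"number": "singular", "case": "oblique"}),
    ("ਮੁੰਡੇ", ROOT, "noun", {"number": "plural", "case": "direct"}),
    ("ਮੁੰਡਿਆ", ROOT, "noun", {"number": "singular", "case": "vocative"}),
    ("ਮੁੰਡਿਆਂ", ROOT, "noun", {"number": "plural", "case": "oblique"}),
    ("ਮੁੰਡਿਓ", ROOT, "noun", {"number": "plural", "case": "vocative"}),
]

# Two indexes over the same entries: one keyed by surface form, one by root.
BY_FORM = defaultdict(list)
BY_ROOT = defaultdict(list)
for form, root, word_class, features in FULL_FORMS:
    BY_FORM[form].append({"root": root, "class": word_class, **features})
    BY_ROOT[root].append(form)


def analyze(word):
    """Return every analysis stored for a word form (empty list if unknown)."""
    return list(BY_FORM.get(word, []))


def generate(root):
    """Return the distinct word forms of a root, in lexicon order."""
    forms = []
    for form in BY_ROOT.get(root, []):
        if form not in forms:
            forms.append(form)
    return forms


AMBIGUOUS_FORM = FULL_FORMS[1][0]  # a form with two analyses in this toy lexicon

print(analyze(AMBIGUOUS_FORM))   # two analyses sharing the same root
print(generate(ROOT))            # the five distinct forms of the root
print(generate(AMBIGUOUS_FORM))  # empty: a word form is not a root
```

Because generation is keyed by root, looking up a word form that is not itself a root returns nothing, which mirrors the behavior described for this system.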

Uses:

Morphological analysis tells you the word class and grammatical information of the input word; for example, ਮੁੰਡਿਆ is a noun in the vocative case and singular number.
Morphological analysis tells you the root word from which the input word is derived; for example, ਮੁੰਡਿਆ is derived from the root ਮੁੰਡਾ.
Morphological generation shows you all the words derived from a single root; for example, the words derived from ਮੁੰਡਾ are ਮੁੰਡਾ, ਮੁੰਡੇ, ਮੁੰਡਿਆ, ਮੁੰਡਿਆਂ, and ਮੁੰਡਿਓ. Similarly, you can look up the word forms of ਕਰ, ਹੱਸ, ਕਹਿ etc. Note that morphological generation is only possible from root words; for instance, a search for ਮੁੰਡੇ will produce no results, because it is a word form and not a root word.

Note: This system is in its BETA state and new words are being added. You may notice that some words are labeled as unknown; it is very likely that such words are either person or place names.

Part-of-Speech (POS) Tagger

The output of a morphological analyzer is usually ambiguous, as it may return more than one part-of-speech (POS) tag for a single word. This is because the same word can be used in sentences as a noun or a verb, as a verb or a postposition, and so on. The job of a part-of-speech tagger is to resolve this ambiguity by using the context in which the word occurs. A part-of-speech tagger is also known as a morphological disambiguator or simply a tagger. This part-of-speech tagger is a linguistic rule-based tagger and the first published tagger for Punjabi. It achieves accuracy comparable to that of the data-driven taggers reported for other Indian languages, using a tagging scheme developed specifically in this work with a focus on the grammatical features involved in the various agreements in Punjabi sentences.
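Context-based disambiguation can be sketched as a small set of hand-written rules applied to the analyzer's candidate tags. The two rules below, the tag names, and the function name disambiguate are hypothetical and are not the rules used by this tagger; the candidate tags given for the example words are likewise assumptions made for illustration.

```python
def disambiguate(analyzed):
    """Resolve ambiguous candidate tags using the tags of the following word.

    `analyzed` is a list of (word, candidate_tags) pairs, as a morphological
    analyzer might return them. Words that no rule covers stay ambiguous.
    """
    result = []
    for i, (word, tags) in enumerate(analyzed):
        tags = list(tags)
        if len(tags) > 1:
            next_tags = list(analyzed[i + 1][1]) if i + 1 < len(analyzed) else []
            # Hypothetical rule 1: a word directly before a postposition is a noun.
            if "noun" in tags and next_tags == ["postposition"]:
                tags = ["noun"]
            # Hypothetical rule 2: a word directly before an auxiliary is a verb.
            elif "verb" in tags and next_tags == ["auxiliary"]:
                tags = ["verb"]
        result.append((word, tags))
    return result


# Assumed analyzer output: the first word is taken to be ambiguous between
# a noun reading and a verb reading; the second is taken to be a postposition.
ANALYZED = [
    ("ਕਰ", ["noun", "verb"]),
    ("ਨੂੰ", ["postposition"]),
]

for word, tags in disambiguate(ANALYZED):
    print(word, tags)
```

A practical rule-based tagger would need many more rules and a richer tagset, but the control flow is the same: keep all candidate tags, then let the surrounding context eliminate the readings that cannot apply.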

Uses:

Though the output of the POS tagger is not directly useful to an ordinary user, with a little effort you can understand its output and get a feel for how things work in a typical natural language processing system. Examples of such systems include grammar checkers in word processing systems, machine translation systems, speech recognition systems etc.

Phrase Chunker

Phrase chunking is situated between POS tagging and full-blown grammatical analysis, i.e. parsing. Whereas POS tagging works only at the word level and grammatical analysis is supposed to build a tree structure of the sentence, phrase chunking assigns tags to word sequences in the sentence. Typical chunks are the noun phrase (NP) and the verb phrase (VP). A noun phrase can consist of one or more adjectives and a noun or pronoun, among other elements. A typical verb phrase can consist of a main verb, operator verbs, and an auxiliary verb. This phrase chunker is a rule-based chunking system and is the first such system for Punjabi. It groups input text into phrases using an annotation scheme developed specifically for this research work. This phrase chunker, which uses a larger tagset, reports better results than those reported for similar systems for other Indian languages that use various statistical approaches.
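The NP and VP patterns described above can be sketched as a simple greedy rule-based chunker over already-tagged words. This is an illustration only: the tag names, the "O" label for words outside any chunk, the function name chunk, and the tags assigned to the example sentence (including treating ਦੋ as an adjective) are assumptions, not this system's annotation scheme.

```python
def chunk(tagged):
    """Group (word, tag) pairs into NP, VP, or single-word "O" chunks.

    NP: zero or more adjectives followed by a noun or pronoun.
    VP: a main verb, then any operator verbs, then an optional auxiliary.
    """
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        # Try a noun phrase: skip over adjectives, then require a noun/pronoun.
        j = i
        while j < n and tagged[j][1] == "adjective":
            j += 1
        if j < n and tagged[j][1] in ("noun", "pronoun"):
            chunks.append(("NP", [word for word, _ in tagged[i:j + 1]]))
            i = j + 1
            continue
        # Try a verb phrase starting at a main verb.
        if tagged[i][1] == "main_verb":
            j = i + 1
            while j < n and tagged[j][1] == "operator_verb":
                j += 1
            if j < n and tagged[j][1] == "auxiliary":
                j += 1
            chunks.append(("VP", [word for word, _ in tagged[i:j]]))
            i = j
            continue
        # Anything else stays outside the chunks.
        chunks.append(("O", [tagged[i][0]]))
        i += 1
    return chunks


# Assumed tags for the corrected example sentence used on this page.
TAGGED = [
    ("ਦੋ", "adjective"),
    ("ਸੋਹਣੇ", "adjective"),
    ("ਮੁੰਡੇ", "noun"),
    ("ਜਾਂਦੇ", "main_verb"),
    ("ਹਨ", "auxiliary"),
]

for label, words in chunk(TAGGED):
    print(label, " ".join(words))
```

With these assumed tags, the first three words form one NP and the last two form one VP. Unlike a parser, the chunker produces a flat sequence of phrases and does not relate them to each other in a tree.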

Uses:

The phrase chunker helps you understand how sentences are built from different components. You can input a Punjabi sentence and see its words grouped into phrases to understand Punjabi phrase structure.

Contributions

This web application has benefited from discussions with various professors from Punjabi University, Patiala (India), including Dr. Gurpreet Singh Lehal, the late Dr. Shiv Sharma Joshi, Dr. Harjeet Singh Gill, Mukhtiar Singh Gill, Dr. Joga Singh, and Dr. Baldev Singh Cheema.

Last updated: Oct 06, 2021 UTC