A regular expression (RE) is a language for specifying text search strings. An RE is a pattern written in a specialized syntax, and it is used to match or find other strings or sets of strings. Regular expressions are used to search text in much the same way in both UNIX tools and MS Word.
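As a minimal sketch of the idea, the snippet below uses Python's standard `re` module to search a sentence with a pattern. The sample text and the pattern are assumptions chosen only for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "The cat sat on the mat while another Cat napped."

# The pattern is the regular expression itself:
#   \b    marks a word boundary
#   [Cc]  matches either an upper-case or a lower-case "c"
#   at    matches those two letters literally
pattern = re.compile(r"\b[Cc]at\b")

# findall returns every non-overlapping match, left to right.
matches = pattern.findall(text)
print(matches)  # ['cat', 'Cat']
```

Note that "sat" and "mat" are not matched, because the pattern requires the word to start with "c" or "C".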
- Finite State Automata: An automaton (plural: automata) is an abstract, self-propelled computing device that automatically follows a predetermined sequence of operations. A finite state automaton is an automaton with a finite number of states.
- Morphological Parsing: Morphological parsing is the task of recognizing that a word breaks down into morphemes, its smaller meaningful units, and of recovering that linguistic structure.
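To make the finite state automaton item above more concrete, here is a small sketch of a deterministic automaton written as a transition table. The language it accepts ("b", then two or more "a", then "!") and the names `TRANSITIONS`, `START_STATE`, `ACCEPT_STATES`, and `accepts` are illustrative assumptions, not part of the text above.

```python
# Transition table: (current state, input symbol) -> next state.
# Any pair that is missing from the table means "reject".
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of additional "a" symbols
    (3, "!"): 4,
}
START_STATE = 0
ACCEPT_STATES = {4}


def accepts(string):
    """Return True if the automaton ends in an accepting state."""
    state = START_STATE
    for symbol in string:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no legal move for this symbol
        state = TRANSITIONS[key]
    return state in ACCEPT_STATES


print(accepts("baaa!"))  # True
print(accepts("ba!"))    # False
```

The machine has a fixed, finite set of states and simply follows the table one symbol at a time, which is the "predetermined sequence of operations" in the definition.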
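Types of Morphemes: Morphemes, the smallest units with meaning, are of two types, stems and affixes. Parsing also has to account for the order in which they appear.
- Stems: A stem is the core meaningful unit of a word.
- Affixes: An affix attaches to a stem and adds to or modifies its meaning, as the suffix -s does in "cats".
- Word order: Morphological parsing determines the order of these units within the word.
Criteria for developing a morphological parser: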
- Lexicon: The first requirement for creating a morphological parser is a lexicon, which contains a list of stems and affixes together with some basic details about each one, such as whether a stem is a noun stem or a verb stem.
- Morphotactics: This is the model of morpheme ordering. It specifies which classes of morphemes can follow which other classes within a word. For example, in English the plural morpheme follows the noun rather than preceding it.
- Orthographic rules: These spelling rules model the changes that take place in a word, typically when two morphemes combine. For example, "city" plus the plural -s is spelled "cities", not "citys".
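The three requirements above can be sketched together in a few lines of Python. This is a toy illustration under stated assumptions, not a real parser: the lexicon entries, the feature names, the single e-insertion spelling rule, and the function name `parse` are all hypothetical.

```python
# Lexicon: stems and affixes with a basic detail for each (toy entries).
STEMS = {"cat": "noun", "fox": "noun", "walk": "verb"}
SUFFIXES = {"s": "plural", "ed": "past"}

# Morphotactics: which suffix features may follow which stem class.
# The toy model deliberately omits the verbal -s (as in "walks").
MORPHOTACTICS = {"noun": {"plural"}, "verb": {"past"}}


def parse(word):
    """Return a list of (stem, stem_class, feature) analyses."""
    analyses = []
    if word in STEMS:
        analyses.append((word, STEMS[word], None))  # bare stem
    for suffix, feature in SUFFIXES.items():
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        candidates = [stem]
        # Orthographic rule (e-insertion): an "e" is inserted before -s
        # after stems ending in s, x, or z, so undo it ("foxes" -> "fox").
        if suffix == "s" and stem.endswith("e") and stem[:-1].endswith(("s", "x", "z")):
            candidates.append(stem[:-1])
        for candidate in candidates:
            stem_class = STEMS.get(candidate)
            # Morphotactics check: is this suffix allowed after this class?
            if stem_class and feature in MORPHOTACTICS[stem_class]:
                analyses.append((candidate, stem_class, feature))
    return analyses


print(parse("cats"))    # [('cat', 'noun', 'plural')]
print(parse("foxes"))   # [('fox', 'noun', 'plural')]
print(parse("walked"))  # [('walk', 'verb', 'past')]
print(parse("cated"))   # []
```

The last call returns no analysis because the morphotactics table does not allow a past-tense suffix after a noun stem, even though both "cat" and "-ed" are in the lexicon.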