Regular languages

Martin McBride, 2018-05-09
Tags none
Categories none

A written language uses set of symbols that are be arranged according to certain rules, to express meaning.

The sentence above, of course, is written in the English language. The symbols it uses the characters of the alphabet, plus a few punctuation symbols. It has rules of spelling that allow you to construct words, and also rules of grammar that allow you to construct sentences. However, in the case of the English language, the rules are not completely precise. There are differences of opinion on the correct spelling of some words, and also some of the finer points of grammar. Not only that, but people often write English that is indisputably incorrect, but we can usually still understand it perfectly well.

Computer languages

Computer languages also have rules, but they are generally far more precisely defined than natural languages such English. For example, consider this Python code:

def sum(a, b):
    return a + b

This defines a function that adds two numbers together. Here are some of the rules:

  • The word def and return are called keywords, and must be written exactly as shown. We can't decide to use define to define our function, we can't even use a capital letter Def, that would be incorrect and the program wouldn't run.
  • We can choose our own name for the function sum, and the variables a and b. We could have called it add(x, y) for example. But there are rules about these names - the first character must be either a letter or an underscore, and the remaining characters must be letters, numbers or underscores.
  • The overall structure of the function definition must also follow rules. A simple function like this must start with def followed by the function signature and colon character. The body of the function follows this line, and must be indented.

Regular languages

A regular language is a particular type of computer language that obeys specific rules. Regular languages are often quite simple and apply to things such as number formats (eg 0xAA55 for a hex number), Huffman codes, simple file formats and similar.

There are various ways to define a regular language. One definition is that a regular language can be parsed by a finite state machine). We will look at that is more detail in that section.

Most programming languages, such as Python, Java, C etc are not regular languages, see this section

Copyright (c) Axlesoft Ltd 2021