What is a Language?

You probably speak at least one language. In computer science, we have two slightly different definitions of languages. We will see that they both actually relate very closely to the kinds of languages you know. We need to generalize the concepts you are familiar with first.

We can think of a language as having two compontents.

  1. The words in the language

  2. The rules about how to arrange the words.

Words in a Dictionary

We can’t just make up whatever words we want and still communicate. Everyone needs to have an agreed on set of acceptable words. We can think of this as a Boolean question. Give a string of text, is either is part of our language or it is not. In English, we could say that a word either appears in the dictionary or does not.

The word “cat” appears in the dictionary. It is an acceptable word in the language. The word “balsdds” does not appear in any English dictionary. It is not an acceptable word in our language.

Even using the term “word” here is not really correct. No one would really say “balsdds” is a word. As computer scientists, we could say it was a string. It is a collection of letters. When we say “word” is is already implied that we mean acceptable words. There are also languages were our implicit meaning of “word” doesn’t translate as well. A language can be pictographic.

Let us generalize these concepts. A symbol is a more general kind of word. A symbol is an element that can be used in our language. It could be a single letter, a picture, and number, a strings, or anything else. It doesn’t really matter, as long as we can communicate them.

Our dictionary is called an alphabet. An alphabet is a finite set of symbols that can be used in our language. Imagine there was a huge book that just listed every acceptable word in the English language. That would be our English alphabet. We write the alphabet with the Greek symbol Sigma \(\Sigma\).

The alphabet for English looks something like

\( \begin{align} \Sigma =& \left\{\text{a}, \text{abbot}, \cdots, \text{cat}, \cdots, \text{zebra}, \cdots \right\} \end{align} \)

We can then ask if symbols are in our alphabet. The symbol \(\in\) is represents the asking if an element is in a set.

\( \begin{align} \text{cat} \in \Sigma =& \text{True} \\ \text{balsdds} \in \Sigma =& \text{False} \end{align} \)

Grammer Rules

A language has a second component. It also has a set of grammatical rules. The following is a valid sentence in English.

The cat lived in a house.

It meets both requirements. All the symbols used are in our alphabet. Additionally, they are arranged in a meaningful way. If we reorder the words, we no longer have something that is acceptable in the English language.

Cat the house in a lived.

This is made up of acceptable symbols, but they are not part of our language because they do not meet the grammatical rules.

Language

A language has two components, an alphabet and a grammar. If an string is given, it is either part of the language or it is not.

\( \begin{align} \text{The cat lived in a house.} \in \text{English} =& \text{True} \\ \text{Cat the house in a lived.} \in \text{English} =& \text{False} \end{align} \)

This is a very brief introduction to Formal Language Theory. It covered everything we will need to study algorithms. If you would like to learn more about Formal Language Theory a good place to start is with this article.