Section I: Languages & Corpora

Section I: Languages & Corpora#

The goal of Section I is to establish the hierarchy of logical relations that govern Strings, Characters, Alphabets, Words, Languages, Sentences and Corpora. As each of these entities is introduced and defined, a new level of relations will be revealed and codified. Palindromic symmetries will manifest on each level, in slightly different but related forms. Each type of symmetry will involve, in some form or another, the concept of String Inversion, to be defined shortly.

The essence of a Palindrome lies in binding together the syntactical symmetries at every linguistic layer into a semantic whole. Indeed, it will be seen the symmetrical structure required by Palindromes in turn requires these linguistic layers to have explicit relations and specific synactical properties, regardless of their semantic interpretation. These symmetries, in turn, guide the formal development in seeking the machinery capable of expressing them.

A Character will form the first layer of the hierarchy. Characters will be regarded as a type of String; they will be seen as “atoms” or “units” of Strings. The elementary operation of Concatenation will be used to build up Strings of greater complexity starting from Character. This much agrees with formal language theory and context free grammar constructions, although the details will differ.

Where the current formalization will start to diverge is at the next level of the semantic hiearchy. A Word will be considered another type of String. Colloquially, a Word can be understood as a String with semantic content, but as this formal system will establish, a Word is not distinguished solely by its semantic content, and in fact, it’s semantic interpretation is dependent on prior syntactical conditions that differentiate Words from Strings; This fact is often not made explicit in formal treatmeants of language, but the study of palindromes makes this distinction critical. In the current work, Words will be treated as linguistic entities that are distinguishable by their non-zero length and lack of Delimiters.

Finally, a Sentence will also be considered a type of String. A Sentence will be regarded as a sequence of Words that have been delimited together and selected as bearing semantic content. A specialization of concatenation in the form of Limitations will be introduced to describe this structure of Sentences. This operation will in turn define a subdomain within the set of all finite Strings that is constructed from the permutation of delimited Words. This set will allow proofs to leverage induction to establish results over the entire set of all Sentences.

Section I will elaborate the necessary syntactic conditions for a String to be distinguished as a formal Word or a formal Sentence, without taking into account the semantic content that is assigned to either entity through colloquial use. In other words, this section seeks to formally disentangle the syntactical functions of Words, Sentences and Strings.

Section I.I: Formalization#

Note

All of the terminology presented in this section will be elaborated upon in the subsequent sections.

Conventions#

General conventions adopted throughout the course of this work are given below.

\(N_n\) will represent the set of natural numbers starting at 1 and ending at n,

\[N_n = \{ 1, 2, ... , n \}\]

The cardinality of a set \(A\) will be denoted \(\lvert A \rvert\)
∎ will be used to denote the ending of all definitions, examples and proofs.
The terms “set” and “class” are used interchangeably.

Constants#

Terms#

Important

The exact meaning of these symbols should be attended with care. \(\mathfrak{a}, \mathfrak{b}, \mathfrak{c}, ...\) represent Characters of the Alphabet and thus are all unique, each one representing a different linguistic element. When Character symbols are used with subscripts, \(\mathfrak{a}_1, \mathfrak{a}_2, \mathfrak{a}_3, ...\), they are being referenced in their capacity to be ordered within a String. With this notation, it is not necessarily implied \(\mathfrak{a}_1\) and \(\mathfrak{a}_2\) are unequal Character-wise, but that they are differentiated only by their relative order in a String.

Likewise, when Character Variables are used with subscripts, it is meant to refer to the capacity of a Character Variable to be indeterminate at a determinate position within a String.

Moreover, the range of a Character Variable is understood to be the Alphabet \(\Sigma\) from which it is being drawn.

Relations#

Sets#

Section I.II: Strings#

All non-Empty Characters belong to the Alphabet,

\[\Sigma = \{ \mathfrak{a}, \mathfrak{b}, \mathfrak{c}, ... \}\]

Important

The Delimiter belongs to the Alphabet.

\[\sigma \in \Sigma\]

The aggregate of the Alphabet and the Empty Character is referred to as the Total Alphabet and is denoted,

\[\Sigma_{e} = \Sigma \cup \{ \varepsilon \}\]

A Character is the basic unit of a String. In order to construct a String or set of Strings, an Alphabet must be selected. A String is regarded as a linguistic artifact or inscription that is defined entirely by its Characters and their ordering. In order to construct more complicated Strings through the sequencing of Characters, the operation of Concatenation must be defined.

Concatenation#

Important

Many of the results of regular expressions and automata theory are taken as given and will not be proved, such as the associativity of concatenation (i.e. \((ut)v = u(vt)\)), the closure of concatenation over \(S\) (i.e., concatenating two Strings will always yield a String), etc.

Example Let \(s_1 = \mathfrak{abc}\) and \(s_2 = \mathfrak{def}\). The concatenation of these two Strings \({s_1}{s_2}\) is written,

\[{s_1}{s_2} = (\mathfrak{abc})(\mathfrak{def})\]

Using the Inductive Clause, this concatenation can be grouped into simpler concatenations as follows,

\[\mathfrak{a}(\mathfrak{b}(\mathfrak{c}(\mathfrak{d}(\mathfrak{ef})))) = (((((\mathfrak{ab})\mathfrak{c})\mathfrak{d})\mathfrak{e})\mathfrak{f}) = \mathfrak{abcdef}\]

By Character Comprehension Axiom, all Characters are Strings and concatenation is closed under \(S\), therefore, \(\mathfrak{ef} \in S\). As each nested concatenation is evaluated, the Induction clause in Concatenation ensures the next level of concatenation is a String.

As a result, \({s_1}{s_2} = \mathfrak{abcdef}\) and \({s_1}{s_2} \in S\)

∎

String Length#

The length of a String is defined as its number of non-Empty Characters.

Example Let \(s_1 = \mathfrak{abc}\varepsilon\mathfrak{def}\). Using Concatenation, this can be grouped as \(s_1 = (\mathfrak{abc}\varepsilon\mathfrak{de})(\mathfrak{f})\).

Applying String Length to \(\mathfrak{f}\) where \(u = \mathfrak{f}\) and \(v = \varepsilon\),

\[l(\mathfrak{f}) = l(\varepsilon) + 1 = 0 + 1 = 1\]

Note

This same logic generalizes to all Alphabetic Characters,

\[\forall \iota \in \Sigma: l(\iota) = 1\]

Applying String Length with \(u = \mathfrak{abc}\varepsilon\mathfrak{de}\) and \(v = \mathfrak{f}\),

\[l(\mathfrak{abc}\varepsilon\mathfrak{def}) = l(\mathfrak{abc}\varepsilon\mathfrak{de}) + 1\]

The first term on the righthand side can be evaluated by applying String Length with \(u = \mathfrak{abc}\varepsilon\mathfrak{d}\) and \(v = \mathfrak{e}\),

\[l(\mathfrak{abc}\varepsilon\mathfrak{def}) = (l(\mathfrak{abc}\varepsilon\mathfrak{d}) + 1) + 1\]

Continuing in this fashion, the result is calculated,

\[l(s_1) = 6\]

∎

The definition of String length allows an important shorthand to be defined. This notation introduces nothing new into the system, but significantly improves the readability of proofs.

Note

The notation \(s[i]\) is borrowed directly from string slicing in computer science.

The following example shows how the definition of Character indexing “skips” over the physical index of Empty Characters and assigns a logical index to any non-Empty Characters in a String.

Example Let \(s_1 = \mathfrak{ab}\varepsilon\mathfrak{c}\). By String Length, \(l(s_1) = 3\).

Consider \(s_1[3]\). Apply the definition of Character Indices with \(u_1 =\mathfrak{ab}\varepsilon\) and \(v_1 = \mathfrak{c}\). \(i = l(s_1)\) and \(v_1 \neq \varepsilon\), therefore, by the Induction clause, \(s[3] = \mathfrak{c}\).

Consider \(s_1[2]\). Apply the definition of Character Indices with \(u_1 =\mathfrak{ab}\varepsilon\) and \(v_1 = \mathfrak{c}\). At this step, \(v_1 \neq \varepsilon\) but \(i \neq l(s_1)\), so the \(s_1[i] = u_1[i]\). Note \(l(u_1) = 2\).

To find \(u_1[i]\), let \(u_1 = {u_2}{v_2}\) where \(u_2 = \mathfrak{ab}\) and \(v_2 = \varepsilon\). At this step, \(i = l(u_1)\), but \(v_2 = \varepsilon\), therefore \(u_1[i] = u_2[i]\). Note \(l(u_2) = 2\).

To find \(u_2[i]\), let \(u_2 = {u_3}{v_3}\) where \(u_3 = \mathfrak{a}\) and \(v_3 = \mathfrak{b}\). At this step, \(i = l(u_2)\) and \(v_3 \neq \varepsilon\), therefore \(u_2[i] = v_3 = \mathfrak{b}\).

From this, it follows, \(s_1[2] = u_1[2] = u_2[2] = v_3 = \mathfrak{b}\).

∎

The first theorem confirms the well known result that String Length sums over concatenation within the formal system.

Note

The proof of Theorem 1.2.1 is a standard proof included in virtually every textbook on the subject of regular expressions, automata, formal language, etc. For this reason, the proof has been omitted from the main body of the work. A proof by induction is presented in Appendix, Omitted Proofs for completionists.

String Equality#

Two Strings are said to be equal if they have the same length and their corresponding Alphabetic Characters (\(\iota \in \Sigma\)) are equal.

Example Let \(s_1 = \mathfrak{ab}\) and \(s_2 = \mathfrak{a}\varepsilon\mathfrak{b}\). Apply String Length,

\[l(s_1) = l(s_2) = 2 = n\]

Now, \(N_n = { 1, 2 }\). Using Character Indices,

\[s_1[1] = s_2[1] = \mathfrak{a}\]

\[s_1[2] = s_2[2] = \mathfrak{b}\]

Therefore, \(\forall i \in N_n: s_1[i] = s_2[1]\). It follows from these facts and application of String Equality,

\[s_1 = s_2\]

∎

Containment#

The notion of containment is the formal explication of the colloquial relation of “being a substring of”.

Example Let \(s_1 = \mathfrak{abcdef}\). Then the truth of the following propositions can be verified using the given values of \(w_1\) and \(w_2\) in the definition of Containment.

\(\mathfrak{ab} \subset_s s_1\), where \(w_1 = \varepsilon\) and \(w_2 = \mathfrak{cdef}\).
\(\mathfrak{cde} \subset_s s_1\), where \(w_1 = \mathfrak{ab}\) and \(w_2 = \mathfrak{f}\).
\(\neg (\mathfrak{g} \subset_s s_1)\), for any \(w_1, w_2\)

∎

Note

This is another standard theorem in formal theory of strings. See Appendix, Omitted Proofs for a proof.

String Inversion#

Note

Many of the theorems in this section are also standard results in the formal theory of strings, but the mechanics of the proofs are materially different from their usual constructions, due to the definition of String Inversion as the reversal of logical Character indices within the String, rather than an induction on a recursive definition of reversal (which canonical proofs found in textbooks typically follow). Since these proofs illustrate the simplification affected by Character indices, they have been retained in the main body of the work.

Example Let \(s_1 = \mathfrak{abc}\). Let \(s_2 = {s_1}^{-1}\). The inverse can be constructed through its Character Indices by applying String Inversion,

\[s_2[1] = s_1[3 - 1 + 1] = s_1[3] = \mathfrak{c}\]

\[s_2[2] = s_1[3 - 2 + 1] = s_1[2] = \mathfrak{b}\]

\[s_2[3] = s_1[3 - 3 + 1] = s_1[1] = \mathfrak{c}\]

Concatenating the results,

\[s2 = {s_1}^{-1} = \mathfrak{cba}\]

∎

Proof Let \(s \in S\). Let \(t = s^{-1}\). Let \(n = l(s)\). From String Inversion,

\[l(t) = l(s) = n \quad \text{ (1) }\]

\[\forall i \in N_n: t[i] = s[n - i + 1] \quad \text{ (2) }\]

Let \(u = t^{-1}\). Applying String Inversion again,

\[l(u) = l(t) = n \quad \text{ (3) }\]

\[\forall j \in N_n: u[j] = t[n - j + 1] \quad \text{ (4) }\]

Plugging \(i = n - j + 1\) into (2) and substituting into (4),

\[\forall j \in N_n: u[j] = s[n - (n - j + 1) + 1] = s[j] \quad \text{ (5) }\]

Moreover, from (1) and (3), it follows,

\[l(s) = l(u) \quad \text{ (6) }\]

By the String Equality, (5) and (6) together imply,

\[u = t^{-1} = (s^{-1})^{-1} = s\]

Therefore,

\[\forall s: (s^{-1})^{-1} = s\]

∎

Proof Let \(s,t \in S\). Let \(u = st\). Let \(m = l(s)\) and \(n = l(t)\). Let \(u = st\). By Theorem 1.2.1,

\[l(u) = l(st) = l(s) + l(t) = m + n\]

Let \(v = u^{-1} = (st)^{-1}\). Let \(w = (t)^{-1}(s)^{-1}\). By repeated application of String Inversion,

\[l(v) = l(st) = m + n \quad \text{ (1) }\]

\[l((t)^{-1}) = l(t) = n\]

\[l((s)^{-1}) = l(s) = m\]

Using these results and applying Theorem 1.2.1 to \(w\),

\[l(w) = l((s)^{-1}) + l((t)^{-1}) = m + n \quad \text{ (2) }\]

From (1) and (2), it follows,

\[l(v) = l(w) \quad \text{ (3) }\]

Let \(i \in N_{m+n}\).

Case 1: \(i \leq i \leq n\)

By String Inversion,

\[v[i] = u[m + n - i + 1]\]

By assumption \(i \leq n\) or \(n - i \geq 0\), therefore,

\[m + n - i \geq m\]

Increasing the LHS of this inequality does not affect the truth of its assertion,

\[m + n - i + 1 \geq m\]

From this, \(u = st\) and \(l(s) = m\), it follows that \(u[m + n - i + 1]\) is an index in \(t\),

\[v[i] = t[n - i + 1] \quad \text{ (4) }\]

Consider \(w[i]\). Since \(l((t)^{-1}) = n\) and \(i \leq n\), it follows that \(w[i] = (t^{-1})[i]\). By String Inversion,

\[w[i] = t^{-1}[i] = t[n - i + 1] \quad \text{ (5) }\]

Combining (4) and (5),

\[v[i] = w[i] \quad \text{ (6) }\]

Applying String Equality, (3) and (6) imply,

\[v = w\]

Case 2: \(n + 1 \leq i \leq m + n\)

By String Inversion,

v[i] = u[m + n - i + 1]

By assumption \(i \geq n + 1\) or \(n - i + 1 \leq 0\), therefore,

\[m + n - i + 1 \leq m\]

From this, \(u = st\) and \(l(s) = m\), it follows that \(u[m + n - i + 1]\) is an index in \(s\),

\[v[i] = s[m + n - i + 1] \quad \text{ (7) }\]

Consider \(w[i]\). Since \(l((t)^{-1}) = n\) and \(i \geq n\), it follows that \(w[i] = (s^{-1})[i - n]\). By String Inversion,

\[w[i] = s^{-1}[i-n] = s[m - (i - n) + 1]\]

\[w[i] = s[m + n - i + 1] \quad \text{ (8) }\]

Combining (7) and (8),

\[v[i] = w[i] \quad \text{ (9) }\]

Applying String Equality, (3) and (6) imply,

\[v = w\]

In both cases, the theorem is proved. Summarizing and generalizing,

\[\forall s,t \in S: (st)^{-1} = (t^{-1})(s^{-1})\]

∎

Proof Let \(s,t \in S\).

(\(\rightarrow\)) Assume \(t \subset_s s\). Then by Containment, there exists \(w_1, w_2 \in S\) such that,

\[s = (w_1)(t)(w_2)\]

Consider \(s^{-1}\). Applying Theorem 1.2.4 twice, this becomes,

\[s^{-1} = (w_2)^{-1}(t)^{-1}(w_1)^{-1}\]

Therefore, there exists \(u_1 = {w_2}^{-1}\) and \(u_2 = {w_1}^{-1}\) such that \(s^{-1} = (u_1)(t^{-1})(u_2)\) and by the definition of Containment,

\[t^{-1} \subset_s s^{-1}\]

(\(\leftarrow\)) The proof is identical to (\(\rightarrow\)).

Therefore,

\[\forall s,t \in S: t \subset_s s \equiv t^{-1} \subset_s s^{-1}\]

∎

Section I.III: Words#

Important

To reiterate the introduction to this section, the current formal system does not seek to describe a generative grammar. Its theorems cannot be used as schema for generating grammatical sentences. The intent of this analysis is to treat Words as interpretted constructs embedded in a syntactical structure that is independent of their specific interpretations.

A Word is a type of String constructed through concatenation that has been assigned by semantic content. A Language is the aggregate of all Words.

\[\forall \alpha \in L: \alpha \in S\]

Or equivalently,

\[L \subset S\]

Word Classes#

Note

\(R\) may be defined equivalently through set builder notation,

\[R = \{ \alpha \in L \mid \alpha = {\alpha}^{-1} \}\]

Example The following table lists some reflective English words.

Word
mom
dad
noon
racecar
madam
level
civic

∎

Important

A Word is invertible if and only if its inverse belongs to the Language.

Example The following table lists some English words and their inverses (where applicable).

Word	Inverse
time	emit
saw	was
raw	war
dog	god
pool	loop
cat	x
you	x
help	x
door	x
book	x

∎

Note

Invertible Words are often called semiordnilaps in other fields of study.

Proof Let \(\alpha \in L\).

(\(\rightarrow\)) Assume \(\alpha \in I\). By the definition of invertible Words,

\[{\alpha}^{-1} \in L\]

By Theorem 1.2.3,

\[({\alpha}^{-1})^{-1} = \alpha\]

Therefore, by assumption,

\[({\alpha}^{-1})^{-1} \in L\]

By the definition of invertible Words,

\[{\alpha}^{-1} \in I\]

(\(\leftarrow\)) Assume \({\alpha}^{-1} \in L\) such that \({\alpha}^{-1} \in I\). By the definition of invertible Words,

\[({\alpha}^{-1})^{-1} \in L\]

By Theorem 1.2.3,

\[\alpha \in L\]

Since \({\alpha}^{-1} \in L\) by assumption, it follows immediately from the definition of invertible Words,

\[\alpha \in I\]

Summarizing and generalizing,

\[\forall \alpha \in L: \alpha \in I \equiv {\alpha}^{-1} \in I\]

∎

Proof Let \(\alpha in R\) and \(l(\alpha) = n\). By the definition of Reflective Words,

\[\alpha = \alpha^{-1}\]

Since \(\alpha \in L\) by assumption, it follows \(\alpha in I\). In other words,

\[\alpha \in R \implies \alpha \in I\]

But this is exactly the definition of the subset relation in set theory, therefore,

\[R \subset I\]

∎

Limitations#

Note

A Limitation, though notationally complex, can be understood as shorthand for the iterated concatenation of words and Delimiters. is the presence of the Delimiter in the Induction clause. In other words, a Limitation inserts Delimiters inbetween each Word in the Lexicon over which the index is ranging.

Example Let \(L = L_english\). Consider calculating the Limitation of the following Phrase,

\[P_3 = (\text{mother}, \text{may}, \text{i})\]

Apply the Basis clause Limitations ,

\[n = 1: \quad \Pi_{i=1}^{1} \alpha_i = \text{mother}\]

The Limitation can then be built up recursively using the Induction clause,

\[n = 2: \quad \Pi_{i=1}^{2} \alpha_i = (\Pi_{i=1}^{1} \alpha_i)(\sigma)(\text{may})= (\text{mother})(\sigma\text{may}) = \text{mother}\sigma\text{may}\]

\[n = 3: \quad \Pi_{i=1}^{3} \alpha_i = (\Pi_{i=1}^{2} \alpha_i)(\sigma)(\text{i}) = (\text{mother}\sigma\text{may})(\sigma\text{i}) = \text{mother}\sigma\text{may}\sigma\text{i}\]

So the Limitation of the Phrase is shown to be,

\[\Pi_{i=1}^{3} \alpha_i = \text{mother may I}\]

Important

The result of a Limitation is a String. Since a Limitation is shorthand for alternating concatenation of Characters and Delimiters, the closure of Limitations over \(S\) is guaranteed by the closure of concatenation over \(S\)

∎

This subsection closes with a definition that will be used to quantify a theorem regarding Word Length.

Warning

The type of each set defined in this section should be carefully analyzed.

A Phrase is an ordered set of Words.
A Lexicon is the set of all Phrases of a fixed Word Length.
A Dialect is the set of Strings formed by delimiting every Phrase in every Lexicon of a Language.

Example Let \(L = \{ \text{hakuna}, \text{matata} \}\). Then, the first few Lexicons are given below,

\[L_1 = \{ \{ (1, \text{hakuna}) \}, \{ (1, \text{matata}) \} \}\]

\[L_2 = \{ \{ (1, \text{hakuna}), (2, \text{hakuna}) \} \{ (1, \text{hakuna}), (2, \text{matata}) \}, \{ (1, \text{matata}), (2, \text{hakuna}) \} \{ (1, \text{matata}), (2, \text{matata}) \} \}\]

\[\text{"..."}\]

The Dialect is the union of all delimited Phrases in all Lexicons of the Language,

\[D = \{ \text{hakuna}, \text{matata}, \text{hakuna hakuna} \text{hakuna matata}, \text{matata hakuna}, \text{matata matata}, ... \}\]

∎

Canonization#

Canonization is a function defined over \(s \in S\) that produces the canonical form of a String by removing all instances of the Empty Character from it.

Example Let \(s_1 = (\mathfrak{a})(\varepsilon)(\mathfrak{b})\).

Let \(u_1 = (\mathfrak{a})(\varepsilon)\) and \(v_1 = \mathfrak{b}\). Note \(v_1 \in \Sigma\) and \(s_1 = (u_1)(v_1)\). By the Induction clause of Canonization,

\[\pi(s_1) = (\pi(u_1))(\v_1)\]

Let \(u_2 = \mathfrak{a}\) and \(v_2 = \varepsilon\). Note \(u_1 = (u_2)(v_2)\). By the Induction clause,

\[\pi(u_1) = \pi(u_2)\]

Let \(u_3 = (\varepsilon)\) and \(v_3 = \mathfrak{a}\). Note \(v_3 \in \Sigma\) and \(u_2 = (u_3)(v_3)\). By the Induction clause,

\[\pi(u_2) = (\pi(u_3))(v_3)\]

By the Basis clause,

\[\pi(u_3) = \varepsilon\]

Putting the recursion together,

\[\pi(s_1) = ((\varepsilon)(v_3))(\v_1)\]

\[\pi(s_1) = (\varepsilon)(\mathfrak{ab})\]

By the Basis clause of Concatenation, this becomes,

\[\pi(s_1) = \mathfrak{ab}\]

∎

Canonization provides a method of “cleaning” \(S\) of troublesome Strings, such as \(\mathfrak{a}\varepsilon\mathkfra{b}\) that prevent the assertion of uniqueness within the semantic domains of \(L\) and \(C\). The Canon provides a domain within \(S\) where the uniqueness of the Limitation can be established.

Proof Let \(n \in \mathbb{N}\) and \(p \in L_n\) such that,

\[p = (\alpha_1, \alpha_2, ..., \alpha_n)\]

The proof will proceed by induction on \(n\).

Basis: Assume \(n = 1\). By Basis clause of Limitations,

\[\Pi_{i=1}^{1} p(i) = \alpha_1\]

Induction: Assume for \(k \geq 1\), these exists a unique String \(s_k\) such that,

\[s_k = \Pi_{i=1}^{k} p(i)\]

By Induction clause of Limitations,

\[\Pi_{i=1}^{k+1} p(i) = (\Pi_{i=1}^{k} p(i))(\sigma)(\alpha_{k+1})\]

By inductive hypothesis,

\[s_{k+1} = \Pi_{i=1}^{k+1} p(i) = ({s_k})(\sigma)(\alpha_{k+1})\]

Therefore, by induction,

\[\forall n \in \mathbb{N}: \forall p \in L_n: \exists! s \in \mathbb{S}: s = \Pi_{i=1}^{n} p(i)\]

∎

Proof Let \(s \in S\). The proof proceeds by induction on \(s\).

Basis Let \(s = \varepsilon\). By the definition Canonization,

\[\pi(\varepsilon) = \varepsilon.\]

Let \(t = \pi(\varepsilon)\). Consider,

\[\pi(t) = \pi(\pi(\varepsilon)) = \pi(\varepsilon) = \varepsilon\]

Induction Assume \(\pi(\pi(t)) = \pi(t)\) for some \(t \in S\). Let \(s = (t)(\iota)\) where \(\iota \in \Sigma_e\). Either \(\iota = \varepsilon\) or \(\iota \neq \varepsilon\).

Case I: \(\iota = \varepsilon\)

By the Induction clause of Canonization,

\[\pi(s) = \pi(t)\]

By the Basis clause of Concatenation,

\[s = (t)(\varepsilon) = t\]

Therefore, by inductive hypothesis,

\[\pi(s) = \pi(t) = \pi(\pi(t)) = \pi(\pi(s))\]

Case II \(iota \neq \varepsilon\)

By the Induction clause of Canonization,

\[\pi(s) = \pi(t\iota) = \pi(t)(\iota)\]

Now the String \(u = \pi(t)\) belongs to the Canon, \(u \in \mathbb{S}\), and must therefore be a String free of \(\varepsilon\). Likewise, \(\iota \neq \varepsilon\) by assumption. Therefore, \(u\iota\) is also a String free of \(\varepsilon\). From this and the definition of Canonization, it follows \(\pi(u\iota) = u\iota\),

\[\pi(s) = u\iota\]

Consider,

\[\pi(\pi(s)) = \pi(u\iota) = u\iota\]

Therefore,

\[\pi(s) = \pi(\pi(s))\]

And the induction is established. Summarizing and generalizing,

\[\forall s \in S: \pi(s) = \pi(\pi(s))\]

∎

Proof Let \(s \in S\).

(\(\leftarrow\)) Assume \(s = \pi(s)\). By the definition of Canon, any String that is the result of Canonization belongs to the Canon, therefore \(s \in \mathbb{S}\).

(\(\rightarrow\)) Assume \(s \in \mathbb{S}\). By the definition of Canon, there must exist a \(t \in S\) such that \(\pi(t) = s\). Consider \(\pi(\pi(t))\). By Theorem 1.3.4,

\[\pi(\pi(t)) = \pi(t)\]

Substituting \(\pi(t) = s\),

\[\pi(s) = s\]

Therefore, the equivalence is established.

∎

Proof Let \(t \in S\). The proof will proceed by induction on \(t\).

Basis: Let \(s \in \mathbb{S}\). Let \(t = \varepsilon\). By the Basis clause of Canonization and the definition of \(Canon <palindromics-definition-1-3-7>\), \(t \in \mathbb{S}\)

Consider \(st\). By the Basis clause of \(Concatenation <palindromics-definition-1-2-1>\), \(st = s\varepsilon = s\). But \(s \in mathbb{S}\) by assumption, thus \(st \in mathbb{S}\).

Induction. Assume \(u \in \mathbb{S}\) such that \(su \in \mathbb{S}\). By Theorem 1.3.5,

\[\pi(su) = su \quad (1)\]

Let \(t = (u)(\iota)\) where \(\iota \in \Sigma\). Consider \(st\),

\[st = (s)(u)(\iota) = (su)(\iota) \quad (2)\]

Where the last equality follows from the associativity of concatenation. By inductive hypothesis, \(su \in \mathbb{S}\). Moreover, \(\iota \in \mathbb{S}\) since \(\pi(\iota) = \iota\). Therefore, by definition of \(Canonization <palindromics-definition-1-3-7>\)

\[\pi(st) = \pi(su)\iota\]

Substituting in (1) and (2)

\[\pi(st) = (su)\iota = st\]

By Theorem 1.3.5,

\[st \in \mathbb{S}\]

Thus, the induction is complete. Summarizing and generalizing,

\[\forall s,t \in \mathbb{S}: st \in \mathbb{S}\]

Section I.IV: Sentences#

A Sentence is a Limitation of Words over a Phrase in the Language’s Lexicon for any value of \(n \geq 1\).

Warning

This statement should not be interpretted as a schema for generating grammatical sentences. In general, Limitations are not grammatical. However, all grammatical sentences are Limitations.

In other words, this statement should be interpretted as a necessary syntactic pre-condition a Sentence must satisfy before it may be assigned semantic content.

A Corpus is the aggregate of all Sentences.

\[\forall \zeta \in C: \exists n: \zeta = \Pi_i^{n} p(i)\]

Note

The value of \(n\) in the preceding equation will be further specified after several definitions and theorems. It will be shown to be directly and necessarily related to the Word structure of \(\zeta\).

The full semantic hierarchy has now been formalized. The hierarchy is summarized as follows,

Strings: \(\iota, \alpha, \zeta\)
Sets: \(\Sigma, L, C\)
Character Membership: \(\iota \in \Sigma\)
Word Membership: \(\alpha \in L\)
Sentence Membership: \(\zeta \in C\)

These observations can be rendered into English,

All Characters, Words and Sentences are Strings.
The Alphabet, Languages and Corpus are sets of Strings.
All non-Empty Characters belong to an Alphabet.
All Words belong to the Language.
All Sentences belong to the Corpus.

Word Length#

Important

The Induction clause of Word Length relies on the Discovery Axiom and the Measureable Axiom to ensure for any Strings \(u \in L\), \(\neg(\sigma \subset_s u)\) and \(u \neq \varepsilon\).

Important

While Word Length will be primarily used on \(\zeta \in C\), it is important to note the definition is defined over all \(s \in S\). In other words, Word Length is a property of Strings, as can be seen in the example, “blargafaful buttons”.

Example Let \(ᚠ = \text{truth is beauty}\).

Let \(u_1 = \text{truth}\) and \(v_1 = \text{is beauty}\). Then \(u_1 \in L_{\text{english}}\) and \(ᚠ = (u_1)(\sigma)(v_1)\). Apply the Induction clause of Word Length,

\[\Lambda(ᚠ) = \Lambda(v_1) + 1\]

Let \(u_2 = \text{is}\) and \(v_2 = \text{beauty}\).

Important

A selection of \(u_2 = \text{i}\) or \(u_2 = \text{is be}\) would not satisfy the condition \(s = {u}{\sigma}{v}\) in the Induction clause, which requires \(u\) and \(v\) to be delimited with \(\sigma\).

Then \(u_2 \in L_{\text{english}}\) and \(v_1 = (u_2)(\sigma)(v_2)\). Apply the Induction clause of Word Length,

\[\Lambda(v_1) = \Lambda(v_2) + 1\]

Finally, note \(v_2 \in L_{\text{english}}\) and apply the Basis clause to \(v_2\),

\[\Lambda(v_2) = 1\]

Putting the recursion together,

\[\Lambda(ᚠ) = (1 + 1) + 1 = 3\]

∎

Example Let \(ᚠ = \text{palindromes vorpal semiordinlap}\)

Let \(u_1 = \text{palindromes}\) and \(v_1 = \text{vorpal semiordinlap}\). Then \(u_1 \in L_{\text{english}}\) and \(ᚠ = (u_1)(\sigma)(v_1)\). Apply the Induction clause of Word Length,

\[\Lambda(ᚠ) = \Lambda(v_1) + 1\]

Let \(u_2 = \text{vorpal}\) and \(\v_2 = \text{semiordinlap}\). Then \(u_2 \notin L_{\text{english}}\) and \(v_1 = (u_2)(\sigma)(v_2)\). Apply the Induction clause of Word Length,

\[\Lambda(v_1) = \Lambda(v_2)\]

Finally, note \(v_2 \in L_{\text{english}}\) and apply the Basis clause to \(v_2\),

\[\Lambda(v_2) = 1\]

Putting the recursion together,

\[\Lambda(ᚠ) = (1 + 1) = 2\]

∎

Important

As these examples demonstrate, the Word Length of a String is always relative to a given a Language. A subscript will be used to denote whether a Word Length is relative to a particular language,

\[\Lambda_{\text{english}}(\text{closing sale}) = 2\]

Whereas,

\[\Lambda_{\text{italian}}(\text{closing sale}) = 1\]

Example Let \(L = L_{\text{english}}\). Let \(ᚠ = \text{observe how system into system runs}\). Consider \(ᚠ[[3]]\).

Let \(u_1 = \text{observe}\) and \(v_1 = \text{how system into system runs}\). Then \(ᚠ = (u_1)(\sigma)(v_1)\), \(u_1 \in L\) and \(3 > 1\). Therefore, by the Induction clause of Word Indices,

\[ᚠ[[3]] = v_1[[3-1]] = v_1[[2]]\]

At the next step, let \(u_2 = \text{how}\) and \(v_2 = \text{system into system runs}\). Then \(v_1 = (u_2)(\sigma)(v_2)\), \(u_2 \in L\) and \(2 > 1\),

\[v_1[[2]] = v_2[[1]]\]

At the next step, let \(u_3 = \text{system}\) and \(v_3 = \text{into system runs}\). Then \(v_2 = (u_3)(\sigma)(v_3)\), \(u_3 \in L\) but \(1 = 1\), therefore,

\[ᚠ[[3]] = v_1[[2]] = v_2[[1]] = u_3 = \text{system}\]

∎

Example Let \(ᚠ = \text{the gobberwarts with my blurglecruncheon}\). Consider \(ᚠ[2]\).

Let \(u_1 = \text{"the"}\) and \(v_1 = \text{gobberwarts with my blurglecruncheon}\). Then \(ᚠ = (u_1)(\sigma)(v_1)\), \(u_1 \in L\) and \(2 > 1\). Therefore, by the Induction clause of Word Indices,

\[ᚠ[[2]] = v_1[[2-1]] = v_1[[1]]\]

At the next step, let \(u_2 = \text{gobberwarts}\) and \(v_2 = \text{with my blurglecruncheon}\). Then \(v_1 = (u_2)(\sigma)(v_2)\) but \(u_2 \notin L\) and \(1 = 1\), so by the first condition of the Induction clause,

\[v_1[[1]] = v_2[[1]]\]

At the next step, let \(u_3 = \text{with}\) and \(v_3 = \text{my blurglecruncheon}\). Then \(v_2 = (u_3)(\sigma)(v_3)\), \(u_3 \in L\) and \(1 = 1\). So, the second condition of the Induction clause,

\[ᚠ[[2]] = v_1[[1]] = v_2[[1]] = u_3 = \text{with}\]

∎

The next theorems will not be required for the final postulates, but they are given to indicate the type of results that may be established regarding the concept of Word Length. For the curious reader, the details can be found in Appendix I.II: Omitted Proofs.

Note

Theorem 1.4.1 and Theorem 1.4.2 demonstrate Word Length is fundamentally different than String Length with respect to the operation of concatenation. In Theorem 1.2.1, it was shown String Length sums over concatenation. Theorem 1.4.1 shows the corresponding property is not necessarily true for Word Length. This is an artifact of the potential destruction of semantic content that may occur upon concatenation.

The edge case of compound Words (e.g. daylight) makes the proof Theorem 1.4.2 particularly interesting.

Sentence Axioms#

The following theorem is proved in Appendix I.II: Omitted Proofs, as it is not required for the results in Section III. This theorem demonstrates the relationship between a Limitation and Word Length that was pointed out in the introduction of this subsection.

Note

The next theorem can be seen as a specialiation of Theorem 1.2.4 for the subdomain of the Corpus.

Proof Let \(\zeta \in C\). Let \(n = \Lambda(\zeta)\). Let \(s\),

\[s = \Pi_{i=1}^{n} \zeta[[i]] \quad \text{ (1) }\]

\[= (\zeta[[1]])(\sigma)(\zeta[[2]]) ... (\varsigma)(\zeta[[n]])\]

Consider \(s^{-1}\),

\[s^{-1} = ((\zeta[[1]])(\sigma)(\zeta[[2]]) ... (\varsigma)(\zeta[[n]]))^{-1}\]

From String Inversion and the fact \(l(\varsigma) = 1\), it follows \(\sigma^{-1} = \sigma\). Using this fact, the application of Theorem 1.2.4 \(n\) times yields,

\[s^{-1} = ({\zeta}^{-1}[[n]])(\sigma)({\zeta}^{-1}[[n-1]]) ... (\varsigma)({\zeta}^{-1}[[1]])\]

Reindex the terms on the RHS to match Limitation with \(j = n - i + 1\). Then, as \(i\) goes from \(1 \to n\), \(j\) goes \(n \to 1\) and visa versa,

\[= \Pi_{i=1}^{n} {\zeta[[n - i + 1]]}^{-1} \quad \text{ (2) }\]

Combining (1) and (2) and generalizing,

\[\forall \zeta in C: (\Pi_{i=1}^{n} \zeta[[i]])^{-1} = \Pi_{i=1}^{n} (\zeta[[n - i + 1]]^{-1})\]

∎

Proof Let \(s, t \in D\). That is, assume, for some \(n, m \in \mathbb{N}\),

\[s = \Pi_{i=1}^{n} p(i)\]

\[t = \Pi_{i=1}^{m} q(i)\]

where \(n = \Lambda(s)\) and \(m = \Lambda(t)\).

The proof proceeds by induction on \(n\).

Basis: Assume \(n = 1\).

Then, by the Basis clause of Limitations, \(s = \alpha\) for some \(\alpha \in L\). By the Discovery Axiom, \(\neg(\sigma \subset_s \alpha)\).

Consider \(u = (\alpha)(\sigma)(t)\). By the Basis clause of Word Length,

\[\Lambda(u) = \Lambda(\alpha) + \Lambda(t)\]

\[\Lambda((s)(\sigma)(t)) = \Lambda(s) + \Lambda(t)\]

Induction Assume for any \(u \in D\) with \(\Lambda(u) = n\),

\[\Lambda((u)(\sigma)(t)) = \Lambda(u) + \Lambda(t)\]

Let \(s \in D\) such that \(\Lambda(s) = n + 1\). By the Induction clause of the Dialects and Limitations,

\[s = (\alpha)(\sigma)(v)\]

By the Induction clause of Word Length,

\[\Lambda(s) = \Lambda(\alpha) + \Lambda(v)\]

\[\Lambda(s) = 1 + \Lambda(v) \quad \text{ (1) }\]

From this and \(\Lambda(s) = n + 1\), it is concluded \(\Lambda(v) = n\) and therefore satisfies the inductive hypothesis.

Consider \(\Lambda((s)(\sigma)(t))\).

\[\Lambda((s)(\sigma)(t)) = \Lambda((\alpha)(\sigma)(v)(\sigma)(t))\]

\[= \Lambda(\alpha) + \Lambda((v)(\sigma)(t))\]

\[= 1 + \Lambda(v) + \Lambda(t)\]

But from (1), this reduces to,

\[= \Lambda(s) + \Lambda(t)\]

Therefore, putting everything together, the Induction is complete,

\[\Lambda((s)(\sigma)(t)) = \Lambda(s) + \Lambda(t)\]

Summarizing and generalizing,

\[\forall s,t \in D: \Lambda((s)(\sigma)(t)) = \Lambda(s) + \Lambda(t)\]

∎

Important

Theorem 1.4.5 only applies to Strings quantified over the Dialect. If the theorem were quantified over the Corpus, i.e. semantic Sentences, then the inductive hypothesis would fail at the step where the induced String is decomposed,

\[s = (\alpha)(\sigma)(u)\]

To see this, note that when a Sentence has it’s first Word partitioned from it, there is no guarantee the resultant will also be a semantic Sentence, e.g. “we are the stuffed men” is a Sentence, but “are the stuffed men” is not a Sentence. Therefore, the theorem must be induced over the Dialect.

This may seem a strong restriction, but as the next two theorems establish, this result still applies to the Corpus.

Proof Let \(\zeta \in C\). By definition of a Sentence,

\[\zeta = \Pi_{i=1}^{\Lambda(\zeta) \zeta[[i]]\]

By the definition of a Dialect, \(\zeta \in D\).

Therefore, \(\zeta \in C \implies \zeta \in D\). This is exactly the definition of a subset,

\[C \subseteq D\]

∎

Proof Let \(\zeta, \xi \in C\).

By Theorem 1.4.6, \(C \subseteq D\). By definition of subsets,

\[\zeta, \xi \in C \implies \zeta, \xi \in D\]

Therefore, by Theorem 1.4.5,

forall zeta, xi in C: Lamdbda((zeta)(sigma)(xi)) = Lambda(zeta) + Lambda(xi)

∎

Proof Let \(\zeta in C\). By Theorem 1.4.3,

\[\zeta = \Pi_{i=1}^{\Lambda(\zeta)} \zeta[[i]]\]

By the Word Comprehension Axiom and Canonization Axiom,

\[\zeta[[i]] \in \mathbb{S}\]

By the definition of Canonization,

\[\pi(\sigma) = \sigma\]

By the definition of Limitation, \(\Pi\) produces Strings through Concatenation. By Theorem 1.3.4, the Canon is closed over Concatenation. From this, it must be the case \(\zeta \in \mathbb{S}\). Therefore,

\[\zeta \in C \implies \zeta \in \mathbb{S}\]

This is exactly the definition of subsets,

\[C \subset \mathbb{S}\]

∎

Sentence Classes#

Proof Let \(\zeta in C\).

(\(\rightarrow\)) Assume \(\zeta \in K\). By the definition of Invertible Sentences,

\[{\zeta}^{-1} \in C\]

By :ref:` <palindromics-theorem-1-2-3>`,

\[({\zeta}^{-1})^{-1} = \zeta\]

By assumption, \(\zeta \in C\), therefore, by the definition of Invertible Sentences,

\[{\zeta}^{-1} \in K\]

(\(\leftarrow\)) Assume \({\zeta}^{-1} \in K\), which implies \({\zeta}^{-1} \in C\). By assumption \(\zeta \in C\). Therefore, definition of Invertible Sentences,

\[\zeta \in K\]

Summarizing and generalizing,

\[\forall \zeta in C: \zeta \in K \equiv {\zeta}^{-1} \in K\]

∎

Proof Let \(\zeta \in K\). By the definition of Invertible Sentences,

\[{\zeta}^{-1} \in C\]

By Theorem 1.4.4, this can be written,

\[{\zeta}^{-1} = \Pi_{i=1}^{n} p(i)\]

where,

\[p = ( {\zeta[[n]]}^{-1}, {\zeta[[n-1]]}^{-1}, ... , {\zeta[[1]]}^{-1} )\]

By the Word Comprehension Axiom,

\[\forall i \in N_{\Lambda(\zeta)}: {{\zeta}^{-1}}[[i]] \in L\]

From this, it can be concluded every \({\zeta[[i]]}^{-1}\) in \(p\) must belong to \(L\), and each of those Words has an inverse that is also in \(L\).

By the definition of Invertible Words, the inverse of a Word can only belong to a Language if and only if the Word is invertible.

\[\forall i \in N_{\Lambda(\zeta)}: {{\zeta}^{-1}}[[i]] \in I\]

Therefore,

\[\forall i \in N_{\Lambda(\zeta)}: {\zeta[[i]]}^{-1} \in I\]

By Theorem 1.3.1,

\[\forall i \in N_{\Lambda(\zeta)}: \zeta[[i]] \in I\]

Generalizing,

\[\forall \zeta in K: \forall i \in N_{\Lambda(\zeta)}: \zeta[[i]] \in I\]

∎

Proof Let \(\zeta \in K\), let \(n = \Lambda(\zeta)\) and let \(i \in N_n\).

By Theorem 1.4.6 and assumption,

\[\forall i \in N_n: \zeta[[i]] \in I\]

By Theorem 1.3.1,

\[\forall i \in N_n: (\zeta[[i]])^{-1} \in I\]

Consider,

\[\Pi_{i=1}^{n} (\zeta[[n - i + 1]])^{-1}\]

By Theorem 1.4.4,

\[(\Pi_{i=1}^{n} \zeta[[i]])^{-1}\]

And by definition of Sentences and Limitations,

\[\zeta = \Pi_{i=1}^{n} \zeta[[i]]\]

Therefore,

\[(\zeta)^{-1} = \Pi_{i=1}^{n} (\zeta[[n - i + 1]])^{-1}\]

By the \(Theoreom 1.4.8 <palindromics-theorem-1-4-8>\), \(C \subset \mathbb{S}\). By Theorem 1.3.3, Limitations are unique over the Canon, thus the only way two Limitations that belong to the Corpus can be equal to \(\zeta^{-1}\) is when,

\[{{\zeta}^{-1}}[[i]] = (\zeta[[n - i + 1]])^{-1}\]

Summarizing and generalizing,

\[\forall \zeta \in K: \forall i \in N_{\Lambda(\zeta)}: {\zeta}^{-1}[[i]] = (\zeta[[\Lambda(\zeta) - i + 1]])^{-1}\]

Section I.V: Summary#

The analysis requires one more piece of formal machinery before it can codify the phenomenon of palindromes. However, even without the later results, Theorem 1.4.10 and Theorem 1.4.11 are particularly compelling results that demonstrate the efficacy of the current formal system and its ability to generate novel, if intuitively obvious, theorems.

The deductive path from Theorem 1.4.10 to Theorem 1.4.11 follows a “propagation of inversion” up the semantic hierarchy, from Characters to Words to Sentences.

First, String Inversion was defined as a operation performed on the Characters within a String,

\[s[i] = t[l(s) - i + 1]\]

Where \(t\) is the inverse of \(s\), \(t^{-1} = s\). This in turn defined an equivalence class over involutive Words in Reflective Words,

alpha in R equiv alpha = {alpha}^{-1}

Moreover, it created a semi-group in Invertible Words,

\[\alpha \in I \equiv {\alpha}^{-1} \in I\]

This inversion makes its way to the top layer of the semantic hierarchy with Invertible Sentences,

\[\zeta \in K \equiv {\zeta}^{-1} \in C\]

The class \(K\) then imposes a condition on all Sentences that belong to it, namely that its Words must be also invertible,

\[\zeta[[i]] \in I\]

The inversion then “propagates” up a level in the semantic hierarchy and results in a directly analogous condition on the Word-level to the Character-level symmetry,

\[{\zeta}^{-1}[[i]] = (\zeta[[\Lambda(\zeta) - i + 1]])^{-1}\]

Important

The direction of implication in Theorem 1.4.10 and Theorem 1.4.11 is unidirectional. In other words, while invertibility implies the previous two equations, invertibility cannot be concluded on the basis of the previous two equations. This is an artifact of the formal system’s inability to formalize the grammar of Sentences.

Section I: Languages & Corpora

Contents

Section I: Languages & Corpora#

Section I.I: Formalization#

Conventions#

Constants#

Terms#

Relations#

Sets#

Section I.II: Strings#

Concatenation#

String Length#

String Equality#

Containment#

String Inversion#

Section I.III: Words#

Word Classes#

Limitations#

Canonization#

Section I.IV: Sentences#

Word Length#

Sentence Axioms#

Sentence Classes#

Section I.V: Summary#