On regular languages over power sets

abstract

The power set of a finite set is used as the alphabet of a string interpreting a sentence of Monadic Second-Order Logic, so that the string can be reduced (in straightforward ways) to the symbols occurring in the sentence. Simple extensions to regular expressions are described that match the succinctness of Monadic Second-Order Logic. A link to Goguen and Burstall's notion of an institution is forged, and applied to conceptions of time based on change within natural language semantics. Various reductions of strings are described, along which models can be miniaturized as strings.

introduction
Working with more than one alphabet is established practice in finite-state language processing, attested by the popularity of auxiliary symbols (e.g., Kaplan and Kay 1994; Beesley and Karttunen 2003; Yli-Jyrä and Koskenniemi 2004; Hulden 2009). To avoid choosing an alphabet prematurely, implementations commonly treat the alphabet Σ as a dynamic entity that is left underspecified before the finite automaton is constructed in full. Fixing Σ is not always necessary to determine the language denoted by an expression. This is the case with regular expressions: the expression $\emptyset$ denotes the empty set for any alphabet Σ, and the expression $ab$ denotes the singleton set $\{ab\}$ for any alphabet $\Sigma \supseteq \{a, b\}$. Beyond regular expressions, however, there are expressions that denote different languages given different choices of the alphabet Σ. Consider the negation (or complement) $\overline{ab}$ of $ab$, which denotes a language that is regular iff Σ is a finite set. To delay fixing Σ to some finite set is to leave open just what the denotation $\Sigma^* - \{ab\}$ of $\overline{ab}$ is. Relative to an alphabet Σ, a symbol c, understood as a string of length one, belongs to that denotation if and only if $c \in \Sigma$.
(Σ contains any symbol, including c, in the open alphabet system implemented in Beesley and Karttunen 2003.) Apart from negations, there are many more extensions to regular expressions describing denotations that vary with the choice of alphabet. Consider the sentences of Monadic Second-Order Logic (MSO), which, under a model-theoretic interpretation against strings, capture the regular languages, by a fundamental theorem due independently to Büchi, Elgot and Trakhtenbrot (e.g., Theorem 3.2.11, page 145 in Grädel 2007; Theorem 7.21, page 124 in Libkin 2010). Leaving the precise details of MSO for Section 2 below, suffice it to say (for now) that occurrences of a string symbol a are encoded in a unary predicate symbol $P_a$ for an MSO-sentence such as $\forall x\, P_a(x)$, saying a occurs at every string position (satisfied by the string aaa but not by the string ab unless a = b). We can check if a string over any finite alphabet Σ (hereafter, a Σ-string) satisfies an MSO-sentence φ, but the computation gets costlier as Σ is enlarged. Surely, however, only the symbols that appear in φ matter in satisfying φ or its negation? To investigate this question, let the vocabulary of φ be the set $\mathrm{voc}(\varphi) := \{a \mid P_a \text{ occurs in } \varphi\}$ of subscripts of unary predicate symbols appearing in φ. (For example, the vocabulary $\mathrm{voc}(\forall x\, P_a(x))$ of $\forall x\, P_a(x)$ is $\{a\}$.) Now the question is: can we not reduce satisfaction of φ by a Σ-string to satisfaction of φ by a voc(φ)-string? A simple form such a reduction might take is a function $f : \Sigma^* \to \mathrm{voc}(\varphi)^*$ mapping a Σ-string s to a voc(φ)-string f(s) that satisfies φ if and only if s does:
$$s \models \varphi \iff f(s) \models \varphi \qquad \text{for all } s \in \Sigma^*. \tag{1}$$
Unfortunately, already for φ equal to $\forall x\, P_a(x)$ and Σ equal to $\{a, b\}$, it is clear that no such function f can exist: the lefthand side of (1) fails for s = ab, whereas the righthand side cannot, since $a^n \models \forall x\, P_a(x)$ for all integers n ≥ 0.
Evidently, $\mathrm{voc}(\varphi)^*$ is too small to provide the variation necessary for the reduction (1). Enter $(2^{\mathrm{voc}(\varphi)})^*$, where the power set $2^A$ of a set A is the set of all subsets of A. For any MSO-sentence φ and string $s = \alpha_1 \cdots \alpha_n$ of sets $\alpha_i$, we intersect s componentwise with voc(φ) for the $2^{\mathrm{voc}(\varphi)}$-string
$$\rho_{\mathrm{voc}(\varphi)}(\alpha_1\cdots\alpha_n) := (\alpha_1 \cap \mathrm{voc}(\varphi)) \cdots (\alpha_n \cap \mathrm{voc}(\varphi)).$$
Then for any finite set $\Sigma \supseteq \mathrm{voc}(\varphi)$ and any $s \in (2^\Sigma)^*$,
$$s \models_\Sigma \varphi \iff \rho_{\mathrm{voc}(\varphi)}(s) \models_{\mathrm{voc}(\varphi)} \varphi. \tag{2}$$
The subscripts Σ and voc(φ) on ⊨ in the lefthand and righthand sides of (2) track the reduction effected by $\rho_{\mathrm{voc}(\varphi)}$, but could otherwise be dropped, had we not already used ⊨ for the satisfaction relation mentioned in (1). Fixing φ's denotation relative to Σ as the set of $2^\Sigma$-strings that $\models_\Sigma$-satisfy φ, we may conclude from (2) that (†) whatever finite set Σ we use to fix the denotation of φ, it all comes down to voc(φ). Beyond MSO, the reduction (2) is an instance of a general condition built into an abstract model-theoretic approach to specification and programming based on institutions (Goguen and Burstall 1992). We adopt this perspective in Section 3 to generalize (2) from $\rho_{\mathrm{voc}(\varphi)}$ to functions on strings of sets that manipulate not only the vocabulary but also the length of strings (yielding, in the limit, infinite strings). At the center of this perspective are declarative methods for specifying sets of strings over different alphabets. We focus on methods, including but not limited to MSO, where the alphabets are power sets $2^\Sigma$ of finite sets Σ.
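The reduction (2) can be checked mechanically on small cases. The sketch below assumes strings of sets are Python tuples of frozensets; `rho`, `phi1`, `phi2` and the enumeration helpers are hypothetical names, and the two sentences are hand-coded as predicates rather than evaluated by a general MSO model checker.

```python
from itertools import combinations, product

def rho(A, s):
    """A-reduct of a string of sets: intersect every box with A."""
    A = frozenset(A)
    return tuple(box & A for box in s)

def powerset(sigma):
    """All subsets of sigma as frozensets, i.e. the alphabet 2^sigma."""
    elems = sorted(sigma)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

def strings(sigma, max_len):
    """Every 2^sigma-string of length <= max_len."""
    boxes = powerset(sigma)
    for n in range(max_len + 1):
        yield from product(boxes, repeat=n)

# Hand-coded truth conditions (illustrative only, not an MSO evaluator).
def phi1(s):
    """forall x P_a(x); voc = {a}."""
    return all("a" in box for box in s)

def phi2(s):
    """forall x (P_a(x) -> exists y (S(x, y) & P_b(y))); voc = {a, b}."""
    return all(i + 1 < len(s) and "b" in s[i + 1]
               for i, box in enumerate(s) if "a" in box)

# Reduction (2): over Sigma = {a, b, c}, satisfaction depends only on the voc-reduct.
sigma = {"a", "b", "c"}
for phi, voc in [(phi1, {"a"}), (phi2, {"a", "b"})]:
    assert all(phi(s) == phi(rho(voc, s)) for s in strings(sigma, 3))
```

Enumerating all strings up to length 3 is only a finite sanity check of (2), not a proof.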

Our argument for
A multiplicity of such alphabets is useful in the semantics of tense and aspect to measure time at different bounded granularities Σ, tracking finite sets of unary predicates named in Σ. Consider, for instance, Reichenbach's well-known account based on a reference time R, an event time E and a speech time S (Reichenbach 1947). We can picture various temporal relations between an event and a speech as strings of boxes that may or may not contain E or S. For example, the string $\boxed{E}\boxed{S}$ portrays S after E (much like a film or comic strip), which we can verbalize using the simple past or the present perfect, illustrated by (a) and (b) respectively (where the event with time E is Ed's exhalation). A box is drawn instead of the usual curly braces { , } for a set construed as a symbol in a string of sets. The difference brought out in (‡) carries significance for anaphora (e.g., Kamp and Reyle 1993, where R is split many ways) and event structure (including an event's consequent state, in Moens and Steedman 1988). Both strings in (‡) can be constructed from simpler strings representing a Reichenbachian analysis of (i) tense as a relation between R and S, with Σ = {R, S}, and $\boxed{R}\boxed{S}$ for the past (a) and $\boxed{R,S}$ for the present (b), and (ii) aspect as a relation between R and E, with Σ = {R, E}, and $\boxed{R,E}$ for the simple (a) and $\boxed{E}\boxed{R}$ for the perfect (b).
Complicating the picture, there are finer analyses of E into aspectual classes going back to Aristotle, Ryle and Vendler (e.g., Dowty 1979) that call for an expansion of Σ = {R, E, S} to refine the level of granularity (Fernando 2014). A wide-ranging hypothesis that the semantics of tense and aspect is finite-state is defended in Fernando (2015), deploying regular languages over power sets of the kind described below.
Applications to temporal semantics aside, the reader expecting a discussion of finite-state methods applied to phonology, morphology and/or syntax should be warned that such a discussion has been left for someone competent in such matters to take up elsewhere. The present paper claims neither to be the first nor the last word on regular languages over power sets. Its aim is simply to show how to get a handle on the dependence of certain declarative methods on the choice of a finite set Σ of symbols, by stepping up to the power set $2^\Sigma$ of Σ and reducing a string through some function $\rho_{\mathrm{voc}(\varphi)}$ or other. MSO provides an obvious point of departure (Section 2), leading to further declarative methods (Section 3).

mso and related extensions of regular expressions
It is convenient to fix an infinite set Z of symbols a that can appear in unary predicate symbols $P_a$, from which sentences of MSO are formed.
An MSO-sentence φ can have within it only finitely many unary predicate symbols $P_a$, allowing us to break MSO up into fragments given by finite subsets Σ of Z (no single one of which encompasses all of MSO).
In addition to the $P_a$'s, we assume a binary relation symbol S (for successors), from which we can form, for example, the MSO-sentence
$$\forall x\,\big(P_a(x) \supset \exists y\,(S(x, y) \wedge P_b(y))\big)$$
saying that every a-occurrence is succeeded by a b-occurrence. Formal definitions are given in Subsection 2.1 of a satisfaction relation $\models_\Sigma$ between (finite) $\mathrm{MSO}_\Sigma$-models and $\mathrm{MSO}_\Sigma$-sentences, built from $\mathrm{MSO}_\Sigma$-formulas with free variables analyzed by suitable expansions of Σ. These expansions are undone by functions $\rho_\Sigma$ on strings that arguably provide the key to predication and quantification over strings. Indeed, the $\rho_\Sigma$'s pave an easy route to the regularity of MSO, as we show in Subsection 2.2. The functions can be tweaked for useful extensions of regular expressions in Subsection 2.3, and for declarative methods in Section 3 that, like our presentation of MSO via $\models_\Sigma$, meet abstract requirements from Goguen and Burstall (1992).
In what follows, we write Fin(A) for the set of finite subsets of a set A. Often but not always, A is Z.

MSO-models, formulas and satisfaction
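We restrict our attention to finite models, defining for any integer n ≥ 0, [n] to be the set of integers from 1 to n,
$$[n] := \{1, 2, \ldots, n\}$$
and $S_n$ to be the successor (next) relation from [n − 1] to [n],
$$S_n := \{(i, i+1) \mid i \in [n-1]\}.$$
Given Σ ∈ Fin(Z), let us agree that an $\mathrm{MSO}_\Sigma$-model M is a tuple
$$M = \langle [n], S_n, \{P_a^M\}_{a \in \Sigma}\rangle$$
for some n ≥ 0, with subsets $P_a^M \subseteq [n]$ interpreting the unary predicate symbols $P_a$. There is a simple bijection str from $\mathrm{MSO}_\Sigma$-models to $2^\Sigma$-strings, picturing an $\mathrm{MSO}_\Sigma$-model M as the string
$$\mathrm{str}(M) := \boxed{\alpha_1}\cdots\boxed{\alpha_n} \quad \text{where } \alpha_i := \{a \in \Sigma \mid i \in P_a^M\},$$
which inverts to
$$\mathrm{str}^{-1}(\alpha_1\cdots\alpha_n) = \langle [n], S_n, \{\{i \in [n] \mid a \in \alpha_i\}\}_{a \in \Sigma}\rangle$$
(with $\alpha_i$ boxed, as noted in the introduction, to mark them out as string symbols). Strings of boxes with exactly one … An advantage in working with strings of sets is that, for any A ⊆ Σ, reducts can be taken componentwise,
$$\rho_A(\alpha_1\cdots\alpha_n) := (\alpha_1 \cap A)\cdots(\alpha_n \cap A)$$
(generalizing $\rho_{\mathrm{voc}(\varphi)}$ in the introduction). The A-reduct of the $\mathrm{MSO}_\Sigma$-model given by the string s is the $\mathrm{MSO}_A$-model given by $\rho_A(s)$. The difference between an $\mathrm{MSO}_\Sigma$-model M and the string str(M) is so slight that we can confuse M harmlessly with str(M) and refer to a $2^\Sigma$-string as an $\mathrm{MSO}_\Sigma$-model.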
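The bijection str and its inverse amount to transposing "positions per predicate" into "predicates per position". In the sketch below, the `Model` class and the function names are hypothetical; a model stores n and a dictionary from each a to the set interpreting $P_a$, while $S_n$ is recomputed from n since it carries no extra information.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    """A finite MSO_Sigma-model: domain [n] plus an interpretation of each P_a."""
    n: int
    interp: dict = field(default_factory=dict)   # a -> set of positions in [n]

def successor(n):
    """S_n = {(i, i+1) | 1 <= i < n}; fixed by n alone."""
    return {(i, i + 1) for i in range(1, n)}

def to_string(m, sigma):
    """str(M): box i collects every a in sigma with i in the interpretation of P_a."""
    return tuple(frozenset(a for a in sigma if i in m.interp.get(a, set()))
                 for i in range(1, m.n + 1))

def from_string(s, sigma):
    """str^{-1}: P_a is interpreted as the positions whose box contains a."""
    return Model(len(s), {a: {i for i, box in enumerate(s, start=1) if a in box}
                          for a in sigma})
```

Because the successor relation is determined by the length of the string, only the unary predicates need to be stored in the boxes.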
To form MSO-formulas with free variables, let us fix an infinite set Var disjoint from Z, $\mathrm{Var} \cap Z = \emptyset$, treating each x ∈ Var as a first-order variable. Given finite subsets Σ of Z and V of Var, we define an $\mathrm{MSO}_{\Sigma,V}$-model to be a $2^{\Sigma \cup V}$-string in which each x ∈ V occurs exactly once, and collect these in the set $\mathrm{Mod}_V(\Sigma)$. We define the set $\mathrm{MSO}_{\Sigma,V}$ of $\mathrm{MSO}_\Sigma$-formulas φ with free variables in V by induction, alongside sets $\mathcal{L}_{\Sigma,V}(\varphi)$ of strings in $\mathrm{Mod}_V(\Sigma)$ that satisfy φ, determining a satisfaction relation
$$s \models_{\Sigma,V} \varphi \iff s \in \mathcal{L}_{\Sigma,V}(\varphi) \qquad (s \in \mathrm{Mod}_V(\Sigma)).$$
The inductive definition consists of six clauses.
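(a) If $\{x, y\} \subseteq V$, then $x = y$ and $S(x, y)$ are in $\mathrm{MSO}_{\Sigma,V}$, with $x = y$ satisfied by the strings in $\mathrm{Mod}_V(\Sigma)$ where x and y occur in the same position, and $S(x, y)$ satisfied by the strings in $\mathrm{Mod}_V(\Sigma)$ where x occurs immediately before y.

(b) If a ∈ Σ and x ∈ V, then $P_a(x)$ is in $\mathrm{MSO}_{\Sigma,V}$, and $\mathcal{L}_{\Sigma,V}(P_a(x))$ consists of the strings in $\mathrm{Mod}_V(\Sigma)$ where the occurrence of x coincides with one of a.

(c) If $\varphi \in \mathrm{MSO}_{\Sigma,V}$, then $\neg\varphi \in \mathrm{MSO}_{\Sigma,V}$, with $\mathcal{L}_{\Sigma,V}(\neg\varphi) := \mathrm{Mod}_V(\Sigma) - \mathcal{L}_{\Sigma,V}(\varphi)$.

(d) If $\varphi, \psi \in \mathrm{MSO}_{\Sigma,V}$, then $\varphi \wedge \psi \in \mathrm{MSO}_{\Sigma,V}$, with $\mathcal{L}_{\Sigma,V}(\varphi \wedge \psi) := \mathcal{L}_{\Sigma,V}(\varphi) \cap \mathcal{L}_{\Sigma,V}(\psi)$.

For quantification, we must be careful that a variable can be reused, as in
$$P_b(x) \wedge \exists x\, P_a(x),$$
which is equivalent to $P_b(x) \wedge \exists y\, P_a(y)$, since $\exists x\, P_a(x)$ and $\exists y\, P_a(y)$ are. To cater for reuse of $q \in \mathrm{Var} \cup Z$, we define an equivalence relation $\sim_q$ between strings s and s′ of sets that differ at most on q, putting
$$s \sim_q s' \iff \overline{\rho}_q(s) = \overline{\rho}_q(s')$$
where the function $\overline{\rho}_q$ removes q from a string
$$\overline{\rho}_q(\alpha_1\cdots\alpha_n) := (\alpha_1 - \{q\})\cdots(\alpha_n - \{q\}).$$
We can now state the last two clauses of our inductive definition of $\mathrm{MSO}_{\Sigma,V}$ and $\mathcal{L}_{\Sigma,V}(\varphi)$.

(e) If $\varphi \in \mathrm{MSO}_{\Sigma,V\cup\{x\}}$, then $\exists x\,\varphi \in \mathrm{MSO}_{\Sigma,V}$, with
$$\mathcal{L}_{\Sigma,V}(\exists x\,\varphi) := \{s \in \mathrm{Mod}_V(\Sigma) \mid (\exists s' \in \mathcal{L}_{\Sigma,V\cup\{x\}}(\varphi))\ s \sim_x s'\}.$$

(f) If $\varphi \in \mathrm{MSO}_{\Sigma\cup\{a\},V}$, then $\exists P_a\,\varphi \in \mathrm{MSO}_{\Sigma,V}$, with
$$\mathcal{L}_{\Sigma,V}(\exists P_a\,\varphi) := \{s \in \mathrm{Mod}_V(\Sigma) \mid (\exists s' \in \mathcal{L}_{\Sigma\cup\{a\},V}(\varphi))\ s \sim_a s'\},$$
which simplifies in case $P_a$ is not reused ($a \notin \Sigma$) to $\{\overline{\rho}_a(s') \mid s' \in \mathcal{L}_{\Sigma\cup\{a\},V}(\varphi)\}$.

We adopt the usual abbreviations: $\varphi \vee \psi$ for $\neg(\neg\varphi \wedge \neg\psi)$, $\forall x\,\varphi$ for $\neg\exists x\,\neg\varphi$, etc. Also, we render second-order quantification $\exists P_a$ as $\exists X$, writing $\exists X\,\varphi$ for $\exists P_a\,\varphi^X_a$, where a does not occur in φ and $\varphi^X_a$ is φ with $P_a$ replacing every occurrence of X. For example, we can express x < y as
$$\exists X\,(\mathrm{closed}(X) \wedge X(y) \wedge \neg X(x))$$
where closed(X) abbreviates $\forall x \forall y\,(X(x) \wedge S(x, y) \supset X(y))$.

Next comes the pay-off in interpreting MSO-sentences over not just Z-strings but strings of sets. An easy proof by induction on φ establishes

Proposition 1 For all A ⊆ Σ, U ⊆ V and $s \in \mathrm{Mod}_V(\Sigma)$, $\rho_{A\cup U}(s) \in \mathrm{Mod}_U(A)$ and, for all $\varphi \in \mathrm{MSO}_{A,U}$,
$$s \models_{\Sigma,V} \varphi \iff \rho_{A\cup U}(s) \models_{A,U} \varphi.$$

To pick out $\mathrm{MSO}_{\Sigma,V}$-formulas with no free variables, we let V = ∅ for the set $\mathrm{MSO}_\Sigma = \mathrm{MSO}_{\Sigma,\emptyset}$ of $\mathrm{MSO}_\Sigma$-sentences, and write $\models_\Sigma$ for $\models_{\Sigma,\emptyset}$ and $\mathcal{L}_\Sigma(\varphi)$ for $\mathcal{L}_{\Sigma,\emptyset}(\varphi)$ (where $\varphi \in \mathrm{MSO}_\Sigma$). An immediate corollary to Proposition 1 is that for all $\varphi \in \mathrm{MSO}_\Sigma$ and $s \in \mathrm{Mod}_\emptyset(\Sigma) = (2^\Sigma)^*$,
$$s \models_\Sigma \varphi \iff \rho_{\mathrm{voc}(\varphi)}(s) \models_{\mathrm{voc}(\varphi)} \varphi$$
where voc(φ) is the smallest subset A of Σ such that $\varphi \in \mathrm{MSO}_A$ (sharpening the description of voc(φ) in the introduction).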
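The treatment of first-order quantification through $\sim_x$ can be tried out on the reuse example. In this sketch, formulas are encoded as Python predicates on strings of sets (a hypothetical encoding, not the paper's); `drop` plays the role of $\overline{\rho}_x$, and `exists` tries every placement of a single x in the x-free remainder of the string.

```python
def drop(q, s):
    """rho-bar_q: delete q from every box of s."""
    return tuple(box - {q} for box in s)

def variants(x, s):
    """All s' with s ~_x s' in which x occurs exactly once."""
    base = drop(x, s)
    for i in range(len(base)):
        yield base[:i] + (base[i] | {x},) + base[i + 1:]

def P(a, x):
    """P_a(x): the (unique) box holding x also holds a."""
    return lambda s: any(x in box and a in box for box in s)

def conj(phi, psi):
    return lambda s: phi(s) and psi(s)

def exists(x, phi):
    """exists x phi: some x-variant of s satisfies phi."""
    return lambda s: any(phi(t) for t in variants(x, s))

# Reusing x inside the quantifier behaves like renaming it to y.
reused = conj(P("b", "x"), exists("x", P("a", "x")))
renamed = conj(P("b", "x"), exists("y", P("a", "y")))
```

The outer free x survives in the argument string, while `exists("x", ...)` discards it and re-places x, which is exactly why the reused and renamed formulas agree.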

Regularity
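For any finite sets A and B, the restriction $\rho^B_A$ of $\rho_A$ to $(2^B)^*$ is a regular relation, i.e., it is computed by a finite-state transducer (with one state, mapping α ⊆ B to α ∩ A). For the preimage (or inverse image) of a language L under a relation R, we borrow the notation
$$\langle R\rangle L := \{s \mid (\exists s' \in L)\ s\,R\,s'\}$$
from dynamic logic, instead of $R^{-1}L$, which becomes awkward for long R's. We can then rephrase the definition of $\mathrm{Mod}_V(\Sigma)$ as
$$\mathrm{Mod}_V(\Sigma) = (2^{\Sigma\cup V})^* \cap \bigcap_{x \in V} \langle \rho^{\Sigma\cup V}_{\{x\}}\rangle\, \Box^*\boxed{x}\Box^* \tag{3}$$
Similarly, each clause defining $\mathcal{L}_{\Sigma,V}(\varphi)$ can be rephrased using preimages under regular relations, intersection and complementation relative to $\mathrm{Mod}_V(\Sigma)$. As regular languages are closed under intersection, complementation and preimages under regular relations (which are themselves closed under inverses), it follows that

Proposition 2 For every Σ ∈ Fin(Z), V ∈ Fin(Var) and $\varphi \in \mathrm{MSO}_{\Sigma,V}$, the set $\mathcal{L}_{\Sigma,V}(\varphi)$ of strings in $\mathrm{Mod}_V(\Sigma)$ that satisfy φ is a regular language.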
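The aforementioned Büchi-Elgot-Trakhtenbrot theorem (BET) sidesteps free variables, making do with $\mathrm{MSO}_\Sigma = \mathrm{MSO}_{\Sigma,\emptyset}$, and interprets sentences over Σ-strings rather than $2^\Sigma$-strings, a Σ-string $a_1\cdots a_n$ being identified with the string of singleton boxes
$$\iota(a_1\cdots a_n) := \boxed{a_1}\cdots\boxed{a_n}.$$
There is a sense in which the difference between s and ι(s) is purely cosmetic; a simple one-state finite-state transducer computes ι. But the $\mathrm{MSO}_\Sigma$-sentences valid over Σ-strings (via ι) need not be valid in $\models_\Sigma$; take the $\mathrm{MSO}_\Sigma$-sentence
$$\mathrm{spec}(\Sigma) := \forall x\,\Big(\bigvee_{a \in \Sigma} P_a(x) \wedge \bigwedge_{a, b \in \Sigma,\ a \neq b} \neg\big(P_a(x) \wedge P_b(x)\big)\Big)$$
specifying that in every string position x, exactly one symbol a from Σ occurs.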
BET effectively presupposes spec(Σ) to extract from $\varphi \in \mathrm{MSO}_\Sigma$ the regular language $\{s \in \Sigma^* \mid \iota(s) \models_\Sigma \varphi\}$ over Σ, rather than the full regular language $\mathcal{L}_\Sigma(\varphi)$ over $2^\Sigma$ from Proposition 2. To represent a regular language over $2^\Sigma$, BET provides a sentence not in $\mathrm{MSO}_\Sigma$ but in $\mathrm{MSO}_{2^\Sigma}$, which we can translate into $\mathrm{MSO}_\Sigma$ by replacing every subformula $P_\alpha(x)$ (for α ⊆ Σ) with the conjunction
$$\bigwedge_{a \in \alpha} P_a(x) \wedge \bigwedge_{a \in \Sigma - \alpha} \neg P_a(x)$$
in $\mathrm{MSO}_{\Sigma,\{x\}}$, interpretable by $\models_{\Sigma,V}$. Insofar as computations are carried out on syntactic representations (e.g., MSO-formulas) rather than on semantic models (designed largely as theoretical aids to understanding), the explosion from Σ to $2^\Sigma$ is computationally worrying in the syntactic step from $\mathrm{MSO}_\Sigma$ to $\mathrm{MSO}_{2^\Sigma}$, rather than in the semantic enrichment of $\Sigma^*$ to $(2^\Sigma)^*$.
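The embedding ι, the sentence spec(Σ), and the translation of $P_\alpha(x)$ into a conjunction of literals can all be checked box by box. The function names below (`iota`, `spec`, `p_alpha`) are hypothetical; `p_alpha` evaluates the translated conjunction on the box holding x and should agree with the test "this box is exactly α".

```python
def iota(word):
    """Embed a Sigma-string into (2^Sigma)*: each symbol a becomes the box {a}."""
    return tuple(frozenset({a}) for a in word)

def spec(sigma, s):
    """spec(Sigma): every box holds exactly one symbol from sigma."""
    return all(len(box) == 1 and box <= sigma for box in s)

def p_alpha(alpha, sigma, box):
    """Translation of P_alpha(x) into MSO_Sigma, evaluated on x's box:
    P_a(x) for every a in alpha, and not P_a(x) for every a in sigma - alpha."""
    return (all(a in box for a in alpha)
            and not any(a in box for a in sigma - alpha))
```

Over Σ-strings (via ι) spec(Σ) always holds, but many $2^\Sigma$-strings (with empty or multi-symbol boxes) falsify it, which is the gap between BET's setting and Proposition 2.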
(Conversely, we can translate $\mathrm{MSO}_\Sigma$ into $\mathrm{MSO}_{2^\Sigma}$ by replacing subformulas $P_a(x)$ with the disjunction $\bigvee_{\alpha \subseteq \Sigma,\ a \in \alpha} P_\alpha(x)$.)

Underlying Proposition 2 is a recipe from $\mathrm{MSO}_{\Sigma,V}$ to regular expressions extended with intersection, complementation and preimages under the relations $\rho^B_A$. These extended regular expressions are as succinct as the formulas in $\mathrm{MSO}_{\Sigma,V}$ they represent (up to a constant factor). That said, if we take the example of spec(Σ), we can simplify the recipe to the regular expression $\big(\sum_{a \in \Sigma}\boxed{a}\big)^*$, linear in the size of Σ (as opposed to spec(Σ), with quadratically many occurrences of the variable x). The representability of regular languages by regular expressions in general (i.e., Kleene's theorem) raises the question: what useful finite-state tools does MSO add to the usual regular operations? Apart from intersection and complementation (the usual extensions to regular expressions), one tool that $\mathrm{MSO}_\Sigma$ introduces is the idea of a string as a model, the proper formulation of which blows Σ up to its power set $2^\Sigma$ (to represent all finite $\mathrm{MSO}_\Sigma$-models, whether or not they satisfy spec(Σ)). Exploiting that blow-up, we can define regular relations such as $\rho^B_A$ under which preimages of regular languages are also regular. We modify the relations $\rho^B_A$ in the next subsection, Subsection 2.3, examining the MSO representation of accepting runs of a finite automaton, which is demonstrably more succinct than any available with regular expressions.

Some parts and sorts
Using sets as symbols provides a ready approach to meronymy (i.e., parts). We drop the subscript A on $\rho_A$ for the non-deterministic relation ⊵ of componentwise inclusion between strings of the same length, called subsumption in Fernando (2004):
$$\alpha_1\cdots\alpha_n \trianglerighteq \beta_1\cdots\beta_m \iff n = m \text{ and } \beta_i \subseteq \alpha_i \text{ for } 1 \le i \le n.$$
For example, $s \trianglerighteq \rho_A(s)$ for all strings s of sets. A part of reduced length can be obtained by truncating a string s from the front for a suffix s′
$$s \mathbin{\mathit{suffix}} s' \iff (\exists s'')\ s = s''s'$$
or from the back for a prefix s′
$$s \mathbin{\mathit{prefix}} s' \iff (\exists s'')\ s = s's''.$$
We can then compose the relations ⊵, suffix and prefix for a notion ⊒ of containment between strings of possibly different lengths: $s \sqsupseteq s'$ iff some factor (contiguous substring) of s subsumes s′. For every atomic $\mathrm{MSO}_{\Sigma,V}$-formula φ, the satisfaction set $\mathcal{L}_{\Sigma,V}(\varphi)$ consists of the strings in $\mathrm{Mod}_V(\Sigma)$ with characteristic ⊒-parts, given as follows.
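The part relations ⊵, suffix, prefix and their composition ⊒ translate directly into predicates on tuples of frozensets. The names below are hypothetical; `contains(s, t)` implements $s \sqsupseteq t$ by trying every factor of s with the length of t.

```python
def subsumes(s, t):
    """s ⊵ t: equal length and each box of t is included in the matching box of s."""
    return len(s) == len(t) and all(b <= a for a, b in zip(s, t))

def has_suffix(s, t):
    """s suffix t: s = s'' t for some s''."""
    return len(t) <= len(s) and s[len(s) - len(t):] == t

def has_prefix(s, t):
    """s prefix t: s = t s'' for some s''."""
    return len(t) <= len(s) and s[:len(t)] == t

def contains(s, t):
    """s ⊒ t: truncate s at both ends (suffix, then prefix), then subsume t."""
    m = len(t)
    return any(subsumes(s[i:i + m], t) for i in range(len(s) - m + 1))
```

For instance, a string in $\mathrm{Mod}_V(\Sigma)$ satisfies S(x, y) exactly when `contains(s, (frozenset({"x"}), frozenset({"y"})))` holds.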
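Proposition 3 For all disjoint finite sets Σ and V, all x, y ∈ V and all a ∈ Σ,
$$\mathcal{L}_{\Sigma,V}(x = y) = \mathrm{Mod}_V(\Sigma) \cap \langle\sqsupseteq\rangle\,\boxed{x,y}$$
$$\mathcal{L}_{\Sigma,V}(S(x, y)) = \mathrm{Mod}_V(\Sigma) \cap \langle\sqsupseteq\rangle\,\boxed{x}\boxed{y}$$
$$\mathcal{L}_{\Sigma,V}(P_a(x)) = \mathrm{Mod}_V(\Sigma) \cap \langle\sqsupseteq\rangle\,\boxed{a,x}$$

Under Proposition 3, each set $\mathcal{L}_{\Sigma,V}(\varphi)$ is the intersection of $\mathrm{Mod}_V(\Sigma)$ with a language $\langle\sqsupseteq\rangle s_\varphi$, where $s_\varphi$ is a string of length ≤ 2 that pictures φ. The obvious picture of x < y is the set $\boxed{x}\Box^*\boxed{y}$ of arbitrarily long strings, which is nonetheless easier to visualize (if not read) than the $\mathrm{MSO}_{\emptyset,\{x,y\}}$-formula expressing x < y. To compress the language $\boxed{x}\Box^*\boxed{y}$ to the string $\boxed{x}\boxed{y}$, we can replace containment ⊒ by weak containment
$$\alpha_1\cdots\alpha_n \succeq s' \iff s' = x_1\cdots x_n \text{ for some } x_1, \ldots, x_n \text{ where each } x_i \text{ is } \varepsilon \text{ or a box } \beta_i \subseteq \alpha_i,$$
with deletions ($x_i$ equal to the empty string ε) allowed anywhere, not just in the front or back of $\alpha_1\cdots\alpha_n$ or inside any box $\alpha_i$. (For example, $\boxed{x,a}\Box^n\boxed{y} \succeq \boxed{x}\boxed{y}$ for all integers n ≥ 0.) Proposition 3 holds with ⊒ and S(x, y) replaced by ⪰ and x < y respectively, and the restriction of ⪰ to $(2^{\Sigma\cup V})^*$ is computable by a finite-state transducer (for all finite sets Σ and V). Within $\mathrm{Mod}_V(\Sigma)$ are part relations $\rho_{\{x\}}$ (for x ∈ V), revealed by the equation expressing $\mathrm{Mod}_V(\Sigma)$ through the preimages $\langle\rho_{\{x\}}\rangle\,\Box^*\boxed{x}\Box^*$.

Moving from MSO to finite automata, let us rewrite pairs Σ, V as pairs A, Q of disjoint finite sets A and Q, and define an (A, Q)-automaton to be a triple $\mathcal{A} = (\to_{\mathcal{A}}, F_{\mathcal{A}}, q_{\mathcal{A}})$ consisting of (i) a set $\to_{\mathcal{A}}$ of triples in Q × A × Q specifying $\mathcal{A}$-transitions (where we write $q \xrightarrow{a}_{\mathcal{A}} q'$ instead of $(q, a, q') \in {\to_{\mathcal{A}}}$), (ii) a set $F_{\mathcal{A}} \subseteq Q$ of $\mathcal{A}$-final states, and (iii) an $\mathcal{A}$-initial state $q_{\mathcal{A}} \in Q$.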
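The weak containment ⪰ defined above can be decided greedily: scan s once, matching each box of the target to the earliest remaining box of s that includes it. Matching earlier never rules out a later match, so the greedy scan is exact. The function name is hypothetical.

```python
def weakly_contains(s, t):
    """s ⪰ t: t arises from s by deleting boxes anywhere and shrinking the rest."""
    j = 0
    for box in s:
        if j < len(t) and t[j] <= box:
            j += 1
    return j == len(t)

X, Y = frozenset({"x"}), frozenset({"y"})
picture = (X, Y)   # the two-box picture of x < y
```

On strings where x and y each occur once, `weakly_contains(s, picture)` holds exactly when the x-box comes strictly before the y-box, which is why the picture of x < y shrinks from $\boxed{x}\Box^*\boxed{y}$ to $\boxed{x}\boxed{y}$.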
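Given an (A, Q)-automaton $\mathcal{A}$, an $\mathcal{A}$-accepting run is a string $\boxed{a_1,q_1}\cdots\boxed{a_n,q_n}$ such that
$$q_{\mathcal{A}} \xrightarrow{a_1}_{\mathcal{A}} q_1 \xrightarrow{a_2}_{\mathcal{A}} \cdots \xrightarrow{a_n}_{\mathcal{A}} q_n \quad\text{and}\quad q_n \in F_{\mathcal{A}}$$
(where for n = 0, the empty string ε is an $\mathcal{A}$-accepting run iff $q_{\mathcal{A}} \in F_{\mathcal{A}}$). Let $\mathrm{AccRuns}(\mathcal{A})$ be the set of $\mathcal{A}$-accepting runs. Clearly, for all $s \in A^*$,
$$\mathcal{A} \text{ accepts } s \iff (\exists s' \in \mathrm{AccRuns}(\mathcal{A}))\ \rho_A(s') = \iota(s).$$
That is, $\mathcal{A}$ accepts the language $L(\mathcal{A})$ with $\iota[L(\mathcal{A})] = \langle\theta^{A\cup Q}_A\rangle\,\mathrm{AccRuns}(\mathcal{A})$, where $\theta^{A\cup Q}_A$ is the inverse of $\rho^{A\cup Q}_A$. As for the set $\mathrm{AccRuns}(\mathcal{A})$ of $\mathcal{A}$-accepting runs, we start by collecting strings of pairs from A and Q
$$\mathrm{Pairs}(A, Q) := \{\boxed{a,q} \mid a \in A,\ q \in Q\}^*.$$
We refine Pairs(A, Q) to $\mathrm{AccRuns}(\mathcal{A})$, taking into account (i) the set $\mathrm{Init}[\mathcal{A}]$ of strings that start with a pair $\boxed{a,q}$ such that $q_{\mathcal{A}} \xrightarrow{a}_{\mathcal{A}} q$
$$\mathrm{Init}[\mathcal{A}] := \langle\mathit{prefix}\rangle\,\{\boxed{a,q} \mid q_{\mathcal{A}} \xrightarrow{a}_{\mathcal{A}} q\}$$
(ii) the set $\mathrm{Final}[\mathcal{A}]$ of strings ending with an $\mathcal{A}$-final state
$$\mathrm{Final}[\mathcal{A}] := \langle\trianglerighteq\rangle\langle\mathit{suffix}\rangle\,\{\boxed{q} \mid q \in F_{\mathcal{A}}\}$$
and (iii) the set $\mathrm{Bad}[\mathcal{A}]$ of strings containing $\boxed{q}\boxed{a,q'}$ for triples (q, a, q′) outside the set $\to_{\mathcal{A}}$ of $\mathcal{A}$-transitions
$$\mathrm{Bad}[\mathcal{A}] := \langle\sqsupseteq\rangle\,\{\boxed{q}\boxed{a,q'} \mid (q, a, q') \in (Q \times A \times Q) - {\to_{\mathcal{A}}}\}$$
(where containment ⊒ is the relational composition of ⊵, suffix and prefix).

(For present purposes, we can take a part relation to be any fragment R of ⪰, i.e., whenever s R s′, s ⪰ s′. Thus $\rho_A$, suffix, prefix, ⊒ and ⪰ are all part relations.)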
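Proposition 4 For all disjoint finite sets A and Q, and all (A, Q)-automata $\mathcal{A}$, the set $\mathrm{AccRuns}(\mathcal{A})$ of $\mathcal{A}$-accepting runs consists of all strings in Pairs(A, Q) that belong to $\mathrm{Init}[\mathcal{A}]$ and $\mathrm{Final}[\mathcal{A}]$ but not to $\mathrm{Bad}[\mathcal{A}]$ (together with ε, in case $q_{\mathcal{A}} \in F_{\mathcal{A}}$).

Note that the language Pairs(A, Q) can be formed by defining, for any finite sets C and D, the set
$$\mathrm{Spec}_D(C) := \langle\rho^{C\cup D}_C\rangle\,\{\boxed{c} \mid c \in C\}^*$$
of strings over $2^{C\cup D}$ in which every box contains exactly one element of C, so that $\mathrm{Pairs}(A, Q) = \mathrm{Spec}_Q(A) \cap \mathrm{Spec}_A(Q)$. The language $\{\boxed{c} \mid c \in C\}^*$ of $\rho_C$-parts of strings in $\mathrm{Spec}_D(C)$ includes strings of any finite length, whereas all strings $\boxed{a,q}$, $\boxed{q}$ and $\boxed{q}\boxed{a,q'}$ pictured in $\mathrm{Init}[\mathcal{A}]$, $\mathrm{Final}[\mathcal{A}]$ and $\mathrm{Bad}[\mathcal{A}]$ have length ≤ 2. This is one sense in which the constraint Pairs(A, Q) captures accepting runs of all (A, Q)-automata, just as $\mathrm{Mod}_V(\Sigma)$ in Proposition 3 captures all $\mathrm{MSO}_{\Sigma,V}$-models. That is, Pairs(A, Q) and $\mathrm{Mod}_V(\Sigma)$ are general, sortal constraints that provide a context (or background) for more specific constraints to differentiate strings of the same sort; this differentiation is effected in Propositions 4 and 3 by attributes or parts that pick out substrings of length bounded by 2. Table 1 outlines the situation.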
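Proposition 4 can be exercised on a small automaton, and the step to $L(\mathcal{A})$ can be imitated by existentially guessing the states. The automaton below (A = {a, b}, states q0 and q1, accepting words with an even number of a's) is a hypothetical example, and the helper names are illustrative; `is_accepting_run` combines Pairs, Init, Final and the absence of Bad, plus the n = 0 case.

```python
from itertools import product

A = {"a", "b"}
Q = {"q0", "q1"}
# Hypothetical example: 'a' flips the state, 'b' keeps it; q0 is initial and final.
DELTA = {("q0", "a", "q1"), ("q1", "a", "q0"),
         ("q0", "b", "q0"), ("q1", "b", "q1")}
INIT, FINAL = "q0", {"q0"}

def sym(box):
    return next(iter(box & A))

def state(box):
    return next(iter(box & Q))

def in_pairs(s):
    """Pairs(A, Q): every box is exactly {a, q} with a in A and q in Q."""
    return all(len(box & A) == 1 and len(box & Q) == 1 and len(box) == 2
               for box in s)

def in_init(s):
    return len(s) > 0 and (INIT, sym(s[0]), state(s[0])) in DELTA

def in_final(s):
    return len(s) > 0 and state(s[-1]) in FINAL

def in_bad(s):
    """Some adjacent pair [q][a, q'] with (q, a, q') not a transition."""
    return any((state(p), sym(n), state(n)) not in DELTA
               for p, n in zip(s, s[1:]))

def is_accepting_run(s):
    if not in_pairs(s):
        return False
    if len(s) == 0:
        return INIT in FINAL          # the empty run
    return in_init(s) and in_final(s) and not in_bad(s)

def accepts(word):
    """Quantify the states away: some labelling of word by states is a run."""
    return any(is_accepting_run(tuple(frozenset({a, q}) for a, q in zip(word, qs)))
               for qs in product(sorted(Q), repeat=len(word)))
```

Trying every state sequence is exponential and serves only to mirror the existential quantifier $\theta^{A\cup Q}_A$; it is not how one would run an automaton in practice.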
A further difference between the second and third columns of Table 1 is that whereas the sortal constraints $\mathrm{Mod}_V(\Sigma)$ and Pairs(A, Q) employ deterministic part relations $\rho_A$, the differential constraints employ the non-deterministic relations ⊒, prefix and the relational composition ⊵; suffix. Although it is clear from Subsection 2.1 that the work done by ⊒, prefix and ⊵; suffix can be done by $\rho_A$, non-determinism nevertheless arises when introducing existential quantification through the inverse $\theta^B_A$ of $\rho^B_A$ (used for the step from $\mathcal{A}$-accepting runs to the language $L(\mathcal{A})$ accepted by $\mathcal{A}$). But while ⊒, prefix and ⊵; suffix search inside a string, $\theta^B_A$ searches outside. The search by $\theta^B_A$ is bounded only because the set B (that serves as its superscript) is finite (with elements of B not in A amounting to auxiliary symbols).
Non-determinism aside, the relations ⊒, prefix and ⊵; suffix differ from $\rho_A$ and its inverse in relating strings of different lengths. Indeed, Table 1 arose above from the observation that parts with length ≤ 2 suffice for the constraints in the third column. That said, in the next section, we compress strings deterministically without setting any predetermined bounds (such as 2) on the resulting length, for sorts and parts alike.

compression and institutions
Having established through Proposition 1 the reduction (2) (for all $\varphi \in \mathrm{MSO}_\Sigma$ and $s \in (2^\Sigma)^*$), we proceeded to part relations other than $\rho_A$ in Table 1. The present section calls attention to string functions that can (unlike $\rho_A$) shorten a string, pointing the equivalence (2) and Table 1 in the direction of institutions (Goguen and Burstall 1992).
As the length n of a string determines the domain [n] = {1, …, n} of the model encoded by the string, compression alters ontology over and above the A-reducts produced by $\rho_A$.

From compression to inverse limits
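We can strip off empty boxes at the front and back of a string s by defining
$$\mathrm{unpad}(s) := \begin{cases} \mathrm{unpad}(s') & \text{if } s = \Box s' \text{ or } s = s'\Box \\ s & \text{otherwise} \end{cases}$$
so that unpad(s) neither begins nor ends with $\Box$, making $\Box^*\boxed{x}\Box^* = \langle\mathrm{unpad}\rangle\,\boxed{x}$.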
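The recursive definition of unpad unwinds into two index scans from the ends of the string. A minimal sketch, assuming strings are tuples of frozensets (`EMPTY` stands for the empty box $\Box$):

```python
EMPTY = frozenset()

def unpad(s):
    """Strip empty boxes from the front and back of s (a tuple of frozensets)."""
    i, j = 0, len(s)
    while i < j and s[i] == EMPTY:
        i += 1
    while j > i and s[j - 1] == EMPTY:
        j -= 1
    return s[i:j]
```

Interior empty boxes are kept, so unpad only trims the span of a string and never merges or reorders anything inside it.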
Using unpad-preimages, we can eliminate Kleene stars from the right side of (3) and from the extended regular expressions from Proposition 3 for the sets $\mathcal{L}_{\Sigma,V}(\varphi)$ of strings satisfying formulas $\varphi \in \mathrm{MSO}_{\Sigma,V}$. Regular expressions with complementation instead of Kleene star are known in the literature as star-free regular expressions, denoting, by a theorem of McNaughton and Papert, the first-order definable sets (Theorem 7.26, page 127, Libkin 2010). We can formulate a notion of Σ-extended star-free expressions matching the regular expressions over $2^\Sigma$, but while it is easy enough to introduce the constructs $\langle\sqsupseteq\rangle$ and $\langle\mathrm{unpad}\rangle$, we need subsets and supersets of Σ to relativize complementation and to define the constructs $\langle\rho^B_A\rangle$ and $\langle\theta^B_A\rangle$, where $\theta^B_A$ is the inverse of $\rho^B_A$. On the positive side, this complication is potentially interesting, as it suggests a hierarchy between the star-free regular languages and the regular languages over $2^\Sigma$. Be that as it may, our present concerns lie elsewhere.
Rather than separating the set Var of first-order variables from the set Z of subscripts a on unary predicates $P_a$, we can formulate the requirement on a symbol a that it occur exactly once in $\mathrm{MSO}_{\{a\}}$
$$\mathrm{nom}(a) := \exists x \forall y\,(P_a(y) \equiv x = y)$$
characteristic of nominals in the sense of Hybrid Logic (e.g., Braüner 2014, or "world variables" in Prior 1967, pages 187–197), with $\mathcal{L}_{\{a\}}(\mathrm{nom}(a)) = \langle\mathrm{unpad}\rangle\,\boxed{a}$.
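From nom(a), it is a small step to the condition interval(a) that a occur in a string without gaps, which we can express in $\mathrm{MSO}_{\{a\}}$ as
$$\mathrm{interval}(a) := \exists x\, P_a(x) \wedge \neg\exists y\, \mathrm{gap}_a(y)$$
where $\mathrm{gap}_a(y)$ says a does not occur at position y even though it occurs before and after y
$$\mathrm{gap}_a(y) := \neg P_a(y) \wedge \exists x\,(x < y \wedge P_a(x)) \wedge \exists x\,(y < x \wedge P_a(x))$$
so that
$$\mathcal{L}_{\{a\}}(\mathrm{interval}(a)) = \langle\mathrm{unpad}\rangle\,\boxed{a}^+. \tag{4}$$
We can eliminate $\cdot^+$ from the right of (4) by defining a function bc that, given a string s, compresses blocks $\alpha^n$ of n > 1 consecutive occurrences in s of the same symbol α to a single α, leaving s otherwise unchanged
$$\mathrm{bc}(s) := \begin{cases} \mathrm{bc}(\alpha s') & \text{if } s = \alpha\alpha s' \\ \alpha\,\mathrm{bc}(\beta s') & \text{if } s = \alpha\beta s' \text{ with } \alpha \neq \beta \\ s & \text{otherwise} \end{cases}$$
so that $\boxed{a}^+ = \langle\mathrm{bc}\rangle\,\boxed{a}$. In general, bc outputs only stutter-free strings, where a string $\alpha_1\cdots\alpha_n$ is stutter-free if $\alpha_i \neq \alpha_{i+1}$ for $1 \le i < n$. Construing boxes in a string as moments of time, we can view bc as implementing "McTaggart's dictum that 'there could be no time if nothing changed'" (Prior 1967, page 85). The restriction of bc to any finite alphabet is computable by a finite-state transducer, as is, for all Σ ∈ Fin(Z) and A ⊆ Σ, the composition
$$\pi^\Sigma_A := \rho^\Sigma_A ; \mathrm{bc} ; \mathrm{unpad}.$$
For a ∈ Σ, the $2^\Sigma$-strings in which a is an interval are those that $\pi^\Sigma_{\{a\}}$ maps to $\boxed{a}$.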
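Block compression and the projection $\pi^\Sigma_A$ are short functions over tuples of frozensets. In this sketch (names hypothetical), `bc` uses `itertools.groupby`, which groups runs of equal adjacent elements, and `pi` applies ρ, then bc, then unpad, matching the left-to-right reading of ";".

```python
from itertools import groupby

EMPTY = frozenset()

def rho(A, s):
    return tuple(box & frozenset(A) for box in s)

def unpad(s):
    i, j = 0, len(s)
    while i < j and s[i] == EMPTY:
        i += 1
    while j > i and s[j - 1] == EMPTY:
        j -= 1
    return s[i:j]

def bc(s):
    """Block compression: collapse each run of identical adjacent boxes to one box."""
    return tuple(box for box, _run in groupby(s))

def stutter_free(s):
    return all(x != y for x, y in zip(s, s[1:]))

def pi(A, s):
    """pi^Sigma_A = rho^Sigma_A ; bc ; unpad (applied left to right)."""
    return unpad(bc(rho(A, s)))
```

A string in which a has a gap, such as [a][b][a], is mapped by `pi({"a"}, ...)` to [a][ ][a] rather than to [a], so the interval test is exactly `pi({"a"}, s) == (frozenset({"a"}),)`.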

The functions π
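Conflating a string s with the language {s}, observe that $\mathrm{Interval}(\{a\}) = \boxed{a}$. For a ≠ a′, the set $\mathrm{Interval}(\{a, a'\})$ consists of thirteen strings, one per interval relation in Allen (1983), which can be partitioned between the nine-element set
$$a \bigcirc a' := \{\boxed{a,a'},\ \boxed{a}\boxed{a,a'},\ \boxed{a'}\boxed{a,a'},\ \boxed{a,a'}\boxed{a},\ \boxed{a,a'}\boxed{a'},\ \boxed{a}\boxed{a,a'}\boxed{a},\ \boxed{a'}\boxed{a,a'}\boxed{a'},\ \boxed{a}\boxed{a,a'}\boxed{a'},\ \boxed{a'}\boxed{a,a'}\boxed{a}\}$$
describing overlap $\bigcirc$ between a and a′, insofar as for all s ∈ Interval(Σ) with a, a′ ∈ Σ,
$$s \models_\Sigma \exists x\,(P_a(x) \wedge P_{a'}(x)) \iff \pi^\Sigma_{\{a,a'\}}(s) \in (a \bigcirc a'),$$
and the two-element sets describing complete precedence ≺
$$a \prec a' := \{\boxed{a}\boxed{a'},\ \boxed{a}\Box\boxed{a'}\}$$
insofar as for all s ∈ Interval(Σ) with a, a′ ∈ Σ,
$$s \models_\Sigma \forall x \forall y\,\big((P_a(x) \wedge P_{a'}(y)) \supset x < y\big) \iff \pi^\Sigma_{\{a,a'\}}(s) \in (a \prec a'),$$
and similarly for a′ ≺ a. Event structures are built around the relations $\bigcirc$ and ≺ in Kamp and Reyle (1993) (pages 667–674) to express the Russell-Wiener event-based conception of time, a particular elaboration of McTaggart's dictum mentioned above. The sets Interval(A) above provide representations of finite event structures (Fernando 2011).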
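The count of thirteen can be confirmed by brute force: take every string over $2^{\{a,b\}}$ of length at most 4 in which both a and b are intervals, and collect the images under $\pi_{\{a,b\}}$ (here just bc followed by unpad, as the strings already range over $2^{\{a,b\}}$). The length bound 4 and all names are assumptions of this sketch; since the thirteen pictures have length at most 3, the bound suffices.

```python
from itertools import groupby, product

EMPTY = frozenset()
BOXES = [EMPTY, frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})]

def unpad(s):
    i, j = 0, len(s)
    while i < j and s[i] == EMPTY:
        i += 1
    while j > i and s[j - 1] == EMPTY:
        j -= 1
    return s[i:j]

def pi(s):
    """pi_{a,b} = bc ; unpad on strings already over 2^{a,b}."""
    return unpad(tuple(box for box, _ in groupby(s)))

def is_interval(x, s):
    """x occurs in s, and without gaps."""
    idx = [i for i, box in enumerate(s) if x in box]
    return bool(idx) and idx == list(range(idx[0], idx[-1] + 1))

def overlap(s):
    """exists x (P_a(x) & P_b(x))"""
    return any({"a", "b"} <= box for box in s)

def precedes(x, y, s):
    """Every x-position is strictly before every y-position."""
    return all(i < j for i, bi in enumerate(s) if x in bi
               for j, bj in enumerate(s) if y in bj)

STRINGS = [s for n in range(5) for s in product(BOXES, repeat=n)
           if is_interval("a", s) and is_interval("b", s)]
INTERVAL_AB = {pi(s) for s in STRINGS}
```

The nine overlapping pictures, the two with a before b and the two with b before a account for all thirteen.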
Requiring that event structures be finite flies in the face of the popularity of, for instance, the real line in temporal semantics (e.g., Kamp and Reyle 1993, page 670). But we can approximate any infinite set Z by its set Fin(Z) of finite subsets, using the inverse system of projections $\pi_{A,B}$ (for A ⊆ B in Fin(Z)) for the inverse limit consisting of maps $\mathbf{a} : \mathrm{Fin}(Z) \to \mathrm{Fin}(Z)^*$ that respect the projections $\pi_{A,B}$, i.e., $\mathbf{a}(A) = \pi_{A,B}(\mathbf{a}(B))$ whenever A ⊆ B. An element of that inverse limit, in case … ⊆ Z, is the map $\mathbf{a}$ such that for all … copying … . Notice that compressing strings via $\pi_{A,B}$ allows us to lengthen the strings in the inverse limit. If we remove the compression bc in $\pi_{A,B}$, we are left with the map $\rho_A$ that leaves the ontology intact (insofar as the domain of an MSO-model is given by the string length), whilst restricting the vocabulary (for A-reducts).
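For the maps in the inverse limit to be well defined, the projections must compose: projecting from {a, b, c} to {a, b} and then to {a} should agree with projecting directly to {a}. The sketch below checks this on all strings over $2^{\{a,b,c\}}$ up to length 4, assuming $\pi_{A,B}$ is $\rho_A$ ; bc ; unpad restricted to $2^B$-strings (the function name `proj` is hypothetical).

```python
from itertools import groupby, product

EMPTY = frozenset()

def proj(A, s):
    """pi_A = rho_A ; bc ; unpad: reduce to A, compress blocks, strip empty ends."""
    reduced = [box & frozenset(A) for box in s]
    compressed = [box for box, _ in groupby(reduced)]
    i, j = 0, len(compressed)
    while i < j and compressed[i] == EMPTY:
        i += 1
    while j > i and compressed[j - 1] == EMPTY:
        j -= 1
    return tuple(compressed[i:j])

BOXES = [frozenset(c) for c in ((), ("a",), ("b",), ("c",), ("a", "b"),
                                ("a", "c"), ("b", "c"), ("a", "b", "c"))]
```

The identity holds because ρ_A applied after ρ_B is just ρ_A, blocks merged by bc stay identical after reduction, and empty boxes exposed at the ends are removed by the final unpad either way.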

From inverse systems to institutions
We have left out from the language $\mathrm{Interval}(\{a\}) = \boxed{a}$ the string $\Box\boxed{a}$ (among many others) that satisfies interval(a), having built unpad into $\pi^\Sigma_A$. Notice that a is bounded to the left in $\Box\boxed{a}$,
$$\Box\boxed{a} \models_{\{a\}} \exists x \exists y\,(S(x, y) \wedge P_a(y) \wedge \neg P_a(x)),$$
but not in $\boxed{a}$. The functions $\pi^B_A$ underlying Interval(A) abstract away information about boundedness, which is fine if we assume intervals are bounded (as in Allen 1983). But what if we wish to study intervals that may or may not be left-bounded? Or, for that matter, strings where a may or may not be an interval? The line we pursue in this subsection harks back to Table 1 at the end of Section 2, encoding presuppositions in the second column (e.g., $\mathrm{Mod}_V(\Sigma)$) and assertions in the third column (e.g., $\langle\sqsupseteq\rangle s_\varphi$). For instance, we presuppose a string s is stutter-free (i.e., s = bc(s)) and assert that a is an interval in s, replacing Interval(A) by the intersection of the set of stutter-free strings with the set of strings in which each a ∈ A is an interval, of which $\boxed{a}$ and $\Box\boxed{a}$ are members, for a ∈ A. More generally, the idea is to refine the inverse system from the previous subsection to certain concrete instances of institutions (in the sense of Goguen and Burstall 1992) given by suitable functions on strings.
To ensure each a ∈ V occurs at least once in the string, we put V at the very end, with $e_V(\varepsilon) := V$ for the empty string ε. Now, if f is the composition … and (c1) and (c2) hold.
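The third column of Table 1 calls for further ingredients. Let us define a Z-form to be a function sen with domain Fin(Z) mapping A ∈ Fin(Z) to a set sen(A) such that for all B ∈ Fin(Z), $\mathrm{sen}(A \cap B) = \mathrm{sen}(A) \cap \mathrm{sen}(B)$ (so that, in particular, sen(A) ⊆ sen(B) whenever A ⊆ B). Given a Z-form sen, we can associate every φ in some sen(A) with a vocabulary voc(φ) ∈ Fin(Z) such that
$$\varphi \in \mathrm{sen}(A) \iff \mathrm{voc}(\varphi) \subseteq A$$
for all A ∈ Fin(Z). Next, given a function f on $\mathrm{Fin}(Z)^*$ and a Z-form sen, let us agree that an (f, sen)-specification is a function $\mathcal{L}$ with domain Fin(Z) mapping A ∈ Fin(Z) to a function $\mathcal{L}_A$ with domain sen(A), mapping φ ∈ sen(A) to a set $\mathcal{L}_A(\varphi)$ of strings in $P_f(A)$. The intuition is that $\mathcal{L}_A(\varphi)$ consists of the strings in $P_f(A)$ that satisfy φ. Putting the ingredients together, let us define a (Z, f)-quadriplex to be a 4-tuple $(\mathrm{Fin}(Z), P_f, \mathrm{sen}, \mathcal{L})$ such that (i) $P_f$ is an inverse system over Fin(Z), (ii) sen is a Z-form, and (iii) $\mathcal{L}$ is an (f, sen)-specification.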
Note that once Z and f are fixed, only the third and fourth components sen and $\mathcal{L}$ of a (Z, f)-quadriplex $(\mathrm{Fin}(Z), P_f, \mathrm{sen}, \mathcal{L})$ may vary. To link up with institutions, as defined in Goguen and Burstall (1992), we view (i) Fin(Z) as a category with morphisms given by ⊆, (ii) $P_f$ as a contravariant functor from Fin(Z) to the category Set of sets and functions, and (iii) sen as a (covariant) functor from Fin(Z) to Set such that whenever A ⊆ B, sen(A ⊆ B) is the inclusion sen(A) → sen(B). Writing $f^B_A : P_f(B) \to P_f(A)$ for the map that $P_f$ assigns to A ⊆ B, the one remaining condition a (Z, f)-quadriplex must meet to be an institution is that for all A ⊆ B ∈ Fin(Z), φ ∈ sen(A) and $s \in P_f(B)$,
$$s \in \mathcal{L}_B(\varphi) \iff f^B_A(s) \in \mathcal{L}_A(\varphi),$$
which we can put as the equation
$$\mathcal{L}_B(\varphi) = P_f(B) \cap \langle f^B_A\rangle\,\mathcal{L}_A(\varphi). \tag{6}$$
In fact, the special case A = voc(φ) suffices.

Proposition 5 Given a set Z and function
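If f is the identity on $\mathrm{Fin}(Z)^*$ and sen(Σ) is $\mathrm{MSO}_\Sigma$, then (6) becomes the equivalence
$$s \models_\Sigma \varphi \iff \rho_{\mathrm{voc}(\varphi)}(s) \models_{\mathrm{voc}(\varphi)} \varphi$$
for all $\varphi \in \mathrm{MSO}_\Sigma$ and $s \in (2^\Sigma)^*$. Equation (6) also represents the division in Table 1 between column 2 ($P_f(\Sigma)$) and column 3 ($\langle f^\Sigma_{\mathrm{voc}(\varphi)}\rangle\,\mathcal{L}_{\mathrm{voc}(\varphi)}(\varphi)$), whilst leaving open the possibility that f is not the identity function on $\mathrm{Fin}(Z)^*$, nor φ an MSO-formula.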
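Under (6), we may assume without loss of generality that sen and $\mathcal{L}$ have the following form. For every Σ ∈ Fin(Z), there is a set Expr(Σ) of expressions e with denotations $[\![e]\!] \subseteq (2^\Sigma)^*$ such that $\mathrm{sen}(\Sigma) = 2^\Sigma \times \mathrm{Expr}(\Sigma)$ consists of pairs (A, e) of subsets A ⊆ Σ and e ∈ Expr(Σ), with voc(A, e) = A and
$$s \in \mathcal{L}_\Sigma(A, e) \iff s \in P_f(\Sigma) \text{ and } f^\Sigma_A(s) \in [\![e]\!]. \tag{7}$$
An instructive example is provided by A equal to {a}, and e equal to the extended regular expression $\langle\sqsupseteq\rangle\,\boxed{a}\boxed{a}$ or, equivalently, the $\mathrm{MSO}_{\{a\}}$-sentence $\exists x \exists y\,(S(x, y) \wedge P_a(x) \wedge P_a(y))$.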
The righthand side of (7) can never hold with f = bc; there is no $s \in (2^\Sigma)^+$ such that $\mathrm{bc}^\Sigma_{\{a\}}(s) \sqsupseteq \boxed{a}\boxed{a}$. A slight revision, however, makes the righthand side bc-satisfiable: introduce a symbol b ≠ a, for A equal to {a, b}, to tell the two a-boxes apart (as in $\boxed{a}\boxed{a,b}$), which bc then does not merge. In general, we can neutralize block compression bc on a string s by adding a fresh symbol to alternating boxes in s, which bc then leaves unchanged, since
$$\mathrm{bc}(s) = s \iff s \text{ is stutter-free}.$$
Similarly, we can add negations $\bar{a}$ of symbols a in A through a function cl to express $\mathrm{bc}^\Sigma_A$ in terms of $\pi^\Sigma_A$. We can then say a is left-bounded by applying prefix after bc, and say a overlaps a′ by applying containment ⊒ after bc:
$$\mathcal{L}_\Sigma(\{a, a'\}, \exists x\,(P_a(x) \wedge P_{a'}(x))) = \langle\mathrm{bc}^\Sigma_{\{a,a'\}}\rangle\langle\sqsupseteq\rangle\,\boxed{a,a'}.$$
It is clear that unpad is just one of many relations that can come after $\mathrm{bc}^\Sigma_A$ (leading, in this case, to $\pi^\Sigma_A = \mathrm{bc}^\Sigma_A ; \mathrm{unpad}$). The projection $\rho^\Sigma_A$ in $\mathrm{bc}^\Sigma_A = \rho^\Sigma_A ; \mathrm{bc}$ changes the granularity from Σ to A before bc reduces the ontology to suit A, and part relations (such as prefix, containment ⊒ or unpad) pick out a temporal span to frame a string (such as $\Box\boxed{a}$ or $\boxed{a,a'}$) picturing an assertion (e.g., left-boundedness, overlap). We are dividing here the choice of an expression $e_\varphi$ denoting the language $\mathcal{L}_{\mathrm{voc}(\varphi)}(\varphi)$ in Proposition 5 between a relation R and a string s, for $e_\varphi = \langle R\rangle s$. Such a choice presupposes the finite approximability of the model of interest via the inverse limit of $P_f$ (the discreteness of strings mirroring the bounded granularity of natural language statements, rife with talk of "the next moment"). Finite approximability is not only plausible but arguably implicit in accounts of tense and aspect such as Reichenbach (1947).
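The neutralization of bc by a fresh symbol can be checked directly: marking every other box guarantees that adjacent boxes differ, so the marked string is stutter-free and bc fixes it, while the reduct back to the original vocabulary recovers the unmarked string. The fresh symbol "t", the choice of odd (0-indexed) positions, and the function names are assumptions of this sketch.

```python
from itertools import groupby, product

def bc(s):
    return tuple(box for box, _ in groupby(s))

def mark(s, fresh):
    """Add a fresh symbol to every other box (0-indexed positions 1, 3, 5, ...)."""
    return tuple(box | {fresh} if i % 2 else box for i, box in enumerate(s))

def rho(A, s):
    return tuple(box & frozenset(A) for box in s)

a = frozenset({"a"})
BOXES = [frozenset(), a]
```

Without marking, bc collapses $\boxed{a}\boxed{a}$ to $\boxed{a}$, which is why the unrevised expression $\langle\sqsupseteq\rangle\boxed{a}\boxed{a}$ is never bc-satisfiable.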

conclusion
There is no question that, as declarative devices specifying sets of strings accepted by finite automata, regular expressions are more popular than MSO. What MSO offers, however, is a model-theoretic perspective on strings with computable notions of entailment (inclusions between regular languages being decidable), in addition to Boolean connectives that expose deficiencies in the succinctness of regular expressions (e.g., Gelade and Neven 2012). Mapping a finite automaton $\mathcal{A}$ to a regular expression denoting the language $L(\mathcal{A})$ accepted by $\mathcal{A}$ can have exponential cost (Ehrenfeucht and Zeiger 1976; Holzer and Kutrib 2010). A more concise representation of $L(\mathcal{A})$ existentially quantifies away the internal states from the accepting runs of $\mathcal{A}$ (analyzed in Proposition 4 above). Not only can this be carried out in MSO (proving one half of the Büchi-Elgot-Trakhtenbrot theorem), but it is well known that MSO-sentences can be far more succinct than finite automata (e.g., Libkin 2010, pages 124–125 and 135–136). To match the succinctness of MSO, regular expressions over alphabets $2^\Sigma$ (for finite sets Σ) are extended with preimages and images under homomorphisms $\rho_A$ that output A-reducts, for A ⊆ Σ.
The step from Σ up to $2^\Sigma$ is justified by the various notions of part between strings of sets, given by $\rho_A$, subsumption ⊵, prefix, suffix, block compression bc and unpad, all computable (over $2^\Sigma$) by finite-state transducers. Reducts between vocabularies are composed with compression within a fixed vocabulary to fit ontology against the vocabulary. An inverse limit construction (turning compression around to extension) takes us beyond the finite models of MSO to infinite time-

(a) Ed exhaled.
(b) Ed has exhaled.

To represent the difference between (a) and (b), we bring the reference time R into the picture, expanding Σ = {E, S} to Σ = {R, E, S} with

(‡) $\boxed{R,E}\boxed{S}$ for the simple past (a), and $\boxed{E}\boxed{R,S}$ for the present perfect (b).