Bimorphisms and synchronous grammars

We tend to think of the study of language as proceeding by characterizing the strings and structures of a language, and we think of natural-language processing as using those structures to build systems of utility in manipulating the language. But many language-related problems are more fruitfully viewed as requiring the specification of a relation between two languages, rather than the specification of a single language. In this paper, we provide a synthesis and extension of work that unifies two approaches to such language relations: the automata-theoretic approach based on tree transducers that transform trees to their counterparts in the relation, and the grammatical approach based on synchronous grammars that derive pairs of trees in the relation. In particular, we characterize synchronous tree-substitution grammars and synchronous tree-adjoining grammars in terms of bimorphisms, which have previously been used to characterize tree transducers. In the process, we provide new approaches to formalizing the various concepts: a metanotation for describing varieties of tree automata and transducers in equational terms; a rigorous formalization of tree-adjoining and tree-substitution grammars and their synchronous counterparts, using trees over ranked alphabets; and generalizations of tree-adjoining grammar allowing multiple adjunction.


introduction
We tend to think of the study of language as proceeding by characterizing the strings and structures of a language, and we think of natural-language processing as using those structures to build systems of utility in manipulating the language. But many language-related problems are more fruitfully viewed as requiring the specification of a relation between two languages, rather than the specification of a single language. The paradigmatic case is machine translation, where the translation relation between the source and target natural languages is itself the goal to be characterized. Similarly, the study of semantics involves a relation between a natural language and a language of semantic representation (phonological form and logical form in one parlance). Computational interpretation of text, as in question-answering or natural-language command and control systems, requires computing that relation in the direction from natural language to semantic representation, and tactical generation in the opposite direction. Sentence paraphrase and compression can be thought of as computing a relation between strings of a single natural language. Similar examples abound.
The modelling of these relations has been a repeated area of study throughout the history of computational linguistics, proceeding in phases that have alternated between emphasizing automata-theoretic tools and grammatical tools. On the automata-theoretic side, the early pioneering work of Rounds (1970) on tree transducers was intended to formalize aspects of transformational grammars, and led to a long development of the formal-language theory of tree transducers. Grammatical approaches are based on the idea of synchronizing the grammars of the related languages. We use the general term synchronous grammars for the idea (Shieber and Schabes 1990), though early work in formalizing programming-language compilation uses the more domain-specific term syntax-directed transduction or translation (Lewis and Stearns 1968; Aho and Ullman 1969), and a variety of specific systems - inversion transduction grammars (Wu 1996, 1997), head transducers (Alshawi et al. 2000), multitext grammars (Melamed 2003, 2004) - forgo the use of the term. The early work on the synchronous grammar approach for natural-language application involved synchronizing tree-adjoining grammars (TAG). A recent resurgence of interest in automata-theoretic approaches in the machine translation community (Graehl and Knight 2004; Galley et al. 2004) has led to more powerful types of transducers (Maletti et al. 2009) and a far better understanding of the computational properties of and relationships among different transducer types (Maletti et al. 2009). Synchronous grammars have also seen a rise in application in areas such as machine translation (DeNeefe and Knight 2009), linguistic semantics (Han and Hedberg 2008), and sentence compression (Yamangil and Shieber 2010).
As these various models were developed, the exact relationship among them had been unclear, with a large number of seemingly unrelated formalisms being independently proposed or characterized. In particular, the grammatical approach to tree relations found in synchronous grammar formalisms and the automata-theoretic approach of tree transducers have been viewed as contrasting approaches.
A reconciliation of these two approaches was initiated in two pieces of earlier work (Shieber 2004), which the present paper unifies, simplifies, and extends. That work proposed to use the formal-language-theoretic device of bimorphisms (Arnold and Dauchet 1982), previously little known outside the formal-language-theory community, as a means for unifying the two approaches and clarifying the interrelations. It investigated the formal properties of synchronous tree-substitution grammars (STSG) and synchronous tree-adjoining grammars (STAG) from this perspective, showing that both formalisms, along with traditional tree transducers, can be thought of as varieties of bimorphisms. This earlier work has already been the basis for further extensions, such as the synchronous context-free tree grammars of Nederhof and Vogler (2012).
The present paper includes all of the results of the prior two papers, with notations made consistent, presentations clarified and expanded, and proofs simplified, and therefore supersedes those papers. It provides a definitive presentation of the formal foundations for TSG, TAG, and their synchronous versions, improving on the earlier presentations. To our knowledge, it provides the most consistent definition of TAG and STAG available, and the only one to use trees over ranked rather than unranked alphabets. It also, in passing, provides a characterization of transducers in terms of equational systems using a uniform metagrammar notation, a new characterization of the relation between tree-adjoining grammar derivation and derived trees, and a new simpler and more direct proof of the equivalence of tree-adjoining languages and the output languages of monadic macro tree transducers, formal contributions that may have independent utility. Finally, it extends the prior results to cover more linguistically appropriate variants of synchronous tree-adjoining grammars, in particular incorporating multiple adjunction.
After some preliminaries (Section 2), we present a set of known results relating context-free languages, tree homomorphisms, tree automata, and tree transducers, and extend them to the tree-adjoining languages (Section 3), presenting these in terms of restricted kinds of functional programs over trees, using a simple grammatical notation for describing the programs. We review the definition of tree-substitution and tree-adjoining grammars (Section 4) and synchronous versions thereof (Section 5). We prove the equivalence between STSG and a variety of bimorphism (Section 6).
The grammatical presentation of transducers as functional programs allows us to easily express generalizations of the notions: monadic macro tree homomorphisms, automata, and transducers, which bear (at least some of) the same interrelationships that their traditional simpler counterparts do (Section 7). Finally, we use this characterization to place the synchronous TAG formalism in the bimorphism framework (Section 7.3), further unifying tree transducers and other synchronous grammar formalisms. We show that these methods generalize to TAG allowing multiple adjunction as well (Section 8).

The present work, being based on and synthesizing work from some ten years ago, is by no means the last word in the general area. Indeed, since publication of the earlier articles, the connections among synchronous grammars, transducers, and bimorphisms have been considerably further clarified. The relation between bimorphisms and tree transducers has benefitted from a notion of extended top-down tree transducers, which have been shown to be strongly equivalent to the B(LC, LC) bimorphism class we discuss below (Maletti 2008). Koller and Kuhlmann (2011) provide an elegant generalization of monolingual and synchronous systems in terms of interpreted regular tree grammars (IRTG), in spirit quite close to the idea here of reconstructing synchronous grammars as bimorphism-like formal systems. Their IRTG can be used for CFG, TSG, TAG, and synchronous versions of various sorts. Of especial interest are the formalizations of Büchse et al. (2012, 2014), which modify the definitions of TAG to incorporate state information at substitution and adjunction sites. This modification eliminates much of the inelegance of the formalization here that accounts for our having to couch the various equivalences we show in terms of weak rather than strong generative capacity. The presentation below should be helpful in understanding the background to these works as well.

preliminaries
We start by defining the terminology and notations that we will use for strings, trees, and the like.

Basics
We will notate sequences with angle brackets, e.g., 〈a, b, c〉, or where no confusion results, simply as abc, with the empty string written ε.
We follow much of the formal-language-theory literature (and in particular, the tree transducer literature) in defining trees over ranked alphabets, in which the symbols decorating the nodes are associated with fixed arities. (By contrast, formal work in computational linguistics typically uses unranked trees.) Trees will thus have nodes labeled with elements of a ranked alphabet, a set of symbols F, each with a non-negative integer rank or arity assigned to it, determining the number of children for nodes so labeled. To emphasize the arity of a symbol, we will write it as a parenthesized superscript, for instance f (n) for a symbol f of arity n. Analogously, we write F (n) for the set of symbols in F with arity n. Symbols with arity zero (F (0) ) are called nullary symbols or constants. The set of nonconstants is written F (≥1) .

To express incomplete trees, trees with "holes" waiting to be filled, we will allow leaves to be labeled with variables, in addition to nullary symbols. The set of trees over a ranked alphabet F and variables X, notated T(F, X), is the smallest set such that

Nullary symbols at leaves: f ∈ T(F, X) for all f ∈ F (0) ;
Variables at leaves: x ∈ T(F, X) for all x ∈ X;
Internal nodes: f (t 1 , . . . , t n ) ∈ T(F, X) for all f ∈ F (n) , n ≥ 1, and t 1 , . . . , t n ∈ T(F, X).
Where convenient, we will blur the distinction between the leaf and internal node notation for a nullary symbol f , allowing f () as synonymous for the leaf node f .
We abbreviate T(F, ∅), where the set of variables is empty, as T(F), the set of ground trees over F. We will also make use of the set of n numerically ordered variables X n = {x 1 , . . . , x n }, and write x, y, z as synonyms for x 1 , x 2 , x 3 , respectively.

Trees can also be viewed as mappings from tree addresses, sequences of integers, to the labels of nodes at those addresses. The address ε is the address of the root, 1 the address of the first child, 12 the address of the second child of the first child, and so forth. We write q ≺ p to indicate that tree address q is a proper prefix of p, and p − q for the sequence obtained from p by removing prefix q from the front. For instance, 1213 − 12 = 13. We will use the notation t/p to pick out the subtree of the node at address p in the tree t, that is (using · for the insertion of an element on a sequence),

t/ε = t
f (t 1 , . . . , t n )/(i · p) = t i /p for 1 ≤ i ≤ n.

The notation t@p picks out the label of the node at address p in the tree t, that is, the root label of t/p. Replacing the subtree of t at address p by a tree t ′, written t[p → t ′], is defined analogously.

The height of a tree t, notated height(t), is defined as follows:

height( f ) = 0 for f ∈ F (0)
height(x) = 0 for x ∈ X
height( f (t 1 , . . . , t n )) = 1 + max {height(t 1 ), . . . , height(t n )}.

We can use trees with variables as contexts in which to place other trees. A tree in T(F, X n ) will be called a context, typically denoted with the symbol C. The notation C[t 1 , . . . , t n ] for t 1 , . . . , t n ∈ T(F) denotes the tree in T(F) obtained by substituting for each x i the corresponding t i .
More formally, for a context C ∈ T(F, X n ) and a sequence of n trees t 1 , . . . , t n ∈ T(F), the substitution of t 1 , . . . , t n into C, notated C[t 1 , . . . , t n ], is defined inductively as follows:

x i [t 1 , . . . , t n ] = t i
f [t 1 , . . . , t n ] = f for f ∈ F (0)
f (u 1 , . . . , u m )[t 1 , . . . , t n ] = f (u 1 [t 1 , . . . , t n ], . . . , u m [t 1 , . . . , t n ]).
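Substitution into a context is a straightforward recursion. Here is an illustrative sketch in the (symbol, children) encoding, with the variable x i written as a leaf labeled "xi" (a convention of ours, not the paper's):

```python
# Substitute trees ts = [t1, ..., tn] into context c: a leaf labeled
# "xi" is replaced by ts[i-1]; all other nodes are copied recursively.
def substitute(c, ts):
    sym, kids = c
    if not kids and sym.startswith("x") and sym[1:].isdigit():
        return ts[int(sym[1:]) - 1]     # variable leaf xi -> ti
    return (sym, [substitute(k, ts) for k in kids])

# C = f(x1, g(x2)); C[a, b] = f(a, g(b))
C = ("f", [("x1", []), ("g", [("x2", [])])])
t1, t2 = ("a", []), ("b", [])
assert substitute(C, [t1, t2]) == ("f", [("a", []), ("g", [("b", [])])])
```

Note that the encoding assumes no ordinary alphabet symbol has the form "x" followed by digits; a production implementation would keep variables structurally distinct.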

A grammatical metanotation
We will use a grammatical notation akin to BNF to specify, among other constructs, equations defining functional programs of various sorts. As an introduction to this notation, here is a grammar defining trees over a ranked alphabet and variables (essentially identically to the definition given above):

x ∈ X ::= x 1 | x 2 | x 3 | · · ·
t ∈ T(F, X) ::= f (n) (t 1 , . . . , t n ) | x

The notation allows definition of classes of expressions (e.g., F (n) ) and specifies metavariables over them ( f (n) ). These classes can be primitive (F (n) ) or defined (X), even inductively in terms of other classes or themselves (T(F, X)). We use the metavariables and subscripted variants on the right-hand side to represent an arbitrary element of the corresponding class. Thus, the elements t 1 , . . . , t m stand for arbitrary trees in T(F, X), and x an arbitrary variable in X. Because numerically subscripted versions of x appear explicitly and individually enumerated as instances of X (on the right-hand side of the rule defining variables), numerically subscripted variables (e.g., x 1 ) on the right-hand side of all rules are taken to refer to the specific elements of X (for instance, in the definition (1) of tree transducers), whereas otherwise subscripted elements within the metanotation (e.g., x i , t 1 , t m ) are taken as metavariables.
tree transducers, homomorphisms, and automata

We review the formal definitions of tree transducers and related constructions for defining tree languages and relations, making use of the grammatical metanotation to define them as functional program classes.

Tree transducers
The variation in tree transducer formalisms is extraordinarily wide and the literature vast. For present purposes, we restrict attention to simple nondeterministic tree transducers operating top-down, which transform trees by replacing each node with a subtree as specified by the label of the node and the state of the transduction at that node.
Informally, a tree transducer (specifically, a nondeterministic top-down tree transducer (↓T T )) specifies a nondeterministic computation from T(F) to T(G), defined such that the symbol at the root of the input tree and a current state determine an output context in which the recursive images of the subtrees are placed. Formally, we can define a transducer as a kind of functional program, that is, a set of equations characterized by the following grammar for equations Eqn. (The set of states is conventionally notated Q, with members notated q. One of the states is distinguished as the initial state of the transducer.)

e ∈ Eqn ::= q( f (n) (x 1 , . . . , x n )) .= τ where τ ∈ R (n)
τ ∈ R (n) ::= q(x i ) for 1 ≤ i ≤ n
| g (m) (τ 1 , . . . , τ m )

Intuitively speaking, the expressions in R (n) are right-hand-side terms using variables limited to the first n.
Given this formal description of the set of equations Eqn, a tree transducer is defined as a tuple 〈Q, F, G, ∆, q 0 〉 where

• Q is a finite set of states;
• F is a ranked alphabet of input symbols;
• G is a ranked alphabet of output symbols;
• ∆ ⊆ Eqn is a finite set of equations; and
• q 0 ∈ Q is a distinguished initial state.

Conventional nomenclature refers to the equations as transitions, by analogy with transitions in string automata. We use both terms interchangeably. To make clear the distinction between these equations and other equalities used throughout the paper, we use the special equality symbol .= for these equations.

The equations define a derivation relation as follows. Given a tree transducer 〈Q, F, G, ∆, q 0 〉, a tree t derives a tree t ′ in one step if there is an equation q( f (x 1 , . . . , x n )) .= τ in ∆, a context C ∈ T(F ∪ G ∪ Q, X 1 ) in which the variable x 1 occurs exactly once, and trees u 1 , . . . , u n ∈ T(F ∪ G), such that t = C[q( f (u 1 , . . . , u n ))] and t ′ = C[τ[u 1 , . . . , u n ]]. We abuse notation by using the same symbol for the transition equations and the one-step derivation relation they define, and will further extend the abuse to cover the derivation relation's reflexive transitive closure. The tree relation defined by a ↓T T 〈Q, F, G, ∆, q 0 〉 is the set of all tree pairs 〈s, t〉 ∈ T(F) × T(G) such that q 0 (s) .= t. By virtue of nondeterminism in the equations, multiple equations for a given state q and symbol f , tree transducers define true relations rather than merely functions.
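The derivation relation can be simulated directly by collecting all outputs licensed by the equations. The following is an illustrative sketch in our own encoding (not the paper's notation): a right-hand side is either ("call", q, i), standing for q(x i ), or an output symbol with sub-templates.

```python
from itertools import product

# rules maps (state, input symbol) to a list of right-hand-side
# templates; nondeterminism arises from multiple templates per key.
def transduce(rules, q, t):
    sym, kids = t
    results = []
    for rhs in rules.get((q, sym), []):
        results.extend(instantiate(rhs, rules, kids))
    return results

def instantiate(rhs, rules, kids):
    if rhs[0] == "call":                  # ("call", q, i) ~ q(x_i)
        _, q, i = rhs
        return transduce(rules, q, kids[i - 1])
    sym, subs = rhs                       # output symbol over templates
    outs = [instantiate(s, rules, kids) for s in subs]
    return [(sym, list(choice)) for choice in product(*outs)]

# A relabeling transducer: state q rewrites f to g, copies constants.
rules = {
    ("q", "f"): [("g", [("call", "q", 1), ("call", "q", 2)])],
    ("q", "a"): [("a", [])],
    ("q", "b"): [("b", [])],
}
t = ("f", [("a", []), ("b", [])])
assert transduce(rules, "q", t) == [("g", [("a", []), ("b", [])])]
```

An absent (state, symbol) key yields no outputs, matching the failure of a derivation when no transition applies.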
By way of example, the equation grammar above allows the definition of a set of equations defining a tree transducer; such a transducer then licenses derivations that rewrite an input tree, step by step, into an output tree.

Subvarieties of transducers
Important subvarieties of the basic transducers can be defined by restricting the trees τ that form the right-hand sides of equations, the elements of R (n) used.
A transducer is

• linear if for each such equation defining the transducer, τ is linear, that is, no variable is used more than once;
• complete if τ contains every variable in X n at least once;
• ε-free if τ ∉ X n ;
• symbol-to-symbol if height(τ) = 1; and
• a delabeling if τ is complete, linear, and symbol-to-symbol.
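These restrictions are simple syntactic checks on right-hand sides. A sketch in the encoding used for the transducer example above (("call", q, i) standing for q(x i ); an illustration of ours, not the paper's notation):

```python
# Collect the variable indices used in a right-hand-side template.
def variables(rhs):
    if rhs[0] == "call":                  # ("call", q, i) ~ q(x_i)
        return [rhs[2]]
    return [v for s in rhs[1] for v in variables(s)]

# Height, counting each output symbol as one level; q(x_i) has height 0.
def height(rhs):
    if rhs[0] == "call":
        return 0
    return 1 + max((height(s) for s in rhs[1]), default=0)

def linear(rhs):
    vs = variables(rhs)
    return len(vs) == len(set(vs))        # no variable used twice

def complete(rhs, n):
    return set(variables(rhs)) >= set(range(1, n + 1))

def symbol_to_symbol(rhs):
    return height(rhs) == 1

def delabeling(rhs, n):
    return linear(rhs) and complete(rhs, n) and symbol_to_symbol(rhs)

rhs = ("g", [("call", "q", 1), ("call", "q", 1)])   # duplicates x1
assert not linear(rhs)
assert not complete(rhs, 2)                         # x2 never used
assert symbol_to_symbol(rhs)
```

A transducer satisfies a restriction when every one of its right-hand sides does.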

Nonlinearity deprecated
The following rules specify a transducer that recursively "rotates" subtrees of the form f (t 1 , f (t 2 , t 3 )) to the tree f ( f (t 1 , t 2 ), t 3 ), failing if the required pattern is not found; an example transduction is depicted graphically in Figure 1.

A variant transducer can allow f subtrees to remain unchanged (rather than failing) when the second argument is not itself an f tree. We add a (nondeterministic) equation to allow nonrotation, which puts the proper constraint on its second subtree y through a new state q ′. This allows, for instance, the "already rotated" tree in Figure 1(b) to transduce to itself. Note that intrinsic use is made in these examples of the ability to duplicate variables on the right-hand sides of rewrite rules. Transducers without such duplication are linear. Linear tree transducers are incapable of performing local rotations of this sort.
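The rotation can be rendered as a recursive function. This sketch (illustrative, with the nonrotating case folded in as the default) makes plain why the transducer needs nonlinearity: deciding whether to rotate requires inspecting the second subtree, which a general-purpose program can do directly but the transducer formalism can only do by copying the subtree through several states.

```python
# Recursively rotate f(t1, f(t2, t3)) to f(f(t1, t2), t3); any other
# configuration is copied unchanged (the q' "nonrotation" case).
def rotate(t):
    sym, kids = t
    if sym == "f" and kids[1][0] == "f":     # f is binary here
        t1 = kids[0]
        t2, t3 = kids[1][1]
        return ("f", [("f", [rotate(t1), rotate(t2)]), rotate(t3)])
    return (sym, [rotate(k) for k in kids])

# f(a, f(a, b)) rotates to f(f(a, a), b)
t = ("f", [("a", []), ("f", [("a", []), ("b", [])])])
assert rotate(t) == ("f", [("f", [("a", []), ("a", [])]), ("b", [])])
```

An "already rotated" tree such as f(f(a, a), b) falls through to the default case at the root and transduces to itself.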
Local rotations are typical of natural-language applications. For instance, many of the kinds of translation divergences between languages, such as that exemplified in Figure 2, manifest such rotations. Similarly, semantic bracketing paradoxes can be viewed as necessitating rotations. Thus, linear tree transducers are insufficient for natural-language modeling purposes.
Nonlinearity per se, the ability to make copies during transduction, is not the kind of operation that is characteristic of natural-language phenomena. Furthermore, nonlinear transducers are computationally problematic. The following nonlinear transducer generates a tree that doubles in both width and depth.
Notice that the number of a's in the i-th iteration is 2^(2^i −1) . The size of this transducer's output is exponential in the size of its input. (The existence of such a transducer constitutes a simple proof of the lack of composition closure of tree transducers, as the exponential of an exponential grows faster than exponential.) In summary, nonlinearity seems inappropriate on computational and linguistic grounds, yet is required for tree transducers to express the kinds of simple local rotations that are typical of natural-language transductions. By contrast, STSG, as described in Section 6, is intrinsically a linear formalism but can express rotations straightforwardly.
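A simplified variant of such a copying transducer, rendered as a function (an illustrative sketch of ours, not necessarily the paper's exact rule set), already exhibits the exponential blow-up from a single nonlinear rule:

```python
# Nonlinear rule q(f(x)) .= g(q(x), q(x)): the translated subtree is
# duplicated at every level, so output size is exponential in input size.
def double(t):
    sym, kids = t
    if sym == "f":                  # f is unary in this sketch
        u = double(kids[0])
        return ("g", [u, u])        # u used twice: the nonlinearity
    return ("a", [])                # the constant maps to itself

def count_leaves(t):
    sym, kids = t
    return 1 if not kids else sum(count_leaves(k) for k in kids)

# An input spine f^n(a) of size n+1 yields a full binary tree
# with 2^n leaves.
t = ("a", [])
for n in range(1, 6):
    t = ("f", [t])
    assert count_leaves(double(t)) == 2 ** n
```

Composing such a transducer with itself then yields doubly exponential growth, which is the source of the non-closure-under-composition argument mentioned above.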

Tree automata and homomorphisms
Two subcases of tree transducers are especially important. First, tree transducers that implement a partial identity function over their domain are tree automata. These are delabeling tree transducers that preserve the label and the order of arguments. Because they compute only the identity function, tree automata are of interest for the domains over which they are defined, not the mappings they compute. This domain forms a tree language, the tree language recognized by the automaton. The tree languages so recognized are the regular tree languages (or recognizable tree languages). Though the regular tree languages are a superset of the tree languages defined by context-free grammars (the local tree languages), the string languages defined by their yield are coextensive with the context-free languages. We take tree automata to be quadruples by dropping one of the redundant alphabets from the corresponding tree transducer quintuple.
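Recognition by such an automaton can be sketched as a top-down check, the domain side of the identity transduction. This encoding is illustrative (ours, not the paper's): transitions map a state and symbol to alternative tuples of child states.

```python
# delta maps (state, symbol) to a list of alternative tuples of child
# states; a tree is accepted if some run assigns states consistently.
def accepts(delta, q, t):
    sym, kids = t
    for child_states in delta.get((q, sym), []):
        if len(child_states) == len(kids) and \
           all(accepts(delta, qi, k) for qi, k in zip(child_states, kids)):
            return True
    return False

# Recognizes the trees built from binary f over the constant a.
delta = {
    ("q", "f"): [("q", "q")],
    ("q", "a"): [()],
}
assert accepts(delta, "q", ("f", [("a", []), ("a", [])]))
assert not accepts(delta, "q", ("f", [("b", []), ("a", [])]))
```

Because the transitions preserve labels and argument order, the mapping computed is the identity on exactly the recognized trees, which is the sense in which automata are delabeling transducers of a special kind.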
Second, tree homomorphisms are deterministic tree transducers with only a single state, hence essentially stateless. The replacement of a node by a subtree thus proceeds deterministically and independently of its context. Consequently, a homomorphism h : T(F) → T(G) is specified by its kernel, a function ĥ : F → T(G, X ∞ ) such that ĥ( f ) is a context in T(G, X arity( f ) ) for each symbol f ∈ F. The kernel ĥ is extended to the homomorphism h by the following recurrence:

h( f (t 1 , . . . , t n )) = ĥ( f )[h(t 1 ), . . . , h(t n )]

that is, ĥ( f ) acts as a context in which the homomorphic images of the subtrees are substituted.
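The recurrence can be sketched directly. In this illustrative encoding (ours), the kernel maps each symbol to a context over the output alphabet in which variable leaves are written ("x", i):

```python
# Apply the homomorphism determined by kernel: each symbol f is replaced
# by its context kernel[f], with ("x", i) leaves standing for the
# homomorphic images of f's subtrees.
def hom(kernel, t):
    sym, kids = t
    images = [hom(kernel, k) for k in kids]
    return plug(kernel[sym], images)

def plug(c, images):
    if c[0] == "x":                   # variable leaf ("x", i)
        return images[c[1] - 1]
    sym, kids = c
    return (sym, [plug(k, images) for k in kids])

# Kernel swapping the arguments of f and relabeling: f -> g, a -> b.
kernel = {
    "f": ("g", [("x", 2), ("x", 1)]),
    "a": ("b", []),
}
t = ("f", [("a", []), ("f", [("a", []), ("a", [])])])
assert hom(kernel, t) == ("g", [("g", [("b", []), ("b", [])]), ("b", [])])
```

Statelessness is visible in the code: the replacement of a node depends only on its own symbol, never on surrounding context.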
As with transducers (see Section 3.2), further restrictions can be imposed to generate the subclasses of linear, complete, ε-free, symbol-to-symbol, and delabeling tree homomorphisms.
The import of these two subcases of tree transducers lies in the fact that the tree relations defined by certain tree transducers have been shown to be also characterizable by composition from these simplified forms, via an alternate and quite distinct formalization, to which we now turn.

The bimorphism characterization of tree transducers
Tree transducers can be characterized directly in terms of equations defining a simple kind of functional program, as above. Bimorphisms constitute an elegant alternative characterization of tree transducers in terms of a constellation of elements of the various subtypes of transducers -homomorphisms and automata -we have introduced.
A bimorphism is a triple 〈L, h in , h out 〉 consisting of a regular tree language L (or, equivalently, a tree automaton) and two tree homomorphisms h in and h out (connoting the input and output respectively). The tree relation defined by a bimorphism is the set of tree pairs that are generable from elements of the tree language by the homomorphisms, that is,

{〈h in (t), h out (t)〉 | t ∈ L}.

Depending on the type of tree homomorphisms used in the bimorphism, different classes of tree relations are defined. We can limit attention to bimorphisms in which the input or output homomorphisms are restricted to a certain type: linear (L), complete (C), ε-free (F ), symbol-to-symbol (S), delabeling (D), or unrestricted (M). We will write B(I, O), where I and O characterize a subclass of homomorphisms, for the set of bimorphisms for which the input homomorphism is in the subclass indicated by I and the output homomorphism is in the subclass indicated by O. For example, B(D, M ) is the set of bimorphisms for which the input homomorphism is a delabeling but the output homomorphism can be arbitrary.
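For a finite fragment of the language L, the defined relation can be computed directly. This is a sketch in our illustrative encoding (real regular tree languages are of course generally infinite, so an enumeration like this only serves to fix intuitions):

```python
# Each control tree t in L yields the pair <h_in(t), h_out(t)>.
def hom(kernel, t):
    sym, kids = t
    images = [hom(kernel, k) for k in kids]
    return plug(kernel[sym], images)

def plug(c, images):
    if c[0] == "x":                   # variable leaf ("x", i)
        return images[c[1] - 1]
    sym, kids = c
    return (sym, [plug(k, images) for k in kids])

def bimorphism_relation(L, h_in, h_out):
    return [(hom(h_in, t), hom(h_out, t)) for t in L]

# A control symbol R relabels to f on the input side, g on the output.
h_in  = {"R": ("f", [("x", 1)]), "a": ("a", [])}
h_out = {"R": ("g", [("x", 1)]), "a": ("a", [])}
L = [("R", [("a", [])])]
assert bimorphism_relation(L, h_in, h_out) == \
    [(("f", [("a", [])]), ("g", [("a", [])]))]
```

The control language thus plays the role the state set plays in a transducer, while the two homomorphisms split the input-matching and output-building work.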
The tree relations definable by bottom-up tree transducers (closely related to the top-down transducers we use here) turn out to be exactly this class B(D, M ). (See the survey by Comon et al. (2008, Section 6.5) and works cited therein.) The bimorphism notion thus allows us to characterize certain tree transductions purely in terms of tree automata and tree homomorphisms.
As an example, we consider the rotation transducer of Section 3.3, expressed as a bimorphism. The tree relation for the bimorphism expresses an abstract specification of where the rotations are to occur, picking out such cases with a special symbol R of arity 3, its arguments being the three subtrees participating in the rotation.
The input homomorphism maps these trees onto trees prior to rotation.
Notice that the trees rooted in R map onto a tree configuration that should be rotated. The output homomorphism maps each tree onto the corresponding post-rotation tree. Again, to allow the option of nonrotating configurations, we can add to the control trees nodes labeled F that should map onto configurations that cannot be rotated. (New equations are marked with ⇐.) The new q ′ state guarantees this constraint on the F trees.
The input homomorphism maps the new F trees onto f trees, as does the output homomorphism.
tree-substitution and tree-adjoining grammars

Tree-adjoining grammars (TAG) and tree-substitution grammars (TSG) are grammar formalisms based on tree rewriting, rather than the string rewriting of the Chomsky hierarchy formalisms. Grammars are composed of a set of elementary trees, which are combined according to simple tree operations. In the case of TAG, these operations are substitution and adjunction; in the case of TSG, substitution alone. Synchronous variants of these formalisms extend the base formalism with the synchronization idea presented in earlier work (Shieber 1994). In particular, grammars are composed of pairs of elementary trees, and certain pairs of nodes, one from each tree in a pair, are linked to indicate that operations incorporating trees from a single elementary pair must occur at the linked nodes.
We review here the definition of tree-substitution and tree-adjoining grammars, and their synchronous variants. Since TSG can be thought of as a subset of TAG, we first present TAG, describing the restriction to TSG thereafter. Our presentation of TAG differs slightly from traditional ones in ways that simplify the synchronous variants and the later bimorphism constructions.

Tree-adjoining grammars
A tree-adjoining grammar is composed of a set of elementary trees, such as those depicted in Figure 4, that are combined by operations of substitution and adjunction. Traditional presentations of TAG, with which we will assume familiarity, take the symbols in elementary and derived trees to be unranked; nodes labeled with a given nonterminal symbol may have differing numbers of children. (Joshi and Schabes (1997) present a good overview.) For example, foot nodes of auxiliary trees and substitution nodes have no children, whereas the similarly labeled root nodes must have at least one. Similarly, two nodes with the same label but differing numbers of children may match for the purpose of allowing an adjunction (as the root nodes of α 1 and β 1 in Figure 4). In order to integrate TAG with tree transducers, however, we move to a ranked alphabet, which presents some problems and opportunities. (In some ways, the ranked alphabet definition of TAGs is slightly more elegant than the traditional one.) We will thus take the nodes of TAG trees to be labeled with symbols from a ranked alphabet F; a given symbol then has a fixed arity and a fixed number of children. However, in order to maintain information about which symbols may match for the purpose of adjunction and substitution, we take the elements of F to be explicitly formed as pairs of an unranked label e and an arity n. (For notational consistency, we will use e for unranked and f for ranked symbols.) We will notate these elements, abusing notation, as e (n) , and make use of a function |·| to unrank symbols in F, so that |e (n) | = e.
To handle foot nodes, for each non-nullary symbol e (i) ∈ F (≥1) , we will associate a new nullary symbol e * , which one can take to be the pair of e and * ; the set of such symbols will be notated F * . Similarly, for substitution nodes, F ↓ will be the set of nullary symbols e ↓ for all e (i) ∈ F (≥1) . These additional symbols, since they are nullary, will necessarily appear only at the frontier of trees. We will extend the function |·| to provide the unranked symbol associated with these symbols as well, so |e ↓ | = |e * | = e.
A TAG grammar (which we will define more precisely shortly) is based then on a set P of elementary trees, a finite subset of T(F ∪ F ↓ ∪ F * ), divided into the auxiliary and initial trees depending on whether they do or do not possess a foot node, respectively. In order to allow reference to a particular tree in the set P, we associate with each tree a unique name, conventionally notated with a subscripted α or β for initial and auxiliary trees respectively. We will abuse notation by using the name and the tree that it names interchangeably, and use primed and subscripted variants of α and β as variables over initial and auxiliary trees, with γ serving for elementary trees in general.
Traditionally in TAG grammars, substitutions are obligatory at substitution nodes (those with labels from F ↓ ) and adjunctions are optional at nodes with labels from F. This presents two problems. First, the optionality of adjunction makes it tricky to provide a canonical fixed-length specification of what trees operate at the various nodes in the tree; such a specification will turn out to be helpful in our definitions of derivation for TAG and synchronous TAG. (This is not a problem for substitution, as the obligatoriness of substitution means that there will be exactly as many trees substituting in as there are substitution nodes.) Second, it is standard within TAG to provide further constraints that disallow adjunction at certain nodes. So far, we have no provision for such nonadjoining constraints. To address these problems, we use a TAG formalism slightly modified from traditional presentations, one that loses no expressivity in weak generative capacity but is easier for analysis purposes.
First, we make all adjunction obligatory, in the sense that if a node in a tree allows adjunction, an adjunction must occur there. To get the effect of optional adjunction, for instance at a node labeled B, we add to the grammar a nonadjunction tree na B , a vestigial auxiliary tree of a single node B * , which has no adjunction sites and therefore does not itself modify any tree that it adjoins into. These nonadjunction trees thus found the recursive structure of derivations.

Figure 3: Sample TAG tree marked with diacritics to show the permutation of operable nodes. Note that the node at address 1 is left out of the set of operable sites; it is thus a nonadjoining node.
Second, now that it is determinate whether an operation must occur at a node, the number of children of a node in a derivation tree is determined by the elementary tree γ at that node; it is just the number of adjunction or substitution sites in γ, the operable sites, which we will notate γ. We take γ to be the set of adjunction and substitution nodes in the tree, that is, all nodes in the tree with the exception of the foot node. (Below, we will allow for nodes to be left out from the set of operable sites, and in Section 8, we generalize this to allow multiple adjunctions at a single site.) All that is left is to provide a determinate ordering of operable sites in an elementary tree, that is, a permutation π on the operable sites γ (or equivalently, their addresses). This permutation can be thought of as specified as part of the elementary tree itself. For example, the tree in Figure 3, which requires operations at the nodes at addresses ε, 12, and 2, may be associated with the permutation 〈12, 2, ε〉. The permutation can be marked on the tree itself with numeric diacritics i , as shown in the figure.
A nonadjoining constraint on a node can now be implemented merely by removing the node from the operable sites of a tree, and hence from the tree's associated permutation. In the graphical depictions, nonadjoining nodes are those non-substitution nodes that bear no numeric diacritic.
Formally, we define E(F), the elementary trees over a ranked alphabet F, to be all pairs □ γ = 〈γ, π〉 where γ ∈ T(F ∪ F ↓ ∪ F * ) and π is a permutation of a subset of the nodes in γ. As above, we use the notation γ to specify the operable sites of γ, that is, the domain of π. The operable sites γ must contain all substitution nodes in γ.
Providing nonadjunction trees for all elements of F is consistent with that practice. Our approach, however, opens the possibility of leaving out nonadjunction trees for one or more symbols, thereby implementing a kind of global obligatory adjunction constraint, less expressive than those variants of TAG that have node-based obligatory adjunction constraints, but more so than the purely adjunction-optional approach.
We further require that a tree γ whose root is labeled f contain at most one node labeled with | f | * ∈ F * and no other nodes labeled in F * ; this node is its foot node, and its address is notated foot(γ). The foot node is not an operable site. Trees with a foot node are auxiliary trees; those with no foot node are initial trees. The set E(F) is the set of all possible such elementary trees.
The notation □ γ is used to indicate an elementary tree, the box as a mnemonic for the box diacritics labeling the permutation. We use similar notations for the particular cases where the elementary tree is initial ( □ α) or auxiliary ( □ β). For convenience, for an elementary tree □ γ, we will use γ for its tree component when no confusion results, and will conflate the tree properties of an elementary tree □ γ and its tree component γ.
A TAG grammar is then a triple 〈F, P, S〉, where F is a ranked alphabet; P is the set of elementary trees, a finite subset of E(F); and S ∈ F ↓ is a distinguished initial symbol. We further partition the set P into the set I of initial trees in P and the set A of auxiliary trees in P. A simple TAG grammar is depicted in Figure 4; α 1 and α 2 are initial trees, and β 1 and β 2 are auxiliary trees.
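The definitions above can be made concrete in a short sketch. This is our own encoding, not the paper's: trees over a ranked alphabet are (label, children) pairs, Gorn addresses are tuples of 1-based child indices, and an elementary tree pairs a tree with a permutation of operable-site addresses. The particular tree and permutation below are hypothetical stand-ins in the style of Figure 3.

```python
def subtree(tree, addr):
    """Return the subtree of `tree` at Gorn address `addr` (gamma/p)."""
    if not addr:
        return tree
    label, kids = tree
    return subtree(kids[addr[0] - 1], addr[1:])

# A hypothetical elementary tree: operations are required at address 12,
# address 2, and the root, in the order given by the permutation <12, 2, eps>.
gamma = ('A', [('B', [('C', []), ('D', [])]),   # children at addresses 11, 12
               ('E', [])])                      # child at address 2
perm = [(1, 2), (2,), ()]                       # permutation of operable sites

assert subtree(gamma, (1, 2)) == ('D', [])
assert len(perm) == 3                           # the arity of this elementary tree
```

The arity of the elementary tree (as a symbol in the derivation-tree alphabet, introduced below) is just the length of this permutation.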

The substitution and adjunction operations
We turn now to the operations used to derive more complex trees from the elementary trees. It is convenient to notationally distinguish derived trees that have the form of an initial or auxiliary tree, that is, (respectively) lacking or bearing a foot node. We use the bolded symbols α and β for derived trees in T(F ∪ F ↓ ∪ F * ) without and with foot nodes, respectively, again using γ when being agnostic as to the form.
The trees are combined by two operations, substitution and adjunction. Under substitution, a node labeled e ↓ (at address p) in a tree γ can be replaced by an initial-form tree α with the corresponding label f at the root when | f | = e. The resulting tree, the substitution of α at p in γ, is γ[p → α]. Under adjunction, an internal node of γ at p labeled f ∈ F is split apart, replaced by an auxiliary-form tree β rooted in f ; the resulting tree, the adjunction of β at p in γ, is γ[p → β[foot(β) → γ/p]]. This definition (by requiring f to be in F, not F * or F ↓ ) is consistent with the standard convention, without loss of expressivity, that adjunction is disallowed at foot nodes and substitution nodes. For uniformity, we will notate these operations with a single operator op p : under the convention that mapping the (nonexistent) foot of an initial-form tree leaves the tree unchanged, we can write γ[op p γ′] ≡ γ[p → γ′[foot(γ′) → γ/p]] for both operations.

Figure 4: Sample TAG for the copy language
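The two operations can be sketched concretely under our tuple encoding of trees (a (label, children) pair, with Gorn addresses as tuples of 1-based indices); foot nodes are marked, as in the text, by a '*' suffix on the label. This is a minimal illustration, not the paper's notation.

```python
def subtree(tree, addr):
    """gamma/p: the subtree at Gorn address p."""
    return tree if not addr else subtree(tree[1][addr[0] - 1], addr[1:])

def replace_at(tree, addr, new):
    """gamma[p -> new]: replace the subtree at address p with `new`."""
    if not addr:
        return new
    label, kids = tree
    i = addr[0] - 1
    return (label, kids[:i] + [replace_at(kids[i], addr[1:], new)] + kids[i + 1:])

def foot(tree, addr=()):
    """Address of the unique foot node (label ending in '*'), or None."""
    label, kids = tree
    if label.endswith('*'):
        return addr
    for i, kid in enumerate(kids, start=1):
        found = foot(kid, addr + (i,))
        if found is not None:
            return found
    return None

def op(gamma, p, g):
    """gamma[op_p g]: substitution if g has no foot node; adjunction
    (gamma[p -> g[foot(g) -> gamma/p]]) if it does."""
    f = foot(g)
    if f is None:
        return replace_at(gamma, p, g)              # substitution
    g = replace_at(g, f, subtree(gamma, p))         # splice gamma/p at the foot
    return replace_at(gamma, p, g)                  # adjunction

# Adjoining beta = A(b, A*) at the A node of S(A(a)) yields S(A(b, A(a))).
beta = ('A', [('b', []), ('A*', [])])
gamma = ('S', [('A', [('a', [])])])
assert op(gamma, (1,), beta) == ('S', [('A', [('b', []), ('A', [('a', [])])])])

# Substituting NP(John) at an NP substitution node.
assert op(('S', [('NP↓', [])]), (1,), ('NP', [('John', [])])) == \
       ('S', [('NP', [('John', [])])])
```

Note how the uniform operator works for both cases: when g has no foot, the foot-splicing step is skipped and op reduces to plain replacement, matching the convention in the text.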

Derivation trees and the derivation relation
A derivation tree D records the operations over the elementary trees used to derive a given derived tree. Each node in the derivation tree specifies an elementary tree □ γ, with the node's child subtrees D i recording the derivations for trees that are adjoined or substituted into that tree at the corresponding operable nodes. A derivation for a grammar G = 〈F, P, S〉 is a tree whose nodes are labeled with elementary trees, that is, a tree D in T(P). We here interpret P itself as a ranked alphabet, where for each □ γ = 〈γ, π〉 ∈ P, we take its arity to be arity( □ γ) ≡ |π|. This requirement enforces the constraint that nodes in a derivation tree labeled with □ γ will have exactly the right number of children to specify the subtrees to be used at each of the operable sites in □ γ. We add an additional constraint: Labels match: For each node in D labeled with □ γ = 〈γ, π〉, and for all i where 1 ≤ i ≤ arity( □ γ), the root node of the i-th child of □ γ, labeled with □ γ i , must match the corresponding operable site in □ γ, that is, |γ@π i | = |γ i @ε| .
(The notation γ@π i can be thought of as the node in γ labeled by diacritic i .) A derivation is complete if it is rooted in an initial tree that is itself rooted in the initial symbol: Initial symbol at root: The tree □ α r at the root of the derivation tree must be an initial tree labeled at its root by the initial symbol; that is, |α r @ε| = |S|. 5 For example, the tree in Figure 5(a) is a well-formed complete derivation tree for the grammar in Figure 4. Note, for instance, that |α 1 @π 2 | = S = |β 1 @ε| as required by the label-matching constraint, and the root is an initial tree α 1 whose root is consistent with the initial symbol S ↓ .
A simple tree automaton can check these conditions, and thereby define the set of well-formed complete derivation trees. This automaton is constructed as follows. The states of the automaton are the set {q N | N ∈ |F|}, one for each unranked vocabulary symbol in the derived tree language. The start state is q |S| . For each tree □ γ = 〈γ, π〉 ∈ P, of arity n and rooted with the symbol N , there is a transition of the form

q N ( □ γ(x 1 , . . . , x n )) .= □ γ(q N 1 (x 1 ), . . . , q N n (x n ))   where N i = |γ@π i | .

The set of well-formed derivation trees is thus a regular tree set.
5 The stripping of ranks and diacritics is necessary to allow, for instance, the initial symbol to match root nodes of differing arities.
For the grammar of Figure 4, the automaton defining well-formed derivation trees includes, among others, a transition of this form for α 1 beginning q S (α 1 (x, y)); the automaton recognizes the tree of Figure 5(a).

The derivation relation D, that is, the relation between derivation trees and the derived trees that they specify, can be simply defined via the hierarchical iterative operation of trees at operable sites. In particular, for a derivation tree with root labeled with the elementary tree □ γ = 〈γ, π〉 of arity n, we define

D( □ γ(t 1 , . . . , t n )) ≡ γ[op π 1 D(t 1 ), op π 2 D(t 2 ), . . . , op π n D(t n )]

where, following Schabes and Shieber (1994), the right-hand side specifies the simultaneous application of the specified operations. We define this in terms of the sequential application of operations, performing the first operation and adjusting the paths of the remaining ones:

γ[op p 1 γ 1 , op p 2 γ 2 , . . . , op p n γ n ] ≡ γ[op p 1 γ 1 ][op p′ 2 γ 2 , . . . , op p′ n γ n ]   where p′ i = update p 1 ,γ 1 (p i ) .

The update function adjusts the paths at which later operations take place to compensate for an earlier adjunction. (Recall the notations q ≺ p for q a proper prefix of p and p − q for the sequence obtained by removing the prefix q from p.)

update q,γ (p) = q · foot(γ) · (p − q)   if γ is an auxiliary-form tree and q ≺ p, and update q,γ (p) = p otherwise.

A simple calculation verifies that adjunctions at distinct sites commute under this definition, that is, that the order of adjunctions is immaterial according to this definition. The proof applies equally well to substitution and mixtures of operations. This proves that the order of the permutation over operable sites is truly arbitrary; any order will yield the same result. (In Section 8, the introduction of multiple adjunction presents the potential for noncommutativity. We address the issue in that section.) As the base case, this definition gives, as expected, D( □ γ) = γ for elementary trees of arity 0, that is, trees with no operable sites.
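The sequential interpretation of simultaneous operations, with path updating, can be sketched as follows. The helpers repeat our earlier tuple encoding so the fragment is self-contained; the grammar fragments are hypothetical.

```python
def subtree(t, a):
    return t if not a else subtree(t[1][a[0] - 1], a[1:])

def replace_at(t, a, new):
    if not a:
        return new
    label, kids = t
    i = a[0] - 1
    return (label, kids[:i] + [replace_at(kids[i], a[1:], new)] + kids[i + 1:])

def foot(t, a=()):
    label, kids = t
    if label.endswith('*'):
        return a
    for i, k in enumerate(kids, start=1):
        f = foot(k, a + (i,))
        if f is not None:
            return f
    return None

def op(gamma, p, g):
    f = foot(g)
    if f is None:
        return replace_at(gamma, p, g)
    return replace_at(gamma, p, replace_at(g, f, subtree(gamma, p)))

def update(p, q, g):
    """Adjust a later path p for an earlier adjunction of g at q: if g is
    auxiliary-form and q is a proper prefix of p, the subtree formerly at p
    now sits below g's foot."""
    f = foot(g)
    if f is not None and len(q) < len(p) and p[:len(q)] == q:
        return q + f + p[len(q):]
    return p

def apply_all(gamma, ops):
    """gamma[op_{p1} g1, ..., op_{pn} gn], applied sequentially with updates."""
    while ops:
        (p, g), rest = ops[0], ops[1:]
        gamma = op(gamma, p, g)
        ops = [(update(pp, p, g), gg) for pp, gg in rest]
    return gamma

# Order of operations is immaterial: adjoining at the root and at address 1
# in either order yields the same derived tree.
gamma = ('S', [('A', [('a', [])])])
beta_S = ('S', [('s', []), ('S*', [])])
beta_A = ('A', [('c', []), ('A*', [])])
one = apply_all(gamma, [((), beta_S), ((1,), beta_A)])
two = apply_all(gamma, [((1,), beta_A), ((), beta_S)])
assert one == two
```

The final assertion exercises exactly the commutativity claim in the text: the root adjunction displaces the node at address 1 to address 21 (below the foot of beta_S), and the update function compensates.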

Tree-substitution grammars
Tree-substitution grammars are simply tree-adjoining grammars with no auxiliary trees, so that the elementary trees are only combined by substitution.
As a simple natural-language example, we consider the grammar with three elementary trees of Figure 6 and initial symbol S. The arities of the symbols should be clear from their usage and the associated permutations from the link diacritics.
As in Section 4.3, the derived tree for a derivation tree D is generated by performing all of the requisite substitutions. In this section, we provide a new definition of the derivation relation between a derivation tree and the derived tree it specifies as a simple homomorphism h D , and prove that this definition is equivalent to that of Section 4.3. We define h D in equational form. For each elementary tree □ α ∈ P of arity n, there is an equation of the form

h D ( □ α(x 1 , . . . , x n )) .= ⌊α⌋

where the right-hand-side transformation ⌊·⌋ is defined by

⌊ f (t 1 , . . . , t m )⌋ = f (⌊t 1 ⌋, . . . , ⌊t m ⌋)
⌊ i f ↓ ⌋ = h D (x i ) .

Essentially, this transformation replaces each operable site π i by the homomorphic image h D (x i ) of the corresponding variable x i , for a tree α with n substitution sites in its permutation π.
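Under our tuple encoding, h D for a TSG amounts to the following sketch. The elementary trees and their permutations here are hypothetical stand-ins, not the trees of Figure 6.

```python
def replace_at(t, a, new):
    if not a:
        return new
    label, kids = t
    i = a[0] - 1
    return (label, kids[:i] + [replace_at(kids[i], a[1:], new)] + kids[i + 1:])

# Each elementary tree pairs a tree with the permutation of its substitution
# sites; derivation-tree nodes name elementary trees.
grammar = {
    'alpha_S':  (('S', [('NP↓', []), ('VP', [('sleeps', [])])]), [(1,)]),
    'alpha_NP': (('NP', [('John', [])]), []),
}

def h_D(deriv):
    """Homomorphism from a TSG derivation tree to its derived tree: copy the
    elementary tree, replacing the i-th operable site by h_D of the i-th
    child.  Linear and complete: each child is used exactly once."""
    name, children = deriv
    tree, perm = grammar[name]
    for addr, child in zip(perm, children):
        tree = replace_at(tree, addr, h_D(child))
    return tree

derived = h_D(('alpha_S', [('alpha_NP', [])]))
assert derived == ('S', [('NP', [('John', [])]),
                         ('VP', [('sleeps', [])])])
```

Because substitution replaces leaves, no path updating is needed here; the sites of an elementary tree are at fixed addresses throughout.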

An example derivation
Returning to the example, the equations corresponding to the elementary trees of Figure 6 are constructed in just this way. We define the derived tree corresponding to a derivation tree D as the application of this homomorphism to D, that is, h D (D). For the example above, the derived tree is that shown in Figure 2(a). By composing the automaton recognizing well-formed derivation trees with the homomorphism above, we can construct a single transducer doing the work of both. We do this explicitly for TAG in Section 7.1.
Note that, by construction, each variable occurs exactly once on the right-hand side of a given equation. Thus, this homomorphism h D is linear and complete.

Equivalence of D and h D
We can show that this definition in terms of the linear complete homomorphism h D is equivalent to the traditional definition D, that is, that

h D (D) = D(D)     (6)

for all derivation trees D. The proof is by induction on the height of D. Since ⌊·⌋ is the identity function everywhere except at operable sites, h D ( □ α) = ⌊α⌋ = α = D( □ α) for elementary trees with no operable sites. This serves as the base case for the induction. Now suppose that Equation (6) holds for trees of height k, and consider a tree □ α(D 1 , . . . , D n ) of height k + 1. Then

h D ( □ α(D 1 , . . . , D n )) = ⌊α⌋ with each x i replaced by h D (D i )
= α[op π 1 h D (D 1 ), . . . , op π n h D (D n )]
=* α[op π 1 D(D 1 ), . . . , op π n D(D n )]
= D( □ α(D 1 , . . . , D n )) .

The marked step applies the induction hypothesis. Later, in Section 7, we will provide a similar reformulation of the derivation relation for tree-adjoining grammars. To do so, however, requires additional power beyond simple tree homomorphisms, which is the subject of that section.

synchronous grammars
We perform synchronization of tree-adjoining and tree-substitution grammars as per the approach taken in earlier work (Shieber 1994). Synchronous grammars consist of pairs of elementary trees with a linking relation between operable sites in each tree. Simultaneous operations occur at linked nodes. In the case of synchronous tree-substitution grammars, the composition operation is substitution, so the linked nodes are substitution nodes. We define a synchronous tree-adjoining grammar, then, as a quintuple G = 〈F in , F out , P, S in , S out 〉, where
• F in and F out are the input and output ranked alphabets, respectively,
• S in ∈ F in↓ and S out ∈ F out↓ are the input and output initial symbols, and
• P is a set of elementary linked tree pairs, each of the form 〈γ in , γ out , ⌢〉, where γ in and γ out are input and output elementary trees and ⌢ ⊆ γ in × γ out is a bijection between operable sites from the two trees.
We define G in = 〈F in , P in , S in 〉 where P in = {〈γ, π in 〉 | 〈γ, γ ′ , ⌢〉 ∈ P}; this is the left projection of the synchronous grammar onto a simple TAG. The right projection G out is defined similarly. Recall that the elementary trees in this grammar need a permutation on their operable sites. In order to guarantee that derivations for the synchronized grammars are isomorphic, the permutations for the operable sites for paired trees should be consistent. We therefore choose an arbitrary permutation 〈p in,1 ⌢ p out,1 , . . . , p in,n ⌢ p out,n 〉 over the linked pairs, and take the permutations π in for γ in and π out for γ out to be defined as π in = 〈p in,1 , . . . , p in,n 〉 and π out = 〈p out,1 , . . . , p out,n 〉. Since ⌢ is a bijection, these projections are permutations as required.
A synchronous derivation was originally defined (Shieber 1994) as a pair 〈D in , D out 〉 where 6 1. D in is a well-formed derivation tree for G in , and D out is a well-formed derivation tree for G out , and 2. D in and D out are isomorphic. 7 The derived tree pair for derivation 〈D in , D out 〉 is then 〈D(D in ), D(D out )〉.
6 In our earlier definition (Shieber 1994), a third condition required that the isomorphic operations be sanctioned by links in tree pairs. This condition can be dropped here, as it follows from the previous definitions. In particular, since the permutations for paired trees are chosen to be consistent, it follows that the isomorphic children of isomorphic nodes are substituted at linked paths. 7 By "isomorphism" here, we mean the normal sense of isomorphism of rooted trees where the elementary-tree-pairing relation in P serves as the bijection witnessing the isomorphism.
Presentations of synchronous tree-adjoining grammars typically weaken the requirement that the linking relation be a bijection; multiple links are allowed to impinge on a single node. One of two interpretations is possible in this case. We might require that if multiple links impinge upon a node, only one of the links be used. Under this interpretation, the multiple links at a node can be thought of as abbreviatory for a set of trees, each of which contains only one of the links. (The abbreviated form allows for exponentially fewer trees, however.) Thus, the formalism is equivalent to the one described in this section in terms of bijective link relations. Alternatively, we might allow true multiple adjunction of nontrivial trees, which requires an extended notion of derivation tree and derivation relation. This interpretation, proposed by Schabes and Shieber (1994), is arguably better motivated. We defer discussion of multiple adjunction to Section 8, where we address the issue in detail.

the bimorphism characterization of stsg
The central result we provide relating STSG to tree transducers is this: STSG is weakly equivalent to B(LC, LC), that is, equivalent in the characterized string relations. To show this, we must demonstrate that any STSG is reducible to a bimorphism, and vice versa.

Reducing STSG to B(LC, LC)
Given an STSG G = 〈F in , F out , P, S in , S out 〉, we need to construct a bimorphism characterizing the same tree relation. All the parts are in place to do this. We start by defining a language of synchronous derivation trees, which recasts synchronous derivations as single derivation trees from which the left and right derivation trees can be projected via homomorphisms. Rather than taking a synchronous derivation to be a pair of isomorphic trees D in and D out , we take it to be the single tree D isomorphic to both, whose element at address p is the elementary tree pair in P that includes D in @p and D out @p. The two synchronized derivations D in and D out can be separately recovered by projecting this new derivation tree on its first and second elements via homomorphisms: h in that projects on the first component and h out that projects on the second. These homomorphisms are trivially linear and complete (indeed, they are mere delabelings). We define the set of well-formed synchronous derivation trees to be the set of trees D ∈ T(P) such that h in (D) and h out (D) are both well-formed derivation trees as per Section 4.3. Since the regular tree languages are closed under inverse homomorphism and intersection, this set is a regular tree language.
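The two delabeling projections can be sketched directly. Here a synchronous derivation tree has nodes labeled by pairs, and the pair names are hypothetical.

```python
def project(d, k):
    """Delabeling homomorphism h_in (k = 0) or h_out (k = 1): project each
    pair-labeled node of a synchronous derivation tree onto one component."""
    pair, children = d
    return (pair[k], [project(c, k) for c in children])

# A two-node synchronous derivation over hypothetical tree-pair names.
d = (('a_in', 'a_out'), [(('b_in', 'b_out'), [])])
assert project(d, 0) == ('a_in', [('b_in', [])])
assert project(d, 1) == ('a_out', [('b_out', [])])
```

The projections are linear and complete homomorphisms that change only node labels, as the text observes, so closure of the regular tree languages under their inverses applies.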
The fact that for any well-formed synchronous derivation tree D, h in (D) and h out (D) are well-formed derivation trees for their respective TSGs is trivial by construction. It is also trivial to show that any paired derivation has a corresponding well-formed synchronous derivation tree.
For a given well-formed synchronous derivation tree D, the paired derived trees can be constructed as h D (h in (D)) and h D (h out (D)), respectively. Thus the mappings from the derivation tree to the derived trees are the compositions of two linear complete homomorphisms, hence linear complete homomorphisms themselves. We take the bimorphism characterizing the STSG tree relation to be the triple of the set of well-formed synchronous derivation trees together with the homomorphisms h D ◦ h in and h D ◦ h out . Thus, the tree relation defined by the STSG is in B(LC, LC).

Reducing B(LC, LC) to STSG
The other direction is somewhat trickier to prove. Given a bimorphism 〈L, h in , h out 〉 over input and output alphabets F in and F out , respectively, we construct a corresponding STSG G = 〈F ′ in , F ′ out , P, S in , S out 〉. By "corresponding", we mean that the tree relation defined by the bimorphism is obtainable from the tree relation defined by the STSG via simple homomorphisms of the input and output that eliminate the nodes labeled in Q (as described below). The tree yields are unchanged by these homomorphisms; thus, the string relations defined by the bimorphism and the synchronous grammar are identical.
As the language L is a regular tree language, it is generable by a nondeterministic tree automaton 〈Q, F d , ∆, q 0 〉. We use the states of this automaton in the input and output alphabets of the STSG. The input alphabet of the STSG is F ′ in = F in ∪ Q, composed of the input symbols of the bimorphism, along with the states of the automaton (taken to be symbols of arity 1), and similarly for the output alphabet. The state symbols mark the places in the tree where substitutions occur, allowing control for appropriate substitutions. It is these state symbols that can be eliminated by a simple homomorphism. 8

The basic idea of the STSG construction is to construct an elementary tree pair corresponding to each compatible pair of transitions in the transducer. For each such pair of transitions, we construct a tree pair in which the following transformation is applied to the right-hand sides of the transitions to form the body of the synchronized trees: wherever a state appears in a transition, we introduce a state-labeled node above a substitution site. Note that this transformation generates the tree along with a permutation of the operable sites (all substitution nodes) in the tree, and that there will be exactly n such sites in each element of the tree pair, since the transitions are linear and complete by hypothesis. Thus, the two permutations define an appropriate linking relation, which we take to be the synchronous grammar linking relation for the tree pair.

An example may clarify the construction. Take the language of the bimorphism to be defined by the following two-state automaton.

8 An earlier version of this construction did not introduce any extra tree structure in the STSG, so that the trees generated by the bimorphism relation could be recovered by a delabeling rather than a homomorphism deleting extra nodes. However, the proof of equivalence was considerably more subtle, and did not generalize as readily to the case of STAG. Nonetheless, it is useful to note that even more faithful STSG reconstructions of bimorphisms are possible. Alternately, the definition of STSG (and similarly, STAG) can be modified to incorporate finite-state information explicitly at operable sites. By adding in this information, the bookkeeping done here can be folded into the states, allowing for a stricter strong-generative capacity equivalence. This elegant approach is pursued by Büchse et al. (2014).
This automaton uses the states to alternate g's with f 's and a's level by level. For instance, it admits the middle tree in Figure 7. With input and output homomorphisms defined by equations including ĥ in (a) .= A and ĥ out (a) .= N, the bimorphism so defined generates the tree relation instance exemplified in the figure. The construction given above generates the elementary tree pairs in Figure 8 for this bimorphism. The reader can verify that the grammar generates a tree pair which corresponds to that shown in Figure 7 generated by the bimorphism after deletion of the state symbols.
By placing STSG in the class of bimorphisms, which have already been used to characterize tree transducers, we synthesize these two independently developed approaches to specifying tree relations. But the relation between a TAG derivation tree and its derived tree is not a mere homomorphism. The appropriate morphism generalizing linear complete homomorphisms to allow adjunction can be used to provide a bimorphism characterization of STAG as well, further unifying these strands of research. It is to this possibility that we now turn.

embedded tree transducers
We have shown that the string relations defined by synchronous tree-substitution grammars are exactly the relations B(LC, LC). Intuitively speaking, the tree language in such a bimorphism represents the set of derivation trees for the synchronous grammar, and each homomorphism represents the relation between the derivation tree and the derived tree for one of the projected tree-substitution grammars. The homomorphisms are linear and complete because the tree relation between a tree-substitution grammar derivation tree and its associated derived tree is exactly a linear complete tree homomorphism.
To characterize the relations defined by synchronous tree-adjoining grammars, it similarly suffices to find a simple homomorphism-like characterization of the tree relation between TAG derivation trees and derived trees. In Section 7.3 below, we show that linear complete embedded tree homomorphisms (which we introduce next) serve this purpose. Embedded tree transducers are a generalization of tree transducers in which states are allowed to take a single additional argument in a restricted manner. They correspond to a restrictive subcase of macro tree transducers with one recursion variable. We use the term "embedded tree transducer" rather than the more cumbersome "monadic macro tree transducer" for brevity and by analogy with embedded pushdown automata (Schabes and Vijay-Shanker 1990), another automata-theoretic characterization of the tree-adjoining languages.
We modify the grammar of transducer equations to add an extra optional argument to each occurrence of a state q. To highlight the special nature of the extra argument, it is written in angle brackets before the input tree argument. We uniformly use the otherwise unused variable x 0 for this argument in the left-hand side, and add x 0 as a possible right-hand side itself. Finally, right-hand-side occurrences of states may be passed an arbitrary further right-hand-side tree in this argument. (The use of square brackets in the metanotation indicates optionality.) Embedded transducers are strictly more expressive than traditional transducers, because the extra argument allows unbounded communication between positions unboundedly distant in depth in the output tree. For example, a simple embedded transducer can compute the reversal of a string, transducing 1(2(2(nil))) to 2(2(1(nil))), for instance. (This is not computable by a traditional tree transducer.) It is given by equations of the following form, with states r (initial) and s:

r 〈〉(nil) .= nil
r 〈〉(1(x)) .= s〈1(nil)〉(x)
r 〈〉(2(x)) .= s〈2(nil)〉(x)
s〈x 0 〉(nil) .= x 0
s〈x 0 〉(1(x)) .= s〈1(x 0 )〉(x)
s〈x 0 〉(2(x)) .= s〈2(x 0 )〉(x)

This is, of course, just the normal accumulating reverse functional program, expressed as an embedded transducer. 9 The additional power of embedded transducers is exactly what is needed to characterize the additional power that TAGs represent over CFGs in describing tree languages, as we will demonstrate in this section. In particular, we show that the relation between a TAG derivation tree and derived tree is characterized by a deterministic linear complete embedded tree transducer (DLCETT). The first direct presentation of the connection between the tree-adjoining languages and macro tree transducers, the basis for the presentation here, was given in an earlier paper. However, the connection may be implicit in a series of previous results in the formal-language theory literature. 10 For instance, Fujiyoshi and Kasai (2000) show that linear, complete monadic context-free tree grammars generate exactly the tree-adjoining languages via a normal form for spine grammars.
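The accumulating-reverse computation of the string-reversal embedded transducer can be sketched functionally, with the embedded argument of the non-initial state passed as an ordinary accumulator parameter. The encoding of monadic strings as unary trees, and the state names r and s, are our own.

```python
def r(t):
    """r<>(t): initial state, starting with an empty accumulator."""
    return s(('nil', []), t)

def s(acc, t):
    """s<x0>(t): acc plays the role of the embedded argument x0, so that
    s<x0>(c(x)) = s<c(x0)>(x) and s<x0>(nil) = x0."""
    label, kids = t
    if label == 'nil':
        return acc
    return s((label, [acc]), kids[0])

# Transduce 1(2(2(nil))) to 2(2(1(nil))), as in the text.
t = ('1', [('2', [('2', [('nil', [])])])])
assert r(t) == ('2', [('2', [('1', [('nil', [])])])])
```

The communication through the accumulator between positions arbitrarily far apart in the output is exactly the extra power the text attributes to the embedded argument.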
Separately, the relation between context-free tree grammars and macro tree transducers has been described, where the relationship between the monadic variants of each is implicit. Thus, taken together, an equivalence between the tree-adjoining languages and the image languages of monadic macro tree transducers might be pieced together.

9 A simpler set of equations achieves the same end. Unfortunately, that set of equations doesn't satisfy the structure of an embedded tree transducer given in Equation (7). Surprisingly, however, the compilation from equations to TAG presented in Section 7.2 applies to it as well, generating a TAG whose derived trees also reverse its derivation trees.

10 We are indebted to Uwe Mönnich for this observation.
In the present work, we define the relation between tree-adjoining languages and linear complete embedded tree transducers directly, simply, and transparently, by giving explicit constructions in both directions. First, we show that for any TAG we can construct a DLCETT that specifies the tree relation between the derivation trees for the TAG and the derived trees. Then, we show that for any DLCETT we can construct a TAG such that the tree relation between the derivation trees and derived trees is related through a simple homomorphism to the DLCETT tree relation. Finally, we use these results to show that STAG and the bimorphism class B(ELC, ELC) are weakly equivalent, where ELC stands for the class of linear complete embedded homomorphisms.

From TAG to transducer
As the first part of the task of characterizing TAG in terms of DLCETT, we show that for any TAG grammar G = 〈F, P, S〉, there is a DLCETT 〈{h D }, P, F, ∆, h D 〉 (in fact, an embedded homomorphism), that transduces the derivation trees for the grammar to the corresponding derived trees. This transducer plays the same role for TAG as the definition of h D in Section 4.3 did for TSG. We define the components of the transducer as follows: The single state, evocatively named h D , is the initial state. The input alphabet is the set of elementary trees P in the grammar, since the input trees are to be the derivation trees of the grammar. The arity of a tree (qua symbol in the input alphabet) is as described in Section 4.3. The output alphabet is that used to define the trees in the TAG grammar, F.
We now turn to the construction of the equations, one for each elementary tree □ γ ∈ P. Suppose □ γ has a permutation π = 〈π 1 , . . . , π n 〉 on its operable sites. (We use this ordering by means of the diacritic representation below.) If γ is an auxiliary tree, construct the equation

h D 〈x 0 〉( □ γ(x 1 , . . . , x n )) .= ⌊γ⌋

and if γ is an initial tree, construct the equation

h D 〈〉( □ γ(x 1 , . . . , x n )) .= ⌊γ⌋

where the right-hand-side transformation ⌊·⌋ is defined by 11

⌊ f (t 1 , . . . , t m )⌋ = f (⌊t 1 ⌋, . . . , ⌊t m ⌋)   for unmarked nodes,
⌊ i f (t 1 , . . . , t m )⌋ = h D 〈 f (⌊t 1 ⌋, . . . , ⌊t m ⌋)〉(x i )   for adjunction sites marked i ,
⌊ i f ↓ ⌋ = h D 〈〉(x i )   for substitution sites marked i , and
⌊ f * ⌋ = x 0   for the foot node.

Note that the equations so generated are linear and complete, because each variable x i is generated exactly once as the tree γ is traversed, namely at position π i in the traversal (marked with i ), and the variable x 0 is generated at the foot node only. Thus, the generated embedded tree transducer is linear and complete. Because only one equation is generated per tree, the transducer is trivially deterministic. Because there is only one state, it is a kind of embedded homomorphism.
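The embedded homomorphism from TAG derivation trees to derived trees can be sketched operationally under our tuple encoding: adjunction sites pass the locally built subtree as the embedded argument, substitution sites pass none, and foot nodes emit it. The grammar below is a hypothetical stand-in.

```python
def h_D(deriv, grammar, x0=None):
    """h_D<x0>(deriv): map a TAG derivation tree to its derived tree; the
    embedded argument x0 is the tree to emit at the foot node."""
    name, children = deriv
    tree, perm = grammar[name]
    site = {addr: i for i, addr in enumerate(perm)}

    def walk(t, addr):
        label, kids = t
        if label.endswith('*'):                        # foot node: emit x0
            return x0
        if addr in site:
            i = site[addr]
            if label.endswith('↓'):                    # substitution site
                return h_D(children[i], grammar)
            body = (label, [walk(k, addr + (j,)) for j, k in enumerate(kids, 1)])
            return h_D(children[i], grammar, body)     # adjunction site
        return (label, [walk(k, addr + (j,)) for j, k in enumerate(kids, 1)])

    return walk(tree, ())

# A hypothetical grammar: alpha = S(A(a)) with an adjunction site at the A
# node; beta = A(b, A*) with no operable sites.
grammar = {
    'alpha': (('S', [('A', [('a', [])])]), [(1,)]),
    'beta':  (('A', [('b', []), ('A*', [])]), []),
}
assert h_D(('alpha', [('beta', [])]), grammar) == \
       ('S', [('A', [('b', []), ('A', [('a', [])])])])
```

The single recursive function mirrors the single state of the embedded homomorphism; linearity and completeness correspond to each child of the derivation node being consumed exactly once.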
As noted for TSG in Section 4.3, by composing the automaton recognizing well-formed derivation trees from Section 4.3 with the embedded homomorphism above generating the derived tree, we can construct a single DLCETT doing the work of both. Where the construction of Section 4.3 would generate a transition of the form in Equation 2, repeated here as

q N ( □ γ(x 1 , . . . , x n )) .= □ γ(q N 1 (x 1 ), . . . , q N n (x n )) ,

we compose this transition with the corresponding transition from the previous section,

h D 〈x 0 〉( □ γ(x 1 , . . . , x n )) .= ⌊γ⌋   or   h D 〈〉( □ γ(x 1 , . . . , x n )) .= ⌊γ⌋

for auxiliary and initial trees respectively. The composition construction generates a transducer with states in the cross-product of the states of the input transducers. In this case, since the latter transducer has a single state, we simply reuse the state set of the former, generating composed equations in which the right-hand-side occurrences of h D are replaced by the states q N i appropriate to the corresponding operable sites.

11 It may seem like trickery to use the diacritics in this way, as they are not really components of the tree being traversed, but merely manifestations of an extrinsic ordering. But their use is benign. The same transformation can be defined, a bit more cumbersomely, by keeping the permutation π separate, tracking the permutation and the current address p in a revised transformation ⌊·⌋ π,p ; we then use ⌊α⌋ π,ε for the transformation of the tree α.

An example derivation

By way of example, we consider a tree-adjoining grammar that includes the auxiliary tree β A = A( 1 B(a), 2 C( 3 D(A * ))). The adjunction sites, corresponding to the nodes labeled B, C, and D at addresses 1, 2, and 21, have been arbitrarily given a preorder permutation. We therefore construct the equation

h D 〈x 0 〉(β A (x 1 , x 2 , x 3 )) .= A(h D 〈B(a)〉(x 1 ), h D 〈C(h D 〈D(x 0 )〉(x 3 ))〉(x 2 )) .

Similar derivations for the remaining trees yield the (deterministic linear complete) embedded tree transducer defined by the full set of such equations. We can use this transducer to compute the derived tree for a derivation tree step by step; one such derivation yields a derived tree of the form A(B(b, B(a)), C(D(A(e)))).

Equivalence of D and h D
We can now show for TAG derivations, as we did for TSG derivations in Section 4.3, that the embedded homomorphism h D constructed in this way computes the derivation relation D.
In order to simplify the argument, we take advantage of the commutativity of operations (Equation 4), and assume without loss of generality that each permutation associated with the operable sites of an elementary tree is consistent with a postorder traversal of the nodes in the tree. We can then simplify Equation 3 to γ[op p 1 γ 1 , op p 2 γ 2 , . . . , op p n γ n ] ≡ γ[op p 1 γ 1 ][op p 2 γ 2 , . . . , op p n γ n ] since in a postorder traversal, p i ⊀ p i+k .
It will also prove to be useful to have a single notation for the effect of both substitution and adjunction. Recall the definitions of substitution and adjunction:

γ[subst p α] ≡ γ[p → α]
γ[adj p β] ≡ γ[p → β[foot(β) → γ/p]]

Under the convention that mapping a (nonexistent) "foot" of an initial tree leaves the tree unchanged, that is, α[foot(α) → γ′] = α when α has no foot node, the two operations collapse notationally, so that we can write

γ[op p γ′] ≡ γ[p → γ′[foot(γ′) → γ/p]]

for both substitution and adjunction.
We prove that D(D) = h D 〈〉(D) for derivations D rooted in an initial tree, and D(D)[foot(D(D)) → x] = h D 〈x〉(D) for derivations rooted in an auxiliary tree. The proof is again by induction on the height of the derivation D.
For the base case, the derivation consists of a single tree with no operable sites. If it is an initial tree α, then D(α) = α = h D 〈〉(α) straightforwardly from the definition of h D , using only the first equation in Equation (10). Similarly, the base case for auxiliary trees requires only the first and third equations in (10). For the recursive case, the equalities follow by unfolding the definitions of D and h D in parallel, with the marked step of the calculation appealing to the induction hypothesis.
Similarly, for derivations rooted in an auxiliary tree,

From transducer to TAG
Having shown how to construct a DLCETT that captures the relation between derivation trees and derived trees of a TAG, we turn now to showing how to construct a TAG that mimics in its derivation/derived tree relation a DLCETT. Given a linear complete embedded tree transducer 〈Q, F, G, ∆, q 0 〉, we construct a corresponding TAG 〈G ∪ Q̇, P, q̇ 0 〉 where the alphabet consists of the output alphabet G of the transducer together with the disjoint set of unary symbols Q̇ = {q̇ 1 , . . . , q̇ |Q| } corresponding to the states of the input transducer. The initial symbol of the grammar is the symbol q̇ 0 corresponding to the initial state q 0 of the transducer. The elementary trees of the grammar are constructed as follows. For each rule of the form q〈x 0 〉( f (x 1 , . . . , x m )) .= τ, we build a tree named 〈q, f , τ〉. Where this tree appears is determined solely by the state q, so we take the root node of the tree to be the corresponding symbol q̇. Any foot node in the tree will also need to be marked with the same label, so we pass this information down as the tree is built inductively. The tree is therefore of the form q̇(⌈τ⌉ q ) where the right-hand-side transformation ⌈·⌉ q constructs the remainder of the tree by the inductive walk of τ, with the subscript noting that the root is labeled by state q.
Note that at x 0 , a foot node is generated of the proper label. (Because the equation is linear, only one foot node is generated, and it is labeled appropriately by construction.) Where recursive processing of the input tree occurs (q j 〈τ〉(x k )), we generate a tree that admits adjunctions at q̇ j . The role of the diacritic k is merely to specify the permutation of operable sites for interpreting derivation trees; it says that the k-th child in a derivation tree rooted in the current elementary tree is taken to specify adjunctions at this node. The trees generated by this TAG correspond to the outputs of the corresponding tree transducer. Because of the more severe constraints on TAG, in particular that all combinatorial limitations on putting subtrees together must be manifest in the labels in the trees themselves, the outputs actually contain more structure than the corresponding transducer output. In particular, the state-labeled nodes are merely for bookkeeping. A simple homomorphism rem removing these nodes gives the desired transducer output: 12

rem(q̇(t)) = rem(t)   for q̇ ∈ Q̇
rem( f (t 1 , . . . , t m )) = f (rem(t 1 ), . . . , rem(t m ))   for f ∈ G .

An example may clarify the construction. Recall the reversal embedded transducer in (8) above. The construction above generates a TAG containing the following trees. We have given them indicative names rather than the cumbersome ones of the form 〈q i , f , τ〉.

α nil : ṙ(nil)   α 1 : ṙ( 1 ṡ(1(nil)))   α 2 : ṙ( 1 ṡ(2(nil)))
β nil : ṡ(ṡ * )   β 1 : ṡ( 1 ṡ(1(ṡ * )))   β 2 : ṡ( 1 ṡ(2(ṡ * )))

It is simple to verify that the derivation tree α 1 (β 2 (β 2 (β nil ))) derives the tree ṙ(ṡ(ṡ(ṡ(ṡ(2(ṡ(2(ṡ(1(nil)))))))))).
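The bookkeeping homomorphism can be sketched as follows, writing plain r and s for the dotted state symbols (our naming) and checking it on a unary tree of the shape just derived.

```python
STATES = {'r', 's'}                        # stand-ins for the dotted symbols

def rem(t):
    """Delete the unary state-labeled bookkeeping nodes."""
    label, kids = t
    if label in STATES:                    # state nodes are unary: splice out
        return rem(kids[0])
    return (label, [rem(k) for k in kids])

def chain(*labels):
    """Build a unary tree from a label sequence, e.g. chain('1', 'nil')."""
    t = (labels[-1], [])
    for label in reversed(labels[:-1]):
        t = (label, [t])
    return t

# The derived tree r(s(s(s(s(2(s(2(s(1(nil))))))))))
derived = chain('r', 's', 's', 's', 's', '2', 's', '2', 's', '1', 'nil')
assert rem(derived) == chain('2', '2', '1', 'nil')
```

Stripping the state nodes recovers 2(2(1(nil))), the reversal of 1(2(2(nil))), matching the transducer output discussed in the text.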
Simple homomorphisms that extract the input function symbols from the input side and drop the bookkeeping states from the output side (that is, the homomorphism rem provided above) reduce these trees to 1(2(2(nil))) and 2(2(1(nil))), respectively, just as for the corresponding tree transducer.

Equivalence of DLCETT and TAG
We demonstrate that the compilation from DLCETT to TAG generates a grammar with the same language as that of the DLCETT by appeal to the previous result of Section 7.1.2. Consider a DLCETT T = ⟨Q, F, G, Δ, q₀⟩ converted by the compilation above to a grammar G = ⟨G ∪ Q̇, P, q̇₀⟩. That grammar may itself be compiled to a DLCETT using the compilation of Section 7.1.2, previously shown to be language-preserving. We show that this round-trip conversion preserves the language that is the range of the DLCETT by showing that each equation of the original transducer "round-trip" compiles to an equation that differs only in the tree structure. In particular, a rule of the form q⟨x₀⟩(f(x₁, …, xₘ)) ≐ τ round-trip compiles to an equation of the form q⟨x₀⟩(⟨q, f, τ⟩(x₁, …, xₘ)) ≐ ⌊⌈τ⌉_q⌋. To see this, note that for each rule in T of the form q⟨x₀⟩(f(x₁, …, xₘ)) ≐ τ, we generate a tree ⟨q, f, τ⟩ in the grammar G of the form q̇(⌈τ⌉_q). This tree, in turn, is compiled as in Section 7.1 to exactly such an equation in the output transducer T′. (Here and in the following, we write q for q_{|q̇|} in the ⌊·⌋ construction, taking advantage of the bijection between the Q̇ symbols and the corresponding states of the generated transducer.) Note that this is exactly of the required form, so long as ⌊⌈τ⌉_q⌋ ≈ τ, which we now prove by induction on the structure of τ.
The last step follows from the induction hypothesis and the fact that rem removes the symbol q̇_j.
Writing L(T) for the range string language of the transducer T, we have that L(G) = L(T′) and L(T) = L(T′). We conclude that L(T) = L(G). In fact, by the above, the tree languages are identical up to the homomorphism rem. Most importantly, then, the weak generative capacity of TAGs and the range of DLCETTs are identical.

7.3 The bimorphism characterization of STAG
The major advantage of characterizing TAG derivation in terms of tree transducers (via the compilation (10)) is the integration of synchronous TAGs into the bimorphism framework, which follows directly.
In order to model a synchronous grammar formalism as a bimorphism, the well-formed derivations of the synchronous formalism must be characterizable as a regular tree language and the relation between such derivation trees and each of the paired derived trees as a homomorphism of some sort. As shown in Section 6, for synchronous tree-substitution grammars, derivation trees are regular tree languages, and the map from derivation to each of the paired derived trees is a linear complete tree homomorphism. Thus, synchronous tree-substitution grammars fall in the class of bimorphisms B(LC, LC). The other direction holds as well: all bimorphisms in B(LC, LC) define string relations expressible by an STSG.
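As an illustration of the B(LC, LC) idea, the following sketch pairs a derivation tree with two linear complete tree homomorphisms, one per language; the toy synchronous fragment and all names in it are invented for illustration and are not from the paper.

```python
# A bimorphism-style sketch: a derivation tree interpreted by two linear
# complete tree homomorphisms; the pair of string yields is an element of
# the defined string relation.

def hom(tree, templates):
    """Apply a tree homomorphism: templates[symbol] is a tree over the
    output alphabet in which integer i stands for the image of the i-th
    subtree; using each integer exactly once keeps it linear and complete."""
    label, children = tree
    images = [hom(c, templates) for c in children]
    def build(t):
        if isinstance(t, int):
            return images[t]
        lab, kids = t
        return (lab, tuple(build(k) for k in kids))
    return build(templates[label])

def frontier(tree):
    """Read off the string yield of a derived tree."""
    label, children = tree
    return [label] if not children else [w for c in children for w in frontier(c)]

# 'swap' realizes its two subtrees in opposite orders on the two sides,
# as in English/French adjective-noun order.
source_hom = {'swap': ('NP', (0, 1)),
              'white': ('white', ()), 'house': ('house', ())}
target_hom = {'swap': ('NP', (1, 0)),
              'white': ('blanche', ()), 'house': ('maison', ())}

deriv = ('swap', (('white', ()), ('house', ())))
pair = (frontier(hom(deriv, source_hom)), frontier(hom(deriv, target_hom)))
```

The single derivation tree is thus interpreted twice, yielding the related strings on the two sides.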
A similar result follows for STAG. Crucially relying on the result above that the derivation relation is a DLCETT, we can use the same method directly to characterize the synchronous TAG string relations as just B(ELC, ELC). We have thus integrated synchronous TAG with the other transducer and synchronous grammar formalisms falling under the bimorphism umbrella.

multiple adjunction
The discussion so far has assumed that derivations allow at most one operation to occur at any given node in an elementary tree (in fact, exactly one). This constraint inhered in the original formulations of TAG derivation (Vijay-Shanker 1987), and had the effect of removing systematic spurious ambiguities without reducing the range of definable languages. Schabes and Shieber (1994) point out the desirability of allowing multiple adjunctions at a single node, and provide various arguments for this generalization, most notably as needed for many applications of synchronous TAG, which is precisely the case that we are concerned with in this paper. It therefore behooves us to examine the effect of multiple adjunction on the analysis.
There are various ways in which multiple adjunction can be introduced. Most simply, one could specify that the set of operable nodes of a tree allows a given node to appear in the set a fixed number of times. (This could be graphically depicted by allowing more than one diacritic at a given node, with each diacritic to be used exactly once.) In theory, this would allow multiple nontrivial adjunctions to occur at a single node, inducing ambiguity as to the resulting derived tree, but we can eliminate this possibility by requiring that nontrivial (that is, non-na) trees be adjoined at at most one of the sites at a given node. We start by handling this kind of simple generalization of TAG derivation in Sections 8.1-8.2.
More generally, Schabes and Shieber (1994) call for allowing an arbitrary number of adjunctions at a given node. In particular, they call for distinguishing predicative and modifier auxiliary trees, and allowing any number of modifier trees and at most one predicative tree to adjoin at a given node. The derived tree is ambiguous as to the relative orderings of the modifier trees, but the predicative tree is required to fall above the modifier trees. We address this major generalization of TAG derivation in Section 8.3.

Simple multiple adjunction
We start with a simple generalization of TAG derivation in which operable nodes may be used a fixed number of times. Since the set of operable nodes may now include duplicates, adjunction nodes may occur more than once in the permutation π. To guarantee that at most one of these occurrences can receive a nontrivial adjunction, we need to revise the definition of derivation tree, that is, fix the tree automaton from Section 4.3 defining well-formed derivation trees, and prove that the derivation relation D is still well defined.
We present an alternative automaton defining the regular tree language of well-formed derivation trees, now allowing the limited form of multiple adjunction. We double the number of states from the previous construction. The states of the automaton are the set {q_{N△} | N ∈ F} ∪ {q_{N•} | N ∈ F}, two for each unranked vocabulary symbol in the derived tree language. The △ diacritic indicates a nontrivial tree rooted in the given symbol; the • diacritic requires a nonadjunction na tree rooted in that symbol. The start state is q_{|S|△}.
For each nontrivial tree (that is, not an na tree) ⟨γ, π⟩ of arity n and rooted with the symbol N, we construct all possible transitions of the form γ(q₁, …, qₙ) → q_{N△}, where each qᵢ is either q_{|γ@πᵢ|•} or q_{|γ@πᵢ|△}, subject to the constraint that for each node η in γ, the sequence ⟨qᵢ | πᵢ = η⟩ contains at most one △. Because there are many such ways of setting the qᵢ to satisfy this constraint, there are many (though still a finite number of) transitions for each γ. In addition, for each na tree rooted in N, there is a transition na → q_{N•}. The set of well-formed derivation trees is thus still a regular tree set.
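The constraint that the doubled states enforce can also be checked directly on a derivation tree. The following sketch uses an ad hoc encoding of my own, not the paper's: a derivation node is a (name, children) pair, a table maps each child position to the operable node it targets, and a predicate identifies the vestigial na trees.

```python
# Among the derivation children operating at the same node of an
# elementary tree, at most one may be nontrivial (non-na).
from collections import Counter

def well_formed(deriv, site_of, is_na):
    name, children = deriv
    nontrivial_at = Counter()
    for i, child in enumerate(children):
        if not is_na(child[0]):
            nontrivial_at[site_of[name][i]] += 1
    return (all(n <= 1 for n in nontrivial_at.values())
            and all(well_formed(c, site_of, is_na) for c in children))

# Toy instance: 'alpha' has two operable sites, both at node 'eta'.
site_of = {'alpha': ('eta', 'eta'), 'beta': (), 'na': ()}
is_na = lambda name: name == 'na'

ok  = well_formed(('alpha', (('beta', ()), ('na', ()))), site_of, is_na)    # one nontrivial
bad = well_formed(('alpha', (('beta', ()), ('beta', ()))), site_of, is_na)  # two nontrivial
```

The first derivation adjoins one nontrivial tree and one na tree at the duplicated node and is accepted; the second adjoins two nontrivial trees there and is rejected, mirroring the at-most-one-△ condition on the automaton's transitions.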
The only remaining issue is to verify that the limited form of multiple adjunction that we allow still yields a well-defined derived tree. In general, multiple adjunctions at the same site do not commute. However, the only cases of multiple adjunctions that we allow involve all but one of the auxiliary trees being vestigial nonadjunction trees.

Fixed multiple adjunction
What if we allow more than one of the multiple (fixed) occurrences of a node to be operated on by a nontrivial auxiliary tree? At that point, the simultaneous operations no longer commute, and which auxiliary tree is used at which position becomes important. The definition of the derivation tree language given in Section 4.3 allows such derivations to be specified merely by relaxing the constraint that a node appears only once in the set of operable sites. If we move to a multiset of operable sites, with π a permutation over that multiset, the remaining definitions generalize properly.
We present (Figure 9) a fragment based on the semantic half of a synchronous TAG presented previously (Shieber 1994, Figure 1) to exemplify simultaneous adjunction. This grammar uses simultaneous adjunction at the root of the α_blink tree. That tree has three operable sites, two of which are at the root node. We will take the permutation of operable sites for the tree to be ⟨①, ②, ③⟩.
We can examine what the compilation of Section 7.1 provides as the interpretation for this grammar. Applying it to the output trees in the grammar generates a DLCETT. We start with the problematic multiple adjunction tree α_blink.

q_F⟨⟩(α_blink(x₁, x₂, x₃)) ≐ ⌊α_blink⌋
  = ⌊①②F(blink, ③T)⌋
  = q_F⟨⌊②F(blink, ③T)⌋⟩(x₁)
  = q_F⟨q_F⟨⌊F(blink, ③T)⌋⟩(x₂)⟩(x₁)
  = q_F⟨q_F⟨F(blink, q_T⟨⟩(x₃))⟩(x₂)⟩(x₁)

(Here, the second step uses the obvious generalization of the second equation of (10) to sets of diacritics, that is,

⌊ⓚ ⋯ f(t₁, …, tₙ)⌋ = q_{|f|}⟨⌊⋯ f(t₁, …, tₙ)⌋⟩(x_k) ,

the ellipses standing in for arbitrary further diacritics.) The second and third steps are notable here, in that the choice of which of the two operable sites to use first was arbitrary. That is, one could just as well have chosen to process diacritic ② before ①, in which case the generated rule would have been

q_F⟨⟩(α_blink(x₁, x₂, x₃)) ≐ q_F⟨q_F⟨F(blink, q_T⟨⟩(x₃))⟩(x₁)⟩(x₂) .

This is, of course, just the consequence of the fact that multiple adjunctions at the same node do not commute. To manifest the ambiguity, we can just generate both transitions (and in general, all such transitions) in the transducer defining the derivation relation. The transducer naturally becomes nondeterministic. Alternatively, a particular order might be stipulated, regaining determinism, but giving up analyses that take advantage of the ambiguity.
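The nondeterminism can be enumerated mechanically: each order in which the diacritics at a multiply-operable node are processed yields one right-hand side. The string-building sketch below is illustrative only; the names q_F, q_T, and blink mirror the example above.

```python
# Each permutation of the diacritics at a node yields one transition
# right-hand side; distinct permutations give distinct (non-commuting)
# nestings, so the compiled transducer is nondeterministic.
from itertools import permutations

def wrap(order, core):
    """Process diacritics in the given order: each processed diacritic
    wraps the remainder, so the first-processed one ends up outermost."""
    rhs = core
    for k in reversed(order):      # build from the inside out
        rhs = 'q_F<%s>(x%d)' % (rhs, k)
    return rhs

core = 'F(blink, q_T<>(x3))'
rhs_variants = {wrap(p, core) for p in permutations((1, 2))}
```

The two permutations of the root diacritics produce exactly the two right-hand sides discussed above, one per processing order.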
Figure 10: (a) A grammar generating { wcwᴿ | w ∈ {a, b}* } using general multiple adjunction, and (b) a derivation of the string abcba.

General multiple adjunction
Finally, fully general multiple adjunction as described by Schabes and Shieber (1994) allows for one and the same operable site to be used an arbitrary number of times. To enable this interpretation of TAG derivations, major changes need to be made to the definitions of derivation tree and derivation relation. Consider the sample grammar of Figure 10, where the two auxiliary trees β_a and β_b are modifier trees (in the terminology of Schabes and Shieber (1994)) and thus allowed to multiply adjoin at the two operable nodes in the initial tree. This grammar should generate the language { wcwᴿ | w ∈ {a, b}* }.
Derivation trees must allow an arbitrary number of operations to occur at a given site. To represent this in a ranked tree, we can encode the sequence of trees adjoined at a given location with a recursive structure. In particular, we use a binary symbol · (which we write infix) to build a list of trees to be adjoined at the site, using a nonadjunction tree to mark the end of the list. Essentially, derivation trees now contain lists of auxiliary trees to operate at a site rather than a single tree, with the nonadjoining trees serving as the nil elements of the list and · serving as the list constructor. For example, a derivation for the grammar of Figure 10(a) can be represented by the tree in Figure 10(b).
The derivation tree language with lists instead of individual trees is still regular. In fact, the full specification of multiple adjunction given by Schabes and Shieber (1994) specifies that at a given operable site an arbitrary number of modifier trees but at most one predicative tree may be adjoined. Further, the predicative tree is to appear highest in the derived tree above the adjoined modifiers. This constraint can be specified by defining the derivation tree language appropriately, allowing at most one predicative tree, and placing it at the end of the list of nontrivial trees adjoining at a site. It is a simple exercise to show that the derivation tree language so restricted still falls within the regular tree languages.
Finally, we must provide a definition of the derivation relation for this generalized form of multiple adjunction. In particular, we need transitions for the new form of constructor node, which specifies the combination of two adjunctions at a single site. We handle this by stacking the rest of the adjunctions above the first. We add to the definition of the derivation transducer of Section 7.1 transitions of the following form for each symbol N that is the root of some auxiliary tree:

q_N⟨x₀⟩(x₁ · x₂) ≐ q_N⟨q_N⟨x₀⟩(x₁)⟩(x₂)

Note that the new transition is still linear and complete. For the grammar of Figure 10(a) we would thus have corresponding transitions defining the derivation relation. Using this derivation relation, the derived tree for the derivation tree of Figure 10(b) can be calculated as corresponding to the string abcba, as expected.
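The list-stacking interpretation can be made concrete as follows. The cons encoding mirrors the infix · constructor with na as the list terminator; the elementary trees for the { wcwᴿ } language are my own plausible reconstructions, since Figure 10 itself is not reproduced here.

```python
# General multiple adjunction with list-encoded derivations: an adjunction
# list is a cons pair (head, rest) ended by NA, and the rest of the list
# is stacked above the first element.
FOOT = ('*', ())
NA = 'na'

def subst_foot(aux, subtree):
    """Replace the unique foot node of aux by subtree."""
    if aux == FOOT:
        return subtree
    label, children = aux
    return (label, tuple(subst_foot(c, subtree) for c in children))

def combine(adj_list, trees):
    """Fold a cons list of auxiliary-tree names into one auxiliary tree,
    stacking the rest of the list above the first element."""
    head, rest = adj_list
    if rest == NA:
        return trees[head]
    return subst_foot(combine(rest, trees), trees[head])

def frontier(tree):
    if tree == FOOT:
        return []
    label, children = tree
    return [label] if not children else [w for c in children for w in frontier(c)]

trees = {
    'beta_a': ('X', (('a', ()), FOOT, ('a', ()))),   # X(a, X*, a)
    'beta_b': ('X', (('b', ()), FOOT, ('b', ()))),   # X(b, X*, b)
}
alpha = ('X', (('c', ()),))                          # initial tree X(c)

# Adjoin the list (beta_b . (beta_a . na)) at alpha's root: beta_b is
# adjoined first (lowest), beta_a is stacked above it. Adjunction at the
# root amounts to plugging the whole initial tree into the combined foot.
aux = combine(('beta_b', ('beta_a', NA)), trees)
derived = subst_foot(aux, alpha)
word = ''.join(frontier(derived))
```

With this list, the derived tree is X(a, X(b, X(c), b), a), whose yield is a string of the form wcwᴿ.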
conclusion
Synchronous grammars and tree transducers, two approaches to the specification of language relations useful for a variety of formal and computational linguistic modeling of natural languages, are unified by means of the elegant construct of the bimorphism. This convergence synthesizes the approaches and allows a direct comparison among these and other potential systems for describing language relations through other bimorphisms. The examination of additional bimorphism classes may open up further possibilities for useful modeling tools for natural language.
acknowledgements
This paper has been gestating for a long time. I thank the participants in my course on Transducers at the 2003 European Summer School on Logic, Language, and Information in Vienna, Austria, where some of these ideas were presented, and Mark Dras, Mark Johnson, Uwe Mönnich, Rani Nelken, Rebecca Nesson, James Rogers, and Ken Shan for helpful discussions on the topic of this paper and related topics. The extensive comments of the JLM reviewers were invaluable in improving the paper. This work was supported in part by grant IIS-0329089 from the National Science Foundation.