Computational modelling of Yorùbá numerals in a number-to-text conversion system

Abstract

In this paper, we examine the processes underlying the Yorùbá numeral system and describe a computational system that is capable of converting cardinal numbers to their equivalent Standard Yorùbá number names. First, we studied the mathematical and linguistic basis of the Yorùbá numeral system so as to formalise its arithmetic and syntactic procedures. Next, the process involved in formulating a Context-Free Grammar (CFG) to capture the structure of the Yorùbá numeral system was highlighted. Thereafter, the model was reduced into a set of computer programs to implement the numerical to lexical conversion process. System evaluation was done by ranking the output from the software and comparing the output with the representations given by a group of Yorùbá native speakers. The result showed that the system gave correct representation for numbers and produced a recall of 100% with respect to the collected corpus. Our future study is focused on developing a text normalisation system that will produce number names for other numerical expressions such as ordinal numbers, date, time, money, ratio, etc. in Yorùbá text.

Introduction
The use of numbers and their power in capturing concepts makes them indispensable in effective communication (Goyvaerts 1980). In any society, the use of numbers is firmly anchored to numerous beliefs and perceived usefulness of the significant philosophy underlying numerical messages (Abímbọ́lá 1977). In fact, key advancement in civilisation can be traced to the conception, invention, representation, and manipulation of numbers to facilitate accurate rendering of measurable objects. This has made the use of numbers an important tool within the society, where it is used in trade, cosmology, mathematics, divination, music, medicine, etc. Early cultures devised various means of number representation, which include body/finger counting (Zaslavsky 1973; Saxe 1981), object counting, Egyptian numerals, Babylonian numerals, Greek numerals, Chinese numerals, Roman numerals, Mayan numerals, Hindu-Arabic numerals, etc. The Hindu-Arabic numeral system, which is considered to be the greatest mathematical discovery (Bailey and Borwein 2011), is still the most commonly used symbolic representation of numbers due to its simplicity and the fact that it requires little memorisation to represent practically any number.
Naming numbers in human languages requires various mathematical and linguistic processes. For example, the number 74 is represented as 70 (7 × 10) increased by 4 in English, whereas it is represented as 60 (6 × 10) increased by 14 (4 + 10) in French. In Logo, the number 74 is represented as 10 added to 60 (20 × 3) increased by 4. In Yorùbá, in turn, the same number is derived in a more complex way by adding 4 to 80 (20 × 4) reduced by 10. Table 1 shows the representation of the number 74 in the four languages.
The analysis of number names is important but understudied in human language processing. While it may seem trivial to compute number names in languages like English, it may be difficult to get it right in many other languages, particularly in the Yorùbá language. In this paper, we present a formal description of the Yorùbá numeral system; specifically, the problem of Yorùbá number name transcription is addressed from an engineering perspective, by applying standard theories and techniques to an understudied language. This is part of a wider interest in the development of Text-To-Speech (TTS) synthesis and Machine Translation (MT) systems for the Yorùbá language. In TTS and related applications, text normalisation is often the first task, in which Non-Standard Words (NSW) such as numbers, abbreviations, acronyms, time, date, etc. are correctly identified and expanded into their textual forms (Sproat 1996). The expansion of numerical expressions in text is thus a key task in such applications because numbers occur frequently and in varying forms within a block of text. These forms include cardinal numbers, ordinal numbers, telephone numbers, date, time, percentages, monetary value, address, etc.
The rest of this paper is structured as follows: Section 2 gives an analysis of the Yorùbá numeral system and its associated number naming rules. Section 3 discusses the system design and implementation, while Section 4 discusses the results. Section 5 presents the system evaluation and Section 6 concludes the paper with areas of further study.
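The four derivations of 74 described above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the dictionary name and its keys are hypothetical labels, and the expressions are transcribed from the prose description rather than from Table 1.

```python
# Arithmetic behind the name of 74 in each language, as described in the text.
# The variable and key names are illustrative only.
derivations_of_74 = {
    "English": (7 * 10) + 4,        # 70 increased by 4
    "French": (6 * 10) + (4 + 10),  # 60 increased by 14
    "Logo": 10 + (20 * 3) + 4,      # 10 added to 60, increased by 4
    "Yoruba": 4 + ((20 * 4) - 10),  # 4 added to (80 reduced by 10)
}

for language, value in derivations_of_74.items():
    print(language, value)
```

Every expression evaluates to the same quantity; only the route taken differs, and the Yorùbá route is the only one that involves a subtraction.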

The Yorùbá numerals
The Yorùbá language (ISO 639-3 yor), which belongs to the West Benue-Congo branch of the Niger-Congo African languages family, is spoken by about 19,000,000 speakers in South-Western Nigeria (Owolabi 2006). The language is also spoken in other West African countries such as Central Togo, the East-Central part of the Republic of Benin, and by the Creole population of Sierra Leone. Outside Africa, Yorùbá (called Nagô, Aku, or Lukumi; Lovejoy and Trotman 2003) is spoken in Brazil, Cuba, and Trinidad and Tobago.
Without a formal method of documenting literature, the Yorùbá community developed a complex numeral system that extensively uses subtraction throughout its system (Verran 2001). This has attracted many linguistic scholars to investigate the reasons why this community has developed an intricate numeral system. Certainly, knowledge of the Yorùbá numeral system has been passed from generation to generation by means of oral literature. Young language learners, in particular, are made to undergo drills of reciting rhymes with numbers ranging from 1 to 10.
In an early study of the Yorùbá numeral system, Mann (1887) shows how large numbers could be represented as an arithmetic combination of the basic number units and reveals that the subtraction operation plays an important role in number naming. The peculiarity in the Yorùbá numerals was highlighted as follows: "Very different is the framework of the Yorùbá, it can boast of a greater number of radical names of numerals, and to a large extent makes use of subtraction..." (Mann 1887, p. 60)
A fact worth noting is that some systems illustrate a pervasive use of the subtractive techniques. Examples of such systems are the clock system and the Roman numeral system. In the conventional clock system, when the minute part of time is greater than 30 minutes, the spoken representation can be derived by employing the subtractive technique. For instance, four canonical representations of 2:30 PM are:
All four representations in (i) to (iv) are acceptable and none has precedence over the other. The form in (iv) is used to a large extent in our daily lives without any difficulty. Similarly, halb zwei in German means 'half of the second hour', which is 'half one'. So, the Yorùbá's use of subtraction is not completely exceptional, but its extensive usage may seem unusual, especially when it is preferred over the simpler addition operation.
Another observable feature of the Yorùbá numeral system is the use of base 20 (vigesimal), which likely stems from the counting of cowry shells as described by Mann: "Here we may explain the origin of this somewhat cumbersome system; it springs from the way in which the large sum of money (cowries) are counted. When a bagful is cast on the floor, the counting person sits or kneels down beside it, takes 5 and 5 cowries and counts silently, 1, 2, up to 20, thus 100 are counted off, this is repeated to get a second 100, these little heaps each of 100 cowries are united, and a next 200 is, when counted, swept together with the first" (Mann 1887, p. 63)
However, there are vigesimal systems that do not have any relation to cowry shells. A more obvious reason for vigesimal systems could be that humans have 10 fingers and 10 toes. The use of 20 as a base may seem cumbersome; however, it is not entirely exceptional. In many languages, especially in Europe and Africa, 20 is a base with respect to the linguistic structure of the names of certain numbers. Even so, a consistent vigesimal system based on the powers of 20, i.e., 20, 400, 8000, etc., is not generally used. Examples of a strict vigesimal numeral system are those of Maya and Dzongkha (the national language of Bhutan). The numeral systems of the Ainu language of Japan and Kaire language of Sudan also rely, to an extent, on base 20 for the representation of numbers. Apart from Yorùbá, other African languages with vigesimal numeral system are: Madingo, Mundo, Logone, Nupe, Mende, Bongo, Efik, Vei, Igbo, and Affadeh (Conant 1896). The study by Conant (1896) highlighted the extent of the mental computation required in the expression and conception of the Yorùbá numerals, and concluded that the Yorùbá numeral system is the most peculiar numeral system in existence. One might then begin to wonder why the Yorùbá language, with a simple syllabic structure, uses such a complex numeral system. The reason for this is not clear.
Johnson (1921) conducted an analysis of the Yorùbá numerals by focusing on the derivation processes and the morphophonological rules required, and showed how large numbers are calculated in multiples of 20,000. The study by Abraham (1958) examined the arithmetic skills employed in different Yorùbá numeral groups, and provided a guide into their syntactic representation. A profound study on Yorùbá numerals was done by Ẹkúndayọ̀ (1977), where the derivational breakdown of the Yorùbá numerals was discussed and the structural representation of Yorùbá numerals was illustrated. In the study, 16 basic number lexemes which serve as the basic building blocks of the Yorùbá numeral system were identified as presented in Section 2.1.
These forms of lexemes are used with multiples of 100 between 200 and 20,000. The lexical representation of 20 has two values, i.e., ogún or okòó, which are used in different contexts. Okòó is the only form used in initial word positions when it is added to (ó lé) or subtracted from (ó dín) a vigesimal, while ogún is used with the multiplication formatives in numerical derivation. To illustrate this, 220 may either be expressed as igba ó lé ogún (200 increased by 20) or okòólérúgba (20 added to 200) but not as igba ó lé okòó or ogúnlérúgba although they would represent the same quantity.
Numbers are generated using syntactic combination of these lexemes, and only three of the basic mathematical operators are required to represent an infinite set of numbers within the Yorùbá language. These operators are represented by special position words like lé for addition, dín for subtraction, and ọ̀nà for multiplication. However, it should be pointed out that subtraction has an unusually higher functional load than addition. An exponential represented as ìlọpo may be required to express very large numerals as powers of 20,000 (Ọdẹjọbí 2003) but this is not generally used in the Yorùbá numeral system.

Overcounting in Yorùbá numerals
We have mentioned the use of three of the standard arithmetic operations (i.e., multiplication, subtraction, and addition) in the Yorùbá numeral system. However, it is important to discuss a special mode of subtraction depicted by ẹẹdín and its variant, aadín. The ẹẹdín phenomenon was well articulated in Ẹkúndayọ̀ (1977), where a detailed explanation of this concept was given. Overcounting (Menninger 1969) occurs when a numeral is expressed in relation to a higher numeral. Overcounting, thus, becomes inevitable within any numeral system employing subtraction operation in number representation.
In the Yorùbá numerals system, when ẹẹdín is used with a number, it implies that the number must be reduced by a certain value. The use of ẹẹdín or aadín is context-dependent; hence, the value deducted varies depending on the numeral to which it is attached. This is shown in Table 2. When ẹẹdín is used with numbers 20 and 30, 5 is deducted from them to produce 15 (ẹẹdín ogún = ẹẹdógún) and 25 (ẹẹdín ọgbọ̀n = ẹẹdọ́gbọ̀n) respectively. But if ẹẹdín is used with 600, 100 is deducted to produce 500 (ẹẹdín ẹgbẹta = ẹẹdẹ́gbẹta). The concept of overcounting is also noticeable in the numeral systems of Ainu and Maya. Danish, an essentially Germanic language, also exhibits a related subtractive phenomenon (Conant 1896) as illustrated below:
Notably, the process of naming numbers in Danish is similar to Yorùbá. Now, we present the rules used in naming numbers in Yorùbá.
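The context dependence of ẹẹdín can be pictured as a lookup from the numeral it attaches to onto the amount deducted. The sketch below is a minimal illustration that contains only the three cases stated in the paragraph above (20, 30, and 600); Table 2 is not reproduced here, so the table is deliberately incomplete, and the names `REDUCE_AMOUNT` and `apply_reduce` are hypothetical.

```python
# Amount deducted by the reducing prefix, keyed by the numeral it attaches to.
# Only the cases stated in the text are listed; the full Table 2 has more rows.
REDUCE_AMOUNT = {20: 5, 30: 5, 600: 100}


def apply_reduce(base: int) -> int:
    """Return the value named by the reducing prefix applied to `base`."""
    if base not in REDUCE_AMOUNT:
        raise ValueError(f"no reduction amount recorded for {base}")
    return base - REDUCE_AMOUNT[base]


for base in (20, 30, 600):
    print(base, "->", apply_reduce(base))
```

Raising an error for unlisted bases, rather than guessing a deduction, mirrors the point that the deducted value cannot be computed from the prefix alone.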

Yorùbá number naming rules
There are basic rules that hold in the generation of an infinite set of number names in the Yorùbá language as captured in Figure 1. As observed by Hurford (2001), numeral sequences in human languages show several discontinuities in their patterns of representation. Therefore, it is important to identify numeral groups that exhibit similar derivative process within the Yorùbá numeral system. This is to
b) Numbers from 11 to 200: The addition operation is used for deriving numbers from one to four above multiples of 10, while numbers from five to one below such points are obtained through subtraction as illustrated in Figure 2. The Yorùbá lexical representation of number 11 is formed as an additive concatenation of the terms for numbers 1 and 10. This also applies to numbers 12, 13, and 14 as:
e) Numbers 20,000 and above: Numerals greater than 20,000 are derived as a multiple of 20,000 (ọkẹ́kan = twenty thousand in one place).
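The rule for numbers from 11 to 200 (add for one to four above a multiple of 10, subtract for five to one below the next multiple) can be sketched as a small function. This is an assumption-laden illustration rather than the authors' implementation: the name `split_around_ten` and its return format are hypothetical, and the sketch ignores how the multiple of 10 is itself named.

```python
def split_around_ten(n: int) -> tuple[int, str, int]:
    """Split n (11..200) into (anchor, operator, offset).

    Units 1-4 are added to the multiple of 10 below n;
    units 5-9 are expressed as 5 down to 1 subtracted from the multiple above.
    """
    if not 11 <= n <= 200:
        raise ValueError("this rule is stated only for numbers from 11 to 200")
    unit = n % 10
    if unit == 0:
        return n, "", 0                 # already a multiple of 10
    if unit <= 4:
        return n - unit, "+", unit      # e.g. 11 = 10 + 1
    return n - unit + 10, "-", 10 - unit  # e.g. 28 = 30 - 2


for number in (11, 14, 15, 28, 46, 54):
    print(number, split_around_ten(number))
```

The results for 28, 46, and 54 agree with the derivations quoted elsewhere in the paper (2 from 30, 4 from 50, and 4 added to 50).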

2.4 The linguistic structure of numerals
In this section, we review two important bibliographic references on the syntactic structure of numerals. The first one is Hurford (1975), which is an extensive study of various numeral systems. The other one is the study conducted by Ẹkúndayọ̀ (1977), in which phrase structure rules were proposed for the Yorùbá numeral system.

Hurford's generative numeral grammar
A notable work on the application of generative grammar to numerals is the work of Hurford (1975), where the universal phrase structure rules for deriving numerals were presented. A modified version of the phrase structure rules was presented in Hurford (2007), being a significant improvement with respect to well-formed numerals. In this extensive study of numerals, Hurford considered numerals as syntactic structural constructs and proposed a universal constraint on numerals, which he called the packing strategy. The packing strategy helps to make the right choice for a number name from different structural constructs derived by the production rules presented in Definition 1.
The packing strategy guides the general constraints on the well-formed nature of numerals and any structure containing an ill-formed structure is itself ill-formed.

Definition 1 (Hurford's production rules for Yorùbá numerals)
Hurford's production rules for the Yorùbá numeral system are as follows:
Where DIGIT is a set of basic number lexemes, M is a set of multiplicative base lexemes, NUM is a numeral and the start symbol, and NP is a Number Phrase. Rule (2) is interpreted as addition/subtraction, and it can occur in reverse order, i.e., NUM → NUM NP. Rules (3) and (4) are interpreted as multiplication when two constituents are chosen. The curly brace in the production rules shows alternative productions, while parenthesis indicates an optional item. For example, an NP can be formed from a single NUM or a multiplicative combination of M NUM.
Hurford's generative framework provides an adequate account for the numeral system of most languages including English. However, the grammar was structurally inadequate for the Yorùbá numeral system. It is worth noting that the grammar does not provide an adequate mechanism to differentiate between the addition and subtraction operations in Rule (2). For example, Hurford (1975) presented structures for 46 and 4,600, as shown in Figure 3. In the structure in Figure 3(a), 46 (ẹrindínlààdọta) was derived by deducting 4 (ẹrin) from 50 (ààdọta) and 50 was derived by deducting 10 from 60 (ọgọta). In Figure 3(b) and (c), representing structures for 4,600, i.e., ẹgbẹtalélogún (200 × 23) and ẹgbààjì ó lé ẹgbẹta (4,000 + 600) respectively, 23 (ẹtalélogún) was derived by adding 3 (ẹta) to 20 (ogún) and 4,600 was derived by 4,000 plus 600. Rule NUM → NUM NP is interpreted as subtraction in Figure 3(a), whereas it is interpreted as addition in Figures 3(b) and (c). This means that the structure in Figure 3(a) could be misinterpreted as 54, and structures in Figure 3(b) and (c) as 3,400. Therefore, this introduces ambiguity in interpretation. It is also important to point out that Rule (4) results in an incorrect interpretation of the structures of M. To illustrate this, the rule represents 20 (ogún) as a combination of 10 (ẹ̀wá) and 2 (èjì), which is structurally incorrect. This is because ogún is not formed, by any means, from the combination of ẹ̀wá and èjì.
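The ambiguity can be made concrete by evaluating the same two-constituent structure under both readings of NUM → NUM NP. The sketch below is purely illustrative: `interpret` and the reading labels are hypothetical names, and the structures are reduced to the pair of values they combine.

```python
def interpret(num: int, phrase: int, reading: str) -> int:
    """Evaluate a NUM NP pair under an additive or subtractive reading."""
    if reading == "add":
        return num + phrase
    if reading == "subtract":
        return num - phrase
    raise ValueError(f"unknown reading: {reading}")


# The constituent pairs discussed in the text: (50, 4) and (4000, 600).
for num, phrase in ((50, 4), (4000, 600)):
    readings = {r: interpret(num, phrase, r) for r in ("add", "subtract")}
    print((num, phrase), readings)
```

Because the rule itself carries no operator, nothing in the structure selects between 46 and 54, or between 4,600 and 3,400.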
The study also acknowledged that multiple structures may exist for some numbers like 4,600, as shown in Figures 3(b) and 3(c), but concluded that the structure in 3(c) was ill-formed, whereas it is a valid structure in Yorùbá. This conclusion could result from a limited expert knowledge in verifying the correctness of these structures, as noted: "Despite the difficulty in finding crucial information in the sources, it is conceivable that some complete account of Yorùbá numerals can be given that is soundly motivated. This language certainly presents the weightiest challenge for a general theory of numerals that we have encountered." (Hurford 1975, p.
The study conducted by Ẹkúndayọ̀ (1977) reveals that there exist similarities between the mechanism used in the Yorùbá language for constructing an infinite number of sentences from a finite set of building blocks and constructing an infinite set of numerals from a limited set of basic numbers. This proposition was corroborated into 3 different concepts as shown in Table 3. This shows that all Yorùbá numerals can be sententially represented through the addition, subtraction, and multiplication operators. The study also shows that some numbers have multiple representations in the Yorùbá language, but constraints of correctness are imposed on these representations. These constraints include linguistic and structural plausibilities.
Apart from the concept of infinity, creativity, and paraphrasable representation of numerals, Ẹkúndayọ̀ (1977) demonstrated that a recursive grammar is needed for numeral derivation and representation. It was observed that the recursive rules are not easily established for the Yorùbá numerals; however, a set of phrase structure (PS) rules for the Yorùbá numeral system was given as shown in Definition 2. Yorùbá numerals also require a high level of creativity as higher numerals must be recreated every time they are used.
Table 3 (row 3, Paraphrase): A single idea could be represented in several ways. A single number may also be represented in different forms in Yorùbá numerals.
Definition 2 (Ẹkúndayọ̀'s PS rules for Yorùbá numerals)
Ẹkúndayọ̀'s phrase structure rules for the Yorùbá numeral system are as follows:
Where NUM is a numeral and the start symbol, NP is a noun phrase, VP is a verb phrase, S is a sentence, N is the set of 16 basic number lexemes, PRON is the formative ó ('it'), and V is a verb represented as the operating formatives ọ̀nà (for multiplication), dín (for subtraction), and lé (for addition). NOTE: Rule (8) was presented as V → V NP in the original article but it was modified to make the grammar complete.
The point of interest here is that verbs are used in number naming, and that numbers are sententially represented in their surface structure. This allows for a distinction between addition and subtraction operations. This is illustrated by the structure of ẹrinléláàdọta (54), shown in Figure 4, where the operating formative (V) is explicitly represented. Although these PS rules proved useful in the derivation of Yorùbá numerals, they are mostly arithmetic rather than syntactic rules as the positions of the basic lexical numerals and operatives do not correspond to their positions in the surface structure. An example would be the surface structure of ẹrinléláàdọta (54) represented in Figure 4 as ((ọgún ọ̀nà mẹta) ó dín ẹ̀wá) ó lé ẹrin rather than ẹrin ó lé (aadín (ọgún ọ̀nà mẹta)), thereby leading to a misrepresentation of numerals.
Another problem with Ẹkúndayọ̀'s PS rules is that multiplicative bases (M) in Hurford's grammar are not captured. The multiplicative bases help to understand which numbers are important milestones in a numeral system. Hence, in this paper, we used knowledge from these two models to capture the essential components of the Yorùbá numerals. The grammar developed captures the multiplicative bases and treats Yorùbá numerals as both arithmetic and syntactic constructs.
It has been shown that the Yorùbá numeral system is very methodical; thus, an efficient computational system is required to gain accuracy in number representation. Figure 5 presents the block diagram of number transcription in the Yorùbá language. There are four important processes in this model. First, there is the number decomposition process, where numbers are expressed as a sum of smaller numbers in harmony with the sub-grouping discussed in Section 2.3. The output of this process is the magnitude stack. Next, there is a process that generates the possible forms of a single number. This is done by careful combinations of neighbouring elements of the magnitude stack and parsing them with the designed numeral grammar. The packing strategy is used to verify whether the structures are well-formed. The third process is where tokens of the number forms are converted to their equivalent lexical forms, and the final process is where the morphophonological rules employed in naming Yorùbá numbers are applied.

Number decomposition to vigesimal
Within the Yorùbá numeral system, every number can be represented using a combination of five different smaller terms, each drawn from the possible groups of the Yorùbá numeral system. So, the first process is to generate the magnitude stack from the given number. This generates five new numbers where
a) d0 is 0 or a member of subgroup (a), i.e., d0 takes values from 0 to 9, i.e., d0 ∈ DIGIT = {0, 1, 2, ..., 9}.
b) d1 is 0 or a member of subgroup (b), i.e.,
e) d4 is 0 or a member of subgroup (e), i.e.,
These new numbers can be derived using Algorithm 1. The magnitude stacks of some numbers are presented in Table 4.
Table 4 (recoverable entries): 412,987: [400,000, 12,000, 900, 80, 7]; 1,876,234: [1,860,000, 16,000, 200, 30, 4]
For example, the magnitude stack generated for number 1,876,234 was: [1,860,000, 16,000, 200, 30, 4]. In the next section, we discuss how the representations of single numbers are generated.
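A decomposition that reproduces the two complete magnitude stacks quoted above can be sketched as follows. This is not Algorithm 1, which is not reproduced in this text: the split points used here (multiples of 20,000, then thousands, hundreds, tens, and units of the remainder) are an assumption inferred from those two examples, and the function name `magnitude_stack` is hypothetical.

```python
def magnitude_stack(n: int) -> list[int]:
    """Return [d4, d3, d2, d1, d0] whose sum is n.

    Assumed split (inferred from the worked examples, not from Algorithm 1):
    d4 is the largest multiple of 20,000 not exceeding n, and the remainder
    is split into thousands, hundreds, tens, and units.
    """
    if n < 0:
        raise ValueError("only non-negative cardinal numbers are handled")
    rest = n % 20_000
    d4 = n - rest
    d3 = rest - rest % 1_000
    rest %= 1_000
    d2 = rest - rest % 100
    rest %= 100
    d1 = rest - rest % 10
    d0 = rest % 10
    return [d4, d3, d2, d1, d0]


print(magnitude_stack(1_876_234))
print(magnitude_stack(412_987))
```

Whatever the exact boundaries used by the original algorithm, the invariant worth checking is that the five elements always sum back to the input number.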

Generating forms of a number
Once the magnitude stack has been computed, the next task is to generate the possible forms of the number in Yorùbá. All the possible Yorùbá forms of a number are derived by some combinations of neighbouring elements of the magnitude stack. The possible forms for a number with magnitude stack of [d4, d3, d2, d1, d0] are listed in Table 5. For example, the magnitude stack for 19,669 is 000,600,60,9], and the possible forms are shown in Table 6. All possible forms are then stored in the form stack. However, not all numbers exhibit all these forms. The number of forms largely depends on the values of d4, d3, d2, d1, and d0.
Thereafter, the elements of the form stack are expanded to a form containing only the symbols representing the basic lexical items. The expanded form stack for number 19,669 is presented in Table 7. In these forms, '×' represents multiplication, '−' and '+' represent subtraction and addition within a number phrase respectively; '−−' and '++' represent subtraction and addition between number phrases respectively, as discussed in Section 3.3 b(ii). It should be noted that arithmetic is mostly done from right to left in the Yorùbá numeral system, i.e., 2 − 20 implies 2 removed from 20, which gives 18. In the same way, (10 − (20 × 4)) implies 10 deducted from (20 × 4) to give 70.
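The right-to-left reading of the operators can be illustrated with a small evaluator over nested tuples. This is a sketch under stated assumptions: the tuple encoding and the name `evaluate` are hypothetical, ASCII `*`, `-`, and `+` stand in for '×', '−', and '+', and the phrase-level operators '−−' and '++' are not covered.

```python
def evaluate(form):
    """Evaluate an int or a (left, operator, right) tuple.

    The left operand is the smaller term applied to the right operand:
    (2, "-", 20) means 2 removed from 20.
    """
    if isinstance(form, int):
        return form
    left, operator, right = form
    left_value, right_value = evaluate(left), evaluate(right)
    if operator == "*":
        return left_value * right_value
    if operator == "+":
        return right_value + left_value   # left added to right
    if operator == "-":
        return right_value - left_value   # left removed from right
    raise ValueError(f"unknown operator: {operator}")


print(evaluate((2, "-", 20)))
print(evaluate((10, "-", (20, "*", 4))))
print(evaluate((4, "+", (10, "-", (20, "*", 4)))))
```

The last expression combines the two quoted examples into the derivation of 74 given in the introduction: 4 added to 80 reduced by 10.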

Context-free grammar for Yorùbá numerals
We studied the structures of the five numeral groups discussed in Section 2.3, from which some patterns became apparent. We started the design of the CFG by identifying the set of terminal symbols which are:
a) The set of lexemes listed in Section 2.1.
Thus the set of terminal symbols, T, is made up of all elements in: DIGIT, M, V, VV, and REDUCE.
The start symbol is a numeral which is denoted by NUM. Since a CFG is the union of simpler grammars (Sipser 2007), we started by constructing rules for structures of numerals that could occur as a number phrase.
A phrase could be formed as a single DIGIT (Ẹkúndayọ̀ 1977) or from the multiplication of M and DIGIT. A phrase formed by multiplication is denoted by MP (Hurford 2007) by NP → DIGIT). Also, Rule (11) is recursive to handle multiple levels of multiplication. For example, 18,000 (ẹgbààsán) is represented as 2,000 multiplied by 9, and 2,000 is subsequently represented as 200 multiplied by 10, as shown in the parse tree in Figure 6.
We then added a rule to make allowance for the ẹẹdín/aadín type of subtraction. The phrase ẹẹdín/aadín can only occur as a prefix to a number derived from a multiplication operation. When this is done, the value deducted depends on the number to which it is prefixed (discussed in Subsection 2.2). A further example is 50 (ààdọta), which is derived by deducting 10 from 60 (ọgọta). Rule (12) captures this as shown in the structure in Figure 7.
The inclusion of this rule has an obvious consequence: the rule overgenerates, that is, it allows the use of ẹẹdín or aadín without respecting Table 2. We shall devise means of filtering out ill-formed structures using the packing strategy.
The next stage refers to how the operators V (Verbs) are represented within a phrase. Within a phrase, the Yorùbás start number presentation with the smaller number (Addend/Subtrahend) rather than the larger number (Augend/Minuend). For instance, number 21 (twenty one) is represented as ọkànlélógún (1 + 20) in Yorùbá. We then considered '1 +' as a verb phrase (VP), which is made up of a DIGIT and a V as presented in Rule (13):
VP → DIGIT V (13)
A VP can then be combined with an NP to make up an NP, i.e.:
NP → VP NP (14)
Also, the order in Rule (11) can be reversed to capture the structure of numbers like 600,000 (ọgbọ̀n ọkẹ́), which is represented as 30 times 20,000. The multiplicative base now is positioned at the end of the rule, and can only take ọkẹ́ (20,000) as a value. The outcome of this new rule is a phrase (NP), since it cannot be used as a multiplicand (MP) to derive higher numerals. For example, 1,200,000 cannot be represented as ọgbọ̀n ọkẹ́ ọ̀nà mẹ́wàá ((30 × 20,000) × 10), but as ọdúnrún ọkẹ́ (300 × 20,000). So we added Rule (15). The structure of number 600,000 is shown in Figure 8.
Next, we created rules to connect these phrases together to form a number. So, a number could be formed from a phrase, i.e.:
NUM → NP (16)
Also, a number could be formed by combining an existing number with a phrase using the lexical operatives in the set VV. We added two rules to capture this as follows:
Although multiplication plays an important role in Yorùbá numerals, its lexical representation, ọ̀nà, does not occur in number names except when more than one 20,000 (ọkẹ́) occur within a number phrase. For example, 400,000,000 is represented as 20,000 × 20,000, i.e., ọkẹ́ ọ̀nà ọkẹ́kan, and the structure is also captured using Rule (18) as shown in Figure 9.
Finally, all these rules were merged to make up the production rules of the Yorùbá numeral grammar, as presented in Definition 3.

Definition 3 (Production rules of the Yorùbá numeral grammar)
The production rules for the Yorùbá numeral system are as follows:
These phrase structure rules include the verbs which are operating formatives (V and VV) proposed by Ẹkúndayọ̀ (1977). These rules produce a single and correct structure for most Yorùbá numerals; however, the rules overgenerate with some numerals. For example, the number 1,000,000 produces 3 structures as presented in Figure 10, but the valid structure is determined using a single packing strategy defined in Definition 4.

Definition 4 (Packing strategy for the Yorùbá numeral system)
The following metarules govern well-formed Yorùbá numeral structures: (i) Whenever a phrase MP is formed by a multiplicative combination of two numerals, the multiplicand (MP) must be greater than the multiplier (NP).
(ii) Whenever the rule NP → REDUCE MP is used, the lexical item of REDUCE must correspond to the appropriate MP, as shown in Table 2.
(iii) Whenever the rule S → VV NP is used and the VV has the value of ọ̀nà, then NP can only take a value of ọkẹ́, and the resulting S must be used with a multiple of ọkẹ́ (see Figure 9).
Using the packing strategy for Yorùbá numerals, the well-formedness of the structures in Figure 10 was investigated and only the structure in (c) was well-formed. The analysis is as follows:
(i) In the structure in Figure 10
Once a parse tree is generated, we then convert the tokens to their lexical equivalences followed by the application of morphophonological rules.
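Metarule (i) can be sketched as a recursive check over nested products. The example below is only an illustration of that single metarule: the tuple encoding `("mul", multiplicand, multiplier)` and the name `check_product` are hypothetical, and metarules (ii) and (iii) are not modelled.

```python
def check_product(node):
    """Return (value, well_formed) for an int or ("mul", multiplicand, multiplier).

    A product is well-formed only if its parts are well-formed and the
    multiplicand is greater than the multiplier (metarule (i)).
    """
    if isinstance(node, int):
        return node, True
    _, multiplicand, multiplier = node
    left_value, left_ok = check_product(multiplicand)
    right_value, right_ok = check_product(multiplier)
    well_formed = left_ok and right_ok and left_value > right_value
    return left_value * right_value, well_formed


# 18,000 as (200 x 10) x 9, the structure described for Figure 6.
print(check_product(("mul", ("mul", 200, 10), 9)))
# The same quantity as 2,000 built with the operands swapped is rejected.
print(check_product(("mul", 10, 200)))
```

Returning the value together with the verdict lets a caller discard ill-formed candidates while still confirming that every candidate denotes the intended number.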

Morphophonological rules in Yorùbá numeral system
The representation of numbers in Yorùbá is cumbersome due to the fact that a high level of linguistic processing is involved. Therefore, the speakers are required to have adequate knowledge of some morphophonological rules in the Yorùbá language. These morphophonological rules include deletion, vowel coalescence, vowel harmony, and vowel assimilation. These rules will be discussed to show how they are useful in number naming.

Deletion
Deletion is a process by which a phrase or word is shortened by completely deleting a segment. Both vowels and consonants can be deleted in Yorùbá. The most commonly deleted consonants are w (when it is part of the last syllable) and g. Deletion is notable in the contracted form of phrases dín ní (less than) and lé ní (more than), where i is completely deleted and n is converted to l. This conversion is possible because n and l are allophones of the same phoneme. For example, the expression for 28, which is derived as 2 from 30 (èjì dín ní ọgbọ̀n), is èjìdínlọ́gbọ̀n.
A deletion also occurs in naming numbers between 11 and 14. For example, òkànlá (11) is formed by adding 1 to 10, i.e., òkan lé ẹ̀wá, which is contracted to form òkanlẹ́wá by deleting the vowel é. The consonant w and vowel ẹ are then deleted to form òkànlá. Another example is ẹẹdẹ́gbẹta, which is formed from ẹẹdín ẹgbẹta. This is achieved by completely deleting the vowel ín.

Vowel coalescence
Coalescence is a phonological process whereby two adjoining segments converge or fuse into one element such that the new segment is phonologically distinct from the input segments (Bámiṣilẹ̀ 1994). This is illustrated by Equation 25, where V1 is the vowel that ends the first morpheme, V2 is the vowel that begins the second morpheme, '+' is the morpheme boundary, and V3 is the resulting vowel.
V1 + V2 → V3 (25)
In coalescence, the combining vowels may be phonologically distinct from each other but the resulting vowel must be distinct from the combining vowels, i.e., V3 ≠ V1 and V3 ≠ V2. Vowel coalescence is most notable when two nouns are next to each other. And since Yorùbá numerals are mostly treated as nominal entities, they also use vowel coalescence in naming numbers. For example, vowel coalescence is used in the formation of ogójì (40) derived from 20 multiplied by 2, i.e., ogún èjì. The vowels ún and e are combined by coalescence to become o. Table 8 shows the possible occurrence of vowel coalescence in the Yorùbá numeral system.
ii) Non-ATR, which are vowels a, ẹ, and ọ
ATR vowels involve drawing forward the root of the tongue so that the pharynx is expanded. In simple Yorùbá words, the last vowel in the word determines the other vowels in the word (Akinlabí 2004). So, if the last vowel in a word is an ATR, the immediately preceding vowel must be an ATR. The high vowels (i and u) do not participate in the vowel harmony at all, and they can occur with any vowel. Only the mid vowels (e, o, ẹ, and ọ) are fully involved in the vowel harmony (Akinlabí 2004). The chart presented in Table 9 shows the permissible and non-permissible sequences of vowels in the Yorùbá language. These rules also apply to number naming in Yorùbá as illustrated with the following example: The number ọgọta (60) is derived as 20 (ogún) multiplied by 3 (ẹta), i.e., ogún ẹta. The vowels ún and ẹ̀ are then changed to vowel ọ́ to form ogọta by means of vowel coalescence. Since the last vowel in the last two syllables is a, which is a non-ATR, the immediately preceding vowels must be non-ATR. We will then proceed to check for harmony between the first two syllables. The second vowel ọ́ is non-ATR, therefore, the first vowel o must also be a non-ATR. This will transform o to ọ by means of the vowel harmony.
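The harmony step in the example above (ogọta becoming ọgọta) can be sketched as a right-to-left pass over the word. This is a simplified illustration and not the system's implementation: it handles only the retraction of the mid vowels e and o before a non-ATR vowel, it assumes toneless input, it treats the high vowels i and u as breaking the spread (an assumption, since the text only says that they do not participate), and the name `harmonise` is hypothetical.

```python
import unicodedata

NON_ATR = {"a", "\u1eb9", "\u1ecd"}          # a, e-with-dot-below, o-with-dot-below
RETRACTED = {"e": "\u1eb9", "o": "\u1ecd"}   # ATR mid vowel -> non-ATR counterpart
HIGH = {"i", "u"}                            # do not participate in harmony


def harmonise(word: str) -> str:
    """Retract mid vowels that precede a non-ATR vowel, scanning right to left."""
    chars = list(unicodedata.normalize("NFC", word))
    retract = False
    for index in range(len(chars) - 1, -1, -1):
        char = chars[index]
        if char in NON_ATR:
            retract = True
        elif char in HIGH:
            retract = False                  # assumption: high vowels stop the spread
        elif char in RETRACTED and retract:
            chars[index] = RETRACTED[char]
    return "".join(chars)


print(harmonise("og\u1ecdta"))
```

Words whose last vowel is already ATR or high, such as a toneless ogoji, pass through unchanged under this sketch.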

Vowel assimilation
Vowel assimilation is a process whereby a vowel becomes completely or partially like another vowel (Akinlabí 2004). Vowel assimilation is most notable in Yorùbá numerals when a consonant separating two vowels is deleted. This can be illustrated by the number 2,000 (ẹgbàá).
The number 2,000 is actually formed from 200 × 10, i.e., igba ẹ̀wá, which produces ẹgbẹ̀wá by vowel deletion. ẹgbàá is then formed by deleting the consonant w and allowing the vowel ẹ to assimilate to the form of the vowel a.
Vowel assimilation can also occur between vowels separated by a consonant, as in the expression for 800 (ẹgbẹrin). This expression is derived as 200 (igba) multiplied by 4 (ẹrin), i.e., igba ẹrin. igbẹrin is first formed by deleting a. The initial vowel i then assimilates to ẹ, giving ẹgbẹrin.
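The two steps in the derivation of 800 can be traced with plain string operations. The sketch below is a toy that reproduces only this one derivation on toneless spellings; the helper names are hypothetical and it makes no claim about how the software applies these rules in general.

```python
E_DOT = "\u1eb9"  # precomposed e with a dot below, a single character


def elide_final_vowel_and_join(first, second):
    """Drop the last character of `first` and attach `second`."""
    return first[:-1] + second


def assimilate_initial_vowel(word, new_vowel):
    """Replace the first character of `word` with `new_vowel`."""
    return new_vowel + word[1:]


# 200 (igba) followed by 4 (e-dot + "rin"), tone marks omitted.
step1 = elide_final_vowel_and_join("igba", E_DOT + "rin")  # deletion of a
step2 = assimilate_initial_vowel(step1, E_DOT)             # i becomes e-dot
print(step1, "->", step2)
```

Using the precomposed code point keeps each vowel a single character, so simple slicing is safe here; text with combining diacritics would need Unicode normalisation first.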

System and implementation
An object-oriented programming (OOP) approach with 7 classes was used during the system design. The UML class diagram and the sequence diagram for the software are shown in Figure 11 and Figure 12, respectively.

Figure 12: UML sequence diagram

The software implementation was done using Python and Java, following the specifications in the system design. The following software pieces were developed to demonstrate the conversion of numbers to Standard Yorùbá text:

a) Desktop application: The desktop application was implemented using PyQt in the Python programming language environment. The combination of Python and Qt makes possible the development of applications that are platform-independent (Summerfield 2008). NLTK (Loper and Bird 2002) was used to implement the grammar designed for the Yorùbá numeral system. It was also used to generate the parse trees of the number forms. The screenshot is shown in Figure 13 and the software is available for download at http://www.ifecisrg.org/yorubanumerals.

b) Web application: The web application was implemented using the Google App Engine Python API. The screenshot is shown in Figure 14. The application is available at http://www.num2yor.appspot.com.

c) Mobile application on Android OS: The mobile application was ported to Android using Java and the Android Development Tools (ADT). The screenshot is shown in Figure 15.
The desktop application has a single document interface with toolbars for all tasks on top, a menu bar duplicating toolbar tasks, and dockable history and analysis widgets. The analysis widget shows the computational details of a numeral structure. The Onka software has the following features:

a) The history can be saved for future usage.

b) Users can copy the output text to the computer's clipboard and paste it into an editing program or word processor.

c) The output of the software can be printed or saved in Unicode text format.

d) LaTeX users can copy or save the output in the LaTeX format. Also, the parse trees generated can be copied in the qtree (Siskind and Dimitriadis 2008) bracketed syntax for inclusion in TeX documents.

discussion
The software produces the correct lexical transcription for numbers in the Yorùbá language.In the following subsections, analysis will be carried out on the structure, computation, and forms of certain numbers.The numbers that will be considered are 240, 969, 19,669, and 40,000,000.

4.1 The number 240

The software processing of 240 produced two different forms, which are:

a. òjìlélígba: This number is computed by the addition of digit 40 to 200, i.e., [D40 + 200]. This representation uses only one addition operation. The parse tree of this representation is shown in Figure 16. This representation contains three terminal symbols, and the depth of the parse tree is 6.

The number 969

The software gives 5 representations for the number 969, as shown in Table 10. All these representations are valid and none has preference over the others. The choice of a representation depends on the mental dexterity of the speaker. The parse tree for the first representation is shown in Figure 18.

The number 19,669

The output of the software for the number 19,669 is shown in Figure 14. Representations 1 to 7 were presented by Ẹkúndayọ̀ (1977), and the developed software produced three more representations (8-10) that are structurally valid.

system evaluation
In order to determine the accuracy of the system, we analysed and evaluated the generated output using a qualitative evaluation method. However, because the software can produce multiple representations for a single number, it becomes expedient to rank the output first. The aim is to order the representations according to the economy of computation.

Number name ranking
Although all the representations produced are valid, we propose some heuristic measures for ranking them when there are multiple correct expressions for a number. Once the parse tree had been generated for each representation, we computed the following values, in this order, to determine the computational economy of the numeral structure:

i) The total number of terminal nodes (t): This represents the number of basic lexical items that make up a Yorùbá numeral. The fewer the terminal nodes, the more economical the numeral structure is.

ii) The height of the parse tree (h): The height of the generated parse tree was determined by using the height() function of the package nltk.tree. The parse tree with the least height is thus considered the most suitable representation for a number.

iii) The relative number of subtractions (r): The most natural operations in most numeral systems are addition and multiplication, yet the Yorùbá numeral system places a higher functional load on subtraction. The value of r is calculated by dividing the number of subtraction operations by the total number of arithmetic operations, as shown in Equation 26. The two possible types of subtraction are the normal subtraction operation and the ẹẹdín type of subtraction.

\[ r = \frac{\text{Number of subtraction operations}}{\text{Total number of arithmetic operations}} \qquad (26) \]

This means that a lower r implies a higher economy.
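The three measures can be computed with a few small functions. The following is a minimal sketch, assuming a parse tree encoded as nested tuples of the form (label, child, ...) with plain strings as terminals; this encoding, the labels, the placeholder leaves, and the operation names are all hypothetical and stand in for the NLTK trees used by the software. The height convention (a node with only terminal children has height 2) follows what we understand nltk.tree's height() to use.

```python
# Hypothetical operation labels; "eedin" stands for the second subtraction type.
SUBTRACTION_OPS = {"sub", "eedin"}


def count_terminals(tree):
    """t: number of terminal nodes (leaves are plain strings)."""
    if isinstance(tree, str):
        return 1
    return sum(count_terminals(child) for child in tree[1:])


def height(tree):
    """h: a leaf has height 1; a node is one more than its tallest child."""
    if isinstance(tree, str):
        return 1
    return 1 + max((height(child) for child in tree[1:]), default=0)


def relative_subtractions(operations):
    """r (Equation 26): subtraction operations / all arithmetic operations."""
    if not operations:
        return 0.0
    subtractions = sum(1 for op in operations if op in SUBTRACTION_OPS)
    return subtractions / len(operations)


# Placeholder tree with three terminals; labels and leaves are made up.
toy_tree = ("NUM", ("A", "w1"), ("OP", "w2"), ("B", ("C", "w3")))
print(count_terminals(toy_tree), height(toy_tree), relative_subtractions(["add"]))
```

A representation built from a single addition, such as [D40 + 200], has r = 0 under this definition, since none of its operations is a subtraction.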
Once the first measure has been calculated and some structures have the same cost, the second measure, which checks the height of the parse tree in each structure, is used. If there is still a tie among any of the structures, the last measure (i.e., the relative number of subtractions) is used to determine the most suitable representation for a number. To illustrate this, we used these measures to decide which of the two representations for the number 240 discussed in Section 4.1 is more computationally economical. We started by picking the representation with the minimum number of terminals. The parse tree in Figure 16 has three terminal symbols compared to four in Figure 17. Thus, the structure in Figure 16 is more computationally economical.
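This tie-breaking order amounts to a lexicographic comparison of the triple (t, h, r). A minimal sketch follows, assuming each representation is summarised as a (name, t, h, r) record; the record layout and names are hypothetical. For the 240 example, t = 3 and depth 6 for Figure 16 and t = 4 for Figure 17 come from the text, while the h and r values given for Figure 17 are made-up placeholders.

```python
def rank_representations(candidates):
    """Order (name, t, h, r) records by t, then h, then r, all ascending.

    Python compares tuples element by element, so h is consulted only
    when t ties, and r only when both t and h tie.
    """
    return sorted(candidates, key=lambda c: (c[1], c[2], c[3]))


candidates_240 = [
    ("figure_17_form", 4, 7, 0.5),  # t from the text; h and r are placeholders
    ("figure_16_form", 3, 6, 0.0),  # three terminals, depth 6, one addition
]

ranked = rank_representations(candidates_240)
print(ranked[0][0])
```

Because the first measure already separates the two forms of 240, the placeholder values never influence the outcome in this example.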
The analysis of the ten representations for the number 19,669 is presented in Table 11, which shows the number of terminal symbols, the height of the parse tree, and the relative number of subtractions for each representation. The computational cost was calculated based on these criteria and used to rank the representations. The representations with the lowest number of terminal symbols and the least height are representations 2 and 8 (with 8 terminal symbols and a height of 8); however, representation 2 has the lower relative number of subtractions. Hence, representation 2 (Figure 20) is the most computationally economical. Table 12 presents the most economical representations of selected numbers derived from the software.

Qualitative evaluation
The Mean Opinion Score (MOS) was used for the qualitative evaluation of the system. Chosen members of the staff of Ọbáfẹ́mi Awólọ́wọ̀ University, who are Yorùbá native speakers with adequate knowledge of the Yorùbá language and its orthography, were asked to provide the textual equivalences of some numbers in Yorùbá. Afterwards, their responses were compared to the output from the software. A questionnaire was designed and administered to the selected group of 32 respondents. The numbers in the questionnaire were 25; 67; 132; 750; 969; 2,400; 3,000; 19,669; 20,000; 30,000; 1,000,000; and 400,000,000.

The MOS evaluation was carried out to capture two important aspects of the Yorùbá numeral system. The first was the ability of the respondents to give an accurate representation of the numbers in terms of value and orthography, and the second was to obtain the most suitable representation for the numbers as provided by the respondents. The numbers used in the questionnaire were chosen based on the following criteria:

i) Numbers 25, 67, and 132 were included to confirm that numbers between 1 and 200 have one standard lexical form.
ii) Numbers 750, 969, 2,400, and 19,669 were included to check whether the respondents are aware that there are multiple representations for these Yorùbá numerals.
iii) The number 20,000 was included to check whether the respondents treat 20,000 as a single lexical item or think it is derived from the number 200.
iv) Numbers higher than 20,000 (30,000, 1,000,000, 400,000,000) were included to see if the respondents represent these numbers as multiples of 20,000 or in some other way.
v) Some structurally complex numbers (969 and 19,669) were added to see the most convenient combination of basic lexical numerals used by the respondents to derive these numbers.
The results of the analysis revealed that:

• For numbers 25, 67, and 132, all the respondents gave one correct representation, which matched the output from the software. This shows that numbers below 200 have one standard lexical form and that the skills needed to name these numbers are well understood.

• Ten respondents gave a representation for 19,669, but only two of them gave a correct number name (ẹẹdẹgbààwá ó lé ọtalélẹgbẹta ó lé mẹsán and ẹẹdẹgbààwá ó lé ẹgbẹta ó lé mọkandínlaadọrin). The other eight respondents provided number names that do not in any way evaluate to the number 19,669. Twenty-two (22) of the respondents did not give any representation for the number 19,669. This shows that few respondents understand that 19,669 needs to be reconstructed, and only two respondents were able to carry out the required computations. This result also shows that none of the respondents realised that multiple representations exist for the number 19,669.

• Seven of the respondents gave the correct number names for 20,000, with only two respondents using ọkẹ and the remaining five using ẹgbààwá. This shows that few respondents were able to represent 20,000 in Yorùbá.

• Only three respondents gave the correct name for 1,000,000 (ààdọta ọkẹ), and none of the respondents gave the correct representation for 400,000,000 (ọkẹ́ ọ̀nà ọkẹ). This shows that Yorùbá native speakers may find the computations underlying the naming of large numbers cumbersome.
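The counts reported above can be turned into percentages of the 32 respondents with a short calculation. This is only a convenience sketch: the dictionary layout and function name are hypothetical, the counts are those stated in the bullet points, and no figures are given here for the questionnaire numbers that the bullets do not report.

```python
TOTAL_RESPONDENTS = 32

# Number of respondents who gave a correct name, as reported in the text.
correct_counts = {
    "25": 32,
    "67": 32,
    "132": 32,
    "19,669": 2,
    "20,000": 7,
    "1,000,000": 3,
    "400,000,000": 0,
}


def percent_correct(correct, total=TOTAL_RESPONDENTS):
    """Share of respondents, in percent, who gave a correct number name."""
    return 100.0 * correct / total


for number, count in correct_counts.items():
    print(f"{number:>12}: {percent_correct(count):.1f}% correct")
```

The same function also gives the share of respondents who attempted 19,669 at all: percent_correct(10) is 31.25.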
From these results, we conclude that the respondents were able to produce correct representations for numbers that are frequently used (numbers 1 to 200), although most of them were not able to produce names for higher numbers. After comparing the responses of the human evaluators with the system output, we recognise that the software outperformed the human evaluators. This suggests that most native speakers know the terminologies needed for large numbers but are not familiar with the expression skills required for computing their number names. Modern Yorùbá speakers appear to be losing the numeral generation skills embedded in their language. An obvious reason for this is the overwhelming use of the English numerals within the Yorùbá community.

conclusion
In this paper, we discussed extensively the computational analysis of the Yorùbá numerals. We started by identifying the basic lexical numerals and the numeral groups. Then, we designed a CFG that was able to capture the structure of the Yorùbá numerals. Furthermore, we implemented software for converting numbers to their textual equivalences in the Yorùbá language and generating their corresponding parse trees.
In this study, we are able to show that:

1. The Yorùbá number system has a systematic concept underlying it and that this concept can be articulated using modern computing tools and techniques.

2. The Yorùbá numeral system is not fully vigesimal. Elements of decimal (base 10) and quinary (base 5) are used in numeral representation.

3. The system's recall is 100% with respect to the corpus used in this study. This implies that, with a carefully constructed computational model, the generation of the Yorùbá numeral system can be fully automated.

4. All the forms of number names produced were valid, and the most computationally suitable representations are those in which: (a) the least number of terminal nodes is used, (b) the least height of the parse tree is generated, and (c) the least relative number of subtraction operations is involved. Though these measures are computationally reasonable, an interesting study will be to verify why Yorùbá native speakers sometimes prefer to adopt more complex methods, particularly when generating numerals greater than 200.
The results of this study can be applied in Yorùbá TTS.In any TTS system, numbers must be expanded into their textual forms before the actual speech synthesis is carried out.Thus, the system developed can serve as a sub-system of a Yorùbá TTS to handle the expansion of numbers to their textual equivalences.However, additional heuristic strategies must be employed by the TTS listeners to understand the number being spoken.Without a doubt, an increased usage of the Yorùbá numerals in communication could reduce the mental task needed for number conception.
The software developed in this study has a place in effective teaching and learning of the Yorùbá language.The software can be used in classes to teach the Yorùbá numeral system and its structure.This will allow the students to see the various forms possible for a single number and to visualise the structure (parse tree) of the numerals.
There are certain areas related to this study which we could not explore. By pointing out these areas, we hope to focus our future study on them. There is a need to carry out a contextual analysis of the Yorùbá numeral system, which will establish the relationships between numerals and their surrounding words. This will ensure that the expansion of numbers is carried out based on the context (cardinal, ordinal, nominal, currency, percentage, ratio, date, time, etc.) they represent. Also, there is a need to carry out a study on how the textual forms of the Yorùbá numerals could be recognised and converted to numbers. The results of these studies could be applied in Yorùbá MT and information retrieval.

Figure 5: Number to Yorùbá text transcription system.The figure shows the processes involved in converting a cardinal number to Yorùbá text.

Table 5: Forms of Yorùbá number

Table 7
, i.e., MP is formed by a single multiplicative base M, or recursively by multiplying MP by a number phrase NP, e.g., ọgọta is formed by multiplying an MP (20, formed by MP → M) and an NP (3, formed

Table 8: Vowel coalescence in Yorùbá numerals. V1 is the vowel that ends the first morpheme, V2 is the vowel that begins the second morpheme, + is the morpheme boundary, and

Table 9: Sequence of vowels in Yorùbá bisyllabic words. The symbols + and indicate the permissible and non-permissible vowel sequence, respectively. V2 is the second oral vowel in the word and V1 indicates the vowel that may precede V2.
*The letter u cannot start a word in the Standard Yorùbá language.

Table 11: Representations for the number 19,669 and their ranks. t is the number of terminal symbols, h is the height of the parse tree as generated by the software, and r is the relative number of subtraction operations in the representation.

Table 12: Software output for some numbers