Bringing Balinese to iOS

Norbert Lindenberg

October 27, 2015

While iOS itself only supports the most popular writing systems of the world, recent changes have made it possible for third parties to provide the core support, fonts and keyboards, for additional writing systems. This article discusses how to implement fonts and keyboards for a complex writing system for iOS, based on my experience developing the Balinese Font and Keyboard app.

Why Balinese on iOS?
Balinese script requirements
Balinese in Unicode
Font technologies
Existing Balinese fonts
Adapting a Balinese font for iOS
A Balinese keyboard for iOS
What about Android and Windows?
Conclusion
Acknowledgments
References

Why Balinese on iOS?

Language and writing system support in iOS has grown over the years, and iOS 9 provides at least one font and keyboard for quite a few scripts: Arabic, Bengali, Cherokee, Chinese (simplified and traditional), Cyrillic, Devanagari, Greek, Gujarati, Gurmukhi, Hebrew, Japanese, Korean, Latin, Tamil, Telugu, Thai, Tibetan – and of course emoji. However, there are hundreds more scripts in the world, and users want to use them on their computers – because they’re the normal way their languages are written, or because they want to document and study them. Projects such as Wikipedia are aiming to support hundreds if not thousands of languages, and the lack of fonts and keyboards is a big obstacle.

Temple sign in Balinese and Latin scripts

Balinese is an interesting example of these languages and scripts. The Balinese language is spoken by about 1 million people in daily life on Bali and some neighboring islands, so it’s hardly endangered. However, in written communication it has been largely replaced by Indonesian, the national language of Indonesia, and even where Balinese itself is written, Latin script is now commonly used. Children are still taught its traditional writing system in school, but seem to forget it quickly afterwards. In public life, Balinese script is primarily used for signs on temples, government buildings, and some street signs; even then it’s almost always paired with a transliteration in Latin script.

One reason for the decline in use of Balinese is government policy: The Indonesian government is promoting Indonesian as a unifying element for its 255 million people. But another reason is technology: Balinese is a rather complex script, and until recently was not well supported on computers. Most previous attempts at implementing Balinese also focused on Windows, the operating system used by scholars and in schools. However, people in Bali, as elsewhere, now rely far more on cell phones in their daily lives, especially for communication via social networks and messaging apps. Android has the largest market share in Indonesia, but iPhones, while not yet affordable for most people, are so desirable that smartphones in general are commonly called “iPhones”.

Browser compatibility: This article uses Balinese characters according to the Unicode standard. Over the last year, browser/OS combinations have appeared that can render them correctly: Safari on iOS 8 and OS X 10.10, Firefox 41 and recent Chrome on Windows and Mac, recent Chrome on Android 5.1. Reader views in these browsers fail on some of the text, as does Edge on Windows 10 and any software released before September 2014.

Balinese script requirements

The Balinese script evolved from the ancient Brahmi script via Pallava and Kawi. Thanks to this heritage, it has many similarities with other Brahmic scripts, such as Javanese, Thai, Tamil, and Devanagari, and shares many of the requirements and solutions discussed here with those scripts.

The Balinese script is an abugida: At its core is a set of consonants with the inherent vowel a: ᬳ ha, ᬦ na, ᬘ ca, ᬭ ra, ᬓ ka, and so on. When syllables have a vowel other than a, that vowel is indicated using an additional mark, which can attach to the consonant above, below, to the left, to the right, or in combinations thereof: ᬳᬶ hi, ᬳᭂ hĕ, ᬳᬸ hu, ᬳᬾ he, or ᬳᭀ ho. The vowel marks that attach to the left of their base consonants, the “pre-base vowels”, will be a recurring topic in this article because they significantly complicate processing. Long vowels are sometimes indicated by a modified mark, sometimes by appending the tedung, ◌ᬵ: ᬳᬷ hī, ᬳᬹ hū, but ᬳᬵ hā. However, tedung is also used in some cases to indicate a change in vowel, from ᬳᬾ he to ᬳᭀ ho or from ᬳᭂ hĕ to ᬳᭃ hö.

There are characters that represent vowels by themselves: ᬅ a, ᬇ i, and so on. However, in modern Balinese these are rarely used – it’s more common to use the consonant ᬳ ha with the appropriate vowel mark and omit the consonant in pronunciation.

Consonants don’t always come with a vowel – they may occur at the end of a phrase, or as part of a consonant cluster. In the first case, the omission of the vowel is indicated by the mark adeg adeg, ◌᭄. The town of Ubud, for example, is written ᬳᬸᬩᬸᬤ᭄. In consonant clusters, however, a vowel-less consonant is indicated by using a “conjunct” form of the following consonant, which is written below or, in a few cases, next to the vowel-less consonant. For example, the word for jackfruit, nangka, consists of ᬦ na, ᬗ nga, and ᬓ ka, but to suppress the vowel in ᬗ nga, the conjunct form of ᬓ ka, which is ◌᭄ᬓ, is used, with the result ᬦᬗ᭄ᬓ. The conjunct form of ᬓ ka looks very similar to the normal form of ᬦ na, so the position is really important! For conjunct forms that go to the right of the vowel-less consonant, distinct forms are used. For example, in the word for character and script, akṣara, the conjunct form of ᬱ ṣa is ◌᭄ᬱ: ᬅᬓ᭄ᬱᬭ.

Conjunct forms written below the base consonant are called gantungan. There can be at most two layers of them, and only a few are allowed in the second layer: ◌᭄ᬭ ra, ◌᭄ᬯ wa, and ◌᭄ᬬ ya. Conjunct forms written after the base consonant are called gempelan.

Vowel-less consonants at the end of a syllable can in a few cases be written with special characters: ◌ᬂ ng, ◌ᬃ r, ◌ᬄ h.

Balinese characters are often written with contextual forms, such as:

Gantungan and gempelan, as seen above.
Different conjunct forms when stacking them: ᬲ + ◌᭄ᬢ + ◌᭄ᬭ → ᬲ᭄ᬢ᭄ᬭ → ᬲ᭄ᬢ᭄ᬭ stra.
Different forms of vowels when attaching them to gantungan: ᬩ + ◌᭄ᬬ + ◌ᬸ → ᬩ᭄ᬬᬸ byu.
Different forms of consonants and independent vowels combined with tedung: ᬳ + ◌ᬵ → ᬳ‍ᬵ → ᬳᬵ hā.

The characters often need to be positioned in ways that differ from simple left-to-right placement:

Gantungan are often placed below other consonants. Up to three consonants can occur in one stack: ᬩ + ◌᭄ᬭ + ◌᭄ᬬ → ᬩ᭄ᬭ᭄ᬬ brya.
Vowels and other marks are placed above, below, or to the left of their base consonants: ᬳᬶ hi, ᬳᬸ hu, ᬳᬾ he, ᬳᬃ har.
If more than one mark is used above the same base consonant, they should go side by side: ᬳᬶᬃ hir.
When a vowel is positioned below a gantungan, then the vowel needs to be pushed below that gantungan: ᬦ + ◌᭄ᬤ + ◌ᬸ → ᬦ᭄ᬤᬸ ndu.

All these contextual forms and positioning rules mark out Balinese as a complex script, which raises some issues for encoding it in Unicode as well as for rendering it with fonts.

Balinese in Unicode

The first step in supporting a script on modern computers is adding it to the Unicode character set. The Unicode standard aims to support all characters used in the world’s languages. A character here is the abstract basic entity of written text, and Unicode provides an identifying number (the code point, written with “U+”), a name, and a set of properties for each character. One example is the Latin capital letter A with code point U+0041, and properties describing it as an upper-case letter with lower-case counterpart Latin small letter a, as a non-breaking component of words, and so on.
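To make this concrete, here is a small Python illustration that looks up the code point, name, and two properties of a character using the standard unicodedata module. It is only a sketch for this article's example character, not part of the Balinese app.

```python
import unicodedata

ch = "\u0041"  # Latin capital letter A

# Code point (written with "U+") and character name.
print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# General category: "Lu" stands for upper-case letter.
print(unicodedata.category(ch))

# Lower-case counterpart.
print(ch.lower())
```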

Identifying those characters and their properties is a surprisingly big effort, and a long list of scripts is still awaiting encoding. Fortunately for this project, Balinese was added to Unicode in version 5.0 in 2006, thanks to work by Michael Everson, I Made Suatjana and a group of experts in Bali.

In encoding Balinese, a few key decisions were made:

Conjunct forms of consonants are encoded by inserting the BALINESE adeg adeg character right before the consonant that needs to take its conjunct form, after the vowel-less consonant. In our examples above, ᬦᬗ᭄ᬓ is encoded as ᬦ ᬗ ◌᭄ ᬓ, and ᬅᬓ᭄ᬱᬭ is encoded as ᬅ ᬓ ◌᭄ ᬱ ᬭ. This is similar to the use of virama characters in the encoding of most other Brahmic scripts, most closely Javanese and Oriya, but differs from Tibetan, for which the conjunct forms were encoded as separate characters. Note that in this case the adeg adeg is not visible when rendered, while the same Unicode character is also used for the visible adeg adeg ◌᭄ at the end of a phrase.
Character sequences are encoded in logical order; in particular, vowels that are pronounced after a consonant also follow that consonant in encoded text, regardless of whether they are visually shown before or after the consonant. The word for god or goddess, dewa, is encoded as ᬤ ◌ᬾ ᬯ, but needs to display as ᬤᬾᬯ, with ᬾ first. Again, this is the general practice in encoding Brahmic scripts, but it differs from Thai, for which pre-base vowels are encoded before their base consonants.
Several character combinations involving the character tedung, ◌ᬵ, which is encoded on its own as Balinese vowel sign tedung, were encoded as precomposed forms: ᬆ ā, ◌ᭁ au, ◌ᭃ ö, and more. Balinese follows several major Indic scripts in this regard, while the encoding for Javanese avoids precomposed forms.
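These decisions can be checked against the encoded text. The following Python sketch lists the code points of the two example words in their stored, logical order, and asks the standard unicodedata module for the canonical decomposition of one precomposed vowel. The helper name code_points is made up for this illustration.

```python
import unicodedata

def code_points(text):
    """Return the code points of text in stored (logical) order."""
    return [f"U+{ord(ch):04X}" for ch in text]

nangka = "\u1B26\u1B17\u1B44\u1B13"  # na, nga, adeg adeg, ka
dewa = "\u1B24\u1B3E\u1B2F"          # da, taling (a pre-base vowel), wa

# The adeg adeg sits between nga and ka, although it is invisible when rendered.
print(code_points(nangka))

# The pre-base vowel follows its consonant in storage, although it displays first.
print(code_points(dewa))

# The precomposed taling tedung decomposes canonically into taling + tedung.
print(code_points(unicodedata.normalize("NFD", "\u1B40")))
```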

Font technologies

Where Unicode defines abstract characters, fonts provide concrete shapes, the glyphs, for them. There’s only one Latin capital letter A, but each font can draw it with a different shape.

For basic font rendering, today’s major computing platforms use the same font technologies. Fonts are collections of tables stored in “sfnt” containers, which were first introduced for TrueType fonts. The most important tables in a font are:

The table defining the glyphs, typically either a glyf table with TrueType outlines or a CFF table with PostScript outlines (other formats exist for bitmap or colored glyphs).
The cmap table mapping from Unicode characters to the glyphs provided by the font.
The name table providing the font name in various forms and related user-visible information, such as a description of the font and copyright information.
The head, hhea, and hmtx tables providing glyph metrics and general information about the font, such as ascent, descent, recommended line spacing, and more.
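To see which tables a font contains, one can read the table directory at the start of the sfnt container. The Python sketch below assumes a single-font file (not a font collection) and does no validation. The function name list_sfnt_tables is made up for this illustration.

```python
import struct

def list_sfnt_tables(data):
    """Return (sfnt version, {tag: (offset, length)}) from sfnt font data."""
    # Header: 4-byte version, then numTables, searchRange, entrySelector,
    # rangeShift as 16-bit big-endian values (12 bytes in total).
    version, num_tables = struct.unpack_from(">4sH", data, 0)
    tables = {}
    for i in range(num_tables):
        # Each 16-byte record: tag, checksum, offset, length.
        tag, _checksum, offset, length = struct.unpack_from(">4sIII", data, 12 + 16 * i)
        tables[tag.decode("latin-1")] = (offset, length)
    return version, tables
```

For a real font, pass in the bytes read from the font file. A font adapted for AAT would be expected to list morx, kerx, and ankr next to cmap, name, and the other basic tables.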

For supporting more complex font rendering – anything where glyphs aren’t arranged in a simple left-to-right sequence – several different technologies are in use. They have in common that they enable two classes of operations:

Substitutions map a glyph sequence to a different glyph sequence. A ligature is one such substitution, in which a sequence of two or more glyphs is replaced with one glyph representing the combination with a different shape. In English, ligatures are commonly used for f + i → fi and f + l → fl. Reordering is a substitution that changes the order of the glyphs in the glyph sequence, as is needed for the Balinese pre-base vowels such as ◌ᬾ.
Positioning moves glyphs into the desired locations relative to other glyphs. Kerning is a common way of positioning that moves glyphs closer together to avoid excessive gaps in sequences such as “AV” or “Wo”. More complex operations however are necessary to arrange the vertical stacks of consonants, above-base vowels and below-base vowels in Balinese. A common model for these is to define anchor points on glyphs, and then specify which anchor points should be aligned with which anchor points of other glyphs.

Anchors are defined for the glyphs for ᬳ and ◌ᬃ, and the mark ◌ᬃ is positioned by aligning the anchor at its bottom with the anchor at the top of ᬳ.
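The arithmetic behind anchor alignment is simple. The Python sketch below computes where a mark glyph's origin has to go, relative to the base glyph's origin, so that the two anchors coincide. The coordinates are invented for illustration and are not taken from any font, and a real renderer also has to account for glyph advances.

```python
def mark_origin(base_anchor, mark_anchor):
    """Return the mark's origin, relative to the base glyph's origin,
    that makes the mark's anchor land on the base's anchor."""
    return (base_anchor[0] - mark_anchor[0], base_anchor[1] - mark_anchor[1])

# Hypothetical font-unit coordinates, each relative to its own glyph's origin.
base_top = (300, 600)    # anchor at the top of the base glyph
mark_bottom = (50, 0)    # anchor at the bottom of the mark glyph

print(mark_origin(base_top, mark_bottom))
```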

Apple Advanced Typography

Apple Advanced Typography (AAT) was born as part of QuickDraw GX in the 1990s, and is available on Apple’s platforms: OS X, iOS, and watchOS. It provides fonts with a very flexible model for substitutions and positioning based on finite state machines. On the other hand, it also hands full responsibility for these operations to fonts. In its current version, support for complex writing systems is centered on three tables:

The morx table describes substitutions of glyph sequences with other glyph sequences. Particularly important for Brahmic scripts is the Rearrangement subtable, which is used to move the glyphs for pre-base vowels before the glyphs of their base consonants.
The kerx and ankr tables can be used to position glyphs by defining anchor points. kerx and ankr tables are a fairly recent addition to AAT and a huge improvement over the older kern tables.

OpenType

OpenType was created by Microsoft and Adobe in response to AAT. Various subsets of and extensions to it exist in at least four major implementations: Microsoft’s implementation in Windows, Adobe’s in InDesign and other products, Apple’s in its platforms, and the open-source implementation HarfBuzz, which is used in Firefox, Linux, Android, and many other products. OpenType assigns some responsibilities to the rendering system that in AAT would rest with the font, in particular the reordering of vowels required for scripts such as Balinese. Traditionally, Microsoft implemented and specified these responsibilities on a per-script basis, and they never got around to specifying them for Balinese. HarfBuzz implemented reordering for a number of scripts, including Balinese, without a specification. For Windows 10, Microsoft finally followed and provided the Universal Shaping Engine (USE), with reordering and related operations for all scripts in Unicode 7 that weren’t covered before. However, it’s unclear if and when Apple will implement the Universal Shaping Engine.

OpenType is centered on three tables:

The GDEF table assigns all glyphs in a font to glyph classes. At the minimum, there are two classes, base glyphs and marks, but fonts can define additional ones.
The GSUB table defines those glyph substitutions that aren’t implemented by the rendering system.
The GPOS table positions glyphs.

Graphite

Graphite was created by SIL to enable the implementation of writing systems that OpenType didn’t support on non-Apple platforms. Unlike AAT and OpenType, whose behavior is specified based on font table formats, Graphite uses a higher-level language, Graphite Description Language, which is compiled to tables that the Graphite renderer interprets. The Graphite renderer has been integrated into some open-source products, most importantly Firefox.

Hacked encodings and brute-force OpenType fonts

None of these three technologies is available everywhere, and universal script support in OpenType is very new. All three are based on Unicode, and many minority scripts have been encoded fairly recently or aren’t encoded yet. So what do people do when font technologies and Unicode fail them? They create hacked fonts or brute-force OpenType fonts.

Hacked fonts are fonts that re-interpret the code points of Unicode or another character encoding (ISO 8859-1 is a common victim) to mean the glyphs they want to show. So instead of meaning Latin capital letter A, the code point 0041 could be interpreted as the Balinese letter ᬳ. Instead of relying on glyph substitution and positioning, complete character clusters may be rendered as individual glyphs. If ISO 8859-1 is used, then the number of such glyphs is typically limited to 224, but fonts re-interpreting Unicode may have over a thousand glyphs.

Since hacked fonts don’t conform to any standard character encoding, no general-purpose software can generate or interpret them correctly. They have to be paired with special input systems that map key sequences to the code points for the desired characters or character clusters. Sometimes specialized word processor software is created that provides other functionality based on the hacked encoding. The general problem remains that text created with a hacked font is forever tied to that font and loses its meaning when separated from the font.

A newer variant is hacked Unicode fonts, which are actually based on the Unicode encoding of Balinese, but then extend it with additional code points in the private use area to make contextual glyph forms available, such as the conjunct forms of consonants. This enables correct encoding when using software with sufficient OpenType support to render the contextual forms, while allowing hacked encoding when using less capable software, such as older versions of Microsoft Word and Adobe Photoshop.

Brute-force fonts use OpenType technology, but implement advanced behavior such as reordering without support from the rendering engine. Such a font might, for example, use a series of contracting and expanding substitutions to move a pre-base vowel before its base consonant.

Existing Balinese fonts

I’m not a type designer, so I started by looking for existing Balinese fonts. The resulting list is short, so I can present it in its entirety. The sample text shown for each one is the traditional Balinese greeting ᬒᬁᬲ᭄ᬯᬲ᭄ᬢ᭄ᬬᬲ᭄ᬢᬸ om swastyastu.

Bali Simbar was probably the first digital font created for Balinese, and was designed by I Made Suatjana. The most recent version is from 1999, before Balinese was encoded in Unicode (or in any character encoding), so the font is hacked. It contains about 180 Balinese glyphs.

om swastyastu in Bali Simbar — Bali Simbar

JG Aksara Bali was designed by Jason Glavy. The most recent version is from 2003, still before Balinese reached Unicode, so it’s still hacked. The font has over 1400 Balinese glyphs, including a huge selection of precomposed glyph clusters.

om swastyastu in JG Aksara Bali — JG Aksara Bali

Aksara Bali by Khoi Nguyen Viet is the first hacked Unicode Balinese font with a brute-force OpenType implementation. The results depend on how well other OpenType features are implemented in the renderer, and don’t always look good. The font has about 370 Balinese glyphs.

om swastyastu in Aksara Bali — Aksara Bali

The team of Aditya Bayu Perdana, Ida Bagus Komang Sudarma, and Arif Budiarto has created a small series of Balinese fonts: Tantular Bali, Lilitan, and Geguratan, all using hacked Unicode and a brute-force OpenType implementation. Tantular has about 400 Balinese glyphs.

om swastyastu in Tantular — Tantular Bali

Noto Sans Balinese by the Monotype Design Team is the first font that relies on Balinese support in an OpenType renderer. It was released by Google as part of their Noto series. “Noto” stands for “no tofu”, and indeed the font prevents tofu in Balinese text, but in its current form it’s not quite adequate for rendering real text documents. Three bugs are visible in its rendering of om swastyastu: The sign ulu candra ◌ᬁ sits on top of the tedung ◌ᬵ instead of on top of its base character okara ᬑ, the second-layer gantungan ◌᭄ᬬ ya overlaps the first-layer ◌᭄ᬢ ta instead of extending below it, and the final syllable collides with the gantungan ◌᭄ᬬ. The font has about 180 Balinese glyphs.

om swastyastu in Noto Sans Balinese — Noto Sans Balinese

Design options

You’ve probably noticed that the renderings differ significantly in size. That’s not an accident – the fonts are actually designed that way, responding differently to the question of how they should align with Latin text and with rendering systems designed around the Latin concepts of baseline, ascent, and descent. If we assume that the bottoms of the base consonants should align with the Latin baseline, then the two possible layers of gantungan and below-base vowels can extend far below the baseline, risking collisions with glyphs of the next line. To ensure that Balinese text can fit into the envelope defined by Latin ascenders and descenders, Bali Simbar and JG Aksara Bali are therefore drawn quite small, and their base consonants float well above the Latin baseline to make room for below-base glyphs. Aksara Bali, on the other hand, happily stretches outside Latin ascenders and descenders, while still floating its base consonants above the Latin baseline. Tantular, Lilitan, Geguratan, and Noto Sans Balinese, finally, set their base consonants on the Latin baseline and size them to roughly match the x-height of typical Latin fonts, with other marks extending well beyond Latin ascenders and descenders. Here’s a comparison of all seven fonts with their (shortened) names set in Georgia:

Adapting a Balinese font for iOS

Seven fonts, none of them uses AAT, none of them can be rendered correctly on iOS. Well, I’m not a type designer, but I’m a software engineer and can deal with tables and finite state machines. Noto Sans Balinese became the first font I worked on because it was available under the Apache license, which explicitly allows derivatives. The license doesn’t extend to the name, so I named the adaptation for iOS “Ubud”, after the town in Bali, and it’s the font used for most of the Balinese text in this article.

Font tools

Most font tools focus on OpenType support, and support for AAT tables is rare. As you might expect, Apple provides a set of font tools that enable the creation of fonts including AAT tables. Maintaining these tools doesn’t seem to be a high priority for Apple though – the current version is a beta released in October 2011. For the creation of morx tables, the documentation discusses a new Advanced Typography Input File format, but the format is quite verbose and the compiler produced incorrect output, so I went with the older Morph Input Files (MIF) format instead. For the creation of ankr and kerx tables the tools offer no support at all – which for ankr tables is not surprising, because they’re newer than the tools. I ended up developing my own tool for these two tables.

Glyph names

Working with glyph IDs is tedious, so the first step was to give all glyphs of the font names, using the post table. The Adobe Glyph List Specification provides guidelines for names that may allow tools to recover Unicode character strings from glyph names, and I followed these guidelines: Glyphs representing a single character U+XXXX are named uniXXXX (for example, uni1B33 for ᬳ); glyphs representing a ligature of two characters U+XXXX and U+YYYY are named uniXXXX_uniYYYY (for example, uni1B44_uni1B33 for the gantungan ◌᭄ᬳ ha); variant glyphs get a descriptive suffix (for example, uni1B44_uni1B26.shallow).
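The naming scheme is mechanical enough to sketch in a few lines of Python. The two helper functions below are made up for this illustration and only handle the uniXXXX form for characters in the Basic Multilingual Plane, which covers all of Balinese.

```python
def glyph_name(chars, suffix=None):
    """Build a glyph name from the character(s) the glyph represents."""
    name = "_".join(f"uni{ord(c):04X}" for c in chars)
    return f"{name}.{suffix}" if suffix else name

def chars_from_glyph_name(name):
    """Recover the character string from a glyph name built as above."""
    base = name.split(".", 1)[0]  # drop a descriptive suffix such as ".shallow"
    return "".join(chr(int(part[3:], 16)) for part in base.split("_"))

print(glyph_name("\u1B33"))                    # single character
print(glyph_name("\u1B44\u1B33"))              # ligature of two characters
print(glyph_name("\u1B44\u1B26", "shallow"))   # variant glyph
```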

Glyph classes

AAT tables rely heavily on state machines, which in turn rely on glyph classes. So the next step is to define such classes. For glyphs that correspond directly to characters, the classes used in the Ubud font are similar to the ones defined for the Universal Shaping Engine. To start with, there’s a class for all the consonants, which can serve as the base of clusters, but also have conjunct forms. A separate class has non-consonant base glyphs, which can serve as the base of a cluster, but don’t have conjunct forms, including dotted circle, uni25CC, and no-break space, uni00A0. Differing from the current version of the USE specification (February 2015), this class includes independent vowels, as in some cases Balinese independent vowels can have a dependent vowel (tedung), the adeg adeg, or even a gantungan attached. Another class includes just the adeg adeg, which has unique behavior, and yet another the pre-base vowels.

Reordering pre-base vowels

The main reason for adding AAT tables is reordering pre-base vowels. morx tables provide dedicated support for this in the form of Rearrangement subtables. Here’s a greatly simplified version of the Rearrangement subtable in the Ubud font:

B uni1B13 uni1B14 uni1B15 uni1B16 uni1B17 // … and many more

VPre uni1B3E uni1B3F

	`EOT`	`OOB`	`B`	`VPre`
`StartText`	`1`	`1`	`2`	`1`
`SawBase`	`1`	`1`	`2`	`3`

	`GoTo`	`MarkFirst?`	`MarkLast?`	`Advance?`	`DoThis`
`1`	`StartText`	`no`	`no`	`yes`	`none`
`2`	`SawBase`	`yes`	`no`	`yes`	`none`
`3`	`StartText`	`no`	`yes`	`yes`	`xD->Dx`

The first section defines the glyph classes, the second is the state array, the third the action list. B is the class of consonants; VPre the class of pre-base vowels; EOT is a pseudo-class representing the end of the text; and OOB the default class for all other glyphs. The AAT renderer starts analysis of each string in state StartText. Each entry in the state array indicates which action should be taken when a glyph of a given class is encountered in a given state. So, when a consonant is encountered in the StartText state, take action 2; when a pre-base vowel is encountered in the SawBase state, take action 3. Action 1 is the one taken most often: It resets the state to StartText and advances to the next glyph. Action 2 is taken when a consonant is encountered: It marks the glyph as the first one to consider for rearrangement, transitions to state SawBase, and advances to the next glyph. Action 3 finally is the one that actually rearranges glyphs: It’s taken when a pre-base vowel is encountered in state SawBase, and it marks that vowel as the last one to consider for rearrangement, moves the vowel to before the previously marked consonant, resets the state to StartText, and advances to the next glyph. A complete state machine for rearrangement has to also account for glyphs that may occur between the base consonant and the pre-base vowel, for malformed glyph sequences, and for some additional classes and states imposed by AAT, so it’ll be more complicated – the one in the Ubud font has 15 classes, 11 states, and 18 actions.
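To make the walk-through easier to follow, here is a Python simulation of the simplified state machine. It is only a sketch: it mimics the state array and action list shown here, not the real AAT renderer or the full table in the Ubud font. Taking class B to be all consonants from uni1B13 to uni1B33 is an assumption standing in for the elided class list.

```python
# Assumed stand-in for "B uni1B13 uni1B14 ... and many more".
B = {f"uni{cp:04X}" for cp in range(0x1B13, 0x1B34)}
VPRE = {"uni1B3E", "uni1B3F"}

STATE_ARRAY = {
    "StartText": {"EOT": 1, "OOB": 1, "B": 2, "VPre": 1},
    "SawBase":   {"EOT": 1, "OOB": 1, "B": 2, "VPre": 3},
}
# action: (next state, mark first?, mark last?, verb); every action advances.
ACTIONS = {
    1: ("StartText", False, False, None),
    2: ("SawBase", True, False, None),
    3: ("StartText", False, True, "xD->Dx"),
}

def classify(glyph):
    if glyph in B:
        return "B"
    if glyph in VPRE:
        return "VPre"
    return "OOB"

def rearrange(glyphs):
    glyphs = list(glyphs)
    state, first, last = "StartText", None, None
    for i, glyph in enumerate(glyphs):
        next_state, mark_first, mark_last, verb = ACTIONS[STATE_ARRAY[state][classify(glyph)]]
        if mark_first:
            first = i
        if mark_last:
            last = i
        if verb == "xD->Dx" and first is not None and first < last:
            # Move the last marked glyph in front of the rest of the marked range.
            glyphs[first:last + 1] = [glyphs[last]] + glyphs[first:last]
        state = next_state
    return glyphs

# dewa: da, taling, wa in logical order; taling moves before da.
print(rearrange(["uni1B24", "uni1B3E", "uni1B2F"]))
```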

Splitting multi-component glyphs

Rearrangement is not the only functionality that the OpenType renderer implements for all fonts; it also splits vowels that have components on more than one side of the base glyph and for which the Unicode standard provides a decomposition (in Balinese the ones that have a tedung ◌ᬵ component), and inserts a dotted circle before any glyph that lacks a valid base glyph, such as a standalone dependent vowel.

Balinese has not only characters with multiple glyph components for which the Unicode standard provides a decomposition, such as ◌ᭀ, but also some for which it doesn’t, such as ◌ᬼ lĕ. These also need to be split so that the individual components can be positioned correctly. The Ubud font splits all such multi-component vowels in one pass. AAT doesn’t have an operation to simply replace one glyph with two, so two subtables are necessary: An Insertion subtable inserts a component glyph before or after the original one; then a Noncontextual subtable replaces the original glyph with the other component glyph. Here is a simplified version of the Insertion part, which only considers the glyphs with a tedung component:

WithTedung uni1B06 uni1B08 uni1B0A uni1B0C uni1B0E uni1B12 uni1B3B uni1B3D uni1B40 uni1B41 uni1B43

	`EOT`	`OOB`	`WithTedung`
`StartText`	`1`	`1`	`2`

	`GoTo`	`Mark?`	`Advance?`	`InsertMark`	`InsertCurrent`
`1`	`StartText`	`no`	`yes`	`none`	`none`
`2`	`StartText`	`no`	`yes`	`none`	`Tedung`

Tedung

IsKashidaLike yes

InsertBefore no

Glyphs uni1B35

The glyph class WithTedung includes all glyphs that have a tedung component, except the one for just tedung itself. Whenever the state machine encounters one of these glyphs, action 2 says to insert something at the current position, with the Tedung section providing the details: IsKashidaLike essentially means that the new glyph will be inserted adjacent to the current one, InsertBefore no means insert after, and the inserted glyph is uni1B35, the one for the tedung character.

After the insertion operation is completed, every glyph with a tedung component is followed by a duplicate tedung, so we now have to replace the glyphs with tedung components with their tedung-less peers. The Noncontextual subtable simply takes a list of glyphs with their replacements:

uni1B06 uni1B05

uni1B08 uni1B07

// several more

uni1B43 uni1B42
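The combined effect of the two subtables can be sketched in Python as two passes over the glyph sequence. This is only an illustration of the logic, not of how AAT processes the tables, and it lists only the three replacement pairs shown here rather than the complete list.

```python
TEDUNG = "uni1B35"

# Only the pairs shown in this article; the font's list has several more.
REPLACEMENTS = {
    "uni1B06": "uni1B05",
    "uni1B08": "uni1B07",
    "uni1B43": "uni1B42",
}

def insert_tedung(glyphs):
    """Pass 1 (Insertion): add a tedung after every glyph with a tedung component."""
    out = []
    for glyph in glyphs:
        out.append(glyph)
        if glyph in REPLACEMENTS:
            out.append(TEDUNG)
    return out

def replace_noncontextual(glyphs):
    """Pass 2 (Noncontextual): swap each such glyph for its tedung-less peer."""
    return [REPLACEMENTS.get(glyph, glyph) for glyph in glyphs]

def split_vowels(glyphs):
    return replace_noncontextual(insert_tedung(glyphs))

# ha + pepet tedung becomes ha + pepet + tedung.
print(split_vowels(["uni1B33", "uni1B43"]))
```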

Inserting dotted circles

Inserting dotted circles before any glyph that lacks a valid base glyph requires a fairly complicated state array and action list that together can identify all valid Balinese glyph sequences, but in the end, when a glyph without a valid base glyph is found, the insertion operation ends up with details similar to the Tedung case above:

DottedCircle

IsKashidaLike yes

InsertBefore yes

Glyphs uni25CC

Mapping `GSUB` subtables to AAT

Splitting multi-component vowels, inserting dotted circles, and reordering pre-base vowels are the three operations that the Universal Shaping Engine provides by default for any Balinese OpenType font. All other substitutions have to be defined by subtables of a GSUB table in the font, and Noto Sans Balinese does so. To make the font work with AAT, these subtables had to be converted to equivalent morx subtables. One of them is the mapping of an adeg adeg with a subsequent consonant to the conjunct form of the consonant. What looks like one of the more advanced features of the font actually turns out to be very easily implemented: Replacing two glyphs with a new one is basically a ligature, and AAT provides the LigatureList subtable, where each entry says which ligature replaces which old glyphs. The one for the Ubud font starts with:

List

uni1B44_uni1B0B uni1B44 uni1B0B

uni1B44_uni1B13 uni1B44 uni1B13

uni1B44_uni1B14 uni1B44 uni1B14

// many more
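In effect, the ligature list describes a simple pairwise replacement. The Python sketch below applies the three entries shown above to a glyph sequence. It illustrates the substitution only, not AAT's actual ligature state machine, and the function name apply_ligatures is made up.

```python
# (first glyph, second glyph) -> ligature glyph; only the entries shown above.
LIGATURES = {
    ("uni1B44", "uni1B0B"): "uni1B44_uni1B0B",
    ("uni1B44", "uni1B13"): "uni1B44_uni1B13",
    ("uni1B44", "uni1B14"): "uni1B44_uni1B14",
}

def apply_ligatures(glyphs):
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])  # adeg adeg + consonant -> conjunct form
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out

# nangka: na, nga, adeg adeg, ka -> na, nga, conjunct ka.
print(apply_ligatures(["uni1B26", "uni1B17", "uni1B44", "uni1B13"]))
```

An adeg adeg at the end of a phrase has no following consonant, so it matches no entry and stays visible.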

Supporting second-layer gantungan

As mentioned above, Noto Sans Balinese has bugs. Some of these are fixed in Ubud, among them the two described above: Incorrectly positioned above-base marks, and overlap between second-layer and first-layer gantungan.

The overlap between second-layer and first-layer gantungan at first seems like it could be solved by positioning, simply pushing the second-layer glyphs below the first-layer ones. However, in the case of ◌᭄ᬬ ya that’s clearly not sufficient: The arm on the right side should still reach up to the height of base consonants, so it needs to be longer for the second-layer form. Also, in order to reduce the overall descent and minimize collisions with following lines, it’s better to use shallower shapes for all second-layer gantungan. The Ubud font therefore has separate glyphs for all possible second-layer gantungan and vowels, and uses a Contextual subtable to swap them in whenever one of these gantungan and vowels follows a gantungan:

CBelow uni1B44_uni1B13 uni1B44_uni1B14 uni1B44_uni1B15 // many more

CBelow2 uni1B44_uni1B2C uni1B44_uni1B2D uni1B44_uni1B2F uni1B38 uni1B39 uni1B3A

	`EOT`	`OOB`	`CBelow`	`CBelow2`
`StartText`	`1`	`1`	`2`	`2`
`SawCBelow`	`1`	`1`	`2`	`3`

	`GoTo`	`Mark?`	`Advance?`	`SubstMark`	`SubstCurrent`
`1`	`StartText`	`no`	`yes`	`none`	`none`
`2`	`SawCBelow`	`no`	`yes`	`none`	`none`
`3`	`StartText`	`no`	`yes`	`none`	`AltCBelow`

AltCBelow

uni1B44_uni1B2C uni1B44_uni1B2C.L2

uni1B38 uni1B38.L2

// same for other glyphs in CBelow2

The CBelow2 class contains those gantungan and vowels that can occur below gantungan; the CBelow class all remaining gantungan. When a gantungan is encountered, state SawCBelow is entered, and if in that state a glyph in class CBelow2 is found, then it’s replaced with its second-layer counterpart.
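The same logic can be simulated in Python, with the three actions numbered 1 to 3 and the substituting action as number 3. This is a sketch of the simplified tables only. The CBelow class lists just the three glyphs shown, and the ".L2" names are generated by the rule that every glyph in CBelow2 has a second-layer counterpart.

```python
CBELOW = {"uni1B44_uni1B13", "uni1B44_uni1B14", "uni1B44_uni1B15"}  # and many more
CBELOW2 = {"uni1B44_uni1B2C", "uni1B44_uni1B2D", "uni1B44_uni1B2F",
           "uni1B38", "uni1B39", "uni1B3A"}

STATE_ARRAY = {
    "StartText": {"EOT": 1, "OOB": 1, "CBelow": 2, "CBelow2": 2},
    "SawCBelow": {"EOT": 1, "OOB": 1, "CBelow": 2, "CBelow2": 3},
}
# action: (next state, substitute current glyph?); every action advances.
ACTIONS = {1: ("StartText", False), 2: ("SawCBelow", False), 3: ("StartText", True)}

def classify(glyph):
    if glyph in CBELOW2:
        return "CBelow2"
    if glyph in CBELOW:
        return "CBelow"
    return "OOB"

def swap_second_layer(glyphs):
    out = []
    state = "StartText"
    for glyph in glyphs:
        next_state, substitute = ACTIONS[STATE_ARRAY[state][classify(glyph)]]
        out.append(glyph + ".L2" if substitute else glyph)
        state = next_state
    return out

# brya: ba, gantungan ra, gantungan ya -> the ya takes its second-layer form.
print(swap_second_layer(["uni1B29", "uni1B44_uni1B2D", "uni1B44_uni1B2C"]))
```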

Positioning glyphs

So far we’ve only dealt with glyph substitutions. It’s in this area that the main differences between OpenType and AAT play out: OpenType providing more pre-made functionality, AAT a more flexible set of basic operations.

In positioning glyphs, now that AAT supports anchor-based positioning, the two technologies are roughly comparable – for some cases AAT provides more flexibility with its state machines, but for Balinese OpenType seems quite adequate. The big drawback in AAT positioning is the lack of tool support for ankr and kerx tables. Inspired by Grzegorz Rolek’s Kerning Input File compiler, I took advantage of OpenType being adequate for Balinese and created a little tool that converts the GPOS table in Ubud into ankr and kerx tables. One issue that showed up is that in OpenType positioning rules are applied cluster by cluster, and if the GPOS table doesn’t specify a positioning rule for a base and a mark in a cluster, the renderer applies a default positioning rule that aligns the left edge of the mark’s bounding box with the right edge of the base’s bounding box, which works as long as the mark’s glyph is drawn to the left of its bounding box. AAT, on the other hand, deals with glyph runs that may span multiple clusters, and requires base glyphs to be identified in the kerx table – if one isn’t, then a mark intended for it may show up on an earlier base glyph. The solution was to fill in a number of missing base glyphs in the GPOS table.

What’s the result?

This is ᬒᬁᬲ᭄ᬯᬲ᭄ᬢ᭄ᬬᬲ᭄ᬢᬸ om swastyastu rendered with the Ubud font:

Where does it work?

Once the AAT font is complete, it can be installed on iOS. It works well in most apps, whether they use native text views or web views. Before iOS 8.3, web views would find the font only if style sheets requested it by name, but that’s fixed, so any Balinese text is now rendered with it. However, apps that use their own font rendering engines, such as Microsoft Word, still fail to render it correctly – the Pages app is a good alternative.

A Balinese keyboard for iOS

There’s no standard keyboard layout for Balinese, and few Balinese have much experience typing their own script on a keyboard. In addition, an iOS keyboard can show all its keys, so its layout is easy to figure out, unlike, say, that of a remapped hardware keyboard. This means there’s freedom to experiment with a layout that has no ties to QWERTY and its relatives.

The Balinese script has been encoded in 121 Unicode characters, but some of these characters are redundant, some are specific to Balinese music or to the Sasak language, and many are rarely used. The core of the script consists of 18 consonants, 8–10 dependent vowels, 2 vocalic consonants, 3 syllable-ending consonants, and adeg adeg. An iPhone keyboard can reasonably be a 4×10 matrix and needs a few function keys, so the core script just about fills a basic keyboard layout. I arranged the consonants in traditional hanacaraka order on the left and vowels on the right. Less commonly used characters can be found in two additional layers, one for additional consonants and vowels that are primarily used for words inherited from Kawi and Sanskrit, and one for digits and punctuation. Characters specific to music and to the Sasak language are omitted for now. For the iPad, which can comfortably accommodate a 4×12 matrix on a keyboard, two layers suffice.

Main layer for Balinese keyboard for iPhone

As with fonts, here too pre-base vowels pose a problem: Most Balinese are used to writing Balinese only on paper or on lontar (dried palm leaves), and so are used to writing these vowels before the consonants they belong to. However, without special precautions a vowel typed before its consonant would be interpreted as belonging to the previous consonant. The keyboard therefore inserts a temporary base character before pre-base vowels when they’re first typed, and replaces that temporary base character with the actual consonant once the user types it. A variety of bugs in iOS made it surprisingly difficult to find a suitable base character; I settled on U+200A hair space, which the font doesn’t consider a valid base character and which therefore leads to the vowel being rendered with a dotted circle.
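
The temporary-base technique can be sketched in a few lines of Python. This is a simplified model on plain strings, not the app's code, which has to work through the iOS text input APIs. The function name `type_character` is hypothetical, and the sketch assumes that taling (U+1B3E) and taling repa (U+1B3F) are the pre-base vowels and that U+1B13 through U+1B33 are the consonants.

```python
HAIR_SPACE = "\u200A"                    # temporary base character
PRE_BASE_VOWELS = {"\u1B3E", "\u1B3F"}   # taling, taling repa (assumed set)


def is_consonant(ch):
    """Balinese consonants ka..ha occupy U+1B13..U+1B33."""
    return "\u1B13" <= ch <= "\u1B33"


def type_character(text, ch):
    """Return the text after the user types ch at the end of text."""
    if ch in PRE_BASE_VOWELS:
        # Start a new cluster on a temporary base so the vowel
        # does not attach to the previous consonant.
        return text + HAIR_SPACE + ch
    if (is_consonant(ch) and len(text) >= 2
            and text[-2] == HAIR_SPACE and text[-1] in PRE_BASE_VOWELS):
        # Swap the temporary base for the real consonant, which now
        # precedes the vowel as Unicode requires.
        return text[:-2] + ch + text[-1]
    return text + ch


text = ""
for key in ["\u1B3E", "\u1B13"]:         # user types taling, then ka
    text = type_character(text, key)
print([f"U+{ord(c):04X}" for c in text])
```

After the two key presses the text holds ka followed by taling, the logical order, even though the user typed them in visual order.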

Unicode and the Universal Shaping Engine require the characters within a Balinese cluster to be in an order that may not be entirely obvious to users – for example, some above-base marks need to be entered before post-base vowels, but others after them. The keyboard therefore rearranges characters within each cluster into the order expected by Unicode and the USE. It also replaces decomposed sequences that contain the character tedung with their precomposed equivalents.
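
As a rough illustration of both steps, the Python sketch below sorts the marks of a single cluster by rank and then composes tedung sequences. The rank table is an assumption covering only a handful of marks, not the order the app or the USE specification actually uses, and Unicode normalization form NFC stands in for whatever composition logic the keyboard implements.

```python
import unicodedata

# Assumed ordering of a few marks within a cluster; illustrative only.
MARK_RANK = {
    "\u1B3E": 1,  # taling, pre-base vowel
    "\u1B36": 2,  # ulu, above-base vowel
    "\u1B38": 3,  # suku, below-base vowel
    "\u1B35": 4,  # tedung, post-base vowel
    "\u1B02": 5,  # cecek, above-base mark placed after the vowels
}


def normalize_cluster(cluster):
    """Reorder the marks after the base, then compose tedung sequences."""
    base, marks = cluster[0], cluster[1:]
    # sorted() is stable; marks without a rank sort first in typed order.
    ordered = sorted(marks, key=lambda ch: MARK_RANK.get(ch, 0))
    # NFC turns e.g. taling + tedung (U+1B3E U+1B35) into U+1B40.
    return unicodedata.normalize("NFC", base + "".join(ordered))


typed = "\u1B13\u1B02\u1B35"             # ka, cecek, tedung as typed
print([f"U+{ord(c):04X}" for c in normalize_cluster(typed)])
```

Here cecek, an above-base mark, ends up after the post-base vowel tedung, while ulu would sort before it.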

The details of creating a keyboard for iOS in general are covered in a separate article.

What about Android and Windows?

The operating system that Indonesians use most commonly isn’t iOS – many would love to use it, but not many can afford to buy iOS devices. In the meantime, they make do with Android on phones or Windows on desktop and laptop computers.

In Android 5.1, released in February 2015, Google added the Noto Sans Balinese font, and the font rendering system, which includes HarfBuzz, is capable of rendering it. As mentioned above though, Noto Sans Balinese is not a great font – it provides glyphs for all Balinese characters in Unicode, but doesn’t support many of the contextual forms required in Balinese. It’s not clear when it will be improved. It’s also hampered by Android’s poor software upgrade story – it may well take two years before the majority of Android phones in Indonesia will use 5.1 or higher. Google has launched the Android One program in Indonesia, which aims to make their own, upgradable, version of Android more widely available; it’s not clear whether they will succeed against Samsung and other vendors that prefer selling modified versions of Android. Users who’d like to use better third-party fonts have no way to install them for system-wide use on Android without jail-breaking it. On the positive side, Android has long allowed third-party keyboards.

Windows 10 is the first version that includes the Universal Shaping Engine and so enables (mostly) correct rendering of Balinese OpenType fonts, but it doesn’t provide a Balinese font. Installation of third-party fonts and keyboards has long been possible on Windows. The challenge will be to migrate from the various custom solutions using hacked Balinese fonts that have been used on older Windows versions to the Unicode-based solution offered by Windows 10.

Conclusion

This article described what’s involved in creating Unicode-based fonts and keyboards for Balinese and similar complex writing systems on iOS. The work involved isn’t always easy for developers, the user experience of installing fonts and keyboards is far too complicated, and there are numerous bugs to be fixed and enhancements to be made (I’ve sent Apple over 50 bug reports as part of this project). On the other hand, I was pleasantly surprised that I could get a complex writing system like Balinese implemented at all – when I first looked at this 18 months ago, nothing seemed possible. In their own ways, Android and Windows have made progress as well, although Android still doesn’t allow installation of fonts for system-wide use. Overall, the pieces are finally coming together to support a far larger range of writing systems than before on the most popular computing platforms.

Acknowledgments

Many thanks to Muthu Nedumaran for generous advice on font and keyboard development, to Donny Harimurti, Dendy Narendra, and Bemby Bantara Narendra for information on the Balinese script as well as feedback on aspects of the Balinese Font and Keyboard app, to Jason Glavy and Aditya Bayu Perdana and his team for making their fonts available, to numerous Apple engineers for providing the underlying technology and fixing critical bugs, and to Menasse Zaudou for reviewing a draft of this article.

References

Balinese script requirements:

United States Library of Congress: ALA-LC Romanization Table Balinese. 2012. This defines the romanization used in this article, except for character names, where I followed the Unicode Standard.
Fred B. Eiseman: Tulisan Bali. ᬢᬸᬮᬶᬲᬦ᭄ ᬩᬮᬶ. A Layman’s Guide to Balinese Script. Second Edition, 1999.
Richard Ishida: Balinese script notes, Balinese character notes. October 2014.
Ida Bagus Adi Sudewa: The Balinese Alphabet. 2003. This provided the starting point for standardization of Balinese in Unicode.
Balinese alphabet. Wikipedia.

Balinese in Unicode:

The Script Encoding Initiative supported encoding of Balinese and keeps working on the scripts still awaiting encoding.
Michael Everson and I Made Suatjana: Proposal for encoding the Balinese script in the UCS. 2005.
The Unicode Consortium: Balinese (and related scripts) in Unicode 8.0. 2015.
The Unicode Consortium: Code chart for Balinese in latest Unicode.

Font technologies:

Apple Inc.: TrueType™ Reference Manual. 2014.
Microsoft Typography: OpenType specification. Version 1.7, 2015.
Microsoft Typography: Creating and supporting OpenType fonts for the Universal Shaping Engine. February 2015.
freedesktop.org: HarfBuzz. August 2015.
SIL: Graphite.
Yannis Haralambous and P. Scott Horne: Fonts & Encodings. O’Reilly, 2007. This book was quite helpful in making sense of the font technologies.

Existing Balinese fonts:

I Made Suatjana: Bali Simbar. 1999.
Jason Glavy: JG Aksara Bali. 2003.
Khoi Nguyen Viet: Aksara Bali. 2011.
Aditya Bayu Perdana, Ida Bagus Komang Sudarma, and Arif Budiarto: Tantular Bali. 2015.
Aditya Bayu Perdana, Ida Bagus Komang Sudarma, and Arif Budiarto: Lilitan. 2015.
Ida Bagus Komang Sudarma and Arif Budiarto: Geguratan. 2015.
Monotype Design Team: Noto Sans Balinese. Google, 2014.
Noto Sans Balinese bug list.

Adapting a Balinese font for iOS:

Apple Inc.: OS X Font Tools. Version 4 beta 1, 2011.
Grzegorz Rolek: Kerning Input File compiler.
Adobe Systems Incorporated: Adobe Glyph List Specification. 2015.
Adding Graphite and AAT to a Font. This page helped clarify some AAT issues, although some other of the issues described appear to have been fixed in the meantime.
Muthu Nedumaran: Building Tamil Unicode Fonts for Mac OS X. Tamil Internet Conference 2009. Extensive description of how to create AAT tables for a font for a simpler Brahmic script.
Norbert Lindenberg: Installing Fonts on iOS. 2015.

A Balinese keyboard for iOS:

Norbert Lindenberg: Developing Keyboards for iOS. 2014.

Conclusion:

Lindenberg Software LLC: Balinese Font and Keyboard. 2015.

Updates

2019-01-24: Replaced PNG images of font renderings with SVG images. Added the description of a third bug in Noto Sans Balinese, which was not visible with the earlier OpenType implementation used to generate the PNG images. Added sample rendering with the Ubud font. Made the formation of the ᬳᬵ ligature visible again for users of newer versions of the Ubud font, which form this ligature by default.