
Why Unicode is about more than making emojis work across all devices

Emojis make up a small part of the work of the Unicode Consortium.

As Apple users updating to the company’s iOS 11.1 get their hands on a new set of emojis, here’s a look at the Unicode Consortium, the organisation that decides which emojis make the cut and does much more besides:

Even though you’re reading these letters in English, your PC, mobile or tablet is actually processing numbers to display the words.

A unique number identifies every character, be it a letter, a punctuation mark or a symbol from any of the world’s languages.

And Unicode is the name for those agreed numbers.
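To make the idea concrete, here is a minimal Python sketch of text being turned into numbers and back again. The sample string is chosen only for illustration; `ord` and `chr` are Python built-ins that convert between a character and its Unicode number.

```python
# Turn each character of a short string into its Unicode number, then back.
text = "Hi!"  # sample text, chosen only for illustration

code_points = [ord(ch) for ch in text]           # characters -> numbers
restored = "".join(chr(n) for n in code_points)  # numbers -> characters

print(code_points)  # [72, 105, 33]
print(restored)     # Hi!
```

Any device that agrees on the same numbers can rebuild the same text from them.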

What happened before Unicode was introduced?

Blocks spell out the word Unicode (domoskanonos/Getty Images/iStockphoto)

Before Unicode, each company might have its own character encoding, and so its own way of representing everything from a letter T to a smiley face. A lot of work had to go on behind the scenes before things would display correctly.

It resulted in a lot of data corruption, akin to when a text document is written in an “unsupported font” and all that is shown to the reader is gibberish.
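The kind of corruption described here can be reproduced in a few lines of Python. This is only a sketch: the word "café" and the choice of Latin-1 as the mismatched encoding are assumptions made for the example, not a record of any particular company's system.

```python
# Write text with one encoding, then read the same bytes with another.
original = "café"  # sample word, chosen only for illustration

raw_bytes = original.encode("utf-8")    # the writer stores the text as UTF-8 bytes
misread = raw_bytes.decode("latin-1")   # the reader wrongly assumes Latin-1
correct = raw_bytes.decode("utf-8")     # the reader uses the matching encoding

print(misread)  # cafÃ©
print(correct)  # café
```

The bytes are identical in both cases. Only the reader's assumption about what the numbers mean has changed, and that is enough to produce gibberish.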

What is the Unicode Consortium?

The Unicode Consortium is a not-for-profit organisation, supported by membership donations, that governs the Unicode Standard.

It is based in Mountain View, California, on Microsoft’s Silicon Valley Campus.

What is the Unicode Standard?

A woman is using a laptop keyboard
(Dominic Lipinski/PA)

The Unicode Standard is the dictionary for the numbers used to represent letters, punctuation and symbols.

“The Unicode Standard provides the basis for processing, storage and seamless data interchange of text data in any language in all modern software and information technology protocols,” says the Unicode Consortium.

“It provides a uniform, universal architecture and encoding for all languages of the world, with over 100,000 characters currently encoded.”

And that includes standards for emojis.

In Unicode 8.0, 1,281 of the roughly 120,000 character codes were emojis. While it’s important a facepalm or a poo emoji doesn’t get confused for anything else, the consortium’s mainstay is making sure written languages are clearly displayed.

In Unicode 10, released in June 2017, there were 56 new emojis and 8,518 new characters overall.

How is the standard organised?

The consortium endeavours to keep obvious sets, such as alphabets, together. Each entry is listed with a code point and a description.

U+0041 is the character A. The official description is “Latin Capital Letter A”. U+0042 is B.
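These entries can be looked up with Python's standard `unicodedata` module, which ships with a copy of the Unicode character database. The helper name `describe` below is hypothetical, introduced only for this sketch.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official name of a single character.

    Hypothetical helper for illustration; unicodedata.name raises
    ValueError for characters that have no name in the database.
    """
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in "AB":
    print(describe(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+0042 LATIN CAPITAL LETTER B
```

The digits after "U+" are the character's number written in hexadecimal, so U+0041 is 65 in decimal.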

Who are the members of the consortium?

The Unicode Consortium website lists 12 full members, all of which get a vote, including big players in tech: Apple, Google, Facebook, Microsoft, Oracle, IBM and Adobe.

Other full members are Huawei, Netflix, SAP, Symantec and Oman’s Ministry of Awqaf and Religious Affairs.

Oman?

Yes. The Gulf sultanate wanted to have a say on how the Arabic language is represented online.

Underpinning its interest is a desire to preserve Arabic online and create a fully searchable Koran. Difficulties in putting the text online mean that a lot of the pages are either scanned in or mashed translations.

The Government of Bangladesh and the Indian state of Tamil Nadu are institutional members, along with the University of California, Berkeley. They all have voting rights too.

When did Unicode come in?

The Consortium started work in 1991 when the exchange of data was littered with mistakes because of the ambiguity around how a certain letter should be displayed.

Some users would see squares as placeholders because the actual symbol was not available or could not be resolved.

In addition, a lot of human languages could not be displayed on screen at all.