What is Unicode?

Unicode is the universal character encoding, maintained by the Unicode Consortium. This encoding standard provides the basis for processing, storing, and interchanging text data in any language, in all modern software and information technology protocols. See "What is Unicode?" for a short explanation of what Unicode is all about. That page is translated into more than 50 languages to illustrate the use of the standard. See for yourself!

What is the scope of Unicode?

Unicode covers all the characters for all the writing systems of the world, modern and ancient. It also includes technical symbols, punctuation marks, and many other characters used in writing text. The Unicode Standard is intended to support the needs of all types of users, whether in business or academia, using mainstream or minority scripts.
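
As a rough illustration of that breadth, the sketch below looks up a handful of characters with Python's standard unicodedata module. The sample characters and the describe helper are choices made for this example, not part of the Unicode Standard, and the names available depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# Illustrative sample (an assumption of this sketch): a Latin letter, a Hebrew
# letter, a math symbol, a currency symbol, a punctuation mark, and a sign
# from an ancient script.
samples = ["A", "\u05D0", "\u2211", "\u20AC", "\u00BF", "\U00013000"]

def describe(ch):
    """Return (code point, general category, character name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

for ch in samples:
    # Each line shows the code point, the general category (letter, symbol,
    # punctuation, ...), and the formal character name.
    print(*describe(ch))
```

The general category in the second column distinguishes letters (Lu, Lo) from symbols (Sm, Sc) and punctuation (Po), which is one way to see that the repertoire is not limited to alphabetic text.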

How many languages are covered by Unicode?

It's hard to say, because Unicode encodes scripts for languages, rather than languages per se. Many scripts (especially the Latin script) are used to write a large number of languages. The easiest answer is that Unicode covers all of the languages that can be written in the following scripts: Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, Thaana, Devanagari, Bengali, Gurmukhi, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Georgian, Hangul, Ethiopic, Cherokee, Canadian Aboriginal Syllabics, Khmer, Mongolian, Han (Japanese, Chinese, Korean ideographs), Hiragana, Katakana, and Yi. Unicode also includes many historic scripts used to write long-dead languages, as well as lesser-used regional scripts that may be used as a second (or even third) way to write a particular language. See Supported Scripts for the full list. See also the list of Languages and Scripts.
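
The point that Unicode encodes scripts rather than languages can be seen in a small Python sketch. It uses the first word of each character's formal name as a rough stand-in for its script; this is a simplification assumed for the example, since Python's standard unicodedata module does not expose the actual Script property. The sample words and the script_prefix and scripts_in helpers are hypothetical names introduced here.

```python
import unicodedata

def script_prefix(ch):
    """First word of the character name, used here as a rough script label."""
    return unicodedata.name(ch).split()[0]

def scripts_in(word):
    """Return the set of rough script labels for the characters in a word."""
    # Normalize to NFC so accented letters are single precomposed characters.
    composed = unicodedata.normalize("NFC", word)
    return {script_prefix(ch) for ch in composed}

# Sample words chosen for this sketch; each means "water" or "to be".
samples = {
    "English": "water",
    "French": "\u00EAtre",                 # être
    "Vietnamese": "n\u01B0\u1EDBc",        # nước
    "Russian": "\u0432\u043E\u0434\u0430", # вода
    "Greek": "\u03BD\u03B5\u03C1\u03CC",   # νερό
}

for language, word in samples.items():
    print(language, word, sorted(scripts_in(word)))
```

English, French, and Vietnamese all draw on the same Latin characters, while Russian and Greek use characters from their own scripts. Nothing in the encoding records which language a given piece of text is written in.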