ECMAScript Internationalization API

2013-08-31T13:58:58Z

If you have ever tried to write an application that supports multiple languages, I’m sure you will agree that it can be a complicated process. While there are obvious issues, such as translating the text, there are also differing conventions between countries on how things like dates and numbers are formatted, or how letters are sorted. Even between the different versions of English, issues crop up: being British, I read the date 10/07/2013 as 10 July, while my American colleagues would read it as 7 October.

It is these latter cases that the ECMAScript Internationalization API aims to address. It is a companion spec to ES5.1 and above that adds language-sensitive functionality to JavaScript, and it is influenced by established internationalisation APIs in Java and .NET.

Version 1 of the spec handles three areas: strings, numbers, and dates. It does this by making a number of existing methods of the String, Number, and Date objects locale aware (that is, you can pass in a locale as an argument), and by providing an Intl object with further localisation functionality.
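As a quick illustration of those locale-aware methods, here is the ambiguous date from the introduction formatted for both British and American English, alongside the number and string equivalents. This is a sketch assuming an engine with full locale data (a current browser, or a Node.js build with full ICU); the timeZone option pins the output to UTC so the result does not depend on the machine running it.

```javascript
// 10 July 2013, midnight UTC
const date = new Date(Date.UTC(2013, 6, 10));

// Date.prototype.toLocaleDateString accepts a locale and an options object
const british = date.toLocaleDateString("en-GB", { timeZone: "UTC" });
const american = date.toLocaleDateString("en-US", { timeZone: "UTC" });

// Number.prototype.toLocaleString and String.prototype.localeCompare
// take the same locale argument
const germanNumber = (1234.5).toLocaleString("de-DE");
const order = "a".localeCompare("b", "en");

console.log(british);      // 10/07/2013
console.log(american);     // 7/10/2013
console.log(germanNumber); // 1.234,5
console.log(order < 0);    // true: "a" sorts before "b"
```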

Specifying a locale

First things first, when dealing with localisation, you need to specify the language and other conventions that will be used. This is called the locale. It is slightly more complicated than just specifying the language, as you can optionally specify the country or region (this is commonly used for English to indicate the difference between American and British English, or any of the various other versions used throughout the world), the writing system, calendar, and so on.

The string used to specify the locale is called the language tag, and is defined in IETF BCP 47. Anyone who has written HTML has almost certainly encountered it: it is the value you give to the lang attribute (or xml:lang in XML/XHTML).

<!DOCTYPE html>
<html lang="en-GB">
    …
</html>

As many of you may not have used the language tag beyond the basics, I’ll cover some of the details you may need to know.

The components of the language tag are called sub tags, and are separated by hyphens.
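Newer engines (this postdates the first edition of the spec, and is not available in IE11) expose an Intl.Locale object that splits a tag into these sub tags. A minimal sketch, assuming such an engine:

```javascript
// Intl.Locale parses a BCP 47 language tag into its sub tags
const tag = new Intl.Locale("sr-Cyrl-RS");

console.log(tag.language); // "sr"   (language sub tag)
console.log(tag.script);   // "Cyrl" (script sub tag)
console.log(tag.region);   // "RS"   (region sub tag)

// The letter case of each sub tag is normalised to the usual conventions
const normalised = new Intl.Locale("EN-gb").toString();
console.log(normalised); // "en-GB"
```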

Language sub tag

The first sub tag is the language sub tag. This can be either a two letter language code from ISO 639-1, or a three letter language code from ISO 639-2, ISO 639-3, or ISO 639-5. By convention, the language sub tag is written in lowercase.

It is recommended to use the ISO 639-1 two letter (often called Alpha-2) code if it exists for the language in question. Most major languages still in use today have an assigned Alpha-2 code. Examples include en for English, es for Spanish, de for German, ar for Arabic, and so on.

If the language isn’t included in ISO 639-1, you can use the Alpha-3 code from ISO 639-2 through 5. Examples of languages represented by an Alpha-3 code that have localisations provided in Windows (and thus supported by IE11) include Hawaiian (haw), Mohawk (moh), and Cherokee (chr).

While languages that are represented by an Alpha-2 code also have an assigned Alpha-3 code (such as eng for English, or spa for Spanish), IE11 treats these Alpha-3 codes as invalid. Blink (the rendering engine for Chrome and modern versions of Opera), on the other hand, converts them to the Alpha-2 code when translating the specified locale to its canonical form.

Script sub tag

The optional script sub tag specifies the writing system used by the language in question. This uses the ISO 15924 four letter code. It is convention to write the script sub tag in title case.

For most languages, the script sub tag is not needed, as they are only written using one writing system. For example, English is written exclusively using the Latin alphabet, so specifying en-Latn (English using the Latin alphabet) would be redundant. Internally, the browser will just strip off the Latn sub tag when producing the canonical form of the language tag.

Some languages, however, use two different writing systems. An example of this is Serbian, which can be written using Latin (sr-Latn) or Cyrillic (sr-Cyrl).

While IE11 supports both, Blink only supports Serbian written in Cyrillic, and thus converts both sr-Cyrl and sr-Latn to sr. While I suspect that Blink does support the script sub tag, I couldn’t find any locale it provides that uses it. IE11 supports multiple scripts for Serbian, Bosnian, Uzbek, and Azeri (all Latin and Cyrillic), as well as Tamazight (Latin and Tifinagh) and Inuktitut (Latin and Syllabics).
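In engines that support Intl.Locale (again, newer than IE11), the redundancy of a script sub tag can be seen with its maximize and minimize methods, which add or remove the "likely" sub tags from Unicode CLDR data. This is a sketch of the idea only; it is not necessarily the same process IE11 or Blink use internally when canonicalising a tag.

```javascript
// maximize() fills in the most likely script and region from CLDR data
const english = new Intl.Locale("en").maximize().toString();
const serbian = new Intl.Locale("sr").maximize().toString();

console.log(english); // "en-Latn-US"
console.log(serbian); // "sr-Cyrl-RS": Cyrillic is the default script for Serbian

// minimize() drops sub tags that the likely-subtags data would add back,
// which is why Latn is redundant for English but not for Serbian
const englishMin = new Intl.Locale("en-Latn-US").minimize().toString();
const serbianLatinMin = new Intl.Locale("sr-Latn").minimize().toString();

console.log(englishMin);      // "en"
console.log(serbianLatinMin); // "sr-Latn"
```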

Region sub tag

The region sub tag is optional, and specifies the country or regional variation of the language specified in the language sub tag. You can specify the region using either the ISO 3166-1 alpha-2 code or the UN M.49 numeric code. The ISO 3166-1 alpha-3 code is unsupported. By convention, the region sub tag is written in uppercase.

The Alpha-2 code should always be used over the numerical UN code, except in one situation: when you want to specify a geographical region. ISO 3166-1 only specifies individual countries, while UN M.49 has codes for continents and regions.

Most regions do not have a locale assigned to them; after all, what data would be used for en-021 (English as used in Northern America)? Conventions differ between the US and Canada, never mind the other countries. There is one key exception: es-419 (Spanish as spoken in Latin America). Many apps do not want to provide a different locale file for every country in Latin America when conventions do not differ between the individual countries, so the Latin American region is a useful catch-all. It is supported by both IE11 and Blink. The only other language/region combination that I could find that is supported (only in IE) is en-029 (English as spoken in the Caribbean).
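A numeric UN M.49 region is just another region sub tag as far as the tag syntax is concerned. This sketch assumes an engine with Intl.Locale; whether locale data actually exists for a given combination still depends on the engine, so the supportedLocalesOf result is printed rather than predicted.

```javascript
// UN M.49 numeric region codes are valid region sub tags
const latinAmericanSpanish = new Intl.Locale("es-419");

console.log(latinAmericanSpanish.language); // "es"
console.log(latinAmericanSpanish.region);   // "419"

// Whether the engine has data for these combinations varies by engine
console.log(Intl.NumberFormat.supportedLocalesOf(["es-419", "en-029"]));
```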

If you use a UN region code and it isn’t supported, it is stripped out by the browser and it will likely fall back to the base language. Blink also does this if you specify a UN country code (for en-021 it uses en rather than converting it to en-US), while IE11 rejects it as an invalid language tag, and falls back to the next locale in the list, or the default locale.

Be aware that you may get unexpected results if there is no locale provided for the combination of language and region. While IE looks to support an impressive array of locales, Blink is fairly lacking. For example, no English locales except US and British English are supported in Chrome, so if you specify en-AU for Australian English, it will fall back to en, which uses US English conventions. For numbers this is probably fine, but for dates there will be a lot of confused users.

As far as I could tell, Blink only supports country variations of Spanish and Portuguese, plus US and British English. No country-specific versions of French, German, or Arabic are provided.

Extension sub tags

The language tag allows for extensions to add further locale-based behaviour. The extension that is relevant to the ECMAScript Internationalization API is “BCP 47 Extension U”, defined in RFC 6067.

The U extension is a single u token, followed by one or more keywords. The keywords specify things such as the number system, currency, calendar, and so on. A keyword consists of a two character “key” sub tag, which defines what property of the locale is being set, and zero or more three to eight character “type” sub tags, which set the value. The structure is defined by the Unicode BCP 47 Extension Data section of Unicode TR35.

The actual keys and types are defined in XML files linked from the Unicode CLDR Project. These are being continuously updated, but the intention is that existing keywords will never be removed. The latest trunk can be found at http://unicode.org/repos/cldr/trunk/common/bcp47.

By convention, extension sub tags are written in lowercase.

The relevant keywords are as follows:

Calendar

Setting the calendar can be achieved by using the ca key. An example of using the Buddhist calendar with the English locale would be en-u-ca-buddhist, where en is the language, u is the Unicode extension, and ca-buddhist is the keyword, where ca is the calendar key and buddhist is the calendar type.
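In a current engine you can check how such a tag is read. This sketch assumes support for Intl.Locale (a later addition than this article) and locale data for the Buddhist calendar, which current browsers and full-ICU Node.js builds include.

```javascript
// The keyword ca-buddhist selects the Buddhist calendar
const locale = new Intl.Locale("en-u-ca-buddhist");

console.log(locale.baseName); // "en": the tag without its extensions
console.log(locale.calendar); // "buddhist"

// A formatter created from the same tag reports the calendar it resolved to
const resolvedCalendar =
  new Intl.DateTimeFormat("en-u-ca-buddhist").resolvedOptions().calendar;
console.log(resolvedCalendar); // "buddhist"
```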

Collation

The type of ordering and sorting of strings is specified in the collation XML file. There are numerous keys, which I will cover later in this post.

Currency

The currency can be set using the cu key and a three letter currency code based on ISO 4217.

An example of setting Great British pounds when using the American English locale and the Gregorian calendar would be en-US-u-ca-gregory-cu-gbp. The Gregorian calendar is the default, so it would not be needed, but it is used here for illustrative purposes to show more than one keyword at once.
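Because the extension is just a hyphen-separated list of key/type pairs, a tag like this can be assembled with plain string handling. withUnicodeKeywords is a hypothetical helper written for illustration, not part of any API, and it does no validation of the keys or types.

```javascript
// Hypothetical helper: append u extension keywords to a language tag.
// keywords is an object such as { ca: "gregory", cu: "gbp" }.
function withUnicodeKeywords(tag, keywords) {
  const pairs = Object.entries(keywords).map(
    // extension sub tags are conventionally lowercase
    ([key, type]) => `${key}-${type}`.toLowerCase()
  );
  return pairs.length > 0 ? `${tag}-u-${pairs.join("-")}` : tag;
}

const tag = withUnicodeKeywords("en-US", { ca: "gregory", cu: "GBP" });
console.log(tag); // "en-US-u-ca-gregory-cu-gbp"
```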

Numeral system

The numeral system key is nu. All types in the CLDR numbers list are valid except native, traditio, and finance.

I’m not sure if the types are based on an existing standard. Some do seem a bit odd; for example, the numerals commonly called Arabic numerals (or Hindu-Arabic numerals), i.e. 1234567890, are called latn. This sort of makes sense, as we now use them with the Latin alphabet, but they’re also used with the Greek and Cyrillic alphabets. What are commonly called Eastern Arabic or Arabic–Indic numerals (i.e. those used in many Modern Standard Arabic locales) are called arab.
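The difference between the two types is easy to see by formatting the same number with each. This sketch assumes an engine with Arabic locale data; only the digits are checked, since the exact separators and direction marks can vary between engines.

```javascript
const n = 1234567890;

// latn: the digits 0-9
const latn = new Intl.NumberFormat("ar-u-nu-latn");
// arab: Eastern Arabic (Arabic-Indic) digits, U+0660 to U+0669
const arab = new Intl.NumberFormat("ar-u-nu-arab");

const western = latn.format(n);
const eastern = arab.format(n);

console.log(western);
console.log(eastern);
console.log(latn.resolvedOptions().numberingSystem); // "latn"
console.log(arab.resolvedOptions().numberingSystem); // "arab"
```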

Timezone

The timezone key is tz. The type is a shortened version of the time zone code from the IANA Time Zone Database. The types are based on UN/LOCODE, with the space between the country and city codes removed.

Currently, by spec, it is not possible to set the timezone using the U extension keyword, but it works in Blink anyway.

Localising content

There is no way to globally set a locale using the Internationalization API. Instead, there are three objects that handle localisation duties for strings (Collator), numbers (NumberFormat), and dates (DateTimeFormat). These share a common parent Intl object.

One way to localise a string, number, or date is to use the relevant object’s constructor, passing in the locale, and any options. This works in a similar way for each type:

var locale = "en-GB",
    localeDate = new Intl.DateTimeFormat(locale),
    localeNumber = new Intl.NumberFormat(locale),
    localeString = new Intl.Collator(locale);

If you do not pass in a locale as the first argument, it will default to the user’s default locale. In my case, this would be en-GB.

In the example above, I just supplied one locale. Instead, it is possible to pass an array of language tags. If the first is not supported (and stripping off sub tags doesn’t lead to one that is), it will move to the next in the list until it does match a supported locale, or gets to the end and uses the default locale. In the following example, the language codes for Norwegian and Danish are specified. IE11 supports Norwegian, so it is chosen, but in Blink, it is not supported, so Danish is used. The latter uses the same collation and formatting rules as Norwegian, so it is a good plan B.

localeString = new Intl.Collator(["no", "da"]);
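To see which entry in the list was chosen, you can ask the object for its resolved locale (resolvedOptions is covered in more detail below). In this sketch, "qaa" (a code reserved for local use, so no engine ships data for it) stands in for an unsupported locale, and "de" is assumed to be supported, as it is in current browsers.

```javascript
// "qaa" has no locale data, so the engine moves on to "de"
const collator = new Intl.Collator(["qaa", "de"]);
const chosen = collator.resolvedOptions().locale;

console.log(chosen); // "de"
```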

You can pass in an options object as the second argument to set specific formatting options. Each type has its own options, so I will cover them in their respective sections.

Working with localisation objects

Many of the methods that you will commonly use are shared between the three types of objects. There is also a common property for specifying how locales are matched.

Testing if a locale is supported

It can be important to test if a locale is supported, as otherwise unexpected results might ensue. Unfortunately, there is no list of supported locales that you can fetch. Instead, you have to pass an array of locales to query if they are supported. You do this with the supportedLocalesOf method. It will return an array with unsupported locales removed.

As there is no global locale, you will have to test each object individually to see if a locale is supported. In the following example, I am testing if the locales are supported for DateTimeFormat. You can call the same method on Collator and NumberFormat.

var supportedDateLocales = Intl.DateTimeFormat.supportedLocalesOf(["en-GB", "en-US", "ar", "de", "bs-Cyrl", "haw", "sma-NO"]);

There is an important note here on how locales work. If the full locale is not supported, the user agent will strip off sub tags until it finds a locale it does support. This is called the “lookup” algorithm. At the time of writing, IE11 supports the en-AU (Australian English) locale, but Chrome/Blink does not. Thus, IE11 will use en-AU while Chrome will use en. However, if you test if the locale is supported, both will return en-AU:

console.log(Intl.DateTimeFormat.supportedLocalesOf(["en-AU"]));
// ["en-AU"], even though Chrome uses en.

A locale is only removed from the array if there is no supported fallback. In the first example, neither sma-NO nor sma is supported by Chrome, so sma-NO is not included in the returned array.
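The "lookup" algorithm (the RFC 4647 matching scheme) is simple enough to sketch. lookupLocale and the available list are hypothetical, written for illustration: real engines consult their own locale data, and the spec's version also strips Unicode extension sequences before matching. This sketch assumes tags are already in canonical case.

```javascript
// Hypothetical sketch of the RFC 4647 "lookup" matching scheme.
// requested: language tags in order of preference
// available: tags the (imaginary) engine has data for
function lookupLocale(requested, available, defaultLocale) {
  for (const tag of requested) {
    let candidate = tag;
    while (candidate) {
      if (available.includes(candidate)) {
        return candidate;
      }
      let cut = candidate.lastIndexOf("-");
      if (cut === -1) {
        break; // nothing left to strip; try the next requested tag
      }
      // Never leave a single-character sub tag (such as "u") dangling at the end
      if (cut >= 2 && candidate[cut - 2] === "-") {
        cut -= 2;
      }
      candidate = candidate.slice(0, cut);
    }
  }
  return defaultLocale;
}

const available = ["en", "en-GB", "da", "de"];
console.log(lookupLocale(["en-AU"], available, "en-US"));        // "en"
console.log(lookupLocale(["sma-NO", "da"], available, "en-US")); // "da"
console.log(lookupLocale(["sma-NO"], available, "en-US"));       // "en-US"
```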

While Chrome does not normalise the locale to the actual value it is using, IE11 does. Thus, if we test for en-DE (German English doesn’t really exist outside of British comedies like ’Allo ’Allo), IE will return en.

This situation is a bit unfortunate: IE behaves in the way I’d expect and supports a lot of locales, while Chrome doesn’t support many country-specific locales, and behaves in a way that makes it difficult to tell if it really used the correct locale. This is especially an issue for dates and locales such as Australian English, where Chrome will fall back to en and use the US date format. Without handling this case, there will be a lot of confused users for the first 12 days of each month.
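One way to handle this is to compare the requested tag with the locale the object resolved to, and treat a mismatch as a fallback. usesRequestedLocale is a hypothetical helper; as noted above, engines at the time of writing did not all report the locale they actually used, so the check is only as reliable as the engine's resolvedOptions. "qaa" (reserved for local use) stands in for an unsupported tag.

```javascript
// Hypothetical helper: did the engine resolve to the exact tag we asked for?
function usesRequestedLocale(tag) {
  const resolved = new Intl.DateTimeFormat(tag).resolvedOptions().locale;
  return resolved.toLowerCase() === tag.toLowerCase();
}

// If not, spell the month out rather than risk an ambiguous numeric date
const tag = "en-AU";
const options = usesRequestedLocale(tag)
  ? {}
  : { day: "numeric", month: "long", year: "numeric" };
console.log(new Intl.DateTimeFormat(tag, options).format(new Date()));
```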

Accessing the locale, and the formatting and collation options

Once you have created an object for formatting or collation, it can be useful to query which locale is being used (especially as it may not be the one specified) and which options are set. This can be done with the resolvedOptions method, which returns an object containing the resolved locale and options. These can then be accessed as you would any other properties of an object.

var locale = "en-GB",
    numFormat = new Intl.NumberFormat(locale),
    dateFormat = new Intl.DateTimeFormat(locale),
    collator = new Intl.Collator(locale);

// print the options object to the console; unset options are not included
var options = numFormat.resolvedOptions();
console.dir(options);

// access individual options
console.log("Style: " + options.style);

// the same method exists for DateTimeFormat and Collator, but returns different options
console.dir(dateFormat.resolvedOptions());
console.dir(collator.resolvedOptions());

There is one important thing to note: some options depend on the value of other options. If an option is not relevant to the formatting, it will not be included in the returned object. An example of this is the currency property of the NumberFormat object, which is only used if the style property has a value of currency.
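This dependency shows up directly in the resolved options. A minimal sketch:

```javascript
const decimal = new Intl.NumberFormat("en-GB").resolvedOptions();
const money = new Intl.NumberFormat("en-GB", {
  style: "currency",
  currency: "EUR"
}).resolvedOptions();

// currency is only reported when style is "currency"
console.log("currency" in decimal);       // false
console.log(money.style, money.currency); // currency EUR
```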

Changing the locale lookup method

Previously, I described how the browser (or user agent) processes the language tag to find a supported locale. This matching method can be changed by setting the localeMatcher property of the options object. The method I described is selected with a value of "lookup", but the default is "best fit". This leaves it up to the user agent to come up with a custom algorithm that can provide better matches. An example could be falling back to en-GB instead of en if en-AU is specified but not supported. As far as I can tell, both IE and Blink just use the regular lookup method when "best fit" is specified.
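Selecting the lookup algorithm explicitly looks like this. The sketch uses "qaa" (reserved for local use, so unsupported everywhere) as the first choice, and assumes en-GB is supported, as it is in current browsers.

```javascript
// "best fit" is the default; ask for the lookup algorithm instead
const options = { localeMatcher: "lookup" };

const format = new Intl.NumberFormat(["qaa", "en-GB"], options);
const resolvedLocale = format.resolvedOptions().locale;
console.log(resolvedLocale); // "en-GB"

// supportedLocalesOf accepts the same option
const supported = Intl.NumberFormat.supportedLocalesOf(["qaa", "en-GB"], options);
console.log(supported); // ["en-GB"]
```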

Formatting using the locale

To format numbers and dates based on the current locale and formatting options, you use the conveniently named format method:

localeNumber.format(1000.15); // 1,000.15 with default formatting
localeDate.format(new Date()); // 23/8/2013 with default formatting

This is where you can see the true power of the Internationalization API. Across the various locales that browsers support, there is a lot of data that you’d have to capture yourself if you wanted to format using the correct conventions.

We hit this exact problem when I was working on Opera Dragonfly. For the Resource and Network Inspectors, I naïvely wanted the data, such as the size of the resource, to be a more human-friendly value. A number such as 12,000,000 is much easier to understand at first glance than 12000000. However, it was pointed out to me that while English uses commas as a thousands separator, languages such as French use a space. Other languages use a point, and so on. As JavaScript didn’t (until now) provide a built-in way to do this, it would have been a lot of extra work (and research into the correct conventions) to support a relatively minor enhancement (in the grand scheme of things).
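With the API, the Dragonfly case reduces to one formatter per locale. A minimal sketch; in current engines French uses a no-break space (a narrow one in recent CLDR data) as the grouping separator, so only its shape is checked.

```javascript
const size = 12000000;
const formatted = {};

for (const locale of ["en", "de", "fr"]) {
  // Same number, locale-specific grouping separator
  formatted[locale] = new Intl.NumberFormat(locale).format(size);
  console.log(locale, formatted[locale]);
}
// en 12,000,000
// de 12.000.000
// fr 12 000 000 (with a no-break space)
```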

Collating using the locale

Searching and sorting strings is somewhat similar to formatting numbers and dates. Instead of using the format method, you use the compare method.

When sorting an array, you can pass compare as the argument to the array’s sort method:

var letters = ["b", "a", "c"],
    collator = new Intl.Collator("en");

letters.sort(collator.compare); // ["a", "b", "c"]
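Sorting plain ASCII letters hides what the collator is for. This sketch sorts a word list containing "ä" three ways; compare is a bound function, so it can be passed to sort directly without calling bind.

```javascript
const words = ["z", "ä", "a"];

// Default sort compares UTF-16 code units: "ä" (U+00E4) lands after "z"
const byCodeUnit = [...words].sort();

// German sorts "ä" with "a"; Swedish treats it as a separate letter after "z"
const german = [...words].sort(new Intl.Collator("de").compare);
const swedish = [...words].sort(new Intl.Collator("sv").compare);

console.log(byCodeUnit); // ["a", "z", "ä"]
console.log(german);     // ["a", "ä", "z"]
console.log(swedish);    // ["a", "z", "ä"]
```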

Formatting and collation options

So far we’ve seen how to create a localisation object by specifying the locale and using the default options. We’ve also seen that some options can be set using the Unicode u extension to the locale.

It is also possible to set a full range of options when constructing the localisation object. This can be achieved by passing in an options object as the second argument of the relevant constructor function, and also by specifying language tag extensions. In the following sections I will describe these options for the NumberFormat, DateTimeFormat, and Collator objects in more detail.

Number formatting options

Beyond the locale, the NumberFormat object has the following internal properties that represent formatting options:

`numberingSystem`

The numberingSystem property sets the type of numerals used when formatting the number. Unlike the other options, this can only be set via the Unicode u extension to the language tag, using the nu keyword. The valid types can be found in numbers.xml.

new Intl.NumberFormat("en-u-nu-beng").format(100000.10); // ১০০,০০০.১ using Bengali digits

`useGrouping`

When the useGrouping property is truthy, the locale-dependent grouping separator is used when formatting the number. This is often known as the thousands separator, although where it is placed depends on the locale. useGrouping is true by default.

// 100 000,1
new Intl.NumberFormat("ru", { useGrouping: true }).format(100000.10);

// 100000.1
new Intl.NumberFormat("en", { useGrouping: false }).format(100000.10);

`style`

The style property sets the style of number formatting. It can either be the string "decimal" (the default), "currency" or "percent".

`decimal`

When a number is formatted as a decimal, the decimal mark is replaced with the most appropriate symbol for the locale. In English this is a decimal point (“.”), while in many locales it is a decimal comma (“,”). If grouping is enabled the locale dependent grouping separator is also used. These symbols are also used for numbers formatted as currency or a percentage, where appropriate.

// 100,000.1
new Intl.NumberFormat("en", { style: "decimal" }).format(100000.10);

`percent`

When formatting using percent, the number is multiplied by 100 and the appropriate percent symbol is added in the correct position for the locale. This is often a percent sign (“%”), with or without a space between it and the number, but in Arabic and Farsi it is the appropriately named Arabic percent sign (“٪”).

// Chrome: "50%", IE: "50 %"
new Intl.NumberFormat("en", { style: "percent" }).format(0.5);

`currency`

If the style is set to currency, there is a requirement that the currency property also needs to be specified. This is because the currency used is not dependent on the locale. You may be using a Thai locale, but dealing in US Dollars, for example.
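Leaving the currency out is an error rather than a silent fallback: the constructor throws a TypeError. A minimal sketch, using the Thai-locale, US-dollar case as the working version:

```javascript
let missingCurrencyError = null;
try {
  // style "currency" without a currency code
  new Intl.NumberFormat("th", { style: "currency" });
} catch (error) {
  missingCurrencyError = error;
}
console.log(missingCurrencyError instanceof TypeError); // true

// A Thai locale, dealing in US dollars
const thaiUsd = new Intl.NumberFormat("th", { style: "currency", currency: "USD" });
console.log(thaiUsd.format(1234.5));
```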

When formatting as currency, the formatting rules for currency are used. These are often the same as the rules for decimal numbers, but some countries, including Switzerland, use different formatting rules for currency. The currency symbol, code, or name is then added to the formatted string, following the options specified by the currency and currencyDisplay properties.

new Intl.NumberFormat("en", {
    style: "currency",
    currency: "GBP"
}).format(100000000.59); // £100,000,000.59

`currency`

The currency property specifies the currency that will be used when formatting the number. The value should be an ISO 4217 alphabetic currency code. These are the same values specified in the Unicode cu extension.

Note that the currency cannot be set as part of the language tag, as the Internationalization API ignores the cu extension.

When formatting a currency, it does not convert the value of one currency to another. It merely adds the correct symbol, name, or code of the currency specified, and adds formatting rules related to that currency, such as the number of decimal places (if any).

new Intl.NumberFormat("en", {
    style: "currency",
    currency: "JPY"
}).format(100000000.59); // "¥100,000,001"

Note that in the previous example IE rounds the value, as the Japanese yen no longer has a subunit in use, while Chrome leaves the decimal places intact.
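In current engines, the number of decimal places comes from the currency, and resolvedOptions shows it. This sketch also overrides it with minimumFractionDigits, for cases where you genuinely need fractions of a yen (exchange rates, for example).

```javascript
const yen = new Intl.NumberFormat("en", { style: "currency", currency: "JPY" });
const pounds = new Intl.NumberFormat("en", { style: "currency", currency: "GBP" });

// The currency, not the locale, decides the default number of decimal places
console.log(yen.resolvedOptions().maximumFractionDigits);    // 0
console.log(pounds.resolvedOptions().maximumFractionDigits); // 2

// Override the default when a subunit is needed
const preciseYen = new Intl.NumberFormat("en", {
  style: "currency",
  currency: "JPY",
  minimumFractionDigits: 2
});
console.log(preciseYen.format(100000000.59)); // "¥100,000,000.59"
```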

`currencyDisplay`

If the number is using currency formatting, the currencyDisplay property specifies if the currency will be displayed using its symbol (e.g. $ for US dollars, £ for Great British pounds, and so on), the ISO 4217 currency code, or the currency name. These correspond to the values symbol, code, and name respectively.

The currency symbol and name will be localised to use the conventions of the language specified in the locale, where appropriate.

var spondoolies = 100000000000;

// "QR100,000,000,000": the symbol is QR in English
var qatariSpondooliesEn =
    new Intl.NumberFormat("en", {
        style: "currency",
        currency: "QAR",
        currencyDisplay: "symbol"
    }).format(spondoolies);

// "ر.ق.‏ ١٠٠٬٠٠٠٬٠٠٠٬٠٠٠": the symbol is ر.ق in Arabic
var qatariSpondooliesAr =
    new Intl.NumberFormat("ar", {
        style: "currency",
        currency: "QAR",
        currencyDisplay: "symbol"
    }).format(spondoolies);
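The other two currencyDisplay values can be compared side by side. display is a small hypothetical wrapper written for this sketch; the code form's exact spacing varies between engines, so it is described rather than quoted.

```javascript
// Hypothetical helper: format 2 US dollars with the given currencyDisplay
const display = (currencyDisplay) =>
  new Intl.NumberFormat("en", {
    style: "currency",
    currency: "USD",
    currencyDisplay
  }).format(2);

console.log(display("symbol")); // "$2.00"
console.log(display("code"));   // the ISO code "USD" with the amount
console.log(display("name"));   // "2.00 US dollars"
```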

`minimumIntegerDigits`

The minimumIntegerDigits property sets the minimum number of digits before the decimal place (known as integer digits). The number is padded with leading zeros if it would not otherwise have enough digits. The value must be an integer between 1 and 21.

// 000,010.1
new Intl.NumberFormat("en", { minimumIntegerDigits: 6 }).format(10.10);

`minimumFractionDigits`

The minimumFractionDigits property is similar to minimumIntegerDigits, except it deals with the digits after the decimal separator (fractional digits). It must be an integer between 0 and 20. The fractional digits will be padded with trailing zeros if there are fewer than the minimum.

// 0.100
new Intl.NumberFormat("en", { minimumFractionDigits: 3 }).format(.1);

`maximumFractionDigits`

The maximumFractionDigits property follows the same rules as minimumFractionDigits, but sets the maximum number of fractional digits that are allowed. The value will be rounded if there are more digits than the maximum specified.

var mxFr1 = new Intl.NumberFormat("en", { maximumFractionDigits: 1});

console.log(mxFr1.format(.42)); // 0.4
console.log(mxFr1.format(.99)); // 1

`minimumSignificantDigits`

The minimumSignificantDigits property works as a pair with maximumSignificantDigits. It specifies the minimum number of significant figures used when formatting the number, and must be an integer between 1 and 21. If the number has fewer significant figures than the specified minimum, the fraction digits will be padded with trailing zeros.

If either minimumSignificantDigits or maximumSignificantDigits is specified, they override minimumIntegerDigits, minimumFractionDigits, and maximumFractionDigits.

`maximumSignificantDigits`

Complementing minimumSignificantDigits, the maximumSignificantDigits property sets the maximum number of significant figures. If there are more significant figures than the maximum specified, the number is rounded to the specified number of significant figures, and any remaining integer digits are replaced with zeros.

var mn1Mx3 = new Intl.NumberFormat("en", { 
    minimumSignificantDigits: 1,
    maximumSignificantDigits: 3
});

console.log(mn1Mx3.format(1.501)); // 1.5
console.log(mn1Mx3.format(25499.99)); // 25,500
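The override mentioned earlier can be seen by setting both kinds of limit at once. In this sketch maximumFractionDigits would allow five decimal places, but the significant digits win. Newer editions of the spec add a roundingPriority option that can change this; its default keeps the behaviour shown here.

```javascript
const mixed = new Intl.NumberFormat("en", {
  maximumSignificantDigits: 2,
  maximumFractionDigits: 5 // overridden by the significant digits setting
});

const result = mixed.format(1.2345);
console.log(result); // "1.2"
```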

Date and time formatting options

Date and time formatting works much the same way as number formatting, but includes its own specific options.

Beyond the locale, the DateTimeFormat object has the following internal properties:

`numberingSystem`

As with NumberFormat, the numberingSystem property can only be set with the Unicode u extension to the language tag, using the nu keyword.

It works in exactly the same way as for NumberFormat, except that it sets the numeral system used in the date.

`calendar`

The calendar property sets which calendar is used when formatting the date. For most locales, this will be the Gregorian calendar (gregory in Unicode CLDR), but some countries use alternative calendars. There are also a number of calendars that are still used for specific situations, such as religious holidays or official documents.

As an example, Thailand uses the Thai solar calendar (buddhist), which uses the same days and months as the Gregorian calendar, but counts years from the start of the Buddhist Era (543 BC) rather than the Christian Era.

The calendar property uses the ca keyword from the Unicode u extension of the language tag. Like numberingSystem, it cannot be set via the options object.

For valid values see the calendar.xml file from Unicode CLDR.
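Formatting only the year makes the difference between calendars obvious. This sketch uses the English locale for both formatters so that the calendar keyword is the only difference, and pins the time zone to UTC so the date does not shift with the host machine.

```javascript
// 23 August 2013, midnight UTC
const date = new Date(Date.UTC(2013, 7, 23));

const yearIn = (tag) =>
  new Intl.DateTimeFormat(tag, { year: "numeric", timeZone: "UTC" });

const buddhist = yearIn("en-u-ca-buddhist");
const gregorian = yearIn("en-u-ca-gregory");

console.log(buddhist.resolvedOptions().calendar); // "buddhist"
console.log(buddhist.format(date));  // contains 2556 (2013 + 543)
console.log(gregorian.format(date)); // "2013"
```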

`timeZone`

While there is a tz keyword to specify the time zone in the language tag, it is not currently used in the ES Internationalization API.

Instead, the time zone can be set using the timeZone property of the options object. While Unicode CLDR defines a number of values in timezone.xml, the spec only requires that the value utc (case-insensitive) is supported. If timeZone is undefined, the time zone of the user’s host environment (i.e. that used by their OS or browser) is used when formatting the date (but timeZone remains undefined per spec). Other values should throw a RangeError.

var utc = new Intl.DateTimeFormat("en", { 
    timeZone: "utc",
    hour: "numeric"
});

console.log(utc.format(new Date()));

While the current version of the spec only supports utc and the user’s current time zone (when timeZone is undefined), there is a conformance clause that “allows implementations to accept additional time zone names under specified conditions.” Chrome takes advantage of this by also accepting additional time zones. It does not support the field names specified in timezone.xml, but instead supports the alias values from the same document. It also reports the user’s current time zone using this alias value when timeZone is undefined:

// same behaviour when explicitly set as undefined, or option is not set.
var format = new Intl.DateTimeFormat("en-GB", {
    timeZone: undefined
});

// IE: undefined, Chrome: America/Los_Angeles (if in PDT/PST time zone).
console.dirxml(format.resolvedOptions().timeZone);

var format = new Intl.DateTimeFormat("en-GB", {
    timeZone: "Antarctica/South_Pole"
});

// Chrome: Antarctica/South_Pole, IE: outside valid range
console.dirxml(format.resolvedOptions().timeZone);
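Current engines have since followed Chrome's lead and accept IANA time zone names such as "Asia/Tokyo". This sketch assumes such an engine; it formats one fixed instant in two zones, and also shows that "utc" is reported back as "UTC".

```javascript
// 12:00 UTC on 23 August 2013
const instant = new Date(Date.UTC(2013, 7, 23, 12));

const hourIn = (timeZone) =>
  new Intl.DateTimeFormat("en-GB", { hour: "numeric", hour12: false, timeZone })
    .format(instant);

console.log(hourIn("UTC"));        // "12"
console.log(hourIn("Asia/Tokyo")); // "21": Tokyo is UTC+9, with no daylight saving

// The spec's required value is case-insensitive and reported in upper case
const reported = new Intl.DateTimeFormat("en", { timeZone: "utc" })
  .resolvedOptions().timeZone;
console.log(reported); // "UTC"
```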

`timeZoneName`

The timeZoneName property sets how the time zone name will be displayed. It can either be set to long or short. The former should use the name of the time zone, while the latter should use its abbreviated name. If this property is not set, it is not included in the resolved options, and no time zone indicator will be included in the formatted date string.

This is the theory, but in practice, IE never shows the time zone name, even though it sets the option correctly.

Chrome does show the time zone name, but not exactly as I’d expect. When using short it displays the correct time zone abbreviation for US time zones, with the caveat that it uses the generic form, rather than the standard or daylight saving form. If you live on the West Coast of the US (or specify "America/Los_Angeles" in Chrome), it will include the token PT when using the en locale, rather than PST or PDT, depending on the time of year.

If you’re elsewhere, the names seem a little strange to me. For example, in the UK we usually use “GMT” or “BST” (in summer), as can be seen in the CLDR Timezones chart, but Chrome will use United Kingdom Time when timeZone is set to "Europe/London". As well as not being the time zone name I’d expect, it isn’t an abbreviated form, and is inconsistent with how time zones are named for the US. Now imagine that you want to format a date using a time zone such as CET, used by many countries. If I choose "Europe/Paris" (there is no generic Europe/Central or equivalent that I could find, either in timezone.xml or that is supported by Chrome), Chrome will display “France Time”. That is going to feel odd for someone in Spain, Norway, or any of the other countries in the Central European time zone.

Then there is the long form. The CLDR Timezones chart uses the full name of the timezone, such as Pacific Time, while Chrome shows the GMT offset: GMT-07:00. Outside of the US, the long form will be shorter than the short form.

I suspect these issues with Chrome are because time zone support beyond UTC is experimental. With IE not showing the time zone name at all, this is certainly a property that cannot be relied upon yet.
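In current engines you can check whether, and how, the time zone name is shown by splitting the output with formatToParts (a later addition to the API, not available in IE11). A minimal sketch for UTC:

```javascript
// 12:00 UTC on 23 August 2013
const instant = new Date(Date.UTC(2013, 7, 23, 12));

const parts = new Intl.DateTimeFormat("en-US", {
  hour: "numeric",
  timeZone: "UTC",
  timeZoneName: "short"
}).formatToParts(instant);

// Find the part the engine labelled as the time zone name, if any
const zone = parts.find((part) => part.type === "timeZoneName");
console.log(zone ? zone.value : "(no time zone shown)"); // "UTC"
```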

Time components: `hour`, `minute`, and `second`

The time is not shown by default, and the hour, minute, and second properties are absent if not set.

All three can be set to either "2-digit" or "numeric".

If the hour is displayed, it will either use the 12 or 24 hour clock, depending on the locale.

The "2-digit" value does what you’d expect: it formats that component of the time using two digits, just as you’d likely see on a digital watch:

var timeFormat = new Intl.DateTimeFormat("en-GB", {
    hour: "2-digit",
    minute: "2-digit",
    second: "2-digit"
});

console.log(timeFormat.format());

In Chrome, however, only one digit will be used for the hour if the time is before midday and the 12 hour clock is being used.

For the "numeric" value, one digit will be used for the hour where possible (the same as described above for Chrome when "2-digit" is specified). I’ve not noticed any difference for the minute and second properties in IE, while in Chrome the "numeric" value is not supported for any time related properties; they remain set to "2-digit".
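Formatting a fixed instant rather than the current time makes the difference easier to test. This sketch pins the time zone to UTC; the "numeric" output is printed rather than predicted, as it differs between engines and locales.

```javascript
// 09:05:07 UTC on 23 August 2013
const instant = new Date(Date.UTC(2013, 7, 23, 9, 5, 7));

const twoDigit = new Intl.DateTimeFormat("en-GB", {
  hour: "2-digit",
  minute: "2-digit",
  second: "2-digit",
  timeZone: "UTC"
});
console.log(twoDigit.format(instant)); // "09:05:07"

// Compare with "numeric" in a 12-hour locale
const numeric = new Intl.DateTimeFormat("en-US", {
  hour: "numeric",
  minute: "numeric",
  timeZone: "UTC"
});
console.log(numeric.format(instant));
```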

`hour12`

The hour12 property specifies what time notation is used for formatting the time. It accepts a boolean value, where true uses the 12-hour clock and false uses the 24-hour clock (often called military time in the US). This property is undefined if the hour property is not used when formatting the date.

Much of the world uses the 24-hour clock by default in written communications, with some notable exceptions, including North America and India. Many countries use both systems interchangeably, so IE and Chrome don’t always agree with each other on the default value for this property.
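A sketch of checking a locale's default and then overriding it in both directions. The timeFormatter helper is hypothetical, written only for this example, and formatting is pinned to UTC so the output is reproducible.

```javascript
var afternoon = new Date(Date.UTC(2013, 7, 31, 13, 58, 0));

// Hypothetical helper: builds a time formatter in UTC, merging in extra options.
function timeFormatter(locale, extra) {
    var opts = { timeZone: "UTC", hour: "numeric", minute: "2-digit" };
    for (var key in extra) {
        opts[key] = extra[key];
    }
    return new Intl.DateTimeFormat(locale, opts);
}

// Locale defaults: what the implementation picks when hour12 is not set.
console.log(timeFormatter("en-US", {}).resolvedOptions().hour12); // true
console.log(timeFormatter("en-GB", {}).resolvedOptions().hour12); // false

// Overriding the default in either direction.
console.log(timeFormatter("en-US", { hour12: false }).format(afternoon)); // "13:58"
console.log(timeFormatter("en-GB", { hour12: true }).format(afternoon));
```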

`hourNo0`

The hourNo0 is a boolean property that specifies if the hours go from 1–12 (true) or 0–11 (false). This is only present if the 12-hour clock is being used (hour12: true, and the hour property is present).

This sounds like the hour counts up like a zero-indexed array, but it is not quite like that. In some locales midnight is 00:00 (British English does this), while in others (such as American English) midnight is 12:00. The hours from 1 a.m. to 11:59 p.m. are the same.

This is what I’m assuming this does anyway, as it looks like hourNo0 is not supported in either IE or Chrome.
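While hourNo0 itself can't be tested, the midnight difference it describes can be seen by formatting midnight with each locale's default clock. This sketch assumes the en-GB default of the 24-hour clock and the en-US default of the 12-hour clock; the exact spacing before "AM" can vary between implementations.

```javascript
var midnight = new Date(Date.UTC(2013, 7, 31, 0, 0, 0));
var midnightOptions = { timeZone: "UTC", hour: "2-digit", minute: "2-digit" };

// en-GB defaults to the 24-hour clock, where midnight starts the day at 00.
var gb = new Intl.DateTimeFormat("en-GB", midnightOptions).format(midnight);

// en-US defaults to the 12-hour clock, where midnight is shown as 12.
var us = new Intl.DateTimeFormat("en-US", midnightOptions).format(midnight);

console.log(gb); // "00:00"
console.log(us); // e.g. "12:00 AM"
```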

`weekday`

The weekday property specifies how the day of the week is formatted. If absent, it is not included in the date formatting.

The valid values are narrow, short, and long, indicating the length of the string used. In alphabetic languages, narrow will usually use a single character, short will use an abbreviation (Wed for Wednesday in English), and long will use the full name of the day.

// "W" if today is a Wednesday
console.log(new Intl.DateTimeFormat("en", { weekday: "narrow" }).format());

Demo showing a narrow weekday

One fairly major issue with using weekday in Chrome is that many supported locales do not include a localised string. Instead a digit that represents the day of the week is shown. For Wednesday, this is 4, based on Sunday being the first day of the week.
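A sketch of all three lengths for a fixed date in English. The date, 28 August 2013 (a Wednesday), is an arbitrary choice, formatted in UTC so the day can't shift. In an implementation lacking localised strings, as described above, the output will differ.

```javascript
// 28 August 2013 was a Wednesday; noon UTC keeps the date stable.
var wednesday = new Date(Date.UTC(2013, 7, 28, 12, 0, 0));

var weekdays = ["narrow", "short", "long"].map(function (length) {
    return new Intl.DateTimeFormat("en", {
        timeZone: "UTC",
        weekday: length
    }).format(wednesday);
});

console.log(weekdays); // ["W", "Wed", "Wednesday"]
```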

`era`

The era can be included in the formatted date using the era property. It accepts the same narrow, short, and long values as the weekday property, with the same meanings, and is also absent if not used in formatting.

For modern dates using the Gregorian calendar, the era (“AD” using short in English, or “Anno Domini” in long form) is rarely used, but it is more important in some other calendar systems. The Japanese calendar, for example, has an era (called nengō) that changes at the start of the reign of a new emperor.

While IE supports the era property, it does not include it in the formatted string, no matter which value is used.
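A sketch showing the era for the same date in the Gregorian calendar and in the Japanese calendar, selected with the ca extension key. The exact formatting is implementation dependent, so the outputs are examples rather than guarantees; August 2013 falls in the Heisei era.

```javascript
var eraDate = new Date(Date.UTC(2013, 7, 31, 12, 0, 0));

// Gregorian calendar: the era is rarely shown, but can be requested.
var gregorian = new Intl.DateTimeFormat("en", {
    timeZone: "UTC",
    era: "long",
    year: "numeric"
}).format(eraDate);

// Japanese calendar, selected via the "ca" Unicode extension key.
var japanese = new Intl.DateTimeFormat("ja-JP-u-ca-japanese", {
    timeZone: "UTC",
    era: "long",
    year: "numeric"
}).format(eraDate);

console.log(gregorian); // e.g. "2013 Anno Domini"
console.log(japanese);  // e.g. "平成25年" (Heisei 25)
```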

`day`, `month`, and `year`

It is fairly self-explanatory what these properties do. All three accept "2-digit" and "numeric", which will be familiar from the time-specific properties. Also in keeping with those properties, they are absent if not used in formatting the date.

The month property adds three additional values: narrow, short, and long. They behave in the same way as for weekday; the 8th month in the en locale is formatted as Aug using short, and August using the long value.
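A quick sketch of the three textual month lengths for August in English, formatted in UTC so the date is fixed (the 15th is an arbitrary choice):

```javascript
var august = new Date(Date.UTC(2013, 7, 15, 12, 0, 0));

var months = ["narrow", "short", "long"].map(function (length) {
    return new Intl.DateTimeFormat("en", {
        timeZone: "UTC",
        month: length
    }).format(august);
});

console.log(months); // ["A", "Aug", "August"]
```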

In many locales, the day, month, and year properties are used by default if no options (other than the locale) are specified when constructing a new DateTimeFormat object. They are also used by default when the era property is set without any other date specific properties.

The order of these properties differs depending on the locale. The most common is little-endian: day, month, then year. The next most common is big-endian, with the most significant component first: year, month, then day.

var locales = ["en-GB", "en-US", "hu"],
    length = locales.length,
    dateFormat = null;

for (var i = 0; i < length; i++) {
    dateFormat = new Intl.DateTimeFormat(locales[i], {
        year: "numeric",
        month: "long",
        day: "numeric"
    });
    console.log(dateFormat.resolvedOptions().locale + " " + dateFormat.format());
}

Demo for formatting the date, showing different order depending on locale

The US is a significant outlier, with its month, day, year format. This could cause confusion with the default numeric-style month format, as any locale using the en language sub tag with an unsupported region sub tag will fall back to the US format. In IE the major English-speaking locales are covered, but in Chrome, only en-GB and en-US are supported.

If you’re targeting English-speaking users from other locales, I’d recommend specifying the month using short or long to mitigate this issue, or using en-GB for date formatting.
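A sketch of that recommendation, using the ambiguous date from the introduction, 10 July 2013. The formatDate helper is hypothetical, and the outputs assume current CLDR data for en-GB and en-US.

```javascript
// 10 July 2013, at noon UTC so the day can't shift.
var julyTenth = new Date(Date.UTC(2013, 6, 10, 12, 0, 0));

// Hypothetical helper: day and year are numeric, the month style varies.
function formatDate(locale, month) {
    return new Intl.DateTimeFormat(locale, {
        timeZone: "UTC",
        day: "numeric",
        month: month,
        year: "numeric"
    }).format(julyTenth);
}

// Numeric months are ambiguous between locales...
console.log(formatDate("en-GB", "numeric")); // "10/07/2013"
console.log(formatDate("en-US", "numeric")); // "7/10/2013"

// ...while a spelled-out month is not.
console.log(formatDate("en-GB", "long"));    // "10 July 2013"
console.log(formatDate("en-US", "long"));    // "July 10, 2013"
```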

String collation options

As well as the locale, the Collator object contains the following internal properties that can be set via language tag extensions or the options object:

`usage`

The usage property specifies what action will be performed on the list of strings. If the property is set to search, it will check to see if a string matches, while sort will sort the list of strings in the order specified by the locale and other options.

These two values are also specified in the co Unicode extension, but these are ignored in the Internationalization API.

In the following example, I am sorting an array of characters based on the order specified by the English and then the Swedish language. In English, where diacritics are mostly not used, any character with such marks is sorted after its base character. Thus the Swedish characters “å” (LATIN SMALL LETTER A WITH RING ABOVE) and “ä” (LATIN SMALL LETTER A WITH DIAERESIS) are sorted after “a”. In Swedish, these characters, along with “ö” (LATIN SMALL LETTER O WITH DIAERESIS), are standard letters of the alphabet that appear, in the order mentioned, after “z”.

var str = "qwertyuiopåasdfghjklöäzxcvbnm".split(""),
    locales = ["en", "sv"],
    length = locales.length,
    sorter = null;
    
for (var i = 0; i < length; i++) { 
    sorter = new Intl.Collator(locales[i], { usage: "sort" });
    
    console.log(sorter.resolvedOptions().locale + ":\t" +
        str.sort(sorter.compare));
}

Demo showing the sort order differences between English and Swedish

The situation is the same in Danish and Norwegian with the “æ”, “ø”, and “å” characters (except that Chrome, and thus ironically Opera, don’t support the Norwegian locale). If you need to sort using Norwegian sort order, it is best to use an array for the locale, with Danish as a fallback.
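A sketch of that fallback. The "nb" tag (Norwegian Bokmål) is an assumption; depending on the implementation, "no" may be the relevant tag instead. The word list is arbitrary; in both languages “æ”, “ø”, and “å” follow “z” in that order.

```javascript
// Norwegian Bokmål first, falling back to Danish if "nb" isn't supported.
var norwegianCollator = new Intl.Collator(["nb", "da"], { usage: "sort" });

var words = ["ål", "æble", "zoo", "ørken", "anker"];

console.log(norwegianCollator.resolvedOptions().locale); // "nb", or "da" as a fallback
console.log(words.slice().sort(norwegianCollator.compare));
// ["anker", "zoo", "æble", "ørken", "ål"] in both languages
```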

`collation`

The remaining types specified in the Unicode co extension are valid in the Internationalization API.

These types tailor the sort order. One example of a collation type is trad, which stands for traditional collation. In some languages the sort order has changed over time, and this type allows you to tailor the sort order to use the older style. One language where this is relevant is Spanish. Traditionally there were two contractions (that is, two letters that are treated as one) that are being phased out: “ch”, which sorts between “c” and “d”, and “ll”, which sorts between “l” and “m”.

var str = ["cherry", "llama", "apple", "lychee", "date", "cranberry"],
    locales = ["en", "es-u-co-trad"],
    length = locales.length,
    sorter = null;
    
for (var i = 0; i < length; i++) { 
    sorter = new Intl.Collator(locales[i], { usage: "sort" });
    
    console.log(sorter.resolvedOptions().locale + ":\t" + 
        str.sort(sorter.compare));
}

Demo showing traditional sort order in Spanish

In the above example, “cherry” should be sorted after “cranberry”, and “llama” (what, you’ve never heard of the llama fruit?) should be after “lychee” when using the Spanish locale with traditional sorting.

In IE11, you can also set the collation type via the collation property of the options object, but this is not supported in Chrome, nor is it valid according to the specification.

Another example is the phonebk type. This uses the ordering as found in phone books, and other such lists of names. If you’re an English speaker, you may wonder what the difference is between this and the regular rules for collation, but in German there is an important difference.

A number of names that use characters with umlauts can also be spelt in the expanded form, where an “e” is added after the base character. Thus, some people may be called “Müller”, while others may be “Mueller”. As both are variations of the same name, it would be preferable for both names to be found together in a phone book. For this reason, when using the phonebk type, any characters with an umlaut are collated the same as if they were in their expanded form. In the following example, “Häßler” (Haessler) comes before “Hamann”, while “Müller” and “Mueller” are interchangeable, so are sorted by first name:

var str = ["Mueller, Peter", "Hamann, Dietmar", "Müller, Gerd", "Müller, Thomas", "Häßler, Thomas"],
    locales = ["de", "de-u-co-phonebk"],
    length = locales.length,
    sorter = null;
    
for (var i = 0; i < length; i++) { 
    sorter = new Intl.Collator(locales[i], { usage: "sort" });
    console.log(sorter.resolvedOptions().locale);
    console.log(str.sort(sorter.compare));
}

Demo using German phone book ordering

There are a number of other types. Many of these are related to Pinyin ordering, but there is also dictionary (dict) style ordering (primarily used in Sinhala), phonetic ordering based on pronunciation, and so on.

`ignorePunctuation`

There are times when you are comparing or sorting strings that you want to ignore any punctuation; perhaps you want to treat words with typographic and straight apostrophes as the same (say, “can’t” and “can't”), or match both the hyphenated and non-hyphenated form of a name. You can achieve this by setting the ignorePunctuation property to true in the options object:

var ignore = [true, false],
    locale = "en",
    searcher = null;
    
for (var i = 0; i < ignore.length; i++ ) {
    searcher = new Intl.Collator(locale, { 
        usage: "search", 
        ignorePunctuation: ignore[i] 
    });

    console.log(searcher.compare("Helena Bonham Carter", 
        "Helena Bonham-Carter"));
}

Demo showing search, ignoring punctuation

Be aware that this is all or nothing; it will ignore all punctuation used by the locale in question.

The equivalent Unicode extension key is ka, but this is not supported as part of the language tag in the Internationalization API.
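A sketch of the apostrophe case mentioned above, comparing a typographic apostrophe (U+2019) with a straight one (U+0027). The searcher helper is hypothetical, written only for this example.

```javascript
// Hypothetical helper: an English search collator with punctuation
// either ignored or significant.
function searcher(ignorePunctuation) {
    return new Intl.Collator("en", {
        usage: "search",
        ignorePunctuation: ignorePunctuation
    });
}

// Typographic apostrophe (U+2019) versus straight apostrophe (U+0027).
var curly = "can\u2019t",
    straight = "can't";

console.log(searcher(true).compare(curly, straight));  // 0: treated as equal
console.log(searcher(false).compare(curly, straight)); // non-zero: the punctuation differs
```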

`sensitivity`

Sometimes you may want to be more or less strict when collating strings. Let’s take an example: search for “naïve” in your favourite search engine, then use the browser’s inline search functionality to look for “naive”. It will likely (unless you’ve changed the settings) match all instances of the original search term too. One reason for this is that it is easier for the user, who doesn’t have to learn how to type the diacritic or resort to copy and paste.

The sensitivity property allows you to customise this behaviour in your own app. It has four possible values:

`base`

Strings are only unequal if the base letter is different. If the base letter has a diacritical mark, or is a different case, it will still be equal.

// equal
new Intl.Collator("en", { 
    usage: "search", 
    sensitivity: "base" 
}).compare("a", "å");

`accent`

Strings are unequal if the base letter or the diacritical mark are different. Strings with different cases will be equal.

// unequal
new Intl.Collator("en", { 
    usage: "search", 
    sensitivity: "accent" 
}).compare("a", "å");

`case`

Strings are unequal if the base letter or case are different. If the diacritical mark is different, they will still be equal.

// unequal
new Intl.Collator("en", { 
    usage: "search", 
    sensitivity: "case" 
}).compare("a", "A");

`variant`

Strings are unequal if the base letter, diacritical mark, or case is different. They are only equal if all three are the same.

An example of all these in action can be found in this sensitivity demo (http://jsfiddle.net/dstorey/CzFam/).
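To see all four values at once, a sketch like the following records whether each of three pairs (a different letter, a different diacritical mark, and a different case) compares as equal under each sensitivity. The pairs and the table structure are illustrative choices.

```javascript
var sensitivities = ["base", "accent", "case", "variant"];
var pairs = [["a", "b"], ["a", "å"], ["a", "A"]];

var table = {};
sensitivities.forEach(function (sensitivity) {
    var collator = new Intl.Collator("en", {
        usage: "search",
        sensitivity: sensitivity
    });
    // true if the pair is considered equal under this sensitivity.
    table[sensitivity] = pairs.map(function (pair) {
        return collator.compare(pair[0], pair[1]) === 0;
    });
});

console.log(table);
// { base:    [false, true,  true],
//   accent:  [false, false, true],
//   case:    [false, true,  false],
//   variant: [false, false, false] }
```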

One thing to bear (🐻) in mind is that while in some languages (say English), a letter with a certain diacritical mark may be considered the same letter, in others that combination may be considered a different base letter. The letter “å” is a different letter to “a” in Denmark and Norway:

// equal
new Intl.Collator("en", {
    usage: "search",
    sensitivity: "base"
}).compare("a", "å");

// unequal
new Intl.Collator("da", {
    usage: "search",
    sensitivity: "base"
}).compare("a", "å");

Demo showing difference in base letters between English and Danish

The equivalent Unicode key to the sensitivity property is ks, but this has different, less obvious types, and it is also ignored by the Internationalization API.

`numeric`

The numeric property enables and disables numeric sorting. If it is enabled, numbers (or even numbers represented as strings) are sorted in numerical order. That is, 10 will come after 9. On the other hand, if it is disabled, they are lexicographically ordered: 10 will come after 1 and before 2.

It is possible to set the numeric property either as a property of the same name in the options object, or using the Unicode extension kn key in the language tag. Using both methods, true enables numeric sorting, and false disables it.

var nums = [0, 2, "4", 6, 8, 1, 3, 5, 7, 9, 10, 47, "-5"],
    collator = new Intl.Collator("en", { numeric: true });

// ["-5", 0, 1, 2, 3, "4", 5, 6, 7, 8, 9, 10, 47] 
console.log(nums.sort(collator.compare));

Demo showing numeric sorting

An implementation is not required to implement the numeric property, but it is supported by both IE11 and Chrome/Blink.
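A sketch showing both ways of enabling numeric sorting, the kn extension key and the options object, plus lexicographic sorting for comparison. The file names are arbitrary examples.

```javascript
var files = ["item10", "item2", "item1", "item9"];

// Numeric sorting requested via the "kn" extension key in the language tag.
var numericTag = new Intl.Collator("en-u-kn-true");

// The same via the options object, and explicitly disabled for comparison.
var numericOption = new Intl.Collator("en", { numeric: true });
var lexical = new Intl.Collator("en", { numeric: false });

console.log(files.slice().sort(numericTag.compare));    // ["item1", "item2", "item9", "item10"]
console.log(files.slice().sort(numericOption.compare)); // ["item1", "item2", "item9", "item10"]
console.log(files.slice().sort(lexical.compare));       // ["item1", "item10", "item2", "item9"]
```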

`caseFirst`

If you are sorting a language that uses a bicameral script (Latin, Cyrillic, and Greek, among others), you need to take case order into account. In something like a dictionary or phone book, it makes little difference whether a letter is upper or lower case, while in Unicode, and in the default sort order in JavaScript, majuscule (upper case) letters come before minuscule (lower case) ones.

The Internationalization API provides the caseFirst property for specifying the behaviour that you want. It can either be set via a property of the same name in the options object, or via the kf key in the language tag.

The valid values come directly from the types defined in the collation.xml Unicode extension. They are as follows:

`upper`

Upper case letters are sorted before their lower case equivalent.

var letters = ["A", "b", "C", "c", "a", "B"],
    collator = new Intl.Collator("en", { caseFirst: "upper" });

// ["A", "a", "B", "b", "C", "c"]
console.log(letters.sort(collator.compare));

`lower`

Lower case letters are sorted before their upper case equivalent. The same array as above would result in ["a", "A", "b", "B", "c", "C"].

`false`

This is the default value, and provides no special case ordering. Using the English locale, this is the same as if "lower" was specified. Note that "false" is a string, rather than a boolean.

You can try out all three values in the caseFirst demo.
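A sketch of setting caseFirst through the kf key in the language tag rather than the options object, using the same letters as above:

```javascript
var letters = ["A", "b", "C", "c", "a", "B"];

// caseFirst via the "kf" extension key in the language tag.
var upperTag = new Intl.Collator("en-u-kf-upper");
var lowerTag = new Intl.Collator("en-u-kf-lower");

console.log(letters.slice().sort(upperTag.compare)); // ["A", "a", "B", "b", "C", "c"]
console.log(letters.slice().sort(lowerTag.compare)); // ["a", "A", "b", "B", "c", "C"]
console.log(upperTag.resolvedOptions().caseFirst);   // "upper"
```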

Using existing Number, Date, and String methods

You may be aware that in ECMAScript, the Object prototype has a toLocaleString method, which Array, Date, and Number override with their own implementations; that the Date prototype also provides the toLocaleDateString and toLocaleTimeString methods; and that String has the locale-aware localeCompare method. While these could be useful, none of them accepted a locale or options, and thus they could only be used for the user’s current locale.

In much the same way that the Internationalization API allows you to pass a locale and options Object as arguments to the NumberFormat, DateTimeFormat, and Collator constructors, it also extends the already existing locale aware methods to include these two parameters:

// Show the hour in Chinese
console.log(new Date().toLocaleTimeString("zh", {
    hour: "2-digit"
}));

// Show the date in Turkish
console.log(new Date().toLocaleDateString("tr", {
    day: "2-digit",
    month: "long",
    year: "numeric"
}));

// Show a number in Arabic as Brazilian real
console.log((12345677).toLocaleString("ar", {
    style: "currency",
    currency: "BRL",
    currencyDisplay: "name"
}));

Using existing locale aware methods with locale and options
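The same extension applies to String.prototype.localeCompare, which takes the locale and options after the string being compared against. A sketch, reusing the Danish “å” example from the sensitivity section; the names being sorted are made up for illustration.

```javascript
// localeCompare(that, locales, options)
console.log("a".localeCompare("å", "en", { sensitivity: "base" }));     // 0
console.log("a".localeCompare("å", "da", { sensitivity: "base" }) < 0); // true: å follows z in Danish

// Sorting with localeCompare, passing the locale on every call.
var names = ["Ølstad", "Andersen", "Zahle", "Åberg"];
var sortedNames = names.slice().sort(function (x, y) {
    return x.localeCompare(y, "da");
});

console.log(sortedNames); // ["Andersen", "Zahle", "Ølstad", "Åberg"]
```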

When to use each approach?

At the end of the day, which approach you use is up to you and your use case. Using the existing methods is more concise, but if you are localising more than one piece of data with the same locale information, you will have to pass the locale and options each time. When using the Collator and formatter objects, you can set this up once and keep reusing it.
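A sketch of the trade-off, formatting the same amounts both ways; the amounts and the de-DE/EUR choice are arbitrary. Both approaches should produce identical strings, the difference being where the locale and options are specified.

```javascript
var amounts = [1234.5, 99, 0.75];
var euroOptions = { style: "currency", currency: "EUR" };

// Existing method: locale and options passed on every call.
var viaMethod = amounts.map(function (amount) {
    return amount.toLocaleString("de-DE", euroOptions);
});

// Formatter object: configured once, then reused.
var euroFormat = new Intl.NumberFormat("de-DE", euroOptions);
var viaFormatter = amounts.map(function (amount) {
    return euroFormat.format(amount);
});

console.log(viaMethod);
console.log(viaFormatter); // the same strings as viaMethod
```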

Wrap up

Phew, congratulations for sitting through all that, if you’re still here. I hope this gives you enough information to dive deep into the Internationalization API and start adding some locale awareness to your apps. You’ve seen how numbers and dates can be formatted using the local customs of individual languages and countries. You’ve also seen how strings can be collated using these customs, with varying degrees of strictness. You’ve also probably come to appreciate, if you didn’t already, how much work it would take to roll your own if localisation functionality were not built in.

While this API is designed for localising apps and sites, there are a number of formatting and collation options that may prove useful even when your app will never be translated, from formatting numbers and currencies correctly to improving sort order or relaxing the strictness of search matching when punctuation or accented characters are used.

Work has already begun on the 2nd version of the Internationalization API. It currently includes expanded time zone support (using IANA time zone names) and locale-sensitive case conversion.

The API is clearly a first step along the road to adding full localisation support to JavaScript, but it already covers a lot of use cases. I know it would have been helpful for issues I’ve seen in the past with developing multi-language apps.
