LanguageTag

This is a memo on RFC 5646, "Tags for Identifying Languages", which together with RFC 4647 makes up BCP 47.

1 The Language Tag

Language tags are used to help identify languages, whether spoken, written, signed, or otherwise signaled, for the purpose of communication. This includes constructed and artificial languages but excludes languages not intended primarily for human communication, such as programming languages.

1.1 Syntax

  • A language tag is composed of a sequence of one or more subtags.
  • Each subtag is a sequence of at most eight alphanumeric characters that narrows the range of language identified by the tag.
  • Subtags are joined with hyphens ("-").
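The three rules above can be illustrated with a small Python sketch. It splits a tag on hyphens and checks that each piece has the basic subtag shape: one to eight ASCII letters or digits. The function name split_subtags is hypothetical. The sketch deliberately ignores the ordering rules of the full grammar that follows.

```python
import re
from typing import List

# Basic subtag shape: 1 to 8 ASCII letters or digits.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")

def split_subtags(tag: str) -> List[str]:
    """Split a tag on '-' and check that every piece has the basic subtag shape.

    The order of subtags (language, script, region, ...) is not checked here.
    """
    subtags = tag.split("-")
    for sub in subtags:
        # An empty piece (e.g. from "en--US") or a 9+ character piece fails here.
        if not SUBTAG.fullmatch(sub):
            raise ValueError(f"not a valid subtag: {sub!r}")
    return subtags

print(split_subtags("zh-Hant-TW"))  # ['zh', 'Hant', 'TW']
```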

The syntax of the language tag in ABNF [RFC5234] is:

Language-Tag  = langtag             ; normal language tags
              / privateuse          ; private use tag
              / grandfathered       ; grandfathered tags

langtag       = language
                ["-" script]
                ["-" region]
                *("-" variant)
                *("-" extension)
                ["-" privateuse]

language      = 2*3ALPHA            ; shortest ISO 639 code
                ["-" extlang]       ; sometimes followed by
                                    ; extended language subtags
              / 4ALPHA              ; or reserved for future use
              / 5*8ALPHA            ; or registered language subtag

extlang       = 3ALPHA              ; selected ISO 639 codes
                *2("-" 3ALPHA)      ; permanently reserved

script        = 4ALPHA              ; ISO 15924 code

region        = 2ALPHA              ; ISO 3166-1 code
              / 3DIGIT              ; UN M.49 code

variant       = 5*8alphanum         ; registered variants
              / (DIGIT 3alphanum)

extension     = singleton 1*("-" (2*8alphanum))

                                    ; Single alphanumerics
                                    ; "x" reserved for private use
singleton     = DIGIT               ; 0 - 9
              / %x41-57             ; A - W
              / %x59-5A             ; Y - Z
              / %x61-77             ; a - w
              / %x79-7A             ; y - z

privateuse    = "x" 1*("-" (1*8alphanum))

grandfathered = irregular           ; non-redundant tags registered
              / regular             ; during the RFC 3066 era

irregular     = "en-GB-oed"         ; irregular tags do not match
              / "i-ami"             ; the 'langtag' production and
              / "i-bnn"             ; would not otherwise be
              / "i-default"         ; considered 'well-formed'
              / "i-enochian"        ; These tags are all valid,
              / "i-hak"             ; but most are deprecated
              / "i-klingon"         ; in favor of more modern
              / "i-lux"             ; subtags or subtag
              / "i-mingo"           ; combination
              / "i-navajo"
              / "i-pwn"
              / "i-tao"
              / "i-tay"
              / "i-tsu"
              / "sgn-BE-FR"
              / "sgn-BE-NL"
              / "sgn-CH-DE"

regular       = "art-lojban"        ; these tags match the 'langtag'
              / "cel-gaulish"       ; production, but their subtags
              / "no-bok"            ; are not extended language
              / "no-nyn"            ; or variant subtags: their meaning
              / "zh-guoyu"          ; is defined by their registration
              / "zh-hakka"          ; and all of these are deprecated
              / "zh-min"            ; in favor of a more modern
              / "zh-min-nan"        ; subtag or sequence of subtags
              / "zh-xiang"

alphanum      = (ALPHA / DIGIT)     ; letters and numbers

Figure 1: Language Tag ABNF
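The ABNF in Figure 1 can be transcribed almost mechanically into a regular expression. The following is a hypothetical Python helper, not something defined by RFC 5646. It checks well-formedness only. It does not consult the IANA registry, so it accepts unregistered subtags and does not reject repeated variants. It tests the grandfathered list first because the "regular" grandfathered tags also match the langtag production. Matching is case-insensitive, as ABNF string literals are.

```python
import re
from typing import Optional

ALNUM = "[A-Za-z0-9]"

# Transcription of the langtag production in Figure 1.
LANGTAG = (
    r"(?P<language>[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}"   # 2*3ALPHA ["-" extlang]
    r"|[A-Za-z]{4}"                                      # reserved for future use
    r"|[A-Za-z]{5,8})"                                   # registered language subtag
    r"(?:-(?P<script>[A-Za-z]{4}))?"                     # ISO 15924
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"            # ISO 3166-1 or UN M.49
    rf"(?P<variants>(?:-(?:{ALNUM}{{5,8}}|[0-9]{ALNUM}{{3}}))*)"
    rf"(?P<extensions>(?:-[0-9A-WY-Za-wy-z](?:-{ALNUM}{{2,8}})+)*)"  # singleton, not x
    rf"(?:-(?P<privateuse>[xX](?:-{ALNUM}{{1,8}})+))?"
)
PRIVATEUSE = rf"[xX](?:-{ALNUM}{{1,8}})+"

# The irregular and regular grandfathered tags, lowercased for lookup.
GRANDFATHERED = {
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao", "i-tay",
    "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu", "zh-hakka",
    "zh-min", "zh-min-nan", "zh-xiang",
}

LANGTAG_RE = re.compile(LANGTAG)
PRIVATEUSE_RE = re.compile(PRIVATEUSE)

def classify(tag: str) -> Optional[str]:
    """Return which Language-Tag alternative matches, or None if ill-formed."""
    if tag.lower() in GRANDFATHERED:
        return "grandfathered"
    if PRIVATEUSE_RE.fullmatch(tag):
        return "privateuse"
    if LANGTAG_RE.fullmatch(tag):
        return "langtag"
    return None

def parse(tag: str) -> Optional[dict]:
    """Split a langtag into its parts (language includes any extlang)."""
    m = LANGTAG_RE.fullmatch(tag)
    if m is None:
        return None
    return {
        "language": m["language"],
        "script": m["script"],
        "region": m["region"],
        "variants": m["variants"].split("-")[1:],   # "-1901" -> ["1901"]
        "extensions": m["extensions"][1:],          # drop the leading "-"
        "privateuse": m["privateuse"],
    }

print(classify("i-klingon"))   # grandfathered
print(classify("de-419-DE"))   # None (two region subtags)
print(parse("sr-Latn-RS"))
```

Trying the grandfathered set before the regex keeps the classification unambiguous. The regex alone would report "art-lojban" as an ordinary langtag.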

1.1.1 Formatting of Language Tags

Although language tags are case-insensitive, the underlying standards recommend these formatting conventions:

  • ISO 639-1 recommends that language codes be written in lowercase ('mn' for Mongolian).
  • ISO 15924 recommends that script codes use lowercase with the initial letter capitalized ('Cyrl' for Cyrillic).
  • ISO 3166-1 recommends that country codes be written in uppercase ('MN' for Mongolia).
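RFC 5646 (Section 2.1.1) turns these conventions into a rule based on position. All subtags are lowercase except two-letter and four-letter subtags that are neither the first subtag nor placed after a singleton. Those two-letter subtags are uppercased and four-letter subtags are titlecased, as in "en-CA-x-ca" or "az-Latn-x-latn". Below is a minimal Python sketch of that rule. The name format_tag is hypothetical, and the sketch assumes its input is already well-formed.

```python
def format_tag(tag: str) -> str:
    """Recase a well-formed tag following RFC 5646, Section 2.1.1."""
    out = []
    after_singleton = False
    for i, sub in enumerate(tag.split("-")):
        low = sub.lower()
        # Only letter-only subtags after the first one, and before any singleton,
        # get special casing; digit-led variants such as "1901" are left alone.
        if i > 0 and not after_singleton and low.isalpha():
            if len(low) == 2:
                low = low.upper()        # region, e.g. 'MN'
            elif len(low) == 4:
                low = low.capitalize()   # script, e.g. 'Cyrl'
        out.append(low)
        if len(sub) == 1:
            # A singleton ('x', 'u', ...) starts an extension or private-use
            # sequence; everything after it stays lowercase.
            after_singleton = True
    return "-".join(out)

print(format_tag("MN-cyrl-mn"))  # mn-Cyrl-MN
```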

1.2 Language Subtag Sources and Interpretation

The namespace of language tags and their subtags is administered by the Internet Assigned Numbers Authority (IANA) according to the rules in Section 5 of RFC 5646. The Language Subtag Registry maintained by IANA is the source for valid subtags; the other standards referenced in this section provide the source material for that registry.
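The registry is a plain-text file. Records are separated by lines containing "%%", and each record is a list of "Field-Name: value" lines. Long values are folded onto continuation lines that start with whitespace. The Python sketch below parses that layout and looks up a subtag's description. The sample text is an abbreviated excerpt written for illustration. Real records carry more fields, such as Added, and the real file begins with a File-Date record. The helpers parse_records and lookup are hypothetical and do not handle every detail of the format.

```python
from typing import Dict, List, Optional

Record = Dict[str, List[str]]

# Abbreviated, illustrative excerpt; real records have more fields.
SAMPLE = """\
Type: language
Subtag: mn
Description: Mongolian
%%
Type: script
Subtag: Cyrl
Description: Cyrillic
%%
Type: region
Subtag: MN
Description: Mongolia
"""

def parse_records(text: str) -> List[Record]:
    """Split registry text into records mapping field name -> list of values."""
    records: List[Record] = []
    current: Record = {}
    last: Optional[str] = None
    for line in text.splitlines():
        if line.strip() == "%%":                 # record separator
            if current:
                records.append(current)
            current, last = {}, None
        elif line[:1] in (" ", "\t") and last is not None:
            # Folded line: continue the value of the previous field.
            current[last][-1] += " " + line.strip()
        elif ":" in line:
            name, _, body = line.partition(":")
            last = name.strip()
            # Fields such as Description may repeat, so keep a list.
            current.setdefault(last, []).append(body.strip())
    if current:
        records.append(current)
    return records

def lookup(records: List[Record], type_: str, subtag: str) -> Optional[List[str]]:
    """Return the Description values of a subtag, compared case-insensitively."""
    for rec in records:
        if rec.get("Type") == [type_] and rec.get("Subtag", [""])[0].lower() == subtag.lower():
            return rec.get("Description")
    return None

records = parse_records(SAMPLE)
print(lookup(records, "region", "mn"))   # ['Mongolia']
```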

1.2.1 Primary Language Subtag

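The primary language subtag is the first subtag of a tag and cannot be omitted, with two exceptions: tags that begin with the singleton 'x' (private use), and some grandfathered tags that begin with 'i'. It is normally two or three letters (an ISO 639 code). Four letters are reserved for future use, and five to eight letters denote a registered language subtag.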
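The length rules can be made concrete with a small hypothetical Python classifier for the first subtag. It follows the language production in Figure 1 plus the 'x' and 'i' cases. It inspects only the first subtag. For 'i' it does not check that the whole tag is one of the registered grandfathered i- tags.

```python
def primary_subtag_kind(tag: str) -> str:
    """Classify the first subtag of a tag (RFC 5646, Section 2.2.1)."""
    first = tag.split("-", 1)[0].lower()
    if first == "x":
        return "privateuse"      # the whole tag is private use
    if first == "i":
        # Only the i- tags listed as grandfathered in the ABNF are well-formed;
        # this sketch does not check the rest of the tag.
        return "grandfathered"
    if not (first.isascii() and first.isalpha()):
        return "ill-formed"
    if len(first) in (2, 3):
        return "iso639"          # shortest ISO 639 code
    if len(first) == 4:
        return "reserved"        # reserved for future use
    if 5 <= len(first) <= 8:
        return "registered"      # registered language subtag
    return "ill-formed"          # 1 letter (other than x/i) or more than 8

print(primary_subtag_kind("mn-Cyrl-MN"))  # iso639
```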

Posted 2014-06-18 13:29 by 英超