Language Tagging
Languages in Rev79 are tagged with BCP 47 language tags. This page provides an overview of how Rev79 understands language tags, and how relationships between languages are inferred by comparing language tags.
Structure of Language Tags
Language tags consist of a series of "subtags" providing information about the language:
en-Latn-AU
^^-------- Primary language subtag
   ^^^^--- Script subtag
        ^^ Region subtag
The syntax and meaning of subtags are defined in RFC 5646. Rev79 supports the subset of language tags defined by the following ABNF (taken from the RFC):
langtag    = language ["-" script] ["-" region] *("-" variant) ["-" privateuse]
           / privateuse
language   = 2*3ALPHA            ; shortest ISO 639 code
             ["-" extlang]       ; sometimes followed by
                                 ; extended language subtags
           / 4ALPHA              ; or reserved for future use
           / 5*8ALPHA            ; or registered language subtag
extlang    = 3ALPHA              ; selected ISO 639 codes
             *2("-" 3ALPHA)      ; permanently reserved
script     = 4ALPHA              ; ISO 15924 code
region     = 2ALPHA              ; ISO 3166-1 code
           / 3DIGIT              ; UN M.49 code
variant    = 5*8(ALPHA / DIGIT)  ; registered variants
           / (DIGIT 3(ALPHA / DIGIT))
privateuse = "x" 1*("-" (1*8(ALPHA / DIGIT)))
Rev79 treats language tags purely syntactically, and does not attempt to normalise or validate tags using the IANA registry. Nonetheless, to aid interoperability with other systems, care should be taken to ensure valid language/script/region subtags according to their relevant standards.
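To make the grammar concrete, the following is a minimal sketch of a purely syntactic parser for this subset, written as a regular-expression transcription of the ABNF. It is not Rev79's implementation: the names LANGTAG_RE and parse_tag are hypothetical, and case-insensitive matching is assumed because RFC 5646 treats tags as case-insensitive.

```python
import re

# Regular-expression transcription of the ABNF subset above.
# Assumption: matching is case-insensitive, as in RFC 5646.
LANGTAG_RE = re.compile(
    r"""
    ^(?:
        (?P<language>
            [a-z]{2,3}(?:-[a-z]{3}){0,3}    # 2*3ALPHA, optionally followed by extlang
          | [a-z]{4}                        # reserved for future use
          | [a-z]{5,8}                      # registered language subtag
        )
        (?:-(?P<script>[a-z]{4}))?
        (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
        (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
        (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?
      |
        (?P<privateuse_only>x(?:-[a-z0-9]{1,8})+)
    )$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_tag(tag):
    """Split a tag into its subtags, or return None if it is outside the subset."""
    m = LANGTAG_RE.match(tag)
    if m is None:
        return None
    private = m.group("privateuse") or m.group("privateuse_only")
    return {
        "language": m.group("language"),
        "script": m.group("script"),
        "region": m.group("region"),
        # "-1996-fonipa" -> ["1996", "fonipa"]; an empty match -> []
        "variants": (m.group("variants") or "").split("-")[1:],
        # Drop the leading "x" marker and keep the private-use subtags.
        "privateuse": private.split("-")[1:] if private else [],
    }
```

For example, parse_tag("en-Latn-AU") yields the language en, script Latn and region AU, while a tag containing an extension subtag such as en-a-bbb is rejected because extensions are not part of the subset. No lookup against the IANA registry is attempted, in line with the syntactic treatment described above.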
Private use subtag: HIS (ROLV codes)
Rev79 supports ROLV codes through the Harvest Information System (HIS) private-use subtag defined by Global Recordings Network. The subtag must consist of the characters HIS followed by exactly five digits (padded with leading zeroes where necessary).
kmn-x-HIS01234
^^^----------- Awtuw language
    ^--------- private-use subtag marker
      ^^^^^^^^ Kamnam variety
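The format rule for the HIS subtag can be sketched as a pair of small helpers. This is an illustration only: his_subtag and is_his_subtag are hypothetical names, and the check assumes the HIS prefix is written in upper case, as in the example above.

```python
import re

# "HIS" followed by exactly five ASCII digits.
# Assumption: the prefix is upper case, as written in the example above.
HIS_RE = re.compile(r"HIS[0-9]{5}")


def his_subtag(rolv_code: int) -> str:
    """Build the HIS subtag for a numeric ROLV code, padding with leading zeroes."""
    if not 0 <= rolv_code <= 99999:
        raise ValueError("ROLV code must fit in five digits")
    return f"HIS{rolv_code:05d}"


def is_his_subtag(subtag: str) -> bool:
    """Return True if the subtag has the HIS + five digits shape."""
    return HIS_RE.fullmatch(subtag) is not None
```

With these helpers, his_subtag(1234) produces HIS01234, and is_his_subtag rejects both HIS1234 (too few digits) and HIS012345 (too many).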
Inferring Relationships from Language Tags
In some circumstances, Rev79 will infer a relationship between two languages, so that work done on a specific language tag (e.g. kmn-x-HIS01234) can be considered as work towards a more general language tag (e.g. kmn). This inference is syntactic, based on the contents of the language tag.
A language A is considered "more specific" than another language B if language A's language tag matches language B's language tag, according to the "extended filtering" process defined in RFC 4647. Broadly speaking, this considers each subtag of the language tag in turn: a subtag matches if A and B have the same value for it (or, for the variant and private-use sets, if B's set is a subset of A's), or if B does not specify it.
Some examples:
kmn-x-HIS01234 is more specific than kmn
kmn-PG-x-HIS01234 is more specific than kmn
kmn-x-HIS01234 is not more specific than kmn-PG
en is not more specific than kmn
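The comparison described above can be sketched as follows. This is a simplified illustration rather than Rev79's implementation or a full RFC 4647 matcher: split_tag and is_more_specific are hypothetical names, subtags are classified by shape only, comparison is assumed to be case-insensitive, and extended language subtags are not handled separately.

```python
def split_tag(tag):
    """Classify subtags by shape (simplified; extlang is not handled separately)."""
    parts = tag.lower().split("-")
    fields = {"language": None, "script": None, "region": None,
              "variants": set(), "privateuse": set()}
    # Everything after the "x" marker is private use.
    if "x" in parts:
        i = parts.index("x")
        fields["privateuse"] = set(parts[i + 1:])
        parts = parts[:i]
    if parts:
        fields["language"] = parts.pop(0)
    # A four-letter subtag in this position is a script.
    if parts and len(parts[0]) == 4 and parts[0].isalpha():
        fields["script"] = parts.pop(0)
    # Two letters or three digits in this position is a region.
    if parts and ((len(parts[0]) == 2 and parts[0].isalpha())
                  or (len(parts[0]) == 3 and parts[0].isdigit())):
        fields["region"] = parts.pop(0)
    # Whatever remains is treated as the variant set.
    fields["variants"] = set(parts)
    return fields


def is_more_specific(a, b):
    """True if tag a matches tag b under the subtag-by-subtag rule above."""
    fa, fb = split_tag(a), split_tag(b)
    for key in ("language", "script", "region"):
        # A subtag that b does not specify always matches.
        if fb[key] is not None and fa[key] != fb[key]:
            return False
    # For the set-valued subtags, b's set must be a subset of a's.
    return fb["variants"] <= fa["variants"] and fb["privateuse"] <= fa["privateuse"]
```

Applied to the examples above, is_more_specific("kmn-x-HIS01234", "kmn") and is_more_specific("kmn-PG-x-HIS01234", "kmn") return True, while is_more_specific("kmn-x-HIS01234", "kmn-PG") and is_more_specific("en", "kmn") return False. In this sketch a tag also matches itself; whether Rev79 counts that case as "more specific" is not stated here.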