European Ordering Rules: Ordering of characters from Latin, Greek, Cyrillic, Georgian and Armenian scripts
Status: prEN 13710 integrating the comments from the CEN enquiry
Editor and Contact Person: Marc Wilhelm Küster (kuester [AT] fh-worms [DOT] de)
Contents
- European Ordering Rules: Ordering of characters from Latin, Greek, Cyrillic, Georgian and Armenian scripts
- Foreword
- Introduction
- 1 Scope
- 2 Normative references
- 3 Terms and definitions
- 4 Conformance
- 5 Tailorability
- 7 Bibliography
- 6 EOR Delta Table
- Annex A (informative): Principles behind the European Ordering Rules
- A.0 Introduction
- A.1 Terms and definitions
- A.2 Preparatory procedures
- A.3 The multilevel ordering procedure
- A.4 First ordering level
- A.5 Second ordering level
- A.6 Third ordering level
- A.7 Fourth ordering level
- A.8 Specific ordering sequences
- Annex B (informative): Word-by-word ordering
- Annex C (informative): Ordering by position and by style
- Annex D (informative): Mixed-script ordering with one predominant script
- Annex E (informative): Defining National Deltas based on the EOR
- Annex F (informative): Modern European Scripts / MES
- Annex G (informative): EOR Delta in LDML Syntax
Foreword
This document (prEN 13710:2009) has been prepared by Technical Committee CEN/TC 304, the secretariat of which is held by DIN.
This document is currently submitted to the CEN Enquiry.
This document will supersede ENV 13710:2001-12 European Ordering Rules — Ordering of characters from the Latin, Greek and Cyrillic scripts and CR 14400:2001-12 European Ordering Rules - Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts.
Introduction
This European Standard provides rules for ordering multilingual lists into a well-defined and intuitive sequence. These rules are intended for data from different European languages that must be brought into a predictable order that makes it easy for users from multiple cultural backgrounds to find information. The standard also serves as a basis for defining language-specific profiles that take the rules of a given language community into account while treating the complete pan-European character set in a consistent manner.
The rules have been tested and widely adopted in two predecessor specifications, ENV 13710:2000-12 European Ordering Rules — Ordering of characters from the Latin, Greek and Cyrillic scripts and its companion and extension specification CR 14400:2001-12 European Ordering Rules - Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts. This European Standard consolidates these two documents into one technically largely upwards compatible standard.
This European Standard caters for two different target groups, software implementers on the one hand and users of ordering applications on the other.
Software implementers need unambiguous, machine-processable guidelines, which can readily be loaded into existing and future ordering applications. This goal can best be achieved by defining a European default ordering table in the syntaxes of two internationally relevant specifications in the field:
- the international ordering standard ISO/IEC 14651:2007 International string ordering and comparison — Method for comparing character strings and description of the common template tailorable ordering, of which the present standard is a “delta” (section 6);
- the Locale Data Markup Language (LDML), which is used to capture (amongst other things) the collation data for the Unicode Common Locale Data Repository (CLDR). CLDR builds on the Unicode Collation Algorithm (UCA), which is technically a profile of ISO/IEC 14651:2007 (informative Annex G).
Users with no specific ICT background, however, need an explanation of the principles in a form more in line with existing national ordering standards or relevant practice. Tailoring tables can be difficult to read for human readers, so an explanation of the principles behind that table is given in the informative annexes. Users not familiar with the formal syntax of the tailoring table are advised to consult those annexes first.
The normative main part of this European Standard specifies letter-by-letter ordering of character strings. Informative Annex A presents equivalent information in a more human-oriented way. Informative Annex B deals with word-by-word ordering as a special form of ordering with multiple keys. Informative Annex C explains the use of further ordering criteria. Informative Annex D presents a widely used alternative to the main part, namely the amalgamation of several scripts in one index via implicit transliteration. Informative Annex E gives guidance on the use of this European Standard as the basis for expressing national deltas. Informative Annex F lists the underlying character repertoire for ease of reference. Informative Annex G expresses the formal delta in the LDML syntax.
Following the practice of ISO/IEC 14651, characters are referenced as UXXXX, where each X stands for a hexadecimal digit and XXXX is the code position of the character in ISO/IEC 10646. This convention is used throughout this European Standard.
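As an illustration of the UXXXX convention, the following minimal Python sketch converts between a character and its reference. The function names `to_uxxxx` and `from_uxxxx` are hypothetical and are not defined by ISO/IEC 14651 or by this European Standard. The sketch assumes code positions of at most four hexadecimal digits.

```python
def to_uxxxx(char: str) -> str:
    """Return the UXXXX reference for a single character (hypothetical helper)."""
    if len(char) != 1 or ord(char) > 0xFFFF:
        raise ValueError("expected exactly one character with a four-digit code position")
    # Four upper-case hexadecimal digits, zero-padded, prefixed with U.
    return f"U{ord(char):04X}"


def from_uxxxx(ref: str) -> str:
    """Return the character designated by a UXXXX reference (hypothetical helper)."""
    if len(ref) != 5 or ref[0] != "U":
        raise ValueError("expected a reference of the form UXXXX")
    return chr(int(ref[1:], 16))
```

For example, `to_uxxxx("þ")` yields the reference used for LATIN SMALL LETTER THORN in Annex A, and `from_uxxxx` reverses the mapping.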
1 Scope
This European Standard specifies the order between two character strings composed of characters from the Modern European Scripts (MES) collection of ISO/IEC 10646:2003 or subsets of it.
NOTE Collection 283 Modern European Scripts (MES) of ISO/IEC 10646:2003 was originally specified in CEN Workshop Agreement 13873:2000 Multilingual European Subsets of ISO/IEC 10646 as Multilingual European Subset Number 3 and was subsequently incorporated as a collection in Annex A of ISO/IEC 10646:2003 alongside its sister collections MES-1 and MES-2.
The ordering rules specified in this European Standard are applicable only to lists of data in more than one European language that are intended for a multicultural audience. They complement existing national standards or practices in the field.
2 Normative references
This European Standard incorporates by dated or undated reference provisions from other publications. These normative references are quoted at the appropriate places in the text, and the publications are listed hereafter.
All standards are subject to revision. Dated references do not always refer to subsequent amendments of the publication in question. Undated references always refer to the latest edition.
ISO/IEC 10646:2003-12, Information Technology — Universal Multi-Octet Coded Character set (UCS). Second edition
ISO 12199:2000-8, Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet
ISO/IEC 14651:2007-12, International string ordering and comparison — Method for comparing character strings and description of the common template tailorable ordering
Unicode Technical Standard #10: Unicode Collation Algorithm. Version 5.2.0 (2009-10-08)
3 Terms and definitions
For the purpose of this European Standard the following definitions of ISO/IEC 10646 and of ISO/IEC 14651 apply:
3.1
character
member of a set of elements used for the organisation, control, or representation of data [ISO/IEC 10646:2003]
3.2
character string
sequence of characters considered as a single object [ISO/IEC 14651]
3.3
collating symbol
symbol used to specify weights assigned to a collating element [ISO/IEC 14651]
3.4
collating element
sequence of one or more characters that are considered a single entity for ordering [ISO/IEC 14651]
3.5
collation table
mapping from collating elements to weighting elements [ISO/IEC 14651]
NOTE A special collation table is the Common Template Table (CTT) used in Annex A of ISO/IEC 14651 to express the default mapping from collating elements to weighting elements.
3.6
delta
list of the differences between a given collation table and another one [ISO/IEC 14651]
NOTE: The given collation table, together with a given delta, forms a new collation table. Unless otherwise specified in this European Standard, the term “delta” always refers to differences from the Common Template Table as defined in ISO/IEC 14651.
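The relationship between a collation table and a delta can be pictured with a deliberately simplified Python sketch. It models a collation table as a dictionary from collating elements to weight tuples, which is an assumption made for illustration only: an actual ISO/IEC 14651 delta also contains directives such as `reorder-after`, which this sketch does not model. The name `apply_delta` and all weights are invented.

```python
def apply_delta(base_table: dict, delta: dict) -> dict:
    """Return a new table: entries of base_table, overridden or extended by delta."""
    tailored = dict(base_table)  # copy, so the base table stays unchanged
    tailored.update(delta)       # delta entries replace or add mappings
    return tailored


# Toy data with invented weights (not taken from the Common Template Table).
base = {"a": (10,), "b": (20,), "$": (5,)}
delta = {"$": (0,), "ɐ": (10,)}
tailored = apply_delta(base, delta)
```

Entries not mentioned in the delta keep their weights from the base table, which is why a delta can stay much shorter than the table it tailors.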
3.7
ordering
process by which, given two strings, it is determined whether the first one is less than, equal to, or greater than the second one [ISO/IEC 14651]
3.8
sorting
presentation of information in a structured way
NOTE: Sorting may include the subdivision of information by subject matter, e.g. by having several registers in a book, by splitting a phone book into several sections, one for each town that falls into its purview, or by having multiple indices in a library. Ordering is in most circumstances an integral part of this procedure.
4 Conformance
In order to be conformant to this European Standard an application shall meet the requirements prescribed in section 6 of ISO/IEC 14651 and its Common Template Table ISO14651_2006_TABLE1 after the application of the EOR delta table specified in section 6 of this European Standard. An equivalent description of the resulting tailored table shall equally conform to this European Standard.
5 Tailorability
The European Ordering Rules defined in this standard can be taken as a default template which can be tailored to the needs of any European country in the manner specified by ISO/IEC 14651 (cf. also Informative Annex E).
This European Standard is not meant to influence national standards or traditions in the field of ordering, its scope being the ordering of multilingual data. Nonetheless, national standards are encouraged to express their national ordering rules in terms of this European Standard by declaring a formalized set of deviation rules (“delta”), as explained in Informative Annex E. In this way, the respective ordering rules become machine-processable and can be incorporated into international repositories of locale data, allowing for more widespread support of national ordering standards across software products.
7 Bibliography
ISO/IEC 15897:1999 Information technology – Procedures for the registration of cultural elements
Unicode Technical Standard #35: Unicode Locale Data Markup Language (LDML). Version 1.8 (2010-04-28)
6 EOR Delta Table
NOTE For the syntax of the table please consult ISO/IEC 14651.
%% EOR's EORDeltaTable
%
%% European Ordering Rules.
%
% EOR delta for MES-3 from ISO/IEC 14651:2007's CTT (ISO14651_2006_TABLE1_en.txt).
%
% EOR uses four levels for comparison
reorder-after <BASE>
% Introduce the LIG weight.
collating-symbol <LIG>
<LIG>
reorder-end
reorder-after <VRNT3>
%Introduce more variants
collating-symbol <VRNT4>
collating-symbol <VRNT5>
collating-symbol <VRNT6>
collating-symbol <VRNT7>
collating-symbol <VRNT8>
collating-symbol <VRNT9>
<VRNT4>
<VRNT5>
<VRNT6>
<VRNT7>
<VRNT8>
<VRNT9>
reorder-end
reorder-after <S0584>
%Introduce a weight for U0587 ARMENIAN SMALL LIGATURE ECH YIWN
collating-symbol <ECH-YIWN>
<ECH-YIWN>
reorder-end
reorder-after <SFFFF>
order_start forward;forward;forward;forward
% Non-alphanumeric characters (including some modifier letters):
% The DRACHMA SIGN is already in ISO14651_2006 ignorable on levels 1-3
<U0024> IGNORE;IGNORE;IGNORE;<U0024> % DOLLAR SIGN
<U00A2> IGNORE;IGNORE;IGNORE;<U00A2> % CENT SIGN
<U00A3> IGNORE;IGNORE;IGNORE;<U00A3> % POUND SIGN
<U00A4> IGNORE;IGNORE;IGNORE;<U00A4> % CURRENCY SIGN
<U00A5> IGNORE;IGNORE;IGNORE;<U00A5> % YEN SIGN
<U20A0> IGNORE;IGNORE;IGNORE;<U20A0> % EURO-CURRENCY SIGN
<U20A1> IGNORE;IGNORE;IGNORE;<U20A1> % COLON SIGN
<U20A2> IGNORE;IGNORE;IGNORE;<U20A2> % CRUZEIRO SIGN
<U20A3> IGNORE;IGNORE;IGNORE;<U20A3> % FRENCH FRANC SIGN
<U20A4> IGNORE;IGNORE;IGNORE;<U20A4> % LIRA SIGN
<U20A5> IGNORE;IGNORE;IGNORE;<U20A5> % MILL SIGN
<U20A6> IGNORE;IGNORE;IGNORE;<U20A6> % NAIRA SIGN
<U20A7> IGNORE;IGNORE;IGNORE;<U20A7> % PESETA SIGN
<U20A8> IGNORE;IGNORE;IGNORE;<U20A8> % RUPEE SIGN
<U20A9> IGNORE;IGNORE;IGNORE;<U20A9> % WON SIGN
<U20AA> IGNORE;IGNORE;IGNORE;<U20AA> % NEW SHEQEL SIGN
<U20AB> IGNORE;IGNORE;IGNORE;<U20AB> % DONG SIGN
<U20AC> IGNORE;IGNORE;IGNORE;<U20AC> % EURO SIGN
<U20AD> IGNORE;IGNORE;IGNORE;<U20AD> % KIP SIGN
<U20AE> IGNORE;IGNORE;IGNORE;<U20AE> % TUGRIK SIGN
<U20AF> IGNORE;IGNORE;IGNORE;<U20AF> % DRACHMA SIGN
<U20B0> IGNORE;IGNORE;IGNORE;<U20B0> % GERMAN PENNY SIGN
<U20B1> IGNORE;IGNORE;IGNORE;<U20B1> % PESO SIGN
<U20B2> IGNORE;IGNORE;IGNORE;<U20B2> % GUARANI SIGN
<U20B3> IGNORE;IGNORE;IGNORE;<U20B3> % AUSTRAL SIGN
<U20B4> IGNORE;IGNORE;IGNORE;<U20B4> % HRYVNIA SIGN
<U20B5> IGNORE;IGNORE;IGNORE;<U20B5> % CEDI SIGN
% Modifier letters that are not ignorable in ISO14651_2006_TABLE1_en.txt
<U02B0> IGNORE;IGNORE;IGNORE;<U02B0> % MODIFIER LETTER SMALL H
<U02B1> IGNORE;IGNORE;IGNORE;<U02B1> % MODIFIER LETTER SMALL H WITH HOOK
<U02B2> IGNORE;IGNORE;IGNORE;<U02B2> % MODIFIER LETTER SMALL J
<U02B3> IGNORE;IGNORE;IGNORE;<U02B3> % MODIFIER LETTER SMALL R
<U02B4> IGNORE;IGNORE;IGNORE;<U02B4> % MODIFIER LETTER SMALL TURNED R
<U02B5> IGNORE;IGNORE;IGNORE;<U02B5> % MODIFIER LETTER SMALL TURNED R WITH HOOK
<U02B6> IGNORE;IGNORE;IGNORE;<U02B6> % MODIFIER LETTER SMALL CAPITAL INVERTED R
<U02B7> IGNORE;IGNORE;IGNORE;<U02B7> % MODIFIER LETTER SMALL W
<U02B8> IGNORE;IGNORE;IGNORE;<U02B8> % MODIFIER LETTER SMALL Y
<U02BB> IGNORE;IGNORE;IGNORE;<U02BB> % MODIFIER LETTER TURNED COMMA
<U02BC> IGNORE;IGNORE;IGNORE;<U02BC> % MODIFIER LETTER APOSTROPHE
<U02BD> IGNORE;IGNORE;IGNORE;<U02BD> % MODIFIER LETTER REVERSED COMMA
<U02BE> IGNORE;IGNORE;IGNORE;<U02BE> % MODIFIER LETTER RIGHT HALF RING
<U02BF> IGNORE;IGNORE;IGNORE;<U02BF> % MODIFIER LETTER LEFT HALF RING
<U02C0> IGNORE;IGNORE;IGNORE;<U02C0> % MODIFIER LETTER GLOTTAL STOP
<U02C1> IGNORE;IGNORE;IGNORE;<U02C1> % MODIFIER LETTER REVERSED GLOTTAL STOP
<U02D0> IGNORE;IGNORE;IGNORE;<U02D0> % MODIFIER LETTER TRIANGULAR COLON
<U02D1> IGNORE;IGNORE;IGNORE;<U02D1> % MODIFIER LETTER HALF TRIANGULAR COLON
<U02E0> IGNORE;IGNORE;IGNORE;<U02E0> % MODIFIER LETTER SMALL GAMMA
<U02E1> IGNORE;IGNORE;IGNORE;<U02E1> % MODIFIER LETTER SMALL L
<U02E2> IGNORE;IGNORE;IGNORE;<U02E2> % MODIFIER LETTER SMALL S
<U02E4> IGNORE;IGNORE;IGNORE;<U02E4> % MODIFIER LETTER SMALL REVERSED GLOTTAL STOP
<U02EE> IGNORE;IGNORE;IGNORE;<U02EE> % MODIFIER LETTER DOUBLE APOSTROPHE
<U0294> IGNORE;IGNORE;IGNORE;<U0294> % LATIN LETTER GLOTTAL STOP
<U0295> IGNORE;IGNORE;IGNORE;<U0295> % LATIN LETTER PHARYNGEAL VOICED FRICATIVE
<U0296> IGNORE;IGNORE;IGNORE;<U0296> % LATIN LETTER INVERTED GLOTTAL STOP
<U0298> IGNORE;IGNORE;IGNORE;<U0298> % LATIN LETTER BILABIAL CLICK
<U02A1> IGNORE;IGNORE;IGNORE;<U02A1> % LATIN LETTER GLOTTAL STOP WITH STROKE
<U02A2> IGNORE;IGNORE;IGNORE;<U02A2> % LATIN LETTER REVERSED GLOTTAL STOP WITH STROKE
%%
% Latin
% Almost all changes here result from CEN/TC304's resolution
% for the Latin script part of the Modern European Scripts / MES-3 to
% treat only the letters a to z and thorn as distinct on the first
% level and treat other combinations as variants or ligatures
<U0250> <S0061>;"<BASE><VRNT1>";"<MIN><MIN>";<U0250> % LATIN SMALL LETTER TURNED A
<U0251> <S0061>;"<BASE><VRNT2>";"<MIN><MIN>";<U0251> % LATIN SMALL LETTER ALPHA
<U0252> <S0061>;"<BASE><VRNT3>";"<MIN><MIN>";<U0252> % LATIN SMALL LETTER TURNED ALPHA
<U0299> <S0062>;"<BASE><VRNT1>";"<MIN><MIN>";<U0299> % LATIN LETTER SMALL CAPITAL B
<U0180> <S0062>;"<BASE><VRNT2>";"<MIN><MIN>";<U0180> % LATIN SMALL LETTER B WITH STROKE
<U0243> <S0062>;"<BASE><VRNT2>";"<CAP><MIN>";<U0243> % LATIN CAPITAL LETTER B WITH STROKE
<U0253> <S0062>;"<BASE><VRNT3>";"<MIN><MIN>";<U0253> % LATIN SMALL LETTER B WITH HOOK
<U0181> <S0062>;"<BASE><VRNT3>";"<CAP><MIN>";<U0181> % LATIN CAPITAL LETTER B WITH HOOK
<U0183> <S0062>;"<BASE><VRNT4>";"<MIN><MIN>";<U0183> % LATIN SMALL LETTER B WITH TOPBAR
<U0182> <S0062>;"<BASE><VRNT4>";"<CAP><MIN>";<U0182> % LATIN CAPITAL LETTER B WITH TOPBAR
<U0188> <S0063>;"<BASE><VRNT1>";"<MIN><MIN>";<U0188> % LATIN SMALL LETTER C WITH HOOK
<U0187> <S0063>;"<BASE><VRNT1>";"<CAP><MIN>";<U0187> % LATIN CAPITAL LETTER C WITH HOOK
<U0255> <S0063>;"<BASE><VRNT2>";"<MIN><MIN>";<U0255> % LATIN SMALL LETTER C WITH CURL
<U0297> <S0063>;"<BASE><VRNT3>";"<MIN><MIN>";<U0297> % LATIN LETTER STRETCHED C
% <VRNT1> is used for U00F0 LATIN SMALL LETTER ETH (already in CTT)
<U0256> <S0064>;"<BASE><VRNT2>";"<MIN><MIN>";<U0256> % LATIN SMALL LETTER D WITH TAIL
<U0189> <S0064>;"<BASE><VRNT2>";"<CAP><MIN>";<U0189> % LATIN CAPITAL LETTER AFRICAN D
<U0257> <S0064>;"<BASE><VRNT3>";"<MIN><MIN>";<U0257> % LATIN SMALL LETTER D WITH HOOK
<U018A> <S0064>;"<BASE><VRNT3>";"<CAP><MIN>";<U018A> % LATIN CAPITAL LETTER D WITH HOOK
<U018C> <S0064>;"<BASE><VRNT4>";"<MIN><MIN>";<U018C> % LATIN SMALL LETTER D WITH TOPBAR
<U018B> <S0064>;"<BASE><VRNT4>";"<CAP><MIN>";<U018B> % LATIN CAPITAL LETTER D WITH TOPBAR
<U0221> <S0064>;"<BASE><VRNT5>";"<MIN><MIN>";<U0221> % LATIN SMALL LETTER D WITH CURL
<U018D> <S0064>;"<BASE><VRNT6>";"<MIN><MIN>";<U018D> % LATIN SMALL LETTER TURNED DELTA
<U02A5> "<S0064><S007A>";"<BASE><BASE><VRNT4>";"<COMPAT><COMPAT><COMPAT>";<U02A5> % LATIN SMALL LETTER DZ DIGRAPH WITH CURL
<U02A4> "<S0064><S007A>";"<BASE><BASE><VRNT5>";"<COMPAT><COMPAT><COMPAT>";<U02A4> % LATIN SMALL LETTER DEZH DIGRAPH
<U01DD> <S0065>;"<BASE><VRNT1>";"<MIN><MIN>";<U01DD> % LATIN SMALL LETTER TURNED E
<U018E> <S0065>;"<BASE><VRNT1>";"<CAP><MIN>";<U018E> % LATIN CAPITAL LETTER REVERSED E
<U0259> <S0065>;"<BASE><VRNT2>";"<MIN><MIN>";<U0259> % LATIN SMALL LETTER SCHWA
<U018F> <S0065>;"<BASE><VRNT2>";"<CAP><MIN>";<U018F> % LATIN CAPITAL LETTER SCHWA
<U025B> <S0065>;"<BASE><VRNT3>";"<MIN><MIN>";<U025B> % LATIN SMALL LETTER OPEN E
<U0190> <S0065>;"<BASE><VRNT3>";"<CAP><MIN>";<U0190> % LATIN CAPITAL LETTER OPEN E
<U0258> <S0065>;"<BASE><VRNT4>";"<MIN><MIN>";<U0258> % LATIN SMALL LETTER REVERSED E
<U025A> <S0065>;"<BASE><VRNT5>";"<MIN><MIN>";<U025A> % LATIN SMALL LETTER SCHWA WITH HOOK
<U025C> <S0065>;"<BASE><VRNT6>";"<MIN><MIN>";<U025C> % LATIN SMALL LETTER REVERSED OPEN E
<U025D> <S0065>;"<BASE><VRNT7>";"<MIN><MIN>";<U025D> % LATIN SMALL LETTER REVERSED OPEN E WITH HOOK
<U025E> <S0065>;"<BASE><VRNT8>";"<MIN><MIN>";<U025E> % LATIN SMALL LETTER CLOSED REVERSED OPEN E
<U029A> <S0065>;"<BASE><VRNT9>";"<MIN><MIN>";<U029A> % LATIN SMALL LETTER CLOSED OPEN E
<U0192> <S0066>;"<BASE><VRNT1>";"<MIN><MIN>";<U0192> % LATIN SMALL LETTER F WITH HOOK
<U0191> <S0066>;"<BASE><VRNT1>";"<CAP><MIN>";<U0191> % LATIN CAPITAL LETTER F WITH HOOK
<U0261> <S0067>;"<BASE><VRNT1>";"<MIN><MIN>";<U0261> % LATIN SMALL LETTER SCRIPT G
<U0262> <S0067>;"<BASE><VRNT2>";"<MIN><MIN>";<U0262> % LATIN LETTER SMALL CAPITAL G
<U01E5> <S0067>;"<BASE><VRNT3>";"<MIN><MIN>";<U01E5> % LATIN SMALL LETTER G WITH STROKE
<U01E4> <S0067>;"<BASE><VRNT3>";"<CAP><MIN>";<U01E4> % LATIN CAPITAL LETTER G WITH STROKE
<U0260> <S0067>;"<BASE><VRNT4>";"<MIN><MIN>";<U0260> % LATIN SMALL LETTER G WITH HOOK
<U0193> <S0067>;"<BASE><VRNT4>";"<CAP><MIN>";<U0193> % LATIN CAPITAL LETTER G WITH HOOK
<U029B> <S0067>;"<BASE><VRNT5>";"<MIN><MIN>";<U029B> % LATIN LETTER SMALL CAPITAL G WITH HOOK
<U0263> <S0067>;"<BASE><VRNT6>";"<MIN><MIN>";<U0263> % LATIN SMALL LETTER GAMMA
<U0194> <S0067>;"<BASE><VRNT6>";"<CAP><MIN>";<U0194> % LATIN CAPITAL LETTER GAMMA
<U0264> <S0067>;"<BASE><VRNT7>";"<MIN><MIN>";<U0264> % LATIN SMALL LETTER RAMS HORN
<U01A3> <S0067>;"<BASE><VRNT8>";"<MIN><MIN>";<U01A3> % LATIN SMALL LETTER OI
<U01A2> <S0067>;"<BASE><VRNT8>";"<CAP><MIN>";<U01A2> % LATIN CAPITAL LETTER OI
<U029C> <S0068>;"<BASE><VRNT1>";"<MIN><MIN>";<U029C> % LATIN LETTER SMALL CAPITAL H
<U0266> <S0068>;"<BASE><VRNT2>";"<MIN><MIN>";<U0266> % LATIN SMALL LETTER H WITH HOOK
<U0267> <S0068>;"<BASE><VRNT3>";"<MIN><MIN>";<U0267> % LATIN SMALL LETTER HENG WITH HOOK
<U0265> <S0068>;"<BASE><VRNT4>";"<MIN><MIN>";<U0265> % LATIN SMALL LETTER TURNED H
<U02AE> <S0068>;"<BASE><VRNT5>";"<MIN><MIN>";<U02AE> % LATIN SMALL LETTER TURNED H WITH FISHHOOK
<U02AF> <S0068>;"<BASE><VRNT6>";"<MIN><MIN>";<U02AF> % LATIN SMALL LETTER TURNED H WITH FISHHOOK AND TAIL
<U0195> "<S0068><S0076>";"<BASE><BASE>";"<MIN><MIN>";<U0195> % LATIN SMALL LETTER HV
<U01F6> "<S0068><S0076>";"<BASE><BASE>";"<CAP><MIN>";<U01F6> % LATIN CAPITAL LETTER HWAIR
<U0131> <S0069>;"<BASE><VRNT1>";"<MIN><MIN>";<U0131> % LATIN SMALL LETTER DOTLESS I
<U026A> <S0069>;"<BASE><VRNT2>";"<MIN><MIN>";<U026A> % LATIN LETTER SMALL CAPITAL I
<U0268> <S0069>;"<BASE><VRNT3>";"<MIN><MIN>";<U0268> % LATIN SMALL LETTER I WITH STROKE
<U0197> <S0069>;"<BASE><VRNT3>";"<CAP><MIN>";<U0197> % LATIN CAPITAL LETTER I WITH STROKE
<U0269> <S0069>;"<BASE><VRNT4>";"<MIN><MIN>";<U0269> % LATIN SMALL LETTER IOTA
<U0196> <S0069>;"<BASE><VRNT4>";"<CAP><MIN>";<U0196> % LATIN CAPITAL LETTER IOTA
<U029D> <S006A>;"<BASE><VRNT1>";"<MIN><MIN>";<U029D> % LATIN SMALL LETTER J WITH CROSSED-TAIL
<U025F> <S006A>;"<BASE><VRNT2>";"<MIN><MIN>";<U025F> % LATIN SMALL LETTER DOTLESS J WITH STROKE
<U0284> <S006A>;"<BASE><VRNT3>";"<MIN><MIN>";<U0284> % LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK
<U0199> <S006B>;"<BASE><VRNT1>";"<MIN><MIN>";<U0199> % LATIN SMALL LETTER K WITH HOOK
<U0198> <S006B>;"<BASE><VRNT1>";"<CAP><MIN>";<U0198> % LATIN CAPITAL LETTER K WITH HOOK
<U0138> <S006B>;"<BASE><VRNT2>";"<MIN><MIN>";<U0138> % LATIN SMALL LETTER KRA
<U029E> <S006B>;"<BASE><VRNT3>";"<MIN><MIN>";<U029E> % LATIN SMALL LETTER TURNED K
%<VRNT1> is used for U0140 LATIN SMALL LETTER L WITH MIDDLE DOT (already in CTT)
<U029F> <S006C>;"<BASE><VRNT2>";"<MIN><MIN>";<U029F> % LATIN LETTER SMALL CAPITAL L
<U019A> <S006C>;"<BASE><VRNT3>";"<MIN><MIN>";<U019A> % LATIN SMALL LETTER L WITH BAR
<U023D> <S006C>;"<BASE><VRNT3>";"<CAP><MIN>";<U023D> % LATIN CAPITAL LETTER L WITH BAR
<U026B> <S006C>;"<BASE><VRNT4>";"<MIN><MIN>";<U026B> % LATIN SMALL LETTER L WITH MIDDLE TILDE
<U026C> <S006C>;"<BASE><VRNT5>";"<MIN><MIN>";<U026C> % LATIN SMALL LETTER L WITH BELT
<U026D> <S006C>;"<BASE><VRNT6>";"<MIN><MIN>";<U026D> % LATIN SMALL LETTER L WITH RETROFLEX HOOK
<U0234> <S006C>;"<BASE><VRNT7>";"<MIN><MIN>";<U0234> % LATIN SMALL LETTER L WITH CURL
<U019B> <S006C>;"<BASE><VRNT8>";"<MIN><MIN>";<U019B> % LATIN SMALL LETTER LAMBDA WITH STROKE
<U026E> "<S006C><S007A>";"<BASE><BASE><VRNT5>";"<MIN><MIN><MIN>";<U026E> % LATIN SMALL LETTER LEZH
<U0271> <S006D>;"<BASE><VRNT1>";"<MIN><MIN>";<U0271> % LATIN SMALL LETTER M WITH HOOK
<U026F> <S006D>;"<BASE><VRNT2>";"<MIN><MIN>";<U026F> % LATIN SMALL LETTER TURNED M
<U019C> <S006D>;"<BASE><VRNT2>";"<CAP><MIN>";<U019C> % LATIN CAPITAL LETTER TURNED M
<U0270> <S006D>;"<BASE><VRNT3>";"<MIN><MIN>";<U0270> % LATIN SMALL LETTER TURNED M WITH LONG LEG
<U0149> <S006E>;"<BASE><VRNT1>";"<MIN><MIN>";<U0149> % LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
<U0274> <S006E>;"<BASE><VRNT2>";"<MIN><MIN>";<U0274> % LATIN LETTER SMALL CAPITAL N
<U0272> <S006E>;"<BASE><VRNT3>";"<MIN><MIN>";<U0272> % LATIN SMALL LETTER N WITH LEFT HOOK
<U019D> <S006E>;"<BASE><VRNT3>";"<CAP><MIN>";<U019D> % LATIN CAPITAL LETTER N WITH LEFT HOOK
<U019E> <S006E>;"<BASE><VRNT4>";"<MIN><MIN>";<U019E> % LATIN SMALL LETTER N WITH LONG RIGHT LEG
<U0220> <S006E>;"<BASE><VRNT4>";"<CAP><MIN>";<U0220> % LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
<U0273> <S006E>;"<BASE><VRNT5>";"<MIN><MIN>";<U0273> % LATIN SMALL LETTER N WITH RETROFLEX HOOK
<U0235> <S006E>;"<BASE><VRNT6>";"<MIN><MIN>";<U0235> % LATIN SMALL LETTER N WITH CURL
<U014B> <S006E>;"<BASE><VRNT7>";"<MIN><MIN>";<U014B> % LATIN SMALL LETTER ENG
<U014A> <S006E>;"<BASE><VRNT7>";"<CAP><MIN>";<U014A> % LATIN CAPITAL LETTER ENG
% <VRNT1> is used for U0153 LATIN SMALL LIGATURE OE (already in CTT)
<U0254> <S006F>;"<BASE><VRNT2>";"<MIN><MIN>";<U0254> % LATIN SMALL LETTER OPEN O
<U0186> <S006F>;"<BASE><VRNT2>";"<CAP><MIN>";<U0186> % LATIN CAPITAL LETTER OPEN O
<U0275> <S006F>;"<BASE><VRNT3>";"<MIN><MIN>";<U0275> % LATIN SMALL LETTER BARRED O
<U019F> <S006F>;"<BASE><VRNT3>";"<CAP><MIN>";<U019F> % LATIN CAPITAL LETTER O WITH MIDDLE TILDE
<U0277> <S006F>;"<BASE><VRNT4>";"<MIN><MIN>";<U0277> % LATIN SMALL LETTER CLOSED OMEGA
<U0223> <S006F>;"<BASE><VRNT5>";"<MIN><MIN>";<U0223> % LATIN SMALL LETTER OU
<U0222> <S006F>;"<BASE><VRNT5>";"<CAP><MIN>";<U0222> % LATIN CAPITAL LETTER OU
<U0276> "<S006F><S0065>";"<BASE><VRNT1><BASE>";"<COMPAT><COMPAT><COMPAT>";<U0276> % LATIN LETTER SMALL CAPITAL OE
<U01A5> <S0070>;"<BASE><VRNT1>";"<MIN><MIN>";<U01A5> % LATIN SMALL LETTER P WITH HOOK
<U01A4> <S0070>;"<BASE><VRNT1>";"<CAP><MIN>";<U01A4> % LATIN CAPITAL LETTER P WITH HOOK
<U0278> <S0070>;"<BASE><VRNT2>";"<MIN><MIN>";<U0278> % LATIN SMALL LETTER PHI
<U02A0> <S0071>;"<BASE><VRNT1>";"<MIN><MIN>";<U02A0> % LATIN SMALL LETTER Q WITH HOOK
<U0280> <S0072>;"<BASE><VRNT1>";"<MIN><MIN>";<U0280> % LATIN LETTER SMALL CAPITAL R
<U01A6> <S0072>;"<BASE><VRNT1>";"<CAP><MIN>";<U01A6> % LATIN LETTER YR
<U0279> <S0072>;"<BASE><VRNT2>";"<MIN><MIN>";<U0279> % LATIN SMALL LETTER TURNED R
<U027A> <S0072>;"<BASE><VRNT3>";"<MIN><MIN>";<U027A> % LATIN SMALL LETTER TURNED R WITH LONG LEG
<U027B> <S0072>;"<BASE><VRNT4>";"<MIN><MIN>";<U027B> % LATIN SMALL LETTER TURNED R WITH HOOK
<U027C> <S0072>;"<BASE><VRNT5>";"<MIN><MIN>";<U027C> % LATIN SMALL LETTER R WITH LONG LEG
<U027D> <S0072>;"<BASE><VRNT6>";"<MIN><MIN>";<U027D> % LATIN SMALL LETTER R WITH TAIL
<U027E> <S0072>;"<BASE><VRNT7>";"<MIN><MIN>";<U027E> % LATIN SMALL LETTER R WITH FISHHOOK
<U027F> <S0072>;"<BASE><VRNT8>";"<MIN><MIN>";<U027F> % LATIN SMALL LETTER REVERSED R WITH FISHHOOK
<U0281> <S0072>;"<BASE><VRNT9>";"<MIN><MIN>";<U0281> % LATIN LETTER SMALL CAPITAL INVERTED R
% <VRNT1> is used for U00DF LATIN SMALL LETTER SHARP S (already in CTT)
% <VRNT2> is used for U017F LATIN SMALL LETTER LONG S (already in CTT)
<U0282> <S0073>;"<BASE><VRNT3>";"<MIN><MIN>";<U0282> % LATIN SMALL LETTER S WITH HOOK
<U0283> <S0073>;"<BASE><VRNT4>";"<MIN><MIN>";<U0283> % LATIN SMALL LETTER ESH
<U01A9> <S0073>;"<BASE><VRNT4>";"<CAP><MIN>";<U01A9> % LATIN CAPITAL LETTER ESH
<U01AA> <S0073>;"<BASE><VRNT5>";"<MIN><MIN>";<U01AA> % LATIN LETTER REVERSED ESH LOOP
<U0285> <S0073>;"<BASE><VRNT6>";"<MIN><MIN>";<U0285> % LATIN SMALL LETTER SQUAT REVERSED ESH
<U0286> <S0073>;"<BASE><VRNT7>";"<MIN><MIN>";<U0286> % LATIN SMALL LETTER ESH WITH CURL
<U0167> <S0074>;"<BASE><VRNT1>";"<MIN><MIN>";<U0167> % LATIN SMALL LETTER T WITH STROKE
<U0166> <S0074>;"<BASE><VRNT1>";"<CAP><MIN>";<U0166> % LATIN CAPITAL LETTER T WITH STROKE
<U01AB> <S0074>;"<BASE><VRNT2>";"<MIN><MIN>";<U01AB> % LATIN SMALL LETTER T WITH PALATAL HOOK
<U01AD> <S0074>;"<BASE><VRNT3>";"<MIN><MIN>";<U01AD> % LATIN SMALL LETTER T WITH HOOK
<U01AC> <S0074>;"<BASE><VRNT3>";"<CAP><MIN>";<U01AC> % LATIN CAPITAL LETTER T WITH HOOK
<U0288> <S0074>;"<BASE><VRNT4>";"<MIN><MIN>";<U0288> % LATIN SMALL LETTER T WITH RETROFLEX HOOK
<U01AE> <S0074>;"<BASE><VRNT4>";"<CAP><MIN>";<U01AE> % LATIN CAPITAL LETTER T WITH RETROFLEX HOOK
<U0236> <S0074>;"<BASE><VRNT5>";"<MIN><MIN>";<U0236> % LATIN SMALL LETTER T WITH CURL
<U0287> <S0074>;"<BASE><VRNT6>";"<MIN><MIN>";<U0287> % LATIN SMALL LETTER TURNED T
<U02A8> "<S0074><S0063>";"<BASE><BASE><VRNT2>";"<COMPAT><COMPAT><COMPAT>";<U02A8> % LATIN SMALL LETTER TC DIGRAPH WITH CURL
<U0289> <S0075>;"<BASE><VRNT1>";"<MIN><MIN>";<U0289> % LATIN SMALL LETTER U BAR
<U0244> <S0075>;"<BASE><VRNT1>";"<CAP><MIN>";<U0244> % LATIN CAPITAL LETTER U BAR
<U028A> <S0075>;"<BASE><VRNT2>";"<MIN><MIN>";<U028A> % LATIN SMALL LETTER UPSILON
<U01B1> <S0075>;"<BASE><VRNT2>";"<CAP><MIN>";<U01B1> % LATIN CAPITAL LETTER UPSILON
<U028B> <S0076>;"<BASE><VRNT1>";"<MIN><MIN>";<U028B> % LATIN SMALL LETTER V WITH HOOK
<U01B2> <S0076>;"<BASE><VRNT1>";"<CAP><MIN>";<U01B2> % LATIN CAPITAL LETTER V WITH HOOK
<U028C> <S0076>;"<BASE><VRNT2>";"<MIN><MIN>";<U028C> % LATIN SMALL LETTER TURNED V
<U0245> <S0076>;"<BASE><VRNT2>";"<CAP><MIN>";<U0245> % LATIN CAPITAL LETTER TURNED V
<U028D> <S0077>;"<BASE><VRNT1>";"<MIN><MIN>";<U028D> % LATIN SMALL LETTER TURNED W
<U01BF> <S0077>;"<BASE><VRNT2>";"<MIN><MIN>";<U01BF> % LATIN LETTER WYNN
<U01F7> <S0077>;"<BASE><VRNT2>";"<CAP><MIN>";<U01F7> % LATIN CAPITAL LETTER WYNN
<U028F> <S0079>;"<BASE><VRNT1>";"<MIN><MIN>";<U028F> % LATIN LETTER SMALL CAPITAL Y
<U01B4> <S0079>;"<BASE><VRNT2>";"<MIN><MIN>";<U01B4> % LATIN SMALL LETTER Y WITH HOOK
<U01B3> <S0079>;"<BASE><VRNT2>";"<CAP><MIN>";<U01B3> % LATIN CAPITAL LETTER Y WITH HOOK
<U028E> <S0079>;"<BASE><VRNT3>";"<MIN><MIN>";<U028E> % LATIN SMALL LETTER TURNED Y
<U021D> <S0079>;"<BASE><VRNT4>";"<MIN><MIN>";<U021D> % LATIN SMALL LETTER YOGH
<U021C> <S0079>;"<BASE><VRNT4>";"<CAP><MIN>";<U021C> % LATIN CAPITAL LETTER YOGH
<U01B6> <S007A>;"<BASE><VRNT1>";"<MIN><MIN>";<U01B6> % LATIN SMALL LETTER Z WITH STROKE
<U01B5> <S007A>;"<BASE><VRNT1>";"<CAP><MIN>";<U01B5> % LATIN CAPITAL LETTER Z WITH STROKE
<U0225> <S007A>;"<BASE><VRNT2>";"<MIN><MIN>";<U0225> % LATIN SMALL LETTER Z WITH HOOK
<U0224> <S007A>;"<BASE><VRNT2>";"<CAP><MIN>";<U0224> % LATIN CAPITAL LETTER Z WITH HOOK
<U0290> <S007A>;"<BASE><VRNT3>";"<MIN><MIN>";<U0290> % LATIN SMALL LETTER Z WITH RETROFLEX HOOK
<U0291> <S007A>;"<BASE><VRNT4>";"<MIN><MIN>";<U0291> % LATIN SMALL LETTER Z WITH CURL
<U0292> <S007A>;"<BASE><VRNT5>";"<MIN><MIN>";<U0292> % LATIN SMALL LETTER EZH
<U01B7> <S007A>;"<BASE><VRNT5>";"<CAP><MIN>";<U01B7> % LATIN CAPITAL LETTER EZH
<U01EF> <S007A>;"<BASE><VRNT5><CARON>";"<MIN><MIN><MIN>";<U01EF> % LATIN SMALL LETTER EZH WITH CARON
<U01EE> <S007A>;"<BASE><VRNT5><CARON>";"<CAP><MIN><MIN>";<U01EE> % LATIN CAPITAL LETTER EZH WITH CARON
<U01B9> <S007A>;"<BASE><VRNT6>";"<MIN><MIN>";<U01B9> % LATIN SMALL LETTER EZH REVERSED
<U01B8> <S007A>;"<BASE><VRNT6>";"<CAP><MIN>";<U01B8> % LATIN CAPITAL LETTER EZH REVERSED
<U01BA> <S007A>;"<BASE><VRNT7>";"<MIN><MIN>";<U01BA> % LATIN SMALL LETTER EZH WITH TAIL
<U0293> <S007A>;"<BASE><VRNT8>";"<MIN><MIN>";<U0293> % LATIN SMALL LETTER EZH WITH CURL
% Greek
% ISO14651_2006_TABLE1_en.txt now contains the tailorings of CR 14400 in its CTT
% Full conformance with GOST requirements for Cyrillic letters
<U0453> <S0452>;"<BASE><VRNT1>";"<MIN><MIN>";<U0453> % CYRILLIC SMALL LETTER GJE
<U0403> <S0452>;"<BASE><VRNT1>";"<CAP><MIN>";<U0403> % CYRILLIC CAPITAL LETTER GJE
<U045C> <S045B>;"<BASE><VRNT1>";"<MIN><MIN>";<U045C> % CYRILLIC SMALL LETTER KJE
<U040C> <S045B>;"<BASE><VRNT1>";"<CAP><MIN>";<U040C> % CYRILLIC CAPITAL LETTER KJE
% Georgian: Identical to ISO14651_2006_TABLE1_en.txt
% Armenian:
<U0587> <ECH-YIWN>;<BASE>;<CAP>;<U0587> % ARMENIAN SMALL LIGATURE ECH YIWN
reorder-end %% for EOR's EORDeltaTable
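The weighting entries of the table above follow a regular pattern: a character reference, four semicolon-separated weight levels, and a trailing comment. The Python sketch below shows one possible way to read such an entry. It is not a parser for the full ISO/IEC 14651 syntax (directives such as `reorder-after` and `collating-symbol` are not handled), and the name `parse_entry` is hypothetical.

```python
import re

# A symbol is anything between angle brackets, e.g. <S0061> or <ECH-YIWN>.
SYMBOL = re.compile(r"<([^<>]+)>")


def parse_entry(line: str):
    """Split one weighting entry into (character, four weight levels, comment)."""
    body, _, comment = line.partition("%")
    char_ref, weights = body.split(None, 1)
    levels = []
    for field in weights.strip().split(";"):
        field = field.strip().strip('"')
        # IGNORE means the character carries no weight on that level.
        levels.append(() if field == "IGNORE" else tuple(SYMBOL.findall(field)))
    if len(levels) != 4:
        raise ValueError("expected four weight levels")
    match = SYMBOL.fullmatch(char_ref)
    if match is None:
        raise ValueError("expected a character reference such as <U0250>")
    return match.group(1), tuple(levels), comment.strip()


turned_a = parse_entry(
    '<U0250> <S0061>;"<BASE><VRNT1>";"<MIN><MIN>";<U0250> % LATIN SMALL LETTER TURNED A'
)
dollar = parse_entry("<U0024> IGNORE;IGNORE;IGNORE;<U0024> % DOLLAR SIGN")
```

Read this way, the entry for U0250 shares its first-level weight `S0061` with the letter a and is told apart only from the second level onwards, whereas the DOLLAR SIGN contributes nothing on levels 1 to 3 and is distinguished on the fourth level alone.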
Annex A (informative): Principles behind the European Ordering Rules
A.0 Introduction
This annex aims to present the information inherent in section 6 in a more accessible form for those who are interested in the principles guiding the composition of the table. Those readers not concerned with implementation details may take this more traditional treatment of the matter as an authoritative interpretation of the body of this European Standard.
A.1 Terms and definitions
For the purpose of this annex, the following definitions apply in addition to those in the body of this European Standard (see section 3).
A.1.1
digit
any of the characters 0 (U0030), 1 (U0031), 2 (U0032), 3 (U0033), 4 (U0034), 5 (U0035), 6 (U0036), 7 (U0037), 8 (U0038), 9 (U0039)
A.1.2
letter
character used to represent (either alone or in combination) sounds or sequences of sounds of a natural language in writing
NOTE: Here equivalent to all characters of the Multilingual European Subset No 3 whose name contains one of the words LETTER or LIGATURE
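The definitions of digit (A.1.1) and letter (A.1.2) can be approximated in a few lines of Python. This is a sketch under stated assumptions: it uses the character names from Python's `unicodedata` module, whose Unicode version is not necessarily aligned with ISO/IEC 10646:2003, and it does not check that a character actually belongs to the Multilingual European Subset No 3. The function names are hypothetical.

```python
import unicodedata


def is_digit(char: str) -> bool:
    """A.1.1: one of the ten characters U0030 to U0039."""
    return len(char) == 1 and "0" <= char <= "9"


def is_letter(char: str) -> bool:
    """A.1.2 NOTE: the character name contains the word LETTER or LIGATURE.

    Assumption: membership in MES-3 is NOT verified here.
    """
    # Unnamed characters yield an empty name and are therefore not letters.
    words = unicodedata.name(char, "").split()
    return "LETTER" in words or "LIGATURE" in words
```

Under these assumptions, `is_letter("œ")` is true because the name of U0153 contains the word LIGATURE, while `is_letter("€")` is false.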
A.1.3
first level letter
character that is a member of the following list of letters:
Latin script:
- a (U0061), A (U0041), b (U0062), B (U0042), c (U0063), C (U0043), d (U0064), D (U0044), e (U0065), E (U0045), f (U0066), F (U0046), g (U0067), G (U0047), h (U0068), H (U0048), i (U0069), I (U0049), j (U006A), J (U004A), k (U006B), K (U004B), l (U006C), L (U004C), m (U006D), M (U004D), n (U006E), N (U004E), o (U006F), O (U004F), p (U0070), P (U0050), q (U0071), Q (U0051), r (U0072), R (U0052), s (U0073), S (U0053), t (U0074), T (U0054), u (U0075), U (U0055), v (U0076), V (U0056), w (U0077), W (U0057), x (U0078), X (U0058), y (U0079), Y (U0059), z (U007A), Z (U005A), þ (U00FE), Þ (U00DE)
Greek script:
α (U03B1), Α (U0391), β (U03B2), Β (U0392), γ (U03B3), Γ (U0393), δ (U03B4), Δ (U0394), ε (U03B5), Ε (U0395), Ϝ (U03DC), Ϛ (U03DA), ζ (U03B6), Ζ (U0396), η (U03B7), Η (U0397), θ (U03B8), Θ (U0398), ι (U03B9), Ι (U0399), κ (U03BA), Κ (U039A), λ (U03BB), Λ (U039B), μ (U03BC), Μ (U039C), ν (U03BD), Ν (U039D), ξ (U03BE), Ξ (U039E), ο (U03BF), Ο (U039F), π (U03C0), Π (U03A0), Ϟ (U03DE), ρ (U03C1), Ρ (U03A1), σ (U03C3), Σ (U03A3), τ (U03C4), Τ (U03A4), υ (U03C5), Υ (U03A5), φ (U03C6), Φ (U03A6), χ (U03C7), Χ (U03A7), ψ (U03C8), Ψ (U03A8), ω (U03C9), Ω (U03A9), Ϡ (U03E0)
NOTE Stigma Ϛ (U03DA / U03DB), Qoppa Ϟ (U03DE / U03DF) and Sampi Ϡ (U03E0 / U03E1) are archaic letters that are currently used to designate numerals. Digamma Ϝ (U03DC) is not used in any modern language.
NOTE Through collection 9 "Greek Symbols and Coptic" of ISO/IEC 10646:2003, MES also contains a number of Coptic letters. Their order is specified in ISO/IEC 14651:2007.
Cyrillic script:
а (U0430), А (U0410), ӑ (U04D1), Ӑ (U04D0), ӓ (U04D3), Ӓ (U04D2), ә (U04D9), Ә (U04D8), ӛ (U04DB), Ӛ (U04DA), ӕ (U04D5), Ӕ (U04D4), б (U0431), Б (U0411), в (U0432), В (U0412), г (U0433), Г (U0413), ғ (U0493), Ғ (U0492), ҕ (U0495), Ҕ (U0494), д (U0434), Д (U0414), ђ (U0452), Ђ (U0402), ҙ (U0499), Ҙ (U0498), е (U0435), Е (U0415), ӗ (U04D7), Ӗ (U04D6), є (U0454), Є (U0404), ж (U0436), Ж (U0416), ӝ (U04DD), Ӝ (U04DC), җ (U0497), Җ (U0496), з (U0437), З (U0417), ӟ (U04DF), Ӟ (U04DE), ѕ (U0455), Ѕ (U0405), ӡ (U04E1), Ӡ (U04E0), и (U0438), И (U0418), ӥ (U04E5), Ӥ (U04E4), і (U0456), І (U0406), ї (U0457), Ї (U0407), й (U0439), Й (U0419), ҋ (U048B), Ҋ (U048A), ј (U0458), Ј (U0408), к (U043A), К (U041A), қ (U049B), Қ (U049A), ӄ (U04C4), Ӄ (U04C3), ҡ (U04A1), Ҡ (U04A0), ҟ (U049F), Ҟ (U049E), ҝ (U049D), Ҝ (U049C), л (U043B), Л (U041B), ӆ (U04C6), Ӆ (U04C5), љ (U0459), Љ (U0409), м (U043C), М (U041C), ӎ (U04CE), Ӎ (U04CD), н (U043D), Н (U041D), ӊ (U04CA), Ӊ (U04C9), ң (U04A3), Ң (U04A2), ӈ (U04C8), Ӈ (U04C7), ҥ (U04A5), Ҥ (U04A4), њ (U045A), Њ (U040A), о (U043E), О (U041E), ӧ (U04E7), Ӧ (U04E6), ө (U04E9), Ө (U04E8), ӫ (U04EB), Ӫ (U04EA), п (U043F), П (U041F), ҧ (U04A7), Ҧ (U04A6), ҁ (U0481), Ҁ (U0480), р (U0440), Р (U0420), ҏ (U048F), Ҏ (U048E), с (U0441), С (U0421), ҫ (U04AB), Ҫ (U04AA), т (U0442), Т (U0422), ҭ (U04AD), Ҭ (U04AC), ћ (U045B), Ћ (U040B), у (U0443), У (U0423), ў (U045E), Ў (U040E), ӱ (U04F1), Ӱ (U04F0), ӳ (U04F3), Ӳ (U04F2), ү (U04AF), Ү (U04AE), ұ (U04B1), Ұ (U04B0), ѹ (U0479), Ѹ (U0478), ф (U0444), Ф (U0424), х (U0445), Х (U0425), ҳ (U04B3), Ҳ (U04B2), һ (U04BB), Һ (U04BA), ѡ (U0461), Ѡ (U0460), ѿ (U047F), Ѿ (U047E), ѽ (U047D), Ѽ (U047C), ѻ (U047B), Ѻ (U047A), ц (U0446), Ц (U0426), ҵ (U04B5), Ҵ (U04B4), ч (U0447), Ч (U0427), ӵ (U04F5), Ӵ (U04F4), ҷ (U04B7), Ҷ (U04B6), ӌ (U04CC), Ӌ (U04CB), ҹ (U04B9), Ҹ (U04B8), ҽ (U04BD), Ҽ (U04BC), ҿ (U04BF), Ҿ (U04BE), џ (U045F), Џ (U040F), ш (U0448), Ш (U0428), щ (U0449), Щ (U0429), ъ (U044A), Ъ (U042A), ы (U044B), Ы 
(U042B), ӹ (U04F9), Ӹ (U04F8), ь (U044C), Ь (U042C), ҍ (U048D), Ҍ (U048C), ѣ (U0463), Ѣ (U0462), э (U044D), Э (U042D), ӭ (U04ED), Ӭ (U04EC), ю (U044E), Ю (U042E), я (U044F), Я (U042F), ѥ (U0465), Ѥ (U0464), ѧ (U0467), Ѧ (U0466), ѫ (U046B), Ѫ (U046A), ѩ (U0469), Ѩ (U0468), ѭ (U046D), Ѭ (U046C), ѯ (U046F), Ѯ (U046E), ѱ (U0471), Ѱ (U0470), ѳ (U0473), Ѳ (U0472), ѵ (U0475), Ѵ (U0474), ѷ (U0477), Ѷ (U0476), ҩ (U04A9), Ҩ (U04A8), Ӏ (U04C0)
Georgian script:
ა (U10D0), ბ (U10D1), გ (U10D2), დ (U10D3), ე (U10D4), ვ (U10D5), ზ (U10D6), ჱ (U10F1), თ (U10D7), ი (U10D8), კ (U10D9), ლ (U10DA), მ (U10DB), ნ (U10DC), ჲ (U10F2), ო (U10DD), პ (U10DE), ჟ (U10DF), რ (U10E0), ს (U10E1), ტ (U10E2), ჳ (U10F3), უ (U10E3), ფ (U10E4), ქ (U10E5), ღ (U10E6), ყ (U10E7), შ (U10E8), ჩ (U10E9), ც (U10EA), ძ (U10EB), წ (U10EC), ჭ (U10ED), ხ (U10EE), ჴ (U10F4), ჯ (U10EF), ჰ (U10F0), ჵ (U10F5), ჶ (U10F6), ჷ (U10F7), ჸ (U10F8)
NOTE: ჳ (U10F3), ჴ (U10F4), ჵ (U10F5) and ჶ (U10F6) are today considered archaic letters.
Armenian script:
- ա (U0561), Ա (U0531), բ (U0562), Բ (U0532), գ (U0563), Գ (U0533), դ (U0564), Դ (U0534), ե (U0565), Ե (U0535), զ (U0566), Զ (U0536), է (U0567), Է (U0537), ը (U0568), Ը (U0538), թ (U0569), Թ (U0539), ժ (U056A), Ժ (U053A), ի (U056B), Ի (U053B), լ (U056C), Լ (U053C), խ (U056D), Խ (U053D), ծ (U056E), Ծ (U053E), կ (U056F), Կ (U053F), հ (U0570), Հ (U0540), ձ (U0571), Ձ (U0541), ղ (U0572), Ղ (U0542), ճ (U0573), Ճ (U0543), մ (U0574), Մ (U0544), յ (U0575), Յ (U0545), ն (U0576), Ն (U0546), շ (U0577), Շ (U0547), ո (U0578), Ո (U0548), չ (U0579), Չ (U0549), պ (U057A), Պ (U054A), ջ (U057B), Ջ (U054B), ռ (U057C), Ռ (U054C), ս (U057D), Ս (U054D), վ (U057E), Վ (U054E), տ (U057F), Տ (U054F), ր (U0580), Ր (U0550), ց (U0581), Ց (U0551), ւ (U0582), Ւ (U0552), փ (U0583), Փ (U0553), ք (U0584), Ք (U0554), և (U0587), օ (U0585), Օ (U0555), ֆ (U0586), Ֆ (U0556)
A.1.4
diacritical mark
any of a number of recurring graphical structures placed over, under or next to a first level letter which does not significantly modify the shape of the first level letter itself and which in combination with that first level letter is a valid letter.
NOTE These structures modify meaning or pronunciation or some other feature of the first level letter. The diacritical marks which are relevant to this European Standard are listed in section A.8.1.1.
A.1.5
letter with diacritical marks
letter which can be seen as equivalent to the combination between a first level letter and one or more diacritical marks
NOTE Some letters with diacritical marks are treated as first level letters in some languages, e.g. ä in Swedish and ñ in Spanish. However, these are subject to national standards or local practices which are outside the scope of this European Standard.
NOTE Very few Latin letters such as ǻ (U01FB) have more than one diacritical mark. A considerable number of Greek letters have more than one diacritical mark.
A.1.6
equivalent letter form
character created by joining two or more distinct first level letters or two or more letters with diacritical marks or any combination of these
NOTE Examples of equivalent letter forms in MES are the LATIN SMALL LIGATURE FI (UFB01), LATIN SMALL LIGATURE FL (UFB02), and the Croatian dz (U01F2 / U01F3).
A.1.7
second level letter
letter that is neither a first level letter nor an equivalent letter form nor a letter with diacritical marks
NOTE The second level letters which are relevant to this European Standard are listed in A.8.2.
A.1.8
capital letter
letter which has the string CAPITAL in its name in ISO/IEC 10646
NOTE This definition works for the repertoire of MES, but not necessarily for the full repertoire of the UCS.
NOTE A capital letter is also known as an uppercase letter
NOTE For the first level letters these are
Latin script:
- A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Þ
Greek script:
- Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω
Cyrillic script:
- А Ӑ Ӓ Ә Ӛ Ӕ Б В Г Ғ Ҕ Д Ђ Ҙ Е Ӗ Є Ж Ӝ Җ З Ӟ Ѕ Ӡ И Ӥ Ҋ І Ї Й Ј К Қ Ӄ Ҡ Ҟ Ҝ Л Ӆ Љ М Ӎ Н Ӊ Ң Ӈ Ҥ Њ О Ӧ Ө Ӫ П Ҧ Ҁ Р Ҏ С Ҫ Т Ҭ Ћ У Ў Ӱ Ӳ Ү Ұ Ѹ Ф Х Ҳ Һ Ѡ Ѿ Ѽ Ѻ Ц Ҵ Ч Ӵ Ҷ Ӌ Ҹ Ҽ Ҿ Џ Ш Щ Ъ Ы Ӹ Ь Ҍ Ѣ Э Ӭ Ю Я Ѥ Ѧ Ѫ Ѩ Ѭ Ѯ Ѱ Ѳ Ѵ Ѷ Ҩ
Georgian script:
- (none in MES)
NOTE The function of capital letters in Georgian differs significantly from the function of capital letters in the other four scripts. The Georgian letters which ISO/IEC 10646:2003 calls GEORGIAN CAPITAL LETTER make up the asomtavruli script that is primarily used in Old Georgian texts. The remaining letters (classified simply as GEORGIAN LETTER in ISO/IEC 10646:2003) are usually identified with the mxedruli or military script that is used almost exclusively for writing modern Georgian. For this reason the MES collection only comprises the "Basic Georgian" collection (10D0-10FF) with the mxedruli script.
Armenian script:
- Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք Օ Ֆ
A.1.9
small letter
letter which is not a capital letter
NOTE: A small letter is also known as a lowercase letter
A.1.10
special character
character that is neither a letter nor a digit
NOTE Special characters are often called symbols, but also include punctuation marks, apostrophes, mathematical operators, monetary symbols and others.
A.1.11
space character
one of the special characters listed in 20.1 of ISO/IEC 10646:2003
NOTE: The space characters that are also in MES are 0020 SPACE, 00A0 NO-BREAK SPACE, 2000 EN QUAD, 2001 EM QUAD, 2002 EN SPACE, 2003 EM SPACE, 2004 THREE-PER-EM SPACE, 2005 FOUR-PER-EM SPACE, 2006 SIX-PER-EM SPACE, 2007 FIGURE SPACE, 2008 PUNCTUATION SPACE, 2009 THIN SPACE, 200A HAIR SPACE
NOTE A number of different types of "spaces" such as tabulators or line breaks exist which are not part of MES, but which are used often in various fields of application. These may be understood as space characters for the purposes of this annex.
A.2 Preparatory procedures
A.2.1 Purpose
Most ordering tasks require more than simply the ordering of strings. In a telephone directory, for example, one might want to order by names first, followed by addresses and phone numbers, resorting to addresses only when ordering by names fails to establish a unique sequence and to phone numbers only if both names and addresses are identical.
Each of these units is called a key and the approach is called the multiple ordering key approach.
A.2.2 Methodology
More rigorously expressed, the multiple ordering key approach implies the preprocessing of the data in the following steps, any or all of which may be omitted, especially in the case of a single ordering key:
- subdivision of data into multiple ordering keys through the introduction of a higher level protocol
- establishing a hierarchy between these keys
- extracting the keys from the data
- subjecting the keys to some form of normalization
NOTE This normalization might include, but is not limited to: changing capital letters to small letters where it is considered appropriate (e. g. in the case of sentence initial capitals or capitals for emphasis), lemmatization (especially for inflected languages), expansion of abbreviations, or reduction of blanks between words to one throughout the data. It can also be left out entirely.
NOTE An especially important step is usually the correct treatment of numeral strings where leading zeroes might have to be introduced to ensure proper comparisons between corresponding decimals. Failure to do so may result in faulty ordering.
Starting with the keys highest in the hierarchy, the equivalent keys thus obtained are compared with the aid of the ordering rules as established in this European Standard. As soon as a unique sequence is established, further keys are ignored.
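The multiple ordering key approach can be illustrated with a minimal sketch. It is not part of this European Standard: the record fields, the function names and the padding width of six digits are assumptions chosen for illustration, and plain Python string comparison stands in for the ordering rules proper.

```python
import re

def pad_numerals(text, width=6):
    """Left-pad every run of digits with zeroes (the width is an assumption)."""
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), text)

def normalize_key(key):
    """Illustrative normalization: reduce blanks to one, then pad numerals."""
    return pad_numerals(" ".join(key.split()))

def order_records(records, hierarchy):
    """Order records by the keys named in hierarchy, highest key first.

    Tuple comparison consults a lower key only when all higher keys are equal.
    """
    return sorted(records, key=lambda r: tuple(normalize_key(r[k]) for k in hierarchy))

# Hypothetical telephone directory entries.
directory = [
    {"name": "Meyer", "address": "Hauptstrasse 12", "phone": "555 0102"},
    {"name": "Meyer", "address": "Hauptstrasse 9", "phone": "555 0101"},
    {"name": "Adams", "address": "Ring 3", "phone": "555 0199"},
]
ordered = order_records(directory, ["name", "address", "phone"])
```

Without the padding step, "Hauptstrasse 12" would be placed before "Hauptstrasse 9", which is the faulty ordering the note above warns about.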
A.2.3 Further preprocessing
Further preprocessing of some kind may or may not be necessary, but is not within the scope of this European Standard.
This European Standard assumes that users have already performed these preparatory procedures which are left entirely to their discretion and are thus out of its scope. It is concerned exclusively with the ordering of strings which belong to one key and which have undergone those preparatory procedures.
A.3 The multilevel ordering procedure
A.3.1 General principles
This European Standard defines in this informative annex a multilevel ordering procedure whose results are identical to those produced by the application of the rules of the body of this standard.
Multilevel ordering procedure means that the input strings are first compared on the first ordering level. Only when the procedure described for this level fails to establish a unique and determined sequence for the strings are the different parts of the second ordering level taken into consideration. If this likewise fails to produce a unique sequence, the third ordering level is invoked, and after this the fourth ordering level. If this also cannot establish a unique sequence, the two strings are regarded as equivalent.
Each level compares two strings in the following manner: The first non-ignored characters are compared. If the ordering rules for that level specify a unique and determined sequence for these characters then this determines the sequence of the strings. If not, the second non-ignored characters are compared, and so forth until one of the following conditions is met. If more than one of the conditions is true, only the first one which is fulfilled is applicable:
- the ordering rules for that level define a unique sequence for the two non-ignored characters which is then also the ordering sequence for the strings;
- one of the strings has no more non-ignored characters whereas the other has. Then the string without more characters precedes the other one;
- both strings have no more non-ignored characters. Then the next ordering level, if existing, is invoked. If there are no more levels, the two strings are deemed equivalent.
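The comparison procedure described above can be sketched as follows. This is an illustrative sketch only: each level is modelled as a hypothetical function that returns a weight for a character, or None if the character is ignored on that level, and the two toy levels shown are simplifications, not the four levels of this annex.

```python
from functools import cmp_to_key

def compare_on_level(a, b, weight):
    """Compare two strings on one level; weight(ch) is None for ignored characters."""
    wa = [w for w in map(weight, a) if w is not None]
    wb = [w for w in map(weight, b) if w is not None]
    for x, y in zip(wa, wb):
        if x != y:                      # first condition: unique sequence found
            return -1 if x < y else 1
    if len(wa) != len(wb):              # second condition: one string is exhausted
        return -1 if len(wa) < len(wb) else 1
    return 0                            # third condition: defer to the next level

def compare_multilevel(a, b, levels):
    """Invoke the levels in turn; 0 means the strings are deemed equivalent."""
    for weight in levels:
        result = compare_on_level(a, b, weight)
        if result:
            return result
    return 0

def toy_letters(ch):
    """Toy level: letters and digits without regard to case, everything else ignored."""
    return ch.lower() if ch.isalnum() else None

def toy_case(ch):
    """Toy level: small letters before capital letters."""
    if not ch.isalpha():
        return None
    return 1 if ch.isupper() else 0

TOY_LEVELS = [toy_letters, toy_case]
ordered = sorted(["Abc", "abd", "abc", "ab"],
                 key=cmp_to_key(lambda a, b: compare_multilevel(a, b, TOY_LEVELS)))
```

With these toy levels "a-b" and "ab" compare as equivalent, because neither level takes the hyphen into account.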
A.3.2 Assumptions and aims
This European Standard acts according to certain assumptions:
- access to information must be facilitated as much as possible;
- the user is not assumed to know details of ISO/IEC 10646;
- the rules are derived from standardized rules and common practice in a large number of European languages without giving preference to the rules of any language or languages in particular.
These assumptions motivate a set of principles that underlie these European Ordering Rules and help to clarify the decisions taken:
- second level letters are ordered according to their visual appearance, not according to their pronunciation or meaning unless user-expectation demands something else;
- forms which the user perceives as more basic should precede special or combined ones. Forms used primarily for emphasis should likewise follow after more basic forms.
A.3.3 Rules (valid throughout)
A.3.3.1 Ordering by script
Digits precede letters. Letters are ordered by scripts, putting Latin letters before Greek ones before Cyrillic ones before Georgian ones before Armenian ones.
A.3.3.2 Equivalent letter forms
Equivalent letter forms are decomposed into the letters out of which they are formed.
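The two rules of A.3.3 can be sketched as follows. The sketch makes two assumptions that are not part of this European Standard: the script of a character is read from the first word of its character name as reported by Python's unicodedata module, and the table of equivalent letter forms holds only the examples named in the note to A.1.6.

```python
import unicodedata

# Script sequence of A.3.3.1; digits receive rank 0 and so precede all letters.
SCRIPT_ORDER = ["LATIN", "GREEK", "CYRILLIC", "GEORGIAN", "ARMENIAN"]

# Equivalent letter forms and the letters out of which they are formed
# (sample limited to the examples in the note to A.1.6).
LETTER_FORMS = {"\ufb01": "fi", "\ufb02": "fl", "\u01f2": "Dz", "\u01f3": "dz"}

def script_rank(ch):
    """Return 0 for digits, 1 to 5 for the five scripts, 6 for anything else."""
    if ch in "0123456789":
        return 0
    name = unicodedata.name(ch, "")
    for rank, script in enumerate(SCRIPT_ORDER, start=1):
        if name.startswith(script + " "):
            return rank
    return len(SCRIPT_ORDER) + 1

def decompose_letter_forms(text):
    """Replace each equivalent letter form by its constituent letters."""
    return "".join(LETTER_FORMS.get(ch, ch) for ch in text)

# Georgian ban, Cyrillic be, Greek beta, Latin b, a digit, Armenian ben.
mixed = ["\u10d1", "\u0431", "\u03b2", "b", "7", "\u0562"]
by_script = sorted(mixed, key=script_rank)
```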
A.4 First ordering level
A.4.1 Validity
All of the following rules are valid for the first ordering level only.
A.4.2 Equivalent or ignored characters
A.4.2.1 Capital and small letters
Capital and small forms of the same letter are treated as equivalent.
A.4.2.2 Second level letters
Second level letters are treated as equivalent to one or more first level letters as specified in section A.8.2.
A.4.2.3 Letters with diacritical marks
Letters with diacritical marks are treated as equivalent to their corresponding first level letters.
NOTE For the definition of first level letters please cf. section A.1.3.
A.4.2.4 Special characters
Special characters are ignored.
A.4.3 Ordering sequences
A.4.3.1 Digits
Digits are to be ordered in the following sequence:
- 0 1 2 3 4 5 6 7 8 9
A.4.3.2 Latin script
Latin first level letters are to be ordered in the following sequence:
- a b c d e f g h i j k l m n o p q r s t u v w x y z þ
A.4.3.3 Greek script
Greek first level letters are to be ordered in the following sequence:
- α β γ δ ε Ϝ Ϛ ζ η θ ι κ λ μ ν ξ ο π Ϟ ρ σ τ υ φ χ ψ ω Ϡ
A.4.3.4 Cyrillic script
Cyrillic first level letters are to be ordered in the following sequence:
- а ӑ ӓ ә ӛ ӕ б в г ғ ҕ д ђ ҙ е ӗ є ж ӝ җ з ӟ ѕ ӡ и ӥ і ї й ҋ ј к қ ӄ ҡ ҟ ҝ л ӆ љ м ӎ н ӊ ң ӈ ҥ њ о ӧ ө ӫ п ҧ ҁ р ҏ с ҫ т ҭ ћ у ў ӱ ӳ ү ұ ѹ ф х ҳ һ ѡ ѿ ѽ ѻ ц ҵ ч ӵ ҷ ӌ ҹ ҽ ҿ џ ш щ ъ ы ӹ ь ҍ ѣ э ӭ ю я ѥ ѧ ѫ ѩ ѭ ѯ ѱ ѳ ѵ ѷ ҩ Ӏ
NOTE This sequence is based on pan-Cyrillic requirements as specified by GOST. It was officially communicated to the editor of this European Standard by GOST's designated expert in the field and maximally facilitates the process of finding information in pan-Cyrillic texts.
A.4.3.5 Georgian script
Georgian first level letters are to be ordered in the following sequence:
- ა ბ გ დ ე ვ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ
A.4.3.6 Armenian script
Armenian first level letters are to be ordered in the following sequence:
- ա բ գ դ ե զ է ը թ ժ ի լ խ ծ կ հ ձ ղ ճ մ յ ն շ ո չ պ ջ ռ ս վ տ ր ց ւ փ ք և օ ֆ
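For Latin-script data, the rules of the first ordering level can be approximated by the following sketch. It is illustrative only: the function name is hypothetical, the table of second level letters is a four-entry sample taken from A.8.2, and removing diacritical marks by canonical decomposition (NFD) is an assumption that holds for common precomposed letters but is not how this European Standard is specified.

```python
import unicodedata

DIGITS = "0123456789"                        # A.4.3.1
LATIN = "abcdefghijklmnopqrstuvwxyz\u00fe"   # A.4.3.2, ending in thorn
ORDER = {ch: i for i, ch in enumerate(DIGITS + LATIN)}

# Sample of A.8.2: sharp s, ae, o with stroke, d with stroke.
SECOND_LEVEL = {"\u00df": "ss", "\u00e6": "ae", "\u00f8": "o", "\u0111": "d"}

def first_level_key(text):
    """Build the list of first-level weights for a Latin-script string."""
    weights = []
    for ch in text:
        ch = ch.lower()                                    # A.4.2.1: case is equivalent
        for part in SECOND_LEVEL.get(ch, ch):              # A.4.2.2: second level letters
            base = unicodedata.normalize("NFD", part)[0]   # A.4.2.3: drop diacritical marks
            if base in ORDER:                              # A.4.2.4: special characters ignored
                weights.append(ORDER[base])
    return weights

words = ["zebra", "\u00feorn", "\u00c4rger", "apple", "2cv"]
ordered = sorted(words, key=first_level_key)
```

Strings that differ only in case, diacritical marks or special characters receive identical keys here. Distinguishing them is left to the later ordering levels.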
A.5 Second ordering level
A.5.1 No unique sequence after the first ordering level
If the first ordering level does not result in a unique sequence, the second ordering level is invoked. It is distinguished from the first ordering level by no longer treating letters with diacritical marks and second level letters as equivalent to first level letters.
The second ordering level is divided into two parts: second level letters and diacritical marks. If the treatment of second level letters alone results in a unique sequence, diacritical marks are to be ignored.
A.5.2 Equivalent or ignored characters
A.5.2.1 Capital and small letters
Capital and small forms of the same letter are treated as equivalent.
A.5.2.2 Special characters
Special characters are ignored.
A.5.3 Ordering sequences
A.5.3.1 Second level letters
Second level letters are to be ordered after their corresponding first level letter. In the case of multiple second level letters with the same first level letter they are to be ordered in the sequence specified by A.8.2.
A.5.3.2 Letters with diacritical marks
Letters with diacritical marks that have only one diacritical mark are to be ordered with respect to their diacritical mark in the sequence indicated in section A.8.1.1. For letters with more than one diacritical mark, the diacritical marks shall be considered in the following order: Inside the character before outside; below the character before above; working from bottom to top, then from left to right. In practice, this results for MES in the sequence indicated in section A.8.1.2.
NOTE In some European countries, notably in France, diacritics are treated differently from this European Standard by parsing diacritics backwards within each word. For applications targeted for this market this must be taken into consideration by the declaration of a suitable delta.
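The forward treatment of diacritical marks can be illustrated with a minimal sketch. It assumes that letters are decomposed with Python's unicodedata module and ranks the combining marks in the sequence of A.8.1.1. The function name is hypothetical, the non-combining PROSGEGRAMMENI (U1FBE) is left out, and for letters with several marks the sketch simply follows the canonical order of the decomposition, which may differ from the sequence of A.8.1.2.

```python
import unicodedata

# Combining marks in the sequence of A.8.1.1.
MARKS = ["\u0313", "\u0314", "\u0301", "\u0300", "\u0306", "\u0302", "\u030c",
         "\u030a", "\u0342", "\u0308", "\u030b", "\u0303", "\u0307", "\u0327",
         "\u0328", "\u0304", "\u0326", "\u0345"]
MARK_RANK = {mark: i + 1 for i, mark in enumerate(MARKS)}

def diacritic_key(text):
    """One tuple of mark ranks per letter; a letter without marks gets ()."""
    key = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in MARK_RANK:
            if key:                                   # attach the mark to the preceding letter
                key[-1] = key[-1] + (MARK_RANK[ch],)
        elif ch.isalnum():
            key.append(())
    return key

# cote, coté, côte, côté in scrambled order.
words = ["c\u00f4t\u00e9", "cot\u00e9", "c\u00f4te", "cote"]
ordered = sorted(words, key=diacritic_key)
```

Because the strings are parsed forwards, coté precedes côte. Parsing backwards within each word, as described in the note above, would reverse these two entries.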
A.6 Third ordering level
A.6.1 No unique sequence after the second ordering level
If the second ordering level also does not result in a unique sequence of strings, the third ordering level is invoked. It no longer treats capital and small letters as equivalent.
A.6.2 Ignored characters
Special characters are ignored.
A.6.3 Ordering sequences
A.6.3.1 Capitalization
Small letters are ordered before the corresponding capital ones.
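A minimal sketch of the capitalization rule follows. The names are hypothetical, and lowercasing the whole string is a toy stand-in for the first and second ordering levels.

```python
def case_key(text):
    """Third-level weights: 0 for a small letter, 1 for a capital letter."""
    return [1 if ch.isupper() else 0 for ch in text if ch.isalpha()]

def toy_sort_key(text):
    """Toy stand-in for the earlier levels, followed by the capitalization weights."""
    return (text.lower(), case_key(text))

words = ["Polish", "polish", "Reading", "reading"]
ordered = sorted(words, key=toy_sort_key)
by_code_point = sorted(words)
```

A plain sort by code point would instead place both capitalized words before both uncapitalized ones.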
A.7 Fourth ordering level
A.7.1 No unique sequence after the third ordering level
If the third ordering level likewise does not result in a unique sequence of strings, the fourth ordering level is invoked. It takes special characters into account.
A.7.2 Sequence of special characters
Most special characters of the Multilingual European Subset No 3 except for currency signs are ordered in the sequence of the default tailorable template of ISO/IEC 14651:2007. For most special characters this is the order in which they are listed in ISO/IEC 10646 and relevant appendices. However, for a number of special characters ISO/IEC 14651 defines a divergent sequence in line with the specification of the Canadian standard CAN/CSA Z243.230-1996.
NOTE It is advised to pay particular attention to special characters that may have the role of structuring entries in some manner. These include punctuation marks, hyphens, apostrophes and brackets.
A.7.3 Equivalence
Two strings for which no unique sequence can be established after the fourth ordering level are considered to be equivalent.
NOTE For further options to break the deadlock in certain circumstances please cf. the informative annex C: Ordering by position and by style.
A.8 Specific ordering sequences
A.8.1 Diacritical marks
A.8.1.1 Diacritical marks
This form of presentation has been chosen to enable the unification of diacritical marks across scripts without modifying the resulting sequence of strings. Official Greek names of the diacritics are underlined.
Shape [1] | Diacritical mark [2] | Alternative names [3]
᾿ | U0313 COMBINING COMMA ABOVE | PSILI (spacing U1FBF) / spiritus lenis
̔ | U0314 COMBINING REVERSED COMMA ABOVE | DASIA (spacing U1FFE) / spiritus asper
´ | U0301 COMBINING ACUTE ACCENT | OXIA, Tonos
` | U0300 COMBINING GRAVE ACCENT | VARIA
˘ | U0306 COMBINING BREVE | VRACHY
̂ | U0302 COMBINING CIRCUMFLEX ACCENT |
̌ | U030C COMBINING CARON |
˚ | U030A COMBINING RING ABOVE |
῀ | U0342 COMBINING GREEK PERISPOMENI |
¨ | U0308 COMBINING DIAERESIS | DIALYTIKA, umlaut, trema [4]
˝ | U030B COMBINING DOUBLE ACUTE ACCENT |
˜ | U0303 COMBINING TILDE |
˙ | U0307 COMBINING DOT ABOVE |
¸ | U0327 COMBINING CEDILLA |
˛ | U0328 COMBINING OGONEK |
̄ | U0304 COMBINING MACRON | Greek macron, length
̦ | U0326 COMBINING COMMA BELOW [5] |
ι | U1FBE PROSGEGRAMMENI | iota adscriptum
ͅ | U0345 COMBINING GREEK YPOGEGRAMMENI | iota subscriptum [6]
A.8.1.2 Multiple diacritical marks
Shape | Diacritical mark [7]
ἄ [8] | U1FCE PSILI AND OXIA
ᾄ | ─ PSILI AND OXIA AND YPOGEGRAMMENI
ἂ | U1FCD PSILI AND VARIA
ᾂ | ─ PSILI AND VARIA AND YPOGEGRAMMENI
ἆ | U1FCF PSILI AND PERISPOMENI
ᾆ | ─ PSILI AND PERISPOMENI AND YPOGEGRAMMENI
ᾀ | ─ PSILI AND YPOGEGRAMMENI
ἅ | U1FDE DASIA AND OXIA
ᾅ | ─ DASIA AND OXIA AND YPOGEGRAMMENI
ἃ | U1FDD DASIA AND VARIA
ᾃ | ─ DASIA AND VARIA AND YPOGEGRAMMENI
ἇ | U1FDF DASIA AND PERISPOMENI
ᾇ | ─ DASIA AND PERISPOMENI AND YPOGEGRAMMENI
ᾁ | ─ DASIA AND YPOGEGRAMMENI
ᾴ | ─ OXIA AND YPOGEGRAMMENI
ᾲ | ─ VARIA AND YPOGEGRAMMENI
ǻ | ─ RING ABOVE AND ACUTE
ᾷ | ─ PERISPOMENI AND YPOGEGRAMMENI
΅ | U1FEE DIALYTIKA AND OXIA
΅ | U0385 DIALYTIKA AND TONOS
῭ | U1FED DIALYTIKA AND VARIA
῁ | U1FC1 DIALYTIKA AND PERISPOMENI
ȫ | ─ DIAERESIS AND MACRON
A.8.2 Second level letters
Shape | Position and name of second level letter in ISO/IEC 10646:2003 | Equiv. FOL [9]
ɐ | U0250 LATIN SMALL LETTER TURNED A | a
ɑ | U0251 LATIN SMALL LETTER ALPHA | a
ɒ | U0252 LATIN SMALL LETTER TURNED ALPHA | a
æ | LATIN SMALL LETTER AE | ae
Æ | LATIN CAPITAL LETTER AE | AE
ǽ | U01FD LATIN SMALL LETTER AE WITH ACUTE | áe
Ǽ | U01FC LATIN CAPITAL LETTER AE WITH ACUTE | ÁE
ǣ | LATIN SMALL LETTER AE WITH MACRON | āe
Ǣ | LATIN CAPITAL LETTER AE WITH MACRON | ĀE
ʙ | U0299 LATIN LETTER SMALL CAPITAL B | b
ƀ | U0180 LATIN SMALL LETTER B WITH STROKE | b
Ƀ | U0243 LATIN CAPITAL LETTER B WITH STROKE | B
ɓ | U0253 LATIN SMALL LETTER B WITH HOOK | b
Ɓ | U0181 LATIN CAPITAL LETTER B WITH HOOK | B
ƃ | U0183 LATIN SMALL LETTER B WITH TOPBAR | b
Ƃ | U0182 LATIN CAPITAL LETTER B WITH TOPBAR | B
ƈ | U0188 LATIN SMALL LETTER C WITH HOOK | c
Ƈ | U0187 LATIN CAPITAL LETTER C WITH HOOK | C
ɕ | U0255 LATIN SMALL LETTER C WITH CURL | c
ʗ | U0297 LATIN LETTER STRETCHED C | c
đ | U0111 LATIN SMALL LETTER D WITH STROKE | d
Đ | U0110 LATIN CAPITAL LETTER D WITH STROKE | D
ð | LATIN SMALL LETTER ETH | d
Ð | LATIN CAPITAL LETTER ETH | D
ɖ | U0256 LATIN SMALL LETTER D WITH TAIL | d
Ɖ | U0189 LATIN CAPITAL LETTER AFRICAN D | D
ɗ | U0257 LATIN SMALL LETTER D WITH HOOK | d
Ɗ | U018A LATIN CAPITAL LETTER D WITH HOOK | D
ƌ | U018C LATIN SMALL LETTER D WITH TOPBAR | d
Ƌ | U018B LATIN CAPITAL LETTER D WITH TOPBAR | D
ȡ | U0221 LATIN SMALL LETTER D WITH CURL | d
ƍ | U018D LATIN SMALL LETTER TURNED DELTA | d
ʥ | LATIN SMALL LETTER DZ DIGRAPH WITH CURL | dz
ʤ | LATIN SMALL LETTER DEZH DIGRAPH | dz
ǝ | U01DD LATIN SMALL LETTER TURNED E | e
Ǝ | U018E LATIN CAPITAL LETTER REVERSED E | E
ə | U0259 LATIN SMALL LETTER SCHWA | e
Ə | U018F LATIN CAPITAL LETTER SCHWA | E
ɛ | U025B LATIN SMALL LETTER OPEN E | e
Ɛ | U0190 LATIN CAPITAL LETTER OPEN E | E
ɘ | U0258 LATIN SMALL LETTER REVERSED E | e
ɚ | U025A LATIN SMALL LETTER SCHWA WITH HOOK | e
ɜ | U025C LATIN SMALL LETTER REVERSED OPEN E | e
ɝ | U025D LATIN SMALL LETTER REVERSED OPEN E WITH HOOK | e
ɞ | U025E LATIN SMALL LETTER CLOSED REVERSED OPEN E | e
ʚ | U029A LATIN SMALL LETTER CLOSED OPEN E | e
ƒ | U0192 LATIN SMALL LETTER F WITH HOOK | f
Ƒ | U0191 LATIN CAPITAL LETTER F WITH HOOK | F
ɡ | U0261 LATIN SMALL LETTER SCRIPT G | g
ɢ | U0262 LATIN LETTER SMALL CAPITAL G | g
ǥ | LATIN SMALL LETTER G WITH STROKE | g
Ǥ | LATIN CAPITAL LETTER G WITH STROKE | G
ɠ | U0260 LATIN SMALL LETTER G WITH HOOK | g
Ɠ | U0193 LATIN CAPITAL LETTER G WITH HOOK | G
ʛ | U029B LATIN LETTER SMALL CAPITAL G WITH HOOK | g
ɣ | U0263 LATIN SMALL LETTER GAMMA | g
Ɣ | U0194 LATIN CAPITAL LETTER GAMMA | G
ɤ | U0264 LATIN SMALL LETTER RAMS HORN | g
ƣ [10] | LATIN SMALL LETTER OI | g
Ƣ | LATIN CAPITAL LETTER OI | G
ħ | U0127 LATIN SMALL LETTER H WITH STROKE | h
Ħ | U0126 LATIN CAPITAL LETTER H WITH STROKE | H
ʜ | U029C LATIN LETTER SMALL CAPITAL H | h
ɦ | U0266 LATIN SMALL LETTER H WITH HOOK | h
ɧ | U0267 LATIN SMALL LETTER HENG WITH HOOK | h
ɥ | U0265 LATIN SMALL LETTER TURNED H | h
ʮ | U02AE LATIN SMALL LETTER TURNED H WITH FISHHOOK | h
ʯ | U02AF LATIN SMALL LETTER TURNED H WITH FISHHOOK AND TAIL | h
ƕ | U0195 LATIN SMALL LETTER HV | hv
Ƕ | LATIN CAPITAL LETTER HWAIR | HV
ⁱ | U2071 SUPERSCRIPT LATIN SMALL LETTER I | i
ı | U0131 LATIN SMALL LETTER DOTLESS I | i
ɪ | U026A LATIN LETTER SMALL CAPITAL I | i
ɨ | U0268 LATIN SMALL LETTER I WITH STROKE | i
Ɨ | U0197 LATIN CAPITAL LETTER I WITH STROKE | I
ɩ | U0269 LATIN SMALL LETTER IOTA | i
Ɩ | U0196 LATIN CAPITAL LETTER IOTA | I
ij | U0133 LATIN SMALL LIGATURE IJ | ij
IJ | U0132 LATIN CAPITAL LIGATURE IJ | IJ
ʝ | U029D LATIN SMALL LETTER J WITH CROSSED-TAIL | j
ɟ | U025F LATIN SMALL LETTER DOTLESS J WITH STROKE | j
ʄ | U0284 LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK | j
ƙ | U0199 LATIN SMALL LETTER K WITH HOOK | k
Ƙ | U0198 LATIN CAPITAL LETTER K WITH HOOK | K
ĸ | U0138 LATIN SMALL LETTER KRA | k
ʞ | U029E LATIN SMALL LETTER TURNED K | k
ł | U0142 LATIN SMALL LETTER L WITH STROKE | l
Ł | U0141 LATIN CAPITAL LETTER L WITH STROKE | L
ŀ | U0140 LATIN SMALL LETTER L WITH MIDDLE DOT | l
Ŀ | U013F LATIN CAPITAL LETTER L WITH MIDDLE DOT | L
ʟ | U029F LATIN LETTER SMALL CAPITAL L | l
ƚ | U019A LATIN SMALL LETTER L WITH BAR | l
Ƚ | U023D LATIN CAPITAL LETTER L WITH BAR | L
ɫ | U026B LATIN SMALL LETTER L WITH MIDDLE TILDE | l
ɬ | U026C LATIN SMALL LETTER L WITH BELT | l
ɭ | U026D LATIN SMALL LETTER L WITH RETROFLEX HOOK | l
ȴ | U0234 LATIN SMALL LETTER L WITH CURL | l
ƛ | U019B LATIN SMALL LETTER LAMBDA WITH STROKE | l
ɮ | U026E LATIN SMALL LETTER LEZH | lz
ɱ | U0271 LATIN SMALL LETTER M WITH HOOK | m
ɯ | U026F LATIN SMALL LETTER TURNED M | m
Ɯ | U019C LATIN CAPITAL LETTER TURNED M | M
ɰ | U0270 LATIN SMALL LETTER TURNED M WITH LONG LEG | m
ⁿ | U207F SUPERSCRIPT LATIN SMALL LETTER N | n
'n | U0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE | n
ɴ | U0274 LATIN LETTER SMALL CAPITAL N | n
ɲ | U0272 LATIN SMALL LETTER N WITH LEFT HOOK | n
Ɲ | U019D LATIN CAPITAL LETTER N WITH LEFT HOOK | N
ƞ | U019E LATIN SMALL LETTER N WITH LONG RIGHT LEG | n
Ƞ | U0220 LATIN CAPITAL LETTER N WITH LONG RIGHT LEG | N
ɳ | U0273 LATIN SMALL LETTER N WITH RETROFLEX HOOK | n
ȵ | U0235 LATIN SMALL LETTER N WITH CURL | n
ŋ | U014B LATIN SMALL LETTER ENG | n
Ŋ | U014A LATIN CAPITAL LETTER ENG | N
ø | LATIN SMALL LETTER O WITH STROKE | o
Ø | LATIN CAPITAL LETTER O WITH STROKE | O
ǿ | U01FF LATIN SMALL LETTER O WITH STROKE AND ACUTE | o
Ǿ | U01FE LATIN CAPITAL LETTER O WITH STROKE AND ACUTE | O
ơ | LATIN SMALL LETTER O WITH HORN | o
Ơ | LATIN CAPITAL LETTER O WITH HORN | O
ɔ | U0254 LATIN SMALL LETTER OPEN O | o
Ɔ | U0186 LATIN CAPITAL LETTER OPEN O | O
ɵ | U0275 LATIN SMALL LETTER BARRED O | o
Ɵ | U019F LATIN CAPITAL LETTER O WITH MIDDLE TILDE | O
ɷ | U0277 LATIN SMALL LETTER CLOSED OMEGA | o
ȣ | U0223 LATIN SMALL LETTER OU | o
Ȣ | U0222 LATIN CAPITAL LETTER OU | O
œ | U0153 LATIN SMALL LIGATURE OE | oe
ɶ | U0276 LATIN LETTER SMALL CAPITAL OE | oe
Œ | U0152 LATIN CAPITAL LIGATURE OE | OE
ƥ | LATIN SMALL LETTER P WITH HOOK | p
Ƥ | LATIN CAPITAL LETTER P WITH HOOK | P
ɸ | U0278 LATIN SMALL LETTER PHI | p
ʠ | LATIN SMALL LETTER Q WITH HOOK | q
ʀ | U0280 LATIN LETTER SMALL CAPITAL R | r
Ʀ | LATIN LETTER YR | R
ɹ | U0279 LATIN SMALL LETTER TURNED R | r
ɺ | U027A LATIN SMALL LETTER TURNED R WITH LONG LEG | r
ɻ | U027B LATIN SMALL LETTER TURNED R WITH HOOK | r
ɼ | U027C LATIN SMALL LETTER R WITH LONG LEG | r
ɽ | U027D LATIN SMALL LETTER R WITH TAIL | r
ɾ | U027E LATIN SMALL LETTER R WITH FISHHOOK | r
ɿ | U027F LATIN SMALL LETTER REVERSED R WITH FISHHOOK | r
ʁ | U0281 LATIN LETTER SMALL CAPITAL INVERTED R | r
ſ | U017F LATIN SMALL LETTER LONG S | s
ʂ | U0282 LATIN SMALL LETTER S WITH HOOK | s
ʃ | U0283 LATIN SMALL LETTER ESH | s
Ʃ | LATIN CAPITAL LETTER ESH | S
ƪ | U01AA LATIN LETTER REVERSED ESH LOOP | s
ʅ | U0285 LATIN SMALL LETTER SQUAT REVERSED ESH | s
ʆ | U0286 LATIN SMALL LETTER ESH WITH CURL | s
ß | U00DF LATIN SMALL LETTER SHARP S | ss
ŧ | U0167 LATIN SMALL LETTER T WITH STROKE | t
Ŧ | U0166 LATIN CAPITAL LETTER T WITH STROKE | T
ƫ | U01AB LATIN SMALL LETTER T WITH PALATAL HOOK | t
ƭ | U01AD LATIN SMALL LETTER T WITH HOOK | t
Ƭ | U01AC LATIN CAPITAL LETTER T WITH HOOK | T
ʈ | U0288 LATIN SMALL LETTER T WITH RETROFLEX HOOK | t
Ʈ | U01AE LATIN CAPITAL LETTER T WITH RETROFLEX HOOK | T
ȶ | U0236 LATIN SMALL LETTER T WITH CURL | t
ʇ | U0287 LATIN SMALL LETTER TURNED T | t
ʨ | LATIN SMALL LETTER TC DIGRAPH WITH CURL | tc
ư | LATIN SMALL LETTER U WITH HORN | u
Ư | U01AF LATIN CAPITAL LETTER U WITH HORN | U
ʉ | U0289 LATIN SMALL LETTER U BAR | u
Ʉ | U0244 LATIN CAPITAL LETTER U BAR | U
ʊ | U028A LATIN SMALL LETTER UPSILON | u
Ʊ | LATIN CAPITAL LETTER UPSILON | U
ʋ | U028B LATIN SMALL LETTER V WITH HOOK | v
Ʋ | LATIN CAPITAL LETTER V WITH HOOK | V
ʌ | U028C LATIN SMALL LETTER TURNED V | v
Ʌ | U0245 LATIN CAPITAL LETTER TURNED V | V
ʍ | U028D LATIN SMALL LETTER TURNED W | w
ƿ | U01BF LATIN LETTER WYNN | w
Ƿ | LATIN CAPITAL LETTER WYNN | W
ʏ | U028F LATIN LETTER SMALL CAPITAL Y | y
ƴ | LATIN SMALL LETTER Y WITH HOOK | y
Ƴ | LATIN CAPITAL LETTER Y WITH HOOK | Y
ʎ | U028E LATIN SMALL LETTER TURNED Y | y
ȝ | U021D LATIN SMALL LETTER YOGH | y
Ȝ | U021C LATIN CAPITAL LETTER YOGH | Y
ƶ | LATIN SMALL LETTER Z WITH STROKE | z
Ƶ | LATIN CAPITAL LETTER Z WITH STROKE | Z
ȥ | U0225 LATIN SMALL LETTER Z WITH HOOK | z
Ȥ | U0224 LATIN CAPITAL LETTER Z WITH HOOK | Z
ʐ | U0290 LATIN SMALL LETTER Z WITH RETROFLEX HOOK | z
ʑ | U0291 LATIN SMALL LETTER Z WITH CURL | z
ʒ | U0292 LATIN SMALL LETTER EZH | z
Ʒ | LATIN CAPITAL LETTER EZH | Z
ǯ | U01EF LATIN SMALL LETTER EZH WITH CARON | z
Ǯ | U01EE LATIN CAPITAL LETTER EZH WITH CARON | Z
ƹ | LATIN SMALL LETTER EZH REVERSED | z
Ƹ | LATIN CAPITAL LETTER EZH REVERSED | Z
ƺ | U01BA LATIN SMALL LETTER EZH WITH TAIL | z
ʓ | U0293 LATIN SMALL LETTER EZH WITH CURL | z
ς | GREEK SMALL LETTER FINAL SIGMA | σ
ґ | U0491 CYRILLIC SMALL LETTER GHE UPTURN | г
Ґ | U0490 CYRILLIC CAPITAL LETTER GHE UPTURN | Г
ѓ | U0453 CYRILLIC SMALL LETTER GJE | ђ
Ѓ | U0403 CYRILLIC CAPITAL LETTER GJE | Ђ
ќ | U045C CYRILLIC SMALL LETTER KJE | ћ
Ќ | U040C CYRILLIC CAPITAL LETTER KJE | Ћ
Annex B (informative): Word-by-word ordering
B.1 Modified terminology
For the purpose of this annex a special character shall be a character that is neither a letter nor a digit nor a diacritical mark nor a space character.
NOTE For the purpose of this annex, a space character can include all characters which are usually considered to divide words. Typical examples of these might be hyphens, apostrophes and brackets. Cf. also note to A.1.11.
B.2 Principles
Word-by-word ordering is a frequently used alternative to letter-by-letter ordering. It is a special case of multiple-key ordering which treats space characters as key separators. The maximal string is thus a set of characters enclosed by space characters.
NOTE The string can well be smaller if further keys so demand.
The sets of strings thus obtained are ordered following the European Ordering Rules as specified in the main part of this European Standard.
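The difference between the two approaches can be illustrated with a minimal sketch. The entries and function names are hypothetical, and lowercasing letters and digits is a toy stand-in for the European Ordering Rules themselves.

```python
def letter_by_letter_key(text):
    """Toy key: letters and digits only, case ignored, spaces ignored."""
    return "".join(ch.lower() for ch in text if ch.isalnum())

def word_by_word_key(text):
    """Treat space characters as key separators: one toy key per word."""
    return [letter_by_letter_key(word) for word in text.split()]

entries = ["Newark", "New Zealand", "New York"]
letter_by_letter = sorted(entries, key=letter_by_letter_key)
word_by_word = sorted(entries, key=word_by_word_key)
```

In word-by-word ordering the key "new" is exhausted before "newark", so both entries beginning with the separate word "New" precede "Newark".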
B.3 Example of Word-by-word vs. letter-by-letter ordering
Letter-by-letter ordering |
Word-by-word ordering |
in- |
in- |
B.4 Simplified word-by-word ordering
If the text to be ordered word by word contains only a few second level letters, letters with diacritical marks, or special characters, the following method will in most cases produce the same result as the method specified above.
In the ordering by script section (A.3.3.1) space characters precede digits and letters. The space character is then removed from the table of special characters. The other ordering rules remain unchanged.
Annex C (informative): Ordering by position and by style
C.1 Background
In some cases it is desirable to differentiate further on the third ordering level, e.g. where definitions and ordinary usage of a word are distinguished solely by some form of internal tagging. In print, this tagging usually takes the form of a formatting style. Especially in lexicography it is also often thought desirable to distinguish between loan words and native words in this manner.
This formatting can be expressed by changing the position relative to the baseline, e.g. in mathematical or chemical formulae, or by highlighting the text with typographic features, e.g. an italic typeface, that serve to indicate some property of the word.
C.2 Recommended rules
In line with ISO 12199 this European Standard recommends that, if the implementer deems it necessary to make this differentiation, she or he modify (A.9.2.1) (Capitalization) on the third ordering level in the following manner:
Letters are to be arranged in the sequence indicated in this list:
- small letter on baseline
- capital letter on baseline
- small letter above baseline
- capital letter above baseline
- small letter below baseline
- capital letter below baseline
If this does not result in a unique sequence, typographic styles are to be taken into consideration in the sequence listed:
- roman abcde
- boldface abcde
- italic abcde
- boldface italic abcde
- others
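The two lists above can be read as a compound tie-breaker for the third ordering level. The following Python sketch shows one possible encoding. The class `StyledLetter` and the function `third_level_key` are hypothetical names introduced for illustration only, and the sketch assumes that position and style information is available as explicit tags.

```python
from dataclasses import dataclass

# Ranks follow the sequences listed in C.2.
POSITION_RANK = {"baseline": 0, "above": 1, "below": 2}
STYLE_RANK = {"roman": 0, "boldface": 1, "italic": 2, "boldface italic": 3}
OTHER_STYLE_RANK = 4  # "others" come last

@dataclass(frozen=True)
class StyledLetter:
    """Hypothetical container for a letter with its position and style tags."""
    char: str
    position: str = "baseline"
    style: str = "roman"

def third_level_key(letter: StyledLetter) -> tuple[int, int, int]:
    """Position first, then small before capital, then typographic style."""
    case_rank = 1 if letter.char.isupper() else 0
    style_rank = STYLE_RANK.get(letter.style, OTHER_STYLE_RANK)
    return (POSITION_RANK[letter.position], case_rank, style_rank)

letters = [
    StyledLetter("A"),
    StyledLetter("a", style="italic"),
    StyledLetter("a", position="above"),
    StyledLetter("a"),
    StyledLetter("a", style="boldface"),
]
for letter in sorted(letters, key=third_level_key):
    print(letter)
```

The style rank is the last component of the key, so it only decides the order when position and capitalization are identical, as C.2 requires.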
Annex D (informative): Mixed-script ordering with one predominant script
D.1 Background
Many publications, often of the encyclopedia type, handle scripts differently from this European Standard, especially if they predominantly cover one script with a few entries from other scripts interspersed. They implicitly transliterate strings from other scripts into the predominant one and order them according to the rules for that script. For printing, the strings are then rendered in their original form. The advantage for the user is that related articles, e.g. on λόγος and logic, appear near each other.
D.2 Suggested steps
This may involve the following steps:
- extraction of the strings to be ordered from the relevant data. All preparatory procedures described in the main part of this European Standard may be relevant here
- implicit transliteration into the predominant script
- ordering of the strings thus obtained as specified in the main part of this European Standard
- rendering of strings in their original form, but in the order thus obtained
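The steps above can be sketched in Python as follows. The transliteration table is a deliberately small, hypothetical Greek-to-Latin mapping chosen for illustration and is not taken from any transliteration standard. Plain string comparison of the transliterated keys stands in for ordering according to the main part of this European Standard.

```python
import unicodedata

# Hypothetical, simplified Greek-to-Latin table (illustration only).
GREEK_TO_LATIN = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "e",
    "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "x",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "y",
    "φ": "ph", "χ": "ch", "ψ": "ps", "ω": "o",
}

def strip_marks(text: str) -> str:
    """Remove combining diacritical marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def implicit_transliteration(text: str) -> str:
    """Build an ordering key in the predominant (here: Latin) script."""
    base = strip_marks(text).casefold()
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in base)

entries = ["logic", "λόγος", "logarithm", "lotus"]

# The key is used for ordering only; the original strings are rendered unchanged.
print(sorted(entries, key=implicit_transliteration))
# ['logarithm', 'logic', 'λόγος', 'lotus']
```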
D.3 Explicit transliteration
A different, likewise common method is explicit transliteration, which uses the transliterated word, such as logos, as the entry and adds the original rendering in brackets.
Annex E (informative): Defining National Deltas based on the EOR
E.1 Background
Ordering rules for European languages can benefit from unambiguous, ideally machine-processable specifications, both in the case of formal national standards and de facto standards issued by, e.g., the relevant language institutes. This work can be facilitated by basing this specification on a tailoring of the EOR. This annex gives an overview of possible approaches for writing such a delta.
E.2 Structured Specification
In most cases a delta will start with a structured, but not machine-processable description of the linguistic and cultural ordering preferences. Such a specification can list sequences of letters that are distinguished on the first ordering level. For example, Norwegian could list a b c d e f g h i j k l m n o p q r s t u v w x y z æ ø å, whereas the Polish first level letters would be a ą b c ć d e ę f g h i j k l ł m n ń o ó p r s ś t u w y z ź ż.
A structured specification can then discuss typical distinctions that are made on the second ordering level, such as ss vs. ß in German or v vs. w in Icelandic dictionaries. If required, it can then look at the treatment of otherwise identical words beginning with lowercase vs. uppercase letters and specify a preference for the third ordering level. Should there be established procedures for the treatment of symbols, those can again be described.
In a second step, the differences between these rules and the EOR can be captured, again in words. Such rules could be “make w sort equal to v on the first ordering level, but distinguish them on the second, where w follows v” or “order æ ø å after z on the first ordering level” in Norwegian. In some cases, specific decisions in the EOR will already meet the requirements, so the delta can be small. Likewise, should there be no established cultural preferences, the delta can just implicitly rely on the EOR defaults. In line with ISO/IEC 14651 this European Standard recommends that the defined delta be as small as possible while still expressing the cultural and linguistic ordering preferences comprehensively.
This type of structured, but not machine-processable description meets the requirements of section 6.4 in ISO/IEC 14651:2007. In many cases, a profile at this level of structured, human-readable specification will suffice. If the rules are generally available, they can be translated into their machine-processable equivalents.
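As an informal illustration of how a first-level rule such as “order æ ø å after z” translates into processing, the following Python sketch derives a first-level key from the Norwegian letter sequence given above. It is a simplification under stated assumptions: only the first ordering level is modelled, characters outside the listed sequence are simply skipped, and the name `first_level_key` is hypothetical.

```python
# First-level letter sequence for Norwegian as listed in E.2.
NORWEGIAN_FIRST_LEVEL = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {letter: index for index, letter in enumerate(NORWEGIAN_FIRST_LEVEL)}

def first_level_key(word: str) -> list[int]:
    """Map each known letter to its rank; other characters are skipped (assumption)."""
    return [RANK[ch] for ch in word.casefold() if ch in RANK]

words = ["ås", "zebra", "øl", "ære", "apple"]

print(sorted(words, key=first_level_key))
# ['apple', 'zebra', 'ære', 'øl', 'ås']
```

Sorting the same list by raw code points would place ås before ære, which is why the sequence has to be stated explicitly in the delta.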
E.3 Machine-Readable Specification
Machine-processable specifications offer the additional benefit that they can be directly plugged into an operating system's locale data. All applications that build on the operating system's API will thus automatically profit from this data and order lists correctly according to the user's chosen cultural preferences.
Section 6.4 of ISO/IEC 14651:2007 prescribes that a machine-processable delta must contain at least one valid order_start entry, a specification of the number of levels for comparison, a definition of symbol definition weights and a list of character definitions. However, it explicitly leaves open the concrete syntax for expressing these rules. This European Standard recommends using either LDML or the POSIX-oriented syntax employed in ISO/IEC 14651's CTT, or both, to express the delta.
It is out of the scope of this European Standard to describe the exact syntax and semantics of LDML or the syntax rules for ISO/IEC 14651's CTT. Section 6 and Annex G can serve as practical examples of the design and implementation of two representations of a given delta. More tutorial information as well as pointers to further examples and to syntax validators for both syntaxes can be found at http://purl.oclc.org/NET/CDFG/EN13710
E.4 Example 1: National Delta for German
E.4.1 Structured Specification
E.4.1.1 First Ordering Level
- The first level letters are: abcdefghijklmnopqrstuvwxyz
- Digits follow letters
- Spaces are significant
- The letter thorn (þ U00FE / Þ U00DE) is ordered as th
- The letters wynn (ƿ U01BF / Ƿ U01F7) and ezh (ʒ U0292 / Ʒ U01B7) are ordered as y
E.4.1.2 Second Ordering Level
The Umlaut (diaeresis in ISO/IEC 10646) is the diacritic that must be taken into account before all others. The umlaut is treated as distinct from the trema and can only occur in combination with the base letters a, o and u. The sequence of the other diacritics is compatible with the EOR delta.
E.4.1.3 Third Ordering Level
Lowercase letters precede uppercase ones.
E.4.1.4 Fourth Ordering Level
No specific rules.
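The structured specification in E.4.1 can be approximated by a three-part ordering key, one part per ordering level. The Python sketch below is only an illustration of that idea: it covers just the rules stated in E.4.1 (letters before digits, significant spaces, thorn as th, wynn and ezh as y, umlaut on the second level, lowercase before uppercase), ignores all other diacritics, and uses the hypothetical names `first_level` and `german_key`. The numeric second-level weights are arbitrary placeholders.

```python
UMLAUT_BASES = {"ä": "a", "ö": "o", "ü": "u"}
VARIANT_EXPANSIONS = {"þ": "th", "ƿ": "y", "ʒ": "y"}

def first_level(ch: str) -> tuple[int, str]:
    """Spaces are significant and sort first; digits follow letters."""
    if ch == " ":
        return (0, ch)
    if ch.isdigit():
        return (2, ch)
    return (1, ch)

def german_key(text: str) -> tuple[list, list, list]:
    """Return (first, second, third) level weights; earlier levels dominate."""
    first, second, third = [], [], []
    for ch in text:
        lower = ch.lower()
        if lower in UMLAUT_BASES:
            base, mark = UMLAUT_BASES[lower], 1   # umlaut: second-level difference
        elif lower in VARIANT_EXPANSIONS:
            base, mark = VARIANT_EXPANSIONS[lower], 2  # variant letter (placeholder weight)
        else:
            base, mark = lower, 0
        first.extend(first_level(b) for b in base)
        second.append(mark)
        third.append(1 if ch != lower else 0)     # lowercase precedes uppercase
    return (first, second, third)

names = ["Müller", "Mueller", "Muller", "muller", "Mull 2", "Mull a"]
print(sorted(names, key=german_key))
# ['Mueller', 'Mull a', 'Mull 2', 'muller', 'Muller', 'Müller']
```

Because Python compares tuples element by element, the second and third parts of the key only matter when all first-level weights are equal, which mirrors the multilevel procedure.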
E.4.2 Delta against EOR in 14651-syntax
% -*- coding: utf-8; -*-
% ISO/IEC 14651-conformant delta for DIN 5007-1:2004 (sample)
reorder-after <BASE>
%The umlaut is the diacritic with the highest priority
collating-symbol <UMLAUT> %specifically for the weight of the umlaut (distinct from the Trema in DIN 5007)
<UMLAUT>
reorder-end
%Digits must follow letters
%For the treatment of Roman numerals as well as digits as numbers we
%need preprocessing that is out of the scope of this profile
reorder-after <TFFFF> %After the Han ideographs, but before <SFFFF>
<S0030> % 0
<S0031> % 1
<S0032> % 2
<S0033> % 3
<S0034> % 4
<S0035> % 5
<S0036> % 6
<S0037> % 7
<S0038> % 8
<S0039> % 9
reorder-end
reorder-after <SFFFF>
order_start forward;forward;forward;forward
%Whitespace precedes according to 6.2.1 all other characters on the
%first level. They thus get a non-ignorable weight
<U0009> <S0009>;<BASE>;<MIN>;<U0009> % HORIZONTAL TABULATION (in 6429)
<U000A> <S000A>;<BASE>;<MIN>;<U000A> % LINE FEED (in 6429)
<U000B> <S000B>;<BASE>;<MIN>;<U000B> % VERTICAL TABULATION (in 6429)
<U000C> <S000C>;<BASE>;<MIN>;<U000C> % FORM FEED (in 6429)
<U000D> <S000D>;<BASE>;<MIN>;<U000D> % CARRIAGE RETURN (in 6429)
<U0020> <S0020>;<BASE>;<MIN>;<U0020> % SPACE
%Ligatures and the sharp S are already treated in the EOR delta in the sense of DIN 5007
<U00E4> <S0061>;"<BASE><UMLAUT>";"<MIN><MIN>";<U00E4> % LATIN SMALL LETTER A WITH DIAERESIS
<U00C4> <S0061>;"<BASE><UMLAUT>";"<CAP><MIN>";<U00C4> % LATIN CAPITAL LETTER A WITH DIAERESIS
<U00F6> <S006F>;"<BASE><UMLAUT>";"<MIN><MIN>";<U00F6> % LATIN SMALL LETTER O WITH DIAERESIS
<U00D6> <S006F>;"<BASE><UMLAUT>";"<CAP><MIN>";<U00D6> % LATIN CAPITAL LETTER O WITH DIAERESIS
<U00FC> <S0075>;"<BASE><UMLAUT>";"<MIN><MIN>";<U00FC> % LATIN SMALL LETTER U WITH DIAERESIS
<U00DC> <S0075>;"<BASE><UMLAUT>";"<CAP><MIN>";<U00DC> % LATIN CAPITAL LETTER U WITH DIAERESIS
%If we want an ordering according to 6.1.1.4.2, where Umlaute become
%base letter + e, these lines replace the previous six weight assignments
%<U00E4> "<S0061><S0065>";"<BASE><BASE><UMLAUT>";"<MIN><MIN><MIN>";<U00E4> % LATIN SMALL LETTER A WITH DIAERESIS
%<U00C4> "<S0061><S0065>";"<BASE><BASE><UMLAUT>";"<CAP><MIN><MIN>";<U00C4> % LATIN CAPITAL LETTER A WITH DIAERESIS
%<U00F6> "<S006F><S0065>";"<BASE><BASE><UMLAUT>";"<MIN><MIN><MIN>";<U00F6> % LATIN SMALL LETTER O WITH DIAERESIS
%<U00D6> "<S006F><S0065>";"<BASE><BASE><UMLAUT>";"<CAP><MIN><MIN>";<U00D6> % LATIN CAPITAL LETTER O WITH DIAERESIS
%<U00FC> "<S0075><S0065>";"<BASE><BASE><UMLAUT>";"<MIN><MIN><MIN>";<U00FC> % LATIN SMALL LETTER U WITH DIAERESIS
%<U00DC> "<S0075><S0065>";"<BASE><BASE><UMLAUT>";"<CAP><MIN><MIN>";<U00DC> % LATIN CAPITAL LETTER U WITH DIAERESIS
%Treatment of other Latin characters according to DIN 31638, 7.3.4.1
<U00FE> "<S0074><S0068>";"<BASE><BASE><VRNT1>";"<MIN><MIN><MIN>";<U00FE> % LATIN SMALL LETTER THORN (as th on the first level)
<U00DE> "<S0074><S0068>";"<BASE><BASE><VRNT1>";"<CAP><CAP><MIN>";<U00DE> % LATIN CAPITAL LETTER THORN
<U01BF> <S0079>;"<BASE><VRNT3>";"<MIN><MIN>";<U01BF> % LATIN LETTER WYNN (as y on the first level)
<U01F7> <S0079>;"<BASE><VRNT3>";"<CAP><MIN>";<U01F7> % LATIN CAPITAL LETTER WYNN
<U0292> <S0079>;"<BASE><VRNT4>";"<MIN><MIN>";<U0292> % LATIN SMALL LETTER EZH (as y on the first level, yogh and ezh are equated in ISO/IEC 10646)
<U01B7> <S0079>;"<BASE><VRNT4>";"<CAP><MIN>";<U01B7> % LATIN CAPITAL LETTER EZH
reorder-end
E.5 Example 2: National Delta for Norwegian
E.5.1 Structured Specification
E.5.1.1 First Ordering Level
- The first level letters are: abcdefghijklmnopqrstuvwxyzæøå
- The letter ä is ordered as æ
- The letter ö is ordered as ø
- The letter ü is ordered as y
- The letter o with double acute is ordered as ø
- The letter u with double acute is ordered as y
- The letter þ is ordered as th
E.5.1.2 Second Ordering Level
No specific rules.
E.5.1.3 Third Ordering Level
Lowercase letters precede uppercase ones.
E.5.1.4 Fourth Ordering Level
No specific rules.
E.5.2 Delta against EOR in 14651-syntax
% ISO/IEC 14651-conformant delta for Norwegian (sample)
reorder-after <S00FE>
collating-symbol <S00E6> % LATIN SMALL LETTER AE
collating-symbol <S00F8> % LATIN SMALL LETTER O WITH STROKE
collating-symbol <S00E5> % LATIN SMALL LETTER A WITH RING ABOVE
<S00E6>
<S00F8>
<S00E5>
reorder-end
reorder-after <SFFFF>
order_start forward;forward;forward;forward
%first level letters are abcdefghijklmnopqrstuvwxyzæøå
%a-z as in EOR
%ü and ű as y
<U00FC> <S0079>;"<BASE><VRNT2>";"<MIN><MIN>";<U00FC> % LATIN SMALL LETTER U WITH DIAERESIS
<U00DC> <S0079>;"<BASE><VRNT2>";"<CAP><MIN>";<U00DC> % LATIN CAPITAL LETTER U WITH DIAERESIS
<U0171> <S0079>;"<BASE><VRNT3>";"<MIN><MIN>";<U0171> % LATIN SMALL LETTER U WITH DOUBLE ACUTE
<U0170> <S0079>;"<BASE><VRNT3>";"<CAP><MIN>";<U0170> % LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
<U00E6> <S00E6>;<BASE>;<MIN>;<U00E6> % LATIN SMALL LETTER AE
<U00C6> <S00E6>;<BASE>;<CAP>;<U00C6> % LATIN CAPITAL LETTER AE
%ä as æ
<U00E4> <S00E6>;"<BASE><VRNT1>";"<MIN><MIN>";<U00E4> % LATIN SMALL LETTER A WITH DIAERESIS
<U00C4> <S00E6>;"<BASE><VRNT1>";"<CAP><MIN>";<U00C4> % LATIN CAPITAL LETTER A WITH DIAERESIS
<U00F8> <S00F8>;<BASE>;<MIN>;<U00F8> % LATIN SMALL LETTER O WITH STROKE
<U00D8> <S00F8>;<BASE>;<CAP>;<U00D8> % LATIN CAPITAL LETTER O WITH STROKE
%ö as ø
<U00F6> <S00F8>;"<BASE><VRNT1>";"<MIN><MIN>";<U00F6> % LATIN SMALL LETTER O WITH DIAERESIS
<U00D6> <S00F8>;"<BASE><VRNT1>";"<CAP><MIN>";<U00D6> % LATIN CAPITAL LETTER O WITH DIAERESIS
%ő also as ø
<U0151> <S00F8>;"<BASE><VRNT2>";"<MIN><MIN>";<U0151> % LATIN SMALL LETTER O WITH DOUBLE ACUTE
<U0150> <S00F8>;"<BASE><VRNT2>";"<CAP><MIN>";<U0150> % LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
<U00FE> "<S0074><S0068>";"<BASE><BASE><VRNT1>";"<MIN><MIN><MIN>";<U00FE> % LATIN SMALL LETTER THORN (as th on the first level)
<U00DE> "<S0074><S0068>";"<BASE><BASE><VRNT1>";"<CAP><CAP><MIN>";<U00DE> % LATIN CAPITAL LETTER THORN
<U00E5> <S00E5>;<BASE>;<MIN>;<U00E5> % LATIN SMALL LETTER A WITH RING ABOVE
<U00C5> <S00E5>;<BASE>;<CAP>;<U00C5> % LATIN CAPITAL LETTER A WITH RING ABOVE
reorder-end
Annex F (informative): Modern European Scripts / MES
This annex reproduces for ease of reference the definition of the Modern European Scripts collection in ISO/IEC 10646:2003, A.4.3:
1 | BASIC LATIN | 0020-007E
2 | LATIN-1 SUPPLEMENT | 00A0-00FF
3 | LATIN EXTENDED-A | 0100-017F
4 | LATIN EXTENDED-B | 0180-024F
5 | IPA EXTENSIONS | 0250-02AF
6 | SPACING MODIFIER LETTERS | 02B0-02FF
7 | COMBINING DIACRITICAL MARKS | 0300-036F
8 | BASIC GREEK | 0370-03CF
9 | GREEK SYMBOLS AND COPTIC | 03D0-03FF
10 | CYRILLIC | 0400-04FF
11 | ARMENIAN | 0530-058F
27 | BASIC GEORGIAN | 10D0-10FF
30 | LATIN EXTENDED ADDITIONAL | 1E00-1EFF
31 | GREEK EXTENDED | 1F00-1FFF
32 | GENERAL PUNCTUATION | 2000-206F
33 | SUPERSCRIPTS AND SUBSCRIPTS | 2070-209F
34 | CURRENCY SYMBOLS | 20A0-20CF
35 | COMBINING DIACRITICAL MARKS FOR SYMBOLS | 20D0-20FF
36 | LETTERLIKE SYMBOLS | 2100-214F
37 | NUMBER FORMS | 2150-218F
38 | ARROWS | 2190-21FF
39 | MATHEMATICAL OPERATORS | 2200-22FF
40 | MISCELLANEOUS TECHNICAL | 2300-23FF
42 | OPTICAL CHARACTER RECOGNITION | 2440-245F
44 | BOX DRAWING | 2500-257F
45 | BLOCK ELEMENTS | 2580-259F
46 | GEOMETRIC SHAPES | 25A0-25FF
47 | MISCELLANEOUS SYMBOLS | 2600-26FF
65 | COMBINING HALF MARKS | FE20-FE2F
70 | SPECIALS | FFF0-FFFD
92 | CYRILLIC SUPPLEMENT | 0500-052F
104 | LTR ALPHABETIC PRESENTATION FORMS | FB00-FB1C
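For implementers, the ranges listed above can serve as a quick check on whether a character falls within the repertoire addressed by this annex. The Python sketch below transcribes the ranges from the table. The function name `in_listed_ranges` is hypothetical, and the check only tests range membership: it does not tell whether a code point is actually assigned within its range.

```python
# (first, last) code positions transcribed from the table in this annex.
LISTED_RANGES = [
    (0x0020, 0x007E), (0x00A0, 0x00FF), (0x0100, 0x017F), (0x0180, 0x024F),
    (0x0250, 0x02AF), (0x02B0, 0x02FF), (0x0300, 0x036F), (0x0370, 0x03CF),
    (0x03D0, 0x03FF), (0x0400, 0x04FF), (0x0530, 0x058F), (0x10D0, 0x10FF),
    (0x1E00, 0x1EFF), (0x1F00, 0x1FFF), (0x2000, 0x206F), (0x2070, 0x209F),
    (0x20A0, 0x20CF), (0x20D0, 0x20FF), (0x2100, 0x214F), (0x2150, 0x218F),
    (0x2190, 0x21FF), (0x2200, 0x22FF), (0x2300, 0x23FF), (0x2440, 0x245F),
    (0x2500, 0x257F), (0x2580, 0x259F), (0x25A0, 0x25FF), (0x2600, 0x26FF),
    (0xFE20, 0xFE2F), (0xFFF0, 0xFFFD), (0x0500, 0x052F), (0xFB00, 0xFB1C),
]

def in_listed_ranges(ch: str) -> bool:
    """Return True if the single character ch lies in one of the listed ranges."""
    code_point = ord(ch)
    return any(first <= code_point <= last for first, last in LISTED_RANGES)

for sample in ["a", "€", "λ", "ა", "中"]:
    print(f"U+{ord(sample):04X}", in_listed_ranges(sample))
```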
Annex G (informative): EOR Delta in LDML Syntax
<?xml version="1.0" encoding="utf-8"?>
<ldml>
<identity>
<!--
Authors: Marc Wilhelm Küster (EN 13710 editor) and Åke Persson
LDML version of the rules specified in Section 6 of EN 13710
-->
<version number="1.0" cldrVersion="1.8.1"/>
<generation date="2010-05-03"/>
<language type="EOR"/>
</identity>
<collations validSubLocales="All European locales">
<collation type="standard">
<settings caseLevel="on" caseFirst="lower" strength="quaternary"/>
<rules>
<!--
currency signs and modifier letters are ignored in EOR
-->
<reset><last_tertiary_ignorable/></reset>
<i>$</i><!-- U0024 -->
<i>¢</i><!-- U00A2 -->
<i>£</i><!-- U00A3 -->
<i>¤</i><!-- U00A4 -->
<i>¥</i><!-- U00A5 -->
<i>₠</i><!-- U20A0 -->
<i>₡</i><!-- U20A1 -->
<i>₢</i><!-- U20A2 -->
<i>₣</i><!-- U20A3 -->
<i>₤</i><!-- U20A4 -->
<i>₥</i><!-- U20A5 -->
<i>₦</i><!-- U20A6 -->
<i>₧</i><!-- U20A7 -->
<i>₨</i><!-- U20A8 -->
<i>₩</i><!-- U20A9 -->
<i>₪</i><!-- U20AA -->
<i>₫</i><!-- U20AB -->
<i>€</i><!-- U20AC -->
<i>₭</i><!-- U20AD -->
<i>₮</i><!-- U20AE -->
<i>₯</i><!-- U20AF -->
<i>₰</i><!-- U20B0 -->
<i>₱</i><!-- U20B1 -->
<i>₲</i><!-- U20B2 -->
<i>₳</i><!-- U20B3 -->
<i>₴</i><!-- U20B4 -->
<i>₵</i><!-- U20B5 -->
<!-- General category Lm -->
<i>ʰ</i><!-- U02B0 -->
<i>ʱ</i><!-- U02B1 -->
<i>ʲ</i><!-- U02B2 -->
<i>ʳ</i><!-- U02B3 -->
<i>ʴ</i><!-- U02B4 -->
<i>ʵ</i><!-- U02B5 -->
<i>ʶ</i><!-- U02B6 -->
<i>ʷ</i><!-- U02B7 -->
<i>ʸ</i><!-- U02B8 -->
<i>ʻ</i><!-- U02BB -->
<i>ʼ</i><!-- U02BC -->
<i>ʽ</i><!-- U02BD -->
<i>ʾ</i><!-- U02BE -->
<i>ʿ</i><!-- U02BF -->
<i>ˀ</i><!-- U02C0 -->
<i>ˁ</i><!-- U02C1 -->
<i>ː</i><!-- U02D0 -->
<i>ˑ</i><!-- U02D1 -->
<i>ˠ</i><!-- U02E0 -->
<i>ˡ</i><!-- U02E1 -->
<i>ˢ</i><!-- U02E2 -->
<i>ˣ</i><!-- U02E3 -->
<i>ˤ</i><!-- U02E4 -->
<i>ˮ</i><!-- U02EE -->
<!-- Glottal stops -->
<i>ʔ</i><!-- U0294 -->
<i>ʕ</i><!-- U0295 -->
<i>ʖ</i><!-- U0296 -->
<i>ʘ</i><!-- U0298 -->
<i>ʡ</i><!-- U02A1 -->
<i>ʢ</i><!-- U02A2 -->
<!--Latin letters: only a-z + þ are basic letters -->
<reset>a</reset> <!-- turned a, latin alpha, turned alpha -->
<s>ɐ</s><!-- U0250 -->
<s>ɑ</s><!-- U0251 -->
<s>ɒ</s><!-- U0252 -->
<reset>b</reset> <!-- small capital b, b with stroke, b with hook, b with topbar -->
<s>ʙ</s><!-- U0299 -->
<s>ƀ</s><!-- U0180 -->
<t>Ƀ</t><!-- U0243 -->
<s>ɓ</s><!-- U0253 -->
<t>Ɓ</t><!-- U0181 -->
<s>ƃ</s><!-- U0183 -->
<t>Ƃ</t><!-- U0182 -->
<reset>c</reset> <!-- c with hook, c with curl, stretched c -->
<s>ƈ</s><!-- U0188 -->
<t>Ƈ</t><!-- U0187 -->
<s>ɕ</s><!-- U0255 -->
<s>ʗ</s><!-- U0297 -->
<reset>d</reset> <!-- African d, d with hook,
d with topbar, d with curl, turned delta -->
<s>ɖ</s><!-- U0256 -->
<t>Ɖ</t><!-- U0189 -->
<s>ɗ</s><!-- U0257 -->
<t>Ɗ</t><!-- U018A -->
<s>ƌ</s><!-- U018C -->
<t>Ƌ</t><!-- U018B -->
<s>ȡ</s><!-- U0221 -->
<s>ƍ</s><!-- U018D -->
<reset>e</reset> <!-- turned e, schwa, open e, reversed e, schwa with hook,
reversed open e, reversed open e with hook, closed reversed open e, closed open e -->
<s>ǝ</s><!-- U01DD -->
<t>Ǝ</t><!-- U018E -->
<s>ə</s><!-- U0259 -->
<t>Ə</t><!-- U018F -->
<s>ɛ</s><!-- U025B -->
<t>Ɛ</t><!-- U0190 -->
<s>ɘ</s><!-- U0258 -->
<s>ɚ</s><!-- U025A -->
<s>ɜ</s><!-- U025C -->
<s>ɝ</s><!-- U025D -->
<s>ɞ</s><!-- U025E -->
<s>ʚ</s><!-- U029A -->
<reset>f</reset> <!-- f with hook -->
<s>ƒ</s><!-- U0192 -->
<t>Ƒ</t><!-- U0191 -->
<reset>g</reset> <!-- script g, small capital g, g with stroke,
g with hook, latin gamma, gha -->
<s>ɡ</s><!-- U0261 -->
<s>ɢ</s><!-- U0262 -->
<s>ǥ</s><!-- U01E5 -->
<t>Ǥ</t><!-- U01E4 -->
<s>ɠ</s><!-- U0260 -->
<t>Ɠ</t><!-- U0193 -->
<s>ʛ</s><!-- U029B -->
<s>ɣ</s><!-- U0263 -->
<t>Ɣ</t><!-- U0194 -->
<s>ɤ</s><!-- U0264 -->
<s>ƣ</s><!-- U01A3 -->
<t>Ƣ</t><!-- U01A2 -->
<reset>h</reset> <!-- small capital h, h with hook, heng with hook, turned h,
turned h with fishhook, turned h with fishhook and tail -->
<s>ʜ</s><!-- U029C -->
<s>ɦ</s><!-- U0266 -->
<s>ɧ</s><!-- U0267 -->
<s>ɥ</s><!-- U0265 -->
<s>ʮ</s><!-- U02AE -->
<s>ʯ</s><!-- U02AF -->
<reset>i</reset> <!-- dotless i, small capital i, i with stroke, latin iota -->
<s>ı</s><!-- U0131 -->
<s>ɪ</s><!-- U026A -->
<s>ɨ</s><!-- U0268 -->
<t>Ɨ</t><!-- U0197 -->
<s>ɩ</s><!-- U0269 -->
<t>Ɩ</t><!-- U0196 -->
<reset>j</reset> <!-- j with crossed tail, dotless j with stroke,
dotless j with stroke and hook -->
<s>ʝ</s><!-- U029D -->
<s>ɟ</s><!-- U025F -->
<s>ʄ</s><!-- U0284 -->
<reset>k</reset> <!-- k with hook, kra, turned k -->
<s>ƙ</s><!-- U0199 -->
<t>Ƙ</t><!-- U0198 -->
<s>ĸ</s><!-- U0138 -->
<s>ʞ</s><!-- U029E -->
<reset>l</reset> <!-- small capital l, l with bar, l with middle tilde, l with belt,
l with retroflex hook, l with curl, latin lambda with stroke -->
<s>ʟ</s><!-- U029F -->
<s>ƚ</s><!-- U019A -->
<t>Ƚ</t><!-- U023D -->
<s>ɫ</s><!-- U026B -->
<s>ɬ</s><!-- U026C -->
<s>ɭ</s><!-- U026D -->
<s>ȴ</s><!-- U0234 -->
<s>ƛ</s><!-- U019B -->
<reset>m</reset> <!-- m with hook, turned m, turned m with long leg -->
<s>ɱ</s><!-- U0271 -->
<s>ɯ</s><!-- U026F -->
<t>Ɯ</t><!-- U019C -->
<s>ɰ</s><!-- U0270 -->
<reset>n</reset> <!-- n preceded by apostrophe, small capital n, n with left hook,
n with long right leg, n with retroflex hook, n with curl, eng -->
<s>ʼn</s><!-- U0149 -->
<s>ɴ</s><!-- U0274 -->
<s>ɲ</s><!-- U0272 -->
<t>Ɲ</t><!-- U019D -->
<s>ƞ</s><!-- U019E -->
<t>Ƞ</t><!-- U0220 -->
<s>ɳ</s><!-- U0273 -->
<s>ȵ</s><!-- U0235 -->
<s>ŋ</s><!-- U014B -->
<t>Ŋ</t><!-- U014A -->
<reset>o</reset> <!-- open o, o with middle tilde, closed omega, ou -->
<s>ɔ</s><!-- U0254 -->
<t>Ɔ</t><!-- U0186 -->
<s>ɵ</s><!-- U0275 -->
<t>Ɵ</t><!-- U019F -->
<s>ɷ</s><!-- U0277 -->
<s>ȣ</s><!-- U0223 -->
<t>Ȣ</t><!-- U0222 -->
<reset>p</reset> <!-- p with hook, latin phi -->
<s>ƥ</s><!-- U01A5 -->
<t>Ƥ</t><!-- U01A4 -->
<s>ɸ</s><!-- U0278 -->
<reset>q</reset> <!-- q with hook -->
<s>ʠ</s><!-- U02A0 -->
<reset>r</reset> <!-- small capital r, yr, turned r, turned r with long leg,
turned r with hook, r with long leg, r with tail, r with fishhook,
reversed r with fishhook, small capital inverted r -->
<s>ʀ</s><!-- U0280 -->
<t>Ʀ</t><!-- U01A6 -->
<s>ɹ</s><!-- U0279 -->
<s>ɺ</s><!-- U027A -->
<s>ɻ</s><!-- U027B -->
<s>ɼ</s><!-- U027C -->
<s>ɽ</s><!-- U027D -->
<s>ɾ</s><!-- U027E -->
<s>ɿ</s><!-- U027F -->
<s>ʁ</s><!-- U0281 -->
<reset>s</reset> <!-- s with hook, esh, esh loop, reversed esh,
esh with curl -->
<s>ʂ</s><!-- U0282 -->
<s>ʃ</s><!-- U0283 -->
<t>Ʃ</t><!-- U01A9 -->
<s>ƪ</s><!-- U01AA -->
<s>ʅ</s><!-- U0285 -->
<s>ʆ</s><!-- U0286 -->
<reset>t</reset> <!-- t with stroke, t with palatal hook, t with hook,
t with retroflex hook, t with curl, turned t, digraph tc with curl -->
<s>ŧ</s><!-- U0167 -->
<t>Ŧ</t><!-- U0166 -->
<s>ƫ</s><!-- U01AB -->
<s>ƭ</s><!-- U01AD -->
<t>Ƭ</t><!-- U01AC -->
<s>ʈ</s><!-- U0288 -->
<t>Ʈ</t><!-- U01AE -->
<s>ȶ</s><!-- U0236 -->
<s>ʇ</s><!-- U0287 -->
<reset>u</reset> <!-- u bar, latin upsilon -->
<s>ʉ</s><!-- U0289 -->
<t>Ʉ</t><!-- U0244 -->
<s>ʊ</s><!-- U028A -->
<t>Ʊ</t><!-- U01B1 -->
<reset>v</reset> <!-- v with hook, turned v -->
<s>ʋ</s><!-- U028B -->
<t>Ʋ</t><!-- U01B2 -->
<s>ʌ</s><!-- U028C -->
<t>Ʌ</t><!-- U0245 -->
<reset>w</reset> <!-- turned w, wynn -->
<s>ʍ</s><!-- U028D -->
<s>ƿ</s><!-- U01BF -->
<t>Ƿ</t><!-- U01F7 -->
<reset>y</reset> <!-- small capital y, y with hook, turned y, yogh -->
<s>ʏ</s><!-- U028F -->
<s>ƴ</s><!-- U01B4 -->
<t>Ƴ</t><!-- U01B3 -->
<s>ʎ</s><!-- U028E -->
<s>ȝ</s><!-- U021D -->
<t>Ȝ</t><!-- U021C -->
<reset>z</reset> <!-- z with stroke, z with hook, z with retroflex hook,
z with curl, ezh, ezh with caron,
ezh reversed, ezh with tail, ezh with curl -->
<s>ƶ</s><!-- U01B6 -->
<t>Ƶ</t><!-- U01B5 -->
<s>ȥ</s><!-- U0225 -->
<t>Ȥ</t><!-- U0224 -->
<s>ʐ</s><!-- U0290 -->
<s>ʑ</s><!-- U0291 -->
<s>ʒ</s><!-- U0292 -->
<t>Ʒ</t><!-- U01B7 -->
<s>ǯ</s><!-- U01EF -->
<t>Ǯ</t><!-- U01EE -->
<s>ƹ</s><!-- U01B9 -->
<t>Ƹ</t><!-- U01B8 -->
<s>ƺ</s><!-- U01BA -->
<s>ʓ</s><!-- U0293 -->
<!-- Digraphs -->
<reset>dʑ</reset> <!-- dz digraph with curl -->
<s>ʥ</s><!-- U02A5 -->
<reset>dʒ</reset> <!-- dezh digraph -->
<s>ʤ</s><!-- U02A4 -->
<reset>hv</reset> <!-- hv -->
<s>ƕ</s><!-- U0195 -->
<reset>HV</reset> <!-- hwair -->
<s>Ƕ</s><!-- U01F6 -->
<reset>lʒ</reset> <!-- lezh -->
<s>ɮ</s><!-- U026E -->
<reset>oe</reset> <!-- small capital oe -->
<s>ɶ</s><!-- U0276 -->
<reset>tɕ</reset> <!-- tc digraph with curl -->
<s>ʨ</s><!-- U02A8 -->
<!-- tailorings against Greek obsolete when done against current UCA table -->
<!-- Cyrillic letters: full conformance with GOST requirements -->
<reset>ђ</reset> <!-- gje as variant of dje (Serbian) -->
<s>ѓ</s><!-- U0453 -->
<t>Ѓ</t><!-- U0403 -->
<reset>ћ</reset> <!-- kje as variant of tshe (Serbian) -->
<s>ќ</s><!-- U045C -->
<t>Ќ</t><!-- U040C -->
<!-- Georgian: unchanged -->
<!-- Armenian -->
<reset>ք</reset> <!-- U0584 -->
<p>և</p><!-- U0587 -->
</rules>
</collation>
</collations>
</ldml>
1 Shapes may vary according to fonts and styles
2 If possible, combining diacritical marks are referenced. If no corresponding combining diacritical mark exists, the table lists non-combining variants. Diacritical marks are unified for Cyrillic and Latin but not for Greek and Latin. This reflects prevalent usage and user-expectations
3 Names in lowercase letters are only an informative selection of some of the most common alternative names. Names in capitals are normative.
4 Strictly speaking, umlaut and trema can be two typographically slightly different phenomena, but the distinction is increasingly becoming obsolete.
5 The letters sometimes referred to as small g with comma above and capital g with comma below are to be ordered as small g with cedilla and capital g with cedilla respectively.
6 Exists only in combination with α, η, ω as ᾳ, ῃ, ῳ.
7 Position and name in ISO/IEC 10646:2003. If no corresponding combining diacritical mark exists, the table lists non-combining variants. If these also do not exist, the table simply gives the names of the diacritical marks. Diacritical marks are not unified across scripts unless this reflects prevalent usage and user-expectations
8 This and several other combinations cannot reasonably be printed as stand-alone diacritics. They are presented here in combination with the letter α
9 Equivalent on First Ordering Level
10 Also known as letter gha
