European Ordering Rules: Ordering of characters from Latin, Greek, Cyrillic, Georgian and Armenian scripts

Status: prEN 13710 integrating the comments from the CEN enquiry

Editor and Contact Person: Marc Wilhelm Küster (kuester [AT] fh-worms [DOT] de)

Contents

  1. European Ordering Rules: Ordering of characters from Latin, Greek, Cyrillic, Georgian and Armenian scripts
  2. Foreword
  3. Introduction
  4. 1 Scope
  5. 2 Normative references
  6. 3 Terms and definitions
    1. 3.1
    2. 3.2
    3. 3.3
    4. 3.4
    5. 3.5
    6. 3.6
    7. 3.7
    8. 3.8
  7. 4 Conformance
  8. 5 Tailorability
  9. 7 Bibliography
  10. 6 EOR Delta Table
  11. Annex A (informative): Principles behind the European Ordering Rules
    1. A.0 Introduction
    2. A.1 Terms and definitions
      1. A.1.1
      2. A.1.2
      3. A.1.3
      4. A.1.4
      5. A.1.5
      6. A.1.6
      7. A.1.7
      8. A.1.8
      9. A.1.9
      10. A.1.10
      11. A.1.11
    3. A.2 Preparatory procedures
      1. A.2.1 Purpose
      2. A.2.2 Methodology
      3. A.2.3 Further preprocessing
    4. A.3 The multilevel ordering procedure
      1. A.3.1 General principles
      2. A.3.2 Assumptions and aims
      3. A.3.3 Rules (valid throughout)
        1. A.3.3.1 Ordering by script
        2. A.3.3.2 Equivalent letter forms
    5. A.4 First ordering level
      1. A.4.1 Validity
      2. A.4.2 Equivalent or ignored characters
        1. A.4.2.1 Capital and small letters
        2. A.4.2.2 Second level letters
        3. A.4.2.3 Letters with diacritical marks
        4. A.4.2.4 Special characters
      3. A.4.3 Ordering sequences
        1. A.4.3.1 Digits
        2. A.4.3.2 Latin script
        3. A.4.3.3 Greek script
        4. A.4.3.4 Cyrillic script
        5. A.4.3.5 Georgian script
        6. A.4.3.6 Armenian script
    6. A.5 Second ordering level
      1. A.5.1 No unique sequence after the first ordering level
      2. A.5.2 Equivalent or ignored characters
        1. A.5.2.1 Capital and small letters
        2. A.5.2.2 Special characters
      3. A.5.3 Ordering sequences
        1. A.5.3.1 Second level letters
        2. A.5.3.2 Letters with diacritical marks
    7. A.6 Third ordering level
      1. A.6.1 No unique sequence after the second ordering level
      2. A.6.2 Ignored characters
      3. A.6.3 Ordering sequences
        1. A.6.3.1 Capitalization
    8. A.7 Fourth ordering level
      1. A.7.1 No unique sequence after the third ordering level
      2. A.7.3 Equivalence
    9. A.8 Specific ordering sequences
      1. A.8.1 Diacritical marks
        1. A.8.1.1 Diacritical marks
        2. A.8.1.2 Multiple diacritical marks
      2. A.8.2 Second level letters
  12. Annex B (informative): Word-by-word ordering
    1. B.1 Modified terminology
    2. B.2 Principles
    3. B.3 Example of Word-by-word vs. letter-by-letter ordering
    4. B.4 Simplified word-by-word ordering
  13. Annex C (informative): Ordering by position and by style
    1. C.1 Background
    2. C.2 Recommended rules
  14. Annex D (informative): Mixed-script ordering with one predominant script
    1. D.1 Background
    2. D.2 Suggested steps
    3. D.3 Explicit transliteration
  15. Annex E (informative): Defining National Deltas based on the EOR
    1. E.1 Background
    2. E.2 Structured Specification
    3. E.3 Machine-Readable Specification
    4. E.4 Example 1: National Delta for German
      1. E.4.1 Structured Specification
        1. E.4.1.1 First Ordering Level
        2. E.4.1.2 Second Ordering Level
        3. E.4.1.3 Third Ordering Level
        4. E.4.1.4 Fourth Ordering Level
      2. E.4.2 Delta against EOR in 14651-syntax
    5. E.5 Example 2: National Delta for Norwegian
      1. E.5.1 Structured Specification
        1. E.5.1.1 First Ordering Level
        2. E.5.1.2 Second Ordering Level
        3. E.5.1.3 Third Ordering Level
        4. E.5.1.4 Fourth Ordering Level
      2. E.5.2 Delta against EOR in 14651-syntax
  16. Annex F (informative): Modern European Scripts / MES
  17. Annex G (informative): EOR Delta in LDML Syntax

Foreword

This document (prEN 13710:2009) has been prepared by Technical Committee CEN/TC “304”, the secretariat of which is held by DIN.

This document is currently submitted to the CEN Enquiry.

This document will supersede ENV 13710:2001-12 European Ordering Rules — Ordering of characters from the Latin, Greek and Cyrillic scripts and CR 14400:2001-12 European Ordering Rules - Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts.

Introduction

This European Standard provides rules for ordering multilingual lists into a well-defined and intuitive sequence. These rules are intended for data from different European languages that must be brought into a predictable order that makes it easy for users from multiple cultural backgrounds to find information. The standard also serves as a basis for defining language-specific profiles, which take the rules of a given language community into account while still handling the full pan-European character set in a consistent manner.

The rules have been tested and widely adopted in two predecessor specifications, ENV 13710:2000-12 European Ordering Rules — Ordering of characters from the Latin, Greek and Cyrillic scripts and its companion and extension specification CR 14400:2001-12 European Ordering Rules - Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts. This European Standard consolidates these two documents into one technically largely upwards compatible standard.

This European Standard caters for two different target groups, software implementers on the one hand and users of ordering applications on the other.

Software implementers need unambiguous, machine-processable guidelines, which can readily be loaded into existing and future ordering applications. This goal can best be achieved by defining a European default ordering table in the syntaxes of two internationally relevant specifications in the field:

Users with no specific ICT background, however, need an explanation of the principles in a form more in line with existing national ordering standards or relevant practice. Tailoring tables can be difficult to read for human readers, so an explanation of the principles behind that table is given in the informative annexes. Users not familiar with the formal syntax of the tailoring table are advised to consult those annexes first.

The normative main part of this European Standard specifies letter-by-letter ordering of character strings. Informative Annex A presents equivalent information in a more human-oriented way. Informative Annex B deals with word-by-word ordering as a special form of ordering with multiple keys. Informative Annex C explains the use of further ordering criteria. Informative Annex D presents a widely used alternative to the main part, namely the amalgamation of several scripts in one index via implicit transliteration. Informative Annex E gives guidance on the use of this European Standard as the basis for expressing national deltas. Informative Annex F lists the underlying character repertoire for ease of reference. Informative Annex G expresses the formal delta in the LDML syntax.

Following the practice of ISO/IEC 14651, characters are referenced as UXXXX, where each X stands for a hexadecimal digit and XXXX is the code position of that character in ISO/IEC 10646. This convention is used throughout this European Standard.
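The UXXXX convention can be illustrated with a short Python sketch. The function names `ucs_ref_to_char` and `char_to_ucs_ref` are hypothetical helpers introduced here for illustration only. The sketch assumes four-digit references, which is sufficient for the characters referenced in this European Standard.

```python
import re

# Four uppercase hexadecimal digits after the letter U, as in U20AC.
_REF = re.compile(r"U([0-9A-F]{4})")


def ucs_ref_to_char(ref: str) -> str:
    """Return the character denoted by 'UXXXX' or by the table form '<UXXXX>'."""
    inner = ref[1:-1] if ref.startswith("<") and ref.endswith(">") else ref
    match = _REF.fullmatch(inner)
    if match is None:
        raise ValueError(f"not a UXXXX reference: {ref!r}")
    return chr(int(match.group(1), 16))


def char_to_ucs_ref(char: str) -> str:
    """Return the UXXXX reference of a single character (four digits assumed)."""
    return f"U{ord(char):04X}"


euro = ucs_ref_to_char("U20AC")       # EURO SIGN
thorn_ref = char_to_ucs_ref("\u00fe")  # LATIN SMALL LETTER THORN
```

The angle-bracket form is accepted because that is how the references appear in the delta table of section 6.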

1 Scope

This European Standard specifies the order between two character strings composed of characters from the Modern European Scripts (MES) collection of ISO/IEC 10646:2003 or subsets of it.

The ordering rules specified in this European Standard are applicable only to lists of data in more than one European language, and only when this data is intended for a multicultural audience. They complement existing national standards or practices in the field.

2 Normative references

This European Standard incorporates by dated or undated reference provisions from other publications. These normative references are quoted at the appropriate places in the text, and the publications are listed hereafter.

All standards are subject to revision. Dated references do not always refer to subsequent amendments of the publication in question. Undated references always refer to the latest edition.

ISO/IEC 10646:2003-12, Information Technology — Universal Multi-Octet Coded Character set (UCS). Second edition

ISO 12199:2000-8, Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet

ISO/IEC 14651:2007-12, International string ordering and comparison — Method for comparing character strings and description of the common template tailorable ordering

Unicode Technical Standard #10: Unicode Collation Algorithm. Version 5.2.0 (2009-10-08)

3 Terms and definitions

For the purpose of this European Standard the following definitions of ISO/IEC 10646 and of ISO/IEC 14651 apply:

3.1

character

member of a set of elements used for the organisation, control, or representation of data [ISO/IEC 10646:2003]

3.2

character string

sequence of characters considered as a single object [ISO/IEC 14651]

3.3

collating symbol

symbol used to specify weights assigned to a collating element [ISO/IEC 14651]

3.4

collating element

sequence of one or more characters that are considered a single entity for ordering [ISO/IEC 14651]

3.5

collation table

mapping from collating elements to weighting elements [ISO/IEC 14651]

3.6

delta

list of the differences between a given collation table and another one [ISO/IEC 14651]

3.7

ordering

process by which, given two strings, it is determined whether the first one is less than, equal to, or greater than the second one [ISO/IEC 14651]

3.8

sorting

presentation of information in a structured way

4 Conformance

In order to be conformant to this European Standard an application shall meet the requirements prescribed in section 6 of ISO/IEC 14651 and its Common Template Table ISO14651_2006_TABLE1 after the application of the EOR delta table specified in section 6 of this European Standard. An equivalent description of the resulting tailored table shall equally conform to this European Standard.

5 Tailorability

The European Ordering Rules defined in this standard can be taken as a default template which can be tailored to the needs of any European country in the manner specified by ISO/IEC 14651 (cf. also Informative Annex E).

This European Standard is not meant to influence national standards or traditions in the field of ordering, its scope being the ordering of multilingual data. Nonetheless, national standards are encouraged to express their national ordering rules on the basis of this European Standard by declaring a formalized set of deviation rules (“delta”), as explained in Informative Annex E. This way, the respective ordering rules are automatically machine-processable and can be incorporated into international repositories of locale data, allowing for more widespread support of national ordering standards across software products.
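The idea of a delta can be sketched in Python as a set of entries that override a base table. This is a simplified illustration under stated assumptions, not the tailoring mechanism of ISO/IEC 14651 itself: it models only the replacement of weight entries and does not model the reordering of collating symbols. The names `apply_delta`, `base_table` and `eor_delta` are hypothetical, and the base entries are placeholders rather than values copied from the Common Template Table.

```python
from typing import Dict, Tuple

# One tuple of collating symbols per ordering level (levels 1 to 4).
Weights = Tuple[Tuple[str, ...], Tuple[str, ...], Tuple[str, ...], Tuple[str, ...]]
Table = Dict[str, Weights]


def apply_delta(base: Table, delta: Table) -> Table:
    """Return a tailored copy of `base`; entries named in `delta` replace the base entries."""
    tailored = dict(base)   # the base table itself is left untouched
    tailored.update(delta)
    return tailored


# Placeholder base entries (assumed for illustration, not taken from the CTT).
base_table: Table = {
    "U0061": (("S0061",), ("BASE",), ("MIN",), ("U0061",)),
    "U0250": (("S0250",), ("BASE",), ("MIN",), ("U0250",)),
}

# Entry as given in the EOR delta table: TURNED A becomes a variant of the letter a.
eor_delta: Table = {
    "U0250": (("S0061",), ("BASE", "VRNT1"), ("MIN", "MIN"), ("U0250",)),
}

tailored_table = apply_delta(base_table, eor_delta)
```

A national delta could be layered in the same way on top of the tailored table, which is why a formalized delta is machine-processable.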

7 Bibliography

ISO/IEC 15897:1999 Information technology – Procedures for the registration of cultural elements

Unicode Technical Standard #35: Unicode Locale Data Markup Language (LDML). Version 1.8 (2010-04-28)

6 EOR Delta Table

%% EOR's EORDeltaTable
%
%% European Ordering Rules.
%
% EOR delta for MES-3 from ISO/IEC 14651:2007's CTT (ISO14651_2006_TABLE1_en.txt).
%
% EOR uses four levels for comparison

reorder-after <BASE> % Introduce the LIG weight.
collating-symbol <LIG>
<LIG>
reorder-end

reorder-after <VRNT3> %Introduce more variants
collating-symbol <VRNT4>
collating-symbol <VRNT5>
collating-symbol <VRNT6>
collating-symbol <VRNT7>
collating-symbol <VRNT8>
collating-symbol <VRNT9>
<VRNT4>
<VRNT5>
<VRNT6>
<VRNT7>
<VRNT8>
<VRNT9>
reorder-end

reorder-after <S0584> %Introduce a weight for U0587 ARMENIAN SMALL LIGATURE ECH YIWN
collating-symbol <ECH-YIWN>
<ECH-YIWN>
reorder-end

reorder-after <SFFFF> 

order_start forward;forward;forward;forward

% Non-alphanumeric characters (including some modifier letters):

% The DRACHMA SIGN is already in ISO14651_2006 ignorable on levels 1-3
<U0024> IGNORE;IGNORE;IGNORE;<U0024> % DOLLAR SIGN
<U00A2> IGNORE;IGNORE;IGNORE;<U00A2> % CENT SIGN
<U00A3> IGNORE;IGNORE;IGNORE;<U00A3> % POUND SIGN
<U00A4> IGNORE;IGNORE;IGNORE;<U00A4> % CURRENCY SIGN
<U00A5> IGNORE;IGNORE;IGNORE;<U00A5> % YEN SIGN
<U20A0> IGNORE;IGNORE;IGNORE;<U20A0> % EURO-CURRENCY SIGN
<U20A1> IGNORE;IGNORE;IGNORE;<U20A1> % COLON SIGN
<U20A2> IGNORE;IGNORE;IGNORE;<U20A2> % CRUZEIRO SIGN
<U20A3> IGNORE;IGNORE;IGNORE;<U20A3> % FRENCH FRANC SIGN
<U20A4> IGNORE;IGNORE;IGNORE;<U20A4> % LIRA SIGN
<U20A5> IGNORE;IGNORE;IGNORE;<U20A5> % MILL SIGN
<U20A6> IGNORE;IGNORE;IGNORE;<U20A6> % NAIRA SIGN
<U20A7> IGNORE;IGNORE;IGNORE;<U20A7> % PESETA SIGN
<U20A8> IGNORE;IGNORE;IGNORE;<U20A8> % RUPEE SIGN
<U20A9> IGNORE;IGNORE;IGNORE;<U20A9> % WON SIGN
<U20AA> IGNORE;IGNORE;IGNORE;<U20AA> % NEW SHEQEL SIGN
<U20AB> IGNORE;IGNORE;IGNORE;<U20AB> % DONG SIGN
<U20AC> IGNORE;IGNORE;IGNORE;<U20AC> % EURO SIGN
<U20AD> IGNORE;IGNORE;IGNORE;<U20AD> % KIP SIGN
<U20AE> IGNORE;IGNORE;IGNORE;<U20AE> % TUGRIK SIGN
<U20AF> IGNORE;IGNORE;IGNORE;<U20AF> % DRACHMA SIGN
<U20B0> IGNORE;IGNORE;IGNORE;<U20B0> % GERMAN PENNY SIGN
<U20B1> IGNORE;IGNORE;IGNORE;<U20B1> % PESO SIGN
<U20B2> IGNORE;IGNORE;IGNORE;<U20B2> % GUARANI SIGN
<U20B3> IGNORE;IGNORE;IGNORE;<U20B3> % AUSTRAL SIGN
<U20B4> IGNORE;IGNORE;IGNORE;<U20B4> % HRYVNIA SIGN
<U20B5> IGNORE;IGNORE;IGNORE;<U20B5> % CEDI SIGN

% Modifier letters that are not ignorable in ISO14651_2006_TABLE1_en.txt
<U02B0> IGNORE;IGNORE;IGNORE;<U02B0> % MODIFIER LETTER SMALL H
<U02B1> IGNORE;IGNORE;IGNORE;<U02B1> % MODIFIER LETTER SMALL H WITH HOOK
<U02B2> IGNORE;IGNORE;IGNORE;<U02B2> % MODIFIER LETTER SMALL J
<U02B3> IGNORE;IGNORE;IGNORE;<U02B3> % MODIFIER LETTER SMALL R
<U02B4> IGNORE;IGNORE;IGNORE;<U02B4> % MODIFIER LETTER SMALL TURNED R
<U02B5> IGNORE;IGNORE;IGNORE;<U02B5> % MODIFIER LETTER SMALL TURNED R WITH HOOK
<U02B6> IGNORE;IGNORE;IGNORE;<U02B6> % MODIFIER LETTER SMALL CAPITAL INVERTED R
<U02B7> IGNORE;IGNORE;IGNORE;<U02B7> % MODIFIER LETTER SMALL W
<U02B8> IGNORE;IGNORE;IGNORE;<U02B8> % MODIFIER LETTER SMALL Y
<U02BB> IGNORE;IGNORE;IGNORE;<U02BB> % MODIFIER LETTER TURNED COMMA
<U02BC> IGNORE;IGNORE;IGNORE;<U02BC> % MODIFIER LETTER APOSTROPHE
<U02BD> IGNORE;IGNORE;IGNORE;<U02BD> % MODIFIER LETTER REVERSED COMMA
<U02BE> IGNORE;IGNORE;IGNORE;<U02BE> % MODIFIER LETTER RIGHT HALF RING
<U02BF> IGNORE;IGNORE;IGNORE;<U02BF> % MODIFIER LETTER LEFT HALF RING
<U02C0> IGNORE;IGNORE;IGNORE;<U02C0> % MODIFIER LETTER GLOTTAL STOP
<U02C1> IGNORE;IGNORE;IGNORE;<U02C1> % MODIFIER LETTER REVERSED GLOTTAL STOP
<U02D0> IGNORE;IGNORE;IGNORE;<U02D0> % MODIFIER LETTER TRIANGULAR COLON
<U02D1> IGNORE;IGNORE;IGNORE;<U02D1> % MODIFIER LETTER HALF TRIANGULAR COLON
<U02E0> IGNORE;IGNORE;IGNORE;<U02E0> % MODIFIER LETTER SMALL GAMMA
<U02E1> IGNORE;IGNORE;IGNORE;<U02E1> % MODIFIER LETTER SMALL L
<U02E2> IGNORE;IGNORE;IGNORE;<U02E2> % MODIFIER LETTER SMALL S
<U02E4> IGNORE;IGNORE;IGNORE;<U02E4> % MODIFIER LETTER SMALL REVERSED GLOTTAL STOP
<U02EE> IGNORE;IGNORE;IGNORE;<U02EE> % MODIFIER LETTER DOUBLE APOSTROPHE

<U0294> IGNORE;IGNORE;IGNORE;<U0294> % LATIN LETTER GLOTTAL STOP
<U0295> IGNORE;IGNORE;IGNORE;<U0295> % LATIN LETTER PHARYNGEAL VOICED FRICATIVE
<U0296> IGNORE;IGNORE;IGNORE;<U0296> % LATIN LETTER INVERTED GLOTTAL STOP
<U0298> IGNORE;IGNORE;IGNORE;<U0298> % LATIN LETTER BILABIAL CLICK
<U02A1> IGNORE;IGNORE;IGNORE;<U02A1> % LATIN LETTER GLOTTAL STOP WITH STROKE
<U02A2> IGNORE;IGNORE;IGNORE;<U02A2> % LATIN LETTER REVERSED GLOTTAL STOP WITH STROKE

%% 
% Latin 
% Almost all changes here result from CEN/TC304's resolution
% for the Latin script part of the Modern European Scripts / MES-3 to
% treat only the letters a to z and thorn as distinct on the first
% level and treat other combinations as variants or ligatures

<U0250> <S0061>;"<BASE><VRNT1>";"<MIN><MIN>";<U0250> % LATIN SMALL LETTER TURNED A
<U0251> <S0061>;"<BASE><VRNT2>";"<MIN><MIN>";<U0251> % LATIN SMALL LETTER ALPHA
<U0252> <S0061>;"<BASE><VRNT3>";"<MIN><MIN>";<U0252> % LATIN SMALL LETTER TURNED ALPHA

<U0299> <S0062>;"<BASE><VRNT1>";"<MIN><MIN>";<U0299> % LATIN LETTER SMALL CAPITAL B
<U0180> <S0062>;"<BASE><VRNT2>";"<MIN><MIN>";<U0180> % LATIN SMALL LETTER B WITH STROKE
<U0243> <S0062>;"<BASE><VRNT2>";"<CAP><MIN>";<U0243> % LATIN CAPITAL LETTER B WITH STROKE
<U0253> <S0062>;"<BASE><VRNT3>";"<MIN><MIN>";<U0253> % LATIN SMALL LETTER B WITH HOOK
<U0181> <S0062>;"<BASE><VRNT3>";"<CAP><MIN>";<U0181> % LATIN CAPITAL LETTER B WITH HOOK
<U0183> <S0062>;"<BASE><VRNT4>";"<MIN><MIN>";<U0183> % LATIN SMALL LETTER B WITH TOPBAR
<U0182> <S0062>;"<BASE><VRNT4>";"<CAP><MIN>";<U0182> % LATIN CAPITAL LETTER B WITH TOPBAR

<U0188> <S0063>;"<BASE><VRNT1>";"<MIN><MIN>";<U0188> % LATIN SMALL LETTER C WITH HOOK
<U0187> <S0063>;"<BASE><VRNT1>";"<CAP><MIN>";<U0187> % LATIN CAPITAL LETTER C WITH HOOK
<U0255> <S0063>;"<BASE><VRNT2>";"<MIN><MIN>";<U0255> % LATIN SMALL LETTER C WITH CURL
<U0297> <S0063>;"<BASE><VRNT3>";"<MIN><MIN>";<U0297> % LATIN LETTER STRETCHED C

% <VRNT1> is used for U00F0 LATIN SMALL LETTER ETH (already in CTT)
<U0256> <S0064>;"<BASE><VRNT2>";"<MIN><MIN>";<U0256> % LATIN SMALL LETTER D WITH TAIL
<U0189> <S0064>;"<BASE><VRNT2>";"<CAP><MIN>";<U0189> % LATIN CAPITAL LETTER AFRICAN D
<U0257> <S0064>;"<BASE><VRNT3>";"<MIN><MIN>";<U0257> % LATIN SMALL LETTER D WITH HOOK
<U018A> <S0064>;"<BASE><VRNT3>";"<CAP><MIN>";<U018A> % LATIN CAPITAL LETTER D WITH HOOK
<U018C> <S0064>;"<BASE><VRNT4>";"<MIN><MIN>";<U018C> % LATIN SMALL LETTER D WITH TOPBAR
<U018B> <S0064>;"<BASE><VRNT4>";"<CAP><MIN>";<U018B> % LATIN CAPITAL LETTER D WITH TOPBAR
<U0221> <S0064>;"<BASE><VRNT5>";"<MIN><MIN>";<U0221> % LATIN SMALL LETTER D WITH CURL
<U018D> <S0064>;"<BASE><VRNT6>";"<MIN><MIN>";<U018D> % LATIN SMALL LETTER TURNED DELTA

<U02A5> "<S0064><S007A>";"<BASE><BASE><VRNT4>";"<COMPAT><COMPAT><COMPAT>";<U02A5> % LATIN SMALL LETTER DZ DIGRAPH WITH CURL
<U02A4> "<S0064><S007A>";"<BASE><BASE><VRNT5>";"<COMPAT><COMPAT><COMPAT>";<U02A4> % LATIN SMALL LETTER DEZH DIGRAPH

<U01DD> <S0065>;"<BASE><VRNT1>";"<MIN><MIN>";<U01DD> % LATIN SMALL LETTER TURNED E
<U018E> <S0065>;"<BASE><VRNT1>";"<CAP><MIN>";<U018E> % LATIN CAPITAL LETTER REVERSED E
<U0259> <S0065>;"<BASE><VRNT2>";"<MIN><MIN>";<U0259> % LATIN SMALL LETTER SCHWA
<U018F> <S0065>;"<BASE><VRNT2>";"<CAP><MIN>";<U018F> % LATIN CAPITAL LETTER SCHWA
<U025B> <S0065>;"<BASE><VRNT3>";"<MIN><MIN>";<U025B> % LATIN SMALL LETTER OPEN E
<U0190> <S0065>;"<BASE><VRNT3>";"<CAP><MIN>";<U0190> % LATIN CAPITAL LETTER OPEN E
<U0258> <S0065>;"<BASE><VRNT4>";"<MIN><MIN>";<U0258> % LATIN SMALL LETTER REVERSED E
<U025A> <S0065>;"<BASE><VRNT5>";"<MIN><MIN>";<U025A> % LATIN SMALL LETTER SCHWA WITH HOOK
<U025C> <S0065>;"<BASE><VRNT6>";"<MIN><MIN>";<U025C> % LATIN SMALL LETTER REVERSED OPEN E
<U025D> <S0065>;"<BASE><VRNT7>";"<MIN><MIN>";<U025D> % LATIN SMALL LETTER REVERSED OPEN E WITH HOOK
<U025E> <S0065>;"<BASE><VRNT8>";"<MIN><MIN>";<U025E> % LATIN SMALL LETTER CLOSED REVERSED OPEN E
<U029A> <S0065>;"<BASE><VRNT9>";"<MIN><MIN>";<U029A> % LATIN SMALL LETTER CLOSED OPEN E

<U0192> <S0066>;"<BASE><VRNT1>";"<MIN><MIN>";<U0192> % LATIN SMALL LETTER F WITH HOOK
<U0191> <S0066>;"<BASE><VRNT1>";"<CAP><MIN>";<U0191> % LATIN CAPITAL LETTER F WITH HOOK

<U0261> <S0067>;"<BASE><VRNT1>";"<MIN><MIN>";<U0261> % LATIN SMALL LETTER SCRIPT G
<U0262> <S0067>;"<BASE><VRNT2>";"<MIN><MIN>";<U0262> % LATIN LETTER SMALL CAPITAL G
<U01E5> <S0067>;"<BASE><VRNT3>";"<MIN><MIN>";<U01E5> % LATIN SMALL LETTER G WITH STROKE
<U01E4> <S0067>;"<BASE><VRNT3>";"<CAP><MIN>";<U01E4> % LATIN CAPITAL LETTER G WITH STROKE
<U0260> <S0067>;"<BASE><VRNT4>";"<MIN><MIN>";<U0260> % LATIN SMALL LETTER G WITH HOOK
<U0193> <S0067>;"<BASE><VRNT4>";"<CAP><MIN>";<U0193> % LATIN CAPITAL LETTER G WITH HOOK
<U029B> <S0067>;"<BASE><VRNT5>";"<MIN><MIN>";<U029B> % LATIN LETTER SMALL CAPITAL G WITH HOOK
<U0263> <S0067>;"<BASE><VRNT6>";"<MIN><MIN>";<U0263> % LATIN SMALL LETTER GAMMA
<U0194> <S0067>;"<BASE><VRNT6>";"<CAP><MIN>";<U0194> % LATIN CAPITAL LETTER GAMMA
<U0264> <S0067>;"<BASE><VRNT7>";"<MIN><MIN>";<U0264> % LATIN SMALL LETTER RAMS HORN
<U01A3> <S0067>;"<BASE><VRNT8>";"<MIN><MIN>";<U01A3> % LATIN SMALL LETTER OI
<U01A2> <S0067>;"<BASE><VRNT8>";"<CAP><MIN>";<U01A2> % LATIN CAPITAL LETTER OI

<U029C> <S0068>;"<BASE><VRNT1>";"<MIN><MIN>";<U029C> % LATIN LETTER SMALL CAPITAL H
<U0266> <S0068>;"<BASE><VRNT2>";"<MIN><MIN>";<U0266> % LATIN SMALL LETTER H WITH HOOK
<U0267> <S0068>;"<BASE><VRNT3>";"<MIN><MIN>";<U0267> % LATIN SMALL LETTER HENG WITH HOOK
<U0265> <S0068>;"<BASE><VRNT4>";"<MIN><MIN>";<U0265> % LATIN SMALL LETTER TURNED H
<U02AE> <S0068>;"<BASE><VRNT5>";"<MIN><MIN>";<U02AE> % LATIN SMALL LETTER TURNED H WITH FISHHOOK 
<U02AF> <S0068>;"<BASE><VRNT6>";"<MIN><MIN>";<U02AF> % LATIN SMALL LETTER TURNED H WITH FISHHOOK AND TAIL

<U0195> "<S0068><S0076>";"<BASE><BASE>";"<MIN><MIN>";<U0195> % LATIN SMALL LETTER HV
<U01F6> "<S0068><S0076>";"<BASE><BASE>";"<CAP><MIN>";<U01F6> % LATIN CAPITAL LETTER HWAIR

<U0131> <S0069>;"<BASE><VRNT1>";"<MIN><MIN>";<U0131> % LATIN SMALL LETTER DOTLESS I
<U026A> <S0069>;"<BASE><VRNT2>";"<MIN><MIN>";<U026A> % LATIN LETTER SMALL CAPITAL I
<U0268> <S0069>;"<BASE><VRNT3>";"<MIN><MIN>";<U0268> % LATIN SMALL LETTER I WITH STROKE
<U0197> <S0069>;"<BASE><VRNT3>";"<CAP><MIN>";<U0197> % LATIN CAPITAL LETTER I WITH STROKE
<U0269> <S0069>;"<BASE><VRNT4>";"<MIN><MIN>";<U0269> % LATIN SMALL LETTER IOTA
<U0196> <S0069>;"<BASE><VRNT4>";"<CAP><MIN>";<U0196> % LATIN CAPITAL LETTER IOTA

<U029D> <S006A>;"<BASE><VRNT1>";"<MIN><MIN>";<U029D> % LATIN SMALL LETTER J WITH CROSSED-TAIL
<U025F> <S006A>;"<BASE><VRNT2>";"<MIN><MIN>";<U025F> % LATIN SMALL LETTER DOTLESS J WITH STROKE
<U0284> <S006A>;"<BASE><VRNT3>";"<MIN><MIN>";<U0284> % LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK

<U0199> <S006B>;"<BASE><VRNT1>";"<MIN><MIN>";<U0199> % LATIN SMALL LETTER K WITH HOOK
<U0198> <S006B>;"<BASE><VRNT1>";"<CAP><MIN>";<U0198> % LATIN CAPITAL LETTER K WITH HOOK
<U0138> <S006B>;"<BASE><VRNT2>";"<MIN><MIN>";<U0138> % LATIN SMALL LETTER KRA
<U029E> <S006B>;"<BASE><VRNT3>";"<MIN><MIN>";<U029E> % LATIN SMALL LETTER TURNED K

%<VRNT1> is used for U0140 LATIN SMALL LETTER L WITH MIDDLE DOT (already in CTT)
<U029F> <S006C>;"<BASE><VRNT2>";"<MIN><MIN>";<U029F> % LATIN LETTER SMALL CAPITAL L
<U019A> <S006C>;"<BASE><VRNT3>";"<MIN><MIN>";<U019A> % LATIN SMALL LETTER L WITH BAR
<U023D> <S006C>;"<BASE><VRNT3>";"<CAP><MIN>";<U023D> % LATIN CAPITAL LETTER L WITH BAR
<U026B> <S006C>;"<BASE><VRNT4>";"<MIN><MIN>";<U026B> % LATIN SMALL LETTER L WITH MIDDLE TILDE
<U026C> <S006C>;"<BASE><VRNT5>";"<MIN><MIN>";<U026C> % LATIN SMALL LETTER L WITH BELT
<U026D> <S006C>;"<BASE><VRNT6>";"<MIN><MIN>";<U026D> % LATIN SMALL LETTER L WITH RETROFLEX HOOK
<U0234> <S006C>;"<BASE><VRNT7>";"<MIN><MIN>";<U0234> % LATIN SMALL LETTER L WITH CURL
<U019B> <S006C>;"<BASE><VRNT8>";"<MIN><MIN>";<U019B> % LATIN SMALL LETTER LAMBDA WITH STROKE

<U026E> "<S006C><S007A>";"<BASE><BASE><VRNT5>";"<MIN><MIN><MIN>";<U026E> % LATIN SMALL LETTER LEZH

<U0271> <S006D>;"<BASE><VRNT1>";"<MIN><MIN>";<U0271> % LATIN SMALL LETTER M WITH HOOK
<U026F> <S006D>;"<BASE><VRNT2>";"<MIN><MIN>";<U026F> % LATIN SMALL LETTER TURNED M
<U019C> <S006D>;"<BASE><VRNT2>";"<CAP><MIN>";<U019C> % LATIN CAPITAL LETTER TURNED M
<U0270> <S006D>;"<BASE><VRNT3>";"<MIN><MIN>";<U0270> % LATIN SMALL LETTER TURNED M WITH LONG LEG

<U0149> <S006E>;"<BASE><VRNT1>";"<MIN><MIN>";<U0149> % LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
<U0274> <S006E>;"<BASE><VRNT2>";"<MIN><MIN>";<U0274> % LATIN LETTER SMALL CAPITAL N
<U0272> <S006E>;"<BASE><VRNT3>";"<MIN><MIN>";<U0272> % LATIN SMALL LETTER N WITH LEFT HOOK
<U019D> <S006E>;"<BASE><VRNT3>";"<CAP><MIN>";<U019D> % LATIN CAPITAL LETTER N WITH LEFT HOOK
<U019E> <S006E>;"<BASE><VRNT4>";"<MIN><MIN>";<U019E> % LATIN SMALL LETTER N WITH LONG RIGHT LEG
<U0220> <S006E>;"<BASE><VRNT4>";"<CAP><MIN>";<U0220> % LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
<U0273> <S006E>;"<BASE><VRNT5>";"<MIN><MIN>";<U0273> % LATIN SMALL LETTER N WITH RETROFLEX HOOK
<U0235> <S006E>;"<BASE><VRNT6>";"<MIN><MIN>";<U0235> % LATIN SMALL LETTER N WITH CURL
<U014B> <S006E>;"<BASE><VRNT7>";"<MIN><MIN>";<U014B> % LATIN SMALL LETTER ENG
<U014A> <S006E>;"<BASE><VRNT7>";"<CAP><MIN>";<U014A> % LATIN CAPITAL LETTER ENG

% <VRNT1> is used for U0153 LATIN SMALL LIGATURE OE (already in CTT)
<U0254> <S006F>;"<BASE><VRNT2>";"<MIN><MIN>";<U0254> % LATIN SMALL LETTER OPEN O
<U0186> <S006F>;"<BASE><VRNT2>";"<CAP><MIN>";<U0186> % LATIN CAPITAL LETTER OPEN O
<U0275> <S006F>;"<BASE><VRNT3>";"<MIN><MIN>";<U0275> % LATIN SMALL LETTER BARRED O
<U019F> <S006F>;"<BASE><VRNT3>";"<CAP><MIN>";<U019F> % LATIN CAPITAL LETTER O WITH MIDDLE TILDE
<U0277> <S006F>;"<BASE><VRNT4>";"<MIN><MIN>";<U0277> % LATIN SMALL LETTER CLOSED OMEGA
<U0223> <S006F>;"<BASE><VRNT5>";"<MIN><MIN>";<U0223> % LATIN SMALL LETTER OU
<U0222> <S006F>;"<BASE><VRNT5>";"<CAP><MIN>";<U0222> % LATIN CAPITAL LETTER OU

<U0276> "<S006F><S0065>";"<BASE><VRNT1><BASE>";"<COMPAT><COMPAT><COMPAT>";<U0276> % LATIN LETTER SMALL CAPITAL OE

<U01A5> <S0070>;"<BASE><VRNT1>";"<MIN><MIN>";<U01A5> % LATIN SMALL LETTER P WITH HOOK
<U01A4> <S0070>;"<BASE><VRNT1>";"<CAP><MIN>";<U01A4> % LATIN CAPITAL LETTER P WITH HOOK
<U0278> <S0070>;"<BASE><VRNT2>";"<MIN><MIN>";<U0278> % LATIN SMALL LETTER PHI 

<U02A0> <S0071>;"<BASE><VRNT1>";"<MIN><MIN>";<U02A0> % LATIN SMALL LETTER Q WITH HOOK

<U0280> <S0072>;"<BASE><VRNT1>";"<MIN><MIN>";<U0280> % LATIN LETTER SMALL CAPITAL R
<U01A6> <S0072>;"<BASE><VRNT1>";"<CAP><MIN>";<U01A6> % LATIN LETTER YR
<U0279> <S0072>;"<BASE><VRNT2>";"<MIN><MIN>";<U0279> % LATIN SMALL LETTER TURNED R
<U027A> <S0072>;"<BASE><VRNT3>";"<MIN><MIN>";<U027A> % LATIN SMALL LETTER TURNED R WITH LONG LEG
<U027B> <S0072>;"<BASE><VRNT4>";"<MIN><MIN>";<U027B> % LATIN SMALL LETTER TURNED R WITH HOOK
<U027C> <S0072>;"<BASE><VRNT5>";"<MIN><MIN>";<U027C> % LATIN SMALL LETTER R WITH LONG LEG
<U027D> <S0072>;"<BASE><VRNT6>";"<MIN><MIN>";<U027D> % LATIN SMALL LETTER R WITH TAIL
<U027E> <S0072>;"<BASE><VRNT7>";"<MIN><MIN>";<U027E> % LATIN SMALL LETTER R WITH FISHHOOK
<U027F> <S0072>;"<BASE><VRNT8>";"<MIN><MIN>";<U027F> % LATIN SMALL LETTER REVERSED R WITH FISHHOOK
<U0281> <S0072>;"<BASE><VRNT9>";"<MIN><MIN>";<U0281> % LATIN LETTER SMALL CAPITAL INVERTED R

% <VRNT1> is used for U00DF LATIN SMALL LETTER SHARP S (already in CTT)
% <VRNT2> is used for U017F LATIN SMALL LETTER LONG S (already in CTT)
<U0282> <S0073>;"<BASE><VRNT3>";"<MIN><MIN>";<U0282> % LATIN SMALL LETTER S WITH HOOK
<U0283> <S0073>;"<BASE><VRNT4>";"<MIN><MIN>";<U0283> % LATIN SMALL LETTER ESH
<U01A9> <S0073>;"<BASE><VRNT4>";"<CAP><MIN>";<U01A9> % LATIN CAPITAL LETTER ESH
<U01AA> <S0073>;"<BASE><VRNT5>";"<MIN><MIN>";<U01AA> % LATIN LETTER REVERSED ESH LOOP
<U0285> <S0073>;"<BASE><VRNT6>";"<MIN><MIN>";<U0285> % LATIN SMALL LETTER SQUAT REVERSED ESH
<U0286> <S0073>;"<BASE><VRNT7>";"<MIN><MIN>";<U0286> % LATIN SMALL LETTER ESH WITH CURL

<U0167> <S0074>;"<BASE><VRNT1>";"<MIN><MIN>";<U0167> % LATIN SMALL LETTER T WITH STROKE
<U0166> <S0074>;"<BASE><VRNT1>";"<CAP><MIN>";<U0166> % LATIN CAPITAL LETTER T WITH STROKE
<U01AB> <S0074>;"<BASE><VRNT2>";"<MIN><MIN>";<U01AB> % LATIN SMALL LETTER T WITH PALATAL HOOK
<U01AD> <S0074>;"<BASE><VRNT3>";"<MIN><MIN>";<U01AD> % LATIN SMALL LETTER T WITH HOOK
<U01AC> <S0074>;"<BASE><VRNT3>";"<CAP><MIN>";<U01AC> % LATIN CAPITAL LETTER T WITH HOOK
<U0288> <S0074>;"<BASE><VRNT4>";"<MIN><MIN>";<U0288> % LATIN SMALL LETTER T WITH RETROFLEX HOOK
<U01AE> <S0074>;"<BASE><VRNT4>";"<CAP><MIN>";<U01AE> % LATIN CAPITAL LETTER T WITH RETROFLEX HOOK
<U0236> <S0074>;"<BASE><VRNT5>";"<MIN><MIN>";<U0236> % LATIN SMALL LETTER T WITH CURL
<U0287> <S0074>;"<BASE><VRNT6>";"<MIN><MIN>";<U0287> % LATIN SMALL LETTER TURNED T

<U02A8> "<S0074><S0063>";"<BASE><BASE><VRNT2>";"<COMPAT><COMPAT><COMPAT>";<U02A8> % LATIN SMALL LETTER TC DIGRAPH WITH CURL

<U0289> <S0075>;"<BASE><VRNT1>";"<MIN><MIN>";<U0289> % LATIN SMALL LETTER U BAR
<U0244> <S0075>;"<BASE><VRNT1>";"<CAP><MIN>";<U0244> % LATIN CAPITAL LETTER U BAR
<U028A> <S0075>;"<BASE><VRNT2>";"<MIN><MIN>";<U028A> % LATIN SMALL LETTER UPSILON
<U01B1> <S0075>;"<BASE><VRNT2>";"<CAP><MIN>";<U01B1> % LATIN CAPITAL LETTER UPSILON

<U028B> <S0076>;"<BASE><VRNT1>";"<MIN><MIN>";<U028B> % LATIN SMALL LETTER V WITH HOOK
<U01B2> <S0076>;"<BASE><VRNT1>";"<CAP><MIN>";<U01B2> % LATIN CAPITAL LETTER V WITH HOOK
<U028C> <S0076>;"<BASE><VRNT2>";"<MIN><MIN>";<U028C> % LATIN SMALL LETTER TURNED V
<U0245> <S0076>;"<BASE><VRNT2>";"<CAP><MIN>";<U0245> % LATIN CAPITAL LETTER TURNED V

<U028D> <S0077>;"<BASE><VRNT1>";"<MIN><MIN>";<U028D> % LATIN SMALL LETTER TURNED W
<U01BF> <S0077>;"<BASE><VRNT2>";"<MIN><MIN>";<U01BF> % LATIN LETTER WYNN
<U01F7> <S0077>;"<BASE><VRNT2>";"<CAP><MIN>";<U01F7> % LATIN CAPITAL LETTER WYNN

<U028F> <S0079>;"<BASE><VRNT1>";"<MIN><MIN>";<U028F> % LATIN LETTER SMALL CAPITAL Y 
<U01B4> <S0079>;"<BASE><VRNT2>";"<MIN><MIN>";<U01B4> % LATIN SMALL LETTER Y WITH HOOK
<U01B3> <S0079>;"<BASE><VRNT2>";"<CAP><MIN>";<U01B3> % LATIN CAPITAL LETTER Y WITH HOOK
<U028E> <S0079>;"<BASE><VRNT3>";"<MIN><MIN>";<U028E> % LATIN SMALL LETTER TURNED Y
<U021D> <S0079>;"<BASE><VRNT4>";"<MIN><MIN>";<U021D> % LATIN SMALL LETTER YOGH
<U021C> <S0079>;"<BASE><VRNT4>";"<CAP><MIN>";<U021C> % LATIN CAPITAL LETTER YOGH

<U01B6> <S007A>;"<BASE><VRNT1>";"<MIN><MIN>";<U01B6> % LATIN SMALL LETTER Z WITH STROKE
<U01B5> <S007A>;"<BASE><VRNT1>";"<CAP><MIN>";<U01B5> % LATIN CAPITAL LETTER Z WITH STROKE
<U0225> <S007A>;"<BASE><VRNT2>";"<MIN><MIN>";<U0225> % LATIN SMALL LETTER Z WITH HOOK
<U0224> <S007A>;"<BASE><VRNT2>";"<CAP><MIN>";<U0224> % LATIN CAPITAL LETTER Z WITH HOOK 
<U0290> <S007A>;"<BASE><VRNT3>";"<MIN><MIN>";<U0290> % LATIN SMALL LETTER Z WITH RETROFLEX HOOK
<U0291> <S007A>;"<BASE><VRNT4>";"<MIN><MIN>";<U0291> % LATIN SMALL LETTER Z WITH CURL
<U0292> <S007A>;"<BASE><VRNT5>";"<MIN><MIN>";<U0292> % LATIN SMALL LETTER EZH
<U01B7> <S007A>;"<BASE><VRNT5>";"<CAP><MIN>";<U01B7> % LATIN CAPITAL LETTER EZH
<U01EF> <S007A>;"<BASE><VRNT5><CARON>";"<MIN><MIN><MIN>";<U01EF> % LATIN SMALL LETTER EZH WITH CARON
<U01EE> <S007A>;"<BASE><VRNT5><CARON>";"<CAP><MIN><MIN>";<U01EE> % LATIN CAPITAL LETTER EZH WITH CARON
<U01B9> <S007A>;"<BASE><VRNT6>";"<MIN><MIN>";<U01B9> % LATIN SMALL LETTER EZH REVERSED
<U01B8> <S007A>;"<BASE><VRNT6>";"<CAP><MIN>";<U01B8> % LATIN CAPITAL LETTER EZH REVERSED
<U01BA> <S007A>;"<BASE><VRNT7>";"<MIN><MIN>";<U01BA> % LATIN SMALL LETTER EZH WITH TAIL 
<U0293> <S007A>;"<BASE><VRNT8>";"<MIN><MIN>";<U0293> % LATIN SMALL LETTER EZH WITH CURL


% Greek
% ISO14651_2006_TABLE1_en.txt now contains the tailorings of CR 14400 in its CTT

% Full conformance with GOST requirements for Cyrillic letters

<U0453> <S0452>;"<BASE><VRNT1>";"<MIN><MIN>";<U0453> % CYRILLIC SMALL LETTER GJE
<U0403> <S0452>;"<BASE><VRNT1>";"<CAP><MIN>";<U0403> % CYRILLIC CAPITAL LETTER GJE

<U045C> <S045B>;"<BASE><VRNT1>";"<MIN><MIN>";<U045C> % CYRILLIC SMALL LETTER KJE
<U040C> <S045B>;"<BASE><VRNT1>";"<CAP><MIN>";<U040C> % CYRILLIC CAPITAL LETTER KJE

% Georgian: Identical to ISO14651_2006_TABLE1_en.txt

% Armenian: 
<U0587> <ECH-YIWN>;<BASE>;<CAP>;<U0587> % ARMENIAN SMALL LIGATURE ECH YIWN


reorder-end %% for EOR's EORDeltaTable
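The structure of the weight lines in the table above can be illustrated with a small Python sketch. It splits one line into the character reference and its four weight lists, and then concatenates the weights of a character sequence level by level, as implied by `order_start forward;forward;forward;forward`. The names `parse_entry` and `level_keys` are hypothetical. The sketch handles only weight lines of the form used in this table, not the full syntax of ISO/IEC 14651, and it leaves the symbols as names: turning them into comparable values requires the symbol order of the complete tailored table, which is not modelled here.

```python
import re
from typing import Dict, List, Sequence, Tuple

Levels = List[Tuple[str, ...]]

_ENTRY = re.compile(r"<(U[0-9A-F]{4})>\s+(\S+)")
_SYMBOL = re.compile(r"<([^<>]+)>")


def parse_entry(line: str) -> Tuple[str, Levels]:
    """Split one weight line into its character reference and four weight lists."""
    body = line.split("%", 1)[0].strip()  # '%' starts a comment
    match = _ENTRY.fullmatch(body)
    if match is None:
        raise ValueError(f"not a weight line: {line!r}")
    fields = match.group(2).split(";")
    if len(fields) != 4:
        raise ValueError(f"expected four levels: {line!r}")
    # IGNORE contains no <symbol>, so it yields an empty weight list.
    return match.group(1), [tuple(_SYMBOL.findall(field)) for field in fields]


def level_keys(char_refs: Sequence[str], table: Dict[str, Levels]) -> Tuple[Tuple[str, ...], ...]:
    """Concatenate the weights of all characters, one key per level, scanning forward."""
    return tuple(
        tuple(symbol for ref in char_refs for symbol in table[ref][level])
        for level in range(4)
    )


lines = [
    '<U0024> IGNORE;IGNORE;IGNORE;<U0024> % DOLLAR SIGN',
    '<U0250> <S0061>;"<BASE><VRNT1>";"<MIN><MIN>";<U0250> % LATIN SMALL LETTER TURNED A',
]
table = dict(parse_entry(line) for line in lines)
keys = level_keys(["U0024", "U0250"], table)
```

In this example the DOLLAR SIGN contributes nothing to the first three keys and appears only in the fourth, which is the effect of the `IGNORE` entries.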

Annex A (informative): Principles behind the European Ordering Rules

A.0 Introduction

This annex aims to present the information inherent in section 6 in a more accessible form for those who are interested in the principles guiding the composition of the table. Those readers not concerned with implementation details may take this more traditional treatment of the matter as an authoritative interpretation of the body of this European Standard.

A.1 Terms and definitions

For the purpose of this annex, the following definitions apply in addition to those in the body of this European Standard (see section 3).

A.1.1

digit

any of the characters 0 (U0030), 1 (U0031), 2 (U0032), 3 (U0033), 4 (U0034), 5 (U0035), 6 (U0036), 7 (U0037), 8 (U0038), 9 (U0039)

A.1.2

letter

character used to represent (either alone or in combination) sounds or sequences of sounds of a natural language in writing

A.1.3

first level letter

character that is a member of the following list of letters:

Latin script:

Greek script:

Cyrillic script:

Georgian script:

Armenian script:

A.1.4

diacritical mark

any of a number of recurring graphical structures placed over, under or next to a first level letter which does not significantly modify the shape of the first level letter itself and which in combination with that first level letter is a valid letter.

A.1.5

letter with diacritical marks

letter which can be seen as equivalent to the combination of a first level letter and one or more diacritical marks

A.1.6

equivalent letter form

character created by joining two or more distinct first level letters or two or more letters with diacritical marks or any combination of these

A.1.7

second level letter

letter that is neither a first level letter nor an equivalent letter form nor a letter with diacritical marks

A.1.8

capital letter

letter which has the string CAPITAL in its name in ISO/IEC 10646

Latin script:

Greek script:

Cyrillic script:

Georgian script:

Armenian script:

A.1.9

small letter

letter which is not a capital letter

NOTE: A small letter is also known as a lowercase letter

A.1.10

special character

character that is neither a letter nor a digit

A.1.11

space character

one of the special characters listed in 20.1 of ISO/IEC 10646:2003

A.2 Preparatory procedures

A.2.1 Purpose

Most ordering tasks require more than simply the ordering of strings. In a telephone directory, for example, one might want to order by names first, followed by addresses and phone numbers, resorting to addresses only when ordering by names fails to establish a unique sequence, and to phone numbers only if both names and addresses are identical.

Each of these units is called a key and the approach is called the multiple ordering key approach.

A.2.2 Methodology

More rigorously expressed, the multiple ordering key approach implies the preprocessing of the data in the following steps, any or all of which may be omitted, especially in the case of a single ordering key:

  1. subdivision of data into multiple ordering keys through the introduction of a higher level protocol
  2. establishing a hierarchy between these keys
  3. extracting the keys from the data
  4. subjecting the keys to some form of normalization

    NOTE This normalization might include, but is not limited to: changing capital letters to small letters where it is considered appropriate (e.g. in the case of sentence initial capitals or capitals for emphasis), lemmatization (especially for inflected languages), expansion of abbreviations, or reduction of blanks between words to one throughout the data. It can also be left out entirely.

    NOTE An especially important step is usually the correct treatment of numeral strings where leading zeroes might have to be introduced to ensure proper comparisons between corresponding decimals. Failure to do so may result in faulty ordering.
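Two of the normalization steps named in the notes, reducing blanks and introducing leading zeroes, can be sketched in Python as follows. This is an illustration only: the helper name `normalize_key` and the fixed width of six digits are assumptions made for the example and are not part of this European Standard.

```python
import re

def normalize_key(text: str, width: int = 6) -> str:
    """Collapse runs of blanks and left-pad digit runs with zeroes (hypothetical helper)."""
    # Reduce blanks between words to a single space.
    text = re.sub(r"\s+", " ", text.strip())
    # Pad every run of digits so that numerals compare by value, not by first digit.
    return re.sub(r"\d+", lambda m: m.group().zfill(width), text)

entries = ["Room 10", "Room 9", "Room  2"]
print(sorted(entries))                     # plain string comparison
print(sorted(entries, key=normalize_key))  # numerals compared by value
```

Without the padding, "Room 10" is placed before "Room 9" because the comparison stops at the first differing digit.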

Starting with the keys highest in the hierarchy, the equivalent keys thus obtained are compared with the aid of the ordering rules established in this European Standard. As soon as a unique sequence is established, further keys are ignored.
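A minimal Python sketch of the multiple ordering key approach for the telephone directory case is given here. The record type `Entry`, the sample data and the use of `str.casefold()` as a stand-in for the real ordering rules are all assumptions made for illustration.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    name: str
    address: str
    phone: str

def ordering_keys(entry: Entry) -> tuple[str, str, str]:
    # Hierarchy of keys: name first, then address, then phone number.
    # str.casefold() only stands in for the ordering rules of this standard.
    return (entry.name.casefold(), entry.address.casefold(), entry.phone)

directory = [
    Entry("Meyer", "Ringstrasse 4", "555-0199"),
    Entry("Meyer", "Bahnhofstrasse 1", "555-0123"),
    Entry("Adams", "Ringstrasse 4", "555-0150"),
]
ordered = sorted(directory, key=ordering_keys)
for entry in ordered:
    print(entry.name, entry.address, entry.phone)
```

Tuple comparison stops at the first key that differs, which mirrors the rule that further keys are ignored once a unique sequence is established.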

A.2.3 Further preprocessing

Further preprocessing of some kind may or may not be necessary, but is not within the scope of this European Standard.

This European Standard assumes that users have already performed these preparatory procedures which are left entirely to their discretion and are thus out of its scope. It is concerned exclusively with the ordering of strings which belong to one key and which have undergone those preparatory procedures.

A.3 The multilevel ordering procedure

A.3.1 General principles

This European Standard defines in this informative annex a multilevel ordering procedure whose results are identical to those produced by the application of the rules of the body of this standard.

Multilevel ordering procedure means that the input strings are first compared on the first ordering level. Only when the procedure described for this level fails to establish a unique and determined sequence for the strings are the different parts of the second ordering level taken into consideration. If this likewise fails to produce a unique sequence, the third ordering level is invoked, and after this the fourth ordering level. If this also cannot establish a unique sequence, the two strings are regarded as equivalent.

Each level compares two strings in the following manner: The first non-ignored characters are compared. If the ordering rules for that level specify a unique and determined sequence for these characters, then this determines the sequence of the strings. If not, the second non-ignored characters are compared, and so forth until one of the following conditions is met. If more than one of the conditions is true, only the first one that is fulfilled is applicable:

  1. the ordering rules for that level define a unique sequence for the two non-ignored characters which is then also the ordering sequence for the strings;
  2. one of the strings has no more non-ignored characters whereas the other has. Then the string without more characters precedes the other one;
  3. both strings have no more non-ignored characters. Then the next ordering level, if existing, is invoked. If there are no more levels, the two strings are deemed equivalent.
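The comparison procedure with its three conditions can be sketched in Python as follows. The representation of a level as a function returning a weight (or `None` for an ignored character) and the two toy levels `first` and `third` are assumptions for illustration; they are far simpler than the actual levels of this annex.

```python
from typing import Callable, Optional, Sequence

# A level maps a character to a weight, or to None if the character
# is ignored on that level (hypothetical representation).
Level = Callable[[str], Optional[int]]

def compare(a: str, b: str, levels: Sequence[Level]) -> int:
    """Return -1, 0 or 1 following the three conditions of A.3.1."""
    for weight in levels:
        wa = [w for w in map(weight, a) if w is not None]
        wb = [w for w in map(weight, b) if w is not None]
        for x, y in zip(wa, wb):
            if x != y:                      # condition 1: characters decide
                return -1 if x < y else 1
        if len(wa) != len(wb):              # condition 2: shorter string first
            return -1 if len(wa) < len(wb) else 1
        # condition 3: no decision on this level, try the next one
    return 0                                # equivalent on all levels

# Toy levels: case ignored first, then small letters before capitals.
first: Level = lambda c: ord(c.lower()) if c.isalnum() else None
third: Level = lambda c: (1 if c.isupper() else 0) if c.isalnum() else None

print(compare("ab", "AB", [first]))         # no decision on the first level
print(compare("ab", "AB", [first, third]))  # decided by capitalization
```

Such a comparison function can be passed to `sorted` via `functools.cmp_to_key`.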

A.3.2 Assumptions and aims

This European Standard acts according to certain assumptions:

These assumptions motivate a set of principles that underlie these European Ordering Rules and help to clarify the decisions taken:

A.3.3 Rules (valid throughout)

A.3.3.1 Ordering by script

Digits precede letters. Letters are ordered by scripts, putting Latin letters before Greek ones before Cyrillic ones before Georgian ones before Armenian ones.
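As a rough Python sketch of this rule, a rank per character can be derived from its character name. Reading the script off the first word of the name returned by `unicodedata.name` is an assumption made for this example, as are the names `SCRIPT_ORDER` and `script_rank`.

```python
import unicodedata

SCRIPT_ORDER = ["LATIN", "GREEK", "CYRILLIC", "GEORGIAN", "ARMENIAN"]

def script_rank(ch: str) -> int:
    """Rank 0 for the digits 0-9, then 1..5 for the scripts in SCRIPT_ORDER."""
    if ch in "0123456789":
        return 0
    # Assumption: the script is the first word of the character name.
    name = unicodedata.name(ch, "")
    for rank, script in enumerate(SCRIPT_ORDER, start=1):
        if name.startswith(script):
            return rank
    raise ValueError(f"character outside the covered scripts: {ch!r}")

sample = ["ա", "я", "ω", "a", "7", "ა"]
print(sorted(sample, key=script_rank))
```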

A.3.3.2 Equivalent letter forms

Equivalent letter forms are decomposed into the letters out of which they are formed.
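Unicode compatibility decomposition (NFKD) splits many ligature-like characters into their component letters and can serve as a rough approximation of this rule. That NFKD coincides with the notion of equivalent letter form in every case is an assumption, not something this European Standard states; the helper name is hypothetical.

```python
import unicodedata

def decompose_letter_forms(text: str) -> str:
    # NFKD splits ligature-like characters into their component letters
    # (and precomposed letters into letter + combining mark).
    # It is used here as an approximation only.
    return unicodedata.normalize("NFKD", text)

print(decompose_letter_forms("\u01c9ubav"))  # U+01C9 LATIN SMALL LETTER LJ
print(decompose_letter_forms("\ufb01n"))     # U+FB01 LATIN SMALL LIGATURE FI
```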

A.4 First ordering level

A.4.1 Validity

All of the following rules are valid for the first ordering level only.

A.4.2 Equivalent or ignored characters

A.4.2.1 Capital and small letters

Capital and small forms of the same letter are treated as equivalent.

A.4.2.2 Second level letters

Second level letters are treated as equivalent to one or more first level letters as specified in section A.8.2.

A.4.2.3 Letters with diacritical marks

Letters with diacritical marks are treated as equivalent to their corresponding first level letters.

A.4.2.4 Special characters

Special characters are ignored.
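The equivalences of A.4.2 can be sketched as a key function in Python. The mapping `SECOND_LEVEL` is only a short excerpt of the table in A.8.2, and the use of canonical decomposition (NFD) to separate diacritical marks is an assumption of this example.

```python
import unicodedata

# Excerpt of the second level letter table in A.8.2 (not exhaustive).
SECOND_LEVEL = {"æ": "ae", "ø": "o", "đ": "d", "ł": "l", "ß": "ss", "œ": "oe"}

def first_level_key(text: str) -> str:
    out = []
    # Capital and small letters are equivalent: compare in lowercase.
    for ch in unicodedata.normalize("NFD", text.lower()):
        if unicodedata.combining(ch):   # diacritical marks play no role here
            continue
        if not ch.isalnum():            # special characters are ignored
            continue
        # Second level letters count as their first level equivalents.
        out.append(SECOND_LEVEL.get(ch, ch))
    return "".join(out)

for word in ["Łódź", "Straße", "co-op", "rôle"]:
    print(word, "->", first_level_key(word))
```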

A.4.3 Ordering sequences

A.4.3.1 Digits

Digits are to be ordered in the following sequence:

A.4.3.2 Latin script

Latin first level letters are to be ordered in the following sequence:

A.4.3.3 Greek script

Greek first level letters are to be ordered in the following sequence:

A.4.3.4 Cyrillic script

Cyrillic first level letters are to be ordered in the following sequence:

A.4.3.5 Georgian script

Georgian first level letters are to be ordered in the following sequence:

A.4.3.6 Armenian script

Armenian first level letters are to be ordered in the following sequence:

A.5 Second ordering level

A.5.1 No unique sequence after the first ordering level

If the first ordering level does not result in a unique sequence, the second ordering level is invoked. It is distinguished from the first ordering level by no longer treating letters with diacritical marks and second level letters as equivalent to first level letters.

The second ordering level is divided into two parts: second level letters and diacritical marks. If the treatment of second level letters alone results in a unique sequence, diacritical marks are to be ignored.

A.5.2 Equivalent or ignored characters

A.5.2.1 Capital and small letters

Capital and small forms of the same letter are treated as equivalent.

A.5.2.2 Special characters

Special characters are ignored.

A.5.3 Ordering sequences

A.5.3.1 Second level letters

Second level letters are to be ordered after their corresponding first level letter. In the case of multiple second level letters with the same first level letter they are to be ordered in the sequence specified by A.8.2.

A.5.3.2 Letters with diacritical marks

Letters with diacritical marks that have only one diacritical mark are to be ordered with respect to their diacritical mark in the sequence indicated in section A.8.1.1. For letters with more than one diacritical mark, the diacritical marks shall be considered in the following order: inside the character before outside; below the character before above; working from bottom to top, then from left to right. In practice, this results for MES in the sequence indicated in section A.8.1.2.
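The diacritical mark part of the second ordering level can be sketched in Python for the simple case of at most one mark per letter. The list follows the sequence of A.8.1.1 (U1FBE is left out because it is not a combining character); the weighting scheme itself, 0 for an unmarked letter and the position in the list otherwise, is an assumption of this example, and letters with several marks are not handled properly.

```python
import unicodedata

# Sequence of A.8.1.1, as combining characters.
DIACRITIC_ORDER = [
    "\u0313", "\u0314", "\u0301", "\u0300", "\u0306", "\u0302", "\u030C",
    "\u030A", "\u0342", "\u0308", "\u030B", "\u0303", "\u0307", "\u0327",
    "\u0328", "\u0304", "\u0326", "\u0345",
]
RANK = {mark: i + 1 for i, mark in enumerate(DIACRITIC_ORDER)}

def diacritic_key(text: str) -> list[int]:
    """One weight per base letter: 0 if unmarked, else the rank of its mark."""
    weights: list[int] = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if weights:
                # Simplification: a later mark overwrites an earlier one.
                weights[-1] = RANK.get(ch, len(RANK) + 1)
        else:
            weights.append(0)
    return weights

words = ["côte", "cote", "coté", "côté"]
print(sorted(words, key=diacritic_key))
```

All four words are equivalent on the first ordering level, so the weights computed here decide their sequence.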

A.6 Third ordering level

A.6.1 No unique sequence after the second ordering level

If the second ordering level also does not result in a unique sequence of strings, the third ordering level is invoked. It no longer treats capital and small letters as equivalent.

A.6.2 Ignored characters

Special characters are ignored.

A.6.3 Ordering sequences

A.6.3.1 Capitalization

Small letters are ordered before the corresponding capital ones.
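A small Python sketch of the capitalization rule follows. Using `str.isupper()` to recognize capital letters is an assumption of this example; it is not the name-based definition of A.1.8.

```python
def case_key(text: str) -> list[int]:
    # 0 for a small letter, 1 for a capital letter; non-letters are skipped
    # because special characters are ignored on this level.
    return [1 if ch.isupper() else 0 for ch in text if ch.isalpha()]

words = ["Polish", "POLISH", "polish"]
print(sorted(words, key=case_key))
```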

A.7 Fourth ordering level

A.7.1 No unique sequence after the third ordering level

If the third ordering level likewise does not result in a unique sequence of strings, the fourth ordering level is invoked. It takes special characters into account.

A.7.2 Sequence of special characters

Most special characters of the Multilingual European Subset No 3 except for currency signs are ordered in the sequence of the default tailorable template of ISO/IEC 14651:2007. For most special characters this is the order in which they are listed in ISO/IEC 10646 and relevant appendices. However, for a number of special characters ISO/IEC 14651 defines a divergent sequence in line with the specification of the Canadian standard CAN/CSA Z243.230-1996.

A.7.3 Equivalence

Two strings for which no unique sequence can be established after the fourth ordering level are considered to be equivalent.

A.8 Specific ordering sequences

A.8.1 Diacritical marks

A.8.1.1 Diacritical marks

This form of presentation has been chosen to enable the unification of diacritical marks across scripts without modifying the resulting sequence of strings. Official Greek names of the diacritics are underlined.

Shape [1] | Diacritical mark [2] | Alternative names [3]
᾿ | U0313 COMBINING COMMA ABOVE | PSILI (spacing U1FBF) / spiritus lenis
 ̔ | U0314 COMBINING REVERSED COMMA ABOVE | DASIA (spacing U1FFE) / spiritus asper
´ | U0301 COMBINING ACUTE ACCENT | OXIA, Tonos
` | U0300 COMBINING GRAVE ACCENT | VARIA
˘ | U0306 COMBINING BREVE | VRACHY
 ̂ | U0302 COMBINING CIRCUMFLEX ACCENT |
 ̌ | U030C COMBINING CARON |
˚ | U030A COMBINING RING ABOVE |
῀ | U0342 COMBINING GREEK PERISPOMENI |
¨ | U0308 COMBINING DIAERESIS | DIALYTICA, umlaut, trema [4]
˝ | U030B COMBINING DOUBLE ACUTE ACCENT |
˜ | U0303 COMBINING TILDE |
˙ | U0307 COMBINING DOT ABOVE |
¸ | U0327 COMBINING CEDILLA |
˛ | U0328 COMBINING OGONEK |
 ̄ | U0304 COMBINING MACRON | Greek macron, length
 ̦ | U0326 COMBINING COMMA BELOW [5] |
ι | U1FBE PROSGEGRAMMENI | iota adscriptum
 ͅ | U0345 COMBINING GREEK YPOGEGRAMMENI | iota subscriptum [6]

A.8.1.2 Multiple diacritical marks

Shape

Diacritical mark [7]

8

U1FCE

PSILI AND OXIA

PSILI AND OXIA AND YPOGEGRAMMENI

U1FCD

PSILI AND VARIA

PSILI AND VARIA AND YPOGEGRAMMENI

U1FCF

PSILI AND PERISPOMENI

PSILI AND PERISPOMENI AND YPOGEGRAMMENI

PSILI AND YPOGEGRAMMENI

U1FDE

DASIA AND OXIA

DASIA AND OXIA AND YPOGEGRAMMENI

U1FDD

DASIA AND VARIA

DASIA AND VARIA AND YPOGEGRAMMENI

U1FDF

DASIA AND PERISPOMENI

DASIA AND PERISPOMENI AND YPOGEGRAMMENI

DASIA AND YPOGEGRAMMENI

OXIA AND YPOGEGRAMMENI

VARIA AND YPOGEGRAMMENI

ǻ

RING ABOVE AND ACUTE

PERISPOMENI AND YPOGEGRAMMENI

­΅­

U1FEE

DIALYTIKA AND OXIA

­ ΅­

U0385

DIALYTIKA AND TONOS

­῭­

U1FED

DIALYTIKA AND VARIA

­῁­

U1FC1

DIALYTIKA AND PERISPOMENI

ȫ

DIAERESIS AND MACRON

A.8.2 Second level letters

Shape

Position and name of second level letter in ISO/IEC 10646:2003

Equiv. FOL [9]

ɐ

U0250

LATIN SMALL LETTER TURNED A

a

ɑ

U0251

LATIN SMALL LETTER ALPHA

a

ɒ

U0252

LATIN SMALL LETTER TURNED ALPHA

a

æ

U00E6

LATIN SMALL LETTER AE

ae

Æ

U00C6

LATIN CAPITAL LETTER AE

AE

ǽ

U01FD

LATIN SMALL LETTER AE WITH ACUTE

áe

Ǽ

U01FC

LATIN CAPITAL LETTER AE WITH ACUTE

ÁE

ǣ

U01E3

LATIN SMALL LETTER AE WITH MACRON

āe

Ǣ

U01E2

LATIN CAPITAL LETTER AE WITH MACRON

ĀE

ʙ

U0299

LATIN LETTER SMALL CAPITAL B

b

ƀ

U0180

LATIN SMALL LETTER B WITH STROKE

b

Ƀ

U0243

LATIN CAPITAL LETTER B WITH STROKE

B

ɓ

U0253

LATIN SMALL LETTER B WITH HOOK

b

Ɓ

U0181

LATIN CAPITAL LETTER B WITH HOOK

B

ƃ

U0183

LATIN SMALL LETTER B WITH TOPBAR

b

Ƃ

U0182

LATIN CAPITAL LETTER B WITH TOPBAR

B

ƈ

U0188

LATIN SMALL LETTER C WITH HOOK

c

Ƈ

U0187

LATIN CAPITAL LETTER C WITH HOOK

C

ɕ

U0255

LATIN SMALL LETTER C WITH CURL

c

ʗ

U0297

LATIN LETTER STRETCHED C

c

đ

U0111

LATIN SMALL LETTER D WITH STROKE

d

Đ

U0110

LATIN CAPITAL LETTER D WITH STROKE

D

ð

U00F0

LATIN SMALL LETTER ETH

d

Ð

U00D0

LATIN CAPITAL LETTER ETH

D

ɖ

U0256

LATIN SMALL LETTER D WITH TAIL

d

Ɖ

U0189

LATIN CAPITAL LETTER AFRICAN D

D

ɗ

U0257

LATIN SMALL LETTER D WITH HOOK

d

Ɗ

U018A

LATIN CAPITAL LETTER D WITH HOOK

D

ƌ

U018C

LATIN SMALL LETTER D WITH TOPBAR

d

Ƌ

U018B

LATIN CAPITAL LETTER D WITH TOPBAR

D

ȡ

U0221

LATIN SMALL LETTER D WITH CURL

d

ƍ

U018D

LATIN SMALL LETTER TURNED DELTA

d

ʥ

U02A5

LATIN SMALL LETTER DZ DIGRAPH WITH CURL

dz

ʤ

U02A4

LATIN SMALL LETTER DEZH DIGRAPH

dz

ǝ

U01DD

LATIN SMALL LETTER TURNED E

e

Ǝ

U018E

LATIN CAPITAL LETTER REVERSED E

E

ə

U0259

LATIN SMALL LETTER SCHWA

e

Ə

U018F

LATIN CAPITAL LETTER SCHWA

E

ɛ

U025B

LATIN SMALL LETTER OPEN E

e

Ɛ

U0190

LATIN CAPITAL LETTER OPEN E

E

ɘ

U0258

LATIN SMALL LETTER REVERSED E

e

ɚ

U025A

LATIN SMALL LETTER SCHWA WITH HOOK

e

ɜ

U025C

LATIN SMALL LETTER REVERSED OPEN E

e

ɝ

U025D

LATIN SMALL LETTER REVERSED OPEN E WITH HOOK

e

ɞ

U025E

LATIN SMALL LETTER CLOSED REVERSED OPEN E

e

ʚ

U029A

LATIN SMALL LETTER CLOSED OPEN E

e

ƒ

U0192

LATIN SMALL LETTER F WITH HOOK

f

Ƒ

U0191

LATIN CAPITAL LETTER F WITH HOOK

F

ɡ

U0261

LATIN SMALL LETTER SCRIPT G

g

ɢ

U0262

LATIN LETTER SMALL CAPITAL G

g

ǥ

U01E5

LATIN SMALL LETTER G WITH STROKE

g

Ǥ

U01E4

LATIN CAPITAL LETTER G WITH STROKE

G

ɠ

U0260

LATIN SMALL LETTER G WITH HOOK

g

Ɠ

U0193

LATIN CAPITAL LETTER G WITH HOOK

G

ʛ

U029B

LATIN SMALL CAPITAL LETTER G WITH HOOK

g

ɣ

U0263

LATIN SMALL LETTER GAMMA

g

Ɣ

U0194

LATIN CAPITAL LETTER GAMMA

G

ɤ

U0264

LATIN SMALL LETTER RAMS HORN

g

ƣ [10]

U01A3

LATIN SMALL LETTER OI

g

Ƣ

U01A2

LATIN CAPITAL LETTER OI

G

ħ

U0127

LATIN SMALL LETTER H WITH STROKE

h

Ħ

U0126

LATIN CAPITAL LETTER H WITH STROKE

H

ʜ

U029C

LATIN LETTER SMALL CAPITAL H

h

ɦ

U0266

LATIN SMALL LETTER H WITH HOOK

h

ɧ

U0267

LATIN SMALL LETTER HENG WITH HOOK

h

ɥ

U0265

LATIN SMALL LETTER TURNED H

h

ʮ

U02AE

LATIN SMALL LETTER TURNED H WITH FISHHOOK

h

ʯ

U02AF

LATIN SMALL LETTER TURNED H WITH FISHHOOK AND TAIL

h

ƕ

U0195

LATIN SMALL LETTER HV

hv

Ƕ

U01F6

LATIN CAPITAL LETTER HWAIR

HV

U2071

SUPERSCRIPT LATIN SMALL LETTER I

i

ı

U0131

LATIN SMALL LETTER DOTLESS I

i

ɪ

U026A

LATIN LETTER SMALL CAPITAL I

i

ɨ

U0268

LATIN SMALL LETTER I WITH STROKE

i

Ɨ

U0197

LATIN CAPITAL LETTER I WITH STROKE

I

ɩ

U0269

LATIN SMALL LETTER IOTA

i

Ɩ

U0196

LATIN CAPITAL LETTER IOTA

I

ij

U0133

LATIN SMALL LIGATURE IJ

ij

IJ

U0132

LATIN CAPITAL LIGATURE IJ

IJ

ʝ

U029D

LATIN SMALL LETTER J WITH CROSSED-TAIL

j

ɟ

U025F

LATIN SMALL LETTER DOTLESS J WITH STROKE

j

ʄ

U0284

LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK

j

ƙ

U0199

LATIN SMALL LETTER K WITH HOOK

k

Ƙ

U0198

LATIN CAPITAL LETTER K WITH HOOK

K

ĸ

U0138

LATIN SMALL LETTER KRA

k

ʞ

U029E

LATIN SMALL LETTER TURNED K

k

ł

U0142

LATIN SMALL LETTER L WITH STROKE

l

Ł

U0141

LATIN CAPITAL LETTER L WITH STROKE

L

ŀ

U0140

LATIN SMALL LETTER L WITH MIDDLE DOT

l

Ŀ

U013F

LATIN CAPITAL LETTER L WITH MIDDLE DOT

L

ʟ

U029F

LATIN LETTER SMALL CAPITAL L

l

ƚ

U019A

LATIN SMALL LETTER L WITH BAR

l

Ƚ

U023D

LATIN CAPITAL LETTER L WITH BAR

L

ɫ

U026B

LATIN SMALL LETTER L WITH MIDDLE TILDE

l

ɬ

U026C

LATIN SMALL LETTER L WITH BELT

l

ɭ

U026D

LATIN SMALL LETTER L WITH RETROFLEX HOOK

l

ȴ

U0234

LATIN SMALL LETTER L WITH CURL

l

ƛ

U019B

LATIN SMALL LETTER LAMBDA WITH STROKE

l

ɮ

U026E

LATIN SMALL LETTER LEZH

lz

ɱ

U0271

LATIN SMALL LETTER M WITH HOOK

m

ɯ

U026F

LATIN SMALL LETTER TURNED M

m

Ɯ

U019C

LATIN CAPITAL LETTER TURNED M

M

ɰ

U0270

LATIN SMALL LETTER TURNED M WITH LONG LEG

m

U207F

SUPERSCRIPT LATIN SMALL LETTER N

n

'n

U0149

LATIN SMALL LETTER N PRECEDED BY APOSTROPHE

n

ɴ

U0274

LATIN LETTER SMALL CAPITAL N

n

ɲ

U0272

LATIN SMALL LETTER N WITH LEFT HOOK

n

Ɲ

U019D

LATIN CAPITAL LETTER N WITH LEFT HOOK

N

ƞ

U019E

LATIN SMALL LETTER N WITH LONG RIGHT LEG

n

Ƞ

U0220

LATIN CAPITAL LETTER N WITH LONG RIGHT LEG

N

ɳ

U0273

LATIN SMALL LETTER N WITH RETROFLEX HOOK

n

ȵ

U0235

LATIN SMALL LETTER N WITH CURL

n

ŋ

U014B

LATIN SMALL LETTER ENG

n

Ŋ

U014A

LATIN CAPITAL LETTER ENG

N

ø

U00F8

LATIN SMALL LETTER O WITH STROKE

o

Ø

U00D8

LATIN CAPITAL LETTER O WITH STROKE

O

ǿ

U01FF

LATIN SMALL LETTER O WITH STROKE AND ACUTE

o

Ǿ

U01FE

LATIN CAPITAL LETTER O WITH STROKE AND ACUTE

O

ơ

U01A1

LATIN SMALL LETTER O WITH HORN

o

Ơ

U01A0

LATIN CAPITAL LETTER O WITH HORN

O

ɔ

U0254

LATIN SMALL LETTER OPEN O

o

Ɔ

U0186

LATIN CAPITAL LETTER OPEN O

O

ɵ

U0275

LATIN SMALL LETTER BARRED O

o

Ɵ

U019F

LATIN CAPITAL LETTER O WITH MIDDLE TILDE

O

ɷ

U0277

LATIN SMALL LETTER CLOSED OMEGA

o

ȣ

U0223

LATIN SMALL LETTER OU

o

Ȣ

U0222

LATIN CAPITAL LETTER OU

O

œ

U0153

LATIN SMALL LIGATURE OE

oe

ɶ

U0276

LATIN LETTER SMALL CAPITAL OE

oe

Œ

U0152

LATIN CAPITAL LIGATURE OE

OE

ƥ

U01A5

LATIN SMALL LETTER P WITH HOOK

p

Ƥ

U01A4

LATIN CAPITAL LETTER P WITH HOOK

P

ɸ

U0278

LATIN SMALL LETTER PHI

p

ʠ

U02A0

LATIN SMALL LETTER Q WITH HOOK

q

ʀ

U0280

LATIN LETTER SMALL CAPITAL R

r

Ʀ

U01A6

LATIN LETTER YR

R

ɹ

U0279

LATIN SMALL LETTER TURNED R

r

ɺ

U027A

LATIN SMALL LETTER TURNED R WITH LONG LEG

r

ɻ

U027B

LATIN SMALL LETTER TURNED R WITH HOOK

r

ɼ

U027C

LATIN SMALL LETTER R WITH LONG LEG

r

ɽ

U027D

LATIN SMALL LETTER R WITH TAIL

r

ɾ

U027E

LATIN SMALL LETTER R WITH FISHHOOK

r

ɿ

U027F

LATIN SMALL LETTER REVERSED R WITH FISHHOOK

r

ʁ

U0281

LATIN LETTER SMALL CAPITAL INVERTED R

r

ſ

U017F

LATIN SMALL LETTER LONG S

s

ʂ

U0282

LATIN SMALL LETTER S WITH HOOK

s

ʃ

U0283

LATIN SMALL LETTER ESH

s

Ʃ

U01A9

LATIN CAPITAL LETTER ESH

S

ƪ

U01AA

LATIN REVERSED ESH LOOP

s

ʅ

U0285

LATIN SMALL LETTER SQUAT REVERSED ESH

s

ʆ

U0286

LATIN SMALL LETTER ESH WITH CURL

s

ß

U00DF

LATIN SMALL LETTER SHARP S

ss

ŧ

U0167

LATIN SMALL LETTER T WITH STROKE

t

Ŧ

U0166

LATIN CAPITAL LETTER T WITH STROKE

T

ƫ

U01AB

LATIN SMALL LETTER T WITH PALATAL HOOK

t

ƭ

U01AD

LATIN SMALL LETTER T WITH HOOK

t

Ƭ

U01AC

LATIN CAPITAL LETTER T WITH HOOK

T

ʈ

U0288

LATIN SMALL LETTER T WITH RETROFLEX HOOK

t

Ʈ

U01AE

LATIN CAPITAL LETTER T WITH RETROFLEX HOOK

T

ȶ

U0236

LATIN SMALL LETTER T WITH CURL

t

ʇ

U0287

LATIN SMALL LETTER TURNED T

t

ʨ

U02A8

LATIN SMALL LETTER TC DIGRAPH WITH CURL

tc

ư

U01B0

LATIN SMALL LETTER U WITH HORN

u

Ư

U01AF

LATIN CAPITAL LETTER U WITH HORN

U

ʉ

U0289

LATIN SMALL LETTER U BAR

u

Ʉ

U0244

LATIN CAPITAL LETTER U BAR

U

ʊ

U028A

LATIN SMALL LETTER UPSILON

u

Ʊ

U01B1

LATIN CAPITAL LETTER UPSILON

U

ʋ

U028B

LATIN SMALL LETTER V WITH HOOK

v

Ʋ

U01B2

LATIN CAPITAL LETTER V WITH HOOK

V

ʌ

U028C

LATIN SMALL LETTER TURNED V

v

Ʌ

U0245

LATIN CAPITAL LETTER TURNED V

V

ʍ

U028D

LATIN SMALL LETTER TURNED W

w

ƿ

U01BF

LATIN LETTER WYNN

w

Ƿ

U01F7

LATIN CAPITAL LETTER WYNN

W

ʏ

U028F

LATIN LETTER SMALL CAPITAL Y

y

ƴ

U01B4

LATIN SMALL LETTER Y WITH HOOK

y

Ƴ

U01B3

LATIN CAPITAL LETTER Y WITH HOOK

Y

ʎ

U028E

LATIN SMALL LETTER TURNED Y

y

ȝ

U021D

LATIN SMALL LETTER YOGH

y

Ȝ

U021C

LATIN CAPITAL LETTER YOGH

Y

ƶ

U01B6

LATIN SMALL LETTER Z WITH STROKE

z

Ƶ

U01B5

LATIN CAPITAL LETTER Z WITH STROKE

Z

ȥ

U0225

LATIN SMALL LETTER Z WITH HOOK

z

Ȥ

U0224

LATIN CAPITAL LETTER Z WITH HOOK

Z

ʐ

U0290

LATIN SMALL LETTER Z WITH RETROFLEX HOOK

z

ʑ

U0291

LATIN SMALL LETTER Z WITH CURL

z

ʒ

U0292

LATIN SMALL LETTER EZH

z

Ʒ

U01B7

LATIN CAPITAL LETTER EZH

Z

ǯ

U01EF

LATIN SMALL LETTER EZH WITH CARON

z

Ǯ

U01EE

LATIN CAPITAL LETTER EZH WITH CARON

Z

ƹ

U01B9

LATIN SMALL LETTER EZH REVERSED

z

Ƹ

U01B8

LATIN CAPITAL LETTER EZH REVERSED

Z

ƺ

U01BA

LATIN SMALL LETTER EZH WITH TAIL

z

ʓ

U0293

LATIN SMALL LETTER EZH WITH CURL

z

ς

U03C2

GREEK SMALL LETTER FINAL SIGMA

σ

ґ

U0491

CYRILLIC SMALL LETTER GHE UPTURN

г

Ґ

U0490

CYRILLIC CAPITAL LETTER GHE UPTURN

Г

ѓ

U0453

CYRILLIC SMALL LETTER GJE

ђ

Ѓ

U0403

CYRILLIC CAPITAL LETTER GJE

Ђ

ќ

U045C

CYRILLIC SMALL LETTER KJE

ћ

Ќ

U040C

CYRILLIC CAPITAL LETTER KJE

Ћ

Annex B (informative): Word-by-word ordering

B.1 Modified terminology

For the purpose of this annex, a special character shall be a character that is neither a letter nor a digit nor a diacritical mark nor a space character.

B.2 Principles

Word-by-word ordering is a frequently used alternative to letter-by-letter ordering. It is a special case of multiple-key ordering which treats space characters as key separators. The maximal string is thus a set of characters enclosed by space characters.

The sets of strings thus obtained are ordered following the European Ordering Rules as specified in the main part of this European Standard.

B.3 Example of Word-by-word vs. letter-by-letter ordering

Letter-by-letter ordering

in-
inability
in absentia
inadvisable
in extenso
in medias res
in memoriam

Word-by-word ordering

in-
in absentia
in extenso
in medias res
in memoriam
inability
inadvisable
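The two sequences of this example can be reproduced with a short Python sketch. The helper `letters_only` is a strongly simplified stand-in for the first ordering level (lowercase, special characters ignored); all function names are assumptions made for illustration.

```python
ENTRIES = ["in memoriam", "inability", "in-", "in extenso",
           "inadvisable", "in absentia", "in medias res"]

def letters_only(text: str) -> str:
    # Simplified first level: ignore everything that is not a letter or digit.
    return "".join(ch for ch in text.lower() if ch.isalnum())

def letter_by_letter(text: str) -> str:
    # The whole entry is a single key; spaces are ignored like other specials.
    return letters_only(text)

def word_by_word(text: str) -> tuple[str, ...]:
    # Space characters act as key separators; each word is one key.
    return tuple(letters_only(word) for word in text.split())

print(sorted(ENTRIES, key=letter_by_letter))
print(sorted(ENTRIES, key=word_by_word))
```

In the word-by-word case the key of "in absentia" is the pair ("in", "absentia"), which precedes the single key ("inability",) because "in" is a proper prefix of "inability".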

B.4 Simplified word-by-word ordering

If the text to be ordered word by word contains only a few second level letters, letters with diacritical marks, or special characters, the following method will in most cases produce the same result as the method that is specified above.

The ordering by script rule (A.3.3.1) is modified so that space characters precede digits and letters. The space character is then removed from the table of special characters. The other ordering rules remain unchanged.
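A Python sketch of the simplified method is shown here, under the assumption that plain code point comparison of lowercase text is an acceptable stand-in for the other ordering rules. The function name is hypothetical.

```python
def simplified_key(text: str) -> str:
    # Keep letters, digits and the space; drop the other special characters.
    # U+0020 has a lower code point than any digit or letter, so plain
    # string comparison already places it first.
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch == " ")

entries = ["inability", "in extenso", "in-", "inadvisable", "in absentia"]
print(sorted(entries, key=simplified_key))
```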

Annex C (informative): Ordering by position and by style

C.1 Background

In some cases it is desirable to differentiate further on the third ordering level, e. g. in the case where definitions and ordinary usage of a word are distinguished solely by the application of some form of internal tagging. This tagging usually takes in print the form of a formatting style. Especially in lexicography it is also often thought to be desirable to distinguish between loan words and native words in such a manner.

This formatting can be expressed by changing the position relative to the baseline, e.g. in mathematical or chemical formulae, or by highlighting the text with certain typographic features, e.g. an italic typeface, that serve to indicate some property of the word.

C.2 Recommended rules

In line with ISO 12199 this European Standard recommends that, if the implementer deems it necessary to make this differentiation, she or he modify A.6.3.1 (Capitalization) on the third ordering level in the following manner:

Letters are to be arranged in the sequence indicated in this list:

  1. small letter on baseline
  2. capital letter on baseline
  3. small letter above baseline
  4. capital letter above baseline
  5. small letter below baseline
  6. capital letter below baseline

If this does not result in a unique sequence, typographic styles are to be taken into consideration in the sequence listed:

  1. roman abcde
  2. boldface abcde
  3. italic abcde
  4. boldface italic abcde
  5. others
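The two lists above can be combined into one key, sketched here in Python. The class `StyledLetter` and the string labels for positions and styles are hypothetical; how position and style are actually tagged in the data is left to the implementer.

```python
from dataclasses import dataclass

POSITION = {"baseline": 0, "above": 1, "below": 2}
STYLE = {"roman": 0, "boldface": 1, "italic": 2, "boldface italic": 3}

@dataclass(frozen=True)
class StyledLetter:          # hypothetical representation of a tagged letter
    char: str
    position: str = "baseline"
    style: str = "roman"

def third_level_key(letter: StyledLetter) -> tuple[int, int]:
    case = 1 if letter.char.isupper() else 0
    # Within each position, the small letter precedes the capital letter.
    position_and_case = POSITION[letter.position] * 2 + case
    # Styles are consulted only if position and case do not decide;
    # any style not listed falls under "others" and comes last.
    return (position_and_case, STYLE.get(letter.style, len(STYLE)))

letters = [
    StyledLetter("X", "below"),
    StyledLetter("x", "above"),
    StyledLetter("X"),
    StyledLetter("x", style="italic"),
    StyledLetter("x"),
]
for letter in sorted(letters, key=third_level_key):
    print(letter)
```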

Annex D (informative): Mixed-script ordering with one predominant script

D.1 Background

Many publications, often of the encyclopedia type, handle scripts differently from this European Standard, especially if they cover predominantly one script with a few entries from other scripts interspersed. They implicitly transliterate strings from other scripts into the predominant one and order according to the rules for that script. For printing, the strings are then rendered in their original form. This has the advantage that the user finds related articles, e.g. on λόγος and logic, near each other.
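Implicit transliteration can be sketched in Python as a sort key that is never shown to the reader. The table `GREEK_TO_LATIN` is a deliberately tiny, hypothetical mapping that covers only the letters of the example; a real publication would follow a full transliteration scheme.

```python
import unicodedata

# Hypothetical, deliberately tiny Greek-to-Latin table for illustration only.
GREEK_TO_LATIN = {"λ": "l", "ο": "o", "γ": "g", "ς": "s", "σ": "s"}

def transliterated_key(entry: str) -> str:
    # Decompose so that accents can be dropped, then map letter by letter.
    decomposed = unicodedata.normalize("NFD", entry.lower())
    return "".join(GREEK_TO_LATIN.get(ch, ch)
                   for ch in decomposed if not unicodedata.combining(ch))

entries = ["lotus", "λόγος", "logic", "lion"]
# The entries are ordered by the transliterated key but printed unchanged.
print(sorted(entries, key=transliterated_key))
```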

D.2 Suggested steps

This may involve the following steps:

D.3 Explicit transliteration

A different, likewise common method is the method of explicit transliteration which selects the transliterated word such as logos and adds the original rendering in brackets.

Annex E (informative): Defining National Deltas based on the EOR

E.1 Background

Ordering rules for European languages can benefit from unambiguous, ideally machine-processable specifications, both in the case of formal national standards and de facto standards issued by, e.g., the relevant language institutes. This work can be facilitated by basing this specification on a tailoring of the EOR. This annex gives an overview of possible approaches for writing such a delta.

E.2 Structured Specification

In most cases a delta will start with a structured, but not machine-processable description of the linguistic and cultural ordering preferences. Such a specification can list sequences of letters that are distinguished on the first ordering level. For example, Norwegian could list a b c d e f g h i j k l m n o p q r s t u v w x y z æ ø å, whereas the Polish first level letters would be a ą b c ć d e ę f g h i j k l ł m n ń o ó p r s ś t u w y z ź ż.
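How such a list of first level letters translates into an ordering can be sketched in Python. The two alphabet strings are taken from the paragraph above; the function `alphabet_key` and the decision to skip letters outside the alphabet are assumptions of this example.

```python
import unicodedata

NORWEGIAN = unicodedata.normalize("NFC", "abcdefghijklmnopqrstuvwxyzæøå")
POLISH = unicodedata.normalize("NFC", "aąbcćdeęfghijklłmnńoóprsśtuwyzźż")

def alphabet_key(word: str, alphabet: str) -> list[int]:
    # Weight of a letter = its position in the national alphabet.
    # Characters outside the alphabet are simply skipped in this sketch.
    composed = unicodedata.normalize("NFC", word.lower())
    return [alphabet.index(ch) for ch in composed if ch in alphabet]

words = ["åre", "zoo", "ære", "øre", "are"]
print(sorted(words, key=lambda w: alphabet_key(w, NORWEGIAN)))
```

With the Norwegian alphabet, words beginning with æ, ø and å follow those beginning with z; the same function with `POLISH` places ć directly after c and ł directly after l.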

A structured specification can then discuss typical distinctions that are made on the second ordering level, such as ss vs. ß in German or v vs. w in Icelandic dictionaries. If required, it can then look at the treatment of otherwise identical words beginning with lowercase vs. uppercase letters and specify a preference for the third ordering level. Should there be established procedures for the treatment of symbols, those can again be described.

In a second step, the differences between these rules and the EOR can be captured, again in words. Such rules could be “make w sort equal to v on the first ordering level, but distinguish them on the second, where w follows v” or “order æ ø å after z on the first ordering level” in Norwegian. In some cases, specific decisions in the EOR will already meet the requirements, so the delta can be small. Likewise, should there be no established cultural preferences, the delta can just implicitly rely on the EOR defaults. In line with ISO/IEC 14651 this European Standard recommends that the defined delta be as small as possible while still expressing the cultural and linguistic ordering preferences comprehensively.

This type of structured, but not machine-processable description meets the requirements of section 6.4 in ISO/IEC 14651:2007. In many cases, a profile at this level of structured, human-readable specification will suffice. If the rules are generally available, they can be translated into their machine-processable equivalents.

E.3 Machine-Readable Specification

Machine-processable specifications offer the additional benefit that they can be directly plugged into an operating system's locale data. All applications that build on the operating system's API will thus automatically profit from this data and order lists correctly according to the user's chosen cultural preferences.

Section 6.4 of ISO/IEC 14651:2007 prescribes that a machine-processable delta must contain at least one valid order_start entry, a specification of the number of levels for comparison, a definition of symbol definition weights and a list of character definitions. However, it explicitly leaves open the question of the concrete syntax used to express these rules. This European Standard recommends using either LDML or the POSIX-oriented syntax employed in ISO/IEC 14651's CTT, or both, to express the delta.

It is out of the scope of this European Standard to describe the exact syntax and semantics of LDML or the syntax rules for ISO/IEC 14651's CTT. Section 6 and Annex G can serve as practical examples of the design and implementation of two representations of a given delta. More tutorial information as well as pointers to further examples and to syntax validators for both syntaxes can be found at http://purl.oclc.org/NET/CDFG/EN13710

E.4 Example 1: National Delta for German

E.4.1 Structured Specification

E.4.1.1 First Ordering Level

E.4.1.2 Second Ordering Level

The Umlaut (diaeresis in ISO/IEC 10646) is the diacritic that must be taken into account before all others. The umlaut is treated as distinct from the trema and can only occur in combination with the base letters a, o and u. The sequence of the other diacritics is compatible with the EOR delta.

E.4.1.3 Third Ordering Level

Lowercase letters precede uppercase ones.

E.4.1.4 Fourth Ordering Level

No specific rules.

E.4.2 Delta against EOR in 14651-syntax

% -*- coding: utf-8; -*-
% ISO/IEC 14651-conformant delta for DIN 5007-1:2004 (sample)

reorder-after <BASE> %The umlaut is the diacritic with the highest priority
collating-symbol <UMLAUT> %specifically for the weight of the umlaut (distinct from the Trema in DIN 5007)
<UMLAUT> 
reorder-end


%Digits must follow letters 

%For the treatment of Roman numerals as well as digits as numbers we
%need preprocessing that is out of the scope of this profile

reorder-after <TFFFF> %After the Han ideographs, but before <SFFFF>
<S0030>  % 0
<S0031>  % 1
<S0032>  % 2
<S0033>  % 3
<S0034>  % 4
<S0035>  % 5
<S0036>  % 6
<S0037>  % 7
<S0038>  % 8
<S0039>  % 9
reorder-end

reorder-after <SFFFF>

order_start forward;forward;forward;forward 

%Whitespace precedes, according to 6.2.1, all other characters on the
%first level. They thus get a non-ignorable weight 

<U0009> <S0009>;<BASE>;<MIN>;<U0009> % HORIZONTAL TABULATION (in 6429)
<U000A> <S000A>;<BASE>;<MIN>;<U000A> % LINE FEED (in 6429)
<U000B> <S000B>;<BASE>;<MIN>;<U000B> % VERTICAL TABULATION (in 6429)
<U000C> <S000C>;<BASE>;<MIN>;<U000C> % FORM FEED (in 6429)
<U000D> <S000D>;<BASE>;<MIN>;<U000D> % CARRIAGE RETURN (in 6429)
<U0020> <S0020>;<BASE>;<MIN>;<U0020> % SPACE


%Ligatures and the sharp S are already treated in the EOR delta in the sense of DIN 5007

<U00E4> <S0061>;"<BASE><UMLAUT>";"<MIN><MIN>";<U00E4> % LATIN SMALL LETTER A WITH DIAERESIS
<U00C4> <S0061>;"<BASE><UMLAUT>";"<CAP><MIN>";<U00C4> % LATIN CAPITAL LETTER A WITH DIAERESIS
<U00F6> <S006F>;"<BASE><UMLAUT>";"<MIN><MIN>";<U00F6> % LATIN SMALL LETTER O WITH DIAERESIS
<U00D6> <S006F>;"<BASE><UMLAUT>";"<CAP><MIN>";<U00D6> % LATIN CAPITAL LETTER O WITH DIAERESIS
<U00FC> <S0075>;"<BASE><UMLAUT>";"<MIN><MIN>";<U00FC> % LATIN SMALL LETTER U WITH DIAERESIS
<U00DC> <S0075>;"<BASE><UMLAUT>";"<CAP><MIN>";<U00DC> % LATIN CAPITAL LETTER U WITH DIAERESIS

%If we want an ordering according to 6.1.1.4.2, where Umlaute become
%base letter + e, these lines replace the previous six weight assignments
%<U00E4> "<S0061><S0065>";"<BASE><BASE><UMLAUT>";"<MIN><MIN><MIN>";<U00E4> % LATIN SMALL LETTER A WITH DIAERESIS
%<U00C4> "<S0061><S0065>";"<BASE><BASE><UMLAUT>";"<CAP><MIN><MIN>";<U00C4> % LATIN CAPITAL LETTER A WITH DIAERESIS
%<U00F6> "<S006F><S0065>";"<BASE><BASE><UMLAUT>";"<MIN><MIN><MIN>";<U00F6> % LATIN SMALL LETTER O WITH DIAERESIS
%<U00D6> "<S006F><S0065>";"<BASE><BASE><UMLAUT>";"<CAP><MIN><MIN>";<U00D6> % LATIN CAPITAL LETTER O WITH DIAERESIS
%<U00FC> "<S0075><S0065>";"<BASE><BASE><UMLAUT>";"<MIN><MIN><MIN>";<U00FC> % LATIN SMALL LETTER U WITH DIAERESIS
%<U00DC> "<S0075><S0065>";"<BASE><BASE><UMLAUT>";"<CAP><MIN><MIN>";<U00DC> % LATIN CAPITAL LETTER U WITH DIAERESIS

%Treatment of other Latin characters according to DIN 31638, 7.3.4.1
<U00FE> "<S0074><S0068>";"<BASE><BASE><VRNT1>";"<MIN><MIN><MIN>";<U00FE> % LATIN SMALL LETTER THORN (as th on the first level)
<U00DE> "<S0074><S0068>";"<BASE><BASE><VRNT1>";"<CAP><CAP><MIN>";<U00DE> % LATIN CAPITAL LETTER THORN

<U01BF> <S0079>;"<BASE><VRNT3>";"<MIN><MIN>";<U01BF> % LATIN LETTER WYNN (as y on the first level)
<U01F7> <S0079>;"<BASE><VRNT3>";"<CAP><MIN>";<U01F7> % LATIN CAPITAL LETTER WYNN        
<U0292> <S0079>;"<BASE><VRNT4>";"<MIN><MIN>";<U0292> % LATIN SMALL LETTER EZH (as y on the first level, yogh and ezh are equated in ISO/IEC 10646)
<U01B7> <S0079>;"<BASE><VRNT4>";"<CAP><MIN>";<U01B7> % LATIN CAPITAL LETTER EZH 

reorder-end
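
The difference between the two umlaut treatments in the listing above (umlaut as base letter with a second-level distinction, or umlaut as base letter + e) can be illustrated with a minimal Python sketch. It is not an implementation of ISO/IEC 14651: the function names, the numeric weights and the restriction to lowercase a-z plus ä, ö, ü are assumptions made for illustration only, and the second-level weights of the expansion are simplified to one weight per first-level letter.

```python
# Illustrative sketch only: simplified two-level sort keys for lowercase
# a-z plus ä, ö, ü. All names and weights are hypothetical, not from EN 13710.

BASE, UMLAUT = 0, 1  # second-level weights: plain letter < letter with umlaut

UMLAUT_BASE = {"ä": "a", "ö": "o", "ü": "u"}


def key_base_letter(word):
    """Umlaut counts as its base letter on level 1, distinguished on level 2."""
    level1 = [UMLAUT_BASE.get(ch, ch) for ch in word]
    level2 = [UMLAUT if ch in UMLAUT_BASE else BASE for ch in word]
    return (level1, level2)


def key_expand_e(word):
    """Umlaut counts as base letter + e on level 1 (the commented-out variant)."""
    level1, level2 = [], []
    for ch in word:
        if ch in UMLAUT_BASE:
            level1 += [UMLAUT_BASE[ch], "e"]
            level2 += [BASE, UMLAUT]
        else:
            level1.append(ch)
            level2.append(BASE)
    return (level1, level2)


words = ["mutter", "müller", "mueller", "muller"]
by_base = sorted(words, key=key_base_letter)
by_expansion = sorted(words, key=key_expand_e)
print(by_base)
print(by_expansion)
```

With the first key, "müller" sorts directly after "muller"; with the second, it sorts directly after "mueller".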

E.5 Example 2: National Delta for Norwegian

E.5.1 Structured Specification

E.5.1.1 First Ordering Level

E.5.1.2 Second Ordering Level

No specific rules.

E.5.1.3 Third Ordering Level

Lowercase letters precede uppercase ones.

E.5.1.4 Fourth Ordering Level

No specific rules.

E.5.2 Delta against EOR in 14651-syntax

% ISO/IEC 14651-conformant delta for Norwegian (sample)
reorder-after <S00FE>
collating-symbol <S00E6> % LATIN SMALL LETTER AE
collating-symbol <S00F8> % LATIN SMALL LETTER O WITH STROKE
collating-symbol <S00E5> % LATIN SMALL LETTER A WITH RING ABOVE
<S00E6>
<S00F8>
<S00E5>
reorder-end

reorder-after <SFFFF>

order_start forward;forward;forward;forward 
%first level letters are abcdefghijklmnopqrstuvwxyzæøå 
%a-z as in EOR

%ü and ű as y
<U00FC> <S0079>;"<BASE><VRNT2>";"<MIN><MIN>";<U00FC> % LATIN SMALL LETTER U WITH DIAERESIS
<U00DC> <S0079>;"<BASE><VRNT2>";"<CAP><MIN>";<U00DC> % LATIN CAPITAL LETTER U WITH DIAERESIS
<U0171> <S0079>;"<BASE><VRNT3>";"<MIN><MIN>";<U0171> % LATIN SMALL LETTER U WITH DOUBLE ACUTE
<U0170> <S0079>;"<BASE><VRNT3>";"<CAP><MIN>";<U0170> % LATIN CAPITAL LETTER U WITH DOUBLE ACUTE

<U00E6> <S00E6>;<BASE>;<MIN>;<U00E6> % LATIN SMALL LETTER AE
<U00C6> <S00E6>;<BASE>;<CAP>;<U00C6> % LATIN CAPITAL LETTER AE
%ä as æ
<U00E4> <S00E6>;"<BASE><VRNT1>";"<MIN><MIN>";<U00E4> % LATIN SMALL LETTER A WITH DIAERESIS
<U00C4> <S00E6>;"<BASE><VRNT1>";"<CAP><MIN>";<U00C4> % LATIN CAPITAL LETTER A WITH DIAERESIS

<U00F8> <S00F8>;<BASE>;<MIN>;<U00F8> % LATIN SMALL LETTER O WITH STROKE
<U00D8> <S00F8>;<BASE>;<CAP>;<U00D8> % LATIN CAPITAL LETTER O WITH STROKE
%ö as ø
<U00F6> <S00F8>;"<BASE><VRNT1>";"<MIN><MIN>";<U00F6> % LATIN SMALL LETTER O WITH DIAERESIS
<U00D6> <S00F8>;"<BASE><VRNT1>";"<CAP><MIN>";<U00D6> % LATIN CAPITAL LETTER O WITH DIAERESIS
%ő also as ø
<U0151> <S00F8>;"<BASE><VRNT2>";"<MIN><MIN>";<U0151> % LATIN SMALL LETTER O WITH DOUBLE ACUTE
<U0150> <S00F8>;"<BASE><VRNT2>";"<CAP><MIN>";<U0150> % LATIN CAPITAL LETTER O WITH DOUBLE ACUTE

<U00FE> "<S0074><S0068>";"<BASE><BASE><VRNT1>";"<MIN><MIN><MIN>";<U00FE> % LATIN SMALL LETTER THORN (as th on the first level)
<U00DE> "<S0074><S0068>";"<BASE><BASE><VRNT1>";"<CAP><CAP><MIN>";<U00DE> % LATIN CAPITAL LETTER THORN

<U00E5> <S00E5>;<BASE>;<MIN>;<U00E5> % LATIN SMALL LETTER A WITH RING ABOVE
<U00C5> <S00E5>;<BASE>;<CAP>;<U00C5> % LATIN CAPITAL LETTER A WITH RING ABOVE
reorder-end
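
As a rough illustration of what the Norwegian delta above achieves, the following Python sketch builds three-level keys in which æ, ø, å follow z, ä and ö (and ő) are variants of æ and ø, ü and ű are variants of y, and lowercase precedes uppercase. The names `norwegian_key`, `ALPHABET` and `VARIANTS` and the numeric weights are hypothetical; þ, the fourth level and all characters outside the listed alphabet are deliberately left out.

```python
# Illustrative sketch only; hypothetical names and simplified weights.
# First-level letters as stated in the delta: a-z followed by æ, ø, å.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

# letter -> (first-level letter, second-level variant weight)
VARIANTS = {
    "ü": ("y", 2), "ű": ("y", 3),
    "ä": ("æ", 1),
    "ö": ("ø", 1), "ő": ("ø", 2),
}


def norwegian_key(word):
    """Return (level1, level2, level3) weight lists for one word."""
    level1, level2, level3 = [], [], []
    for ch in word:
        lower = ch.lower()
        base, variant = VARIANTS.get(lower, (lower, 0))
        level1.append(RANK[base])               # position in the alphabet
        level2.append(variant)                  # 0 = the basic letter itself
        level3.append(0 if ch == lower else 1)  # lowercase precedes uppercase
    return (level1, level2, level3)


words = ["åre", "ære", "øre", "zebra", "äre", "Øre", "yr", "ür"]
ordered = sorted(words, key=norwegian_key)
print(ordered)
```

A word containing a character that is not in `ALPHABET` or `VARIANTS` raises a `KeyError` in this sketch; a real implementation would fall back to the EOR weights instead.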

Annex F (informative): Modern European Scripts / MES

This annex reproduces for ease of reference the definition of the Modern European Scripts collection in ISO/IEC 10646:2003, A.4.3:

No.  Collection name                           Code positions

1    BASIC LATIN                               0020-007E
2    LATIN-1 SUPPLEMENT                        00A0-00FF
3    LATIN EXTENDED-A                          0100-017F
4    LATIN EXTENDED-B                          0180-024F
5    IPA EXTENSIONS                            0250-02AF
6    SPACING MODIFIER LETTERS                  02B0-02FF
7    COMBINING DIACRITICAL MARKS               0300-036F
8    BASIC GREEK                               0370-03CF
9    GREEK SYMBOLS AND COPTIC                  03D0-03FF
10   CYRILLIC                                  0400-04FF
11   ARMENIAN                                  0530-058F
27   BASIC GEORGIAN                            10D0-10FF
30   LATIN EXTENDED ADDITIONAL                 1E00-1EFF
31   GREEK EXTENDED                            1F00-1FFF
32   GENERAL PUNCTUATION                       2000-206F
33   SUPERSCRIPTS AND SUBSCRIPTS               2070-209F
34   CURRENCY SYMBOLS                          20A0-20CF
35   COMBINING DIACRITICAL MARKS FOR SYMBOLS   20D0-20FF
36   LETTERLIKE SYMBOLS                        2100-214F
37   NUMBER FORMS                              2150-218F
38   ARROWS                                    2190-21FF
39   MATHEMATICAL OPERATORS                    2200-22FF
40   MISCELLANEOUS TECHNICAL                   2300-23FF
42   OPTICAL CHARACTER RECOGNITION             2440-245F
44   BOX DRAWING                               2500-257F
45   BLOCK ELEMENTS                            2580-259F
46   GEOMETRIC SHAPES                          25A0-25FF
47   MISCELLANEOUS SYMBOLS                     2600-26FF
65   COMBINING HALF MARKS                      FE20-FE2F
70   SPECIALS                                  FFF0-FFFD
92   CYRILLIC SUPPLEMENT                       0500-052F
104  LTR ALPHABETIC PRESENTATION FORMS         FB00-FB1C
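
The ranges listed above lend themselves to a simple membership test. The following Python sketch copies the code positions from this annex and checks whether a character falls into one of them. The names `MES_RANGES` and `in_mes` are hypothetical, and the check covers range membership only: it does not know which code positions inside a range are actually assigned.

```python
# Illustrative sketch only. The ranges are copied from the list in this
# annex; MES_RANGES and in_mes are hypothetical names.
MES_RANGES = [
    (0x0020, 0x007E), (0x00A0, 0x00FF), (0x0100, 0x017F), (0x0180, 0x024F),
    (0x0250, 0x02AF), (0x02B0, 0x02FF), (0x0300, 0x036F), (0x0370, 0x03CF),
    (0x03D0, 0x03FF), (0x0400, 0x04FF), (0x0530, 0x058F), (0x10D0, 0x10FF),
    (0x1E00, 0x1EFF), (0x1F00, 0x1FFF), (0x2000, 0x206F), (0x2070, 0x209F),
    (0x20A0, 0x20CF), (0x20D0, 0x20FF), (0x2100, 0x214F), (0x2150, 0x218F),
    (0x2190, 0x21FF), (0x2200, 0x22FF), (0x2300, 0x23FF), (0x2440, 0x245F),
    (0x2500, 0x257F), (0x2580, 0x259F), (0x25A0, 0x25FF), (0x2600, 0x26FF),
    (0xFE20, 0xFE2F), (0xFFF0, 0xFFFD), (0x0500, 0x052F), (0xFB00, 0xFB1C),
]


def in_mes(ch):
    """Return True if the single character ch lies in one of the listed ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in MES_RANGES)


print(in_mes("a"))       # Latin small letter a, U+0061
print(in_mes("\u10d0"))  # Georgian letter an, U+10D0
print(in_mes("\u4e2d"))  # a Han ideograph, U+4E2D
```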

Annex G (informative): EOR Delta in LDML Syntax

<?xml version="1.0" encoding="utf-8"?>
<ldml>
  <identity>
    <!-- 
         Authors: Marc Wilhelm Küster (EN 13710 editor) and Åke Persson
         
         LDML version of the rules specified in Section 6 of EN 13710
    -->
    <version number="1.0" cldrVersion="1.8.1"/>
    <generation date="2010-05-03"/>
    <language type="EOR"/>
  </identity>
  <collations validSubLocales="All European locales">
    <collation type="standard">
      <settings caseLevel="on" caseFirst="lower" strength="quaternary"/>
      <rules>
        <!-- 
             currency signs and modifier letters are ignored in EOR
        -->
        <reset><last_tertiary_ignorable/></reset>
        <i>$</i><!-- U0024 -->
        <i>¢</i><!-- U00A2 -->
        <i>£</i><!-- U00A3 -->
        <i>¤</i><!-- U00A4 -->
        <i>¥</i><!-- U00A5 -->
        <i>₠</i><!-- U20A0 -->
        <i>₡</i><!-- U20A1 -->
        <i>₢</i><!-- U20A2 -->
        <i>₣</i><!-- U20A3 -->
        <i>₤</i><!-- U20A4 -->
        <i>₥</i><!-- U20A5 -->
        <i>₦</i><!-- U20A6 -->
        <i>₧</i><!-- U20A7 -->
        <i>₨</i><!-- U20A8 -->
        <i>₩</i><!-- U20A9 -->
        <i>₪</i><!-- U20AA -->
        <i>₫</i><!-- U20AB -->
        <i>€</i><!-- U20AC -->
        <i>₭</i><!-- U20AD -->
        <i>₮</i><!-- U20AE -->
        <i>₯</i><!-- U20AF -->
        <i>₰</i><!-- U20B0 -->
        <i>₱</i><!-- U20B1 -->
        <i>₲</i><!-- U20B2 -->
        <i>₳</i><!-- U20B3 -->
        <i>₴</i><!-- U20B4 -->
        <i>₵</i><!-- U20B5 -->

        <!-- General category Lm -->
        <i>ʰ</i><!-- U02B0 -->
        <i>ʱ</i><!-- U02B1 -->
        <i>ʲ</i><!-- U02B2 -->
        <i>ʳ</i><!-- U02B3 -->
        <i>ʴ</i><!-- U02B4 -->
        <i>ʵ</i><!-- U02B5 -->
        <i>ʶ</i><!-- U02B6 -->
        <i>ʷ</i><!-- U02B7 -->
        <i>ʸ</i><!-- U02B8 -->
        <i>ʻ</i><!-- U02BB -->
        <i>ʼ</i><!-- U02BC -->
        <i>ʽ</i><!-- U02BD -->
        <i>ʾ</i><!-- U02BE -->
        <i>ʿ</i><!-- U02BF -->
        <i>ˀ</i><!-- U02C0 -->
        <i>ˁ</i><!-- U02C1 -->
        <i>ː</i><!-- U02D0 -->
        <i>ˑ</i><!-- U02D1 -->
        <i>ˠ</i><!-- U02E0 -->
        <i>ˡ</i><!-- U02E1 -->
        <i>ˢ</i><!-- U02E2 -->
        <i>ˣ</i><!-- U02E3 -->
        <i>ˤ</i><!-- U02E4 -->
        <i>ˮ</i><!-- U02EE -->

        <!-- Glottal stops -->
        <i>ʔ</i><!-- U0294 -->
        <i>ʕ</i><!-- U0295 -->
        <i>ʖ</i><!-- U0296 -->
        <i>ʘ</i><!-- U0298 -->
        <i>ʡ</i><!-- U02A1 -->
        <i>ʢ</i><!-- U02A2 -->

        <!--Latin letters: only a-z + þ are basic letters -->
        <reset>a</reset> <!-- turned a, latin alpha, turned alpha -->
        <s>ɐ</s><!-- U0250 -->
        <s>ɑ</s><!-- U0251 -->
        <s>ɒ</s><!-- U0252 -->

        <reset>b</reset> <!-- small capital b, b with stroke, b with hook, b with topbar -->
        <s>ʙ</s><!-- U0299 -->
        <s>ƀ</s><!-- U0180 -->
        <t>Ƀ</t><!-- U0243 -->
        <s>ɓ</s><!-- U0253 -->
        <t>Ɓ</t><!-- U0181 -->
        <s>ƃ</s><!-- U0183 -->
        <t>Ƃ</t><!-- U0182 -->

        <reset>c</reset> <!-- c with hook, c with curl, stretched c -->
        <s>ƈ</s><!-- U0188 -->
        <t>Ƈ</t><!-- U0187 -->
        <s>ɕ</s><!-- U0255 -->        
        <s>ʗ</s><!-- U0297 -->

        <reset>d</reset> <!-- African d, d with hook, 
        d with topbar, d with curl, turned delta -->
        <s>ɖ</s><!-- U0256 -->
        <t>Ɖ</t><!-- U0189 -->
        <s>ɗ</s><!-- U0257 -->
        <t>Ɗ</t><!-- U018A -->
        <s>ƌ</s><!-- U018C -->
        <t>Ƌ</t><!-- U018B -->
        <s>ȡ</s><!-- U0221 -->
        <s>ƍ</s><!-- U018D -->

        <reset>e</reset> <!-- turned e, schwa, open e, reversed e, schwa with hook, 
        reversed open e, reversed open e with hook, closed reversed open e, closed open e -->     
        <s>ǝ</s><!-- U01DD -->
        <t>Ǝ</t><!-- U018E -->
        <s>ə</s><!-- U0259 -->
        <t>Ə</t><!-- U018F -->        
        <s>ɛ</s><!-- U025B -->
        <t>Ɛ</t><!-- U0190 -->
        <s>ɘ</s><!-- U0258 -->
        <s>ɚ</s><!-- U025A -->
        <s>ɜ</s><!-- U025C -->
        <s>ɝ</s><!-- U025D -->
        <s>ɞ</s><!-- U025E -->
        <s>ʚ</s><!-- U029A -->
        
        <reset>f</reset> <!-- f with hook -->
        <s>ƒ</s><!-- U0192 -->
        <t>Ƒ</t><!-- U0191 -->

        <reset>g</reset> <!-- script g, small capital g, g with stroke, 
        g with hook, latin gamma, gha -->
        <s>ɡ</s><!-- U0261 -->
        <s>ɢ</s><!-- U0262 -->
        <s>ǥ</s><!-- U01E5 -->
        <t>Ǥ</t><!-- U01E4 -->
        <s>ɠ</s><!-- U0260 -->
        <t>Ɠ</t><!-- U0193 -->
        <s>ʛ</s><!-- U029B -->
        <s>ɣ</s><!-- U0263 -->
        <t>Ɣ</t><!-- U0194 -->
        <s>ɤ</s><!-- U0264 -->
        <s>ƣ</s><!-- U01A3 --> 
        <t>Ƣ</t><!-- U01A2 -->
        
        <reset>h</reset> <!-- small capital h, h with hook, heng with hook, turned h, 
        turned h with fishhook, turned h with fishhook and tail -->
        <s>ʜ</s><!-- U029C -->
        <s>ɦ</s><!-- U0266 -->
        <s>ɧ</s><!-- U0267 -->
        <s>ɥ</s><!-- U0265 -->
        <s>ʮ</s><!-- U02AE -->
        <s>ʯ</s><!-- U02AF -->

        <reset>i</reset> <!-- dotless i, small capital i, i with stroke, latin iota -->
        <s>ı</s><!-- U0131 -->
        <s>ɪ</s><!-- U026A -->
        <s>ɨ</s><!-- U0268 -->
        <t>Ɨ</t><!-- U0197 -->
        <s>ɩ</s><!-- U0269 -->
        <t>Ɩ</t><!-- U0196 -->

        <reset>j</reset> <!-- j with crossed tail, dotless j with stroke, 
        dotless j with stroke and hook -->
        <s>ʝ</s><!-- U029D -->
        <s>ɟ</s><!-- U025F -->
        <s>ʄ</s><!-- U0284 -->

        <reset>k</reset> <!-- k with hook, kra, turned k -->
        <s>ƙ</s><!-- U0199 -->
        <t>Ƙ</t><!-- U0198 -->
        <s>ĸ</s><!-- U0138 -->
        <s>ʞ</s><!-- U029E -->

        <reset>l</reset> <!-- small capital l, l with bar, l with middle tilde, l with belt, 
        l with retroflex hook, l with curl, latin lambda with stroke -->
        <s>ʟ</s><!-- U029F -->
        <s>ƚ</s><!-- U019A -->
        <t>Ƚ</t><!-- U023D -->
        <s>ɫ</s><!-- U026B -->
        <s>ɬ</s><!-- U026C -->
        <s>ɭ</s><!-- U026D -->
        <s>ȴ</s><!-- U0234 -->
        <s>ƛ</s><!-- U019B -->
        
        <reset>m</reset> <!-- m with hook, turned m, turned m with long leg -->
        <s>ɱ</s><!-- U0271 -->
        <s>ɯ</s><!-- U026F -->
        <t>Ɯ</t><!-- U019C -->
        <s>ɰ</s><!-- U0270 -->

        <reset>n</reset> <!-- n preceded by apostrophe, small capital n, n with left hook, 
        n with long right leg, n with retroflex hook, n with curl, eng -->
        <s>ʼn</s><!-- U0149 -->
        <s>ɴ</s><!-- U0274 -->
        <s>ɲ</s><!-- U0272 -->
        <t>Ɲ</t><!-- U019D -->
        <s>ƞ</s><!-- U019E -->
        <t>Ƞ</t><!-- U0220 -->       
        <s>ɳ</s><!-- U0273 -->
        <s>ȵ</s><!-- U0235 -->
        <s>ŋ</s><!-- U014B -->
        <t>Ŋ</t><!-- U014A -->

        <reset>o</reset> <!-- open o, o with middle tilde, closed omega, ou -->
        <s>ɔ</s><!-- U0254 -->
        <t>Ɔ</t><!-- U0186 -->
        <s>ɵ</s><!-- U0275 -->
        <t>Ɵ</t><!-- U019F -->
        <s>ɷ</s><!-- U0277 -->
        <s>ȣ</s><!-- U0223 -->
        <t>Ȣ</t><!-- U0222 -->

        <reset>p</reset> <!-- p with hook, latin phi -->
        <s>ƥ</s><!-- U01A5 -->
        <t>Ƥ</t><!-- U01A4 -->
        <s>ɸ</s><!-- U0278 -->

        <reset>q</reset> <!-- q with hook -->
        <s>ʠ</s><!-- U02A0 -->
     
        <reset>r</reset> <!-- small capital r, yr, turned r, turned r with long leg, 
        turned r with hook, r with long leg, r with tail, r with fishhook, 
        reversed r with fishhook, small capital inverted r -->
        <s>ʀ</s><!-- U0280 -->
        <t>Ʀ</t><!-- U01A6 -->
        <s>ɹ</s><!-- U0279 -->
        <s>ɺ</s><!-- U027A -->
        <s>ɻ</s><!-- U027B -->
        <s>ɼ</s><!-- U027C -->
        <s>ɽ</s><!-- U027D -->
        <s>ɾ</s><!-- U027E -->
        <s>ɿ</s><!-- U027F -->
        <s>ʁ</s><!-- U0281 -->

        <reset>s</reset> <!-- s with hook, esh, esh loop, reversed esh, 
        esh with curl -->
        <s>ʂ</s><!-- U0282 -->
        <s>ʃ</s><!-- U0283 -->
        <t>Ʃ</t><!-- U01A9 -->
        <s>ƪ</s><!-- U01AA -->
        <s>ʅ</s><!-- U0285 -->
        <s>ʆ</s><!-- U0286 -->

        <reset>t</reset> <!-- t with stroke, t with palatal hook, t with hook, 
        t with retroflex hook, t with curl, turned t, digraph tc with curl -->
        <s>ŧ</s><!-- U0167 -->
        <t>Ŧ</t><!-- U0166 -->
        <s>ƫ</s><!-- U01AB -->
        <s>ƭ</s><!-- U01AD -->
        <t>Ƭ</t><!-- U01AC -->
        <s>ʈ</s><!-- U0288 -->
        <t>Ʈ</t><!-- U01AE -->
        <s>ȶ</s><!-- U0236 -->
        <s>ʇ</s><!-- U0287 -->

        <reset>u</reset> <!-- u bar, latin upsilon -->
        <s>ʉ</s><!-- U0289 -->
        <t>Ʉ</t><!-- U0244 -->
        <s>ʊ</s><!-- U028A -->
        <t>Ʊ</t><!-- U01B1 -->

        <reset>v</reset> <!-- v with hook, turned v -->
        <s>ʋ</s><!-- U028B -->
        <t>Ʋ</t><!-- U01B2 -->
        <s>ʌ</s><!-- U028C -->
        <t>Ʌ</t><!-- U0245 -->

        <reset>w</reset> <!-- turned w, wynn -->
        <s>ʍ</s><!-- U028D -->
        <s>ƿ</s><!-- U01BF -->
        <t>Ƿ</t><!-- U01F7 -->

        <reset>y</reset> <!-- small capital y, y with hook, turned y, yogh -->
        <s>ʏ</s><!-- U028F -->
        <s>ƴ</s><!-- U01B4 -->
        <t>Ƴ</t><!-- U01B3 -->
        <s>ʎ</s><!-- U028E -->
        <s>ȝ</s><!-- U021D -->
        <t>Ȝ</t><!-- U021C -->

        <reset>z</reset> <!-- z with stroke,  z with hook, z with retroflex hook, 
        z with curl, ezh, ezh with caron, 
        ezh reversed, ezh with tail, ezh with curl -->
        <s>ƶ</s><!-- U01B6 -->
        <t>Ƶ</t><!-- U01B5 -->
        <s>ȥ</s><!-- U0225 -->
        <t>Ȥ</t><!-- U0224 -->
        <s>ʐ</s><!-- U0290 -->
        <s>ʑ</s><!-- U0291 -->
        <s>ʒ</s><!-- U0292 -->
        <t>Ʒ</t><!-- U01B7 -->
        <s>ǯ</s><!-- U01EF -->
        <t>Ǯ</t><!-- U01EE -->
        <s>ƹ</s><!-- U01B9 -->
        <t>Ƹ</t><!-- U01B8 -->
        <s>ƺ</s><!-- U01BA -->
        <s>ʓ</s><!-- U0293 -->
       

        <!-- Digraphs -->
        <reset>dʑ</reset> <!-- dz digraph with curl -->
        <s>ʥ</s><!-- U02A5 -->
        <reset>dʒ</reset> <!-- dezh digraph -->
        <s>ʤ</s><!-- U02A4 -->
        <reset>hv</reset> <!-- hv -->
        <s>ƕ</s><!-- U0195 -->
        <reset>HV</reset> <!-- hwair -->
        <s>Ƕ</s><!-- U01F6 -->
        <reset>lʒ</reset> <!-- lezh -->
        <s>ɮ</s><!-- U026E -->
        <reset>oe</reset> <!-- small capital oe -->
        <s>ɶ</s><!-- U0276 -->
        <reset>tɕ</reset> <!-- tc digraph with curl -->
        <s>ʨ</s><!-- U02A8 -->

        <!-- tailorings for Greek are obsolete when made against the current UCA table -->

        <!-- Cyrillic letters: full conformance with GOST requirements -->

        <reset>ђ</reset> <!-- gje as variant of dje (Serbian) -->
        <s>ѓ</s><!-- U0453 -->
        <t>Ѓ</t><!-- U0403 -->

        <reset>ћ</reset> <!-- kje as variant of tshe (Serbian) -->
        <s>ќ</s><!-- U045C -->
        <t>Ќ</t><!-- U040C -->

        <!-- Georgian: unchanged -->
        
        <!-- Armenian -->
        <reset>ք</reset> <!-- U0584 -->
        <p>և</p><!-- U0587 -->
      </rules>
    </collation>
  </collations>
</ldml>
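
To show how a rule list of this shape can be read by a program, the following Python sketch parses a small fragment written in the same element vocabulary (`reset`, `p`, `s`, `t`, `i`) with the standard library's `xml.etree.ElementTree`. The fragment `SAMPLE`, the function `read_rules` and the `STRENGTH` mapping are assumptions for illustration; the sketch only lists the relations in document order and does not compute any collation weights.

```python
# Illustrative sketch only: list the relations in an LDML-style rule fragment.
# SAMPLE, STRENGTH and read_rules are hypothetical names.
import xml.etree.ElementTree as ET

SAMPLE = """<rules>
  <reset><last_tertiary_ignorable/></reset>
  <i>$</i><!-- U0024 -->
  <reset>a</reset>
  <s>ɐ</s><!-- U0250 -->
  <reset>b</reset>
  <s>ƀ</s><!-- U0180 -->
  <t>Ƀ</t><!-- U0243 -->
</rules>"""

STRENGTH = {"p": "primary", "s": "secondary", "t": "tertiary", "i": "identical"}


def read_rules(xml_text):
    """Return (reset anchor, strength, character) triples in document order."""
    triples = []
    anchor = None
    # XML comments are dropped by the default parser, so only elements remain.
    for element in ET.fromstring(xml_text):
        if element.tag == "reset":
            # A reset holds either literal text or a logical position element.
            anchor = element.text if len(element) == 0 else element[0].tag
        elif element.tag in STRENGTH:
            triples.append((anchor, STRENGTH[element.tag], element.text))
    return triples


rules = read_rules(SAMPLE)
for anchor, strength, character in rules:
    print(anchor, strength, character)
```

Note that each triple records the reset anchor at which a chain of relations started; within a chain, every relation is relative to the item immediately before it.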
  • 1 Shapes may vary according to fonts and styles.

  • 2 If possible, combining diacritical marks are referenced. If no corresponding combining diacritical mark exists, the table lists non-combining variants. Diacritical marks are unified for Cyrillic and Latin but not for Greek and Latin. This reflects prevalent usage and user expectations.

  • 3 Names in lowercase letters are only an informative selection of some of the most common alternative names. Names in capitals are normative.

  • 4 Strictly speaking, umlaut and trema can be two typographically slightly different phenomena, but the distinction is increasingly becoming obsolete.

  • 5 The letters sometimes referred to as small g with comma above and capital g with comma below are to be ordered as small g with cedilla and capital g with cedilla respectively.

  • 6 Exists only in combination with α, η, ω as ᾳ, ῃ, ῳ.

  • 7 Position and name in ISO/IEC 10646:2003. If no corresponding combining diacritical mark exists, the table lists non-combining variants. If these also do not exist, the table simply gives the names of the diacritical marks. Diacritical marks are not unified across scripts unless this reflects prevalent usage and user expectations.

  • 8 This and several other combinations cannot reasonably be printed as stand-alone diacritics. They are presented here in combination with the letter α.

  • 9 Equivalent on First Ordering Level

  • 10 Also known as letter gha

None: PreEN13710 (last edited 2010-05-30 14:00:11 by MKuester)