ISO/IEC 8859

ISO/IEC 8859 is a collection of fifteen different 8-bit character encodings. An 8-bit character encoding assigns to each character a unique number between 0 and 255. The first ISO/IEC 8859 encodings were designed in the mid-1980s by the European Computer Manufacturers Association (ECMA) and endorsed by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).

The ISO/IEC 8859 collection consists of numbered parts: ISO/IEC 8859-1 through ISO/IEC 8859-16. The parts serve languages that are written with different letters; for example, part 6 covers most of the characters of the Arabic language. See Table 1 for an overview. Part ISO/IEC 8859-12 was intended for Latin/Devanagari but was abandoned before completion, which is why the sixteen part numbers yield only fifteen encodings.
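To illustrate how the parts differ, the following minimal sketch decodes one and the same byte value with several parts of the standard, using the codecs that ship with Python. The chosen byte (0xE4) and the selection of parts are arbitrary choices made for this illustration; the names `BYTE_VALUE`, `PARTS`, and `decoded` are likewise not taken from the standard.

```python
# One byte value, interpreted by four different parts of ISO/IEC 8859.
# The byte 0xE4 and the parts shown are arbitrary choices for illustration.
BYTE_VALUE = 0xE4

# Part number -> name of the corresponding Python codec.
PARTS = {1: "iso8859_1", 5: "iso8859_5", 6: "iso8859_6", 7: "iso8859_7"}

# Decode the single byte with every selected part.
decoded = {part: bytes([BYTE_VALUE]).decode(codec) for part, codec in PARTS.items()}

for part, char in decoded.items():
    print(f"ISO/IEC 8859-{part}: byte {BYTE_VALUE} -> U+{ord(char):04X} {char}")
```

The same number thus stands for a Latin, a Cyrillic, an Arabic, or a Greek letter, depending on which part is assumed when reading the data.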

Often the ASCII codes (codes 0 through 127) are seen as part of ISO/IEC 8859. The first 32 ASCII codes are control characters; these form control character set 0, referred to as C0. The codes from 128 through 159 (hexadecimal: 0x80–0x9F) constitute control set C1 of ISO/IEC 8859. The Windows Latin character set (Windows code page 1252) uses many of the positions in control set C1 for printable characters. Thus, the Windows encoding from 128 through 159 is completely different from the Latin-1 (ISO/IEC 8859-1) encoding. However, Windows code page 1252 is identical to Latin-1 from character 160 (non-breaking space) through 255 (ÿ). The extended ASCII set used by DOS, on the other hand, is completely different between 128 and 255, but coincides with ASCII for the characters below 128.
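The relations described above can be checked with Python's built-in codecs. The sketch below is only an illustration: it assumes that Python's `cp1252` codec represents the Windows Latin character set and that `cp437` represents the DOS extended ASCII set (DOS had several code pages; 437 is picked here as an assumption). The helper `decode_or_none` is a hypothetical name introduced for this example.

```python
def decode_or_none(byte_value, codec):
    """Decode a single byte; return None if the codec leaves the position unassigned."""
    try:
        return bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        return None

# Positions 128-159: C1 control characters in Latin-1, mostly printable in cp1252.
differing = [b for b in range(128, 160)
             if decode_or_none(b, "cp1252") != decode_or_none(b, "latin-1")]

# Positions 160-255: Windows code page 1252 and Latin-1 should agree everywhere.
upper_identical = all(
    decode_or_none(b, "cp1252") == decode_or_none(b, "latin-1")
    for b in range(160, 256)
)

# Positions 0-127: the DOS code page (assumed: 437) should coincide with ASCII.
dos_matches_ascii = all(
    decode_or_none(b, "cp437") == decode_or_none(b, "ascii")
    for b in range(128)
)

print(len(differing), upper_identical, dos_matches_ascii)
```

A practical consequence is that text containing bytes in the range 128 through 159 and labelled as Latin-1 is often really Windows code page 1252.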

The ISO and IEC are also responsible for ISO/IEC 10646 (UCS, Universal Character Set), a much more ambitious and elaborate character encoding than ISO/IEC 8859. UCS is kept synchronized with Unicode, the standard of the Unicode Consortium. Latin-1 (ISO/IEC 8859-1) has been adopted as the first 256 code points of ISO/IEC 10646 and Unicode.
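That Latin-1 coincides with the beginning of Unicode can be verified in a few lines. The sketch below uses Python's `latin-1` codec; the variable names are chosen for this example only.

```python
# Decode every possible byte value as Latin-1 and collect the Unicode code points.
all_bytes = bytes(range(256))
code_points = [ord(ch) for ch in all_bytes.decode("latin-1")]

# Each byte value n should map to the Unicode code point with the same number n.
latin1_is_unicode_prefix = code_points == list(range(256))
print(latin1_is_unicode_prefix)
```

Note that this identity concerns code points only. In UTF-8, the characters from 128 through 255 are stored as two bytes each, so Latin-1 data above 127 is not valid UTF-8 as it stands.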

On the World Wide Web, usage of the Unicode encoding UTF-8 is increasing almost exponentially. In 2011, ISO/IEC 8859-1 is still important, but it is in decline on the Web.

Table 2 lists all the characters in the different parts. The columns are organized such that it is relatively easy to compare the character sets. For example, the letters ë, ä, ö, and ü (the last three being the German umlauts) and the German sharp s (scharfes S, ß) are found at exactly the same positions in Latin-1, Latin-2, Latin-3, Latin-4, Latin-5 (column 9), and Latin-6 (column 10). Thus one can write mixed German/Polish text with Latin-2 or mixed German/Turkish text with Latin-5.
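The shared positions can be confirmed with Python's codecs for the six Latin parts mentioned above. This is a minimal sketch; the sample string and the names `LETTERS`, `LATIN_PARTS`, and `positions` are made up for the illustration.

```python
LETTERS = "ëäöüß"

# Common name -> Python codec (Latin-5 is part 9, Latin-6 is part 10).
LATIN_PARTS = {
    "Latin-1": "iso8859_1",
    "Latin-2": "iso8859_2",
    "Latin-3": "iso8859_3",
    "Latin-4": "iso8859_4",
    "Latin-5": "iso8859_9",
    "Latin-6": "iso8859_10",
}

# Byte position of every letter in every part.
positions = {
    name: [ch.encode(codec)[0] for ch in LETTERS]
    for name, codec in LATIN_PARTS.items()
}
same_everywhere = len({tuple(p) for p in positions.values()}) == 1

# A mixed German/Polish sample: encodable in Latin-2, but not in Latin-1.
sample = "Grüße, Łódź"
latin2_bytes = sample.encode("iso8859_2")
try:
    sample.encode("iso8859_1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

print(positions["Latin-1"], same_everywhere, fits_latin1)
```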

The HTML version of Table 2 is prepared in Unicode UTF-8. Two examples: the Latin-3 character H with stroke (column 3, row 161, U+0126) is given by &#x126; → Ħ. The Thai digit 8 (column 11, row 248, U+0E58) is given by &#xE58; → ๘.
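The two examples can be reproduced programmatically. The sketch below decodes the table position with the matching part and formats the result as a hexadecimal numeric character reference; `char_reference` is a hypothetical helper written for this illustration, while `html.unescape` is part of the Python standard library.

```python
import html

def char_reference(byte_value, codec):
    """Return the hexadecimal HTML character reference for one encoded byte."""
    char = bytes([byte_value]).decode(codec)
    return f"&#x{ord(char):X};"

# Column 3 (Latin-3), row 161 and column 11 (Thai), row 248 of the table.
h_stroke_ref = char_reference(161, "iso8859_3")
thai_eight_ref = char_reference(248, "iso8859_11")

# html.unescape turns a reference back into the character it denotes.
for ref in (h_stroke_ref, thai_eight_ref):
    print(ref, "->", html.unescape(ref))
```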


 * Row 160 gives the non-breaking space (HTML: &nbsp;). Row 173 gives, except in column 11 (Thai), the soft hyphen (HTML: &shy;), which is displayed only at a line break. Other empty fields are unassigned.


 * LRM stands for left-to-right mark (U+200E) and RLM stands for right-to-left mark (U+200F).
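The note on rows 160 and 173 can be checked against the codecs that Python provides for the fifteen parts. This is a sketch under the assumption that those codecs follow the standard faithfully; `PART_NUMBERS` and `decode_position` are names introduced for this example.

```python
# All part numbers of ISO/IEC 8859; part 12 was never published.
PART_NUMBERS = [n for n in range(1, 17) if n != 12]

def decode_position(part, byte_value):
    """Decode one byte value with the Python codec for the given part."""
    return bytes([byte_value]).decode(f"iso8859_{part}")

# Row 160: non-breaking space (U+00A0) in every part.
nbsp_everywhere = all(decode_position(p, 160) == "\u00a0" for p in PART_NUMBERS)

# Row 173: soft hyphen (U+00AD); collect the parts that hold something else.
parts_without_shy = [p for p in PART_NUMBERS if decode_position(p, 173) != "\u00ad"]

print(nbsp_everywhere, parts_without_shy)
```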