  • Character Encoding


  • Each letter, number, punctuation character, and control character used on a computer has a binary value associated with it. Computer developers have defined schemes for coding the characters so that computers understand them. The process of coding the characters into machine-readable language is called character encoding.

    Different encoding schemes encode the same characters differently. For example, encoding letters and numbers from a U.S. keyboard is usually not a problem in North America; however, a scheme used in another country may not have characters that correspond to the North American characters. Common data values that can cause problems in other environments include null, TAB, and accented letters (such as Ñ).
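
As a minimal illustration of the point above, the following Python sketch prints the bytes that three schemes produce for the same two characters. The codec names (`latin-1`, `cp1140`, `utf-8`) are Python's own labels for ISO 8859-1, EBCDIC 1140, and UTF-8; the helper name `encode_hex` is hypothetical and used only for this example.

```python
def encode_hex(text: str, codec: str) -> str:
    """Return the bytes that `codec` produces for `text`, as a hex string."""
    return text.encode(codec).hex()

# The letter A and the accented letter Ñ under three different schemes.
for codec in ("latin-1", "cp1140", "utf-8"):
    print(f"{codec:8} A={encode_hex('A', codec)}  Ñ={encode_hex('Ñ', codec)}")
```

The letter A already differs between the ASCII-based schemes and EBCDIC, and Ñ differs in all three, which is why the sender and receiver must agree on the scheme.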

    In AAMVAnet, you can use a variety of encoding schemes to make sure that characters in your data are translated correctly. For more information, see the sections below or contact AAMVA Enterprise Architecture.

  • Following is a summary of some common encoding schemes.

    For each scheme, the entries below give where it is used, its features, and where to find more information.
    American Standard Code for Information Interchange (ASCII)
    Used on: UNIX and older Windows computers
    • Has many variants; in the U.S., ISO 8859-1 (Latin-1) is the most common. (In 2004, the working group stopped development of 8859-1 in order to concentrate on Unicode.)
    • Uses seven binary bits in its original form; the extended variants use eight binary bits for each character.
    • In all variants, numbers and English letters map to the same hexadecimal values in the range 00 to 7f.
    • Above 7f, each variant assigns characters differently. (ISO 8859-1 follows the Unicode table U0000 Basic Latin for hex values 00 to 7f and the Unicode table U0080 Latin-1 for hex values 80 to ff, which supports characters from other West European languages.)
    More info: ASCII
    Extended Binary Coded Decimal Interchange Code (EBCDIC)
    Used on: Mainframe computers
    • Has many variants; in the U.S., the default in IBM Enterprise COBOL is EBCDIC 1140.
    • Uses eight binary bits for each character.
    • The variant is normally tailored to the country of use. For example, EBCDIC-US has a "$" character, and EBCDIC-Spain has a "Ñ".
    • Numbers and English letters map to the same hexadecimal values in every variant. However, the hex values used by the country-specific characters (such as "Ñ") have different meanings depending on the variant used.
    More info: EBCDIC
    Unicode
    Used on: N/A
    • Not implemented directly on hardware, but hardware may use an encoding scheme that is based on Unicode.
    • Has multiple tables, each allocated a range of hex values. Unicode's large range of hex values supports most modern scripts as well as scripts of ancient languages.
    • Tables are identified by their starting hex value. For example:
      • Table U0000 - Basic Latin contains hex values 00 to 7f and covers the characters on an English keyboard.
      • Table U0080 - Latin-1 contains hex values 80 to ff and covers additional European characters.
      • Table UFF00 - Halfwidth and Fullwidth Forms includes the Korean halfwidth Jamo.
    More info: Unicode
    UCS/Unicode Transformation Format (UTF)
    Used on: Internet (XML and web services)
    • The most popular variant is UTF-8.
    • UTF-8 uses eight binary bits for common characters. For less common characters, 16 or more bits are used.
    • Uses Unicode.
    More info: UTF-8
    Universal Character Set (UCS)
    Used on: Newer Windows computers
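    • On Windows, the 16-bit form is identified as code page 1200. (Code page 1252 is a different scheme: the 8-bit Windows variant of Latin-1.)
    • Uses 16 binary bits for each character.
    • Ranges from hex value 0000 to ffff.
    • Uses Unicode.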
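
Two of the features summarized above can be checked with a short Python sketch. It assumes Python's built-in codecs; ISO 8859-15 is not discussed in this summary and is brought in here only as a second 8-bit variant for comparison, and the helper name `utf8_bits` is hypothetical.

```python
# One byte above hex 7f means different things in different 8-bit variants.
raw = bytes([0xA4])
print(raw.decode("iso-8859-1"))   # currency sign in ISO 8859-1 (Latin-1)
print(raw.decode("iso-8859-15"))  # euro sign in ISO 8859-15

def utf8_bits(ch: str) -> int:
    """Number of binary bits UTF-8 uses to store a single character."""
    return 8 * len(ch.encode("utf-8"))

# UTF-8 is variable width: common characters take 8 bits, others 16 or more.
for ch in ("A", "Ñ", "€"):
    print(ch, utf8_bits(ch))
```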
     

     

    If your system uses AMIE to communicate over AAMVAnet (see Note), messages are restricted to characters which can be used in the ASCII, EBCDIC-1140 and UTF-8 encoding schemes. These acceptable characters are sometimes referred to as “printable characters”.

    This limitation exists because different types of computers use different data encoding schemes. Therefore, to communicate across the network, the only characters that can be used are those that are common to all computers connected to the network.

    Note: If your application uses UNI, then AMIE is used to communicate over AAMVAnet.

    Allowable Characters

    • Space
    • a to z
    • A to Z
    • 0 to 9
    • ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ \ _ { | } ~
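
A sender might screen outgoing message data against this list before transmission. The following Python sketch is one possible approach, not an AAMVA-supplied routine; the names `ALLOWED` and `find_disallowed` are hypothetical, and the character set is copied from the list above.

```python
import string

# Allowable characters as listed above: space, letters, digits,
# and the listed punctuation characters.
ALLOWED = set(
    " " + string.ascii_letters + string.digits
    + "!\"#$%&'()*+,-./:;<=>?@\\_{|}~"
)

def find_disallowed(message: str) -> list[str]:
    """Return characters in `message` that are not on the allowable list,
    in order of first appearance and without repeats."""
    seen = []
    for ch in message:
        if ch not in ALLOWED and ch not in seen:
            seen.append(ch)
    return seen

print(find_disallowed("JOHN SMITH 123"))  # nothing to report
print(find_disallowed("PEÑA\tJR"))        # reports the accented letter and the TAB
```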

     

     

    If your system uses XML to communicate over AAMVAnet (for example, if you are using a web service), then the UTF-8 scheme is typically used to encode printable characters. UTF-8 uses the Unicode character tables which support characters from many alphabets, including characters used in Asian languages. UTF-8 supports the following white space characters:

    • Space
    • TAB
    • Paragraph return (carriage return and line feed)

    Other binary data that cannot be mapped to a printable character must be excluded from message data. 
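
One way to exclude such data before building an XML message is sketched below in Python. It rests on an assumption that is not stated in the text above: a character is treated as unprintable when its Unicode general category starts with "C" (control, format, unassigned, and similar). The names `XML_WHITESPACE` and `strip_unprintable` are hypothetical.

```python
import unicodedata

# White space characters that are kept: space, TAB, carriage return, line feed.
XML_WHITESPACE = {" ", "\t", "\r", "\n"}

def strip_unprintable(text: str) -> str:
    """Drop characters in Unicode category C* unless they are permitted white space."""
    kept = []
    for ch in text:
        if ch in XML_WHITESPACE or not unicodedata.category(ch).startswith("C"):
            kept.append(ch)
    return "".join(kept)

clean = strip_unprintable("Pe\u00f1a\x00\tJr\x07\r\n")
print(clean.encode("utf-8"))
```

Whether to silently drop such characters or reject the whole message is a design decision for each application; this sketch drops them only to keep the example short.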

    Translating Reserved XML Code Characters

    In XML, the angle brackets (< >), ampersand (&), apostrophe ('), and quotation mark (") are reserved as part of XML markup. To represent these characters in message data in an XML document, replace them with the escape sequences listed below. When XML data is processed by a web service or a parser, the software may convert these characters to and from their XML form automatically.

    Example 

    • Before Translation: Speeding > 5mph over "posted limit"
    • After Translation: Speeding &gt; 5mph over &quot;posted limit&quot;

    To represent this character...    This is used...
    <                                 &lt;
    >                                 &gt;
    &                                 &amp;
    '                                 &apos;
    "                                 &quot;
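
Where no web service layer or parser performs this translation for you, it can be done with the Python standard library. The sketch below uses `xml.sax.saxutils.escape` and `unescape`, which always handle `&`, `<`, and `>`; the two quotation characters are supplied through the optional entities argument. The wrapper names `to_xml_text` and `from_xml_text` are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; the quote characters are opt-in.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {entity: char for char, entity in QUOTES.items()}

def to_xml_text(value: str) -> str:
    """Replace the five reserved characters with their XML escape sequences."""
    return escape(value, QUOTES)

def from_xml_text(value: str) -> str:
    """Reverse to_xml_text()."""
    return unescape(value, UNQUOTES)

print(to_xml_text('Speeding > 5mph over "posted limit"'))
```

Each escape sequence ends with a semicolon, and the ampersand is translated first when escaping so that the other sequences are not escaped a second time.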

     

      

    Following is information about delimiter and separator characters in other standards. If you use these standards, make sure that your vehicle and driver's license data does not contain these reserved characters. 

    This standard... uses these characters as separators and delimiters:
    ANSI ASC X12
    • The asterisk (*) is the preferred separator for data elements, but you can specify others. If you communicate with organizations that use the asterisk as a separator (for example, organizations that use the Automated Commercial Environment's truck e-manifest), you may need to determine how you will exchange data that contains asterisks (for example, a driver's license number that contains an asterisk).
    • The paragraph mark (carriage return) is the preferred data segment terminator.

    You can define delimiters for each interchange by specifying them in the interchange start segment. The only requirement is that the delimiter character you specify cannot be used elsewhere in the interchange.

    ICAO 9303 (Names): The less-than character (<) is the delimiter between names. This standard is used internationally on travel documents such as passports.
    Line-Sequential Data: The paragraph mark (carriage return) is the delimiter between records or fields.
    Magnetic Stripe: Most of the non-letter and non-number characters are delimiters. For details, see the AAMVA DL/ID Card Design Standard.
    PDF417 Bar Code (2D): Several binary control characters are delimiters. For details, see the AAMVA DL/ID Card Design Standard.
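
A simple pre-check can flag data values that contain a reserved character before they are handed to one of these formats. The Python sketch below is illustrative only: the `RESERVED` mapping is a hypothetical reading of the summary above (preferred X12 delimiters, the ICAO name delimiter, and the line-sequential record delimiter), and a real interchange may define other delimiters.

```python
# Hypothetical mapping drawn from the summary above; real interchanges
# may specify different delimiters.
RESERVED = {
    "X12": "*\r",
    "ICAO 9303": "<",
    "Line-Sequential": "\r",
}

def reserved_in(value: str, standard: str) -> list[str]:
    """Return the reserved characters of `standard` that occur in `value`."""
    return [ch for ch in RESERVED[standard] if ch in value]

# An asterisk inside a license number splits one data element into two.
elements = "DLN*A12*345*NY".split("*")
print(elements)
print(reserved_in("A12*345", "X12"))
```

In the split example, a receiver expecting three elements (tag, license number, state) sees four, so every element after the license number shifts by one position.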

     

       
