4.2 Character strings
A character string is a sequence of characters. All the characters in a character string are taken from a single character set. A character string has a length, which is the number of characters in the sequence. The length is 0 (zero) or a positive integer. A character string type is described by a character string type descriptor.
A character string type descriptor contains:
- The name of the specific character string type (CHARACTER, CHARACTER VARYING, and CHARACTER LARGE OBJECT; NATIONAL CHARACTER, NATIONAL CHARACTER VARYING, and NATIONAL CHARACTER LARGE OBJECT are represented as CHARACTER, CHARACTER VARYING, and CHARACTER LARGE OBJECT, respectively).
- The length or maximum length in characters of the character string type.
- The catalog name, schema name, and character set name of the character set of the character string type.
- The catalog name, schema name, and collation name of the collation of the character string type.
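The four descriptor components listed above can be pictured as a small record. The following Python sketch is illustrative only and is not part of the specification: the names `QualifiedName`, `CharStringTypeDescriptor`, and `make_descriptor` are hypothetical, and only the spelled-out type names given in the text are recognized.

```python
from dataclasses import dataclass

# Type names as recorded in a descriptor. The NATIONAL variants are
# recorded under the corresponding plain name.
_NATIONAL_PREFIX = "NATIONAL "
_TYPE_NAMES = {"CHARACTER", "CHARACTER VARYING", "CHARACTER LARGE OBJECT"}


@dataclass(frozen=True)
class QualifiedName:
    """Hypothetical three-part name: catalog, schema, object name."""
    catalog: str
    schema: str
    name: str


@dataclass(frozen=True)
class CharStringTypeDescriptor:
    """Hypothetical model of a character string type descriptor."""
    type_name: str                 # one of _TYPE_NAMES
    length: int                    # length or maximum length, in characters
    character_set: QualifiedName   # character set of the type
    collation: QualifiedName       # collation of the type


def make_descriptor(declared_type, length, character_set, collation):
    """Build a descriptor, recording NATIONAL variants under the plain name."""
    # Normalize case and runs of whitespace before comparing names.
    type_name = " ".join(declared_type.upper().split())
    if type_name.startswith(_NATIONAL_PREFIX):
        type_name = type_name[len(_NATIONAL_PREFIX):]
    if type_name not in _TYPE_NAMES:
        raise ValueError(f"not a character string type: {declared_type!r}")
    if length < 0:
        # A character string length is never negative.
        raise ValueError("length must not be negative")
    return CharStringTypeDescriptor(type_name, length, character_set, collation)


# Usage with made-up catalog, character set, and collation names.
d = make_descriptor(
    "national character varying",
    20,
    QualifiedName("CAT", "INFORMATION_SCHEMA", "EXAMPLE_CS"),
    QualifiedName("CAT", "INFORMATION_SCHEMA", "EXAMPLE_COLL"),
)
print(d.type_name)  # CHARACTER VARYING
```

In this sketch the information that the type was declared with NATIONAL survives only through the character set name stored in the descriptor, which mirrors the representation rule stated in the first list item.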
The character set of a character string type may be specified explicitly or implicitly.
The <key word>s NATIONAL CHARACTER are used to specify an implementation-defined character set. Special syntax (N'string') is provided for representing literals in that character set.
With two exceptions, a character string expression is assignable only to sites of a character string type whose character set is the same. The exceptions are as specified in Subclause 4.2.7, "Universal character sets", and such other cases as may be implementation-defined.
If a store assignment would result in the loss of non-<space> characters due to truncation, then an exception condition is raised. If a retrieval assignment or evaluation of a <cast specification> would result in the loss of characters due to truncation, then a warning condition is raised.
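The difference between the two truncation rules can be shown with a short Python sketch. It is a simplified model, not the specified behavior of any SQL-implementation: the names `StringDataRightTruncation`, `TruncationWarning`, `store_assign`, and `retrieval_assign` are hypothetical, a Python `str` stands in for a character string, and neither character set compatibility nor any other assignment rule is modeled.

```python
import warnings


class StringDataRightTruncation(Exception):
    """Hypothetical stand-in for the exception condition on store assignment."""


class TruncationWarning(UserWarning):
    """Hypothetical stand-in for the warning condition on retrieval or cast."""


SPACE = "\u0020"  # the <space> character


def store_assign(value: str, max_length: int) -> str:
    """Truncate for storage; fail if any non-<space> character would be lost."""
    lost = value[max_length:]
    if lost.strip(SPACE):
        # At least one character other than <space> would be dropped.
        raise StringDataRightTruncation(f"would lose {lost!r}")
    return value[:max_length]


def retrieval_assign(value: str, max_length: int) -> str:
    """Truncate for retrieval or cast; warn if any character at all is lost."""
    if len(value) > max_length:
        warnings.warn("characters lost to truncation", TruncationWarning)
    return value[:max_length]


# Trailing spaces may be dropped silently on store assignment.
print(repr(store_assign("abc   ", 3)))  # 'abc'
```

Under this model, `store_assign("abcd", 3)` raises the exception because `d` would be lost, while `retrieval_assign("abc   ", 3)` returns `'abc'` and issues a warning even though only spaces were dropped.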
Character sets fall into three categories: those defined by national or international standards, those defined by SQL-implementations, and those defined by applications. The character sets defined by ISO/IEC 10646 and The Unicode Standard are known as Universal Character Sets (UCS) and their treatment is described in Subclause 4.2.7, "Universal character sets". Every character set contains the <space> character (equivalent to U+0020). An application defines a character set by assigning a new name to a character set from one of the first two categories. Application-defined character sets can be defined to "reside" in any schema chosen by the application. Character sets defined by standards or by SQL-implementations reside in the Information Schema (named INFORMATION_SCHEMA) in each catalog, as do collations defined by standards and the collations, transliterations, and transcodings defined by SQL-implementations.
NOTE 9: The Information Schema is defined in ISO/IEC 9075-11.
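The three categories and the residence rule can be illustrated with a small Python model. This is a sketch under stated assumptions rather than a rendering of any normative rule: `CharacterSet`, `builtin_character_set`, and `application_character_set` are hypothetical names, and the catalog, schema, and character set names in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import Optional

INFORMATION_SCHEMA = "INFORMATION_SCHEMA"


@dataclass(frozen=True)
class CharacterSet:
    """Hypothetical record for a named character set."""
    catalog: str
    schema: str
    name: str
    origin: str  # "standard", "implementation", or "application"
    based_on: Optional["CharacterSet"] = None  # set only for application-defined


def builtin_character_set(catalog, name, origin):
    """Standard- and implementation-defined sets reside in INFORMATION_SCHEMA."""
    if origin not in ("standard", "implementation"):
        raise ValueError("origin must be 'standard' or 'implementation'")
    return CharacterSet(catalog, INFORMATION_SCHEMA, name, origin)


def application_character_set(catalog, schema, name, source):
    """An application-defined set is a new name for a set of the first two categories."""
    if source.origin == "application":
        raise ValueError("source must be standard- or implementation-defined")
    return CharacterSet(catalog, schema, name, "application", based_on=source)


# Usage with made-up names.
base = builtin_character_set("CAT", "EXAMPLE_CS", "standard")
mine = application_character_set("CAT", "APP_SCHEMA", "MY_CS", base)
print(base.schema)  # INFORMATION_SCHEMA
print(mine.schema)  # APP_SCHEMA
```

The model only records where each character set resides and which character set an application-defined name refers to; it says nothing about the repertoire or encoding of the characters themselves.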