Collations

Sorting and searching non-English text can cause a number of problems, a frequent one being how to handle accented letters, for example á, à and â.

The rules for sorting vary because the various natural languages sort words differently. There are occasions where the accented form of a letter is treated as a distinct letter for the purpose of comparison. For example, in Swedish, Å is a separate letter that is sorted after Z. In some languages, it is common to sort uppercase before lowercase, in other languages this is reversed; sometimes it is just a matter of personal preference.

A collation, also known as a collating sequence, is a database object containing a set of rules that determines how character strings are compared, searched and alphabetically sorted. The rules in the collation determine whether one character string is less than, equal to or greater than another. A collation also determines how case-sensitivity and accents are handled.

In Mimer SQL, a collation belongs to a schema. In this release, the pre-defined collations included belong to INFORMATION_SCHEMA.

When a collation is used, Mimer SQL first checks to see if it belongs to the ident's schema. If Mimer SQL does not find it there, it checks for it in INFORMATION_SCHEMA.

Character Sets and Collations

For character data, Mimer SQL uses the character set ISO 8859-1, also known as the Latin1 character set. By default, character data is sorted in the numerical order of their codes according to the ISO8BIT collation.

For national character data, Mimer SQL uses the Unicode character set. By default national character data is sorted according to the UCS_BASIC collation.

It is not possible to add, drop or modify a character set.

Every character set has one default collation.

Character Set

Default Collation

Data Types

ISO 8859-1

ISO8BIT

CHARACTER
CHARACTER VARYING
CLOB

UNICODE

UCS_BASIC

NCHAR
NCHAR VARYING
NCLOB

If you want to sort characters in a different way than the default, you can specify a collation at the column level when creating or altering a table or creating a domain. You can also override a collation by using a COLLATE clause in an SQL statement.

Using Collations

You can specify a collation for ordering characters when you create or alter a table or create a domain.

If you have specified a collation for a column, the collation is used implicitly in SQL statements.

You only need to explicitly use a collate clause in SQL statements if you want to override the default collation or the collation you specified when creating or altering the table or creating the domain.

Character Strings

SQL only permits compatible character strings to be compared. That is, you can compare character strings only if the source and target strings belong to the same collation or can be coerced into having the same collation.

A character string that is defined with a named collation can only be compared or assigned to a character string that is either defined with the same named collation or is defined without a collation.

In the case where one of the strings is not associated with a named collation then it will be implicitly coerced to the same collation as the other string.

String Comparison Examples

The following three comparisons are all legal (and equivalent):

job_title = 'developer' COLLATE english_1

 

job_title COLLATE english_1 = 'developer'

 

job_title COLLATE english_1 = 'developer' COLLATE english_1

 

But –

job_title COLLATE english_1 = 'developer' COLLATE swedish_1

 

is illegal because different collations are specified.

CREATE/ALTER TABLE

When creating or altering a table, you can specify a collation in the column-definition, for example:

CREATE TABLE employees (surname CHAR(20) COLLATE swedish_1

            ...

CREATE DOMAIN and CREATE TYPE

When creating a domain you can specify a collation for the character and national character string data types, for example:

CREATE DOMAIN name_type AS VARCHAR(48) COLLATE english_1;

CREATE TYPE car_models AS VARCHAR(48) COLLATE english_1;

 

All properties of a domain apply to the column when the domain/type is used in a CREATE TABLE or ALTER TABLE statement.

CREATE INDEX

To improve performance when retrieving data, you can create more than one index for a column using different collations, for example:

CREATE INDEX cnt_eng_ind ON countries (country COLLATE english_3);

CREATE INDEX cnt_swe_ind ON countries (country COLLATE swedish_3);

 

Which index that will be used depends on the situation. For example:

SELECT * FROM countries ORDER BY country COLLATE english_3;

 

will use the cnt_eng_ind index.

And

SELECT * FROM countries ORDER BY country COLLATE swedish_3;

 

will use the cnt_swe_ind index.

Collation Precedence

A collation specified in the column-definition will take precedence over a domain collation.

Continuing with the example above, the domain collation was set to english_1, but in the following example the column country is set to swedish_1, which takes precedence over the domain setting:

     CREATE TABLE countries (

     code CHARACTER(2),

     country name_type COLLATE swedish_1,

     ...

Altering Collations on Columns

You can change the collation specified for a column by using the ALTER TABLE statement, for example:

ALTER TABLE countries ALTER COLUMN country CHAR(20) COLLATE english_1;

 

To return to the default (ISO8BIT) sorting order, you would enter:

ALTER TABLE countries ALTER COLUMN country CHAR(20) COLLATE ISO8BIT;

 

By altering a collation, for example to the default ISO8BIT collation, you can remove any dependencies associated with the collation. This makes it possible to drop the collation – see the next section.

Dropping a Collation

You can drop a collation only if there are no dependencies, for example:

DROP COLLATION collation_name RESTRICT;

Finding Out the Default Collation For a Column

You can find out which collation a column uses by reviewing the INFORMATION_SCHEMA.COLUMNS view, for example:

SELECT *

FROM INFORMATION_SCHEMA.COLUMNS

WHERE table_name = 'table1'

  AND column_name = 'col1';

 

For more information, see the Mimer SQL Reference Manual, INFORMATION_SCHEMA dictionary views.

Using Collations – Examples

The following sections contain examples of how to use collations and what effects collations can have on a result set.

The examples are based on the following (rather simple) table, table1:

colswe

coleng

col1

A

A

A

a

a

a

W

W

W

å

å

å

v

v

v

colswe uses the swedish_1 collation, coleng uses the english_1 collation and col1 uses the Mimer SQL default ISO8BIT collation.

Comparison Operators

You can qualify the comparison operators (=, <>, <, <=, …) with a COLLATE clause. For example:

SELECT col1

FROM table1

WHERE col1 > 'm';

 

would give the following result:

col1

å

v

However, explicitly using the COLLATE clause and the english_1 collation:

SELECT col1

FROM table1

WHERE col1 > 'm' COLLATE english_1;

 

would give the following result:

col1

W

v

Similarly, explicitly using the COLLATE clause and the swedish_1 collation:

SELECT col1

FROM table1

WHERE col1 > 'm' COLLATE swedish_1;

 

would give the following result:

col1

W

å

v

ORDER BY

You can use a COLLATE clause together with an ORDER BY clause to sort result sets. In most cases a level 3 collation is suitable for order by purposes. For example:

SELECT *

FROM table1

ORDER BY col1 COLLATE swedish_3;

 

retrieves the data and sorts it on col1 according to the swedish_3 collation:

colswe

coleng

col1

a

a

a

A

A

A

v

v

v

W

W

W

å

å

å

Similarly, the following statement:

SELECT *

FROM table1

ORDER BY col1 COLLATE english_3;

 

retrieves the data and sorts it according to the english_3 collation:

colswe

coleng

col1

a

a

a

A

A

A

å

å

å

v

v

v

W

W

W

Note:Where a collation defines a number of characters with the same sort-order value, the retrieval order within the sort-order value is not defined.

GROUP BY

Depending on the collation associated with a column, you might get differing results when using GROUP BY.

For example, the statement:

SELECT col1, COUNT(*)

FROM table1

GROUP BY col1 COLLATE swedish_1;

 

gives the following result:

col1

 

A

2

W

2

å

1

According to the swedish_1 collation, two instances of the character ‘a’ were found and one instance of ‘å’ which is considered a separate character in the Swedish language.

Similarly, using the english_1 collation in the statement:

SELECT col1, COUNT(*)

FROM table1

GROUP BY col1 COLLATE english_1;

 

gives the following result:

col1

 

A

3

v

1

W

1

According to the english_1 collation, three instances of the character ‘a’ were found, as the character ‘å’ has the same sort-order value as ‘A’ and ‘a’.

Scalar String Functions

You can use the COLLATE clause with the scalar string functions SUBSTRING and TRIM.

Character strings that are derived from a single string, for example, those returned from the TRIM and SUBSTRING functions, inherit the collation from the source string.

TRIM and COLLATE

You should be aware of the consequences when you use a TRIM function on a column that has a collation.

For example, referring to table1, see Using Collations – Examples, the following statement:

SELECT TRIM('v' FROM colswe)

FROM table1;

 

would trim both ‘W’ and ‘v’ from the result set as the characters ‘W’ and ‘v’ have the same sort-order value in a Swedish case-insensitive collation.

Similarly, the following statement:

SELECT TRIM('a' FROM col1)

FROM table1;

 

would trim ‘A’, ‘a’ and ‘å’ from the result set as the characters ‘A’, ‘a’ and ‘å’ have the same sort-order value in an English case-insensitive collation.

Concatenation Operator

Suppose you want to concatenate columns, colswe and col1, for example:

SELECT colswe || coleng

FROM table1;

 

Because the columns use different collations the result set will have the default collation ISO8BIT.

However, if you want apply a collation to the result set, you can add a COLLATE clause. for example:

SELECT (colswe || coleng) COLLATE swedish_1

FROM table1;

IN and BETWEEN

A collation will affect the results of a query that uses IN or BETWEEN.

For example, the following statement:

SELECT *

FROM table1

WHERE coleng BETWEEN 'a' and 'B';

 

returns:

colswe

coleng

col1

A

A

A

a

a

a

å

å

å

But, the statement:

SELECT *

FROM table1

WHERE colswe BETWEEN 'a' and 'B';

 

returns

colswe

coleng

col1

A

A

A

a

a

a

UNION, EXCEPT and INTERSECT

When performing a UNION (or EXCEPT or INTERSECT), you must know what collations are involved in order to ensure that you get the result you want.

For example, the following statement:

SELECT colswe

FROM table1

 

UNION

 

SELECT coleng

FROM table1;

 

raises an error because the UNION operator can’t understand which duplicate rows to remove or not.

To perform the UNION according to the swedish_1 collation, you would explicitly use a COLLATE clause, for example:

SELECT colswe COLLATE swedish_1

FROM table1

 

UNION

 

SELECT coleng

FROM table1;

 

which would return:

colswe

A

W

å

Similarly, for a UNION result according to the english_1 collation, you would enter:

SELECT colswe COLLATE english_1

FROM table1

 

UNION

 

SELECT coleng

FROM table1;

 

which would return:

colswe

A

v

W

DISTINCT

When you use DISTINCT, you must consider the consequences of which collation is associated with a column.

In the following example:

SELECT DISTINCT col1

FROM table1;

 

All entries in col1 are considered DISTINCT as it uses the Mimer SQL default collation ISO8BIT:

col1

A

W

a

v

å

However, in this next statement:

SELECT DISTINCT colswe

FROM table1;

 

colswe uses the swedish_1 collation. ‘å’ and ‘A’ are considered to be distinct, but ‘v’ and ‘W’ are not:

colswe

A

W

å

Similarly, in this example:

SELECT DISTINCT coleng

FROM table1;

 

coleng uses the english_1 collation, ‘v’ and ‘W’ are considered to be distinct, but ‘å’ and ‘A’ are not:

coleng

A

v

W