This is an early draft for a possible generic mechanism to allow authors to define custom text-transforms.
The general form of an @text-transform at-rule is:
@text-transform <transform-name> { [ descriptor: value; ]+ }
The descriptors express the conversion from certain characters to other characters, using different mechanisms to specify the source and target characters. If several descriptors are used, the resulting transform is obtained by applying them all successively, in the order they appear in the @text-transform rule.
Example:
The following two transforms are identical.
@text-transform abcdef1 { convert: "abc" to "def"; }
@text-transform abcdef2 { convert: "a" to "d"; convert: "b" to "e"; convert: "c" to "f"; }
Name: convert Value: <string> to <string> default: N/A
This descriptor creates a 1 to 1 mapping from the characters in the first string to the characters in the second string.
Both strings should be of equal length. If they are not, the longer one is truncated to the same length as the shorter one.
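The truncation rule can be illustrated with a short Python sketch. The names `build_convert_map` and `convert` are hypothetical, and a "character" is assumed here to mean a single Unicode code point, which the draft does not state explicitly.

```python
from typing import Dict


def build_convert_map(source: str, target: str) -> Dict[int, str]:
    """Pair characters positionally, dropping any unmatched tail."""
    # zip() stops at the end of the shorter string, which has the same
    # effect as truncating the longer one.
    return {ord(s): t for s, t in zip(source, target)}


def convert(text: str, source: str, target: str) -> str:
    """Replace every mapped character; leave all others untouched."""
    return text.translate(build_convert_map(source, target))


# "abcd" is truncated to "ab", so only "a" and "b" are converted.
print(convert("abcd", "abcd", "xy"))  # xycd
```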
Name: convert-range Value: <string>,<string> to <string>,<string> default: N/A
Using the convert descriptor can be tedious when the list of characters is long. When the characters' Unicode code points form a contiguous sequence, convert-range offers a shorter notation.
Each pair of strings defines a range of Unicode characters, inclusive of the ones listed. Each of the four strings must contain a single Unicode character.
The numerical code point value of the character in the first (resp. third) string must be less than the one in the second (resp. fourth) string. If it is not, the descriptor must be ignored. Both ranges should be of equal length. If they are not, the longer one is truncated to the same length as the shorter one. The ranges may overlap.
Example:
@text-transform latin-only-uppercase { convert-range: "a","z" to "A","Z"; }
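The validation and truncation rules for convert-range can be sketched in Python as follows. The function names are hypothetical. The draft says a descriptor whose range bounds are out of order must be ignored; treating a bound that is not a single character the same way is an assumption of this sketch.

```python
from typing import Dict, Optional


def build_range_map(src_start: str, src_end: str,
                    dst_start: str, dst_end: str) -> Optional[Dict[int, str]]:
    """Return the code point mapping, or None if the descriptor is ignored."""
    bounds = (src_start, src_end, dst_start, dst_end)
    # Assumption: a bound that is not exactly one code point invalidates
    # the descriptor.
    if any(len(b) != 1 for b in bounds):
        return None
    s0, s1, d0, d1 = (ord(b) for b in bounds)
    # Each range must start strictly below its end.
    if not (s0 < s1 and d0 < d1):
        return None
    # Both ranges are inclusive; the longer one is truncated.
    length = min(s1 - s0, d1 - d0) + 1
    return {s0 + i: chr(d0 + i) for i in range(length)}


def convert_range(text: str, src_start: str, src_end: str,
                  dst_start: str, dst_end: str) -> str:
    table = build_range_map(src_start, src_end, dst_start, dst_end)
    # An ignored descriptor leaves the text unchanged.
    return text if table is None else text.translate(table)


# Only a-z are mapped; accented letters are left alone.
print(convert_range("héllo wörld", "a", "z", "A", "Z"))  # HéLLO WöRLD
```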
Name: convert-predefined Value: <text-transform> default: N/A
This descriptor makes it possible to refer to existing text transforms, either predefined by CSS or defined by the author. While an @text-transform using only this descriptor is not very useful, combining it with other descriptors allows authors to extend or define variants of existing transforms. convert-predefined cannot refer to the text-transform whose definition it is part of.
[<string> [, <string>]? to <string> [, <string>]?] | <text-transform>
Name: applies-to Value: all | initial Default: all
It would let people define customized versions of text-transform:capitalize;
The following use cases only apply to a single language. Defining all the possibly useful text-transforms for all languages would go beyond the capacity and expertise of the CSS WG. Having the generic mechanism allows authors to solve their specific problem.
In Japanese, small kana appearing within ruby are sometimes replaced by the equivalent full-size kana. The following transform defines this conversion.
@text-transform full-size-kana { convert: "ぁぃぅぇぉゕゖっゃゅょゎ" to "あいうえおかけつやゆよわ"; convert: "ァィゥェォヵㇰヶㇱㇲッㇳㇴㇵㇶㇷㇸㇹㇺャュョㇻㇼㇽㇾㇿヮ" to "アイウエオカクケシスツトヌハヒフヘホムヤユヨラリルレロワ"; convert: "ァィゥェォッャュョ" to "アイウエオツヤユヨ"; }
As discussed in this thread, ß (U+00DF, also written &szlig;) is traditionally considered a lowercase letter without an uppercase equivalent. text-transform:uppercase leaves it unchanged. Since version 5.1, Unicode has included ẞ (U+1E9E), an uppercase version of it, but without making it the target of toupper().
Since this letter is rather new, authors are bound to disagree on whether it is a proper uppercase variant of U+00DF. Those who think it is not may use text-transform:uppercase and text-transform:lowercase. Those who think it is could use the following.
@text-transform german-uppercase { convert-predefined: uppercase; convert: "ß" to "ẞ"; } @text-transform german-lowercase { convert-predefined: lowercase; convert: "ẞ" to "ß"; }
http://en.wikipedia.org/wiki/Dotted_and_dotless_I
In Turkish and a few related languages, dotted and dotless i are distinct letters, both in upper and lower case.
The uppercasing and lowercasing algorithms defined for the text-transform property only preserve this distinction when the content language of the element is known.
Someone, for example in a user style sheet, may want to apply an uppercase or lowercase transform to a document where language is insufficiently marked up, but known to the author of the style sheet to be Turkish. In this case, the generic uppercase and lowercase transforms would fail, but the following would work.
@text-transform turkic-uppercase { convert: "i" to "İ"; convert-predefined: uppercase; } @text-transform turkic-lowercase { convert: "I" to "ı"; convert-predefined: lowercase; }
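The effect of placing the convert descriptor before convert-predefined can be illustrated with a Python sketch. Python's `str.upper()` and `str.lower()` are used here only as stand-ins for the predefined, language-unaware uppercase and lowercase transforms; they are not guaranteed to match the CSS algorithms in every case. The function names are hypothetical.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı


def turkic_uppercase(text: str) -> str:
    # Step 1 models: convert: "i" to "İ";
    # Step 2 models: convert-predefined: uppercase; (str.upper() stand-in)
    return text.replace("i", DOTTED_CAPITAL_I).upper()


def turkic_lowercase(text: str) -> str:
    # Step 1 models: convert: "I" to "ı";
    # Step 2 models: convert-predefined: lowercase; (str.lower() stand-in)
    return text.replace("I", DOTLESS_SMALL_I).lower()


print("istanbul".upper())            # ISTANBUL (dot lost)
print(turkic_uppercase("istanbul"))  # İSTANBUL (dot preserved)
print("ISPARTA".lower())             # isparta (dot wrongly added)
print(turkic_lowercase("ISPARTA"))   # ısparta (dotless i preserved)
```

The special letter has to be converted first: once the generic transform has run, a dotted and a dotless i can no longer be told apart.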
http://en.wikipedia.org/wiki/Letter_case#Other_forms_of_case http://en.wikipedia.org/wiki/Georgian_alphabet
The Georgian language has used three different unicameral alphabets through history: Asomtavruli, Nuskhuri, and Mkhedruli. Recently, some authors have been using Asomtavruli letters in an otherwise Mkhedruli text, in a way that resembles a bicameral alphabet. One may assume that they would find the following transform useful.
@text-transform Mkhedruli-to-Asomtavruli {
convert-range: "ა","ჵ" to "Ⴀ","Ⴥ";
}
@text-transform Asomtavruli-to-Mkhedruli {
convert-range: "Ⴀ","Ⴥ" to "ა","ჵ";
}
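As a sanity check on the two rules above, the following Python sketch models each range conversion as a fixed code point offset. The helper `shift_range` and the two wrapper functions are hypothetical names. The sketch relies on both ranges having the same length, so that the truncation rule does not come into play.

```python
def shift_range(text: str, src_start: str, src_end: str, dst_start: str) -> str:
    """Move code points in [src_start, src_end] to the range starting at dst_start."""
    lo, hi = ord(src_start), ord(src_end)
    offset = ord(dst_start) - lo
    return "".join(
        chr(ord(c) + offset) if lo <= ord(c) <= hi else c
        for c in text
    )


def mkhedruli_to_asomtavruli(text: str) -> str:
    return shift_range(text, "ა", "ჵ", "Ⴀ")


def asomtavruli_to_mkhedruli(text: str) -> str:
    return shift_range(text, "Ⴀ", "Ⴥ", "ა")


# Both ranges cover the same number of code points.
assert ord("ჵ") - ord("ა") == ord("Ⴥ") - ord("Ⴀ")

word = "თბილისი"
converted = mkhedruli_to_asomtavruli(word)
# The two transforms are inverses of each other on these ranges.
assert asomtavruli_to_mkhedruli(converted) == word
```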
The following are examples of transforms that are useful in several languages, but rare enough that they are better addressed by authors when needed than by the CSS WG.
http://en.wikipedia.org/wiki/Long_s http://www.fileformat.info/info/unicode/char/17f/index.htm
In old (18th century and earlier) European texts, the letter s was written ſ (U+017F) when it occurred at the beginning or in the middle of a word. An s occurring at the end of a word was written as the modern s is.
Modern readers are often unfamiliar with this letter form, so for readability one may want to convert ſ to the modern s. The following transform would accomplish this.
@text-transform modernize-s { convert: "ſ" to "s"; }
Here are some more examples of how the generic mechanism may be used.
In the “Asterix and the Great Crossing” comic book, the Viking characters are supposed to speak a foreign language unintelligible to the main characters, but still understandable to the readers. This is represented by writing down their speech normally, except that some letters are replaced by similar-looking letters found in Scandinavian languages.
This effect could be obtained by the following transform:
@text-transform fake-norse { convert: "aoAO" to "åøÅØ"; }