Differences

This shows you the differences between two versions of the page.

at-text-transform [2011/12/01 07:41]
florian [The convert-predefined descriptor]
at-text-transform [2014/12/09 15:48] (current)
Line 1: Line 1:
-This is an early draft for a possible generic mechanism to allow authors to define custom text-transforms. +This page has moved. See [[ideas:at-text-transform|here]]
- +
-====== Defining Custom Text Transforms: the @text-transform rule ====== +
- +
-The general form of an @text-transform at-rule is: +
- +
-<​code>​ +
-@text-transform <transform-name> { +
-[ descriptor: value; ]+ } +
-</​code>​ +
- +
-The descriptors express the conversion from certain characters to other characters, using different mechanisms to specify the source and target characters. If several descriptors are used, the transform described is the result of applying them all successively, in the order they appear in the @text-transform rule. +
- +
-<note warning>​ISSUE:​ should "​@text-transform foo {...}" be used as "​text-transform:​ custom(foo);"​ or as "​text-transform:​ foo;"? I would rather do the same as counter-styles in lists, but let's discuss.</​note>​ +
- +
-Example: +
- +
-The following two transforms are identical. +
- +
-<​code>​ +
-@text-transform abcdef1 +
-{ +
-    convert: "abc" to "def"; +
-} +
- +
-@text-transform abcdef2 +
-{ +
-    convert: "a" to "d"; +
-    convert: "b" to "e"; +
-    convert: "c" to "f"; +
-} +
-</​code>​ +
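The "successively applying" rule can be illustrated with a rough Python sketch that models each descriptor as a function from string to string, and a transform as those functions composed in source order. The helper names `convert` and `compose` are hypothetical and not part of the draft, and a character is assumed to be a single code point, which the draft leaves open.

```python
from functools import reduce
from typing import Callable, List

Step = Callable[[str], str]

def convert(source: str, target: str) -> Step:
    """Model `convert: source to target` as a per-character substitution."""
    mapping = dict(zip(source, target))  # pairs characters position by position
    return lambda text: "".join(mapping.get(ch, ch) for ch in text)

def compose(steps: List[Step]) -> Step:
    """Apply every step in the order it appears in the rule."""
    return lambda text: reduce(lambda acc, step: step(acc), steps, text)

abcdef1 = compose([convert("abc", "def")])
abcdef2 = compose([convert("a", "d"), convert("b", "e"), convert("c", "f")])

print(abcdef1("a cab"), abcdef2("a cab"))
```

Under this reading, a chain of single-character steps matches the one-step mapping only when no target character is also the source of a later step.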
- +
-===== The convert descriptor ===== +
- +
-<​code>​ +
-Name: convert +
-Value: <​string>​ to <​string>​ +
-default: N/A +
-</​code>​ +
- +
-This descriptor creates a 1 to 1 mapping from the characters in the first string to the characters in the second string. +
-<note warning>​ISSUE:​ how should we define character here? Legacy or extended grapheme cluster?</​note>​ +
- +
-Both strings should be of equal length. If they are not, the longer one is truncated to the same length as the shorter one. +
-<note warning>​ISSUE:​ define length properly in terms of grapheme clusters</​note>​ +
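A small Python sketch of the truncation rule, assuming for illustration that "length" is counted in code points (the open issue above is precisely whether grapheme clusters should be used instead). The function name `convert_mapping` is hypothetical.

```python
from typing import Dict

def convert_mapping(source: str, target: str) -> Dict[str, str]:
    # zip() stops at the end of the shorter string, so the longer one is
    # effectively truncated, as the draft requires.
    return dict(zip(source, target))

print(convert_mapping("abcd", "xy"))

# Length depends on what counts as a character: a decomposed "é" is one
# grapheme cluster but two code points (e + combining acute accent).
decomposed = "e\u0301"
print(len(decomposed))
```

With code-point counting, a string containing combining sequences would pair base letters and combining marks separately, which is one reason the definition of "character" matters here.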
- +
-===== The convert-range descriptor ===== +
- +
-<​code>​ +
-Name: convert-range +
-Value: <​string>,<​string>​ to <​string>,<​string>​ +
-default: N/A +
-</​code>​ +
- +
-It would sometimes be tedious to use the convert descriptor when the list of characters is long, but this can be simplified using convert-range when the characters' Unicode code points form a contiguous sequence. +
- +
-Each pair of strings defines a range of Unicode characters, inclusive of the ones listed. Each of the 4 strings must contain a single Unicode character. +
- +
-<​note>​NOTE:​ Here, grapheme clusters don't make sense</​note>​ +
- +
-The numerical code point value of the character in the first (resp. third) string must be less than the one in the second (resp. fourth) string. If it is not, the descriptor must be ignored. Both ranges should be of equal length. If they are not, the longer one is truncated to the same length as the shorter one. The ranges may overlap. +
- +
- +
-Example: +
-<​code>​ +
-@text-transform latin-only-uppercase +
-{ +
-    convert-range: "a","z" to "A","Z"; +
-} +
-</​code>​ +
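The convert-range rules above can be sketched in Python as follows. The function name `convert_range` is hypothetical, returning `None` stands in for "the descriptor must be ignored", and treating a string that is not exactly one code point as an ignored descriptor is an assumption, since the draft does not say what happens in that case.

```python
from typing import Dict, Optional

def convert_range(src_first: str, src_last: str,
                  dst_first: str, dst_last: str) -> Optional[Dict[str, str]]:
    strings = (src_first, src_last, dst_first, dst_last)
    # Assumption: a string that is not a single code point invalidates the descriptor.
    if any(len(s) != 1 for s in strings):
        return None
    s0, s1, d0, d1 = (ord(s) for s in strings)
    # Each range must be strictly increasing, otherwise the descriptor is ignored.
    if s0 >= s1 or d0 >= d1:
        return None
    # Ranges are inclusive; the longer one is truncated to the shorter one.
    length = min(s1 - s0, d1 - d0) + 1
    return {chr(s0 + i): chr(d0 + i) for i in range(length)}

latin_only_uppercase = convert_range("a", "z", "A", "Z")
print("".join(latin_only_uppercase.get(ch, ch) for ch in "déjà vu"))
```

Only the 26 code points from "a" to "z" are mapped, so accented letters pass through unchanged, which is what the name latin-only-uppercase suggests.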
- +
-===== The convert-predefined descriptor ===== +
- +
-<​code>​ +
-Name: convert-predefined +
-Value: <​text-transform>​ +
-default: N/A +
-</​code>​ +
- +
-This descriptor makes it possible to refer to existing text transforms, either predefined by CSS or defined by the author. While an @text-transform using only this descriptor is not very useful, combining it with other descriptors allows authors to extend or define variants of existing transforms. convert-predefined cannot refer to the text-transform whose definition it is part of. +
- +
- +
-<note warning>​ISSUE:​ Should we combine the 3 descriptors into one, with the following syntax? +
-<​code> ​[<​string>​ [, <​string>​]?​ to <​string>​ [, <​string>​]?​] | <​text-transform></​code>​ </​note>​ +
- +
-<note warning>ISSUE: do we also need one more descriptor along these lines: +
-<​code>​Name:​ applies-to +
-Value: all | initial +
-Default: all +
-</​code>​ +
-It would let people define customized versions of text-transform:​capitalize;​ +
-</​note>​ +
- +
- +
- +
- +
-====== Use cases ====== +
- +
-===== Single-language use cases ===== +
- +
-The following use cases only apply to a single language. Defining all the possibly useful text-transforms for all languages would go beyond the capacity and expertise of the CSS WG. Having the generic mechanism allows authors to solve their specific problem. +
- +
-==== Full-size kana ==== +
-In Japanese, small kana appearing within ruby are sometimes replaced by the equivalent full-size kana. The following transform defines this conversion. +
- +
-<​code>​ +
-@text-transform full-size-kana +
-{ +
-    convert: "ぁぃぅぇぉゕゖっゃゅょゎ" to "あいうえおかけつやゆよわ"; +
-    convert: "ァィゥェォヵㇰヶㇱㇲッㇳㇴㇵㇶㇷㇸㇹㇺャュョㇻㇼㇽㇾㇿヮ" to "アイウエオカクケシスツトヌハヒフヘホムヤユヨラリルレロワ"; +
-    convert: "ァィゥェォッャュョ" to "アイウエオツヤユヨ"; +
-} +
-</​code>​ +
- +
-==== German ß ==== +
- +
-As discussed [[http://lists.w3.org/Archives/Public/www-style/2011Nov/0193.html|in this thread]], ß (aka &szlig; or U+00DF) is traditionally considered a lowercase letter without an uppercase equivalent. text-transform:uppercase leaves it unchanged. Since version 5.1, Unicode has included ẞ (U+1E9E), an uppercase version of it, but without making it a target of toupper(). +
- +
-This letter being rather new, authors are bound to disagree whether it is a proper uppercase variant of U+00DF or not. Those who think it is not may use text-transform:uppercase; and text-transform:lowercase;. Those who think it is could use the following. +
- +
-<​code>​ +
-@text-transform german-uppercase +
-{ +
-    convert-predefined: uppercase; +
-    convert: "ß" to "ẞ"; +
-} +
- +
-@text-transform german-lowercase +
-{ +
-    convert-predefined: lowercase; +
-    convert: "ẞ" to "ß"; +
-} +
-</​code>​ +
- +
-==== Turkish i/ı ==== +
- +
-http://​en.wikipedia.org/​wiki/​Dotted_and_dotless_I +
- +
-In Turkish and a few related languages, dotted and dotless i are distinct letters, both in upper and lower case. +
- +
-The uppercasing and lowercasing algorithms defined for the text-transform property only preserve this distinction when the content language of the element is known. +
- +
-Someone, for example in a user style sheet, may want to apply an uppercase or lowercase transform to a document where language is insufficiently marked up, but known to the author of the style sheet to be Turkish. In this case, the generic uppercase and lowercase transforms would fail, but the following would work.  +
- +
- +
-<​code>​ +
-@text-transform turkic-uppercase +
-{ +
-    convert: "i" to "İ"; +
-    convert-predefined: uppercase; +
-} +
- +
-@text-transform turkic-lowercase +
-{ +
-    convert: "I" to "ı"; +
-    convert-predefined: lowercase; +
-} +
-</​code>​ +
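The order of the descriptors matters in the Turkish transforms above: the language-specific substitution has to run before the generic case conversion. The Python sketch below illustrates this, using `str.upper()` as a stand-in for the predefined uppercase transform (an assumption; Python's case mapping is language-independent, but it is not guaranteed to match the CSS algorithm in every detail). The function names are hypothetical.

```python
def convert(source: str, target: str):
    """Model `convert: source to target` as a per-character substitution."""
    mapping = dict(zip(source, target))
    return lambda text: "".join(mapping.get(ch, ch) for ch in text)

def turkic_uppercase(text: str) -> str:
    # convert: "i" to "İ"; then convert-predefined: uppercase
    return convert("i", "İ")(text).upper()

def swapped_order(text: str) -> str:
    # Same descriptors in the opposite order: by the time the convert step
    # runs, every "i" has already become "I", so nothing is left to replace.
    return convert("i", "İ")(text.upper())

print(turkic_uppercase("istanbul"))
print(swapped_order("istanbul"))
```

The first call keeps the dot on the capital İ, while the swapped version falls back to the generic result.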
- +
-==== Georgian upper/lower case ==== +
- +
-http://​en.wikipedia.org/​wiki/​Letter_case#​Other_forms_of_case +
-http://​en.wikipedia.org/​wiki/​Georgian_alphabet +
- +
-The Georgian language has used three different unicameral alphabets through history: Asomtavruli,​ Nuskhuri, and Mkhedruli. Recently, some authors have been using Asomtavruli letters in an otherwise Mkhedruli text, in a way that resembles a bicameral alphabet. One may assume that they would find the following transform useful. +
- +
-@text-transform Mkhedruli-to-Asomtavruli +
-{ +
-    convert-range: "ა","ჵ" to "Ⴀ","Ⴥ"; +
-} +
- +
-@text-transform Asomtavruli-to-Mkhedruli +
-{ +
-    convert-range: "Ⴀ","Ⴥ" to "ა","ჵ"; +
-} +
- +
- +
-===== Cross-language use cases ===== +
- +
-The following use cases apply to several languages, but are rare enough that they are better addressed by authors when needed than by the CSS WG. +
- +
-==== Long s ==== +
- +
-http://​en.wikipedia.org/​wiki/​Long_s +
-http://​www.fileformat.info/​info/​unicode/​char/​17f/​index.htm +
- +
-In old (18th century and earlier) European texts, the letter s, when in the middle or at the beginning of a word, was written ſ (U+017F). An s occurring at the end of a word was written as the modern s is. +
- +
-Modern readers are often unfamiliar with this letter form, so for readability one may want to convert it to the modern s. The following transform would accomplish this. +
- +
-<​code>​ +
-@text-transform modernize-s +
-{ +
-    convert: "ſ" to "s"; +
-} +
-</​code>​ +
- +
-===== Miscellaneous ===== +
- +
-Here are some more examples of how the generic mechanism may be used. +
- +
-==== Comic book vikings ==== +
-In the "Asterix and the Great Crossing" comic book, the Viking characters are supposed to speak a foreign language unintelligible to the main characters, but still understandable to the readers. This is represented by writing down their speech normally, except that some letters are replaced by similar-looking letters found in Scandinavian languages. +
- +
-This effect could be obtained by the following transform:​ +
- +
-<​code>​ +
-@text-transform fake-norse +
-{ +
-    convert: "aoAO" to "åøÅØ"; +
-} +
-</​code>​+
 
at-text-transform.txt · Last modified: 2014/12/09 15:48 (external edit)