Differences

This shows you the differences between two versions of the page.

ideas:at-text-transform [2012/04/17 06:08] – [Use cases] florian
ideas:at-text-transform [2018/11/04 10:50] (current) – old revision restored (2018/09/25 17:09) fantasai
Line 1: Line 1:
-This is an early draft for a possible generic mechanism to allow authors to define custom text-transforms.
+<note warning>This page used to hold a proposal for a possible generic mechanism to allow authors to define custom text-transforms.
  
-====== Defining Custom Text Transforms: the @text-transform rule ======
+Wikis are good to sketch ideas, but not ideal to maintain specifications, so this has been moved to the author's personal site: https://specs.rivoal.net/css-custom-tt/
  
-The general form of an @text-transform at-rule is:
+Feedback welcome on that page. You may also consult old versions of this wiki page at https://wiki.csswg.org/ideas/at-text-transform?do=revisions
- +
-<code css> +
-@text-transform <transform-name> +
-{ [ descriptor: value; ]+ } +
-</code> +
- +
-<transform-name> may be any valid identifier other than none, inherit and initial. +
- +
-A text transform created using this at-rule may be used simply by using <transform-name> as the value of the text-transform property. If <transform-name> conflicts with an existing CSS keyword, the conflict is resolved in favor of the name introduced using @text-transform. +
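- 
-As a minimal sketch of this usage, written in the syntax proposed by this draft (the h1 selector is an illustrative assumption), a custom transform is declared once and then referenced by name:
- 
-<code css>
-/* Declare the custom transform (proposed syntax, per this draft). */
-@text-transform latin-only-uppercase
-{
-    transformation: "a-z" to "A-Z";
-}
- 
-/* Use its name as the value of the text-transform property. */
-h1 { text-transform: latin-only-uppercase; }
-</code>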
- +
- +
-Each @text-transform rule specifies a value for every text-transform descriptor, either implicitly or explicitly. Those not given an explicit value in the rule take the initial value listed with each descriptor in this specification. These descriptors apply solely within the context of the @text-transform rule in which they are defined, and do not apply to document language elements. There is no notion of which elements the descriptors apply to or whether the values are inherited by child elements. When a given descriptor occurs multiple times in a given @text-transform rule, only the last specified value is used; all prior values for that descriptor must be ignored. +
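- 
-To illustrate the two rules above, here is a small sketch (the transform name sketch-defaults is hypothetical): an omitted descriptor takes its initial value, and a repeated descriptor keeps only its last value.
- 
-<code css>
-@text-transform sketch-defaults
-{
-    character-type: legacy;
-    character-type: single;      /* only this last value is used */
-    transformation: "e" to "a";
-    /* scope is not specified, so it takes its initial value: all */
-}
-</code>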
- +
-===== The transformation descriptor ===== +
- +
-<code bnf> +
-Name: transformation +
-Value: <conversion># +
-Default: N/A +
- +
-<conversion> = [<char-list> to <char-list>] | <'text-transform'> +
-<char-list> = <enumeration> | <range> +
-<range> = <urange> | <string> +
-<enumeration> = <string> +
-</code> +
- +
-This descriptor defines which character will be replaced by which, by listing a series of conversions, to be applied in the same order as they appear in the descriptor. +
- +
-Conversions may refer to existing text transforms, either predefined by CSS or defined by the author. While a transformation using only a single such conversion is not very useful, combining it with other conversions allows authors to extend or define variants of existing transforms. Referring to the text-transform currently being defined is not allowed, and makes the whole descriptor invalid. +
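- 
-A sketch of both cases, under the syntax proposed here (the transform names shouting and self-referring are hypothetical):
- 
-<code css>
-/* Extends the predefined uppercase transform with one extra mapping.
-   Conversions are applied in the order listed, so "." is replaced
-   before uppercase runs. */
-@text-transform shouting
-{
-    transformation: "." to "!", uppercase;
-}
- 
-/* Invalid: the descriptor refers to the transform currently being
-   defined, which makes the whole transformation descriptor invalid. */
-@text-transform self-referring
-{
-    transformation: self-referring, "a" to "b";
-}
-</code>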
- +
-Conversions may also define a new mapping from one <char-list> to another. +
- +
-When defined using a <urange>((http://dev.w3.org/csswg/css3-fonts/#descdef-unicode-range)), the <char-list> is composed of each individual Unicode code point designated by the <urange>. +
- +
-A <range> may also be defined as a string made of a single Unicode character, followed by a hyphen (U+002D), followed by another single Unicode character. The semantics are identical to the <urange> U+XXXXXX-YYYYYY where XXXXXX is the code point of the first character and YYYYYY the code point of the second character. +
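- 
-As a sketch of this equivalence (the transform names are hypothetical), the two rules below should define the same mapping, since "a", "z", "A" and "Z" are the code points U+0061, U+007A, U+0041 and U+005A:
- 
-<code css>
-/* Ranges written as strings. */
-@text-transform upper-by-string
-{
-    transformation: "a-z" to "A-Z";
-}
- 
-/* The same ranges written as <urange> values. */
-@text-transform upper-by-urange
-{
-    transformation: U+0061-007A to U+0041-005A;
-}
-</code>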
- +
-If defined by an <enumeration>, it is composed of each character in the string, where what a character is depends on the character-type descriptor. The same character may not appear twice in the <char-list> defining the source of the mapping, otherwise the whole descriptor is invalid. +
- +
-In addition to [[http://www.w3.org/TR/css3-values/#strings|the usual CSS rules of character escaping]], a hyphen (U+002D) needs to be escaped to appear in a string in a conversion. +
- +
-In a <conversion>, if the source <char-list> is longer than the target <char-list>, then the last item of the target list is used for all remaining items in the source list. +
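- 
-The list-length and duplicate rules above can be sketched as follows (all three transform names are hypothetical, and the comments describe the behavior this draft appears to intend):
- 
-<code css>
-/* The source is longer than the target, so the last target item "y"
-   is reused: a→x, b→y, c→y, d→y, e→y. */
-@text-transform pad-with-last
-{
-    transformation: "abcde" to "xy";
-}
- 
-/* A one-item target: every digit maps to "#". */
-@text-transform mask-digits
-{
-    transformation: "0-9" to "#";
-}
- 
-/* Invalid: "a" appears twice in the source <char-list>,
-   so the whole descriptor is invalid. */
-@text-transform duplicated-source
-{
-    transformation: "aba" to "xyz";
-}
-</code>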
- +
-<note warning>ISSUE 1: Should we allow spaces and other collapsible characters in the target? Since text-transform is applied after white space collapsing, what are the implications of generating runs of collapsible white space that won't be collapsed? It has been proposed that we should allow them, and trigger a second white space collapsing if they are actually used.</note> +
- +
-<note warning>ISSUE 2: Should we allow an empty <char-list> as the target? It has been suggested that this be used to delete text. I am not sure I like the idea that text-transform could be able to make some non-empty element empty.</note> +
- +
-<note warning>ISSUE 3: It has been suggested that it should be possible to write text-transforms that behave differently on different languages. This can probably be achieved by adding some optional part at the beginning of each <conversion>, although I am not sure what +
-the syntax should be.</note> +
- +
-Examples: +
- +
-<code css> +
-@text-transform latin-only-uppercase +
-{ +
-    transformation: "a-z" to "A-Z"; +
-} +
-</code> +
- +
-The following two transforms are identical. +
- +
-<code css> +
-@text-transform abcdef1 +
-{ +
-    transformation: "abc" to "def"; +
-} +
-@text-transform abcdef2 +
-{ +
-    transformation: "a" to "d", +
-                    "b" to "e", +
-                    "c" to "f"; +
-} +
-</code> +
- +
- +
-===== The character-type descriptor ===== +
- +
-<code bnf>Name: character-type +
-Value: extended | legacy | single | spaced +
-Default: extended +
-</code> +
- +
-This definition affects what is meant by character processing in two different contexts: +
- +
-  * strings used as an <enumeration> in the transformation descriptor +
-  * the text to which the text-transform will be applied. +
- +
-In an <enumeration>, the possible values have the following meanings: +
- +
-  * extended: characters are extended grapheme clusters, as defined in [[http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries|UAX29]] +
-  * legacy: characters are legacy grapheme clusters, as defined in [[http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries|UAX29]] +
-  * single: characters are single Unicode code points. +
-  * spaced: characters are space separated sequences of Unicode code points. +
- +
-How the text to which the text-transform property is applied must be processed also depends on the value of this descriptor. +
- +
-When the value is 'extended', 'legacy', or 'single', each character (defined respectively as an extended grapheme cluster, a legacy grapheme cluster, or a single Unicode code point) in the text is processed individually. +
- +
-<note> +
-Example: +
-<code> +
-@text-transform foo { +
-  character-type: extended; +
-  transformation: "e" to "a"; +
-} +
-</code> +
-If the text to which the above text transform is applied contains the U+65 U+301 sequence ('é'), it will not be transformed, because the transform applies to whole grapheme clusters. +
- +
-On the other hand, the following text transform would transform that same sequence into U+61 U+301 ('á'), as it would consider each Unicode code point individually. +
- +
-<code> +
-@text-transform foo { +
-  character-type: single; +
-  transformation: "e" to "a"; +
-} +
-</code>+
 </note> </note>
- 
-<note warning> ISSUE 4: 
-Define the processing model on the text for the 'spaced' value, to decide what happens to a piece of text like "aaaaa" when a transform like the following is applied to it: 
- 
-<code>@text-transform foo { 
-  character-type: spaced; 
-  transformation: "aa ca a" to "c a ca"; 
-}</code> 
-</note> 
- 
- 
-<note warning>ISSUE 5: what happens when either extended or legacy is specified, but the enumeration contains invalid clusters? Skip the invalid character, make the conversion invalid, or make the whole descriptor invalid?</note> 
- 
- 
-<note warning>ISSUE 6: define what happens when a text-transform refers via its transformation descriptor to another text-transform which has a different character-type. My guess: the original character-type applies to processing the enumerations, while the character-type in the including text-transform applies to the text that will be transformed. Or maybe this means this descriptor should be split in two.</note> 
- 
- 
-===== The scope descriptor ===== 
-<code bnf>Name: scope 
-Value: all | [initial || medial || final] 
-Default: all 
-</code> 
- 
-This descriptor makes it possible to restrict which characters in the source text are affected by the transform. 
- 
-  * 'all' lets the transform apply to any character 
-  * 'initial' lets the transform apply at the beginning of a word 
-  * 'final' lets the transform apply at the end of a word 
-  * 'medial' lets the transform apply to characters within a word other than at the beginning and the end. 
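- 
-As a sketch of a scope other than 'all' (the transform name final-sigma is hypothetical): in Greek, the lowercase sigma is written ς at the end of a word, which a rule restricted to word-final characters could approximate.
- 
-<code css>
-@text-transform final-sigma
-{
-    transformation: "σ" to "ς";
-    scope: final;   /* only characters at the end of a word are affected */
-}
-</code>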
- 
-<note warning>ISSUE 7: More fancy values could be added here in the future to support things like title case, or to match only the base character, or only the diacritics.</note> 
- 
-The definition of "word" is UA-dependent; [[http://www.unicode.org/reports/tr29/tr29-17.html|UAX29]] is suggested (but not required) for determining such word boundaries. 
- 
-The transformation descriptor may be used to refer to existing text-transforms in the definition of a new one. If the text-transforms 
-referred to have a different scope than the scope specified in the text-transform that refers to them, they apply at the intersection of the 
-two scopes. 
- 
-Example: 
- 
-<code css> 
-@text-transform latin-only-uppercase 
-{ 
-    transformation: "a-z" to "A-Z"; 
-} 
-@text-transform latin-only-capitalize 
-{ 
-    transformation: latin-only-uppercase; 
-    scope: initial; 
-} 
-</code> 
- 
-===== DOM interaction ===== 
- 
-Custom text transform values defined within @text-transform rules are accessible via the following modifications to the CSS Object Model. 
- 
-==== Interface CSSRule  ==== 
- 
-The following additional rule type is added to the CSSRule interface. 
- 
-=== IDL Definition === 
-<code> 
-interface CSSRule { 
-... 
-const unsigned short TEXT_TRANSFORM_RULE = 1000; 
-... 
-}; 
-</code> 
- 
-==== Interface CSSTextTransformRule ==== 
- 
-The CSSTextTransformRule interface represents a single custom text transform, as defined by an @text-transform rule.
- 
-=== IDL Definition === 
-    interface CSSTextTransformRule : CSSRule { 
-        attribute          DOMString   name; 
-        readonly attribute CSSStyleDeclaration style; 
-    }; 
-     
-=== Attributes === 
- 
-== name of type DOMString == 
-This attribute is the name of the transform, used by the text-transform property. 
- 
-== style of type CSSStyleDeclaration == 
-This attribute represents all the descriptors associated with this text-transform. 
-====== Use cases ====== 
- 
-===== Single Languages use cases ===== 
- 
-The following use cases only apply to a single language. Defining all the possibly useful text-transforms for all languages would go beyond the capacity and expertise of the CSS WG. Having the generic mechanism allows authors to solve their specific problem. 
- 
-==== Full-size kana ==== 
-In Japanese, small kana appearing within ruby are sometimes replaced by the equivalent full-size kana. The following transform defines this conversion:
- 
-<code css> 
-@text-transform full-size-kana 
-{ 
-    transformation: "ぁぃぅぇぉゕゖっゃゅょゎ" to "あいうえおかけつやゆよわ", 
-                    "ァィゥェォヵㇰヶㇱㇲッㇳㇴㇵㇶㇷㇸㇹㇺャュョㇻㇼㇽㇾㇿヮ" to "アイウエオカクケシスツトヌハヒフヘホヤユヨラリルレロワ", 
-                    "ァィゥェォャュョ" to "アイウエオツヤユヨ"; 
-} 
-</code> 
- 
-==== German ß ==== 
- 
-As discussed [[http://lists.w3.org/Archives/Public/www-style/2011Nov/0193.html|in this thread]], ß (aka &szlig; or U+00DF) is traditionally considered a lower case letter without an uppercase equivalent. text-transform:uppercase leaves it unchanged. Unicode has introduced ẞ (U+1E9E), an uppercase version of it since 5.1, but without making it a target of toupper(). 
- 
-This letter being rather new, authors are bound to disagree whether it is a proper uppercase variant of U+00DF or not. Those who think it is not may use text-transform:uppercase and text-transform:lowercase. Those who think it is could use the following.
- 
-<code css> 
-@text-transform german-uppercase 
-{ 
-    transformation: U+00DF to U+1E9E, uppercase; 
-} 
- 
-@text-transform german-lowercase 
-{ 
-    transformation: U+1E9E to U+00DF, lowercase; 
-} 
-</code> 
-<note warning> 
-ISSUE 8: It has been suggested that overloading existing values with a language descriptor or selector would be better: <code css>@text-transform uppercase 
-{ 
-    transformation: U+00DF to U+1E9E; 
-    language: de; 
-} 
-</code><code css>@text-transform uppercase:lang(de) 
-{ 
-    transformation: U+00DF to U+1E9E; 
-}</code> 
-</note> 
- 
-==== Turkish i/ı ==== 
- 
-http://en.wikipedia.org/wiki/Dotted_and_dotless_I 
- 
-In Turkish and a few related languages, dotted and dotless i are distinct letters, both in upper and lower case.
- 
-The uppercasing and lowercasing algorithms defined for the text-transform property only preserve this distinction when the content language of the element is known.
- 
-Someone, for example in a user style sheet, may want to apply an uppercase or lowercase transform to a document where language is insufficiently marked up, but known to the author of the style sheet to be Turkish. In this case, the generic uppercase and lowercase transforms would fail, but the following would work.  
- 
-<code css> 
-@text-transform turkic-uppercase 
-{ 
-    transformation: "i" to "İ", uppercase; 
-} 
- 
-@text-transform turkic-lowercase 
-{ 
-    transformation: "I" to "ı", lowercase; 
-} 
-</code> 
- 
-==== Georgian upper/lower case ==== 
- 
-http://en.wikipedia.org/wiki/Letter_case#Other_forms_of_case 
-http://en.wikipedia.org/wiki/Georgian_alphabet 
- 
-The Georgian language has used three different unicameral alphabets through history: Asomtavruli, Nuskhuri, and Mkhedruli. Recently, some authors have been using Asomtavruli letters in an otherwise Mkhedruli text, in a way that resembles a bicameral alphabet. One may assume that they would find the following transform useful. 
- 
-<code css> 
-@text-transform Mkhedruli-to-Asomtavruli 
-{ 
-    transformation: "ა-ჵ" to "Ⴀ-Ⴥ"; 
-} 
- 
-@text-transform Asomtavruli-to-Mkhedruli 
-{ 
-    transformation: "Ⴀ-Ⴥ" to "ა-ჵ"; 
-} 
-</code> 
- 
-===== Cross-language use cases ===== 
- 
-The following cases are examples of cases useful in several languages, but rare enough that they are better addressed by authors when needed than by the CSS WG. 
- 
-==== Long s ==== 
- 
-http://en.wikipedia.org/wiki/Long_s 
-http://www.fileformat.info/info/unicode/char/17f/index.htm 
- 
-In old (18th century and earlier) European texts, the letter s, when at the middle or beginning of a word, was written ſ (U+017F). An s occurring at the end of a word would be written as the modern s is.
- 
-Modern readers are often unfamiliar with this letter form, and for readability reasons, one may want to convert from one to the other. The following transform would accomplish this.
- 
-<code css> 
-@text-transform modernize-s 
-{ 
-    transformation: "ſ" to "s"; 
-} 
-</code> 
- 
-This does the opposite transform: 
- 
-<code css> 
-@text-transform long-s 
-{ 
-    transformation: "s" to "ſ" ; 
-    scope: initial medial; 
-} 
-</code> 
- 
-===== Miscellaneous ===== 
- 
-Here are some more examples of how the generic mechanism may be used.
- 
-==== Transliteration ==== 
- 
-Most writing systems of the world have at least one common transliteration scheme into the roman script. 
- 
-<code css romanization.css> 
-@text-transform romanization  
-{ 
-    character-type: spaced;
- /* ISO 9 (Cyrillic) */ 
-    transformation: "А а Ӑ ӑ Ӓ ӓ Ә ә Б б В в Г г Ґ ґ Ҕ ҕ Ғ ғ Д д Ђ ђ Ѓ ѓ Е е Ё ё Ӗ ӗ Є є Ҽ ҽ Ҿ ҿ 
-                     Ж ж Ӂ ӂ Ӝ ӝ Җ җ З з Ӟ ӟ Ѕ ѕ Ӡ ӡ И и Ӥ ӥ І і Ї ї Й й Ј ј К к Қ қ Ҟ ҟ Л л Љ љ 
-                     М м Н н Њ њ Ҥ ҥ Ң ң О о Ӧ ӧ Ө ө П п Ҧ ҧ Р р С с Ҫ ҫ Т т Ҭ ҭ Ћ ћ Ќ ќ 
-                     У у У́ у́ Ў ў Ӱ ӱ Ӳ ӳ Ү ү Ф ф Х х Ҳ ҳ Һ һ Ц ц Ҵ ҵ Ч ч Ӵ ӵ Ҷ ҷ Џ џ Ш ш Щ щ 
-                     Ъ ъ ’ Ы ы Ӹ ӹ Ь ь Э э Ю ю Я я Ѣ ѣ Ѫ ѫ Ѳ ѳ Ѵ ѵ Ҩ ҩ" 
-                to  "A a Ă ă Ä ä A̋ a̋ B b V v G g G̀ g̀ Ğ ğ Ġ ġ D d Đ đ Ǵ ǵ E e Ë ë Ĕ ĕ Ê ê C̆ c̆ Ç̆ ç̆ 
-                     Ž ž Z̆ z̆ Z̄ z̄ Ž̦ ž̧ Z z Z̈ z̈ Ẑ ẑ Ź ź I i Î î Ì ì Ï ï J j J̌ ǰ K k Ķ ķ K̄ k̄ L l L̂ l̂ 
-                     M m N n N̂ n̂ Ṅ ṅ Ṇ ṇ O o Ö ö Ô ô P p Ṕ ṕ R r S s Ç ç T t Ţ ţ Ć ć Ḱ ḱ 
-                     U u Ú ú Ŭ ŭ Ü ü Ű ű Ù ù F f H h Ḩ ḩ Ḥ ḥ C c C̄ c̄ Č č C̈ c̈ Ç ç D̂ d̂ Š š Ŝ ŝ 
-                     ʺ ʺ ‵ Y y Ÿ ÿ ʹ ʹ È è Û û Â â Ě ě Ǎ ǎ F̀ f̀ Ỳ ỳ Ò ò", 
- /* ISO 843 (Greek) */ 
-                    "Α α Ά ά Β β Γ γ Δ δ Ε ε Έ έ Ζ ζ Η η Ή ή Θ  θ  Ι ι Ί ί Ϊ ϊ ΐ Κ κ Λ λ Μ μ 
-                     Ν ν Ξ ξ Ο ο Ό ό Π π Ρ ρ Σ σ ς Τ τ Υ υ Ύ ύ Ϋ ϋ Φ φ Χ  χ  Ψ  ψ  Ω ω Ώ ώ" 
-                to  "A a Á á V v G g D d E e É é Z z Ī ī Ī́ ī́ Th th I i Í í Ï ï ḯ K k L l M m 
-                     N n X x O o Ó ó P p R r S s s T t Y y Ý ý Ÿ ÿ F f Ch ch Ps ps Ō ō Ṓ ṓ"; 
-} 
-</code> 
- 
-==== Comic book vikings ==== 
-In the "Asterix and the Great Crossing" comic book, the Viking characters are supposed to speak a foreign language unintelligible to the main characters, but still understandable to the readers. This is represented by writing down their speech normally, except that some letters are replaced by similar-looking letters found in Scandinavian languages.
- 
-This effect could be obtained by the following transform: 
- 
-<code css> 
-@text-transform fake-norse 
-{ 
-    transformation: "aoAO" to "åøÅØ"; 
-} 
-</code> 
- 
-==== Leet speak ==== 
-In Internet, hacker and gamer culture, it is quite common to replace characters with other characters or character sequences that have a somewhat similar glyphic appearance. Although no single agreed convention exists and the mappings are sometimes neither injective nor surjective, one could simulate this playful style with a transform like the following:
- 
-<code css> 
-@text-transform leet-speak 
-{ 
-    transformation: "A-Z" to "48©)3F6H1!K£MN0¶9®57UVW*¥2"; 
-} 
-</code> 
 
ideas/at-text-transform.1334668085.txt.gz · Last modified: 2014/12/09 15:48 (external edit)