Differences

This shows you the differences between two versions of the page.

spec:utr50 [2012/06/09 02:08] – [Analysis by Codepoint] kojiishi
spec:utr50 [2012/07/30 20:13] – [Comparisons] fantasai
Line 1: Line 1:
 ====== UTR #50 Review Memo ======
 This page is a memo page to make our discussion on [[http://www.unicode.org/reports/tr50/|UTR #50]] smooth.
 +
 +===== Open Issues =====
 +
 +[[:spec:utr50:agenda|Tracking open issues, or resolved issues not yet published in an update]]
  
 ===== Analysis by Codepoint =====
  
-Codes used for analysis by codepoint:
+Two modes are presented: Stacked (''text-orientation: upright'') and Mixed (''text-orientation: mixed''). Codes used for analysis by codepoint:
 
-^Code^UTR50^MSFT^Meaning^
-^U|U|S|Upright; translates between horizontal and vertical|
-^R|S|R|Sideways; rotates between horizontal and vertical|
-^T<sub>U</sub>|T|ST|Typeset upright with alternate glyph. Best fallback is just upright.|
-^T<sub>R</sub>|SB|RT|Typeset upright with alternate glyph. Best fallback is just sideways.|
-^V|?|?|Upright wrt Unicode code charts, but translates between horizontal and vertical|
+^Code^Meaning^
+^U|Upright; translates between horizontal and vertical|
+^R|Sideways; rotates between horizontal and vertical|
+^T<sub>U</sub>|Typeset upright with alternate glyph. Best fallback is just upright.|
+^T<sub>R</sub>|Typeset upright with alternate glyph. Best fallback is just sideways.|
+^V|Upright wrt Unicode code charts, but translates between horizontal and vertical (VO=U/HO=L)|
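The fallback behaviour described in the code table can be sketched roughly as follows. This is only an illustration: the names ''OrientationCode'', ''resolve_orientation'' and ''has_vertical_alternate'' are hypothetical and do not come from UTR #50 or from this memo, and treating V the same as U is an assumption of the sketch.

```python
from enum import Enum


class OrientationCode(Enum):
    """Codes from the memo's table; member names are illustrative only."""
    U = "U"      # upright
    R = "R"      # sideways (rotated)
    T_U = "Tu"   # alternate glyph preferred, fall back to upright
    T_R = "Tr"   # alternate glyph preferred, fall back to sideways
    V = "V"      # upright with respect to the Unicode code charts


def resolve_orientation(code: OrientationCode, has_vertical_alternate: bool) -> str:
    """Return how a glyph would be set in vertical text (hypothetical helper)."""
    if code in (OrientationCode.T_U, OrientationCode.T_R):
        # T codes prefer the font's alternate glyph when one exists.
        if has_vertical_alternate:
            return "upright-alternate"
        # Otherwise use the "best fallback" named in the table.
        return "upright" if code is OrientationCode.T_U else "sideways"
    if code is OrientationCode.R:
        return "sideways"
    # Assumption: U and V are both simply set upright in this sketch.
    return "upright"
```

The point of splitting T into two codes is visible in the fallback branch: a renderer without a vertical alternate still has a defined result for every code.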
  
-Two modes are presented: Stacked (''text-orientation: upright'') and Mixed (''text-orientation: mixed'')
+Codepoint classifications and notes by general category:
 
-^General Category^Stack^Mixed^Memo^
-|[[http://www.fileformat.info/info/unicode/category/Cc/list.htm|Other, Control (Cc)]]|R|R|:?: undefined in stacked mode|
-|[[http://www.fileformat.info/info/unicode/category/Cf/list.htm|Other, Format (Cf)]]|R|R|:?: undefined in stacked mode|
-|[[http://www.fileformat.info/info/unicode/category/Co/list.htm|Other, Private Use (Co)]]|U|U| Bias for East Asian use, since other usage is unknown |
-|[[http://www.fileformat.info/info/unicode/category/Cs/list.htm|Other, Surrogate (Cs)]]|R|R|:?: no need to define?|
-|M*|Follows grapheme cluster|||
-|L* and N*| See [[spec:utr50:letters]]|||
-|P* and Z*| See [[spec:utr50:punctuation]]|||
-|S*| See [[spec:utr50:symbols]]|||
+  * [[spec:utr50:letters|Letters (L*) and Numbers (N*)]]
+  * [[spec:utr50:punctuation|Punctuation (P*) and Spaces (Z*)]]
+  * [[spec:utr50:symbols|Symbol, Modifier (Sk)]]
+  * [[spec:utr50:symbols:currency|Symbol, Currency (Sc)]]
+  * [[spec:utr50:symbols:math|Symbol, Math (Sm)]]
+  * [[spec:utr50:symbols:currency|Symbol, Currency (Sc)]]
+  * Symbol, Other (So)
+    * [[spec:utr50:symbols:textual]]
+    * [[spec:utr50:symbols:pictographs]]
+    * [[spec:utr50:symbols:cjk]]
+    * [[spec:utr50:symbols:enclosed]]
+    * [[spec:utr50:symbols:ancient]]
+    * [[spec:utr50:symbols:game]]
+    * [[spec:utr50:symbols:technical]]
+    * [[spec:utr50:symbols:drawing]]
+  * [[spec:utr50:symbols:arrows]] (So and Sm)
+  * [[spec:utr50:control]]
  
-Potential categories to support special behavior:
+Potential tailoring categories:
   * [[spec:utr50:symbols:arrows|Arrows]]
   * Math relational operators (equals, greater-than, etc)
Line 33: Line 45:
  
   * [[spec:utr50:diff20120609|Differences against the current draft]]
-===== General =====
-  * [[http://www.unicode.org/review/pri207/|PRI #207]] review period ends on Oct 24th, 2011 --- way too short
-  * Eric mentioned that [[http://www.unicode.org/forum/viewtopic.php?f=35&t=201|UTR #50 is for Japanese text and should define Hangul orientation that appears in Japanese text, rather than Hangul native orientation]]. Our goal of "upright-right" is a good vertical text flow for East Asian. Are we seeing things differently?
-  * UTR #50 only tries "some level of compatibility with existing fonts". Again, this is very different from our goals, isn't it?
-  * UTR #50 defines not only glyph orientation in vertical text flow but also character spacing classes in horizontal text flow, similar to what we have in the [[http://dev.w3.org/csswg/css3-text/#text-spacing-prop|text-spacing]] property. Shouldn't this be a separate discussion? Review period is too short for such a big property.
-  * UTR #50's suggested grapheme clusterization a) is imprecise and b) doesn't handle exceptions in [[http://www.w3.org/TR/css3-writing-modes/#character-properties|Me and Zs categories]]
-  * Should add categories for tailorable vs. not tailorable, e.g. Phags-pa and Ideographic are not tailorable to rotate.
-  * OpenType feature for sideways vertical glyphs would be critical to allow calligraphic and condensed fonts to work with this scheme.
-===== The East Asian Orientation Property =====
-  * What are the definitions of U, S, SB, and T? ([[http://www.unicode.org/forum/viewtopic.php?f=35&t=198|Tk is gone]])
-    * Which one allows font designers to put alternate glyphs; i.e., UA applies vert feature?
-    * Maybe most of the following issues are related to the fundamental question: "what are the goals of UTR #50". If it's for font designers to decide visual glyph orientations to put in vert table, some of these problems are gone, and CSS WG still needs to develop our own algorithm to decide orientation for UAs to render, which could be different from visual glyph orientation.
+  * [[http://blog.antenna.co.jp/CSSPage/tr50-taro.20120712.html|Comparison of UTR50 and Yamamoto-san's proposal]]
+===== Notes on Interaction with Font Design =====
   * From what I understand, T allows anything; from changing glyph to changing orientations, so although "representative glyphs" are shown, their orientations are undefined in UTR #50. Some rotate, some do not, and it's up to font designer. Is this correct understanding?
   * If UTR #50 means fonts should not change glyphs/positions for U/S/SB, there are compatibility and font designing problems here.
Line 58: Line 61:
     * Most font designers I contacted believe that it's ok as long as the font is a square font, but I'm worried as it has never been tested at all.
  
-==== Yi, Mongolian, Hangul, Bopomofo, Egyp ====
-  * [[http://www.unicode.org/forum/viewtopic.php?f=35&t=202|Vertical Directionality property from johnwcowan]]
-  * [[http://lists.w3.org/Archives/Public/public-i18n-cjk/2011OctDec/0000.html|Hangul characters upright or sideways in vertical flow?]]
-  * [[http://lists.w3.org/Archives/Public/www-style/2011Oct/0128.html|Yi and Hangul]]
-  * [[http://lists.w3.org/Archives/Public/www-style/2011Oct/0374.html|Egyp]] also [[http://www.omniglot.com/writing/egyptian_hieratic.htm|Hieratic]] does not rotate
-
-==== Math ====
-
-  * Fonts seem inconsistent about whether fullwidth characters are upright or sideways. ASCII is sideways.
-  * Some of them are unified; U+00B1 PLUS-MINUS SIGN, U+00D7 MULTIPLICATION SIGN, U+00F7 DIVISION SIGN, many Sm in U+22xx etc. have full-width glyphs in Japanese fonts and are traditionally upright. The set is neither comprehensive nor logically distinct, though, just like other EAW=A characters.
-  * Maybe we could assume MathML is sideways while symbols in text are upright?
-
-Interesting scans:
-
-  * Although Han characters within math are sometimes sideways: http://d.hatena.ne.jp/choiyaki/20110908/1315431640 that may be a limitation of the math typesetter: http://fantasai.inkedblade.net/style/scans/ChinatownSFPL013.png http://fantasai.inkedblade.net/style/scans/ChinatownSFPL015.png
-  * "y" in math is sideways, while "y" in text is upright: http://twitpic.com/2hzi0s
-  * Equals sign is sideways, even when math is set upright: http://fantasai.inkedblade.net/style/scans/ChinatownSFPL023.png http://fantasai.inkedblade.net/style/scans/ChinatownSFPL027.png http://fantasai.inkedblade.net/style/scans/ChinatownSFPL028.png
-  * Koji's book with prime/double prime ?{{:spec:vert_math.png?linkonly|}}
-
-
-==== Tailoring ====
-
-CSS would need to define some tailorings; should the Unicode spec include them too? E.g.
+===== Potential Tailorings =====
  
   * upright-cyrillic
Line 91: Line 72:
   * sideways-unified-punctuation-type-stuff?
  
-===== The East Asian Class Property ======
-Not reviewed yet.
+===== Historical ======
  
 +  * [[http://lists.w3.org/Archives/Public/www-international/2011OctDec/0034.html|Comments from CSS3 Writing Modes editors to Unicode circa October 2011]]
 +  * [[http://www.unicode.org/forum/viewtopic.php?f=35&t=202|Vertical Directionality property from johnwcowan]]
 +  * [[http://lists.w3.org/Archives/Public/public-i18n-cjk/2011OctDec/0000.html|Hangul characters upright or sideways in vertical flow?]]
 +  * [[http://lists.w3.org/Archives/Public/www-style/2011Oct/0128.html|Yi and Hangul]]
 +  * [[http://lists.w3.org/Archives/Public/www-style/2011Oct/0374.html|Egyp]] also [[http://www.omniglot.com/writing/egyptian_hieratic.htm|Hieratic]] does not rotate
  
-===== Comments to Unicode ===== 
- 
-From the CSS3 Writing Modes editors. 
- 
-==== Deadlines ==== 
- 
-We believe the deadline for comment is too short for such a complex spec. In particular, the new classes will take time to review codepoint-by-codepoint. We hope therefore that Unicode plans to update the spec through multiple review cycles until it stabilizes before publishing UTR50 as a completed spec. 
- 
-==== Scope ==== 
- 
-UTR #50 scopes itself to Japanese layout. However, CSS needs to address all vertical writing systems (i.e. systems in which entire books are written in vertical text, not just used as a graphical effect). If the scope is not broadened to include other writing systems, we cannot rely on UTR#50. 
- 
-==== OpenType Features ==== 
- 
-To force consistency in orientation, UTR#50 expects ''vert'' to apply only to ''T'' (and maybe ''SB'') category glyphs. However, this is incompatible with many fonts and cannot be implemented by a system that expects to correctly handle legacy content (in other words, any content authored with currently-existing fonts). 
- 
-We would need to apply ''vert'' to the ''U'' category as well in order to handle: 
-  * proportional and non-square (compressed) fonts, e.g. [[http://www.axisfont.com/|AXIS fonts]] 
-  * cursive fonts 
- 
-We would need to apply ''vert'' to the ''SB'' category to handle 
-  * Glyph differences between vertical and horizontal writing in calligraphic / handwriting fonts, e.g. {{kodomonoji_20111005-en.png?linkonly|}} {{suzuedo.png?linkonly|}} 
- 
-A new font feature would be needed to apply to the ''S'' category to handle 
-  * slanted fonts, e.g. {{susha.png?linkonly|}}
-  * potential alignment issues for punctuation 
- 
-==== Tailoring ==== 
- 
-UTR #50 makes no mention of tailoring the orientations. We think the orientation classes should be tailorable; probably Unicode agrees, but this should be more clearly explained. 
- 
-So that we don't have to manage codepoint-by-codepoint character classes, we'd eventually like UTR#50 to include classes that are commonly tailored / not tailored, that we can reference. Some examples: 
- 
-  * class for characters that are generally not tailored, i.e. vertical-native scripts such as Han, Hangul, Phags-Pa etc. 
-  * class for characters that belong to Western writing systems (typically set sideways) but are often set upright as symbols, i.e. Latin, Greek, and Cyrillic 
-  * brackets, which are pretty much never tailored to upright 
-  * maybe others? 
-    * ''So'' --- the registered sign was mentioned as an issue in UTR#50, and here are samples of the copyright symbol {{:spec:copyright_vert.jpg?linkonly|}} {{:spec:copyright_horz.png?linkonly|}}
- 
-==== Grapheme Clusters ==== 
- 
-UTR #50 does not provide any rules or pointers to rules about grapheme clusterization. We suggest referencing UAX29 and giving examples of where the boundaries there might be adjusted (e.g. in Indian scripts).
- 
-The properties of a grapheme cluster should be defined. We suggest that the properties come from the first base character, except in the following cases: 
- 
-  * Grapheme clusters formed with a combining mark of class Me should be treated as So in the Common script. 
-  * Grapheme clusters formed with a base of Zs should belong to category Sk and take their EAW from the space. 
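The suggested rule for cluster properties could be sketched like this. It is a rough illustration, not text from UTR #50 or the CSS spec: ''cluster_category'' is a hypothetical name, the first code point is assumed to be the base character, a lone space is assumed to stay Zs, and the script and EAW parts of the rule are not modelled.

```python
import unicodedata


def cluster_category(cluster: str) -> str:
    """General category assigned to a whole grapheme cluster (sketch)."""
    if not cluster:
        raise ValueError("empty cluster")
    categories = [unicodedata.category(ch) for ch in cluster]
    # Exception 1: an enclosing mark (Me) makes the cluster behave like So.
    if "Me" in categories[1:]:
        return "So"
    # Exception 2: a space base (Zs) carrying marks is treated as Sk.
    # Assumption: a space with no marks keeps its own category.
    if categories[0] == "Zs" and len(cluster) > 1:
        return "Sk"
    # Default: properties come from the first (base) character.
    return categories[0]
```

For instance, ''cluster_category("a\u20dd")'' (a letter with COMBINING ENCLOSING CIRCLE) yields ''So'' under this sketch, while an accented letter keeps the category of its base.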
- 
-See also http://www.w3.org/TR/css3-writing-modes/#character-properties 
- 
-==== Miscategorized Scripts ==== 
- 
-The following scripts should be upright: 
- 
-  * Hangul 
-  * All variants of Egyptian 
- 
-Yi needs more investigation from someone who knows the language. Older books are written vertically, and seem to be a rotation from the Unicode code charts. However I've seen vertical captions in horizontally-set books printed upright. 
- 
-==== Halfwidth Forms ==== 
- 
-I was informed that halfwidth forms are strongly discouraged in vertical text, and typically set sideways. [?] 
- 
-==== Arrows and Box-Drawing ==== 
- 
-Arrows and box-drawing characters should be set sideways by default, as unlike other symbols, they are usually typeset in spatial relation to other content rather than as a standalone graphic. (The same logic applies to the [[http://www.unicode.org/forum/viewtopic.php?f=35&t=206&sid=c7aee9b970811d5a1f4819bf10e2de6e|bracket pieces]].) 
- 
-Box drawing characters are any characters in the U+2500--U+259F range. 
- 
-Arrows are ''So'' characters in the U+2190--U+21FF, U+261A--U+261F, U+2794--U+27BE, U+2B00--U+2B11, and U+2B45--U+2B46 ranges; and ''Sm'' characters in the U+27F0--U+297F and U+2B30--U+2B4C ranges.
- 
-Placing arrows into the ''S'' category instead of ''U'' also relieves concerns about inconsistent arrow orientations due to ''vert'' interpretation. 
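The range definitions above can be expressed as a small lookup. This is a sketch under stated assumptions: the function names are hypothetical, the ranges are copied from the comment text, and the upper bound written as "297F" is read as U+297F.

```python
import unicodedata

# Ranges copied from the comment; inclusive on both ends.
BOX_DRAWING = [(0x2500, 0x259F)]
ARROWS_SO = [(0x2190, 0x21FF), (0x261A, 0x261F), (0x2794, 0x27BE),
             (0x2B00, 0x2B11), (0x2B45, 0x2B46)]
ARROWS_SM = [(0x27F0, 0x297F), (0x2B30, 0x2B4C)]


def _in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_box_drawing(cp: int) -> bool:
    """True for any code point in the box-drawing range, whatever its category."""
    return _in_ranges(cp, BOX_DRAWING)


def is_arrow(cp: int, category: str) -> bool:
    """True if the code point is an arrow under a literal reading of the comment."""
    if category == "So":
        return _in_ranges(cp, ARROWS_SO)
    if category == "Sm":
        return _in_ranges(cp, ARROWS_SM)
    return False


def sideways_by_default(ch: str) -> bool:
    """Hypothetical helper: would this character default to sideways?"""
    cp = ord(ch)
    return is_box_drawing(cp) or is_arrow(cp, unicodedata.category(ch))
```

Because the category is part of the test, a literal reading does not match ''Sm'' characters that sit inside the ''So'' ranges (U+2192 RIGHTWARDS ARROW is one such character), so an implementation would need to decide how to treat those.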
- 
-==== Superscripts, Subscripts, Bracket Pieces ==== 
- 
-We concur with the comments that suggest changing superscripts, subscripts, and bracket pieces to ''S'' by default. 
- 
-  * [[http://www.unicode.org/forum/viewtopic.php?f=35&t=204|superscripts and subscripts ]] 
-  * [[http://www.unicode.org/forum/viewtopic.php?f=35&t=206|bracket pieces]] 
- 
-==== Math ==== 
- 
-Because of the following reasons: 
-  * digits are typeset sideways by default 
-  * commonly used variable names (Latin, Greek) are typeset sideways by default 
-  * superscripts and subscripts are [[http://www.unicode.org/forum/viewtopic.php?f=35&t=204|typically typeset sideways]] 
-  * arrows, which function as relations in math, would also be typeset sideways by default (see above) 
-  * ASCII math symbols are expected to typeset sideways 
-  * mathematical formulae are usually typeset sideways even in vertical text 
-  * the most commonly-used symbols that are intermixed with prose (× and +) are symmetric wrt rotation, and the equals sign (''='') seems to be typeset sideways even when everything else is upright ([[http://fantasai.inkedblade.net/style/scans/ChinatownSFPL028.png|example]]) 
-we suggest math symbols should be typeset sideways by default. 
- 
-When intermixed in prose, variable names are often typeset upright, and in such styles math symbols might also be typeset upright. However in these situations some tailoring is necessary for the variable names whatever the mathematical default, so using this style to determine the default rules in plaintext does not make sense. 
- 
-The default orientation of fullwidth math symbols is less clear; perhaps they should be U/T (for equals). 
 
spec/utr50.txt · Last modified: 2014/12/09 15:48 by 127.0.0.1