This shows you the differences between two versions of the page.
spec:utr50 [2012/06/08 06:18] – [Analysis by Codepoint] kojiishi | spec:utr50 [2012/07/30 20:12] – fantasai | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== UTR #50 Review Memo ====== | ====== UTR #50 Review Memo ====== | ||
This page is a memo page to make our discussion on [[http:// | This page is a memo page to make our discussion on [[http:// | ||
+ | |||
+ | ===== Open Issues ===== | ||
+ | |||
+ | [[: | ||
===== Analysis by Codepoint ===== | ===== Analysis by Codepoint ===== | ||
- | Codes used for analysis by codepoint: | + | Two modes are presented: Stacked ('' |
- | ^Code^UTR50^MSFT^Meaning^ | + | ^Code^Meaning^ |
- | ^U|U|S|Upright; translates between horizontal and vertical| | + | ^U|Upright; translates between horizontal and vertical| |
- | ^R|S|R|Sideways; rotates between horizontal and vertical| | + | ^R|Sideways; |
- | ^T< | + | ^T< |
- | ^T< | + | ^T< |
- | ^V|?|?|Upright wrt Unicode code charts, but translates between horizontal and vertical| | + | ^V|Upright wrt Unicode code charts, but translates between horizontal and vertical |
- | Two modes are presented: Stacked ('' | + | Codepoint classifications |
- | ^General Category^Stack^Mixed^Memo^ | + | * [[spec:utr50: |
- | |[[http:// | + | * [[spec:utr50:punctuation|Punctuation (P*) and Spaces (Z*)]] |
- | |[[http:// | + | |
- | |[[http:// | + | * [[spec:utr50:symbols: |
- | |[[http:// | + | |
- | |M*|Follows grapheme cluster||| | + | |
- | |L* and N*| See [[spec: | + | * Symbol, Other (So) |
- | |P* and Z*| See [[spec: | + | * [[spec:utr50:symbols: |
- | |S* | See [[spec: | + | * [[spec: |
+ | * [[spec: | ||
+ | | ||
+ | * [[spec: | ||
+ | | ||
+ | * [[spec: | ||
+ | * [[spec: | ||
+ | * [[spec: | ||
+ | * [[spec: | ||
- | Potential categories | + | Potential |
* [[spec: | * [[spec: | ||
* Math relational operators (equals, greater-than, | * Math relational operators (equals, greater-than, | ||
* SB brackets | * SB brackets | ||
- | Comparisons: | + | ===== Comparisons |
- | * [[spec: | + | |
- | ===== General | + | * [[spec:utr50: |
- | * [[http:// | + | |
- | * Eric mentioned that [[http:// | + | ===== Notes on Interaction with Font Design |
- | * UTR #50 only tries "some level of compatibility with existing fonts" | + | |
- | * UTR #50 defines not only glyph orientation in vertical text flow but also character spacing classes in horizontal text flow, similar to what we have in the [[http:// | + | |
- | * UTR #50's suggested grapheme clusterization is a) imprecise b) doesn' | + | |
- | * Should add categories for tailorable vs. not tailorable, e.g. Phags-pa and Ideographic are not tailorable to rotate. | + | |
- | * OpenType feature for sideways vertical glyphs would be critical to allow calligraphic and condensed fonts to work with this scheme. | + | |
- | ===== The East Asian Orientation Property | + | |
- | * What are the definitions of U, S, SB, and T? ([[http:// | + | |
- | * Which one allows font designers to put alternate glyphs; i.e., UA applies vert feature? | + | |
- | * Maybe most of the following issues are related to the fundamental question: "what are the goals of UTR #50". If it's for font designers to decide which visual glyph orientations to put in the vert table, some of these problems go away, and the CSS WG still needs to develop its own algorithm to decide the orientation for UAs to render, which could be different from the visual glyph orientation. | + | |
* From what I understand, T allows anything; from changing glyph to changing orientations, | * From what I understand, T allows anything; from changing glyph to changing orientations, | ||
* If UTR #50 means fonts should not change glyphs/ | * If UTR #50 means fonts should not change glyphs/ | ||
Line 57: | Line 61: | ||
* Most font designers I contacted believe that it's ok as long as the font is a square font, but I'm worried as it has never been tested at all. | * Most font designers I contacted believe that it's ok as long as the font is a square font, but I'm worried as it has never been tested at all. | ||
- | ==== Yi, Mongolian, Hangul, Bopomofo, Egyp ==== | + | ===== Potential Tailorings |
- | * [[http:// | + | |
- | * [[http:// | + | |
- | * [[http:// | + | |
- | * [[http:// | + | |
- | + | ||
- | ==== Math ==== | + | |
- | + | ||
- | * Fonts seem inconsistent about whether fullwidth characters are upright or sideways. ASCII is sideways. | + | |
- | * Some of them are unified; U+00B1 PLUS-MINUS SIGN, U+00D7 MULTIPLICATION SIGN, U+00F7 DIVISION SIGN, many Sm in U+22xx etc. have full-width glyphs in Japanese fonts and are traditionally upright. The set is neither comprehensive nor logically distinguished, though, just like other EAW=A characters. | + | |
- | * Maybe we could assume that MathML content is sideways while symbols in text are upright? | + | |
- | + | ||
- | Interesting scans: | + | |
- | + | ||
- | * Although Han characters within math are sometimes sideways: http:// | + | |
- | * " | + | |
- | * Equals sign is sideways, even when math is set upright: http:// | + | |
- | * Koji's book with prime/ | + | |
- | + | ||
- | + | ||
- | ==== Tailoring ==== | + | |
- | + | ||
- | CSS would need to define some tailorings; should the Unicode spec include them too? E.g. | + | |
* upright-cyrillic | * upright-cyrillic | ||
Line 90: | Line 72: | ||
* sideways-unified-punctuation-type-stuff? | * sideways-unified-punctuation-type-stuff? | ||
- | ===== The East Asian Class Property | + | ===== Historical |
- | Not reviewed yet. | + | |
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
- | ===== Comments to Unicode ===== | ||
- | |||
- | From the CSS3 Writing Modes editors. | ||
- | |||
- | ==== Deadlines ==== | ||
- | |||
- | We believe the deadline for comments is too short for such a complex spec. In particular, the new classes will take time to review codepoint-by-codepoint. We therefore hope that Unicode plans to update the spec through multiple review cycles until it stabilizes before publishing UTR #50 as a completed spec. | ||
- | |||
- | ==== Scope ==== | ||
- | |||
- | UTR #50 scopes itself to Japanese layout. However, CSS needs to address all vertical writing systems (i.e. systems in which entire books are written in vertical text, not just used as a graphical effect). If the scope is not broadened to include other writing systems, we cannot rely on UTR #50. | ||
- | |||
- | ==== OpenType Features ==== | ||
- | |||
- | To force consistency in orientation, | ||
- | |||
- | We would need to apply '' | ||
- | * proportional and non-square (compressed) fonts, e.g. [[http:// | ||
- | * cursive fonts | ||
- | |||
- | We would need to apply '' | ||
- | * Glyph differences between vertical and horizontal writing in calligraphic / handwriting fonts, e.g. {{kodomonoji_20111005-en.png? | ||
- | |||
- | A new font feature would be needed to apply to the '' | ||
- | * slanted fonts, e.g. {{susha.png? | ||
- | * potential alignment issues for punctuation | ||
- | |||
- | ==== Tailoring ==== | ||
- | |||
- | UTR #50 makes no mention of tailoring the orientations. We think the orientation classes should be tailorable; probably Unicode agrees, but this should be more clearly explained. | ||
- | |||
- | So that we don't have to manage codepoint-by-codepoint character classes, we'd eventually like UTR #50 to include classes that we can reference for characters that are commonly tailored or not tailored. Some examples: | ||
- | |||
- | * class for characters that are generally not tailored, i.e. vertical-native scripts such as Han, Hangul, Phags-Pa etc. | ||
- | * class for characters that belong to Western writing systems (typically set sideways) but are often set upright as symbols, i.e. Latin, Greek, and Cyrillic | ||
- | * brackets, which are pretty much never tailored to upright | ||
- | * maybe others? | ||
- | * '' | ||
- | |||
- | ==== Grapheme Clusters ==== | ||
- | |||
- | UTR #50 does not provide any rules or pointers to rules about grapheme clusterization. We suggest referencing UAX29 and giving examples of where the boundaries there might be adjusted (e.g. in Indian scripts). | ||
- | |||
- | The properties of a grapheme cluster should be defined. We suggest that the properties come from the first base character, except in the following cases: | ||
- | |||
- | * Grapheme clusters formed with a combining mark of class Me should be treated as So in the Common script. | ||
- | * Grapheme clusters formed with a base of Zs should belong to category Sk and take their EAW from the space. | ||
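The rules above can be sketched roughly as follows. This is only an illustration, not part of UTR #50 or UAX29: the function name ''cluster_properties'' and the ''script_override'' key are hypothetical, the cluster is assumed to be already segmented, and the first code point is assumed to be the base character.

```python
import unicodedata


def cluster_properties(cluster: str) -> dict:
    """Derive properties for one pre-segmented grapheme cluster.

    Assumption: cluster[0] is the base character of the cluster.
    """
    base = cluster[0]
    props = {
        "category": unicodedata.category(base),
        "east_asian_width": unicodedata.east_asian_width(base),
        "script_override": None,  # hypothetical field; None means "use the base's script"
    }
    if any(unicodedata.category(ch) == "Me" for ch in cluster[1:]):
        # An enclosing mark turns the whole cluster into a symbol in the Common script.
        props["category"] = "So"
        props["script_override"] = "Common"
    elif props["category"] == "Zs":
        # Space + combining mark: category Sk, EAW stays that of the space.
        props["category"] = "Sk"
    return props
```

For example, ''cluster_properties("A\u20DD")'' (A with COMBINING ENCLOSING CIRCLE) reports category So, while a cluster such as ''"e\u0301"'' simply keeps the properties of its base.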
- | |||
- | See also http:// | ||
- | |||
- | ==== Miscategorized Scripts ==== | ||
- | |||
- | The following scripts should be upright: | ||
- | |||
- | * Hangul | ||
- | * All variants of Egyptian | ||
- | |||
- | Yi needs more investigation from someone who knows the language. Older books are written vertically, and the glyphs seem to be rotated relative to the Unicode code charts. However, I've seen vertical captions in horizontally-set books printed upright. | ||
- | |||
- | ==== Halfwidth Forms ==== | ||
- | |||
- | I was informed that halfwidth forms are strongly discouraged in vertical text, and typically set sideways. [?] | ||
- | |||
- | ==== Arrows and Box-Drawing ==== | ||
- | |||
- | Arrows and box-drawing characters should be set sideways by default, as unlike other symbols, they are usually typeset in spatial relation to other content rather than as a standalone graphic. (The same logic applies to the [[http:// | ||
- | |||
- | Box drawing characters are any characters in the U+2500--U+259F range. | ||
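As a minimal sketch of the range given above (the names ''BOX_DRAWING_FIRST'', ''BOX_DRAWING_LAST'' and ''is_box_drawing'' are made up for illustration), a membership test could look like this:

```python
# Range as stated in this memo: U+2500 through U+259F, inclusive.
BOX_DRAWING_FIRST = 0x2500
BOX_DRAWING_LAST = 0x259F


def is_box_drawing(ch: str) -> bool:
    """Return True if the single character `ch` falls in the memo's box-drawing range."""
    return BOX_DRAWING_FIRST <= ord(ch) <= BOX_DRAWING_LAST
```

Under the proposal in this section, characters for which this returns true would default to sideways.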
- | |||
- | Arrows are '' | ||
- | |||
- | Placing arrows into the '' | ||
- | |||
- | ==== Superscripts, | ||
- | |||
- | We concur with the comments that suggest changing superscripts, | ||
- | |||
- | * [[http:// | ||
- | * [[http:// | ||
- | |||
- | ==== Math ==== | ||
- | |||
- | For the following reasons: | ||
- | * digits are typeset sideways by default | ||
- | * commonly used variable names (Latin, Greek) are typeset sideways by default | ||
- | * superscripts and subscripts are [[http:// | ||
- | * arrows, which function as relations in math, would also be typeset sideways by default (see above) | ||
- | * ASCII math symbols are expected to be typeset sideways | ||
- | * mathematical formulae are usually typeset sideways even in vertical text | ||
- | * the most commonly-used symbols that are intermixed with prose (× and +) are symmetric wrt rotation, and the equals sign ('' | ||
- | we suggest math symbols should be typeset sideways by default. | ||
- | |||
- | When intermixed in prose, variable names are often typeset upright, and in such styles math symbols might also be typeset upright. However, in these situations some tailoring is necessary for the variable names whatever the mathematical default, so using this style to determine the default rules in plaintext does not make sense. | ||
- | |||
- | The default orientation of fullwidth math symbols is less clear; perhaps they should be U/T (for equals). |