This page is a memo to help our discussion of UTR #50 go smoothly.
Codes used for analysis by codepoint:
Code | UTR50 | MSFT | Meaning |
---|---|---|---|
U | U | S | Upright; translates between horizontal and vertical |
R | S | R | Sideways; rotates between horizontal and vertical |
TU | T | ST | Typeset upright with alternate glyph. Best fallback is just upright. |
TR | SB | RT | Typeset upright with alternate glyph. Best fallback is just sideways. |
V | ? | ? | Upright with respect to the Unicode code charts, but translates between horizontal and vertical |
Two modes are presented: Stacked (text-orientation: upright) and Mixed (text-orientation: mixed).
General Category | Stack | Mixed | Memo |
---|---|---|---|
Other, Control (Cc) | U | R | |
Other, Format (Cf) | U | R | |
Other, Private Use (Co) | U | R | |
Other, Surrogate (Cs) | U | R | no need to define? |
M* | Follows grapheme cluster | ||
L* and N* | See Letters and Numbers Orientation By Codepoint | ||
P* and Z* | See Punctuation Orientation By Codepoint | ||
S* | See Symbols Orientation by Codepoint | ||
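As a rough illustration of how the table above could be applied, here is a minimal Python sketch. The function name `default_orientation` and the returned strings are hypothetical, chosen only for this memo; the (Stack, Mixed) values are copied from the table, and categories the table does not list (such as Cn) are deliberately left undefined.

```python
import unicodedata

# (Stack, Mixed) codes copied from the table above.
# U = upright, R = sideways, using this page's analysis codes.
OTHER_DEFAULTS = {
    "Cc": ("U", "R"),
    "Cf": ("U", "R"),
    "Co": ("U", "R"),
    "Cs": ("U", "R"),  # the table asks whether this needs defining at all
}

def default_orientation(ch):
    """Return (stack, mixed) codes, a pointer string, or None.

    Hypothetical helper: it only mirrors the rows of the table and does
    not implement any of the per-codepoint pages it points to.
    """
    gc = unicodedata.category(ch)
    if gc in OTHER_DEFAULTS:
        return OTHER_DEFAULTS[gc]
    if gc.startswith("M"):
        return "follows grapheme cluster"
    if gc[0] in "LN":
        return "see Letters and Numbers Orientation By Codepoint"
    if gc[0] in "PZ":
        return "see Punctuation Orientation By Codepoint"
    if gc[0] == "S":
        return "see Symbols Orientation by Codepoint"
    return None  # not covered by the table (e.g. Cn, unassigned)
```

For example, `default_orientation("\u200d")` (ZERO WIDTH JOINER, Cf) gives `("U", "R")`, while a combining mark such as U+0301 defers to its grapheme cluster.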
Potential categories to support special behavior:
Interesting scans:
CSS would need to define some tailorings; should the Unicode spec include them too? E.g.
Not reviewed yet.
From the CSS3 Writing Modes editors.
We believe the deadline for comment is too short for such a complex spec. In particular, the new classes will take time to review codepoint-by-codepoint. We hope therefore that Unicode plans to update the spec through multiple review cycles until it stabilizes before publishing UTR50 as a completed spec.
UTR #50 scopes itself to Japanese layout. However, CSS needs to address all vertical writing systems (i.e. systems in which entire books are written in vertical text, not just used as a graphical effect). If the scope is not broadened to include other writing systems, we cannot rely on UTR#50.
To force consistency in orientation, UTR #50 expects vert to apply only to T (and maybe SB) category glyphs. However, this is incompatible with many fonts and cannot be implemented by a system that expects to correctly handle legacy content (in other words, any content authored with currently-existing fonts).
We would need to apply vert to the U category as well in order to handle:
We would need to apply vert to the SB category to handle:
A new font feature would be needed to apply to the S category to handle:
UTR #50 makes no mention of tailoring the orientations. We think the orientation classes should be tailorable; probably Unicode agrees, but this should be more clearly explained.
So that we don't have to manage codepoint-by-codepoint character classes, we'd eventually like UTR#50 to include classes that are commonly tailored / not tailored, that we can reference. Some examples:
So: registered signs were mentioned as an issue in UTR #50, and here are samples of the copyright symbol (images: copyright_vert.jpg, copyright_horz.png).
UTR #50 does not provide any rules or pointers to rules about grapheme clusterization. We suggest referencing UAX29 and giving examples of where the boundaries there might be adjusted (e.g. in Indic scripts).
The properties of a grapheme cluster should be defined. We suggest that the properties come from the first base character, except in the following cases:
See also http://www.w3.org/TR/css3-writing-modes/#character-properties
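The "first base character" suggestion can be sketched as follows. This is only an illustration under stated assumptions: the cluster is assumed to be already segmented (Python's standard library does not implement UAX29), "base character" is approximated as the first character whose General Category is not M*, and `lookup` is a hypothetical per-codepoint orientation function supplied by the caller. The exception cases are not implemented here.

```python
import unicodedata

def first_base_character(cluster):
    """Return the first non-mark character of a non-empty cluster.

    Assumption: "base" means General Category other than Mn/Mc/Me.
    If the cluster consists only of marks, fall back to its first character.
    """
    for ch in cluster:
        if not unicodedata.category(ch).startswith("M"):
            return ch
    return cluster[0]

def cluster_orientation(cluster, lookup):
    """Orientation of a whole cluster, taken from its first base character.

    `lookup` is a hypothetical callable mapping one character to an
    orientation code; it is not defined by UTR #50 or by this page.
    """
    return lookup(first_base_character(cluster))
```

With this sketch, the cluster "e" + U+0301 takes whatever orientation `lookup` assigns to "e".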
The following scripts should be upright:
Yi needs more investigation from someone who knows the language. Older books are written vertically, and seem to be a rotation from the Unicode code charts. However, I've seen vertical captions in horizontally-set books printed upright.
I was informed that halfwidth forms are strongly discouraged in vertical text, and typically set sideways. [?]
Arrows and box-drawing characters should be set sideways by default, as unlike other symbols, they are usually typeset in spatial relation to other content rather than as a standalone graphic. (The same logic applies to the bracket pieces.)
Box drawing characters are any characters in the U+2500–U+259F range.
Arrows are So characters in the U+2190–U+21FF, U+261A–U+261F, U+2794–U+27BE, U+2B00–U+2B11, and U+2B45–U+2B46 ranges; and Sm characters in the U+27F0–U+297F and U+2B30–U+2B4C ranges.
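The range definitions above can be checked mechanically. The following Python sketch reads them literally; the names `is_box_drawing`, `is_arrow`, and `sideways_by_default` are hypothetical, and the General Category values come from whatever Unicode version the Python runtime ships.

```python
import unicodedata

# Ranges copied from the definitions above (inclusive).
BOX_DRAWING = [(0x2500, 0x259F)]
ARROWS_SO = [(0x2190, 0x21FF), (0x261A, 0x261F), (0x2794, 0x27BE),
             (0x2B00, 0x2B11), (0x2B45, 0x2B46)]
ARROWS_SM = [(0x27F0, 0x297F), (0x2B30, 0x2B4C)]

def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_box_drawing(ch):
    # Any character in the range counts, regardless of General Category.
    return _in_ranges(ord(ch), BOX_DRAWING)

def is_arrow(ch):
    # Both the range and the General Category must match.
    gc = unicodedata.category(ch)
    cp = ord(ch)
    return ((gc == "So" and _in_ranges(cp, ARROWS_SO)) or
            (gc == "Sm" and _in_ranges(cp, ARROWS_SM)))

def sideways_by_default(ch):
    # The proposal: arrows and box-drawing characters default to sideways.
    return is_arrow(ch) or is_box_drawing(ch)
```

Read literally, the So rule does not match characters in those ranges whose General Category is Sm, such as U+2192 RIGHTWARDS ARROW; whether that is intended may be worth confirming.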
Placing arrows into the S category instead of U also relieves concerns about inconsistent arrow orientations due to vert interpretation.
We concur with the comments that suggest changing superscripts, subscripts, and bracket pieces to S by default.
For the following reasons, we suggest math symbols should be typeset sideways by default:
The equals sign (=) seems to be typeset sideways even when everything else is upright (example).
When intermixed in prose, variable names are often typeset upright, and in such styles math symbols might also be typeset upright. However, in these situations some tailoring is necessary for the variable names whatever the mathematical default, so using this style to determine the default rules in plaintext does not make sense.
The default orientation of fullwidth math symbols is less clear; perhaps they should be U/T (for equals).
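One possible reading of the math-symbol suggestion, as a hedged Python sketch: Sm characters default to S (sideways), except fullwidth ones, which default to U, with T for the fullwidth equals sign. Treating "U/T (for equals)" this way is an assumption of this sketch, as is using East Asian Width "F" to detect fullwidth forms; the function name `math_default` is hypothetical.

```python
import unicodedata

FULLWIDTH_EQUALS = "\uFF1D"

def math_default(ch):
    """Suggested default orientation for a math symbol, or None if not Sm.

    Assumptions (not settled on this page): fullwidth Sm characters are
    upright (U), and the fullwidth equals sign uses an alternate glyph (T).
    """
    if unicodedata.category(ch) != "Sm":
        return None
    if unicodedata.east_asian_width(ch) == "F":
        return "T" if ch == FULLWIDTH_EQUALS else "U"
    return "S"
```

Note that Sm arrows would also come out as S here, which agrees with the arrow proposal above.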