There is a macro called "Display Character Code." It used to be possible to use
the cursor keys (e.g. the right arrow) to step through the parts of a Unicode
composite character and, at each character--base or combining--get the code point
of that character using this macro.
After bug 3455 was fixed, the above behavior was no longer possible. In particular,
the cursor will no longer move to each combining character--it only moves to base
characters.
There are several implications of this. One is that it is no longer
possible to edit a composite character: the only thing you can do is delete the entire
composite character (base + diacritics) and start over. I suppose that's not the
end of the world.
A more important implication is that the "Display Character Code" macro no longer works
for combining characters--it is impossible to use this macro to find the code point
of such a character (it only works for base characters). In principle, one can use
the Hex plugin, but this is a clumsy work-around, involving switching to a different
file format, finding where the character in question is, and switching back. It
would be far better to be able to use this macro.
There are (at least) two possible fixes. One would be to restore the ability of the
cursor keys (and backspace and delete keys) to move one character at a time, regardless
of whether that character is a base or combining character. According to an email thread
from Aug 2012 entitled "Editing Unicode combining characters", this would be a lot of
work. A simpler but acceptable method, suggested by Kazutoshi Satoda on 25 Aug 2012
in that same thread, would be for the "Display Character Code" macro to output a sequence
of code points, including the base character and any combining characters.
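To make the second fix concrete, here is a minimal sketch in plain Java. It is not the macro's actual code; the class and method names (`ClusterCodePoints`, `describe`, `isCombiningMark`) are made up for illustration, and treating the Unicode general categories Mn, Mc and Me as "combining" is my assumption about what the macro should count as a diacritic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jEdit: lists the code point at an offset
// together with any combining marks that immediately follow it.
final class ClusterCodePoints {

    /** True for code points in general categories Mn, Mc or Me (assumed definition of "combining"). */
    static boolean isCombiningMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    /** Returns the hex code points of the character at offset plus its trailing combining marks. */
    static String describe(CharSequence text, int offset) {
        if (offset < 0 || offset >= text.length()) {
            return ""; // at or past EOF: nothing to report
        }
        List<String> parts = new ArrayList<>();
        int i = offset;
        int cp = Character.codePointAt(text, i);
        parts.add(Integer.toHexString(cp).toUpperCase());
        i += Character.charCount(cp);
        // Keep going while the following code points are combining marks.
        while (i < text.length()) {
            cp = Character.codePointAt(text, i);
            if (!isCombiningMark(cp)) {
                break;
            }
            parts.add(Integer.toHexString(cp).toUpperCase());
            i += Character.charCount(cp);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // 'a' followed by a combining acute accent
        System.out.println(describe("a\u0301", 0)); // prints "61 301"
    }
}
```

In the macro itself the text and offset would come from the buffer and the caret position; how that is wired up is left open here.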
I'm attaching a file that contains a base character 'a' (ASCII/Unicode U+0061) followed
by a combining acute accent (U+0301). Notice that this is in Unicode *decomposed*
(NFD) format; if it were in the composed format (NFC), it would be a single code point,
U+00E1, and would not illustrate the problem. (The problem also arises with combinations
of base+diacritics for which there is no NFC form. BTW, I'm assuming that SourceForge
doesn't convert uploaded files to NFC--if it does, the code point U+00E1 will show up.
Let me know if that happens and I'll come up with a combination that won't change.)
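For anyone who wants to check the NFD/NFC point independently of jEdit, the following sketch uses the standard `java.text.Normalizer` class. The class and method names (`NormalizationCheck`, `toHex`, `nfcHex`) are hypothetical, and the 'q' + acute pair is just one example I believe has no precomposed form.

```java
import java.text.Normalizer;

// Hypothetical stand-alone check, not part of jEdit.
final class NormalizationCheck {

    /** Space-separated hex code points of a string. */
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(cp).toUpperCase());
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    /** Hex code points of the string after NFC normalization. */
    static String nfcHex(String s) {
        return toHex(Normalizer.normalize(s, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        // 'a' + combining acute composes to a single code point under NFC.
        System.out.println(toHex("a\u0301") + " -> " + nfcHex("a\u0301")); // prints "61 301 -> E1"
        // 'q' + combining acute: assumed to have no precomposed form, so NFC should leave it alone.
        System.out.println(toHex("q\u0301") + " -> " + nfcHex("q\u0301"));
    }
}
```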
To demonstrate the problem with the attached file, put the cursor at the top of the
file, and call "Display Character Code." It will show code point 61. Move the cursor
one position to the right; you'll be at EOF, and the macro will not display anything.
It should instead do one of two things. Either it should display the code point 301
at that position (first solution above: allow cursor movement to each character, whether
base or diacritic), or, when the cursor is at the top of the file (before the accented
'a'), the macro should display "61 301" or similar (e.g. "61+301"), i.e. one code point
for each character (second solution above).
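In case the attachment gets altered on upload, a test file with the same content can be regenerated with a few lines of Java. This is only a sketch: the class name and the default file name `combining-sample.txt` are made up, and it assumes the file should be UTF-8 with no trailing newline (so that one step right from the top lands at EOF, as described above).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical generator for the sample file, not part of jEdit.
final class MakeSampleFile {

    /** UTF-8 bytes of 'a' followed by a combining acute accent, no newline. */
    static byte[] sampleBytes() {
        return "a\u0301".getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Output path is an assumption; pass a different one as the first argument.
        Path out = Paths.get(args.length > 0 ? args[0] : "combining-sample.txt");
        Files.write(out, sampleBytes());

        // Print the bytes written, in hex, so the file can be checked in a hex viewer.
        StringBuilder hex = new StringBuilder();
        for (byte b : sampleBytes()) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints "61 CC 81"
    }
}
```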
This problem will not occur in 8-bit encodings, since they don't have a notion of
combining characters. I don't know enough about other non-8-bit encodings (like Big5)
to know whether it happens there; probably just in Unicode.
jEdit v5.1.0 (and other versions since at least Aug 2012), Windows 7, Java version
1.8.0_20.
Submitted | mcswell - 2014-09-21 18:19:57.607000 | Assigned | |
---|---|---|---|
Priority | 5 | Labels | unicode macros |
Status | open | Group | minor bug |
Resolution | None | | |