
(8/231) 1325082 - matching word break also matches interior of word

In 4.2 final, searching with the regular expressions
\b. or \<. jumps to the beginning of a word, unless the
cursor is already in a word, in which case it finds the
next character within the word.

This makes it impossible to replace characters only at
the beginning of a word.

For instance:

search on
\<.

replace with BeanShell snippet
_0.toLowerCase()

should force the first character of every word to be
lower case. However, it actually forces *every*
character of every word to be lower case.

Submitted	abgrover - 2005-10-12 - 17:50:09z	Assigned	nobody
Priority	4	Category	search and replace
Status	Open	Group	normal bug
Resolution	None	Visibility	No

Comments

2005-10-13 - 03:39:00z vampire0	As far as I have found, \b, \< and some others are not supported by the gnu.regex package. One more reason to throw it away and use java.util.regex instead. ;-)
2008-03-02 - 21:44:33z rschwenn	The gnu.regex package has been thrown away now. Not so the bug. For example:
- The expression "\b." matches every single character in a word.
- The expression ".\b" matches the last character in a word (as expected) but also a following space character.
jEdit 4.3pre12, JRE 1.6.0_03, WinXP SP2
2012-01-19 - 20:15:50z sjakob	I don't think this is a bug. I see 2 issues with Alan G's approach:
1) Note that the boundary matching characters match the boundary and not the characters themselves. Each word has two boundaries, one BEFORE the first character and one AFTER the last. Your regex will match the first boundary of your complete search string, which occurs before the first word character.
2) The replace string you specify indicates that you want to replace every character with a lower-case character, since "_0" refers to the complete contents of your searched text. To clarify, if my search text is the string "this text", Alan's BeanShell snippet is equivalent to "this text".toLowerCase().
As an alternative, the following appears to work for me:
Search regex: \b(\w)(\w*)
Replace with BeanShell snippet: _1.toLowerCase() + _2
By separating the first character of each word (following a boundary) from the rest I can transform just that one character.
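For reference, the alternative above can be reproduced outside jEdit with plain java.util.regex. This is only a sketch of the same idea, not jEdit's own replace code: the class and method names are made up for illustration, and m.group(1) and m.group(2) stand in for the BeanShell variables _1 and _2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LowercaseFirstLetter {
    // Same pattern as in the comment above: group 1 is the first word
    // character after a boundary, group 2 is the rest of the word.
    private static final Pattern WORD = Pattern.compile("\\b(\\w)(\\w*)");

    // Hypothetical helper: lower-cases only the first character of each word.
    static String lowercaseFirstLetters(String text) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Equivalent of the BeanShell snippet _1.toLowerCase() + _2
            String replacement = m.group(1).toLowerCase() + m.group(2);
            // quoteReplacement keeps '$' and '\' in the text from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowercaseFirstLetters("This Text, Another WORD"));
    }
}
```

Because each match consumes the whole word, the remainder of the word is never offered to the pattern again, so only the first character is changed.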
2012-01-21 - 12:21:24z rschwenn	Just tried again:
1. ".\b" matches the word boundary and the preceding character, as expected.
2. "\b." matches every single character in a word, which is a bug, isn't it?
jEdit 4.4.2 and jEdit 4.5pre1, JRE 1.6.0_24, WinXP SP3
2012-01-22 - 10:55:23z boise	As noted, \b matches word boundaries, i.e. both the beginning and the end of a "word". So \b. will indeed match the first letter of any word, but also the first whitespace character (and any other non-word character, such as the dot after a sentence) AFTER every word. (Though that doesn't matter if you just want to upper-case it.) If you want to upper-case the first letter of every word you should use \b\w instead, or even \b[a-zA-Z] (or similar), depending on whether the search or the conversion is slower.
I'm guessing that when doing this in jEdit, after each match has been handled, the pattern is applied again to whatever comes after the last matched character. This will indeed cause every character in a word to match, since the first position of every string will match \b. In other words, the first match for \b in any string that begins with a word character is (and should be) ^, i.e. the first position of the matched string.
In that case it is not a bug, and the correct way is indeed to use something like \b(\w)\w* as suggested by Mr. Jakob. That would let each match consume the rest of the word so that it is not matched in the next iteration.
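The guess above can be checked with a small java.util.regex experiment. This is a sketch under stated assumptions: lowerWithRestart is a hypothetical model of "apply the pattern again to whatever comes after the last match" and is not taken from jEdit's source; lowerWithFullContext uses a single Matcher over the whole text for comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryRestartDemo {
    private static final Pattern P = Pattern.compile("\\b.");

    // One matcher over the whole text: \b can see the preceding character.
    static String lowerWithFullContext(String text) {
        Matcher m = P.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group().toLowerCase()));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Hypothetical model of the guessed behaviour: after each match the
    // search restarts on the remaining text only, so the remainder always
    // begins at "index 0" and the previous character is invisible to \b.
    static String lowerWithRestart(String text) {
        StringBuilder out = new StringBuilder();
        String rest = text;
        while (!rest.isEmpty()) {
            Matcher m = P.matcher(rest);
            if (!m.find()) {
                break;
            }
            out.append(rest, 0, m.start());      // text before the match
            out.append(m.group().toLowerCase()); // the replaced character
            rest = rest.substring(m.end());      // context is lost here
        }
        out.append(rest);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowerWithFullContext("HELLO WORLD")); // hELLO wORLD
        System.out.println(lowerWithRestart("HELLO WORLD"));     // hello world
    }
}
```

With full context only the first letter of each word is lower-cased (the space after "HELLO" also matches but is unchanged by toLowerCase()). With the restart model every letter is lower-cased, which is the behaviour reported in this item.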
2012-04-22 - 14:21:04z jarekczek	The search and replace design does not allow this to be fixed without a significant interface extension. The SearchMatcher class has a findNext method which always starts from index 0. It does not allow a different index to be supplied. If the findNext method has no access to the previous characters, it is not able to perform a word boundary search correctly. So I don't expect this to be fixed soon. A fix would require much attention, because there are many clauses for reverse search which must be taken into consideration. I'm not going to do it. I don't think it is really a crucial functionality, so I am lowering the priority.
I even have a workaround. First replace all "\b" with "X" (this works), then replace all "X." with a suitable Java snippet. Of course X must be substituted with something that is not contained in the file.
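To illustrate why access to the previous characters matters, here is a sketch using java.util.regex directly. It does not use or imitate the SearchMatcher interface; findFrom is a hypothetical helper. It relies on Matcher.region and Matcher.useTransparentBounds, which control whether boundary constructs such as \b may look at text outside the searched region.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TransparentBoundsDemo {
    private static final Pattern P = Pattern.compile("\\b.");

    // Hypothetical helper: returns the start offset of the first match of
    // \b. at or after 'from', or -1 if there is none. When seeOutsideRegion
    // is true, \b may inspect the character before 'from'.
    static int findFrom(String text, int from, boolean seeOutsideRegion) {
        Matcher m = P.matcher(text);
        m.region(from, text.length());
        m.useTransparentBounds(seeOutsideRegion);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "HELLO WORLD";
        // Opaque bounds (the default): offset 1 looks like a word start.
        System.out.println(findFrom(text, 1, false)); // 1
        // Transparent bounds: \b sees the 'H' before offset 1, so the next
        // boundary followed by a character is at the space, offset 5.
        System.out.println(findFrom(text, 1, true));  // 5
    }
}
```

The opaque case corresponds to a search that only ever sees the text from its starting point onward. Whether a similar start offset could be threaded through jEdit's search code, including the reverse-search paths mentioned above, is a separate question that this sketch does not answer.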
