
(5/240) 2469 - matching word break also matches interior of word

In 4.2 final

Searching with the regular expressions

\b. or \<. jumps to the beginning of a word, unless the
cursor is already in a word, in which case it finds the
next character within the word.

This makes it impossible to replace characters only at
the beginning of a word.

For instance:

search on
\<.

replace with BeanShell snippet
_0.toLowerCase()

should force the first character of every word to be
lower case. However, it actually forces *every*
character of every word to be lower case.

Submitted abgrover - 2005-10-12 17:50:09 Assigned
Priority 4 Labels search and replace
Status open Group normal bug
Resolution None

Comments

2005-10-13 03:39:00
vampire0

Logged In: YES
user_id=918212

As far as I have found, \b, \< and some others are not
supported by the gnu.regex-Package.
One additional point for throwing it away and using
java.util.regex instead. ;-)

2008-03-02 21:44:33
rschwenn

Logged In: YES
user_id=1486645
Originator: NO

The gnu.regex-Package is thrown away now. Not so the bug.

For example:
- The Expression "\b." matches *every* single character in a word.
- The Expression ".\b" matches the last character in a word (as expected) but also a following space character.

jEdit 4.3pre12
JRE 1.6.0_03
WinXP SP2

2012-01-19 19:41:43
ezust

- **milestone**: 101608 --> 101607
- **priority**: 5 --> 7

2012-01-19 20:15:50
sjakob

I don't think this is a bug. I see 2 issues with Alan G's approach:
1) Note that the boundary matching characters match the boundary and not the characters themselves. Each word has two boundaries, one BEFORE the first character and one AFTER the last. Your regex will match the first boundary of your complete search string, which occurs before the first word character.
2) The replace string you specify indicates that you want to replace every character with a lower-case character since "_0" refers to the complete contents of your searched text. To clarify, if my search text is the string "this text" Alan's BeanShell snippet is equivalent to "this text".toLowerCase().

As an alternative, the following appears to work for me:
Search regex: \b(\w)(\w*)
Replace with BeanShell snippet: _1.toLowerCase() + _2

By separating the first character of each word (following a boundary) from the rest I can transform just that one character.
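
The same idea can be tried outside jEdit with plain java.util.regex. This is only a sketch: the class and method names (LowerFirstDemo, lowerFirst) are made up for illustration, and the loop stands in for the BeanShell snippet `_1.toLowerCase() + _2`; it is not jEdit's replace code.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LowerFirstDemo {
    // Group 1 is the first word character after a boundary, group 2 the rest of the word.
    private static final Pattern WORD = Pattern.compile("\\b(\\w)(\\w*)");

    // Lower-cases only the first character of every word (hypothetical helper).
    static String lowerFirst(String text) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1).toLowerCase(Locale.ROOT) + m.group(2);
            // quoteReplacement keeps '$' and '\' in the text from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowerFirst("Hello World, FOO")); // prints "hello world, fOO"
    }
}
```

Because each match consumes the whole word, the next search starts after the word and cannot land on one of its interior characters.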

2012-01-21 12:21:24
rschwenn

Just tried again:

1. ".\b" matches the word boundary and the preceding character *as expected*
2. "\b." matches *every single character* in a word, which is *a bug*, isn't it?

jEdit 4.4.2 and jEdit 4.5pre1
JRE 1.6.0_24
WinXP SP3

2012-01-22 10:55:23
boise

As noted, \b matches word boundaries, i.e. both the beginning and end of a "word". So \b. will indeed match the first letter of any word, but also the first whitespace character (and any other non-word character such as the dot after a sentence) AFTER every word. (Though that doesn't matter if you just want to upper-case it).
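
To make that concrete, here is a small sketch using java.util.regex directly (not jEdit). The class and method names are invented for the example. It lists every match of \b. found in one continuous scan, together with its offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDotDemo {
    // Returns "offset:char" for every match of \b. in a single scan of the text.
    static List<String> boundaryDotMatches(String text) {
        Matcher m = Pattern.compile("\\b.").matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.start() + ":" + m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [0:F, 3: , 4:B, 7:.]
        System.out.println(boundaryDotMatches("Foo Bar."));
    }
}
```

The first letter of each word is matched, and so are the space after "Foo" and the dot after "Bar", but no interior letters.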

If you want to upper-case the first letter of every word you should use \b\w instead, or even \b[a-zA-Z] (or similar) depending on whether the search or the conversion is slower.

I'm guessing that in jEdit, after each match has been handled, the pattern is applied again to whatever comes after the last matched character. This will indeed cause every character in a word to match, since the first position of every string will match \b. In other words, the first match for \b in any non-blank string is (and should be) ^, i.e. the first position of the matched string.

In that case it is not a bug, and the correct way is indeed to use something like \b(\w)\w* as suggested by Mr. Jakob. That would let each match consume the rest of the word so that it is not matched in the next iteration.
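
The guess above can be simulated with java.util.regex. The sketch below is an assumption about the mechanism, not jEdit's actual code, and all names in it are hypothetical. It compares one continuous scan with a scan that hands only the remaining text to a fresh matcher after every match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RestartScanDemo {
    private static final Pattern BOUNDARY_DOT = Pattern.compile("\\b.");

    // One matcher over the whole text: \b can see the preceding character.
    static String continuousScan(String text) {
        Matcher m = BOUNDARY_DOT.matcher(text);
        StringBuilder matched = new StringBuilder();
        while (m.find()) {
            matched.append(m.group());
        }
        return matched.toString();
    }

    // A fresh matcher on the remainder after each match: the preceding
    // character is cut off, so a remainder starting with a word character
    // always begins with a "word boundary".
    static String restartingScan(String text) {
        StringBuilder matched = new StringBuilder();
        int offset = 0;
        while (offset < text.length()) {
            Matcher m = BOUNDARY_DOT.matcher(text.substring(offset));
            if (!m.find()) {
                break;
            }
            matched.append(m.group());
            offset += m.end();
        }
        return matched.toString();
    }

    public static void main(String[] args) {
        System.out.println(continuousScan("foo bar")); // prints "f b"
        System.out.println(restartingScan("foo bar")); // prints "foobar"
    }
}
```

In the restarting variant every letter of every word is matched, which is the behaviour reported in this ticket.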

2012-04-22 14:21:04
jarekczek

The search-and-replace design does not allow this to be fixed without a significant interface extension. The SearchMatcher class has a findNext method which always starts from index 0; it does not allow a different index to be supplied. If findNext has no access to the preceding characters, it cannot perform a word boundary search correctly. So I don't expect this to be fixed soon.

A fix would require much attention because there are many clauses for reverse search which must be taken into consideration. I'm not going to do it.
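
For illustration only, this is what "access to the previous characters" could look like with java.util.regex: the matcher is given the full text plus a start index, and transparent bounds let \b look at the character before that index. The names below are hypothetical and this is not a proposal for the SearchMatcher interface, whose signature is not shown in this ticket.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TransparentBoundsDemo {
    // Returns the offset of the first \b. match at or after 'from', or -1 if there is none.
    // With transparent bounds, boundary constructs may inspect text outside the region.
    static int firstMatchFrom(String text, int from, boolean transparent) {
        Matcher m = Pattern.compile("\\b.").matcher(text);
        m.region(from, text.length());
        m.useTransparentBounds(transparent);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // Opaque bounds: offset 1 looks like the start of the input, so "o" matches.
        System.out.println(firstMatchFrom("foo bar", 1, false)); // prints 1
        // Transparent bounds: \b sees the "f" before offset 1 and skips to the real boundary.
        System.out.println(firstMatchFrom("foo bar", 1, true));  // prints 3
    }
}
```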

I don't think this is really crucial functionality, so I am lowering the priority. I even have a workaround: first replace all "\b" with "X" (this works), then replace all "X." using a suitable Java snippet. Of course X must be substituted with something that does not occur in the file.

2012-04-22 14:21:04
jarekczek

- **milestone**: 101607 --> normal bug
- **priority**: 7 --> 4