The RULES Tag

RULES tags must be placed inside the MODE tag. Each RULES tag defines a ruleset, which consists of a number of parser rules; each parser rule specifies how to highlight a specific syntax token. Every edit mode must contain at least one ruleset. A mode can also contain several, with different rulesets highlighting different parts of a buffer (for example, in HTML mode, one ruleset highlights HTML tags, and another highlights inline JavaScript). For information about using more than one ruleset, see the section called “The SPAN Tag”.

The RULES tag supports the following attributes, all of which are optional:

SET - the name of this ruleset. All rulesets other than the first must have a name.
IGNORE_CASE - if set to FALSE, matches will be case sensitive. Otherwise, case will not matter. Default is TRUE.
ESCAPE - specifies a character sequence for escaping literals. The first character following the escape sequence is not considered as input for syntax highlighting, and is therefore highlighted with the default token for the ruleset.
NO_WORD_SEP - any non-alphanumeric character not in this list is treated as a word separator for the purposes of syntax highlighting.
DEFAULT - the token type for text which doesn't match any specific rule. Default is NULL. See the section called “Token Types” for a list of token types.
HIGHLIGHT_DIGITS, DIGIT_RE - see “Highlighting Numbers” below for information about these two attributes.

Here is an example RULES tag:

<RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE">
    ... parser rules go here ...
</RULES>
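
As a further sketch, a mode with two rulesets might combine several of these attributes as follows. The ruleset name SCRIPT, the delimiters, and the token types chosen are hypothetical, and the DELEGATE attribute is assumed to behave as described in the section called “The SPAN Tag”.

```xml
<MODE>
    <!-- First ruleset: unnamed, which is allowed only for the first one -->
    <RULES IGNORE_CASE="TRUE" DEFAULT="NULL">
        <!-- Hand the text between the delimiters to the SCRIPT ruleset -->
        <SPAN TYPE="MARKUP" DELEGATE="SCRIPT">
            <BEGIN>&lt;%</BEGIN>
            <END>%&gt;</END>
        </SPAN>
    </RULES>

    <!-- Second ruleset: must have a name because it is not the first -->
    <!-- NO_WORD_SEP="_" keeps the underscore from acting as a word separator -->
    <RULES SET="SCRIPT" IGNORE_CASE="FALSE" ESCAPE="\"
           NO_WORD_SEP="_" HIGHLIGHT_DIGITS="TRUE">
        <SPAN TYPE="LITERAL1">
            <BEGIN>"</BEGIN>
            <END>"</END>
        </SPAN>
    </RULES>
</MODE>
```

Only the second ruleset is case sensitive here, which shows that each RULES tag carries its own attribute settings.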

Highlighting Numbers

If the HIGHLIGHT_DIGITS attribute is set to TRUE, jEdit will attempt to highlight numbers in this ruleset.

Any word consisting entirely of digits (0-9) will be highlighted with the DIGIT token type. A word that contains letters in addition to digits will be highlighted with the DIGIT token type only if it matches the regular expression specified in the DIGIT_RE attribute. If DIGIT_RE is not specified, such words are not highlighted as digits.

Here is an example DIGIT_RE regular expression that highlights Java-style numeric literals (normal numbers, hexadecimals prefixed with 0x, numbers suffixed with various type indicators, and floating point literals containing an exponent). Note that newlines have been inserted here for readability.

DIGIT_RE="(0[lL]?|[1-9]\d{0,9}(\d{0,9}[lL])?
|0[xX]\p{XDigit}{1,8}(\p{XDigit}{0,8}[lL])?
|0[0-7]{1,11}([0-7]{0,11}[lL])?|([0-9]+\.[0-9]*
|\.[0-9]+)([eE][+-]?[0-9]+)?[fFdD]?|[0-9]+([eE][+-]?[0-9]+[fFdD]?
|([eE][+-]?[0-9]+)?[fFdD]))"

Regular expression syntax is described in Appendix E, Regular Expressions.
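
For a smaller illustration than the Java expression above, the following sketch highlights only hexadecimal literals in addition to plain numbers. The expression is a hypothetical example, not one taken from a shipped mode, and it assumes that DIGIT_RE is tested against the whole word.

```xml
<!-- Words made up only of digits, such as 42, get the DIGIT token
     without consulting DIGIT_RE. The expression below is checked only
     for words that also contain letters, such as 0x1F. -->
<RULES HIGHLIGHT_DIGITS="TRUE" DIGIT_RE="0[xX][0-9a-fA-F]+">
    ... parser rules go here ...
</RULES>
```

Under these assumptions, a word such as 12abc matches neither case and is left to the other rules in the ruleset.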

Rule Ordering Requirements

Incorrect rule ordering is a very common pitfall when writing your own modes.

jEdit checks buffer text against parser rules in the order they appear in the ruleset. More specific rules must therefore be placed before more general ones; otherwise, the general rules will catch everything.

This is best demonstrated with an example. The following is incorrect rule ordering:

<SPAN TYPE="MARKUP">
    <BEGIN>[</BEGIN>
    <END>]</END>
</SPAN>

<SPAN TYPE="KEYWORD1">
    <BEGIN>[!</BEGIN>
    <END>]</END>
</SPAN>

If you write the above in a ruleset, any occurrence of “[” (including the start of text such as “[!DEFINE”) will be highlighted using the first rule, because it is the first to match. This is most likely not the intended behavior.

The problem can be solved by placing the more specific rule before the general one:

<SPAN TYPE="KEYWORD1">
    <BEGIN>[!</BEGIN>
    <END>]</END>
</SPAN>

<SPAN TYPE="MARKUP">
    <BEGIN>[</BEGIN>
    <END>]</END>
</SPAN>

Now, if the buffer contains the text “[!SPECIAL]”, the rules are checked in order and the first rule matches. The text “[FOO]” does not begin with “[!”, so it is highlighted using the second rule, which is exactly what you would expect.

Per-Ruleset Properties

The PROPS tag (described in the section called “The PROPS Tag”) can also be placed inside the RULES tag to define ruleset-specific properties. The following properties can be set on a per-ruleset basis:

commentEnd - the comment end string.
commentStart - the comment start string.
lineComment - the line comment string.

This allows different parts of a file to have different comment strings (for example, HTML text and inline JavaScript within an HTML file). For information about the commenting commands, see the section called “Commenting Out Code”.
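
As an illustration, a ruleset for inline JavaScript could declare its own comment strings as sketched below. The ruleset name JAVASCRIPT is hypothetical, and the PROPERTY syntax is assumed to be the same as in the section called “The PROPS Tag”.

```xml
<RULES SET="JAVASCRIPT" IGNORE_CASE="FALSE">
    <!-- Comment strings used while the caret is inside this ruleset -->
    <PROPS>
        <PROPERTY NAME="commentStart" VALUE="/*" />
        <PROPERTY NAME="commentEnd" VALUE="*/" />
        <PROPERTY NAME="lineComment" VALUE="//" />
    </PROPS>

    ... parser rules go here ...
</RULES>
```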