
(53/212) 168 - Syntax Highlighting to Have Unlimited Number of Token types

Hello,

I find the fixed number of token types in jEdit very restrictive. When I have combined code (HTML + PHP + possibly JavaScript + CSS), jEdit colors everything in similar colors. It would be so much nicer if blocks written in different languages looked different...

I studied jEdit's highlighting definition syntax and figured out that this is not limited by the highlighting system, but by the number of token types.

I tried jEdit a long time ago; back then it was quite user-unfriendly, crashed, and ran slowly on the computer I had. Now I've found that, with plugins, it has all the features I need to switch from my favorite but old editor, HomeSite (whose development has already ended), EXCEPT for HomeSite's excellent syntax coloring... I will put a screenshot at http://ondra.zizka.cz/temp/HomeSite_screenshot.png . (It is an intentionally synthesized mix of all the languages together, which is bad practice.)

So, my feature request is:

Since the "parsing" system is already capable of the feature I am asking for, and the mode files would not even have to be rewritten, I guess this is only a matter of the following:

Let's not have a fixed set of token types; instead, let's track all token types of each mode and make them configurable, similarly to shortcuts:
1) A separate color configuration for each mode, and
2) A global default color config for certain token types (comment, keyword1, operator), which would be applied when the mode-specific setting is "use default for this token type".
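The two-level configuration above could be sketched roughly as follows. This is only an illustration under my own assumptions, not jEdit code: the class `TokenStyleRegistry`, its methods, and the color strings are all hypothetical names made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical registry: per-mode colors with a global fallback (not a jEdit class). */
class TokenStyleRegistry {
    // Global defaults keyed by token type name, e.g. "COMMENT1" -> "#cc0000".
    private final Map<String, String> globalDefaults = new HashMap<>();
    // Per-mode overrides: mode name -> (token type name -> color).
    private final Map<String, Map<String, String>> modeOverrides = new HashMap<>();

    void setGlobalDefault(String tokenType, String color) {
        globalDefaults.put(tokenType, color);
    }

    void setModeStyle(String mode, String tokenType, String color) {
        modeOverrides.computeIfAbsent(mode, m -> new HashMap<>()).put(tokenType, color);
    }

    /** Removing the override means "use default for this token type". */
    void useDefault(String mode, String tokenType) {
        Map<String, String> overrides = modeOverrides.get(mode);
        if (overrides != null) {
            overrides.remove(tokenType);
        }
    }

    /** Mode-specific color if set, otherwise the global default, otherwise empty. */
    Optional<String> resolve(String mode, String tokenType) {
        Map<String, String> overrides = modeOverrides.get(mode);
        if (overrides != null && overrides.containsKey(tokenType)) {
            return Optional.of(overrides.get(tokenType));
        }
        return Optional.ofNullable(globalDefaults.get(tokenType));
    }
}
```

With this shape, a mode only stores the token types the user actually customized, and everything else falls through to the global setting.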

Regards,
Ondra Žižka

Submitted pekarna - 2007-09-23 14:12:57 Assigned
Priority 5 Labels core
Status open Group None
Resolution None

Comments

2007-09-23 14:12:57
pekarna

Screenshot of HomeSite highlighting, and separated file pane in Project View

HomeSite_screenshot.png (142.1Kio)

2007-09-23 14:36:50
*anonymous

Logged In: YES
user_id=1477607
Originator: NO

I don't think we are limited by the number of tokens, although I agree that in principle there is no reason to limit the number of tokens.
I've had a few discussions with some of the developers about syntax highlighting. I, too, suggested to do all that you ask in this feature request, and more. However, since the jEdit core is fairly complex already, I've written a plugin named SyntaxHelper. Currently, this plugin only makes it easy to configure the style of each token, by showing you the syntax highlighting option pane in a dockable window and letting it follow the caret and show you the style of the token under the caret. Unfortunately, it depends on the latest jEdit development version and can't be released until jEdit 4.3pre11 (or 4.3final) comes out.
A future plan for this plugin is to enable mode-specific token styles and names. That is, each mode will be able to bind its own names for the existing token types, which will be more meaningful for the mode, and also suggest default styles for these token types. Users will be able to customize the global defaults as well as the mode-specific styles for each token type using the plugin. I don't want to introduce more complexity into jEdit for something that can easily be done in a plugin. I also thought of some revolutionary idea, where the plugin will let the user pick an arbitrary editor window on the screen (of some other editor) and "import" the style settings from that editor by querying the OS for the text style where possible (or apply some AI technique, but this gets out of scope for a SyntaxHelper plugin :-) and use it for the token type that jEdit would map this text to. But both of these are long-term plans.

2007-09-23 16:03:19
pekarna

Logged In: YES
user_id=1053064
Originator: YES

Ah... and was that discussion public?

And if the jEdit core devs decide to keep a limited number of token types, could they at least add some? That shouldn't be that hard. My suggestion is below. I know that such a solution is not very systematic, but it is better than nothing.

When do you expect the plugin could be ready for use? I would love to help, but I do not yet know anything about writing jEdit plugins. If you have a task that an ordinary Java programmer could do, tell me.


Suggested token types:

Most important: DELIMITER for delimiters between different languages (e.g. HTML / PHP <? ?>, C++ / embedded SQL, etc.). Such delimiters are very important when editing combined code.

Also important: MARKUP2 - MARKUP8 for different colors for different XML/HTML tags - having the same color for all tags is really ugly. See the attachment for an example of nicely colored HTML; you simply must agree that such coloring is much prettier and much clearer.

FUNCTION2 (e.g. for PHP built-in functions)
FUNCTION3 (e.g. for JavaScript built-in functions)
FUNCTION4 (e.g. for CSS selectors)


OPERATOR2 (e.g. for PHP's @ operator, which is kind of exceptional).
OPERATOR3 (e.g. for PHP's . operator, which usually appears between strings and should be more visible).

KEYWORD 5 to 8 - each language has its own keywords and its own highlighting style (or should have)...

2007-09-24 09:36:53
*anonymous

Logged In: YES
user_id=1477607
Originator: NO

My discussions were not public. I actually suggested that each mode file defines the token types it uses, and that jEdit collects the token types from the mode files and enables their customization in the Global Options dialog. At least some of the token types have hard-coded semantics, so the token types can't be purely defined in the mode files.

The plugin is currently very limited and only provides a user-friendly way to customize the existing token type styles. In its current, limited version, it will only be available after the next release of jEdit (4.3pre11 or something like that). Thanks for your offer to help; maybe we can work together on this. I suggest continuing the discussion you started on the community site.

Regarding token types: I don't think we are limited by the number of token types. jEdit defines 17 token types, do you know of a language that requires more?
Regarding delimiters between languages: This is not a matter of token types. The concept is that a buffer is opened in an edit mode (a single edit mode). I don't know if jEdit currently handles several languages in the same buffer; as far as I know it does not (correct me if I'm wrong...). If this is the case, a massive change would be required in jEdit to support that, and I doubt it will ever happen due to the complexity. However, I think it can be done, to some extent, in a plugin. I suggest discussing this along with the other requested features on the community site until we reach some decision.

2007-09-24 10:27:57
*anonymous

Logged In: YES
user_id=1477607
Originator: NO

Sorry, I was completely mistaken in the last part. It turns out jEdit has support for mixed-language buffers, which is quite nice. I still think that the number of tokens is sufficient, unless a single language requires more token types for itself. I think that jEdit cannot assign global semantics to the fixed set of token types: the semantics are set by the mode files, and each mode file can use the token types for whatever semantics it wishes. That's why I suggested enabling mode-specific names for the token types, which indicate the semantics of the types for the mode (to go along with mode-specific styles for the token types).

2007-09-26 01:03:38
pekarna

Screenshot of jEdit highlighting, and single file and folders pane in Project View

jEdit_screenshot.png (160.2Kio)

2007-09-26 01:03:39
pekarna

Logged In: YES
user_id=1053064
Originator: YES

Continuing the discussion here to provide arguments for implementing this in one place.

Yes, jEdit can handle multiple languages gracefully.

And no, I think that the current set of token types does *not* suffice:

jEdit (as it seemed to me) was originally intended for editing single-language files, mainly Java and occasionally others. The idea was that programmers like all languages to look the same, e.g. operators in red, keywords in blue, etc.

Then, after a very valuable syntax highlighting system was implemented and tweaked a few times, it became able to parse multiple-language files, like the HTML + PHP + CSS + JavaScript quartet. At that point, having all languages look the same turned out to be a great disadvantage: sources are hard to scan and navigate. Just try opening a mixed PHP file in jEdit and compare it with the attached image. I am also attaching a screenshot of jEdit. Notice how ugly the code is compared to HomeSite's highlighting.

The main problem is that the token types are not differentiated by language. This leads to a situation where ALL HTML tags are simply "MARKUP", which is absolutely insufficient, and the PHP delimiters, which should be the most visible part of the document, have the very same look! Other things suffer too, like all HTML arguments being colored the same way as PHP strings using LITERAL<n>, and so on and so on.

Having token types differentiated by language would allow nice syntax highlighting, and for web editing jEdit would become my "weapon of choice" :)

And about the global defaults: that would be possible using simple, KISS-like string matching. Then:
PHP::String could define LITERAL1 as its default (defined in the mode file, of course).
PHP::LineComment, PHP::BlockComment, JavaScript::LineComment, JavaScript::BlockComment, and HTML::Comment could define COMMENT1 as their default. CSS::BlockComment could use a different one, for example.
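To make the name-based defaults concrete, here is a minimal sketch. It assumes qualified names of the form `Language::TokenName` as written above; the class `QualifiedTokenDefaults` and its methods are hypothetical, and the color strings are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup from qualified token names to styles (not a jEdit class). */
class QualifiedTokenDefaults {
    // Declared by the mode file: "PHP::String" -> "LITERAL1".
    private final Map<String, String> defaultGeneric = new HashMap<>();
    // Styles of the generic token types: "LITERAL1" -> color.
    private final Map<String, String> genericStyles = new HashMap<>();
    // Styles the user set for one qualified name: "PHP::String" -> color.
    private final Map<String, String> userStyles = new HashMap<>();

    void declare(String qualifiedName, String genericType) {
        if (!qualifiedName.contains("::")) {
            throw new IllegalArgumentException("expected Language::TokenName: " + qualifiedName);
        }
        defaultGeneric.put(qualifiedName, genericType);
    }

    void setGenericStyle(String genericType, String style) {
        genericStyles.put(genericType, style);
    }

    void setUserStyle(String qualifiedName, String style) {
        userStyles.put(qualifiedName, style);
    }

    /** User style for the exact name if present, else the style of its declared default. */
    String styleFor(String qualifiedName) {
        String own = userStyles.get(qualifiedName);
        if (own != null) {
            return own;
        }
        String generic = defaultGeneric.get(qualifiedName);
        return generic == null ? null : genericStyles.get(generic);
    }
}
```

The lookup is plain exact string matching on the qualified name, so existing generic styles keep working until someone customizes a specific name.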

That's almost my whole idea. It is very simple in principle, and I guess it would not take much effort to implement, judging from my experience programming syntax highlighting on several occasions. Better to do it now, before too many plugins would have to be modified after such a change ;-)
File Added: jEdit_screenshot.png

2007-09-26 01:18:27
pekarna

Logged In: YES
user_id=1053064
Originator: YES

In case someone implements some of the functionality described, I should be able to rewrite the PHP mode file to test with.

2007-09-26 07:06:11
kpouer

Logged In: YES
user_id=285591
Originator: NO

Hi, I'm sorry but I don't agree with you. I looked at the screenshots, and the difference between jEdit and HomeSite is that in HomeSite the HTML tags have very basic syntax highlighting.
HomeSite seems to choose a color for each tag and use this color for the entire tag and its attributes.
I don't think we should have different tokens for different languages. But it may be possible to have different styles for the same token according to the language.
Another idea (I don't know if it would be easy to do):
add a background color when delegating to another edit mode.

2007-09-28 12:26:55
pekarna

Logged In: YES
user_id=1053064
Originator: YES

First, that is not the only difference; either I chose a bad example, or you did not look closely.

Second, HomeSite uses a similar model of syntax highlighting: external files describing the syntax (compiled into an executable). Coloring depends on the parser, not on HomeSite. The point I am making is that HomeSite can have an unlimited number of tokens, so attributes like "onclick" could be parsed as JavaScript.

Third, different styles according to the language would be a workaround; having the same set of tokens for all languages in the world is (imho) generally a bad idea, as the languages themselves do not have the same tokens. E.g., for HTML, it would result in using different token types for different tags. Having just one MARKUP is far too few, and for other languages, additional MARKUPs would remain unused.

Fourth, what do you mean by "language"? If you have ever opened the mode files, you will have seen that the languages are mixed together in one file and currently cannot be differentiated, as they have just a name and no namespace. And if you mean "different mode files", that would not solve the coloring of mixed HTML + PHP + ... files. The same is true for a different background color for each edit mode: HTML and PHP are in the same file.

2007-09-29 20:15:31
*anonymous

Logged In: YES
user_id=1477607
Originator: NO

I think the following would do just what you ask:

1. Rename the global, fixed set of tokens to have generic, meaningless names, like "TOKEN1", "TOKEN2", ...
2. Let each mode file assign a meaning (a mode-specific name) to each token type. E.g. HTML could assign "MARKUP1" ... "MARKUP4" to token types "TOKEN5" ... "TOKEN8", while C++ could assign "Variable" to "TOKEN5", "Function" to "TOKEN6" and so on.
3. Enable mode-specific token type styles. E.g. HTML could have TOKEN1 mapped to yellow text, while PHP could have TOKEN1 mapped to red text.
4. Give the Syntax Highlighting option pane a mode combo-box allowing you to select a mode, so you'll see the token type names and styles defined for that mode and be able to customize them for that mode (and include a "Use default settings" check-box which is set by default, like Global Options -> Editing).
5. When rules are delegated to another mode file (e.g. when one language is embedded in a buffer of another one), use the styles of the delegated mode for its rules. So tokens handled by the PHP mode file will use PHP styles, no matter whether they appear in a PHP buffer or in an HTML buffer with embedded PHP.

Am I right?
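A rough sketch of steps 1, 2, 3 and 5 from the list above, under my own assumptions: slots are plain integers standing for TOKEN1, TOKEN2, ..., each token remembers which mode produced it, and the classes `ModeDefinition`, `ProducedToken` and `SlotStyler` are hypothetical, not part of jEdit.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mode: gives each generic slot (TOKEN<n>) a name and, optionally, a style. */
class ModeDefinition {
    final String name;
    private final Map<Integer, String> slotNames = new HashMap<>();
    private final Map<Integer, String> slotStyles = new HashMap<>();

    ModeDefinition(String name) {
        this.name = name;
    }

    /** A null style means "use default settings" for this slot. */
    ModeDefinition slot(int slot, String meaning, String style) {
        slotNames.put(slot, meaning);
        if (style != null) {
            slotStyles.put(slot, style);
        }
        return this;
    }

    String nameOf(int slot) {
        return slotNames.getOrDefault(slot, "TOKEN" + slot);
    }

    String styleOf(int slot) {
        return slotStyles.get(slot);
    }
}

/** A token remembers the mode whose rules produced it, even when that mode was delegated to. */
class ProducedToken {
    final int slot;
    final ModeDefinition producedBy;

    ProducedToken(int slot, ModeDefinition producedBy) {
        this.slot = slot;
        this.producedBy = producedBy;
    }
}

class SlotStyler {
    private final Map<Integer, String> globalStyles = new HashMap<>();

    void setGlobalStyle(int slot, String style) {
        globalStyles.put(slot, style);
    }

    /** Style of the producing mode first, then the global default for the slot. */
    String styleFor(ProducedToken token) {
        String own = token.producedBy.styleOf(token.slot);
        return own != null ? own : globalStyles.get(token.slot);
    }
}
```

Because the style is resolved from the producing mode rather than from the buffer's mode, a PHP token gets PHP styles whether it sits in a PHP buffer or inside an HTML buffer.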

2007-09-30 23:28:56
pekarna

Logged In: YES
user_id=1053064
Originator: YES

Yes, that is close to what I am hoping for. I think that the current number of token types would suffice for each separate language. If not, one could create a second mode file for the same language and delegate to the tokens defined there. However, I am not sure whether that would be easier to implement than the "unlimited" tokens solution. I wish I had time to get familiar with jEdit's core.

2008-03-14 10:17:15
itowi

Logged In: YES
user_id=1889436
Originator: NO

Just another suggestion for how to handle mixed language files:
In PHPedit, only one language is highlighted at a time (depending on where the caret is) while all other parts of the file are 'dimmed' (grey text on white background). This could translate to jEdit as follows:

1. Introduce one other token type for the 'dimmed' text (or associate a text style for 'dimmed' display with each language)
2. Optionally display anything that is delegated to a mode file other than the one for the current caret position in the (its?) 'dimmed' style (the alternative being the old behavior of highlighting all of the text)
3. Have separate mode files for each language, i.e., don't mix (some of) HTML, PHP, JavaScript and CSS into one mode file (which would seem to me like a cleaner solution anyway)

This would seem to require few changes to the current implementation while still enabling a clear distinction between the different languages.
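The dimming rule from points 1 and 2 could look something like the sketch below. It is only an illustration: `StyledSpan`, `CaretDimmer` and the `DIMMED` style name are hypothetical, and it assumes the highlighter already yields spans tagged with the mode that produced them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical highlighted span: [start, end) offsets, producing mode, normal style. */
class StyledSpan {
    final int start;
    final int end;
    final String mode;
    final String style;

    StyledSpan(int start, int end, String mode, String style) {
        this.start = start;
        this.end = end;
        this.mode = mode;
        this.style = style;
    }

    boolean contains(int offset) {
        return offset >= start && offset < end;
    }
}

class CaretDimmer {
    static final String DIMMED = "DIMMED";

    /** Mode of the span under the caret, or null if the caret is outside all spans. */
    static String modeAt(List<StyledSpan> spans, int caret) {
        for (StyledSpan span : spans) {
            if (span.contains(caret)) {
                return span.mode;
            }
        }
        return null;
    }

    /** With dimOthers off this is the old behavior: every span keeps its own style. */
    static List<String> effectiveStyles(List<StyledSpan> spans, int caret, boolean dimOthers) {
        String active = modeAt(spans, caret);
        List<String> result = new ArrayList<>();
        for (StyledSpan span : spans) {
            boolean dim = dimOthers && active != null && !active.equals(span.mode);
            result.add(dim ? DIMMED : span.style);
        }
        return result;
    }
}
```

All spans of the caret's language stay highlighted, not just the one under the caret, which matches the "one language at a time" description.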

2008-06-29 13:08:04
pekarna

Logged In: YES
user_id=1053064
Originator: YES

I tried jEdit again after a long time, and I see that there is still no support for multiple-language files (HTML+PHP, e.g.).
The current system makes such files very confusing and almost unreadable.

- equal signs for HTML attributes have the same color as operators - very ugly
- PHP delimiter has the same color as the HTML tags - makes them almost invisible, although they should be most visible (see the attachments)
- comments have the same color in PHP and in HTML - confusing when quickly scanning the file
and more...


Is there any plugin that could widen the variety of language tokens, or better, provide a non-trivial syntax highlighting, which could color files with multiple languages?

Thanks, Ondra
