Ace -- Syntax Highlighting

Creating a Syntax Highlighter for Ace

Creating a new syntax highlighter for Ace is simple. You need to define two pieces of code: a new mode and a new set of highlighting rules.

Where to Start

We recommend using the Ace Mode Creator when defining your highlighter. It lets you inspect the tokens produced for your code and shows a live preview of the highlighter in action.

Ace Mode Creator: https://ace.c9.io/tool/mode_creator.html

Defining a Mode

Every language needs a mode. A mode contains the paths to a language's syntax highlighting rules, indentation rules, and code folding rules. Without a mode, Ace knows nothing about the finer aspects of your language.

Here is the starter template we'll use to create a new mode:

 

define(function(require, exports, module) {
    "use strict";

    var oop = require("../lib/oop");
    // defines the parent mode
    var TextMode = require("./text").Mode;
    var MatchingBraceOutdent = require("./matching_brace_outdent").MatchingBraceOutdent;
    // needed by createWorker() below
    var WorkerClient = require("../worker/worker_client").WorkerClient;

    // defines the language-specific highlighting and folding rules
    var MyNewHighlightRules = require("./mynew_highlight_rules").MyNewHighlightRules;
    // the folding template further down exports its class as FoldMode
    var MyNewFoldMode = require("./folding/mynew").FoldMode;

    var Mode = function() {
        // set everything up
        this.HighlightRules = MyNewHighlightRules;
        this.$outdent = new MatchingBraceOutdent();
        this.foldingRules = new MyNewFoldMode();
    };
    oop.inherits(Mode, TextMode);

    (function() {
        // configure comment start/end characters
        this.lineCommentStart = "//";
        this.blockComment = {start: "/*", end: "*/"};

        // special logic for indent/outdent.
        // By default Ace keeps the indentation of the previous line
        this.getNextLineIndent = function(state, line, tab) {
            var indent = this.$getIndent(line);
            return indent;
        };

        this.checkOutdent = function(state, line, input) {
            return this.$outdent.checkOutdent(line, input);
        };

        this.autoOutdent = function(state, doc, row) {
            this.$outdent.autoOutdent(doc, row);
        };

        // create a worker for live syntax checking
        // (assumes you also provide lib/ace/mode/mynew_worker.js)
        this.createWorker = function(session) {
            var worker = new WorkerClient(["ace"], "ace/mode/mynew_worker", "NewWorker");
            worker.attachToDocument(session.getDocument());
            worker.on("errors", function(e) {
                session.setAnnotations(e.data);
            });
            return worker;
        };

    }).call(Mode.prototype);

    exports.Mode = Mode;
});

What's going on here? First, you define the path to TextMode (more on this later). Then you point the mode to your definitions of the highlighting rules and the code folding rules. Finally, you wire everything up so those rules can be found, and export the Mode so that it can be consumed. That's it!

 

Regarding TextMode, notice that it's used only once: oop.inherits(Mode, TextMode). If your new language depends on the rules of another language, you can inherit those rules and extend them with your language's own requirements. For example, PHP inherits from HTML, since PHP can be embedded directly inside .html pages. You can inherit from TextMode, or from any other existing mode that already relates to your language.
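
To make the inheritance concrete, here is a small standalone sketch of prototype-based inheritance in the style of oop.inherits(Mode, TextMode). It does not load Ace: `inherits`, `TextMode` and `MyNewMode` below are simplified stand-ins written for illustration, not Ace's source. The child mode picks up every method of its parent and overrides only what it needs. As in the template, inherits() runs before the child's own prototype members are added, because it replaces the prototype object.

```javascript
// Simplified stand-in for oop.inherits: point the child's prototype at a new
// object whose prototype is the parent's prototype.
function inherits(ctor, superCtor) {
    ctor.super_ = superCtor;
    ctor.prototype = Object.create(superCtor.prototype, {
        constructor: { value: ctor, enumerable: false, writable: true, configurable: true }
    });
}

// Hypothetical parent mode with some default behaviour.
function TextMode() {}
TextMode.prototype.lineCommentStart = "";
TextMode.prototype.getNextLineIndent = function (state, line, tab) {
    // Keep the indentation of the previous line.
    return line.match(/^\s*/)[0];
};

// Hypothetical child mode: inherit first, then override one setting.
function MyNewMode() {}
inherits(MyNewMode, TextMode);
MyNewMode.prototype.lineCommentStart = "//";

const mode = new MyNewMode();
console.log(mode.lineCommentStart);                                        // prints: //
console.log(JSON.stringify(mode.getNextLineIndent("start", "    foo();", "  "))); // prints: "    "
console.log(mode instanceof TextMode);                                     // prints: true
```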

 

All Ace modes can be found in the lib/ace/mode folder.

Defining Syntax Highlighting Rules

The Ace highlighter can be thought of as a state machine. Regular expressions define the tokens for the current state, as well as the transitions into other states. Let's define mynew_highlight_rules.js, which the mode above uses.

All syntax highlighters start off looking something like this:

define(function(require, exports, module) {
    "use strict";

    var oop = require("../lib/oop");
    var TextHighlightRules = require("./text_highlight_rules").TextHighlightRules;

    var MyNewHighlightRules = function() {

        // prefer non-capturing groups (?:...) unless `token` is an array
        // that assigns one token per capturing group (see "Groupings")
        // regexps are ordered -> the first match is used
        this.$rules = {
            "start" : [
                {
                    // example rule: replace with your language's tokens
                    token: "keyword",                  // String, Array, or Function: the CSS token to apply
                    regex: "\\b(?:if|else|while)\\b",  // String or RegExp: the regexp to match
                    next: "start"                      // [Optional] String: next state to enter
                }
            ]
        };
    };

    oop.inherits(MyNewHighlightRules, TextHighlightRules);

    exports.MyNewHighlightRules = MyNewHighlightRules;

});

The token state machine operates on whatever is defined in this.$rules. The highlighter always begins in the start state and works down that state's list of rules, looking for a matching regex. When one is found, the matched text is wrapped in a <span class="ace_<token>"> tag, where <token> is the rule's token property. Note that all tokens are given the ace_ prefix when they're rendered on the page.

 

Once again, we're inheriting from TextHighlightRules here. If our new language requires previously defined syntaxes, we could inherit from any other language's rule set instead. For more information on extending languages, see "Extending Highlighters" below.
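
To make the "first match wins" behaviour described above concrete, here is a deliberately simplified tokenizer for a single line. It is a hypothetical sketch, not Ace's Tokenizer. It tries the rules of the current state in order at the current position, emits a token for the first match, follows an optional next, and treats characters that no rule matches as plain text.

```javascript
// Simplified "first match wins" line tokenizer (hypothetical illustration).
function tokenizeLine(rules, line, state) {
    const tokens = [];
    let pending = ""; // characters that no rule matched
    let pos = 0;

    const flushText = () => {
        if (pending) tokens.push({ type: "text", value: pending });
        pending = "";
    };

    while (pos < line.length) {
        let hit = null;
        // Rules are tried in order; the first one matching at `pos` wins.
        for (const rule of rules[state]) {
            const re = new RegExp(rule.regex, "y"); // "y": match only at lastIndex
            re.lastIndex = pos;
            const m = re.exec(line);
            if (m && m[0].length > 0) {
                hit = { rule, value: m[0] };
                break;
            }
        }
        if (hit) {
            flushText();
            tokens.push({ type: hit.rule.token, value: hit.value });
            pos += hit.value.length;
            if (hit.rule.next) state = hit.rule.next; // optional state transition
        } else {
            pending += line[pos]; // fall back to plain text, one character at a time
            pos += 1;
        }
    }
    flushText();
    return { tokens, state };
}

const rules = {
    start: [
        { token: "keyword", regex: "\\b(?:var|function)\\b" },
        { token: "constant.numeric", regex: "\\d+" },
        { token: "identifier", regex: "[a-zA-Z_]\\w*" }
    ]
};

console.log(tokenizeLine(rules, "var x = 42", "start").tokens);
// keyword "var", text " ", identifier "x", text " = ", constant.numeric "42"
```

Swapping the order of the rules changes the result: if the identifier rule came first, var would be tokenized as an identifier, which is why more specific rules usually go first.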

 

Defining Tokens

The Ace highlighting system is heavily inspired by the TextMate language grammar. Most tokens follow TextMate's naming conventions. A thorough (albeit incomplete) list of tokens can be found on the Ace Wiki:

Token list: https://github.com/ajaxorg/ace/wiki/Creating-or-Extending-an-Edit-Mode#commonTokens

 

For the complete list of tokens, see tool/tmtheme.js (https://github.com/ajaxorg/ace/blob/master/tool/tmtheme.js). It is possible to add new token names, but doing so is outside the scope of this document.

 

Multiple tokens can be applied to the same text by adding dots to the token name; e.g. token: "support.function" wraps the text in a <span class="ace_support ace_function"> tag.
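
The dot rule can be sketched as a tiny rendering function. tokenToClassName, escapeHtml and renderToken are hypothetical helpers written for this illustration. Ace's actual renderer handles more cases, but the class names follow the pattern described above: every dot-separated part becomes its own ace_-prefixed class.

```javascript
// Map a dotted token type to space-separated, "ace_"-prefixed CSS classes.
function tokenToClassName(type) {
    return type.split(".").map(part => "ace_" + part).join(" ");
}

// Escape characters that would otherwise be read as HTML markup.
function escapeHtml(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap one token's text in a span carrying its classes.
function renderToken(token) {
    return '<span class="' + tokenToClassName(token.type) + '">' +
        escapeHtml(token.value) + "</span>";
}

console.log(renderToken({ type: "support.function", value: "alert" }));
// <span class="ace_support ace_function">alert</span>
```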

 

Defining Regular Expressions

Regular expressions can be defined either as a RegExp or as a String.

If you're using a RegExp literal, remember to delimit it with the / character, like this:

{
    token : "constant.language.escape",
    regex : /\$[\w\d]+/
}
 

A caveat of using string regular expressions is that every \ character must be escaped. That means that even an innocuous regular expression like this:

regex: "function\s*\(\w+\)"
 

Must actually be written like this:

regex: "function\\s*\\(\\w+\\)"
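
The difference is easy to check in Node. This standalone snippet compares the RegExp literal with three string versions; the variable names are just for illustration. Only the fully escaped string produces the same pattern as the literal. Note that \( inside a JavaScript string is simply (, so in the half-escaped version the parentheses silently become a capturing group instead of matching literal parentheses.

```javascript
const literal = /function\s*\(\w+\)/;

// Every backslash doubled: same pattern as the literal.
const correct = "function\\s*\\(\\w+\\)";
// "\(" inside a string is just "(", so the parentheses become a group.
const halfEscaped = "function\\s*\(\\w+\)";
// Single backslashes are all lost: "\s" becomes "s", "\w" becomes "w".
const unescaped = "function\s*\(\w+\)";

console.log(correct === literal.source); // true
console.log(halfEscaped);                // function\s*(\w+)
console.log(unescaped);                  // functions*(w+)
console.log(new RegExp(correct).test("function (foo)"));     // true
console.log(new RegExp(halfEscaped).test("function (foo)")); // false
```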
 

Groupings

You can use flat regexps, like (var), or regexps with matching groups, like ((a+)(b+)). There is a strict requirement that matching groups must cover the entire matched string; thus, (hel)lo is invalid. If you want to create a non-matching group, start the group with the ?: prefix; thus, (hel)(?:lo) is okay. You can, of course, create longer non-matching groups. For example:

{
    token : "constant.language.boolean",
    regex : /(?:true|false)\b/
},
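
The snippet below illustrates both points with plain JavaScript regexes. Non-capturing groups do not show up as extra entries in the match array, and a regex such as (hel)lo leaves part of the match outside any capturing group. countGroups and groupsCoverMatch are hypothetical helpers for this illustration, not part of Ace.

```javascript
// Count capturing groups in a match (-1 if the regex does not match).
function countGroups(regex, sample) {
    const m = regex.exec(sample);
    return m ? m.length - 1 : -1;
}

// For a regex with capturing groups: do the groups reassemble the whole match?
function groupsCoverMatch(regex, sample) {
    const m = regex.exec(sample);
    return m !== null && m.slice(1).join("") === m[0];
}

console.log(countGroups(/(true|false)\b/, "true"));   // 1
console.log(countGroups(/(?:true|false)\b/, "true")); // 0
console.log(groupsCoverMatch(/(hel)lo/, "hello"));    // false: "lo" is outside every group
console.log(groupsCoverMatch(/(hel)(lo)/, "hello"));  // true
```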
 

For flat regular expression matches, token can be a String, or a Function that takes a single argument (the match) and returns a string token. For example, using a function might look like this:

// lang provides arrayToMap(), which turns an array into a lookup object
var lang = require("../lib/lang");

var colors = lang.arrayToMap(
    ("aqua|black|blue|fuchsia|gray|green|lime|maroon|navy|olive|orange|" +
    "purple|red|silver|teal|white|yellow").split("|")
);

var fonts = lang.arrayToMap(
    ("arial|century|comic|courier|garamond|georgia|helvetica|impact|lucida|" +
    "symbol|system|tahoma|times|trebuchet|utopia|verdana|webdings|sans-serif|" +
    "serif|monospace").split("|")
);

...

{
    token: function(value) {
        if (colors.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.color";
        }
        else if (fonts.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.fonts";
        }
        else {
            return "text";
        }
    },
    regex: "\\-?[a-zA-Z_][a-zA-Z0-9_\\-]*"
}

 

If token is a function and the regex has capturing groups, the function should take as many arguments as there are groups and return an array of tokens.
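
Here is a sketch of how such a token function can be applied to a grouped match: the function receives one argument per capturing group and returns one token type per group. applyTokenFunction, keyValueToken and the key: value rule are hypothetical. They are modelled on the behaviour described above rather than copied from Ace. The sketch also accepts a single string return value, which then covers the whole match.

```javascript
// Hypothetical helper: apply a token *function* to a grouped match.
function applyTokenFunction(tokenFn, regex, text) {
    const m = new RegExp(regex).exec(text);
    if (!m) return [];
    const values = m.slice(1);                 // one value per capturing group
    const types = tokenFn.apply(null, values); // one token type per group
    if (typeof types === "string") {
        return [{ type: types, value: m[0] }]; // a single type covers the whole match
    }
    const tokens = [];
    for (let i = 0; i < types.length; i++) {
        if (values[i]) tokens.push({ type: types[i], value: values[i] }); // skip empty groups
    }
    return tokens;
}

// Hypothetical rule for "key: value" lines; a few keys count as built-ins.
const builtinKeys = new Set(["color", "width"]);
function keyValueToken(key, separator, value) {
    return [builtinKeys.has(key) ? "support.type" : "variable", "text", "string"];
}
const regex = "(\\w+)(\\s*:\\s*)(\\w*)";

console.log(applyTokenFunction(keyValueToken, regex, "color: red"));
// support.type "color", text ": ", string "red"
```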

 

For grouped regular expressions, token can be a String, in which case all matched groups are given that same token, like this:

{
    token: "identifier",
    regex: "(\\w+\\s*:)(\\w*)"
}
 

More commonly, though, token is an Array with the same length as the number of groups, and each group is given the token at the same position in the array. For a complicated regular expression, like one matching a function definition, that might look something like this:

{
    token : ["storage.type", "text", "entity.name.function"],
    regex : "(function)(\\s+)([a-zA-Z_][a-zA-Z0-9_]*\\b)"
}
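
The alignment between groups and tokens can be sketched as follows. tokensForMatch is a hypothetical helper, not Ace's tokenizer. With an array, token i is paired with capturing group i + 1; with a string, every group gets the same token. It throws when the array length and the number of groups disagree, and it drops empty groups so they do not produce empty tokens.

```javascript
// Hypothetical helper: pair each capturing group with its token.
function tokensForMatch(token, regex, text) {
    const m = new RegExp(regex).exec(text);
    if (!m) return [];
    const groups = m.slice(1);
    if (Array.isArray(token) && token.length !== groups.length) {
        throw new Error("number of tokens and regexp groups doesn't match");
    }
    return groups
        .map((value, i) => ({ type: Array.isArray(token) ? token[i] : token, value }))
        .filter(t => t.value); // empty or unmatched groups yield no token
}

console.log(tokensForMatch(
    ["storage.type", "text", "entity.name.function"],
    "(function)(\\s+)([a-zA-Z_][a-zA-Z0-9_]*\\b)",
    "function greet"
));
// storage.type "function", text " ", entity.name.function "greet"

console.log(tokensForMatch("identifier", "(\\w+\\s*:)(\\w*)", "name:bob"));
// identifier "name:", identifier "bob"
```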

 

Defining States

The syntax highlighting state machine stays in the start state until a rule defines a next state for it to advance to. At that point, the tokenizer stays in the new state until a rule advances it to another state. Eventually, you should return to the original start state.

Here's an example:

this.$rules = {
    "start" : [ {
        token : "text",
        regex : "<\\!\\[CDATA\\[",
        next : "cdata"
    } ],

    "cdata" : [ {
        token : "text",
        regex : "\\]\\]>",
        next : "start"
    }, {
        defaultToken : "text"
    } ]
};

In this extremely short sample, we're defining highlighting rules for when Ace detects a <![CDATA[ section. When one is encountered, the tokenizer moves from start into the cdata state. It remains there, applying the text token to any string it encounters. Finally, when it hits the closing ]]> sequence, it returns to the start state and continues tokenizing everything else.
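
The key point is that the state survives from one line to the next. The sketch below runs the CDATA rules above through a simplified, hypothetical line tokenizer (not Ace's Tokenizer). It also honours defaultToken and carries the end-of-line state into the next line.

```javascript
// Hypothetical line tokenizer: rules tried in order, optional `next`,
// `defaultToken` for unmatched text, end state returned for the next line.
function tokenizeLine(rules, line, startState) {
    let state = startState;
    const tokens = [];
    let pending = ""; // unmatched characters, all in the current state
    let pos = 0;

    const flush = () => {
        if (!pending) return;
        const def = rules[state].find(rule => rule.defaultToken);
        tokens.push({ state, type: def ? def.defaultToken : "text", value: pending });
        pending = "";
    };

    while (pos < line.length) {
        let hit = null;
        for (const rule of rules[state]) {
            if (!rule.regex) continue; // { defaultToken } entries have no regex
            const re = new RegExp(rule.regex, "y");
            re.lastIndex = pos;
            const m = re.exec(line);
            if (m && m[0].length > 0) {
                hit = { rule, value: m[0] };
                break;
            }
        }
        if (hit) {
            flush(); // unmatched text belongs to the state we are leaving
            tokens.push({ state, type: hit.rule.token, value: hit.value });
            pos += hit.value.length;
            if (hit.rule.next) state = hit.rule.next;
        } else {
            pending += line[pos];
            pos += 1;
        }
    }
    flush();
    return { tokens, state };
}

// The CDATA rules from the example above.
const rules = {
    start: [
        { token: "text", regex: "<\\!\\[CDATA\\[", next: "cdata" }
    ],
    cdata: [
        { token: "text", regex: "\\]\\]>", next: "start" },
        { defaultToken: "text" }
    ]
};

let carried = "start";
for (const line of ["a <![CDATA[ raw", "x < y ]]> done"]) {
    const result = tokenizeLine(rules, line, carried);
    console.log(JSON.stringify(line), "ends in state", result.state);
    carried = result.state;
}
// "a <![CDATA[ raw" ends in state cdata
// "x < y ]]> done" ends in state start
```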

 

Using the TMLanguage Tool

There is a tool that takes an existing tmlanguage file and does its best to convert it into JavaScript for Ace to consume. Here's what you need to get started:

  1. In the Ace repository, navigate to the tool folder.
  2. Run npm install to install the required dependencies.
  3. Run node tmlanguage.js <path_to_tmlanguage_file>; for example, node tmlanguage.js /Users/Elrond/elven.tmLanguage

Two files are created and placed in lib/ace/mode: one for the language mode and one for the set of highlight rules. You still need to add an entry for the mode to ace/ext/modelist.js and add a sample file for testing.

 

A Note on Accuracy

Your .tmlanguage file will be converted to the best of the converter's ability. It is an understatement to say that the tool is imperfect; language mode creation will probably never be fully automatable. There are several things the converter cannot determine on its own; for example:

  • The use of regular expression lookbehinds
    JavaScript historically had no lookbehind assertions (they only arrived in ES2018), so they need to be faked.
  • Deciding which state to transition to
    While the tool does create new states correctly, it labels them with generic names like state_2, state_10, etc.
  • Extending modes
    Many modes say something like include source.c, meaning "add all the rules from C highlighting." That syntax does not make sense to Ace or this tool (though you can, of course, extend existing highlighters).
  • Rule preference order
  • Gathering keywords
    Most likely, you'll need to take keywords from your language file and run them through createKeywordMapper().

However, if you already have a tmlanguage file for your language, the tool is an excellent way to get a quick start.
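
For the last item in the list above, here is a standalone stand-in for what a keyword mapper does. It turns a map of token type to "|"-separated keywords into a function that can be used directly as a rule's token. makeKeywordMapper is a hypothetical re-implementation for illustration. In an Ace highlighter you would call this.createKeywordMapper() instead; check its exact signature in lib/ace/mode/text_highlight_rules.js.

```javascript
// Hypothetical stand-in for a keyword mapper (not Ace's createKeywordMapper).
// `map` is { tokenType: "kw1|kw2|..." }; unknown words get `defaultToken`.
function makeKeywordMapper(map, defaultToken, ignoreCase) {
    // Object.create(null) avoids false hits on names like "toString".
    const lookup = Object.create(null);
    for (const type of Object.keys(map)) {
        for (const word of map[type].split("|")) {
            lookup[ignoreCase ? word.toLowerCase() : word] = type;
        }
    }
    return function (value) {
        const key = ignoreCase ? value.toLowerCase() : value;
        return lookup[key] || defaultToken;
    };
}

const keywordMapper = makeKeywordMapper({
    "keyword": "if|else|while|return",
    "constant.language": "true|false|null",
    "support.function": "print|len"
}, "identifier");

// A single identifier rule can then classify every word through the mapper.
const identifierRule = { token: keywordMapper, regex: "[a-zA-Z_$][a-zA-Z0-9_$]*\\b" };

console.log(identifierRule.token("while")); // keyword
console.log(identifierRule.token("len"));   // support.function
console.log(identifierRule.token("total")); // identifier
```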

 

Extending Highlighters

Suppose you're working on a LuaPage, PHP embedded in HTML, or a Django template. You'll need to create a syntax highlighter that takes all the rules from the original language (Lua, PHP, or Python) and extends them with some additional identifiers (<?lua, <?php, {%, for example). Ace lets you extend a highlighter easily using a few helper functions.

 

Getting Existing Rules

To get the existing syntax highlighting rules for a particular language, use the getRules() function. For example:

var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;
 
this.$rules = new HtmlHighlightRules().getRules();
 
/*
this.$rules == Same this.$rules as HTML highlighting
*/
 

Extending a Highlighter

The addRules method does one thing, and does it well: it adds new rules to an existing rule set, prefixing each new state with a given tag. For example, let's say you've got two sets of rules, defined like this:

this.$rules = {
    "start": [ /* ... */ ]
};

var newRules = {
    "start": [ /* ... */ ]
};

If you want to incorporate newRules into this.$rules, you'd do something like this:

this.addRules(newRules, "new-");
 
/*
this.$rules = {
"start": [ ... ],
"new-start": [ ... ]
};
*/
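
Here is a standalone sketch of that prefixing step, assuming plain objects for the rule sets. addRulesSketch is hypothetical and, unlike Ace, copies rules instead of modifying them in place. It also rewrites string next values with the prefix so that transitions stay inside the added rules. Ace's own addRules appears to do the same, but check text_highlight_rules.js before relying on it.

```javascript
// Hypothetical sketch of adding a prefixed rule set to an existing one.
function addRulesSketch(target, newRules, prefix) {
    for (const name of Object.keys(newRules)) {
        target[prefix + name] = newRules[name].map(rule => {
            const copy = Object.assign({}, rule); // leave the input untouched
            // Point transitions at the prefixed state names.
            if (typeof copy.next === "string" && !copy.next.startsWith(prefix)) {
                copy.next = prefix + copy.next;
            }
            return copy;
        });
    }
    return target;
}

const $rules = {
    start: [{ token: "identifier", regex: "\\w+" }]
};
const newRules = {
    start: [{ token: "string", regex: '"', next: "string" }],
    string: [{ token: "string", regex: '"', next: "start" }]
};

addRulesSketch($rules, newRules, "new-");
console.log(Object.keys($rules));          // [ 'start', 'new-start', 'new-string' ]
console.log($rules["new-start"][0].next);  // new-string
console.log($rules["new-string"][0].next); // new-start
```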

Extending Two Highlighters

The last available function combines both of these concepts, and it's called embedRules. It takes three parameters:

  1. An existing rule set to embed
  2. A prefix to apply to each state in the embedded rule set
  3. A set of new rules to add to every embedded state (in the example below, the rules that return to start)

Like addRules, embedRules adds to the existing this.$rules object.

To explain this visually, let's take a look at the syntax highlighter for Lua pages, which combines all of these concepts:

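var oop = require("../lib/oop");
var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;
var LuaHighlightRules = require("./lua_highlight_rules").LuaHighlightRules;

var LuaPageHighlightRules = function() {
    this.$rules = new HtmlHighlightRules().getRules();

    // in every HTML state, <% / <%= and <?lua / <?lua= switch to Lua
    for (var i in this.$rules) {
        this.$rules[i].unshift({
            token: "keyword",
            regex: "<\\%\\=?",
            next: "lua-start"
        }, {
            token: "keyword",
            regex: "<\\?lua\\=?",
            next: "lua-start"
        });
    }
    // embed the Lua states under the "lua-" prefix; %> and ?> return to HTML
    this.embedRules(LuaHighlightRules, "lua-", [
        {
            token: "keyword",
            regex: "\\%>",
            next: "start"
        },
        {
            token: "keyword",
            regex: "\\?>",
            next: "start"
        }
    ]);
};

// embedRules() is inherited from TextHighlightRules via HtmlHighlightRules
oop.inherits(LuaPageHighlightRules, HtmlHighlightRules);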

Here, this.$rules starts off as a set of HTML highlighting rules. To every state in this set, we add two new checks: one for <% (optionally <%=) and one for <?lua (optionally <?lua=). If either rule matches, the tokenizer moves into the lua-start state. Next, embedRules takes the existing set of LuaHighlightRules and applies the lua- prefix to each of its states. Finally, it adds two new checks for %> and ?>, allowing the state machine to return to start.
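
To see the resulting layout of states, here is a standalone sketch that mirrors the LuaPage example with toy rule sets. embedRulesSketch, htmlRules and luaRules are hypothetical and heavily simplified, not the real HtmlHighlightRules or LuaHighlightRules. The sketch prefixes the embedded states, rewrites their next values, and puts the escape rules first in every embedded state so that %> and ?> are tried before the embedded language's own rules.

```javascript
// Hypothetical sketch of embedding one rule set inside another.
// Plain objects; rules are copied instead of mutated.
function embedRulesSketch(target, embedded, prefix, escapeRules) {
    for (const name of Object.keys(embedded)) {
        const state = embedded[name].map(rule => {
            const copy = Object.assign({}, rule);
            // Keep transitions inside the embedded language.
            if (typeof copy.next === "string" && !copy.next.startsWith(prefix)) {
                copy.next = prefix + copy.next;
            }
            return copy;
        });
        // Escape rules come first and are NOT prefixed: they lead back out.
        const escapes = escapeRules.map(rule => Object.assign({}, rule));
        target[prefix + name] = escapes.concat(state);
    }
    return target;
}

// Toy stand-ins for the HTML and Lua rule sets.
const htmlRules = {
    start: [{ token: "meta.tag", regex: "<\\/?\\w+>" }]
};
const luaRules = {
    start: [
        { token: "keyword", regex: "\\b(?:local|function|end)\\b" },
        { token: "string", regex: "\\[\\[", next: "longstring" }
    ],
    longstring: [{ token: "string", regex: "\\]\\]", next: "start" }]
};

// As in the LuaPage example: every HTML state can switch into Lua.
for (const name of Object.keys(htmlRules)) {
    htmlRules[name].unshift(
        { token: "keyword", regex: "<\\%\\=?", next: "lua-start" },
        { token: "keyword", regex: "<\\?lua\\=?", next: "lua-start" }
    );
}

embedRulesSketch(htmlRules, luaRules, "lua-", [
    { token: "keyword", regex: "\\%>", next: "start" },
    { token: "keyword", regex: "\\?>", next: "start" }
]);

console.log(Object.keys(htmlRules)); // [ 'start', 'lua-start', 'lua-longstring' ]
console.log(htmlRules["lua-start"].map(rule => rule.next));
// [ 'start', 'start', undefined, 'lua-longstring' ]
```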

 

Code Folding

Adding new folding rules to your mode can be a little tricky. First, insert the following lines of code into your mode definition:

var MyFoldMode = require("./folding/newrules").FoldMode;

...
var MyMode = function() {

    ...

    this.foldingRules = new MyFoldMode();
};

 

You'll define your code folding rules in the lib/ace/mode/folding folder. Here's a template that you can use to get started:

define(function(require, exports, module) {
    "use strict";

    var oop = require("../../lib/oop");
    var Range = require("../../range").Range;
    var BaseFoldMode = require("./fold_mode").FoldMode;

    var FoldMode = exports.FoldMode = function() {};
    oop.inherits(FoldMode, BaseFoldMode);

    (function() {

        // regular expressions that identify starting and stopping points
        this.foldingStartMarker = /\{\s*$/;  // placeholder: a line ending in "{"
        this.foldingStopMarker = /^\s*\}/;   // placeholder: a line starting with "}"

        this.getFoldWidgetRange = function(session, foldStyle, row) {
            var line = session.getLine(row);

            // test each line, and return a range of segments to collapse
        };

    }).call(FoldMode.prototype);

});

 

Just like TextMode for syntax highlighting, BaseFoldMode contains the starting point for code folding logic. foldingStartMarker defines your opening folding point, while foldingStopMarker defines the stopping point. For example, for a C-style folding system, these values might look like this:

this.foldingStartMarker = /(\{|\[)[^\}\]]*$|^\s*(\/\*)/;
this.foldingStopMarker = /^[^\[\{]*(\}|\])|^[\s\*]*(\*\/)/;

 

These regular expressions identify the symbols to pay attention to: {, [ and /* open a fold, while }, ] and */ close one. getFoldWidgetRange matches lines against these regular expressions and, when a match is found, returns the range of the relevant fold. For more information on the Range object, see the Ace API documentation.
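
The two markers can be tried out directly. This standalone snippet runs the C-style regexes from above against a few sample lines, which are made up for illustration. A single-line object such as {a: 1}; matches neither marker: the start marker requires that no closing bracket follow the opening one on the same line, and the stop marker requires that no opening bracket precede the closing one.

```javascript
// The C-style markers from above.
const foldingStartMarker = /(\{|\[)[^\}\]]*$|^\s*(\/\*)/;
const foldingStopMarker = /^[^\[\{]*(\}|\])|^[\s\*]*(\*\/)/;

const samples = [
    "function hello_world() {", // start
    "    var list = [",         // start
    "    /* a block comment",   // start
    "     */",                  // stop
    "}",                        // stop
    "var x = {a: 1};"           // neither
];

// Print the start/stop flags for each sample line.
for (const line of samples) {
    console.log(JSON.stringify(line),
        "start:", foldingStartMarker.test(line),
        "stop:", foldingStopMarker.test(line));
}
```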

Again, for a C-style folding mechanism, the range returned for a starting fold might be computed like this:

var line = session.getLine(row);
var match = line.match(this.foldingStartMarker);
if (match) {
    var i = match.index;

    // an opening bracket ({ or [): fold up to the matching closing bracket
    if (match[1])
        return this.openingBracketBlock(session, match[1], row, i);

    // otherwise the line opens a block comment (/*)
    var range = session.getCommentFoldRange(row, i + match[0].length);
    range.end.column -= 2;
    return range;
}

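Let's say row 0 contains the code block hello_world() {. The { is matched at column 14, so match[1] is set and openingBracketBlock is used. The range it returns starts just after the bracket and ends at the matching closing bracket, roughly:

{
    start: { row: 0, column: 15 },
    end: { row: <row of the matching }>, column: <column of the matching }> }
}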
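
A simplified version of the bracket-matching step can be written without Ace. findBracketFoldRange is a hypothetical helper: starting from the bracket at (row, column), it counts nesting depth until the matching closing bracket and returns a range shaped like { start, end }. It ignores brackets inside strings and comments, which a real fold mode would have to skip.

```javascript
// Hypothetical helper: range from just after an opening bracket to its match.
function findBracketFoldRange(lines, row, column) {
    const open = lines[row][column];
    const close = { "{": "}", "[": "]", "(": ")" }[open];
    if (!close) return null; // not an opening bracket
    let depth = 0;
    for (let r = row; r < lines.length; r++) {
        const startCol = r === row ? column : 0;
        for (let c = startCol; c < lines[r].length; c++) {
            const ch = lines[r][c];
            if (ch === open) {
                depth++;
            } else if (ch === close) {
                depth--;
                if (depth === 0) {
                    return { start: { row, column: column + 1 }, end: { row: r, column: c } };
                }
            }
        }
    }
    return null; // unbalanced brackets
}

const code = [
    "hello_world() {",
    "    if (x) { y(); }",
    "}"
];
const col = code[0].indexOf("{"); // 14
console.log(JSON.stringify(findBracketFoldRange(code, 0, col)));
// {"start":{"row":0,"column":15},"end":{"row":2,"column":0}}
```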

Testing Your Highlighter

The best way to test your tokenizer is to see it live, right? To do that, you'll want to modify the live Ace demo, kitchen-sink.html in the root Ace directory, to preview your changes:

  1. Add an entry to supportedModes in ace/ext/modelist.js.
  2. Add a sample file to demo/kitchen-sink/docs/ with the same name as the mode file.

Once you set this up, you should be able to witness a live demonstration of your new highlighter.
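
To see what the modelist.js entry is for, here is a simplified, hypothetical model of a mode list. Each entry maps a display name to "|"-separated file extensions, and a lookup picks the mode whose extensions match a file name. supportedModes, modeForPath and the ace/mode/<name> naming below are illustrative assumptions; check the real ace/ext/modelist.js for the exact entry format.

```javascript
// Hypothetical, simplified model of a mode list (not Ace's modelist.js).
// Each entry: display name -> ["ext1|ext2|..."].
const supportedModes = {
    JavaScript: ["js|jsm|jsx"],
    MyNew: ["mynew"] // the kind of entry you would add for your own mode
};

// Return the module path of the first mode whose extensions match `path`.
function modeForPath(path) {
    for (const name of Object.keys(supportedModes)) {
        const extensions = supportedModes[name][0];
        const re = new RegExp("^.*\\.(" + extensions + ")$", "i");
        if (re.test(path)) {
            return "ace/mode/" + name.toLowerCase();
        }
    }
    return "ace/mode/text"; // fallback: plain text
}

console.log(modeForPath("example.mynew")); // ace/mode/mynew
console.log(modeForPath("app.jsx"));       // ace/mode/javascript
console.log(modeForPath("README"));        // ace/mode/text
```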

Adding Automated Tests

Automated tests for a highlighter are easy to add. They are not required, but they can help during development.

In lib/ace/mode/_test, create a file named

text_<modeName>.txt

containing some example code. (You can skip this if the sample document you added to demo/kitchen-sink/docs both looks good and covers various edge cases of your language's syntax.)

 

Run node highlight_rules_test.js -gen to save the current output of your tokenizer in tokens_<modeName>.json.

After that, running node highlight_rules_test.js optionalLanguageName compares the output of your tokenizer with the expected output you saved.

Any files ending with the _test.js suffix are automatically run by Ace's Travis CI server.

posted @ 2018-11-20 12:26 她在村口等我