 N4JS Design Specification

Last Updated: 2019-08-08

Authors:
Jens von Pilgrim, Jakub Siberski, Mark-Oliver Reiser, Torsten Krämer, Ákos Kitta, Sebastian Zarnekow, Lorenzo Bettini, Jörg Reichert, Kristian Duske, Marcus Mews, Minh Quang Tran, Luca Beurer-Kellner

This document contains the N4JS Design and Implementation documentation.

1. Introduction

This document describes design aspects of the N4JS compiler and IDE. It relies on the following N4JS related specifications:

• N4JS Language Specification [N4JSSpec]

1.1. Notation

We reuse the notation specified in [N4JSSpec].

1.2. IDE Components

The N4JS and N4JSIDE components are organized via features. The following features with their included plugins are defined (the common prefix "org.eclipse.n4js" is omitted from the plugin names):

Feature Plugin Description

org.eclipse.n4js.lang.sdk

N4JS core language with parser, validation etc.

org.eclipse.n4js

Xtext grammar with generator and custom code for N4JS, scoping (and binding) implementation, basic validation (and Xsemantics type system).

doc

(in doc folder) General documentation (including web page) written in AsciiDoc

external.libraries

Support for N4JS libraries shipped with the IDE, i.e. core N4JS library and mangelhaft.

ui

UI components for N4JS, e.g., proposal provider, labels, outline, quickfixes.

jsdoc

Parser and model for JSDoc

external.libraries.update

Not included in feature. Updates the external library plugin

org.eclipse.n4js.ts.sdk

Type System

ts

Xtext grammar with generator and custom code for type expressions and standalone type definitions.

ts.model

Xcore based types model with helper classes etc.

ts.ui

Xtext-generated UI for the type system; not really used, as these TS files are not editable by users.

org.eclipse.n4js.unicode.sdk

common.unicode

Xtext grammar with generator and custom code used by all other grammars for proper unicode support.

org.eclipse.n4js.regex.sdk

Regular expression grammar and UI, used by N4JS grammar and UI

regex

Xtext grammar with generator and custom code used by N4JS grammars for regular expressions.

regex.ui

UI components for regular expressions, e.g., proposal provider, labels, outline, quickfixes.

org.eclipse.n4js.sdk

This feature defines the N4JSIDE. It contains the core UI plugins and includes (almost) all other features.

environments

Utility plugin, registers n4scheme for EMF proxy resolution.

model

Xcore based N4JS model with helper classes etc.

product

N4JSIDE main application.

releng.utils

(in releng folder) Contains utility classes only used for building the system, e.g., tools for generating antlr based parser with extended features.

utils

general utilities

utils.ui

general UI utilities

org.eclipse.n4js.compiler.sdk

Compilers and Transpilers

generator.common

Not included in feature, logically associated.

N4JS headless generator (i.e. command line compiler).

transpiler

Generic transpiler infrastructure

transpiler.es

Transpiler to compile to EcmaScript

org.eclipse.n4js.json.sdk

N4JS JSON

json

Xtext grammar with generator and custom code for an extensible JSON language support. Used in N4JS for the project description in terms of a package.json file.

json.ui

UI components for extensible JSON language support, e.g., proposal provider, labels, outline.

json.model

Not included in feature, logically associated. Xcore based model for the JSON language.

org.eclipse.n4js.semver.sdk

Semantic version string support.

semver

Parser and tools for semantic version strings.

semver.ui

UI tools for semantic version strings.

semver.model

Not included in feature, logically associated. Xcore model of semantic version strings.

org.eclipse.n4js.runner.sdk

Runners for executing N4JS or JavaScript code

runner

Generic interfaces and helper for runners, i.e. JavaScript engines executing N4JS or JavaScript code.

runner.chrome

Runner for executing N4JS or JavaScript with Chrome.

runner.chrome.ui

UI classes for launching the Chrome runner via the org.eclipse.debug.ui

runner.nodejs

Runner for executing N4JS or JavaScript with node.js.

runner.nodejs.ui

UI classes for launching the node.js runner via the org.eclipse.debug.ui

runner.ui

Generic interfaces for configuring N4JS runner via the debug ui.

org.eclipse.n4js.tester.sdk

Runners and UI for tests (via mangelhaft).

tester

Generic interfaces and helper for testers, i.e. JavaScript engines executing N4JS tests (using mangelhaft).

tester.nodejs

Tester based on the nodejs runner for executing mangelhaft tests with node.js

tester.nodejs.ui

UI for showing test results.

tester.ui

Configuration of tests via the debug UI.

org.eclipse.n4js.jsdoc2spec.sdk

JSDoc 2 Specification

jsdoc2spec

Exporter to generate API documentation with specification tests awareness

jsdoc2spec.ui

UI for API doc exporter

org.eclipse.n4js.xpect.sdk

xpect

Xpect test methods.

xpect.ui

UI for running Xpect test methods from the N4JSIDE (for creating bug reports).

org.eclipse.n4js.smith.sdk

Feature for internal N4JS IDE plugins only intended for development (for example, the AST Graph view).

smith

Non-UI classes for tools for smiths, that is, tools for developers of the N4JS IDE such as AST views etc.

smith.ui

UI classes for tools for smiths, that is, tools for developers of the N4JS IDE such as AST views etc.

org.eclipse.n4js.tests.helper.sdk

Test helpers.

org.eclipse.n4js.dependencies.sdk

Collection of all external non-ui dependencies, used for local mirroring of update sites.

org.eclipse.n4js.dependencies.ui.sdk

Collection of all external ui dependencies, used for local mirroring of update sites.

uncategorized plugins

flowgraphs

Control and data flow graph model and computer.

Fragments

not associated to features, only listed here for completeness

utils.logging

Fragment only, configuration for loggers, in particular for the product and for the tests

1.2.1. Naming Conventions

In the above sections, tests were omitted. We use the following naming conventions (by example) for tests and test helpers:

project

-

project.tests

tests for project, is a fragment

project.tests.helper

helper classes used ONLY by tests

project.tests.performance

performance tests

project.tests.integration

integration tests

project.ui

-

project.ui.tests

tests for ui project, fragment of project.ui

project.ui.tests.helper

helper classes used ONLY by tests

project.ui.tests.performance

-

tests.helper

general test helper

ui.tests.helper

general ui test helper

project.xpect.tests

xpect tests for the project; despite dependencies on the UI, they can be executed as plain JUnit tests

project.xpect.ui.tests

xpect tests for the project, need to be executed as eclipse plugin tests

Due to Maven, tests are in subfolder tests (incl. helpers), implementation bundles in plugins, and release engineering related bundles in releng.

2. Eclipse Setup

2.1. System Requirements

In all cases, Java 11 is required to be installed on your system. Node.js version 10+ is also required, and for some tests you need Yarn to be globally installed.

2.2. Contribute

Eclipse developers who want to develop N4JS itself should use the Oomph Eclipse installer. The N4JS project is listed under "Eclipse Projects/N4JS". This setup installs the correct Eclipse version, creates a new workspace and clones all projects into it (for details see below).

2.2.1. Eclipse Installer

The recommended way to install the Eclipse IDE and set up the workspace is to use the Eclipse Installer. The installer can be downloaded from https://wiki.eclipse.org/Eclipse_Installer

Run the installer and apply the following steps:

1. change to "Advanced Mode" via the menu (upper-right corner) (no need to move the installer)

2. select a product, e.g. "Eclipse IDE for Eclipse Committers" with product version "2019-06". Hint: Do not select "latest" because this will cause automatic updates which may lead to weird errors later on.

3. double-click the entry Eclipse Projects/N4JS so that it is shown in the catalog view below

4. on the next page, configure paths accordingly. You only have to configure the installation and workspace folder. You may want to use git with https instead of ssh.

5. start installation

The installer will then guide you through the rest of the installation. All plug-ins are downloaded and configured automatically, and so is the workspace, including cloning the necessary git repository.

If you have selected git with SSH, you may run into problems. In this case you can re-run the scripts and select HTTPS instead; this should work in any case.

Once the installer scripts are done, the git repository has been cloned and the workspace has been configured (including the project set setup). The automatic build then kicks in, as you can see in the status bar.

The build will show a lot of errors while it is still running. Eventually the whole project should compile without any errors. Unfortunately, due to a known issue, two problems exist. Please have a look at the linked issue on how to fix that (it is quite easy).

2.2.1.1. Changing the Setup Script

The setup script is stored at

n4js/releng/org.eclipse.n4js.targetplatform/N4JS.setup

Details about Oomph-Setup scripts can be found at

Manual IDE configuration is not recommended!

For a manual install, clone the code and import all top-level projects from the docs, features, plugins, releng, testhelpers, and tests folders. Activate the targetplatform contained in the releng/org.eclipse.n4js.targetplatform/ project.

The N4JS IDE is developed with Eclipse 2019-06 or later since the system is based on Eclipse anyway. It is almost impossible to use another IDE to develop Eclipse plugins. The list of required plugins includes:

It is important to use the latest version of Xtext and the corresponding service release of Xcore. You will find the latest version numbers and plugins used in the target platform definition at https://github.com/eclipse/n4js/blob/master/releng/org.eclipse.n4js.targetplatform/org.eclipse.n4js.targetplatform.target

You may need to adjust some settings in Eclipse, most importantly

• Text file encoding to Other: UTF-8 and

• New text file line delimiter to Unix.

3. Release Engineering

3.1. Nightly build on Eclipse infrastructure

The N4JS IDE, the headless n4jsc.jar, and the N4JS update site are built on the Eclipse Common Build Infrastructure (CBI). For this purpose the N4JS project is using a dedicated Jenkins instance, referred to as a "Jenkins Instance Per Project" (JIPP) in Eclipse CBI documentation. At this time, the N4JS project’s JIPP is running on the "old" infrastructure, not yet using docker. This will be migrated at a later point in time.

The N4JS JIPP is available at: https://ci.eclipse.org/n4js/

The nightly build performs the following main steps:

1. compile the N4JS implementation,

2. build the n4jsc.jar, the IDE products for macOS, Windows, Linux, and the update site,

3. run tests,

4. sign the IDE product for macOS and package it in a .dmg file,

6. move all artifacts older than 7 days from download.eclipse.org to archive.eclipse.org.

Details about all the above steps can be found in the Jenkinsfile eclipse-nightly.jenkinsfile, located in the root folder of the N4JS source repository on GitHub.

The most accurate documentation for our JIPP can be found at https://wiki.eclipse.org/IT_Infrastructure_Doc. Note that many other documents do not apply to our JIPP, at the moment, as they refer to the new infrastructure, e.g. https://wiki.eclipse.org/CBI and https://wiki.eclipse.org/Jenkins.

3.2. Build the N4JS IDE from command line

Ensure you have

• Java 11

• Maven 3.2.x and

• Node.js 8

Clone the repository

git clone https://github.com/Eclipse/n4js.git

Change to the n4js folder:

cd n4js

Run the Maven build:

mvn clean verify

You may have to increase the memory for maven via export MAVEN_OPTS="-Xmx2048m" (Unix) or set MAVEN_OPTS="-Xmx2048m" (Windows).

Available optional maven profiles are:

buildProduct

create IDE products (Windows, macOS, Linux) and a jar for headless compilation

execute-plugin-tests

run OSGi tests (without UI)

execute-plugin-ui-tests

run UI-based OSGi tests

execute-ecmas-tests

run ECMA test suite

execute-smoke-tests

run generated tests using corrupted source code as input

execute-accesscontrol-tests

run generated tests for checking accessibility of class/interface members

execute-hlc-integration-tests

run integration tests using the headless jar (requires docker!)

Available system properties:

noTests

suppress execution of all tests

startAndKeepVerdaccio

enforce starting and suppress stopping of the test verdaccio (see Test Verdaccio containing n4js-libs)

For extending the N4JS-language in a different project, the org.eclipse.n4js.releng.util module needs to be published as a maven-plugin. You can deploy this SNAPSHOT-artifact to a local folder by providing the local-snapshot-deploy-folder-property pointing to an absolute path in the local file system:
mvn clean deploy -Dlocal-snapshot-deploy-folder=/var/lib/my/folder/local-mvn-deploy-repository

The existence of local-snapshot-deploy-folder will trigger a profile enabling the deploy-goal for the project org.eclipse.n4js.releng.util

3.2.2. Test Verdaccio containing n4js-libs

If profile execute-hlc-integration-tests is active, a local verdaccio instance is started and populated with freshly-compiled n4js-libs (the libraries located under top-level folder /n4js-libs) and is stopped before the end of the build. The verdaccio instance is started as a docker container called n4js-test-verdaccio.

When giving -DstartAndKeepVerdaccio on the command line, such a test verdaccio will always be started/populated but never stopped, regardless of whether profile execute-hlc-integration-tests is active or not. This is useful to enforce starting of the test verdaccio (even without running integration tests) and then reusing it in subsequent builds.

3.2.3. Generation of Eclipse help for spec and design document

The HTML pages for the N4JSSpec and N4JSDesign documents are generated by Asciispec from the Asciidoc sources in the projects org.eclipse.n4js.spec and org.eclipse.n4js.design.

Figure 1. The process of creating Eclipse help for N4JSSpec

Figure 1 shows the generation process for the N4JSSpec document. The process for N4JSDesign (and other adoc documents) is the same. The following explains the diagram.

• Asciispec is used to compile the source N4JSSpec Asciidoc into a single large N4JSSpec.html file which contains all the chapters. The custom parameter -a eclipse-help-mode indicates that special header and footer styles as well as a special CSS style should be used (i.e. no table of contents menu, no download links etc.). Here, we use the possibility provided by Asciidoctor to configure header/footer as well as CSS style via the parameters :docinfodir: and :stylesheet:.

• Our custom tool Chunker splits N4JSSpec.html (and other documents) into multiple chunked HTML files, each of which corresponds to either the index file or a chapter. It automatically re-writes internal links.

• Another custom tool, EclipseHelpTOCGenerator, takes the Docbook file N4JSSpec.xml and generates an XML file describing the table of contents (TOC) in the Eclipse format. This TOC file references the chunked HTML files above.

• Another custom tool, IndexTocGenerator, takes the Docbook file N4JSSpec.xml similar to EclipseHelpTOCGenerator, but it generates an HTML fragment which can be embedded into the index.html page generated by the Chunker (thus it has to run before the Chunker in that case).
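The chunking and link re-writing step described above can be illustrated with a minimal Java sketch. It is not the actual Chunker: the class name ChunkerSketch, the assumption that every chapter starts with an h2 element carrying an id, and the file naming scheme (id plus ".html") are all hypothetical and only serve to show the idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of chunking one large HTML file; not the real Chunker tool. */
public final class ChunkerSketch {

    // Assumption: each chapter starts with <h2 id="...">.
    private static final Pattern CHAPTER = Pattern.compile("<h2 id=\"([^\"]+)\">");
    private static final Pattern ANY_ID = Pattern.compile("id=\"([^\"]+)\"");
    private static final Pattern INTERNAL_LINK = Pattern.compile("href=\"#([^\"]+)\"");

    /** Returns a map from chunk file name to chunk content, in document order. */
    public static Map<String, String> chunk(String html) {
        // 1. Cut the document at chapter headings; everything before the first chapter is the index.
        Map<String, String> chunks = new LinkedHashMap<>();
        Matcher chapters = CHAPTER.matcher(html);
        String currentFile = "index.html";
        int currentStart = 0;
        while (chapters.find()) {
            chunks.put(currentFile, html.substring(currentStart, chapters.start()));
            currentFile = chapters.group(1) + ".html";
            currentStart = chapters.start();
        }
        chunks.put(currentFile, html.substring(currentStart));

        // 2. Remember in which file each anchor id ended up.
        Map<String, String> fileOfId = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : chunks.entrySet()) {
            Matcher ids = ANY_ID.matcher(entry.getValue());
            while (ids.find()) {
                fileOfId.put(ids.group(1), entry.getKey());
            }
        }

        // 3. Re-write internal links so that they name the file containing the target.
        for (Map.Entry<String, String> entry : chunks.entrySet()) {
            Matcher links = INTERNAL_LINK.matcher(entry.getValue());
            StringBuffer rewritten = new StringBuffer();
            while (links.find()) {
                String id = links.group(1);
                String targetFile = fileOfId.getOrDefault(id, entry.getKey());
                String replacement = "href=\"" + targetFile + "#" + id + "\"";
                links.appendReplacement(rewritten, Matcher.quoteReplacement(replacement));
            }
            links.appendTail(rewritten);
            entry.setValue(rewritten.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"#b\">see B</a></p><h2 id=\"a\">A</h2><h2 id=\"b\">B</h2>";
        chunk(html).forEach((file, content) -> System.out.println(file + " -> " + content));
    }
}
```

A real implementation would work on a parsed HTML tree rather than on regular expressions; the sketch only shows why the links have to be re-written once the anchors are spread over several files.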

3.3. Updating frameworks and dependencies

3.3.1. Update of Eclipse, EMF, Xtext, etc.

For updating the N4JS IDE to a new version of Eclipse, EMF, Xtext, etc. follow these steps:

1. Create a new branch.

2. Bump versions of all dependencies mentioned in file N4JS.setup:

1. Update all labels that refer to the version of the Oomph setup (search for "label!" to find them).

2. Choose a new Eclipse version and define this in N4JS.setup.

3. For those other dependencies that come with Eclipse (e.g. EMF, Xtext) find out which version matches the chosen Eclipse version and define that version in N4JS.setup.
Tip: use the contents list of the SimRel you are targeting, e.g. https://projects.eclipse.org/releases/2019-03

4. For those other dependencies that are available via the Eclipse Orbit, find out which version is the latest version available in the Orbit and define that version in N4JS.setup.
(choose the correct link for the chosen Eclipse version!)

5. For all remaining dependencies (i.e. unrelated to Eclipse and not in Orbit), choose a version to use and define it in N4JS.setup.

3. Check Require-Bundle sections of MANIFEST.MF files by searching for related bundle names or for ;bundle-version=":

1. There should be at most one version constraint for a specific bundle
NOTE: the version constraints in the MANIFEST.MF files are just lower bounds and - at this time - we do not bump them to the latest version, in most cases.

2. There should be no version constraints to our bundles (i.e. org.eclipse.n4js…​)

4. Review parent pom.xml files, i.e. releng/org.eclipse.n4js.parent/pom.xml:

1. Update property xtext-version.

2. Check all other *-version properties and update them where needed.

5. Update target platform file org.eclipse.n4js.targetplatform.target using Oomph’s auto-generation:

1. Start the Eclipse Installer.

2. Update the Eclipse Installer (using the button with the turning arrows).

3. On the second page, add the N4JS.setup file from your branch to the Eclipse Installer, using a GitHub raw(!) URL:
https://raw.githubusercontent.com/eclipse/n4js/BRANCH_NAME/releng/org.eclipse.n4js.targetplatform/N4JS.setup

4. Oomph a new development environment with this setup.

5. In the new Eclipse workspace created by Oomph, the target platform file should have uncommitted changes:

1. carefully review these changes, to be sure they make sense, and then

2. commit & push those changes to your branch.

6. Thoroughly test the new versions, including some manual(!) tests:

1. Run Jenkins builds.

2. Oomph another N4JS development environment with Eclipse Installer. This time, after Oomphing is completed, the target platform file should no longer have any uncommitted changes.

3. Ensure the following types of tests can be executed locally in the newly installed Eclipse:

1. plain JUnit tests (e.g. org.eclipse.n4js.lang.tests).

2. Plugin tests.

3. Plugin UI tests.

4. SWTBot tests.

5. Xpect tests (individual files and entire bundles; e.g. org.eclipse.n4js.spec.tests).

6. Xpect UI tests.

4. Ensure an N4JS IDE product can be launched from within the newly installed Eclipse using the launch configuration provided in the n4js repository.

5. After launching the N4JS IDE product, refresh the workspace and review/commit any changes in file N4JS__IDE.launch.

6. Download a product created in a Jenkins CI build and test it manually.

7. After merging to master: download a product created in a nightly build and test it manually. Ensure signing and JRE bundling are still working properly.

All the above steps need to be performed in the n4js-n4 repository, accordingly (e.g. file N4JS-N4.setup).

3.3.2. Update of the embedded JRE

For updating the embedded JRE inside the N4JS IDE follow these steps:

1. Obtain new JRE download locations for Linux, macOS and Windows with a common new version.

2. Update the location related properties in the pom.xml files of

1. n4js/builds/pom.xml

2. n4js/builds/org.eclipse.n4js.jre.linux.gtk.x86_64/pom.xml

3. n4js/builds/org.eclipse.n4js.jre.macosx.cocoa.x86_64/pom.xml

4. n4js/builds/org.eclipse.n4js.jre.win32.win32.x86_64/pom.xml

3. Update the versions at all following locations:

1. n4js/builds/org.eclipse.n4js.jre.linux.gtk.x86_64/META-INF/MANIFEST.MF

2. n4js/builds/org.eclipse.n4js.jre.linux.gtk.x86_64/META-INF/p2.inf

3. n4js/builds/org.eclipse.n4js.jre.macosx.cocoa.x86_64/META-INF/MANIFEST.MF

4. n4js/builds/org.eclipse.n4js.jre.macosx.cocoa.x86_64/META-INF/p2.inf

5. n4js/builds/org.eclipse.n4js.jre.win32.win32.x86_64/META-INF/MANIFEST.MF

6. n4js/builds/org.eclipse.n4js.jre.win32.win32.x86_64/META-INF/p2.inf

4. Update the openjdk docker image used as base image in the "FROM" line at the top of all docker files:

1. n4js-n4/jenkins/docker-build/Dockerfile

4. Tips and Tricks

In this chapter we collect some coding hints and guidelines on how to properly use the APIs of Eclipse, EMF, Xtext and other dependencies we are using, as well as our own utilities and helpers.

This chapter is only about coding; add information on things like Eclipse setup or Maven/Jenkins to one of the preceding chapters. Similarly, this chapter is intended to provide just a quick overview, check-list and reminder; add detailed information and diagrams to one of the succeeding chapters.

4.1. Naming

• The internal handling of N4JS project names is non-trivial (due to the support for npm scopes), see API documentation of ProjectDescriptionUtils#isProjectNameWithScope(String) for a detailed overview. In short:

• IN4JSProject#getProjectName() and IProject#getName() return different values!

• Avoid using the Eclipse project name, i.e. the return value of IProject#getName(), as far as possible (only use it in UI code when actually dealing with what is shown in the Eclipse UI).

• The last segment of a URI or path pointing to an N4JS project is not always the project name; use utilities in ProjectDescriptionUtils instead, e.g. #deriveN4JSProjectNameFromURI()! (However, given a URI or path pointing to a file inside an N4JS project, you can use its last segment to obtain the file name.)
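To illustrate why the last segment alone is not sufficient, here is a minimal Java sketch. It is not the implementation in ProjectDescriptionUtils; the class ProjectNameSketch and its method are hypothetical, and the sketch assumes that a scoped project is stored in a folder whose parent folder is named after the npm scope (starting with "@").

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; production code should use ProjectDescriptionUtils instead. */
public final class ProjectNameSketch {

    /** Prefix that marks an npm scope segment, e.g. "@myScope". */
    private static final String SCOPE_PREFIX = "@";

    /**
     * Derives a project name from the segments of a path pointing to a project folder.
     * Assumption: for scoped projects, the second-to-last segment is the scope.
     */
    public static String deriveProjectName(List<String> segments) {
        if (segments.isEmpty()) {
            throw new IllegalArgumentException("path has no segments");
        }
        String last = segments.get(segments.size() - 1);
        if (segments.size() >= 2) {
            String parent = segments.get(segments.size() - 2);
            if (parent.startsWith(SCOPE_PREFIX)) {
                // scoped project: the name consists of scope and plain name
                return parent + "/" + last;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(deriveProjectName(Arrays.asList("workspace", "myProject")));
        System.out.println(deriveProjectName(Arrays.asList("node_modules", "@myScope", "myProject")));
    }
}
```

For the second path, taking only the last segment would drop the scope and could make two different projects appear to have the same name.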

4.2. Logging

In many situations developers need to use some kind of logging. When in need, follow these rules:

1. Use org.apache.log4j.Logger for logging. Other logging utilities (like Java's built-in logger) are not configured.

2. Do not use System.out or System.err for logging. It is OK to use them for debugging purposes, but those calls should never be merged to master (with the exception of the headless compiler, which uses them explicitly).

3. There is a central logger configuration in org.eclipse.n4js.utils.logging that should be used:

1. log4j.xml used for production

2. log4j_tests.xml used when running tests

4. In Eclipse run configurations, the logger has to be set properly, e.g. log4j.configuration=file:${workspace_loc:org.eclipse.n4js.utils.logging/log4j_tests.xml}

5. In Maven configurations, the logger has to be set separately, e.g. -Dlog4j.configuration="file:${basedir}/../../plugins/org.eclipse.n4js.utils.logging/log4j_tests.xml"

4.3. Cancellation Handling

On various occasions, Xtext provides an instance of class CancelIndicator to allow our code to handle cancellation of long-running tasks.

Some things to keep in mind:

• whenever a CancelIndicator is available any code that might not return immediately should implement proper cancellation handling (as explained in the next items).

• most importantly: reacting to a cancellation by returning early from a method is an anti-pattern that leads to problems (client code might continue work on a canceled and thus invalid state); instead: throw an OperationCanceledException!

• don’t use CancelIndicator#isCanceled() for cancellation handling, except in certain special cases. A valid exception case might be during logging to show a message like "operation was canceled".

• instead, inject the Xtext service called OperationCanceledManager and invoke its method #checkCanceled(), passing-in the cancel indicator (this method is null-safe; it will throw an OperationCanceledException in case a cancellation has occurred). Don’t directly create and throw an OperationCanceledException yourself.

• use the other methods provided by OperationCanceledManager when appropriate (see code of that class for details).

• in try/catch blocks, when catching exceptions of a super type of OperationCanceledException, be sure to not suppress cancellation exceptions. For example:

// Java code
@Inject private OperationCanceledManager operationCanceledManager;

/** Returns true on success, false otherwise. */
public boolean doSomething(CancelIndicator ci) {
    try {
        // do something that might be canceled
        return true;
    } catch (Exception e) {
        operationCanceledManager.propagateIfCancelException(e); // <- IMPORTANT!
        return false;
    }
}

Try/finally blocks, on the other hand, do not need any special handling.

• a cancel indicator can also be stored in the rule environment (see RuleEnvironmentExtensions#addCancelIndicator()). This means:

• if you create a rule environment completely from scratch and you have a cancel indicator at hand, add it to the rule environment via RuleEnvironmentExtensions#addCancelIndicator() (not required when using RuleEnvironmentExtensions#wrap() for deriving a rule environment from an existing one).

• if you have a rule environment available, be sure to use its cancel indicator in long-running operations, i.e. with code like:

// Xtend code
import static extension org.eclipse.n4js.typesystem.utils.RuleEnvironmentExtensions.*

class C {
    @Inject private OperationCanceledManager operationCanceledManager;

    def void doSomething() {
        // G is the rule environment available in this context
        for (a : aLotOfStuff) {
            operationCanceledManager.checkCanceled(G.cancelIndicator);
            // main work ...
        }
    }
}
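The difference between returning early and throwing on cancellation can be tried out with the following self-contained Java sketch. It deliberately does not use the Xtext classes: CancelIndicator, OperationCanceledException and checkCanceled() below are hypothetical stand-ins that only mimic the behavior described above, so that the example runs without any dependencies.

```java
import java.util.Arrays;
import java.util.List;

/** Self-contained illustration of the cancellation pattern; all nested types are stand-ins. */
public final class CancellationSketch {

    /** Hypothetical stand-in for Xtext's CancelIndicator. */
    public interface CancelIndicator {
        boolean isCanceled();
    }

    /** Hypothetical stand-in for the exception thrown on cancellation. */
    public static final class OperationCanceledException extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    /** Stand-in for OperationCanceledManager#checkCanceled(): null-safe, throws on cancellation. */
    public static void checkCanceled(CancelIndicator indicator) {
        if (indicator != null && indicator.isCanceled()) {
            throw new OperationCanceledException();
        }
    }

    /** Sums the items, checking for cancellation before each unit of work. */
    public static int sum(List<Integer> items, CancelIndicator indicator) {
        int total = 0;
        for (int item : items) {
            // Do not "return total" here: the caller could not tell a partial result from a valid one.
            checkCanceled(indicator);
            total += item;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3);
        System.out.println(sum(items, null)); // no indicator available: runs to completion
        try {
            sum(items, () -> true); // indicator that reports cancellation immediately
        } catch (OperationCanceledException e) {
            System.out.println("operation was canceled");
        }
    }
}
```

Because the exception unwinds the whole call stack, no caller continues to work with the partial sum, which is the problem the early-return anti-pattern causes.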

4.4. Caching

• Caching of external libraries (implemented in ExternalProjectMappings)

• always keep in mind that the diff between the current state and the cached state is necessary information for cleaning dependencies of removed npms

• see EclipseExternalIndexSynchronizer#synchronizeNpms() for implementation

• updating also happens when external root locations change (see ExternalIndexUpdater)

• Caching of user workspace projects (implemented in MultiCleartriggerCache)

• caches only some project information and should be refactored along with Core, Model and EclipseBasedN4JSWorkspace
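The remark about the diff between cached and current state can be made concrete with a small Java sketch. It is not taken from EclipseExternalIndexSynchronizer; the class NpmDiffSketch and the representation of both states as sets of npm names are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of diffing a cached state against the current state. */
public final class NpmDiffSketch {

    /** Names present in the cached state but missing from the current state. */
    public static Set<String> removed(Set<String> cached, Set<String> current) {
        Set<String> result = new TreeSet<>(cached);
        result.removeAll(current);
        return result;
    }

    /** Names present in the current state but not yet in the cached state. */
    public static Set<String> added(Set<String> cached, Set<String> current) {
        Set<String> result = new TreeSet<>(current);
        result.removeAll(cached);
        return result;
    }

    public static void main(String[] args) {
        Set<String> cached = new TreeSet<>(Arrays.asList("express", "lodash"));
        Set<String> current = new TreeSet<>(Arrays.asList("lodash", "react"));
        // Dependencies of removed npms can only be cleaned while the old state is still known.
        System.out.println("removed: " + removed(cached, current));
        System.out.println("added: " + added(cached, current));
    }
}
```

If the cache were overwritten with the current state before computing the diff, the set of removed npms could no longer be determined.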

4.5. Dependency Injection

There are some things to keep in mind when using dependency injection in the context of Xtext. This is a longer topic and it is discussed in the appendix Xtext Injection.

4.6. Miscellaneous

• Resource load states: when an N4JS/N4JSD file is loaded, a certain sequence of processing is triggered (parsing, linking, validation, etc.) and thus an N4JSResource transitions through a sequence of "load states". For details, see N4JS Resource Load States.

5. Parser

Some of the concepts described here were presented at EclipseCon 2013 and XtextCon 2014. Note that the material presented in the linked videos may be outdated.

5.1. Overview

The parser is created from an Xtext grammar. Actually, several grammars are used, as shown in Figure 2. These grammars and the parsers generated from them are described more closely in the following sections.

Figure 2. N4 Grammars

5.2. N4JS Parser

Parsing is one of the trickiest parts of JavaScript because there is a conceptual mismatch between the ANTLR runtime and the specified grammar. Another challenge is the disambiguation of regular expressions and binary operations. Both features require significant customizing of the generated parser (see figure below).

Figure 3. Overview custom parser implementation (runtime only)

5.3. Parser Generation Post-Processing

The ANTLR grammar that is generated by Xtext is post-processed to inject custom code into the grammar file before it is passed to the ANTLR tool. This is required in particular due to automatic semicolon insertion (ASI), but for some other reasons as well.

Actually, there are several injections:

1. Due to Xtext restrictions, the generated ANTLR grammar file (*.g) is modified. This means that some additional actions are added and some rules are rewritten.

2. Due to ANTLR restrictions, the generated ANTLR Java parser (*.java) is modified. This means that some generated rules are slightly modified to match certain requirements.

3. Due to Java restrictions, the generated Java parser needs to be preprocessed in order to reduce the size of certain methods, since the bytecode of a single method must not exceed 64 KB. This is implemented by means of an MWE fragment, activated after the other post-processing steps are done.

The first two steps are handled by AntlrGeneratorWithCustomKeywordLogic, which is configured with additional helpers in GenerateN4JS.mwe2. Figure 4 shows the customized classes which modify the code generation. These classes are all part of the releng.utils bundle.

Figure 4. Class Diagram Parser Generation

5.3.1. Automatic Semicolon Insertion

The EcmaScript specification mandates that valid implementations automatically insert a semicolon as a statement delimiter if it is missing and the input file would become invalid due to the missing semicolon. This is known as ASI. It implies that not only valid implementations have to perform this, but a valid parser has to mimic this behavior in order to parse executable code. The ASI is implemented by two different means.

First, custom code injected into the ANTLR grammar file promotes certain tokens to semicolons. Second, the parser’s error recovery strategy is customized so that it attempts to insert a semicolon if one was expected. Both strategies have to work hand in hand in order to consume all sorts of legal JavaScript code.

5.3.1.1. Injected code in the Antlr grammar file

Under certain circumstances, the parser has to actively promote a token to become a semicolon even though it may syntactically be a closing brace or line break. This has to happen before that token is consumed; thus the rules for return statements, continue statements and break statements are enhanced to actively promote these tokens to semicolons.

The same rule is applied to promote line breaks between an expression and a possible postfix operator ++ or --. At this location the line break is always treated as a semicolon even though the operator may be validly consumed and produce a postfix expression.

In both cases, the method promoteEOL() is used to move a token that may serve as an automatically injected semicolon from the so called hidden token channel to the semantic channel. The hidden tokens are usually not handled by the parser explicitly thus they are semantically invisible (therefore the term hidden token). Nevertheless, they can be put on the semantic channel explicitly to make them recognizable. That’s implemented in the EOL promotion. The offending tokens include the hidden line terminators and multi-line comments that include line breaks. Furthermore, closing braces (right curly brackets) are included in the set of offending tokens as well as explicit semicolons.
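The idea of moving a token from the hidden channel to the semantic channel can be sketched in plain Java as follows. This is not the injected ANTLR code: the Token and Channel types are hypothetical stand-ins, only \n and \r are treated as line terminators, and the end of input is not handled.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, ANTLR-free illustration of EOL promotion. */
public final class EolPromotionSketch {

    public enum Channel { DEFAULT, HIDDEN }

    /** Minimal token: its text and the channel it is currently on. */
    public static final class Token {
        final String text;
        Channel channel;

        public Token(String text, Channel channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    /** Simplification: only \n and \r count as line terminators here. */
    private static boolean containsLineBreak(String text) {
        return text.indexOf('\n') >= 0 || text.indexOf('\r') >= 0;
    }

    /**
     * Looks for a token that can serve as a statement delimiter, starting at index "from".
     * A hidden token containing a line break (line terminator or multi-line comment) is
     * moved to the default channel so that the parser can see it.
     *
     * @return true if a delimiter is available, false otherwise
     */
    public static boolean promoteEOL(List<Token> tokens, int from) {
        for (int i = from; i < tokens.size(); i++) {
            Token token = tokens.get(i);
            if (token.channel == Channel.DEFAULT) {
                // an explicit semicolon or a closing brace already delimits the statement
                return token.text.equals(";") || token.text.equals("}");
            }
            if (containsLineBreak(token.text)) {
                token.channel = Channel.DEFAULT; // now visible to the parser
                return true;
            }
        }
        return false; // end of input is not handled in this sketch
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("return", Channel.DEFAULT),
                new Token("\n", Channel.HIDDEN),
                new Token("x", Channel.DEFAULT));
        System.out.println(promoteEOL(tokens, 1));
        System.out.println(tokens.get(1).channel);
    }
}
```

In the example, the line break after return becomes visible to the parser, so the following x is not parsed as the return value.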

5.3.1.2. Customized error recovery

Since the EOL promotion does not work well with Antlr prediction mode, another customization complements that feature. As soon as the parser attempts to parse an invalid token sequence and a missing semicolon would make that sequence valid, an offending token is sought and moved to the semantic channel. This is implemented in the custom recovery strategy.

5.3.2. Async and No line terminator allowed here Handling

There is no way of directly defining No line terminator allowed here. This is required not only for ASI, but also for async. This requires not only a special rule (using some rules from ASI), but also a special error recovery since the token ’async’ may be rejected (by the manually enriched rule) which is of course unexpected behavior from the generated source code.

5.3.3. Regular Expression

The ANTLR parsing process can basically be divided into three steps. First of all, the file contents have to be read from disk. This includes the proper decoding of bytes into characters. The second step is the lexing or tokenizing of the character stream. A token is basically a typed region in the stream, that is, a triplet of token-id, offset and length. The last step is the parsing of these tokens. The result is a semantic model that is associated with a node tree. All necessary information to validate the model can be deduced from these two interlinked representations.

Figure 5. Simplified visualization of the parsing

Since the default semantics and control flow of Antlr generated parsers do not really fit the requirements of a fully working JavaScript parser, some customizations are necessary. Regular expression literals in JavaScript cannot be syntactically disambiguated from div operations without contextual information. Nevertheless, the spec clearly describes, where a regular expression may appear and where it is prohibited. Unfortunately, it is not possible to implement these rules in the lexer alone, since it does not have enough contextual information. Therefore, the parser has been enhanced to establish a communication channel with the lexer. It announces when it expects a regular expression rather than a binary operation.

This required a reworking of the Antlr internals. Instead of a completely pre-populated TokenStream, the parser works on a lazy implementation that only reads as many characters as can be lexed without having to disambiguate between regular expression literals and divide operators.

Only after the parser has read these buffered tokens and potentially announced that it expects a regular expression is another batch of characters processed by the lexer, until the next ambiguous situation occurs. This is fundamentally different from the default behavior of Antlr.

Figure 6. Abstract control and object flow during parsing
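The batch-wise interplay described above can be sketched in a few lines. This is an illustration under stated assumptions, not the N4JS implementation: the class LazyRegexLexerSketch and its methods are hypothetical, tokens are plain strings, and the "parser" is replaced by a crude stand-in (endsOperand) that expects a regular expression whenever the previous token cannot end an operand. The real parser derives this from the grammar context.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of lazy, parser-guided lexing; not the N4JS classes. */
final class LazyRegexLexerSketch {
    private final String input;
    private int pos = 0;

    LazyRegexLexerSketch(String input) {
        this.input = input;
    }

    boolean atEnd() {
        return pos >= input.length();
    }

    /**
     * Lexes one batch. A '/' at the start of the batch is resolved with the
     * parser's hint; lexing then stops in front of the next '/'.
     */
    List<String> nextBatch(boolean regexExpected) {
        List<String> batch = new ArrayList<>();
        while (!atEnd()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == '/' && !batch.isEmpty()) {
                break; // ambiguous again: hand control back to the parser
            } else if (c == '/' && regexExpected) {
                batch.add(readRegex());
            } else if (Character.isLetterOrDigit(c)) {
                batch.add(readWord());
            } else {
                batch.add(String.valueOf(input.charAt(pos++)));
            }
        }
        return batch;
    }

    private String readWord() {
        int start = pos;
        while (!atEnd() && Character.isLetterOrDigit(input.charAt(pos))) {
            pos++;
        }
        return input.substring(start, pos);
    }

    /** Reads a simplified regex literal: body up to the closing '/', then flags. */
    private String readRegex() {
        int start = pos++; // consume the opening '/'
        while (!atEnd() && input.charAt(pos) != '/') {
            if (input.charAt(pos) == '\\') {
                pos++; // skip the escaped character as well
            }
            pos++;
        }
        pos++; // consume the closing '/'
        while (!atEnd() && Character.isLetter(input.charAt(pos))) {
            pos++; // flags
        }
        return input.substring(start, Math.min(pos, input.length()));
    }

    /** Stand-in for the parser's knowledge: can this token end an operand? */
    static boolean endsOperand(String token) {
        char last = token.charAt(token.length() - 1);
        boolean regexLiteral = token.length() > 1 && token.charAt(0) == '/';
        return regexLiteral || Character.isLetterOrDigit(last) || last == ')' || last == ']';
    }

    /** Drives the lexer batch by batch, announcing the expectation each time. */
    static List<String> tokenize(String source) {
        LazyRegexLexerSketch lexer = new LazyRegexLexerSketch(source);
        List<String> tokens = new ArrayList<>();
        boolean regexExpected = true; // an expression may start the input
        while (!lexer.atEnd()) {
            tokens.addAll(lexer.nextBatch(regexExpected));
            regexExpected = tokens.isEmpty() || !endsOperand(tokens.get(tokens.size() - 1));
        }
        return tokens;
    }
}
```

With this sketch, tokenize("a / b") treats the slash as a divide operator, whereas tokenize("x = /a+/g.test(s)") yields /a+/g as a single regular expression token, because the preceding = cannot end an operand.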

Figure 7 shows the involved classes which allow for this lexer-parser communication.

Figure 7. Class Diagram Parser-Lexer Communication

5.3.4. Unicode

Unicode support in JavaScript includes the possibility to use unicode escape sequences in identifiers, string literals and regular expression literals. Another issue in this field is the specification of valid identifiers in JavaScript. They are described by means of unicode character classes. These have to be enumerated in the terminal rules in order to fully accept or reject valid or invalid JS identifiers.

For that purpose, a small code generator is used to define the terminal fragments for certain unicode categories. The UnicodeGrammarGenerator basically iterates over all characters from Character.MIN_VALUE to Character.MAX_VALUE and adds them as alternatives to the respective terminal fragments, e.g. UNICODE_DIGIT_FRAGMENT.
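A generator of this kind might look roughly like the following sketch. It is not the UnicodeGrammarGenerator itself: the class UnicodeFragmentSketch and its methods are hypothetical, and merging consecutive characters into ranges is a choice made here only to keep the output short. The sketch relies solely on java.lang.Character.getType, whose result depends on the Unicode version of the JDK in use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a Unicode fragment generator; not the N4JS generator. */
final class UnicodeFragmentSketch {

    /** Formats a character as a quoted escape, e.g. the digit zero as 0030. */
    static String escape(int c) {
        return String.format("'\\u%04X'", c);
    }

    /**
     * Iterates over all char values and collects those of the given general
     * category (a java.lang.Character constant), merged into ranges.
     */
    static List<String> ranges(int category) {
        List<String> out = new ArrayList<>();
        int start = -1; // start of the currently open range, or -1
        // one extra iteration closes a range that ends at Character.MAX_VALUE
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE + 1; c++) {
            boolean in = c <= Character.MAX_VALUE
                    && Character.getType((char) c) == category;
            if (in && start < 0) {
                start = c;
            } else if (!in && start >= 0) {
                out.add(start == c - 1
                        ? escape(start)
                        : escape(start) + ".." + escape(c - 1));
                start = -1;
            }
        }
        return out;
    }

    /** Renders the alternatives as a terminal fragment in an Xtext-like syntax. */
    static String fragmentFor(String name, int category) {
        return "terminal fragment " + name + ":\n    "
                + String.join("\n  | ", ranges(category)) + ";";
    }
}
```

For example, fragmentFor("UNICODE_DIGIT_FRAGMENT", Character.DECIMAL_DIGIT_NUMBER) produces a fragment whose first alternative covers the ASCII digits; the remaining alternatives vary with the JDK.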

The real terminal rules are defined as a composition of these generated fragments. Besides that, each character in an identifier, in a string literal or in a regular expression literal may be represented by its unicode escape value, e.g. \u0060. These escape sequences are handled and validated by the IValueConverter for the corresponding terminal rules.

The second piece of the puzzle is the unicode escape sequences that may be used in keywords. This issue is covered by the UnicodeKeywordHelper, which replaces the default terminal representation in the generated Antlr grammar with more elaborate alternatives. The keyword if is not only lexed as ’if’ but as shown in the Terminal if listing.

Terminal if
If :
( 'i' | '\\' 'u' '0' '0' '6' '9' )
( 'f' | '\\' 'u' '0' '0' '6' '6' );
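How such a rule could be derived from a keyword is sketched below. The class UnicodeKeywordSketch and the method toRule are hypothetical and only mirror the shape of the listing above; the actual UnicodeKeywordHelper may work differently. In particular, this sketch emits upper-case hexadecimal digits only, so a real lexer rule would additionally have to accept lower-case digits for code points that contain the letters A to F.

```java
/** Hypothetical sketch of keyword-to-rule expansion; not the UnicodeKeywordHelper. */
final class UnicodeKeywordSketch {

    /**
     * Builds one group per keyword character: either the literal character or
     * a backslash, the letter u and the four hex digits of its code unit.
     */
    static String toRule(String ruleName, String keyword) {
        StringBuilder sb = new StringBuilder(ruleName).append(" :");
        for (char c : keyword.toCharArray()) {
            sb.append("\n    ( '").append(c).append("' | '\\\\' 'u'");
            for (char hex : String.format("%04X", (int) c).toCharArray()) {
                sb.append(" '").append(hex).append('\'');
            }
            sb.append(" )");
        }
        return sb.append(";").toString();
    }
}
```

Calling toRule("If", "if") reproduces the two groups of the listing above, one for i (code unit 0069) and one for f (code unit 0066).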

5.3.5. Literals

Template literals are also to be handled specially, see TemplateLiteralDisambiguationInjector for details.

5.4. Modifiers

On the AST side, all modifiers are included in a single enumeration N4Modifier. In the types model however, the individual modifiers are mapped to two different enumerations of access modifiers (namely TypeAccessModifier and MemberAccessModifier) and a number of boolean properties (in case of non-access modifiers such as abstract or static). This mapping is done by the types builder, mostly by calling methods in class ModifierUtils.

The grammar allows the use of certain modifiers in many places that are actually invalid. Rules where a certain modifier may appear in the AST are implemented in method isValid(EClass,N4Modifier) in class ModifierUtils and checked via several validations in N4JSSyntaxValidator. Those validations also check for a particular order of modifiers that is not enforced by the grammar.

See API documentation of enumeration N4Modifier in file N4JS.xcore and the utility class ModifierUtils for more details.

5.5. Conflict Resolutions

5.5.1. Reserved Keywords vs. Identifier Names

Keywords and identifiers have to be distinguished by the lexer. Since the lexer has no contextual information, there is no means to decide upfront whether a certain keyword is actually used as a keyword or as an identifier in a given context. This limitation is idiomatically overcome by a data type rule for valid identifiers. This data type rule enumerates all keywords which may be used as identifiers in addition to the pure IDENTIFIER terminal rule, as seen in the Keywords as Identifier listing.

Keywords as Identifier
N4JSIdentifier: IDENTIFIER
| 'get'
| 'set'
...
;

5.5.2. Operators and Generics

The ambiguity between shift operators and nested generics also arises from the fact that the Antlr lexer tokenizes the input upfront without any contextual information. When implemented naively, the grammar will be broken, since a token sequence a>>b can either be part of List<List<a>> b or it can be part of a binary operation int c = a >> b. Therefore the shift operator may not be defined as a single token but has to be composed from individual characters (see Shift Operator listing).

Shift Operator listing
ShiftOperator:
'>' '>' '>'?
| '<' '<'
;

This section may be outdated!

The content assist (CA) parser also needs adjustments for supporting automatic semicolon insertion and regular expressions. Instead of modifying the CA parser generator similar to the normal parser, the former reuses parts of the latter as far as possible. That is, the token sequence that is produced during production parsing is used as is for the content assist parser. Semicolons have already been inserted where appropriate and regular expressions have already been distinguished from divide operators.

Since the N4JS grammar uses syntactic predicates, the content assist parser is compiled with backtracking enabled. This is always the case for Xtext’s CA parsers that rely on backtracking or predicates (local backtracking) in the production parser. This approach is both good (CA works in general) and bad (unpredictable decisions in case of errors at locations prior to the cursor). Since parsing with backtracking enabled makes for a fundamental difference in how the prediction and parsing work and how the parser decides which decision paths to take, the customization patterns from the production parser are not applied 1:1 to the CA parser, but adapted instead. The content assist parser doesn’t use a freshly lexed token stream with unicode support, ASI or regular expression literals, but instead uses a synthesized token sequence which is rebuilt from the existing node model.

The token stream that is consumed by the content assist parser is therefore not created by a lexer but by the org.eclipse.n4js.ui.contentassist.NodeModelTokenSource. It traverses the existing node model that is contained in the resource and was produced by the production parser. This approach has the significant advantage that any decision that was made by that parser is also immediately applicable to the content assist infrastructure. For that purpose, the leaf nodes of the node model are mapped to ANTLR token types. This is achieved by the org.eclipse.n4js.ui.contentassist.ContentAssistTokenTypeMapper, which is capable of providing the untyped ANTLR token type (primitive int) for a given grammar element.

Special considerations have been made for the last token in the produced source. If it overlaps with an existing leaf node but does not fully cover it, the plain Antlr lexer is used to consume the overlapping prefix. Since the terminals never overlap with each other and the longest match always wins without backtracking in the lexer, it is safe to assume that only one token is produced from the prefix. The very last token in the org.eclipse.n4js.ui.contentassist.NodeModelTokenSource is always the EOF token (org.antlr.runtime.Token.EOF_TOKEN).
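The idea of rebuilding the token sequence from the node model can be sketched as follows. All names here (NodeModelTokenSourceSketch, Leaf, Tok, PrefixLexer, the EOF constant) are hypothetical simplifications: the sketch works on a flat list of leaves instead of Xtext's node model, ignores token channels, and takes the token types as given rather than obtaining them from a ContentAssistTokenTypeMapper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a node-model based token source; not the N4JS class. */
final class NodeModelTokenSourceSketch {
    static final int EOF = -1; // sentinel type used by this sketch for end of file

    /** A leaf of the node model: a region of text with a known token type. */
    static final class Leaf {
        final int offset;
        final String text;
        final int tokenType;

        Leaf(int offset, String text, int tokenType) {
            this.offset = offset;
            this.text = text;
            this.tokenType = tokenType;
        }
    }

    static final class Tok {
        final int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Stand-in for the plain lexer that is asked to type a partial leaf. */
    interface PrefixLexer {
        int typeOf(String prefix);
    }

    /** Replays leaves up to the cursor, re-lexes a partial leaf, appends EOF. */
    static List<Tok> tokensUpTo(List<Leaf> leaves, int cursor, PrefixLexer lexer) {
        List<Tok> result = new ArrayList<>();
        for (Leaf leaf : leaves) {
            int end = leaf.offset + leaf.text.length();
            if (end <= cursor) {
                // entirely before the cursor: reuse the production parser's decision
                result.add(new Tok(leaf.tokenType, leaf.text));
                continue;
            }
            if (leaf.offset < cursor) {
                // the cursor is inside this leaf: only its prefix is lexed again
                String prefix = leaf.text.substring(0, cursor - leaf.offset);
                result.add(new Tok(lexer.typeOf(prefix), prefix));
            }
            break; // nothing to the right of the cursor is considered
        }
        result.add(new Tok(EOF, ""));
        return result;
    }
}
```

For the leaves var, a single space and foo with the cursor placed after fo, the sketch yields the first two tokens unchanged, a re-lexed token for the prefix fo, and the final EOF token.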

Given that the token source is equal to the prefix in the production token source, some more thought has to be put into the synthesized end of file. The production parser used the complete file to decide where to automatically insert a semicolon and where not to. This would potentially change if there was another token next to the artificial EOF. Therefore, two cases have to be considered. The first one describes CA requests next to an automatically inserted semicolon and the second one describes CA requests at a position where a semicolon could have been inserted if the token to the right was another one. The org.eclipse.n4js.ui.contentassist.CustomN4JSParser reflects these cases. Heuristics are applied to the end of the token sequence to decide whether a second pass has to be performed to collect yet more following elements. Based on the concrete sequence, either the last automatically inserted semicolon is removed from the sequence prior to the second pass, or such a token is explicitly synthesized and appended. Besides the second pass, another special treatment is made for postfix expressions. Those may not be interrupted by a hidden semicolon, so hidden semicolons are filtered from the resulting follow set if appropriate.

The parser is used by the org.eclipse.n4js.ui.contentassist.ContentAssistContextFactory, where all relevant entry points from the super class are specialized to pass the node model into the parser facade (org.eclipse.n4js.ui.contentassist.CustomN4JSParser). In that sense, the ContentAssistContextFactory serves as a drop-in replacement binding for the default ParserBasedContentAssistContextFactory.StatefulFactory.

6. Type System

6.1. Type Model and Grammar

The type model is used to define actual types and their relations (meta-model is defined by means of Xcore in file Types.xcore) and also references to types (meta-model in TypeRefs.xcore). The type model is built via the N4JSTypesBuilder when a resource is loaded and processed, and most type related tasks work only on the type model. Some types that are (internally) available in N4JS are not defined in N4JS, but instead in a special, internal type language not available to N4JS developers, called N4TS and defined in file Types.xtext.

The types are referenced by AST elements; vice versa the AST elements can be referenced from the types (see SyntaxRelatedTElement). This backward reference is a simple reference to an EObject.

6.1.1. Type Model Overview

The following figure, Types and Type References, shows the classes of the type model and their inheritance relations, both the actual type definitions as defined in Types.xcore and the type references defined in TypeRefs.xcore. The most important type reference is the ParameterizedTypeRef; it is used for most user-defined references, for both parameterized and non-parameterized references. In the latter case, the list of type arguments is empty.

Figure 8. Type Model Overview: Types in the upper half and Type References in the lower half.

Most types are self-explanatory. TypeDefs is the container element used in N4TS. Note that not all types and properties of types are available in N4JS – some can only be used in the N4TS language or be inferred by the type system for internal purposes. Some types need some explanation:

• TObjectPrototype: Metatype for defining built-in object types such as Object or Date, only available in N4TS.

• VirtualBaseType: This type is not available in N4JS. It is used to define common properties provided by all types of a certain metatype. E.g., it is used for defining some properties shared by all enumerations (this was the reason for introducing this type).

We distinguish four kinds of types as summarized in Table 1, Kind of Types. Role is an internal construct for the different kinds of users who can define the respective kind of type. The Language column refers to the language used to specify the type, which is either N4JS or N4TS.

Table 1. Kind of Types
Kind Language Role Remark

user

N4JS

developer

User defined types, such as declared classes or functions. These types are to be explicitly defined or imported in the code.

library

N4JSD

developer