How to support a new unicode version in the scanner: | |
1) Go to http://www.unicode.org/Public/ | |
2) Select the folder that corresponds to the unicode version for which you want to generate the scanner resource files | |
3) Select the ucdxml folder and download the file called ucd.all.flat.zip. | |
4) Unzip that file on your disk. This creates a file called ucd.all.flat.xml. | |
5) To generate the resource files for identifier starts, you need to invoke | |
org.eclipse.jdt.core.internal.tools.unicode.GenerateIdentifierStartResources with the following arguments: | |
- first argument: unicode version | |
- second argument: path to the ucd.all.flat.xml file. | |
- third argument: folder in which the resource files will be generated | |
For example: | |
8.0 c:/unicode8.0.0/ucd.all.flat.xml c:/unicode8.0.0/res | |
6) To generate the resource files for identifier parts, you need to invoke | |
org.eclipse.jdt.core.internal.tools.unicode.GenerateIdentifierPartResources with the same arguments used previously. | |
7) Once this is done, you need to edit org.eclipse.jdt.internal.compiler.parser.ScannerHelper to add a new table for the new unicode support. | |
For example: | |
- add the new method: | |
static void initializeTable19() { | |
Tables9 = initializeTables("unicode8"); //$NON-NLS-1$ | |
} | |
- add the new static field Tables9. | |
- add a new folder unicode8 as a sub folder of org/eclipse/jdt/internal/compiler/parser/. | |
- put into this folder all resource files generated in step 5 and 6. | |
- modify | |
org.eclipse.jdt.internal.compiler.parser.ScannerHelper.isJavaIdentifierPart(long, int) | |
org.eclipse.jdt.internal.compiler.parser.ScannerHelper.isJavaIdentifierStart(long, int) | |
To use the new Tables9 values based on the compliance value by adding a new else if condition. | |
For org.eclipse.jdt.internal.compiler.parser.ScannerHelper.isJavaIdentifierPart(long, int) this becomes | |
The last else becomes an else if that supports the previous 1.8 compliance | |
else if (complianceLevel <= ClassFileConstants.JDK1_8) { | |
// java 7 supports Unicode 6.2 | |
if (Tables8 == null) { | |
initializeTable18(); | |
} | |
switch((codePoint & 0x1F0000) >> 16) { | |
case 0 : | |
return isBitSet(Tables8[PART_INDEX][0], codePoint & 0xFFFF); | |
case 1 : | |
return isBitSet(Tables8[PART_INDEX][1], codePoint & 0xFFFF); | |
case 2 : | |
return isBitSet(Tables8[PART_INDEX][2], codePoint & 0xFFFF); | |
case 14 : | |
return isBitSet(Tables8[PART_INDEX][3], codePoint & 0xFFFF); | |
} | |
} else { | |
// java 9 supports Unicode 8 | |
if (Tables9 == null) { | |
initializeTable19(); | |
} | |
switch((codePoint & 0x1F0000) >> 16) { | |
case 0 : | |
return isBitSet(Tables9[PART_INDEX][0], codePoint & 0xFFFF); | |
case 1 : | |
return isBitSet(Tables9[PART_INDEX][1], codePoint & 0xFFFF); | |
case 2 : | |
return isBitSet(Tables9[PART_INDEX][2], codePoint & 0xFFFF); | |
case 14 : | |
return isBitSet(Tables9[PART_INDEX][3], codePoint & 0xFFFF); | |
} | |
} | |
8) Do the same set of changes for org.eclipse.jdt.internal.compiler.parser.ScannerHelper.isJavaIdentifierStart(long, int). | |
9) You need to add a regression test class in org.eclipse.jdt.core.tests.compiler.regression similar to org.eclipse.jdt.core.tests.compiler.regression.Unicode18Test. | |
You can get the character value for the regression test by checking the ucd.all.flat.xml file and searching for an entry that has the age parameter equals to the | |
unicode version you want to check (i.e. for unicode 8, age="8.0"). | |
If you have any questions regarding this tool, please comment in the bug report 506870: https://bugs.eclipse.org/bugs/show_bug.cgi?id=506870 |