Ejscript has just released a new version and is on a roll with great performance advances. Ejscript has been tuned from the bottom up, adding a powerful new memory manager, smart generational garbage collection and object shaping, and streamlining the VM interpreter and byte codes. See the Change Log for details, or Download the release.
Ejscript Language Architecture
The Ejscript virtual machine is a portable, cross-platform virtual machine optimized for embedded use outside browsers. It has a compact byte code that minimizes memory use and supports cached compilation of scripts for repeated execution. It is a direct-threaded, stack-based, early-binding VM with a high-level, type-less byte code, interpreter cloning and composite native classes. Each of these attributes is described below.
Direct-threaded refers not to multi-threading, but to an approach in VM design where the VM inner loop jumps directly from opcode to opcode without a central dispatch loop. Direct threading can sometimes have an adverse effect on CPU instruction caching, but this can be offset by having a high-level byte code that can invoke operations directly on properties without a fetch-do-store cycle. We aim to further mitigate cache depletion by the use of inline native code generation in a future release.
Ejscript uses an expression stack with bypass opcodes. This is a blend of the stack-based and register-based approaches to VM design. By using high-level byte codes that can access, manipulate and call properties directly, bypassing the expression stack, Ejscript avoids much of the overhead of a pure stack-based VM and gains many of the benefits of a register-based approach.
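The difference between a pure stack sequence and a bypass opcode can be sketched with a toy interpreter. This is only an illustration: the opcode names (LOAD_SLOT, ADD, STORE_SLOT, ADD_SLOTS) are hypothetical and are not Ejscript's real instruction set.

```python
# Toy VM for illustration only. Opcode names are hypothetical, not Ejscript's.
def run(program, slots):
    """Execute instruction tuples against a list of property slots.

    Returns the number of instructions dispatched.
    """
    stack = []
    dispatched = 0
    for instr in program:
        op = instr[0]
        dispatched += 1
        if op == "LOAD_SLOT":      # push slots[n] onto the expression stack
            stack.append(slots[instr[1]])
        elif op == "ADD":          # pop two operands, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE_SLOT":   # pop the result into slots[n]
            slots[instr[1]] = stack.pop()
        elif op == "ADD_SLOTS":    # bypass opcode: never touches the stack
            _, dst, a, b = instr
            slots[dst] = slots[a] + slots[b]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return dispatched


# The same statement, slot2 = slot0 + slot1, encoded both ways.
stack_program = [("LOAD_SLOT", 0), ("LOAD_SLOT", 1), ("ADD",), ("STORE_SLOT", 2)]
bypass_program = [("ADD_SLOTS", 2, 0, 1)]

stack_slots = [3, 4, 0]
bypass_slots = [3, 4, 0]
stack_count = run(stack_program, stack_slots)
bypass_count = run(bypass_program, bypass_slots)
```

Both programs leave the same result in slot 2, but the bypass form dispatches one instruction instead of four, which is the kind of saving a high-level byte code is after.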
The VM makes extensive use of early binding to resolve property references into direct references and has many dedicated op codes for fast property access. The Ejscript compiler tries wherever possible to bind property references to property slot offsets and uses these optimized opcodes for such accesses. This is used not just for load/store operations but also for function calls and object creation.
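As a rough sketch of early binding, the following code resolves a property name to a slot offset at compile time when the class layout is known, and falls back to a by-name lookup otherwise. The ClassLayout type and the GET_SLOT and GET_NAMED opcodes are assumptions made for this example, not names taken from Ejscript.

```python
class ClassLayout:
    """Hypothetical compile-time description of a class: property name -> slot offset."""

    def __init__(self, *names):
        self.slot_of = {name: index for index, name in enumerate(names)}


def compile_get(layout, name):
    """Emit a slot-indexed opcode when the name binds early, else a by-name opcode."""
    if name in layout.slot_of:
        return ("GET_SLOT", layout.slot_of[name])
    return ("GET_NAMED", name)


def execute_get(instr, slots, dynamic_props):
    """Run one property read. Bound reads index directly; unbound reads hash by name."""
    op, operand = instr
    if op == "GET_SLOT":
        return slots[operand]
    return dynamic_props[operand]


point = ClassLayout("x", "y")
bound = compile_get(point, "y")        # name known at compile time
unbound = compile_get(point, "label")  # name only known at run time

slots = [10, 20]
dynamic_props = {"label": "origin"}
bound_value = execute_get(bound, slots, dynamic_props)
unbound_value = execute_get(unbound, slots, dynamic_props)
```

The bound read never looks at the property name at run time, which is where the speed of the dedicated opcodes would come from.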
Ejscript optimizes the creation of classes and objects. It detects when objects are being created using prototype based inheritance and then transparently creates backing classes. These classes allow very fast property access and quick repeat object allocations.
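One way to picture a transparent backing class is as a shared layout record that objects with the same property sequence reuse. The sketch below is an assumption about the general technique (often called shapes or hidden classes), not a description of Ejscript's internals. Shape and ShapedObject are hypothetical names.

```python
class Shape:
    """Hypothetical backing class: an ordered set of property names shared by many objects."""

    _cache = {}

    def __init__(self, names):
        self.names = names
        self.slot_of = {name: index for index, name in enumerate(names)}

    @classmethod
    def for_names(cls, names):
        """Return the one shared Shape for this exact sequence of property names."""
        key = tuple(names)
        if key not in cls._cache:
            cls._cache[key] = cls(key)
        return cls._cache[key]


class ShapedObject:
    """An object whose property values live in a flat slot list described by its Shape."""

    def __init__(self, **props):
        self.shape = Shape.for_names(props)  # keyword order is preserved
        self.slots = list(props.values())

    def get(self, name):
        return self.slots[self.shape.slot_of[name]]


a = ShapedObject(x=1, y=2)
b = ShapedObject(x=10, y=20)  # same property sequence, so the layout is reused
```

Here the second allocation only has to fill a slot list, because the layout was already built for the first object.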
High Level, Type-Less, Byte Code
The Ejscript byte code draws on the best recent thinking in VM byte code design: it is high-level and type-less, with out-of-band exception handling. What is somewhat unusual about the Ejscript VM design is the lack of dedicated type byte codes. There are no string or number byte codes: all types are treated equally. This has freed up large swathes of byte codes to be allocated to optimized property access byte codes. The compiler uses the optimized early binding byte codes to streamline data access and the VM ensures valid binding at run-time. The net result is a very compact byte code and tiny program footprints. This is ideal for embedded web applications and translates into fast loading and response.
Ejscript supports multiple interpreters with defined data interchange mechanisms. This provides security because the data in one interpreter is not visible from another interpreter. Ejscript also supports rapid cloning of interpreters from a master interpreter. This is used in the Ejscript web framework to quickly create an interpreter to service a web request. Memory footprint and garbage collection overhead are minimized by read-only sharing of system types and classes between interpreters.
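The cloning model can be sketched as follows: every clone shares one read-only table of system types and gets its own private copy of mutable state. The Interpreter class and its fields are hypothetical and stand in for whatever the real VM uses.

```python
from types import MappingProxyType


class Interpreter:
    """Hypothetical interpreter: shared read-only system types plus private globals."""

    def __init__(self, system_types, global_vars=None):
        self.system_types = system_types  # shared by reference, never copied
        self.global_vars = {} if global_vars is None else global_vars

    def clone(self):
        """Create a new interpreter that shares system types but owns its globals."""
        return Interpreter(self.system_types, dict(self.global_vars))


# The master holds the long-lived system types behind a read-only view.
master = Interpreter(
    MappingProxyType({"String": "<String type>", "Number": "<Number type>"}),
    {"appName": "demo"},
)

request_1 = master.clone()
request_2 = master.clone()
request_1.global_vars["session"] = "alice"  # visible only inside request_1
```

Writes through the shared view raise TypeError, so a request interpreter cannot alter the types that other interpreters see.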
Composite Native Classes
Ejscript supports composite native types and classes that can be created in native C code. A composite native class is one where the properties of the class are not represented as discrete Ejscript properties that exist in the VM in their own right, but rather as native system types. This can dramatically compress native classes down to the minimum storage required to represent the object's state.
Composite native classes have several benefits. They:
- run much faster because they can access properties natively without going via the VM.
- have memory footprint requirements that are much smaller, typically more than a 10x reduction in size.
For example, a video class may have height and width properties. In a normal scripted or native class, these would be discrete Ejscript numeric objects with their own storage. In a composite native class, the class will store these internally as packed, native integers.
Ejscript can support composite native types because the VM and its native type interface delegate operations down to the native types themselves. This is in keeping with the high-level byte code philosophy.
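The video example can be modeled in a few lines. Real composite native classes are written in C. This Python sketch only mimics the storage idea, with CompositeVideo (a hypothetical name) keeping both properties in one packed 8-byte buffer instead of two separate numeric objects.

```python
import struct


class CompositeVideo:
    """Hypothetical composite class: width and height packed as two native 32-bit ints."""

    _layout = struct.Struct("<ii")  # little-endian: width, height
    __slots__ = ("_data",)

    def __init__(self, width, height):
        self._data = bytearray(self._layout.pack(width, height))

    @property
    def width(self):
        return self._layout.unpack_from(self._data, 0)[0]

    @width.setter
    def width(self, value):
        struct.pack_into("<i", self._data, 0, value)  # overwrite bytes 0..3 in place

    @property
    def height(self):
        return self._layout.unpack_from(self._data, 0)[1]

    @height.setter
    def height(self, value):
        struct.pack_into("<i", self._data, 4, value)  # overwrite bytes 4..7 in place


video = CompositeVideo(640, 480)
video.width = 1280
packed_size = len(video._data)
```

The property accessors play the role of the native type interface: callers still see width and height, while the state itself is just the packed bytes.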
The Ejscript garbage collector is a generational, non-compacting, mark and sweep collector. It is optimized for use in embedded applications – particularly when it is embedded inside web servers.
The garbage collector is generational in that it understands the ages of objects and focuses on collecting more recent garbage first and most aggressively. This results in short, quick collection cycles. This works especially well for web applications where the per-request interpreters are cloned from a master interpreter. In this case, most of the long lived generation of system classes and types reside in a master interpreter and are best not considered for collection.
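A minimal sketch of the generational idea follows, assuming a two-generation heap where only the young generation is swept. It is deliberately simplified: it re-scans every old object as a root rather than tracking old-to-young references, and the Heap and Obj names are hypothetical.

```python
class Obj:
    """A heap object with outgoing references and a mark bit."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


class Heap:
    """Toy non-compacting heap with a young and an old generation."""

    def __init__(self):
        self.young = []
        self.old = []

    def alloc(self, name):
        obj = Obj(name)
        self.young.append(obj)
        return obj

    def _mark(self, start):
        """Mark everything reachable from start, iteratively."""
        pending = [start]
        while pending:
            obj = pending.pop()
            if obj.marked:
                continue
            obj.marked = True
            pending.extend(obj.refs)

    def collect_young(self, roots):
        """Sweep only young objects. Old objects act as roots and are never freed here."""
        for obj in list(roots) + self.old:
            self._mark(obj)
        survivors = [obj for obj in self.young if obj.marked]
        freed = [obj.name for obj in self.young if not obj.marked]
        self.old.extend(survivors)  # promote survivors; objects are not moved
        self.young = []
        for obj in self.old:
            obj.marked = False      # reset mark bits for the next cycle
        return freed


heap = Heap()
system = heap.alloc("SystemTypes")
heap.collect_young([system])        # the long-lived object is promoted

request = heap.alloc("request")
temp = heap.alloc("temp")           # unreachable garbage
request.refs.append(system)
freed = heap.collect_young([system, request])
```

After the first cycle the long-lived object is never considered for sweeping again, which is the effect described for system classes held by a master interpreter.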
The collector is also integrated into the event mechanism so that collections will run during idle periods for longer running or event-based applications.
The garbage collector is non-compacting because for short duration applications the overhead of compacting memory outweighs the benefit. Furthermore, because objects never move, the VM can use direct object references together with early binding and direct threading to keep code paths highly optimized when accessing object properties.
For web applications, more generalized garbage collectors are sometimes less suited as they are often optimized for long running applications, whereas web applications are typified by short running requests. For such short applications, the collector should not waste time doing unnecessary collections and may, in fact, not need to be run at all. The Ejscript garbage collector takes advantage of a virtual memory management strategy where arenas of memory can be instantly freed in one step at the completion of web requests and thus avoids collection in many cases.
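The arena strategy can be modeled like this: everything allocated while serving a request is tied to one arena, and the whole arena is dropped in a single step when the request completes. Python manages its own memory, so this is a model of the bookkeeping only. Arena and handle_request are hypothetical names.

```python
class Arena:
    """Toy arena: tracks every allocation so all of them can be released at once."""

    def __init__(self):
        self.objects = []

    def alloc(self, value):
        self.objects.append(value)
        return value

    def release(self):
        """Free the whole arena in one step, with no per-object collection."""
        count = len(self.objects)
        self.objects.clear()
        return count


def handle_request(path, arena):
    """Serve one request, allocating all temporaries from the given arena."""
    parts = arena.alloc(path.strip("/").split("/"))
    body = arena.alloc("handled " + "/".join(parts))
    return str(body)


arena = Arena()
response = handle_request("/users/42", arena)
live_before = len(arena.objects)
released = arena.release()  # end of request: no mark or sweep pass needed
live_after = len(arena.objects)
```

No garbage collection cycle runs in this flow at all, which matches the case where a short request finishes before a collection would be worthwhile.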
When objects are collected, they are recycled via type-specific object pools that provide very rapid allocation of new object instances.
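A type-specific pool can be sketched as a free list per type: collected objects are cleared and parked, and the next allocation of that type reuses one instead of constructing a new instance. The ObjectPool class and the allocate and recycle helpers are assumptions for illustration.

```python
class ObjectPool:
    """Toy free list for one type. Reuses recycled instances before creating new ones."""

    def __init__(self, factory):
        self.factory = factory
        self.free = []
        self.created = 0

    def acquire(self):
        if self.free:
            return self.free.pop()  # fast path: reuse a recycled instance
        self.created += 1
        return self.factory()

    def release(self, obj):
        self.free.append(obj)


# One pool per type, keyed by the type itself.
pools = {list: ObjectPool(list), dict: ObjectPool(dict)}


def allocate(kind):
    return pools[kind].acquire()


def recycle(obj):
    """Called when the collector reclaims obj: reset it and return it to its pool."""
    obj.clear()
    pools[type(obj)].release(obj)


first = allocate(list)
first.append(1)
recycle(first)
second = allocate(list)  # comes from the pool, not from a fresh construction
```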
Future VM Enhancements
This VM is a good platform for future enhancements. The byte code was designed to support inline native code interleaving where the compiler can selectively generate native code blocks. The compiler will be able to generate this native code either when pre-compiling for maximum speed or on-the-fly as a just-in-time compiler.
The Ejscript compiler is an optimizing script compiler that generates optimized byte code. It has up to 4 passes depending on the mode of compilation:
- Conditional compilation and hoisting
- Early binding
- Code generation
The compiler supports conditional compilation based upon the evaluation of constant expressions at compile time. The Ejscript compiler can run either stand-alone or embedded in a shell or other program.
When the Ejscript compiler is embedded into another application, it provides a one-step compile and execute cycle. The byte code output of the compiler can optionally be cached as module files or can be simply created, executed, and then discarded.
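Conditional compilation on constant expressions can be illustrated with a small pass that removes branches whose condition is a literal constant. The sketch uses Python source and Python's own ast module (Python 3.9 or later for ast.unparse) as a stand-in. It shows the general technique, not how the Ejscript compiler is implemented.

```python
import ast


class ConstantIfFolder(ast.NodeTransformer):
    """Replace `if <constant>:` statements with whichever branch is live."""

    def visit_If(self, node):
        self.generic_visit(node)  # fold nested statements first
        if isinstance(node.test, ast.Constant):
            # Keep only the branch the constant selects; the other is never emitted.
            return node.body if node.test.value else node.orelse
        return node               # condition depends on run-time values


def fold_constant_ifs(source):
    tree = ConstantIfFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


source = "if False:\n    mode = 'debug'\nelse:\n    mode = 'release'\n"
folded = fold_constant_ifs(source)

dynamic = fold_constant_ifs("if flag:\n    x = 1\n")  # left untouched
```

The dead branch never reaches code generation, so it costs nothing in the resulting byte code.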
The ejs command shell uses this approach to provide an interactive Ejscript command shell. The Appweb web server uses this approach to host the Ejscript web framework for web applications.
Stand Alone Compilation
The Ejscript compiler can also operate in a stand-alone mode where scripts are compiled into optimized byte code in the form of module files. These compact module files can be distributed and executed by the VM without requiring further use of the compiler. This is often ideal for embedded applications where saving the memory footprint normally occupied by the compiler is a compelling advantage.
Compiling scripts into byte code and module files at development time has other advantages as well. The compiler can spend more time optimizing the program and thus generate better byte code. If the program will be run many, many times – spending some development cycles to produce the best code possible is well worth the effort. The code may also be more secure as the source code for the program is not being supplied – only the byte code.
Ejscript provides a module language directive and module loading mechanism that permits like sections of a program to be grouped into a single loadable module. The Ejscript module directive solves several problems for the script developer:
- How to package script files into a convenient package.
- How to provide a unique, safe namespace for declarations that is guaranteed not to clash with other modules.
- How to specify and resolve module dependencies.
- How to version and permit side-by-side execution of different versions of the same module.
The Ejscript module format provides a compressed archive of module declarations in a single physical file. The module file contains byte code, constant pool, debug code, exception handlers and class and method signatures. A single module file may actually contain multiple logical modules. This way, an application containing more than one module may still be packaged as a single module file.
The compiler has the ability to compile multiple source files and generate one file per module or optionally merge input files (or modules) into a single output module. The compiler thus functions as a link editor resolving module dependencies and ensuring the final module contains all the necessary dependency records.
When compiled with debug information enabled via the --debug switch, the compiler stores debug information in the module file to enable symbolic debugging. Ejscript currently does not have a symbolic debugger, but trace output and C level debugging can inspect script source statements and line numbers.
Ejscript provides a safe namespace by using namespaces to qualify module names. The namespace for a module ensures that two declarations of the same name, but in different modules can be correctly identified by the compiler and VM.
Namespaces are mostly transparent to the user and module writer – except that the user imports the module by using a “use module” directive, and the module writer publishes a module by bracketing the code in a “module” directive.
The Ejscript module file format specifies any and all dependent modules that are required by the module. A module dependency is specified in the Ejscript source program via the “use module” directive. This instructs the compiler to load (or compile) that module and perform various compile time and run time checks. The VM module loader will load all dependent modules and initialize them in the appropriate order.
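Loading dependent modules in the appropriate order amounts to a depth-first walk of the dependency records, initializing each module only after everything it uses. The sketch below assumes the dependencies are available as a simple mapping. The module names and the load_order function are hypothetical.

```python
def load_order(dependencies, root):
    """Return modules in initialization order: every module after its dependencies."""
    order = []
    state = {}

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"circular dependency involving {name!r}")
        state[name] = "visiting"
        for dependency in dependencies.get(name, ()):
            visit(dependency)
        state[name] = "done"
        order.append(name)  # all of its dependencies are already in the list

    visit(root)
    return order


# Each key lists the modules it names in its "use module" directives.
dependencies = {
    "app": ["web", "db"],
    "web": ["core"],
    "db": ["core"],
    "core": [],
}
order = load_order(dependencies, "app")
```

A shared dependency such as core appears once, ahead of both modules that use it.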
Module names are opaque, but by convention each name must be unique. You can use a reverse domain naming convention to ensure uniqueness. It is anticipated that in the future, some form of module registry will exist to reserve unique names.
Side by Side Execution
Because modules load in dedicated and isolated namespaces, a module is guaranteed not to clash with other names. Module writers can version packages by adding a version namespace in addition to the module namespace. The user of the module can then put a “use namespace” directive in their code to request a specific version of the module.
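Side-by-side versions can be pictured as declarations keyed by a namespace and a name, where the namespace carries both the module name and a version. This sketch is an assumption about the general mechanism. The GlobalScope class, the namespace strings and the area functions are all hypothetical.

```python
class GlobalScope:
    """Toy symbol table keyed by (namespace, name) rather than by name alone."""

    def __init__(self):
        self.table = {}

    def define(self, namespace, name, value):
        key = (namespace, name)
        if key in self.table:
            raise KeyError(f"{name!r} already defined in {namespace!r}")
        self.table[key] = value

    def lookup(self, name, open_namespaces):
        """Resolve name against the namespaces the caller has opened."""
        matches = [
            self.table[(namespace, name)]
            for namespace in open_namespaces
            if (namespace, name) in self.table
        ]
        if len(matches) != 1:
            raise LookupError(f"{name!r} is missing or ambiguous")
        return matches[0]


scope = GlobalScope()
# Two versions of the same module coexist because their namespaces differ.
scope.define("com.example.shapes/1.0", "area", lambda w, h: w * h)
scope.define("com.example.shapes/2.0", "area", lambda w, h: float(w * h))

# Each caller opens the version it wants, as a "use namespace" directive would.
area_v1 = scope.lookup("area", ["com.example.shapes/1.0"])
area_v2 = scope.lookup("area", ["com.example.shapes/2.0"])
```

Opening both versions at once makes the lookup ambiguous in this sketch, which is why the caller has to request a specific version.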