File size: 6,173 Bytes
bc20498 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 |
This repository is only intended to be used as a reference implementation of Turndown's reference HTML parser.
It is not intended to be used as a general-purpose DOM implementation. No contributions other than bugfixes to
supported Turndown use case will be accepted.
# Server-side DOM implementation based on Mozilla's dom.js
This is a fork of [Angular Domino](https://github.com/angular/domino), which is fork of the original [Domino](https://github.com/fgnass/domino).
As the name might suggest, domino's goal is to provide a <b>DOM in No</b>de.
In contrast to the original [dom.js](https://github.com/andreasgal/dom.js) project, domino was not designed to run untrusted code. Hence it doesn't have to hide its internals behind a proxy facade which makes the code not only simpler, but also [more performant](https://github.com/fgnass/dombench).
Domino currently doesn't use any harmony/ES6 features like proxies or WeakMaps and therefore also runs in older Node versions.
## Speed over Compliance
Domino is intended for _building_ pages rather than scraping them. Hence Domino doesn't execute scripts nor does it download external resources.
Also Domino doesn't generally implement properties which have been deprecated in HTML5.
Domino sticks to [DOM level 4](http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#interface-attr), which means that Attributes do not inherit the Node interface.
<b>Note that</b> because domino does not use proxies,
`Element.attributes` is not a true JavaScript array; it is an object
with a `length` property and an `item(n)` accessor method. See
[github issue #27](https://github.com/fgnass/domino/issues/27) for
further discussion. It does however implement direct indexed accessors
(`element.attributes[i]`) and is live.
## CSS Selector Support
Domino provides support for `querySelector()`, `querySelectorAll()`, and `matches()` backed by the [Zest](https://github.com/chjj/zest) selector engine.
## Optimization
Domino represents the DOM tree structure in the same way Webkit and
other browser-based implementations do: as a linked list of children
which is converted to an array-based representation iff the
`Node#childNodes` accessor is used. You will get the best performance
from tree modification code (inserting and removing children) if you
avoid the use of `Node#childNodes` and traverse the tree using
`Node#firstChild`/`Node#nextSibling` (or
`Node#lastChild`/`Node#previousSibling`) or `querySelector()`/etc.
## Usage
Domino supports the DOM level 4 API, and thus API documentation can be
found on standard reference sites. For example, you could start from
MDN's documentation for
[Document](https://developer.mozilla.org/en-US/docs/Web/API/Document) and
[Node](https://developer.mozilla.org/en-US/docs/Web/API/Node).
The only exception is the initial creation of a document:
```javascript
var domino = require('domino');
var Element = domino.impl.Element; // etc
// alternatively: document = domino.createDocument(htmlString, true)
var h1 = document.querySelector('h1');
console.log(h1.innerHTML);
console.log(h1 instanceof Element);
```
There is also an incremental parser available, if you need to interleave
parsing with other processing:
```javascript
var domino = require('domino');
var pauseAfter = function(ms) {
var start = Date.now();
return function() { return (Date.now() - start) >= ms; };
};
var incrParser = domino.createIncrementalHTMLParser();
incrParser.write('<p>hello<');
incrParser.write('b>&am');
incrParser.process(pauseAfter(1/*ms*/)); // can interleave processing
incrParser.write('p;');
// ...etc...
incrParser.end(); // when done writing the document
while (incrParser.process(pauseAfter(10/*ms*/))) {
// ...do other housekeeping...
}
console.log(incrParser.document().outerHTML);
```
If you want a more standards-compliant way to create a `Document`, you can
also use [DOMImplementation](https://developer.mozilla.org/en-US/docs/Web/API/DOMImplementation):
```javascript
var domino = require('domino');
var domimpl = domino.createDOMImplementation();
var doc = domimpl.createHTMLDocument();
```
By default many domino methods will be stored in writable properties, to
allow polyfills (as browsers do). You can lock down the implementation
if desired as follows:
```javascript
global.__domino_frozen__ = true; // Must precede any `require('domino')`
var domino = require('domino');
```
## Tests
Domino includes test from the [W3C DOM Conformance Suites](http://www.w3.org/DOM/Test/)
as well as tests from [HTML Working Group](http://www.w3.org/html/wg/wiki/Testing).
When you checkout this repository for the first time, run the following command to also check out code for the mentioned tests:
```
git submodule update --init --recursive
```
The tests can be run via `npm test` or directly though the [Mocha](http://mochajs.org/) command line:

## License and Credits
The majority of the code was originally written by [Andreas Gal](https://github.com/andreasgal/) and [David Flanagan](https://github.com/davidflanagan) as part of the [dom.js](https://github.com/andreasgal/dom.js) project. Please refer to the included LICENSE file for the original copyright notice and disclaimer.
[Felix Gnass](https://github.com/fgnass/) extracted the code and turned
it into a stand-alone npm package.
The code has been maintained since 2013 by [C. Scott Ananian](https://github.com/cscott/) on behalf of the Wikimedia Foundation, which uses it in its
[Parsoid](https://www.mediawiki.org/wiki/Parsoid) project. A large number
of improvements have been made, mostly focusing on correctness,
performance, and (to a lesser extent) completeness of the implementation.
[1]: https://travis-ci.org/fgnass/domino.svg
[2]: https://travis-ci.org/fgnass/domino
[3]: https://david-dm.org/fgnass/domino.svg
[4]: https://david-dm.org/fgnass/domino
[5]: https://david-dm.org/fgnass/domino/dev-status.svg
[6]: https://david-dm.org/fgnass/domino#info=devDependencies
|