The library can therefore be used to parse XML and HTML in a very easy and maintainable way.
You can simply install the latest version of artoo.js with npm:
npm install artoo-js
Or if you need the latest development version:
npm install git+https://github.com/medialab/artoo.git
Then require it in any of your scripts like this:
var artoo = require('artoo-js');
When using artoo.js within node.js, the library will switch to cheerio instead of jQuery.
If you are not familiar with artoo.js' scraping utilities, you should go read the quick start before continuing.
There are two ways of using the scraping utilities along with cheerio:
You can bootstrap your cheerio object so you can use the
var artoo = require('artoo-js'),
    cheerio = require('cheerio');

// Bootstrapping cheerio
artoo.bootstrap(cheerio);

var $ = cheerio.load(myXMLString);
var data = $('ul > li').scrape(params);
Alternatively, you can set artoo's own context to be a specific cheerio instance (typically a dollar variable). Please note that this is not ideal when dealing with asynchronous processes run in parallel, since they would all share the same context.
var artoo = require('artoo-js'),
    cheerio = require('cheerio');

var $ = cheerio.load(myXMLString);

// Setting artoo's context
artoo.setContext($);

var data = artoo.scrape('ul > li', params);
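To illustrate why a single shared context is awkward with parallel asynchronous processes, here is a minimal sketch. It does not use artoo at all: `shared`, `startJob` and the document names are hypothetical stand-ins, and the interleaving of two jobs is simulated synchronously to keep the example deterministic.

```javascript
// Hypothetical stand-in for a module that keeps ONE shared context.
// This is NOT artoo's code, only an illustration of the hazard.
var shared = {
  context: null,
  setContext: function(ctx) { this.context = ctx; },
  read: function() { return this.context; }
};

// A job sets the shared context first, then does its actual work later,
// the way an asynchronous callback would.
function startJob(name) {
  shared.setContext(name);

  // The returned function plays the role of the deferred callback.
  return function finish() {
    return {expected: name, actual: shared.read()};
  };
}

// Two jobs start before either of them finishes.
var finishA = startJob('docA');
var finishB = startJob('docB');

// Job A now reads the context that job B installed in the meantime.
var resultA = finishA();
var resultB = finishB();

console.log(resultA); // { expected: 'docA', actual: 'docB' }
console.log(resultB); // { expected: 'docB', actual: 'docB' }
```

With the bootstrapping approach, each `$` returned by `cheerio.load` carries its own document, so this kind of cross-talk should not arise.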
Note that you can also use
Most of the library's helpers, writers and parsers included, can be used within node.js.
var artoo = require('artoo-js');

var tabularData = [['John', 'Tell'], ['Mary', 'Proudlike']];
var csvString = artoo.writers.csv(tabularData);
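For readers unsure what kind of string a CSV writer returns for such tabular data, here is a rough, self-contained sketch. `toCsv` and `escapeCell` are hypothetical helpers written for this illustration, not artoo's implementation, and the output of `artoo.writers.csv` may differ in quoting, delimiters or line endings.

```javascript
// Hypothetical helper: quote a cell only when it contains a comma,
// a double quote or a line break, doubling any embedded quotes.
function escapeCell(value) {
  var s = String(value);
  return /[",\n\r]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Hypothetical helper: join cells with commas and rows with newlines.
function toCsv(rows) {
  return rows
    .map(function(row) {
      return row.map(escapeCell).join(',');
    })
    .join('\n');
}

var tabularData = [['John', 'Tell'], ['Mary', 'Proudlike']];
var csvString = toCsv(tabularData);

console.log(csvString);
// John,Tell
// Mary,Proudlike
```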
When artoo.js is built for npm, it also bundles other builds of the library, such as the one used by the Chrome extension and the one for Phantom.js.
If needed, you can access their paths in node like this:
var artoo = require('artoo-js');

artoo.paths.browser;
artoo.paths.chrome;
artoo.paths.phantom;