artoo.js The client-side scraping companion.

artoo.helpers


The aim of artoo.js’ helpers module is to provide you with useful but miscellaneous functions, ranging from external script injection to array conversion.


Miscellaneous

Root helpers - callable from artoo

Standard helpers - callable from artoo.helpers

Parsers - callable from artoo.parsers

Writers - callable from artoo.writers

Cookies - callable from artoo.cookies

Custom console - callable from artoo.log


artoo.emitter

artoo.js uses the emmett library to provide its users with simple event emitting if needed.

var ee = new artoo.emitter();

ee.on('message', function(e) {
  console.log('received', e.data);
});

ee.emit('message', 'Hello world!');

Note also that artoo itself is an event emitter.

artoo.on('message', function(e) {
  console.log('received', e.data);
});

artoo.emit('message');
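
To make the handler signature used above more concrete, here is a minimal sketch of an emitter whose handlers receive an event object carrying a data property. TinyEmitter is a hypothetical name and this is not emmett's implementation, only an illustration of the pattern.

```javascript
// Hypothetical minimal emitter, for illustration only (not emmett's code).
function TinyEmitter() {
  this.handlers = {};
}

// Register a handler for a given event name.
TinyEmitter.prototype.on = function(name, fn) {
  (this.handlers[name] = this.handlers[name] || []).push(fn);
  return this;
};

// Call every handler registered for the event with an event object.
TinyEmitter.prototype.emit = function(name, data) {
  var self = this;
  (this.handlers[name] || []).forEach(function(fn) {
    fn.call(self, {type: name, data: data});
  });
  return this;
};

var tiny = new TinyEmitter();
var received = [];

tiny.on('message', function(e) {
  received.push(e.data);
});

tiny.emit('message', 'Hello world!');
```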

artoo.getGlobalVariables

Returns global variables set by the page itself.

This can be useful if you need to search for loopholes in the page’s JavaScript code.

Sometimes you don’t even need to scrape, because the developer left variables containing the data you need in the global scope.

artoo.getGlobalVariables();
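
One way to picture this kind of detection is to compare the keys of the page's global object with those of a pristine one, such as the window of a fresh iframe. The sketch below assumes that approach and uses plain objects as stand-ins so that it runs anywhere; diffGlobalsSketch is a hypothetical name, and artoo's own implementation may differ.

```javascript
// Hypothetical helper: lists the keys present on the page's global object
// but absent from a pristine one (for instance a fresh iframe's window).
function diffGlobalsSketch(pageGlobals, pristineGlobals) {
  return Object.keys(pageGlobals).filter(function(key) {
    return !(key in pristineGlobals);
  });
}

// Plain objects stand in for the two window objects here.
var pristine = {document: {}, location: {}};
var page = {
  document: {},
  location: {},
  APP_DATA: [1, 2, 3],
  jQuery: function() {}
};

var leaked = diffGlobalsSketch(page, pristine);
```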

artoo.injectScript

Inject a remote script into the current webpage and trigger a callback when the browser has retrieved it.

artoo.injectScript(url, [callback]);

Example

artoo.injectScript('//randomcdn/jquery-1.11.0.min.js', function() {
  console.log('Finished injecting jquery version 1.11.0');
});

artoo.injectStyle

Inject a remote stylesheet into the current webpage and trigger a callback when the browser has retrieved it.

artoo.injectStyle(url, [callback]);

Example

artoo.injectStyle('//localhost:8000/style.css', function() {
  console.log('Finished injecting my custom css rules');
});

artoo.injectInlineStyle

Inject a CSS style string into the current webpage.

artoo.injectInlineStyle(styleString);

Example

artoo.injectInlineStyle('.my-class {color: red;}');

artoo.waitFor

Wait for some event to happen before triggering a callback. This is especially useful when you need, for instance, to wait for the webpage to load some elements before continuing to scrape.

artoo.waitFor(condition, callback, [params]);

Arguments

  • condition function : a function returning false while the event you are waiting for has not happened yet, and true once it has.
  • callback ?function : a function to trigger when your event has finally happened. Note that if callback is an object, it will be considered as the params argument.
  • params ?object : an object that may contain the following properties:
    • done ?function : same as callback argument.
    • interval ?integer [30] : Interval in milliseconds between each check of the condition.
    • timeout ?integer : Timeout in milliseconds. If it is reached, the callback will be called with a timeout Error as its first argument.

Example

// We are waiting for a list to populate
var currentLength = $('.list-item').length;

artoo.waitFor(
  function() {
    return $('.list-item').length > currentLength;
  },
  function(err) {
    console.log('Yay, new items in the list!');
  }
);
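
The arguments described above can be illustrated with a small polling loop. This is a sketch under assumptions, not artoo's source: waitForSketch is a hypothetical name, and checking the condition once immediately before scheduling further checks is a choice made here for simplicity.

```javascript
// Hypothetical polling helper mirroring the documented arguments.
// Assumption: the condition is checked once right away, then every `interval` ms.
function waitForSketch(condition, callback, params) {
  // If the second argument is an object, treat it as params.
  if (callback !== null && typeof callback === 'object') {
    params = callback;
    callback = null;
  }
  params = params || {};

  var done = callback || params.done || function() {};
  var interval = params.interval || 30;
  var start = Date.now();

  (function check() {
    if (condition())
      return done(null);

    // Give up with an Error as first argument once the timeout is exceeded.
    if (params.timeout && Date.now() - start >= params.timeout)
      return done(new Error('timeout'));

    setTimeout(check, interval);
  })();
}

// Both call styles: a plain callback, or a params object holding `done`.
var calls = [];
waitForSketch(
  function() { return true; },
  function(err) { calls.push(err); }
);

var viaParams = [];
waitForSketch(
  function() { return true; },
  {done: function(err) { viaParams.push(err); }, interval: 10}
);
```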

artoo.helpers.createDocument

Returns a new DOM document for you to use.

artoo.helpers.createDocument([root, namespace]);

Arguments

  • root ?string : if no root is provided, the function will return a new HTML document. Otherwise, it will create a document having the specified root.
  • namespace ?string : an optional namespace.

Examples

// Creating a new HTML document
artoo.helpers.createDocument();

// Creating an XML document containing fruits
artoo.helpers.createDocument('fruits');

artoo.helpers.jquerify

Takes a string, a DOM document or something else and returns a jQuery object built from it.

This is useful to retrieve a jQuery-usable object when you don’t really know what you have to handle: HTML document, XML document, string, erroneous string…

artoo.helpers.jquerify(data);
>>> $data

artoo.parsers.cookie

Parses a cookie string.

artoo.parsers.cookie(
  'name2=value2; Expires=Wed, 09 Jun 2021 10:18:14 GMT'
);
>>> {
  httpOnly: false,
  secure: false,
  key: 'name2',
  value: 'value2',
  expires: 'Wed, 09 Jun 2021 10:18:14 GMT'
}
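
The shape of the result above can be reproduced with a few lines of string handling. The following is only a sketch: parseCookieSketch is a hypothetical name, the path and domain fields are assumptions, and artoo's parser may treat edge cases differently.

```javascript
// Hypothetical single-cookie parser; field names follow the documented output.
function parseCookieSketch(string) {
  var cookie = {httpOnly: false, secure: false};

  string.split(';').forEach(function(part, i) {
    part = part.trim();
    if (!part) return;

    var idx = part.indexOf('=');
    var name = idx === -1 ? part : part.slice(0, idx);
    var value = idx === -1 ? '' : part.slice(idx + 1);

    if (i === 0) {
      // The first pair holds the cookie's key and value.
      cookie.key = name;
      cookie.value = value;
      return;
    }

    // Remaining parts are attributes, matched case-insensitively.
    switch (name.toLowerCase()) {
      case 'httponly': cookie.httpOnly = true; break;
      case 'secure': cookie.secure = true; break;
      case 'expires': cookie.expires = value; break;
      case 'path': cookie.path = value; break;
      case 'domain': cookie.domain = value; break;
    }
  });

  return cookie;
}

var parsedCookie = parseCookieSketch(
  'name2=value2; Expires=Wed, 09 Jun 2021 10:18:14 GMT'
);
```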

artoo.parsers.cookies

Parses a string containing one or more cookies and returns an object storing them in a key/value fashion. Typically, this can be used to parse document.cookie.

artoo.parsers.cookies(document.cookie);
>>> {
  cookie1: 'value1',
  cookie2: 'value2'
}
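
Since document.cookie is a list of key=value pairs separated by semicolons, the conversion can be sketched as follows. parseCookiesSketch is a hypothetical name, and the sketch leaves values undecoded, which may not match what artoo does.

```javascript
// Hypothetical parser for a document.cookie-style string.
function parseCookiesSketch(string) {
  var cookies = {};

  string.split(';').forEach(function(pair) {
    var idx = pair.indexOf('=');

    // Skip fragments that are not key=value pairs.
    if (idx === -1) return;

    var key = pair.slice(0, idx).trim();
    cookies[key] = pair.slice(idx + 1).trim();
  });

  return cookies;
}

var jar = parseCookiesSketch('cookie1=value1; cookie2=value2');
```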

artoo.parsers.headers

Parses a string containing HTTP headers. Typically, this can be used to parse XHR headers.

artoo.parsers.headers(xhr.getAllResponseHeaders());
>>> {
  Date: 'Sat, 13 Dec 2014 15:27:42 GMT',
  Connection: 'keep-alive',
  'Content-Length': '9072',
  'Content-Type': 'text/html; charset=utf-8'
}
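
The string returned by xhr.getAllResponseHeaders() holds one "Name: value" pair per line, so a parser only needs to split on line breaks and on the first colon of each line. Here is a sketch under that assumption; parseHeadersSketch is a hypothetical name, not artoo's code.

```javascript
// Hypothetical header parser: one "Name: value" pair per line.
function parseHeadersSketch(string) {
  var headers = {};

  string.split(/\r?\n/).forEach(function(line) {
    // Split on the first colon only, since values may contain colons too.
    var idx = line.indexOf(':');
    if (idx === -1) return;

    headers[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  });

  return headers;
}

var parsedHeaders = parseHeadersSketch(
  'Date: Sat, 13 Dec 2014 15:27:42 GMT\r\n' +
  'Content-Length: 9072\r\n'
);
```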

artoo.parsers.queryString

Parses a query string.

artoo.parsers.queryString('var1=value1&var2=value2');
>>> {
  var1: 'value1',
  var2: 'value2'
}
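
As an illustration of what such a parser involves, the sketch below splits the string on & and =, then decodes each part. parseQueryStringSketch is a hypothetical name; stripping a leading ? and turning + into a space are assumptions of this sketch and may not reflect artoo's behavior.

```javascript
// Hypothetical query string parser.
function parseQueryStringSketch(string) {
  var data = {};

  string.replace(/^\?/, '').split('&').forEach(function(pair) {
    if (!pair) return;

    var idx = pair.indexOf('=');
    var key = idx === -1 ? pair : pair.slice(0, idx);
    var value = idx === -1 ? '' : pair.slice(idx + 1);

    // '+' encodes a space in form-encoded query strings.
    data[decodeURIComponent(key)] =
      decodeURIComponent(value.replace(/\+/g, ' '));
  });

  return data;
}

var query = parseQueryStringSketch('var1=value1&var2=value2');
var encoded = parseQueryStringSketch('?q=hello+world%21');
```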

artoo.parsers.url

Parses the given url as Node.js would.

artoo.parsers.url('http://localhost:8000')
>>> {
  href: 'http://localhost:8000/',
  protocol: 'http',
  host: 'localhost:8000',
  hostname: 'localhost',
  port: 8000,
  path: '/',
  pathname: '/',
  domain: 'localhost'
}
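
For comparison, the standard URL class available in modern browsers and in Node.js exposes similar fields. The snippet below uses only that standard API, not artoo; note that it keeps the trailing colon on the protocol, returns the port as a string and has no domain field.

```javascript
// Standard WHATWG URL API, shown for comparison with the output above.
var standardUrl = new URL('http://localhost:8000');

var fields = {
  href: standardUrl.href,         // 'http://localhost:8000/'
  protocol: standardUrl.protocol, // 'http:' (trailing colon kept)
  host: standardUrl.host,         // 'localhost:8000'
  hostname: standardUrl.hostname, // 'localhost'
  port: standardUrl.port,         // '8000' (a string, not a number)
  pathname: standardUrl.pathname  // '/'
};
```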

artoo.writers.cookie

Creates a cookie string with the given parameters.

artoo.writers.cookie(key, value, [params]);

Arguments

  • key string: the cookie’s key.
  • value string: the cookie’s value.
  • params ?object: an object of optional parameters containing the following keys:
    • days ?integer: number of days before cookie expiration.
    • domain ?string: cookie’s domain.
    • httpOnly ?boolean [false]: should the cookie be httpOnly?
    • path ?string: cookie’s path.
    • secure ?boolean [false]: should the cookie be secure?

Example

artoo.writers.cookie('myKey', 'myValue', {
  httpOnly: true
});
>>> 'myKey=myValue; HttpOnly'
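
To show how the documented parameters could map to cookie attributes, here is a sketch of such a writer. writeCookieSketch is a hypothetical name, and the order in which attributes are appended is an assumption; artoo's actual output may be ordered differently.

```javascript
// Hypothetical cookie writer using the documented parameter names.
function writeCookieSketch(key, value, params) {
  params = params || {};
  var cookie = key + '=' + value;

  if (params.days) {
    // Convert a number of days into an absolute expiration date.
    var date = new Date();
    date.setTime(date.getTime() + params.days * 24 * 60 * 60 * 1000);
    cookie += '; Expires=' + date.toUTCString();
  }

  if (params.domain) cookie += '; Domain=' + params.domain;
  if (params.path) cookie += '; Path=' + params.path;
  if (params.secure) cookie += '; Secure';
  if (params.httpOnly) cookie += '; HttpOnly';

  return cookie;
}

var httpOnlyCookie = writeCookieSketch('myKey', 'myValue', {httpOnly: true});
var scopedCookie = writeCookieSketch('a', 'b', {path: '/', secure: true});
```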

artoo.writers.csv

Converts an array of arrays into a CSV string, or an array of objects into a CSV string with headers.

artoo.writers.csv(data, [params]);

Arguments

  • data array : The array of arrays or array of objects to convert into a CSV string.
  • params ?object : an object of optional parameters containing the following keys:
    • delimiter ?string [,] : The field delimiter.
    • escape ?string ["] : The escape character for the fields and the field delimiter.
    • order ?array : if you pass an array of objects, the wanted keys and their order.
    • headers ?boolean or ?array : if false, the function won’t add a header line. If you provide an array, the header line will follow it. Note that, by default, a header line is added when an array of objects is passed as data.

Example

var persons = [
  {
    firstname: 'Caroline',
    lastname: 'Williams'
  },
  {
    firstname: 'Steven',
    lastname: 'Douglas'
  }
];

artoo.writers.csv(persons);
>>> 'firstname,lastname
     Caroline,Williams
     Steven,Douglas'
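
The role of the delimiter, escape, order and headers parameters is easier to see in code. The sketch below is one possible implementation, not artoo's: writeCsvSketch is a hypothetical name, header keys are taken from the first object only, and a field is quoted only when it contains the delimiter, the escape character or a line break.

```javascript
// Hypothetical CSV writer following the documented parameters.
function writeCsvSketch(data, params) {
  params = params || {};
  var delimiter = params.delimiter || ',';
  var escape = params.escape || '"';

  // Quote a field only when needed, doubling any escape character inside it.
  function quote(value) {
    var s = value === null || value === undefined ? '' : String(value);
    var needsQuoting = s.indexOf(delimiter) !== -1 ||
                       s.indexOf(escape) !== -1 ||
                       /[\r\n]/.test(s);

    if (!needsQuoting) return s;
    return escape + s.split(escape).join(escape + escape) + escape;
  }

  var rows = data;

  if (data.length && !Array.isArray(data[0])) {
    // Array of objects: keys come from `order` or from the first item (assumption).
    var keys = params.order || Object.keys(data[0]);

    rows = data.map(function(item) {
      return keys.map(function(k) { return item[k]; });
    });

    if (params.headers !== false)
      rows = [Array.isArray(params.headers) ? params.headers : keys].concat(rows);
  }
  else if (Array.isArray(params.headers)) {
    rows = [params.headers].concat(rows);
  }

  return rows.map(function(row) {
    return row.map(quote).join(delimiter);
  }).join('\n');
}

var people = [
  {firstname: 'Caroline', lastname: 'Williams'},
  {firstname: 'Steven', lastname: 'Douglas'}
];

var peopleCsv = writeCsvSketch(people);
var quotedCsv = writeCsvSketch([['a,b', 'say "hi"']]);
```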

artoo.writers.queryString

Converts an object into a query string.

artoo.writers.queryString(object, [caster]);

Arguments

  • object object: a simple key/value object to convert.
  • caster ?function: a function used to cast values into the desired format.

Examples

artoo.writers.queryString({hello: 'world'});
>>> 'hello=world'

artoo.writers.queryString(
  {
    hello: 'world',
    secure: true,
    authenticated: false
  },
  function(v) {
    if (v === true)
      return 1;
    else if (v === false)
      return 0;
    else
      return v;
  }
);
>>> 'hello=world&secure=1&authenticated=0'
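
The caster is simply applied to each value before it is written. A minimal sketch of that flow, assuming keys and values are URI-encoded, could look like this (writeQueryStringSketch is a hypothetical name, not artoo's code):

```javascript
// Hypothetical query string writer with an optional caster.
function writeQueryStringSketch(object, caster) {
  return Object.keys(object).map(function(key) {
    // Cast the value first, when a caster function is given.
    var value = typeof caster === 'function' ?
      caster(object[key]) :
      object[key];

    return encodeURIComponent(key) + '=' + encodeURIComponent(value);
  }).join('&');
}

var simple = writeQueryStringSketch({hello: 'world'});

var casted = writeQueryStringSketch(
  {hello: 'world', secure: true, authenticated: false},
  function(v) {
    if (v === true) return 1;
    if (v === false) return 0;
    return v;
  }
);
```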

artoo.writers.yaml

Converts a JavaScript variable into a YAML string.

artoo.writers.yaml(data);

Example

var data = {
  hello : 'world',
  how: 'are you?',
  colors: ['yellow', 'blue']
};

artoo.writers.yaml(data);
>>> '---
     hello: world
     how: are you?
     colors:
       - yellow
       - blue'

artoo.cookies

artoo.cookies provides a simple way to interact with the host page’s stored cookies.

Retrieving cookies

// Get every cookie
artoo.cookies();
artoo.cookies.get();
artoo.cookies.getAll();

// Get cookie by key
artoo.cookies(key);
artoo.cookies.get(key);

Setting cookies

As per artoo.writers.cookie.

artoo.cookies.set(key, value, [params]);

Removing cookies

// Remove cookie by key
artoo.cookies.remove(key);

// Remove every cookie
artoo.cookies.removeAll();
artoo.cookies.clear();

artoo.log

Provides a simple and colorful way to log data to your console.

Levels

  • verbose : cyan
  • debug : blue
  • info : green
  • warning : orange
  • error : red

Examples

// If no level is provided, 'debug' is taken by default.
artoo.log('hello');
artoo.log('hello', 'info');

// Some aliases exist
artoo.log.verbose('hello');
artoo.log.debug('hello');
artoo.log.info('hello');
artoo.log.error('hello');
artoo.log.warning('hello');