While working, sandcrawler spiders abide by a precise lifecycle and emit an event each time they do something, so that anyone can hook onto them to track their progress or implement custom logic.
Example

```js
spider.on('spider:start', function() {
  console.log('The spider has started.');
});

spider.on('job:success', function(job) {
  console.log('Success for url:', job.res.url);
});
```
If you ever wonder what's inside the job objects passed around by most job-level events, be sure to check out this part of the documentation first.
Spider-level events
spider:start
Emitted when the spider starts.

spider:teardown
Emitted when the spider tears down. This is useful for plugins needing to clean up when the hooked spider finishes its work.

spider:success
Emitted when the spider succeeds. Note that the spider can succeed even if some of its jobs did not: the spider will only be considered as failed if a global error occurred while running.

spider:fail
Emitted when the spider fails globally.

spider:end
Emitted when the spider ends, whether it succeeded or failed.
Data: `status`, a string being either `success` or `fail`.
Job-level events

job:add
Emitted when a job is added to the spider's queue while it is running.
Data: the added job.

job:discard
Emitted when a job is discarded from the spider's queue because it was rejected by a `beforeScraping` middleware.
Data: the discarded job.

job:start
Emitted when the spider starts processing a job.
Data: the concerned job.

job:scrape
Emitted when the spider starts scraping a job.
Data: the concerned job.

job:success
Emitted when a job succeeds.
Data: the concerned job.

job:fail
Emitted when a job fails.
Data: the concerned job.

job:retry
Emitted when a job is retried.
Data: the concerned job, along with `when`, a string being either `now` or `later`.

job:end
Emitted when a job ends, whether it succeeded or failed.
Data: the concerned job, along with `status`, a string being either `success` or `fail`.