Crawler Python API

Getting started with Crawler is easy. The main class you need to care about is crawler.main.Crawler.

crawler.main

Main Module

class crawler.main.Crawler(url, delay, ignore)

Main Crawler object.

Example:

from crawler.main import Crawler

c = Crawler('http://example.com')
c.crawl()

Parameters:
  • url – The URL to crawl
  • delay – Number of seconds to wait between requests
  • ignore – Paths to ignore

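To illustrate how a delay and a list of ignored paths might work together, here is a minimal sketch. The helper names (is_ignored, fetch_all) and the prefix-matching rule are assumptions made for this example; the documentation above does not say how Crawler matches ignored paths, so this is not a description of its internals.

```python
import time
from urllib.parse import urlparse


def is_ignored(url, ignore):
    """Return True if the URL's path starts with any ignored prefix.

    Prefix matching is an assumption for this sketch, not documented behaviour.
    """
    path = urlparse(url).path or "/"
    return any(path.startswith(prefix) for prefix in ignore)


def fetch_all(urls, fetch, delay=0, ignore=()):
    """Call fetch(url) for each URL that is not ignored, pausing between calls."""
    results = {}
    for url in urls:
        if is_ignored(url, ignore):
            continue
        if results and delay:
            time.sleep(delay)  # wait `delay` seconds between successive fetches
        results[url] = fetch(url)
    return results
```

Any callable that takes a URL and returns its content can be passed as fetch, which keeps the sketch independent of the network.
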
crawl()

Crawl the URL set up in the crawler.

This is the main entry point, and will block while it runs.
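Because crawl() blocks, a caller that needs to stay responsive could run it on a separate thread. The sketch below uses only the standard library; crawl_in_background is a hypothetical helper, and it assumes nothing about the crawler beyond having a crawl() method. Whether Crawler itself is safe to use from multiple threads is not stated in this documentation.

```python
import threading


def crawl_in_background(crawler):
    """Start crawler.crawl() on a daemon thread and return the thread.

    `crawler` is any object with a blocking crawl() method.
    """
    thread = threading.Thread(target=crawler.crawl, daemon=True)
    thread.start()
    return thread
```

Call join() on the returned thread to wait for the crawl to finish; a daemon thread is stopped abruptly if the main program exits first.
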

get(url)

Get a specific URL, log its response, and return its content.

Parameters:
  • url – The fully qualified URL to retrieve

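Since get() expects a fully qualified URL, relative links found on a page need to be resolved first. One way to do that with the standard library is sketched below; to_fully_qualified is a hypothetical helper for illustration and is not part of the crawler package.

```python
from urllib.parse import urljoin, urlparse


def to_fully_qualified(base_url, link):
    """Resolve `link` against `base_url` and return an absolute URL.

    Raises ValueError if the result still lacks a scheme or a host.
    """
    absolute = urljoin(base_url, link)
    parts = urlparse(absolute)
    if not (parts.scheme and parts.netloc):
        raise ValueError(f"not a fully qualified URL: {absolute!r}")
    return absolute
```

The resolved string can then be passed to get() on a constructed Crawler.
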
crawler.main.run_main()

A small wrapper that is used for running as a CLI Script.