Step 2: Building References & API docs

Note

Finish at 11:15

Concepts

Referencing

Another important Sphinx feature is cross-referencing: it lets you link between locations across your documents, which is a powerful way to tie them together.

The simplest way to do this is to define an explicit reference object:

.. _reference-name:

Cool section
------------

Which can then be referenced with :ref::

:ref:`reference-name`

Which will then be rendered as a link whose text is the title of the section, Cool section.

Sphinx also supports :doc:`docname` for linking to a document.
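For example, once the cookbook page from the Tasks below exists, you could link to it with:

:doc:`cookbook`

which renders as a link using that document's title.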

Semantic Descriptions and References

Sphinx also has much more powerful semantic referencing capabilities, which understand software development concepts.

Say you’re creating a CLI application. You can define an option for that program quite easily:

.. option:: -i <regex>, --ignore <regex>

   Ignore pages that match a specific pattern.

That can also be referenced quite simply:

:option:`-i`

Sphinx includes a large number of these semantic types through its domains, including directives and roles for Python, C, C++, and JavaScript objects.
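For example, the Python domain applies the same pattern to code objects; a minimal sketch, using a hypothetical function name:

.. py:function:: crawl(url)

   Crawl the given URL.

which can then be referenced inline with :py:func:`crawl`.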

External References

Sphinx also includes a number of pre-defined references for external concepts, like PEPs and RFCs:

You can learn more about this at :pep:`8` or :rfc:`1984`.

You can read more about this in the Sphinx inline-markup docs.

Automatically generating this markup

Of course, Sphinx wants to make your life easy. It includes ways to automatically create these object definitions for your own code. This feature is called autodoc, and it lets you write syntax like this:

.. automodule:: crawler

and have it document the full Python module importable as crawler. You can also use the full range of auto directives:

.. autoclass::
.. autofunction::
.. autoexception::
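For example, a single class from this project could be pulled in with something like this (a sketch; the exact directive we use appears in the Tasks below):

.. autoclass:: crawler.main.Crawler
   :members:

The :members: option tells autodoc to also document the methods of the class.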

Warning

The module must be importable by Sphinx when running. We’ll cover how to do this in the Tasks below.

You can read more about this in the Sphinx autodoc docs.

Tasks

Referencing Code

Let's go ahead and add a cookbook to our documentation. Users often come to a project trying to solve the same problems, so a Cookbook or Examples section is a great home for that content.

In your cookbook.rst, add the following:

Cookbook
========

Crawl a web page
----------------

The simplest way to use our program is with no arguments.
Simply run::

    python main.py -u <url>

to crawl a webpage.

Crawl a page slowly
-------------------

To add a delay to your crawler,
use :option:`-d`::

    python main.py -d 10 -u <url>

This will wait 10 seconds between page fetches.

Crawl only your blog
--------------------

You will want to use the :option:`-i` flag,
which will ignore URLs matching the passed regex::

    python main.py -i "^blog" -u <url>

This will only crawl pages that contain your blog URL.

Note

Live Preview: Cookbook

Remember, you will need to use :option: roles here, because they reference command line options of our program.

Adding Reference Targets

Now that we have pointed at our CLI options, we need to actually define them. In your cli.rst file, add the following:

Command Line Options
====================

These flags allow you to change the behavior of Crawler.
Check out how to use them in the Cookbook.

.. option:: -d <sec>, --delay <sec>

   Use a delay in between page fetches so we don't overwhelm the remote server.
   Value in seconds.

   Default: 1 second

.. option:: -i <regex>, --ignore <regex>

   Ignore pages that match a specific pattern.

   Default: None

Note

Live Preview: Command Line Options

Here you are documenting the actual options your code takes.

Try it out

Let’s go ahead and build the docs and see what happens. Do a:

make html

Here you will see that the :option: references magically become links to their definitions. This is your first taste of Semantic Markup. With Sphinx, we are able to simply say that something is an option, and it handles everything for us: linking the usage to the definition.

Importing Code

Being able to define options and link to them is pretty neat. Wouldn't it be great if we could do that with actual code too? Sphinx makes this easy; let's take a look.

We’ll go ahead and create an api.rst that will hold our API reference:

Crawler Python API
==================

Getting started with Crawler is easy.
The main class you need to care about is :class:`crawler.main.Crawler`

crawler.main
------------

.. automodule:: crawler.main
   :members:

Note

Live Preview: Crawler Python API

Remember, you'll need to use an autodoc directive (here, .. automodule::) to pull in your source code. This will render the docstrings of your Python code nicely.

Requirements

In order to build your API docs, Sphinx needs to be able to import your code. That means every Python module your code imports must also be available.

If you have third party dependencies, they would normally have to be installed in your Python environment. Luckily, for most cases you can instead mock these imports using autodoc_mock_imports.

In your conf.py go ahead and add:

autodoc_mock_imports = ['bs4', 'requests']

This will allow your docs to import the example code without requiring those modules be installed.

Tell Sphinx about your code

When Sphinx runs autodoc, it imports your Python code to pull out the docstrings. This means that Sphinx has to be able to find your code. We'll need to add our source directory to sys.path in conf.py so Sphinx can import it.

If you open up your conf.py file, you should see something close to this on line 18:

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))

As the comment notes, you need to tell Sphinx the path to your Python source. In our example it is ../src/, so go ahead and uncomment that line and point it there.
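Uncommented and pointed at our source, that line would look roughly like this (a sketch, assuming the docs directory sits next to src/ as in this project, and adding the os and sys imports if they are not already present):

import os
import sys

sys.path.insert(0, os.path.abspath('../src/'))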

Note

You should always use relative paths here. Part of the value of Sphinx is having your docs build on other people’s computers, and if you hard code local paths that won’t work!

Try it out

Now go ahead and regenerate your docs and look at the magic that happened:

make html

Your Python docstrings have been magically imported into the project.

Tie it all together

Now let's link to it directly for users who come to the project. Update your index.rst to look like:

Crawler Step 2 Documentation
============================

User Guide
----------

.. toctree::

   install
   support
   cookbook

Programmer Reference
--------------------

.. toctree::

   cli
   api

Note

Live Preview: Crawler Step 2 Documentation

One last time, let’s rebuild those docs:

make html

Warning

You now have awesome documentation! :)

Now you have a beautiful documentation reference that comes directly from your code. This means that every time you change your code, the changes will be reflected in your documentation the next time you build it.

The beauty of this approach is that it allows you to keep your prose and reference documentation in the same place. It even lets you semantically reference the code from inside the docs. This is amazingly powerful and a great way to write documentation.
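For example, prose anywhere in your docs can now point straight at the objects autodoc documented; a small sketch using the class from our api.rst:

See :class:`crawler.main.Crawler` and its :meth:`crawler.main.Crawler.crawl` method.

Sphinx turns both roles into links to the generated API reference.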

Extra Credit

Have some extra time left? Let's look through the code to understand more of what's happening here.

Look through intersphinx

Intersphinx allows you to bring the power of Sphinx references to multiple projects. It lets you pull in references, and semantically link them across projects. For example, in this guide we reference the Sphinx docs a lot, so we have this intersphinx setting:

intersphinx_mapping = {
    'sphinx': ('http://sphinx-doc.org/', None),
}

Which allows us to add a prefix to references and have them resolve:

:ref:`sphinx:inline-markup`

We can also omit the prefix, and Sphinx will fall back to intersphinx references if none exist in the current project:

:ref:`inline-markup`

You can read more about this in the intersphinx docs.
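One detail to remember: intersphinx ships with Sphinx, but it still has to be enabled before the mapping takes effect. A minimal conf.py sketch (your extensions list will likely already contain other entries, such as sphinx.ext.autodoc):

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.intersphinx',
]

intersphinx_mapping = {
    'sphinx': ('http://sphinx-doc.org/', None),
}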

Understand the code

A lot of the magic that is happening in Importing Code above is actually in the source code.

Check out the code for crawler/main.py:

"""
Main Module
"""
import time
from optparse import OptionParser
# Python 3 compat
try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

from utils import log, should_ignore


class Crawler(object):

    """
    Main Crawler object.

    Example::

        c = Crawler('http://example.com')
        c.crawl()

    :param delay: Number of seconds to wait between searches
    :param ignore: Paths to ignore

    """

    def __init__(self, url, delay, ignore):
        self.url = url
        self.delay = delay
        if ignore:
            self.ignore = ignore.split(',')
        else:
            self.ignore = []

    def get(self, url):
        """
        Get a specific URL, log its response, and return its content.

        :param url: The fully qualified URL to retrieve
        """
        response = requests.get(url)
        log(url, response.status_code)
        return response.content

    def crawl(self):
        """
        Crawl the URL set up in the crawler.

        This is the main entry point, and will block while it runs.
        """
        html = self.get(self.url)
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup.findAll('a', href=True):
            link = tag['href']
            parsed = urlparse(link)
            if parsed.scheme:
                to_get = link
            else:
                to_get = self.url + link
            if should_ignore(self.ignore, to_get):
                print('Ignoring URL: {url}'.format(url=to_get))
                continue
            self.get(to_get)
            time.sleep(self.delay)


def run_main():
    """
    A small wrapper that is used for running as a CLI Script.
    """

    parser = OptionParser()
    parser.add_option("-u", "--url", dest="url", default="http://docs.readthedocs.org/en/latest/",
                      help="URL to fetch")
    parser.add_option("-d", "--delay", dest="delay", type="int", default=1,
                      help="Delay between fetching")
    parser.add_option("-i", "--ignore", dest="ignore", default='',
                      help="Ignore a subset of URL's")

    (options, args) = parser.parse_args()

    c = Crawler(url=options.url, delay=options.delay, ignore=options.ignore)
    c.crawl()

if __name__ == '__main__':
    run_main()

As you can see, we’re heavily using RST in our docstrings. This gives us the same power as we have in Sphinx, but allows it to live within the code base.

This approach of having the docs live inside the code is great for some things. However, the power of Sphinx is that it lets you mix docstrings and prose documentation together. This lets you keep the reference material next to the code, while the narrative guides and examples live in your .rst files.

Moving on

Could it get better? In fact, it can and it will. Let’s go on to Step 3: Keeping Documentation Up to Date.