1.2. Single-sourcing with DocBook

In 1999, Cogent started producing single-source documentation. This approach lets us generate several output formats, such as HTML, QNX Helpviewer, PDF, and PostScript, from one group of source documents. The source files are written in SGML, using the DocBook DTD.

SGML

SGML stands for Standard Generalized Markup Language. It is a standard for marking up documents. Document markup is used by text-processing software to identify parts of a text for various purposes. WYSIWYG word processors, for example, use hidden markup to identify paragraphs, font changes, and so on. However, there is no standard markup language among these word processors, which makes it difficult for one program to read a document written by another. SGML was created to address this and other problems.

SGML marks up the text as a set of elements. Each element is explicitly marked with start and end tags, like this:

... text <elementname>marked-up text</elementname> text...

Every piece of text in an SGML document is tagged as an element. Elements can nest inside each other, creating a parent-child relationship, and rules are written to specify how the elements relate to each other.
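
As a rough illustration of nesting, the following Python sketch parses a small fragment and lists each element indented under its parent. It uses Python's standard XML parser as a stand-in, on the assumption that the fragment is also well-formed XML. The fragment and the outline helper are made up for this example and are not part of our tool chain.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; assumed to be well-formed XML so the stdlib parser can read it.
SOURCE = (
    "<chapter><title>Intro</title>"
    "<para>Call <function>init</function> first.</para></chapter>"
)

def outline(element, depth=0):
    """Return one line per element, indented to show parent-child nesting."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(SOURCE))))
```

Each level of indentation in the result corresponds to one level of parent-child nesting in the markup.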

DTD

A specific collection of elements and the rules governing their relationships is called a document type definition, or DTD. Any document that is said to be of a given type must conform to the markup rules as laid out in the DTD. An example of an SGML DTD is HTML. It specifies what constitutes valid or invalid hypertext markup.
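
To make the idea of conforming to a DTD more concrete, here is a minimal Python sketch of a validity check. The RULES table is a toy stand-in for a DTD, not the DocBook DTD itself: it only records which child elements each element may contain, whereas a real DTD also constrains order, repetition, and attributes. The fragments are assumed to be well-formed XML so that the standard library parser can read them, and all names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a DTD: each element maps to the child elements it may contain.
RULES = {
    "chapter": {"title", "para"},
    "title": set(),
    "para": {"function"},
    "function": set(),
}

def violations(element, rules=RULES):
    """Return a list of messages describing markup the rules do not allow."""
    if element.tag not in rules:
        return [f"unknown element <{element.tag}>"]
    problems = []
    for child in element:
        if child.tag not in rules[element.tag]:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        # Check the child's own content as well.
        problems.extend(violations(child, rules))
    return problems

if __name__ == "__main__":
    good = ET.fromstring("<chapter><title>T</title><para>text</para></chapter>")
    bad = ET.fromstring("<title><para>text</para></title>")
    print(violations(good))
    print(violations(bad))
```

A document with an empty list of violations is valid with respect to these toy rules; any other result names the markup that breaks them.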

The DocBook DTD

The DTD we use for our documentation is DocBook. It has elements such as Chapter, Section, and Paragraph, which you would use to mark up the parts of any book. This is called structural markup. It also has elements corresponding to software terms, such as Function, ProgramListing, and UserInput, that are useful for marking up software documentation. This is called semantic markup.

However, DocBook has no elements such as Bold, Italic, or BigFont. Elements like these make up presentational markup, which DocBook avoids because that type of markup is output-specific and system-specific. Instead, DocBook has stylesheets that read the structural and semantic markup and use it as the basis for creating formatted output. For example, in our documents a paragraph tagged with <para></para> is displayed in PDF as a fixed block of text, whereas in HTML its size and shape change depending on the browser font and window size. The advantage is that one SGML source file can generate any type of output: PostScript, PDF, RTF, HTML, QNX Helpviewer, even audio files or braille.
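
The separation of markup from presentation can be sketched in a few lines of Python. This is only an illustration of the principle, not how the DocBook stylesheets work internally: the two style tables below are hypothetical, and the fragment is assumed to be well-formed XML so that the standard library parser can read it. The same source is rendered twice, and only the style table differs.

```python
import xml.etree.ElementTree as ET
from html import escape

SOURCE = "<para>Call <function>init</function> first.</para>"

# Two hypothetical "stylesheets": element name -> (text before, text after).
HTML_STYLE = {"para": ("<p>", "</p>"), "function": ("<code>", "</code>")}
TEXT_STYLE = {"para": ("", "\n"), "function": ("`", "`")}

def render(element, style, esc=lambda s: s):
    """Walk the element tree and wrap each element's content as the style dictates."""
    start, end = style.get(element.tag, ("", ""))
    parts = [start, esc(element.text or "")]
    for child in element:
        parts.append(render(child, style, esc))
        # Text that follows the child but still belongs to this element.
        parts.append(esc(child.tail or ""))
    parts.append(end)
    return "".join(parts)

if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    print(render(root, HTML_STYLE, escape))
    print(render(root, TEXT_STYLE), end="")
```

Because the source says only that "init" is a function, each output format is free to decide how a function name should look.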

The DocBook Stylesheets

The stylesheets that format the output for DocBook documents are called the Modular DocBook Stylesheets. They are modular in the sense that they are written in chunks, each of which relates to a certain type of document content and can be changed or updated without affecting the other chunks. They can also be customized with a customization layer, or driver. The language of the Modular DocBook Stylesheets is DSSSL (Document Style Semantics and Specification Language), which is based on a subset of Scheme, a Lisp dialect. DSSSL is the main language currently being used to create SGML stylesheets.

From a single source to multiple documents

Here is how our single-source documentation is used to create documents in multiple formats.

  1. The text of the source documents is written with a text editor (Emacs) into an SGML file. The first line of that file declares the document type definition (DTD) as DocBook. Emacs has a special SGML mode, called PSGML, that parses the file and can check that the document conforms to the DocBook DTD.
  2. To process the document, we call jade, which reads the DocBook DTD, the Modular DocBook Stylesheets, and the SGML source; makes sure the document conforms to the DocBook DTD; and then generates HTML, RTF, or TeX output based on the stylesheets.
  3. To get PDF or PostScript output, the TeX file is then processed by jadetex and/or a few other tools.
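
The steps above can be sketched as a small Python helper that builds, but does not run, the commands for each output format. It is a sketch under stated assumptions rather than our actual build scripts: the file and stylesheet names are hypothetical, the jade options shown are -t to select the output backend and -d to name the DSSSL stylesheet, and the TeX output is assumed to be written next to the source with a .tex extension. The further tools that turn the jadetex result into PDF or PostScript are left out.

```python
def jade_command(source, stylesheet, backend):
    """Build the argument list for one jade run (nothing is executed here)."""
    # -t selects the output backend; -d names the DSSSL stylesheet.
    return ["jade", "-t", backend, "-d", stylesheet, source]

def build_plan(source, html_dsl, print_dsl):
    """Return the list of commands for each output format, keyed by format name."""
    # Assumption: jade names its TeX output after the source file.
    tex_file = source.rsplit(".", 1)[0] + ".tex"
    return {
        "html": [jade_command(source, html_dsl, "sgml")],
        "rtf": [jade_command(source, print_dsl, "rtf")],
        # The TeX output needs a second pass through jadetex.
        "tex": [jade_command(source, print_dsl, "tex"), ["jadetex", tex_file]],
    }

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    plan = build_plan("manual.sgml", "html/docbook.dsl", "print/docbook.dsl")
    for fmt, commands in plan.items():
        for command in commands:
            print(fmt, ":", " ".join(command))
```

Each argument list could be handed to subprocess.run to execute the step; keeping the plan as plain data makes it easy to inspect before anything is run.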

For more information, I suggest first reading the relevant sections of DocBook: The Definitive Guide (TDG) by Norman Walsh, the creator and maintainer of DocBook. For specific information on our processing tools and more on the above topics, see the Tools chapter.