Jeff Bower
The DocBook format uses XML or SGML to create a generic document. It focuses on the content of the document rather than the formatting, which allows you to quickly and easily use style sheets to convert entire libraries into custom formats without needing to open a single document for editing.
A DocBook file looks a bit like an HTML file. To see what the DocBook language looks like, click the "DocBook Inline" link at the top of this page. You'll also note that you can download the source file itself. This is useful if you need to edit someone else's DocBook file, as you don't need to recreate it from scratch.
This document won't describe the details of the DocBook language and history; however, it will provide some links in Appendix B that can help you learn.
DocBook is a portable format, so it works equally well on all platforms. But some platforms are more equal than others. This document assumes that you've got access to a Linux machine as your primary means of compiling a DocBook document. If you want to run DocBook on your Mac or Windows laptop, one of the more flexible means of doing so is to install Linux as a virtual machine using a program like VirtualBox.
Assuming you've got a Debian-based system like Ubuntu, installing DocBook requires the command sudo apt-get install docbook-utils. That's it. You're done. Well, mostly. The scripts I've published here are linked in Appendix A, where there is also a more comprehensive apt-get statement.
I recommend creating a new directory for each document; it's up to you if you want to create further subdirectories to organize things. My method is to create a docs directory with an index document at http://www.ebower.com/docs and then have each document in a subdirectory off of that. I name each .docbook file with the same base filename as the directory (for example, docs/Document-Subject/Document-Subject.docbook). Finding what works for you is important here.
There is no WYSIWYG editor for DocBook stuff (actually, several exist but you and Google need to share some alone time to see if any fit your needs). The first step in creating a DocBook is to load up your favorite text editor VIM (if you're an Emacs user simply type Ctrl-Fn-Command-Alt-F12-D and the LISP interpreter will pretend to understand the DocBook XML format). The easiest route is to open up someone else's document to see what it looks like. The tags are fairly well-named so comparing the HTML version of the file to the DocBook version should be pretty intuitive. I'd recommend creating a template file, just an empty file with basic headers and fundamental building blocks so you can reference common tasks quickly. You can then copy this template file over and edit it rather than typing the header information in every time.
Example 1. An example DocBook template
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML v4.5//EN" "/usr/share/xml/docbook/schema/dtd/4.5/docbookx.dtd">
<article lang="en">
<articleinfo>
<keywordset>
<keyword></keyword>
</keywordset>
<title></title>
<abstract>
<para></para>
</abstract>
</articleinfo>
<section id="">
<title></title>
<para></para>
</section>
<!-- INCLUDE ../about-me.docbook -->
</article>
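The copy-the-template step can be sketched as a small shell function. This is a minimal sketch and not one of the published scripts: the function name new_docbook and its three arguments (docs directory, document name, template path) are assumptions made for illustration. It follows the docs/Document-Subject/Document-Subject.docbook layout described above and refuses to overwrite an existing document.

```shell
#!/bin/bash
# Hypothetical helper: start a new document from a template file.
# Usage: new_docbook DOCS_DIR DOCUMENT_NAME TEMPLATE_FILE
new_docbook() {
    local docs_root="$1"
    local name="$2"
    local template="$3"
    local target="$docs_root/$name/$name.docbook"

    # Never clobber a document that already exists.
    if [ -e "$target" ]; then
        echo "$target already exists, not overwriting." >&2
        return 1
    fi

    # Create the per-document directory, copy the template in, and
    # print the path of the new file so it can be opened in an editor.
    mkdir -p "$docs_root/$name" && cp "$template" "$target" && echo "$target"
}
```

Something like vi "$(new_docbook ~/docs Document-Subject ~/docs/template.docbook)" would then open the fresh copy, assuming those paths exist on your system.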
In February 2010 I modified my template to include an appendix. Currently this is used to refer to a forum post for comments and suggestions, but the number of insightful posts (currently 1) is overwhelmed by the spammers (which is >>1). This appendix can also be used to reference change request numbers for requirements, or any other "once-and-done" editable information. While I was at it I ran xml_pp to normalize the formatting.
In January 2011 I also added functionality for INCLUDE statements which will dynamically include a text file at compilation time. I use this for an "About Me" page, but any semi-dynamic content will work. I use the made-up term "semi-dynamic" because it only gets read at compile and not on view. This is perfect for my needs.
Once you've created a DocBook document you'll probably want to compile it into something that's a bit prettier. A simple place to start is with the docbook2html command. When you run this command with an argument of the filename you're working with you should end up with an HTML file for each section. If you run docbook2html -u filename you should have a single HTML file with the entire document.
There are a slew of other converters installed for various formats such as PDF, TXT, and RTF. You may find that it's more efficient to use a script to create multiple files at once so you don't have to remember the parameters of each one. I've also added Charlie Gero's set of links to the top of my HTML files to allow for easy downloads. The script I use is in /usr/bin/dbcompile and the contents can be seen below.
Example 2. An example DocBook compile script
#!/bin/bash
# Version 0.1.2.1
#First setup some variables
#This is the current directory level represented by the number of / characters
#For example, to use a filename in /home/user you need 3 slashes.
directorylevel=$(pwd|sed 's/[^/]//g'|wc -m)
#Finds the name of the current directory.
bottomlevel=$(pwd | cut -d/ -f$directorylevel)
#Set up some filenames
origfilename=$bottomlevel.docbook
filename=$bottomlevel-proc.docbook
basefilename=$(basename $filename .docbook)
tablefilename=table.html
if [ -f ~/.dbcompile/dbcompile.header ]; then
tableheader=~/.dbcompile/dbcompile.header
echo Found custom header file in $tableheader
else
tableheader=/etc/dbcompile/dbcompile.header
fi
echo Running custom script
if [ -e "run-on-compile" ]; then
echo Found script!
chmod 755 run-on-compile
./run-on-compile
else
echo No script found.
fi
echo Running Spellcheck on Original \(note we don\'t check the INCLUDES\)
xml_spellcheck $origfilename
echo Cleaning up Spellcheck files
rm xml_spellcheck_*.txt
rm xml_spellcheck_*.bak
echo Making the XML Print Pretty on the original
docbook_pp $origfilename
xml_pp -p programlisting -i.last $origfilename
OPERATION_RESULT=$?
if [ $OPERATION_RESULT != 0 ]; then
echo Exit Code: $OPERATION_RESULT
exit $OPERATION_RESULT
fi
echo Processing INCLUDES
if [ -f "$filename" ]; then
rm $filename
fi
includefound=false
while read line
do
if [ ! "$line" = "" ]; then
echo "$line" >> $filename
fi
iscomment=$(echo $line | awk '{print $1}')
command=$(echo $line | awk '{print $2}')
if [ "$iscomment" = "<!--" ] && [ "$command" = "INCLUDE" ]; then
includefound=true
includefile=$(echo $line | awk '{print $3}')
if [ -f "$includefile" ]; then
echo Including $includefile.
cat $includefile >> $filename
echo "<!-- INCLUDE_END $includefile -->" >> $filename
else
echo Can\'t find $includefile, aborting.
exit 2
fi
fi
done < $origfilename
if [ $includefound = "true" ]; then
echo Done processing includes.
else
echo No INCLUDE tags found.
fi
echo Making the XML Print Pretty
docbook_pp $filename
xml_pp -p programlisting -i.last $filename
OPERATION_RESULT=$?
if [ $OPERATION_RESULT != 0 ]; then
echo Exit Code: $OPERATION_RESULT
exit $OPERATION_RESULT
fi
echo Updating changelog
datelog "$1"
echo Creating HTML...
docbook2html -u $filename
echo Creating TXT...
docbook2txt $filename
mv $basefilename.txt $bottomlevel.txt
echo Creating PS...
docbook2ps $filename
mv $basefilename.ps $bottomlevel.ps
echo Creating PDF...
# Docbook2PDF is a little broken when it comes to images, using ps2pdf instead
# docbook2pdf $filename
ps2pdf $bottomlevel.ps
echo Creating RTF...
docbook2rtf $filename
mv $basefilename.rtf $bottomlevel.rtf
#To create the inline we need to take the original and remove the
#special characters.
echo Creating DocBook Inline...
cat $filename | sed -e 's~&~\&amp;~g' -e 's~<~\&lt;~g' -e 's~>~\&gt;~g' \
> $filename.ildb
echo Creating Wiki...
docbook2wiki $filename
mv $basefilename.wiki $bottomlevel.wiki
echo Generating header...
rm $tablefilename 2> /dev/null
cat $tableheader | grep -v '^#' \
| sed s/\$DB_BASENAME/$bottomlevel/ > $tablefilename
# The following code is left for historical purposes and will be removed
# in a future release.
#
#echo \<style\>body\{font-family\: helvetica, sans-serif\;\} td\{border\: \
# 1px solid black\; padding\: 2px\;\} table \{ width\: 80\%\; \
# border-collapse\: collapse\;\} table.BLOCKQUOTE \{width\: 100\%\} \
# table.BLOCKQUOTE td \{border\: 0px\;\}\</style\> >> $tablefilename
#echo \<table bgcolor=#BBEEFF width=100% style=\"width\: 100%\"\> >> \
# $tablefilename
#echo \<tr\> >> $tablefilename
#echo \<td style="font-family: helvetica, sans-serif;font-size:11pt;"\>\
# \<b\>View:\</b\> \<a href="index.html"\>HTML\</a\> \| \
# \<a href="$bottomlevel.txt" rel="nofollow"\>Text\</a\> \| \
# \<a href="$bottomlevel.ps" rel="nofollow"\>PS\</a\> \| \
# \<a href="$bottomlevel.pdf" rel="nofollow"\>PDF\</a\> \| \
# \<a href="$bottomlevel.rtf" rel="nofollow"\>RTF\</a\> \| \
# \<a href="$bottomlevel.docbook" rel="nofollow"\>DocBook\</a\> \| \
# \<a href="$bottomlevel.docbook.html" rel="nofollow"\>DocBook Inline\</a\> \ \| \
# \<a href="changelog.html"\>changelog\</a\> \| \
# \<a href="https://www.ebower.com/docs/docbook/"\>about this header\</a\>\</td\> >> $tablefilename
#echo \</tr\> >> $tablefilename
#echo \</table\> >> $tablefilename
echo Merging table into HTML file...
cat $tablefilename > index.html ; cat $basefilename.html >> index.html
echo Making index file Print Pretty...
tidy -m -i -q index.html &> /dev/null
echo Merging table into Inline Docbook file...
cat $tablefilename > $origfilename.html
echo \<pre style=white-space\:pre-wrap\> >> $origfilename.html
cat $filename.ildb >> $origfilename.html
echo \</pre\> >> $origfilename.html
echo Making Inline Docbook file print pretty...
tidy -m -i -q $origfilename.html &> /dev/null
echo Cleaning up after myself...
rm $filename.ildb
rm $tablefilename
# For some reason the DocBook file ends up losing global read permissions
chmod +r $filename
chmod +r $origfilename
It assumes that you've created a directory subject and inside there is a subject.docbook file. I create an index.html file which will provide links to the other formats which are all also contained in the same directory.
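The script derives the directory name by counting slashes and passing the count to cut. A minimal alternative sketch, assuming the same convention that the directory and the .docbook file share a base name, uses basename instead. The function name docbook_name_for_dir is hypothetical and is not part of dbcompile.

```shell
#!/bin/bash
# Hypothetical helper: print the DocBook filename expected for a directory.
# With no argument it uses the current working directory.
docbook_name_for_dir() {
    local dir="${1:-$PWD}"
    # basename strips everything up to the last path component.
    printf '%s.docbook\n' "$(basename "$dir")"
}
```

For a directory such as /home/user/docs/Document-Subject this prints Document-Subject.docbook, which is the file dbcompile expects to find there.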
As of Feb 2010 I've updated this script to include xml_pp to help make the XML itself a little more uniform. This program is in the xml-twig-tools package and it will reformat the tag indents for you. The script will save your last docbook file as filename.docbook.last in case it screws something up. Since it doesn't actually edit the file in a meaningful way you can continue your vi session with impunity and overwrite the new file if that's how you roll.
I've also added a spellcheck to the script from xml-twig-tools. The spellcheck engine is actually aspell, and if you accidentally add a word to the dictionary that you didn't intend, edit ~/.aspell.en.pws (assuming you're using English, of course).
In February 2010 I added a changelog mechanism. This adds a changelog link at the end of the HTML document which lists when the document was first compiled, when it was last compiled, and any milestones you wish to record. By issuing the command dbcompile "Updated table 7" a milestone will appear with the current date associated with it. If there is no text supplied, the compile will be largely invisible except for the Last Compiled line changing. This is intentional, since many times I need multiple compiles because of a missing bracket or a simple typo and these compiles do not need to be logged. If you feel every compile should be logged, feel free to edit changelog.html yourself.
Example 3. Logging Changes
#!/bin/bash
#Set the date variables
today=$(date +"%Y %b %d")
now=$(date +"%H:%M %Z")
logfile=changelog.html
#Create changelog if it doesn't exist
if [ -e "$logfile" ]; then
echo Changelog found
#Remove the last change message
sed '$d' $logfile > $logfile.temp
mv $logfile.temp $logfile
else
echo Creating new changelog
echo \<pre\>Document created $today at $now > $logfile
fi
if [ -n "$1" ]; then
echo $today "$1" >> $logfile
fi
#Add the last compile time today
echo Last change $today at $now \</pre\> >> $logfile
In April 2010 I added a mechanism to run a custom script on compile. The dbcompile script now checks for a file called run-on-compile and, if found, will first ensure its permissions are 755 and then run it. This function is useful if you need to link a live file; in that case run-on-compile would simply contain the cp command.
After playing with all sorts of ways to get external file references in through official means, in January 2011 I modified my script to recognize an HTML-style comment of the format <!-- INCLUDE filename -->. During the processing with dbcompile I'll replace this comment with the contents of the named file if it exists, or error out if it doesn't exist. Note that a normal DocBook compiler will simply skip over this step. You can view the DocBook file to see what the pre-processed DocBook format looks like (with just the INCLUDE tag), or you can look at the DocBook Inline link and check out the "About Me" appendix to see how it looks after I've included the text. The nice thing about this is that it doesn't break standard implementations but it provides the "nearly live" functionality I'm looking for.
Also in January 2011 I modified the script such that the links to documents in other formats (for example, the PDF version) have the "nofollow" tag. This tag really just affects search engines and prevents them from indexing these documents. Since they're a clone of the HTML document (minus links) it's not useful for Google to pull up a PostScript or Text file when the HTML main page is available.
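As an illustration of the live-file case, a run-on-compile file might look like the sketch below. The function name refresh_live_file and both paths in the usage comment are hypothetical placeholders, not something taken from the published scripts.

```shell
#!/bin/bash
# Hypothetical run-on-compile sketch: refresh a local copy of a live file
# so the compiled document picks up its latest contents.
refresh_live_file() {
    local source_file="$1"
    local local_copy="$2"
    if [ -f "$source_file" ]; then
        cp "$source_file" "$local_copy"
    else
        # dbcompile does not check this script's exit status, so only warn.
        echo "Live file $source_file not found, keeping the old copy." >&2
    fi
}

# Example call (both paths are placeholders):
# refresh_live_file /etc/example/live.conf ./live.conf
```

Warning instead of failing keeps the last good copy in place, which seems a reasonable default since the compile continues either way.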
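The INCLUDE expansion can be sketched in isolation as a function that writes the expanded document to standard output. This is a simplified sketch rather than the code dbcompile runs: the function name expand_includes is an assumption, it matches the comment anywhere on a line using shell pattern matching instead of awk, and it keeps blank lines that the real script drops. Like the original, it leaves the INCLUDE comment in place, appends the file, and then adds an INCLUDE_END marker.

```shell
#!/bin/bash
# Hypothetical sketch of INCLUDE processing.
# Usage: expand_includes input.docbook > output.docbook
expand_includes() {
    local infile="$1"
    local line rest includefile

    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
        case $line in
            *'<!-- INCLUDE '*' -->'*)
                # Strip everything up to the keyword, then the comment close.
                rest=${line#*'<!-- INCLUDE '}
                includefile=${rest%%' -->'*}
                if [ -f "$includefile" ]; then
                    cat "$includefile"
                    printf '<!-- INCLUDE_END %s -->\n' "$includefile"
                else
                    echo "Can't find $includefile, aborting." >&2
                    return 2
                fi
                ;;
        esac
    done < "$infile"
}
```

Relative filenames are resolved against the current directory, so this is meant to be run from the document's own directory, as dbcompile is.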
As a part of this change I also moved the header file to /etc/dbcompile/dbcompile.header. This allows me to make the HTML for the header table listing the various formats look a little nicer. You can also more easily replace this with a custom header in ~/.dbcompile/dbcompile.header in case you'd like to customize things. An example dbcompile.header appears below.
Example 4. Default Header Table
# This file is a template for the dbcompile header table. It ignores lines
# starting with "#" and replaces $DB_BASENAME with the base DocBook filename.
# For example, to reference the text version of foo.docbook, use
# $DB_BASENAME.txt
#
# You may copy this file to ~/.dbcompile/dbcompile.header and edit as you wish.
<style>
body{font-family: helvetica, sans-serif;}
td{border: 1px solid black; padding: 2px;}
table { width: 80%; border-collapse: collapse;}
table.BLOCKQUOTE {width: 100%}
table.BLOCKQUOTE td {border: 0px;}
</style>
<table summary="Table for formatting only" bgcolor=#BBEEFF width=100% style="width: 100%">
<tr>
<td style="font-family: helvetica, sans-serif;font-size:11pt;">
<b>View:</b> <a href="index.html">HTML</a> |
<a href="$DB_BASENAME.txt" rel="nofollow">Text</a> |
<a href="$DB_BASENAME.ps" rel="nofollow">PS</a> |
<a href="$DB_BASENAME.pdf" rel="nofollow">PDF</a> |
<a href="$DB_BASENAME.rtf" rel="nofollow">RTF</a> |
<a href="$DB_BASENAME.wiki" rel="nofollow">Wiki</a> |
<a href="$DB_BASENAME.docbook" rel="nofollow">DocBook</a> |
<a href="$DB_BASENAME.docbook.html" rel="nofollow">DocBook Inline</a> |
<a href="changelog.html">changelog</a> |
<a href="https://www.ebower.com/docs/docbook/">about this header</a>
</td>
</tr>
</table>
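The way the header template is turned into the final table can be sketched as a standalone function. This is a minimal sketch of the same grep and sed pipeline dbcompile uses, with two assumptions of mine: the function name render_header is made up, and the substitution uses the g flag so that more than one $DB_BASENAME on a line is replaced. It also assumes the base name contains no / or & characters, which would confuse sed.

```shell
#!/bin/bash
# Hypothetical sketch: render a dbcompile.header template for one document.
# Usage: render_header TEMPLATE_FILE BASENAME > table.html
render_header() {
    local template="$1"
    local docname="$2"
    # Drop comment lines, then substitute the literal text $DB_BASENAME.
    grep -v '^#' "$template" | sed 's/\$DB_BASENAME/'"$docname"'/g'
}
```

Running render_header /etc/dbcompile/dbcompile.header subject would print the table with links such as subject.txt and subject.pdf.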
As a final part of this update, I've also started using tidy to make the HTML more rational. Not only does docbook2html produce ugly output, but my kludge of tacking the table to the beginning (before the body and meta tags) probably breaks some simplistic browsers or web crawlers. This adds a dependency of tidy.
In February 2011 I introduced another helper tool, docbook_pp (DocBook Print Pretty). This works hand-in-hand with xml_pp to normalize some formatting. While I expect this to grow in size, currently it is primarily responsible for ensuring that <para> tags exist on their own line. This helps with my poor attempt to convert from DocBook to Wiki format.
Example 5. docbook_pp
#!/bin/bash
# Version 0.1.2.1
# docbook_pp is meant to pre-process a DocBook file before it's fed into xml_pp
tempfile=/tmp/$1_pp.$$.tmp
backup=$1_pp.bak
if [ ! $# = 1 ]; then
echo Usage:
echo " $(basename $0) filename"
exit 1
fi
if [ ! -f "$1" ]; then
echo File \"$1\" not found.
exit 2
fi
echo Preformatting $1
echo Backup saved at $backup
cp $1 $backup
sed -r \
-e s/'<para>(.)'/"<para>\\n\\1"/g -e s/'(.)<\/para>'/"\\1\\n<\/para>"/g \
$1 > $tempfile
mv $tempfile $1
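To see what the two sed expressions do without touching a file, the same substitutions can be wrapped in a filter that reads standard input. This is only a demonstration sketch: the function name split_para_tags is hypothetical, and it assumes GNU sed, which accepts -r and interprets \n in the replacement as a newline.

```shell
#!/bin/bash
# Hypothetical demonstration of the docbook_pp substitutions as a filter.
# Assumes GNU sed.
split_para_tags() {
    sed -r \
        -e 's/<para>(.)/<para>\n\1/g' \
        -e 's/(.)<\/para>/\1\n<\/para>/g'
}
```

Feeding it a line such as <para>Hello</para> yields three lines: the opening tag, the text, and the closing tag. A <para> that already sits alone on its line is left unchanged because the pattern requires a character next to the tag.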
The docbook2pdf command doesn't like to insert the images properly. My scripts will call ps2pdf after the PostScript file is created instead.
There is an undocumented dependency on the text-based browser Lynx to create proper text files. Running sudo apt-get install lynx should clean that up for you. Thanks to David Gay for discovering this.
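Since missing helpers such as lynx tend to show up only as odd output, a quick check before compiling can help. The sketch below is an assumption of mine rather than part of the published scripts: the function name missing_tools is hypothetical, and it simply prints any named command that is not on the PATH.

```shell
#!/bin/bash
# Hypothetical helper: print each named command that cannot be found.
# Usage: missing_tools docbook2html xml_pp xml_spellcheck tidy lynx ps2pdf
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name.
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

An empty result means every listed tool was found; anything printed is a candidate for apt-get install.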
Sometimes it's useful to be able to export a DocBook file to a Wiki page. This is a tricky thing to actually pull off, since Wiki pages are designed to be edited by a group while DocBook files are more traditional documents with tighter controls. One of the primary functions of this process is to migrate control of the document to the public; as of now there's not a good method of keeping Wiki changes in sync with your DocBook source. Hopefully soon I'll be able to work on a wiki2docbook that will help keep the DocBook version up-to-date. In the meantime this may be a good way to hand off the document to someone else who isn't enlightened enough to use DocBook.
The docbook2wiki script is an early beta. It handles most of the tags I use, but none that I don't. There are also likely significant flaws in the processing of potentially complex entities like tables. There will also probably be issues if you don't write your documents in my style or use dbcompile to preformat them. Please read the notes at the beginning of the script for a list of known issues and feel free to contact me if you run into troubles. The output seems to work well with my work's MediaWiki-based Wiki page, but there isn't a hard-and-fast standard.
Example 6. docbook2wiki
#!/bin/bash
# Version 0.1.2.1
# Fundamentally is this a good idea? It can be used to migrate from DocBook
# to Wiki but Wiki is dynamic and Docbook is static. Another project will be
# wiki2docbook but a lot of tag info is lost.
# Known issues:
# URLs with "file://" are not supported on Wiki?
# Relative URLs are not supportable on Wiki
# Try to keep tags on unique lines in DocBook, and use xml_pp before processing
# No support for images (yet)
# Table support likely very weak
# Title support needs work, can omit main document title, have it as parent
# (pushing other sections down a level), or have it as first topic
# (incrementing other sections by one)
# Meta tags not often supportable on Wiki
# Appendix tags are rendered as ordinary sections
# Included files sometimes skip last character of the line, especially "if"s
if [ ! $# = 1 ]; then
echo Usage:
echo " $(basename $0) filename"
exit 1
fi
if [ ! -f "$1" ]; then
echo File $1 not found
exit 2
fi
echo Warning: Using hardcoded catalogs and styles
echo Working on: $(pwd)/$1
printmeta=false
filebase=$(basename $1 .docbook)
inputfile=$1
programbasename=$(basename $0)
tempdir=/tmp/$programbasename.$$
sectionfile=$tempdir/$filebase.sections
outputfile=$filebase.wiki
preprocessedoutput=$tempdir/$filebase.tmp
tempoutput=$tempdir/$filebase.tmp.wiki
mode=normal
level=0
debug=0
mkdir $tempdir
# Set up wiki markups
sectionbreak=\=
bold=\'\'\'
bold_end=$bold
italic=\'\'
italic_end=$italic
bold_italic=\'\'\'\'\'
bold_italic_end=$bold_italic
fixedwidth='<tt><nowiki>'
fixedwidth_end='<\/nowiki><\/tt>'
bullet='*'
number='#'
code=' '
int_link='[['
int_link_end=']]'
ext_link='\['
ext_link_end='\]'
hline='----'
# Set up mapping between DocBook and Wiki markups
command=$bold
command_end=$bold_end
emphasis=$italic
emphasis_end=$italic_end
filename=$fixedwidth
filename_end=$fixedwidth_end
computeroutput=$fixedwidth
computeroutput_end=$fixedwidth_end
function global_process {
# This function does global search and replace on the entire line.
cat $inputfile | sed -r \
-e s/'<para>'//g -e s/'<\/para>'/_/g \
-e s/'<abstract>'//g -e s/'<\/abstract>'/"__TOC__"/g \
-e s/'<command>'/$command/g -e s/'<\/command>'/${command_end}/g \
-e s/'<filename>'/$filename/g -e s/'<\/filename>'/$filename_end/g \
-e s/'<computeroutput>'/$computeroutput/g \
-e s/'<\/computeroutput>'/$computeroutput_end/g \
-e s/'<emphasis>'/$emphasis/g -e s/'<\/emphasis>'/$emphasis_end/g \
-e s/'<thead>'/"!"/g -e s/'<\/thead>'/"!!"/g \
-e s/'<row>'/"|-"/g -e s/'<\/row>'//g \
-e s/'<tbody>'//g -e s/'<\/tbody>'//g \
-e s/'<entry>'/"|"/g -e s/'<\/entry>'//g \
-e s/'<\/table>'/_/g \
> $preprocessedoutput
}
function line_process {
# This function does a line-by line process of the file
processed_line_whitespace=$(echo "$line" | \
sed -r \
-e s/'<\?xml(.*?)\?>'//g \
-e s/'<!--(.*?)-->'//g \
-e s/'<!DOCTYPE(.*?)>'//g \
-e s/'<article(.*?)>'//g \
-e s/'<\/article(.*?)>'//g \
-e s/'<email>(.*?)<\/email>'/"[mailto:\\1 \\1]"/g \
-e s/'<tgroup(.*?)>'/'{| border="1" frame="border" rules="all" class="CALSTABLE"\n|+ '"$tabletitle"/g -e s/'<\/tgroup>'/"|}"/g \
)
# Regexps are greedy, so more than one <ulink> per line is problematic
# We replace the </ulink> with <</ulink>> one at a time until we've got them
# all. Not efficient, but it works.
while [ ! "$(echo "$processed_line_whitespace" | grep '</ulink>')" = "" ]; do
processed_line_whitespace=$(echo "$processed_line_whitespace" | \
sed s/'<\/ulink>'/'<<\/ulink>>'/)
if [ "$debug" -gt 6 ]; then
echo "Hunting ulinks, be vewwwy quiet! $processed_line_whitespace"
fi
processed_line_whitespace=$(echo "$processed_line_whitespace" | \
sed -r s/'<ulink url="(.*?)">(.*?)<<\/ulink>>'/"$ext_link\\1 \\2$ext_link_end"/g)
done
processed_line=$(echo "$processed_line_whitespace")
}
function check_for_title {
# Checks to see if $processed_line contains a <title> tag and extracts to
# $title. Sets $istitle to true/false and changes mode to "look_for_title"
# to force certain behaviors.
title=$(echo $processed_line | sed -r s/'<title>(.*?)<\/title>'/\\1/g)
if [ "$title" = "" ] || [ "$title" = "$processed_line" ]; then
istitle=false
else
mode=look_for_title
istitle=true
fi
if [ "$debug" -gt 6 ]; then
echo "istitle : \"$istitle\""
fi
}
# Do a global processing of the file
global_process
# Process the input file line-by-line and put output into $tempoutput
cat $preprocessedoutput |
while read line; do
if [ "$debug" -gt 6 ]; then
echo ""
echo "Line Start Mode : $mode"
echo "Unprocessed Line: \"$line\""
fi
printline=true
line_process
if [ "$debug" -gt 6 ]; then
echo "Processed Line : \"$processed_line\""
fi
# If there's a blank line, don't print it unless you're in a <programlisting>
# block.
if [ "$processed_line" = "" ] && [ ! "$mode" = "programlisting" ]; then
printline=false
fi
# We need to check for the document title at level 0
if [ "$level" = 0 ]; then
check_for_title
fi
# Do special things based on the mode we're in
case "$mode" in
"look_for_title")
title=$(echo $processed_line | sed -r s/'<title>(.*?)<\/title>'/\\1/g)
if [ ! "$title" = "" ] && [ ! "$title" = "$processed_line" ]; then
if [ ! "$level" = 0 ]; then
echo "$sectionid=$title" >> $sectionfile
fi
if [ $debug -gt 3 ]; then
echo Found new section at level $level called \"$title\"
fi
processed_line="$sectionbreak $title $sectionbreak"
if [ $level = 0 ]; then
if [ "$debug" -gt 6 ]; then
echo "Found document title"
fi
# This line will create a new section for the title, incrementing the other
# sections by one.
# processed_line="$sectionbreak$sectionbreak $title $sec$sec"
# This line will create a master section for the title, indenting subsequent
# sections by one
# processed_line="$sectionbreak $title $sectionbreak"
# These lines will create a simple bold version of the title followed by a
# horizontal line, leaving the section hierarchy alone.
echo "$bold$title$bold_end" >> $tempoutput
echo "$hline" >> $tempoutput
processed_line="_"
fi
mode=normal
fi
;;
"keyword")
keyword=$(echo $processed_line | sed -r s/'<keyword>(.*?)<\/keyword>'/\\1/g)
if [ ! "$keyword" = "" ] && [ ! "$keyword" = "</keywordset>" ]; then
printline=false
# Collect the keyword regardless of the debug level
keywords="$keywords$keyword "
if [ $debug -gt 3 ]; then
echo Found new keyword called \"$keyword\"
fi
fi
;;
"example")
check_for_title
mode=example
if [ "$istitle" = "true" ]; then
processed_line="${bold}Example:${bold_end} $title"
if [ "$debug" -gt 6 ]; then
echo "Found example title \"$title\""
fi
else
examplefilename=$(echo $processed_line | awk -F "\"" '{print $2}')
exampletype=$(echo $processed_line | awk -F "\"" '{print $4}')
if [ "$debug" -gt 6 ]; then
echo "Example filename: \"$examplefilename\""
echo "Example type : \"$exampletype\""
fi
case "$exampletype" in
"linespecific")
if [ -f "$examplefilename" ]; then
processed_line="_"
(
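# IFS='\n' would split on a backslash and the letter "n", trimming a
# trailing "n" from lines such as "then"; $'\n' is a real newline.
IFS=$'\n'
cat "$examplefilename" |
while read -r exampleline; do
echo "$code$exampleline" | sed s/'<'/'\&lt;'/g >> "$tempoutput"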
done
)
if [ "$debug" -gt 6 ]; then
echo "Adding filename $examplefilename"
fi
else
echo File "$examplefilename" not found
if [ "$debug" -gt 6 ]; then
echo "\"$examplefilename\" not found"
fi
echo "${italic}${bold}ERROR! $examplefilename not found!${italic_end}${bold_end}" >> "$tempoutput"
processed_line="_"
fi
;;
"")
printline=false
;;
esac
fi
;;
"programlisting")
if [ ! "$processed_line" = "</programlisting>" ]; then
processed_line="$code$(echo "$processed_line_whitespace" | sed s/'<'/'\&lt;'/g)"
fi
;;
"list")
if [ ! "$processed_line" = "" ]; then
processed_line="$bullet $processed_line"
mode=normal
fi
;;
"table")
check_for_title
mode=normal
if [ "$istitle" = "true" ]; then
tabletitle="$title"
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found table title \"$title\""
fi
fi
;;
"thead")
if [ ! "$processed_line" = "|-" ]; then
processed_line=$(echo "$processed_line" | sed s/'|'/'!'/)
fi
;;
esac
# Now we check the $processed_line to see if we need to change modes
case "$(echo $processed_line)" in
'<section'*)
mode=look_for_title
level=$(echo "$level + 1" | bc)
sectionbreak="$sectionbreak="
sectionid=$(echo $processed_line | awk -F "\"" '{print $2}')
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found section start - level: $level id: $sectionid"
fi
;;
"</section>")
mode=normal
sectionbreak=${sectionbreak:0:$level}
level=$(echo "$level - 1" | bc)
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found section stop - level $level"
fi
;;
'<appendix'*)
mode=look_for_title
level=$(echo "$level + 1" | bc)
sectionbreak="$sectionbreak="
sectionid=$(echo $processed_line | awk -F "\"" '{print $2}')
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found section start - level: $level id: $sectionid"
fi
;;
"</appendix>")
mode=normal
sectionbreak=${sectionbreak:0:$level}
level=$(echo "$level - 1" | bc)
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found section stop - level $level"
fi
;;
"<programlisting>")
mode=programlisting
printline=false
;;
"</programlisting>")
mode=normal
processed_line=""
;;
"<keywordset>")
mode=keyword
keywords="{{keywords>"
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found keyword start"
fi
;;
"</keywordset>")
mode=normal
keywords="$keywords}}"
processed_line=$keywords
if [ "$printmeta" = false ]; then
printline=false
fi
if [ "$debug" -gt 6 ]; then
echo "Done adding keywords \"$keywords\", printmeta is $printmeta"
fi
;;
"<example>")
mode=example
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found example start"
fi
;;
"</example>")
mode=normal
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found example end"
fi
;;
"<listitem>")
mode=list
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found list item"
fi
;;
"</listitem>")
mode=normal
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found list item end"
fi
;;
"<itemizedlist>")
listmode=itemized
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found itemized list"
fi
;;
"</itemizedlist>")
listmode=none
printline=false
if [ "$debug" -gt 6 ]; then
echo "Found itemized list end"
fi
;;
"<table"*)
mode=table
printline=false
;;
"!")
# This can be changed to <thead>, I misunderstood the wiki markup
mode=thead
printline=false
;;
"!!")
# This can be changed to </thead>
mode=normal
printline=false
;;
" _")
# This is a kludge saying that we need to print a blank line in a
# <programlisting> tag
processed_line=" "
;;
"_")
# By default I throw away blanks, but sometimes we need them. This
# is most commonly a </para> tag.
processed_line=""
;;
esac
if [ $printline = true ]; then
echo "$processed_line" >> "$tempoutput"
if [ "$debug" -gt 6 ]; then
echo "Output line : \"$processed_line\""
fi
fi
done
# xref tags stink. We need to associate the section ID to the section name
# I store these as I find them and then do one pass through the temp file
# for every section. There must be a better way...
echo Processing xref tags
cat $sectionfile |
while read line; do
sectionid=$(echo "$line" | awk -F "=" '{print $1}')
sectionname=$(echo "$line" | awk -F "=" '{print $2}')
if [ "$debug" -gt 6 ]; then
echo Processing $sectionid \($sectionname\)
fi
sed -r -e s^'<xref linkend="'$sectionid'"(.*?)\/>'^"[[#$sectionname|$sectionname]]"^g "$tempoutput" > "$tempoutput".tmp
mv "$tempoutput".tmp "$tempoutput"
done
cp "$tempoutput" "$outputfile"
rm -r $tempdir
echo Done.
Sometimes it makes sense to just globally recompile all documents. An example of this is an update to the stylesheets or just as a catchall to ensure that the latest files are included in the builds. When this happens, a simple script I've called db-rebuildall can be run.
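A global rebuild of this kind can be sketched as a loop over the document directories. This is a minimal sketch under stated assumptions and is not the db-rebuildall script itself: the function name rebuild_all is hypothetical, it assumes every document lives in a subdirectory of one docs directory with a matching .docbook file inside, and the optional second argument (the compile command, defaulting to dbcompile) exists only so the loop can be tried out safely.

```shell
#!/bin/bash
# Hypothetical sketch of a rebuild-everything loop.
# Usage: rebuild_all DOCS_DIR [COMPILE_COMMAND]
rebuild_all() {
    local root="$1"
    local compile_cmd="${2:-dbcompile}"
    local dir name

    for dir in "$root"/*/; do
        # Skip the unexpanded pattern when there are no subdirectories.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # Only compile directories that follow the name/name.docbook layout.
        if [ -f "$dir$name.docbook" ]; then
            echo "Rebuilding $name"
            # Run in a subshell so the caller's directory is unchanged.
            ( cd "$dir" && "$compile_cmd" ) || echo "Compile failed in $dir" >&2
        fi
    done
}
```

Calling rebuild_all ~/docs would run dbcompile with no milestone text in each document directory, so the changelogs only get a new last-compiled line.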
Generally the easiest method of publishing the DocBook files is to put them on a webserver and share a link to the HTML formatted document. This allows the user to download any other formats that may appeal to them if they need to read them offline and it helps prevent filling up mail quotas.
It's taken me a long time to realize that I was missing the quintessential part of any programming-related topic - the infamous "Hello World" example. Granted, I do break Hello World a bit in that I try to get a bunch of simple functions in a single document, but here is the DocBook file and here is the compiled HTML (if you view the source you'll note it's ugly HTML, hence my use of tidy to clean things up):
Example 8. helloworld.docbook
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML v4.4//EN" "/usr/share/xml/docbook/schema/dtd/4.4/docbookx.dtd">
<article lang="en">
<articleinfo>
<keywordset>
<keyword>hello</keyword>
<keyword>world</keyword>
</keywordset>
<title>Hello World</title>
<abstract>
<para>
A very simple, but complete, DocBook document.
</para>
</abstract>
</articleinfo>
<section id="hello">
<title>Hello</title>
<para>
World!
</para>
<para>
I don't do anything fancy in this document, but I did want to show you a basic
document that should compile just fine. It's sometimes hard to find complete
examples like this, many snippets work well if you know what XML is supposed
to look like but if you're new to the topic it can be confusing.
</para>
<para>
You don't need the carriage returns at the end of every line, but when I put it
into a listing in my main document it doesn't provide word wrapping. In
a normal compilation excess whitespace and line breaks are ignored.
</para>
<para>
To compile this document, simply download it and run <command>docbook2html -u helloworld.docbook</command>.
This should create <filename>helloworld.html</filename> which you can open
in any web browser and it should look something like
<ulink url="helloworld.html">this</ulink>.
</para>
<section id="nesting">
<title>Nesting</title>
<para>
You can nest many things like <emphasis>sections</emphasis>, but note that
you'll need to close each one individually.
</para>
</section>
</section>
<appendix id="other-objects">
<title>Other Objects</title>
<para>
Other objects like tables and images just go between &lt;para&gt; objects
(note the use of HTML-style escape characters for things like &lt;, &gt;,
&nbsp; (non-breaking space), &amp;, &quot;, and others - a search for HTML
escape characters should turn up gold here).
</para>
<programlisting>
As an example object, this is a program listing.
It could be an image, or a table, or anything else.
Spaces are not eaten when rendered in HTML so t   h   i   s will not look
like t h i s and carriage returns/linefeeds are also preserved. And this is
typically a non-kerning (fixed-width) font but you can change that in the
stylesheet definition (but I don't recommend it).
</programlisting>
<para>
After the object just pick up where you left off as if it was just a paragraph
above you.
</para>
</appendix>
</article>
This section will describe some common functions that may or may not be intuitively obvious to the beginner. Most of these are covered in the tutorial links in Appendix B, but it's often useful to have a centralized reference.
There are several common tags you'll probably run into frequently. While you have control over how these tags are formatted, there are defaults defined for the standard DocBook Style Sheets. Note that many of these tags, by default, will render identically. It's up to you to be diligent about using the right tag for the right entity. Remember that the purpose of DocBook is to make it easy if you decide, for example, to change all variables to italicized text; if you had defined them all as literal this can be a problem. This means you need to break the habit of thinking bold and italics and start thinking command and varname.
You can use the <emphasis>emphasis tag</emphasis> to create emphasis in your document. This will typically be an italicized version of your default font. For references to literal text you can use the <literal>literal tag</literal>.
Computer-related documents frequently refer to many standard entities. When referring to filenames or directories you can use the <filename>filename tag</filename>, but packages may be better expressed with the <package>package tag</package>. When expressing commands the <command>command tag</command> helps them stand out. Variables can be expressed with the <varname>varname tag</varname> and tags themselves can be expressed with the <sgmltag>sgml tag</sgmltag>.
There are two methods to express computer output, the first is an inline <computeroutput>computeroutput tag</computeroutput> which is good for short bits of output or fragments. For extended output that spans multiple lines, consider the following:
<programlisting> The programlisting tags are great for displaying the entire output of a command or for listing chunks of code. It preserves whitespace and uses a non-kerning (fixed-width) font in most stylesheets. However, it also will force a newline before and after the tags so it is not appropriate for inline text. </programlisting>
References to foreign phrases can use the <foreignphrase>foreignphrase tag</foreignphrase>.
Sometimes you need to use special characters in your document, and without proper escaping the document won't compile. Since DocBook is based on XML, it uses the same XML/HTML-style escape characters. Some common characters appear in the table below.
Table 1. List of Selected Special Characters
Character Name | Appearance | Escape Sequence | Description |
---|---|---|---|
Less Than | < | &lt; | The < symbol is interpreted as the start of an XML tag, so &lt; should be used in most instances. |
Greater Than | > | &gt; | The > symbol can be misread as an XML tag end, but this is not very common. Still, use of &gt; can be beneficial in some circumstances. |
Ampersand | & | &amp; | The & symbol is frequently verboten because it starts an escape sequence; it can be replaced with &amp;. |
Quote | " | &quot; | Quotes can sometimes confuse parsers; use of &quot; indicates that this is neither starting nor ending a quote but is simply the quote character. |
Non‑Breaking Space |   | &nbsp; | A non-breaking space is a space that doesn't count as whitespace. It can be used to keep two words together, but is often used for formatting. As an example, a programlisting block will often remove leading spaces. Converting the initial character to &nbsp; will prevent this from happening. |
Currency | € ¢ £ ¥ | &euro; &cent; &pound; &yen; | Select currencies are easy to work with. The dollar symbol is found on many keyboards because it is used in many programming languages. Additional currency symbols often exist, but they don't have a named reference and must be expressed in Unicode format. Another consideration is that some symbols may not render well in all document types. A list of most common currencies can be found here. |
Symbols | © ® ° ± µ | &copy; &reg; &deg; &plusmn; &micro; | Some common symbols have named references; many other symbols exist that can be expressed in Unicode format. |
General Unicode Character | N/A | &#xHHHH; or &#DDDD; | Unicode characters can be expressed in hex or decimal format (a list of characters can be found here). As an example, &#x3C0; will result in the Pi (π) symbol. Similarly, &#960; will produce the same character (π). |
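When pasting code or command output into a document, escaping by hand gets tedious. The following is a minimal sketch of a shell filter that applies the four basic escape sequences from the table above. The function name xml_escape is made up for illustration and is not part of DocBook or docbook-utils.

```shell
#!/bin/bash
# xml_escape is a hypothetical helper name, not part of DocBook or docbook-utils.
# It reads text on stdin and writes it to stdout with &, <, > and " escaped.
xml_escape() {
    # Escape the ampersand first; otherwise the ampersands introduced by
    # the other replacements would themselves be escaped again.
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g'
}

# Example: escape a line of code before pasting it into a <para>.
printf '%s\n' 'if (a < b && c > "d")' | xml_escape
# prints: if (a &lt; b &amp;&amp; c &gt; &quot;d&quot;)
```

The order of the sed expressions matters: running the ampersand replacement last would turn an already escaped &lt; into &amp;lt;.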
DocBook supports several different list styles. A simple bullet item list can be expressed as follows:
<itemizedlist> <listitem> <para> Entry 1 </para> </listitem> <listitem> <para> Entry 2 </para> </listitem> </itemizedlist>
The result of the code above looks like this:
• Entry 1
• Entry 2
A numbered or enumerated list is called an ordered list in DocBook and is expressed like this:
<orderedlist> <listitem> <para> Entry 1 </para> </listitem> <listitem> <para> Entry 2 </para> </listitem> </orderedlist>
When processed this is the result:
1. Entry 1
2. Entry 2
A list of variables can also be expressed, typically as a variable name followed by a description:
<variablelist> <varlistentry> <term> Variable 1 </term> <listitem> <para> Description </para> </listitem> </varlistentry> <varlistentry> <term> Variable 2 </term> <listitem> <para> Description </para> </listitem> </varlistentry> </variablelist>
This will produce the following output:
Variable 1
    Description
Variable 2
    Description
Frequently you'll want to make references to internal sections of your document. To do so, you'll want to use the id="section_name" attribute on your section tags. This has the added benefit of creating more intelligent URL markers for your table of contents links (using things like index.html#install-ubuntu instead of an automatically generated index.html#AEN13):
<section id="section-name-here"> <title>Section Title Here</title> <para> Paragraph text here. </para> </section>
Note that the id attribute can belong to a section or an appendix. Once you've got one defined you can put an xref reference anyplace in your document (well, anyplace there would normally be text) and it will be replaced at compile time with the section number and (in link-enabled formats like HTML) a link to the section. As an example:
Please see <xref linkend="xref" /> for more details on this.
Will yield:
Please see Section 4.3 for more details on this.
You can also insert an anchor tag arbitrarily in a document as the target of an xref. This tag is invisible and has the format <anchor id="anchor-name" xreflabel="text" />. The xref will use "anchor-name" as the linkend value and the output will use "text" as a replacement for the xref tag. Again, the anchor tag is invisible and will be omitted in the final output.
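A mistyped linkend is easy to miss until compile time. As a rough sketch, a text search can list every linkend value that has no matching id in the same file. The function name check_xrefs is hypothetical, and because this is a plain grep rather than an XML parser it assumes attributes are written with double quotes, as in the examples here.

```shell
#!/bin/bash
# check_xrefs is a hypothetical helper name used only for illustration.
# It is a plain text search, not an XML parser, so it assumes attributes
# are written with double quotes as in the examples in this document.
check_xrefs() {
    local file="$1" target status=0
    # Collect every distinct linkend value used in the file.
    for target in $(grep -o 'linkend="[^"]*"' "$file" | cut -d'"' -f2 | sort -u)
    do
        # Look for a section, appendix, anchor, etc. that defines this id.
        if ! grep -qF "id=\"$target\"" "$file"; then
            echo "Unresolved linkend: $target"
            status=1
        fi
    done
    return "$status"
}

# Usage: check_xrefs mydocument.docbook
```

The function returns a non-zero status when anything is unresolved, so it can be chained in front of a compile command.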
Since DocBook documents are often read in HTML form, it's useful to use META tags so search engines can properly classify a document. These are called Keywords in DocBook and can be a member of an articleinfo or chapterinfo grouping. The following is an example of keywords for an article on IPv6 and Ubuntu:
<articleinfo> <keywordset> <keyword>ubuntu</keyword> <keyword>ipv6</keyword> </keywordset> ... </articleinfo>
Will produce output in your HTML documents of the format:
<META NAME="KEYWORD" CONTENT="docbook">
Hyperlinks are fairly trivial to use in DocBook. They work the same as HTML hyperlinks but use the ulink tag.
<ulink url="http://www.ebower.com/">eBower</ulink> will generate eBower
Email hyperlinks are of a special format.
<email>webmaster@ebower.com</email> will generate <webmaster@ebower.com>
DocBook allows for multiple image formats to be listed so the various compilers can select a format that's compatible. For example, docbook2html looks for jpg or png files while docbook2ps prefers eps files. As such it's frequently useful to write a script that converts images to common formats. I've created a script that will convert all images in the current directory to png and eps files. By default it will convert the jpg files, but if you supply it with another extension (for example, gif) it will convert those instead. I've placed it in /usr/bin/convertimg.
Example 9. An example DocBook image conversion script
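#!/bin/bash
# Convert every image with the given extension (default: jpg) in the
# current directory to both PNG and EPS.
ext="${1:-jpg}"
for infile in *."$ext"
do
    # Skip the unexpanded pattern when no files match.
    [ -e "$infile" ] || continue
    # Strip the extension to get the base name.
    outfile="${infile%.*}"
    echo "Converting $infile"
    convert "$infile" "$outfile.png"
    # Only shrink images larger than 400x800; never enlarge.
    convert "$infile" -resize '400x800>' "$outfile.eps"
done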
The convert command is not included in Ubuntu's default desktop install; you may need to install the imagemagick package to use this script.
Once you've got the images in a common format you need to include them in a mediaobject block. Note that by enclosing this in a figure you get numbering.
<figure> <title>Figure Title</title> <mediaobject> <imageobject><imagedata fileref="filename.png" format="PNG" /></imageobject> <imageobject><imagedata fileref="filename.eps" format="EPS" /></imageobject> </mediaobject> </figure>
Sparklines are the use of graphs as typography. A small chapter by their inventor (Edward Tufte) appears on his forum. While the examples given tend to focus on the use of small, concise graphs to augment numbers and text in tables, they can also appear inline to a paragraph. As an example, one can include a sparkline to indicate that the current temperature is 15°C to concisely show a brief temperature history in the space equivalent to a word. Sparklines are not native to DocBook but are instead simple, inline graphics approximately 15 pixels high (this prevents line spacing issues in most formats). You'll note that sparklines are limited in that there is no room for scales or labels so they will need to be self-explanatory. They will only work in graphics-enabled output (HTML, PDF and PS).
A snippet of the paragraph above can be represented with the inlinemediaobject tag, which wraps the same imageobject entries used in the normal figures above. The difference is that the figure and title tags are left out, so the image does not appear as a traditional figure would.
the current temperature is <inlinemediaobject><imageobject><imagedata fileref="sparkline.png"/></imageobject><imageobject><imagedata fileref="sparkline.eps"/></imageobject></inlinemediaobject> 15°C
The same mechanism can be used for other inline graphics such as mathematical formulae or even small graphics that do not require a separate callout, figure number, or all that precious space. The premise behind this methodology is that before computers and movable type there was no distinction between graphics and text. It was only once we migrated to using movable type for text and engravings or woodcuts for graphics, and now Word (vim!) for text, Excel (gnuplot!) for graphs and Photoshop (gimp!) for images that we've found a need to create an artificial separation.
Often it makes sense to insert a text file into a document instead of copying the contents of the file. For example, if you're referencing code you don't want to have to duplicate the changes in the documentation whenever the code changes. This document also uses these references so it's easy to keep it up to date should my scripts change. The example tags create a caption and the programlisting tags will format it in a way that says whitespace is important. Note, however, that these links are not live. The files are compiled into the final documents, so you'll need to manually run dbcompile whenever the source files change.
<example> <title>The Title of the Text File</title> <programlisting><inlinemediaobject><imageobject> <imagedata fileref="/path/to/file" format="linespecific" /> </imageobject></inlinemediaobject></programlisting> </example>
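Because included files are only read at compile time, it can help to check whether an output file is older than anything it pulls in. The sketch below compares modification times with the shell's -nt test. The function name needs_rebuild and the file names in the usage comment are placeholders, and this is not how dbcompile itself decides anything.

```shell
#!/bin/bash
# needs_rebuild is a hypothetical helper name; it is not part of dbcompile.
# Usage: needs_rebuild OUTPUT SOURCE...
# Succeeds (exit status 0) when OUTPUT is missing or older than any SOURCE.
needs_rebuild() {
    local output="$1" src
    shift
    # A missing output file always needs to be built.
    [ -e "$output" ] || return 0
    for src in "$@"
    do
        # -nt is true when the left-hand file was modified more recently.
        if [ "$src" -nt "$output" ]; then
            return 0
        fi
    done
    return 1
}

# Example with placeholder file names:
# if needs_rebuild mydoc.html mydoc.docbook /path/to/file; then
#     docbook2html -u mydoc.docbook
# fi
```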
Tables can be very complex critters and the links in Appendix B have a reference to a more full-fledged table definition. The code below will get you a very basic table without any spans.
<table frame="all">
<title>The Title of the Table</title>
<tgroup align="center" cols="3" colsep="1" rowsep="1">
<thead>
<row>
<entry>Heading1</entry>
<entry>Heading2</entry>
<entry>Heading3</entry>
</row>
</thead>
<tbody>
<row>
<entry>Data1</entry>
<entry>Data2</entry>
<entry>Data3</entry>
</row>
<row>
<entry>Data4</entry>
<entry>Data5</entry>
<entry>Data6</entry>
</row>
</tbody>
</tgroup>
</table>
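This code will produce the following table:
The Title of the Table
Heading1 | Heading2 | Heading3 |
---|---|---|
Data1 | Data2 | Data3 |
Data4 | Data5 | Data6 |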
Sometimes it's useful to have footnotes: numbered references to text that appears at the end of the document. To create one, use a footnote block at the point in the text where the reference should appear. If you need to reuse the same footnote, use the footnoteref tag to refer to the original footnote.
<para> Footnotes can appear anyplace<footnote id="footnote-appearance"> <para> Footnotes will typically appear between &lt;para&gt; tags but can also be the child of &lt;programlisting&gt; and other tags as described <ulink url="http://www.docbook.org/tdg/en/html/footnote.html">here</ulink> </para> </footnote> in the text. You can also duplicate a footnote anywhere else in the document<footnoteref linkend="footnote-appearance"/> without retyping the footnote text, but the reverse link will be to the part of the document where the footnote text was defined. </para>
This will produce the following output:
Footnotes can appear anyplace[1] in the text. You can also duplicate a footnote anywhere else in the document[1] without retyping the footnote text, but the reverse link will be to the part of the document where the footnote text was defined.
You can also now install from PPA by loading ppa:ubuntu-ebower/ebower (see instructions in Appendix C) and running sudo apt-get install dbcompile. This is the easiest method as it takes care of dependencies for you and will be the most up-to-date.
If you want to do things manually, the following line will install all packages required for proper DocBook happiness:
sudo apt-get install vim docbook-utils docbook-defguide xml-twig-tools imagemagick lynx tidy
In addition to the packages above, you may wish to include my DocBook helper scripts. I store mine in /usr/bin but feel free to put them wherever you like. Note that I now use some external config files so you'll probably want to examine the non-executable files to ensure they're in the right spot. For example, dbcompile.header should be in /etc/dbcompile/ or ~/.dbcompile/.
To install all of this in one fell swoop you can experiment with dbcompile-0.1.1.1.deb which will install my scripts to /usr/bin as well as the dependencies above. Simply run sudo dpkg -i dbcompile-0.1.1.1.deb after downloading to install it. If you get an error message about missing dependencies, run sudo apt-get -f install to finish the install. Note that this file may be out of date, so please consider using the PPA for the latest and greatest.
I've created a Personal Package Archive (PPA) on Launchpad.net. You can add this using the command sudo add-apt-repository ppa:ubuntu-ebower/ebower and then sudo apt-get update to load the list of packages. Alternatively you can add the following lines to your /etc/apt/sources.list (this is useful if you're running a server and don't want to install the add-apt-repository command):
deb http://ppa.launchpad.net/ubuntu-ebower/ebower/ubuntu precise main
deb-src http://ppa.launchpad.net/ubuntu-ebower/ebower/ubuntu precise main
Make sure you replace precise above with your distro (lucid, oneiric, etc.). If your distro is not available please contact me with the version and package name and I'll try to update things posthaste. After this is done, run the command sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C27A63B3 to import the keys.
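To avoid editing the release name by hand, the two sources.list entries can be generated from a codename. This is only a sketch: the function name ppa_sources_lines is hypothetical, and it simply fills the codename into the same two lines given earlier.

```shell
#!/bin/bash
# ppa_sources_lines is a hypothetical helper name used only for illustration.
# It prints the two sources.list entries for the given Ubuntu codename.
ppa_sources_lines() {
    local codename="$1"
    local url="http://ppa.launchpad.net/ubuntu-ebower/ebower/ubuntu"
    printf 'deb %s %s main\n' "$url" "$codename"
    printf 'deb-src %s %s main\n' "$url" "$codename"
}

# Example: print the entries for the precise release.
ppa_sources_lines precise
```

On Ubuntu, lsb_release -cs prints the codename of the running release, so ppa_sources_lines "$(lsb_release -cs)" would fill it in automatically, assuming the PPA publishes packages for that release.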
By installing my scripts from the PPA you'll be sure to have the most up-to-date versions in case I need to make changes. However, note that I'm not a developer and I only play one on the Internet. If you use my scripts as a baseline and make changes yourself please let me know how you're changing things and I'll try to accommodate your efforts to ensure that I don't stomp on your customizations.
My name is Jeff Bower, I'm a technology professional with more years of experience in the telecommunications industry than I'd care to admit. I tend to post with the username jdbower on various forums. Writing these documents is a hobby of mine, I hope you find them useful and feel free to browse more at https://www.ebower.com/docs.
If you've got any questions or feedback please feel free to email me at docs@ebower.com or follow me on Google+ or Twitter.
[1] | Footnotes will typically appear between <para> tags but can also be the child of <programlisting> and other tags as described here |