next up previous contents
Next: 8. Naive-user documentation Up: Workbook for CMPE 185 Previous: 6. Algorithm Description


7. In-program Documentation

7.1 Goals--recognizing that programs are documents

In-program documentation is the most common, and most neglected, form of writing for computer programmers. We hope that this exercise teaches you

7.2 Audience assessment--maintenance programmers

Just as tailors spend more time making alterations than sewing new suits, most programmers spend more of their working time modifying old code than they do writing new code. If a program is difficult to change, it will be clumsily modified, and the resulting mess will be blamed on the original programmer. To protect your reputation, it is essential that you make your programs easy to modify cleanly.

Writing easily modifiable code is still an art, rather than a science, but all the techniques of structuring and modularization taught in programming classes help. Kevin disagrees with Niklaus Wirth, who uses a book title to claim that Algorithms+Data Structures=Programs [Wir76]. Certainly, algorithms and data structures are necessary for programs, but interface specifications and maintenance documentation are also essential.

Programs are read by compilers and by people. Far too many programmers stop when they get a program that is readable by a compiler--totally ignoring their other audience. Who are the people who read programs? Mainly other programmers. Most often the programmers are not reading to learn the language or to admire your style, but to make specific changes to the program. They usually read only those parts of the program they think will be relevant, so you must provide cross-references to information not immediately visible.

Many programs rely on external documentation. For student work, the external documentation is often the assignment sheet. External documentation may be easier to write than in-program documentation, but is not nearly as useful to a programmer attempting to modify your code. First, the external documentation often gets separated from the source, so the programmer doesn't have it to read. Second, the source is often modified without changing the external documentation, so the two may no longer agree. It is essential that maintenance documentation be included in the source code itself.

7.3 Assignment--adding comments to Wirth's code

Read the attached description of a solution to the knight's-tour problem [Wir76, 137-142]. Re-write Wirth's Program 3.3 presented in Section 7.8 (not changing the algorithm or the data structure) to make it comprehensible without the accompanying text. You will have to change the variable names, and you may want to change the procedure names. You can play with the indenting and other aspects of formatting, if you think it makes the program more readable. (We prefer the vertical alignment of begin and end used in Wirth's book to the alignment of if and end sometimes taught at UCSC. ) It is acceptable to translate the program from PASCAL into C or C++, but the changes made should be minor syntactic ones, not major restructuring of the program.

Your finished product should be an executable, well-documented program. Wirth's name should appear prominently near the beginning (as the original programmer), but you should take credit for the changes and documentation you provided.

7.4 Writing process--in-program documentation

Despite the way this assignment is structured, in-program documentation is not the last thing you do to a working program. It is most needed in the early stages of the programming, as scaffolding to support the incomplete program fragments. Whenever you write a procedure, you should write the explanation of the interface and function before writing the procedure itself. If you aren't crystal-clear about the purpose of the procedure, you won't be able to write it correctly.

In good programs, the documentation is written first, and the program written to match the documentation. For large programs with multiple programmers, it is particularly important to get the interfaces between modules well-defined early in the process. If the interface specification isn't written out completely, isn't distributed to the programmers, or isn't comprehensible, it might as well not exist.

At this point in your academic career, you've probably been told a hundred times that you need to document your programs better. When pressed, your instructors have probably mumbled vaguely about wanting more documentation but not precisely what was lacking. We might not do much better at telling you what is expected, but we'll try.

Of course, the best way to find out what is needed in the way of program documentation is to try modifying someone else's code. Whenever you run into something you can't figure out, or, worse yet, something you think you understand but are wrong about, then you've found a place where the documentation is inadequate.

To forestall such misunderstandings, you have to write far more documentation than you, as the creator of the code, believe you need. Because well-documented programs are so rare, you've probably never seen one and aren't aware what they look like. Perhaps the best example is Knuth's type-setting program TEX [Knu86]. We have put the source code on reserve in the Science Library--take a look at it to see what a well-documented program looks like. Note that the ratio of documentation to code is large and that the documentation concentrates on explaining data structures and techniques, not on repeating what the code says.

7.5 Format for in-program documentation

Everybody has her or his own ideas about the best places to put documentation in source code, and the best ways to format it. The ``correct'' style depends on the language and on the practices of the company that will maintain the program. We'll try to concentrate on the universal features of source code documentation.

Much program documentation is contained in comments. In most documentation styles, two different types of comments are used: block comments and one-line comments. Block comments are multi-line chunks of text, often containing several paragraphs. One good style for block comments in Pascal is the following:

 *      Block comments contain ordinary text written in good English.
 *      Several lines may be needed, and blank lines are left 
 *      between paragraphs.
 *      If variables or expressions are needed, they should be typed 
 *      exactly as they would appear in the code.  For example, 
 *      alpha[char] is 1 for each alphabetic character, 0 for others.
 *      For ease of later editing, each sentence can be started on a new line,
 *      and no "enclosing box" is used.
 *      The vertical line of stars makes the block comment easy to find.
 *      If you feel obligated to enclose the block comments, you can
 *      put a horizontal line of stars as the first and last line,
 *      but don't put a right-hand column of stars, as it discourages
 *      programmers from changing the comment when they change the code.
 *      A similar style can be used in C programs (with /* and */ delimiters).
 *      Ada programs and assembly language programs use a delimiter
 *      ("--" for Ada) at the beginning of each line of a block comment.
This style for block comments has several advantages:

One-line comments are used to mark interesting places in the program, and often are simple declarative or imperative sentences (for example, ``x[i] is now the largest element.'' or ``Swap leftmost and rightmost incorrect elements.'').

The following sections explain when to use block comments and when to use one-line comments.

7.5.1 Identify your work

Put your name and corporate address in a block comment at the beginning of every file. When you modify someone else's code, do not remove the original names, but add a comment saying that you modified the code, what you did to it, and when. In this way you retain the modification history of the program with the source code. These block comments can be extremely useful when tracking down a new bug to find out what changes have been most recently made to the program.

7.5.2 Use white space freely

  Ideally, each procedure and its accompanying documentation should fit on a single page. Use page breaks and blank lines to make the printed appearance of the source code correspond to the logical structure of the program. The emphasis on appearance is not to satisfy some perverse artistic æsthetic, but to make the program more readable.

Within procedures, adding a blank line between sections of code acts like a paragraph break to indicate a change in emphasis. Starting a new page before each procedure acts like a section or chapter break, to indicate a whole new topic. The page-break character (often called a form-feed) is control-L--put it before a procedure's block comment to make the whole thing start on a new page.

Because white space is a powerful tool for grouping text together visually, try to use it in semantically meaningful ways. Don't break code up arbitrarily, just because it has gotten too long--do add white space between sections performing different tasks. If you feel that a section has gotten too long, then look for a way to break into explainable subparts, don't just add a blank line in the middle.

7.5.3 Indent to show block structure

  Block-structured languages are commonly printed with varying indentation. The more deeply nested a statement is, the more deeply it is indented.

Some programmers use tabs for indenting, but the resulting programs are often too wide to fit on a screen or piece of paper. Other programmers use only one or two spaces for indenting, making the level of nesting difficult to see. Each level of indentation should be about four spaces deeper than the previous, and indentation levels should be consistent. (If you use vi, try ``:set shiftwidth=4 autoindent''.)

Slightly different styles of indenting are used by different programmers, which can get confusing when reading a program that has been written by a larger team or modified by several different programmers. It is good practice for all members of a programming team to use the same indenting style and for all subsequent modifications to use the same style. Large development projects often have a style guide containing the conventions to be used by the programmers.

One popular style calls for the bodies of loops and then- or else-clauses to be indented 4 spaces deeper than the loop- or if-statement, with the punctuation (begin and end or { and }) at the same level as the outer text:

		if (test)
		{		body
A sequence of tests that determine which of several actions to perform should have all the tests indented to the same level, and all the actions one level deeper:

		{		action1
		else if(test2)
		{		action2
		{		default action

7.5.4 Name variables carefully

A variable name should tell the reader (not just the programmer) what its function is; calling a variable rownum is much better than calling it i. Try to avoid using the same variable for multiple purposes.

7.5.5 Use a block comment for each procedure interface

A procedure's block comment should describe the external view of the procedure. As a minimum, the comment should tell which parameters are input, which are output, what global data structures are used, and what side-effects the procedure may have. The function of the procedure should be clearly explained. Any preconditions that must hold before the procedure can be safely executed should also be included. Choose a standard format for all procedures, perhaps something like

(*      procedure-name
 *      action:  explain what the procedure does (not how it does it).
 *      inputs:  list each variable that is input and what it means.
 *      outputs: list each parameter that is an output and what it
 *               means.
 *      returns: what is the meaning of the value returned by a 
 *               function?
 *      globals: list any global variables or data structures needed by
 *               procedure.
 *      side-effects:  list any changes expected as a result of running the 
 *               procedure.

The same variable may occur in globals, inputs, and outputs. A global constant may be listed as a global, but not as an input. A global variable that is used, but not changed, may be listed as both a global and as an input. A global variable that is set should be listed as an output or as a side-effect. For example, if a procedure reads data from a file and puts it into a global array, that array is properly viewed as an output of the procedure. On the other hand, if a procedure finds a path through a maze and keeps some statistics about the search as it goes, the changes to the global statistics are a side-effect.

A side-effect is any action performed by the procedure that is not completely contained in the values of the output variables. Common side-effects are changing other global variables, reading input (thus advancing the position in the input stream), and printing output. You often have to mention a global variable twice--once in the section listing global variables, to indicate that the global variable is used in some way, and once in the side-effects section, to explain how the global variable is changed by the procedure.

In many cases, it is better to say something like ``side-effects: none'' than to leave the reader trying to decide whether a procedure has no side-effects or you simply forgot to document the side-effects. But be careful--programmers have written block comments that claimed a procedure had no inputs, outputs, global variables, or side-effect--in other words that the procedure did nothing at all! In most cases, the procedure printed a message or read something from an input--both of which are classified as side-effects.

7.5.6 Use a block comment inside each procedure to explain method

Unless a procedure uses a completely trivial technique, a sentence or two of explanation at the beginning will make it much easier for the reader to understand. You do not have to cram 30 pages of analysis from a text book into a couple of sentences. If you can't say all that needs to be said in a couple of paragraphs, you should at least mention what technique is being used and give a complete citation to a more complete external description.

The interface description comment is not the correct place for an explanation of the techniques used. You should be able to change techniques (say, from insertion sort to heap sort) without changing the interface description. Use a separate comment for methods.

The methods comment explains what is done, why it is done, and when it is done. The details of how it is done should be left for the code. If you are not sure how much to put in the methods comment, it is probably better to put in too much than to put in too little.

7.5.7 Use a block comment for each data type

When you define a record or array structure, explain the purpose of each field. Give an overall view of the data structure (is it a binary tree? a linked list? a heap?). What special properties of the data structure are you relying on?

7.5.8 Use a block comment for each data structure

You will often use apparently simple data structures (such as arrays) without having to declare a special type. Often, the simpler the data structure, the more ways it could be being used. Elementary data structures may need more commentary than the structured types that have special declarations. For example, a simple array has many different possible meanings. It may store a mapping from one set to another--what are the sets? what does the mapping mean? Or the array may be being used as a hash table--what are the keys? what method is used to resolve collisions?

Identify global variables that are used as constants throughout the program. Although PASCAL only recognizes scalar constants, programmers often use global arrays and sets as constants. Because PASCAL does not support pre-loaded arrays, you should say where the structure is initialized.

7.5.9 Use a one-line comment for each local variable

Each local variable should have a single, distinct purpose. The purpose is hinted at by the name of the variable, and explicitly stated in a comment where the variable is declared. For example,

    var L: integer; {A[L] is the leftmost unchecked value.}
The one-line comment may be just a noun phrase, with an implicit ``The variable foo is'' in front of it. For example,
    var L: integer; {the index of the leftmost unchecked value}

Watch out for slightly inaccurate statements. If you're in a hurry you might write ``first unchecked value'' when you mean ``the index of the leftmost unchecked value''. Saying ``SP is the stack pointer'' is much less valuable than saying

    var SP: integer; {STK[SP] is the top element of the stack, and 
	              STK[SP-1] is the next available empty slot}

If you are cramped for space in the comment, you can sometimes leave out the definite articles (for example, {index of leftmost unchecked value}). Don't do it if any confusion could result; make the comment multi-line instead.

Be sure your names are not misleading. Using x for a variable that indexes the vertical direction in a printout, and y for the horizontal variable is a common mistake--people are conditioned to expect x to be the horizontal direction. Be particularly careful in checking input and output loops.

7.5.10 Use comments sparingly inside the body of the code itself

Many beginning programmers, when told to document, attempt to be explicit and write useless comments like

     i := i +1;    {increment i.}

Many short comments interfere with reading the code itself. A long block comment at the beginning of the procedure can be far more helpful and is usually easier to write. Reserve one-line comments for such tasks as

7.5.11 Use assertions.

Many languages support (or, by using macro packages, can be made to support) assert statements. The format varies with the implementation, but usually consists of an expression that is supposed to evaluate to true and an error message to print if the assertion is not true. When an assertion fails, it usually breaks to a debugging program (or causes the program to abort). For example, at the beginning of a procedure, one might test a precondition as follows:

        assert(L<R, "left and right pointers are crossed");

Loop invariants are particularly good things to put in assertions. David Gries recommends using the loop invariants as the foundation for designing new algorithms and accurately implementing existing algorithms [Gri83].

If your language does not support assert statements, it is still worthwhile to put the assertions in as comments, since they tell the reader something about what you were expecting when you wrote the code.

7.6 Things to keep in mind for peer editing

Read your partner's code as if you had never seen the algorithm or data structures before. Consider what you would need to know to make a major modification. For example, consider writing a graphical procedure to draw the chess board with the tour, or consider changing the simple backtracking search to an ``intelligent'' heuristic search. What do you need to know about the data structures? about the algorithm?

Look at the spacing and indentation on the page. Is the code visually divided into natural chunks? Compare with the guidelines in Sections 7.5.2 and 7.5.3. Can you see at a glance where each loop begins and ends?

Check that all block comments are in good English. Telegraphic style, in which you leave out articles and verbs, may be used only in one-line comments.

Check that the variable names are meaningful. For example,

Check that the global data structures are well-commented. For example, the array for the board is the most important data structure in the knight's tour problem. It is important to document the meaning of the contents, not just of the indices. What does the number in the board array mean? Special values (such as 0 to mean that a square has not yet been visited) should be explicitly described.

Is the output format for the program described? To say that a program ``prints the solution'' is not particularly helpful. There are many ways to print the solution to a knight's tour--listing the positions visited as (x,y) pairs, drawing a chessboard with numbers to indicate the order for visiting the squares, moving a drawing of a knight around on a display, or using either of the two standard chess notations for describing the movements. The description of the format should be explicit enough that another programmer can write an interface to the program without having to learn all the internal data structures and operations.

Is the input format correctly described? Even Wirth's external documentation is a bit inconsistent, as the starting location and board size are not parameters but compile-time constants!

Is the explanation of the recursive procedure clear? The best way to describe a recursive procedure is with preconditions and postconditions. How much of the tour is already completed before each call to the procedure? What do the different possible return values mean? If global variables are changed, what do they contain before and after the procedure? Re-read Section 6.5 about how to explain recursion.

Termination criteria are also useful--when does the recursion stop? Does the programmer clearly distinguish between the action of one iteration of the recursion (lengthening the tour by one) and the overall effect of the procedure (determining whether there is a tour that starts with the current partial tour)?

Are the side-effects properly described? The main purpose of the recursive procedure is to change the global board array--the returned Boolean value is just an output to flag when the procedure is finished. The documentation should make it clear what the contents of the array are after the procedure returns. Warning: the values are different depending on whether a tour was found or not.

Are the words carefully chosen? Be careful not to copy Wirth too slavishly--he is not a native speaker of English. For example, ``covering'' is a technical term mis-used by Wirth--a covering is a collection of sets whose union contains the object being covered. You should write using your own words, being particularly careful with any words that you have only recently acquired.

Watch out for vague words like valid or good. When they occur, look for a more descriptive word.

When technical words are used, make sure they are used consistently. Try to have only one meaning for each word, and only one word for each concept. As an example of what can go wrong, one programmer used square, field, record, record field, box, location, playing field location, and board square interchangeably for two concepts that were not distinguished, the coordinates of an element of the board array and the contents of that element.

Watch out for off-by-one errors in both the programs and the comments!

7.7 The final draft

Typos in programs are not just minor inconveniences; they can be fatal errors. Make sure your final program compiles and runs before handing it in. It is easy to insert a comment that isn't properly terminated, and so even if all you edit are the comments, you have to recompile and re-run to make sure you didn't break anything.

Try to keep each procedure and its documentation on one page. If you must print a procedure on two pages, make the page break in a natural division in the code. Insert page breaks and white space to make the appearance of the printed source code correspond to the logical structure of the program. The unix filter pr will automatically add the file name, date, and time to the top of each page. If you're using UNIX, look into using pr (or the equivalent lpr -p option). Some of the machines on campus have enscript, which does an even better job of printing program listings.

For this assignment only, do not double space what you hand in. There should be enough white space in a well-formatted program for us to write whatever comments we need. If you end up with dense blocks of code or text, something needs to be changed.

Note: the usual 8.5'' $\times$ 11'' rules apply. If you have 14'' wide paper, cut it down to 11''. Remember to burst the pages and staple in the upper left corner. The emphasis in this class is on making the program as easy for the reader to understand as possible--if it cascades off a desk and doesn't fit in a file folder, the reader is much less likely to try to read it.

Our old eyes are also tired of straining to read pale grey, fuzzy characters. If the printout you get is not crisp and clear, demand that it be reprinted with a decent ribbon. If you are using your own printer, make sure that you have a spare ribbon available.

7.8 Wirth's description of the knight's tour program

  The remainder of this section is copied with the permission of Prentice-Hall from Niklaus Wirth's Algorithms + Data Structures = Programs, 1976, pp. 137-141. Figure and display numbers have been changed to fit the numbering of the current document.

Backtracking Algorithms

A particularly intriguing programming endeavor is the subject of ``general problem solving.'' The task is to determine algorithms for finding solutions to specific problems not by following a fixed rule of computation, but by trial and error. The common pattern is to decompose the trial-and-error process into partial tasks. Often these tasks are most naturally expressed in recursive terms and consist of the exploration of a finite number of subtasks. We may generally view the entire process as a trial or search process that gradually builds up and scans (prunes) a tree of subtasks. In many problems this search tree grows very rapidly, usually exponentially, depending on a given parameter. The search effort increases accordingly. Frequently, the search tree can be pruned by the use of heuristics only, thereby reducing computation to tolerable bounds.

It is not our aim to discuss general heuristic rules in this text. Rather, the general principle of breaking up such problem-solving tasks into subtasks and the application of recursion is to be the subject of this chapter. We start out by demonstrating the underlying technique by using an example, namely the well-known knight's tour.

Given is a $n\times n$ board with n2 fields. A knight--being allowed to move according to the rules of chess--is placed on the field with initial coordinates x0, y0. The problem is to find a covering of the entire board, if there exists one, i.e., to compute a tour of n2-1 moves such that every field of the board is visisted exactly once.

The obvious way to reduce the problem of covering n2 fields is to consider the problem of either performing the next move or finding out that none is possible. Let us therefore define an algorithm trying to perform a next move. A first approach is shown in (7.1).

 ...\space (\it no more
\bf end\end{tabbing}}\end{displaymath} (1)

If we wish to be more precise in describing this algorithm, we are forced to make some decision on data representation. An obvious step is to represent the board by a matrix, say h. Let us also introduce a type to denote index values:

type index = 1 .. n ;
var h: array [index, index] of integer

The decision to represent each field of the board by an integer instead of a Boolean value denoting occupation is because we wish to keep track of the history of successive board occupations. The following convention is an obvious choice:

		h[x,y] = 0: 		field { x, y }$\left\{ x, y \right\}$has not been visited
		h[x,y] = i: 		field { x, y }$\left\{ x, y \right\}$has been visited in the $i^{\rm th}$ move $(1\leq i \leq n^2)$

The next decision concerns the choice of appropriate parameters. They are to determine the starting conditions for the next move and also to report on its success. The former task is adequately solved by specifying the coordinates x,y from which the move is to be made and by specifying the number i of the move (for recording purposes). The latter task requires a Boolean result parameter: $q=\hbox{\it
true}$ denotes success; $q=\hbox{\it false}$ failure.

Which statements can now be refined on the basis of these decisions? Certainly ``board not full'' can be expressed as ``i<n2.'' Moreover, if we introduce two local variables u and v to stand for the coordinates of possible move destination determined according to the jump pattern of knights, then the predicate ``acceptable'' can be expressed as the logical combination of the conditions that the new field lies on the board, i.e., $1\leq u \leq n$ and $1\leq v \leq n$,and that it has not been visited previously, i.e., h[u,v]=0. The operation of recording the legal move is expressed by the assignment h[u,v]:=i, and the cancellation of this recording as h[u,v]:=0. If a local variable q1 is introduced and used as the result parameter in the recursive call of this algorithm, then q1 may be substituted for ``move was successful.'' Thereby we arrive at the formulation shown in (7.2).

xxx\=xxx\=xxx\=xxx\=xxx\=xxx\=xxx\=\k... more candidates})$\\  
\\ gt $q:=q1$\\ \bf end\end{tabbing}}\end{displaymath} (2)

Just one more refinement step will lead us to a program expressed fully in terms of our basic programming notation. We should note that so far the program was developed completely independently of the laws governing the jumps of the knight. This delaying of considerations of particularities of the problem was quite deliberate. But now it is time to take them into account.

Given a starting coordinate pair { x, y } $\left\{ x, y \right\}$, there are eight potential candidates for coordinates { u, v } $\left\{ u, v \right\}$of the destination. They are numbered 1 to 8 in Figure 7.1.

Figure 7.1: The eight possible moves of a knight.
{l\vert c\vert c\vert c\vert c\ver...
 ...ticolumn{1}{c}{}&\multicolumn{5}{c}{$x$}\\ \end{tabular}\end{center}\end{figure}

A simple method of obtaining u,v from x,y is by addition of the coordinate differences stored in either an array of difference pairs or in two arrays of single differences. Let these arrays be denoted by a and b, appropriately initialized. Then an index k may be used to number the ``next candidate.'' The details are shown in Figure 7.2. The recursive procedure is initiated by a call with the coordinates x0, y0 of that field as parameters, from which the tour is to start. This field must be given the value 1; all others are to be marked free.

H[x_0, y_0]:= 1;\hskip 5em\hbox{try}(2,x_0,y_0,q)\end{displaymath}

Figure 7.2: Knight's tour
program knightstour(output);
const n=5; nsq=25;
 else writeln('NO SOLUTION')

One further detail must not be overlooked: A variable h[u,v] does exist only if both u and v lie within the array bounds $1\ldots
n$. Consequently, the expression in (7.2), substituted for ``acceptable'' in (7.1), is valid only if its first two constituent terms are true. A proper reformulation is shown in Figure 7.2 in which, furthermore, the double relation $1\leq u \leq n$ is replaced by the expression $u\ \hbox{\bf in } [1,2,\ldots,n]$, which for sufficiently small n is possibly more efficient (see Section 1.10.3 [Wir76]). Table 7.1 indicates solutions obtained with initial positions { 1, 1 } $\left\{ 1, 1 \right\}$, { 3, 3 } $\left\{ 3, 3 \right\}$for n=5 and { 1, 1 } $\left\{ 1, 1 \right\}$for n=6.

Table 7.1: Three Knight's Tours
1 6 15 10 21
14 9 20 5 16
19 2 7 22 11
8 13 24 17 4
25 18 3 12 23

23 10 15 4 25
16 5 24 9 14
11 22 1 18 3
6 17 20 13 8
21 12 7 2 19

1 16 7 26 11 14
34 25 12 15 6 27
17 2 33 8 13 10
32 35 24 21 28 5
23 18 3 30 9 20
36 31 22 19 4 29

It is possible to replace the result parameter q and the local variable q1 by a global variable, thereby simplifying the program somewhat.

next up previous contents
Next: 8. Naive-user documentation Up: Workbook for CMPE 185 Previous: 6. Algorithm Description

Kevin Karplus
Computer Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064

HTML version created 1/1/1999