5.4.3 Static metrics

The basic technique for measuring the small-scale complexity of an implementation was introduced in the previous chapter. McCabe's cyclomatic complexity measures the number of possible paths through an algorithm by counting the number of distinct regions on a flowgraph. The measurable attribute was validated by relating it to the cognitive complexity of the algorithm which it represents. However, the measurement is concerned only with the flow of control and does not take account of other considerations, such as the complexity of the decisions which control the flow or of the operations which comprise the arcs.

The simplest and most straightforward additional measurement which could be made is to count the number of lines of source code which are required. This is a very inadequate measurement, for reasons which were explained above and because it is very dependent upon the programming language being used. Even if the measurement is accepted as valid, it is still not as well defined as it might appear. For example, it is not clear how to count the following:
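
	Count := Count + 1;  Total := Total + NewValue;
	AverageValue := FLOAT( Total ) /
	                FLOAT( Count );

A fragment such as this, which is purely illustrative, occupies three physical lines but contains three Ada statements: two of them share a single line and the third is split across two lines. A counting convention has to decide which of these values represents the number of lines of code.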

The most straightforward approach might be to count all lines which are not blank and which are not composed entirely of comments, irrespective of the number of Ada statements or declarations which they contain. This might be better termed Effective Lines Of Code (ELOC). One useful measurement which can be derived from it is the density of comments. This can be defined as:
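
	density of comments  =  number of comment lines / ( number of comment lines + ELOC )

where a comment line is taken to be any line which is composed entirely of a comment. Other denominators, such as the total number of non-blank lines, could be used equally well.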

The density of comments value will be between 0.0 and 1.0 and it can be used as a quality indicator. A quality indicator is an attribute which, if present, does not guarantee the quality of the product but whose absence indicates that one aspect of quality is missing. A cut-off value for comment density might be declared by an organisation, and source code whose comment density fell below this value could be rejected as being under-commented.

Other aspects of the way in which source code is written are also amenable to use as quality indicators. The average length, in characters, of identifiers gives some indication of the meaningfulness of the identifiers chosen by a developer. Meaningful identifiers tend to be longer rather than shorter, and a cut-off value for the average length might be decided upon.
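
For example, of the following two declarations, which might appear in a counting program, the second is far more readily understood, and its identifier is fourteen characters long rather than one:

	C              : INTEGER := 0;
	CharacterCount : INTEGER := 0;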

The extent to which source code complies with indentation requirements can also be measured, and a very low tolerance for deviations from the recommended use of indentation can be enforced. Compliance with conventions for the use of capitals and lower case can also be easily checked. These factors can be measured by a software tool which first reformats the existing source code according to the specified conventions and then compares the original source code with the reformatted version, reporting any deviations. However, if such a tool is available then it could equally be configured to output the reformatted source code, ensuring that it complies with the conventions. Such a tool is known as a pretty printer and, if one is available, a near-zero tolerance for deviations from the recommended style can be enforced.
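
For example, such a tool might accept the first of the following fragments and either report its deviations from the layout conventions or reformat it as the second:

	if currenterror<=accuracy then closeenough:=true; end if;

	if CurrentError <= Accuracy then
	   CloseEnough := TRUE;
	end if;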

All of these considerations need to be communicated to a developer and require the organisation to explicitly state them in a source code layout style guide, commonly known simply as a style guide. A style guide which complies with the style of source code layout used in this book is included in Appendix B.

Although all of these measurements can be used as source code quality indicators, they do not assist with measuring the complexity of the implementation. The basis of the cyclomatic complexity measurement is the number of places where the flow of control can divide, and a refinement would be to attempt to measure the complexity of these decisions. The majority of these decisions are formed from BOOLEAN expressions, and an analysis of the possible forms of a BOOLEAN expression will yield a decision complexity measurement.

The simplest BOOLEAN expression is the value of a BOOLEAN variable, and this can be given an arbitrary complexity value of 1.0. The next simplest possible expression is a negated BOOLEAN value, which can be given an increased complexity value of 1.5. Continuing this approach, a complexity value can be assigned to each part of a complex BOOLEAN expression and the overall complexity derived by adding all the factors together. The following table presents a possible scale of factors for the commonest components of a BOOLEAN expression.
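
	component of expression				complexity factor
	simple BOOLEAN value or variable		1.0
	negation ( not )				0.5
	relational operator ( =, /=, <, <=, >, >= )	0.5
	conjunction ( and, or, xor )			1.0

The precise values are indicative rather than definitive; what matters is that they are applied consistently throughout an organisation.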

Thus the BOOLEAN expression controlling a binary chop search, which might take the following form,

 	while (not Located) and (LowerBound < UpperBound) loop

could have its complexity calculated as: 1.5 for the not Located term, 0.5 for the less than ( < ) relation and 1.0 for the and conjunction; this gives a total of 3.0 for the overall complexity of the decision. By computing the average complexity of all the controlling expressions in the subprogram, an expression complexity metric can be obtained. This can be used in conjunction with the cyclomatic complexity metric to give a better indication of a subprogram's complexity.

Many of the controlling expressions used in the source code examples in this book have been deliberately constructed so that they consist of a simple BOOLEAN variable. The complexity of the controlling decision is then in part determined by the supporting operations in the body of the code. An operation complexity metric can be determined by a technique similar to that used for computing the expression complexity above. A suitable extension to the table above for computing this complexity might be as follows.
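
	component of expression				complexity factor
	arithmetic operator ( +, -, *, / )		0.5
	absolute value ( abs )				0.5
	array subscript					0.5
	parenthesised sub-expression			1.0

Again the values are indicative; they are the values assumed in the worked calculations which follow.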

Continuing the binary chop example above, the following supporting expressions might be required.

	a)	MiddleLocation := (UpperBound + LowerBound) / 2;
	b)	Located := List( MiddleLocation) = ItemToFind;
	c)	UpperBound := MiddleLocation - 1;
	d)	LowerBound := MiddleLocation + 1;

The complexity of each of these expressions is as follows.

	a) addition + parenthesis + division		= 2.0
	b) array subscript + simple relation		= 1.0
	c) subtraction					= 0.5
	d) addition					= 0.5

Thus the average operational complexity of these four expressions is 1.0. In addition to the total number of expressions and their average operational complexity, the number of unique operators used in the source code also has an influence upon its complexity. A unique operator is one whose name, or the types of whose operands, differs from those of all other operators used in the code. For the four expressions above there are three unique operators: INTEGER addition ( + ), INTEGER subtraction ( - ) and INTEGER division ( / ).

This approach to determining the complexity of a software component by assigning complexity ratings to the various operations can be continued by deriving a technique for the classification of subprogram calls. The complexity of a subprogram call depends upon the number and modes of its actual parameters. A complexity factor of 1.0 is allowed for an in-only parameter, 1.5 for an out-only parameter and 2.0 for an in-out parameter, with a lower bound of 1.0 for parameterless subprogram calls. These values are weighted by a factor of 1.0 for subprograms declared in commonly used standard packages, 1.25 for subprograms declared in the same source code module and 1.5 for subprograms from less commonly used standard packages or from non-standard packages. Finally, in recognition of the cognitive complexity of recursion, a weighting of 2.0 could be allowed for a recursive call.
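
For example, a call of a hypothetical procedure Swap, taking two in out parameters and declared in the same source code module as the caller, would be rated as ( 2.0 + 2.0 ) * 1.25, giving 5.0, while a call of PUT with a single in-only parameter, declared in the commonly used standard package TEXT_IO, would be rated as 1.0 * 1.0, giving 1.0.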

This approach could also be used for the computation of a complexity factor for subprograms passed as parameters. One factor could be calculated for the subprogram parameter itself, based upon the number and modes of its parameters. A weighting factor could be used for the actual call of the subprogram, in recognition of the cognitive complexity of indirectly calling a subprogram. Likewise, the additional cognitive complexity of instantiating and calling a generic subprogram would also have to be allowed for. The production of complexity rules for these situations will be suggested as an end-of-chapter exercise.

Putting all of these complexity considerations together, an overall complexity measurement of a subprogram can be made. Taking the second version of the SquareRoot function from Chapter 1 of this section, the complexity of each operation can be shown on the source code as follows.

s3.0 e2.5    CheckAssertion( (Number >= 0.0 and Accuracy > 0.0),
                            "Square root pre assertion error");
c1.5        while not CloseEnough loop
e2.0           CurrentGuess := (MaximumValue + MinimumValue) / 2.0;
e0.5           CurrentGuessSquared := CurrentGuess * CurrentGuess;
e1.0           CurrentError := abs( Number - CurrentGuessSquared);
c0.5           if CurrentError <= Accuracy then 
e0.5              CloseEnough := TRUE;
              else
c0.5              if CurrentGuessSquared > Number then
e0.5                 MaximumValue := CurrentGuess;
                 else
e0.5                 MinimumValue := CurrentGuess;
                 end if;
              end if;
           end loop;
s3.0 e5.0       CheckAssertion( (CurrentGuess * CurrentGuess >=
                                             Number - Accuracy) and
                               (CurrentGuess * CurrentGuess <=
                                             Number + Accuracy),
                             "Square root post assertion error");
e0.5        return CurrentGuess;
        end SquareRoot;

Each complexity measure has been labelled with e for expression, c for controlling expression and s for subprogram call. The two subprogram calls have an s value recorded for the call and also an e value as one of the actual parameters is an expression. An s value of 3.0 has been computed as 2.0 for two in-only parameters multiplied by 1.5 as the CheckAssertion procedure is declared in a non-standard package. Examples of the calculation of the other values have been given above. The results of the complexity analysis of this version of the SquareRoot function can be conveniently expressed in a table.
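
One form which such a table might take, with the values derived from the annotated listing above, is:

	number of controlling expressions		3
	average expression complexity			0.83
	maximum expression complexity			1.5
	number of operation expressions			9
	average operational complexity			1.44
	maximum operational complexity			5.0
	number of subprogram calls			2
	average subprogram call complexity		3.0
	maximum subprogram call complexity		3.0
	number of unique operators			11
	number of unique subprograms called		1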

In addition to the average operational and expression complexity values, the maximum values have also been recorded, as it is expected that the most complex parts of a subprogram will present the greatest potential difficulty. In addition to the number of unique operators which have been used, a count of the number of unique subprograms which have been called is also recorded. The list of unique operators used is: floating point >=, > and <=, floating point assignment, floating point +, -, * and /, floating point abs, BOOLEAN assignment and the BOOLEAN and conjunction.

This process of complexity analysis is sufficiently well defined that a number of automated tools exist which will perform the analysis and produce such tables for inclusion in a program's technical documentation. Details of some of the tools which are available are given in Appendix A.