Software Metrics

Note: This is a copy of a page I wrote for the software engineering course I taught at the University of Illinois Urbana-Champaign. I am reposting it here on my blog in the hope that it will be found useful by others in the future.

What are software metrics?

  • Quantitative measurements distilled from data
  • Distilled by measuring software development processes and actual source code
  • Highlight specific nodes of code that need work and support generalizations about your code overall
  • “You can’t control what you can’t measure” -Tom DeMarco

Limitations of metrics

  • Software metrics are intended to help programmers control and monitor software production, but…
  • It’s difficult to determine “how much” software there is in a given program
  • Can give a skewed impression of software, especially when calculated early in the software development process
  • Can be difficult or complex to calculate, especially as the volume of code grows

Examples of metrics

  • Lines of code
  • Number of classes & interfaces
  • Code to comment ratio
  • Cyclomatic complexity
  • Code coverage
  • Bugs to lines of code ratio
  • Cohesion
  • Coupling
  • Failed tests per build
  • Version control commits per day
  • Lines of code per commit

Terminology

  • Node
    • A block of source code, usually a single line, a function/method, a class, or a package. A node can have multiple children, but only one direct parent
  • Program
    • A graph of all of the nodes that comprise the source code
  • Flow graph
    • A directed graph of all of the single-line nodes, connected by edges wherever the flow of execution might proceed from one node to another

Some Specifics

Lines of Code

  • A key size attribute of software
  • Can be a good measure of software volatility, especially when tracked over the entire development process
  • Can be used as the basis for other metrics, such as the bugs:code and tests:code ratios

Code to comment ratio

  • We’ve already seen how important commenting is to developing quality code
  • This metric puts a numerical value on the amount of inline documentation in a piece of software
  • Warns developers when code needs more documentation
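
As a rough illustration, the ratio could be computed by classifying each line of a source file. This is only a minimal sketch, not how any particular metrics tool works: the CommentRatio class and its methods are hypothetical names, and it assumes that only whole-line // comments count as comments (block comments and trailing comments are ignored).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; real tools parse the language properly.
class CommentRatio {
    // Returns {codeLines, commentLines}; blank lines are skipped.
    static int[] count(List<String> sourceLines) {
        int codeLines = 0;
        int commentLines = 0;
        for (String line : sourceLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Assumption: only whole-line "//" comments are counted as comments.
            if (trimmed.startsWith("//")) {
                commentLines++;
            } else {
                codeLines++;
            }
        }
        return new int[] {codeLines, commentLines};
    }

    // Code lines per comment line; infinite when there are no comments at all.
    static double codeToCommentRatio(List<String> sourceLines) {
        int[] counts = count(sourceLines);
        if (counts[1] == 0) {
            return Double.POSITIVE_INFINITY;
        }
        return (double) counts[0] / counts[1];
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "// Adds two numbers.",
            "int add(int a, int b) {",
            "    return a + b;",
            "}",
            "");
        System.out.println(codeToCommentRatio(source)); // prints 3.0
    }
}
```

Tracking this number per file over time is usually more informative than any single reading.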

Cyclomatic Complexity

  • Directly measures the number of linearly independent paths through source code
  • CC = E – N + 2p
    • where E = the number of edges of the program’s flow graph
    • N = the number of nodes of the graph
    • p = the number of connected components of the graph
  • If code contains no decisions, then CC = 1; if it contains a single binary if statement, CC = 2; and so on
  • Upper bound on the number of test cases required to achieve complete branch coverage of a given piece of code
  • Commonly used thresholds:
    Cyclomatic Complexity | Risk Evaluation
    1-10 | A simple program without much risk
    11-20 | More complex, moderate risk
    21-50 | Complex and high risk
    > 50 | Practically untestable, very high risk
  • Lower CC contributes to a program’s understandability and indicates that it is more easily modifiable
  • Generally, the greater CC becomes, the more complex and unmaintainable the code becomes
  • Greater cyclomatic complexity indicates a greater learning curve for new developers
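
The calculation can be sketched in Java. This is a minimal illustration that assumes McCabe's formulation CC = E − N + 2p for a flow graph with one entry and one exit per component. The FlowGraph class and its method names are hypothetical, and nodes are simply numbered from 0.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of any metrics tool.
class FlowGraph {
    private final int nodeCount;
    private int edgeCount = 0;
    private final List<List<Integer>> neighbors = new ArrayList<>();

    FlowGraph(int nodeCount) {
        this.nodeCount = nodeCount;
        for (int i = 0; i < nodeCount; i++) {
            neighbors.add(new ArrayList<>());
        }
    }

    // Adds a directed edge. Adjacency is stored in both directions
    // because connected components ignore edge direction.
    void addEdge(int from, int to) {
        edgeCount++;
        neighbors.get(from).add(to);
        neighbors.get(to).add(from);
    }

    // Counts connected components with an iterative depth-first search.
    int connectedComponents() {
        boolean[] seen = new boolean[nodeCount];
        int components = 0;
        for (int start = 0; start < nodeCount; start++) {
            if (seen[start]) {
                continue;
            }
            components++;
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(start);
            seen[start] = true;
            while (!stack.isEmpty()) {
                int node = stack.pop();
                for (int next : neighbors.get(node)) {
                    if (!seen[next]) {
                        seen[next] = true;
                        stack.push(next);
                    }
                }
            }
        }
        return components;
    }

    // CC = E - N + 2p
    int cyclomaticComplexity() {
        return edgeCount - nodeCount + 2 * connectedComponents();
    }
}

class CyclomaticComplexityDemo {
    public static void main(String[] args) {
        // Straight-line code: three statements executed in order.
        FlowGraph straight = new FlowGraph(3);
        straight.addEdge(0, 1);
        straight.addEdge(1, 2);
        System.out.println(straight.cyclomaticComplexity()); // prints 1

        // One if statement: node 0 is the condition, node 1 the body,
        // node 2 the code that follows.
        FlowGraph singleIf = new FlowGraph(3);
        singleIf.addEdge(0, 1);
        singleIf.addEdge(1, 2);
        singleIf.addEdge(0, 2);
        System.out.println(singleIf.cyclomaticComplexity()); // prints 2
    }
}
```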

Code Coverage

  • A metric that describes the extent to which the source code of a program has been tested
  • Different degrees of code coverage:
    • Function coverage – Has each function in the program been executed?
    • Statement coverage – Has each line of the source code been executed?
    • Condition coverage – Has each evaluation point (such as a true/false decision) taken each of its possible outcomes?
    • Path coverage – Has every possible route through a given part of the code been executed?
    • Entry/exit coverage – Has every possible call and return of the function been executed?
  • Some of the above are connected: for example, full path coverage implies full statement and entry/exit coverage
  • Code Coverage and Unit Tests
    • Indicator of how well your tests actually test your code
    • Lets you know if you have enough tests in place
    • Allows you to maintain the quality of your test suite over the lifetime of the project
  • How Code Coverage works (in Java)
    1. Compile the source code
    2. Instrument the compiled class files, excluding the compiled test cases. This adds the information needed to collect coverage data at runtime
    3. Run the tests to collect runtime data. As the tests execute, the code added during instrumentation writes exact coverage data to disk
    4. Merge the runtime data into an auditable report
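
To make the idea of instrumentation concrete, here is a minimal sketch in which the probes are inserted by hand. Real coverage tools rewrite the compiled class files automatically and write their data to disk; this example only keeps the hits in memory. CoverageRecorder, Grader, and the probe names are all hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical recorder standing in for what an instrumenting tool injects.
class CoverageRecorder {
    private static final Set<String> hits = new TreeSet<>();

    static void hit(String probeId) {
        hits.add(probeId);
    }

    static void reset() {
        hits.clear();
    }

    // Percentage of the known probes that have been reached so far.
    static double percentCovered(int totalProbes) {
        return 100.0 * hits.size() / totalProbes;
    }
}

// Code under test, with one hand-written probe per block of statements.
class Grader {
    static final int PROBE_COUNT = 3;

    static String grade(int score) {
        CoverageRecorder.hit("grade:entry");
        if (score >= 60) {
            CoverageRecorder.hit("grade:pass");
            return "pass";
        }
        CoverageRecorder.hit("grade:fail");
        return "fail";
    }
}

class CoverageDemo {
    public static void main(String[] args) {
        CoverageRecorder.reset();
        Grader.grade(75); // reaches the entry and pass probes only
        System.out.println(CoverageRecorder.percentCovered(Grader.PROBE_COUNT));
        Grader.grade(10); // now the fail probe is reached as well
        System.out.println(CoverageRecorder.percentCovered(Grader.PROBE_COUNT)); // prints 100.0
    }
}
```

A test suite that only ever passes scores of 60 or more would leave the fail probe unreached, which is exactly the kind of gap a coverage report exposes.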

Cohesion

  • Cohesion is a measure of how strongly related the various responsibilities of a software module are
  • A node is usually deemed to have “high cohesion” or “low cohesion”
  • High cohesion tends to go along with desirable traits in code, including reusability and readability
  • Disadvantages of low cohesion:
    • Increased difficulty in understanding nodes of source code
    • Increased difficulty in maintaining source code, because a change in one node will require changes in many other nodes
    • Increased difficulty in reusing a node of source code, since most other nodes will not need the unrelated mix of functionality that a node with low cohesion provides

Coupling

  • Coupling is the extent to which a node relies on the other nodes in the source code
  • Nodes can be called either “loosely/weakly coupled” or “strongly/tightly coupled”
  • Loose coupling often goes hand in hand with high cohesion
  • Loose coupling refers to a relationship between nodes such that one node interacts with the other nodes via a stable interface and does not need to be concerned with the internal implementation of the other nodes
  • Types of coupling:
    • Content coupling (tightest) – When one node modifies or relies on the internal workings of another node
    • Common coupling – When nodes share the same global data
    • External coupling – When nodes rely on an external data format
    • Data coupling – When nodes share data through parameters
    • Message coupling (loosest) – When nodes are not dependent on each other and communicate only through a public interface
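
The loose end of this scale might look like the following sketch, in which a client depends only on a public interface and never on the internals behind it. The Inventory, ArrayInventory, and OrderService names are hypothetical and exist only for this example.

```java
// Public interface: the only thing callers are allowed to depend on.
interface Inventory {
    boolean reserve(int itemId);
}

// One possible implementation; its array is an internal detail.
class ArrayInventory implements Inventory {
    private final int[] stock;

    ArrayInventory(int[] initialStock) {
        this.stock = initialStock.clone();
    }

    @Override
    public boolean reserve(int itemId) {
        if (itemId < 0 || itemId >= stock.length || stock[itemId] == 0) {
            return false;
        }
        stock[itemId]--;
        return true;
    }
}

// Loosely coupled client: it never touches the stock array directly.
class OrderService {
    private final Inventory inventory;

    OrderService(Inventory inventory) {
        this.inventory = inventory;
    }

    boolean placeOrder(int itemId) {
        return inventory.reserve(itemId);
    }
}

class CouplingDemo {
    public static void main(String[] args) {
        OrderService orders = new OrderService(new ArrayInventory(new int[] {1}));
        System.out.println(orders.placeOrder(0)); // prints true
        System.out.println(orders.placeOrder(0)); // prints false, the item is sold out
    }
}
```

A content-coupled version would have OrderService read and decrement the stock array itself, so any change to how stock is stored would break it.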

Methods for decreasing coupling and increasing cohesion

  • Transmit messages between nodes in a flexible format (such as XML)
  • Use public interfaces to communicate messages between nodes where a file format is not required
  • Separate code into nodes that perform logical chunks of work (example: MVC pattern)
  • Write code such that the implementation of a given node of code is independent from how it is used by other nodes

Free tools for auditing software

Modular Programming

Introduction

According to McConnell, a module is either (1) a collection of data along with routines, or functions, that use or manipulate that data or (2) a collection of routines, or services, that operate on any external data given to them. Modular programming aims to maintain this type of structure in code by stipulating that all classes or routines be as independent as possible (de-coupling) without inhibiting their ability to work together (cohesion). In other words, a truly modular program is one in which cohesion is maximized and coupling is minimized. Why should one code in this way? Experimental studies have shown modular programs to be more maintainable and easier to debug. Since this course involves writing a fair amount of code, we advocate the modular programming approach. We describe some good rules of thumb in the sections that follow.

Cohesion vs Coupling

Cohesion and coupling are two important aspects of modular programming that need to be well defined before one starts writing modules. A module is cohesive if it offers services that are all related to each other, particularly in terms of high-level functionality. For example, suppose a module contains the functions enqueue(), dequeue(), makePhoneCall(), writeToDisk(), and readTextFile(). Perhaps the type of data used is uniform for this program, but many of these functions have nothing to do with one another, so the module is not cohesive. The correct way to make this a modular program is to group the functions that are related to each other into separate modules. A supermodule can then be created to connect these submodules.

A module is decoupled and independent if other modules can interact with it easily, without resorting to any additional “hacks”. De-coupling requires one to understand which parts of the module are independent of each other. You can gauge independence by asking, “How much does this function or subclass affect this other function or subclass?” If the answer is “a lot”, then one should keep the code as it is, making a note about how these two components are highly related. If the answer is “very little”, which is typically the case for programmers new to modularity, then it is best to decouple the two components into separate submodules.
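
As a minimal sketch of that regrouping, the queue functions and the file functions from the example can live in separate modules, with a supermodule connecting them. The class names JobQueue, TextFileStore, and JobArchiver are hypothetical, and an in-memory buffer stands in for a real file so that the example stays self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Cohesive module: every method is about queueing jobs.
class JobQueue {
    private final Deque<String> jobs = new ArrayDeque<>();

    void enqueue(String job) {
        jobs.addLast(job);
    }

    // Returns null when the queue is empty.
    String dequeue() {
        return jobs.pollFirst();
    }

    boolean isEmpty() {
        return jobs.isEmpty();
    }
}

// Cohesive module: every method is about storing text.
// Assumption: an in-memory buffer stands in for a real file on disk.
class TextFileStore {
    private final StringBuilder contents = new StringBuilder();

    void writeToDisk(String line) {
        contents.append(line).append('\n');
    }

    String readTextFile() {
        return contents.toString();
    }
}

// Supermodule: connects the two submodules through their interfaces only.
class JobArchiver {
    private final JobQueue queue;
    private final TextFileStore store;

    JobArchiver(JobQueue queue, TextFileStore store) {
        this.queue = queue;
        this.store = store;
    }

    // Moves every queued job into the store and reports how many were moved.
    int archiveAll() {
        int moved = 0;
        while (!queue.isEmpty()) {
            store.writeToDisk(queue.dequeue());
            moved++;
        }
        return moved;
    }
}
```

A makePhoneCall() function would go into a third module of its own, since it is related to neither queueing nor storage.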

Information Hiding

As the term suggests, information hiding aims to keep others from seeing the private areas of one’s program. Users of one’s code need to know only the interface of the code, not the implementation. Thus, one would expose only the code’s interface while hiding the private implementation details. According to McConnell, such private areas are typically:

  1. Areas likely to change frequently
  2. Complicated data within a module
  3. Intricate logic within routines of the module
  4. Operations at the programming-language level

As an example, suppose you are writing a program for your company that involves data about the number of employees. The company may not want to tell outsiders or competitors how large it is, since doing so might hurt it financially. Instead of making the company size publicly viewable, you would provide an interface that gives only public information (gender, age, and ethnic demographics as percentages, but not the actual numbers). What is being hidden is how those percentages are calculated, since the company size is needed to calculate them.
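
One way this could look in Java is sketched below. The CompanyDemographics class, its fields, and the under-30 statistic are hypothetical choices made for illustration. The point is that the raw counts are private and only a percentage is exposed.

```java
// Hypothetical class: the head counts are hidden, only percentages are public.
class CompanyDemographics {
    private final int totalEmployees;   // hidden from callers
    private final int employeesUnder30; // hidden from callers

    CompanyDemographics(int totalEmployees, int employeesUnder30) {
        if (totalEmployees <= 0 || employeesUnder30 < 0 || employeesUnder30 > totalEmployees) {
            throw new IllegalArgumentException("inconsistent employee counts");
        }
        this.totalEmployees = totalEmployees;
        this.employeesUnder30 = employeesUnder30;
    }

    // Public interface: callers see the percentage, not how it was computed.
    public double percentUnder30() {
        return 100.0 * employeesUnder30 / totalEmployees;
    }
}

class InformationHidingDemo {
    public static void main(String[] args) {
        CompanyDemographics demographics = new CompanyDemographics(200, 50);
        System.out.println(demographics.percentUnder30()); // prints 25.0
    }
}
```

Because no getter returns totalEmployees, the way the percentage is stored or computed can change later without affecting any caller.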

Ensuring Modularity

To make sure that you are following modular design, here are some principles to adhere to:

  1. A module should address one central functionality or goal.
  2. If a module is built from other smaller components, it should be easy to break the module down into those components.
  3. Implementation details of the module should be hidden from external modules. It should be seen as a black box.
  4. The interface of the module should allow for easy access to its services without needing to set or hardcode any other information.
  5. The set of services that the module provides should be related, in terms of high-level functionality, to each other.

Extending Modules

It is important to realize that once you have a fully functional module, you can use this module as part of a larger module containing submodules as its components. This idea defines the recursive nature of modular programming, in that you can always follow this principle to build larger and larger programs. The “super-module” should follow all the principles discussed above, maintaining maximum cohesion and minimal coupling with other such modules.

Code Smells



What is a code smell?

According to Wikipedia, a code smell is “any symptom in the source code of a program that possibly indicates a deeper problem”. Code smells tend to be patterns that show up commonly in source code and that, when fixed, often lead to cleaner, more reliable, and more maintainable code. The following is an incomplete list of common code smells with examples and suggested fixes.

Duplicate Code

What is it: When segments of source code are repeated throughout the program.

How to fix it:

Type | Solution
Duplicate methods in subclasses | Move code to superclass, create a superclass if needed
Duplicate expressions in superclass | Extract duplicates into their own methods
Duplicate expressions in different classes | Extract duplicates to a common component
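
The first fix, moving duplicated methods into a superclass, might look like the following sketch. The Shape, Rectangle, and Square classes are hypothetical. The assumption is that describe() was originally copied into each subclass.

```java
// Superclass created to hold the method that used to be duplicated.
abstract class Shape {
    private final String name;

    Shape(String name) {
        this.name = name;
    }

    abstract int area();

    // Previously copy-pasted into every subclass; now defined once.
    String describe() {
        return name + " with area " + area();
    }
}

class Rectangle extends Shape {
    private final int width;
    private final int height;

    Rectangle(int width, int height) {
        super("Rectangle");
        this.width = width;
        this.height = height;
    }

    @Override
    int area() {
        return width * height;
    }
}

class Square extends Shape {
    private final int side;

    Square(int side) {
        super("Square");
        this.side = side;
    }

    @Override
    int area() {
        return side * side;
    }
}

class DuplicateCodeDemo {
    public static void main(String[] args) {
        System.out.println(new Rectangle(2, 3).describe()); // prints Rectangle with area 6
        System.out.println(new Square(4).describe());       // prints Square with area 16
    }
}
```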

Long Methods/Functions

What is it: When methods or functions are excessively long

How to fix it:

Type | Solution
Code that will not fit on a page | Extract functions from long fragments
Can’t think of the function all at once | Extract into several smaller functions, add comments

Large Classes

What is it: Any class with more than 6-8 functions and 12-14 variables

How to fix it: Split into component classes, create superclasses

Long Parameter List

What is it: When a function or method has too many parameters (generally more than 3-4)

How to fix it: Introduce a parameter object in place of the many parameters to a function. This is only worth doing if several functions take the same parameters. Alternatively, use a multipurpose dynamic parameter object (think of the Java Properties object).
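
A parameter object could be sketched as follows. Bounds and Layout are hypothetical names. The assumption is that several functions previously each took the same four integers (x, y, width, height).

```java
// Parameter object bundling four values that always travel together.
class Bounds {
    final int x;
    final int y;
    final int width;
    final int height;

    Bounds(int x, int y, int width, int height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

class Layout {
    // Before: area(int x, int y, int width, int height)
    static int area(Bounds bounds) {
        return bounds.width * bounds.height;
    }

    // Before: contains(int x, int y, int width, int height, int px, int py)
    static boolean contains(Bounds bounds, int px, int py) {
        return px >= bounds.x && px < bounds.x + bounds.width
            && py >= bounds.y && py < bounds.y + bounds.height;
    }
}

class ParameterObjectDemo {
    public static void main(String[] args) {
        Bounds bounds = new Bounds(0, 0, 4, 3);
        System.out.println(Layout.area(bounds));           // prints 12
        System.out.println(Layout.contains(bounds, 3, 2)); // prints true
    }
}
```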

Message Chain

What is it: When you call several functions successively such as:

person.getAddress().getZip();

How to fix it: Replace commonly called chains with helper functions such as:

person.getZip();
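
A helper like this can simply delegate to the object further down the chain. The following is a minimal sketch. The Person and Address classes are assumptions based on the calls above, and the zip code is stored as a String.

```java
class Address {
    private final String zip;

    Address(String zip) {
        this.zip = zip;
    }

    String getZip() {
        return zip;
    }
}

class Person {
    private final Address address;

    Person(Address address) {
        this.address = address;
    }

    Address getAddress() {
        return address;
    }

    // Helper that hides the chain: callers no longer need to know
    // that the zip code lives inside an Address.
    String getZip() {
        return address.getZip();
    }
}

class MessageChainDemo {
    public static void main(String[] args) {
        Person person = new Person(new Address("61801"));
        System.out.println(person.getZip()); // prints 61801
    }
}
```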

Feature Envy

What is it: When code wants to be in a different class, such as:

csDept.getFaculty().add(newProfessor);
csDept.setFacultyCount(csDept.getFacultyCount()+1);

How to fix it: Create a composite function that handles all necessary actions, such as:

csDept.addFaculty(newProfessor);

This single call handles both of the statements above.
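
Inside the class, such a composite function might look like the sketch below. The Department class is an assumption based on the csDept calls above, and faculty members are represented by their names as Strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

class Department {
    private final List<String> faculty = new ArrayList<>();
    private int facultyCount = 0;

    // Composite operation: the list and the count change together,
    // so callers cannot update one and forget the other.
    void addFaculty(String professor) {
        faculty.add(professor);
        facultyCount++;
    }

    int getFacultyCount() {
        return facultyCount;
    }

    // Returns a copy so that callers cannot modify the internal list.
    List<String> getFaculty() {
        return new ArrayList<>(faculty);
    }
}

class FeatureEnvyDemo {
    public static void main(String[] args) {
        Department csDept = new Department();
        csDept.addFaculty("newProfessor");
        System.out.println(csDept.getFacultyCount()); // prints 1
    }
}
```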

Switch statements, nested ifs

What is it: When switch statements are used where they are unnecessary, or when if statements are deeply nested (more than 2 deep)

How to fix it:

  • Replace with a method call
  • Make subclasses for each case
  • Try to keep nesting to at most two levels
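
The first two fixes can be combined: each case of the switch becomes a subclass, and the switch itself becomes a single method call. This is a minimal sketch with hypothetical account types and made-up fee values.

```java
// Before (not shown): a switch over an account-type code chose the fee.
abstract class Account {
    abstract int monthlyFeeCents();
}

class StudentAccount extends Account {
    @Override
    int monthlyFeeCents() {
        return 0;
    }
}

class StandardAccount extends Account {
    @Override
    int monthlyFeeCents() {
        return 500;
    }
}

class PremiumAccount extends Account {
    @Override
    int monthlyFeeCents() {
        return 1500;
    }
}

class Billing {
    // The method call replaces the switch; adding a new account type
    // means adding a subclass, not editing this loop.
    static int totalFeesCents(Account[] accounts) {
        int total = 0;
        for (Account account : accounts) {
            total += account.monthlyFeeCents();
        }
        return total;
    }
}

class SwitchDemo {
    public static void main(String[] args) {
        Account[] accounts = {new StudentAccount(), new StandardAccount(), new PremiumAccount()};
        System.out.println(Billing.totalFeesCents(accounts)); // prints 2000
    }
}
```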

Temporary Fields

What is it: When instance variables are only used for part of the lifetime of an object

How to fix it: Change those instance variables into local variables where they are used, or move them to another object that suits them better
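
For instance, a running sum that was once stored as an instance variable can become a local variable of the one method that needs it. The Statistics class below is a hypothetical example.

```java
class Statistics {
    // Before the fix, the running sum was an instance field that only held
    // a meaningful value while average() was executing.
    static double average(int[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long sum = 0; // local: only needed during this call
        for (int value : values) {
            sum += value;
        }
        return (double) sum / values.length;
    }
}

class TemporaryFieldDemo {
    public static void main(String[] args) {
        System.out.println(Statistics.average(new int[] {2, 4, 6})); // prints 4.0
    }
}
```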

Refused Bequest

What is it: When A is a subclass of B and A overrides methods of B but does not use some of the methods and fields it inherits from B

How to fix it: Give A and B a common superclass and move what A and B both use into it
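
As a sketch of this fix, suppose Penguin used to extend Sparrow and had to refuse the inherited fly() method. Giving both a common superclass, here the hypothetical Bird, leaves each class with only the behavior it actually uses.

```java
// Common superclass holding only what both kinds of bird use.
abstract class Bird {
    private final String name;

    Bird(String name) {
        this.name = name;
    }

    String eat() {
        return name + " eats";
    }
}

// Keeps the behavior that only it needs.
class Sparrow extends Bird {
    Sparrow() {
        super("Sparrow");
    }

    String fly() {
        return "Sparrow flies";
    }
}

// No longer inherits a fly() method that it would have to refuse.
class Penguin extends Bird {
    Penguin() {
        super("Penguin");
    }

    String swim() {
        return "Penguin swims";
    }
}

class RefusedBequestDemo {
    public static void main(String[] args) {
        System.out.println(new Penguin().eat());  // prints Penguin eats
        System.out.println(new Sparrow().fly());  // prints Sparrow flies
    }
}
```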

Too Many Bugs

What is it: When functionality of your work suffers due to too many bugs in the code

How to fix it: Unit test to find bugs, and fuzz your application with a wide variety of inputs to exercise as many cases as possible

Too hard to understand

What is it: When your source code is not easily understood by someone reading it for the first time

How to fix it:

  • Use descriptive variable names (example: rowIndex instead of i in for loops)
  • Use many meaningful comments to guide reader through the code

Too hard to change

What is it: When your code becomes too hard to change when one of its specifications changes. Examples include:

  • a change in input format
  • a change in output format
  • a change in internal data structures
  • a change in communications format/protocol

How to fix it: Modularize your code – make more classes that each expose an interface but hide their internal algorithms and data structures. Some example modules that you might include in a project:

  • a module that only handles input
  • a module that only handles output
  • modules that each perform a piece of the program’s logic

Using a decomposition similar to this, if you changed any of the above specifications, you would only have to change one module of your code, rather than the entire program source.
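
A decomposition along these lines might be sketched as follows. All of the names are hypothetical, and the "program" simply sums a list of numbers so that the three modules stay small.

```java
import java.util.ArrayList;
import java.util.List;

// Module that only handles input.
interface InputReader {
    List<Integer> read(String raw);
}

// Module that only handles output.
interface OutputWriter {
    String write(int result);
}

class CsvInputReader implements InputReader {
    @Override
    public List<Integer> read(String raw) {
        List<Integer> values = new ArrayList<>();
        for (String field : raw.split(",")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {
                values.add(Integer.parseInt(trimmed));
            }
        }
        return values;
    }
}

class PlainTextWriter implements OutputWriter {
    @Override
    public String write(int result) {
        return "total=" + result;
    }
}

// Module that performs the program's logic; it knows nothing about formats.
class SumProgram {
    private final InputReader reader;
    private final OutputWriter writer;

    SumProgram(InputReader reader, OutputWriter writer) {
        this.reader = reader;
        this.writer = writer;
    }

    String run(String raw) {
        int total = 0;
        for (int value : reader.read(raw)) {
            total += value;
        }
        return writer.write(total);
    }
}

class ModularDemo {
    public static void main(String[] args) {
        SumProgram program = new SumProgram(new CsvInputReader(), new PlainTextWriter());
        System.out.println(program.run("1, 2, 3")); // prints total=6
    }
}
```

If the input format changed from comma-separated values to something else, only a new InputReader implementation would be needed.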