Names Tutorial N: Summary

Top  Previous  Next

Opinion is more divided on whether it's a good idea for formulas to identify their inputs by names rather than coordinates in formulas than by any other topic in the field of developing spreadsheet models.

Arguments for using names

Names make a formula clearer and easier to read.  The intent of =Revenue-Costs is clearer than =P999-P807.

This is particularly so when formula refer to cells on other workbooks.  The name formula would still be =Revenue-Costs; the coordinate formula would be ='Profit and loss'!P999-'Profit and loss'!P807.  In longer formula, the clutter of worksheet names is particularly impenetrable.

A naming convention that is agreed by a team and used consistently within it greatly aids different team members from picking up and understanding each others' models.

Many methodologies for developing spreadsheets focus on laying out the spreadsheet so that related items are physically near to each other.  In some cases, this involves extensive repetition, to stage values for input to and output from sub-parts of a model.  Referring to quantities using meaningful names allows one to worry less about where quantities happen to lie physically on a spreadsheet.  What matters is whether the terms are semantically close, and whether related quantities can be found near each other in the alphabetic listings that Excel offers with the various commands for manipulating names.

Names facilitate code reuse.  Model A may contain a complex tax calculation in an obscure jurisdiction.  Model B needs to be fitted with the same calculation.  If both models are expressed in coordinates, there is no chance that the fragment of model A that contains the calculation will have references that are consistent with where the relevant inputs fall in model B.  If the two models have been developed using names, and those names follow a convention agreed within the team, then Model A's calculation may be copied into Model B and made to work quickly with little modification, without any concern for physical layout.

Relying on Excel to match rows and columns relieves the analyst of the need to do the same and eliminates completely a common cause of error, making reference to a cell which is in a different column from the one intended.  It also forces a model developer to pursue consistent conventions about what columns are for in different parts, or on different worksheets, of a model.

Arguments against using names

Names are not widely understood by ordinary spreadsheet users.  Using them makes a spreadsheet less accessible, and less safely modifiable, by such users.

The formula =P999-P807 refers unambiguously to the cells mentioned.  Determining quite which cells the formula =Revenue-Costs refers to needs some work, firstly, to check that those names are associated with the intended cells, and secondly to fathom out the effect of any row or column matching that may apply.

What happens when a formula expressed using names is cut or copied to a new location is not the same as what happens when a formula expressed in coordinate notation is so moved.  The latter is widely understood; the former much less so, and is a source of error in the hands of recipients of a spreadsheet untrained in the behavior of Excel names.

This last point is particularly an issue when a formula that uses a name is cut or copied from one workbook to another.  Excel will make new names in the destination workbook in a process over which the user has no control and may not be aware of.

Two spreadsheets can have identical formulas but act differently because the names are differently defined.  This kind of inconsistency can be hard to detect.

Using names reduces the opportunity to transfer a model from Excel into other spreadsheet packages.  Not all of them can make sense of names, though increasingly many can.

Independent research

The one bit of published research on the use of names was presented to the Eusprig conference in Paris in 2009. Ruth McKeever, Kevin McDaid & Brian Bishop of Dundalk Institute of Technology presented Analysis of the Impact of Named Ranges on the Debugging Performance of Novice Users. It found that a group of novice spreadsheet users found a lower proportion of the errors deliberately seeded into a spreadsheet that made use of names than did a similar group presented with errors and spreadsheets that were identical except that they used coordinate references throughout and no names.

Areas that would be interesting for future research are

whether the same detection rates would apply in groups that had been made fluent in the use of names before the exercise started

the rate of faults over the whole life cycle of a spreadsheets, ie during the development process as well as a review stage, rather than a review stage alone.