|
Aug 01
2008
|
Invariant data in a program is data that, once initialized, never changes. A new term was coined for it after discovering that the existing jargon const, final, readonly and immutable had different meanings for different programmers and different programming languages (and even had shifting meanings within programming languages!). A term was needed that enabled a precise definition.
The existing terminology confusion led to the discovery that invariance doesn't get much respect in the programming language business (*), it's the Rodney Dangerfield of concepts. C++ only has invariance for literals and top level const. There is, for example, no way to declare a pointer to invariant data. Java has similar problems with its final storage class. There is no way to declare an entire data structure as invariant, or even the leaves of a data structure as invariant. Perl has invariance only for strings.
(Note: pointer to const in C++ does not declare a pointer to invariant data, as another alias to that same data can still change it.)
The D programming language, however, brings invariant to the fore as a transitive, first class, pervasive language feature. Whole data structures can be invariant, as well as leaves and substructures of it. It adds some measure of complexity to the language, and since other languages have such severely limited invariant support, and yet large and successful programs are developed with them, the obvious question is why bother? Invariance doesn't seem to be a property anyone misses.
So here's a list, in no particular order of importance, of what invariance is good for and why it is worth the extra complexity:
- By marking data as invariant, the self-documenting nature of the type is improved. There is no wondering if any part of a function's implementation modifies any part of an invariant argument's data structure. There is no wondering if any subsequent user of an invariant variable changes it or any part of what it refers to.
- Invariant data enables many optimization opportunities. If an optimizer can figure out what an invariant variable is initialized with, it knows that any use of that variable will use the same value, and so generic code can be replaced with special case code for that particular value.
- Data computed using invariant operands is also invariant, and so the result can be cached instead of recomputed.
- Invariant function arguments make pure functions possible, and with pure functions one can use functional programming techniques. Pure functions are inherently thread safe and parallelizable, which is important in the coming ubiquitous world of multiple cores.
- Invariant data can be treated as having the semantics of value types, but with the overhead of a reference type. Strings (discussed in one of my previous columns) are a canonical example of this.
- If a deep copy of a data structure is required, such as for implementing a copy-modify-swap lock-free algorithm, the invariant portions of the structure need not be copied. For example, a symbol table using invariant strings for its leaves need not copy those strings when copying the symbol table.
- Invariant data can be shared among multiple threads without need for locking, synchronization, etc. Data races and sequential consistency problems with mutable data are just not possible with invariant data.
- Invariant data can be automatically partitioned out by the development tools into ROM for use in embedded systems.
- In virtual memory systems, invariant data can be placed into pages marked with the read-only hardware bit, to provide protection against memory corruption and detection of such bugs.
I believe these add up to a compelling case for invariant support, what do you think?
(*) Functional programming (FP) languages inherently require invariance, but these languages have not really entered mainstream use. Perhaps this is because FP languages take a good thing too far and make everything invariant. FP has sometimes crept in the back door, for example C++ templates were discovered to actually be a functional programming language.

written by Frank Hileman, August 01, 2008
Please change the terminology to "immutable" if possible. It is confusing. "Invariant" has a well established meaning.
written by Sean Kelly, August 02, 2008
written by Joel Rosdahl, August 02, 2008
Walter: What problem do you see with "immutable"?
written by Sean Kelly, August 03, 2008
In C++ containers, for example, it's not uncommon to need both const and non-const versions of class methods that vary only by return type. In D 2.0, we not only have const methods but invariant methods as well, thus further increasing the maintenance cost of such classes.
Unlike C++ however, it os both possible and even desirable to overload a function for const and invariant parameters in order to achieve any optimization benefit from the latter when possible. If I have a collection of algorithms that operate on arrays, for example, it would be natural for me to declare these array parameters as const to provide a contract to the user. However, if the user passes actual invariant data then I would like the compiler to optimize my algorithm appropriately, thus requiring additional permutations of the function for each const/invariant parameter. This isn't an issue for template functions if the template system overloads on const-ness, but traditional functions are left with little recourse but explicit duplication, and not only two duplicates but potentially 2N (N being the number of const parameters for the function). As a library programmer who will have to do this by necessity, I am very much concerned that the potential, theoretical gain in optimization and automatic parallelization will not at all be worth the maintenance cost.
written by Andrew Koenig, August 03, 2008
So, because strings are immutable, you can use them as keys. Tuples are immutable if all their elements are, so you can use a tuple of strings (or mixed strings and numbers) as keys as well.
However, sets are mutable -- otherwise, how can you add an element to one. That means that you can't use a set as a key, which means that you can't have a set of sets. The solution: A new data type named frozenset, which is like a set but is immutable. So if you want a set of sets, you have to convert each set to a frozenset (which copies the data structure), and then you can use it as a set element.
This somewhat clumsy circumlocution appears necessary, but perhaps it suggests that functional and imperative paradigms don't always mix all that well.









First, the term is already widely used for a different concept, namely the notion of a condition that will always be true at a particular point in the code. An invariant is an important intellectual tool in correctness proofs.
Second, there is another term that is already in widespread use in other programming languages for this concept, namely "immutable." For example, strings in Python are immutable in the sense that there is no operation you can perform that will change the value of a string object. Operations that appear on the surface to be doing so, such as +=, actually create a new object and rebind the relevant variable to that object.