“Make illegal states unrepresentable” is a mantra commonly used in functional modeling. The mantra was coined by Yaron Minsky in a 2010 guest lecture at Harvard. The idea is to model data representations in such a way that nonsensical (“illegal”) values (“states”) are inexpressible (“unrepresentable”). By adhering to "Make illegal states unrepresentable" the resulting models are often simpler, expressing the desired intent more directly. This leads to systems that are more robust (often correct by construction) and more loosely coupled.
Objectives: Simplicity, Maintainability, Correctness
Related patterns: Static types, Parse, don’t validate, Smart Constructor
“Make illegal states unrepresentable” was first introduced by Ron Minsky in the context of Ocaml, a statically typed language with explicit support for algebraic data types. The original example illustrates the mantra by turning a flat product type with many nullable fields into a sum-of-products. The mantra itself is not limited to statically typed languages and there are other techniques to implement the mantra.
Minsky's original example starts off with the following OCaml code, which illustrates some subtle data modelling issues.
type connection_state =
| Connecting
| Connected
| Disconnected
type connection_info = {
state: connection_state;
server: Inet_addr.t;
last_ping_time: Time.t option;
last_ping_id: int option;
session_id: string option;
when_initiated: Time.t option;
when_disconnected: Time.t option;
}
With this representation some illegal states can be represented. For example, we can come up with a Connecting
state that has a when_disconnected
value.
let illegal_1 = {
state = Connecting;
when_disconnected = Some some_time;
...
}
Or we can come up with a Connected
state that has a last_ping_time
but no last_ping_id
.
let illegal_2 = {
state = Connected;
last_ping_time = Some some_time;
last_ping_id = None;
...
}
In this representation there are some invariants that aren't captured by the model, so they have to be adhered to by users of the model, which introduces implicit coupling. The invariants have to be described outside of the programming language facilities, for instance in a comment. These descriptions can get out of sync with the code. Since the invariants aren't machine checked they are a common source of bugs.
An improved model that inherently expresses relevant invariants looks like this:
type connecting = { when_initiated: Time.t; }
type connected = { last_ping : (Time.t * int) option;
session_id: string; }
type disconnected = { when_disconnected: Time.t; }
type connection_state =
| Connecting of connecting
| Connected of connected
| Disconnected of disconnected
type connection_info = {
state : connection_state;
server: Inet_addr.t;
}
Now connection_state
is a proper sum-of-products. The fields that only make sense for one of the three states are now part of the corresponding type only. The field server
makes sense for all states and is therefore part of the larger connection_info
type. Additionally, the requirement that last_ping_time
and last_ping_id
are either both Some ...
or both None
is now expressed by last_ping
being a single option of a tuple.
Dynamically typed languages such as Clojure do not explicitly mention algebraic data types as a language feature. Still, the above technique of turning a flat product into a sum-of-products can be used in dynamically typed languages as well. We use Clojure (+ active-clojure records) to illustrate this point. The flawed representation uses a flat record as in the Ocaml code above:
(define-record-type ConnectionInfo
make-connection-info
connection-info?
[state connection-info-state ;; one of :connecting, :connected, :disconnected
server connection-info-server ;; non-nullable
last-ping-time connection-info-last-ping-time ;; nullable
last-ping-id connection-info-last-ping-id ;; nullable
session-id connection-info-session-id ;; nullable
when-initiated connection-info-when-initiated ;; nullable
when-disconnected connection-info-when-disconnected ;; nullable
])
This, again, allows for nonsensical values to be representable:
(def illegal-1
(make-connection-info
;; connection-state
:connecting
...
;; when-disconnected
some-date))
(def illegal-2
(make-connection-info
;; connection-state
:connected
...
;; last-ping-time
some-date
;; last-ping-id
nil
...
))
A better representation looks quite similar to the improved Ocaml code above:
(define-record-type Connecting
^:private mk-connecting
connecting?
[when connecting-when])
(defn make-connecting [when]
(assert (date? when))
(mk-connecting when))
(define-record-type Connected
^:private mk-connected
connected?
[last-ping connected-last-ping
session-id connected-session-id])
(define-record-type Ping
make-ping
ping?
[when ping-when
id ping-id])
(defn make-connected [last-ping session-id]
(assert (ping? last-ping))
(assert (session-id? session-id))
(mk-connected last-ping session-id))
(define-record-type Disconnected
^:private mk-disconnected
disconnected?
[when disconnected-when])
(defn make-disconnected [when]
(assert (date? when))
(mk-disconnected when))
(defn connection-state? [x]
(or (connecting? x)
(connected? x)
(disconnected? x))
(define-record-type ConnectionInfo
^:private mk-connection-info
connection-info?
[state connection-info-state
server connection-info-server])
(defn make-connection-info [state server]
(assert (connection-state? state))
(assert (inet-addr? server))
(mk-connection-info state server))
Since Clojure doesn't do proper static type checking, the smart constructor make-connection-info
above can only check its parameters at runtime via assert
. Still, this latter representation is preferable to the previous one for the same reasons as in the original Ocaml example.
Some domains are best modelled with the help of associative key-value maps or pure functions. As an example imagine we want to model the concept of a time series. We start with a Scala representation of time series as a list of time-double-tuples:
object TimeSeriesService {
type TimeSeries = List[(Time, Double)]
}
This allows for illegal and nonsensical values to be represented:
// Let t1, t2, t3 be timestamps with t1 < t2 < t3
val ts1 = List((t2, 6.5), (t1, 5.0), (t3, 7.3))
val ts2 = List((t1, 6.5), (t1, 6.5))
val ts3 = List((t1, 6.5), (t1, 13.4))
ts3
is certainly illegal, and ts1
and ts2
are surely not the common case either. A user of the TimeSeriesService
now has to think hard about what each these cases denote. A more straightforward model is to represent time series as associative maps from time to double:
object TimeSeriesService {
type TimeSeries = Map[Time, Double]
}
Now contradictory or redundant values are inexpressible. An even better model is time series as functions from time to optional double:
sealed trait TS extends Function1[Time, Option[Double]]
object TimeSeriesService {
type TimeSeries = TS
}
This way, time series can be represented by lists, maps, proper functions, static values, etc, as long as we can make these types behave like a function:
class TSList(list: List[(Time, Option[Double])])
extends TS {
def apply(t: Time): Option[Double] =
list.find( (tx, _) => tx == t ).flatMap(_._2)
}
class TSConst(val: Double) extends TS {
def apply(_: Time): Option[Double] =
Some(val)
}
With this model of time series as functions we made illegal states unrepresentable and also we made all legal states representable, which might be just as important.
TODO
“Make illegal states unrepresentable” can have a great impact both for understanding a software's domain and for the software architecture itself. Models designed with “Make illegal states unrepresentable” in mind are simpler and therefore aid in gaining insight into a domain. Systems using these models are robust and loosely coupled.
“Make illegal states unrepresentable” leads to simpler models. This frees the designer's mind from having to deal with arbitrary technical intracacies of the model. Take the model of time series as functions of time described above.
sealed trait TS extends Function1[Time, Option[Double]]
There's little you can do with functions, so there's little you can do wrong with functions.
The flawed representation of time series as a list of tuples allows for many illegal values to be represented:
object TimeSeriesServiceLegacy {
def getTimeSeriesData(from: Time, to: Time): List[(Time, Double)] = ???
}
// Let t1, t2, t3 be timestamps with t1 < t2 < t3
val ts1 = List((t2, 6.5), (t1, 5.0), (t3, 7.3))
val ts2 = List((t1, 6.5), (t1, 6.5))
val ts3 = List((t1, 6.5), (t1, 13.4))
Since we can represent these values, our software system must handle them consistently, i.e. each user of the corresponding interface has to agree on a course of action when encountering one of these values. In the case of time series as lists of tuples, we have to answer the following questions unanimously:
How do we deal with out-of-order entries in time series lists?
How do we deal with duplicate entries in time series lists?
How do we deal with contradictory entries in time series lists?
Alternatively some users might choose to ignore these cases, relying on a separate validation step beforehand.
All of this leads to brittle systems. With "Make illegal states unrepresentable" we simply define the problem away. Detailed decisions can be encapsulated in a single parse step (Parse, don't validate) so every subsequent calculation is freed from having to deal with these minutiae. In Domain-Driven-Design literature such a parse step is often referred to as an Anti-Corruption Layer.
object TimeSeriesServiceAntiCorruption {
private def parseList(lst: List[(Time, Double)], acc: Map[Time, Double]): Map[Time, Double] =
lst match {
case Nil => acc
case (t, x) :: rest => parseList(rest, acc + (t -> x))
}
def getTimeSeriesData(from: Time, to: Time): Map[Time, Double] =
parseList(TimeSeriesServiceLegacy.getTimeSeriesData(from, to), Map.empty)
}
TODO
“Make illegal states unrepresentable” specifically mentions “states”. In software engineering, “state” usually refers to mutable state, but Ron Minsky's original example illustrates that in this context “state” has a different notion. His connection_state
describes immutable values. The term “state” hints at the idea that these values are parts of a state machine. connection_state
most likely moves from Connecting ...
via Connected ...
to Disconnected ...
.
“Make illegal states unrepresentable” is applicable to use cases apart from state machines. A better term would be "Make illegal values unrepresentable". In addition to values created with sums and products, the latter would include functions, since functions are values in functional programming. The idea of "Make illegal values unrepresentable" would then be equivalent to programming with machine-checked specifications. Examples of the latter are How to Keep Your Neighbours in Order by Conor McBride and Symbolic and Automatic Differentiation of Languages by Conal Elliott.