“Make illegal states unrepresentable” is a mantra commonly used in functional modeling. The idea is to model data representations in such a way that nonsensical (“illegal”) values (“states”) are inexpressible (“unrepresentable”).
Objectives: Simplicity, Maintainability, Correctness
Related patterns: Static types, Parse, don’t validate, Smart Constructor
The term “Make illegal states unrepresentable” was coined by Yaron Minsky in a 2010 guest lecture at Harvard. Ron never fully explained what the mantra means, but his example is quite illuminating in its own right.
Minsky starts off with the following OCaml code as an example of bad data modelling.
type connection_state =
| Connecting
| Connected
| Disconnected
type connection_info = {
state: connection_state;
server: Inet_addr.t;
last_ping_time: Time.t option;
last_ping_id: int option;
session_id: string option;
when_initiated: Time.t option;
when_disconnected: Time.t option;
}
With this representation some illegal states can be represented. For example, we can come up with a Connecting
state that has a when_disconnected
value.
let illegal_1 = {
state = Connecting;
when_disconnected = Some some_time;
...
}
Or we can come up with a Connected
state that has a last_ping_time
but no last_ping_id
.
let illegal_2 = {
state = Connected;
last_ping_time = Some some_time;
last_ping_id = None;
...
}
In this representation there are some invariants that aren't captured by the model, so they have to be adhered to by users of the model, which introduces implicit coupling. The invariants have to be described outside of the programming language facilities, for instance in a comment. These descriptions can get out of sync with the code. Since the invariants aren't machine checked they are a common source of bugs.
An improved model that inherently expresses relevant invariants looks like this:
type connecting = { when_initiated: Time.t; }
type connected = { last_ping : (Time.t * int) option;
session_id: string; }
type disconnected = { when_disconnected: Time.t; }
type connection_state =
| Connecting of connecting
| Connected of connected
| Disconnected of disconnected
type connection_info = {
state : connection_state;
server: Inet_addr.t;
}
Now connection_state
is a proper sum-of-products. The fields that only make sense for one of the three states are now part of the corresponding type only. The field server
makes sense for all states and is therefore part of the larger connection_info
type. Additionally, the requirement that last_ping_time
and last_ping_id
are either both Some ...
or both None
is now expressed by last_ping
being a single option of a tuple.
The example above – transforming a product type (connection_info
) with many nullable fields into a sum type with less nullable fields – demonstrates one effective technique to make illegal states unrepresentable. There are others, such as using associative maps and functions. Take a Scala representation of time series as a list of time-double-tuples:
object TimeSeriesService {
type TimeSeries = List[(Time, Double)]
}
This allows for illegal and nonsensical values to be represented:
// Let t1, t2, t3 be timestamps with t1 < t2 < t3
val ts1 = List((t2, 6.5), (t1, 5.0), (t3, 7.3))
val ts2 = List((t1, 6.5), (t1, 6.5))
val ts3 = List((t1, 6.5), (t1, 13.4))
ts3
is certainly illegal, and ts1
and ts2
are surely not the common case either. A user of the TimeSeriesService
now has to think hard about what each these cases denote. A more straightforward model is to represent time series as associative maps from time to double:
object TimeSeriesService {
type TimeSeries = Map[Time, Double]
}
Now contradictory or redundant values are inexpressible. An even better model is time series as functions from time to optional double:
sealed trait TS extends Function1[Time, Option[Double]]
object TimeSeriesService {
type TimeSeries = TS
}
This way, time series can be represented by lists, maps, proper functions, static values, etc, as long as we can make these types behave like a function:
class TSList(list: List[(Time, Option[Double])])
extends TS {
def apply(t: Time): Option[Double] =
list.find( (tx, _) => tx == t ).flatMap(_._2)
}
class TSConst(val: Double) extends TS {
def apply(_: Time): Option[Double] =
Some(val)
}
With this model of time series as functions we made illegal states unrepresentable and also we made all legal states representable.
“Make illegal states unrepresentable” was invented in the context of OCaml, which is a functional programming language with a strong static type system, but the mantra is applicable to most other programming paradigms. For example the Rust community adopted the mantra. Rust has a strong static type system but isn't a functional language. On the other hand, Clojure – a dynamically typed functional language – also allows programmers to make illegal states unrepresentable.
This blog post discusses “Make illegal states unrepresentable” in the context of Rust.
Minsky's original talk suggests that "Make illegal states unrepresentable" is about algebraic data types and compile-time type checking. The underlying ideas are applicable to dynamically typed languages such as Clojure (+ active-clojure records) as well, as is witnessed by the following contrasting pair of data representations.
The bad representation uses a flat record as in the OCaml code above:
(define-record-type ConnectionInfo
make-connection-info
connection-info?
[state connection-info-state ;; one of :connecting, :connected, :disconnected
server connection-info-server ;; non-nullable
last-ping-time connection-info-last-ping-time ;; nullable
last-ping-id connection-info-last-ping-id ;; nullable
session-id connection-info-session-id ;; nullable
when-initiated connection-info-when-initiated ;; nullable
when-disconnected connection-info-when-disconnected ;; nullable
])
This, again, allows for nonsensical values to be representable:
(def illegal-1
(make-connection-info
;; connection-state
:connecting
...
;; when-disconnected
some-date))
(def illegal-2
(make-connection-info
;; connection-state
:connected
...
;; last-ping-time
some-date
;; last-ping-id
nil
...
))
A better representation looks quite similar to the improved OCaml code above:
(define-record-type Connecting
^:private mk-connecting
connecting?
[when connecting-when])
(defn make-connecting [when]
(assert (date? when))
(mk-connecting when))
(define-record-type Connected
^:private mk-connected
connected?
[last-ping connected-last-ping
session-id connected-session-id])
(define-record-type Ping
make-ping
ping?
[when ping-when
id ping-id])
(defn make-connected [last-ping session-id]
(assert (ping? last-ping))
(assert (session-id? session-id))
(mk-connected last-ping session-id))
(define-record-type Disconnected
^:private mk-disconnected
disconnected?
[when disconnected-when])
(defn make-disconnected [when]
(assert (date? when))
(mk-disconnected when))
(defn connection-state? [x]
(or (connecting? x)
(connected? x)
(disconnected? x))
(define-record-type ConnectionInfo
^:private mk-connection-info
connection-info?
[state connection-info-state
server connection-info-server])
(defn make-connection-info [state server]
(assert (connection-state? state))
(assert (inet-addr? server))
(mk-connection-info state server))
Since Clojure doesn't do proper static type checking, the smart constructor make-connection-info
above can only check its parameters at runtime via assert
. Still, this latter representation is preferable to the previous one for the same reasons as in the original OCaml example.
“Make illegal states unrepresentable” specifically mentions “states”. In software engineering, “state” usually refers to mutable state, but Ron Minsky's original example illustrates that in this context “state” has a different notion. His connection_state
describes immutable values. The term “state” hints at the idea that these values are parts of a state machine. connection_state
most likely moves from Connecting ...
via Connected ...
to Disconnected ...
.
“Make illegal states unrepresentable” is applicable to use cases apart from state machines. A better term would be "Make illegal values unrepresentable". In addition to values created with sums and products, the latter would include functions, since functions are values in functional programming. The idea of "Make illegal values unrepresentable" would then be equivalent to programming with machine-checked specifications. Examples of the latter are How to Keep Your Neighbours in Order by Conor McBride and Symbolic and Automatic Differentiation of Languages by Conal Elliott.