Functional programming in C++. Why would you want to do that and is it possible? In this series of posts I’m going to introduce functional programming concepts and show how they can be implemented in C++. See below for an introduction to the series and the kick-off article on Algebraic Data Types.
Since the dawn of computing there have been two parallel approaches to understanding and using what we now call software. The first line of thought, from which C++ was born, focuses on immediate practicality and the capabilities of the machines that are available. The second line of thought, which produced the discipline of functional programming, is a search for simple and beautiful mathematical constructs that captures the underlying essence of software and the domains modeled within it.
The concern of this series is the latter. By lifting thinking about design from the level of pointers, classes, and templates to mathematical concepts, which include algebraic data types, the resulting programs are more simple, general, concise, and composable.
Some of the material may seem difficult at first but if you keep trudging along, the benefits will become clear soon enough. My hope is that once someone works through these articles in their entirety, they will be convinced that a) C++ can be used as an advanced (more powerful than Haskell 2010), functional language that has the added feature of being performance tunable and b) that functional programming is an approach that produces readable, correct, and composable code.
The concern of this article is the mathematical basis and notation for Algebraic Data Types. The sequal will explain how to apply ADTs directly to C++. As you learn these concepts, I encourage you to ponder how a data type you designed recently would be implemented abstractly with ADTs. When I do this I often come up with something more simple or general than my first thought.
A type can be thought of as a set of possible values. We use a ∈ A to say that a is in the set A, or in relation to types a has the type A. long in c++ is an example of a type. We’d say 64l ∈ long since the literal
64l has that type.
Algebraic data types deal with how we can create more complex types based on combining others. ADTs are described using 5 symbols: 1, ⊕, ⊗, ::=, and :=. We’ll be looking at each of them below.
Please consider this a guided tutorial. If you have a question or something needs clarification, please make a comment below.
Given two types, A and B, we call the type A ⊗ B the product of A and B. A value of type A ⊗ B can be thought of as containing an element of A and an element B. Mathematically, this is often referred to as a pair where the value (a,b) is of type A ⊗ B if a ∈ A and b ∈ B.
Pairs can be generalized to form n-tuples.
Exercise. If ℝ is the type of the real numbers, what would be the type of points in 3-space?
Solution. ℝ ⊗ ℝ ⊗ ℝ
Given two types, A and B, we call the type A ⊕ B the sum of A and B. A value of type A ⊕ B can be thought of as containing an element of type A or an element of type B.
Values of a sum like A ⊕ B can be represented denotationally as either (0,a) where a ∈ A or (1,b) where b ∈ B.
Given the sum int ⊕ char where int and char are the normal C++ types of the same name, some elements would be (0, 12), (1, ‘a’), and (0, 33). Note that the 0 or 1 discriminates which of the underlying types are being stored in the pair’s second element.
As with the product operation, sums can be extended to n elements.
Exercise. If n is the number of values of a type A. How many values are in type A ⊗ A? How about A ⊕ A?
Solution. n × n for the product and n + n for the sum. Because we define our sum operation to be a disjoint union, values of A ⊕ A are tagged with which side of ⊕ they fall upon. If sum were defined to be a non-disjoint union, A ⊕ A would have only n elements.
Exercise. Using what we’ve covered so far, come up with a type for representing the values of booleans.
Solution. 1 ⊕ 1. The only two values of this type are (0,1) and (1,1).
Type definitions allow us to use shorthand names for more complex types. The general form is a := e where a is the newly bound identifier and e is the type expression. We read := as “is the same as”.
Expanding on our bool exercise, Bool₁ := 1 ⊕ 1 allows us to conveniently use Bool₁ wherever we would otherwise use 1 ⊕ 1.
Bool₁ as defined suffers from a major drawback however. The type 1 ⊕ 1 can be used to represent any type with two elements. For example, a state of “on” or “off” could be represented as 1 ⊕ 1 just as easily.
To remedy this we include an alternative type definition ::= (note the two colons), called “is implemented with”. a ::= e states that a is a distinct type from e (for all v ∈ a, it is not true that v ∈ e), yet there is an affinity between them (given a va ∈ a there is a corresponding ve ∈ e and vice versa).
To denote elements of type a where a ::= e, we surround an element of type e with parentheses and give it the subscript a. For example if ve ∈ e, (ve)a is the value of type a that corresponds to ve of type e.
Exercise. Denote the values of type Bool₂ where Bool₂ ::= 1 ⊕ 1.
Solution. ((0,1))Bool₂ and ((1,1))Bool₂
Exercise. Consider the following definition of Bool₃:
T ::= 1
F ::= 1
Bool₃ := T ⊕ F
Denote the values of type Bool₃.
Solution. We start with elements of base types and work our way up.
T has exactly one element, (1)T.
Similarly the element of F is (1)F.
Therefore the elements of Bool₃ are:
Type functions allow us to create reusable type templates. The general forms for type functions are f a₁ a₂ … an := e and f a₁ a₂ … an ::= e where f is the name of the newly bound type function and the type expression e may use a₁…an.
Consider an OpInt type that can be either an integer or nothing. A simple way to declare OpInt is as follows:
none ::= 1
OpInt := none ⊕ int
If we think optional values are useful in a more general context, we can declare a type function Op that creates various optional types based on its type argument:
none ::= 1
Op a := none ⊕ a
Exercise. Define OpInt using Op from above.
Solution. OpInt := Op int
Interesting types occur when type definitions are allowed to be recursive. That is, the name to the left of the ‘=’ symbol also appears on the right hand side. Here are a few examples:
The natural numbers N:
Z ::= 1
N := Z ⊕ N
If z is the single value of type Z, values of N take the form:
n0 = (0, z)
n₁ = (1, n0) = (1, (0, z))
n₂ = (1, n₁) = (1, (1, (0,z)))
n₃ = (1, n₂) = (1, (1, (1, (0, z))))
Exercise. Define a list type L of arbitrary types a. Show a few values of type L char.
emptyList ::= 1
L a := emptyList ⊕ (a ⊗ L a)
And a couple values of type L char:
- “” = (0, emptyList)
- “Hi” = (1, (‘H’, (1, (‘i’, (0, emptyList)))))
Since the types in the sum are disjoint, we can also represent these values unambiguously as follows:
- “” = emptyList
- “Hi” = (‘H’, (‘i’, emptyList))
Exercise. Declare a type for binary trees BTree, with values of type a at leaf nodes. Each node is either a leaf or has exactly two subtrees:
Solution. BTree a ::= (BTree a ⊗ BTree a) ⊕ a
Exercise. Declare a type for binary trees BTree₂, with values of type a at leaf nodes and b at branch nodes. Again, each node is either a leaf or has exactly two subtrees:
Solution. BTree₂ a b ::= (b ⊗ BTree₂ a b ⊗ BTree₂ a b) ⊕ a
Exercise. How do the above trees differ from BTree₃ (Thanks to Peter Marendic for inspiring this exercise):
emptyTree ::= 1
BTree₃ a ::= emptyTree ⊕ (a ⊗ BTree₃ a ⊗ BTree₃ a)
Solution. BTree₃ allows for a empty trees and branch nodes containing 1 subtree as well as 2.
Next up, we’ll talk about how to realize algebraic datatypes in C++ using Boost libraries. See you soon!