Notation

Haskell implementation of the Overview/Notation

Sections

Introduction

Represent

Set-builder notation

Powersets and Partition functions

Probability functions

Lists

Trees

Introduction

The code in this Haskell commentary can be executed by copying and pasting the code into a Haskell interpreter running the implementation in Alignment repository. Start your favourite Haskell interpreter and then load the AlignmentDev module. For example,

cd Alignment
ghci 
:l AlignmentDev

Note that, depending on your Haskell interpreter you may have to use multi-line mode to cut and paste let and do expressions spread over several lines. For example, using ghci on the command line to enter the following

let x = [1,2,
      3,4]

requires either

:{
let x = [1,2,
      3,4]
:}

:set +m

let x = [1,2,
      3,4]

In the second case multi-line mode is terminated by a blank line. The code in this Haskell commentary is formatted assuming the +m is set. Note that in some interpreters the lines require being cut and pasted individually, even in multi-line mode.

Users of older versions of the Glasgow Haskell Compiler Interpreter should also prevent the compiler from applying the monomorphism restriction to bindings lacking explicit type signatures,

:seti -XNoMonomorphismRestriction

The AlignmentDev module imports these modules,

import Data.List
import qualified Data.Set as Set
import qualified Data.Map as Map
import GHC.Real
import System.Random
import AlignmentUtil
import Alignment
import AlignmentSubstrate
import AlignmentDistribution
import AlignmentApprox
import AlignmentRandom
import AlignmentPracticable
import AlignmentPracticableIO

The AlignmentDev module also implements a number of frequently used abbreviations.

The types and functions described in this section are defined in the AlignmentUtil module.

Represent

The Alignment types implement the class type Represent, defined in AlignmentUtil, which requires them to implement the represent function,

represent :: Show a => a -> String

The represent function returns a String that approximates to a set-theoretic representation of the structure. AlignmentDev defines the abbreviation rp. For example Set.Set is represented,

let xx = Set.fromList [-10,-9..10]

rp xx
"{-10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10}"

And Map.Map is represented,

let mm = Map.fromList [(i,i*2) | i <- [1..5]]

rp mm
"{(1,2),(2,4),(3,6),(4,8),(5,10)}"

AlignmentDev also defines a way to represent a list on separate lines,

rpln [[1..i] | i <- [1..5]]
"[1]"
"[1,2]"
"[1,2,3]"
"[1,2,3,4]"
"[1,2,3,4,5]"

Set-builder notation

The notation used throughout this discussion is conventional set-theoretic with some additions. Sets are often defined using set-builder notation, for example $Z = \{ \mathrm{f}(x) : x \in X,~\mathrm{p}(x) \}$ where $\mathrm{f}(x)$ is a function, $X$ is another set and $\mathrm{p}(x)$ is a predicate.

Tuples, or lists, can be defined similarly where the order is not important, for example, $\sum (\mathrm{f}(x) : x \in X,~\mathrm{p}(x))$. Haskell has a special syntax for list comprehensions,

let f x = x^2
    p x = -5 < x && x < 5
    xx = Set.fromList [-10,-9..10]

rp xx
"{-10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10}"

rp [f x | x <- Set.toList xx, p x]
"[16,9,4,1,0,1,4,9,16]"

sum [f x | x <- Set.toList xx, p x]
60

let zz = Set.fromList [f x | x <- Set.toList xx, p x]

rp zz
"{0,1,4,9,16}"

sum $ Set.toList zz
30

Powersets and Partition functions

The powerset function is defined as $\mathrm{P}(A) := \{X : X \subseteq A \}$,

setsPowerset :: Ord a => Set.Set a -> Set.Set (Set.Set a)

For example,

rpln $ Set.toList $ setsPowerset (Set.fromList [1,2,3])
"{}"
"{1}"
"{1,2}"
"{1,2,3}"
"{1,3}"
"{2}"
"{2,3}"
"{3}"

The partition function $\mathrm{B}$ is the set of all partitions of an argument set. A partition is a set of non-empty disjoint subsets, called components, which union to equal the argument, \[ \forall P \in \mathrm{B}(A)~\forall C\in P~(C \neq \emptyset) \] and \[ \forall P \in \mathrm{B}(A)~\forall C,D \in P~(C \neq D \implies C \cap D = \emptyset) \] and \[ \forall P \in \mathrm{B}(A)~(\bigcup P = A) \]

setsSetPartition :: Ord a => Set.Set a -> Set.Set (Set.Set (Set.Set a))

For example,

rpln $ Set.toList $ setsSetPartition (Set.fromList [1,2,3])
"{ {1},{2},{3} }"
"{ {1},{2,3} }"
"{ {1,2},{3} }"
"{ {1,2,3} }"
"{ {1,3},{2} }"

A partition $P \in \mathrm{B}(A)$ is a parent of a partition $Q \in \mathrm{B}(A)$, if each component of $Q$ is a subset of a component in $P$, $\forall D \in Q~\exists C \in P~(D \subseteq C)$,

pairPartitionsIsParent :: Ord a => Set.Set (Set.Set a) -> Set.Set (Set.Set a) -> Bool

For example,

let ll = Set.toList $ setsSetPartition (Set.fromList [1,2,3])

rpln [(xx,yy) | xx <- ll, yy <- ll, pairPartitionsIsParent xx yy]
"({ {1},{2},{3} },{ {1},{2},{3} })"
"({ {1},{2,3} },{ {1},{2},{3} })"
"({ {1},{2,3} },{ {1},{2,3} })"
"({ {1,2},{3} },{ {1},{2},{3} })"
"({ {1,2},{3} },{ {1,2},{3} })"
"({ {1,2,3} },{ {1},{2},{3} })"
"({ {1,2,3} },{ {1},{2,3} })"
"({ {1,2,3} },{ {1,2},{3} })"
"({ {1,2,3} },{ {1,2,3} })"
"({ {1,2,3} },{ {1,3},{2} })"
"({ {1,3},{2} },{ {1},{2},{3} })"
"({ {1,3},{2} },{ {1,3},{2} })"

A parent partition is a decrement partition if it has one fewer components, $|P| = |Q| - 1$,

pairPartitionsIsDecrement :: Ord a => Set.Set (Set.Set a) -> Set.Set (Set.Set a) -> Bool

pairPartitionsIsDecrement pp qq = pairPartitionsIsParent pp qq && Set.size pp == Set.size qq - 1

For example,

rpln [(xx,yy) | xx <- ll, yy <- ll, pairPartitionsIsDecrement xx yy]
"({ {1},{2,3} },{ {1},{2},{3} })"
"({ {1,2},{3} },{ {1},{2},{3} })"
"({ {1,2,3} },{ {1},{2,3} })"
"({ {1,2,3} },{ {1,2},{3} })"
"({ {1,2,3} },{ {1,3},{2} })"
"({ {1,3},{2} },{ {1},{2},{3} })"

Relations

A relation $A \in \mathrm{P}(\mathcal{X} \times \mathcal{Y})$ between the set $\mathcal{X}$ and the set $\mathcal{Y}$ is a set of pairs, $\forall (x,y) \in A~(x \in \mathcal{X}~\wedge~y \in \mathcal{Y})$.

The domain of a relation is $\mathrm{dom}(A) := \{x : (x,y) \in A\}$ and the range is $\mathrm{ran}(A) := \{y : (x,y) \in A\}$,

relationsDomain :: (Ord a, Ord b) => Set.Set (a,b) -> Set.Set a
relationsRange :: (Ord a, Ord b) => Set.Set (a,b) -> Set.Set b

For example,

let dom = relationsDomain
    ran = relationsRange

let rr = Set.fromList [(i,i*2) | i <- [1..5]]

rp rr
"{(1,2),(2,4),(3,6),(4,8),(5,10)}"

rp $ dom rr
"{1,2,3,4,5}"

rp $ ran rr
"{2,4,6,8,10}"

Functions

Functions are special cases of relations such that each element of the domain appears exactly once. Functions can be finite or infinite. For example, $\{(1,2),(2,4)\} \subset \{(x,2x) : x \in \mathbf{R}\}$. The powerset of functional relations between sets is denoted $\to$. For example, $\{(x,2x) : x \in \mathbf{R}\} \in \mathbf{R} \to \mathbf{R}$,

relationsIsFunc :: (Ord a) => Set.Set (a,b) -> Bool

For example,

let isfunc = relationsIsFunc

let rr = Set.fromList [(i,i*2) | i <- [1..5]]

isfunc rr
True

let ss = Set.fromList [(i,i `div` 2) | i <- [1..7]]

isfunc ss
True

isfunc $ rr `Set.union` ss
False

The application of the function $F \in \mathcal{X} \to \mathcal{Y}$ to an argument $x \in \mathcal{X}$ is denoted by $F(x) \in \mathcal{Y}$ or $F_x \in \mathcal{Y}$.

Functions $F \in \mathcal{X} \to \mathcal{Y}$ and $G \in \mathcal{Y} \to \mathcal{Z}$ can be composed $G \circ F \in \mathcal{X} \to \mathcal{Z}$.

The inverse of a function, $\mathrm{inverse} \in (\mathcal{X} \to \mathcal{Y}) \to (\mathcal{Y} \to \mathrm{P}(\mathcal{X}))$, is defined $\mathrm{inverse}(F) := \{(y,\{x : (x,z) \in F,~z=y\}) : y \in \mathrm{ran}(F) \}$, and is sometimes denoted $F^{-1}$,

functionsInverse :: (Ord a, Ord b) => Map.Map a b -> Map.Map b (Set.Set a)

For example,

let inv = functionsInverse

let ff = Map.fromList [(i,i*2) | i <- [1..5]]

rp $ inv ff
"{(2,{1}),(4,{2}),(6,{3}),(8,{4}),(10,{5})}"

let gg = Map.fromList [(i,i `div` 2) | i <- [1..7]]

rp $ inv gg
"{(0,{1}),(1,{2,3}),(2,{4,5}),(3,{6,7})}"

The range of the inverse is a partition of the domain, $\mathrm{ran}(F^{-1}) \in \mathrm{B}(\mathrm{dom}(F))$,

functionsDomain :: (Ord a, Ord b) => Map.Map a b -> Set.Set a
functionsRange :: (Ord a, Ord b) => Map.Map a b -> Set.Set b
setsIsPartition :: Ord a => Set.Set (Set.Set a) -> Bool

For example,

let dom = functionsDomain
    ran = functionsRange
    ispart = setsIsPartition

ispart $ ran (inv ff)
True

ispart $ ran (inv gg)
True

Functions may be recursive. Algorithms are represented as recursive functions.

The powerset of bijective relations, or one-to-one functions, is denoted $\leftrightarrow$. The cardinality of the domain of a bijective function equals the range, $F \in \mathrm{dom}(F) \leftrightarrow \mathrm{ran}(F) \implies |\mathrm{dom}(F)|= |\mathrm{ran}(F)|$,

functionsIsBijective :: (Ord a, Ord b) => Map.Map a b -> Bool

For example,

let isbi = functionsIsBijective

isbi ff
True

Set.size (dom ff) == Set.size (ran ff)
True

isbi gg
False

Total functions are denoted with a colon. For example, the left total function $F \in X :\to Y$ requires that $\mathrm{dom}(F)=X$ but only that $\mathrm{ran}(F) \subseteq Y$.

Orders

An order $D$ on some set $X$ is a choice of the enumerations, $D \in X :\leftrightarrow: \{1 \ldots |X|\}$. Given the order, any subset $Y \subseteq X$ can be enumerated. Define $\mathrm{order}(D,Y) \in Y :\leftrightarrow: \{1 \ldots |Y|\}$ such that $\forall a,b \in Y~(D_a \leq D_b \implies \mathrm{order}(D,Y)(a) \leq \mathrm{order}(D,Y)(b))$.

Numbers

The set of natural numbers $\mathbf{N}$ is taken to include $0$. The set $\mathbf{N}_{>0}$ excludes $0$. The space of a non-zero natural number is the natural logarithm, $\mathrm{space}(n) := \ln n$. The set of rational numbers is denoted $\mathbf{Q}$. The set of log-rational numbers is denoted $\ln \mathbf{Q}_{>0} = \{\ln q : q \in \mathbf{Q}_{>0}\}$. The set of real numbers is denoted $\mathbf{R}$.

Factorial

The factorial of a non-zero natural number $n \in \mathbf{N}_{>0}$ is written $n! = \prod \{1 \ldots n\}$,

factorial :: Integer -> Integer

For example,

[factorial i | i <- [1..10]]
[1,2,6,24,120,720,5040,40320,362880,3628800]

The unit-translated gamma function is the real function that corresponds to the factorial function. It is defined $(\Gamma_!) \in \mathbf{R} \to \mathbf{R}$ as $\Gamma_! x = \Gamma (x+1)$ which is such that $\forall n \in \mathbf{N}_{>0}~(\Gamma_! n = \Gamma (n+1) = n!)$. The log gamma function is defined in AlignmentApprox,

logGamma :: Double -> Double

For example,

let facln x = logGamma (x + 1)

[exp (facln x) | x <- [1 .. 10]]
[1.0,2.0000000002237415,5.99999998992882,24.000000065370017,119.99999946822973,720.0000032523144,5040.000006121138,40319.99961625437,362880.0029698836,3628800.0054153493]

[exp (facln x) | x <- [2, 2.1 .. 3]]
[2.0000000002237415,2.1976202786855406,2.4239654801668395,2.683437381956396,2.9812064264534093,3.3233509697662127,3.717023852331667,4.170651783634442,4.694174206636131,5.299329735238307,5.9999999899288285]

Aggregate functions

Given a relation $A \subset \mathcal{X} \times \mathcal{Y}$ such that an order operator is defined on the range, $\mathcal{Y}$, the $\mathrm{max}$ function returns the maximum subset, $\mathrm{max} \in \mathrm{P}(\mathcal{X} \times \mathcal{Y}) \to (\mathcal{X} \to \mathcal{Y})$ \[ \mathrm{max}(A) := \{(x,y) : (x,y) \in A,~\forall (r,s) \in A~(s \leq y)\} \] For convenience define the functions $\mathrm{maxd}(A) := \mathrm{dom}(\mathrm{max}(A))$ and $\mathrm{maxr}(A) := m$, where $\{m\} = \mathrm{ran}(\mathrm{max}(A))$. The corresponding functions for minimum, $\mathrm{min}$, $\mathrm{mind}$ and $\mathrm{minr}$, are similarly defined,

functionsRelation :: (Ord a, Ord b) => Map.Map a b -> Set.Set (a,b)
relationsMaximum :: (Ord a, Ord b) => Set.Set (a,b) -> Set.Set (a,b)
relationsMinimum :: (Ord a, Ord b) => Set.Set (a,b) -> Set.Set (a,b)

For example,

let rel = functionsRelation
    max = relationsMaximum
    maxd = relationsDomain . relationsMaximum
    maxr = Set.findMax . relationsRange
    min = relationsMinimum
    mind = relationsDomain . relationsMinimum
    minr = Set.findMin . relationsRange

let gg = Map.fromList [(i,i `div` 2) | i <- [1..7]]

rp $ inv gg
"{(0,{1}),(1,{2,3}),(2,{4,5}),(3,{6,7})}"

rp $ max (rel gg)
"{(6,3),(7,3)}"

rp $ maxd (rel gg)
"{6,7}"

maxr (rel gg)
3

rp $ min (rel gg)
"{(1,0)}"

rp $ mind (rel gg)
"{1}"

minr (rel gg)
0

Given a relation $A \subset \mathcal{X} \times \mathcal{Y}$ such that the arithmetic operators are defined on the range, $\mathcal{Y}$, the $\mathrm{sum}$ function is defined $\mathrm{sum}(A) := \sum (y : (x,y) \in A)$,

relationsSum :: (Ord a, Ord b, Num b) => Set.Set (a,b) -> b

For example,

let rr = Set.fromList [(i, toRational i * 2) | i <- [1..10]] :: Set.Set (Int,Rational)

relationsSum rr
110 % 1

The relation can be normalised, $\mathrm{normalise}(A) := \{(x,y/\mathrm{sum}(A)) : (x,y) \in A\}$. Define notation $\hat{A} := \mathrm{normalise}(A)$. A normalised relation is such that its sum is one, $\mathrm{sum}(\hat{A}) = 1$,

relationsNormalise :: (Ord a, Ord b, Fractional b) => Set.Set (a,b) -> Set.Set (a,b)

For example,

rp $ relationsNormalise rr
"{(1,1 % 55),(2,2 % 55),(3,3 % 55),(4,4 % 55),(5,1 % 11),(6,6 % 55),(7,7 % 55),(8,8 % 55),(9,9 % 55),(10,2 % 11)}"

relationsSum $ relationsNormalise rr
1 % 1

Probability functions

The set of probability functions $\mathcal{P}$ is the set of rational valued functions such that the values are bounded $[0,1]$ and sum to $1$, $\mathcal{P} \subset \mathcal{X} \to \mathbf{Q}_{[0,1]}$ and $\forall P \in \mathcal{P}~(\mathrm{sum}(P)=1)$. The normalisation of a positive rational valued function $F \in \mathcal{X} \to \mathbf{Q}_{\geq 0}$ is a probability function, $\hat{F} \in \mathcal{P}$,

functionsIsProbability :: Ord a => Map.Map a Rational -> Bool

For example,

let pp = Map.fromList [(i, toRational i / 15) | i <- [1..5]] :: Map.Map Int Rational

relationsSum (functionsRelation pp)
1 % 1

functionsIsProbability pp
True

The entropy of positive rational valued functions, $\mathrm{entropy} \in (\mathcal{X} \to \mathbf{Q}_{\geq 0}) \to \mathbf{Q}_{\geq 0} \ln \mathbf{Q}_{> 0}$, is defined as $\mathrm{entropy}(N) := -\sum (\hat{N}_x \ln \hat{N}_x : x \in \mathrm{dom}(N),~N_x > 0)$. The entropy of a singleton is zero, $\mathrm{entropy}(\{(\cdot,1)\}) = 0$,

entropy :: [Rational] -> Double

For example,

entropy (replicate 5 1)
1.6094379124341005

entropy (replicate 5 (1%5))
1.6094379124341005

entropy [1..5]
1.4897503188505912

entropy [1,1]
0.6931471805599453

entropy [1,2]
0.6365141682948128

entropy [1]
-0.0

Entropy is maximised in uniform functions as the cardinality tends to infinity, $\mathrm{entropy}(X \times \{1/|X|\}) = \ln |X|$,

[entropy nn | i <- [1..5], let nn = replicate i 1]
[-0.0,0.6931471805599453,1.0986122886681096,1.3862943611198906,1.6094379124341005]

[log i | i <- [1..5]]
[0.0,0.6931471805599453,1.0986122886681098,1.3862943611198906,1.6094379124341003]

Given some finite function $F \in \mathcal{X} \to \mathcal{Y}$, where $0 < |F| < \infty$, a probability function may be constructed from its distribution, $\{(y,|X|) : (y,X) \in F^{-1}\}^{\wedge} \in (\mathcal{Y} \to \mathbf{Q}_{\geq 0}) \cap \mathcal{P}$,

let inv = Map.toList . functionsInverse . Map.fromList 
    norm nn = Set.toList $ relationsNormalise $ Set.fromList [(x, toRational y) | (x,y) <- nn]

let ff = [(1,3),(2,1),(3,1),(4,1),(5,3),(6,2)]

rpln $ inv ff
"(1,{2,3,4})"
"(2,{6})"
"(3,{1,5})"

rpln $ norm [(y, Set.size xx) | (y,xx) <- inv ff]
"(1,1 % 2)"
"(2,1 % 6)"
"(3,1 % 3)"

The probability function of an arbitrarily chosen finite function is likely to have high entropy,

let ent nn = entropy [toRational y | (_,y) <- nn]

ent $ norm [(y, Set.size xx) | (y,xx) <- inv ff]
1.0114042647073518

let gg = [(1,1),(2,1),(3,1),(4,1),(5,1),(6,2)]

rpln $ inv gg
"(1,{1,2,3,4,5})"
"(2,{6})"

rpln $ norm [(y, Set.size xx) | (y,xx) <- inv gg]
"(1,5 % 6)"
"(2,1 % 6)"

ent $ norm [(y, Set.size xx) | (y,xx) <- inv gg]
0.45056120886630463

See Maximum Entropy for more discussion of probability functions and entropy.

A probability function $P(z) \in (X :\to \mathbf{Q}_{\geq 0}) \cap \mathcal{P}$, parameterised by some parameter $z \in Z = \mathrm{dom}(P)$, has a corresponding likelihood function $L(x) \in Z :\to \mathbf{Q}_{\geq 0}$, parameterised by coordinate $x \in X$, such that $L(x)(z) = P(z)(x)$. The maximum likelihood estimate $\tilde{z}$ of the parameter, $z$, at coordinate $x \in X$ is the mode of the likelihood function, \[ \begin{eqnarray} \{\tilde{z}\} &=& \mathrm{maxd}(L(x))\\ &=& \mathrm{maxd}(\{(z,P(z)(x)) : z \in Z\})\\ &=& \{z : z \in Z,~\forall z’ \in Z~(P(z)(x) \geq P(z’)(x))\} \end{eqnarray} \]

Lists

A list is a object valued function of the natural numbers $\mathcal{L}(\mathcal{X}) \subset \mathbf{N} \to \mathcal{X}$, such that $\forall L \in \mathcal{L}(\mathcal{X})~(L \neq \emptyset \implies \mathrm{dom}(L) = \{1 \ldots |L|\})$. Two lists $L,M \in \mathcal{L}(\mathcal{X})$ may be concatenated, $\mathrm{concat}(L,M) := L \cup \{(|L|+i,x) : (i,x) \in M\}$. Lists of type a are implemented in Haskell with the list type [a]. They can be enumerated by zipping with the integers,

let ll = [3,1,4,1,5,9]

rp $ zip [1..] ll
"[(1,3),(2,1),(3,4),(4,1),(5,5),(6,9)]"

let mm = [2,6,5,3,5,8]

rp $ zip [1..] (ll ++ mm)
"[(1,3),(2,1),(3,4),(4,1),(5,5),(6,9),(7,2),(8,6),(9,5),(10,3),(11,5),(12,8)]"

Trees

A tree is recursively defined as a tree valued function of objects, $\mathrm{trees}(\mathcal{X}) = \mathcal{X} \to \mathrm{trees}(\mathcal{X})$. In AlignmentUtil a Tree is defined as a Map.Map of any type a to a Tree of a,

data Tree a = Tree (Map.Map a (Tree a))

The nodes of the tree $T \in \mathrm{trees}(\mathcal{X})$ are $\mathrm{nodes}(T) := T \cup \bigcup \{\mathrm{nodes}(R) : (x,R) \in T\}$, and the elements are $\mathrm{elements}(T) := \mathrm{dom}(\mathrm{nodes}(T))$. The paths of a tree $\mathrm{paths}(T) \subset \mathcal{L}(\mathcal{X})$ is a set of lists. Given a set of lists $Q \subset \mathcal{L}(\mathcal{X})$ a tree can be constructed $\mathrm{tree}(Q) \in \mathrm{trees}(\mathcal{X})$,

treesNodes :: (Ord a, Ord (Tree a)) => Tree a -> Set.Set (a, Tree a)
treesElements :: (Ord a, Ord (Tree a)) => Tree a -> Set.Set a
treesPaths :: (Ord a, Ord (Tree a)) => Tree a -> Set.Set [a]
pathsTree :: (Ord a, Ord (Tree a)) => Set.Set [a] -> Tree a

For example,

let tt = pathsTree $ Set.fromList [
      [1,2],
      [1,3],
      [1,3,4],
      [1,3,4,5,6,7],
      [1],
      [1,8],
      [9]]
  
rpln $ Set.toList $ treesPaths tt
"[1,2]"
"[1,3,4,5,6,7]"
"[1,8]"
"[9]"

rp tt
"{(1,{(2,{}),(3,{(4,{(5,{(6,{(7,{})})})})}),(8,{})}),(9,{})}"

rpln $ Set.toList $ treesNodes tt
"(1,{(2,{}),(3,{(4,{(5,{(6,{(7,{})})})})}),(8,{})})"
"(2,{})"
"(3,{(4,{(5,{(6,{(7,{})})})})})"
"(4,{(5,{(6,{(7,{})})})})"
"(5,{(6,{(7,{})})})"
"(6,{(7,{})})"
"(7,{})"
"(8,{})"
"(9,{})"

rp $ treesElements tt
"{1,2,3,4,5,6,7,8,9}"

top