Notation
Haskell implementation of the Overview/Notation
Sections
Powersets and Partition functions
Introduction
The code in this Haskell commentary can be executed by copying and pasting the code into a Haskell interpreter running the implementation in Alignment repository. Start your favourite Haskell interpreter and then load the AlignmentDev
module. For example,
cd Alignment
ghci
:l AlignmentDev
Note that, depending on your Haskell interpreter you may have to use multi-line mode to cut and paste let
and do
expressions spread over several lines. For example, using ghci
on the command line to enter the following
let x = [1,2,
3,4]
requires either
:{
let x = [1,2,
3,4]
:}
or
:set +m
let x = [1,2,
3,4]
In the second case multi-line mode is terminated by a blank line. The code in this Haskell commentary is formatted assuming the +m
is set. Note that in some interpreters the lines require being cut and pasted individually, even in multi-line mode.
Users of older versions of the Glasgow Haskell Compiler Interpreter should also prevent the compiler from applying the monomorphism restriction to bindings lacking explicit type signatures,
:seti -XNoMonomorphismRestriction
The AlignmentDev
module imports these modules,
import Data.List
import qualified Data.Set as Set
import qualified Data.Map as Map
import GHC.Real
import System.Random
import AlignmentUtil
import Alignment
import AlignmentSubstrate
import AlignmentDistribution
import AlignmentApprox
import AlignmentRandom
import AlignmentPracticable
import AlignmentPracticableIO
The AlignmentDev
module also implements a number of frequently used abbreviations.
The types and functions described in this section are defined in the AlignmentUtil
module.
Represent
The Alignment types implement the class type Represent
, defined in AlignmentUtil
, which requires them to implement the represent
function,
represent :: Show a => a -> String
The represent
function returns a String
that approximates to a set-theoretic representation of the structure. AlignmentDev
defines the abbreviation rp
. For example Set.Set
is represented,
let xx = Set.fromList [-10,-9..10]
rp xx
"{-10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10}"
And Map.Map
is represented,
let mm = Map.fromList [(i,i*2) | i <- [1..5]]
rp mm
"{(1,2),(2,4),(3,6),(4,8),(5,10)}"
AlignmentDev
also defines a way to represent a list on separate lines,
rpln [[1..i] | i <- [1..5]]
"[1]"
"[1,2]"
"[1,2,3]"
"[1,2,3,4]"
"[1,2,3,4,5]"
Set-builder notation
The notation used throughout this discussion is conventional set-theoretic with some additions. Sets are often defined using set-builder notation, for example $Z = \{ \mathrm{f}(x) : x \in X,~\mathrm{p}(x) \}$ where $\mathrm{f}(x)$ is a function, $X$ is another set and $\mathrm{p}(x)$ is a predicate.
Tuples, or lists, can be defined similarly where the order is not important, for example, $\sum (\mathrm{f}(x) : x \in X,~\mathrm{p}(x))$. Haskell has a special syntax for list comprehensions,
let f x = x^2
p x = -5 < x && x < 5
xx = Set.fromList [-10,-9..10]
rp xx
"{-10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10}"
rp [f x | x <- Set.toList xx, p x]
"[16,9,4,1,0,1,4,9,16]"
sum [f x | x <- Set.toList xx, p x]
60
let zz = Set.fromList [f x | x <- Set.toList xx, p x]
rp zz
"{0,1,4,9,16}"
sum $ Set.toList zz
30
Powersets and Partition functions
The powerset function is defined as $\mathrm{P}(A) := \{X : X \subseteq A \}$,
setsPowerset :: Ord a => Set.Set a -> Set.Set (Set.Set a)
For example,
rpln $ Set.toList $ setsPowerset (Set.fromList [1,2,3])
"{}"
"{1}"
"{1,2}"
"{1,2,3}"
"{1,3}"
"{2}"
"{2,3}"
"{3}"
The partition function $\mathrm{B}$ is the set of all partitions of an argument set. A partition is a set of non-empty disjoint subsets, called components, which union to equal the argument, \[ \forall P \in \mathrm{B}(A)~\forall C\in P~(C \neq \emptyset) \] and \[ \forall P \in \mathrm{B}(A)~\forall C,D \in P~(C \neq D \implies C \cap D = \emptyset) \] and \[ \forall P \in \mathrm{B}(A)~(\bigcup P = A) \]
setsSetPartition :: Ord a => Set.Set a -> Set.Set (Set.Set (Set.Set a))
For example,
rpln $ Set.toList $ setsSetPartition (Set.fromList [1,2,3])
"{ {1},{2},{3} }"
"{ {1},{2,3} }"
"{ {1,2},{3} }"
"{ {1,2,3} }"
"{ {1,3},{2} }"
A partition $P \in \mathrm{B}(A)$ is a parent of a partition $Q \in \mathrm{B}(A)$, if each component of $Q$ is a subset of a component in $P$, $\forall D \in Q~\exists C \in P~(D \subseteq C)$,
pairPartitionsIsParent :: Ord a => Set.Set (Set.Set a) -> Set.Set (Set.Set a) -> Bool
For example,
let ll = Set.toList $ setsSetPartition (Set.fromList [1,2,3])
rpln [(xx,yy) | xx <- ll, yy <- ll, pairPartitionsIsParent xx yy]
"({ {1},{2},{3} },{ {1},{2},{3} })"
"({ {1},{2,3} },{ {1},{2},{3} })"
"({ {1},{2,3} },{ {1},{2,3} })"
"({ {1,2},{3} },{ {1},{2},{3} })"
"({ {1,2},{3} },{ {1,2},{3} })"
"({ {1,2,3} },{ {1},{2},{3} })"
"({ {1,2,3} },{ {1},{2,3} })"
"({ {1,2,3} },{ {1,2},{3} })"
"({ {1,2,3} },{ {1,2,3} })"
"({ {1,2,3} },{ {1,3},{2} })"
"({ {1,3},{2} },{ {1},{2},{3} })"
"({ {1,3},{2} },{ {1,3},{2} })"
A parent partition is a decrement partition if it has one fewer components, $|P| = |Q| - 1$,
pairPartitionsIsDecrement :: Ord a => Set.Set (Set.Set a) -> Set.Set (Set.Set a) -> Bool
is
pairPartitionsIsDecrement pp qq = pairPartitionsIsParent pp qq && Set.size pp == Set.size qq - 1
For example,
rpln [(xx,yy) | xx <- ll, yy <- ll, pairPartitionsIsDecrement xx yy]
"({ {1},{2,3} },{ {1},{2},{3} })"
"({ {1,2},{3} },{ {1},{2},{3} })"
"({ {1,2,3} },{ {1},{2,3} })"
"({ {1,2,3} },{ {1,2},{3} })"
"({ {1,2,3} },{ {1,3},{2} })"
"({ {1,3},{2} },{ {1},{2},{3} })"
Relations
A relation $A \in \mathrm{P}(\mathcal{X} \times \mathcal{Y})$ between the set $\mathcal{X}$ and the set $\mathcal{Y}$ is a set of pairs, $\forall (x,y) \in A~(x \in \mathcal{X}~\wedge~y \in \mathcal{Y})$.
The domain of a relation is $\mathrm{dom}(A) := \{x : (x,y) \in A\}$ and the range is $\mathrm{ran}(A) := \{y : (x,y) \in A\}$,
relationsDomain :: (Ord a, Ord b) => Set.Set (a,b) -> Set.Set a
relationsRange :: (Ord a, Ord b) => Set.Set (a,b) -> Set.Set b
For example,
let dom = relationsDomain
ran = relationsRange
let rr = Set.fromList [(i,i*2) | i <- [1..5]]
rp rr
"{(1,2),(2,4),(3,6),(4,8),(5,10)}"
rp $ dom rr
"{1,2,3,4,5}"
rp $ ran rr
"{2,4,6,8,10}"
Functions
Functions are special cases of relations such that each element of the domain appears exactly once. Functions can be finite or infinite. For example, $\{(1,2),(2,4)\} \subset \{(x,2x) : x \in \mathbf{R}\}$. The powerset of functional relations between sets is denoted $\to$. For example, $\{(x,2x) : x \in \mathbf{R}\} \in \mathbf{R} \to \mathbf{R}$,
relationsIsFunc :: (Ord a) => Set.Set (a,b) -> Bool
For example,
let isfunc = relationsIsFunc
let rr = Set.fromList [(i,i*2) | i <- [1..5]]
isfunc rr
True
let ss = Set.fromList [(i,i `div` 2) | i <- [1..7]]
isfunc ss
True
isfunc $ rr `Set.union` ss
False
The application of the function $F \in \mathcal{X} \to \mathcal{Y}$ to an argument $x \in \mathcal{X}$ is denoted by $F(x) \in \mathcal{Y}$ or $F_x \in \mathcal{Y}$.
Functions $F \in \mathcal{X} \to \mathcal{Y}$ and $G \in \mathcal{Y} \to \mathcal{Z}$ can be composed $G \circ F \in \mathcal{X} \to \mathcal{Z}$.
The inverse of a function, $\mathrm{inverse} \in (\mathcal{X} \to \mathcal{Y}) \to (\mathcal{Y} \to \mathrm{P}(\mathcal{X}))$, is defined $\mathrm{inverse}(F) := \{(y,\{x : (x,z) \in F,~z=y\}) : y \in \mathrm{ran}(F) \}$, and is sometimes denoted $F^{-1}$,
functionsInverse :: (Ord a, Ord b) => Map.Map a b -> Map.Map b (Set.Set a)
For example,
let inv = functionsInverse
let ff = Map.fromList [(i,i*2) | i <- [1..5]]
rp $ inv ff
"{(2,{1}),(4,{2}),(6,{3}),(8,{4}),(10,{5})}"
let gg = Map.fromList [(i,i `div` 2) | i <- [1..7]]
rp $ inv gg
"{(0,{1}),(1,{2,3}),(2,{4,5}),(3,{6,7})}"
The range of the inverse is a partition of the domain, $\mathrm{ran}(F^{-1}) \in \mathrm{B}(\mathrm{dom}(F))$,
functionsDomain :: (Ord a, Ord b) => Map.Map a b -> Set.Set a
functionsRange :: (Ord a, Ord b) => Map.Map a b -> Set.Set b
setsIsPartition :: Ord a => Set.Set (Set.Set a) -> Bool
For example,
let dom = functionsDomain
ran = functionsRange
ispart = setsIsPartition
ispart $ ran (inv ff)
True
ispart $ ran (inv gg)
True
Functions may be recursive. Algorithms are represented as recursive functions.
The powerset of bijective relations, or one-to-one functions, is denoted $\leftrightarrow$. The cardinality of the domain of a bijective function equals the range, $F \in \mathrm{dom}(F) \leftrightarrow \mathrm{ran}(F) \implies |\mathrm{dom}(F)|= |\mathrm{ran}(F)|$,
functionsIsBijective :: (Ord a, Ord b) => Map.Map a b -> Bool
For example,
let isbi = functionsIsBijective
isbi ff
True
Set.size (dom ff) == Set.size (ran ff)
True
isbi gg
False
Total functions are denoted with a colon. For example, the left total function $F \in X :\to Y$ requires that $\mathrm{dom}(F)=X$ but only that $\mathrm{ran}(F) \subseteq Y$.
Orders
An order $D$ on some set $X$ is a choice of the enumerations, $D \in X :\leftrightarrow: \{1 \ldots |X|\}$. Given the order, any subset $Y \subseteq X$ can be enumerated. Define $\mathrm{order}(D,Y) \in Y :\leftrightarrow: \{1 \ldots |Y|\}$ such that $\forall a,b \in Y~(D_a \leq D_b \implies \mathrm{order}(D,Y)(a) \leq \mathrm{order}(D,Y)(b))$.
Numbers
The set of natural numbers $\mathbf{N}$ is taken to include $0$. The set $\mathbf{N}_{>0}$ excludes $0$. The space of a non-zero natural number is the natural logarithm, $\mathrm{space}(n) := \ln n$. The set of rational numbers is denoted $\mathbf{Q}$. The set of log-rational numbers is denoted $\ln \mathbf{Q}_{>0} = \{\ln q : q \in \mathbf{Q}_{>0}\}$. The set of real numbers is denoted $\mathbf{R}$.
Factorial
The factorial of a non-zero natural number $n \in \mathbf{N}_{>0}$ is written $n! = \prod \{1 \ldots n\}$,
factorial :: Integer -> Integer
For example,
[factorial i | i <- [1..10]]
[1,2,6,24,120,720,5040,40320,362880,3628800]
The unit-translated gamma function is the real function that corresponds to the factorial function. It is defined $(\Gamma_!) \in \mathbf{R} \to \mathbf{R}$ as $\Gamma_! x = \Gamma (x+1)$ which is such that $\forall n \in \mathbf{N}_{>0}~(\Gamma_! n = \Gamma (n+1) = n!)$. The log gamma function is defined in AlignmentApprox
,
logGamma :: Double -> Double
For example,
let facln x = logGamma (x + 1)
[exp (facln x) | x <- [1 .. 10]]
[1.0,2.0000000002237415,5.99999998992882,24.000000065370017,119.99999946822973,720.0000032523144,5040.000006121138,40319.99961625437,362880.0029698836,3628800.0054153493]
[exp (facln x) | x <- [2, 2.1 .. 3]]
[2.0000000002237415,2.1976202786855406,2.4239654801668395,2.683437381956396,2.9812064264534093,3.3233509697662127,3.717023852331667,4.170651783634442,4.694174206636131,5.299329735238307,5.9999999899288285]
Aggregate functions
Given a relation $A \subset \mathcal{X} \times \mathcal{Y}$ such that an order operator is defined on the range, $\mathcal{Y}$, the $\mathrm{max}$ function returns the maximum subset, $\mathrm{max} \in \mathrm{P}(\mathcal{X} \times \mathcal{Y}) \to (\mathcal{X} \to \mathcal{Y})$ \[ \mathrm{max}(A) := \{(x,y) : (x,y) \in A,~\forall (r,s) \in A~(s \leq y)\} \] For convenience define the functions $\mathrm{maxd}(A) := \mathrm{dom}(\mathrm{max}(A))$ and $\mathrm{maxr}(A) := m$, where $\{m\} = \mathrm{ran}(\mathrm{max}(A))$. The corresponding functions for minimum, $\mathrm{min}$, $\mathrm{mind}$ and $\mathrm{minr}$, are similarly defined,
functionsRelation :: (Ord a, Ord b) => Map.Map a b -> Set.Set (a,b)
relationsMaximum :: (Ord a, Ord b) => Set.Set (a,b) -> Set.Set (a,b)
relationsMinimum :: (Ord a, Ord b) => Set.Set (a,b) -> Set.Set (a,b)
For example,
let rel = functionsRelation
max = relationsMaximum
maxd = relationsDomain . relationsMaximum
maxr = Set.findMax . relationsRange
min = relationsMinimum
mind = relationsDomain . relationsMinimum
minr = Set.findMin . relationsRange
let gg = Map.fromList [(i,i `div` 2) | i <- [1..7]]
rp $ inv gg
"{(0,{1}),(1,{2,3}),(2,{4,5}),(3,{6,7})}"
rp $ max (rel gg)
"{(6,3),(7,3)}"
rp $ maxd (rel gg)
"{6,7}"
maxr (rel gg)
3
rp $ min (rel gg)
"{(1,0)}"
rp $ mind (rel gg)
"{1}"
minr (rel gg)
0
Given a relation $A \subset \mathcal{X} \times \mathcal{Y}$ such that the arithmetic operators are defined on the range, $\mathcal{Y}$, the $\mathrm{sum}$ function is defined $\mathrm{sum}(A) := \sum (y : (x,y) \in A)$,
relationsSum :: (Ord a, Ord b, Num b) => Set.Set (a,b) -> b
For example,
let rr = Set.fromList [(i, toRational i * 2) | i <- [1..10]] :: Set.Set (Int,Rational)
relationsSum rr
110 % 1
The relation can be normalised, $\mathrm{normalise}(A) := \{(x,y/\mathrm{sum}(A)) : (x,y) \in A\}$. Define notation $\hat{A} := \mathrm{normalise}(A)$. A normalised relation is such that its sum is one, $\mathrm{sum}(\hat{A}) = 1$,
relationsNormalise :: (Ord a, Ord b, Fractional b) => Set.Set (a,b) -> Set.Set (a,b)
For example,
rp $ relationsNormalise rr
"{(1,1 % 55),(2,2 % 55),(3,3 % 55),(4,4 % 55),(5,1 % 11),(6,6 % 55),(7,7 % 55),(8,8 % 55),(9,9 % 55),(10,2 % 11)}"
relationsSum $ relationsNormalise rr
1 % 1
Probability functions
The set of probability functions $\mathcal{P}$ is the set of rational valued functions such that the values are bounded $[0,1]$ and sum to $1$, $\mathcal{P} \subset \mathcal{X} \to \mathbf{Q}_{[0,1]}$ and $\forall P \in \mathcal{P}~(\mathrm{sum}(P)=1)$. The normalisation of a positive rational valued function $F \in \mathcal{X} \to \mathbf{Q}_{\geq 0}$ is a probability function, $\hat{F} \in \mathcal{P}$,
functionsIsProbability :: Ord a => Map.Map a Rational -> Bool
For example,
let pp = Map.fromList [(i, toRational i / 15) | i <- [1..5]] :: Map.Map Int Rational
relationsSum (functionsRelation pp)
1 % 1
functionsIsProbability pp
True
The entropy of positive rational valued functions, $\mathrm{entropy} \in (\mathcal{X} \to \mathbf{Q}_{\geq 0}) \to \mathbf{Q}_{\geq 0} \ln \mathbf{Q}_{> 0}$, is defined as $\mathrm{entropy}(N) := -\sum (\hat{N}_x \ln \hat{N}_x : x \in \mathrm{dom}(N),~N_x > 0)$. The entropy of a singleton is zero, $\mathrm{entropy}(\{(\cdot,1)\}) = 0$,
entropy :: [Rational] -> Double
For example,
entropy (replicate 5 1)
1.6094379124341005
entropy (replicate 5 (1%5))
1.6094379124341005
entropy [1..5]
1.4897503188505912
entropy [1,1]
0.6931471805599453
entropy [1,2]
0.6365141682948128
entropy [1]
-0.0
Entropy is maximised in uniform functions as the cardinality tends to infinity, $\mathrm{entropy}(X \times \{1/|X|\}) = \ln |X|$,
[entropy nn | i <- [1..5], let nn = replicate i 1]
[-0.0,0.6931471805599453,1.0986122886681096,1.3862943611198906,1.6094379124341005]
[log i | i <- [1..5]]
[0.0,0.6931471805599453,1.0986122886681098,1.3862943611198906,1.6094379124341003]
Given some finite function $F \in \mathcal{X} \to \mathcal{Y}$, where $0 < |F| < \infty$, a probability function may be constructed from its distribution, $\{(y,|X|) : (y,X) \in F^{-1}\}^{\wedge} \in (\mathcal{Y} \to \mathbf{Q}_{\geq 0}) \cap \mathcal{P}$,
let inv = Map.toList . functionsInverse . Map.fromList
norm nn = Set.toList $ relationsNormalise $ Set.fromList [(x, toRational y) | (x,y) <- nn]
let ff = [(1,3),(2,1),(3,1),(4,1),(5,3),(6,2)]
rpln $ inv ff
"(1,{2,3,4})"
"(2,{6})"
"(3,{1,5})"
rpln $ norm [(y, Set.size xx) | (y,xx) <- inv ff]
"(1,1 % 2)"
"(2,1 % 6)"
"(3,1 % 3)"
The probability function of an arbitrarily chosen finite function is likely to have high entropy,
let ent nn = entropy [toRational y | (_,y) <- nn]
ent $ norm [(y, Set.size xx) | (y,xx) <- inv ff]
1.0114042647073518
let gg = [(1,1),(2,1),(3,1),(4,1),(5,1),(6,2)]
rpln $ inv gg
"(1,{1,2,3,4,5})"
"(2,{6})"
rpln $ norm [(y, Set.size xx) | (y,xx) <- inv gg]
"(1,5 % 6)"
"(2,1 % 6)"
ent $ norm [(y, Set.size xx) | (y,xx) <- inv gg]
0.45056120886630463
See Maximum Entropy for more discussion of probability functions and entropy.
A probability function $P(z) \in (X :\to \mathbf{Q}_{\geq 0}) \cap \mathcal{P}$, parameterised by some parameter $z \in Z = \mathrm{dom}(P)$, has a corresponding likelihood function $L(x) \in Z :\to \mathbf{Q}_{\geq 0}$, parameterised by coordinate $x \in X$, such that $L(x)(z) = P(z)(x)$. The maximum likelihood estimate $\tilde{z}$ of the parameter, $z$, at coordinate $x \in X$ is the mode of the likelihood function, \[ \begin{eqnarray} \{\tilde{z}\} &=& \mathrm{maxd}(L(x))\\ &=& \mathrm{maxd}(\{(z,P(z)(x)) : z \in Z\})\\ &=& \{z : z \in Z,~\forall z’ \in Z~(P(z)(x) \geq P(z’)(x))\} \end{eqnarray} \]
Lists
A list is a object valued function of the natural numbers $\mathcal{L}(\mathcal{X}) \subset \mathbf{N} \to \mathcal{X}$, such that $\forall L \in \mathcal{L}(\mathcal{X})~(L \neq \emptyset \implies \mathrm{dom}(L) = \{1 \ldots |L|\})$. Two lists $L,M \in \mathcal{L}(\mathcal{X})$ may be concatenated, $\mathrm{concat}(L,M) := L \cup \{(|L|+i,x) : (i,x) \in M\}$. Lists of type a
are implemented in Haskell with the list type [a]
. They can be enumerated by zipping with the integers,
let ll = [3,1,4,1,5,9]
rp $ zip [1..] ll
"[(1,3),(2,1),(3,4),(4,1),(5,5),(6,9)]"
let mm = [2,6,5,3,5,8]
rp $ zip [1..] (ll ++ mm)
"[(1,3),(2,1),(3,4),(4,1),(5,5),(6,9),(7,2),(8,6),(9,5),(10,3),(11,5),(12,8)]"
Trees
A tree is recursively defined as a tree valued function of objects, $\mathrm{trees}(\mathcal{X}) = \mathcal{X} \to \mathrm{trees}(\mathcal{X})$. In AlignmentUtil
a Tree
is defined as a Map.Map
of any type a
to a Tree
of a
,
data Tree a = Tree (Map.Map a (Tree a))
The nodes of the tree $T \in \mathrm{trees}(\mathcal{X})$ are $\mathrm{nodes}(T) := T \cup \bigcup \{\mathrm{nodes}(R) : (x,R) \in T\}$, and the elements are $\mathrm{elements}(T) := \mathrm{dom}(\mathrm{nodes}(T))$. The paths of a tree $\mathrm{paths}(T) \subset \mathcal{L}(\mathcal{X})$ is a set of lists. Given a set of lists $Q \subset \mathcal{L}(\mathcal{X})$ a tree can be constructed $\mathrm{tree}(Q) \in \mathrm{trees}(\mathcal{X})$,
treesNodes :: (Ord a, Ord (Tree a)) => Tree a -> Set.Set (a, Tree a)
treesElements :: (Ord a, Ord (Tree a)) => Tree a -> Set.Set a
treesPaths :: (Ord a, Ord (Tree a)) => Tree a -> Set.Set [a]
pathsTree :: (Ord a, Ord (Tree a)) => Set.Set [a] -> Tree a
For example,
let tt = pathsTree $ Set.fromList [
[1,2],
[1,3],
[1,3,4],
[1,3,4,5,6,7],
[1],
[1,8],
[9]]
rpln $ Set.toList $ treesPaths tt
"[1,2]"
"[1,3,4,5,6,7]"
"[1,8]"
"[9]"
rp tt
"{(1,{(2,{}),(3,{(4,{(5,{(6,{(7,{})})})})}),(8,{})}),(9,{})}"
rpln $ Set.toList $ treesNodes tt
"(1,{(2,{}),(3,{(4,{(5,{(6,{(7,{})})})})}),(8,{})})"
"(2,{})"
"(3,{(4,{(5,{(6,{(7,{})})})})})"
"(4,{(5,{(6,{(7,{})})})})"
"(5,{(6,{(7,{})})})"
"(6,{(7,{})})"
"(7,{})"
"(8,{})"
"(9,{})"
rp $ treesElements tt
"{1,2,3,4,5,6,7,8,9}"