Maximum Entropy

Haskell implementation of the Overview/Maximum Entropy

Let $X \subset \mathcal{X}$ be a finite set of micro-states, $0 < |X| < \infty$. Consider a system of $n$ distinguishable particles, each in a micro-state. The set of states of the system is the set of micro-state functions of particle identifier, $\{1 \ldots n\} :\to X$. The cardinality of the set of states is $|X|^n$.

let states xx n = foldl mul [[]] [1..n]
      where
        mul rr i = [jj ++ [(i,x)] | jj <- rr, x <- Set.toList xx]

let xx = Set.fromList ['a'..'c']

rp xx
"{'a','b','c'}"

let ss = states xx 6

length ss
729

3^6
729

rpln $ take 10 $ ss
"[(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'a'),(6,'a')]"
"[(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'a'),(6,'b')]"
"[(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'a'),(6,'c')]"
"[(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'b'),(6,'a')]"
"[(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'b'),(6,'b')]"
"[(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'b'),(6,'c')]"
"[(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'c'),(6,'a')]"
"[(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'c'),(6,'b')]"
"[(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'c'),(6,'c')]"
"[(1,'a'),(2,'a'),(3,'a'),(4,'b'),(5,'a'),(6,'a')]"

Each state implies a distribution of particles over micro-states, \[ \begin{eqnarray} I &=& \big\{(R,\{(x,|C|) : (x,C) \in R^{-1}\}) : R \in \{1 \ldots n\} :\to X\big\} \end{eqnarray} \] That is, a state $R \in \{1 \ldots n\} :\to X$ has a particle distribution $I(R) \in X \to \{1 \ldots n\}$ such that $\sum_{x \in X} I(R)(x) = n$,

let inv = Map.toList . functionsInverse . Map.fromList 

let ii = [(rr, [(x, Set.size cc) | (x,cc) <- inv rr]) | rr <- ss]

rpln $ take 10 $ ii
"([(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'a'),(6,'a')],[('a',6)])"
"([(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'a'),(6,'b')],[('a',5),('b',1)])"
"([(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'a'),(6,'c')],[('a',5),('c',1)])"
"([(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'b'),(6,'a')],[('a',5),('b',1)])"
"([(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'b'),(6,'b')],[('a',4),('b',2)])"
"([(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'b'),(6,'c')],[('a',4),('b',1),('c',1)])"
"([(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'c'),(6,'a')],[('a',5),('c',1)])"
"([(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'c'),(6,'b')],[('a',4),('b',1),('c',1)])"
"([(1,'a'),(2,'a'),(3,'a'),(4,'a'),(5,'c'),(6,'c')],[('a',4),('c',2)])"
"([(1,'a'),(2,'a'),(3,'a'),(4,'b'),(5,'a'),(6,'a')],[('a',5),('b',1)])"

nub [sum (snd (unzip nn)) | (_,nn) <- ii]
[6]

The cardinality of states for each particle distribution, $I(R)$, is the multinomial coefficient, \[ \begin{eqnarray} W &=& \{(N,|D|) : (N,D) \in I^{-1}\}
&=& \{(N,\frac{n!}{\prod_{(x,\cdot) \in N} N_x!}) : (N,\cdot) \in I^{-1}\} \end{eqnarray} \] That is, there are $W(I(R))$ states that have the same particle distribution, $I(R)$, as state $R$,

let ww = [(nn, Set.size dd) | (nn,dd) <- inv ii]

rpln ww
"([('a',1),('b',1),('c',4)],30)"
"([('a',1),('b',2),('c',3)],60)"
"([('a',1),('b',3),('c',2)],60)"
"([('a',1),('b',4),('c',1)],30)"
"([('a',1),('b',5)],6)"
"([('a',1),('c',5)],6)"
"([('a',2),('b',1),('c',3)],60)"
"([('a',2),('b',2),('c',2)],90)"
"([('a',2),('b',3),('c',1)],60)"
"([('a',2),('b',4)],15)"
"([('a',2),('c',4)],15)"
"([('a',3),('b',1),('c',2)],60)"
"([('a',3),('b',2),('c',1)],60)"
"([('a',3),('b',3)],20)"
"([('a',3),('c',3)],20)"
"([('a',4),('b',1),('c',1)],30)"
"([('a',4),('b',2)],15)"
"([('a',4),('c',2)],15)"
"([('a',5),('b',1)],6)"
"([('a',5),('c',1)],6)"
"([('a',6)],1)"
"([('b',1),('c',5)],6)"
"([('b',2),('c',4)],15)"
"([('b',3),('c',3)],20)"
"([('b',4),('c',2)],15)"
"([('b',5),('c',1)],6)"
"([('b',6)],1)"
"([('c',6)],1)"

The function combinationMultinomial is defined in AlignmentUtil,

combinationMultinomial :: Integer -> [Integer] -> Integer

let mult n nn = combinationMultinomial (toInteger n) [toInteger y | (_,y) <- nn]

and [mult 6 nn == toInteger d | (nn,d) <- ww]
True

The normalisation of the state distribution over particle distributions is a probability function, $\hat{W} \in ((X \to \{1 \ldots n\}) \to \mathbf{Q}_{>0}) \cap \mathcal{P}$,

let norm nn = Set.toList $ relationsNormalise $ Set.fromList [(x, toRational y) | (x,y) <- nn]

rpln $ norm ww
"([('a',1),('b',1),('c',4)],10 % 243)"
"([('a',1),('b',2),('c',3)],20 % 243)"
"([('a',1),('b',3),('c',2)],20 % 243)"
"([('a',1),('b',4),('c',1)],10 % 243)"
"([('a',1),('b',5)],2 % 243)"
"([('a',1),('c',5)],2 % 243)"
"([('a',2),('b',1),('c',3)],20 % 243)"
"([('a',2),('b',2),('c',2)],10 % 81)"
"([('a',2),('b',3),('c',1)],20 % 243)"
"([('a',2),('b',4)],5 % 243)"
"([('a',2),('c',4)],5 % 243)"
"([('a',3),('b',1),('c',2)],20 % 243)"
"([('a',3),('b',2),('c',1)],20 % 243)"
"([('a',3),('b',3)],20 % 729)"
"([('a',3),('c',3)],20 % 729)"
"([('a',4),('b',1),('c',1)],10 % 243)"
"([('a',4),('b',2)],5 % 243)"
"([('a',4),('c',2)],5 % 243)"
"([('a',5),('b',1)],2 % 243)"
"([('a',5),('c',1)],2 % 243)"
"([('a',6)],1 % 729)"
"([('b',1),('c',5)],2 % 243)"
"([('b',2),('c',4)],5 % 243)"
"([('b',3),('c',3)],20 % 729)"
"([('b',4),('c',2)],5 % 243)"
"([('b',5),('c',1)],2 % 243)"
"([('b',6)],1 % 729)"
"([('c',6)],1 % 729)"

In the case where the number of particles is large, $n \gg \ln n$, the logarithm of the multinomial coefficient of a particle distribution $N \in X \to \{1 \ldots n\}$ approximates to the scaled entropy, \[ \begin{eqnarray} \ln \frac{n!}{\prod_{(x,\cdot) \in N} N_x!} &\approx& n \times \mathrm{entropy}(N) \end{eqnarray} \] so the probability of the particle distribution varies with its entropy, $\hat{W}(N) \sim \mathrm{entropy}(N)$,

let ent nn = entropy [toRational y | (_,y) <- nn]

rpln $ sort [(d, nn, ent nn) | (nn,d) <- ww]
"(1,[('a',6)],-0.0)"
"(1,[('b',6)],-0.0)"
"(1,[('c',6)],-0.0)"
"(6,[('a',1),('b',5)],0.45056120886630463)"
"(6,[('a',1),('c',5)],0.45056120886630463)"
"(6,[('a',5),('b',1)],0.45056120886630463)"
"(6,[('a',5),('c',1)],0.45056120886630463)"
"(6,[('b',1),('c',5)],0.45056120886630463)"
"(6,[('b',5),('c',1)],0.45056120886630463)"
"(15,[('a',2),('b',4)],0.6365141682948128)"
"(15,[('a',2),('c',4)],0.6365141682948128)"
"(15,[('a',4),('b',2)],0.6365141682948128)"
"(15,[('a',4),('c',2)],0.6365141682948128)"
"(15,[('b',2),('c',4)],0.6365141682948128)"
"(15,[('b',4),('c',2)],0.6365141682948128)"
"(20,[('a',3),('b',3)],0.6931471805599453)"
"(20,[('a',3),('c',3)],0.6931471805599453)"
"(20,[('b',3),('c',3)],0.6931471805599453)"
"(30,[('a',1),('b',1),('c',4)],0.8675632284814612)"
"(30,[('a',1),('b',4),('c',1)],0.8675632284814612)"
"(30,[('a',4),('b',1),('c',1)],0.8675632284814612)"
"(60,[('a',1),('b',2),('c',3)],1.0114042647073518)"
"(60,[('a',1),('b',3),('c',2)],1.0114042647073518)"
"(60,[('a',2),('b',1),('c',3)],1.0114042647073518)"
"(60,[('a',2),('b',3),('c',1)],1.0114042647073516)"
"(60,[('a',3),('b',1),('c',2)],1.0114042647073518)"
"(60,[('a',3),('b',2),('c',1)],1.0114042647073516)"
"(90,[('a',2),('b',2),('c',2)],1.0986122886681096)"

The least probable particle distributions are singletons, \[ \begin{eqnarray} \mathrm{mind}(W) &=& \big\{\{(x,n)\} : x \in X\big\} \end{eqnarray} \] because they have only one state, $\forall x \in X~(W(\{(x,n)\}) = 1)$. The entropy of a singleton distribution is zero, $\mathrm{entropy}(\{(x,n)\}) = 0$,

"(1,[('a',6)],-0.0)"
"(1,[('b',6)],-0.0)"
"(1,[('c',6)],-0.0)"

In the case where the number of particles per micro-state is integral, $n/|X| \in \mathbf{N}_{>0}$, the modal particle distribution is the uniform distribution, \[ \begin{eqnarray} \mathrm{maxd}(W) &=& \big\{\{(x,n/|X|) : x \in X\}\big\} \end{eqnarray} \] The entropy of the uniform distribution is maximised, $\mathrm{entropy}(\{(x,n/|X|) : x \in X\}) = \ln |X|$,

"(90,[('a',2),('b',2),('c',2)],1.0986122886681096)"

log 3
1.0986122886681098

The normalisation of a particle distribution $N \in X \to \{1 \ldots n\}$ is a micro-state probability function, $\hat{N} \in (X \to \mathbf{Q}_{\geq 0}) \cap \mathcal{P}$, which is independent of the number of particles, $\mathrm{sum}(\hat{N}) = 1$.

So in the case where a problem domain is parameterised by an unknown micro-state probability function otherwise arbitrarily chosen from a known subset $Q \subseteq (X \to \mathbf{Q}_{\geq 0}) \cap \mathcal{P}$, where the number of particles is known to be large, the maximum likelihood estimate $\tilde{P} \in Q$ is the probability function with the greatest entropy, $\forall P \in Q~(\mathrm{entropy}(\tilde{P}) \geq \mathrm{entropy}(P))$ or $\tilde{P} \in \mathrm{maxd}(\{(P,\mathrm{entropy}(P)) : P \in Q\})$,

rpln $ sort [(ent nn, norm nn) | (nn,d) <- ww]
"(-0.0,[('a',1 % 1)])"
"(-0.0,[('b',1 % 1)])"
"(-0.0,[('c',1 % 1)])"
"(0.45056120886630463,[('a',1 % 6),('b',5 % 6)])"
"(0.45056120886630463,[('a',1 % 6),('c',5 % 6)])"
"(0.45056120886630463,[('a',5 % 6),('b',1 % 6)])"
"(0.45056120886630463,[('a',5 % 6),('c',1 % 6)])"
"(0.45056120886630463,[('b',1 % 6),('c',5 % 6)])"
"(0.45056120886630463,[('b',5 % 6),('c',1 % 6)])"
"(0.6365141682948128,[('a',1 % 3),('b',2 % 3)])"
"(0.6365141682948128,[('a',1 % 3),('c',2 % 3)])"
"(0.6365141682948128,[('a',2 % 3),('b',1 % 3)])"
"(0.6365141682948128,[('a',2 % 3),('c',1 % 3)])"
"(0.6365141682948128,[('b',1 % 3),('c',2 % 3)])"
"(0.6365141682948128,[('b',2 % 3),('c',1 % 3)])"
"(0.6931471805599453,[('a',1 % 2),('b',1 % 2)])"
"(0.6931471805599453,[('a',1 % 2),('c',1 % 2)])"
"(0.6931471805599453,[('b',1 % 2),('c',1 % 2)])"
"(0.8675632284814612,[('a',1 % 6),('b',1 % 6),('c',2 % 3)])"
"(0.8675632284814612,[('a',1 % 6),('b',2 % 3),('c',1 % 6)])"
"(0.8675632284814612,[('a',2 % 3),('b',1 % 6),('c',1 % 6)])"
"(1.0114042647073516,[('a',1 % 3),('b',1 % 2),('c',1 % 6)])"
"(1.0114042647073516,[('a',1 % 2),('b',1 % 3),('c',1 % 6)])"
"(1.0114042647073518,[('a',1 % 6),('b',1 % 3),('c',1 % 2)])"
"(1.0114042647073518,[('a',1 % 6),('b',1 % 2),('c',1 % 3)])"
"(1.0114042647073518,[('a',1 % 3),('b',1 % 6),('c',1 % 2)])"
"(1.0114042647073518,[('a',1 % 2),('b',1 % 6),('c',1 % 3)])"
"(1.0986122886681096,[('a',1 % 3),('b',1 % 3),('c',1 % 3)])"

top