# Aligned Induction

## Maximum Entropy

Python implementation of the Overview/Maximum Entropy

Let $X \subset \mathcal{X}$ be a finite set of micro-states, $0 < |X| < \infty$. Consider a system of $n$ distinguishable particles, each in a micro-state. The set of states of the system is the set of micro-state functions of particle identifier, $\{1 \ldots n\} :\to X$. The cardinality of the set of states is $|X|^n$.

def states(xx,n):
ll = [[]]
for i in range(1,n+1):
ll0 = []
for jj in ll:
for x in xx:
ll0.append(jj+[(i,x)])
ll = ll0
return ll

xx = sset(['a','b','c'])

xx
{'a', 'b', 'c'}

ss = states(xx,6)

len(ss)
729

3**6
729

rpln(ss[:10])
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'a')]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'b')]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'c')]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'b'), (6, 'a')]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'b'), (6, 'b')]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'b'), (6, 'c')]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'c'), (6, 'a')]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'c'), (6, 'b')]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'c'), (6, 'c')]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'b'), (5, 'a'), (6, 'a')]


Each state implies a distribution of particles over micro-states, $\begin{eqnarray} I &=& \big\{(R,\{(x,|C|) : (x,C) \in R^{-1}\}) : R \in \{1 \ldots n\} :\to X\big\} \end{eqnarray}$ That is, a state $R \in \{1 \ldots n\} :\to X$ has a particle distribution $I(R) \in X \to \{1 \ldots n\}$ such that $\sum_{x \in X} I(R)(x) = n$,

def inv(ll):
return slist(functionsInverse(sdict(ll)).items())

ii = [(slist(rr),slist([(x,len(cc)) for (x,cc) in inv(rr)])) for rr in ss]

rpln(ii[:10])
# ([(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'a')], [('a', 6)])
# ([(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'b')], [('a', 5), ('b', 1)])
# ([(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'c')], [('a', 5), ('c', 1)])
# ([(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'b'), (6, 'a')], [('a', 5), ('b', 1)])
# ([(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'b'), (6, 'b')], [('a', 4), ('b', 2)])
# ([(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'b'), (6, 'c')], [('a', 4), ('b', 1), ('c', 1)])
# ([(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'c'), (6, 'a')], [('a', 5), ('c', 1)])
# ([(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'c'), (6, 'b')], [('a', 4), ('b', 1), ('c', 1)])
# ([(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'c'), (6, 'c')], [('a', 4), ('c', 2)])
# ([(1, 'a'), (2, 'a'), (3, 'a'), (4, 'b'), (5, 'a'), (6, 'a')], [('a', 5), ('b', 1)])

sset([sum([c for (_,c) in nn]) for (_,nn) in ii])
# {6}


The cardinality of states for each particle distribution, $I(R)$, is the multinomial coefficient, $\begin{eqnarray} W &=& \{(N,|D|) : (N,D) \in I^{-1}\} &=& \{(N,\frac{n!}{\prod_{(x,\cdot) \in N} N_x!}) : (N,\cdot) \in I^{-1}\} \end{eqnarray}$ That is, there are $W(I(R))$ states that have the same particle distribution, $I(R)$, as state $R$,

ww = [(nn, len(dd)) for (nn,dd) in inv(ii)]

rpln(ww)
# ([('a', 1), ('b', 1), ('c', 4)], 30)
# ([('a', 1), ('b', 2), ('c', 3)], 60)
# ([('a', 1), ('b', 3), ('c', 2)], 60)
# ([('a', 1), ('b', 4), ('c', 1)], 30)
# ([('a', 1), ('b', 5)], 6)
# ([('a', 1), ('c', 5)], 6)
# ([('a', 2), ('b', 1), ('c', 3)], 60)
# ([('a', 2), ('b', 2), ('c', 2)], 90)
# ([('a', 2), ('b', 3), ('c', 1)], 60)
# ([('a', 2), ('b', 4)], 15)
# ([('a', 2), ('c', 4)], 15)
# ([('a', 3), ('b', 1), ('c', 2)], 60)
# ([('a', 3), ('b', 2), ('c', 1)], 60)
# ([('a', 3), ('b', 3)], 20)
# ([('a', 3), ('c', 3)], 20)
# ([('a', 4), ('b', 1), ('c', 1)], 30)
# ([('a', 4), ('b', 2)], 15)
# ([('a', 4), ('c', 2)], 15)
# ([('a', 5), ('b', 1)], 6)
# ([('a', 5), ('c', 1)], 6)
# ([('a', 6)], 1)
# ([('b', 1), ('c', 5)], 6)
# ([('b', 2), ('c', 4)], 15)
# ([('b', 3), ('c', 3)], 20)
# ([('b', 4), ('c', 2)], 15)
# ([('b', 5), ('c', 1)], 6)
# ([('b', 6)], 1)
# ([('c', 6)], 1)


The function combinationMultinomial is defined in AlignmentUtil,

combinationMultinomial :: Integer -> [Integer] -> Integer


So

def mult(n,nn):
return combinationMultinomial(n,[y for (_,y) in nn])

all([mult(6,nn) == d for (nn,d) in ww])
True


The normalisation of the state distribution over particle distributions is a probability function, $\hat{W} \in ((X \to \{1 \ldots n\}) \to \mathbf{Q}_{>0}) \cap \mathcal{P}$,

def norm(nn):
return [(i,ratio(r)) for (i,r) in relationsNormalise([(x,ratio(y)) for (x,y) in nn])]

rpln(norm(ww))
# ([('a', 1), ('b', 1), ('c', 4)], 10 % 243)
# ([('a', 1), ('b', 2), ('c', 3)], 20 % 243)
# ([('a', 1), ('b', 3), ('c', 2)], 20 % 243)
# ([('a', 1), ('b', 4), ('c', 1)], 10 % 243)
# ([('a', 1), ('b', 5)], 2 % 243)
# ([('a', 1), ('c', 5)], 2 % 243)
# ([('a', 2), ('b', 1), ('c', 3)], 20 % 243)
# ([('a', 2), ('b', 2), ('c', 2)], 10 % 81)
# ([('a', 2), ('b', 3), ('c', 1)], 20 % 243)
# ([('a', 2), ('b', 4)], 5 % 243)
# ([('a', 2), ('c', 4)], 5 % 243)
# ([('a', 3), ('b', 1), ('c', 2)], 20 % 243)
# ([('a', 3), ('b', 2), ('c', 1)], 20 % 243)
# ([('a', 3), ('b', 3)], 20 % 729)
# ([('a', 3), ('c', 3)], 20 % 729)
# ([('a', 4), ('b', 1), ('c', 1)], 10 % 243)
# ([('a', 4), ('b', 2)], 5 % 243)
# ([('a', 4), ('c', 2)], 5 % 243)
# ([('a', 5), ('b', 1)], 2 % 243)
# ([('a', 5), ('c', 1)], 2 % 243)
# ([('a', 6)], 1 % 729)
# ([('b', 1), ('c', 5)], 2 % 243)
# ([('b', 2), ('c', 4)], 5 % 243)
# ([('b', 3), ('c', 3)], 20 % 729)
# ([('b', 4), ('c', 2)], 5 % 243)
# ([('b', 5), ('c', 1)], 2 % 243)
# ([('b', 6)], 1 % 729)
# ([('c', 6)], 1 % 729)


In the case where the number of particles is large, $n \gg \ln n$, the logarithm of the multinomial coefficient of a particle distribution $N \in X \to \{1 \ldots n\}$ approximates to the scaled entropy, $\begin{eqnarray} \ln \frac{n!}{\prod_{(x,\cdot) \in N} N_x!} &\approx& n \times \mathrm{entropy}(N) \end{eqnarray}$ so the probability of the particle distribution varies with its entropy, $\hat{W}(N) \sim \mathrm{entropy}(N)$,

def ent(nn):
return entropy([y for (_,y) in nn])

ww.sort(key = lambda x: x[1])

rpln([(d,nn,ent(nn)) for (nn,d) in ww])
# (1, [('a', 6)], -0.0)
# (1, [('b', 6)], -0.0)
# (1, [('c', 6)], -0.0)
# (6, [('a', 1), ('b', 5)], 0.45056120886630463)
# (6, [('a', 1), ('c', 5)], 0.45056120886630463)
# (6, [('a', 5), ('b', 1)], 0.45056120886630463)
# (6, [('a', 5), ('c', 1)], 0.45056120886630463)
# (6, [('b', 1), ('c', 5)], 0.45056120886630463)
# (6, [('b', 5), ('c', 1)], 0.45056120886630463)
# (15, [('a', 2), ('b', 4)], 0.6365141682948128)
# (15, [('a', 2), ('c', 4)], 0.6365141682948128)
# (15, [('a', 4), ('b', 2)], 0.6365141682948128)
# (15, [('a', 4), ('c', 2)], 0.6365141682948128)
# (15, [('b', 2), ('c', 4)], 0.6365141682948128)
# (15, [('b', 4), ('c', 2)], 0.6365141682948128)
# (20, [('a', 3), ('b', 3)], 0.6931471805599453)
# (20, [('a', 3), ('c', 3)], 0.6931471805599453)
# (20, [('b', 3), ('c', 3)], 0.6931471805599453)
# (30, [('a', 1), ('b', 1), ('c', 4)], 0.8675632284814612)
# (30, [('a', 1), ('b', 4), ('c', 1)], 0.8675632284814612)
# (30, [('a', 4), ('b', 1), ('c', 1)], 0.8675632284814612)
# (60, [('a', 1), ('b', 2), ('c', 3)], 1.0114042647073518)
# (60, [('a', 1), ('b', 3), ('c', 2)], 1.0114042647073518)
# (60, [('a', 2), ('b', 1), ('c', 3)], 1.0114042647073518)
# (60, [('a', 2), ('b', 3), ('c', 1)], 1.0114042647073516)
# (60, [('a', 3), ('b', 1), ('c', 2)], 1.0114042647073518)
# (60, [('a', 3), ('b', 2), ('c', 1)], 1.0114042647073516)
# (90, [('a', 2), ('b', 2), ('c', 2)], 1.0986122886681096)


The least probable particle distributions are singletons, $\begin{eqnarray} \mathrm{mind}(W) &=& \big\{\{(x,n)\} : x \in X\big\} \end{eqnarray}$ because they have only one state, $\forall x \in X~(W(\{(x,n)\}) = 1)$. The entropy of a singleton distribution is zero, $\mathrm{entropy}(\{(x,n)\}) = 0$,

# (1, [('a', 6)], -0.0)
# (1, [('b', 6)], -0.0)
# (1, [('c', 6)], -0.0)


In the case where the number of particles per micro-state is integral, $n/|X| \in \mathbf{N}_{>0}$, the modal particle distribution is the uniform distribution, $\begin{eqnarray} \mathrm{maxd}(W) &=& \big\{\{(x,n/|X|) : x \in X\}\big\} \end{eqnarray}$ The entropy of the uniform distribution is maximised, $\mathrm{entropy}(\{(x,n/|X|) : x \in X\}) = \ln |X|$,

# (90, [('a', 2), ('b', 2), ('c', 2)], 1.0986122886681096)

log(3)
1.0986122886681098


The normalisation of a particle distribution $N \in X \to \{1 \ldots n\}$ is a micro-state probability function, $\hat{N} \in (X \to \mathbf{Q}_{\geq 0}) \cap \mathcal{P}$, which is independent of the number of particles, $\mathrm{sum}(\hat{N}) = 1$.

So in the case where a problem domain is parameterised by an unknown micro-state probability function otherwise arbitrarily chosen from a known subset $Q \subseteq (X \to \mathbf{Q}_{\geq 0}) \cap \mathcal{P}$, where the number of particles is known to be large, the maximum likelihood estimate $\tilde{P} \in Q$ is the probability function with the greatest entropy, $\forall P \in Q~(\mathrm{entropy}(\tilde{P}) \geq \mathrm{entropy}(P))$ or $\tilde{P} \in \mathrm{maxd}(\{(P,\mathrm{entropy}(P)) : P \in Q\})$,

rpln([(ent(nn), norm(nn)) for (nn,d) in ww])
# (-0.0, [('a', 1 % 1)])
# (-0.0, [('b', 1 % 1)])
# (-0.0, [('c', 1 % 1)])
# (0.45056120886630463, [('a', 1 % 6), ('b', 5 % 6)])
# (0.45056120886630463, [('a', 1 % 6), ('c', 5 % 6)])
# (0.45056120886630463, [('a', 5 % 6), ('b', 1 % 6)])
# (0.45056120886630463, [('a', 5 % 6), ('c', 1 % 6)])
# (0.45056120886630463, [('b', 1 % 6), ('c', 5 % 6)])
# (0.45056120886630463, [('b', 5 % 6), ('c', 1 % 6)])
# (0.6365141682948128, [('a', 1 % 3), ('b', 2 % 3)])
# (0.6365141682948128, [('a', 1 % 3), ('c', 2 % 3)])
# (0.6365141682948128, [('a', 2 % 3), ('b', 1 % 3)])
# (0.6365141682948128, [('a', 2 % 3), ('c', 1 % 3)])
# (0.6365141682948128, [('b', 1 % 3), ('c', 2 % 3)])
# (0.6365141682948128, [('b', 2 % 3), ('c', 1 % 3)])
# (0.6931471805599453, [('a', 1 % 2), ('b', 1 % 2)])
# (0.6931471805599453, [('a', 1 % 2), ('c', 1 % 2)])
# (0.6931471805599453, [('b', 1 % 2), ('c', 1 % 2)])
# (0.8675632284814612, [('a', 1 % 6), ('b', 1 % 6), ('c', 2 % 3)])
# (0.8675632284814612, [('a', 1 % 6), ('b', 2 % 3), ('c', 1 % 6)])
# (0.8675632284814612, [('a', 2 % 3), ('b', 1 % 6), ('c', 1 % 6)])
# (1.0114042647073518, [('a', 1 % 6), ('b', 1 % 3), ('c', 1 % 2)])
# (1.0114042647073518, [('a', 1 % 6), ('b', 1 % 2), ('c', 1 % 3)])
# (1.0114042647073518, [('a', 1 % 3), ('b', 1 % 6), ('c', 1 % 2)])
# (1.0114042647073516, [('a', 1 % 3), ('b', 1 % 2), ('c', 1 % 6)])
# (1.0114042647073518, [('a', 1 % 2), ('b', 1 % 6), ('c', 1 % 3)])
# (1.0114042647073516, [('a', 1 % 2), ('b', 1 % 3), ('c', 1 % 6)])
# (1.0986122886681096, [('a', 1 % 3), ('b', 1 % 3), ('c', 1 % 3)])


top