Entropy and alignment
Python implementation of the Overview/Entropy and alignment
Sections
Entropy
Alignment
Example - a weather forecast
Entropy
The entropy of a non-zero histogram $A \in \mathcal{A}$ is defined as the expected negative logarithm of the normalised counts, \[ \mathrm{entropy}(A) := -\sum_{S \in A^{\mathrm{FS}}} \hat{A}_S \ln \hat{A}_S \]
histogramsEntropy :: Histogram -> Double
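The definition can also be sketched directly in a few lines; this is illustrative only (entropy_sketch is a hypothetical name), assuming the aall and size helpers used elsewhere on this page,
from math import log
def entropy_sketch(aa):
    # -sum of p ln p over the normalised counts of the effective states
    z = float(size(aa))
    return -sum(float(c)/z * log(float(c)/z) for (ss,c) in aall(aa) if c > 0)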
For example, the entropy of the histogram of the deck of cards is $-\ln (1/52) = \ln 52$,
suit = VarStr("suit")
rank = VarStr("rank")
vv = sset([suit, rank])
[hearts,clubs,diamonds,spades] = map(ValStr, ["hearts","clubs","diamonds","spades"])
wws = sset([hearts, clubs, diamonds, spades])
[jack,queen,king,ace] = map(ValStr, ["J","Q","K","A"])
wwr = sset([jack,queen,king,ace] + [ValInt(i) for i in range(2,10+1)])
uu = listsSystem([(suit,wws), (rank,wwr)])
uu
# {(rank, {A, J, K, Q, 2, 3, 4, 5, 6, 7, 8, 9, 10}), (suit, {clubs, diamonds, hearts, spades})}
vv
# {rank, suit}
aa = unit(cart(uu,vv))
rpln(aall(aa))
# ({(rank, A), (suit, clubs)}, 1 % 1)
# ({(rank, A), (suit, diamonds)}, 1 % 1)
# ({(rank, A), (suit, hearts)}, 1 % 1)
# ({(rank, A), (suit, spades)}, 1 % 1)
# ({(rank, J), (suit, clubs)}, 1 % 1)
# ({(rank, J), (suit, diamonds)}, 1 % 1)
# ...
# ({(rank, 9), (suit, hearts)}, 1 % 1)
# ({(rank, 9), (suit, spades)}, 1 % 1)
# ({(rank, 10), (suit, clubs)}, 1 % 1)
# ({(rank, 10), (suit, diamonds)}, 1 % 1)
# ({(rank, 10), (suit, hearts)}, 1 % 1)
# ({(rank, 10), (suit, spades)}, 1 % 1)
ent = histogramsEntropy
ent(aa)
# 3.9512437185814298
-log(1/52)
# 3.951243718581427
The sized entropy is $z \times \mathrm{entropy}(A)$ where $z = \mathrm{size}(A)$,
z = size(aa)
z * ent(aa)
# 205.46467336623434
The entropy of a singleton is zero, $z \times \mathrm{entropy}(\{(\cdot,z)\}) = 0$,
ent(regsing(2,2))
# -0.0
Entropy is highest in cartesian histograms, which are uniform and have maximum effective volume. The maximum sized entropy is $z \times \mathrm{entropy}(V_z^{\mathrm{C}}) = z \ln v$ where $v = |V^{\mathrm{C}}|$ and $V_z^{\mathrm{C}} = \mathrm{scalar}(z/v) * V^{\mathrm{C}}$. The entropy of the histogram of the deck of cards is the maximum because it is uniform cartesian, $A = V_z^{\mathrm{C}}$,
v = vol(uu,vars(aa))
z * log(v)
# 205.46467336623422
z * ent(aa)
# 205.46467336623434
Given a histogram $A$ and a set of query variables $K \subset V$, the scaled label entropy is the degree to which the histogram is ambiguous or non-causal in the query variables, $K$. It is the sum of the sized entropies of the contingent slices reduced to the label variables, $V \setminus K$, \[ \sum_{R \in (A\%K)^{\mathrm{FS}}} (A\%K)_R \times \mathrm{entropy}(A * \{R\}^{\mathrm{U}}~\%~(V \setminus K)) \]
The scaled label entropy is also known as the scaled query conditional entropy, \[ \begin{eqnarray} &&\sum_{R \in (A\%K)^{\mathrm{FS}}} (A\%K)_R \times \mathrm{entropy}(A * \{R\}^{\mathrm{U}}~\%~(V \setminus K)) \\ &=&-\sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A_S}{(A\%K * V^{\mathrm{C}})_S} \\ &=&-\sum_{S \in A^{\mathrm{FS}}} A_S \ln (A/(A\%K))_S \\ &=&z \times \mathrm{entropy}(A) - z \times \mathrm{entropy}(A\%K) \end{eqnarray} \] When the histogram, $A$, is causal in the query variables, $\mathrm{split}(K,A^{\mathrm{FS}}) \in K^{\mathrm{CS}} \to (V \setminus K)^{\mathrm{CS}}$, the label entropy is zero because each slice is an effective singleton, $\forall R \in (A\%K)^{\mathrm{FS}}~(|A^{\mathrm{F}} * \{R\}^{\mathrm{U}}|=1)$. In this case the label state is unique for every effective query state. By contrast, when the label variables are independent of the query variables, $A = Z * \hat{A}\%K * \hat{A}\%(V \setminus K)$, the label entropy is maximised. To calculate the label entropy given the query variables, $K$, and a histogram, $A$,
setVarsHistogramsSliceEntropy :: Set.Set Variable -> Histogram -> Double
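The slice form of the definition can also be sketched directly; this is illustrative only (lent_slices is a hypothetical name), assuming the mul, unit, ared, vars, sset and aall helpers used elsewhere on this page,
def lent_slices(kk, aa):
    # sum over the effective query states R of (A%K)_R times the entropy
    # of the contingent slice A * {R}^U reduced to the label variables V \ K
    kk = sset(kk)
    ll = sset([v for v in vars(aa) if v not in kk])
    return sum(float(c) * ent(ared(mul(aa, unit(sset([rr]))), ll))
               for (rr, c) in aall(ared(aa, kk)) if c > 0)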
For example, the deck of cards histogram, $A$, is cartesian so the label entropy is always maximised,
def lent(kk,aa):
    return size(aa) * (ent(aa) - ent(ared(aa,sset(kk))))
lent([suit, rank],aa)
# 0.0
lent([],aa)
# 205.46467336623434
lent([suit],aa)
# 133.37736658800003
lent([rank],aa)
# 72.08730677823439
This may be compared to the causal histogram relating suit and colour which has zero label entropy,
colour = VarStr("colour")
red = ValStr("red")
black = ValStr("black")
bb = llaa([(llss([(suit, u),(colour, w)]),1) for (u,w) in [(hearts, red), (clubs, black), (diamonds, red), (spades, black)]])
ssplit = setVarsSetStatesSplit
rpln(ssplit(sset([suit]),states(eff(bb))))
# ({(suit, clubs)}, {(colour, black)})
# ({(suit, diamonds)}, {(colour, red)})
# ({(suit, hearts)}, {(colour, red)})
# ({(suit, spades)}, {(colour, black)})
lent([suit],bb)
# 0.0
Consider another example. The union of slices of a regular singleton, a regular diagonal and a regular cartesian has zero label entropy in the singleton slice and maximum label entropy in the cartesian slice,
ss = mul(cdaa([[1]]),cdtp(regsing(2,2),[2,3]))
dd = mul(cdaa([[2]]),cdtp(regdiag(2,2),[2,3]))
cc = mul(cdaa([[3]]),cdtp(regcart(2,2),[2,3]))
ff = add(add(ss,dd),cc)
rpln(aall(ff))
# ({(1, 1), (2, 1), (3, 1)}, 1 % 1)
# ({(1, 2), (2, 1), (3, 1)}, 1 % 1)
# ({(1, 2), (2, 2), (3, 2)}, 1 % 1)
# ({(1, 3), (2, 1), (3, 1)}, 1 % 1)
# ({(1, 3), (2, 1), (3, 2)}, 1 % 1)
# ({(1, 3), (2, 2), (3, 1)}, 1 % 1)
# ({(1, 3), (2, 2), (3, 2)}, 1 % 1)
vk = sset(map(VarInt,[2,3]))
ff1 = ared(mul(ff,cdaa([[1]])),vk)
rpln(aall(ff1))
# ({(2, 1), (3, 1)}, 1 % 1)
size(ff1) * ent(ff1)
# -0.0
ff2 = ared(mul(ff,cdaa([[2]])),vk)
rpln(aall(ff2))
# ({(2, 1), (3, 1)}, 1 % 1)
# ({(2, 2), (3, 2)}, 1 % 1)
size(ff2) * ent(ff2)
# 1.3862943611198906
ff3 = ared(mul(ff,cdaa([[3]])),vk)
rpln(aall(ff3))
# ({(2, 1), (3, 1)}, 1 % 1)
# ({(2, 1), (3, 2)}, 1 % 1)
# ({(2, 2), (3, 1)}, 1 % 1)
# ({(2, 2), (3, 2)}, 1 % 1)
size(ff3) * ent(ff3)
# 5.545177444479562
lent([VarInt(1)],ff)
# 6.931471805599451
The multinomial coefficient of a non-zero integral histogram $A \in \mathcal{A}_{\mathrm{i}}$ is \[ \frac{z!}{\prod_{S \in A^{\mathrm{S}}} A_S!} \] where $z = \mathrm{size}(A) > 0$. In the case where the histogram is non-integral the multinomial coefficient is defined by the unit translated gamma function, $\Gamma_!(x) := \Gamma(x+1)$, \[ \frac{\Gamma_! z}{\prod_{S \in A^{\mathrm{S}}} \Gamma_! A_S} \]
combinationMultinomial :: Integer -> [Integer] -> Integer
histogramsMultinomialLog :: Histogram -> Double
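The gamma form of the log multinomial coefficient can be sketched as follows; this is illustrative only (multln_sketch is a hypothetical name), assuming gammaln from scipy.special and the aall and size helpers,
from scipy.special import gammaln
def multln_sketch(aa):
    # ln Gamma_!(z) minus the sum of ln Gamma_!(A_S), where gammaln(x+1) = ln Gamma_!(x)
    z = float(size(aa))
    return gammaln(z + 1) - sum(gammaln(float(c) + 1) for (ss,c) in aall(aa))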
For example,
mult = combinationMultinomial
log(mult(52,[1]*52))
# 156.3608363030788
multln = histogramsMultinomialLog
multln(aa)
# 156.3608363030788
In the case where the counts are not small, $z \gg \ln z$, the logarithm of the multinomial coefficient approximates to the sized entropy, \[ \begin{eqnarray} \ln \frac{z!}{\prod_{S \in A^{\mathrm{S}}} A_S!} &\approx& z \times \mathrm{entropy}(A) \end{eqnarray} \]
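A brief sketch of why this holds, using Stirling's approximation, $\ln x! \approx x \ln x - x$, and $\sum_{S \in A^{\mathrm{S}}} A_S = z$, \[ \begin{eqnarray} \ln \frac{z!}{\prod_{S \in A^{\mathrm{S}}} A_S!} &\approx& (z \ln z - z) - \sum_{S \in A^{\mathrm{S}}} (A_S \ln A_S - A_S) \\ &=& -\sum_{S \in A^{\mathrm{S}}} A_S \ln \frac{A_S}{z} \\ &=& z \times \mathrm{entropy}(A) \end{eqnarray} \]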
multln(aa)
# 156.3608363030788
52 * ent(aa)
# 205.46467336623434
multln(mul(scalar(100),aa))
# 20384.101936383322
100 * 52 * ent(mul(scalar(100),aa))
# 20546.467336623435
So the entropy, $\mathrm{entropy}(A)$, is a measure of the probability of the histogram of a randomly chosen history. Singleton histograms are least probable and uniform histograms are most probable.
The sized relative entropy between a histogram and its independent is the sized mutual entropy, \[ \sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A_S}{A^{\mathrm{X}}_S} \] It can be shown that the size scaled expected logarithm of the independent with respect to the histogram equals the size scaled expected logarithm of the independent with respect to the independent, \[ \sum_{S \in A^{\mathrm{FS}}} A_S \ln A^{\mathrm{X}}_S = \sum_{S \in A^{\mathrm{XFS}}} A^{\mathrm{X}}_S \ln A^{\mathrm{X}}_S \] so the sized mutual entropy is the difference between the sized independent entropy and the sized histogram entropy, \[ \begin{eqnarray} \sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A_S}{A^{\mathrm{X}}_S} &=& z \times \mathrm{entropy}(A^{\mathrm{X}}) - z \times \mathrm{entropy}(A) \end{eqnarray} \] For example, consider the sized mutual entropy of the scaled sum of a regular singleton, a regular diagonal and a regular cartesian,
aa = norm(regsing(3,2))
aa = add(aa,norm(regdiag(3,2)))
aa = add(aa,norm(regcart(3,2)))
aa = resize(100,aa)
rpln(aall(aa))
# ({(1, 1), (2, 1)}, 1300 % 27)
# ({(1, 1), (2, 2)}, 100 % 27)
# ({(1, 1), (2, 3)}, 100 % 27)
# ({(1, 2), (2, 1)}, 100 % 27)
# ({(1, 2), (2, 2)}, 400 % 27)
# ({(1, 2), (2, 3)}, 100 % 27)
# ({(1, 3), (2, 1)}, 100 % 27)
# ({(1, 3), (2, 2)}, 100 % 27)
# ({(1, 3), (2, 3)}, 400 % 27)
rpln(aall(ind(aa)))
# ({(1, 1), (2, 1)}, 2500 % 81)
# ({(1, 1), (2, 2)}, 1000 % 81)
# ({(1, 1), (2, 3)}, 1000 % 81)
# ({(1, 2), (2, 1)}, 1000 % 81)
# ({(1, 2), (2, 2)}, 400 % 81)
# ({(1, 2), (2, 3)}, 400 % 81)
# ({(1, 3), (2, 1)}, 1000 % 81)
# ({(1, 3), (2, 2)}, 400 % 81)
# ({(1, 3), (2, 3)}, 400 % 81)
aa == ind(aa)
# False
z = size(aa)
z * sum([a * log(a/x) for ((ss,a),(tt,x)) in zip(aall(norm(aa)),aall(norm(ind(aa))))])
# 33.99466156865321
z * ent(ind(aa)) - z * ent(aa)
# 33.994661568653214
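The intermediate identity above, that the size scaled expected logarithm of the independent is the same whether taken with respect to the histogram or to the independent, can also be checked for this example, following the same zip idiom and assuming both calls to aall list the states in the same order,
sum([float(a) * log(float(x)) for ((ss,a),(tt,x)) in zip(aall(aa),aall(ind(aa)))])
sum([float(x) * log(float(x)) for (tt,x) in aall(ind(aa))])
The two sums agree up to floating point rounding.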
The sized mutual entropy can be viewed as a measure of the probability of the independent, $A^{\mathrm{X}}$, relative to the histogram, $A$, given arbitrary substrate history. Equivalently, sized mutual entropy can be viewed as a measure of the surprisal of the histogram, $A$, relative to the independent, $A^{\mathrm{X}}$. That is, sized mutual entropy is a measure of the dependency between the variables in the histogram, $A$. Consider the sized mutual entropy of the scaled sum where a diagonal replaces the singleton,
aa = norm(regdiag(3,2))
aa = add(aa,norm(regdiag(3,2)))
aa = add(aa,norm(regcart(3,2)))
aa = resize(100,aa)
rpln(aall(aa))
# ({(1, 1), (2, 1)}, 700 % 27)
# ({(1, 1), (2, 2)}, 100 % 27)
# ({(1, 1), (2, 3)}, 100 % 27)
# ({(1, 2), (2, 1)}, 100 % 27)
# ({(1, 2), (2, 2)}, 700 % 27)
# ({(1, 2), (2, 3)}, 100 % 27)
# ({(1, 3), (2, 1)}, 100 % 27)
# ({(1, 3), (2, 2)}, 100 % 27)
# ({(1, 3), (2, 3)}, 700 % 27)
z * ent(ind(aa)) - z * ent(aa)
# 41.48733828193562
The sized mutual entropy has increased because the replacement of the singleton with a diagonal increases the dependency between the variables. Now compare to the sized mutual entropy of just the scaled regular diagonal where the dependency is greater still,
aa = resize(100,norm(regdiag(3,2)))
rpln(aall(aa))
# ({(1, 1), (2, 1)}, 100 % 3)
# ({(1, 2), (2, 2)}, 100 % 3)
# ({(1, 3), (2, 3)}, 100 % 3)
z * ent(ind(aa)) - z * ent(aa)
# 109.86122886681099
By comparison, the sized mutual entropy of the regular cartesian is zero, because the cartesian is independent and so there is no dependency between the variables,
aa = resize(100,norm(regcart(3,2)))
rpln(aall(aa))
# ({(1, 1), (2, 1)}, 100 % 9)
# ({(1, 1), (2, 2)}, 100 % 9)
# ({(1, 1), (2, 3)}, 100 % 9)
# ({(1, 2), (2, 1)}, 100 % 9)
# ({(1, 2), (2, 2)}, 100 % 9)
# ({(1, 2), (2, 3)}, 100 % 9)
# ({(1, 3), (2, 1)}, 100 % 9)
# ({(1, 3), (2, 2)}, 100 % 9)
# ({(1, 3), (2, 3)}, 100 % 9)
aa == ind(aa)
# True
z * ent(ind(aa)) - z * ent(aa)
# 0.0
Similarly, the sized mutual entropy of the regular singleton is zero, because the singleton is independent.
The sized mutual entropy is the sized relative entropy so it is always positive, \[ \begin{eqnarray} z \times \mathrm{entropy}(A^{\mathrm{X}}) - z \times \mathrm{entropy}(A) &\geq& 0 \end{eqnarray} \] and so the independent entropy is always greater than or equal to the histogram entropy \[ \begin{eqnarray} \mathrm{entropy}(A^{\mathrm{X}}) &\geq& \mathrm{entropy}(A) \end{eqnarray} \]
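A brief sketch of the standard argument for this non-negativity, using $\ln x \leq x - 1$, $\sum_{S \in A^{\mathrm{FS}}} A_S = z$ and $\sum_{S \in A^{\mathrm{FS}}} A^{\mathrm{X}}_S \leq \mathrm{size}(A^{\mathrm{X}}) = z$, is \[ \begin{eqnarray} \sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A_S}{A^{\mathrm{X}}_S} &=& -\sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A^{\mathrm{X}}_S}{A_S} \\ &\geq& -\sum_{S \in A^{\mathrm{FS}}} A_S \left(\frac{A^{\mathrm{X}}_S}{A_S} - 1\right) \\ &=& \sum_{S \in A^{\mathrm{FS}}} A_S - \sum_{S \in A^{\mathrm{FS}}} A^{\mathrm{X}}_S ~\geq~ 0 \end{eqnarray} \]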
Alignment
The alignment of a histogram $A \in \mathcal{A}$ is defined \[ \begin{eqnarray} \mathrm{algn}(A) &:=& \sum_{S \in A^{\mathrm{S}}} \ln \Gamma_! A_S - \sum_{S \in A^{\mathrm{XS}}} \ln \Gamma_! A^{\mathrm{X}}_S \end{eqnarray} \] where $\Gamma_!$ is the unit translated gamma function.
In the case where both the histogram and its independent are integral, $A,A^{\mathrm{X}} \in \mathcal{A}_{\mathrm{i}}$, then the alignment is the difference between the sum log-factorial counts of the histogram and its independent, \[ \begin{eqnarray} \mathrm{algn}(A) &=& \sum_{S \in A^{\mathrm{S}}} \ln A_S! - \sum_{S \in A^{\mathrm{XS}}} \ln A^{\mathrm{X}}_S! \end{eqnarray} \]
histogramsAlignment :: Histogram -> Double
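The definition can be sketched directly as well; this is illustrative only (algn_sketch is a hypothetical name), assuming gammaln from scipy.special and the aall and ind helpers used elsewhere on this page,
from scipy.special import gammaln
def algn_sketch(aa):
    # sum of ln Gamma_!(A_S) minus sum of ln Gamma_!(A^X_S)
    return (sum(gammaln(float(c) + 1) for (ss,c) in aall(aa))
            - sum(gammaln(float(c) + 1) for (ss,c) in aall(ind(aa))))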
For example, consider the alignment of the scaled sum of a regular singleton, a regular diagonal and a regular cartesian,
algn = histogramsAlignment
aa = norm(regsing(3,2))
aa = add(aa,norm(regdiag(3,2)))
aa = add(aa,norm(regcart(3,2)))
aa = resize(100,aa)
rpln(aall(aa))
# ({(1, 1), (2, 1)}, 1300 % 27)
# ({(1, 1), (2, 2)}, 100 % 27)
# ({(1, 1), (2, 3)}, 100 % 27)
# ({(1, 2), (2, 1)}, 100 % 27)
# ({(1, 2), (2, 2)}, 400 % 27)
# ({(1, 2), (2, 3)}, 100 % 27)
# ({(1, 3), (2, 1)}, 100 % 27)
# ({(1, 3), (2, 2)}, 100 % 27)
# ({(1, 3), (2, 3)}, 400 % 27)
algn(aa)
# 32.67054377025386
Alignment is the logarithm of the ratio of the independent multinomial coefficient to the multinomial coefficient, \[ \begin{eqnarray} \mathrm{algn}(A) &=& \ln \left(\frac{z!}{\prod_{S \in A^{\mathrm{XS}}} A^{\mathrm{X}}_S!}~/~\frac{z!}{\prod_{S \in A^{\mathrm{S}}} A_S!}\right) \end{eqnarray} \]
multln(ind(aa)) - multln(aa)
# 32.67054377025386
So alignment is the logarithm of the probability of the independent, $A^{\mathrm{X}}$, relative to the histogram, $A$. Equivalently, alignment is the logarithm of the surprisal of the histogram, $A$, relative to the independent, $A^{\mathrm{X}}$. Alignment is a measure of the dependency between the variables in the histogram, $A$.
Alignment is approximately equal to the sized mutual entropy, \[ \begin{eqnarray} \mathrm{algn}(A) &\approx& z \times \mathrm{entropy}(A^{\mathrm{X}}) - z \times \mathrm{entropy}(A)\\ &=& \sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A_S}{A^{\mathrm{X}}_S} \end{eqnarray} \]
z = size(aa)
z * ent(ind(aa)) - z * ent(aa)
# 33.994661568653214
Consider the alignment of the scaled sum where a diagonal replaces the singleton,
aa = norm(regdiag(3,2))
aa = add(aa,norm(regdiag(3,2)))
aa = add(aa,norm(regcart(3,2)))
aa = resize(100,aa)
rpln(aall(aa))
# ({(1, 1), (2, 1)}, 700 % 27)
# ({(1, 1), (2, 2)}, 100 % 27)
# ({(1, 1), (2, 3)}, 100 % 27)
# ({(1, 2), (2, 1)}, 100 % 27)
# ({(1, 2), (2, 2)}, 700 % 27)
# ({(1, 2), (2, 3)}, 100 % 27)
# ({(1, 3), (2, 1)}, 100 % 27)
# ({(1, 3), (2, 2)}, 100 % 27)
# ({(1, 3), (2, 3)}, 700 % 27)
algn(aa)
# 39.539287211260046
z * ent(ind(aa)) - z * ent(aa)
# 41.48733828193562
The alignment has increased because the replacement of the singleton with a diagonal increases the dependency between the variables. Now compare to the alignment of just the scaled regular diagonal where the dependency is greater still,
aa = norm(regdiag(3,2))
aa = resize(100,aa)
rpln(aall(aa))
# ({(1, 1), (2, 1)}, 100 % 3)
# ({(1, 2), (2, 2)}, 100 % 3)
# ({(1, 3), (2, 3)}, 100 % 3)
algn(aa)
# 98.71169723276279
z * ent(ind(aa)) - z * ent(aa)
# 109.86122886681099
By comparison, the alignment of the regular cartesian is zero, because the cartesian is independent and so there is no dependency between the variables,
aa = norm(regcart(3,2))
aa = resize(100,aa)
rpln(aall(aa))
# ({(1, 1), (2, 1)}, 100 % 9)
# ({(1, 1), (2, 2)}, 100 % 9)
# ({(1, 1), (2, 3)}, 100 % 9)
# ({(1, 2), (2, 1)}, 100 % 9)
# ({(1, 2), (2, 2)}, 100 % 9)
# ({(1, 2), (2, 3)}, 100 % 9)
# ({(1, 3), (2, 1)}, 100 % 9)
# ({(1, 3), (2, 2)}, 100 % 9)
# ({(1, 3), (2, 3)}, 100 % 9)
aa == ind(aa)
# True
algn(aa)
# 0.0
z * ent(ind(aa)) - z * ent(aa)
# 0.0
Similarly, the alignment of the regular singleton is zero, because the singleton is independent.
The alignment of an independent histogram, $A = A^{\mathrm{X}}$, is zero. In particular, scalar histograms, $V=\emptyset$, mono-variate histograms, $|V|=1$, uniform cartesian histograms, $A = V_z^{\mathrm{C}}$, and effective singleton histograms, $|A^{\mathrm{F}}| =1$, all have zero alignment,
algn(scalar(100))
# 0.0
algn(resize(100,regdiag(3,1)))
# 0.0
algn(resize(100,regcart(3,3)))
# 0.0
algn(resize(100,regsing(3,3)))
# 0.0
The maximum alignment of a histogram $A$ occurs when the histogram is both uniform and fully diagonalised. No pair of effective states shares any value, $\forall S,T \in A^{\mathrm{FS}}~(S \neq T \implies S \cap T = \emptyset)$, and all counts are equal along the diagonal, $\forall S,T \in A^{\mathrm{FS}}~(A_S = A_T)$,
algn(resize(100,regdiag(3,2)))
# 98.71169723276279
The maximum alignment of a regular histogram with dimension $n =|V|$ and valency $d$ is \[ d \ln \Gamma_! \frac{z}{d}~-~d^n \ln \Gamma_! \frac{z}{d^n} \]
from scipy.special import gammaln
def facln(x):
    return gammaln(float(x) + 1)
d = 3
n = 2
d * facln(z/d) - (d**n) * facln(z/(d**n))
# 98.71169723276279
The maximum alignment is approximately $z \ln d^{n-1} = z \ln v/d$, where $v = d^n$,
z * log(d**(n-1))
# 109.86122886681098
Example - a weather forecast
Some of the concepts above regarding entropy and alignment can be demonstrated with the sample of some weather measurements created in States, histories and histograms,
def lluu(ll):
    return listsSystem([(v,sset(ww)) for (v,ww) in ll])
def llhh(vv,ev):
    return listsHistory([(IdInt(i), llss(zip(vv,ll))) for (i,ll) in ev])
def red(aa,ll):
    return setVarsHistogramsReduce(sset(ll),aa)
def ssplit(ll,aa):
    return setVarsSetStatesSplit(sset(ll),states(aa))
[pressure,cloud,wind,rain] = map(VarStr,["pressure","cloud","wind","rain"])
[low,medium,high,none,light,heavy,strong] = map(ValStr,["low","medium","high","none","light","heavy","strong"])
uu = lluu([
(pressure, [low,medium,high]),
(cloud, [none,light,heavy]),
(wind, [none,light,strong]),
(rain, [none,light,heavy])])
vv = uvars(uu)
hh = llhh([pressure,cloud,wind,rain],[
(1,[high,none,none,none]),
(2,[medium,light,none,light]),
(3,[high,none,light,none]),
(4,[low,heavy,strong,heavy]),
(5,[low,none,light,light]),
(6,[medium,none,light,light]),
(7,[low,heavy,light,heavy]),
(8,[high,none,light,none]),
(9,[medium,light,strong,heavy]),
(10,[medium,light,light,light]),
(11,[high,light,light,heavy]),
(12,[medium,none,none,none]),
(13,[medium,light,none,none]),
(14,[high,light,strong,light]),
(15,[medium,none,light,light]),
(16,[low,heavy,strong,heavy]),
(17,[low,heavy,light,heavy]),
(18,[high,none,none,none]),
(19,[low,light,none,light]),
(20,[high,none,none,none])])
aa = hhaa(hh)
uu
# {(cloud, {heavy, light, none}), (pressure, {high, low, medium}), (rain, {heavy, light, none}), (wind, {light, none, strong})}
vv
# {cloud, pressure, rain, wind}
rpln(aall(aa))
# ({(cloud, heavy), (pressure, low), (rain, heavy), (wind, light)}, 2 % 1)
# ({(cloud, heavy), (pressure, low), (rain, heavy), (wind, strong)}, 2 % 1)
# ({(cloud, light), (pressure, high), (rain, heavy), (wind, light)}, 1 % 1)
# ({(cloud, light), (pressure, high), (rain, light), (wind, strong)}, 1 % 1)
# ({(cloud, light), (pressure, low), (rain, light), (wind, none)}, 1 % 1)
# ({(cloud, light), (pressure, medium), (rain, heavy), (wind, strong)}, 1 % 1)
# ({(cloud, light), (pressure, medium), (rain, light), (wind, light)}, 1 % 1)
# ({(cloud, light), (pressure, medium), (rain, light), (wind, none)}, 1 % 1)
# ({(cloud, light), (pressure, medium), (rain, none), (wind, none)}, 1 % 1)
# ({(cloud, none), (pressure, high), (rain, none), (wind, light)}, 2 % 1)
# ({(cloud, none), (pressure, high), (rain, none), (wind, none)}, 3 % 1)
# ({(cloud, none), (pressure, low), (rain, light), (wind, light)}, 1 % 1)
# ({(cloud, none), (pressure, medium), (rain, light), (wind, light)}, 2 % 1)
# ({(cloud, none), (pressure, medium), (rain, none), (wind, none)}, 1 % 1)
size(aa)
# 20 % 1
The sized entropy, $z \times \mathrm{entropy}(A)$, is
z = size(aa)
z * ent(aa)
# 51.07363116059592
The sized independent entropy, $z \times \mathrm{entropy}(A^{\mathrm{X}})$, is
z * ent(ind(aa))
# 85.78884471224839
Compare that to the maximum sized entropy, $z \times \mathrm{entropy}(V_z^{\mathrm{C}}) = z \ln v$,
v = vol(uu,vv)
z * log(v)
# 87.88898309344879
The multinomial coefficient of the histogram, $A$, is \[ \frac{z!}{\prod_{S \in A^{\mathrm{S}}} A_S!} \]
multln = histogramsMultinomialLog
multln(aa)
# 37.77126826928565
multln(ind(aa))
# 49.62212054431038
multln(resize(20,unit(cart(uu,vv))))
# 50.23830936717706
The sized mutual entropy is \[ \sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A_S}{A^{\mathrm{X}}_S} \]
z * ent(ind(aa)) - z * ent(aa)
# 34.71521355165247
The alignment of the histogram, $A$, is \[ \begin{eqnarray} \mathrm{algn}(A) &:=& \sum_{S \in A^{\mathrm{S}}} \ln \Gamma_! A_S - \sum_{S \in A^{\mathrm{XS}}} \ln \Gamma_! A^{\mathrm{X}}_S \end{eqnarray} \]
algn(aa)
# 11.85085227502473
Note that in this case where the counts are small, the sized mutual entropy differs considerably from the alignment. The relative difference is smaller for a scaled histogram, for example,
aa1 = mul(scalar(100),aa)
z1 = 100 * z
z1 * ent(ind(aa1)) - z1 * ent(aa1)
# 3471.521355165247
algn(aa1)
# 3318.508094475117
The histogram, $A$, happens to be a regular histogram of dimension $n = |V| = 4$ and valency $\{d\} = \{|U_w| : w \in V\} = \{3\}$, so the maximum alignment is that of a regular diagonal,
def facln(x):
    return gammaln(float(x) + 1)
d = 3
n = 4
d * facln(z/d) - (d**n) * facln(z/(d**n))
# 31.485060233929005
z * log(d**(n-1))
# 65.91673732008658
Here are the alignments of various reductions,
algn(red(aa,[pressure,rain]))
# 4.278766678519384
algn(red(aa,[pressure,cloud]))
# 4.6232784937782885
algn(red(aa,[pressure,wind]))
# 0.646716967212571
algn(red(aa,[cloud,rain]))
# 6.4150379630063465
algn(red(aa,[cloud,wind]))
# 2.7673350044725016
algn(red(aa,[wind,rain]))
# 3.930131313218345
algn(red(aa,[cloud,wind,rain]))
# 8.935048311238008
These alignments may be contrasted with the label entropies,
def lent(kk,aa):
    return size(aa) * (ent(aa) - ent(ared(aa,sset(kk))))
lent([pressure],red(aa,[pressure,rain]))
# 16.083165728773302
lent([pressure],red(aa,[pressure,cloud]))
# 14.173623223888864
lent([pressure],red(aa,[pressure,wind]))
# 20.127820211001172
lent([cloud],red(aa,[cloud,rain]))
# 12.418526752441053
lent([cloud],red(aa,[cloud,wind]))
# 16.508188366758773
lent([wind],red(aa,[wind,rain]))
# 15.984940222994219
lent([cloud,wind],red(aa,[cloud,wind,rain]))
# 8.047189562170498
The weather forecast example continues in Transforms.