Entropy and alignment

Python implementation of the Overview/Entropy and alignment

Sections

Entropy

Alignment

Example - a weather forecast

Entropy

The entropy of a non-zero histogram $A \in \mathcal{A}$ is defined as the expected negative logarithm of the normalised counts, \[ \mathrm{entropy}(A) := -\sum_{S \in A^{\mathrm{FS}}} \hat{A}_S \ln \hat{A}_S \]

histogramsEntropy :: Histogram -> Double

For example, the entropy of the histogram of the deck of cards is $-\ln 1/52$,

suit = VarStr("suit")
rank = VarStr("rank")
vv = sset([suit, rank])
[hearts,clubs,diamonds,spades] = map(ValStr, ["hearts","clubs","diamonds","spades"])
wws = sset([hearts, clubs, diamonds, spades])
[jack,queen,king,ace] = map(ValStr, ["J","Q","K","A"])
wwr = sset([jack,queen,king,ace] + [ValInt(i) for i in range(2,10+1)])
uu = listsSystem([(suit,wws), (rank,wwr)])

uu
# {(rank, {A, J, K, Q, 2, 3, 4, 5, 6, 7, 8, 9, 10}), (suit, {clubs, diamonds, hearts, spades})}

vv
# {rank, suit}

aa = unit(cart(uu,vv))

rpln(aall(aa))
# ({(rank, A), (suit, clubs)}, 1 % 1)
# ({(rank, A), (suit, diamonds)}, 1 % 1)
# ({(rank, A), (suit, hearts)}, 1 % 1)
# ({(rank, A), (suit, spades)}, 1 % 1)
# ({(rank, J), (suit, clubs)}, 1 % 1)
# ({(rank, J), (suit, diamonds)}, 1 % 1)
# ...
# ({(rank, 9), (suit, hearts)}, 1 % 1)
# ({(rank, 9), (suit, spades)}, 1 % 1)
# ({(rank, 10), (suit, clubs)}, 1 % 1)
# ({(rank, 10), (suit, diamonds)}, 1 % 1)
# ({(rank, 10), (suit, hearts)}, 1 % 1)
# ({(rank, 10), (suit, spades)}, 1 % 1)

ent = histogramsEntropy 

ent(aa)
# 3.9512437185814298

- log (1/52)
# 3.951243718581427
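
As a cross-check, the entropy can be computed directly from the definition by normalising the counts; a minimal sketch using only the library functions already introduced (the output is approximate),

- sum([c/size(aa) * log(c/size(aa)) for (ss,c) in aall(aa)])
# approximately 3.9512, agreeing with ent(aa)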

The sized entropy is $z \times \mathrm{entropy}(A)$ where $z = \mathrm{size}(A)$,

z = size(aa)

z * ent(aa)
# 205.46467336623434

The entropy of a singleton is zero, $z \times \mathrm{entropy}(\{(\cdot,z)\}) = 0$,

ent(regsing(2,2))
# -0.0

Entropy is highest in cartesian histograms, which are uniform and have maximum effective volume. The maximum sized entropy is $z \times \mathrm{entropy}(V_z^{\mathrm{C}}) = z \ln v$ where $v = |V^{\mathrm{C}}|$ and $V_z^{\mathrm{C}} = \mathrm{scalar}(z/v) * V^{\mathrm{C}}$. The entropy of the histogram of the deck of cards is the maximum because it is uniform cartesian, $A = V_z^{\mathrm{C}}$,

v = vol(uu,vars(aa))

z * log(v)
# 205.46467336623422

z * ent(aa)
# 205.46467336623434

Given a histogram $A$ and a set of query variables $K \subset V$, the scaled label entropy is the degree to which the histogram is ambiguous or non-causal in the query variables, $K$. It is the sum of the sized entropies of the contingent slices reduced to the label variables, $V \setminus K$, \[ \sum_{R \in (A\%K)^{\mathrm{FS}}} (A\%K)_R \times \mathrm{entropy}(A * \{R\}^{\mathrm{U}}~\%~(V \setminus K)) \]

The scaled label entropy is also known as the scaled query conditional entropy, \[ \begin{eqnarray} &&\sum_{R \in (A\%K)^{\mathrm{FS}}} (A\%K)_R \times \mathrm{entropy}(A * \{R\}^{\mathrm{U}}~\%~(V \setminus K)) \\ &=&-\sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A_S}{(A\%K * V^{\mathrm{C}})_S} \\ &=&-\sum_{S \in A^{\mathrm{FS}}} A_S \ln (A/(A\%K))_S \\ &=&z \times \mathrm{entropy}(A) - z \times \mathrm{entropy}(A\%K) \end{eqnarray} \] When the histogram, $A$, is causal in the query variables, $\mathrm{split}(K,A^{\mathrm{FS}}) \in K^{\mathrm{CS}} \to (V \setminus K)^{\mathrm{CS}}$, the label entropy is zero because each slice is an effective singleton, $\forall R \in (A\%K)^{\mathrm{FS}}~(|A^{\mathrm{F}} * \{R\}^{\mathrm{U}}|=1)$. In this case the label state is unique for every effective query state. By contrast, when the label variables are independent of the query variables, $A = Z * \hat{A}\%K * \hat{A}\%(V \setminus K)$, the label entropy is maximised. To calculate the label entropy given the query variables, $K$, and a histogram, $A$,

setVarsHistogramsSliceEntropy :: Set.Set Variable -> Histogram -> Double

For example, the deck of cards histogram, $A$, is uniform cartesian, so the label entropy is maximised for every choice of query variables,

def lent(kk,aa):
    return size(aa) * (ent(aa) - ent(ared(aa,sset(kk))))

lent([suit, rank],aa)
# 0.0

lent([],aa)
# 205.46467336623434

lent([suit],aa)
# 133.37736658800003

lent([rank],aa)
# 72.08730677823439
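
The slice-based definition of the label entropy can be checked against the conditional-entropy shortcut used by lent; a minimal sketch for the query variable suit, where mul and unit are used to form the slice $A * \{R\}^{\mathrm{U}}$ (the output is approximate),

sum([c * ent(ared(mul(aa,unit(sset([rr]))),sset([rank]))) for (rr,c) in aall(ared(aa,sset([suit])))])
# approximately 133.38, agreeing with lent([suit],aa)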

This may be compared to the causal histogram relating suit and colour which has zero label entropy,

colour = VarStr("colour")
red = ValStr("red")
black = ValStr("black")

bb = llaa([(llss([(suit, u),(colour, w)]),1) for (u,w) in [(hearts, red), (clubs, black), (diamonds, red), (spades, black)]])

ssplit = setVarsSetStatesSplit 

rpln(ssplit(sset([suit]),states(eff(bb))))
# ({(suit, clubs)}, {(colour, black)})
# ({(suit, diamonds)}, {(colour, red)})
# ({(suit, hearts)}, {(colour, red)})
# ({(suit, spades)}, {(colour, black)})

lent([suit],bb)
# 0.0

Consider another example. The union of slices of a regular singleton, a regular diagonal and a regular cartesian has zero label entropy in the singleton slice and maximum label entropy in the cartesian slice,

ss = mul(cdaa([[1]]),cdtp(regsing(2,2),[2,3]))
dd = mul(cdaa([[2]]),cdtp(regdiag(2,2),[2,3]))
cc = mul(cdaa([[3]]),cdtp(regcart(2,2),[2,3]))

ff = add(add(ss,dd),cc)

rpln(aall(ff))
# ({(1, 1), (2, 1), (3, 1)}, 1 % 1)
# ({(1, 2), (2, 1), (3, 1)}, 1 % 1)
# ({(1, 2), (2, 2), (3, 2)}, 1 % 1)
# ({(1, 3), (2, 1), (3, 1)}, 1 % 1)
# ({(1, 3), (2, 1), (3, 2)}, 1 % 1)
# ({(1, 3), (2, 2), (3, 1)}, 1 % 1)
# ({(1, 3), (2, 2), (3, 2)}, 1 % 1)

vk = sset(map(VarInt,[2,3]))

ff1 = ared(mul(ff,cdaa([[1]])),vk)

rpln(aall(ff1))
# ({(2, 1), (3, 1)}, 1 % 1)

size(ff1) * ent(ff1)
# -0.0

ff2 = ared(mul(ff,cdaa([[2]])),vk)

rpln(aall(ff2))
# ({(2, 1), (3, 1)}, 1 % 1)
# ({(2, 2), (3, 2)}, 1 % 1)

size(ff2) * ent(ff2)
# 1.3862943611198906

ff3 = ared(mul(ff,cdaa([[3]])),vk)

rpln(aall(ff3))
# ({(2, 1), (3, 1)}, 1 % 1)
# ({(2, 1), (3, 2)}, 1 % 1)
# ({(2, 2), (3, 1)}, 1 % 1)
# ({(2, 2), (3, 2)}, 1 % 1)

size(ff3) * ent(ff3)
# 5.545177444479562

lent([VarInt(1)],ff)
# 6.931471805599451

The multinomial coefficient of a non-zero integral histogram $A \in \mathcal{A}_{\mathrm{i}}$ is \[ \frac{z!}{\prod_{S \in A^{\mathrm{S}}} A_S!} \] where $z = \mathrm{size}(A) > 0$. In the case where the histogram is non-integral the multinomial coefficient is defined by the unit translated gamma function, $\Gamma_!(x) := \Gamma(x+1)$, \[ \frac{\Gamma_! z}{\prod_{S \in A^{\mathrm{S}}} \Gamma_! A_S} \]

combinationMultinomial :: Integer -> [Integer] -> Integer
histogramsMultinomialLog :: Histogram -> Double

For example,

mult = combinationMultinomial

log(mult(52,[1]*52))
# 156.3608363030788

multln = histogramsMultinomialLog

multln(aa)
# 156.3608363030788
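
The non-integral form of the log multinomial coefficient can also be computed directly from the unit translated gamma function; a minimal sketch, assuming gammaln is available from scipy.special (the output is approximate),

from scipy.special import gammaln

def facln(x):
    # ln Gamma_!(x) = ln Gamma(x+1), i.e. ln x! for integral x
    return gammaln(float(x) + 1)

facln(size(aa)) - sum([facln(c) for (ss,c) in aall(aa)])
# approximately 156.3608, agreeing with multln(aa)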

In the case where the counts are not small, $z \gg \ln z$, the logarithm of the multinomial coefficient approximates to the sized entropy, \[ \begin{eqnarray} \ln \frac{z!}{\prod_{S \in A^{\mathrm{S}}} A_S!} &\approx& z \times \mathrm{entropy}(A) \end{eqnarray} \]

multln(aa)
# 156.3608363030788

52 * ent(aa)
# 205.46467336623434

multln(mul(scalar(100),aa))
# 20384.101936383322

100 * 52 * ent(mul(scalar(100),aa))
# 20546.467336623435

So the entropy, $\mathrm{entropy}(A)$, is a measure of the probability of the histogram of a randomly chosen history. Singleton histograms are least probable and uniform histograms are most probable.
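
To illustrate, compare the log multinomial coefficients of a singleton histogram and a uniform cartesian histogram of the same size; a minimal sketch (the second output is approximate),

multln(resize(52,regsing(2,2)))
# 0.0

multln(resize(52,regcart(2,2)))
# approximately 66.2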

The sized relative entropy between a histogram and its independent is the sized mutual entropy, \[ \sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A_S}{A^{\mathrm{X}}_S} \] It can be shown that the size scaled expected logarithm of the independent with respect to the histogram equals the size scaled expected logarithm of the independent with respect to the independent, \[ \sum_{S \in A^{\mathrm{FS}}} A_S \ln A^{\mathrm{X}}_S = \sum_{S \in A^{\mathrm{XFS}}} A^{\mathrm{X}}_S \ln A^{\mathrm{X}}_S \] so the sized mutual entropy is the difference between the sized independent entropy and the sized histogram entropy, \[ \begin{eqnarray} \sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A_S}{A^{\mathrm{X}}_S} &=& z \times \mathrm{entropy}(A^{\mathrm{X}}) - z \times \mathrm{entropy}(A) \end{eqnarray} \] For example, consider the sized mutual entropy of the scaled sum of a regular singleton, a regular diagonal and a regular cartesian,

aa = norm(regsing(3,2))
aa = add(aa,norm(regdiag(3,2)))
aa = add(aa,norm(regcart(3,2)))
aa = resize(100,aa)

rpln(aall(aa))
# ({(1, 1), (2, 1)}, 1300 % 27)
# ({(1, 1), (2, 2)}, 100 % 27)
# ({(1, 1), (2, 3)}, 100 % 27)
# ({(1, 2), (2, 1)}, 100 % 27)
# ({(1, 2), (2, 2)}, 400 % 27)
# ({(1, 2), (2, 3)}, 100 % 27)
# ({(1, 3), (2, 1)}, 100 % 27)
# ({(1, 3), (2, 2)}, 100 % 27)
# ({(1, 3), (2, 3)}, 400 % 27)

rpln(aall(ind(aa)))
# ({(1, 1), (2, 1)}, 2500 % 81)
# ({(1, 1), (2, 2)}, 1000 % 81)
# ({(1, 1), (2, 3)}, 1000 % 81)
# ({(1, 2), (2, 1)}, 1000 % 81)
# ({(1, 2), (2, 2)}, 400 % 81)
# ({(1, 2), (2, 3)}, 400 % 81)
# ({(1, 3), (2, 1)}, 1000 % 81)
# ({(1, 3), (2, 2)}, 400 % 81)
# ({(1, 3), (2, 3)}, 400 % 81)

aa == ind(aa)
# False

z = size(aa)

z * sum([a * log (a/x) for ((ss,a),(tt,x)) in zip(aall(norm(aa)),aall(norm(ind(aa))))])
# 33.99466156865321

z * ent(ind(aa)) - z * ent(aa)
# 33.994661568653214

The sized mutual entropy can be viewed as a measure of the probability of the independent, $A^{\mathrm{X}}$, relative to the histogram, $A$, given arbitrary substrate history. Equivalently, sized mutual entropy can be viewed as a measure of the surprisal of the histogram, $A$, relative to the independent, $A^{\mathrm{X}}$. That is, sized mutual entropy is a measure of the dependency between the variables in the histogram, $A$. Consider the sized mutual entropy of the scaled sum where a diagonal replaces the singleton,

aa = norm(regdiag(3,2))
aa = add(aa,norm(regdiag(3,2)))
aa = add(aa,norm(regcart(3,2)))
aa = resize(100,aa)

rpln(aall(aa))
# ({(1, 1), (2, 1)}, 700 % 27)
# ({(1, 1), (2, 2)}, 100 % 27)
# ({(1, 1), (2, 3)}, 100 % 27)
# ({(1, 2), (2, 1)}, 100 % 27)
# ({(1, 2), (2, 2)}, 700 % 27)
# ({(1, 2), (2, 3)}, 100 % 27)
# ({(1, 3), (2, 1)}, 100 % 27)
# ({(1, 3), (2, 2)}, 100 % 27)
# ({(1, 3), (2, 3)}, 700 % 27)

z * ent(ind(aa)) - z * ent(aa)
# 41.48733828193562

The sized mutual entropy has increased because the replacement of the singleton with a diagonal increases the dependency between the variables. Now compare to the sized mutual entropy of just the scaled regular diagonal where the dependency is greater still,

aa = resize(100,norm(regdiag(3,2)))

rpln(aall(aa))
# ({(1, 1), (2, 1)}, 100 % 3)
# ({(1, 2), (2, 2)}, 100 % 3)
# ({(1, 3), (2, 3)}, 100 % 3)

z * ent(ind(aa)) - z * ent(aa)
# 109.86122886681099

By comparison, the sized mutual entropy of the regular cartesian is zero, because the cartesian is independent and so there is no dependency between the variables,

aa = resize(100,norm(regcart(3,2)))

rpln(aall(aa))
# ({(1, 1), (2, 1)}, 100 % 9)
# ({(1, 1), (2, 2)}, 100 % 9)
# ({(1, 1), (2, 3)}, 100 % 9)
# ({(1, 2), (2, 1)}, 100 % 9)
# ({(1, 2), (2, 2)}, 100 % 9)
# ({(1, 2), (2, 3)}, 100 % 9)
# ({(1, 3), (2, 1)}, 100 % 9)
# ({(1, 3), (2, 2)}, 100 % 9)
# ({(1, 3), (2, 3)}, 100 % 9)

aa == ind(aa)
# True

z * ent(ind(aa)) - z * ent(aa)
# 0.0

Similarly, the sized mutual entropy of the regular singleton is zero, because the singleton is independent.

The sized mutual entropy is the sized relative entropy between the histogram and its independent, so it is always non-negative, \[ \begin{eqnarray} z \times \mathrm{entropy}(A^{\mathrm{X}}) - z \times \mathrm{entropy}(A) &\geq& 0 \end{eqnarray} \] and so the independent entropy is always greater than or equal to the histogram entropy, \[ \begin{eqnarray} \mathrm{entropy}(A^{\mathrm{X}}) &\geq& \mathrm{entropy}(A) \end{eqnarray} \]
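
A quick check of the inequality for the regular diagonal and the regular cartesian used above,

ent(ind(resize(100,regdiag(3,2)))) >= ent(resize(100,regdiag(3,2)))
# True

ent(ind(resize(100,regcart(3,2)))) >= ent(resize(100,regcart(3,2)))
# True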

Alignment

The alignment of a histogram $A \in \mathcal{A}$ is defined \[ \begin{eqnarray} \mathrm{algn}(A) &:=& \sum_{S \in A^{\mathrm{S}}} \ln \Gamma_! A_S - \sum_{S \in A^{\mathrm{XS}}} \ln \Gamma_! A^{\mathrm{X}}_S \end{eqnarray} \] where $\Gamma_!$ is the unit translated gamma function.

In the case where both the histogram and its independent are integral, $A,A^{\mathrm{X}} \in \mathcal{A}_{\mathrm{i}}$, then the alignment is the difference between the sum log-factorial counts of the histogram and its independent, \[ \begin{eqnarray} \mathrm{algn}(A) &=& \sum_{S \in A^{\mathrm{S}}} \ln A_S! - \sum_{S \in A^{\mathrm{XS}}} \ln A^{\mathrm{X}}_S! \end{eqnarray} \]

histogramsAlignment :: Histogram -> Double

For example, consider the alignment of the scaled sum of a regular singleton, a regular diagonal and a regular cartesian,

algn = histogramsAlignment

aa = norm(regsing(3,2))
aa = add(aa,norm(regdiag(3,2)))
aa = add(aa,norm(regcart(3,2)))
aa = resize(100,aa)

rpln(aall(aa))
# ({(1, 1), (2, 1)}, 1300 % 27)
# ({(1, 1), (2, 2)}, 100 % 27)
# ({(1, 1), (2, 3)}, 100 % 27)
# ({(1, 2), (2, 1)}, 100 % 27)
# ({(1, 2), (2, 2)}, 400 % 27)
# ({(1, 2), (2, 3)}, 100 % 27)
# ({(1, 3), (2, 1)}, 100 % 27)
# ({(1, 3), (2, 2)}, 100 % 27)
# ({(1, 3), (2, 3)}, 400 % 27)

algn(aa)
# 32.67054377025386
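
As a cross-check, the alignment can be computed directly from the definition using the facln helper sketched earlier for the unit translated gamma function (the output is approximate),

sum([facln(c) for (ss,c) in aall(aa)]) - sum([facln(c) for (ss,c) in aall(ind(aa))])
# approximately 32.67, agreeing with algn(aa)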

Alignment is the logarithm of the ratio of the independent multinomial coefficient to the multinomial coefficient, \[ \begin{eqnarray} \mathrm{algn}(A) &=& \ln \left(\frac{z!}{\prod_{S \in A^{\mathrm{XS}}} A^{\mathrm{X}}_S!}~/~\frac{z!}{\prod_{S \in A^{\mathrm{S}}} A_S!}\right) \end{eqnarray} \]

multln(ind(aa)) - multln(aa)
# 32.67054377025386

So alignment is the logarithm of the probability of the independent, $A^{\mathrm{X}}$, relative to the histogram, $A$. Equivalently, alignment is the logarithm of the surprisal of the histogram, $A$, relative to the independent, $A^{\mathrm{X}}$. Alignment is a measure of the dependency between the variables in the histogram, $A$.

Alignment is approximately equal to the sized mutual entropy, \[ \begin{eqnarray} \mathrm{algn}(A) &\approx& z \times \mathrm{entropy}(A^{\mathrm{X}}) - z \times \mathrm{entropy}(A)\\ &=& \sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A_S}{A^{\mathrm{X}}_S} \end{eqnarray} \]

z = size(aa)

z * ent(ind(aa)) - z * ent(aa)
# 33.994661568653214

Consider the alignment of the scaled sum where a diagonal replaces the singleton,

aa = norm(regdiag(3,2))
aa = add(aa,norm(regdiag(3,2)))
aa = add(aa,norm(regcart(3,2)))
aa = resize(100,aa)

rpln(aall(aa))
# ({(1, 1), (2, 1)}, 700 % 27)
# ({(1, 1), (2, 2)}, 100 % 27)
# ({(1, 1), (2, 3)}, 100 % 27)
# ({(1, 2), (2, 1)}, 100 % 27)
# ({(1, 2), (2, 2)}, 700 % 27)
# ({(1, 2), (2, 3)}, 100 % 27)
# ({(1, 3), (2, 1)}, 100 % 27)
# ({(1, 3), (2, 2)}, 100 % 27)
# ({(1, 3), (2, 3)}, 700 % 27)

algn(aa)
# 39.539287211260046

z * ent(ind(aa)) - z * ent(aa)
# 41.48733828193562

The alignment has increased because the replacement of the singleton with a diagonal increases the dependency between the variables. Now compare to the alignment of just the scaled regular diagonal where the dependency is greater still,

aa = norm(regdiag(3,2))
aa = resize(100,aa)

rpln(aall(aa))
# ({(1, 1), (2, 1)}, 100 % 3)
# ({(1, 2), (2, 2)}, 100 % 3)
# ({(1, 3), (2, 3)}, 100 % 3)

algn(aa)
# 98.71169723276279

z * ent(ind(aa)) - z * ent(aa)
# 109.86122886681099

By comparison, the alignment of the regular cartesian is zero, because the cartesian is independent and so there is no dependency between the variables,

aa = norm(regcart(3,2))
aa = resize(100,aa)

rpln(aall(aa))
# ({(1, 1), (2, 1)}, 100 % 9)
# ({(1, 1), (2, 2)}, 100 % 9)
# ({(1, 1), (2, 3)}, 100 % 9)
# ({(1, 2), (2, 1)}, 100 % 9)
# ({(1, 2), (2, 2)}, 100 % 9)
# ({(1, 2), (2, 3)}, 100 % 9)
# ({(1, 3), (2, 1)}, 100 % 9)
# ({(1, 3), (2, 2)}, 100 % 9)
# ({(1, 3), (2, 3)}, 100 % 9)

aa == ind(aa)
# True

algn(aa)
# 0.0

z * ent(ind(aa)) - z * ent(aa)
# 0.0

Similarly, the alignment of the regular singleton is zero, because the singleton is independent.

The alignment of an independent histogram, $A = A^{\mathrm{X}}$, is zero. In particular, scalar histograms, $V=\emptyset$, mono-variate histograms, $|V|=1$, uniform cartesian histograms, $A = V_z^{\mathrm{C}}$, and effective singleton histograms, $|A^{\mathrm{F}}| =1$, all have zero alignment,

algn(scalar(100))
# 0.0

algn(resize(100,regdiag(3,1)))
# 0.0

algn(resize(100,regcart(3,3)))
# 0.0

algn(resize(100,regsing(3,3)))
# 0.0

The maximum alignment of a histogram $A$ occurs when the histogram is both uniform and fully diagonalised: no pair of effective states shares any value, $\forall S,T \in A^{\mathrm{FS}}~(S \neq T \implies S \cap T = \emptyset)$, and all counts are equal along the diagonal, $\forall S,T \in A^{\mathrm{FS}}~(A_S = A_T)$,

algn(resize(100,regdiag(3,2)))
# 98.71169723276279

The maximum alignment of a regular histogram with dimension $n =|V|$ and valency $d$ is \[ d \ln \Gamma_! \frac{z}{d}~-~d^n \ln \Gamma_! \frac{z}{d^n} \]

def facln(x):
    return gammaln(float(x) + 1)

d = 3

n = 2

d * facln(z/d) - (d**n) * facln(z/(d**n))
# 98.71169723276279

The maximum alignment is approximately $z \ln d^{n-1} = z \ln v/d$, where $v = d^n$,

z * log(d**(n-1))
# 109.86122886681098

Example - a weather forecast

Some of the concepts above regarding entropy and alignment can be demonstrated with the sample of weather measurements created in States, histories and histograms,

def lluu(ll):
    return listsSystem([(v,sset(ww)) for (v,ww) in ll])

def llhh(vv,ev):
    return listsHistory([(IdInt(i), llss(zip(vv,ll))) for (i,ll) in ev])

def red(aa,ll):
    return setVarsHistogramsReduce(sset(ll),aa)

def ssplit(ll,aa):
    return setVarsSetStatesSplit(sset(ll),states(aa))


[pressure,cloud,wind,rain] = map(VarStr,["pressure","cloud","wind","rain"])

[low,medium,high,none,light,heavy,strong] = map(ValStr,["low","medium","high","none","light","heavy","strong"])


uu = lluu([
      (pressure, [low,medium,high]),
      (cloud,    [none,light,heavy]),
      (wind,     [none,light,strong]),
      (rain,     [none,light,heavy])])

vv = uvars(uu)

hh = llhh([pressure,cloud,wind,rain],[
      (1,[high,none,none,none]),
      (2,[medium,light,none,light]),
      (3,[high,none,light,none]),
      (4,[low,heavy,strong,heavy]),
      (5,[low,none,light,light]),
      (6,[medium,none,light,light]),
      (7,[low,heavy,light,heavy]),
      (8,[high,none,light,none]),
      (9,[medium,light,strong,heavy]),
      (10,[medium,light,light,light]),
      (11,[high,light,light,heavy]),
      (12,[medium,none,none,none]),
      (13,[medium,light,none,none]),
      (14,[high,light,strong,light]),
      (15,[medium,none,light,light]),
      (16,[low,heavy,strong,heavy]),
      (17,[low,heavy,light,heavy]),
      (18,[high,none,none,none]),
      (19,[low,light,none,light]),
      (20,[high,none,none,none])])

aa = hhaa(hh)

uu
# {(cloud, {heavy, light, none}), (pressure, {high, low, medium}), (rain, {heavy, light, none}), (wind, {light, none, strong})}

vv
# {cloud, pressure, rain, wind}

rpln(aall(aa))
# ({(cloud, heavy), (pressure, low), (rain, heavy), (wind, light)}, 2 % 1)
# ({(cloud, heavy), (pressure, low), (rain, heavy), (wind, strong)}, 2 % 1)
# ({(cloud, light), (pressure, high), (rain, heavy), (wind, light)}, 1 % 1)
# ({(cloud, light), (pressure, high), (rain, light), (wind, strong)}, 1 % 1)
# ({(cloud, light), (pressure, low), (rain, light), (wind, none)}, 1 % 1)
# ({(cloud, light), (pressure, medium), (rain, heavy), (wind, strong)}, 1 % 1)
# ({(cloud, light), (pressure, medium), (rain, light), (wind, light)}, 1 % 1)
# ({(cloud, light), (pressure, medium), (rain, light), (wind, none)}, 1 % 1)
# ({(cloud, light), (pressure, medium), (rain, none), (wind, none)}, 1 % 1)
# ({(cloud, none), (pressure, high), (rain, none), (wind, light)}, 2 % 1)
# ({(cloud, none), (pressure, high), (rain, none), (wind, none)}, 3 % 1)
# ({(cloud, none), (pressure, low), (rain, light), (wind, light)}, 1 % 1)
# ({(cloud, none), (pressure, medium), (rain, light), (wind, light)}, 2 % 1)
# ({(cloud, none), (pressure, medium), (rain, none), (wind, none)}, 1 % 1)

size(aa)
# 20 % 1

The sized entropy, $z \times \mathrm{entropy}(A)$, is

z = size(aa)

z * ent(aa)
# 51.07363116059592

The sized independent entropy, $z \times \mathrm{entropy}(A^{\mathrm{X}})$, is

z * ent(ind(aa))
# 85.78884471224839

Compare that to the maximum sized entropy, $z \times \mathrm{entropy}(V_z^{\mathrm{C}}) = z \ln v$,

v = vol(uu,vv)

z * log(v)
# 87.88898309344879

The multinomial coefficient of the histogram, $A$, is \[ \frac{z!}{\prod_{S \in A^{\mathrm{S}}} A_S!} \]

multln = histogramsMultinomialLog

multln(aa)
# 37.77126826928565

multln(ind(aa))
# 49.62212054431038

multln(resize(20,unit(cart(uu,vv))))
# 50.23830936717706

The sized mutual entropy is \[ \sum_{S \in A^{\mathrm{FS}}} A_S \ln \frac{A_S}{A^{\mathrm{X}}_S} \]

z * ent(ind(aa)) - z * ent(aa)
# 34.71521355165247

The alignment of the histogram, $A$, is \[ \begin{eqnarray} \mathrm{algn}(A) &:=& \sum_{S \in A^{\mathrm{S}}} \ln \Gamma_! A_S - \sum_{S \in A^{\mathrm{XS}}} \ln \Gamma_! A^{\mathrm{X}}_S \end{eqnarray} \]

algn(aa)
# 11.85085227502473

Note that in this case where the counts are small, the sized mutual entropy differs considerably from the alignment. The relative difference is smaller for a scaled histogram, for example,

aa1 = mul(scalar(100),aa)

z1 = 100 * z

z1 * ent(ind(aa1)) - z1 * ent(aa1)
# 3471.521355165247

algn(aa1)
# 3318.508094475117

The histogram, $A$, happens to be a regular histogram of dimension $n = |V| = 4$ and valency $\{d\} = \{|U_w| : w \in V\} = \{3\}$, so the maximum alignment is that of a regular diagonal,

def facln(x):
    return gammaln(float(x) + 1)

d = 3

n = 4

d * facln(z/d) - (d**n) * facln(z/(d**n))
# 31.485060233929005

z * log(d**(n-1))
# 65.91673732008658

Here are the alignments of various reductions,

algn(red(aa,[pressure,rain]))
# 4.278766678519384

algn(red(aa,[pressure,cloud]))
# 4.6232784937782885

algn(red(aa,[pressure,wind]))
# 0.646716967212571

algn(red(aa,[cloud,rain]))
# 6.4150379630063465

algn(red(aa,[cloud,wind]))
# 2.7673350044725016

algn(red(aa,[wind,rain]))
# 3.930131313218345

algn(red(aa,[cloud,wind,rain]))
# 8.935048311238008

These alignments may be contrasted with the label entropies,

def lent(kk,aa):
    return size(aa) * (ent(aa) - ent(ared(aa,sset(kk))))

lent([pressure],red(aa,[pressure,rain]))
# 16.083165728773302

lent([pressure],red(aa,[pressure,cloud]))
# 14.173623223888864

lent([pressure],red(aa,[pressure,wind]))
# 20.127820211001172

lent([cloud],red(aa,[cloud,rain]))
# 12.418526752441053

lent([cloud],red(aa,[cloud,wind]))
# 16.508188366758773

lent([wind],red(aa,[wind,rain]))
# 15.984940222994219

lent([cloud,wind],red(aa,[cloud,wind,rain]))
# 8.047189562170498

The weather forecast example continues in Transforms.

