# Aligned Induction

## Transform entropy

Haskell implementation of the Overview/Transform entropy

### Sections

Definitions

Model entropy

Example - a weather forecast

### Definitions

Let $T$ be a one functional transform, $T \in \mathcal{T}_{U,\mathrm{f},1}$, having underlying variables $V = \mathrm{und}(T)$. Let $A$ be a histogram, $A \in \mathcal{A}$, in the underlying variables, $\mathrm{vars}(A) = V$, having size $z = \mathrm{size}(A) > 0$. The underlying volume is $v = |V^{\mathrm{C}}|$. The derived volume is $w = |T^{-1}|$.

Consider the deck of cards example,

let lluu ll = fromJust $ listsSystem [(v,Set.fromList ww) | (v,ww) <- ll]

let [suit,rank] = map VarStr ["suit","rank"]
    [hearts,clubs,diamonds,spades] = map ValStr ["hearts","clubs","diamonds","spades"]
    [jack,queen,king,ace] = map ValStr ["J","Q","K","A"]

let uu = lluu [
      (suit, [hearts, clubs, diamonds, spades]),
      (rank, [jack,queen,king,ace] ++ map ValInt [2..10])]

let vv = Set.fromList [suit, rank]

rp uu
"{(rank,{A,J,K,Q,2,3,4,5,6,7,8,9,10}),(suit,{clubs,diamonds,hearts,spades})}"

rp vv
"{rank,suit}"

let aa = unit (cart uu vv)

rpln $ aall aa
"({(rank,A),(suit,clubs)},1 % 1)"
"({(rank,A),(suit,diamonds)},1 % 1)"
"({(rank,A),(suit,hearts)},1 % 1)"
"({(rank,A),(suit,spades)},1 % 1)"
"({(rank,J),(suit,clubs)},1 % 1)"
"({(rank,J),(suit,diamonds)},1 % 1)"
...
"({(rank,9),(suit,hearts)},1 % 1)"
"({(rank,9),(suit,spades)},1 % 1)"
"({(rank,10),(suit,clubs)},1 % 1)"
"({(rank,10),(suit,diamonds)},1 % 1)"
"({(rank,10),(suit,hearts)},1 % 1)"
"({(rank,10),(suit,spades)},1 % 1)"


Also consider a game of cards which has a special deck such that spades and clubs are pip cards and hearts and diamonds are face cards. The suit and the rank are no longer independent,

let bb = unit (Set.fromList (
[llss [(suit,s),(rank,r)] | s <- [spades,clubs],    r <- ace : map ValInt [2..10]] ++
[llss [(suit,s),(rank,r)] | s <- [hearts,diamonds], r <- [jack,queen,king]]))

rpln $ aall bb
"({(rank,A),(suit,clubs)},1 % 1)"
"({(rank,A),(suit,spades)},1 % 1)"
"({(rank,J),(suit,diamonds)},1 % 1)"
...
"({(rank,9),(suit,spades)},1 % 1)"
"({(rank,10),(suit,clubs)},1 % 1)"
"({(rank,10),(suit,spades)},1 % 1)"

Consider the transform relating the suit to the colour,

let colour = VarStr "colour"
    red = ValStr "red"; black = ValStr "black"

let xx = llaa [(llss [(suit, u),(colour, w)],1) | (u,w) <- [(hearts, red), (clubs, black), (diamonds, red), (spades, black)]]

rpln $ aall xx
"({(colour,black),(suit,clubs)},1 % 1)"
"({(colour,black),(suit,spades)},1 % 1)"
"({(colour,red),(suit,diamonds)},1 % 1)"
"({(colour,red),(suit,hearts)},1 % 1)"

let ww = Set.fromList [colour]

let kk = vars xx `Set.difference` ww

let tt = trans xx ww

ttaa tt == xx
True

und tt == kk
True

der tt == ww
True


In order to compare the sized derived entropies of the two decks, we shall add together two special decks, $B + B$, so that the result has the same size as the whole deck, $A$,

size aa
52 % 1

size bb
26 % 1

let bb = scalar 2 `mul` unit (Set.fromList (
[llss [(suit,s),(rank,r)] | s <- [spades,clubs],    r <- ace : map ValInt [2..10]] ++
[llss [(suit,s),(rank,r)] | s <- [hearts,diamonds], r <- [jack,queen,king]]))

size bb
52 % 1

rpln $ aall bb
"({(rank,A),(suit,clubs)},2 % 1)"
"({(rank,A),(suit,spades)},2 % 1)"
"({(rank,J),(suit,diamonds)},2 % 1)"
...
"({(rank,9),(suit,spades)},2 % 1)"
"({(rank,10),(suit,clubs)},2 % 1)"
"({(rank,10),(suit,spades)},2 % 1)"

rpln $ aall $ aa `tmul` tt
"({(colour,black)},26 % 1)"
"({(colour,red)},26 % 1)"

rpln $ aall $ bb `tmul` tt
"({(colour,black)},40 % 1)"
"({(colour,red)},12 % 1)"

The derived entropy or component size entropy is $\begin{eqnarray} \mathrm{entropy}(A * T) &:=& -\sum_{(R,\cdot) \in T^{-1}} (\hat{A} * T)_R \times \ln~(\hat{A} * T)_R \end{eqnarray}$

let ent = histogramsEntropy

ent (aa `tmul` tt)
0.6931471805599453

ent (bb `tmul` tt)
0.5402041423888608

The derived entropy is non-negative and less than or equal to the logarithm of the derived volume, $0 \leq \mathrm{entropy}(A * T) \leq \ln w$,

let w = fromIntegral (Set.size (states (xx `ared` der tt))) :: Double

w
2.0

log w
0.6931471805599453

ent (aa `tmul` tt) <= log w
True

ent (bb `tmul` tt) <= log w
True

Complementary to the derived entropy is the expected component entropy, $\begin{eqnarray} \mathrm{entropyComponent}(A,T) &:=& \sum_{(R,C) \in T^{-1}} (\hat{A} * T)_R \times \mathrm{entropy}(A * C)\\ &=&\sum_{(R,\cdot) \in T^{-1}} (\hat{A} * T)_R \times \mathrm{entropy}(\{R\}^{\mathrm{U}} * T^{\odot A}) \end{eqnarray}$

transformsHistogramsEntropyComponent :: Transform -> Histogram -> Double

For example,

let cent aa tt = transformsHistogramsEntropyComponent tt aa

cent aa tt
3.2580965380214835

cent bb tt
2.7178923956326213

The cartesian derived entropy or component cardinality entropy is $\begin{eqnarray} \mathrm{entropy}(V^{\mathrm{C}} * T) &:=& -\sum_{(R,\cdot) \in T^{-1}} (\hat{V}^{\mathrm{C}} * T)_R \times \ln~(\hat{V}^{\mathrm{C}} * T)_R \end{eqnarray}$

let vvc = unit (cart uu vv)

ent (vvc `tmul` tt)
0.6931471805599453

In the case of the whole deck of cards, the histogram is cartesian, $A = V^{\mathrm{C}}$, so the component cardinality entropy equals the derived entropy, $V^{\mathrm{C}} * T = A * T$,

ent (vvc `tmul` tt) == ent (aa `tmul` tt)
True

The cartesian derived entropy is non-negative and less than or equal to the logarithm of the derived volume, $0 \leq \mathrm{entropy}(V^{\mathrm{C}} * T) \leq \ln w$,

ent (vvc `tmul` tt) <= log w
True

The cartesian derived derived sum entropy or component size cardinality sum entropy is $\begin{eqnarray} \mathrm{entropy}(A * T) + \mathrm{entropy}(V^{\mathrm{C}} * T) \end{eqnarray}$

ent (aa `tmul` tt) + ent (vvc `tmul` tt)
1.3862943611198906

ent (bb `tmul` tt) + ent (vvc `tmul` tt)
1.2333513229488062

The component size cardinality cross entropy is the negative derived histogram expected normalised cartesian derived count logarithm, $\begin{eqnarray} \mathrm{entropyCross}(A * T,V^{\mathrm{C}} * T) &:=& -\sum_{(R,\cdot) \in T^{-1}} (\hat{A} * T)_R \times \ln~(\hat{V}^{\mathrm{C}} * T)_R \end{eqnarray}$

histogramsHistogramsEntropyCross :: Histogram -> Histogram -> Double

For example,

let crent = histogramsHistogramsEntropyCross

crent (aa `tmul` tt) (vvc `tmul` tt)
0.6931471805599453

crent (bb `tmul` tt) (vvc `tmul` tt)
0.6931471805599453

The component size cardinality cross entropy is greater than or equal to the derived entropy, $\mathrm{entropyCross}(A * T,V^{\mathrm{C}} * T) \geq \mathrm{entropy}(A * T)$,

crent (aa `tmul` tt) (vvc `tmul` tt) >= ent (aa `tmul` tt)
True

crent (bb `tmul` tt) (vvc `tmul` tt) >= ent (bb `tmul` tt)
True

The component cardinality size cross entropy is the negative cartesian derived expected normalised derived histogram count logarithm, $\begin{eqnarray} \mathrm{entropyCross}(V^{\mathrm{C}} * T,A * T) &:=& -\sum_{(R,\cdot) \in T^{-1}} (\hat{V}^{\mathrm{C}} * T)_R \times \ln~(\hat{A} * T)_R \end{eqnarray}$

crent (vvc `tmul` tt) (aa `tmul` tt)
0.6931471805599453

crent (vvc `tmul` tt) (bb `tmul` tt)
0.864350666630459

The component cardinality size cross entropy is greater than or equal to the cartesian derived entropy, $\mathrm{entropyCross}(V^{\mathrm{C}} * T,A * T) \geq \mathrm{entropy}(V^{\mathrm{C}} * T)$,

crent (vvc `tmul` tt) (aa `tmul` tt) >= ent (vvc `tmul` tt)
True

crent (vvc `tmul` tt) (bb `tmul` tt) >= ent (vvc `tmul` tt)
True

The component size cardinality sum cross entropy is $\begin{eqnarray} \mathrm{entropy}(A * T + V^{\mathrm{C}} * T) \end{eqnarray}$

ent ((aa `tmul` tt) `add` (vvc `tmul` tt))
0.6931471805599453

ent ((bb `tmul` tt) `add` (vvc `tmul` tt))
0.6564535237245771

The component size cardinality sum cross entropy is non-negative and less than or equal to the logarithm of the derived volume, $0 \leq \mathrm{entropy}(A * T + V^{\mathrm{C}} * T) \leq \ln w$,

ent ((aa `tmul` tt) `add` (vvc `tmul` tt)) <= log w
True

ent ((bb `tmul` tt) `add` (vvc `tmul` tt)) <= log w
True

In all cases the cross entropy is maximised when high size components are low cardinality components, $(\hat{A} * T)_R \gg (\hat{V}^{\mathrm{C}} * T)_R$ or $\mathrm{size}(A * C)/z \gg |C|/v$, and low size components are high cardinality components, $(\hat{A} * T)_R \ll (\hat{V}^{\mathrm{C}} * T)_R$ or $\mathrm{size}(A * C)/z \ll |C|/v$, where $(R,C) \in T^{-1}$. To show this consider another transform $T'$,

let tt' = trans (cdaa [[1,1,1],[1,2,2],[1,3,2],[2,1,2],[2,2,1],[2,3,2],[3,1,2],[3,2,2],[3,3,1]]) (Set.fromList [VarInt 3])

rpln $ aall $ ttaa tt'
"({(1,1),(2,1),(3,1)},1 % 1)"
"({(1,1),(2,2),(3,2)},1 % 1)"
"({(1,1),(2,3),(3,2)},1 % 1)"
"({(1,2),(2,1),(3,2)},1 % 1)"
"({(1,2),(2,2),(3,1)},1 % 1)"
"({(1,2),(2,3),(3,2)},1 % 1)"
"({(1,3),(2,1),(3,2)},1 % 1)"
"({(1,3),(2,2),(3,2)},1 % 1)"
"({(1,3),(2,3),(3,1)},1 % 1)"

Let $A'$ be a scaled regular diagonal histogram plus a scaled regular cartesian histogram,

let aa' = resize 9 $ norm (regdiag 3 2) `add` norm (regcart 3 2)

rpln $ aall $ aa'
"({(1,1),(2,1)},2 % 1)"
"({(1,1),(2,2)},1 % 2)"
"({(1,1),(2,3)},1 % 2)"
"({(1,2),(2,1)},1 % 2)"
"({(1,2),(2,2)},2 % 1)"
"({(1,2),(2,3)},1 % 2)"
"({(1,3),(2,1)},1 % 2)"
"({(1,3),(2,2)},1 % 2)"
"({(1,3),(2,3)},2 % 1)"

let vvc' = regcart 3 2

rpln $ aall $ vvc'
"({(1,1),(2,1)},1 % 1)"
"({(1,1),(2,2)},1 % 1)"
"({(1,1),(2,3)},1 % 1)"
"({(1,2),(2,1)},1 % 1)"
"({(1,2),(2,2)},1 % 1)"
"({(1,2),(2,3)},1 % 1)"
"({(1,3),(2,1)},1 % 1)"
"({(1,3),(2,2)},1 % 1)"
"({(1,3),(2,3)},1 % 1)"

rpln $ aall $ aa' `tmul` tt'
"({(3,1)},6 % 1)"
"({(3,2)},3 % 1)"

rpln $ aall $ vvc' `tmul` tt'
"({(3,1)},3 % 1)"
"({(3,2)},6 % 1)"


The derived entropy equals the cartesian derived entropy,

ent (aa' `tmul` tt')
0.6365141682948128

ent (vvc' `tmul` tt')
0.6365141682948128


but the cross entropy is greater than either,

ent ((aa' `tmul` tt') `add` (vvc' `tmul` tt'))
0.6931471805599453

crent (aa' `tmul` tt') (vvc' `tmul` tt')
0.8675632284814613

crent (vvc' `tmul` tt') (aa' `tmul` tt')
0.8675632284814613
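
The two definitions used in this comparison can be sketched on plain lists of counts, independently of the library. The names `normalise`, `entropyCounts` and `crossEntropyCounts` below are hypothetical helpers introduced only for illustration; they are not the library's `histogramsEntropy` or `histogramsHistogramsEntropyCross`, and they assume the two lists enumerate the derived states in the same order.

```haskell
-- Entropy and cross entropy over plain lists of counts, in nats.
-- These helper names are hypothetical and are not part of the library.

-- Normalise non-negative counts so that they sum to one.
normalise :: [Double] -> [Double]
normalise xs = map (/ sum xs) xs

-- Entropy of the normalised counts; zero counts contribute nothing.
entropyCounts :: [Double] -> Double
entropyCounts xs = negate (sum [p * log p | p <- normalise xs, p > 0])

-- Cross entropy of counts xs against counts ys, listed in the same order.
crossEntropyCounts :: [Double] -> [Double] -> Double
crossEntropyCounts xs ys =
  negate (sum [p * log q | (p, q) <- zip (normalise xs) (normalise ys), p > 0])
```

Loaded in GHCi, `entropyCounts [6,3]` and `entropyCounts [3,6]` both come to approximately 0.6365, while `crossEntropyCounts [6,3] [3,6]` comes to approximately 0.8676, in agreement with the values above.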


The cross entropy is minimised when the normalised derived histogram equals the normalised cartesian derived, $\hat{A} * T = \hat{V}^{\mathrm{C}} * T$ or $\forall (R,C) \in T^{-1}~(\mathrm{size}(A * C)/z = |C|/v)$. In this case the cross entropy equals the corresponding component entropy,

ent ((vvc' `tmul` tt') `add` (vvc' `tmul` tt'))
0.6365141682948128

crent (vvc' `tmul` tt') (vvc' `tmul` tt')
0.6365141682948128


The component size cardinality relative entropy is the component size cardinality cross entropy minus the component size entropy, $\begin{eqnarray} \mathrm{entropyRelative}(A * T,V^{\mathrm{C}} * T) &:=& \sum_{(R,\cdot) \in T^{-1}} (\hat{A} * T)_R \times \ln\frac{(\hat{A} * T)_R}{(\hat{V}^{\mathrm{C}} * T)_R}\\ &=& \mathrm{entropyCross}(A * T,V^{\mathrm{C}} * T)~-~\mathrm{entropy}(A * T) \end{eqnarray}$ The component size cardinality relative entropy is non-negative, $\mathrm{entropyRelative}(A * T,V^{\mathrm{C}} * T) \geq 0$,

crent (aa `tmul` tt) (vvc `tmul` tt) - ent (aa `tmul` tt)
0.0

crent (bb `tmul` tt) (vvc `tmul` tt) - ent (bb `tmul` tt)
0.15294303817108446

crent (aa' `tmul` tt') (vvc' `tmul` tt') - ent (aa' `tmul` tt')
0.2310490601866485
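
As a check on the special deck value, the relative entropy can be worked out by hand from the two normalised derived histograms, $(40/52, 12/52)$ for the special deck and $(1/2, 1/2)$ for the cartesian. This is only a restatement of the definition with the numbers substituted, rounded to four places, $\begin{eqnarray} \mathrm{entropyRelative}(B * T,V^{\mathrm{C}} * T) &=& \frac{10}{13} \ln\frac{10/13}{1/2} + \frac{3}{13} \ln\frac{3/13}{1/2}\\ &=& \frac{10}{13} \ln\frac{20}{13} + \frac{3}{13} \ln\frac{6}{13}\\ &\approx& 0.1529 \end{eqnarray}$ Here $B$ denotes the doubled special deck, `bb`. The over-represented black component contributes a positive term and the under-represented red component a smaller negative term, so the sum is positive.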


The component cardinality size relative entropy is the component cardinality size cross entropy minus the component cardinality entropy, $\begin{eqnarray} \mathrm{entropyRelative}(V^{\mathrm{C}} * T,A * T) &:=& \sum_{(R,\cdot) \in T^{-1}} (\hat{V}^{\mathrm{C}} * T)_R \times \ln\frac{(\hat{V}^{\mathrm{C}} * T)_R}{(\hat{A} * T)_R}\\ &=& \mathrm{entropyCross}(V^{\mathrm{C}} * T,A * T)~-~\mathrm{entropy}(V^{\mathrm{C}} * T) \end{eqnarray}$ The component cardinality size relative entropy is non-negative, $\mathrm{entropyRelative}(V^{\mathrm{C}} * T,A * T) \geq 0$,

crent (vvc `tmul` tt) (aa `tmul` tt) - ent (vvc `tmul` tt)
0.0

crent (vvc `tmul` tt) (bb `tmul` tt) - ent (vvc `tmul` tt)
0.17120348607051372

crent (vvc' `tmul` tt') (aa' `tmul` tt') - ent (vvc' `tmul` tt')
0.2310490601866485


The size-volume scaled component size cardinality sum relative entropy is the size-volume scaled component size cardinality sum cross entropy minus the size-volume scaled component size cardinality sum entropy, $\begin{eqnarray} (z+v) \times \mathrm{entropy}(A * T + V^{\mathrm{C}} * T)~-~z \times \mathrm{entropy}(A * T)~-~v \times \mathrm{entropy}(V^{\mathrm{C}} * T) \end{eqnarray}$ The size-volume scaled component size cardinality sum relative entropy is non-negative and less than the size-volume scaled logarithm of the derived volume, $(z+v) \ln w$,

let z = fromRational (size aa) :: Double
    v = fromIntegral (vol uu vv) :: Double

(z+v) * ent ((aa `tmul` tt) `add` (vvc `tmul` tt)) - z * ent (aa `tmul` tt) - v * ent (vvc `tmul` tt)
0.0

(z+v) * ent ((bb `tmul` tt) `add` (vvc `tmul` tt)) - z * ent (bb `tmul` tt) - v * ent (vvc `tmul` tt)
4.136897674018108

(z+v) * log w
72.0873067782343

let z' = 9 :: Double
    v' = 9 :: Double
    w' = 2 :: Double

(z'+v') * ent ((aa' `tmul` tt') `add` (vvc' `tmul` tt')) - z' * ent (aa' `tmul` tt') - v' * ent (vvc' `tmul` tt')
1.0193942207723854

(z'+v') * log w'
12.476649250079015

(z'+v') * ent ((vvc' `tmul` tt') `add` (vvc' `tmul` tt')) - z' * ent (vvc' `tmul` tt') - v' * ent (vvc' `tmul` tt')
0.0
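
The same quantity can be sketched on plain lists of derived counts, which makes the role of the two scale factors explicit. The names `entropyCounts` and `scaledSumRelativeEntropy` are hypothetical and not part of the library; the sketch assumes both lists enumerate the derived states in the same order, with the first list summing to $z$ and the second to $v$.

```haskell
-- Hypothetical helpers on plain lists of counts; not part of the library.

-- Entropy in nats of the normalised counts; zero counts contribute nothing.
entropyCounts :: [Double] -> Double
entropyCounts xs = negate (sum [p * log p | x <- xs, let p = x / sum xs, p > 0])

-- (z+v) * entropy(X + Y) - z * entropy(X) - v * entropy(Y),
-- where z = sum xs, v = sum ys and the lists share the same ordering.
scaledSumRelativeEntropy :: [Double] -> [Double] -> Double
scaledSumRelativeEntropy xs ys =
  (z + v) * entropyCounts (zipWith (+) xs ys)
    - z * entropyCounts xs
    - v * entropyCounts ys
  where
    z = sum xs
    v = sum ys
```

Under these assumptions, `scaledSumRelativeEntropy [40,12] [26,26]` is approximately 4.1369 and `scaledSumRelativeEntropy [6,3] [3,6]` is approximately 1.0194, matching the special deck and the $A'$ example, while identical lists give zero up to rounding.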


In all cases the relative entropy is maximised when (a) the cross entropy is maximised and (b) the component entropy is minimised. That is, the relative entropy is maximised when both (i) the component size entropy, $\mathrm{entropy}(A * T)$, and (ii) the component cardinality entropy, $\mathrm{entropy}(V^{\mathrm{C}} * T)$, are low, but low in different ways so that the component size cardinality sum cross entropy, $\mathrm{entropy}(A * T + V^{\mathrm{C}} * T)$, is high.

### Model entropy

Let histogram $A$ have a set of variables $V = \mathrm{vars}(A)$ which is partitioned into query variables $K \subset V$ and label variables $V \setminus K$. Let $T \in \mathcal{T}_{U,\mathrm{f},1}$ be a one functional transform having underlying variables equal to the query variables, $\mathrm{und}(T) = K$. As shown above, given a query state $Q \in K^{\mathrm{CS}}$ that is effective in the sample derived, $R \in (A * T)^{\mathrm{FS}}$ where $\{R\} = (\{Q\}^{\mathrm{U}} * T)^{\mathrm{FS}}$, the probability histogram for the label is $\begin{eqnarray} \{Q\}^{\mathrm{U}} * T * T^{\odot A}~\%~(V \setminus K) &\in& \mathcal{A} \cap \mathcal{P} \end{eqnarray}$

In the deck of cards example, the model of the colours of the suits does not tell us anything about the rank given the suit in the case where the histogram is the entire deck,

let qq = unit (Set.singleton (llss [(suit,clubs)]))

let vk = vv `Set.difference` kk

rpln $ aall $ norm $ qq `tmul` tt `mul` ttaa tt `mul` aa `ared` vk
"({(rank,A)},1 % 13)"
"({(rank,J)},1 % 13)"
"({(rank,K)},1 % 13)"
"({(rank,Q)},1 % 13)"
"({(rank,2)},1 % 13)"
"({(rank,3)},1 % 13)"
"({(rank,4)},1 % 13)"
"({(rank,5)},1 % 13)"
"({(rank,6)},1 % 13)"
"({(rank,7)},1 % 13)"
"({(rank,8)},1 % 13)"
"({(rank,9)},1 % 13)"
"({(rank,10)},1 % 13)"

So the entropy is high,

ent $ qq `tmul` tt `mul` ttaa tt `mul` aa `ared` vk
2.5649493574615376


In the case of the special deck, however, our model aligns the suit to the rank via colour, so a query on clubs is always a pip card,

rpln $ aall $ norm $ qq `tmul` tt `mul` ttaa tt `mul` bb `ared` vk
"({(rank,A)},1 % 10)"
"({(rank,2)},1 % 10)"
"({(rank,3)},1 % 10)"
"({(rank,4)},1 % 10)"
"({(rank,5)},1 % 10)"
"({(rank,6)},1 % 10)"
"({(rank,7)},1 % 10)"
"({(rank,8)},1 % 10)"
"({(rank,9)},1 % 10)"
"({(rank,10)},1 % 10)"

and the entropy is lower,

ent $ qq `tmul` tt `mul` ttaa tt `mul` bb `ared` vk
2.3025850929940455


Similarly, a query on hearts is always a face card,

let qq = unit (Set.singleton (llss [(suit,hearts)]))

rpln $ aall $ norm $ qq `tmul` tt `mul` ttaa tt `mul` bb `ared` vk
"({(rank,J)},1 % 3)"
"({(rank,K)},1 % 3)"
"({(rank,Q)},1 % 3)"

which has still lower entropy,

ent $ qq `tmul` tt `mul` ttaa tt `mul` bb `ared` vk
1.0986122886681096
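
Each of these three label histograms is uniform, so its entropy is simply the logarithm of the number of possible ranks. As a quick worked check, rounded to four places, $\begin{eqnarray} -\sum_{i=1}^{n} \frac{1}{n} \ln \frac{1}{n} &=& \ln n\\ \ln 13 &\approx& 2.5649\\ \ln 10 &\approx& 2.3026\\ \ln 3 &\approx& 1.0986 \end{eqnarray}$ where $n$ is the number of ranks remaining after the query: 13 for the whole deck, 10 for a black suit of the special deck and 3 for a red suit.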


If the normalised histogram, $\hat{A} \in \mathcal{A} \cap \mathcal{P}$, is treated as a probability function of a single-state query, the scaled expected entropy of the modelled transformed conditional product, or scaled label entropy, is $\begin{eqnarray} &&\sum_{(R,C) \in T^{-1}} (A * T)_R \times \mathrm{entropy}(A * C~\%~(V \setminus K))\\ &=&\sum_{(R,\cdot) \in T^{-1}} (A * T)_R \times \mathrm{entropy}(\{R\}^{\mathrm{U}} * T^{\odot A}~\%~(V \setminus K)) \end{eqnarray}$

setVarsTransformsHistogramsEntropyLabel :: Set.Set Variable -> Transform -> Histogram -> Double


For example,

let tlent kk aa tt = setVarsTransformsHistogramsEntropyLabel kk tt aa

tlent Set.empty aa tt
169.42101997711714

tlent kk aa tt
133.37736658799994

tlent Set.empty bb tt
141.3304045728963

tlent kk bb tt
105.28675118377913
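
The scaled label entropy can also be sketched without the library, starting from rows of (derived value, label value, count). The names `entropyCounts`, `scaledLabelEntropy` and `specialDeck` are hypothetical and introduced only for this illustration; the rows are an assumed hand reduction of the doubled special deck to colour and rank, where each rank occurs in two suits of its colour with a count of 2 each.

```haskell
import Data.List (nub)

-- Hypothetical helpers; not part of the library.

-- Entropy in nats of the normalised counts; zero counts contribute nothing.
entropyCounts :: [Double] -> Double
entropyCounts xs = negate (sum [p * log p | x <- xs, let p = x / sum xs, p > 0])

-- Sum over derived values of (component size) * (entropy of its label counts).
-- A row is (derived value, label value, count).
scaledLabelEntropy :: (Eq r, Eq l) => [(r, l, Double)] -> Double
scaledLabelEntropy rows =
  sum [sum cs * entropyCounts cs | r <- nub [r' | (r', _, _) <- rows], let cs = labelCounts r]
  where
    -- Total count of each distinct label within the component of r.
    labelCounts r =
      [ sum [c | (r', l', c) <- rows, r' == r, l' == l]
      | l <- nub [l' | (r', l', _) <- rows, r' == r] ]

-- Assumed reduction of the doubled special deck to (colour, rank, count).
specialDeck :: [(String, String, Double)]
specialDeck =
  [("black", r, 4) | r <- "A" : map show [2 .. 10 :: Int]] ++
  [("red", r, 4) | r <- ["J", "Q", "K"]]
```

With these assumptions `scaledLabelEntropy specialDeck` evaluates to $40 \ln 10 + 12 \ln 3$, approximately 105.2868, which agrees with `tlent kk bb tt` above.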


This is similar to the definition of the scaled expected component entropy, above, $\begin{eqnarray} z \times \mathrm{entropyComponent}(A,T) &:=& \sum_{(R,C) \in T^{-1}} (A * T)_R \times \mathrm{entropy}(A * C)\\ &=&\sum_{(R,\cdot) \in T^{-1}} (A * T)_R \times \mathrm{entropy}(\{R\}^{\mathrm{U}} * T^{\odot A}) \end{eqnarray}$ but now the component is reduced to the label variables, $V \setminus K$,

let cent aa tt = transformsHistogramsEntropyComponent tt aa

let z = fromRational (size aa) :: Double

z * cent aa tt
169.42101997711714

z * cent bb tt
141.3304045728963


The label entropy may be contrasted with the alignment between the derived variables, $W$, and the label variables, $V \setminus K$, $\begin{eqnarray} \mathrm{algn}(A * \mathrm{his}(T)~\%~(W \cup V \setminus K)) \end{eqnarray}$

algn $ aa `mul` ttaa tt `ared` (ww `Set.union` vk)
0.0

algn $ bb `mul` ttaa tt `ared` (ww `Set.union` vk)
17.152441878915248


The alignment varies against the scaled label entropy or scaled query conditional entropy. Let $B = A * \mathrm{his}(T)~\%~(W \cup V \setminus K)$, $\begin{eqnarray} &&\mathrm{algn}(A * \mathrm{his}(T)~\%~(W \cup V \setminus K)) \\ &&\hspace{5em}=\mathrm{algn}(B) \\ &&\hspace{5em}\approx z \times \mathrm{entropy}(B^{\mathrm{X}}) - z \times \mathrm{entropy}(B) \\ &&\hspace{5em}\sim z \times \mathrm{entropy}(B\%W) + z \times \mathrm{entropy}(B\%(V \setminus K)) - z \times \mathrm{entropy}(B) \\ &&\hspace{5em}\sim -(z \times \mathrm{entropy}(B) - z \times \mathrm{entropy}(B\%W)) \\ &&\hspace{5em}= -\sum_{R \in (B\%W)^{\mathrm{FS}}} (B\%W)_R \times \mathrm{entropy}(B * \{R\}^{\mathrm{U}}~\%~(V \setminus K))\\ &&\hspace{5em}= -\sum_{(R,C) \in T^{-1}} (A * T)_R \times \mathrm{entropy}(A * C~\%~(V \setminus K)) \end{eqnarray}$

The label entropy may also be compared to the slice entropy, which is the sum of the sized entropies of the contingent slices reduced to the label variables, $V \setminus K$, $\sum_{R \in (A\%K)^{\mathrm{FS}}} (A\%K)_R \times \mathrm{entropy}(A * \{R\}^{\mathrm{U}}~\%~(V \setminus K))$

let lent kk aa = setVarsHistogramsSliceEntropy kk aa

lent Set.empty aa
205.46467336623434

lent kk aa
133.37736658799994

lent Set.empty bb
169.42101997711714

lent kk bb
105.28675118377913
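
Because every slice in the deck examples is uniform, these four values can be reproduced by hand as sums of slice size times the logarithm of the number of states in the slice. The following is a worked check under that observation, rounded to four places, $\begin{eqnarray} K = \emptyset,~A &:& 52 \ln 52 ~\approx~ 205.4647\\ K = \{\mathrm{suit}\},~A &:& 4 \times 13 \ln 13 ~\approx~ 133.3774\\ K = \emptyset,~B &:& 52 \ln 26 ~\approx~ 169.4210\\ K = \{\mathrm{suit}\},~B &:& 2 \times 20 \ln 10 + 2 \times 6 \ln 3 ~\approx~ 105.2868 \end{eqnarray}$ Here $B$ denotes the doubled special deck, `bb`. When the suit is the query variable, the slice entropy coincides with the label entropy of the colour model for both decks, because within each colour the two suits have the same distribution of ranks.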


In the case where the relation between the derived variables and the label variables is functional or causal, $\begin{eqnarray} \mathrm{split}(W,(A * \mathrm{his}(T)~\%~(W \cup V \setminus K))^{\mathrm{FS}}) &\in& W^{\mathrm{CS}} \to (V \setminus K)^{\mathrm{CS}} \end{eqnarray}$ the label entropy is zero, $\begin{eqnarray} \sum_{(R,C) \in T^{-1}} (A * T)_R \times \mathrm{entropy}(A * C~\%~(V \setminus K)) &=& 0 \end{eqnarray}$

This would be the case, for example, for a deck consisting of 26 aces of spades and 26 queens of hearts,

let cc = scalar 26 `mul` unit (Set.fromList [
    llss [(suit,spades),(rank,ace)],
    llss [(suit,hearts),(rank,queen)]])

rpln $ aall $ cc
"({(rank,A),(suit,spades)},26 % 1)"
"({(rank,Q),(suit,hearts)},26 % 1)"

rpln $ aall $ cc `tmul` tt
"({(colour,black)},26 % 1)"
"({(colour,red)},26 % 1)"

rpln $ aall $ cc `mul` ttaa tt `ared` (ww `Set.union` vk)
"({(colour,black),(rank,A)},26 % 1)"
"({(colour,red),(rank,Q)},26 % 1)"

rpln $ Set.toList $ ssplit ww (states (cc `mul` ttaa tt `ared` (ww `Set.union` vk)))
"({(colour,black)},{(rank,A)})"
"({(colour,red)},{(rank,Q)})"

tlent kk cc tt
0.0

algn $ cc `mul` ttaa tt `ared` (ww `Set.union` vk)
32.31474810951032

Now the model predicts the rank given the suit,

let qq = unit (Set.singleton (llss [(suit,clubs)]))

rpln $ aall $ norm $ qq `tmul` tt `mul` ttaa tt `mul` cc `ared` vk
"({(rank,A)},1 % 1)"

let qq = unit (Set.singleton (llss [(suit,hearts)]))

rpln $ aall $ norm $ qq `tmul` tt `mul` ttaa tt `mul` cc `ared` vk
"({(rank,Q)},1 % 1)"

So label entropy is a measure of the ambiguity in the relation between the derived variables and the label variables. Negative label entropy may be viewed as the degree to which the derived variables of the model predict the label variables. In the cases of low label entropy, or high causality, the derived variables and the label variables are correlated and therefore aligned, $\mathrm{algn}(A * \mathrm{his}(T)~\%~(W \cup V \setminus K)) > 0$. In these cases the derived histogram tends to the diagonal, $\mathrm{algn}(A * T) > 0$.

### Example - a weather forecast

Some of the concepts above regarding transform entropy can be demonstrated with the sample of some weather measurements created in States, histories and histograms,

let lluu ll = fromJust $ listsSystem [(v,Set.fromList ww) | (v,ww) <- ll]
    llhh vv ev = fromJust $ listsHistory [(IdInt i, llss (zip vv ll)) | (i,ll) <- ev]
    red aa ll = setVarsHistogramsReduce (Set.fromList ll) aa
    ssplit ll aa = Set.toList (setVarsSetStatesSplit (Set.fromList ll) (states aa))
    lltt kk ww qq = trans (unit (Set.fromList [llss (zip (kk ++ ww) ll) | ll <- qq])) (Set.fromList ww)
    query qq tt aa ll = norm (qq `tmul` tt `mul` ttaa tt `mul` aa `red` ll)

let [pressure,cloud,wind,rain] = map VarStr ["pressure","cloud","wind","rain"]

let [low,medium,high,none,light,heavy,strong] = map ValStr ["low","medium","high","none","light","heavy","strong"]

let uu = lluu [
      (pressure, [low,medium,high]),
      (cloud,    [none,light,heavy]),
      (wind,     [none,light,strong]),
      (rain,     [none,light,heavy])]

let vv = uvars uu

let hh = llhh [pressure,cloud,wind,rain] [
      (1,[high,none,none,none]),
      (2,[medium,light,none,light]),
      (3,[high,none,light,none]),
      (4,[low,heavy,strong,heavy]),
      (5,[low,none,light,light]),
      (6,[medium,none,light,light]),
      (7,[low,heavy,light,heavy]),
      (8,[high,none,light,none]),
      (9,[medium,light,strong,heavy]),
      (10,[medium,light,light,light]),
      (11,[high,light,light,heavy]),
      (12,[medium,none,none,none]),
      (13,[medium,light,none,none]),
      (14,[high,light,strong,light]),
      (15,[medium,none,light,light]),
      (16,[low,heavy,strong,heavy]),
      (17,[low,heavy,light,heavy]),
      (18,[high,none,none,none]),
      (19,[low,light,none,light]),
      (20,[high,none,none,none])]

let aa = hhaa hh

rp uu
"{(cloud,{heavy,light,none}),(pressure,{high,low,medium}),(rain,{heavy,light,none}),(wind,{light,none,strong})}"

rp vv
"{cloud,pressure,rain,wind}"

rpln $ aall aa
"({(cloud,heavy),(pressure,low),(rain,heavy),(wind,light)},2 % 1)"
"({(cloud,heavy),(pressure,low),(rain,heavy),(wind,strong)},2 % 1)"
"({(cloud,light),(pressure,high),(rain,heavy),(wind,light)},1 % 1)"
"({(cloud,light),(pressure,high),(rain,light),(wind,strong)},1 % 1)"
"({(cloud,light),(pressure,low),(rain,light),(wind,none)},1 % 1)"
"({(cloud,light),(pressure,medium),(rain,heavy),(wind,strong)},1 % 1)"
"({(cloud,light),(pressure,medium),(rain,light),(wind,light)},1 % 1)"
"({(cloud,light),(pressure,medium),(rain,light),(wind,none)},1 % 1)"
"({(cloud,light),(pressure,medium),(rain,none),(wind,none)},1 % 1)"
"({(cloud,none),(pressure,high),(rain,none),(wind,light)},2 % 1)"
"({(cloud,none),(pressure,high),(rain,none),(wind,none)},3 % 1)"
"({(cloud,none),(pressure,low),(rain,light),(wind,light)},1 % 1)"
"({(cloud,none),(pressure,medium),(rain,light),(wind,light)},2 % 1)"
"({(cloud,none),(pressure,medium),(rain,none),(wind,none)},1 % 1)"

size aa
20 % 1


We considered the case where we wish to predict the rain given the pressure, cloud and wind in Transforms, by creating a transform which related cloud and wind,

let cloud_and_wind = VarStr "cloud_and_wind"

let tt = lltt [cloud,wind] [cloud_and_wind] [
[none, none, none],
[none, light, light],
[none, strong, light],
[light, none, light],
[light, light, light],
[light, strong, light],
[heavy, none, strong],
[heavy, light, strong],
[heavy, strong, strong]]



The derived, $A * T$, is

rpln $ aall $ aa `tmul` tt
"({(cloud_and_wind,light)},12 % 1)"
"({(cloud_and_wind,none)},4 % 1)"
"({(cloud_and_wind,strong)},4 % 1)"

rpln $ aarr $ norm $ aa `tmul` tt
"({(cloud_and_wind,light)},0.6)"
"({(cloud_and_wind,none)},0.2)"
"({(cloud_and_wind,strong)},0.2)"

The derived entropy, $\mathrm{entropy}(A * T)$, is

let ent = histogramsEntropy

ent (aa `tmul` tt)
0.9502705392332347

The derived entropy is non-negative and less than or equal to the logarithm of the derived volume, $0 \leq \mathrm{entropy}(A * T) \leq \ln w$,

let w = 3 :: Double

log w
1.0986122886681098

Complementary to the derived entropy is the expected component entropy, $\mathrm{entropyComponent}(A,T)$,

let cent = transformsHistogramsEntropyComponent

cent tt aa
1.603411018796562

The cartesian derived, $V^{\mathrm{C}} * T$, is

let vvc = unit (cart uu vv)

size vvc
81 % 1

rpln $ aall $ vvc `tmul` tt
"({(cloud_and_wind,light)},45 % 1)"
"({(cloud_and_wind,none)},9 % 1)"
"({(cloud_and_wind,strong)},27 % 1)"

rpln $ aarr $ norm $ vvc `tmul` tt
"({(cloud_and_wind,light)},0.5555555555555556)"
"({(cloud_and_wind,none)},0.1111111111111111)"
"({(cloud_and_wind,strong)},0.3333333333333333)"


The cartesian derived entropy, $\mathrm{entropy}(V^{\mathrm{C}} * T)$, is

ent (vvc `tmul` tt)
0.9368883075390159


The component size cardinality cross entropy, $\mathrm{entropyCross}(A * T,V^{\mathrm{C}} * T)$, is

let crent = histogramsHistogramsEntropyCross

crent (aa `tmul` tt) (vvc `tmul` tt)
1.0118393721421373

crent (aa `tmul` tt) (vvc `tmul` tt) >= ent (aa `tmul` tt)
True
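
These two numbers can be checked by hand from the normalised derived, $(0.6, 0.2, 0.2)$, and the normalised cartesian derived, $(5/9, 1/9, 1/3)$, listed in the order light, none, strong. The following is only a worked substitution into the definitions, rounded to four places, $\begin{eqnarray} \mathrm{entropy}(A * T) &=& -\left(0.6 \ln 0.6 + 0.2 \ln 0.2 + 0.2 \ln 0.2\right) ~\approx~ 0.9503\\ \mathrm{entropyCross}(A * T,V^{\mathrm{C}} * T) &=& -\left(0.6 \ln \frac{5}{9} + 0.2 \ln \frac{1}{9} + 0.2 \ln \frac{1}{3}\right) ~\approx~ 1.0118 \end{eqnarray}$ The cross entropy exceeds the derived entropy mainly because the none component holds a fifth of the sample but only a ninth of the volume.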


The component cardinality size cross entropy, $\mathrm{entropyCross}(V^{\mathrm{C}} * T,A * T)$, is

crent (vvc `tmul` tt) (aa `tmul` tt)
0.9990977520629283

crent (vvc `tmul` tt) (aa `tmul` tt) >= ent (vvc `tmul` tt)
True


The sum of the derived and cartesian derived, $A * T + V^{\mathrm{C}} * T$, is

rpln $ aall $ (aa `tmul` tt) `add` (vvc `tmul` tt)
"({(cloud_and_wind,light)},57 % 1)"
"({(cloud_and_wind,none)},13 % 1)"
"({(cloud_and_wind,strong)},31 % 1)"

rpln $ aarr $ norm $ (aa `tmul` tt) `add` (vvc `tmul` tt)
"({(cloud_and_wind,light)},0.5643564356435643)"
"({(cloud_and_wind,none)},0.12871287128712872)"
"({(cloud_and_wind,strong)},0.3069306930693069)"

The component size cardinality sum cross entropy, $\mathrm{entropy}(A * T + V^{\mathrm{C}} * T)$, is

ent ((aa `tmul` tt) `add` (vvc `tmul` tt))
0.9492604450332509

ent ((aa `tmul` tt) `add` (vvc `tmul` tt)) <= log w
True

The component size cardinality relative entropy, $\mathrm{entropyRelative}(A * T,V^{\mathrm{C}} * T)$, is the component size cardinality cross entropy minus the component size entropy, $\mathrm{entropyCross}(A * T,V^{\mathrm{C}} * T)~-~\mathrm{entropy}(A * T)$,

crent (aa `tmul` tt) (vvc `tmul` tt) - ent (aa `tmul` tt)
6.156883290890258e-2

The component cardinality size relative entropy, $\mathrm{entropyRelative}(V^{\mathrm{C}} * T,A * T)$, is the component cardinality size cross entropy minus the component cardinality entropy, $\mathrm{entropyCross}(V^{\mathrm{C}} * T,A * T)~-~\mathrm{entropy}(V^{\mathrm{C}} * T)$,

crent (vvc `tmul` tt) (aa `tmul` tt) - ent (vvc `tmul` tt)
6.2209444523912416e-2

The size-volume scaled component size cardinality sum relative entropy is the size-volume scaled component size cardinality sum cross entropy minus the size-volume scaled component size cardinality sum entropy, $\begin{eqnarray} (z+v) \times \mathrm{entropy}(A * T + V^{\mathrm{C}} * T) - z \times \mathrm{entropy}(A * T) - v \times \mathrm{entropy}(V^{\mathrm{C}} * T) \end{eqnarray}$

let z = fromRational (size aa) :: Double
    v = fromIntegral (vol uu vv) :: Double

(z+v) * ent ((aa `tmul` tt) `add` (vvc `tmul` tt)) - z * ent (aa `tmul` tt) - v * ent (vvc `tmul` tt)
0.9819412530333693

(z+v) * log w
110.95984115547908

Define the abbreviation rent for the size-volume scaled component size cardinality sum relative entropy,

let rent aa bb = let a = fromRational (size aa); b = fromRational (size bb) in (a+b) * ent (aa `add` bb) - a * ent aa - b * ent bb

rent (aa `tmul` tt) (vvc `tmul` tt)
0.9819412530333693

It was shown that the alignment between cloud_and_wind and rain is greater than the alignments between any of cloud, wind or pressure and rain,

algn $ aa `red` [pressure,rain]
4.27876667992199

algn $ aa `red` [cloud,rain]
6.415037968300277

algn $ aa `red` [wind,rain]
3.9301313052733455

algn $ aa `mul` ttaa tt `red` [cloud_and_wind,rain]
6.743705970350529

Define the abbreviation tlalgn for the alignment of the derived variables and the label variables,

let ared aa vv = setVarsHistogramsReduce vv aa
    tlalgn tt aa ll = algn (aa `mul` ttaa tt `ared` (der tt `Set.union` Set.fromList ll))

tlalgn tt aa [rain]
6.743705970350529

The alignments are all zero for a cartesian sample,

algn vvc
0.0

algn $ vvc `tmul` tt
0.0

and for the independent and formal,

algn $ ind aa
0.0

algn $ ind aa `tmul` tt
0.0

In the case of medium pressure, heavy cloud and light winds, the forecast for rain is heavy,

let qq1 = hhaa $ llhh [pressure,cloud,wind] [(1,[medium,heavy,light])]

rpln $ aarr $ query qq1 tt aa [rain]
"({(rain,heavy)},1.0)"


So the entropy for this query is zero,

ent $ query qq1 tt aa [rain]
-0.0

Compare this to the cartesian where all outcomes are equally probable,

rpln $ aarr $ query qq1 tt vvc [rain]
"({(rain,heavy)},0.3333333333333333)"
"({(rain,light)},0.3333333333333333)"
"({(rain,none)},0.3333333333333333)"

ent $ query qq1 tt vvc [rain]
1.0986122886681096


For some queries the model is ambiguous. For example, when the pressure is low, but there is no cloud and winds are light, the forecast is usually for light rain, but not always,

let qq2 = hhaa $ llhh [pressure,cloud,wind] [(1,[low,none,light])]

rpln $ aarr $ query qq2 tt aa [rain]
"({(rain,heavy)},0.16666666666666666)"
"({(rain,light)},0.5833333333333334)"
"({(rain,none)},0.25)"

In this case the entropy is higher,

ent $ query qq2 tt aa [rain]
0.9596147939120492
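
The query probabilities follow from the sample: the query state has no cloud and light wind, which the transform maps to the derived state cloud_and_wind = light. That component contains 12 of the 20 events, of which 2 have heavy rain, 7 light rain and 3 no rain. As a worked check of the entropy, rounded to four places, $\begin{eqnarray} -\left(\frac{2}{12} \ln \frac{2}{12} + \frac{7}{12} \ln \frac{7}{12} + \frac{3}{12} \ln \frac{3}{12}\right) &\approx& 0.2986 + 0.3144 + 0.3466\\ &\approx& 0.9596 \end{eqnarray}$ The pressure plays no part in the answer because it is not an underlying variable of this transform.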


but still lower than for the cartesian,

ent $ query qq2 tt vvc [rain]
1.0986122886681096

If the normalised histogram, $\hat{A} \in \mathcal{A} \cap \mathcal{P}$, is treated as a probability function of a single-state query, the scaled label entropy is $\begin{eqnarray} \sum_{(R,C) \in T^{-1}} (A * T)_R \times \mathrm{entropy}(A * C~\%~(V \setminus K)) \end{eqnarray}$

let tlent tt aa ll = setVarsTransformsHistogramsEntropyLabel (vars aa `Set.difference` (Set.fromList ll)) tt aa

tlent tt aa [rain]
11.51537752694459

An idea of the scale of the label entropy can be obtained from the cartesian,

z/v * tlent tt vvc [rain]
21.97224577336219

This is similar to the definition of the scaled expected component entropy, $z \times \mathrm{entropyComponent}(A,T)$,

z * cent tt aa
32.06822037593124

z * cent tt vvc
69.15121694266844

The label entropy may be contrasted with the alignment between the derived variables, $W$, and the label variables, $V \setminus K$, $\mathrm{algn}(A * \mathrm{his}(T)~\%~(W \cup V \setminus K))$

algn $ aa `mul` ttaa tt `red` [cloud_and_wind,rain]
6.743705970350529


or

tlalgn tt aa [rain]
6.743705970350529


This may be compared to the diagonalised for an idea of scale,

algn $ resize (size aa) $ regdiag 3 2
15.41314425103298


The label entropy may also be compared to the slice entropy, which is the sum of the sized entropies of the contingent slices reduced to the label variables, $V \setminus K$, $\sum_{R \in (A\%K)^{\mathrm{FS}}} (A\%K)_R \times \mathrm{entropy}(A * \{R\}^{\mathrm{U}}~\%~(V \setminus K))$

let lent aa ll = setVarsHistogramsSliceEntropy (vars aa `Set.difference` (Set.fromList ll)) aa

lent aa [rain]
1.3862943611198906

z/v * lent vvc [rain]
21.97224577336218


That is, the model label entropy is much higher than the sample label entropy, but model queries may be applied to ineffective sample states.

Now let us compare the entropy properties of several models. First redefine the cloud_and_wind model as $T_{\mathrm{cw}}$,

let ttcw = tt


Now consider a model $T_{\mathrm{c}}$ which consists of a literal reframe of the cloud variable,

let cloud2 = VarStr "cloud2"

let ttc = lltt [cloud] [cloud2] [
[none, none],
[light, light],
[heavy, heavy]]

rpln $ aarr $ norm $ aa `tmul` ttc
"({(cloud2,heavy)},0.2)"
"({(cloud2,light)},0.35)"
"({(cloud2,none)},0.45)"

ent (aa `tmul` ttc)
1.0486537893593546

So the simpler model, $T_{\mathrm{c}}$, has higher derived entropy than $T_{\mathrm{cw}}$. Consider the relative entropy,

rent (aa `tmul` ttc) (vvc `tmul` ttc)
0.8099580712542576

Now consider the alignment between the derived variable and the label variable,

tlalgn ttc aa [rain]
6.415037968300277

algn $ aa `red` [cloud,rain]
6.415037968300277


So the simpler model, $T_{\mathrm{c}}$, has both lower relative entropy and lower label alignment than $T_{\mathrm{cw}}$.

Now consider queries on the model,

let qq1 = hhaa $ llhh [pressure,cloud,wind] [(1,[medium,heavy,light])]

rpln $ aarr $ query qq1 ttc aa [rain]
"({(rain,heavy)},1.0)"

let qq2 = hhaa $ llhh [pressure,cloud,wind] [(1,[low,none,light])]

rpln $ aarr $ query qq2 ttc aa [rain]
"({(rain,light)},0.3333333333333333)"
"({(rain,none)},0.6666666666666666)"

tlent ttc aa [rain]
12.418526752441055


So the simpler model, $T_{\mathrm{c}}$, has higher label entropy than $T_{\mathrm{cw}}$. In short, the simpler model, $T_{\mathrm{c}}$, is generally a worse predictor of label than $T_{\mathrm{cw}}$.

Consider if a better predictor of the rain can be made by constructing a transform $T_{\mathrm{cp}}$ that relates cloud and pressure,

algn $ aa `red` [pressure,cloud]
4.623278490123701

let cloud_and_pressure = VarStr "cloud_and_pressure"

let ttcp = lltt [cloud,pressure] [cloud_and_pressure] [
      [none, high, none],
      [none, medium, light],
      [none, low, light],
      [light, high, light],
      [light, medium, light],
      [light, low, light],
      [heavy, high, strong],
      [heavy, medium, strong],
      [heavy, low, strong]]

rpln $ aarr $ norm $ aa `tmul` ttcp
"({(cloud_and_pressure,light)},0.55)"
"({(cloud_and_pressure,none)},0.25)"
"({(cloud_and_pressure,strong)},0.2)"

ent (aa `tmul` ttcp)
0.9972715231823841


So the new model, $T_{\mathrm{cp}}$, has higher derived entropy than $T_{\mathrm{cw}}$, but not as high as $T_{\mathrm{c}}$.

Consider the relative entropy,

rent (aa `tmul` ttcp) (vvc `tmul` ttcp)
1.4736881918377236


Now consider the alignment between the derived variable and the label variable,

tlalgn ttcp aa [rain]
8.020893993593209


So the new model, $T_{\mathrm{cp}}$, has both higher relative entropy and higher label alignment than $T_{\mathrm{cw}}$, although the derived entropy is higher.

Now consider queries on the model,

rpln $ aarr $ query qq1 ttcp aa [rain]
"({(rain,heavy)},1.0)"

rpln $ aarr $ query qq2 ttcp aa [rain]
"({(rain,heavy)},0.18181818181818182)"
"({(rain,light)},0.6363636363636364)"
"({(rain,none)},0.18181818181818182)"

tlent ttcp aa [rain]
9.982888235155102


So the new model, $T_{\mathrm{cp}}$, has lower label entropy than $T_{\mathrm{cw}}$. In short, the new model, $T_{\mathrm{cp}}$, is generally a better predictor of label than $T_{\mathrm{cw}}$.

To summarise,

[ent (aa `tmul` tt) | tt <- [ttc, ttcw, ttcp]]
[1.0486537893593546,0.9502705392332347,0.9972715231823841]

[cent tt aa | tt <- [ttc, ttcw, ttcp]]
[1.5050277686704419,1.603411018796562,1.5564100348474128]

[rent (aa `tmul` tt) (vvc `tmul` tt) | tt <- [ttc, ttcw, ttcp]]
[0.8099580712542576,0.9819412530333693,1.4736881918377236]

[tlalgn tt aa [rain] | tt <- [ttc, ttcw, ttcp]]
[6.415037968300277,6.743705970350529,8.020893993593209]

[tlent tt aa [rain] | tt <- [ttc, ttcw, ttcp]]
[12.418526752441055,11.51537752694459,9.982888235155102]


The weather forecast example continues in Functional definition sets.
