MUSH - Analysis of the UCI Machine Learning Repository Mushroom Data Set

Sections

Introduction

Properties of the sample

Predicting edibility without modelling

Predicting odor without modelling

Manual modelling of edibility

Induced modelling of edibility

Introduction

The UCI Machine Learning Repository Mushroom Data Set is a popular dataset that is often used to test machine learning algorithms, for example on Kaggle.

The dataset consists of descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Mushroom Family drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like “leaflets three, let it be” for Poisonous Oak and Ivy.

The dataset contains 8124 events of 23 discrete-valued variables:

  1. cap-shape: bell, conical, convex, flat, knobbed, sunken
  2. cap-surface: fibrous, grooves, scaly, smooth
  3. cap-color: brown, buff, cinnamon, gray, green, pink, purple, red, white, yellow
  4. bruises: bruises, no
  5. odor: almond, anise, creosote, fishy, foul, musty, none, pungent, spicy
  6. gill-attachment: attached, descending, free, notched
  7. gill-spacing: close, crowded, distant
  8. gill-size: broad, narrow
  9. gill-color: black, brown, buff, chocolate, gray, green, orange, pink, purple, red, white, yellow
  10. stalk-shape: enlarging, tapering
  11. stalk-root: bulbous, club, cup, equal, rhizomorphs, rooted, missing
  12. stalk-surface-above-ring: fibrous, scaly, silky, smooth
  13. stalk-surface-below-ring: fibrous, scaly, silky, smooth
  14. stalk-color-above-ring: brown, buff, cinnamon, gray, orange, pink, red, white, yellow
  15. stalk-color-below-ring: brown, buff, cinnamon, gray, orange, pink, red, white, yellow
  16. veil-type: partial, universal
  17. veil-color: brown, orange, white, yellow
  18. ring-number: none, one, two
  19. ring-type: cobwebby, evanescent, flaring, large, none, pendant, sheathing, zone
  20. spore-print-color: black, brown, buff, chocolate, green, orange, purple, white, yellow
  21. population: abundant, clustered, numerous, scattered, several, solitary
  22. habitat: grasses, leaves, meadows, paths, urban, waste, woods
  23. edibility: edible, poisonous

Note that although edibility is a secondary quality or classification, we shall treat it here as we would any other variable.

We shall analyse this dataset using the MUSH repository, which depends on the AlignmentRepa repository. The AlignmentRepa repository is a fast Haskell implementation of some of the practicable inducers described in the paper. The code in this section can be executed by pasting it into a Haskell interpreter (see README). Also see the Introduction in Notation.

Properties of the sample

First load the sample $A$,

:l MUSHDev

(uu,aa) <- mushIO

let vv = uvars uu
let vvl = sgl (VarStr "edible")
let vvk = vv `minus` vvl

The system is $U$. The sample substrate variables are $V = \mathrm{vars}(A)$, the label variables are $V_{\mathrm{l}} = \{\mathrm{edible}\}$, and the query variables form the remainder, $V_{\mathrm{k}} = V \setminus V_{\mathrm{l}}$.

The variable valencies are $\{(w,|U_w|) : w \in V\}$,

rpln $ sort [(u,w) | w <- qqll vv, let u = vol uu (sgl w)]
"(1,veil-type)"
"(2,bruises)"
"(2,edible)"
"(2,gill-attachment)"
"(2,gill-size)"
"(2,gill-spacing)"
"(2,stalk-shape)"
"(3,ring-number)"
"(4,cap-surface)"
"(4,stalk-surface-above-ring)"
"(4,stalk-surface-below-ring)"
"(4,veil-color)"
"(5,ring-type)"
"(5,stalk-root)"
"(6,cap-shape)"
"(6,population)"
"(7,habitat)"
"(9,odor)"
"(9,spore-print-color)"
"(9,stalk-color-above-ring)"
"(9,stalk-color-below-ring)"
"(10,cap-color)"
"(12,gill-color)"

Note that veil-type has only one value and so is a constant.

The variable dimension, $|V|$, is,

card vv
23

The variable volume, $|V^{\mathrm{C}}|$, is,

vol uu vv
243799621632000

So the geometric mean valency, $|V^{\mathrm{C}}|^{1/|V|}$, is,

exp $ log (fromIntegral (vol uu vv)) / fromIntegral (card vv)
4.222048084120202

The label variable dimension, $|V_{\mathrm{l}}|$, is,

card vvl
1

The label variable volume, $|V_{\mathrm{l}}^{\mathrm{C}}|$, is,

vol uu vvl
2

The query variable dimension, $|V_{\mathrm{k}}|$, is,

card vvk
22

The query variable volume, $|V_{\mathrm{k}}^{\mathrm{C}}|$, is,

vol uu vvk
121899810816000

The geometric mean query valency, $|V_{\mathrm{k}}^{\mathrm{C}}|^{1/|V_{\mathrm{k}}|}$, is,

exp $ log (fromIntegral (vol uu vvk)) / fromIntegral (card vvk)
4.367901791531438
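The volumes and geometric mean valencies above follow directly from the valency listing. The following is a minimal sketch in plain Haskell that does not use the AlignmentRepa functions; the names valencies, volume, meanValency and queryValencies are introduced here for illustration only.

```haskell
-- Valencies of all 23 variables, copied from the listing above
-- (veil-type = 1, bruises = 2, edible = 2, ..., gill-color = 12).
valencies :: [Integer]
valencies = [1,2,2,2,2,2,2,3,4,4,4,4,5,5,6,6,7,9,9,9,9,10,12]

-- Volume of a set of variables: the product of their valencies.
volume :: [Integer] -> Integer
volume = product

-- Geometric mean valency: the volume raised to the power 1/dimension.
meanValency :: [Integer] -> Double
meanValency vs = exp (log (fromIntegral (volume vs)) / fromIntegral (length vs))

-- The query variables are all of the variables except edible (valency 2),
-- so drop the leading 1 and one of the 2s, then put the 1 back.
queryValencies :: [Integer]
queryValencies = 1 : drop 2 valencies
```

In GHCi, volume valencies evaluates to 243799621632000, and meanValency queryValencies is approximately 4.3679, in agreement with the values above.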

The sample size, $\mathrm{size}(A)$, is

size aa
8124 % 1

The number of effective states equals the sample size, so each effective state corresponds to exactly one event, $A = A^{\mathrm{F}}$,

size $ eff aa
8124 % 1

Now consider how highly aligned variables might be grouped together. See Entropy and alignment. First consider pairs in the substrate, $V$, \[ \{(\mathrm{algn}(A\%\{w,x\}),~w,~x) : w \in V,~x \in V,~w < x\} \]

rpln $ reverse $ sort [(algn (aa `red` llqq [w,x]),w,x) | w <- qqll vv, x <- qqll vv, w < x]
"(5255.546241914184,odor,spore-print-color)"
"(5243.485309506665,gill-color,spore-print-color)"
"(5076.0182810184415,edible,odor)"
"(4869.445702663863,spore-print-color,stalk-root)"
"(4747.650540191724,gill-color,odor)"
"(4634.640609017435,odor,stalk-root)"
"(4538.425095211103,ring-type,spore-print-color)"
"(4504.522900434204,gill-color,stalk-root)"
"(4319.357344740492,gill-color,ring-type)"
"(4191.8793461894675,odor,ring-type)"
"(3876.2346155391206,population,stalk-root)"
"(3792.536734874855,habitat,stalk-root)"
"(3631.9071754705874,ring-type,stalk-root)"
"(3594.3100532199896,stalk-color-above-ring,stalk-color-below-ring)"
"(3580.005670110324,habitat,population)"
"(3526.338391588135,gill-color,habitat)"
...
"(40.29419779898308,bruises,stalk-shape)"
"(34.93679977406282,gill-attachment,gill-spacing)"
"(27.046821925505355,gill-spacing,stalk-shape)"
"(25.072123641512007,cap-surface,stalk-shape)"
"(24.98472755986586,bruises,ring-number)"
"(23.840310585706902,cap-shape,gill-spacing)"
"(10.583146539713198,ring-number,veil-color)"
"(0.0,veil-color,veil-type)"
"(0.0,stalk-surface-below-ring,veil-type)"
...
"(0.0,cap-color,veil-type)"
"(0.0,bruises,veil-type)"

We can see that all of the variables except for mono-valent veil-type are aligned with each other, even if only very weakly. We can also see that some of the variables that are in highly aligned pairs are also in other highly aligned pairs, e.g. odor or spore-print-color. This suggests that we should also consider tuple dimensions greater than two.

Now consider using the tupler to group together highly aligned variables in the substrate, $V$. Note that for performance reasons we must first construct a HistoryRepa from the sample histogram, $A$. See History and HistoryRepa.

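First consider the tuple dimension by choosing a volume limit, xmax. Compare products of the smallest and largest valencies, and powers of the geometric mean valencies, with the sample size,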

10*12
120

4.367901791531438 ** 4
363.9916829234716

2*2*2*2*2*2*3*4
768

9*9*10
810

9*10*12
1080

4.222048084120202 ** 5
1341.5778383137888

4.367901791531438 ** 5
1589.879923943975

2*2*2*2*2*2*3*4*4
3072

size aa
8124 % 1

9*9*10*12
9720
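One way to read these products is as bounds on the tuple dimension for a given volume limit. The smallest valencies show how many variables a tuple can hold at most, and the largest valencies show how many variables always fit. The following is a minimal sketch, assuming the limit of 1590 that is passed to the tuple builder below (the fifth power of the geometric mean query valency, rounded up); fitCount is a name introduced here, not a library function.

```haskell
import Data.List (sort)

-- Valencies of the 22 non-constant variables (veil-type excluded),
-- copied from the valency listing earlier in this section.
valencies :: [Integer]
valencies = [2,2,2,2,2,2,3,4,4,4,4,5,5,6,6,7,9,9,9,9,10,12]

-- Number of leading valencies whose running product stays within the limit.
fitCount :: Integer -> [Integer] -> Int
fitCount xmax = length . takeWhile (<= xmax) . scanl1 (*)

-- Upper bound on the dimension: multiply the smallest valencies first.
maxDimension :: Integer -> Int
maxDimension xmax = fitCount xmax (sort valencies)

-- Dimension that any choice of variables can reach: largest valencies first.
safeDimension :: Integer -> Int
safeDimension xmax = fitCount xmax (reverse (sort valencies))
```

With a limit of 1590, maxDimension gives 8 (volume 768, the next product being 3072) and safeDimension gives 3 (volume 1080, the next product being 9720), so tuples of between three and eight non-constant variables are possible.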

Now create a shuffled sample, $A_{\mathrm{r}}$,

let hh = aahr uu aa

let hhr = historyRepasShuffle_u hh 1

hrsize hhr
8124

The shuffle has the same size as the sample, $\mathrm{size}(A_{\mathrm{r}}) = \mathrm{size}(A)$.

Now optimise the shuffle content alignment with the tuple set builder, $I_{P,U,\mathrm{B,ns,me}}$, \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V,~\emptyset,~A,~A_{\mathrm{r}}))\} \]

let buildtup xmax omax bmax uu vv xx xxrr = reverse $ sort $ map (\((kk,_),_) -> (algn (araa uu (xx `hrred` kk)) - algn (araa uu (xxrr `hrred` kk)), kk)) $ parametersSystemsBuilderTupleNoSumlayerMultiEffectiveRepa_u xmax omax bmax 1 uu vv fudEmpty xx (hrhx xx) xxrr (hrhx xxrr)

rpln $ buildtup 1590 10 10 uu vv hh hhr 
"(20876.42056216617,{bruises,edible,odor,ring-type,stalk-root})"
"(20724.85696482678,{bruises,odor,ring-type,stalk-root,stalk-shape})"
"(20592.98543254341,{bruises,gill-size,odor,ring-type,stalk-root})"
"(18767.160263973958,{bruises,gill-spacing,odor,ring-type,stalk-root})"
"(17613.764146912337,{habitat,ring-type,spore-print-color,stalk-root})"
"(17513.385227253526,{habitat,odor,ring-type,stalk-root})"
"(17350.660566624287,{bruises,odor,ring-number,ring-type,stalk-root})"
"(17328.479593340562,{edible,odor,spore-print-color,stalk-root})"
"(16741.886141138097,{odor,population,ring-type,stalk-root})"
"(16513.36748368214,{bruises,gill-attachment,odor,ring-type,stalk-root})"

We can see that the top tuples have large intersections. Now optimise again having removed the top tuple from the substrate, \[ Q_1~=~\{\mathrm{bruises},~\mathrm{edible},~\mathrm{odor},~\mathrm{ring type},~\mathrm{stalk root}\} \] and \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V \setminus Q_1,~\emptyset,~A,~A_{\mathrm{r}}))\} \]

let qq1 = llqq $ map VarStr ["bruises","edible","odor","ring-type","stalk-root"]

rpln $ buildtup 1590 10 10 uu (vv `minus` qq1) hh hhr 
"(13524.714394562914,{gill-color,gill-size,gill-spacing,spore-print-color,stalk-shape})"
"(13437.808617574647,{gill-color,gill-size,ring-number,spore-print-color,stalk-shape})"
"(13136.927246387597,{gill-color,habitat,spore-print-color,stalk-shape})"
"(12773.283370139885,{gill-color,gill-size,habitat,spore-print-color})"
"(12435.740820820381,{gill-color,population,spore-print-color,stalk-shape})"
"(12195.043388118738,{gill-attachment,gill-color,gill-size,spore-print-color,stalk-shape})"
"(12128.0816369985,{gill-color,gill-size,population,spore-print-color})"
"(12020.86082016887,{gill-color,spore-print-color,stalk-shape,stalk-surface-below-ring})"
"(11594.659887811722,{gill-color,spore-print-color,stalk-shape,stalk-surface-above-ring})"
"(11566.386900293212,{gill-color,habitat,population,stalk-shape})"

Now optimise again having removed the top two tuples from the substrate, \[ Q_2~=~\{\mathrm{gill color},~\mathrm{gill size},~\mathrm{gill spacing},~\mathrm{spore print color},~\mathrm{stalk shape}\} \] and \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V \setminus Q_1 \setminus Q_2,~\emptyset,~A,~A_{\mathrm{r}}))\} \]

let qq2 = llqq $ map VarStr ["gill-color","gill-size","gill-spacing","spore-print-color","stalk-shape"]

rpln $ buildtup 1590 10 10 uu (vv `minus` qq1 `minus` qq2) hh hhr 
"(10110.48213557032,{habitat,population,stalk-color-below-ring,stalk-surface-below-ring})"
"(9991.599621423276,{habitat,population,stalk-color-above-ring,stalk-surface-below-ring})"
"(9672.89435286382,{habitat,population,stalk-color-below-ring,stalk-surface-above-ring})"
"(9590.642827336735,{habitat,population,stalk-color-above-ring,stalk-surface-above-ring})"
"(9322.604386487947,{stalk-color-above-ring,stalk-color-below-ring,stalk-surface-above-ring,stalk-surface-below-ring})"
"(8589.774390793347,{cap-surface,habitat,population,stalk-color-below-ring})"
"(8552.729628857698,{cap-surface,habitat,population,stalk-color-above-ring})"
"(8236.321978675958,{cap-color,habitat,population,ring-number})"
"(8086.979898530419,{habitat,population,ring-number,stalk-color-below-ring})"
"(8076.805728953208,{habitat,population,ring-number,stalk-color-above-ring})"

This time if we remove the union of the top four tuples we terminate at the remainder variables, \[ Q_3~=~\{\mathrm{habitat},~\mathrm{population},\ldots,~\mathrm{stalk surface above ring}\} \] and \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V \setminus Q_1 \setminus Q_2 \setminus Q_3,~\emptyset,~A,~A_{\mathrm{r}}))\} \]

let qq3 = llqq $ map VarStr ["habitat","population","stalk-color-below-ring","stalk-surface-below-ring","stalk-color-above-ring","stalk-surface-above-ring"]

rp (vv `minus` qq1 `minus` qq2 `minus` qq3)
"{cap-color,cap-shape,cap-surface,gill-attachment,ring-number,veil-color,veil-type}"

rpln $ buildtup 1590 10 10 uu (vv `minus` qq1 `minus` qq2 `minus` qq3) hh hhr
"(2979.6056462130255,{cap-color,cap-shape,cap-surface,gill-attachment,ring-number})"
"(2622.507907420848,{cap-color,cap-shape,gill-attachment,ring-number,veil-color})"
"(2560.129785055615,{cap-color,cap-shape,cap-surface,ring-number})"
"(2438.174819843647,{cap-color,cap-surface,gill-attachment,ring-number,veil-color})"
"(1952.2866086926588,{cap-color,cap-shape,cap-surface,gill-attachment})"
"(1936.9350626637206,{cap-color,cap-shape,cap-surface,veil-color})"
"(1831.4923688692215,{cap-color,cap-shape,gill-attachment,ring-number})"
"(1807.150150304944,{cap-color,cap-surface,gill-attachment,veil-color})"
"(1774.069884038865,{cap-color,cap-shape,ring-number,veil-color})"
"(1707.9959542906727,{cap-color,cap-shape,gill-attachment,veil-color})"

That is, there is a possible partition of the substrate as follows, $\bigcup\{Q_1,~Q_2,~Q_3,~V \setminus Q_1 \setminus Q_2 \setminus Q_3\} = V$,

rp qq1 
"{bruises,edible,odor,ring-type,stalk-root}"

rp qq2
"{gill-color,gill-size,gill-spacing,spore-print-color,stalk-shape}"

rp qq3
"{habitat,population,stalk-color-above-ring,stalk-color-below-ring,stalk-surface-above-ring,stalk-surface-below-ring}"

rp (vv `minus` qq1 `minus` qq2 `minus` qq3)
"{cap-color,cap-shape,cap-surface,gill-attachment,ring-number,veil-color,veil-type}"

We can check to see if the shuffle size is sufficient by optimising with a different shuffle,

let hhr = historyRepasShuffle_u hh 3

rpln $ buildtup 1590 10 10 uu vv hh hhr 
"(20898.818800009085,{bruises,edible,odor,ring-type,stalk-root})"
"(20734.9695740452,{bruises,odor,ring-type,stalk-root,stalk-shape})"
"(20591.92695959704,{bruises,gill-size,odor,ring-type,stalk-root})"
"(18791.851779468314,{bruises,gill-spacing,odor,ring-type,stalk-root})"
"(17627.706836273148,{habitat,ring-type,spore-print-color,stalk-root})"
"(17535.496226738054,{habitat,odor,ring-type,stalk-root})"
"(17355.91158249261,{bruises,odor,ring-number,ring-type,stalk-root})"
"(17341.73529314623,{edible,odor,spore-print-color,stalk-root})"
"(16762.475302884053,{odor,population,ring-type,stalk-root})"
"(16516.131296383355,{bruises,gill-attachment,odor,ring-type,stalk-root})"

The same ten tuples appear in the same order, with values that differ by less than 0.2%, which suggests that the partition is not sensitive to the shuffle seed.
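The partition itself can be checked with ordinary list operations. The following is a minimal sketch in plain Haskell that does not use the AlignmentRepa set functions; the names allVars, q1, q2, q3, remainder and isPartition are introduced here for illustration.

```haskell
import Data.List (sort, (\\))

-- The 23 substrate variables, in alphabetical order.
allVars :: [String]
allVars =
  [ "bruises", "cap-color", "cap-shape", "cap-surface", "edible"
  , "gill-attachment", "gill-color", "gill-size", "gill-spacing", "habitat"
  , "odor", "population", "ring-number", "ring-type", "spore-print-color"
  , "stalk-color-above-ring", "stalk-color-below-ring", "stalk-root"
  , "stalk-shape", "stalk-surface-above-ring", "stalk-surface-below-ring"
  , "veil-color", "veil-type" ]

-- The three tuples removed in turn, copied from the text above.
q1, q2, q3 :: [String]
q1 = ["bruises", "edible", "odor", "ring-type", "stalk-root"]
q2 = ["gill-color", "gill-size", "gill-spacing", "spore-print-color", "stalk-shape"]
q3 = [ "habitat", "population", "stalk-color-above-ring", "stalk-color-below-ring"
     , "stalk-surface-above-ring", "stalk-surface-below-ring" ]

-- What is left of the substrate after removing the three tuples.
remainder :: [String]
remainder = ((allVars \\ q1) \\ q2) \\ q3

-- The components are disjoint and cover the substrate exactly when their
-- concatenation is a rearrangement of the (duplicate-free) variable list.
isPartition :: Bool
isPartition = sort (concat [q1, q2, q3, remainder]) == sort allVars
```

Here isPartition is True, and remainder lists the seven cap, gill-attachment, ring-number and veil variables shown above.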

Predicting edibility without modelling

The sample query variables predict edibility. That is, there is a functional or causal relationship between the query variables and the label variables, $(A\%V_{\mathrm{k}})^{\mathrm{FS}} \to (A\%V_{\mathrm{l}})^{\mathrm{FS}}$. So the label entropy or query conditional entropy is zero. See Entropy and alignment. In this case, where $V = V_{\mathrm{k}} \cup V_{\mathrm{l}}$, the label entropy is \[ \begin{eqnarray} \mathrm{entropy}(A) - \mathrm{entropy}(A~\%~V_{\mathrm{k}})~=~0 \end{eqnarray} \] More generally, define \[ \begin{eqnarray} \mathrm{lent}(A,W,V_{\mathrm{l}})~:=~\mathrm{entropy}(A~\%~(W \cup V_{\mathrm{l}})) - \mathrm{entropy}(A~\%~W) \end{eqnarray} \]

let lent aa ww vvl = ent (aa `red` (ww `union` vvl)) - ent (aa `red` ww)

Then $\mathrm{lent}(A,V_{\mathrm{k}},V_{\mathrm{l}}) = 0$,

lent aa vvk vvl
0.0

We can determine which of the query variables has the least conditional entropy, \[ \begin{eqnarray} \{(\mathrm{lent}(A,\{w\},V_{\mathrm{l}}),~w) : w \in V_{\mathrm{k}}\} \end{eqnarray} \]

rpln $ sort [(lent aa (sgl w) vvl, w) | w <- qqll vvk]
"(6.445777995546464e-2,odor)"
"(0.3593018375305004,spore-print-color)"
"(0.4034743011923405,gill-color)"
"(0.47206538234114515,ring-type)"
"(0.49514434957356657,stalk-surface-above-ring)"
"(0.504038208263087,stalk-surface-below-ring)"
"(0.516549029620998,stalk-color-above-ring)"
"(0.5251645766232356,stalk-color-below-ring)"
"(0.5329702396776962,gill-size)"
"(0.5525144643974396,population)"
"(0.5591537977521386,bruises)"
"(0.5837923250560273,habitat)"
"(0.5990526304940014,stalk-root)"
"(0.6225742013519671,gill-spacing)"
"(0.6586777995379725,cap-shape)"
"(0.6658477366342479,ring-number)"
"(0.6675136370489365,cap-color)"
"(0.6726838566664071,cap-surface)"
"(0.6759923983315359,veil-color)"
"(0.6826826472037806,gill-attachment)"
"(0.6872908661915269,stalk-shape)"
"(0.6925010959051001,veil-type)"

This may be compared to the entropy of the label variables, $\mathrm{entropy}(A\%V_{\mathrm{l}})$,

ent $ aa `red` vvl
0.6925010959051001

Mono-valent veil-type has the highest conditional entropy. In fact, it is equal to the entropy of the label variables, and so makes no prediction of edibility, $\mathrm{lent}(A,\{\mathrm{veil type}\},V_{\mathrm{l}}) = \mathrm{entropy}(A\%V_{\mathrm{l}})$.

By contrast, odor has the least conditional entropy by quite a margin. Odor is highly predictive of edibility. Its label entropy is $\mathrm{lent}(A,\{\mathrm{odor}\},V_{\mathrm{l}})$,

let odor = VarStr "odor"

lent aa (sgl odor) vvl
6.445777995546464e-2

Let us reduce the sample, $A~\%~(\{\mathrm{odor}\} \cup V_{\mathrm{l}})$, to see the relationship,

rpln $ aall $ aa `red` (sgl odor `union` vvl)
"({(edible,edible),(odor,almond)},400 % 1)"
"({(edible,edible),(odor,anise)},400 % 1)"
"({(edible,edible),(odor,none)},3408 % 1)"
"({(edible,poisonous),(odor,creosote)},192 % 1)"
"({(edible,poisonous),(odor,fishy)},576 % 1)"
"({(edible,poisonous),(odor,foul)},2160 % 1)"
"({(edible,poisonous),(odor,musty)},36 % 1)"
"({(edible,poisonous),(odor,none)},120 % 1)"
"({(edible,poisonous),(odor,pungent)},256 % 1)"
"({(edible,poisonous),(odor,spicy)},576 % 1)"

rpln $ qqll $ ssplit vvk $ states (aa `red` (sgl odor `union` vvl))
"({(odor,almond)},{(edible,edible)})"
"({(odor,anise)},{(edible,edible)})"
"({(odor,creosote)},{(edible,poisonous)})"
"({(odor,fishy)},{(edible,poisonous)})"
"({(odor,foul)},{(edible,poisonous)})"
"({(odor,musty)},{(edible,poisonous)})"
"({(odor,none)},{(edible,edible)})"
"({(odor,none)},{(edible,poisonous)})"
"({(odor,pungent)},{(edible,poisonous)})"
"({(odor,spicy)},{(edible,poisonous)})"

Only value none is ambiguous.
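The label entropy of odor can be reproduced from these ten counts alone. The following is a minimal sketch in plain Haskell, assuming Shannon entropy with natural logarithms (which reproduces the values above); the names rows, entropy, marginal and lentEdibleGivenOdor are introduced here and are not part of the repository.

```haskell
import Data.List (nub)

-- (odor, edibility, count), copied from the reduced sample above.
rows :: [(String, String, Double)]
rows =
  [ ("almond", "edible", 400), ("anise", "edible", 400), ("none", "edible", 3408)
  , ("creosote", "poisonous", 192), ("fishy", "poisonous", 576)
  , ("foul", "poisonous", 2160), ("musty", "poisonous", 36)
  , ("none", "poisonous", 120), ("pungent", "poisonous", 256)
  , ("spicy", "poisonous", 576) ]

-- Shannon entropy in nats of a list of counts.
entropy :: [Double] -> Double
entropy cs = negate (sum [p * log p | c <- cs, c > 0, let p = c / total])
  where total = sum cs

-- Total count for each distinct key.
marginal :: [(String, Double)] -> [Double]
marginal kvs = [sum [c | (k', c) <- kvs, k' == k] | k <- nub (map fst kvs)]

joint, byOdor, byEdible :: [Double]
joint    = [c | (_, _, c) <- rows]
byOdor   = marginal [(o, c) | (o, _, c) <- rows]
byEdible = marginal [(e, c) | (_, e, c) <- rows]

-- Label entropy of edibility given odor: entropy(joint) - entropy(odor).
lentEdibleGivenOdor :: Double
lentEdibleGivenOdor = entropy joint - entropy byOdor
```

Only the slice with odor none contributes to lentEdibleGivenOdor, because every other odor value determines edibility exactly.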

Odor and edibility are also highly aligned, $\mathrm{algn}(A~\%~(\{\mathrm{odor}\} \cup V_{\mathrm{l}}))$,

algn $ aa `red` (sgl odor `union` vvl)
5076.0182810184415

which suggests that the relationship tends to be bijective or functional/causal in both directions. That is, edibility is also somewhat predictive of odor. The label entropy in the opposite direction is $\mathrm{lent}(A,V_{\mathrm{l}},\{\mathrm{odor}\})$,

lent aa vvl (sgl odor) 
0.9796522676447261

ent $ aa `red` (sgl odor) 
1.6076955835943616
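The two directions are linked by the chain rule for entropy, because both label entropies are differences from the same joint entropy, $\mathrm{entropy}(A~\%~(\{\mathrm{odor}\} \cup V_{\mathrm{l}}))$. This gives a check on the figures above, \[ \mathrm{lent}(A,V_{\mathrm{l}},\{\mathrm{odor}\})~=~\mathrm{entropy}(A\%\{\mathrm{odor}\}) + \mathrm{lent}(A,\{\mathrm{odor}\},V_{\mathrm{l}}) - \mathrm{entropy}(A\%V_{\mathrm{l}})~\approx~1.6076956 + 0.0644578 - 0.6925011~=~0.9796523 \] So knowing the edibility removes roughly 39% of the entropy of odor, $1 - 0.9796523/1.6076956 \approx 0.39$.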

rpln $ qqll $ ssplit vvl $ states (aa `red` (sgl odor `union` vvl))
"({(edible,edible)},{(odor,almond)})"
"({(edible,edible)},{(odor,anise)})"
"({(edible,edible)},{(odor,none)})"
"({(edible,poisonous)},{(odor,creosote)})"
"({(edible,poisonous)},{(odor,fishy)})"
"({(edible,poisonous)},{(odor,foul)})"
"({(edible,poisonous)},{(odor,musty)})"
"({(edible,poisonous)},{(odor,none)})"
"({(edible,poisonous)},{(odor,pungent)})"
"({(edible,poisonous)},{(odor,spicy)})"

Now, however, both values edible and poisonous are ambiguous.
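The ambiguity in each direction can be found by looking for keys that map to more than one value. The following is a minimal sketch in plain Haskell; the names states and ambiguous are introduced here for illustration and do not correspond to repository functions.

```haskell
import Data.List (nub, sort)
import Data.Tuple (swap)

-- Effective (odor, edibility) states, copied from the listings above.
states :: [(String, String)]
states =
  [ ("almond", "edible"), ("anise", "edible"), ("creosote", "poisonous")
  , ("fishy", "poisonous"), ("foul", "poisonous"), ("musty", "poisonous")
  , ("none", "edible"), ("none", "poisonous"), ("pungent", "poisonous")
  , ("spicy", "poisonous") ]

-- Keys that map to more than one distinct value. The relation from keys
-- to values is functional exactly when this list is empty.
ambiguous :: (Ord k, Eq v) => [(k, v)] -> [k]
ambiguous kvs =
  sort [k | k <- nub (map fst kvs), length (nub [v | (k', v) <- kvs, k' == k]) > 1]
```

In GHCi, ambiguous states returns only none, while ambiguous (map swap states) returns both edible and poisonous.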

We can determine minimum subsets of the query variables that are causal or predictive by using the repa conditional entropy tuple set builder. The conditional entropy minimisation searches for the set of tuples with the least label entropy. We show the resultant tuples along with their label entropies, \[ \{(\mathrm{lent}(A,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A,\mathrm{L}}))\} \]

let buildcondrr vvl aa kmax omax qmax = sort $ map (\(a,b) -> (b,a)) $ Map.toList $ fromJust $ parametersBuilderConditionalVarsRepa kmax omax qmax vvl aa

let (kmax,omax,qmax) = (1, 5, 5)

rpln $ buildcondrr vvl hh kmax omax qmax
"(6.445777995546464e-2,{odor})"
"(0.3593018375305004,{spore-print-color})"
"(0.4034743011923405,{gill-color})"
"(0.47206538234114515,{ring-type})"
"(0.49514434957356657,{stalk-surface-above-ring})"

let (kmax,omax,qmax) = (2, 5, 5)

rpln $ buildcondrr vvl hh kmax omax qmax
"(2.082990753054048e-2,{odor,spore-print-color})"
"(3.6279340021870166e-2,{cap-color,odor})"
"(3.849645066616292e-2,{gill-color,odor})"
"(4.566068343589347e-2,{odor,stalk-shape})"
"(4.619210440231036e-2,{odor,stalk-color-below-ring})"

All of the multi-variate tuples contain odor.
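That every larger tuple contains odor is what one would expect from a search that grows the best smaller tuples one variable at a time. The internals of the repa builder are not shown here, so the following is only a generic beam-search sketch and not its implementation. The toy sample is hypothetical, and the parameters kmax (maximum tuple dimension) and beam (number of tuples kept per step) are assumptions suggested by the outputs above rather than the documented meaning of the builder's arguments.

```haskell
import Data.List (nub, sort, sortOn)

-- One event: the values of the query columns and the label value.
type Row = ([Int], Int)

-- Hypothetical toy sample: the label is the XOR of columns 0 and 1,
-- and column 2 is unrelated noise.
toy :: [Row]
toy = [([a, b, c], (a + b) `mod` 2) | a <- [0, 1], b <- [0, 1], c <- [0, 1]]

-- Shannon entropy in nats of the empirical distribution of a list.
entropy :: Eq a => [a] -> Double
entropy xs = negate (sum [p * log p | x <- nub xs, let p = count x / n])
  where
    n = fromIntegral (length xs)
    count x = fromIntegral (length (filter (== x) xs))

-- Label entropy of the columns ks: entropy(ks and label) - entropy(ks).
labelEntropy :: [Row] -> [Int] -> Double
labelEntropy rows ks = entropy [(proj q, l) | (q, l) <- rows] - entropy [proj q | (q, _) <- rows]
  where proj q = [q !! k | k <- ks]

-- Beam search over column subsets of a non-empty sample: start from single
-- columns, extend each kept tuple by one column, keep the `beam` best.
beamSearch :: Int -> Int -> [Row] -> [(Double, [Int])]
beamSearch kmax beam rows = go 1 (rank [[k] | k <- cols])
  where
    cols = [0 .. length (fst (head rows)) - 1]
    rank tuples = take beam (sortOn fst [(labelEntropy rows t, t) | t <- nub (map sort tuples)])
    go k best
      | k >= kmax = best
      | otherwise = go (k + 1) (rank [c : t | (_, t) <- best, c <- cols, c `notElem` t])
```

On the toy sample, beamSearch 2 3 toy ranks the pair of columns 0 and 1 first with a label entropy of zero, even though no single column is predictive on its own.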

let (kmax,omax,qmax) = (3, 5, 5)

rpln $ buildcondrr vvl hh kmax omax qmax
"(6.893838773649907e-3,{habitat,odor,spore-print-color})"
"(8.190808632908553e-3,{gill-size,odor,spore-print-color})"
"(8.190808632908997e-3,{odor,ring-number,spore-print-color})"
"(8.951100722878191e-3,{odor,spore-print-color,stalk-surface-below-ring})"
"(9.511843838571288e-3,{cap-color,odor,spore-print-color})"

let (kmax,omax,qmax) = (4, 5, 5)

rpln $ buildcondrr vvl hh kmax omax qmax
"(-8.881784197001252e-16,{habitat,odor,population,spore-print-color})"
"(1.880396361283232e-3,{cap-color,habitat,odor,spore-print-color})"
"(1.8803963612867847e-3,{habitat,odor,spore-print-color,stalk-color-below-ring})"
"(2.215007955169046e-3,{odor,ring-number,spore-print-color,stalk-surface-above-ring})"
"(2.215007955169046e-3,{odor,ring-number,spore-print-color,stalk-surface-below-ring})"

So the minimum tuple dimension that is causal or predictive is 4. Let this tuple be $X$, \[ X~=~\{\mathrm{habitat},~\mathrm{odor},~\mathrm{population},~\mathrm{spore print color}\} \]

let xx = llqq $ map VarStr ["habitat","odor","population","spore-print-color"]

card xx
4

The label entropy, $\mathrm{lent}(A,X,V_{\mathrm{l}})$, is zero to within floating-point rounding error,

lent aa xx vvl
-8.881784197001252e-16

That is, there is a functional or causal relationship between the tuple, $X$, and the label variables, $(A\%X)^{\mathrm{FS}} \to (A\%V_{\mathrm{l}})^{\mathrm{FS}}$.

This tuple has a volume of $|X^{\mathrm{C}}| = 3402$,

vol uu xx
3402

but classifies the sample into only $|(A~\%~(X \cup V_{\mathrm{l}}))^{\mathrm{F}}| = |(A\%X)^{\mathrm{F}}| = 96$ effective states or slices,

rpln $ aall $ aa `red` (xx `union` vvl)
"({(edible,edible),(habitat,grasses),(odor,almond),(population,numerous),(spore-print-color,black)},32 % 1)"
"({(edible,edible),(habitat,grasses),(odor,almond),(population,numerous),(spore-print-color,brown)},32 % 1)"
"({(edible,edible),(habitat,grasses),(odor,almond),(population,scattered),(spore-print-color,black)},44 % 1)"
...
"({(edible,poisonous),(habitat,woods),(odor,musty),(population,clustered),(spore-print-color,white)},36 % 1)"
"({(edible,poisonous),(habitat,woods),(odor,none),(population,several),(spore-print-color,white)},32 % 1)"
"({(edible,poisonous),(habitat,woods),(odor,spicy),(population,several),(spore-print-color,white)},192 % 1)"

size $ eff $ aa `red` (xx `union` vvl)
96 % 1

rpln $ qqll $ ssplit vvk $ states (aa `red` (xx `union` vvl))
"({(habitat,grasses),(odor,almond),(population,numerous),(spore-print-color,black)},{(edible,edible)})"
"({(habitat,grasses),(odor,almond),(population,numerous),(spore-print-color,brown)},{(edible,edible)})"
"({(habitat,grasses),(odor,almond),(population,scattered),(spore-print-color,black)},{(edible,edible)})"
...
"({(habitat,woods),(odor,none),(population,solitary),(spore-print-color,chocolate)},{(edible,edible)})"
"({(habitat,woods),(odor,none),(population,solitary),(spore-print-color,white)},{(edible,edible)})"
"({(habitat,woods),(odor,spicy),(population,several),(spore-print-color,white)},{(edible,poisonous)})"

Let us consider whether a predictive tuple exists that excludes odor. Let $V_{\mathrm{k2}} = V_{\mathrm{k}} \setminus \{\mathrm{odor}\}$,

let vvk2 = vvk `minus` sgl odor

The reduced sample excluding odor is $A_2 = A~\%~(V_{\mathrm{k2}} \cup V_{\mathrm{l}})$. Repeat the conditional entropy minimisation, but with the reduced sample, \[ \{(\mathrm{lent}(A_2,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A_2,\mathrm{L}}))\} \]

let hrhrred hh vv = setVarsHistoryRepasHistoryRepaReduced vv hh

let hh2 = hh `hrhrred` (vvk2 `union` vvl)

let (kmax,omax,qmax) = (1, 5, 5)

rpln $ buildcondrr vvl hh2 kmax omax qmax
"(0.3593018375305004,{spore-print-color})"
"(0.4034743011923405,{gill-color})"
"(0.47206538234114515,{ring-type})"
"(0.49514434957356657,{stalk-surface-above-ring})"
"(0.504038208263087,{stalk-surface-below-ring})"

let (kmax,omax,qmax) = (4, 5, 5)

rpln $ buildcondrr vvl hh2 kmax omax qmax
"(0.0,{bruises,gill-size,spore-print-color,stalk-root})"
"(1.269419754079415e-2,{population,spore-print-color,stalk-root,stalk-shape})"
"(1.2695876064319211e-2,{cap-surface,gill-size,spore-print-color,stalk-root})"
"(1.3792239141938722e-2,{cap-surface,spore-print-color,stalk-root,stalk-shape})"
"(1.5268666719201907e-2,{bruises,spore-print-color,stalk-root,stalk-shape})"

In fact, there is another tetra-variate tuple that is causal or predictive of edibility. Let this tuple be $Y$, \[ Y~=~\{\mathrm{bruises},~\mathrm{gill size},~\mathrm{spore print color},~\mathrm{stalk root}\} \]

let yy = llqq $ map VarStr ["bruises","gill-size","spore-print-color","stalk-root"]

card yy
4

lent aa yy vvl
0.0

That is, there is a functional or causal relationship between the tuple and the label variables, $(A\%Y)^{\mathrm{FS}} \to (A\%V_{\mathrm{l}})^{\mathrm{FS}}$.

This tuple has a smaller volume of $|Y^{\mathrm{C}}| = 180$,

vol uu yy
180

and classifies the sample into only $|(A\%Y)^{\mathrm{F}}| = 33$ effective states or slices,

rpln $ aall $ aa `red` (yy `union` vvl)
"({(bruises,bruises),(edible,edible),(gill-size,broad),(spore-print-color,black),(stalk-root,bulbous)},864 % 1)"
"({(bruises,bruises),(edible,edible),(gill-size,broad),(spore-print-color,black),(stalk-root,club)},256 % 1)"
"({(bruises,bruises),(edible,edible),(gill-size,broad),(spore-print-color,black),(stalk-root,rooted)},96 % 1)"
"({(bruises,bruises),(edible,edible),(gill-size,broad),(spore-print-color,brown),(stalk-root,bulbous)},864 % 1)"
...
"({(bruises,no),(edible,poisonous),(gill-size,narrow),(spore-print-color,brown),(stalk-root,bulbous)},96 % 1)"
"({(bruises,no),(edible,poisonous),(gill-size,narrow),(spore-print-color,white),(stalk-root,club)},8 % 1)"
"({(bruises,no),(edible,poisonous),(gill-size,narrow),(spore-print-color,white),(stalk-root,missing)},1760 % 1)"

size $ eff $ aa `red` (yy `union` vvl)
33 % 1
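The volumes of $X$ and $Y$, and the fraction of each volume that is effective in the sample, can be checked by hand. The following is a minimal sketch in plain Haskell using the valencies listed earlier; valencyTable, volume and effectiveFraction are names introduced here for illustration.

```haskell
-- Valencies of the variables in X and Y, copied from the valency listing.
valencyTable :: [(String, Integer)]
valencyTable =
  [ ("bruises", 2), ("gill-size", 2), ("habitat", 7), ("odor", 9)
  , ("population", 6), ("spore-print-color", 9), ("stalk-root", 5) ]

xs, ys :: [String]
xs = ["habitat", "odor", "population", "spore-print-color"]
ys = ["bruises", "gill-size", "spore-print-color", "stalk-root"]

-- Volume of a tuple: the product of the valencies of its variables.
volume :: [String] -> Integer
volume vs = product [maybe (error ("unknown variable: " ++ v)) id (lookup v valencyTable) | v <- vs]

-- Fraction of the tuple's volume occupied by effective states.
effectiveFraction :: Integer -> [String] -> Double
effectiveFraction effective vs = fromIntegral effective / fromIntegral (volume vs)
```

Here effectiveFraction 96 xs is about 0.028 and effectiveFraction 33 ys is about 0.183, so $Y$ is both smaller and more densely occupied than $X$.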

rpln $ qqll $ ssplit vvk $ states (aa `red` (yy `union` vvl))
"({(bruises,bruises),(gill-size,broad),(spore-print-color,black),(stalk-root,bulbous)},{(edible,edible)})"
"({(bruises,bruises),(gill-size,broad),(spore-print-color,black),(stalk-root,club)},{(edible,edible)})"
"({(bruises,bruises),(gill-size,broad),(spore-print-color,black),(stalk-root,rooted)},{(edible,edible)})"
"({(bruises,bruises),(gill-size,broad),(spore-print-color,brown),(stalk-root,bulbous)},{(edible,edible)})"
...
"({(bruises,no),(gill-size,narrow),(spore-print-color,chocolate),(stalk-root,missing)},{(edible,edible)})"
"({(bruises,no),(gill-size,narrow),(spore-print-color,white),(stalk-root,bulbous)},{(edible,edible)})"
"({(bruises,no),(gill-size,narrow),(spore-print-color,white),(stalk-root,club)},{(edible,poisonous)})"
"({(bruises,no),(gill-size,narrow),(spore-print-color,white),(stalk-root,missing)},{(edible,poisonous)})"

The two tuples share only one variable, $X \cap Y$,

rp $ xx `inter` yy
"{spore-print-color}"

We can continue on by excluding spore-print-color,

let vvk3 = vvk2 `minus` sgl (VarStr "spore-print-color")

let hh3 = hh `hrhrred` (vvk3 `union` vvl)

let (kmax,omax,qmax) = (1, 5, 5)

rpln $ buildcondrr vvl hh3 kmax omax qmax
"(0.4034743011923405,{gill-color})"
"(0.47206538234114515,{ring-type})"
"(0.49514434957356657,{stalk-surface-above-ring})"
"(0.504038208263087,{stalk-surface-below-ring})"
"(0.516549029620998,{stalk-color-above-ring})"

let (kmax,omax,qmax) = (5, 5, 20)

rpln $ buildcondrr vvl hh3 kmax omax qmax
"(-8.881784197001252e-16,{bruises,habitat,population,ring-type,stalk-root})"
"(-4.440892098500626e-16,{bruises,gill-size,habitat,ring-type,stalk-root})"
"(-4.440892098500626e-16,{bruises,habitat,ring-number,ring-type,stalk-root})"
"(-4.440892098500626e-16,{bruises,habitat,ring-type,stalk-root,stalk-surface-above-ring})"
"(-4.440892098500626e-16,{bruises,habitat,ring-type,stalk-root,stalk-surface-below-ring})"
"(3.1735493851989816e-3,{bruises,gill-color,habitat,stalk-root})"
"(4.134518654213437e-3,{bruises,habitat,ring-type,stalk-root})"
"(2.296408727224719e-2,{habitat,ring-type,stalk-root,stalk-shape})"
...

Now the smallest predictive tuples found have dimension 5, and there are several of them.

We can see that there are multiple subsets of the query variables, $V_{\mathrm{k}}$, not necessarily including either odor or spore-print-color, that can predict the label variables or edibility, $V_{\mathrm{l}} = \{\mathrm{edible}\}$. For example, the tuple $X \subset V_{\mathrm{k}}$ or the tuple $Y \subset V_{\mathrm{k}}$.

Predicting odor without modelling

Now consider if there are tuples that can predict variable odor rather than variable edible. Let $V_{\mathrm{l2}} = \{\mathrm{odor}\}$,

let vvl2 = sgl odor

The entropy is $\mathrm{entropy}(A\%V_{\mathrm{l2}})$,

ent $ aa `red` vvl2
1.6076955835943616

The label entropy is $\mathrm{lent}(A,V_{\mathrm{k2}},V_{\mathrm{l2}})$,

lent aa vvk2 vvl2
0.3019349802152096

The label entropy is non-zero, so odor cannot be perfectly predicted even with all of the query variables. If we add edible to the query variables, the odor is still ambiguous, $\mathrm{lent}(A,~V \setminus V_{\mathrm{l2}},~V_{\mathrm{l2}}) > 0$,

lent aa (vv `minus` vvl2) vvl2
0.3019349802152096

A tetra-variate tuple captures most of what predictability there is,

let hh4 = hh `hrhrred` (vvk2 `union` vvl2)

let (kmax,omax,qmax) = (1, 5, 5)

rpln $ buildcondrr vvl2 hh4 kmax omax qmax
"(0.9477982525833584,{spore-print-color})"
"(1.006410004040864,{gill-color})"
"(1.0284970138083624,{stalk-root})"
"(1.083608451873597,{ring-type})"
"(1.2126735369913886,{cap-color})"

let (kmax,omax,qmax) = (4, 5, 5)

rpln $ buildcondrr vvl2 hh4 kmax omax qmax
"(0.3146291777557706,{cap-color,spore-print-color,stalk-root,stalk-shape})"
"(0.31607649976978625,{cap-surface,spore-print-color,stalk-root,stalk-shape})"
"(0.3216225692777601,{bruises,gill-size,spore-print-color,stalk-root})"
"(0.32306115291407433,{cap-surface,gill-size,spore-print-color,stalk-root})"
"(0.32333251163145116,{gill-size,habitat,spore-print-color,stalk-shape})"

let (kmax,omax,qmax) = (6, 5, 5)

rpln $ buildcondrr vvl2 hh4 kmax omax qmax
"(0.3019349802149689,{bruises,cap-color,cap-surface,spore-print-color,stalk-root,stalk-shape})"
"(0.30193498021497245,{bruises,cap-color,population,spore-print-color,stalk-root,stalk-shape})"
"(0.30193498021497245,{bruises,cap-surface,gill-color,gill-size,spore-print-color,stalk-root})"
"(0.30193498021497245,{bruises,gill-size,gill-spacing,spore-print-color,stalk-color-below-ring,stalk-root})"
"(0.30193498021497245,{bruises,gill-size,gill-spacing,spore-print-color,stalk-root,stalk-surface-below-ring})"

Instead of measuring the predictability of odor by label entropy we can measure the label modal size, \[ \begin{eqnarray} \sum_{R \in (A\%K)^{\mathrm{FS}}} \mathrm{maxr}(A * \{R\}^{\mathrm{U}}~\%~(V \setminus K)) \end{eqnarray} \] More generally, define \[ \begin{eqnarray} \mathrm{lmodal}(A,W,V_{\mathrm{l}})~:=~\sum_{R \in (A\%W)^{\mathrm{FS}}} \mathrm{maxr}(A~\%~(W \cup V_{\mathrm{l}}) * \{R\}^{\mathrm{U}}~\%~V_{\mathrm{l}}) \end{eqnarray} \] The tuple $Z$ is defined \[ Z~=~\{\mathrm{cap color},~\mathrm{spore print color},~\mathrm{stalk root},~\mathrm{stalk shape}\} \]

let zz = llqq $ map VarStr ["cap-color","spore-print-color","stalk-root","stalk-shape"]

The label entropy fraction of tuple $Z$ is $1 - \mathrm{lent}(A,Z,V_{\mathrm{l2}})/\mathrm{entropy}(A\%V_{\mathrm{l2}})$,

lent aa zz vvl2
0.3146291777557706

ent $ aa `red` vvl2
1.6076955835943616

1.0 - 0.3146291777557706/1.6076955835943616
0.8042980394009996

The function setVarsHistogramsSliceModal, defined in module Alignment, calculates the label modal size,

setVarsHistogramsSliceModal :: Set.Set Variable -> Histogram -> Rational

The label modal size fraction of tuple $Z$ is $\mathrm{lmodal}(A,Z,V_{\mathrm{l2}})/\mathrm{size}(A\%V_{\mathrm{l2}})$,

let lmodal aa ww vvl = setVarsHistogramsSliceModal ww (aa `red` (ww `union` vvl))

lmodal aa zz vvl2
6524 % 1

size $ aa `red` vvl2
8124 % 1

6524.0/8124.0
0.8030526834071886

Both measures can be interpreted as implying an odor prediction accuracy of around 80%.

We can analyse the components containing ambiguous values for variable odor. Define \[ \begin{eqnarray} \mathrm{lslices}(A,W,V_{\mathrm{l}})~:=~\{(R,~A~\%~(W \cup V_{\mathrm{l}}) * \{R\}^{\mathrm{U}}) : R \in (A\%W)^{\mathrm{FS}}\} \end{eqnarray} \]

let lslicesll aa ww vvl = Map.toList $ setVarsHistogramsSlices ww (aa `red` (ww `union` vvl))

Then \[ \begin{eqnarray} \{C' : (R,C) \in \mathrm{lslices}(A,Z,V_{\mathrm{l2}}),~C' = C\%V_{\mathrm{l2}},~\mathrm{size}(C^{'\mathrm{F}}) > 1\} \end{eqnarray} \]

rpln [cc' | (rr,cc) <- lslicesll aa zz vvl2, let cc' = cc `red` vvl2, size (eff cc') > 1]
"{({(odor,none)},24 % 1),({(odor,pungent)},64 % 1)}"
"{({(odor,almond)},24 % 1),({(odor,anise)},24 % 1)}"
"{({(odor,none)},24 % 1),({(odor,pungent)},64 % 1)}"
"{({(odor,almond)},24 % 1),({(odor,anise)},24 % 1)}"
"{({(odor,fishy)},288 % 1),({(odor,foul)},288 % 1),({(odor,spicy)},288 % 1)}"
"{({(odor,fishy)},288 % 1),({(odor,foul)},288 % 1),({(odor,spicy)},288 % 1)}"
"{({(odor,almond)},64 % 1),({(odor,anise)},64 % 1)}"
"{({(odor,almond)},12 % 1),({(odor,anise)},12 % 1)}"
"{({(odor,almond)},64 % 1),({(odor,anise)},64 % 1)}"
"{({(odor,almond)},12 % 1),({(odor,anise)},12 % 1)}"
"{({(odor,almond)},64 % 1),({(odor,anise)},64 % 1)}"
"{({(odor,almond)},24 % 1),({(odor,anise)},24 % 1)}"
"{({(odor,almond)},12 % 1),({(odor,anise)},12 % 1)}"
"{({(odor,almond)},64 % 1),({(odor,anise)},64 % 1)}"
"{({(odor,almond)},24 % 1),({(odor,anise)},24 % 1)}"
"{({(odor,almond)},12 % 1),({(odor,anise)},12 % 1)}"

We can see that in most of these components the values all have the same size. For example, in the last case the values almond and anise both have a component size of 12,

let rr = last [rr | (rr,cc) <- lslicesll aa zz vvl2, let cc' = cc `red` vvl2, size (eff cc') > 1]

rp rr
"{(cap-color,yellow),(spore-print-color,purple),(stalk-root,bulbous),(stalk-shape,tapering)}"

Then $A * \{R\}^{\mathrm{U}}~\%~(Z \cup V_{\mathrm{l2}})$ is

rpln $ aall $ aa `mul` single rr 1 `red` (zz `union` vvl2)
"({(cap-color,yellow),(odor,almond),(spore-print-color,purple),(stalk-root,bulbous),(stalk-shape,tapering)},12 % 1)"
"({(cap-color,yellow),(odor,anise),(spore-print-color,purple),(stalk-root,bulbous),(stalk-shape,tapering)},12 % 1)"

size $ eff $ aa `mul` single rr 1
24 % 1

This duplication probably arises from the method used in the construction of the hypothetical mushroom samples.

As mentioned above, edibility is also somewhat predictive of odor,

let edible = VarStr "edible"

The label entropy fraction is $1 - \mathrm{lent}(A,\{\mathrm{edible}\},\{\mathrm{odor}\})/\mathrm{entropy}(A\%\{\mathrm{odor}\})$,

lent aa (sgl edible) (sgl odor) 
0.9796522676447261

ent $ aa `red` (sgl odor) 
1.6076955835943616

1.0 - 0.9796522676447261/1.6076955835943616
0.39064815650330065

The label modal size fraction is $\mathrm{lmodal}(A,\{\mathrm{edible}\},\{\mathrm{odor}\})/\mathrm{size}(A\%\{\mathrm{odor}\})$,

lmodal aa (sgl edible) (sgl odor)
5568 % 1

size $ aa `red` (sgl odor)
8124 % 1

5568.0/8124.0
0.6853766617429837

but the odor prediction accuracy is lower: around 39% by the label entropy fraction and around 69% by the label modal size fraction.

Manual modelling of edibility

Having seen that edibility is predicted by various subsets of the substrate, $V$, consider if a model can do this in a more concise way.

There are some rules for poisonous mushrooms from most general to most specific:

P_1) odor=NOT(almond.OR.anise.OR.none)
     120 poisonous cases missed, 98.52% accuracy

P_2) spore-print-color=green
     48 cases missed, 99.41% accuracy
     
P_3) odor=none.AND.stalk-surface-below-ring=scaly.AND.
          (stalk-color-above-ring=NOT.brown) 
     8 cases missed, 99.90% accuracy
     
P_4) habitat=leaves.AND.cap-color=white
         100% accuracy

Rule P_4) may also be

P_4') population=clustered.AND.cap-color=white
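The rules can be read as predicates on a single mushroom. The following is a minimal standalone sketch of that reading, not the fud of transforms used in this analysis. The `Mushroom` map representation and the helper `has` are hypothetical, and a rule is assumed not to fire when an attribute is absent.

```haskell
import qualified Data.Map as Map

-- A mushroom as a map from attribute name to value (hypothetical representation).
type Mushroom = Map.Map String String

-- True when the attribute is present and its value satisfies the predicate.
has :: String -> (String -> Bool) -> Mushroom -> Bool
has k ok m = maybe False ok (Map.lookup k m)

-- Rules P_1 to P_4 as stated above.
p1, p2, p3, p4 :: Mushroom -> Bool
p1 = has "odor" (`notElem` ["almond", "anise", "none"])
p2 = has "spore-print-color" (== "green")
p3 m = has "odor" (== "none") m
    && has "stalk-surface-below-ring" (== "scaly") m
    && has "stalk-color-above-ring" (/= "brown") m
p4 m = has "habitat" (== "leaves") m && has "cap-color" (== "white") m

-- A mushroom is classified as poisonous when any rule fires.
poisonous :: Mushroom -> Bool
poisonous m = any ($ m) [p1, p2, p3, p4]
```

For example, `poisonous (Map.fromList [("odor","foul")])` is `True` by rule P_1, while a mushroom with odor almond and no other matching attributes is classified as not poisonous.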

We have created a fud of transforms for each of these rules in MUSH_model_manual.json (see Manual model construction).

First, load the model $G_{\mathrm{m}}$,

s <- ByteString.readFile "./MUSH_model_manual.json"
let ggm = fromJust $ persistentsFud $ fromJust $ (Data.Aeson.decode s :: Maybe FudPersistent)
let uu1 = uu `uunion` (fsys ggm)

The model has 4 derived variables, $W_{\mathrm{m}} = \mathrm{der}(G_{\mathrm{m}})$,

rp $ fder ggm
"{p1,p2,p3,p4}"

and a derived volume, $|W_{\mathrm{m}}^{\mathrm{C}}|$, of 16,

vol uu1 (fder ggm)
16

The model has 6 underlying variables, $V_{\mathrm{m}} = \mathrm{und}(G_{\mathrm{m}})$,

rp $ fund ggm
"{cap-color,habitat,odor,spore-print-color,stalk-color-above-ring,stalk-surface-below-ring}"

The underlying volume, $|V_{\mathrm{m}}^{\mathrm{C}}|$, is

vol uu $ fund ggm
204120
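The two volumes are just products of valencies, which can be checked by hand. The sketch below is standalone and does not use the Alignment library. The valencies of cap-color, odor, stalk-color-above-ring and stalk-surface-below-ring are counted from the variable list in the Introduction; the valencies assumed for habitat (7) and spore-print-color (9) are not listed in this part of the document and are taken as assumptions from the dataset's attribute description.

```haskell
-- Valencies of the six underlying variables of the manual model.
-- habitat = 7 and spore-print-color = 9 are assumptions (see the text above).
valencies :: [(String, Integer)]
valencies =
  [ ("cap-color", 10)
  , ("habitat", 7)
  , ("odor", 9)
  , ("spore-print-color", 9)
  , ("stalk-color-above-ring", 9)
  , ("stalk-surface-below-ring", 4)
  ]

-- The underlying volume is the product of the underlying valencies.
underlyingVolume :: Integer
underlyingVolume = product (map snd valencies)

-- The derived volume for four bi-valent derived variables p1..p4.
derivedVolume :: Integer
derivedVolume = 2 ^ (4 :: Int)
```

Under these assumptions `underlyingVolume` evaluates to 204120 and `derivedVolume` to 16, in agreement with the values reported by `vol`.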

Let the derived be $A' = A * G_{\mathrm{m}}^{\mathrm{T}}$. The derived alignment, $\mathrm{algn}(A')$, is

let aa' = aa `fmul` ggm `red` fder ggm

algn aa'
69.86642835717794

The derived variables are only weakly aligned. Furthermore, they are overlapped, $\mathrm{overlap}(G_{\mathrm{m}}^{\mathrm{T}})$,

fudsOverlap ggm
True

so the content derived alignment, $\mathrm{algn}(A * G_{\mathrm{m}}^{\mathrm{T}}) - \mathrm{algn}(A^{\mathrm{X}} * G_{\mathrm{m}}^{\mathrm{T}})$, would be lower still.

The derived entropy, $\mathrm{entropy}(A')$, is

ent aa'
0.7711287134449115

This may be compared to the logarithm of the derived volume, $\ln |W_{\mathrm{m}}^{\mathrm{C}}|$,

let w = fromIntegral (vol uu1 (fder ggm)) :: Double

log w
2.772588722239781

Let the cartesian derived be $V_{\mathrm{m}}^{\mathrm{C}'} = V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}}$. The cartesian derived entropy, $\mathrm{entropy}(V_{\mathrm{m}}^{\mathrm{C}'})$, depends on the underlying cartesian, $V_{\mathrm{m}}^{\mathrm{C}}$, but the underlying volume, $|V_{\mathrm{m}}^{\mathrm{C}}|$, is quite large so we calculate the cartesian derived entropy by constructing a HistoryRepa,

let hvvg = aahr uu1 $ unit (cart uu1 (fund ggm))

hrsize hvvg
204120

let vvc' = hhaa $ hrhh uu1 $ hrfmul uu1 ggm hvvg `hrhrred` fder ggm

ent vvc'
1.1482395879784482

The cartesian derived entropy is greater than the derived entropy, $\mathrm{entropy}(V_{\mathrm{m}}^{\mathrm{C}'}) > \mathrm{entropy}(A')$.

The size-volume scaled component size cardinality sum relative entropy is the size-volume scaled component size cardinality sum cross entropy minus the size-volume scaled component size cardinality sum entropy (Transform entropy), \[ \begin{eqnarray} (z+v_{\mathrm{m}}) \times \mathrm{entropy}(A * G_{\mathrm{m}}^{\mathrm{T}} + V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}}) - z \times \mathrm{entropy}(A * G_{\mathrm{m}}^{\mathrm{T}}) - v_{\mathrm{m}} \times \mathrm{entropy}(V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}}) \end{eqnarray} \]

let z = fromRational (size aa') :: Double

let v = fromRational (size vvc') :: Double

(z+v) * ent (aa' `add` vvc') - z * ent aa' - v * ent vvc'
1663.472301909118

(z+v) * log w
588465.3207630601

Define the abbreviation rent for the size-volume scaled component size cardinality sum relative entropy, \[ \begin{eqnarray} \mathrm{rent}(A,B)~:=~(z_A+z_B) \times \mathrm{entropy}(A + B) - z_A \times \mathrm{entropy}(A) - z_B \times \mathrm{entropy}(B) \end{eqnarray} \]

let rent aa bb = let a = fromRational (size aa); b = fromRational (size bb) in (a+b) * ent (aa `add` bb) - a * ent aa - b * ent bb

Then the relative entropy is $\mathrm{rent}(A',V_{\mathrm{m}}^{\mathrm{C}'})$,

rent aa' vvc'
1663.472301909118
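To make the definition of rent concrete, here is a standalone sketch on toy histograms represented as maps from a state name to a count. It does not use the Alignment library; the type `Hist`, the helper `histSize` and the example histograms are hypothetical.

```haskell
import qualified Data.Map.Strict as Map

-- A toy histogram: counts keyed by state name (hypothetical representation).
type Hist = Map.Map String Double

histSize :: Hist -> Double
histSize = sum . Map.elems

-- Entropy in nats of the normalised histogram.
ent :: Hist -> Double
ent h = negate (sum [p * log p | c <- Map.elems h, c > 0, let p = c / z])
  where
    z = histSize h

-- Scaled relative entropy, following the definition of rent above.
rent :: Hist -> Hist -> Double
rent a b = (za + zb) * ent (Map.unionWith (+) a b) - za * ent a - zb * ent b
  where
    za = histSize a
    zb = histSize b

-- Example histograms.
same, left, right :: Hist
same = Map.fromList [("s0", 3), ("s1", 1)]
left = Map.fromList [("s0", 1)]
right = Map.fromList [("s1", 1)]
```

With these definitions `rent same same` is zero, because adding a histogram to itself does not change its entropy, while `rent left right` is $2 \ln 2 \approx 1.386$ for the two disjoint singletons.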

Like the derived alignment, the relative entropy is quite low. These statistics are interesting because both give us a measure of the likelihood of the model. This is especially the case for the size-volume scaled component size cardinality sum relative entropy, $\mathrm{rent}(A * G_{\mathrm{m}}^{\mathrm{T}},V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}})$, which is discussed in the ‘Induction with model’ section of the Overview of the paper.

In the discussion of induced models below the underlying volumes are impracticably large so let us approximate the relative entropy by using a volume sized shuffle. We constructed a shuffle, $A_{\mathrm{r}}$, earlier when discussing tuples in the substrate,

let aar =  hhaa (hrhh uu hh)

size aar
8124 % 1

We will calculate the size-volume-sized-shuffle relative entropy, \[ \begin{eqnarray} (z+v_{\mathrm{m}}) \times \mathrm{ent}(A * G_{\mathrm{m}}^{\mathrm{T}} + Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}}) - z \times \mathrm{ent}(A * G_{\mathrm{m}}^{\mathrm{T}}) - v_{\mathrm{m}} \times \mathrm{ent}(Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}}) \end{eqnarray} \] where $v_{\mathrm{m}} = |V_{\mathrm{m}}^{\mathrm{C}}|$ and $Z_{\mathrm{m}} = \mathrm{scalar}(v_{\mathrm{m}})$.

Let the shuffle derived be $A_{\mathrm{r}}' = A_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}}$,

let aar' = aar `fmul` ggm `red` fder ggm

The shuffle derived alignment, $\mathrm{algn}(A_{\mathrm{r}}')$, is expected to be low,

algn aar'
69.86823318824463

The volume sized shuffle derived entropy, $\mathrm{entropy}(Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}')$, is

ent (resize (size vvc') aar')
0.8722031664849256

and the size-volume-sized-shuffle relative entropy, $\mathrm{rent}(A',~Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}')$, is

rent aa' (resize (size vvc') aar')
155.0603057218832

We can see that the size-volume-sized-shuffle relative entropy, $\mathrm{rent}(A',~Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}')$, is lower than the size-volume relative entropy, $\mathrm{rent}(A',V_{\mathrm{m}}^{\mathrm{C}'})$. This is because the volume sized shuffle, $Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}$, is less uniform than the cartesian, $V_{\mathrm{m}}^{\mathrm{C}}$, and tends to synchronise with the sample, $A$. Nevertheless, the size-volume-sized-shuffle relative entropy still provides us with a measure of the likelihood of the model.

Now apply the model to the sample. Let $B = A * \mathrm{his}(G_{\mathrm{m}}^{\mathrm{T}})$,

let bb = aa `fmul` ggm

rpln $ aall $ bb `red` (fder ggm `union` vvl)
"({(edible,edible),(p1,0),(p2,0),(p3,0),(p4,0)},4208 % 1)"
"({(edible,poisonous),(p1,0),(p2,0),(p3,0),(p4,1)},8 % 1)"
"({(edible,poisonous),(p1,0),(p2,0),(p3,1),(p4,0)},40 % 1)"
"({(edible,poisonous),(p1,0),(p2,1),(p3,0),(p4,0)},72 % 1)"
"({(edible,poisonous),(p1,1),(p2,0),(p3,0),(p4,0)},3796 % 1)"

size $ eff $ bb `red` (fder ggm `union` vvl)
5 % 1

rpln $ qqll $ ssplit (fder ggm) $ states (bb `red` (fder ggm `union` vvl))
"({(p1,0),(p2,0),(p3,0),(p4,0)},{(edible,edible)})"
"({(p1,0),(p2,0),(p3,0),(p4,1)},{(edible,poisonous)})"
"({(p1,0),(p2,0),(p3,1),(p4,0)},{(edible,poisonous)})"
"({(p1,0),(p2,1),(p3,0),(p4,0)},{(edible,poisonous)})"
"({(p1,1),(p2,0),(p3,0),(p4,0)},{(edible,poisonous)})"

We can see that together the rules P1-4 are functionally or causally related to edibility, $(B\%W_{\mathrm{m}})^{\mathrm{FS}} \to (B\%V_{\mathrm{l}})^{\mathrm{FS}}$. In addition, there are only 5 effective states of 16 derived states, so the model, $G_{\mathrm{m}}$, might be said to be more concise than the tuples $X$ and $Y$ of the non-modelled case above.
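The functional relation can be checked mechanically: every derived state must be paired with exactly one label value. The sketch below is standalone and hypothetical (`isFunctional` is not a library function); the pairs are copied from the split states listed above, with each derived state written as the list of values of p1 to p4.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- True when every query state is paired with exactly one label value.
isFunctional :: (Ord q, Ord l) => [(q, l)] -> Bool
isFunctional pairs = all ((== 1) . Set.size) (Map.elems images)
  where
    images = Map.fromListWith Set.union [(q, Set.singleton l) | (q, l) <- pairs]

-- Effective (derived state, edibility) pairs for the manual model.
rulesVsEdible :: [([Int], String)]
rulesVsEdible =
  [ ([0, 0, 0, 0], "edible")
  , ([0, 0, 0, 1], "poisonous")
  , ([0, 0, 1, 0], "poisonous")
  , ([0, 1, 0, 0], "poisonous")
  , ([1, 0, 0, 0], "poisonous")
  ]

-- The same pairs reduced to rule p1 alone.
p1VsEdible :: [([Int], String)]
p1VsEdible = [(take 1 q, l) | (q, l) <- rulesVsEdible]
```

Here `isFunctional rulesVsEdible` holds, whereas `isFunctional p1VsEdible` does not, because the state p1 = 0 occurs with both edible and poisonous mushrooms.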

The model entropy is similar to the slice entropy of the non-modelled case. The model’s label entropy or query conditional entropy is zero, $\mathrm{lent}(B,W_{\mathrm{m}},V_{\mathrm{l}}) = 0$. Here lent is redefined to take the tuple as a list of variables,

let lent aa ww vvl = ent (aa `red` (llqq ww `union` vvl)) - ent (aa `red` llqq ww)

let [p1,p2,p3,p4] = map VarStr ["p1","p2","p3","p4"]

lent bb [p1,p2,p3,p4] vvl
0.0

lent bb [p1] vvl
6.75240166808252e-2

lent bb [p2] vvl
0.6859909862350153

lent bb [p3] vvl
0.6888949375684059

lent bb [p4] vvl
0.6917819609609287

lent bb [p1,p2] vvl
3.237355156509225e-2

lent bb [p1,p2,p3] vvl
7.155343359194988e-3

Rule P1 is far more predictive of edibility than the other rules, having a label entropy, $\mathrm{lent}(B,\{\mathrm{p}_1\},V_{\mathrm{l}})$, of only 6.75240166808252e-2, whereas the other rules are close to the maximum label entropy. The label entropy fraction is $1 - \mathrm{lent}(B,\{\mathrm{p}_1\},V_{\mathrm{l}})/\ln |V_{\mathrm{l}}^{\mathrm{C}}|$,

vol uu vvl
2

log 2
0.6931471805599453

1.0 - 6.75240166808252e-2/0.6931471805599453
0.9025834359936699

The label modal size fraction is $\mathrm{lmodal}(B,\{\mathrm{p}_1\},V_{\mathrm{l}})/\mathrm{size}(B\%V_{\mathrm{l}})$,

lmodal bb (sgl p1) vvl
8004 % 1

size $ bb `red` vvl
8124 % 1

8004.0/8124.0
0.9852289512555391

As noted above, odor is highly predictive of edibility,

lent aa [odor] vvl
6.445777995546464e-2

lmodal aa (sgl odor) vvl
8004 % 1
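Both figures can be recomputed from the ten counts of odor against edibility, which are listed just below. The following standalone sketch does not use the Alignment library; the names `table`, `perOdor`, `labelEntropy` and `labelModal` are hypothetical. The label entropy is taken as the entropy of the joint counts minus the entropy of the odor counts, as in the definition of lent, and the label modal size as the sum over odor values of the largest count.

```haskell
import qualified Data.Map.Strict as Map

-- Counts of (odor, edible) in the sample.
table :: [((String, String), Double)]
table =
  [ (("almond", "edible"), 400)
  , (("anise", "edible"), 400)
  , (("none", "edible"), 3408)
  , (("creosote", "poisonous"), 192)
  , (("fishy", "poisonous"), 576)
  , (("foul", "poisonous"), 2160)
  , (("musty", "poisonous"), 36)
  , (("none", "poisonous"), 120)
  , (("pungent", "poisonous"), 256)
  , (("spicy", "poisonous"), 576)
  ]

-- Entropy in nats of a list of counts after normalisation.
entropy :: [Double] -> Double
entropy cs = negate (sum [p * log p | c <- cs, c > 0, let p = c / z])
  where
    z = sum cs

-- Counts per odor value, combining the edibility counts with the given function.
perOdor :: (Double -> Double -> Double) -> [Double]
perOdor f = Map.elems (Map.fromListWith f [(o, c) | ((o, _), c) <- table])

-- Entropy of (odor, edible) minus entropy of odor.
labelEntropy :: Double
labelEntropy = entropy (map snd table) - entropy (perOdor (+))

-- Sum over odor values of the modal edibility count.
labelModal :: Double
labelModal = sum (perOdor max)
```

Only odor none is ambiguous, so it alone contributes to the label entropy and accounts for the 120 cases missing from the label modal size.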

rpln $ aall $ aa `red` (sgl odor `union` vvl)
"({(edible,edible),(odor,almond)},400 % 1)"
"({(edible,edible),(odor,anise)},400 % 1)"
"({(edible,edible),(odor,none)},3408 % 1)"
"({(edible,poisonous),(odor,creosote)},192 % 1)"
"({(edible,poisonous),(odor,fishy)},576 % 1)"
"({(edible,poisonous),(odor,foul)},2160 % 1)"
"({(edible,poisonous),(odor,musty)},36 % 1)"
"({(edible,poisonous),(odor,none)},120 % 1)"
"({(edible,poisonous),(odor,pungent)},256 % 1)"
"({(edible,poisonous),(odor,spicy)},576 % 1)"

Derived variable p1 depends only on odor. Let $T_1 \in G_{\mathrm{m}}$ be such that $\mathrm{der}(T_1) = \{\mathrm{p}_1\}$. Then $\mathrm{und}(T_1) = \{\mathrm{odor}\}$,

let tt1 = least $ ffqq $ ggm `fdep` sgl p1

rpln $ aall $ aa `mul` ttaa tt1 `red` (der tt1 `union` vvl)
"({(edible,edible),(p1,0)},4208 % 1)"
"({(edible,poisonous),(p1,0)},120 % 1)"
"({(edible,poisonous),(p1,1)},3796 % 1)"

rp $ und tt1
"{odor}"

rpln $ qqll $ states $ ttaa tt1
"{(odor,almond),(p1,0)}"
"{(odor,anise),(p1,0)}"
"{(odor,creosote),(p1,1)}"
"{(odor,fishy),(p1,1)}"
"{(odor,foul),(p1,1)}"
"{(odor,musty),(p1,1)}"
"{(odor,none),(p1,0)}"
"{(odor,pungent),(p1,1)}"
"{(odor,spicy),(p1,1)}"

rpln $ aall $ aa `mul` ttaa tt1 `red` (tvars tt1 `union` vvl)
"({(edible,edible),(odor,almond),(p1,0)},400 % 1)"
"({(edible,edible),(odor,anise),(p1,0)},400 % 1)"
"({(edible,edible),(odor,none),(p1,0)},3408 % 1)"
"({(edible,poisonous),(odor,creosote),(p1,1)},192 % 1)"
"({(edible,poisonous),(odor,fishy),(p1,1)},576 % 1)"
"({(edible,poisonous),(odor,foul),(p1,1)},2160 % 1)"
"({(edible,poisonous),(odor,musty),(p1,1)},36 % 1)"
"({(edible,poisonous),(odor,none),(p1,0)},120 % 1)"
"({(edible,poisonous),(odor,pungent),(p1,1)},256 % 1)"
"({(edible,poisonous),(odor,spicy),(p1,1)},576 % 1)"

We can also consider how predictive the model is of odor. The label entropy fraction is $1 - \mathrm{lent}(B,W_{\mathrm{m}},V_{\mathrm{l2}})/\ln |V_{\mathrm{l2}}^{\mathrm{C}}|$,

rp vvl2
"{odor}"

vol uu vvl2
9

log 9
2.1972245773362196

lent bb [p1,p2,p3,p4] vvl2
0.9136278428753103

1.0 - 0.9136278428753103/2.1972245773362196
0.584190049438216

The label modal size fraction is $\mathrm{lmodal}(B,W_{\mathrm{m}},V_{\mathrm{l2}})/\mathrm{size}(B\%V_{\mathrm{l2}})$,

lmodal bb (llqq [p1,p2,p3,p4]) vvl2
5688 % 1

5688.0/8124.0
0.7001477104874446

The model, $G_{\mathrm{m}}$, is only 60-70% accurate with respect to odor, even though odor is in the underlying variables, $\mathrm{odor} \in V_{\mathrm{m}}$.

Induced modelling of edibility

Having considered a manually defined model of edibility, $G_{\mathrm{m}}$, now consider an unsupervised induced model $D$ on the query variables, $V_{\mathrm{k}}$, which exclude edibility. By unsupervised we mean an induced model that is optimised not to minimise the label entropy, nor to maximise the label modal size, but rather to maximise the summed alignment valency-density.

Then we shall analyse this model, $D$, to find a smaller submodel that predicts the label variables, $V_{\mathrm{l}}$, or edibility. That is, we shall search in the decomposition fud for a submodel that optimises conditional entropy.

Here the induced model is created by the label-entropy limited-nodes highest-layer excluded-self maximum-roll-by-derived-dimension fud decomper, $(\cdot,D) = I_{P,U,\mathrm{D,F,mm,xs,d,f,e},V_{\mathrm{l}}}((V_{\mathrm{k}},A))$.

There are some examples of model induction in the MUSH repository.

First consider the fud decomposition MUSH_model16.json (see Model 16 induction),

s <- ByteString.readFile "./MUSH_model16.json"
let df = fromJust $ persistentsDecompFud $ fromJust $ (Data.Aeson.decode s :: Maybe DecompFudPersistent)

let uu1 = uu `uunion` (fsys (dfff df))

card $ uvars uu1
327

Let us examine the tree of the fud decomposition, \[ \begin{eqnarray} \{\{(S,~\mathrm{und}(F),~\mathrm{der}(F)) : (S,F) \in L\} : L \in \mathrm{paths}(D)\} \end{eqnarray} \]

rpln $ qqll $ treesPaths $ funcsTreesMap (\(ss,ff) -> (ss,fund ff,fder ff)) $ dfzz $ df
...

The fud identifier is a VarInt that is set by the inducer as part of the naming convention of the derived variables, \[ \begin{eqnarray} \mathrm{fid}(F)~:=~f : ((f,\cdot),\cdot) \in \mathrm{der}(F) \end{eqnarray} \] The decomposition tree contains 10 nodes with fud identifiers as follows, \[ \begin{eqnarray} \{\{\mathrm{fid}(F) : (\cdot,F) \in L\} : L \in \mathrm{paths}(D)\} \end{eqnarray} \]

let fid = variablesVariableFud . least . fder

rpln $ qqll $ treesSubPaths $ funcsTreesMap (\(ss,ff) -> fid ff) $ dfzz $ df
"[1]"
"[1,2]"
"[1,2,3]"
"[1,2,3,5]"
"[1,2,3,7]"
"[1,2,3,7,9]"
"[1,4]"
"[1,4,6]"
"[1,8]"
"[1,10]"

Now consider the summed alignment and the summed alignment valency-density, $\mathrm{summation}(U_1,D,A)$,

let summation = systemsDecompFudsHistoryRepasAlignmentContentShuffleSummation_u
let sumtree = systemsDecompFudsHistoryRepasTreeAlignmentContentShuffleSummation_u

let (wmax,lmax,xmax,omax,bmax,mmax,umax,pmax,fmax,mult,seed) = ((9*9*10), 8, (9*9*10), 40, (40*4), 4, (9*9*10), 1, 20, 7, 5)

let hh = aahr uu aa

summation mult seed uu1 df hh
(71310.19827233575,32440.295302812192)

\[ \begin{eqnarray} \{(\mathrm{fid}(F),~z_C,~a) : ((S,F),(z_C,(a,a_{\mathrm{d}}))) \in \mathrm{nodes}(\mathrm{sumtree}(U_1,D,A))\} \end{eqnarray} \]

rpln $ qqll $ treesElements $ funcsTreesMap (\((ss,ff),(zc,(a,ad))) -> (fid ff, zc, a)) $ sumtree mult seed uu1 df hh
"(1,8124,35990.15790620491)"
"(2,3020,13649.783208886602)"
"(3,1292,5676.82322968897)"
"(4,1824,8636.834045487856)"
"(5,576,2657.2174838583524)"
"(6,544,2364.3032881513377)"
"(7,324,1284.851899930063)"
"(8,120,323.81340470518205)"
"(9,132,446.3349025727048)"
"(10,88,280.07890284977236)"

We can see that the root fud has the highest slice size and shuffle content derived alignment, while the leaf fuds have small slice sizes and shuffle content derived alignments.
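The listings above can be tied together in a small standalone sketch. It rebuilds the decomposition tree from the sub-paths, attaches the slice size and alignment of each node, and then recomputes the sub-paths and the sum of the node alignments. It uses `Data.Tree` from the containers package rather than the tree functions of the Alignment library, and the names `decomp`, `subPaths`, `summedAlignment` and `slicesNest` are hypothetical.

```haskell
import Data.Tree (Tree (..), flatten)

-- (fud identifier, slice size, shuffle content derived alignment) per node.
type NodeInfo = (Int, Int, Double)

-- Tree structure read off the sub-paths; values copied from the node listing.
decomp :: Tree NodeInfo
decomp =
  Node (1, 8124, 35990.15790620491)
    [ Node (2, 3020, 13649.783208886602)
        [ Node (3, 1292, 5676.82322968897)
            [ Node (5, 576, 2657.2174838583524) []
            , Node (7, 324, 1284.851899930063)
                [Node (9, 132, 446.3349025727048) []]
            ]
        ]
    , Node (4, 1824, 8636.834045487856) [Node (6, 544, 2364.3032881513377) []]
    , Node (8, 120, 323.81340470518205) []
    , Node (10, 88, 280.07890284977236) []
    ]

-- All paths from the root to each node, in pre-order.
subPaths :: Tree a -> [[a]]
subPaths (Node x ts) = [x] : [x : p | t <- ts, p <- subPaths t]

-- The sub-paths of fud identifiers.
fidPaths :: [[Int]]
fidPaths = map (map (\(f, _, _) -> f)) (subPaths decomp)

-- Sum of the node alignments over the whole tree.
summedAlignment :: Double
summedAlignment = sum [a | (_, _, a) <- flatten decomp]

-- True when the children's slice sizes never add up to more than their parent's.
slicesNest :: Tree NodeInfo -> Bool
slicesNest (Node (_, z, _) ts) =
  sum [zc | Node (_, zc, _) _ <- ts] <= z && all slicesNest ts
```

On these values `fidPaths` reproduces the ten sub-paths, `summedAlignment` agrees with the first component of the summation, 71310.19827233575, up to rounding, and `slicesNest decomp` holds.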

The bare model is a fud decomposition. As noted in Conversion to fud, the tree of a fud decomposition is sometimes unwieldy, so consider the fud decomposition fud, $F = D^{\mathrm{F}} \in \mathcal{F}$, (see Practicable fud decomposition fud),

let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1

let uu2 = uu `uunion` (fsys ff)

card $ uvars uu2
426

The model, $F$, has 85 derived variables, $W_F = \mathrm{der}(F)$, and a large derived volume, $|W_F^{\mathrm{C}}|$,

card $ fder ff
85

rp $ fder ff
"{<<1,n>,1>,<<1,n>,2>,...,<<1,n>,8>,<<2,n>,1>,<<2,n>,2>,...,<<9,n>,8>,<<9,n>,9>,<<10,n>,1>,<<10,n>,2>,...,<<10,n>,9>}"

vol uu2 (fder ff)
31496985441645519664092926530583462412288

The model has 21 underlying variables, $V_F = \mathrm{und}(F)$,

card $ fund ff
21

rp $ vv `minus` fund ff
"{edible,veil-type}"

That is, the model depends on all of the substrate except for the label variable, edible, and mono-valent veil-type. This is consistent with the observation above that none of the substrate variables, except for veil-type, is independent of the others.

The underlying volume, $|V_F^{\mathrm{C}}|$, is

vol uu $ fund ff
121899810816000

The derived entropy, $\mathrm{entropy}(A * F)$, is

let aa' = hhaa $ hrhh uu2 $ hrfmul uu2 ff (aahr uu aa) `hrhrred` fder ff

ent aa'
2.0334547171091186

This may be compared to the logarithm of the derived volume, $\ln |W_F^{\mathrm{C}}|$,

let w = fromIntegral (vol uu2 (fder ff)) :: Double

log w
93.25071046775459

So the derived entropy is quite low. This is because there are only 18 effective derived states,

size $ eff aa'
18 % 1
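The derived entropy depends only on the sizes of the 18 effective states, which are listed just below. As a standalone check, not using the Alignment library and with hypothetical names, the entropy can be recomputed from those counts and compared with the maximum possible for 18 states.

```haskell
-- Sizes of the 18 effective derived states.
counts :: [Double]
counts =
  [ 48, 72, 1728, 288, 288, 384, 8, 36, 96
  , 192, 288, 256, 512, 768, 48, 48, 40, 3024
  ]

-- Entropy in nats of a list of counts after normalisation.
entropy :: [Double] -> Double
entropy cs = negate (sum [p * log p | c <- cs, c > 0, let p = c / z])
  where
    z = sum cs

derivedEntropy :: Double
derivedEntropy = entropy counts

-- Entropy of a uniform distribution over the effective states.
maxEntropy :: Double
maxEntropy = log (fromIntegral (length counts))
```

The counts sum to the sample size, 8124, and `derivedEntropy` agrees with the value of `ent aa'` above. It lies below `maxEntropy`, $\ln 18 \approx 2.89$, because the state sizes are far from uniform.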

rpln $ snd $ unzip $ aall aa'
"48 % 1"
"72 % 1"
"1728 % 1"
"288 % 1"
"288 % 1"
"384 % 1"
"8 % 1"
"36 % 1"
"96 % 1"
"192 % 1"
"288 % 1"
"256 % 1"
"512 % 1"
"768 % 1"
"48 % 1"
"48 % 1"
"40 % 1"
"3024 % 1"

The cartesian derived entropy, $\mathrm{entropy}(V_F^{\mathrm{C}} * F)$, depends on the underlying cartesian, $V_F^{\mathrm{C}}$. The underlying volume is too large to compute, so we are unable to calculate the cartesian derived entropy or the component size cardinality sum relative entropy. Instead we can compute an approximation to the size-volume scaled component size independent sum relative entropy using a volume sized shuffle, \[ \begin{eqnarray} (z+v_F) \times \mathrm{ent}(A * F^{\mathrm{T}} + Z_F * \hat{A}_{\mathrm{r}} * F^{\mathrm{T}}) - z \times \mathrm{ent}(A * F^{\mathrm{T}}) - v_F \times \mathrm{ent}(Z_F * \hat{A}_{\mathrm{r}} * F^{\mathrm{T}}) \end{eqnarray} \] where $v_F = |V_F^{\mathrm{C}}|$ and $Z_F = \mathrm{scalar}(v_F)$.

let aar = hhaa $ fromJust $ historiesShuffle (aahh aa) 1

size aar
8124 % 1

let vsize uu xx aa = resize (fromIntegral (vol uu xx)) aa

let aar' = hhaa $ hrhh uu2 $ hrfmul uu2 ff (aahr uu aar) `hrhrred` fder ff

ent $ vsize uu (fund ff) aar'
5.204078942330196

rent aa' (vsize uu (fund ff) aar')
91138.75

We can see that by this measure the relative entropy of the induced model, $\mathrm{rent}(A * F^{\mathrm{T}},~Z_F * \hat{A}_{\mathrm{r}} * F^{\mathrm{T}})$, is much higher than the relative entropy of the manual model, $\mathrm{rent}(A * G_{\mathrm{m}}^{\mathrm{T}},~Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}})$. This is consistent with the derived alignment, $\mathrm{algn}(A * F)$, implied by the summed alignment, $\mathrm{summation}(U_1,D,A)$, which is also higher for the induced model.

Now apply the model to the sample. Let $B = A * \prod\mathrm{his}(F)$,

let hh = aahr uu aa

let hhb = hrfmul uu2 ff hh

rpln $ aall $ hhaa $ hrhh uu2 $ hhb `hrhrred` (fder ff `union` vvl)
"({(edible,edible),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},48 % 1)"
"({(edible,edible),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},1728 % 1)"
"({(edible,edible),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},288 % 1)"
"({(edible,edible),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},384 % 1)"
"({(edible,edible),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},96 % 1)"
"({(edible,edible),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},288 % 1)"
"({(edible,edible),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},512 % 1)"
"({(edible,edible),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},768 % 1)"
"({(edible,edible),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},48 % 1)"
"({(edible,edible),(<<1,n>,1>,1),...,(<<10,n>,9>,0)},48 % 1)"
"({(edible,poisonous),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},72 % 1)"
"({(edible,poisonous),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},288 % 1)"
"({(edible,poisonous),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},8 % 1)"
"({(edible,poisonous),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},36 % 1)"
"({(edible,poisonous),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},192 % 1)"
"({(edible,poisonous),(<<1,n>,1>,0),...,(<<10,n>,9>,null)},256 % 1)"
"({(edible,poisonous),(<<1,n>,1>,1),...,(<<10,n>,9>,1)},40 % 1)"
"({(edible,poisonous),(<<1,n>,1>,1),...,(<<10,n>,9>,null)},3024 % 1)"

size $ eff $ hhaa $ hrhh uu2 $ hhb `hrhrred` (fder ff `union` vvl)
18 % 1

rpln $ qqll $ ssplit (fder ff) $ states (hhaa $ hrhh uu2 $ hhb `hrhrred` (fder ff `union` vvl))
"({(<<1,n>,1>,0),...,(<<10,n>,9>,null)},{(edible,edible)})"
...
"({(<<1,n>,1>,1),...,(<<10,n>,9>,null)},{(edible,poisonous)})"

The model derived variables, $W_F$, are functionally or causally related to edibility, $(B\%W_F)^{\mathrm{FS}} \to (B\%V_{\mathrm{l}})^{\mathrm{FS}}$. The model’s label entropy or query conditional entropy is zero, $\mathrm{lent}(B,W_F,V_{\mathrm{l}}) = 0$,

let hrlent uu hh ww vvl = ent (hhaa $ hrhh uu $ hh `hrhrred` (ww `union` vvl)) - ent (hhaa $ hrhh uu $ hh `hrhrred` ww)

hrlent uu2 hhb (fder ff) vvl
0.0

rpln $ sort [(hrlent uu2 hhb (sgl w) vvl, w) | w <- qqll (fder ff)]
"(0.2891276688147182,<<1,n>,4>)"
"(0.2891276688147182,<<1,n>,5>)"
"(0.29101789445875514,<<1,n>,2>)"
"(0.29101789445875514,<<1,n>,8>)"
"(0.3047757303930372,<<1,n>,3>)"
"(0.3117569583258841,<<1,n>,1>)"
"(0.3117569583258841,<<1,n>,7>)"
"(0.32031049115729093,<<1,n>,6>)"
"(0.5081830057676674,<<2,n>,1>)"
...
"(0.5081830057676674,<<2,n>,9>)"
"(0.5736117153862113,<<4,n>,2>)"
...
"(0.5993064401671433,<<4,n>,4>)"
"(0.6433069166049731,<<5,n>,1>)"
...
"(0.6433069166049731,<<5,n>,9>)"
"(0.6461836104063963,<<6,n>,1>)"
...
"(0.6461836104063963,<<6,n>,9>)"
"(0.656613585825601,<<3,n>,3>)"
"(0.6610796075749376,<<3,n>,4>)"
"(0.6610796075749376,<<3,n>,5>)"
"(0.6610796075749377,<<3,n>,1>)"
"(0.6736046244800589,<<7,n>,1>)"
...
"(0.6736046244800589,<<7,n>,9>)"
"(0.6814702642257713,<<9,n>,1>)"
...
"(0.6814702642257713,<<9,n>,9>)"
"(0.6821406364651721,<<8,n>,1>)"
...
"(0.6821406364651721,<<8,n>,9>)"
"(0.68502108329405,<<10,n>,1>)"
...
"(0.68502108329405,<<10,n>,9>)"
"(0.6874764106603346,<<3,n>,7>)"
"(0.687904856920078,<<3,n>,2>)"
"(0.687904856920078,<<3,n>,6>)"

We can see that the derived variables nearest the root fud tend to have the lowest label entropy. None have zero label entropy by themselves. Consider derived variable <<1,n>,4> in the root fud,

let w1n4 = stringsVariable "<<1,n>,4>"

rp $ fund $ ff `fdep` sgl w1n4
"{bruises,spore-print-color,stalk-root,stalk-shape}"

hrlent uu2 hhb (sgl w1n4) vvl
0.2891276688147182

rpln $ aall $ hhaa $ hrhh uu2 $ hhb `hrhrred` (sgl w1n4 `union` vvl)
"({(edible,edible),(<<1,n>,4>,0)},2592 % 1)"
"({(edible,edible),(<<1,n>,4>,2)},1616 % 1)"
"({(edible,poisonous),(<<1,n>,4>,0)},636 % 1)"
"({(edible,poisonous),(<<1,n>,4>,1)},3024 % 1)"
"({(edible,poisonous),(<<1,n>,4>,2)},256 % 1)"

rpln $ qqll $ ssplit (fder ff) $ states (hhaa $ hrhh uu2 $ hhb `hrhrred` (sgl w1n4 `union` vvl))
"({(<<1,n>,4>,0)},{(edible,edible)})"
"({(<<1,n>,4>,0)},{(edible,poisonous)})"
"({(<<1,n>,4>,1)},{(edible,poisonous)})"
"({(<<1,n>,4>,2)},{(edible,edible)})"
"({(<<1,n>,4>,2)},{(edible,poisonous)})"

Now consider the label entropy for all of the fud variables, $\mathrm{vars}(F)$, not just the fud derived variables, $\mathrm{der}(F)$. We can determine minimum subsets of the query variables that are causal or predictive by using the repa conditional entropy tuple set builder. The conditional entropy minimisation searches for the set of tuples with the least label entropy. We show the resultant tuples along with their label entropies, \[ \{(\mathrm{lent}(B,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,B,\mathrm{L}}))\} \]

let buildcondrr vvl aa kmax omax qmax = sort $ map (\(a,b) -> (b,a)) $ Map.toList $ fromJust $ parametersBuilderConditionalVarsRepa kmax omax qmax vvl aa
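The idea of the search can be illustrated by brute force on a toy sample. The sketch below is standalone and is not the algorithm of `parametersBuilderConditionalVarsRepa`: it simply enumerates every tuple of at most `kmax` variables and keeps the `qmax` tuples of least label entropy. The roles given here to `kmax` and `qmax` are assumptions read off the outputs that follow, `omax` is not modelled, and the `Record` representation and the example data are hypothetical.

```haskell
import Data.List (sortOn, subsequences)
import qualified Data.Map.Strict as Map

-- A record as a map from variable name to value (hypothetical representation).
type Record = Map.Map String String

-- Entropy in nats of a list of counts after normalisation.
entropy :: [Double] -> Double
entropy cs = negate (sum [p * log p | c <- cs, c > 0, let p = c / z])
  where
    z = sum cs

-- Count the records by their values on the given variables.
counts :: [String] -> [Record] -> [Double]
counts vs rs =
  Map.elems (Map.fromListWith (+) [(map (`Map.lookup` r) vs, 1) | r <- rs])

-- Label entropy of label l given the tuple vs.
labelEntropy :: [String] -> String -> [Record] -> Double
labelEntropy vs l rs = entropy (counts (l : vs) rs) - entropy (counts vs rs)

-- Exhaustive search: the qmax tuples of at most kmax variables with least label entropy.
bestTuples :: Int -> Int -> [String] -> String -> [Record] -> [(Double, [String])]
bestTuples kmax qmax vars l rs =
  take qmax $ sortOn fst
    [ (labelEntropy vs l rs, vs)
    | vs <- subsequences vars, not (null vs), length vs <= kmax ]

-- Toy sample in which the label is the exclusive-or of a and b.
xorData :: [Record]
xorData =
  [ Map.fromList [("a", a), ("b", b), ("l", l)]
  | (a, b, l) <- [("0", "0", "0"), ("0", "1", "1"), ("1", "0", "1"), ("1", "1", "0")] ]
```

In this toy sample neither variable predicts the label alone, since `labelEntropy ["a"] "l" xorData` is $\ln 2$, but `bestTuples 2 1 ["a","b"] "l" xorData` returns the pair with zero label entropy. Exhaustive enumeration is only feasible for a handful of variables, which is why a limited search is needed for the several hundred fud variables here.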

let (kmax,omax,qmax) = (1, 5, 5)

rpln $ buildcondrr vvl hhb kmax omax qmax
"(6.445777995546442e-2,{<<6,1>,36>})"
"(6.445777995546464e-2,{odor})"
"(6.75240166808252e-2,{<<7,1>,45>})"
"(0.13057611124536495,{<<3,1>,103>})"
"(0.16474416484069376,{<<1,1>,212>})"

Variable <<6,1>,36> is as predictive of edibility as variable odor. Variable <<6,1>,36> is in the bottom layer of fud 6 and is defined as follows -

{
	"derived":["<<6,1>,36>"],
	"history":{
		"hsystem":[
			{"var":"odor","values":["almond","anise","creosote","fishy","foul","musty","none","pungent","spicy"]},
			{"var":"<<6,1>,36>","values":["0","1","2"]}
		],
		"hstates":[
			[0,0],
			[1,0],
			[2,1],
			[3,1],
			[4,1],
			[5,1],
			[6,2],
			[7,1],
			[8,1]
		]
	}
}

or

let w6136 = stringsVariable "<<6,1>,36>"

rp $ fund $ ff `fdep` sgl w6136
"{odor}"

rpln $ qqll $ states $ ttaa $ least $ ffqq $ ff `fdep` sgl w6136
"{(odor,almond),(<<6,1>,36>,0)}"
"{(odor,anise),(<<6,1>,36>,0)}"
"{(odor,creosote),(<<6,1>,36>,1)}"
"{(odor,fishy),(<<6,1>,36>,1)}"
"{(odor,foul),(<<6,1>,36>,1)}"
"{(odor,musty),(<<6,1>,36>,1)}"
"{(odor,none),(<<6,1>,36>,2)}"
"{(odor,pungent),(<<6,1>,36>,1)}"
"{(odor,spicy),(<<6,1>,36>,1)}"

That is, underlying values almond and anise form a component, value none forms a singleton component, while the remaining values are in a third component.

rpln $ aall $ hhaa $ hrhh uu2 $ hhb `hrhrred` (llqq [w6136] `union` vvl)
"({(edible,edible),(<<6,1>,36>,0)},800 % 1)"
"({(edible,edible),(<<6,1>,36>,2)},3408 % 1)"
"({(edible,poisonous),(<<6,1>,36>,1)},3796 % 1)"
"({(edible,poisonous),(<<6,1>,36>,2)},120 % 1)"

let odor = VarStr "odor"

rpln $ aall $ hhaa $ hrhh uu2 $ hhb `hrhrred` (llqq [w6136,odor] `union` vvl)
"({(edible,edible),(odor,almond),(<<6,1>,36>,0)},400 % 1)"
"({(edible,edible),(odor,anise),(<<6,1>,36>,0)},400 % 1)"
"({(edible,edible),(odor,none),(<<6,1>,36>,2)},3408 % 1)"
"({(edible,poisonous),(odor,creosote),(<<6,1>,36>,1)},192 % 1)"
"({(edible,poisonous),(odor,fishy),(<<6,1>,36>,1)},576 % 1)"
"({(edible,poisonous),(odor,foul),(<<6,1>,36>,1)},2160 % 1)"
"({(edible,poisonous),(odor,musty),(<<6,1>,36>,1)},36 % 1)"
"({(edible,poisonous),(odor,none),(<<6,1>,36>,2)},120 % 1)"
"({(edible,poisonous),(odor,pungent),(<<6,1>,36>,1)},256 % 1)"
"({(edible,poisonous),(odor,spicy),(<<6,1>,36>,1)},576 % 1)"

Variable <<7,1>,45> is the next most predictive variable. Variable <<7,1>,45> is in the bottom layer of fud 7 and is defined as follows -

{
	"derived":["<<7,1>,45>"],
	"history":{
		"hsystem":[
			{"var":"odor","values":["almond","anise","creosote","fishy","foul","musty","none","pungent","spicy"]},
			{"var":"<<7,1>,45>","values":["0","1","2"]}
		],
		"hstates":[
			[0,0],
			[1,0],
			[2,1],
			[3,2],
			[4,2],
			[5,2],
			[6,0],
			[7,2],
			[8,2]
		]
	}
}

or

let w7145 = stringsVariable "<<7,1>,45>"

rp $ fund $ ff `fdep` sgl w7145
"{odor}"

rpln $ qqll $ states $ ttaa $ least $ ffqq $ ff `fdep` sgl w7145
"{(odor,almond),(<<7,1>,45>,0)}"
"{(odor,anise),(<<7,1>,45>,0)}"
"{(odor,creosote),(<<7,1>,45>,1)}"
"{(odor,fishy),(<<7,1>,45>,2)}"
"{(odor,foul),(<<7,1>,45>,2)}"
"{(odor,musty),(<<7,1>,45>,2)}"
"{(odor,none),(<<7,1>,45>,0)}"
"{(odor,pungent),(<<7,1>,45>,2)}"
"{(odor,spicy),(<<7,1>,45>,2)}"

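In variable <<7,1>,45> the underlying value creosote forms a singleton component, while value none is merged with almond and anise. Separating creosote does not help to predict edibility, because creosote and the remaining values in the third component are all poisonous, whereas merging none with almond and anise loses the distinction made by variable <<6,1>,36>. So the label entropy is higher than for variable <<6,1>,36>, and is the same as that of rule p1 above, 6.75240166808252e-2.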
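The two label entropies can be checked by hand, because in each variable only one component is ambiguous and it contains the 120 poisonous mushrooms with no odor. If that component has size $z_c$ and the sample has size $z = 8124$, then, as a worked sketch with values rounded to four figures, \[ \begin{eqnarray} \mathrm{lent} &=& \frac{z_c}{z} \Big( -\frac{120}{z_c} \ln \frac{120}{z_c} - \frac{z_c - 120}{z_c} \ln \frac{z_c - 120}{z_c} \Big) \end{eqnarray} \] For variable <<6,1>,36> the ambiguous component is value none, with $z_c = 3408 + 120 = 3528$, giving $0.4343 \times 0.1484 \approx 0.0645$. For variable <<7,1>,45> it is the component of almond, anise and none, with $z_c = 4208 + 120 = 4328$, giving $0.5327 \times 0.1267 \approx 0.0675$. The larger component dilutes the poisonous cases less sharply but carries more weight, so the label entropy rises slightly.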

Now optimise for larger tuples, excluding the substrate. Let $B_2 = B~\%~(\mathrm{vars}(F) \setminus V \cup V_{\mathrm{l}})$. Then, \[ \{(\mathrm{lent}(B_2,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,B_2,\mathrm{L}}))\} \]

let hhb2 = hhb `hrhrred` (fvars ff `minus` vv `union` vvl)

let (kmax,omax,qmax) = (1, 10, 10)

rpln $ buildcondrr vvl hhb2 kmax omax qmax
"(6.445777995546442e-2,{<<6,1>,36>})"
"(6.75240166808252e-2,{<<7,1>,45>})"
"(0.13057611124536495,{<<3,1>,103>})"
"(0.16474416484069376,{<<1,1>,212>})"
"(0.16917391545419513,{<<1,1>,37>})"
"(0.17856617443735812,{<<1,1>,87>})"
"(0.21016319832421082,{<<4,1>,276>})"
"(0.2328929843975649,{<<4,1>,238>})"
"(0.2559340575050262,{<<1,1>,14>})"
"(0.2663397973256072,{<<1,1>,79>})"

let (kmax,omax,qmax) = (2, 10, 20)

rpln $ buildcondrr vvl hhb2 kmax omax qmax
"(1.1067256420112193e-2,{<<6,1>,36>,<<8,1>,217>})"
"(1.1275561197905626e-2,{<<7,1>,45>,<<8,1>,217>})"
"(1.607586751574508e-2,{<<6,1>,36>,<<8,2>,235>})"
"(1.6491154925838747e-2,{<<7,1>,45>,<<8,2>,235>})"
"(2.037283830138059e-2,{<<6,1>,36>,<<8,1>,71>})"
"(2.176957272342106e-2,{<<2,1>,281>,<<3,1>,103>})"
"(2.192285077228817e-2,{<<2,1>,281>,<<7,1>,45>})"
"(2.1922850772288616e-2,{<<2,1>,281>,<<6,1>,36>})"
"(2.230180269810611e-2,{<<7,1>,45>,<<8,1>,71>})"
"(2.2344953993540195e-2,{<<6,1>,36>,<<8,3>,4>})"
"(6.445777995546442e-2,{<<6,1>,36>})"
"(6.75240166808252e-2,{<<7,1>,45>})"
"(0.13057611124536495,{<<3,1>,103>})"
"(0.16474416484069376,{<<1,1>,212>})"
...

let (kmax,omax,qmax) = (3, 10, 30)

let ll = buildcondrr vvl hhb2 kmax omax qmax

rpln ll
"(-4.440892098500626e-16,{<<2,1>,281>,<<3,1>,103>,<<10,2>,311>})"
"(0.0,{<<2,1>,281>,<<6,1>,36>,<<10,2>,311>})"
"(0.0,{<<2,1>,281>,<<7,1>,45>,<<10,2>,311>})"
"(0.0,{<<6,1>,36>,<<8,1>,71>,<<10,2>,311>})"
"(0.0,{<<6,1>,36>,<<8,1>,217>,<<10,2>,311>})"
"(0.0,{<<6,1>,36>,<<8,3>,4>,<<10,2>,311>})"
"(0.0,{<<7,1>,45>,<<8,1>,71>,<<10,2>,311>})"
"(0.0,{<<7,1>,45>,<<8,1>,217>,<<10,2>,311>})"
"(2.21500795516949e-3,{<<2,1>,281>,<<7,1>,45>,<<10,1>,100>})"
"(2.215007955169934e-3,{<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(1.1067256420112193e-2,{<<6,1>,36>,<<8,1>,217>})"
...
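
The first entry above is slightly negative. Its magnitude is of the order of double-precision rounding error, which is consistent with the label entropy being computed as a difference of two entropies, so such values are best treated as zero. A small sketch of the cut-off used in the next step, applied to the eleven values shown in the listing; the names reported and isZeroWithin are hypothetical.

```haskell
-- Label entropies of the eleven tuples shown above for kmax = 3
-- (tuple names omitted, values copied from the listing).
reported :: [Double]
reported =
  [ -4.440892098500626e-16, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
  , 2.21500795516949e-3, 2.215007955169934e-3, 1.1067256420112193e-2 ]

-- A value counts as zero if it lies below the tolerance.
isZeroWithin :: Double -> Double -> Bool
isZeroWithin tol e = e < tol

main :: IO ()
main = print (length (filter (isZeroWithin 1e-14) reported))  -- 8
```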

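We have now found 8 tuples with zero label entropy, to within floating-point rounding, which are therefore predictive of edibility for every event in the sample,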

rpln [(xx, fund (ff `fdep` xx)) | (e,xx) <- ll, e < 1e-14] 
"({<<2,1>,281>,<<3,1>,103>,<<10,2>,311>},{bruises,cap-color,gill-attachment,gill-spacing,habitat,odor,population,spore-print-color,stalk-color-above-ring,stalk-color-below-ring,stalk-root,stalk-surface-below-ring,veil-color})"
"({<<2,1>,281>,<<6,1>,36>,<<10,2>,311>},{bruises,cap-color,gill-spacing,habitat,odor,population,spore-print-color,stalk-color-above-ring,stalk-color-below-ring,stalk-root,stalk-surface-below-ring,veil-color})"
"({<<2,1>,281>,<<7,1>,45>,<<10,2>,311>},{bruises,cap-color,gill-spacing,habitat,odor,population,spore-print-color,stalk-color-above-ring,stalk-color-below-ring,stalk-root,stalk-surface-below-ring,veil-color})"
"({<<6,1>,36>,<<8,1>,71>,<<10,2>,311>},{cap-color,gill-spacing,habitat,odor,population,spore-print-color,stalk-color-above-ring,stalk-color-below-ring,stalk-root,stalk-surface-below-ring,veil-color})"
"({<<6,1>,36>,<<8,1>,217>,<<10,2>,311>},{cap-color,gill-spacing,habitat,odor,population,spore-print-color,stalk-color-above-ring,stalk-color-below-ring,stalk-root,stalk-surface-below-ring,veil-color})"
"({<<6,1>,36>,<<8,3>,4>,<<10,2>,311>},{bruises,cap-color,gill-spacing,habitat,odor,population,spore-print-color,stalk-color-above-ring,stalk-color-below-ring,stalk-root,stalk-surface-above-ring,stalk-surface-below-ring,veil-color})"
"({<<7,1>,45>,<<8,1>,71>,<<10,2>,311>},{cap-color,gill-spacing,habitat,odor,population,spore-print-color,stalk-color-above-ring,stalk-color-below-ring,stalk-root,stalk-surface-below-ring,veil-color})"
"({<<7,1>,45>,<<8,1>,217>,<<10,2>,311>},{cap-color,gill-spacing,habitat,odor,population,spore-print-color,stalk-color-above-ring,stalk-color-below-ring,stalk-root,stalk-surface-below-ring,veil-color})"

All include variable <<10,2>,311> which is in the second layer and depends on 9 substrate variables,

let w102311 = stringsVariable "<<10,2>,311>"

card $ fund $ ff `fdep` sgl w102311
9

rp $ fund $ ff `fdep` sgl w102311
"{cap-color,gill-spacing,habitat,population,stalk-color-above-ring,stalk-color-below-ring,stalk-root,stalk-surface-below-ring,veil-color}"

Let us sort these tuples by shuffle content derived alignment, in descending order. Let $L = \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,B_2,\mathrm{L}}))$. Then calculate \[ \{(\mathrm{algn}(B\%X)-\mathrm{algn}(B_{\mathrm{r}}\%X),~X) : (e,X) \in L,~e \approx 0\} \] where $B_{\mathrm{r}} = A_{\mathrm{r}} * \prod\mathrm{his}(F)$,

let aar = hhaa $ fromJust $ historiesShuffle (aahh aa) 1

let hhr = aahr uu aar

let hhbr = hrfmul uu2 ff hhr

hrsize hhbr
8124

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, e < 1e-14, let aa' = hhaa (hrhh uu2 (hhb `hrhrred` xx)), let aar' = hhaa (hrhh uu2 (hhbr `hrhrred` xx))] 
"(2497.8620087369127,{<<2,1>,281>,<<6,1>,36>,<<10,2>,311>})"
"(2264.151471030258,{<<2,1>,281>,<<3,1>,103>,<<10,2>,311>})"
"(2196.043847907931,{<<2,1>,281>,<<7,1>,45>,<<10,2>,311>})"
"(400.7703551572704,{<<6,1>,36>,<<8,1>,71>,<<10,2>,311>})"
"(379.7467829022353,{<<7,1>,45>,<<8,1>,71>,<<10,2>,311>})"
"(344.1523150150606,{<<6,1>,36>,<<8,1>,217>,<<10,2>,311>})"
"(339.22729806548887,{<<7,1>,45>,<<8,1>,217>,<<10,2>,311>})"
"(222.40958944160957,{<<6,1>,36>,<<8,3>,4>,<<10,2>,311>})"

and by size-volume-sized-shuffle relative entropy, in descending order, \[ \{(\mathrm{rent}(B~\%~X,~Z_F * \hat{B}_{\mathrm{r}}~\%~X),~X) : (e,X) \in L,~e \approx 0\} \]

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, e < 1e-14, let aa' = hhaa (hrhh uu2 (hhb `hrhrred` xx)), let vaar' = vsize uu2 (fund (ff `fdep` xx)) (hhaa (hrhh uu2 (hhbr `hrhrred` xx)))] 
"(3587.1309378147125,{<<2,1>,281>,<<6,1>,36>,<<10,2>,311>})"
"(3377.876708507538,{<<2,1>,281>,<<3,1>,103>,<<10,2>,311>})"
"(3254.055778026581,{<<2,1>,281>,<<7,1>,45>,<<10,2>,311>})"
"(580.0929397344589,{<<6,1>,36>,<<8,1>,71>,<<10,2>,311>})"
"(558.7054543495178,{<<7,1>,45>,<<8,1>,71>,<<10,2>,311>})"
"(471.5192070007324,{<<6,1>,36>,<<8,3>,4>,<<10,2>,311>})"
"(397.2206202149391,{<<6,1>,36>,<<8,1>,217>,<<10,2>,311>})"
"(359.7466884255409,{<<7,1>,45>,<<8,1>,217>,<<10,2>,311>})"

We can see that the derived alignments and the relative entropies of the sub-models, $X \subset \mathrm{vars}(F) \setminus V$, are highly correlated. Both are also higher than those of the manual model, $W_{\mathrm{m}}$, which suggests that the induced sub-models are more likely and less sensitive than the manual model.
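
The correlation can be checked directly from the two listings above. The sketch below pairs the derived alignment and the relative entropy of each of the eight tuples and computes a Pearson coefficient; pairs and pearson are hypothetical names introduced for illustration.

```haskell
-- (derived alignment, relative entropy) for each zero-entropy 3-tuple,
-- matched by tuple from the two listings above. Every tuple also
-- contains <<10,2>,311>, which is omitted from the comments.
pairs :: [(Double, Double)]
pairs =
  [ (2497.8620087369127, 3587.1309378147125)  -- <<2,1>,281>, <<6,1>,36>
  , (2264.151471030258,  3377.876708507538)   -- <<2,1>,281>, <<3,1>,103>
  , (2196.043847907931,  3254.055778026581)   -- <<2,1>,281>, <<7,1>,45>
  , (400.7703551572704,  580.0929397344589)   -- <<6,1>,36>,  <<8,1>,71>
  , (379.7467829022353,  558.7054543495178)   -- <<7,1>,45>,  <<8,1>,71>
  , (344.1523150150606,  397.2206202149391)   -- <<6,1>,36>,  <<8,1>,217>
  , (339.22729806548887, 359.7466884255409)   -- <<7,1>,45>,  <<8,1>,217>
  , (222.40958944160957, 471.5192070007324) ] -- <<6,1>,36>,  <<8,3>,4>

-- Pearson correlation coefficient of a list of pairs.
pearson :: [(Double, Double)] -> Double
pearson ps = cov / sqrt (varX * varY)
  where
    n    = fromIntegral (length ps)
    mx   = sum (map fst ps) / n
    my   = sum (map snd ps) / n
    cov  = sum [(x - mx) * (y - my) | (x, y) <- ps]
    varX = sum [(x - mx) * (x - mx) | (x, _) <- ps]
    varY = sum [(y - my) * (y - my) | (_, y) <- ps]

main :: IO ()
main = print (pearson pairs)
```

The eight points fall into two well-separated groups, those containing <<2,1>,281> and the rest, so the coefficient mostly reflects the gap between the groups. The two orderings differ within the lower group.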

Let us analyse the top sub-model,

let w21281 = stringsVariable "<<2,1>,281>"
let w6136 = stringsVariable "<<6,1>,36>"
let w102311 = stringsVariable "<<10,2>,311>"

rp $ fund $ ff `fdep` sgl w21281
"{bruises,spore-print-color}"

rp $ fund $ ff `fdep` sgl w6136
"{odor}"

rp $ fund $ ff `fdep` sgl w102311
"{cap-color,gill-spacing,habitat,population,stalk-color-above-ring,stalk-color-below-ring,stalk-root,stalk-surface-below-ring,veil-color}"

hrlent uu2 hhb (sgl w21281) vvl
0.4636572682639106

hrlent uu2 hhb (sgl w6136) vvl
6.445777995546442e-2

hrlent uu2 hhb (sgl w102311) vvl
0.661359275162257

let xx = llqq [w21281,w6136,w102311]

hrlent uu2 hhb xx vvl
0.0

card $ fvars $ ff `fdep` xx
20

This tuple has a volume of 24,

vol uu2 xx
24

but classifies the sample into only 12 effective states or slices,

rpln $ aall $ hhaa (hrhh uu2 (hhb `hrhrred` (xx `union` vvl)))
"({(edible,edible),(<<2,1>,281>,0),(<<6,1>,36>,0),(<<10,2>,311>,0)},800 % 1)"
"({(edible,edible),(<<2,1>,281>,0),(<<6,1>,36>,2),(<<10,2>,311>,0)},1728 % 1)"
"({(edible,edible),(<<2,1>,281>,1),(<<6,1>,36>,2),(<<10,2>,311>,0)},1328 % 1)"
"({(edible,edible),(<<2,1>,281>,3),(<<6,1>,36>,2),(<<10,2>,311>,0)},352 % 1)"
"({(edible,poisonous),(<<2,1>,281>,0),(<<6,1>,36>,1),(<<10,2>,311>,0)},256 % 1)"
"({(edible,poisonous),(<<2,1>,281>,1),(<<6,1>,36>,1),(<<10,2>,311>,0)},1488 % 1)"
"({(edible,poisonous),(<<2,1>,281>,1),(<<6,1>,36>,2),(<<10,2>,311>,1)},8 % 1)"
"({(edible,poisonous),(<<2,1>,281>,2),(<<6,1>,36>,1),(<<10,2>,311>,0)},288 % 1)"
"({(edible,poisonous),(<<2,1>,281>,2),(<<6,1>,36>,2),(<<10,2>,311>,0)},72 % 1)"
"({(edible,poisonous),(<<2,1>,281>,3),(<<6,1>,36>,1),(<<10,2>,311>,0)},1476 % 1)"
"({(edible,poisonous),(<<2,1>,281>,3),(<<6,1>,36>,1),(<<10,2>,311>,1)},288 % 1)"
"({(edible,poisonous),(<<2,1>,281>,3),(<<6,1>,36>,2),(<<10,2>,311>,1)},40 % 1)"

size $ eff $ hhaa (hrhh uu2 (hhb `hrhrred` (xx `union` vvl)))
12 % 1
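
The figures above can be reproduced from the listed counts alone. The sketch below recomputes the volume, the number of effective states and the sample size, and then the label entropy of the whole tuple and of <<6,1>,36> on its own. It assumes that hrlent reports the conditional entropy of the label in nats. The valencies are inferred from the values that appear in the listing, and the names slices, volume, effective and lentBy are hypothetical.

```haskell
import Data.List (nub)
import qualified Data.Map.Strict as M

-- ((<<2,1>,281>, <<6,1>,36>, <<10,2>,311>), edible?, count), from the listing above.
slices :: [((Int, Int, Int), Bool, Double)]
slices =
  [ ((0,0,0), True, 800),   ((0,2,0), True, 1728),  ((1,2,0), True, 1328)
  , ((3,2,0), True, 352),   ((0,1,0), False, 256),  ((1,1,0), False, 1488)
  , ((1,2,1), False, 8),    ((2,1,0), False, 288),  ((2,2,0), False, 72)
  , ((3,1,0), False, 1476), ((3,1,1), False, 288),  ((3,2,1), False, 40) ]

-- Values seen above are 0..3, 0..2 and 0..1, giving valencies 4, 3 and 2.
volume :: Int
volume = product [4, 3, 2]

-- Number of distinct tuple states that occur in the sample.
effective :: Int
effective = length (nub [s | (s, _, _) <- slices])

entropy :: [Double] -> Double
entropy cs = negate (sum [p * log p | c <- cs, c > 0, let p = c / z])
  where z = sum cs

-- Conditional entropy of the label given a projection of the tuple state.
lentBy :: Ord k => ((Int, Int, Int) -> k) -> Double
lentBy proj = entropy (M.elems joint) - entropy (M.elems marginal)
  where
    joint    = M.fromListWith (+) [((proj s, e), c) | (s, e, c) <- slices]
    marginal = M.fromListWith (+) [(proj s, c) | (s, _, c) <- slices]

main :: IO ()
main = do
  print (volume, effective)               -- (24,12)
  print (sum [c | (_, _, c) <- slices])   -- 8124.0
  print (lentBy id)                       -- whole tuple: every slice is pure
  print (lentBy (\(_, v, _) -> v))        -- <<6,1>,36> alone, about 0.0645
```

Only half of the volume is occupied, and each occupied state carries a single label, which is why the label entropy of the tuple is zero.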

rpln $ qqll $ ssplit xx $ states $ hhaa (hrhh uu2 (hhb `hrhrred` (xx `union` vvl)))
"({(<<2,1>,281>,0),(<<6,1>,36>,0),(<<10,2>,311>,0)},{(edible,edible)})"
"({(<<2,1>,281>,0),(<<6,1>,36>,1),(<<10,2>,311>,0)},{(edible,poisonous)})"
"({(<<2,1>,281>,0),(<<6,1>,36>,2),(<<10,2>,311>,0)},{(edible,edible)})"
"({(<<2,1>,281>,1),(<<6,1>,36>,1),(<<10,2>,311>,0)},{(edible,poisonous)})"
"({(<<2,1>,281>,1),(<<6,1>,36>,2),(<<10,2>,311>,0)},{(edible,edible)})"
"({(<<2,1>,281>,1),(<<6,1>,36>,2),(<<10,2>,311>,1)},{(edible,poisonous)})"
"({(<<2,1>,281>,2),(<<6,1>,36>,1),(<<10,2>,311>,0)},{(edible,poisonous)})"
"({(<<2,1>,281>,2),(<<6,1>,36>,2),(<<10,2>,311>,0)},{(edible,poisonous)})"
"({(<<2,1>,281>,3),(<<6,1>,36>,1),(<<10,2>,311>,0)},{(edible,poisonous)})"
"({(<<2,1>,281>,3),(<<6,1>,36>,1),(<<10,2>,311>,1)},{(edible,poisonous)})"
"({(<<2,1>,281>,3),(<<6,1>,36>,2),(<<10,2>,311>,0)},{(edible,edible)})"
"({(<<2,1>,281>,3),(<<6,1>,36>,2),(<<10,2>,311>,1)},{(edible,poisonous)})"

Now consider whether a 4-tuple would be more likely,

let (kmax,omax,qmax) = (4, 10, 30)

let ll = buildcondrr vvl hhb2 kmax omax qmax

rpln ll
"(-1.3322676295501878e-15,{<<2,1>,239>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(-1.3322676295501878e-15,{<<2,1>,239>,<<2,1>,281>,<<7,1>,45>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,s>,1>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,80>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,281>,<<3,1>,103>,<<4,4>,40>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,281>,<<3,1>,103>,<<4,4>,109>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,281>,<<3,1>,103>,<<5,1>,218>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,281>,<<3,1>,103>,<<8,2>,244>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,281>,<<4,4>,131>,<<7,1>,45>,<<10,1>,100>})"
"(-4.440892098500626e-16,{<<1,n>,4>,<<2,1>,281>,<<7,1>,45>,<<10,1>,100>})"
"(-4.440892098500626e-16,{<<2,1>,281>,<<3,1>,103>,<<10,2>,311>})"
"(0.0,{<<2,1>,281>,<<6,1>,36>,<<10,2>,311>})"
"(0.0,{<<2,1>,281>,<<7,1>,45>,<<10,2>,311>})"
"(0.0,{<<6,1>,36>,<<8,1>,71>,<<10,2>,311>})"
"(0.0,{<<6,1>,36>,<<8,1>,217>,<<10,2>,311>})"
"(0.0,{<<6,1>,36>,<<8,3>,4>,<<10,2>,311>})"
"(0.0,{<<7,1>,45>,<<8,1>,71>,<<10,2>,311>})"
"(0.0,{<<7,1>,45>,<<8,1>,217>,<<10,2>,311>})"
...

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, e < 1e-14, let aa' = hhaa (hrhh uu2 (hhb `hrhrred` xx)), let aar' = hhaa (hrhh uu2 (hhbr `hrhrred` xx))] 
"(8527.088206908153,{<<2,1>,239>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(8314.397547958259,{<<1,n>,4>,<<2,1>,281>,<<7,1>,45>,<<10,1>,100>})"
"(7577.306805248743,{<<2,1>,281>,<<3,1>,103>,<<4,4>,109>,<<10,1>,100>})"
"(7271.234836176271,{<<2,1>,281>,<<3,1>,103>,<<5,1>,218>,<<10,1>,100>})"
"(7199.2720202532655,{<<2,1>,80>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(6825.830960889085,{<<2,1>,281>,<<4,4>,131>,<<7,1>,45>,<<10,1>,100>})"
"(6341.89748897274,{<<2,1>,239>,<<2,1>,281>,<<7,1>,45>,<<10,1>,100>})"
"(6157.637217947755,{<<2,1>,281>,<<3,1>,103>,<<4,4>,40>,<<10,1>,100>})"
"(5345.849397610538,{<<2,s>,1>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(3523.188967835813,{<<2,1>,281>,<<3,1>,103>,<<8,2>,244>,<<10,1>,100>})"
"(2497.8620087369127,{<<2,1>,281>,<<6,1>,36>,<<10,2>,311>})"
"(2264.151471030258,{<<2,1>,281>,<<3,1>,103>,<<10,2>,311>})"
"(2196.043847907931,{<<2,1>,281>,<<7,1>,45>,<<10,2>,311>})"
"(400.7703551572704,{<<6,1>,36>,<<8,1>,71>,<<10,2>,311>})"
"(379.7467829022353,{<<7,1>,45>,<<8,1>,71>,<<10,2>,311>})"
"(344.1523150150606,{<<6,1>,36>,<<8,1>,217>,<<10,2>,311>})"
"(339.22729806548887,{<<7,1>,45>,<<8,1>,217>,<<10,2>,311>})"
"(222.40958944160957,{<<6,1>,36>,<<8,3>,4>,<<10,2>,311>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, e < 1e-14, let aa' = hhaa (hrhh uu2 (hhb `hrhrred` xx)), let vaar' = vsize uu2 (fund (ff `fdep` xx)) (hhaa (hrhh uu2 (hhbr `hrhrred` xx)))] 
"(10557.535177643585,{<<1,n>,4>,<<2,1>,281>,<<7,1>,45>,<<10,1>,100>})"
"(9242.638248175383,{<<2,1>,281>,<<3,1>,103>,<<4,4>,40>,<<10,1>,100>})"
"(9154.46537399292,{<<2,1>,281>,<<3,1>,103>,<<4,4>,109>,<<10,1>,100>})"
"(8707.323609109968,{<<2,1>,281>,<<3,1>,103>,<<5,1>,218>,<<10,1>,100>})"
"(8131.591965079308,{<<2,1>,281>,<<4,4>,131>,<<7,1>,45>,<<10,1>,100>})"
"(7944.993670288968,{<<2,1>,239>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(7943.237293086946,{<<2,s>,1>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(5252.33876140794,{<<2,1>,239>,<<2,1>,281>,<<7,1>,45>,<<10,1>,100>})"
"(5185.635422746374,{<<2,1>,80>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(4651.403177678585,{<<2,1>,281>,<<3,1>,103>,<<8,2>,244>,<<10,1>,100>})"
"(3587.1309378147125,{<<2,1>,281>,<<6,1>,36>,<<10,2>,311>})"
"(3377.876708507538,{<<2,1>,281>,<<3,1>,103>,<<10,2>,311>})"
"(3254.055778026581,{<<2,1>,281>,<<7,1>,45>,<<10,2>,311>})"
"(580.0929397344589,{<<6,1>,36>,<<8,1>,71>,<<10,2>,311>})"
"(558.7054543495178,{<<7,1>,45>,<<8,1>,71>,<<10,2>,311>})"
"(471.5192070007324,{<<6,1>,36>,<<8,3>,4>,<<10,2>,311>})"
"(397.2206202149391,{<<6,1>,36>,<<8,1>,217>,<<10,2>,311>})"
"(359.7466884255409,{<<7,1>,45>,<<8,1>,217>,<<10,2>,311>})"

The variables of the top sub-model by derived alignment are all in the first layer,

rp $ fund $ ff `fdep` sgl (stringsVariable "<<2,1>,239>")
"{gill-size,habitat}"

rp $ fund $ ff `fdep` sgl (stringsVariable "<<2,1>,281>")
"{bruises,spore-print-color}"

rp $ fund $ ff `fdep` sgl (stringsVariable "<<3,1>,103>")
"{gill-attachment,odor}"

rp $ fund $ ff `fdep` sgl (stringsVariable "<<10,1>,100>")
"{gill-spacing,habitat,population}"

let xx = llqq $ map stringsVariable ["<<2,1>,239>","<<2,1>,281>","<<3,1>,103>","<<10,1>,100>"]

hrlent uu2 hhb xx vvl
-1.3322676295501878e-15

card $ fvars $ ff `fdep` xx
12

vol uu2 xx
128

size $ eff $ hhaa (hrhh uu2 (hhb `hrhrred` (xx `union` vvl)))
32 % 1

rpln $ qqll $ ssplit xx $ states $ hhaa (hrhh uu2 (hhb `hrhrred` (xx `union` vvl)))
"({(<<2,1>,239>,0),(<<2,1>,281>,0),(<<3,1>,103>,0),(<<10,1>,100>,0)},{(edible,edible)})"
"({(<<2,1>,239>,0),(<<2,1>,281>,1),(<<3,1>,103>,1),(<<10,1>,100>,0)},{(edible,poisonous)})"
"({(<<2,1>,239>,0),(<<2,1>,281>,1),(<<3,1>,103>,3),(<<10,1>,100>,0)},{(edible,edible)})"
...
"({(<<2,1>,239>,3),(<<2,1>,281>,1),(<<3,1>,103>,3),(<<10,1>,100>,0)},{(edible,edible)})"
"({(<<2,1>,239>,3),(<<2,1>,281>,3),(<<3,1>,103>,0),(<<10,1>,100>,1)},{(edible,poisonous)})"
"({(<<2,1>,239>,3),(<<2,1>,281>,3),(<<3,1>,103>,3),(<<10,1>,100>,0)},{(edible,edible)})"

The top sub-model by relative entropy includes a variable from the top layer of the root fud,

rp $ fund $ ff `fdep` sgl (stringsVariable "<<1,n>,4>")
"{bruises,spore-print-color,stalk-root,stalk-shape}"

rp $ fund $ ff `fdep` sgl (stringsVariable "<<2,1>,281>")
"{bruises,spore-print-color}"

rp $ fund $ ff `fdep` sgl (stringsVariable "<<7,1>,45>")
"{odor}"

rp $ fund $ ff `fdep` sgl (stringsVariable "<<10,1>,100>")
"{gill-spacing,habitat,population}"

let xx = llqq $ map stringsVariable ["<<1,n>,4>","<<2,1>,281>","<<7,1>,45>","<<10,1>,100>"]

hrlent uu2 hhb xx vvl
-4.440892098500626e-16

card $ fvars $ ff `fdep` xx
15

vol uu2 xx
72

size $ eff $ hhaa (hrhh uu2 (hhb `hrhrred` (xx `union` vvl)))
19 % 1

rpln $ qqll $ ssplit xx $ states $ hhaa (hrhh uu2 (hhb `hrhrred` (xx `union` vvl)))
"({(<<1,n>,4>,0),(<<2,1>,281>,0),(<<7,1>,45>,0),(<<10,1>,100>,0)},{(edible,edible)})"
"({(<<1,n>,4>,0),(<<2,1>,281>,0),(<<7,1>,45>,0),(<<10,1>,100>,1)},{(edible,edible)})"
"({(<<1,n>,4>,0),(<<2,1>,281>,1),(<<7,1>,45>,0),(<<10,1>,100>,0)},{(edible,edible)})"
...
"({(<<1,n>,4>,2),(<<2,1>,281>,0),(<<7,1>,45>,2),(<<10,1>,100>,0)},{(edible,poisonous)})"
"({(<<1,n>,4>,2),(<<2,1>,281>,1),(<<7,1>,45>,0),(<<10,1>,100>,0)},{(edible,edible)})"
"({(<<1,n>,4>,2),(<<2,1>,281>,1),(<<7,1>,45>,0),(<<10,1>,100>,1)},{(edible,edible)})"

Increasing the limit to 5-tuples (kmax == 5) with omax == 10 does not find any other sub-models,

let (kmax,omax,qmax) = (5, 10, 30)

let ll = buildcondrr vvl hhb2 kmax omax qmax

rpln ll
"(-1.3322676295501878e-15,{<<2,1>,239>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(-1.3322676295501878e-15,{<<2,1>,239>,<<2,1>,281>,<<7,1>,45>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,s>,1>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,80>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,281>,<<3,1>,103>,<<4,4>,40>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,281>,<<3,1>,103>,<<4,4>,109>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,281>,<<3,1>,103>,<<5,1>,218>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,281>,<<3,1>,103>,<<8,2>,244>,<<10,1>,100>})"
"(-8.881784197001252e-16,{<<2,1>,281>,<<4,4>,131>,<<7,1>,45>,<<10,1>,100>})"
"(-4.440892098500626e-16,{<<1,n>,4>,<<2,1>,281>,<<7,1>,45>,<<10,1>,100>})"
"(-4.440892098500626e-16,{<<2,1>,281>,<<3,1>,103>,<<10,2>,311>})"
"(0.0,{<<2,1>,281>,<<6,1>,36>,<<10,2>,311>})"
"(0.0,{<<2,1>,281>,<<7,1>,45>,<<10,2>,311>})"
"(0.0,{<<6,1>,36>,<<8,1>,71>,<<10,2>,311>})"
"(0.0,{<<6,1>,36>,<<8,1>,217>,<<10,2>,311>})"
"(0.0,{<<6,1>,36>,<<8,3>,4>,<<10,2>,311>})"
"(0.0,{<<7,1>,45>,<<8,1>,71>,<<10,2>,311>})"
"(0.0,{<<7,1>,45>,<<8,1>,217>,<<10,2>,311>})"
...

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, e < 1e-14, let aa' = hhaa (hrhh uu2 (hhb `hrhrred` xx)), let aar' = hhaa (hrhh uu2 (hhbr `hrhrred` xx))] 
"(8527.088206908153,{<<2,1>,239>,<<2,1>,281>,<<3,1>,103>,<<10,1>,100>})"
"(8314.397547958259,{<<1,n>,4>,<<2,1>,281>,<<7,1>,45>,<<10,1>,100>})"
"(7577.306805248743,{<<2,1>,281>,<<3,1>,103>,<<4,4>,109>,<<10,1>,100>})"
...

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, e < 1e-14, let aa' = hhaa (hrhh uu2 (hhb `hrhrred` xx)), let vaar' = vsize uu2 (fund (ff `fdep` xx)) (hhaa (hrhh uu2 (hhbr `hrhrred` xx)))] 
"(10557.535177643585,{<<1,n>,4>,<<2,1>,281>,<<7,1>,45>,<<10,1>,100>})"
"(9242.638248175383,{<<2,1>,281>,<<3,1>,103>,<<4,4>,40>,<<10,1>,100>})"
"(9154.46537399292,{<<2,1>,281>,<<3,1>,103>,<<4,4>,109>,<<10,1>,100>})"
...

However, if we increase omax to 20 we find some 3-tuple sub-models with higher derived alignments and similar relative entropies,

let (kmax,omax,qmax) = (5, 20, 30)

let ll = buildcondrr vvl hhb2 kmax omax qmax

rpln ll
"(-4.440892098500626e-16,{<<2,1>,281>,<<3,1>,103>,<<10,2>,311>})"
"(0.0,{<<1,n>,1>,<<1,1>,79>,<<3,2>,166>})"
"(0.0,{<<1,n>,7>,<<1,1>,79>,<<3,2>,166>})"
"(0.0,{<<1,s>,4>,<<1,1>,79>,<<3,2>,166>})"
"(0.0,{<<1,1>,79>,<<1,1>,207>,<<3,2>,166>})"
"(0.0,{<<1,1>,79>,<<1,2>,29>,<<3,2>,166>})"
"(0.0,{<<1,1>,79>,<<1,2>,225>,<<3,2>,166>})"
"(0.0,{<<1,1>,79>,<<2,1>,65>,<<3,2>,166>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<4,2>,103>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<4,2>,161>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<7,1>,102>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<10,n>,1>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<10,n>,2>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<10,n>,3>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<10,n>,4>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<10,n>,5>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<10,n>,6>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<10,n>,7>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<10,n>,8>})"
"(0.0,{<<1,1>,79>,<<3,2>,166>,<<10,n>,9>})"
...
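
The listing above differs from the omax == 10 run even though kmax is unchanged. One plausible reading, stated here as an assumption and not as a description of buildcondrr itself, is that omax limits how many tuples survive each step of the search, so a wider limit lets through tuples whose members look unpromising individually. The toy beam search below illustrates the effect. The names search, lent and toy and the data are hypothetical.

```haskell
import Data.List (nub, sort)
import qualified Data.Map.Strict as M

type Event  = ([Int], Int)     -- (substrate values, label)
type Scored = (Double, [Int])  -- (label entropy, tuple of variable indices)

entropy :: [Double] -> Double
entropy cs = negate (sum [p * log p | c <- cs, c > 0, let p = c / z])
  where z = sum cs

countsOf :: Ord k => [k] -> [Double]
countsOf ks = M.elems (M.fromListWith (+) [(k, 1) | k <- ks])

-- Conditional entropy of the label given the variables vs.
lent :: [Event] -> [Int] -> Double
lent es vs = entropy (countsOf [(proj x, l) | (x, l) <- es])
           - entropy (countsOf [proj x | (x, _) <- es])
  where proj x = [x !! v | v <- vs]

-- Beam search: grow tuples up to size kmax, keeping the omax best per step.
search :: Int -> Int -> [Event] -> [Scored]
search kmax omax es = sort (concat (take kmax (iterate step level1)))
  where
    n = case es of
          ((x, _) : _) -> length x
          []           -> 0
    score vs  = (lent es vs, vs)
    level1    = sort [score [v] | v <- [0 .. n - 1]]
    step prev = sort (map score (nub
      [sort (v : vs) | (_, vs) <- take omax prev, v <- [0 .. n - 1], v `notElem` vs]))

-- Label is the exclusive-or of variables 0 and 1; variable 2 agrees with
-- the label in six of the eight events.
toy :: [Event]
toy =
  [ ([0,0,0],0), ([0,0,0],0), ([0,1,1],1), ([0,1,0],1)
  , ([1,0,1],1), ([1,0,1],1), ([1,1,0],0), ([1,1,1],0) ]

main :: IO ()
main = do
  print (head (search 2 1 toy))  -- narrow beam: best tuple still has positive entropy
  print (head (search 2 2 toy))  -- wider beam: reaches the pair of variables 0 and 1
```

With a beam of one, only variable 2 is extended and the exact pair is never examined. With a beam of two, variable 0 also survives the first step and the pair of variables 0 and 1 is found.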

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, e < 1e-14, let aa' = hhaa (hrhh uu2 (hhb `hrhrred` xx)), let aar' = hhaa (hrhh uu2 (hhbr `hrhrred` xx))] 
"(12713.696863137891,{<<1,1>,79>,<<1,1>,207>,<<3,2>,166>})"
"(11941.517891089868,{<<1,1>,79>,<<2,1>,65>,<<3,2>,166>})"
"(9233.082785127524,{<<1,1>,79>,<<3,2>,166>,<<4,2>,103>})"
"(8842.040563424976,{<<1,n>,1>,<<1,1>,79>,<<3,2>,166>})"
"(8842.04056342497,{<<1,1>,79>,<<1,2>,29>,<<3,2>,166>})"
"(8796.877460886935,{<<1,n>,7>,<<1,1>,79>,<<3,2>,166>})"
"(8796.877460886892,{<<1,1>,79>,<<1,2>,225>,<<3,2>,166>})"
"(8773.786866869392,{<<1,1>,79>,<<3,2>,166>,<<7,1>,102>})"
"(8516.223496706974,{<<1,1>,79>,<<3,2>,166>,<<4,2>,161>})"
"(3953.2129505576595,{<<1,1>,79>,<<3,2>,166>,<<10,n>,5>})"
"(3953.032235524086,{<<1,1>,79>,<<3,2>,166>,<<10,n>,1>})"
"(3952.4524170144286,{<<1,1>,79>,<<3,2>,166>,<<10,n>,8>})"
"(3951.6369220215274,{<<1,1>,79>,<<3,2>,166>,<<10,n>,6>})"
"(3950.174173453779,{<<1,1>,79>,<<3,2>,166>,<<10,n>,3>})"
"(3949.822617623824,{<<1,1>,79>,<<3,2>,166>,<<10,n>,2>})"
"(3948.7929981994894,{<<1,1>,79>,<<3,2>,166>,<<10,n>,9>})"
"(3946.5524692789404,{<<1,1>,79>,<<3,2>,166>,<<10,n>,7>})"
"(3944.466468956656,{<<1,1>,79>,<<3,2>,166>,<<10,n>,4>})"
"(3905.762371777695,{<<1,s>,4>,<<1,1>,79>,<<3,2>,166>})"
"(2264.151471030258,{<<2,1>,281>,<<3,1>,103>,<<10,2>,311>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, e < 1e-14, let aa' = hhaa (hrhh uu2 (hhb `hrhrred` xx)), let vaar' = vsize uu2 (fund (ff `fdep` xx)) (hhaa (hrhh uu2 (hhbr `hrhrred` xx)))] 
"(8891.585304835404,{<<1,1>,79>,<<1,2>,29>,<<3,2>,166>})"
"(8891.585304835287,{<<1,n>,1>,<<1,1>,79>,<<3,2>,166>})"
"(8814.679132970166,{<<1,1>,79>,<<3,2>,166>,<<4,2>,103>})"
"(8717.961974559294,{<<1,1>,79>,<<3,2>,166>,<<4,2>,161>})"
"(6258.5066506674375,{<<1,1>,79>,<<1,1>,207>,<<3,2>,166>})"
"(6039.5386900025405,{<<1,1>,79>,<<2,1>,65>,<<3,2>,166>})"
"(5898.02287415359,{<<1,1>,79>,<<1,2>,225>,<<3,2>,166>})"
"(5898.02287415359,{<<1,n>,7>,<<1,1>,79>,<<3,2>,166>})"
"(5597.137802124023,{<<1,1>,79>,<<3,2>,166>,<<10,n>,6>})"
"(5350.54219943285,{<<1,1>,79>,<<3,2>,166>,<<10,n>,2>})"
"(5333.425339281559,{<<1,1>,79>,<<3,2>,166>,<<10,n>,1>})"
"(5311.292247593403,{<<1,1>,79>,<<3,2>,166>,<<10,n>,4>})"
"(5221.305827239528,{<<1,1>,79>,<<3,2>,166>,<<10,n>,3>})"
"(5195.33349609375,{<<1,1>,79>,<<3,2>,166>,<<10,n>,9>})"
"(5177.754608154297,{<<1,1>,79>,<<3,2>,166>,<<10,n>,5>})"
"(5126.841391921043,{<<1,1>,79>,<<3,2>,166>,<<10,n>,7>})"
"(5118.96097278595,{<<1,1>,79>,<<3,2>,166>,<<10,n>,8>})"
"(5045.093370459974,{<<1,s>,4>,<<1,1>,79>,<<3,2>,166>})"
"(4741.080206201626,{<<1,1>,79>,<<3,2>,166>,<<7,1>,102>})"
"(3377.876708507538,{<<2,1>,281>,<<3,1>,103>,<<10,2>,311>})"

Analysing the top sub-model by derived alignment,

rp $ fund $ ff `fdep` sgl (stringsVariable "<<1,1>,79>")
"{stalk-root,stalk-shape}"

rp $ fund $ ff `fdep` sgl (stringsVariable "<<1,1>,207>")
"{bruises,spore-print-color}"

rp $ fund $ ff `fdep` sgl (stringsVariable "<<3,2>,166>")
"{gill-attachment,gill-spacing,odor,spore-print-color}"

let xx = llqq $ map stringsVariable ["<<1,1>,79>","<<1,1>,207>","<<3,2>,166>"]

hrlent uu2 hhb xx vvl
0.0

card $ fvars $ ff `fdep` xx
12

vol uu2 xx
80

size $ eff $ hhaa (hrhh uu2 (hhb `hrhrred` (xx `union` vvl)))
21 % 1

rpln $ qqll $ ssplit xx $ states $ hhaa (hrhh uu2 (hhb `hrhrred` (xx `union` vvl)))
"({(<<1,1>,79>,0),(<<1,1>,207>,0),(<<3,2>,166>,3)},{(edible,poisonous)})"
"({(<<1,1>,79>,0),(<<1,1>,207>,1),(<<3,2>,166>,2)},{(edible,poisonous)})"
"({(<<1,1>,79>,0),(<<1,1>,207>,2),(<<3,2>,166>,0)},{(edible,poisonous)})"
...
"({(<<1,1>,79>,3),(<<1,1>,207>,3),(<<3,2>,166>,1)},{(edible,poisonous)})"
"({(<<1,1>,79>,3),(<<1,1>,207>,3),(<<3,2>,166>,3)},{(edible,edible)})"
"({(<<1,1>,79>,4),(<<1,1>,207>,3),(<<3,2>,166>,3)},{(edible,poisonous)})"

Analysing the top sub-model by relative entropy,

rp $ fund $ ff `fdep` sgl (stringsVariable "<<1,1>,79>")
"{stalk-root,stalk-shape}"

rp $ fund $ ff `fdep` sgl (stringsVariable "<<1,2>,29>")
"{habitat,ring-type}"

rp $ fund $ ff `fdep` sgl (stringsVariable "<<3,2>,166>")
"{gill-attachment,gill-spacing,odor,spore-print-color}"

let xx = llqq $ map stringsVariable ["<<1,1>,79>","<<1,2>,29>","<<3,2>,166>"]

hrlent uu2 hhb xx vvl
0.0

card $ fvars $ ff `fdep` xx
14

vol uu2 xx
40

size $ eff $ hhaa (hrhh uu2 (hhb `hrhrred` (xx `union` vvl)))
17 % 1

rpln $ qqll $ ssplit xx $ states $ hhaa (hrhh uu2 (hhb `hrhrred` (xx `union` vvl)))
"({(<<1,1>,79>,0),(<<1,2>,29>,0),(<<3,2>,166>,0)},{(edible,poisonous)})"
"({(<<1,1>,79>,0),(<<1,2>,29>,0),(<<3,2>,166>,1)},{(edible,edible)})"
"({(<<1,1>,79>,0),(<<1,2>,29>,0),(<<3,2>,166>,3)},{(edible,poisonous)})"
...
"({(<<1,1>,79>,3),(<<1,2>,29>,0),(<<3,2>,166>,3)},{(edible,edible)})"
"({(<<1,1>,79>,3),(<<1,2>,29>,1),(<<3,2>,166>,1)},{(edible,poisonous)})"
"({(<<1,1>,79>,4),(<<1,2>,29>,1),(<<3,2>,166>,3)},{(edible,poisonous)})"

To conclude, we can see that there are many robust sub-models of the induced model that are predictive of edibility.

