MUSH - Analysis of the UCI Machine Learning Repository Mushroom Data Set
Sections
Predicting edibility without modelling
Predicting odor without modelling
Manual modelling of edibility
Induced modelling of edibility
Introduction
The UCI Machine Learning Repository Mushroom Data Set is a popular dataset often used to test machine learning algorithms (e.g. Kaggle).
The dataset consists of descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family, drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like “leaflets three, let it be” for Poisonous Oak and Ivy.
The dataset contains 8124 events of 23 discrete-valued variables:
- cap-shape: bell, conical, convex, flat, knobbed, sunken
- cap-surface: fibrous, grooves, scaly, smooth
- cap-color: brown, buff, cinnamon, gray, green, pink, purple, red, white, yellow
- bruises: bruises, no
- odor: almond, anise, creosote, fishy, foul, musty, none, pungent, spicy
- gill-attachment: attached, descending, free, notched
- gill-spacing: close, crowded, distant
- gill-size: broad, narrow
- gill-color: black, brown, buff, chocolate, gray, green, orange, pink, purple, red, white, yellow
- stalk-shape: enlarging, tapering
- stalk-root: bulbous, club, cup, equal, rhizomorphs, rooted, missing
- stalk-surface-above-ring: fibrous, scaly, silky, smooth
- stalk-surface-below-ring: fibrous, scaly, silky, smooth
- stalk-color-above-ring: brown, buff, cinnamon, gray, orange, pink, red, white, yellow
- stalk-color-below-ring: brown, buff, cinnamon, gray, orange, pink, red, white, yellow
- veil-type: partial, universal
- veil-color: brown, orange, white, yellow
- ring-number: none, one, two
- ring-type: cobwebby, evanescent, flaring, large, none, pendant, sheathing, zone
- spore-print-color: black, brown, buff, chocolate, green, orange, purple, white, yellow
- population: abundant, clustered, numerous, scattered, several, solitary
- habitat: grasses, leaves, meadows, paths, urban, waste, woods
- edibility: edible, poisonous
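For reference, the raw UCI file encodes each event as a comma-separated line of single-letter codes, with the edibility class first. The following is a minimal decoding sketch, for illustration only: the local file path is hypothetical, only two of the 23 decode maps are shown, and the actual loading below is done by mushIO.
codes = {
    "edible": {"e": "edible", "p": "poisonous"},
    "odor": {"a": "almond", "l": "anise", "c": "creosote", "y": "fishy",
             "f": "foul", "m": "musty", "n": "none", "p": "pungent", "s": "spicy"},
}
events = []
with open("agaricus-lepiota.data") as f:
    for line in f:
        values = line.strip().split(",")
        # The class is first; odor is the fifth attribute after it.
        events.append({"edible": codes["edible"][values[0]],
                       "odor": codes["odor"][values[5]]})
len(events)
# expect 8124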
Note that although edibility is a secondary quality or classification, we shall treat it here as we would any other variable.
We shall analyse this dataset using the MUSHPy repository which depends on the AlignmentRepaPy repository. The AlignmentRepaPy repository is a fast Python implementation of some of the practicable inducers described in the paper. The code in this section can be executed by copying and pasting the code into a Python interpreter, see README. Also see the Introduction in Notation.
Properties of the sample
First load the sample $A$,
from MUSHDev import *
(uu,aa) = mushIO()
vv = uvars(uu)
vvl = sset([VarStr("edible")])
vvk = vv - vvl
The system is $U$. The sample substrate variables are $V = \mathrm{vars}(A)$, the label variables are $V_{\mathrm{l}} = \{\mathrm{edible}\}$, and the query variables form the remainder, $V_{\mathrm{k}} = V \setminus V_{\mathrm{l}}$.
The variable valencies are $\{(w,|U_w|) : w \in V\}$,
rpln(sset([(vol(uu,sset([w])),w) for w in vv]))
# (1, veil-type)
# (2, bruises)
# (2, edible)
# (2, gill-attachment)
# (2, gill-size)
# (2, gill-spacing)
# (2, stalk-shape)
# (3, ring-number)
# (4, cap-surface)
# (4, stalk-surface-above-ring)
# (4, stalk-surface-below-ring)
# (4, veil-color)
# (5, ring-type)
# (5, stalk-root)
# (6, cap-shape)
# (6, population)
# (7, habitat)
# (9, odor)
# (9, spore-print-color)
# (9, stalk-color-above-ring)
# (9, stalk-color-below-ring)
# (10, cap-color)
# (12, gill-color)
Note that veil-type has only one value and so is a constant.
The variable dimension, $|V|$, is,
len(vv)
23
The variable volume, $|V^{\mathrm{C}}|$, is,
vol(uu,vv)
243799621632000
So the mean valency, $|V^{\mathrm{C}}|^{1/|V|}$, is,
exp(log(vol(uu,vv))/len(vv))
4.222048084120202
The label variable dimension, $|V_{\mathrm{l}}|$, is,
len(vvl)
1
The label variable volume, $|V_{\mathrm{l}}^{\mathrm{C}}|$, is,
vol(uu,vvl)
2
The query variable dimension, $|V_{\mathrm{k}}|$, is,
len(vvk)
22
The query variable volume, $|V_{\mathrm{k}}^{\mathrm{C}}|$, is,
vol(uu,vvk)
121899810816000
The geometric mean query valency, $|V_{\mathrm{k}}^{\mathrm{C}}|^{1/|V_{\mathrm{k}}|}$, is,
exp(log(vol(uu,vvk))/len(vvk))
4.367901791531438
The sample size, $\mathrm{size}(A)$, is
size(aa)
# 8124 % 1
So each effective state corresponds to exactly one event, $A = A^{\mathrm{F}}$,
size(eff(aa))
# 8124 % 1
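Equivalently, every event in the sample is a distinct state. A one-line check, not in the original script:
# The effective histogram has the same size as the sample,
# so each effective state corresponds to exactly one event.
size(eff(aa)) == size(aa)
# True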
Now consider how highly aligned variables might be grouped together. See Entropy and alignment. First consider pairs in the substrate, $V$, \[ \{(\mathrm{algn}(A\%\{w,x\}),~w,~x) : w \in V,~x \in V,~w < x\} \]
rpln(reversed(list(sset([(algn(red(aa,sset([w,x]))),w,x) for w in vv for x in vv if w < x]))))
# (5255.546241861608, odor, spore-print-color)
# (5243.485309542397, gill-color, spore-print-color)
# (5076.0182810184415, edible, odor)
# (4869.445702685916, spore-print-color, stalk-root)
# (4747.650540211325, gill-color, odor)
# (4634.640609067872, odor, stalk-root)
# (4538.425095165279, ring-type, spore-print-color)
# (4504.522900455915, gill-color, stalk-root)
# (4319.357344722732, gill-color, ring-type)
# (4191.879346198264, odor, ring-type)
# (3876.2346155617124, population, stalk-root)
# (3792.5367348885193, habitat, stalk-root)
# (3631.907175495806, ring-type, stalk-root)
# (3594.3100531883683, stalk-color-above-ring, stalk-color-below-ring)
# (3580.0056701234134, habitat, population)
# (3526.33839164619, gill-color, habitat)
# ...
# (40.29419779899035, bruises, stalk-shape)
# (34.93679977406282, gill-attachment, gill-spacing)
# (27.046821925505355, gill-spacing, stalk-shape)
# (25.072123638783523, cap-surface, stalk-shape)
# (24.98472755986586, bruises, ring-number)
# (23.84031057698303, cap-shape, gill-spacing)
# (10.583146543744078, ring-number, veil-color)
# (0.0, veil-color, veil-type)
# (0.0, stalk-surface-below-ring, veil-type)
# ...
# (0.0, cap-color, veil-type)
# (0.0, bruises, veil-type)
We can see that all of the variables except for mono-valent veil-type are aligned with each other, even if only very weakly. We can also see that some of the variables that appear in highly aligned pairs appear in several such pairs, e.g. odor or spore-print-color. This suggests that we should also consider tuple dimensions greater than two.
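For example, the alignment of a tri-variate tuple over some of the highly aligned variables above can be computed in exactly the same way; a quick illustration (output not reproduced here):
# Alignment of an example triple drawn from the highly aligned pairs above.
algn(red(aa, sset([VarStr(s) for s in ["gill-color", "odor", "spore-print-color"]])))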
Now consider using the tupler to group together highly aligned variables in the substrate, $V$. Note that for performance reasons we must first construct a HistoryRepa from the sample histogram, $A$. See History and HistoryRepa.
First consider the tuple dimension implied by choosing a volume limit, xmax. The following products compare candidate tuple volumes against powers of the mean valencies calculated above,
10*12
120
4.367901791531438 ** 4
363.9916829234716
2*2*2*2*2*2*3*4
768
9*9*10
810
9*10*12
1080
4.222048084120202 ** 5
1341.5778383137888
4.367901791531438 ** 5
1589.879923943975
2*2*2*2*2*2*3*4*4
3072
size(aa)
# 8124 % 1
9*9*10*12
9720
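These products can be checked against the system; for example, the last is the volume of the tuple of the four highest-valency variables,
# Volume of the four highest-valency variables:
# gill-color (12), cap-color (10), odor (9) and spore-print-color (9).
vol(uu, sset([VarStr(s) for s in ["gill-color", "cap-color", "odor", "spore-print-color"]]))
# 9720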
Now create a shuffled sample, $A_{\mathrm{r}}$,
hh = aahr(uu,aa)
hhr = historyRepasShuffle_u(hh,1)
hrsize(hhr)
# 8124
The shuffle has the same size as the sample, $\mathrm{size}(A_{\mathrm{r}}) = \mathrm{size}(A)$.
Now optimise the shuffle content alignment with the tuple set builder, $I_{P,U,\mathrm{B,ns,me}}$, \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V,~\emptyset,~A,~A_{\mathrm{r}}))\} \]
def buildtuprr(xmax,omax,bmax,uu,vv,xx,xxrr):
    return reversed(list(sset([(algn(rraa(uu,hrred(xx,kk))) - algn(rraa(uu,hrred(xxrr,kk))), kk) for ((kk,_),_) in parametersSystemsBuilderTupleNoSumlayerMultiEffectiveRepa_ui(xmax,omax,bmax,1,uu,vv,fudEmpty(),xx,hrhx(xx),xxrr,hrhx(xxrr))[0]])))
rpln(buildtuprr(1590,10,10,uu,vv,hh,hhr))
# (20909.315588710102, {bruises, edible, odor, ring-type, stalk-root})
# (20737.49894848034, {bruises, odor, ring-type, stalk-root, stalk-shape})
# (20595.04499418575, {bruises, gill-size, odor, ring-type, stalk-root})
# (18771.00223024639, {bruises, gill-spacing, odor, ring-type, stalk-root})
# (17647.35883825688, {habitat, ring-type, spore-print-color, stalk-root})
# (17511.267019084673, {habitat, odor, ring-type, stalk-root})
# (17356.19101355837, {edible, odor, spore-print-color, stalk-root})
# (17337.191103083293, {bruises, odor, ring-number, ring-type, stalk-root})
# (16741.46427782372, {odor, population, ring-type, stalk-root})
# (16536.946674766714, {bruises, gill-attachment, odor, ring-type, stalk-root})
We can see that the top tuples have large intersections. Now optimise again having removed the top tuple from the substrate, \[ Q_1~=~\{\mathrm{bruises},~\mathrm{edible},~\mathrm{odor},~\text{ring-type},~\text{stalk-root}\} \] and \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V \setminus Q_1,~\emptyset,~A,~A_{\mathrm{r}}))\} \]
qq1 = sset([VarStr(s) for s in ["bruises","edible","odor","ring-type","stalk-root"]])
rpln(buildtuprr(1590,10,10,uu,vv-qq1,hh,hhr))
# (13474.472075221576, {gill-color, gill-size, gill-spacing, spore-print-color, stalk-shape})
# (13380.35024073867, {gill-color, gill-size, ring-number, spore-print-color, stalk-shape})
# (13071.036894266123, {gill-color, habitat, spore-print-color, stalk-shape})
# (12727.00966415456, {gill-color, gill-size, habitat, spore-print-color})
# (12404.82395851165, {gill-color, population, spore-print-color, stalk-shape})
# (12154.32506810885, {gill-attachment, gill-color, gill-size, spore-print-color, stalk-shape})
# (12120.144392218797, {gill-color, gill-size, population, spore-print-color})
# (12004.554463851495, {gill-color, spore-print-color, stalk-shape, stalk-surface-below-ring})
# (11553.333907313998, {gill-color, spore-print-color, stalk-shape, stalk-surface-above-ring})
# (11546.535980638902, {gill-color, habitat, population, stalk-shape})
Now optimise again having removed the top two tuples from the substrate, \[ Q_2~=~\{\text{gill-color},~\text{gill-size},~\text{gill-spacing},~\text{spore-print-color},~\text{stalk-shape}\} \] and \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V \setminus Q_1 \setminus Q_2,~\emptyset,~A,~A_{\mathrm{r}}))\} \]
qq2 = sset([VarStr(s) for s in ["gill-color","gill-size","gill-spacing","spore-print-color","stalk-shape"]])
rpln(buildtuprr(1590,10,10,uu,vv-qq1-qq2,hh,hhr))
# (10117.338301213375, {habitat, population, stalk-color-below-ring, stalk-surface-below-ring})
# (10015.422262274038, {habitat, population, stalk-color-above-ring, stalk-surface-below-ring})
# (9721.157922413408, {habitat, population, stalk-color-below-ring, stalk-surface-above-ring})
# (9633.996246304727, {habitat, population, stalk-color-above-ring, stalk-surface-above-ring})
# (9348.809148464625, {stalk-color-above-ring, stalk-color-below-ring, stalk-surface-above-ring, stalk-surface-below-ring})
# (8575.748022597432, {cap-surface, habitat, population, stalk-color-below-ring})
# (8554.425441539577, {cap-surface, habitat, population, stalk-color-above-ring})
# (8259.597445754986, {cap-color, habitat, population, ring-number})
# (8098.58768334987, {habitat, population, ring-number, stalk-color-above-ring})
# (8061.878539905454, {habitat, population, ring-number, stalk-color-below-ring})
This time, if we remove the union of the top four tuples, we terminate at the remainder variables, \[ Q_3~=~\{\mathrm{habitat},~\mathrm{population},~\ldots,~\text{stalk-surface-above-ring}\} \] and \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V \setminus Q_1 \setminus Q_2 \setminus Q_3,~\emptyset,~A,~A_{\mathrm{r}}))\} \]
qq3 = sset([VarStr(s) for s in ["habitat","population","stalk-color-below-ring","stalk-surface-below-ring","stalk-color-above-ring","stalk-surface-above-ring"]])
vv - qq1 - qq2 - qq3
# {cap-color, cap-shape, cap-surface, gill-attachment, ring-number, veil-color, veil-type}
rpln(buildtuprr(1590,10,10,uu,vv-qq1-qq2-qq3,hh,hhr))
# (3032.5994640940407, {cap-color, cap-shape, cap-surface, gill-attachment, ring-number})
# (2650.426903548403, {cap-color, cap-shape, gill-attachment, ring-number, veil-color})
# (2588.3533754319324, {cap-color, cap-shape, cap-surface, ring-number})
# (2474.5970677913137, {cap-color, cap-surface, gill-attachment, ring-number, veil-color})
# (1999.2245419999308, {cap-color, cap-shape, cap-surface, gill-attachment})
# (1962.429414852144, {cap-color, cap-shape, cap-surface, veil-color})
# (1851.1240419597088, {cap-color, cap-shape, gill-attachment, ring-number})
# (1818.103175173921, {cap-color, cap-surface, gill-attachment, veil-color})
# (1787.8796736710938, {cap-color, cap-shape, ring-number, veil-color})
# (1720.3728042711373, {cap-color, cap-shape, gill-attachment, veil-color})
That is, there is a possible partition of the substrate, $\{Q_1,~Q_2,~Q_3,~V \setminus (Q_1 \cup Q_2 \cup Q_3)\}$, as follows,
qq1
# {bruises, edible, odor, ring-type, stalk-root}
qq2
# {gill-color, gill-size, gill-spacing, spore-print-color, stalk-shape}
qq3
# {habitat, population, stalk-color-above-ring, stalk-color-below-ring, stalk-surface-above-ring, stalk-surface-below-ring}
vv-qq1-qq2-qq3
# {cap-color, cap-shape, cap-surface, gill-attachment, ring-number, veil-color, veil-type}
We can check to see if the shuffle size is sufficient by optimising with a different shuffle,
hhr = historyRepasShuffle_u(hh,3)
rpln(buildtuprr(1590,10,10,uu,vv,hh,hhr))
# (20898.30881207043, {bruises, edible, odor, ring-type, stalk-root})
# (20752.970341604527, {bruises, odor, ring-type, stalk-root, stalk-shape})
# (20609.75866419327, {bruises, gill-size, odor, ring-type, stalk-root})
# (18788.360560650763, {bruises, gill-spacing, odor, ring-type, stalk-root})
# (17630.53206124808, {habitat, ring-type, spore-print-color, stalk-root})
# (17521.773935518864, {habitat, odor, ring-type, stalk-root})
# (17353.567039910533, {edible, odor, spore-print-color, stalk-root})
# (17349.636902047107, {bruises, odor, ring-number, ring-type, stalk-root})
# (16737.072559143457, {odor, population, ring-type, stalk-root})
# (16543.55247776693, {bruises, gill-attachment, odor, ring-type, stalk-root})
We can see that this partition is not affected by the shuffle seed.
Predicting edibility without modelling
The sample query variables predict edibility. That is, there is a functional or causal relationship between the query variables and the label variables, $(A\%V_{\mathrm{k}})^{\mathrm{FS}} \to (A\%V_{\mathrm{l}})^{\mathrm{FS}}$. So the label entropy or query conditional entropy is zero. See Entropy and alignment. In this case, where $V = V_{\mathrm{k}} \cup V_{\mathrm{l}}$, the label entropy is \[ \begin{eqnarray} \mathrm{entropy}(A) - \mathrm{entropy}(A~\%~V_{\mathrm{k}})~=~0 \end{eqnarray} \] More generally, define \[ \begin{eqnarray} \mathrm{lent}(A,W,V_{\mathrm{l}})~:=~\mathrm{entropy}(A~\%~(W \cup V_{\mathrm{l}})) - \mathrm{entropy}(A~\%~W) \end{eqnarray} \]
def lent(aa,ww,vvl):
    return ent(red(aa,ww|vvl)) - ent(red(aa,ww))
Then $\mathrm{lent}(A,V_{\mathrm{k}},V_{\mathrm{l}}) = 0$,
lent(aa,vvk,vvl)
0.0
We can determine which of the query variables has the least conditional entropy, \[ \begin{eqnarray} \{(\mathrm{lent}(A,\{w\},V_{\mathrm{l}}),~w) : w \in V_{\mathrm{k}}\} \end{eqnarray} \]
rpln(sset([(lent(aa,sset([w]),vvl),w) for w in vvk]))
# (0.06445777995546464, odor)
# (0.3593018375305004, spore-print-color)
# (0.4034743011923405, gill-color)
# (0.47206538234114515, ring-type)
# (0.49514434957356657, stalk-surface-above-ring)
# (0.504038208263087, stalk-surface-below-ring)
# (0.516549029620998, stalk-color-above-ring)
# (0.5251645766232356, stalk-color-below-ring)
# (0.5329702396776962, gill-size)
# (0.5525144643974396, population)
# (0.5591537977521386, bruises)
# (0.5837923250560273, habitat)
# (0.5990526304940014, stalk-root)
# (0.6225742013519671, gill-spacing)
# (0.6586777995379725, cap-shape)
# (0.6658477366342479, ring-number)
# (0.6675136370489365, cap-color)
# (0.6726838566664071, cap-surface)
# (0.6759923983315359, veil-color)
# (0.6826826472037806, gill-attachment)
# (0.6872908661915269, stalk-shape)
# (0.6925010959051001, veil-type)
This may be compared to the entropy of the label variables, $\mathrm{entropy}(A\%V_{\mathrm{l}})$,
ent(red(aa,vvl))
0.6925010959051001
Mono-valent veil-type has the highest conditional entropy. In fact, it is equal to the entropy of the label variables, and so makes no prediction of edibility, $\mathrm{lent}(A,\{\text{veil-type}\},V_{\mathrm{l}}) = \mathrm{entropy}(A\%V_{\mathrm{l}})$.
By contrast, odor has the least conditional entropy by quite a margin. Odor is highly predictive of edibility. Its label entropy is $\mathrm{lent}(A,\{\mathrm{odor}\},V_{\mathrm{l}})$,
odor = VarStr("odor")
lent(aa,sset([odor]),vvl)
0.06445777995546464
Let us reduce the sample, $A~\%~(\{\mathrm{odor}\} \cup V_{\mathrm{l}})$, to see the relationship,
rpln(aall(red(aa,sset([odor])|vvl)))
# ({(edible, edible), (odor, almond)}, 400 % 1)
# ({(edible, edible), (odor, anise)}, 400 % 1)
# ({(edible, edible), (odor, none)}, 3408 % 1)
# ({(edible, poisonous), (odor, creosote)}, 192 % 1)
# ({(edible, poisonous), (odor, fishy)}, 576 % 1)
# ({(edible, poisonous), (odor, foul)}, 2160 % 1)
# ({(edible, poisonous), (odor, musty)}, 36 % 1)
# ({(edible, poisonous), (odor, none)}, 120 % 1)
# ({(edible, poisonous), (odor, pungent)}, 256 % 1)
# ({(edible, poisonous), (odor, spicy)}, 576 % 1)
rpln(ssplit(vvk,states(red(aa,sset([odor])|vvl))))
# ({(odor, almond)}, {(edible, edible)})
# ({(odor, anise)}, {(edible, edible)})
# ({(odor, creosote)}, {(edible, poisonous)})
# ({(odor, fishy)}, {(edible, poisonous)})
# ({(odor, foul)}, {(edible, poisonous)})
# ({(odor, musty)}, {(edible, poisonous)})
# ({(odor, none)}, {(edible, edible)})
# ({(odor, none)}, {(edible, poisonous)})
# ({(odor, pungent)}, {(edible, poisonous)})
# ({(odor, spicy)}, {(edible, poisonous)})
Only value none is ambiguous.
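Since only odor none is ambiguous, a modal prediction of edibility from odor alone misclassifies just the 120 poisonous events with no odor. A quick arithmetic check of the implied accuracy, which agrees with the 98.52% quoted for rule P_1 in the manual modelling section below:
# Only the 120 poisonous mushrooms with odor none are misclassified
# by predicting the modal edibility for each odor value.
(8124 - 120) / 8124
0.9852289512555391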
Odor and edibility are also highly aligned, $\mathrm{algn}(A~\%~(\{\mathrm{odor}\} \cup V_{\mathrm{l}}))$,
algn(red(aa,sset([odor])|vvl))
5076.0182810184415
which suggests that the relationship tends to be bijective, or functional/causal in both directions. That is, edibility is also somewhat predictive of odor. The label entropy in the opposite direction is $\mathrm{lent}(A,V_{\mathrm{l}},\{\mathrm{odor}\})$,
lent(aa,vvl,sset([odor]))
0.9796522676447261
ent(red(aa,sset([odor])))
1.6076955835943616
rpln(ssplit(vvl,states(red(aa,sset([odor])|vvl))))
# ({(edible, edible)}, {(odor, almond)})
# ({(edible, edible)}, {(odor, anise)})
# ({(edible, edible)}, {(odor, none)})
# ({(edible, poisonous)}, {(odor, creosote)})
# ({(edible, poisonous)}, {(odor, fishy)})
# ({(edible, poisonous)}, {(odor, foul)})
# ({(edible, poisonous)}, {(odor, musty)})
# ({(edible, poisonous)}, {(odor, none)})
# ({(edible, poisonous)}, {(odor, pungent)})
# ({(edible, poisonous)}, {(odor, spicy)})
Now, however, both values edible and poisonous are ambiguous.
We can determine minimum subsets of the query variables that are causal or predictive by using the repa conditional entropy tuple set builder. The conditional entropy minimisation searches for the set of tuples with the least label entropy. We show the resultant tuples along with their label entropies, \[ \{(\mathrm{lent}(A,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A,\mathrm{L}}))\} \]
def buildcondrr(vvl,aa,kmax,omax,qmax):
    return sset([(b,a) for (a,b) in parametersBuilderConditionalVarsRepa(kmax,omax,qmax,vvl,aa).items()])
(kmax,omax,qmax) = (1, 5, 5)
rpln(buildcondrr(vvl,hh,kmax,omax,qmax))
# (0.06445777995546442, {odor})
# (0.3593018375305004, {spore-print-color})
# (0.4034743011923396, {gill-color})
# (0.4720653823411449, {ring-type})
# (0.49514434957356646, {stalk-surface-above-ring})
(kmax,omax,qmax) = (2, 5, 5)
rpln(buildcondrr(vvl,hh,kmax,omax,qmax))
# (0.02082990753054048, {odor, spore-print-color})
# (0.03627934002186972, {cap-color, odor})
# (0.038496450666163806, {gill-color, odor})
# (0.04566068343589391, {odor, stalk-shape})
# (0.04619210440230992, {odor, stalk-color-below-ring})
All of the multi-variate tuples contain odor.
(kmax,omax,qmax) = (3, 5, 5)
rpln(buildcondrr(vvl,hh,kmax,omax,qmax))
# (0.006893838773649907, {habitat, odor, spore-print-color})
# (0.008190808632908997, {odor, ring-number, spore-print-color})
# (0.008190808632909441, {gill-size, odor, spore-print-color})
# (0.008951100722878635, {odor, spore-print-color, stalk-surface-below-ring})
# (0.009511843838571732, {cap-color, odor, spore-print-color})
(kmax,omax,qmax) = (4, 5, 5)
rpln(buildcondrr(vvl,hh,kmax,omax,qmax))
# (0.0, {habitat, odor, population, spore-print-color})
# (0.0018803963612850083, {cap-color, habitat, odor, spore-print-color})
# (0.0018803963612850083, {habitat, odor, spore-print-color, stalk-color-below-ring})
# (0.002215007955169934, {gill-size, odor, spore-print-color, stalk-surface-below-ring})
# (0.002215007955169934, {odor, ring-number, spore-print-color, stalk-surface-below-ring})
So the minimum tuple dimension that is causal or predictive is 4. Let this tuple be $X$, \[ X~=~\{\mathrm{habitat},~\mathrm{odor},~\mathrm{population},~\text{spore-print-color}\} \]
xx = sset([VarStr(s) for s in ["habitat","odor","population","spore-print-color"]])
len(xx)
4
The label entropy, $\mathrm{lent}(A,X,V_{\mathrm{l}})$, rounds to zero,
lent(aa,xx,vvl)
-8.881784197001252e-16
That is, there is a functional or causal relationship between the tuple, $X$, and the label variables, $(A\%X)^{\mathrm{FS}} \to (A\%V_{\mathrm{l}})^{\mathrm{FS}}$.
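The small negative value is floating-point noise; for instance, one might treat label entropies within a small tolerance as zero (an illustrative convention, not from the repositories):
# Treat label entropies within floating-point noise of zero as exactly zero.
abs(lent(aa, xx, vvl)) < 1e-9
# True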
This tuple has a volume of $|X^{\mathrm{C}}| = 3402$,
vol(uu,xx)
3402
but classifies the sample into only $|(A~\%~(X \cup V_{\mathrm{l}}))^{\mathrm{F}}| = |(A\%X)^{\mathrm{F}}| = 96$ effective states or slices,
rpln(aall(red(aa,xx|vvl)))
# ({(edible, edible), (habitat, grasses), (odor, almond), (population, numerous), (spore-print-color, black)}, 32 % 1)
# ({(edible, edible), (habitat, grasses), (odor, almond), (population, numerous), (spore-print-color, brown)}, 32 % 1)
# ({(edible, edible), (habitat, grasses), (odor, almond), (population, scattered), (spore-print-color, black)}, 44 % 1)
# ...
# ({(edible, poisonous), (habitat, woods), (odor, musty), (population, clustered), (spore-print-color, white)}, 36 % 1)
# ({(edible, poisonous), (habitat, woods), (odor, none), (population, several), (spore-print-color, white)}, 32 % 1)
# ({(edible, poisonous), (habitat, woods), (odor, spicy), (population, several), (spore-print-color, white)}, 192 % 1)
size(eff(red(aa,xx|vvl)))
# 96 % 1
rpln(ssplit(vvk,states(red(aa,xx|vvl))))
# ({(habitat, grasses), (odor, almond), (population, numerous), (spore-print-color, black)}, {(edible, edible)})
# ({(habitat, grasses), (odor, almond), (population, numerous), (spore-print-color, brown)}, {(edible, edible)})
# ({(habitat, grasses), (odor, almond), (population, scattered), (spore-print-color, black)}, {(edible, edible)})
# ...
# ({(habitat, woods), (odor, none), (population, solitary), (spore-print-color, chocolate)}, {(edible, edible)})
# ({(habitat, woods), (odor, none), (population, solitary), (spore-print-color, white)}, {(edible, edible)})
# ({(habitat, woods), (odor, spicy), (population, several), (spore-print-color, white)}, {(edible, poisonous)})
Let us consider whether a predictive tuple exists that excludes odor. Let $V_{\mathrm{k2}} = V_{\mathrm{k}} \setminus \{\mathrm{odor}\}$,
vvk2 = vvk - sset([odor])
The reduced sample excluding odor is $A_2 = A~\%~(V_{\mathrm{k2}} \cup V_{\mathrm{l}})$.
Repeat the conditional entropy minimisation, but with the reduced sample,
\[
\{(\mathrm{lent}(A_2,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A_2,\mathrm{L}}))\}
\]
def hrhrred(hr,vv):
    return setVarsHistoryRepasHistoryRepaReduced(vv,hr)
hh2 = hrhrred(hh,vvk2|vvl)
(kmax,omax,qmax) = (1, 5, 5)
rpln(buildcondrr(vvl,hh2,kmax,omax,qmax))
# (0.3593018375305004, {spore-print-color})
# (0.4034743011923396, {gill-color})
# (0.4720653823411449, {ring-type})
# (0.49514434957356646, {stalk-surface-above-ring})
# (0.504038208263087, {stalk-surface-below-ring})
(kmax,omax,qmax) = (4, 5, 5)
rpln(buildcondrr(vvl,hh2,kmax,omax,qmax))
# (0.0, {bruises, gill-size, spore-print-color, stalk-root})
# (0.012694197540795926, {population, spore-print-color, stalk-root, stalk-shape})
# (0.012695876064318767, {cap-surface, gill-size, spore-print-color, stalk-root})
# (0.013792239141938278, {cap-surface, spore-print-color, stalk-root, stalk-shape})
# (0.015268666719201462, {bruises, spore-print-color, stalk-root, stalk-shape})
In fact, there is another tetra-variate tuple that is causal or predictive of edibility. Let this tuple be $Y$, \[ Y~=~\{\mathrm{bruises},~\text{gill-size},~\text{spore-print-color},~\text{stalk-root}\} \]
yy = sset([VarStr(s) for s in ["bruises","gill-size","spore-print-color","stalk-root"]])
len(yy)
4
lent(aa,yy,vvl)
0.0
That is, there is a functional or causal relationship between the tuple and the label variables, $(A\%Y)^{\mathrm{FS}} \to (A\%V_{\mathrm{l}})^{\mathrm{FS}}$.
This tuple has a smaller volume of $|Y^{\mathrm{C}}| = 180$,
vol(uu,yy)
180
and classifies the sample into only $|(A\%Y)^{\mathrm{F}}| = 33$ effective states or slices,
rpln(aall(red(aa,yy|vvl)))
# ({(bruises, bruises), (edible, edible), (gill-size, broad), (spore-print-color, black), (stalk-root, bulbous)}, 864 % 1)
# ({(bruises, bruises), (edible, edible), (gill-size, broad), (spore-print-color, black), (stalk-root, club)}, 256 % 1)
# ({(bruises, bruises), (edible, edible), (gill-size, broad), (spore-print-color, black), (stalk-root, rooted)}, 96 % 1)
# ...
# ({(bruises, no), (edible, poisonous), (gill-size, narrow), (spore-print-color, brown), (stalk-root, bulbous)}, 96 % 1)
# ({(bruises, no), (edible, poisonous), (gill-size, narrow), (spore-print-color, white), (stalk-root, club)}, 8 % 1)
# ({(bruises, no), (edible, poisonous), (gill-size, narrow), (spore-print-color, white), (stalk-root, missing)}, 1760 % 1)
size(eff(red(aa,yy|vvl)))
# 33 % 1
rpln(ssplit(vvk,states(red(aa,yy|vvl))))
# ({(bruises, bruises), (gill-size, broad), (spore-print-color, black), (stalk-root, bulbous)}, {(edible, edible)})
# ({(bruises, bruises), (gill-size, broad), (spore-print-color, black), (stalk-root, club)}, {(edible, edible)})
# ({(bruises, bruises), (gill-size, broad), (spore-print-color, black), (stalk-root, rooted)}, {(edible, edible)})
# ...
# ({(bruises, no), (gill-size, narrow), (spore-print-color, white), (stalk-root, bulbous)}, {(edible, edible)})
# ({(bruises, no), (gill-size, narrow), (spore-print-color, white), (stalk-root, club)}, {(edible, poisonous)})
# ({(bruises, no), (gill-size, narrow), (spore-print-color, white), (stalk-root, missing)}, {(edible, poisonous)})
The tuples only share one variable, $X \cap Y$,
xx & yy
# {spore-print-color}
We can continue on by excluding spore-print-color,
vvk3 = vvk2 - sset([VarStr("spore-print-color")])
hh3 = hrhrred(hh,vvk3|vvl)
(kmax,omax,qmax) = (1, 5, 5)
rpln(buildcondrr(vvl,hh3,kmax,omax,qmax))
# (0.4034743011923396, {gill-color})
# (0.4720653823411449, {ring-type})
# (0.49514434957356646, {stalk-surface-above-ring})
# (0.504038208263087, {stalk-surface-below-ring})
# (0.5165490296209985, {stalk-color-above-ring})
(kmax,omax,qmax) = (5, 5, 20)
rpln(buildcondrr(vvl,hh3,kmax,omax,qmax))
# (0.0, {bruises, cap-color, gill-color, habitat, stalk-root})
# (0.0, {bruises, cap-color, habitat, ring-type, stalk-root})
# (0.0, {bruises, gill-color, habitat, ring-type, stalk-root})
# (0.0, {bruises, gill-color, habitat, stalk-color-below-ring, stalk-root})
# (0.0, {bruises, gill-color, habitat, stalk-root, stalk-surface-above-ring})
# (0.0031735493851989816, {bruises, gill-color, habitat, stalk-root})
# (0.004134518654214325, {bruises, habitat, ring-type, stalk-root})
# (0.022964087272247635, {habitat, ring-type, stalk-root, stalk-shape})
# ...
Now the tuple dimension is 5, but there are several variations.
We can see that there are multiple subsets of the query variables, $V_{\mathrm{k}}$, not necessarily including either odor or spore-print-color, that can predict the label variables or edibility, $V_{\mathrm{l}} = \{\mathrm{edible}\}$. For example, the tuple $X \subset V_{\mathrm{k}}$ or the tuple $Y \subset V_{\mathrm{k}}$.
Predicting odor without modelling
Now consider if there are tuples that can predict variable odor rather than variable edible. Let $V_{\mathrm{l2}} = \{\mathrm{odor}\}$,
vvl2 = sset([odor])
The entropy is $\mathrm{entropy}(A\%V_{\mathrm{l2}})$,
ent(red(aa,vvl2))
1.6076955835943616
The label entropy is $\mathrm{lent}(A,V_{\mathrm{k2}},V_{\mathrm{l2}})$,
lent(aa,vvk2,vvl2)
0.3019349802152096
The label entropy is non-zero, so odor cannot be perfectly predicted even with all of the query variables.
If we add edible to the query variables, the odor is still ambiguous, $\mathrm{lent}(A,~V \setminus V_{\mathrm{l2}},~V_{\mathrm{l2}}) > 0$,
lent(aa,vv-vvl2,vvl2)
0.3019349802152096
A tetra-variate tuple obtains most of what causality there is,
hh4 = hrhrred(hh,vvk2|vvl2)
(kmax,omax,qmax) = (1, 5, 5)
rpln(buildcondrr(vvl2,hh4,kmax,omax,qmax))
# (0.9477982525833593, {spore-print-color})
# (1.0064100040408621, {gill-color})
# (1.0284970138083624, {stalk-root})
# (1.083608451873597, {ring-type})
# (1.2126735369913884, {cap-color})
(kmax,omax,qmax) = (4, 5, 5)
rpln(buildcondrr(vvl2,hh4,kmax,omax,qmax))
# (0.3146291777557688, {cap-color, spore-print-color, stalk-root, stalk-shape})
# (0.31607649976978536, {cap-surface, spore-print-color, stalk-root, stalk-shape})
# (0.32162256927776056, {bruises, gill-size, spore-print-color, stalk-root})
# (0.323061152914073, {cap-surface, gill-size, spore-print-color, stalk-root})
# (0.3233325116314498, {gill-size, habitat, spore-print-color, stalk-shape})
(kmax,omax,qmax) = (6, 5, 5)
rpln(buildcondrr(vvl2,hh4,kmax,omax,qmax))
# (0.3019349802149671, {cap-surface, gill-color, gill-size, spore-print-color, stalk-root, stalk-shape})
# (0.301934980214968, {cap-color, cap-surface, spore-print-color, stalk-color-above-ring, stalk-root, stalk-shape})
# (0.3019349802149689, {cap-color, cap-surface, spore-print-color, stalk-color-below-ring, stalk-root, stalk-shape})
# (0.3019349802149698, {bruises, cap-color, cap-surface, spore-print-color, stalk-root, stalk-shape})
# (0.3019349802149698, {cap-color, cap-surface, gill-size, spore-print-color, stalk-root, stalk-shape})
Instead of measuring the predictability of odor by label entropy, we can measure the label modal size, \[ \begin{eqnarray} \sum_{R \in (A\%K)^{\mathrm{FS}}} \mathrm{maxr}(A * \{R\}^{\mathrm{U}}~\%~(V \setminus K)) \end{eqnarray} \] More generally, define \[ \begin{eqnarray} \mathrm{lmodal}(A,W,V_{\mathrm{l}})~:=~\sum_{R \in (A\%W)^{\mathrm{FS}}} \mathrm{maxr}(A~\%~(W \cup V_{\mathrm{l}}) * \{R\}^{\mathrm{U}}~\%~V_{\mathrm{l}}) \end{eqnarray} \] The tuple $Z$ is defined \[ Z~=~\{\text{cap-color},~\text{spore-print-color},~\text{stalk-root},~\text{stalk-shape}\} \]
zz = sset([VarStr(s) for s in ["cap-color","spore-print-color","stalk-root","stalk-shape"]])
The label entropy fraction of tuple $Z$ is $1 - \mathrm{lent}(A,Z,V_{\mathrm{l2}})/\mathrm{entropy}(A\%V_{\mathrm{l2}})$,
lent(aa,zz,vvl2)
0.3146291777557706
ent(red(aa,vvl2))
1.6076955835943616
1.0 - 0.3146291777557706/1.6076955835943616
0.8042980394009996
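This fraction recurs below, so it can be wrapped in a small convenience function; a sketch, not part of the repositories. Note that some later fractions in this section are taken against $\ln |V_{\mathrm{l}}^{\mathrm{C}}|$ rather than the label entropy.
def lentfrac(aa, ww, vvl):
    # Fraction of the label entropy removed by conditioning on the tuple ww.
    return 1.0 - lent(aa, ww, vvl) / ent(red(aa, vvl))
lentfrac(aa, zz, vvl2)
0.8042980394009996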
To calculate the label modal size, setVarsHistogramsSliceModal is defined in module Alignment,
setVarsHistogramsSliceModal :: Set.Set Variable -> Histogram -> Rational
The label modal size fraction of tuple $Z$ is $\mathrm{lmodal}(A,Z,V_{\mathrm{l2}})/\mathrm{size}(A\%V_{\mathrm{l2}})$,
def lmodal(aa,ww,vvl):
    return setVarsHistogramsSliceModal(ww,red(aa,ww|vvl))
lmodal(aa,zz,vvl2)
# 6524 % 1
size(red(aa,vvl2))
# 8124 % 1
6524.0/8124.0
0.8030526834071886
Both measures can be interpreted as implying an odor prediction accuracy of around 80%.
We can analyse the components containing ambiguous values for variable odor. Define
\[
\begin{eqnarray}
\mathrm{lslices}(A,W,V_{\mathrm{l}})~:=~\{(R,~A~\%~(W \cup V_{\mathrm{l}}) * \{R\}^{\mathrm{U}}) : R \in (A\%W)^{\mathrm{FS}}\}
\end{eqnarray}
\]
def lslicesll(aa,ww,vvl):
    return list(setVarsHistogramsSlices(ww,red(aa,ww|vvl)).items())
Then \[ \begin{eqnarray} \{C' : (R,C) \in \mathrm{lslices}(A,Z,V_{\mathrm{l2}}),~C' = C\%V_{\mathrm{l2}},~\mathrm{size}(C'^{\mathrm{F}}) > 1\} \end{eqnarray} \]
rpln([cc1 for (rr,cc) in lslicesll(aa,zz,vvl2) for cc1 in [red(cc,vvl2)] if size(eff(cc1)) > 1])
# {({(odor, none)}, 24 % 1), ({(odor, pungent)}, 64 % 1)}
# {({(odor, almond)}, 24 % 1), ({(odor, anise)}, 24 % 1)}
# {({(odor, none)}, 24 % 1), ({(odor, pungent)}, 64 % 1)}
# {({(odor, almond)}, 24 % 1), ({(odor, anise)}, 24 % 1)}
# {({(odor, fishy)}, 288 % 1), ({(odor, foul)}, 288 % 1), ({(odor, spicy)}, 288 % 1)}
# {({(odor, fishy)}, 288 % 1), ({(odor, foul)}, 288 % 1), ({(odor, spicy)}, 288 % 1)}
# {({(odor, almond)}, 64 % 1), ({(odor, anise)}, 64 % 1)}
# {({(odor, almond)}, 12 % 1), ({(odor, anise)}, 12 % 1)}
# {({(odor, almond)}, 64 % 1), ({(odor, anise)}, 64 % 1)}
# {({(odor, almond)}, 12 % 1), ({(odor, anise)}, 12 % 1)}
# {({(odor, almond)}, 64 % 1), ({(odor, anise)}, 64 % 1)}
# {({(odor, almond)}, 24 % 1), ({(odor, anise)}, 24 % 1)}
# {({(odor, almond)}, 12 % 1), ({(odor, anise)}, 12 % 1)}
# {({(odor, almond)}, 64 % 1), ({(odor, anise)}, 64 % 1)}
# {({(odor, almond)}, 24 % 1), ({(odor, anise)}, 24 % 1)}
# {({(odor, almond)}, 12 % 1), ({(odor, anise)}, 12 % 1)}
We can see that in some of the components the size of each value is duplicated. For example, in the last case the values almond and anise both have a component size of 12,
rr = [rr for (rr,cc) in lslicesll(aa,zz,vvl2) for cc1 in [red(cc,vvl2)] if size(eff(cc1)) > 1][-1]
rr
# {(cap-color, yellow), (spore-print-color, purple), (stalk-root, bulbous), (stalk-shape, tapering)}
Then $A * \{R\}^{\mathrm{U}}~\%~(Z \cup V_{\mathrm{l2}})$ is
rpln(aall(red(mul(aa,single(rr,1)),zz|vvl2)))
# ({(cap-color, yellow), (odor, almond), (spore-print-color, purple), (stalk-root, bulbous), (stalk-shape, tapering)}, 12 % 1)
# ({(cap-color, yellow), (odor, anise), (spore-print-color, purple), (stalk-root, bulbous), (stalk-shape, tapering)}, 12 % 1)
size(eff(mul(aa,single(rr,1))))
# 24 % 1
This duplication probably arises from the method used in the construction of the hypothetical mushroom samples.
As mentioned above, edibility is also somewhat predictive of odor,
edible = VarStr("edible")
The label entropy fraction is $1 - \mathrm{lent}(A,\{\mathrm{edible}\},\{\mathrm{odor}\})/\mathrm{entropy}(A\%\{\mathrm{odor}\})$,
lent(aa,sset([edible]),sset([odor]))
0.9796522676447261
ent(red(aa,sset([odor])))
1.6076955835943616
1.0 - 0.9796522676447261/1.6076955835943616
0.39064815650330065
The label modal size fraction is $\mathrm{lmodal}(A,\{\mathrm{edible}\},\{\mathrm{odor}\})/\mathrm{size}(A\%\{\mathrm{odor}\})$,
lmodal(aa,sset([edible]),sset([odor]))
# 5568 % 1
size(red(aa,sset([odor])))
# 8124 % 1
5568.0/8124.0
0.6853766617429837
but the odor prediction accuracy is lower: around 40% by the label entropy fraction and around 70% by the label modal size fraction.
Manual modelling of edibility
Having seen that edibility is predicted by various subsets of the substrate, $V$, consider if a model can do this in a more concise way.
There are some rules for poisonous mushrooms from most general to most specific:
P_1) odor=NOT(almond.OR.anise.OR.none)
120 poisonous cases missed, 98.52% accuracy
P_2) spore-print-color=green
48 cases missed, 99.41% accuracy
P_3) odor=none.AND.stalk-surface-below-ring=scaly.AND.
(stalk-color-above-ring=NOT.brown)
8 cases missed, 99.90% accuracy
P_4) habitat=leaves.AND.cap-color=white
100% accuracy
Rule P_4) may also be
P_4') population=clustered.AND.cap-color=white
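As a cross-check, the rules can also be written directly as predicates over an event; a minimal sketch, assuming an event is represented as a plain dict from variable names to value names. This is for illustration only and is not how the fud of transforms below is constructed.
def p1(e):
    # P_1: poisonous if odor is not almond, anise or none.
    return e["odor"] not in ("almond", "anise", "none")
def p2(e):
    # P_2: poisonous if spore-print-color is green.
    return e["spore-print-color"] == "green"
def p3(e):
    # P_3: poisonous if no odor, scaly stalk surface below the ring,
    # and stalk colour above the ring is not brown.
    return (e["odor"] == "none"
            and e["stalk-surface-below-ring"] == "scaly"
            and e["stalk-color-above-ring"] != "brown")
def p4(e):
    # P_4: poisonous if habitat is leaves and cap colour is white.
    return e["habitat"] == "leaves" and e["cap-color"] == "white"
def poisonous(e):
    # Apply the rules together, most general first.
    return p1(e) or p2(e) or p3(e) or p4(e)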
We have created a fud of transforms for each of these rules in MUSH_model_manual.json (see Manual model construction).
First, load the model $G_{\mathrm{m}}$,
ggm = persistentsFud(json.load(open('./MUSH_model_manual.json', 'r')))
uu1 = uunion(uu,fsys(ggm))
The model has 4 derived variables, $W_{\mathrm{m}} = \mathrm{der}(G_{\mathrm{m}})$,
fder(ggm)
# {p1, p2, p3, p4}
and a derived volume, $|W_{\mathrm{m}}^{\mathrm{C}}|$, of 16,
vol(uu1,fder(ggm))
16
The model has 6 underlying variables, $V_{\mathrm{m}} = \mathrm{und}(G_{\mathrm{m}})$,
fund(ggm)
# {cap-color, habitat, odor, spore-print-color, stalk-color-above-ring, stalk-surface-below-ring}
The underlying volume, $|V_{\mathrm{m}}^{\mathrm{C}}|$, is
vol(uu1,fund(ggm))
204120
Let the derived be $A' = A * G_{\mathrm{m}}^{\mathrm{T}}$. The derived alignment, $\mathrm{algn}(A')$, is
aa1 = red(fmul(aa,ggm),fder(ggm))
algn(aa1)
69.86642836898682
The derived variables are only weakly aligned. Furthermore, they are overlapped, $\mathrm{overlap}(G_{\mathrm{m}}^{\mathrm{T}})$,
fudsOverlap(ggm)
# True
so the content derived alignment, $\mathrm{algn}(A * G_{\mathrm{m}}^{\mathrm{T}}) - \mathrm{algn}(A^{\mathrm{X}} * G_{\mathrm{m}}^{\mathrm{T}})$, would be lower still.
The derived entropy, $\mathrm{entropy}(A')$, is
ent(aa1)
0.7711287134449115
This may be compared to the logarithm of the derived volume, $\ln |W_{\mathrm{m}}^{\mathrm{C}}|$,
w = vol(uu1,fder(ggm))
log(w)
2.772588722239781
Let the cartesian derived be $V_{\mathrm{m}}^{\mathrm{C}'} = V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}}$. The cartesian derived entropy, $\mathrm{entropy}(V_{\mathrm{m}}^{\mathrm{C}'})$, depends on the underlying cartesian, $V_{\mathrm{m}}^{\mathrm{C}}$, but the underlying volume, $|V_{\mathrm{m}}^{\mathrm{C}}|$, is quite large, so we calculate the cartesian derived entropy by constructing a HistoryRepa,
hvvg = aahr(uu1,unit(cart(uu1,fund(ggm))))
hrsize(hvvg)
204120
vvc1 = hhaa(hrhh(uu1,hrhrred(hrfmul(uu1,ggm,hvvg),fder(ggm))))
ent(vvc1)
1.1482395879784482
The cartesian derived entropy is greater than the derived entropy, $\mathrm{entropy}(V_{\mathrm{m}}^{\mathrm{C}'}) > \mathrm{entropy}(A')$.
The size-volume scaled component size cardinality sum relative entropy is the size-volume scaled component size cardinality sum cross entropy minus the size-volume scaled component size cardinality sum entropy (Transform entropy), \[ \begin{eqnarray} (z+v_{\mathrm{m}}) \times \mathrm{entropy}(A * G_{\mathrm{m}}^{\mathrm{T}} + V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}}) - z \times \mathrm{entropy}(A * G_{\mathrm{m}}^{\mathrm{T}}) - v_{\mathrm{m}} \times \mathrm{entropy}(V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}}) \end{eqnarray} \]
z = size(aa1)
v = size(vvc1)
(z+v) * ent(add(aa1,vvc1)) - z * ent(aa1) - v * ent(vvc1)
1663.472301909118
(z+v) * log(w)
588465.3207630601
Define the abbreviation rent for the size-volume scaled component size cardinality sum relative entropy,
\[
\begin{eqnarray}
\mathrm{rent}(A,B)~:=~(z_A+z_B) \times \mathrm{entropy}(A + B) - z_A \times \mathrm{entropy}(A) - z_B \times \mathrm{entropy}(B)
\end{eqnarray}
\]
def rent(aa,bb):
    a = size(aa)
    b = size(bb)
    return (a+b) * ent(add(aa,bb)) - a * ent(aa) - b * ent(bb)
Then the relative entropy is $\mathrm{rent}(A',V_{\mathrm{m}}^{\mathrm{C}'})$,
rent(aa1,vvc1)
1663.472301909118
Like the derived alignment, the relative entropy is quite low. These statistics are interesting because both give us a measure of the likelihood of the model. This is especially the case for the size-volume scaled component size cardinality sum relative entropy, $\mathrm{rent}(A * G_{\mathrm{m}}^{\mathrm{T}},V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}})$, which is discussed in the ‘Induction with model’ section of the Overview of the paper.
In the discussion of induced models below the underlying volumes are impracticably large so let us approximate the relative entropy by using a volume sized shuffle. We constructed a shuffle, $A_{\mathrm{r}}$, earlier when discussing tuples in the substrate,
aar = hhaa(hrhh(uu,hhr))
size(aar)
# 8124 % 1
We will calculate the size-volume-sized-shuffle relative entropy, \[ \begin{eqnarray} (z+v_{\mathrm{m}}) \times \mathrm{ent}(A * G_{\mathrm{m}}^{\mathrm{T}} + Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}}) - z \times \mathrm{ent}(A * G_{\mathrm{m}}^{\mathrm{T}}) - v_{\mathrm{m}} \times \mathrm{ent}(Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}}) \end{eqnarray} \] where $v_{\mathrm{m}} = |V_{\mathrm{m}}^{\mathrm{C}}|$ and $Z_{\mathrm{m}} = \mathrm{scalar}(v_{\mathrm{m}})$.
Let the shuffle derived be $A_{\mathrm{r}}' = A_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}}$,
aar1 = red(fmul(aar,ggm),fder(ggm))
The shuffle derived alignment, $\mathrm{algn}(A_{\mathrm{r}}')$, is expected to be low,
algn(aar1)
66.88979584247136
The volume sized shuffle derived entropy, $\mathrm{entropy}(Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}')$, is
ent(resize(v,aar1))
0.8671533652420039
and the size-volume-sized-shuffle relative entropy, $\mathrm{rent}(A',~Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}')$, is
rent(aa1,resize(v,aar1))
146.8218190143234
We can see that the size-volume-sized-shuffle relative entropy, $\mathrm{rent}(A',~Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}')$, is lower than the size-volume relative entropy, $\mathrm{rent}(A',V_{\mathrm{m}}^{\mathrm{C}'})$. This is because the volume sized shuffle, $Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}$, is less uniform than the cartesian, $V_{\mathrm{m}}^{\mathrm{C}}$, and tends to synchronise with the sample, $A$. However, the size-volume-sized-shuffle relative entropy provides us with a measure of the likelihood of the model.
Now apply the model to the sample. Let $B = A * \mathrm{his}(G_{\mathrm{m}}^{\mathrm{T}})$,
bb = fmul(aa,ggm)
rpln(aall(red(bb,fder(ggm)|vvl)))
# ({(edible, edible), (p1, 0), (p2, 0), (p3, 0), (p4, 0)}, 4208 % 1)
# ({(edible, poisonous), (p1, 0), (p2, 0), (p3, 0), (p4, 1)}, 8 % 1)
# ({(edible, poisonous), (p1, 0), (p2, 0), (p3, 1), (p4, 0)}, 40 % 1)
# ({(edible, poisonous), (p1, 0), (p2, 1), (p3, 0), (p4, 0)}, 72 % 1)
# ({(edible, poisonous), (p1, 1), (p2, 0), (p3, 0), (p4, 0)}, 3796 % 1)
size(eff(red(bb,fder(ggm)|vvl)))
# 5 % 1
rpln(ssplit(fder(ggm),states(red(bb,fder(ggm)|vvl))))
# ({(p1, 0), (p2, 0), (p3, 0), (p4, 0)}, {(edible, edible)})
# ({(p1, 0), (p2, 0), (p3, 0), (p4, 1)}, {(edible, poisonous)})
# ({(p1, 0), (p2, 0), (p3, 1), (p4, 0)}, {(edible, poisonous)})
# ({(p1, 0), (p2, 1), (p3, 0), (p4, 0)}, {(edible, poisonous)})
# ({(p1, 1), (p2, 0), (p3, 0), (p4, 0)}, {(edible, poisonous)})
We can see that together the rules P1-4 are functionally or causally related to edibility, $(B\%W_{\mathrm{m}})^{\mathrm{FS}} \to (B\%V_{\mathrm{l}})^{\mathrm{FS}}$. In addition, there are only 5 effective states of 16 derived states, so the model, $G_{\mathrm{m}}$, might be said to be more concise than the tuples $X$ and $Y$ of the non-modelled case above.
The model entropy is similar to the slice entropy of the non-modelled case. The model’s label entropy or query conditional entropy is zero, $\mathrm{lent}(B,W_{\mathrm{m}},V_{\mathrm{l}}) = 0$.
[p1,p2,p3,p4] = map(VarStr,["p1","p2","p3","p4"])
lent(bb,sset([p1,p2,p3,p4]),vvl)
0.0
lent(bb,sset([p1]),vvl)
0.0675240166808252
lent(bb,sset([p2]),vvl)
0.6859909862350153
lent(bb,sset([p3]),vvl)
0.6888949375684059
lent(bb,sset([p4]),vvl)
0.6917819609609287
lent(bb,sset([p1,p2]),vvl)
0.03237355156509225
lent(bb,sset([p1,p2,p3]),vvl)
0.007155343359194988
Rule P1 is far more predictive of edibility than the other rules, having a label entropy, $\mathrm{lent}(B,\{\mathrm{p}_1\},V_{\mathrm{l}})$, of only 0.0675240166808252, whereas the other rules are close to the maximum label entropy. The label entropy fraction is $1 - \mathrm{lent}(B,\{\mathrm{p}_1\},V_{\mathrm{l}})/\ln |V_{\mathrm{l}}^{\mathrm{C}}|$,
vol(uu,vvl)
2
log(2)
0.6931471805599453
1.0 - 0.0675240166808252/0.6931471805599453
0.9025834359936699
The label modal size fraction is $\mathrm{lmodal}(B,\{\mathrm{p}_1\},V_{\mathrm{l}})/\mathrm{size}(B\%V_{\mathrm{l}})$,
lmodal(bb,sset([p1]),vvl)
# 8004 % 1
size(red(bb,vvl))
# 8124 % 1
8004.0/8124.0
0.9852289512555391
As noted above, odor is highly predictive of edibility,
lent(aa,sset([odor]),vvl)
0.06445777995546464
lmodal(aa,sset([odor]),vvl)
# 8004 % 1
rpln(aall(red(aa,sset([odor])|vvl)))
# ({(edible, edible), (odor, almond)}, 400 % 1)
# ({(edible, edible), (odor, anise)}, 400 % 1)
# ({(edible, edible), (odor, none)}, 3408 % 1)
# ({(edible, poisonous), (odor, creosote)}, 192 % 1)
# ({(edible, poisonous), (odor, fishy)}, 576 % 1)
# ({(edible, poisonous), (odor, foul)}, 2160 % 1)
# ({(edible, poisonous), (odor, musty)}, 36 % 1)
# ({(edible, poisonous), (odor, none)}, 120 % 1)
# ({(edible, poisonous), (odor, pungent)}, 256 % 1)
# ({(edible, poisonous), (odor, spicy)}, 576 % 1)
Variable p1 depends only on odor. Let $T_1 \in G_{\mathrm{m}}$ be such that $\mathrm{der}(T_1) = \{\mathrm{p}_1\}$. Then $\mathrm{und}(T_1) = \{\mathrm{odor}\}$,
tt1 = ffqq(fdep(ggm,sset([p1])))[0]
rpln(aall(red(mul(aa,ttaa(tt1)),der(tt1)|vvl)))
# ({(edible, edible), (p1, 0)}, 4208 % 1)
# ({(edible, poisonous), (p1, 0)}, 120 % 1)
# ({(edible, poisonous), (p1, 1)}, 3796 % 1)
und(tt1)
# {odor}
rpln(states(ttaa(tt1)))
# {(odor, almond), (p1, 0)}
# {(odor, anise), (p1, 0)}
# {(odor, creosote), (p1, 1)}
# {(odor, fishy), (p1, 1)}
# {(odor, foul), (p1, 1)}
# {(odor, musty), (p1, 1)}
# {(odor, none), (p1, 0)}
# {(odor, pungent), (p1, 1)}
# {(odor, spicy), (p1, 1)}
rpln(aall(red(mul(aa,ttaa(tt1)),tvars(tt1)|vvl)))
# ({(edible, edible), (odor, almond), (p1, 0)}, 400 % 1)
# ({(edible, edible), (odor, anise), (p1, 0)}, 400 % 1)
# ({(edible, edible), (odor, none), (p1, 0)}, 3408 % 1)
# ({(edible, poisonous), (odor, creosote), (p1, 1)}, 192 % 1)
# ({(edible, poisonous), (odor, fishy), (p1, 1)}, 576 % 1)
# ({(edible, poisonous), (odor, foul), (p1, 1)}, 2160 % 1)
# ({(edible, poisonous), (odor, musty), (p1, 1)}, 36 % 1)
# ({(edible, poisonous), (odor, none), (p1, 0)}, 120 % 1)
# ({(edible, poisonous), (odor, pungent), (p1, 1)}, 256 % 1)
# ({(edible, poisonous), (odor, spicy), (p1, 1)}, 576 % 1)
We can also consider how predictive the model is of odor. The label entropy fraction is $1 - \mathrm{lent}(B,W_{\mathrm{m}},V_{\mathrm{l2}})/\ln |V_{\mathrm{l2}}^{\mathrm{C}}|$,
vvl2
# {odor}
vol(uu,vvl2)
9
log(9)
2.1972245773362196
lent(bb,sset([p1,p2,p3,p4]),vvl2)
0.9136278428753103
1.0 - 0.9136278428753103/2.1972245773362196
0.584190049438216
The label modal size fraction is $\mathrm{lmodal}(B,W_{\mathrm{m}},V_{\mathrm{l2}})/\mathrm{size}(B\%V_{\mathrm{l2}})$,
lmodal(bb,sset([p1,p2,p3,p4]),vvl2)
# 5688 % 1
5688.0/8124.0
0.7001477104874446
The model, $G_{\mathrm{m}}$, is only 60-70% accurate with respect to odor, even though odor is in the underlying variables, $\mathrm{odor} \in V_{\mathrm{m}}$.
Induced modelling of edibility
Having considered a manually defined model of edibility, $G_{\mathrm{m}}$, now consider an unsupervised induced model $D$ on the query variables, $V_{\mathrm{k}}$, which exclude edibility. By unsupervised we mean an induced model that is optimised not to minimise the label entropy, nor to maximise the label modal size, but rather to maximise the summed alignment valency-density.
Then we shall analyse this model, $D$, to find a smaller submodel that predicts the label variables, $V_{\mathrm{l}}$, or edibility. That is, we shall search in the decomposition fud for a submodel that optimises conditional entropy.
Here the induced model is created by the limited-nodes highest-layer excluded-self maximum-roll-by-derived-dimension fud decomper, $(\cdot,D) = I_{P,U,\mathrm{D,F,mm,xs,d,f}}((V_{\mathrm{k}},A))$.
There are some examples of model induction in the MUSH repository.
First consider the fud decomposition MUSH_model17.json (see Model induction),
df = persistentsDecompFud_u(json.load(open('./MUSH_model17.json', 'r')))
uu1 = uunion(uu,fsys(dfff(df)))
len(uvars(uu1))
132
Let us examine the tree of the fud decomposition, \[ \begin{eqnarray} \{\{(S,~\mathrm{und}(F),~\mathrm{der}(F)) : (S,F) \in L\} : L \in \mathrm{paths}(D)\} \end{eqnarray} \]
rpln(treesPaths(funcsTreesMap(lambda xx:(xx[0],fund(xx[1]),fder(xx[1])),dfzz(df))))
# ...
The fud identifier is a VarInt that is set by the inducer as part of the naming convention of the derived variables,
\[
\begin{eqnarray}
\mathrm{fid}(F)~:=~f : ((f,\cdot),\cdot) \in \mathrm{der}(F)
\end{eqnarray}
\]
The decomposition tree contains 7 nodes with fud identifiers as follows,
\[
\begin{eqnarray}
\{\{\mathrm{fid}(F) : (\cdot,F) \in L\} : L \in \mathrm{paths}(D)\}
\end{eqnarray}
\]
def fid(ff):
    return variablesVariableFud(fder(ff)[0])
rpln(treesSubPaths(funcsTreesMap(lambda xx:fid(xx[1]),dfzz(df))))
# [1]
# [1, 2]
# [1, 2, 7]
# [1, 2, 9]
# [1, 2, 10]
# [1, 3]
# [1, 4]
Now consider the summed alignment and the summed alignment valency-density, $\mathrm{summation}(U_1,D,A)$,
(wmax,lmax,xmax,omax,bmax,mmax,umax,pmax,fmax,mult,seed) = ((9*9*10), 8, (9*9*10), 20, (20*3), 3, (9*9*10), 1, 10, 7, 5)
summation(mult,seed,uu1,df,hh)
(85780.45912794449, 37161.48267081803)
\[ \begin{eqnarray} \{(\mathrm{fid}(F),~z_C,~a) : ((S,F),(z_C,(a,a_{\mathrm{d}}))) \in \mathrm{nodes}(\mathrm{sumtree}(U_1,D,A))\} \end{eqnarray} \]
sumtree = systemsDecompFudsHistoryRepasTreeAlignmentContentShuffleSummation_u
rpln([(fid(ff),zc,a) for ((ss,ff),(zc,(a,ad))) in sumtree(mult,seed,uu1,df,hh).items()])
# (1, 8124, 39181.46354001778)
# (2, 3276, 14654.951059674358)
# (4, 1824, 2802.8249523523555)
# (3, 3024, 15354.177038855069)
# (9, 972, 3435.8013895566273)
# (10, 832, 3314.590501097839)
# (7, 1472, 7036.65064639045)
We can see that the root fud has the highest slice size and shuffle content derived alignment, while the leaf fuds have small slice sizes and shuffle content derived alignments.
The bare model is a fud decomposition. As noted in Conversion to fud, the tree of a fud decomposition is sometimes unwieldy, so consider the fud decomposition fud, $F = D^{\mathrm{F}} \in \mathcal{F}$ (see Practicable fud decomposition fud),
ff = systemsDecompFudsNullablePracticable(uu1,df,1)
uu2 = uunion(uu,fsys(ff))
len(uvars(uu2))
197
The model, $F$, has 56 derived variables, $W_F = \mathrm{der}(F)$, and a large derived volume, $|W_F^{\mathrm{C}}|$,
len(fder(ff))
56
fder(ff)
# {<<1,n>,1>, <<1,n>,2>, ... <<1,n>,7>, <<2,n>,1>, <<2,n>,2>, ... <<9,n>,8>, <<10,n>,1>, <<10,n>,2>, ... <<10,n>,7>}
vol(uu2,fder(ff))
2065214267056164664258854912
The model has 20 underlying variables, $V_F = \mathrm{und}(F)$,
len(fund(ff))
20
vv - fund(ff)
# {edible, veil-color, veil-type}
That is, the model depends on all of the substrate except for the label variable, edible, variable veil-color and mono-valent veil-type. This is consistent with the observation above that none of the substrate variables, except for veil-type, is independent of the others, and that veil-color is only weakly dependent.
The underlying volume, $|V_F^{\mathrm{C}}|$, is
vol(uu,fund(ff))
30474952704000
The derived entropy, $\mathrm{entropy}(A * F)$, is
aa1 = hhaa(hrhh(uu2,hrhrred(hrfmul(uu2,ff,hh),fder(ff))))
ent(aa1)
2.2056420385272157
This may be compared to the logarithm of the derived volume, $\ln |W_F^{\mathrm{C}}|$,
w = vol(uu2,fder(ff))
log(w)
62.89503149315568
So the derived entropy is quite low. This is because there are only 15 effective derived states,
size(eff(aa1))
# 15 % 1
rpln([c for (ss,c) in aall(aa1)])
# 188 % 1
# 48 % 1
# 8 % 1
# 56 % 1
# 384 % 1
# 288 % 1
# 288 % 1
# 288 % 1
# 256 % 1
# 704 % 1
# 768 % 1
# 1728 % 1
# 96 % 1
# 1296 % 1
# 1728 % 1
The cartesian derived entropy, $\mathrm{entropy}(V_F^{\mathrm{C}} * F)$, depends on the underlying cartesian, $V_F^{\mathrm{C}}$. The underlying volume is too large to compute, so we are unable to calculate the cartesian derived entropy or the component size cardinality sum relative entropy. Instead we can compute an approximation to the size-volume scaled component size independent sum relative entropy using a volume sized shuffle, \[ \begin{eqnarray} (z+v_F) \times \mathrm{ent}(A * F^{\mathrm{T}} + Z_F * \hat{A}_{\mathrm{r}} * F^{\mathrm{T}}) - z \times \mathrm{ent}(A * F^{\mathrm{T}}) - v_F \times \mathrm{ent}(Z_F * \hat{A}_{\mathrm{r}} * F^{\mathrm{T}}) \end{eqnarray} \] where $v_F = |V_F^{\mathrm{C}}|$ and $Z_F = \mathrm{scalar}(v_F)$.
aar = hhaa(hrhh(uu,hhr))
size(aar)
# 8124 % 1
def vsize(uu,xx,aa):
    return resize(vol(uu,xx),aa)
aar1 = hhaa(hrhh(uu2,hrhrred(hrfmul(uu2,ff,hhr),fder(ff))))
ent(vsize(uu,fund(ff),aar1))
4.693386848554536
rent(aa1,vsize(uu,fund(ff),aar1))
112112.4375
We can see that by this measure the relative entropy of the induced model, $\mathrm{rent}(A * F^{\mathrm{T}},~Z_F * \hat{A}_{\mathrm{r}} * F^{\mathrm{T}})$, is much higher than the relative entropy of the manual model, $\mathrm{rent}(A * G_{\mathrm{m}}^{\mathrm{T}},~Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}})$. This is consistent with the derived alignment, $\mathrm{algn}(A * F)$, implied by the summed alignment, $\mathrm{summation}(U_1,D,A)$, which is also higher for the induced model.
Now apply the model to the sample. Let $B = A * \prod\mathrm{his}(F)$,
hhb = hrfmul(uu2,ff,hh)
rpln(aall(hhaa(hrhh(uu2,hrhrred(hhb,fder(ff)|vvl)))))
# ({(edible, edible), (<<1,n>,1>, 0), ... (<<10,n>,7>, null)}, 72 % 1)
# ...
# ({(edible, poisonous), (<<1,n>,1>, 1), ... (<<10,n>,7>, null)}, 1728 % 1)
size(eff(hhaa(hrhh(uu2,hrhrred(hhb,fder(ff)|vvl)))))
# 19 % 1
rpln(ssplit(fder(ff),states(hhaa(hrhh(uu2,hrhrred(hhb,fder(ff)|vvl))))))
# ({(<<1,n>,1>, 0), ... (<<10,n>,7>, null)}, {(edible, edible)})
# ...
# ({(<<1,n>,1>, 1), ... (<<10,n>,7>, null)}, {(edible, poisonous)})
The model derived variables, $W_F$, are almost causally related to edibility, $(B\%W_F)^{\mathrm{FS}} \to (B\%V_{\mathrm{l}})^{\mathrm{FS}}$. The model’s label entropy or query conditional entropy is near zero, $\mathrm{lent}(B,W_F,V_{\mathrm{l}}) \approx 0$,
def hrlent(uu,hh,ww,vvl):
    return ent(hhaa(hrhh(uu,hrhrred(hh,ww|vvl)))) - ent(hhaa(hrhh(uu,hrhrred(hh,ww))))
hrlent(uu2,hhb,fder(ff),vvl)
0.04488778006332694
rpln(sset([(hrlent(uu2,hhb,sset([w]),vvl),w) for w in fder(ff)]))
# (0.2361093658636677, <<1,n>,3>)
# (0.2361093658636677, <<1,n>,4>)
# ...
# (0.29101789445875514, <<1,n>,5>)
# (0.29101789445875526, <<3,n>,1>)
# ...
# (0.29101789445875537, <<3,n>,9>)
# (0.4844913448854371, <<2,n>,2>)
# ...
# (0.4844913448854371, <<2,n>,6>)
# (0.5143617442765938, <<4,n>,1>)
# ...
# (0.5143617442765938, <<4,n>,9>)
# (0.549068727770982, <<2,n>,1>)
# ...
# (0.549068727770982, <<2,n>,7>)
# (0.5546035042868757, <<7,n>,1>)
# ...
# (0.5546035042868757, <<7,n>,9>)
# (0.634616073600902, <<9,n>,1>)
# (0.634616073600902, <<9,n>,2>)
# (0.637373464649287, <<10,n>,1>)
# ...
# (0.6460129913993978, <<10,n>,6>)
# (0.6509225684291914, <<9,n>,7>)
# ...
# (0.6605611705755877, <<9,n>,6>)
We can see that the derived variables nearest the root fud tend to have the lowest label entropy. None have zero label entropy by themselves. Consider derived variable <<1,n>,4> in the root fud,
w1n4 = stringsVariable("<<1,n>,4>")
fund(fdep(ff,sset([w1n4])))
# {bruises, gill-color, gill-size, habitat, odor, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring}
hrlent(uu2,hhb,sset([w1n4]),vvl)
0.2361093658636677
rpln(aall(hhaa(hrhh(uu2,hrhrred(hhb,sset([w1n4])|vvl)))))
# ({(edible, edible), (<<1,n>,4>, 0)}, 2384 % 1)
# ({(edible, edible), (<<1,n>,4>, 1)}, 1824 % 1)
# ({(edible, poisonous), (<<1,n>,4>, 0)}, 892 % 1)
# ({(edible, poisonous), (<<1,n>,4>, 2)}, 3024 % 1)
rpln(ssplit(fder(ff),states(hhaa(hrhh(uu2,hrhrred(hhb,sset([w1n4])|vvl))))))
# ({(<<1,n>,4>, 0)}, {(edible, edible)})
# ({(<<1,n>,4>, 0)}, {(edible, poisonous)})
# ({(<<1,n>,4>, 1)}, {(edible, edible)})
# ({(<<1,n>,4>, 2)}, {(edible, poisonous)})
Now consider the label entropy for all of the fud variables, $\mathrm{vars}(F)$, not just the fud derived variables, $\mathrm{der}(F)$. We can determine minimum subsets of the query variables that are causal or predictive by using the repa conditional entropy tuple set builder. The conditional entropy minimisation searches for the set of tuples with the least label entropy. We show the resultant tuples along with their label entropies, \[ \{(\mathrm{lent}(B,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,B,\mathrm{L}}))\} \]
def buildcondrr(vvl,aa,kmax,omax,qmax):
    return sset([(b,a) for (a,b) in parametersBuilderConditionalVarsRepa(kmax,omax,qmax,vvl,aa).items()])
(kmax,omax,qmax) = (1, 5, 5)
rpln(buildcondrr(vvl,hhb,kmax,omax,qmax))
# (0.06445777995546442, {odor})
# (0.16474416484069354, {<<1,1>,31>})
# (0.16917391545419513, {<<1,1>,4>})
# (0.2263923811825881, {<<1,1>,49>})
# (0.23421502771371605, {<<1,2>,95>})
Variable <<1,1>,31> is nearly as predictive as variable odor. Variable <<1,1>,31> is in the bottom layer of fud 1 and is defined as follows:
{
  "derived": ["<<1,1>,31>"],
  "history": {
    "hsystem": [
      {"var": "odor", "values": ["almond", "anise", "creosote", "fishy", "foul", "musty", "none", "pungent", "spicy"]},
      {"var": "<<1,1>,31>", "values": ["0", "1", "2"]}
    ],
    "hstates": [
      [0, 0],
      [1, 0],
      [2, 0],
      [3, 1],
      [4, 1],
      [5, 1],
      [6, 2],
      [7, 0],
      [8, 1]
    ]
  }
}
or, equivalently, via the library,
w1131 = stringsVariable("<<1,1>,31>")
fund(fdep(ff,sset([w1131])))
# {odor}
rpln(states(ttaa(fdep(ff,sset([w1131]))[0])))
# {(odor, almond), (<<1,1>,31>, 0)}
# {(odor, anise), (<<1,1>,31>, 0)}
# {(odor, creosote), (<<1,1>,31>, 0)}
# {(odor, fishy), (<<1,1>,31>, 1)}
# {(odor, foul), (<<1,1>,31>, 1)}
# {(odor, musty), (<<1,1>,31>, 1)}
# {(odor, none), (<<1,1>,31>, 2)}
# {(odor, pungent), (<<1,1>,31>, 0)}
# {(odor, spicy), (<<1,1>,31>, 1)}
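The JSON definition above can also be read off directly; a minimal sketch, assuming the JSON text is held in a string tjson (a hypothetical name):
import json

# recover the odor value to component mapping from the transform JSON
tt = json.loads(tjson)
vals = tt["history"]["hsystem"][0]["values"]
{vals[i]: c for (i, c) in tt["history"]["hstates"]}
# {'almond': 0, 'anise': 0, 'creosote': 0, 'fishy': 1, 'foul': 1,
#  'musty': 1, 'none': 2, 'pungent': 0, 'spicy': 1}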
rpln(aall(hhaa(hrhh(uu2,hrhrred(hhb,sset([w1131])|vvl)))))
# ({(edible, edible), (<<1,1>,31>, 0)}, 800 % 1)
# ({(edible, edible), (<<1,1>,31>, 2)}, 3408 % 1)
# ({(edible, poisonous), (<<1,1>,31>, 0)}, 448 % 1)
# ({(edible, poisonous), (<<1,1>,31>, 1)}, 3348 % 1)
# ({(edible, poisonous), (<<1,1>,31>, 2)}, 120 % 1)
That is, underlying values almond, anise, creosote and pungent form a component, value none forms a singleton component, while the remaining values are in a third component. Underlying values creosote and pungent, however, are not relevant to edibility, so the label entropy is higher than for variable odor.
odor = VarStr("odor")
rpln(aall(hhaa(hrhh(uu2,hrhrred(hhb,sset([w1131,odor])|vvl)))))
# ({(edible, edible), (odor, almond), (<<1,1>,31>, 0)}, 400 % 1)
# ({(edible, edible), (odor, anise), (<<1,1>,31>, 0)}, 400 % 1)
# ({(edible, edible), (odor, none), (<<1,1>,31>, 2)}, 3408 % 1)
# ({(edible, poisonous), (odor, creosote), (<<1,1>,31>, 0)}, 192 % 1)
# ({(edible, poisonous), (odor, fishy), (<<1,1>,31>, 1)}, 576 % 1)
# ({(edible, poisonous), (odor, foul), (<<1,1>,31>, 1)}, 2160 % 1)
# ({(edible, poisonous), (odor, musty), (<<1,1>,31>, 1)}, 36 % 1)
# ({(edible, poisonous), (odor, none), (<<1,1>,31>, 2)}, 120 % 1)
# ({(edible, poisonous), (odor, pungent), (<<1,1>,31>, 0)}, 256 % 1)
# ({(edible, poisonous), (odor, spicy), (<<1,1>,31>, 1)}, 576 % 1)
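As a cross-check, both label entropies can be recovered in plain Python from the counts printed above, assuming ent is the per-event natural-log entropy:
from collections import defaultdict
from math import log

# joint counts of (odor, edible) taken from the output above
counts = {
    ('almond','edible'): 400, ('anise','edible'): 400,
    ('none','edible'): 3408, ('creosote','poisonous'): 192,
    ('fishy','poisonous'): 576, ('foul','poisonous'): 2160,
    ('musty','poisonous'): 36, ('none','poisonous'): 120,
    ('pungent','poisonous'): 256, ('spicy','poisonous'): 576}

# the value to component mapping of derived variable <<1,1>,31>
comp = {'almond': 0, 'anise': 0, 'creosote': 0, 'pungent': 0,
        'fishy': 1, 'foul': 1, 'musty': 1, 'spicy': 1, 'none': 2}

def condent(cc):
    # H(label | key) = H(key, label) - H(key), per event, natural log
    total = sum(cc.values())
    ent = lambda dd: -sum(c / total * log(c / total) for c in dd.values())
    marg = defaultdict(int)
    for (k, l), c in cc.items():
        marg[k] += c
    return ent(cc) - ent(marg)

merged = defaultdict(int)
for (o, l), c in counts.items():
    merged[(comp[o], l)] += c

condent(counts)   # ~0.0645, the label entropy of odor
condent(merged)   # ~0.1647, the label entropy of <<1,1>,31>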
Now optimise for larger tuples, excluding the substrate. Let $B_2 = B~\%~((\mathrm{vars}(F) \setminus V) \cup V_{\mathrm{l}})$. Then, \[ \{(\mathrm{lent}(B_2,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,B_2,\mathrm{L}}))\} \]
hhb2 = hrhrred(hhb,fvars(ff)-vv|vvl)
(kmax,omax,qmax) = (1, 10, 10)
rpln(buildcondrr(vvl,hhb2,kmax,omax,qmax))
# (0.16474416484069354, {<<1,1>,31>})
# (0.16917391545419513, {<<1,1>,4>})
# (0.2263923811825881, {<<1,1>,49>})
# (0.23421502771371605, {<<1,2>,95>})
# (0.2361093658636677, {<<1,n>,3>})
# (0.2361093658636677, {<<1,n>,4>})
# (0.2361093658636677, {<<1,n>,6>})
# (0.2361093658636677, {<<1,n>,7>})
# (0.2361093658636677, {<<1,2>,83>})
# (0.2361093658636677, {<<1,2>,85>})
(kmax,omax,qmax) = (2, 10, 20)
rpln(buildcondrr(vvl,hhb2,kmax,omax,qmax))
# (0.044122472082529285, {<<1,1>,31>, <<2,3>,4>})
# (0.04412247208252951, {<<1,1>,4>, <<2,n>,2>})
# (0.04412247208252951, {<<1,1>,4>, <<2,n>,3>})
# (0.04412247208252951, {<<1,1>,4>, <<2,n>,6>})
# (0.04412247208252951, {<<1,1>,31>, <<2,n>,2>})
# (0.04412247208252951, {<<1,1>,31>, <<2,n>,3>})
# (0.04412247208252951, {<<1,1>,31>, <<2,n>,6>})
# (0.04412247208252973, {<<1,1>,31>, <<2,2>,38>})
# (0.04566068343589369, {<<1,1>,31>, <<2,n>,1>})
# (0.04566068343589369, {<<1,1>,31>, <<2,n>,4>})
# (0.16474416484069354, {<<1,1>,31>})
# (0.16917391545419513, {<<1,1>,4>})
...
Continuing up to 6-tuples,
(kmax,omax,qmax) = (3, 10, 20)
rpln(buildcondrr(vvl,hhb2,kmax,omax,qmax))
# (0.022390255903229406, {<<1,1>,31>, <<2,2>,38>, <<9,1>,20>})
# (0.022390255903229406, {<<1,1>,31>, <<2,3>,4>, <<9,1>,20>})
...
(kmax,omax,qmax) = (4, 10, 20)
rpln(buildcondrr(vvl,hhb2,kmax,omax,qmax))
# (0.010993236700227893, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<9,n>,1>})
# (0.010993236700227893, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<9,n>,2>})
...
(kmax,omax,qmax) = (5, 10, 20)
rpln(buildcondrr(vvl,hhb2,kmax,omax,qmax))
# (0.002463822863309595, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>})
# (0.002463822863309595, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>})
...
(kmax,omax,qmax) = (6, 10, 20)
ll = buildcondrr(vvl,hhb2,kmax,omax,qmax)
rpln(ll)
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,1>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,2>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,1>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,2>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,1>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,2>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,1>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,2>})
# (0.002463822863309595, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>})
# (0.002463822863309595, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>})
# ...
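The escalation over tuple sizes can also be scripted; a minimal sketch using the buildcondrr helper defined above, stopping at the first size that yields a zero label entropy tuple:
# stop at the first tuple size with a (near) zero label entropy tuple;
# this leaves ll holding the kmax = 6 result used below
for kmax in range(1, 7):
    ll = buildcondrr(vvl,hhb2,kmax,10,20)
    (e, xx) = ll[0]
    print(kmax, e)
    if e < 1e-14:
        break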
Now we have found 10 tuples with zero label entropy, each perfectly predictive of edibility in the sample,
rpln([(xx,fund(fdep(ff,xx))) for (e,xx) in ll if e < 1e-14])
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>}, {bruises, gill-attachment, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>}, {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,1>}, {bruises, gill-attachment, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,2>}, {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,1>}, {bruises, gill-attachment, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,2>}, {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,1>}, {bruises, gill-attachment, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,2>}, {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,1>}, {bruises, gill-attachment, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,2>}, {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
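The ten tuples depend on nearly the same underlying variables. A sketch of their common underlying set, writing the intersection a & b as a - (a - b) so as to use only the set operations seen so far:
from functools import reduce

# the underlying query variables shared by all ten zero-entropy tuples
zz = [fund(fdep(ff,xx)) for (e,xx) in ll if e < 1e-14]
reduce(lambda a, b: a - (a - b), zz)
# {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population,
#  ring-number, ring-type, spore-print-color, stalk-root, stalk-shape,
#  stalk-surface-below-ring}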
Let us sort by shuffle content derived alignment descending. Let $L = \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,B_2,\mathrm{L}}))$. Then calculate \[ \{(\mathrm{algn}(B\%X)-\mathrm{algn}(B_{\mathrm{r}}\%X),~X) : (e,X) \in L,~e \approx 0\} \] where $B_{\mathrm{r}} = A_{\mathrm{r}} * \prod\mathrm{his}(F)$,
hhbr = hrfmul(uu2,ff,hhr)
hrsize(hhbr)
# 8124
rpln(reversed(list(sset([(algn(aa1)-algn(aar1),xx) for (e,xx) in ll if e < 1e-14 for aa1 in [hhaa(hrhh(uu2,hrhrred(hhb,xx)))] for aar1 in [hhaa(hrhh(uu2,hrhrred(hhbr,xx)))]]))))
# (29472.345908137817, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,2>})
# (29468.238375651046, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,1>})
# (29280.163931967294, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>})
# (29276.550876789777, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>})
# (28844.380866136813, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,2>})
# (28840.28394834463, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,1>})
# (28710.853209698478, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,2>})
# (28705.809106843124, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,1>})
# (28505.052905060125, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,2>})
# (28500.56028382371, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,1>})
and by size-volume-sized-shuffle relative entropy descending, \[ \{(\mathrm{rent}(B~\%~X,~Z_F * \hat{B}_{\mathrm{r}}~\%~X),~X) : (e,X) \in L,~e \approx 0\} \]
rpln(reversed(list(sset([(rent(aa1,vaar1),xx) for (e,xx) in ll if e < 1e-14 for aa1 in [hhaa(hrhh(uu2,hrhrred(hhb,xx)))] for vaar1 in [vsize(uu2,fund(fdep(ff,xx)),hhaa(hrhh(uu2,hrhrred(hhbr,xx))))]]))))
# (35025.78415060043, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,1>})
# (34267.05117201805, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>})
# (34138.91352367401, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,1>})
# (33342.31550860405, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,1>})
# (33271.06856274605, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,1>})
# (32560.13660311699, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,2>})
# (31805.06520330906, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>})
# (31674.591724276543, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,2>})
# (30880.60635328293, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,2>})
# (30810.17875802517, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,2>})
We can see that the derived alignments and the relative entropies of the submodels, $X \subset \mathrm{vars}(F) \setminus V$, are higher than those of the manual model, $W_{\mathrm{m}}$, which suggests that the induced submodels are more likely and less sensitive than the manual model.
Let us analyse the first of the sub-models,
xx = ll[0][1]
xx
# {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>}
hrlent(uu2,hhb,xx,vvl)
# 0.0
len(fvars(fdep(ff,xx)))
# 76
This tuple has a volume of 3840,
vol(uu2,xx)
# 3840
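The volume is simply the product of the six derived variables' valencies; a quick cross-check:
from functools import reduce
from operator import mul

# the tuple volume as the product of the per-variable volumes
reduce(mul,[vol(uu2,sset([w])) for w in xx])
# 3840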
However, it classifies the sample into only 22 effective states or slices,
rpln(aall(hhaa(hrhh(uu2,hrhrred(hhb,xx|vvl)))))
# ({(edible, edible), (<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 0), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, 96 % 1)
# ({(edible, edible), (<<1,1>,4>, 0), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 2), (<<2,1>,18>, 1), (<<9,n>,1>, null)}, 704 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, 192 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 2), (<<9,n>,1>, null)}, 96 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 2), (<<2,1>,18>, 4), (<<9,n>,1>, null)}, 768 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 2), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 0)}, 48 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, 1728 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, 144 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, 32 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, 48 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, 16 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 2)}, 288 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, 48 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 0), (<<2,n>,2>, 1), (<<2,1>,18>, 2), (<<9,n>,1>, null)}, 256 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, 192 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 1), (<<9,n>,1>, 0)}, 36 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 1), (<<1,1>,26>, 1), (<<1,1>,36>, 2), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, 1296 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 1), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, 288 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 1), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, null), (<<2,1>,18>, 3), (<<9,n>,1>, null)}, 1728 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, 8 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, 72 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 2), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 0)}, 40 % 1)
size(eff(hhaa(hrhh(uu2,hrhrred(hhb,xx|vvl)))))
# 22 % 1
rpln(ssplit(xx,states(hhaa(hrhh(uu2,hrhrred(hhb,xx|vvl))))))
# ({(<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 0), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, {(edible, edible)})
# ({(<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 0), (<<2,n>,2>, 1), (<<2,1>,18>, 2), (<<9,n>,1>, null)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 1), (<<9,n>,1>, 0)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 0), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 2), (<<2,1>,18>, 1), (<<9,n>,1>, null)}, {(edible, edible)})
# ({(<<1,1>,4>, 1), (<<1,1>,26>, 1), (<<1,1>,36>, 2), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 1), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 1), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, null), (<<2,1>,18>, 3), (<<9,n>,1>, null)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 2), (<<9,n>,1>, null)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 2), (<<2,1>,18>, 4), (<<9,n>,1>, null)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 2), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 0)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 2)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 0)}, {(edible, poisonous)})
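Since each effective slice determines a single label, the split above acts as a lookup-table classifier on the six derived variables; a sketch, assuming the (state, label) pairs yielded by ssplit can populate a dict:
# map each effective derived slice state to its label state;
# unseen derived states would yield no prediction (lookup.get)
lookup = dict(ssplit(xx,states(hhaa(hrhh(uu2,hrhrred(hhb,xx|vvl))))))
len(lookup)
# 22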
To conclude, we can see that there are many robust sub-models of the induced model that are predictive of edibility.