MUSH - Analysis of the UCI Machine Learning Repository Mushroom Data Set
Sections
Predicting edibility without modelling
Predicting odor without modelling
Manual modelling of edibility
Induced modelling of edibility
Introduction
The UCI Machine Learning Repository Mushroom Data Set is a popular dataset often used to test machine learning algorithms (e.g. Kaggle).
The dataset consists of descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family, drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like “leaflets three, let it be” for Poisonous Oak and Ivy.
The dataset contains 8124 events of 23 discrete-valued variables:
- cap-shape: bell, conical, convex, flat, knobbed, sunken
- cap-surface: fibrous, grooves, scaly, smooth
- cap-color: brown, buff, cinnamon, gray, green, pink, purple, red, white, yellow
- bruises: bruises, no
- odor: almond, anise, creosote, fishy, foul, musty, none, pungent, spicy
- gill-attachment: attached, descending, free, notched
- gill-spacing: close, crowded, distant
- gill-size: broad, narrow
- gill-color: black, brown, buff, chocolate, gray, green, orange, pink, purple, red, white, yellow
- stalk-shape: enlarging, tapering
- stalk-root: bulbous, club, cup, equal, rhizomorphs, rooted, missing
- stalk-surface-above-ring: fibrous, scaly, silky, smooth
- stalk-surface-below-ring: fibrous, scaly, silky, smooth
- stalk-color-above-ring: brown, buff, cinnamon, gray, orange, pink, red, white, yellow
- stalk-color-below-ring: brown, buff, cinnamon, gray, orange, pink, red, white, yellow
- veil-type: partial, universal
- veil-color: brown, orange, white, yellow
- ring-number: none, one, two
- ring-type: cobwebby, evanescent, flaring, large, none, pendant, sheathing, zone
- spore-print-color: black, brown, buff, chocolate, green, orange, purple, white, yellow
- population: abundant, clustered, numerous, scattered, several, solitary
- habitat: grasses, leaves, meadows, paths, urban, waste, woods
- edibility: edible, poisonous
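For reference, the raw UCI file encodes each event as a comma-separated line of single-letter codes, with the edibility class first. The following is a minimal decoding sketch, for illustration only: the local file path is hypothetical, only two of the 23 decode maps are shown, and the actual loading below is done by mushIO.
codes = {
    "edible": {"e": "edible", "p": "poisonous"},
    "odor": {"a": "almond", "l": "anise", "c": "creosote", "y": "fishy",
             "f": "foul", "m": "musty", "n": "none", "p": "pungent", "s": "spicy"},
}
events = []
with open("agaricus-lepiota.data") as f:
    for line in f:
        values = line.strip().split(",")
        # The class is first; odor is the fifth attribute after it.
        events.append({"edible": codes["edible"][values[0]],
                       "odor": codes["odor"][values[5]]})
len(events)
# expect 8124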
Note that although edibility is a secondary quality or classification, we shall treat it here as we would any other variable.
We shall analyse this dataset using the MUSHPy repository which depends on the AlignmentRepaPy repository. The AlignmentRepaPy repository is a fast Python implementation of some of the practicable inducers described in the paper. The code in this section can be executed by copying and pasting the code into a Python interpreter, see README. Also see the Introduction in Notation.
Properties of the sample
First load the sample $A$,
from MUSHDev import *
(uu,aa) = mushIO()
vv = uvars(uu)
vvl = sset([VarStr("edible")])
vvk = vv - vvl
The system is $U$. The sample substrate variables are $V = \mathrm{vars}(A)$, the label variables are $V_{\mathrm{l}} = \{\mathrm{edible}\}$, and the query variables form the remainder, $V_{\mathrm{k}} = V \setminus V_{\mathrm{l}}$.
The variable valencies are $\{(w,|U_w|) : w \in V\}$,
rpln(sset([(vol(uu,sset([w])),w) for w in vv]))
# (1, veil-type)
# (2, bruises)
# (2, edible)
# (2, gill-attachment)
# (2, gill-size)
# (2, gill-spacing)
# (2, stalk-shape)
# (3, ring-number)
# (4, cap-surface)
# (4, stalk-surface-above-ring)
# (4, stalk-surface-below-ring)
# (4, veil-color)
# (5, ring-type)
# (5, stalk-root)
# (6, cap-shape)
# (6, population)
# (7, habitat)
# (9, odor)
# (9, spore-print-color)
# (9, stalk-color-above-ring)
# (9, stalk-color-below-ring)
# (10, cap-color)
# (12, gill-color)
Note that veil-type has only one value and so is a constant.
The variable dimension, $|V|$, is,
len(vv)
23
The variable volume, $|V^{\mathrm{C}}|$, is,
vol(uu,vv)
243799621632000
So the mean valency, $|V^{\mathrm{C}}|^{1/|V|}$, is,
exp(log(vol(uu,vv))/len(vv))
4.222048084120202
The label variable dimension, $|V_{\mathrm{l}}|$, is,
len(vvl)
1
The label variable volume, $|V_{\mathrm{l}}^{\mathrm{C}}|$, is,
vol(uu,vvl)
2
The query variable dimension, $|V_{\mathrm{k}}|$, is,
len(vvk)
22
The query variable volume, $|V_{\mathrm{k}}^{\mathrm{C}}|$, is,
vol(uu,vvk)
121899810816000
The geometric mean query valency, $|V_{\mathrm{k}}^{\mathrm{C}}|^{1/|V_{\mathrm{k}}|}$, is,
exp(log(vol(uu,vvk))/len(vvk))
4.367901791531438
The sample size, $\mathrm{size}(A)$, is
size(aa)
# 8124 % 1
So each effective state corresponds to exactly one event, $A = A^{\mathrm{F}}$,
size(eff(aa))
# 8124 % 1
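Equivalently, every event in the sample is a distinct state. A one-line check, not in the original script:
# The effective histogram has the same size as the sample,
# so each effective state corresponds to exactly one event.
size(eff(aa)) == size(aa)
# True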
Now consider how highly aligned variables might be grouped together. See Entropy and alignment. First consider pairs in the substrate, $V$, \[ \{(\mathrm{algn}(A\%\{w,x\}),~w,~x) : w \in V,~x \in V,~w < x\} \]
rpln(reversed(list(sset([(algn(red(aa,sset([w,x]))),w,x) for w in vv for x in vv if w < x]))))
# (5255.546241861608, odor, spore-print-color)
# (5243.485309542397, gill-color, spore-print-color)
# (5076.0182810184415, edible, odor)
# (4869.445702685916, spore-print-color, stalk-root)
# (4747.650540211325, gill-color, odor)
# (4634.640609067872, odor, stalk-root)
# (4538.425095165279, ring-type, spore-print-color)
# (4504.522900455915, gill-color, stalk-root)
# (4319.357344722732, gill-color, ring-type)
# (4191.879346198264, odor, ring-type)
# (3876.2346155617124, population, stalk-root)
# (3792.5367348885193, habitat, stalk-root)
# (3631.907175495806, ring-type, stalk-root)
# (3594.3100531883683, stalk-color-above-ring, stalk-color-below-ring)
# (3580.0056701234134, habitat, population)
# (3526.33839164619, gill-color, habitat)
# ...
# (40.29419779899035, bruises, stalk-shape)
# (34.93679977406282, gill-attachment, gill-spacing)
# (27.046821925505355, gill-spacing, stalk-shape)
# (25.072123638783523, cap-surface, stalk-shape)
# (24.98472755986586, bruises, ring-number)
# (23.84031057698303, cap-shape, gill-spacing)
# (10.583146543744078, ring-number, veil-color)
# (0.0, veil-color, veil-type)
# (0.0, stalk-surface-below-ring, veil-type)
# ...
# (0.0, cap-color, veil-type)
# (0.0, bruises, veil-type)
We can see that all of the variables except for mono-valent veil-type are aligned with each other, even if only very weakly. We can also see that some of the variables that appear in highly aligned pairs appear in several such pairs, e.g. odor or spore-print-color. This suggests that we should also consider tuple dimensions greater than two.
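For example, the alignment of a tri-variate tuple over some of the highly aligned variables above can be computed in exactly the same way; a quick illustration (output not reproduced here):
# Alignment of an example triple drawn from the highly aligned pairs above.
algn(red(aa, sset([VarStr(s) for s in ["gill-color", "odor", "spore-print-color"]])))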
Now consider using the tupler to group together highly aligned variables in the substrate, $V$. Note that for performance reasons we must first construct a HistoryRepa from the sample histogram, $A$. See History and HistoryRepa.
First consider the tuple dimension implied by choosing a volume limit, xmax. The following products compare candidate tuple volumes against powers of the mean valencies calculated above,
10*12
120
4.367901791531438 ** 4
363.9916829234716
2*2*2*2*2*2*3*4
768
9*9*10
810
9*10*12
1080
4.222048084120202 ** 5
1341.5778383137888
4.367901791531438 ** 5
1589.879923943975
2*2*2*2*2*2*3*4*4
3072
size(aa)
# 8124 % 1
9*9*10*12
9720
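These products can be checked against the system; for example, the last is the volume of the tuple of the four highest-valency variables,
# Volume of the four highest-valency variables:
# gill-color (12), cap-color (10), odor (9) and spore-print-color (9).
vol(uu, sset([VarStr(s) for s in ["gill-color", "cap-color", "odor", "spore-print-color"]]))
# 9720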
Now create a shuffled sample, $A_{\mathrm{r}}$,
hh = aahr(uu,aa)
hhr = historyRepasShuffle_u(hh,1)
hrsize(hhr)
# 8124
The shuffle has the same size as the sample, $\mathrm{size}(A_{\mathrm{r}}) = \mathrm{size}(A)$.
Now optimise the shuffle content alignment with the tuple set builder, $I_{P,U,\mathrm{B,ns,me}}$, \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V,~\emptyset,~A,~A_{\mathrm{r}}))\} \]
def buildtuprr(xmax,omax,bmax,uu,vv,xx,xxrr):
    return reversed(list(sset([(algn(rraa(uu,hrred(xx,kk))) - algn(rraa(uu,hrred(xxrr,kk))), kk) for ((kk,_),_) in parametersSystemsBuilderTupleNoSumlayerMultiEffectiveRepa_ui(xmax,omax,bmax,1,uu,vv,fudEmpty(),xx,hrhx(xx),xxrr,hrhx(xxrr))[0]])))
rpln(buildtuprr(1590,10,10,uu,vv,hh,hhr))
# (20909.315588710102, {bruises, edible, odor, ring-type, stalk-root})
# (20737.49894848034, {bruises, odor, ring-type, stalk-root, stalk-shape})
# (20595.04499418575, {bruises, gill-size, odor, ring-type, stalk-root})
# (18771.00223024639, {bruises, gill-spacing, odor, ring-type, stalk-root})
# (17647.35883825688, {habitat, ring-type, spore-print-color, stalk-root})
# (17511.267019084673, {habitat, odor, ring-type, stalk-root})
# (17356.19101355837, {edible, odor, spore-print-color, stalk-root})
# (17337.191103083293, {bruises, odor, ring-number, ring-type, stalk-root})
# (16741.46427782372, {odor, population, ring-type, stalk-root})
# (16536.946674766714, {bruises, gill-attachment, odor, ring-type, stalk-root})
We can see that the top tuples have large intersections. Now optimise again having removed the top tuple from the substrate, \[ Q_1~=~\{\mathrm{bruises},~\mathrm{edible},~\mathrm{odor},~\text{ring-type},~\text{stalk-root}\} \] and \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V \setminus Q_1,~\emptyset,~A,~A_{\mathrm{r}}))\} \]
qq1 = sset([VarStr(s) for s in ["bruises","edible","odor","ring-type","stalk-root"]])
rpln(buildtuprr(1590,10,10,uu,vv-qq1,hh,hhr))
# (13474.472075221576, {gill-color, gill-size, gill-spacing, spore-print-color, stalk-shape})
# (13380.35024073867, {gill-color, gill-size, ring-number, spore-print-color, stalk-shape})
# (13071.036894266123, {gill-color, habitat, spore-print-color, stalk-shape})
# (12727.00966415456, {gill-color, gill-size, habitat, spore-print-color})
# (12404.82395851165, {gill-color, population, spore-print-color, stalk-shape})
# (12154.32506810885, {gill-attachment, gill-color, gill-size, spore-print-color, stalk-shape})
# (12120.144392218797, {gill-color, gill-size, population, spore-print-color})
# (12004.554463851495, {gill-color, spore-print-color, stalk-shape, stalk-surface-below-ring})
# (11553.333907313998, {gill-color, spore-print-color, stalk-shape, stalk-surface-above-ring})
# (11546.535980638902, {gill-color, habitat, population, stalk-shape})
Now optimise again having removed the top two tuples from the substrate, \[ Q_2~=~\{\text{gill-color},~\text{gill-size},~\text{gill-spacing},~\text{spore-print-color},~\text{stalk-shape}\} \] and \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V \setminus Q_1 \setminus Q_2,~\emptyset,~A,~A_{\mathrm{r}}))\} \]
qq2 = sset([VarStr(s) for s in ["gill-color","gill-size","gill-spacing","spore-print-color","stalk-shape"]])
rpln(buildtuprr(1590,10,10,uu,vv-qq1-qq2,hh,hhr))
# (10117.338301213375, {habitat, population, stalk-color-below-ring, stalk-surface-below-ring})
# (10015.422262274038, {habitat, population, stalk-color-above-ring, stalk-surface-below-ring})
# (9721.157922413408, {habitat, population, stalk-color-below-ring, stalk-surface-above-ring})
# (9633.996246304727, {habitat, population, stalk-color-above-ring, stalk-surface-above-ring})
# (9348.809148464625, {stalk-color-above-ring, stalk-color-below-ring, stalk-surface-above-ring, stalk-surface-below-ring})
# (8575.748022597432, {cap-surface, habitat, population, stalk-color-below-ring})
# (8554.425441539577, {cap-surface, habitat, population, stalk-color-above-ring})
# (8259.597445754986, {cap-color, habitat, population, ring-number})
# (8098.58768334987, {habitat, population, ring-number, stalk-color-above-ring})
# (8061.878539905454, {habitat, population, ring-number, stalk-color-below-ring})
This time, if we remove the union of the top four tuples, we terminate at the remainder variables, \[ Q_3~=~\{\mathrm{habitat},~\mathrm{population},~\ldots,~\text{stalk-surface-above-ring}\} \] and \[ \{(\mathrm{algn}(A\%K)-\mathrm{algn}(A_{\mathrm{r}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V \setminus Q_1 \setminus Q_2 \setminus Q_3,~\emptyset,~A,~A_{\mathrm{r}}))\} \]
qq3 = sset([VarStr(s) for s in ["habitat","population","stalk-color-below-ring","stalk-surface-below-ring","stalk-color-above-ring","stalk-surface-above-ring"]])
vv - qq1 - qq2 - qq3
# {cap-color, cap-shape, cap-surface, gill-attachment, ring-number, veil-color, veil-type}
rpln(buildtuprr(1590,10,10,uu,vv-qq1-qq2-qq3,hh,hhr))
# (3032.5994640940407, {cap-color, cap-shape, cap-surface, gill-attachment, ring-number})
# (2650.426903548403, {cap-color, cap-shape, gill-attachment, ring-number, veil-color})
# (2588.3533754319324, {cap-color, cap-shape, cap-surface, ring-number})
# (2474.5970677913137, {cap-color, cap-surface, gill-attachment, ring-number, veil-color})
# (1999.2245419999308, {cap-color, cap-shape, cap-surface, gill-attachment})
# (1962.429414852144, {cap-color, cap-shape, cap-surface, veil-color})
# (1851.1240419597088, {cap-color, cap-shape, gill-attachment, ring-number})
# (1818.103175173921, {cap-color, cap-surface, gill-attachment, veil-color})
# (1787.8796736710938, {cap-color, cap-shape, ring-number, veil-color})
# (1720.3728042711373, {cap-color, cap-shape, gill-attachment, veil-color})
That is, there is a possible partition of the substrate, $\{Q_1,~Q_2,~Q_3,~V \setminus (Q_1 \cup Q_2 \cup Q_3)\}$, as follows,
qq1
# {bruises, edible, odor, ring-type, stalk-root}
qq2
# {gill-color, gill-size, gill-spacing, spore-print-color, stalk-shape}
qq3
# {habitat, population, stalk-color-above-ring, stalk-color-below-ring, stalk-surface-above-ring, stalk-surface-below-ring}
vv-qq1-qq2-qq3
# {cap-color, cap-shape, cap-surface, gill-attachment, ring-number, veil-color, veil-type}
We can check to see if the shuffle size is sufficient by optimising with a different shuffle,
hhr = historyRepasShuffle_u(hh,3)
rpln(buildtuprr(1590,10,10,uu,vv,hh,hhr))
# (20898.30881207043, {bruises, edible, odor, ring-type, stalk-root})
# (20752.970341604527, {bruises, odor, ring-type, stalk-root, stalk-shape})
# (20609.75866419327, {bruises, gill-size, odor, ring-type, stalk-root})
# (18788.360560650763, {bruises, gill-spacing, odor, ring-type, stalk-root})
# (17630.53206124808, {habitat, ring-type, spore-print-color, stalk-root})
# (17521.773935518864, {habitat, odor, ring-type, stalk-root})
# (17353.567039910533, {edible, odor, spore-print-color, stalk-root})
# (17349.636902047107, {bruises, odor, ring-number, ring-type, stalk-root})
# (16737.072559143457, {odor, population, ring-type, stalk-root})
# (16543.55247776693, {bruises, gill-attachment, odor, ring-type, stalk-root})
We can see that this partition is not affected by the shuffle seed.
Predicting edibility without modelling
The sample query variables predict edibility. That is, there is a functional or causal relationship between the query variables and the label variables, $(A\%V_{\mathrm{k}})^{\mathrm{FS}} \to (A\%V_{\mathrm{l}})^{\mathrm{FS}}$. So the label entropy or query conditional entropy is zero. See Entropy and alignment. In this case, where $V = V_{\mathrm{k}} \cup V_{\mathrm{l}}$, the label entropy is \[ \begin{eqnarray} \mathrm{entropy}(A) - \mathrm{entropy}(A~\%~V_{\mathrm{k}})~=~0 \end{eqnarray} \] More generally, define \[ \begin{eqnarray} \mathrm{lent}(A,W,V_{\mathrm{l}})~:=~\mathrm{entropy}(A~\%~(W \cup V_{\mathrm{l}})) - \mathrm{entropy}(A~\%~W) \end{eqnarray} \]
def lent(aa,ww,vvl):
    return ent(red(aa,ww|vvl)) - ent(red(aa,ww))
Then $\mathrm{lent}(A,V_{\mathrm{k}},V_{\mathrm{l}}) = 0$,
lent(aa,vvk,vvl)
0.0
We can determine which of the query variables has the least conditional entropy, \[ \begin{eqnarray} \{(\mathrm{lent}(A,\{w\},V_{\mathrm{l}}),~w) : w \in V_{\mathrm{k}}\} \end{eqnarray} \]
rpln(sset([(lent(aa,sset([w]),vvl),w) for w in vvk]))
# (0.06445777995546464, odor)
# (0.3593018375305004, spore-print-color)
# (0.4034743011923405, gill-color)
# (0.47206538234114515, ring-type)
# (0.49514434957356657, stalk-surface-above-ring)
# (0.504038208263087, stalk-surface-below-ring)
# (0.516549029620998, stalk-color-above-ring)
# (0.5251645766232356, stalk-color-below-ring)
# (0.5329702396776962, gill-size)
# (0.5525144643974396, population)
# (0.5591537977521386, bruises)
# (0.5837923250560273, habitat)
# (0.5990526304940014, stalk-root)
# (0.6225742013519671, gill-spacing)
# (0.6586777995379725, cap-shape)
# (0.6658477366342479, ring-number)
# (0.6675136370489365, cap-color)
# (0.6726838566664071, cap-surface)
# (0.6759923983315359, veil-color)
# (0.6826826472037806, gill-attachment)
# (0.6872908661915269, stalk-shape)
# (0.6925010959051001, veil-type)
This may be compared to the entropy of the label variables, $\mathrm{entropy}(A\%V_{\mathrm{l}})$,
ent(red(aa,vvl))
0.6925010959051001
Mono-valent veil-type has the highest conditional entropy. In fact, it is equal to the entropy of the label variables, and so makes no prediction of edibility, $\mathrm{lent}(A,\{\text{veil-type}\},V_{\mathrm{l}}) = \mathrm{entropy}(A\%V_{\mathrm{l}})$.
By contrast, odor has the least conditional entropy by quite a margin. Odor is highly predictive of edibility. Its label entropy is $\mathrm{lent}(A,\{\mathrm{odor}\},V_{\mathrm{l}})$,
odor = VarStr("odor")
lent(aa,sset([odor]),vvl)
0.06445777995546464
Let us reduce the sample, $A~\%~(\{\mathrm{odor}\} \cup V_{\mathrm{l}})$, to see the relationship,
rpln(aall(red(aa,sset([odor])|vvl)))
# ({(edible, edible), (odor, almond)}, 400 % 1)
# ({(edible, edible), (odor, anise)}, 400 % 1)
# ({(edible, edible), (odor, none)}, 3408 % 1)
# ({(edible, poisonous), (odor, creosote)}, 192 % 1)
# ({(edible, poisonous), (odor, fishy)}, 576 % 1)
# ({(edible, poisonous), (odor, foul)}, 2160 % 1)
# ({(edible, poisonous), (odor, musty)}, 36 % 1)
# ({(edible, poisonous), (odor, none)}, 120 % 1)
# ({(edible, poisonous), (odor, pungent)}, 256 % 1)
# ({(edible, poisonous), (odor, spicy)}, 576 % 1)
rpln(ssplit(vvk,states(red(aa,sset([odor])|vvl))))
# ({(odor, almond)}, {(edible, edible)})
# ({(odor, anise)}, {(edible, edible)})
# ({(odor, creosote)}, {(edible, poisonous)})
# ({(odor, fishy)}, {(edible, poisonous)})
# ({(odor, foul)}, {(edible, poisonous)})
# ({(odor, musty)}, {(edible, poisonous)})
# ({(odor, none)}, {(edible, edible)})
# ({(odor, none)}, {(edible, poisonous)})
# ({(odor, pungent)}, {(edible, poisonous)})
# ({(odor, spicy)}, {(edible, poisonous)})
Only value none is ambiguous.
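Since only odor none is ambiguous, a modal prediction of edibility from odor alone misclassifies just the 120 poisonous events with no odor. A quick arithmetic check of the implied accuracy, which agrees with the 98.52% quoted for rule P_1 in the manual modelling section below:
# Only the 120 poisonous mushrooms with odor none are misclassified
# by predicting the modal edibility for each odor value.
(8124 - 120) / 8124
0.9852289512555391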
Odor and edibility are also highly aligned, $\mathrm{algn}(A~\%~(\{\mathrm{odor}\} \cup V_{\mathrm{l}}))$,
algn(red(aa,sset([odor])|vvl))
5076.0182810184415
which suggests that the relationship tends to be bijective, or functional/causal in both directions. That is, edibility is also somewhat predictive of odor. The label entropy in the opposite direction is $\mathrm{lent}(A,V_{\mathrm{l}},\{\mathrm{odor}\})$,
lent(aa,vvl,sset([odor]))
0.9796522676447261
ent(red(aa,sset([odor])))
1.6076955835943616
rpln(ssplit(vvl,states(red(aa,sset([odor])|vvl))))
# ({(edible, edible)}, {(odor, almond)})
# ({(edible, edible)}, {(odor, anise)})
# ({(edible, edible)}, {(odor, none)})
# ({(edible, poisonous)}, {(odor, creosote)})
# ({(edible, poisonous)}, {(odor, fishy)})
# ({(edible, poisonous)}, {(odor, foul)})
# ({(edible, poisonous)}, {(odor, musty)})
# ({(edible, poisonous)}, {(odor, none)})
# ({(edible, poisonous)}, {(odor, pungent)})
# ({(edible, poisonous)}, {(odor, spicy)})
Now, however, both values edible and poisonous are ambiguous.
We can determine minimum subsets of the query variables that are causal or predictive by using the repa conditional entropy tuple set builder. The conditional entropy minimisation searches for the set of tuples with the least label entropy. We show the resultant tuples along with their label entropies, \[ \{(\mathrm{lent}(A,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A,\mathrm{L}}))\} \]
def buildcondrr(vvl,aa,kmax,omax,qmax):
    return sset([(b,a) for (a,b) in parametersBuilderConditionalVarsRepa(kmax,omax,qmax,vvl,aa).items()])
(kmax,omax,qmax) = (1, 5, 5)
rpln(buildcondrr(vvl,hh,kmax,omax,qmax))
# (0.06445777995546442, {odor})
# (0.3593018375305004, {spore-print-color})
# (0.4034743011923396, {gill-color})
# (0.4720653823411449, {ring-type})
# (0.49514434957356646, {stalk-surface-above-ring})
(kmax,omax,qmax) = (2, 5, 5)
rpln(buildcondrr(vvl,hh,kmax,omax,qmax))
# (0.02082990753054048, {odor, spore-print-color})
# (0.03627934002186972, {cap-color, odor})
# (0.038496450666163806, {gill-color, odor})
# (0.04566068343589391, {odor, stalk-shape})
# (0.04619210440230992, {odor, stalk-color-below-ring})
All of the multi-variate tuples contain odor.
(kmax,omax,qmax) = (3, 5, 5)
rpln(buildcondrr(vvl,hh,kmax,omax,qmax))
# (0.006893838773649907, {habitat, odor, spore-print-color})
# (0.008190808632908997, {odor, ring-number, spore-print-color})
# (0.008190808632909441, {gill-size, odor, spore-print-color})
# (0.008951100722878635, {odor, spore-print-color, stalk-surface-below-ring})
# (0.009511843838571732, {cap-color, odor, spore-print-color})
(kmax,omax,qmax) = (4, 5, 5)
rpln(buildcondrr(vvl,hh,kmax,omax,qmax))
# (0.0, {habitat, odor, population, spore-print-color})
# (0.0018803963612850083, {cap-color, habitat, odor, spore-print-color})
# (0.0018803963612850083, {habitat, odor, spore-print-color, stalk-color-below-ring})
# (0.002215007955169934, {gill-size, odor, spore-print-color, stalk-surface-below-ring})
# (0.002215007955169934, {odor, ring-number, spore-print-color, stalk-surface-below-ring})
So the minimum tuple dimension that is causal or predictive is 4. Let this tuple be $X$, \[ X~=~\{\mathrm{habitat},~\mathrm{odor},~\mathrm{population},~\text{spore-print-color}\} \]
xx = sset([VarStr(s) for s in ["habitat","odor","population","spore-print-color"]])
len(xx)
4
The label entropy, $\mathrm{lent}(A,X,V_{\mathrm{l}})$, rounds to zero,
lent(aa,xx,vvl)
-8.881784197001252e-16
That is, there is a functional or causal relationship between the tuple, $X$, and the label variables, $(A\%X)^{\mathrm{FS}} \to (A\%V_{\mathrm{l}})^{\mathrm{FS}}$.
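The small negative value is floating-point noise; for instance, one might treat label entropies within a small tolerance as zero (an illustrative convention, not from the repositories):
# Treat label entropies within floating-point noise of zero as exactly zero.
abs(lent(aa, xx, vvl)) < 1e-9
# True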
This tuple has a volume of $|X^{\mathrm{C}}| = 3402$,
vol(uu,xx)
3402
but classifies the sample into only $|(A~\%~(X \cup V_{\mathrm{l}}))^{\mathrm{F}}| = |(A\%X)^{\mathrm{F}}| = 96$ effective states or slices,
rpln(aall(red(aa,xx|vvl)))
# ({(edible, edible), (habitat, grasses), (odor, almond), (population, numerous), (spore-print-color, black)}, 32 % 1)
# ({(edible, edible), (habitat, grasses), (odor, almond), (population, numerous), (spore-print-color, brown)}, 32 % 1)
# ({(edible, edible), (habitat, grasses), (odor, almond), (population, scattered), (spore-print-color, black)}, 44 % 1)
# ...
# ({(edible, poisonous), (habitat, woods), (odor, musty), (population, clustered), (spore-print-color, white)}, 36 % 1)
# ({(edible, poisonous), (habitat, woods), (odor, none), (population, several), (spore-print-color, white)}, 32 % 1)
# ({(edible, poisonous), (habitat, woods), (odor, spicy), (population, several), (spore-print-color, white)}, 192 % 1)
size(eff(red(aa,xx|vvl)))
# 96 % 1
rpln(ssplit(vvk,states(red(aa,xx|vvl))))
# ({(habitat, grasses), (odor, almond), (population, numerous), (spore-print-color, black)}, {(edible, edible)})
# ({(habitat, grasses), (odor, almond), (population, numerous), (spore-print-color, brown)}, {(edible, edible)})
# ({(habitat, grasses), (odor, almond), (population, scattered), (spore-print-color, black)}, {(edible, edible)})
# ...
# ({(habitat, woods), (odor, none), (population, solitary), (spore-print-color, chocolate)}, {(edible, edible)})
# ({(habitat, woods), (odor, none), (population, solitary), (spore-print-color, white)}, {(edible, edible)})
# ({(habitat, woods), (odor, spicy), (population, several), (spore-print-color, white)}, {(edible, poisonous)})
Let us consider whether a predictive tuple exists that excludes odor. Let $V_{\mathrm{k2}} = V_{\mathrm{k}} \setminus \{\mathrm{odor}\}$,
vvk2 = vvk - sset([odor])
The reduced sample excluding odor is $A_2 = A~\%~(V_{\mathrm{k2}} \cup V_{\mathrm{l}})$.
Repeat the conditional entropy minimisation, but with the reduced sample,
\[
\{(\mathrm{lent}(A_2,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A_2,\mathrm{L}}))\}
\]
def hrhrred(hr,vv):
    return setVarsHistoryRepasHistoryRepaReduced(vv,hr)
hh2 = hrhrred(hh,vvk2|vvl)
(kmax,omax,qmax) = (1, 5, 5)
rpln(buildcondrr(vvl,hh2,kmax,omax,qmax))
# (0.3593018375305004, {spore-print-color})
# (0.4034743011923396, {gill-color})
# (0.4720653823411449, {ring-type})
# (0.49514434957356646, {stalk-surface-above-ring})
# (0.504038208263087, {stalk-surface-below-ring})
(kmax,omax,qmax) = (4, 5, 5)
rpln(buildcondrr(vvl,hh2,kmax,omax,qmax))
# (0.0, {bruises, gill-size, spore-print-color, stalk-root})
# (0.012694197540795926, {population, spore-print-color, stalk-root, stalk-shape})
# (0.012695876064318767, {cap-surface, gill-size, spore-print-color, stalk-root})
# (0.013792239141938278, {cap-surface, spore-print-color, stalk-root, stalk-shape})
# (0.015268666719201462, {bruises, spore-print-color, stalk-root, stalk-shape})
In fact, there is another tetra-variate tuple that is causal or predictive of edibility. Let this tuple be $Y$, \[ Y~=~\{\mathrm{bruises},~\text{gill-size},~\text{spore-print-color},~\text{stalk-root}\} \]
yy = sset([VarStr(s) for s in ["bruises","gill-size","spore-print-color","stalk-root"]])
len(yy)
4
lent(aa,yy,vvl)
0.0
That is, there is a functional or causal relationship between the tuple and the label variables, $(A\%Y)^{\mathrm{FS}} \to (A\%V_{\mathrm{l}})^{\mathrm{FS}}$.
This tuple has a smaller volume of $|Y^{\mathrm{C}}| = 180$,
vol(uu,yy)
180
and classifies the sample into only $|(A\%Y)^{\mathrm{F}}| = 33$ effective states or slices,
rpln(aall(red(aa,yy|vvl)))
# ({(bruises, bruises), (edible, edible), (gill-size, broad), (spore-print-color, black), (stalk-root, bulbous)}, 864 % 1)
# ({(bruises, bruises), (edible, edible), (gill-size, broad), (spore-print-color, black), (stalk-root, club)}, 256 % 1)
# ({(bruises, bruises), (edible, edible), (gill-size, broad), (spore-print-color, black), (stalk-root, rooted)}, 96 % 1)
# ...
# ({(bruises, no), (edible, poisonous), (gill-size, narrow), (spore-print-color, brown), (stalk-root, bulbous)}, 96 % 1)
# ({(bruises, no), (edible, poisonous), (gill-size, narrow), (spore-print-color, white), (stalk-root, club)}, 8 % 1)
# ({(bruises, no), (edible, poisonous), (gill-size, narrow), (spore-print-color, white), (stalk-root, missing)}, 1760 % 1)
size(eff(red(aa,yy|vvl)))
# 33 % 1
rpln(ssplit(vvk,states(red(aa,yy|vvl))))
# ({(bruises, bruises), (gill-size, broad), (spore-print-color, black), (stalk-root, bulbous)}, {(edible, edible)})
# ({(bruises, bruises), (gill-size, broad), (spore-print-color, black), (stalk-root, club)}, {(edible, edible)})
# ({(bruises, bruises), (gill-size, broad), (spore-print-color, black), (stalk-root, rooted)}, {(edible, edible)})
# ...
# ({(bruises, no), (gill-size, narrow), (spore-print-color, white), (stalk-root, bulbous)}, {(edible, edible)})
# ({(bruises, no), (gill-size, narrow), (spore-print-color, white), (stalk-root, club)}, {(edible, poisonous)})
# ({(bruises, no), (gill-size, narrow), (spore-print-color, white), (stalk-root, missing)}, {(edible, poisonous)})
The tuples only share one variable, $X \cap Y$,
xx & yy
# {spore-print-color}
We can continue on by excluding spore-print-color,
vvk3 = vvk2 - sset([VarStr("spore-print-color")])
hh3 = hrhrred(hh,vvk3|vvl)
(kmax,omax,qmax) = (1, 5, 5)
rpln(buildcondrr(vvl,hh3,kmax,omax,qmax))
# (0.4034743011923396, {gill-color})
# (0.4720653823411449, {ring-type})
# (0.49514434957356646, {stalk-surface-above-ring})
# (0.504038208263087, {stalk-surface-below-ring})
# (0.5165490296209985, {stalk-color-above-ring})
(kmax,omax,qmax) = (5, 5, 20)
rpln(buildcondrr(vvl,hh3,kmax,omax,qmax))
# (0.0, {bruises, cap-color, gill-color, habitat, stalk-root})
# (0.0, {bruises, cap-color, habitat, ring-type, stalk-root})
# (0.0, {bruises, gill-color, habitat, ring-type, stalk-root})
# (0.0, {bruises, gill-color, habitat, stalk-color-below-ring, stalk-root})
# (0.0, {bruises, gill-color, habitat, stalk-root, stalk-surface-above-ring})
# (0.0031735493851989816, {bruises, gill-color, habitat, stalk-root})
# (0.004134518654214325, {bruises, habitat, ring-type, stalk-root})
# (0.022964087272247635, {habitat, ring-type, stalk-root, stalk-shape})
# ...
Now the tuple dimension is 5, but there are several variations.
We can see that there are multiple subsets of the query variables, $V_{\mathrm{k}}$, not necessarily including either odor or spore-print-color, that can predict the label variables or edibility, $V_{\mathrm{l}} = \{\mathrm{edible}\}$. For example, the tuple $X \subset V_{\mathrm{k}}$ or the tuple $Y \subset V_{\mathrm{k}}$.
Predicting odor without modelling
Now consider if there are tuples that can predict variable odor rather than variable edible. Let $V_{\mathrm{l2}} = \{\mathrm{odor}\}$,
vvl2 = sset([odor])
The entropy is $\mathrm{entropy}(A\%V_{\mathrm{l2}})$,
ent(red(aa,vvl2))
1.6076955835943616
The label entropy is $\mathrm{lent}(A,V_{\mathrm{k2}},V_{\mathrm{l2}})$,
lent(aa,vvk2,vvl2)
0.3019349802152096
The label entropy is non-zero, so odor cannot be perfectly predicted even with all of the query variables.
If we add edible to the query variables, the odor is still ambiguous, $\mathrm{lent}(A,~V \setminus V_{\mathrm{l2}},~V_{\mathrm{l2}}) > 0$,
lent(aa,vv-vvl2,vvl2)
0.3019349802152096
A tetra-variate tuple obtains most of what causality there is,
hh4 = hrhrred(hh,vvk2|vvl2)
(kmax,omax,qmax) = (1, 5, 5)
rpln(buildcondrr(vvl2,hh4,kmax,omax,qmax))
# (0.9477982525833593, {spore-print-color})
# (1.0064100040408621, {gill-color})
# (1.0284970138083624, {stalk-root})
# (1.083608451873597, {ring-type})
# (1.2126735369913884, {cap-color})
(kmax,omax,qmax) = (4, 5, 5)
rpln(buildcondrr(vvl2,hh4,kmax,omax,qmax))
# (0.3146291777557688, {cap-color, spore-print-color, stalk-root, stalk-shape})
# (0.31607649976978536, {cap-surface, spore-print-color, stalk-root, stalk-shape})
# (0.32162256927776056, {bruises, gill-size, spore-print-color, stalk-root})
# (0.323061152914073, {cap-surface, gill-size, spore-print-color, stalk-root})
# (0.3233325116314498, {gill-size, habitat, spore-print-color, stalk-shape})
(kmax,omax,qmax) = (6, 5, 5)
rpln(buildcondrr(vvl2,hh4,kmax,omax,qmax))
# (0.3019349802149671, {cap-surface, gill-color, gill-size, spore-print-color, stalk-root, stalk-shape})
# (0.301934980214968, {cap-color, cap-surface, spore-print-color, stalk-color-above-ring, stalk-root, stalk-shape})
# (0.3019349802149689, {cap-color, cap-surface, spore-print-color, stalk-color-below-ring, stalk-root, stalk-shape})
# (0.3019349802149698, {bruises, cap-color, cap-surface, spore-print-color, stalk-root, stalk-shape})
# (0.3019349802149698, {cap-color, cap-surface, gill-size, spore-print-color, stalk-root, stalk-shape})
Instead of measuring the predictability of odor by label entropy, we can measure the label modal size, \[ \begin{eqnarray} \sum_{R \in (A\%K)^{\mathrm{FS}}} \mathrm{maxr}(A * \{R\}^{\mathrm{U}}~\%~(V \setminus K)) \end{eqnarray} \] More generally, define \[ \begin{eqnarray} \mathrm{lmodal}(A,W,V_{\mathrm{l}})~:=~\sum_{R \in (A\%W)^{\mathrm{FS}}} \mathrm{maxr}(A~\%~(W \cup V_{\mathrm{l}}) * \{R\}^{\mathrm{U}}~\%~V_{\mathrm{l}}) \end{eqnarray} \] The tuple $Z$ is defined \[ Z~=~\{\text{cap-color},~\text{spore-print-color},~\text{stalk-root},~\text{stalk-shape}\} \]
zz = sset([VarStr(s) for s in ["cap-color","spore-print-color","stalk-root","stalk-shape"]])
The label entropy fraction of tuple $Z$ is $1 - \mathrm{lent}(A,Z,V_{\mathrm{l2}})/\mathrm{entropy}(A\%V_{\mathrm{l2}})$,
lent(aa,zz,vvl2)
0.3146291777557706
ent(red(aa,vvl2))
1.6076955835943616
1.0 - 0.3146291777557706/1.6076955835943616
0.8042980394009996
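This fraction recurs below, so it can be wrapped in a small convenience function; a sketch, not part of the repositories. Note that some later fractions in this section are taken against $\ln |V_{\mathrm{l}}^{\mathrm{C}}|$ rather than the label entropy.
def lentfrac(aa, ww, vvl):
    # Fraction of the label entropy removed by conditioning on the tuple ww.
    return 1.0 - lent(aa, ww, vvl) / ent(red(aa, vvl))
lentfrac(aa, zz, vvl2)
0.8042980394009996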
To calculate the label modal size, setVarsHistogramsSliceModal is defined in module Alignment,
setVarsHistogramsSliceModal :: Set.Set Variable -> Histogram -> Rational
The label modal size fraction of tuple $Z$ is $\mathrm{lmodal}(A,Z,V_{\mathrm{l2}})/\mathrm{size}(A\%V_{\mathrm{l2}})$,
def lmodal(aa,ww,vvl):
    return setVarsHistogramsSliceModal(ww,red(aa,ww|vvl))
lmodal(aa,zz,vvl2)
# 6524 % 1
size(red(aa,vvl2))
# 8124 % 1
6524.0/8124.0
0.8030526834071886
Both measures can be interpreted as implying an odor prediction accuracy of around 80%.
We can analyse the components containing ambiguous values for variable odor. Define
\[
\begin{eqnarray}
\mathrm{lslices}(A,W,V_{\mathrm{l}})~:=~\{(R,~A~\%~(W \cup V_{\mathrm{l}}) * \{R\}^{\mathrm{U}}) : R \in (A\%W)^{\mathrm{FS}}\}
\end{eqnarray}
\]
def lslicesll(aa,ww,vvl):
    return list(setVarsHistogramsSlices(ww,red(aa,ww|vvl)).items())
Then \[ \begin{eqnarray} \{C' : (R,C) \in \mathrm{lslices}(A,Z,V_{\mathrm{l2}}),~C' = C\%V_{\mathrm{l2}},~\mathrm{size}(C'^{\mathrm{F}}) > 1\} \end{eqnarray} \]
rpln([cc1 for (rr,cc) in lslicesll(aa,zz,vvl2) for cc1 in [red(cc,vvl2)] if size(eff(cc1)) > 1])
# {({(odor, none)}, 24 % 1), ({(odor, pungent)}, 64 % 1)}
# {({(odor, almond)}, 24 % 1), ({(odor, anise)}, 24 % 1)}
# {({(odor, none)}, 24 % 1), ({(odor, pungent)}, 64 % 1)}
# {({(odor, almond)}, 24 % 1), ({(odor, anise)}, 24 % 1)}
# {({(odor, fishy)}, 288 % 1), ({(odor, foul)}, 288 % 1), ({(odor, spicy)}, 288 % 1)}
# {({(odor, fishy)}, 288 % 1), ({(odor, foul)}, 288 % 1), ({(odor, spicy)}, 288 % 1)}
# {({(odor, almond)}, 64 % 1), ({(odor, anise)}, 64 % 1)}
# {({(odor, almond)}, 12 % 1), ({(odor, anise)}, 12 % 1)}
# {({(odor, almond)}, 64 % 1), ({(odor, anise)}, 64 % 1)}
# {({(odor, almond)}, 12 % 1), ({(odor, anise)}, 12 % 1)}
# {({(odor, almond)}, 64 % 1), ({(odor, anise)}, 64 % 1)}
# {({(odor, almond)}, 24 % 1), ({(odor, anise)}, 24 % 1)}
# {({(odor, almond)}, 12 % 1), ({(odor, anise)}, 12 % 1)}
# {({(odor, almond)}, 64 % 1), ({(odor, anise)}, 64 % 1)}
# {({(odor, almond)}, 24 % 1), ({(odor, anise)}, 24 % 1)}
# {({(odor, almond)}, 12 % 1), ({(odor, anise)}, 12 % 1)}
We can see that in some of the components the size of each value is duplicated. For example, in the last case the values almond and anise both have a component size of 12,
rr = [rr for (rr,cc) in lslicesll(aa,zz,vvl2) for cc1 in [red(cc,vvl2)] if size(eff(cc1)) > 1][-1]
rr
# {(cap-color, yellow), (spore-print-color, purple), (stalk-root, bulbous), (stalk-shape, tapering)}
Then $A * \{R\}^{\mathrm{U}}~\%~(Z \cup V_{\mathrm{l2}})$ is
rpln(aall(red(mul(aa,single(rr,1)),zz|vvl2)))
# ({(cap-color, yellow), (odor, almond), (spore-print-color, purple), (stalk-root, bulbous), (stalk-shape, tapering)}, 12 % 1)
# ({(cap-color, yellow), (odor, anise), (spore-print-color, purple), (stalk-root, bulbous), (stalk-shape, tapering)}, 12 % 1)
size(eff(mul(aa,single(rr,1))))
# 24 % 1
This duplication probably arises from the method used in the construction of the hypothetical mushroom samples.
As mentioned above, edibility is also somewhat predictive of odor,
edible = VarStr("edible")
The label entropy fraction is $1 - \mathrm{lent}(A,\{\mathrm{edible}\},\{\mathrm{odor}\})/\mathrm{entropy}(A\%\{\mathrm{odor}\})$,
lent(aa,sset([edible]),sset([odor]))
0.9796522676447261
ent(red(aa,sset([odor])))
1.6076955835943616
1.0 - 0.9796522676447261/1.6076955835943616
0.39064815650330065
The label modal size fraction is $\mathrm{lmodal}(A,\{\mathrm{edible}\},\{\mathrm{odor}\})/\mathrm{size}(A\%\{\mathrm{odor}\})$,
lmodal(aa,sset([edible]),sset([odor]))
# 5568 % 1
size(red(aa,sset([odor])))
# 8124 % 1
5568.0/8124.0
0.6853766617429837
but the odor prediction accuracy is lower: around 40% by the label entropy fraction and around 70% by the label modal size fraction.
Manual modelling of edibility
Having seen that edibility is predicted by various subsets of the substrate, $V$, consider if a model can do this in a more concise way.
There are some rules for poisonous mushrooms from most general to most specific:
P_1) odor=NOT(almond.OR.anise.OR.none)
120 poisonous cases missed, 98.52% accuracy
P_2) spore-print-color=green
48 cases missed, 99.41% accuracy
P_3) odor=none.AND.stalk-surface-below-ring=scaly.AND.
(stalk-color-above-ring=NOT.brown)
8 cases missed, 99.90% accuracy
P_4) habitat=leaves.AND.cap-color=white
100% accuracy
Rule P_4) may also be
P_4') population=clustered.AND.cap-color=white
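As a cross-check, the rules can also be written directly as predicates over an event; a minimal sketch, assuming an event is represented as a plain dict from variable names to value names. This is for illustration only and is not how the fud of transforms below is constructed.
def p1(e):
    # P_1: poisonous if odor is not almond, anise or none.
    return e["odor"] not in ("almond", "anise", "none")
def p2(e):
    # P_2: poisonous if spore-print-color is green.
    return e["spore-print-color"] == "green"
def p3(e):
    # P_3: poisonous if no odor, scaly stalk surface below the ring,
    # and stalk colour above the ring is not brown.
    return (e["odor"] == "none"
            and e["stalk-surface-below-ring"] == "scaly"
            and e["stalk-color-above-ring"] != "brown")
def p4(e):
    # P_4: poisonous if habitat is leaves and cap colour is white.
    return e["habitat"] == "leaves" and e["cap-color"] == "white"
def poisonous(e):
    # Apply the rules together, most general first.
    return p1(e) or p2(e) or p3(e) or p4(e)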
We have created a fud of transforms for each of these rules in MUSH_model_manual.json (see Manual model construction).
First, load the model $G_{\mathrm{m}}$,
ggm = persistentsFud(json.load(open('./MUSH_model_manual.json', 'r')))
uu1 = uunion(uu,fsys(ggm))
The model has 4 derived variables, $W_{\mathrm{m}} = \mathrm{der}(G_{\mathrm{m}})$,
fder(ggm)
# {p1, p2, p3, p4}
and a derived volume, $|W_{\mathrm{m}}^{\mathrm{C}}|$, of 16,
vol(uu1,fder(ggm))
16
The model has 6 underlying variables, $V_{\mathrm{m}} = \mathrm{und}(G_{\mathrm{m}})$,
fund(ggm)
# {cap-color, habitat, odor, spore-print-color, stalk-color-above-ring, stalk-surface-below-ring}
The underlying volume, $|V_{\mathrm{m}}^{\mathrm{C}}|$, is
vol(uu1,fund(ggm))
204120
Let the derived be $A' = A * G_{\mathrm{m}}^{\mathrm{T}}$. The derived alignment, $\mathrm{algn}(A')$, is
aa1 = red(fmul(aa,ggm),fder(ggm))
algn(aa1)
69.86642836898682
The derived variables are only weakly aligned. Furthermore, they are overlapped, $\mathrm{overlap}(G_{\mathrm{m}}^{\mathrm{T}})$,
fudsOverlap(ggm)
# True
so the content derived alignment, $\mathrm{algn}(A * G_{\mathrm{m}}^{\mathrm{T}}) - \mathrm{algn}(A^{\mathrm{X}} * G_{\mathrm{m}}^{\mathrm{T}})$, would be lower still.
The derived entropy, $\mathrm{entropy}(A')$, is
ent(aa1)
0.7711287134449115
This may be compared to the logarithm of the derived volume, $\ln |W_{\mathrm{m}}^{\mathrm{C}}|$,
w = vol(uu1,fder(ggm))
log(w)
2.772588722239781
Let the cartesian derived be $V_{\mathrm{m}}^{\mathrm{C}'} = V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}}$. The cartesian derived entropy, $\mathrm{entropy}(V_{\mathrm{m}}^{\mathrm{C}'})$, depends on the underlying cartesian, $V_{\mathrm{m}}^{\mathrm{C}}$, but the underlying volume, $|V_{\mathrm{m}}^{\mathrm{C}}|$, is quite large, so we calculate the cartesian derived entropy by constructing a HistoryRepa,
hvvg = aahr(uu1,unit(cart(uu1,fund(ggm))))
hrsize(hvvg)
204120
vvc1 = hhaa(hrhh(uu1,hrhrred(hrfmul(uu1,ggm,hvvg),fder(ggm))))
ent(vvc1)
1.1482395879784482
The cartesian derived entropy is greater than the derived entropy, $\mathrm{entropy}(V_{\mathrm{m}}^{\mathrm{C}'}) > \mathrm{entropy}(A')$.
The size-volume scaled component size cardinality sum relative entropy is the size-volume scaled component size cardinality sum cross entropy minus the size-volume scaled component size cardinality sum entropy (Transform entropy), \[ \begin{eqnarray} (z+v_{\mathrm{m}}) \times \mathrm{entropy}(A * G_{\mathrm{m}}^{\mathrm{T}} + V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}}) - z \times \mathrm{entropy}(A * G_{\mathrm{m}}^{\mathrm{T}}) - v_{\mathrm{m}} \times \mathrm{entropy}(V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}}) \end{eqnarray} \]
z = size(aa1)
v = size(vvc1)
(z+v) * ent(add(aa1,vvc1)) - z * ent(aa1) - v * ent(vvc1)
1663.472301909118
(z+v) * log(w)
588465.3207630601
Define the abbreviation rent for the size-volume scaled component size cardinality sum relative entropy,
\[
\begin{eqnarray}
\mathrm{rent}(A,B)~:=~(z_A+z_B) \times \mathrm{entropy}(A + B) - z_A \times \mathrm{entropy}(A) - z_B \times \mathrm{entropy}(B)
\end{eqnarray}
\]
def rent(aa,bb):
    a = size(aa)
    b = size(bb)
    return (a+b) * ent(add(aa,bb)) - a * ent(aa) - b * ent(bb)
Then the relative entropy is $\mathrm{rent}(A',V_{\mathrm{m}}^{\mathrm{C}'})$,
rent(aa1,vvc1)
1663.472301909118
Like the derived alignment, the relative entropy is quite low. These statistics are interesting because both give us a measure of the likelihood of the model. This is especially the case for the size-volume scaled component size cardinality sum relative entropy, $\mathrm{rent}(A * G_{\mathrm{m}}^{\mathrm{T}},V_{\mathrm{m}}^{\mathrm{C}} * G_{\mathrm{m}}^{\mathrm{T}})$, which is discussed in the ‘Induction with model’ section of the Overview of the paper.
In the discussion of induced models below the underlying volumes are impracticably large so let us approximate the relative entropy by using a volume sized shuffle. We constructed a shuffle, $A_{\mathrm{r}}$, earlier when discussing tuples in the substrate,
aar = hhaa(hrhh(uu,hhr))
size(aar)
# 8124 % 1
We will calculate the size-volume-sized-shuffle relative entropy, \[ \begin{eqnarray} (z+v_{\mathrm{m}}) \times \mathrm{ent}(A * G_{\mathrm{m}}^{\mathrm{T}} + Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}}) - z \times \mathrm{ent}(A * G_{\mathrm{m}}^{\mathrm{T}}) - v_{\mathrm{m}} \times \mathrm{ent}(Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}}) \end{eqnarray} \] where $v_{\mathrm{m}} = |V_{\mathrm{m}}^{\mathrm{C}}|$ and $Z_{\mathrm{m}} = \mathrm{scalar}(v_{\mathrm{m}})$.
Let the shuffle derived be $A_{\mathrm{r}}' = A_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}}$,
aar1 = red(fmul(aar,ggm),fder(ggm))
The shuffle derived alignment, $\mathrm{algn}(A_{\mathrm{r}}')$, is expected to be low,
algn(aar1)
66.88979584247136
The volume sized shuffle derived entropy, $\mathrm{entropy}(Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}')$, is
ent(resize(v,aar1))
0.8671533652420039
and the size-volume-sized-shuffle relative entropy, $\mathrm{rent}(A',~Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}')$, is
rent(aa1,resize(v,aar1))
146.8218190143234
We can see that the size-volume-sized-shuffle relative entropy, $\mathrm{rent}(A',~Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}')$, is lower than the size-volume relative entropy, $\mathrm{rent}(A',V_{\mathrm{m}}^{\mathrm{C}'})$. This is because the volume sized shuffle, $Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}}$, is less uniform than the cartesian, $V_{\mathrm{m}}^{\mathrm{C}}$, and tends to synchronise with the sample, $A$. However, the size-volume-sized-shuffle relative entropy provides us with a measure of the likelihood of the model.
Now apply the model to the sample. Let $B = A * \mathrm{his}(G_{\mathrm{m}}^{\mathrm{T}})$,
bb = fmul(aa,ggm)
rpln(aall(red(bb,fder(ggm)|vvl)))
# ({(edible, edible), (p1, 0), (p2, 0), (p3, 0), (p4, 0)}, 4208 % 1)
# ({(edible, poisonous), (p1, 0), (p2, 0), (p3, 0), (p4, 1)}, 8 % 1)
# ({(edible, poisonous), (p1, 0), (p2, 0), (p3, 1), (p4, 0)}, 40 % 1)
# ({(edible, poisonous), (p1, 0), (p2, 1), (p3, 0), (p4, 0)}, 72 % 1)
# ({(edible, poisonous), (p1, 1), (p2, 0), (p3, 0), (p4, 0)}, 3796 % 1)
size(eff(red(bb,fder(ggm)|vvl)))
# 5 % 1
rpln(ssplit(fder(ggm),states(red(bb,fder(ggm)|vvl))))
# ({(p1, 0), (p2, 0), (p3, 0), (p4, 0)}, {(edible, edible)})
# ({(p1, 0), (p2, 0), (p3, 0), (p4, 1)}, {(edible, poisonous)})
# ({(p1, 0), (p2, 0), (p3, 1), (p4, 0)}, {(edible, poisonous)})
# ({(p1, 0), (p2, 1), (p3, 0), (p4, 0)}, {(edible, poisonous)})
# ({(p1, 1), (p2, 0), (p3, 0), (p4, 0)}, {(edible, poisonous)})
We can see that together the rules P1-4 are functionally or causally related to edibility, $(B\%W_{\mathrm{m}})^{\mathrm{FS}} \to (B\%V_{\mathrm{l}})^{\mathrm{FS}}$. In addition, there are only 5 effective states of 16 derived states, so the model, $G_{\mathrm{m}}$, might be said to be more concise than the tuples $X$ and $Y$ of the non-modelled case above.
The model entropy is similar to the slice entropy of the non-modelled case. The model’s label entropy or query conditional entropy is zero, $\mathrm{lent}(B,W_{\mathrm{m}},V_{\mathrm{l}}) = 0$.
[p1,p2,p3,p4] = map(VarStr,["p1","p2","p3","p4"])
lent(bb,sset([p1,p2,p3,p4]),vvl)
0.0
lent(bb,sset([p1]),vvl)
0.0675240166808252
lent(bb,sset([p2]),vvl)
0.6859909862350153
lent(bb,sset([p3]),vvl)
0.6888949375684059
lent(bb,sset([p4]),vvl)
0.6917819609609287
lent(bb,sset([p1,p2]),vvl)
0.03237355156509225
lent(bb,sset([p1,p2,p3]),vvl)
0.007155343359194988
Rule P1 is far more predictive of edibility than the other rules, having a label entropy, $\mathrm{lent}(B,\{\mathrm{p}_1\},V_{\mathrm{l}})$, of only 0.0675240166808252, whereas the other rules are close to the maximum label entropy. The label entropy fraction is $1 - \mathrm{lent}(B,\{\mathrm{p}_1\},V_{\mathrm{l}})/\ln |V_{\mathrm{l}}^{\mathrm{C}}|$,
vol(uu,vvl)
2
log(2)
0.6931471805599453
1.0 - 0.0675240166808252/0.6931471805599453
0.9025834359936699
The label modal size fraction is $\mathrm{lmodal}(B,\{\mathrm{p}_1\},V_{\mathrm{l}})/\mathrm{size}(B\%V_{\mathrm{l}})$,
lmodal(bb,sset([p1]),vvl)
# 8004 % 1
size(red(bb,vvl))
# 8124 % 1
8004.0/8124.0
0.9852289512555391
As noted above, odor is highly predictive of edibility,
lent(aa,sset([odor]),vvl)
0.06445777995546464
lmodal(aa,sset([odor]),vvl)
# 8004 % 1
rpln(aall(red(aa,sset([odor])|vvl)))
# ({(edible, edible), (odor, almond)}, 400 % 1)
# ({(edible, edible), (odor, anise)}, 400 % 1)
# ({(edible, edible), (odor, none)}, 3408 % 1)
# ({(edible, poisonous), (odor, creosote)}, 192 % 1)
# ({(edible, poisonous), (odor, fishy)}, 576 % 1)
# ({(edible, poisonous), (odor, foul)}, 2160 % 1)
# ({(edible, poisonous), (odor, musty)}, 36 % 1)
# ({(edible, poisonous), (odor, none)}, 120 % 1)
# ({(edible, poisonous), (odor, pungent)}, 256 % 1)
# ({(edible, poisonous), (odor, spicy)}, 576 % 1)
Variable p1 depends only on odor. Let $T_1 \in G_{\mathrm{m}}$ be such that $\mathrm{der}(T_1) = \{\mathrm{p}_1\}$. Then $\mathrm{und}(T_1) = \{\mathrm{odor}\}$,
tt1 = ffqq(fdep(ggm,sset([p1])))[0]
rpln(aall(red(mul(aa,ttaa(tt1)),der(tt1)|vvl)))
# ({(edible, edible), (p1, 0)}, 4208 % 1)
# ({(edible, poisonous), (p1, 0)}, 120 % 1)
# ({(edible, poisonous), (p1, 1)}, 3796 % 1)
und(tt1)
# {odor}
rpln(states(ttaa(tt1)))
# {(odor, almond), (p1, 0)}
# {(odor, anise), (p1, 0)}
# {(odor, creosote), (p1, 1)}
# {(odor, fishy), (p1, 1)}
# {(odor, foul), (p1, 1)}
# {(odor, musty), (p1, 1)}
# {(odor, none), (p1, 0)}
# {(odor, pungent), (p1, 1)}
# {(odor, spicy), (p1, 1)}
rpln(aall(red(mul(aa,ttaa(tt1)),tvars(tt1)|vvl)))
# ({(edible, edible), (odor, almond), (p1, 0)}, 400 % 1)
# ({(edible, edible), (odor, anise), (p1, 0)}, 400 % 1)
# ({(edible, edible), (odor, none), (p1, 0)}, 3408 % 1)
# ({(edible, poisonous), (odor, creosote), (p1, 1)}, 192 % 1)
# ({(edible, poisonous), (odor, fishy), (p1, 1)}, 576 % 1)
# ({(edible, poisonous), (odor, foul), (p1, 1)}, 2160 % 1)
# ({(edible, poisonous), (odor, musty), (p1, 1)}, 36 % 1)
# ({(edible, poisonous), (odor, none), (p1, 0)}, 120 % 1)
# ({(edible, poisonous), (odor, pungent), (p1, 1)}, 256 % 1)
# ({(edible, poisonous), (odor, spicy), (p1, 1)}, 576 % 1)
We can also consider how predictive the model is of odor. The label entropy fraction is $1 - \mathrm{lent}(B,W_{\mathrm{m}},V_{\mathrm{l2}})/\ln |V_{\mathrm{l2}}^{\mathrm{C}}|$,
vvl2
# {odor}
vol(uu,vvl2)
9
log(9)
2.1972245773362196
lent(bb,sset([p1,p2,p3,p4]),vvl2)
0.9136278428753103
1.0 - 0.9136278428753103/2.1972245773362196
0.584190049438216
The label modal size fraction is $\mathrm{lmodal}(B,W_{\mathrm{m}},V_{\mathrm{l2}})/\mathrm{size}(B\%V_{\mathrm{l2}})$,
lmodal(bb,sset([p1,p2,p3,p4]),vvl2)
# 5688 % 1
5688.0/8124.0
0.7001477104874446
The model, $G_{\mathrm{m}}$, is only 60-70% accurate with respect to odor, even though odor is in the underlying variables, $\mathrm{odor} \in V_{\mathrm{m}}$.
Induced modelling of edibility
Having considered a manually defined model of edibility, $G_{\mathrm{m}}$, now consider an unsupervised induced model $D$ on the query variables, $V_{\mathrm{k}}$, which exclude edibility. By unsupervised we mean an induced model that is optimised not to minimise the label entropy, nor to maximise the label modal size, but rather to maximise the summed alignment valency-density.
Then we shall analyse this model, $D$, to find a smaller submodel that predicts the label variables, $V_{\mathrm{l}}$, or edibility. That is, we shall search in the decomposition fud for a submodel that optimises conditional entropy.
Here the induced model is created by the limited-nodes highest-layer excluded-self maximum-roll-by-derived-dimension fud decomper, $(\cdot,D) = I_{P,U,\mathrm{D,F,mm,xs,d,f}}((V_{\mathrm{k}},A))$.
There are some examples of model induction in the MUSH repository.
First consider the fud decomposition MUSH_model17.json (see Model induction),
df = persistentsDecompFud_u(json.load(open('./MUSH_model17.json', 'r')))
uu1 = uunion(uu,fsys(dfff(df)))
len(uvars(uu1))
132
Let us examine the tree of the fud decomposition, \[ \begin{eqnarray} \{\{(S,~\mathrm{und}(F),~\mathrm{der}(F)) : (S,F) \in L\} : L \in \mathrm{paths}(D)\} \end{eqnarray} \]
rpln(treesPaths(funcsTreesMap(lambda xx:(xx[0],fund(xx[1]),fder(xx[1])),dfzz(df))))
# ...
The fud identifier is a VarInt that is set by the inducer as part of the naming convention of the derived variables,
\[
\begin{eqnarray}
\mathrm{fid}(F)~:=~f : ((f,\cdot),\cdot) \in \mathrm{der}(F)
\end{eqnarray}
\]
The decomposition tree contains 7 nodes with fud identifiers as follows,
\[
\begin{eqnarray}
\{\{\mathrm{fid}(F) : (\cdot,F) \in L\} : L \in \mathrm{paths}(D)\}
\end{eqnarray}
\]
def fid(ff):
    return variablesVariableFud(fder(ff)[0])
rpln(treesSubPaths(funcsTreesMap(lambda xx:fid(xx[1]),dfzz(df))))
# [1]
# [1, 2]
# [1, 2, 7]
# [1, 2, 9]
# [1, 2, 10]
# [1, 3]
# [1, 4]
Now consider the summed alignment and the summed alignment valency-density, $\mathrm{summation}(U_1,D,A)$,
(wmax,lmax,xmax,omax,bmax,mmax,umax,pmax,fmax,mult,seed) = ((9*9*10), 8, (9*9*10), 20, (20*3), 3, (9*9*10), 1, 10, 7, 5)
summation(mult,seed,uu1,df,hh)
(85780.45912794449, 37161.48267081803)
\[ \begin{eqnarray} \{(\mathrm{fid}(F),~z_C,~a) : ((S,F),(z_C,(a,a_{\mathrm{d}}))) \in \mathrm{nodes}(\mathrm{sumtree}(U_1,D,A))\} \end{eqnarray} \]
sumtree = systemsDecompFudsHistoryRepasTreeAlignmentContentShuffleSummation_u
rpln([(fid(ff),zc,a) for ((ss,ff),(zc,(a,ad))) in sumtree(mult,seed,uu1,df,hh).items()])
# (1, 8124, 39181.46354001778)
# (2, 3276, 14654.951059674358)
# (4, 1824, 2802.8249523523555)
# (3, 3024, 15354.177038855069)
# (9, 972, 3435.8013895566273)
# (10, 832, 3314.590501097839)
# (7, 1472, 7036.65064639045)
We can see that the root fud has the highest slice size and shuffle content derived alignment, while the leaf fuds have small slice sizes and shuffle content derived alignments.
The bare model is a fud decomposition. As noted in Conversion to fud, the tree of a fud decomposition is sometimes unwieldy, so consider the fud decomposition fud, $F = D^{\mathrm{F}} \in \mathcal{F}$ (see Practicable fud decomposition fud),
ff = systemsDecompFudsNullablePracticable(uu1,df,1)
uu2 = uunion(uu,fsys(ff))
len(uvars(uu2))
197
The model, $F$, has 56 derived variables, $W_F = \mathrm{der}(F)$, and a large derived volume, $|W_F^{\mathrm{C}}|$,
len(fder(ff))
56
fder(ff)
# {<<1,n>,1>, <<1,n>,2>, ... <<1,n>,7>, <<2,n>,1>, <<2,n>,2>, ... <<9,n>,8>, <<10,n>,1>, <<10,n>,2>, ... <<10,n>,7>}
vol(uu2,fder(ff))
2065214267056164664258854912
The model has 20 underlying variables, $V_F = \mathrm{und}(F)$,
len(fund(ff))
20
vv - fund(ff)
# {edible, veil-color, veil-type}
That is, the model depends on all of the substrate except for the label variable, edible, variable veil-color and mono-valent veil-type. This is consistent with the observation above that none of the substrate variables, except for veil-type, is independent of the others, and that veil-color is only weakly dependent.
The underlying volume, $|V_F^{\mathrm{C}}|$, is
vol(uu,fund(ff))
30474952704000
The derived entropy, $\mathrm{entropy}(A * F)$, is
aa1 = hhaa(hrhh(uu2,hrhrred(hrfmul(uu2,ff,hh),fder(ff))))
ent(aa1)
2.2056420385272157
This may be compared to the logarithm of the derived volume, $\ln |W_F^{\mathrm{C}}|$,
w = vol(uu2,fder(ff))
log(w)
62.89503149315568
So the derived entropy is quite low. This is because there are only 15 effective derived states,
size(eff(aa1))
# 15 % 1
rpln([c for (ss,c) in aall(aa1)])
# 188 % 1
# 48 % 1
# 8 % 1
# 56 % 1
# 384 % 1
# 288 % 1
# 288 % 1
# 288 % 1
# 256 % 1
# 704 % 1
# 768 % 1
# 1728 % 1
# 96 % 1
# 1296 % 1
# 1728 % 1
The cartesian derived entropy, $\mathrm{entropy}(V_F^{\mathrm{C}} * F)$, depends on the underlying cartesian, $V_F^{\mathrm{C}}$. The underlying volume is too large to compute, so we are unable to calculate the cartesian derived entropy or the component size cardinality sum relative entropy. Instead we can compute an approximation to the size-volume scaled component size independent sum relative entropy using a volume sized shuffle, \[ \begin{eqnarray} (z+v_F) \times \mathrm{ent}(A * F^{\mathrm{T}} + Z_F * \hat{A}_{\mathrm{r}} * F^{\mathrm{T}}) - z \times \mathrm{ent}(A * F^{\mathrm{T}}) - v_F \times \mathrm{ent}(Z_F * \hat{A}_{\mathrm{r}} * F^{\mathrm{T}}) \end{eqnarray} \] where $v_F = |V_F^{\mathrm{C}}|$ and $Z_F = \mathrm{scalar}(v_F)$.
aar = hhaa(hrhh(uu,hhr))
size(aar)
# 8124 % 1
def vsize(uu,xx,aa):
    return resize(vol(uu,xx),aa)
aar1 = hhaa(hrhh(uu2,hrhrred(hrfmul(uu2,ff,hhr),fder(ff))))
ent(vsize(uu,fund(ff),aar1))
4.693386848554536
rent(aa1,vsize(uu,fund(ff),aar1))
112112.4375
We can see that by this measure the relative entropy of the induced model, $\mathrm{rent}(A * F^{\mathrm{T}},~Z_F * \hat{A}_{\mathrm{r}} * F^{\mathrm{T}})$, is much higher than the relative entropy of the manual model, $\mathrm{rent}(A * G_{\mathrm{m}}^{\mathrm{T}},~Z_{\mathrm{m}} * \hat{A}_{\mathrm{r}} * G_{\mathrm{m}}^{\mathrm{T}})$. This is consistent with the derived alignment, $\mathrm{algn}(A * F)$, implied by the summed alignment, $\mathrm{summation}(U_1,D,A)$, which is also higher for the induced model.
Now apply the model to the sample. Let $B = A * \prod\mathrm{his}(F)$,
hhb = hrfmul(uu2,ff,hh)
rpln(aall(hhaa(hrhh(uu2,hrhrred(hhb,fder(ff)|vvl)))))
# ({(edible, edible), (<<1,n>,1>, 0), ... (<<10,n>,7>, null)}, 72 % 1)
# ...
# ({(edible, poisonous), (<<1,n>,1>, 1), ... (<<10,n>,7>, null)}, 1728 % 1)
size(eff(hhaa(hrhh(uu2,hrhrred(hhb,fder(ff)|vvl)))))
# 19 % 1
rpln(ssplit(fder(ff),states(hhaa(hrhh(uu2,hrhrred(hhb,fder(ff)|vvl))))))
# ({(<<1,n>,1>, 0), ... (<<10,n>,7>, null)}, {(edible, edible)})
# ...
# ({(<<1,n>,1>, 1), ... (<<10,n>,7>, null)}, {(edible, poisonous)})
The model derived variables, $W_F$, are almost causally related to edibility, $(B\%W_F)^{\mathrm{FS}} \to (B\%V_{\mathrm{l}})^{\mathrm{FS}}$. The model’s label entropy or query conditional entropy is near zero, $\mathrm{lent}(B,W_F,V_{\mathrm{l}}) \approx 0$,
def hrlent(uu,hh,ww,vvl):
    return ent(hhaa(hrhh(uu,hrhrred(hh,ww|vvl)))) - ent(hhaa(hrhh(uu,hrhrred(hh,ww))))
hrlent(uu2,hhb,fder(ff),vvl)
0.04488778006332694
rpln(sset([(hrlent(uu2,hhb,sset([w]),vvl),w) for w in fder(ff)]))
# (0.2361093658636677, <<1,n>,3>)
# (0.2361093658636677, <<1,n>,4>)
# ...
# (0.29101789445875514, <<1,n>,5>)
# (0.29101789445875526, <<3,n>,1>)
# ...
# (0.29101789445875537, <<3,n>,9>)
# (0.4844913448854371, <<2,n>,2>)
# ...
# (0.4844913448854371, <<2,n>,6>)
# (0.5143617442765938, <<4,n>,1>)
# ...
# (0.5143617442765938, <<4,n>,9>)
# (0.549068727770982, <<2,n>,1>)
# ...
# (0.549068727770982, <<2,n>,7>)
# (0.5546035042868757, <<7,n>,1>)
# ...
# (0.5546035042868757, <<7,n>,9>)
# (0.634616073600902, <<9,n>,1>)
# (0.634616073600902, <<9,n>,2>)
# (0.637373464649287, <<10,n>,1>)
# ...
# (0.6460129913993978, <<10,n>,6>)
# (0.6509225684291914, <<9,n>,7>)
# ...
# (0.6605611705755877, <<9,n>,6>)
We can see that the derived variables nearest the root fud tend to have the lowest label entropy. None have zero label entropy by themselves. Consider derived variable <<1,n>,4> in the root fud,
w1n4 = stringsVariable("<<1,n>,4>")
fund(fdep(ff,sset([w1n4])))
# {bruises, gill-color, gill-size, habitat, odor, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring}
hrlent(uu2,hhb,sset([w1n4]),vvl)
0.2361093658636677
rpln(aall(hhaa(hrhh(uu2,hrhrred(hhb,sset([w1n4])|vvl)))))
# ({(edible, edible), (<<1,n>,4>, 0)}, 2384 % 1)
# ({(edible, edible), (<<1,n>,4>, 1)}, 1824 % 1)
# ({(edible, poisonous), (<<1,n>,4>, 0)}, 892 % 1)
# ({(edible, poisonous), (<<1,n>,4>, 2)}, 3024 % 1)
rpln(ssplit(fder(ff),states(hhaa(hrhh(uu2,hrhrred(hhb,sset([w1n4])|vvl))))))
# ({(<<1,n>,4>, 0)}, {(edible, edible)})
# ({(<<1,n>,4>, 0)}, {(edible, poisonous)})
# ({(<<1,n>,4>, 1)}, {(edible, edible)})
# ({(<<1,n>,4>, 2)}, {(edible, poisonous)})
Now consider the label entropy for all of the fud variables, $\mathrm{vars}(F)$, not just the fud derived variables, $\mathrm{der}(F)$. We can determine minimum subsets of the query variables that are causal or predictive by using the repa conditional entropy tuple set builder. The conditional entropy minimisation searches for the set of tuples with the least label entropy. We show the resultant tuples along with their label entropies, \[ \{(\mathrm{lent}(B,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,B,\mathrm{L}}))\} \]
def buildcondrr(vvl,aa,kmax,omax,qmax):
    return sset([(b,a) for (a,b) in parametersBuilderConditionalVarsRepa(kmax,omax,qmax,vvl,aa).items()])
(kmax,omax,qmax) = (1, 5, 5)
rpln(buildcondrr(vvl,hhb,kmax,omax,qmax))
# (0.06445777995546442, {odor})
# (0.16474416484069354, {<<1,1>,31>})
# (0.16917391545419513, {<<1,1>,4>})
# (0.2263923811825881, {<<1,1>,49>})
# (0.23421502771371605, {<<1,2>,95>})
Variable <<1,1>,31> is nearly as predictive as variable odor. Variable <<1,1>,31> is in the bottom layer of fud 1 and is defined as follows:
{
  "derived": ["<<1,1>,31>"],
  "history": {
    "hsystem": [
      {"var": "odor", "values": ["almond", "anise", "creosote", "fishy", "foul", "musty", "none", "pungent", "spicy"]},
      {"var": "<<1,1>,31>", "values": ["0", "1", "2"]}
    ],
    "hstates": [
      [0, 0],
      [1, 0],
      [2, 0],
      [3, 1],
      [4, 1],
      [5, 1],
      [6, 2],
      [7, 0],
      [8, 1]
    ]
  }
}
or, equivalently, via the library,
w1131 = stringsVariable("<<1,1>,31>")
fund(fdep(ff,sset([w1131])))
# {odor}
rpln(states(ttaa(fdep(ff,sset([w1131]))[0])))
# {(odor, almond), (<<1,1>,31>, 0)}
# {(odor, anise), (<<1,1>,31>, 0)}
# {(odor, creosote), (<<1,1>,31>, 0)}
# {(odor, fishy), (<<1,1>,31>, 1)}
# {(odor, foul), (<<1,1>,31>, 1)}
# {(odor, musty), (<<1,1>,31>, 1)}
# {(odor, none), (<<1,1>,31>, 2)}
# {(odor, pungent), (<<1,1>,31>, 0)}
# {(odor, spicy), (<<1,1>,31>, 1)}
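The JSON definition above can also be read off directly; a minimal sketch, assuming the JSON text is held in a string tjson (a hypothetical name):
import json

# recover the odor value to component mapping from the transform JSON
tt = json.loads(tjson)
vals = tt["history"]["hsystem"][0]["values"]
{vals[i]: c for (i, c) in tt["history"]["hstates"]}
# {'almond': 0, 'anise': 0, 'creosote': 0, 'fishy': 1, 'foul': 1,
#  'musty': 1, 'none': 2, 'pungent': 0, 'spicy': 1}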
rpln(aall(hhaa(hrhh(uu2,hrhrred(hhb,sset([w1131])|vvl)))))
# ({(edible, edible), (<<1,1>,31>, 0)}, 800 % 1)
# ({(edible, edible), (<<1,1>,31>, 2)}, 3408 % 1)
# ({(edible, poisonous), (<<1,1>,31>, 0)}, 448 % 1)
# ({(edible, poisonous), (<<1,1>,31>, 1)}, 3348 % 1)
# ({(edible, poisonous), (<<1,1>,31>, 2)}, 120 % 1)
That is, underlying values almond, anise, creosote and pungent form a component, value none forms a singleton component, while the remaining values are in a third component. Underlying values creosote and pungent, however, are not relevant to edibility, so the label entropy is higher than for variable odor.
odor = VarStr("odor")
rpln(aall(hhaa(hrhh(uu2,hrhrred(hhb,sset([w1131,odor])|vvl)))))
# ({(edible, edible), (odor, almond), (<<1,1>,31>, 0)}, 400 % 1)
# ({(edible, edible), (odor, anise), (<<1,1>,31>, 0)}, 400 % 1)
# ({(edible, edible), (odor, none), (<<1,1>,31>, 2)}, 3408 % 1)
# ({(edible, poisonous), (odor, creosote), (<<1,1>,31>, 0)}, 192 % 1)
# ({(edible, poisonous), (odor, fishy), (<<1,1>,31>, 1)}, 576 % 1)
# ({(edible, poisonous), (odor, foul), (<<1,1>,31>, 1)}, 2160 % 1)
# ({(edible, poisonous), (odor, musty), (<<1,1>,31>, 1)}, 36 % 1)
# ({(edible, poisonous), (odor, none), (<<1,1>,31>, 2)}, 120 % 1)
# ({(edible, poisonous), (odor, pungent), (<<1,1>,31>, 0)}, 256 % 1)
# ({(edible, poisonous), (odor, spicy), (<<1,1>,31>, 1)}, 576 % 1)
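As a cross-check, both label entropies can be recovered in plain Python from the counts printed above, assuming ent is the per-event natural-log entropy:
from collections import defaultdict
from math import log

# joint counts of (odor, edible) taken from the output above
counts = {
    ('almond','edible'): 400, ('anise','edible'): 400,
    ('none','edible'): 3408, ('creosote','poisonous'): 192,
    ('fishy','poisonous'): 576, ('foul','poisonous'): 2160,
    ('musty','poisonous'): 36, ('none','poisonous'): 120,
    ('pungent','poisonous'): 256, ('spicy','poisonous'): 576}

# the value to component mapping of derived variable <<1,1>,31>
comp = {'almond': 0, 'anise': 0, 'creosote': 0, 'pungent': 0,
        'fishy': 1, 'foul': 1, 'musty': 1, 'spicy': 1, 'none': 2}

def condent(cc):
    # H(label | key) = H(key, label) - H(key), per event, natural log
    total = sum(cc.values())
    ent = lambda dd: -sum(c / total * log(c / total) for c in dd.values())
    marg = defaultdict(int)
    for (k, l), c in cc.items():
        marg[k] += c
    return ent(cc) - ent(marg)

merged = defaultdict(int)
for (o, l), c in counts.items():
    merged[(comp[o], l)] += c

condent(counts)   # ~0.0645, the label entropy of odor
condent(merged)   # ~0.1647, the label entropy of <<1,1>,31>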
Now optimise for larger tuples, excluding the substrate. Let $B_2 = B~\%~((\mathrm{vars}(F) \setminus V) \cup V_{\mathrm{l}})$. Then, \[ \{(\mathrm{lent}(B_2,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,B_2,\mathrm{L}}))\} \]
hhb2 = hrhrred(hhb,fvars(ff)-vv|vvl)
(kmax,omax,qmax) = (1, 10, 10)
rpln(buildcondrr(vvl,hhb2,kmax,omax,qmax))
# (0.16474416484069354, {<<1,1>,31>})
# (0.16917391545419513, {<<1,1>,4>})
# (0.2263923811825881, {<<1,1>,49>})
# (0.23421502771371605, {<<1,2>,95>})
# (0.2361093658636677, {<<1,n>,3>})
# (0.2361093658636677, {<<1,n>,4>})
# (0.2361093658636677, {<<1,n>,6>})
# (0.2361093658636677, {<<1,n>,7>})
# (0.2361093658636677, {<<1,2>,83>})
# (0.2361093658636677, {<<1,2>,85>})
(kmax,omax,qmax) = (2, 10, 20)
rpln(buildcondrr(vvl,hhb2,kmax,omax,qmax))
# (0.044122472082529285, {<<1,1>,31>, <<2,3>,4>})
# (0.04412247208252951, {<<1,1>,4>, <<2,n>,2>})
# (0.04412247208252951, {<<1,1>,4>, <<2,n>,3>})
# (0.04412247208252951, {<<1,1>,4>, <<2,n>,6>})
# (0.04412247208252951, {<<1,1>,31>, <<2,n>,2>})
# (0.04412247208252951, {<<1,1>,31>, <<2,n>,3>})
# (0.04412247208252951, {<<1,1>,31>, <<2,n>,6>})
# (0.04412247208252973, {<<1,1>,31>, <<2,2>,38>})
# (0.04566068343589369, {<<1,1>,31>, <<2,n>,1>})
# (0.04566068343589369, {<<1,1>,31>, <<2,n>,4>})
# (0.16474416484069354, {<<1,1>,31>})
# (0.16917391545419513, {<<1,1>,4>})
...
Continuing up to 6-tuples,
(kmax,omax,qmax) = (3, 10, 20)
rpln(buildcondrr(vvl,hhb2,kmax,omax,qmax))
# (0.022390255903229406, {<<1,1>,31>, <<2,2>,38>, <<9,1>,20>})
# (0.022390255903229406, {<<1,1>,31>, <<2,3>,4>, <<9,1>,20>})
...
(kmax,omax,qmax) = (4, 10, 20)
rpln(buildcondrr(vvl,hhb2,kmax,omax,qmax))
# (0.010993236700227893, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<9,n>,1>})
# (0.010993236700227893, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<9,n>,2>})
...
(kmax,omax,qmax) = (5, 10, 20)
rpln(buildcondrr(vvl,hhb2,kmax,omax,qmax))
# (0.002463822863309595, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>})
# (0.002463822863309595, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>})
...
(kmax,omax,qmax) = (6, 10, 20)
ll = buildcondrr(vvl,hhb2,kmax,omax,qmax)
rpln(ll)
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,1>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,2>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,1>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,2>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,1>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,2>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,1>})
# (0.0, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,2>})
# (0.002463822863309595, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>})
# (0.002463822863309595, {<<1,1>,4>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>})
# ...
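The escalation over tuple sizes can also be scripted; a minimal sketch using the buildcondrr helper defined above, stopping at the first size that yields a zero label entropy tuple:
# stop at the first tuple size with a (near) zero label entropy tuple;
# this leaves ll holding the kmax = 6 result used below
for kmax in range(1, 7):
    ll = buildcondrr(vvl,hhb2,kmax,10,20)
    (e, xx) = ll[0]
    print(kmax, e)
    if e < 1e-14:
        break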
Now we have found 10 tuples with zero label entropy, each perfectly predictive of edibility in the sample,
rpln([(xx,fund(fdep(ff,xx))) for (e,xx) in ll if e < 1e-14])
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>}, {bruises, gill-attachment, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>}, {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,1>}, {bruises, gill-attachment, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,2>}, {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,1>}, {bruises, gill-attachment, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,2>}, {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,1>}, {bruises, gill-attachment, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,2>}, {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,1>}, {bruises, gill-attachment, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
# ({<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,2>}, {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population, ring-number, ring-type, spore-print-color, stalk-root, stalk-shape, stalk-surface-below-ring})
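The ten tuples depend on nearly the same underlying variables. A sketch of their common underlying set, writing the intersection a & b as a - (a - b) so as to use only the set operations seen so far:
from functools import reduce

# the underlying query variables shared by all ten zero-entropy tuples
zz = [fund(fdep(ff,xx)) for (e,xx) in ll if e < 1e-14]
reduce(lambda a, b: a - (a - b), zz)
# {bruises, gill-color, gill-size, gill-spacing, habitat, odor, population,
#  ring-number, ring-type, spore-print-color, stalk-root, stalk-shape,
#  stalk-surface-below-ring}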
Let us sort by shuffle content derived alignment descending. Let $L = \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,B_2,\mathrm{L}}))$. Then calculate \[ \{(\mathrm{algn}(B\%X)-\mathrm{algn}(B_{\mathrm{r}}\%X),~X) : (e,X) \in L,~e \approx 0\} \] where $B_{\mathrm{r}} = A_{\mathrm{r}} * \prod\mathrm{his}(F)$,
hhbr = hrfmul(uu2,ff,hhr)
hrsize(hhbr)
# 8124
rpln(reversed(list(sset([(algn(aa1)-algn(aar1),xx) for (e,xx) in ll if e < 1e-14 for aa1 in [hhaa(hrhh(uu2,hrhrred(hhb,xx)))] for aar1 in [hhaa(hrhh(uu2,hrhrred(hhbr,xx)))]]))))
# (29472.345908137817, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,2>})
# (29468.238375651046, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,1>})
# (29280.163931967294, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>})
# (29276.550876789777, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>})
# (28844.380866136813, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,2>})
# (28840.28394834463, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,1>})
# (28710.853209698478, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,2>})
# (28705.809106843124, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,1>})
# (28505.052905060125, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,2>})
# (28500.56028382371, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,1>})
and by size-volume-sized-shuffle relative entropy descending, \[ \{(\mathrm{rent}(B~\%~X,~Z_F * \hat{B}_{\mathrm{r}}~\%~X),~X) : (e,X) \in L,~e \approx 0\} \]
rpln(reversed(list(sset([(rent(aa1,vaar1),xx) for (e,xx) in ll if e < 1e-14 for aa1 in [hhaa(hrhh(uu2,hrhrred(hhb,xx)))] for vaar1 in [vsize(uu2,fund(fdep(ff,xx)),hhaa(hrhh(uu2,hrhrred(hhbr,xx))))]]))))
# (35025.78415060043, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,1>})
# (34267.05117201805, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>})
# (34138.91352367401, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,1>})
# (33342.31550860405, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,1>})
# (33271.06856274605, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,1>})
# (32560.13660311699, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,1>,18>, <<9,n>,2>})
# (31805.06520330906, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,2>})
# (31674.591724276543, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,3>, <<2,2>,48>, <<9,n>,2>})
# (30880.60635328293, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,6>, <<2,1>,18>, <<9,n>,2>})
# (30810.17875802517, {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,2>,48>, <<9,n>,2>})
We can see that the derived alignments and the relative entropies of the submodels, $X \subset \mathrm{vars}(F) \setminus V$, are higher than those of the manual model, $W_{\mathrm{m}}$, which suggests that the induced submodels are more likely and less sensitive than the manual model.
Let us analyse the first of the sub-models,
xx = ll[0][1]
xx
# {<<1,1>,4>, <<1,1>,26>, <<1,1>,36>, <<2,n>,2>, <<2,1>,18>, <<9,n>,1>}
hrlent(uu2,hhb,xx,vvl)
# 0.0
len(fvars(fdep(ff,xx)))
# 76
This tuple has a volume of 3840,
vol(uu2,xx)
# 3840
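The volume is simply the product of the six derived variables' valencies; a quick cross-check:
from functools import reduce
from operator import mul

# the tuple volume as the product of the per-variable volumes
reduce(mul,[vol(uu2,sset([w])) for w in xx])
# 3840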
However, it classifies the sample into only 22 effective states or slices,
rpln(aall(hhaa(hrhh(uu2,hrhrred(hhb,xx|vvl)))))
# ({(edible, edible), (<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 0), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, 96 % 1)
# ({(edible, edible), (<<1,1>,4>, 0), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 2), (<<2,1>,18>, 1), (<<9,n>,1>, null)}, 704 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, 192 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 2), (<<9,n>,1>, null)}, 96 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 2), (<<2,1>,18>, 4), (<<9,n>,1>, null)}, 768 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 2), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 0)}, 48 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, 1728 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, 144 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, 32 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, 48 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, 16 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 2)}, 288 % 1)
# ({(edible, edible), (<<1,1>,4>, 2), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, 48 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 0), (<<2,n>,2>, 1), (<<2,1>,18>, 2), (<<9,n>,1>, null)}, 256 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, 192 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 1), (<<9,n>,1>, 0)}, 36 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 1), (<<1,1>,26>, 1), (<<1,1>,36>, 2), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, 1296 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 1), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, 288 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 1), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, null), (<<2,1>,18>, 3), (<<9,n>,1>, null)}, 1728 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, 8 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, 72 % 1)
# ({(edible, poisonous), (<<1,1>,4>, 2), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 0)}, 40 % 1)
size(eff(hhaa(hrhh(uu2,hrhrred(hhb,xx|vvl)))))
# 22 % 1
rpln(ssplit(xx,states(hhaa(hrhh(uu2,hrhrred(hhb,xx|vvl))))))
# ({(<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 0), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, {(edible, edible)})
# ({(<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 0), (<<2,n>,2>, 1), (<<2,1>,18>, 2), (<<9,n>,1>, null)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 0), (<<1,1>,26>, 0), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 1), (<<9,n>,1>, 0)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 0), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 2), (<<2,1>,18>, 1), (<<9,n>,1>, null)}, {(edible, edible)})
# ({(<<1,1>,4>, 1), (<<1,1>,26>, 1), (<<1,1>,36>, 2), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 1), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 1), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, null), (<<2,1>,18>, 3), (<<9,n>,1>, null)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 1), (<<2,1>,18>, 2), (<<9,n>,1>, null)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 1), (<<2,n>,2>, 2), (<<2,1>,18>, 4), (<<9,n>,1>, null)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 0), (<<1,1>,36>, 2), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 0)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, null), (<<2,1>,18>, 0), (<<9,n>,1>, null)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, {(edible, poisonous)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 0), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 1), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 1)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 2), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 2)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 0), (<<9,n>,1>, 0)}, {(edible, edible)})
# ({(<<1,1>,4>, 2), (<<1,1>,26>, 3), (<<1,1>,36>, 3), (<<2,n>,2>, 0), (<<2,1>,18>, 3), (<<9,n>,1>, 0)}, {(edible, poisonous)})
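Since each effective slice determines a single label, the split above acts as a lookup-table classifier on the six derived variables; a sketch, assuming the (state, label) pairs yielded by ssplit can populate a dict:
# map each effective derived slice state to its label state;
# unseen derived states would yield no prediction (lookup.get)
lookup = dict(ssplit(xx,states(hhaa(hrhh(uu2,hrhrred(hhb,xx|vvl))))))
len(lookup)
# 22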
To conclude, we can see that there are many robust sub-models of the induced model that are predictive of edibility.