Predicting digit without modelling

MNIST - handwritten digits/Predicting digit without modelling

Sections

Introduction

1-tuple

Introduction

The sample query variables predict digit. That is, there is a functional or causal relationship between the query variables and the label variables, $(A\%V_{\mathrm{k}})^{\mathrm{FS}} \to (A\%V_{\mathrm{l}})^{\mathrm{FS}}$. So the label entropy or query conditional entropy is zero. See Entropy and alignment. The label entropy is \[ \begin{eqnarray} \mathrm{lent}(A,W,L)~:=~\mathrm{entropy}(A~\%~(W \cup L)) - \mathrm{entropy}(A~\%~W) \end{eqnarray} \]

First load the training sample,

:l NISTDev

(uu,hrtr) <- nistTrainBucketedIO 2

let digit = VarStr "digit"
let vv = uvars uu
let vvl = sgl digit
let vvk = vv `minus` vvl

let hr = hrev [i | i <- [0 .. hrsize hrtr - 1], i `mod` 8 == 0] hrtr 

The shuffle is $A_{\mathrm{r}}$,

let hrr = historyRepasShuffle_u hr 1

Now define the label entropy,

let hrlent uu hh ww vvl = ent (hhaa $ hrhh uu $ hh `hrhrred` (ww `union` vvl)) - ent (hhaa $ hrhh uu $ hh `hrhrred` ww)

Then $\mathrm{lent}(A,V_{\mathrm{k}},V_{\mathrm{l}}) = 0$,

hrlent uu hr vvk vvl
0.0

1-tuple

We can determine which of the query variables has the least conditional entropy, \[ \begin{eqnarray} \{(\mathrm{lent}(A,\{w\},V_{\mathrm{l}}),~w) : w \in V_{\mathrm{k}}\} \end{eqnarray} \]

let ll = sort [(hrlent uu hr (sgl w) vvl, w) | w <- qqll vvk]

rpln $ take 20 ll
"(2.106715914339201,<13,15>)"
"(2.1123060297815153,<14,15>)"
"(2.1332577705312117,<15,15>)"
"(2.1340760040384357,<22,10>)"
"(2.136893962468214,<20,11>)"
"(2.13721055765674,<21,10>)"
"(2.141687876893669,<20,12>)"
"(2.141948880423374,<21,11>)"
"(2.142064103611258,<12,15>)"
"(2.143524082463724,<22,11>)"
"(2.1438457401658626,<16,15>)"
"(2.1453000849733366,<23,11>)"
"(2.146147540554603,<14,14>)"
"(2.1488599750956165,<21,9>)"
"(2.1498280089275488,<24,13>)"
"(2.1506649095818275,<21,12>)"
"(2.151477655766341,<17,15>)"
"(2.1536449181562007,<15,18>)"
"(2.155469103682602,<23,12>)"
"(2.155476320456197,<20,10>)"

The top pixel is <13,15>. Here it is overlaid on the average, $\hat{A}\%V_{\mathrm{k}}$,

let file = "NIST.bmp"

let hrbmav = hrbm 28 3 2 $ hr `hrhrred` vvk

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (llqq $ snd $ unzip $ take 1 ll)

top 1 label ent

Imaging the 2 states ordered by size descending, $A~\%~\{\mathrm{<13,15>}\}$,

let v1315 = stringsVariable "<13,15>"

let pp = take 20 $ reverse $ sort $ map (\(a,b) -> (b,a)) $ aall $ araa uu $ hr `hrred` sgl v1315

map numerator $ fst $ unzip pp
[4765,2735]

bmwrite file $ bmhstack $ map (bmborder 1 . hrbm 28 3 2) $ [hr' `hrhrsel` hrs | let hr' = hr `hrhrred` vvk, (_,ss) <- pp, let hrs = aahr uu (single ss 1)]

v1315 states

The top 20 are $\mathrm{topd}(20)(L)$, where $L = \{(w,~\mathrm{lent}(A,\{w\},V_{\mathrm{l}})) : w \in V_{\mathrm{k}}\}$,

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (llqq $ snd $ unzip $ take 20 ll)

top 20 label ent

The top 100 are $\mathrm{topd}(100)(L)$,

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (llqq $ snd $ unzip $ take 100 ll)

top 100 label ent

The bottom 20 pixels are $\mathrm{botd}(20)(L)$,

rpln $ drop (28*28-20) ll
"(2.3018362154122483,<27,5>)"
"(2.3018362154122483,<27,23>)"
"(2.3018362154122483,<27,24>)"
"(2.3018362154122483,<27,25>)"
"(2.3018362154122483,<27,26>)"
"(2.3018362154122483,<27,27>)"
"(2.3018362154122483,<27,28>)"
"(2.3018362154122483,<28,1>)"
"(2.3018362154122483,<28,2>)"
"(2.3018362154122483,<28,3>)"
"(2.3018362154122483,<28,4>)"
"(2.3018362154122483,<28,5>)"
"(2.3018362154122483,<28,6>)"
"(2.3018362154122483,<28,7>)"
"(2.3018362154122483,<28,23>)"
"(2.3018362154122483,<28,24>)"
"(2.3018362154122483,<28,25>)"
"(2.3018362154122483,<28,26>)"
"(2.3018362154122483,<28,27>)"
"(2.3018362154122483,<28,28>)"

This may be compared to the entropy of the label variables, $\mathrm{entropy}(A\%V_{\mathrm{l}})$,

ent (hhaa $ hrhh uu $ hr `hrhrred` vvl)
2.3018362154122483

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (llqq [u | (e,u) <- ll, e >= 2.3018362154122483])

bottom label ent

Pixel <13,15> has the least conditional entropy, and so is more predictive of digit, $\mathrm{lent}(A,\{\mathrm{<13,15>}\},V_{\mathrm{l}})$,

hrlent uu hr (sgl v1315) vvl
2.106715914339201

The reduction is $A~\%~\{\mathrm{<13,15>}\}$,

rpln $ aall $ hhaa $ hrhh uu $ hr `hrhrred` (sgl v1315 `union` vvl)
"({(digit,0),(<13,15>,0)},704 % 1)"
"({(digit,0),(<13,15>,1)},39 % 1)"
"({(digit,1),(<13,15>,0)},58 % 1)"
"({(digit,1),(<13,15>,1)},764 % 1)"
"({(digit,2),(<13,15>,0)},568 % 1)"
"({(digit,2),(<13,15>,1)},170 % 1)"
"({(digit,3),(<13,15>,0)},168 % 1)"
"({(digit,3),(<13,15>,1)},577 % 1)"
"({(digit,4),(<13,15>,0)},670 % 1)"
"({(digit,4),(<13,15>,1)},76 % 1)"
"({(digit,5),(<13,15>,0)},442 % 1)"
"({(digit,5),(<13,15>,1)},278 % 1)"
"({(digit,6),(<13,15>,0)},534 % 1)"
"({(digit,6),(<13,15>,1)},173 % 1)"
"({(digit,7),(<13,15>,0)},692 % 1)"
"({(digit,7),(<13,15>,1)},75 % 1)"
"({(digit,8),(<13,15>,0)},349 % 1)"
"({(digit,8),(<13,15>,1)},405 % 1)"
"({(digit,9),(<13,15>,0)},580 % 1)"
"({(digit,9),(<13,15>,1)},178 % 1)"

We can determine minimum subsets of the query variables that are causal or predictive by using the repa conditional entropy tuple set builder. We shall also calculate the shuffle content derived alignment and the size-volume-sized-shuffle relative entropy. \[ \{(\mathrm{lent}(A,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A,\mathrm{L}}))\} \] First load the test sample and select a subset of 1000 events $A_{\mathrm{te}}$,

(_,hrte) <- nistTestBucketedIO 2

hrsize hrte
10000

let hrq = hrev [i | i <- [0 .. hrsize hrte - 1], i `mod` 10 == 0] hrte 

hrsize hrq
1000

Now we do the conditional entropy minimise,

let buildcondrr vvl aa kmax omax qmax = sort $ map (\(a,b) -> (b,a)) $ Map.toList $ fromJust $ parametersBuilderConditionalVarsRepa kmax omax qmax vvl aa

let (kmax,omax,qmax) = (1, 60, 10)

let ll = buildcondrr vvl hr kmax omax qmax

rpln ll
"(2.106715914339201,{<13,15>})"
"(2.1123060297815153,{<14,15>})"
"(2.1332577705312117,{<15,15>})"
"(2.1340760040384357,{<22,10>})"
"(2.136893962468214,{<20,11>})"
"(2.13721055765674,{<21,10>})"
"(2.141687876893669,{<20,12>})"
"(2.141948880423374,{<21,11>})"
"(2.142064103611258,{<12,15>})"
"(2.143524082463724,{<22,11>})"

Let us sort by shuffle content derived alignment descending. Let $L = \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A,\mathrm{L}}))$. Then calculate \[ \{(\mathrm{algn}(A\%X)-\mathrm{algn}(A_{\mathrm{r}}\%X),~X) : (e,X) \in L\} \]

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let aar' = hhaa (hrhh uu (hrr `hrhrred` xx))] 
"(0.0,{<22,11>})"
"(0.0,{<22,10>})"
"(0.0,{<21,11>})"
"(0.0,{<21,10>})"
"(0.0,{<20,12>})"
"(0.0,{<20,11>})"
"(0.0,{<15,15>})"
"(0.0,{<14,15>})"
"(0.0,{<13,15>})"
"(0.0,{<12,15>})"

and by size-volume-sized-shuffle relative entropy descending, \[ \{(\mathrm{rent}(A~\%~X,~Z_X * \hat{A}_{\mathrm{r}}~\%~X),~X) : (e,X) \in L\} \] where $Z_X = \mathrm{scalar}(|X^{\mathrm{C}}|)$,

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let vaar' = vsize uu xx (hhaa (hrhh uu (hrr `hrhrred` xx)))] 
"(2.7622348852673895e-13,{<12,15>})"
"(2.737809978725636e-13,{<14,15>})"
"(2.2959412149248237e-13,{<15,15>})"
"(-4.884981308350689e-15,{<13,15>})"
"(-6.439293542825908e-14,{<20,12>})"
"(-1.6564527527407336e-13,{<22,11>})"
"(-2.0516921495072893e-13,{<22,10>})"
"(-3.157474282033945e-13,{<20,11>})"
"(-4.89386309254769e-13,{<21,10>})"
"(-6.794564910705958e-13,{<21,11>})"

Choose the top tuple $X$,

let xx = llqq $ map stringsVariable ["<13,15>"]

card xx
1

The label entropy, $\mathrm{lent}(A,X,V_{\mathrm{l}})$, is,

hrlent uu hr xx vvl
2.106715914339201

This tuple has a volume of $|X^{\mathrm{C}}| = 2$,

vol uu xx
2

Now consider the query effectiveness against the test set, $\mathrm{size}(A_{\mathrm{te}} * (A\%X)^{\mathrm{F}})$,

size $ hhaa (hrhh uu hrq) `mul` eff (hhaa (hrhh uu (hr `hrhrred` xx)))
1000 % 1

So there exists a prediction for each of the test set for the mono-variate tuple.

2-tuple

let (kmax,omax,qmax) = (2, 10, 10)

let ll = buildcondrr vvl hr kmax omax qmax

rpln ll
"(1.9278815272185974,{<14,15>,<17,15>})"
"(1.9308806609242308,{<13,15>,<16,15>})"
"(1.9312459077976918,{<13,15>,<17,15>})"
"(1.93457345799883,{<13,15>,<22,10>})"
"(1.9378630043953962,{<13,15>,<20,11>})"
"(1.9389377983376297,{<13,15>,<21,10>})"
"(1.941189996342865,{<14,15>,<22,10>})"
"(1.9421337966096723,{<13,15>,<21,11>})"
"(1.9422336675359642,{<14,15>,<20,11>})"
"(1.943167355230266,{<13,15>,<22,11>})"

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let aar' = hhaa (hrhh uu (hrr `hrhrred` xx))] 
"(43.05125269116252,{<13,15>,<16,15>})"
"(7.2539737408587825,{<13,15>,<21,10>})"
"(4.2186678996440605,{<13,15>,<20,11>})"
"(2.9941047592583345,{<14,15>,<17,15>})"
"(1.385944680099783,{<14,15>,<22,10>})"
"(1.269607648246165,{<13,15>,<22,10>})"
"(-0.16175869062863057,{<14,15>,<20,11>})"
"(-0.28838801408710424,{<13,15>,<17,15>})"
"(-0.3156879640810075,{<13,15>,<22,11>})"
"(-0.4525069542287383,{<13,15>,<21,11>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let vaar' = vsize uu xx (hhaa (hrhh uu (hrr `hrhrred` xx)))] 
"(1.8386829250236048e-2,{<13,15>,<16,15>})"
"(4.991577993035179e-3,{<14,15>,<17,15>})"
"(3.7668384198976668e-3,{<13,15>,<21,10>})"
"(2.608184312029138e-3,{<13,15>,<21,11>})"
"(7.863858921490774e-4,{<13,15>,<22,10>})"
"(7.260431202160689e-4,{<13,15>,<20,11>})"
"(2.874437048907552e-4,{<14,15>,<22,10>})"
"(1.039902050123942e-4,{<13,15>,<17,15>})"
"(3.374541994727309e-5,{<13,15>,<22,11>})"
"(2.660226409112454e-5,{<14,15>,<20,11>})"

let xx = llqq $ map stringsVariable ["<13,15>","<16,15>"]

card xx
2

hrlent uu hr xx vvl
1.9308806609242308

vol uu xx
4

size $ hhaa (hrhh uu hrq) `mul` eff (hhaa (hrhh uu (hr `hrhrred` xx)))
1000 % 1

rpln $ aall $ hhaa $ hrhh uu $ hr `hrhrred` (xx `union` vvl)
"({(digit,0),(<13,15>,0),(<16,15>,0)},699 % 1)"
"({(digit,0),(<13,15>,0),(<16,15>,1)},5 % 1)"
"({(digit,0),(<13,15>,1),(<16,15>,0)},33 % 1)"
"({(digit,0),(<13,15>,1),(<16,15>,1)},6 % 1)"
"({(digit,1),(<13,15>,0),(<16,15>,0)},19 % 1)"
"({(digit,1),(<13,15>,0),(<16,15>,1)},39 % 1)"
"({(digit,1),(<13,15>,1),(<16,15>,0)},19 % 1)"
"({(digit,1),(<13,15>,1),(<16,15>,1)},745 % 1)"
...
"({(digit,9),(<13,15>,0),(<16,15>,0)},248 % 1)"
"({(digit,9),(<13,15>,0),(<16,15>,1)},332 % 1)"
"({(digit,9),(<13,15>,1),(<16,15>,0)},123 % 1)"
"({(digit,9),(<13,15>,1),(<16,15>,1)},55 % 1)"

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk xx

bottom label ent

Imaging the 4 states ordered by size descending, $A\%X$,

let pp = take 20 $ reverse $ sort $ map (\(a,b) -> (b,a)) $ aall $ araa uu $ hr `hrred` xx

map numerator $ fst $ unzip pp
[2489,2276,1612,1123]

bmwrite file $ bmhstack $ map (bmborder 1 . hrbm 28 1 2) $ [hr' `hrhrsel` hrs | let hr' = hr `hrhrred` vvk, (_,ss) <- pp, let hrs = aahr uu (single ss 1)]

xx states

This 2-tuple is quite good at distinguishing between zero and non-zero.

5-tuple

Continue on to the 5-tuple,

let (kmax,omax,qmax) = (5, 10, 10)

let ll = buildcondrr vvl hr kmax omax qmax

rpln ll
"(1.436485455789557,{<13,15>,<16,15>,<20,12>,<22,10>,<24,13>})"
"(1.438843611340515,{<13,15>,<16,15>,<20,12>,<22,10>,<24,14>})"
"(1.4468132285301718,{<11,18>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(1.4474427280733333,{<14,15>,<17,15>,<20,12>,<22,10>,<24,13>})"
"(1.4497701554712257,{<13,15>,<16,15>,<20,12>,<22,11>,<24,13>})"
"(1.4500235885928388,{<13,15>,<16,15>,<20,12>,<22,10>,<24,12>})"
"(1.450024451470413,{<14,15>,<17,15>,<20,12>,<22,10>,<24,14>})"
"(1.4505181754063354,{<13,10>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(1.451323679428873,{<13,18>,<14,15>,<17,15>,<20,12>,<22,10>})"
"(1.4522048769374631,{<10,18>,<13,15>,<16,15>,<20,12>,<22,10>})"

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let aar' = hhaa (hrhh uu (hrr `hrhrred` xx))] 
"(811.0329633142683,{<13,15>,<16,15>,<20,12>,<22,11>,<24,13>})"
"(758.9258663460932,{<13,15>,<16,15>,<20,12>,<22,10>,<24,12>})"
"(733.2461124636175,{<14,15>,<17,15>,<20,12>,<22,10>,<24,13>})"
"(656.6864908957359,{<13,18>,<14,15>,<17,15>,<20,12>,<22,10>})"
"(644.0347740919315,{<13,15>,<16,15>,<20,12>,<22,10>,<24,13>})"
"(638.4889439393664,{<14,15>,<17,15>,<20,12>,<22,10>,<24,14>})"
"(536.7002904385736,{<13,15>,<16,15>,<20,12>,<22,10>,<24,14>})"
"(506.4821881385724,{<11,18>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(485.0874000337062,{<13,10>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(448.7673821378048,{<10,18>,<13,15>,<16,15>,<20,12>,<22,10>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let vaar' = vsize uu xx (hhaa (hrhh uu (hrr `hrhrred` xx)))] 
"(4.430436832928706,{<13,15>,<16,15>,<20,12>,<22,11>,<24,13>})"
"(3.638794073206938,{<13,15>,<16,15>,<20,12>,<22,10>,<24,12>})"
"(3.5359659134791883,{<14,15>,<17,15>,<20,12>,<22,10>,<24,13>})"
"(3.358375838520928,{<14,15>,<17,15>,<20,12>,<22,10>,<24,14>})"
"(3.127285631617113,{<13,15>,<16,15>,<20,12>,<22,10>,<24,13>})"
"(3.088298662755136,{<13,18>,<14,15>,<17,15>,<20,12>,<22,10>})"
"(2.9533695215733786,{<13,15>,<16,15>,<20,12>,<22,10>,<24,14>})"
"(2.311549911111868,{<11,18>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(2.2027705825412482,{<13,10>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(1.8670983616914754,{<10,18>,<13,15>,<16,15>,<20,12>,<22,10>})"

let xx = head $ snd $ unzip ll

card xx
5

hrlent uu hr xx vvl
1.436485455789557

vol uu xx
32

size $ hhaa (hrhh uu hrq) `mul` eff (hhaa (hrhh uu (hr `hrhrred` xx)))
1000 % 1

rpln $ aall $ hhaa $ hrhh uu $ hr `hrhrred` (xx `union` vvl)

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk xx

bottom label ent

Imaging the first 20 states ordered by size descending, $\mathrm{top}(20)(A\%X)$,

let pp = take 20 $ reverse $ sort $ map (\(a,b) -> (b,a)) $ aall $ araa uu $ hr `hrred` xx

map numerator $ fst $ unzip pp
[898,861,673,436,407,369,359,333,299,290,245,232,217,178,169,152,150,141,133,129]

bmwrite file $ bmhstack $ map (bmborder 1 . hrbm 28 1 2) $ [hr' `hrhrsel` hrs | let hr' = hr `hrhrred` vvk, (_,ss) <- pp, let hrs = aahr uu (single ss 1)]

xx states

Let us apply the 5-tuple to the test sample to calculate the accuracy of prediction, \[ \{(Q\%V_{\mathrm{l}},~\mathrm{max}(R)) : (S,\cdot) \in A_{\mathrm{te}},~Q = \{S\}^{\mathrm{U}},~R = A * (Q\%X),~\mathrm{size}(R) > 0\} \]

let amax = llaa . take 1 . reverse . map (\(a,b) -> (b,a)) . sort . map (\(a,b) -> (b,a)) . aall . norm . trim

rpln $ take 20 [(aarr (qq `red` vvl), aarr (amax rr)) | let hhq = hrhh uu (hrq `hrhrred` (xx `union` vvl)), let aa = araa uu (hr `hrred` (xx `union` vvl)), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` xx) `red` vvl, size rr > 0]
"([({(digit,7)},1.0)],[({(digit,7)},0.4151376146788991)])"
"([({(digit,0)},1.0)],[({(digit,0)},0.7764127764127764)])"
"([({(digit,9)},1.0)],[({(digit,3)},0.49310344827586206)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.49310344827586206)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.6939078751857355)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6128133704735376)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.35423925667828104)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.4151376146788991)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.35423925667828104)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.49310344827586206)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6128133704735376)])"
"([({(digit,8)},1.0)],[({(digit,1)},0.6939078751857355)])"
"([({(digit,5)},1.0)],[({(digit,8)},0.4883720930232558)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6128133704735376)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.4624624624624625)])"
"([({(digit,9)},1.0)],[({(digit,4)},0.4298440979955457)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.4298440979955457)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.4298440979955457)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.6939078751857355)])"
"([({(digit,1)},1.0)],[({(digit,8)},0.47368421052631576)])"

We show the test sample digit on the left and the best guess by the tuple on the right, along with the probability. In this case all 20 queries are effective and the model is correct 15 times.

These are the query images side by side with the slice for the tuple state,

bmwrite file $ bmvstack $ take 20 $ map (bmhstack . map (bmborder 1 . hrbm 28 1 2)) [[hrev [i] hrq `hrhrred` vvk, hr `hrhrsel` qq `hrhrred` vvk] | i <- [0 .. hrsize hrq - 1], let qq = hrev [i] hrq `hrhrred` xx]

20 query

Overall, this model is correct for 44.8% of the test sample, \[ |\{R : (S,\cdot) \in A_{\mathrm{te}},~Q = \{S\}^{\mathrm{U}},~R = A * (Q\%X),~\mathrm{size}(\mathrm{max}(R) * (Q\%V_{\mathrm{l}})) > 0\}| \]

length [rr | let hhq = hrhh uu (hrq `hrhrred` (xx `union` vvl)), let aa = araa uu (hr `hrred` (xx `union` vvl)), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` xx) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
448

12-tuple

Continue on to the 12-tuple,

let (kmax,omax,qmax) = (12, 10, 10)

let ll = buildcondrr vvl hr kmax omax qmax

rpln ll
"(0.44876138405600585,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.4488059178690671,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.4494382758231419,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.4497106538933693,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.450279819031401,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.4507996248397248,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.45103768071386785,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.45139293398245073,{<6,16>,<8,15>,<10,12>,<10,17>,<13,15>,<14,18>,<15,12>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.4519764035369258,{<8,15>,<10,17>,<10,19>,<11,11>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.45260335677485486,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<17,18>,<20,12>,<21,14>,<22,10>,<24,13>})"

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let aar' = hhaa (hrhh uu (hrr `hrhrred` xx))] 
"(4699.670621715999,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4621.062377420516,{<6,16>,<8,15>,<10,12>,<10,17>,<13,15>,<14,18>,<15,12>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4614.444253739001,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4577.341163675566,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4480.221910807174,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4193.502863142198,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4183.941209264208,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4149.267596970897,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<17,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4119.763448820465,{<8,15>,<10,17>,<10,19>,<11,11>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4095.1484671911376,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let vaar' = vsize uu xx (hhaa (hrhh uu (hrr `hrhrred` xx)))] 
"(3123.6207472783244,{<6,16>,<8,15>,<10,12>,<10,17>,<13,15>,<14,18>,<15,12>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(3032.1854086896783,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(3012.1475269226794,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2996.2635833369714,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2988.9158685330367,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2969.2991737766934,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2945.6892856164413,{<8,15>,<10,17>,<10,19>,<11,11>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2915.848713614232,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2907.7712056377277,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2892.892903450367,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<17,18>,<20,12>,<21,14>,<22,10>,<24,13>})"

let xx = head $ snd $ unzip ll

card xx
12

hrlent uu hr xx vvl
0.44876138405600585

vol uu xx
4096

size $ hhaa (hrhh uu hrq) `mul` eff (hhaa (hrhh uu (hr `hrhrred` xx)))
876 % 1

rpln $ aall $ hhaa $ hrhh uu $ hr `hrhrred` (xx `union` vvl)

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk xx

bottom label ent

Imaging the first 20 states ordered by size descending, $\mathrm{top}(20)(A\%X)$,

let pp = take 20 $ reverse $ sort $ map (\(a,b) -> (b,a)) $ aall $ araa uu $ hr `hrred` xx

map numerator $ fst $ unzip pp
[162,138,88,72,65,64,63,55,51,45,44,44,44,44,44,42,40,36,36,35]

bmwrite file $ bmhstack $ map (bmborder 1 . hrbm 28 1 2) $ [hr' `hrhrsel` hrs | let hr' = hr `hrhrred` vvk, (_,ss) <- pp, let hrs = aahr uu (single ss 1)]

xx states

We can see that as the tuple cardinality increases and the label entropy decreases, the slices are increasingly identifiable as a digit.

Let us apply the 12-tuple to the test sample to calculate the accuracy of prediction, \[ \{(Q\%V_{\mathrm{l}},~\mathrm{max}(R)) : (S,\cdot) \in A_{\mathrm{te}},~Q = \{S\}^{\mathrm{U}},~R = A * (Q\%X),~\mathrm{size}(R) > 0\} \]

rpln $ take 20 [(aarr (qq `red` vvl), aarr (amax rr)) | let hhq = hrhh uu (hrq `hrhrred` (xx `union` vvl)), let aa = araa uu (hr `hrred` (xx `union` vvl)), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` xx) `red` vvl, size rr > 0]
"([({(digit,7)},1.0)],[({(digit,7)},0.9777777777777777)])"
"([({(digit,0)},1.0)],[({(digit,0)},0.7818181818181819)])"
"([({(digit,9)},1.0)],[({(digit,9)},0.8)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.875)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.9814814814814815)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.9722222222222222)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.7894736842105263)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.6)])"
"([({(digit,7)},1.0)],[({(digit,4)},0.8666666666666667)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.6666666666666666)])"
"([({(digit,6)},1.0)],[({(digit,6)},1.0)])"
"([({(digit,8)},1.0)],[({(digit,4)},1.0)])"
"([({(digit,5)},1.0)],[({(digit,5)},1.0)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.9722222222222222)])"
"([({(digit,6)},1.0)],[({(digit,6)},1.0)])"
"([({(digit,9)},1.0)],[({(digit,9)},0.782608695652174)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.967741935483871)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.8095238095238095)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.9710144927536232)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.8333333333333334)])"

In this case all 20 queries are effective and the model is correct 18 times,

bmwrite file $ bmvstack $ take 20 $ map (bmhstack . map (bmborder 1 . hrbm 28 1 2)) [[hrev [i] hrq `hrhrred` vvk, hr `hrhrsel` qq `hrhrred` vvk] | i <- [0 .. hrsize hrq - 1], let qq = hrev [i] hrq `hrhrred` xx]

20 query

Overall, this model is correct for 60% of the test sample,

length [rr | let hhq = hrhh uu (hrq `hrhrred` (xx `union` vvl)), let aa = araa uu (hr `hrred` (xx `union` vvl)), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` xx) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
600

The prediction accuracy of larger tuples is not likely to be much higher, because the query effectiveness declines as the tuple cardinality increases.

1-tuple 15-fud

Instead of determining minimum subsets of the query variables that are causal or predictive by using the conditional entropy tuple set builder, consider using the conditional entropy fud decomper, $\{D\} = \mathrm{leaves}(\mathrm{tree}(Z_{P,A,\mathrm{L,D,F}}))$.

The resultant decomposition consists of singleton fuds of self partition transforms of smaller tuples. In this way a set of paths of different tuples for different slices can reduce the label entropy,

let decompercondrr vvl uu aa kmax omax fmax = fromJust $ parametersSystemsHistoryRepasDecomperConditionalFmaxRepa kmax omax fmax uu vvl aa

let (kmax,omax) = (1, 5)

let (uu1,df) = decompercondrr vvl uu hr kmax omax 15

rp $ dfund df
"{<6,16>,<8,11>,<8,15>,<9,18>,<11,11>,<13,15>,<15,12>,<15,14>,<16,10>,<16,16>,<17,13>,<17,15>,<19,11>,<20,12>,<22,10>}"

card $ dfund df
15

let dfll = qqll . treesPaths . dfzz 

rpln $ map (map (\(_,ff) -> fund ff)) $ dfll df
"[{<13,15>},{<22,10>},{<20,12>},{<16,10>},{<15,14>},{<17,13>},{<6,16>}]"
"[{<13,15>},{<22,10>},{<20,12>},{<16,10>},{<8,15>}]"
"[{<13,15>},{<22,10>},{<20,12>},{<9,18>}]"
"[{<13,15>},{<22,10>},{<16,16>}]"
"[{<13,15>},{<17,15>},{<19,11>},{<11,11>}]"
"[{<13,15>},{<17,15>},{<8,11>},{<15,12>}]"

Now analyse with the fud decomposition fud, $F = D^{\mathrm{F}}$, (see Practicable fud decomposition fud),

let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1

let uu2 = uu `uunion` (fsys ff)

let hrb = hrfmul uu2 ff hr

The label entropy, $\mathrm{lent}(A * \mathrm{his}(F^{\mathrm{T}}),W_F,V_{\mathrm{l}})$, where $W_F = \mathrm{der}(F)$, is

hrlent uu2 hrb (fder ff) vvl
1.4172761780693386

Imaging the underlying $V_F = \mathrm{und}(F)$ overlaid on the average, $\hat{A}\%V_{\mathrm{k}}$,

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)

20 query

We can see that a decomposition limited to 15 fuds has an expected path length of 4 but a label entropy similar to the 5-tuple above. Only 3 of the variables are in both models,

let xx = llqq $ map stringsVariable ["<13,15>","<16,15>","<20,12>","<22,10>","<24,13>"]

rp $ dfund df `inter` xx
"{<13,15>,<20,12>,<22,10>}"

The conditional entropy decomposition fud of a bi-valent substrate is always effective, $\mathrm{size}((A_{\mathrm{te}} * F^{\mathrm{T}}) * (A * F^{\mathrm{T}})^{\mathrm{F}}) = \mathrm{size}(A_{\mathrm{te}})$,

let hrqb = hrfmul uu2 ff hrq

size $ hhaa (hrhh uu2 (hrqb `hrhrred` (fder ff))) `mul` eff (hhaa (hrhh uu2 (hrb `hrhrred` (fder ff))))
1000 % 1

Imaging the slices of the decomposition,

rpln $ map (map (hrsize . snd)) $ qqll $ treesPaths $ hrmult uu1 df hr
"[7500,4765,3406,2564,1569,1130,811]"
"[7500,4765,3406,2564,995]"
"[7500,4765,3406,842]"
"[7500,4765,1359]"
"[7500,2735,1441,1070]"
"[7500,2735,1294,984]"

bmwrite file $ bmvstack $ map (\bm -> bminsert (bmempty (28+2) ((28+2)*7)) 0 0 bm) $ map (bmhstack . map (\(_,hrs) -> bmborder 1 (hrbm 28 1 2 (hrs `hrhrred` vvk)))) $ qqll $ treesPaths $ hrmult uu1 df hr

cond decomp

Let us apply the 15-fud to the test sample to calculate the accuracy of prediction, \[ \begin{eqnarray} &&\{(Q\%V_{\mathrm{l}},~\mathrm{max}(R)) : (S,\cdot) \in A_{\mathrm{te}} * \mathrm{his}(F^{\mathrm{T}})~\%~(W_F \cup V_{\mathrm{l}}),~Q = \{S\}^{\mathrm{U}}, \\ &&\hspace{8em}R = A * \mathrm{his}(F^{\mathrm{T}})~\%~(W_F \cup V_{\mathrm{l}}) * (Q\%W_F),~\mathrm{size}(R) > 0\} \end{eqnarray} \]

rpln $ take 20 [(aarr (qq `red` vvl), aarr (amax rr)) | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0]
"([({(digit,7)},1.0)],[({(digit,7)},0.7174825174825175)])"
"([({(digit,0)},1.0)],[({(digit,0)},0.6117353308364545)])"
"([({(digit,9)},1.0)],[({(digit,5)},0.3783783783783784)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.5753052917232022)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.8231780167264038)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6442141623488774)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.7174825174825175)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.7174825174825175)])"
"([({(digit,7)},1.0)],[({(digit,4)},0.6975524475524476)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.5753052917232022)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6442141623488774)])"
"([({(digit,8)},1.0)],[({(digit,3)},0.5753052917232022)])"
"([({(digit,5)},1.0)],[({(digit,3)},0.5753052917232022)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6442141623488774)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6442141623488774)])"
"([({(digit,9)},1.0)],[({(digit,9)},0.5602836879432624)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.29153605015673983)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.29153605015673983)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.8231780167264038)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.8231780167264038)])"

In this case all 20 queries are effective and the model is correct 16 times,

bmwrite file $ bmvstack $ take 20 $ map (bmhstack . map (bmborder 1 . hrbm 28 1 2)) [[hrev [i] hrqb `hrhrred` vvk, hrb `hrhrsel` qq `hrhrred` vvk] | i <- [0 .. hrsize hrqb - 1], let qq = hrev [i] hrqb `hrhrred` fder ff]

20 query

Overall, this model is correct for 56.7% of the test sample, \[ \begin{eqnarray} &&|\{R : (S,\cdot) \in A_{\mathrm{te}} * \mathrm{his}(F^{\mathrm{T}})~\%~(W_F \cup V_{\mathrm{l}}),~Q = \{S\}^{\mathrm{U}}, \\ &&\hspace{4em}R = A * \mathrm{his}(F^{\mathrm{T}})~\%~(W_F \cup V_{\mathrm{l}}) * (Q\%W_F),~\mathrm{size}(\mathrm{max}(R) * (Q\%V_{\mathrm{l}})) > 0\}| \end{eqnarray} \]

length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
567

This is considerably more accurate than the 5-tuple (44.8%). It is approaching the accuracy of the 12-tuple (60.0%).

1-tuple 127-fud

Increasing the decomposition fuds cardinality to an expected path length of 7,

let (uu1,df) = decompercondrr vvl uu hr kmax omax 127

card $ dfund df
101

rpln $ map (map (\(_,ff) -> fund ff)) $ dfll df
"[{<13,15>},{<22,10>},{<20,12>},{<16,10>},{<15,14>},{<17,13>},{<6,16>},{<18,9>},{<10,19>},{<11,17>},{<21,15>},{<24,14>}]"
"[{<13,15>},{<22,10>},{<20,12>},{<16,10>},{<15,14>},{<17,13>},{<6,16>},{<18,9>},{<10,19>},{<11,17>},{<15,15>},{<6,11>},{<13,10>}]"
"[{<13,15>},{<22,10>},{<20,12>},{<16,10>},{<15,14>},{<17,13>},{<6,16>},{<18,9>},{<10,19>},{<20,8>},{<16,13>},{<14,14>}]"
"..."
"[{<13,15>},{<17,15>},{<8,11>},{<12,10>},{<19,12>},{<19,15>}]"
"[{<13,15>},{<17,15>},{<8,11>},{<12,10>},{<19,12>},{<12,13>}]"
"[{<13,15>},{<17,15>},{<8,11>},{<12,10>},{<24,16>}]"

let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1

let uu2 = uu `uunion` (fsys ff)

let hrb = hrfmul uu2 ff hr

hrlent uu2 hrb (fder ff) vvl
0.7456431103313212

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)

20 query

Imaging the slices of the decomposition,

rpln $ map (map (hrsize . snd)) $ qqll $ treesPaths $ hrmult uu1 df hr
"[7500,4765,3406,2564,1569,1130,811,715,651,241,86,52]"
"[7500,4765,3406,2564,1569,1130,811,715,651,241,155,128,121]"
"[7500,4765,3406,2564,1569,1130,811,715,651,410,401,394]"
...
"[7500,2735,1294,310,196,114]"
"[7500,2735,1294,310,196,82]"
"[7500,2735,1294,310,114]"

bmwrite file $ bmvstack $ map (\bm -> bminsert (bmempty (28+2) ((28+2)*13)) 0 0 bm) $ map (bmhstack . map (\(_,hrs) -> bmborder 1 (hrbm 28 1 2 (hrs `hrhrred` vvk)))) $ qqll $ treesPaths $ hrmult uu1 df hr

cond decomp

The decomposition fud is still effective,

let hrqb = hrfmul uu2 ff hrq

size $ hhaa (hrhh uu2 (hrqb `hrhrred` (fder ff))) `mul` eff (hhaa (hrhh uu2 (hrb `hrhrred` (fder ff))))
1000 % 1

Let us apply the 127-fud to the test sample to calculate the accuracy of prediction,

rpln $ take 20 [(aarr (qq `red` vvl), aarr (amax rr)) | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0]
"([({(digit,7)},1.0)],[({(digit,7)},0.9639175257731959)])"
"([({(digit,0)},1.0)],[({(digit,0)},0.9884726224783862)])"
"([({(digit,9)},1.0)],[({(digit,9)},0.9090909090909091)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.9798387096774194)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.9855769230769231)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.9707792207792207)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.9047619047619048)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.9639175257731959)])"
"([({(digit,7)},1.0)],[({(digit,5)},0.5238095238095238)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.9798387096774194)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.9707792207792207)])"
"([({(digit,8)},1.0)],[({(digit,3)},0.7954545454545454)])"
"([({(digit,5)},1.0)],[({(digit,5)},0.7876106194690266)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.9707792207792207)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.7721518987341772)])"
"([({(digit,9)},1.0)],[({(digit,9)},0.9184782608695652)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.8055555555555556)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.8055555555555556)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.9855769230769231)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.7833333333333333)])"

In this case all 20 queries are effective and the model is correct 18 times,

bmwrite file $ bmvstack $ take 20 $ map (bmhstack . map (bmborder 1 . hrbm 28 1 2)) [[hrev [i] hrqb `hrhrred` vvk, hrb `hrhrsel` qq `hrhrred` vvk] | i <- [0 .. hrsize hrqb - 1], let qq = hrev [i] hrqb `hrhrred` fder ff]

20 query

Overall, this model is correct for 70.2% of the test sample,

length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
702

This is considerably more accurate than the 12-tuple (60.0%), which is also much less effective (87.6% instead of 100%).

1-tuple 511-fud

Increasing the decomposition fuds cardinality to an expected path length of 9,

let (uu1,df) = decompercondrr vvl uu hr kmax omax 511

card $ dfund df
236

let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1

let uu2 = uu `uunion` (fsys ff)

let hrb = hrfmul uu2 ff hr

hrlent uu2 hrb (fder ff) vvl
0.24275823199986313

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)

20 query

Overall, this model is correct for 75% of the test sample,

let hrqb = hrfmul uu2 ff hrq

length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
750

2-tuple 15-fud

Now consider a 2-tuple for each fud in a 15-fud decomposition,

let (kmax,omax) = (2, 5)

let (uu1,df) = decompercondrr vvl uu hr kmax omax 15

card $ dfund df
29

rpln $ map (map (\(_,ff) -> fund ff)) $ dfll df
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<16,13>,<19,13>},{<7,15>,<10,15>}]"
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<12,15>,<15,15>}]"
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<10,19>,<17,17>}]"
"[{<14,15>,<17,15>},{<21,13>,<24,12>},{<8,15>,<21,11>},{<13,17>,<14,19>}]"
"[{<14,15>,<17,15>},{<21,13>,<24,12>},{<9,19>,<13,11>}]"
"[{<14,15>,<17,15>},{<19,11>,<24,13>},{<19,13>,<22,13>}]"
"[{<14,15>,<17,15>},{<19,11>,<24,13>},{<11,16>,<11,18>}]"
"[{<14,15>,<17,15>},{<11,11>,<20,11>},{<8,11>,<16,12>}]"

let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1

let uu2 = uu `uunion` (fsys ff)

let hrb = hrfmul uu2 ff hr

hrlent uu2 hrb (fder ff) vvl
1.1776004502299249

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)

20 query

The decomposition fud is still effective,

let hrqb = hrfmul uu2 ff hrq

size $ hhaa (hrhh uu2 (hrqb `hrhrred` (fder ff))) `mul` eff (hhaa (hrhh uu2 (hrb `hrhrred` (fder ff))))
1000 % 1

Imaging the slices of the decomposition,

rpln $ map (map (hrsize . snd)) $ qqll $ treesPaths $ hrmult uu1 df hr
"[7500,2250,1008,618]"
"[7500,2250,614]"
"[7500,2250,368]"
"[7500,1913,979,386]"
"[7500,1913,498]"
"[7500,1693,578]"
"[7500,1693,660]"
"[7500,1644,988]"

bmwrite file $ bmvstack $ map (\bm -> bminsert (bmempty (28+2) ((28+2)*4)) 0 0 bm) $ map (bmhstack . map (\(_,hrs) -> bmborder 1 (hrbm 28 1 2 (hrs `hrhrred` vvk)))) $ qqll $ treesPaths $ hrmult uu1 df hr

cond decomp

Overall, this model is correct for 58% of the test sample,

length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
580

So the accuracy is little better than the 15-mono-variate-fud (56.7%).

2-tuple 127-fud

Now consider a 2-tuple for each fud in a 127-fud decomposition,

let (uu1,df) = decompercondrr vvl uu hr kmax omax 127

card $ dfund df
153

rpln $ map (map (\(_,ff) -> fund ff)) $ dfll df
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<16,13>,<19,13>},{<7,15>,<10,15>},{<8,15>,<17,12>},{<6,13>,<15,14>},{<5,14>,<12,13>}]"
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<16,13>,<19,13>},{<7,15>,<10,15>},{<8,15>,<17,12>},{<14,13>,<16,18>}]"
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<16,13>,<19,13>},{<7,15>,<10,15>},{<14,13>,<22,8>}]"
...
"[{<14,15>,<17,15>},{<11,11>,<20,11>},{<7,15>,<9,15>},{<13,13>,<22,13>}]"
"[{<14,15>,<17,15>},{<11,11>,<20,11>},{<7,15>,<9,15>},{<11,15>,<22,14>}]"
"[{<14,15>,<17,15>},{<11,11>,<20,11>},{<18,10>,<24,13>}]"

let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1

let uu2 = uu `uunion` (fsys ff)

let hrb = hrfmul uu2 ff hr

hrlent uu2 hrb (fder ff) vvl
0.4608076349580772

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)

20 query

The decomposition fud is slightly ineffective,

let hrqb = hrfmul uu2 ff hrq

size $ hhaa (hrhh uu2 (hrqb `hrhrred` (fder ff))) `mul` eff (hhaa (hrhh uu2 (hrb `hrhrred` (fder ff))))
999 % 1

Overall, this model is correct for 74.3% of the test sample,

length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
743

So the accuracy is better than the 127-mono-variate-fud (70.2%), but not as good as the 511-mono-variate-fud (75%).

3-tuple 127-fud

Finally, consider tri-variate-fuds,

let (kmax,omax) = (3, 5)

let (uu1,df) = decompercondrr vvl uu hr kmax omax 127

card $ dfund df
207

let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1

let uu2 = uu `uunion` (fsys ff)

let hrb = hrfmul uu2 ff hr

hrlent uu2 hrb (fder ff) vvl
0.268674398120087

bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)

20 query

The decomposition fud is slightly ineffective,

let hrqb = hrfmul uu2 ff hrq

size $ hhaa (hrhh uu2 (hrqb `hrhrred` (fder ff))) `mul` eff (hhaa (hrhh uu2 (hrb `hrhrred` (fder ff))))
994 % 1

Overall, this model is correct for 73.6% of the test sample,

length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
736

So the accuracy of the 127-tri-variate-fud is a little worse than that of the 127-bi-variate-fud (74.3%).

Overall, of all of these models the 511-mono-variate-fud has the highest accuracy of 75.0%.

To conclude, a model consisting of only substrate variables can have reasonable accuracy/effectiveness with respect to digit, considering that the relative entropy of a substrate model is always quite low compared to induced models.

top