Predicting digit without modelling
MNIST - handwritten digits/Predicting digit without modelling
Sections
Introduction
The sample query variables predict digit. That is, there is a functional or causal relationship between the query variables and the label variables, $(A\%V_{\mathrm{k}})^{\mathrm{FS}} \to (A\%V_{\mathrm{l}})^{\mathrm{FS}}$. So the label entropy or query conditional entropy is zero. See Entropy and alignment. The label entropy is \[ \begin{eqnarray} \mathrm{lent}(A,W,L)~:=~\mathrm{entropy}(A~\%~(W \cup L)) - \mathrm{entropy}(A~\%~W) \end{eqnarray} \]
First load the training sample,
:l NISTDev
(uu,hrtr) <- nistTrainBucketedIO 2
let digit = VarStr "digit"
let vv = uvars uu
let vvl = sgl digit
let vvk = vv `minus` vvl
let hr = hrev [i | i <- [0 .. hrsize hrtr - 1], i `mod` 8 == 0] hrtr
The shuffle is $A_{\mathrm{r}}$,
let hrr = historyRepasShuffle_u hr 1
Now define the label entropy,
let hrlent uu hh ww vvl = ent (hhaa $ hrhh uu $ hh `hrhrred` (ww `union` vvl)) - ent (hhaa $ hrhh uu $ hh `hrhrred` ww)
Then $\mathrm{lent}(A,V_{\mathrm{k}},V_{\mathrm{l}}) = 0$,
hrlent uu hr vvk vvl
0.0
1-tuple
We can determine which of the query variables has the least conditional entropy, \[ \begin{eqnarray} \{(\mathrm{lent}(A,\{w\},V_{\mathrm{l}}),~w) : w \in V_{\mathrm{k}}\} \end{eqnarray} \]
let ll = sort [(hrlent uu hr (sgl w) vvl, w) | w <- qqll vvk]
rpln $ take 20 ll
"(2.106715914339201,<13,15>)"
"(2.1123060297815153,<14,15>)"
"(2.1332577705312117,<15,15>)"
"(2.1340760040384357,<22,10>)"
"(2.136893962468214,<20,11>)"
"(2.13721055765674,<21,10>)"
"(2.141687876893669,<20,12>)"
"(2.141948880423374,<21,11>)"
"(2.142064103611258,<12,15>)"
"(2.143524082463724,<22,11>)"
"(2.1438457401658626,<16,15>)"
"(2.1453000849733366,<23,11>)"
"(2.146147540554603,<14,14>)"
"(2.1488599750956165,<21,9>)"
"(2.1498280089275488,<24,13>)"
"(2.1506649095818275,<21,12>)"
"(2.151477655766341,<17,15>)"
"(2.1536449181562007,<15,18>)"
"(2.155469103682602,<23,12>)"
"(2.155476320456197,<20,10>)"
The top pixel is <13,15>
. Here it is overlaid on the average, $\hat{A}\%V_{\mathrm{k}}$,
let file = "NIST.bmp"
let hrbmav = hrbm 28 3 2 $ hr `hrhrred` vvk
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (llqq $ snd $ unzip $ take 1 ll)
Imaging the 2 states ordered by size descending, $A~\%~\{\mathrm{<13,15>}\}$,
let v1315 = stringsVariable "<13,15>"
let pp = take 20 $ reverse $ sort $ map (\(a,b) -> (b,a)) $ aall $ araa uu $ hr `hrred` sgl v1315
map numerator $ fst $ unzip pp
[4765,2735]
bmwrite file $ bmhstack $ map (bmborder 1 . hrbm 28 3 2) $ [hr' `hrhrsel` hrs | let hr' = hr `hrhrred` vvk, (_,ss) <- pp, let hrs = aahr uu (single ss 1)]
The top 20 are $\mathrm{topd}(20)(L)$, where $L = \{(w,~\mathrm{lent}(A,\{w\},V_{\mathrm{l}})) : w \in V_{\mathrm{k}}\}$,
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (llqq $ snd $ unzip $ take 20 ll)
The top 100 are $\mathrm{topd}(100)(L)$,
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (llqq $ snd $ unzip $ take 100 ll)
The bottom 20 pixels are $\mathrm{botd}(20)(L)$,
rpln $ drop (28*28-20) ll
"(2.3018362154122483,<27,5>)"
"(2.3018362154122483,<27,23>)"
"(2.3018362154122483,<27,24>)"
"(2.3018362154122483,<27,25>)"
"(2.3018362154122483,<27,26>)"
"(2.3018362154122483,<27,27>)"
"(2.3018362154122483,<27,28>)"
"(2.3018362154122483,<28,1>)"
"(2.3018362154122483,<28,2>)"
"(2.3018362154122483,<28,3>)"
"(2.3018362154122483,<28,4>)"
"(2.3018362154122483,<28,5>)"
"(2.3018362154122483,<28,6>)"
"(2.3018362154122483,<28,7>)"
"(2.3018362154122483,<28,23>)"
"(2.3018362154122483,<28,24>)"
"(2.3018362154122483,<28,25>)"
"(2.3018362154122483,<28,26>)"
"(2.3018362154122483,<28,27>)"
"(2.3018362154122483,<28,28>)"
This may be compared to the entropy of the label variables, $\mathrm{entropy}(A\%V_{\mathrm{l}})$,
ent (hhaa $ hrhh uu $ hr `hrhrred` vvl)
2.3018362154122483
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (llqq [u | (e,u) <- ll, e >= 2.3018362154122483])
Pixel <13,15>
has the least conditional entropy, and so is more predictive of digit, $\mathrm{lent}(A,\{\mathrm{<13,15>}\},V_{\mathrm{l}})$,
hrlent uu hr (sgl v1315) vvl
2.106715914339201
The reduction is $A~\%~\{\mathrm{<13,15>}\}$,
rpln $ aall $ hhaa $ hrhh uu $ hr `hrhrred` (sgl v1315 `union` vvl)
"({(digit,0),(<13,15>,0)},704 % 1)"
"({(digit,0),(<13,15>,1)},39 % 1)"
"({(digit,1),(<13,15>,0)},58 % 1)"
"({(digit,1),(<13,15>,1)},764 % 1)"
"({(digit,2),(<13,15>,0)},568 % 1)"
"({(digit,2),(<13,15>,1)},170 % 1)"
"({(digit,3),(<13,15>,0)},168 % 1)"
"({(digit,3),(<13,15>,1)},577 % 1)"
"({(digit,4),(<13,15>,0)},670 % 1)"
"({(digit,4),(<13,15>,1)},76 % 1)"
"({(digit,5),(<13,15>,0)},442 % 1)"
"({(digit,5),(<13,15>,1)},278 % 1)"
"({(digit,6),(<13,15>,0)},534 % 1)"
"({(digit,6),(<13,15>,1)},173 % 1)"
"({(digit,7),(<13,15>,0)},692 % 1)"
"({(digit,7),(<13,15>,1)},75 % 1)"
"({(digit,8),(<13,15>,0)},349 % 1)"
"({(digit,8),(<13,15>,1)},405 % 1)"
"({(digit,9),(<13,15>,0)},580 % 1)"
"({(digit,9),(<13,15>,1)},178 % 1)"
We can determine minimum subsets of the query variables that are causal or predictive by using the repa conditional entropy tuple set builder. We shall also calculate the shuffle content derived alignment and the size-volume-sized-shuffle relative entropy. \[ \{(\mathrm{lent}(A,M,V_{\mathrm{l}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A,\mathrm{L}}))\} \] First load the test sample and select a subset of 1000 events $A_{\mathrm{te}}$,
(_,hrte) <- nistTestBucketedIO 2
hrsize hrte
10000
let hrq = hrev [i | i <- [0 .. hrsize hrte - 1], i `mod` 10 == 0] hrte
hrsize hrq
1000
Now we do the conditional entropy minimise,
let buildcondrr vvl aa kmax omax qmax = sort $ map (\(a,b) -> (b,a)) $ Map.toList $ fromJust $ parametersBuilderConditionalVarsRepa kmax omax qmax vvl aa
let (kmax,omax,qmax) = (1, 60, 10)
let ll = buildcondrr vvl hr kmax omax qmax
rpln ll
"(2.106715914339201,{<13,15>})"
"(2.1123060297815153,{<14,15>})"
"(2.1332577705312117,{<15,15>})"
"(2.1340760040384357,{<22,10>})"
"(2.136893962468214,{<20,11>})"
"(2.13721055765674,{<21,10>})"
"(2.141687876893669,{<20,12>})"
"(2.141948880423374,{<21,11>})"
"(2.142064103611258,{<12,15>})"
"(2.143524082463724,{<22,11>})"
Let us sort by shuffle content derived alignment descending. Let $L = \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A,\mathrm{L}}))$. Then calculate \[ \{(\mathrm{algn}(A\%X)-\mathrm{algn}(A_{\mathrm{r}}\%X),~X) : (e,X) \in L\} \]
rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let aar' = hhaa (hrhh uu (hrr `hrhrred` xx))]
"(0.0,{<22,11>})"
"(0.0,{<22,10>})"
"(0.0,{<21,11>})"
"(0.0,{<21,10>})"
"(0.0,{<20,12>})"
"(0.0,{<20,11>})"
"(0.0,{<15,15>})"
"(0.0,{<14,15>})"
"(0.0,{<13,15>})"
"(0.0,{<12,15>})"
and by size-volume-sized-shuffle relative entropy descending, \[ \{(\mathrm{rent}(A~\%~X,~Z_X * \hat{A}_{\mathrm{r}}~\%~X),~X) : (e,X) \in L\} \] where $Z_X = \mathrm{scalar}(|X^{\mathrm{C}}|)$,
rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let vaar' = vsize uu xx (hhaa (hrhh uu (hrr `hrhrred` xx)))]
"(2.7622348852673895e-13,{<12,15>})"
"(2.737809978725636e-13,{<14,15>})"
"(2.2959412149248237e-13,{<15,15>})"
"(-4.884981308350689e-15,{<13,15>})"
"(-6.439293542825908e-14,{<20,12>})"
"(-1.6564527527407336e-13,{<22,11>})"
"(-2.0516921495072893e-13,{<22,10>})"
"(-3.157474282033945e-13,{<20,11>})"
"(-4.89386309254769e-13,{<21,10>})"
"(-6.794564910705958e-13,{<21,11>})"
Choose the top tuple $X$,
let xx = llqq $ map stringsVariable ["<13,15>"]
card xx
1
The label entropy, $\mathrm{lent}(A,X,V_{\mathrm{l}})$, is,
hrlent uu hr xx vvl
2.106715914339201
This tuple has a volume of $|X^{\mathrm{C}}| = 2$,
vol uu xx
2
Now consider the query effectiveness against the test set, $\mathrm{size}(A_{\mathrm{te}} * (A\%X)^{\mathrm{F}})$,
size $ hhaa (hrhh uu hrq) `mul` eff (hhaa (hrhh uu (hr `hrhrred` xx)))
1000 % 1
So there exists a prediction for each of the test set for the mono-variate tuple.
2-tuple
let (kmax,omax,qmax) = (2, 10, 10)
let ll = buildcondrr vvl hr kmax omax qmax
rpln ll
"(1.9278815272185974,{<14,15>,<17,15>})"
"(1.9308806609242308,{<13,15>,<16,15>})"
"(1.9312459077976918,{<13,15>,<17,15>})"
"(1.93457345799883,{<13,15>,<22,10>})"
"(1.9378630043953962,{<13,15>,<20,11>})"
"(1.9389377983376297,{<13,15>,<21,10>})"
"(1.941189996342865,{<14,15>,<22,10>})"
"(1.9421337966096723,{<13,15>,<21,11>})"
"(1.9422336675359642,{<14,15>,<20,11>})"
"(1.943167355230266,{<13,15>,<22,11>})"
rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let aar' = hhaa (hrhh uu (hrr `hrhrred` xx))]
"(43.05125269116252,{<13,15>,<16,15>})"
"(7.2539737408587825,{<13,15>,<21,10>})"
"(4.2186678996440605,{<13,15>,<20,11>})"
"(2.9941047592583345,{<14,15>,<17,15>})"
"(1.385944680099783,{<14,15>,<22,10>})"
"(1.269607648246165,{<13,15>,<22,10>})"
"(-0.16175869062863057,{<14,15>,<20,11>})"
"(-0.28838801408710424,{<13,15>,<17,15>})"
"(-0.3156879640810075,{<13,15>,<22,11>})"
"(-0.4525069542287383,{<13,15>,<21,11>})"
rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let vaar' = vsize uu xx (hhaa (hrhh uu (hrr `hrhrred` xx)))]
"(1.8386829250236048e-2,{<13,15>,<16,15>})"
"(4.991577993035179e-3,{<14,15>,<17,15>})"
"(3.7668384198976668e-3,{<13,15>,<21,10>})"
"(2.608184312029138e-3,{<13,15>,<21,11>})"
"(7.863858921490774e-4,{<13,15>,<22,10>})"
"(7.260431202160689e-4,{<13,15>,<20,11>})"
"(2.874437048907552e-4,{<14,15>,<22,10>})"
"(1.039902050123942e-4,{<13,15>,<17,15>})"
"(3.374541994727309e-5,{<13,15>,<22,11>})"
"(2.660226409112454e-5,{<14,15>,<20,11>})"
let xx = llqq $ map stringsVariable ["<13,15>","<16,15>"]
card xx
2
hrlent uu hr xx vvl
1.9308806609242308
vol uu xx
4
size $ hhaa (hrhh uu hrq) `mul` eff (hhaa (hrhh uu (hr `hrhrred` xx)))
1000 % 1
rpln $ aall $ hhaa $ hrhh uu $ hr `hrhrred` (xx `union` vvl)
"({(digit,0),(<13,15>,0),(<16,15>,0)},699 % 1)"
"({(digit,0),(<13,15>,0),(<16,15>,1)},5 % 1)"
"({(digit,0),(<13,15>,1),(<16,15>,0)},33 % 1)"
"({(digit,0),(<13,15>,1),(<16,15>,1)},6 % 1)"
"({(digit,1),(<13,15>,0),(<16,15>,0)},19 % 1)"
"({(digit,1),(<13,15>,0),(<16,15>,1)},39 % 1)"
"({(digit,1),(<13,15>,1),(<16,15>,0)},19 % 1)"
"({(digit,1),(<13,15>,1),(<16,15>,1)},745 % 1)"
...
"({(digit,9),(<13,15>,0),(<16,15>,0)},248 % 1)"
"({(digit,9),(<13,15>,0),(<16,15>,1)},332 % 1)"
"({(digit,9),(<13,15>,1),(<16,15>,0)},123 % 1)"
"({(digit,9),(<13,15>,1),(<16,15>,1)},55 % 1)"
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk xx
Imaging the 4 states ordered by size descending, $A\%X$,
let pp = take 20 $ reverse $ sort $ map (\(a,b) -> (b,a)) $ aall $ araa uu $ hr `hrred` xx
map numerator $ fst $ unzip pp
[2489,2276,1612,1123]
bmwrite file $ bmhstack $ map (bmborder 1 . hrbm 28 1 2) $ [hr' `hrhrsel` hrs | let hr' = hr `hrhrred` vvk, (_,ss) <- pp, let hrs = aahr uu (single ss 1)]
This 2-tuple is quite good at distinguishing between zero and non-zero.
5-tuple
Continue on to the 5-tuple,
let (kmax,omax,qmax) = (5, 10, 10)
let ll = buildcondrr vvl hr kmax omax qmax
rpln ll
"(1.436485455789557,{<13,15>,<16,15>,<20,12>,<22,10>,<24,13>})"
"(1.438843611340515,{<13,15>,<16,15>,<20,12>,<22,10>,<24,14>})"
"(1.4468132285301718,{<11,18>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(1.4474427280733333,{<14,15>,<17,15>,<20,12>,<22,10>,<24,13>})"
"(1.4497701554712257,{<13,15>,<16,15>,<20,12>,<22,11>,<24,13>})"
"(1.4500235885928388,{<13,15>,<16,15>,<20,12>,<22,10>,<24,12>})"
"(1.450024451470413,{<14,15>,<17,15>,<20,12>,<22,10>,<24,14>})"
"(1.4505181754063354,{<13,10>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(1.451323679428873,{<13,18>,<14,15>,<17,15>,<20,12>,<22,10>})"
"(1.4522048769374631,{<10,18>,<13,15>,<16,15>,<20,12>,<22,10>})"
rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let aar' = hhaa (hrhh uu (hrr `hrhrred` xx))]
"(811.0329633142683,{<13,15>,<16,15>,<20,12>,<22,11>,<24,13>})"
"(758.9258663460932,{<13,15>,<16,15>,<20,12>,<22,10>,<24,12>})"
"(733.2461124636175,{<14,15>,<17,15>,<20,12>,<22,10>,<24,13>})"
"(656.6864908957359,{<13,18>,<14,15>,<17,15>,<20,12>,<22,10>})"
"(644.0347740919315,{<13,15>,<16,15>,<20,12>,<22,10>,<24,13>})"
"(638.4889439393664,{<14,15>,<17,15>,<20,12>,<22,10>,<24,14>})"
"(536.7002904385736,{<13,15>,<16,15>,<20,12>,<22,10>,<24,14>})"
"(506.4821881385724,{<11,18>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(485.0874000337062,{<13,10>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(448.7673821378048,{<10,18>,<13,15>,<16,15>,<20,12>,<22,10>})"
rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let vaar' = vsize uu xx (hhaa (hrhh uu (hrr `hrhrred` xx)))]
"(4.430436832928706,{<13,15>,<16,15>,<20,12>,<22,11>,<24,13>})"
"(3.638794073206938,{<13,15>,<16,15>,<20,12>,<22,10>,<24,12>})"
"(3.5359659134791883,{<14,15>,<17,15>,<20,12>,<22,10>,<24,13>})"
"(3.358375838520928,{<14,15>,<17,15>,<20,12>,<22,10>,<24,14>})"
"(3.127285631617113,{<13,15>,<16,15>,<20,12>,<22,10>,<24,13>})"
"(3.088298662755136,{<13,18>,<14,15>,<17,15>,<20,12>,<22,10>})"
"(2.9533695215733786,{<13,15>,<16,15>,<20,12>,<22,10>,<24,14>})"
"(2.311549911111868,{<11,18>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(2.2027705825412482,{<13,10>,<13,15>,<16,15>,<20,12>,<22,10>})"
"(1.8670983616914754,{<10,18>,<13,15>,<16,15>,<20,12>,<22,10>})"
let xx = head $ snd $ unzip ll
card xx
5
hrlent uu hr xx vvl
1.436485455789557
vol uu xx
32
size $ hhaa (hrhh uu hrq) `mul` eff (hhaa (hrhh uu (hr `hrhrred` xx)))
1000 % 1
rpln $ aall $ hhaa $ hrhh uu $ hr `hrhrred` (xx `union` vvl)
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk xx
Imaging the first 20 states ordered by size descending, $\mathrm{top}(20)(A\%X)$,
let pp = take 20 $ reverse $ sort $ map (\(a,b) -> (b,a)) $ aall $ araa uu $ hr `hrred` xx
map numerator $ fst $ unzip pp
[898,861,673,436,407,369,359,333,299,290,245,232,217,178,169,152,150,141,133,129]
bmwrite file $ bmhstack $ map (bmborder 1 . hrbm 28 1 2) $ [hr' `hrhrsel` hrs | let hr' = hr `hrhrred` vvk, (_,ss) <- pp, let hrs = aahr uu (single ss 1)]
Let us apply the 5-tuple to the test sample to calculate the accuracy of prediction, \[ \{(Q\%V_{\mathrm{l}},~\mathrm{max}(R)) : (S,\cdot) \in A_{\mathrm{te}},~Q = \{S\}^{\mathrm{U}},~R = A * (Q\%X),~\mathrm{size}(R) > 0\} \]
let amax = llaa . take 1 . reverse . map (\(a,b) -> (b,a)) . sort . map (\(a,b) -> (b,a)) . aall . norm . trim
rpln $ take 20 [(aarr (qq `red` vvl), aarr (amax rr)) | let hhq = hrhh uu (hrq `hrhrred` (xx `union` vvl)), let aa = araa uu (hr `hrred` (xx `union` vvl)), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` xx) `red` vvl, size rr > 0]
"([({(digit,7)},1.0)],[({(digit,7)},0.4151376146788991)])"
"([({(digit,0)},1.0)],[({(digit,0)},0.7764127764127764)])"
"([({(digit,9)},1.0)],[({(digit,3)},0.49310344827586206)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.49310344827586206)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.6939078751857355)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6128133704735376)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.35423925667828104)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.4151376146788991)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.35423925667828104)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.49310344827586206)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6128133704735376)])"
"([({(digit,8)},1.0)],[({(digit,1)},0.6939078751857355)])"
"([({(digit,5)},1.0)],[({(digit,8)},0.4883720930232558)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6128133704735376)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.4624624624624625)])"
"([({(digit,9)},1.0)],[({(digit,4)},0.4298440979955457)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.4298440979955457)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.4298440979955457)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.6939078751857355)])"
"([({(digit,1)},1.0)],[({(digit,8)},0.47368421052631576)])"
We show the test sample digit on the left and the best guess by the tuple on the right, along with the probability. In this case all 20 queries are effective and the model is correct 15 times.
These are the query images side by side with the slice for the tuple state,
bmwrite file $ bmvstack $ take 20 $ map (bmhstack . map (bmborder 1 . hrbm 28 1 2)) [[hrev [i] hrq `hrhrred` vvk, hr `hrhrsel` qq `hrhrred` vvk] | i <- [0 .. hrsize hrq - 1], let qq = hrev [i] hrq `hrhrred` xx]
Overall, this model is correct for 44.8% of the test sample, \[ |\{R : (S,\cdot) \in A_{\mathrm{te}},~Q = \{S\}^{\mathrm{U}},~R = A * (Q\%X),~\mathrm{size}(\mathrm{max}(R) * (Q\%V_{\mathrm{l}})) > 0\}| \]
length [rr | let hhq = hrhh uu (hrq `hrhrred` (xx `union` vvl)), let aa = araa uu (hr `hrred` (xx `union` vvl)), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` xx) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
448
12-tuple
Continue on to the 12-tuple,
let (kmax,omax,qmax) = (12, 10, 10)
let ll = buildcondrr vvl hr kmax omax qmax
rpln ll
"(0.44876138405600585,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.4488059178690671,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.4494382758231419,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.4497106538933693,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.450279819031401,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.4507996248397248,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.45103768071386785,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.45139293398245073,{<6,16>,<8,15>,<10,12>,<10,17>,<13,15>,<14,18>,<15,12>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.4519764035369258,{<8,15>,<10,17>,<10,19>,<11,11>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(0.45260335677485486,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<17,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let aar' = hhaa (hrhh uu (hrr `hrhrred` xx))]
"(4699.670621715999,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4621.062377420516,{<6,16>,<8,15>,<10,12>,<10,17>,<13,15>,<14,18>,<15,12>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4614.444253739001,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4577.341163675566,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4480.221910807174,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4193.502863142198,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4183.941209264208,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4149.267596970897,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<17,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4119.763448820465,{<8,15>,<10,17>,<10,19>,<11,11>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(4095.1484671911376,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uu (hr `hrhrred` xx)), let vaar' = vsize uu xx (hhaa (hrhh uu (hrr `hrhrred` xx)))]
"(3123.6207472783244,{<6,16>,<8,15>,<10,12>,<10,17>,<13,15>,<14,18>,<15,12>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(3032.1854086896783,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(3012.1475269226794,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,12>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2996.2635833369714,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2988.9158685330367,{<8,15>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<17,11>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2969.2991737766934,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<15,18>,<16,15>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2945.6892856164413,{<8,15>,<10,17>,<10,19>,<11,11>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2915.848713614232,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2907.7712056377277,{<8,15>,<9,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<16,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
"(2892.892903450367,{<8,15>,<10,12>,<10,17>,<10,19>,<13,15>,<15,12>,<16,15>,<17,18>,<20,12>,<21,14>,<22,10>,<24,13>})"
let xx = head $ snd $ unzip ll
card xx
12
hrlent uu hr xx vvl
0.44876138405600585
vol uu xx
4096
size $ hhaa (hrhh uu hrq) `mul` eff (hhaa (hrhh uu (hr `hrhrred` xx)))
876 % 1
rpln $ aall $ hhaa $ hrhh uu $ hr `hrhrred` (xx `union` vvl)
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk xx
Imaging the first 20 states ordered by size descending, $\mathrm{top}(20)(A\%X)$,
let pp = take 20 $ reverse $ sort $ map (\(a,b) -> (b,a)) $ aall $ araa uu $ hr `hrred` xx
map numerator $ fst $ unzip pp
[162,138,88,72,65,64,63,55,51,45,44,44,44,44,44,42,40,36,36,35]
bmwrite file $ bmhstack $ map (bmborder 1 . hrbm 28 1 2) $ [hr' `hrhrsel` hrs | let hr' = hr `hrhrred` vvk, (_,ss) <- pp, let hrs = aahr uu (single ss 1)]
We can see that as the tuple cardinality increases and the label entropy decreases, the slices are increasingly identifiable as a digit.
Let us apply the 12-tuple to the test sample to calculate the accuracy of prediction, \[ \{(Q\%V_{\mathrm{l}},~\mathrm{max}(R)) : (S,\cdot) \in A_{\mathrm{te}},~Q = \{S\}^{\mathrm{U}},~R = A * (Q\%X),~\mathrm{size}(R) > 0\} \]
rpln $ take 20 [(aarr (qq `red` vvl), aarr (amax rr)) | let hhq = hrhh uu (hrq `hrhrred` (xx `union` vvl)), let aa = araa uu (hr `hrred` (xx `union` vvl)), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` xx) `red` vvl, size rr > 0]
"([({(digit,7)},1.0)],[({(digit,7)},0.9777777777777777)])"
"([({(digit,0)},1.0)],[({(digit,0)},0.7818181818181819)])"
"([({(digit,9)},1.0)],[({(digit,9)},0.8)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.875)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.9814814814814815)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.9722222222222222)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.7894736842105263)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.6)])"
"([({(digit,7)},1.0)],[({(digit,4)},0.8666666666666667)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.6666666666666666)])"
"([({(digit,6)},1.0)],[({(digit,6)},1.0)])"
"([({(digit,8)},1.0)],[({(digit,4)},1.0)])"
"([({(digit,5)},1.0)],[({(digit,5)},1.0)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.9722222222222222)])"
"([({(digit,6)},1.0)],[({(digit,6)},1.0)])"
"([({(digit,9)},1.0)],[({(digit,9)},0.782608695652174)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.967741935483871)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.8095238095238095)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.9710144927536232)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.8333333333333334)])"
In this case all 20 queries are effective and the model is correct 18 times,
bmwrite file $ bmvstack $ take 20 $ map (bmhstack . map (bmborder 1 . hrbm 28 1 2)) [[hrev [i] hrq `hrhrred` vvk, hr `hrhrsel` qq `hrhrred` vvk] | i <- [0 .. hrsize hrq - 1], let qq = hrev [i] hrq `hrhrred` xx]
Overall, this model is correct for 60% of the test sample,
length [rr | let hhq = hrhh uu (hrq `hrhrred` (xx `union` vvl)), let aa = araa uu (hr `hrred` (xx `union` vvl)), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` xx) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
600
The prediction accuracy of larger tuples is not likely to be much higher, because the query effectiveness declines as the tuple cardinality increases.
1-tuple 15-fud
Instead of determining minimum subsets of the query variables that are causal or predictive by using the conditional entropy tuple set builder, consider using the conditional entropy fud decomper, $\{D\} = \mathrm{leaves}(\mathrm{tree}(Z_{P,A,\mathrm{L,D,F}}))$.
The resultant decomposition consists of singleton fuds of self partition transforms of smaller tuples. In this way a set of paths of different tuples for different slices can reduce the label entropy,
let decompercondrr vvl uu aa kmax omax fmax = fromJust $ parametersSystemsHistoryRepasDecomperConditionalFmaxRepa kmax omax fmax uu vvl aa
let (kmax,omax) = (1, 5)
let (uu1,df) = decompercondrr vvl uu hr kmax omax 15
rp $ dfund df
"{<6,16>,<8,11>,<8,15>,<9,18>,<11,11>,<13,15>,<15,12>,<15,14>,<16,10>,<16,16>,<17,13>,<17,15>,<19,11>,<20,12>,<22,10>}"
card $ dfund df
15
let dfll = qqll . treesPaths . dfzz
rpln $ map (map (\(_,ff) -> fund ff)) $ dfll df
"[{<13,15>},{<22,10>},{<20,12>},{<16,10>},{<15,14>},{<17,13>},{<6,16>}]"
"[{<13,15>},{<22,10>},{<20,12>},{<16,10>},{<8,15>}]"
"[{<13,15>},{<22,10>},{<20,12>},{<9,18>}]"
"[{<13,15>},{<22,10>},{<16,16>}]"
"[{<13,15>},{<17,15>},{<19,11>},{<11,11>}]"
"[{<13,15>},{<17,15>},{<8,11>},{<15,12>}]"
Now analyse with the fud decomposition fud, $F = D^{\mathrm{F}}$, (see Practicable fud decomposition fud),
let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1
let uu2 = uu `uunion` (fsys ff)
let hrb = hrfmul uu2 ff hr
The label entropy, $\mathrm{lent}(A * \mathrm{his}(F^{\mathrm{T}}),W_F,V_{\mathrm{l}})$, where $W_F = \mathrm{der}(F)$, is
hrlent uu2 hrb (fder ff) vvl
1.4172761780693386
Imaging the underlying $V_F = \mathrm{und}(F)$ overlaid on the average, $\hat{A}\%V_{\mathrm{k}}$,
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)
We can see that a decomposition limited to 15 fuds has an expected path length of 4 but a label entropy similar to the 5-tuple above. Only 3 of the variables are in both models,
let xx = llqq $ map stringsVariable ["<13,15>","<16,15>","<20,12>","<22,10>","<24,13>"]
rp $ dfund df `inter` xx
"{<13,15>,<20,12>,<22,10>}"
The conditional entropy decomposition fud of a bi-valent substrate is always effective, $\mathrm{size}((A_{\mathrm{te}} * F^{\mathrm{T}}) * (A * F^{\mathrm{T}})^{\mathrm{F}}) = \mathrm{size}(A_{\mathrm{te}})$,
let hrqb = hrfmul uu2 ff hrq
size $ hhaa (hrhh uu2 (hrqb `hrhrred` (fder ff))) `mul` eff (hhaa (hrhh uu2 (hrb `hrhrred` (fder ff))))
1000 % 1
Imaging the slices of the decomposition,
rpln $ map (map (hrsize . snd)) $ qqll $ treesPaths $ hrmult uu1 df hr
"[7500,4765,3406,2564,1569,1130,811]"
"[7500,4765,3406,2564,995]"
"[7500,4765,3406,842]"
"[7500,4765,1359]"
"[7500,2735,1441,1070]"
"[7500,2735,1294,984]"
bmwrite file $ bmvstack $ map (\bm -> bminsert (bmempty (28+2) ((28+2)*7)) 0 0 bm) $ map (bmhstack . map (\(_,hrs) -> bmborder 1 (hrbm 28 1 2 (hrs `hrhrred` vvk)))) $ qqll $ treesPaths $ hrmult uu1 df hr
Let us apply the 15-fud to the test sample to calculate the accuracy of prediction, \[ \begin{eqnarray} &&\{(Q\%V_{\mathrm{l}},~\mathrm{max}(R)) : (S,\cdot) \in A_{\mathrm{te}} * \mathrm{his}(F^{\mathrm{T}})~\%~(W_F \cup V_{\mathrm{l}}),~Q = \{S\}^{\mathrm{U}}, \\ &&\hspace{8em}R = A * \mathrm{his}(F^{\mathrm{T}})~\%~(W_F \cup V_{\mathrm{l}}) * (Q\%W_F),~\mathrm{size}(R) > 0\} \end{eqnarray} \]
rpln $ take 20 [(aarr (qq `red` vvl), aarr (amax rr)) | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0]
"([({(digit,7)},1.0)],[({(digit,7)},0.7174825174825175)])"
"([({(digit,0)},1.0)],[({(digit,0)},0.6117353308364545)])"
"([({(digit,9)},1.0)],[({(digit,5)},0.3783783783783784)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.5753052917232022)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.8231780167264038)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6442141623488774)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.7174825174825175)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.7174825174825175)])"
"([({(digit,7)},1.0)],[({(digit,4)},0.6975524475524476)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.5753052917232022)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6442141623488774)])"
"([({(digit,8)},1.0)],[({(digit,3)},0.5753052917232022)])"
"([({(digit,5)},1.0)],[({(digit,3)},0.5753052917232022)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6442141623488774)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.6442141623488774)])"
"([({(digit,9)},1.0)],[({(digit,9)},0.5602836879432624)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.29153605015673983)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.29153605015673983)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.8231780167264038)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.8231780167264038)])"
In this case all 20 queries are effective and the model is correct 16 times,
bmwrite file $ bmvstack $ take 20 $ map (bmhstack . map (bmborder 1 . hrbm 28 1 2)) [[hrev [i] hrqb `hrhrred` vvk, hrb `hrhrsel` qq `hrhrred` vvk] | i <- [0 .. hrsize hrqb - 1], let qq = hrev [i] hrqb `hrhrred` fder ff]
Overall, this model is correct for 56.7% of the test sample, \[ \begin{eqnarray} &&|\{R : (S,\cdot) \in A_{\mathrm{te}} * \mathrm{his}(F^{\mathrm{T}})~\%~(W_F \cup V_{\mathrm{l}}),~Q = \{S\}^{\mathrm{U}}, \\ &&\hspace{4em}R = A * \mathrm{his}(F^{\mathrm{T}})~\%~(W_F \cup V_{\mathrm{l}}) * (Q\%W_F),~\mathrm{size}(\mathrm{max}(R) * (Q\%V_{\mathrm{l}})) > 0\}| \end{eqnarray} \]
length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
567
This is considerably more accurate than the 5-tuple (44.8%). It is approaching the accuracy of the 12-tuple (60.0%).
1-tuple 127-fud
Increasing the decomposition fuds cardinality to an expected path length of 7,
let (uu1,df) = decompercondrr vvl uu hr kmax omax 127
card $ dfund df
101
rpln $ map (map (\(_,ff) -> fund ff)) $ dfll df
"[{<13,15>},{<22,10>},{<20,12>},{<16,10>},{<15,14>},{<17,13>},{<6,16>},{<18,9>},{<10,19>},{<11,17>},{<21,15>},{<24,14>}]"
"[{<13,15>},{<22,10>},{<20,12>},{<16,10>},{<15,14>},{<17,13>},{<6,16>},{<18,9>},{<10,19>},{<11,17>},{<15,15>},{<6,11>},{<13,10>}]"
"[{<13,15>},{<22,10>},{<20,12>},{<16,10>},{<15,14>},{<17,13>},{<6,16>},{<18,9>},{<10,19>},{<20,8>},{<16,13>},{<14,14>}]"
"..."
"[{<13,15>},{<17,15>},{<8,11>},{<12,10>},{<19,12>},{<19,15>}]"
"[{<13,15>},{<17,15>},{<8,11>},{<12,10>},{<19,12>},{<12,13>}]"
"[{<13,15>},{<17,15>},{<8,11>},{<12,10>},{<24,16>}]"
let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1
let uu2 = uu `uunion` (fsys ff)
let hrb = hrfmul uu2 ff hr
hrlent uu2 hrb (fder ff) vvl
0.7456431103313212
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)
Imaging the slices of the decomposition,
rpln $ map (map (hrsize . snd)) $ qqll $ treesPaths $ hrmult uu1 df hr
"[7500,4765,3406,2564,1569,1130,811,715,651,241,86,52]"
"[7500,4765,3406,2564,1569,1130,811,715,651,241,155,128,121]"
"[7500,4765,3406,2564,1569,1130,811,715,651,410,401,394]"
...
"[7500,2735,1294,310,196,114]"
"[7500,2735,1294,310,196,82]"
"[7500,2735,1294,310,114]"
bmwrite file $ bmvstack $ map (\bm -> bminsert (bmempty (28+2) ((28+2)*13)) 0 0 bm) $ map (bmhstack . map (\(_,hrs) -> bmborder 1 (hrbm 28 1 2 (hrs `hrhrred` vvk)))) $ qqll $ treesPaths $ hrmult uu1 df hr
The decomposition fud is still effective,
let hrqb = hrfmul uu2 ff hrq
size $ hhaa (hrhh uu2 (hrqb `hrhrred` (fder ff))) `mul` eff (hhaa (hrhh uu2 (hrb `hrhrred` (fder ff))))
1000 % 1
Let us apply the 127-fud to the test sample to calculate the accuracy of prediction,
rpln $ take 20 [(aarr (qq `red` vvl), aarr (amax rr)) | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0]
"([({(digit,7)},1.0)],[({(digit,7)},0.9639175257731959)])"
"([({(digit,0)},1.0)],[({(digit,0)},0.9884726224783862)])"
"([({(digit,9)},1.0)],[({(digit,9)},0.9090909090909091)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.9798387096774194)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.9855769230769231)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.9707792207792207)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.9047619047619048)])"
"([({(digit,7)},1.0)],[({(digit,7)},0.9639175257731959)])"
"([({(digit,7)},1.0)],[({(digit,5)},0.5238095238095238)])"
"([({(digit,3)},1.0)],[({(digit,3)},0.9798387096774194)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.9707792207792207)])"
"([({(digit,8)},1.0)],[({(digit,3)},0.7954545454545454)])"
"([({(digit,5)},1.0)],[({(digit,5)},0.7876106194690266)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.9707792207792207)])"
"([({(digit,6)},1.0)],[({(digit,6)},0.7721518987341772)])"
"([({(digit,9)},1.0)],[({(digit,9)},0.9184782608695652)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.8055555555555556)])"
"([({(digit,4)},1.0)],[({(digit,4)},0.8055555555555556)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.9855769230769231)])"
"([({(digit,1)},1.0)],[({(digit,1)},0.7833333333333333)])"
In this case all 20 queries are effective and the model is correct 18 times,
bmwrite file $ bmvstack $ take 20 $ map (bmhstack . map (bmborder 1 . hrbm 28 1 2)) [[hrev [i] hrqb `hrhrred` vvk, hrb `hrhrsel` qq `hrhrred` vvk] | i <- [0 .. hrsize hrqb - 1], let qq = hrev [i] hrqb `hrhrred` fder ff]
Overall, this model is correct for 70.2% of the test sample,
length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
702
This is considerably more accurate than the 12-tuple (60.0%), which is also much less effective (87.6% instead of 100%).
1-tuple 511-fud
Increasing the decomposition fuds cardinality to an expected path length of 9,
let (uu1,df) = decompercondrr vvl uu hr kmax omax 511
card $ dfund df
236
let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1
let uu2 = uu `uunion` (fsys ff)
let hrb = hrfmul uu2 ff hr
hrlent uu2 hrb (fder ff) vvl
0.24275823199986313
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)
Overall, this model is correct for 75% of the test sample,
let hrqb = hrfmul uu2 ff hrq
length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
750
2-tuple 15-fud
Now consider a 2-tuple for each fud in a 15-fud decomposition,
let (kmax,omax) = (2, 5)
let (uu1,df) = decompercondrr vvl uu hr kmax omax 15
card $ dfund df
29
rpln $ map (map (\(_,ff) -> fund ff)) $ dfll df
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<16,13>,<19,13>},{<7,15>,<10,15>}]"
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<12,15>,<15,15>}]"
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<10,19>,<17,17>}]"
"[{<14,15>,<17,15>},{<21,13>,<24,12>},{<8,15>,<21,11>},{<13,17>,<14,19>}]"
"[{<14,15>,<17,15>},{<21,13>,<24,12>},{<9,19>,<13,11>}]"
"[{<14,15>,<17,15>},{<19,11>,<24,13>},{<19,13>,<22,13>}]"
"[{<14,15>,<17,15>},{<19,11>,<24,13>},{<11,16>,<11,18>}]"
"[{<14,15>,<17,15>},{<11,11>,<20,11>},{<8,11>,<16,12>}]"
let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1
let uu2 = uu `uunion` (fsys ff)
let hrb = hrfmul uu2 ff hr
hrlent uu2 hrb (fder ff) vvl
1.1776004502299249
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)
The decomposition fud is still effective,
let hrqb = hrfmul uu2 ff hrq
size $ hhaa (hrhh uu2 (hrqb `hrhrred` (fder ff))) `mul` eff (hhaa (hrhh uu2 (hrb `hrhrred` (fder ff))))
1000 % 1
Imaging the slices of the decomposition,
rpln $ map (map (hrsize . snd)) $ qqll $ treesPaths $ hrmult uu1 df hr
"[7500,2250,1008,618]"
"[7500,2250,614]"
"[7500,2250,368]"
"[7500,1913,979,386]"
"[7500,1913,498]"
"[7500,1693,578]"
"[7500,1693,660]"
"[7500,1644,988]"
bmwrite file $ bmvstack $ map (\bm -> bminsert (bmempty (28+2) ((28+2)*4)) 0 0 bm) $ map (bmhstack . map (\(_,hrs) -> bmborder 1 (hrbm 28 1 2 (hrs `hrhrred` vvk)))) $ qqll $ treesPaths $ hrmult uu1 df hr
Overall, this model is correct for 58% of the test sample,
length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
580
So the accuracy is little better than the 15-mono-variate-fud (56.7%).
2-tuple 127-fud
Now consider a 2-tuple for each fud in a 127-fud decomposition,
let (uu1,df) = decompercondrr vvl uu hr kmax omax 127
card $ dfund df
153
rpln $ map (map (\(_,ff) -> fund ff)) $ dfll df
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<16,13>,<19,13>},{<7,15>,<10,15>},{<8,15>,<17,12>},{<6,13>,<15,14>},{<5,14>,<12,13>}]"
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<16,13>,<19,13>},{<7,15>,<10,15>},{<8,15>,<17,12>},{<14,13>,<16,18>}]"
"[{<14,15>,<17,15>},{<21,12>,<23,11>},{<16,13>,<19,13>},{<7,15>,<10,15>},{<14,13>,<22,8>}]"
...
"[{<14,15>,<17,15>},{<11,11>,<20,11>},{<7,15>,<9,15>},{<13,13>,<22,13>}]"
"[{<14,15>,<17,15>},{<11,11>,<20,11>},{<7,15>,<9,15>},{<11,15>,<22,14>}]"
"[{<14,15>,<17,15>},{<11,11>,<20,11>},{<18,10>,<24,13>}]"
let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1
let uu2 = uu `uunion` (fsys ff)
let hrb = hrfmul uu2 ff hr
hrlent uu2 hrb (fder ff) vvl
0.4608076349580772
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)
The decomposition fud is slightly ineffective,
let hrqb = hrfmul uu2 ff hrq
size $ hhaa (hrhh uu2 (hrqb `hrhrred` (fder ff))) `mul` eff (hhaa (hrhh uu2 (hrb `hrhrred` (fder ff))))
999 % 1
Overall, this model is correct for 74.3% of the test sample,
length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
743
So the accuracy is better than the 127-mono-variate-fud (70.2%), but not as good as the 511-mono-variate-fud (75%).
3-tuple 127-fud
Finally, consider tri-variate-fuds,
let (kmax,omax) = (3, 5)
let (uu1,df) = decompercondrr vvl uu hr kmax omax 127
card $ dfund df
207
let ff = fromJust $ systemsDecompFudsNullablePracticable uu1 df 1
let uu2 = uu `uunion` (fsys ff)
let hrb = hrfmul uu2 ff hr
hrlent uu2 hrb (fder ff) vvl
0.268674398120087
bmwrite file $ bmborder 1 $ bmmax hrbmav 0 0 $ hrbm 28 3 2 $ qqhr 2 uu vvk (fund ff)
The decomposition fud is slightly ineffective,
let hrqb = hrfmul uu2 ff hrq
size $ hhaa (hrhh uu2 (hrqb `hrhrred` (fder ff))) `mul` eff (hhaa (hrhh uu2 (hrb `hrhrred` (fder ff))))
994 % 1
Overall, this model is correct for 73.6% of the test sample,
length [rr | let hhq = hrhh uu2 (hrqb `hrhrred` (fder ff `union` vvl)), let aa = hhaa (hrhh uu2 (hrb `hrhrred` (fder ff `union` vvl))), (_,ss) <- hhll hhq, let qq = single ss 1, let rr = aa `mul` (qq `red` fder ff) `red` vvl, size rr > 0, size (amax rr `mul` (qq `red` vvl)) > 0]
736
So the accuracy of the 127-tri-variate-fud is a little worse than that of the 127-bi-variate-fud (74.3%).
Overall, of all of these models the 511-mono-variate-fud has the highest accuracy of 75.0%.
To conclude, a model consisting of only substrate variables can have reasonable accuracy/effectiveness with respect to digit, considering that the relative entropy of a substrate model is always quite low compared to induced models.