AMES - House Prices

Sections

Introduction

Properties of the sample

Predicting sale price without modelling

Induced modelling of sale price

Introduction

The Ames Housing dataset describes the sale of individual residential property in Ames, Iowa from 2006 to 2010. It was compiled by Dean De Cock for use in data science education. Full details of the dataset are in Kaggle Data Set - House Prices: Advanced Regression Techniques.

The dataset contains 1460 events of 80 variables including SalePrice. There is also a test dataset containing 1459 events of 79 variables excluding SalePrice.

Here’s a brief version of what you’ll find in the data description file:

We shall analyse this dataset using the AMES repository, which depends on the AlignmentRepa repository. The AlignmentRepa repository is a fast Haskell implementation of some of the practicable inducers described in the paper. The code in this section can be executed by copying and pasting it into a Haskell interpreter; see README. Also see the Introduction in Notation.

Properties of the sample

First load the training sample $A_{\mathrm{tr}}$ and the test sample $A_{\mathrm{te}}$,

:l AMESDev

(uu,aatr,aate) <- amesIO

size aatr
1460 % 1

size aate
1459 % 1

let vv = uvars uu `minus` sgl (VarStr "Id")
let vvl = sgl (VarStr "SalePrice")
let vvk = vv `minus` vvl

card vv
80

The system is $U$. The sample substrate variables are $V = \mathrm{vars}(A_{\mathrm{tr}}) \setminus \{\mathrm{Id}\}$, the label variables are $V_{\mathrm{l}} = \{\mathrm{SalePrice}\}$, and the query variables form the remainder, $V_{\mathrm{k}} = V \setminus V_{\mathrm{l}}$.

Now create a joint sample on the query variables $A = A_{\mathrm{tr}}\%V_{\mathrm{k}} + A_{\mathrm{te}}\%V_{\mathrm{k}}$,

let aa = (aatr `red` vvk) `add` (aate `red` vvk)

size aa
2919 % 1

So $\mathrm{vars}(A) = V_{\mathrm{k}}$.
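The joint sample uses two histogram operations. Reduction, $A\%V$, discards every variable outside $V$ and merges the counts of states that then coincide. Addition sums counts state by state. The following is a minimal sketch of these semantics. It assumes a histogram is a map from states to counts. The names reduceH, addH, sizeH and fromEvents are hypothetical and are not the AlignmentRepa functions red, add and size.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-ins for the library's histogram type and operations:
-- a state maps variable names to values, a histogram maps states to counts.
type State = Map.Map String String
type Histogram = Map.Map State Rational

-- Reduction A % V: keep only the variables in vs, merging the counts of
-- states that become equal.
reduceH :: [String] -> Histogram -> Histogram
reduceH vs = Map.mapKeysWith (+) (`Map.restrictKeys` keep)
  where
    keep = Set.fromList vs

-- Addition A + B: sum the counts state by state.
addH :: Histogram -> Histogram -> Histogram
addH = Map.unionWith (+)

-- Size: the total count.
sizeH :: Histogram -> Rational
sizeH = sum . Map.elems

-- Build a histogram in which each event (variable-value pairs) counts once.
fromEvents :: [[(String, String)]] -> Histogram
fromEvents evs = Map.fromListWith (+) [(Map.fromList ev, 1) | ev <- evs]
```

Reducing a training histogram and a test histogram to the shared query variables and adding them gives a joint sample. Its size is the sum of the two sizes, as with 1460 + 1459 = 2919 above.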

The variable valencies are $\{(w,|U_w|) : w \in V\}$,

rpln $ sort [(u,w) | w <- qqll vv, let u = vol uu (sgl w)]
"(2,CentralAir)"
"(2,Street)"
"(3,Alley)"
...
"(10,OverallQual)"
"(10,SaleType)"
"(12,MoSold)"
"(14,PoolArea)"
"(14,TotRmsAbvGrd)"
"(16,Exterior1st)"
"(16,MSSubClass)"
"(17,Exterior2nd)"
"(25,Neighborhood)"
"(31,3SsnPorch)"
"(36,LowQualFinSF)"
"(38,MiscVal)"
"(61,YearRemodAdd)"
"(104,GarageYrBlt)"
"(118,YearBuilt)"
"(121,ScreenPorch)"
"(129,LotFrontage)"
"(183,EnclosedPorch)"
"(252,OpenPorchSF)"
"(273,BsmtFinSF2)"
"(379,WoodDeckSF)"
"(445,MasVnrArea)"
"(604,GarageArea)"
"(635,2ndFlrSF)"
"(663,SalePrice)"
"(992,BsmtFinSF1)"
"(1059,TotalBsmtSF)"
"(1083,1stFlrSF)"
"(1136,BsmtUnfSF)"
"(1292,GrLivArea)"
"(1951,LotArea)"

In order to construct tuples with more than one variable, the valencies of some of the variables with ordered values can be reframed into buckets. Module AMESDev has a function isOrd that determines which variables can be bucketed,

rpln $ sort [(u,w) | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w)]
"(3,HalfBath)"
"(4,BsmtHalfBath)"
...
"(14,PoolArea)"
"(14,TotRmsAbvGrd)"
"(16,MSSubClass)"
"(31,3SsnPorch)"
...
"(1136,BsmtUnfSF)"
"(1292,GrLivArea)"
"(1951,LotArea)"

rpln $ sort [(u,w) | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16]
"(31,3SsnPorch)"
"(36,LowQualFinSF)"
"(38,MiscVal)"
"(61,YearRemodAdd)"
"(104,GarageYrBlt)"
"(118,YearBuilt)"
"(121,ScreenPorch)"
"(129,LotFrontage)"
"(183,EnclosedPorch)"
"(252,OpenPorchSF)"
"(273,BsmtFinSF2)"
"(379,WoodDeckSF)"
"(445,MasVnrArea)"
"(604,GarageArea)"
"(635,2ndFlrSF)"
"(663,SalePrice)"
"(992,BsmtFinSF1)"
"(1059,TotalBsmtSF)"
"(1083,1stFlrSF)"
"(1136,BsmtUnfSF)"
"(1292,GrLivArea)"
"(1951,LotArea)"

let vvo = llqq [w | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16]

rpln $ aall $ aa `red` sgl (VarStr "3SsnPorch")
"({(3SsnPorch,0)},2882 % 1)"
"({(3SsnPorch,23)},1 % 1)"
"({(3SsnPorch,86)},1 % 1)"
...
"({(3SsnPorch,360)},1 % 1)"
"({(3SsnPorch,407)},1 % 1)"
"({(3SsnPorch,508)},1 % 1)"

rpln $ aall $ aa `red` sgl (VarStr "LotArea")
"({(LotArea,1300)},1 % 1)"
"({(LotArea,1470)},1 % 1)"
"({(LotArea,1476)},1 % 1)"
"({(LotArea,1477)},2 % 1)"
...
"({(LotArea,159000)},1 % 1)"
"({(LotArea,164660)},1 % 1)"
"({(LotArea,215245)},1 % 1)"

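Let us determine which of these orderable variables with valency greater than 16 have missing values, represented by ValStr "null". We check the joint sample first and then the training sample,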

rpln $ sort [(size bb, w) | w <- qqll vvk, isOrd uu w, let u = vol uu (sgl w), u > 16, let rr = unit (sgl (llss [(w, ValStr "null")])), let bb = aa `red` sgl w `mul` rr, size bb > 0]
"(1 % 1,BsmtFinSF1)"
"(1 % 1,BsmtFinSF2)"
"(1 % 1,BsmtUnfSF)"
"(1 % 1,GarageArea)"
"(1 % 1,TotalBsmtSF)"
"(23 % 1,MasVnrArea)"
"(159 % 1,GarageYrBlt)"
"(486 % 1,LotFrontage)"

rpln $ sort [(size bb, w) | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16, let rr = unit (sgl (llss [(w, ValStr "null")])), let bb = aatr `red` sgl w `mul` rr, size bb > 0]
"(8 % 1,MasVnrArea)"
"(81 % 1,GarageYrBlt)"
"(259 % 1,LotFrontage)"

Similarly, let us determine which of these variables have many zero values, ValInt 0, and so may treat zero as a special case,

rpln $ sort [(size bb, w) | w <- qqll vvk, isOrd uu w, let u = vol uu (sgl w), u > 16, let rr = unit (sgl (llss [(w, ValInt 0)])), let bb = aa `red` sgl w `mul` rr, size bb > 200]
"(241 % 1,BsmtUnfSF)"
"(929 % 1,BsmtFinSF1)"
"(1298 % 1,OpenPorchSF)"
"(1523 % 1,WoodDeckSF)"
"(1668 % 1,2ndFlrSF)"
"(1738 % 1,MasVnrArea)"
"(2460 % 1,EnclosedPorch)"
"(2571 % 1,BsmtFinSF2)"
"(2663 % 1,ScreenPorch)"
"(2816 % 1,MiscVal)"
"(2879 % 1,LowQualFinSF)"
"(2882 % 1,3SsnPorch)"

rpln $ sort [(size bb, w) | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16, let rr = unit (sgl (llss [(w, ValInt 0)])), let bb = aatr `red` sgl w `mul` rr, size bb > 100]
"(118 % 1,BsmtUnfSF)"
"(467 % 1,BsmtFinSF1)"
"(656 % 1,OpenPorchSF)"
"(761 % 1,WoodDeckSF)"
"(829 % 1,2ndFlrSF)"
"(861 % 1,MasVnrArea)"
"(1252 % 1,EnclosedPorch)"
"(1293 % 1,BsmtFinSF2)"
"(1344 % 1,ScreenPorch)"
"(1408 % 1,MiscVal)"
"(1434 % 1,LowQualFinSF)"
"(1436 % 1,3SsnPorch)"

let vvoz = llqq [w | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16, let rr = unit (sgl (llss [(w, ValInt 0)])), let bb = aatr `red` sgl w `mul` rr, size bb > 100]

card vvo
22

card vvoz
12

There are 22 orderable variables with valency greater than 16. Of these, 12 have more than 100 zero values in the training sample and so treat ValInt 0 as a special case.

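Now let us reframe these variables into buckets with a target valency of 20. The variables in vvoz are bucketed with their zero values excluded. SalePrice is only defined in the training sample, so it is bucketed using $A_{\mathrm{tr}}$,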

let xx = Map.fromList $ map (\(v,ww) -> let VarStr s = v in (v, (VarStr (s ++ "B"), ww))) $ [(v, bucket 20 aa v) | v <- qqll (vvo `minus` vvoz)] ++ [(VarStr "SalePrice", bucket 20 aatr (VarStr "SalePrice"))] ++ [(v, bucket 20 aa' v) | v <- qqll vvoz, let rr = unit (sgl (llss [(v, ValInt 0)])), let bb = aa `red` sgl v `mul` rr, let aa' = trim (aa `red` sgl v `sub` bb)]

let aab = reframeb aa xx

let aatrb = reframeb aatr xx

let aateb = reframeb aate xx

let uub = sys aab `uunion` sys aatrb `uunion` sys aateb
let vvb = uvars uub `minus` sgl (VarStr "Id")
let vvbl = sgl (VarStr "SalePriceB")
let vvbk = vvb `minus` vvbl

rpln $ sort [(u,w) | w <- qqll vvb, let u = vol uub (sgl w)]
"(2,CentralAir)"
"(2,Street)"
"(3,Alley)"
"(3,HalfBath)"
"(3,LandSlope)"
"(3,PavedDrive)"
"(3,Utilities)"
"(4,BsmtHalfBath)"
...
"(14,TotRmsAbvGrd)"
"(16,Exterior1st)"
"(16,MSSubClass)"
"(17,Exterior2nd)"
"(18,LotFrontageB)"
"(19,YearRemodAddB)"
...
"(23,ScreenPorchB)"
"(25,Neighborhood)"
"(31,3SsnPorchB)"

rpln $ concat [aall (aab `red` sgl w) | w <- qqll vvbk]

rpln $ concat [aall (aatrb `red` sgl w) | w <- qqll vvbl]

The bucketed system is $U_{\mathrm{b}}$. The bucketed joint sample is $A_{\mathrm{b}}$, the bucketed training sample is $A_{\mathrm{trb}}$ and the bucketed test sample is $A_{\mathrm{teb}}$. The bucketed sample substrate variables are $V_{\mathrm{b}}$, the bucketed label variables are $V_{\mathrm{bl}} = \{\mathrm{SalePriceB}\}$, and the bucketed query variables are $V_{\mathrm{bk}}$.
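The reframing maps the values of each ordered variable onto a small number of buckets. The definitions of bucket and reframeb are not shown here, so the following is only a sketch under stated assumptions. It assumes the buckets have roughly equal frequency and are labelled by their largest value. For the zero special case it keeps zero as its own value. The helpers bucketBounds, bucketOf and reframeWithZero are hypothetical.

```haskell
import Data.List (nub, sort)

-- Hypothetical equal-frequency bucketing (not the AMESDev `bucket` function):
-- the upper bounds of at most n buckets, read off the sorted values at evenly
-- spaced ranks. Repeated bounds are merged, so equal values share a bucket.
bucketBounds :: Int -> [Int] -> [Int]
bucketBounds n xs
  | n <= 0 || null xs = []
  | otherwise = nub [sorted !! max 0 ((i + 1) * len `div` n - 1) | i <- [0 .. n - 1]]
  where
    sorted = sort xs
    len = length xs

-- Label a value by the smallest bound not below it (the value itself if it
-- lies above every bound).
bucketOf :: [Int] -> Int -> Int
bucketOf bounds x = case dropWhile (< x) bounds of
  b : _ -> b
  [] -> x

-- Reframe a column with zero as a special value (assumption modelled on the
-- vvoz variables): zero keeps its own label, only non-zero values are bucketed.
reframeWithZero :: Int -> [Int] -> [Int]
reframeWithZero n xs = map relabel xs
  where
    bounds = bucketBounds n (filter (/= 0) xs)
    relabel 0 = 0
    relabel x = bucketOf bounds x
```

For example, bucketBounds 4 [1 .. 100] gives the bounds [25, 50, 75, 100], and 30 is then labelled 50.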

For convenience, the bucketing is encapsulated in amesBucketedIO in AMESDev,

:l AMESDev

(uub,aab,aatrb,aateb) <- amesBucketedIO 20

let vvb = uvars uub `minus` sgl (VarStr "Id")
let vvbl = sgl (VarStr "SalePriceB")
let vvbk = vvb `minus` vvbl

The geometric mean bucketed valency of the substrate, $|V_{\mathrm{b}}^{\mathrm{C}}|^{1/|V_{\mathrm{b}}|}$, is,

exp $ log (fromIntegral (vol uub vvb)) / fromIntegral (card vvb)
8.421852632661576

The label variable dimension, $|V_{\mathrm{bl}}|$, is,

card vvbl
1

The label bucketed variable volume, $|V_{\mathrm{bl}}^{\mathrm{C}}|$, is,

vol uub vvbl
20

The query variable dimension, $|V_{\mathrm{bk}}|$, is,

card vvbk
79

The geometric mean query bucketed valency, $|V_{\mathrm{bk}}^{\mathrm{C}}|^{1/|V_{\mathrm{bk}}|}$, is,

exp $ log (fromIntegral (vol uub vvbk)) / fromIntegral (card vvbk)
8.330151968320083
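Both geometric means above are computed from the volume. The same quantity can be computed from a list of valencies as the exponential of the mean log valency, which avoids forming a very large product. A standalone sketch, with geometricMeanValency as a hypothetical helper rather than part of AMESDev:

```haskell
-- Geometric mean valency of a list of variable valencies: the |V|-th root of
-- the volume (the product of the valencies). Summing logarithms gives the
-- same value as exp (log volume / |V|) without forming the product.
geometricMeanValency :: [Int] -> Double
geometricMeanValency [] = error "geometricMeanValency: no variables"
geometricMeanValency vals =
  exp (sum (map (log . fromIntegral) vals) / fromIntegral (length vals))
```

For example, the valencies 2 and 8 have a volume of 16 and a geometric mean valency of 4, up to rounding.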

The bucketed sample size, $\mathrm{size}(A_{\mathrm{b}})$, is

size aab
2919 % 1

Nearly all effective states correspond to exactly one event,

size $ eff aab
2916 % 1

The bucketed training sample size, $\mathrm{size}(A_{\mathrm{trb}})$, is

size aatrb
1460 % 1

All bucketed effective states correspond to exactly one event, $A_{\mathrm{trb}} = A_{\mathrm{trb}}^{\mathrm{F}}$,

size $ eff aatrb
1460 % 1

Now consider how highly aligned variables might be grouped together. See Entropy and alignment. First consider pairs in the substrate, $V_{\mathrm{b}}$, \[ \{(\mathrm{algn}(A_{\mathrm{trb}}\%\{w,x\}),~w,~x) : w,x \in V_{\mathrm{b}},~w < x\} \]

rpln $ reverse $ sort [(algn (aatrb `red` llqq [w,x]),w,x) | w <- qqll vvb, x <- qqll vvb, w < x]
"(2465.515298639463,GarageYrBltB,YearBuiltB)"
"(2152.548583211349,Exterior1st,Exterior2nd)"
"(1978.2349802414053,YearBuiltB,YearRemodAddB)"
"(1858.3580585810525,1stFlrSFB,TotalBsmtSFB)"
"(1724.7876320348048,GarageYrBltB,YearRemodAddB)"
"(1599.6055353604816,HouseStyle,MSSubClass)"
"(1568.0764167598086,1stFlrSFB,GrLivAreaB)"
"(1324.8418738115395,Neighborhood,YearBuiltB)"
"(1142.4421275737332,GarageAreaB,GarageCars)"
"(1058.2972139996386,GarageYrBltB,Neighborhood)"
"(1016.1760916081748,2ndFlrSFB,MSSubClass)"
"(1002.6330778099891,BsmtFinSF1B,BsmtFinType1)"
"(1001.7919977841902,2ndFlrSFB,HouseStyle)"
"(999.2949684838529,GrLivAreaB,TotalBsmtSFB)"
"(997.9601406150259,FireplaceQu,Fireplaces)"
"(959.3978901229884,MasVnrAreaB,MasVnrType)"
"(856.1859130315815,Foundation,YearBuiltB)"
"(837.2461935554879,MSSubClass,YearBuiltB)"
"(835.5545116207313,MSSubClass,Neighborhood)"
"(819.625393006077,Neighborhood,YearRemodAddB)"
"(814.132563549517,GarageAreaB,GarageYrBltB)"
"(802.3462154275726,GrLivAreaB,TotRmsAbvGrd)"
"(797.7360838914828,BldgType,MSSubClass)"
"(773.9597427192747,BsmtUnfSFB,TotalBsmtSFB)"
"(763.3659578029506,Exterior1st,YearBuiltB)"
"(756.7870691470594,OverallQual,SalePriceB)"
"(741.9020600393133,Exterior2nd,YearBuiltB)"
"(707.2717828684083,Neighborhood,SalePriceB)"
"(700.6161119670701,BsmtQual,YearBuiltB)"
"(691.933071655274,LotAreaB,LotFrontageB)"
"(659.1431610097475,Exterior1st,Neighborhood)"
"(658.2352622127469,Exterior2nd,Neighborhood)"
"(651.455079669025,GarageCars,GarageYrBltB)"
"(646.8060494318734,GrLivAreaB,SalePriceB)"
"(638.3458187498586,Foundation,GarageYrBltB)"
"(633.2998965204324,Foundation,Neighborhood)"
...
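The pairwise alignments above can be illustrated on a small contingency table. This sketch assumes the definition from Entropy and alignment, $\mathrm{algn}(A) = \sum_S \ln \Gamma_!(A_S) - \sum_S \ln \Gamma_!(A^{\mathrm{X}}_S)$, where $\Gamma_!(x) = \Gamma(x+1)$, $A^{\mathrm{X}}$ is the independent histogram, and the sums run over all states of the pair. Base Haskell has no log-gamma, so a Lanczos approximation is substituted. The names lnGamma and alignment2 are hypothetical, not the library's algn.

```haskell
import Data.List (transpose)

-- ln Gamma(x) by the Lanczos approximation (g = 7, nine coefficients).
-- This sketch only handles x >= 0.5, which covers ln Gamma(count + 1).
lnGamma :: Double -> Double
lnGamma x
  | x < 0.5 = error "lnGamma: this sketch only handles x >= 0.5"
  | otherwise = 0.5 * log (2 * pi) + (y + 0.5) * log t - t + log a
  where
    y = x - 1
    t = y + 7.5
    a = 0.99999999999980993 + sum [c / (y + i) | (i, c) <- zip [1 ..] lanczosCoeffs]

lanczosCoeffs :: [Double]
lanczosCoeffs =
  [ 676.5203681218851, -1259.1392167224028, 771.32342877765313
  , -176.61502916214059, 12.507343278686905, -0.13857109526572012
  , 9.9843695780195716e-6, 1.5056327351493116e-7 ]

-- Alignment of a two-variable histogram given as a rectangular matrix of
-- counts (rows: values of one variable, columns: values of the other).
-- The independent histogram is the outer product of the marginals / size.
alignment2 :: [[Double]] -> Double
alignment2 rows
  | n <= 0 = 0
  | otherwise = sum (map lnFact (concat rows)) - sum (map lnFact independent)
  where
    n = sum (map sum rows)
    rowTotals = map sum rows
    colTotals = map sum (transpose rows)
    independent = [r * c / n | r <- rowTotals, c <- colTotals]
    lnFact v = lnGamma (v + 1)
```

A perfectly diagonal table such as [[2,0],[0,2]] has alignment $2 \ln 2$. A table equal to its own independent, such as [[1,1],[1,1]], has alignment zero.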

We can see that some of the variables that are in highly aligned pairs are also in other highly aligned pairs, e.g. YearBuiltB or Neighborhood. This suggests that we should also consider tuple dimensions greater than two.

Now consider using the tupler to group together highly aligned variables in the substrate, $V_{\mathrm{b}}$. Note that for performance reasons we must first construct a HistoryRepa from the sample histogram, $A_{\mathrm{trb}}$. See History and HistoryRepa.

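First consider the tuple dimension by choosing a volume limit, xmax. Compare the sample sizes with several tuple volumes:

- three and four variables of geometric mean valency,
- the two highest-valency variables, Neighborhood and 3SsnPorchB,
- the seven lowest-valency variables, with and without an additional variable of valency 4.

Below, xmax is set to the training sample size, 1460,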

8.330151968320083 ** 3
578.041172320828

8.421852632661576 ** 3
597.341809663622

25*31
775

2*2*3*3*3*3*3
972

size aatrb
1460 % 1

size aab
2919 % 1

2*2*3*3*3*3*3*4
3888

8.330151968320083 ** 4
4815.170809378394

8.421852632661576 ** 4
5030.724692314405
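The comparison above can be summarised in code. A tuple's volume is the product of its valencies. With a mean valency of about 8.3, a limit equal to the training sample size admits tuples of about three variables. A sketch with hypothetical helpers volume and maxDimension:

```haskell
-- Volume of a tuple: the product of the valencies of its variables.
volume :: [Integer] -> Integer
volume = product

-- Largest k such that a tuple of k variables, each of the given (mean)
-- valency, has a volume no greater than the limit xmax.
maxDimension :: Double -> Double -> Int
maxDimension xmax valency
  | valency <= 1 = error "maxDimension: valency must exceed 1"
  | otherwise = length (takeWhile (<= xmax) (iterate (* valency) valency))
```

For example, maxDimension 1460 8.330151968320083 is 3, because the cube, about 578, is within the limit but the fourth power, about 4815, is not.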

Now create a shuffled sample, $A_{\mathrm{trbr}}$,

let hhtrb = aahr uub aatrb `hrhrred` vvb

let hhtrbr = historyRepasShuffle_u hhtrb 1

hrsize hhtrbr
1460
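This sketch assumes the shuffle permutes the values of each variable independently across the events. Under that assumption every variable keeps its marginal distribution, while dependencies between variables, and hence alignment, are destroyed. The sketch works on a history represented as rows of values. Its generator is a toy linear congruential generator, not the one used by historyRepasShuffle_u. The names lcg, permute and shuffleHistory are hypothetical.

```haskell
import Data.List (sortOn, transpose)

-- Toy linear congruential generator (hypothetical; assumes a 64-bit Int and
-- non-huge seeds; not suitable for serious statistical work).
lcg :: Int -> Int
lcg s = (1103515245 * s + 12345) `mod` 2147483648

-- Permute a list by sorting it on pseudo-random keys derived from a seed.
permute :: Int -> [a] -> [a]
permute seed xs = map snd (sortOn fst (zip keys xs))
  where
    keys = drop 1 (iterate lcg seed)

-- Shuffle a history, given as events (rows of values), by permuting each
-- variable's column independently. Marginals are preserved; dependencies
-- between the columns are broken.
shuffleHistory :: Int -> [[a]] -> [[a]]
shuffleHistory seed events =
  transpose [permute (lcg (seed + j)) col | (j, col) <- zip [0 ..] (transpose events)]
```

As with hrsize above, the shuffled history has the same number of events as the original.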

Now optimise the shuffle content alignment with the tuple set builder, $I_{P,U,\mathrm{B,ns,me}}$, \[ \{(\mathrm{algn}(A_{\mathrm{trb}}\%K)-\mathrm{algn}(A_{\mathrm{trbr}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V_{\mathrm{b}},~\emptyset,~A_{\mathrm{trb}},~A_{\mathrm{trbr}}))\} \]

let buildtup xmax omax bmax uu vv xx xxrr = reverse $ sort $ map (\((kk,_),_) -> (algn (araa uu (xx `hrred` kk)) - algn (araa uu (xxrr `hrred` kk)), kk)) $ parametersSystemsBuilderTupleNoSumlayerMultiEffectiveRepa_u xmax omax bmax 1 uu vv fudEmpty xx (hrhx xx) xxrr (hrhx xxrr)

rpln $ buildtup 1460 10 10 uub vvb hhtrb hhtrbr 
"(2316.315511707224,{GarageYrBltB,YearBuiltB})"
"(2313.8306050502806,{GarageYrBltB,Utilities,YearBuiltB})"
"(2305.794903451247,{GarageYrBltB,Street,YearBuiltB})"
"(2291.428871557963,{GarageYrBltB,PavedDrive,YearBuiltB})"
"(2290.4571539977837,{CentralAir,GarageYrBltB,YearBuiltB})"
"(2255.670566496886,{Alley,GarageYrBltB,YearBuiltB})"
"(2246.4146208143807,{BldgType,CentralAir,HouseStyle,MSSubClass})"
"(2220.4007854993206,{GarageYrBltB,LandSlope,YearBuiltB})"
"(2200.04968946938,{BldgType,HouseStyle,MSSubClass})"
"(2197.238473800078,{BldgType,HouseStyle,MSSubClass,Street})"

Now optimise again having removed the top tuple from the substrate, \[ Q_1~=~\{\mathrm{GarageYrBltB},~\mathrm{YearBuiltB}\} \] and \[ \{(\mathrm{algn}(A_{\mathrm{trb}}\%K)-\mathrm{algn}(A_{\mathrm{trbr}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V_{\mathrm{b}} \setminus Q_1,~\emptyset,~A_{\mathrm{trb}},~A_{\mathrm{trbr}}))\} \]

let qq1 = llqq $ map VarStr ["GarageYrBltB","YearBuiltB"]

rpln $ buildtup 1460 10 10 uub (vvb `minus` qq1) hhtrb hhtrbr 
"(2246.4146208143807,{BldgType,CentralAir,HouseStyle,MSSubClass})"
"(2200.04968946938,{BldgType,HouseStyle,MSSubClass})"
"(2197.238473800078,{BldgType,HouseStyle,MSSubClass,Street})"
"(2183.811196265153,{ExterQual,Exterior1st,Exterior2nd})"
"(2166.928021319717,{BsmtQual,Exterior1st,Exterior2nd})"
"(2161.2201036076463,{Exterior1st,Exterior2nd,HeatingQC})"
"(2129.808404829695,{CentralAir,Exterior1st,Exterior2nd})"
"(2127.747951702249,{Exterior1st,Exterior2nd,FullBath})"
"(2115.3207115468913,{Exterior1st,Exterior2nd,PoolQC})"
"(2113.5555998624536,{CentralAir,Exterior1st,Exterior2nd,Street})"

Now optimise again having removed both $Q_1$ and $Q_2$ from the substrate, where $Q_2$ is the union of the overlapping tuples at the top of the previous list, \[ Q_2~=~\{\mathrm{BldgType},~\mathrm{CentralAir},~\mathrm{HouseStyle},~\mathrm{MSSubClass},~\mathrm{Street}\} \] and \[ \{(\mathrm{algn}(A_{\mathrm{trb}}\%K)-\mathrm{algn}(A_{\mathrm{trbr}}\%K),~K) : ((K,\cdot,\cdot),\cdot) \in I_{P,U,\mathrm{B,ns,me}}^{ * }((V_{\mathrm{b}} \setminus Q_1 \setminus Q_2,~\emptyset,~A_{\mathrm{trb}},~A_{\mathrm{trbr}}))\} \]

let qq2 = llqq $ map VarStr ["BldgType","CentralAir","HouseStyle","MSSubClass","Street"]

rpln $ buildtup 1460 10 10 uub (vvb `minus` qq1 `minus` qq2) hhtrb hhtrbr 
"(2183.811196265153,{ExterQual,Exterior1st,Exterior2nd})"
"(2166.928021319717,{BsmtQual,Exterior1st,Exterior2nd})"
"(2161.2201036076463,{Exterior1st,Exterior2nd,HeatingQC})"
"(2127.747951702249,{Exterior1st,Exterior2nd,FullBath})"
"(2115.3207115468913,{Exterior1st,Exterior2nd,PoolQC})"
"(2111.679949111417,{Exterior1st,Exterior2nd})"
"(2110.8117080771854,{Exterior1st,Exterior2nd,Utilities})"
"(2109.6259531428923,{Exterior1st,Exterior2nd,KitchenQual})"
"(2106.512685544173,{Exterior1st,Exterior2nd,PavedDrive})"
"(2095.465720020596,{Exterior1st,Exterior2nd,KitchenAbvGr})"

Then continue in the same manner,

let qq3 = llqq $ map VarStr ["ExterQual","Exterior1st","Exterior2nd"]

rpln $ buildtup 1460 10 10 uub (vvb `minus` qq1 `minus` qq2 `minus` qq3) hhtrb hhtrbr 
"(1669.5721086734302,{1stFlrSFB,TotalBsmtSFB,Utilities})"
"(1668.4734963865526,{1stFlrSFB,TotalBsmtSFB})"
"(1600.7415395948117,{1stFlrSFB,LandSlope,TotalBsmtSFB})"
"(1594.7851150364495,{1stFlrSFB,Alley,TotalBsmtSFB})"
"(1592.3609889521056,{1stFlrSFB,PavedDrive,TotalBsmtSFB})"
"(1493.050655091204,{1stFlrSFB,GrLivAreaB,HalfBath})"
"(1474.3755703469872,{1stFlrSFB,HalfBath,TotalBsmtSFB})"
"(1417.4607257057366,{GarageAreaB,GarageCars,GarageFinish})"
"(1405.5506399722233,{1stFlrSFB,GrLivAreaB})"
"(1403.4965162313727,{1stFlrSFB,GrLivAreaB,Utilities})"

let qq4 = llqq $ map VarStr ["1stFlrSFB","TotalBsmtSFB","Utilities"]

rpln $ buildtup 1460 10 10 uub (vvb `minus` qq1 `minus` qq2 `minus` qq3 `minus` qq4) hhtrb hhtrbr 
"(1417.4607257057366,{GarageAreaB,GarageCars,GarageFinish})"
"(1347.9841955296451,{GarageAreaB,GarageCars,GarageType})"
"(1327.4506489735613,{GarageAreaB,GarageCars,GarageQual})"
"(1322.2211279162511,{GarageAreaB,GarageCars,GarageCond})"
"(1298.4977153135874,{BsmtFinSF1B,BsmtFinType1,BsmtFullBath})"
"(1278.8631650257093,{BsmtQual,GarageAreaB,GarageCars})"
"(1272.2758863750464,{FullBath,GarageAreaB,GarageCars})"
"(1233.8875091931845,{Foundation,GarageAreaB,GarageCars})"
"(1222.775134509769,{GarageAreaB,GarageCars,KitchenQual})"
"(1191.4337066548173,{BsmtQual,Foundation,Neighborhood})"

let qq5 = llqq $ map VarStr ["GarageAreaB","GarageCars","GarageFinish","GarageType","GarageQual","GarageCond"]

rpln $ buildtup 1460 10 10 uub (vvb `minus` qq1 `minus` qq2 `minus` qq3 `minus` qq4 `minus` qq5) hhtrb hhtrbr 
"(1480.5607543928045,{BsmtQual,FireplaceQu,Fireplaces,Foundation})"
"(1363.1306197141262,{BsmtQual,FireplaceQu,Fireplaces,KitchenQual})"
"(1343.465447221784,{BsmtQual,FireplaceQu,Fireplaces,FullBath})"
"(1340.2216196435274,{BsmtFinType1,BsmtQual,FireplaceQu,Fireplaces})"
"(1315.7699835653266,{FireplaceQu,Fireplaces,Foundation,KitchenQual})"
"(1298.4977153135874,{BsmtFinSF1B,BsmtFinType1,BsmtFullBath})"
"(1257.6592538930654,{FireplaceQu,Fireplaces,FullBath,KitchenQual})"
"(1241.580040607204,{FireplaceQu,Fireplaces,HeatingQC,KitchenQual})"
"(1219.9731711581449,{BsmtExposure,BsmtQual,FireplaceQu,Fireplaces})"
"(1213.2068465014286,{BsmtCond,BsmtQual,FireplaceQu,Fireplaces})"

let qq6 = llqq $ map VarStr ["BsmtQual","FireplaceQu","Fireplaces","Foundation","KitchenQual","FullBath","BsmtFinType1","BsmtFullBath","BsmtExposure","BsmtCond"]

rpln $ buildtup 1460 10 10 uub (vvb `minus` qq1 `minus` qq2 `minus` qq3 `minus` qq4 `minus` qq5 `minus` qq6) hhtrb hhtrbr 
"(1014.0619543251155,{MasVnrAreaB,MasVnrType,OverallQual})"
"(1012.1995431451023,{LandContour,LandSlope,MasVnrAreaB,MasVnrType})"
"(979.545657383569,{MasVnrAreaB,MasVnrType,SaleCondition})"
"(974.4469729408697,{BsmtHalfBath,HalfBath,MasVnrAreaB,MasVnrType})"
"(971.9728576112584,{HalfBath,KitchenAbvGr,MasVnrAreaB,MasVnrType})"
"(970.0267238854267,{HalfBath,MasVnrAreaB,MasVnrType})"
"(969.9208402814256,{HalfBath,MasVnrAreaB,MasVnrType,PoolQC})"
"(962.2837560307116,{MasVnrAreaB,MasVnrType,RoofStyle})"
"(958.1312724762402,{MasVnrAreaB,MasVnrType,SaleType})"
"(949.8533795660628,{HalfBath,LandSlope,MasVnrAreaB,MasVnrType})"

let qq7 = llqq $ map VarStr ["MasVnrAreaB","MasVnrType","OverallQual","LandContour","LandSlope","SaleCondition","BsmtHalfBath","HalfBath","KitchenAbvGr","PoolQC","RoofStyle","SaleType"]

rpln $ buildtup 1460 10 10 uub (vvb `minus` qq1 `minus` qq2 `minus` qq3 `minus` qq4 `minus` qq5 `minus` qq6 `minus` qq7) hhtrb hhtrbr 
"(744.5508884697301,{Alley,MSZoning,Neighborhood,PavedDrive})"
"(733.8519375885294,{HeatingQC,MSZoning,Neighborhood})"
"(733.8247451336943,{Alley,GrLivAreaB,TotRmsAbvGrd})"
"(727.4519230357951,{GrLivAreaB,TotRmsAbvGrd})"
"(713.716849074829,{MSZoning,Neighborhood,OverallCond})"
"(712.687723291684,{GrLivAreaB,PavedDrive,TotRmsAbvGrd})"
"(676.5188135228077,{Alley,Neighborhood,YearRemodAddB})"
"(674.1384098338397,{Neighborhood,PavedDrive,YearRemodAddB})"
"(665.1908299353995,{Alley,MSZoning,Neighborhood})"
"(658.3768159368594,{Neighborhood,YearRemodAddB})"

card $ vvb `minus` qq1 `minus` qq2 `minus` qq3 `minus` qq4 `minus` qq5 `minus` qq6 `minus` qq7
39

After this selection of 7 tuples there are 39 less closely aligned variables remaining.
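The selection procedure above can be sketched as a greedy loop. The tupler is run on the remaining substrate, the variables of the chosen tuple are removed, and the process repeats. This is a simplification. Above, some rounds removed the union of several overlapping top tuples, and the real builder is parametersSystemsBuilderTupleNoSumlayerMultiEffectiveRepa_u, which here is replaced by an arbitrary scoring function. The names greedyTuples and toyBuilder are hypothetical.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import qualified Data.Set as Set

-- Greedy selection of disjoint tuples. `build` stands in for the tuple set
-- builder: given the remaining substrate it returns scored candidate tuples.
-- Each round takes the best-scoring tuple and removes its variables; the
-- result is the chosen tuples together with the remaining variables.
greedyTuples
  :: Ord v
  => (Set.Set v -> [(Double, Set.Set v)]) -> Int -> Set.Set v -> ([Set.Set v], Set.Set v)
greedyTuples build rounds remaining
  | rounds <= 0 = ([], remaining)
  | otherwise = case sortOn (Down . fst) (build remaining) of
      [] -> ([], remaining)
      (_, best) : _ ->
        let (rest, left) = greedyTuples build (rounds - 1) (remaining `Set.difference` best)
        in (best : rest, left)

-- Hypothetical toy builder: fixed candidate tuples with fixed scores,
-- offered only while all of their variables remain.
toyBuilder :: Set.Set String -> [(Double, Set.Set String)]
toyBuilder vs = [(s, t) | (s, t) <- candidates, t `Set.isSubsetOf` vs]
  where
    candidates =
      [ (3.0, Set.fromList ["a", "b"])
      , (2.0, Set.fromList ["b", "c"])
      , (1.0, Set.fromList ["c", "d"]) ]
```

Because each chosen tuple is removed before the next round, the chosen tuples and the remainder are disjoint and together cover the substrate.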

The seven tuples, together with the remaining variables, form a possible partition of the substrate, $\bigcup\{Q_1,~Q_2,~Q_3,~Q_4,~Q_5,~Q_6,~Q_7,~V_{\mathrm{b}} \setminus (Q_1 \cup Q_2 \cup Q_3 \cup Q_4 \cup Q_5 \cup Q_6 \cup Q_7)\} = V_{\mathrm{b}}$,

rp qq1 
"{GarageYrBltB,YearBuiltB}"

rp qq2
"{BldgType,CentralAir,HouseStyle,MSSubClass,Street}"

rp qq3
"{ExterQual,Exterior1st,Exterior2nd}"

rp qq4
"{1stFlrSFB,TotalBsmtSFB,Utilities}"

rp qq5
"{GarageAreaB,GarageCars,GarageCond,GarageFinish,GarageQual,GarageType}"

rp qq6
"{BsmtCond,BsmtExposure,BsmtFinType1,BsmtFullBath,BsmtQual,FireplaceQu,Fireplaces,Foundation,FullBath,KitchenQual}"

rp qq7
"{BsmtHalfBath,HalfBath,KitchenAbvGr,LandContour,LandSlope,MasVnrAreaB,MasVnrType,OverallQual,PoolQC,RoofStyle,SaleCondition,SaleType}"

rp $ vvb `minus` qq1 `minus` qq2 `minus` qq3 `minus` qq4 `minus` qq5 `minus` qq6 `minus` qq7
"{2ndFlrSFB,3SsnPorchB,Alley,BedroomAbvGr,BsmtFinSF1B,BsmtFinSF2B,BsmtFinType2,BsmtUnfSFB,Condition1,Condition2,Electrical,EnclosedPorchB,ExterCond,Fence,Functional,GrLivAreaB,Heating,HeatingQC,LotAreaB,LotConfig,LotFrontageB,LotShape,LowQualFinSFB,MSZoning,MiscFeature,MiscValB,MoSold,Neighborhood,OpenPorchSFB,OverallCond,PavedDrive,PoolArea,RoofMatl,SalePriceB,ScreenPorchB,TotRmsAbvGrd,WoodDeckSFB,YearRemodAddB,YrSold}"

Predicting sale price without modelling

In the bucketed training sample the query variables predict sale price. That is, there is a functional or causal relationship between the query variables and the label variables, $(A_{\mathrm{trb}}\%V_{\mathrm{bk}})^{\mathrm{FS}} \to (A_{\mathrm{trb}}\%V_{\mathrm{bl}})^{\mathrm{FS}}$. So the label entropy or query conditional entropy is zero. See Entropy and alignment. The label entropy is \[ \begin{eqnarray} \mathrm{lent}(A,W,L)~:=~\mathrm{entropy}(A~\%~(W \cup L)) - \mathrm{entropy}(A~\%~W) \end{eqnarray} \]

let lent aa ww ll = ent (aa `red` (ww `union` ll)) - ent (aa `red` ww)

Then $\mathrm{lent}(A_{\mathrm{trb}},V_{\mathrm{bk}},V_{\mathrm{bl}}) = 0$,

lent aatrb vvbk vvbl
0.0
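The label entropy formula can be checked on plain lists of (query, label) pairs. This is a minimal sketch assuming natural logarithms. The names entropy and labelEntropy are hypothetical helpers that operate on lists of events rather than on the library's histograms.

```haskell
import qualified Data.Map.Strict as Map

-- Entropy (natural logarithm) of the empirical distribution of a list of states.
entropy :: Ord k => [k] -> Double
entropy xs = negate (sum [p * log p | c <- Map.elems counts, let p = fromIntegral c / n])
  where
    counts = Map.fromListWith (+) [(x, 1 :: Int) | x <- xs]
    n = fromIntegral (length xs)

-- Label entropy, i.e. the conditional entropy of the label given the query:
-- entropy of the (query, label) pairs minus entropy of the queries alone.
labelEntropy :: (Ord q, Ord l) => [(q, l)] -> Double
labelEntropy events = entropy events - entropy (map fst events)
```

If the label is a function of the query, the label entropy is zero, as for $V_{\mathrm{bk}}$ above. A query state shared equally by two different labels contributes $\ln 2$.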

We can determine which of the query variables has the least conditional entropy, \[ \begin{eqnarray} \{(\mathrm{lent}(A_{\mathrm{trb}},\{w\},V_{\mathrm{bl}}),~w) : w \in V_{\mathrm{bk}}\} \end{eqnarray} \]

rpln $ sort [(lent aatrb (sgl w) vvbl, w) | w <- qqll vvbk]
"(2.3688094014030585,Neighborhood)"
"(2.401775936638896,OverallQual)"
"(2.438375883488403,GrLivAreaB)"
"(2.505387552265462,GarageAreaB)"
"(2.5141550725111537,TotalBsmtSFB)"
"(2.5286425658331457,YearBuiltB)"
"(2.575379618536724,GarageYrBltB)"
"(2.5772799825200803,1stFlrSFB)"
"(2.6072098566399333,GarageCars)"
"(2.626737288605211,YearRemodAddB)"
"(2.633387515941217,MSSubClass)"
"(2.652820702365173,BsmtQual)"
"(2.660976881941984,2ndFlrSFB)"
"(2.662266429504159,ExterQual)"
...
"(2.9763251921097145,LandSlope)"
"(2.980341704620559,PoolArea)"
"(2.9844981605029184,PoolQC)"
"(2.9889645372889797,Street)"
"(2.9928053874878207,Utilities)"

This may be compared to the entropy of the label variables, $\mathrm{entropy}(A_{\mathrm{trb}}\%V_{\mathrm{bl}})$,

ent $ aatrb `red` vvbl
2.9948072760546887

Utilities has the highest conditional entropy, 2.993. This is barely less than the label entropy of 2.995, so Utilities makes almost no prediction of sale price. Neighborhood has the least conditional entropy, and so is the most predictive single query variable. Its label entropy is $\mathrm{lent}(A_{\mathrm{trb}},\{\mathrm{Neighborhood}\},V_{\mathrm{bl}})$,

let vNeighborhood = VarStr "Neighborhood"

lent aatrb (sgl vNeighborhood) vvbl
2.3688094014030585

Let us reduce the sample, $A_{\mathrm{trb}}~\%~(\{\mathrm{Neighborhood}\} \cup V_{\mathrm{bl}})$, to see the relationship,

rpln $ aall $ aatrb `red` (sgl vNeighborhood `union` vvbl)
"({(Neighborhood,Blmngtn),(SalePriceB,163000)},2 % 1)"
"({(Neighborhood,Blmngtn),(SalePriceB,172500)},2 % 1)"
"({(Neighborhood,Blmngtn),(SalePriceB,179200)},3 % 1)"
...
"({(Neighborhood,Veenker),(SalePriceB,278000)},1 % 1)"
"({(Neighborhood,Veenker),(SalePriceB,326000)},2 % 1)"
"({(Neighborhood,Veenker),(SalePriceB,755000)},1 % 1)"

rpln $ qqll $ ssplit vvbk $ states (aatrb `red` (sgl vNeighborhood `union` vvbl))
"({(Neighborhood,Blmngtn)},{(SalePriceB,163000)})"
"({(Neighborhood,Blmngtn)},{(SalePriceB,172500)})"
"({(Neighborhood,Blmngtn)},{(SalePriceB,179200)})"
...
"({(Neighborhood,Veenker)},{(SalePriceB,278000)})"
"({(Neighborhood,Veenker)},{(SalePriceB,326000)})"
"({(Neighborhood,Veenker)},{(SalePriceB,755000)})"

We can search for small subsets of the query variables that are most nearly causal or predictive, that is, that have the least label entropy, by using the repa conditional entropy tuple set builder. We shall also calculate the shuffle content derived alignment and the size-volume-sized-shuffle relative entropy. \[ \{(\mathrm{lent}(A_{\mathrm{trb}},M,V_{\mathrm{bl}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A_{\mathrm{trb}},\mathrm{L}}))\} \]
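The result of such a builder can be illustrated by brute force. Every tuple of up to kmax query variables is scored by label entropy, and the best qmax are kept. The library builder, parametersBuilderConditionalVarsRepa, also takes omax, which this sketch ignores. The exhaustive search here is only practical for small kmax. The event representation and the names lentOf, combinations and bestTuples are hypothetical.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as Map

-- Hypothetical event representation: variable name -> value.
type Event = Map.Map String Int

-- Entropy (natural logarithm) of the empirical distribution of a list of states.
entropy :: Ord k => [k] -> Double
entropy xs = negate (sum [p * log p | c <- Map.elems counts, let p = fromIntegral c / n])
  where
    counts = Map.fromListWith (+) [(x, 1 :: Int) | x <- xs]
    n = fromIntegral (length xs)

-- Label entropy of `label` given the tuple of query variables `tuple`.
lentOf :: [Event] -> String -> [String] -> Double
lentOf events label tuple =
  entropy [(query e, Map.lookup label e) | e <- events] - entropy (map query events)
  where
    query e = [Map.lookup v e | v <- tuple]

-- All k-element subsets, preserving the input order.
combinations :: Int -> [a] -> [[a]]
combinations 0 _ = [[]]
combinations _ [] = []
combinations k (x : xs) = map (x :) (combinations (k - 1) xs) ++ combinations k xs

-- Exhaustive search over tuples of 1 to kmax query variables, keeping the
-- qmax tuples with the least label entropy.
bestTuples :: Int -> Int -> String -> [String] -> [Event] -> [(Double, [String])]
bestTuples kmax qmax label queryVars events =
  take qmax (sortOn fst [(lentOf events label t, t) | k <- [1 .. kmax], t <- combinations k queryVars])
```

Consider toy events in which the label l equals x, and x is the exclusive-or of y and z. The pair {y, z} is then fully predictive even though neither y nor z is predictive on its own. This is why tuples of more than one variable can have much lower label entropy than any single variable.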

let buildcondrr vvl aa kmax omax qmax = sort $ map (\(a,b) -> (b,a)) $ Map.toList $ fromJust $ parametersBuilderConditionalVarsRepa kmax omax qmax vvl aa

let (kmax,omax,qmax) = (1, 60, 10)

let ll = buildcondrr vvbl hhtrb kmax omax qmax

rpln ll
"(2.3688094014030585,{Neighborhood})"
"(2.401775936638896,{OverallQual})"
"(2.438375883488403,{GrLivAreaB})"
"(2.505387552265462,{GarageAreaB})"
"(2.5141550725111537,{TotalBsmtSFB})"
"(2.5286425658331457,{YearBuiltB})"
"(2.575379618536724,{GarageYrBltB})"
"(2.5772799825200803,{1stFlrSFB})"
"(2.6072098566399333,{GarageCars})"
"(2.626737288605211,{YearRemodAddB})"

Let us sort by shuffle content derived alignment descending. Let $L = \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A_{\mathrm{trb}},\mathrm{L}}))$. Then calculate \[ \{(\mathrm{algn}(A_{\mathrm{trb}}\%X)-\mathrm{algn}(A_{\mathrm{trbr}}\%X),~X) : (e,X) \in L\} \]

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let aar' = hhaa (hrhh uub (hhtrbr `hrhrred` xx))] 
"(0.0,{YearRemodAddB})"
"(0.0,{YearBuiltB})"
"(0.0,{TotalBsmtSFB})"
"(0.0,{OverallQual})"
"(0.0,{Neighborhood})"
"(0.0,{GrLivAreaB})"
"(0.0,{GarageYrBltB})"
"(0.0,{GarageCars})"
"(0.0,{GarageAreaB})"
"(0.0,{1stFlrSFB})"

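The shuffle content derived alignment of every mono-variate tuple is zero, because a histogram of a single variable equals its own independent histogram. Now sort by size-volume-sized-shuffle relative entropy descending, \[ \{(\mathrm{rent}(A_{\mathrm{trb}}~\%~X,~Z_X * \hat{A}_{\mathrm{trbr}}~\%~X),~X) : (e,X) \in L\} \] where $Z_X = \mathrm{scalar}(|X^{\mathrm{C}}|)$,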

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let vaar' = vsize uub xx (hhaa (hrhh uub (hhtrbr `hrhrred` xx)))] 
"(3.694822225952521e-13,{GrLivAreaB})"
"(1.7763568394002505e-13,{GarageAreaB})"
"(1.7763568394002505e-13,{1stFlrSFB})"
"(1.2789769243681803e-13,{GarageYrBltB})"
"(2.842170943040401e-14,{YearBuiltB})"
"(2.6645352591003757e-15,{GarageCars})"
"(-2.2382096176443156e-13,{OverallQual})"
"(-3.765876499528531e-13,{TotalBsmtSFB})"
"(-3.979039320256561e-13,{Neighborhood})"
"(-6.323830348264892e-13,{YearRemodAddB})"

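These relative entropies are all zero to within floating point rounding error, so they do not really distinguish the mono-variate tuples. Choose the tuple $X$ with the maximum value,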

let xx = llqq $ map VarStr ["GrLivAreaB"]

card xx
1

The label entropy, $\mathrm{lent}(A_{\mathrm{trb}},X,V_{\mathrm{bl}})$, is,

lent aatrb xx vvbl
2.438375883488403

This tuple has a volume of $|X^{\mathrm{C}}| = 21$,

vol uub xx
21

Now consider the query effectiveness against the test set, $\mathrm{size}(A_{\mathrm{teb}} * (A_{\mathrm{trb}}\%X)^{\mathrm{F}})$,

size $ aateb `mul` eff (hhaa (hrhh uub (hhtrb `hrhrred` xx)))
1459 % 1

So, for the mono-variate tuple, there is a prediction for every event in the test set.
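The effectiveness check counts the test events whose state in $X$ was seen in the training sample. A test event with an unseen state has no prediction. Below is a minimal sketch using a hypothetical helper, effectiveCount, which takes the states already reduced to $X$.

```haskell
import qualified Data.Set as Set

-- Count the test events whose query state (their values of the tuple's
-- variables) occurs at least once in the training sample: a list-based
-- analogue of size (A_teb * (A_trb % X)^F).
effectiveCount :: Ord s => [s] -> [s] -> Int
effectiveCount trainStates testStates = length (filter (`Set.member` seen) testStates)
  where
    seen = Set.fromList trainStates
```

As the tuple volume grows relative to the 1460 training events, more test states are unseen. This is consistent with the falling counts for the bi-variate and tri-variate tuples below.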

Now increase kmax to 2 to consider bi-variate tuples,

let (kmax,omax,qmax) = (2, 60, 10)

let ll = buildcondrr vvbl hhtrb kmax omax qmax

rpln ll
"(1.172599721455306,{GarageYrBltB,GrLivAreaB})"
"(1.1763717756587901,{GrLivAreaB,YearBuiltB})"
"(1.1825182247882235,{BsmtUnfSFB,GarageAreaB})"
"(1.1902468919522589,{GarageAreaB,GrLivAreaB})"
"(1.1963275646526261,{1stFlrSFB,GarageYrBltB})"
"(1.200771002911111,{GarageYrBltB,TotalBsmtSFB})"
"(1.2044059887997252,{BsmtUnfSFB,GrLivAreaB})"
"(1.208066469609002,{1stFlrSFB,YearBuiltB})"
"(1.2181093531815002,{BsmtUnfSFB,LotAreaB})"
"(1.218337507644864,{1stFlrSFB,GarageAreaB})"

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let aar' = hhaa (hrhh uub (hhtrbr `hrhrred` xx))] 
"(236.15925458380707,{GarageAreaB,GrLivAreaB})"
"(208.98810169704188,{1stFlrSFB,GarageAreaB})"
"(169.3991581584039,{1stFlrSFB,YearBuiltB})"
"(168.73888512471535,{GrLivAreaB,YearBuiltB})"
"(166.33764054037806,{BsmtUnfSFB,GrLivAreaB})"
"(155.24622232457887,{GarageYrBltB,GrLivAreaB})"
"(152.22632381386984,{GarageYrBltB,TotalBsmtSFB})"
"(151.7383625410605,{1stFlrSFB,GarageYrBltB})"
"(78.02237481803604,{BsmtUnfSFB,GarageAreaB})"
"(45.72850378493081,{BsmtUnfSFB,LotAreaB})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let vaar' = vsize uub xx (hhaa (hrhh uub (hhtrbr `hrhrred` xx)))] 
"(202.14855617164312,{BsmtUnfSFB,GrLivAreaB})"
"(188.51396452689005,{GrLivAreaB,YearBuiltB})"
"(181.96397603121977,{GarageAreaB,GrLivAreaB})"
"(179.13967532324114,{1stFlrSFB,YearBuiltB})"
"(174.04009467586593,{1stFlrSFB,GarageAreaB})"
"(172.92626885681875,{GarageYrBltB,TotalBsmtSFB})"
"(161.41828778067747,{1stFlrSFB,GarageYrBltB})"
"(153.1830382871035,{GarageYrBltB,GrLivAreaB})"
"(152.9069696762781,{BsmtUnfSFB,LotAreaB})"
"(145.5538173855398,{BsmtUnfSFB,GarageAreaB})"

let xx = llqq $ map VarStr ["BsmtUnfSFB","GrLivAreaB"]

card xx
2

lent aatrb xx vvbl
1.2044059887997252

vol uub xx
462

size $ aateb `mul` eff (hhaa (hrhh uub (hhtrb `hrhrred` xx)))
1395 % 1

1459 - 1395
64

In the case of the bi-variate tuple with the highest relative entropy, the query on the test set is ineffective for 64 events.


Now increase kmax to 3 to consider tri-variate tuples,

let (kmax,omax,qmax) = (3, 60, 10)

let ll = buildcondrr vvbl hhtrb kmax omax qmax

rpln ll
"(0.15839025938626694,{BsmtUnfSFB,GarageAreaB,LotAreaB})"
"(0.1632160815826591,{BsmtUnfSFB,GrLivAreaB,LotAreaB})"
"(0.17534692586875789,{BsmtUnfSFB,GarageYrBltB,LotAreaB})"
"(0.17551868584769093,{GarageAreaB,GrLivAreaB,LotAreaB})"
"(0.18785610344573467,{BsmtUnfSFB,GarageYrBltB,GrLivAreaB})"
"(0.19170918418883343,{BsmtUnfSFB,GarageAreaB,GrLivAreaB})"
"(0.19209840013119717,{GarageYrBltB,GrLivAreaB,LotAreaB})"
"(0.1925880641418427,{BsmtUnfSFB,LotAreaB,YearRemodAddB})"
"(0.19796856294348153,{1stFlrSFB,GarageAreaB,LotAreaB})"
"(0.2007710037546424,{BsmtUnfSFB,LotAreaB,YearBuiltB})"

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let aar' = hhaa (hrhh uub (hhtrbr `hrhrred` xx))] 
"(132.21930339837297,{1stFlrSFB,GarageAreaB,LotAreaB})"
"(124.43940524717561,{BsmtUnfSFB,GarageAreaB,GrLivAreaB})"
"(107.27129951600762,{BsmtUnfSFB,GarageYrBltB,GrLivAreaB})"
"(107.07322956414316,{GarageYrBltB,GrLivAreaB,LotAreaB})"
"(103.84418822087514,{GarageAreaB,GrLivAreaB,LotAreaB})"
"(103.36860177131007,{BsmtUnfSFB,LotAreaB,YearBuiltB})"
"(88.94491385166089,{BsmtUnfSFB,GrLivAreaB,LotAreaB})"
"(71.75931961875256,{BsmtUnfSFB,LotAreaB,YearRemodAddB})"
"(67.96817687341434,{BsmtUnfSFB,GarageAreaB,LotAreaB})"
"(67.78585532601016,{BsmtUnfSFB,GarageYrBltB,LotAreaB})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let vaar' = vsize uub xx (hhaa (hrhh uub (hhtrbr `hrhrred` xx)))] 
"(3816.431521439241,{BsmtUnfSFB,GrLivAreaB,LotAreaB})"
"(3746.3421233849076,{BsmtUnfSFB,GarageYrBltB,LotAreaB})"
"(3723.0855617410125,{BsmtUnfSFB,GarageYrBltB,GrLivAreaB})"
"(3697.5598221442633,{BsmtUnfSFB,LotAreaB,YearBuiltB})"
"(3693.4212002672284,{BsmtUnfSFB,GarageAreaB,GrLivAreaB})"
"(3662.6668429876154,{BsmtUnfSFB,GarageAreaB,LotAreaB})"
"(3654.8591105367377,{GarageYrBltB,GrLivAreaB,LotAreaB})"
"(3555.8442521451507,{GarageAreaB,GrLivAreaB,LotAreaB})"
"(3539.1504964539054,{1stFlrSFB,GarageAreaB,LotAreaB})"
"(3391.7796869104423,{BsmtUnfSFB,LotAreaB,YearRemodAddB})"

let xx = llqq $ map VarStr ["BsmtUnfSFB","GrLivAreaB","LotAreaB"]

card xx
3

lent aatrb xx vvbl
0.1632160815826591

vol uub xx
9702

size $ aateb `mul` eff (hhaa (hrhh uub (hhtrb `hrhrred` xx)))
369 % 1

In the case of the tri-variate tuple with the highest relative entropy, the query on the test set is effective for only 369 of the 1459 events.

let xx = llqq $ map VarStr ["1stFlrSFB","GarageAreaB","LotAreaB"]

card xx
3

lent aatrb xx vvbl
0.19796856294348153

vol uub xx
9261

size $ aateb `mul` eff (hhaa (hrhh uub (hhtrb `hrhrred` xx)))
436 % 1

The tri-variate tuple with the highest shuffle content derived alignment is effective for only 436 of the 1459 events.


let (kmax,omax,qmax) = (4, 60, 10)

let ll = buildcondrr vvbl hhtrb kmax omax qmax

rpln ll
"(1.7264363040669473e-2,{BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold})"
"(2.1653557329598172e-2,{GarageAreaB,GrLivAreaB,LotAreaB,MoSold})"
"(2.5045822967734388e-2,{GarageYrBltB,GrLivAreaB,LotAreaB,MoSold})"
"(2.5171473700730473e-2,{BsmtUnfSFB,GrLivAreaB,MoSold,YearBuiltB})"
"(2.6586467199564368e-2,{BsmtUnfSFB,GarageYrBltB,LotAreaB,MoSold})"
"(2.8611151303957527e-2,{1stFlrSFB,GarageYrBltB,LotAreaB,MoSold})"
"(2.9016952408618124e-2,{GrLivAreaB,LotAreaB,MoSold,YearRemodAddB})"
"(2.9142603141615098e-2,{1stFlrSFB,LotAreaB,MoSold,YearRemodAddB})"
"(2.9422753513281386e-2,{BsmtUnfSFB,GarageAreaB,GrLivAreaB,LotAreaB})"
"(2.943501725666131e-2,{BsmtUnfSFB,LotAreaB,MoSold,YearBuiltB})"

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let aar' = hhaa (hrhh uub (hhtrbr `hrhrred` xx))] 
"(37.926153529930275,{BsmtUnfSFB,GarageAreaB,GrLivAreaB,LotAreaB})"
"(24.718860935107273,{GrLivAreaB,LotAreaB,MoSold,YearRemodAddB})"
"(19.64368712646933,{1stFlrSFB,GarageYrBltB,LotAreaB,MoSold})"
"(18.83275691405754,{GarageYrBltB,GrLivAreaB,LotAreaB,MoSold})"
"(17.276563509988023,{BsmtUnfSFB,GrLivAreaB,MoSold,YearBuiltB})"
"(16.518877798785525,{1stFlrSFB,LotAreaB,MoSold,YearRemodAddB})"
"(13.798405087084234,{GarageAreaB,GrLivAreaB,LotAreaB,MoSold})"
"(12.999897402574106,{BsmtUnfSFB,LotAreaB,MoSold,YearBuiltB})"
"(11.613603041230476,{BsmtUnfSFB,GarageYrBltB,LotAreaB,MoSold})"
"(11.431281470602812,{BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let vaar' = vsize uub xx (hhaa (hrhh uub (hhtrbr `hrhrred` xx)))] 
"(8604.005120763555,{BsmtUnfSFB,GarageAreaB,GrLivAreaB,LotAreaB})"
"(7754.46681838599,{BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold})"
"(7723.530448965263,{BsmtUnfSFB,GarageYrBltB,LotAreaB,MoSold})"
"(7705.397790408111,{BsmtUnfSFB,LotAreaB,MoSold,YearBuiltB})"
"(7696.605216956232,{BsmtUnfSFB,GrLivAreaB,MoSold,YearBuiltB})"
"(7647.940380570362,{GarageAreaB,GrLivAreaB,LotAreaB,MoSold})"
"(7641.228231311194,{GarageYrBltB,GrLivAreaB,LotAreaB,MoSold})"
"(7629.860702232574,{1stFlrSFB,GarageYrBltB,LotAreaB,MoSold})"
"(7455.700517320074,{GrLivAreaB,LotAreaB,MoSold,YearRemodAddB})"
"(7442.036664302228,{1stFlrSFB,LotAreaB,MoSold,YearRemodAddB})"

let xx = llqq $ map VarStr ["BsmtUnfSFB","GarageAreaB","GrLivAreaB","LotAreaB"]

card xx
4

lent aatrb xx vvbl
2.9422753513281386e-2

vol uub xx
203742

size $ aateb `mul` eff (hhaa (hrhh uub (hhtrb `hrhrred` xx)))
65 % 1

let (kmax,omax,qmax) = (5, 60, 10)

let ll = buildcondrr vvbl hhtrb kmax omax qmax

rpln ll
"(3.7980667427950365e-3,{BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,YrSold})"
"(3.7980667427950365e-3,{GarageAreaB,GrLivAreaB,LotAreaB,MoSold,YrSold})"
"(4.7475834284931295e-3,{GrLivAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(4.747583428494018e-3,{BsmtUnfSFB,GrLivAreaB,MoSold,YearRemodAddB,YrSold})"
"(5.105972568058448e-3,{1stFlrSFB,BsmtFinSF1B,GarageYrBltB,LotAreaB,MoSold})"
"(5.697100114192999e-3,{1stFlrSFB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(5.697100114192999e-3,{BsmtUnfSFB,GrLivAreaB,LotFrontageB,MoSold,YrSold})"
"(6.055489253757429e-3,{1stFlrSFB,BsmtUnfSFB,GarageYrBltB,LotAreaB,MoSold})"
"(6.055489253757429e-3,{BsmtUnfSFB,GarageYrBltB,GrLivAreaB,MoSold,YrSold})"
"(6.055489253758317e-3,{BsmtFinSF1B,GarageAreaB,GrLivAreaB,LotAreaB,MoSold})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let vaar' = vsize uub xx (hhaa (hrhh uub (hhtrbr `hrhrred` xx)))] 
"(12273.219033706933,{BsmtFinSF1B,GarageAreaB,GrLivAreaB,LotAreaB,MoSold})"
"(12273.219033703208,{1stFlrSFB,BsmtUnfSFB,GarageYrBltB,LotAreaB,MoSold})"
"(12264.795410484076,{1stFlrSFB,BsmtFinSF1B,GarageYrBltB,LotAreaB,MoSold})"
"(10169.712556688115,{BsmtUnfSFB,GarageYrBltB,GrLivAreaB,MoSold,YrSold})"
"(10169.712556685321,{BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,YrSold})"
"(10088.226606839336,{GarageAreaB,GrLivAreaB,LotAreaB,MoSold,YrSold})"
"(10017.393311132211,{BsmtUnfSFB,GrLivAreaB,MoSold,YearRemodAddB,YrSold})"
"(9956.692081614863,{GrLivAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(9938.891320712399,{BsmtUnfSFB,GrLivAreaB,LotFrontageB,MoSold,YrSold})"
"(9922.476644909475,{1stFlrSFB,LotAreaB,MoSold,YearRemodAddB,YrSold})"

let xx = llqq $ map VarStr ["BsmtFinSF1B","GarageAreaB","GrLivAreaB","LotAreaB","MoSold"]

card xx
5

lent aatrb xx vvbl
6.055489253758317e-3

vol uub xx
2444904

size $ aateb `mul` eff (hhaa (hrhh uub (hhtrb `hrhrred` xx)))
19 % 1

let (kmax,omax,qmax) = (6, 60, 10)

let ll = buildcondrr vvbl hhtrb kmax omax qmax

rpln ll
"(1.8990333713970742e-3,{1stFlrSFB,GarageAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(1.8990333713970742e-3,{BsmtExposure,BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,YrSold})"
"(1.8990333713970742e-3,{BsmtUnfSFB,GarageAreaB,GrLivAreaB,LotFrontageB,MoSold,YrSold})"
"(1.8990333713970742e-3,{BsmtUnfSFB,GarageAreaB,GrLivAreaB,MoSold,YearRemodAddB,YrSold})"
"(1.8990333713970742e-3,{BsmtUnfSFB,GarageAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(1.8990333713970742e-3,{BsmtUnfSFB,GarageAreaB,LotFrontageB,MoSold,TotalBsmtSFB,YrSold})"
"(1.8990333713970742e-3,{BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(1.8990333713970742e-3,{GarageAreaB,GrLivAreaB,LotAreaB,MasVnrAreaB,MoSold,YrSold})"
"(1.8990333713970742e-3,{GarageAreaB,GrLivAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(1.8990333713970742e-3,{GarageAreaB,LotAreaB,MoSold,TotRmsAbvGrd,TotalBsmtSFB,YrSold})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let vaar' = vsize uub xx (hhaa (hrhh uub (hhtrbr `hrhrred` xx)))] 
"(14617.822131037712,{GarageAreaB,GrLivAreaB,LotAreaB,MasVnrAreaB,MoSold,YrSold})"
"(14501.807925760746,{BsmtUnfSFB,GarageAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(14501.807925760746,{BsmtUnfSFB,GarageAreaB,GrLivAreaB,MoSold,YearRemodAddB,YrSold})"
"(14501.807925745845,{BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(14433.893293499947,{GarageAreaB,GrLivAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(14433.893293455243,{1stFlrSFB,GarageAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(14412.99644856155,{BsmtUnfSFB,GarageAreaB,GrLivAreaB,LotFrontageB,MoSold,YrSold})"
"(14393.23908534646,{BsmtUnfSFB,GarageAreaB,LotFrontageB,MoSold,TotalBsmtSFB,YrSold})"
"(13988.072130039334,{GarageAreaB,LotAreaB,MoSold,TotRmsAbvGrd,TotalBsmtSFB,YrSold})"
"(12535.78026765585,{BsmtExposure,BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,YrSold})"

let xx = llqq $ map VarStr ["GarageAreaB","GrLivAreaB","LotAreaB","MasVnrAreaB","MoSold","YrSold"]

card xx
6

lent aatrb xx vvbl
1.8990333713970742e-3

vol uub xx
12224520

size $ aateb `mul` eff (hhaa (hrhh uub (hhtrb `hrhrred` xx)))
12 % 1

let (kmax,omax,qmax) = (7, 60, 10)

let ll = buildcondrr vvbl hhtrb kmax omax qmax

rpln ll
"(0.0,{BsmtExposure,BsmtUnfSFB,GarageAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(0.0,{BsmtExposure,BsmtUnfSFB,GarageAreaB,LotFrontageB,MoSold,YearRemodAddB,YrSold})"
"(0.0,{BsmtExposure,BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(9.49516685698093e-4,{1stFlrSFB,BedroomAbvGr,BsmtExposure,BsmtFinSF1B,GarageYrBltB,LotAreaB,MoSold})"
"(9.49516685698093e-4,{1stFlrSFB,BedroomAbvGr,BsmtFinSF1B,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(9.49516685698093e-4,{1stFlrSFB,BedroomAbvGr,BsmtFinType1,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(9.49516685698093e-4,{1stFlrSFB,BedroomAbvGr,BsmtUnfSFB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(9.49516685698093e-4,{1stFlrSFB,BedroomAbvGr,GarageAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(9.49516685698093e-4,{1stFlrSFB,BedroomAbvGr,LotAreaB,MasVnrAreaB,MoSold,YearRemodAddB,YrSold})"
"(9.49516685698093e-4,{1stFlrSFB,BedroomAbvGr,LotAreaB,MoSold,WoodDeckSFB,YearRemodAddB,YrSold})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let vaar' = vsize uub xx (hhaa (hrhh uub (hhtrbr `hrhrred` xx)))] 
"(17671.7167750597,{1stFlrSFB,BedroomAbvGr,BsmtExposure,BsmtFinSF1B,GarageYrBltB,LotAreaB,MoSold})"
"(17537.708243370056,{1stFlrSFB,BedroomAbvGr,BsmtUnfSFB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(17525.6961145401,{1stFlrSFB,BedroomAbvGr,LotAreaB,MoSold,WoodDeckSFB,YearRemodAddB,YrSold})"
"(17525.69611442089,{1stFlrSFB,BedroomAbvGr,BsmtFinSF1B,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(17513.683985590935,{1stFlrSFB,BedroomAbvGr,LotAreaB,MasVnrAreaB,MoSold,YearRemodAddB,YrSold})"
"(17469.78956925869,{1stFlrSFB,BedroomAbvGr,GarageAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(16851.510157883167,{BsmtExposure,BsmtUnfSFB,GarageAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(16851.510157704353,{BsmtExposure,BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(16626.453417003155,{BsmtExposure,BsmtUnfSFB,GarageAreaB,LotFrontageB,MoSold,YearRemodAddB,YrSold})"
"(15865.840895712376,{1stFlrSFB,BedroomAbvGr,BsmtFinType1,LotAreaB,MoSold,YearRemodAddB,YrSold})"

So the minimum tuple dimension for zero label entropy is 7. Of the three 7-tuples with zero label entropy, choose the tuple $X$ with the maximum relative entropy,

let xx = llqq $ map VarStr ["BsmtExposure","BsmtUnfSFB","GarageAreaB","LotAreaB","MoSold","YearRemodAddB","YrSold"]

card xx
7

lent aatrb xx vvbl
0.0

vol uub xx
55301400

The tuple $X$ has zero label entropy and a volume of 55301400. However, it classifies the sample into only $|(A_{\mathrm{trb}}~\%~(X \cup V_{\mathrm{bl}}))^{\mathrm{F}}| = |(A_{\mathrm{trb}}\%X)^{\mathrm{F}}| = 1460$ effective states or slices,

rpln $ aall $ aatrb `red` (xx `union` vvbl)
"({(BsmtExposure,Av),(BsmtUnfSFB,0),(GarageAreaB,0),(LotAreaB,3182),(MoSold,5),(SalePriceB,88000),(YearRemodAddB,1974),(YrSold,2010)},1 % 1)"
"({(BsmtExposure,Av),(BsmtUnfSFB,0),(GarageAreaB,0),(LotAreaB,3182),(MoSold,12),(SalePriceB,88000),(YearRemodAddB,1970),(YrSold,2007)},1 % 1)"
"({(BsmtExposure,Av),(BsmtUnfSFB,0),(GarageAreaB,0),(LotAreaB,12150),(MoSold,5),(SalePriceB,124000),(YearRemodAddB,1993),(YrSold,2007)},1 % 1)"
...
"({(BsmtExposure,No),(BsmtUnfSFB,2336),(GarageAreaB,844),(LotAreaB,12150),(MoSold,3),(SalePriceB,326000),(YearRemodAddB,2006),(YrSold,2007)},1 % 1)"
"({(BsmtExposure,No),(BsmtUnfSFB,2336),(GarageAreaB,844),(LotAreaB,14175),(MoSold,7),(SalePriceB,755000),(YearRemodAddB,2009),(YrSold,2009)},1 % 1)"
"({(BsmtExposure,No),(BsmtUnfSFB,2336),(GarageAreaB,1488),(LotAreaB,13005),(MoSold,8),(SalePriceB,278000),(YearRemodAddB,2009),(YrSold,2009)},1 % 1)"

size $ eff $ aatrb `red` (xx `union` vvbl)
1460 % 1

This, however, is also the size of the bucketed training sample, so every effective state, or slice, contains exactly one event. The relative entropy is the highest of any tuple with zero label entropy, which implies a robust or likely model. Even so, it is doubtful that each component has sufficient size to make the tuple very query effective. This can be seen by considering the query effectiveness of the test set, $\mathrm{size}(A_{\mathrm{teb}} * (A_{\mathrm{trb}}\%X)^{\mathrm{F}})$,

size $ aateb `mul` eff (hhaa (hrhh uub (hhtrb `hrhrred` xx)))
7 % 1

Of course, if a query fails with a model of 7 variables we can retry with the less likely model of 6 variables, and so on until a prediction is made.
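The retry strategy can be sketched as a list of models tried in order of decreasing dimension. A model answers only if the query's state in its tuple is one of its training slices. This is a hypothetical plain-Haskell illustration, not code from the AMES or AlignmentRepa repositories. The `Model` type, `buildModel`, `predict`, and predicting the most frequent label of each slice are all assumptions.

```haskell
import Data.List (maximumBy)
import qualified Data.Map.Strict as Map
import Data.Maybe (listToMaybe, mapMaybe)
import Data.Ord (comparing)
import qualified Data.Set as Set

-- Hypothetical event representation: variable name -> value.
type Event = Map.Map String String

-- A hypothetical model: a tuple of query variables and the modal label
-- of each training slice of that tuple.
data Model = Model
  { tupleVars :: Set.Set String
  , sliceLabels :: Map.Map Event String
  }

-- Build a model from labelled training events, keeping the most frequent
-- label in each slice.
buildModel :: String -> Set.Set String -> [Event] -> Model
buildModel label xs train = Model xs (Map.map modal counts)
  where
    counts =
      Map.fromListWith
        (Map.unionWith (+))
        [ (Map.restrictKeys ev xs, Map.singleton l (1 :: Int))
        | ev <- train
        , Just l <- [Map.lookup label ev]
        ]
    modal = fst . maximumBy (comparing snd) . Map.toList

-- Try the models in order, most variables first, and return the prediction
-- of the first model whose slices contain the query state.
predict :: [Model] -> Event -> Maybe String
predict models query = listToMaybe (mapMaybe lookupIn models)
  where
    lookupIn (Model xs sl) = Map.lookup (Map.restrictKeys query xs) sl
```

Order the list from the 7-variable tuple down to smaller tuples. The most specific model that recognises the query then makes the prediction, and `Nothing` means that no model did.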

To conclude, the choice between models consisting of only substrate variables is a trade-off between model likelihood and accuracy/effectiveness given the sample size and substrate valencies.

Induced modelling of sale price

Consider an unsupervised induced model, $D$, on the query variables, $V_{\mathrm{bk}}$, which exclude sale price. We shall analyse this model to find a smaller submodel that predicts the label variables, $V_{\mathrm{bl}}$, that is, sale price. To do this, we shall search the decomposition fud for a submodel that minimises the conditional entropy of the label.

Here the induced model is created by the limited-nodes highest-layer excluded-self maximum-roll-by-derived-dimension fud decomper, $(\cdot,D) = I_{P,U_{\mathrm{b}},\mathrm{D,F,mm,xs,d,f}}((V_{\mathrm{bk}},A_{\mathrm{trb}}))$.

There is an example of model induction in the AMES repository.

First consider the fud decomposition AMES_model1.json (see Model 1 induction),

s <- BL.readFile "./AMES_model1.json"
let df = fromJust $ persistentsDecompFud $ fromJust $ (Data.Aeson.decode s :: Maybe DecompFudPersistent)

let uub1 = uub `uunion` (fsys (dfff df))

card $ uvars uub1
605

Let us examine the tree of the fud decomposition, \[ \begin{eqnarray} \{\{(S,~\mathrm{und}(F),~\mathrm{der}(F)) : (S,F) \in L\} : L \in \mathrm{paths}(D)\} \end{eqnarray} \]

rpln $ qqll $ treesPaths $ funcsTreesMap (\(ss,ff) -> (ss,fund ff,fder ff)) $ dfzz $ df
...

The decomposition tree contains 20 nodes with fud variables as follows, \[ \begin{eqnarray} \{\{\mathrm{fid}(F) : (\cdot,F) \in L\} : L \in \mathrm{paths}(D)\} \end{eqnarray} \]

let fid = variablesVariableFud . least . fder

rpln $ qqll $ treesSubPaths $ funcsTreesMap (\(ss,ff) -> fid ff) $ dfzz $ df
"[1]"
"[1,2]"
"[1,2,6]"
"[1,2,6,16]"
"[1,2,6,18]"
"[1,2,17]"
"[1,3]"
"[1,3,5]"
"[1,3,5,13]"
"[1,3,5,15]"
"[1,3,7]"
"[1,3,7,12]"
"[1,3,7,20]"
"[1,3,9]"
"[1,3,9,19]"
"[1,4]"
"[1,4,8]"
"[1,4,8,14]"
"[1,4,10]"
"[1,4,11]"
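The list above enumerates every root-to-node path of the decomposition tree in pre-order. Below is a sketch using `Data.Tree` from the containers package, with the tree transcribed from the output above. The names `subPaths` and `fudTree` are illustrative; this is not the library's `treesSubPaths`.

```haskell
import Data.Tree (Tree (..))

-- Every path from the root to a node, in pre-order (root first).
subPaths :: Tree a -> [[a]]
subPaths (Node x children) = [x] : [x : p | t <- children, p <- subPaths t]

-- The fud-identifier tree, transcribed from the paths listed above.
fudTree :: Tree Int
fudTree =
  Node 1
    [ Node 2 [Node 6 [leaf 16, leaf 18], leaf 17]
    , Node 3 [Node 5 [leaf 13, leaf 15], Node 7 [leaf 12, leaf 20], Node 9 [leaf 19]]
    , Node 4 [Node 8 [leaf 14], leaf 10, leaf 11]
    ]
  where
    leaf n = Node n []
```

Evaluating `subPaths fudTree` reproduces the 20 paths, one per node, in the same order.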

Now consider the summed alignment and the summed alignment valency-density, $\mathrm{summation}(U_{\mathrm{b}1},D,A_{\mathrm{b}})$,

let summation = systemsDecompFudsHistoryRepasAlignmentContentShuffleSummation_u
let sumtree = systemsDecompFudsHistoryRepasTreeAlignmentContentShuffleSummation_u

let (wmax,lmax,xmax,omax,bmax,mmax,umax,pmax,fmax,mult,seed) = (2919, 8, 2919, 50, (50*5), 5, 2919, 1, 20, 10, 5)

let hhb = aahr uub aab

summation mult seed uub1 df hhb
(26138.986220754774,11161.095383182923)

The slice size, $z_C$, and the shuffle content derived alignment, $a$, of each fud node are \[ \begin{eqnarray} \{(\mathrm{fid}(F),~z_C,~a) : ((S,F),(z_C,(a,a_{\mathrm{d}}))) \in \mathrm{nodes}(\mathrm{sumtree}(U_{\mathrm{b}1},D,A_{\mathrm{b}}))\} \end{eqnarray} \]

rpln $ qqll $ treesElements $ funcsTreesMap (\((ss,ff),(zc,(a,ad))) -> (fid ff, zc, a)) $ sumtree mult seed uub1 df hhb
"(1,2919,11876.31476343951)"
"(2,1217,2797.010668824966)"
"(3,959,2772.448994487938)"
"(4,546,2189.266554362157)"
"(5,330,1023.1721661690158)"
"(6,317,623.5242458361815)"
"(7,263,800.9270189055542)"
"(8,236,749.8662882382283)"
"(9,178,423.5294121314936)"
"(10,151,175.60853217243653)"
"(11,149,449.6395044598346)"
"(12,139,421.8052962543643)"
"(13,132,365.7226548602505)"
"(14,110,202.6662173927707)"
"(15,108,219.29202578165894)"
"(16,98,228.80939173275726)"
"(17,96,157.91847441003722)"
"(18,93,201.48929391123798)"
"(19,88,249.44069275190375)"
"(20,86,210.53402463247687)"

We can see that the root fud has the highest slice size and shuffle content derived alignment, while the leaf fuds have small slice sizes and shuffle content derived alignments.
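The first component of the summation, 26138.99, appears to be simply the sum of the node alignments, $a$, in the tree above. This can be checked with plain Haskell arithmetic on the values copied from the output. The names are illustrative only.

```haskell
-- Shuffle content derived alignments, a, of the 20 fud nodes, copied from the
-- sumtree output above in fud-identifier order 1 to 20.
nodeAlignments :: [Double]
nodeAlignments =
  [ 11876.31476343951, 2797.010668824966, 2772.448994487938, 2189.266554362157
  , 1023.1721661690158, 623.5242458361815, 800.9270189055542, 749.8662882382283
  , 423.5294121314936, 175.60853217243653, 449.6395044598346, 421.8052962543643
  , 365.7226548602505, 202.6662173927707, 219.29202578165894, 228.80939173275726
  , 157.91847441003722, 201.48929391123798, 249.44069275190375, 210.53402463247687
  ]

-- The summed alignment over all nodes of the decomposition.
summedAlignment :: [Double] -> Double
summedAlignment = sum
```

Evaluating `summedAlignment nodeAlignments` gives approximately 26138.99, which agrees with the first component returned by `summation`.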

The bare model is a fud decomposition. As noted in Conversion to fud, the tree of a fud decomposition is sometimes unwieldy, so consider the fud decomposition fud, $F = D^{\mathrm{F}} \in \mathcal{F}$, (see Practicable fud decomposition fud),

let ff = fromJust $ systemsDecompFudsNullablePracticable uub1 df 1

let uub2 = uub `uunion` (fsys ff)

card $ uvars uub2
838

The model, $F$, has 198 derived variables, $W_F = \mathrm{der}(F)$, and a large derived volume, $|W_F^{\mathrm{C}}|$,

card $ fder ff
198

rp $ fder ff
"{<<1,n>,1>,<<1,n>,2>,...,<<1,n>,8>,<<2,n>,1>,<<2,n>,2>,...,<<19,n>,8>,<<19,n>,9>,<<20,n>,1>,<<20,n>,2>,...,<<20,n>,9>}"

vol uub2 (fder ff)
65288319805636102477734568013414167375376952816294527506227174787815365536553004470827937999880192

The model has 54 underlying variables, $V_F = \mathrm{und}(F)$,

card $ fund ff
54

rp $ vvbk `minus` fund ff
"{3SsnPorchB,BsmtFinSF2B,Condition1,Condition2,EnclosedPorchB,GarageAreaB,GarageCond,Heating,LotAreaB,LotFrontageB,LowQualFinSFB,MasVnrAreaB,MiscValB,MoSold,OpenPorchSFB,OverallCond,OverallQual,PoolArea,RoofMatl,RoofStyle,ScreenPorchB,Street,Utilities,WoodDeckSFB,YrSold}"

That is, a substantial part of the substrate, 25 of the 79 query variables, is ignored by the model.

The underlying volume, $|V_F^{\mathrm{C}}|$, is

vol uub $ fund ff
11693908356403202876262422937600000000000000000

The derived entropy, $\mathrm{entropy}(A_{\mathrm{b}} * F)$, is

let aab' = hhaa $ hrhh uub2 $ hrfmul uub2 ff hhb `hrhrred` fder ff

ent aab'
4.673619793603431

This may be compared to the logarithm of the derived volume, $\ln |W_F^{\mathrm{C}}|$,

let w = fromIntegral (vol uub2 (fder ff)) :: Double

log w
225.22698207796643
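The derived volume here, about $6.5 \times 10^{97}$, still fits in a `Double`, whose largest finite value is about $1.8 \times 10^{308}$. The `fromIntegral` conversion is therefore safe. For larger systems the conversion would overflow to infinity. Since a volume is a product of valencies, its logarithm can instead be computed as a sum of logarithms. Below is a minimal sketch that takes a list of valencies; the names are hypothetical.

```haskell
-- The volume of a set of variables is the product of their valencies.
volume :: [Integer] -> Integer
volume = product

-- The natural logarithm of the volume, computed as a sum of logarithms so
-- that a very large Integer volume never has to be converted to Double.
logVolume :: [Integer] -> Double
logVolume = sum . map (log . fromIntegral)
```

For example, 400 variables of valency 10 have a volume of $10^{400}$. That overflows a `Double`, but `logVolume` still returns $400 \ln 10 \approx 921.0$.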

So derived entropy is quite low. This is because there are only 244 effective derived states,

size $ eff aab'
244 % 1

rpln $ snd $ unzip $ aall aab'
"6 % 1"
"1 % 1"
"10 % 1"
...
"5 % 1"
"68 % 1"
"16 % 1"
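An entropy over $n$ effective states is at most $\ln n$, and reaches that bound only when the counts are equal. With 244 effective derived states, the derived entropy can therefore be no more than $\ln 244 \approx 5.50$. That is far below $\ln |W_F^{\mathrm{C}}| \approx 225.2$. The observed 4.67 is lower still because the counts are uneven. Below is a sketch of the entropy of a list of counts and of its bound. It assumes natural logarithms, as `log w` above uses; the names are hypothetical.

```haskell
-- Entropy, with natural logarithms, of a histogram given by its counts.
entropy :: [Double] -> Double
entropy cs = negate (sum [p * log p | c <- cs, c > 0, let p = c / total])
  where
    total = sum cs

-- The upper bound ln n for a histogram with n effective (non-zero) states.
entropyBound :: [Double] -> Double
entropyBound cs = log (fromIntegral (length (filter (> 0) cs)))
```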

Now apply the model to the sample. Let $A_{\mathrm{trbb}} = A_{\mathrm{trb}}~\%~V_{\mathrm{b}} * \prod\mathrm{his}(F)$,

let hhtrb = aahr uub aatrb `hrhrred` vvb

let hhtrbb = hrfmul uub2 ff hhtrb

hrsize hhtrbb
1460

let hhtrbr = historyRepasShuffle_u hhtrb 1

let hhtrbrb = hrfmul uub2 ff hhtrbr

hrsize hhtrbrb
1460

let hhteb = aahr uub aateb `hrhrred` vvb

let hhtebb = hrfmul uub2 ff hhteb

hrsize hhtebb
1459

rpln $ aall $ hhaa $ hrhh uub2 $ hhtrbb `hrhrred` (fder ff `union` vvbl)
"({(SalePriceB,88000),(<<1,n>,1>,0),...,,(<<20,n>,9>,null)},1 % 1)"
...

size $ eff $ hhaa $ hrhh uub2 $ hhtrbb `hrhrred` (fder ff `union` vvbl)
741 % 1

The model’s label entropy, or query conditional entropy, is less than that of Neighborhood, which in turn is less than the entropy of sale price alone, \[ \mathrm{lent}(A_{\mathrm{trbb}},W_F,V_{\mathrm{bl}}) < \mathrm{lent}(A_{\mathrm{trbb}},\{\mathrm{Neighborhood}\},V_{\mathrm{bl}}) < \mathrm{ent}(A_{\mathrm{trbb}}\%V_{\mathrm{bl}}) \]

let hrlent uu hh ww vvl = ent (hhaa $ hrhh uu $ hh `hrhrred` (ww `union` vvl)) - ent (hhaa $ hrhh uu $ hh `hrhrred` ww)

hrlent uub2 hhtrbb (fder ff) vvbl
1.7450488015119827

lent aatrb (sgl (VarStr "Neighborhood")) vvbl
2.3688094014030585

ent $ aatrb `red` vvbl
2.9948072760546887

That is, the model is more predictive of sale price than Neighborhood.
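The `hrlent` definition above computes label entropy as two entropies subtracted. The first is the entropy of the sample reduced to the query and label variables; the second is the entropy reduced to the query variables alone. It is zero exactly when every query slice has a single label. Below is a plain-Haskell sketch over a list of (query state, label) pairs. It assumes natural logarithms, and `entropyOf` and `labelEntropy` are hypothetical names.

```haskell
import qualified Data.Map.Strict as Map

-- Entropy, with natural logarithms, of the empirical distribution of states.
entropyOf :: Ord a => [a] -> Double
entropyOf xs = negate (sum [p * log p | c <- Map.elems counts, let p = fromIntegral c / n])
  where
    counts = Map.fromListWith (+) [(x, 1 :: Int) | x <- xs]
    n = fromIntegral (length xs)

-- Label entropy of a sample of (query state, label) pairs:
-- ent(A % (X u L)) - ent(A % X), the conditional entropy of the label given X.
labelEntropy :: (Ord q, Ord l) => [(q, l)] -> Double
labelEntropy evs = entropyOf evs - entropyOf (map fst evs)
```

Consider two query states, where one always has the same label and the other is split evenly between two labels. The label entropy is then $\frac{1}{2}\ln 2$: half the events are fully determined and half carry one nat-bit of uncertainty.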

rpln $ sort [(hrlent uub2 hhtrbb (sgl w) vvbl, w) | w <- qqll (fder ff)]
"(2.7678906027208003,<<1,n>,7>)"
"(2.7721353834176585,<<1,n>,5>)"
"(2.7744768451897204,<<1,n>,1>)"
"(2.7754896835888694,<<1,n>,2>)"
"(2.7800290635823406,<<1,n>,3>)"
"(2.7964648892331976,<<2,n>,6>)"
"(2.7971255328687072,<<2,n>,2>)"
"(2.808794699705433,<<2,n>,5>)"
"(2.8089679700007926,<<2,n>,3>)"
"(2.8135915898007724,<<2,n>,7>)"
"(2.8201910325632626,<<2,n>,10>)"
"(2.822413255887054,<<2,n>,9>)"
"(2.8225839417138143,<<2,n>,1>)"
"(2.8231610136666396,<<2,n>,8>)"
"(2.8272020112182132,<<4,n>,5>)"
"(2.8273193335108893,<<2,n>,4>)"
"(2.8277017001533764,<<4,n>,7>)"
"(2.8277017001533764,<<4,n>,8>)"
"(2.8277017001533764,<<4,n>,9>)"
...
"(2.971364052157883,<<9,n>,2>)"
"(2.971364052157883,<<9,n>,3>)"
"(2.9714608064465797,<<9,n>,7>)"
"(2.9719010255089846,<<19,n>,3>)"
"(2.9719010255089846,<<19,n>,4>)"
"(2.9719010255089846,<<19,n>,5>)"
"(2.9719010255089846,<<19,n>,7>)"
"(2.9719010255089846,<<19,n>,8>)"
"(2.972345808353551,<<9,n>,6>)"
"(2.9740902966032143,<<20,n>,1>)"

We can see that the derived variables nearest the root fud tend to have the lowest label entropy. None have zero label entropy by themselves. Consider derived variable <<1,n>,7> in the root fud,

let w1n7 = stringsVariable "<<1,n>,7>"

rp $ fund $ ff `fdep` sgl w1n7
"{BsmtQual,ExterQual,Foundation,GarageQual,GarageYrBltB,SaleCondition,YearRemodAddB}"

hrlent uub2 hhtrbb (sgl w1n7) vvbl
2.7678906027208003

rpln $ aall $ hhaa $ hrhh uub2 $ hhtrbb `hrhrred` (sgl w1n7 `union` vvbl)
"({(SalePriceB,88000),(<<1,n>,7>,0)},64 % 1)"
"({(SalePriceB,88000),(<<1,n>,7>,1)},1 % 1)"
"({(SalePriceB,88000),(<<1,n>,7>,2)},10 % 1)"
"({(SalePriceB,106250),(<<1,n>,7>,0)},63 % 1)"
...
"({(SalePriceB,326000),(<<1,n>,7>,2)},29 % 1)"
"({(SalePriceB,755000),(<<1,n>,7>,0)},5 % 1)"
"({(SalePriceB,755000),(<<1,n>,7>,1)},43 % 1)"
"({(SalePriceB,755000),(<<1,n>,7>,2)},25 % 1)"

Now consider the label entropy for all of the fud variables, not just the fud derived variables. We can determine minimum subsets of the fud variables that are causal or predictive by using the repa conditional entropy tuple set builder to do the conditional entropy minimise, \[ \{(\mathrm{lent}(A_{\mathrm{trbb}},M,V_{\mathrm{bl}}),~M) : M \in \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A_{\mathrm{trbb}},\mathrm{L}}))\} \]
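On small samples, the conditional entropy minimise can be illustrated by an exhaustive search over all $k$-tuples of candidate variables, keeping the `qmax` tuples with the lowest label entropy. This is a simplified stand-in, not the repa builder. It has no counterpart to the `omax` parameter and scores every subset, so it is only practicable for a handful of variables. The event representation and all names are hypothetical.

```haskell
import Data.List (sortOn, subsequences)
import qualified Data.Map.Strict as Map

-- Hypothetical event representation: variable name -> value.
type Event = Map.Map String String

-- Entropy, with natural logarithms, of the empirical distribution of states.
entropyOf :: Ord a => [a] -> Double
entropyOf xs = negate (sum [p * log p | c <- Map.elems counts, let p = fromIntegral c / n])
  where
    counts = Map.fromListWith (+) [(x, 1 :: Int) | x <- xs]
    n = fromIntegral (length xs)

-- Label entropy of the tuple xs with respect to the label variable lbl:
-- ent(A % (X u L)) - ent(A % X).
labelEntropy :: [Event] -> String -> [String] -> Double
labelEntropy evs lbl xs =
  entropyOf [(state ev, Map.lookup lbl ev) | ev <- evs] - entropyOf (map state evs)
  where
    state ev = map (`Map.lookup` ev) xs

-- Score every k-subset of the candidate variables and keep the qmax tuples
-- with the lowest label entropy.
bestTuples :: Int -> Int -> String -> [String] -> [Event] -> [(Double, [String])]
bestTuples k qmax lbl vars evs =
  take qmax (sortOn fst [(labelEntropy evs lbl xs, xs) | xs <- subsequences vars, length xs == k])
```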

let buildcondrr vvl aa kmax omax qmax = sort $ map (\(a,b) -> (b,a)) $ Map.toList $ fromJust $ parametersBuilderConditionalVarsRepa kmax omax qmax vvl aa

let (kmax,omax,qmax) = (1, 20, 20)

let ll = buildcondrr vvbl hhtrbb kmax omax qmax

rpln ll
"(2.3688094014030585,{Neighborhood})"
"(2.401775936638896,{OverallQual})"
"(2.438375883488403,{GrLivAreaB})"
"(2.505387552265462,{GarageAreaB})"
"(2.5141550725111537,{TotalBsmtSFB})"
"(2.5286425658331457,{YearBuiltB})"
"(2.575379618536724,{GarageYrBltB})"
"(2.5772799825200803,{1stFlrSFB})"
"(2.6072098566399333,{GarageCars})"
"(2.626737288605211,{YearRemodAddB})"
"(2.633387515941217,{MSSubClass})"
"(2.63723029318385,{<<1,1>,137>})"
"(2.652820702365173,{BsmtQual})"
"(2.660976881941984,{2ndFlrSFB})"
"(2.662266429504159,{ExterQual})"
"(2.667532610287423,{KitchenQual})"
"(2.685754350308946,{BsmtFinSF1B})"
"(2.6905473447680794,{<<1,1>,135>})"
"(2.690942864111628,{OpenPorchSFB})"
"(2.699781772374598,{<<1,1>,72>})"

Let us sort by shuffle content derived alignment descending. Let $L = \mathrm{botd}(\mathrm{qmax})(\mathrm{elements}(Z_{P,A_{\mathrm{trbb}},\mathrm{L}}))$. Then calculate \[ \{(\mathrm{algn}(A_{\mathrm{trbb}}\%X)-\mathrm{algn}(A_{\mathrm{trbrb}}\%X),~X) : (e,X) \in L\} \] where $A_{\mathrm{trbrb}} = A_{\mathrm{trbr}}~\%~V_{\mathrm{b}} * \prod\mathrm{his}(F)$,

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb `hrhrred` xx)), let aar' = hhaa (hrhh uub2 (hhtrbrb `hrhrred` xx))] 
"(0.0,{<<1,1>,137>})"
"(0.0,{<<1,1>,135>})"
"(0.0,{<<1,1>,72>})"
"(0.0,{YearRemodAddB})"
"(0.0,{YearBuiltB})"
"(0.0,{TotalBsmtSFB})"
"(0.0,{OverallQual})"
"(0.0,{OpenPorchSFB})"
"(0.0,{Neighborhood})"
"(0.0,{MSSubClass})"
"(0.0,{KitchenQual})"
"(0.0,{GrLivAreaB})"
"(0.0,{GarageYrBltB})"
"(0.0,{GarageCars})"
"(0.0,{GarageAreaB})"
"(0.0,{ExterQual})"
"(0.0,{BsmtQual})"
"(0.0,{BsmtFinSF1B})"
"(0.0,{2ndFlrSFB})"
"(0.0,{1stFlrSFB})"

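The alignments are all zero because each tuple consists of a single variable, and the alignment of a single variable is always zero. Now sort by size-volume-sized-shuffle relative entropy descending, \[ \{(\mathrm{rent}(A_{\mathrm{trbb}}~\%~X,~Z_F * \hat{A}_{\mathrm{trbrb}}~\%~X),~X) : (e,X) \in L\} \] where $Z_F = \mathrm{scalar}(|V_F^{\mathrm{C}}|)$,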

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb `hrhrred` xx)), let vaar' = vsize uub2 xx (hhaa (hrhh uub2 (hhtrbrb `hrhrred` xx)))] 
"(3.694822225952521e-13,{GrLivAreaB})"
"(2.984279490192421e-13,{<<1,1>,135>})"
"(2.5579538487363607e-13,{MSSubClass})"
"(1.7763568394002505e-13,{GarageAreaB})"
"(1.7763568394002505e-13,{1stFlrSFB})"
"(1.6342482922482304e-13,{<<1,1>,72>})"
"(1.4566126083082054e-13,{KitchenQual})"
"(1.2789769243681803e-13,{GarageYrBltB})"
"(9.237055564881302e-14,{2ndFlrSFB})"
"(2.842170943040401e-14,{YearBuiltB})"
"(2.1316282072803006e-14,{OpenPorchSFB})"
"(2.6645352591003757e-15,{GarageCars})"
"(-7.593925488436071e-14,{ExterQual})"
"(-8.260059303211165e-14,{BsmtQual})"
"(-8.526512829121202e-14,{BsmtFinSF1B})"
"(-2.2382096176443156e-13,{OverallQual})"
"(-3.588240815588506e-13,{<<1,1>,137>})"
"(-3.765876499528531e-13,{TotalBsmtSFB})"
"(-3.979039320256561e-13,{Neighborhood})"
"(-6.323830348264892e-13,{YearRemodAddB})"
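In the relative entropy above, the shuffle histogram is normalised to size one and then scaled by the volume: $Z_F * \hat{A}$ has the same proportions as the histogram but a size equal to the volume. Below is a sketch on a plain `Map` histogram. The name `sizeToVolume` is hypothetical. Its correspondence with the library's `vsize` is an assumption, based on how `vsize` is applied in the code above.

```haskell
import qualified Data.Map.Strict as Map

-- Scale a histogram so that its size equals the given volume, Z * \hat{A}:
-- divide every count by the total, then multiply by the volume.
sizeToVolume :: Integer -> Map.Map k Double -> Map.Map k Double
sizeToVolume vol hist = Map.map (\c -> c / total * z) hist
  where
    total = sum (Map.elems hist)
    z = fromIntegral vol
```

For example, counts of 3 and 1 scaled to a volume of 21 become 15.75 and 5.25. The ratio is unchanged and the total is 21.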

let xx = llqq $ map VarStr ["GrLivAreaB"]

card xx
1

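The relative entropies above are all of the order of $10^{-13}$, that is, zero to within rounding error, so their order is not significant. The label entropy of the first tuple, $X = \{\mathrm{GrLivAreaB}\}$, is $\mathrm{lent}(A_{\mathrm{trbb}},X,V_{\mathrm{bl}})$,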

hrlent uub2 hhtrbb xx vvbl
2.438375883488403

vol uub2 xx
21

The tuple, $X$, is very query effective, $\mathrm{size}(A_{\mathrm{tebb}}\%X * (A_{\mathrm{trbb}}\%X)^{\mathrm{F}})$,

size $ hhaa (hrhh uub2 (hhtebb `hrhrred` xx)) `mul` eff (hhaa (hrhh uub2 (hhtrbb `hrhrred` xx)))
1459 % 1

let xx = llqq $ map stringsVariable ["<<1,1>,135>"]

card xx
1

hrlent uub2 hhtrbb xx vvbl
2.6905473447680794

vol uub2 xx
8

size $ hhaa (hrhh uub2 (hhtebb `hrhrred` xx)) `mul` eff (hhaa (hrhh uub2 (hhtrbb `hrhrred` xx)))
1459 % 1

The substrate variables are usually more predictive of sale price than the fud variables. This is because the substrate variables generally have larger valencies and so fewer are needed to partition the volume,

let (kmax,omax,qmax) = (5, 20, 20)

let ll = buildcondrr vvbl hhtrbb kmax omax qmax

rpln ll
"(3.7980667427950365e-3,{BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,YrSold})"
"(3.7980667427950365e-3,{GarageAreaB,GrLivAreaB,LotAreaB,MoSold,YrSold})"
"(4.7475834284931295e-3,{GrLivAreaB,LotAreaB,MoSold,YearRemodAddB,YrSold})"
"(4.747583428494018e-3,{BsmtUnfSFB,GrLivAreaB,MoSold,YearRemodAddB,YrSold})"
"(5.105972568058448e-3,{1stFlrSFB,BsmtFinSF1B,GarageYrBltB,LotAreaB,MoSold})"
"(6.055489253757429e-3,{1stFlrSFB,BsmtUnfSFB,GarageYrBltB,LotAreaB,MoSold})"
"(6.055489253757429e-3,{BsmtUnfSFB,GarageYrBltB,GrLivAreaB,MoSold,YrSold})"
"(6.055489253758317e-3,{BsmtFinSF1B,GarageAreaB,GrLivAreaB,LotAreaB,MoSold})"
"(7.005005939455522e-3,{LotAreaB,MoSold,TotalBsmtSFB,YearRemodAddB,YrSold})"
"(7.596133485589185e-3,{BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,WoodDeckSFB})"
"(7.596133485590073e-3,{1stFlrSFB,GarageAreaB,LotAreaB,MoSold,YrSold})"
"(7.596133485590073e-3,{1stFlrSFB,GarageYrBltB,LotAreaB,MoSold,YrSold})"
"(7.596133485590073e-3,{GarageYrBltB,GrLivAreaB,LotAreaB,MoSold,YrSold})"
"(7.769196183682325e-3,{BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,OverallQual})"
"(7.769196183682325e-3,{BsmtUnfSFB,GrLivAreaB,LotAreaB,MoSold,YearRemodAddB})"
"(7.7691961836832135e-3,{BsmtUnfSFB,Exterior2nd,GrLivAreaB,LotAreaB,MoSold})"
"(7.954522625153615e-3,{BsmtFinSF1B,GarageYrBltB,GrLivAreaB,LotAreaB,MoSold})"
"(7.954522625154503e-3,{1stFlrSFB,BsmtFinSF1B,LotAreaB,MoSold,YearBuiltB})"
"(7.954522625154503e-3,{1stFlrSFB,BsmtUnfSFB,LotAreaB,MoSold,YearBuiltB})"
"(7.954522625154503e-3,{BsmtUnfSFB,GarageAreaB,GrLivAreaB,MoSold,YearBuiltB})"

None of the 5-tuples contain model variables.
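The valency argument can be illustrated roughly as follows. A tuple of $n$ variables of valency $v$ has volume $v^n$, so it can separate $z$ states only once $v^n \geq z$. To reach the 1460 events of the training sample takes at least 3 variables of valency 21, such as GrLivAreaB, but 4 of valency 8, such as <<1,1>,135>. Below is a sketch; the name `minVars` is hypothetical.

```haskell
-- Smallest n such that n variables of valency v have a volume, v^n, of at
-- least z states. Valencies below 2 can never reach z > 1, so reject them.
minVars :: Integer -> Integer -> Int
minVars v z
  | v < 2 = error "minVars: valency must be at least 2"
  | otherwise = length (takeWhile (< z) (iterate (* v) 1))
```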

Now optimise for larger tuples, excluding the substrate. Let $A_{\mathrm{trbb}2} = A_{\mathrm{trbb}}~\%~((\mathrm{vars}(F) \setminus V_{\mathrm{b}}) \cup V_{\mathrm{bl}})$, $A_{\mathrm{trbrb}2} = A_{\mathrm{trbrb}}~\%~((\mathrm{vars}(F) \setminus V_{\mathrm{b}}) \cup V_{\mathrm{bl}})$ and $A_{\mathrm{tebb}2} = A_{\mathrm{tebb}}~\%~((\mathrm{vars}(F) \setminus V_{\mathrm{b}}) \cup V_{\mathrm{bl}})$,

let hhtrbb2 = hhtrbb `hrhrred` (fvars ff `minus` vvb `union` vvbl)

let hhtrbrb2 = hhtrbrb `hrhrred` (fvars ff `minus` vvb `union` vvbl)

let hhtebb2 = hhtebb `hrhrred` (fvars ff `minus` vvb `union` vvbl)

let (kmax,omax,qmax) = (1, 20, 10)

let ll = buildcondrr vvbl hhtrbb2 kmax omax qmax

rpln ll
"(2.63723029318385,{<<1,1>,137>})"
"(2.6905473447680794,{<<1,1>,135>})"
"(2.699781772374598,{<<1,1>,72>})"
"(2.702120882859549,{<<1,1>,187>})"
"(2.7099518288561932,{<<12,1>,167>})"
"(2.712191756461502,{<<1,3>,183>})"
"(2.7133311011540195,{<<1,1>,270>})"
"(2.713693998368657,{<<1,1>,147>})"
"(2.7152300202479576,{<<1,1>,278>})"
"(2.7177371682096467,{<<4,3>,333>})"

The shuffle content derived alignment is \[ \{(\mathrm{algn}(A_{\mathrm{trbb}2}\%X)-\mathrm{algn}(A_{\mathrm{trbrb}2}\%X),~X) : (e,X) \in L\} \]

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let aar' = hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx))] 
"(0.0,{<<12,1>,167>})"
"(0.0,{<<4,3>,333>})"
"(0.0,{<<1,3>,183>})"
"(0.0,{<<1,1>,278>})"
"(0.0,{<<1,1>,270>})"
"(0.0,{<<1,1>,187>})"
"(0.0,{<<1,1>,147>})"
"(0.0,{<<1,1>,137>})"
"(0.0,{<<1,1>,135>})"
"(0.0,{<<1,1>,72>})"

and the size-volume-sized-shuffle relative entropy is \[ \{(\mathrm{rent}(A_{\mathrm{trbb}2}~\%~X,~Z_F * \hat{A}_{\mathrm{trbrb}2}~\%~X),~X) : (e,X) \in L\} \]

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let vaar' = vsize uub2 (fund (ff `fdep` xx)) (hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx)))] 
"(361.6478934810002,{<<1,3>,183>})"
"(52.293405406875536,{<<4,3>,333>})"
"(5.578372495765301,{<<1,1>,278>})"
"(9.947598300641403e-14,{<<1,1>,135>})"
"(4.1744385725905886e-14,{<<1,1>,270>})"
"(3.552713678800501e-14,{<<1,1>,72>})"
"(-4.973799150320701e-14,{<<1,1>,147>})"
"(-7.460698725481052e-14,{<<12,1>,167>})"
"(-1.6342482922482304e-13,{<<1,1>,187>})"
"(-3.197442310920451e-13,{<<1,1>,137>})"

let xx = llqq $ map stringsVariable ["<<1,3>,183>"]

card xx
1

rp $ fund (ff `fdep` xx)
"{BsmtQual,Foundation,GarageQual,GarageYrBltB,YearRemodAddB}"

The label entropy of the tuple, $X$, is $\mathrm{lent}(A_{\mathrm{trbb}},X,V_{\mathrm{bl}})$,

hrlent uub2 hhtrbb xx vvbl
2.712191756461502

vol uub2 xx
4

vol uub2 (fund (ff `fdep` xx))
71820

The tuple, $X$, is also very query effective, $\mathrm{size}(A_{\mathrm{tebb}2}\%X * (A_{\mathrm{trbb}2}\%X)^{\mathrm{F}})$,

size $ hhaa (hrhh uub2 (hhtebb2 `hrhrred` xx)) `mul` eff (hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)))
1459 % 1

let (kmax,omax,qmax) = (2, 20, 10)

let ll = buildcondrr vvbl hhtrbb2 kmax omax qmax

rpln ll
"(2.228411537951918,{<<1,1>,137>,<<4,3>,333>})"
"(2.231670337255345,{<<1,1>,137>,<<4,5>,266>})"
"(2.2396571593009766,{<<1,1>,137>,<<4,2>,416>})"
"(2.2545453221051157,{<<1,1>,137>,<<6,1>,89>})"
"(2.256645627482065,{<<1,1>,135>,<<4,3>,333>})"
"(2.263670920150578,{<<1,1>,135>,<<4,5>,266>})"
"(2.2654973169513886,{<<1,1>,137>,<<4,2>,469>})"
"(2.2693353116066253,{<<1,1>,137>,<<4,5>,30>})"
"(2.274029029404974,{<<1,1>,137>,<<4,3>,247>})"
"(2.284237902240966,{<<1,1>,137>,<<12,2>,592>})"

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let aar' = hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx))] 
"(234.25011358461597,{<<1,1>,137>,<<4,3>,333>})"
"(189.26198717336138,{<<1,1>,137>,<<4,5>,266>})"
"(186.80919372823337,{<<1,1>,135>,<<4,3>,333>})"
"(172.61968932379932,{<<1,1>,137>,<<4,2>,416>})"
"(153.6987132336326,{<<1,1>,135>,<<4,5>,266>})"
"(102.52236658568745,{<<1,1>,137>,<<4,3>,247>})"
"(88.91057934754463,{<<1,1>,137>,<<12,2>,592>})"
"(70.85016212031951,{<<1,1>,137>,<<4,5>,30>})"
"(65.96846289210498,{<<1,1>,137>,<<6,1>,89>})"
"(61.540659699282514,{<<1,1>,137>,<<4,2>,469>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let vaar' = vsize uub2 (fund (ff `fdep` xx)) (hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx)))] 
"(342.4780684709549,{<<1,1>,137>,<<4,3>,333>})"
"(336.6053400039673,{<<1,1>,137>,<<4,5>,30>})"
"(322.67164772748947,{<<1,1>,137>,<<4,5>,266>})"
"(278.3178806453943,{<<1,1>,135>,<<4,3>,333>})"
"(259.82100361585617,{<<1,1>,135>,<<4,5>,266>})"
"(179.20785022083146,{<<1,1>,137>,<<4,2>,416>})"
"(136.19259847560897,{<<1,1>,137>,<<4,3>,247>})"
"(108.70932296565297,{<<1,1>,137>,<<12,2>,592>})"
"(96.91162282396544,{<<1,1>,137>,<<4,2>,469>})"
"(20.5473732081266,{<<1,1>,137>,<<6,1>,89>})"

let xx = llqq $ map stringsVariable ["<<1,1>,137>","<<4,3>,333>"]

card xx
2

rp $ fund (ff `fdep` xx)
"{2ndFlrSFB,GrLivAreaB,LandSlope,MSSubClass,TotalBsmtSFB,YearBuiltB}"

hrlent uub2 hhtrbb xx vvbl
2.228411537951918

vol uub2 xx
40

vol uub2 (fund (ff `fdep` xx))
9779616

size $ hhaa (hrhh uub2 (hhtebb2 `hrhrred` xx)) `mul` eff (hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)))
1459 % 1

let (kmax,omax,qmax) = (3, 20, 10)

let ll = buildcondrr vvbl hhtrbb2 kmax omax qmax

rpln ll
"(1.6895145260292708,{<<1,1>,137>,<<4,5>,266>,<<6,1>,149>})"
"(1.6979051641436822,{<<1,1>,137>,<<4,3>,333>,<<6,1>,149>})"
"(1.6979972731087285,{<<1,1>,137>,<<4,5>,266>,<<6,1>,96>})"
"(1.701430263423628,{<<1,1>,137>,<<4,2>,416>,<<6,1>,149>})"
"(1.7016296974423017,{<<1,1>,135>,<<4,3>,333>,<<6,1>,149>})"
"(1.7035087375165014,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>})"
"(1.710548342968142,{<<1,1>,137>,<<4,3>,333>,<<6,1>,96>})"
"(1.7152879473802507,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>})"
"(1.7157617287822422,{<<1,1>,137>,<<4,2>,416>,<<6,1>,96>})"
"(1.7199788474188793,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>})"


rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let aar' = hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx))] 
"(443.58952520290404,{<<1,1>,137>,<<4,3>,333>,<<6,1>,149>})"
"(424.53349036219424,{<<1,1>,137>,<<4,3>,333>,<<6,1>,96>})"
"(421.0264378818474,{<<1,1>,137>,<<4,2>,416>,<<6,1>,96>})"
"(413.4285174212341,{<<1,1>,137>,<<4,2>,416>,<<6,1>,149>})"
"(405.4769356945526,{<<1,1>,137>,<<4,5>,266>,<<6,1>,149>})"
"(393.1470212526342,{<<1,1>,137>,<<4,5>,266>,<<6,1>,96>})"
"(367.49654411246047,{<<1,1>,135>,<<4,3>,333>,<<6,1>,149>})"
"(353.09311292631855,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>})"
"(346.13423571774774,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>})"
"(336.6055450711483,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let vaar' = vsize uub2 (fund (ff `fdep` xx)) (hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx)))] 
"(1295.7206377983093,{<<1,1>,135>,<<4,3>,333>,<<6,1>,149>})"
"(1238.6163300275803,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>})"
"(1204.8991889953613,{<<1,1>,137>,<<4,5>,266>,<<6,1>,149>})"
"(1081.4174448251724,{<<1,1>,137>,<<4,3>,333>,<<6,1>,96>})"
"(1076.976710319519,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>})"
"(1047.4527168273926,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>})"
"(1045.8651905059814,{<<1,1>,137>,<<4,5>,266>,<<6,1>,96>})"
"(991.0951346158981,{<<1,1>,137>,<<4,3>,333>,<<6,1>,149>})"
"(716.770625458099,{<<1,1>,137>,<<4,2>,416>,<<6,1>,96>})"
"(657.4769666909706,{<<1,1>,137>,<<4,2>,416>,<<6,1>,149>})"

let xx = llqq $ map stringsVariable ["<<1,1>,135>","<<4,3>,333>","<<6,1>,149>"]

card xx
3

rp $ fund (ff `fdep` xx)
"{1stFlrSFB,2ndFlrSFB,GarageYrBltB,GrLivAreaB,LandSlope,MSSubClass,TotalBsmtSFB}"

hrlent uub2 hhtrbb xx vvbl
1.7016296974423017

vol uub2 xx
200

vol uub2 (fund (ff `fdep` xx))
205371936

size $ hhaa (hrhh uub2 (hhtebb2 `hrhrred` xx)) `mul` eff (hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)))
1447 % 1

let (kmax,omax,qmax) = (4, 20, 10)

let ll = buildcondrr vvbl hhtrbb2 kmax omax qmax

rpln ll
"(1.2056093767641958,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<17,2>,573>})"
"(1.210725637544753,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<17,2>,573>})"
"(1.2112972862964995,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<17,2>,573>})"
"(1.218828672242375,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<17,2>,573>})"
"(1.222185484195852,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<17,2>,589>})"
"(1.224833076774261,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<17,2>,589>})"
"(1.2249045755185568,{<<1,1>,135>,<<4,3>,333>,<<6,1>,149>,<<17,2>,573>})"
"(1.2276213600801311,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>,<<17,2>,573>})"
"(1.2296029451733608,{<<1,1>,137>,<<4,5>,266>,<<6,1>,149>,<<17,2>,573>})"
"(1.2312487294792733,{<<1,1>,137>,<<4,2>,416>,<<6,1>,149>,<<17,2>,573>})"

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let aar' = hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx))] 
"(632.8287551124616,{<<1,1>,137>,<<4,5>,266>,<<6,1>,149>,<<17,2>,573>})"
"(573.4333215999733,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<17,2>,573>})"
"(570.7605595883509,{<<1,1>,135>,<<4,3>,333>,<<6,1>,149>,<<17,2>,573>})"
"(565.2738415795945,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<17,2>,589>})"
"(562.2997152520041,{<<1,1>,137>,<<4,2>,416>,<<6,1>,149>,<<17,2>,573>})"
"(556.9739490482635,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<17,2>,573>})"
"(543.5880980920637,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>,<<17,2>,573>})"
"(508.07776127914633,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<17,2>,573>})"
"(497.07470463978746,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<17,2>,589>})"
"(479.81118068496687,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<17,2>,573>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let vaar' = vsize uub2 (fund (ff `fdep` xx)) (hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx)))] 
"(5389.512054443359,{<<1,1>,137>,<<4,5>,266>,<<6,1>,149>,<<17,2>,573>})"
"(5287.312896728516,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<17,2>,573>})"
"(5154.737701416016,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>,<<17,2>,573>})"
"(5117.855224609375,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<17,2>,573>})"
"(4932.823638916016,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<17,2>,589>})"
"(4854.911560058594,{<<1,1>,135>,<<4,3>,333>,<<6,1>,149>,<<17,2>,573>})"
"(3764.756631374359,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<17,2>,573>})"
"(3737.942985057831,{<<1,1>,137>,<<4,2>,416>,<<6,1>,149>,<<17,2>,573>})"
"(3438.4512329101563,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<17,2>,573>})"
"(3296.3307580947876,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<17,2>,589>})"

let xx = llqq $ map stringsVariable ["<<1,1>,137>","<<4,5>,266>","<<6,1>,149>","<<17,2>,573>"]

card xx
4

rp $ fund (ff `fdep` xx)
"{1stFlrSFB,2ndFlrSFB,BsmtFullBath,CentralAir,GrLivAreaB,LandSlope,MSSubClass,MasVnrType,PavedDrive,TotalBsmtSFB,YearBuiltB}"

hrlent uub2 hhtrbb xx vvbl
1.2296029451733608

vol uub2 xx
800

vol uub2 (fund (ff `fdep` xx))
30805790400

size $ hhaa (hrhh uub2 (hhtebb2 `hrhrred` xx)) `mul` eff (hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)))
1345 % 1

The 4-tuple model may be compared to the 2-tuple substrate model, above,

let xx = llqq $ map VarStr ["BsmtUnfSFB","GrLivAreaB"]

card xx
2

lent aatrb xx vvbl
1.2044059887997252

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let aar' = hhaa (hrhh uub (hhtrbr `hrhrred` xx))] 
"(166.33764054037806,{BsmtUnfSFB,GrLivAreaB})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let vaar' = vsize uub xx (hhaa (hrhh uub (hhtrbr `hrhrred` xx)))] 
"(202.14855617164312,{BsmtUnfSFB,GrLivAreaB})"

vol uub xx
462

size $ aateb `mul` eff (hhaa (hrhh uub (hhtrb `hrhrred` xx)))
1395 % 1

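The two models have similar label entropies (1.204 and 1.230). The 2-tuple substrate model is slightly more query effective (1395 effective test events compared to 1345), but it has much lower derived alignment (166 compared to 633) and much lower relative entropy (202 compared to 5390), so the 4-tuple model is the more robust model.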
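The following minimal standalone Haskell sketch makes the comparison explicit. It records the statistics printed above for the two models and reports which model is better on each criterion. The record `ModelStats`, its fields and the function `compareModels` are hypothetical names introduced only for this illustration and are not part of the AMES or AlignmentRepa repositories. The values are copied from the session output and rounded.

```haskell
-- Hypothetical record of the statistics printed in the session above.
data ModelStats = ModelStats
  { name          :: String
  , labelEntropy  :: Double  -- label entropy of SalePrice given the model
  , derivedAlign  :: Double  -- derived alignment, algn aa' - algn aar'
  , relEntropy    :: Double  -- relative entropy, rent aa' vaar'
  , effectiveTest :: Int     -- size of the effective test sample
  }

substrate2, induced4 :: ModelStats
substrate2 = ModelStats "2-tuple substrate" 1.2044 166.34 202.15 1395
induced4   = ModelStats "4-tuple induced"   1.2296 632.83 5389.51 1345

-- For each criterion, name the model that scores better (ties favour the first).
compareModels :: ModelStats -> ModelStats -> [String]
compareModels a b =
  [ "lower label entropy: "      ++ pickBy (<=) labelEntropy
  , "more query effective: "     ++ pickBy (>=) (fromIntegral . effectiveTest)
  , "higher derived alignment: " ++ pickBy (>=) derivedAlign
  , "higher relative entropy: "  ++ pickBy (>=) relEntropy
  ]
  where
    -- name of the model that wins under the given ordering
    pickBy :: (Double -> Double -> Bool) -> (ModelStats -> Double) -> String
    pickBy better f = if f a `better` f b then name a else name b

main :: IO ()
main = mapM_ putStrLn (compareModels substrate2 induced4)
```

The substrate model wins on label entropy and effectiveness by small margins. The induced model wins on alignment and relative entropy by large ones.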

let (kmax,omax,qmax) = (5, 20, 10)

let ll = buildcondrr vvbl hhtrbb2 kmax omax qmax

rpln ll
"(0.7996055847890995,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<16,2>,213>,<<17,2>,573>})"
"(0.8013716288758692,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<16,2>,213>,<<17,2>,573>})"
"(0.8080536409731831,{<<1,1>,135>,<<1,1>,229>,<<4,2>,416>,<<6,1>,149>,<<17,2>,573>})"
"(0.8087889725824677,{<<1,1>,135>,<<1,1>,229>,<<4,2>,416>,<<6,1>,96>,<<17,2>,573>})"
"(0.813729675502814,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<17,2>,573>})"
"(0.8165672350370006,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<7,1>,131>,<<17,2>,573>})"
"(0.8173482106841732,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,149>,<<17,2>,573>})"
"(0.8183359199247011,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<16,2>,213>,<<17,2>,573>})"
"(0.8190168347001219,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<7,1>,131>,<<17,2>,573>})"
"(0.8194393043020138,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<16,2>,191>,<<17,2>,573>})"

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let aar' = hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx))] 
"(736.998110430858,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,149>,<<17,2>,573>})"
"(716.3872523320393,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<17,2>,573>})"
"(660.8402359074136,{<<1,1>,135>,<<1,1>,229>,<<4,2>,416>,<<6,1>,96>,<<17,2>,573>})"
"(659.5427110214221,{<<1,1>,135>,<<1,1>,229>,<<4,2>,416>,<<6,1>,149>,<<17,2>,573>})"
"(555.7056395987813,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<16,2>,213>,<<17,2>,573>})"
"(529.6933577574255,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<7,1>,131>,<<17,2>,573>})"
"(507.6162552287551,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<7,1>,131>,<<17,2>,573>})"
"(478.6182520814947,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<16,2>,213>,<<17,2>,573>})"
"(461.9056193389745,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<16,2>,213>,<<17,2>,573>})"
"(459.4824369068289,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<16,2>,191>,<<17,2>,573>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let vaar' = vsize uub2 (fund (ff `fdep` xx)) (hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx)))] 
"(17708.736328125,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<17,2>,573>})"
"(16832.236328125,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,149>,<<17,2>,573>})"
"(15152.5,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<16,2>,213>,<<17,2>,573>})"
"(13323.161811828613,{<<1,1>,135>,<<1,1>,229>,<<4,2>,416>,<<6,1>,149>,<<17,2>,573>})"
"(13230.954818725586,{<<1,1>,135>,<<1,1>,229>,<<4,2>,416>,<<6,1>,96>,<<17,2>,573>})"
"(12779.5556640625,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<16,2>,213>,<<17,2>,573>})"
"(12517.32861328125,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<16,2>,191>,<<17,2>,573>})"
"(12406.62158203125,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<16,2>,213>,<<17,2>,573>})"
"(11599.976943969727,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<7,1>,131>,<<17,2>,573>})"
"(11283.258018493652,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<7,1>,131>,<<17,2>,573>})"

let xx = llqq $ map stringsVariable ["<<1,1>,135>","<<1,1>,229>","<<4,5>,266>","<<6,1>,96>","<<17,2>,573>"]

card xx
5

rp $ fund (ff `fdep` xx)
"{1stFlrSFB,2ndFlrSFB,BsmtFullBath,CentralAir,GarageYrBltB,GrLivAreaB,LandSlope,MSSubClass,MasVnrType,PavedDrive,TotalBsmtSFB,YearRemodAddB}"

hrlent uub2 hhtrbb xx vvbl
0.813729675502814

vol uub2 xx
4000

vol uub2 (fund (ff `fdep` xx))
585310017600

size $ hhaa (hrhh uub2 (hhtebb2 `hrhrred` xx)) `mul` eff (hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)))
1147 % 1

let (kmax,omax,qmax) = (6, 20, 10)

let ll = buildcondrr vvbl hhtrbb2 kmax omax qmax

rpln ll
"(0.5265768073788282,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.526838567000059,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.5280303080883888,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.5306443141507797,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.53680403495808,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.539159221726278,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<12,1>,9>,<<16,2>,213>,<<17,2>,573>})"
"(0.5438003996267335,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<16,2>,213>,<<17,2>,573>,<<20,3>,31>})"
"(0.5448685855838491,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<12,1>,9>,<<16,2>,213>,<<17,2>,573>})"
"(0.546574690478562,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<12,1>,9>,<<16,2>,213>,<<17,2>,573>})"
"(0.5472063031325449,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<12,2>,122>,<<16,2>,213>,<<17,2>,573>})"

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let aar' = hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx))] 
"(480.61731800495886,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<12,1>,9>,<<16,2>,213>,<<17,2>,573>})"
"(480.04607411763925,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<12,1>,9>,<<16,2>,213>,<<17,2>,573>})"
"(473.47409555493596,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(470.27663672278027,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(461.939928826033,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(457.5257905311439,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<12,2>,122>,<<16,2>,213>,<<17,2>,573>})"
"(436.9961813251525,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<16,2>,213>,<<17,2>,573>,<<20,3>,31>})"
"(436.85248442826594,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<12,1>,9>,<<16,2>,213>,<<17,2>,573>})"
"(416.87696156472543,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(407.5551791689236,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let vaar' = vsize uub2 (fund (ff `fdep` xx)) (hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx)))] 
"(29076.0,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<16,2>,213>,<<17,2>,573>,<<20,3>,31>})"
"(25856.5625,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(25725.0625,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<12,1>,9>,<<16,2>,213>,<<17,2>,573>})"
"(25614.3125,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(25432.75,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<12,1>,9>,<<16,2>,213>,<<17,2>,573>})"
"(25033.75,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(24501.75,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<12,2>,122>,<<16,2>,213>,<<17,2>,573>})"
"(21010.849609375,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<12,1>,9>,<<16,2>,213>,<<17,2>,573>})"
"(20225.47265625,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(19945.4013671875,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"

let xx = llqq $ map stringsVariable ["<<1,1>,135>","<<4,2>,416>","<<6,1>,96>","<<16,2>,213>","<<17,2>,573>","<<20,3>,31>"]

card xx
6

rp $ fund (ff `fdep` xx)
"{1stFlrSFB,BsmtFullBath,CentralAir,ExterCond,GarageYrBltB,GrLivAreaB,HalfBath,HeatingQC,KitchenQual,LandContour,MSSubClass,MasVnrType,MiscFeature,PavedDrive,TotalBsmtSFB,YearBuiltB,YearRemodAddB}"

hrlent uub2 hhtrbb xx vvbl
0.5438003996267335

vol uub2 xx
7200

vol uub2 (fund (ff `fdep` xx))
1396762542000000

size $ hhaa (hrhh uub2 (hhtebb2 `hrhrred` xx)) `mul` eff (hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)))
1008 % 1

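Note that the derived alignments of the 6-tuple models are lower than those of the 5-tuple models (the highest is 481 compared to 737), although their relative entropies are higher (every 6-tuple candidate exceeds every 5-tuple candidate).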
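This comparison can be checked against the candidate lists printed above. The standalone sketch below takes the ten derived alignments and ten relative entropies for each tuple size, rounded to two decimal places, and compares their extremes and means. The list names and the `mean` helper are hypothetical and used only for this illustration.

```haskell
-- Derived alignments and relative entropies of the ten 5-tuple and ten
-- 6-tuple candidates listed above, rounded to two decimal places.
align5, align6, rent5, rent6 :: [Double]
align5 = [737.00, 716.39, 660.84, 659.54, 555.71, 529.69, 507.62, 478.62, 461.91, 459.48]
align6 = [480.62, 480.05, 473.47, 470.28, 461.94, 457.53, 437.00, 436.85, 416.88, 407.56]
rent5  = [17708.74, 16832.24, 15152.50, 13323.16, 13230.95, 12779.56, 12517.33, 12406.62, 11599.98, 11283.26]
rent6  = [29076.00, 25856.56, 25725.06, 25614.31, 25432.75, 25033.75, 24501.75, 21010.85, 20225.47, 19945.40]

-- Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

main :: IO ()
main = do
  -- best and mean 6-tuple alignments are below the 5-tuple ones
  print (maximum align6 < maximum align5, mean align6 < mean align5)
  -- every 6-tuple relative entropy exceeds every 5-tuple relative entropy
  print (minimum rent6 > maximum rent5)
```

The alignment ranges overlap: the weakest 5-tuple candidates are below the strongest 6-tuple candidates. The alignment comparison therefore holds for the best candidates and on average, not for every pair. The relative entropy comparison holds for every pair.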

let (kmax,omax,qmax) = (7, 20, 10)

let ll = buildcondrr vvbl hhtrbb2 kmax omax qmax

rpln ll
"(0.34948438534744586,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.3530895236135647,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.3578363807935494,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.35958164077478827,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.361642127944271,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.3629795305509944,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<14,1>,121>,<<15,2>,391>,<<16,2>,213>,<<17,2>,573>})"
"(0.365835761016017,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<12,1>,9>,<<14,1>,121>,<<16,2>,213>,<<17,2>,573>})"
"(0.36619113145582904,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<14,1>,121>,<<16,2>,213>,<<17,2>,573>,<<20,3>,31>})"
"(0.36640694441084865,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<14,1>,121>,<<15,2>,391>,<<16,2>,213>,<<17,2>,573>})"
"(0.3680779794565705,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"

rpln $ reverse $ sort [(algn aa' - algn aar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let aar' = hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx))] 
"(395.75810074247715,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(351.1157464562465,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<12,1>,9>,<<14,1>,121>,<<16,2>,213>,<<17,2>,573>})"
"(349.1980840357014,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(345.50794830381835,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(342.9598684030718,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(313.95583427447843,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<14,1>,121>,<<15,2>,391>,<<16,2>,213>,<<17,2>,573>})"
"(310.77581760639396,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<14,1>,121>,<<15,2>,391>,<<16,2>,213>,<<17,2>,573>})"
"(307.8765972868987,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<14,1>,121>,<<16,2>,213>,<<17,2>,573>,<<20,3>,31>})"
"(306.25946539998574,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(304.1040971600737,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let vaar' = vsize uub2 (fund (ff `fdep` xx)) (hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx)))] 
"(34404.0,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<14,1>,121>,<<16,2>,213>,<<17,2>,573>,<<20,3>,31>})"
"(33040.9375,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(29535.0,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<12,1>,9>,<<14,1>,121>,<<16,2>,213>,<<17,2>,573>})"
"(29399.25,{<<1,1>,135>,<<4,3>,333>,<<6,1>,96>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(29119.6875,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(28869.8125,{<<1,1>,135>,<<4,5>,266>,<<6,1>,149>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(25523.9140625,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<14,1>,121>,<<15,2>,391>,<<16,2>,213>,<<17,2>,573>})"
"(25425.8359375,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<14,1>,121>,<<15,2>,391>,<<16,2>,213>,<<17,2>,573>})"
"(23795.111328125,{<<1,1>,135>,<<4,2>,416>,<<6,1>,149>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(23770.71484375,{<<1,1>,135>,<<4,2>,416>,<<6,1>,96>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"

let xx = llqq $ map stringsVariable ["<<1,1>,135>","<<4,2>,416>","<<6,1>,96>","<<14,1>,121>","<<16,2>,213>","<<17,2>,573>","<<20,3>,31>"]

card xx
7

rp $ fund (ff `fdep` xx)
"{1stFlrSFB,BsmtFullBath,CentralAir,ExterCond,GarageYrBltB,GrLivAreaB,HalfBath,HeatingQC,KitchenQual,LandContour,MSSubClass,MasVnrType,MiscFeature,PavedDrive,TotalBsmtSFB,YearBuiltB,YearRemodAddB}"

hrlent uub2 hhtrbb xx vvbl
0.36619113145582904

vol uub2 xx
14400

vol uub2 (fund (ff `fdep` xx))
1396762542000000

size $ hhaa (hrhh uub2 (hhtebb2 `hrhrred` xx)) `mul` eff (hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)))
784 % 1

Now skip to the 9-tuple,

let (kmax,omax,qmax) = (9, 20, 10)

let ll = buildcondrr vvbl hhtrbb2 kmax omax qmax

rpln ll
"(0.17732083901906037,{<<1,1>,135>,<<1,1>,229>,<<4,2>,416>,<<6,1>,96>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>,<<20,2>,100>})"
"(0.17743758499768436,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,149>,<<14,1>,121>,<<14,1>,123>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.1774849969627752,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<7,1>,131>,<<14,1>,121>,<<14,1>,123>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.17824288789096876,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<12,1>,9>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.17833633072927757,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,149>,<<14,1>,121>,<<14,1>,141>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.17859997431136332,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<7,1>,131>,<<12,2>,122>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.17893081726442084,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<14,1>,121>,<<14,1>,123>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.17913272886819165,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<12,2>,122>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(0.18012197711105848,{<<1,1>,135>,<<1,1>,229>,<<4,2>,416>,<<6,1>,149>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>,<<20,2>,100>})"
"(0.18014061854319507,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<7,1>,131>,<<12,1>,9>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"

rpln $ reverse $ sort [(rent aa' vaar', xx) | (e,xx) <- ll, let aa' = hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)), let vaar' = vsize uub2 (fund (ff `fdep` xx)) (hhaa (hrhh uub2 (hhtrbrb2 `hrhrred` xx)))] 
"(41328.0,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<12,2>,122>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(40760.0,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<7,1>,131>,<<12,2>,122>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(40616.0,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,149>,<<14,1>,121>,<<14,1>,141>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(39278.0,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<14,1>,121>,<<14,1>,123>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(39178.0,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<7,1>,131>,<<14,1>,121>,<<14,1>,123>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(39090.0,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,149>,<<14,1>,121>,<<14,1>,123>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(38418.5,{<<1,1>,135>,<<1,1>,229>,<<4,5>,266>,<<6,1>,96>,<<12,1>,9>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(38112.0,{<<1,1>,135>,<<4,5>,266>,<<6,1>,96>,<<7,1>,131>,<<12,1>,9>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>})"
"(37943.5,{<<1,1>,135>,<<1,1>,229>,<<4,2>,416>,<<6,1>,149>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>,<<20,2>,100>})"
"(37855.0,{<<1,1>,135>,<<1,1>,229>,<<4,2>,416>,<<6,1>,96>,<<14,1>,121>,<<15,1>,147>,<<16,2>,213>,<<17,2>,573>,<<20,2>,100>})"

let xx = llqq $ map stringsVariable ["<<1,1>,135>","<<1,1>,229>","<<4,5>,266>","<<6,1>,96>","<<12,2>,122>","<<14,1>,121>","<<15,1>,147>","<<16,2>,213>","<<17,2>,573>"]

card xx
9

rp $ fund (ff `fdep` xx)
"{1stFlrSFB,2ndFlrSFB,BsmtCond,BsmtFullBath,CentralAir,FireplaceQu,Fireplaces,FullBath,GarageYrBltB,GrLivAreaB,LandSlope,MSSubClass,MasVnrType,PavedDrive,TotalBsmtSFB,YearBuiltB,YearRemodAddB}"

hrlent uub2 hhtrbb xx vvbl
0.17913272886819165

vol uub2 xx
216000

vol uub2 (fund (ff `fdep` xx))
9218632777200000

size $ hhaa (hrhh uub2 (hhtebb2 `hrhrred` xx)) `mul` eff (hhaa (hrhh uub2 (hhtrbb2 `hrhrred` xx)))
415 % 1

Note that the derived volume is now very large, so we have not calculated the derived alignment.
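The sketch below puts the derived volume in perspective. It tabulates the derived volumes `vol uub2 xx` of the chosen tuples in the session above against the 1460 events of the training sample. It is an illustration only, and the names `derivedVolumes`, `trainingSize` and `eventsPerState` are hypothetical.

```haskell
-- Derived volumes, vol uub2 xx, of the chosen k-tuples in the session above.
derivedVolumes :: [(Int, Integer)]
derivedVolumes = [(2, 40), (3, 200), (4, 800), (5, 4000), (6, 7200), (7, 14400), (9, 216000)]

-- Size of the training sample, size aatr.
trainingSize :: Double
trainingSize = 1460

-- Mean number of training events per derived state.
eventsPerState :: Integer -> Double
eventsPerState v = trainingSize / fromIntegral v

main :: IO ()
main = mapM_ report derivedVolumes
  where
    report (k, v) = putStrLn (show k ++ "-tuple: derived volume " ++ show v
                              ++ ", training events per state " ++ show (eventsPerState v))
```

The derived volume grows fifteen-fold from the 7-tuple to the 9-tuple. At the 9-tuple there are fewer than 0.01 training events per derived state on average.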

The 9-tuple sub-model of the induced model may be compared to the 3-tuple substrate model, above,

let xx = llqq $ map VarStr ["BsmtUnfSFB","GrLivAreaB","LotAreaB"]

card xx
3

lent aatrb xx vvbl
0.1632160815826591

rpln $ reverse $ sort [(rent aa' vaar', xx) | let aa' = hhaa (hrhh uub (hhtrb `hrhrred` xx)), let vaar' = vsize uub xx (hhaa (hrhh uub (hhtrbr `hrhrred` xx)))] 
"(3816.431521439241,{BsmtUnfSFB,GrLivAreaB,LotAreaB})"

vol uub xx
9702

size $ aateb `mul` eff (hhaa (hrhh uub (hhtrb `hrhrred` xx)))
369 % 1

The 3-tuple substrate model has similar label entropy (0.163 compared to 0.179) and query effectiveness (369 effective test events compared to 415) to the 9-tuple sub-model. However, the 3-tuple model has much lower relative entropy (3816 compared to 41328), so the 9-tuple model is the more likely model. That is, the 9-tuple model is more accurate when it is effective.
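Query effectiveness can also be expressed as a fraction of the 1459 events of the test sample. The standalone sketch below does this arithmetic for the two models and also computes the ratio of their relative entropies. The helper `effectiveFraction` is a hypothetical name used only here.

```haskell
import Text.Printf (printf)

-- Size of the test sample, size aate.
testSize :: Int
testSize = 1459

-- Fraction of the test sample that is effective in the training model.
effectiveFraction :: Int -> Double
effectiveFraction n = fromIntegral n / fromIntegral testSize

main :: IO ()
main = do
  printf "9-tuple induced sub-model effective fraction: %.3f\n" (effectiveFraction 415)
  printf "3-tuple substrate model effective fraction:   %.3f\n" (effectiveFraction 369)
  -- relative entropies from the session: 41328.0 and 3816.43
  printf "ratio of relative entropies:                  %.1f\n" (41328.0 / 3816.43 :: Double)
```

Both models are effective for roughly a quarter of the test sample. The relative entropy of the 9-tuple sub-model is about ten times that of the substrate model.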

With respect to sale price, we can see that there are sub-models of the induced model which have similar properties to models consisting of subsets of the substrate variables. Again, when choosing between sub-models of the induced model there is a trade-off between model likelihood and query effectiveness. However, when choosing between a sub-model of the induced model and a corresponding substrate variable model of similar label entropy and query effectiveness, the sub-model is, in general, the more likely model. That is, the sub-model of the induced model is preferable to the substrate model because it is more accurate when it is query effective.
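The trade-off can be seen in the chosen sub-models at each tuple size in the session above. The standalone sketch below collects the label entropy, relative entropy and effective test size of each chosen sub-model, rounded. It then checks that as k increases, label entropy falls and relative entropy rises while the effective test size falls. All names are hypothetical and introduced only for this illustration.

```haskell
-- Chosen sub-model at each tuple size k in the session above:
-- (k, label entropy, relative entropy, effective test size), rounded.
subModels :: [(Int, Double, Double, Int)]
subModels =
  [ (2, 2.2284,   342.48, 1459)
  , (3, 1.7016,  1295.72, 1447)
  , (4, 1.2296,  5389.51, 1345)
  , (5, 0.8137, 17708.74, 1147)
  , (6, 0.5438, 29076.00, 1008)
  , (7, 0.3662, 34404.00,  784)
  , (9, 0.1791, 41328.00,  415)
  ]

labelEntropies, relEntropies :: [Double]
labelEntropies = [e | (_, e, _, _) <- subModels]
relEntropies   = [r | (_, _, r, _) <- subModels]

effectiveSizes :: [Int]
effectiveSizes = [n | (_, _, _, n) <- subModels]

-- True if each element is strictly less than the one before it.
strictlyDecreasing :: Ord a => [a] -> Bool
strictlyDecreasing xs = and (zipWith (>) xs (drop 1 xs))

main :: IO ()
main = do
  print (strictlyDecreasing labelEntropies)             -- label entropy falls with k
  print (strictlyDecreasing (map negate relEntropies))  -- relative entropy rises with k
  print (strictlyDecreasing effectiveSizes)             -- fewer test events are effective
```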
