AMES - House Prices/Model 1 induction
AMES_model1.json is induced by AMES_engine1.hs. AMES_engine1 may be built as described in the README. Then run it as follows:
stack exec AMES_engine1.exe +RTS -s >AMES_engine1.log 2>&1 &
tail -f AMES_engine1.log
The first section loads the training and test samples, combines them and buckets the ordered variables,
csvtr <- BL.readFile "train.csv"
let vvcsvtr = either (\_ -> V.empty) id (Data.Csv.decode HasHeader csvtr :: Either String (V.Vector Train))
let aatr = llaa [(llss [(VarStr s, fw rr) | (s,fw) <- trmap],1) | rr <- V.toList vvcsvtr]
csvte <- BL.readFile "test.csv"
let vvcsvte = either (\_ -> V.empty) id (Data.Csv.decode HasHeader csvte :: Either String (V.Vector Test))
let aate = llaa [(llss [(VarStr s, fw rr) | (s,fw) <- temap],1) | rr <- V.toList vvcsvte]
-- uu is the union of both systems; vvk is every variable except Id and the label SalePrice
let uu = sys aatr `uunion` sys aate
let vv = uvars uu `minus` sgl (VarStr "Id")
let vvl = sgl (VarStr "SalePrice")
let vvk = vv `minus` vvl
let aa = (aatr `red` vvk) `add` (aate `red` vvk)
-- vvo: ordered variables with a volume above 16; vvoz: those whose value 0 has a size above 100 in the training histogram
let vvo = llqq [w | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16]
let vvoz = llqq [w | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16, let rr = unit (sgl (llss [(w, ValInt 0)])), let bb = aatr `red` sgl w `mul` rr, size bb > 100]
-- xx maps each bucketed variable to a new variable with suffix "B"; variables in vvoz are bucketed after their zero-valued events are subtracted
let xx = Map.fromList $ map (\(v,ww) -> let VarStr s = v in (v, (VarStr (s ++ "B"), ww))) $ [(v, bucket 20 aa v) | v <- qqll (vvo `minus` vvoz)] ++ [(VarStr "SalePrice", bucket 20 aatr (VarStr "SalePrice"))] ++ [(v, bucket 20 aa' v) | v <- qqll vvoz, let rr = unit (sgl (llss [(v, ValInt 0)])), let bb = aa `red` sgl v `mul` rr, let aa' = trim (aa `red` sgl v `sub` bb)]
let aab = reframeb aa xx
let aatrb = reframeb aatr xx
let uub = sys aab `uunion` sys aatrb
let vvb = uvars uub `minus` sgl (VarStr "Id")
let vvbl = sgl (VarStr "SalePriceB")
let vvbk = vvb `minus` vvbl
let hhb = aahr uub aab `hrhrred` vvbk
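The bucketing step can be illustrated outside the Alignment library. The following is a minimal sketch, assuming that bucket 20 groups the sorted values of a variable into at most 20 groups of roughly equal frequency; the implementation of bucket is not shown here and may differ. The names bucketMap and bucketMapNonZero and the sample values are hypothetical. The second function mirrors the treatment of vvoz, where zero-valued events are set aside before bucketing; giving zero an index of its own is an arbitrary choice of the sketch,

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as Map

-- Hypothetical helper, not the repository's `bucket`: split the sorted
-- values into at most b contiguous chunks of near-equal size, then map each
-- distinct value to the lowest chunk index it appears in, so that equal
-- values always share a bucket.
bucketMap :: Int -> [Double] -> Map.Map Double Int
bucketMap b xs = Map.fromListWith min (zip sorted chunkIxs)
  where
    sorted = sort xs
    n = length sorted
    -- position i (0 <= i < n) falls in chunk (i * b) `div` n
    chunkIxs = [ (i * b) `div` n | i <- [0 .. n - 1] ]

-- Assumed variant for zero-dominated variables: bucket only the non-zero
-- values and, if zeros are present, give zero the separate index -1.
bucketMapNonZero :: Int -> [Double] -> Map.Map Double Int
bucketMapNonZero b xs
  | any (== 0) xs = Map.insert 0 (-1) nonZero
  | otherwise     = nonZero
  where
    nonZero = bucketMap b (filter (/= 0) xs)

main :: IO ()
main = do
  -- made-up values standing in for a mostly-zero area variable
  let areas = [0, 0, 0, 0, 120, 250, 250, 400, 900, 1500] :: [Double]
  print (Map.toList (bucketMap 2 areas))
  print (Map.toList (bucketMapNonZero 3 areas))
```

With plain bucketMap the four zeros fill most of the first bucket, whereas bucketMapNonZero spreads the remaining values over all three buckets, which is presumably why the engine handles vvoz separately.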
Then the parameters are defined,
let model = "AMES_model1"
let (wmax,lmax,xmax,omax,bmax,mmax,umax,pmax,fmax,mult,seed) = (2919, 8, 2919, 50, (50*5), 5, 2919, 1, 20, 10, 5)
Here the limit of the underlying volume, xmax, is set to the histogram size, 2919,
size aa
2919 % 1
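The form 2919 % 1 is how Haskell shows a Rational, which suggests that histogram sizes are held as rational counts. A minimal sketch, assuming the usual row counts of the Kaggle files (1460 in train.csv and 1459 in test.csv, figures not stated above) and using only Data.Ratio rather than the library's size function,

```haskell
import Data.Ratio (denominator, numerator)

-- Assumed row counts of train.csv and test.csv (not stated in the text).
trainRows, testRows :: Integer
trainRows = 1460
testRows = 1459

-- A histogram size modelled as a Rational, with a count of 1 per row,
-- as in the llaa calls above where every state is paired with 1.
combinedSize :: Rational
combinedSize = fromIntegral (trainRows + testRows)

main :: IO ()
main = do
  print combinedSize                 -- shown in the numerator % denominator form
  print (numerator combinedSize)     -- the integral count
  print (denominator combinedSize)   -- 1 for a whole number of events
```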
For the maximum-roll-by-derived-dimension decomper, increasing any of the parameters generally increases the summed alignment valency-density at the cost of computation time and space. In this case the parameters are chosen such that AMES_engine1 runs on an Ubuntu 16.04 Pentium CPU G2030 @ 3.00GHz using 1883 MB total memory in 6454 seconds (about 1 hour 48 minutes).
Then the decomper is run,
Just (uub',dfb') <- decomperIO uub vvbk hhb wmax lmax xmax omax bmax mmax umax pmax fmax mult seed
...
where
...
decomperIO uu vv hh wmax lmax xmax omax bmax mmax umax pmax fmax mult seed =
parametersSystemsHistoryRepasDecomperMaxRollByMExcludedSelfHighestFmaxIORepa
wmax lmax xmax omax bmax mmax umax pmax fmax mult seed uu vv hh
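The decomper takes eleven positional parameters, and the wrapper passes them in a different order from the data arguments, so a mis-ordered value would not be caught by the type checker. One way to make the settings self-describing is a record. This is only a sketch: the type DecomperParams, the value ames1 and the check bmaxIsProduct are hypothetical, and the Integer field type is an assumption because the library's parameter types are not shown here,

```haskell
-- Hypothetical record; the field names mirror the tuple in the engine and
-- the Integer type is assumed.
data DecomperParams = DecomperParams
  { wmax, lmax, xmax, omax, bmax, mmax, umax, pmax, fmax, mult, seed :: Integer }
  deriving (Show, Eq)

-- The values used for AMES_model1, copied from the tuple above.
ames1 :: DecomperParams
ames1 = DecomperParams
  { wmax = 2919, lmax = 8, xmax = 2919, omax = 50, bmax = 50 * 5
  , mmax = 5, umax = 2919, pmax = 1, fmax = 20, mult = 10, seed = 5 }

-- Observation only: the tuple writes bmax as 50*5, which equals omax * mmax
-- here. Whether that relationship is required is not stated in the text.
bmaxIsProduct :: DecomperParams -> Bool
bmaxIsProduct p = bmax p == omax p * mmax p

main :: IO ()
main = do
  print ames1
  print (bmaxIsProduct ames1)
```

A wrapper could then unpack the record fields in the order that the long-named library function expects, keeping that ordering in one place.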
Then the model is written to AMES_model1.json,
BL.writeFile (model ++ ".json") $ decompFudsPersistentsEncode $ decompFudsPersistent dfb'
Finally, the summed alignment and the summed alignment valency-density are calculated,
let (a,ad) = summation mult seed uub' dfb' hhb
printf "alignment: %.2f\n" $ a
printf "alignment density: %.2f\n" $ ad
...
where
...
summation = systemsDecompFudsHistoryRepasAlignmentContentShuffleSummation_u
The summed alignment and the summed alignment valency-density are,
alignment: 26138.99
alignment density: 11161.10