AMES - Model 1 induction
AMES_model1.json is induced by AMES_engine1.hs.
AMES_engine1 may be built as described in the README. Then run as follows -
stack exec AMES_engine1.exe +RTS -s >AMES_engine1.log 2>&1 &
tail -f AMES_engine1.log
The first section loads the sample,
csvtr <- BL.readFile "train.csv"
let vvcsvtr = either (\_ -> V.empty) id (Data.Csv.decode HasHeader csvtr :: Either String (V.Vector Train))
let aatr = llaa [(llss [(VarStr s, fw rr) | (s,fw) <- trmap],1) | rr <- V.toList vvcsvtr]
csvte <- BL.readFile "test.csv"
let vvcsvte = either (\_ -> V.empty) id (Data.Csv.decode HasHeader csvte :: Either String (V.Vector Test))
let aate = llaa [(llss [(VarStr s, fw rr) | (s,fw) <- temap],1) | rr <- V.toList vvcsvte]
let uu = sys aatr `uunion` sys aate
let vv = uvars uu `minus` sgl (VarStr "Id")
let vvl = sgl (VarStr "SalePrice")
let vvk = vv `minus` vvl
let aa = (aatr `red` vvk) `add` (aate `red` vvk)
let vvo = llqq [w | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16]
let vvoz = llqq [w | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16, let rr = unit (sgl (llss [(w, ValInt 0)])), let bb = aatr `red` sgl w `mul` rr, size bb > 100]
let xx = Map.fromList $ map (\(v,ww) -> let VarStr s = v in (v, (VarStr (s ++ "B"), ww))) $ [(v, bucket 20 aa v) | v <- qqll (vvo `minus` vvoz)] ++ [(VarStr "SalePrice", bucket 20 aatr (VarStr "SalePrice"))] ++ [(v, bucket 20 aa' v) | v <- qqll vvoz, let rr = unit (sgl (llss [(v, ValInt 0)])), let bb = aa `red` sgl v `mul` rr, let aa' = trim (aa `red` sgl v `sub` bb)]
let aab = reframeb aa xx
let aatrb = reframeb aatr xx
let uub = sys aab `uunion` sys aatrb
let vvb = uvars uub `minus` sgl (VarStr "Id")
let vvbl = sgl (VarStr "SalePriceB")
let vvbk = vvb `minus` vvbl
let hhb = aahr uub aab `hrhrred` vvbk
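The treatment of the variables in vvoz can be illustrated with a small standalone sketch. The repository's bucket function is not shown here, so the sketch assumes equal-frequency bucketing, which may differ from the actual implementation; the names boundaries and boundariesNonZero are hypothetical. It shows only why zero values might be subtracted before the boundaries are computed: a large mass of zeros would otherwise occupy the lower buckets.

```haskell
import Data.List (sort)

-- Hypothetical stand-in for the repository's `bucket`.
-- ASSUMPTION: equal-frequency bucketing; the real function may differ.
-- Returns the distinct upper boundaries of at most n buckets.
boundaries :: Int -> [Double] -> [Double]
boundaries n xs
  | n <= 0 || null xs = []
  | otherwise = dedup [sorted !! idx k | k <- [1 .. n]]
  where
    sorted = sort xs
    len = length sorted
    -- index of the last element of the k-th bucket, clamped to the list
    idx k = max 0 (min (len - 1) ((k * len) `div` n - 1))
    -- drop adjacent duplicate boundaries
    dedup (a : b : rest)
      | a == b = dedup (b : rest)
      | otherwise = a : dedup (b : rest)
    dedup ys = ys

-- Boundaries computed from the non-zero values only, analogous to
-- subtracting the zero states before bucketing the vvoz variables.
boundariesNonZero :: Int -> [Double] -> [Double]
boundariesNonZero n = boundaries n . filter (/= 0)

main :: IO ()
main = do
  let xs = [0, 0, 0, 0, 0, 0, 10, 20, 30, 40]
  print (boundaries 2 xs)        -- [0.0,40.0]
  print (boundariesNonZero 2 xs) -- [20.0,40.0]
```

With the zeros included, the first of the two buckets is taken up entirely by zero; with the zeros excluded, both boundaries fall among the non-zero values.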
Then the parameters are defined,
let model = "AMES_model1"
let (wmax,lmax,xmax,omax,bmax,mmax,umax,pmax,fmax,mult,seed) = (2919, 8, 2919, 50, (50*5), 5, 2919, 1, 20, 10, 5)
Here the limit of the underlying volume, xmax, is set to the histogram size, 2919,
size aa
2919 % 1
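The output 2919 % 1 is how Haskell shows a Rational value, which suggests that histogram counts are rationals. The following standalone sketch imitates this with a hypothetical, much simplified Histogram type; it is not the repository's representation. The record counts 1460 and 1459 are an assumption based on the standard Kaggle train.csv and test.csv files, chosen because they sum to 2919.

```haskell
-- Hypothetical simplification: a histogram reduced to its list of
-- rational counts, one entry per record.
newtype Histogram = Histogram [Rational]

-- The size of a histogram is the sum of its counts.
size :: Histogram -> Rational
size (Histogram cs) = sum cs

-- Adding histograms here simply pools the counts, which is enough
-- to illustrate how the sizes combine.
add :: Histogram -> Histogram -> Histogram
add (Histogram a) (Histogram b) = Histogram (a ++ b)

-- A histogram of n records, each with a count of one.
unitRecords :: Int -> Histogram
unitRecords n = Histogram (replicate n 1)

main :: IO ()
main = do
  -- ASSUMPTION: 1460 training records and 1459 test records
  let aa = unitRecords 1460 `add` unitRecords 1459
  print (size aa)                       -- 2919 % 1
  print (truncate (size aa) :: Integer) -- 2919
```

The second print shows one way to obtain an integral value, such as the one used for xmax, from the rational size.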
For the maximum-roll-by-derived-dimension decomper, increasing any of the parameters generally increases the summed alignment valency-density at the cost of computation time and space. In this case the parameters are chosen such that AMES_engine1 runs on Ubuntu 16.04 with a Pentium CPU G2030 @ 3.00GHz, using 1883 MB of total memory, in 6454 seconds.
Then the decomper is run,
Just (uub',dfb') <- decomperIO uub vvbk hhb wmax lmax xmax omax bmax mmax umax pmax fmax mult seed
decomperIO uu vv hh wmax lmax xmax omax bmax mmax umax pmax fmax mult seed =
wmax lmax xmax omax bmax mmax umax pmax fmax mult seed uu vv hh
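The result of decomperIO is matched against Just in a do block. In IO a failed match of this kind calls fail, which throws an IOError with a generic pattern match message. A minimal sketch of a more explicit alternative follows; decomperStub and requireResult are hypothetical names introduced for illustration and are not part of the repository.

```haskell
import System.Exit (die)

-- Hypothetical stand-in for decomperIO: returns Nothing when no
-- model could be induced. The real function takes many more arguments.
decomperStub :: Bool -> IO (Maybe (String, String))
decomperStub ok =
  return (if ok then Just ("system", "decomposition") else Nothing)

-- Turn a missing result into a descriptive error message.
requireResult :: String -> Maybe a -> Either String a
requireResult msg = maybe (Left msg) Right

main :: IO ()
main = do
  r <- decomperStub True
  case requireResult "decomper returned no model" r of
    -- exit with the message on stderr instead of a pattern match failure
    Left err -> die err
    Right (uu, df) -> putStrLn ("got " ++ uu ++ " and " ++ df)
```

For a batch run logged to a file, as above, an explicit message of this kind may be easier to find in the log than the default pattern match error.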
Then the model is written to AMES_model1.json,
BL.writeFile (model ++ ".json") $ decompFudsPersistentsEncode $ decompFudsPersistent dfb'
Finally, the summed alignment and the summed alignment valency-density are calculated,
let (a,ad) = summation mult seed uub' dfb' hhb
printf "alignment: %.2f\n" $ a
printf "alignment density: %.2f\n" $ ad
summation = systemsDecompFudsHistoryRepasAlignmentContentShuffleSummation_u
The summed alignment and the summed alignment valency-density are,
alignment: 26138.99
alignment density: 11161.10