AMES - Model 1 induction

AMES - House Prices/Model 1 induction

AMES_model1.json is induced by AMES_engine1.hs.

AMES_engine1 may be built as described in the README. Then run it in the background, redirecting its output to a log, and follow the log as follows,

    stack exec AMES_engine1.exe +RTS -s >AMES_engine1.log 2>&1 &

    tail -f AMES_engine1.log

The first section loads the sample,

    csvtr <- BL.readFile "train.csv"
    let vvcsvtr = either (\_ -> V.empty) id (Data.Csv.decode HasHeader csvtr :: Either String (V.Vector Train))
    let aatr = llaa [(llss [(VarStr s, fw rr) | (s,fw) <- trmap],1) | rr <- V.toList vvcsvtr]

    csvte <- BL.readFile "test.csv"
    let vvcsvte = either (\_ -> V.empty) id (Data.Csv.decode HasHeader csvte :: Either String (V.Vector Test))
    let aate = llaa [(llss [(VarStr s, fw rr) | (s,fw) <- temap],1) | rr <- V.toList vvcsvte]

    let uu = sys aatr `uunion` sys aate
    let vv = uvars uu `minus` sgl (VarStr "Id")
    let vvl = sgl (VarStr "SalePrice")
    let vvk = vv `minus` vvl

    let aa = (aatr `red` vvk) `add` (aate `red` vvk)

    let vvo = llqq [w | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16]

    let vvoz = llqq [w | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16, let rr = unit (sgl (llss [(w, ValInt 0)])), let bb = aatr `red` sgl w `mul` rr, size bb > 100]

    let xx = Map.fromList $ map (\(v,ww) -> let VarStr s = v in (v, (VarStr (s ++ "B"), ww))) $ [(v, bucket 20 aa v) | v <- qqll (vvo `minus` vvoz)] ++ [(VarStr "SalePrice", bucket 20 aatr (VarStr "SalePrice"))] ++ [(v, bucket 20 aa' v) | v <- qqll vvoz, let rr = unit (sgl (llss [(v, ValInt 0)])), let bb = aa `red` sgl v `mul` rr, let aa' = trim (aa `red` sgl v `sub` bb)]

    let aab = reframeb aa xx
    let aatrb = reframeb aatr xx

    let uub = sys aab `uunion` sys aatrb
    let vvb = uvars uub `minus` sgl (VarStr "Id")
    let vvbl = sgl (VarStr "SalePriceB")
    let vvbk = vvb `minus` vvbl

    let hhb = aahr uub aab `hrhrred` vvbk
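
The `bucket 20` calls above are applied to the ordered variables whose volume exceeds 16, and the variables in vvoz (those with more than 100 training records equal to zero) are bucketed with the zero values set aside first. The definition of `bucket` is not shown here, so the following is only a minimal sketch that assumes equal-frequency bucketing over plain integers. The names `chunks`, `bucketMap` and `bucketMapExcludingZero` are hypothetical and are not part of the AMES code.

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as Map

-- Split a list into consecutive chunks of n elements (the last may be shorter).
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs = let (h, t) = splitAt n xs in h : chunks n t

-- Hypothetical equal-frequency bucketing (an assumption, not the AMES `bucket`):
-- map every observed value to the largest value of its chunk, using at most
-- b chunks. A value that straddles two chunks is assigned to the lower one.
bucketMap :: Int -> [Int] -> Map.Map Int Int
bucketMap b xs =
  Map.fromListWith min [(x, maximum c) | c <- chunks n sorted, x <- c]
  where
    sorted = sort xs
    k = max 1 b
    n = max 1 ((length sorted + k - 1) `div` k) -- chunk size, rounded up

-- Assumed treatment of zero-inflated variables: bucket only the non-zero
-- values and keep zero as a bucket of its own when it occurs.
bucketMapExcludingZero :: Int -> [Int] -> Map.Map Int Int
bucketMapExcludingZero b xs
  | 0 `elem` xs = Map.insert 0 0 nonZero
  | otherwise = nonZero
  where
    nonZero = bucketMap b (filter (/= 0) xs)
```

Under these assumptions, a variable dominated by zeros keeps its non-zero values spread over the available buckets instead of having most buckets collapse onto zero.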

Then the parameters are defined,

    let model = "AMES_model1"
    let (wmax,lmax,xmax,omax,bmax,mmax,umax,pmax,fmax,mult,seed) = (2919, 8, 2919, 50, (50*5), 5, 2919, 1, 20, 10, 5)

Here the limit of the underlying volume, xmax, is set to the histogram size, 2919,

    size aa
    2919 % 1
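
Eleven positional values are easy to transpose when they are edited by hand. As a hedged illustration only, the same values could be carried in a record so that each one is named where it is set. The record `DecomperParams`, the value `amesModel1Params` and the function `toTuple` are hypothetical, and the field types are an assumption; only the names and values are taken from the tuple above.

```haskell
-- Hypothetical record naming the positional parameters of the tuple above.
-- Field types are assumed; the source does not state them.
data DecomperParams = DecomperParams
  { wmax :: Integer
  , lmax :: Integer
  , xmax :: Integer -- limit of the underlying volume (set to the histogram size)
  , omax :: Integer
  , bmax :: Integer
  , mmax :: Integer
  , umax :: Integer
  , pmax :: Integer
  , fmax :: Integer
  , mult :: Integer
  , seed :: Integer
  } deriving (Show, Eq)

-- The values used for AMES_model1, copied from the tuple in the source.
amesModel1Params :: DecomperParams
amesModel1Params = DecomperParams
  { wmax = 2919
  , lmax = 8
  , xmax = 2919
  , omax = 50
  , bmax = 50 * 5
  , mmax = 5
  , umax = 2919
  , pmax = 1
  , fmax = 20
  , mult = 10
  , seed = 5
  }

-- Flatten back to the positional order used by the tuple in the source.
toTuple :: DecomperParams
        -> (Integer, Integer, Integer, Integer, Integer, Integer,
            Integer, Integer, Integer, Integer, Integer)
toTuple p =
  ( wmax p, lmax p, xmax p, omax p, bmax p, mmax p
  , umax p, pmax p, fmax p, mult p, seed p )
```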

In general, increasing any of the parameters of the maximum-roll-by-derived-dimension decomper increases the summed alignment valency-density at the cost of computation time and space. In this case the parameters are chosen such that AMES_engine1 runs on an Ubuntu 16.04 Pentium CPU G2030 @ 3.00GHz using 1883 MB total memory in 6454 seconds.

Then the decomper is run,

    Just (uub',dfb') <- decomperIO uub vvbk hhb wmax lmax xmax omax bmax mmax umax pmax fmax mult seed
...
  where 
...
    decomperIO uu vv hh wmax lmax xmax omax bmax mmax umax pmax fmax mult seed =
      parametersSystemsHistoryRepasDecomperMaxRollByMExcludedSelfHighestFmaxIORepa 
        wmax lmax xmax omax bmax mmax umax pmax fmax mult seed uu vv hh
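
The binding `Just (uub',dfb') <- decomperIO ...` relies on a pattern match inside the do block, so a `Nothing` result would surface only as a generic pattern-match failure after a long run. A minimal sketch of a more explicit alternative follows. The helper `orDie` is hypothetical and is not part of the AMES code; it uses only `die` from `System.Exit` in base.

```haskell
import System.Exit (die)

-- Hypothetical helper: run an action that may yield Nothing and, in that
-- case, print the message to stderr and exit with a failure code.
orDie :: String -> IO (Maybe a) -> IO a
orDie msg act = act >>= maybe (die msg) pure

-- Possible use in the engine (a sketch, assuming decomperIO as defined above):
--   (uub',dfb') <- orDie "decomperIO returned Nothing" $
--     decomperIO uub vvbk hhb wmax lmax xmax omax bmax mmax umax pmax fmax mult seed
```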

Then the model is written to AMES_model1.json,

    BL.writeFile (model ++ ".json") $ decompFudsPersistentsEncode $ decompFudsPersistent dfb'

Finally, the summed alignment and the summed alignment valency-density are calculated,

    let (a,ad) = summation mult seed uub' dfb' hhb
    printf "alignment: %.2f\n" $ a
    printf "alignment density: %.2f\n" $ ad
...
  where 
...
    summation = systemsDecompFudsHistoryRepasAlignmentContentShuffleSummation_u

The summed alignment and the summed alignment valency-density are,

    alignment: 26138.99
    alignment density: 11161.10
