AMES - Model 1 induction

AMES_model1.json is induced by AMES_engine1.hs.

AMES_engine1 may be built and executed as follows (see README) -

    cd ../Alignment
    rm *.o *.hi

    cd ../AlignmentRepa
    rm *.o *.hi

    gcc -fPIC -c AlignmentForeign.c -o AlignmentForeign.o -O3

    cd ../AMES
    rm *.o *.hi

    ghc -i../Alignment -i../AlignmentRepa ../AlignmentRepa/AlignmentForeign.o AMES_engine1.hs -o AMES_engine1.exe -rtsopts -O2

    ./AMES_engine1.exe +RTS -s >AMES_engine1.log 2>&1 &

    tail -f AMES_engine1.log

The first section loads the sample,

    csvtr <- BL.readFile "train.csv"
    let vvcsvtr = either (\_ -> V.empty) id (Data.Csv.decode HasHeader csvtr :: Either String (V.Vector Train))
    let aatr = llaa [(llss [(VarStr s, fw rr) | (s,fw) <- trmap],1) | rr <- V.toList vvcsvtr]

    csvte <- BL.readFile "test.csv"
    let vvcsvte = either (\_ -> V.empty) id (Data.Csv.decode HasHeader csvte :: Either String (V.Vector Test))
    let aate = llaa [(llss [(VarStr s, fw rr) | (s,fw) <- temap],1) | rr <- V.toList vvcsvte]

    let uu = sys aatr `uunion` sys aate
    let vv = uvars uu `minus` sgl (VarStr "Id")
    let vvl = sgl (VarStr "SalePrice")
    let vvk = vv `minus` vvl

    let aa = (aatr `red` vvk) `add` (aate `red` vvk)

    let vvo = llqq [w | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16]

    let vvoz = llqq [w | w <- qqll vv, isOrd uu w, let u = vol uu (sgl w), u > 16, let rr = unit (sgl (llss [(w, ValInt 0)])), let bb = aatr `red` sgl w `mul` rr, size bb > 100]

    let xx = Map.fromList $ map (\(v,ww) -> let VarStr s = v in (v, (VarStr (s ++ "B"), ww))) $ [(v, bucket 20 aa v) | v <- qqll (vvo `minus` vvoz)] ++ [(VarStr "SalePrice", bucket 20 aatr (VarStr "SalePrice"))] ++ [(v, bucket 20 aa' v) | v <- qqll vvoz, let rr = unit (sgl (llss [(v, ValInt 0)])), let bb = aa `red` sgl v `mul` rr, let aa' = trim (aa `red` sgl v `sub` bb)]

    let aab = reframeb aa xx
    let aatrb = reframeb aatr xx

    let uub = sys aab `uunion` sys aatrb
    let vvb = uvars uub `minus` sgl (VarStr "Id")
    let vvbl = sgl (VarStr "SalePriceB")
    let vvbk = vvb `minus` vvbl

    let hhb = aahr uub aab `hrhrred` vvbk
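
The bucketing step above can be pictured with a small standalone sketch. It does not use the Alignment library: `boundaries`, `bucketOf` and `bucketNonZero` are hypothetical helpers, and the sketch assumes that `bucket 20` performs something like equal-frequency binning, which the source does not state. The zero handling mirrors the apparent intent of `vvoz`, where ordered variables with more than 100 zero values in the training sample have the zeros set aside before the remaining values are bucketed.

```haskell
import Data.List (sort)

-- Hypothetical helper (not part of the Alignment library): choose at most n
-- upper boundaries so that each bucket holds roughly the same number of
-- sample values. This is an assumed reading of what `bucket 20` does.
boundaries :: Int -> [Double] -> [Double]
boundaries n xs
  | n <= 0 || null xs = []
  | otherwise = dedup [sorted !! idx k | k <- [1 .. n]]
  where
    sorted = sort xs
    len = length sorted
    -- index of the last element of the k-th of n roughly equal chunks
    idx k = max 0 ((k * len) `div` n - 1)
    -- drop adjacent repeats, which occur when there are fewer values than buckets
    dedup (a : b : rest)
      | a == b = dedup (b : rest)
      | otherwise = a : dedup (b : rest)
    dedup short = short

-- Map a value to the first boundary that is not below it; values above
-- every boundary fall into the last bucket.
bucketOf :: [Double] -> Double -> Double
bucketOf bs x = case dropWhile (< x) bs of
  (b : _) -> b
  [] -> if null bs then x else last bs

-- Assumed treatment of zero-heavy variables: 0 stays as its own value and
-- only non-zero values are mapped to the boundaries.
bucketNonZero :: [Double] -> Double -> Double
bucketNonZero bs x
  | x == 0 = 0
  | otherwise = bucketOf bs x

main :: IO ()
main = do
  let xs = [0, 0, 0, 0, 0, 0, 10, 20, 30, 40] :: [Double]
      -- boundaries are computed from the non-zero values only
      bs = boundaries 2 (filter (/= 0) xs)
  print bs
  print (map (bucketNonZero bs) xs)
```

With two buckets, the non-zero values 10, 20, 30 and 40 yield the boundaries 20 and 40, while the zeros keep their own value. Setting the zeros aside first prevents a large spike at zero from absorbing most of the buckets.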

Then the parameters are defined,

    let model = "AMES_model1"
    let (wmax,lmax,xmax,omax,bmax,mmax,umax,pmax,fmax,mult,seed) = (2919, 8, 2919, 50, (50*5), 5, 2919, 1, 20, 10, 5)

Here the limit of the underlying volume, xmax, is set to the histogram size, 2919,

    size aa
    2919 % 1

For the maximum-roll-by-derived-dimension decomper, increasing any of the parameters generally increases the summed alignment valency-density at the cost of computation time and space. In this case the parameters are chosen such that AMES_engine1 runs on an Ubuntu 16.04 machine with a Pentium CPU G2030 @ 3.00GHz, using 1883 MB total memory, in 6454 seconds.
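
As an aid to reading the parameter tuple, the following sketch collects the same values in a record. The `Params` type and `amesParams` function are hypothetical and do not appear in AMES_engine1.hs. Two assumptions are made: that `wmax` and `umax`, which carry the same value as `xmax` in the source, are also meant to follow the sample size, and that the `50*5` in the source stands for `omax * mmax`.

```haskell
-- Hypothetical record; the field names mirror the tuple in AMES_engine1.hs.
data Params = Params
  { wmax, lmax, xmax, omax, bmax, mmax, umax, pmax, fmax, mult, seed :: Integer }
  deriving (Show, Eq)

-- Build the parameters from the sample size z.
-- Assumption: wmax and umax track z in the same way as xmax.
-- Assumption: bmax is the product of omax and mmax (written 50*5 in the source).
amesParams :: Integer -> Params
amesParams z =
  let o = 50
      m = 5
   in Params
        { wmax = z
        , lmax = 8
        , xmax = z
        , omax = o
        , bmax = o * m
        , mmax = m
        , umax = z
        , pmax = 1
        , fmax = 20
        , mult = 10
        , seed = 5
        }

main :: IO ()
main = print (amesParams 2919)
```

Named fields make it harder to transpose two of the eleven positional arguments when they are passed on to the decomper.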

Then the decomper is run,

    Just (uub',dfb') <- decomperIO uub vvbk hhb wmax lmax xmax omax bmax mmax umax pmax fmax mult seed

where

    decomperIO uu vv hh wmax lmax xmax omax bmax mmax umax pmax fmax mult seed =
        wmax lmax xmax omax bmax mmax umax pmax fmax mult seed uu vv hh

Then the model is written to AMES_model1.json,

    BL.writeFile (model ++ ".json") $ decompFudsPersistentsEncode $ decompFudsPersistent dfb'

Finally, the summed alignment and the summed alignment valency-density are calculated,

    let (a,ad) = summation mult seed uub' dfb' hhb
    printf "alignment: %.2f\n" $ a
    printf "alignment density: %.2f\n" $ ad

where

    summation = systemsDecompFudsHistoryRepasAlignmentContentShuffleSummation_u

The summed alignment and the summed alignment valency-density are,

    alignment: 26138.99
    alignment density: 11161.10