samthiriot / gosp.dpp
the direct probabilistic pairing method for generation of synthetic populations
License: GNU General Public License v2.0
Right now the MSE values for degrees, frequencies and nA, nB are not on the same scale,
so it does not make sense to sum them and compare them.
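One possible way to make such errors comparable is to divide each squared error by the squared magnitude of its target, so that every term becomes dimensionless before summing. The sketch below only illustrates that idea; `relative_mse` is a hypothetical helper, not a function of gosp.dpp, and the numbers are made up.

```r
# Hypothetical helper (not part of gosp.dpp): scale-free mean squared error.
# 'target' and 'estimate' are numeric vectors of the same length.
relative_mse <- function(target, estimate) {
  stopifnot(length(target) == length(estimate))
  # Dividing by the mean squared target makes the result dimensionless.
  scale <- mean(target^2)
  # Fall back to the plain MSE if the target is all zeros.
  if (scale == 0) return(mean((estimate - target)^2))
  mean((estimate - target)^2) / scale
}

# Illustrative values only (not taken from the package or from INSEE data)
err_sizes <- relative_mse(c(50000, 40000), c(50500, 39600))
err_freqs <- relative_mse(c(0.2, 0.3, 0.5), c(0.21, 0.29, 0.5))
total_error <- err_sizes + err_freqs
```

With this scaling, a 10% miss on a population size and a 10% miss on a frequency contribute the same amount to the total.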
test:
When working on a case with many empty cells in frequency, some parameters lead to the generation of very big populations.
To reproduce the case, using INSEE data:
library(data.table)
library(devtools)
load_all()
dwellings_raw <- read.csv(
file="~/projets/2017 parcimonious iterated picking/application_lille/FD_LOGEMTZB_2014.txt",
header=TRUE,
nrows=50000,
sep=";",
check.names=FALSE
#,
#col_types = cols(b=col_factor())
)
# INPER: number of persons in the household
# NBPI: number of rooms in the dwelling
# SURF: dwelling floor area
sample_dwellings <- gosp.dpp::create_sample(
data=dwellings_raw,
encoding = list(
# we provide no mapping
),
weight.colname="IPONDL"
)
# free some memory
remove(dwellings_raw)
# CATL: category
# 1 : Main residences
# 2 : Occasional dwellings
# 3 : Second homes
# 4 : Vacant dwellings
# Z : Outside ordinary housing
#
# only put a household there if vacant
pdi <- create_degree_probabilities_table(
data.frame(
'CATL=1'=c(0.0, 1.0),
'CATL=2'=c(1.0, 0.0),
'CATL=3'=c(1.0, 0.0),
'CATL=4'=c(1.0, 0.0),
'CATL=Z'=c(1.0, 0.0),
check.names=FALSE
)
)
#
households_raw <- read.csv(
file="~/projets/2017 parcimonious iterated picking/application_lille/FD_INDCVIZB_2014.txt",
header=TRUE,
nrows=10000,
sep=";",
check.names=FALSE
#,
#col_types = cols(b=col_factor())
)
sample_households <- gosp.dpp::create_sample(
data=households_raw,
encoding = list(
# we provide no mapping
),
weight.colname="IPONDI"
)
remove(households_raw)
pdj <- create_degree_probabilities_table(
data.frame(
'STOCD=00'=c(1.0, 0.000001),
'STOCD=10'=c(0.000001, 1.0),
'STOCD=21'=c(0.000001, 1.0),
'STOCD=22'=c(0.000001, 1.0),
'STOCD=23'=c(0.000001, 1.0),
'STOCD=30'=c(1.0, 0.000001),
'STOCD=ZZ'=c(1.0, 0.000001),
check.names=FALSE
),
norm=TRUE
)
# STOCD
# 00 : Ordinary dwelling, unoccupied
# 10 : Owner
# 21 : Tenant or subtenant of a dwelling rented unfurnished, non-HLM
# 22 : Tenant or subtenant of a dwelling rented unfurnished, HLM
# 23 : Tenant or subtenant of a dwelling rented furnished, or of a hotel room
# 30 : Housed free of charge
# ZZ : Outside ordinary housing
# TYPL
# Dwelling type
# 1 : House
# 2 : Apartment
# 3 : Residential home (logement-foyer)
# 4 : Hotel room
# 5 : Makeshift dwelling
# 6 : Independent room (with its own entrance)
# Z : Outside ordinary housing
# INPER: number of persons in the household
# SURF: dwelling floor area
# 6 5 3 4 7 1 2
pij <- create_matching_probabilities_table(
normalise(
data.frame(
"SURF=1"=c(1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.001, 0.001, 0.001, 0.001, 0.001, 1.0),
"SURF=2"=c(1.0, 1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.001, 0.001, 0.001, 0.001, 0.0),
"SURF=3"=c(0.8, 1.0, 1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.001, 0.001, 0.001, 0.0),
"SURF=4"=c(0.3, 0.8, 1.0, 1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.001, 0.001, 0.0),
"SURF=5"=c(0.3, 0.3, 0.8, 1.0, 1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.1, 0.0),
"SURF=6"=c(0.1, 0.3, 0.3, 0.8, 1.0, 1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.0),
"SURF=7"=c(0.01, 0.1, 0.3, 0.3, 0.8, 1.0, 1.0, 1.0, 0.7, 0.4, 0.4, 0.0),
row.names=c("INPER=1", "INPER=2", "INPER=3", "INPER=4", "INPER=5", "INPER=6", "INPER=7", "INPER=8", "INPER=9", "INPER=10", "INPER=11", "INPER=Z"),
check.names=FALSE
)
)
)
prepared <- matching.prepare(sample_dwellings, sample_households, pdi, pdj, pij)
solved <- matching.solve(prepared, nA=50000, nB=40000, nu.A=1, phi.A=0, delta.A=1, gamma=1, delta.B=1, phi.B=1, nu.B=1, verbose=TRUE)
solved$gen$hat.nB
[1] 7002050000
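Until the cause is found, a simple guard on the solved size could catch this kind of runaway result before generation is attempted. The sketch below is only an illustration: `size_ratio_ok` and its `max_ratio` threshold are hypothetical and not part of gosp.dpp.

```r
# Hypothetical guard (not part of gosp.dpp): is the solved population size
# within a factor 'max_ratio' of the requested size?
size_ratio_ok <- function(hat_n, n, max_ratio = 10) {
  stopifnot(n > 0, hat_n >= 0, max_ratio >= 1)
  ratio <- hat_n / n
  ratio <= max_ratio && ratio >= 1 / max_ratio
}

# The value reported above, against the requested nB=40000
size_ratio_ok(7002050000, 40000)  # FALSE
```

In the case above it would be called as `size_ratio_ok(solved$gen$hat.nB, 40000)`.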
it should sum the weights
Typically, if one good solution has hat.nA = nA and another has hat.nB = nB, we might split the error between nA and nB.
When testing on Debian Linux, R-release, an error is raised:
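As a rough sketch of what splitting the error could look like: assume the other constraints fix the ratio r = hat.nB / hat.nA, and choose the sizes that minimise the sum of squared relative errors on nA and nB. Both the fixed-ratio assumption and the `split_size_error` helper are hypothetical; this is not what the package currently does.

```r
# Hypothetical helper (not part of gosp.dpp).
# Assumption: the ratio r = hat.nB / hat.nA is imposed by the other constraints.
# Minimises (hat.nA/nA - 1)^2 + (hat.nB/nB - 1)^2 subject to hat.nB = r * hat.nA.
split_size_error <- function(nA, nB, r) {
  stopifnot(nA > 0, nB > 0, r > 0)
  # Closed form obtained by setting the derivative with respect to hat.nA to zero
  a <- (1 / nA + r / nB) / (1 / nA^2 + r^2 / nB^2)
  c(hat.nA = a, hat.nB = r * a)
}

# When r is compatible with the targets, both sizes are met exactly
split_size_error(50000, 40000, 0.8)

# When r = 1, both sizes land between the two targets
split_size_error(50000, 40000, 1)
```

Using relative rather than absolute errors keeps the larger population from dominating the compromise.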
* checking tests ...
Running ‘testthat.R’
ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
gamma = 0, delta.B = 1, phi.B = 1, nu.B = 1, verbose = FALSE) at testthat/test_basic1.R:389
2: resolve(sol, case, nA, nB, nu.A, phi.A, delta.A, nu.B, phi.B, delta.B, gamma, verbose = verbose)
3: resolve.missing.chain(sol.tmp, chain, case, nA, nB, nu.A, phi.A, delta.A, nu.B, phi.B,
delta.B, gamma, verbose = verbose)
══ testthat results ═══════════════════════════════════════════════════════════
OK: 632 SKIPPED: 2 FAILED: 5
1. Error: constraints: nA, phi.A, phi.B (@test_basic1.R#85)
2. Error: constraints: phi.A, delta.A (free on matching and B) (@test_basic1.R#167)
3. Error: constraints: phi.A, gamma (free on A and B) (@test_basic1.R#188)
4. Error: constraints: nothing (totally free - long chain) (@test_basic1.R#305)
5. Error: constraints: pdi with zero (p(di=0)=1.0) (@test_basic1.R#389)
Error: testthat unit tests failed
Execution halted
data(dwellings_households)
prepared <- matching.prepare(dwellings_households$sample.A, dwellings_households$sample.B, dwellings_households$pdi, dwellings_households$pdj, dwellings_households$pij)
solved <- matching.solve(prepared, nA=50000, nB=40000, nu.A=1, phi.A=1, delta.A=1, gamma=0, delta.B=1, phi.B=1, nu.B=1)
generated <- matching.generate(solved, dwellings_households$sample.A, dwellings_households$sample.B)
plot(generated, dwellings_households$sample.A$sample, dwellings_households$sample.B$sample)
... before generation