samthiriot / gosp.dpp

The direct probabilistic pairing method for generating synthetic populations

License: GNU General Public License v2.0

R 100.00%
synthetic-population-library synthetic-population r networks network-generator rpackage

gosp.dpp's Issues

trying to generate billions of individuals when some frequencies are 0

When the frequency tables contain many empty (zero) cells, some parameter combinations lead the solver to propose extremely large populations, up to billions of individuals.
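Before running the solver, it can help to see which cells of a table are empty. The sketch below is a minimal illustration in base R only. `find_zero_cells` is a hypothetical helper, not part of gosp.dpp, and the toy table uses made-up values. It assumes the table has both row and column names.

```r
# Hypothetical helper (not part of gosp.dpp): list the empty cells of a
# probability or frequency table given as a data.frame or matrix.
# Assumes the table carries both row names and column names.
find_zero_cells <- function(tab) {
  m <- as.matrix(tab)
  idx <- which(m == 0, arr.ind = TRUE)  # one line per zero cell
  data.frame(
    row = rownames(m)[idx[, "row"]],
    col = colnames(m)[idx[, "col"]],
    stringsAsFactors = FALSE
  )
}

# Toy table with made-up values, for illustration only
toy <- data.frame(
  "SURF=1" = c(0.5, 0.0),
  "SURF=2" = c(0.0, 0.5),
  row.names = c("INPER=1", "INPER=2"),
  check.names = FALSE
)
zeros <- find_zero_cells(toy)
print(zeros)
```

Applied to the tables passed to `matching.prepare`, a listing like this would show how sparse the inputs are before any population is generated.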

To reproduce the case, using INSEE data:

library(data.table)
library(devtools)
load_all()

dwellings_raw <- read.csv(
		file="~/projets/2017 parcimonious iterated picking/application_lille/FD_LOGEMTZB_2014.txt", 
		header=TRUE, 
		nrows=50000, 
		sep=";",
		check.names=FALSE
		#,
		#col_types = cols(b=col_factor())
		)

# INPER: number of persons in the household
# NBPI: number of rooms in the dwelling
# SURF: floor area of the dwelling

sample_dwellings <- gosp.dpp::create_sample(
                data=dwellings_raw,
                encoding = list(
                        # we provide no mapping
                       ),
                weight.colname="IPONDL"
                )

# free some memory
remove(dwellings_raw)


# CATL: dwelling category
# 	1 : Main residences
# 	2 : Occasional dwellings
# 	3 : Second homes
# 	4 : Vacant dwellings
# 	Z : Outside ordinary housing
#
# only place a household there if vacant
pdi <- create_degree_probabilities_table(
                data.frame(
                    'CATL=1'=c(0.0, 1.0),
                    'CATL=2'=c(1.0, 0.0),
                    'CATL=3'=c(1.0, 0.0),
                    'CATL=4'=c(1.0, 0.0),
                    'CATL=Z'=c(1.0, 0.0),
                    check.names=FALSE
                    )
                )


#
households_raw <- read.csv(
		file="~/projets/2017 parcimonious iterated picking/application_lille/FD_INDCVIZB_2014.txt", 
		header=TRUE, 
		nrows=10000, 
		sep=";",
		check.names=FALSE
		#,
		#col_types = cols(b=col_factor())
		)
sample_households <- gosp.dpp::create_sample(
                data=households_raw,
                encoding = list(
                        # we provide no mapping
                       ),
                weight.colname="IPONDI"
                )
remove(households_raw)


pdj <- create_degree_probabilities_table(
                data.frame(
                    'STOCD=00'=c(1.0, 0.000001),
                    'STOCD=10'=c(0.000001, 1.0),
                    'STOCD=21'=c(0.000001, 1.0),
                    'STOCD=22'=c(0.000001, 1.0),
                    'STOCD=23'=c(0.000001, 1.0),
                    'STOCD=30'=c(1.0, 0.000001),
                    'STOCD=ZZ'=c(1.0, 0.000001),
                    check.names=FALSE
                    ),
                norm=TRUE
                )


# STOCD: occupancy status
	# 00 : Unoccupied ordinary dwelling
	# 10 : Owner
	# 21 : Tenant or subtenant of an unfurnished rented dwelling, non-HLM
	# 22 : Tenant or subtenant of an unfurnished rented dwelling, HLM
	# 23 : Tenant or subtenant of a furnished rented dwelling or a hotel room
	# 30 : Housed free of charge
	# ZZ : Outside ordinary housing

# TYPL
	# Type of dwelling
	# 1 : House
	# 2 : Apartment
	# 3 : Residential home (logement-foyer)
	# 4 : Hotel room
	# 5 : Makeshift dwelling
	# 6 : Independent room (with its own entrance)
	# Z : Outside ordinary housing

# INPER: number of persons in the household


# SURF: floor area of the dwelling
	# 6 5 3 4 7 1 2

pij <- create_matching_probabilities_table(
		normalise(
			data.frame(
				"SURF=1"=c(1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.001, 0.001, 0.001, 0.001, 0.001, 1.0), 
				"SURF=2"=c(1.0, 1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.001, 0.001, 0.001, 0.001, 0.0), 
				"SURF=3"=c(0.8, 1.0, 1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.001, 0.001, 0.001, 0.0), 
				"SURF=4"=c(0.3, 0.8, 1.0, 1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.001, 0.001, 0.0), 
				"SURF=5"=c(0.3, 0.3, 0.8, 1.0, 1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.1, 0.0), 
				"SURF=6"=c(0.1, 0.3, 0.3, 0.8, 1.0, 1.0, 1.0, 0.7, 0.4, 0.1, 0.1, 0.0), 
				"SURF=7"=c(0.01,  0.1, 0.3, 0.3, 0.8, 1.0, 1.0, 1.0, 0.7, 0.4, 0.4, 0.0), 
		        row.names=c("INPER=1", "INPER=2", "INPER=3",  "INPER=4",  "INPER=5",  "INPER=6",  "INPER=7",  "INPER=8",  "INPER=9",  "INPER=10",  "INPER=11", "INPER=Z"), 
		        check.names=FALSE
		        )
			)
		)

prepared <- matching.prepare(sample_dwellings, sample_households, pdi, pdj, pij) 

solved <- matching.solve(prepared, nA=50000, nB=40000, nu.A=1, phi.A=0, delta.A=1, gamma=1, delta.B=1, phi.B=1, nu.B=1, verbose=TRUE)

solved$gen$hat.nB
[1] 7002050000
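Until the cause is understood, a guard between solving and generating would catch such a result before any memory is allocated. This is a minimal sketch: `check_population_size` and its `max_ratio` threshold are hypothetical and not part of gosp.dpp. The two numbers are the requested `nB` and the reported `solved$gen$hat.nB` from the run above.

```r
# Hypothetical guard (not part of gosp.dpp): stop if the size proposed by the
# solver exceeds the requested size by more than max_ratio.
check_population_size <- function(proposed, requested, max_ratio = 10) {
  ratio <- proposed / requested
  if (!is.finite(ratio) || ratio > max_ratio) {
    stop(sprintf("proposed size %.0f is %.1f times the requested size %.0f",
                 proposed, ratio, requested))
  }
  invisible(ratio)
}

# Values from the report above: nB = 40000 requested, 7002050000 proposed
result <- tryCatch(
  check_population_size(7002050000, 40000),
  error = function(e) conditionMessage(e)
)
print(result)
```

In a real session the call would presumably be `check_population_size(solved$gen$hat.nB, 40000)`, placed before any call to `matching.generate`.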

solving fails on some systems with a "subscript error"

When running the tests on Debian Linux with R-release, the following error is raised:

* checking tests ...
  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
         gamma = 0, delta.B = 1, phi.B = 1, nu.B = 1, verbose = FALSE) at testthat/test_basic1.R:389
  2: resolve(sol, case, nA, nB, nu.A, phi.A, delta.A, nu.B, phi.B, delta.B, gamma, verbose = verbose)
  3: resolve.missing.chain(sol.tmp, chain, case, nA, nB, nu.A, phi.A, delta.A, nu.B, phi.B, 
         delta.B, gamma, verbose = verbose)
  
  ══ testthat results  ═══════════════════════════════════════════════════════════
  OK: 632 SKIPPED: 2 FAILED: 5
  1. Error: constraints: nA, phi.A, phi.B (@test_basic1.R#85) 
  2. Error: constraints: phi.A, delta.A (free on matching and B) (@test_basic1.R#167) 
  3. Error: constraints: phi.A, gamma (free on A and B) (@test_basic1.R#188) 
  4. Error: constraints: nothing (totally free - long chain) (@test_basic1.R#305) 
  5. Error: constraints: pdi with zero (p(di=0)=1.0) (@test_basic1.R#389) 
  
  Error: testthat unit tests failed
  Execution halted

investigate why pij is not respected in case2

data(dwellings_households)
prepared <- matching.prepare(dwellings_households$sample.A, dwellings_households$sample.B, dwellings_households$pdi, dwellings_households$pdj, dwellings_households$pij)
solved <- matching.solve(prepared, nA=50000, nB=40000, nu.A=1, phi.A=1, delta.A=1, gamma=0, delta.B=1, phi.B=1, nu.B=1)
generated <- matching.generate(solved, dwellings_households$sample.A, dwellings_households$sample.B)
plot(generated, dwellings_households$sample.A$sample, dwellings_households$sample.B$sample)
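One way to quantify the mismatch, besides the plot, is to tabulate the generated pairs and subtract the expected pairing probabilities. The issue does not show the structure of `generated`, so this sketch works on plain vectors of paired attribute values. `compare_to_pij` is a hypothetical helper, and the toy `pij_toy` assumes joint probabilities that sum to 1.

```r
# Hypothetical comparison (not part of gosp.dpp). attr_a and attr_b hold, for
# each generated link, the attribute value on side A and on side B.
compare_to_pij <- function(attr_a, attr_b, pij) {
  observed <- table(factor(attr_b, levels = rownames(pij)),
                    factor(attr_a, levels = colnames(pij)))
  observed <- unclass(observed) / sum(observed)  # counts to frequencies
  observed - as.matrix(pij)  # positive values: over-represented pairings
}

# Toy expected joint probabilities (made-up values summing to 1)
pij_toy <- matrix(c(0.4, 0.1, 0.1, 0.4), nrow = 2,
                  dimnames = list(c("INPER=1", "INPER=2"),
                                  c("SURF=1", "SURF=2")))

# Toy pairs: 10 links between dwellings (SURF) and households (INPER)
surf  <- rep(c("SURF=1", "SURF=2"), times = c(5, 5))
inper <- c(rep("INPER=1", 4), "INPER=2", "INPER=1", rep("INPER=2", 4))

gap <- compare_to_pij(surf, inper, pij_toy)
print(round(gap, 3))
```

Cells where the gap is far from zero are the pairings in which the generated population departs from `pij`.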
