pemilu2024

TL;DR

Source data is taken from this xlsx file provided by https://data-pemilu.vercel.app/.

As of 15-Feb-2024 15:49 WIB, the exported parquet file is 85MB. It is therefore desirable to provide only a diff for each subsequent update cycle.

All of the files generated so far (manually, for now) are listed below:

| File Name | Size | Created At | Info |
| --- | --- | --- | --- |
| ./ppwp_tps.csv.bz | 41M | 12:55 PM Feb 15 | Initial scraping result, taken from the xlsx above |
| ./ppwp_tps.parquet | 85M | 05:28 PM Feb 15 | Initial csv -> parquet |
| ./ppwp_tps__001.parquet | 17M | 10:54 PM Feb 15 | Update 1: updated_at > 2024-02-14 20:24:41.538 |
| ./ppwp_tps__002.parquet | 2.9M | 09:49 AM Feb 16 | Update 2: updated_at > 2024-02-15 14:39:15.109 |
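
To work with the current state of the data, the initial file and the update files have to be combined. The following is a minimal DuckDB sketch, not part of the original workflow. It assumes that `kode` uniquely identifies a TPS, that the row with the latest `updated_at` is the current one, and that the column types in all files are compatible. The table name `ppwp_tps_latest` is hypothetical.

```sql
-- Read the initial export and all update files in one scan.
-- union_by_name matches columns by name, in case the column order differs between files.
create table ppwp_tps_latest as
select *
from read_parquet(
    ['ppwp_tps.parquet', 'ppwp_tps__001.parquet', 'ppwp_tps__002.parquet'],
    union_by_name = true
)
-- Keep only the most recently updated row for each TPS (assumption: kode is unique per TPS).
qualify row_number() over (partition by kode order by updated_at desc) = 1;
```

When a new update file is published, adding it to the list and re-running the statement should be enough under the same assumptions.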

Data from the new upstream servers is, for now, logged here.

NOTES

Parquet files were generated using DuckDB, more or less like so:

--- convert the xlsx file to csv
--- e.g. in this case the exported file is 'ppwp_tps.csv'

--- create table t0 schema
create table t0 (
    kode varchar primary key,
    provinsi_kode varchar,
    kabupaten_kota_kode varchar,
    kecamatan_kode varchar,
    kelurahan_desa_kode varchar,
    tps varchar,
    suara_paslon_1 integer,
    suara_paslon_2 integer,
    suara_paslon_3 integer,
    chasil_hal_1 varchar,
    chasil_hal_2 varchar,
    chasil_hal_3 varchar,
    suara_sah integer,
    suara_total integer,
    pemilih_dpt_j integer,
    pemilih_dpt_l integer,
    pemilih_dpt_p integer,
    pengguna_dpt_j integer,
    pengguna_dpt_l integer,
    pengguna_dpt_p integer,
    pengguna_dptb_j integer,
    pengguna_dptb_l integer,
    pengguna_dptb_p integer,
    suara_tidak_sah integer,
    pengguna_total_j integer,
    pengguna_total_l integer,
    pengguna_total_p integer,
    pengguna_non_dpt_j integer,
    pengguna_non_dpt_l integer,
    pengguna_non_dpt_p integer,
    psu varchar,
    ts timestamp,
    status_suara boolean,
    status_adm boolean,
    updated_at timestamp,
    created_at timestamp,
    url_page varchar,
    provinsi_nama varchar,
    kabupaten_kota_nama varchar,
    kecamatan_nama varchar,
    kelurahan_desa_nama varchar,
    url_api varchar
);

copy t0 from 'ppwp_tps.csv' with (header true, delimiter '\t', auto_detect false);

--- export to parquet
copy t0 to 'ppwp_tps.parquet' (format parquet);
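
A quick sanity check after the export can catch a truncated or mis-typed file. This is a hedged sketch, not a step the original workflow is known to include. It assumes the session still holds `t0` and that `ppwp_tps.parquet` is in the working directory.

```sql
-- The row count in the parquet file should equal the row count in t0.
select
    (select count(*) from t0) as rows_in_table,
    (select count(*) from 'ppwp_tps.parquet') as rows_in_parquet;

-- Show the column names and types as stored in the parquet file.
describe select * from 'ppwp_tps.parquet';
```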

Subsequent updates will be created as follows:

attach 'dbname=pemilu_2024 user=pantau password=ZubQ7IHFn1sTCP8C8rgw3T24QIiJktb8 host=vps.zakiego.com port=54325' as source_db (type postgres);

--- find the max `updated_at` from prev table
select max(updated_at) from t0;

--- create table t1 from querying source_db where updated_at > t0 max updated_at
create table t1 as select * from source_db.ppwp_tps where updated_at > (select max(updated_at) from t0);

--- export to parquet
copy t1 to 'ppwp_tps__001.parquet' (format parquet);
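
The next cycle can presumably follow the same pattern, with the cutoff moved forward to the latest `updated_at` already exported. The sketch below assumes `t0`, `t1` and the attached `source_db` are still available in the session. The table name `t2` is an assumption, and the output file name follows the `ppwp_tps__002.parquet` entry in the file list.

```sql
--- create table t2 from rows newer than anything exported so far
--- (the cutoff is the max updated_at across the base table and the previous update)
create table t2 as
select *
from source_db.ppwp_tps
where updated_at > (
    select max(updated_at)
    from (
        select updated_at from t0
        union all
        select updated_at from t1
    )
);

--- export to parquet
copy t2 to 'ppwp_tps__002.parquet' (format parquet);
```

Taking the maximum over both tables keeps the cutoff correct even if `t1` turns out to be empty.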

That's all for now. Contributions, both small and large, are welcome.


Pemilu 2024 Parquet File by Ridho is marked with CC0 1.0 Universal
