dathere / datapusher-plus

A standalone web service that pushes data into the CKAN Datastore fast & reliably. It pushes real good!

License: GNU Affero General Public License v3.0

Python 97.57% Dockerfile 1.93% Shell 0.50%
ckan datastore open-data


DataPusher+

DataPusher+ is a fork of Datapusher that combines the speed and robustness of ckanext-xloader with the data type guessing of Datapusher.

Datapusher+ is built using CKAN Service Provider, with Messytables replaced by qsv.

TNRIS/TWDB provided the use cases that informed and supported the development of Datapusher+, specifically, to support a Resource-first upload workflow.

For a more detailed overview, see the CKAN Monthly Live Jan 2023 presentation.

It features:

  • "Bullet-proof", ultra-fast data type inferencing with qsv

    Unlike Messytables, which scans only the first few rows to guess the type of a column, qsv scans the entire table so its data type inferences are guaranteed [1].

    Despite scanning the whole file, qsv is still far faster, and it not only infers data types but also calculates summary statistics. For example, scanning a 2.7 million row, 124MB CSV file for types and stats took 0.16 seconds [2].

    It is very fast as qsv is written in Rust, is multithreaded, and uses all kinds of performance techniques especially designed for data-wrangling.

  • Much faster loading speed

    Similar to xloader, we use PostgreSQL COPY to directly pipe the data into the datastore, short-circuiting the additional processing/transformation/API calls used by Datapusher.

    But unlike xloader, we load everything using the proper data types and not as text, so there's no need to reload the data again after adjusting the Data Dictionary, as you would with xloader.

  • Far more Storage Efficient AND Performant Datastore with easier to compose SQL queries

    As we create the Datastore tables using the most efficient PostgreSQL data type for each column using qsv's guaranteed type inferences, the Datastore is not only more storage efficient, it is also far more performant for loading AND querying.

    DP+ enables the Datastore to tap into PostgreSQL's full power with:

    • a "smartint" data type, with qsv inferring the most efficient integer data type for the range of values in the column
    • comprehensive date format inferencing (supporting 19 date formats, with each format having several variants & with configurable DMY/MDY preference parsing) & auto-formatting of dates to RFC 3339 format so they are stored as Postgres timestamps
    • cardinality-aware, configurable auto-indexing
    • automatic sanitization of column names to valid PostgreSQL column identifiers
    • auto PostgreSQL vacuuming & analysis of resources after loading
    • and more

    Configurable auto-aliasing of resources also makes it easier to compose SQL queries, as you can use more intuitive resource aliases instead of cryptic resource IDs.

  • Production-ready Robustness

    In production, the number one source of support issues is Datapusher, primarily because of data quality issues and Datapusher's inability to correctly infer data types, gracefully handle errors [3], and provide the Data Publisher actionable information to correct the data.

    Datapusher+'s design directly addresses all these issues.

  • More informative datastore loading messages

    Datapusher+ messages are designed to be more verbose and actionable, so the data publisher's user experience is far better and makes it possible to have a resource-first upload workflow.

  • Extended preprocessing with qsv

    qsv is leveraged by Datapusher+ to:

    • create "Smarter" Data Dictionaries, with:
      • guaranteed data type inferences
      • optional ability to automatically choose the best integer PostgreSQL data type ("smartint") based on the range of the numeric column (PostgreSQL's int, bigint and numeric types) for optimal storage/indexing efficiency and SQL query performance.
      • sanitized column names (guaranteeing valid PostgreSQL column identifiers) while preserving the original column name as a label, which is used to label columns in DataTables_view.
      • an optional "summary stats" resource as an extension of the Data Dictionary, with comprehensive summary statistics for each column - sum, min/max/range, min/max length, mean, stddev, variance, nullcount, sparsity, quartiles, IQR, lower/upper fences, skewness, median, mode/s, antimode/s & cardinality.
    • convert Excel & OpenOffice/LibreOffice Calc (ODS) files to CSV, with the ability to choose which sheet to use by default (e.g. 0 is the first sheet, -1 is the last sheet, -2 the second to last sheet, etc.)
    • convert various date formats (19 date formats are recognized with each format having several variants; ~80 date format permutations in total) to a standard RFC 3339 format
    • enable random access of a CSV by creating a CSV index - which also enables parallel processing of different parts of a CSV simultaneously (a major reason type inferencing and stats calculation is so fast)
    • instantaneously count the number of rows with a CSV index
    • validate if an uploaded CSV conforms to the RFC-4180 standard
    • normalize and transcode CSV/TSV dialects into a standard UTF-8 encoded RFC-4180 CSV format
    • optionally create a preview subset, with the ability to only download the first n preview rows of a file, and not the entire file (e.g. only download the first 1,000 rows of a 3 GB CSV file - especially good for harvesting/cataloging external sites where you only want to harvest the metadata and a small sample of each file).
    • optionally create a preview subset from the end of a file (e.g. last 1,000 rows, good for time-series/sensor data)
    • auto-index columns based on their cardinality/format (unique indices are created for columns with all unique values; columns whose cardinality is below a given threshold are indexed; date columns are indexed)
    • check for duplicates, and optionally deduplicate rows
    • optionally screen for Personally Identifiable Information (PII), with an option to "quarantine" the PII-candidate rows in a separate resource, while still creating the screened resource.
    • optional ability to specify a custom PII screening regex set, instead of the default PII screening regex set.

    Even with all these pre-processing tasks, qsv typically takes less than 5 seconds to finish all its analysis tasks, even for a 100 MB CSV file.

    Future versions of Datapusher+ will further leverage qsv's 80+ commands to do additional preprocessing, data-wrangling and validation. The Roadmap is available here. Ideas, suggestions and your feedback are most welcome!
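
The first point above (inferring types from the whole column rather than from a sample) can be illustrated with a short Python sketch. This is not how qsv or Messytables are implemented; `infer_type` and its `sample_rows` parameter are hypothetical names used only to show why an invalid value late in a file, such as "N/A", defeats sample-based guessing.

```python
from typing import Iterable, Optional


def _kind(value: str) -> str:
    """Classify a single cell as Integer, Float or String."""
    try:
        int(value)
        return "Integer"
    except ValueError:
        pass
    try:
        float(value)
        return "Float"
    except ValueError:
        return "String"


def infer_type(values: Iterable[str], sample_rows: Optional[int] = None) -> str:
    """Return the narrowest type that fits every inspected, non-empty value.

    When sample_rows is given, only that many leading rows are inspected
    (hypothetical stand-in for sample-based guessing).
    """
    rank = {"Integer": 0, "Float": 1, "String": 2}
    widest = "Integer"
    for i, value in enumerate(values):
        if sample_rows is not None and i >= sample_rows:
            break
        if value == "":
            continue  # empty cells are treated as NULLs in this sketch
        kind = _kind(value)
        if rank[kind] > rank[widest]:
            widest = kind
    return widest


column = ["1", "2", "3", "4", "N/A"]
print(infer_type(column, sample_rows=3))  # Integer (wrong for the full column)
print(infer_type(column))                 # String
```

A loader that trusted the sample-based answer would only fail once it reached the last row, which is the situation described in footnote 3.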

Development Installation

Datapusher+ is a drop-in replacement for Datapusher, so it's installed the same way.

  1. Install the required packages.

    sudo apt install python3-virtualenv python3-dev python3-pip python3-wheel build-essential libxslt1-dev libxml2-dev zlib1g-dev git libffi-dev libpq-dev file
  2. Create a virtual environment for Datapusher+ using at least python 3.8.

    cd /usr/lib/ckan
    sudo python3.8 -m venv dpplus_venv
    sudo chown -R $(whoami) dpplus_venv
    . dpplus_venv/bin/activate
    cd dpplus_venv

    ℹ️ NOTE: DP+ requires at least python 3.8 as it makes extensive use of new capabilities introduced in 3.7/3.8 to the subprocess module. If you're using Ubuntu 18.04 or earlier, follow the procedure below to install python 3.8:

    sudo add-apt-repository ppa:deadsnakes/ppa
    # we use 3.8 here, but you can get a higher version by changing the version suffix of the packages below
    sudo apt install python3.8 python3.8-venv python3.8-dev
    # install additional dependencies
    sudo apt install build-essential libxslt1-dev libxml2-dev zlib1g-dev git libffi-dev

    Note that DP+ still works with CKAN<=2.8, which uses older versions of python.

  3. Get the code.

    mkdir src
    cd src
    git clone --branch 0.11.0 https://github.com/datHere/datapusher-plus
    cd datapusher-plus
  4. Install the dependencies.

    pip install wheel
    pip install -r requirements-dev.txt
    pip install -e .
  5. Install qsv.

    Download the appropriate precompiled binaries for your platform and copy it to the appropriate directory, e.g. for Linux:

    wget https://github.com/jqnatividad/qsv/releases/download/0.108.0/qsv-0.108.0-x86_64-unknown-linux-gnu.zip
    unzip qsv-0.108.0-x86_64-unknown-linux-gnu.zip
    rm qsv-0.108.0-x86_64-unknown-linux-gnu.zip
    sudo mv qsv* /usr/local/bin

    Alternatively, if you want to install qsv from source, follow the instructions here. Note that when compiling from source, you may want to look into the Performance Tuning section to squeeze even more performance from qsv.

    Also, if you get glibc errors when starting qsv, your Linux distro may not have the required version of the GNU C Library (this will be the case when running Ubuntu 18.04 or older). If so, use the qsvdp_glibc-2.31 binary as it's linked to an older version of glibc. If that still fails, then use the unknown-linux-musl.zip archive as it is statically linked with the MUSL C Library.

    If you already have qsv, update it to the latest release by using the --update option.

    qsvdp --update

    ℹ️ NOTE: qsv is a general purpose CSV data-wrangling toolkit that gets regular updates. To update to the latest version, just run qsv with the --update option and it will check for the latest version and update as required.

  6. Configure the Datapusher+ database.

    Make sure to create the datapusher PostgreSQL user and the datapusher_jobs database (see DataPusher+ Database Setup).

  7. Copy the datapusher/dot-env.template to datapusher/.env and modify your configuration.

    cd /usr/lib/ckan/dpplus_venv/src/datapusher-plus/datapusher
    cp dot-env.template .env
    # configure your installation as required
    nano .env
  8. Run Datapusher+ in the dpplus_venv virtual environment.

    python main.py config.py

    By default, DP+ should be running at the following port:

    http://localhost:8800/

Production Deployment

There are two ways to deploy Datapusher+:

  1. Manual Deployment

    These instructions set up the DataPusher web service on uWSGI running on port 8800, but can be easily adapted to other WSGI servers like Gunicorn. You'll probably need to set up Nginx as a reverse proxy in front of it and something like Supervisor to keep the process up.

    # Install requirements for DataPusher+. Be sure to have at least Python 3.8
    sudo apt install python3-virtualenv python3-dev python3-pip python3-wheel build-essential libxslt1-dev libxml2-dev zlib1g-dev git libffi-dev libpq-dev file
    
    # Install qsv, if required
    wget https://github.com/jqnatividad/qsv/releases/download/0.108.0/qsv-0.108.0-x86_64-unknown-linux-gnu.zip -P /tmp
    unzip /tmp/qsv-0.108.0-x86_64-unknown-linux-gnu.zip -d /tmp
    rm /tmp/qsv-0.108.0-x86_64-unknown-linux-gnu.zip
    sudo mv /tmp/qsv* /usr/local/bin
    
    # if qsv is already installed, be sure to update it to the latest release
    sudo qsvdp --update
    
    # if you get a glibc error when running `qsvdp --update`
    # you're on an old distro (e.g. Ubuntu 18.04) without the required version of the glibc libraries.
    # If so, try running the qsvdp_glibc-2.31 binary instead. If it runs, you can use it instead of the default qsvdp binary.
    # If that still doesn't work, use the statically linked MUSL version instead
    # https://github.com/jqnatividad/qsv/releases/download/0.108.0/qsv-0.108.0-x86_64-unknown-linux-musl.zip
    
    # find out the locale settings
    locale
    
    # ONLY IF LANG is not "en_US.UTF-8", set locale
    export LC_ALL="en_US.UTF-8"
    export LC_CTYPE="en_US.UTF-8"
    sudo dpkg-reconfigure locales
    
    # Create a virtualenv for DataPusher+. DP+ requires at least python 3.8.
    sudo python3.8 -m venv /usr/lib/ckan/dpplus_venv
    sudo chown -R $(whoami) /usr/lib/ckan/dpplus_venv
    
    # install datapusher-plus in the virtual environment
    . /usr/lib/ckan/dpplus_venv/bin/activate
    pip install wheel
    pip install datapusher-plus
    
    # create an .env file and tune DP+ settings. Tune the uwsgi.ini file as well
    sudo mkdir -p /etc/ckan/datapusher-plus
    sudo curl https://raw.githubusercontent.com/dathere/datapusher-plus/master/datapusher/dot-env.template -o /etc/ckan/datapusher-plus/.env
    sudo curl https://raw.githubusercontent.com/dathere/datapusher-plus/master/deployment/datapusher-uwsgi.ini -o /etc/ckan/datapusher-plus/uwsgi.ini
    
    # Be sure to initialize the database if required. (See Database Setup section below)
    # Be sure to edit the .env file and set the right database connect strings!
    
    # Create a user to run the web service (if necessary)
    sudo addgroup www-data
    sudo adduser -G www-data www-data

    At this point you can run DataPusher+ with the following command:

    /usr/lib/ckan/dpplus_venv/bin/uwsgi --enable-threads -i /etc/ckan/datapusher-plus/uwsgi.ini

    You might need to change the uid and gid in the uwsgi.ini file when using a different user.

    To deploy it using supervisor:

    sudo curl https://raw.githubusercontent.com/dathere/datapusher-plus/master/deployment/datapusher-uwsgi.conf -o /etc/supervisor/conf.d/datapusher-uwsgi.conf
    sudo service supervisor restart
  2. Dockerized Deployment

    As installing Datapusher+ is quite involved, as evinced by the above procedure, a containerized installation will make it far easier not only to deploy DP+ to production, but also to experiment with it.

    Instructions to set up the DP+ Docker instance can be found here.

    The DP+ Docker instance will also expose additional features and an administrative interface to manage not only Datapusher+ jobs, but also the CKAN Datastore.

Configuring

CKAN Configuration

Add datapusher to the plugins in your CKAN configuration file (generally located at /etc/ckan/default/ckan.ini):

ckan.plugins = <other plugins> datapusher

In order to tell CKAN where this webservice is located, the following must be added to the [app:main] section of your CKAN configuration file :

ckan.datapusher.url = http://127.0.0.1:8800/

There are other CKAN configuration options that let you customize the CKAN - DataPusher integration. Please refer to the DataPusher Settings section in the CKAN documentation for more details.

ℹ️ NOTE: DP+ recognizes some additional TSV and spreadsheet subformats - xlsm and xlsb for Excel Spreadsheets, and tab for TSV files. To process these subformats, set ckan.datapusher.formats as follows in your CKAN.INI file:

ckan.datapusher.formats = csv xls xlsx xlsm xlsb tsv tab application/csv application/vnd.ms-excel application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ods application/vnd.oasis.opendocument.spreadsheet

and add this entry to your CKAN's resource_formats.json file.

["TAB", "Tab Separated Values File", "text/tab-separated-values", []],

DataPusher+ Configuration

The DataPusher+ instance is configured in the .env file located in the working directory of DP+: /etc/ckan/datapusher-plus when running a production deployment, or the datapusher-plus/datapusher source directory when running a development installation.

See dot-env.template for a summary of configuration options available.

DataPusher+ Database Setup

DP+ requires a dedicated PostgreSQL account named datapusher to connect to the CKAN Datastore.

To create the datapusher user and give it the required privileges to the datastore_default database:

su - postgres
psql -d datastore_default
CREATE ROLE datapusher LOGIN PASSWORD 'YOURPASSWORD';
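GRANT CREATE, CONNECT, TEMPORARY ON DATABASE datastore_default TO datapusher;
-- SUPERUSER is a role attribute, not a database privilege, so it is set with ALTER ROLE
ALTER ROLE datapusher SUPERUSER;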
GRANT SELECT, INSERT, UPDATE, DELETE, TRUNCATE ON ALL TABLES IN SCHEMA public TO datapusher;
\q

DP+ also requires its own job_store database to keep track of all the DP+ jobs. In the original Datapusher, this was a sqlite database by default. Though DP+ can still use a sqlite database, we are discouraging its use.

To set up the datapusher_jobs database and its user:

sudo -u postgres createuser -S -D -R -P datapusher_jobs
sudo -u postgres createdb -O datapusher_jobs datapusher_jobs -E utf-8
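
The two accounts created above end up as database connection strings in the .env file. As a minimal sketch, the helper below builds a PostgreSQL URL and percent-encodes the credentials so that characters such as @ or / in a password do not break it. The function name `pg_url`, the host, the port and the passwords are placeholders chosen for illustration; check dot-env.template for the actual setting names to assign these strings to.

```python
from urllib.parse import quote


def pg_url(user: str, password: str, database: str,
           host: str = "localhost", port: int = 5432) -> str:
    """Build a postgresql:// URL, percent-encoding user, password and database."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Connection used to write to the CKAN Datastore (datapusher role)
print(pg_url("datapusher", "YOURPASSWORD", "datastore_default"))

# Connection used for the DP+ job store (datapusher_jobs role);
# the password here is a made-up example containing reserved characters
print(pg_url("datapusher_jobs", "p@ss/word", "datapusher_jobs"))
```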

Usage

DP+ will attempt to load any file that has one of the supported formats (defined in ckan.datapusher.formats) into the DataStore.

You can also manually trigger resources to be resubmitted. When editing a resource in CKAN (clicking the "Manage" button on a resource page), a new tab named "DataStore" will appear. This will contain a log of the last attempted upload and a button to retry the upload. Once a resource has been "pushed" into the Datastore, a "Data Dictionary" tab will also be available where the data publisher can fine-tune the inferred data dictionary.

[Screenshots: DataPusher+ UI, DataPusher+ UI 2]

Command line

Run the following command to submit all resources to datapusher. It will skip resources whose data file hash has not changed:

ckan -c /etc/ckan/default/ckan.ini datapusher resubmit

On CKAN<=2.8:

paster --plugin=ckan datapusher resubmit -c /etc/ckan/default/ckan.ini

To resubmit a specific resource, whether or not the hash of the data file has changed:

ckan -c /etc/ckan/default/ckan.ini datapusher submit {dataset_id}

On CKAN<=2.8:

paster --plugin=ckan datapusher submit <pkgname> -c /etc/ckan/default/ckan.ini

Testing

To test Datapusher-plus, you can use the following test script available on GitHub: test script.

Uninstalling Datapusher+

Should you need to remove Datapusher+, and you followed either the Development or Production Installation procedures above:

# if you're running inside the dpplus_venv virtual environment, deactivate it first
deactivate

# remove the DP+ python virtual environment
sudo rm -rf /usr/lib/ckan/dpplus_venv

# remove the supervisor DP+ configuration
sudo rm -rf /etc/supervisor/conf.d/datapusher-uwsgi.conf

# remove the DP+ production deployment directory
sudo rm -rf /etc/ckan/datapusher-plus

# remove qsv binary variants
sudo rm /usr/local/bin/qsv /usr/local/bin/qsvdp /usr/local/bin/qsvlite /usr/local/bin/qsv_nightly /usr/local/bin/qsvdp_nightly /usr/local/bin/qsvlite_nightly

# restart the supervisor, without the Datapusher+ service
sudo service supervisor reload

# ========= DATABASE objects ============
# OPTIONAL: backup the datapusher_jobs database first if 
# you want to retain the DP+ job history
sudo -u postgres pg_dump --format=custom -d datapusher_jobs > datapusher_jobs.dump

# to remove the Datapusher+ job database and the datapusher_jobs user/role
sudo -u postgres dropdb datapusher_jobs
sudo -u postgres dropuser datapusher_jobs

# to drop the datapusher user which DP+ uses to write to the CKAN Datastore
sudo -u postgres dropuser datapusher

To ensure the Datapusher+ service is not automatically invoked when tabular resources are uploaded, remove datapusher from ckan.plugins in your ckan.ini file.

Also remove/comment out the following ckan.datapusher entries in your ckan.ini:

  • ckan.datapusher.formats
  • ckan.datapusher.url
  • ckan.datapusher.callback_url_base
  • ckan.datapusher.assume_task_stale_after

Note that resources that have been pushed previously will still be available on the CKAN Datastore. You will have to delete these resources separately using the UI or the CKAN resource_delete API.

If you're no longer using the CKAN Datastore:

  • Edit your ckan.ini and remove/comment datastore from ckan.plugins.
  • Remove/comment out the ckan.datastore.write_url and ckan.datastore.read_url entries.

To confirm the uninstallation is successful, upload a new tabular resource and check if:

  • tabular Resource Views (e.g. datatables_view, recline_view, etc.) are no longer available
  • the Datastore and Data Dictionary tabs are no longer available
  • the Download button on the resource page will no longer offer alternate download formats (CSV, TSV, JSON, XML)
  • the Datastore API button will no longer display on tabular resources

License

This material is copyright (c) 2020 Open Knowledge Foundation and other contributors

It is open and licensed under the GNU Affero General Public License (AGPL) v3.0 whose full text may be found at:

http://www.fsf.org/licensing/licenses/agpl-3.0.html

Footnotes

  1. Why use qsv instead of a "proper" python data analysis library like pandas?

  2. It takes 0.16 seconds with an index to run qsv stats against the qsv whirlwind tour sample file on a Ryzen 4800H (8 physical/16 logical cores) with 32 GB memory and a 1 TB SSD. Without an index, it takes 1.3 seconds.

  3. Imagine you have a 1M row CSV, and the last row has an invalid value for a numeric column (e.g. "N/A" instead of a number). After spending hours pushing the data very slowly, legacy datapusher will abort on the last row and the ENTIRE job is invalid. Ok, that's bad, but what makes it worse is that the old table has been deleted already, and Datapusher doesn't tell you what caused the job to fail! YIKES!!!!

datapusher-plus's People

Contributors

alvarollmenezes, amercader, bluepython508, categulario, ctrepka, davidmiller, dependabot[bot], domoritz, ericsoroos, joetsoi, jqnatividad, kindly, klikstermkd, madebydavid, mbocevski, metaodi, minhajuddin2510, morty, nigelbabu, rossjones, rufuspollock, seanh, shubham-mahajan, stefina, thrawnca, tktech, tomecirun, vitorbaptista, wardi, zharktas


datapusher-plus's Issues

`IGNORE_FILE_HASH` is set to `True` by default

This causes problems when a resource's metadata is modified, even if the resource file itself has not changed.

This was set to True by default to facilitate devt/testing of DP+. It should be False by default.
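
A minimal sketch of the behaviour this issue is about, assuming the hash of the data file is compared with a previously stored hash. The names `content_hash` and `should_push` are hypothetical, and SHA-256 is chosen only for illustration; this is not DP+'s actual code.

```python
import hashlib
from typing import Optional


def content_hash(data: bytes) -> str:
    """Hash the resource file's bytes (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def should_push(data: bytes, stored_hash: Optional[str],
                ignore_file_hash: bool = False) -> bool:
    """Decide whether a resource needs to be pushed to the Datastore again."""
    if ignore_file_hash:
        # Always re-push, even when only the resource metadata was edited
        return True
    return stored_hash != content_hash(data)


csv_bytes = b"id,name\n1,alpha\n"
previous = content_hash(csv_bytes)

print(should_push(csv_bytes, previous))                         # False
print(should_push(csv_bytes, previous, ignore_file_hash=True))  # True
print(should_push(csv_bytes + b"2,beta\n", previous))           # True
```

With the flag off, a metadata-only edit leaves the file hash unchanged and no new job is needed.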

cc @twdbben

Containerfile does miss GLIBCXX dependency

I have built 0.11.0, but pushing gives this error:

ckan@minipod_ckan:/$ /usr/local/bin/qsvdp
/usr/local/bin/qsvdp: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /usr/local/bin/qsvdp)

Smarter automatic deduplication

Automatic deduplication works well (#25), however, when duplicates are found and removed, the datastore table and the resource file are no longer in sync.

Smarter dedup can be handled three ways. When dupes are found:

  1. Stop the DP+ job and show the dupe error in the Datastore tab.
  2. Replace the resource file with the dedupped CSV.
  3. Take advantage of qsv dedup's --dupes-output option and create two new resources - RESOURCENAME_dupes.csv and RESOURCENAME_dedupped.csv which are pushed to the Datastore. The original resource with dupes is NOT pushed. The Data Publisher can then just use the CKAN interface to manage which resource to keep (e.g. delete the original and the _dupes resources; rename the _dedupped resource, removing the _dedupped suffix.)
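
Option 3 can be sketched in plain Python to show what the two derived resources would contain. This is only an illustration using the standard csv module, not qsv dedup's algorithm (which may order rows differently); `split_duplicates` is a hypothetical name, and the first occurrence of each row is kept.

```python
import csv
import io
from typing import List, Tuple


def split_duplicates(csv_text: str) -> Tuple[str, str, int]:
    """Return (deduplicated CSV, duplicates CSV, number of duplicate rows)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return "", "", 0  # completely empty input

    def render(rows: List[List[str]]) -> str:
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
        return out.getvalue()

    seen = set()
    unique_rows: List[List[str]] = []
    dupe_rows: List[List[str]] = []
    for row in reader:
        key = tuple(row)
        if key in seen:
            dupe_rows.append(row)    # would go to RESOURCENAME_dupes.csv
        else:
            seen.add(key)
            unique_rows.append(row)  # would go to RESOURCENAME_dedupped.csv
    return render(unique_rows), render(dupe_rows), len(dupe_rows)


dedupped, dupes, count = split_duplicates("id,city\n1,Austin\n2,Dallas\n1,Austin\n")
print(count)     # 1
print(dedupped)
print(dupes)
```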

Smart resource download

Create a DOWNLOAD_ALWAYS_WHITELIST - a list of hosts from which DP+ will always download the entire dataset even if DOWNLOAD_PREVIEW_ONLY is true.

DOWNLOAD_ALWAYS_WHITELIST are typically local hosts or hosts where the CKAN instance has fast connections/peering arrangements.

Doing so will allow us to always do comprehensive metadata inferencing even if we're only pushing PREVIEW_ROWS into the Datastore.
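
The proposed decision could look roughly like the sketch below. The function name `download_full_file` and the example hosts are hypothetical; only the DOWNLOAD_ALWAYS_WHITELIST and DOWNLOAD_PREVIEW_ONLY settings come from the proposal above.

```python
from typing import Iterable
from urllib.parse import urlsplit


def download_full_file(url: str, preview_only: bool,
                       always_whitelist: Iterable[str]) -> bool:
    """Return True when the entire file should be downloaded."""
    host = (urlsplit(url).hostname or "").lower()
    if host in {h.lower() for h in always_whitelist}:
        return True  # whitelisted hosts are always downloaded in full
    return not preview_only


whitelist = ["localhost", "data.example.org"]  # example hosts, not real settings
print(download_full_file("http://localhost:5000/big.csv", True, whitelist))      # True
print(download_full_file("https://slow.example.com/big.csv", True, whitelist))   # False
print(download_full_file("https://slow.example.com/big.csv", False, whitelist))  # True
```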

Container image

Create a reference container image for Datapusher+. This is already WIP and being done by @ctrepka .

Error on push of a CSV resource with only a header row in it (no data rows)

Pushing a CSV resource to the DataStore with only a header row in it (no data rows) causes DataPusher to throw the following error.

Traceback (most recent call last):
File "/usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages/apscheduler/executors/base.py", line 125, in run_job
retval = job.func(*job.args, **job.kwargs)
File "/usr/lib/ckan/dpplus_venv/datapusher-plus/datapusher/jobs.py", line 551, in push_to_datastore
dupe_count = int(str(qsv_dedup.stderr).strip())
ValueError: invalid literal for int() with base 10: "thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', src/cmd/dedup.rs:105:32\nnote: run with RUST_BACKTRACE=1 environment variable to display a backtrace"
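
The traceback shows the duplicate count being parsed directly from the dedup command's stderr, which fails when stderr holds a panic message instead of a number. One possible guard, sketched here with the hypothetical helper `parse_dupe_count` (not the fix that was actually applied in DP+), is to treat non-numeric output as "no count available" so the caller can report a clear error:

```python
from typing import Optional


def parse_dupe_count(stderr: str) -> Optional[int]:
    """Return the duplicate count, or None if stderr is not a plain integer."""
    try:
        return int(stderr.strip())
    except ValueError:
        return None


panic = "thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0'"
print(parse_dupe_count("3\n"))  # 3
print(parse_dupe_count(panic))  # None
```

A complementary check would be to skip deduplication entirely when the file has a header row but no data rows.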

Optimized data type mapping to PostgreSQL data types (for speed/reduced storage/efficiency)

Currently, all numeric fields (int and float) are mapped to PostgreSQL's numeric type.

https://www.postgresql.org/docs/current/datatype-numeric.html

Which is very inefficient for both storage and performance.

In DP+, we're inferring data types using its stats function. We can guarantee correct data type inferences as we scan the whole file. Whilst scanning, we also compile descriptive statistics, e.g. from the qsv whirlwind tour:

$ qsv stats wcp.csv --everything | qsv table
field       type     sum                min           max         min_length  max_length  mean                stddev              variance            lower_fence         q1          q2_median   q3          iqr                upper_fence         skew                  mode         cardinality  nullcount
Country     String                      ad            zw          2           2                                                                                                                                                                                            ru           231          0
City        String                       al lusayli   Þykkvibaer  1           87                                                                                                                                                                                           san jose     2008182      0
AccentCity  String                       Al Lusayli   özlüce      1           87                                                                                                                                                                                           San Antonio  2031214      0
Region      String                      00            Z4          0           2                                                                       -29.5               5           11          28          23                 62.5                1.3036035769599401    04           392          4
Population  Integer  2290536125         7             31480498    0           8           48730.66387966977   308414.0418510231   95119221210.88461   -33018              3730.5      10879       28229.5     24499              64978               0.36819008290764255                28460        2652350
Latitude    Float    76585211.19776328  -54.9333333   82.483333   1           12          28.371681223643343  21.938373536960917  481.292233447227    -35.9076389         12.9552778  33.8666667  45.5305556  32.5752778         94.3934723          -0.7514210842155992   50.8         255133       0
Longitude   Float    75976506.66429423  -179.9833333  180         1           14          28.14618114715278   62.472858625866486  3902.8580648875004  -98.49166745000002  2.383333    26.8802778  69.6333333  67.25000030000001  170.50833375000002  0.060789759344963286  23.1         407568       0

With min/max, we can see whether int and float columns will fit PostgreSQL's smallint/integer/bigint and real/double precision types respectively, and only use numeric/decimal when the values would overflow the more efficient types.
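
For the integer case, the idea can be sketched as follows. The ranges are PostgreSQL's documented limits for 16-, 32- and 64-bit signed integers; the function name `smallest_integer_type` is hypothetical and this is not DP+'s actual implementation.

```python
def smallest_integer_type(min_value: int, max_value: int) -> str:
    """Pick the narrowest PostgreSQL integer type that holds [min_value, max_value]."""
    for name, bits in (("smallint", 16), ("integer", 32), ("bigint", 64)):
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if low <= min_value and max_value <= high:
            return name
    return "numeric"  # arbitrary precision, used only when bigint would overflow


# Population column from the stats output above: min 7, max 31480498
print(smallest_integer_type(7, 31480498))  # integer
print(smallest_integer_type(0, 100))       # smallint
print(smallest_integer_type(0, 2 ** 63))   # numeric
```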

Having min/max also allows us to explore PostgreSQL's range types.

With min length/max length, we can probably use character varying as well (though, per the Postgres documentation, there is really no performance benefit over text; but for short strings with a low max length, we could use character varying to enforce inferred schema constraints at the DB level).

With cardinality, we can even automatically index certain columns based on some rules - i.e. cardinality = rowcount means a unique index; low cardinality below a certain threshold means creating an index to facilitate datastore_search_sql performance. (see #30)

With qsv frequency, we can also compile frequency tables and perhaps even explore exploiting PostgreSQL's enumerated types, if a column's frequency table is only a few values under a certain threshold.

Doing more efficient data type mapping will make the datastore more performant and space efficient, allowing it to support faster searches with datastore_search_sql queries, and take it beyond what IMHO, is currently a de facto "FTS-enabled tabular blob store" and be a big installment in taking CKAN beyond just metadata catalog use cases to being an enterprise datastore with "data lake"-like capabilities.

Automatic deduplication

When the environment variable QSV_AUTO_DEDUP is set, use qsv to dedup a file before inserting it into the Datastore.

Validation using `qsv schema` and `validate` commands

With 0.11, we already validate and normalize the CSV during analysis.

Will need to add schema support, perhaps, by looking for a schema.json attribute in the package?

If that's set, it will use the designated schema (may it be another resource in the CKAN instance, or a schema.json file on a URL) to validate the resource before pushing it.

Recoverable jobs

For an existing resource, when a job fails after it has already been deleted from the datastore, there is no easy way to recover.

To avoid this, DP+ should:

  • do a "soft-delete" of the table (perhaps, by temporarily renaming it?)
  • do the DP+ job as usual
  • if the job is successful, drop the "soft-deleted" table. If the job failed, restore the old table
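
The three steps above can be sketched with an in-memory SQLite database standing in for PostgreSQL, so the example runs on its own. The function `reload_table`, the `_softdeleted` suffix and the loader callbacks are all hypothetical; a real implementation would run against the Datastore and would likely wrap the steps in a transaction.

```python
import sqlite3
from typing import Callable


def reload_table(conn: sqlite3.Connection, table: str,
                 load: Callable[[sqlite3.Connection, str], None]) -> bool:
    """Replace `table` using `load`, restoring the old table if loading fails."""
    backup = f"{table}_softdeleted"
    # 1. "soft-delete" the existing table by renaming it
    conn.execute(f'ALTER TABLE "{table}" RENAME TO "{backup}"')
    try:
        # 2. run the job as usual
        load(conn, table)
    except Exception:
        # 3b. job failed: discard the partial table and restore the old one
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'ALTER TABLE "{backup}" RENAME TO "{table}"')
        conn.commit()
        return False
    # 3a. job succeeded: drop the soft-deleted table
    conn.execute(f'DROP TABLE "{backup}"')
    conn.commit()
    return True


def good_load(conn: sqlite3.Connection, table: str) -> None:
    conn.execute(f'CREATE TABLE "{table}" (id INTEGER)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?)', [(10,), (20,), (30,)])


def bad_load(conn: sqlite3.Connection, table: str) -> None:
    conn.execute(f'CREATE TABLE "{table}" (id INTEGER)')
    raise ValueError("invalid value in last row")


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "res" (id INTEGER)')
conn.execute('INSERT INTO "res" VALUES (1)')
conn.commit()

print(reload_table(conn, "res", bad_load))              # False
print(conn.execute('SELECT id FROM "res"').fetchall())  # [(1,)]
print(reload_table(conn, "res", good_load))             # True
print(conn.execute('SELECT id FROM "res"').fetchall())  # [(10,), (20,), (30,)]
```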

Smart auto-indexing

Have an env var named QSV_AUTOINDEX.

It will be a string with three positions - PUI:

  • When P (for Primary Key) is 1, automatically create a primary key for a column with cardinality=rowcount, and there is no other column with cardinality=rowcount.
  • When U (for Unique Index) is 1, create a unique index for ALL columns where cardinality=rowcount.
  • When I (Index) is not zero, create an index for columns that have cardinality >= I.

Semi-automatic creation of indices based on cardinality of column values (exposed through Advanced Data Dictionary)

This is now available with 0.8.0.

There are two parameters to tune:

  • AUTO_INDEX_THRESHOLD (default: 3)
    If a column's cardinality (number of unique values) is greater than or equal to this, an index is created.
    If this is set to -1, all columns are indexed regardless of cardinality.
    If a column's cardinality is equal to the number of rows (all values are unique), a UNIQUE INDEX is created.

  • AUTO_INDEX_DATES (default: True)
    If a column's data type is inferred as a timestamp and AUTO_INDEX_DATES is true, an index is created for this column.
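
Read literally, the rules for the two parameters above could be expressed as the sketch below. The function name `index_kind` and the order in which the rules are checked are assumptions. Note that the README's feature list phrases the threshold rule differently ("below a given threshold"); this sketch follows the wording of this issue.

```python
from typing import Optional


def index_kind(cardinality: int, rowcount: int, is_timestamp: bool,
               auto_index_threshold: int = 3,
               auto_index_dates: bool = True) -> Optional[str]:
    """Return "UNIQUE INDEX", "INDEX", or None (no index) for one column."""
    if rowcount > 0 and cardinality == rowcount:
        return "UNIQUE INDEX"  # all values are unique
    if auto_index_threshold == -1:
        return "INDEX"         # index every column regardless of cardinality
    if cardinality >= auto_index_threshold:
        return "INDEX"
    if is_timestamp and auto_index_dates:
        return "INDEX"
    return None


print(index_kind(1000, 1000, False))  # UNIQUE INDEX
print(index_kind(5, 1000, False))     # INDEX
print(index_kind(2, 1000, False))     # None
print(index_kind(2, 1000, True))      # INDEX
```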

Advanced Data Dictionary

Currently, CKAN's Data Dictionary is limited to data type, label and description.

With qsv stats we collect descriptive statistics when we infer each column's data type during the Analysis phase of a DP+ job.

Currently - sum, min/max, min/max length, mean, stddev, variance, quartiles, median, modes, cardinality & nullcount.

When cardinality = rowcount and nullcount = 0, we can infer that a column can be a primary key and be a unique index, and annotate its data dictionary accordingly (and going further, create a unique index on it after the Copy phase).

When nullcount = rowcount, we can infer that a column is empty, and note it in the data dictionary as well.

And with qsv frequency - we can also compile frequency tables for the top N values of a column, and if the cardinality of a column is below a given N threshold, we can even infer the domain of a column as enumerated values.
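
The inferences described so far can be sketched as a small function that turns per-column statistics into data dictionary annotations. The function name `annotate_column`, the annotation strings and the `enum_threshold` default are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
from typing import List


def annotate_column(cardinality: int, nullcount: int, rowcount: int,
                    enum_threshold: int = 10) -> List[str]:
    """Derive data dictionary notes from a column's cardinality and nullcount."""
    if rowcount == 0:
        return []
    if nullcount == rowcount:
        return ["empty"]  # every value is null
    notes = []
    if cardinality == rowcount and nullcount == 0:
        notes.append("primary key candidate")  # unique and never null
    if cardinality < enum_threshold:
        notes.append("enumerated domain")      # few distinct values
    return notes


print(annotate_column(cardinality=1000, nullcount=0, rowcount=1000))
print(annotate_column(cardinality=4, nullcount=0, rowcount=1000))
print(annotate_column(cardinality=1, nullcount=1000, rowcount=1000))
```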

Since we already paid for compiling the statistics when we inferred the column data types, we can store these statistics in the data dictionary as well, as "schemata" (a term I coined for schema metadata), for "free" (or nearly free: DP+ does not currently run qsv frequency, but even against a large file like the 1M-row, 500 MB NYC 311 benchmark data, it only takes 2.8 seconds).

We have several options:

  1. add additional properties to the existing Data Dictionary JSON which is stored as a table comment
  2. keep the existing Data Dictionary JSON as is, and store RESOURCE_NAME-stats.csv and RESOURCE_NAME-freq.csv as "system resources", that can be downloaded and queried with the CKAN API.
  3. extend the Data Dictionary with additional properties, and also store the -stats and -freq CSVs as system resources.
  4. alternatively, instead of using the original Data Dictionary JSON, we can insert the jsonschema file produced by the qsv schema command. The added benefit of doing so is that we can use the jsonschema file with qsv validate to check if an external file conforms to the schema. And since qsv validate accepts a jsonschema URL, you can even validate an external file against the CKAN hosted jsonschema.
  5. Do 4, and add the "system resources" as in 3.
  6. Store all these schemata data in a "schemata catalog" in the datastore database or a dedicated schemata database as native PostgreSQL objects. Perhaps, by using a resource's ID and adding a special prefix and/or suffix to it (e.g. RESOURCEID_datadict, RESOURCEID_stats, RESOURCEID_freq). Doing so has the added benefit of being able to query all the data dictionaries - e.g. columns with the same name, infer related resources, suggest joins, etc. and other "Linked Data" like queries and inferences in a performant manner.

@wardi, since you originally implemented the Data Dictionary, I would be keen to get your opinion.

Database connections are not re-established if they are lost for some reason

Describe the bug
If database connections are lost for some reason (for example, when the database server is restarted), they are not re-established and datapusher+ produces an error. The logs show the following:

sqlalchemy.exc.OperationalError: (psycopg2.errors.AdminShutdown) terminating connection due to administrator command
SSL connection has been closed unexpectedly
[SQL: INSERT INTO apscheduler_jobs (id, next_run_time, job_state) VALUES (%(id)s, %(next_run_time)s, %(job_state)s)

To Reproduce
Steps to reproduce the behavior:

  1. Restart your database
  2. Try to add CSV to datastore via datapusher+.

Expected behavior
The data would be added to the datastore
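
One generic way to get the expected behavior is to reconnect and retry when an operation fails because the connection was dropped. The sketch below is library-agnostic and not taken from DP+ or CKAN Service Provider; `run_with_reconnect`, `connect` and `is_disconnect` are hypothetical names. SQLAlchemy also offers a `pool_pre_ping=True` option on `create_engine` that tests pooled connections before use; whether that alone would resolve this issue is untested here.

```python
import time
from typing import Any, Callable


def run_with_reconnect(operation: Callable[[Any], Any],
                       connect: Callable[[], Any],
                       is_disconnect: Callable[[Exception], bool],
                       retries: int = 3,
                       delay: float = 0.0) -> Any:
    """Run operation(conn), reconnecting and retrying on lost connections."""
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return operation(conn)
        except Exception as exc:
            # Give up on the last attempt or on unrelated errors.
            if attempt == retries or not is_disconnect(exc):
                raise
            time.sleep(delay)
            conn = connect()  # open a fresh connection before retrying
```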

ModuleNotFoundError: No module named 'datapusher'

I installed DataPusher-Plus for development following the instructions at https://github.com/dathere/datapusher-plus#datapusher-database-setup . Running pip install datapusher-plus succeeded within the suggested environment (activated with . /usr/lib/ckan/dpplus_venv/bin/activate).

I get the following error after trying to start DataPusher with /usr/lib/ckan/dpplus_venv/bin/uwsgi --enable-threads -i /etc/ckan/datapusher-plus/uwsgi.ini:

[uWSGI] getting INI configuration from /etc/ckan/datapusher-plus/uwsgi.ini
*** Starting uWSGI 2.0.21 (64bit) on [Tue Feb 14 18:11:44 2023] ***
compiled with version: 9.4.0 on 14 February 2023 21:59:44
os: Linux-5.15.0-60-generic #66~20.04.1-Ubuntu SMP Wed Jan 25 09:41:30 UTC 2023
nodename: ubuntu
machine: x86_64
clock source: unix
detected number of CPU cores: 16
current working directory: /home/user
writing pidfile to /tmp/uwsgi.pid
detected binary path: /usr/lib/ckan/dpplus_venv/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
your processes number limit is 23546
your memory page size is 4096 bytes
 *** WARNING: you have enabled harakiri without post buffering. Slow upload could be rejected on post-unbuffered webservers *** 
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 0.0.0.0:8800 fd 4
uwsgi socket 0 bound to TCP address 127.0.0.1:44081 (port auto-assigned) fd 3
Python version: 3.8.10 (default, Nov 14 2022, 12:59:47)  [GCC 9.4.0]
PEP 405 virtualenv detected: /usr/lib/ckan/dpplus_venv
Set PythonHome to /usr/lib/ckan/dpplus_venv
Python main interpreter initialized at 0x55f3e6fbde50
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 719200 bytes (702 KB) for 9 cores
*** Operational MODE: preforking+threaded ***
spawned uWSGI master process (pid: 3230)
spawned uWSGI worker 1 (pid: 3231, cores: 3)
spawned uWSGI worker 2 (pid: 3232, cores: 3)
spawned uWSGI worker 3 (pid: 3233, cores: 3)
spawned uWSGI http 1 (pid: 3234)
ModuleNotFoundError: No module named 'datapusher'
ModuleNotFoundError: No module named 'datapusher'
unable to load app 0 (mountpoint='') (callable not found or import error)
unable to load app 0 (mountpoint='') (callable not found or import error)
*** no app loaded. going in full dynamic mode ***
*** no app loaded. going in full dynamic mode ***
ModuleNotFoundError: No module named 'datapusher'
unable to load app 0 (mountpoint='') (callable not found or import error)
*** no app loaded. going in full dynamic mode ***

Under dpplus_venv when trying to import datapusher:

(dpplus_venv) user@ubuntu:~$ python
Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datapusher
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'datapusher'

I have double-checked that datapusher-plus is installed with pip:

(dpplus_venv) user@ubuntu:~$ pip install datapusher-plus
Requirement already satisfied: datapusher-plus in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (0.10.2)
Requirement already satisfied: ckanserviceprovider>=1.0 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from datapusher-plus) (1.1.0)
Requirement already satisfied: uwsgi in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from datapusher-plus) (2.0.21)
Requirement already satisfied: requests in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from datapusher-plus) (2.28.2)
Requirement already satisfied: python-dotenv in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from datapusher-plus) (0.21.1)
Requirement already satisfied: psycopg2-binary in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from datapusher-plus) (2.9.5)
Requirement already satisfied: semver in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from datapusher-plus) (2.13.0)
Requirement already satisfied: datasize in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from datapusher-plus) (1.0.0)
Requirement already satisfied: APScheduler<3.10.0,>=2.1.2 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from ckanserviceprovider>=1.0->datapusher-plus) (3.9.1.post1)
Requirement already satisfied: SQLAlchemy<1.4.0,>=1.3.15 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from ckanserviceprovider>=1.0->datapusher-plus) (1.3.24)
Requirement already satisfied: flask-login==0.6.0 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from ckanserviceprovider>=1.0->datapusher-plus) (0.6.0)
Requirement already satisfied: future in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from ckanserviceprovider>=1.0->datapusher-plus) (0.18.3)
Requirement already satisfied: Werkzeug>=1.0.0 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from ckanserviceprovider>=1.0->datapusher-plus) (2.2.3)
Requirement already satisfied: Flask>=1.1.1 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from ckanserviceprovider>=1.0->datapusher-plus) (2.2.2)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from requests->datapusher-plus) (2022.12.7)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from requests->datapusher-plus) (3.4)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from requests->datapusher-plus) (3.0.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from requests->datapusher-plus) (1.26.14)
Requirement already satisfied: tzlocal!=3.*,>=2.0 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from APScheduler<3.10.0,>=2.1.2->ckanserviceprovider>=1.0->datapusher-plus) (4.2)
Requirement already satisfied: six>=1.4.0 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from APScheduler<3.10.0,>=2.1.2->ckanserviceprovider>=1.0->datapusher-plus) (1.16.0)
Requirement already satisfied: setuptools>=0.7 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from APScheduler<3.10.0,>=2.1.2->ckanserviceprovider>=1.0->datapusher-plus) (44.0.0)
Requirement already satisfied: pytz in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from APScheduler<3.10.0,>=2.1.2->ckanserviceprovider>=1.0->datapusher-plus) (2022.7.1)
Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from Werkzeug>=1.0.0->ckanserviceprovider>=1.0->datapusher-plus) (2.1.2)
Requirement already satisfied: itsdangerous>=2.0 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from Flask>=1.1.1->ckanserviceprovider>=1.0->datapusher-plus) (2.1.2)
Requirement already satisfied: Jinja2>=3.0 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from Flask>=1.1.1->ckanserviceprovider>=1.0->datapusher-plus) (3.1.2)
Requirement already satisfied: importlib-metadata>=3.6.0; python_version < "3.10" in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from Flask>=1.1.1->ckanserviceprovider>=1.0->datapusher-plus) (6.0.0)
Requirement already satisfied: click>=8.0 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from Flask>=1.1.1->ckanserviceprovider>=1.0->datapusher-plus) (8.1.3)
Requirement already satisfied: backports.zoneinfo; python_version < "3.9" in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from tzlocal!=3.*,>=2.0->APScheduler<3.10.0,>=2.1.2->ckanserviceprovider>=1.0->datapusher-plus) (0.2.1)
Requirement already satisfied: pytz-deprecation-shim in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from tzlocal!=3.*,>=2.0->APScheduler<3.10.0,>=2.1.2->ckanserviceprovider>=1.0->datapusher-plus) (0.1.0.post0)
Requirement already satisfied: zipp>=0.5 in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from importlib-metadata>=3.6.0; python_version < "3.10"->Flask>=1.1.1->ckanserviceprovider>=1.0->datapusher-plus) (3.13.0)
Requirement already satisfied: tzdata; python_version >= "3.6" in /usr/lib/ckan/dpplus_venv/lib/python3.8/site-packages (from pytz-deprecation-shim->tzlocal!=3.*,>=2.0->APScheduler<3.10.0,>=2.1.2->ckanserviceprovider>=1.0->datapusher-plus) (2022.7)

I am using Ubuntu 20.04 as recommended:

(dpplus_venv) user@ubuntu:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

CKAN server on the other hand is running fine. Only DataPusher seems to fail.

Thank you for your help!

datapusher-plus-uwsgi:datapusher-plus-uwsgi-00 FATAL Exited too quickly (process log may have details)

A detail was missing from the installation instructions. Here is my analysis and how I solved it. I still have a problem when trying to upload to the datastore, though; I suspect it's the API key parameter, which could probably also be covered in the installation process:

I use the CKAN 2.9 package install on Ubuntu Server 20.04.

After carrying out all the steps to install the Datapusher-Plus (DP+) plugin, I get the following error:

ckan-datapusher:ckan-datapusher-00 RUNNING pid 1656, uptime 0:00:22
ckan-uwsgi:ckan-uwsgi-00 RUNNING pid 1657, uptime 0:00:22
ckan-worker:ckan-worker-00 RUNNING pid 1658, uptime 0:00:22
**datapusher-plus-uwsgi:datapusher-plus-uwsgi-00 FATAL Exited too quickly (process log may have details)**

I think the problem is that I have to stop the datapusher that comes with CKAN. Is that correct? If so, how do I stop it?

Indeed, datapusher-plus uses the same port, 8800.
As I understand it, what needs to be done is to remove the original datapusher configuration file /etc/supervisor/conf.d/datapusher-uwsgi.conf and run:

sudo supervisorctl reread
sudo supervisorctl update

And yes, the problem is gone:

ckan-uwsgi:ckan-uwsgi-00                         RUNNING   pid 1068, uptime 0:50:55
ckan-worker:ckan-worker-00                       RUNNING   pid 1069, uptime 0:50:55
datapusher-plus-uwsgi:datapusher-plus-uwsgi-00   RUNNING   pid 1070, uptime 0:50:55

So datapusher-plus must be the only one loaded.
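
A quick way to confirm this kind of conflict before starting DP+ is to check whether something is already listening on the port. This is a small standard-library sketch; the helper name `port_in_use` is hypothetical.

```python
import socket


def port_in_use(host: str, port: int) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising.
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use("127.0.0.1", 8800)` while the original datapusher is still running should return True, which would explain the "Address already in use" failure.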

DP+ config, /etc/ckan/datapusher-plus/uwsgi.ini:

[uwsgi]
http=0.0.0.0:8800
uid = www-data
guid = www-data
virtualenv = /usr/lib/ckan/dpplus_venv
module = datapusher.wsgi:application
master=true
pidfile = /tmp/%n.pid
harakiri = 50
max-requests = 5000
vacuum = true
buffer-size = 32768

# see High Availability Setup
workers = 3
thread = 3
lazy-apps=true

Original CKAN datapusher config:
/etc/ckan/datapusherdatapusher-uwsgi.ini

[uwsgi]
http=127.0.0.1:8800
uid = www-data
guid = www-data
wsgi-file = /etc/ckan/datapusher/datapusher.wsgi
virtualenv = /usr/lib/ckan/datapusher
master=true
pidfile = /tmp/%n.pid
harakiri = 50
max-requests = 5000
vacuum = true
callable = application
buffer-size = 32768

datapusher_plus-uwsgi.OUT contains no records

I have datapusher_plus-uwsgi.ERR here:
datapusher_plus-uwsgi.ERR

The last boot record in datapusher_plus-uwsgi.ERR is:

Starting uWSGI 2.0.21 (64bit) on [Wed Feb 1 08:08:11 2023]
compiled with version: 9.4.0 on 31 January 2023 14:07:28
os: Linux-5.15.0-58-generic #64~20.04.1-Ubuntu SMP Fri Jan 6 16:42:31 UTC 2023
nodename: CkanSrv
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 2
current working directory: /
writing pidfile to /tmp/uwsgi.pid
detected binary path: /usr/lib/ckan/dpplus_venv/bin/uwsgi
setuid() to 33
your process number limit is 7560
your memory page size is 4096 bytes
WARNING: you have enabled harakiri without post buffering. Slow upload could be rejected on post-unbuffered webservers
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
probably another instance of uWSGI is running on the same address (0.0.0.0:8800).
bind(): Address already in use [core/socket.c line 769]

By the way, where do I find the API Key to set in the parameter: ckan.datapusher.api_token = <api_token> ?

I have managed to get DP+ running, but I get this error when I try to upload data:

Upload error: An Error occurred while sending the job: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8800/job

I suspect I am missing the above parameter (ckan.datapusher.api_token = <api_token>) in my ckan.ini.

This is where I am now. I hope you can help me solve this so I can report how the story ends, and that it helps improve the installation and configuration process for DP+.

affiliated CKAN Service Provider jobs - "DataGroomers" that are meant to periodically groom datastore data

"Datagroomers" as the name implies, continuously "groom" the data in the background based on certain rules/recipes.

At the moment, I envision them as CKAN service provider jobs.

Several "datagroomers" come to mind:

  • libpostal datagroomer - for normalizing addresses
  • geocoding datagroomer
    • using qsv's built-in, low-resolution geonames geocoder
    • using the user's preferred geocoding service, leveraging qsv fetch
  • auto-tagging datagroomer - for adding tags based on certain domains (e.g. clean-energy tagger, internet of water tagger, etc)
  • related resources datagroomer - Link related resources based on their data dictionaries

Smarter alias creation

If an alias already exists, DP+ errors out (albeit gracefully).

It should automatically create another alias that's human-readable, but guaranteed to be unique. Perhaps, by adding the last four characters of its resourceid?
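
A minimal sketch of that suggestion follows. The function name, the "-" separator, and the fallback to the full resource ID when the four-character suffix also collides are all assumptions, not existing DP+ behavior.

```python
from typing import Set


def unique_alias(alias: str, resource_id: str, existing: Set[str]) -> str:
    """Return alias, or a suffixed variant if alias is already taken."""
    if alias not in existing:
        return alias
    # Keep it human-readable: append the last four characters of the ID.
    candidate = f"{alias}-{resource_id[-4:]}"
    if candidate not in existing:
        return candidate
    # Assumed fallback: the full resource ID is unique per resource.
    return f"{alias}-{resource_id}"
```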

Untrimmed column names cause DP+ to fail

Describe the bug
When a CSV or Excel file with untrimmed column names is pushed, it causes DP+ to fail.

Expected behavior
DP+ should automatically trim column names.

Additional context
DP+ uses the column names in a CSV/Excel file to create corresponding postgres columns when creating resource tables. Postgres column names have strict naming requirements. IRL, however, most column names that people use in Excel and CSV files do not follow this convention.

To overcome this, we create quoted identifiers so column names can have embedded spaces and special characters.

However, this workaround doesn't work when a column has leading or trailing spaces.
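
A sketch of the expected behavior: trim the name first, then build the quoted identifier. The helper name is hypothetical; doubling embedded double quotes is standard PostgreSQL quoting for identifiers.

```python
def quote_identifier(name: str) -> str:
    """Trim a column name and return it as a PostgreSQL quoted identifier."""
    trimmed = name.strip()
    if not trimmed:
        raise ValueError("column name is empty after trimming")
    # Inside a quoted identifier, a double quote is escaped by doubling it.
    return '"' + trimmed.replace('"', '""') + '"'
```

For instance, a header cell containing `  Unique Key ` becomes the identifier `"Unique Key"`, with the embedded space kept and the surrounding spaces removed.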

Smart date inferencing

With qsv stats we collect descriptive statistics when we infer each column's data type during the Analysis phase of a DP+ job.

For example, using the benchmark data from qsv, based on a 1M-row, 512 MB, 41-column sample of NYC's 311 data, the commands:

$ qsv index .\NYC_311_SR_2010-2020-sample-1M.csv  
$ qsv stats .\NYC_311_SR_2010-2020-sample-1M.csv > nyc311stats-simple.csv

yields the file below in 0.27 seconds:

field                          ,type    ,sum                ,min                                                                                ,max                                                                                                                                                                                                                                                                                                                                                       ,min_length ,max_length ,mean               ,stddev              ,variance
Unique Key                     ,Integer ,32687965858032     ,11465364                                                                           ,48478173                                                                                                                                                                                                                                                                                                                                                  ,8          ,8          ,32687965.858031966 ,9013895.335828971   ,81250309125279.27
Created Date                   ,String  ,                   ,01/01/2010 01:05:51 PM                                                             ,12/31/2019 12:58:50 PM                                                                                                                                                                                                                                                                                                                                    ,22         ,22         ,                   ,                    ,
Closed Date                    ,String  ,                   ,01/01/1900 12:00:00 AM                                                             ,12/31/2019 12:59:00 PM                                                                                                                                                                                                                                                                                                                                    ,0          ,22         ,                   ,                    ,
Agency                         ,String  ,                   ,3-1-1                                                                              ,TLC                                                                                                                                                                                                                                                                                                                                                       ,3          ,42         ,                   ,                    ,
Agency Name                    ,String  ,                   ,3-1-1                                                                              ,Valuation Policy                                                                                                                                                                                                                                                                                                                                          ,3          ,82         ,                   ,                    ,
Complaint Type                 ,String  ,                   ,../../WEB-INF/web.xml;x=                                                           ,ZTESTINT                                                                                                                                                                                                                                                                                                                                                  ,3          ,41         ,                   ,                    ,
Descriptor                     ,String  ,                   ,1 Missed Collection                                                                ,unknown odor/taste in drinking water (QA6)                                                                                                                                                                                                                                                                                                                ,0          ,80         ,                   ,                    ,
Location Type                  ,String  ,                   ,"1-, 2- and 3- Family Home"                                                        ,Wooded Area                                                                                                                                                                                                                                                                                                                                               ,0          ,36         ,                   ,                    ,
Incident Zip                   ,String  ,                   ,*                                                                                  ,XXXXX                                                                                                                                                                                                                                                                                                                                                     ,0          ,10         ,                   ,                    ,
Incident Address               ,String  ,                   ,* *                                                                                ,west 155 street and edgecombe avenue                                                                                                                                                                                                                                                                                                                      ,0          ,55         ,                   ,                    ,
Street Name                    ,String  ,                   ,*                                                                                  ,wyckoff avenue                                                                                                                                                                                                                                                                                                                                            ,0          ,55         ,                   ,                    ,
Cross Street 1                 ,String  ,                   ,1 AVE                                                                              ,mermaid                                                                                                                                                                                                                                                                                                                                                   ,0          ,32         ,                   ,                    ,
Cross Street 2                 ,String  ,                   ,1 AVE                                                                              ,surf                                                                                                                                                                                                                                                                                                                                                      ,0          ,35         ,                   ,                    ,
Intersection Street 1          ,String  ,                   ,1 AVE                                                                              ,flatlands AVE                                                                                                                                                                                                                                                                                                                                             ,0          ,35         ,                   ,                    ,
Intersection Street 2          ,String  ,                   ,1 AVE                                                                              ,glenwood RD                                                                                                                                                                                                                                                                                                                                               ,0          ,33         ,                   ,                    ,
Address Type                   ,String  ,                   ,ADDRESS                                                                            ,PLACENAME                                                                                                                                                                                                                                                                                                                                                 ,0          ,12         ,                   ,                    ,
City                           ,String  ,                   ,*                                                                                  ,YORKTOWN HEIGHTS                                                                                                                                                                                                                                                                                                                                          ,0          ,22         ,                   ,                    ,
Landmark                       ,String  ,                   ,1 AVENUE                                                                           ,ZULETTE AVENUE                                                                                                                                                                                                                                                                                                                                            ,0          ,32         ,                   ,                    ,
Facility Type                  ,String  ,                   ,DSNY Garage                                                                        ,School District                                                                                                                                                                                                                                                                                                                                           ,0          ,15         ,                   ,                    ,
Status                         ,String  ,                   ,Assigned                                                                           ,Unspecified                                                                                                                                                                                                                                                                                                                                               ,4          ,16         ,                   ,                    ,
Due Date                       ,String  ,                   ,01/01/2010 01:26:03 PM                                                             ,12/31/2018 12:59:20 PM                                                                                                                                                                                                                                                                                                                                    ,0          ,22         ,                   ,                    ,
Resolution Description         ,String  ,                   ,A DOB violation was issued for failing to comply with an existing Stop Work Order. ,"Your request was submitted to the Department of Homeless Services. The City?s outreach team will assess the homeless individual and offer appropriate assistance within 2 hours. If you asked to know the outcome of your request, you will get a call within 2 hours. No further status will be available through the NYC 311 App, 311, or 311 Online." ,0          ,934        ,                   ,                    ,
Resolution Action Updated Date ,String  ,                   ,01/01/2010 01:50:45 PM                                                             ,12/31/2019 12:58:00 PM                                                                                                                                                                                                                                                                                                                                    ,0          ,22         ,                   ,                    ,
Community Board                ,String  ,                   ,0 Unspecified                                                                      ,Unspecified STATEN ISLAND                                                                                                                                                                                                                                                                                                                                 ,8          ,25         ,                   ,                    ,
BBL                            ,Integer ,2082985217282449   ,0                                                                                  ,5270000501                                                                                                                                                                                                                                                                                                                                                ,0          ,10         ,2751798943.2415347 ,1168122117.923852   ,1.3645092823829053e18
Borough                        ,String  ,                   ,BRONX                                                                              ,Unspecified                                                                                                                                                                                                                                                                                                                                               ,5          ,13         ,                   ,                    ,
X Coordinate (State Plane)     ,Integer ,919555108413       ,913281                                                                             ,1067220                                                                                                                                                                                                                                                                                                                                                   ,0          ,7          ,1005337.5451259619 ,22512.45281021959   ,506810531.5323639
Y Coordinate (State Plane)     ,Integer ,188099299101       ,121152                                                                             ,271876                                                                                                                                                                                                                                                                                                                                                    ,0          ,6          ,205646.49782053265 ,31723.198493763975  ,1006361322.674749
Open Data Channel Type         ,String  ,                   ,MOBILE                                                                             ,UNKNOWN                                                                                                                                                                                                                                                                                                                                                   ,5          ,7          ,                   ,                    ,
Park Facility Name             ,String  ,                   ,"""Uncle"" Vito F. Maranzano Glendale Playground"                                  ,Zimmerman Playground                                                                                                                                                                                                                                                                                                                                      ,3          ,82         ,                   ,                    ,
Park Borough                   ,String  ,                   ,BRONX                                                                              ,Unspecified                                                                                                                                                                                                                                                                                                                                               ,5          ,13         ,                   ,                    ,
Vehicle Type                   ,String  ,                   ,Ambulette / Paratransit                                                            ,Green Taxi                                                                                                                                                                                                                                                                                                                                                ,0          ,23         ,                   ,                    ,
Taxi Company Borough           ,String  ,                   ,BRONX                                                                              ,Staten Island                                                                                                                                                                                                                                                                                                                                             ,0          ,13         ,                   ,                    ,
Taxi Pick Up Location          ,String  ,                   ,1 5 AVENUE MANHATTAN                                                               ,YORK AVENUE AND EAST 70 STREET                                                                                                                                                                                                                                                                                                                            ,0          ,60         ,                   ,                    ,
Bridge Highway Name            ,String  ,                   ,145th St. Br - Lenox Ave                                                           ,Willis Ave Br - 125th St/1st Ave                                                                                                                                                                                                                                                                                                                          ,0          ,42         ,                   ,                    ,
Bridge Highway Direction       ,String  ,                   ,Bronx Bound                                                                        ,Westbound/To Goethals Br                                                                                                                                                                                                                                                                                                                                  ,0          ,33         ,                   ,                    ,
Road Ramp                      ,String  ,                   ,N/A                                                                                ,Roadway                                                                                                                                                                                                                                                                                                                                                   ,0          ,7          ,                   ,                    ,
Bridge Highway Segment         ,String  ,                   ,1-1-1265963747                                                                     ,Wythe Ave/Kent Ave (Exit 31)                                                                                                                                                                                                                                                                                                                              ,0          ,100        ,                   ,                    ,
Latitude                       ,Float   ,30355391.760447357 ,40.1123853                                                                         ,40.9128688                                                                                                                                                                                                                                                                                                                                                ,0          ,18         ,40.72881808178842  ,0.0893143967633158  ,0.007977061469194996
Longitude                      ,Float   ,-55100392.94986465 ,-77.5195844                                                                        ,-73.7005968                                                                                                                                                                                                                                                                                                                                               ,0          ,18         ,-73.92999235194249 ,0.16351835417538158 ,0.026738252152225528
Location                       ,String  ,                   ,"(40.1123853, -77.5195844)"                                                        ,"(40.9128688, -73.9024731)"                                                                                                                                                                                                                                                                                                                               ,0          ,40         ,                   ,                    ,
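As a rough illustration of how the inferred `type` column above could be turned into datastore column types, here is a minimal Python sketch. The `QSV_TO_POSTGRES` mapping and the `column_types` helper are hypothetical names introduced for this example and are not DataPusher+'s actual code. The sample rows are copied from the output above with the padding shortened.

```python
import csv
import io

# Hypothetical mapping from the qsv types seen in the output above to
# PostgreSQL column types. Illustration only, not DataPusher+'s real mapping.
# "bigint" is used for Integer because values such as BBL's max (5270000501)
# do not fit in a 32-bit integer.
QSV_TO_POSTGRES = {
    "Integer": "bigint",
    "Float": "double precision",
    "DateTime": "timestamp",
    "String": "text",
}


def column_types(stats_csv: str) -> dict:
    """Map each field in a `qsv stats` CSV to an assumed PostgreSQL type."""
    reader = csv.reader(io.StringIO(stats_csv))
    # The output shown above is space-padded, so strip every cell we read.
    header = [cell.strip() for cell in next(reader)]
    field_idx = header.index("field")
    type_idx = header.index("type")

    result = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        name = row[field_idx].strip()
        qsv_type = row[type_idx].strip()
        # Fall back to text for any type the mapping does not know about.
        result[name] = QSV_TO_POSTGRES.get(qsv_type, "text")
    return result


sample = """field     ,type    ,sum
BBL       ,Integer ,2082985217282449
Borough   ,String  ,
Latitude  ,Float   ,30355391.760447357
"""

types = column_types(sample)
for name, pg_type in types.items():
    print(f"{name}: {pg_type}")
```

Falling back to `text` for unknown types is a conservative choice in this sketch: a column loaded as text can always be cast later, whereas a wrong numeric type would make the load fail.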

Adding the --everything and --infer-dates options...

$ qsv stats --everything --infer-dates .\NYC_311_SR_2010-2020-sample-1M.csv > nyc311stats.-everything-inferdates.csv

yields the file below in 103.89 seconds, more than 3 orders of magnitude slower than the run without these options!

field                          ,type     ,sum                 ,min                                                                                ,max                                                                                                                                                                                                                                                                                                                                                       ,min_length ,max_length ,mean               ,stddev              ,variance             ,lower_fence        ,q1           ,q2_median    ,q3           ,iqr                 ,upper_fence        ,skew                  ,mode                                                                                                                                    ,cardinality ,nullcount
Unique Key                     ,Integer  ,32687965858032      ,11465364                                                                           ,48478173                                                                                                                                                                                                                                                                                                                                                  ,8          ,8          ,32687965.85803196  ,9013895.335828971   ,81250309125279.27    ,2803282.25         ,25245773.0   ,32853358.5   ,40207433.5   ,14961660.5          ,62649924.25        ,-0.055045893858106744 ,                                                                                                                                        ,1000000     ,0
Created Date                   ,DateTime ,                    ,2010-01-01 05:00:00 UTC                                                            ,2020-12-23 06:25:51 UTC                                                                                                                                                                                                                                                                                                                                   ,22         ,22         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,01/24/2013 12:00:00 AM                                                                                                                  ,841014      ,0
Closed Date                    ,DateTime ,                    ,1900-01-01 05:00:00 UTC                                                            ,2100-01-01 05:00:00 UTC                                                                                                                                                                                                                                                                                                                                   ,0          ,22         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,688837      ,28619
Agency                         ,String   ,                    ,3-1-1                                                                              ,TLC                                                                                                                                                                                                                                                                                                                                                       ,3          ,42         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,NYPD                                                                                                                                    ,28          ,0
Agency Name                    ,String   ,                    ,3-1-1                                                                              ,Valuation Policy                                                                                                                                                                                                                                                                                                                                          ,3          ,82         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,New York City Police Department                                                                                                         ,553         ,0
Complaint Type                 ,String   ,                    ,../../WEB-INF/web.xml;x=                                                           ,ZTESTINT                                                                                                                                                                                                                                                                                                                                                  ,3          ,41         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,Noise - Residential                                                                                                                     ,287         ,0
Descriptor                     ,String   ,                    ,1 Missed Collection                                                                ,unknown odor/taste in drinking water (QA6)                                                                                                                                                                                                                                                                                                                ,0          ,80         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,Loud Music/Party                                                                                                                        ,1392        ,3001
Location Type                  ,String   ,                    ,"1-, 2- and 3- Family Home"                                                        ,Wooded Area                                                                                                                                                                                                                                                                                                                                               ,0          ,36         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,RESIDENTIAL BUILDING                                                                                                                    ,162         ,239131
Incident Zip                   ,String   ,                    ,*                                                                                  ,XXXXX                                                                                                                                                                                                                                                                                                                                                     ,0          ,10         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,535         ,54978
Incident Address               ,String   ,                    ,* *                                                                                ,west 155 street and edgecombe avenue                                                                                                                                                                                                                                                                                                                      ,0          ,55         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,341996      ,174700
Street Name                    ,String   ,                    ,*                                                                                  ,wyckoff avenue                                                                                                                                                                                                                                                                                                                                            ,0          ,55         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,14837       ,174720
Cross Street 1                 ,String   ,                    ,1 AVE                                                                              ,mermaid                                                                                                                                                                                                                                                                                                                                                   ,0          ,32         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,16238       ,320401
Cross Street 2                 ,String   ,                    ,1 AVE                                                                              ,surf                                                                                                                                                                                                                                                                                                                                                      ,0          ,35         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,16486       ,323644
Intersection Street 1          ,String   ,                    ,1 AVE                                                                              ,flatlands AVE                                                                                                                                                                                                                                                                                                                                             ,0          ,35         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,11237       ,767422
Intersection Street 2          ,String   ,                    ,1 AVE                                                                              ,glenwood RD                                                                                                                                                                                                                                                                                                                                               ,0          ,33         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,11674       ,767709
Address Type                   ,String   ,                    ,ADDRESS                                                                            ,PLACENAME                                                                                                                                                                                                                                                                                                                                                 ,0          ,12         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,ADDRESS                                                                                                                                 ,6           ,125802
City                           ,String   ,                    ,*                                                                                  ,YORKTOWN HEIGHTS                                                                                                                                                                                                                                                                                                                                          ,0          ,22         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,BROOKLYN                                                                                                                                ,382         ,61963
Landmark                       ,String   ,                    ,1 AVENUE                                                                           ,ZULETTE AVENUE                                                                                                                                                                                                                                                                                                                                            ,0          ,32         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,5915        ,912779
Facility Type                  ,String   ,                    ,DSNY Garage                                                                        ,School District                                                                                                                                                                                                                                                                                                                                           ,0          ,15         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,N/A                                                                                                                                     ,6           ,145478
Status                         ,String   ,                    ,Assigned                                                                           ,Unspecified                                                                                                                                                                                                                                                                                                                                               ,4          ,16         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,Closed                                                                                                                                  ,10          ,0
Due Date                       ,DateTime ,                    ,1900-01-02 05:00:00 UTC                                                            ,2021-06-17 20:34:13 UTC                                                                                                                                                                                                                                                                                                                                   ,0          ,22         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,345077      ,647794
Resolution Description         ,String   ,                    ,A DOB violation was issued for failing to comply with an existing Stop Work Order. ,"Your request was submitted to the Department of Homeless Services. The City?s outreach team will assess the homeless individual and offer appropriate assistance within 2 hours. If you asked to know the outcome of your request, you will get a call within 2 hours. No further status will be available through the NYC 311 App, 311, or 311 Online." ,0          ,934        ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,The Police Department responded to the complaint and with the information available observed no evidence of the violation at that time. ,1216        ,20480
Resolution Action Updated Date ,DateTime ,                    ,2009-12-31 06:35:00 UTC                                                            ,2020-12-23 11:56:14 UTC                                                                                                                                                                                                                                                                                                                                   ,0          ,22         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,690314      ,15072
Community Board                ,String   ,                    ,0 Unspecified                                                                      ,Unspecified STATEN ISLAND                                                                                                                                                                                                                                                                                                                                 ,8          ,25         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,0 Unspecified                                                                                                                           ,77          ,0
BBL                            ,Integer  ,2082985217282449    ,0                                                                                  ,5270000501                                                                                                                                                                                                                                                                                                                                                ,0          ,10         ,2751798943.241534  ,1168122117.9238517  ,1.364509282382905e18 ,-941195045.5       ,2028310001.0 ,3019480063.0 ,4007980032.0 ,1979670031.0        ,6977485078.5       ,-0.6874652461017321   ,                                                                                                                                        ,268383      ,243046
Borough                        ,String   ,                    ,BRONX                                                                              ,Unspecified                                                                                                                                                                                                                                                                                                                                               ,5          ,13         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,BROOKLYN                                                                                                                                ,6           ,0
X Coordinate (State Plane)     ,Integer  ,919555108413        ,913281                                                                             ,1067220                                                                                                                                                                                                                                                                                                                                                   ,0          ,7          ,1005337.5451259615 ,22512.45281021959   ,506810531.5323639    ,956616.5           ,993572.0     ,1004546.0    ,1018209.0    ,24637.0             ,1055164.5          ,0.105480970816589     ,                                                                                                                                        ,102556      ,85327
Y Coordinate (State Plane)     ,Integer  ,188099299101        ,121152                                                                             ,271876                                                                                                                                                                                                                                                                                                                                                    ,0          ,6          ,205646.49782053265 ,31723.19849376398   ,1006361322.6747493   ,103334.0           ,182411.0     ,202514.0     ,235129.0     ,52718.0             ,314206.0           ,0.29623410966726027   ,                                                                                                                                        ,116092      ,85327
Open Data Channel Type         ,String   ,                    ,MOBILE                                                                             ,UNKNOWN                                                                                                                                                                                                                                                                                                                                                   ,5          ,7          ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,PHONE                                                                                                                                   ,5           ,0
Park Facility Name             ,String   ,                    ,"""Uncle"" Vito F. Maranzano Glendale Playground"                                  ,Zimmerman Playground                                                                                                                                                                                                                                                                                                                                      ,3          ,82         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,Unspecified                                                                                                                             ,1889        ,0
Park Borough                   ,String   ,                    ,BRONX                                                                              ,Unspecified                                                                                                                                                                                                                                                                                                                                               ,5          ,13         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,BROOKLYN                                                                                                                                ,6           ,0
Vehicle Type                   ,String   ,                    ,Ambulette / Paratransit                                                            ,Green Taxi                                                                                                                                                                                                                                                                                                                                                ,0          ,23         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,5           ,999652
Taxi Company Borough           ,String   ,                    ,BRONX                                                                              ,Staten Island                                                                                                                                                                                                                                                                                                                                             ,0          ,13         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,11          ,999156
Taxi Pick Up Location          ,String   ,                    ,1 5 AVENUE MANHATTAN                                                               ,YORK AVENUE AND EAST 70 STREET                                                                                                                                                                                                                                                                                                                            ,0          ,60         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,1903        ,992129
Bridge Highway Name            ,String   ,                    ,145th St. Br - Lenox Ave                                                           ,Willis Ave Br - 125th St/1st Ave                                                                                                                                                                                                                                                                                                                          ,0          ,42         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,68          ,997711
Bridge Highway Direction       ,String   ,                    ,Bronx Bound                                                                        ,Westbound/To Goethals Br                                                                                                                                                                                                                                                                                                                                  ,0          ,33         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,50          ,997691
Road Ramp                      ,String   ,                    ,N/A                                                                                ,Roadway                                                                                                                                                                                                                                                                                                                                                   ,0          ,7          ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,4           ,997693
Bridge Highway Segment         ,String   ,                    ,1-1-1265963747                                                                     ,Wythe Ave/Kent Ave (Exit 31)                                                                                                                                                                                                                                                                                                                              ,0          ,100        ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,937         ,997556
Latitude                       ,Float    ,30355391.760447353  ,40.1123853                                                                         ,40.9128688                                                                                                                                                                                                                                                                                                                                                ,0          ,18         ,40.72881808178841  ,0.08931439676331582 ,0.007977061469195    ,40.46458052499999  ,40.6677055   ,40.7221652   ,40.80312215  ,0.13541665000000336 ,41.006247125       ,0.2234650413429593    ,                                                                                                                                        ,353694      ,254695
Longitude                      ,Float    ,-55100392.949864656 ,-77.5195844                                                                        ,-73.7005968                                                                                                                                                                                                                                                                                                                                               ,0          ,18         ,-73.92999235194247 ,0.16351835417538158 ,0.02673825215222553  ,-74.11194174999999 ,-73.970536   ,-73.9279455  ,-73.8762655  ,0.09427049999999326 ,-73.73485975000001 ,-0.03755270078620233  ,                                                                                                                                        ,353996      ,254695
Location                       ,String   ,                    ,"(40.1123853, -77.5195844)"                                                        ,"(40.9128688, -73.9024731)"                                                                                                                                                                                                                                                                                                                               ,0          ,40         ,                   ,                    ,                     ,                   ,             ,             ,             ,                    ,                   ,                      ,                                                                                                                                        ,375772      ,254695

while the command:

 qsv stats --everything .\NYC_311_SR_2010-2020-sample-1M.csv > nyc311stats-everything.csv

yields the file below in only 3.60 seconds. The only difference is that we didn't use the --infer-dates option, so date fields and their min/max values are treated as Strings.

field                          ,type    ,sum                ,min                                                                                ,max                                                                                                                                                                                                                                                                                                                                                       ,min_length ,max_length ,mean               ,stddev              ,variance              ,lower_fence        ,q1           ,q2_median    ,q3           ,iqr                 ,upper_fence        ,skew                 ,mode                                                                                                                                    ,cardinality ,nullcount
Unique Key                     ,Integer ,32687965858032     ,11465364                                                                           ,48478173                                                                                                                                                                                                                                                                                                                                                  ,8          ,8          ,32687965.858031962 ,9013895.335828971   ,81250309125279.27     ,2803282.25         ,25245773.0   ,32853358.5   ,40207433.5   ,14961660.5          ,62649924.25        ,-0.0550458938581055  ,                                                                                                                                        ,1000000     ,0
Created Date                   ,String  ,                   ,01/01/2010 01:05:51 PM                                                             ,12/31/2019 12:58:50 PM                                                                                                                                                                                                                                                                                                                                    ,22         ,22         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,01/24/2013 12:00:00 AM                                                                                                                  ,841014      ,0
Closed Date                    ,String  ,                   ,01/01/1900 12:00:00 AM                                                             ,12/31/2019 12:59:00 PM                                                                                                                                                                                                                                                                                                                                    ,0          ,22         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,688837      ,28619
Agency                         ,String  ,                   ,3-1-1                                                                              ,TLC                                                                                                                                                                                                                                                                                                                                                       ,3          ,42         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,NYPD                                                                                                                                    ,28          ,0
Agency Name                    ,String  ,                   ,3-1-1                                                                              ,Valuation Policy                                                                                                                                                                                                                                                                                                                                          ,3          ,82         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,New York City Police Department                                                                                                         ,553         ,0
Complaint Type                 ,String  ,                   ,../../WEB-INF/web.xml;x=                                                           ,ZTESTINT                                                                                                                                                                                                                                                                                                                                                  ,3          ,41         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,Noise - Residential                                                                                                                     ,287         ,0
Descriptor                     ,String  ,                   ,1 Missed Collection                                                                ,unknown odor/taste in drinking water (QA6)                                                                                                                                                                                                                                                                                                                ,0          ,80         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,Loud Music/Party                                                                                                                        ,1392        ,3001
Location Type                  ,String  ,                   ,"1-, 2- and 3- Family Home"                                                        ,Wooded Area                                                                                                                                                                                                                                                                                                                                               ,0          ,36         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,RESIDENTIAL BUILDING                                                                                                                    ,162         ,239131
Incident Zip                   ,String  ,                   ,*                                                                                  ,XXXXX                                                                                                                                                                                                                                                                                                                                                     ,0          ,10         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,535         ,54978
Incident Address               ,String  ,                   ,* *                                                                                ,west 155 street and edgecombe avenue                                                                                                                                                                                                                                                                                                                      ,0          ,55         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,341996      ,174700
Street Name                    ,String  ,                   ,*                                                                                  ,wyckoff avenue                                                                                                                                                                                                                                                                                                                                            ,0          ,55         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,14837       ,174720
Cross Street 1                 ,String  ,                   ,1 AVE                                                                              ,mermaid                                                                                                                                                                                                                                                                                                                                                   ,0          ,32         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,16238       ,320401
Cross Street 2                 ,String  ,                   ,1 AVE                                                                              ,surf                                                                                                                                                                                                                                                                                                                                                      ,0          ,35         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,16486       ,323644
Intersection Street 1          ,String  ,                   ,1 AVE                                                                              ,flatlands AVE                                                                                                                                                                                                                                                                                                                                             ,0          ,35         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,11237       ,767422
Intersection Street 2          ,String  ,                   ,1 AVE                                                                              ,glenwood RD                                                                                                                                                                                                                                                                                                                                               ,0          ,33         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,11674       ,767709
Address Type                   ,String  ,                   ,ADDRESS                                                                            ,PLACENAME                                                                                                                                                                                                                                                                                                                                                 ,0          ,12         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,ADDRESS                                                                                                                                 ,6           ,125802
City                           ,String  ,                   ,*                                                                                  ,YORKTOWN HEIGHTS                                                                                                                                                                                                                                                                                                                                          ,0          ,22         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,BROOKLYN                                                                                                                                ,382         ,61963
Landmark                       ,String  ,                   ,1 AVENUE                                                                           ,ZULETTE AVENUE                                                                                                                                                                                                                                                                                                                                            ,0          ,32         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,5915        ,912779
Facility Type                  ,String  ,                   ,DSNY Garage                                                                        ,School District                                                                                                                                                                                                                                                                                                                                           ,0          ,15         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,N/A                                                                                                                                     ,6           ,145478
Status                         ,String  ,                   ,Assigned                                                                           ,Unspecified                                                                                                                                                                                                                                                                                                                                               ,4          ,16         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,Closed                                                                                                                                  ,10          ,0
Due Date                       ,String  ,                   ,01/01/2010 01:26:03 PM                                                             ,12/31/2018 12:59:20 PM                                                                                                                                                                                                                                                                                                                                    ,0          ,22         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,345077      ,647794
Resolution Description         ,String  ,                   ,A DOB violation was issued for failing to comply with an existing Stop Work Order. ,"Your request was submitted to the Department of Homeless Services. The City?s outreach team will assess the homeless individual and offer appropriate assistance within 2 hours. If you asked to know the outcome of your request, you will get a call within 2 hours. No further status will be available through the NYC 311 App, 311, or 311 Online." ,0          ,934        ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,The Police Department responded to the complaint and with the information available observed no evidence of the violation at that time. ,1216        ,20480
Resolution Action Updated Date ,String  ,                   ,01/01/2010 01:50:45 PM                                                             ,12/31/2019 12:58:00 PM                                                                                                                                                                                                                                                                                                                                    ,0          ,22         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,690314      ,15072
Community Board                ,String  ,                   ,0 Unspecified                                                                      ,Unspecified STATEN ISLAND                                                                                                                                                                                                                                                                                                                                 ,8          ,25         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,0 Unspecified                                                                                                                           ,77          ,0
BBL                            ,Integer ,2082985217282449   ,0                                                                                  ,5270000501                                                                                                                                                                                                                                                                                                                                                ,0          ,10         ,2751798943.2415357 ,1168122117.9238517  ,1.3645092823829048e18 ,-941195045.5       ,2028310001.0 ,3019480063.0 ,4007980032.0 ,1979670031.0        ,6977485078.5       ,-0.6874652461017284  ,                                                                                                                                        ,268383      ,243046
Borough                        ,String  ,                   ,BRONX                                                                              ,Unspecified                                                                                                                                                                                                                                                                                                                                               ,5          ,13         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,BROOKLYN                                                                                                                                ,6           ,0
X Coordinate (State Plane)     ,Integer ,919555108413       ,913281                                                                             ,1067220                                                                                                                                                                                                                                                                                                                                                   ,0          ,7          ,1005337.5451259618 ,22512.45281021959   ,506810531.5323639     ,956616.5           ,993572.0     ,1004546.0    ,1018209.0    ,24637.0             ,1055164.5          ,0.10548097081662003  ,                                                                                                                                        ,102556      ,85327
Y Coordinate (State Plane)     ,Integer ,188099299101       ,121152                                                                             ,271876                                                                                                                                                                                                                                                                                                                                                    ,0          ,6          ,205646.49782053265 ,31723.19849376398   ,1006361322.6747493    ,103334.0           ,182411.0     ,202514.0     ,235129.0     ,52718.0             ,314206.0           ,0.29623410966726027  ,                                                                                                                                        ,116092      ,85327
Open Data Channel Type         ,String  ,                   ,MOBILE                                                                             ,UNKNOWN                                                                                                                                                                                                                                                                                                                                                   ,5          ,7          ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,PHONE                                                                                                                                   ,5           ,0
Park Facility Name             ,String  ,                   ,"""Uncle"" Vito F. Maranzano Glendale Playground"                                  ,Zimmerman Playground                                                                                                                                                                                                                                                                                                                                      ,3          ,82         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,Unspecified                                                                                                                             ,1889        ,0
Park Borough                   ,String  ,                   ,BRONX                                                                              ,Unspecified                                                                                                                                                                                                                                                                                                                                               ,5          ,13         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,BROOKLYN                                                                                                                                ,6           ,0
Vehicle Type                   ,String  ,                   ,Ambulette / Paratransit                                                            ,Green Taxi                                                                                                                                                                                                                                                                                                                                                ,0          ,23         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,5           ,999652
Taxi Company Borough           ,String  ,                   ,BRONX                                                                              ,Staten Island                                                                                                                                                                                                                                                                                                                                             ,0          ,13         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,11          ,999156
Taxi Pick Up Location          ,String  ,                   ,1 5 AVENUE MANHATTAN                                                               ,YORK AVENUE AND EAST 70 STREET                                                                                                                                                                                                                                                                                                                            ,0          ,60         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,1903        ,992129
Bridge Highway Name            ,String  ,                   ,145th St. Br - Lenox Ave                                                           ,Willis Ave Br - 125th St/1st Ave                                                                                                                                                                                                                                                                                                                          ,0          ,42         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,68          ,997711
Bridge Highway Direction       ,String  ,                   ,Bronx Bound                                                                        ,Westbound/To Goethals Br                                                                                                                                                                                                                                                                                                                                  ,0          ,33         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,50          ,997691
Road Ramp                      ,String  ,                   ,N/A                                                                                ,Roadway                                                                                                                                                                                                                                                                                                                                                   ,0          ,7          ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,4           ,997693
Bridge Highway Segment         ,String  ,                   ,1-1-1265963747                                                                     ,Wythe Ave/Kent Ave (Exit 31)                                                                                                                                                                                                                                                                                                                              ,0          ,100        ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,937         ,997556
Latitude                       ,Float   ,30355391.760447357 ,40.1123853                                                                         ,40.9128688                                                                                                                                                                                                                                                                                                                                                ,0          ,18         ,40.72881808178842  ,0.0893143967633158  ,0.007977061469194998  ,40.46458052499999  ,40.6677055   ,40.7221652   ,40.80312215  ,0.13541665000000336 ,41.006247125       ,0.223465041343198    ,                                                                                                                                        ,353694      ,254695
Longitude                      ,Float   ,-55100392.94986466 ,-77.5195844                                                                        ,-73.7005968                                                                                                                                                                                                                                                                                                                                               ,0          ,18         ,-73.92999235194246 ,0.16351835417538155 ,0.02673825215222552   ,-74.11194174999999 ,-73.970536   ,-73.9279455  ,-73.8762655  ,0.09427049999999326 ,-73.73485975000001 ,-0.03755270078594161 ,                                                                                                                                        ,353996      ,254695
Location                       ,String  ,                   ,"(40.1123853, -77.5195844)"                                                        ,"(40.9128688, -73.9024731)"                                                                                                                                                                                                                                                                                                                               ,0          ,40         ,                   ,                    ,                      ,                   ,             ,             ,             ,                    ,                   ,                     ,                                                                                                                                        ,375772      ,254695

Clearly, --infer-dates is a very expensive operation, and understandably so, since qsv's date parser engine has to parse and recognize 15 different date formats, with each format having several permutations.

Currently, DP+ uses the --infer-dates option during its analysis phase, which I'd still like to keep, as it's very useful when it does infer that a column is a date field.

Perhaps we should only attempt to infer dates when a quick initial scan of the CSV headers suggests the presence of a date field (i.e. search for "date", "time", "timestamp" or "datetime" anywhere in a column name)?
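A minimal sketch of that header pre-scan, assuming plain substring matching on lowercased column names. The function name `headers_suggest_dates` and the `DATE_HINTS` tuple are illustrative and not part of DP+.

```python
import csv
import io

# Keywords taken from the suggestion above; the tuple name is hypothetical.
DATE_HINTS = ("date", "time", "timestamp", "datetime")


def headers_suggest_dates(csv_text, hints=DATE_HINTS):
    """Return the column names whose lowercased form contains any hint."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [name for name in header if any(h in name.lower() for h in hints)]


sample = "Unique Key,Created Date,Agency,Due Date,Latitude\n1,01/01/2010,NYPD,,40.7\n"
candidates = headers_suggest_dates(sample)
# Only pass --infer-dates to qsv when at least one header looks date-like.
use_infer_dates = bool(candidates)
```

Substring matching is deliberately loose: a column such as "Updated By" also contains "date", so this check can only decide whether the expensive inference is worth attempting, not which columns are dates.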

Question: how close will this project be to datapusher?

Are they meant to diverge or is it expected that stuff that is merged to datapusher will keep being merged here?

I think this is important to consider, because some great improvements could be made to datapusher-plus if it properly diverges from datapusher. Decisions made there would no longer have to apply here; for instance, a proper package install or a container image could be considered.

How likely is it that datapusher will be replaced with datapusher-plus if it outperforms it? What is the relationship between the maintainers of this project and those of datapusher (and ckan in general)?

I know that's a lot of questions, but the answers would give me some context and an idea of how I can contribute.

Containerfile is missing the config module

I could not get the Containerfile for version 0.10.1 to run. The error is:

Traceback (most recent call last):
  File "/usr/lib/ckan/datapusher/venv/bin/datapusher_initdb", line 11, in <module>
    load_entry_point('datapusher-plus', 'console_scripts', 'datapusher_initdb')()
  File "/usr/lib/ckan/datapusher/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/ckan/datapusher/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
    return ep.load()
  File "/usr/lib/ckan/datapusher/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2443, in load
    return self.resolve()
  File "/usr/lib/ckan/datapusher/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2449, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/ckan/datapusher/code/datapusher/main.py", line 5, in <module>
    from config import config
ModuleNotFoundError: No module named 'config'

Roadmap Tracking Issue - EPIC

OVERALL VISION: To increase the utility and performance of the CKAN Datastore:

  • by enriching resources, so that right after a file is pushed by DP+, it does a lot of data-wrangling tasks that are typically done manually:
    • a lot of metadata is inferred, so the Data Publisher does not have to laboriously enter it in
    • descriptive statistics are computed, allowing the Data Publisher and the end-user to better understand the resource
    • location information is automatically normalized and geocoded
    • related datasets/resources are automatically inferred
    • auto-tagging
  • by taking advantage of PostgreSQL native features
    • also use it as a Document Database leveraging JSONB?
    • partitioning/sharding?
  • by tapping into the rich PostgreSQL extensions ecosystem (in particular - PostGIS, Timescale, Citus, CartoDB, Apache Age and ZomboDB)
  • give it "Data Lake"-like capabilities
  • enable Datastore API users to issue performant, reliable SQL queries

  • #98
  • #18
  • #11
  • Auto-tagging
  • Automatic spatial extent calculation
  • Automatic processing/recognition of whitelisted common column names (e.g. latitude, longitude, status, open date, closed date, etc.)
  • #53
  • #47
  • #27
  • #9
  • Auto partitioning
  • #60
  • Deferred datapush on initial package creation to allow per package Datapusher+ Configuration
  • #87
  • #17
  • Enabling record-level search
  • #8
  • #13
  • #54
  • #10
  • #19
  • #30
  • Native PostGIS support
  • Native time-series support with Timescale
  • #34
  • #35
  • #46

Per resource Datapusher+ job configuration

If a resource extra field called "dpplus_job_config" is defined and contains a valid DP+ job configuration JSON, DP+ will apply the configuration just for that job.

This will allow us to selectively do things like geocoding, PII screening, auto-spatial extent calculation, and adding summary stats on a per job basis.

Perhaps, a standard DP+ scheming widget can be created with the necessary controls, so CKAN installations can quickly add a DP+ job configurator to the resource schema.
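A rough sketch of how a per-job configuration could be resolved, assuming the resource is a plain dict and the extra field holds a JSON object. The option names in `DEFAULT_JOB_CONFIG` are invented for illustration, since the actual DP+ job configuration keys are not specified here.

```python
import json

# Hypothetical defaults; the real DP+ option names are not defined in this issue.
DEFAULT_JOB_CONFIG = {"geocode": False, "pii_screening": False, "summary_stats": True}


def job_config_for(resource, defaults=DEFAULT_JOB_CONFIG):
    """Merge a resource's dpplus_job_config over the defaults.

    Falls back to the defaults when the field is missing or is not a
    valid JSON object.
    """
    raw = resource.get("dpplus_job_config")
    if not raw:
        return dict(defaults)
    try:
        overrides = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError:
        return dict(defaults)
    if not isinstance(overrides, dict):
        return dict(defaults)
    return {**defaults, **overrides}


config = job_config_for({"dpplus_job_config": '{"geocode": true}'})
```

Falling back silently is only one option; the job could equally log the invalid configuration or abort, depending on how strict the installation wants to be.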

Scanning for Personally Identifiable Information

qsv can scan a CSV with several regex patterns in one pass using the searchset command.

Let's leverage qsv searchset and create a list of configurable regex patterns that Datapusher+ can use.
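To illustrate the idea of one pass over the data with a configurable set of patterns, here is a pure-Python stand-in. It does not call qsv; the two patterns and the names `PII_PATTERNS` and `scan_for_pii` are simplified assumptions, not a vetted PII rule set.

```python
import csv
import io
import re

# Hypothetical, deliberately simple patterns; a real list would come from config.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_for_pii(csv_text, patterns=PII_PATTERNS):
    """Return (row_number, column_name, pattern_name) for every match, in one pass."""
    hits = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            for name, pattern in patterns.items():
                if value and pattern.search(value):
                    hits.append((row_number, column, name))
    return hits


sample = "name,contact,note\nAnn,ann@example.org,ok\nBob,555-1234,SSN 123-45-6789\n"
hits = scan_for_pii(sample)
```

In DP+ itself the pattern list would presumably be written to a regex set file and handed to qsv searchset, which does the same work far faster than this loop.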

URL parameters can break temp files

Describe the bug

2023-04-13 15:51:05,819 INFO Fetching from: https://statistik.leipzig.de/opendata/api/values?kategorie_nr=5&rubrik_nr=2&periode=y&format=csv...
[pid: 6|app: 0|req: 2/2] 127.0.0.1 () {32 vars in 501 bytes} [Thu Apr 13 15:51:05 2023] GET /job/54ebecf7-2a52-4347-a241-e6f651b7f9e0 => generated 930 bytes in 2 msecs (HTTP/1.1 200) 2 headers in 72 bytes (1 switches on core 1)
2023-04-13 15:51:06,124 ERROR Job "push_to_datastore (trigger: date[2023-04-13 15:51:05 UTC], next run at: 2023-04-13 15:51:05 UTC)" raised an exception
Traceback (most recent call last):
  File "/usr/lib/datapusher-plus/lib/python3.10/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/usr/lib/datapusher-plus/src/datapusher-plus/datapusher/jobs.py", line 471, in push_to_datastore
    tmp = tempfile.NamedTemporaryFile(suffix="." + resource.get("format").lower())
  File "/usr/lib/python3.10/tempfile.py", line 698, in NamedTemporaryFile
    file = _io.open(dir, mode, buffering=buffering,
  File "/usr/lib/python3.10/tempfile.py", line 695, in opener
    fd, name = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/usr/lib/python3.10/tempfile.py", line 395, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmppdo2j9wn.text/csv'

To Reproduce
Steps to reproduce the behavior:

  1. Create a resource with this link: https://statistik.leipzig.de/opendata/api/values?kategorie_nr=5&rubrik_nr=2&periode=y&format=csv
  2. Upload resource with datapusher-plus
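The traceback shows the resource format ("text/csv") being used directly as the temp file suffix, so the slash is read as a directory separator. One possible workaround, sketched under the assumption that only the last path-like segment of the format matters, is to sanitize the suffix first. `safe_suffix` is a hypothetical helper, not existing DP+ code.

```python
import re
import tempfile


def safe_suffix(resource_format):
    """Turn a resource format such as 'text/csv' into a filesystem-safe suffix."""
    # Keep only the part after the last slash, so a MIME type yields its subtype.
    last_part = (resource_format or "").lower().rsplit("/", 1)[-1]
    # Drop anything that is not a lowercase letter or digit.
    cleaned = re.sub(r"[^a-z0-9]", "", last_part)
    return "." + cleaned if cleaned else ""


# The same call that failed in the traceback, with a sanitized suffix.
with tempfile.NamedTemporaryFile(suffix=safe_suffix("text/csv")) as tmp:
    tmp_name = tmp.name
```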

Installer/Tester script

DP+ is currently being containerized and that will make for an easier deployment.

Independent of that effort, DP+ should have a near bulletproof installer script that:

  • checks if Python 3.8+ is installed
  • creates the Python 3.8+ venv
  • downloads the required version of qsv
  • creates the datapusher_jobs databases
  • fetches the datastore.write_url sqlalchemy connect string from a provided ckan.ini
  • creates the datapusher user in the CKAN Datastore
  • confirms that the Datastore is installed and ready
  • configures supervisor
  • and pushes some sample CSV, TSV, Excel and ODS files
  • and exercises the different DP+ options (summary stats, PII screening, etc.)

The installer script can also be called to "test/verify" an already installed DP+ at any time.
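Two of the checks above can be sketched with the standard library alone. This assumes the connect string lives under the `ckan.datastore.write_url` key in the `[app:main]` section of ckan.ini; the function names and the sample connect string are illustrative.

```python
import configparser
import os
import sys
import tempfile


def python_ok(minimum=(3, 8)):
    """Check that the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def datastore_write_url(ckan_ini_path):
    """Read the Datastore write URL from ckan.ini, or None if it is absent."""
    # interpolation=None: ini values may contain '%', which ConfigParser
    # would otherwise try to expand.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(ckan_ini_path)
    return parser.get("app:main", "ckan.datastore.write_url", fallback=None)


# Demonstration against a throwaway ini file with a made-up connect string.
with tempfile.TemporaryDirectory() as tmp_dir:
    ini_path = os.path.join(tmp_dir, "ckan.ini")
    with open(ini_path, "w") as fh:
        fh.write("[app:main]\n")
        fh.write("ckan.datastore.write_url = postgresql://ckan:pass@localhost/datastore\n")
    write_url = datastore_write_url(ini_path)
    missing = datastore_write_url(os.path.join(tmp_dir, "absent.ini"))
```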

Add resource constrained warnings

If DP+ is deployed on a resource-constrained host (low memory and low working disk space), add a warning when starting up.

Continue monitoring disk space and datastore utilization as well, and warn the administrator if they go past pre-configured thresholds.
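A small sketch of the disk-space half of this check, using `shutil.disk_usage` from the standard library. The threshold value and the function name `disk_warnings` are assumptions; a memory check is left out because it has no portable standard-library equivalent.

```python
import shutil
from collections import namedtuple

# Same field layout as the tuple returned by shutil.disk_usage.
Usage = namedtuple("Usage", "total used free")


def disk_warnings(path=".", min_free_bytes=1_000_000_000, usage=None):
    """Return warning strings when free disk space is below a threshold."""
    if usage is None:
        usage = shutil.disk_usage(path)
    warnings = []
    if usage.free < min_free_bytes:
        warnings.append(
            f"Low working disk space on {path}: "
            f"{usage.free} bytes free, {min_free_bytes} required"
        )
    return warnings


# Injected usage figures so the behaviour is visible without a full disk.
low = disk_warnings(usage=Usage(100, 90, 10), min_free_bytes=50)
fine = disk_warnings(usage=Usage(100, 10, 90), min_free_bytes=50)
```

The same function could be called once at startup and then periodically from a scheduler to cover the ongoing monitoring.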

Rewrite resource URLs

Hey there,

thanks for the great work on DP+. There is one important feature I'm currently missing:
I'm using DP+ in a Kubernetes cluster. Apparently, DP+ can't rewrite resource URLs if callback_url_base is used (or am I missing something?).

The Datapusher image of Keitaro implements this (see here).

This is important because, without this feature, k8s service names (e.g. http://ckan:5000) can't be used to keep traffic between CKAN and DP+ inside the cluster network.
Thus, DP+ does not work in, for example, a local dev environment without an FQDN for CKAN.
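The requested behaviour could look roughly like the sketch below: when a resource URL points at the public CKAN site, its scheme and host are swapped for the cluster-internal ones. The function and parameter names are hypothetical, and this is not how the Keitaro image necessarily does it.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_resource_url(resource_url, public_base, internal_base):
    """Swap scheme and host of resource_url when it points at public_base."""
    url = urlsplit(resource_url)
    public = urlsplit(public_base)
    internal = urlsplit(internal_base)
    if (url.scheme, url.netloc) != (public.scheme, public.netloc):
        return resource_url  # third-party link, leave untouched
    return urlunsplit(
        (internal.scheme, internal.netloc, url.path, url.query, url.fragment)
    )


rewritten = rewrite_resource_url(
    "https://data.example.org/dataset/a/resource/b/download/x.csv?v=1",
    "https://data.example.org",
    "http://ckan:5000",
)
```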

Better downloading of resources

Currently, DP+ downloads the resource as follows:

  • checks the content-length header against MAX_CONTENT_LENGTH to see if the file is too large; file size checking is skipped if PREVIEW_ROWS is true
  • does a chunked download in CHUNK_SIZE bytes
  • while doing the chunked download, checks that the actual file size stays below MAX_CONTENT_LENGTH (unless PREVIEW_ROWS is true)
  • checks the hash of the downloaded file to see if it has changed; if it hasn't, it skips pushing the file into the datastore
  • supports only HTTP, HTTPS and FTP url schemes
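The current procedure can be sketched as follows, with the network replaced by any iterable of byte chunks so the logic can be followed in isolation. The constant values, the `FileTooLarge` exception and the choice of SHA-256 are assumptions for illustration; the hash DP+ actually uses is not stated here.

```python
import hashlib
import io

MAX_CONTENT_LENGTH = 1_000  # illustrative value, not a DP+ default
CHUNK_SIZE = 256            # illustrative value, not a DP+ default


class FileTooLarge(Exception):
    """Raised when the downloaded size exceeds the configured maximum."""


def download(chunks, out, max_length=MAX_CONTENT_LENGTH, preview_rows=0):
    """Write chunks to out, enforcing max_length unless previewing.

    Returns the hex digest of everything written.
    """
    digest = hashlib.sha256()
    written = 0
    for chunk in chunks:
        written += len(chunk)
        if not preview_rows and written > max_length:
            raise FileTooLarge(f"{written} bytes exceeds {max_length}")
        out.write(chunk)
        digest.update(chunk)
    return digest.hexdigest()


data = b"id,name\n1,a\n2,b\n"
chunked = [data[i:i + 4] for i in range(0, len(data), 4)]
first_hash = download(chunked, io.BytesIO())
second_hash = download(chunked, io.BytesIO())
# An unchanged hash means the push into the datastore can be skipped.
unchanged = first_hash == second_hash

try:
    download([b"x" * 2_000], io.BytesIO())
    too_large = False
except FileTooLarge:
    too_large = True
```

With the requests library, the chunk iterable would typically be `response.iter_content(chunk_size=CHUNK_SIZE)` on a streamed response.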

Improve downloading by:

  • adding SFTP and S3 url schemes to start with, and then incrementally add other libcloud providers as required
  • SFTP keys will be managed in the Datapusher+ Management Interface
  • if PREVIEW_ROWS is true, do not download the entire file: download only the first PREVIEW_ROWS_SAMPLE_SIZE and check whether it contains enough rows, then keep extending the sample by PREVIEW_ROWS_SAMPLE_SIZE divided by 2 until a sample of PREVIEW_ROWS is downloaded
  • if PREVIEW_ROWS is false and the content-length header is less than MAX_CONTENT_LENGTH, download the file using http2 with brotli, gzip and deflate encoding, in that order, direct to disk instead of streaming it and writing to disk in CHUNK_SIZE bytes
  • for resources whose URL does not start with CKAN_SITE_URL (i.e. a link to a third-party site), have more robust, fault-tolerant downloading, logging broken links and adding the link to the DATAPUSHER_RETRY queue
  • add an In-Progress placeholder resource view for queued resources, with optional link to the Datastore Tab where the Datapusher+ log messages are displayed
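The incremental preview download is the least obvious item in this list, so here is one possible reading of it. The sketch assumes the sample size is measured in bytes, that rows end in a newline with no newlines inside quoted fields, and that the source is any binary stream; `read_preview_sample` is a hypothetical name.

```python
import io


def read_preview_sample(stream, preview_rows, sample_size):
    """Read until the sample holds a header plus preview_rows rows, or EOF."""
    sample = stream.read(sample_size)
    step = max(1, sample_size // 2)
    # +1 accounts for the header line.
    while sample.count(b"\n") < preview_rows + 1:
        more = stream.read(step)
        if not more:
            break  # the whole file is smaller than the requested preview
        sample += more
    # Keep only the header and the first preview_rows complete lines.
    lines = sample.splitlines(keepends=True)
    return b"".join(lines[: preview_rows + 1])


data = b"id,name\n" + b"".join(b"%d,row%d\n" % (i, i) for i in range(100))
preview = read_preview_sample(io.BytesIO(data), preview_rows=5, sample_size=16)
small = read_preview_sample(io.BytesIO(b"a\n1\n"), preview_rows=5, sample_size=16)
```

A CSV-aware row counter would be needed for files with embedded newlines, and over HTTP the reads would have to map onto a streamed response or range requests.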

These changes will:

  • make sure we don't download unnecessarily large files if we're only taking the first N rows to create a PREVIEW in the datastore
  • if we're not doing a PREVIEW, download files in the most efficient way with http2 and compression
  • allow better cataloging/datapushing of data hosted on third-party sites with configurable retries
  • improve user experience with the In-Progress placeholder resource view - for both the Data Publisher and Data Users

Datapusher+ Management Console for Orgadmins/Sysadmins

The browser-based Management Console will:

  • allow admins to view Datastore entries on a per Org/Instance basis, based on their CKAN access rights
  • let them see the upload log of resources they have access to
  • let them re-upload DP+ jobs
  • let them change the DP+ parameters on a per job basis
  • let them export the log
  • capture management console actions in the activity stream
  • let them update/delete summary statistics for a resource

Fast upsert mode

Currently, DP+, like Datapusher and xloader, only does drop & replace and doesn't do upserts.

It'd be great if DP+ can support upserts in a performant way.

This can be done by:

  • adding a resource-level metadata field that the Data Publisher can set to enable upsert mode.
  • when a resource has upsert mode enabled, instead of drop & replace, DP+ will:
    • compare the schemas of the existing resource and the new CSV to see if they are identical (qsv can do this very quickly)
    • if they're not, DP+ will abort stating that the resource is in upsert mode and the schemas do not match
    • if the schemas are identical, do a PostgreSQL copy to a temporary table of the file to be pushed
    • then do an INSERT INTO ... ON CONFLICT DO UPDATE to upsert the temporary table into the existing resource
    • the temporary table is then deleted
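The upsert statement from the steps above could be generated along these lines. Note that ON CONFLICT needs a unique index or constraint on the conflict columns, so the sketch assumes the resource has known key columns; that requirement, like the helper names, is an assumption not spelled out in this issue.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_upsert_sql(target, temp_table, columns, key_columns):
    """Build an INSERT ... ON CONFLICT statement from temp_table into target."""
    cols = ", ".join(quote_ident(c) for c in columns)
    keys = ", ".join(quote_ident(c) for c in key_columns)
    updates = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}"
        for c in columns
        if c not in key_columns
    )
    # With no non-key columns there is nothing to update on conflict.
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return (
        f"INSERT INTO {quote_ident(target)} ({cols}) "
        f"SELECT {cols} FROM {quote_ident(temp_table)} "
        f"ON CONFLICT ({keys}) {action}"
    )


statement = build_upsert_sql("res", "tmp_res", ["id", "name"], ["id"])
```

The COPY into the temporary table, this statement and the final drop would normally run inside a single transaction, so a failed upsert leaves the existing resource untouched.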

Alias for resource is not stable

Describe the bug
Currently, DP+ creates a human-readable alias for a resource based on its org/package name/resource name.
It has logic to add a sequence to a resource to avoid namespace collisions for other resources, but it should always have a stable alias for the same resource.

Expected behavior
When a resource is refreshed, the alias should remain unchanged.

cc @twdbben @samibaig

Automatic CKAN Alias creation

When a resource is inserted into the Datastore, automatically create a CKAN alias (aka PostgreSQL view) so we have a human-readable name apart from the resource-id we can use in API calls, datastore-choices helper in scheming, etc.

To minimize namespace collisions, the alias should be resource_name-package_name-owner_org.

However, PostgreSQL by default limits object names to 63 bytes, so we need some business rules for when truncated names collide: perhaps adding a sequence at the end, or flagging the collision during the Datapusher+ job.
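One way to apply these rules is sketched below. The sanitizing regex, the three-digit sequence format and the name `make_alias` are illustrative choices, not the rules DP+ uses.

```python
import re

MAX_IDENTIFIER_BYTES = 63  # PostgreSQL's default identifier limit


def make_alias(resource_name, package_name, owner_org, existing=()):
    """Build resource_name-package_name-owner_org, with a suffix on collision."""
    raw = "-".join((resource_name, package_name, owner_org)).lower()
    # After this substitution the alias is pure ASCII, so characters equal bytes.
    base = re.sub(r"[^a-z0-9_-]+", "_", raw)[:MAX_IDENTIFIER_BYTES]
    alias, sequence = base, 0
    while alias in existing:
        sequence += 1
        suffix = f"-{sequence:03d}"
        # Trim the base so the suffixed alias still fits within the limit.
        alias = base[: MAX_IDENTIFIER_BYTES - len(suffix)] + suffix
    return alias


first = make_alias("Service Requests", "nyc-311", "nyc")
second = make_alias("Service Requests", "nyc-311", "nyc", existing={first})
long_alias = make_alias("r" * 80, "p" * 40, "o" * 40, existing={"r" * 63})
```

Because the sequence depends on which aliases already exist, keeping an alias stable across refreshes would also require excluding the resource's own current alias from `existing`, or storing the alias alongside the resource.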

Upgrade to psycopg3 and use async COPY

Psycopg3 has been stable for ~1.5 years (https://www.psycopg.org/articles/2021/10/13/psycopg-30-released/) and one of its headline features is async support.

With it, perhaps we can do an async COPY so we can return quickly even for very large files, while the streaming COPY and auto-indexing are still in progress.

This should allow us to support a more predictable Resource first upload workflow, allowing us to create the placeholder resource with inferred metadata in a few seconds (<5 seconds, even for very large files).
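For reference, an async COPY with psycopg 3 looks roughly like the sketch below. It assumes psycopg 3 is installed, the target table already exists with matching columns, and the CSV has a header row; the function name, the chunk size and the COPY options are illustrative.

```python
import inspect


async def copy_csv_async(dsn, table, csv_path, chunk_size=1 << 20):
    """Stream a CSV file into an existing table with psycopg 3's async COPY."""
    # Imported here so this module loads even where psycopg 3 is not installed.
    import psycopg
    from psycopg import sql

    statement = sql.SQL("COPY {} FROM STDIN (FORMAT csv, HEADER true)").format(
        sql.Identifier(table)
    )
    # Leaving the connection block commits the transaction.
    async with await psycopg.AsyncConnection.connect(dsn) as aconn:
        async with aconn.cursor() as cur:
            async with cur.copy(statement) as copy:
                # File reads are blocking here; acceptable for a sketch.
                with open(csv_path, "rb") as fh:
                    while chunk := fh.read(chunk_size):
                        await copy.write(chunk)


is_coroutine = inspect.iscoroutinefunction(copy_csv_async)
```

Awaiting this coroutine directly still takes as long as the COPY itself. To return early, the job would schedule it as a background task (for example with `asyncio.create_task`) after creating the placeholder resource.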

cc @wardi @twdbben
