
simonw / datasette


An open source multi-tool for exploring and publishing data

Home Page: https://datasette.io

License: Apache License 2.0

Python 89.13% HTML 6.66% CSS 1.46% Dockerfile 0.05% JavaScript 2.15% Shell 0.32% C 0.13% Just 0.10%
sqlite python datasets json docker datasette automatic-api asgi csv datasette-io

datasette's Introduction

Datasette



Datasette is a tool for exploring and publishing data. It helps people take data of any shape or size and publish that as an interactive, explorable website and accompanying API.

Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with the world.

Explore a demo, watch a video about the project or try it out by uploading and publishing your own CSV data.

Want to stay up-to-date with the project? Subscribe to the Datasette newsletter for tips, tricks and news on what's new in the Datasette ecosystem.

Installation

If you are on a Mac, Homebrew is the easiest way to install Datasette:

brew install datasette

You can also install it using pip or pipx:

pip install datasette

or:

pipx install datasette

Datasette requires Python 3.8 or higher. We also have detailed installation instructions covering other options such as Docker.

Basic usage

datasette serve path/to/database.db

This will start a web server on port 8001 - visit http://localhost:8001/ to access the web interface.

serve is the default subcommand, so you can omit it if you like.

Use Chrome on macOS? You can run Datasette against your browser history like so:

 datasette ~/Library/Application\ Support/Google/Chrome/Default/History --nolock

Now visiting http://localhost:8001/History/downloads will show you a web interface to browse your downloads data:

(Screenshot: the downloads table rendered by Datasette)

metadata.json

If you want to include licensing and source information in the generated Datasette website, you can do so using a JSON file that looks something like this:

{
    "title": "Five Thirty Eight",
    "license": "CC Attribution 4.0 License",
    "license_url": "http://creativecommons.org/licenses/by/4.0/",
    "source": "fivethirtyeight/data on GitHub",
    "source_url": "https://github.com/fivethirtyeight/data"
}

Save this in metadata.json and run Datasette like so:

datasette serve fivethirtyeight.db -m metadata.json

The license and source information will be displayed on the index page and in the footer. They will also be included in the JSON produced by the API.
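If you prefer to generate metadata.json from a script, here is a minimal standard-library sketch. The helper names `render_metadata()` and `write_metadata()` are hypothetical, and the values are copied from the example above.

```python
import json
from pathlib import Path

# The same keys and values as the metadata.json example above.
METADATA = {
    "title": "Five Thirty Eight",
    "license": "CC Attribution 4.0 License",
    "license_url": "http://creativecommons.org/licenses/by/4.0/",
    "source": "fivethirtyeight/data on GitHub",
    "source_url": "https://github.com/fivethirtyeight/data",
}


def render_metadata(data: dict) -> str:
    """Serialize the metadata as indented JSON with a trailing newline."""
    return json.dumps(data, indent=4) + "\n"


def write_metadata(path: Path, data: dict) -> None:
    """Write the rendered metadata to `path` as UTF-8 text."""
    path.write_text(render_metadata(data), encoding="utf-8")
```

Calling `write_metadata(Path("metadata.json"), METADATA)` produces a file you can pass to `datasette serve fivethirtyeight.db -m metadata.json`.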

datasette publish

If you have Heroku or Google Cloud Run configured, Datasette can deploy one or more SQLite databases to the internet with a single command:

datasette publish heroku database.db

Or:

datasette publish cloudrun database.db

This will create a Docker image containing both the Datasette application and the specified SQLite database files. It will then deploy that image to Heroku or Cloud Run and give you a URL to access the resulting website and API.

See Publishing data in the documentation for more details.

Datasette Lite

Datasette Lite is Datasette packaged using WebAssembly so that it runs entirely in your browser, no Python web application server required. Read more about that in the Datasette Lite documentation.

datasette's People

Contributors

abdusco, asg017, aslakr, bgrins, bobwhitelock, bollwyvl, cb160, davidbgk, dependabot-preview[bot], dependabot[bot], eyeseast, fgregg, glasnt, jaap3, jacobian, jaywgraves, jefftriplett, jthodge, natbat, qwo, r4vi, raynae, rgieseke, rhettbull, rixx, rprimet, russss, ryanpitts, simonw, wragge


datasette's Issues

Implement command-line tool interface

The first version needs to take one or more file names or URLs, then generate and deploy an app to Now. It will assume you already have the now command installed and configured.

Homepage UI for editing metadata file

Since we are going to have a metadata file which sets the title/description/etc. for each database, why not allow you to run the app in --dev mode, which makes the homepage into a WYSIWYG editor that can save to that file format?

Ability to plot a simple graph

Might be as simple as: pick the type of chart (bar, line) and then pick the column for the X axis and the column for the Y axis. Maybe also allow a pie chart. It’s up to the user to come up with SQL that gets the right values.

date, year, month and day querystring lookups

  • ?timestamp___date=2017-07-17 - return every item where the timestamp falls on that date
  • ?timestamp___year=2017 - return every item where the timestamp falls within 2017
  • ?timestamp___month=1 - return every item where the month component is January
  • ?timestamp___day=10 - return every item where the day-of-the-month component is 10

Follow on from #23
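A minimal sketch of how these lookups could map onto SQLite's date functions, assuming timestamps are stored as ISO 8601 text (the format `date()` and `strftime()` understand). The `DATE_LOOKUPS` table and `date_lookup_clause()` helper are hypothetical names, not Datasette's API.

```python
# Hypothetical mapping from the lookup suffixes in this issue to SQLite
# expressions. strftime() returns zero-padded text, so year/month/day are
# cast to integers before comparing against the integer parameter.
DATE_LOOKUPS = {
    "date": "date({col}) = ?",
    "year": "cast(strftime('%Y', {col}) as integer) = ?",
    "month": "cast(strftime('%m', {col}) as integer) = ?",
    "day": "cast(strftime('%d', {col}) as integer) = ?",
}


def date_lookup_clause(key: str, value: str):
    """Turn e.g. ('timestamp___year', '2017') into (sql_fragment, params)."""
    column, sep, lookup = key.rpartition("___")
    if not sep or lookup not in DATE_LOOKUPS:
        raise ValueError(f"not a date lookup: {key!r}")
    # Quote the column name so it cannot inject SQL.
    quoted = '"' + column.replace('"', '""') + '"'
    param = value if lookup == "date" else int(value)
    return DATE_LOOKUPS[lookup].format(col=quoted), [param]
```

The value is always bound as a parameter, so only the column name ever reaches the SQL string, and it is quoted first.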

Implement full URL design

Full URL design:

/database-name
/database-name.json
/database-name-7sha256
/database-name-7sha256.json
/database-name/table-name
/database-name/table-name.json
/database-name-7sha256/table-name
/database-name-7sha256/table-name.json
/database-name-7sha256/table-name/compound-pk
/database-name-7sha256/table-name/compound-pk.json
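One way to route the URL design above, sketched as a single regular expression. It assumes the hash suffix is exactly seven lowercase hex characters (as in the `flights-07d1283` example elsewhere on this page) and that names and primary key values contain no `/` or `.`. `parse_path()` is a hypothetical helper.

```python
import re

# database[-hash][/table[/pk]][.json]; the lazy database group lets the
# optional 7-hex-character hash suffix be split off when present.
URL_RE = re.compile(
    r"^/(?P<database>[^/.]+?)(?:-(?P<hash>[0-9a-f]{7}))?"
    r"(?:/(?P<table>[^/.]+)(?:/(?P<pk>[^/.]+))?)?"
    r"(?:\.(?P<format>json))?$"
)


def parse_path(path: str) -> dict:
    """Split a request path into database, hash, table, pk and format parts."""
    match = URL_RE.match(path)
    if match is None:
        raise ValueError(f"unrecognized path: {path!r}")
    return match.groupdict()
```

A database whose own name ends in a hyphen plus seven hex characters would be misread as carrying a hash, so a real implementation would need to check the parsed hash against the actual file.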

Make it so you can override templates

The app will ship with default templates but, just like with the Django admin, you will be able to override them using either explicit configuration settings or just by dropping in templates with certain file names.

Template inheritance should work here, both allowing you to override just the base template and allowing you to customize tiny bits of others.

Switch to ujson

ujson is already a dependency of Sanic, and should be quite a bit faster.

Support Django-style filters in querystring arguments

e.g.

/database/table?name__contains=Simon&age__gte=4

Same format as Django: double underscore as the split.

If you need to match against a column that happens to contain a double underscore in its official name, do this:

/database/table?weird__column__exact=Simon

__exact is the default operation if none is supplied.
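These rules could be parsed by splitting each key on its last double underscore and falling back to `__exact` when the suffix is not a known operator. This is a sketch with a hypothetical operator table; only `contains`, `gte` and `exact` come from the issue, and `build_where()` is an illustrative name.

```python
# Hypothetical operator table; the issue itself only names contains, gte, exact.
OPERATORS = {
    "exact": "= ?",
    "contains": "like ?",
    "gt": "> ?",
    "gte": ">= ?",
    "lt": "< ?",
    "lte": "<= ?",
}


def build_where(args: dict):
    """Turn {'name__contains': 'Simon'} style arguments into (where_sql, params)."""
    clauses, params = [], []
    for key, value in args.items():
        column, sep, op = key.rpartition("__")
        if not sep or op not in OPERATORS:
            # No recognised operator suffix: the whole key is the column
            # name and __exact is the default.
            column, op = key, "exact"
        quoted = '"' + column.replace('"', '""') + '"'
        clauses.append(f"{quoted} {OPERATORS[op]}")
        # Note: % and _ inside a contains value are not escaped here.
        params.append(f"%{value}%" if op == "contains" else value)
    return " and ".join(clauses), params
```

Splitting on the last `__` is what makes `weird__column__exact` resolve to the column `weird__column`.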

Unit tests against application itself

Use Sanic’s testing mechanism. Tests should create a temporary SQLite database file on disk by executing SQL that is stored in the tests themselves.

For the moment we can just test the JSON API more thoroughly and just sanity check that the HTML output doesn’t throw any errors.

While running, server should spot new db files added to its directory

Maybe on each request it checks the time, and if 5s have elapsed since it last scanned the directory, it scans it again.

This would allow people with dedicated hosting to run the app there and just upload new datasets whenever they want. It would also be very convenient for development.
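A sketch of the rescan-on-request idea, assuming a 5-second interval and the `.db`, `.sqlite` and `.sqlite3` extensions mentioned on this page. `DatabaseDirectory` is a hypothetical class; the clock is injectable so the behavior can be tested without sleeping.

```python
import time
from pathlib import Path


class DatabaseDirectory:
    """Lists database files, rescanning the directory at most every `interval` seconds."""

    EXTENSIONS = {".db", ".sqlite", ".sqlite3"}

    def __init__(self, root, interval=5.0, clock=time.monotonic):
        self.root = Path(root)
        self.interval = interval
        self.clock = clock
        self._last_scan = None
        self._files = []

    def databases(self):
        """Call on every request; only touches the filesystem once the interval has passed."""
        now = self.clock()
        if self._last_scan is None or now - self._last_scan >= self.interval:
            self._files = sorted(
                p.name
                for p in self.root.iterdir()
                if p.is_file() and p.suffix in self.EXTENSIONS
            )
            self._last_scan = now
        return self._files
```

Using `time.monotonic` by default means a system clock change cannot stall or force rescans.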

Addressable pages for every row in a table

/database-name-7sha256/table-name/compound-pk
/database-name-7sha256/table-name/compound-pk.json

The tricky part will be figuring out what the primary key is, especially since it could be a compound primary key and it might involve different data types.
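For the primary key question, SQLite's `PRAGMA table_info` reports each column's position within the primary key, which covers compound keys. This sketch (hypothetical helper names) uses it to build the lookup for a row page and falls back to `rowid` when no key is declared. Converting URL strings to each column's data type is left out.

```python
import sqlite3
from typing import Optional, Sequence


def quote(name: str) -> str:
    """Quote an SQLite identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def primary_key_columns(conn: sqlite3.Connection, table: str) -> list:
    """Primary key column names in key order; [] if the table declares none."""
    # Rows are (cid, name, type, notnull, dflt_value, pk); pk is 0 for
    # non-key columns, otherwise the column's 1-based position in the key.
    rows = conn.execute(f"PRAGMA table_info({quote(table)})").fetchall()
    return [name for _, name in sorted((row[5], row[1]) for row in rows if row[5] > 0)]


def fetch_row(conn: sqlite3.Connection, table: str, pk_values: Sequence) -> Optional[tuple]:
    """Fetch one row by its (possibly compound) primary key values."""
    # Tables without a declared primary key fall back to SQLite's rowid.
    key_exprs = [quote(c) for c in primary_key_columns(conn, table)] or ["rowid"]
    if len(key_exprs) != len(pk_values):
        raise ValueError(f"expected {len(key_exprs)} key value(s), got {len(pk_values)}")
    where = " and ".join(f"{expr} = ?" for expr in key_exprs)
    return conn.execute(f"select * from {quote(table)} where {where}", list(pk_values)).fetchone()
```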

Protect against malicious SQL that causes damage even though our DB is immutable

I’m currently operating under the assumption that it’s safe to allow arbitrary SQL statements because we are dealing with an immutable database. But this might not be the case - there are some pretty weird SQLite language extensions (ATTACH, PRAGMA etc) and I’m not certain they cannot be used to break things in a way that would affect future requests to the API.

Solution: provide a “safe mode” option which disables the ?sql= mechanism. This still leaves the URL filter lookups, so I need to make sure that those are “safe”.

In the future I may also implement a whitelist option where datasets can be configured to only allow specific filters against specific columns.
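A complementary technique, not the one proposed in this issue, is SQLite's authorizer callback, exposed in Python as `Connection.set_authorizer()`. SQLite consults it while compiling each statement, so it can refuse `ATTACH`, `DETACH` and `PRAGMA` before they run. The sketch combines it with a read-only (`mode=ro`) connection; the helper names are hypothetical.

```python
import sqlite3
from pathlib import Path

# Statement types this sketch refuses: attaching other files and PRAGMAs.
BLOCKED_ACTIONS = {sqlite3.SQLITE_ATTACH, sqlite3.SQLITE_DETACH, sqlite3.SQLITE_PRAGMA}


def deny_risky_statements(action, arg1, arg2, db_name, trigger_or_view):
    """Authorizer callback: deny blocked statement types, allow everything else."""
    if action in BLOCKED_ACTIONS:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


def open_for_user_sql(path: str) -> sqlite3.Connection:
    """Open a database file read-only with the authorizer installed."""
    # mode=ro makes SQLite itself reject writes, independently of the authorizer.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.set_authorizer(deny_risky_statements)
    return conn
```

The authorizer applies to every statement on that connection, so any PRAGMA queries the application needs for its own introspection would have to run on a separate connection.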

?_group_count=country - return counts by specific column(s)

Imagine if this:

https://stateless-datasets-jykibytogk.now.sh/flights-07d1283/airports.jsono?country__contains=gu&_group_count=country

Turned into this:

https://stateless-datasets-jykibytogk.now.sh/flights-07d1283?sql=select%20country,%20count(*)%20as%20group_count_country%20from%20airports%20where%20country%20like%20%27%gu%%27%20group%20by%20country%20order%20by%20group_count_country%20desc

This would involve introducing a new precedent of query string arguments that start with an _ having special meanings. While we're at it, could try adding _fields=x,y,z

Tasks:

  • Get initial version working
  • Refactor code to not just "pretend to be a view"
  • Get foreign key relationships expanded
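The rewrite from a `_group_count` URL to SQL can be sketched as a small query builder. It mirrors the example query above, including the `group_count_<column>` alias; `group_count_sql()` is a hypothetical helper, and the WHERE clause is passed in already built with `?` placeholders.

```python
def quote(name: str) -> str:
    """Quote an SQLite identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def group_count_sql(table: str, column: str, where: str = "") -> str:
    """Build the query that ?_group_count=<column> stands for in this issue."""
    col = quote(column)
    alias = quote(f"group_count_{column}")
    sql = f"select {col}, count(*) as {alias} from {quote(table)}"
    if where:
        sql += f" where {where}"
    return sql + f" group by {col} order by {alias} desc"
```

For example, `group_count_sql("airports", "country", '"country" like ?')` executed with the parameter `"%gu%"` reproduces the query in the second URL.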

Datasette Plugins

It would be neat if additional functionality could be opted-in to the system in the form of easy-to-add plugins, hosted as separate packages. First example: a Google Analytics plugin, which adds GA tracking code with your tracking ID to the web interface for your dataset.

This may be an opportunity to experiment with entry points: http://amir.rachum.com/blog/2017/07/28/python-entry-points/
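A minimal sketch of entry-point discovery using the standard library's `importlib.metadata`, plus a toy stand-in for the Google Analytics example. The group name `example_datasette_plugins`, the `extra_body_html()` hook and the placeholder markup are all hypothetical; this is neither the real Google Analytics snippet nor Datasette's plugin system.

```python
import html
from importlib.metadata import entry_points

# Hypothetical group name that plugin packages would register under.
PLUGIN_GROUP = "example_datasette_plugins"


def discover_plugins(group: str = PLUGIN_GROUP) -> list:
    """Load every object that installed packages registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a dict keyed by group
        selected = eps.get(group, [])
    return [ep.load() for ep in selected]


def render_extra_body(plugins, context: dict) -> str:
    """Concatenate HTML from every plugin that implements extra_body_html()."""
    parts = [p.extra_body_html(context) for p in plugins if hasattr(p, "extra_body_html")]
    return "\n".join(part for part in parts if part)


class AnalyticsPlugin:
    """Toy stand-in for a tracking plugin; the markup is a placeholder."""

    def __init__(self, tracking_id: str):
        self.tracking_id = tracking_id

    def extra_body_html(self, context: dict) -> str:
        return f'<script data-tracking-id="{html.escape(self.tracking_id)}"></script>'
```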

Homepage should show summary of databases

Each database should have a name, optional description, download link and a summary of the tables:

Flights.db
Flights and suchlike blah.
URL? License?
577373 rows across 14 tables
airports, routes, airlines...

The title of the homepage is derived from the databases or can be manually overridden, e.g. “Datasets of Flights, NHS, Blah...”. If there is only one database, the title is just the title of that database.
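The row and table counts for such a summary can be gathered from `sqlite_master` and `count(*)`. A sketch, with the output format copied from the example above and a hypothetical `summarize()` helper:

```python
import sqlite3


def quote(name: str) -> str:
    """Quote an SQLite identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def summarize(conn: sqlite3.Connection) -> str:
    """Return 'N rows across M tables' followed by the table names."""
    tables = [
        row[0]
        for row in conn.execute(
            "select name from sqlite_master where type = 'table' "
            "and substr(name, 1, 7) != 'sqlite_' order by name"
        )
    ]
    total = sum(
        conn.execute(f"select count(*) from {quote(t)}").fetchone()[0] for t in tables
    )
    return f"{total} rows across {len(tables)} tables\n" + ", ".join(tables)
```

Counting every table is a full scan per table, which is one more reason to compute this once at build time for immutable databases.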

Make URLs immutable

Absolutely everything should have a far-future expires header

Part of the URL will be the truncated sha1 hash of the database file itself, calculated at build time
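A sketch of computing the truncated hash, streaming the file so large databases are not read into memory at once. It uses SHA-256 to match the `7sha256` placeholder in the URL design issue; `hashlib.sha1` would follow this issue's wording instead. The header value is one common far-future choice, not something the issue specifies, and `database_hash()` is a hypothetical name.

```python
import hashlib

# One year of caching; "immutable" tells browsers not to revalidate.
FAR_FUTURE_HEADERS = {"Cache-Control": "max-age=31536000, immutable"}


def database_hash(path, length: int = 7, chunk_size: int = 1 << 20) -> str:
    """Truncated hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]
```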

Support CSV export with a .csv extension

Maybe do this using streaming with multiple pagination SQL queries so we can support arbitrarily large exports.

How would this work against a view which doesn’t have an obvious efficient pagination mechanism? Maybe limit views to up to 1000 exported records?

Relates to #5
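One way to stream an export with repeated pagination queries is keyset pagination on `rowid`, yielding one CSV chunk per page. This is a sketch with hypothetical names; as the issue anticipates, it does not work for views, which have no `rowid`.

```python
import csv
import io
import sqlite3
from typing import Iterator


def quote(name: str) -> str:
    """Quote an SQLite identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def _drain(buffer: io.StringIO) -> str:
    """Return everything written to the buffer so far and empty it."""
    text = buffer.getvalue()
    buffer.seek(0)
    buffer.truncate(0)
    return text


def stream_csv(conn: sqlite3.Connection, table: str, batch_size: int = 1000) -> Iterator[str]:
    """Yield the CSV header, then one chunk of CSV text per page of rows."""
    table_sql = quote(table)
    columns = [d[0] for d in conn.execute(f"select * from {table_sql} limit 0").description]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    yield _drain(buffer)

    last_rowid = None
    while True:
        # "where rowid > ?" keeps each page cheap however deep the export goes.
        if last_rowid is None:
            page = conn.execute(
                f"select rowid, * from {table_sql} order by rowid limit ?", (batch_size,)
            ).fetchall()
        else:
            page = conn.execute(
                f"select rowid, * from {table_sql} where rowid > ? order by rowid limit ?",
                (last_rowid, batch_size),
            ).fetchall()
        if not page:
            return
        for row in page:
            writer.writerow(row[1:])  # drop the leading rowid
        last_rowid = page[-1][0]
        yield _drain(buffer)
```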

See if I can get a websockets interface working

Since I am already running on Sanic, how hard would it be to add a websocket endpoint that lets you talk to SQLite interactively?

Could this be used to efficiently support streaming in answers to giant queries?

Better JSON response options

Default returns this:

{
    "columns": ["id", "name", "age"],
    "rows": [
        [45, "Simon", 36]
    ]
}

.jsono instead returns a list of objects each duplicating the headers in its keys.

They will probably both share the same pagination mechanism, so .jsono might not return a flat list.
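The `.jsono` shape can be derived from the default one by zipping each row with the column list. A minimal sketch, assuming lowercase `columns` and `rows` keys; `to_objects()` is a hypothetical helper.

```python
import json


def to_objects(response: dict) -> list:
    """Turn {"columns": [...], "rows": [[...], ...]} into one dict per row."""
    columns = response["columns"]
    # Each row becomes an object whose keys repeat the column names.
    return [dict(zip(columns, row)) for row in response["rows"]]


example = {"columns": ["id", "name", "age"], "rows": [[45, "Simon", 36]]}
print(json.dumps(to_objects(example)))  # [{"id": 45, "name": "Simon", "age": 36}]
```

The trade-off the issue describes is visible here: every object repeats the column names, so `.jsono` responses are larger but easier to consume.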

Dockerfile should build more recent SQLite with FTS5 and spatialite support

The SQLite bundled with Python 3 doesn't support the FTS5 search extension. It would be nice if the SQLite built by our Dockerfile could support as many modern SQLite features as possible.

https://web.archive.org/web/20170212034155/http://charlesleifer.com/blog/using-the-sqlite-json1-and-fts5-extensions-with-python/ has instructions on building a more recent SQLite and the pysqlite package. Our Dockerfile could carry out an updated version of this process.
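Whether the SQLite library that Python is linked against already has these features can be checked by simply trying to use them. A sketch probing FTS5 and the JSON functions (SpatiaLite is a loadable extension and is not covered); `sqlite_features()` is a hypothetical helper.

```python
import sqlite3


def sqlite_features() -> dict:
    """Report the SQLite version and whether FTS5 and JSON functions work."""
    conn = sqlite3.connect(":memory:")
    features = {"sqlite_version": sqlite3.sqlite_version}
    probes = [
        ("fts5", "create virtual table probe_fts5 using fts5(body)"),
        ("json1", "select json('[]')"),
    ]
    for name, sql in probes:
        try:
            conn.execute(sql)
            features[name] = True
        except sqlite3.OperationalError:
            # e.g. "no such module: fts5" or "no such function: json"
            features[name] = False
    conn.close()
    return features
```

Running this inside the built image is a quick way to confirm the Dockerfile produced the SQLite it was meant to.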

Pick a name

Options so far:

  • immutabase
  • datasite
  • sqlstatic
  • dbserve
  • sqlserve

Terms to play with:

  • immutable
  • sqlite
  • dataset
  • json
  • static
  • serve

Support multiple databases

I'm going to loop through every database file in the app root directory and bundle all of them.

Each one will be accessible at /databasename

Note this is without the file extension, and we will disallow multiple files with the same name but different extensions.

Supported extensions to start with will be .db and .sqlite and .sqlite3
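A sketch of the discovery step, using the extensions listed above and raising an error when two files would share a URL name. `discover_databases()` is a hypothetical helper.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = (".db", ".sqlite", ".sqlite3")


def discover_databases(root) -> dict:
    """Map URL name (filename without extension) to path for each database file."""
    found = {}
    for path in sorted(Path(root).iterdir()):
        if not path.is_file() or path.suffix not in SUPPORTED_EXTENSIONS:
            continue
        name = path.stem
        if name in found:
            raise ValueError(
                f"{found[name].name} and {path.name} would both be served at /{name}"
            )
        found[name] = path
    return found
```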

Make individual column values addressable, with smart content types

Some SQLite databases embed images in columns. It would be cool if these had URLs.

/database-name-7sha256/table-name/compound-pk/column
/database-name-7sha256/table-name/compound-pk/column.json
/database-name-7sha256/table-name/compound-pk/column.png
/database-name-7sha256/table-name/compound-pk/column.gif
/database-name-7sha256/table-name/compound-pk/column.txt

The one without an explicit file extension auto-detects the correct extension.
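The auto-detection could inspect the first few bytes of BLOB values for well-known file signatures. This sketch covers only PNG, GIF and JPEG; the `bin` fallback extension and the `detect_type()` helper are assumptions, not part of the issue.

```python
# Magic-number prefixes for a few common formats (an assumed, non-exhaustive list).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png", "image/png"),
    (b"GIF87a", "gif", "image/gif"),
    (b"GIF89a", "gif", "image/gif"),
    (b"\xff\xd8\xff", "jpg", "image/jpeg"),
]


def detect_type(value):
    """Pick an (extension, content type) pair for a single column value."""
    if isinstance(value, bytes):
        for prefix, ext, mime in SIGNATURES:
            if value.startswith(prefix):
                return ext, mime
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            return "bin", "application/octet-stream"
        return "txt", "text/plain; charset=utf-8"
    if isinstance(value, str):
        return "txt", "text/plain; charset=utf-8"
    # Numbers and NULL are served as JSON.
    return "json", "application/json"
```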
