

datasette-insert


Datasette plugin for inserting and updating data

Installation

Install this plugin in the same environment as Datasette.

$ pip install datasette-insert

This plugin should always be deployed with additional configuration to prevent unauthenticated access; see the notes on permissions and authentication below.

If you are trying it out on your own local machine, you can pip install the datasette-insert-unsafe plugin to allow access without needing to set up authentication or permissions separately.

Inserting data and creating tables

Start Datasette and make sure it has a writable SQLite database attached to it. If you have not yet created a database file, you can use this:

datasette data.db --create

The --create option will create a new empty data.db database file if it does not already exist.

The plugin adds an endpoint that allows data to be inserted or updated, and tables to be created, by POSTing JSON data to the following URL:

/-/insert/name-of-database/name-of-table

The JSON should look like this:

[
    {
        "id": 1,
        "name": "Cleopaws",
        "age": 5
    },
    {
        "id": 2,
        "name": "Pancakes",
        "age": 5
    }
]

The first time data is posted to the URL, a table of that name will be created with the desired columns if it does not already exist.

You can specify which column should be used as the primary key using the ?pk= URL argument.

Here's how to POST to a database and create a new table using the Python requests library:

import requests

requests.post("http://localhost:8001/-/insert/data/dogs?pk=id", json=[
    {
        "id": 1,
        "name": "Cleopaws",
        "age": 5
    },
    {
        "id": 2,
        "name": "Pancakes",
        "age": 4
    }
])
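
The endpoint responds with a JSON body describing the result. A minimal sketch of checking it, assuming the table_count response key that appears in the issue examples further down this page:

import requests

response = requests.post(
    "http://localhost:8001/-/insert/data/dogs?pk=id",
    json=[{"id": 1, "name": "Cleopaws", "age": 5}],
)
response.raise_for_status()  # raise if the server returned a 4xx/5xx status
print(response.json())       # e.g. {"table_count": 1} (assumed response shape)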

And here's how to do the same thing using curl:

curl --request POST \
  --data '[
      {
        "id": 1,
        "name": "Cleopaws",
        "age": 5
      },
      {
        "id": 2,
        "name": "Pancakes",
        "age": 4
      }
    ]' \
    'http://localhost:8001/-/insert/data/dogs?pk=id'

Or by piping in JSON like so:

cat dogs.json | curl --request POST -d @- \
    'http://localhost:8001/-/insert/data/dogs?pk=id'
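
The same upload can be done from Python by loading the file and sending it as the request body. A sketch, assuming dogs.json contains a JSON list of row objects:

import json
import requests

with open("dogs.json") as f:
    rows = json.load(f)  # expects a JSON list of row objects

requests.post(
    "http://localhost:8001/-/insert/data/dogs",
    params={"pk": "id"},  # equivalent to the ?pk=id query string argument
    json=rows,
)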

Inserting a single row

If you are inserting a single row, you can optionally send it as a dictionary rather than a list containing a single item:

curl --request POST \
  --data '{
      "id": 1,
      "name": "Cleopaws",
      "age": 5
    }' \
    'http://localhost:8001/-/insert/data/dogs?pk=id'
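
The equivalent call using the Python requests library, passing a single dictionary instead of a list:

import requests

requests.post(
    "http://localhost:8001/-/insert/data/dogs",
    params={"pk": "id"},
    json={"id": 1, "name": "Cleopaws", "age": 5},
)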

Automatically adding new columns

If you send data to an existing table with keys that do not match its existing columns, you will get an HTTP 400 error with a JSON response like this:

{
    "status": 400,
    "error": "Unknown keys: 'foo'",
    "error_code": "unknown_keys"
}

If you add ?alter=1 to the URL you are posting to, any missing columns will be added automatically:

curl --request POST \
  --data '[
      {
        "id": 3,
        "name": "Boris",
        "age": 1,
        "breed": "Husky"
      }
    ]' \
    'http://localhost:8001/-/insert/data/dogs?alter=1'
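
From Python, one approach is to retry with ?alter=1 only when the unknown_keys error comes back. A sketch based on the error response shown above:

import requests

url = "http://localhost:8001/-/insert/data/dogs"
row = {"id": 3, "name": "Boris", "age": 1, "breed": "Husky"}

response = requests.post(url, json=[row])
if response.status_code == 400 and response.json().get("error_code") == "unknown_keys":
    # Retry with ?alter=1 so the missing columns are added automatically
    response = requests.post(url, params={"alter": 1}, json=[row])
response.raise_for_status()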

Upserting data

An "upsert" operation can be used to partially update a record. With upserts you can send a subset of the keys and, if the ID matches the specified primary key, they will be used to update an existing record.

Upserts can be sent to the /-/upsert API endpoint.

This example updates the age of the dog with id 1 from 5 to 7:

curl --request POST \
  --data '{
      "id": 1,
      "age": 7
    }' \
    'http://localhost:3322/-/upsert/data/dogs?pk=id'

Like the /-/insert endpoint, the /-/upsert endpoint can accept an array of objects too. It also supports the ?alter=1 option.
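
For example, a batch of partial updates sent from Python might look like this (a sketch; the port and the values are illustrative):

import requests

requests.post(
    "http://localhost:8001/-/upsert/data/dogs",
    params={"pk": "id", "alter": 1},
    json=[
        {"id": 1, "age": 7},                # update one column on dog 1
        {"id": 2, "name": "Pancakes Jr."},  # illustrative partial update on dog 2
    ],
)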

Permissions and authentication

This plugin defaults to denying all access, to help ensure people don't accidentally deploy it on the open internet in an unsafe configuration.

You can read about Datasette's approach to authentication in the Datasette manual.

You can install the datasette-insert-unsafe plugin to run in unsafe mode, where all access is allowed by default.

I recommend using this plugin in conjunction with datasette-auth-tokens, which provides a mechanism for making authenticated calls using API tokens.

You can then use "allow" blocks in the datasette-insert plugin configuration to specify which authenticated tokens are allowed to make use of the API.

Here's an example metadata.json file which restricts access to the /-/insert API to an API token defined in an INSERT_TOKEN environment variable:

{
    "plugins": {
        "datasette-insert": {
            "allow": {
                "bot": "insert-bot"
            }
        },
        "datasette-auth-tokens": {
            "tokens": [
                {
                    "token": {
                        "$env": "INSERT_TOKEN"
                    },
                    "actor": {
                        "bot": "insert-bot"
                    }
                }
            ]
        }
    }
}

With this configuration in place you can start Datasette like this:

INSERT_TOKEN=abc123 datasette data.db -m metadata.json

You can now send data to the API using curl like this:

curl --request POST \
  -H "Authorization: Bearer abc123" \
  --data '[
      {
        "id": 3,
        "name": "Boris",
        "age": 1,
        "breed": "Husky"
      }
    ]' \
    'http://localhost:8001/-/insert/data/dogs'

Or using the Python requests library like so:

requests.post(
    "http://localhost:8001/-/insert/data/dogs",
    json={"id": 1, "name": "Cleopaws", "age": 5},
    headers={"Authorization": "bearer abc123"},
)
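
If you are making many calls, a requests.Session keeps the Authorization header in one place. A sketch, reusing the hypothetical abc123 token from above:

import requests

session = requests.Session()
session.headers["Authorization"] = "Bearer abc123"

session.post(
    "http://localhost:8001/-/insert/data/dogs",
    json={"id": 4, "name": "Ziggy", "age": 2},  # illustrative row
)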

Finely grained permissions

Using an "allow" block as described above grants full permission to the features enabled by the API.

The API implements several new Datasette permissions, which other plugins can use to make more finely grained decisions.

The full set of permissions is as follows:

  • insert:all - all permissions - this is used by the "allow" block described above. Argument: database_name
  • insert:insert-update - the ability to insert data into an existing table, or to update data by its primary key. Arguments: (database_name, table_name)
  • insert:create-table - the ability to create a new table. Argument: database_name
  • insert:alter-table - the ability to add columns to an existing table (using ?alter=1). Arguments: (database_name, table_name)

You can use plugins like datasette-permissions-sql to hook into these more detailed permissions for finely grained control over what actions each authenticated actor can take.

Plugins that implement the permission_allowed() plugin hook can take full control over these permission decisions.
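
For example, a small plugin could grant just the insert-and-update permission to a specific actor by implementing that hook. A sketch, reusing the "insert-bot" actor from the metadata.json example above; the exact resource value passed for each permission is not checked here:

from datasette import hookimpl


@hookimpl
def permission_allowed(actor, action, resource):
    # Grant only insert/update access to the "insert-bot" actor;
    # returning None leaves every other decision to other plugins.
    if action == "insert:insert-update" and actor and actor.get("bot") == "insert-bot":
        return True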

CORS

If you start Datasette with the datasette --cors option, the following HTTP headers will be added to resources served by this plugin:

Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: content-type,authorization
Access-Control-Allow-Methods: POST

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd datasette-insert
python3 -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest

datasette-insert's People

Contributors

simonw


datasette-insert's Issues

upsert on table with NOT NULL columns is not supported

I found a case where /-/upserting a non-existing row behaves differently than /-/inserting a non-existing row.

This surprised me, but I might be misunderstanding the intended use.

If I have a table like:

create table test(key text primary key, value text);

I can do:

$ curl -XPOST https://example.com/-/upsert/db/test?pk=key -d '{"key": "abc", "value": "def"}'

{"table_count": 1}

But if the columns are non null, the upsert fails:

create table test(key text primary key, value text not null);
$ curl -XPOST https://example.com/-/upsert/db/test?pk=key -d '{"key": "abc", "value": "def"}'

{"table_count": 0}

(The /-/insert endpoint adds the row in both cases.)

Implement MVP

The minimum for this to be useful would be:

POST /-/update/mydb/my_table?pk=mycolumn - body is a JSON array of values

This will create the table if it doesn't exist. It will insert-replace the data based on the primary key (if one was specified).

Add ?alter=true to alter the table and add any missing columns if needed.

Originally posted by @simonw in #1 (comment)

Locked down by default

The plugin currently accepts any incoming JSON by default, under the expectation that you'll initially be running it on a laptop. The README shows how to secure it.

https://www.bleepingcomputer.com/news/security/new-meow-attack-has-deleted-almost-4-000-unsecured-databases/ reminded me that MongoDB and Elasticsearch are open by default, with the result that people keep on deploying unprotected instances.

So by 1.0 of this plugin I'm going to figure out how to have it secure by default.

Rename to datasette-insert-api

datasette-insert-api seems clearer to me in terms of explaining that this plugin lets you insert data. The fact that it ALSO lets you update data isn't particularly confusing.

datasette-update-api - the current name - doesn't imply insert as strongly as insert-API implies update.

Support inserting a single record

This should work:

curl --request POST \
  --data '{
        "id": 3,
        "name": "Boris",
        "age": 1,
        "breed": "Husky"
    }' \
    'http://localhost:8001/-/update/unsafe/dogs'

Think about CORS

Experimental patch - is this the right approach?

diff --git a/datasette_insert/__init__.py b/datasette_insert/__init__.py
index db10210..159f9d2 100644
--- a/datasette_insert/__init__.py
+++ b/datasette_insert/__init__.py
@@ -14,6 +14,16 @@ async def insert_update(request, datasette):
     table = request.url_vars["table"]
     db = datasette.get_database(database)
 
+    # Handle CORS
+    if request.method == "OPTIONS":
+        r = Response.text("ok")
+        if datasette.cors:
+            r.headers["Access-Control-Allow-Origin"] = "*"
+            r.headers["Access-Control-Allow-Headers"] = "content-type,authorization"
+            r.headers["Access-Control-Allow-Methods"] = "POST"
+        print(r.headers)
+        return r
+
     # Check permissions
     allow_insert_update = False
     allow_create_table = False

Finely grained permissions

Here's the documentation I wrote for finely grained permissions, which are not yet implemented:


Finely grained permissions

Using an "allow" block as described above grants full permission to the features enabled by the API.

The API implements several new Datasette permissions, which other plugins can use to make more finely grained decisions.

The full set of permissions is as follows:

  • insert-api:all - all permissions - this is used by the "allow" block described above. Argument: database_name
  • insert-api:insert-update - the ability to insert data into an existing table, or to update data by its primary key. Arguments: (database_name, table_name)
  • insert-api:create-table - the ability to create a new table. Argument: database_name
  • insert-api:alter-table - the ability to add columns to an existing table (using ?alter=1). Arguments: (database_name, table_name)

You can use plugins like datasette-permissions-sql to hook into these more detailed permissions for finely grained control over what actions each authenticated actor can take.

Plugins that implement the permission_allowed() plugin hook can take full control over these permission decisions.

Originally posted by @simonw in #4 (comment)

Support upserts

So I can POST a partial JSON document matching a primary key and only the fields I provide will be updated.

Optional table definition

It would be good if you could define the table columns in the metadata itself, such that the table is created to that specification if it's missing and then records are inserted into that table.

feature request: expose inserted row ids

Hello, thank you for the wonderful work!

When a table's primary key is generated by sqlite, it would be useful to return a reference to that id. For example, my use case is a web app where I would like to redirect a user after a form entry.

Exposing last_pk/last_rowid would be enough for single records I think, but it may be good to consider how to handle multiples.
