
handle bulk POST · postgrest · CLOSED · 11 comments

postgrest commented on May 5, 2024
handle bulk POST

Comments (11)

justinjoseph89 commented on May 5, 2024

Hi,
Does this mean posting multiple rows into Postgres all at once is not possible? Please let me know if there is any way.
Thanks

begriffs commented on May 5, 2024

It's not yet possible, but I can prioritize this feature higher now that I know someone wants it. Ideally I'd like to use the COPY command internally for speed, and I've been waiting for the hasql library to support it. But I could implement the feature using INSERT INTO just to get it working, then move it to COPY when that becomes available.
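
For a sense of the difference, here is a rough sketch of the two approaches against a hypothetical items table (the table and data are made up for illustration, and PostgREST's actual generated SQL may differ):

    -- Hypothetical table for illustration.
    CREATE TABLE items (
      id   serial PRIMARY KEY,
      name text NOT NULL,
      qty  integer DEFAULT 0
    );

    -- Approach 1: a single multi-row INSERT statement.
    INSERT INTO items (name, qty) VALUES
      ('apples',  3),
      ('oranges', 5);

    -- Approach 2: COPY streams rows in bulk and is typically faster
    -- for large payloads. Given CSV input like
    --   apples,3
    --   oranges,5
    -- the server would run:
    COPY items (name, qty) FROM STDIN WITH (FORMAT csv);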

justinjoseph89 commented on May 5, 2024

Hi,
Thanks for the reply. I am looking forward to it.

faroukI commented on May 5, 2024

Hi,

I am also looking forward to this feature, as I need to insert several rows at once.

Thanks for your hard work

begriffs commented on May 5, 2024

I've been thinking this evening about implementation details and there are a few choices we have to make.

  • How does the server tell the client all the URLs, and possibly the fields, of the newly created records? I'm leaning toward sending an HTTP mixed multipart response, which can have a separate Location header for each resource.
  • If the request Prefer header is set to return=representation then each of these multipart responses can include the created JSON object.
  • How to handle failure? There are two ways I can see.
    • The entire post is treated as a transaction. If all items insert OK then we get the multipart response as described above, but if any fail they all roll back and we get a single response with the error message.
    • Alternately, we can allow the successful ones to be saved and the failed ones not, and provide a specific error message in the multipart section where each failure occurred. The client would then have to look through the list and determine which items should be modified and retried.
  • The first of those alternatives is slightly easier to implement in SQL using INSERT INTO ... VALUES .... Doing the second would require creating an explicit transaction and loading up a single call to the database with multiple insert statements.
  • Accepting an array of JSON objects puts a burden on the server to check that each object contains the same keys and to line the keys up in a certain order inside the SQL insert statement. There are a few ways to handle this:
    • Have the server not check at all, and blithely order the keys in each object alphabetically in the VALUES section of the insert statement. If one of the objects omits a necessary key then the error message will set things straight with the client. However, if the keys are somehow offset the wrong way, there is the possibility of silently doing the wrong thing and scrambling column values in the database, especially where NULLs are allowed.
    • Have the server make two passes through the input: one to check that the data is OK and establish positions for each key in the insert statement, and another to build the query. Remember that some keys may have default values and it is OK to omit them, but omitting them means specifying DEFAULT in the correct value position of the insert statement (see the sketch after this list). This prevents data problems but is slow; I don't know yet what the space and time overhead would actually be.
  • Another alternative would be to accept bulk inserts in CSV format only, not JSON. That puts the burden on the client to get the column data in the right order. It also makes it easy to check for lines with too few fields. Would this be an acceptable solution for you?
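
To make the DEFAULT-position point concrete, here is a minimal sketch of what a two-pass server could emit for the JSON payload [{"name": "apples", "qty": 3}, {"name": "oranges"}], assuming the hypothetical items table from the sketch above (qty has a column default); this illustrates the idea, not PostgREST's actual implementation:

    -- The second object omits "qty", which has a column default,
    -- so DEFAULT is emitted in that value position.
    INSERT INTO items (name, qty) VALUES
      ('apples',  3),
      ('oranges', DEFAULT);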

faroukI commented on May 5, 2024

Hi,

Thank you again for everything.

For my application, your last solution (CSV format only) would be the best one, as the burden on the server would be reduced (though I guess it depends on each use case), and for me, adapting the code from a JSON to a CSV format is easy.
However, strictly speaking, that would create an inconsistency between the POST of a single row (which requires a JSON format) and the POST of multiple rows (which would require a CSV format).

If you decide, however, to use the first solution, with a JSON array and a multipart response, then regarding failure handling it would be preferable for the entire post to be treated as a transaction. That would reduce the burden on the client side of having to sort through successful and failed posts.
Regarding whether or not you check the keys to ensure the data is OK, one way to go about it would be an option when launching the postgrest program (thus leaving the choice to each developer):

  • one "expert" mode, where nothing would be check, and it would be up to the client application to make sure that everything sent is good
  • a "default" mode where the server would make the two passes as you mentionned.

That way, a developer can use the default mode when creating an application and, once sure everything is good, reduce the server's burden by switching to the "expert" mode.

Again, I want to thank you for this, as it is truly useful (at least for me).

begriffs commented on May 5, 2024

Today I implemented bulk inserts with CSV. Single-row CSV posts are supported, so you can choose to always use CSV rather than being forced to use JSON for single rows and CSV for multiple rows. And single JSON object inserts will still be supported.

Tomorrow I'll work on returning the right headers and HTTP codes but I think the hard part is now finished. Once it's ready to merge to master I'll release it as version 0.2.8.0.

begriffs commented on May 5, 2024

All tests but one are passing now.

I have discovered an interesting issue though: how should we represent a true null vs. an empty string when importing from CSV? I'm tempted to say that an empty string with double quotes ("") could represent an empty string whereas a missing field (two commas next to each other) could represent null, but I think the cassava Haskell library will parse these cases indistinguishably. Or we could make a special string code that gets turned into null, but that feels a bit arbitrary. Any opinions?
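
For reference, PostgreSQL's own COPY already draws exactly this distinction in CSV mode: the default NULL string there is the unquoted empty string, so an unquoted empty field is read as a SQL null while a quoted "" stays an empty string (the ambiguity described above is a limitation of the cassava parser, not of Postgres). A minimal sketch with a hypothetical table:

    CREATE TABLE notes (title text, body text);

    -- Given CSV input like
    --   first,
    --   second,""
    -- the following reads the unquoted empty field as NULL and the
    -- quoted one as an empty string:
    COPY notes (title, body) FROM STDIN WITH (FORMAT csv);
    -- Resulting rows: ('first', NULL) and ('second', '').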

faroukI commented on May 5, 2024

If the empty string ("") vs. missing field (two commas) distinction does not work, I'd think a "null" or "NULL" string should be good (as it is the same keyword used in SQL).

And in my opinion, if imposing a keyword lets it work without breaking or modifying the code of a library, that is what I would do.

Thank you for your work

begriffs commented on May 5, 2024

I chose the word NULL to map to a SQL null.
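
That convention maps directly onto the NULL option of PostgreSQL's COPY; a sketch of the semantics using the hypothetical notes table from the sketch above (again, this shows Postgres behavior, not PostgREST's internals):

    -- Given CSV input like
    --   first,NULL
    --   second,"NULL"
    -- an unquoted NULL token is read as a SQL null, while a quoted
    -- "NULL" remains the literal string:
    COPY notes (title, body) FROM STDIN WITH (FORMAT csv, NULL 'NULL');
    -- Resulting rows: ('first', NULL) and ('second', 'NULL').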

DanielJoyce commented on May 5, 2024

What if someone has a value that is literally "NULL"? How is that handled?
