Comments (20)
I wrote up a bunch of detail on this in this comment: #194 (comment)
In short, java.nio may be neither necessary nor a simple answer here.
from pgjdbc.
As I am also wondering why PGJDBC is not using native NIO Channels instead of InputStream/OutputStream, I'd like to ask the following questions:
- Do you mean that it is impossible to use NIO Channels?
- Or do you mean that it is just not needed to solve this particular issue?
In particular I wonder whether NIO Channels would be beneficial for non-SSL (pure TCP) connections?
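For what it's worth, channels and streams can interoperate, so a move toward NIO need not be all-or-nothing. A minimal sketch (not pgjdbc code; the class and method names are mine) of reading through an NIO channel wrapped around an existing InputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ChannelBridge {
    // Adapts an existing InputStream to an NIO channel; reads land
    // directly in a reusable ByteBuffer instead of a fresh byte[].
    static int readInto(InputStream in, ByteBuffer buf) throws IOException {
        ReadableByteChannel ch = Channels.newChannel(in);
        return ch.read(buf); // bytes read, or -1 at end-of-stream
    }
}
```

The same bridging works in the other direction via Channels.newChannel(OutputStream), which is one incremental path from stream-based code toward channels.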
from pgjdbc.
Markus,
The reality is that when Oliver wrote the V3 protocol NIO was not
available. Nobody has invested the effort into writing the protocol layer
with NIO. I'd certainly welcome an NIO protocol implementation.
Dave
from pgjdbc.
@davecramer Great to hear. I used NIO some years back on a different project and it gave a nice performance boost. @vlsi Possibly Vladimir can help me this summer with some benchmarking, so I could provide a PR for that. Shouldn't be too complex, but it would reduce the number of byte arrays a lot.
from pgjdbc.
would reduce the number of byte arrays a lot

I would not be very optimistic here.
- In the "send binds to backend" case the main "problem" is that the backend wants a "length" field before the data, so we really have to buffer the data somewhere.
- In the "receive tuple" case pgjdbc creates lots of individual byte[] for easier implementation of the further "get" APIs. Well, it might be the case that netty's ByteBuf would simplify "random access" into the received stream and allow us to avoid lots of temporary byte[].
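To illustrate the first point: a v3-style frame carries its length before the payload, so the payload must be fully assembled before the header can be written. A hedged sketch (not pgjdbc's actual code) of building such a frame with a ByteBuffer:

```java
import java.nio.ByteBuffer;

public class FrameSketch {
    // Builds a v3-style frame: a 1-byte tag, a 4-byte big-endian length
    // (which counts itself plus the payload, but not the tag), then the
    // payload. The payload has to be buffered before the length is known.
    static ByteBuffer frame(byte tag, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + payload.length);
        buf.put(tag);
        buf.putInt(4 + payload.length); // length includes itself
        buf.put(payload);
        buf.flip(); // ready to write to a channel
        return buf;
    }
}
```

This is exactly the buffering the comment above refers to: NIO does not remove the need to know the length up front.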
from pgjdbc.
@vlsi I'll give it a quick shot anyways, just for fun. If it turns out it is useless, we can simply stay with the code as-is. :-)
from pgjdbc.
just for fun.
Ah, that is fun indeed.
from pgjdbc.
I checked what Oracle reports regarding NIO, and they say ByteBuffers surpass byte[] starting at about 8K in size. So I expect the biggest performance gain when batch-loading a large ResultSet. Maybe I should concentrate on that.
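If the ~8K crossover quoted above holds, one pragmatic pattern is to choose the buffer kind by transfer size. A sketch (the 8192 threshold is just the figure from this comment, not a hard rule; direct buffers are costly to allocate and should be reused):

```java
import java.nio.ByteBuffer;

public class BufferChoice {
    // Heap buffers wrap an ordinary byte[]; direct buffers live outside
    // the Java heap, so the OS can read/write them without an extra copy,
    // which is where the gains for large transfers tend to come from.
    static ByteBuffer forTransfer(int size) {
        return size >= 8192 ? ByteBuffer.allocateDirect(size)
                            : ByteBuffer.allocate(size);
    }
}
```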
from pgjdbc.
Or in the copy command?
Dave Cramer
from pgjdbc.
@davecramer I expect it would speed up COPY, but I assume many more people will use ResultSets.
from pgjdbc.
True, but if you do it for one we should be good to go for all of them
Dave Cramer
from pgjdbc.
It'd be useful to know the size of data we're going to send in advance anyway, for protocol deadlock avoidance reasons. IMO we should really have a class for each message type, so we can instantiate it, set its data, get its length, then send it on the wire. This would fit well with a channels based approach.
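A rough sketch of that idea (hypothetical names, not pgjdbc's actual classes), using the v3 Query message's tag/length/body layout as an example: the message can report its length before anything touches the wire, which is what deadlock-avoidance bookkeeping needs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// One class per protocol message: instantiate it, set its data,
// ask for its length, then encode it into a buffer for the wire.
interface WireMessage {
    int length();                 // total bytes this message will occupy
    void writeTo(ByteBuffer out); // encode into a caller-supplied buffer
}

final class QueryMessage implements WireMessage {
    private final byte[] sql;

    QueryMessage(String sql) {
        // The v3 Query body is the SQL text, NUL-terminated.
        this.sql = (sql + '\0').getBytes(StandardCharsets.UTF_8);
    }

    @Override public int length() {
        return 1 + 4 + sql.length; // tag + length field + body
    }

    @Override public void writeTo(ByteBuffer out) {
        out.put((byte) 'Q');
        out.putInt(4 + sql.length); // length counts itself, not the tag
        out.put(sql);
    }
}
```

With this shape the sender can sum the lengths of pending messages before flushing, which fits a channels-based writer naturally.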
from pgjdbc.
We have a service that periodically dumps/loads a lot of data; we were using batched inserts, but it was still slow. Only after reading the JDBC source code and finding this issue and a related one did I rewrite parts of our service, so it now generates a big insert (with 1000 rows per insert), and it is a huge improvement. But it still feels like I am unable to achieve 100% of what the Postgres server could deliver, because I am being slowed down by plain round trips to the database. It would be nice if batching in JDBC worked as expected and, if possible, with asynchronous operation.
from pgjdbc.
@sasavilic the batching issues are an artifact of the server. It does not support batching. If you really want performant inserts, copy is the way to go.
As far as using NIO for inserts goes, pull requests are welcome.
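For reference, a sketch of the COPY route. Building the command is shown runnable; the table and column names are made up, and the driver calls (pgjdbc's Copy API) are shown in comments since they need a live connection:

```java
public class CopyLoad {
    // Builds a COPY command for streaming rows in via STDIN;
    // table/columns here are hypothetical examples.
    static String copySql(String table, String... cols) {
        return "COPY " + table + " (" + String.join(", ", cols)
             + ") FROM STDIN WITH (FORMAT csv)";
    }

    // With pgjdbc on the classpath, rows are fed to the command through
    // the driver's Copy API, bypassing per-row INSERT round trips, e.g.:
    //   CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
    //   long rows = copy.copyIn(copySql("my_table", "id", "name"), reader);
}
```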
from pgjdbc.
with 1000 rows per insert

@sasavilic, which pgjdbc version are you using?
Have you tried adding the reWriteBatchedInserts=true option? It rewrites batched "insert ... values (...)" queries into a single insert ... values (...),(...),(...),... statement.
Unfortunately, the server side does not really support true batching (it still processes commands one by one even when they are sent in a single network packet), so for high performance there are only three options:
- reWriteBatchedInserts=true
- Manually craft a single SQL statement
- Copy API
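To illustrate what reWriteBatchedInserts does conceptually (this is an illustration, not pgjdbc's actual rewriting code): N batched executions of one "insert ... values (?,?)" collapse into a single statement with N value groups, cutting per-statement overhead.

```java
import java.util.StringJoiner;

public class BatchRewrite {
    // Collapses a batch of identical parameterized inserts into one
    // multi-row insert by repeating the VALUES group batchSize times.
    static String rewrite(String prefix, String valueGroup, int batchSize) {
        StringJoiner values = new StringJoiner(",");
        for (int i = 0; i < batchSize; i++) {
            values.add(valueGroup);
        }
        return prefix + " " + values;
    }
}
```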
from pgjdbc.
I think I owe an apology. After double checking and repeating the tests, I can see that the performance is almost identical. I must have done something wrong when I was measuring the first time. Sorry about the false alarm.
I am using 42.2.2 with manually crafted SQL for insert (variant B) and simple SQL insert with JDBC batching (variant A). Besides the obvious error in measurement during the first test, I was additionally confused by comments in the source code and on GitHub: I had thought that the JDBC library waits for command completion after each Bind/Describe/Execute, but after looking at the traffic in Wireshark, I can see that I was wrong.
(I don't want to pollute your thread further, but my problem is that I have a Java process that reads from one table, does some complex processing and then stores the result in a second table. I use a cursor for fetching data from the source table (5000 rows is the fetch size) and then insert rows in batches of 100 (variant A). My Java thread is at 50% utilization, the Postgres backend is at 40% utilization (but the whole server is almost idle), write performance is around 5-7 MB/s (although we have fast SSDs), average throughput is around 10k rows/s, and the network is supposed to be good. I expected that either Postgres or Java would be near 100% utilization, but I can't achieve that. That is why I was suspicious about batch processing and the JDBC driver, and was under the impression that I was losing a lot of time sending these small requests/responses between client and backend.)
from pgjdbc.
I still think this would be useful, in fact rewriting the entire thing with NIO would be useful.
from pgjdbc.
I might be HS, but just saying that in Java 13
https://openjdk.java.net/jeps/353
https://openjdk.java.net/jeps/373
the Java socket API has been reimplemented, which might alleviate some issues and, crucially, is forward compatible with Loom, which will enable true asynchronicity. NIO sockets should also ensure Loom compatibility.
Finally, a noob question:
will NIO sockets enable truly non-blocking sockets, and therefore enable asynchronous queries, query pipelining and batch processing?
https://2ndquadrant.github.io/postgres/libpq-batch-mode.html
This feature is the biggest performance bottleneck for servers in the TechEmpower benchmark. If you were able to solve this before Loom or R2DBC, you would allow existing Java code to shine, and Spring to be among the top 5 fastest HTTP servers in the world!
from pgjdbc.
interesting. Thanks
from pgjdbc.
@LifeIsStrange Loom doesn't automatically make NOTIFY properly async or anything else - for that the driver really needs to split rx/tx into separate threads. However Loom would make it cheaper to do that.
from pgjdbc.