Comments (9)
I don't know if you are aware of it, but there is builtin BLOB support in Crate. See https://crate.io/blog/using-crate-data-as-a-blobstore/ for an introduction and the documentation https://crate.io/docs/stable/blob.html
from crate.
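For readers unfamiliar with the built-in blob store mentioned above: Crate addresses each blob over HTTP by the SHA-1 digest of its content, under a `/_blobs/<table>/<digest>` path. The following is a minimal sketch of how a client could derive that address. The helper name `blob_url`, the host `localhost:4200`, and the table name `myblobs` are assumptions for illustration and do not come from this thread.

```python
import hashlib


def blob_url(base_url: str, table: str, data: bytes) -> str:
    """Build the HTTP address of a blob from the SHA-1 digest of its content.

    Hypothetical helper: it only computes the URL and sends no request.
    """
    digest = hashlib.sha1(data).hexdigest()
    return f"{base_url.rstrip('/')}/_blobs/{table}/{digest}"


# Assumed host and blob table name, for illustration only.
url = blob_url("http://localhost:4200", "myblobs", b"hello blob")
print(url)
```

A client would then upload the content with an HTTP `PUT` to that URL and fetch it back with a `GET`; see the linked blob documentation for the exact behaviour.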
The internal BLOB support is good and useful. I am talking about an option to flip a switch and have crate automatically store the blobs on s3 (or another file storage system).
Ah, so you mean that Crate should act as a kind of proxy to third-party storage. I think this use case is very rare, so we will probably not put resources into it. We think that in most cases one will use Crate itself as the BLOB store because no additional setup is required. Local disks are also cheaper in most cases.
If you really need S3, it would make sense to access it directly, since a Crate proxy would give you no benefit in this case and would always have at least the latency of S3.
At first glance I can see three possible benefits of storing blobs externally:
- You can use the same code whether you run against your local test environment or your production environment on S3.
- You only need to know the connection string to Crate to access all your data (blobs and text content).
- If offered flexibly, it could work as an import/export option (i.e. you could easily import blobs from S3 or export them to it).
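The first benefit in the list above (identical application code in test and production) can also be approximated on the application side. The following is a minimal sketch under stated assumptions: `BlobStore`, `MemoryBlobStore`, `DirectoryBlobStore`, and `roundtrip` are hypothetical names, not part of Crate or of any S3 client. A backend that talks to Crate or to S3 would implement the same two methods.

```python
import hashlib
from abc import ABC, abstractmethod
from pathlib import Path


class BlobStore(ABC):
    """Hypothetical storage interface; not a Crate or S3 API."""

    @abstractmethod
    def put(self, data: bytes) -> str:
        """Store data and return its SHA-1 hex digest."""

    @abstractmethod
    def get(self, digest: str) -> bytes:
        """Return the bytes previously stored under digest."""


class MemoryBlobStore(BlobStore):
    """In-memory backend, e.g. for unit tests."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha1(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


class DirectoryBlobStore(BlobStore):
    """Local-disk backend that keeps one file per digest."""

    def __init__(self, root) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha1(data).hexdigest()
        (self._root / digest).write_bytes(data)
        return digest

    def get(self, digest: str) -> bytes:
        return (self._root / digest).read_bytes()


def roundtrip(store: BlobStore, data: bytes) -> bytes:
    """Application code depends only on the interface, not on the backend."""
    return store.get(store.put(data))
```

With this shape, switching environments means constructing a different `BlobStore` at startup, while callers such as `roundtrip` stay unchanged.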
Amendment: S3 as a backup/seed option would also give me confidence that the system really is fail-safe :)
Yep, we have an S3 backup option like "copy to" for blobs on our roadmap.
But that is just a backup, right? There is another reason for a blob storage strategy on S3 that I just remembered:
- You will never ever run out of disk space.
I am not a code contributor, but I did want to briefly chime in on this. I honestly see zero benefit in having Crate store its blob data in S3. I have actually used Crate and its built-in blob store to move all of our in-house data off S3 completely. Crate allowed us to build an in-house replacement for S3, and I believe that was one of the original intents of the Crate blob store: it lets you store your blobs locally in your own storage cluster.
The added latency from connecting to Crate and then having it connect to S3 is never going to be beneficial. You would be better served by storing your metadata in Crate and connecting directly to S3 from within your application than by having Crate act as a proxy to an external data store such as S3.
It seems that your main concern is not having enough disk space to store your data, and that is easily solved by building your own storage cluster. It most likely is not cost-effective to do so on VMs, but dedicated hardware is cheap and plentiful once your project reaches a scale where you need to be seriously concerned about it, and honestly, once you have that amount of data, local disks are way cheaper than S3.
I think @weswam is right: the use case of using Crate as an S3 proxy is very rare.