query optimization pushes column projection and filtering over to AWS (aka "predicate pushdown"), so less data has to be transferred out of S3 and the runtime computing and consuming the output record set has a smaller network/storage/memory footprint
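for illustration only (the users table and its columns are made up, and the exact expression the driver generates depends on its optimizer): a JDBC query like the first statement below could be served by pushing roughly the second statement to SelectObjectContent, which addresses the backing object as S3Object in S3 Select's SQL dialect

    -- query issued through JDBC
    SELECT name, age FROM users WHERE age > 30;

    -- roughly what could be pushed to S3 Select for the backing object
    -- (CSV fields arrive as strings, hence the CAST)
    SELECT s.name, s.age FROM S3Object s WHERE CAST(s.age AS INT) > 30;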
make sure you've already run the AWS CLI's configure command to add credentials to whatever environment you dev in
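for example, with the AWS CLI installed:

    $ aws configure
    AWS Access Key ID [None]: AKIA...
    AWS Secret Access Key [None]: ...
    Default region name [None]: us-east-1
    Default output format [None]: json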
in the test folder, replace all occurrences of "build.cauldron.tools" with the name of an existing S3 bucket that your previously configured AWS credentials actually have read/write access to
use the LakeDriver.getConnection(...) methods to create JDBC connections (a sketch follows the scan list below)
pass a list of TableSpecification objects defining all the "external tables" your query references
(optional) specify one of the following Scan classes to configure behavior
LakeS3GetScan uses GetObject; full tables are downloaded, and both projection and filtering are performed in memory
LakeS3SelectScan uses SelectObjectContent; only the required projected columns are downloaded, and filtering is done in memory
LakeS3SelectWhereScan (default) uses SelectObjectContent; both projection and filtering are done on AWS, the results are downloaded, and any remaining untranslated filters are applied in memory
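putting the above together, a minimal sketch (the TableSpecification constructor arguments and the exact getConnection overload are assumptions here; check the test folder for the real signatures):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.List;

    public class LakeQueryExample {
        public static void main(String[] args) throws Exception {
            // hypothetical TableSpecification shape: logical table name plus the
            // S3 location of the backing object
            List<TableSpecification> tables = List.of(
                    new TableSpecification("users", "s3://build.cauldron.tools/users.csv"));

            // hypothetical overload: the external tables plus an optional Scan class;
            // LakeS3SelectWhereScan is the default, so passing it here is redundant
            try (Connection conn = LakeDriver.getConnection(tables, LakeS3SelectWhereScan.class);
                 Statement stmt = conn.createStatement();
                 // projection (name) and the WHERE filter are pushed to S3 Select
                 ResultSet rs = stmt.executeQuery("SELECT name FROM users WHERE age > 30")) {
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }
    }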
todo
improve WHERE push-down
performance profiling, optimization
smarter, more comprehensive testing
mixed scan mode: some table scans are better served by GetObject, others by SelectObjectContent
integrate and test the Parquet compression support and save cash
get rid of AmazonS3URI.java dependency
figure out a way to get S3 Select working on AWS SDK v2