I'm inspired by the development of Algorithms and Data Structures that are optimal for specific tasks.
In my free time I like to play sports, listen to music, and solve algorithmic problems.
hero's Introduction
hero's People
hero's Issues
improve project structure
- check configuration files in all project branches (.pylintrc, requirements.txt, etc.).
- remove all unnecessary packages from the requirements.txt file
...
add costs
add cost values in the plan representation
Cost model vs NN
Compare the learned NN with the cost model on the plan ranking problem (in generalisation mode!): "What is the probability that two random plans will be ordered correctly by comparing their costs (NN predictions)?"
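The ranking question above can be estimated empirically as the fraction of plan pairs whose order by score matches their order by measured time. The following is a minimal sketch under assumptions: the function name `pairwise_ordering_accuracy` and the flat-list inputs are hypothetical and not taken from the project, and pairs with equal true times are skipped because they carry no ordering information.

```python
from itertools import combinations


def pairwise_ordering_accuracy(true_times, scores):
    """Fraction of plan pairs ordered the same way by `scores` as by `true_times`.

    `scores` can be optimizer costs or NN predictions (hypothetical inputs).
    """
    if len(true_times) != len(scores):
        raise ValueError("true_times and scores must have the same length")
    correct = 0
    total = 0
    for i, j in combinations(range(len(true_times)), 2):
        dt = true_times[i] - true_times[j]
        if dt == 0:
            continue  # a tie in the ground truth defines no order
        ds = scores[i] - scores[j]
        total += 1
        if dt * ds > 0:  # same sign means the pair is ordered correctly
            correct += 1
    return correct / total if total else float("nan")
```

Calling it once with cost values and once with NN predictions on the same held-out plans gives two directly comparable numbers.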
hintset exploration strategy
Task
Investigate possible learning algorithms (i.e. strategies of exploration for good transition).
Context
- We have realized that the robustness problem is quite acute even on commonly used benchmarks.
- The natural way to deal with it would be to a) switch to offline learning and b) use checks of similarity of the custom plan and its estimated cardinalities with the experience from history.
- In order to guarantee the safety of any prediction of the M model, the transitions obtained with it must have already been explored. In this case, we don't need any prediction model, because we can just take the times from history itself!
- It means that offline learning becomes just applying a smart strategy for filling history with the most useful transitions, i.e., we must explore queries and hintsets in such a way that we can find transitions with the highest speedup as quickly as possible.
- So we get a situation where hintsets are just a way to get the desired transition, and inference becomes just a search against the default plan for hintsets that could potentially lead to a good and already confirmed transition. We will see later why this is an extremely important feature of the model.
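The history-only inference described in this list can be sketched as a plain lookup. This is an illustration under assumptions, not the project's implementation: the `Transition` record, the dictionary keyed by a default-plan identifier, and the `min_speedup` threshold are all hypothetical names introduced here.

```python
from typing import Dict, List, NamedTuple, Optional


class Transition(NamedTuple):
    """One explored transition (hypothetical record layout)."""
    hintset: int          # identifier of the hintset that produced the custom plan
    default_time: float   # measured time of the default plan
    custom_time: float    # measured time of the plan obtained with the hintset


def best_confirmed_hintset(
    history: Dict[str, List[Transition]],
    default_plan_key: str,
    min_speedup: float = 1.0,
) -> Optional[int]:
    """Return the hintset with the highest confirmed speedup, or None.

    None means "keep the default plan": no explored transition for this
    default plan beats `min_speedup`.
    """
    best = None
    best_speedup = min_speedup
    for t in history.get(default_plan_key, []):
        if t.custom_time <= 0:
            continue  # skip malformed measurements
        speedup = t.default_time / t.custom_time
        if speedup > best_speedup:
            best, best_speedup = t.hintset, speedup
    return best
```

Because every returned hintset comes from a measured transition, the lookup cannot recommend an unexplored plan, which is the safety property the list argues for.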
Emulate online scenario
Main Questions
- For various load configurations, answer the following questions:
  - When can the application of a NN in an online scenario be beneficial?
  - How much resources will be needed for this?
  - How much more effective is the hero approach?
- Consider a) planning time, b) training time (hero), and c) regression from predictions in an online scenario.
- Investigate the dependency of the achieved performance gain and required resources on the search space (only hintset / only dop / hintset and dop).
Scenarios of Interest
A scenario in emulation is determined by two components - the available data for model training and the workload.
- Data = all default plans, workload = all queries.
Goal: to test the ability to generalise knowledge based on the history of standard plans without changing the workload.
- Data = results of the execution of plans previously selected by the NN, workload = all queries; the process of training models, executing the workload, and collecting data is repeated until convergence to the optimum.
Goal: to measure the resources needed to achieve a beneficial outcome using the classical approach.
- Data = plans of all fast queries, workload = long queries (and vice versa).
Goal: to test the ability to generalise to a workload with changes in the distribution of query execution times.
- Data = plans of part of the queries with the structure of the standard tree X, workload = remaining queries with the same structure X.
Goal: to test the ability to generalise knowledge from a partial history to a workload with changes only in the statistics of standard plans.
- Data = plans of part of the queries with the standard tree X, workload = remaining queries with the same standard tree X.
Goal: to test the ability to generalise knowledge from a partial history to a workload without changes in standard plans.
add experiments artifacts
add archives with experiment artifacts:
- model weights
- loss curves, and
- processed stratified metrics
explore the prediction modes
tl;dr
Compare the different hint prediction modes: a) template-based, b) logical-plan-based, and c) full-plan-based (with estimations).
Goals
Find answers to the following questions:
- "Is it possible to make robust template-based hint prediction?"
- "Is the logical plan enough to make robust hint prediction?"
- "What is the worst case for these types of predictions?"
Exploring the possibilities of hint-based optimization
Description
Investigate extreme cases of query behavior when using hints and the query_dop parameter (both regression and acceleration).
Create sequential-all Dataset
sequential-all dataset
Dataset Description
Collect the results of sequential calls of the EXPLAIN (format json) and EXPLAIN (analyze, format json) commands for all queries from 3 common benchmarks (JOB, TPCH, sample_queries) under different environment settings (all combinations of 7 hints and 3 parallel modes). Execution of duplicated plans can be eliminated in order to reduce the collection time.
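The collection loop with duplicate elimination might look like the sketch below. Several things here are assumptions rather than facts from the project: "all combinations of 7 hints" is read as every on/off assignment of 7 boolean hints (128 hintsets, so 384 settings with 3 parallel modes), the `PARALLEL_MODES` values are placeholders, and `explain` / `explain_analyze` are hypothetical callbacks that wrap the two EXPLAIN commands and return a hashable plan (for example a canonical JSON string) and a measured result.

```python
from itertools import product

N_HINTS = 7                  # number of boolean hints (from the description)
PARALLEL_MODES = (1, 2, 4)   # placeholder values; the real modes are not given


def enumerate_settings(n_hints=N_HINTS, parallel_modes=PARALLEL_MODES):
    """Yield every (hint flags, parallel mode) environment setting."""
    for flags in product((False, True), repeat=n_hints):
        for mode in parallel_modes:
            yield flags, mode


def collect(queries, explain, explain_analyze):
    """Collect plans for all settings, executing each distinct plan only once."""
    executed = {}  # (query, plan) -> result of the single real execution
    rows = []
    for query in queries:
        for flags, mode in enumerate_settings():
            plan = explain(query, flags, mode)  # cheap: planning only
            key = (query, plan)
            if key not in executed:
                # expensive: runs the query, so skip plans already executed
                executed[key] = explain_analyze(query, flags, mode)
            rows.append((query, flags, mode, plan, executed[key]))
    return rows
```

Keying the cache on the pair of query and plan keeps one row per setting in the dataset while paying the execution cost only for distinct plans.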
check TCNN abilities
to do:
- reimplement TCNN
- check its ability to avoid regressions
- try a neighbor prioritization approach during local search using TCNN prediction sorting
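One possible reading of the neighbor prioritization item is a greedy local search over hintsets in which unmeasured neighbors are tried in order of predicted time. The sketch below assumes a hintset is a bitmask over 7 hints and that a neighbor differs in exactly one hint; `predict` stands in for a TCNN prediction and `measure` for a real execution, and both are hypothetical callbacks, as is the `max_measurements` budget.

```python
def neighbors(hintset, n_hints=7):
    """All hintsets that differ from `hintset` in exactly one hint bit."""
    return [hintset ^ (1 << i) for i in range(n_hints)]


def local_search(start, predict, measure, n_hints=7, max_measurements=50):
    """Greedy local search; neighbors are tried in order of predicted time."""
    current = start
    current_time = measure(current)
    measured = {current: current_time}
    while len(measured) < max_measurements:
        candidates = [h for h in neighbors(current, n_hints) if h not in measured]
        candidates.sort(key=predict)  # most promising (lowest predicted time) first
        improved = False
        for h in candidates:
            if len(measured) >= max_measurements:
                break
            measured[h] = measure(h)
            if measured[h] < current_time:
                # move as soon as a measured improvement is found
                current, current_time = h, measured[h]
                improved = True
                break
        if not improved:
            break  # local optimum, or the measurement budget is exhausted
    return current, current_time
```

The prediction only decides the order of measurements, so a poor prediction costs extra executions but cannot make the search accept a slower hintset.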