darenasc / aeda
Build a data catalog by running a single line of code
License: MIT License
Is your feature request related to a problem? Please describe.
In the profiling of columns, an optional analysis of anomalies could be included.
Describe the solution you'd like
Search for outliers in numerical columns and for biased (imbalanced) classes in categorical columns.
Describe alternatives you've considered
None
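A minimal sketch of what such an optional anomaly check could look like, using only the standard library. The function names, the 1.5 × IQR rule, and the 90% dominance threshold are assumptions chosen for illustration; none of them comes from aeda itself.

```python
from collections import Counter
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the column
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def dominant_classes(values, threshold=0.9):
    """Return {category: share} for categories whose share is >= threshold."""
    counts = Counter(values)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items() if n / total >= threshold}
```

A numerical column would be passed to `iqr_outliers` and a categorical column to `dominant_classes`; an empty result means nothing was flagged under these assumed thresholds.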
Add filters for data types such as TEXT, NTEXT, JSONB, and others when processing data_values.
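One way to express such a filter is sketched below. The `SKIPPED_TYPES` set and both helper names are hypothetical; only the three type names mentioned above are included, and how aeda actually represents column metadata is not assumed.

```python
# Hypothetical skip list; extend it with other large or unstructured types as needed.
SKIPPED_TYPES = {"TEXT", "NTEXT", "JSONB"}


def base_type(data_type):
    """Normalize a declared type, e.g. 'varchar(255)' becomes 'VARCHAR'."""
    return data_type.split("(", 1)[0].strip().upper()


def columns_to_profile(columns):
    """Keep the (name, data_type) pairs whose type is not in the skip list."""
    return [(name, dtype) for name, dtype in columns
            if base_type(dtype) not in SKIPPED_TYPES]
```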
Using confidence and support. The confidence of A → B is P(A & B) / P(A).
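The two measures can be sketched as follows, treating each row as a set of items (for instance column/value pairs). The function names and the row representation are assumptions made for illustration.

```python
def support(rows, items):
    """Fraction of rows that contain every item in `items`, i.e. P(items)."""
    items = set(items)
    return sum(items <= set(row) for row in rows) / len(rows)


def confidence(rows, antecedent, consequent):
    """Confidence of antecedent -> consequent: P(A & B) / P(A)."""
    p_a = support(rows, antecedent)
    if p_a == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(rows, set(antecedent) | set(consequent)) / p_a
```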
Log all the operations and processing times in a log file
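A possible shape for this, sketched with the standard `logging` module and a timing decorator. The logger name, the `aeda.log` file name, and the `timed` decorator are hypothetical and not part of aeda.

```python
import functools
import logging
import time

logger = logging.getLogger("aeda_sketch")  # hypothetical logger name


def configure_file_logging(path="aeda.log"):
    """Send log records to a file with a timestamp on each line."""
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)


def timed(func):
    """Log how long each call to `func` takes, even if it raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logger.info("%s finished in %.3f s", func.__name__, elapsed)
    return wrapper
```

Calling `configure_file_logging()` once at startup and decorating each processing step with `@timed` would then record every operation and its duration in the log file.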
Is your feature request related to a problem? Please describe.
Currently, aeda can profile only one database at a time. Ideally, it would accept a list of connections defined in databases.ini and process them in parallel or in sequence with a single API call.
Describe the solution you'd like
The explore call can receive a list of database connections separated by spaces and build a queue to process them. The last database connection should represent the target metadata database.
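The queueing idea could be sketched as below, processing the sources in sequence. `build_queue`, `explore_all`, and the `explore_one` callback are hypothetical names, and the section layout of databases.ini is assumed to be one section per connection.

```python
from collections import deque
from configparser import ConfigParser


def build_queue(connection_names, config):
    """Split the names into a queue of sources and the target metadata database."""
    if len(connection_names) < 2:
        raise ValueError("need at least one source and one target connection")
    missing = [n for n in connection_names if not config.has_section(n)]
    if missing:
        raise KeyError(f"connections not found in config: {missing}")
    *sources, target = connection_names  # the last name is the metadata database
    return deque(sources), target


def explore_all(connection_names, config, explore_one):
    """Run explore_one(source_settings, target_settings) for each queued source."""
    sources, target = build_queue(connection_names, config)
    processed = []
    while sources:
        source = sources.popleft()
        explore_one(dict(config[source]), dict(config[target]))
        processed.append(source)
    return processed
```

Here `config` would be a `ConfigParser` loaded with `config.read("databases.ini")`, and `explore_one` stands in for whatever function profiles a single database.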
Process small tables in the source database from a dataframe and insert the results into the metadata database. This would apply to tables with fewer than 10k rows, or below the row count that benchmark #26 suggests as the cutoff between in-memory processing and querying the source database.
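The decision itself reduces to a threshold check, sketched here. The function name and the strategy labels are hypothetical; the 10k default comes from the text above and could be replaced by whatever the benchmark suggests.

```python
IN_MEMORY_ROW_LIMIT = 10_000  # default taken from the text; tune from benchmark results


def choose_strategy(row_count, limit=IN_MEMORY_ROW_LIMIT):
    """Return 'in_memory' for tables below the limit, else 'query_source'."""
    return "in_memory" if row_count < limit else "query_source"
```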
Can be used for filtering.
Is your feature request related to a problem? Please describe.
It would be ideal to provide an interface to transform data at the column level, for example by adding representation codes so that when a code appears in a table it can be displayed with its interpretation.
Describe the solution you'd like
Include a dict in which to store a dictionary-like mapping for code transformations at the code level, so that reports can be displayed with the decoded values.
Describe alternatives you've considered
Adding a user input in the Streamlit app, or passing a dictionary as a config file.
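A minimal sketch of such a mapping, keyed by table and column. The `CODE_LABELS` contents, the table and column names, and `decode_values` are all hypothetical examples rather than anything defined by aeda.

```python
# Hypothetical mapping: (table, column) -> {code: human-readable label}
CODE_LABELS = {
    ("patients", "sex"): {"M": "Male", "F": "Female"},
}


def decode_values(table, column, values, code_labels=CODE_LABELS):
    """Replace known codes with their labels; unknown codes pass through unchanged."""
    mapping = code_labels.get((table, column), {})
    return [mapping.get(v, v) for v in values]
```

The same dictionary could be loaded from a config file or filled in from the app, which matches both alternatives described above.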
The structure of the metadata database can be taken from the config.py file.
Verify length of the data value before sending it to the metadata database
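One possible check is sketched below. The 255-character limit and the `fit_value` helper are assumptions; the real limit would depend on the column definition in the metadata database.

```python
MAX_VALUE_LENGTH = 255  # hypothetical limit; match it to the target column size


def fit_value(value, max_length=MAX_VALUE_LENGTH):
    """Return (text, truncated) where text fits within max_length characters."""
    text = str(value)
    if len(text) <= max_length:
        return text, False
    return text[:max_length], True
```

The returned flag lets the caller log or count truncated values instead of silently losing data.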
A table similar to stats but for datetime related data.