
darenasc / aeda


Build a data catalog by running a single line of code

License: MIT License

Languages: Python 83.18%, TSQL 4.39%, Dockerfile 0.27%, Jupyter Notebook 12.16%
Topics: data-catalog, data-exploration, database, eda, metadata, metadata-extraction

aeda's People

Contributors

darenasc

aeda's Issues

Add anomaly analysis

Is your feature request related to a problem? Please describe.
Column profiling could include an optional anomaly analysis.

Describe the solution you'd like
Search for outliers in numerical columns and for biased (imbalanced) classes in categorical columns.

Describe alternatives you've considered
None
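
A minimal sketch of what such an anomaly analysis could compute, assuming each column arrives as a plain Python list of values (not aeda's actual data structures). Outliers are found with Tukey's 1.5×IQR fences, one common choice since the issue does not name a method, and a categorical column is flagged as biased when its most frequent class exceeds a hypothetical share threshold.

```python
from collections import Counter
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Needs at least two numeric values (statistics.quantiles requirement)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def dominant_class(values, max_share=0.9):
    """Return (label, share) if one category exceeds max_share of the rows, else None.

    max_share is a hypothetical threshold chosen for illustration."""
    counts = Counter(values)
    label, count = counts.most_common(1)[0]
    share = count / len(values)
    return (label, share) if share > max_share else None
```

For example, `iqr_outliers([10, 12, 11, 13, 12, 11, 95])` returns `[95]`, and a column that is 95% `"a"` is reported by `dominant_class` as `("a", 0.95)`.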

Add filter by data type

Add a filter for data types such as TEXT, NTEXT, JSONB, and others when processing data_values.
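
A minimal sketch of such a filter, assuming column metadata is available as (column_name, data_type) pairs, for example as read from information_schema.columns. The exclusion set and the function name are hypothetical, not part of aeda.

```python
# Hypothetical set of types to skip when collecting data_values;
# extend it with other large or unstructured types as needed.
EXCLUDED_TYPES = {"text", "ntext", "jsonb"}


def columns_for_data_values(columns, excluded=EXCLUDED_TYPES):
    """Keep only (name, data_type) pairs whose type is not excluded.

    The comparison is case-insensitive because databases report type
    names in different cases (e.g. 'NTEXT' vs 'ntext')."""
    excluded = {t.lower() for t in excluded}
    return [(name, dtype) for name, dtype in columns if dtype.lower() not in excluded]
```

For example, given `[("id", "int"), ("notes", "NTEXT"), ("payload", "jsonb"), ("name", "varchar")]`, only the `id` and `name` columns are kept.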

Allow multiple profilings

Is your feature request related to a problem? Please describe.
Currently, aeda can profile only one database at a time. Ideally, it would accept a list of connections defined in databases.ini and process them in parallel or in sequence with a single API call.

Describe the solution you'd like
The explore call could receive a list of database connections separated by spaces and queue them for processing; the last connection in the list would be the target metadata database.
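
A minimal sketch of the proposed queue, processed in sequence, assuming each connection name is a section in databases.ini. The function names and the `profile` callback are hypothetical stand-ins; aeda's real explore entry point may look different.

```python
import configparser
from collections import deque


def build_queue(connection_names):
    """Split connection names into (queue of sources, target).

    As proposed, the last name is the target metadata database and
    every preceding name is a source database to profile."""
    if len(connection_names) < 2:
        raise ValueError("need at least one source and one target connection")
    *sources, target = connection_names
    return deque(sources), target


def run_sequentially(connection_names, config_path, profile):
    """Profile each source in order, writing results to the target.

    `profile(source_cfg, target_cfg)` is a hypothetical callback that
    stands in for aeda's per-database profiling."""
    config = configparser.ConfigParser()
    config.read(config_path)
    queue, target = build_queue(connection_names)
    target_cfg = dict(config[target])
    while queue:
        source = queue.popleft()
        profile(dict(config[source]), target_cfg)
```

A space-separated argument such as `"sales_db hr_db metadata_db"` can be passed as `args.split()`, giving the sources `sales_db` and `hr_db` and the target `metadata_db`. A parallel variant could submit each source to a thread or process pool instead of popping them one at a time.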

Process data in-memory

Process small tables from the source database in memory, as a dataframe, and insert the results into the metadata database. A table counts as small if it has fewer than 10k rows, or fewer than whatever row threshold benchmark #26 (in-memory processing vs. querying the source database) suggests.
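
A minimal sketch of the decision rule and of a small in-memory profile. The 10,000-row default comes from the issue; the threshold is a parameter so a value derived from benchmark #26 could replace it. The strategy labels are hypothetical, and rows are plain dicts rather than a pandas DataFrame to keep the example dependency-free.

```python
DEFAULT_ROW_THRESHOLD = 10_000  # from the issue; may be replaced by benchmark #26's result


def choose_strategy(row_count, threshold=DEFAULT_ROW_THRESHOLD):
    """Return 'in_memory' for tables below the threshold, else 'query_source'."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    return "in_memory" if row_count < threshold else "query_source"


def profile_in_memory(rows):
    """Compute per-column row, null, and distinct counts from a list of dicts,
    without issuing further queries to the source database.

    Column names are taken from the first row; values must be hashable."""
    columns = rows[0].keys() if rows else []
    stats = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return stats
```

The resulting `stats` dictionary is what would then be inserted into the metadata database.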

Provide a way to transform data

Is your feature request related to a problem? Please describe.
It would be useful to provide an interface for transforming data at the column level, for example by adding code descriptions so that when a code appears in a table it is displayed with its interpretation.

Describe the solution you'd like
Include a dictionary that stores code-to-interpretation mappings at the column level, so that reports display these clearer values.

Describe alternatives you've considered
Adding a user input in the Streamlit app, or passing the dictionary as a config file.
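
A minimal sketch of the dictionary-based approach, assuming mappings are keyed by (table, column), for example after loading them from a config file. The mapping contents and function names are hypothetical. Mapped codes are shown together with their interpretation; unmapped codes are returned unchanged so reports still show the raw value.

```python
# Hypothetical mapping: (table, column) -> {code: interpretation}
CODE_MAPPINGS = {
    ("patients", "sex"): {"M": "Male", "F": "Female"},
    ("orders", "status"): {1: "Pending", 2: "Shipped", 3: "Delivered"},
}


def decode(table, column, value, mappings=CODE_MAPPINGS):
    """Return 'code (interpretation)' for mapped codes, else the value itself."""
    meaning = mappings.get((table, column), {}).get(value)
    return value if meaning is None else f"{value} ({meaning})"


def decode_column(table, column, values, mappings=CODE_MAPPINGS):
    """Apply decode to every value of a column, e.g. before rendering a report."""
    return [decode(table, column, v, mappings) for v in values]
```

For example, `decode_column("orders", "status", [1, 2, 9])` yields `["1 (Pending)", "2 (Shipped)", 9]`.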

Add date related summary

A table similar to stats, but for datetime-related data:

  • max date
  • min date
  • range (total number of days)
  • data received per day
  • median
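
A minimal sketch of the proposed summary, assuming the column's values are already parsed into `datetime.date` objects (convert `datetime.datetime` values with `.date()` first). "Range" is read here as max minus min in days and "data received per day" as a count of rows per calendar date; both readings are assumptions. `statistics.median_low` is used because it only needs ordering, so it always returns an actual date from the data.

```python
from collections import Counter
from datetime import date
from statistics import median_low


def date_summary(dates: list[date]) -> dict:
    """Summarize a non-empty list of datetime.date values."""
    if not dates:
        raise ValueError("dates must be non-empty")
    lo, hi = min(dates), max(dates)
    return {
        "min_date": lo,
        "max_date": hi,
        "range_days": (hi - lo).days,
        # Number of rows per calendar date, in chronological order
        "rows_per_day": dict(sorted(Counter(dates).items())),
        # With an even count, median_low picks the lower of the two middle dates
        "median": median_low(dates),
    }
```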
