
mage-ai / mage-ai


🧙 Build, run, and manage data pipelines for integrating and transforming data.

Home Page: https://www.mage.ai/

License: Apache License 2.0

JavaScript 0.22% TypeScript 34.32% CSS 0.10% Python 56.84% HTML 8.11% Dockerfile 0.11% Shell 0.10% Jupyter Notebook 0.06% Jinja 0.11% Mako 0.01% R 0.01% Makefile 0.01%
machine-learning artificial-intelligence data data-engineering data-science python elt etl pipelines data-pipelines

mage-ai's Introduction

🧙 A modern replacement for Airflow.

Documentation   🌪️   Get a 5 min overview   🌊   Play with live tool   🔥   Get instant help

Give your data team magical powers

Integrate and synchronize data from 3rd party sources

Build real-time and batch pipelines to transform data using Python, SQL, and R

Run, monitor, and orchestrate thousands of pipelines without losing sleep


1๏ธโƒฃ ๐Ÿ—๏ธ

Build

Have you met anyone who said they loved developing in Airflow?
That's why we designed an easy developer experience that you'll enjoy.

Easy developer experience
Start developing locally with a single command or launch a dev environment in your cloud using Terraform.

Language of choice
Write code in Python, SQL, or R in the same data pipeline for ultimate flexibility.

Engineering best practices built-in
Each step in your pipeline is a standalone file containing modular code that's reusable and testable with data validations. No more DAGs with spaghetti code.

↓

2๏ธโƒฃ ๐Ÿ”ฎ

Preview

Stop wasting time waiting around for your DAGs to finish testing.
Get instant feedback from your code each time you run it.

Interactive code
Immediately see results from your code's output with an interactive notebook UI.

Data is a first-class citizen
Each block of code in your pipeline produces data that can be versioned, partitioned, and cataloged for future use.

Collaborate on cloud
Develop collaboratively on cloud resources, version control with Git, and test pipelines without waiting for an available shared staging environment.

↓

3๏ธโƒฃ ๐Ÿš€

Launch

Don't have a large team dedicated to Airflow?
Mage makes it easy for a single developer or small team to scale up and manage thousands of pipelines.

Fast deploy
Deploy Mage to AWS, GCP, or Azure with only 2 commands using maintained Terraform templates.

Scaling made simple
Transform very large datasets directly in your data warehouse or through a native integration with Spark.

Observability
Operationalize your pipelines with built-in monitoring, alerting, and observability through an intuitive UI.

🧙 Intro

Mage is an open-source data pipeline tool for transforming and integrating data.

  1. Install
  2. Demo
  3. Tutorials
  4. Documentation
  5. Features
  6. Core design principles
  7. Core abstractions
  8. Contributing

๐Ÿƒโ€โ™€๏ธ Install

The recommended way to install the latest version of Mage is through Docker with the following command:

docker pull mageai/mageai:latest

You can also install Mage using pip or conda, though this may cause dependency issues without the proper environment.

pip install mage-ai
conda install -c conda-forge mage-ai

Looking for help? The fastest way to get started is by checking out our documentation here.

Looking for quick examples? Open a demo project right in your browser or check out our guides.

🎮 Demo

Live demo

Build and run a data pipeline with our demo app.

WARNING

The live demo is public to everyone; please don't save anything sensitive (e.g. passwords, secrets, etc.).

Demo video (5 min)

Mage quick start demo

Click the image to play video


👩‍🏫 Tutorials

Fire mage


🔮 Features

🎶 Orchestration Schedule and manage data pipelines with observability.
📓 Notebook Interactive Python, SQL, & R editor for coding data pipelines.
🏗️ Data integrations Synchronize data from 3rd party sources to your internal destinations.
🚰 Streaming pipelines Ingest and transform real-time data.
❎ dbt Build, run, and manage your dbt models with Mage.

A sample data pipeline defined across 3 files →

  1. Load data →
    import pandas as pd

    @data_loader
    def load_csv_from_file():
        return pd.read_csv('default_repo/titanic.csv')
  2. Transform data →
    @transformer
    def select_columns_from_df(df, *args):
        return df[['Age', 'Fare', 'Survived']]
  3. Export data →
    @data_exporter
    def export_titanic_data_to_disk(df) -> None:
        df.to_csv('default_repo/titanic_transformed.csv')

What the data pipeline looks like in the UI →

data pipeline overview

New? We recommend reading about blocks and learning from a hands-on tutorial.

Ask us questions on Slack


๐Ÿ”๏ธ Core design principles

Every user experience and technical design decision adheres to these principles.

💻 Easy developer experience Open-source engine that comes with a custom notebook UI for building data pipelines.
🚢 Engineering best practices built-in Build and deploy data pipelines using modular code. No more writing throwaway code or trying to turn notebooks into scripts.
💳 Data is a first-class citizen Designed from the ground up specifically for running data-intensive workflows.
🪐 Scaling is made simple Analyze and process large data quickly for rapid iteration.

🛸 Core abstractions

These are the fundamental concepts that Mage uses to operate.

Project Like a repository on GitHub; this is where you write all your code.
Pipeline Contains references to all the blocks of code you want to run, charts for visualizing data, and organizes the dependency between each block of code.
Block A file with code that can be executed independently or within a pipeline.
Data product Every block produces data after it's been executed. These are called data products in Mage.
Trigger A set of instructions that determine when or how a pipeline should run.
Run A record of a pipeline or block execution; it stores information about when the run started, its status, when it completed, any runtime variables used in the execution, etc.

🙋‍♀️ Contributing and developing

Add features and instantly improve the experience for everyone.

Check out the contributing guide to set up your development environment and start building.


👨‍👩‍👧‍👦 Community

Individually, we're a mage.

🧙 Mage

Magic is indistinguishable from advanced technology. A mage is someone who uses magic (aka advanced technology). Together, we're Magers!

🧙‍♂️🧙 Magers (/ˈmājər/)

A group of mages who help each other realize their full potential! Let's hang out and chat together →

Hang out on Slack

For real-time news, fun memes, data engineering topics, and more, join us on →

Twitter
LinkedIn
GitHub
Slack

🤔 Frequently Asked Questions (FAQs)

Check out our FAQ page to find answers to some of our most asked questions.


🪪 License

See the LICENSE file for licensing information.

Water mage casting spell



mage-ai's Issues

Edit pipeline name missing on Firefox

Describe the bug
On Firefox, the option to edit the pipeline's name does not exist.

To Reproduce
Steps to reproduce the behavior:

  1. Go to pipelines
  2. Click on Edit pipeline

Expected behavior
There should be the possibility to rename the pipeline.

Screenshots
Screenshot 2022-11-15 at 06 25 39

Desktop (please complete the following information):

  • OS: macOS Monterey
  • Browser firefox
  • Version 106.0.1
  • mage-ai = 0.6.3

Additional context
Add any other context about the problem here.

One hot encoding as a transform

Is your feature request related to a problem? Please describe.
One-hot encoding is used to convert categorical variables into binary vectors. This is generally performed during the data preprocessing step.
Mage-AI includes some commonly used transformations, but one-hot encoding is not part of the tool right now.

Describe the solution you'd like
The user is able to select "One Hot Encoding" from the "Transformer" option in the tool. Once the user selects it, the transformation step is added to the pipeline and returns the data with the one-hot encoding (get dummies) applied.

Additional context
I intend to implement this using the scikit-learn library.
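For illustration only (not the reporter's intended scikit-learn implementation), here is a minimal sketch of what such a transformer block could look like using pandas' get_dummies; the block file layout and the 'columns' keyword argument are assumptions, not existing Mage template code:

import pandas as pd

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def one_hot_encode(df: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    # Hypothetical 'columns' argument; default to all object/categorical columns.
    columns = kwargs.get('columns') or df.select_dtypes(include=['object', 'category']).columns.tolist()
    # get_dummies replaces each selected column with one binary column per category.
    return pd.get_dummies(df, columns=columns)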

Run a pipeline without the need for decorators (data_loaders, transformers, data_exporters)

I'm testing Mage to see if it can be a good fit for a project I'm working on.
I already have a pipeline in notebooks, using Spark Structured Streaming.
I tried to just copy the notebooks to Mage and use a scratchpad, and it works fine, but then it's not possible to run it as a pipeline. I then tried to use a data loader, transformer, and data exporter, but I get some strange errors.

The pipeline reads from a Kafka topic using Spark Structured Streaming, so first I thought I could just wrap the Spark read method in a decorator and it would work.
So I created a streaming pipeline, but then I can't write the data loader in Spark and have to use the Kafka YAML file.
So instead I tried to use a batch pipeline. Here is a super simple example:

from pyspark.sql import SparkSession

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test

spark = SparkSession.builder.appName("foo").getOrCreate()


@data_loader
def load_data(*args, **kwargs):
    return (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9093")
            .option("subscribe", "example")
            .option("startingOffsets", "earliest")
            .load())


@test
def test_output(df, *args) -> None:
    """
    Template code for testing the output of the block.
    """
    assert df is not None, 'The output is undefined'

But I get the following error

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[7], line 71
     68     else:
     69         return find(lambda val: val is not None, output)
---> 71 df = execute_custom_code()
     73 # Post processing code below (source: output_display.py)
     76 def __custom_output():

Cell In[7], line 55, in execute_custom_code()
     50     block.run_upstream_blocks()
     52 global_vars = {'env': 'dev', 'execution_date': datetime.datetime(2023, 1, 2, 21, 0, 21, 934826), 'event': {}} or dict()
---> 55 block_output = block.execute_sync(
     56     custom_code=code,
     57     global_vars=global_vars,
     58     analyze_outputs=True,
     59     update_status=True,
     60     test_execution=True,
     61 )
     62 if False:
     63     block.run_tests(custom_code=code, update_tests=False)

File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/block/__init__.py:575, in Block.execute_sync(self, analyze_outputs, build_block_output_stdout, custom_code, execution_partition, global_vars, logger, run_all_blocks, test_execution, update_status, store_variables, verify_output, input_from_output, runtime_arguments, dynamic_block_index, dynamic_block_uuid, dynamic_upstream_block_uuids)
    568     if logger is not None:
    569         logger.exception(
    570             f'Failed to execute block {self.uuid}',
    571             block_type=self.type,
    572             block_uuid=self.uuid,
    573             error=err,
    574         )
--> 575     raise err
    576 finally:
    577     if update_status:

File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/block/__init__.py:544, in Block.execute_sync(self, analyze_outputs, build_block_output_stdout, custom_code, execution_partition, global_vars, logger, run_all_blocks, test_execution, update_status, store_variables, verify_output, input_from_output, runtime_arguments, dynamic_block_index, dynamic_block_uuid, dynamic_upstream_block_uuids)
    542 if store_variables and self.pipeline.type != PipelineType.INTEGRATION:
    543     try:
--> 544         self.store_variables(
    545             variable_mapping,
    546             execution_partition=execution_partition,
    547             override_outputs=True,
    548             spark=(global_vars or dict()).get('spark'),
    549             dynamic_block_uuid=dynamic_block_uuid,
    550         )
    551     except ValueError as e:
    552         if str(e) == 'Circular reference detected':

File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/block/__init__.py:1210, in Block.store_variables(self, variable_mapping, execution_partition, override, override_outputs, spark, dynamic_block_uuid)
   1208     if spark is not None and type(data) is pd.DataFrame:
   1209         data = spark.createDataFrame(data)
-> 1210     self.pipeline.variable_manager.add_variable(
   1211         self.pipeline.uuid,
   1212         uuid_to_use,
   1213         uuid,
   1214         data,
   1215         partition=execution_partition,
   1216     )
   1218 for uuid in removed_variables:
   1219     self.pipeline.variable_manager.delete_variable(
   1220         self.pipeline.uuid,
   1221         uuid_to_use,
   1222         uuid,
   1223     )

File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/variable_manager.py:72, in VariableManager.add_variable(self, pipeline_uuid, block_uuid, variable_uuid, data, partition, variable_type)
     70 variable.delete()
     71 variable.variable_type = variable_type
---> 72 variable.write_data(data)

File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/variable.py:153, in Variable.write_data(self, data)
    151     self.__write_parquet(data)
    152 elif self.variable_type == VariableType.SPARK_DATAFRAME:
--> 153     self.__write_spark_parquet(data)
    154 elif self.variable_type == VariableType.GEO_DATAFRAME:
    155     self.__write_geo_dataframe(data)

File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/variable.py:279, in Variable.__write_spark_parquet(self, data)
    277 def __write_spark_parquet(self, data) -> None:
    278     (
--> 279         data.write
    280         .option('header', 'True')
    281         .mode('overwrite')
    282         .csv(self.variable_path)
    283     )

File /usr/local/lib/python3.10/site-packages/pyspark/sql/dataframe.py:338, in DataFrame.write(self)
    326 @property
    327 def write(self) -> DataFrameWriter:
    328     """
    329     Interface for saving the content of the non-streaming :class:`DataFrame` out into external
    330     storage.
   (...)
    336     :class:`DataFrameWriter`
    337     """
--> 338     return DataFrameWriter(self)

File /usr/local/lib/python3.10/site-packages/pyspark/sql/readwriter.py:731, in DataFrameWriter.__init__(self, df)
    729 self._df = df
    730 self._spark = df.sparkSession
--> 731 self._jwrite = df._jdf.write()

File /usr/local/lib/python3.10/site-packages/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()

File /usr/local/lib/python3.10/site-packages/pyspark/sql/utils.py:196, in capture_sql_exception.<locals>.deco(*a, **kw)
    192 converted = convert_exception(e.java_exception)
    193 if not isinstance(converted, UnknownException):
    194     # Hide where the exception came from that shows a non-Pythonic
    195     # JVM exception message.
--> 196     raise converted from None
    197 else:
    198     raise

AnalysisException: 'write' can not be called on streaming Dataset/DataFrame

Any idea why this isn't working?

Will it be possible in the future to run a pipeline without the data loader, transformer and data exporter?
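For reference, the AnalysisException above is raised by Spark itself, not by Mage: a streaming DataFrame only exposes writeStream, while write is reserved for batch DataFrames, so any attempt to persist the streaming block output with .write will fail. A minimal, Mage-independent sketch of that constraint (using Spark's built-in rate source so no Kafka is needed):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming_write_demo").getOrCreate()

# A streaming DataFrame from the built-in "rate" source.
stream_df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

print(stream_df.isStreaming)  # True

# Accessing .write on a streaming DataFrame raises:
#   AnalysisException: 'write' can not be called on streaming Dataset/DataFrame
# Streaming DataFrames must be written with writeStream instead, e.g.:
# query = stream_df.writeStream.format("console").start()
# query.awaitTermination(10)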

Error occurred on demo

Well, I visited the demo and found the following error.

[load_data] Executing data_loader block...

[load_data] DONE

[proud_snowflake] Executing data_loader block...

[proud_snowflake] DONE

Pipeline default_pipeline execution failed with error:

Traceback (most recent call last):

  File "/usr/local/lib/python3.10/site-packages/mage_ai/server/websocket.py", line 87, in run_pipeline

    pipeline.execute_sync(

  File "/usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/pipeline.py", line 230, in execute_sync

    run_blocks_sync(

  File "/usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/block/__init__.py", line 139, in run_blocks_sync

    block.execute_sync(

  File "/usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/block/__init__.py", line 442, in execute_sync

    raise err

  File "/usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/block/__init__.py", line 398, in execute_sync

    output = self.execute_block(

  File "/usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/block/__init__.py", line 599, in execute_block

    outputs = execute_sql_code(self, custom_code or self.content)

  File "/usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/block/sql/__init__.py", line 18, in execute_sql_code

    config_file_loader = ConfigFileLoader(config_path, config_profile)

  File "/usr/local/lib/python3.10/site-packages/mage_ai/io/config.py", line 275, in __init__

    self.config = yaml.full_load(config_file)[profile]

KeyError: None

Cannot Create New Pipeline

I have a project running with:

  • mage start
  • mage version 0.3.4
  • windows machine using WSL

I seem to be unable to create a new pipeline. I can press the new pipeline button on the UI and refresh the page so it appears, but then I get the error:

Traceback (most recent call last):

  File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/tornado/web.py", line 1702, in _execute

    result = method(*self.path_args, **self.path_kwargs)

  File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/mage_ai/server/server.py", line 111, in get

    pipeline = Pipeline.get(pipeline_uuid)

  File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/mage_ai/data_preparation/models/pipeline.py", line 137, in get

    return Pipeline(uuid)

  File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/mage_ai/data_preparation/models/pipeline.py", line 38, in __init__

    self.load_config_from_yaml()

  File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/mage_ai/data_preparation/models/pipeline.py", line 247, in load_config_from_yaml

    self.load_config(self.get_config_from_yaml())

  File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/mage_ai/data_preparation/models/pipeline.py", line 269, in load_config

    blocks = [build_shared_args_kwargs(c, Block) for c in self.block_configs]

TypeError: 'NoneType' object is not iterable

'NoneType' object is not iterable

Mage can't find template files

I have this problem whenever I try to create a new block (except for transformers and sensors, for some reason): Jinja can't load the template.
The template files are present in C:\Users\benja\anaconda3\envs\mage-test\Lib\site-packages\mage_ai\data_preparation\templates
in the folders data_exporters, data_loaders, etc., which I think should be the path they are loaded from.

I'm on Windows 11
Browser is Chrome (however this shouldn't matter)

Error:

Traceback (most recent call last):

  File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\tornado\web.py", line 1702, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\mage_ai\server\api\blocks.py", line 96, in post
    upstream_block_uuids=payload.get('upstream_blocks', []),
  File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\mage_ai\data_preparation\models\block\__init__.py", line 338, in create
    pipeline_type=pipeline.type if pipeline is not None else None,
  File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\mage_ai\data_preparation\templates\template.py", line 82, in load_template
    pipeline_type=pipeline_type,
  File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\mage_ai\data_preparation\templates\template.py", line 61, in fetch_template_source
    template_source = __fetch_data_loader_templates(config, pipeline_type=pipeline_type)
  File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\mage_ai\data_preparation\templates\template.py", line 107, in __fetch_data_loader_templates
    template_env.get_template(template_path).render(
  File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\jinja2\environment.py", line 997, in get_template
    return self._load_template(name, globals)
  File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\jinja2\environment.py", line 958, in _load_template
    template = self.loader.load(self, name, self.make_globals(globals))
  File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\jinja2\loaders.py", line 125, in load
    source, filename, uptodate = self.get_source(environment, name)
  File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\jinja2\loaders.py", line 194, in get_source
    pieces = split_template_path(template)
  File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\jinja2\loaders.py", line 35, in split_template_path
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: data_loaders\default.jinja
data_loaders\default.jinja

Can you provide support for user authentication so that we can set up different groups of users?

Is your feature request related to a problem? Please describe.
We are planning to set up a local instance of Mage and are wondering if it is possible to support user authentication. We would like different users to be able to have their own unique account settings.

Describe the solution you'd like
User authentication.

Describe alternatives you've considered
Grouping on a shared account.

Additional context
N/A

Support plotly outputs in scratchpad blocks

Reproduce

  1. Add scratchpad block
  2. Paste in the following code:
import plotly.offline as pyo
import plotly.graph_objs as go
pyo.init_notebook_mode()
trace0 = go.Scatter(
    x=[1, 2, 3, 4],
    y=[10, 15, 13, 17]
)
trace1 = go.Scatter(
    x=[1, 2, 3, 4],
    y=[16, 5, 11, 9]
)
data = [trace0, trace1]
pyo.iplot(data, filename = 'basic-line')
  3. The current output is this:
    image

The above code works in Jupyter notebook.

Issue

When you execute code from a block, it sends the code to be executed to the kernel over WebSocket Secure, then the output from the code execution is sent to the client to be rendered.

Here is the data that is sent to the client from the kernel:

{
  "sparkling_field": [
    {
      "data": null,
      "error": null,
      "execution_state": "busy",
      "metadata": null,
      "msg_id": "9f1ac735-34aac2f242fb00991711f451_1_1",
      "msg_type": "status",
      "type": null,
      "uuid": "sparkling_field",
      "pipeline_uuid": null
    },
    {
      "data": "        <script type=\"text/javascript\">...</script>\n        ",
      "error": null,
      "execution_state": null,
      "metadata": {},
      "msg_id": "9f1ac735-34aac2f242fb00991711f451_1_1",
      "msg_type": "display_data",
      "type": "text/html",
      "uuid": "sparkling_field",
      "pipeline_uuid": null
    },
    {
      "data": "<div>                            <div id=\"10983948-dffe-4ce1-873c-c175858f7316\" class=\"plotly-graph-div\" style=\"height:525px; width:100%;\"></div>            <script type=\"text/javascript\">                require([\"plotly\"], function(Plotly) {                    window.PLOTLYENV=window.PLOTLYENV || {};                                    if (document.getElementById(\"10983948-dffe-4ce1-873c-c175858f7316\")) {                    Plotly.newPlot(                        \"10983948-dffe-4ce1-873c-c175858f7316\",                        [{\"x\":[1,2,3,4],\"y\":[10,15,13,17],\"type\":\"scatter\"},{\"x\":[1,2,3,4],\"y\":[16,5,11,9],\"type\":\"scatter\"}],                        {\"template\":{\"data\":{\"barpolar\":[{\"marker\":{\"line\":{\"color\":\"#E5ECF6\",\"width\":0.5},\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"barpolar\"}],\"bar\":[{\"error_x\":{\"color\":\"#2a3f5f\"},\"error_y\":{\"color\":\"#2a3f5f\"},\"marker\":{\"line\":{\"color\":\"#E5ECF6\",\"width\":0.5},\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"bar\"}],\"carpet\":[{\"aaxis\":{\"endlinecolor\":\"#2a3f5f\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"minorgridcolor\":\"white\",\"startlinecolor\":\"#2a3f5f\"},\"baxis\":{\"endlinecolor\":\"#2a3f5f\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"minorgridcolor\":\"white\",\"startlinecolor\":\"#2a3f5f\"},\"type\":\"carpet\"}],\"choropleth\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"type\":\"choropleth\"}],\"contourcarpet\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"type\":\"contourcarpet\"}],\"contour\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"contour\"}],\"heatmapgl\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"heatmapgl\"}],\"heatmap\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"heatmap\"}],\"histogram2dcontour\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"histogram2dcontour\"}],\"histogram2d\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.666
6666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"histogram2d\"}],\"histogram\":[{\"marker\":{\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"histogram\"}],\"mesh3d\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"type\":\"mesh3d\"}],\"parcoords\":[{\"line\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"parcoords\"}],\"pie\":[{\"automargin\":true,\"type\":\"pie\"}],\"scatter3d\":[{\"line\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatter3d\"}],\"scattercarpet\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattercarpet\"}],\"scattergeo\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattergeo\"}],\"scattergl\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattergl\"}],\"scattermapbox\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattermapbox\"}],\"scatterpolargl\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatterpolargl\"}],\"scatterpolar\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatterpolar\"}],\"scatter\":[{\"fillpattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2},\"type\":\"scatter\"}],\"scatterternary\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatterternary\"}],\"surface\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"surface\"}],\"table\":[{\"cells\":{\"fill\":{\"color\":\"#EBF0F8\"},\"line\":{\"color\":\"white\"}},\"header\":{\"fill\":{\"color\":\"#C8D4E3\"},\"line\":{\"color\":\"white\"}},\"type\":\"table\"}]},\"layout\":{\"annotationdefaults\":{\"arrowcolor\":\"#2a3f5f\",\"arrowhead\":0,\"arrowwidth\":1},\"autotypenumbers\":\"strict\",\"coloraxis\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"colorscale\":{\"diverging\":[[0,\"#8e0152\"],[0.1,\"#c51b7d\"],[0.2,\"#de77ae\"],[0.3,\"#f1b6da\"],[0.4,\"#fde0ef\"],[0.5,\"#f7f7f7\"],[0.6,\"#e6f5d0\"],[0.7,\"#b8e186\"],[0.8,\"#7fbc41\"],[0.9,\"#4d9221\"],[1,\"#276419\"]],\"sequential\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"sequentialminus\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]},\"colorway\":[\"#636efa\",\"#EF553B\",\"#00cc96\",\"#ab63fa\",\"#FFA15A\",\"#19d3f3\",\"#FF6692\",\"#B6E880\",\"#FF97FF\",\"#FECB52\"],\"font\":{\"color\":\"#2a3f5f\"},\"geo\":{\"bgcolor\":\"white\",\"lakecolor\":\"white\",\"landcolor\":\"#E5ECF6\",\"showlakes\":true,\"showland\":true,\"subunitcolor\":\"white\"},\"hoverlabel\":{\"align\":\"left\"},\"hovermode\":\"closest\
",\"mapbox\":{\"style\":\"light\"},\"paper_bgcolor\":\"white\",\"plot_bgcolor\":\"#E5ECF6\",\"polar\":{\"angularaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"bgcolor\":\"#E5ECF6\",\"radialaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"}},\"scene\":{\"xaxis\":{\"backgroundcolor\":\"#E5ECF6\",\"gridcolor\":\"white\",\"gridwidth\":2,\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\"},\"yaxis\":{\"backgroundcolor\":\"#E5ECF6\",\"gridcolor\":\"white\",\"gridwidth\":2,\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\"},\"zaxis\":{\"backgroundcolor\":\"#E5ECF6\",\"gridcolor\":\"white\",\"gridwidth\":2,\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\"}},\"shapedefaults\":{\"line\":{\"color\":\"#2a3f5f\"}},\"ternary\":{\"aaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"baxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"bgcolor\":\"#E5ECF6\",\"caxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"}},\"title\":{\"x\":0.05},\"xaxis\":{\"automargin\":true,\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\",\"title\":{\"standoff\":15},\"zerolinecolor\":\"white\",\"zerolinewidth\":2},\"yaxis\":{\"automargin\":true,\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\",\"title\":{\"standoff\":15},\"zerolinecolor\":\"white\",\"zerolinewidth\":2}}}},                        {\"responsive\": true}                    ).then(function(){\n                            \nvar gd = document.getElementById('10983948-dffe-4ce1-873c-c175858f7316');\nvar x = new MutationObserver(function (mutations, observer) {{\n        var display = window.getComputedStyle(gd).display;\n        if (!display || display === 'none') {{\n            console.log([gd, 'removed!']);\n            Plotly.purge(gd);\n            observer.disconnect();\n        }}\n}});\n\n// Listen for the removal of the full notebook cells\nvar notebookContainer = gd.closest('#notebook-container');\nif (notebookContainer) {{\n    x.observe(notebookContainer, {childList: true});\n}}\n\n// Listen for the clearing of the current output cell\nvar outputEl = gd.closest('.output');\nif (outputEl) {{\n    x.observe(outputEl, {childList: true});\n}}\n\n                        })                };                });            </script>        </div>",
      "error": null,
      "execution_state": null,
      "metadata": {},
      "msg_id": "9f1ac735-34aac2f242fb00991711f451_1_1",
      "msg_type": "display_data",
      "type": "text/html",
      "uuid": "sparkling_field",
      "pipeline_uuid": null
    },
    {
      "data": null,
      "error": null,
      "execution_state": "idle",
      "metadata": null,
      "msg_id": "9f1ac735-34aac2f242fb00991711f451_1_1",
      "msg_type": "status",
      "type": null,
      "uuid": "sparkling_field",
      "pipeline_uuid": null
    }
  ]
}

Potential solution

Update this file https://github.com/mage-ai/mage-ai/blob/master/mage_ai/frontend/components/CodeBlock/CodeOutput/index.tsx
to handle rendering HTML strings sent from the kernel.

Try using the React property dangerouslySetInnerHTML:

<div dangerouslySetInnerHTML={{ __html: '...' }} />

Reporter

Stefan

Transformer: Feature Scaling

Is your feature request related to a problem? Please describe.
Feature scaling is used to normalize the range of independent variables or features of data. This action is generally performed during the data preprocessing step.

Mage-AI includes some commonly used transformations, but feature scaling is not part of the tool right now. So, I will be contributing this transformation feature.

Describe the solution you'd like
The user is able to select "Normalization" or "Standardization" from the "Transformer" option in the tool. Once the user selects it, the transformation step is added to the pipeline and returns the data after the normalization or standardization has been performed.
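A minimal sketch of what such a transformer block could look like with scikit-learn; the block file layout and the 'method' / 'columns' keyword arguments are assumptions standing in for the proposed UI selection, not existing Mage template code:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def scale_features(df: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    # Hypothetical 'method' and 'columns' arguments; default to all numeric columns.
    method = kwargs.get('method', 'standardization')
    columns = kwargs.get('columns') or df.select_dtypes(include='number').columns.tolist()
    # Normalization rescales to [0, 1]; standardization centers to mean 0 and unit variance.
    scaler = MinMaxScaler() if method == 'normalization' else StandardScaler()
    df[columns] = scaler.fit_transform(df[columns])
    return df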

Support for GeoDataFrames?

Hi Team,

I've been trying to use mage-ai to perform some transformations on geospatial data using geopandas. However, it looks like the "data loader" block is expecting a DataFrame output even though I've specified GeoDataFrame as the output.

Am I doing something wrong here or are GeoDataFrames not supported?

Setup info:

  • mage-ai version 0.3.6

My code is as below:

import io
import os
import pandas as pd
import geopandas as gpd
import requests
import zipfile
from pandas import DataFrame
from geopandas import GeoDataFrame

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@data_loader
def load_data_from_abs(**kwargs) -> GeoDataFrame:

    url = 'https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files/SAL_2021_AUST_GDA2020_SHP.zip'
    folder = "./data/raw/SAL_2021_SHP"

    res = requests.get(url)
    open(folder + ".zip", "wb").write(res.content)

    with zipfile.ZipFile(folder + ".zip", "r") as zip_ref:
        zip_ref.extractall(folder)

    return gpd.read_file("./data/raw/SAL_2021_SHP/SAL_2021_AUST_GDA2020.shp")


@test
def test_output(df) -> None:

    assert df is not None, 'The output is undefined'

And the error produced:

Exception                                 Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_10992\496708138.py in <module>
     73         return find(lambda val: val is not None, output)
     74 
---> 75 df = execute_custom_code()
     76 
     77 # Post processing code below (source: output_display.py)

~\AppData\Local\Temp\ipykernel_10992\496708138.py in execute_custom_code()
     62         global_vars=global_vars,
     63         analyze_outputs=True,
---> 64         update_status=True,
     65     )
     66     if False:

~\anaconda3\envs\SuburbExplorer-data\lib\site-packages\mage_ai\data_preparation\models\block\__init__.py in execute_sync(self, analyze_outputs, custom_code, execution_partition, global_vars, logger, redirect_outputs, run_all_blocks, update_status)
    435             if logger is not None:
    436                 logger.exception(f'Failed to execute block {self.uuid}')
--> 437             raise err
    438         finally:
    439             if update_status:

~\anaconda3\envs\SuburbExplorer-data\lib\site-packages\mage_ai\data_preparation\models\block\__init__.py in execute_sync(self, analyze_outputs, custom_code, execution_partition, global_vars, logger, redirect_outputs, run_all_blocks, update_status)
    414                 )
    415             else:
--> 416                 self.__verify_outputs(block_output)
    417                 variable_mapping = dict(zip(self.output_variables.keys(), block_output))
    418 

~\anaconda3\envs\SuburbExplorer-data\lib\site-packages\mage_ai\data_preparation\models\block\__init__.py in __verify_outputs(self, outputs)
   1007             ):
   1008                 raise Exception(
-> 1009                     f'Validation error for block {self.uuid}: '
   1010                     f'the variable {variable_names[idx]} should be {expected_dtype} type, '
   1011                     f'but {actual_dtype} type is returned',

Exception: Validation error for block load_sal: the variable df should be <class 'pandas.core.frame.DataFrame'> type, but <class 'geopandas.geodataframe.GeoDataFrame'> type is returned
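One possible workaround, assuming the output validation in this version only accepts pandas DataFrames: convert the GeoDataFrame to a plain DataFrame before returning it from the block, for example by serializing the geometry column to WKT. A minimal sketch (the helper name is hypothetical):

import geopandas as gpd
import pandas as pd


def geodataframe_to_dataframe(gdf: gpd.GeoDataFrame) -> pd.DataFrame:
    # Drop the geometry column and keep its WKT representation, so the result is a plain
    # pandas DataFrame that passes a DataFrame type check; the geometry can be rebuilt
    # later with gpd.GeoSeries.from_wkt if needed.
    df = pd.DataFrame(gdf.drop(columns=gdf.geometry.name))
    df['geometry_wkt'] = gdf.geometry.apply(lambda geom: geom.wkt if geom is not None else None)
    return df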

Additional languages

Hi,

Do you think it will be possible in the near future to have blocks using shell scripts (bash, zsh, fish...) or the Julia language?

Thank you

As a user, I want to see all unused data loaders, transformers, scratchpads, data exporters, charts, and sensors and be able to easily delete them.


HTML Select element has bad background on dark theme

Describe the bug
The Select input element on the New/Edit trigger page has a background that is not in line with the website theme: in my case I have a dark theme, and the select menu has a white background with white text.

To Reproduce
Steps to reproduce the behavior:

  1. Create a pipeline
  2. Create a trigger
  3. Go to the trigger settings
  4. Open the Frequency menu and observe the background

Expected behavior
The background should be in line with the current website theme

Screenshots

image

Desktop (please complete the following information):

  • OS: Microsoft Windows 10 Enterprise
  • Browser: Chrome
  • Version: 108.0.5359.125

Additional context
My OS preference is to use a dark theme, so I see Mage with a dark theme; I don't know if there's a light theme as well.

As a user, I want data loader and data exporter templates for Oracle


As a user, I want to get alerts for specific run statuses; e.g. only get alerted when runs fail


As a user, I want to see the timestamps in logs in a specific timezone vs the default UTC timezone


Support conf file format in data loader and exporter

Is your feature request related to a problem? Please describe.

Hello, I am trying to load and export .conf files to and from AWS S3, and I'm getting an error due to the format of the file. Loading a .csv file (commented out in the screenshot below) works as expected. Any ideas on how I can load/export .conf files?

image
image

Describe the solution you'd like

  • Support reading a .conf file and storing the data as JSON (a sketch follows below).
  • Support exporting JSON data to a .conf file.
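A minimal sketch of the first bullet, assuming boto3 credentials are already configured and the .conf file uses INI-style syntax that Python's configparser can read; the bucket and key names are hypothetical:

import configparser
import json

import boto3


def load_conf_from_s3(bucket: str, key: str) -> dict:
    # Download the raw .conf file from S3.
    s3 = boto3.client('s3')
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')

    # Parse the INI-style content and flatten it into a JSON-serializable dict.
    parser = configparser.ConfigParser()
    parser.read_string(body)
    return {section: dict(parser.items(section)) for section in parser.sections()}


# Hypothetical usage:
# config = load_conf_from_s3('my-bucket', 'path/to/settings.conf')
# print(json.dumps(config, indent=2))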

Describe alternatives you've considered

Additional context
Feature requested by Wes Moorhead

As a user, I want data loader and data exporter templates for Azure SQL


Docker Issue

So, this worked in Docker 19, but now in Docker 20 I get the following issue. It is worth noting that this issue does not appear to be Docker related. Further, it precedes the Jupyter token issue.

[+] Building 6.2s (11/11) FINISHED                                              
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 32B                                        0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load metadata for docker.io/jupyter/minimal-notebook:lates  0.9s
 => [auth] jupyter/minimal-notebook:pull token for registry-1.docker.io    0.0s
 => [1/6] FROM docker.io/jupyter/minimal-notebook@sha256:edede2004c961f49  0.0s
 => [internal] load build context                                          0.0s
 => => transferring context: 37.71kB                                       0.0s
 => CACHED [2/6] RUN apt-get update && apt-get install -y --no-install-re  0.0s
 => CACHED [3/6] COPY requirements.txt requirements.txt                    0.0s
 => CACHED [4/6] RUN pip3 install -r requirements.txt                      0.0s
 => CACHED [5/6] COPY . /home/jovyan/src                                   0.0s
 => ERROR [6/6] RUN cd /home/jovyan/src/mage_ai/frontend && npm install    5.2s
------                                                                          
 > [6/6] RUN cd /home/jovyan/src/mage_ai/frontend && npm install:
#0 5.169 npm notice 
#0 5.169 npm notice New minor version of npm available! 8.5.5 -> 8.12.2
#0 5.169 npm notice Changelog: <https://github.com/npm/cli/releases/tag/v8.12.2>
#0 5.169 npm notice Run `npm install -g npm@8.12.2` to update!
#0 5.169 npm notice 
#0 5.175 npm ERR! code ERESOLVE
#0 5.179 npm ERR! ERESOLVE unable to resolve dependency tree
#0 5.179 npm ERR! 
#0 5.179 npm ERR! While resolving: [email protected]
#0 5.179 npm ERR! Found: [email protected]
#0 5.179 npm ERR! node_modules/react
#0 5.179 npm ERR!   react@"18.1.0" from the root project
#0 5.179 npm ERR! 
#0 5.179 npm ERR! Could not resolve dependency:
#0 5.179 npm ERR! peer react@"^16.3.0-0" from @visx/[email protected]
#0 5.179 npm ERR! node_modules/@visx/heatmap
#0 5.179 npm ERR!   @visx/heatmap@"^1.0.0" from the root project
#0 5.179 npm ERR! 
#0 5.179 npm ERR! Fix the upstream dependency conflict, or retry
#0 5.179 npm ERR! this command with --force, or --legacy-peer-deps
#0 5.179 npm ERR! to accept an incorrect (and potentially broken) dependency resolution.
#0 5.179 npm ERR! 
#0 5.180 npm ERR! See /home/jovyan/.npm/eresolve-report.txt for a full report.
#0 5.181 
#0 5.181 npm ERR! A complete log of this run can be found in:
#0 5.182 npm ERR!     /home/jovyan/.npm/_logs/2022-06-20T21_22_38_504Z-debug-0.log
------
failed to solve: executor failed running [/bin/bash -o pipefail -c cd /home/jovyan/src/mage_ai/frontend && npm install]: exit code: 1

Make mage-ai installable with poetry

Is your feature request related to a problem? Please describe.
When installing mage-ai with poetry, the boto3 dependency is not installed.

  1. poetry add mage-ai
  2. mage start errors with ModuleNotFoundError: No module named 'boto3'
  3. This might be related to #1389, which uses boto3 to list EMR clusters

Describe the solution you'd like
Be able to install and use mage-ai with poetry.

Describe alternatives you've considered
Install boto3 as a separate dependency.

Additional context
N/A

Struggling to get it to work behind jupyter-server-proxy

Hello, I'm trying to get this up and running behind a jupyter-server-proxy and I'm getting some strange errors.

For context, I have a JupyterHub server running in AWS that I use for development. Usually I can proxy ports for things like streamlit and code-server using jupyter-server-proxy by launching the process and visiting {base_url}/proxy/{port}/, but it doesn't seem to be working with mage-ai.

Steps I have followed:

  1. pip installed mage-ai
  2. run mage start demo_project and it creates the scaffolding and such then starts running the server like so:
    Screenshot 2022-09-12 at 18 08 19
  3. Tried to visit the {base_url}/proxy/6789/url and I get a blank white page and some 404 errors in the JS Console along the lines of GET https://{jupyter_hub_url}/hub/_next/static/css/9563de380da227ba.css net::ERR_ABORTED 404

You can see in that screenshot that requests are getting to the web server, but it looks like it's having trouble retrieving the assets for the page. It looks like it should be trying to retrieve them from {base_url}/proxy/6789/_next/static/css/9563de380da227ba.css (I can get the CSS file if I make a GET request to this address).

Any ideas why the routing isn't working, and why it seems to work fine with other projects, e.g. streamlit?

Let me know if you need any more details 😊

As a user, I want to be able to run complex pipelines where each task runs in its own Docker container

Is your feature request related to a problem? Please describe.
Yes, the problem is that in complex production environments, different tasks within a pipeline often require different programming languages and/or Python environments. This can make it difficult to manage and execute these tasks in an efficient and scalable way.

Describe the solution you'd like
I would like the ability to define pipelines where tasks can be executed in separate Docker containers, potentially using multiple Docker images. This would allow for better isolation and management of the different environments and languages required for each task.

Describe alternatives you've considered
One alternative that I have considered is using Argo to run tasks as Docker containers. However, using YAML to define pipelines can become unwieldy for larger projects and organizations. In the past, I have worked on production projects where we used Airflow with the KubernetesPodOperator, which provided a more scalable solution.

Additional context
If Mage is intended as a replacement for Airflow, it is important that it support the most common setup used by mature organizations in production environments. This includes the ability to run each task in its own isolated environment using Docker containers.

Add option of columns to be copied in Postgres exporter

Is your feature request related to a problem? Please describe.
There are cases in which the target table of a Postgres data exporter step might have more columns than the source file (for example, audit columns).
It would be useful to have an option to specify which columns to copy in the Postgres data exporter:
https://github.com/mage-ai/mage-ai/blob/master/mage_ai/io/postgres.py#L191

Postgres doc: https://www.postgresql.org/docs/current/sql-copy.html

Describe the solution you'd like
I would like to be able to configure which columns to export data to in Postgres.
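For illustration, a minimal sketch of copying into an explicit column list with psycopg2's copy_expert; the function, table, column, and connection names below are hypothetical and not the existing mage_ai.io.postgres API:

import io

import pandas as pd
import psycopg2


def export_with_columns(df: pd.DataFrame, table_name: str, columns: list, conn) -> None:
    # Write only the selected columns to an in-memory CSV buffer.
    buffer = io.StringIO()
    df[columns].to_csv(buffer, index=False, header=False)
    buffer.seek(0)

    # COPY into an explicit column list, so extra target columns (e.g. audit columns)
    # keep their defaults instead of having to appear in the source data.
    # Table and column names are assumed trusted here.
    sql = f"COPY {table_name} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"
    with conn.cursor() as cursor:
        cursor.copy_expert(sql, buffer)
    conn.commit()


# Hypothetical usage:
# conn = psycopg2.connect('dbname=analytics user=mage')
# export_with_columns(df, 'public.orders', ['id', 'amount', 'created_at'], conn)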

Describe alternatives you've considered
Either using pandas to_sql, or the COPY command.

Additional context
N/A

Thank you!

Mage retrieves EMR metadata even if it is not configured to use EMR

Describe the bug
While running the clean mage-ai service on localhost I observe in the logs that an AWS call is being initiated in order to retrieve EMR information, even though I have not enabled EMR in metadata.yaml.

demo_mage_app  | [E 221115 05:17:02 web:1789] Uncaught exception GET /api/clusters/emr (172.22.0.1)
demo_mage_app  |     HTTPServerRequest(protocol='http', host='localhost:6789', method='GET', uri='/api/clusters/emr', version='HTTP/1.1', remote_ip='172.22.0.1')
demo_mage_app  |     Traceback (most recent call last):
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/tornado/web.py", line 1702, in _execute
demo_mage_app  |         result = method(*self.path_args, **self.path_kwargs)
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/mage_ai/server/api/clusters.py", line 14, in get
demo_mage_app  |         clusters = emr_cluster_manager.list_clusters()
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/mage_ai/cluster_manager/aws/emr_cluster_manager.py", line 19, in list_clusters
demo_mage_app  |         clusters = list_clusters()['Clusters']
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/mage_ai/services/aws/emr/emr.py", line 128, in list_clusters
demo_mage_app  |         clusters = emr_client.list_clusters(
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 514, in _api_call
demo_mage_app  |         return self._make_api_call(operation_name, kwargs)
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 921, in _make_api_call
demo_mage_app  |         http, parsed_response = self._make_request(
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 944, in _make_request
demo_mage_app  |         return self._endpoint.make_request(operation_model, request_dict)
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 119, in make_request
demo_mage_app  |         return self._send_request(request_dict, operation_model)
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 198, in _send_request
demo_mage_app  |         request = self.create_request(request_dict, operation_model)
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 134, in create_request
demo_mage_app  |         self._event_emitter.emit(
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 412, in emit
demo_mage_app  |         return self._emitter.emit(aliased_event_name, **kwargs)
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 256, in emit
demo_mage_app  |         return self._emit(event_name, kwargs)
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 239, in _emit
demo_mage_app  |         response = handler(**kwargs)
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/signers.py", line 105, in handler
demo_mage_app  |         return self.sign(operation_name, request)
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/signers.py", line 189, in sign
demo_mage_app  |         auth.add_auth(request)
demo_mage_app  |       File "/usr/local/lib/python3.10/site-packages/botocore/auth.py", line 418, in add_auth
demo_mage_app  |         raise NoCredentialsError()
demo_mage_app  |     botocore.exceptions.NoCredentialsError: Unable to locate credentials

To Reproduce
Steps to reproduce the behavior:

  1. pip install mage-ai
  2. mage start
  3. In the logs you observe a call being made to AWS to list the EMR clusters

Expected behavior
Any AWS EMR call should be done only if the project is configured to use EMR.

Screenshots
N/A

Desktop (please complete the following information):

  • OS: Monterey
  • Browser: firefox
  • Version: 106.0
  • mage-ai: 0.6.3

Smartphone (please complete the following information):
N/A

Additional context
N/A

Can't filter pipeline runs by status

When debugging failed pipeline runs, it would be helpful to be able to filter pipeline runs or block runs by status (failed, pending, etc.).

image

First Review

Bugs:
Environment: Virtual Env in WSL 2

  • While installing: pyproject.toml file format issue.
  • While launching .py script: ModuleNotFoundError: No module named 'requests'
  • While launching .py script: Unable to access localhost:5789 from browser

Operations Performed:

  • Removing duplicates
  • Formatting column values that were suggested
  • Custom code implementation - typecast values in certain columns using regex

Scope of Improvements:

  • Option to edit custom code selected from New Actions
  • Error info displayed in the UI does not point to the root cause; I was able to figure it out from the logs.
  • An info icon which, on hover, can explain some terminology like category_high_cardinality
  • A similar info icon or hyperlink that explains what certain outliers mean when selecting, especially how auto is performed
  • Option to download the processed data in CSV/Excel format; currently only JSON format is available

Filter pipeline monitors by date

The pipeline monitor charts are set to show data from the last 90 days by default. The backend APIs support filtering by time range, so we can also add a UI component to filter the charts by date as well.

Screen Shot 2022-09-30 at 3 37 11 PM

Use different conda/python environments for different pipelines

Is your feature request related to a problem? Please describe.
Each pipeline has specific package requirements. We cannot share an environment among pipelines because of conflicts (e.g. some packages require numpy<=1.19 while others require numpy>=1.20).
The current mage-ai version does not support assigning an environment to a specific pipeline.
I think it would be good to have this feature.
Hope to hear some responses from the authors.
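One possible approach, sketched purely for illustration (this is not an existing Mage feature, and the paths and helper names are made up): give each pipeline its own virtualenv and run its blocks through that interpreter.

import subprocess
import venv
from pathlib import Path

def ensure_pipeline_env(pipeline_path: str) -> Path:
    # Create <pipeline>/.venv once and install that pipeline's own pins,
    # so numpy<=1.19 and numpy>=1.20 pipelines never share site-packages.
    env_dir = Path(pipeline_path) / '.venv'
    if not env_dir.exists():
        venv.create(env_dir, with_pip=True)
        requirements = Path(pipeline_path) / 'requirements.txt'
        if requirements.exists():
            subprocess.run(
                [str(env_dir / 'bin' / 'pip'), 'install', '-r', str(requirements)],
                check=True,
            )
    return env_dir

def run_block_in_env(pipeline_path: str, block_script: str) -> None:
    env_dir = ensure_pipeline_env(pipeline_path)
    subprocess.run([str(env_dir / 'bin' / 'python'), block_script], check=True)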

As a user, I want to see the current run time and the final execution time for each pipeline run from the pipeline runs view

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Add dbt snapshot as option

Is your feature request related to a problem? Please describe.
N/A

Describe the solution you'd like
I would like to be able to execute dbt snapshots as well: https://docs.getdbt.com/docs/build/snapshots.
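For context, executing snapshots ultimately means running the standard dbt snapshot CLI command. A minimal sketch of invoking it from Python (directory paths are illustrative; this is not Mage's actual dbt integration):

import subprocess

def run_dbt_snapshot(project_dir: str, profiles_dir: str) -> None:
    # `dbt snapshot` materializes all snapshots defined in the project.
    subprocess.run(
        [
            'dbt', 'snapshot',
            '--project-dir', project_dir,
            '--profiles-dir', profiles_dir,
        ],
        check=True,
    )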

Describe alternatives you've considered
N/A

Additional context
Add any other context or screenshots about the feature request here.

Thank you!

non-ascii chars are mistreated

Describe the bug
My code contains non-ASCII characters (in particular the Euro symbol €), but Mage transforms them into an unknown symbol.

PS: for the moment I found a workaround by saving the Python string with a named unicode escape:

u"\N{euro sign}"

To Reproduce
Steps to reproduce the behavior:

  1. Create a pipeline in Mage UI
  2. Somewhere in your pipeline code, type a non-ASCII character like the EUR symbol €
  3. Wait for the file to be saved automatically
  4. Go to your IDE and open the Python file
  5. Observe that the non-ASCII character is corrupted

You can also do the opposite:

  1. Create a pipeline in Mage UI
  2. Edit the pipeline file in your IDE (not in the Mage UI)
  3. Somewhere in your pipeline code, type a non-ASCII character like the EUR symbol €
  4. Save the file
  5. Refresh the Mage pipeline page
  6. Observe that the non-ASCII character is corrupted

Expected behavior
UTF-8 characters should be supported.
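A minimal sketch of the likely fix, assuming the root cause is that block files are read and written with the platform default encoding (cp1252 on Windows) instead of UTF-8:

from pathlib import Path

def write_block_file(path: str, content: str) -> None:
    # Always persist pipeline files as UTF-8, regardless of the OS locale.
    Path(path).write_text(content, encoding='utf-8')

def read_block_file(path: str) -> str:
    return Path(path).read_text(encoding='utf-8')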

Screenshots
I type the EUR symbol in my IDE:
[screenshot]

And when I open the pipeline in Mage, in the browser, this is the result:
[screenshot]

The opposite also fails: if I type the EUR symbol in the Mage editor in the browser, the symbol is corrupted when I open the Python file in my IDE.

Desktop (please complete the following information):

  • OS: Win 10
  • Browser: Chrome
  • Version: 108

Additional context
None

As a user, I want data loader and data exporter templates for Microsoft SQL Server

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.
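As a rough illustration only (connection details are placeholders and this is not an official template), a loader for SQL Server could be as simple as pyodbc plus pandas, which could then be wrapped in a Mage data loader block:

import pandas as pd
import pyodbc

def load_from_mssql() -> pd.DataFrame:
    # Placeholder credentials; a real template would read these from io_config.yaml.
    connection = pyodbc.connect(
        'DRIVER={ODBC Driver 17 for SQL Server};'
        'SERVER=my_server;DATABASE=my_database;UID=my_user;PWD=my_password'
    )
    try:
        return pd.read_sql('SELECT TOP 100 * FROM my_table', connection)
    finally:
        connection.close()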

Cannot pip install

pip install mage-ai
Collecting mage-ai
  Downloading mage_ai-0.2.9-py3-none-any.whl (6.2 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 6.2/6.2 MB 7.9 MB/s eta 0:00:00
Requirement already satisfied: joblib>=1.1.0 in /opt/homebrew/lib/python3.9/site-packages (from mage-ai) (1.1.0)
Requirement already satisfied: click==8.1.3 in /opt/homebrew/lib/python3.9/site-packages (from mage-ai) (8.1.3)
Collecting Flask~=1.1.2
  Downloading Flask-1.1.4-py2.py3-none-any.whl (94 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 94.6/94.6 kB 2.8 MB/s eta 0:00:00
Collecting itsdangerous~=1.1.0
  Downloading itsdangerous-1.1.0-py2.py3-none-any.whl (16 kB)
Collecting pyarrow==6.0.0
  Downloading pyarrow-6.0.0-cp39-cp39-macosx_11_0_arm64.whl (13.7 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 13.7/13.7 MB 8.5 MB/s eta 0:00:00
Collecting pyyaml~=6.0
  Using cached PyYAML-6.0-cp39-cp39-macosx_11_0_arm64.whl (173 kB)
Collecting jupyter-server-proxy==3.2.1
  Downloading jupyter_server_proxy-3.2.1-py3-none-any.whl (35 kB)
Collecting asyncio==3.4.3
  Downloading asyncio-3.4.3-py3-none-any.whl (101 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 101.8/101.8 kB 3.9 MB/s eta 0:00:00
Requirement already satisfied: pytz==2022.1 in /opt/homebrew/lib/python3.9/site-packages (from mage-ai) (2022.1)
Requirement already satisfied: zipp==3.8.0 in /opt/homebrew/lib/python3.9/site-packages (from mage-ai) (3.8.0)
Requirement already satisfied: python-dateutil==2.8.2 in /opt/homebrew/lib/python3.9/site-packages (from mage-ai) (2.8.2)
Collecting requests==2.27.0
  Downloading requests-2.27.0-py2.py3-none-any.whl (63 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 63.1/63.1 kB 2.1 MB/s eta 0:00:00
Collecting tables==3.7.0
  Downloading tables-3.7.0.tar.gz (8.2 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 8.2/8.2 MB 8.4 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [12 lines of output]
      /var/folders/l7/m69bgsp16830cvfm6973p2h00000gn/T/H5closeleexy1dq.c:2:5: error: implicit declaration of function 'H5close' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
          H5close();
          ^
      1 error generated.
      cpuinfo failed, assuming no CPU features: No module named 'cpuinfo'
      * Using Python 3.9.13 (main, Aug  7 2022, 01:19:39)
      * Found cython 0.29.32
      * USE_PKGCONFIG: True
      .. ERROR:: Could not find a local HDF5 installation.
         You may need to explicitly state where your local HDF5 headers and
         library can be found by setting the ``HDF5_DIR`` environment
         variable or by using the ``--hdf5`` command-line option.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Mage doesn't support running two servers simultaneously

Describe the bug

When trying to run two servers simultaneously: after running
mage start something
in one folder, running mage start something_else --port 6791 in another folder opens a server on that port, but it shows the data of the first server, and the button doesn't work when on a different port.
[screenshot]

To Reproduce
Steps to reproduce the behavior:

  1. Run mage start project_1 in one folder
  2. Run mage start project_2 --port 6791 in another folder
  3. Check whether both servers load pipelines correctly

Expected behavior
Both servers should run correctly at the same time.

Additional context
Reported by Stefan Chirus

Outdated default IO query limit in documentation

Describe the bug
The documentation states that the default query limit for loading data from the cloud is 100,000 rows. #779 bumped this value to 10,000,000 rows.

To Reproduce
Steps to reproduce the behavior:

  1. Go to the data loading documentation.
  2. Search (CMD + F) for "100,000".

Screenshots
[screenshot]

Additional context
I opened a PR to update this value (#1706)

As a user, I want to filter and/or search all my pipeline runs

Is your feature request related to a problem? Please describe.
I have thousands of pipeline runs for a single pipeline. I want to filter and search for a specific one using values in the pipeline runโ€™s variables.

Example: filter or search by execution date range

Interpolate global variables in SQL block configuration

When specifying the database or the schema for a SQL block, you currently have to hardcode the values.

You write the values in an input field shown below:
[screenshot]

This information gets written to the pipelineโ€™s YAML file, in the blockโ€™s configuration. For example:

blocks:
- all_upstream_blocks_executed: true
  configuration:
    data_provider: snowflake
    data_provider_profile: default
    data_provider_database: mage
    data_provider_schema: public
    export_write_policy: append
  executor_type: local_python
  language: sql

Instead of hardcoding, we should be able to interpolate any variable values using the handlebar syntax: {{ var('env') }} where env is the key we want to extract from the global variables.

So now, we can save SQL block configurations like this:

blocks:
- all_upstream_blocks_executed: true
  configuration:
    data_provider: snowflake
    data_provider_profile: default
    data_provider_database: mage_{{ env_var('env') }}
    data_provider_schema: public_{{ env_var('env') }}
    export_write_policy: append
  executor_type: local_python
  language: sql

When a SQL block is executed, the database and schema are accessed here: https://github.com/mage-ai/mage-ai/blob/master/mage_ai/data_preparation/models/block/sql/__init__.py#L57

We should interpolate the variable values from the global variables into the database and schema values if the above syntax is present.

Example code:

from jinja2 import Template
from mage_ai.data_preparation.shared.utils import get_template_vars

# raw_str is a configuration value that may contain template syntax,
# e.g. 'mage_{{ env_var("env") }}'
interpolated_str = Template(raw_str).render(
    **get_template_vars()
)
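Applied to the block configuration, that could look roughly like this (the helper name interpolate_config is made up for illustration):

from jinja2 import Template
from mage_ai.data_preparation.shared.utils import get_template_vars

def interpolate_config(configuration: dict) -> dict:
    # Render any {{ ... }} syntax in string config values, e.g.
    # 'mage_{{ env_var("env") }}' -> 'mage_prod'.
    return {
        key: Template(value).render(**get_template_vars())
        if isinstance(value, str) else value
        for key, value in configuration.items()
    }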

Vertica support

Would it be possible to support Vertica DB in SQL blocks?

Thanks!

As a user, I want data loader and data exporter templates for AWS SQL

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

As a user, I want to kick off a new pipeline run from the pipeline list view

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

As a user, I want to execute a function if a block run fails or succeeds

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Sensor blocks can pass output from origin block to downstream blocks

Is your feature request related to a problem? Please describe.

A sensor block can wait on a pipeline to finish or a block in a pipeline to finish.

If the sensor is waiting on a block, when the block is done, pass that blockโ€™s output to all the downstream blocks of the sensor.

Creating new data loaders, transformations and/or data exporters | Documentation

First things first, what a beauty Mage is โค๏ธ!

I would like to add some generic 3rd party data loaders and exporters, so I'm looking for documentation on how to add one. So far, I went through the existing documentation on Blocks (https://github.com/mage-ai/mage-ai/blob/master/docs/blocks/data_loading.md) and Contributing (https://github.com/mage-ai/mage-ai/tree/master/docs/contributing) and eyeballed some code (especially mage_integrations), but couldn't get a clear idea of how it would show up in the UI, etc.

Do we have some documentation already available on this?

Support R as a programming language in blocks

Is your feature request related to a problem? Please describe.
From Robert K:

There are no orchestration tools that work with R and while I know itโ€™s not huge in industry we have some code thatโ€™d be lovely to deploy via Mage. Right now everything runs via task scheduler on a VM.

Example use case:

Use a SQL block to pull in raw data (SQL Server), use R to clean it up and do some basic feature engineering, then write it back to a database (either SQL Server or Snowflake)

Describe the solution you'd like

  • Add a new BlockLanguage for R
  • Support executing R scripts from Python (see the sketch below)
  • Support installing extra R packages
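A minimal sketch of the "execute R from Python" piece, assuming Rscript is available on the PATH; this is not the eventual Mage implementation:

import subprocess

def run_r_block(script_path: str) -> None:
    # Run the R script and surface its stderr if it fails.
    result = subprocess.run(
        ['Rscript', script_path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f'R block failed:\n{result.stderr}')
    print(result.stdout)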

Describe alternatives you've considered
TBD

Additional context
Feature requested by Robert K
