Fake Snowflake Connector for Python. Run, mock and test Snowflake DB locally.

License: Apache License 2.0


fakesnow's Introduction

fakesnow ❄️


Fake Snowflake Connector for Python. Run and mock Snowflake DB locally.

Install

pip install fakesnow

Usage

Run script.py with fakesnow:

fakesnow script.py

Or a module, eg: pytest

fakesnow -m pytest

fakesnow executes fakesnow.patch before running the script or module.

fakesnow.patch

To use fakesnow within your code:

import fakesnow
import snowflake.connector

with fakesnow.patch():
    conn = snowflake.connector.connect()

    print(conn.cursor().execute("SELECT 'Hello fake world!'").fetchone())

The following imports are automatically patched:

  • import snowflake.connector.connect
  • import snowflake.connector.pandas_tools.write_pandas

To patch modules that use the from ... import syntax, manually specify them, eg: if mymodule.py has the import:

from snowflake.connector.pandas_tools import write_pandas

Then patch it using:

with fakesnow.patch("mymodule.write_pandas"):
    ...

By default databases are in-memory. To persist databases between processes, specify a databases path:

with fakesnow.patch(db_path="databases/"):
    ...

pytest fixtures

pytest fixtures are provided for testing. Example conftest.py:

from typing import Iterator

import fakesnow.fixtures
import pytest

pytest_plugins = fakesnow.fixtures.__name__

@pytest.fixture(scope="session", autouse=True)
def setup(_fakesnow_session: None) -> Iterator[None]:
    # the standard imports are now patched
    ...
    yield

Or with from ... import patch targets:

from typing import Iterator

import fakesnow
import pytest

@pytest.fixture(scope="session", autouse=True)
def _fakesnow_session() -> Iterator[None]:
    with fakesnow.patch("mymodule.write_pandas"):
        yield

Implementation coverage

Partial support

  • date functions
  • regex functions
  • semi-structured data
  • tags
  • user management (See tests/test_users.py)

For more detail see tests/test_fakes.py

Caveats

  • The order of rows is non-deterministic and may not match Snowflake unless an ORDER BY is fully specified.
  • fakesnow accepts a more liberal Snowflake SQL dialect than a real Snowflake instance, i.e. some queries that pass with fakesnow would be rejected by a real Snowflake instance.

Contributing

See CONTRIBUTING.md to get started and develop in this repo.

fakesnow's People

Contributors

criccomini, dancardin, dependabot[bot], ifm-pgarner, potatobot-prime[bot], seruman, tekumara, tharwan

fakesnow's Issues

Lateral Flatten Conversion Error: Malformed JSON at byte 0 of input: unexpected character

MCVE in Snowflake:

CREATE OR REPLACE TEMPORARY TABLE TEST (ID BIGINT, COL VARCHAR);
INSERT INTO TEST (ID, COL) VALUES (1, 's1,s2,s3'), (2, 's1,s2');

SELECT
    ID
    , F.VALUE::varchar as V
FROM TEST AS T
, LATERAL FLATTEN(input => SPLIT(T.COL, ',')) AS F;
ID V
1 s1
1 s2
1 s3
2 s1
2 s2

MCVE in Fakesnow:

import fakesnow
import snowflake.connector


def main():
    with fakesnow.patch():
        conn = snowflake.connector.connect(database="X", schema="Y")
        conn.execute_string("CREATE OR REPLACE TEMPORARY TABLE TEST (ID BIGINT, COL VARCHAR)")
        conn.execute_string("INSERT INTO TEST (ID, COL) VALUES (1, 's1,s2,s3'), (2, 's1,s2')")
        conn.cursor().execute("""
        SELECT
            ID
            , F.VALUE::varchar as V
        FROM TEST AS T
        , LATERAL FLATTEN(input => SPLIT(T.COL, ',')) AS F
        """).fetchall()


if __name__ == "__main__":
    main()

Error:

duckdb.duckdb.ConversionException: Conversion Error: Malformed JSON at byte 0 of input: unexpected character.  Input: s1

Version:

$ pip freeze | grep snow
fakesnow==0.9.12
snowflake-connector-python==3.2.1
snowflake-sqlalchemy==1.5.0

FAKESNOW_DEBUG=1:

CREATE OR REPLACE TEMPORARY TABLE TEST (ID BIGINT, COL TEXT);
INSERT INTO TEST (ID, COL) VALUES (1, 's1,s2,s3'), (2, 's1,s2');
SELECT ID, CAST(F.VALUE AS TEXT) AS V FROM TEST AS T, LATERAL UNNEST(CAST(STR_SPLIT(T.COL, ',') AS JSON[])) AS F(VALUE);

Incorrect precision for integers in information_schema

fakesnow == 0.2.0

According to:

https://docs.snowflake.com/en/sql-reference/data-types-numeric#int-integer-bigint-smallint-tinyint-byteint

INT , INTEGER , BIGINT , SMALLINT , TINYINT , BYTEINT
Synonymous with NUMBER, except that precision and scale cannot be specified (i.e. always defaults to NUMBER(38, 0)). Therefore, for all INTEGER data types, the range of values is all integer values from -99999999999999999999999999999999999999 to +99999999999999999999999999999999999999 (inclusive).

Yet, when I create:

                    TEST_BIGINT BIGINT,
                    TEST_INTEGER INTEGER,
                    TEST_SMALLINT SMALLINT,

When I select the columns from information_schema.columns, I get:

{'TABLE_CATALOG': 'TESTDB', 'TABLE_SCHEMA': 'PUBLIC', 'TABLE_NAME': 'TEST_TYPES', 'COLUMN_NAME': 'TEST_BIGINT', 'ORDINAL_POSITION': 1, 'COLUMN_DEFAULT': None, 'IS_NULLABLE': 'YES', 'DATA_TYPE': 'BIGINT', 'CHARACTER_MAXIMUM_LENGTH': None, 'CHARACTER_OCTET_LENGTH': None, 'NUMERIC_PRECISION': 64, 'NUMERIC_PRECISION_RADIX': 2, 'NUMERIC_SCALE': 0, 'DATETIME_PRECISION': None, 'INTERVAL_TYPE': None, 'INTERVAL_PRECISION': None, 'CHARACTER_SET_CATALOG': None, 'CHARACTER_SET_SCHEMA': None, 'CHARACTER_SET_NAME': None, 'COLLATION_CATALOG': None, 'COLLATION_SCHEMA': None, 'COLLATION_NAME': None, 'DOMAIN_CATALOG': None, 'DOMAIN_SCHEMA': None, 'DOMAIN_NAME': None, 'UDT_CATALOG': None, 'UDT_SCHEMA': None, 'UDT_NAME': None, 'SCOPE_CATALOG': None, 'SCOPE_SCHEMA': None, 'SCOPE_NAME': None, 'MAXIMUM_CARDINALITY': None, 'DTD_IDENTIFIER': None, 'IS_SELF_REFERENCING': None, 'IS_IDENTITY': None, 'IDENTITY_GENERATION': None, 'IDENTITY_START': None, 'IDENTITY_INCREMENT': None, 'IDENTITY_MAXIMUM': None, 'IDENTITY_MINIMUM': None, 'IDENTITY_CYCLE': None, 'IS_GENERATED': None, 'GENERATION_EXPRESSION': None, 'IS_UPDATABLE': None}
{'TABLE_CATALOG': 'TESTDB', 'TABLE_SCHEMA': 'PUBLIC', 'TABLE_NAME': 'TEST_TYPES', 'COLUMN_NAME': 'TEST_INTEGER', 'ORDINAL_POSITION': 2, 'COLUMN_DEFAULT': None, 'IS_NULLABLE': 'YES', 'DATA_TYPE': 'INTEGER', 'CHARACTER_MAXIMUM_LENGTH': None, 'CHARACTER_OCTET_LENGTH': None, 'NUMERIC_PRECISION': 32, 'NUMERIC_PRECISION_RADIX': 2, 'NUMERIC_SCALE': 0, 'DATETIME_PRECISION': None, 'INTERVAL_TYPE': None, 'INTERVAL_PRECISION': None, 'CHARACTER_SET_CATALOG': None, 'CHARACTER_SET_SCHEMA': None, 'CHARACTER_SET_NAME': None, 'COLLATION_CATALOG': None, 'COLLATION_SCHEMA': None, 'COLLATION_NAME': None, 'DOMAIN_CATALOG': None, 'DOMAIN_SCHEMA': None, 'DOMAIN_NAME': None, 'UDT_CATALOG': None, 'UDT_SCHEMA': None, 'UDT_NAME': None, 'SCOPE_CATALOG': None, 'SCOPE_SCHEMA': None, 'SCOPE_NAME': None, 'MAXIMUM_CARDINALITY': None, 'DTD_IDENTIFIER': None, 'IS_SELF_REFERENCING': None, 'IS_IDENTITY': None, 'IDENTITY_GENERATION': None, 'IDENTITY_START': None, 'IDENTITY_INCREMENT': None, 'IDENTITY_MAXIMUM': None, 'IDENTITY_MINIMUM': None, 'IDENTITY_CYCLE': None, 'IS_GENERATED': None, 'GENERATION_EXPRESSION': None, 'IS_UPDATABLE': None}
{'TABLE_CATALOG': 'TESTDB', 'TABLE_SCHEMA': 'PUBLIC', 'TABLE_NAME': 'TEST_TYPES', 'COLUMN_NAME': 'TEST_SMALLINT', 'ORDINAL_POSITION': 3, 'COLUMN_DEFAULT': None, 'IS_NULLABLE': 'YES', 'DATA_TYPE': 'SMALLINT', 'CHARACTER_MAXIMUM_LENGTH': None, 'CHARACTER_OCTET_LENGTH': None, 'NUMERIC_PRECISION': 16, 'NUMERIC_PRECISION_RADIX': 2, 'NUMERIC_SCALE': 0, 'DATETIME_PRECISION': None, 'INTERVAL_TYPE': None, 'INTERVAL_PRECISION': None, 'CHARACTER_SET_CATALOG': None, 'CHARACTER_SET_SCHEMA': None, 'CHARACTER_SET_NAME': None, 'COLLATION_CATALOG': None, 'COLLATION_SCHEMA': None, 'COLLATION_NAME': None, 'DOMAIN_CATALOG': None, 'DOMAIN_SCHEMA': None, 'DOMAIN_NAME': None, 'UDT_CATALOG': None, 'UDT_SCHEMA': None, 'UDT_NAME': None, 'SCOPE_CATALOG': None, 'SCOPE_SCHEMA': None, 'SCOPE_NAME': None, 'MAXIMUM_CARDINALITY': None, 'DTD_IDENTIFIER': None, 'IS_SELF_REFERENCING': None, 'IS_IDENTITY': None, 'IDENTITY_GENERATION': None, 'IDENTITY_START': None, 'IDENTITY_INCREMENT': None, 'IDENTITY_MAXIMUM': None, 'IDENTITY_MINIMUM': None, 'IDENTITY_CYCLE': None, 'IS_GENERATED': None, 'GENERATION_EXPRESSION': None, 'IS_UPDATABLE': None}

The NUMERIC_PRECISION is 64, 32, and 16, respectively. This seems wrong. I was expecting them all to be 38, as defined in the docs.

NOTE: I didn't verify this behavior against Snowflake. I'm going off their docs.

DELETE statement fails with Parser Error

Hi, I triggered another issue when issuing a DELETE:

from snowflake.sqlalchemy import URL
import sqlalchemy
import fakesnow
from sqlalchemy.ext.declarative import declarative_base

from sqlalchemy import Column, Integer

Base = declarative_base()


class TestTable(Base):
    __tablename__ = "testtable"
    id_col = Column(Integer, primary_key=True, autoincrement=False)
    val_col = Column(Integer)


with fakesnow.patch():
    engine = sqlalchemy.create_engine(
        URL(
            account="abc123",
            user="testuser1",
            password="0123456",
            database="testdb",
            schema="public",
        )
    )

    TestTable.__table__.create(engine)

    with engine.connect() as con:
        ins = TestTable.__table__.insert().values(id_col=1, val_col=10)
        con.execute(ins)
        ins = TestTable.__table__.insert().values(id_col=2, val_col=20)
        con.execute(ins)

        con.execute("CREATE TABLE TESTTABLE2 AS FROM TESTTABLE")

        print(con.execute("SELECT * FROM TESTTABLE").fetchall())
        print(con.execute("SELECT * FROM TESTTABLE2").fetchall())

        print(
            con.execute(
                "DELETE FROM TESTTABLE2 USING TESTTABLE WHERE TESTTABLE.id_col = TESTTABLE2.id_col"
            ).fetchall()
        )

Results in:

duckdb.duckdb.ParserException: Parser Error: syntax error at or near "FROM"
LINE 1: DESCRIBE DELETE FROM TESTTABLE2 USING TESTTABLE WHERE T...

This looks related to #38 to me, as it also happens in the describe.

CREATE TAG

I have an issue when I try to create a TAG:

  File "*\Python\Python312\site-packages\fakesnow\fakes.py", line 238, in _execute
    self._duck_conn.execute(sql, params)
duckdb.duckdb.ParserException: Parser Error: syntax error at or near "TAG"
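
For reference, a minimal reproducer sketch (the statement that triggered this wasn't included in the report, so the tag name and syntax below are assumptions):

import fakesnow
import snowflake.connector

with fakesnow.patch():
    conn = snowflake.connector.connect(database="X", schema="Y")
    # hypothetical tag; any CREATE TAG statement should hit the same parser error
    conn.cursor().execute("CREATE TAG cost_center COMMENT = 'cost center tag'")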

Binder Error when alias is used in JOIN clause

MCVE in Snowflake:

CREATE OR REPLACE TEMPORARY TABLE TEST (COL VARCHAR);
INSERT INTO TEST (COL) VALUES ('VARCHAR1'), ('VARCHAR2');

CREATE OR REPLACE TEMPORARY TABLE JOINED (COL VARCHAR, ANOTHER VARCHAR);
INSERT INTO JOINED (COL, ANOTHER) VALUES ('CHAR1', 'JOIN');

SELECT
    T.COL
    , SUBSTR(T.COL, 4) AS ALIAS
    , J.ANOTHER
FROM TEST AS T
LEFT JOIN JOINED AS J
ON ALIAS = J.COL;

Result:

COL ALIAS ANOTHER
VARCHAR1 CHAR1 JOIN
VARCHAR2 CHAR2  

MCVE in fakesnow:

import fakesnow
import snowflake.connector


def main():
    with fakesnow.patch():
        conn = snowflake.connector.connect(database="X", schema="Y")
        conn.execute_string("CREATE OR REPLACE TEMPORARY TABLE TEST (COL VARCHAR)")
        conn.execute_string("INSERT INTO TEST (COL) VALUES ('VARCHAR1'), ('VARCHAR2')")
        conn.execute_string("CREATE OR REPLACE TEMPORARY TABLE JOINED (COL VARCHAR, ANOTHER VARCHAR)")
        conn.execute_string("INSERT INTO JOINED (COL, ANOTHER) VALUES ('CHAR1', 'JOIN')")
        rows = conn.cursor().execute("""
        SELECT
            T.COL
            , SUBSTR(T.COL, 4) AS ALIAS
            , J.ANOTHER
        FROM TEST AS T
        LEFT JOIN JOINED AS J
        ON ALIAS = J.COL;
        """).fetchall()


if __name__ == "__main__":
    main()

Error:

snowflake.connector.errors.ProgrammingError: 002043 (02000): Binder Error: Referenced column "ALIAS" not found in FROM clause!
Candidate bindings: "J.COL"
LINE 1: ...ROM TEST AS T LEFT JOIN JOINED AS J ON ALIAS = J.COL

Version:

$ pip freeze | grep snow
fakesnow==0.9.12
snowflake-connector-python==3.2.1
snowflake-sqlalchemy==1.5.0

FAKESNOW_DEBUG=1:

CREATE OR REPLACE TEMPORARY TABLE TEST (COL TEXT);
INSERT INTO TEST (COL) VALUES ('VARCHAR1'), ('VARCHAR2');
CREATE OR REPLACE TEMPORARY TABLE JOINED (COL TEXT, ANOTHER TEXT);
INSERT INTO JOINED (COL, ANOTHER) VALUES ('CHAR1', 'JOIN');
SELECT T.COL, SUBSTR(T.COL, 4) AS ALIAS, J.ANOTHER FROM TEST AS T LEFT JOIN JOINED AS J ON ALIAS = J.COL;

Pandas dependency missing

PDM, version 2.7.4.

[tool.pdm.dev-dependencies]
tests = [
    "pytest>=7.2.1",
    "psycopg2>=2.9.6",
    "snowflake-connector-python>=3.0.4",
    "fakesnow>=0.1.0",
]

When I run my pyunit tests, I get:

Traceback:
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/readers/test_snowflake.py:3: in <module>
    import fakesnow
.venv/lib/python3.10/site-packages/fakesnow/__init__.py:17: in <module>
    import fakesnow.fakes as fakes
.venv/lib/python3.10/site-packages/fakesnow/fakes.py:9: in <module>
    import pandas as pd
E   ModuleNotFoundError: No module named 'pandas'

It looks like fakesnow is importing pandas but doesn't declare it as a dependency.

I do see that the [pandas] snowflake extra_requires does declare it as a dependency, but it doesn't appear to be installing from your package.

`extract_text_length` might extract invalid column names on CTAS

If a table is created with a CTAS query containing TEXT/VARCHAR casts, extract_text_length might not work as expected.

With a DDL like below;

NOTE: the query is a bit nonsensical, but it's the simplest reproducer I could come up with.

CREATE TABLE SOME_TABLE AS (
  SELECT
    CAST(CAST(C1 AS TEXT) || '-' || CAST(C1 AS TEXT) AS TEXT) AS KEY_ID
  FROM
    VALUES (1), (2) AS T (C1)
)

Extracted lengths would end up;

e = sqlglot.parse_one(q, read="snowflake").transform(extract_text_length)
text_lengths = e.args["text_lengths"]
# [(DPipe(
#   this=Cast(
#     this=Column(
#       this=Identifier(this=C1, quoted=False)),
#     to=DataType(this=Type.TEXT, nested=False)),
#   expression=Literal(this=-, is_string=True),
#   safe=True), 16777216), (Identifier(this=C1, quoted=False), 16777216), (Identifier(this=C1, quoted=False), 16777216)]

That would generate an INSERT query to _fs_columns_ext table as;

INSERT INTO DB1.information_schema._fs_columns_ext
values ('DB1', 'SCHEMA1', 'SOME_TABLE', 'CAST(C1 AS TEXT) || '-'', 16777216, 16777216), ('DB1', 'SCHEMA1', 'SOME_TABLE', 'C1', 16777216, 16777216), ('DB1', 'SCHEMA1', 'SOME_TABLE', 'C1', 16777216, 16777216)
ON CONFLICT (ext_table_catalog, ext_table_schema, ext_table_name, ext_column_name)
DO UPDATE SET ext_character_maximum_length = excluded.ext_character_maximum_length,
    ext_character_octet_length = excluded.ext_character_octet_length

As the column names are extracted as sqlglot expressions instead of the projected column name strings, the query is invalid due to the single quotes. Even if they were escaped, it would still be wrong, as in this example the column name should be KEY_ID.

fakesnow/fakes.py:139: in execute
    return self._execute(transformed, params)
fakesnow/fakes.py:325: in _execute
    self._duck_conn.execute(q)
E   duckdb.duckdb.BinderException: Binder Error: Could not choose a best candidate function for the function call "-(STRING_LITERAL, STRING_LITERAL)". In order to select one, please add explicit type casts.

Another example where the generated INSERT query is valid but has the wrong column name:

CREATE TABLE SOME_TABLE AS (
  SELECT
    CAST(C1 AS TEXT) AS KEY_ID
  FROM
    VALUES (1), (2) AS T (C1)
)

-- [(Identifier(this=C1, quoted=False), 16777216)]
INSERT INTO DB1.information_schema._fs_columns_ext
values ('DB1', 'SCHEMA1', 'SOME_TABLE', 'C1', 16777216, 16777216)
ON CONFLICT (ext_table_catalog, ext_table_schema, ext_table_name, ext_column_name)
DO UPDATE SET ext_character_maximum_length = excluded.ext_character_maximum_length,
    ext_character_octet_length = excluded.ext_character_octet_length

I've experimented a little with sqlglot.optimizer.scope and sqlglot.planner to get the projected column names but could not come up with a solution.


In my use case, VARCHAR length enforcement is not required, so I've just disabled the transformation, but I wanted to flag this as some folks might also run into it.
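
For what it's worth, a rough sketch (an illustration, not a fix) of pulling the projected aliases straight off the parsed query with plain sqlglot:

import sqlglot
from sqlglot import exp

q = """
CREATE TABLE SOME_TABLE AS (
  SELECT CAST(C1 AS TEXT) AS KEY_ID
  FROM VALUES (1), (2) AS T (C1)
)
"""
# find the SELECT inside the CTAS and take each projection's alias (or name)
select = sqlglot.parse_one(q, read="snowflake").find(exp.Select)
print([p.alias_or_name for p in select.expressions])  # expected: ['KEY_ID']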

parse_json transform is redundant? or its test is missing the point?

I noticed while implementing #63 that the parse_json transform seems to be redundant

For example here:
https://github.com/tekumara/fakesnow/blob/main/tests/test_transforms.py#L265-L271

if I remove the .transform(parse_json) from the test it still passes

it seems that sqlglot dialect="duckdb" does the parse_json -> JSON transform itself already
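
A quick way to check what sqlglot does on its own, without any fakesnow transforms applied (the exact output depends on the sqlglot version):

import sqlglot

# transpile a PARSE_JSON call from the snowflake dialect to duckdb with no extra transforms
print(sqlglot.transpile("SELECT PARSE_JSON('{\"a\": 1}')", read="snowflake", write="duckdb")[0])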

However, there are a lot of other test cases in test_transforms.py that use parse_json in their test queries with other transforms applied instead, i.e. relying on sqlglot to do it. So maybe I am missing something - perhaps the parse_json transform is there for a special case that is not being exercised by the corresponding test case?

Possibly related to inconsistency in the docstring example here: https://github.com/tekumara/fakesnow/blob/main/fakesnow/transforms.py#L524-L527 ?

`Catalog Error: Scalar Function with name date does not exist!` in version 0.9.8

Using the date() function in a SQL script results in Catalog Error: Scalar Function with name date does not exist! in version 0.9.8.

It used to work in version 0.9.6.

Here's the code used to replicate the issue. It works in 0.9.6 and fails in 0.9.8:

import fakesnow
import snowflake.connector

conn = snowflake.connector.connect()

conn.cursor().execute('CREATE DATABASE TEST')
conn.cursor().execute('USE DATABASE TEST')
conn.cursor().execute('CREATE SCHEMA TEST')
conn.cursor().execute('USE SCHEMA TEST')
print(conn.cursor().execute("with SOURCE_TABLE AS (SELECT '2024-01-01' AS start_date) SELECT 'Hello fake world!', date(a.start_date) from SOURCE_TABLE AS a").fetchone())

`information_schema.catalogs` doesn't seem to exist

Fakesnow: 0.6.0
Python 3.10

When I run:

SELECT CATALOG_NAME FROM INFORMATION_SCHEMA.CATALOGS ORDER BY CATALOG_NAME

I get:

    def _execute(
        self,
        command: str,
        params: Sequence[Any] | dict[Any, Any] | None = None,
        *args: Any,
        **kwargs: Any,
    ) -> FakeSnowflakeCursor:
        self._arrow_table = None
    
        command, params = self._rewrite_with_params(command, params)
        expression = parse_one(command, read="snowflake")
    
        cmd = expr.key_command(expression)
    
        no_database, no_schema = checks.is_unqualified_table_expression(expression)
    
        if no_database and not self._conn.database_set:
            raise snowflake.connector.errors.ProgrammingError(
                msg=f"Cannot perform {cmd}. This session does not have a current database. Call 'USE DATABASE', or use a qualified name.",  # noqa: E501
                errno=90105,
                sqlstate="22000",
            )
        elif no_schema and not self._conn.schema_set:
            raise snowflake.connector.errors.ProgrammingError(
                msg=f"Cannot perform {cmd}. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.",  # noqa: E501
                errno=90106,
                sqlstate="22000",
            )
    
        transformed = (
            expression.transform(transforms.upper_case_unquoted_identifiers)
            .transform(transforms.set_schema, current_database=self._conn.database)
            .transform(transforms.create_database)
            .transform(transforms.extract_comment)
            .transform(transforms.information_schema_columns_snowflake)
            .transform(transforms.information_schema_tables_ext)
            .transform(transforms.drop_schema_cascade)
            .transform(transforms.tag)
            .transform(transforms.semi_structured_types)
            .transform(transforms.parse_json)
            # indices_to_json_extract must be before regex_substr
            .transform(transforms.indices_to_json_extract)
            .transform(transforms.regex_replace)
            .transform(transforms.regex_substr)
            .transform(transforms.values_columns)
            .transform(transforms.to_date)
            .transform(transforms.to_decimal)
            .transform(transforms.object_construct)
            .transform(transforms.timestamp_ntz_ns)
            .transform(transforms.float_to_double)
            .transform(transforms.integer_precision)
            .transform(transforms.extract_text_length)
        )
        sql = transformed.sql(dialect="duckdb")
    
        try:
            self._last_sql = sql
            self._last_params = params
            # print(f"{sql};")
            self._duck_conn.execute(sql, params)
        except duckdb.BinderException as e:
            msg = e.args[0]
            raise snowflake.connector.errors.ProgrammingError(msg=msg, errno=2043, sqlstate="02000") from None
        except duckdb.CatalogException as e:
            # minimal processing to make it look like a snowflake exception, message content may differ
            msg = cast(str, e.args[0]).split("\n")[0]
>           raise snowflake.connector.errors.ProgrammingError(msg=msg, errno=2003, sqlstate="42S02") from None
E           snowflake.connector.errors.ProgrammingError: 002003 (42S02): Catalog Error: Table with name CATALOGS does not exist!

I suspect fakesnow isn't handling the information_schema.catalogs table properly.

VARCHAR(100) missing character/octet length

fakesnow 0.2.0

I create:

CREATE TABLE IF NOT EXISTS test_types (
    TEST_BIGINT BIGINT,
    TEST_INTEGER INTEGER,
    TEST_SMALLINT SMALLINT,
    TEST_FLOAT8 DOUBLE PRECISION,
    TEST_FLOAT4 REAL,
    TEST_BOOLEAN BOOLEAN,
    TEST_VARCHAR VARCHAR(100),
    TEST_STRING STRING,
    TEST_TEXT TEXT,
    TEST_CHAR CHAR(10),
    TEST_NVARCHAR NVARCHAR(100),
    TEST_NVARCHAR2 NVARCHAR(100),
    TEST_CHAR_VARYING CHAR VARYING(100),
    TEST_NCHAR_VARYING NCHAR VARYING(100),
    TEST_NCHAR NCHAR(100),
    TEST_CHARACTER CHARACTER(100),
    TEST_BINARY BINARY,
    TEST_VARBINARY VARBINARY,
    TEST_BLOB BLOB,
    TEST_DATE DATE,
    TEST_TIMESTAMP TIMESTAMP,
    TEST_DATETIME DATETIME,
    TEST_TIME TIME,
    TEST_DECIMAL DECIMAL(10,2),
    TEST_NUMERIC NUMERIC(10,2),
    TEST_NUMBER NUMBER(10,2),
    TEST_TINYINT TINYINT,
    TEST_BYTEINT BYTEINT
);

I select information_schema.columns for TEST_VARCHAR:

{'TABLE_CATALOG': 'TESTDB', 'TABLE_SCHEMA': 'PUBLIC', 'TABLE_NAME': 'TEST_TYPES', 'COLUMN_NAME': 'TEST_VARCHAR', 'ORDINAL_POSITION': 7, 'COLUMN_DEFAULT': None, 'IS_NULLABLE': 'YES', 'DATA_TYPE': 'VARCHAR', 'CHARACTER_MAXIMUM_LENGTH': None, 'CHARACTER_OCTET_LENGTH': None, 'NUMERIC_PRECISION': None, 'NUMERIC_PRECISION_RADIX': None, 'NUMERIC_SCALE': None, 'DATETIME_PRECISION': None, 'INTERVAL_TYPE': None, 'INTERVAL_PRECISION': None, 'CHARACTER_SET_CATALOG': None, 'CHARACTER_SET_SCHEMA': None, 'CHARACTER_SET_NAME': None, 'COLLATION_CATALOG': None, 'COLLATION_SCHEMA': None, 'COLLATION_NAME': None, 'DOMAIN_CATALOG': None, 'DOMAIN_SCHEMA': None, 'DOMAIN_NAME': None, 'UDT_CATALOG': None, 'UDT_SCHEMA': None, 'UDT_NAME': None, 'SCOPE_CATALOG': None, 'SCOPE_SCHEMA': None, 'SCOPE_NAME': None, 'MAXIMUM_CARDINALITY': None, 'DTD_IDENTIFIER': None, 'IS_SELF_REFERENCING': None, 'IS_IDENTITY': None, 'IDENTITY_GENERATION': None, 'IDENTITY_START': None, 'IDENTITY_INCREMENT': None, 'IDENTITY_MAXIMUM': None, 'IDENTITY_MINIMUM': None, 'IDENTITY_CYCLE': None, 'IS_GENERATED': None, 'GENERATION_EXPRESSION': None, 'IS_UPDATABLE': None}

I expect the max length/character octet length to be set since I specified VARCHAR(100) for TEST_VARCHAR.

NOTE: I have not verified the behavior of Snowflake itself. Perhaps it really doesn't fill these fields in? PG definitely fills in CHARACTER_OCTET_LENGTH.

CREATE SEQUENCE fails with cursor description

from snowflake.sqlalchemy import URL
import sqlalchemy
import fakesnow

with fakesnow.patch():
    engine = sqlalchemy.create_engine(
        URL(
            account="abc123",
            user="testuser1",
            password="0123456",
            database="testdb",
            schema="public",
        )
    )

    with engine.connect() as con:
        print(con.execute("CREATE SEQUENCE insertion_id_seq START WITH 1").fetchall())

Related to #40, the cursor description fails.

Add `INTEGER` to as_result_metadata

On 0.4.1, I create this table:

CREATE TABLE IF NOT EXISTS test_types (
    TEST_BIGINT BIGINT,
    TEST_INTEGER INTEGER,
    TEST_SMALLINT SMALLINT,
    TEST_FLOAT FLOAT,
    TEST_FLOAT8 FLOAT8,
    TEST_FLOAT4 FLOAT4,
    TEST_DOUBLE FLOAT,
    TEST_DOUBLE_PRECISION DOUBLE PRECISION,
    TEST_REAL REAL,
    TEST_BOOLEAN BOOLEAN,
    TEST_VARCHAR VARCHAR(100),
    TEST_STRING STRING,
    TEST_TEXT TEXT,
    TEST_CHAR CHAR(10),
    TEST_NVARCHAR NVARCHAR(100),
    TEST_NVARCHAR2 NVARCHAR(100),
    TEST_CHAR_VARYING CHAR VARYING(100),
    TEST_NCHAR_VARYING NCHAR VARYING(100),
    TEST_NCHAR NCHAR(100),
    TEST_CHARACTER CHARACTER(100),
    TEST_BINARY BINARY,
    TEST_VARBINARY VARBINARY,
    TEST_BLOB BLOB,
    TEST_DATE DATE,
    TEST_TIMESTAMP TIMESTAMP,
    TEST_DATETIME DATETIME,
    TEST_TIME TIME,
    TEST_DECIMAL DECIMAL(10,2),
    TEST_NUMERIC NUMERIC(10,2),
    TEST_NUMBER NUMBER(10,2),
    TEST_TINYINT TINYINT,
    TEST_BYTEINT BYTEINT
);

I then open a cursor and call cursor.description. The call fails with:

recap/readers/dbapi.py:29: in to_recap
    names = [name[0].upper() for name in cursor.description]
.venv/lib/python3.10/site-packages/fakesnow/fakes.py:77: in description
    with self._duck_conn.cursor() as cur:
.venv/lib/python3.10/site-packages/fakesnow/fakes.py:84: in description
    meta = FakeSnowflakeCursor._describe_as_result_metadata(cur.fetchall())  # noqa: SLF001
.venv/lib/python3.10/site-packages/fakesnow/fakes.py:297: in _describe_as_result_metadata
    meta = [
.venv/lib/python3.10/site-packages/fakesnow/fakes.py:298: in <listcomp>
    as_result_metadata(column_name, column_type, null)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

column_name = 'ordinal_position', column_type = 'INTEGER', _ = 'YES'

    def as_result_metadata(column_name: str, column_type: str, _: str) -> ResultMetadata:
        # see https://docs.snowflake.com/en/user-guide/python-connector-api.html#type-codes
        # and https://arrow.apache.org/docs/python/api/datatypes.html#type-checking
        if column_type == "BIGINT":
            return ResultMetadata(
                name=column_name, type_code=0, display_size=None, internal_size=None, precision=38, scale=0, is_nullable=True               # noqa: E501
            )
        elif column_type.startswith("DECIMAL"):
            match = re.search(r'\((\d+),(\d+)\)', column_type)
            if match:
                precision = int(match[1])
                scale = int(match[2])
            else:
                precision = scale = None
            return ResultMetadata(
                name=column_name, type_code=0, display_size=None, internal_size=None, precision=precision, scale=scale, is_nullable=True    # noqa: E501
            )
        elif column_type == "VARCHAR":
            # TODO: fetch internal_size from varchar size
            return ResultMetadata(
                name=column_name, type_code=2, display_size=None, internal_size=16777216, precision=None, scale=None, is_nullable=True      # noqa: E501
            )
        elif column_type == "DOUBLE":
            return ResultMetadata(
                name=column_name, type_code=1, display_size=None, internal_size=None, precision=None, scale=None, is_nullable=True          # noqa: E501
            )
        elif column_type == "BOOLEAN":
            return ResultMetadata(
                name=column_name, type_code=13, display_size=None, internal_size=None, precision=None, scale=None, is_nullable=True         # noqa: E501
            )
        elif column_type == "DATE":
            return ResultMetadata(
                name=column_name, type_code=3, display_size=None, internal_size=None, precision=None, scale=None, is_nullable=True          # noqa: E501
            )
        elif column_type in {"TIMESTAMP", "TIMESTAMP_NS"}:
            return ResultMetadata(
                name=column_name, type_code=8, display_size=None, internal_size=None, precision=0, scale=9, is_nullable=True                # noqa: E501
            )
        else:
            # TODO handle more types
>           raise NotImplementedError(f"for column type {column_type}")
E           NotImplementedError: for column type INTEGER

This code worked on 0.3.0
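
For reference, a hedged sketch of the kind of branch being asked for, mirroring the BIGINT case above (an assumption about the desired metadata, not the actual fix):

from snowflake.connector.cursor import ResultMetadata

def integer_result_metadata(column_name: str) -> ResultMetadata:
    # assumption: surface INTEGER like NUMBER(38, 0), the same way BIGINT is handled above
    return ResultMetadata(
        name=column_name, type_code=0, display_size=None, internal_size=None,
        precision=38, scale=0, is_nullable=True,
    )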

Merge upsert support?

Hi! Thanks for this awesome tool.

I'm (finally) writing some unit tests for some internal libraries, in which one of the main functions is completing an upsert using the merge feature.

sql = """
    merge into test as dst
    using TMP_TEST_1698864265 as src
        on dst.a = src.a
    when not matched then
        insert (a,b)
            values (src.a,src.b)
    when matched then
        update set a = src.a, b = src.b
"""

cur.execute(sql)

And I get an error like:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/.../3.9.10/envs/aqlib/lib/python3.9/site-packages/fakesnow/fakes.py", line 100, in execute
    return self._execute(command, params, *args, **kwargs)
  File "/home/.../3.9.10/envs/aqlib/lib/python3.9/site-packages/fakesnow/fakes.py", line 164, in _execute
    self._duck_conn.execute(sql, params)
duckdb.ParserException: Parser Error: syntax error at or near "MERGE"
LINE 1: MERGE INTO TEST AS DST USING TMP_TEST_1...

duckdb does not have a MERGE statement (it uses insert-on-conflict), which I presume is the source of this issue. Have you run into this/do you know how you might approach a solution?

For now I am writing out the same merge using INSERT ON CONFLICT depending on the test env, but would be charmed for this to "just work". :)
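
For reference, a rough sketch of that workaround using the same cur as above (assuming test.a has a primary key or unique constraint; this is duckdb-style SQL, not something fakesnow emits):

# rough equivalent of the merge above, written as duckdb-style INSERT ... ON CONFLICT
upsert_sql = """
    insert into test (a, b)
    select src.a, src.b
    from TMP_TEST_1698864265 as src
    on conflict (a) do update set b = excluded.b
"""

cur.execute(upsert_sql)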

Info:

  • python 3.9
  • fakesnow==0.7.0
  • duckdb==0.8.1

Double transpiled `DESCRIBE` queries on `cursor.description`

On any non-cmd query, _last_sql on the cursor is set to the transpiled DuckDB query:

self._last_sql = result_sql or sql

On .description access, the transpiled _last_sql is executed with DESCRIBE <..>, and thus gets transpiled twice here:

fakesnow/fakesnow/fakes.py

Lines 112 to 118 in b7be898

def description(self) -> list[ResultMetadata]:
    # use a separate cursor to avoid consuming the result set on this cursor
    with self._conn.cursor() as cur:
        cur.execute(f"DESCRIBE {self._last_sql}", self._last_params)
        meta = FakeSnowflakeCursor._describe_as_result_metadata(cur.fetchall())

    return meta

SELECT DATE_DIFF('DAY', CAST('2023-04-02' AS DATE), CAST('2023-04-05' AS DATE));
DESCRIBE SELECT DATE_DIFF('CAST('2023-04-05' AS DATE)', CAST('2023-04-02' AS DATE), 'DAY');

import fakesnow
import snowflake.connector

q = """SELECT DATEDIFF( DAY, '2023-04-02'::DATE, '2023-04-05'::DATE)"""

with fakesnow.patch():
    conn = snowflake.connector.connect()
    with conn.cursor() as cur:
        cur.execute(q)
        rows = cur.fetchall()
        desc = cur.description

        print(rows)
        print(desc)
python-3.11 ; FAKESNOW_DEBUG=true python3 main.py
SELECT DATE_DIFF('DAY', CAST('2023-04-02' AS DATE), CAST('2023-04-05' AS DATE));
DESCRIBE SELECT DATE_DIFF('CAST('2023-04-05' AS DATE)', CAST('2023-04-02' AS DATE), 'DAY');
Traceback (most recent call last):
  File "/private/tmp/mm/main.py", line 11, in <module>
    desc = cur.description
           ^^^^^^^^^^^^^^^
  File "/private/tmp/mm/.direnv/python-3.11/lib/python3.11/site-packages/fakesnow/fakes.py", line 115, in description
    cur.execute(f"DESCRIBE {self._last_sql}", self._last_params)
  File "/private/tmp/mm/.direnv/python-3.11/lib/python3.11/site-packages/fakesnow/fakes.py", line 129, in execute
    return self._execute(command, params, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/mm/.direnv/python-3.11/lib/python3.11/site-packages/fakesnow/fakes.py", line 219, in _execute
    self._duck_conn.execute(sql, params)
duckdb.duckdb.ParserException: Parser Error: syntax error at or near "2023"
  • Python: 3.11.5
  • sqlglot: 21.2.1
  • fakesnow: 0.9.6

I did a quick hack to work around this, but I'm not sure if this was on purpose and I'm not missing anything;

     def _execute(
         self,
         command: str,
         params: Sequence[Any] | dict[Any, Any] | None = None,
+        _describe_only: bool = False,
         *args: Any,
         **kwargs: Any,
     ) -> FakeSnowflakeCursor:
# ...
         try:
-            self._duck_conn.execute(sql, params)
+            q = sql if not _describe_only else command
+            self._duck_conn.execute(q, params)
         except duckdb.BinderException as e:

Aggregate function with over clause handled incorrectly

MCVE in Snowflake:

CREATE OR REPLACE TEMPORARY TABLE TEST (ID BIGINT, COL VARCHAR, ANOTHER VARCHAR);
INSERT INTO TEST (ID, COL, ANOTHER) VALUES (1, 's1', 'c1'),(1, 's2', 'c1'), (1, 's3', 'c1'), (2, 's1', 'c2'), (2,'s2','c2');

SELECT DISTINCT
    ID
    , ANOTHER
    , ARRAY_AGG(DISTINCT COL) OVER(PARTITION BY ID) AS COLS
FROM TEST;
ID ANOTHER COLS
1 c1 [ "s1", "s2", "s3" ]
2 c2 [ "s1", "s2" ]

MCVE in fakesnow:

import fakesnow
import snowflake.connector


def main():
    with fakesnow.patch():
        conn = snowflake.connector.connect(database="X", schema="Y")
        conn.execute_string("CREATE OR REPLACE TEMPORARY TABLE TEST (ID BIGINT, COL VARCHAR, ANOTHER VARCHAR)")
        conn.execute_string("INSERT INTO TEST (ID, COL, ANOTHER) VALUES (1, 's1', 'c1'),(1, 's2', 'c1'),(1, 's3', 'c1'),(2, 's1', 'c2'), (2,'s2','c2')")
        conn.cursor().execute("""
        SELECT DISTINCT
            ID
            , ANOTHER
            , ARRAY_AGG(DISTINCT COL) OVER(PARTITION BY ID) AS COLS
        FROM TEST;
        """).fetchall()


if __name__ == "__main__":
    main()

Error:

$ FAKESNOW_DEBUG=1 python3 standalone.py 

CREATE OR REPLACE TEMPORARY TABLE TEST (ID BIGINT, COL TEXT, ANOTHER TEXT);
INSERT INTO TEST (ID, COL, ANOTHER) VALUES (1, 's1', 'c1'), (1, 's2', 'c1'), (1, 's3', 'c1'), (2, 's1', 'c2'), (2, 's2', 'c2');
SELECT DISTINCT ID, ANOTHER, TO_JSON(ARRAY_AGG(DISTINCT COL)) OVER (PARTITION BY ID) AS COLS FROM TEST;
Traceback (most recent call last):
[...]
    raise snowflake.connector.errors.ProgrammingError(msg=msg, errno=2003, sqlstate="42S02") from None
snowflake.connector.errors.ProgrammingError: 002003 (42S02): Catalog Error: to_json is not an aggregate function

Version:

$ pip freeze | grep snow
fakesnow==0.9.12
snowflake-connector-python==3.2.1
snowflake-sqlalchemy==1.5.0

Redirecting Queries from a Third-Party Service to 'fakesnow': Is it Possible?

I am currently utilizing a third-party service that generates queries for Snowflake. I'm interested in redirecting these queries to the 'fakesnow' project. Specifically, I want to proxy all the Snowflake requests to 'fakesnow'.

The purpose of this operation is to enable end-to-end tests of the aforementioned service to run locally, rather than on AWS. Could you provide any guidance or steps on how to accomplish this?

cursor description fails with "SHOW TABLES"

the following throws an error for me

from snowflake.sqlalchemy import URL
import sqlalchemy
import fakesnow

with fakesnow.patch():
    engine = sqlalchemy.create_engine(
        URL(
            account="abc123",
            user="testuser1",
            password="0123456",
            database="testdb",
            schema="public",
        )
    )
    print(engine.execute("SHOW TABLES").fetchone())

duckdb.duckdb.ParserException: Parser Error: syntax error at or near "TABLES"
LINE 1: DESCRIBE SHOW TABLES

Executing the same command with the cursor like so:

import fakesnow
import snowflake.connector

with fakesnow.patch():
    conn = snowflake.connector.connect()

    print(conn.cursor().execute("SHOW TABLES").fetchone())

works just fine.

We ran into this error because our code creates tables using sqlalchemy declarative base, which internally seems to run similar commands. (This also showed a problem with auto increment but I will open a separate ticket for this).

Dependency resolution causes AttributeError: 'pyarrow.lib.RecordBatch' object has no attribute 'to_pylist'

During dependency resolution we backtrack and resolve to snowflake-connector-python==2.6.1 and pyarrow==5.0.0:

make install

Collecting snowflake-connector-python!=3.0.1
  Using cached snowflake_connector_python-3.0.2-cp39-cp39-macosx_11_0_arm64.whl (13.7 MB)
  Using cached snowflake_connector_python-3.0.0-1-cp39-cp39-macosx_11_0_arm64.whl (13.7 MB)
  Using cached snowflake_connector_python-2.9.0-cp39-cp39-macosx_11_0_arm64.whl (13.3 MB)
Collecting pyOpenSSL<23.0.0,>=16.2.0
  Using cached pyOpenSSL-22.1.0-py3-none-any.whl (57 kB)
Collecting snowflake-connector-python!=3.0.1
  Using cached snowflake_connector_python-2.8.3-cp39-cp39-macosx_11_0_arm64.whl (13.3 MB)
  Using cached snowflake_connector_python-2.8.1-cp39-cp39-macosx_11_0_arm64.whl (13.3 MB)
Collecting cryptography<39.0.0,>=3.1.0
  Using cached cryptography-38.0.4-cp36-abi3-macosx_10_10_universal2.whl (5.4 MB)
Collecting snowflake-connector-python!=3.0.1
  Using cached snowflake_connector_python-2.8.0-cp39-cp39-macosx_11_0_arm64.whl (13.3 MB)
  Using cached snowflake_connector_python-2.7.12-cp39-cp39-macosx_11_0_arm64.whl (13.3 MB)
  Using cached snowflake_connector_python-2.7.11-cp39-cp39-macosx_11_0_arm64.whl (13.3 MB)
  Using cached snowflake_connector_python-2.7.9-cp39-cp39-macosx_11_0_arm64.whl (10.3 MB)
...
Collecting snowflake-connector-python[pandas]
  Using cached snowflake_connector_python-3.0.1-cp39-cp39-macosx_11_0_arm64.whl (13.7 MB)
Collecting pyarrow<5.1.0,>=5.0.0
  Using cached pyarrow-5.0.0-cp39-cp39-macosx_11_0_arm64.whl (12.3 MB)  

This causes test failures:

self = <fakesnow.fakes.DuckResultBatch object at 0x156b4eb50>, kwargs = {}

    def create_iter(
        self, **kwargs: dict[str, Any]
    ) -> (Iterator[dict | Exception] | Iterator[tuple | Exception] | Iterator[pyarrow.Table] | Iterator[pd.DataFrame]):
        if self._use_dict_result:
>           return iter(self._batch.to_pylist())
E           AttributeError: 'pyarrow.lib.RecordBatch' object has no attribute 'to_pylist'

fakesnow/fakes.py:366: AttributeError

write_pandas malformed JSON when inserting into an array column

  File "../.venv/lib/python3.10/site-packages/fakesnow/fakes.py", line 476, in _insert_df
    self._duck_conn.execute(f"INSERT INTO {table_name}({','.join(df.columns.to_list())}) SELECT {cols} FROM df")
duckdb.ConversionException: Conversion Error: Malformed JSON at byte 0 of input: unexpected character.  Input: Can't log in to

Support information_schema.columns

It would be cool if fakesnow supported the information_schema.columns view in Snowflake, so any table that's created automatically has its columns show up there.

Binder Error: No function matches the given name and argument types 'json_extract(JSON, BOOLEAN)'.

MCVE in Snowflake:

CREATE OR REPLACE TEMPORARY TABLE TBL (COL OBJECT);
INSERT INTO TBL (COL) SELECT ({'K1': {'K2': 1}});

-- 1 row
SELECT * FROM TBL;

-- 1 row
SELECT COL
FROM TBL
WHERE COL:K1:K2 > 0;

MCVE failing with fakesnow:

import fakesnow
import snowflake.connector


def main():
    with fakesnow.patch():
        conn = snowflake.connector.connect(database="X", schema="Y")
        conn.execute_string("CREATE TEMPORARY TABLE TBL (COL OBJECT)")
        conn.execute_string("INSERT INTO TBL (COL) SELECT ({'K1': {'K2': 1}})")
        row = conn.cursor().execute("SELECT * FROM TBL").fetchone()
        assert row[0] == '{"K1":{"K2":1}}'
        sql = """
            SELECT COL
            FROM TBL
            WHERE COL:K1:K2 > 0
        """
        conn.execute_string(sql)


if __name__ == "__main__":
    main()

Error:

snowflake.connector.errors.ProgrammingError: 002043 (02000): Binder Error: No function matches the given name and argument types 'json_extract(JSON, BOOLEAN)'. You might need to add explicit type casts.
	Candidate functions:
	json_extract(VARCHAR, BIGINT) -> JSON
	json_extract(VARCHAR, VARCHAR) -> JSON
	json_extract(VARCHAR, VARCHAR[]) -> JSON[]
	json_extract(JSON, BIGINT) -> JSON
	json_extract(JSON, VARCHAR) -> JSON
	json_extract(JSON, VARCHAR[]) -> JSON[]

Version:

$ pip freeze | grep snow
fakesnow==0.9.4
snowflake-connector-python==3.2.1

FAKESNOW_DEBUG=1:

FAKESNOW_DEBUG=1 python3 repro.py 
CREATE TEMPORARY TABLE TBL (COL JSON);
INSERT INTO TBL (COL) SELECT (TO_JSON({'K1': {'K2': 1}}));
SELECT * FROM TBL;
SELECT COL FROM TBL WHERE COL -> '$.K1' -> '$.K2' > 0;

Describe on variant column returns object

describe on a variant column should return:

        ResultMetadata(name='XVARIANT', type_code=5, display_size=None, internal_size=None, precision=None, scale=None, is_nullable=True)

Enforce fixed length VARCHAR

VARCHAR field sizes are not enforced, unlike Snowflake, which errors with "User character length limit (xxx) exceeded by string" when an inserted string exceeds the column limit.
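
A minimal sketch of the difference (the connect arguments mirror the MCVEs above; the insert succeeds under fakesnow where real Snowflake would reject it):

import fakesnow
import snowflake.connector

with fakesnow.patch():
    conn = snowflake.connector.connect(database="X", schema="Y")
    conn.cursor().execute("CREATE TABLE T1 (C1 VARCHAR(3))")
    # accepted by fakesnow; real Snowflake raises
    # "User character length limit (3) exceeded by string 'too long'"
    conn.cursor().execute("INSERT INTO T1 VALUES ('too long')")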

Mock table

Hello! I wonder if it is already possible to have a way to "mock" tables that we don't really have control over.

The use case is when some tables exist in a schema different from the one we are deploying to, and are referenced by our code.

For tables outside of the schema we control, it would be impractical to provide a definition for them, so I'm envisioning being able to "mock" tables on a schema so that any query selecting from any table defined on this schema will succeed.

Example:

-- This will fail with Catalog Error: Table with name EXTERNAL_SCHEMA.EXAMPLE_TABLE does not exists!
CREATE VIEW EXAMPLE_VIEW AS
 SELECT COLUMN_A, COLUMN_B 
   FROM EXTERNAL_SCHEMA.EXAMPLE_TABLE

let's say we are able to define a mock for schema EXTERNAL_SCHEMA

-- This results in success
CREATE VIEW EXAMPLE_VIEW AS
 SELECT COLUMN_A, COLUMN_B 
   FROM EXTERNAL_SCHEMA.EXAMPLE_TABLE
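
Until something like that exists, a rough workaround sketch is to pre-create an empty stub of the external table before the view DDL runs (the schema names and column types here are assumptions):

import fakesnow
import snowflake.connector

with fakesnow.patch():
    conn = snowflake.connector.connect(database="DB1", schema="MY_SCHEMA")
    # stub out the external table so the view DDL can bind against it
    conn.execute_string("CREATE SCHEMA IF NOT EXISTS EXTERNAL_SCHEMA")
    conn.execute_string("CREATE TABLE IF NOT EXISTS EXTERNAL_SCHEMA.EXAMPLE_TABLE (COLUMN_A VARCHAR, COLUMN_B VARCHAR)")
    conn.execute_string("CREATE VIEW EXAMPLE_VIEW AS SELECT COLUMN_A, COLUMN_B FROM EXTERNAL_SCHEMA.EXAMPLE_TABLE")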

`paramstyle` is not propagated from original client

The original client has an option to control paramstyle; it is intended to be used like this:

snowflake.connector.paramstyle = "numeric"

I see there is support for it, but no option to enable this mode. Setting connection._paramstyle = "numeric" works, but then the code differs between fake and real connections.
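
A minimal sketch of the two approaches side by side (the numeric bind at the end is an assumption about how it would be used):

import fakesnow
import snowflake.connector

# the documented way to switch paramstyle on the real client
snowflake.connector.paramstyle = "numeric"

with fakesnow.patch():
    conn = snowflake.connector.connect(database="X", schema="Y")
    # workaround described above: set it on the (fake) connection directly
    conn._paramstyle = "numeric"
    print(conn.cursor().execute("SELECT :1, :2", (1, "hello")).fetchall())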

Create table like existing not supported

Snowflake supports creating a table based on an existing table via LIKE

CREATE OR REPLACE TEMP TABLE new_table LIKE existing_table

Since this is not supported in duckdb, sqlglot does not transpile it. Maybe fakesnow should at least produce a better error message in such cases?
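
In the meantime, a hedged workaround sketch is to approximate LIKE with an empty CTAS (this copies column names and types, but not defaults, constraints or comments):

import fakesnow
import snowflake.connector

with fakesnow.patch():
    conn = snowflake.connector.connect(database="X", schema="Y")
    conn.execute_string("CREATE TABLE existing_table (ID BIGINT, COL VARCHAR)")
    # rough stand-in for CREATE ... LIKE, accepted by both Snowflake and duckdb
    conn.execute_string("CREATE TEMP TABLE new_table AS SELECT * FROM existing_table WHERE FALSE")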

Support for `SHOW USERS`

I wonder if it would be possible/practical to have provisional support for CRUD on roles/users? I was sort of imagining part of setup would create an information_schema.fakesnow_users table (since there's no information_schema table that I can see, unlike other SHOW commands that do have public backing tables) that matches the structure given by SHOW USERS.

Then SHOW USERS could function, even if it only returned an empty table.

But at that point, basic CREATE/UPDATE/DELETE USER/ROLE statements could probably be supported by inserting into that table.
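
To make the idea concrete, a purely illustrative sketch of such a backing table (the columns are a hypothetical subset of what SHOW USERS returns, and this is not executed anywhere):

# hypothetical DDL for a SHOW USERS backing table; column list is an assumed subset
create_fakesnow_users = """
    CREATE TABLE IF NOT EXISTS information_schema._fakesnow_users (
        name VARCHAR,
        created_on TIMESTAMP,
        login_name VARCHAR,
        disabled BOOLEAN,
        default_role VARCHAR
    )
"""
# SHOW USERS could then be rewritten to: SELECT * FROM information_schema._fakesnow_users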


If you'd support such an endeavor, I'd be interested in supplying the impl.

I believe sqlglot would need to gain SHOW USERS support syntactically, at which point I could implement up to SHOW USERS. Beyond that, my use case could probably live with manually manipulating fakesnow_users, although it'd certainly be nice to keep going.


FWIW, my use case is adding Snowflake support to https://github.com/DanCardin/sqlalchemy-declarative-extensions, where my usual e2e testing mechanism for the rest of the library is impractical for Snowflake. Developing against a very rough approximation of users isn't ideal, but it seems better than the manual patching I would otherwise be forced to do.
