kaiko-ai / typedspark
Column-wise type annotations for pyspark DataFrames
License: Apache License 2.0
This project looks great and might be what I'm missing in my projects. To speed up adoption and integration, it would be great to have an interface that generates stubs for schemas, given a Spark DataFrame as input, e.g.:
from typedspark.helpers import generate_stubs
my_person_df = get_person()
print(generate_stubs(my_person_df))
This would really help on big projects with lots of data sources that need schemas (including mine).
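To make the request concrete, here is a minimal sketch of what such a stub generator might look like. It is not part of typedspark: the `generate_stubs` signature, the `SIMPLE_TYPES` table and the default class name are assumptions made for illustration. It works on the `(column name, type string)` pairs that `DataFrame.dtypes` reports, so it runs without a Spark session, and it only handles a few simple types.

```python
from typing import Iterable, Tuple

# Assumed mapping from the type strings reported by DataFrame.dtypes
# to pyspark.sql.types class names. Only simple types are covered.
SIMPLE_TYPES = {
    "string": "StringType",
    "boolean": "BooleanType",
    "int": "IntegerType",
    "bigint": "LongType",
    "float": "FloatType",
    "double": "DoubleType",
    "date": "DateType",
    "timestamp": "TimestampType",
}


def generate_stubs(dtypes: Iterable[Tuple[str, str]], class_name: str = "MySchema") -> str:
    """Render a typedspark-style schema class as text (hypothetical helper)."""
    lines = [f"class {class_name}(Schema):"]
    n_fields = 0
    for name, dtype in dtypes:
        spark_type = SIMPLE_TYPES.get(dtype)
        if spark_type is None:
            # Complex types (struct, array, map, decimal, ...) are left to the user.
            lines.append(f"    # TODO: unsupported type {dtype!r} for column {name!r}")
        else:
            lines.append(f"    {name}: Column[{spark_type}]")
            n_fields += 1
    if n_fields == 0:
        # Keep the generated class syntactically valid when no field was emitted.
        lines.append("    pass")
    return "\n".join(lines)


# With a real DataFrame this would be generate_stubs(my_person_df.dtypes, "Person").
print(generate_stubs([("name", "string"), ("age", "bigint")], "Person"))
```

The output is plain text meant to be pasted into a module and reviewed by hand, since nested and parameterised types need manual attention.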
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
These problems occurred while renovating this repository. View logs.
This repository currently has no open or pending branches.
.github/workflows/build.yml
actions/checkout v4
actions/setup-java v4
vemonet/setup-spark v1
actions/setup-python v5
.github/workflows/publish.yml
actions/checkout v4
actions/setup-python v5
.github/workflows/semgrep.yml
actions/checkout v4
returntocorp/semgrep-action 713efdd345f3035192eaa63f56867b88e63e4e5d
github/codeql-action v3
requirements-dev.txt
pyspark ==3.5.1
flake8 ==7.1.0
pylint ==3.2.3
bandit ==1.7.9
black ==24.4.2
isort ==5.13.2
docformatter ==1.7.5
mypy ==1.10.0
pyright ==1.1.368
autoflake ==2.3.1
pandas-stubs ==2.2.2.240603
types-setuptools ==70.0.0.20240524
pytest ==8.2.2
coverage ==7.5.4
pandas ==2.2.2
setuptools ==70.1.0
chispa ==0.10.0
nbconvert ==7.16.4
jupyter ==1.0.0
nbformat ==5.10.4
sphinx ==7.3.7
sphinx-rtd-theme ==2.0.0
nbsphinx ==0.9.4
pre-commit ==3.7.1
requirements.txt
typing-extensions <=4.12.2
setup.py
setuptools-git-versioning >=2.0,<3
There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.
Location: renovate.json
Error type: Invalid JSON (parsing failed)
Message: Syntax error: expecting end of expression or separator near ] "auto
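A quick way to locate this kind of syntax error locally is to run the configuration file through a strict JSON parser. The sketch below uses only Python's standard `json` module. The helper name `find_json_error` and the sample string are made up for illustration and are not the actual contents of renovate.json; Renovate's own parser may also accept or reject input that strict JSON does not.

```python
import json
from typing import Optional


def find_json_error(text: str) -> Optional[str]:
    """Return a short description of the first JSON syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Made-up example: a separator is missing between the array and the next key.
broken = '{"a": [1, 2] "b": 3}'
print(find_json_error(broken))

# For a real file, something like:
# print(find_json_error(open("renovate.json", encoding="utf-8").read()))
```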
Currently there is no way to set the nullability of a field: when calling the get_structtype function to get the schema, the nullability is hardcoded as true.
Hi,
Thank you for this project, really helpful for people using type hints!
I'd like to know if there's a way to annotate a Struct column that can have a varying schema?
All examples I've seen in docs indicate a Struct column needs to have a particular schema.
For example, I'd like to have a Resource[T] dataframe, with a struct column resource_properties of type T, such that T is a Python TypeVar, or at the very least be able to have a column with type Any so that the type linter ignores it and the developers will know how to treat the values.
I wanted to have a function accept a Union[] of Schema, but when I try that I hit this error:
TypeError: issubclass() arg 1 must be a class
Is there a way to express this?
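For context, this error message is what plain Python produces when a `Union[...]` annotation is passed to `issubclass()`, because a union is not a class. Whether that is exactly where typedspark raises it is an assumption. The sketch below reproduces the behaviour with stand-in classes (`Base`, `A`, `B` are hypothetical and are not typedspark's `Schema`) and shows one possible way to unpack a union before checking its members.

```python
import types
from typing import Union, get_args, get_origin


class Base: ...          # stand-in for a schema base class (hypothetical)
class A(Base): ...
class B(Base): ...


# typing.Union, plus the `X | Y` union type where the interpreter provides it.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def all_subclasses(annotation, base) -> bool:
    """True if the annotation, or every member of a Union annotation, subclasses base."""
    if get_origin(annotation) in _UNION_ORIGINS:
        return all(all_subclasses(arg, base) for arg in get_args(annotation))
    try:
        return issubclass(annotation, base)
    except TypeError:
        # Not a class at all (for example a parameterised generic).
        return False


# Passing the union directly raises a TypeError.
try:
    issubclass(Union[A, B], Base)
except TypeError as err:
    print("TypeError:", err)

# Unpacking the union first avoids the error.
print(all_subclasses(Union[A, B], Base))
```

A library could apply the same unpacking wherever it inspects annotations, but whether that fits typedspark's design is a question for the maintainers.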
The unit tests currently don't pass on Python 3.11.9. As a temporary fix, the CI/CD is constrained to use 3.11.8 for now.
Interestingly, the other supported versions (3.9, 3.10, 3.12) work without problems.
I'll debug the problem later. Currently, I can't install Python 3.11.9 with pyenv.
It would be nice for the DataSet to support the Spark cache operations, to enable this use case:
cached_ds: DataSet[A] = original_ds.cache()
These cache operations return a PySpark DataFrame:
I would like to contribute to this organisation.
If you have any bugs to fix or features you want implemented, I'm available.
Thank you