
typedspark's People

Contributors

dependabot[bot], hahamark1, jana-starkova, jonmclean, marijncv, nanne-aben, renovate[bot]


typedspark's Issues

Idea: Schema code generator

This project looks great, and it might be what I'm missing on my projects. To speed up adoption and integration, it would be great to have an interface that generates schema stubs from a Spark DataFrame, e.g.:

from typedspark.helpers import generate_stubs


my_person_df = get_person()
print(generate_stubs(my_person_df))

This would really help on big projects with lots of data sources that need schemas (including mine).
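
A rough sketch of what such a generator could do, in the spirit of the request above. It is not typedspark's API: `generate_stub` and `SPARK_TYPE_NAMES` are hypothetical names, and the sketch assumes the input is the list of `(column, type)` pairs that PySpark's `DataFrame.dtypes` returns. Only a handful of primitive types are covered; nested structs, arrays, and maps are left out.

```python
from typing import Iterable, Tuple

# Hypothetical mapping from the simple type strings found in `DataFrame.dtypes`
# to the matching class names in `pyspark.sql.types` (primitives only).
SPARK_TYPE_NAMES = {
    "string": "StringType",
    "int": "IntegerType",
    "bigint": "LongType",
    "double": "DoubleType",
    "float": "FloatType",
    "boolean": "BooleanType",
    "date": "DateType",
    "timestamp": "TimestampType",
}


def generate_stub(class_name: str, dtypes: Iterable[Tuple[str, str]]) -> str:
    """Render a typedspark-style schema class as source text."""
    pairs = list(dtypes)
    unknown = [t for _, t in pairs if t not in SPARK_TYPE_NAMES]
    if unknown:
        raise ValueError(f"unsupported column types: {unknown}")

    # Import only the Spark types that the columns actually use.
    used = sorted({SPARK_TYPE_NAMES[t] for _, t in pairs})
    lines = [
        f"from pyspark.sql.types import {', '.join(used)}",
        "from typedspark import Column, Schema",
        "",
        "",
        f"class {class_name}(Schema):",
    ]
    lines += [f"    {name}: Column[{SPARK_TYPE_NAMES[t]}]" for name, t in pairs]
    return "\n".join(lines)


# With a real DataFrame this would be: generate_stub("Person", my_person_df.dtypes)
stub = generate_stub("Person", [("name", "string"), ("age", "bigint")])
print(stub)
```

Working from `dtypes` keeps the sketch free of a Spark session; a fuller version would probably walk `df.schema` to handle nested types.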

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Repository problems

These problems occurred while renovating this repository. View logs.

  • WARN: File contents are invalid JSON but parse using JSON5. Support for this will be removed in a future release so please change to a support .json5 file name or ensure correct JSON syntax.

This repository currently has no open or pending branches.

Detected dependencies

github-actions
.github/workflows/build.yml
  • actions/checkout v4
  • actions/setup-java v4
  • vemonet/setup-spark v1
  • actions/setup-python v5
.github/workflows/publish.yml
  • actions/checkout v4
  • actions/setup-python v5
.github/workflows/semgrep.yml
  • actions/checkout v4
  • returntocorp/semgrep-action 713efdd345f3035192eaa63f56867b88e63e4e5d
  • github/codeql-action v3
pip_requirements
requirements-dev.txt
  • pyspark ==3.5.1
  • flake8 ==7.1.0
  • pylint ==3.2.3
  • bandit ==1.7.9
  • black ==24.4.2
  • isort ==5.13.2
  • docformatter ==1.7.5
  • mypy ==1.10.0
  • pyright ==1.1.368
  • autoflake ==2.3.1
  • pandas-stubs ==2.2.2.240603
  • types-setuptools ==70.0.0.20240524
  • pytest ==8.2.2
  • coverage ==7.5.4
  • pandas ==2.2.2
  • setuptools ==70.1.0
  • chispa ==0.10.0
  • nbconvert ==7.16.4
  • jupyter ==1.0.0
  • nbformat ==5.10.4
  • sphinx ==7.3.7
  • sphinx-rtd-theme ==2.0.0
  • nbsphinx ==0.9.4
  • pre-commit ==3.7.1
requirements.txt
  • typing-extensions <=4.12.2
pip_setup
setup.py
  • setuptools-git-versioning >=2.0,<3


Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Location: renovate.json
Error type: Invalid JSON (parsing failed)
Message: Syntax error: expecting end of expression or separator near ] "auto
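
One way to locate this kind of problem locally is to run the file through a strict JSON parser, which rejects JSON5-only syntax such as trailing commas and comments. The sketch below uses Python's standard `json` module; `strict_json_error` is a hypothetical helper, and the sample text is invented for illustration, since the actual contents of `renovate.json` are not shown in the issue.

```python
import json
from typing import Optional


def strict_json_error(text: str) -> Optional[str]:
    """Return a description of the first strict-JSON error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Hypothetical config text: the trailing comma is accepted by JSON5 parsers
# but rejected by strict JSON. This is NOT the repository's real file.
sample = '{"extends": ["config:base",]}'
print(strict_json_error(sample))

# In practice one might read the file instead:
# print(strict_json_error(open("renovate.json", encoding="utf-8").read()))
```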

Using Generics with typed DataFrame

Hi,

Thank you for this project, really helpful for people using type hints!
Is there a way to annotate a Struct column whose schema can vary?
All the examples I've seen in the docs require a Struct column to have one particular schema.

For example, I'd like a Resource[T] DataFrame with a struct column resource_properties of type T, where T is a Python TypeVar. Failing that, I'd like to be able to give the column the type Any, so that the type checker ignores it and developers decide how to treat the values.

Unclear how to use union types

I want a function to accept a Union[] of Schema classes, but when I try that I hit this error:

TypeError: issubclass() arg 1 must be a class

Is there a way to express this?
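
The issue does not include a traceback, so the following is only a guess at the cause: in plain Python, a `Union[...]` is not a class, and passing one as the first argument to `issubclass()` raises exactly this kind of `TypeError`. The sketch below reproduces that with stand-in classes (`Schema`, `A`, and `B` here are placeholders, not typedspark's classes) and shows a hypothetical helper, `schema_classes`, that unpacks the union so each member can be checked on its own.

```python
from typing import Any, List, Union, get_args, get_origin


class Schema:  # stand-in for typedspark's Schema, to keep the sketch self-contained
    pass


class A(Schema):
    pass


class B(Schema):
    pass


def schema_classes(annotation: Any) -> List[Any]:
    """Unpack Union[...] into its member classes; wrap a lone class in a list."""
    if get_origin(annotation) is Union:
        return list(get_args(annotation))
    return [annotation]


# A Union is not a class, so passing it straight to issubclass() fails.
try:
    issubclass(Union[A, B], Schema)
    raised = False
except TypeError:
    raised = True

# Checking each member separately works.
all_schemas = all(issubclass(cls, Schema) for cls in schema_classes(Union[A, B]))
print(raised, all_schemas)
```

Whether typedspark could apply the same unpacking internally is a design question for the maintainers; the sketch only illustrates the Python behaviour behind the message.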

Typedspark does not work with Python 3.11.9

The unit tests currently fail on Python 3.11.9. As a temporary fix, the CI/CD pipeline is pinned to 3.11.8.

Interestingly, the other supported versions (3.9, 3.10, 3.12) work without problems.

I'll debug the problem later. Currently, I can't install Python 3.11.9 with pyenv.

Support DataSet cache operations

It would be nice if DataSet supported the Spark cache operations, to enable this use case:

cached_ds: DataSet[A] = original_ds.cache()

These cache operations return a PySpark DataFrame:

  • DataFrame.cache()
  • DataFrame.persist()
  • DataFrame.unpersist()
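
One possible shape for this, sketched with stand-in classes rather than the real PySpark and typedspark types: the subclass overrides `cache()` only to narrow the return annotation. This assumes the underlying method returns the same object it was called on, as the stand-in `DataFrame` below does; `persist()` and `unpersist()` could be treated the same way. It is an illustration of the idea, not typedspark's implementation.

```python
from typing import Generic, TypeVar, cast

T = TypeVar("T")


class DataFrame:
    """Stand-in for pyspark.sql.DataFrame; cache() is annotated as DataFrame."""

    def __init__(self) -> None:
        self.is_cached = False

    def cache(self) -> "DataFrame":
        self.is_cached = True
        return self


class DataSet(DataFrame, Generic[T]):
    """Stand-in for typedspark's DataSet with a narrowed cache() annotation."""

    def cache(self) -> "DataSet[T]":
        # The parent returns the same object, so only the static type changes.
        return cast("DataSet[T]", super().cache())


class A:  # placeholder schema
    pass


original_ds: DataSet[A] = DataSet()
cached_ds: DataSet[A] = original_ds.cache()
print(type(cached_ds).__name__, cached_ds.is_cached)
```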

Contribution request

I would like to contribute to this organisation.
If you have any bugs to fix or features you want implemented, I'm available.

Thank you
