foundry-dev-tools's People

Contributors

bernhardschaefer, chetesta, dgleish, jonas-w, juliecious, kochc, nicornk, nilskch

foundry-dev-tools's Issues

[Feature Request]: RID <=> Path Conversion for folders

Issue checklist

  • This is a feature request/enhancement. And not a bug.
  • I searched through the GitHub issues and this feature/enhancement has not been requested before.
  • I have installed the latest version of Foundry DevTools and don't use an unsupported python version.
  • Others could also benefit from this feature or enhancement and it is not a very specific use case.

Feature use case

While FoundryRestClient.get_child_objects_of_folder() returns all children of a folder, it requires a folder RID and cannot work with a path. A path -> RID conversion function for folders would therefore be very handy, but no such function exists. However, FoundryRestClient.get_dataset_rid() and FoundryRestClient.get_dataset_path() also seem to work for folders (at least in a test on my side), although their names do not suggest this.

Description of the Feature

It would be desirable to add conversion functions for folders (or, more generally, path <=> RID conversion for every object type, if possible).

Furthermore, it might make sense to let functions that only accept an RID, such as FoundryRestClient.get_child_objects_of_folder(), also accept paths. I assume it would be simple to distinguish inside the function whether an RID or a path was supplied and to convert automatically. This could make the API leaner and simplify usage.
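To illustrate the RID-or-path dispatch suggested above, here is a minimal sketch. It assumes that RIDs always start with "ri." and that Foundry paths always start with "/". The names is_rid and to_rid are hypothetical helpers, not part of Foundry DevTools, and the lookup is passed in as a callable so the sketch does not depend on any particular client method.

```python
from typing import Callable


def is_rid(identifier: str) -> bool:
    """Heuristic (assumption): resource identifiers start with 'ri.'."""
    return identifier.startswith("ri.")


def to_rid(identifier: str, path_to_rid: Callable[[str], str]) -> str:
    """Return an RID for either an RID or an absolute path.

    path_to_rid is any callable that resolves a path to an RID; it is only
    invoked when a path was supplied.
    """
    if is_rid(identifier):
        return identifier
    if identifier.startswith("/"):
        return path_to_rid(identifier)
    raise ValueError(f"Neither an RID nor an absolute path: {identifier!r}")


# Stand-in lookup table used only for this demonstration.
fake_lookup = {"/Project/folder": "ri.compass.main.folder.1234"}

print(to_rid("ri.compass.main.folder.1234", fake_lookup.__getitem__))
print(to_rid("/Project/folder", fake_lookup.__getitem__))
```

A function that today only accepts an RID could call such a helper on its argument first, so existing callers that pass RIDs would be unaffected.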

Alternatives you considered

Instead of using string paths or RIDs in the API, one could also think about a dedicated class for the identifiers similar to the following (which is just a thought, not engineered to the end):

from foundry_dev_tools import FoundryRestClient


class Resource:
    """Identifier that resolves its RID or path lazily (a sketch, not fully engineered)."""

    def __init__(self, fc: FoundryRestClient):
        self._fc = fc
        self._rid = None
        self._path = None

    @classmethod
    def from_rid(cls, rid: str, fc: FoundryRestClient):
        r = cls(fc)
        r._rid = rid
        return r

    @classmethod
    def from_path(cls, path: str, fc: FoundryRestClient):
        r = cls(fc)
        r._path = path
        return r

    @property
    def rid(self):
        if self._rid is None:
            assert self._path is not None
            # path_to_rid is the conversion function proposed above; it does not exist yet.
            self._rid = self._fc.path_to_rid(self._path)
        return self._rid

    @property
    def path(self):
        if self._path is None:
            assert self._rid is not None
            # rid_to_path is likewise a proposed function.
            self._path = self._fc.rid_to_path(self._rid)
        return self._path

Additional Context

No response

[Bug]: fdt build throws exception when user logs with format other than %s (e.g. %d)

Issue checklist

  • This is a bug in Foundry DevTools and not a bug in another project. It is also not an enhancement/feature request
  • I searched through the GitHub issues and this issue has not been opened before.
  • I have installed the latest version of Foundry DevTools and don't use an unsupported python version.

Description of the bug

I have custom logging statements in a transform and fdt build throws an error (see below) when these log statements contain %d formatters.
I think the issue is in build.py#L75 where all logging params are converted to string.

Steps to reproduce this bug.

It should be sufficient to create a transform with a log statement such as logging.info("hello %d", 3).

Log output

[08:37:14] ERROR    fdt build >>> This shouldn't happen, but while parsing the log message this error occured: %d format: a real number is required, not str              build.py:152
                                                                                                                                                                                      
           ERROR    The traceback:                                                                                                                                        build.py:156
                    Traceback (most recent call last):                                                                                                                                
                      File "/opt/homebrew/Caskroom/mambaforge/base/envs/myenv/lib/python3.11/site-packages/foundry_dev_tools/cli/build.py", line 150, in                     
                    tail_job_log                                                                                                                                                      
                        log.handle(create_log_record(log_message))                                                                                                                    
                      File "/opt/homebrew/Caskroom/mambaforge/base/envs/myenv/lib/python3.11/logging/__init__.py", line 1644, in handle                                      
                        self.callHandlers(record)                                                                                                                                     
                      File "/opt/homebrew/Caskroom/mambaforge/base/envs/myenv/lib/python3.11/logging/__init__.py", line 1706, in callHandlers                                
                        hdlr.handle(record)                                                                                                                                           
                      File "/opt/homebrew/Caskroom/mambaforge/base/envs/myenv/lib/python3.11/logging/__init__.py", line 978, in handle                                       
                        self.emit(record)                                                                                                                                             
                      File "/opt/homebrew/Caskroom/mambaforge/base/envs/myenv/lib/python3.11/site-packages/rich/logging.py", line 128, in emit                               
                        message = self.format(record)                                                                                                                                 
                                  ^^^^^^^^^^^^^^^^^^^                                                                                                                                 
                      File "/opt/homebrew/Caskroom/mambaforge/base/envs/myenv/lib/python3.11/logging/__init__.py", line 953, in format                                       
                        return fmt.format(record)                                                                                                                                     
                               ^^^^^^^^^^^^^^^^^^                                                                                                                                     
                      File "/opt/homebrew/Caskroom/mambaforge/base/envs/myenv/lib/python3.11/logging/__init__.py", line 687, in format                                       
                        record.message = record.getMessage()                                                                                                                          
                                         ^^^^^^^^^^^^^^^^^^^                                                                                                                          
                      File "/opt/homebrew/Caskroom/mambaforge/base/envs/myenv/lib/python3.11/logging/__init__.py", line 377, in getMessage                                   
                        msg = msg % self.args                                                                                                                                         
                              ~~~~^~~~~~~~~~~                                                                                                                                         
                    TypeError: %d format: a real number is required, not str                                                                                                          
           INFO     fdt build >>> Will output the log message in plain:                                                                                                   build.py:157
                                                                                                                                                                                      
           INFO     {"type": "service.1", "level": "INFO", "origin": "python:root", "thread": "Thread-3", "message": "Retrieved %d items for title %s", "time":           build.py:160
                    "2023-11-29T07:37:14.231Z", "tags": {"jobRid": "ri.foundry.main.job.asdf"}, "params": {}, "unsafeParams": {"param_0":             
                    514, "param_1": "My Title"}}                                                                                                        
           ERROR    fdt build >>> This shouldn't happen, but while parsing the log message this error occured: %d format: a real number is required, not str              build.py:152
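The failure can be reproduced outside of fdt build with the standard logging module alone. The sketch below takes the message and unsafeParams from the log output above (shortened) and builds a LogRecord twice: once with the parameters converted to strings, once with their original types. The helper names param_index and create_record are hypothetical stand-ins for illustration, not the code in build.py.

```python
import json
import logging

# Payload taken from the log output in this issue, reduced to the relevant keys.
RAW = (
    '{"level": "INFO", "message": "Retrieved %d items for title %s", '
    '"unsafeParams": {"param_0": 514, "param_1": "My Title"}}'
)


def param_index(key: str) -> int:
    """Sort key: 'param_10' must come after 'param_2'."""
    return int(key.rsplit("_", 1)[1])


def create_record(payload: dict, stringify_params: bool) -> logging.LogRecord:
    """Build a LogRecord from a parsed log line (hypothetical helper)."""
    params = payload.get("unsafeParams", {})
    values = [params[key] for key in sorted(params, key=param_index)]
    if stringify_params:
        # Converting every parameter to str breaks %d placeholders.
        values = [str(value) for value in values]
    return logging.LogRecord(
        name="demo",
        level=getattr(logging, payload["level"], logging.INFO),
        pathname="demo.py",
        lineno=0,
        msg=payload["message"],
        args=tuple(values),
        exc_info=None,
    )


payload = json.loads(RAW)

try:
    create_record(payload, stringify_params=True).getMessage()
except TypeError as exc:
    print("stringified params fail:", exc)

# Keeping the original JSON types lets %d and %s both format correctly.
print(create_record(payload, stringify_params=False).getMessage())
# prints "Retrieved 514 items for title My Title"
```

Passing the parameters through with the types they have in the JSON payload is one possible fix; whether it fits the rest of build.py is for the maintainers to judge.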

Additional context

No response

Operating System

MacOS

Your python version

Python 3.11.6

Export foundry-dev-tool errors such as DatasetHasOpenTransactionError

Issue checklist

  • This is a feature request/enhancement. And not a bug.
  • I searched through the GitHub issues and this feature/enhancement has not been requested before.
  • I have installed the latest version of Foundry DevTools and don't use an unsupported python version.
  • Others could also benefit from this feature or enhancement and it is not a very specific use case.

Feature use case

The export of Error types would make it possible to catch raised errors and handle them accordingly.

Description of the Feature

An import of the errors like

from foundry_dev_tools import DatasetHasOpenTransactionError

try:
    ...  # something
except DatasetHasOpenTransactionError as e:
    ...  # something else
except Exception:
    ...  # whatever

would be great.
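To show what catching a specific error type enables, here is a self-contained sketch. The exception classes below are stand-ins defined locally so the example runs without Foundry DevTools. FoundryError, start_transaction and start_or_skip are hypothetical names, and only the class name DatasetHasOpenTransactionError is taken from this issue.

```python
class FoundryError(Exception):
    """Stand-in base class for this sketch (hypothetical)."""


class DatasetHasOpenTransactionError(FoundryError):
    """Stand-in for the error type named in this issue."""

    def __init__(self, dataset_rid: str):
        super().__init__(f"Dataset {dataset_rid} already has an open transaction")
        self.dataset_rid = dataset_rid


def start_transaction(dataset_rid: str, already_open: bool) -> str:
    """Pretend API call that fails when a transaction is already open."""
    if already_open:
        raise DatasetHasOpenTransactionError(dataset_rid)
    return "started"


def start_or_skip(dataset_rid: str, already_open: bool) -> str:
    """Handle the one expected error and let everything else propagate."""
    try:
        return start_transaction(dataset_rid, already_open)
    except DatasetHasOpenTransactionError as exc:
        return f"skipped: {exc}"


print(start_or_skip("ri.foundry.main.dataset.demo", already_open=False))
print(start_or_skip("ri.foundry.main.dataset.demo", already_open=True))
```

Without an importable error type, the caller would have to catch a broad exception and inspect its message, which is the situation this request wants to avoid.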

Alternatives you considered

An import of the errors like

from foundry_dev_tools import Errors

try:
    ...  # something
except Errors.DatasetHasOpenTransactionError as e:
    ...  # something else
except Exception:
    ...  # whatever

would be great.

Additional Context

I

py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace: py4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist

Issue checklist

  • This is not a bug or a feature/enhancement request.
  • I searched through the GitHub issues and this issue has not been opened before.

Issue

Hi all, I tried to run Foundry code locally on Windows, but I am getting the following error:

py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace:
py4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist

  • PySpark and Py4J were installed with the package provided in the repository, and they look fine.
  • I am using the transform_pandas decorator.
  • The PySpark-related environment variables are set.
  • The same Spark and PySpark versions are used (3.4.1).
  • PySpark seems to work (running pyspark --version returns the version).

I am wondering what the reason could be.
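A "constructor does not exist" error from Py4J often points to a mismatch between the installed pyspark package and the Spark JARs that are actually loaded, so it can help to verify both versions programmatically rather than through the CLI. The following sketch is a diagnostic under two assumptions: that SPARK_HOME points to a standard Spark distribution, and that its jars folder contains a file named like spark-core_2.12-3.4.1.jar. The helper names are hypothetical, and this may not be the cause in this particular case.

```python
import os
import re
from importlib import metadata
from pathlib import Path
from typing import Optional

# Matches e.g. "spark-core_2.12-3.4.1.jar" and captures "3.4.1".
JAR_PATTERN = re.compile(r"^spark-core_[\d.]+-(?P<version>\d+(?:\.\d+)*)\.jar$")


def spark_home_version(spark_home: Path) -> Optional[str]:
    """Read the Spark version from the spark-core JAR name, if present."""
    jars = spark_home / "jars"
    if not jars.is_dir():
        return None
    for jar in sorted(jars.iterdir()):
        match = JAR_PATTERN.match(jar.name)
        if match:
            return match.group("version")
    return None


def installed_pyspark_version() -> Optional[str]:
    """Version of the pyspark package in the current environment, if any."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


home = os.environ.get("SPARK_HOME")
print("pyspark package:", installed_pyspark_version())
print("SPARK_HOME jars:", spark_home_version(Path(home)) if home else "SPARK_HOME not set")
```

If the two printed versions differ, aligning them (or unsetting SPARK_HOME so that the JARs bundled with pyspark are used) would be the first thing to try.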

[Bug]: CachedFoundryClient.save_dataset with the APPEND mode option changes the data type, which causes an error in the Foundry dataset.

Issue checklist

  • This is a bug in Foundry DevTools and not a bug in another project. It is also not an enhancement/feature request
  • I searched through the GitHub issues and this issue has not been opened before.
  • I have installed the latest version of Foundry DevTools and don't use an unsupported python version.

Description of the bug

Hi,

I used CachedFoundryClient to update a Foundry dataset with the APPEND mode option. I first used the SNAPSHOT option to check whether the data is uploaded, and it worked well. But when I changed to the APPEND option, so that only the new data is uploaded, I got a "SchemaColumnConvertNotSupportedException" error on Foundry. We figured out that the newly appended file had its data type changed from DOUBLE to FLOAT while the original file still has the DOUBLE type, so there was a crash due to this data type difference. This data type change did not happen with the SNAPSHOT option.
We also found that the APPEND option works when we use it in a local environment, but the data type error happens when we combine it with the rest of our code. Is there any way to resolve this issue? :) Your help would be much appreciated!
Thanks a lot in advance.

Best Regards,
Bora Lee
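One possible explanation, offered as an assumption because the code was not shared, is that the appended pandas DataFrame contains float32 columns, which are typically written as FLOAT, while the existing files hold float64 (DOUBLE) values. Under that assumption, upcasting before saving would keep the schema stable. The helper name upcast_float32 is hypothetical.

```python
import pandas as pd


def upcast_float32(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every float32 column is converted to float64."""
    float32_columns = df.select_dtypes(include=["float32"]).columns
    return df.astype({column: "float64" for column in float32_columns})


new_rows = pd.DataFrame(
    {
        "value": pd.Series([1.5, 2.25], dtype="float32"),
        "label": ["a", "b"],
    }
)

print(new_rows.dtypes)
print(upcast_float32(new_rows).dtypes)
```

Comparing df.dtypes of the SNAPSHOT upload and the APPEND upload should show quickly whether this assumption holds.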

Steps to reproduce this bug.

Hi,
As the code contains all the information mentioned above, I think it would be better to share the code through an internal platform!
Thanks,
Best,
Bora

Log output

Additional context

No response

Operating System

Windows

Your python version

Python 3.10.5

[Bug]: Transform runner does not work if one of the inputs is an Object Storage V2 materialization

Issue checklist

  • This is a bug in Foundry DevTools and not a bug in another project. It is also not an enhancement/feature request
  • I searched through the GitHub issues and this issue has not been opened before.
  • I have installed the latest version of Foundry DevTools and don't use an unsupported python version.

Description of the bug

I tried running transforms as described in the docs, and the package works wonderfully. However, it fails if one of the transform inputs is a materialization of an Object Storage V2 table (writeback). I guess the reason is that in V2, materializations are no longer datasets but views. As such, they probably require a different endpoint to read the data from.

Steps to reproduce this bug.

I have a transform such as:

from pyspark.sql import DataFrame
from transforms.api import transform_df, Input, Output

@transform_df(
    Output("..."),
    input_df=Input("<RID-of-V2-materialization>"),
)
def compute(input_df: DataFrame) -> DataFrame:
    return input_df

if __name__ == "__main__":
    compute.compute().show()

Running the script fails with the error below.

Log output

<removed>\venv\Scripts\python.exe C:/srdev/projects/am_projection/transforms-python/src/am_projection/transforms/quarterly_projection.py
c:\srdev\projects\foundrycommons\src\stargatecommons\df_utils\dataset_metadata.py:12: UserWarning: Cannot import foundry transform package locally, skipping. Some functionality might not work as expected.
  warnings.warn(
<removed>\venv\lib\site-packages\pyspark\sql\pandas\utils.py:24: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
  from distutils.version import LooseVersion
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
500 Server Error: Internal Server Error for url: https://stargate.swissre.com/foundry-data-proxy/api/dataproxy/datasets/ri.foundry.main.dataset.d129cc10-11a3-4cb6-9978-4ff56482f5f8/views/master/edits_empty_file_edb25a13-5503-438b-8bfc-635ce144c929
{"errorCode":"INTERNAL","errorName":"Default:Internal","errorInstanceId":"44e9cce6-146c-40bc-810a-e9fc9b938158","parameters":{}}
Traceback (most recent call last):
  File "<removed>\transforms-python\src\am_projection\transforms\quarterly_projection.py", line 13, in <module>
    assumptions=Input("ri.foundry.main.dataset.d129cc10-11a3-4cb6-9978-4ff56482f5f8"),
  File "<removed>\venv\lib\site-packages\transforms\api\_dataset.py", line 93, in __init__
    self._spark_df, self._dataset_identity, self.branch = self._online(
  File "<removed>\venv\lib\site-packages\transforms\api\_dataset.py", line 123, in _online
    self._retrieve_spark_df(dataset_identity, branch),
  File "<removed>\venv\lib\site-packages\transforms\api\_dataset.py", line 160, in _retrieve_spark_df
    return self._retrieve_from_foundry_and_cache(dataset_identity, branch)
  File "<removed>\venv\lib\site-packages\transforms\api\_dataset.py", line 205, in _retrieve_from_foundry_and_cache
    spark_df = self._cached_client.load_dataset(
  File "<removed>\venv\lib\site-packages\foundry_dev_tools\cached_foundry_client.py", line 80, in load_dataset
    _, dataset_identity = self.fetch_dataset(dataset_path_or_rid, branch)
  File "<removed>\venv\lib\site-packages\foundry_dev_tools\cached_foundry_client.py", line 116, in fetch_dataset
    self._download_dataset_and_return_local_path(
  File "<removed>\venv\lib\site-packages\foundry_dev_tools\cached_foundry_client.py", line 148, in _download_dataset_and_return_local_path
    self._download_dataset_to_cache_dir(dataset_identity, branch, foundry_schema)
  File "<removed>\venv\lib\site-packages\foundry_dev_tools\cached_foundry_client.py", line 175, in _download_dataset_to_cache_dir
    self.api.download_dataset_files(**params)
  File "<removed>\venv\lib\site-packages\foundry_dev_tools\foundry_api_client.py", line 1195, in download_dataset_files
    self.download_dataset_file(
  File "<removed>\venv\lib\site-packages\foundry_dev_tools\foundry_api_client.py", line 1136, in download_dataset_file
    resp = self._download_dataset_file(
  File "<removed>\venv\lib\site-packages\foundry_dev_tools\foundry_api_client.py", line 1155, in _download_dataset_file
    _raise_for_status_verbose(response)
  File "<removed>\venv\lib\site-packages\foundry_dev_tools\foundry_api_client.py", line 2067, in _raise_for_status_verbose
    raise error
  File "<removed>\venv\lib\site-packages\foundry_dev_tools\foundry_api_client.py", line 2056, in _raise_for_status_verbose
    response.raise_for_status()
  File "<removed>\venv\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://stargate.swissre.com/foundry-data-proxy/api/dataproxy/datasets/ri.foundry.main.dataset.d129cc10-11a3-4cb6-9978-4ff56482f5f8/views/master/edits_empty_file_edb25a13-5503-438b-8bfc-635ce144c929

Additional context

pyspark 3.2.0

Operating System

Windows

Your python version

3.10.4
