
tfx-bsl's Introduction

TFX Basic Shared Libraries


TFX Basic Shared Libraries (tfx_bsl) contains libraries shared by many TensorFlow eXtended (TFX) components.

Only symbols exported by sub-modules under tfx_bsl/public are intended for direct use by TFX users, including by standalone TFX library (e.g. TFDV, TFMA, TFT) users, TFX pipeline authors and TFX component authors. Those APIs will become stable and follow semantic versioning once tfx_bsl goes beyond 1.0.

APIs under other directories should be considered internal to TFX (and therefore there is no backward or forward compatibility guarantee for them).
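
The public/internal split above can be expressed as a simple name check. This is a minimal sketch, assuming that Python import paths mirror the directory layout (so tfx_bsl/public corresponds to modules named tfx_bsl.public.*). The helper name is hypothetical and is not part of tfx_bsl.

```python
def is_public_tfx_bsl_module(module_name: str) -> bool:
    """Return True if module_name is tfx_bsl.public or one of its sub-modules."""
    prefix = "tfx_bsl.public"
    return module_name == prefix or module_name.startswith(prefix + ".")


if __name__ == "__main__":
    # A module under tfx_bsl/public is intended for direct use.
    print(is_public_tfx_bsl_module("tfx_bsl.public.tfxio"))
    # Anything else should be treated as internal to TFX.
    print(is_public_tfx_bsl_module("tfx_bsl.arrow.array_util"))
```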

Each minor version of a TFX library or of TFX itself, if it needs to depend on tfx_bsl, will depend on a specific minor version of it (e.g. tensorflow_data_validation 0.14.* will depend on, and only work with, tfx_bsl 0.14.*).
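
The minor-version rule can be checked mechanically. The sketch below assumes plain dotted numeric versions such as 0.14.3; the function names are hypothetical and only illustrate the comparison.

```python
def minor_series(version: str) -> tuple:
    """Return the (major, minor) pair of a dotted version string."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_minor_series(library_version: str, tfx_bsl_version: str) -> bool:
    """True when both versions share the same major.minor series."""
    return minor_series(library_version) == minor_series(tfx_bsl_version)


if __name__ == "__main__":
    # tensorflow_data_validation 0.14.* pairs with tfx_bsl 0.14.*.
    print(same_minor_series("0.14.3", "0.14.0"))
    print(same_minor_series("0.14.3", "0.15.0"))
```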

Installing from PyPI

tfx_bsl is available as a PyPI package.

pip install tfx-bsl

Nightly Packages

TFX-BSL also hosts nightly packages at https://pypi-nightly.tensorflow.org on Google Cloud. To install the latest nightly package, please use the following command:

pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple tfx-bsl

This will install the nightly packages for the major dependencies of TFX-BSL such as TensorFlow Metadata (TFMD).

However, tfx_bsl is a dependency of many TFX components, so as a user you usually don't need to install it directly.

Build with Docker

If you want to build a TFX component from the master branch, past the latest release, you may also have to build the latest tfx_bsl, because that TFX component might depend on features introduced after the latest tfx_bsl release.

Building from Docker is the recommended way to build tfx_bsl under Linux, and is continuously tested at Google.

1. Install Docker

Please first install docker and docker-compose by following the directions.

2. Clone the tfx_bsl repository

git clone https://github.com/tensorflow/tfx-bsl
cd tfx-bsl

Note that these instructions will install the latest master branch of tfx-bsl. If you want to install a specific branch (such as a release branch), pass -b <branchname> to the git clone command.

3. Build the pip package

Then, run the following at the project root:

sudo docker-compose build manylinux2010
sudo docker-compose run -e PYTHON_VERSION=${PYTHON_VERSION} manylinux2010

where PYTHON_VERSION is one of {39}.

A wheel will be produced under dist/.
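
The PYTHON_VERSION value appears to be the major and minor version digits without a dot (39 for Python 3.9). Assuming that convention, the following sketch derives the value from the running interpreter. The helper name and the SUPPORTED set are illustrative; the set only restates the value listed above.

```python
import sys

# The only value the instructions above list for PYTHON_VERSION.
SUPPORTED = {"39"}


def python_version_tag(version_info=None) -> str:
    """Return e.g. '39' for Python 3.9 (assumed PYTHON_VERSION format)."""
    if version_info is None:
        version_info = sys.version_info
    return f"{version_info[0]}{version_info[1]}"


if __name__ == "__main__":
    tag = python_version_tag()
    print(tag, "supported" if tag in SUPPORTED else "not listed as supported")
```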

4. Install the pip package

pip install dist/*.whl

Build from source

1. Prerequisites

Install NumPy

If NumPy is not installed on your system, install it now by following these directions.

Install Bazel

If Bazel is not installed on your system, install it now by following these directions.

2. Clone the tfx_bsl repository

git clone https://github.com/tensorflow/tfx-bsl
cd tfx-bsl

Note that these instructions will install the latest master branch of tfx_bsl. If you want to install a specific branch (such as a release branch), pass -b <branchname> to the git clone command.

3. Build the pip package

The tfx_bsl wheel is Python-version dependent. To build a pip package that works for a specific Python version, use that Python binary to run:

python setup.py bdist_wheel

You can find the generated .whl file in the dist subdirectory.

4. Install the pip package

pip install dist/*.whl
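
Because the wheel is tied to one Python version, dist/*.whl can pick up wheels built for other interpreters if several are present. A minimal sketch, assuming standard wheel file names in which the Python tag looks like cp39; the helper name is hypothetical.

```python
import glob
import os
import sys


def wheels_for_interpreter(dist_dir="dist", version_info=None):
    """List wheels in dist_dir whose file name carries this interpreter's cpXY tag."""
    if version_info is None:
        version_info = sys.version_info
    tag = f"cp{version_info[0]}{version_info[1]}"
    # Wheel names look like name-version-pythontag-abitag-platformtag.whl.
    pattern = os.path.join(dist_dir, f"*-{tag}-*.whl")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    for path in wheels_for_interpreter():
        print(path)
```

The printed paths can then be passed to pip install one at a time instead of the dist/*.whl glob.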

Supported platforms

tfx_bsl is tested on the following 64-bit operating systems:

  • macOS 10.12.6 (Sierra) or later.
  • Ubuntu 20.04 or later.

Compatible versions

The following table lists the tfx_bsl package versions and the dependency versions they are compatible with. Compatibility is determined by our testing framework, but other untested combinations may also work.

tfx-bsl        apache-beam[gcp]  pyarrow  tensorflow     tensorflow-metadata  tensorflow-serving-api
GitHub master  2.47.0            10.0.0   nightly (2.x)  1.15.0               2.15.1
1.15.1         2.47.0            10.0.0   2.15           1.15.0               2.15.1
1.15.0         2.47.0            10.0.0   2.15           1.15.0               2.15.1
1.14.0         2.47.0            10.0.0   2.13           1.14.0               2.13.0
1.13.0         2.40.0            6.0.0    2.12           1.13.1               2.9.0
1.12.0         2.40.0            6.0.0    2.11           1.12.0               2.9.0
1.11.0         2.40.0            6.0.0    1.15 / 2.10    1.11.0               2.9.0
1.10.0         2.40.0            6.0.0    1.15 / 2.9     1.10.0               2.9.0
1.9.0          2.38.0            5.0.0    1.15 / 2.9     1.9.0                2.9.0
1.8.0          2.38.0            5.0.0    1.15 / 2.8     1.8.0                2.8.0
1.7.0          2.36.0            5.0.0    1.15 / 2.8     1.7.0                2.8.0
1.6.0          2.35.0            5.0.0    1.15 / 2.7     1.6.0                2.7.0
1.5.0          2.34.0            5.0.0    1.15 / 2.7     1.5.0                2.7.0
1.4.0          2.31.0            5.0.0    1.15 / 2.6     1.4.0                2.6.0
1.3.0          2.31.0            2.0.0    1.15 / 2.6     1.2.0                2.6.0
1.2.0          2.31.0            2.0.0    1.15 / 2.5     1.2.0                2.5.1
1.1.0          2.29.0            2.0.0    1.15 / 2.5     1.1.0                2.5.1
1.0.0          2.29.0            2.0.0    1.15 / 2.5     1.0.0                2.5.1
0.30.0         2.28.0            2.0.0    1.15 / 2.4     0.30.0               2.4.0
0.29.0         2.28.0            2.0.0    1.15 / 2.4     0.29.0               2.4.0
0.28.0         2.28.0            2.0.0    1.15 / 2.4     0.28.0               2.4.0
0.27.1         2.27.0            2.0.0    1.15 / 2.4     0.27.0               2.4.0
0.27.0         2.27.0            2.0.0    1.15 / 2.4     0.27.0               2.4.0
0.26.1         2.25.0            0.17.0   1.15 / 2.3     0.27.0               2.3.0
0.26.0         2.25.0            0.17.0   1.15 / 2.3     0.27.0               2.3.0


tfx-bsl's Issues

tensorflow-data-validation fails on MacOS Monterey within tfx-bsl

tensorflow 2.6.0 py38h52b2510_1 conda-forge
tensorflow-base 2.6.0 py38h1615122_1 conda-forge
tensorflow-data-validation 1.4.0 pypi_0 pypi
tensorflow-datasets 4.4.0 pypi_0 pypi
tensorflow-estimator 2.6.0 py38h02c4698_1 conda-forge
tensorflow-metadata 1.4.0 pypi_0 pypi
tensorflow-serving-api 2.6.0 pypi_0 pypi
tfx-bsl 1.4.0 pypi_0 pypi

import tensorflow as tf
import tensorflow_data_validation as tfdv
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/x8/54gm3z5n28b76cvlkrl13j_80000gn/T/ipykernel_4855/3507982694.py in <module>
      1 import tensorflow as tf
----> 2 import tensorflow_data_validation as tfdv
      3 
      4 print('TF version: {}'.format(tf.__version__))
      5 print('TFDV version: {}'.format(tfdv.version.__version__))

~/opt/anaconda3/envs/tensorflow-2/lib/python3.8/site-packages/tensorflow_data_validation/__init__.py in <module>
     16 
     17 # Import stats API.
---> 18 from tensorflow_data_validation.api.stats_api import GenerateStatistics
     19 from tensorflow_data_validation.api.stats_api import WriteStatisticsToBinaryFile
     20 from tensorflow_data_validation.api.stats_api import WriteStatisticsToTFRecord

~/opt/anaconda3/envs/tensorflow-2/lib/python3.8/site-packages/tensorflow_data_validation/api/stats_api.py in <module>
     49 import apache_beam as beam
     50 import pyarrow as pa
---> 51 from tensorflow_data_validation.statistics import stats_impl
     52 from tensorflow_data_validation.statistics import stats_options
     53 

~/opt/anaconda3/envs/tensorflow-2/lib/python3.8/site-packages/tensorflow_data_validation/statistics/stats_impl.py in <module>
     25 from tensorflow_data_validation import types
     26 from tensorflow_data_validation.arrow import arrow_util
---> 27 from tensorflow_data_validation.statistics import stats_options
     28 from tensorflow_data_validation.statistics.generators import basic_stats_generator
     29 from tensorflow_data_validation.statistics.generators import image_stats_generator

~/opt/anaconda3/envs/tensorflow-2/lib/python3.8/site-packages/tensorflow_data_validation/statistics/stats_options.py in <module>
     30 from tensorflow_data_validation.utils import slicing_util
     31 from tfx_bsl.arrow import sql_util
---> 32 from tfx_bsl.coders import example_coder
     33 
     34 from google.protobuf import json_format

~/opt/anaconda3/envs/tensorflow-2/lib/python3.8/site-packages/tfx_bsl/coders/example_coder.py in <module>
     16 
     17 import pyarrow as pa
---> 18 from tfx_bsl.tfxio import tensor_representation_util
     19 
     20 from tensorflow_metadata.proto.v0 import schema_pb2

~/opt/anaconda3/envs/tensorflow-2/lib/python3.8/site-packages/tfx_bsl/tfxio/tensor_representation_util.py in <module>
     44 _LEGACY_DEFAULT_VALUE_FOR_FEATURE_TYPE = {
     45     schema_pb2.BYTES:
---> 46         schema_pb2.TensorRepresentation.DefaultValue(bytes_value=b""),
     47     schema_pb2.INT:
     48         schema_pb2.TensorRepresentation.DefaultValue(int_value=-1),

AttributeError: module 'tensorflow_metadata.proto.v0.schema_pb2' has no attribute 'TensorRepresentation'
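
The package list in this report comes from the environment manager, which may differ from what the running interpreter actually resolves. Whether a version mismatch is the cause here is an assumption and is not confirmed by the report. The sketch below prints the versions visible to the current interpreter using the standard library; the function name is hypothetical.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    packages = ["tfx-bsl", "tensorflow-metadata", "tensorflow-data-validation"]
    for name, version in installed_versions(packages).items():
        print(name, version)
```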

Avoid pickle/dill hell due to collections.namedtuple

Several libraries monkeypatch collections.namedtuple because it is difficult to pickle (namely pyspark: https://github.com/apache/spark/blob/ee8d66105885929ac0c0c087843d70bf32de31a1/python/pyspark/serializers.py#L385 and beam: https://github.com/apache/beam/blob/v2.21.0/sdks/python/apache_beam/internal/pickler.py#L150).

This makes it difficult to use TFX when pyspark is in your environment, because both libraries try to hijack pickling at the same time.

I know this is really an issue with pyspark and beam both monkey-patching namedtuple, but I'm wondering whether it's possible to move away from namedtuples entirely in the TFX BSL codebase. I found the issue while trying to launch a Dataflow job from my JupyterLab environment, which has PySpark installed.

The issue came when dill serializes the namedtuple for the default values in:

ColumnInfo = collections.namedtuple(

This uses pyspark as pyspark has hijacked the serializer.

Is there any way to avoid using namedtuples in tfx_bsl altogether? That would spare future users in a similar environment the chaos I had to go through to find the issue.
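
One way to picture the request: a record type defined as an ordinary class is not created through collections.namedtuple, so patches to that factory should not touch it. The sketch below uses a frozen dataclass with hypothetical fields. The real ColumnInfo definition is truncated in this report, so the fields here are placeholders, and this is not the tfx_bsl implementation.

```python
import dataclasses
import pickle


@dataclasses.dataclass(frozen=True)
class ColumnInfo:
    # Hypothetical fields for illustration only; the actual fields are not
    # shown in the report.
    name: str
    feature_type: int


if __name__ == "__main__":
    info = ColumnInfo(name="age", feature_type=1)
    # A module-level class pickles by reference with the standard pickler.
    restored = pickle.loads(pickle.dumps(info))
    print(restored == info)
```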

Related: https://issues.apache.org/jira/browse/SPARK-22674
Filed new issue in PySpark: https://issues.apache.org/jira/browse/SPARK-32079

Upgrade pandas upper bound?

Hi,

Are there any plans to increase the pandas>=1,<2 upper bound to support pandas 2? This is blocking us upgrading our version of pandas in our docker image.

Thanks!

pyarrow 3.0.0 support

Is there a plan to support pyarrow 3.0.0? Pyarrow 2.0.0 does not have a streaming read API, and we depend on the streaming reads in pyarrow 3.0.0. We cannot get the dependency to work because tfx-bsl does not support pyarrow 3.0.0.

Any plans to support Apple silicon devices like M1 Macs?

Is it possible to install tfx-bsl on M1 Macs either through pre-built distributions or from source? If not, are there any plans to support?

using M1 Mac, OS X 12.1, Python 3.8.12

pip install tfx-bsl
ERROR: Could not find a version that satisfies the requirement tfx-bsl (from versions: none)
ERROR: No matching distribution found for tfx-bsl

I tried to build from source like so but wasn't successful.

Support for multi-signature models in RunInference API

Is it possible to use the RunInference API with models that have more than one signature? I used the RunInference API via the TFX BulkInferrer component with a model that has two signatures and got the error ValueError('Signature should have 1 and only 1 inputs'). I stopped getting the error after removing one of the signatures from my model.

Update pypi installer

Since Apache Beam 2.17 became available recently, I get this error when installing TF and TFMA:

from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = [
    'tensorflow==1.15.0',
#    'tensorflow-model-analysis==0.15.4'
]

setup(
    name='trainer',
    description='AI Platform Training job for TensorFlow',
    author='Google Cloud Platform',
    install_requires=REQUIRED_PACKAGES,
    version='0.1',
    packages=find_packages(),
    include_package_data=True
)

When I run: python setup.py install

tfx-bsl complains about the Apache Beam version, and I get this error:

ERROR: tensorflow 2.1.0 has requirement scipy==1.4.1; python_version >= "3", but you'll have scipy 1.1.0 which is incompatible.
ERROR: tensorflow-serving-api 2.0.0 has requirement tensorflow~=2.0.0, but you'll have tensorflow 2.1.0 which is incompatible.
ERROR: tfx-bsl 0.15.3 has requirement absl-py<0.9,>=0.7, but you'll have absl-py 0.9.0 which is incompatible.
ERROR: tfx-bsl 0.15.3 has requirement apache-beam[gcp]<2.17,>=2.16, but you'll have apache-beam 2.17.0 which is incompatible.
ERROR: tfx-bsl 0.15.3 has requirement pyarrow<0.15.0,>=0.14.0, but you'll have pyarrow 0.15.1 which is incompatible.

Looks like this is already solved, just need to package it.
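
The conflicts above are half-open range checks: tfx-bsl 0.15.3 requires apache-beam >=2.16 and <2.17, and 2.17.0 falls outside that range. The following is a simplified sketch that handles plain dotted numeric versions only, not the full Python version specification; the function names are hypothetical.

```python
def parse(version: str) -> tuple:
    """Parse '2.17.0' into (2, 17); trailing zeros are dropped so 2.17 == 2.17.0."""
    parts = [int(piece) for piece in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def in_range(version: str, lower: str, upper: str) -> bool:
    """True when lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


if __name__ == "__main__":
    # apache-beam 2.17.0 against the requirement <2.17,>=2.16
    print(in_range("2.17.0", "2.16", "2.17"))
    # pyarrow 0.15.1 against the requirement <0.15.0,>=0.14.0
    print(in_range("0.15.1", "0.14.0", "0.15.0"))
```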

tfx-bsl 0.22.0 seems missing when running on Google Cloud Dataflow

I am not sure if this is the right place to post this issue, but I have noticed that my dataflow jobs are failing after upgrading my project to tfx-bsl==0.22.0. Generally, the error that is given is something like AttributeError: module 'tfx_bsl.arrow.array_util' has no attribute 'ValueCounts', which seems to indicate that some of the C++ extensions did not properly install on the workers.

I have tfx-bsl==0.22.0 in my setup.py file, I have the setup.py file added as a --setup_file for my dataflow options. Running locally with the DirectRunner is fine. However, nothing I have tried can get this error to go away when using the DataflowRunner, other than downloading the wheel for 0.22.0 and adding --extra_package deps/tfx_bsl-0.22.0-cp37-cp37m-manylinux2010_x86_64.whl to my dataflow options. When I do that, it works perfectly.
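
The workaround described above amounts to adding one --extra_package flag per local wheel to the pipeline arguments. A minimal sketch of assembling those arguments as a list of strings; the helper name is hypothetical, and other options a Dataflow job needs (project, region, temp location) are left out.

```python
def dataflow_args(setup_file, extra_packages=()):
    """Build pipeline arguments for the DataflowRunner with optional local wheels."""
    args = ["--runner=DataflowRunner", f"--setup_file={setup_file}"]
    for wheel in extra_packages:
        # Each local wheel gets its own --extra_package flag.
        args.append(f"--extra_package={wheel}")
    return args


if __name__ == "__main__":
    wheel = "deps/tfx_bsl-0.22.0-cp37-cp37m-manylinux2010_x86_64.whl"
    for arg in dataflow_args("./setup.py", [wheel]):
        print(arg)
```

The resulting list is the kind of argument list that is typically handed to Beam's pipeline options when the pipeline is constructed.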

CRITICAL? version 0.21.0 on pypi doesn't install record_based_tfxio.py, causing TFX imports to fail

record_based_tfxio.py seems to have been added to tfx-bsl on February 10. The version of tfx_bsl on pypi, 0.21.0, does not install this file. This causes other imports to fail, e.g. CsvExampleGen.

STEPS TO REPRODUCE LOCALLY (assumes virtualenv and virtualenvwrapper are installed):

mkvirtualenv jb_testing_tfx_bsl --python=python3.7
pip install tfx_bsl
ls /Users/<my user name>/Virtualenvs/jb_testing_tfx_bsl/lib/python3.7/site-packages/tfx_bsl/tfxio
# observe that record_based_tfxio.py is not present.

STEPS TO REPRODUCE (colab):
Create a new Colab notebook

!pip install tfx==0.21.0 tensorflow==2.1
# then restart runtime
from tfx.components import CsvExampleGen

observe the following error:

<ipython-input-1-b8326c471c1e> in <module>()
----> 1 from tfx.components import CsvExampleGen

4 frames
/usr/local/lib/python3.6/dist-packages/tfx/components/__init__.py in <module>()
     26 from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen
     27 from tfx.components.example_gen.import_example_gen.component import ImportExampleGen
---> 28 from tfx.components.example_validator.component import ExampleValidator
     29 from tfx.components.model_validator.component import ModelValidator
     30 from tfx.components.pusher.component import Pusher

/usr/local/lib/python3.6/dist-packages/tfx/components/example_validator/component.py in <module>()
     25 from tfx.components.base import base_component
     26 from tfx.components.base import executor_spec
---> 27 from tfx.components.example_validator import executor
     28 from tfx.types import standard_artifacts
     29 from tfx.types.standard_component_specs import ExampleValidatorSpec

/usr/local/lib/python3.6/dist-packages/tfx/components/example_validator/executor.py in <module>()
     23 
     24 import absl
---> 25 import tensorflow_data_validation as tfdv
     26 
     27 from tfx import types

/usr/local/lib/python3.6/dist-packages/tensorflow_data_validation/__init__.py in <module>()
     31 
     32 # Import coders.
---> 33 from tensorflow_data_validation.coders.csv_decoder import DecodeCSV
     34 from tensorflow_data_validation.coders.tf_example_decoder import DecodeTFExample
     35 

/usr/local/lib/python3.6/dist-packages/tensorflow_data_validation/coders/csv_decoder.py in <module>()
     24 from tensorflow_data_validation import types
     25 from tfx_bsl.coders import csv_decoder as csv_decoder
---> 26 from tfx_bsl.tfxio import record_based_tfxio
     27 from typing import List, Iterable, Optional, Text
     28 

ImportError: cannot import name 'record_based_tfxio'
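
The missing file described in this report can also be detected without listing site-packages by hand. A sketch using the standard library's importlib.util.find_spec; the helper name is hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the dotted module name can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package is itself missing or is not a package.
        return False


if __name__ == "__main__":
    print(module_available("tfx_bsl.tfxio.record_based_tfxio"))
```

Note that find_spec imports the parent packages of a dotted name, so running this has the same side effects as importing tfx_bsl.tfxio.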

Wheel support for linux aarch64

Summary
Installing tfx-bsl on aarch64 with "pip3 install tfx-bsl" fails with "Could not find a version that satisfies the requirement tfx-bsl".

Problem description
tfx-bsl doesn't have a wheel for aarch64 on PyPI, and no source distribution is uploaded to PyPI either, so installing tfx-bsl fails on aarch64.

Expected Output
Pip should be able to download tfx-bsl wheel from PyPI repository.

@team, please let me know if I can help with building the wheel or uploading it to PyPI.
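
Pip rejects a wheel whose platform tag does not match the machine, which is why an x86_64-only release cannot be installed on aarch64. The sketch below is a rough approximation of that check and not pip's actual tag-matching logic: it only compares the end of the platform tag with the machine name, so it does not handle pure-Python "any" wheels or Windows naming. The function names are hypothetical.

```python
import platform


def wheel_platform_tag(filename: str) -> str:
    """Return the last dash-separated field of a wheel file name (the platform tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    stem = filename[: -len(".whl")]
    return stem.split("-")[-1]


def wheel_matches_machine(filename: str, machine=None) -> bool:
    """Approximate check: does the platform tag end with the machine architecture?"""
    if machine is None:
        machine = platform.machine()
    return wheel_platform_tag(filename).endswith(machine)


if __name__ == "__main__":
    name = "tfx_bsl-0.22.0-cp37-cp37m-manylinux2010_x86_64.whl"
    print(wheel_platform_tag(name), wheel_matches_machine(name, "aarch64"))
```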

Update pyarrow version range to address vulnerability CVE-2023-47248

Hi,

The current pyarrow dependency is pinned to pyarrow>=10,<11. However, pyarrow has a known vulnerability, CVE-2023-47248.
I'd like to propose bumping the range to pyarrow>=14.0.1,<15, which should include the fix for that vulnerability. This range should not introduce compatibility issues while ensuring we use a secure version of the library.

Python 3.12 support

Any plan to support 3.12? It seems v1.14.0 also does not support 3.11 yet.

Raise pyarrow upper bound?

Would it be possible to remove the pyarrow upper bound in the next release?

The current release of TFX-BSL pins pyarrow to an 18-month old version (6.0.0; the latest is 11.0.0).

Unable to build from source on Mac M1 with bazel 5.0.0

I'm following the build from source directions in README but fail at this step. I'm using Mac M1 OSX 12.1, Python 3.8.12, and bazel 5.0.0. I get the error Unrecognized option: --incompatible_restrict_string_escapes=false. This switch was introduced in bazel 4.0.0 but removed in 5.0.0.

❯ python setup.py bdist_wheel
/Users/dxia/.pyenv/versions/ml-golden-path/lib/python3.8/site-packages/setuptools/dist.py:493: UserWarning: Normalizing '1.7.0.dev' to '1.7.0.dev0'
  warnings.warn(tmpl.format(**locals()))
running bdist_wheel
running build
running bazel_build
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Reading rc options for 'run' from /Users/dxia/src/github.com/tensorflow/tfx-bsl/.bazelrc:
  Inherited 'build' options: --copt=-DTFX_BSL_USE_ARROW_C_ABI --cxxopt=-std=c++17 --incompatible_restrict_string_escapes=false --incompatible_require_linker_input_cc_api=false
ERROR: --incompatible_restrict_string_escapes=false :: Unrecognized option: --incompatible_restrict_string_escapes=false
Traceback (most recent call last):
  File "setup.py", line 133, in <module>
    setup(
  File "/Users/dxia/.pyenv/versions/ml-golden-path/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/Users/dxia/.pyenv/versions/3.8.12/lib/python3.8/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/Users/dxia/.pyenv/versions/3.8.12/lib/python3.8/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/Users/dxia/.pyenv/versions/3.8.12/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/Users/dxia/.pyenv/versions/ml-golden-path/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 299, in run
    self.run_command('build')
  File "/Users/dxia/.pyenv/versions/3.8.12/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/Users/dxia/.pyenv/versions/3.8.12/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/Users/dxia/.pyenv/versions/3.8.12/lib/python3.8/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/Users/dxia/.pyenv/versions/3.8.12/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/Users/dxia/.pyenv/versions/3.8.12/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "setup.py", line 91, in run
    subprocess.check_call(
  File "/Users/dxia/.pyenv/versions/3.8.12/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/homebrew/bin/bazel', 'run', '-c', 'opt', '--macos_minimum_os=10.14', '//tfx_bsl:move_generated_files']' returned non-zero exit status 2.

tfx-bsl on  master via 🐍 v3.8.12 (ml-golden-path) on ☁️  [email protected] took 4s
❯ /opt/homebrew/bin/bazel run -c opt --macos_minimum_os=10.14 //tfx_bsl:move_generated_files
INFO: Reading rc options for 'run' from /Users/dxia/src/github.com/tensorflow/tfx-bsl/.bazelrc:
  Inherited 'build' options: --copt=-DTFX_BSL_USE_ARROW_C_ABI --cxxopt=-std=c++17 --incompatible_restrict_string_escapes=false --incompatible_require_linker_input_cc_api=false
ERROR: --incompatible_restrict_string_escapes=false :: Unrecognized option: --incompatible_restrict_string_escapes=false

tfx-bsl on  master via 🐍 v3.8.12 (ml-golden-path) on ☁️  [email protected]
❯ bazel --version
bazel 5.0.0-homebrew
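
According to this report, --incompatible_restrict_string_escapes was introduced in bazel 4.0.0 and removed in 5.0.0. Taking that statement as given, the sketch below parses the output of bazel --version and reports whether the installed major version would accept the flag. The function names are hypothetical.

```python
import re


def bazel_major_version(version_output: str) -> int:
    """Extract the major version from output such as 'bazel 5.0.0-homebrew'."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version found in: {version_output!r}")
    return int(match.group(1))


def accepts_restrict_string_escapes(version_output: str) -> bool:
    """Per the report: the flag exists in bazel 4.x and was removed in 5.0.0."""
    return bazel_major_version(version_output) == 4


if __name__ == "__main__":
    print(accepts_restrict_string_escapes("bazel 5.0.0-homebrew"))
```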
