lyhue1991 / eat_tensorflow2_in_30_days

Tensorflow2.0 🍎🍊 is delicious, just eat it! 😋😋

License: Apache License 2.0

Python 100.00%
tensorflow tensorflow-examples tensorflow-tutorial tensorflow2

eat_tensorflow2_in_30_days's People

Contributors

lyhue1991, nbwuzhe, neilteng

eat_tensorflow2_in_30_days's Issues

1-2: load_image function assigns wrong labels

![image](https://user-images.githubusercontent.com/55381998/79407314-c5b0ac00-7fcb-11ea-8546-54e90495fbf1.png)
All labels are 0, so the accuracy in all subsequent training is 1.
Train for 100 steps, validate for 20 steps
Epoch 1/10
100/100 [==============================] - 16s 162ms/step - loss: 0.0116 - accuracy: 0.9904 - val_loss: 1.2626e-09 - val_accuracy: 1.0000
Epoch 2/10
100/100 [==============================] - 11s 106ms/step - loss: 5.7853e-09 - accuracy: 1.0000 - val_loss: 1.2602e-09 - val_accuracy: 1.0000
Epoch 3/10
100/100 [==============================] - 11s 105ms/step - loss: 5.7422e-09 - accuracy: 1.0000 - val_loss: 1.2595e-09 - val_accuracy: 1.0000
...

Suggest a virtual environment.

I suggest using a virtual environment for this tutorial.

For one thing, it decouples our development environment from changes in new TF releases. That saves the authors the effort of answering TF-version-related problems, which can be delegated back to the TF developers.
It also saves readers the effort of figuring out missing packages. For example, when I ran 5-1, it told me the package pillow was missing, even though it is not explicitly imported.

Best
Neil
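The missing-package problem described above could be caught up front with a small check. This is a minimal sketch, not something from the tutorial: the function name missing_packages and the list of import names are assumptions for illustration (pillow is imported as PIL).

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# Hypothetical list of packages a chapter might need; adjust per chapter.
required = ["tensorflow", "PIL"]
print("missing:", missing_packages(required))
```

Running this inside a fresh virtual environment shows what still has to be installed before a chapter is started.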

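File path representation issue

Windows and Unix-like systems represent paths differently, so the sample code has to account for this to run on both. Take "1-2, image data modeling workflow example": tf.strings.regex_full_match(img_path, ".*/automobile/.*") needs to become tf.strings.regex_full_match(img_path, ".*automobile.*"), and logdir = "./data/keras_model/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") should be built with os.path.join() rather than hard-coded.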
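A minimal sketch of both suggestions. Python's re module is used here as a stand-in for tf.strings.regex_full_match (which uses RE2 syntax), so treat the pattern as an assumption to verify in TensorFlow. The helper name is_automobile and the constant AUTOMOBILE_PATTERN are hypothetical.

```python
import datetime
import os
import re

# Accept either "/" or "\" around the class directory name.
AUTOMOBILE_PATTERN = r".*[/\\]automobile[/\\].*"


def is_automobile(img_path):
    """True if the path contains an 'automobile' directory component."""
    return re.fullmatch(AUTOMOBILE_PATTERN, img_path) is not None


# Build the log directory from components instead of a hard-coded separator.
stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = os.path.join("data", "keras_model", stamp)
```

Matching the separators explicitly is slightly stricter than ".*automobile.*", which would also match a file that merely has "automobile" in its name.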

Dataset

The time series dataset seems to be missing.

TF Serving prediction returns an error

TF Serving prediction returns an error; could you help take a look?

{ "error": "Malformed request: POST /v1/models/linear_model" }{ "error": "In[0] is not a matrix. Instead it has shape [3]\n\t [[{{node model/outputs/BiasAdd}}]]" }%
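The second message says the input has shape [3] and is not a matrix, which is consistent with a flat list being sent where a batch of rows is expected. The sketch below builds a request under these assumptions: TF Serving's REST API on its conventional port 8501, the ":predict" suffix on the model URL, and two features per row. The helper name build_predict_request and the feature values are hypothetical.

```python
import json


def build_predict_request(rows, model_name="linear_model",
                          host="localhost", port=8501):
    """Return (url, json_body) for a TF Serving REST predict call."""
    # Each row must itself be a list, so the payload is a 2-D batch.
    if not rows or not all(isinstance(r, (list, tuple)) for r in rows):
        raise ValueError("rows must be a non-empty list of feature lists")
    url = "http://%s:%d/v1/models/%s:predict" % (host, port, model_name)
    body = json.dumps({"instances": [[float(v) for v in r] for r in rows]})
    return url, body


url, body = build_predict_request([[1.0, 2.0], [5.0, 7.0]])
```

The returned url and body can then be sent with curl or any HTTP client.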

3-2 raises an error on GPU; after some searching, I found that others can run it on CPU

The error is as follows:
(0) Internal: No unary variant device copy function found for direction: 1 and Variant type_index: class tensorflow::data::`anonymous namespace'::DatasetVariantWrapper [[{{node while_input_4/_12}}]] (1) Internal: No unary variant device copy function found for direction: 1 and Variant type_index: class tensorflow::data::`anonymous namespace'::DatasetVariantWrapper [[{{node while_input_4/_12}}]] [[Func/while/body/_1/input/_60/_20]]

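3-3, high-level API demonstration: a small error in the dataset split

If you start running the code from part 2, the DNN binary classification model,
and reach this point: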

ds_train = tf.data.Dataset.from_tensor_slices((X[0:n*3//4,:],Y[0:n*3//4,:])) \
     .shuffle(buffer_size = 1000).batch(20) \
     .prefetch(tf.data.experimental.AUTOTUNE) \
     .cache()

ds_valid = tf.data.Dataset.from_tensor_slices((X[n*3//4:,:],Y[n*3//4:,:])) \
     .batch(20) \
     .prefetch(tf.data.experimental.AUTOTUNE) \
     .cache()

you get NameError: name 'n' is not defined. I believe the intent is for the training set to be 75% of the data and the test set 25%.
So I suggest changing it to:

n = n_positive+n_negative
ds_train = tf.data.Dataset.from_tensor_slices((X[0:n*3//4,:],Y[0:n*3//4,:])) \
     .shuffle(buffer_size = 1000).batch(20) \
     .prefetch(tf.data.experimental.AUTOTUNE) \
     .cache()

ds_valid = tf.data.Dataset.from_tensor_slices((X[n*3//4:,:],Y[n*3//4:,:])) \
     .batch(20) \
     .prefetch(tf.data.experimental.AUTOTUNE) \
     .cache()


1-3: why is Valid Loss rising?

In the original document:
Epoch=1,Loss:0.442317516,Accuracy:0.7695,Valid Loss:0.323672801,Valid Accuracy:0.8614
Epoch=2,Loss:0.245737702,Accuracy:0.90215,Valid Loss:0.356488883,Valid Accuracy:0.8554
Epoch=3,Loss:0.17360799,Accuracy:0.93455,Valid Loss:0.361132562,Valid Accuracy:0.8674
Epoch=4,Loss:0.113476314,Accuracy:0.95975,Valid Loss:0.483677238,Valid Accuracy:0.856
Epoch=5,Loss:0.0698405355,Accuracy:0.9768,Valid Loss:0.607856631,Valid Accuracy:0.857
Epoch=6,Loss:0.0366807655,Accuracy:0.98825,Valid Loss:0.745884955,Valid Accuracy:0.854

After I reproduced it:
Epoch=1,Loss:0.679053724,Accuracy:0.55235,Valid Loss:0.572207093,Valid Accuracy:0.717
Epoch=2,Loss:0.467248648,Accuracy:0.7762,Valid Loss:0.491477,Valid Accuracy:0.7588
Epoch=3,Loss:0.349681437,Accuracy:0.8475,Valid Loss:0.514342368,Valid Accuracy:0.7628
Epoch=4,Loss:0.278649092,Accuracy:0.8863,Valid Loss:0.564446032,Valid Accuracy:0.763
Epoch=5,Loss:0.2197005,Accuracy:0.9159,Valid Loss:0.643948495,Valid Accuracy:0.7548
Epoch=6,Loss:0.163983703,Accuracy:0.94135,Valid Loss:0.770707726,Valid Accuracy:0.7524

As you can see, Valid Loss is gradually rising.
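A training loss that keeps falling while the validation loss rises is the usual signature of overfitting, though the thread does not confirm the cause. The sketch below uses only the Valid Loss values quoted above and finds the epoch with the lowest validation loss, the point an early-stopping rule might choose. The function name best_epoch is hypothetical.

```python
# Valid Loss values copied from the two logs above.
source_valid_loss = [0.323672801, 0.356488883, 0.361132562,
                     0.483677238, 0.607856631, 0.745884955]
repro_valid_loss = [0.572207093, 0.491477, 0.514342368,
                    0.564446032, 0.643948495, 0.770707726]


def best_epoch(valid_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(valid_losses)), key=valid_losses.__getitem__)
    return best_index + 1


print("original doc:", best_epoch(source_valid_loss))
print("reproduction:", best_epoch(repro_valid_loss))
```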

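5-4 cannot run

After creating the Linear class, the first instantiation, linear = Linear(units=8), raises an error. I compared my code with the original repeatedly and found no difference.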

class Linear(layers.Layer):
    def __init__(self, units=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units
        
    def build(self, input_shape):
        self.w = self.add_weight('w', shape=(input_shape[-1], self.units),
                                initializer='random_normal',
                                trainable=True)
        self.b = self.add_weight('b', shape=(self.units,),
                                initializer='random_normal',
                                trainable=True)
        super(Linear, self).build(input_shape)
        
    @tf.function
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
    
    def get_config(self):
        config = super(Linear, self).get_config()
        config.update({'units':self.units})
        return config
linear = Linear(units=8)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-efa0d9cd5402> in <module>
----> 1 linear = Linear(units=8)
      2 print(linear.built)
      3 linear.build(input_shape=(None, 16))
      4 print(linear.built)

TypeError: __call__() missing 1 required positional argument: 'inputs'

Thanks!

Eat day 1: trying to build a custom model by subclassing the Model base class

class Model(models.Model):
    def __init__(self):
        super(Model, self).__init__()

    def build(self, input_shape):

        self.dense = tf.keras.layers.Dense(15, 20)
        self.dense = tf.keras.layers.Dense(20, 10)
        self.dense = tf.keras.layers.Dense(10, 1)
        super(Model, self).build(input_shape)

    def call(self, x):
        x = self.dense(x)
        x = tf.nn.relu(x)
        x = self.dense(x)
        x = tf.nn.relu(x)
        x = self.dense(x)
        x = tf.nn.sigmoid(x)

        return (x)

model = Model()
print(model)
model.build(input_shape=(15,))
model.summary()

Error: TypeError: Could not interpret activation function identifier: 20
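The error is consistent with Dense taking the number of units as its first argument and the activation as its second, so Dense(15, 20) passes 20 as an activation. The snippet also assigns all three layers to the same attribute, so only the last one survives. Below is a minimal sketch of one possible rewrite, assuming 15 input features and layer sizes 20, 10, and 1 (guessed from the numbers in the snippet). The class name MLP is hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


class MLP(models.Model):
    def __init__(self):
        super().__init__()
        # One attribute per layer; units first, activation by keyword.
        self.dense1 = layers.Dense(20, activation="relu")
        self.dense2 = layers.Dense(10, activation="relu")
        self.dense3 = layers.Dense(1, activation="sigmoid")

    def call(self, x):
        x = self.dense1(x)
        x = self.dense2(x)
        return self.dense3(x)


model = MLP()
# Calling the model once on a batch of 15-feature rows creates the weights.
outputs = model(tf.zeros((4, 15)))
```

The input size is never passed to Dense; Keras infers it from the first batch.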

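3-1 low-level API demonstration: building the data pipeline iterator

In the data_iter(features, labels, batch_size=8) function of the 3-1 low-level API demonstration (building the data pipeline iterator),
yield tf.gather(X,indexs), tf.gather(Y,indexs)
should this instead be written as
tf.gather(features,indexs), tf.gather(labels,indexs)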
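A sketch of the iterator with the suggested change applied, so that it reads only its own arguments and not the globals X and Y. The body is an assumption reconstructed from the function signature quoted above (shuffle the indices, then yield batches), not a copy of the tutorial's code.

```python
import random

import tensorflow as tf


def data_iter(features, labels, batch_size=8):
    """Yield shuffled (features, labels) mini-batches."""
    num_examples = int(features.shape[0])
    indices = list(range(num_examples))
    random.shuffle(indices)  # read the samples in random order
    for i in range(0, num_examples, batch_size):
        indexs = indices[i:i + batch_size]
        # Gather from the function arguments, not from global X and Y.
        yield tf.gather(features, indexs), tf.gather(labels, indexs)


# Small made-up tensors to exercise the iterator.
features = tf.reshape(tf.range(20, dtype=tf.float32), (10, 2))
labels = tf.reshape(tf.range(10, dtype=tf.float32), (10, 1))
batches = list(data_iter(features, labels, batch_size=4))
```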

Some Suggestions for '1-3' Maybe

Excuse me. QAQ But I hope to get suggestions!


Where the issue happens

Chapter 1-3, text data modeling workflow example

# Build the vocabulary
def clean_text(text):
    ...
    tf.strings.regex_replace(stripped_html,
         '[%s]' % re.escape(string.punctuation),'')

Issue Detail

In re.escape(string.punctuation),'', should the replacement '' be ' ' instead?
Otherwise, we'll get "himbut" from "him,but".
Additionally, I think we should remove "'" from string.punctuation.
Otherwise, we'll get "it s a good" from "it's a good".

My Edition for These Codes

def clean_text(text):
    # All punctuation except the apostrophe, escaped with re.escape
    # so that regex metacharacters are matched literally.
    escaped_punctuation = re.escape(string.punctuation.replace("'", ""))
    lowercase = tf.strings.lower(text)
    stripped_html = tf.strings.regex_replace(lowercase, '<br />', ' ')
    cleaned_punctuation = tf.strings.regex_replace(stripped_html,
                                                   '[%s]' % escaped_punctuation, ' ')

    return cleaned_punctuation
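The two effects described in this issue can be checked on plain strings. This sketch uses Python's re.sub as a stand-in for tf.strings.regex_replace, which is an assumption about equivalent behavior for this simple character class. The helper name strip_punct is hypothetical.

```python
import re
import string

# Character class with every punctuation mark, and one without the apostrophe.
all_punct = "[%s]" % re.escape(string.punctuation)
no_apostrophe = "[%s]" % re.escape(string.punctuation.replace("'", ""))


def strip_punct(text, pattern, replacement):
    """Replace every character matched by pattern with replacement."""
    return re.sub(pattern, replacement, text)


print(strip_punct("him,but", all_punct, ""))           # words get glued together
print(strip_punct("him,but", all_punct, " "))          # words stay separated
print(strip_punct("it's a good", all_punct, " "))      # contraction is split
print(strip_punct("it's a good", no_apostrophe, " "))  # contraction survives
```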

1-3 TensorFlow runtime error

1-1 runs fine, but 1-3 raises an error.

Software versions:
Ubuntu18.04
CUDA: 10.0
CuDNN: 7.6.5
TensorFlow-gpu: 2.1.0

Error message:

2020-04-29 10:23:08.233741: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-04-29 10:23:08.233797: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-04-29 10:23:08.233803: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-04-29 10:23:08.758262: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-29 10:23:08.764938: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.765330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.71GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-29 10:23:08.765483: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-29 10:23:08.766542: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-29 10:23:08.767329: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-29 10:23:08.767509: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-29 10:23:08.768612: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-29 10:23:08.769447: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-29 10:23:08.771963: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-29 10:23:08.772061: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.772396: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.772661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-29 10:23:08.772903: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-29 10:23:08.797181: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2020-04-29 10:23:08.797494: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555c37cbd2a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-29 10:23:08.797521: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-29 10:23:08.870114: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.870464: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555c3850a540 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-29 10:23:08.870477: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2020-04-29 10:23:08.870591: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.870863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.71GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-29 10:23:08.870889: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-29 10:23:08.870899: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-29 10:23:08.870908: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-29 10:23:08.870916: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-29 10:23:08.870925: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-29 10:23:08.870933: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-29 10:23:08.870941: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-29 10:23:08.870976: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.871252: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.871502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-29 10:23:08.871523: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-29 10:23:08.872180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-29 10:23:08.872188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-04-29 10:23:08.872192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-04-29 10:23:08.872253: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.872536: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.872803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6900 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
[b'the', b'and', b'a', b'of', b'to', b'is', b'in', b'it', b'i', b'this', b'that', b'was', b'as', b'for', b'with', b'movie', b'but', b'film', b'on', b'not', b'you', b'his', b'are', b'have', b'be', b'he', b'one', b'its', b'at', b'all', b'by', b'an', b'they', b'from', b'who', b'so', b'like', b'her', b'just', b'or', b'about', b'has', b'if', b'out', b'some', b'there', b'what', b'good', b'more', b'when', b'very', b'she', b'even', b'my', b'no', b'would', b'up', b'time', b'only', b'which', b'story', b'really', b'their', b'were', b'had', b'see', b'can', b'me', b'than', b'we', b'much', b'well', b'get', b'been', b'will', b'into', b'people', b'also', b'other', b'do', b'bad', b'because', b'great', b'first', b'how', b'him', b'most', b'dont', b'made', b'then', b'them', b'films', b'movies', b'way', b'make', b'could', b'too', b'any', b'after', b'characters']
Model: "cnn_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        multiple                  70000     
_________________________________________________________________
conv_1 (Conv1D)              multiple                  576       
_________________________________________________________________
maxpool_1 (MaxPooling1D)     multiple                  0         
_________________________________________________________________
conv_2 (Conv1D)              multiple                  4224      
_________________________________________________________________
maxpool_2 (MaxPooling1D)     multiple                  0         
_________________________________________________________________
flatten (Flatten)            multiple                  0         
_________________________________________________________________
dense (Dense)                multiple                  6145      
=================================================================
Total params: 80,945
Trainable params: 80,945
Non-trainable params: 0
_________________________________________________________________
2020-04-29 10:23:12.802249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-29 10:23:12.968219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-29 10:23:13.365572: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-29 10:23:13.378753: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-29 10:23:13.378835: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node cnn_model/conv_1/conv1d}}]]
	 [[Nadam/ReadVariableOp_3/_20]]
2020-04-29 10:23:13.378885: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node cnn_model/conv_1/conv1d}}]]
Traceback (most recent call last):
  File "/home/huxiaoyang/PycharmProjects/eat_tensorflow2_in_30_days/1-3_text_data_modeling_process_example/example.py", line 170, in <module>
    main()
  File "/home/huxiaoyang/PycharmProjects/eat_tensorflow2_in_30_days/1-3_text_data_modeling_process_example/example.py", line 166, in main
    train_model(model, ds_train, ds_test, epochs=6)
  File "/home/huxiaoyang/PycharmProjects/eat_tensorflow2_in_30_days/1-3_text_data_modeling_process_example/example.py", line 148, in train_model
    train_step(model, features, labels)
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 632, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node cnn_model/conv_1/conv1d (defined at /PycharmProjects/eat_tensorflow2_in_30_days/1-3_text_data_modeling_process_example/example.py:71) ]]
	 [[Nadam/ReadVariableOp_3/_20]]
  (1) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node cnn_model/conv_1/conv1d (defined at /PycharmProjects/eat_tensorflow2_in_30_days/1-3_text_data_modeling_process_example/example.py:71) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_step_4773]

Function call stack:
train_step -> train_step


Process finished with exit code 1
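"Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR" is often reported when TensorFlow reserves GPU memory at startup. One commonly tried workaround is to enable memory growth before any GPU work begins. The thread does not confirm this as the fix, and a CUDA/cuDNN version mismatch is another possibility to rule out. A minimal sketch:

```python
import tensorflow as tf

# Must run before the GPUs are initialized, i.e. at the top of the script.
gpus = tf.config.experimental.list_physical_devices("GPU")
for gpu in gpus:
    try:
        # Allocate GPU memory on demand instead of reserving it all up front.
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as err:
        # Raised if the device has already been initialized.
        print("could not set memory growth:", err)
```

On a machine without a GPU the loop does nothing.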

1-2: problem with num_parallel_calls in preprocessing

Using a tf.Tensor as a Python bool is not allowed. Use if t is not None: instead of if t: to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.

Why can we pass the parameter "input_shape = (2,)" when it is not defined in __init__?

This is from 4-3. It is not a bug, just something I don't understand. Why can we write model.add(Linear(units = 1,input_shape = (2,))) when the __init__ method has no "input_shape" parameter?

class Linear(layers.Layer):
    def __init__(self, units=32, **kwargs):
#         super(Linear, self).__init__(**kwargs)
        super().__init__(**kwargs)
        self.units = units
    
    # The trainable parameters are defined in the build method.
    # Since input_shape is only needed inside build,
    # we do not need to store it in __init__.
    def build(self, input_shape): 
        self.w = self.add_weight("w",shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True) # Parameter named "w" is compulsory or an error will be thrown out
        self.b = self.add_weight("b",shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)
        super().build(input_shape) # Identical to self.built = True

    # The logic of forward propagation is defined in call method, and is called by __call__ method
    @tf.function
    def call(self, inputs): 
        return tf.matmul(inputs, self.w) + self.b
    
    # Use customized get-config method to save the model as h5 format, specifically for the model composed through Functional API with customized Layer
    def get_config(self):  
        config = super().get_config()
        config.update({'units': self.units})
        return config

tf.keras.backend.clear_session()

model = models.Sequential()
# Note: the input_shape here will be modified by the model, so we don't have to fill None in the dimension representing the number of samples.
model.add(Linear(units = 1,input_shape = (2,)))  
print("model.input_shape: ",model.input_shape)
print("model.output_shape: ",model.output_shape)
model.summary()
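The mechanism at work is **kwargs forwarding: Linear.__init__ passes any keyword it does not name on to the base class, which accepts input_shape. The toy classes below illustrate only that Python mechanism. BaseLayer is a hypothetical stand-in and not the Keras implementation.

```python
class BaseLayer:
    """Toy stand-in for a framework base class that knows about input_shape."""

    def __init__(self, input_shape=None, name=None):
        self.input_shape = input_shape
        self.name = name


class Linear(BaseLayer):
    def __init__(self, units=32, **kwargs):
        # Keywords not named here (such as input_shape) go to the base class.
        super().__init__(**kwargs)
        self.units = units


layer = Linear(units=1, input_shape=(2,))
print(layer.units, layer.input_shape)
```

A keyword the base class does not accept would still raise a TypeError, so the forwarding is not a free-for-all.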

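3-3, high-level API demonstration: the DNN example is the same as the mid-level API one

Following the linear regression example in the high-level API chapter, the modeling process should be:
(1) build the model with models.Sequential()
(2) add layers with add()
(3) define the loss, metric, and optimizer
(4) configure the training parameters with compile()
(5) train the model with model.fit()

However, the DNN model in the high-level API chapter is built with build(), and the overall structure is almost identical to the DNN example in the mid-level API chapter. It looks like the wrong example may have been used here.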

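Calling a TensorFlow 2.0 model from spark-scala raises an error

I have a question. The native SavedModelBundle and Session classes do not implement the Serializable interface, so calling

val broads = sc.broadcast(bundle)
directly throws
Serialization stack: - object not serializable (class: org.tensorflow.SavedModelBundle, value: org.tensorflow.SavedModelBundle@6a1ebcff)

To get around this I would have to modify the source to add the Serializable interface, which means changing a lot of code. How does the article handle this?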

Windows log path issue

When using an absolute path on Windows, it needs to be written like this:
logdir = 'C:\\xx\\autograph\\%s' % stamp
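Two other ways to write the same path without doubling backslashes are a raw string or pathlib. In this sketch the directory names come from the placeholder path above, and PureWindowsPath is used so the example behaves the same on any operating system.

```python
import datetime
from pathlib import PureWindowsPath

stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# Raw string: backslashes are kept literally, so "\a" is not read as a bell character.
logdir_raw = r"C:\xx\autograph\%s" % stamp

# pathlib: components are joined with the Windows separator.
logdir_path = str(PureWindowsPath("C:/xx", "autograph", stamp))
```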

About input_shape

In "1-1, structured data modeling workflow example", why is input_shape=(15,) rather than x_train.shape, i.e. input_shape=(891,15)?
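In Keras, input_shape describes a single sample and leaves out the batch (number of samples) dimension. A minimal sketch using the shape quoted in the question:

```python
# Shape quoted in the question: 891 samples, 15 features each.
x_train_shape = (891, 15)

# Drop the leading sample dimension; what remains is the per-sample shape.
num_samples, *feature_dims = x_train_shape
input_shape = tuple(feature_dims)
print(input_shape)
```

The model can then accept batches of any size, not only batches of 891 rows.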

Custom metric function

@tf.function
def update_state(self,y_true,y_pred):
    y_true = tf.cast(tf.reshape(y_true,(-1,)),tf.bool)
    y_pred = tf.cast(100*tf.reshape(y_pred,(-1,)),tf.int32)

    for i in tf.range(0,tf.shape(y_true)[0]):
        if y_true[i]:
            self.true_positives[y_pred[i]].assign(
                self.true_positives[y_pred[i]]+1.0)
        else:
            self.false_positives[y_pred[i]].assign(
                self.false_positives[y_pred[i]]+1.0)
    return (self.true_positives,self.false_positives)

In 2.1, the signature should also take sample_weight=None, and only one value can be returned.

Problem loading a custom model built by subclassing the Model base class

Saving the model

model.save('./data/tf_model_savedmodel', save_format="tf")

In my tests, the model can only be saved this way; it cannot be saved in the Keras h5 format.

Loading the model

model_loaded = tf.keras.models.load_model('./data/tf_model_savedmodel')

Error

ValueError: Could not find matching function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * Tensor("x:0", shape=(None, 200), dtype=int32)
    * Tensor("training:0", shape=(), dtype=bool)
  Keyword arguments: {}

Expected these arguments to match one of the following 4 option(s):

Option 1:
  Positional arguments (2 total):
    * TensorSpec(shape=(None, 200), dtype=tf.int32, name='input_1')
    * True
  Keyword arguments: {}

Option 2:
  Positional arguments (2 total):
    * TensorSpec(shape=(None, 200), dtype=tf.int32, name='x')
    * False
  Keyword arguments: {}

Option 3:
  Positional arguments (2 total):
    * TensorSpec(shape=(None, 200), dtype=tf.int32, name='x')
    * True
  Keyword arguments: {}

Option 4:
  Positional arguments (2 total):
    * TensorSpec(shape=(None, 200), dtype=tf.int32, name='input_1')
    * False
  Keyword arguments: {}

Loading that succeeds

load_model = tf.saved_model.load('./data/saved_model')

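However, a model loaded this way is not compiled, so the model.xxx methods cannot be used directly.

Current workaround

Deploy the SavedModel-format model with the TensorFlow Serving Docker image.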

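Would the author consider covering the tft data pipeline?

TF2 added extended support for tfx, and of its components I think tft is the one most likely to be used in engineering work. Would the author consider adding a tutorial on this part?

A typical scenario:
Its integration with Apache Beam lets us unify the data pipelines for offline training and online serving, so the model only has to consume the data the pipeline provides.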

A small error I ran into in chapter 6-3

6-3 contains the line gpus = tf.config.list_physical_devices("GPU")
Running it raises: module 'tensorflow_core._api.v2.config' has no attribute 'list_physical_devices'
Changing it to tf.config.experimental.list_physical_devices("GPU") fixed it. I don't know whether others have hit this, but I suggest updating it.
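A version-tolerant sketch that tries the newer name first and falls back to the experimental one reported to work above. The wrapper name list_gpus is hypothetical.

```python
import tensorflow as tf


def list_gpus():
    """List physical GPUs using whichever API name this TF version provides."""
    fn = getattr(tf.config, "list_physical_devices", None)
    if fn is None:
        # Older releases only expose the experimental alias.
        fn = tf.config.experimental.list_physical_devices
    return fn("GPU")


gpus = list_gpus()
print("GPUs found:", len(gpus))
```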

5.5: the loss function is wrong

def focal_loss(gamma=2., alpha=.25):
    
    def focal_loss_fixed(y_true, y_pred):
        pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
        pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
        loss = -tf.sum(alpha * tf.pow(1. - pt_1, gamma) * tf.log(1e-07+pt_1)) \
           -tf.sum((1-alpha) * tf.pow( pt_0, gamma) * tf.log(1. - pt_0 + 1e-07))
        return loss
    return focal_loss_fixed

This raises AttributeError: module 'tensorflow' has no attribute 'sum'. I guess it should be corrected to:

def focal_loss(gamma=2., alpha=.25):
    
    def focal_loss_fixed(y_true, y_pred):
        pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
        pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
        loss = -tf.reduce_sum(alpha * tf.pow(1. - pt_1, gamma) * tf.math.log(1e-07+pt_1)) \
           -tf.reduce_sum((1-alpha) * tf.pow( pt_0, gamma) * tf.math.log(1. - pt_0 + 1e-07))
        return loss
    return focal_loss_fixed

3-2 mid-level API: train_model reports Internal: No unary variant device copy function found for direction...

The example code in this chapter does not seem to work with the latest TensorFlow. It fails with:

Traceback (most recent call last):
  File "demo.py", line 40, in <module>
    train_model(model,epochs = 200)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 608, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 678, in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal:  No unary variant device copy function found for direction: 1 and Variant type_index: tensorflow::data::(anonymous namespace)::DatasetVariantWrapper
         [[{{node while_input_5/_12}}]]
         [[Func/while/body/_1/while/cond/then/_78/input/_91/_52]]
  (1) Internal:  No unary variant device copy function found for direction: 1 and Variant type_index: tensorflow::data::(anonymous namespace)::DatasetVariantWrapper
         [[{{node while_input_5/_12}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_model_342]

Function call stack:
train_model -> train_model

I removed the visualization parts of the example code to make the problem easier to reproduce:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers,losses,metrics,optimizers

n = 400

X = tf.random.uniform([n,2],minval=-10,maxval=10)
w0 = tf.constant([[2.0],[-3.0]])
b0 = tf.constant([[3.0]])
Y = X@w0 + b0 + tf.random.normal([n,1],mean = 0.0,stddev= 2.0)

ds = tf.data.Dataset.from_tensor_slices((X,Y)) \
     .shuffle(buffer_size = 100).batch(10) \
     .prefetch(tf.data.experimental.AUTOTUNE)

model = layers.Dense(units = 1)
model.build(input_shape = (2,))
model.loss_func = losses.mean_squared_error
model.optimizer = optimizers.SGD(learning_rate=0.001)

@tf.function
def train_step(model, features, labels):
    with tf.GradientTape() as tape:
        predictions = model(features)
        loss = model.loss_func(tf.reshape(labels,[-1]), tf.reshape(predictions,[-1]))
    grads = tape.gradient(loss,model.variables)
    model.optimizer.apply_gradients(zip(grads,model.variables))
    return loss

@tf.function
def train_model(model,epochs):
    for epoch in tf.range(1,epochs+1):
        loss = tf.constant(0.0)
        for features, labels in ds:
            loss = train_step(model,features,labels)
        if epoch%50==0:
            tf.print("epoch =",epoch,"loss = ",loss)
            tf.print("w =",model.variables[0])
            tf.print("b =",model.variables[1])
train_model(model,epochs = 200)

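The problem seems to be in the train_model function. If I remove the @tf.function decorator from train_model, the error goes away. Could the cause be that a tf.data.Dataset cannot be iterated inside a tf.function?

I am using the TensorFlow nightly build. Thanks.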
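The workaround mentioned in the post (leaving train_model undecorated) can be sketched as follows. This is only a minimal sketch, not a diagnosis of the underlying error: train_step alone is compiled with @tf.function, and the epoch loop that iterates over the dataset runs as ordinary Python. Keeping loss_func and optimizer as module-level variables, the fixed seed, and the reduced epoch count are choices made for this sketch and are not in the original code.

```python
import tensorflow as tf
from tensorflow.keras import layers, losses, optimizers

tf.random.set_seed(0)  # assumption: fixed seed, only to make the sketch repeatable

n = 400
X = tf.random.uniform([n, 2], minval=-10, maxval=10)
w0 = tf.constant([[2.0], [-3.0]])
b0 = tf.constant([[3.0]])
Y = X @ w0 + b0 + tf.random.normal([n, 1], mean=0.0, stddev=2.0)

ds = (tf.data.Dataset.from_tensor_slices((X, Y))
      .shuffle(buffer_size=100).batch(10)
      .prefetch(tf.data.experimental.AUTOTUNE))

model = layers.Dense(units=1)
model.build(input_shape=(2,))
loss_func = losses.mean_squared_error
optimizer = optimizers.SGD(learning_rate=0.001)

@tf.function  # only the per-batch step is compiled into a graph
def train_step(features, labels):
    with tf.GradientTape() as tape:
        predictions = model(features)
        loss = loss_func(tf.reshape(labels, [-1]), tf.reshape(predictions, [-1]))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

def train_model(epochs):
    # Plain Python function: the dataset is iterated eagerly, outside any graph.
    loss = tf.constant(0.0)
    for epoch in range(1, epochs + 1):
        for features, labels in ds:
            loss = train_step(features, labels)
        if epoch % 10 == 0:
            tf.print("epoch =", epoch, "loss =", loss)
    return loss

final_loss = train_model(epochs=20)
```

The trade-off is that the outer loop no longer runs inside a graph, but the per-batch work, which is where most of the time goes, still does.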

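What does "the @ symbol adds a normal perturbation" mean?

In the data preparation step of 3-1 (low-level API demonstration), there is a comment that reads:

@ denotes matrix multiplication; add normal perturbation

The exact location is the last line of the first code snippet under "1. Prepare the data" in section "I. Linear regression model" of 3-1 (low-level API demonstration). I have attached a screenshot, though I am not sure it will display.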

However, the TensorFlow API documentation for matmul says:
Since python >= 3.5 the @ operator is supported (see PEP 465). In TensorFlow, it simply calls the tf.matmul() function, so the following lines are equivalent:

d = a @ b @ [[10], [11]]
d = tf.matmul(tf.matmul(a, b), [[10], [11]])

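I searched online and could not find anything about "matrix multiplication adding a normal perturbation". What does "add normal perturbation" mean, and where is it used? Is it related to the random call in X = tf.random.uniform([n,2],minval=-10,maxval=10), or to something else? Thanks!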
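One plausible reading is that the comment describes two separate things on the same line: the @ in X@w0 is ordinary matrix multiplication, and the "normal perturbation" is the Gaussian noise term added afterwards. This assumes the commented line has the form Y = X@w0 + b0 + tf.random.normal(...). A minimal sketch of that reading follows, using NumPy arrays as a stand-in for TensorFlow tensors. The names clean and noise and the fixed seed are introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: fixed seed so the sketch is repeatable
n = 400

X = rng.uniform(-10, 10, size=(n, 2))
w0 = np.array([[2.0], [-3.0]])
b0 = np.array([[3.0]])

# Part 1: "@" is only matrix multiplication; this term is deterministic given X.
clean = X @ w0 + b0

# Part 2: the "normal perturbation" is a separate Gaussian noise term
# (mean 0, standard deviation 2) added to the clean linear output.
noise = rng.normal(loc=0.0, scale=2.0, size=(n, 1))
Y = clean + noise

print(np.allclose(X @ w0, np.matmul(X, w0)))  # "@" and matmul give the same result
print(Y.shape, float(noise.std()))
```

Under this reading the perturbation has nothing to do with the @ operator or with how X is sampled; it only makes the synthetic labels noisy so that the regression is not fitting perfectly linear data.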

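Building a model by subclassing

Hello. When building a model by subclassing, I want to embed that model inside another subclassed model.
The first approach is to write the embedded model as a subclass of Layer and override get_config.
The second approach is to write the embedded model as a subclass of Model and override compute_output_shape.

Do these two approaches have the same effect, or is there a difference?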

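Would you consider adding Estimator to the book?

Estimator is an important API that has carried over from TF 1 to TF 2. In terms of abstraction level it belongs to the high-level APIs, and it can be used to define a model directly.
Would you consider adding this part to the book?
Why was it left out, and what was the reasoning?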

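Question about 1-1, the structured data modeling workflow example

In chapter 1-1, the author uses y_test = dftest_raw['Survived'].values, but dftest_raw has no Survived column, so this line raises an error.

Is the test data the author uses the official test data, or was part of the train data split off to serve as test data? Thanks!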

1-2, image data modeling workflow example

# Use parallel preprocessing (num_parallel_calls) and prefetching (prefetch) to improve performance
ds_train = tf.data.Dataset.list_files("./data/cifar2/train/*/*.jpg") \
           .map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(buffer_size = 1000).batch(BATCH_SIZE) \
           .prefetch(tf.data.experimental.AUTOTUNE)

ds_test = tf.data.Dataset.list_files("./data/cifar2/test/*/*.jpg") \
           .map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .batch(BATCH_SIZE) \
           .prefetch(tf.data.experimental.AUTOTUNE)

After this processing, every label I print is the same. What could be wrong?
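One thing that may be worth checking, offered as a guess rather than a diagnosis: load_image is not shown in this post, but if it derives the label by matching the file path against a pattern that hard-codes "/" as the separator, then every path that uses "\" (as on Windows) fails the match and receives the same label. The function label_from_path, the class folder names, and both patterns below are hypothetical, written in plain Python only to illustrate the effect.

```python
import re

def label_from_path(img_path, pattern):
    """Hypothetical labelling rule: 1 if the whole path matches the pattern, else 0."""
    return 1 if re.fullmatch(pattern, img_path) else 0

# Hypothetical file paths, once with "/" and once with "\" separators.
posix_paths = ["./data/cifar2/train/automobile/0.jpg",
               "./data/cifar2/train/airplane/0.jpg"]
windows_paths = [p.replace("/", "\\") for p in posix_paths]

slash_only = r".*/automobile/.*"             # assumes "/" separators
any_separator = r".*[/\\]automobile[/\\].*"  # accepts "/" or "\"

for name, paths in [("posix", posix_paths), ("windows", windows_paths)]:
    print(name,
          [label_from_path(p, slash_only) for p in paths],
          [label_from_path(p, any_separator) for p in paths])
```

If the printed labels differ between the two patterns on your machine, the separator is the likely culprit. If they do not, the cause lies elsewhere in load_image or in how the labels are printed.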

Day 1: error when training the model

history = model.fit(x_train,y_train,batch_size= 64,epochs= 30, validation_split=0.2) raises an error:

validation_split is only supported for Tensors or NumPy arrays, found following types in the input: [<class 'pandas.core.frame.DataFrame'>]
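The message says that validation_split only accepts Tensors or NumPy arrays, so one likely fix is to convert the DataFrame before calling fit. A minimal sketch of the conversion follows. The tiny x_train and y_train below are hypothetical stand-ins for the preprocessed data, and the names x_train_np and y_train_np are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the preprocessed features and labels.
x_train = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0],
                        "Fare": [7.25, 71.28, 7.92, 53.1]})
y_train = pd.Series([0, 1, 1, 1], name="Survived")

# Convert to NumPy arrays, the input type the error message asks for.
x_train_np = x_train.to_numpy(dtype="float32")
y_train_np = y_train.to_numpy(dtype="float32")

print(type(x_train_np), x_train_np.shape, y_train_np.shape)
```

With arrays like these, the call from the post would become model.fit(x_train_np, y_train_np, batch_size=64, epochs=30, validation_split=0.2), which should get past this particular check.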
