
quickdraw-dataset's Introduction

The Quick, Draw! Dataset

preview

The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game Quick, Draw!. The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located. You can browse the recognized drawings on quickdraw.withgoogle.com/data.

We're sharing them here for developers, researchers, and artists to explore, study, and learn from. If you create something with this dataset, please let us know by e-mail or at A.I. Experiments.

We have also released a tutorial and model for training your own drawing classifier on tensorflow.org.

Please keep in mind that while this collection of drawings was individually moderated, it may still contain inappropriate content.

Content

The raw moderated dataset

The raw data is available as ndjson files separated by category, in the following format:

Key Type Description
key_id 64-bit unsigned integer A unique identifier across all drawings.
word string Category the player was prompted to draw.
recognized boolean Whether the word was recognized by the game.
timestamp datetime When the drawing was created.
countrycode string A two letter country code (ISO 3166-1 alpha-2) of where the player was located.
drawing string A JSON array representing the vector drawing.

Each line contains one drawing. Here's an example of a single drawing:

  { 
    "key_id":"5891796615823360",
    "word":"nose",
    "countrycode":"AE",
    "timestamp":"2017-03-01 20:41:36.70725 UTC",
    "recognized":true,
    "drawing":[[[129,128,129,129,130,130,131,132,132,133,133,133,133,...]]]
  }

The format of the drawing array is as follows:

[ 
  [  // First stroke 
    [x0, x1, x2, x3, ...],
    [y0, y1, y2, y3, ...],
    [t0, t1, t2, t3, ...]
  ],
  [  // Second stroke
    [x0, x1, x2, x3, ...],
    [y0, y1, y2, y3, ...],
    [t0, t1, t2, t3, ...]
  ],
  ... // Additional strokes
]

Where x and y are the pixel coordinates, and t is the time in milliseconds since the first point. x and y are real-valued while t is an integer. The raw drawings can have vastly different bounding boxes and number of points due to the different devices used for display and input.
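
For example, a minimal sketch of reading the raw format in Python (the file name cat.ndjson is illustrative, assuming a locally downloaded category file):

import json

def iter_drawings(path):
    # Each line of an ndjson file is one standalone JSON drawing record.
    with open(path) as f:
        for line in f:
            yield json.loads(line)

for drawing in iter_drawings("cat.ndjson"):
    for stroke in drawing["drawing"]:
        xs, ys, ts = stroke  # per-stroke x, y and timing arrays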

Preprocessed dataset

We've preprocessed and split the dataset into different files and formats to make it faster and easier to download and explore.

Simplified Drawing files (.ndjson)

We've simplified the vectors, removed the timing information, and positioned and scaled the data into a 256x256 region. The data is exported in ndjson format with the same metadata as the raw format. The simplification process was (a rough sketch in Python follows the list):

  1. Align the drawing to the top-left corner, to have minimum values of 0.
  2. Uniformly scale the drawing, to have a maximum value of 255.
  3. Resample all strokes with a 1 pixel spacing.
  4. Simplify all strokes using the Ramer–Douglas–Peucker algorithm with an epsilon value of 2.0.
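
A rough sketch of steps 1, 2 and 4 in Python, assuming the third-party rdp package for Ramer–Douglas–Peucker (illustrative only, not the exact code used to produce the dataset; the 1-pixel resampling of step 3 is omitted):

import numpy as np
from rdp import rdp

def simplify_drawing(strokes, epsilon=2.0):
    # strokes: list of [xs, ys] pairs, timing already removed
    points = np.concatenate([np.array(s, dtype=float).T for s in strokes])
    mins = points.min(axis=0)                      # step 1: align to (0, 0)
    scale = max((points - mins).max(), 1) / 255.0  # step 2: uniform scale to 0..255
    simplified = []
    for s in strokes:
        xy = (np.array(s, dtype=float).T - mins) / scale
        simplified.append(rdp(xy, epsilon=epsilon))  # step 4: RDP, epsilon 2.0
    return simplified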

There is an example in examples/nodejs/simplified-parser.js showing how to read ndjson files in NodeJS.
Additionally, the examples/nodejs/ndjson.md document details a set of command-line tools that can help explore subsets of these quite large files.

Binary files (.bin)

The simplified drawings and metadata are also available in a custom binary format for efficient compression and loading.

There is an example in examples/binary_file_parser.py showing how to load the binary files in Python.
There is also an example in examples/nodejs/binary-parser.js showing how to read the binary files in NodeJS.
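
For reference, the per-drawing layout that parser reads looks roughly like this (a condensed sketch along the lines of examples/binary_file_parser.py; see that file for the authoritative version):

from struct import unpack

def unpack_drawing(f):
    # Fixed-size header, then one length-prefixed stroke at a time.
    key_id, = unpack('Q', f.read(8))
    countrycode, = unpack('2s', f.read(2))
    recognized, = unpack('b', f.read(1))
    timestamp, = unpack('I', f.read(4))
    n_strokes, = unpack('H', f.read(2))
    strokes = []
    for _ in range(n_strokes):
        n_points, = unpack('H', f.read(2))
        fmt = str(n_points) + 'B'  # simplified coordinates fit in one byte
        x = unpack(fmt, f.read(n_points))
        y = unpack(fmt, f.read(n_points))
        strokes.append((x, y))
    return {'key_id': key_id, 'countrycode': countrycode,
            'recognized': recognized, 'timestamp': timestamp,
            'image': strokes}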

Numpy bitmaps (.npy)

All the simplified drawings have been rendered into a 28x28 grayscale bitmap in numpy .npy format. The files can be loaded with np.load(). These images were generated from the simplified data, but are aligned to the center of the drawing's bounding box rather than the top-left corner. See here for the code snippet used for generation.
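
For example (the file name cat.npy is illustrative, assuming a local download):

import numpy as np

bitmaps = np.load("cat.npy")        # shape (num_drawings, 784), dtype uint8
first = bitmaps[0].reshape(28, 28)  # each row is one flattened 28x28 image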

Get the data

The dataset is available on Google Cloud Storage as ndjson files separated by category. See the list of files in the Cloud Console, or read more about accessing public datasets using other methods. As an example, to download all simplified drawings, run the command gsutil -m cp 'gs://quickdraw_dataset/full/simplified/*.ndjson' .

Full dataset separated by categories

Sketch-RNN QuickDraw Dataset

This data is also used for training the Sketch-RNN model. An open-source TensorFlow implementation of this model is available in the Magenta Project (link to GitHub repo). You can also read more about this model in this Google Research blog post. The data is stored in compressed .npz files, in a format suitable for inputs into a recurrent neural network.

In this dataset, 75K samples (70K training, 2.5K validation, 2.5K test) have been randomly selected from each category and processed with RDP line simplification with an epsilon parameter of 2.0. Each category is stored in its own .npz file, for example, cat.npz.

We have also provided the full data for each category, if you want to use more than 70K training examples. These are stored with the .full.npz extension.

Note: In Python 3, load the .npz files using np.load(data_filepath, encoding='latin1', allow_pickle=True).
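
A minimal loading sketch (assuming a local cat.npz; the train/valid/test keys follow the Sketch-RNN convention):

import numpy as np

data = np.load("cat.npz", encoding='latin1', allow_pickle=True)
train, valid, test = data['train'], data['valid'], data['test']
# Each element is an (N, 3) array of (dx, dy, pen_lifted) rows -- the
# stroke-3 offset format consumed by the Sketch-RNN model.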

Instructions for converting raw ndjson files to this npz format are available in this notebook.

Projects using the dataset

Here are some projects and experiments that are using or featuring the dataset in interesting ways. Got something to add? Let us know!

Creative and artistic projects

Data analyses

Papers

Guides & Tutorials

Code and tools

Changes

May 25, 2017: Updated Sketch-RNN QuickDraw dataset, created .full.npz complementary sets.

License

This data is made available by Google, Inc. under the Creative Commons Attribution 4.0 International license.

Dataset Metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

property value
name The Quick, Draw! Dataset
alternateName Quick Draw Dataset
alternateName quickdraw-dataset
url
sameAs https://github.com/googlecreativelab/quickdraw-dataset
description The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game "Quick, Draw!". The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located.\n \n Example drawings: ![preview](https://raw.githubusercontent.com/googlecreativelab/quickdraw-dataset/master/preview.jpg)
provider
property value
name Google
sameAs https://en.wikipedia.org/wiki/Google
license
property value
name CC BY 4.0
url

quickdraw-dataset's People

Contributors

akshaybahadur21, chrisgorgo, dalessandroj, enjalot, halfdanj, hardmaru, keshavgbpecdelhi, lurch, mrayinteractive, ndri, nick-jonas, sanjaymahto, seong954t, sorryusernameisalreadytaken, talhakabakus, tianyk


quickdraw-dataset's Issues

How to transform .npz to photograph

Hi,
Our team wants to use the .npz dataset for other research, but despite many attempts we can't convert the .npz numpy arrays to an image format like .jpg or .png. We see that the shape of the array is (28, 3), so we can't get back to an RGB image.
We read the Sketch-RNN QuickDraw dataset paper, but still don't know how to convert them.
Can you help us figure it out?

Best wishes
Hans Yang
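
Each row of those (N, 3) arrays is in the stroke-3 format used by Sketch-RNN: (dx, dy, pen_lifted). A hedged sketch that converts one sample to absolute coordinates and plots it, under that assumption:

import numpy as np
import matplotlib.pyplot as plt

def plot_stroke3(sample):
    # Accumulate the (dx, dy) offsets into absolute points, then split
    # the polyline wherever the pen was lifted (third column == 1).
    xy = np.cumsum(sample[:, :2], axis=0)
    start = 0
    for i in np.where(sample[:, 2] == 1)[0]:
        plt.plot(xy[start:i + 1, 0], -xy[start:i + 1, 1])
        start = i + 1
    plt.axis('equal')
    plt.show()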

Convert saved drawing images/files (png/jpg) to same numpy (npy) bitmaps for prediction

I have been scratching my head for over 5 days now, trying various models and code repos, and still have not been able to make it work. The model trains well and evals well, but I am failing at actual predictions.
Instead of models based on drawing strokes, I have been playing with models that use actual drawing images for prediction (like an image classifier), and most of these models use the numpy bitmaps dataset (npy files).

Everything is well and good except the part where I feed the model a drawing from a saved image file (since most of these articles or code repos fed it via canvas, JS, or Android). I tried to replicate their prediction code (mainly the image processing) as much as I can in Python, but the predictions are still way off.

Here is my image processing and prediction code:

from os import walk  # needed for walk(mypath) below; was missing from this snippet
from PIL import Image
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
from random import randint
from quickdraw import QuickDrawData  # assumed source of `qd` used below
from scipy.misc.pilutil import imsave, imread, imresize  # removed in recent SciPy; imageio/Pillow are modern replacements
%matplotlib inline

qd = QuickDrawData()  # assumed setup for qd.get_drawing() below

clock = qd.get_drawing("circle")
apple = clock
apple.image.save("apple.png")


mypath = "data/"
txt_name_list = []
for (dirpath, dirnames, filenames) in walk(mypath):
    if filenames != '.DS_Store':  # note: compares a list to a string, so this is always True
        txt_name_list.extend(filenames)
        break
    

def adjust_gamma(image, gamma=1.5):
    invGamma = 1.0 / gamma
    table = np.array([((i / 255.0) ** invGamma) * 255
                      for i in np.arange(0, 256)]).astype("uint8")

    return cv.LUT(image, table)


def preprocess(img):
    # for sketch & not canvas drawings use the following:

    gray = cv.bilateralFilter(img, 9, 75, 75)
    #
    gray = cv.erode(gray, None, iterations=1)
    #
    gray = adjust_gamma(gray, 1.1)
    #return gray

    th3 = cv.adaptiveThreshold(gray, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C,cv.THRESH_BINARY_INV, 11, 2)
    #th3 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C,cv.THRESH_BINARY_INV, 11, 2)
    return th3
  
  
  
#img = apple.image.convert("L")

#imgData = request.get_data()
#convertImage(imgData)
print("debug")

x = imread('apple.png', mode='L')

x = preprocess(x)

#x = cv.bitwise_not(x)



x = imresize(x, (32, 32))

x = x.astype('float32')
x /= 255

x = x.reshape(1, 32, 32, 1)

print(txt_name_list)
#print(x)

out = model.predict(x)
#print(out)
print(np.argmax(out, axis=1))
index = np.array(np.argmax(out, axis=1))
index = index[0]

print(txt_name_list[index])

plt.imshow(x.squeeze()) 

There is quite a difference between how the image looks in the numpy dataset and how it looks after I process it.

[screenshots: the numpy dataset image vs. the image after my processing]

Here is my full model:

from __future__ import print_function
import  numpy  as  np
import matplotlib.pyplot as plt
from  sklearn.model_selection  import train_test_split
from os import walk, getcwd
import h5py
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
import cv2 as cv
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D, BatchNormalization, AveragePooling2D
from keras import backend as K
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from keras.optimizers import SGD
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping,ModelCheckpoint
from sklearn.metrics import confusion_matrix

#For Multi GPU
from keras.utils import multi_gpu_model
from keras import metrics

batch_size = 128

epochs = 40

img_rows, img_cols = 28, 28

mypath = "data/"
txt_name_list = []

#slice_train = 30500
slice_train = 10000

def top_3_acc(y_true, y_pred):
    return metrics.top_k_categorical_accuracy(y_true, y_pred, k=3)

def readData():
    x_train = []
    x_test = []
    y_train = []
    y_test = []
    xtotal = []
    ytotal = []
    x_val = []
    y_val = []

    for (dirpath, dirnames, filenames) in walk(mypath):
        if filenames != '.DS_Store':
            txt_name_list.extend(filenames)
            break

    #print(mypath)
    i=0
    classescount = 0

    for txt_name in txt_name_list:
        txt_path = mypath + txt_name
        x = np.load(txt_path)
        print(txt_name)
        print(i)
        classescount += 1
        x = x.astype('float32') / 255.  # scale images (note: lenet() divides by 255 again below)
        y = [i] * len(x)
        x = x[:slice_train]
        y = y[:slice_train]

        if i != 0:
            xtotal = np.concatenate((x, xtotal), axis=0)
            ytotal = np.concatenate((y, ytotal), axis=0)
        else:
            xtotal = x
            ytotal = y
        i += 1

    print(classescount)
    print("xshape = ", xtotal.shape)
    print("yshape = ", ytotal.shape)
    x_train, x_test, y_train, y_test = train_test_split(xtotal, ytotal, test_size=0.3, random_state=42)
    x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=1)

    return x_train, x_val, x_test, y_train, y_val, y_test, classescount


def lenet(x_train, x_val, x_test, y_train, y_val, y_test, num_classes):
    if K.image_data_format() == 'channels_first':
        x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
        x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
        x_val = x_val.reshape(x_val.shape[0], 1, img_rows, img_cols)
        input_shape = (1, img_rows, img_cols)
    else:
        x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
        x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
        x_val = x_val.reshape(x_val.shape[0], img_rows, img_cols, 1)
        input_shape = (img_rows, img_cols, 1)

    # more reshaping
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_val = x_val.astype('float32')
    x_train /= 255
    x_test /= 255
    x_val /= 255

    # convert class vectors
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)
    y_val = keras.utils.to_categorical(y_val, num_classes)

    x_train = np.pad(x_train, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')
    x_val = np.pad(x_val, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')
    x_test = np.pad(x_test, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')

    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_val.shape[0], 'validation samples')
    print(x_test.shape[0], 'test samples')

    print(y_train.shape)

    print(input_shape)

    model = Sequential()

    model.add(Conv2D(filters=6, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 1)))
    model.add(AveragePooling2D())

    model.add(Conv2D(filters=16, kernel_size=(3, 3), activation='relu'))
    model.add(AveragePooling2D())

    model.add(Flatten())

    model.add(Dense(units=120, activation='relu'))

    model.add(Dense(units=84, activation='relu'))

    model.add(Dense(units=num_classes, activation='softmax'))

    filepath = "saved/weightslenet.{epoch:02d}.h5"
    ES = EarlyStopping(patience=5)
    check = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=False, mode='max')

    #model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy', top_3_acc])
    #Trying Multi GPU
    #model = multi_gpu_model(model, gpus=2)
    #model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    
    model.fit(x_train, y_train,batch_size=batch_size,epochs=epochs,verbose=1, validation_data=(x_val, y_val), callbacks=[ES, check])
    #model.fit(x_train, y_train,batch_size=batch_size,epochs=epochs,verbose=1, validation_data=(x_val, y_val), callbacks=[ES, check])

    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])

    model.save('cnnOld2.h5')
    print("Saved model to disk")
    #
    # cm = metrics.confusion_matrix(test_batch.classes, y_pred)
    # # or
    # # cm = np.array([[1401,    0],[1112, 0]])
    #
    # plt.imshow(cm, cmap=plt.cm.Blues)
    # plt.xlabel("Predicted labels")
    # plt.ylabel("True labels")
    # plt.xticks([], [])
    # plt.yticks([], [])
    # plt.title('Confusion matrix ')
    # plt.colorbar()
    # plt.show()
    print(y_test)

    loaded_model = keras.models.load_model('cnnOld2.h5', custom_objects={"top_3_acc": top_3_acc})
    print("test")
    #y_pred = loaded_model.predict_on_batch(x_test)
    #score = loaded_model.evaluate(x_test, y_test, verbose=0)

    y_pred = loaded_model.predict(x_test)
    print(y_pred)

    indexes = np.argmax(y_pred, axis=1)
    i=0
    for y in y_pred:
        y[y<1000]=0
        # print("allzero",y)
        y[indexes[i]] = 1
        i+=1

    cm = confusion_matrix(
        y_test.argmax(axis=1), y_pred.argmax(axis=1))
    acc = accuracy_score(y_test.argmax(axis=1), y_pred.argmax(axis=1), normalize=True, sample_weight=None)
    cr = classification_report(y_test.argmax(axis=1), y_pred.argmax(axis=1))
    print(cm)
    print(acc)
    print(cr)

def main():
    x_train, x_val, x_test, y_train, y_val, y_test, num_classes = readData()
    lenet(x_train, x_val, x_test, y_train, y_val, y_test, num_classes)

if __name__ == '__main__':
    main()

Looking for a resource to help me export the model for TensorFlow model serving

Based on the MNIST serving tutorial, I'm trying to export a model trained on the Quick, Draw! dataset to load it into the tensorflow-model-server and make a client request that returns the prediction for the input data.
But I'm struggling to get the export right in combination with the correct client request to the server.

Are there any resources available that might help me get the model export right for the model server?
The MNIST example and the TensorFlow documentation are not sufficient for me.

Thanks in advance!

Inappropriate content in Dog data

Hey, I don't know if this is the right place to report this, but I randomly spotted some, uh, inappropriate content in the moderated json dataset for "Dog":

Entry 3230 spells out "F*** YOU"
{"word":"dog","countrycode":"IN","timestamp":"2017-03-14 13:45:59.72155 UTC","recognized":false,"key_id":"5976946842271744","drawing":[[[0,35],[40,115]],[[0,7,57],[48,38,6]],[[24,45,59,61],[63,55,45,49]],[[71,86,98,102,108,107],[71,81,77,68,34,16]],[[129,106,101,104,111,139,157,162],[19,32,46,58,66,76,77,72]],[[180,215,207,205,212,225,236,255,249,223,220,245],[0,93,69,47,36,27,24,26,37,56,62,79]],[[107,100,51,45],[145,159,224,237]],[[37,70],[155,183]],[[125,116,115,119,127,144,152,161,162,162,154,132,125],[189,209,219,230,236,238,232,223,207,195,182,178,190]],[[190,197,201,215,225,237,237],[182,215,219,220,211,180,168]]]}

In portuguese clown expected crown images

Preconditions:
The game displayed "Draw Palhaço" (clown).
Step by step:
1 - Draw a clown
2 - Finish the 6 drawings
3 - Click on the clown drawing and check the examples drawn by other people.

Current behavior:
The examples are crown ("coroa" in Portuguese) images.

Expected behavior:
The examples are clown images.
[screenshots: the "Palhaço" (clown) drawing prompt, and the example drawings showing "coroas" (crowns)]

Pretrained model

Hello, are you also considering releasing the trained neural network used for recognition of the images? O:-) The Sketch-RNN is a generative model, from what I understand from a quick look.

Unable to recognize camera data?

I draw a T-shirt on paper and capture the corresponding camera data stream, but it can't be recognized.
Is there any way to convert camera data to the drawing array?
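
One possible direction (a rough sketch, not an official tool): binarize the photo with OpenCV and trace its contours as stroke arrays. Note that contours follow the outlines of the ink rather than the pen's centerline, so this is only an approximation:

import cv2

def photo_to_strokes(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Otsu threshold, inverted so the pen lines become foreground
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    strokes = []
    for c in contours:
        pts = c.reshape(-1, 2)  # (N, 2) array of x, y points
        strokes.append([pts[:, 0].tolist(), pts[:, 1].tolist()])
    return strokes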

Symbol on coffee cup

The coffee cup ndjson entry with key_id 6723059434127360 has a symbol on it that may be the peaceful swastika, or could be the Nazi Hakenkreuz.

[screenshot of the drawing]

Screen should autoscroll

When clicking a drawing at the bottom of the data page to see the detail, the detail pop-up bubble is rendered below the screen. The page should either auto-scroll or render the bubble above the clicked drawing when it is near the bottom of the page.

Translation

In Portuguese there is a mix-up between the words "clown" and "crown".

Can a normal doodle image be used ?

I tried searching on this topic quite a lot but could not find any information. Most implementations of Quick Draw seem to use stroke data rather than the doodle images themselves, which means that for prediction/inference you need to provide the stroke data of your drawing rather than an image of it, at least from what I have concluded so far.

So I wanted to ask and confirm: is it correct that you cannot use an image of your drawing/doodle to make the same kind of categorical prediction as Quick Draw?

Any references or pointers on this would be much appreciated.

Let me finish my doodle

Users should be allowed to finish the doodle in the given time, not be cut off immediately after the drawing is recognized... this way you end up with unfinished doodles in the database, which is useless.

Hi there, can anyone help me figure out why I cannot open my csv file in Jupyter?

import pandas

df = pandas.read_csv('inventory_INVENTORY_TRANSACTION_.csv')

print(df)

ParserError Traceback (most recent call last)
in
1 import pandas
2
----> 3 df= pandas.read_csv('inventory_INVENTORY_TRANSACTION_.csv')
4
5 print(df)

~\anaconda3\lib\site-packages\pandas\io\parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
608 kwds.update(kwds_defaults)
609
--> 610 return _read(filepath_or_buffer, kwds)
611
612

~\anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
466
467 with parser:
--> 468 return parser.read(nrows)
469
470

~\anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
1055 def read(self, nrows=None):
1056 nrows = validate_integer("nrows", nrows)
-> 1057 index, columns, col_dict = self._engine.read(nrows)
1058
1059 if index is None:

~\anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
2059 def read(self, nrows=None):
2060 try:
-> 2061 data = self._reader.read(nrows)
2062 except StopIteration:
2063 if self._first_chunk:

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 20 fields in line 46, saw 33

import pandas

df = pandas.read_csv('C:\Users\Marcos\OneDrive\Desktop\DATA\inventory_INVENTORY_TRANSACTION_.csv')

print(df)

NameError Traceback (most recent call last)
in
3 # df= pandas.read_csv('C:\Users\Marcos\OneDrive\Desktop\DATA\inventory_INVENTORY_TRANSACTION_.csv')
4
----> 5 print(df)

NameError: name 'df' is not defined

import pandas

df = pandas.read_csv("inventory_INVENTORY_TRANSACTION_.csv")

print(df)

ParserError: Error tokenizing data. C error: Expected 20 fields in line 46, saw 33

import pandas

df = pandas.read_csv("inventory_INVENTORY_TRANSACTION_.csv")

df.head()

ParserError: Error tokenizing data. C error: Expected 20 fields in line 46, saw 33

Modern numpy.load does not accept these .npz files

What is a current, up-to-date code sample that can load the .npz files in this distribution?
This no longer works in Python 3:

import numpy as np
x = np.load(file_path, allow_pickle=True, encoding='latin1')
print(x.keys())

Here is the error for 'cat.npz':

x = np.load(file, allow_pickle=True)
Traceback (most recent call last):
File "/Users/l0n008k/anaconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 454, in load
return pickle.load(fid, **pickle_kwargs)
_pickle.UnpicklingError: invalid load key, '\x0a'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "/Users/l0n008k/anaconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 457, in load
"Failed to interpret file %s as a pickle" % repr(file))
OSError: Failed to interpret file '/Users/l0n008k/Downloads/cat.npz' as a pickle

Error with Placeholder_1

I tried to run the script https://github.com/tensorflow/magenta-demos/blob/master/jupyter-notebooks/Sketch_RNN.ipynb but with a different dataset (bread.npz) and its checkpoints, and I have the following problem:
[screenshot: https://user-images.githubusercontent.com/38189240/69523732-e2890c00-0f64-11ea-9e81-62b97e725735.png]

Here is the code (https://github.com/Pauladds/Rnn_sketch); I am running "blob_sketch_RNN-bread" but I don't know what my problem is...

Code to process raw dataset to simplified dataset

The simplification process is described as follows.

  1. Align the drawing to the top-left corner, to have minimum values of 0.
  2. Uniformly scale the drawing, to have a maximum value of 255.
  3. Resample all strokes with a 1 pixel spacing.
  4. Simplify all strokes using the Ramer–Douglas–Peucker algorithm with an epsilon value of 2.0.

Where can I find the source code that implements the above steps for simplifying the raw dataset? Thanks.

How are authors' rights handled?

Wouldn't Quick, Draw! either need to make it clear to its users that their drawings were going to be open-sourced, or get permission from all 50 million users before open-sourcing their hand-made drawings?

I am curious about the implications of Authors' Rights in such projects. No matter how small and simple a drawing, it is still a unique, hand-created artefact, and therefore such rights surely apply?

How is this handled by Google?

How can I get information like "countrycode" from the numpy_bitmap dataset?

Hello,

I wonder how I can get information like "word", "countrycode", and "recognized" from the numpy_bitmap data. I use np.load() to get the data, and the only information I get is the image itself. I wonder how I can get more information.

I tried the binary data. It contains the information I need, but I want the image in 28x28 format. I tried to use the vector_to_raster() function but failed, since my image data looks like:

[((0, 31, 70, 97, 121, 195, 230), (46, 38, 9, 0, 0, 29, 32)),
((3, 24, 98, 118, 157, 181, 197, 212, 255),
(46, 45, 72, 83, 88, 77, 54, 42, 30)),
((120, 109, 99, 92, 91, 105, 116, 129, 142, 150, 155, 156, 146, 109),
(1, 2, 11, 25, 41, 66, 75, 79, 77, 66, 54, 28, 15, 1)),
((109, 104, 103, 113, 122, 138, 146, 150), (8, 13, 32, 51, 57, 57, 52, 44))]

Thank you so much for your help! :)

Time until first point?

At the moment, the first point of the first stroke has t = 0. It would be super interesting to also know how many milliseconds after the player was asked to draw something they actually made their first stroke! Is there any way of incorporating this info in any future releases of the data set?

Broken links

It looks like all the data links in the readme are broken. Thank you for this amazing work. It would be great if this issue could be fixed. Thank you.

Heart symbol

What do you think about adding a heart symbol category to the app? I believe it would be a good candidate for QuickDraw as it is fairly language-agnostic, very recognizable, and can be easily drawn with a single stroke.

On the same topic, is this the best place to propose new symbols?

.numpy to bitmap format on Mac?

Okay, since the batch processing of .ndjson to SVG looks like it's going to be problematic, how about suggestions for batch converting the numpy format images into plain old vanilla bitmaps on Mac OS X?
Ideally through a nice simple app, since my Python familiarity is zilch.

Quick draw

How do I make a drawing to train the Google AI to recognize the drawings??????? 🤷‍♀️

  • your very confused user of the Quick Draw website

preprocessing new png/jpg image to predict on deep learning model

When I load the npy data provided by Google Quick, Draw!, the prediction works fine on my deep learning model.

import numpy as np  # np is used below but was not imported in this snippet

data_url = '/content/gdrive/My Drive/Colab Notebooks/img/numpy_bitmap/sun.npy'
example_cat = np.load(data_url)

cat_len = example_cat.shape[0] # number of total image

start_num = 11 

example = example_cat[start_num,:784+start_num]

plt.imshow(example.reshape(28, 28))
example = example.reshape(28,28,1).astype('float32')
example /=255.0
print(example)
import matplotlib.pyplot as plt
from random import randint
%matplotlib inline  

pred = model.predict(np.expand_dims(example, axis=0))[0]
ind = (-pred).argsort()[:5]
print(ind)
latex = [categories_dict[x] for x in ind]
plt.imshow(example.squeeze()) 
print(latex)

Somehow the image file won't upload here, so I attach the result of the above code via this link: https://s3.us-west-2.amazonaws.com/secure.notion-static.com/57460690-0ad2-42c1-9d5a-cc9d756534ea/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20201104%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20201104T012109Z&X-Amz-Expires=86400&X-Amz-Signature=cc6febeea2aee67315b8c7d353e7e378677ac965707a8ac0eb7bb6ecfb8a5f0b&X-Amz-SignedHeaders=host&response-content-disposition=filename%20%3D%22Untitled.png%22

Then I captured the exact same image and saved it as a png file. I loaded the file again as a NumPy array and preprocessed it so that I could feed it into my model to predict which category it belongs to. Somehow it does not work and returns a completely different prediction. This is happening for every new png image I try to work with.

import cv2  # cv2 is used below but was not imported in this snippet

im = cv2.imread('/content/gdrive/My Drive/Colab Notebooks/sun2.PNG', cv2.IMREAD_GRAYSCALE)
resize_img = cv2.resize(im, (28,28), interpolation = cv2.INTER_AREA) 
img_vector = np.asarray(resize_img, dtype="uint8")
img = img_vector.reshape(28,28,1).astype('float32')
import matplotlib.pyplot as plt
from random import randint
%matplotlib inline  

img /= 255.0
pred = model.predict(np.expand_dims(img, axis=0))[0]
ind = (-pred).argsort()[:5]
print(ind)
latex = [categories_dict[x] for x in ind]
plt.imshow(img.squeeze()) 
print(latex)

again, I attach the result for this code as a link: https://s3.us-west-2.amazonaws.com/secure.notion-static.com/a64bd7ce-f72f-4c67-a9f3-0689421ef10e/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20201104%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20201104T012838Z&X-Amz-Expires=86400&X-Amz-Signature=1cc52972a8a50202efa90618acd41c8be483f188f5b70d092ef63c6bc5ce8a18&X-Amz-SignedHeaders=host&response-content-disposition=filename%20%3D%22Untitled.png%22

Below is how I preprocessed the data and how I trained my model.

# Reshape and normalize
x_train = x_train.reshape(x_train.shape[0], image_size, image_size, 1).astype('float32')
x_test = x_test.reshape(x_test.shape[0], image_size, image_size, 1).astype('float32')
#image_size is 28

x_train /= 255.0
x_test /= 255.0

# Convert class vectors to class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
def cnn_model():
    # create model
    model = Sequential()
    model.add(Conv2D(30, (5, 5), input_shape=x_train.shape[1:], activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(15, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

The training process and evaluation results are below.

Epoch 1/100
22356/22356 [==============================] - 1323s 59ms/step - loss: 2.7714 - accuracy: 0.3795 - val_loss: 2.2759 - val_accuracy: 0.4751
Epoch 2/100
22356/22356 [==============================] - 1339s 60ms/step - loss: 2.3925 - accuracy: 0.4481 - val_loss: 2.1659 - val_accuracy: 0.4948
Epoch 3/100
22356/22356 [==============================] - 1323s 59ms/step - loss: 2.3365 - accuracy: 0.4588 - val_loss: 2.1333 - val_accuracy: 0.5015
Epoch 4/100
22356/22356 [==============================] - 1303s 58ms/step - loss: 2.3131 - accuracy: 0.4630 - val_loss: 2.1396 - val_accuracy: 0.4996
Epoch 5/100
22356/22356 [==============================] - 1262s 56ms/step - loss: 2.3013 - accuracy: 0.4655 - val_loss: 2.1199 - val_accuracy: 0.5026
Epoch 6/100
22356/22356 [==============================] - 1326s 59ms/step - loss: 2.2932 - accuracy: 0.4663 - val_loss: 2.1190 - val_accuracy: 0.5046
Epoch 7/100
22356/22356 [==============================] - 1269s 57ms/step - loss: 2.2870 - accuracy: 0.4676 - val_loss: 2.1067 - val_accuracy: 0.5053
Epoch 8/100
22356/22356 [==============================] - 1299s 58ms/step - loss: 2.2844 - accuracy: 0.4678 - val_loss: 2.1090 - val_accuracy: 0.5053
Epoch 9/100
22356/22356 [==============================] - 1288s 58ms/step - loss: 2.2828 - accuracy: 0.4683 - val_loss: 2.1147 - val_accuracy: 0.5045
Epoch 10/100
22356/22356 [==============================] - 1289s 58ms/step - loss: 2.2797 - accuracy: 0.4683 - val_loss: 2.0907 - val_accuracy: 0.5073
Epoch 11/100
22356/22356 [==============================] - 1280s 57ms/step - loss: 2.2784 - accuracy: 0.4690 - val_loss: 2.1087 - val_accuracy: 0.5058
Epoch 12/100
22356/22356 [==============================] - 1262s 56ms/step - loss: 2.2787 - accuracy: 0.4688 - val_loss: 2.1078 - val_accuracy: 0.5035
Epoch 13/100
22356/22356 [==============================] - 1335s 60ms/step - loss: 2.2773 - accuracy: 0.4690 - val_loss: 2.1078 - val_accuracy: 0.5049
Epoch 14/100
22356/22356 [==============================] - 1292s 58ms/step - loss: 2.2789 - accuracy: 0.4687 - val_loss: 2.1239 - val_accuracy: 0.5014
Epoch 15/100
22356/22356 [==============================] - 1277s 57ms/step - loss: 2.2824 - accuracy: 0.4676 - val_loss: 2.1220 - val_accuracy: 0.5016
Epoch 16/100
22356/22356 [==============================] - 1291s 58ms/step - loss: 2.2816 - accuracy: 0.4682 - val_loss: 2.1093 - val_accuracy: 0.5058
CPU times: user 18h 13min 31s, sys: 4h 19min 8s, total: 22h 32min 40s
Wall time: 5h 46min 14s
19407/19407 [==============================] - 101s 5ms/step - loss: 2.1135 - accuracy: 0.5047
Test accuracy: 50.47%

I am assuming that something is wrong with how I am preprocessing the data, but I cannot find why this is happening or what I am doing wrong. I would be glad if you could take a look at what needs to be done to my code or data. Thank you for open-sourcing this amazing project.

Cull scribbles

This is true for many images, but people get frustrated and then scribble out their drawing. (try "tiger" for example.) It would be nice to cull out the ones where the artist said "to heck with it" and scratched out their drawing.


How is the stroke data represented (square examples)?

I've been scouring this GitHub repo (reading its source code) and reading the following documentation: https://quickdraw.readthedocs.io/en/latest/api.html#

But I don't understand the structure of some_image.strokes.

I've filtered all squares to have no_of_strokes == 4. Looking at the images, this is indeed the case as the strokes are subtly separated from each other.

But then when I inspect the data, I see things such as:

Square 1 (a perfectly fine square upon image inspection)

[[(42, 7), (31, 63), (16, 124), (0, 229)], 
[(40, 0), (255, 11)], 
[(7, 229), (116, 229), (211, 223)], 
[(251, 18), (233, 211), (227, 218), (213, 219)]]

Square 2 (another perfectly fine square upon image inspection)

[[(1, 18), (0, 48), (8, 117), (19, 196), (21, 203), (25, 203)], 
[(14, 5), (55, 4), (105, 8), (253, 3)], 
[(33, 207), (86, 192), (117, 187), (255, 188)], 
[(246, 0), (240, 5), (239, 18), (245, 108), (247, 198)]]

My question: I always thought that lines were drawn in the format of:
(x1, y1) to (x2, y2).

So why are there more than 2 coordinates per array? I'd expect the data to be:

[ [(x1, y1), (x2, y2)],
 [(x1, y1), (x2, y2)],
 [(x1, y1), (x2, y2)],
 [(x1, y1), (x2, y2)] ]

I hope that someone knows, and is kind enough to clear the confusion.
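
Each inner array is a polyline rather than a single segment: consecutive points are joined, so a stroke of n points draws n-1 connected segments. A small sketch expanding one stroke into (x1, y1) -> (x2, y2) pairs:

def stroke_to_segments(stroke):
    # stroke: list of (x, y) points; consecutive points form the segments
    return list(zip(stroke, stroke[1:]))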

Reconstruct high quality 28x28 .npy files from binary files

First off, thank you so much for providing such a helpful dataset, it truly is a goldmine!

I am having trouble reconstructing the 28x28 images from the binary files provided in this dataset.
What are the actual steps in order to reconstruct the 28x28 images in the provided quality from the binary files?

My issue is quite similar to #15 but I managed to get intermediary results.

Here is my current progress:

  • Using the examples/binary_file_parser.py I am able to reconstruct the image in any given size by handling the stroke paths.

  • I am also able to use some blurring to smooth the image.

  • However the quality of the reconstructed image is nowhere near the 28x28 images dataset provided in this repository.

This is an original image from the 28x28 .npy dataset:
original_npy

This is my reconstruction using no blurring technique:
reconstruction_no_blur

And this is my reconstruction using a (2, 2) blur kernel in OpenCV:
reconstruction_blur_2

Any idea on how to reconstruct these images in the quality that is available when downloading the 28x28 .npy files?
Are there more advanced smoothing and filtering techniques that I have been missing?

Any way to export the image as .png/.jpg from the ndjson/bin file?

I don't understand how this block describes the image. Is there any way I can export the image as png/jpg?
example block:
{ 'key_id': 6545061917491200, 'image': [((54, 37, 7, 5, 12, 0, 7, 34, 52, 56, 52, 42, 38), (66, 49, 8, 11, 63, 105, 105, 89, 84, 88, 149, 205, 255)), ((59, 62, 88, 95, 97), (66, 13, 1, 1, 33))], 'recognized': 1, 'countrycode': 'SE', 'timestamp': 1485553794 }
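
A minimal sketch of rasterizing such a block with Pillow, assuming the 'image' field holds (xs, ys) stroke tuples as shown above:

from PIL import Image, ImageDraw

def strokes_to_png(strokes, path, size=256):
    img = Image.new('L', (size, size), 255)  # white canvas
    draw = ImageDraw.Draw(img)
    for xs, ys in strokes:
        draw.line(list(zip(xs, ys)), fill=0, width=2)  # one black polyline per stroke
    img.save(path)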

Exporting Drawings as Raw SVGs

I want to start off by giving tremendous props to the Quick Draw team for releasing this data set to the public. I can't wait to start digging in!

I'm wondering if anyone would be able to provide documentation on how to generate these drawings as individual SVGs. Or even, at the very least, where I should start looking to be able to do it myself.

Thanks!

One-to-one correspondence between datasets in different formats

Hi,

Thanks for sharing this awesome dataset. I want to use both the bitmap dataset and the meta information contained in the original raw dataset. So I am wondering if there is a one-to-one correspondence between entries with the same indices in these two datasets?

Thanks.

How to convert the preprocessed bin file to Numpy data?

I am wondering how the preprocessed bin data was converted to the Numpy (28x28) data.

In the Numpy data, I see that each location has a value between 0 and 255 (not just 1). How was this value arrived at? I thought the original stroke data contained only x and y coordinates, and that for each such x,y coordinate we set a 1, but this does not seem to be the case. Is there a pointer to any algorithm to convert the stroke data to the numpy array?
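
One plausible explanation (an assumption, not the official pipeline; the README links to the actual generation snippet): the strokes are drawn on a larger canvas and downsampled with anti-aliasing, which produces the intermediate 0..255 gray values along line edges. A sketch of that idea with Pillow:

from PIL import Image, ImageDraw

def strokes_to_bitmap(strokes, canvas=256, out=28):
    img = Image.new('L', (canvas, canvas), 0)
    draw = ImageDraw.Draw(img)
    for xs, ys in strokes:
        draw.line(list(zip(xs, ys)), fill=255, width=8)
    # Anti-aliased downsampling yields grayscale values between 0 and 255
    return img.resize((out, out), Image.LANCZOS)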

Flip flops!!!?

They are not flip flops; they were invented in NZ and we named them jandals. I was disgusted when I was told to draw flip flops!!!?

filtering images

I used this data to create a Kaggle challenge for my students last semester (https://www.kaggle.com/c/pictionary/leaderboard). There are some disturbing images that people have drawn: swastikas, penises, ... These emerged after examining the misclassified images from the classification model built to separate kangaroo, crab, banana, boomerang, cactus and flip flops. It might be useful to filter the dataset to remove these.
