The zozo-shift15m's intro from nocotan

[arXiv]

The main motivation of the SHIFT15M project is to provide a dataset that contains natural dataset shifts collected from a web service that was actually in operation for several years. In addition, the SHIFT15M dataset has several types of dataset shifts, allowing us to evaluate the robustness of the model to different types of shifts (e.g., covariate shift and target shift).

We provide the Datasheet for SHIFT15M. This datasheet is based on the Datasheets for Datasets [1] template.

System	Python 3.6	Python 3.7	Python 3.8
Linux CPU
Linux GPU
Windows CPU / GPU	Status Currently Unavailable	Status Currently Unavailable	Status Currently Unavailable
Mac OS CPU

SHIFT15M is a large-scale dataset based on approximately 15 million items accumulated by the fashion search service IQON.

Installation

(WIP) From PyPi

$ pip install shift15m

From Source

$ git clone https://github.com/st-tech/zozo-shift15m.git
$ cd zozo-shift15m
$ poetry build
$ pip install dist/shift15m-xxxx-py3-none-any.whl

Download SHIFT15M dataset

(WIP) Use Dataset class

You can download SHIFT15M dataset as follows:

from shift15.datasets import NumLikesRegression

dataset = NumLikesRegression(root="./data", download=True)

Download Directly by using download scripts

Please download the dataset as follows:

$ bash scripts/download_all.sh

To avoid downloading the test dataset for set matching (80GB), which is not required in training, you can use the following script.

$ bash scripts/download_all_wo_set_testdata.sh

Tasks

The following tasks are now available:

Tasks	Task type	Shift type	# of input dim	# of output dim
NumLikesRegression	regression	target shift	(N,25)	(N,1)
SumPricesRegression	regression	covariate shift, target shift	(N, 1)	(N, 1)
ItemPriceRegression	regression	target shift	(N, 4096)	(N, 1)
ItemCategoryClassification	classification	target shift	(N, 4096)	(N, 7)
Set2SetMatching	set-to-set matching	covariate shift	(N,4096)x(M,4096)	(1)

Benchmarks

As templates for numerical experiments on the SHIFT15M dataset, we have published experimental results for each task with several models.

Original Dataset Structure

The original dataset is maintained in json format, and a row consists of the following:

{
  "user":{"user_id":"xxxx"},
  "like_num":"xx",
  "set_id":"xxx",
  "items":[
    {"price":"xxxx","item_id":"xxxxxx","category_id1":"xx","category_id2":"xxxxx"},
    ...
  ],
  "publish_date":"yyyy-mm-dd"
}

Contributing

To learn more about making a contribution to SHIFT15M, please see the following materials:

LICENSE

The dataset itself is provided under a CC BY-NC 4.0 license. On the other hand, the software in this repository is provided under the MIT license.

Dataset Metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

property value

name SHIFT15M Dataset

alternateName SHIFT15M

alternateName shift15m-dataset

url https://github.com/st-tech/zozo-shift15m

sameAs https://github.com/st-tech/zozo-shift15m

description SHIFT15M is a multi-objective, multi-domain dataset which includes multiple dataset shifts.

provider

property	value
name	`ZOZO Research`
sameAs	`https://ja.wikipedia.org/wiki/ZOZO`

license

property	value
name	`CC BY-NC 4.0`
url	`https://github.com/st-tech/zozo-shift15m/blob/main/LICENSE.CC`

Citation

@misc{Kimura_SHIFT15M_Multiobjective_LargeScale_2021,
author = {Kimura, Masanari and Nakamura, Takuma and Saito, Yuki},
month = {8},
title = {SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts},
year = {2021}
}

Errata

No errata are currently available.

References

[1] Gebru, Timnit, et al. "Datasheets for datasets." arXiv preprint arXiv:1803.09010 (2018).

nocotan / zozo-shift15m Goto Github PK

zozo-shift15m's Introduction

Installation

(WIP) From PyPi

From Source

Download SHIFT15M dataset

(WIP) Use Dataset class

Download Directly by using download scripts

Tasks

Benchmarks

Original Dataset Structure

Contributing

LICENSE

Dataset Metadata

Citation

Errata

References

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent