tofunlp / lineflow
:zap: A Lightweight NLP Data Loader for All Deep Learning Frameworks in Python
Home Page: https://towardsdatascience.com/lineflow-introduction-1caf7851125e
License: MIT License
The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.
from lineflow import Dataset
from lineflow.reducers import FlatMap, Filter
ds = Dataset(range(100)).map(lambda x: [x] * 3)
_ = Filter(lambda x: x % 2 == 0)(ds)
_ = FlatMap(lambda x: x)(ds)
class Dataset:
    def __len__(self):
        return len(self._dataset)
If cache exists, load it.
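The behaviour requested here can be sketched in plain Python. This is only an illustration of the load-if-cached pattern, not lineflow's implementation: the helper name `load_or_build`, the pickle format, and the file name `squares.pkl` are all assumptions made for the example.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(cache_path, build):
    """Return the cached object if cache_path exists; otherwise build and cache it.

    Hypothetical helper for illustration; not part of lineflow.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: skip the (possibly expensive) build step.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    # Cache miss: build the data, then store it for the next call.
    data = build()
    with cache_path.open("wb") as f:
        pickle.dump(data, f)
    return data


calls = []  # records how many times the build function actually ran


def build_squares():
    calls.append(1)
    return [x * x for x in range(5)]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "squares.pkl"
    first = load_or_build(path, build_squares)   # builds and writes the cache
    second = load_or_build(path, build_squares)  # loads from the cache
```

The second call reads the pickle instead of rebuilding, so `build_squares` runs only once.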
Datasets
SNLI dataset: https://nlp.stanford.edu/projects/snli/
MultiNLI dataset: https://www.nyu.edu/projects/bowman/multinli/
Please provide each dataset individually, as well as the two combined as ALLNLI.
lineflow.Dataset.load to lineflow.load
Is your feature request related to a problem? Please describe.
train = lfds.SciTLDR(split="train") # IterableDataset
train_mini = train[:10] # Now this is just a python list (List[Any])
If I take a subset of a dataset, it loses all of its features, such as .map.
Describe the solution you'd like
Return an IterableDataset instead of a List.
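One way to get this behaviour is to have `__getitem__` wrap slices in the same class. The sketch below uses a hypothetical `SliceableDataset` stand-in, not lineflow's actual classes, and its `map` is eager purely to keep the example short.

```python
from typing import Any, Callable, Sequence, Union


class SliceableDataset:
    """Hypothetical stand-in for a dataset class; not lineflow's actual API."""

    def __init__(self, data: Sequence[Any]) -> None:
        self._data = data

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index: Union[int, slice]) -> Any:
        if isinstance(index, slice):
            # Wrap the sliced data so the result keeps dataset methods like .map.
            return SliceableDataset(self._data[index])
        return self._data[index]

    def map(self, func: Callable[[Any], Any]) -> "SliceableDataset":
        # Eager mapping for brevity; a real dataset may apply func lazily.
        return SliceableDataset([func(x) for x in self._data])


train = SliceableDataset(list(range(100)))
train_mini = train[:10]                    # still a SliceableDataset, not a list
doubled = train_mini.map(lambda x: x * 2)  # .map remains available on the subset
```

Integer indexing still returns a single item, so only the slice path changes.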
If a directory doesn't exist, make it.
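With the standard library this is a single call, `os.makedirs(path, exist_ok=True)`. The wrapper name `ensure_dir` and the `cache/datasets` path below are made up for the example and do not come from lineflow.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path (including missing parents) if it does not exist yet.

    Hypothetical helper for illustration; not part of lineflow.
    """
    # exist_ok=True makes the call a no-op when the directory is already there.
    os.makedirs(path, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "cache", "datasets")
    ensure_dir(target)
    ensure_dir(target)  # calling again does not raise
    created = os.path.isdir(target)
```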
Dependabot couldn't authenticate with https://pypi.org/simple/.
You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.
Describe the bug
The Google Drive link for downloading the WMT14 dataset is no longer available.
To Reproduce
import lineflow.datasets as lfds
train_dataset = lfds.Wmt14("train")
Additional context
I can try finding a working URL when I have some time.
Currently, TextDataset can handle a single file or multiple files, and can also combine two or more files with the zip or concat method.
But all of this is implemented in a single class, which makes it hard to understand.
So I'd like to break it into multiple classes.
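A rough sketch of what such a split could look like, with one class per responsibility. The class names `SingleTextDataset`, `ZipTextDataset`, and `ConcatTextDataset` are hypothetical and not taken from lineflow, and the files are read eagerly here only to keep the example small.

```python
import os
import tempfile
from typing import List, Sequence, Tuple


class SingleTextDataset:
    """Lines of one text file (hypothetical name; read eagerly for simplicity)."""

    def __init__(self, path: str, encoding: str = "utf-8") -> None:
        with open(path, encoding=encoding) as f:
            self._lines: List[str] = [line.rstrip("\n") for line in f]

    def __len__(self) -> int:
        return len(self._lines)

    def __getitem__(self, index: int) -> str:
        return self._lines[index]


class ZipTextDataset:
    """Pairs up the i-th lines of several datasets (hypothetical name)."""

    def __init__(self, datasets: Sequence[SingleTextDataset]) -> None:
        self._datasets = list(datasets)

    def __len__(self) -> int:
        # Like zip(), stop at the shortest dataset.
        return min(len(d) for d in self._datasets)

    def __getitem__(self, index: int) -> Tuple[str, ...]:
        if not 0 <= index < len(self):
            raise IndexError("ZipTextDataset index out of range")
        return tuple(d[index] for d in self._datasets)


class ConcatTextDataset:
    """Chains several datasets end to end (hypothetical name)."""

    def __init__(self, datasets: Sequence[SingleTextDataset]) -> None:
        self._datasets = list(datasets)

    def __len__(self) -> int:
        return sum(len(d) for d in self._datasets)

    def __getitem__(self, index: int) -> str:
        if index < 0:
            raise IndexError("negative indices are not supported in this sketch")
        # Walk the datasets, shifting the index past each one that is too short.
        for d in self._datasets:
            if index < len(d):
                return d[index]
            index -= len(d)
        raise IndexError("ConcatTextDataset index out of range")


with tempfile.TemporaryDirectory() as tmp:
    en_path = os.path.join(tmp, "en.txt")
    ja_path = os.path.join(tmp, "ja.txt")
    with open(en_path, "w", encoding="utf-8") as f:
        f.write("hello\nworld\n")
    with open(ja_path, "w", encoding="utf-8") as f:
        f.write("konnichiwa\nsekai\n")

    en = SingleTextDataset(en_path)
    ja = SingleTextDataset(ja_path)
    zipped = ZipTextDataset([en, ja])
    concatenated = ConcatTextDataset([en, ja])
    pair = zipped[1]
    third = concatenated[2]
```

Each class stays small enough to read on its own, and the zip and concat behaviours no longer share branching logic inside one constructor.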
yield line.rstrip(os.linesep)
dask.highlevelgraph.HighLevelGraph
dask.base.compute
Add more informative explanation.