Comments (3)
@dberenbaum, `Tree.load()` is used to load a `.dir` file from the cache. `Tree.from_list` is a bit more generic in that it can also load from the `files` section in `.dvc` files. `Tree` loading creates 2 objects, `Meta` and `HashInfo`, and the profiling above is for CelebA, where we build 500k of those instances. You can see above that it takes 1.5s just to load that into memory.
Of course, we are not doing anything bad here. It's just Python being slow; we have already optimized this in terms of memory and performance. I opened this issue to explore other ways to improve this.
This affects a lot of commands. E.g., `data status` has to load all `.dir` files into memory twice (from Git HEAD and from the hash in the `.dvc` file). So a lot of time is lost just loading this into memory.
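To get a rough feel for the per-entry object cost described above, here is a minimal sketch. It does not use dvc-data at all: `FakeMeta` and `FakeHashInfo` are hypothetical stand-ins for `Meta` and `HashInfo`, and the entry layout (`relpath`, `md5`, `size`) and the one-pair-per-entry construction are assumptions made for illustration. Timings will vary by machine and are not meant to reproduce the 1.5s figure.

```python
import time


class FakeMeta:
    """Hypothetical stand-in for Meta (not the real dvc-data class)."""

    __slots__ = ("size", "isexec")

    def __init__(self, size=None, isexec=False):
        self.size = size
        self.isexec = isexec


class FakeHashInfo:
    """Hypothetical stand-in for HashInfo (not the real dvc-data class)."""

    __slots__ = ("name", "value")

    def __init__(self, name, value):
        self.name = name
        self.value = value


def make_raw_entries(n):
    # Assumed entry layout, roughly what a parsed JSON listing could look like.
    return [
        {"relpath": f"img/{i:06d}.jpg", "md5": f"{i:032x}", "size": i}
        for i in range(n)
    ]


def build_entries(raw):
    # For every raw dict, build a key tuple plus two small Python objects.
    entries = []
    for item in raw:
        key = tuple(item["relpath"].split("/"))
        meta = FakeMeta(size=item.get("size"))
        hash_info = FakeHashInfo("md5", item["md5"])
        entries.append((key, meta, hash_info))
    return entries


def time_build(n):
    # Time only the object construction, not the generation of raw dicts.
    raw = make_raw_entries(n)
    start = time.perf_counter()
    entries = build_entries(raw)
    elapsed = time.perf_counter() - start
    return len(entries), elapsed


if __name__ == "__main__":
    count, elapsed = time_build(500_000)
    print(f"built {count} entries in {elapsed:.2f}s")
```

Even with `__slots__`, the loop is dominated by interpreter overhead for allocating many small objects, which is the kind of cost the comment attributes to "Python being slow".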
from dvc-data.
I played with Cython (which required 0 code changes) and got a 20-30% improvement. But it will require a lot of infrastructure changes. It has to be a 5x-10x improvement to be worth it here.
@skshetry Do you have a more "high-level" use case where you can show how much of an impact it has?