Comments (6)
So I wonder if we should give the user the option to skip the test for altered media.
I think that would be an easy and good solution. In the long run it could also mean that you don't have to download all media files with audb.load_to() if you just want to fix the header or tables.
from audb.
But of course it's not exactly the same, as you might not alter existing media files, but add new ones.
The "Find media" part does indeed seem to be the slowest part of publishing a database. We cannot easily avoid this for new data (besides maybe offering a way to supply pre-calculated values?).
But for updating large databases we should definitely provide an option to skip it.
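To make the trade-off concrete, here is a minimal sketch of what such an option could look like. It is not audb's implementation: `find_altered_media`, its `skip_check` flag, and the `previous_checksums` mapping are hypothetical names, and using MD5 as the checksum is an assumption.

```python
import hashlib
import os


def md5(path, chunk_size=8192):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_altered_media(build_dir, files, previous_checksums, skip_check=False):
    """Return the files that are new or differ from the previous version.

    previous_checksums maps a relative file path to the checksum
    stored for the previously published version.
    With skip_check=True (hypothetical flag), files known from the
    previous version are trusted and no checksum is computed for them.
    """
    altered = []
    for file in files:
        if file not in previous_checksums:
            # New file: it always has to be published
            altered.append(file)
        elif not skip_check:
            # Known file: the expensive part is hashing it again
            if md5(os.path.join(build_dir, file)) != previous_checksums[file]:
                altered.append(file)
    return altered
```

With `skip_check=True` the loop only does dictionary lookups, which is where the speed-up for large databases would come from, at the cost of not noticing a silently altered file.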
The speed of checking existing media has increased, but it might still be a problem when you have a large number of files. On the other hand, when adding the argument to skip checking, we introduce a possible source of error during publication.
The worst use case is when you neither upload nor alter media, but only change the metadata.
> On the other hand when adding the argument to skip checking, we introduce a possible source of error during publication.
I would say we could take that risk, given the extreme speed-up we would gain.
#216 now implements a solution without adding a new argument. Media files that are referenced in the tables and are part of the previous version no longer need to exist in the build folder, since we can safely assume that those files remain unchanged.
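The rule described for #216 can be sketched roughly as follows. This is only an illustration of the idea, not the code from the pull request: `split_media`, `table_files`, and `previous_files` are hypothetical names, and raising `FileNotFoundError` for files that are neither in the build folder nor in the previous version is an assumption.

```python
import os


def split_media(build_dir, table_files, previous_files):
    """Split files referenced in the tables into two groups.

    Returns (to_check, unchanged):
    - to_check: files present in the build folder, which still
      need to be compared against the previous version
    - unchanged: files absent from the build folder but part of
      the previous version, assumed to be unaltered
    """
    previous_files = set(previous_files)
    to_check, unchanged, missing = [], [], []
    for file in table_files:
        if os.path.exists(os.path.join(build_dir, file)):
            to_check.append(file)
        elif file in previous_files:
            unchanged.append(file)
        else:
            # Neither on disk nor published before (assumed to be an error)
            missing.append(file)
    if missing:
        raise FileNotFoundError(
            f"Media files missing from build folder: {missing}"
        )
    return to_check, unchanged
```

For a metadata-only update the build folder can then contain no media at all, so `to_check` is empty and nothing has to be downloaded or hashed.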