continuumio / anaconda-package-data
Conda package download data
License: Creative Commons Attribution 4.0 International
March data missing and pkg_python info missing. Issue moved from: conda-incubator/condastats#15
Can we please add nvidia channel (https://anaconda.org/nvidia) so we can get download stats for all packages within?
Currently, I don't see any download counts using condastats.
Can we please add the mindspore channel (https://anaconda.org/mindspore)? We are working on open source evaluation, and need mindspore download stats from condastats.
I ran into RuntimeError: Decompression 'SNAPPY' not available. Options: ['GZIP', 'UNCOMPRESSED']
while using the binder notebook in this repo.
Firstly: this is a great data source. Thanks for providing it!
I'd love to be able to get the same type of data for a specific anaconda cloud channel that isn't one of the big ones (i.e. not anaconda, conda-forge, or bioconda) so that I can more easily track adoption by OS and Python version for the packages we distribute. Is there an API (or scripts) that I can use for this?
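Once a channel's data is in the dataset, the per-OS and per-Python breakdown itself is a simple aggregation. A minimal sketch with pandas, using column names assumed from this repo's schema (pkg_platform, pkg_python, counts) and entirely hypothetical rows:

```python
import pandas as pd

# Hypothetical rows mimicking the anaconda-package-data schema
# (column names assumed from the repo's README).
rows = pd.DataFrame(
    {
        "pkg_name": ["mypkg", "mypkg", "mypkg", "mypkg"],
        "pkg_platform": ["linux-64", "linux-64", "win-64", "osx-64"],
        "pkg_python": ["3.10", "3.11", "3.10", "3.10"],
        "counts": [120, 80, 40, 10],
    }
)

# Adoption by OS (platform) and by Python version for one package
by_platform = rows.groupby("pkg_platform")["counts"].sum()
by_python = rows.groupby("pkg_python")["counts"].sum()
print(by_platform.to_dict())  # {'linux-64': 200, 'osx-64': 10, 'win-64': 40}
print(by_python.to_dict())    # {'3.10': 170, '3.11': 80}
```

The same groupby works on the real hourly Parquet files once they are loaded with dask or pandas.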
As requested by @sophiamyang, I am passing on an issue I opened for condastats, since that package depends on the data pipeline in this very repo:
Unable to use condastats.cli.overall (internal error on pandas->pyArrow)
dataconda = condastats.cli.overall([conda_module], monthly=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[...]/lib/python3.11/site-packages/condastats/cli.py", line 62, in overall
df = dd.read_parquet(
^^^^^^^^^^^^^^^^
File "[...]/python3.11/site-packages/dask/backends.py", line 138, in wrapper
raise type(e)(
ValueError: An error occurred while calling the read_parquet method registered to the pandas backend.
Original Message: ArrowStringArray requires a PyArrow (chunked) array of string type
Thank you for making this data and the documented methods available - fantastic stuff!
I noticed that when attempting to use the intake methods from the README.md, there are Pandas/PyArrow errors with recent versions of Pandas (>=v2.0.0). This appears to also affect condastats, though maybe through different means. I imagine, but don't know, that this could be a Pandas or Dask DataFrame issue at the core, but I also wondered about data type management within the Parquet files related to this repo (for example, are there incompatible types which users should be made aware of?). While the fix might be an external issue, maybe this report could help with increased or updated documentation here.
Specifically, the errors I most often saw were:
ValueError: An error occurred while calling the read_parquet method registered to the pandas backend.
Original Message: ArrowStringArray requires a PyArrow (chunked) array of string type
There also may have been errors regarding "Pandas categorical types".
I worked around the issue by looking at the last modified date of the README.md (around January 2020) and installing a version of Pandas from around that time (v1.3.5 worked for me).
Hello Anaconda team. We would like to retrieve anaconda download statistics for PyTorch packages.
For this we would need to add the following channels to the anaconda-package-data repo:
pytorch : https://anaconda.org/pytorch/
pytorch-test : https://anaconda.org/pytorch-test/
This way we can query them using the condastats package.
It appears the last data uploaded was for August. Would it be possible to include the last 2 months?
I have installed Anaconda3-2021.05-windows-x86_64.exe, but no package named "Crypto" is found. Does this package exist only in the Linux version?
Hi,
Is there some threshold or rule for inclusion in the stats? The package I'm looking for but can't find is conda-forge/arcticdb.
https://anaconda.org/conda-forge/arcticdb
The package page says 50k downloads but I can't find it in the monthly parquet files.
Thanks,
Installing Anaconda on Linux Mint (a distro based on Ubuntu) runs into problems due to the missing keyword "linuxmint" in vscode.py when detecting the OS type. Presently, only "debian" and "ubuntu" are listed in this file for the branch of Linux distros using deb package managers. As a result, running anaconda-navigator fails without this keyword.
The file supplied has the additions needed for Linux Mint. It runs perfectly. Location:
~/anaconda3/lib/python3.7/site-packages/anaconda_navigator/api/external_apps/vscode.py
vscode.py.zip
I get this error when trying to get download information from anaconda.
SyntaxError: invalid non-printable character U+202F
It was fine in June, but this started in July.
The examples in the binder notebook are failing with this error:
>>> df = dd.read_parquet('s3://anaconda-package-data/conda/hourly/2018/12/2018-12-31.parquet',
... storage_options={'anon': True})
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-37350afb994b> in <module>
1 df = dd.read_parquet('s3://anaconda-package-data/conda/hourly/2018/12/2018-12-31.parquet',
----> 2 storage_options={'anon': True})
/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/dataframe/io/parquet/core.py in read_parquet(path, columns, filters, categories, index, storage_options, engine, gather_statistics, **kwargs)
135 if hasattr(path, "name"):
136 path = stringify_path(path)
--> 137 fs, _, paths = get_fs_token_paths(path, mode="rb", storage_options=storage_options)
138
139 paths = sorted(paths, key=natural_sort_key) # numeric rather than glob ordering
/srv/conda/envs/notebook/lib/python3.7/site-packages/fsspec/core.py in get_fs_token_paths(urlpath, mode, num, name_function, storage_options, protocol)
313 cls = get_filesystem_class(protocol)
314
--> 315 options = cls._get_kwargs_from_urls(urlpath)
316 path = cls._strip_protocol(urlpath)
317 update_storage_options(options, storage_options)
AttributeError: type object 'S3FileSystem' has no attribute '_get_kwargs_from_urls'
I guess s3fs changed the API in a recent version and should be pinned in environment.yml.
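A minimal sketch of such a pin, assuming the binder environment is defined in environment.yml; the package list and version bound shown here are placeholders, not the confirmed fix:

```yaml
# environment.yml (fragment) - illustrative only; the exact s3fs release
# that matches the notebook's fsspec is not stated in this issue.
dependencies:
  - dask
  - intake
  - s3fs=0.4  # placeholder version; pin to a release tested with the notebook
```

Pinning both s3fs and fsspec to a known-good pair would keep the binder examples reproducible.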
when running this (both from binder and in my own conda environment, python 3.7, both on windows and linux):
cat = intake.open_catalog('https://raw.githubusercontent.com/ContinuumIO/anaconda-package-data/master/catalog/anaconda_package_data.yaml')
df = cat.anaconda_package_data_by_year(year=2019).to_dask()
I get the following error:
ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
This used to work one month ago or so. Any ideas of what's wrong?
It seems to work fine if I say year=2018.
Thanks!
Hi,
I am using the condastats package, which relies on anaconda-package-data. When running
import condastats.cli
condastats.cli.overall('numpy')
I get the error message
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 110, in _error_wrapper
return await func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/aiobotocore/client.py", line 265, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/condastats", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/condastats/cli.py", line 387, in main
overall(
File "/usr/local/lib/python3.8/site-packages/condastats/cli.py", line 87, in overall
df = df.compute()
File "/usr/local/lib/python3.8/site-packages/dask/base.py", line 315, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/local/lib/python3.8/site-packages/dask/base.py", line 598, in compute
results = schedule(dsk, keys, **kwargs)
File "/usr/local/lib/python3.8/site-packages/dask/threaded.py", line 89, in get
results = get_async(
File "/usr/local/lib/python3.8/site-packages/dask/local.py", line 511, in get_async
raise_exception(exc, tb)
File "/usr/local/lib/python3.8/site-packages/dask/local.py", line 319, in reraise
raise exc
File "/usr/local/lib/python3.8/site-packages/dask/local.py", line 224, in execute_task
result = _execute_task(task, data)
File "/usr/local/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/usr/local/lib/python3.8/site-packages/dask/optimization.py", line 990, in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/usr/local/lib/python3.8/site-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/usr/local/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/usr/local/lib/python3.8/site-packages/dask/dataframe/io/parquet/core.py", line 89, in __call__
return read_parquet_part(
File "/usr/local/lib/python3.8/site-packages/dask/dataframe/io/parquet/core.py", line 587, in read_parquet_part
dfs = [
File "/usr/local/lib/python3.8/site-packages/dask/dataframe/io/parquet/core.py", line 588, in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
File "/usr/local/lib/python3.8/site-packages/dask/dataframe/io/parquet/arrow.py", line 435, in read_partition
arrow_table = cls._read_table(
File "/usr/local/lib/python3.8/site-packages/dask/dataframe/io/parquet/arrow.py", line 1518, in _read_table
arrow_table = _read_table_from_path(
File "/usr/local/lib/python3.8/site-packages/dask/dataframe/io/parquet/arrow.py", line 239, in _read_table_from_path
return pq.ParquetFile(fil, **pre_buffer).read(
File "/usr/local/lib/python3.8/site-packages/pyarrow/parquet/__init__.py", line 277, in __init__
self.reader.open(
File "pyarrow/_parquet.pyx", line 1213, in pyarrow._parquet.ParquetReader.open
File "/usr/local/lib/python3.8/site-packages/fsspec/spec.py", line 1578, in read
out = self.cache._fetch(self.loc, self.loc + length)
File "/usr/local/lib/python3.8/site-packages/fsspec/caching.py", line 41, in _fetch
return self.fetcher(start, stop)
File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 2030, in _fetch_range
return _fetch_range(
File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 2173, in _fetch_range
resp = fs.call_s3(
File "/usr/local/lib/python3.8/site-packages/fsspec/asyn.py", line 86, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/fsspec/asyn.py", line 66, in sync
raise return_result
File "/usr/local/lib/python3.8/site-packages/fsspec/asyn.py", line 26, in _runner
result[0] = await coro
File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 332, in _call_s3
return await _error_wrapper(
File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 137, in _error_wrapper
raise err
PermissionError: Access Denied
The owner of condastats asked me to open an issue here (see conda-incubator/condastats#16).
Thank you very much for your kind help,
Cheers,
Tom.
It would be helpful to include both .conda and .tar.bz2 packages, particularly as more of the former and fewer of the latter are produced. It may also help to track these separately, to follow the transition to the newer format.
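As a sketch of the separate tracking suggested above (the filenames here are hypothetical), the two formats can be told apart by extension:

```python
from collections import Counter

# Hypothetical artifact filenames; counting the two conda package formats
# separately makes the transition to .conda visible over time.
filenames = [
    "numpy-1.26.0-py311h64a7726_0.conda",
    "numpy-1.24.4-py38h10c12cc_0.tar.bz2",
    "pandas-2.1.1-py311h320fe9a_1.conda",
    "scipy-1.10.1-py38h10c12cc_0.tar.bz2",
]

def pkg_format(name: str) -> str:
    """Classify a package artifact by its format extension."""
    return ".conda" if name.endswith(".conda") else ".tar.bz2"

by_format = Counter(pkg_format(f) for f in filenames)
print(by_format)  # Counter({'.conda': 2, '.tar.bz2': 2})
```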
Hi!
The cudatoolkit package at https://anaconda.org/conda-forge/cudatoolkit is very old. Is it possible to update it to the latest version (12.2.0)?
Thanks!
Description
Using condastats, the data show an exponential increase in downloads over the last few months. While we're confident in the quality of our package ;-), this seems unrealistic and, in any case, unexpected (×100 between 2023-12 and 2024-05!).
Do you have any idea why these variations are occurring?
condastats overall pyagrum --monthly
[...]
2023-08 2484
2023-09 2433
2023-10 4560
2023-11 3154
2023-12 1114
2024-01 2829
2024-02 2812
2024-03 12573
2024-04 66098
2024-05 110944
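As a quick sanity check on the scale of the jump, using the counts quoted above:

```python
# Monthly counts copied from the condastats output quoted in this issue
counts = {
    "2023-12": 1114,
    "2024-03": 12573,
    "2024-04": 66098,
    "2024-05": 110944,
}

# Overall growth factor between 2023-12 and 2024-05
overall = counts["2024-05"] / counts["2023-12"]
# Month-over-month jump from March to April 2024
growth_apr = counts["2024-04"] / counts["2024-03"]

print(round(overall))        # 100  (roughly the ×100 the reporter mentions)
print(round(growth_apr, 1))  # 5.3
```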
Thank you for any hints, explanation or information on this subject
(Copy of conda-incubator/condastats#22)
We are facing access issues with the March data files at the S3 path below.
s3://anaconda-package-data/conda/hourly/2023/03/
Error: fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
pyviz.org fetches data from here to display download stats for various viz and dashboard packages and related projects. Among those, plotly has its own conda channel and gets downloaded from there a non-negligible number of times. Could the plotly channel be added to the dataset?
It seems that the database holding a given month's daily download data is populated monthly, not daily.
As of today (2022-06-16), download data for 2022-06-01 through 2022-06-15 is not available, which makes it hard to collect statistics (e.g., the download count for the last 30 days).
It would be great if the database were updated daily.
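For anyone scripting a last-30-days count in the meantime, the daily file keys can be enumerated from the path pattern seen elsewhere in these issues; given the monthly update cadence, recent keys may simply not exist yet:

```python
from datetime import date, timedelta

def hourly_paths(end: date, days: int = 30) -> list[str]:
    """Build the expected S3 keys for the last `days` days of hourly data,
    using the pattern s3://anaconda-package-data/conda/hourly/YYYY/MM/YYYY-MM-DD.parquet.
    Existence of each key depends on how far the pipeline has caught up."""
    return [
        f"s3://anaconda-package-data/conda/hourly/{d:%Y}/{d:%m}/{d:%Y-%m-%d}.parquet"
        for d in (end - timedelta(n) for n in range(days))
    ]

paths = hourly_paths(date(2022, 6, 16))
print(paths[0])  # s3://anaconda-package-data/conda/hourly/2022/06/2022-06-16.parquet
```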
There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.
Error type: Cannot find preset's package (github>anaconda/renovate-config)
It seems as if there isn't a Parquet file for March (yet?): https://s3.amazonaws.com/anaconda-package-data/conda/monthly/2024/2024-03.parquet
Request from @jakirkham on behalf of the RAPIDS team.
Originally filed in conda/infrastructure#660