thebigmunch / audio-metadata
A library for reading and, in the future, writing audio metadata. https://audio-metadata.readthedocs.io/
Home Page: https://forum.thebigmunch.me
License: MIT License
While uploading my collection with:
gms up -u somename -v --debug /srv/library/music
I encountered the following error:
Traceback (most recent call last):
File "/usr/local/lib/venv.google-music-script/bin/gms", line 10, in <module>
sys.exit(run())
File "/usr/local/lib/venv.google-music-script/lib/python3.7/site-packages/google_music_scripts/cli.py", line 920, in run
DISPATCH[command](args)
File "/usr/local/lib/venv.google-music-script/lib/python3.7/site-packages/google_music_scripts/commands.py", line 364, in do_upload
if generate_client_id(song) not in google_client_ids:
File "/usr/local/lib/venv.google-music-script/lib/python3.7/site-packages/google_music_proto/musicmanager/utils.py", line 18, in generate_client_id
song = audio_metadata.load(song)
File "/usr/local/lib/venv.google-music-script/lib/python3.7/site-packages/audio_metadata/api.py", line 82, in load
return parser_cls.load(fileobj)
File "/usr/local/lib/venv.google-music-script/lib/python3.7/site-packages/audio_metadata/formats/mp3.py", line 519, in load
self.streaminfo = MP3StreamInfo.load(self._obj)
File "/usr/local/lib/venv.google-music-script/lib/python3.7/site-packages/audio_metadata/formats/mp3.py", line 406, in load
raise InvalidFormat("Missing XING header and insufficient MPEG frames.")
audio_metadata.exceptions.InvalidFormat: Missing XING header and insufficient MPEG frames
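Going by the error message, the MP3 parser looks for a Xing (or Info) header and, when there isn't one, needs enough MPEG frames to work out the stream info instead. As background, a Xing/Info tag normally sits in the first Layer III frame, right after the 4-byte frame header and the side information, so its offset depends on the MPEG version and on whether the frame is mono. The sketch below (hypothetical helper names, stdlib only, Layer III only, not audio-metadata's own code) computes that offset, which may help when checking by hand whether a failing file has such a header:

```python
from typing import Optional


def xing_offset(frame: bytes) -> int:
    """Byte offset of a Xing/Info tag inside a Layer III MPEG frame.

    The offset is the 4-byte frame header plus the side-info size,
    which depends on the MPEG version and on mono vs. non-mono.
    """
    if len(frame) < 4:
        raise ValueError("need at least 4 bytes of frame header")
    header = int.from_bytes(frame[:4], "big")
    if header >> 21 != 0x7FF:
        raise ValueError("no MPEG frame sync at start of data")
    version_bits = (header >> 19) & 0b11  # 0b11 means MPEG-1
    channel_mode = (header >> 6) & 0b11   # 0b11 means single channel (mono)
    mono = channel_mode == 0b11
    if version_bits == 0b11:
        side_info = 17 if mono else 32
    else:
        # MPEG-2 and MPEG-2.5 (0b01 is reserved and not handled here).
        side_info = 9 if mono else 17
    return 4 + side_info


def find_xing_tag(frame: bytes) -> Optional[bytes]:
    """Return b'Xing' or b'Info' if found at the expected offset, else None."""
    offset = xing_offset(frame)
    tag = frame[offset:offset + 4]
    return tag if tag in (b"Xing", b"Info") else None
```

A file without either tag is not necessarily broken (many CBR files lack one); it just leaves the parser relying on frame counting.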
Although the gms "verbose" and "debug" options are enabled, little additional information is reported...
Here is my setup (Python 3.7 in a virtualenv):
Package Version
---------------------- -----------
ansimarkup 1.4.0
appdirs 1.4.3
attrs 18.2.0
audio-metadata 0.4.0
better-exceptions-fork 0.2.1.post6
bidict 0.17.5
bitstruct 6.0.0
certifi 2019.3.9
chardet 3.0.4
colorama 0.4.1
google-music 3.0.1
google-music-proto 2.4.0
google-music-scripts 4.0.1
google-music-utils 2.1.0
idna 2.8
loguru 0.2.5
marshmallow 2.19.2
more-itertools 4.3.0
multidict 4.5.2
natsort 6.0.0
oauthlib 3.0.1
pendulum 2.0.4
pip 19.0.3
pprintpp 0.4.0
protobuf 3.7.1
Pygments 2.4.0
python-dateutil 2.8.0
pytzdata 2019.1
requests 2.21.0
requests-oauthlib 1.2.0
setuptools 40.8.0
six 1.12.0
tenacity 5.0.4
tomlkit 0.5.3
urllib3 1.24.3
wrapt 1.11.1
So far, I have been unable to find the file that triggers this error...
Some examples:
Related to/dependent on #18.
After changing the __*attr__ methods in the tbm_utils.AttrMapping class in thebigmunch/tbm-utils@d9dd372, attribute access involving the field-mapping functionality of the Tags classes can cause failures like the one in https://github.com/thebigmunch/google-music-scripts/issues/56. These should've been implemented before the last release.
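One common pitfall with this kind of mapping (whether it is exactly what broke here isn't stated) is a __getattr__ that leaks KeyError instead of raising AttributeError, which breaks hasattr() and getattr() with a default. The sketch below is hypothetical (FieldMapping and FIELD_MAP are made-up names, not the tbm_utils.AttrMapping implementation) and routes attribute access through a field-name map:

```python
class FieldMapping(dict):
    """Hypothetical dict with attribute access and field-name aliases."""

    # Hypothetical aliases from friendly names to raw tag keys.
    FIELD_MAP = {"artist": "TPE1", "title": "TIT2"}

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        key = self.FIELD_MAP.get(name, name)
        try:
            return self[key]
        except KeyError:
            # AttributeError (not KeyError) keeps hasattr() and
            # getattr(obj, name, default) behaving as callers expect.
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute writes are stored as mapping entries under the raw key.
        self[self.FIELD_MAP.get(name, name)] = value
```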
So, the reason this is the case now in audio-metadata, mutagen, and many, if not most, other audio metadata libraries is that Vorbis comment fields and some ID3 fields support/make sense with multiple values. But, some ID3 fields and other tag systems (like MP4, RIFF, and others) do not. Some people have even suggested in style guides that some Vorbis comment fields should not be given multiple times, as they don't make sense as multiple-value fields.
Making these lists has some advantages, mainly keeping the API consistent across all tag formats and fields. However, there are some disadvantages:
So, the question is whether the API consistency is worth all the other hassles. Or vice versa.
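To make the trade-off concrete, here is a minimal sketch (hypothetical helpers, not audio-metadata's API) of the always-a-list approach plus a convenience accessor for fields that only make sense once. Every field reads the same way, at the cost of an index or a helper call for single-valued fields:

```python
from typing import Dict, List, Optional

# Hypothetical tag store that always keeps values as lists.
Tags = Dict[str, List[str]]


def get_all(tags: Tags, field: str) -> List[str]:
    """Consistent API: every field returns a list (possibly empty)."""
    return list(tags.get(field, []))


def get_first(tags: Tags, field: str, default: Optional[str] = None) -> Optional[str]:
    """Convenience accessor for fields that should only hold one value."""
    values = tags.get(field)
    return values[0] if values else default
```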
So I was doing a for loop with os.listdir(). I couldn't perform operations on these files because they were still open from audio_metadata.open(). This is easily fixable. I'll create a PR.
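Until the handle is closed by the library itself, one caller-side workaround is to read each file inside a with-block and hand the bytes to audio_metadata.loads(), the bytes-based counterpart of load(). A minimal sketch (iter_file_bytes is a hypothetical helper; reading whole files into memory is a simplification that may not suit very large files):

```python
import os
from typing import Iterator, Tuple


def iter_file_bytes(directory: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, contents) for each regular file in directory.

    Each file is read fully inside a with-block, so its handle is
    closed before the caller renames, moves or deletes the path.
    """
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            data = f.read()
        yield path, data
```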
I find that when I try to catch InvalidHeader exceptions on audio_metadata.load() the code will not execute in the 'except audio_metadata.exceptions.InvalidHeader:' block. Even a 'print("test")' does not execute if I catch the exception. My only options are to not catch the exception and have the code crash or to catch it and then raise it again to have the code crash.
I have a loop that iterates over a list of files to grab metadata from each. The first step is trying to get the metadata, and I handle exceptions right away before the rest of the loop code is executed.
The below prints "Test" and then crashes with the exception:
except audio_metadata.exceptions.InvalidHeader:
    print("Test")
    raise
The below does not print "Test" and hangs:
except audio_metadata.exceptions.InvalidHeader:
    print("Test")
    continue
In case it helps, this is the way my loop is set up:
for f in filenames:
    try:
        metadata = audio_metadata.load(f)
    except audio_metadata.exceptions.UnsupportedFormat:
        print("Error - Unsupported format. Skipping file")
        continue
    except audio_metadata.exceptions.InvalidHeader:
        print("Error - Invalid header. Skipping file")
        continue
    # <process metadata>
Sorry, I'm pretty new to adding issues to GitHub and can't figure out how to show indentation, but it is all properly indented in code.
Thank you, thebigmunch, for fixing my formatting!
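For reference, the intended control flow can be written with a single except clause over a tuple of exception types; the loader is passed in so the sketch runs without audio-metadata installed (load_all is a hypothetical helper). If "Test" is never printed, the except block was never entered, which suggests the hang may be happening inside the load call itself for some file rather than in the exception handling:

```python
from typing import Callable, Dict, Iterable, List, Tuple, Type


def load_all(
    paths: Iterable[str],
    load: Callable[[str], object],
    skip: Tuple[Type[BaseException], ...],
) -> Tuple[Dict[str, object], List[Tuple[str, BaseException]]]:
    """Load each path, skipping (but recording) files that raise `skip`."""
    loaded: Dict[str, object] = {}
    failed: List[Tuple[str, BaseException]] = []
    for path in paths:
        try:
            metadata = load(path)
        except skip as exc:
            # One except clause handles every exception type in `skip`.
            failed.append((path, exc))
            continue
        loaded[path] = metadata
    return loaded, failed
```

With the library this would be called as load_all(filenames, audio_metadata.load, (audio_metadata.exceptions.UnsupportedFormat, audio_metadata.exceptions.InvalidHeader)).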
While using google-music-scripts, I'm seeing this error:
owner, identifier = frame_data.split(b'\x00')
ValueError: too many values to unpack (expected 2)
This might be due to some misbehaving file. However, I think it should be handled more gracefully here (that's why I'm filing the issue here; if you want it at gms, I'll file it there...).
BTW, quick fix:
owner, identifier = frame_data.split(b'\x00', 1)
But I have no idea about the possible ramifications of this fix!
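Assuming the frame is laid out like UFID or PRIV (a null-terminated owner string followed by binary data that may itself contain null bytes), splitting only at the first null is the right idea, and split(b'\x00', 1) does that. It still fails to unpack if there is no null byte at all, though. A variant using bytes.partition (hypothetical helper name) makes both cases explicit:

```python
from typing import Tuple


def split_owner_identifier(frame_data: bytes) -> Tuple[bytes, bytes]:
    """Split owner string and binary payload at the first null byte only."""
    owner, sep, identifier = frame_data.partition(b"\x00")
    if not sep:
        # No terminator at all: the frame is malformed.
        raise ValueError("owner string is not null-terminated")
    return owner, identifier
```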
This has been sitting around locally for too long. Time to get some other people looking at/testing it, so it can finally get across the finish line.
For anyone willing to do some testing of MP4 support:
pip install -U git+https://github.com/thebigmunch/audio-metadata@mp4
I added a warning that is emitted when it runs into tags using data types I don't currently handle explicitly. I'd love to get any files that emit this warning.
Any questions/issues/discussion, post here.
----
tags.MP4StreamInfo, et al.
Hey there. I was really interested in your package when I was looking for an alternative to mutagen and other similar audio metadata parsers. I really liked yours because I can read directly from memory (BytesIO).
However, I noticed it lacked support for M4A containers (mainly AAC and Apple Lossless). So I was wondering if you could implement it. Thanks again.
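As a rough sketch of the first step an M4A reader would take (a hypothetical helper, not part of audio-metadata): MP4/M4A files are ISO base media files that normally begin with an 'ftyp' box, made of a 4-byte big-endian size and a 4-byte type, followed by a major brand (iTunes audio typically uses b'M4A '), a minor version and a list of compatible brands. Since it only needs a readable binary stream, an in-memory BytesIO works as well as an open file:

```python
import struct
from typing import BinaryIO, Optional, Tuple


def read_ftyp(fileobj: BinaryIO) -> Optional[Tuple[bytes, Tuple[bytes, ...]]]:
    """Return (major_brand, compatible_brands) from a leading 'ftyp' box.

    Returns None if the stream does not start with a plausible ftyp box.
    """
    header = fileobj.read(8)
    if len(header) < 8:
        return None
    size, box_type = struct.unpack(">I4s", header)
    # Sizes 0 and 1 (to-end-of-file and 64-bit) are not expected for ftyp.
    if box_type != b"ftyp" or size < 16:
        return None
    body = fileobj.read(size - 8)
    if len(body) < size - 8:
        return None
    major_brand = body[:4]
    # body[4:8] is the minor version; the rest is a list of 4-byte brands.
    compatible = tuple(body[i:i + 4] for i in range(8, len(body) - 3, 4))
    return major_brand, compatible
```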
I used to be very good about adding comments to code. But, with a reduction in hobby coding time comes sacrifices. Much of this code could benefit from some helpful comments, including links to applicable resources.
You can try it out with:
pip install -U git+https://github.com/audio-metadata@opus
So, the high-level API has the Python-traditional load(s)/dump(s) naming scheme. Since I modified the DataReader class some time ago, this isn't strictly necessary for the load(s) functions anymore. But it makes the code nicer and makes it easier to raise an exception on incorrect parameters. Also, explicit is better than implicit here, especially considering there is a need to have both dump and dumps.
So far, all of the format and component classes have been using load methods that handle everything, as they just cast the data argument to a DataReader. But this is likely confusing, or at least inconsistent.
I propose to rename the loading class builder methods to parse, taking any DataReader-compatible input as now and returning an instance of the class, and naming the dumping methods format, returning a bytes object.
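A minimal sketch of the proposed naming, assuming a reader that wraps bytes or file-like objects (DataReader and ExampleHeader here are hypothetical stand-ins, not the library's classes): parse accepts any reader-compatible input and returns an instance, and format returns bytes, so the two round-trip:

```python
import io
import struct
from typing import BinaryIO, Union


class DataReader:
    """Hypothetical stand-in: wraps bytes, file-like objects or another reader."""

    def __init__(self, data: Union[bytes, bytearray, BinaryIO, "DataReader"]):
        if isinstance(data, DataReader):
            self._f = data._f
        elif isinstance(data, (bytes, bytearray)):
            self._f = io.BytesIO(data)
        else:
            self._f = data

    def read(self, n: int) -> bytes:
        return self._f.read(n)


class ExampleHeader:
    """Hypothetical component: 2-byte version and 4-byte size, big-endian."""

    _STRUCT = struct.Struct(">HI")

    def __init__(self, version: int, size: int):
        self.version = version
        self.size = size

    @classmethod
    def parse(cls, data) -> "ExampleHeader":
        """Build an instance from any DataReader-compatible input."""
        raw = DataReader(data).read(cls._STRUCT.size)
        if len(raw) < cls._STRUCT.size:
            raise ValueError("not enough data for header")
        version, size = cls._STRUCT.unpack(raw)
        return cls(version, size)

    def format(self) -> bytes:
        """Serialize back to bytes, the counterpart of parse."""
        return self._STRUCT.pack(self.version, self.size)
```

Under this split, load(s)/dump(s) would stay the user-facing entry points while parse/format name the building blocks.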
I'm getting the following error while trying to upload a folder:
[2019-10-14 12:13:36] Logging in to Music Manager
[2019-10-14 12:13:37] Logging in to Mobile Client
[2019-10-14 12:13:38] Loading local songs
[2019-10-14 12:13:39] Comparing hashes
Traceback (most recent call last):
File "/usr/local/bin/gms", line 10, in <module>
sys.exit(run())
File "/usr/local/lib/python3.7/dist-packages/google_music_scripts/cli.py", line 920, in run
DISPATCH[command](args)
File "/usr/local/lib/python3.7/dist-packages/google_music_scripts/commands.py", line 364, in do_upload
if generate_client_id(song) not in google_client_ids:
File "/usr/local/lib/python3.7/dist-packages/google_music_proto/musicmanager/utils.py", line 21, in generate_client_id
song = audio_metadata.load(song)
File "/usr/local/lib/python3.7/dist-packages/audio_metadata/api.py", line 119, in load
return parser_cls.load(fileobj)
File "/usr/local/lib/python3.7/dist-packages/audio_metadata/formats/mp3.py", line 536, in load
self.streaminfo = MP3StreamInfo.load(self._obj)
File "/usr/local/lib/python3.7/dist-packages/audio_metadata/formats/mp3.py", line 492, in load
bitrate = ((audio_size - frames[0]._size) * 8 * frames[0].sample_rate) / num_samples
ZeroDivisionError: division by zero
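The failing line divides by num_samples, so presumably a file whose parsed frames yield zero samples reaches exactly this error. A guarded version of the same formula might look like this (a sketch with hypothetical parameter names mirroring the traceback; returning None rather than raising a more descriptive format error is a design choice):

```python
from typing import Optional


def average_bitrate(audio_size: int, first_frame_size: int,
                    sample_rate: int, num_samples: int) -> Optional[float]:
    """Average bitrate in bits per second, or None if it can't be computed.

    Same formula as the traceback line, with a guard for num_samples == 0,
    which is what raises ZeroDivisionError there.
    """
    if num_samples <= 0:
        return None
    return ((audio_size - first_frame_size) * 8 * sample_rate) / num_samples
```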
Most probably one of the MP3 files is broken, but there is no way to figure out which one.
Would it be possible to add a debugging option that shows the file being processed, so I can see which MP3 is broken? :)
Cheers,
Liviu
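Until such an option exists, one way to find the culprit from the calling side is to wrap each load so that any exception carries the file path. A sketch with hypothetical names (FileLoadError, load_with_path); in practice load would be audio_metadata.load:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class FileLoadError(Exception):
    """Hypothetical wrapper exception that records the offending path."""

    def __init__(self, path: str, original: BaseException):
        super().__init__(f"{path}: {original!r}")
        self.path = path
        self.original = original


def load_with_path(path: str, load: Callable[[str], T]) -> T:
    """Call load(path); on failure, re-raise with the path attached."""
    try:
        return load(path)
    except Exception as exc:
        # 'from exc' keeps the original traceback chained underneath.
        raise FileLoadError(path, exc) from exc
```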
This includes:
Tag class.
name and value.
Tag.
Tags classes to support both plain values and Tag instances in some way.
Spotted here.
This includes:
There is something missing in the README "why" section: your library is not GPL, it's MIT. That is great.
I've been holding off on really writing docs until https://github.com/pawamoy/mkdocstrings/issues/27 comes to fruition. I want to use MkDocs, but I can't live without an autodoc reference. I want to have at least the autodoc and some basic tutorial/how-to stuff up before diving into dump support. So, I will likely need to do this before mkdocstrings is ready.
For example, there is a line bidict>=0.17,<0.18 in the PyPI tarball for version 0.4.0, whereas on GitHub we have:
Line 32 in b1f7c6a
I'm using "gms down -v --debug --log-to-file '%albumartist%/%album%/%track2% - %title%'" to create a local backup of my files, but I'm currently receiving the following error (note that ~1000 files had already been downloaded successfully in a fresh backup):
[2019-07-14 03:36:04] Downloading 18376 songs from Google Music
Traceback (most recent call last):
File "/usr/local/bin/gms", line 10, in <module>
sys.exit(run())
File "/usr/local/lib/python3.6/dist-packages/google_music_scripts/cli.py", line 920, in run
DISPATCH[command](args)
File "/usr/local/lib/python3.6/dist-packages/google_music_scripts/commands.py", line 221, in do_download
download_songs(mm, to_download, template=args.output)
File "/usr/local/lib/python3.6/dist-packages/google_music_scripts/core.py", line 36, in download_songs
tags = audio_metadata.loads(audio).tags
File "/usr/local/lib/python3.6/dist-packages/audio_metadata/api.py", line 103, in loads
return parser_cls.load(b)
File "/usr/local/lib/python3.6/dist-packages/audio_metadata/formats/mp3.py", line 519, in load
self.streaminfo = MP3StreamInfo.load(self._obj)
File "/usr/local/lib/python3.6/dist-packages/audio_metadata/formats/mp3.py", line 406, in load
raise InvalidFormat("Missing XING header and insufficient MPEG frames.")
audio_metadata.exceptions.InvalidFormat: Missing XING header and insufficient MPEG frames.
I have the latest audio-metadata (Requirement already up-to-date: audio-metadata in /usr/local/lib/python3.6/dist-packages (0.4.0)) and the latest google-music-scripts (Requirement already up-to-date: google-music-scripts in /usr/local/lib/python3.6/dist-packages (4.0.1)).
Thanks!
Name: audio-metadata
Version: 0.11.1
Summary: A library for reading and, in the future, writing metadata from audio files.
Home-page: https://github.com/thebigmunch/audio-metadata
Author: thebigmunch
Author-email: [email protected]
License: MIT
Location: /usr/lib/python3.8/site-packages
Requires: bidict, pendulum, tbm-utils, wrapt, attrs, more-itertools, pprintpp, bitstruct
Required-by: google-music, google-music-utils, google-music-scripts, google-music-proto
Python 3.8.2
Traceback (most recent call last):
File "/usr/bin/gms", line 8, in <module>
sys.exit(run())
File "/usr/lib/python3.8/site-packages/google_music_scripts/cli.py", line 618, in run
args.func(args)
File "/usr/lib/python3.8/site-packages/google_music_scripts/commands.py", line 390, in do_upload
local_songs = get_local_songs(
File "/usr/lib/python3.8/site-packages/google_music_scripts/core.py", line 186, in get_local_songs
local_songs = [
File "/usr/lib/python3.8/site-packages/google_music_scripts/core.py", line 195, in <listcomp>
if audio_metadata.determine_format(filepath) in [
File "/usr/lib/python3.8/site-packages/audio_metadata/api.py", line 72, in determine_format
ID3v2.parse(data)
File "/usr/lib/python3.8/site-packages/tbm_utils/decorators.py", line 44, in wrapper
return wrapped(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/audio_metadata/formats/id3v2.py", line 416, in parse
self.tags = ID3v2Frames.parse(
File "/usr/lib/python3.8/site-packages/tbm_utils/decorators.py", line 44, in wrapper
return wrapped(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/audio_metadata/formats/id3v2.py", line 246, in parse
frame = ID3v2Frame.parse(data, id3_version, unsync)
File "/usr/lib/python3.8/site-packages/tbm_utils/decorators.py", line 44, in wrapper
return wrapped(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/audio_metadata/formats/id3v2frames.py", line 551, in parse
return frame_type(
File "<attrs generated init audio_metadata.formats.id3v2frames.ID3v2TimestampFrame>", line 6, in __init__
File "/usr/lib/python3.8/site-packages/audio_metadata/formats/id3v2frames.py", line 1423, in _validate_value
parse_iso8601(v)
File "/usr/lib/python3.8/site-packages/pendulum/parsing/iso8601.py", line 187, in parse_iso8601
return datetime.date(year, month, day)
ValueError: year 0 is out of range
Similar to https://github.com/thebigmunch/google-music-proto/issues/6, ValueError should be handled gracefully.
--- audio_metadata/formats/id3v2frames.py.orig
+++ audio_metadata/formats/id3v2frames.py
@@ -1421,7 +1421,7 @@
for v in value:
try:
parse_iso8601(v)
- except ParserError:
+ except (ParserError, ValueError):
raise TagError("Timestamp frame values must conform to the ID3v2-compliant subset of ISO 8601.")
@datareader
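The underlying issue is that a value can have a valid ISO 8601 shape and still not be a real date: '0000' is well-formed, but year 0 does not exist in Python's datetime. So a validator has to treat ValueError as "invalid" too, which is what the patch does. A self-contained sketch of the same idea with only the standard library (a hypothetical function; the library's validator uses pendulum instead):

```python
import re
from datetime import datetime

# Shapes of the ID3v2.4 timestamp subset of ISO 8601:
# yyyy, yyyy-MM, yyyy-MM-dd, yyyy-MM-ddTHH, yyyy-MM-ddTHH:mm, yyyy-MM-ddTHH:mm:ss
_SHAPE = re.compile(r"\d{4}(-\d{2}(-\d{2}(T\d{2}(:\d{2}(:\d{2})?)?)?)?)?")
_FORMATS = {
    4: "%Y",
    7: "%Y-%m",
    10: "%Y-%m-%d",
    13: "%Y-%m-%dT%H",
    16: "%Y-%m-%dT%H:%M",
    19: "%Y-%m-%dT%H:%M:%S",
}


def is_valid_id3_timestamp(value: str) -> bool:
    """True if value has an allowed shape and is a real date/time."""
    if not _SHAPE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, _FORMATS[len(value)])
    except ValueError:
        # Covers out-of-range parts such as year 0, month 13 or Feb 30.
        return False
    return True
```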
The latest update, v0.8.0, breaks the duration for WAV files.
In /src/audio_metadata_formats/wav.py L195:
self.streaminfo.duration = self.streaminfo._size / self.streaminfo.bitrate / 8
should read
self.streaminfo.duration = self.streaminfo._size / (self.streaminfo.bitrate / 8)
This bug was introduced in 2bbda71. It was tested on a 130-second WAV file: in v0.6.0, the duration is returned correctly; in v0.8.0, the duration returned is 2.03 seconds.
It might be a good idea to include a test for the streaminfo to ensure data is correct.
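The arithmetic accounts for the reported number: with _size in bytes and bitrate in bits per second, the correct duration is size * 8 / bitrate, while the v0.8.0 expression evaluates as size / (8 * bitrate), which is always 64 times smaller; 130 s / 64 is about 2.03 s. The check below assumes CD-quality PCM purely for concreteness (the report doesn't give the file's format) and doubles as the kind of streaminfo test suggested above:

```python
def wav_duration(data_size_bytes: int, bitrate_bps: float) -> float:
    """Duration in seconds: bytes / (bits per second / 8 bits per byte)."""
    return data_size_bytes / (bitrate_bps / 8)


def wav_duration_buggy(data_size_bytes: int, bitrate_bps: float) -> float:
    """The v0.8.0 expression, which parses as (size / bitrate) / 8."""
    return data_size_bytes / bitrate_bps / 8


# Assumed example: 16-bit stereo PCM at 44.1 kHz.
BITRATE = 44_100 * 16 * 2      # 1,411,200 bits per second
SIZE = 130 * BITRATE // 8      # 130 seconds of audio = 22,932,000 bytes
```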
Name: audio-metadata
Version: 0.11.1
Summary: A library for reading and, in the future, writing metadata from audio files.
Home-page: https://github.com/thebigmunch/audio-metadata
Author: thebigmunch
Author-email: [email protected]
License: MIT
Location: /home/bill/.local/lib/python3.8/site-packages
Requires: attrs, bidict, bitstruct, more-itertools, pendulum, pprintpp, tbm-utils, wrapt
Required-by:
3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0]
Linux Mint 20.0
System: Kernel: 5.15.0-58-generic x86_64 bits: 64 compiler: N/A Desktop: Cinnamon 5.0.7
wm: muffin dm: LightDM Distro: Linux Mint 20.2 Uma base: Ubuntu 20.04 focal
audio_metadata.load(Path(df.loc[3421]['FileA']))
Traceback (most recent call last):
File "/tmp/ipykernel_384411/3107725757.py", line 1, in <cell line: 1>
audio_metadata.load(Path(df.loc[3421]['FileA']))
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/api.py", line 113, in load
parser_cls = determine_format(data)
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/api.py", line 72, in determine_format
ID3v2.parse(data)
File "/home/me/.local/lib/python3.8/site-packages/tbm_utils/decorators.py", line 44, in wrapper
return wrapped(*args, **kwargs)
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/formats/id3v2.py", line 416, in parse
self.tags = ID3v2Frames.parse(
File "/home/me/.local/lib/python3.8/site-packages/tbm_utils/decorators.py", line 44, in wrapper
return wrapped(*args, **kwargs)
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/formats/id3v2.py", line 246, in parse
frame = ID3v2Frame.parse(data, id3_version, unsync)
File "/home/me/.local/lib/python3.8/site-packages/tbm_utils/decorators.py", line 44, in wrapper
return wrapped(*args, **kwargs)
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/formats/id3v2frames.py", line 538, in parse
frame_data = remove_unsynchronization(data.read(read_size))
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/utils.py", line 59, in remove_unsynchronization
data = data[sync_index + 3:]
KeyboardInterrupt
If I run
audio_metadata.load(Path(df.loc[3421]['FileA']))
or if this file is accessed during my dataframe.apply() loop, the process freezes.
If I wrap this call in a function, e.g.:
import time
import timeout_decorator
@timeout_decorator.timeout(1)
def mp3meta(path):
    audio_metadata.load(path)
mp3meta(Path(poo.srcdest[0], poo.df.loc[3421]['FileA']))
Traceback (most recent call last):
File "/tmp/ipykernel_384411/3184206738.py", line 1, in <cell line: 1>
mp3meta(Path(poo.srcdest[0], poo.df.loc[3421]['FileA']))
File "/home/me/.local/lib/python3.8/site-packages/timeout_decorator/timeout_decorator.py", line 82, in new_function
return function(*args, **kwargs)
File "/tmp/ipykernel_384411/2936242082.py", line 3, in mp3meta
audio_metadata.load(path)
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/api.py", line 113, in load
parser_cls = determine_format(data)
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/api.py", line 72, in determine_format
ID3v2.parse(data)
File "/home/me/.local/lib/python3.8/site-packages/tbm_utils/decorators.py", line 44, in wrapper
return wrapped(*args, **kwargs)
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/formats/id3v2.py", line 416, in parse
self.tags = ID3v2Frames.parse(
File "/home/me/.local/lib/python3.8/site-packages/tbm_utils/decorators.py", line 44, in wrapper
return wrapped(*args, **kwargs)
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/formats/id3v2.py", line 246, in parse
frame = ID3v2Frame.parse(data, id3_version, unsync)
File "/home/me/.local/lib/python3.8/site-packages/tbm_utils/decorators.py", line 44, in wrapper
return wrapped(*args, **kwargs)
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/formats/id3v2frames.py", line 538, in parse
frame_data = remove_unsynchronization(data.read(read_size))
File "/home/me/.local/lib/python3.8/site-packages/audio_metadata/utils.py", line 57, in remove_unsynchronization
if data[sync_index + 1 : sync_index + 3] == b'\x00\x00':
File "/home/me/.local/lib/python3.8/site-packages/timeout_decorator/timeout_decorator.py", line 69, in handler
_raise_exception(timeout_exception, exception_message)
File "/home/me/.local/lib/python3.8/site-packages/timeout_decorator/timeout_decorator.py", line 45, in _raise_exception
raise exception()
TimeoutError: 'Timed Out'
So it seems that somewhere inside 'remove_unsynchronization' an endless 'while' loop can occur.
I don't know what is special about this file. It plays OK, as does the previous file (i.e. the track before).
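For comparison, ID3v2 unsynchronization inserts a 0x00 after 0xFF bytes that could otherwise look like a sync pattern (including an original 0xFF 0x00 pair, which becomes 0xFF 0x00 0x00), so undoing it can be done as a single left-to-right replacement that always terminates. A sketch with a hypothetical name, not the library's function, assuming the tag was unsynchronized by a conforming writer:

```python
def undo_unsynchronization(data: bytes) -> bytes:
    """Undo ID3v2 unsynchronization by turning each 0xFF 0x00 pair into 0xFF.

    bytes.replace makes one non-overlapping pass over the data, so it
    runs in linear time and cannot loop forever on odd input.
    """
    return data.replace(b"\xff\x00", b"\xff")
```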
I have set up this snippet right here (variation of example):
import audio_metadata
metadata = audio_metadata.load('Music.flac')
print(metadata)
And it only outputs this:
<FLAC ({
'filepath': 'C:\Users\ptgms\PycharmProjects\medialibs\Music.flac',
'filesize': '30.30 MiB',
'pictures': [
})>,
],
'seektable': <FLACSeekTable (26 seekpoints)>,
})>,
})>,
})>
I honestly have no idea why lmao
Thanks for the help!
This package is in desperate need of logging : )
My preference is to use loguru for this.
Differences in the minor versions of ID3v2 require different handling. This should include only allowing frames specified for that version, and, in the future, changing between them.
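As an illustration of version-specific validation, here is a minimal allow/deny sketch (hypothetical names; four-character frame IDs of ID3v2.3 and v2.4 only, so ID3v2.2's three-character IDs are not covered). The v2.3-only set lists the frames dropped in v2.4; the v2.4-only set is a partial list of frames that version introduced:

```python
# Frames valid in ID3v2.3 but removed in ID3v2.4.
V23_ONLY = frozenset({
    "EQUA", "IPLS", "RVAD", "TDAT", "TIME", "TORY", "TRDA", "TSIZ", "TYER",
})

# Some of the frames introduced in ID3v2.4 (not an exhaustive list).
V24_ONLY = frozenset({
    "EQU2", "RVA2", "TDOR", "TDRC", "TDRL", "TIPL", "TMCL", "TSOA", "TSOP", "TSOT",
})


def frame_allowed(frame_id: str, major_version: int) -> bool:
    """Rough check of whether a frame ID fits ID3v2.<major_version>."""
    if major_version == 3:
        return frame_id not in V24_ONLY
    if major_version == 4:
        return frame_id not in V23_ONLY
    raise ValueError(f"unsupported ID3v2 major version: {major_version}")
```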
I've found that when scanning my FLAC library, some albums fail with audio_metadata. Even though these files pass 'flac -t FILE', audio_metadata fails to extract their metadata.
I've found that some of these files fail because of a ZERO size in a block, which is caught and triggers an exception.
While this is probably wanted behaviour for the STREAMINFO block, it seems a bit overzealous for blocks like PADDING and SEEKTABLE.
So rather than call it a bug, I'll ask for a feature change. It requires a 1 (or 2) line change in flac.py.
E.g. (where flac.py in Downloads is the GitHub version):
[2041]$ diff flac.py ~/Downloads/flac.py
456,457c456,457
< if ( block_size == 0 ) and ( block_type == FLACMetadataBlockType.STREAMINFO ) :
< raise FormatError(f"FLAC metadata block {block_type}; size must be greater than 0.")
---
> if block_size == 0:
> raise FormatError("FLAC metadata block size must be greater than 0.")
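A sketch of the relaxed check as a standalone header parser (hypothetical parse_block_header, not the library's code). The layout follows the FLAC format specification: a 1-bit last-block flag, a 7-bit block type (0 is STREAMINFO, 1 PADDING, 3 SEEKTABLE) and a 24-bit big-endian length:

```python
from typing import Tuple

STREAMINFO, PADDING, APPLICATION, SEEKTABLE, VORBIS_COMMENT, CUESHEET, PICTURE = range(7)


def parse_block_header(header: bytes) -> Tuple[bool, int, int]:
    """Parse a 4-byte FLAC metadata block header into (is_last, type, length)."""
    if len(header) != 4:
        raise ValueError("FLAC metadata block header must be 4 bytes")
    is_last = bool(header[0] & 0x80)
    block_type = header[0] & 0x7F
    length = int.from_bytes(header[1:4], "big")
    # Only STREAMINFO is rejected when empty; other blocks, such as
    # PADDING or a SEEKTABLE with no seek points, may have length 0.
    if block_type == STREAMINFO and length == 0:
        raise ValueError("STREAMINFO block size must be greater than 0")
    return is_last, block_type, length
```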
Change certain exceptions, to be determined, to warnings that can be made exceptions depending on which mode audio-metadata is in. The exceptions most likely to be changed concern spec compliance of tags. This would be the precursor to the ability to fix/upgrade tags and tag versions.
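Python's warnings module already supports this pattern: code emits a warning of some category, and a filter with the "error" action turns that category into an exception. A minimal sketch with hypothetical names (TagSpecWarning, check_track_number, set_strict), not a proposed final API:

```python
import warnings


class TagSpecWarning(UserWarning):
    """Hypothetical warning for tags that break the spec but are readable."""


def check_track_number(value: str) -> str:
    """Warn (rather than raise) when a track number isn't purely numeric."""
    if not value.isdigit():
        warnings.warn(f"non-numeric track number: {value!r}", TagSpecWarning)
    return value


def set_strict(strict: bool) -> None:
    """Strict mode turns TagSpecWarning into an exception via the filter."""
    warnings.simplefilter("error" if strict else "always", TagSpecWarning)
```

Note that simplefilter changes process-wide state; a library might prefer scoping the mode with warnings.catch_warnings() around its own parsing.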
Got a new one :-D
Traceback (most recent call last):
File "/usr/local/bin/gms", line 11, in <module>
sys.exit(run())
File "/usr/local/lib/python3.6/dist-packages/google_music_scripts/cli.py", line 968, in run
args.func(args)
File "/usr/local/lib/python3.6/dist-packages/google_music_scripts/commands.py", line 465, in do_upload
delete_on_success=args.delete_on_success
File "/usr/local/lib/python3.6/dist-packages/google_music_scripts/core.py", line 334, in upload_songs
no_sample=no_sample
File "/usr/local/lib/python3.6/dist-packages/google_music/clients/musicmanager.py", line 245, in upload
track_info = mm_calls.Metadata.get_track_info(song)
File "/usr/local/lib/python3.6/dist-packages/google_music_proto/musicmanager/calls.py", line 297, in get_track_info
track.track_number = int(track_split[0])
ValueError: invalid literal for int() with base 10: '1\x00'
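The value that reached int() was '1\x00', a track number with a trailing NUL byte, which int() does not strip the way it strips whitespace. A defensive parse could drop trailing NULs before splitting off an optional '/total' part. A sketch (hypothetical helper, not google-music-proto's code):

```python
def parse_track_number(value: str) -> int:
    """Parse values like '1', '07', '3/12', or a NUL-padded '1' + chr(0)."""
    # Drop trailing NUL characters, then keep the part before '/total'.
    number = value.rstrip("\x00").strip().split("/", 1)[0]
    return int(number)
```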
I added a print statement at line 259 in google_music_proto/musicmanager/calls.py to get the offending file. Here's the FLAC header for it (kinda big due to the embedded image):