kellnerd / harmony
Music Metadata Aggregator and MusicBrainz Importer
License: MIT License
Continuing the discussion from #22 (comment)
We can factor out the copyright normalization logic and reuse it for other providers, e.g. as suggested for Tidal in https://community.metabrainz.org/t/harmony-music-metadata-aggregator-and-musicbrainz-importer/698641/15
But I'd do this after merging this PR. It also needs some further research. I know Tidal includes the copyright text both with and without the © symbol. What I'm unsure about is whether this strictly contains copyright © info, or whether it can sometimes also contain phonographic copyright ℗ info. Spotify has those separated, which makes it easier.
I fully agree, this is enough for its own PR and it needs more research.
By the way, Tidal also has a copyright property at the track level. This should also be considered if it differs from the release-level copyright. So far they were identical for the releases I have checked; maybe a compilation has different values there.
For starters I have a commit in the dev branch which displays the alternative copyright values.
When we have more examples we can decide how the release merge algorithm should handle these, one possibility would be to keep all and deduplicate them.
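The "keep all and deduplicate" idea could look roughly like the following. This is only a minimal sketch, not Harmony's actual merge code: the function names are hypothetical, and it assumes that two values are duplicates when they differ only by a leading © (or "(C)") symbol, whitespace, or letter case. Values starting with ℗ are deliberately kept distinct, since it is still unclear whether Tidal mixes both kinds.

```typescript
/** Strips a leading copyright symbol so "© 2024 X" and "2024 X" compare equal. */
function normalizeCopyright(text: string): string {
  return text
    .trim()
    .replace(/^(?:©|\(c\))\s*/i, "") // hypothetical rule: only © / (C) is dropped
    .replace(/\s+/g, " ");
}

/** Keeps the first spelling of each distinct copyright value, in input order. */
function dedupeCopyrights(values: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const value of values) {
    const key = normalizeCopyright(value).toLowerCase();
    if (key && !seen.has(key)) {
      seen.add(key);
      result.push(value.trim());
    }
  }
  return result;
}
```

Passing the release-level value first means its spelling wins over an equivalent track-level one.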
List of sources/websites for which a metadata provider has been implemented or requested.
Leave a comment which includes at least a link to an example release for a quick provider request, or create a separate issue which is named after the provider and labeled with provider.
Detailed requests for sources with an open API and good documentation are more likely to be implemented.
Edit: Let's be honest, every request which contains more details than just the name or URL of the source is probably worth its own issue already.
If you plan to work on a provider, be it doing more research or actually implementing it, please create a separate issue which is named after the provider and ask for it to be assigned to you.
Tokens and other secrets should not be included in this repository but loaded from environment variables.
Some services add these bits to the ends of release titles. Currently a-tisket automatically removes them from the titles.
For integration with external tools it would be convenient to be able to easily link to Harmony with a URL or GTIN. Links like this should be supported:
With GTIN: https://harmony.pulsewidth.org.uk/release?gtin=191402047554
There are two separate behaviors:
deezer=. This is inconvenient for third-party tools, as they need to hardcode all possible providers. It would be better if the default providers were used instead.
The second case is more convenient and probably intentional. Having case 1 with the url parameter behave the same would likely solve the issue.
Not sure whether both cases should trigger an automatic lookup.
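From the perspective of a third-party tool, building such a link could look like the sketch below. It only assumes what this issue mentions: the /release endpoint and the gtin and url query parameters. The helper name is hypothetical, and whether the link triggers an automatic lookup is still the open question above.

```typescript
const HARMONY_RELEASE_ENDPOINT = "https://harmony.pulsewidth.org.uk/release";

/** Builds a Harmony lookup link from a GTIN and/or a provider release URL. */
function buildLookupUrl(lookup: { gtin?: string; url?: string }): string {
  const target = new URL(HARMONY_RELEASE_ENDPOINT);
  // No provider parameters are set here, the tool relies on Harmony's defaults.
  if (lookup.gtin) target.searchParams.set("gtin", lookup.gtin);
  if (lookup.url) target.searchParams.set("url", lookup.url);
  return target.toString();
}
```

The point of the design is that the calling tool never needs to know the list of providers.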
Sometimes barcodes are not as unique as they should be...
635669065024 returns two different releases (different artists, but same label) with the iTunes, Spotify and Tidal providers. Deezer's API only returns one of them (YBC III), for the others it seems to be random which one is the first result that gets returned.
The iTunes provider at least warns about this, the other providers currently ignore this issue silently.
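One way for all providers to stop ignoring this silently is sketched below. It is an illustration under assumptions, not the existing provider code: the Candidate shape and the function names are hypothetical, and it assumes a provider can sometimes learn a candidate's own GTIN, which is not always the case.

```typescript
interface Candidate {
  url: string;
  gtin?: string; // only set when the provider can determine it
}

interface Selection {
  chosen?: Candidate;
  warnings: string[];
}

/** Compares GTINs without leading zeros (hypothetical normalization). */
function normalizeGtin(gtin: string): string {
  return gtin.replace(/^0+/, "");
}

/** Prefers a candidate with a matching GTIN and warns about every skipped result. */
function selectCandidate(candidates: Candidate[], lookedUpGtin: string): Selection {
  const warnings: string[] = [];
  if (candidates.length === 0) return { warnings };
  const wanted = normalizeGtin(lookedUpGtin);
  const exact = candidates.find(
    (c) => c.gtin !== undefined && normalizeGtin(c.gtin) === wanted,
  );
  const chosen = exact ?? candidates[0];
  for (const other of candidates) {
    if (other !== chosen) {
      warnings.push(`The API also returned another result, which was skipped: ${other.url}`);
    }
  }
  if (!exact && candidates.length > 1) {
    warnings.push(`None of the ${candidates.length} results could be verified against GTIN ${lookedUpGtin}`);
  }
  return { chosen, warnings };
}
```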
mostly for individual track pages, but I believe Bandcamp can have different licenses per track (tho I don't know if that'd be a recording or work URL...), for example
Bandcamp contains many Creative Commons licensed releases, see for example https://aeonsable.bandcamp.com/album/aenigma-2023
The Bandcamp importer user script supports reading the license information and sets the license URL when seeding, see https://github.com/murdos/musicbrainz-userscripts/blob/master/bandcamp_importer.user.js#L188-L195
Similar could be done in the Bandcamp provider.
(as suggested in #5)
Yandex Music is a Russian music streaming service developed by Yandex. Users select musical compositions, albums, collections of musical tracks to stream to their device on demand and receive personalized recommendations. The service is also available as web browser. Service is available in Armenia, Azerbaijan, Belarus, Georgia, Israel, Kazakhstan, Kyrgyzstan, Moldova, Russia, Tajikistan, Turkmenistan and Uzbekistan. Subscription can only be paid from supported countries above, but the service is then available in all other countries. (wiki)
Example of an album: https://music.yandex.ru/album/12353342
Open JSON API: https://api.music.yandex.net/albums/12353342/ (or https://api.music.yandex.net/albums/12353342/with-tracks for additional info on tracks from album, such as the distributor of release). VPN might be needed to open those (mirror for the "with-tracks" response: https://www.jsonkeeper.com/b/YKSE)
The API supports neither GTIN nor ISRC. Also, the "label" section of the response takes its info from the P-line of the release and in most cases removes words such as "Productions", "Music" and "Publishing", as well as splitting one label into multiple ones if there's a slash in its name (like here).
The API indicates whether it's an album, single, podcast or audiobook (since they all have a link of the form https://music.yandex.ru/album/album_id).
There's also an unofficial implementation of an API at https://github.com/MarshalX/yandex-music-api/releases, but a token is needed to use it.
I honestly couldn't find any API for OTOTOY, but since it is a Japanese store, most of the help pages aren't in English, so there might be.
that said, perhaps it could be scraped for data, especially since it's one of the few stores I know that shows catalog numbers (for example, here).
important note, OTOTOY does keep separate pages for Lossless and High-Resolution releases, which would be the same MusicBrainz release (all other data being the same, of course)
This is just a loosely ordered list of things I already have on my radar, to be cleaned up later™️.
/track URLs (#7)
/track URLs (expensive, only if there is no better source)
ReleaseOptions.regions option by using an ordered set
provider
key and cert options to start()
Tidal also provides videos as separate entities. They come with title, cover image, duration, release date, ISRC and copyright info. They seem to be well suited to be added as releases on their own.
Examples:
The API provides the /videos/{id} endpoint, see https://developer.tidal.com/reference/web-api?spec=catalogue&ref=get-video.
Example response for https://tidal.com/browse/video/358461354
{
  "resource": {
    "artifactType": "video",
    "id": "358461354",
    "title": "My Boy Only Breaks His Favorite Toys (Lyric Video)",
    "image": [
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/1024x256.jpg",
        "width": 1024,
        "height": 256
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/1080x720.jpg",
        "width": 1080,
        "height": 720
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/160x107.jpg",
        "width": 160,
        "height": 107
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/160x160.jpg",
        "width": 160,
        "height": 160
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/320x214.jpg",
        "width": 320,
        "height": 214
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/320x320.jpg",
        "width": 320,
        "height": 320
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/480x480.jpg",
        "width": 480,
        "height": 480
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/640x428.jpg",
        "width": 640,
        "height": 428
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/750x500.jpg",
        "width": 750,
        "height": 500
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/750x750.jpg",
        "width": 750,
        "height": 750
      }
    ],
    "releaseDate": "2024-04-19",
    "artists": [
      {
        "id": "3557299",
        "name": "Taylor Swift",
        "picture": [
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/1024x256.jpg",
            "width": 1024,
            "height": 256
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/1080x720.jpg",
            "width": 1080,
            "height": 720
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/160x107.jpg",
            "width": 160,
            "height": 107
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/160x160.jpg",
            "width": 160,
            "height": 160
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/320x214.jpg",
            "width": 320,
            "height": 214
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/320x320.jpg",
            "width": 320,
            "height": 320
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/480x480.jpg",
            "width": 480,
            "height": 480
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/640x428.jpg",
            "width": 640,
            "height": 428
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/750x500.jpg",
            "width": 750,
            "height": 500
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/750x750.jpg",
            "width": 750,
            "height": 750
          }
        ],
        "main": true
      }
    ],
    "duration": 208,
    "trackNumber": 0,
    "volumeNumber": 0,
    "isrc": "USUMV2400558",
    "copyright": "© 2024 Taylor Swift",
    "properties": {},
    "tidalUrl": "https://tidal.com/browse/video/358461354"
  }
}
Starting with https://www.deezer.com/fr/album/10882160, Harmony gives
https://music.apple.com/gb/artist/505840851
This leads to MB not autodetecting the service:
Correct for autodetection would be https://itunes.apple.com/gb/artist/id505840851
I don't know if these are two separate services or just URL redundancy for iTunes. If it's the same service, changing the URL which Harmony outputs should easily fix it, or is there some technical reason against that?
For now I'll stick with the itunes link :)
https://musicbrainz.org/artist/2e21383f-f71e-4367-bfa8-5a02c74643a8
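If both hosts really refer to the same service (which is exactly the open question here), the rewrite between the two URL forms quoted above is mechanical. A minimal sketch with a hypothetical helper name, handling only artist URLs:

```typescript
/**
 * Converts https://music.apple.com/{region}/artist/{id} into the
 * https://itunes.apple.com/{region}/artist/id{id} form quoted above.
 * Returns undefined for anything that is not an Apple Music artist URL.
 */
function toItunesArtistUrl(appleMusicUrl: string): string | undefined {
  const url = new URL(appleMusicUrl);
  if (url.hostname !== "music.apple.com") return undefined;
  // Optional name slug between "artist/" and the numeric ID is tolerated.
  const match = url.pathname.match(/^\/([a-z]{2})\/artist\/(?:[^/]+\/)?(?:id)?(\d+)$/);
  if (!match) return undefined;
  const [, region, id] = match;
  return `https://itunes.apple.com/${region}/artist/id${id}`;
}
```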
For some releases (pre-releases?), Tidal's API does not return all tracks:
The missing tracks are not shown on tidal.com/browse/album pages at all, on listen.tidal.com pages they are displayed greyed out.
Since the API returns at least the correct track count we could try to fill the tracklist (for single medium releases) with [unknown] tracks to allow for these releases being combined with other sources which have the track titles and lengths.
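The padding idea could be sketched like this. The Track shape and the function name are hypothetical, and the sketch assumes that each track returned by the API carries its position on the (single) medium:

```typescript
interface Track {
  number: number; // 1-based position on the medium
  title: string;
  length?: number; // duration, unknown for placeholders
}

/** Fills gaps in a single-medium tracklist with "[unknown]" placeholder tracks. */
function fillMissingTracks(tracks: Track[], expectedCount: number): Track[] {
  const byNumber = new Map<number, Track>();
  for (const track of tracks) byNumber.set(track.number, track);

  const filled: Track[] = [];
  for (let n = 1; n <= expectedCount; n++) {
    filled.push(byNumber.get(n) ?? { number: n, title: "[unknown]" });
  }
  return filled;
}
```

The placeholders leave the length empty, so a merge with another source can take title and length from there.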
when I'm adding multiple releases, I find it easiest to have my importer in one window and the artists' page in another, so I can just click and drag a link when moving to the next release. with how Harmony currently works, I've got to highlight the whole field and backspace before I can do this
an alternate option would be to clear the provider and GTIN fields at the top after looking up a release, but there might be a reason to show that even after the lookup. perhaps a second "new lookup" set of fields could work too? I'm up for any solutions~
Originally reported on the forums:
It seems for Bandcamp Harmony is lacking the check if a barcode is used for another edition like the userscript does:
https://harmony.pulsewidth.org.uk/release?bandcamp=consvmer%2Fseelenfrieden&ts=1718342804
According to the listing at Apple Music it should be 3617389461901
I would say this is a data error and it should be sufficient to unset the digital release GTIN only in case of a reused GTIN. If it is different from all physical release GTINs on the Bandcamp page (or when there are no physical packages) it should still be fine to use it.
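The proposed rule is small enough to sketch directly. The function and parameter names are hypothetical; the inputs are the GTIN of the digital release and the GTINs of the physical packages listed on the same Bandcamp page:

```typescript
/**
 * Returns the GTIN to use for the digital release, or undefined when it is
 * missing or reused by one of the physical packages.
 */
function digitalReleaseGtin(
  releaseGtin: string | undefined,
  packageGtins: string[],
): string | undefined {
  if (!releaseGtin) return undefined;
  return packageGtins.includes(releaseGtin) ? undefined : releaseGtin;
}
```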
https://harmony.pulsewidth.org.uk/release?gtin=197875266348&itunes=&region=GB&ts=1717477988
iTunes: The API also returned 1 other result, which was skipped: https://music.apple.com/gb/album/1702051779
The other result would have been the correct one with GTIN 197875266348.
iTunes: Extracted GTIN 197985529395 (from artwork URL) does not match the looked up value 197875266348
In this case, both image URLs contain the corresponding barcode, but this is not always the case unfortunately:
https://harmony.pulsewidth.org.uk/release?gtin=882951718827&itunes=&region=GB&ts=1717495471
iTunes: The API also returned 1 other result, which was skipped: https://music.apple.com/gb/album/600624295
That would've been the correct result 🫤
Another example where GTIN would help: https://harmony.pulsewidth.org.uk/release?gtin=822603266801&itunes=&region=GB&ts=1717435544
one feature I miss from a-tisket is how it can seed an edit to add artist URLs from the services it supports to the MusicBrainz artist
When trying to put a geo.music.apple.com link, Harmony displays an error:
No provider supports https://geo.music.apple.com/XX/album/_/1234567890?mt=1&app=music&ls=1&at=1000lHKX
where XX is region code (e.g. US), and 1234567890 is the album's ID
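A possible approach is to rewrite such links before they reach the provider matching. The sketch below is built on an unverified assumption: that a geo.music.apple.com link uses the same path as the corresponding music.apple.com page and that its query parameters are only tracking data. The helper name is hypothetical.

```typescript
/** Rewrites geo.music.apple.com links to plain music.apple.com links. */
function normalizeAppleMusicUrl(input: string): string {
  const url = new URL(input);
  if (url.hostname === "geo.music.apple.com") {
    url.hostname = "music.apple.com";
    url.search = ""; // assumption: mt, app, ls and at are not needed for lookups
    // Region codes are lowercase in the music.apple.com URLs quoted elsewhere.
    url.pathname = url.pathname.replace(
      /^\/([A-Za-z]{2})\//,
      (_match: string, region: string) => `/${region.toLowerCase()}/`,
    );
  }
  return url.toString();
}
```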
7digital (US store) has API docs at https://docs.7digital.com.
featured artists are handled very inconsistently across the various platforms, with Spotify removing feats and putting them in the artist field, Deezer keeping feat in the title and the artist field, and Apple Music only keeping feats in the track title. I think if a service has a featured artist, this should be reflected in the harmonized data, both on the track level and potentially on the release level (if all tracks have the same feat, especially for singles)
here's a decent cross section of the variants on this release
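As a starting point for harmonizing, featured artists that only appear in a title could be pulled out as sketched below. The names are hypothetical and the pattern only covers bracketed "feat.", "ft." and "featuring" credits. Splitting on "&", "," and "and" is a naive assumption which will break artist names that contain those separators.

```typescript
interface ParsedTitle {
  title: string; // title without the featuring credit
  featured: string[]; // names found in the credit, possibly empty
}

// Matches e.g. " (feat. A & B)" or " [ft. A]" anywhere in the title.
const FEAT_PATTERN = /\s*[(\[]\s*(?:feat\.?|ft\.?|featuring)\s+([^)\]]+)[)\]]/i;

/** Extracts a bracketed featuring credit from a track or release title. */
function extractFeaturedArtists(title: string): ParsedTitle {
  const match = title.match(FEAT_PATTERN);
  if (!match) return { title, featured: [] };
  const featured = match[1]
    .split(/\s*(?:,|&|\band\b)\s*/) // naive separator handling (assumption)
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  return { title: title.replace(FEAT_PATTERN, "").trim(), featured };
}
```

The extracted names could then be compared with the artist credits of the other providers before deciding where the feat should end up.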
Show the resolution of each cover in the preview and highlight the highest resolution in another color (green?) so it can be picked quickly.
I know this generates more traffic, but I open them manually in tabs anyway to compare.
It would be good if providers could set the primary type and if this would be seeded when submitting to MB.
Not all providers will support this, but it is sometimes possible to at least detect singles and EPs. If in doubt a provider should likely keep this field empty.
Some notes on specific implementations:
- Single and - EP seem to be commonly added to singles / EPs. These should be stripped (see #9) and then can be used for seeding the primary type as well. a-tisket does this.
album_type, which is one of album, single or compilation. Maybe it is too broad to use the album type (better leave it empty and have the user decide), but single and compilation should be fine to use.
Generally it seems that if specific types, in particular single or EP, are detectable, this could be seeded. In most cases a source type of "album", if given, might be too unspecific and better kept out.
In the release editor the primary type can be seeded using the field type.
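Stripping the suffix and deriving the type from it can happen in one step. A minimal sketch with hypothetical names, which only handles the two suffixes mentioned above and leaves the type empty in all other cases:

```typescript
type PrimaryType = "Single" | "EP";

interface TitleWithType {
  title: string;
  primaryType?: PrimaryType; // left empty if nothing specific was detected
}

/** Splits a trailing " - Single" or " - EP" off a release title. */
function splitTypeSuffix(title: string): TitleWithType {
  const match = title.match(/^(.*\S)\s+-\s+(Single|EP)$/);
  if (!match) return { title };
  return { title: match[1], primaryType: match[2] as PrimaryType };
}
```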
Implement a Spotify provider based on the Spotify Web API.
Implementation notes:
Related to #5
Providers using OAuth tokens (currently Tidal and Spotify) persist the token for the token lifetime, then do a refresh. This usually works fine. But should the token become invalid on the server side for any reason, this will block all requests until the currently stored token has expired.
It would be better if the providers attempted to refresh the token when they get a 401 Unauthorized status response and retried the current request once. Only if the retry also fails with the new token should the error be raised.
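The proposed behavior could be sketched as follows. This is not the existing provider code: TokenStore, MinimalResponse and requestWithTokenRetry are hypothetical names, and the actual HTTP call is passed in as a function so that the sketch stays independent of any concrete fetch API.

```typescript
interface TokenStore {
  get(): Promise<string>; // returns the cached token
  refresh(): Promise<string>; // fetches and stores a new token
}

interface MinimalResponse {
  status: number;
}

/** Sends a request and retries it exactly once with a fresh token after a 401. */
async function requestWithTokenRetry<R extends MinimalResponse>(
  tokens: TokenStore,
  send: (accessToken: string) => Promise<R>,
): Promise<R> {
  const first = await send(await tokens.get());
  if (first.status !== 401) return first;

  // The cached token may have been invalidated on the server side.
  const second = await send(await tokens.refresh());
  if (second.status === 401) {
    throw new Error("Request was rejected with 401 even after refreshing the access token");
  }
  return second;
}
```

Retrying only once avoids a refresh loop when the credentials themselves are the problem.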
I know it's mentioned in #5, but I figured I'd start up a ticket with a link to the API docs at least~
https://developers.soundcloud.com/docs
a couple notes about SoundCloud:
continuing from discussion here.
so, after a very brief search, it seems there's no official YouTube Music API, only one for YouTube (and a few unofficial ones for YouTube Music)
a few items to be aware of specific to YouTube with examples where applicable: