acoustid / mbdata

MusicBrainz SQLAlchemy Models

License: MIT License

Python 36.33% Shell 0.11% PLpgSQL 62.63% Perl 0.49% Raku 0.44%

mbdata's Issues

Are the environment variables set correctly?

I don't really understand python but it seems to me that the application tries to read the environment variable MBSLAVE_DB_DB for the db name and not MBSLAVE_DB_NAME as it is stated in the documentation.

mbdata/mbdata/replication.py

Line 103 in ea0e7e2

read_env_item(self, 'name', prefix + 'DB_DB')

mbdata/mbdata/replication.py

Line 197 in ea0e7e2

self.musicbrainz.read_env('MBSLAVE_')

I tried to add MBSLAVE_DB_DB as an environment variable but still for some reason I am getting an error.
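
To see which of the two names is actually visible inside the container, a minimal sketch like the following could help. `report_db_name_vars` is a hypothetical helper, not part of mbdata; the two variable names are the ones mentioned in this issue.

```python
import os

# Both names appear in this issue: the documentation mentions
# MBSLAVE_DB_NAME, while replication.py reads prefix + 'DB_DB'.
CANDIDATES = ("MBSLAVE_DB_DB", "MBSLAVE_DB_NAME")


def report_db_name_vars(environ=os.environ):
    """Return a dict mapping each candidate variable to its value (or None)."""
    return {name: environ.get(name) for name in CANDIDATES}


if __name__ == "__main__":
    for name, value in report_db_name_vars().items():
        print(f"{name} = {value!r}")
```

Running this with the same environment as the mbslave process shows whether either variable reaches it at all.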

The issue I am facing is that although I am setting up the environment variables like this (in a k8s yml file):

containers:
      - name: musicbrainz-db-mirror
        image: leiyiliro/mbslave:1.0  # Specific version of the Docker image
        env:
        - name: MBSLAVE_DB_HOST
          value: musicbrainz-db
        - name: MBSLAVE_DB_PORT
          value: "5432"
        - name: MBSLAVE_DB_NAME
          value: musicbrainz
        - name: MBSLAVE_DB_DB
          value: musicbrainz
          # Used for read and write operations on the MusicBrainz database          
        - name: MBSLAVE_DB_USER
          value: $(POSTGRES_USER)
          # PostgreSQL database password for the general user
        - name: MBSLAVE_DB_PASSWORD
          value: $(POSTGRES_PASSWORD)
          # Used for creating and managing the mbslave database, schema updates, and replication
        - name: MBSLAVE_DB_ADMIN_USER
          value: $(POSTGRES_USER)
          # MusicBrainz Slave admin password for the admin user
        - name: MBSLAVE_DB_ADMIN_PASSWORD
          value: $(POSTGRES_PASSWORD)
        - name: MBSLAVE_MUSICBRAINZ_TOKEN
          value: $(MBSLAVE_MUSICBRAINZ_TOKEN)
        ports:
        - containerPort: 80

The code seems to look for a database named after the value of my env variable $(POSTGRES_USER), which is "xxxxxx".

Traceback (most recent call last):
2023-04-18 00:30:10   File "/usr/local/bin/mbslave", line 8, in <module>
2023-04-18 00:30:10     sys.exit(main())
2023-04-18 00:30:10              ^^^^^^
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 803, in main
2023-04-18 00:30:10     args.func(config, args)
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 622, in mbslave_init_main
2023-04-18 00:30:10     create_user(config)
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 576, in create_user
2023-04-18 00:30:10     db = connect_db(config, superuser=True, no_db=True)
2023-04-18 00:30:10          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 209, in connect_db
2023-04-18 00:30:10     return cfg.connect_db(set_search_path=set_search_path, superuser=superuser, no_db=no_db)
2023-04-18 00:30:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 202, in connect_db
2023-04-18 00:30:10     db = psycopg2.connect(**self.database.create_psycopg2_kwargs(superuser=superuser, no_db=no_db))
2023-04-18 00:30:10          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/psycopg2/__init__.py", line 122, in connect
2023-04-18 00:30:10     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
2023-04-18 00:30:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-04-18 00:30:10 psycopg2.OperationalError: FATAL:  database "xxxxxx" does not exist
2023-04-18 00:30:10 
2023-04-18 00:30:10 Traceback (most recent call last):
2023-04-18 00:30:10   File "/usr/local/bin/mbslave", line 8, in <module>
2023-04-18 00:30:10     sys.exit(main())
2023-04-18 00:30:10              ^^^^^^
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 803, in main
2023-04-18 00:30:10     args.func(config, args)
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 513, in mbslave_sync_main
2023-04-18 00:30:10     cursor.execute("SELECT current_schema_sequence, current_replication_sequence FROM %s.replication_control" % config.schemas.name('musicbrainz'))
2023-04-18 00:30:10 psycopg2.errors.UndefinedTable: relation "musicbrainz.replication_control" does not exist
2023-04-18 00:30:10 LINE 1: ...chema_sequence, current_replication_sequence FROM musicbrain...

mbslave: error: invalid choice: 'init'

mbslave init --create-user --create-database

Gives the following error -

usage: mbslave [-h] [-c, --config PATH] {import,sync,remap-schema,print-sql,psql} ...
mbslave: error: invalid choice: 'init' (choose from 'import', 'sync', 'remap-schema', 'print-sql', 'psql')

mbslave command does not recognize long options

I haven't tried every option, but none of those I tried worked.

$> mbslave psql --file CreateCollations.sql
usage: mbslave [-h] [-c, --config PATH]
               {import,sync,remap-schema,print-sql,psql} ...
mbslave: error: unrecognized arguments: --file CreateCollations.sql

That's one example of several I tried.

The only long option it seems to recognize is --help.

"musicbrainz.alternative_release_type" does not exist

~/ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20211120-001843$ mbslave import mbdump.tar.bz2 mbdump-derived.tar.bz2
Importing data from mbdump.tar.bz2

Loading alternative_release_type to musicbrainz.alternative_release_type
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/mbslave", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/.local/lib/python3.8/site-packages/mbdata/replication.py", line 592, in main
args.func(config, args)
File "/home/ubuntu/.local/lib/python3.8/site-packages/mbdata/replication.py", line 253, in mbslave_import_main
load_tar(filename, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
File "/home/ubuntu/.local/lib/python3.8/site-packages/mbdata/replication.py", line 245, in load_tar
cursor.copy_from(tar.extractfile(member), fulltable)
psycopg2.errors.UndefinedTable: relation "musicbrainz.alternative_release_type" does not exist

Seems to be related to the issue mentioned here:
https://community.metabrainz.org/t/database-replication-issue-with-mbdata-25-0-4/548110

Update for MBS 2021-05-17 schema change release

https://blog.metabrainz.org/2021/05/18/musicbrainz-schema-change-release-2021-05-17-with-upgrade-instructions/

@yvanzo already has a branch with updates, not sure if it covers everything or not though, but it might work as a start if not: https://github.com/yvanzo/mbdata/tree/schema-change-2021-q2 – also not sure yet if he plans on making a PR of that or not. :)

Issue with --create-database argument

I am trying to run the mbslave init --create-database

The code here:

mbdata/mbdata/replication.py

Line 584 in ea0e7e2

def create_database(config: Config) -> None:

seems to try to connect to the "musicbrainz" database before it is created, and I get the error FATAL: database "musicbrainz" does not exist:

Traceback (most recent call last):
  File "/usr/local/bin/mbslave", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 803, in main
    args.func(config, args)
  File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 625, in mbslave_init_main
    create_database(config)
  File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 583, in create_database
    db = connect_db(config, superuser=True, no_db=True)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 209, in connect_db
    return cfg.connect_db(set_search_path=set_search_path, superuser=superuser, no_db=no_db)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 202, in connect_db
    db = psycopg2.connect(**self.database.create_psycopg2_kwargs(superuser=superuser, no_db=no_db))
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.OperationalError: FATAL:  database "musicbrainz" does not exist
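
One possible explanation, offered as an assumption rather than a confirmed diagnosis: when no database name is passed, libpq falls back to a database named after the connecting user, so a connection made "without a database" still targets "musicbrainz" if that is the admin user name. A minimal sketch of the usual workaround is to name the `postgres` maintenance database explicitly. `build_admin_kwargs` is a hypothetical helper, not part of mbdata.

```python
def build_admin_kwargs(host, port, admin_user, admin_password=None,
                       maintenance_db="postgres"):
    """Build psycopg2.connect() keyword arguments for CREATE DATABASE.

    Hypothetical helper. It always names a database explicitly, because
    libpq otherwise defaults dbname to the user name.
    """
    kwargs = {
        "host": host,
        "port": int(port),
        "user": admin_user,
        "dbname": maintenance_db,
    }
    if admin_password:
        kwargs["password"] = admin_password
    return kwargs


# Usage (requires psycopg2 and a running server):
#   import psycopg2
#   conn = psycopg2.connect(**build_admin_kwargs("127.0.0.1", 5432, "postgres"))
#   conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction
```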

How to actually access MB database with this tool?

The documentation states that:

Alternatively, if you are not interested in having a local MusicBrainz website and web service, you can use mbdata that includes replication without the rest of MusicBrainz Server.

But how to get the replica itself? The README includes:

>>> engine = create_engine('postgresql://musicbrainz:[email protected]/musicbrainz', echo=True)

Is this an existing local database? If so, how do I spin it up?
Do I need a fully functional MB server for that? In that case it doesn't match the original docs.

musicbrainz.release table restoration takes too long probably because of triggers

I first used the mbslave command to restore the musicbrainz db from a dump on my computer. Since it took ages (it didn't finish in a day), I dug into the code and narrowed the problem down to restoring the release table from a downloaded db dump:

from mbdata.replication import Config

config = Config(['mbslave.conf'])
db = config.connect_db()
filename = "mbdump/release"
schema, table = "musicbrainz", "release"

cursor = db.cursor()
with open(filename, 'r') as f:
    cursor.copy_expert('COPY {} FROM STDIN'.format("musicbrainz.release"), f)
db.commit()

This is still very slow: after 4 hours the data has not been inserted.

If I disable the triggers on this table:

cursor = db.cursor()
# Skip trigger execution during the bulk load
cursor.execute("ALTER TABLE musicbrainz.release DISABLE TRIGGER ALL")
with open(filename, 'r') as f:
    cursor.copy_expert('COPY {} FROM STDIN'.format("musicbrainz.release"), f)
cursor.execute("ALTER TABLE musicbrainz.release ENABLE TRIGGER ALL")
db.commit()  # persist the load, as in the first snippet

the restoration takes a little more than a minute.

Doing so before restoring each table might speed up the restoration by a factor of a few hundred. There might be a reason not to do it that I'm not aware of :-|

Is it doable?

PS: the PostgreSQL log file also mentions WAL writes occurring too frequently, so a 'SET UNLOGGED' on each table before restoring might be a good idea, but I have no idea whether it affects performance dramatically.
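
The disable/enable pair above could be wrapped so it is applied per table. This is only a sketch: `triggers_disabled` is a hypothetical helper, not something mbdata provides. Note that DISABLE TRIGGER ALL also covers internally generated constraint triggers, which requires superuser privileges and means foreign key checks are skipped for the loaded rows.

```python
from contextlib import contextmanager


@contextmanager
def triggers_disabled(cursor, table):
    """Disable all triggers on `table` while the block runs.

    If the block raises, triggers are not re-enabled here; the caller is
    expected to roll back, which also undoes the DISABLE because
    ALTER TABLE is transactional in PostgreSQL.
    """
    cursor.execute("ALTER TABLE {} DISABLE TRIGGER ALL".format(table))
    yield cursor
    cursor.execute("ALTER TABLE {} ENABLE TRIGGER ALL".format(table))


# Usage with the objects from the snippets above:
#   with triggers_disabled(cursor, "musicbrainz.release"):
#       with open("mbdump/release") as f:
#           cursor.copy_expert("COPY musicbrainz.release FROM STDIN", f)
#   db.commit()
```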

where are mbdump.tar.bz2 mbdump-derived.tar.bz2 on debian?

After downloading from:

http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/

Where are these files?

mbslave import mbdump.tar.bz2 mbdump-derived.tar.bz2

ls -lah
total 60K
drwxrwxr-x 9 ubuntu ubuntu 4.0K Nov 21 04:35 .
drwxr-xr-x 8 ubuntu ubuntu 4.0K Nov 21 22:12 ..
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 04:35 debian
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 04:35 debian-cd
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 04:35 debian-cdimage
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 18:15 header-inc
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 05:45 icons
-rw-rw-r-- 1 ubuntu ubuntu 3.6K Nov 21 22:17 index.html
-rw-rw-r-- 1 ubuntu ubuntu 3.6K Nov 22 07:14 'index.html?C=D;O=A'
-rw-rw-r-- 1 ubuntu ubuntu 3.6K Nov 22 07:14 'index.html?C=M;O=A'
-rw-rw-r-- 1 ubuntu ubuntu 3.6K Nov 22 07:14 'index.html?C=N;O=D'
-rw-rw-r-- 1 ubuntu ubuntu 3.6K Nov 22 07:14 'index.html?C=S;O=A'
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 04:35 lost+found
drwxrwxr-x 113 ubuntu ubuntu 4.0K Nov 21 22:17 pub
-rw-rw-r-- 1 ubuntu ubuntu 754 May 13 2015 welcome.msg
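
The listing above looks like a mirror of the site root, so the dumps are presumably somewhere under pub/ (another issue on this page shows a path ending in pub/musicbrainz/data/fullexport/20211120-001843). A minimal sketch to locate them, assuming they were downloaded at all; `find_dumps` is a hypothetical helper:

```python
from pathlib import Path


def find_dumps(root="."):
    """Return all mbdump*.tar.bz2 files found anywhere below `root`."""
    return sorted(Path(root).rglob("mbdump*.tar.bz2"))


if __name__ == "__main__":
    for path in find_dumps("."):
        print(path)
```

If nothing is printed, the download only fetched the index pages and the dump files still need to be fetched from the dated fullexport directory.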

New release (28)

Hi guys,

Our mbslave has been stuck for a few days. It seems to be a problem with the new schema from MetaBrainz:
https://blog.metabrainz.org/2023/05/15/musicbrainz-schema-change-release-2023-05-15-with-upgrade-instructions/

SQL updates :
https://github.com/metabrainz/musicbrainz-server/blob/master/admin/sql/updates/schema-change/28.all.sql
https://github.com/metabrainz/musicbrainz-server/blob/master/admin/sql/updates/schema-change/28.master_and_standalone.sql
https://github.com/metabrainz/musicbrainz-server/blob/master/admin/sql/updates/schema-change/28.master_only.sql

Thank you for your hard work 💪🏻

How BIG is the database

I find it funny when you go to the Apple App Store, and you aren't told how LARGE an app is. That is a KEY piece of information. When you get charged $1700 for a 1tb phone one of the MAIN DETERMINANTS I have in determining 'which games do I play' correlate to 'HOW LARGE IS THE GAME'?

Likewise, I don't have any idea how large this database is. That is the MAIN LIMITING FACTOR for me today. As a Database professional, I COULD install this on many different machines.

but I don't know, and I CANNOT decide what machine to install this on without having SOME ballpark of how large this app is. To be honest, I wish that GitHub would tell me 'How Large is a Project' before I choose to 'Download the Zip File' or connect to the project using Github Desktop. If it's SMALL? A zip file is preferable.

Right now, I have utterly fallen in love with Music Brainz. I want to query the Database in PLAIN OLD SQL. I can't do that until I google 'How Large Is the Music Brains Database'. Things should be easier than that. Google doesn't even return the SAME RESULTS for every person.

Instructions missing

Instructions only say to 'pip install mbdata' then adjust mbslave.conf.default. However this file does not appear to exist unless cloning the repository (according to my find query and general poking around).

Error running mbslave sync

I see the following error when running mbslave sync

% mbslave sync
INFO:mbdata.replication:Downloading https://metabrainz.org/api/musicbrainz/replication-155737.tar.bz2?token=***
Traceback (most recent call last):
  File "/Users/simonhopkin/.local/bin/mbslave", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 803, in main
    args.func(config, args)
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 520, in mbslave_sync_main
    process_tar(packet, db, config, ignored_schemas, ignored_tables, schema_seq, replication_seq, hook)
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 457, in process_tar
    logger.info("Processing %s", fileobj.name)
                                 ^^^^^^^^^^^^
AttributeError: 'HTTPResponse' object has no attribute 'name'

I installed mbdata using pipx:

pipx install 'mbdata[replication]'
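
The traceback shows the logging call assuming every file object has a `name` attribute, which an HTTP response does not. A defensive lookup, sketched here with a hypothetical helper (this is not the project's actual fix), avoids the AttributeError:

```python
import io


def describe_source(fileobj, fallback="<stream>"):
    """Return a printable name for a file-like object.

    Regular files expose .name; in-memory buffers and HTTP responses
    may not, so fall back to a placeholder instead of raising.
    """
    return getattr(fileobj, "name", fallback)


if __name__ == "__main__":
    print(describe_source(io.BytesIO(b"data")))
```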

MusicBrainz schema change release 2024-05-13 (29)

We're seeing the following error:

mbslave.replication.MismatchedSchemaError: Mismatched schema sequence, 28 (database) vs 29 (replication packet)

This is caused by MusicBrainz schema change release 2024-05-13

mbslave: No "urllib2" in Python 3

mbdata.replication tries to import urllib2, but in Python 3 urllib2 no longer exists (it was split into urllib.request and urllib.error), so the import raises a ModuleNotFoundError.
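
A common way to support both interpreters is a guarded import. This is a generic sketch of the pattern, not necessarily how the project resolved it:

```python
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

# urlopen(url, timeout=60) can now be called the same way on both versions.
```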

Creating Schema results in error

echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S

Results in:

Traceback (most recent call last):
File "/usr/local/bin/mbslave", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/dist-packages/mbdata/replication.py", line 603, in main
args.func(config, args)
File "/usr/local/lib/python2.7/dist-packages/mbdata/replication.py", line 560, in mbslave_psql_main
process = subprocess.Popen(command, env=environ)
File "/usr/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
TypeError: coercing to Unicode: need string or buffer, int found

Apologies if this is me, however I had no trouble completing these pieces when installing mbslave rather than mbdata, so after much config stalking I'm assuming it's something in the instructions or in the configuration of the new code.
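
The TypeError is raised while Popen processes its arguments, which suggests (the report does not confirm this) that a non-string value, perhaps the port number, ended up in the environment passed to psql. A sketch of building such an environment with every value coerced to a string; `build_psql_env` is a hypothetical helper, while PGHOST, PGPORT and PGUSER are the standard libpq variables:

```python
import os


def build_psql_env(host, port, user, base_env=None):
    """Return an environment dict for psql with string-only values."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "PGHOST": str(host),
        "PGPORT": str(port),  # an int here would break subprocess.Popen
        "PGUSER": str(user),
    })
    return env
```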

relation "musicbrainz.alternative_release_type" does not exist

Following the steps in the README the importing fails with the following error message:

mbslave import mbdump.tar.bz2 
INFO:mbdata.replication:Importing data from mbdump.tar.bz2
INFO:mbdata.replication:Loading alternative_release_type to musicbrainz.alternative_release_type
Traceback (most recent call last):
  File "/home/julian/devel/bandmap/venv/bin/mbslave", line 33, in <module>
    sys.exit(load_entry_point('mbdata==26.0.0', 'console_scripts', 'mbslave')())
  File "/home/julian/devel/bandmap/venv/lib64/python3.9/site-packages/mbdata-26.0.0-py3.9.egg/mbdata/replication.py", line 607, in main
    args.func(config, args)
  File "/home/julian/devel/bandmap/venv/lib64/python3.9/site-packages/mbdata-26.0.0-py3.9.egg/mbdata/replication.py", line 256, in mbslave_import_main
    load_tar(filename, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
  File "/home/julian/devel/bandmap/venv/lib64/python3.9/site-packages/mbdata-26.0.0-py3.9.egg/mbdata/replication.py", line 248, in load_tar
    cursor.copy_from(tar.extractfile(member), fulltable)
psycopg2.errors.UndefinedTable: relation "musicbrainz.alternative_release_type" does not exist

Am I doing something wrong or has there been a schema change ?

Instructions do not work

By running sudo su - postgres the shell is in a state where user musicbrainz can be used without a password, but where neither the script mbslave nor the mbdata.replication module exist.

If I exit being logged in as postgres, I have access to mbslave and mbdata.replication, but now I need a password for the musicbrainz user I just created.

Please test the instructions on a virgin system and modify as needed to work. Thanks!

mbslave installed via pip: ImportError: No module named mbdata.replication

I installed mbdata via pip exactly as described in the instructions. Unfortunately, I get an error each time I try to use the mbslave script:

Traceback (most recent call last):
File "/home/****/.local/bin/mbslave", line 6, in <module>
from mbdata.replication import main
ImportError: No module named mbdata.replication

After the pip install, only the mbslave script is in the directory.
The script works great if I clone the whole git repository instead.

CreateIndexes.sql -> "musicbrainz.ll_to_earth(double precision, double precision) does not exist"

Running mbslave psql -f CreateIndexes.sql executes for a long time with lots of "CREATE INDEX" outputs, but ultimately errors out with:

ERROR:  function musicbrainz.ll_to_earth(double precision, double precision) does not exist
LINE 1: CREATE INDEX place_idx_geo ON place USING gist (musicbrainz....
                                                        ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

No module named 'psycopg2'

I'm trying to follow the instructions in a raspberry pi with ubuntu 22.04.

Once I get to part 5 of the instructions it always fails at the first command.

Traceback (most recent call last):
File "/home/ubuntu/.local/bin/mbslave", line 5, in <module>
from mbdata.replication import main
File "/home/ubuntu/.local/pipx/venvs/mbdata/lib/python3.10/site-packages/mbdata/replication.py", line 7, in <module>
import psycopg2
ModuleNotFoundError: No module named 'psycopg2'

I tried to install psycopg2 in several ways (installing psycopg2-binary using pip and pip3, building from the source and installing it), and in all those times the installation was successful. However, those mbslave commands always end up not finding the module.

What am I missing here?
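
The traceback shows mbslave running from a pipx virtual environment (~/.local/pipx/venvs/mbdata), which is isolated from packages installed with the system pip or pip3. A quick check, run with that environment's own interpreter, shows whether psycopg2 is visible there. `module_available` is a hypothetical helper:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("psycopg2 importable:", module_available("psycopg2"))
```

If it reports False, adding the driver to that environment (for example with `pipx inject mbdata psycopg2-binary`, or by installing `mbdata[replication]` as shown in another issue here) is a likely fix, though I have not verified it on this setup.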

ERROR: relation "art_type" already exists

The following error is displayed when running mbslave init --create-user --create-database

psql:/var/folders/63/wlxrc7ds36s2mhyddjpg56_80000gn/T/tmprzrao_gq.sql:14: ERROR:  relation "art_type" already exists
Traceback (most recent call last):
  File "/Users/simonhopkin/.local/bin/mbslave", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 803, in main
    args.func(config, args)
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 653, in mbslave_init_main
    run_sql_script(sql_script)
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 617, in run_sql_script
    run_script(command)
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 609, in run_script
    subprocess.run(['bash', '-euxc', script], check=True)
  File "/usr/local/Cellar/[email protected]/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '-euxc', 'mbslave psql -f eaa/CreateTables.sql']' returned non-zero exit status 3.

I'm using Postgresql 15 on Mac OS, Python 3.10.9

This issue only occurs when applying the following suggestion to merge into one schema:

[schemas]
musicbrainz=musicbrainz
statistics=musicbrainz
cover_art_archive=musicbrainz
wikidocs=musicbrainz
documentation=musicbrainz

README references non-existent SQL files

I think the README is out of date with the actual SQL files. For example, it shows mbslave psql -f CreateCollations.sql, but that file doesn't exist.

I haven't checked every such command, but I think a number of them are no longer correct.

mbslave command not found

Hello,

I'm trying to import the MusicBrainz database. I'm a student and don't really know what I'm doing. I try to run the mbslave commands, but I'm told that the command doesn't exist (I'm running them inside PostgreSQL, which I think is correct).
This is my mbslave file:

[database]
host=127.0.0.1
port=5432
name=musicbrainz
user=musicbrainz
password=musicbrainz

[musicbrainz]
base_url=https://metabrainz.org/api/musicbrainz/
token=MyToken

[tables]
ignore=
#ignore=tracklist_index

[schemas]
musicbrainz=musicbrainz
statistics=statistics
cover_art_archive=cover_art_archive
event_art_archive=event_art_archive
wikidocs=wikidocs
documentation=documentation
ignore=
#ignore=statistics,cover_art_archive,wikidocs,documentation

Can someone help me? I'm not sure what I'm doing wrong.

Any steps necessary for mbdata users after MB’s PG12 update?

MusicBrainz updated their PostgreSQL version to 12 and have some steps for downstream users already on PG12.

Do mbdata users need to take any special steps in this regard?

Build instructions that actually work.

Make sure you're using PostgreSQL 12 or later.

Complete the first 3 steps to install mbdata and setup your mbslave.conf file.

The SQL scripts in the mbdata repo haven't been updated in over 6 months, so you need to
either clone the MusicBrainz git repo or download the latest MusicBrainz zip file from
GitHub, then copy or extract the contents of the musicbrainz-server-master/admin/sql folder
over the files in the mbdata/sql folder.

If you used pipx the mbdata/sql folder will be located somewhere like the following:

~/.local/pipx/venvs/mbdata/lib/python3.x/site-packages/mbdata/sql

If needed, use find to locate your specific folder location.

The remainder of the steps were taken verbatim from the MusicBrainz InitDb.pl script. This will create a complete
MusicBrainz slave database with all tables, indexes, etc. You may want to customize as needed if, for instance,
you're not importing wikidocs or other non-essential data dumps.
Create the Database:

sudo su - postgres
createuser musicbrainz
createdb -l C -E UNICODE -T template0 -O musicbrainz musicbrainz

#MB now use DateTime->now to populate 'TIMESTAMP WITH TIME ZONE' columns in their code.
#DateTime->now outputs 'floating' UTC by default, but doesn't encode any timezone info in its output, so the
#database must have its timezone set to UTC in order to correctly interpret those values.

psql musicbrainz -c "ALTER DATABASE musicbrainz SET timezone TO 'UTC';"

psql musicbrainz -c 'CREATE EXTENSION IF NOT EXISTS cube WITH SCHEMA public;'
psql musicbrainz -c 'CREATE EXTENSION IF NOT EXISTS earthdistance WITH SCHEMA public;'
psql musicbrainz -c 'CREATE EXTENSION IF NOT EXISTS unaccent WITH SCHEMA public;'

#exit out of sudo back to the account you setup for mbdata
Prepare empty schemas:

echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S
echo 'CREATE SCHEMA statistics;' | mbslave psql -S
echo 'CREATE SCHEMA cover_art_archive;' | mbslave psql -S
echo 'CREATE SCHEMA wikidocs;' | mbslave psql -S
echo 'CREATE SCHEMA documentation;' | mbslave psql -S
echo 'CREATE SCHEMA event_art_archive;' | mbslave psql -S
echo 'CREATE SCHEMA json_dump;' | mbslave psql -S
echo 'CREATE SCHEMA report;' | mbslave psql -S
echo 'CREATE SCHEMA sitemaps;' | mbslave psql -S
Create tables structures:

#The first script will give an error that the extensions already exist because we already added them in step 1. The
#extensions have to be created in step 1 as you need to be a superuser account to create extensions. We just need
#to run the Extensions.sql script to add the musicbrainz.ll_to_earth() function so index creation won't fail as reported
#in this issue ticket.

mbslave psql -f Extensions.sql
mbslave psql -f CreateCollations.sql
mbslave psql -f CreateTables.sql
mbslave psql -f caa/CreateTables.sql
mbslave psql -f eaa/CreateTables.sql
mbslave psql -f documentation/CreateTables.sql
mbslave psql -f json_dump/CreateTables.sql
mbslave psql -f report/CreateTables.sql
mbslave psql -f sitemaps/CreateTables.sql
mbslave psql -f statistics/CreateTables.sql
mbslave psql -f wikidocs/CreateTables.sql
Import the data dumps. Minimally you need the following two dump files:

mbslave import mbdump.tar.bz2 mbdump-derived.tar.bz2
Create the primary keys:

mbslave psql -f CreatePrimaryKeys.sql
mbslave psql -f caa/CreatePrimaryKeys.sql
mbslave psql -f documentation/CreatePrimaryKeys.sql
mbslave psql -f eaa/CreatePrimaryKeys.sql
mbslave psql -f statistics/CreatePrimaryKeys.sql
mbslave psql -f wikidocs/CreatePrimaryKeys.sql
Create functions

mbslave psql -f CreateSearchConfiguration.sql
mbslave psql -f CreateFunctions.sql
mbslave psql -f caa/CreateFunctions.sql
mbslave psql -f eaa/CreateFunctions.sql
mbslave psql -f CreateSlaveOnlyFunctions.sql
Create indexes

mbslave psql -f CreateIndexes.sql
mbslave psql -f caa/CreateIndexes.sql
mbslave psql -f eaa/CreateIndexes.sql
mbslave psql -f json_dump/CreateIndexes.sql
mbslave psql -f sitemaps/CreateIndexes.sql
mbslave psql -f statistics/CreateIndexes.sql
mbslave psql -f CreateSlaveIndexes.sql
Set initial sequence values

mbslave psql -f SetSequences.sql
mbslave psql -f statistics/SetSequences.sql
Create views and triggers and search indexes

mbslave psql -f CreateViews.sql
mbslave psql -f caa/CreateViews.sql
mbslave psql -f eaa/CreateViews.sql
mbslave psql -f CreateSlaveOnlyTriggers.sql
mbslave psql -f CreateSearchIndexes.sql
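
Since the steps above are a long run of `mbslave psql -f` calls, they could be driven from a short script that stops at the first failure. This is only a sketch: `psql_commands` and `run_all` are hypothetical helpers, and the list shows just the first few scripts from the table-creation step.

```python
import subprocess

# Script names copied from the steps above, in the order given there.
TABLE_SCRIPTS = [
    "Extensions.sql",
    "CreateCollations.sql",
    "CreateTables.sql",
    "caa/CreateTables.sql",
    "eaa/CreateTables.sql",
]


def psql_commands(scripts):
    """Turn script names into `mbslave psql -f <script>` argument lists."""
    return [["mbslave", "psql", "-f", script] for script in scripts]


def run_all(commands, dry_run=True):
    """Print each command; execute it unless dry_run is set."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop on first failure


if __name__ == "__main__":
    run_all(psql_commands(TABLE_SCRIPTS))
```

Keep in mind the note above that Extensions.sql is expected to report an error about already existing extensions, so that script may need to be run separately from a stop-on-failure loop.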

mbslave is wrong

cat /etc/mbslave.conf

[database]
host=127.0.0.1
port=5432
name=musicbrainz
user=musicbrainz
#password=

[musicbrainz]
base_url=https://metabrainz.org/api/musicbrainz/
token=Eqzq...(this is my token)

[tables]
ignore=
#ignore=tracklist_index

When I run step 5,
"Prepare empty schemas for the MusicBrainz database and create the table structure":
echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S
echo 'CREATE SCHEMA statistics;' | mbslave psql -S
echo 'CREATE SCHEMA cover_art_archive;' | mbslave psql -S
echo 'CREATE SCHEMA wikidocs;' | mbslave psql -S
echo 'CREATE SCHEMA documentation;' | mbslave psql -S

mbslave psql -f CreateCollations.sql
mbslave psql -f CreateTables.sql
mbslave psql -f statistics/CreateTables.sql
mbslave psql -f caa/CreateTables.sql
mbslave psql -f wikidocs/CreateTables.sql
mbslave psql -f documentation/CreateTables.sql

I get an error:

[root@test pgsql]# echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S
Traceback (most recent call last):
File "/bin/mbslave", line 7, in <module>
from mbdata.replication import main
File "/usr/lib/python2.7/site-packages/mbdata/replication.py", line 20, in <module>
from contextlib2 import ExitStack
File "/usr/lib/python2.7/site-packages/contextlib2/__init__.py", line 56
async def __aenter__(self):
^
SyntaxError: invalid syntax

Import fails

Trying to build a standalone database (no replication) with the June 1 extracts and get a psycopg2.errors.BadCopyFileFormat error - see output / error messages below. I'm assuming this is related to the schema upgrade last month, but it's not clear how or where to retrieve the latest / correct sql. I would have assumed an update to the mbdata package would correspond to the schema changes, but no luck. I'm not running the server or docker - just want to play around with the database. Any pointers / advice much appreciated. Thanks.

mbslave import mbdump.tar.bz2
INFO:mbdata.replication:Importing data from mbdump.tar.bz2
INFO:mbdata.replication:Loading alternative_release_type to musicbrainz.alternative_release_type
INFO:mbdata.replication:Loading area to musicbrainz.area
INFO:mbdata.replication:Loading area_alias to musicbrainz.area_alias
INFO:mbdata.replication:Loading area_alias_type to musicbrainz.area_alias_type
INFO:mbdata.replication:Loading area_gid_redirect to musicbrainz.area_gid_redirect
INFO:mbdata.replication:Loading area_type to musicbrainz.area_type
INFO:mbdata.replication:Loading artist to musicbrainz.artist
INFO:mbdata.replication:Loading artist_alias to musicbrainz.artist_alias
INFO:mbdata.replication:Loading artist_alias_type to musicbrainz.artist_alias_type
INFO:mbdata.replication:Loading artist_credit to musicbrainz.artist_credit
Traceback (most recent call last):
File "/Users/me/dev/miniconda3/bin/mbslave", line 8, in <module>
sys.exit(main())
File "/Users/me/dev/miniconda3/lib/python3.8/site-packages/mbdata/replication.py", line 607, in main
args.func(config, args)
File "/Users/me/dev/miniconda3/lib/python3.8/site-packages/mbdata/replication.py", line 256, in mbslave_import_main
load_tar(filename, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
File "/Users/me/dev/miniconda3/lib/python3.8/site-packages/mbdata/replication.py", line 248, in load_tar
cursor.copy_from(tar.extractfile(member), fulltable)
psycopg2.errors.BadCopyFileFormat: extra data after last expected column
CONTEXT: COPY artist_credit, line 1: "2152096 The Chats 1 202 2018-01-26 11:59:06.33519+00 0 33fbf1e4-4768-30cc-a5c6-1c72f4f45826"
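
"extra data after last expected column" means a row in the dump has more fields than the target table has columns, which would be consistent with tables created from an older schema than the dump. COPY's text format is tab-delimited by default, so the field count of a dump line can be checked directly. `count_columns` is a hypothetical helper:

```python
def count_columns(line):
    """Count the tab-separated fields in one line of a COPY text dump."""
    return len(line.rstrip("\n").split("\t"))


if __name__ == "__main__":
    # Assumes the archive was extracted so that mbdump/artist_credit exists.
    with open("mbdump/artist_credit", encoding="utf-8") as f:
        print(count_columns(f.readline()))
```

Comparing that number with the column count of musicbrainz.artist_credit in the database shows whether the table definition is behind the dump.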

Schema change release 2022-05-16 upgrade

Add support and upgrade instructions for 2022-05-16 schema change release

mbdata.replication.MismatchedSchemaError: Mismatched schema sequence, 26 (database) vs 27 (replication packet)

CreateFunctions.sql -> function array_append(anyarray, anyelement) does not exist

phrogz@GrayBook MusicExplorer % mbslave psql -f CreateFunctions.sql
BEGIN
CREATE FUNCTION
psql:/var/folders/gr/391stds133bg1326wp7w70tm0000gn/T/tmps2a60q5n.sql:23: ERROR:  function array_append(anyarray, anyelement) does not exist

phrogz@GrayBook MusicExplorer % psql --version
psql (PostgreSQL) 14.1

Where should mbslave.conf be put?

I note the instructions have now got curl https://raw.githubusercontent.com/lalinsky/mbdata/master/mbslave.conf.default -o mbslave.conf
vim mbslave.conf

However, where should this be placed so that the remainder of mbdata can access it?

Sorry I haven't figured out what binary is calling it and how.

Nearly there!

Thanks.

extras_required misspelled

In setup.py we have:

https://github.com/lalinsky/mbdata/blob/42461d850e3ae47db3c95131ef1309215ce9fc72/setup.py#L31-L34

This should presumably be extras_require instead. Note the extra s: extras is plural, not singular.

400: Bad Request Error with mbslave sync

Initial db setup worked like a dream. When trying to sync for the first time, I get a 400. Any thoughts? Here is the full trace:

(mbdata) max@catify-1:~/vendor$ PGPASSWORD=MYPASS mbslave sync
Downloading https://metabrainz.org/api/musicbrainz/replication-126143.tar.bz2
Traceback (most recent call last):
  File "/home/max/vendor/mbdata/bin/mbslave", line 10, in <module>
    sys.exit(main())
  File "/home/max/vendor/mbdata/lib/python3.6/site-packages/mbdata/replication.py", line 592, in main
    args.func(config, args)
  File "/home/max/vendor/mbdata/lib/python3.6/site-packages/mbdata/replication.py", line 466, in mbslave_sync_main
    tmp = download_packet(base_url, token, replication_seq)
  File "/home/max/vendor/mbdata/lib/python3.6/site-packages/mbdata/replication.py", line 437, in download_packet
    data = urlopen(url, timeout=60)
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: BAD REQUEST

Issue replication database with mbdata(25.0.4)

Hi,

I have recently joined the MetaBrainz/MusicBrainz community. For a few days, I have been having trouble replicating the MusicBrainz database with mbdata. Following the guide, everything is fine up to step 6; however, at step 7, where I should import the dumps from the .tar files, I keep running into this error:

Importing data from mbdump.tar.bz2
- Loading alternative_release_type to musicbrainz.alternative_release_type
Traceback (most recent call last):
  File "/home/manuel/.local/bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/home/manuel/.local/lib/python3.9/site-packages/mbdata/replication.py", line 592, in main
    args.func(config, args)
  File "/home/manuel/.local/lib/python3.9/site-packages/mbdata/replication.py", line 253, in mbslave_import_main
    load_tar(filename, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
  File "/home/manuel/.local/lib/python3.9/site-packages/mbdata/replication.py", line 245, in load_tar
    cursor.copy_from(tar.extractfile(member), fulltable)
psycopg2.errors.UndefinedTable: relation "musicbrainz.alternative_release_type" does not exist

If I check in the Postgres musicbrainz database, all the tables are available. Any advice on how to proceed? Thank you so much in advance.

Password required when importing data dumps

Hello,

I am currently following the instructions to load the data into SQL, but I am stuck at step 7.

When I run the command 'mbslave import ../mbdump.tar.bz2 ../mbdump-derived.tar.bz2', I get the following error:

(base) alexis@1ZSMZM2:~/data/mbdata$ mbslave import ../mbdump.tar.bz2
Traceback (most recent call last):
  File "/home/alexis/apps/anaconda3/bin/mbslave", line 10, in <module>
    sys.exit(main())
  File "/home/alexis/apps/anaconda3/lib/python3.7/site-packages/mbdata/replication.py", line 592, in main
    args.func(config, args)
  File "/home/alexis/apps/anaconda3/lib/python3.7/site-packages/mbdata/replication.py", line 250, in mbslave_import_main
    db = connect_db(config)
  File "/home/alexis/apps/anaconda3/lib/python3.7/site-packages/mbdata/replication.py", line 196, in connect_db
    return cfg.connect_db(set_search_path=set_search_path, superuser=superuser)
  File "/home/alexis/apps/anaconda3/lib/python3.7/site-packages/mbdata/replication.py", line 189, in connect_db
    db = psycopg2.connect(**self.database.create_psycopg2_kwargs(superuser=superuser))
  File "/home/alexis/apps/anaconda3/lib/python3.7/site-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: fe_sendauth: no password supplied

Should I run this command as the postgres user instead? If so, with which password?

Thanks

Instructions incorrect?

The instructions rely on 'mbslave', e.g.

echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S

mbslave psql -f CreateTables.sql

However, mbslave does not seem to exist anywhere, or the instructions do not clearly show me how to install it or add it to my path. This worked very easily with mbslave, but does not seem to work with mbdata (which you referred me to in place of mbslave).

What am I missing?

Thanks.

mbslave without arguments just fails

root@mbdata:~/tmp/mbdata# ./mbslave.py
Traceback (most recent call last):
  File "./mbslave.py", line 6, in <module>
    main()
  File "/root/tmp/mbdata/mbdata/replication.py", line 607, in main
    args.func(config, args)
AttributeError: 'Namespace' object has no attribute 'func'

while

root@mbdata:~/tmp/mbdata# ./mbslave.py -h
usage: mbslave.py [-h] [-c, --config PATH] {import,sync,remap-schema,print-sql,psql} ...

positional arguments:
  {import,sync,remap-schema,print-sql,psql}

optional arguments:
  -h, --help            show this help message and exit
  -c, --config PATH     path to the config file (default: mbslave.conf:/etc/mbslave.conf)

works.

This is discouraging, to say the least; printing the help message is the expected behavior.

root@mbdata:~/tmp/mbdata# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal
root@mbdata:~/tmp/mbdata# python --version
Python 3.8.10
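
For reference, the AttributeError above is what argparse produces on Python 3 when sub-commands are optional and the code calls args.func unconditionally. A minimal sketch of one common workaround follows; it is not the project's actual code, and build_parser, run, and the stand-in sync handler are hypothetical names used only for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="mbslave")
    subparsers = parser.add_subparsers(dest="command")
    # Hypothetical stand-in for the real sub-commands (import, sync, ...).
    sync = subparsers.add_parser("sync")
    sync.set_defaults(func=lambda args: None)
    return parser


def run(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    func = getattr(args, "func", None)
    if func is None:
        # No sub-command given: show usage instead of raising AttributeError.
        parser.print_help()
        return 1
    func(args)
    return 0
```

With this guard, run([]) prints the usage text and returns a non-zero status, while run(["sync"]) dispatches as before. Passing required=True to add_subparsers (available since Python 3.7) is an alternative that makes argparse itself report the missing sub-command.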

Build Docker images from GitHub Actions

"mbslave sync" command raises exception "AttributeError: 'HTTPResponse' object has no attribute 'name'"

After I successfully completed the mbslave init --create-user --create-database command, which took quite a while, mbslave sync failed with the following error:

INFO:mbdata.replication:Downloading https://metabrainz.org/api/musicbrainz/replication-154689.tar.bz2?token=***
Traceback (most recent call last):
  File "/home/m/.local/bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/home/m/.local/pipx/venvs/mbdata/lib/python3.10/site-packages/mbdata/replication.py", line 803, in main
    args.func(config, args)
  File "/home/m/.local/pipx/venvs/mbdata/lib/python3.10/site-packages/mbdata/replication.py", line 520, in mbslave_sync_main
    process_tar(packet, db, config, ignored_schemas, ignored_tables, schema_seq, replication_seq, hook)
  File "/home/m/.local/pipx/venvs/mbdata/lib/python3.10/site-packages/mbdata/replication.py", line 457, in process_tar
    logger.info("Processing %s", fileobj.name)
AttributeError: 'HTTPResponse' object has no attribute 'name'

Take a look at the bottom of the traceback. The exception seems to be raised by the logging call. So I commented the logging code out, and mbslave sync then completed successfully. Hooray!
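
Rather than removing the log line, a gentler workaround might be to fall back to a placeholder when the file-like object has no name attribute, which is the case for an HTTP response stream. The sketch below is only an illustration: stream_label and log_processing are hypothetical helpers, not functions from mbdata.

```python
import logging

logger = logging.getLogger("example")


def stream_label(fileobj):
    """Return a printable label for a file-like object that may lack .name."""
    return getattr(fileobj, "name", None) or "<unnamed stream>"


def log_processing(fileobj):
    # Same message as in the traceback, but safe for objects without .name.
    label = stream_label(fileobj)
    logger.info("Processing %s", label)
    return label
```

Real files opened with open() still log their path, and in-memory or network streams log the placeholder instead of raising.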

AttributeError: 'DatabaseConfig' object has no attribute 'connect_db'

When I run mbslave import mbdump.tar.bz2, I get an error:

Traceback (most recent call last):
  File "/usr/local/bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/mbdata/replication.py", line 803, in main
    args.func(config, args)
  File "/usr/local/lib/python3.8/dist-packages/mbdata/replication.py", line 263, in mbslave_import_main
    db = config.database.connect_db(superuser=True, set_search_path=False)
AttributeError: 'DatabaseConfig' object has no attribute 'connect_db'

I made changes to the file /usr/local/lib/python3.8/dist-packages/mbdata/replication.py line 263

This line was replaced

db = config.database.connect_db(superuser=True, set_search_path=False)

with this

db = connect_db(config)

Is this solution correct?

mbslave sync >>/var/log/mbslave.log

Hi. First of all, thanks for your great work on MusicBrainz Server.
Everything works fine except mbslave sync in crontab.
But mbslave sync >>/var/log/mbslave.log works fine when run from the command line.
I'm on Ubuntu 19.10.
Thanks for any advice.

MBSLAVE_CONFIG and command line argument "-c" don’t behave the same

Using -c path/to/mbslave.conf and having MBSLAVE_CONFIG set to path/to/mbslave.conf (same path) don’t seem to behave identically (as I would have expected):

> echo 'UPDATE replication_control SET current_schema_sequence = 25;' | mbslave psql
psql: FATAL:  no pg_hba.conf entry for host "[local]", user "freso", database "musicbrainz", SSL off
> echo 'UPDATE replication_control SET current_schema_sequence = 25;' | mbslave -c $XDG_CONFIG_HOME/mbdata/mbslave.conf psql
UPDATE 1
> echo $MBSLAVE_CONFIG
/home/freso/.config/mbdata/mbslave.conf
> pacman -Qi python-mbdata-git|grep Version
Version                : 25.0.4.r0.g42461d8-1
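
To make the expectation concrete, here is a sketch of the lookup order I would have expected: the -c argument first, then MBSLAVE_CONFIG, then the default shown in the --help output (mbslave.conf:/etc/mbslave.conf). This is an assumption about the desired behavior, not a description of what mbdata currently does, and resolve_config_paths is a hypothetical name.

```python
import os

# Default taken from the --help output; entries are colon-separated.
DEFAULT_PATHS = "mbslave.conf:/etc/mbslave.conf"


def resolve_config_paths(cli_path=None, environ=None):
    """Return candidate config paths: CLI argument, then env var, then default."""
    environ = os.environ if environ is None else environ
    raw = cli_path or environ.get("MBSLAVE_CONFIG") or DEFAULT_PATHS
    return [os.path.expanduser(p) for p in raw.split(":") if p]
```

Under this scheme, setting MBSLAVE_CONFIG to a path and passing the same path with -c would resolve to the same single candidate, so both invocations would read the same file.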
