
mbdata's People

Contributors

alastair, amcap1712, dependabot[bot], felix, gerion0, lalinsky, mjpieters, reosarevok, sfussenegger, wsovine, yvanzo


mbdata's Issues

Are the environment variables set correctly?

I don't really understand Python, but it seems to me that the application reads the environment variable MBSLAVE_DB_DB for the database name, not MBSLAVE_DB_NAME as stated in the documentation.

read_env_item(self, 'name', prefix + 'DB_DB')

self.musicbrainz.read_env('MBSLAVE_')

I tried adding MBSLAVE_DB_DB as an environment variable, but I still get an error.

I am setting the environment variables like this (in a k8s YAML file):

containers:
      - name: musicbrainz-db-mirror
        image: leiyiliro/mbslave:1.0  # Specific version of the Docker image
        env:
        - name: MBSLAVE_DB_HOST
          value: musicbrainz-db
        - name: MBSLAVE_DB_PORT
          value: "5432"
        - name: MBSLAVE_DB_NAME
          value: musicbrainz
        - name: MBSLAVE_DB_DB
          value: musicbrainz
          # Used for read and write operations on the MusicBrainz database          
        - name: MBSLAVE_DB_USER
          value: $(POSTGRES_USER)
          # PostgreSQL database password for the general user
        - name: MBSLAVE_DB_PASSWORD
          value: $(POSTGRES_PASSWORD)
          # Used for creating and managing the mbslave database, schema updates, and replication
        - name: MBSLAVE_DB_ADMIN_USER
          value: $(POSTGRES_USER)
          # MusicBrainz Slave admin password for the admin user
        - name: MBSLAVE_DB_ADMIN_PASSWORD
          value: $(POSTGRES_PASSWORD)
        - name: MBSLAVE_MUSICBRAINZ_TOKEN
          value: $(MBSLAVE_MUSICBRAINZ_TOKEN)
        ports:
        - containerPort: 80

Even so, the code seems to look for a database named after the value of $(POSTGRES_USER), which is "xxxxxx":

Traceback (most recent call last):
2023-04-18 00:30:10   File "/usr/local/bin/mbslave", line 8, in <module>
2023-04-18 00:30:10     sys.exit(main())
2023-04-18 00:30:10              ^^^^^^
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 803, in main
2023-04-18 00:30:10     args.func(config, args)
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 622, in mbslave_init_main
2023-04-18 00:30:10     create_user(config)
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 576, in create_user
2023-04-18 00:30:10     db = connect_db(config, superuser=True, no_db=True)
2023-04-18 00:30:10          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 209, in connect_db
2023-04-18 00:30:10     return cfg.connect_db(set_search_path=set_search_path, superuser=superuser, no_db=no_db)
2023-04-18 00:30:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 202, in connect_db
2023-04-18 00:30:10     db = psycopg2.connect(**self.database.create_psycopg2_kwargs(superuser=superuser, no_db=no_db))
2023-04-18 00:30:10          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/psycopg2/__init__.py", line 122, in connect
2023-04-18 00:30:10     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
2023-04-18 00:30:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-04-18 00:30:10 psycopg2.OperationalError: FATAL:  database "xxxxxx" does not exist
2023-04-18 00:30:10 
2023-04-18 00:30:10 Traceback (most recent call last):
2023-04-18 00:30:10   File "/usr/local/bin/mbslave", line 8, in <module>
2023-04-18 00:30:10     sys.exit(main())
2023-04-18 00:30:10              ^^^^^^
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 803, in main
2023-04-18 00:30:10     args.func(config, args)
2023-04-18 00:30:10   File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 513, in mbslave_sync_main
2023-04-18 00:30:10     cursor.execute("SELECT current_schema_sequence, current_replication_sequence FROM %s.replication_control" % config.schemas.name('musicbrainz'))
2023-04-18 00:30:10 psycopg2.errors.UndefinedTable: relation "musicbrainz.replication_control" does not exist
2023-04-18 00:30:10 LINE 1: ...chema_sequence, current_replication_sequence FROM musicbrain...
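The failing call in the first traceback is connect_db(config, superuser=True, no_db=True). Assuming no_db simply leaves the dbname keyword out (the traceback does not show this), libpq falls back to a database with the same name as the connecting user, which would explain the "xxxxxx" error even with MBSLAVE_DB_DB set. The sketch below models that lookup; the helper names and the suffix table are hypothetical, except DB_DB, which comes from the quoted read_env_item line.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical suffix table. "DB_DB" for the database name follows the
# read_env_item(self, 'name', prefix + 'DB_DB') line quoted above; the
# other suffixes are assumed from the variable names in the manifest.
ENV_SUFFIXES = {
    "host": "DB_HOST",
    "port": "DB_PORT",
    "dbname": "DB_DB",
    "user": "DB_USER",
    "password": "DB_PASSWORD",
}


def read_db_env(prefix: str, environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the connection settings that are present in the environment."""
    env = os.environ if environ is None else environ
    return {
        key: env[prefix + suffix]
        for key, suffix in ENV_SUFFIXES.items()
        if prefix + suffix in env
    }


def without_db(kwargs: Mapping[str, str]) -> Dict[str, str]:
    """What a no_db-style connection might pass: everything except dbname."""
    return {k: v for k, v in kwargs.items() if k != "dbname"}


def effective_dbname(kwargs: Mapping[str, str]) -> Optional[str]:
    """Database libpq opens (ignoring PGDATABASE): without dbname, the user name."""
    return kwargs.get("dbname", kwargs.get("user"))
```

With MBSLAVE_DB_DB=musicbrainz and MBSLAVE_DB_USER=xxxxxx, effective_dbname(read_db_env("MBSLAVE_")) is "musicbrainz", but effective_dbname(without_db(...)) is "xxxxxx". Note that MBSLAVE_DB_NAME is never read in this model.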

mbslave: error: invalid choice: 'init'

mbslave init --create-user --create-database

This gives the following error:

usage: mbslave [-h] [-c, --config PATH] {import,sync,remap-schema,print-sql,psql} ...
mbslave: error: invalid choice: 'init' (choose from 'import', 'sync', 'remap-schema', 'print-sql', 'psql')
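The usage line lists only import, sync, remap-schema, print-sql and psql, which suggests the installed mbslave predates the init subcommand the README describes. A quick way to see which mbdata version an interpreter actually has is importlib.metadata (Python 3.8+); the helper name is only for illustration. Under pipx, run it with the interpreter inside the mbdata venv.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "mbdata") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("mbdata:", installed_version() or "not installed for this interpreter")
```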

mbslave command does not recognize long options

I haven't tried every option, but none of those I tried worked.

$> mbslave psql --file CreateCollations.sql
usage: mbslave [-h] [-c, --config PATH]
               {import,sync,remap-schema,print-sql,psql} ...
mbslave: error: unrecognized arguments: --file CreateCollations.sql

That's one example of several I tried.

The only long option it seems to recognize is --help.

"musicbrainz.alternative_release_type" does not exist

~/ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20211120-001843$ mbslave import mbdump.tar.bz2 mbdump-derived.tar.bz2
Importing data from mbdump.tar.bz2

Loading alternative_release_type to musicbrainz.alternative_release_type
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mbdata/replication.py", line 592, in main
    args.func(config, args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mbdata/replication.py", line 253, in mbslave_import_main
    load_tar(filename, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mbdata/replication.py", line 245, in load_tar
    cursor.copy_from(tar.extractfile(member), fulltable)
psycopg2.errors.UndefinedTable: relation "musicbrainz.alternative_release_type" does not exist

Seems to be related to the issue mentioned here:
https://community.metabrainz.org/t/database-replication-issue-with-mbdata-25-0-4/548110
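This error recurs further down this page, and the linked thread points to a mismatch between the dump and the tables created by the bundled SQL. A sketch for checking that before a long import fails partway: list the tables in the archive and compare them with those in the target schema. It assumes table data is stored as mbdump/TABLE_NAME, as the "Loading ..." log lines suggest; the helper names are hypothetical. Streaming a bz2 archive still decompresses all of it, so this takes a while on the full dump.

```python
import tarfile
from typing import BinaryIO, Iterable, List


def dump_tables(fileobj: BinaryIO) -> List[str]:
    """Table names in an mbdump-style archive (members under "mbdump/")."""
    tables = []
    # "r|*" reads the archive as a stream and detects the compression.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if member.isfile() and member.name.startswith("mbdump/"):
                tables.append(member.name.split("/", 1)[1])
    return tables


def existing_tables(cursor, schema: str = "musicbrainz") -> List[str]:
    """Tables already present in the target schema (any DB-API cursor)."""
    cursor.execute(
        "SELECT table_name FROM information_schema.tables WHERE table_schema = %s",
        (schema,),
    )
    return [row[0] for row in cursor.fetchall()]


def missing_tables(dump: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Tables present in the dump but not in the database."""
    have = set(existing)
    return sorted(t for t in dump if t not in have)
```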

Issue with --create-database argument

I am trying to run mbslave init --create-database.

The code here:

def create_database(config: Config) -> None:

seems to try to connect to the "musicbrainz" database before it is created, so I get the error FATAL: database "musicbrainz" does not exist:

Traceback (most recent call last):
  File "/usr/local/bin/mbslave", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 803, in main
    args.func(config, args)
  File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 625, in mbslave_init_main
    create_database(config)
  File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 583, in create_database
    db = connect_db(config, superuser=True, no_db=True)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 209, in connect_db
    return cfg.connect_db(set_search_path=set_search_path, superuser=superuser, no_db=no_db)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/mbdata/replication.py", line 202, in connect_db
    db = psycopg2.connect(**self.database.create_psycopg2_kwargs(superuser=superuser, no_db=no_db))
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.OperationalError: FATAL:  database "musicbrainz" does not exist
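The traceback again shows connect_db(..., no_db=True). If dbname is simply omitted, libpq defaults it to the user name, so any user without a same-named database hits this. A hedged workaround outside mbslave, assuming an account with CREATEDB rights: connect to the postgres maintenance database (present in a default installation) and create the database there. The settings dict and helper names are hypothetical; the options mirror the createdb command in the build instructions further down this page.

```python
from typing import Dict


def admin_kwargs(settings: Dict[str, str], maintenance_db: str = "postgres") -> Dict[str, str]:
    """Connection keywords for server-level commands such as CREATE DATABASE.

    Names a maintenance database explicitly instead of leaving dbname out,
    because libpq would otherwise default it to the user name.
    """
    kwargs = {k: v for k, v in settings.items() if k != "dbname"}
    kwargs["dbname"] = maintenance_db
    return kwargs


def create_database_via_postgres(settings: Dict[str, str], name: str, owner: str) -> None:
    """Hypothetical helper, not part of mbdata."""
    import psycopg2  # imported here so admin_kwargs() works without psycopg2
    from psycopg2 import sql

    conn = psycopg2.connect(**admin_kwargs(settings))
    try:
        # CREATE DATABASE cannot run inside a transaction block.
        conn.autocommit = True
        with conn.cursor() as cur:
            cur.execute(
                sql.SQL(
                    "CREATE DATABASE {} OWNER {} TEMPLATE template0 "
                    "ENCODING 'UTF8' LC_COLLATE 'C' LC_CTYPE 'C'"
                ).format(sql.Identifier(name), sql.Identifier(owner))
            )
    finally:
        conn.close()
```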

How do I actually access the MB database with this tool?

The documentation states that:

Alternatively, if you are not interested in having a local MusicBrainz website and web service, you can use mbdata that includes replication without the rest of MusicBrainz Server.

But how do I get the replica itself? The README includes:

>>> engine = create_engine('postgresql://musicbrainz:[email protected]/musicbrainz', echo=True)

Is this an existing local database? If so, how do I spin it up?
Do I need a fully functional MusicBrainz server? In that case, it doesn't match the original docs.
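As the quoted documentation describes it, the engine URL points at an ordinary local PostgreSQL database that you build yourself with the mbslave steps (import, then optionally sync), without the rest of MusicBrainz Server. A minimal sketch of connecting to such a replica with SQLAlchemy; the host, port and credentials are placeholders taken from the mbslave.conf examples posted in other issues here, and the helper names are hypothetical.

```python
from urllib.parse import quote


def build_url(user: str, password: str, host: str = "127.0.0.1",
              port: int = 5432, dbname: str = "musicbrainz") -> str:
    """SQLAlchemy URL for a local PostgreSQL replica built with mbslave."""
    # Percent-encode credentials so characters like '@' or ':' are safe.
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{dbname}")


def count_artists(url: str) -> int:
    # Imported lazily so build_url() works without SQLAlchemy/psycopg2 installed.
    from sqlalchemy import create_engine, text

    engine = create_engine(url)
    with engine.connect() as conn:
        return conn.execute(text("SELECT count(*) FROM musicbrainz.artist")).scalar()
```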

musicbrainz.release table restoration takes too long, probably because of triggers

I first used the mbslave command to restore the musicbrainz database from a dump on my computer. Since it took ages (it didn't finish in a day), I dug into the code and narrowed the slowdown down to restoring the release table from the downloaded dump:

from mbdata.replication import Config

# Connect with the same settings file mbslave uses
config = Config(['mbslave.conf'])
db = config.connect_db()
filename = "mbdump/release"  # table data extracted from mbdump.tar.bz2
schema, table = "musicbrainz", "release"

cursor = db.cursor()
with open(filename, 'r') as f:
    # Stream the tab-separated dump straight into the table
    cursor.copy_expert('COPY {}.{} FROM STDIN'.format(schema, table), f)
db.commit()

It is still very slow: after 4 hours, the data has not been inserted.

If I disable the triggers on this table:

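cursor = db.cursor()
# DISABLE TRIGGER ALL also covers foreign-key triggers and needs superuser rights
cursor.execute("ALTER TABLE musicbrainz.release DISABLE TRIGGER ALL")
with open(filename, 'r') as f:
    cursor.copy_expert('COPY {} FROM STDIN'.format("musicbrainz.release"), f)
cursor.execute("ALTER TABLE musicbrainz.release ENABLE TRIGGER ALL")
db.commit()  # without this, the loaded rows are never committed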

the restoration takes a little more than a minute.

Doing this before restoring each table might speed up the restoration by a factor of a few hundred. There might be a reason not to do it that I'm not seeing :-|

Is it doable?

PS: The PostgreSQL log file also mentions WAL writing occurring too frequently, so running ALTER TABLE ... SET UNLOGGED on each table before restoring might be a good idea, but I have no idea whether it affects performance dramatically.
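The trigger trick above can be wrapped so it is applied consistently per table. A minimal sketch with hypothetical helper names, assuming a psycopg2-style cursor (copy_expert is psycopg2's). Triggers are only re-enabled on success; on error the caller should roll back, and because ALTER TABLE is transactional in PostgreSQL, the rollback also undoes the DISABLE. Whether skipping the triggers is safe for every MusicBrainz table is exactly the open question in this issue.

```python
import re
from contextlib import contextmanager
from typing import IO, Iterator

# Accept only plain lowercase "schema.table" names, since they are
# interpolated into SQL below.
_QUALIFIED = re.compile(r"^[a-z_][a-z0-9_]*\.[a-z_][a-z0-9_]*$")


@contextmanager
def triggers_disabled(cursor, table: str) -> Iterator[None]:
    """Disable all triggers on `table` while the block runs."""
    if not _QUALIFIED.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    cursor.execute(f"ALTER TABLE {table} DISABLE TRIGGER ALL")
    yield
    # Only reached if the block succeeded; on error, roll back instead.
    cursor.execute(f"ALTER TABLE {table} ENABLE TRIGGER ALL")


def copy_without_triggers(cursor, table: str, data: IO[str]) -> None:
    """COPY tab-separated rows into `table` with its triggers disabled."""
    with triggers_disabled(cursor, table):
        cursor.copy_expert(f"COPY {table} FROM STDIN", data)
```

Call db.commit() after each table (or once at the end) as in the snippets above.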

Where are mbdump.tar.bz2 and mbdump-derived.tar.bz2 on Debian?

After downloading from:

http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/

Where are these files?

mbslave import mbdump.tar.bz2 mbdump-derived.tar.bz2

ls -lah
total 60K
drwxrwxr-x 9 ubuntu ubuntu 4.0K Nov 21 04:35 .
drwxr-xr-x 8 ubuntu ubuntu 4.0K Nov 21 22:12 ..
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 04:35 debian
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 04:35 debian-cd
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 04:35 debian-cdimage
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 18:15 header-inc
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 05:45 icons
-rw-rw-r-- 1 ubuntu ubuntu 3.6K Nov 21 22:17 index.html
-rw-rw-r-- 1 ubuntu ubuntu 3.6K Nov 22 07:14 'index.html?C=D;O=A'
-rw-rw-r-- 1 ubuntu ubuntu 3.6K Nov 22 07:14 'index.html?C=M;O=A'
-rw-rw-r-- 1 ubuntu ubuntu 3.6K Nov 22 07:14 'index.html?C=N;O=D'
-rw-rw-r-- 1 ubuntu ubuntu 3.6K Nov 22 07:14 'index.html?C=S;O=A'
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 21 04:35 lost+found
drwxrwxr-x 113 ubuntu ubuntu 4.0K Nov 21 22:17 pub
-rw-rw-r-- 1 ubuntu ubuntu 754 May 13 2015 welcome.msg
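The listing above looks like a mirror of the whole FTP site (the debian directories, the index.html?C=... sort pages) rather than the two archives. The archives sit inside a dated subdirectory of fullexport, for example 20211120-001843 as in the alternative_release_type issue earlier on this page. A sketch that builds their URLs; latest_export_dir() assumes the fullexport directory publishes a LATEST file naming the newest dated directory, and both helper names are hypothetical.

```python
from typing import List
from urllib.request import urlopen

BASE = "http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/"
DUMPS = ("mbdump.tar.bz2", "mbdump-derived.tar.bz2")


def dump_urls(export_dir: str, base: str = BASE) -> List[str]:
    """URLs of the two archives inside one dated export directory."""
    return [f"{base}{export_dir.strip('/')}/{name}" for name in DUMPS]


def latest_export_dir(base: str = BASE) -> str:
    # Assumption: a LATEST file holds the newest directory name.
    with urlopen(base + "LATEST", timeout=60) as resp:
        return resp.read().decode("ascii").strip()
```

The resulting URLs can be fetched with urllib.request.urlretrieve or any download tool, and the two files passed to mbslave import.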

New release (28)

How BIG is the database

I find it funny that when you go to the Apple App Store, you aren't told how LARGE an app is. That is a KEY piece of information. When you pay $1700 for a 1 TB phone, one of the MAIN DETERMINANTS of 'which games do I play' is 'HOW LARGE IS THE GAME?'

Likewise, I have no idea how large this database is. That is the MAIN LIMITING FACTOR for me today. As a database professional, I COULD install this on many different machines,

but I don't know, and I CANNOT decide which machine to install this on without SOME ballpark figure for how large it is. To be honest, I wish GitHub would tell me how large a project is before I choose to download the ZIP file or connect to the project using GitHub Desktop. If it's SMALL, a ZIP file is preferable.

Right now, I have utterly fallen in love with MusicBrainz. I want to query the database in PLAIN OLD SQL. I can't do that until I google 'How large is the MusicBrainz database?'. Things should be easier than that. Google doesn't even return the SAME RESULTS for every person.
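One way to get a ballpark before downloading is to ask the server for the Content-Length of each archive with an HTTP HEAD request. This only gives the compressed download size; the imported database, with indexes, will be considerably larger. After an import, SELECT pg_size_pretty(pg_database_size('musicbrainz')) reports the actual on-disk size. A sketch with hypothetical helper names:

```python
from typing import Optional
from urllib.request import Request, urlopen


def remote_size(url: str) -> Optional[int]:
    """Size in bytes reported by the server (Content-Length), if any."""
    with urlopen(Request(url, method="HEAD"), timeout=60) as resp:
        length = resp.headers.get("Content-Length")
        return int(length) if length is not None else None


def human(n: float) -> str:
    """Format a byte count with binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n /= 1024
        i += 1
    return f"{n:.1f} {units[i]}"
```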

Instructions missing

The instructions only say to run 'pip install mbdata' and then adjust mbslave.conf.default. However, this file does not appear to exist unless you clone the repository (according to my find query and general poking around).
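Until the default file ships with the package, one option is to write mbslave.conf yourself. The sketch below renders a minimal file with configparser; the sections and keys are copied from the mbslave.conf examples posted in other issues on this page, the values are placeholders, and the real default file may contain options not shown here. The usage line in an earlier issue shows a -c option for pointing mbslave at a config path.

```python
import configparser
import io

# Placeholder values; keys copied from mbslave.conf examples on this page.
TEMPLATE = {
    "database": {"host": "127.0.0.1", "port": "5432", "name": "musicbrainz",
                 "user": "musicbrainz", "password": "musicbrainz"},
    "musicbrainz": {"base_url": "https://metabrainz.org/api/musicbrainz/",
                    "token": "YOUR_TOKEN"},
    "tables": {"ignore": ""},
    "schemas": {"musicbrainz": "musicbrainz", "statistics": "statistics",
                "cover_art_archive": "cover_art_archive",
                "event_art_archive": "event_art_archive",
                "wikidocs": "wikidocs", "documentation": "documentation",
                "ignore": ""},
}


def render_config(template: dict = TEMPLATE) -> str:
    """Render the template as INI text in the key=value style used above."""
    parser = configparser.ConfigParser()
    parser.read_dict(template)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```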

Error running mbslave sync

I see the following error when running mbslave sync:

% mbslave sync
INFO:mbdata.replication:Downloading https://metabrainz.org/api/musicbrainz/replication-155737.tar.bz2?token=***
Traceback (most recent call last):
  File "/Users/simonhopkin/.local/bin/mbslave", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 803, in main
    args.func(config, args)
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 520, in mbslave_sync_main
    process_tar(packet, db, config, ignored_schemas, ignored_tables, schema_seq, replication_seq, hook)
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 457, in process_tar
    logger.info("Processing %s", fileobj.name)
                                 ^^^^^^^^^^^^
AttributeError: 'HTTPResponse' object has no attribute 'name'

I installed mbdata using pipx:

pipx install 'mbdata[replication]'
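The traceback shows process_tar logging fileobj.name on the object returned by the download, here an HTTPResponse, which has no name attribute. A patch-style sketch (not the project's actual fix; helper names hypothetical) that falls back to the type name for logging and reads the packet with tarfile's stream mode, which works on non-seekable file objects such as HTTP responses.

```python
import logging
import tarfile
from typing import BinaryIO, List

logger = logging.getLogger(__name__)


def describe(fileobj: BinaryIO) -> str:
    """Label for log messages; HTTP responses have no .name attribute."""
    return getattr(fileobj, "name", None) or type(fileobj).__name__


def list_packet(fileobj: BinaryIO) -> List[str]:
    """Member names of a bz2 replication packet read sequentially."""
    logger.info("Processing %s", describe(fileobj))
    # "r|bz2" never seeks, so a streamed download is fine.
    with tarfile.open(fileobj=fileobj, mode="r|bz2") as tar:
        return [member.name for member in tar]
```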

Creating Schema results in error

echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S

Results in:

Traceback (most recent call last):
  File "/usr/local/bin/mbslave", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/mbdata/replication.py", line 603, in main
    args.func(config, args)
  File "/usr/local/lib/python2.7/dist-packages/mbdata/replication.py", line 560, in mbslave_psql_main
    process = subprocess.Popen(command, env=environ)
  File "/usr/lib/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1047, in _execute_child
    raise child_exception
TypeError: coercing to Unicode: need string or buffer, int found

Apologies if this is my mistake; however, I had no trouble completing these steps when installing mbslave rather than mbdata, so after much config stalking I'm assuming the problem is in the instructions or in the configuration of the new code.
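The final TypeError is raised through subprocess.Popen(command, env=environ), and subprocess requires every environment value to be a string. One guess, not confirmed by the traceback, is that a numeric setting such as the port ends up in the environment as an int. A sketch of building the psql environment from libpq's standard PG* variables with every value converted to str; the helper name and the settings keys (taken from the [database] section of mbslave.conf) are assumptions.

```python
import os
from typing import Dict, Mapping, Optional

# libpq environment variables for the [database] keys of mbslave.conf.
PG_VARS = {"host": "PGHOST", "port": "PGPORT", "name": "PGDATABASE",
           "user": "PGUSER", "password": "PGPASSWORD"}


def psql_environ(settings: Mapping[str, object],
                 base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Environment for a psql subprocess; every value is coerced to str."""
    env = dict(os.environ if base is None else base)
    for key, var in PG_VARS.items():
        value = settings.get(key)
        if value is not None:
            env[var] = str(value)
    return env
```

Usage: subprocess.run(["psql"], env=psql_environ({"host": "127.0.0.1", "port": 5432, "name": "musicbrainz"})).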

relation "musicbrainz.alternative_release_type" does not exist

Following the steps in the README, the import fails with the following error message:

mbslave import mbdump.tar.bz2 
INFO:mbdata.replication:Importing data from mbdump.tar.bz2
INFO:mbdata.replication:Loading alternative_release_type to musicbrainz.alternative_release_type
Traceback (most recent call last):
  File "/home/julian/devel/bandmap/venv/bin/mbslave", line 33, in <module>
    sys.exit(load_entry_point('mbdata==26.0.0', 'console_scripts', 'mbslave')())
  File "/home/julian/devel/bandmap/venv/lib64/python3.9/site-packages/mbdata-26.0.0-py3.9.egg/mbdata/replication.py", line 607, in main
    args.func(config, args)
  File "/home/julian/devel/bandmap/venv/lib64/python3.9/site-packages/mbdata-26.0.0-py3.9.egg/mbdata/replication.py", line 256, in mbslave_import_main
    load_tar(filename, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
  File "/home/julian/devel/bandmap/venv/lib64/python3.9/site-packages/mbdata-26.0.0-py3.9.egg/mbdata/replication.py", line 248, in load_tar
    cursor.copy_from(tar.extractfile(member), fulltable)
psycopg2.errors.UndefinedTable: relation "musicbrainz.alternative_release_type" does not exist

Am I doing something wrong, or has there been a schema change?

Instructions do not work

After running sudo su - postgres, the shell is in a state where the musicbrainz user can be used without a password, but where neither the mbslave script nor the mbdata.replication module exists.

If I log out of the postgres account, I have access to mbslave and mbdata.replication, but now I need a password for the musicbrainz user I just created.

Please test the instructions on a virgin system and modify as needed to work. Thanks!
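One way to run mbslave as your normal user without typing the musicbrainz password each time is a ~/.pgpass file, which libpq (and therefore psql and psycopg2) reads automatically; the role needs a password first, for example via ALTER ROLE musicbrainz PASSWORD '...' run as postgres. Entries have the form hostname:port:database:username:password, with ':' and '\' inside fields backslash-escaped, and on Unix libpq ignores the file if group or others can read it. The helper names below are hypothetical.

```python
import os
import stat
from pathlib import Path
from typing import Optional


def pgpass_line(host: str, port: int, dbname: str, user: str, password: str) -> str:
    """One ~/.pgpass entry with ':' and backslashes escaped."""
    def esc(value: object) -> str:
        return str(value).replace("\\", "\\\\").replace(":", "\\:")
    return ":".join(esc(v) for v in (host, port, dbname, user, password))


def add_pgpass_entry(line: str, path: Optional[Path] = None) -> None:
    """Append an entry and restrict the file to its owner (mode 0600)."""
    path = path or Path.home() / ".pgpass"
    with path.open("a") as f:
        f.write(line + "\n")
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```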

mbslave installed via pip: ImportError: No module named mbdata.replication

I installed mbdata via pip exactly as described in the instructions. Unfortunately, I get an error each time I try to use the mbslave script:

Traceback (most recent call last):
  File "/home/****/.local/bin/mbslave", line 6, in <module>
    from mbdata.replication import main
ImportError: No module named mbdata.replication

After the pip install, only the mbslave script is in the directory.
The script works fine if I clone the whole Git repository instead.

CreateIndexes.sql -> "musicbrainz.ll_to_earth(double precision, double precision) does not exist"

Running mbslave psql -f CreateIndexes.sql executes for a long time with lots of "CREATE INDEX" outputs, but ultimately errors out with:

ERROR:  function musicbrainz.ll_to_earth(double precision, double precision) does not exist
LINE 1: CREATE INDEX place_idx_geo ON place USING gist (musicbrainz....
                                                        ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

No module named 'psycopg2'

I'm trying to follow the instructions on a Raspberry Pi with Ubuntu 22.04.

Once I get to step 5 of the instructions, it always fails at the first command:

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/mbslave", line 5, in <module>
    from mbdata.replication import main
  File "/home/ubuntu/.local/pipx/venvs/mbdata/lib/python3.10/site-packages/mbdata/replication.py", line 7, in <module>
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'

I tried to install psycopg2 in several ways (installing psycopg2-binary with pip and pip3, and building it from source), and each time the installation succeeded. However, the mbslave commands still cannot find the module.

What am I missing here?
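The traceback shows mbdata running from a pipx virtual environment (~/.local/pipx/venvs/mbdata/...), which is isolated from packages installed with pip or pip3 into other interpreters. Reading the script's shebang line shows which interpreter, and therefore which site-packages, mbslave actually uses; psycopg2 has to be installed there, for example with pipx inject mbdata psycopg2-binary, or presumably by installing mbdata[replication] as in the sync issue earlier on this page. The helper name is hypothetical.

```python
import shutil
from typing import Optional


def script_interpreter(script: Optional[str] = None) -> Optional[str]:
    """Interpreter named in a console script's shebang, or None."""
    path = script or shutil.which("mbslave")
    if path is None:
        return None
    with open(path, "rb") as f:
        first = f.readline().decode("utf-8", "replace").strip()
    return first[2:].strip() if first.startswith("#!") else None
```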

ERROR: relation "art_type" already exists

The following error is displayed when running mbslave init --create-user --create-database:

psql:/var/folders/63/wlxrc7ds36s2mhyddjpg56_80000gn/T/tmprzrao_gq.sql:14: ERROR:  relation "art_type" already exists
Traceback (most recent call last):
  File "/Users/simonhopkin/.local/bin/mbslave", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 803, in main
    args.func(config, args)
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 653, in mbslave_init_main
    run_sql_script(sql_script)
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 617, in run_sql_script
    run_script(command)
  File "/Users/simonhopkin/.local/pipx/venvs/mbdata/lib/python3.11/site-packages/mbdata/replication.py", line 609, in run_script
    subprocess.run(['bash', '-euxc', script], check=True)
  File "/usr/local/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '-euxc', 'mbslave psql -f eaa/CreateTables.sql']' returned non-zero exit status 3.

I'm using PostgreSQL 15 on macOS with Python 3.10.9.

This issue only occurs when applying the following suggestion to merge into one schema:

[schemas]
musicbrainz=musicbrainz
statistics=musicbrainz
cover_art_archive=musicbrainz
wikidocs=musicbrainz
documentation=musicbrainz

README references non-existent SQL files

I think the README is out of date with the actual SQL files. For example, it shows mbslave psql -f CreateCollations.sql, but that file doesn't exist.

I haven't checked every such command, but I think a number of them are no longer correct.

mbslave command not found

Hello,

I'm trying to import the MusicBrainz database. I'm a student and don't really know what I'm doing. I try to run the commands with mbslave, but I'm told the command doesn't exist (I'm running them inside PostgreSQL, which I think is correct).
This is my mbslave.conf file:

[database]
host=127.0.0.1
port=5432
name=musicbrainz
user=musicbrainz
password=musicbrainz

[musicbrainz]
base_url=https://metabrainz.org/api/musicbrainz/
token=MyToken

[tables]
ignore=
#ignore=tracklist_index

[schemas]
musicbrainz=musicbrainz
statistics=statistics
cover_art_archive=cover_art_archive
event_art_archive=event_art_archive
wikidocs=wikidocs
documentation=documentation
ignore=
#ignore=statistics,cover_art_archive,wikidocs,documentation

Can someone help me? I'm not sure what I'm doing wrong.

Build instructions that actually work.

Make sure you're using PostgreSQL 12 or later.

  1. Complete the first 3 steps to install mbdata and set up your mbslave.conf file.

    The SQL scripts in the mbdata repo haven't been updated in over 6 months, so you need to
    either clone the MusicBrainz Git repo or download the latest MusicBrainz ZIP file from
    GitHub, then copy and/or extract the contents of the musicbrainz-server-master/admin/sql folder
    and replace the files in the mbdata/sql folder.

    If you used pipx the mbdata/sql folder will be located somewhere like the following:

    ~/.local/pipx/venvs/mbdata/lib/python3.x/site-packages/mbdata/sql

    If needed, use find to locate your specific folder location.

    The remaining steps were taken verbatim from the MusicBrainz InitDb.pl script. They create a complete
    MusicBrainz slave database with all tables, indexes, etc. You may want to customize them as needed if, for instance,
    you're not importing wikidocs or other non-essential data dumps.

  2. Create the Database:

    sudo su - postgres
    createuser musicbrainz
    createdb -l C -E UNICODE -T template0 -O musicbrainz musicbrainz

    #MB now uses DateTime->now to populate 'TIMESTAMP WITH TIME ZONE' columns in their code.
    #DateTime->now outputs 'floating' UTC by default, but doesn't encode any timezone info in its output, so the
    #database must have its timezone set to UTC in order to correctly interpret those values.

    psql musicbrainz -c "ALTER DATABASE musicbrainz SET timezone TO 'UTC';"

    psql musicbrainz -c 'CREATE EXTENSION IF NOT EXISTS cube WITH SCHEMA public;'
    psql musicbrainz -c 'CREATE EXTENSION IF NOT EXISTS earthdistance WITH SCHEMA public;'
    psql musicbrainz -c 'CREATE EXTENSION IF NOT EXISTS unaccent WITH SCHEMA public;'

    #Exit sudo and return to the account you set up for mbdata

  3. Prepare empty schemas:

    echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S
    echo 'CREATE SCHEMA statistics;' | mbslave psql -S
    echo 'CREATE SCHEMA cover_art_archive;' | mbslave psql -S
    echo 'CREATE SCHEMA wikidocs;' | mbslave psql -S
    echo 'CREATE SCHEMA documentation;' | mbslave psql -S
    echo 'CREATE SCHEMA event_art_archive;' | mbslave psql -S
    echo 'CREATE SCHEMA json_dump;' | mbslave psql -S
    echo 'CREATE SCHEMA report;' | mbslave psql -S
    echo 'CREATE SCHEMA sitemaps;' | mbslave psql -S

  4. Create tables structures:

    #The first script will report an error that the extensions already exist because we already added them in step 2. The
    #extensions have to be created in step 2 because you need a superuser account to create extensions. We just need
    #to run the Extensions.sql script to add the musicbrainz.ll_to_earth() function so index creation won't fail as reported
    #in this issue ticket.

    mbslave psql -f Extensions.sql
    mbslave psql -f CreateCollations.sql
    mbslave psql -f CreateTables.sql
    mbslave psql -f caa/CreateTables.sql
    mbslave psql -f eaa/CreateTables.sql
    mbslave psql -f documentation/CreateTables.sql
    mbslave psql -f json_dump/CreateTables.sql
    mbslave psql -f report/CreateTables.sql
    mbslave psql -f sitemaps/CreateTables.sql
    mbslave psql -f statistics/CreateTables.sql
    mbslave psql -f wikidocs/CreateTables.sql

  5. Import the data dumps. Minimally you need the following two dump files:

    mbslave import mbdump.tar.bz2 mbdump-derived.tar.bz2

  6. Create the primary keys:

    mbslave psql -f CreatePrimaryKeys.sql
    mbslave psql -f caa/CreatePrimaryKeys.sql
    mbslave psql -f documentation/CreatePrimaryKeys.sql
    mbslave psql -f eaa/CreatePrimaryKeys.sql
    mbslave psql -f statistics/CreatePrimaryKeys.sql
    mbslave psql -f wikidocs/CreatePrimaryKeys.sql

  7. Create functions

    mbslave psql -f CreateSearchConfiguration.sql
    mbslave psql -f CreateFunctions.sql
    mbslave psql -f caa/CreateFunctions.sql
    mbslave psql -f eaa/CreateFunctions.sql
    mbslave psql -f CreateSlaveOnlyFunctions.sql

  8. Create indexes

    mbslave psql -f CreateIndexes.sql
    mbslave psql -f caa/CreateIndexes.sql
    mbslave psql -f eaa/CreateIndexes.sql
    mbslave psql -f json_dump/CreateIndexes.sql
    mbslave psql -f sitemaps/CreateIndexes.sql
    mbslave psql -f statistics/CreateIndexes.sql
    mbslave psql -f CreateSlaveIndexes.sql

  9. Set initial sequence values

    mbslave psql -f SetSequences.sql
    mbslave psql -f statistics/SetSequences.sql

  10. Create views and triggers and search indexes

    mbslave psql -f CreateViews.sql
    mbslave psql -f caa/CreateViews.sql
    mbslave psql -f eaa/CreateViews.sql
    mbslave psql -f CreateSlaveOnlyTriggers.sql
    mbslave psql -f CreateSearchIndexes.sql
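Steps 3 to 10 above are long enough to be worth scripting so a failure stops the run at the right place. A hypothetical driver that replays the same commands in the same order (dry run by default); file paths are copied from the steps and resolved however mbslave psql -f resolves them. Extensions.sql is allowed to fail because step 4 expects it to report that the extensions already exist.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple

SCHEMAS = ["musicbrainz", "statistics", "cover_art_archive", "wikidocs",
           "documentation", "event_art_archive", "json_dump", "report", "sitemaps"]
BEFORE_IMPORT = [  # step 4
    "Extensions.sql", "CreateCollations.sql", "CreateTables.sql",
    "caa/CreateTables.sql", "eaa/CreateTables.sql", "documentation/CreateTables.sql",
    "json_dump/CreateTables.sql", "report/CreateTables.sql",
    "sitemaps/CreateTables.sql", "statistics/CreateTables.sql", "wikidocs/CreateTables.sql",
]
AFTER_IMPORT = [  # steps 6-10
    "CreatePrimaryKeys.sql", "caa/CreatePrimaryKeys.sql", "documentation/CreatePrimaryKeys.sql",
    "eaa/CreatePrimaryKeys.sql", "statistics/CreatePrimaryKeys.sql", "wikidocs/CreatePrimaryKeys.sql",
    "CreateSearchConfiguration.sql", "CreateFunctions.sql", "caa/CreateFunctions.sql",
    "eaa/CreateFunctions.sql", "CreateSlaveOnlyFunctions.sql",
    "CreateIndexes.sql", "caa/CreateIndexes.sql", "eaa/CreateIndexes.sql",
    "json_dump/CreateIndexes.sql", "sitemaps/CreateIndexes.sql",
    "statistics/CreateIndexes.sql", "CreateSlaveIndexes.sql",
    "SetSequences.sql", "statistics/SetSequences.sql",
    "CreateViews.sql", "caa/CreateViews.sql", "eaa/CreateViews.sql",
    "CreateSlaveOnlyTriggers.sql", "CreateSearchIndexes.sql",
]
MAY_FAIL = {"Extensions.sql"}

Command = Tuple[List[str], Optional[str]]  # (argv, text piped to stdin)


def plan(dumps: Sequence[str]) -> List[Command]:
    """All commands for steps 3-10, in order."""
    cmds: List[Command] = [(["mbslave", "psql", "-S"], f"CREATE SCHEMA {s};\n") for s in SCHEMAS]
    cmds += [(["mbslave", "psql", "-f", f], None) for f in BEFORE_IMPORT]
    cmds.append((["mbslave", "import", *dumps], None))  # step 5
    cmds += [(["mbslave", "psql", "-f", f], None) for f in AFTER_IMPORT]
    return cmds


def run(cmds: List[Command], dry_run: bool = True) -> None:
    for argv, stdin in cmds:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops at the first failure, except for MAY_FAIL files.
            subprocess.run(argv, input=stdin, text=True, check=argv[-1] not in MAY_FAIL)
```

Usage: run(plan(["mbdump.tar.bz2", "mbdump-derived.tar.bz2"])) prints the plan; pass dry_run=False to execute it.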

mbslave is wrong

cat /etc/mbslave.conf

[database]
host=127.0.0.1
port=5432
name=musicbrainz
user=musicbrainz
#password=

[musicbrainz]
base_url=https://metabrainz.org/api/musicbrainz/
token=Eqzq...(this is my token)

[tables]
ignore=
#ignore=tracklist_index

[schemas]
musicbrainz=musicbrainz
statistics=statistics
cover_art_archive=cover_art_archive
event_art_archive=event_art_archive
wikidocs=wikidocs
documentation=documentation
ignore=
#ignore=statistics,cover_art_archive,wikidocs,documentation

When I get to step 5 of the instructions, "Prepare empty schemas for the MusicBrainz database and create the table structure":
echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S
echo 'CREATE SCHEMA statistics;' | mbslave psql -S
echo 'CREATE SCHEMA cover_art_archive;' | mbslave psql -S
echo 'CREATE SCHEMA wikidocs;' | mbslave psql -S
echo 'CREATE SCHEMA documentation;' | mbslave psql -S

mbslave psql -f CreateCollations.sql
mbslave psql -f CreateTables.sql
mbslave psql -f statistics/CreateTables.sql
mbslave psql -f caa/CreateTables.sql
mbslave psql -f wikidocs/CreateTables.sql
mbslave psql -f documentation/CreateTables.sql

I get an error:

[root@test pgsql]# echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S
Traceback (most recent call last):
  File "/bin/mbslave", line 7, in <module>
    from mbdata.replication import main
  File "/usr/lib/python2.7/site-packages/mbdata/replication.py", line 20, in <module>
    from contextlib2 import ExitStack
  File "/usr/lib/python2.7/site-packages/contextlib2/__init__.py", line 56
    async def __aenter__(self):
          ^
SyntaxError: invalid syntax

Import fails

I'm trying to build a standalone database (no replication) with the June 1 extracts and get a psycopg2.errors.BadCopyFileFormat error; see the output and error messages below. I'm assuming this is related to last month's schema upgrade, but it's not clear how or where to retrieve the latest, correct SQL. I would have assumed that an update to the mbdata package would correspond to the schema changes, but no luck. I'm not running the server or Docker; I just want to play around with the database. Any pointers or advice much appreciated. Thanks.

mbslave import mbdump.tar.bz2
INFO:mbdata.replication:Importing data from mbdump.tar.bz2
INFO:mbdata.replication:Loading alternative_release_type to musicbrainz.alternative_release_type
INFO:mbdata.replication:Loading area to musicbrainz.area
INFO:mbdata.replication:Loading area_alias to musicbrainz.area_alias
INFO:mbdata.replication:Loading area_alias_type to musicbrainz.area_alias_type
INFO:mbdata.replication:Loading area_gid_redirect to musicbrainz.area_gid_redirect
INFO:mbdata.replication:Loading area_type to musicbrainz.area_type
INFO:mbdata.replication:Loading artist to musicbrainz.artist
INFO:mbdata.replication:Loading artist_alias to musicbrainz.artist_alias
INFO:mbdata.replication:Loading artist_alias_type to musicbrainz.artist_alias_type
INFO:mbdata.replication:Loading artist_credit to musicbrainz.artist_credit
Traceback (most recent call last):
  File "/Users/me/dev/miniconda3/bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/Users/me/dev/miniconda3/lib/python3.8/site-packages/mbdata/replication.py", line 607, in main
    args.func(config, args)
  File "/Users/me/dev/miniconda3/lib/python3.8/site-packages/mbdata/replication.py", line 256, in mbslave_import_main
    load_tar(filename, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
  File "/Users/me/dev/miniconda3/lib/python3.8/site-packages/mbdata/replication.py", line 248, in load_tar
    cursor.copy_from(tar.extractfile(member), fulltable)
psycopg2.errors.BadCopyFileFormat: extra data after last expected column
CONTEXT: COPY artist_credit, line 1: "2152096 The Chats 1 202 2018-01-26 11:59:06.33519+00 0 33fbf1e4-4768-30cc-a5c6-1c72f4f45826"
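"extra data after last expected column" means the dump row has more columns than the local table, which fits the guess that the tables were created from SQL older than the dump. A sketch (hypothetical helper names) for confirming this per table: count the tab-separated fields in the first row of the dump file and compare with the column count in information_schema. It assumes table data is stored as mbdump/TABLE_NAME inside the archive.

```python
import tarfile


def field_count(line: str) -> int:
    """Columns in one row of a text-format COPY file (tab-separated)."""
    return len(line.rstrip("\n").split("\t"))


def first_row(archive: str, table: str) -> str:
    """First data row of mbdump/TABLE_NAME inside a dump archive."""
    with tarfile.open(archive, "r:*") as tar:
        f = tar.extractfile(f"mbdump/{table}")
        if f is None:
            raise ValueError(f"{table} is not a regular file in {archive}")
        return f.readline().decode("utf-8")


def table_column_count(cursor, schema: str, table: str) -> int:
    cursor.execute(
        "SELECT count(*) FROM information_schema.columns "
        "WHERE table_schema = %s AND table_name = %s",
        (schema, table),
    )
    return cursor.fetchone()[0]
```

If field_count(first_row("mbdump.tar.bz2", "artist_credit")) is larger than table_column_count(cursor, "musicbrainz", "artist_credit"), the CreateTables.sql in use is out of date.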

400: Bad Request Error with mbslave sync

The initial DB setup worked like a dream, but when I try to sync for the first time, I get a 400. Any thoughts? Here is the full trace:

(mbdata) max@catify-1:~/vendor$ PGPASSWORD=MYPASS mbslave sync
Downloading https://metabrainz.org/api/musicbrainz/replication-126143.tar.bz2
Traceback (most recent call last):
  File "/home/max/vendor/mbdata/bin/mbslave", line 10, in <module>
    sys.exit(main())
  File "/home/max/vendor/mbdata/lib/python3.6/site-packages/mbdata/replication.py", line 592, in main
    args.func(config, args)
  File "/home/max/vendor/mbdata/lib/python3.6/site-packages/mbdata/replication.py", line 466, in mbslave_sync_main
    tmp = download_packet(base_url, token, replication_seq)
  File "/home/max/vendor/mbdata/lib/python3.6/site-packages/mbdata/replication.py", line 437, in download_packet
    data = urlopen(url, timeout=60)
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: BAD REQUEST
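The download URL printed in this trace has no token query parameter. The sync log in a later issue on this page does show one (replication-154689.tar.bz2?token=***). This trace does not confirm that the token is the cause, but a missing or empty replication token is worth ruling out. The sketch below builds the packet URL and fails early when the token is empty. The function name and error message are hypothetical, not mbdata's API.

```python
from urllib.parse import urlencode


def build_packet_url(base_url: str, token: str, replication_seq: int) -> str:
    """Build a replication packet URL (hypothetical helper, not mbdata's API).

    Raises ValueError instead of sending a request the server would likely
    reject, so a missing token is reported clearly.
    """
    if not token:
        raise ValueError("replication access token is empty; check your config")
    path = f"{base_url.rstrip('/')}/replication-{replication_seq}.tar.bz2"
    return f"{path}?{urlencode({'token': token})}"
```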

Issue replicating database with mbdata (25.0.4)

Hi,

I recently joined the MetaBrainz/MusicBrainz community. For a few days I have been having trouble replicating the MusicBrainz database with mbdata. Following the guide, everything is fine up to step 6; however, at step 7, where I import the .tar dumps, I keep running into this error:

Importing data from mbdump.tar.bz2
 - Loading alternative_release_type to musicbrainz.alternative_release_type
Traceback (most recent call last):
  File "/home/manuel/.local/bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/home/manuel/.local/lib/python3.9/site-packages/mbdata/replication.py", line 592, in main
    args.func(config, args)
  File "/home/manuel/.local/lib/python3.9/site-packages/mbdata/replication.py", line 253, in mbslave_import_main
    load_tar(filename, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
  File "/home/manuel/.local/lib/python3.9/site-packages/mbdata/replication.py", line 245, in load_tar
    cursor.copy_from(tar.extractfile(member), fulltable)
psycopg2.errors.UndefinedTable: relation "musicbrainz.alternative_release_type" does not exist

If I check in the Postgres musicbrainz database, all the tables are available. Any advice on how to proceed? Thank you so much in advance.
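The tables show up in psql but the importer cannot see them. One thing to confirm is that they exist in the schema and database the importer connects to. For example, they may have been created in public rather than musicbrainz, or in a different database. The sketch below uses PostgreSQL's to_regclass(), which returns NULL instead of raising an error when a relation name does not resolve. The helper is hypothetical and accepts any DB-API cursor, such as one from psycopg2.

```python
def table_exists(cursor, qualified_name: str) -> bool:
    """Return True if qualified_name (e.g. 'musicbrainz.alternative_release_type')
    resolves to a relation in the database this cursor is connected to.

    to_regclass() yields NULL rather than an error for a missing relation.
    """
    cursor.execute("SELECT to_regclass(%s) IS NOT NULL", (qualified_name,))
    (exists,) = cursor.fetchone()
    return bool(exists)
```

Open a psycopg2 connection with the same settings mbslave uses and call table_exists(conn.cursor(), "musicbrainz.alternative_release_type"). If it returns False but \dt musicbrainz.* lists the table in psql, the two are probably pointed at different databases.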

Password required when importing data dumps

Hello,

I am currently following the instructions to load the data into SQL, but I am stuck at step 7.

When I run the command 'mbslave import ../mbdump.tar.bz2 ../mbdump-derived.tar.bz2', I get the following error:

(base) alexis@1ZSMZM2:~/data/mbdata$ mbslave import ../mbdump.tar.bz2
Traceback (most recent call last):
  File "/home/alexis/apps/anaconda3/bin/mbslave", line 10, in <module>
    sys.exit(main())
  File "/home/alexis/apps/anaconda3/lib/python3.7/site-packages/mbdata/replication.py", line 592, in main
    args.func(config, args)
  File "/home/alexis/apps/anaconda3/lib/python3.7/site-packages/mbdata/replication.py", line 250, in mbslave_import_main
    db = connect_db(config)
  File "/home/alexis/apps/anaconda3/lib/python3.7/site-packages/mbdata/replication.py", line 196, in connect_db
    return cfg.connect_db(set_search_path=set_search_path, superuser=superuser)
  File "/home/alexis/apps/anaconda3/lib/python3.7/site-packages/mbdata/replication.py", line 189, in connect_db
    db = psycopg2.connect(**self.database.create_psycopg2_kwargs(superuser=superuser))
  File "/home/alexis/apps/anaconda3/lib/python3.7/site-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: fe_sendauth: no password supplied

Should I run this command as the postgres user instead? If so, with which password?

Thanks
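"fe_sendauth: no password supplied" means the server requested password authentication and the client had no password to send. psycopg2 connects through libpq, which looks for a password in this order:

1. An explicit password connection parameter.
2. The PGPASSWORD environment variable.
3. A ~/.pgpass file.

An earlier report on this page runs PGPASSWORD=MYPASS mbslave sync. The password needed is the one for whichever role mbslave is configured to connect as. The sketch below resolves a password before connecting. The helpers are hypothetical, and they mirror only libpq's first two sources; they do not read ~/.pgpass.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_password(configured: Optional[str],
                     environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """An explicitly configured password wins, then PGPASSWORD.

    Returns None when neither is set; libpq would then try ~/.pgpass,
    which this sketch does not read.
    """
    if configured:
        return configured
    return environ.get("PGPASSWORD") or None


def connect_kwargs(host: str, port: int, dbname: str, user: str,
                   password: Optional[str] = None,
                   environ: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Build keyword arguments for psycopg2.connect() (hypothetical helper)."""
    kwargs: Dict[str, object] = {"host": host, "port": port, "dbname": dbname, "user": user}
    pw = resolve_password(password, environ)
    if pw is not None:
        kwargs["password"] = pw
    return kwargs
```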

Instructions incorrect?

The instructions require the 'mbslave' command, e.g.

echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S

or

mbslave psql -f CreateTables.sql

However, mbslave does not seem to exist anywhere, or the instructions do not clearly show how to install it or add it to my path. This worked very easily with the standalone mbslave, but does not seem to work with mbdata (which you referred me to in place of mbslave).

What am I missing?

Thanks.
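The tracebacks on this page show mbslave installed as a script in an interpreter's bin directory. Examples include /home/max/vendor/mbdata/bin/mbslave (a virtualenv), ~/.local/bin/mbslave and /usr/local/bin/mbslave. This indicates that the mbdata package installs it as a console script, but that directory must also be on your PATH. The diagnostic sketch below uses only the standard library. The function names are hypothetical.

```python
import os
import shutil
import sysconfig
from typing import List, Optional


def locate_mbslave() -> Optional[str]:
    """Return the path of an `mbslave` executable found on PATH, or None."""
    return shutil.which("mbslave")


def candidate_script_dirs() -> List[str]:
    """Directories where pip typically places console scripts for this
    interpreter: the default scheme (e.g. a virtualenv's bin/) and the
    per-user scheme (e.g. ~/.local/bin on Linux)."""
    dirs = [sysconfig.get_path("scripts")]
    user_scheme = f"{os.name}_user"
    if user_scheme in sysconfig.get_scheme_names():
        dirs.append(sysconfig.get_path("scripts", user_scheme))
    return dirs


if __name__ == "__main__":
    found = locate_mbslave()
    if found:
        print("mbslave found at", found)
    else:
        path_dirs = os.environ.get("PATH", "").split(os.pathsep)
        for d in candidate_script_dirs():
            print(d, "on PATH" if d in path_dirs else "NOT on PATH")
```

Run it with the same Python you used for pip install mbdata. A directory reported as NOT on PATH is the one to add.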

mbslave without arguments just fails

root@mbdata:~/tmp/mbdata# ./mbslave.py
Traceback (most recent call last):
  File "./mbslave.py", line 6, in <module>
    main()
  File "/root/tmp/mbdata/mbdata/replication.py", line 607, in main
    args.func(config, args)
AttributeError: 'Namespace' object has no attribute 'func'

while

root@mbdata:~/tmp/mbdata# ./mbslave.py -h
usage: mbslave.py [-h] [-c, --config PATH] {import,sync,remap-schema,print-sql,psql} ...

positional arguments:
  {import,sync,remap-schema,print-sql,psql}

optional arguments:
  -h, --help            show this help message and exit
  -c, --config PATH     path to the config file (default: mbslave.conf:/etc/mbslave.conf)

works.

This is discouraging at the very least; I would expect it to print the help message.

root@mbdata:~/tmp/mbdata# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal
root@mbdata:~/tmp/mbdata# python --version
Python 3.8.10
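In Python 3, subcommands added with argparse's add_subparsers() are optional by default. Running the program with no arguments therefore leaves the namespace without the func attribute that each subcommand normally sets through set_defaults(). That is consistent with the AttributeError above. The sketch below guards against it. The parser is a hypothetical stand-in, not mbdata's actual parser.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mbslave")
    parser.add_argument("-c", "--config", metavar="PATH", help="path to the config file")
    subparsers = parser.add_subparsers()
    sync = subparsers.add_parser("sync", help="apply replication packets")
    # Each subcommand records its handler; this stand-in handler just returns 0.
    sync.set_defaults(func=lambda args: 0)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    handler = getattr(args, "func", None)
    if handler is None:
        # No subcommand given: show usage instead of crashing with AttributeError.
        parser.print_help()
        return 2
    return handler(args)
```

On Python 3.7+ you can also pass required=True (together with dest=...) to add_subparsers(). argparse then prints its own error and exits, instead of showing the full help.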

"mbslave sync" command raises exception "AttributeError: 'HTTPResponse' object has no attribute 'name'"

After I successfully completed the mbslave init --create-user --create-database command, which took quite a while, mbslave sync failed with the following error:

INFO:mbdata.replication:Downloading https://metabrainz.org/api/musicbrainz/replication-154689.tar.bz2?token=***
Traceback (most recent call last):
  File "/home/m/.local/bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/home/m/.local/pipx/venvs/mbdata/lib/python3.10/site-packages/mbdata/replication.py", line 803, in main
    args.func(config, args)
  File "/home/m/.local/pipx/venvs/mbdata/lib/python3.10/site-packages/mbdata/replication.py", line 520, in mbslave_sync_main
    process_tar(packet, db, config, ignored_schemas, ignored_tables, schema_seq, replication_seq, hook)
  File "/home/m/.local/pipx/venvs/mbdata/lib/python3.10/site-packages/mbdata/replication.py", line 457, in process_tar
    logger.info("Processing %s", fileobj.name)
AttributeError: 'HTTPResponse' object has no attribute 'name'

Take a look at the bottom of the traceback: the exception seems to be raised by the logging call. So I commented out the logging code, and the mbslave sync command then completed successfully. Hooray!
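Instead of deleting the log call, you can make it safe by falling back to a placeholder when the object has no name attribute. Files opened from disk have a name. HTTP responses (as in this traceback) and in-memory buffers such as io.BytesIO do not. describe_fileobj and log_processing are hypothetical helpers, not part of mbdata.

```python
import io
import logging

logger = logging.getLogger("mbdata.replication")


def describe_fileobj(fileobj, fallback: str = "<stream>") -> str:
    """Return a printable label for a file-like object.

    Uses the object's ``name`` attribute when present, otherwise ``fallback``
    (for example the download URL), so logging never raises AttributeError.
    """
    name = getattr(fileobj, "name", None)
    return str(name) if name else fallback


def log_processing(fileobj, fallback: str = "<stream>") -> None:
    logger.info("Processing %s", describe_fileobj(fileobj, fallback))
```

Passing the packet URL as the fallback keeps the log just as informative as the original line.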

AttributeError: 'DatabaseConfig' object has no attribute 'connect_db'

When I run mbslave import mbdump.tar.bz2, I get an error:

Traceback (most recent call last):
  File "/usr/local/bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/mbdata/replication.py", line 803, in main
    args.func(config, args)
  File "/usr/local/lib/python3.8/dist-packages/mbdata/replication.py", line 263, in mbslave_import_main
    db = config.database.connect_db(superuser=True, set_search_path=False)
AttributeError: 'DatabaseConfig' object has no attribute 'connect_db'

I changed line 263 of /usr/local/lib/python3.8/dist-packages/mbdata/replication.py, replacing this line:

db = config.database.connect_db(superuser=True, set_search_path=False)

with this:

db = connect_db(config)

Is this solution correct?

mbslave sync >>/var/log/mbslave.log

Hi, first of all, thanks for your great work on MusicBrainz Server.
Everything works fine except mbslave sync in crontab,
even though mbslave sync >>/var/log/mbslave.log works fine from the command line.
I'm on Ubuntu 19.10.
Thanks for any advice.
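A command that works interactively but not from cron often fails because of cron's much smaller environment. Cron's PATH is typically just /usr/bin:/bin, and variables exported in your shell, such as PGPASSWORD or MBSLAVE_CONFIG, are not set there. The -h output in another issue on this page lists the default config search as mbslave.conf:/etc/mbslave.conf. The relative mbslave.conf is presumably resolved against the working directory, which under cron is usually your home directory. Also, >> in a crontab captures only stdout, so an error traceback never reaches the log.

The wrapper sketch below rules all of this out by invoking mbslave through its absolute path with an explicit -c. It also appends both output streams to the log. The helper names are hypothetical, and the password must still come from the config file or ~/.pgpass.

```python
import os
import subprocess
from typing import Dict, List, Mapping


def build_sync_command(mbslave_path: str, config_path: str) -> List[str]:
    """Use absolute paths: cron's PATH and working directory differ from your shell's."""
    return [mbslave_path, "-c", config_path, "sync"]


def build_env(base: Mapping[str, str], mbslave_path: str) -> Dict[str, str]:
    """Copy the environment and put the directory holding mbslave first on PATH."""
    env = dict(base)
    bin_dir = os.path.dirname(mbslave_path)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in parts:
        parts.insert(0, bin_dir)
    env["PATH"] = os.pathsep.join(parts)
    return env


def run_sync(mbslave_path: str, config_path: str, log_path: str) -> int:
    """Run one sync, appending both stdout and stderr to log_path."""
    with open(log_path, "a") as log:
        result = subprocess.run(
            build_sync_command(mbslave_path, config_path),
            env=build_env(os.environ, mbslave_path),
            stdout=log,
            stderr=subprocess.STDOUT,  # a plain ">>" in crontab captures stdout only
        )
    return result.returncode
```

Call run_sync() with your own absolute paths from a small script, and schedule that script in crontab with an absolute path to python3.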

MBSLAVE_CONFIG and command line argument "-c" don’t behave the same

Using -c path/to/mbslave.conf and having MBSLAVE_CONFIG set to path/to/mbslave.conf (same path) don’t seem to behave identically (as I would have expected):

> echo 'UPDATE replication_control SET current_schema_sequence = 25;' | mbslave psql
psql: FATAL:  no pg_hba.conf entry for host "[local]", user "freso", database "musicbrainz", SSL off
> echo 'UPDATE replication_control SET current_schema_sequence = 25;' | mbslave -c $XDG_CONFIG_HOME/mbdata/mbslave.conf psql
UPDATE 1
> echo $MBSLAVE_CONFIG
/home/freso/.config/mbdata/mbslave.conf
> pacman -Qi python-mbdata-git|grep Version
Version                : 25.0.4.r0.g42461d8-1
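The reporter expected MBSLAVE_CONFIG to be simply another way of passing the same value as -c. The sketch below shows that precedence: the command-line value first, then the environment variable, then the defaults shown in the -h output elsewhere on this page. It illustrates the expected behaviour only and is not mbdata's actual config-loading code.

```python
import os
from typing import List, Mapping, Optional

DEFAULT_CONFIG = "mbslave.conf:/etc/mbslave.conf"  # as listed in `mbslave -h`


def config_search_path(cli_value: Optional[str],
                       environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return config files to try, highest priority first.

    -c/--config wins; otherwise MBSLAVE_CONFIG; otherwise the built-in defaults.
    Each source may hold several paths separated by ':' like the default.
    """
    raw = cli_value or environ.get("MBSLAVE_CONFIG") or DEFAULT_CONFIG
    return [p for p in raw.split(":") if p]
```

With this precedence, both psql invocations in the report would read the same file. Since they did not, the installed version apparently handles MBSLAVE_CONFIG differently.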
