Coder Social home page Coder Social logo

boojack's Introduction

Hi there 👋

boojack's People

Contributors

boojack avatar

boojack's Issues

Article: "How to do proper pg_dump and pg_restore in PostgreSQL"

How to do proper pg_dump and pg_restore in PostgreSQL

PostgreSQL is an object-relational database, while MySQL is a purely relational database. This means that Postgres includes features like the ownership and privilege of objects:

When you create a database object in PostgreSQL, you become its owner. By default, only the owner of an object can do anything with the object. And in order to allow other users to use it, privileges must be granted.

And that makes a little difference in backup with PostgreSQL. Howerver, by setting options with the backup utilities pg_dump and pg_restore will effectively deal with that, and let's figure out how to use them firstly.

pg_dump: Dump a particular database

pg_dump is a utility that used to dump a database for migration or upgrade purposes.

And the normally basic syntax is shown below:

pg_dump [options] [database_name] > [backup_file_name]

Using pg_dump with the following useful options:

In this article, options are selective shown and with the full name not a shortcut name because it's good for understanding its meaning. You can get all options by reading PostgreSQL official docs.

  • --format=format-type select the output file format type:
    • plain A plain-text SQL script file (the default).
    • custom The custom-format archive suitable for input into pg_restore. And custom format is the most flexible format which compressed by default, speedy dump and restore.
    • directory The archive is a directory archive.
    • tar The archive is a tar archive.
  • --jobs=number-of-jobs can dump multiple tables at the same time and dramatically reduced the time to dump a large database on a multiprocessor machine.
  • --no-owner will not output commands to set the ownership of data objects, and this option is ignored when emitting an non-plain format output.
  • --no-privileges prevent dumping of access privileges (grant/revoke commands).

pg_restore: Restore the database from a dump file

pg_restore is a tool used to restore the database from the dump file.

Here is the basic syntax for using it:

pg_restore [options] [backup_file_name]

A brief explanation of some useful options is shown below:

  • --clean clean (drop) database objects before recreating them.
  • --jobs=number-of-jobs can restore in parallel and dramatically reduce the time to restore from a large database dump file on a multiprocessor machine.
  • —-no-owner any user name can be used for the initial connection, and this user will own all the created objects.
  • --no-privileges prevent restoring of access privileges (grant/revoke commands).
  • --dbname=database_name connect to database named database_name and restore directly into the database.

Best Practice With An Example

Background hint: There are some issues and bugs that only happens in production environment, so I need to reproduce them locally to fix up.

Here's how I did it by using pg_dump and pg_restore to backup the production database firstly, and then restore it locally.

  1. Exporting the database using pg_dump :

    pg_dump \
    --format=custom \
    --jobs=4 \
    DATABASE_NAME > DATABASE_NAME.dmp

    Here's the reason for those chosen options:

    • --format=custom compress the output file to reduce storage costs and as a standard input file for pg_restore;
    • --jobs=4 make this dump in parallel and dramatically reduce the process time.
  2. Using scp to copy the DATABASE_NAME.dmp file from cloud server to local machine:

    scp username@my-remote-cloud-server:/path/to/DATABASE_NAME.dmp ~/Downloads/
  3. And then restore it locally with pg_restore :

    pg_restore \
    --no-owner \
    --no-privileges \
    --jobs=4 \
    --clean \
    --dbname=NEW_DATABASE_NAME \
    DATABASE_NAME.dmp

    Here's the meaning for the flag arguments:

    • --no-owner stop trying to set the ownership of the objects to the original user name;
    • --no-privileges prevent restoring the access privileges from the objects;
    • --jobs=4 make this restore in parallel and speed up the process;
    • --clean clean (drop) database objects before recreating them;
    • --dbname=NEW_DATABASE_NAME connect to database named NEW_DATABASE_NAME and restore directly into the database.

Conclusion

Exporting a database in PostgreSQL could be simply like MySQL. But due to the unique ownership system in PostgreSQL, before restoring a dump file, all the database users who own objects or were granted privileges on objects in the dumped database must exist in the target database.

If they don't, the import operation fails to recreate the objects with the original ownership or privilege and will throw an error:

pg_restore: [archiver (db)] could not execute query: ERROR: role "owner_name" does not exist

By setting the --no-owner and --no-privileges options in pg_restore will effectively prevent this error and stop the transform of ownership/privilege from origin objects.

Furthermore, here are 3 tips about the right way to use the backup commands in PostgreSQL:

  1. Using custom format not plain script file

    Using --format=custom to compress the output file can save significant storage space and reduce the storage costs, especially when you are exporting a huge database. And the beauty of a custom format dump is that it is easily possible to extract just a subset of tables, a single index, or maybe a single procedure.

  2. Prevent the ownership/privilege of objects

    By default, pg_restore issues ALTER OWNER or SET SESSION AUTHORIZATION statements to set ownership of created schema elements. These statements will fail unless the initial connection to the database is made by a superuser or origin-user.
    With --no-owner and --no-privileges options setting in pg_restore, any user name can be used for the initial connection, and this user will own all the created objects.

  3. Making full use of the CPUs

    Setting --jobs=n option in a multiprocessor machine can drastically speed up the process of the dump and restore.

Now that you have seen how to dump and restore a database in PostgreSQL, just go ahead and try it yourself!

References

  1. PostgreSQL Official Docs: backup and restore, pg_dump, pg_restore
  2. Best practices for importing and exporting data in Google Cloud

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.