Hi there 👋
boojack's Introduction
boojack's People
boojack's Issues
Article: "How to do proper pg_dump and pg_restore in PostgreSQL"
How to do proper pg_dump and pg_restore in PostgreSQL
PostgreSQL is an object-relational database, while MySQL is a purely relational database. This means that Postgres includes features like the ownership and privilege of objects:
When you create a database object in PostgreSQL, you become its owner. By default, only the owner of an object can do anything with the object. And in order to allow other users to use it, privileges must be granted.
And that makes a little difference in backup with PostgreSQL. Howerver, by setting options with the backup utilities pg_dump
and pg_restore
will effectively deal with that, and let's figure out how to use them firstly.
pg_dump: Dump a particular database
pg_dump
is a utility that used to dump a database for migration or upgrade purposes.
And the normally basic syntax is shown below:
pg_dump [options] [database_name] > [backup_file_name]
Using pg_dump
with the following useful options:
In this article, options are selective shown and with the full name not a shortcut name because it's good for understanding its meaning. You can get all options by reading PostgreSQL official docs.
--format=format-type
select the output file format type:plain
A plain-text SQL script file (the default).custom
The custom-format archive suitable for input intopg_restore
. And custom format is the most flexible format which compressed by default, speedy dump and restore.directory
The archive is a directory archive.tar
The archive is atar
archive.
--jobs=number-of-jobs
can dump multiple tables at the same time and dramatically reduced the time to dump a large database on a multiprocessor machine.--no-owner
will not output commands to set the ownership of data objects, and this option is ignored when emitting an non-plain format output.--no-privileges
prevent dumping of access privileges (grant/revoke commands).
pg_restore: Restore the database from a dump file
pg_restore
is a tool used to restore the database from the dump file.
Here is the basic syntax for using it:
pg_restore [options] [backup_file_name]
A brief explanation of some useful options is shown below:
--clean
clean (drop) database objects before recreating them.--jobs=number-of-jobs
can restore in parallel and dramatically reduce the time to restore from a large database dump file on a multiprocessor machine.—-no-owner
any user name can be used for the initial connection, and this user will own all the created objects.--no-privileges
prevent restoring of access privileges (grant/revoke commands).--dbname=database_name
connect to database nameddatabase_name
and restore directly into the database.
Best Practice With An Example
Background hint: There are some issues and bugs that only happens in production environment, so I need to reproduce them locally to fix up.
Here's how I did it by using pg_dump
and pg_restore
to backup the production database firstly, and then restore it locally.
-
Exporting the database using
pg_dump
:pg_dump \ --format=custom \ --jobs=4 \ DATABASE_NAME > DATABASE_NAME.dmp
Here's the reason for those chosen options:
--format=custom
compress the output file to reduce storage costs and as a standard input file forpg_restore
;--jobs=4
make this dump in parallel and dramatically reduce the process time.
-
Using
scp
to copy theDATABASE_NAME.dmp
file from cloud server to local machine:scp username@my-remote-cloud-server:/path/to/DATABASE_NAME.dmp ~/Downloads/
-
And then restore it locally with
pg_restore
:pg_restore \ --no-owner \ --no-privileges \ --jobs=4 \ --clean \ --dbname=NEW_DATABASE_NAME \ DATABASE_NAME.dmp
Here's the meaning for the flag arguments:
--no-owner
stop trying to set the ownership of the objects to the original user name;--no-privileges
prevent restoring the access privileges from the objects;--jobs=4
make this restore in parallel and speed up the process;--clean
clean (drop) database objects before recreating them;--dbname=NEW_DATABASE_NAME
connect to database namedNEW_DATABASE_NAME
and restore directly into the database.
Conclusion
Exporting a database in PostgreSQL could be simply like MySQL. But due to the unique ownership system in PostgreSQL, before restoring a dump file, all the database users who own objects or were granted privileges on objects in the dumped database must exist in the target database.
If they don't, the import operation fails to recreate the objects with the original ownership or privilege and will throw an error:
pg_restore: [archiver (db)] could not execute query: ERROR: role "owner_name" does not exist
By setting the --no-owner
and --no-privileges
options in pg_restore
will effectively prevent this error and stop the transform of ownership/privilege from origin objects.
Furthermore, here are 3 tips about the right way to use the backup commands in PostgreSQL:
-
Using custom format not plain script file
Using
--format=custom
to compress the output file can save significant storage space and reduce the storage costs, especially when you are exporting a huge database. And the beauty of a custom format dump is that it is easily possible to extract just a subset of tables, a single index, or maybe a single procedure. -
Prevent the ownership/privilege of objects
By default,
pg_restore
issuesALTER OWNER
orSET SESSION AUTHORIZATION
statements to set ownership of created schema elements. These statements will fail unless the initial connection to the database is made by a superuser or origin-user.
With--no-owner
and--no-privileges
options setting inpg_restore
, any user name can be used for the initial connection, and this user will own all the created objects. -
Making full use of the CPUs
Setting
--jobs=n
option in a multiprocessor machine can drastically speed up the process of the dump and restore.
Now that you have seen how to dump and restore a database in PostgreSQL, just go ahead and try it yourself!
References
- PostgreSQL Official Docs: backup and restore, pg_dump, pg_restore
- Best practices for importing and exporting data in Google Cloud
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.