Comprehensive Guide to Backup and Restore PostgreSQL Database Utilizing Google Drive Storage
This guide details the use of two Python scripts: backup.py
and restore.py
for the purpose of backing up and restoring databases. The backup.py
script is responsible for creating a PostgreSQL dump, which it achieves by zipping and encrypting the specified database. Once encrypted, the dump is uploaded to Google Drive Storage for safekeeping. On the other hand, the restore.py
script locates the most recent dump in Google Drive Storage, downloads it, and subsequently unzips and decrypts the data. Once decrypted, the data is then loaded back into the PostgreSQL database.
To effectively use both scripts, you'll need:
- A version of Python3.6 or higher,
- The pip packages installed from the
requirements.txt
file, - The Google Drive API and a Service Account enabled with the appropriate roles and permissions,
- Files with public (
backup_key.pem.pub
) and private (backup_key.pem
) keys for encrypting and decrypting the dump, which can be generated with openssl:
openssl req -x509 -nodes -days 1000000 -newkey rsa:4096 -keyout backup_key.pem\\
-subj "/C=US/ST=Illinois/L=Chicago/O=IT/CN=[www.example.com](<http://www.example.com/>)" \\
-out backup_key.pem.pub
- A file with the PostgreSQL database password (
~/.pgpass
) set withchmod 600
, including, for example:
localhost:5432:your_database:your_db_user:your_db_user_password
- Finally, ensure to check the
check_hostname()
function in therestore.py
script. This function checks thehostname
of the current server, serving as a safeguard against potentially dropping database tables on the production server.
To enable the Google Drive API, follow these steps:
-
Go to the Google Cloud Console at
https://console.cloud.google.com/
. -
Navigate to
APIs & Services
. In the left sidebar, selectDashboard
underAPIs & Services
. -
Click on
Enable APIs and Services
. Search forGoogle Drive API
and enable it for your project.
- Open the console left side menu and select
IAM & Admin
, thenService Accounts
. - Click
Create Service Account
. - In the
Service account name
field, enter a name. In theService account description
field, enter a description. ClickCreate
. - Under
Grant this service account access to project
, select theRole
drop-down list. You should assign it the following roles:Editor
,Storage Admin
,Storage Object Admin
, andStorage Object Creator
. These roles will allow your service account to perform necessary actions in Google Drive. ClickContinue
. - Under
Grant users access to this service account
, in theUser
field, enter your email address. ClickDone
. - Find the email of your new service account in the
Service Accounts
list and click on it. - In the
Keys
section, clickAdd Key
, thenCreate new key
. - For the
Key type
, chooseJSON
. ClickCreate
. A JSON file that contains your key downloads to your computer. - Save this JSON file, you will need it to run your scripts. Do not share it with anyone, as it allows access to your Google Drive.
To bind a service account from Google Cloud to your personal Google Drive and gain necessary permissions to upload and delete files, you can follow these steps:
- Share Your Google Drive Folder with the Service Account
- Open your Google Drive and create or navigate to the folder you want to share
- Click on the folder’s name and then click on the
Share
icon - In the
Add people and groups
section, enter the email address of your Service Account (you can find this on the Service Account's detail page in Google Cloud or in the JSON file which you downloaded in theemail
field) and clickSend
- Open your Google Drive and create or navigate to the folder you want to share
To find the Google Drive folder ID, follow these steps:
- Open your Google Drive.
- Navigate to the folder for which you need the ID.
- Click on the folder to open it.
- Look at the URL in your web browser's address bar. The long string of characters at the end of the URL is the folder's ID.
(Don't forget to Remove the Query Parameter)
Here's how the script works:
- Environment Variables: At the beginning, the script sets up several environment variables, such as the host name of the database (
DB_HOSTNAME
), the database name (DB_NAME
), the database user (DB_USER
), the Google Drive folder ID where the backup will be stored (GOOGLE_DRIVE_FOLDER_ID
), and the file path to the public key used for encryption (BACKUP_KEY_PUB_FILE
). - Function Definitions: Several functions are defined for use in the script. These include:
say_hello()
: This function prints a greeting message when the script is run.get_now_datetime_str()
: This function returns the current date and time as a formatted string.check_key_file_exists()
: This function checks if the public key file for encryption exists. If it doesn't, the script exits.dump_database()
: This function uses thepg_dump
command to create a dump of the PostgreSQL database, compresses the dump using gzip, and then encrypts the compressed dump using openssl. The encrypted dump is then saved to a temporary file.get_drive_service()
: This function creates and returns a Google Drive service client, which is used to interact with the Google Drive API.upload_dump_to_drive()
: This function uploads the encrypted database dump to Google Drive. The file metadata (including the file name and the ID of the parent Google Drive folder) and the file content are specified in the request. After the file is uploaded, the function prints the ID of the uploaded file.remove_temp_files()
: This function removes the temporary file containing the encrypted database dump.
- Main Program: The main part of the script (under
if __name__ == "__main__":
) calls the functions defined above in the correct order. First, it prints a greeting message, checks if the public key file exists, creates a database dump, uploads the dump to Google Drive, and finally removes the temporary file.
Here's how the script works:
- Environment Variables: The script starts by setting up several environment variables, including the hostname of the database, the database name, the database user, the Google Drive folder ID where the backup is stored, and the file path to the private key used for decryption.
- Database Connection: The script establishes a connection to the PostgreSQL database.
- Function Definitions: The script defines several functions that are used to restore the database:
say_hello()
: Prints a greeting message to the console.check_hostname()
: Checks the hostname of the current server. The script will stop execution if the hostname does not start with 'loader-' or end with '.local'.check_key_file_exists()
: Checks if the private key file for decryption exists. If not, the script stops.get_drive_service()
: Creates and returns a Google Drive service client, which is used to interact with the Google Drive API.get_last_backup_filename()
: Lists the files in the Google Drive folder and finds the most recent file, returning its name.download_s3_file(filename)
: Removes any existing file with the same name as the database dump, then downloads the specified file from Google Drive.unencrypt_database()
: Decrypts the downloaded database dump using openssl.unzip_database()
: Unzips the decrypted database dump.clear_database()
: Clears the current database state by dropping all tables.load_database()
: Loads the unzipped database dump into the PostgreSQL database.remove_temp_files()
: Removes the temporary file containing the encrypted database dump._get_all_db_tables()
: Helper function that returns a list of all table names in the public schema of the database._silent_remove_file(filename)
: Helper function that removes a specified file without throwing an error if the file does not exist.
- Main Program: The main part of the script (under
if __name__ == "__main__":
) calls the defined functions in the correct order. It prints a greeting message, checks the hostname and if the private key file exists, downloads the most recent backup file from Google Drive, decrypts and unzips the downloaded file, clears the database, loads the unzipped dump into the database, and finally, removes the temporary file.