How the blog was built part 5 - import a backup

Now I can back up and update the blog, but a few times I have had to manually re-import the blog data. As you saw in part 1, the backup scripts export the blog content as a JSON file and the content folder (which includes all the images used) as a ZIP file. The blog is manually imported with these steps:

  • Import the blog content JSON file from the Settings > Labs > Import button.
  • Uncompress the images from the ZIP file into the blog content folder.
  • Manually re-enter the profile details (the profile text and pictures), because the profile is reset whenever the blog is reset.

I've done this enough times, and I need to update the Ghost blog version and fix any Snyk-reported security issues, so I will automate all this and show the scripts that import a backup automatically. These scripts are a work in progress because I haven't needed to execute them yet. This code is my educated guess so far and I will update it once I have confirmed it works.

Importing the blog content

The import is split into two scripts: one to download the backup and the other to do the import.

The download_website.sh script will:

  • Get the list of backups from AWS S3, filtered to those after a set datetime, from when backups started being packaged correctly.
  • Convert the JSON array to a bash shell array.
  • Display the list of backup datetimes and wait for user input.
  • Once selected, create the download folder.
  • Download each file in the selected backup from AWS S3 and save it to the download folder.
  • From Docker, extract the blog content images from the archive into the blog folder. NOTE: This doesn't currently work because I haven't restarted the blog with the new import folder reference in the Dockerfile (see the volume-mapping sketch after the script).

This saves me from logging into the AWS console, listing the bucket contents and downloading the archive, or running the AWS CLI commands myself.
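
For comparison, the manual route with the AWS CLI would be something like this sketch (bucket name taken from the script below; <datetime> stands for whichever backup I want):

aws s3 ls s3://binarydreams.biz-backup/ --profile ghost-blogger
aws s3 cp s3://binarydreams.biz-backup/<datetime>/ . --recursive --profile ghost-blogger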

#!/bin/bash

GHOST_DOMAIN=binarydreams.biz
AWS_REGION=eu-west-1

# Only list backups after this datetime, from when backups started being packaged correctly
DATETIME="2023-03-03-00-00-00"

echo "\nGet list of backups from S3 ..."

# Filter after the datetime AND by the .tar.gz file (rather than the .json)
# because there is only ever one per backup, giving a definitive list of backups
declare -a DATETIMES_BASHED
DATETIMES=$(aws s3api list-objects-v2 --bucket $GHOST_DOMAIN-backup --region $AWS_REGION --output json --profile ghost-blogger --query 'Contents[?LastModified > `'"$DATETIME"'` && ends_with(Key, `.tar.gz`)].Key | sort(@)')

echo -e "Backups found.\n"

# I didn't want to install more extensions etc. but I just wanted
# a working solution.
# installed xcode-select --install
# brew install jq
# https://jqplay.org/
# jq -r outputs a raw string
# @sh converts the input string to space-separated strings
# and also removes [] and ,
# tr removes the single quotes from the string output
DATETIMES_BASHED=($(jq -r '@sh' <<< $DATETIMES | tr -d \'\"))
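# Hypothetical example of the conversion above (values made up):
#   DATETIMES='["2023-03-04-02-00-00/ghost-content-2023-03-04-02-00-00.tar.gz"]'
#   DATETIMES_BASHED=(2023-03-04-02-00-00/ghost-content-2023-03-04-02-00-00.tar.gz)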

# Show datetimes after $DATETIME and add the extracted string
# to a new array
declare -a EXTRACTED_DATETIMES

for (( i=0; i<${#DATETIMES_BASHED[@]}; i++ ));
do
    # Extract the first 19 characters to get the datetime, e.g.
    # 2023-03-04-02-00-00 from 2023-03-04-02-00-00/ghost-content-....tar.gz
    backup=${DATETIMES_BASHED[$i]}
    backup=${backup:0:19}
    EXTRACTED_DATETIMES+=($backup)
    menuitem=$(($i + 1))
    echo "[${menuitem}] ${backup}"
done
echo "[x] Any other key to exit\n"

read -p "Choose backup number> " RESULT

# Exit if not a number: the expansion is empty when RESULT is empty
# or contains any non-digit character
if [ -z "${RESULT##*[!0-9]*}" ]
then
    exit 1
fi

# Convert the 1-based menu number to the 0-based array index
RESULT=$(($RESULT - 1))
SELECTED_DATE=${EXTRACTED_DATETIMES[$RESULT]}

echo "\nDownloading backup $SELECTED_DATE\n"

IMPORT_LOCATION="data/import"
DOWNLOAD_FOLDER="$IMPORT_LOCATION/$SELECTED_DATE"

# Create backup download folder if required
if [ ! -d "$DOWNLOAD_FOLDER" ] 
then
    mkdir -p "$DOWNLOAD_FOLDER"

    if [ $? -ne 0 ]; then
        exit 1
    fi

    echo "Created required $DOWNLOAD_FOLDER folder"
else
    # Empty any previous download: the glob has to sit outside the
    # quotes, otherwise rm looks for a file literally named "*"
    rm -rf "${DOWNLOAD_FOLDER:?}"/*

    if [ $? -ne 0 ]; then
        exit 1
    fi
fi

function get_file {
    FILENAME=$1
    FILE_KEY="$SELECTED_DATE/$FILENAME"
    OUTPUT_FILE="$DOWNLOAD_FOLDER/$FILENAME"
    OUTPUT=$(aws s3api get-object --bucket $GHOST_DOMAIN-backup --region $AWS_REGION --profile ghost-blogger --key $FILE_KEY $OUTPUT_FILE)
    echo "$FILENAME downloaded."
}

get_file "ghost-content-$SELECTED_DATE.tar.gz"
get_file "content.ghost.$SELECTED_DATE.json"
get_file "profile.ghost.$SELECTED_DATE.json"

echo "Download complete.\n"

echo "Extract content folder from archive"
docker compose exec -T app /usr/local/bin/extract_content.sh $SELECTED_DATE
download_website.sh
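
That final docker compose exec step is also why the NOTE earlier exists: the container can only see the downloaded backup once the host import folder is mapped into it. Below is a minimal sketch of the mapping I have in mind, assuming the service is named app (matching the exec call above) and the /import path used by extract_content.sh; the actual Dockerfile/compose change is still to be confirmed:

# Hypothetical equivalent of a compose "volumes:" entry such as
# "./data/import:/import" on the app service. Until the blog container
# is restarted with this mapping, extract_content.sh cannot see the backup.
docker run --name app -v "$(pwd)/data/import:/import" ghost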

Extract the blog data

The first part of the extraction uncompresses the blog content into the import location; the second part, moving it into the Ghost install location, still needs to be written (see the sketch after the script).

#!/bin/bash

# The backup datetime passed in from download_website.sh
NOW=$1

GHOST_INSTALL=/var/www/ghost/
GHOST_ARCHIVE=ghost-content-$NOW.tar.gz
IMPORT_LOCATION=import/$NOW

echo "Unarchiving Ghost content"
cd /$IMPORT_LOCATION

# x - extract files, v - show verbose progress,
# f - read from the named archive file, z - filter through gzip
tar -xzvf $GHOST_ARCHIVE -C /$IMPORT_LOCATION

if [ $? -ne 0 ]; then
    exit 1
fi

#echo "Moving archive to $IMPORT_LOCATION"
#cp -Rv $GHOST_INSTALL$GHOST_ARCHIVE /$IMPORT_LOCATION
#rm -f $GHOST_INSTALL$GHOST_ARCHIVE
extract_content.sh
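
The copy into the Ghost install location is the part I haven't written yet. Here is a minimal sketch of that final step, assuming the archive unpacks to a content folder and that Ghost runs as the node user (neither confirmed until I test the restore):

# Hypothetical final step: copy the extracted content into the Ghost
# install location and restore ownership. Assumes the archive unpacks
# to a "content" folder and that Ghost runs as the "node" user.
cp -Rv "/$IMPORT_LOCATION/content/." "${GHOST_INSTALL}content/"
chown -R node:node "${GHOST_INSTALL}content"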

Import the blog data

The second script, import_website.sh, will:

  • Get the list of downloaded imports found in the import folder.
  • Wait for the user to select the datetime of the import.
  • Execute the Cypress tests with the selected datetime.

#!/bin/bash

echo "\nGet list of imports from import folder ..."

declare -a FOLDERS
FOLDERS=($(ls -d data/import/* 2>/dev/null))

# Exit if no import folders were found
if [ ${#FOLDERS[@]} -eq 0 ]
then
    echo "No imports found."
    exit 1
fi

for (( i=0; i<${#FOLDERS[@]}; i++ ));
do
    folder=${FOLDERS[$i]}
    menuitem=$(($i + 1))
    echo "[${menuitem}] $folder";
done

echo "[x] Any other key to exit\n"

read -p "Choose import number> " RESULT

# Check if not a number
if [ -z "${RESULT##*[!0-9]*}" ]
then
    exit 1
fi

# Convert the 1-based menu number to the 0-based array index
RESULT=$(($RESULT - 1))

# Strip the data/import/ prefix so only the datetime remains
SELECTED_DATE=$(basename "${FOLDERS[$RESULT]}")

echo $SELECTED_DATE

# TODO:
# Let the user choose whether to reset the blog by deleting the existing blog content.
# Check whether the first page in the test is the login OR the blog setup.

# FOR NOW:
# I will need to manually set up the blog and delete the default posts and content files.
# The UI tests should do the rest.

#echo "Run the UI test to import the blog from JSON files and return to this process"
#npx as-a binarydreams-blog cypress run --spec "cypress/e2e/ghost_import.cy.js" --env timestamp=$SELECTED_DATE

The Cypress test will:

  • Log into Ghost.
  • Check the blog content JSON file exists.
  • Run the test to import the blog, with the import datetime passed in as an argument.

Then the profile is imported with these steps:

  • Log into Ghost.
  • Read the profile JSON file from the expected location.
  • Browse to the profile page.
  • Upload the cover picture.
  • Upload the profile picture.
  • Enter the profile details.
/// <reference types="cypress" />

// Command to use to pass secret to cypress
// as-a local cypress open/run

describe('Import', () => {

  beforeEach(() => {
    // Log into ghost
    const username = Cypress.env('username')
    const password = Cypress.env('password')
    
    cy.visit('/#/signin')

    // it is ok for the username to be visible in the Command Log
    expect(username, 'username was set').to.be.a('string').and.not.be.empty
    // but the password value should not be shown
    if (typeof password !== 'string' || !password) {
      throw new Error('Missing password value, set using CYPRESS_password=...')
    }

    cy.get('#ember7').type(username).should('have.value', username)
    cy.get('#ember9').type(password, { log: false }).should(el$ => {
      if (el$.val() !== password) {
        throw new Error('Different value of typed password')
      }
    })

    // Click Log in button
    cy.get('#ember11 > span').click()
  })

  it('Content from JSON', () => {

    let timestamp = Cypress.env("timestamp")
    let inputFile = `/import/${timestamp}/content.ghost.${timestamp}.json`

    // Check the blog content JSON file exists before importing
    cy.readFile(inputFile)

    // Click Settings icon
    cy.get('.gh-nav-bottom-tabicon', { timeout: 10000 }).should('be.visible').click()

    // The Labs menu link id is generated, so navigate via the URL instead
    cy.visit('/#/settings/labs')

    // Click browse and select the file
    cy.get('.gh-input > span').selectFile(inputFile)

    // Click Import button
    cy.get(':nth-child(1) > .gh-expandable-header > #startupload > span').click()
  })

  it('Profile from JSON', () => {

    let timestamp = Cypress.env("timestamp")
    let inputFile = `/import/${timestamp}/profile.ghost.${timestamp}.json`

    // readFile yields the parsed JSON asynchronously, so the profile
    // must be used inside then() rather than assigned to a variable
    cy.readFile(inputFile).then((profile) => {

      // Click Settings icon
      cy.get('.gh-nav-bottom-tabicon', { timeout: 10000 }).should('be.visible').click()

      // The profile page is easier to reach via the URL
      cy.visit('/#/staff/jp')

      // Cover picture
      cy.get('.gh-btn.gh-btn-default.user-cover-edit', { timeout: 10000 })
        .should('be.visible').click()

      cy.get('.gh-btn.gh-btn-white').click().selectFile(profile.coverpicture)

      // Save the picture
      cy.get('.gh-btn.gh-btn-black.right.gh-btn-icon.ember-view').click()

      // Profile picture
      cy.get('.edit-user-image', { timeout: 10000 })
        .should('be.visible').click()

      cy.get('.gh-btn.gh-btn-white').click().selectFile(profile.profilepicture)

      // Save the picture
      cy.get('.gh-btn.gh-btn-black.right.gh-btn-icon.ember-view').click()

      // Import text from the profile file, clearing any existing values first
      cy.get('#user-name').clear().type(profile.username)

      cy.get('#user-slug').clear().type(profile.userslug)

      cy.get('#user-email').clear().type(profile.email)

      cy.get('#user-location').clear().type(profile.location)

      cy.get('#user-website').clear().type(profile.website)

      cy.get('#user-facebook').clear().type(profile.facebookprofile)

      cy.get('#user-twitter').clear().type(profile.twitterprofile)

      cy.get('#user-bio').clear().type(profile.bio)
    })
  })
})
ghost_import.cy.js
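
For reference, the profile import assumes a JSON file shaped something like the following. The field names come from the test above; the values are made up and I will confirm the real layout against an actual backup:

{
  "username": "JP",
  "userslug": "jp",
  "email": "jp@example.com",
  "location": "Somewhere, UK",
  "website": "https://binarydreams.biz",
  "facebookprofile": "jp",
  "twitterprofile": "@jp",
  "bio": "A short bio.",
  "coverpicture": "/import/2023-03-04-02-00-00/cover.jpg",
  "profilepicture": "/import/2023-03-04-02-00-00/profile.jpg"
}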

Once this has been fully tested, I will update this article.