After introducing G Suite at the company, one of the quick wins was migrating our backups to the cloud. This reduced our maintenance costs, eliminated storage limitations, and placed our data in more reliable, professionally managed hands.

This is, of course, an automated process. In most cases, our systems and services running under some Linux distribution need to be backed up daily.

At Ponte, this typically means one of two types of backup: either a directory and its full contents, or a database file. In both cases, it is worth compressing the data so that the result forms a single unit.

Preparing the backups

Before uploading the compressed backup files, they must be prepared with data protection in mind. Unlimited storage is great, but naturally, we wouldn’t hand over sensitive data “just like that.” It wouldn’t be professionally acceptable to do only half the job.

Therefore, we take the following steps:

  • We compress the backups into a tar.gz archive using the tar command available on Linux systems.
  • We split the archive into 1 GB chunks. This is practical because if the network connection drops during upload, we won’t need to restart the whole process. Additionally, the encryption tool we use can only handle files up to this size.
  • We encrypt the chunks using OpenSSL.
  • We name the files following a naming convention that includes the date in the filename, because file metadata (such as timestamps) may be altered over time, depending on the systems involved.

Uploading to Google Drive

After reviewing our options, it soon became clear that Linux does not have a reliable, first-party command-line client from Google. The remaining option was to find a trustworthy third-party tool with verifiable behavior and implementation. We ultimately chose gdrive, which proved to work well.

Management

The essence of backup management is to supervise the process. We must know if backups fail and also remove those that are no longer needed.

The latter may sound contradictory at first, given that earlier I praised the freedom of unlimited storage. However, this can quickly get out of hand. After introducing G Suite, we didn’t pay attention to this for a while, and in just under two years more than 40 TB of data accumulated. For comparison, before G Suite we had only 2 TB available on our own hardware. It’s easy to get used to the good things.

In the spirit of fair use, and to manage our storage before it spirals out of control, we created a Google Apps Script that removes outdated backups.

Deleting outdated backups

This task brought new experiences, as we ran into several G.A.S. limitations during implementation. These surfaced throughout development and forced us to keep modifying the script. We present these constraints first, followed by the rules we use for deletion.

Google Apps Script limitations

The two quotas that affect us most are the maximum runtime of 6 minutes per execution and the cap of 20 simultaneous time-based triggers; both are discussed in detail below.

syoc_-_backup_management_-_limits-1.jpg

syoc_-_backup_management_-_limits-2.jpg

Which backups should we delete?

The first important question is: what must remain, and what may be deleted? In our case, files are kept if any of the following conditions are met:

  • all files created within the last 14 days
    Daily backups from the last two weeks are sufficient for recovery after data loss.
  • the first backup created in each month
    This is not always necessary, but it can be useful to preserve historic snapshots—although GDPR considerations may limit this.
  • the most recent backup from each system
    One might think the 14-day rule covers this, but not necessarily. If a backup process stops mid-month or silently fails, the last successful backup may fall outside the 14-day window.

It becomes clear that these rules are time-based. This is why preserving accurate timestamps is essential—including in the filename. The timestamp in the filename reflects the client’s known time, while the Google Drive metadata reflects the server’s time.

Be mindful of time zone differences! In our case, this does not pose a problem, since we operate with day-based logic.

Implementation

With both limitations and retention rules clarified, let’s look at how to schedule the process that locates and removes files, then generates a report.

Scheduling

One limitation of G.A.S. is that a script may run for a maximum of 6 minutes. This becomes an issue when listing or analyzing many folders and files.

We work around this by organizing backups into project-specific folders and scheduling the script so that it processes one folder at a time. This gives each folder a 6-minute window.

If even that is insufficient (e.g., a folder contains many old files), this will only be a concern at the beginning. Regular runs—say, weekly—keep the dataset manageable. The key is that the process must always be able to continue.

Recursive traversal is very slow in Drive scripts. Avoid nested folder structures for backup archives whenever possible.

Unfortunately, we face another limit: only 20 time-based triggers can exist simultaneously. To avoid this, our task scheduling forms a chain.

syoc_-_backup_management_-_trigger_chain.png

Steps of the scheduled chain:

1. Define the starting point of the process. For example, the cleanup may run once per week, every Saturday. This can be configured manually via the script’s trigger interface: Edit > Current project's triggers.

syoc_-_backup_management_-_start_trigger.jpg

Here, specify the method to run and the scheduling rule.

syoc_-_backup_management_-_start_trigger-2.jpg

2. To create chained triggers, we introduce a session identifier in startTriggerChain(), based on the start time. Then we call timingTrigger().

function startTriggerChain(){
  var sessionDate = formatDate(new Date());
  timingTrigger(sessionDate, FOLDER_IDS, 5 * 1000); // 5 seconds
}
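The FOLDER_IDS map and the formatDate() helper used above, along with the two constants referenced later (KEEP_WITHIN_DAYS and MIN_TASK_TIME), are not shown in the article. A minimal sketch, assuming a simple project-name-to-folder-ID map and day-precision formatting (the values below are placeholders, not our real configuration), could look like this:

// Illustrative configuration: the names come from the article, the values are placeholders
var FOLDER_IDS = {
  "project1": "DRIVE_FOLDER_ID_1",  // one Drive folder per project
  "project2": "DRIVE_FOLDER_ID_2"
};
var KEEP_WITHIN_DAYS = 14;          // keep every backup younger than two weeks
var MIN_TASK_TIME = 10 * 1000;      // assumed delay before the next chained trigger fires (ms)

// Day-precision date string in the script's time zone, e.g. "2018-05-15"
function formatDate(date){
  return Utilities.formatDate(date, Session.getScriptTimeZone(), "yyyy-MM-dd");
}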

3. The timingTrigger() method schedules the next task in the chain and passes the required parameters.

function timingTrigger(sessionDate, bucketList, delay){
  var trigger = ScriptApp
    .newTrigger("doTrigger")
    .timeBased()
    .after(delay)
    .create();
  
  setupTriggerArguments(trigger.getUniqueId(), {
      sessionDate: sessionDate,
      bucketList : bucketList
    });
}

Parameters must be stored using PropertiesService because time-based triggers cannot rely on in-memory values. Scheduled executions may occur much later.

For convenience, we implement helper methods:

// Basic Trigger functions -------------------------------------
function removeTriggerByUid(triggerUid){
  if (!ScriptApp.getProjectTriggers().some(function (trigger) {
    if (trigger.getUniqueId() === triggerUid) {
      removeTrigger(trigger);
      return true;
    }
    return false;
  })) {
    console.error("Could not find trigger with id '%s'", triggerUid);
  }
}

function removeTrigger(trigger){
  ScriptApp.deleteTrigger(trigger);
  deleteTriggerArguments(trigger.getUniqueId());
}

function setupTriggerArguments(triggerUid, args){
  PropertiesService.getScriptProperties().setProperty(triggerUid, JSON.stringify(args));
}

function extractTriggerArguments(triggerUid) {
  return JSON.parse(PropertiesService.getScriptProperties().getProperty(triggerUid));
}

function deleteTriggerArguments(triggerUid) {
  PropertiesService.getScriptProperties().deleteProperty(triggerUid);
}

4. The doTrigger() implementation ensures the execution chain functions as intended:

  • Extracts parameters from PropertiesService.
  • Removes itself immediately to avoid accumulating triggers (remember the 20-trigger limit).
  • Selects the next folder to process.
  • Schedules the next run with updated parameters.

function doTrigger(event){
  // Get args & remove timing and args
  var args = extractTriggerArguments(event.triggerUid);
  var sessionDate = args.sessionDate;
  var bucketList = args.bucketList;
  removeTriggerByUid(event.triggerUid);
  
  // Find & define the next trigger in the chain, or finally send the report
  var bucketListKeys = Object.keys(bucketList);
  if(!!bucketListKeys && bucketListKeys.length > 0){
    
    // Save current job params
    var target = bucketListKeys[0];
    var id = bucketList[target];
    
    // Next job args & timing now
    delete bucketList[target];
    timingTrigger(sessionDate, bucketList, MIN_TASK_TIME);
    
    // Do the job using params
    startCleanup(sessionDate, target, id);
    
  } else {
    // Send report
    sendReport();
  }  
}

The last step is to call startCleanup(), which performs directory scanning and deletion.

If no folders remain, a report is generated.

Scanning

Scanning means identifying files or groups of files eligible for deletion. Since backups are split into 1 GB chunks, group handling is important.

1. Traverse the target folder (recursively) using findBackupFiles() and validate filenames using our date format. Invalid files are ignored.

function findBackupFiles(root){
  var result = [];
  var folders = root.getFolders();
  while(folders.hasNext()) {
    var childFolder = folders.next();
    var subResult = findBackupFiles(childFolder);
    result = result.concat(subResult);
  }
  
  var files = root.getFiles();
  while(files.hasNext()){
    var childFile = files.next();
    if(FILE_DATE_FORMAT.test(childFile.getName()))
       result.push(childFile);
  }
  return result;
}

The regex used to validate dates:

var FILE_DATE_FORMAT = new RegExp("(20[0-9]{0,2})[\-_]?([01]?[0-9])[\-_]?([0123]?[0-9])");

This matches various formats, as shown on regex101.

  • 2017-01-01
  • 2018-05-15-project1.backup.tar
  • projekt2-2018-05-15.tgz.part01.enc
  • projekt3-20170913.tar.gz
  • projekt3-2017_09_13.tar.gz

2. Group files by date using groupBackupFiles():

function groupBackupFiles(arr){
  var result = {};
  if(!!arr && arr.length != 0){
    arr.map(function(f){
      var found = f.getName().match(FILE_DATE_FORMAT);
      if(!!found && found.length != 0){
        var timestamp = asTimestamp(found[0]);
        if(!result[timestamp])
          result[timestamp] = [];
        result[timestamp].push(f);
      }
    });
  }
  return result;
}

Grouping by date is sufficient because each processed folder contains only a single project's backups.
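
The asTimestamp() helper referenced above is not shown in the article either; assuming it simply normalizes the matched date string into a millisecond timestamp, a sketch could be:

// Convert a matched date string such as "2018-05-15" or "20170913" into a timestamp.
// Assumption: the regex capture groups are year, month and day.
function asTimestamp(dateString){
  var parts = dateString.match(FILE_DATE_FORMAT);
  return new Date(Number(parts[1]), Number(parts[2]) - 1, Number(parts[3])).getTime(); // JS months are zero-based
}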

3. Apply rules to separate files into fresh, keep, and remove groups:

function separateBackupFiles(groups){
  var result = { fresh : [], keep : [], remove : [] };
  
  var lastYear, lastMonth;
  var firstDayInMonthFounded = false;
  var now = new Date();
  var groupKeys = Object.keys(groups).sort();
  var groupKeyCount = groupKeys.length;
  groupKeys.map(function(time, index){
    var date = new Date(Number(time));
    var diff = daysBetween(date, now);
    
    if(diff < KEEP_WITHIN_DAYS){
      result.fresh.push(groups[time]);
      return;
    }
    
    Logger.log("index: " + index + ", Count: " + groupKeyCount);
    if(index == groupKeyCount - 1){ // Keep the last
      result.keep.push(groups[time]);
      return;
    }
    
    var year = date.getFullYear();
    var month = date.getMonth();
    
    // reset when year or month changed
    if(lastYear != year || lastMonth != month){
      lastYear = year;
      lastMonth = month;
      firstDayInMonthFounded = false;
    }
  
    if(!firstDayInMonthFounded){ // Keep
      result.keep.push(groups[time]);
      firstDayInMonthFounded = true;
    }else{ // Remove
      result.remove.push(groups[time]);
    }
    
  });
  return result;
}
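
The daysBetween() helper is also assumed here; working with whole days is exactly why the time-zone caveat mentioned earlier is not a problem. A possible sketch:

var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed between two dates (fractions rounded down)
function daysBetween(earlier, later){
  return Math.floor((later.getTime() - earlier.getTime()) / MS_PER_DAY);
}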

4. Implement startCleanup() which ties everything together:

// Main logic ---------------------------------------
function startCleanup(sessionDate, target, id) {
  if(!target || !id) return;
  
  // Collect & separate files
  var backupFolder = DriveApp.getFolderById(id);
  var backupFiles = findBackupFiles(backupFolder);
  var backupFilesByDate = groupBackupFiles(backupFiles);
  var backupFilesSeparated = separateBackupFiles(backupFilesByDate);
  
  // Do the job
  var results = [0, 0]; // count, size
  backupFilesSeparated.remove.map(function(arr){
    var r = moveToTrash(arr);
    results[0] += r[0];
    results[1] += r[1];
  });
  
  // Reporting
  // TODO: results contains all the useful information about the removed files (count, freed bytes)
}

Deleting files

Once the files to be deleted are identified, actually deleting them is straightforward. However, instead of permanent deletion, it is better to move them to the trash.

function moveToTrash(arr){
  var result = [0, 0];  // count, size
  // nothing to do
  if(!arr || arr.length == 0) return result;
  // move each file to the trash and sum up the freed size
  arr.map(function(f){
    f.setTrashed(true);
    result[1] += f.getSize();
  });
  result[0] = arr.length;
  // trash the parent folder too once it has become empty
  var parents = arr[0].getParents();
  if(parents.hasNext()){
    var parentFolder = parents.next();
    var empty = !parentFolder.getFiles().hasNext() && !parentFolder.getFolders().hasNext();
    if(empty) parentFolder.setTrashed(true);
  }
  return result;
}

Unlike Gmail, Google Drive does not automatically empty the trash:
"The file will stay there until you empty your trash."
https://support.google.com/drive/answer/2375102
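
Should the trash itself ever need to be purged from code, the Advanced Drive Service (Drive API v2, which must be enabled separately for the project) offers a way; this is an optional extra, not part of our cleanup script:

// Assumes the Advanced Drive Service (Drive API v2) is enabled as "Drive"
function emptyDriveTrash(){
  Drive.Files.emptyTrash();
}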

Creating the report

Since this project revolves around automated scheduled cleanup, it is useful to receive a summary of the process. The startCleanup() result includes the freed space, which could be logged, for example, into a Spreadsheet. Implementation is left to the reader.
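
As a starting point, a minimal spreadsheet-backed version of the DEFAULT_LOGGER used below could look like the sketch here; the spreadsheet ID, the log() method and the column layout are assumptions, not the original implementation:

var LOG_SPREADSHEET_ID = "SPREADSHEET_ID";  // placeholder

var DEFAULT_LOGGER = {
  // One row per processed folder: [session date, project, removed file count, freed bytes]
  log: function(sessionDate, target, count, size){
    SpreadsheetApp.openById(LOG_SPREADSHEET_ID).getSheets()[0]
      .appendRow([sessionDate, target, count, size]);
  },
  // Sums the rows of the most recent session: [session timestamp, total count, total bytes]
  calcSummary: function(){
    var rows = SpreadsheetApp.openById(LOG_SPREADSHEET_ID).getSheets()[0]
      .getDataRange().getValues();
    var lastSession = rows.length ? rows[rows.length - 1][0] : new Date();
    var count = 0, size = 0;
    rows.forEach(function(row){
      if(String(row[0]) === String(lastSession)){
        count += Number(row[2]);
        size += Number(row[3]);
      }
    });
    return [new Date(lastSession).getTime(), count, size];
  },
  getUrl: function(){
    return SpreadsheetApp.openById(LOG_SPREADSHEET_ID).getUrl();
  }
};

With something like this in place, startCleanup() could simply call DEFAULT_LOGGER.log(sessionDate, target, results[0], results[1]) where the TODO comment currently sits.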

Here is a snippet for sending the summary email:

function sendReport(){
  // Get summary results
  var summary = DEFAULT_LOGGER.calcSummary();
  var date = formatDate(new Date(summary[0]));
  var size = summary[2] / 1024 / 1024 / 1024;  
  
  // Read template & fill in the information
  var bodyHTML = convertToHTML(REPORT_EMAIL_TEMPLATE_DOC_ID);
  var reportBodyHTML = bodyHTML.replace(/{{SESSION_DATE}}/g, date)
                               .replace(/{{COUNT}}/g, summary[1])
                               .replace(/{{SIZE}}/g, size.toFixed(2))
                               .replace(/{{WITHIN_DAYS}}/g, KEEP_WITHIN_DAYS)
                               .replace(/{{LOG_URL}}/g, DEFAULT_LOGGER.getUrl());
  
  // Send mail
  MailApp.sendEmail({
    to: REPORT_EMAIL_ADDRESSES.join(","),
    subject: "Backup Cleaner Report - " + date,
    htmlBody: reportBodyHTML,
    noReply: true
  });
}
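
The convertToHTML() helper is not shown either. One common approach is to export the Google Doc template as HTML through the Drive document export endpoint, roughly as below; the URL pattern is a widely used workaround rather than an official Apps Script API, so treat it as an assumption:

// Export a Google Doc as HTML using the script's own OAuth token
function convertToHTML(docId){
  var url = "https://docs.google.com/feeds/download/documents/export/Export?id=" + docId + "&exportFormat=html";
  var response = UrlFetchApp.fetch(url, {
    headers: { Authorization: "Bearer " + ScriptApp.getOAuthToken() }
  });
  return response.getContentText();
}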

And with that, we are essentially done.


The above implementation demonstrates a real example where Google Apps Script once again proved valuable in an environment where diverse office and business services can be programmed together. In doing so, we automated yet another process, saving both time and money. Moreover, we prevented a future scenario where we would suddenly face hundreds of terabytes of accumulated data. As a result of this script, our storage usage dropped to 10% of its previous size.