Wednesday 28 January 2009

One of the biggest questions I always get asked as a consultant is: what should I back up in my business, how, and why? And how should I plan my Disaster Recovery and Business Continuity around that backup?

The specific answers to these questions depend on a number of factors, but this blog should give you some clear guidelines and techniques to solve them.

A few quick definitions:

Business Continuity (BC) - this is the approach that says 'How can I keep my business going should something happen?'
Disaster Recovery (DR) - this is the approach that says 'What do I do if a disaster happens?'

The two are closely intertwined but separate issues. Business Continuity is about what is in place to maintain business functions, and it includes plenty of non-IT considerations (office space, stock, telephony, staff etc.). Disaster Recovery is the short-term plan to get from a failure state (e.g. servers stolen) back to a working state. DR is short term; BC is long term.

There is one important caveat with this blog - these are GUIDELINES. Individual businesses have individual requirements and these may not be correct for you.

To start with, we'll discuss the first question - what to back up?

What?

Firstly, the old rule of 'back it all up' is somewhat flawed. Data keeps growing in every company, and at some point it will overflow your backup method. You may simply not have enough room to back up everything, or a big enough time window.

The simplest approach is to first have a good filing system, then break all your data into one of four categories.

The filing system is key to this. Lots of SMEs rely on Windows, Outlook and some kind of server, yet many of them don't have it set up to store everything in one place. This means each laptop, desktop and server has its own little set of critical data that isn't backed up. The same can happen in a more corporate environment if the IT guys aren't on top of this. So your PC crashes and dies, and you've just lost your business forecast. Then you are on the phone to a data recovery company spending thousands of pounds (more on this later). Or worse, the PC has been stolen and the data is gone forever.

Centralising this is simple, but requires a bit of knowledge (we aren't talking computer genius knowledge here). Technically (skip this bit if IT scares you), you need to enable My Documents redirection for everyone and offline folders for laptops. For email, use Exchange Server with Outlook's Cached Exchange Mode enabled, use an IMAP-based server product, or find some way to back up PST files. These two techniques ensure users still think their data is local on their PCs, while actually keeping it all centralised on the server. Backing up one location is much simpler than backing up 25.
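If you're not sure how much email is still sitting on individual PCs, a quick inventory before you centralise can help. Below is a minimal Python sketch. It assumes the stray data you're hunting for is Outlook PST files. `find_pst_files` is a hypothetical helper, not part of any product.

```python
import os
from pathlib import Path


def find_pst_files(root):
    """Walk a directory tree and return (path, size_in_bytes) for every .pst file.

    PST files left on local PCs are email that a server-side backup will
    never see, so each one found is a candidate for centralising.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pst"):
                full = Path(dirpath) / name
                try:
                    found.append((str(full), full.stat().st_size))
                except OSError:
                    # The file vanished or can't be read (e.g. locked); skip it.
                    continue
    return found
```

Point it at a user's profile folder or a whole local drive, then move whatever it finds onto the server.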

Once you've centralised it, make sure it's well organised. A simple system is one area per department and/or location, plus a 'temp' area for stuff you don't need to keep (see categories below). These can be shares, directories, separate servers or whatever suits the organisation. You can keep user profiles in one place and user documents in another. You can then apply permissions where necessary (so non-Accounts people can't look at the accounts, for instance).

The rough categories of data (in non-technical speech):
  1. Stuff I have on disks (e.g. installations, operating systems, applications, music files - note that having a copy of the disks is probably a good idea!)
  2. Stuff I don't need to keep (temporary records, stuff you've downloaded from the internet)
  3. Stuff that isn't business critical (copies of documents you have hardcopies of, documents with no retention requirements, some email data, some pictures, stuff that has been sent to you by email that you know is being kept, user profiles, personal files such as music and photos)
  4. Everything else

Note that these are broad categories and there will be exceptions. For instance, files you paid to download should go into Category 3 or 4, and some installations may need backing up to preserve their state.

You NEED to back up Category 4. You SHOULD back up Category 3. The rest doesn't matter. No really, it doesn't. You can back up Category 2 data if you have capacity; just make sure it gets done last, so when you do run out of room, it doesn't matter. Category 1 data is backed up far too often. You don't need to back up the 400+MB installation of your favourite office program - it's on the disk, and it can be reinstalled. Your application settings should all be stored elsewhere nowadays (i.e. in your user profile, which is backed up).
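The 'back up 4, then 3, then 2, never 1' rule can be sketched as a simple priority list. This Python example assumes you've already tagged each storage area with a category and an estimated size in GB. The `DataArea` and `plan_backup` names are hypothetical. It models a backup job that writes areas in priority order until the media is full, so anything left over is the stuff you can afford to lose.

```python
from dataclasses import dataclass

# Categories as described above: 1 = on install disks, 2 = don't need to keep,
# 3 = not business critical, 4 = everything else. Lower value = written first.
# Category 1 is deliberately absent: it is never backed up.
BACKUP_ORDER = {4: 0, 3: 1, 2: 2}


@dataclass
class DataArea:
    name: str
    category: int
    size_gb: float


def plan_backup(areas, capacity_gb):
    """Return the names of the areas that fit, in the order they are written.

    Writing stops at the first area that would overflow the media, so
    Category 2 only ever uses space that Categories 4 and 3 didn't need.
    Raises ValueError if any Category 4 data is left out.
    """
    ordered = sorted(
        (a for a in areas if a.category in BACKUP_ORDER),
        key=lambda a: BACKUP_ORDER[a.category],
    )
    included, used = [], 0.0
    for area in ordered:
        if used + area.size_gb > capacity_gb:
            break  # media is full: everything after this point is skipped
        included.append(area.name)
        used += area.size_gb
    # 'included' is a prefix of 'ordered', so the remainder is what was dropped.
    dropped_critical = [a.name for a in ordered[len(included):] if a.category == 4]
    if dropped_critical:
        raise ValueError(f"Category 4 data does not fit: {dropped_critical}")
    return included
```

Suppose you have 40GB of space and areas of 10GB and 20GB (Category 4), 8GB (Category 3) and 5GB (Category 2). The first three fit and the Category 2 area is left off, which is exactly the trade-off the rule is meant to make.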

Next we look at when you will need the data - this forms part of your business continuity (BC) and disaster recovery (DR) plan:

Let's assume your worst-case scenario - all your servers get stolen. Which applications do you need access to NOW, which do you need in a day or two, and which can wait a week or so? This 1-2-3 or Gold/Silver/Bronze approach shapes your disaster recovery plan. It tells you what kind of business continuity plan you need and shows you where your critical systems are.

You may find you don't need anything for a week and can survive accordingly. You may find that without your stock system, email or whatever else, your business is dead in the water. Adjust your backup, DR and BC plans accordingly. Note that this DOESN'T have to be expensive. If you know you can survive without a system for a few days, you have time to buy hardware, get it rebuilt and get data shipped/restored. If one system is critical, you can keep standby equipment offsite just to run that system. It could simply be a standby PC, preconfigured and ready to go, stored at a director's house.
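Turning the 'NOW / a day or two / a week' questions into Gold/Silver/Bronze tiers and a restore order is simple enough to sketch. In this Python example, the 4-hour and 48-hour cut-offs are assumptions standing in for 'NOW' and 'a day or two', and the system names are made up. Use your own answers.

```python
def assign_tier(max_downtime_hours):
    """Map how long a system can be down to a Gold/Silver/Bronze tier.

    The thresholds are illustrative assumptions, not a standard.
    """
    if max_downtime_hours <= 4:   # needed "NOW"
        return "Gold"
    if max_downtime_hours <= 48:  # needed "in a day or two"
        return "Silver"
    return "Bronze"               # can wait "a week or so"


def recovery_order(systems):
    """systems: dict of system name -> maximum tolerable downtime in hours.

    Returns (name, tier) pairs, least tolerant system first. This is the
    order in which you'd plan to bring systems back.
    """
    ranked = sorted(systems.items(), key=lambda item: item[1])
    return [(name, assign_tier(hours)) for name, hours in ranked]
```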

One of the biggest mistakes in planning BC and DR is assuming it's an expensive process. More often than not, it's about stabilising the business until you can get back into your offices and/or find somewhere else to go. Planning for it is the best thing you can do. Doing it on the fly is much more expensive, and finding out that you don't have a critical piece of data after your server has caught fire isn't going to keep your board happy.

Some key points to review:
  1. Ensure ALL the data you need backed up is centralised. For dispersed offices, this means you either need to back up each office, or replicate each office back to a head office location for backup.
  2. Think about all the data you have - email is often overlooked, as are My Documents folders on local computers.
  3. If you have the install disks, don't back the software up. If you downloaded it from the internet, don't back it up unless you have room. There are exceptions, but these rules generally apply.
  4. Figure out which systems need to be back up and running, and in what order - and make sure you have planned for that to happen.

How?

This is a complex decision based on many factors.

Your key criteria for HOW are:
  1. How often do you NEED to back up (hourly, daily, weekly, less often)?
  2. How much data do you NEED to keep?
  3. How long do you need to keep each version?
  4. What time window do you have for backup?
  5. (Crucially) how much does your data change between backups (expressed as a percentage)?
  6. How quickly do you need access, and what kinds of access?

Question one is based on your recovery window - how much work you can afford to lose and redo. If your business has mission-critical, time-sensitive data, you might need to back up hourly (or even more often). For instance, an online, high-volume commercial website might require live replication of data to a secondary data centre. A painting and decorating company, by contrast, might only need a backup once a week after their invoice run. This process rules certain types of backup in or out: the former is unlikely to use tape except as a tertiary backup, and the latter is unlikely to use replication.

Your recovery window is how much work you can afford to repeat. If you only key in a few invoices a day and send/receive a few emails, this can be quite long. If you have twenty staff bashing in orders every hour, it can be quite short!
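The link between backup frequency and the recovery window can be made concrete. If a failure strikes just before the next backup runs, you lose up to one full interval of work. The Python sketch below picks the least frequent schedule that keeps that worst case within what you can afford to repeat. The three candidate schedules are assumptions.

```python
# Candidate schedules and the hours between runs (illustrative options).
SCHEDULES = {"hourly": 1, "daily": 24, "weekly": 24 * 7}


def choose_schedule(tolerable_rework_hours):
    """Return the least frequent schedule whose worst-case loss fits the window.

    tolerable_rework_hours is elapsed (clock) time since the last backup,
    not working hours. Worst case, you lose one full interval, so the
    interval must not exceed what you can afford to redo.
    """
    fitting = {name: hours for name, hours in SCHEDULES.items()
               if hours <= tolerable_rework_hours}
    if not fitting:
        # Even hourly is too coarse: this is live-replication territory.
        return "live replication"
    return max(fitting, key=fitting.get)
```

For example, `choose_schedule(168)` gives `weekly` and `choose_schedule(30)` gives `daily`.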

Question two is about volume of data. Some simple sizing rules can be applied here. Most ordinary data compresses at around 2:1 on average. However, this doesn't apply to anything that is already compressed. That INCLUDES many image files (JPEG, compressed TIFF), music (AAC and MP3) and most movie formats. These won't compress further, so if your business is making video trailers, you shouldn't expect much compression. On the other hand, text data compresses better. Microsoft Access databases are notorious for being mostly empty space, and they sometimes compress ten-fold.
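Here is that sizing rule as a quick Python estimate. It assumes a flat 2:1 ratio for ordinary data and counts already-compressed formats at full size. The extension list is an assumption you should extend for your own file types, and real ratios vary a lot (as the Access example shows).

```python
import os

# File types that are already compressed, so tape/disk compression gains nothing.
# Illustrative list only; add your own formats.
ALREADY_COMPRESSED = {".jpg", ".jpeg", ".mp3", ".aac", ".m4a", ".mp4", ".zip"}


def estimate_backup_size(files, ratio=2.0):
    """files: iterable of (filename, size_in_bytes).

    Returns the estimated size after compression: already-compressed files
    count at full size, and everything else is divided by the assumed ratio.
    """
    total = 0.0
    for name, size in files:
        ext = os.path.splitext(name)[1].lower()
        total += size if ext in ALREADY_COMPRESSED else size / ratio
    return total
```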

Some simple guidelines for common low-end storage media.

(Storage sizes are expressed in GB; 1GB = 1024MB and 1MB = 1024KB. Typical files are around 50-100KB, although sizes vary widely and anything graphical tends to be bigger. 4-megapixel JPEG pictures are 300-500KB, music files are 5-15MB and 60 minutes of movie data is around 600MB. These all depend on the compression technology used to store them.)

CDs hold around 0.7GB (1.4GB compressed).
Single-layer DVDs hold around 4.5GB of data (9GB compressed). Dual-layer nearly doubles this.
USB sticks are now available in 32GB versions, with USB disks going well over a terabyte (1024GB).
DDS-5 (or DAT-72), the largest of the 'cheap' tape formats, holds 36GB (72GB compressed).
SDLT (mid-range cost tape) holds 160GB (320GB compressed).

You must allow for data growth. If you have 60-70GB to back up now, don't go for DDS-5 tapes, as you'll run out within a year or so (data growth is hard to estimate, but 10% per annum is not unusual). Otherwise, simply add up the total size of the data you need to back up and allow for how much of it is already compressed. That gives you your backup set size.
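The growth warning is easy to check with compound-growth arithmetic. This Python sketch assumes the 10% annual growth figure from the text. It treats your current figure as the backup set size and compares it against the media's compressed capacity. `years_until_full` is a hypothetical helper.

```python
def years_until_full(current_gb, capacity_gb, growth=0.10):
    """Return the number of years until the data outgrows the media.

    0 means the data already doesn't fit; 1 means it fits now but won't
    after one more year of growth; and so on.
    """
    if growth <= 0:
        raise ValueError("growth must be positive, or the data never outgrows the media")
    years, size = 0, current_gb
    while size <= capacity_gb:
        size *= 1 + growth  # one year of compound growth
        years += 1
    return years
```

Against DDS-5's 72GB compressed capacity, 70GB gives 1 and 60GB gives 2. In other words, you'd outgrow the tape within one to two years, in line with the warning above.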

Question three is about retention. You may need to refer back to data from 6 months ago that has since been overwritten (for instance, transactional database records as they stood at a point in time). If so, you need to plan for that storage. This is where traditional backup methods such as tape have an advantage: you can keep a tape for each month at very low cost. Online backup usually has a retention period of a month or two (although you can usually get longer at additional cost).

Most businesses tend to only need the most recent or possibly a week or two of history. However your individual requirements may differ.
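To see what 'a tape for each month' means in practice, here is a rough Python tape count for a simple rotation. A daily set is reused every week, a weekly set is reused every month, and one monthly tape is kept for each month of retention. This is a simplified grandfather-father-son scheme, a common rotation that the text doesn't prescribe. The four-daily/four-weekly defaults are assumptions.

```python
import math


def tapes_needed(retention_months, set_size_gb, tape_capacity_gb,
                 daily_tapes=4, weekly_tapes=4):
    """Rough tape count for a simplified grandfather-father-son rotation.

    daily_tapes: reused every week (e.g. Mon-Thu).
    weekly_tapes: reused every month (e.g. each Friday).
    retention_months: one monthly tape kept per month of history.
    If one backup set doesn't fit on a single tape, every slot needs more tapes.
    """
    tapes_per_set = math.ceil(set_size_gb / tape_capacity_gb)
    slots = daily_tapes + weekly_tapes + retention_months
    return slots * tapes_per_set
```

With six months' retention and a 50GB set on 72GB tapes, that's 14 tapes. Once the set outgrows a single tape, the count doubles.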

Question four is about when you can back up. Most systems allow some form of interactive backup (i.e. backup whilst data is being worked on), but this notoriously slows the systems down. Some systems can export incremental data, so you back up only the changes since the last backup of any kind. Others export differential data, which is everything that has changed since the last full backup. Both shrink your backup size dramatically. If you are running a 24x7 internet-facing business, you don't HAVE a backup window. You just have to accept that your systems will run slowly at some point, so pick a time when transactions are lower. Often you combine incremental (log) backups with a full backup in a maintenance/slow time window (especially true for databases).
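The difference between incremental and differential backups comes down to which cut-off time you compare against. Here is a minimal Python sketch. It assumes you have each file's last-modified time (datetimes or plain numbers both work), plus the times of your last full backup and your last backup of any kind. `files_to_back_up` is a hypothetical helper, not a real backup tool's API.

```python
def files_to_back_up(modified, last_full, last_backup, mode):
    """Select the files a backup run would copy.

    modified: dict of filename -> last-modified time (any comparable values).
    last_full: time of the last full backup.
    last_backup: time of the most recent backup of any kind.
    mode: "full", "incremental" (changed since the last backup of any kind)
          or "differential" (changed since the last full backup).
    """
    if mode == "full":
        return sorted(modified)
    if mode == "incremental":
        cutoff = last_backup
    elif mode == "differential":
        cutoff = last_full
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(name for name, mtime in modified.items() if mtime > cutoff)
```

The trade-off shows up at restore time. After a run of incrementals, you need the full backup plus every incremental since. With differentials, you need only the full backup plus the latest differential, at the cost of each differential growing through the week.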

9-5 businesses have a decent backup window (17:00 - 09:00!). Your backup needs to complete inside this window (sounds obvious, but it is worth checking!). You may be forced to do incremental/differential backups on weeknights and a full backup at the weekend.
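Checking that the job fits is simple arithmetic. In this Python sketch, the sustained throughput figure (the MB per second your drive or network actually achieves) is an assumption you'd measure yourself. The window defaults to the 17:00 - 09:00 example.

```python
def backup_hours(size_gb, throughput_mb_per_s):
    """Hours to write size_gb at a sustained throughput, using 1GB = 1024MB."""
    return size_gb * 1024 / throughput_mb_per_s / 3600


def fits_window(size_gb, throughput_mb_per_s, start_hour=17, end_hour=9):
    """True if the job finishes inside a window that may wrap past midnight."""
    window_hours = (end_hour - start_hour) % 24  # 17:00 -> 09:00 is 16 hours
    return backup_hours(size_gb, throughput_mb_per_s) <= window_hours
```

At an assumed 10MB/s, a full 72GB set takes about two hours, comfortably inside 16 hours. 500GB at 5MB/s would need over 28 hours, which would push you towards weeknight incrementals.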

Question five rules options out for you. If your data is volatile (i.e. changes a LOT every day), you may not be able to use systems like online backup, and incremental/differential backups might not work for you.

Typically, less than one percent of data changes per day.
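Combining that rule of thumb with your upload speed tells you whether online backup can keep up. This Python sketch assumes a daily change rate and a link speed in megabits per second (which is how connections are usually quoted). It ignores protocol overhead and any compression or de-duplication the backup software might do.

```python
def upload_hours(size_gb, upload_mbit_per_s):
    """Hours to send size_gb over a link quoted in megabits per second.

    1GB = 1024MB = 8192 megabits; protocol overhead is ignored.
    """
    return size_gb * 1024 * 8 / upload_mbit_per_s / 3600


def nightly_delta_fits(total_gb, daily_change_rate, upload_mbit_per_s, window_hours=16):
    """True if one day's changes can be uploaded within the backup window."""
    delta_gb = total_gb * daily_change_rate
    return upload_hours(delta_gb, upload_mbit_per_s) <= window_hours
```

Take 200GB of data changing 1% a day. The 2GB delta takes about four and a half hours on a 1 Mbit/s upload, which fits. At 10% change it would take about 45 hours, and the backup would never catch up.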

Question six depends on your recovery plans and on whether you need to do a lot of small-scale restores.

If some systems need frequent restores, this needs to be included in your backup strategy. Your disaster recovery plan and/or business continuity plan will tell you how quickly you need the backup. This is one of the major drawbacks of online backup: it's great having cheap online backup in China, but if you need your 2TB of data downloaded by tomorrow, you'd better have a fast internet connection!
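For a full restore, it's worth comparing download time with having the provider ship media. Here is a rough Python sketch. It assumes a link speed in megabits per second and default burn/courier/load times of 2 hours each, the same illustrative figures used in the online vs. onsite section. `fastest_full_restore` is a hypothetical helper.

```python
def download_hours(size_gb, link_mbit_per_s):
    """Hours to pull size_gb down a link quoted in megabits per second."""
    return size_gb * 1024 * 8 / link_mbit_per_s / 3600


def fastest_full_restore(size_gb, link_mbit_per_s,
                         burn_h=2.0, courier_h=2.0, load_h=2.0):
    """Return ("download" or "courier", hours), whichever route is quicker.

    The burn, courier and load times are assumptions; ask your provider.
    """
    by_download = download_hours(size_gb, link_mbit_per_s)
    by_courier = burn_h + courier_h + load_h
    if by_download <= by_courier:
        return "download", by_download
    return "courier", by_courier
```

For the 2TB (2048GB) example on an 8 Mbit/s line, downloading would take roughly 580 hours (over three weeks), so shipped media wins easily. A 5GB restore on the same line downloads in under an hour and a half.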

So now you've answered those questions, what kinds of backup are there?

Without investigating specific technologies, they can loosely be grouped into:
  1. Live backup. This is about replicating data to a standby or clustered system. It's VERY expensive, complex and hard to manage and maintain, but when your primary system fails, you have either instant or triggered failover to a standby system. This can be locally clustered or geographically dispersed. Windows includes some forms of live backup for free (e.g. Distributed File System (DFS)). Virtualisation supports excellent failover. Products such as SQL Server clustering, DoubleTake and the like allow dispersed systems to fail over successfully. Recovery windows are either instantaneous or very short.
  2. Replicated backup. This is where the data is replicated to another server, and it includes most online backup services. The difference from live backup is that there isn't necessarily a failover (there may be partial failover); you usually need to do something to get the data back. Database log shipping works this way: you back up the transaction logs and then 'ship' them to another server. Recovery windows for this kind of backup are usually short, but depend on the speed of copying the data back.
  3. Near-line backup. This is where the data is physically copied to some kind of removable device (e.g. tape, disk cartridge, USB stick, CD/DVD). Recovery windows here are longer, as tape is slower to transfer (plus you have to find it!). Even so, a restore can be a lot quicker than from online backup.

Note that you can (and sometimes should) mix the above. There is nothing wrong with having clustered servers that also have a tape backup. This gives you the retention you need (question three above) without using tons of disk space. Or you might back up your email from Microsoft Exchange using an online agent but back up your file data to tape, if one is more critical than the other.

Why?

Well, there are two primary reasons to back up.

The first is to recover in the case of a disaster (server failure, fire, flood, theft), so that you aren't put out of business if it happens. The statistic usually thrown around here is that 90% of businesses without backup or DR planning go out of business within a year of a major incident.

The second reason is so you can go BACK in time to recover something important (e.g. if you changed your sales forecast and want to compare it to last week's version).

There are additional reasons. You may be required to back up by legislation, or your customers and/or suppliers may require you to have an active DR plan, of which backup forms a key part.

The online vs. onsite argument?

Both sides of the coin will always argue in favour of their solution, but cutting through the sales pitch reveals some key truths:
  1. Online backup WORKS. It is generally more reliable than tape backup, and nobody has to change tapes. The human element of tape backup can be unreliable, tape drives can fail, and tapes and drives get dirty.

  2. Online backup depends on you having reasonable internet connectivity. More importantly, it depends on the 'delta' (the change in data each night) being transmittable during your backup window. If it isn't, the backup falls further out of sync each night and won't work. Most of the time, the company internet connection is fast enough to achieve this (as companies with more data tend to have faster connections!).

  3. Online backup supports excellent small-scale recovery. If you need one file, one email etc., recovery is often simpler than with onsite backup (as you don't need to find the right tape etc.). Onsite backup can also support small-scale recovery, as most common backup tools can create both a tape and a disk backup. Otherwise, you have to hunt for your tapes.

  4. Online backup doesn't necessarily support full-scale recovery. If you need to recover ALL your data, you may need to have it burnt onto disk/media and couriered over. Downloading it may take some time, which you need to plan for in your DR plan. Onsite backup is easier for a full recovery. Remember to factor media burn times into this. Your online backup company may take 2 hours to burn the media plus 2 hours to courier it over, followed by 2 hours to load it onto your systems. That means you've lost most of a working day, whereas near-line backup means you start off 4 hours ahead.

  5. Online backup tends to cost a regular fee rather than a capital investment, usually charged per GB per month (sometimes with additional cost per service you back up, e.g. database, email). Onsite backup tends to have up-front capital infrastructure costs but lower ongoing costs. If you are starting out, online is cheaper. If you already have tape, it's cheaper to stay with tape. However, cost isn't the only consideration!
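The cost comparison is a break-even calculation. Online has no up-front cost but a higher monthly bill; onsite has a capital cost but lower running costs. The Python sketch below assumes simple flat prices with no discounting. Every figure you pass in is your own estimate, and the example numbers are hypothetical.

```python
import math


def breakeven_month(capital_cost, onsite_monthly, online_monthly_per_gb, data_gb):
    """First whole month at which cumulative onsite cost <= cumulative online cost.

    Onsite after m months: capital_cost + onsite_monthly * m.
    Online after m months: online_monthly_per_gb * data_gb * m.
    Returns None if online is never more expensive per month (no break-even).
    """
    online_monthly = online_monthly_per_gb * data_gb
    monthly_saving = online_monthly - onsite_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(capital_cost / monthly_saving)
```

Take a hypothetical £1,500 tape drive with £20 a month of tapes and upkeep, against 100GB online at £0.50 per GB per month (£50). That breaks even after 50 months. If you already own the drive, pass 0 as the capital cost and tape is cheaper from the start, which is the 'stay with tape' point above.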

Some golden rules:
  1. Tapes wear out. Replace them every 12-18 months under normal use.
  2. TEST the backup! Restore stuff regularly and make sure it works.
  3. MONITOR the backup. Don't assume it works; check the log files/emails/whatever.
  4. Give someone ownership of the backup - don't assume someone will do it
  5. Data recovery is expensive; backup is cheap in comparison.
  6. Business continuity plans are for everyone, not just massive blue chips. It's all about planning ahead and thinking what would happen.
  7. Untested disaster recovery plans aren't worth the paper they are printed on. Only by testing them do you find out where they don't work, so you can fix them.
  8. It isn't always expensive to create a DR/BC plan.
  9. DR/BC isn't just an IT issue - it's an HR issue, it's a logistics issue, it's a financial issue - every department should be involved in some way.
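Testing a restore can be partly automated. This Python sketch assumes you've restored a folder to a scratch location. It compares that folder file by file against the live copy using SHA-256 hashes. Files that changed legitimately since the backup will also show up. So either run it against data that hasn't been touched since, or treat differences as things to check rather than failures. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1MB chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Compare every file under original_dir with its restored counterpart.

    Returns relative paths that are missing or differ; an empty list is a pass.
    """
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for src in sorted(p for p in original_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(original_dir)
        dst = restored_dir / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(str(rel))
    return problems
```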

If you want to receive updates on my blog posts or contact me, please check out my Ecademy profile or follow me on Twitter.
