-- Just a very short intro showing backup and restore commands, and some essential theory. -- Usable for SQL 2005 / 2008 -- By Albert van der Sel -- v. 0.2 - 17/09/2010 This small note will discuss some basics of Backup and Recovery. We will discuss some theory, and we will show some representative TSQL commands. It's important to know that it thus only deals on the very basics of Full, Differential, and Transaction log backups and recoveries. So, it does not discuss "modern" sophisticated techniques. Ofcourse, backup and restore actions can also be performed from the graphical console (SSMS), but that's not the subject of this note. Also, in larger environments, probably a third-party tool might be in use. But, in this case too, the "basic" theory will not change. -- ------------------------------------------------------------------------------- -- 1. What is the Transaction Log of a SQL Server database? -- ------------------------------------------------------------------------------- Changes in, for example, tables, should be "written" into the "true" database files ofcourse. But, before that happens, there is a sort of "write ahead" log, called the "transaction log" which will records those changes *first*. So, the first (diskbased) entity that will records changes at all, is the transaction log. Slightly later, changes will ofcourse be written to the pure database files. Note 1: When you would take a look at the files on the filesystems, the "true" database files would (typically) end with an ".mdf" or ".ndf" extension, while the transaction log files would (typically) have an ".ldf" extension. You might consider any change to the database as a sort of "change vector" or "delta", that is stored in the transaction log first, before it gets applied to the database. Why is that considered to be neccessary? It's for safety and consistency reasons. If you first put changes in a log first, and there happens a power interrupt, or Server crash, then at a new startup, changes can still be applied to the database (they are in the log). If you would not have such a "intermediate" entity, changes would be applied directly to the database files. Those changes could then be "half finished" for example. If you have them recorded in a log first, you can commit finished transactions, and rollback incomplete transactions. This methodology is quite "universal". For example, many good filesystems use "journaling", meaning that changes are placed in as sort log, just before they really get committed. (ofcourse it goes so fast, that we hardly notice it). So, what will happen to the transaction log? Will it grow indefinitely, since it spares up all changes? Yes and No. Or I should say: yes or no. It depends on the "Recovery Model" and/or how often you backup the log. This might all sound "strange", but hang on. It's really not so hard. You can place a database in a certain "Recovery Model", which is "Simple", "Full" and "Bulk Logged". (Note: we are not discussing "Bulk Logged" here, since it does not contribute anything important in such a short note as this). In the SQL Server Management Console (SSMS), if you rightclick your database, and choose "properties", you will find the current Recovery Model of your database. -- ------------------------------------------------------------------------------- 2. Types of "Recovery Models": -- ------------------------------------------------------------------------------- >> Simple means: As usual, the transaction log, will store changes (transactions) first, before they get applied to the database. But, at the socalled "checkpoint" process, which just happens regularly, the transaction log get's clipped. Actually, this means that all transactions that *are for sure already comitted to the database*, gets removed from the transaction log. That effect will not counteract the purpose of this write ahead log, since only stuff gets removed from the transaction log, which was already applied to the database. But you can imagine, that your transaction log will not grow fast, because it get's "trucated" at checkpoint. >> Full means: As usual, the transaction log, will store changes first, before they get applied to the database. But, at the socalled "checkpoint" process, which just happens regularly, the transaction DOES NOT get clipped. So, will it then grow indefinitely? Yes. But it get's clipped if you *backup* the transaction log (to a disk, or tape). Is that logical? Yes, since you have backupped the log, directly after that, it may be emptied of all comitted transactions. * >> Then, what is the purpose of all this.... ? (1): In a "graphical" way, visualize the log like this: | ---- | ---- | ---- - = historical transactions (already applied to the database) V ---- | ^ check marker 1 time ---- | ---- | ---- historical transactions V ---- | ^ check marker 2 | **** | **** * = New transactions (not already applied to the database) V **** So, what might get clipped (with no harm done) is everything above check marker 2. So, if you use the "Simple" model, then indeed (at the regular checkpoint), the log get's clipped. If you use the "Full" model, then the log will not be clipped. It will continue to grow. (2): If you make a Full backup of a database, you ofcourse backup everything of a database to a file (or files), or tape (or tapes). Now, you will not make full backups "all the time, all day long". If you could make backups of the changes only, since the last full backup, that would be ideal. So, then you would (for example) make a full backup at night, and during the day, you would only make backups of the delta's (or the changes) since the last full backup. This is exactly what happens if you would backup the transaction log. -> Now, this strategy is NOT possible using the Simple recovery model, since the log gets clipped at every checkpoint. And that happes quite often -> But it is possible if the database uses the Full recovery model. -- ------------------------------------------------------------------------------- 3. Types of Backup and Restore policies: -- ------------------------------------------------------------------------------- 3.1 Full backups only: ---------------------- Suppose you only would make full backups. For example, you could do that once a day, at 01.00h AM. Then it does not matter at all in which mode (Simple or Full) your database is using. You can make full backups (which means you backup *everything* = all data) of the database, and it does not matter at all in which mode your database is running. Mo Tue Wed Thu 0100h 0100h 0100h 0100h crash at Thu 1600h -|-------------------|-------------------|------------------|-----------X----> Full B Full B Full B Full B Suppose there is a crash a Thursday 16.00h. Then your only option is to restore the backup of Thursday 01.00h. Thus you would have all data up to Thursday 01.00h. Thus you would miss all data that was entered in the database on Thursday up to 16.00h 3.2 Full backup + Transaction log backups: ------------------------------------------ In this case, your database must use the "Full recovery" mode, since in the Simple mode, the transaction log gets clipped (or a better term is "truncated") at each checkpoint. So, for example once a day, at 01.00h AM, you would make a Full Backup. Then, say once per 4 hours, you would make a Transaction log backup, and therebay saving only the changes since the last backup. It's very important to realize, that a Transaction log backup, has only the changes since the former backup (full or Transaction log). In a figure, it "looks" like this: Wed 0100h 0500h 0900h 1300h 1700h crash at 17.15h -----------------------------------------------------------------X---- Full B Tlog B Tlog B Tlog B Tlog B has all has all has all has all changes changes changes changes in the in the in the in the period period period period 0100-0500 0500-0900 0900-1300 1300-1700 In such scenario's a transaction log backup, are always the delta's with respect to the former backup, whether that former backup is a Full, or a Transaction log backup. So, suppose you have a sudden crash at 17.15h which destroys your database. Then you could proceed as follows: - Restore the Full backup of 0100h - Restore the Transaction Log backup of 0500h - Restore the Transaction Log backup of 0900h - Restore the Transaction Log backup of 1300h - Restore the Transaction Log backup of 1700h Other scenario's are possible too. You can restore the Full, and any number of followinf Transaction log backups. Ofcourse, you normally would restore all transaction log backups, in order to get as close as possible to the state of the database before the crash. But in principle, this would be possible too: - Restore the Full backup of 0100h - Restore the Transaction Log backup of 0500h - Restore the Transaction Log backup of 0900h - Restore the Transaction Log backup of 1300h In that case, you have recovered the database up to 13.00h (and missing all data that was altered between 13.00-17.15) Note 2: in the upper scenario we assumed that the database was fully destroyed at 17.15h. If it was not fully destroyed, in some cases, it might have been possible to backup "the last tail" of the transaction log, meaning the changes that occured in 17.00h - 17.15h. In this case, you would have "zero dataloss". So, you might think: this is the policy to go for ! Always place a database in Full Recovery mode, and make scheduled Full backups with scheduled Transaction log backups ! Only, in some exceptional cases, with some nasty applications, it's not feasable to put the database in Full recovery mode. Ofcourse it *should* always be your goal to have full and transaction log backups, especially for production systems. But.., some applications are so "nasty" (or they have a batch which is "nasty"), that the transaction log grows so fast, that you run into space problems. In such a case, people might just vote for the Simple recovery model, so that the transaction log gets clipped (truncated) at every checkpoint. Now, you might think that in this case, Full backups are your only option. Fortunately, it's not. Independent of whatever mode your database is using, you can always create "Differential" backups too ! 3.3 Full backup + Differential backups: --------------------------------------- What we have seen in section 3.2, is probably the best policy for production systems. But in some cases, if the Transaction log grows too fast, we need another policy. A Differential backup, will always store the changes that have occurred since the last Full backup. Contrary to a Transaction log backup, which only has the changes since the former backup (full or transaction), a differential has always the changes since the full backup. So, if you make several differentials per day, the differential backups will grow bigger and bigger. Usually, it is a managable situation. Only, when there are extremely many changes in your database, you might have a situation that the differential backups gets (unmanagebly) large too. But that's a bit extreme. So, usually, this policy is a very good alternative for the Full + transaction log backups. How does SQL Server knows what to backup in a differential backup, after the full backup? It uses certain pages, in which bitmaps are present, representing datapages. After the Full backup is done, and during the following time that datapages are changed, the bitmap in those special pages will alter correspondingly. So if data page N is changed (due to an insert, or update, or delete statement), the registration in the bitmap page is changed as well. This is how SQL server knows, which data pages should be placed in a differential backup. This also explains why a later differential backup is larger than the former differential backup. In a picture, it "looks" lik this. Suppose we make a full backup at 01.00h AM. After that, some differential backups are scheduled. 0100h 0500h 0900h 1300h 1700h crash at 17.15h -----------------------------------------------------------------X---- Full B Diff B Diff B Diff B Diff B has all has all has all has all changes changes changes changes in the in the in the in the period period period period 0100-0500 0100-0900 0100-1300 0100-1700 So, suppose there is a crash at 17.15h. Then you would: - Restore the Full backup of 01.00h - Only restore the Differential backup of 17.00h 3.4 Restore with "WITH RECOVERY" or "WITH NORECOVERY": ------------------------------------------------------ If you would do a restore job with the "WITH RECOVERY" clause, you tell SQL Server to restore a backup, and when that succeeds, make the database ready for use. That is, as soon as the restore finishes, it will be opened for use. If you would do a restore job "WITH NO RECOVERY", you tell SQL Server to restore a backup and that you have the intention to restore additional backups. So, if you would only make Full backups (like discussed in section 3.1), then when you need to restore the database, you would use the "WITH RECOVERY" clause. But if you have a Full backup with additional Transaction Log backups, then you would restore like this: - Restore the FULL backup "WITH NO RECOVERY" - Restore all Later Transaction log backups (except the last one) "WITH NO RECOVERY" - Only the Last Transaction log backup should be restored "WITH RECOVERY" So, if you restore "N" backups, all restores will be done with "WITH NO RECOVERY", except the last one, which uses the "WITH RECOVERY" clause. -- ------------------------------------------------------------------------------- 4. Example Backup and Restore commands: -- ------------------------------------------------------------------------------- It took some time before we came to real examples of backup and restore commands, but the theory in sections 1-3, is really needed to understand what we are doing. Not suprisingly, the TSQL backup and restore commands are "BACKUP" and "RESTORE". Here, we will illustrate the commands using a few examples. 4.1 BACKUP command: ------------------- Suppose we have the database TEST. let's take a quick look to a typical command to create a full backup. Suppose we just want to backup to a diskfile. It is as simple as: BACKUP DATABASE TEST TO DISK='d:\backups\test_full.dmp' WITH INIT The "WITH INIT" clause means that the backup device (here the file d:\backups\test.dmp) will be initialized before the backup, and that means that the former content (if present) will be overwritten (instead that the backup will be appended to the file). If you use "WITH NOINIT", your new backup will be appended to any existing backup in the file. It's probably best not to append to a backupfile, so usually one would use the WITH INIT clause. Here are typical commands if you would want to make a FULL, DIFFERENTIAL and Transaction log backup: --> Full (can be done in any Recovery mode): BACKUP DATABASE TEST TO DISK='d:\backups\test_full.dmp' WITH INIT --> Differential (can be done in any Recovery mode): BACKUP DATABASE TEST TO DISK='d:\backups\test_diff1.dmp' WITH DIFFERENTIAL, INIT --> Transaction Log backup (this can only be done if the database uses the "Full Recovery model") BACKUP LOG TEST TO DISK='d:\backups\test_log1.dmp' WITH INIT Actually, this all there is to it. 4.2 Example using Full and Transaction log backups: --------------------------------------------------- Suppose we want to implement the backup policy as discussed in section 3.2. Here, we have the following schedule: Day at 0100h 0500h 0900h 1300h 1700h h ----------------------------------------------------------------- Full B Tlog B Tlog B Tlog B Tlog B has all has all has all has all changes changes changes changes in the in the in the in the period period period period 0100-0500 0500-0900 0900-1300 1300-1700 Then we could schedule the following commands: BACKUP DATABASE TEST TO DISK='d:\backups\test_full.dmp' WITH INIT -- at 01:00h BACKUP LOG TEST TO DISK='d:\backups\test_log_0500.dmp' WITH INIT -- at 05:00h BACKUP LOG TEST TO DISK='d:\backups\test_log_0900.dmp' WITH INIT -- at 09:00h BACKUP LOG TEST TO DISK='d:\backups\test_log_1300.dmp' WITH INIT -- at 13:00h BACKUP LOG TEST TO DISK='d:\backups\test_log_1700.dmp' WITH INIT -- at 17:00h (Note: database TEST needs to use the Full recovery model) 4.3 Example using Full and Differential backups: ------------------------------------------------ Suppose we want to implement the backup policy as discussed in section 3.3. Here, we have the following schedule: 0100h 0500h 0900h 1300h 1700h ------------------------------------------------------------------ Full B Diff B Diff B Diff B Diff B has all has all has all has all changes changes changes changes in the in the in the in the period period period period 0100-0500 0100-0900 0100-1300 0100-1700 Then we could schedule the following commands: BACKUP DATABASE TEST TO DISK='d:\backups\test_full.dmp' WITH INIT -- at 01:00h BACKUP DATABASE TEST TO DISK='d:\backups\test_diff_0500.dmp' WITH DIFFERENTIAL, INIT -- at 05:00h BACKUP DATABASE TEST TO DISK='d:\backups\test_diff_0900.dmp' WITH DIFFERENTIAL, INIT -- at 09:00h BACKUP DATABASE TEST TO DISK='d:\backups\test_diff_1300.dmp' WITH DIFFERENTIAL, INIT -- at 13:00h BACKUP DATABASE TEST TO DISK='d:\backups\test_diff_1700.dmp' WITH DIFFERENTIAL, INIT -- at 17:00h (Note: database TEST can be in any Recovery model) Some other remarks with respect to the backup commands: ------------------------------------------------------- --> Using Multipe dumpdevices: The larger your database becomes, it might be of help that you can use muliple diskdevices at the same time. This means that the throughput will be larger, and the backupduration will be shorter. Take a look at the following example: BACKUP DATABASE TEST TO DISK='d:\backups\test.dmp', DISK='f:\backups\test.dmp' WITH INIT Note the use of different filesystems (d: and f:) in the example above. --> Example script with some extra's Although the above backup commands are really sufficient to ensure decent backups, you could enhance it a bit using a script. Maybe you just want to add the date to the filename of the backup. Then you should create a script like below, and put it in a SQL Server Agent job. DECLARE @DATE DATETIME DECLARE @BACKUP_DATE VARCHAR(128) DECLARE @FILE VARCHAR(128) DECLARE @FULLFILE VARCHAR(128) DECLARE @EXTENSION VARCHAR(128) SELECT @DATE=GETDATE() SELECT @BACKUP_DATE=CONVERT(VARCHAR(10),@DATE,20) SELECT @FILE='d:\backups\test' SELECT @EXTENSION='.dmp' SELECT @FULLFILE=@FILE+@BACKUP_DATE+@EXTENSION BACKUP DATABASE TEST TO DISK=@FULLFILE WITH INIT GO Just use your creativity to enhance your scripts. 4.4 RESTORE command: -------------------- The backups you created, can be restored using the RESTORE commands. You just use the RESTORE command to restore FULL, Differential and Transaction Log backups. Here is a quick example. Suppose you only need to restore a Full backup of the test database, then you could use: RESTORE DATABASE TEST FROM DISK='d:\backups\test_full.dmp' WITH REPLACE, RECOVERY Because here we only want to restore the Full backup, because we do not have additional Transaction or Differential backups. In such a case, we use the "WITH RECOVERY" clause. Usually, when you restore a database, you want to "overwrite" the original files. You just want to replace the current database with your backup. So, this is why you usually will use the "WITH REPLACE" clause. If both clauses are used at the same time, it is shortened to "WITH REPLACE, RECOVERY". 4.5 Example restore of Full and Transaction log backups: -------------------------------------------------------- Suppose we have implemented the backup policy as discussed in section 4.2. Now, suppose a crash occurred at 17.15 h Before the crash, we had created the following backups BACKUP DATABASE TEST TO DISK='d:\backups\test_full.dmp' WITH INIT -- at 01:00h BACKUP LOG TEST TO DISK='d:\backups\test_log_0500.dmp' WITH INIT -- at 05:00h BACKUP LOG TEST TO DISK='d:\backups\test_log_0900.dmp' WITH INIT -- at 09:00h BACKUP LOG TEST TO DISK='d:\backups\test_log_1300.dmp' WITH INIT -- at 13:00h BACKUP LOG TEST TO DISK='d:\backups\test_log_1700.dmp' WITH INIT -- at 17:00h So, now we need to to restore 5 backups, to restore the database as to it was at 17:00h. Here we go: RESTORE DATABASE TEST FROM DISK='d:\backups\test_full.dmp' WITH REPLACE, NORECOVERY RESTORE LOG TEST FROM DISK='d:\backups\test_log_0500.dmp' WITH NORECOVERY RESTORE LOG TEST FROM DISK='d:\backups\test_log_0900.dmp' WITH NORECOVERY RESTORE LOG TEST FROM DISK='d:\backups\test_log_1300.dmp' WITH NORECOVERY RESTORE LOG TEST FROM DISK='d:\backups\test_log_1700.dmp' WITH RECOVERY The REPLACE clause is only neccessary with the first restore, because that is where you create new files due to the full backup restore. The Transaction log restore do not have the REPLACE clause, because there is nothing to replace: it's only applying transactions to a database. Usually, you will apply all backups you have, to recover as close as possible before the crash. But if you want, for whatever reason, you might want to recover the database to as it was at 13.00h. In that case, you would use this: RESTORE DATABASE TEST FROM DISK='d:\backups\test_full.dmp' WITH REPLACE, NORECOVERY RESTORE LOG TEST FROM DISK='d:\backups\test_log_0500.dmp' WITH NORECOVERY RESTORE LOG TEST FROM DISK='d:\backups\test_log_0900.dmp' WITH NORECOVERY RESTORE LOG TEST FROM DISK='d:\backups\test_log_1300.dmp' WITH RECOVERY Now, to conclude this small note, let's show you how to restore a database to different location. That is, the filesystems and path to where the database must be restored to, is different from the original filesystems and/or paths. Please see the following example. In this RESTORE statement, the MOVE option tells SQL Server to where the files (which information is in the backup) need to get restored. RESTORE DATABASE TEST FROM DISK='L:\temp\test_full.bak' WITH RECOVERY, MOVE 'test_data_1' TO 'D:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\test_data_1.mdf', MOVE 'test_index_1' TO 'D:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\test_index_1.ndf', MOVE 'test_data_2' TO 'E:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\test_data_2.ndf', MOVE 'test_Log' TO 'F:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\test_Log.ldf', REPLACE One problem often seen here, is that you do not know the logical names, like for example the name 'test_data_1'. For an existing database, the logical and physical filenames are easy to find from the sysfiles view, like using the query "select * from sysfiles". Also, to retrieve logical and physical names from the backupfile itself, you can use the RESTORE FILELISTONLY command. So, suppose you have a backupfile like d:\backups\test_full.bak, then you can retrieve the names using: RESTORE FILELISTONLY from disk='d:\backups\test_full.bak' That's it. Hope it was of use.