31 Days of Disaster Recovery

Yüklə 211,03 Kb.

səhifə	15/15
tarix	16.08.2018
ölçüsü	211,03 Kb.
	#63138

1 ... 7 8 9 10 11 12 13 14 15

Summary

The reason I like to talk about talking about things like piecemeal restores, is that it gives us really good insight into how things work. Understanding how things work and why they work is key to figuring out how to do things that aren’t spelled out for you in a manual somewhere. It enables thinking out of the box and doing things that you might otherwise assume is undo-able.

You can download the code for this demo here: Demo_PiecemealRestoreRestore.zip (2 KB)

Day 28: Recovering SQL if the Tempdb Drive Dies

Welcome to day 28 of my series 31 Days of Disaster Recovery. Today I want to talk about recovering SQL Server if your tempdb drive suddenly dies. SQL Server won’t start if it can’t start up tempdb, so it poses a challenge if you can’t get the drive up and running again.

Recovering Tempdb

So let’s say that you’re running queries on your SQL Server one day and suddenly the query fails saying there was a transport error. You run it again, and the query window is not able to connect to the server again. You check the SQL services and you see that SQL isn’t running. Obviously, you try to start SQL running again via SQL Server Configuration Manager, and it shows that SQL Server started. You try your query again and get the same result. Next, you find the SQL error log and open it in Notepad, and you see the following errors:

Clearing tempdb database.

Error: 5123, Severity: 16, State: 1.

CREATE FILE encountered operating system error 3(The system cannot find the path specified.) while attempting to open or create the physical file 'E:\tempdbtempdb.mdf'.

Error: 17204, Severity: 16, State: 1.

FCB::Open failed: Could not open file E:\tempdbtempdb.mdf for file number 1. OS error: 3(The system cannot find the path specified.).

Error: 5120, Severity: 16, State: 101.

Unable to open the physical file "E:\tempdbtempdb.mdf". Operating system error 3: "3(The system cannot find the path specified.)".

Error: 1802, Severity: 16, State: 4.

CREATE DATABASE failed. Some file names listed could not be created. Check related errors.

Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive and then restart SQL Server. Check for additional errors in the event log that may indicate why the tempdb files could not be initialized.

SQL can’t create or open the tempdb files. The errors state that it can’t find the path specified. So you look at disk manager, and the drive is gone. It’s dead. Unrecoverable. Well, we’re not worried about losing anything in tempdb, we just want to get the server up and running again. One option is to bring another drive online with the same name as the old one and let tempdb recreate its files at startup. Let’s assume that’s not an option for you at this time and you’re only recourse is to change where tempdb is looking for its files. How do you do that if you can’t start it to change the file locations of SQL Server?

The solution is to start SQL Server in minimal configuration with trace flag 3608. Trace flag 3608 skips startup of all databases except master. Then connect via SQLCMD and make the changes to the tempdb file paths. Open a command window and run the following command to start SQL Server. In this example, I am starting a named instance named SQL13 so the server name format is MSSQL%. The other options are /f to start in minimal configuration mode and /t3608 to start with trace flag 3608 and not startup any database other than master.

net start mssql$sql13 /f /t3608

Then I connect via SQLCMD and modify the file paths. Since I start SQL in minimal configuration mode, I will need to connect via the Dedicated Admin Connection (DAC) (-A option). Just make sure you created the required directory structure for the tempdb files. The two commands I will run (1 for each tempdb file) are:

sqlcmd -A -S SQLLaptopSQL13 -q"Alter Database tempdb Modify File (Name = 'tempdev', FileName = 'G:\tempdbtempdb.mdf');"

sqlcmd -A -S SQLLaptopSQL13 -q"Alter Database tempdb Modify File (Name = 'templog', FileName = 'G:\tempdbtemplog.ldf');"

Then just restart the service and you’re good to go.

Day 29: Using Database Snapshots to Restore Replicated Databases in Test

For day 29 of my 31 Days of Disaster Recover series, I want to talk about restoring replicated databases from database snapshots. Someone asked me recently if I had any recommendations for speeding up their unit testing on replicated databases. He has to run unit tests over and over on replicated databases to ensure that they deploy successfully in a replicated environment. His current process was adding too much time to the process and automating it was difficult for him because he doesn’t know replication to a great depth. My recommendation was to use database snapshots.

Restoring Replicated Databases

My friend’s current process for resetting his test replication environment was to drop replication, restore the publisher and subscriber, and then re-setup replication from scratch. This was time consuming for him, and he didn’t have the replication knowledge to automate the process. He had done some research and was considering using the sync with backup option for the distributor and publisher databases so that he could just restore them and they would work. He would still have to restore the subscription database or reinitialize the subscriber to get replication to work

I made the alternate suggestion of using database snapshots instead of regular backups for the restore of the replication databases. Since this is in a non-production environment, they are using Developer Edition even if they are using Standard Edition in productions. The trick is to create the snapshot of the publisher, distributor, and subscriber at the same time while there is no activity occurring and replication is in sync. Then when you need to reset the test environment, simply revert (restore) all 3 databases back to the snapshots at the same time. As far as replication knows, nothing has happened and everything is perfectly in sync. This process requires no replication knowledge to automate as it’s simply creating 3 database snapshots and then reverting the databases. And of course, both the snapshot creation and restore are sub-second processes adding less than a second to the turn-around of the test databases.

I sent him some sample code of how to do it:

Use master;

-- Create database snapshots of all 3 replication databases

-- distributor:

Create Database distribution_snap

On (Name = 'distribution',

FileName = 'D:\Program FilesMicrosoft SQL ServerMSSQL11.SQL12MSSQLDatadistribution.ndf')

As Snapshot of distribution;

Go
-- Publisher

Create Database Publisher_snap

On (Name = 'Publisher',

FileName = 'D:\Program FilesMicrosoft SQL ServerMSSQL11.SQL12MSSQLDataPublisher.ndf')

As Snapshot of Publisher;

Go
-- Subscriber

Create Database Subscriber_snap

On (Name = 'Subscriber',

FileName = 'D:\Program FilesMicrosoft SQL ServerMSSQL11.SQL12MSSQLDataSubscriber.ndf')

As Snapshot of Subscriber;

Go
-- Restore databases as revert to database snapshots

-- Need to ensure that no connections exist in databases

-- distributor:

Use distribution;

Alter Database distribution Set Single_User With Rollback Immediate;
Use master;
Restore Database distribution

From Database_Snapshot = 'distribution_snap';

Go
-- Publisher

Use Publisher;

Alter Database Publisher Set Single_User With Rollback Immediate;
Use master;
Restore Database Publisher

From Database_Snapshot = 'Publisher_snap';

Go
-- Subscriber

Use Subscriber;

Alter Database Subscriber Set Single_User With Rollback Immediate;
Use master;
Restore Database Subscriber

From Database_Snapshot = 'Subscriber_snap';

It is important that the code for the revert to the database snapshot is run as a single batch. Don’t run it command by command. If you pause long enough between the point where you switch to master database after changing the database to single-user, you give time for other processes to get the single connection to the database, and the restore will be blocked. Just run the whole thing as a single batch.

Day 30: Using Partial Availability and Initialize from Backup to Replicate a Partial Database

It’s been a tough and long road to 31 Days of Disaster Recovery. It’s been very difficult coming up with quality topic ideas for the series as we near the end. For day 30 of the series, I am combining a post on performing piecemeal restores with a post on filegroups for T-SQL Tuesday #40 and a post on replication.

In case you’re not familiar with T-SQL Tuesday, let me enlighten you. This blog entry is participating in T-SQL Tuesday #40, hosted this month by the Midnight DBA Jen McCown (Blog|@MidnightDBA|@JenniferMcCown). You are invited to visit his blog to join the blog party. You are welcome to write your own participating blog post for the party or just to read more blogs participating in this month’s theme: File and Filegroup Wisdom.

Replicating a Partial Database

Let’s assume a scenario where you have a very large database (VLDB) and you want to replicate only part of the database, but you don’t want to use a snapshot to initialize the subscription and you don’t want to copy the whole database backup to the subscriber and restore it. For example, if you have a 5 TB database with mostly historical, unchanging data and you only want to replicate a small percentage of the database. For this example, let’s assume that 3 TB of data is old data and 2 TB is current data and you only want to replicate the current data. How would you go about doing it in the fastest, easiest method?

Initialize from snapshot? Snapshots are slow and performance impacting. There is limited support for compression in snapshots, but it uses Windows CAB compression that can be CPU intensive and has limitations such as a 2 GB file limit. Maybe your production database can take that sort of performance hit, maybe not.

Initialize from backup? The trouble with this option is the documented way to do this is to back up the entire database, copy the entire database over, and restore the entire database. For 5 TB of data, this can take a really long time. Additionally, the entire database gets restored and if you don’t want the extra 3 TB of data in the database, you have to delete it yourself. Even more time and trouble to set up, not to mention the performance hit on the subscriber.

From Books Online:

A backup contains an entire database; therefore each subscription database will contain a complete copy of the publication database when it is initialized.
…
It is the responsibility of the administrator or application to remove any unwanted objects or data after the backup has been restored.

But there’s a better way. You can take the one-time hit of separating your tables to different filegroups (assuming they’re not already that way, and they’re probably not) in order to be able to set up replication in a much quicker and simpler way. Once you have the files segregated, you can use partial database availability to restore only the filegroups that contain the tables you want to replicate.

One of the caveats for doing this is that you have to backup the primary filegroup with any other filegroups you want to restore. The primary filegroup always has to be restored as it contains all of the system objects that define everything in the database, and the primary filegroups and any other filegroups must be based on the same recovery point, so they must be backed up together. You can perform a filegroup backup of filegroups by name or just the read/write filegroups which include all filegroups not marked as read-only. On the other hand, the restore can actually be performed from a full backup by specifying which filegroups to restore.

Any filegroups not restored on the subscriber will still be listed as part of the database, but the files in those filegroups will be listed as RECOVERY_PENDING. Any attempts to query tables in those filegroups will return an error stating that the file is not online. Then it’s a matter of setting up replication subscription to initialize from backup.

You may be wondering, “Any downsides to this process?“. Well, partial database availability is an Enterprise Only feature. So, yeah, you must be on Enterprise Edition (or Enterprise or Datacenter Edition if on SQL 2008 R2).

Steps to Set Up and Demo Code

I’ve done presentations on this process and written a whitepaper on it. I’m not going to repeat all of the details that I’ve presented elsewhere. I recommend downloading my whitepaper Initializing Replication from Backup and my slide deck and demo code from SQL Saturday #107 in Houston, TX.

Whitepaper: Initializing Replication from Backup
Replication Magic: Initializing from Backup: ReplicationMagic.zip (5.04 MB)

Day 31: Backup and Restore of the Resource Database

It has been a long journey to the final day my 31 Days of Disaster Recovery series, but we have finally reached the final post, day 31. My final topic for the series is born out of a conversation I had today with my good friend and fellow DBA, Ed Watson (blog|@SQLGator). Today, I want to talk about backing up and restoring the resource database.

The resource database stores critical system objects safely separated away from the master database. It is critical in operation of the system and the master database is fairly useless without it. If you lose the resource database, the system cannot start the master database and SQL Server will not being able to start. So we need to back it up and be prepared to restore it if need be.

Back Up the Resource Database

SQL Server service packs (SPs), cumulative updates (CUs), hotfixes, etc., may or may not upgrade the resource database to a new version. It is highly critical that we back up the resource database after installing an SP, CU, hotfix, etc. to ensure that we have a current version. In some scenarios, you may be able to restore a slightly older version of the resource database and then reapply all patches and updates to bring it current. In some cases, you may not. The safest bet is to always make sure you have a current copy of the database.

If you thought the resource database was backed up with the master database, it is not. You must back it up separately. You can’t, however, back it up using native SQL Server backup. My recommended process is to simply copy the resource database files to the backup directory using simple file copy (robocopy, xcopy, copy-item, etc). Then I am sure that whatever process saves my backups to tape or alternative storage is saving backups of the resource database as well.

Below are a couple examples of how I back up the resource database as part of an automated job. Each example copies the files to the backup folder and adds a data string in the format _YYYYMMDD to the end of the filename.

Windows script (DOS):

for /f "tokens=1* delims= " %%a in ('date/t') do set cdate=%%b

for /f "tokens=1,2 eol=/ delims=/ " %%a in ('date/t') do set mm=%%b

for /f "tokens=1,2 delims=/ eol=/" %%a in ('echo %cdate%') do set dd=%%b

for /f "tokens=2,3 delims=/ " %%a in ('echo %cdate%') do set yyyy=%%b

set date=%yyyy%%mm%%dd%

xcopy "C:\Program filesMicrosoft SQL ServerMSSQL11.SQL13MSSQLBinnmssqlsystemresource.*" D:\BackupSQL13mssqlsystemresourcemssqlsystemresource_%date%.* /J /Q

PowerShell:

$date = "_" + (get-date -format yyyyMMdd) + "."

copy-item -path "C:\Program filesMicrosoft SQL ServerMSSQL11.SQL13MSSQLBinnmssqlsystemresource.*" -destination D:\BackupSQL13mssqlsystemresource -force -passthru | rename-item -newname { $_.name -replace '.',$date}

Restore the Resource Database

The process of restoring the resource database is just as straightforward as backing it up. If you determine that your resource database is corrupted, fixing it is as simple as copying a good copy of the resource database files in place of the old ones. If corruption is detected, SQL Server will not be able to start and you will find a message in the application event file similar to the following:

SQL Server detected a logical consistency-based I/O error: invalid protection option. It occurred during a read of page (1:3289) in database ID 32767 at offset 0x000000019b2000 in file ‘C:\Program FilesMicrosoft SQL ServerMSSQL11.SQL13MSSQLBinnmssqlsystemresource.mdf’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

If you are wondering, yes, I intentionally corrupted my resource database file to be able to demonstrate this problem. Since SQL Server is offline, I copy a good backup copy of the files in place of the old ones, and then SQL is able to restart successfully. Easy, peezy, lemon-squeezy.

But what happens if you don’t have any backups of the resource database files? Well, you can always rebuild the master database (see Books Online for the process) which means all system databases get rebuilt and you will have to recreate server objects (logins, linked servers, SQL jobs, etc) and settings (server configurations, changes to the model database, replication settings). In short, rebuilding the master database can be a colossal pain.

So be safe and always have backups!

Yüklə 211,03 Kb.

Dostları ilə paylaş:

1 ... 7 8 9 10 11 12 13 14 15