Saturday, September 09, 2006

Find tape numbers for backups done using rman-netbackup from rman catalog.

Details:
One of client who has multiTerabytes of oracle databases has recently purchased Veritas Netbackup version 4.5. They are fairly new to this media management solution. They have also purchased oracle agent for netbackup so that, rman can be interfaced to take backups to tape.

All went as per plan to implement veritas netbackup. They also hired new netbackup consultant to train their employees and manage/setup netbackup as per industry standard.

After couple of month I received call from client indicating they were not able to restore oracle database which was backed few months earlier by one of their DBA as they don't know how to locate the backups tapes required to restore that backup.


I got engaged on a conf call.


Here is brief history from the conference call,

After netbackup was implemented in production as enterprise wide backup solution, their production DBA interfaced RMAN with oracle agent and started taking backups to tape. Backups included hot, archive along with controlfile backups.

In this type of backup, typical role of rman is to make data available and stream to media management layer (MML), it was doing its job without any issues. As long as tape management and retention is concerned, they are primarily controlled by MML (netbackup in this case).
In order to perform restore of backups done, we will need to first identify tapes and then catalog and load them in tape library. Since our retention period is 180 days (that's 6months) restoring tape which was a month old was not a problem however, finding right one was.


Approach/Solution:

Now question was what are those tape numbers? I approached netbackup administrator. He explained me that, when they create any oracle/rman backup to tape, they create backup policy and then assign tape drives to that backup policy along with server names and other parameters including retention window. However, when data is written to those tapes, they can be streamed from multiple servers at the same time and netbackup team had no way to identify which tape was written with what data as with oracle/rman backups netbackup receives no further details.

Netbackup administrator was not able to grasp those tape numbers and we needed them for restore in any case. Time was running out since we had down functionality of production system, I decided to look further.

Our backup strategy included running all production databases backups with rman catalog. Also, we "tag" our backup with database name and day of week hence, it will be easy to identify them and restore them once we locate them in rman repository. With that in mind, I decided to look in detail in rman catalog.

I browsed all RC_ views in rman catalog. While browsing, one of the view caught my attention. RC_BACKUP_PIECE view was of special interest. I described view and it had a column named "media". I was excited to find that detail.
I subqueried RC_DATBASE to supply database name to RC_BACKUP_PIECE and grouped results within date window with "tag" as below:


select count(1),trunc(COMPLETION_TIME)TIME,media from rc_backup_piece
where tag like 'SALEPRD_SUNDAY%'
and handle like 'db%'
and completion_time between (sysdate - 35) and (sysdate - 25)
and db_id in (select dbid from rc_database where name ='SALEPRD')
group by trunc(COMPLETION_TIME),media;

Output:
COUNT(1) TIME MEDIA
-------- ----------- --------
10 04-AUG-2004 100517
10 04-AUG-2004 100550
10 04-AUG-2004 100590


Wonderful !! I was able to isolate all backups required during the time window which was idle for restore purpose along with tape numbers. I will now just need to verify with netbackup administrator and make sure those tapes are correct for the pool assigned for this backup policy.


Now we were in much better shape.I trimmed query to display tapes required for archivelog backup required for restore and supplied those tapes to netbackup administrator.

select count(1),trunc(COMPLETION_TIME)TIME,media from rc_backup_piece
where tag like 'SALEPRD_AL%'
and handle like 'al%' -- "Note...I changed handle from db to al to find out archivelogs"
and completion_time between (sysdate - 35) and (sysdate - 25)
and db_id in (select dbid from rc_database where name ='SALEPRD')
group by trunc(COMPLETION_TIME),media;


Conclusion:
Within short time,RMAN started performing restore/recovery and production functionality was restored. Client was very pleased to know that we had found our way to get restore done. This feature of RMAN catalog keeping track of all tapes required for oracle backup came handy later on at various other sites where I have worked with netbackup/rman.


My 2 cents:
I had recently used this functionality at another client site who were using TSM instead of netbackup and found that, TSM did not supply any further details than pool name to rman and hence there was not much details available in terms of locating tapes. However, TSM administrator have way to isolate tapes on certain days where backup was performed under certain pool and hence restore was simple when TSM administrator was called upon to perform same task.


support on how to get details on tape numbers which was At Later point of time, netbackup administrator from my previous client learned from veritas backed up by rman/oracle.