Saturday, September 09, 2006

Find tape numbers for backups done using RMAN/NetBackup from the RMAN catalog.

Details:
One of my clients, who runs multi-terabyte Oracle databases, recently purchased Veritas NetBackup version 4.5. They are fairly new to this media management solution. They also purchased the Oracle agent for NetBackup so that RMAN can be interfaced with it to take backups to tape.

The Veritas NetBackup implementation went as planned. They also hired a new NetBackup consultant to train their employees and to set up and manage NetBackup according to industry standards.

After a couple of months I received a call from the client: they were not able to restore an Oracle database that one of their DBAs had backed up a few months earlier, because they did not know how to locate the backup tapes required for that restore.


I got engaged on a conf call.


Here is a brief history from the conference call:

After NetBackup was implemented in production as the enterprise-wide backup solution, their production DBA interfaced RMAN with the Oracle agent and started taking backups to tape. The backups included hot backups and archivelog backups along with controlfile backups.

In this type of backup, RMAN's typical role is to make the data available and stream it to the media management layer (MML), and it was doing that job without any issues. Tape management and retention, on the other hand, are primarily controlled by the MML (NetBackup in this case).
In order to restore from these backups, we first need to identify the tapes and then catalog and load them into the tape library. Since the retention period was 180 days (six months), restoring from a tape that was a month old was not a problem; finding the right one was.


Approach/Solution:

Now the question was: what are those tape numbers? I approached the NetBackup administrator. He explained that when they create any Oracle/RMAN backup to tape, they create a backup policy and then assign tape drives to that policy along with server names and other parameters, including the retention window. However, when data is written to those tapes it can be streamed from multiple servers at the same time, and the NetBackup team had no way to identify which tape was written with what data, since with Oracle/RMAN backups NetBackup receives no further details.

The NetBackup administrator was not able to produce those tape numbers, and we needed them for the restore in any case. Time was running out, since part of the production system's functionality was down, so I decided to look further.

Our backup strategy included running all production database backups with an RMAN catalog. We also "tag" our backups with the database name and day of the week, so they are easy to identify and restore once we locate them in the RMAN repository. With that in mind, I decided to look in detail at the RMAN catalog.

I browsed the RC_ views in the RMAN catalog, and one view caught my attention: RC_BACKUP_PIECE was of special interest. When I described the view, it had a column named MEDIA, and I was excited to find that detail.
I subqueried RC_DATABASE to supply the database name to RC_BACKUP_PIECE and grouped the results within the date window by the backup tag, as below:


select count(1), trunc(completion_time) time, media from rc_backup_piece
where tag like 'SALEPRD_SUNDAY%'
and handle like 'db%'
and completion_time between (sysdate - 35) and (sysdate - 25)
and db_id in (select dbid from rc_database where name ='SALEPRD')
group by trunc(COMPLETION_TIME),media;

Output:
COUNT(1) TIME MEDIA
-------- ----------- --------
10 04-AUG-2004 100517
10 04-AUG-2004 100550
10 04-AUG-2004 100590


Wonderful!! I was able to isolate all the backups completed during the time window of interest for the restore, along with their tape numbers. I now just needed to verify with the NetBackup administrator that those tapes belonged to the pool assigned to this backup policy.
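As an extra cross-check before handing the list over, each tape can be tied back to the backup sets, and hence the datafiles, whose pieces it holds by joining RC_BACKUP_PIECE to RC_BACKUP_DATAFILE on the backup set key. The following is only a sketch, reusing the example database name and tag from the query above:

select p.media, p.handle, f.file#
from rc_backup_piece p, rc_backup_datafile f
where p.bs_key = f.bs_key
and p.tag like 'SALEPRD_SUNDAY%'
and p.completion_time between (sysdate - 35) and (sysdate - 25)
and p.db_id in (select dbid from rc_database where name = 'SALEPRD')
order by p.media, f.file#;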


Now we were in much better shape. I adjusted the query to display the tapes holding the archivelog backups needed for the restore and supplied that tape list to the NetBackup administrator.

select count(1), trunc(completion_time) time, media from rc_backup_piece
where tag like 'SALEPRD_AL%'
and handle like 'al%'   -- note: handle pattern changed from 'db%' to 'al%' to pick up archivelog pieces
and completion_time between (sysdate - 35) and (sysdate - 25)
and db_id in (select dbid from rc_database where name ='SALEPRD')
group by trunc(COMPLETION_TIME),media;
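
In the end, what the NetBackup administrator really needs is simply the distinct list of tape numbers across both the datafile and archivelog pieces. A consolidated version of the two queries above (again just a sketch, using the same example tag prefix and date window) produces that list in one pass:

select distinct media
from rc_backup_piece
where tag like 'SALEPRD%'
and completion_time between (sysdate - 35) and (sysdate - 25)
and db_id in (select dbid from rc_database where name = 'SALEPRD')
order by media;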


Conclusion:
Within a short time RMAN was performing the restore/recovery and production functionality was restored. The client was very pleased that we had found a way to get the restore done. This ability of the RMAN catalog to keep track of all the tapes used for Oracle backups came in handy later at various other sites where I have worked with NetBackup and RMAN.


My 2 cents:
I recently used this functionality at another client site that was using TSM instead of NetBackup and found that TSM did not supply anything more than the pool name to RMAN, so there was not much detail available for locating tapes. However, the TSM administrator has a way to isolate the tapes written on certain days under a certain pool, so the restore was still simple when the TSM administrator was called upon to perform the same task.


At a later point in time, the NetBackup administrator from my previous client learned from Veritas support how to get details on the tape numbers holding data backed up by RMAN/Oracle.

Saturday, September 02, 2006

Table restore from complete RMAN backup

Scenario:
Nisha works at a premier bank's headquarters in Virginia. They have databases primarily running versions 8i (8.1.7.4) and 9i (9.2.0.6). Full backups are scheduled to run every Sunday using RMAN, and archivelog backups are taken every two hours. On Tuesday night at 8 pm Nisha receives a call from the management team: one of the application people has dropped the SALES table from the "SALE" database.

Nisha is engaged on a conference call to gather more details. Soon she realizes that she may have no option other than a full restore/recovery of the database, which will take a long time. At this point she decides to consult with other DBA team members from the branch office. Brian comes up with an alternate plan that allows the table to be restored to a point in time in the past using the RMAN backups.


Plan of action:

Brian lays out the complete process and scripts and explains the details to Nisha:

1] Find the last successful full backup.
2] Pick the date and time to restore to and calculate the duration between the last full backup and that restore time.
3] Find the archivelogs required to roll forward from the last full backup to the restore time (a query sketch for steps 3 through 6 follows the restore script below).
4] Find the tablespace in which the table was created.
5] Identify the datafiles in that tablespace.
6] Calculate the space required for the datafiles and other required files. Let's assume the SALES_DATA tablespace has 4 datafiles of 2G each, for a total of 8G; the SYSTEM tablespace is 300M; and there are 2 redo log groups with 2 members each of 100M, for a total of 400M. Finally, 24 archivelogs are required, so the space needed for archivelogs during recovery is 2.4G.
Total space will be 8G (datafiles) + 2.4G (archivelogs) + 400M (redo) + 300M (SYSTEM) + 200M (controlfiles and other scripts)
= 11.3G.
7] Find an Oracle home that has binaries installed and running at the same version.
8] If you don't want to perform the restore/recovery on the production server and a test server with a similar configuration is available, you can add space on the TEST server and NFS-mount it to the prod server to avoid a redirected restore and recovery effort.
In this case, the test server has the same version and patchset available, so we will use the TEST server to run the restore/recovery.

We had a 15G staging area, /u_stage, which we mounted on the TEST server and then NFS-mounted to the PROD server, allowing us to perform the restore operation. Once the restore is complete, we will remove the NFS mount and perform the recovery.

9] Run the restore of the datafiles: SYSTEM plus the 4 SALES_DATA datafiles. The channel ENV setting and the restore time in the script below are placeholders to be filled in for the actual environment.

run {
# Allocate a tape channel through the NetBackup media management layer.
# The ENV value is a placeholder; use your site's NetBackup client/policy settings.
allocate channel t1 type 'SBT_TAPE' parms
'ENV=(NB_ORA_CLIENT=prod_host)';
# Restore point in time (placeholder date); add SET NEWNAME clauses to place files under /u_stage.
set until time "to_date('17-AUG-2004 20:00:00','DD-MON-YYYY HH24:MI:SS')";
restore tablespace system, sales_data;
release channel t1;
}
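
For steps 3 through 6 in the plan above, most of the details can be pulled from the data dictionary and V$ views on the production database. The sketch below uses illustrative values only: the SALES_DATA and SYSTEM tablespace names come from the scenario, while the dates are hypothetical and should be replaced with the actual full-backup and restore times:

-- Step 3: archivelogs generated between the last full backup (Sunday) and the restore point
select name, sequence#, first_time
from v$archived_log
where first_time between to_date('15-AUG-2004 02:00', 'DD-MON-YYYY HH24:MI')
                      and to_date('17-AUG-2004 20:00', 'DD-MON-YYYY HH24:MI')
order by sequence#;

-- Steps 4 and 5: datafiles (and sizes) of the affected tablespace plus SYSTEM.
-- Since the table has already been dropped in production, the tablespace name
-- itself usually comes from the creation scripts or a recent export.
select tablespace_name, file_name, bytes/1024/1024 as size_mb
from dba_data_files
where tablespace_name in ('SALES_DATA', 'SYSTEM')
order by tablespace_name, file_id;

-- Step 6: redo log group sizes to include in the space estimate
select group#, members, bytes/1024/1024 as size_mb from v$log;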

Thursday, August 24, 2006

Sending email from Unix with an attachment and your name as the "reply to" person

Challenges/Requirements:
One afternoon I received a phone call from a customer. They wanted to run a report on a daily basis to look for obsolete records in the database and have the report for those obsolete records mailed out. The report was also going to be viewed by higher management and hence required a better presentation in the email.

Thinking:
1 We all know the mailx command by now; it lets us send emails from a Unix server to deliver information/reports on a regular basis when put in a script and scheduled via cron.
2 Since this email report was going to be viewed by higher management, a proper text layout and viewing arrangement was necessary. So this time I decided to add an attachment to the email along with the body, which is different from sending regular mail.
3 One major drawback I have seen over the years is that when we send email via the mailx command, the recipient receives it from the "unix user" who is running mailx. For most database accounts the user name is oracle, so the sender appears as "oracle@'hostname.domain.com'". If the /etc/passwd entry carries the account owner's name, we may see an alias for that sender such as "'account owner name' oracle@'hostname.domain.com'".
4 Also, when recipients do "reply to all", they need to edit the sender list; otherwise the reply goes to the "mailer daemon", since the Unix host does not receive any email.
5 Finally, if I am not around to respond to any issues in that report, we needed my manager to be CC'd so that he can allocate a resource to take care of the work.
6 Creating the report was the easy part of all this. So I finally decided to create a quick script before leaving for the day.


Implementation/scripts/outputs :
1 Created a regular shell script and scheduled it via cron.
2 "-r" in the mailx command sets the "reply to" address. If we provide "my_name" with the -r option, replies will come back to me.
3 Adding an attachment with mailx is simple. We write the "body of the email" to a file, run the report file through the uuencode command to turn it into an attachment, and then concatenate the body with the encoded attachment and pipe the result to mailx.

# Script details
LOGFILE=/home/oracle/1.log
DATA=/home/oracle/1.data
.
..
... {rest of script to spool data via sqlplus into the file named in LOGFILE}
....
# Write the email body to ${DATA}, uuencode the report as the attachment,
# concatenate body and attachment, and pipe the result to mailx.
echo "Please find attached file.\n Thanks,\n Sandeep " > ${DATA}
uuencode ${LOGFILE} Obsolete_data.txt | cat ${DATA} - | mailx -s "Obsolete data from ${ORACLE_SID} database" \
    -c "'my_manager'@domain.com,'my_name'@domain.com" \
    -r "'my_name'@domain.com" \
    "'client_1'@customer.domain.com,'client_2'@customer.domain.com"
.....


4 In the mailx command
mailx -s "Obsolete data from ${ORACLE_SID} database" -c "'my_manager'@domain.com,'my_name'@domain.com" -r "'my_name'@domain.com" "'client_1'@customer.domain.com,'client_2'@customer.domain.com"

-c carbon-copies the addresses listed after it (here, my manager and me)
-r sets the "reply to" person given after that option
-s is the subject of the email
5 In the uuencode command
uuencode ${LOGFILE} Obsolete_data.txt | cat ${DATA} -

${LOGFILE} is the file spooled by sqlplus, which contains the full report of obsolete records. Remember to open this file in "wordpad" when it is received as an email attachment, so the formatting displays properly.
The argument following the file name (Obsolete_data.txt) is the name you want to give the attached file.
${DATA} contains the body of the email, and the trailing "-" tells cat to append the uuencoded attachment read from standard input.

Conclusion:
We have successfully created an email from a Unix host with an attachment, which not only protects sensitive data but also improves readability by keeping the required information in an attached file. The customer was very pleased to receive a properly formatted daily report along with management contacts from my side.

Introduction...

Hello folks,

I have just created this blog to post RMAN/backup and recovery techniques and to discuss new/unpublished features, along with tips that will help us in supporting Oracle databases.

Stay tuned for upcoming posts.

Sandeep.