How to Test Failover on Oracle RAC 10g


 

Checking the System Network Configuration

 

DB Server Network

/etc/hosts

kblotdb1@oracle10:/home2/oracle10/work>cat /etc/hosts

# @(#)B.11.11_LRhosts $Revision: 1.9.214.1 $ $Date: 96/10/08 13:20:01 $

#

# The form for each entry is:

# <internet address> <official hostname> <aliases>

#

# For example:

# 192.1.2.34 hpfcrm loghost

#

# See the hosts(4) manual page for more information.

# Note: The entries cannot be preceded by a space.

# The format described in this file is the correct format.

# The original Berkeley manual page contains an error in

# the format description.

#

 

127.0.0.1 localhost loopback

10.55.50.201 kblotdb1

10.55.50.202 kblotdb2

10.55.49.206 kblotdb1_int

10.55.49.207 kblotdb2_int

10.55.50.208 kblotdb1_vip

10.55.50.209 kblotdb2_vip

  • kblotdb1/kblotdb2 are the public IPs, kblotdb1_int/kblotdb2_int are the cluster interconnect addresses, and kblotdb1_vip/kblotdb2_vip are the Oracle VIPs.
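
The naming convention above can be checked mechanically. The following is a minimal sketch, assuming the `_int` and `_vip` suffixes are the only convention in use and that the public network is a /24; the helper name `classify_hosts` is hypothetical and not part of any Oracle tool.

```python
# Sketch: classify /etc/hosts entries by the suffix convention used in this cluster.
# classify_hosts is a hypothetical helper; the /24 prefix comparison is an assumption.
HOSTS_TEXT = """\
127.0.0.1 localhost loopback
10.55.50.201 kblotdb1
10.55.50.202 kblotdb2
10.55.49.206 kblotdb1_int
10.55.49.207 kblotdb2_int
10.55.50.208 kblotdb1_vip
10.55.50.209 kblotdb2_vip
"""


def classify_hosts(text):
    """Return {role: {hostname: ip}} for public, interconnect and vip entries."""
    roles = {"public": {}, "interconnect": {}, "vip": {}}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, name = line.split()[:2]
        if ip.startswith("127."):  # skip loopback
            continue
        if name.endswith("_int"):
            roles["interconnect"][name] = ip
        elif name.endswith("_vip"):
            roles["vip"][name] = ip
        else:
            roles["public"][name] = ip
    return roles


def prefix24(ip):
    """First three octets of an IPv4 address (assumes a /24 network)."""
    return ip.rsplit(".", 1)[0]


roles = classify_hosts(HOSTS_TEXT)
public_prefixes = {prefix24(ip) for ip in roles["public"].values()}
# Each VIP should sit on the same subnet as the public IPs.
vips_on_public_subnet = all(prefix24(ip) in public_prefixes for ip in roles["vip"].values())
print(roles, vips_on_public_subnet)
```

The interconnect addresses land on a different prefix (10.55.49), which is what keeps private cluster traffic off the public LAN.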

 

netstat

kblotdb1@oracle10:/home2/oracle10>netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 0 0 0 0 0

lan2 1500 10.55.49.0 10.55.49.206 11357335 0 11975506 0 0

lan1 1500 10.55.50.0 10.55.50.201 6916147 0 13666471 0 0

lan0 1500 192.1.1.0 192.1.1.1 1044011 0 1973741 0 0

lo0 4136 127.0.0.0 127.0.0.1 34678497 0 34678513 0 0

lan4* 1500 none none 0 0 0 0 0

lan1:1 1500 10.55.50.0 10.55.50.208 7737385 0 2486558 0 0


  • Both servers use lan1 (10.55.50.x) for the public IP and lan2 (10.55.49.x) for the cluster interconnect.
  • The Oracle VIP is configured on the public IP interface lan1 (as lan1:1).
  • lan3 and lan4 are configured as standby interfaces, providing redundancy.
  • lan0 is configured as the HP MC/SG cluster heartbeat.
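
The same reading of the `netstat -in` output can be done programmatically. This is a rough sketch, assuming an interface whose address column is `none` is an unassigned standby and that a name containing `:` is an IP alias such as the VIP; `parse_netstat` is a hypothetical helper.

```python
# Sketch: summarize `netstat -in` output from node 1.
# Assumption: address "none" means no IP is assigned (standby); "name:N" is an alias.
NETSTAT_OUTPUT = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lan3* 1500 none none 0 0 0 0 0
lan2 1500 10.55.49.0 10.55.49.206 11357335 0 11975506 0 0
lan1 1500 10.55.50.0 10.55.50.201 6916147 0 13666471 0 0
lan0 1500 192.1.1.0 192.1.1.1 1044011 0 1973741 0 0
lo0 4136 127.0.0.0 127.0.0.1 34678497 0 34678513 0 0
lan4* 1500 none none 0 0 0 0 0
lan1:1 1500 10.55.50.0 10.55.50.208 7737385 0 2486558 0 0
"""


def parse_netstat(text):
    """Map interface name -> address, with None when no address is assigned."""
    table = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        name = fields[0].rstrip("*")
        address = fields[3]
        table[name] = None if address == "none" else address
    return table


interfaces = parse_netstat(NETSTAT_OUTPUT)
standby = sorted(name for name, addr in interfaces.items() if addr is None)
aliases = {name: addr for name, addr in interfaces.items() if ":" in name}
print("standby:", standby)
print("aliases:", aliases)
```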

     

modify nodeapps

If lan1 fails, lan3 takes over lan1's IP, so the Oracle VIP must also carry the lan3 information. The following commands are required (they must be run as the root user):

/home2/oracle10/bin/srvctl modify nodeapps -n kblotdb1 -o /home2/oracle10 -A kblotdb1_vip/255.255.255.0/lan1\|lan3

/home2/oracle10/bin/srvctl modify nodeapps -n kblotdb2 -o /home2/oracle10 -A kblotdb2_vip/255.255.255.0/lan1\|lan3


  • Both the DB and nodeapps must be shut down before doing this. (See MetaLink Note 296874.1 for details.)
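
To make the sequence explicit, the sketch below only builds the command strings and does not execute anything. The stop/modify/start ordering is an assumption derived from the note above, and `vip_change_plan` is a hypothetical helper; the database name dslot and the paths are taken from this article. Follow the MetaLink note for the authoritative procedure.

```python
# Sketch: build (not run) the srvctl commands for adding lan3 to the VIP definition.
# Assumption: database is stopped first, then nodeapps, then modify, then restart.
def vip_change_plan(db_name, oracle_home, netmask, interfaces, nodes):
    """nodes maps node name -> VIP hostname. Returns the commands as strings."""
    srvctl = f"{oracle_home}/bin/srvctl"
    if_list = "\\|".join(interfaces)  # the pipe is escaped for the shell
    plan = [f"{srvctl} stop database -d {db_name}"]
    plan += [f"{srvctl} stop nodeapps -n {node}" for node in nodes]
    plan += [
        f"{srvctl} modify nodeapps -n {node} -o {oracle_home} -A {vip}/{netmask}/{if_list}"
        for node, vip in nodes.items()
    ]
    plan += [f"{srvctl} start nodeapps -n {node}" for node in nodes]
    plan.append(f"{srvctl} start database -d {db_name}")
    return plan


plan = vip_change_plan(
    db_name="dslot",
    oracle_home="/home2/oracle10",
    netmask="255.255.255.0",
    interfaces=["lan1", "lan3"],
    nodes={"kblotdb1": "kblotdb1_vip", "kblotdb2": "kblotdb2_vip"},
)
for command in plan:
    print(command)
```

Printing the plan first makes it easy to review the exact commands before running them by hand as root.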

 

 

 

Cluster Interconnect

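The cluster interconnect can be identified from the configuration above, but it can also be confirmed from the trace file generated by running the following commands in a SQL session connected as sysdba. The trace file is created in udump.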

SQL> oradebug setmypid

SQL> oradebug ipc

SSKGXPT 0x275efc flags SSKGXPT_READPENDING info for network 0

socket no 8 IP 10.55.49.206 UDP 54216

sflags SSKGXPT_UP

info for network 1

socket no 0 IP 0.0.0.0 UDP 0

sflags SSKGXPT_DOWN

context timestamp 0

no ports

sconno accono ertt state seq# sent async sync rtrans acks

ach accono sconno admno state seq# rcv rtrans acks

 

SSKGXPT 0x275fb4 flags SSKGXPT_READPENDING info for network 0

socket no 8 IP 10.55.49.207 UDP 51946

sflags SSKGXPT_UP

info for network 1

socket no 0 IP 0.0.0.0 UDP 0

sflags SSKGXPT_DOWN

context timestamp 0

no ports

sconno accono ertt state seq# sent async sync rtrans acks

ach accono sconno admno state seq# rcv rtrans acks

  • The IP shown next to UDP is the same interconnect address identified earlier.
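
The comparison can be automated against the trace text. A minimal sketch, assuming the interconnect network is 10.55.49.0/24 (the netmask is not shown in the source) and that a socket with UDP port 0 is unused; `interconnect_ips` is a hypothetical helper.

```python
# Sketch: pull the active interconnect IP out of an `oradebug ipc` trace excerpt.
# Assumptions: /24 interconnect network; sockets with UDP port 0 are not in use.
import ipaddress
import re

TRACE_EXCERPT = """\
SSKGXPT 0x275efc flags SSKGXPT_READPENDING info for network 0
socket no 8 IP 10.55.49.206 UDP 54216
sflags SSKGXPT_UP
info for network 1
socket no 0 IP 0.0.0.0 UDP 0
sflags SSKGXPT_DOWN
"""


def interconnect_ips(trace_text):
    """Return the IPs of sockets that have a non-zero UDP port."""
    ips = []
    for match in re.finditer(r"socket no \d+ IP (\S+) UDP (\d+)", trace_text):
        ip, port = match.group(1), int(match.group(2))
        if port != 0:
            ips.append(ip)
    return ips


ips = interconnect_ips(TRACE_EXCERPT)
subnet = ipaddress.ip_network("10.55.49.0/24")
all_on_interconnect = all(ipaddress.ip_address(ip) in subnet for ip in ips)
print(ips, all_on_interconnect)
```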

 

 

 

REMOTE_LISTENER

Before the test

Before the failure tests, the listener information on the server was as follows:

kblotdb1@oracle10:/home2/oracle10/admin/dslot/udump>lsnrctl ser LISTENER_KBLOTDB1

 

LSNRCTL for HPUX: Version 10.1.0.4.0 - Production on 21-SEP-2005 14:59:44

Copyright (c) 1991, 2004, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0
         LOCAL SERVER
Service "dslot" has 2 instance(s).
  Instance "dslot1", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:481 refused:0 state:ready
         LOCAL SERVER
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dslot)(INSTANCE_NAME=dslot1)))
  Instance "dslot2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:131 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb2_vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dslot)(INSTANCE_NAME=dslot2)))
The command completed successfully

 

The reason the REMOTE SERVER handlers are registered becomes clear from the following init.ora and tnsnames.ora contents. (An spfile is not currently in use.)

remote_listener=LISTENERS_DSLOT

dslot1.local_listener='LOCAL_DSLOT1'

dslot2.local_listener='LOCAL_DSLOT2'

 

LOCAL_DSLOT2 =

(DESCRIPTION =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dslot)

(INSTANCE_NAME = dslot2)

)

)

 

LOCAL_DSLOT1 =

(DESCRIPTION =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dslot)

(INSTANCE_NAME = dslot1)

)

)

 

LISTENERS_DSLOT =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))

)

 

 

REMOTE_LISTENER

When REMOTE_LISTENER is defined, connection load balancing takes place on the server side, so DB connections may be established in a way the client did not intend.

  • BEA WebLogic uses a connection pool, so there is no particular need to use REMOTE_LISTENER.
  • In addition, the CONNECT_DATA section is unnecessary in the tnsnames.ora entries used for LOCAL_LISTENER.

 

 

Test setup

Therefore, init.ora and tnsnames.ora were configured as below so that REMOTE_LISTENER is not used.

#remote_listener=LISTENERS_DSLOT

dslot1.local_listener='LISTENER_DSLOT1'

dslot2.local_listener='LISTENER_DSLOT2'

 

LISTENER_DSLOT1 =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))

)

 

LISTENER_DSLOT2 =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))

)

 


In this configuration, the listener information on each server is as follows:

kblotdb1@oracle10:/home2/oracle10/dbs>lsnrctl ser LISTENER_KBLOTDB1

 

LSNRCTL for HPUX: Version 10.1.0.4.0 - Production on 21-SEP-2005 15:42:03

Copyright (c) 1991, 2004, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0
         LOCAL SERVER
Service "dslot" has 1 instance(s).
  Instance "dslot1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:22 refused:0 state:ready
         LOCAL SERVER
The command completed successfully

 

kblotdb2|/home2/oracle10/dbs> lsnrctl ser LISTENER_KBLOTDB2

 

LSNRCTL for HPUX: Version 10.1.0.4.0 - Production on 21-SEP-2005 15:46:24

Copyright (c) 1991, 2004, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb2_vip)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0
         LOCAL SERVER
Service "dslot" has 1 instance(s).
  Instance "dslot2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:40 refused:0 state:ready
         LOCAL SERVER
The command completed successfully
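
One way to confirm that no REMOTE SERVER handlers remain is to scan the `lsnrctl services` output. The following is a sketch only: `handlers_by_instance` is a hypothetical helper, and the parsing assumes the line layout shown in the listings above.

```python
# Sketch: list LOCAL/REMOTE handler kinds per instance from `lsnrctl services` output.
# Assumption: each handler is followed by a "LOCAL SERVER" or "REMOTE SERVER" line.
import re

LSNRCTL_OUTPUT = """\
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:0 refused:0
LOCAL SERVER
Service "dslot" has 1 instance(s).
Instance "dslot1", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:22 refused:0 state:ready
LOCAL SERVER
The command completed successfully
"""


def handlers_by_instance(text):
    """Map instance name -> list of handler kinds ('LOCAL' or 'REMOTE')."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Accept straight or typographic quotes around the instance name.
        match = re.match(r'Instance ["\u201c](\w+)["\u201d]', line)
        if match:
            current = match.group(1)
            result[current] = []
        elif line in ("LOCAL SERVER", "REMOTE SERVER") and current is not None:
            result[current].append(line.split()[0])
    return result


handlers = handlers_by_instance(LSNRCTL_OUTPUT)
remote_instances = [name for name, kinds in handlers.items() if "REMOTE" in kinds]
print(handlers)
print("instances with remote handlers:", remote_instances)
```

An empty list of remote instances matches the intent of commenting out remote_listener.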

 

 

RAC 10g Failover Tests

 

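Forced termination of an Oracle instance

When an arbitrary instance in the RAC is forcibly terminated, the client WebLogic service must fail over to a surviving RAC instance.

Immediately after the instance on node 1 was forcibly terminated, the alert.log on node 2 showed the following: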

Wed Sep 21 16:31:31 2005

Reconfiguration started (old inc 5, new inc 6)

List of nodes:

1

Global Resource Directory frozen

* dead instance detected - domain 0 invalid = TRUE

Update rdomain variables

Communication channels reestablished

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

Wed Sep 21 16:31:31 2005

LMS 1: 0 GCS shadows cancelled, 0 closed

Wed Sep 21 16:31:31 2005

LMS 0: 0 GCS shadows cancelled, 0 closed

Set master node info

Submitted all remote-enqueue requests

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

Post SMON to start 1st pass IR

Wed Sep 21 16:31:32 2005

LMS 1: 2988 GCS shadows traversed, 0 replayed

Wed Sep 21 16:31:32 2005

LMS 0: 2871 GCS shadows traversed, 0 replayed

Wed Sep 21 16:31:32 2005

Submitted all GCS remote-cache requests

Post SMON to start 1st pass IR

Fix write in gcs resources

Wed Sep 21 16:31:32 2005

Instance recovery: looking for dead threads

Wed Sep 21 16:31:32 2005

Beginning instance recovery of 1 threads

Reconfiguration complete

Wed Sep 21 16:31:33 2005

Started redo scan

Wed Sep 21 16:31:33 2005

Completed redo scan

240 redo blocks read, 104 data blocks need recovery

Wed Sep 21 16:31:33 2005

Started redo application at

Thread 1: logseq 7, block 1392, scn 0.0

Wed Sep 21 16:31:33 2005

Recovery of Online Redo Log: Thread 1 Group 2 Seq 7 Reading mem 0

Mem# 0 errs 0: /dev/kblotdb_vgdb01/rredo112.dbf

Mem# 1 errs 0: /dev/kblotdb_vgdb02/rredo212.dbf

Wed Sep 21 16:31:33 2005

Completed redo application

Wed Sep 21 16:31:34 2005

Completed instance recovery at

Thread 1: logseq 7, block 1632, scn 0.3659612

84 data blocks read, 128 data blocks written, 240 redo blocks read

  • Instance recovery for the failed instance takes about 3 seconds to complete (16:31:31 to 16:31:34 in the log above).
  • The WebLogic 5.1 connection pool, which uses the 10g JDBC driver (THIN), failed over to the surviving RAC instance.
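
The recovery time can be measured directly from the alert.log timestamps. A minimal sketch, assuming an English locale for the day and month names and using a shortened excerpt of the log above; `event_times` is a hypothetical helper.

```python
# Sketch: measure the time from reconfiguration start to completed instance recovery.
# Assumption: each message is stamped by the nearest timestamp line above it.
from datetime import datetime

ALERT_EXCERPT = """\
Wed Sep 21 16:31:31 2005
Reconfiguration started (old inc 5, new inc 6)
Wed Sep 21 16:31:32 2005
Beginning instance recovery of 1 threads
Reconfiguration complete
Wed Sep 21 16:31:34 2005
Completed instance recovery at
"""

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def event_times(text, markers):
    """Return {marker: timestamp of the most recent timestamp line above it}."""
    found = {}
    last_stamp = None
    for raw in text.splitlines():
        line = raw.strip()
        try:
            last_stamp = datetime.strptime(line, STAMP_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line, treat it as a message
        for marker in markers:
            if line.startswith(marker) and marker not in found:
                found[marker] = last_stamp
    return found


times = event_times(ALERT_EXCERPT, ["Reconfiguration started", "Completed instance recovery"])
elapsed = (times["Completed instance recovery"] - times["Reconfiguration started"]).total_seconds()
print(f"instance recovery finished {elapsed:.0f} s after reconfiguration started")
```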

     

     

 

DB server shutdown

When an arbitrary DB server in the RAC fails, the client WebLogic service must fail over to a surviving RAC server (instance).

After the DB server on node 1 failed, the logs on node 2 showed the following:

$ORA_CRS_HOME/css/log/ocssd2.log

2005-09-22 02:08:52.076 [4] >WARNING: clssnmeventhndlr: Receive failure with node 1, rc=11

2005-09-22 02:08:52.441 [3] >TRACE: clssnm_skgxncheck: CSS daemon failed on node 1

2005-09-22 02:08:55.330 [8] >WARNING: clssnmPollingThread: node(1) missed(4) checkin(s)

2005-09-22 02:08:56.340 [8] >WARNING: clssnmPollingThread: node(1) missed(5) checkin(s)

2005-09-22 02:08:57.350 [8] >WARNING: clssnmPollingThread: Eviction started for node 1, flags 0x0001, state 3, wt4c 0

2005-09-22 02:09:02.402 [8] >TRACE: clssnmDoSyncUpdate: Initiating sync 15

2005-09-22 02:09:02.402 [4] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] seq[1] sync[15]

2005-09-22 02:09:02.871 [1] >USER: NMEVENT_SUSPEND [00][00][00][04]

2005-09-22 02:09:06.441 [8] >TRACE: clssnmEvict: Evicting node 1, birth 10, death 0, killme 1

2005-09-22 02:09:06.443 [4] >USER: clssnmHandleUpdate: SYNC(15) from node(2) completed

2005-09-22 02:09:06.443 [4] >USER: clssnmHandleUpdate: NODE(2) IS ACTIVE MEMBER OF CLUSTER

2005-09-22 02:09:06.911 [13] >USER: NMEVENT_RECONFIG [00][00][00][04]

2005-09-22 02:09:06.911 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DBDSLOT type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DGDSLOT type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DAALL_DB type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock ocr_crs type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock IGDSLOTALL type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock RES ora.dslot.dslot.dslot2.srv type 3

2005-09-22 02:09:06.912 [13] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 15

2005-09-22 02:09:06.912 [7] >TRACE: clssgmPeerListener: connects done (1/1)

CLSS-3000: reconfiguration successful, incarnation 15 with 1 nodes

 

CLSS-3001: local node number 2, master node number 2

 

2005-09-22 02:09:06.985 [13] >TRACE: clssnmpostev: leave event posted, node 1

  • The surviving node 2 detected the failure of node 1 and evicted node 1.
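
The detection-to-eviction interval can be read off the ocssd log. The sketch below uses four lines from the log above; the regular expression reflects only the line layout visible in this excerpt, and `first_time` is a hypothetical helper.

```python
# Sketch: time from the first receive failure to the eviction of node 1.
# Assumption: lines follow the "timestamp [thread] >LEVEL: message" layout shown above.
import re
from datetime import datetime

OCSSD_EXCERPT = """\
2005-09-22 02:08:52.076 [4] >WARNING: clssnmeventhndlr: Receive failure with node 1, rc=11
2005-09-22 02:08:57.350 [8] >WARNING: clssnmPollingThread: Eviction started for node 1, flags 0x0001, state 3, wt4c 0
2005-09-22 02:09:06.441 [8] >TRACE: clssnmEvict: Evicting node 1, birth 10, death 0, killme 1
2005-09-22 02:09:06.985 [13] >TRACE: clssnmpostev: leave event posted, node 1
"""

LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[\d+\] >\w+: (.*)$")


def first_time(text, needle):
    """Timestamp of the first log line whose message contains needle, else None."""
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if match and needle in match.group(2):
            return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return None


detected = first_time(OCSSD_EXCERPT, "Receive failure")
evicted = first_time(OCSSD_EXCERPT, "Evicting node 1")
eviction_seconds = round((evicted - detected).total_seconds(), 3)
print(f"node 1 evicted {eviction_seconds} s after the first receive failure")
```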

     

     

$ORA_CRS_HOME/crs/log/kblotdb2.log

2005-09-22 02:09:07.001: Processing MemberLeave

2005-09-22 02:09:07.001: [MEMBERLEAVE:717] Processing member leave for kblotdb1, incarnation: 15

2005-09-22 02:09:07.217: [RESOURCE:717] Not failing resource ora.dslot.dslot.dslot2.srv because it was locked.

2005-09-22 02:09:07.218: [RESOURCE:717] X_RES_Unavailable : Resource ora.dslot.dslot.dslot2.srv is locked

(File: rti.cpp, line: 812)

2005-09-22 02:09:07.351: Attempting to start ora.kblotdb1.vip on member kblotdb2

2005-09-22 02:09:35.059: Start of ora.kblotdb1.vip on member kblotdb2 succeeded.

2005-09-22 02:09:35.194: Attempting to start ora.dslot.dslot.cs on member kblotdb2

2005-09-22 02:09:35.755: Start of ora.dslot.dslot.cs on member kblotdb2 succeeded.

2005-09-22 02:09:35.865: Attempting to start ora.dslot.db on member kblotdb2

2005-09-22 02:09:36.319: Start of ora.dslot.db on member kblotdb2 succeeded.

2005-09-22 02:09:36.323: [MEMBERLEAVE:717] Do failover for: kblotdb1

2005-09-22 02:09:36.324: [MEMBERLEAVE:717] Post recovery done evmd event for: kblotdb1

  • Next, CRS failed over the Oracle VIP that had been on node 1 to the surviving node 2.
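
How long each resource took to start on node 2 can be computed from the CRS log. A rough sketch, assuming every "Attempting to start" line is later paired with a "Start of ... succeeded." line for the same resource; `start_durations` is a hypothetical helper.

```python
# Sketch: per-resource start duration from the CRS log excerpt above.
# Assumption: attempt and success lines are paired by resource name.
import re
from datetime import datetime

CRSD_EXCERPT = """\
2005-09-22 02:09:07.351: Attempting to start ora.kblotdb1.vip on member kblotdb2
2005-09-22 02:09:35.059: Start of ora.kblotdb1.vip on member kblotdb2 succeeded.
2005-09-22 02:09:35.194: Attempting to start ora.dslot.dslot.cs on member kblotdb2
2005-09-22 02:09:35.755: Start of ora.dslot.dslot.cs on member kblotdb2 succeeded.
"""

STAMP = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"
ATTEMPT = re.compile(rf"^{STAMP}: Attempting to start (\S+) on member (\S+)$")
SUCCESS = re.compile(rf"^{STAMP}: Start of (\S+) on member (\S+) succeeded\.$")


def parse_stamp(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def start_durations(log_text):
    """Map resource name -> seconds between the start attempt and its success."""
    started, durations = {}, {}
    for raw in log_text.splitlines():
        line = raw.strip()
        match = ATTEMPT.match(line)
        if match:
            started[match.group(2)] = parse_stamp(match.group(1))
            continue
        match = SUCCESS.match(line)
        if match and match.group(2) in started:
            delta = parse_stamp(match.group(1)) - started[match.group(2)]
            durations[match.group(2)] = round(delta.total_seconds(), 3)
    return durations


durations = start_durations(CRSD_EXCERPT)
print(durations)
```

In this excerpt the VIP relocation accounts for most of the elapsed time, which is the interval during which clients connecting to node 1's VIP would have to wait.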

     

     

netstat

kblotdb2|/home2/oracle10/work> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 0 0 0 0 0

lan2 1500 10.55.49.0 10.55.49.207 12518382 0 12044400 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 6161297 0 1696506 0 0

lan1 1500 10.55.50.0 10.55.50.202 12894636 0 22979733 0 0

lan0* 1500 192.1.1.0 192.1.1.2 2111713 0 1247138 0 0

lo0 4136 127.0.0.0 127.0.0.1 36147569 0 36147578 0 0

lan1:2 1500 10.55.50.0 10.55.50.208 1575 0 169 0 0

lan4* 1500 none none 0 0 0 0 0

  • Node 1's Oracle VIP (10.55.50.208) has indeed failed over to lan1:2 on node 2.
  • No problems with the WebLogic service.

     

     

 

DB server network failures

Public LAN failure

In the normal state, the network status of node 2 is as follows:

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 0 0 0 0 0

lan2 1500 10.55.49.0 10.55.49.207 12518881 0 12044830 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 6485707 0 1702063 0 0

lan1 1500 10.55.50.0 10.55.50.202 12911405 0 23327953 0 0

lan0 1500 192.1.1.0 192.1.1.2 2112194 0 1247528 0 0

lo0 4136 127.0.0.0 127.0.0.1 36181974 0 36181983 0 0

lan1:2 1500 10.55.50.0 10.55.50.208 2914 0 272 0 0

lan4* 1500 none none 0 0 0 0 0

 

When the network on node 2's public LAN interface lan1 was switched over, the status changed as follows:

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3 1500 10.55.50.0 10.55.50.202 1346 0 3331 0 0

lan2 1500 10.55.49.0 10.55.49.207 102765 0 102573 0 0

lan1* 1500 none none 13257 0 22091 0 0

lan0 1500 192.1.1.0 192.1.1.2 3766 0 6839 0 0

lo0 4136 127.0.0.0 127.0.0.1 140637 0 140637 0 0

lan3:1 1500 10.55.50.0 10.55.50.209 1 0 0 0 0

lan4* 1500 none none 2621 0 2677 0 0

  • The public IP moved to lan3, which had been the standby, and the Oracle VIP accordingly came up on lan3:1.
  • No problems with the WebLogic service.
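
The move can be detected by comparing two `netstat -in` snapshots. This sketch uses only the relevant rows from the before and after listings above; `ip_locations` and `moved_ips` are hypothetical helpers.

```python
# Sketch: find IPs that changed interface between two `netstat -in` snapshots.
# Assumption: rows whose address column is "none" carry no IP and are ignored.
BEFORE = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lan3* 1500 none none 0 0 0 0 0
lan2 1500 10.55.49.0 10.55.49.207 12518881 0 12044830 0 0
lan1:1 1500 10.55.50.0 10.55.50.209 6485707 0 1702063 0 0
lan1 1500 10.55.50.0 10.55.50.202 12911405 0 23327953 0 0
"""

AFTER = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lan3 1500 10.55.50.0 10.55.50.202 1346 0 3331 0 0
lan2 1500 10.55.49.0 10.55.49.207 102765 0 102573 0 0
lan1* 1500 none none 13257 0 22091 0 0
lan3:1 1500 10.55.50.0 10.55.50.209 1 0 0 0 0
"""


def ip_locations(text):
    """Map IP address -> interface name for interfaces that have an address."""
    locations = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4 and fields[3] != "none":
            locations[fields[3]] = fields[0].rstrip("*")
    return locations


def moved_ips(before_text, after_text):
    """Return {ip: (old_interface, new_interface)} for IPs that changed interface."""
    before, after = ip_locations(before_text), ip_locations(after_text)
    return {
        ip: (before[ip], after[ip])
        for ip in before
        if ip in after and before[ip] != after[ip]
    }


moves = moved_ips(BEFORE, AFTER)
print(moves)
```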

     

Once lan1 on node 2 is restored, the configuration returns to its original state, as shown below:

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 3106 0 7692 0 0

lan2 1500 10.55.49.0 10.55.49.207 107982 0 107628 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 12 0 0 0 0

lan1 1500 10.55.50.0 10.55.50.202 13752 0 22982 0 0

lan0 1500 192.1.1.0 192.1.1.2 4122 0 7513 0 0

lo0 4136 127.0.0.0 127.0.0.1 152489 0 152489 0 0

lan4* 1500 none none 2621 0 2677 0 0

 

 

cluster_interconnect LAN failure

When the network on node 2's cluster_interconnect LAN interface lan2 was switched over, the status was as follows:

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3 1500 10.55.49.0 10.55.49.207 5519 0 10620 0 0

lan2* 1500 none none 108070 0 107778 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 487 0 13 0 0

lan1 1500 10.55.50.0 10.55.50.202 15595 0 24175 0 0

lan0 1500 192.1.1.0 192.1.1.2 4354 0 7953 0 0

lo0 4136 127.0.0.0 127.0.0.1 160463 0 160463 0 0

lan4* 1500 none none 2621 0 2677 0 0

  • The cluster_interconnect IP moved to the standby lan3.
  • No service problems for either Oracle or WebLogic.


haisins

I am 박용석, an Oracle DBA. Please send inquiries to haisins@gmail.com.
