'詳解一張歷史數據庫性能的關鍵表--dba_hist_active_sess_history'

數據庫 DBA 歷史 SQL 波波說運維 2019-08-26

概述

在很多情況下，當數據庫發生性能問題的時候，我們並沒有機會來收集足夠的診斷信息，比如system state dump或者hang analyze，甚至問題發生的時候DBA根本不在場。這給我們診斷問題帶來很大的困難。那麼在這種情況下，我們是否能在事後收集一些信息來分析問題的原因呢？

今天主要介紹一種通過dba_hist_active_sess_history的數據來分析問題的一種方法，雖然通過awr也可以獲取相關信息。

思路

AWR和ASH採樣機制，有一個視圖gv$active_session_history會每秒鐘將數據庫所有節點的Active Session採樣一次，而dba_hist_active_sess_history則會將gv$active_session_history裡的數據每10秒採樣一次並持久化保存。基於這個特徵，可以通過分析dba_hist_active_sess_history的Session採樣情況，來定位問題發生的準確時間範圍，並且可以觀察每個採樣點的top event和top holder。

1. Dump出問題期間的ASH數據：

基於dba_hist_active_sess_history創建一個新表m_ash

SQL> create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between 
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

概述

今天主要介紹一種通過dba_hist_active_sess_history的數據來分析問題的一種方法，雖然通過awr也可以獲取相關信息。

思路

1. Dump出問題期間的ASH數據：

基於dba_hist_active_sess_history創建一個新表m_ash

SQL> create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between 
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

詳解一張歷史數據庫性能的關鍵表--dba_hist_active_sess_history

2. 查看ASH時間範圍:

為了加快速度這裡採用了並行查詢。

alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select /*+ parallel 8 */
 t.dbid,
 t.instance_number,
 min(sample_time),
 max(sample_time),
 count(*) session_count
 from m_ash t
 group by t.dbid, t.instance_number
 order by dbid, instance_number;

概述

今天主要介紹一種通過dba_hist_active_sess_history的數據來分析問題的一種方法，雖然通過awr也可以獲取相關信息。

思路

1. Dump出問題期間的ASH數據：

基於dba_hist_active_sess_history創建一個新表m_ash

SQL> create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between 
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

2. 查看ASH時間範圍:

為了加快速度這裡採用了並行查詢。

alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select /*+ parallel 8 */
 t.dbid,
 t.instance_number,
 min(sample_time),
 max(sample_time),
 count(*) session_count
 from m_ash t
 group by t.dbid, t.instance_number
 order by dbid, instance_number;

從以上輸出可知該數據庫共2個節點，採樣時間共30分鐘，節點2的採樣比節點1要多很多，問題可能發生在節點2上。

3. 確認問題發生的精確時間範圍:

select /*+ parallel 8 */
 dbid, instance_number, sample_id, sample_time, count(*) session_count
 from m_ash t
 group by dbid, instance_number, sample_id, sample_time
 order by dbid, instance_number, sample_time;

概述

今天主要介紹一種通過dba_hist_active_sess_history的數據來分析問題的一種方法，雖然通過awr也可以獲取相關信息。

思路

1. Dump出問題期間的ASH數據：

基於dba_hist_active_sess_history創建一個新表m_ash

SQL> create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between 
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

2. 查看ASH時間範圍:

為了加快速度這裡採用了並行查詢。

alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select /*+ parallel 8 */
 t.dbid,
 t.instance_number,
 min(sample_time),
 max(sample_time),
 count(*) session_count
 from m_ash t
 group by t.dbid, t.instance_number
 order by dbid, instance_number;

從以上輸出可知該數據庫共2個節點，採樣時間共30分鐘，節點2的採樣比節點1要多很多，問題可能發生在節點2上。

3. 確認問題發生的精確時間範圍:

select /*+ parallel 8 */
 dbid, instance_number, sample_id, sample_time, count(*) session_count
 from m_ash t
 group by dbid, instance_number, sample_id, sample_time
 order by dbid, instance_number, sample_time;

概述

今天主要介紹一種通過dba_hist_active_sess_history的數據來分析問題的一種方法，雖然通過awr也可以獲取相關信息。

思路

1. Dump出問題期間的ASH數據：

基於dba_hist_active_sess_history創建一個新表m_ash

SQL> create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between 
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

2. 查看ASH時間範圍:

為了加快速度這裡採用了並行查詢。

alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select /*+ parallel 8 */
 t.dbid,
 t.instance_number,
 min(sample_time),
 max(sample_time),
 count(*) session_count
 from m_ash t
 group by t.dbid, t.instance_number
 order by dbid, instance_number;

從以上輸出可知該數據庫共2個節點，採樣時間共30分鐘，節點2的採樣比節點1要多很多，問題可能發生在節點2上。

3. 確認問題發生的精確時間範圍:

select /*+ parallel 8 */
 dbid, instance_number, sample_id, sample_time, count(*) session_count
 from m_ash t
 group by dbid, instance_number, sample_id, sample_time
 order by dbid, instance_number, sample_time;

注意觀察以上輸出的每個採樣點的active session的數量，數量突然變多往往意味著問題發生了。

注意: 觀察以上的輸出有無斷檔，比如某些時間沒有采樣。

4. 確定每個採樣點的top n event:

在這裡指定的是top 2 event，並且注掉了採樣時間以觀察所有采樣點的情況。如果數據量較多，您也可以通過開啟sample_time的註釋來觀察某個時間段的情況。注意最後一列session_count指的是該採樣點上的等待該event的session數量。

select t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.event,
 t.session_state,
 t.c session_count
 from (select t.*,
 rank() over(partition by dbid, instance_number, sample_time order by c desc) r
 from (select /*+ parallel 8 */
 t.*,
 count(*) over(partition by dbid, instance_number, sample_time, event) c,
 row_number() over(partition by dbid, instance_number, sample_time, event order by 1) r1
 from m_ash t
 /*where sample_time >
 to_timestamp('2019-08-22 13:59:00',
 'yyyy-mm-dd hh24:mi:ss')
 and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 ) t
 where r1 = 1) t
 where r < 3
 order by session_count desc,dbid, instance_number, sample_time, r;

概述

今天主要介紹一種通過dba_hist_active_sess_history的數據來分析問題的一種方法，雖然通過awr也可以獲取相關信息。

思路

1. Dump出問題期間的ASH數據：

基於dba_hist_active_sess_history創建一個新表m_ash

SQL> create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between 
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

2. 查看ASH時間範圍:

為了加快速度這裡採用了並行查詢。

alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select /*+ parallel 8 */
 t.dbid,
 t.instance_number,
 min(sample_time),
 max(sample_time),
 count(*) session_count
 from m_ash t
 group by t.dbid, t.instance_number
 order by dbid, instance_number;

從以上輸出可知該數據庫共2個節點，採樣時間共30分鐘，節點2的採樣比節點1要多很多，問題可能發生在節點2上。

3. 確認問題發生的精確時間範圍:

select /*+ parallel 8 */
 dbid, instance_number, sample_id, sample_time, count(*) session_count
 from m_ash t
 group by dbid, instance_number, sample_id, sample_time
 order by dbid, instance_number, sample_time;

注意觀察以上輸出的每個採樣點的active session的數量，數量突然變多往往意味著問題發生了。

注意: 觀察以上的輸出有無斷檔，比如某些時間沒有采樣。

4. 確定每個採樣點的top n event:

select t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.event,
 t.session_state,
 t.c session_count
 from (select t.*,
 rank() over(partition by dbid, instance_number, sample_time order by c desc) r
 from (select /*+ parallel 8 */
 t.*,
 count(*) over(partition by dbid, instance_number, sample_time, event) c,
 row_number() over(partition by dbid, instance_number, sample_time, event order by 1) r1
 from m_ash t
 /*where sample_time >
 to_timestamp('2019-08-22 13:59:00',
 'yyyy-mm-dd hh24:mi:ss')
 and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 ) t
 where r1 = 1) t
 where r < 3
 order by session_count desc,dbid, instance_number, sample_time, r;

概述

今天主要介紹一種通過dba_hist_active_sess_history的數據來分析問題的一種方法，雖然通過awr也可以獲取相關信息。

思路

1. Dump出問題期間的ASH數據：

基於dba_hist_active_sess_history創建一個新表m_ash

SQL> create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between 
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

2. 查看ASH時間範圍:

為了加快速度這裡採用了並行查詢。

alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select /*+ parallel 8 */
 t.dbid,
 t.instance_number,
 min(sample_time),
 max(sample_time),
 count(*) session_count
 from m_ash t
 group by t.dbid, t.instance_number
 order by dbid, instance_number;

從以上輸出可知該數據庫共2個節點，採樣時間共30分鐘，節點2的採樣比節點1要多很多，問題可能發生在節點2上。

3. 確認問題發生的精確時間範圍:

select /*+ parallel 8 */
 dbid, instance_number, sample_id, sample_time, count(*) session_count
 from m_ash t
 group by dbid, instance_number, sample_id, sample_time
 order by dbid, instance_number, sample_time;

注意觀察以上輸出的每個採樣點的active session的數量，數量突然變多往往意味著問題發生了。

注意: 觀察以上的輸出有無斷檔，比如某些時間沒有采樣。

4. 確定每個採樣點的top n event:

select t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.event,
 t.session_state,
 t.c session_count
 from (select t.*,
 rank() over(partition by dbid, instance_number, sample_time order by c desc) r
 from (select /*+ parallel 8 */
 t.*,
 count(*) over(partition by dbid, instance_number, sample_time, event) c,
 row_number() over(partition by dbid, instance_number, sample_time, event order by 1) r1
 from m_ash t
 /*where sample_time >
 to_timestamp('2019-08-22 13:59:00',
 'yyyy-mm-dd hh24:mi:ss')
 and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 ) t
 where r1 = 1) t
 where r < 3
 order by session_count desc,dbid, instance_number, sample_time, r;

從以上輸出我們可以發現問題期間最嚴重的等待為gc buffer busy acquire，高峰期等待該event的session數達到了16個，其次為db file sequential read，高峰期session為5個。

5. 觀察每個採樣點的等待鏈

其原理為通過dba_hist_active_sess_history. blocking_session記錄的holder來通過connect by級聯查詢，找出最終的holder. 在RAC環境中，每個節點的ASH採樣的時間很多情況下並不是一致的，因此可以通過將本SQL的第二段註釋的sample_time稍作修改讓不同節點相差1秒的採樣時間可以比較(注意最好也將partition by中的sample_time做相應修改)。該輸出中isleaf=1的都是最終holder，而iscycle=1的代表死鎖了(也就是在同一個採樣點中a等b，b等c，而c又等a，這種情況如果持續發生，那麼尤其值得關注)。採用如下查詢能觀察到阻塞鏈。

select /*+ parallel 8 */
 level lv,
 connect_by_isleaf isleaf,
 connect_by_iscycle iscycle,
 t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.session_id,
 t.sql_id,
 t.session_type,
 t.event,
 t.session_state,
 t.blocking_inst_id,
 t.blocking_session,
 t.blocking_session_status
 from m_ash t
/*where sample_time >
 to_timestamp('2019-08-22 13:55:00',
 'yyyy-mm-dd hh24:mi:ss')
and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 start with blocking_session is not null
connect by nocycle
 prior dbid = dbid
 and prior sample_time = sample_time
 /*and ((prior sample_time) - sample_time between interval '-1'
 second and interval '1' second)*/
 and prior blocking_inst_id = instance_number
 and prior blocking_session = session_id
 and prior blocking_session_serial# = session_serial#;

概述

今天主要介紹一種通過dba_hist_active_sess_history的數據來分析問題的一種方法，雖然通過awr也可以獲取相關信息。

思路

1. Dump出問題期間的ASH數據：

基於dba_hist_active_sess_history創建一個新表m_ash

SQL> create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between 
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

2. 查看ASH時間範圍:

為了加快速度這裡採用了並行查詢。

alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select /*+ parallel 8 */
 t.dbid,
 t.instance_number,
 min(sample_time),
 max(sample_time),
 count(*) session_count
 from m_ash t
 group by t.dbid, t.instance_number
 order by dbid, instance_number;

從以上輸出可知該數據庫共2個節點，採樣時間共30分鐘，節點2的採樣比節點1要多很多，問題可能發生在節點2上。

3. 確認問題發生的精確時間範圍:

select /*+ parallel 8 */
 dbid, instance_number, sample_id, sample_time, count(*) session_count
 from m_ash t
 group by dbid, instance_number, sample_id, sample_time
 order by dbid, instance_number, sample_time;

注意觀察以上輸出的每個採樣點的active session的數量，數量突然變多往往意味著問題發生了。

注意: 觀察以上的輸出有無斷檔，比如某些時間沒有采樣。

4. 確定每個採樣點的top n event:

select t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.event,
 t.session_state,
 t.c session_count
 from (select t.*,
 rank() over(partition by dbid, instance_number, sample_time order by c desc) r
 from (select /*+ parallel 8 */
 t.*,
 count(*) over(partition by dbid, instance_number, sample_time, event) c,
 row_number() over(partition by dbid, instance_number, sample_time, event order by 1) r1
 from m_ash t
 /*where sample_time >
 to_timestamp('2019-08-22 13:59:00',
 'yyyy-mm-dd hh24:mi:ss')
 and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 ) t
 where r1 = 1) t
 where r < 3
 order by session_count desc,dbid, instance_number, sample_time, r;

5. 觀察每個採樣點的等待鏈

select /*+ parallel 8 */
 level lv,
 connect_by_isleaf isleaf,
 connect_by_iscycle iscycle,
 t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.session_id,
 t.sql_id,
 t.session_type,
 t.event,
 t.session_state,
 t.blocking_inst_id,
 t.blocking_session,
 t.blocking_session_status
 from m_ash t
/*where sample_time >
 to_timestamp('2019-08-22 13:55:00',
 'yyyy-mm-dd hh24:mi:ss')
and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 start with blocking_session is not null
connect by nocycle
 prior dbid = dbid
 and prior sample_time = sample_time
 /*and ((prior sample_time) - sample_time between interval '-1'
 second and interval '1' second)*/
 and prior blocking_inst_id = instance_number
 and prior blocking_session = session_id
 and prior blocking_session_serial# = session_serial#;

從上面的輸出可見，在相同的採樣點上(2019-08-22 14:00:05.077)，節點1 session 165、319、396、473、625、857全部在等待gc buffer busy acquire，其被節點2 session 1090阻塞。

6. 基於第5步的原理來找出每個採樣點的最終top holder:

比如如下SQL列出了每個採樣點top 2的blocker session，並且計算了其最終阻塞的session數(參考blocking_session_count)

select t.lv,
 t.iscycle,
 t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.session_id,
 t.sql_id,
 t.session_type,
 t.event,
 t.seq#,
 t.session_state,
 t.blocking_inst_id,
 t.blocking_session,
 t.blocking_session_status,
 t.c blocking_session_count
 from (select t.*,
 row_number() over(partition by dbid, instance_number, sample_time order by c desc) r
 from (select t.*,
 count(*) over(partition by dbid, instance_number, sample_time, session_id) c,
 row_number() over(partition by dbid, instance_number, sample_time, session_id order by 1) r1
 from (select /*+ parallel 8 */
 level lv,
 connect_by_isleaf isleaf,
 connect_by_iscycle iscycle,
 t.*
 from m_ash t
 /*where sample_time >
 to_timestamp('2013-11-17 13:55:00',
 'yyyy-mm-dd hh24:mi:ss')
 and sample_time <
 to_timestamp('2013-11-17 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 start with blocking_session is not null
 connect by nocycle
 prior dbid = dbid
 and prior sample_time = sample_time
 /*and ((prior sample_time) - sample_time between interval '-1'
 second and interval '1' second)*/
 and prior blocking_inst_id = instance_number
 and prior blocking_session = session_id
 and prior
 blocking_session_serial# = session_serial#) t
 where t.isleaf = 1) t
 where r1 = 1) t
 where r < 3
 order by blocking_session_count desc,dbid, sample_time, r;

概述

今天主要介紹一種通過dba_hist_active_sess_history的數據來分析問題的一種方法，雖然通過awr也可以獲取相關信息。

思路

1. Dump出問題期間的ASH數據：

基於dba_hist_active_sess_history創建一個新表m_ash

SQL> create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between 
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

2. 查看ASH時間範圍:

為了加快速度這裡採用了並行查詢。

alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select /*+ parallel 8 */
 t.dbid,
 t.instance_number,
 min(sample_time),
 max(sample_time),
 count(*) session_count
 from m_ash t
 group by t.dbid, t.instance_number
 order by dbid, instance_number;

從以上輸出可知該數據庫共2個節點，採樣時間共30分鐘，節點2的採樣比節點1要多很多，問題可能發生在節點2上。

3. 確認問題發生的精確時間範圍:

select /*+ parallel 8 */
 dbid, instance_number, sample_id, sample_time, count(*) session_count
 from m_ash t
 group by dbid, instance_number, sample_id, sample_time
 order by dbid, instance_number, sample_time;

注意觀察以上輸出的每個採樣點的active session的數量，數量突然變多往往意味著問題發生了。

注意: 觀察以上的輸出有無斷檔，比如某些時間沒有采樣。

4. 確定每個採樣點的top n event:

select t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.event,
 t.session_state,
 t.c session_count
 from (select t.*,
 rank() over(partition by dbid, instance_number, sample_time order by c desc) r
 from (select /*+ parallel 8 */
 t.*,
 count(*) over(partition by dbid, instance_number, sample_time, event) c,
 row_number() over(partition by dbid, instance_number, sample_time, event order by 1) r1
 from m_ash t
 /*where sample_time >
 to_timestamp('2019-08-22 13:59:00',
 'yyyy-mm-dd hh24:mi:ss')
 and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 ) t
 where r1 = 1) t
 where r < 3
 order by session_count desc,dbid, instance_number, sample_time, r;

5. 觀察每個採樣點的等待鏈

select /*+ parallel 8 */
 level lv,
 connect_by_isleaf isleaf,
 connect_by_iscycle iscycle,
 t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.session_id,
 t.sql_id,
 t.session_type,
 t.event,
 t.session_state,
 t.blocking_inst_id,
 t.blocking_session,
 t.blocking_session_status
 from m_ash t
/*where sample_time >
 to_timestamp('2019-08-22 13:55:00',
 'yyyy-mm-dd hh24:mi:ss')
and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 start with blocking_session is not null
connect by nocycle
 prior dbid = dbid
 and prior sample_time = sample_time
 /*and ((prior sample_time) - sample_time between interval '-1'
 second and interval '1' second)*/
 and prior blocking_inst_id = instance_number
 and prior blocking_session = session_id
 and prior blocking_session_serial# = session_serial#;

6. 基於第5步的原理來找出每個採樣點的最終top holder:

比如如下SQL列出了每個採樣點top 2的blocker session，並且計算了其最終阻塞的session數(參考blocking_session_count)

select t.lv,
 t.iscycle,
 t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.session_id,
 t.sql_id,
 t.session_type,
 t.event,
 t.seq#,
 t.session_state,
 t.blocking_inst_id,
 t.blocking_session,
 t.blocking_session_status,
 t.c blocking_session_count
 from (select t.*,
 row_number() over(partition by dbid, instance_number, sample_time order by c desc) r
 from (select t.*,
 count(*) over(partition by dbid, instance_number, sample_time, session_id) c,
 row_number() over(partition by dbid, instance_number, sample_time, session_id order by 1) r1
 from (select /*+ parallel 8 */
 level lv,
 connect_by_isleaf isleaf,
 connect_by_iscycle iscycle,
 t.*
 from m_ash t
 /*where sample_time >
 to_timestamp('2013-11-17 13:55:00',
 'yyyy-mm-dd hh24:mi:ss')
 and sample_time <
 to_timestamp('2013-11-17 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 start with blocking_session is not null
 connect by nocycle
 prior dbid = dbid
 and prior sample_time = sample_time
 /*and ((prior sample_time) - sample_time between interval '-1'
 second and interval '1' second)*/
 and prior blocking_inst_id = instance_number
 and prior blocking_session = session_id
 and prior
 blocking_session_serial# = session_serial#) t
 where t.isleaf = 1) t
 where r1 = 1) t
 where r < 3
 order by blocking_session_count desc,dbid, sample_time, r;

注意以上輸出，比如第一行，代表在22-AUG-19 02.00.05.077 PM，節點2的session 1090最終阻塞了7個session. 順著時間往下看，可見節點2存在多個session都堵塞了接近10個session,其中會話10是問題期間最嚴重的holder，並且它持續等待gc cr block busy，注意觀察其seq#會發現該event的seq#在不斷變化，表明該session並未完全hang住，涉及sql為4ksvn2rgjnhcm，可以結合Scheduler/MMAN/MMNL的trace以及dba_hist_memory_resize_ops的輸出進一步確定問題。

7.查看具體sql信息

select listagg(sql_text,' ') within group (order by piece)
 from v$sqltext
 where sql_id = '4ksvn2rgjnhcm'
 group by sql_id

概述

今天主要介紹一種通過dba_hist_active_sess_history的數據來分析問題的一種方法，雖然通過awr也可以獲取相關信息。

思路

1. Dump出問題期間的ASH數據：

基於dba_hist_active_sess_history創建一個新表m_ash

SQL> create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between 
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

2. 查看ASH時間範圍:

為了加快速度這裡採用了並行查詢。

alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select /*+ parallel 8 */
 t.dbid,
 t.instance_number,
 min(sample_time),
 max(sample_time),
 count(*) session_count
 from m_ash t
 group by t.dbid, t.instance_number
 order by dbid, instance_number;

從以上輸出可知該數據庫共2個節點，採樣時間共30分鐘，節點2的採樣比節點1要多很多，問題可能發生在節點2上。

3. 確認問題發生的精確時間範圍:

select /*+ parallel 8 */
 dbid, instance_number, sample_id, sample_time, count(*) session_count
 from m_ash t
 group by dbid, instance_number, sample_id, sample_time
 order by dbid, instance_number, sample_time;

注意觀察以上輸出的每個採樣點的active session的數量，數量突然變多往往意味著問題發生了。

注意: 觀察以上的輸出有無斷檔，比如某些時間沒有采樣。

4. 確定每個採樣點的top n event:

select t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.event,
 t.session_state,
 t.c session_count
 from (select t.*,
 rank() over(partition by dbid, instance_number, sample_time order by c desc) r
 from (select /*+ parallel 8 */
 t.*,
 count(*) over(partition by dbid, instance_number, sample_time, event) c,
 row_number() over(partition by dbid, instance_number, sample_time, event order by 1) r1
 from m_ash t
 /*where sample_time >
 to_timestamp('2019-08-22 13:59:00',
 'yyyy-mm-dd hh24:mi:ss')
 and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 ) t
 where r1 = 1) t
 where r < 3
 order by session_count desc,dbid, instance_number, sample_time, r;

5. 觀察每個採樣點的等待鏈

select /*+ parallel 8 */
 level lv,
 connect_by_isleaf isleaf,
 connect_by_iscycle iscycle,
 t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.session_id,
 t.sql_id,
 t.session_type,
 t.event,
 t.session_state,
 t.blocking_inst_id,
 t.blocking_session,
 t.blocking_session_status
 from m_ash t
/*where sample_time >
 to_timestamp('2019-08-22 13:55:00',
 'yyyy-mm-dd hh24:mi:ss')
and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 start with blocking_session is not null
connect by nocycle
 prior dbid = dbid
 and prior sample_time = sample_time
 /*and ((prior sample_time) - sample_time between interval '-1'
 second and interval '1' second)*/
 and prior blocking_inst_id = instance_number
 and prior blocking_session = session_id
 and prior blocking_session_serial# = session_serial#;

6. 基於第5步的原理來找出每個採樣點的最終top holder:

比如如下SQL列出了每個採樣點top 2的blocker session，並且計算了其最終阻塞的session數(參考blocking_session_count)

select t.lv,
 t.iscycle,
 t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.session_id,
 t.sql_id,
 t.session_type,
 t.event,
 t.seq#,
 t.session_state,
 t.blocking_inst_id,
 t.blocking_session,
 t.blocking_session_status,
 t.c blocking_session_count
 from (select t.*,
 row_number() over(partition by dbid, instance_number, sample_time order by c desc) r
 from (select t.*,
 count(*) over(partition by dbid, instance_number, sample_time, session_id) c,
 row_number() over(partition by dbid, instance_number, sample_time, session_id order by 1) r1
 from (select /*+ parallel 8 */
 level lv,
 connect_by_isleaf isleaf,
 connect_by_iscycle iscycle,
 t.*
 from m_ash t
 /*where sample_time >
 to_timestamp('2013-11-17 13:55:00',
 'yyyy-mm-dd hh24:mi:ss')
 and sample_time <
 to_timestamp('2013-11-17 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 start with blocking_session is not null
 connect by nocycle
 prior dbid = dbid
 and prior sample_time = sample_time
 /*and ((prior sample_time) - sample_time between interval '-1'
 second and interval '1' second)*/
 and prior blocking_inst_id = instance_number
 and prior blocking_session = session_id
 and prior
 blocking_session_serial# = session_serial#) t
 where t.isleaf = 1) t
 where r1 = 1) t
 where r < 3
 order by blocking_session_count desc,dbid, sample_time, r;

7.查看具體sql信息

select listagg(sql_text,' ') within group (order by piece)
 from v$sqltext
 where sql_id = '4ksvn2rgjnhcm'
 group by sql_id

注意：

1) blocking_session_count 指某一個holder最終阻塞的session數，比如 a <- b<- c （a被b阻塞,b又被c阻塞），只計算c阻塞了1個session，因為中間的b可能在不同的阻塞鏈中發生重複。

2) 如果最終的holder沒有被ash採樣(一般因為該holder處於空閒)，比如 a<- c 並且b<- c (a被c阻塞，並且b也被c阻塞)，但是c沒有采樣，那麼以上腳本無法將c統計到最終holder裡，這可能會導致一些遺漏。

3) 注意比較blocking_session_count的數量與第3步查詢的每個採樣點的總session_count數，如果每個採樣點的blocking_session_count數遠小於總session_count數，那表明大部分session並未記載holder，因此本查詢的結果並不能代表什麼。

後面會分享更多devops和DBA方面的內容，感興趣的朋友可以關注下~

概述

今天主要介紹一種通過dba_hist_active_sess_history的數據來分析問題的一種方法，雖然通過awr也可以獲取相關信息。

思路

1. Dump出問題期間的ASH數據：

基於dba_hist_active_sess_history創建一個新表m_ash

SQL> create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between 
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');

2. 查看ASH時間範圍:

為了加快速度這裡採用了並行查詢。

alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select /*+ parallel 8 */
 t.dbid,
 t.instance_number,
 min(sample_time),
 max(sample_time),
 count(*) session_count
 from m_ash t
 group by t.dbid, t.instance_number
 order by dbid, instance_number;

從以上輸出可知該數據庫共2個節點，採樣時間共30分鐘，節點2的採樣比節點1要多很多，問題可能發生在節點2上。

3. 確認問題發生的精確時間範圍:

select /*+ parallel 8 */
 dbid, instance_number, sample_id, sample_time, count(*) session_count
 from m_ash t
 group by dbid, instance_number, sample_id, sample_time
 order by dbid, instance_number, sample_time;

注意觀察以上輸出的每個採樣點的active session的數量，數量突然變多往往意味著問題發生了。

注意: 觀察以上的輸出有無斷檔，比如某些時間沒有采樣。

4. 確定每個採樣點的top n event:

select t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.event,
 t.session_state,
 t.c session_count
 from (select t.*,
 rank() over(partition by dbid, instance_number, sample_time order by c desc) r
 from (select /*+ parallel 8 */
 t.*,
 count(*) over(partition by dbid, instance_number, sample_time, event) c,
 row_number() over(partition by dbid, instance_number, sample_time, event order by 1) r1
 from m_ash t
 /*where sample_time >
 to_timestamp('2019-08-22 13:59:00',
 'yyyy-mm-dd hh24:mi:ss')
 and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 ) t
 where r1 = 1) t
 where r < 3
 order by session_count desc,dbid, instance_number, sample_time, r;

5. 觀察每個採樣點的等待鏈

select /*+ parallel 8 */
 level lv,
 connect_by_isleaf isleaf,
 connect_by_iscycle iscycle,
 t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.session_id,
 t.sql_id,
 t.session_type,
 t.event,
 t.session_state,
 t.blocking_inst_id,
 t.blocking_session,
 t.blocking_session_status
 from m_ash t
/*where sample_time >
 to_timestamp('2019-08-22 13:55:00',
 'yyyy-mm-dd hh24:mi:ss')
and sample_time <
 to_timestamp('2019-08-22 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 start with blocking_session is not null
connect by nocycle
 prior dbid = dbid
 and prior sample_time = sample_time
 /*and ((prior sample_time) - sample_time between interval '-1'
 second and interval '1' second)*/
 and prior blocking_inst_id = instance_number
 and prior blocking_session = session_id
 and prior blocking_session_serial# = session_serial#;

6. 基於第5步的原理來找出每個採樣點的最終top holder:

比如如下SQL列出了每個採樣點top 2的blocker session，並且計算了其最終阻塞的session數(參考blocking_session_count)

select t.lv,
 t.iscycle,
 t.dbid,
 t.sample_id,
 t.sample_time,
 t.instance_number,
 t.session_id,
 t.sql_id,
 t.session_type,
 t.event,
 t.seq#,
 t.session_state,
 t.blocking_inst_id,
 t.blocking_session,
 t.blocking_session_status,
 t.c blocking_session_count
 from (select t.*,
 row_number() over(partition by dbid, instance_number, sample_time order by c desc) r
 from (select t.*,
 count(*) over(partition by dbid, instance_number, sample_time, session_id) c,
 row_number() over(partition by dbid, instance_number, sample_time, session_id order by 1) r1
 from (select /*+ parallel 8 */
 level lv,
 connect_by_isleaf isleaf,
 connect_by_iscycle iscycle,
 t.*
 from m_ash t
 /*where sample_time >
 to_timestamp('2013-11-17 13:55:00',
 'yyyy-mm-dd hh24:mi:ss')
 and sample_time <
 to_timestamp('2013-11-17 14:10:00',
 'yyyy-mm-dd hh24:mi:ss')*/
 start with blocking_session is not null
 connect by nocycle
 prior dbid = dbid
 and prior sample_time = sample_time
 /*and ((prior sample_time) - sample_time between interval '-1'
 second and interval '1' second)*/
 and prior blocking_inst_id = instance_number
 and prior blocking_session = session_id
 and prior
 blocking_session_serial# = session_serial#) t
 where t.isleaf = 1) t
 where r1 = 1) t
 where r < 3
 order by blocking_session_count desc,dbid, sample_time, r;

7.查看具體sql信息

select listagg(sql_text,' ') within group (order by piece)
 from v$sqltext
 where sql_id = '4ksvn2rgjnhcm'
 group by sql_id

注意：

後面會分享更多devops和DBA方面的內容，感興趣的朋友可以關注下~

'詳解一張歷史數據庫性能的關鍵表--dba_hist_active_sess_history'

概述

思路

1. Dump出問題期間的ASH數據：

概述

思路

1. Dump出問題期間的ASH數據：

2. 查看ASH時間範圍:

概述

思路

1. Dump出問題期間的ASH數據：

2. 查看ASH時間範圍:

3. 確認問題發生的精確時間範圍:

概述

思路

1. Dump出問題期間的ASH數據：

2. 查看ASH時間範圍:

3. 確認問題發生的精確時間範圍:

概述

思路

1. Dump出問題期間的ASH數據：

2. 查看ASH時間範圍:

3. 確認問題發生的精確時間範圍:

4. 確定每個採樣點的top n event:

概述

思路

1. Dump出問題期間的ASH數據：

2. 查看ASH時間範圍:

3. 確認問題發生的精確時間範圍:

4. 確定每個採樣點的top n event:

概述

思路

1. Dump出問題期間的ASH數據：

2. 查看ASH時間範圍:

3. 確認問題發生的精確時間範圍:

4. 確定每個採樣點的top n event:

5. 觀察每個採樣點的等待鏈

概述

思路

1. Dump出問題期間的ASH數據：

2. 查看ASH時間範圍:

3. 確認問題發生的精確時間範圍:

4. 確定每個採樣點的top n event:

5. 觀察每個採樣點的等待鏈

6. 基於第5步的原理來找出每個採樣點的最終top holder:

概述

思路

1. Dump出問題期間的ASH數據：

2. 查看ASH時間範圍:

3. 確認問題發生的精確時間範圍:

4. 確定每個採樣點的top n event:

5. 觀察每個採樣點的等待鏈

6. 基於第5步的原理來找出每個採樣點的最終top holder:

7.查看具體sql信息

概述

思路

1. Dump出問題期間的ASH數據：

2. 查看ASH時間範圍:

3. 確認問題發生的精確時間範圍:

4. 確定每個採樣點的top n event:

5. 觀察每個採樣點的等待鏈

6. 基於第5步的原理來找出每個採樣點的最終top holder:

7.查看具體sql信息

概述

思路

1. Dump出問題期間的ASH數據：

2. 查看ASH時間範圍:

3. 確認問題發生的精確時間範圍:

4. 確定每個採樣點的top n event:

5. 觀察每個採樣點的等待鏈

6. 基於第5步的原理來找出每個採樣點的最終top holder:

7.查看具體sql信息

相關推薦