Authors: Silvano Coriani, Jasraj Dange, Ewan Fairweather, Xin Jin, Alexei Khalyako, Sanjay Mishra, Selcin Turkarslan Technical Reviewers

Guidance on defining key performance indicators

Yüklə 268,61 Kb.

səhifə	8/8
tarix	16.08.2018
ölçüsü	268,61 Kb.
	#63132

1 2 3 4 5 6 7 8

How to use KPIs
SQL Server troubleshooting scripts
Requests executing on the system script
Top query statements and plan by total CPU time
Snapshot spinlock stats
Snapshot I/O stats
Performance monitor

Guidance on defining key performance indicators

We recommend that you define key performance indicators (KPIs) for all of the components in your application scenario. Typically, you can use performance counters to define your KPIs.

This section lists KPIs that we’ve used to monitor application tier performance and user characteristics while running SQL Server in Azure. Note that the KPIs listed in this section are meant to be guidance only. You should define your KPIs based on the characteristics of your application’s components.

Typical SQL Server KPIs:

Maximum value for \Process(SQLServ)\% Processor Time
Average value for \Process(SQLServ)\% Processor Time
Maximum value for \Processor(_Total)\% Processor Time
Average value for \Processor(_Total)\% Processor Time
Maximum value for \SQLServer:SQL Statistics\Batch Requests/sec
Average value for \SQLServer:SQL Statistics\Batch Requests/sec

If your workload is likely to be I/O bound, you should also add the Logical Disk Performance Monitor counters referenced earlier. If you use additional SQL Server features, such as AlwaysOn, it is recommended that you add the appropriate performance monitor counters.

Typical web application tier KPIs:

Maximum value for \ASP.NET Applications (_Total_)\Requests/sec
Average value for \ASP.NET Applications (_Total_)\Requests/sec
Average value for \Memory\Available Mbytes
Maximum value for \Processor(_Total)\% Processor Time
Average value for \Processor(_Total)\% Processor Time
Average value for \ASP.NET\Request Wait Time
Average value for \ASP.NET\Request Execution Time
Average value for \ASP.NET\Requests Queued
Average value for \ASP.NET\Requests Rejected
Average value for \ASP.NET\Requests Current

Typical user/test characteristics KPI:

Number of concurrent users
Average/Max request execution time
Number of web servers
Ramp up period, test method
Start and end time of test

How to use KPIs

First, identify five to eight KPIs for each application component, such as SQL Server, application tier, and a similar number for the test characteristics KPI’s. The KPIs you choose should be a subset of overall performance counter collection. Note that these KPIs represent an overall performance of the application during different time intervals. In a production environment, you may notice KPIs return different information during the day. We recommend that you calculate and analyze KPIs regularly by using tools, such as Excel. For example, you should be able to make assertions such as “When we scale from one application server to two application servers, the CPU utilization on my SQL Server increases by 2.5x”. In this example, you could continue investigating further using detailed Performance Monitor logs and DMV stats to understand why SQL Server CPU utilization has increased by 2.5x and understand whether this is a normal characteristic of our workload or there is an issue we need to investigate.

You can automate the collection of the raw performance logs and the calculation of the KPIs by using the Logman and Log Parser tools.

SQL Server troubleshooting scripts

Snapshot wait stats script

/* Snapshot the current wait stats and store them so that they can be compared over a time period

Return the statistics between this point in time and the last collection point in time.

use tempdb

go
if exists (select * from sys.objects where name = 'snap_waits')

drop procedure snap_waits

create procedure snap_waits

declare @current_snap_time datetime

declare @previous_snap_time datetime
set @current_snap_time = GETDATE()

if not exists(select name from tempdb.sys.sysobjects where name like 'wait_stats%')

create table wait_stats

(

wait_type varchar(128)

,waiting_tasks_count bigint

,wait_time_ms bigint

,avg_wait_time_ms int

,max_wait_time_ms bigint

,signal_wait_time_ms bigint

,avg_signal_wait_time int

,snap_time datetime

)
insert into wait_stats (

wait_type

,waiting_tasks_count

,wait_time_ms

,max_wait_time_ms

,signal_wait_time_ms

,snap_time

)

select

wait_type

,waiting_tasks_count

,wait_time_ms

,max_wait_time_ms

,signal_wait_time_ms

,@current_snap_time

from sys.dm_os_wait_stats
--get the previous collection point

select top 1 @previous_snap_time = snap_time from wait_stats

where snap_time < (select max(snap_time) from wait_stats)

order by snap_time desc

--get delta in the wait stats

select top 10

s.wait_type

, (e.wait_time_ms - s.wait_time_ms)/((e.waiting_tasks_count - s.waiting_tasks_count)) as [avg_wait_time_ms]

,(e.signal_wait_time_ms - s.signal_wait_time_ms)/((e.waiting_tasks_count - s.waiting_tasks_count)) as [avg_signal_time_ms]

, (e.waiting_tasks_count - s.waiting_tasks_count) as [waiting_tasks_count]

, (e.wait_time_ms - s.wait_time_ms) as [wait_time_ms]

, (e.max_wait_time_ms) as [max_wait_time_ms]

, (e.signal_wait_time_ms - s.signal_wait_time_ms) as [signal_wait_time_ms]

, s.snap_time as [start_time]

, e.snap_time as [end_time]

, DATEDIFF(ss, s.snap_time, e.snap_time) as [seconds_in_sample]

from wait_stats e

inner join (

select * from wait_stats

where snap_time = @previous_snap_time

) s on (s.wait_type = e.wait_type)

where

e.snap_time = @current_snap_time

and s.snap_time = @previous_snap_time

and e.wait_time_ms - s.wait_time_ms > 0

and e.waiting_tasks_count - s.waiting_tasks_count > 0

and e.wait_type NOT IN ('LAZYWRITER_SLEEP', 'SQLTRACE_BUFFER_FLUSH'

, 'SOS_SCHEDULER_YIELD' ,'DBMIRRORING_CMD', 'BROKER_TASK_STOP'

, 'CLR_AUTO_EVENT', 'BROKER_RECEIVE_WAITFOR', 'WAITFOR'

, 'SLEEP_TASK', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT'

, 'FT_IFTS_SCHEDULER_IDLE_WAIT', 'BROKER_TO_FLUSH', 'XE_DISPATCHER_WAIT'

, 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP')

order by (e.wait_time_ms - s.wait_time_ms) desc

--clean up table

delete from wait_stats

where snap_time < @current_snap_time

go
exec snap_waits

Requests executing on the system script

/* requests executing on the system

*****************************************************************/

select r.session_id

,blocking_session_id

,wait_type

,wait_time

,wait_resource

,r.status

,r.cpu_time

,r.total_elapsed_time

,r.reads

,s.reads [session reads]

,s.logical_reads [session logical reads]

,r.writes

,r.logical_reads

--,r.scheduler_id

,s.host_name

,qt.dbid

,qt.objectid

,substring(substring(qt.text,r.statement_start_offset/2+1,

(case when r.statement_end_offset = -1

then len(convert(nvarchar(max), qt.text)) * 2

else r.statement_end_offset end - r.statement_start_offset)/2)

, 1, 255) --substring

as statement_text

--,qp.query_plan

from sys.dm_exec_requests r

inner join sys.dm_exec_sessions s on (s.session_id = r.session_id)

cross apply sys.dm_exec_sql_text(sql_handle) as qt

cross apply sys.dm_exec_query_plan (plan_handle) as qp

where r.session_id > 50

--and r.session_id = 55

order by r.total_elapsed_time desc --r.status -- r.scheduler_id, r.status, r.session_id

Top query statements and plan by total CPU time

/* Top statements by total CPU time

************************************************/

SELECT TOP 25

SUBSTRING(qt.text,qs.statement_start_offset/2+1,

(case when qs.statement_end_offset = -1

then len(convert(nvarchar(max), qt.text)) * 2

else qs.statement_end_offset end -qs.statement_start_offset)/2)

as statement_text,

--substring (qt.text , 1, 512) as batch_text,

qs.total_worker_time/qs.execution_count as average_cpu_time,

qs.total_elapsed_time/qs.execution_count as average_elapsed_time,

qs.total_logical_reads/qs.execution_count as average_logical_reads,

qs.total_logical_writes/qs.execution_count as average_logical_writes,

qs.execution_count,

qs.plan_generation_num,

qs.total_worker_time,

qs.total_elapsed_time,

cast((cast(qs.total_worker_time as decimal) / cast(qs.total_elapsed_time as decimal) * 100) as int) as cpu_vs_elapsed_percentage,

qs.total_logical_reads,

qs.total_logical_writes,

db_name(qt.dbid) as [database name],

qs.plan_handle,

qt.objectid

,qp.query_plan

FROM sys.dm_exec_query_stats qs

cross apply sys.dm_exec_sql_text(qs.sql_handle) as qt

cross apply sys.dm_exec_query_plan(qs.plan_handle) as qp

--order by qs.total_logical_reads/qs.execution_count desc

--ORDER BY total_logical_reads desc

--ORDER BY total_elapsed_time desc

ORDER BY total_worker_time desc

Snapshot spinlock stats

--115982764 (with lock)

--31250708

/* Snapshot the current spinlock stats and store so that this can be compared over a time period

Return the statistics between this point in time and the last collection point in time.*/

use tempdb

if exists (select * from sys.objects where name = 'snap_spins')

drop procedure snap_spins

create procedure snap_spins

declare @current_snap_time datetime

declare @previous_snap_time datetime
set @current_snap_time = GETDATE()
if not exists(select name from tempdb.sys.sysobjects where name like 'spin_waits%')

create table spin_waits

(

lock_name varchar(128)

,collisions bigint

,spins bigint

,sleep_time bigint

,backoffs bigint

,snap_time datetime

)
--capture the current stats

insert into spin_waits

(

lock_name

,collisions

,spins

,sleep_time

,backoffs

,snap_time

)

select name

,collisions

,spins

,sleep_time

,backoffs

,@current_snap_time

from sys.dm_os_spinlock_stats

select top 1 @previous_snap_time = snap_time from spin_waits

where snap_time < (select max(snap_time) from spin_waits)

order by snap_time desc
--get delta in the spin locks stats

select top 10

spins_current.lock_name

, (spins_current.spins - spins_previous.spins) as spins

,(spins_current.backoffs - spins_previous.backoffs) as backoffs

, (spins_current.collisions - spins_previous.collisions) as collisions

, (spins_current.sleep_time - spins_previous.sleep_time) as sleep_time

, spins_previous.snap_time as [start_time]

, spins_current.snap_time as [end_time]

, DATEDIFF(ss, @previous_snap_time, @current_snap_time) as [seconds_in_sample]

from spin_waits spins_current

inner join (

select * from spin_waits

where snap_time = @previous_snap_time

) spins_previous on (spins_previous.lock_name = spins_current.lock_name)

where

spins_current.snap_time = @current_snap_time

and spins_previous.snap_time = @previous_snap_time

and spins_current.spins > 0

order by (spins_current.spins - spins_previous.spins) desc

--clean up table

delete from spin_waits

where snap_time < @current_snap_time

go
exec snap_spins

Snapshot I/O stats

/* Snapshot the current file stats and store them so that they can be compared over a time period

Return the statistics between this point in time and the last collection point in time.

This uses a persisted table in tempdb. After the needed data is captured, drop this table.

use tempdb

drop table _iostats_

use tempdb

if not exists(select name from tempdb.sys.sysobjects where name like '_iostats_')

create table _iostats_

(

database_id int

,file_id int

,file_guid uniqueidentifier

,num_of_bytes_read bigint

,num_of_bytes_written bigint

,num_of_reads bigint

,num_of_writes bigint

,io_stall_write_ms bigint

,io_stall_read_ms bigint

,size_on_disk_bytes bigint

,physical_name nvarchar(260)

,type_desc nvarchar(60)

,snap_time datetime

)
declare @current_snap_time datetime

declare @previous_snap_time datetime

set @current_snap_time = GETDATE()

insert into _iostats_ (

database_id

,file_id

,file_guid

,num_of_bytes_read

,num_of_bytes_written

,num_of_reads

,num_of_writes

,io_stall_write_ms

,io_stall_read_ms

,size_on_disk_bytes

,physical_name

,type_desc

,snap_time

)

select

vfs.database_id

,vfs.file_id

,mf.file_guid

,vfs.num_of_bytes_read

,vfs.num_of_bytes_written

,vfs.num_of_reads

,vfs.num_of_writes

,vfs.io_stall_write_ms

,vfs.io_stall_read_ms

,vfs.size_on_disk_bytes

,mf.physical_name

,mf.type_desc

,@current_snap_time

from sys.dm_io_virtual_file_stats(null, null) as vfs

join sys.master_files as mf on vfs.database_id = mf.database_id and vfs.file_id = mf.file_id

where vfs.database_id > 4 or vfs.database_id = 2

order by vfs.database_id, vfs.file_id
select top 1 @previous_snap_time = snap_time from _iostats_

where snap_time < (

select max(snap_time) from _iostats_) order by snap_time desc
print 'Current snap time: ' + convert(varchar(32), @current_snap_time)

print 'Previous snap time: ' + convert(varchar(32), @previous_snap_time)

declare @tick_count_between_snaps bigint

set @tick_count_between_snaps = DATEDIFF(ms, @previous_snap_time, @current_snap_time)

print '@tick_count_between_snaps: ' + convert(varchar(32), @tick_count_between_snaps)

select

time_in_sample_secs = DATEDIFF(s, @previous_snap_time, @current_snap_time)

,db = db_name(iostats_now.database_id)

,iostats_now.physical_name

,iostats_now.type_desc

,iostats_now.file_id

,(iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read) num_of_bytes_read

,(iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written) num_of_bytes_written

,(iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) num_of_reads

,(iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) num_of_writes

,avg_read_IOPs = case when (iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) = 0 then 0 else ((iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) /(@tick_count_between_snaps/1000)) end

,avg_read_bytes_sec = case when (iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read) = 0 then 0 else ((iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read)/(@tick_count_between_snaps/1000)) end

,avg_read_stall_ms = case when (iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) = 0 then 0 else ((iostats_now.io_stall_read_ms - iostats_lastsnap.io_stall_read_ms) /(iostats_now.num_of_reads - iostats_lastsnap.num_of_reads)) end

,avg_read_size = case when (iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) = 0 then 0 else ((iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read)/(iostats_now.num_of_reads - iostats_lastsnap.num_of_reads)) end

,avg_write_IOPs = case when (iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) = 0 then 0 else ((iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) /(@tick_count_between_snaps/1000)) end

,avg_write_bytes_sec = case when (iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written) = 0 then 0 else ((iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written)/(@tick_count_between_snaps/1000)) end

,avg_write_stall_ms = case when (iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) = 0 then 0 else ((iostats_now.io_stall_write_ms - iostats_lastsnap.io_stall_write_ms) /(iostats_now.num_of_writes - iostats_lastsnap.num_of_writes)) end

,avg_write_size = case when (iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) = 0 then 0 else ((iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written)/(iostats_now.num_of_writes - iostats_lastsnap.num_of_writes)) end

,iostats_now.size_on_disk_bytes

,filegrowth = iostats_now.size_on_disk_bytes - iostats_lastsnap.size_on_disk_bytes

,iostats_now.file_guid

from _iostats_ as iostats_now

inner join

(select * from _iostats_

where snap_time = @previous_snap_time)

iostats_lastsnap on ( --iostats_lastsnap.file_guid = iostats_now.file_guid

iostats_lastsnap.file_id = iostats_now.file_id AND iostats_lastsnap.database_id = iostats_now.database_id

)

where (iostats_now.database_id > 4 or iostats_now.database_id = 2)

and iostats_now.snap_time = @current_snap_time

and iostats_lastsnap.snap_time = @previous_snap_time

order by iostats_now.database_id asc, iostats_now.file_id asc
--clean up

delete from _iostats_

where snap_time = @previous_snap_time
--drop table _iostats_

Performance monitor

In addition to the scripts provided earlier, you should capture the following performance monitor counters over the same time period.
\LogicalDisk(*)\Avg. Disk Bytes/Read

\LogicalDisk(*)\Avg. Disk Bytes/Write

\LogicalDisk(*)\Avg. Disk Queue Length

\LogicalDisk(*)\Avg. Disk sec/Read

\LogicalDisk(*)\Avg. Disk sec/Write

\LogicalDisk(*)\Current Disk Queue Length

\LogicalDisk(*)\Disk Read Bytes/sec

\LogicalDisk(*)\Disk Reads/sec

\LogicalDisk(*)\Disk Write Bytes/sec

\LogicalDisk(*)\Disk Writes/sec

\Memory\Available MBytes

\Memory\Free System Page Table Entries

\Memory\Pages/sec

\Network Interface(*)\*

\Process(*)\*

\Processor(*)\% Privileged Time

\Processor(*)\% Processor Time

\SQLServer:Availability Replica(*)\*

\SQLServer:Access Methods\*

\SQLServer:Buffer Manager\*

\SQLServer:Buffer Node\*

\SQLServer:Databases(*)\*

\SQLServer:Database Replica(*)\*

\SQLServer:General Statistics\*

\SQLServer:Latches\*

\SQLServer:Locks(_Total)\Average Wait Time (ms)

\SQLServer:Locks(_Total)\Lock Requests/sec

\SQLServer:Memory Manager\*

\SQLServer:Plan Cache\*

\SQLServer:Wait Statistics\*

\SQLServer:SQL Statistics\*

\System\Context Switches/sec

\System\Processor Queue Length
Note that we also recommend that you capture all replication counters when using replication.

The logman utility can be used to create the Performance Monitor counter set. Simply copy the above counter definition listed here into a file, such as counters.txt, and then run the following at the command prompt:

logman create counter –cf counters.txt

Yüklə 268,61 Kb.

Dostları ilə paylaş:

1 2 3 4 5 6 7 8