We recommend that you define key performance indicators (KPIs) for all of the components in your application scenario. Typically, you can use performance counters to define your KPIs.
This section lists KPIs that we’ve used to monitor application tier performance and user characteristics while running SQL Server in Azure. Note that the KPIs listed in this section are meant to be guidance only. You should define your KPIs based on the characteristics of your application’s components.
Typical SQL Server KPIs:
-
Maximum value for \Process(SQLServ)\% Processor Time
-
Average value for \Process(SQLServ)\% Processor Time
-
Maximum value for \Processor(_Total)\% Processor Time
-
Average value for \Processor(_Total)\% Processor Time
-
Maximum value for \SQLServer:SQL Statistics\Batch Requests/sec
-
Average value for \SQLServer:SQL Statistics\Batch Requests/sec
If your workload is likely to be I/O bound, you should also add the Logical Disk Performance Monitor counters referenced earlier. If you use additional SQL Server features, such as AlwaysOn, it is recommended that you add the appropriate performance monitor counters.
Typical web application tier KPIs:
-
Maximum value for \ASP.NET Applications (_Total_)\Requests/sec
-
Average value for \ASP.NET Applications (_Total_)\Requests/sec
-
Average value for \Memory\Available Mbytes
-
Maximum value for \Processor(_Total)\% Processor Time
-
Average value for \Processor(_Total)\% Processor Time
-
Average value for \ASP.NET\Request Wait Time
-
Average value for \ASP.NET\Request Execution Time
-
Average value for \ASP.NET\Requests Queued
-
Average value for \ASP.NET\Requests Rejected
-
Average value for \ASP.NET\Requests Current
Typical user/test characteristics KPI:
-
Number of concurrent users
-
Average/Max request execution time
-
Number of web servers
-
Ramp up period, test method
-
Start and end time of test
How to use KPIs
First, identify five to eight KPIs for each application component, such as SQL Server, application tier, and a similar number for the test characteristics KPI’s. The KPIs you choose should be a subset of overall performance counter collection. Note that these KPIs represent an overall performance of the application during different time intervals. In a production environment, you may notice KPIs return different information during the day. We recommend that you calculate and analyze KPIs regularly by using tools, such as Excel. For example, you should be able to make assertions such as “When we scale from one application server to two application servers, the CPU utilization on my SQL Server increases by 2.5x”. In this example, you could continue investigating further using detailed Performance Monitor logs and DMV stats to understand why SQL Server CPU utilization has increased by 2.5x and understand whether this is a normal characteristic of our workload or there is an issue we need to investigate.
You can automate the collection of the raw performance logs and the calculation of the KPIs by using the Logman and Log Parser tools.
SQL Server troubleshooting scripts Snapshot wait stats script
/* Snapshot the current wait stats and store them so that they can be compared over a time period
Return the statistics between this point in time and the last collection point in time.
*/
use tempdb
go
if exists (select * from sys.objects where name = 'snap_waits')
drop procedure snap_waits
go
create procedure snap_waits
as
declare @current_snap_time datetime
declare @previous_snap_time datetime
set @current_snap_time = GETDATE()
if not exists(select name from tempdb.sys.sysobjects where name like 'wait_stats%')
create table wait_stats
(
wait_type varchar(128)
,waiting_tasks_count bigint
,wait_time_ms bigint
,avg_wait_time_ms int
,max_wait_time_ms bigint
,signal_wait_time_ms bigint
,avg_signal_wait_time int
,snap_time datetime
)
insert into wait_stats (
wait_type
,waiting_tasks_count
,wait_time_ms
,max_wait_time_ms
,signal_wait_time_ms
,snap_time
)
select
wait_type
,waiting_tasks_count
,wait_time_ms
,max_wait_time_ms
,signal_wait_time_ms
,@current_snap_time
from sys.dm_os_wait_stats
--get the previous collection point
select top 1 @previous_snap_time = snap_time from wait_stats
where snap_time < (select max(snap_time) from wait_stats)
order by snap_time desc
--get delta in the wait stats
select top 10
s.wait_type
, (e.wait_time_ms - s.wait_time_ms)/((e.waiting_tasks_count - s.waiting_tasks_count)) as [avg_wait_time_ms]
,(e.signal_wait_time_ms - s.signal_wait_time_ms)/((e.waiting_tasks_count - s.waiting_tasks_count)) as [avg_signal_time_ms]
, (e.waiting_tasks_count - s.waiting_tasks_count) as [waiting_tasks_count]
, (e.wait_time_ms - s.wait_time_ms) as [wait_time_ms]
, (e.max_wait_time_ms) as [max_wait_time_ms]
, (e.signal_wait_time_ms - s.signal_wait_time_ms) as [signal_wait_time_ms]
, s.snap_time as [start_time]
, e.snap_time as [end_time]
, DATEDIFF(ss, s.snap_time, e.snap_time) as [seconds_in_sample]
from wait_stats e
inner join (
select * from wait_stats
where snap_time = @previous_snap_time
) s on (s.wait_type = e.wait_type)
where
e.snap_time = @current_snap_time
and s.snap_time = @previous_snap_time
and e.wait_time_ms - s.wait_time_ms > 0
and e.waiting_tasks_count - s.waiting_tasks_count > 0
and e.wait_type NOT IN ('LAZYWRITER_SLEEP', 'SQLTRACE_BUFFER_FLUSH'
, 'SOS_SCHEDULER_YIELD' ,'DBMIRRORING_CMD', 'BROKER_TASK_STOP'
, 'CLR_AUTO_EVENT', 'BROKER_RECEIVE_WAITFOR', 'WAITFOR'
, 'SLEEP_TASK', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT'
, 'FT_IFTS_SCHEDULER_IDLE_WAIT', 'BROKER_TO_FLUSH', 'XE_DISPATCHER_WAIT'
, 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP')
order by (e.wait_time_ms - s.wait_time_ms) desc
--clean up table
delete from wait_stats
where snap_time < @current_snap_time
go
exec snap_waits
Requests executing on the system script
/* requests executing on the system
*****************************************************************/
select r.session_id
,blocking_session_id
,wait_type
,wait_time
,wait_resource
,r.status
,r.cpu_time
,r.total_elapsed_time
,r.reads
,s.reads [session reads]
,s.logical_reads [session logical reads]
,r.writes
,r.logical_reads
--,r.scheduler_id
,s.host_name
,qt.dbid
,qt.objectid
,substring(substring(qt.text,r.statement_start_offset/2+1,
(case when r.statement_end_offset = -1
then len(convert(nvarchar(max), qt.text)) * 2
else r.statement_end_offset end - r.statement_start_offset)/2)
, 1, 255) --substring
as statement_text
--,qp.query_plan
from sys.dm_exec_requests r
inner join sys.dm_exec_sessions s on (s.session_id = r.session_id)
cross apply sys.dm_exec_sql_text(sql_handle) as qt
cross apply sys.dm_exec_query_plan (plan_handle) as qp
where r.session_id > 50
--and r.session_id = 55
order by r.total_elapsed_time desc --r.status -- r.scheduler_id, r.status, r.session_id
Top query statements and plan by total CPU time
/* Top statements by total CPU time
************************************************/
SELECT TOP 25
SUBSTRING(qt.text,qs.statement_start_offset/2+1,
(case when qs.statement_end_offset = -1
then len(convert(nvarchar(max), qt.text)) * 2
else qs.statement_end_offset end -qs.statement_start_offset)/2)
as statement_text,
--substring (qt.text , 1, 512) as batch_text,
qs.total_worker_time/qs.execution_count as average_cpu_time,
qs.total_elapsed_time/qs.execution_count as average_elapsed_time,
qs.total_logical_reads/qs.execution_count as average_logical_reads,
qs.total_logical_writes/qs.execution_count as average_logical_writes,
qs.execution_count,
qs.plan_generation_num,
qs.total_worker_time,
qs.total_elapsed_time,
cast((cast(qs.total_worker_time as decimal) / cast(qs.total_elapsed_time as decimal) * 100) as int) as cpu_vs_elapsed_percentage,
qs.total_logical_reads,
qs.total_logical_writes,
db_name(qt.dbid) as [database name],
qs.plan_handle,
qt.objectid
,qp.query_plan
FROM sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as qt
cross apply sys.dm_exec_query_plan(qs.plan_handle) as qp
--order by qs.total_logical_reads/qs.execution_count desc
--ORDER BY total_logical_reads desc
--ORDER BY total_elapsed_time desc
ORDER BY total_worker_time desc
--115982764 (with lock)
--31250708
/* Snapshot the current spinlock stats and store so that this can be compared over a time period
Return the statistics between this point in time and the last collection point in time.*/
use tempdb
go
if exists (select * from sys.objects where name = 'snap_spins')
drop procedure snap_spins
go
create procedure snap_spins
as
declare @current_snap_time datetime
declare @previous_snap_time datetime
set @current_snap_time = GETDATE()
if not exists(select name from tempdb.sys.sysobjects where name like 'spin_waits%')
create table spin_waits
(
lock_name varchar(128)
,collisions bigint
,spins bigint
,sleep_time bigint
,backoffs bigint
,snap_time datetime
)
--capture the current stats
insert into spin_waits
(
lock_name
,collisions
,spins
,sleep_time
,backoffs
,snap_time
)
select name
,collisions
,spins
,sleep_time
,backoffs
,@current_snap_time
from sys.dm_os_spinlock_stats
select top 1 @previous_snap_time = snap_time from spin_waits
where snap_time < (select max(snap_time) from spin_waits)
order by snap_time desc
--get delta in the spin locks stats
select top 10
spins_current.lock_name
, (spins_current.spins - spins_previous.spins) as spins
,(spins_current.backoffs - spins_previous.backoffs) as backoffs
, (spins_current.collisions - spins_previous.collisions) as collisions
, (spins_current.sleep_time - spins_previous.sleep_time) as sleep_time
, spins_previous.snap_time as [start_time]
, spins_current.snap_time as [end_time]
, DATEDIFF(ss, @previous_snap_time, @current_snap_time) as [seconds_in_sample]
from spin_waits spins_current
inner join (
select * from spin_waits
where snap_time = @previous_snap_time
) spins_previous on (spins_previous.lock_name = spins_current.lock_name)
where
spins_current.snap_time = @current_snap_time
and spins_previous.snap_time = @previous_snap_time
and spins_current.spins > 0
order by (spins_current.spins - spins_previous.spins) desc
--clean up table
delete from spin_waits
where snap_time < @current_snap_time
go
exec snap_spins
Snapshot I/O stats
/* Snapshot the current file stats and store them so that they can be compared over a time period
Return the statistics between this point in time and the last collection point in time.
This uses a persisted table in tempdb. After the needed data is captured, drop this table.
use tempdb
go
drop table _iostats_
*/
use tempdb
go
if not exists(select name from tempdb.sys.sysobjects where name like '_iostats_')
create table _iostats_
(
database_id int
,file_id int
,file_guid uniqueidentifier
,num_of_bytes_read bigint
,num_of_bytes_written bigint
,num_of_reads bigint
,num_of_writes bigint
,io_stall_write_ms bigint
,io_stall_read_ms bigint
,size_on_disk_bytes bigint
,physical_name nvarchar(260)
,type_desc nvarchar(60)
,snap_time datetime
)
declare @current_snap_time datetime
declare @previous_snap_time datetime
set @current_snap_time = GETDATE()
insert into _iostats_ (
database_id
,file_id
,file_guid
,num_of_bytes_read
,num_of_bytes_written
,num_of_reads
,num_of_writes
,io_stall_write_ms
,io_stall_read_ms
,size_on_disk_bytes
,physical_name
,type_desc
,snap_time
)
select
vfs.database_id
,vfs.file_id
,mf.file_guid
,vfs.num_of_bytes_read
,vfs.num_of_bytes_written
,vfs.num_of_reads
,vfs.num_of_writes
,vfs.io_stall_write_ms
,vfs.io_stall_read_ms
,vfs.size_on_disk_bytes
,mf.physical_name
,mf.type_desc
,@current_snap_time
from sys.dm_io_virtual_file_stats(null, null) as vfs
join sys.master_files as mf on vfs.database_id = mf.database_id and vfs.file_id = mf.file_id
where vfs.database_id > 4 or vfs.database_id = 2
order by vfs.database_id, vfs.file_id
select top 1 @previous_snap_time = snap_time from _iostats_
where snap_time < (
select max(snap_time) from _iostats_) order by snap_time desc
print 'Current snap time: ' + convert(varchar(32), @current_snap_time)
print 'Previous snap time: ' + convert(varchar(32), @previous_snap_time)
declare @tick_count_between_snaps bigint
set @tick_count_between_snaps = DATEDIFF(ms, @previous_snap_time, @current_snap_time)
print '@tick_count_between_snaps: ' + convert(varchar(32), @tick_count_between_snaps)
select
time_in_sample_secs = DATEDIFF(s, @previous_snap_time, @current_snap_time)
,db = db_name(iostats_now.database_id)
,iostats_now.physical_name
,iostats_now.type_desc
,iostats_now.file_id
,(iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read) num_of_bytes_read
,(iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written) num_of_bytes_written
,(iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) num_of_reads
,(iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) num_of_writes
,avg_read_IOPs = case when (iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) = 0 then 0 else ((iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) /(@tick_count_between_snaps/1000)) end
,avg_read_bytes_sec = case when (iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read) = 0 then 0 else ((iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read)/(@tick_count_between_snaps/1000)) end
,avg_read_stall_ms = case when (iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) = 0 then 0 else ((iostats_now.io_stall_read_ms - iostats_lastsnap.io_stall_read_ms) /(iostats_now.num_of_reads - iostats_lastsnap.num_of_reads)) end
,avg_read_size = case when (iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) = 0 then 0 else ((iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read)/(iostats_now.num_of_reads - iostats_lastsnap.num_of_reads)) end
,avg_write_IOPs = case when (iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) = 0 then 0 else ((iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) /(@tick_count_between_snaps/1000)) end
,avg_write_bytes_sec = case when (iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written) = 0 then 0 else ((iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written)/(@tick_count_between_snaps/1000)) end
,avg_write_stall_ms = case when (iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) = 0 then 0 else ((iostats_now.io_stall_write_ms - iostats_lastsnap.io_stall_write_ms) /(iostats_now.num_of_writes - iostats_lastsnap.num_of_writes)) end
,avg_write_size = case when (iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) = 0 then 0 else ((iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written)/(iostats_now.num_of_writes - iostats_lastsnap.num_of_writes)) end
,iostats_now.size_on_disk_bytes
,filegrowth = iostats_now.size_on_disk_bytes - iostats_lastsnap.size_on_disk_bytes
,iostats_now.file_guid
from _iostats_ as iostats_now
inner join
(select * from _iostats_
where snap_time = @previous_snap_time)
iostats_lastsnap on ( --iostats_lastsnap.file_guid = iostats_now.file_guid
iostats_lastsnap.file_id = iostats_now.file_id AND iostats_lastsnap.database_id = iostats_now.database_id
)
where (iostats_now.database_id > 4 or iostats_now.database_id = 2)
and iostats_now.snap_time = @current_snap_time
and iostats_lastsnap.snap_time = @previous_snap_time
order by iostats_now.database_id asc, iostats_now.file_id asc
--clean up
delete from _iostats_
where snap_time = @previous_snap_time
--drop table _iostats_
In addition to the scripts provided earlier, you should capture the following performance monitor counters over the same time period.
\LogicalDisk(*)\Avg. Disk Bytes/Read
\LogicalDisk(*)\Avg. Disk Bytes/Write
\LogicalDisk(*)\Avg. Disk Queue Length
\LogicalDisk(*)\Avg. Disk sec/Read
\LogicalDisk(*)\Avg. Disk sec/Write
\LogicalDisk(*)\Current Disk Queue Length
\LogicalDisk(*)\Disk Read Bytes/sec
\LogicalDisk(*)\Disk Reads/sec
\LogicalDisk(*)\Disk Write Bytes/sec
\LogicalDisk(*)\Disk Writes/sec
\Memory\Available MBytes
\Memory\Free System Page Table Entries
\Memory\Pages/sec
\Network Interface(*)\*
\Process(*)\*
\Processor(*)\% Privileged Time
\Processor(*)\% Processor Time
\SQLServer:Availability Replica(*)\*
\SQLServer:Access Methods\*
\SQLServer:Buffer Manager\*
\SQLServer:Buffer Node\*
\SQLServer:Databases(*)\*
\SQLServer:Database Replica(*)\*
\SQLServer:General Statistics\*
\SQLServer:Latches\*
\SQLServer:Locks(_Total)\Average Wait Time (ms)
\SQLServer:Locks(_Total)\Lock Requests/sec
\SQLServer:Memory Manager\*
\SQLServer:Plan Cache\*
\SQLServer:Wait Statistics\*
\SQLServer:SQL Statistics\*
\System\Context Switches/sec
\System\Processor Queue Length
Note that we also recommend that you capture all replication counters when using replication.
The logman utility can be used to create the Performance Monitor counter set. Simply copy the above counter definition listed here into a file, such as counters.txt, and then run the following at the command prompt:
logman create counter –cf counters.txt
Dostları ilə paylaş: |