Beating the performance gremlins

This article was originally published in VSJ, which is now part of Developer Fusion.
Application response times are lagging, the databases are crawling, and then ‘wham!’ total meltdown. It’s your database. You’ve been doing your utmost to keep it running with limited resources, but it’s time-consuming, expensive, and soul-destroying work. The fact is, no human being can ever account for all of the potential problems and inefficiencies that lie dormant in even the best-managed databases – that kind of thankless work requires automation. Would you even know if, behind the calm exterior, performance gremlins were causing havoc with your mission critical Oracle databases?

Does this sound all too familiar?

The performance and availability demands placed on databases today are huge. Databases must respond quickly, yet somehow also accommodate more users and data on almost a daily basis. Applications designed years ago for ad-hoc use by a small department are now expected to scale to multiple sites and be accessed on the Web by customers and third parties with 99 percent uptime. These burdens rest on our shoulders. Instead of managing one or two key databases, most of us are now responsible for maintaining a whole family of servers across several sites.

Grinding to a halt?

With technology running at its limits, and with our attention divided across multiple sites, many production systems are the perfect environment for harbouring performance gremlins – those hidden problems that work in the background, over time impeding the performance of a system until the point where it brings everything to a complete halt. Gremlins in your databases are a classic case of “what you don’t know WILL hurt you.” On the face of it, your systems may be performing well, but how well do you really understand the performance personality of your database?

The solution is obviously regular performance monitoring and profiling of all live DB systems. But how to start? You could be forgiven for thinking that performance analysis is a complex business. So many factors can influence the performance of an Oracle database that even seasoned professionals fall back on a tried and tested set of favourite SQL scripts as a plan of attack. But while using just one approach may in some cases bring the right result, knowing and understanding a range of performance analysis methodologies can radically expand your productivity. I’ll discuss these techniques and suggest some rules of thumb that might prove helpful on your “gremlin hunt.”

Broadly there are five key performance analysis techniques considered by Oracle professionals to be industry best practice.

  • Ratio-Based Analysis
  • Bottleneck Analysis
  • Workload Analysis
  • Storage Analysis
  • SQL Analysis

Ratio-based analysis

When faced with a performance hit, many experienced DBAs bring up a prized SQL script, which increases resources for the area (such as memory). Most often, this does the trick, but many times, not. That’s because ratio-based analysis is a broad-brush approach to database performance monitoring. However, if you have a series of databases to look after, and that’s reality for most of us, quickly scanning a list of key database indicators (ratios) of each one can help to flag potential issues.

One key tip with ratios is that it’s important to rely upon delta measurements rather than cumulative ones. Many of the ratios come from the V%sysstat performance view. This view maintains a count of all the occurrences of a particular database incident since the database was brought up. That means that where a database has been up for a long time, this will impact how a particular ratio is reported. For example, if a database has been running for many weeks the I/O stats will look enormous. However, if delta statistics are used, based upon a specified sample taking the before and after counts of each statistic that make up the ratio, then the portrayal will be much more current and accurate.

Bottleneck analysis

When an Oracle database is up and running, every connected process is either busy doing work or waiting to perform work. In terms of performance, a process that is waiting can mean nothing in the overall scheme of things or it can be an indicator that a database bottleneck exists.

If latch contention or heavy table scan activity has been dragging a database’s performance down, you can use bottleneck analysis to confirm the actual root cause. Once one or more wait events or other bottlenecks have been pinpointed as possible “performance gremlins”, you can determine what sessions and objects are causing the problem. So how do you correctly practice bottleneck or wait-based analysis?

In order to understand the impact of wait events on database performance, you need to discover both what the database is or has been waiting on, and the duration of the waits. Therefore it is imperative that the timed_statistics initialisation parameter is set to TRUE. By default this parameter is set to FALSE, which disables the collection of wait times for each wait event defined to the Oracle engine.

When using bottleneck analysis, you cannot rely only on the information contained in the Wait Event views that Oracle provides. For example, an object can attempt to extend into another extent of space in a tablespace and yet be denied if no such free space exists. Such a failure is not reflected in any wait event, but still represents a very real bottleneck to the database.

You should also filter out those events that have no bearing on bottlenecks. For example, the Oracle Wait Event user will record a statistic on how long a user will sit at their SQL*Plus prompt between each request. Such a statistic provides no real value to a DBA trying to figure out where in the database a bottleneck exists, but it is useful for working out if it is inside or outside the database. A list of such ‘idle’ events to eliminate includes:

  • lock element cleanup
  • pmon timer
  • rdbms ipc message
  • smon timer
  • SQL*Net message from client
  • SQL*Net break/reset to client
  • SQL*Net message to client
  • SQL*Net more data to client
  • dispatcher timer
  • Null event
  • parallel query dequeue wait
  • parallel query idle wait – Slaves
  • pipe get
  • PL/SQL lock timer
  • slave wait
  • virtual circuit status

Workload analysis

Ratios help DBAs achieve a global view of database activity. Bottleneck Analysis provides an insight into those things that are holding up user activity and throughput. But what about the ‘real world’? Who’s connected to the database? What resources are they using? What are they executing?

A handful of users can overwhelm the I/O capabilities of Oracle (through untuned queries or runaway batch jobs) or hammer the CPU or memory structures of the database and server. If you are seeing a slowdown in your database and can’t seem to find a root cause, take a look at both the active and inactive sessions.

If any Active session uses more than 50 percent of a total resource (CPU, memory, etc.), go into the session to find out what the user is currently executing. Conversely, a number of inactive users could indicate user sessions that have mistakenly been left logged on. Because each user thread consumes a portion of memory on the Oracle server, to reduce resource usage disconnect any sessions that don’t need a connection.

Storage analysis

Most DBAs focus on memory when tuning performance, but many don’t realise that storage issues can cause severe problems in your database, either bringing everything to a complete standstill or slowly degrading performance – our friend the “performance gremlin” again. You’ll need to investigate the following.

Total Used/Total Free Space
Although it’s useful to know the ratio of used to free space, a more detailed listing by tablespace is needed in order to determine where any actual space shortages exist in the database. If any tablespace begins to approach 90 percent usage (and the objects contained within it are dynamic and growing as opposed to static), take action to prevent any future space allocation errors.

There are a couple of things you can do to prevent a tablespace from running out of available free space:

  • Turn AUTOEXTEND on for the underlying tablespace’s datafiles, allowing them to grow automatically.
  • Using the ALTER TABLESPACE… ADD DATAFILE… command, add a new datafile to a tablespace that is about to run out of available free space.

Archivelog
Oracle can be placed into ARCHIVELOG mode, which tells the database to make copies of the online redo log files for point-in-time recovery purposes. The Archivelog statistic displays whether the database is running in ARCHIVELOG mode or not.

SQL analysis

Poorly optimised SQL can be a major drain on performance. So when troubleshooting a slow system, you should be on the lookout for any query that shows an execution count that is significantly larger than any other query on the system. It could be that the query is in an inefficient PL/SQL loop, or other problematic programming construct. But only by bringing the query to the attention of the application developers, will you know if the query is being mishandled from a programming standpoint.

Of course, ideas about what is bad SQL differ, so it’s always a good idea to ask yourself “what is the actual goal of any optimisation?” Is it to improve response time, CPU time, or usage of physical I/O? Once you have established these goals, measure the performance of your SQL against this.

Routine maintenance

With more and more servers to manage, in an increasingly high availability world, the focus on routine tasks around your database environment becomes that much more critical.

No Oracle database is simple – and applying changes that work on one database won’t necessarily be a cure-all for all the databases in your care. For this reason, understanding and using a range of approaches to performance analysis can be the quickest way to diagnosing performance issues.

Thankfully there are tools such as Embarcadero Performance Analyst that support all major analysis methodologies in order to help even novice DBAs quickly analyse the performance of the database at hand, set automatic thresholds for regular monitoring, and optimise performance if problems are encountered.

Whichever tool you choose, a routine of regular performance monitoring enables you not only to address issues as they arise, but also to anticipate database performance issues so that a critical problem never stops the show. Because ultimately, becoming a more valuable DBA is not just about being more productive, but also about becoming more proactive.


Harry Flora is a Senior Software Consultant at Embarcadero Technologies

Ratio analysis – some rules of thumb

Because every database is different, a one size fits all approach to analysis is not desirable, but with Ratio Analysis there are several broad measures you can apply.

Cache/Hit Ratio

To help ensure excellent performance, you need to keep your cache hit ratio in the neighbourhood of 90 percent or higher. However, every database has its own ‘personality’ and can exhibit excellent performance with below average readings for the cache hit ratio. Nevertheless, investigate consistent readings of 60 percent or lower.

Library Cache/Hit Ratio

A key indicator that shows how often SQL code is being reused by other database users vs. the number of times a SQL statement is broken down, parsed, and then loaded (or reloaded) into the shared pool. A high library cache hit ratio is a good thing. Strive for a hit ratio between 95 – 100 percent, with 99 percent being a good performance benchmark.

Data Dictionary Cache Hit Ratio

Because Oracle references the data dictionary many times when an SQL statement is processed, it is imperative that this vital reference information is kept in RAM. A high data dictionary cache hit ratio is desirable. Strive for a hit ratio between 90 – 100 percent, with 95 percent being a good performance benchmark.

Memory Sort Ratio

One of the memory areas used by Oracle8i and below for normal activity is an area set aside for sort actions. If the sort is too large to be contained within this memory space, Oracle continues the sort on disk – specifically, in the user’s assigned TEMPORARY TABLESPACE). Obviously, memory sorts complete many times faster than sorts forced to use physical I/O.

If your memory sort ratio falls below 90 percent, and you are on Oracle8i or below, increase the sort_area_size and sort_area_retained_size.

In Oracle 9i, there is a new parameter called pga_aggregate_target. When the pga_aggregate_target parameter is set and you are using dedicated Oracle connections, Oracle ignores all of the PGA parameters in the Oracle file, including sort_area_size, hash_area_size and sort_area_retained_size. The value of pga_aggregate_target should be set to the amount of remaining memory (less a 10 percent overhead for other server tasks) on a server after the instance has been started.

Buffer Busy Ratio

Buffer busy waits occur when a process needs to access a data block in the buffer cache, but can’t because it is in use. So it must wait. Buffer busy waits normally centre around contention for rollback segments, too small an INITRANS setting for tables, or insufficient free lists for tables.

Latch Miss Ratio

Protecting the many memory structures in Oracle’s SGA are latches. They ensure that one and only one process at a time can run or modify any memory structure at the same instant. If the latch miss ratio exceeds one percent, take action to resolve the amount of latch contention.

A solution at hand

Embarcadero Performance Analyst is a client side database performance analysis system that provides visual insight into DB performance by providing:
  • key metrics
  • bottleneck analysis
  • workload statistics
  • and user-configurable thresholds.
The holistic approach used by Performance Analyst allows a DBA to use any or all of the monitoring styles that they prefer, without demanding the installation of potentially disruptive server-side Component/Agents. By using Performance Analyst, a DBA can ensure the continued availability and expected performance of their key database systems.

Performance Analyst is just one component available with the latest version of Embarcadero’s database administration solution – Embarcadero DBArtisan 7.2. The others are Capacity Analyst, which offers intelligent insight into the past, current and future inner-workings of a database, and Space Analyst, which provides sophisticated diagnostic capabilities to troubleshoot bottlenecks and performance problems.

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“A computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are, in short, a perfect match” - Bill Bryson