Software Secret Weapons™
Framework For Adding Performance Counters To Large Software Applications
by Pavel Simakov on 2007-05-07 16:12:26 under Linguine Watch
To make performance monitoring useful, we have to identify dozens of specific metrics that give us meaningful information about application behavior. How can we quickly identify which metrics will give good information about a running software application?
After adding performance counters to several large software applications, it became obvious to me that a simple framework could be created to popularize this activity. Linguine Watch provides a framework for identifying performance counters in large software applications. The goal of the framework is to guide a software engineer to a small but sufficient set of meaningful performance counters that have stable semantics. Engineers do not need to invent a monitoring strategy for every new component and can reuse existing patterns. Stable semantics for well-known performance monitors also help system administrators and tech support personnel correctly interpret and monitor software behavior in production.
Similar to a car engine, a running software program converts system resources into useful activity under operational constraints (risk management). On a macro scale, the runtime behavior of a software component can be modeled around three basic aspects: resources, useful activity, and risk management.
The most effective way to add performance counters to a software application is to capture runtime data related to these aspects. As software engineers, we might tend to add performance counters to low-level programming language primitives like function calls, variables, loops, classes, objects, exceptions, etc. Instead, we need to add performance counters that reflect higher-level concepts of program execution like resources, useful activity, and risk management.
For each complex software sub-system or component that requires monitoring, we need to select several individual monitors that together cover the three key aspects of component behavior outlined above. Ideally, deciding what to monitor would be as simple as picking standard monitors from the list below. I present here a list of the most stable performance monitors identified so far. Some of them are included in the Linguine Watch distribution and some will be included in my future work.
I have recently discovered the Microsoft Command Shell Standardized Verb Set. This verb standardization effort reinforced my confidence in the viability of my framework. If verbs themselves are standardized, the administration of the objects that implement the verbs can be standardized too. I am actively working on the performance counters and monitors that compose the framework, and I am interested in your feedback and suggestions for improving it. Please drop me a line if you have good ideas or know relevant resources.
com.oy.shared.lw.perf.monitor.SimpleCounterMonitor This monitor reports any metric that only ever increases over time. Only one method must be called to update the monitor:
interface ISimpleCounterMonitor {
public void incValue(long value);
}
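To make the counter semantics concrete, here is a minimal sketch of a monotonically increasing counter. The class SketchCounter and its getValue() accessor are hypothetical stand-ins, not Linguine Watch classes, and rejecting negative deltas is an assumption made to keep the value from ever decreasing.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a counter monitor; not the Linguine Watch class.
class SketchCounter {
    private final AtomicLong total = new AtomicLong();

    // Adds a delta to the running total. Negative deltas are rejected
    // (an assumption of this sketch) so the value never goes down.
    void incValue(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("counter delta must be >= 0");
        }
        total.addAndGet(value);
    }

    // Returns the total accumulated so far.
    long getValue() {
        return total.get();
    }
}
```

A caller would invoke incValue(1) per event, or incValue(n) for a batch of n events, and a reporting thread would read getValue() periodically.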
com.oy.shared.lw.perf.monitor.SimpleGaugeMonitor This monitor reports any metric that can be increased or decreased with time. Only one method must be called to update the monitor:
interface ISimpleGaugeMonitor {
public void setValue(long value);
}
com.oy.shared.lw.perf.monitor.ObjectLifetimeMonitor This monitor reports construction, serialization and deallocation of objects (created, serialized, deserialized, freed, live). To use this monitor you will need to add calls to the constructor, writeObject(), readObject() and finalize() of a specific class. The number of live instances ("live") is calculated as the number of created instances minus the number of freed instances. These four methods need to be called to update the monitor:
interface IObjectLifetimeMonitor {
public void incCreated();
public void incSerialized();
public void incDeSerialized();
public void incFreed();
}
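The derived "live" value can be sketched as follows. SketchLifetimeMonitor and SketchSession are hypothetical names. The sketch reports "freed" from an explicit close() rather than from finalize(); that is a substitution made for this example (finalize() has been deprecated since Java 9), not the approach described above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lifetime counters; method names mirror the illustrative interface.
class SketchLifetimeMonitor {
    private final AtomicLong created = new AtomicLong();
    private final AtomicLong freed = new AtomicLong();

    void incCreated() { created.incrementAndGet(); }
    void incFreed() { freed.incrementAndGet(); }

    // "live" is derived: created minus freed.
    long getLive() { return created.get() - freed.get(); }
}

// Hypothetical monitored class. It reports "freed" from close() instead of
// finalize(), which is an assumption of this sketch.
class SketchSession implements AutoCloseable {
    static final SketchLifetimeMonitor MONITOR = new SketchLifetimeMonitor();

    SketchSession() { MONITOR.incCreated(); }

    @Override
    public void close() { MONITOR.incFreed(); }
}
```

Creating two SketchSession instances and closing one leaves getLive() at 1.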
com.oy.shared.lw.perf.monitor.TaskExecutionMonitor This monitor reports the status of task execution for long tasks (started, completed, failed, in progress, qos). A task is considered to have only two outcomes: completed or failed. The values for tasks in progress ("in progress") and quality of service ("qos") are calculated automatically. The quality of service ("qos") value is a moving average of the number of completed tasks. These three methods need to be called to update the monitor:
interface ITaskExecutionMonitor {
public void incStarted();
public void incCompleted();
public void incFailed();
}
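The two derived values can be sketched like this. The exact qos formula is not given here, so the sketch assumes an exponential moving average over task outcomes (1.0 for completed, 0.0 for failed) with a smoothing factor of 0.5 and a starting value of 1.0. All of these, along with the class name SketchTaskMonitor, are assumptions made for illustration.

```java
// Hypothetical task monitor; the qos formula below is an assumed
// exponential moving average, not the Linguine Watch calculation.
class SketchTaskMonitor {
    private static final double ALPHA = 0.5; // assumed smoothing factor

    private long started;
    private long completed;
    private long failed;
    private double qos = 1.0; // assumed starting value

    synchronized void incStarted() { started++; }

    synchronized void incCompleted() {
        completed++;
        update(1.0);
    }

    synchronized void incFailed() {
        failed++;
        update(0.0);
    }

    // Blends the newest outcome into the running average.
    private void update(double outcome) {
        qos = ALPHA * outcome + (1.0 - ALPHA) * qos;
    }

    // "in progress" is derived: started minus finished (completed or failed).
    synchronized long getInProgress() { return started - completed - failed; }

    synchronized double getQos() { return qos; }
}
```

With three tasks started, one completed, and one failed (in that order), getInProgress() is 1 and getQos() is 0.5 under these assumptions.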
com.oy.shared.lw.perf.monitor.PipeMonitor This monitor reports bytes written and read across the logical pipe or connection (bytes written, bytes read, writes, reads). These two methods need to be called to update the monitor:
interface IPipeMonitor {
public void incBytesWritten(long value);
public void incBytesRead(long value);
}
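One way to feed such a monitor without touching every call site is to wrap the stream itself. The sketch below covers only the write side. SketchPipeMonitor and SketchMonitoredOutputStream are hypothetical names, and counting one "write" per incBytesWritten() call is an assumption about how the writes metric is derived.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical pipe monitor covering the write side only.
class SketchPipeMonitor {
    private final AtomicLong bytesWritten = new AtomicLong();
    private final AtomicLong writes = new AtomicLong();

    // Each call counts as one write operation (an assumption of this sketch).
    void incBytesWritten(long value) {
        bytesWritten.addAndGet(value);
        writes.incrementAndGet();
    }

    long getBytesWritten() { return bytesWritten.get(); }
    long getWrites() { return writes.get(); }
}

// Hypothetical wrapper that reports every write to the monitor.
class SketchMonitoredOutputStream extends FilterOutputStream {
    private final SketchPipeMonitor monitor;

    SketchMonitoredOutputStream(OutputStream out, SketchPipeMonitor monitor) {
        super(out);
        this.monitor = monitor;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        monitor.incBytesWritten(1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Delegate the whole buffer at once so it counts as a single write.
        out.write(b, off, len);
        monitor.incBytesWritten(len);
    }
}
```

Wrapping any OutputStream, for example a socket's stream, in SketchMonitoredOutputStream updates the monitor as a side effect of normal writes. A matching FilterInputStream wrapper could do the same for reads.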
com.oy.shared.lw.perf.monitor.VirtualMachineMonitor This monitor reports CPUs, threads, and free and total memory (MB) as reported by the Java Virtual Machine runtime (cpu's, threads, mb free, mb total). You do not need to update this monitor as it reports its values automatically.
com.oy.shared.lw.perf.monitor.ObjectPoolMonitor This monitor reports the status of a typical object pool (added, removed, size, borrowed, returned, outstanding). The values for pool size ("size") and number of objects borrowed and not yet returned ("outstanding") are calculated automatically. These four methods need to be called to update the monitor:
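For reference, the four values can be read from standard JDK APIs. The sketch below uses java.lang.Runtime and the ThreadMXBean from java.lang.management. How Linguine Watch actually collects these values is not stated here, so treat the class SketchVmSnapshot and its choice of APIs as assumptions.

```java
import java.lang.management.ManagementFactory;

// Hypothetical snapshot of the four reported values; not Linguine Watch code.
final class SketchVmSnapshot {
    final int cpus;
    final int threads;
    final long mbFree;
    final long mbTotal;

    private SketchVmSnapshot(int cpus, int threads, long mbFree, long mbTotal) {
        this.cpus = cpus;
        this.threads = threads;
        this.mbFree = mbFree;
        this.mbTotal = mbTotal;
    }

    // Reads the current values from the running JVM.
    static SketchVmSnapshot capture() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return new SketchVmSnapshot(
                rt.availableProcessors(),
                ManagementFactory.getThreadMXBean().getThreadCount(),
                rt.freeMemory() / mb,
                rt.totalMemory() / mb);
    }
}
```

Calling SketchVmSnapshot.capture() on a timer gives a self-updating monitor of this kind, since no application code has to report anything.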
interface IObjectPoolMonitor {
public void incAdded();
public void incRemoved();
public void incBorrowed();
public void incReturned();
}
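The two derived pool values can be sketched as simple differences. The formulas are assumptions: "size" is taken as added minus removed (so borrowed objects still count toward the pool size), and "outstanding" as borrowed minus returned. SketchPoolMonitor is a hypothetical name.

```java
// Hypothetical pool monitor; the derived formulas are assumptions of this sketch.
class SketchPoolMonitor {
    private long added;
    private long removed;
    private long borrowed;
    private long returned;

    synchronized void incAdded() { added++; }
    synchronized void incRemoved() { removed++; }
    synchronized void incBorrowed() { borrowed++; }
    synchronized void incReturned() { returned++; }

    // Objects currently owned by the pool, whether idle or borrowed.
    synchronized long getSize() { return added - removed; }

    // Objects borrowed and not yet returned.
    synchronized long getOutstanding() { return borrowed - returned; }
}
```

An outstanding value that keeps growing while size stays flat is the typical signature of callers that borrow objects and never return them.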
The rest of the monitors in this section are under consideration for future work. It takes only minutes to implement them in Linguine Watch, but I am not fully confident of their semantics.
com.oy.shared.lw.perf.monitor.TransactionExecutionMonitor This monitor reports status of transaction execution (started, committed, rolled back, failed, in progress). The value for transactions in progress ("in progress") is calculated automatically. These four methods need to be called to update the monitor:
interface ITransactionExecutionMonitor {
public void incStarted();
public void incCommitted();
public void incRolledback();
public void incFailed();
}
com.oy.shared.lw.perf.monitor.GenericInterpreterMonitor This monitor reports the status of a generic interpreter. A SQL database engine or a JavaScript evaluator is an example of a generic interpreter. While the two seem quite different, we need to monitor similar things for both of them (statements, syntax errors, slow statements, fatal runtime errors, non-fatal runtime errors). These five methods need to be called to update the monitor:
interface IGenericInterpreterMonitor {
public void incStatements();
public void incSyntaxErrors();
public void incSlowStatements();
public void incFatalRuntimeErrors();
public void incNonFatalRuntimeErrors();
}
com.oy.shared.lw.perf.monitor.EnumMonitor This monitor reports how many times each value of an enumeration has been observed. For an HTTP request we can define an enumeration with three request types (GET, POST, HEAD). One method per enum value must be called to update the monitor:
interface IEnumMonitorRequestType {
public void incGET();
public void incPOST();
public void incHEAD();
}
For an HTTP response we can define an enumeration with four status codes (200, 300, 400, 500). One method per enum value must be called to update the monitor:
interface IEnumMonitorStatusCode {
public void inc200();
public void inc300();
public void inc400();
public void inc500();
}
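As an alternative to one method per value, the same counts could be kept by a single generic class keyed by a Java enum. This is a sketch of a different design, not the interface shown above. SketchEnumMonitor, SketchRequestType, inc() and get() are all hypothetical names.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical enum for the request types named in the text.
enum SketchRequestType { GET, POST, HEAD }

// Hypothetical generic monitor: one counter per enum constant.
class SketchEnumMonitor<E extends Enum<E>> {
    private final Map<E, Long> counts;

    SketchEnumMonitor(Class<E> type) {
        counts = new EnumMap<>(type);
        // Start every constant at zero so unseen values still report a count.
        for (E value : type.getEnumConstants()) {
            counts.put(value, 0L);
        }
    }

    // Records one observation of the given value.
    synchronized void inc(E value) {
        counts.merge(value, 1L, Long::sum);
    }

    synchronized long get(E value) {
        return counts.get(value);
    }
}
```

The trade-off is that the per-value methods (incGET(), inc200()) read more clearly at the call site, while the generic form avoids writing a new interface for every enumeration.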
com.oy.shared.lw.perf.monitor.ObjectCacheMonitor This monitor reports status of a typical object cache (gets attempted, cache hits, cache misses). These three methods need to be called to update the monitor:
interface IObjectCacheMonitor {
public void incGetsAttempted();
public void incCacheHits();
public void incCacheMisses();
}
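A sketch of where the three calls would sit in a cache lookup is shown below. SketchCacheMonitor and SketchMonitoredCache are hypothetical, the hit ratio is an extra derived value not listed above, and the cache is a plain HashMap that is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache monitor with one derived value (hit ratio).
class SketchCacheMonitor {
    private long getsAttempted;
    private long cacheHits;
    private long cacheMisses;

    void incGetsAttempted() { getsAttempted++; }
    void incCacheHits() { cacheHits++; }
    void incCacheMisses() { cacheMisses++; }

    long getGetsAttempted() { return getsAttempted; }
    long getCacheHits() { return cacheHits; }
    long getCacheMisses() { return cacheMisses; }

    // Fraction of lookups served from the cache; 0.0 before any lookup.
    double getHitRatio() {
        return getsAttempted == 0 ? 0.0 : (double) cacheHits / getsAttempted;
    }
}

// Hypothetical single-threaded cache that reports each lookup.
class SketchMonitoredCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    final SketchCacheMonitor monitor = new SketchCacheMonitor();

    void put(K key, V value) { store.put(key, value); }

    V get(K key) {
        monitor.incGetsAttempted();
        V value = store.get(key);
        if (value != null) {
            monitor.incCacheHits();
        } else {
            monitor.incCacheMisses();
        }
        return value;
    }
}
```

Since every attempt ends as either a hit or a miss, gets attempted should always equal hits plus misses, which makes a handy sanity check on the instrumentation.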
com.oy.shared.lw.perf.monitor.SynchronizedSectionMonitor This monitor reports execution status for a typical block of code that needs to hold a lock (acquired, released, time to acquire, time to run). These four methods need to be called to update the monitor:
interface ISynchronizedSectionMonitor {
public void incAcquired();
public void incReleased();
public void incTimeToAcquire(long value);
public void incTimeToRun(long value);
}
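The four calls map naturally onto the points around a lock acquisition. The sketch below uses java.util.concurrent.locks.ReentrantLock and System.nanoTime(). Reporting times in nanoseconds is an assumption, and SketchSectionMonitor and SketchGuardedSection are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical section monitor; times are accumulated in nanoseconds (assumed unit).
class SketchSectionMonitor {
    private final AtomicLong acquired = new AtomicLong();
    private final AtomicLong released = new AtomicLong();
    private final AtomicLong nanosToAcquire = new AtomicLong();
    private final AtomicLong nanosToRun = new AtomicLong();

    void incAcquired() { acquired.incrementAndGet(); }
    void incReleased() { released.incrementAndGet(); }
    void incTimeToAcquire(long value) { nanosToAcquire.addAndGet(value); }
    void incTimeToRun(long value) { nanosToRun.addAndGet(value); }

    long getAcquired() { return acquired.get(); }
    long getReleased() { return released.get(); }
    long getTimeToAcquire() { return nanosToAcquire.get(); }
    long getTimeToRun() { return nanosToRun.get(); }
}

// Hypothetical helper that runs a body under a lock and reports to the monitor.
class SketchGuardedSection {
    private final ReentrantLock lock = new ReentrantLock();
    final SketchSectionMonitor monitor = new SketchSectionMonitor();

    void run(Runnable body) {
        long requested = System.nanoTime();
        lock.lock();
        long entered = System.nanoTime();
        monitor.incAcquired();
        monitor.incTimeToAcquire(entered - requested);
        try {
            body.run();
        } finally {
            // Record run time and release even if the body throws.
            monitor.incTimeToRun(System.nanoTime() - entered);
            lock.unlock();
            monitor.incReleased();
        }
    }
}
```

A rising time to acquire with a steady time to run suggests contention on the lock rather than slow work inside it.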
com.oy.shared.lw.perf.monitor.UnreliableConnectionEndpointMonitor This monitor reports the status of an endpoint of an unreliable connection. The endpoint usually has reconnection logic that seamlessly maintains the unreliable connection (connect attempts, accepted, rejected, timeout, disconnected, is connected). These six methods need to be called to update the monitor:
interface IUnreliableConnectionEndpointMonitor {
public void incConnectsAttempted();
public void incAccepted();
public void incRejected();
public void incTimeout();
public void incDisconnected();
public void setIsConnected(boolean value);
}
com.oy.shared.lw.perf.monitor.AgingMonitor This monitor reports the status of objects that have an age (born, expired, renewed). An object cache with a least-recently-used eviction policy fits well here. These three methods need to be called to update the monitor:
interface IAgingMonitor {
public void incBorn();
public void incExpired();
public void incRenewed();
}
PS: The interfaces are given to illustrate how the monitors are used. The monitor classes don't actually implement these interfaces.
Copyright © 2004-2010 by Pavel Simakov. Any conclusions, recommendations, ideas, thoughts or source code presented on this site are my own and do not reflect an official opinion of my current or past employers, partners or clients.