Note that the information on this page is the BETA 1 of a guide that is now released. See for the latest PDF and HTML content.

Chapter 16 - patterns & practices Performance Engineering

- J.D. Meier, Alex Homer, David Hill, Jason Taylor, Prashant Bansode, Lonnie Wall, Rob Boucher Jr, Akshay Bogawat


  • Understand the concepts of performance engineering.
  • Learn the key activities and patterns related to performance engineering.
  • Learn the best practices for performance engineering.


This chapter summarizes the patterns & practices approach to Performance Engineering, with the emphasis on architecture and design. To design, build, and deploy better-performing applications, you must integrate performance engineering into your application development life cycle and include specific performance-related activities in your current software engineering processes. The key design-focused performance engineering activities include identifying performance objectives, applying performance design guidelines, conducting performance architecture and design reviews, and doing performance testing and tuning. Each activity will improve the performance of your application; for best results you should implement them all, but you can incrementally adopt any of these activities as you see fit.

Performance Overlay

The following schematic shows how performance engineering topics fit with the core activities of application design.


Key Activities in the Life Cycle

This Performance Engineering approach extends these proven core activities to create performance-specific activities.

These include:
  • Performance Objectives. Setting objectives helps you scope and prioritize your work by setting boundaries and constraints. Setting performance objectives helps you identify where to start, how to proceed, and when your application meets your performance goals.
  • Budgeting. Your budget represents your constraints and enables you to specify how much you can spend (resource-wise) and how you plan to spend it.
  • Performance Modeling. Performance modeling is an engineering technique that provides a structured and repeatable approach to meeting your performance objectives.
  • Performance Design Guidelines. Apply design guidelines, patterns, and principles that enable you to engineer for performance from an early stage.
  • Performance Design Inspections. Performance design inspections are an effective way to identify problems in your application design. By using pattern-based categories and a question-driven approach, you simplify evaluating your design against root cause performance issues.
  • Performance Code Inspections. Many performance defects are found during code reviews. Analyzing code for performance defects includes knowing what to look for and how to look for it. Use performance code inspections to identify inefficient coding practices that could lead to performance bottlenecks.
  • Performance Testing. Load and stress testing is used to generate metrics and to verify application behavior and performance under normal and peak load conditions.
  • Performance Tuning. Performance tuning is an iterative process that you use to identify and eliminate bottlenecks until your application meets its performance objectives. You start by establishing a baseline. Then you collect data, analyze the results, and make configuration changes based on the analysis. After each set of changes, you retest and measure to verify that your application has moved closer to its performance objectives.
  • Performance Health Metrics. Identify the metrics, including measures, measurements, and criteria, that are relevant to your performance objectives, as well as those that help you identify bottlenecks. These metrics help you evaluate the health of your application from a performance perspective in relation to performance objectives such as throughput, response time, and resource utilization.
  • Performance Deployment Inspections. During the deployment phase, you validate your model by using production metrics. You can validate workload estimates, resource utilization levels, response time, and throughput.
  • Capacity Planning. You should continue to measure and monitor when your application is deployed in the production environment. Changes that may affect system performance include increased user loads, deployment of new applications on shared infrastructure, system software revisions, and updates to your application to provide enhanced or new functionality. Use your performance metrics to guide your capacity and scaling plans.
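As a rough illustration of how objectives, budgets, and metrics fit together, the following sketch compares measured values against a set of objectives and reports which ones are missed. The class names, field names, and numbers are all hypothetical and are not taken from the guide; real objectives come from your own requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceObjectives:
    """Hypothetical, measurable objectives for one scenario."""
    max_response_time_ms: float   # response time ceiling
    min_throughput_rps: float     # requests-per-second floor
    max_cpu_utilization: float    # resource budget, as a fraction 0.0-1.0


@dataclass(frozen=True)
class Measurement:
    """Metrics collected from one test run."""
    response_time_ms: float
    throughput_rps: float
    cpu_utilization: float


def unmet_objectives(objectives, measured):
    """Return the names of the objectives that the measurement misses."""
    misses = []
    if measured.response_time_ms > objectives.max_response_time_ms:
        misses.append("response_time")
    if measured.throughput_rps < objectives.min_throughput_rps:
        misses.append("throughput")
    if measured.cpu_utilization > objectives.max_cpu_utilization:
        misses.append("cpu_utilization")
    return misses


# Illustrative numbers only.
objectives = PerformanceObjectives(
    max_response_time_ms=500.0, min_throughput_rps=100.0, max_cpu_utilization=0.75
)
baseline = Measurement(response_time_ms=640.0, throughput_rps=120.0, cpu_utilization=0.60)
after_tuning = Measurement(response_time_ms=430.0, throughput_rps=135.0, cpu_utilization=0.70)

print(unmet_objectives(objectives, baseline))      # ['response_time']
print(unmet_objectives(objectives, after_tuning))  # []
```

Running the same check after each tuning iteration shows whether the application has moved closer to its objectives or away from them.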

Performance Frame

The performance frame defines a set of pattern-based categories that can organize repeatable problems and solutions. You can use these categories to divide your application architecture for further analysis and to help identify application performance issues. The categories within the frame represent the critical areas where mistakes are most often made.

Category | Description
Caching | What and where to cache? Caching refers to how your application stores frequently used data at a location close to the point of consumption. The main points to consider are per-user caching, application-wide caching, and data volatility.
Communication | How to communicate between layers? Communication refers to choices for transport mechanism, boundaries, remote interface design, round trips, serialization, and bandwidth.
Concurrency | How to handle concurrent user interactions? Concurrency refers to choices for transactions, locks, threading, and queuing.
Coupling / Cohesion | How to structure your application? Coupling refers to the relationship between components or subsystems. Tight coupling leads to rigid components where changes ripple through the system, making it hard to understand and change. Cohesion refers to the way components or classes are composed. If a component or class has a well-defined role within the entire system, it is said to be highly cohesive.
Data Access | How to access data? Data access refers to choices and approaches for schema design, paging, hierarchies, indexes, amount of data, and round trips.
Data Structures | How to handle data? Data structures and algorithms refers to the choice of algorithms and the choice of structures for application entities, for example arrays or collections.
Exception Management | How to handle exceptions? Exception management refers to choices and approaches for catching and throwing exceptions.
Resource Management | How to manage resources? Resource management refers to approaches for allocating, creating, destroying, and pooling application resources.
State Management | What and where to maintain state? State management refers to how your application maintains state. The main points to consider are per-user state, application-wide state, persistence, and location.

Architecture and Design Issues

To apply the performance frame to your application, it is useful to think about each category as it applies to your application scenarios and its specific deployment. For example, the diagram below shows how you could analyze performance architecture and design issues for a typical Web application.


Separate your performance concerns by application tier to get a clearer view of performance issues, potential bottlenecks and mitigations. For example, the key areas of concern for each application tier in the diagram above are:
  • Browser. Large ViewState, large page output rendered at once, and unnecessary use of HTTPS for pages that do not need to be secure.
  • Web Server. Not caching reference data, poor resource management, blocking calls to services on the application tier, wrong data types, and using state affinity.
  • Application Server. Not pooling database connections, incorrect data structure choices, and chatty communication with the data tier.
  • Data Server. Contention, isolation level, locking and deadlocks.


Design Process Principles

Consider the following principles to enhance your design process:
  • Set objective goals. Avoid ambiguous or incomplete goals that cannot be measured such as "the application must run fast" or "the application must load quickly." You need to know the performance and scalability goals of your application so that you can (a) design to meet them, and (b) plan your tests around them. Make sure that your goals are measurable and verifiable. Requirements to consider for your performance objectives include response times, throughput, resource utilization, and workload. For example, how long should a particular request take? How many users does your application need to support? What is the peak load the application must handle? How many transactions per second must it support? You must also consider resource utilization thresholds. How much CPU, memory, network I/O, and disk I/O is it acceptable for your application to consume?
  • Validate your architecture and design early. Identify, prototype, and validate your key design choices up front. Beginning with the end in mind, your goal is to evaluate whether your application architecture can support your performance goals. Some of the important decisions to validate up front include deployment topology, load balancing, network bandwidth, authentication and authorization strategies, exception management, instrumentation, database design, data access strategies, state management, and caching. Be prepared to cut features and functionality or rework areas that do not meet your performance goals. Know the cost of specific design choices and features.
  • Cut the deadwood. Often the greatest gains come from finding whole sections of work that can be removed because they are unnecessary. This often occurs when (well-tuned) functions are composed to perform some greater operation. It is often the case that many interim results from the first function in your system do not end up getting used if they are destined for the second and subsequent functions. Elimination of these "waste" paths can yield tremendous end-to-end improvements.
  • Tune end-to-end performance. Optimizing a single feature could take away resources from another feature and hinder overall performance. Likewise, a single bottleneck in a subsystem within your application can affect overall application performance regardless of how well the other subsystems are tuned. You obtain the most benefit from performance testing when you tune end-to-end, rather than spending considerable time and money on tuning one particular subsystem. Identify bottlenecks, and then tune specific parts of your application. Often performance work moves from one bottleneck to the next bottleneck.
  • Measure throughout the life cycle. You need to know whether your application's performance is moving toward or away from your performance objectives. Performance tuning is an iterative process of continuous improvement with hopefully steady gains, punctuated by unplanned losses, until you meet your objectives. Measure your application's performance against your performance objectives throughout the development life cycle and make sure that performance is a core component of that life cycle. Unit test the performance of specific pieces of code and verify that the code meets the defined performance objectives before moving on to integrated performance testing. When your application is in production, continue to measure its performance. Factors such as the number of users, usage patterns, and data volumes change over time. New applications may start to compete for shared resources.
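One way to unit test the performance of a specific piece of code, as suggested above, is to time it and compare the result with a budget. The sketch below is a minimal illustration under stated assumptions: `measure_ms`, `meets_objective`, and `build_report` are hypothetical names, and the clock is injectable so that the check itself can be tested without depending on machine speed.

```python
import time


def measure_ms(func, *args, clock=time.perf_counter, repeat=5):
    """Run func `repeat` times and return the best elapsed time in milliseconds."""
    best = float("inf")
    for _ in range(repeat):
        start = clock()
        func(*args)
        elapsed_ms = (clock() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def meets_objective(func, *args, budget_ms, clock=time.perf_counter):
    """Return True when the best measured time is within the budget."""
    return measure_ms(func, *args, clock=clock) <= budget_ms


def build_report(rows):
    """Hypothetical unit of work under test."""
    return [f"{name}: {value}" for name, value in rows]


rows = [("item%d" % i, i) for i in range(1000)]
elapsed = measure_ms(build_report, rows)
print(f"best of 5 runs: {elapsed:.3f} ms")
```

Taking the best of several runs reduces noise from other activity on the machine. Treat a check like this as an early warning before integrated performance testing, not as a replacement for it.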

Design Principles

The following design principles are abstracted from architectures that have scaled and performed well over time:
  • Consider designing coarse-grained services. Coarse-grained services minimize the number of client-service interactions and help you design cohesive units of work. Coarse-grained services also help abstract service internals from the client and provide a looser coupling between the client and service. Loose coupling increases your ability to encapsulate change. If you already have fine-grained services, consider wrapping them with a facade layer to help achieve the benefits of a coarse-grained service.
  • Consider designing fine-grained services when it makes sense.
  • Minimize round trips by batching work. Minimize round trips to reduce call latency. For example, batch calls together and design coarse-grained services that allow you to perform a single logical operation by using a single round trip. Apply this principle to reduce communication across boundaries such as threads, processes, processors, or servers. This principle is particularly important when making remote server calls across a network.
  • Acquire late and release early. Minimize the duration that you hold shared and limited resources such as network and database connections. Releasing and re-acquiring such resources from the operating system can be expensive, so consider a recycling plan to support "acquire late and release early." This enables you to optimize the use of shared resources across requests.
  • Evaluate affinity with processing resources. When certain resources are only available from certain servers or processors, there is an affinity between the resource and the server or processor. While affinity can improve performance, it can also impact scalability. Carefully evaluate your scalability needs. Will you need to add more processors or servers? If application requests are bound by affinity to a particular processor or server, you could inhibit your application's ability to scale. As load on your application increases, the ability to distribute processing across processors or servers influences the potential capacity of your application.
  • Put the processing closer to the resources it needs. If your processing involves a lot of client-service interaction, you may need to push the processing closer to the client. If the processing interacts intensively with the data store, you may want to push the processing closer to the data.
  • Pool shared resources. Pool shared resources that are scarce or expensive to create such as database or network connections. Use pooling to help eliminate performance overhead associated with establishing access to resources and to improve scalability by sharing a limited number of resources among a much larger number of clients.
  • Avoid unnecessary work. Use techniques such as caching, avoiding round trips, and validating input early to reduce unnecessary processing. For more information, see "Cut the Deadwood," above.
  • Reduce contention. Blocking and hotspots are common sources of contention. Blocking is caused by long-running tasks such as expensive I/O operations. Hotspots result from concentrated access to certain data that everyone needs. Avoid blocking while accessing resources because resource contention leads to requests being queued. Contention can be subtle. Consider a database scenario. On the one hand, large tables must be indexed very carefully to avoid blocking due to intensive I/O. However, many clients will be able to access different parts of the table with no difficulty. On the other hand, small tables are unlikely to have I/O problems but might be used so frequently by so many clients that they are hotly contested. Techniques for reducing contention include the efficient use of shared threads and minimizing the amount of time your code retains locks.
  • Use progressive processing. Use efficient practices for handling data changes. For example, perform incremental updates. When a portion of data changes, process the changed portion and not all of the data. Also consider rendering output progressively. Do not block on the entire result set when you can give the user an initial portion and some interactivity earlier.
  • Process independent tasks concurrently. When you need to process multiple independent tasks, you can execute those tasks asynchronously to perform them concurrently. Asynchronous processing offers the most benefit for I/O-bound tasks but has limited benefit when the tasks are CPU-bound and restricted to a single processor. If you plan to deploy on single-CPU servers, additional threads guarantee context switching, and because there is no real multithreading, there are likely to be only limited gains. CPU-bound multithreaded tasks on a single CPU perform relatively slowly due to the overhead of thread switching.
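Several of the principles above (pool shared resources, acquire late and release early, and process independent tasks concurrently) can be sketched together. The example below is a simplified illustration, not a production pool: `FakeConnection`, `ConnectionPool`, and `fetch` are hypothetical names, and the connection stands in for any scarce resource such as a database connection.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a scarce resource such as a database connection."""
    created = 0                  # how many connections were ever created
    _lock = threading.Lock()

    def __init__(self):
        with FakeConnection._lock:
            FakeConnection.created += 1

    def query(self, key):
        return f"result for {key}"


class ConnectionPool:
    """Fixed-size pool: connections are created once and then reused."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self):
        conn = self._idle.get()      # blocks if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)     # always return it, even on error


def fetch(pool, key):
    # Acquire late: do the preparation before taking a connection.
    prepared_key = key.strip().lower()
    with pool.connection() as conn:
        raw = conn.query(prepared_key)
    # Release early: post-process after the connection is back in the pool.
    return raw.upper()


pool = ConnectionPool(size=2)
keys = [" Alpha", "Beta ", "Gamma", "Delta"]

# Independent, I/O-style tasks run concurrently on a small thread pool.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(lambda k: fetch(pool, k), keys))

print(results)
print(FakeConnection.created)  # 2: four requests shared two connections
```

The pool size also acts as a simple throttle: when both connections are busy, additional requests wait instead of creating more connections.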


The following table presents a set of performance design guidelines for application architects. Use it as a starting point for your design and to improve performance design inspections.

Category | Guidelines
Caching | Decide where to cache data. Decide what data to cache. Decide the expiration policy and scavenging mechanism. Decide how to load the cache data. Avoid distributed coherent caches.
Communication | Choose the appropriate remote communication mechanism. Design chunky interfaces. Consider how to pass data between layers. Minimize the amount of data sent across the wire. Batch work to reduce calls over the network. Reduce transitions across boundaries. Consider asynchronous communication. Consider message queuing. Consider a "fire and forget" invocation model.
Concurrency | Reduce contention by minimizing lock times. Balance between coarse-grained and fine-grained locks. Choose an appropriate transaction isolation level. Avoid long-running atomic transactions.
Coupling / Cohesion | Design for loose coupling. Design for high cohesion. Partition application functionality into logical layers. Use early binding where possible. Evaluate resource affinity.
Data Access | Open connections as late as possible and release them as early as possible. Separate read-only and transactional requests. Avoid unnecessary data returns. Cache data to avoid unnecessary work.
Data Structures | Choose an appropriate data structure. Pre-assign size for large dynamic growth data types. Use value and reference types appropriately.
Exception Management | Do not use exceptions to control application flow. Use validation code to avoid unnecessary exceptions. Do not catch exceptions that you cannot handle. Use the finally block to ensure resources are released.
Resource Management | Treat threads as a shared resource. Pool shared or scarce resources. Acquire late, release early. Consider efficient object creation and destruction. Consider resource throttling.
State Management | Evaluate stateful versus stateless design. Consider your state store options. Minimize session data. Free session resources as soon as possible. Avoid accessing session variables from business logic.
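To make the caching guidelines more concrete, the following sketch shows one possible combination of choices: load on miss, absolute time-based expiration, and explicit scavenging. It is a minimal illustration under stated assumptions; `ExpiringCache` and its parameters are hypothetical, and the clock is injectable only so that the example is repeatable.

```python
import time


class ExpiringCache:
    """Small cache with absolute expiration and load-on-miss."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # called to load a missing or expired entry
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expires_at, value)
        self.loads = 0             # number of times the loader ran

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh hit: no trip to the data source
        value = self._loader(key)  # miss or expired: reload from the source
        self.loads += 1
        self._entries[key] = (now + self._ttl, value)
        return value

    def scavenge(self):
        """Drop expired entries so stale data does not hold memory."""
        now = self._clock()
        stale = [k for k, (expires_at, _) in self._entries.items() if expires_at <= now]
        for key in stale:
            del self._entries[key]
        return len(stale)


fake_now = [0.0]   # hypothetical controllable clock, so the example is repeatable
cache = ExpiringCache(loader=lambda code: code.upper(), ttl_seconds=60,
                      clock=lambda: fake_now[0])
first = cache.get("us")    # miss: loader runs
second = cache.get("us")   # hit: served from the cache
fake_now[0] = 61.0         # move past the 60-second expiration
third = cache.get("us")    # expired: loader runs again
print(cache.loads)         # 2
```

The expiration period should reflect how volatile the data is: reference data that rarely changes can tolerate a long period, while frequently changing data may not be worth caching at all.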


Design patterns in this context refer to generic solutions that address commonly occurring application design problems. Some of the patterns identified below are well-known design patterns. The primary purpose of some of these patterns does not relate specifically to performance. However, their use in certain scenarios enables better performance as a secondary goal. Some of the main patterns that help improve performance are summarized below:
  • Data Transfer Object - Use Data Transfer Object to create an object that carries all the state it needs, combining multiple remote calls for state into a single call. This pattern is closely associated with Remote Façade. Note that it is the Remote Façade, not the Data Transfer Object, that aggregates numerous fine-grained calls into a single coarse-grained interface; the Data Transfer Object is the element containing the data that is shipped across the wire. See "Data Transfer Object" in "Enterprise Solution Patterns" at
  • Remote Façade - Reduce the overhead of calls by wrapping fine-grained calls with more coarse-grained calls.
  • Fast Lane Reader - For read-only data that does not change often, avoid transactional overhead and use the Fast Lane Reader pattern.
  • Flyweight - Use the Flyweight pattern to reuse objects instead of creating new ones. For more information on Flyweight, see "Flyweight" at
  • Message Façade - When the client does not need a response to continue processing, use a Message Façade to provide asynchronous execution.
  • Service Locator - Use the Service Locator pattern to reduce expensive lookups by caching the locations of services that have a high lookup cost.
  • Singleton - Use the Singleton pattern to limit the number of objects created for a given type. Use this pattern with caution because while it reduces overhead, it can create contention. See "Singleton" in "Enterprise Solution Patterns" at


In addition to these well-known patterns, look for patterns that accomplish the following: reduce resource contention, reduce object creation/destruction overhead, distribute load, queue work, batch work, improve cohesion, reduce chatty communication, and improve resource sharing.
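The relationship between Remote Façade and Data Transfer Object can be sketched as follows. This is a simplified illustration, not a real remoting stack: `CountingProxy` merely simulates a network boundary by counting calls, and `CustomerStore`, `CustomerFacade`, and `CustomerDTO` are hypothetical names with made-up data.

```python
from dataclasses import dataclass


class CustomerStore:
    """Hypothetical server-side component with a fine-grained interface."""

    def __init__(self):
        self._data = {42: ("Ada", "ada@example.com", "Seattle")}

    def name(self, customer_id):
        return self._data[customer_id][0]

    def email(self, customer_id):
        return self._data[customer_id][1]

    def city(self, customer_id):
        return self._data[customer_id][2]


@dataclass(frozen=True)
class CustomerDTO:
    """Data Transfer Object: carries all the state the client needs."""
    name: str
    email: str
    city: str


class CustomerFacade:
    """Remote Facade: lives next to the store and exposes one coarse-grained call."""

    def __init__(self, store):
        self._store = store

    def get_customer(self, customer_id):
        # These fine-grained calls stay on the server side of the boundary.
        return CustomerDTO(
            name=self._store.name(customer_id),
            email=self._store.email(customer_id),
            city=self._store.city(customer_id),
        )


class CountingProxy:
    """Simulated network boundary: every call through it is one round trip."""

    def __init__(self, target):
        self._target = target
        self.round_trips = 0

    def call(self, method, *args):
        self.round_trips += 1
        return getattr(self._target, method)(*args)


store = CustomerStore()

# Chatty client: three remote calls to assemble one customer.
chatty = CountingProxy(store)
assembled = (chatty.call("name", 42), chatty.call("email", 42), chatty.call("city", 42))

# Coarse-grained client: one remote call returns a DTO with everything.
coarse = CountingProxy(CustomerFacade(store))
dto = coarse.call("get_customer", 42)

print(chatty.round_trips)  # 3
print(coarse.round_trips)  # 1
```

Both clients end up with the same data, but the second crosses the boundary once. The saving grows with network latency and with the number of fields the client needs.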

Additional Information

Last edited Dec 16, 2008 at 8:18 AM by rboucher, version 4

