Chapter 12: Data Access Layer Guidelines
J.D. Meier, Alex Homer, David Hill, Jason Taylor, Prashant Bansode, Lonnie Wall, Rob Boucher Jr, Akshay Bogawat
- Understand how the data layer fits into the application architecture.
- Understand the components of the data layer.
- Learn the steps for designing these components.
- Learn the common issues faced when designing the data layer.
- Learn the key guidelines for designing the data layer.
- Learn the key patterns and technology considerations for designing the data access layer.
This chapter describes the key guidelines for designing the data layer of an application. The guidelines are organized by category and cover the common issues encountered, and mistakes commonly made, when designing the data layer. Figure 1. shows how the data layer fits into typical application architecture.
[image:DataAccessLayer.PNG] Figure 1 A typical application showing the data layer and the components it may contain
Data Layer Components
- Data access logic components. Data access components abstract the logic necessary to access your underlying data stores. Doing so centralizes the data access functionality, which makes the application easier to configure and maintain.
- Data helpers / utilities. Helper functions and utilities assist in data manipulation, data transformation, and data access within the layer. They consist of specialized libraries and/or custom routines especially designed to maximize data access performance and reduce the development requirements of the logic components and the service agent parts of the layer.
- Service agents. When a business component must use functionality exposed by an external service, you might need to create code that manages the semantics of communicating with that service. Service agents isolate your application from the idiosyncrasies of calling diverse services, and can provide additional services such as basic mapping between the format of the data exposed by the service and the format your application requires.
A correct approach to designing the data layer will reduce development time and assist in maintenance of the data layer after the application is deployed. This section briefly outlines an effective design approach for the data layer. Perform the following key activities in each of these areas when designing your data layer:
- Create an overall design for your data access layer:
- Identify your data source requirements.
- Determine your data access approach.
- Choose how to map data structures to the data source.
- Determine how to connect to the data source.
- Determine strategies for handling data source errors.
- Design your data access components:
- Enumerate the data sources that you will access.
- Decide on the method of access for each data source.
- Determine whether helper components are required or desirable to simplify data access component development and maintenance.
- Determine relevant design patterns. For example, consider using the Table Data Gateway, Query Object, Repository, and other patterns.
- Design your data helper components:
- Identify functionality that could be moved out of the data access components and centralized for reuse.
- Research available helper component libraries.
- Consider custom helper components for common problems such as connection strings, data source authentication, monitoring, and exception processing.
- Consider implementing routines for data access monitoring and testing in your helper components.
- Consider the setup and implementation of logging for your helper components.
- Design your service agents:
- Use the appropriate tool to add a service reference. This will generate a proxy and the data classes that represent the data contract from the service.
- Determine how the service will be used in your application. For most applications, you should use an abstraction layer between the business layer and the data access layer, which will provide a consistent interface regardless of the data source. For smaller applications, the business layer, or even the presentation layer, may access the service agent directly.
The following design guidelines provide information about different aspects of the data access layer that you should consider. Follow these guidelines to ensure that your data access layer meets the requirements of your application, performs efficiently and securely, and is easy to maintain and extend as business requirements change.
- Choose the data access technology.** The choice of an appropriate data access technology will depend on the type of data you are dealing with, and how you want to manipulate the data within the application. Certain technologies are better suited for specific scenarios. Refer to the Data Access Technology Matrix found later in this guide. It discusses these options and enumerates the benefits and considerations for each data access technology.**
- Use abstraction to implement a loosely coupled interface to the data access layer. This can be accomplished by defining interface components, such as a gateway with well-known inputs and outputs, which translate requests into a format understood by components within the layer. In addition, you can use interface types or abstract base classes to define a shared abstraction that must be implemented by interface components.
- Consider consolidating data structures.** If you are dealing with table-based entities in your data access layer, consider using Data Transfer Objects (DTOs) to help you organize the data into unified structures. In addition, DTOs encourage coarse-grained operations while providing a structure that is designed to move data across different boundary layers.
- Encapsulate data access functionality within the data access layer. The data access layer hides the details of data source access. It is responsible for managing connections, generating queries, and mapping application entities to data source structures. Consumers of the data access layer interact through abstract interfaces using application entities such as custom objects, DataSets, DataReaders, and XML. Other application layers that access the data access layer will manipulate this data in more complex ways to implement the functionality of the application. Separating concerns in this way assists in application development and maintenance.
- Decide how to map application entities to data source structures.** The type of entity you use in your application is the main factor in deciding how to map those entities to data source structures.
- Decide how you will manage connections.** As a rule, the data access layer should create and manage all connections to all data sources required by the application. You must choose an appropriate method for storing and protecting connection information that conforms to application and security requirements.
- Determine how you will handle data exceptions.** The data access layer should catch and (at least initially) handle all exceptions associated with data sources and CRUD (Create, Read, Update, and Delete) operations. Exceptions concerning the data itself, and data source access and timeout errors, should be handled in this layer and passed to other layers only if the failures affect application responsiveness or functionality.
- Consider security risks. The data access layer should protect against attacks that try to steal or corrupt data, and protect the mechanisms used to gain access to the data source. It should also use the “least privilege” design approach to restrict privileges to only those needed to perform the operations required by the application. If the data source itself has the ability to limit privileges, security should be considered and implemented in the data access layer as well as in the source.
- Reduce round trips. Consider batching commands into a single database operation.
- Consider performance and scalability objectives. Scalability and performance objectives for the data access layer should be taken into account during design. For example, when designing an Internet-based merchant application, data layer performance is likely to be a bottleneck for the application. When data layer performance is critical, use profiling to understand and then limit expensive data operations.
Data Layer Frame
There are several common issues that you must consider as your develop your design. These issues can be categorized into specific areas of the design. The following table lists the common issues for each category where mistakes are most often made.
|Category ||Common issues|
|BLOB ||Improperly storing BLOBs in the database instead of the file system. |
| ||Using an incorrect type for BLOB data in the database. |
| ||Searching and manipulating BLOB data. |
|Batching ||Failing to use batching to reduce database round trips. |
| ||Holding onto locks for excessive periods when batching. |
|Connections ||Improper configuration of connection pooling. |
| ||Failing to handle connection timeouts and disconnections. |
| ||Performing transactions that span multiple connections. |
| ||Holding connections open for excessive periods. |
| ||Using individual identities instead of a trusted subsystem to access the database. |
|Data Format ||Choosing the wrong data format. |
| ||Failing to consider serialization requirements. |
| ||Not mapping objects to a relational data store. |
|Exception Management ||Not handling data access exceptions. |
| ||Failing to shield database exceptions from the original caller. |
| ||Failing to log critical exceptions. |
|Queries ||Using string concatenation to build queries. |
| ||Mixing queries with business logic. |
| ||Not optimizing the database for query execution. |
|Stored Procedures ||Using an incorrect strategy to pass parameters to stored procedures. |
| ||Formatting data for display to users in stored procedures. |
| ||Not considering how dynamic SQL in stored procedures can impact performance, security, and maintainability. |
|Transactions ||Using the incorrect isolation level. |
| ||Using exclusive locks, which can cause contention and deadlocks. |
| ||Allowing long-running transactions to block access to data. |
|Validation ||Failing to validate and constrain data fields. |
| ||Not handling NULL values. |
| ||Not filtering for invalid characters. |
|XML ||Not considering how to handle extremely large XML data sets. |
| ||Not choosing the appropriate technology for XML to relational database interaction. |
| ||Failure to set up proper indexes on applications that do heavy querying with XML |
| ||Failing to validate XML inputs using schemas. |
A BLOB is a binary large object. When data is stored and retrieved as a single stream of data, it can be considered to be a BLOB. A BLOB may have structure within it, but that structure is not apparent to the database that stores it or the data layer that reads and writes it. Databases can store the BLOB data or can store pointers to them within the database. The BLOB data is usually stored in a file system if not stored directly in the database. BLOBs are typically used to store image data, but can also be used to store binary representations of objects.
Consider the following guidelines when designing for BLOBs:
- Store BLOB data in a database only when it is not practical to store it on the disk.
- Consider using BLOBs to simplify synchronization of large binary objects between servers.
- Consider whether you need to search the BLOB data. If so, create and populate other searchable database fields instead of parsing the BLOB data.
- When retrieving the BLOB, cast it to the appropriate type for manipulation within your business or presentation layer.
- Do not consider storing the BLOB in the database when using buffered transmission.
Batching database commands can improve the performance of your data layer. Each request to the database execution environment incurs an overhead. Batching can reduce the total overhead by increasing throughput and decreasing latency. Batching similar queries can improve performance because the database caches and can reuse a query execution plan for a similar query.
Consider the following guidelines when designing batching:
- Consider using batched commands to reduce round trips to the database and minimize network traffic.
- Batch similar queries for maximum benefit. Batching dissimilar or random queries provides less reduction in overhead.
- Consider using batched commands and a DataReader to load or copy multiple sets of data.
- When loading large volumes of file-based data into the database, consider using bulk copy utilities.
- Do not consider placing locks on long-running batch commands.
Connections to data sources are a fundamental part of the data layer. All data source connections should be managed by the data layer. Creating and managing connections uses valuable resources in both the data layer and the data source. To maximize performance, follow guidelines for creating, managing, and closing connections
Consider the following guidelines when designing for data layer connections:
- In general, open connections as late as possible and close them as early as possible.
- To maximize the effectiveness of connection pooling, consider using a trusted subsystem security model and avoid impersonation if possible.
- Perform transactions through a single connection where possible.
- For security reasons, avoid using a System or User Data Source Name (DSN) to store connection information.
- Design retry logic to manage the situation where the connection to the data source is lost or times out.
Data formats and types are important in order to properly interpret the raw bytes stored in the database and transferred by the data layer. Choosing the appropriate data format provides interoperability with other applications, and facilitates serialized communications across different processes and physical machines. Data format and serialization are also important in order to allow the storage and retrieval of application state by the business layer.
Consider the following guidelines when designing your data format:
- In most cases, you should use custom data or business entities for improved application maintainability. This will require additional code to map the entities to database operations. However, new object/relational mapping (O/RM) solutions are available to reduce the amount of custom code required.
- Consider using XML for interoperability with other systems and platforms or when working with data structures that can change over time.
- Consider using DataSets for disconnected scenarios in simple CRUD-based applications.
- Understand the serialization and interoperability requirements of your application.
Design a centralized exception-management strategy so that exceptions are caught and thrown consistently in your data layer. If possible, centralize exception-handling logic in your database helper components. Pay particular attention to exceptions that propagate through trust boundaries and to other layers or tiers. Design for unhandled exceptions so they do not result in application reliability issues or exposure of sensitive application information.
Consider the following guidelines when designing your exception-management strategy:
- Determine exceptions that should be caught and handled in the data access layer. Deadlocks, connection issues, and optimistic concurrency checks can often be resolved at the data layer.
- Consider implementing a retry process for operations where data source errors or timeouts occur, where it is safe to do so.
- Design an appropriate exception propagation strategy. For example, allow exceptions to bubble up to boundary layers where they can be logged and transformed as necessary before passing them to the next layer.
- Design an approach for catching and handling unhandled exceptions.
- Design an appropriate logging and notification strategy for critical errors and exceptions that does not reveal sensitive information.
Object Relational Mapping Considerations
When designing an object oriented (OO) application, consider the impedance mismatch between the OO model and the relational model that makes it difficult to translate between them. For example, encapsulation in OO designs, where fields are hidden, contradicts the public nature of properties in a database. Other examples of impedance mismatch include differences in the data types, structural differences, transactional differences, and differences in how data is manipulated. The two common approaches to handling the mismatch are data access design patterns such as Repository, and O/RM tools. A common model associated with OO design is the Domain Model, which is based on modeling entities after objects within a domain. As a result, the term “domain” represents an object-oriented design in the following guidelines.
Consider the following guidelines when designing for object relational mapping:
- Consider using or developing a framework that provides a layer between domain entities and the database.
- If you are working in a Greenfield environment, where you have full control over the database schema, choose an O/RM tool that will generate a schema to support the object model and provide a mapping between the database and domain entities.
- If you are working in a Brownfield environment, where you must work with an existing database schema, consider tools that will help you to map between the domain model and relational model.
- If you are working with a smaller application or do not have access to O/RM tools, implement a common data access pattern such as Repository. With the Repository pattern, the repository objects allow you to treat domain entities as if they were located in memory.
- When working with Web applications or services, group entities and support options that will partially load domain entities with only the required data. This allows applications to handle the higher user load required to support stateless operations, and limit the use of resources by avoiding holding initialized domain models for each user in memory.
Queries are the primary data manipulation operations for the data layer. They are the mechanism that translates requests from the application into create, retrieve, update and delete (CRUD) actions on the database. As queries are so essential, they should be optimized to maximize database performance and throughput.
When using queries in your data layer, consider the following guidelines:
- Use parameterized SQL statements and typed parameters to mitigate security issues and reduce the chance of SQL injection attacks succeeding.
- When it is necessary to build queries dynamically, ensure that you validate user input data used in the query.
- Do not use string concatenation to build dynamic queries in the data layer.
- Consider using objects to build queries. For example, implement the Query Object pattern or use the object support provided by ADO.NET.
- When building dynamic SQL, avoid mixing business-processing logic with logic used to generate the SQL statement. Doing so can lead to code that is very difficult to maintain and debug.
In the past, stored procedures represented a performance improvement over dynamic SQL statements. However, with modern database engines, performance is no longer a major factor. When considering the use of stored procedures, the primary factors are abstraction, maintainability, and your environment. This section contains guidelines to help you design your application when using stored procedures. For guidance on choosing between using stored procedures and dynamic SQL statements, see the section that follows.
When it comes to security and performance, the primary guidelines are to use typed parameters and avoid dynamic SQL within the stored procedure. Parameters are one of the factors that influence the use of cached query plans instead of rebuilding the query plan from scratch. When parameter types and the number of parameters change, new query execution plans are generated, which can reduce performance.
Consider the following guidelines when designing stored procedures:
- Use typed parameters as input values to the procedure and output parameters to return single values.
- Use parameter or database variables if it is necessary to generate dynamic SQL within a stored procedure.
- Consider using XML parameters for passing lists or tabular data.
- Design appropriate error handling and return errors that can be handled by the application code.
- Avoid the creation of temporary tables while processing data. However, if temporary tables need to be used, consider creating them in-memory rather than on disk.
Stored Procedures vs. Dynamic SQL
The choice between stored procedures and dynamic SQL focuses primarily on the use of SQL statements dynamically generated in code instead of SQL implemented within a stored procedure in the database. When choosing between stored procedures and dynamic SQL, you must consider the abstraction requirements, maintainability, and environment constraints.
The main advantages of stored procedures are:
- They provide an abstraction layer to the database, which can minimize the impact on application code when the database schema changes.
- Security is easier to implement and manage because you can restrict access to everything except the stored procedure.
The main advantages of dynamic SQL statements are:
- You can take advantage of fine-grained security features supported by most databases.
- They require less in terms of specialist skills than stored procedures.
- They are easier to debug than stored procedures.
Consider the following guidelines when choosing between stored procedures and dynamic SQL:
- If you have a small application that has a single client and few business rules, dynamic SQL is often the best choice.
- If you have a larger application that has multiple clients, consider how you can achieve the required abstraction. Decide where that abstraction should exist: at the database in the form of stored procedures, or in the data layer of your application in the form of data access patterns or O/RM products.
- If you want to minimize code changes when the database schema changes, consider using stored procedures to provide an abstraction layer. Changes associated with normalization or schema optimization will often have no affect on application code. If a schema change does affect inputs and outputs in a procedure, application code is affected; however, the changes are limited to clients of the stored procedure.
- Consider the resources you have for development of the application. If you do not have resources that are intimately familiar with database programming, consider tools or patterns that are more familiar to your development staff.
- Consider debugging support. Dynamic SQL is easier for application developers to debug.
- When considering dynamic SQL, you must understand the impact that changes to database schemas will have on your application. As a result, you should implement an abstraction in the data access layer to decouple business components from the generation of database queries. Several patterns, such as Query Object and Repository, can be used to provide this abstraction.
is an exchange of sequential information and associated actions that are treated as an atomic unit in order to satisfy a request and ensure database integrity. A transaction is only considered complete if all information and actions are complete, and the associated database changes are made permanent. Transactions support undo (rollback) database actions following an error, which helps to preserve the integrity of data in the database.
Consider the following guidelines when designing transactions:
- Enable transactions only when you need them. For example, you should not use a transaction for an individual SQL statement because Microsoft SQL Server® automatically executes each statement as an individual transaction.
- Keep transactions as short as possible to minimize the amount of time that locks are held.
- Use the appropriate isolation level. The tradeoff is data consistency versus contention. A high isolation level will offer higher data consistency at the price of overall concurrency. A lower isolation level improves performance by lowering contention at the cost of consistency.
- If using manual or explicit transactions, consider implementing the transaction within a stored procedure.
- Consider the use of multiple active result sets (MARS) in transaction-heavy concurrent applications to avoid potential deadlock issues.
Designing an effective input and data-validation strategy is critical to the security of your application. Determine the validation rules for data received from other layers and from third-party components, as well as from the database or data store. Understand your trust boundaries so that you can validate any data that crosses these boundaries.
Consider the following guidelines when designing a validation strategy:
- Validate all data received by the data layer from all callers.
- Consider the purpose to which data will be put when designing validation. For example, user input used in the creation of dynamic SQL should be examined for characters or patterns that occur in SQL injection attacks.
- Understand your trust boundaries so that you can validate data that crosses these boundaries.
- Return informative error messages if validation fails.
Extensible Markup Language (XML) is useful for interoperability and for maintaining data structure outside the database. For performance reasons, be careful when using XML for very large amounts of data. If you must handle large amounts of data, use attribute-based schemas instead of element-based schemas. Use schemas to validate the XML structure and content.
Consider the following guidelines when designing for the use of XML:
- Consider using XML readers and writers to access XML-formatted data.
- Consider using an XML schema to define formats and to provide validation for data stored and transmitted as XML.
- Consider using custom validators for complex data parameters within your XML schema.
- Store XML in typed columns in the database, if available, for maximum performance.
- For read-heavy applications that use XML in SQL Server, consider XML indexes.
Manageability is an important factor in your application because a manageable application is easier for administrators and operators to install, configure, and monitor. Manageability also makes it easier to detect, validate, resolve, and verify errors at run time. You should always strive to maximize manageability when designing your application.
Consider the following guidelines when designing for manageability:
- Consider using common interface types or a shared abstraction (Dependency Inversion) to provide an interface to the data access layer.
- Consider the use of custom entities, or decide if other data representations will better meet your requirements. Coding custom entities can increase development costs; however, they also provide improved performance through binary serialization and a smaller data footprint.
- Implement business entities by deriving them from a base class that provides basic functionality and encapsulates common tasks. However, be careful not to overload the base class with unrelated operations, which would reduce the cohesiveness of entities derived from the base class, and cause maintainability and performance issues.
- Design business entities to rely on data access logic components for database interaction. Centralize implementation of all data access policies and related business logic. For example, if your business entities access SQL Server databases directly, all applications deployed to clients that use the business entities will require SQL connectivity and logon permissions.
- Consider using stored procedures to abstract data access from the underlying data schema. However, be careful not to overuse them because this will severely impact code maintenance and reuse and thus the maintainability of your application. A symptom of overuse is large trees of stored procedures that call each other.
Performance is a function of both your data layer design and your database design. Consider both together when tuning your system for maximum data throughput.
Consider the following guidelines when designing for performance:
- Use connection pooling and tune performance based on results obtained by running simulated load scenarios.
- Consider tuning isolation levels for data queries. If you are building an application with high-throughput requirements, special data operations may be performed at lower isolation levels than the rest of the transaction. Combining isolation levels can have a negative impact on data consistency, so you must carefully analyze this option on a case-by-case basis.
- Consider batching commands to reduce round trips to the database server.
- Consider using optimistic concurrency with non-volatile data to mitigate the cost of locking data in the database. This avoids the overhead of locking database rows, including the connection that must be kept open during a lock.
- If using a DataReader, use ordinal lookups for faster performance.
The data layer should protect the database against attacks that try to steal or corrupt data. It should allow only as much access to the various parts of the data source as is required. The data layer should also protect the mechanisms used to gain access to the data source.
Consider the following guidelines when designing for security:
- When using Microsoft SQL Server, consider using Windows authentication with a trusted subsystem.
- Encrypt connection strings in configuration files instead of using a system or user data source name (DSN).
- When storing passwords, use a salted hash instead of an encrypted version of the password.
- Require that callers send identity information to the data layer for auditing purposes.
- If you are using SQL statements, consider the parameterized approach instead of string concatenation to protect against SQL injection attacks.
When deploying a data access layer, the goal of a software architect is to consider the performance and security issues in the production environment.
Consider the following guidelines when deploying the data access layer:
- Locate the data access layer on the same tier as the business layer to improve application performance.
- If you need to support a remote data access layer, consider using the TCP protocol to improve performance.
- You should not locate the data access layer on the same server as the database.
Key patterns are organized by the key categories detailed in the Data Layer Frame in the following table. Consider using these patterns when making design decisions for each category.
|Category ||Relevant patterns|
|General ||Active Record |
| ||Data Mapper |
| ||Data Transfer Object |
| ||Domain Model |
| ||Query Object |
| ||Repository |
| ||Table Data Gateway |
| ||Table Module |
|Batching ||Parallel Processing |
| ||Partitioning |
|Transactions ||Coarse-Grained Lock |
| ||Capture Transaction Details |
| ||Implicit Lock |
| ||Optimistic Offline Lock |
| ||Pessimistic Offline Lock |
| ||Transaction Script |
- For more information on the Domain Model, Table Module, Coarse-Grained Lock, Implicit Lock, Transaction Script, Active Record, Data Mapper, Data Transfer Object, Optimistic Offline Locking, Pessimistic Offline Locking, Query Object, Repository, Row Data Gateway, and Table Data Gateway patterns, see “Patterns of Enterprise Application Architecture (P of EAA)” at http://martinfowler.com/eaaCatalog/
- For more information on the Capture Transaction Details pattern, see “Data Patterns” at http://msdn.microsoft.com/en-us/library/ms998446.aspx
- Active Record. Include a data access object within a domain entity.
- Application Service. Centralize and aggregate behavior to provide a uniform service layer.
- Capture Transaction Details.** Create database objects, such as triggers and shadow tables, to record changes to all tables belonging to the transaction.
- Coarse Grained Lock.** Lock a set of related objects with a single lock.
- Data Mapper. Implement a mapping layer between objects and the database structure that is used to move data from one structure to another while keeping them independent.
- Data Transfer Object. An object that stores the data transported between processes, reducing the number of method calls required.
- Domain Model.** A set of business objects that represents the entities in a domain and the relationships between them.
- Implicit Lock.** Use framework code to acquire locks on behalf of code that accesses shared resources.
- Optimistic Offline Lock.** Ensure that changes made by one session do not conflict with changes made by another session.
- Parallel Processing.** Allow multiple batch jobs to run in parallel to minimize the total processing time.
- Partitioning. Partition multiple large batch jobs to run concurrently.
- Pessimistic Offline Lock.** Prevent conflicts by forcing a transaction to obtain a lock on data before using it.
- Query Object. An object that represents a database query.
- Repository. An in-memory representation of a data source that works with domain entities.
- Table Data Gateway. An object that acts as a gateway to a table or view in a data source and centralizes all of the select, insert, update, and delete queries.
- Table Module.** A single component that handles the business logic for all rows in a database table or view.
- Transaction Script. Organize the business logic for each transaction in a single procedure, making calls directly to the database or through a thin database wrapper.
The following guidelines will help you to choose an appropriate implementation technology and techniques depending on the type of application you are designing and the requirements of that application:
- If you require basic support for queries and parameters, consider using ADO.NET objects directly.
- If you require support for more complex data-access scenarios, or need to simplify your data access code, consider using the Enterprise Library Data Access Application Block.
- If you are building a data-driven Web application with pages based on the data model of the underlying database, consider using ASP.NET Dynamic Data.
- If you want to manipulate XML-formatted data, consider using the classes in the System.Xml namespace and its subsidiary namespaces.
- If you are using ASP.NET to create user interfaces, consider using a DataReader to access data to maximize rendering performance. DataReaders are ideal for read-only, forward-only operations in which each row is processed quickly.
- If you are accessing Microsoft SQL Server, consider using classes in the ADO.NET SqlClient namespace to maximize performance.
- If you are accessing Microsoft SQL Server 2008, consider using a FILESTREAM for greater flexibility in the storage and access of BLOB data.
- If you are designing an object-oriented business layer based on the Domain Model pattern, consider using the ADO.NET Entity Framework.
patterns & practices Solution Assets
For information about patterns & practices solution assets, see the following resources:
For more information on general data access guidelines, see the following resources: