Note that the information on this page is the BETA 1 of a guide that is now released. See
http://www.codeplex.com/AppArchGuide for the latest PDF and HTML content.
Chapter 14 – Data Layers Guidelines
- J.D. Meier, Alex Homer, David Hill,
Prashant Bansode, Lonnie Wall, Rob Boucher Jr, Akshay Bogawat
This chapter describes the key guidelines for the design of the data layer of an application. The guidelines are organized by category. They cover the common issues encountered, and mistakes commonly made, when designing the data layer. The following diagram
shows how the data layer fits into a common application architecture.
Data Layer Components
- Data access logic components. Data access components abstract the logic necessary to access your underlying data stores. Doing so centralizes the data access functionality, which makes the application easier to configure and maintain.
- Data Helpers / Utilities. Helper functions and utilities assist in manipulation, transformation and access of data in the layer. They consist of specialized libraries and/or custom routines especially designed to maximize data access performance
and reduce the development requirements of the logic components and the service agent parts of the layer.
- Service agents. When a business component must use functionality exposed by an external service, you may need to create code that manages the semantics of communicating with that particular service. Service agents isolate your application from the
idiosyncrasies of calling diverse services , and can provide additional services such as basic mapping between the format of the data exposed by the service and the format your application requires.
A correct approach to designing the data layer will reduce development time and assist in maintenance of the data layer after the application is deployed. This section briefly outlines an effective design approach for the data layer. Perform the following key
activities in each of these areas when designing your data layer
- Create an overall design for your data access layer.
- Identify your data source requirements
- Determine your data access approach
- Choose how to map data structures to the data source
- Determine how to connect to the data source
- Determine strategies for handling data source errors.
- Design your data access components.:
- Enumerate the data sources that you will access
- Decide on the method of access for each data source
- Determine whether helper components are required or desirable to simplify data access component development and maintenance
- Determine relevant design patterns. For example, consider using the Table Data Gateway, Query Object, and Repository, and other patterns.
- Design your data helper components.
- Identify functionality that could be moved out of the data access components and centralized for reuse
- Research available helper component libraries
- Consider custom helper components for common problems such as connection strings, data source authentication, monitoring, and exception processing
- Consider implementing routines for data access monitoring and testing in your helper components 5) Consider the setup and implementation of logging for your helper components.
- Design your service agents.
- Use the appropriate tool to add a service reference. This will generate a proxy and the data classes that represent the data contract from the service
- Determine how the service will be used in your application. For most applications, you should use an abstraction layer between the business layer and the data access layer, which will provide a consistent interface regardless of the data source. For smaller
applications, the business layer, or even the presentation layer, may access the service agent directly.
The following design guidelines provide information about different aspects of the data access layer that you should consider. The remainder of this chapter provides details to help you choose appropriate technologies and design patterns.
- Choose the data access technology. The choice of an appropriate data access technology will depend on the type of data you are dealing with, and how you want to manipulate the data within the application. Certain technologies are better suited for
specific scenarios. The following sections of this guide discuss these options and enumerate the benefits and drawbacks of each data access technology.**
- Use abstraction to implement a loosely coupled interface to the data access layer. This can be accomplished by defining interface components, such as a façade with well-known inputs and outputs, which translate requests into a format understood
by components within the layer. In addition, you can use interface types or abstract base classes to define a shared abstraction that must be implemented by interface components.
- Consider consolidating data structures. If you are dealing with table-based entities in your data access layer, consider using Data Transfer Objects (DTOs) to help you organize the data into unified structures. In addition, DTOs encourage coarse-grained
operations while providing a structure that is designed to move data across different boundary layers.
- Encapsulate data access functionality within the data access layer. The data access layer hides the details of data source access. It is responsible for managing connections, generating queries, and mapping application entities to data source structures.
Consumers of the data access layer interact through abstract interfaces using application entities such as custom objects, DataSets, DataReaders, and XML documents. Other application layers that access the data access layer will manipulate this data in more
complex ways to implement the functionality of the application. Separating concerns in this way assists in application development and maintenance.
- Decide how to map application entities to data source structures. The type of entity you use in your application is the main factor in deciding how to map those entities to data source structures.
- Decide how you will manage connections. As a rule, the data access layer should create and manage all connections to all data sources required by the application. You must choose an appropriate method for storing and protecting connection information
that conforms to application and security requirements.
- Determine how you will handle data exceptions. The data access layer should catch and (at least initially) handle all exceptions associated with data sources and CRUD operations. Exceptions concerning the data itself, and data source access and timeout
errors, should be handled in this layer and passed to other layers only if the failures affect application responsiveness or functionality.
- Consider security risks. The data access layer should protect against attacks that try to steal or corrupt data, and protect the mechanisms used to gain access to the data source. It should also use the “least privilege” design approach to restrict
privileges to only those needed to perform the operations required by the application. If the data source itself has the ability to limit privileges, security should be considered and implemented in the data access layer as well as in the source.
- Reduce round trips. Consider batching commands into a single database operation.
- Consider performance and scalability objectives. Scalability and performance objectives for the data access layer should be taken into account during design. For example, when designing an Internet-based merchant application, data layer performance
is likely to be a bottleneck for the application. When data layer performance is critical, use profiling to understand and then limit expensive data operations.
Data Layer Frame
||Improperly storing BLOBs in the database instead of the file system.
||Using an incorrect type for BLOB data in database.
||Searching and manipulating BLOB data.
||Failing to use batching to reduce database round-trips .
||Holding onto locks for excessive periods when batching.
||Failing to consider a strategy for reducing database round-trips with batching.
||Improper configuration of connection pooling.
||Failing to handle connection timeouts and disconnections.
||Performing transactions that span multiple connections.
||Holding connections open for excessive periods.
||Using individual identities instead of a trusted subsystem to access the database.
||Choosing the wrong data format.
||Failing to consider serialization requirements.
||Not Mapping objects to a relational data store.
||Not handling data access exceptions.
||Failing to shield database exceptions from the original caller.
||Failing to log critical exceptions.
||Using string concatenation to build queries.
||Mixing queries with business logic.
||Not optimizing the database for query execution.
||Not passing parameters to stored procedures correctly.
||Implementing business logic in stored procedures.
||Not considering how dynamic SQL in stored procedures can impact performance, security, and maintainability.
||Using the incorrect isolation level.
||Using transactions that span multiple data sources.
||Using exclusive locks, which can cause contention and deadlocks.
||Allowing long-running transactions to blocking access to data.
||Failing to perform data type validation against data fields.
||Not handling NULL values.
||Not filtering for invalid characters.
||Not considering how to handle extremely large XML data sets.
||Not choosing the appropriate technology for XML to relational database interaction.
||Failure to set up proper indexes on applications that do heavy querying with XML
||Failing to validate XML inputs using schemas.
A BLOB is a Binary Large Object. When data is stored and retrieved as a single stream of data, it can be considered to be a BLOB. BLOBs may have structure within them, but that structure is not apparent to the database that stores it or the data layer that
reads and writes it. Databases can store the BLOB data or can store pointers to them within the database. The BLOB data is usually stored in a file system if not stored directly in the database. BLOBs are typically used to store image data, but can also be
used to store binary representations of objects
When designing for BLOBs, consider the following guidelines:
- Store images in a database only when it is not practical to store them on the disk.
- Use BLOBs to simplify synchronization of large binary objects between servers.
- Consider whether you need to search the BLOB data. If so, create and populate other searchable database fields instead of parsing the BLOB data.
- Consider using the file system for BLOB data to improve performance. How much of a performance increase will vary depending on your database structure and implementation. Using a file system you must also consider storing and synchronizing metadata related
to the file in the database.
- When retrieving the BLOB, cast it to the appropriate type for manipulation within your business or presentation layer.
- Do not consider storing BLOB in the database when using Buffered transmission.
Batching database commands can improve the performance of your data layer. There is an upfront cost to entering the database execution environment. Batching can reduce the total upfront cost and thus increase throughput and decrease latency. Batching similar
queries is better because the database caches and can reuse a query execution plan for a similar query.
When designing batching, consider the following guidelines:
- Use batched commands to reduce round trips to the database and minimize network traffic.
- Batch largely similar queries for maximum benefit. Batching dissimilar or random queries is minimally helpful
- Use batched commands and a DataReader to load or copy multiple sets of data.
- When loading large amounts of file-based data into the database, use bulk copy utilities. Do not consider placing locks on long running batch commands.
Connections to data sources are a fundamental part of the data layer. All data source connections should be managed by the data layer. Creating and managing connections uses valuable resources in both the data layer and the data source. To maximize performance,
follow guidelines for creating, managing and closing connections
When designing for data layer connections, consider the following guidelines:
- Choose a strategy for handling connection strings.
- Use the default connection-pooling mechanism to minimize the number of live connections.
- Be sure connections do not have extra unnecessary characters. Added characters will make functionally identical connections appear as different resulting in inefficient pooling.
- In general, open connections as late as possible and close them as early as possible
- Do not open and close a connection after each transactionConsider keeping the connection open for a configurable amount of time to allow reuse. At deployment, test your application under real-world loads, varying this amount of time to optimize performance.
- Do not rely on garbage collection to free connections. Free them explicitly as soon as possible.
- Perform transactions through a single connection where possible.
- Design retry logic to manage the situation where the connection to data source is lost or times out.
- For security reasons, avoid using a System or User Data Source Name (DSN).
- Consider encrypting connection information in the configuration files. For example, use the built-in .NET mechanism to encrypt connection strings.
Data formats and types are important to properly interpret the raw bytes stored in the database and transferred by the data layer. Choosing the appropriate data format provides interoperability with other applications, and facilitates serialized communications
across different processes and physical machines. Data format and serialization is also important to allow the storage and retrieval of application state by the business layer.
When designing your data format, consider the following guidelines:
- Choose a data format appropriate for your application.
- In most cases you should use custom data or business entities for improved application maintainability. This will require additional code to map the entities to database operations. However, new O/RM solutions are available to reduce the amount of custom
- Data structures used by the data access layer should not contain business rules.
- Use XML when working with structured data that changes over time.
- Determine the serialization requirements for the data.
- Consider the interoperability requirements.
Design a centralized exception management strategy so that exceptions are caught and thrown consistently in your data layer. If possible, centralize exception handling logic in your database helper components. Pay particular attention to exceptions that propagate
through trust boundaries and to other layers or tiers. Design for unhandled exceptions so they do not result in application reliability issues or exposure of sensitive application information.
When designing your exception management strategy, consider the following guidelines:
- Determine which exceptions should be propagated to the original caller. Deadlocks, connection issues and optimistic concurrency checks can often be resolved at the data layer.
- Implement a global exception handler to catch unhandled exceptions and re-throw those that should be passed to the original caller.
- When exposing your data layer through a service interface, use exception shielding to avoid exposing sensitive database information in your exception messages.
- Handle all exceptions related to data access in the data layer.
- Consider implementing a retry process for data source errors and timeouts for operations where it’s safe to do so.
- Inform your users of exceptions affecting their application experience. Consider returning exception information to your business layer for appropriate handling and reporting to the user.
- Log exceptions to allow easy troubleshooting of specific errors.
- If you have multiple data sources, include individual data source information when logging exceptions and errors.
Queries are the primary data manipulation operations for the data layer. They are the mechanism that translates requests from the application into create, retrieve, update and delete (CRUD) actions on the database. As queries are so essential, they should be
optimized to maximize database performance and throughput.
When using queries in your data layer, consider the following guidelines:
- Use SQL statements as queries only when a stored procedure is not practical.
- Use parameterized SQL statements instead of using literals to reduce the chances of SQL Injection.
- When it is necessary to dynamically build queries, do not allow user input to determine the query details.
- Do not use string concatenation to build dynamic queries in the data layer.
- Use objects to build the query. For example, implement the Query Object pattern or use the object support provided by ADO.NET.
Stored procedures can optimize query performance, and can also improve security. They provide significant performance improvement over SQL statement queries because the query execution plan can be optimized to suit the database, and the database can be optimized
to match the queries used in the stored procedures. They also act as an additional security mechanism because a caller can get data from a stored procedure, but it does not allow access to the underlying database tables and views. Stored procedures can also
be parameterized to provide the query complexity needed by many applications.
When designing stored procedures, consider the following guidelines:
- Use stored procedures to improve database efficiency and data layer security.
- Use output parameters to return single values.
- Avoid dynamic SQL. Use stored procedures when possible, especially when data access is a bottleneck in the application.
- Consider using individual parameters for single data inputs.
- Consider using XML parameters for passing lists or tabular data.
- Implement appropriate transactions to address consistency and recovery issues.
- Design appropriate error handling to return errors that can be handled by the application code.
- Avoid implementing business logic in stored procedures.
- Avoid the creation of temporary tables while processing data. However, if temporary tables need to be used, consider creating them in-memory rather than on disk.
A transaction is a exchange of sequential information an associated actions that is treated as an atomic unit so as to satisfy a request and ensure database integrity. A transaction is only considered complete if all information and actions are completed and
its associated database changes made permanent. Transactions support undo (rollback) database actions following an error which helps to preserve ACID properties of the database.
When designing transactions, consider the following guidelines:
- Enable transactions only when you need them. For example, you should not use a transaction for an individual SQL statement because SQL Server automatically executes each statement as an individual transaction.
- Keep transactions as short as possible to minimize the amount of time that locks are held.
- Use the appropriate isolation level. The tradeoff is data consistency versus contention. A high isolation level will offer higher data consistency at the price of overall concurrency. A lower isolation level improves performance by lowering contention at
the cost of consistency.
- If performing transactions against a single database, use manual transactions.
- If a single transaction spans multiple databases, you should use automatic transactions.
- Avoid mixing manual and automatic transactions.
- If using manual transactions, consider implementing the transaction within a stored procedure.
- Consider use of MARS in transaction heavy concurrent applications to avoid potential deadlock issues.
- If performing multiple inserts, updates and deletes together, wrap them in a transaction for better performance.
Designing an effective input and data validation strategy is critical to the security of your application. Determine the validation rules for data received from other layers, from third party components, as well as from the database or data store. Understand
your trust boundaries so that you can validate any data that crosses these boundaries.
- Validate all data received by the data layer from all callers.
- Validate all data retrieved from the database.
- Understand your trust boundaries so that you can validate data that crosses these boundaries.
- Determine the validation that occurs in other layers. If data is already trusted, you may not need to repeat the validation.
- Consider the purpose to which data will be put when designing validation.
- Validate all data before performing updates to the database.
- Return informative error messages if validation fails.
XML is useful for maintaining data structure outside of the database. For performance reasons, be careful when using XML for very large data sets. Use schemas to validate the XML structure and content.
When designing for the use of XML, consider the following guidelines:
- Use XML readers and writers to access XML-formatted data.
- Use an XML schema to define formats for data stored and transmitted as XML.
- Validate received XML against the appropriate schemas.
- Use custom validators for complex data parameters within your XML schema.
- Store XML in typed columns, if available, in the database for maximum performance.
- For read-heavy applications that use XML in SQL Server, consider XML indexes.
When designing for manageability, consider the following guidelines:
- Carefully consider whether you need to create custom entities, or if other data representations better meet your requirements. Coding custom entities is a complex task with development costs that increase in line with the number of features the entities
provide. Typically, custom entities are implemented for applications that must expose a developer-friendly object model for customization.
- Implement business entities by deriving them from a base class that provides basic functionality and encapsulates common tasks.
- Rely on the use of internal DataSets or XML documents for complex data instead of internal collections, structs, and other programmatic structures.
- Implement a common set of interfaces that expose common sets of functionality across your business entities.
- Design business entities to rely on data access logic components for database interaction. Centralize implementation of all data access policies and related business logic. For example, if your business entities access SQL Server databases directly, all
applications deployed to clients that use the business entities will require SQL connectivity and logon permissions.
- Use stored procedures to abstract data access from the underlying data schema. However, be careful not to overuse them because this will severely impact code maintenance and reuse and thus the maintainability of your application.. A symptom of overuse is
large trees of stored procedures that call each other. Avoid using them to implement control flow, to manipulate individual values (for example, perform string manipulation), and other functionality difficult to implement in Transact-SQL.
Performance is a function of both your data layer design and your database design. Consider both together when tuning your system for maximum data throughput.
When designing for performance, consider the following guidelines:
- Consider tuning isolation levels for data queries. If you are building an application with high throughput requirements, special data operations may be performed at lower isolation levels than the rest of the transaction. Combining isolation levels can
have a negative impact on data consistency, so you must carefully analyze this option on a case-by-case basis.
- Consider batching commands to reduce round-trips to the database server.
- Use connection pooling and tune performance based on results when running simulated load scenarios.
- Use optimistic concurrency with non-volatile data to mitigate the cost of locking data in the database. This avoids the overhead of locking database rows, including the connection that must be kept open during a lock.
- Use pessimistic concurrency when there is heavy contention to update data, but where locks are kept for a short amount of time. Consider that the longer time a lock is enforced, the less a design will scale.
- If using DataReader, use ordinal lookups to for faster performance.
The data layer should protect the database against attacks that try to steal or corrupt data. It should allow only as much access to the various parts of the data source as is required. It should also protect the mechanisms used to gain access to the data source.
When designing for security, consider the following guidelines:
- When using Microsoft SQL Server, consider using Windows authentication over SQL authentication.
- For performance reasons, avoid impersonation in the middle tier.
- Understand your trust boundaries and implement validation for all data that crosses any of these boundaries.
- If you are using SQL statements, consider the parameterized approach instead of string concatenation to protect against SQL injection attacks.
- Encrypt connection strings that have sensitive information in configuration files rather than using a system or user DSN.
- Encrypt sensitive data before storing it in a database or use database encryption when availableWhen storing passwords, use a salted hash instead of an encrypted version of the password.
- Encrypt sensitive data in transit over the network or the Internet.
- Consider using a least-privilege data source access model.
- Require that callers send identity information to the data layer for auditing purposes.
- Log identity and access information.
When deploying a data access layer, the goal of a software architect is to consider the performance and security issues in the production environment.
When deploying the data access layer, consider the following guidelines:
- Locate the data access layer on the same tier as the business layer to improve application performance.
- If you need to support a remote data access layer, consider using the TCP protocol to improve performance.
- You should not locate the data access layer on the same server as the database.
||Table Data Gateway
||Coarse Grained Lock
||Capture Transaction Details
||Optimistic Offline Lock
||Pessimistic Offline Lock
- Active Record. Put data access logic in the domain object.
- Capture Transaction Details. Create database objects, such as triggers and (shadow) tables, to record changes of all tables belonging to the replication set.
- Repository. Encapsulate set of objects persisted in a database and the operations performed over them.
- Data Transfer Object. Use an object to carry data between processes in order to reduce the number of method calls.
- Table Data Gateway.** Centralize all the SQL for accessing a single table or view for selects, inserts, updates, and deletes.
Use the following guidelines when choosing an appropriate technology for the data layer:
- Use the classes in the SqlClient namespace when accessing SQL Server for maximum performance.
- Consider using the Enterprise Library Data Access Application Block to simplify data access code.
- Use the classes in the System.Xml namespace and subsidiary namespaces to manipulate XML-formatted data.
- ASP.NET-based user interfaces will benefit in terms of performance from rendering data exposed as DataReaders. DataReaders are ideal for read-only, forward-only operations in which each row is processed quickly.
For more information on data access performance, see the following resources:
For more information on data access design patterns, see the following resources:
For more information on general data access guidelines, see the following resources:
For more information on security, see the following resources: