Oracle® Database 2 Day + Data Warehousing Guide 11g Release 1 (11.1) Part Number B28314-01 |
|
|
View PDF |
Use this section as a guide for creating ETL logic that meets your performance expectations.
Data partitioning can improve performance when loading or purging data in a target system. This practice is known as Partition Exchange Loading (PEL).
PEL is recommended when loading a relatively small amount of data into a target containing a much larger volume of historical data. The target can be a table, a dimension, or a cube in a data warehouse.
By manipulating partitions in your target system, you can use Partition Exchange Loading (PEL) to instantly add or delete data. When a table is exchanged with an empty partition, new data is added.
You can use Partition Exchange Loading to load new data by exchanging it into a target table as a partition. For example, a table that holds the new data assumes the identity of a partition from the target table and this partition assumes the identity of the source table. This exchange process is a DDL operation with no actual data movement.
Figure 6-1 Overview of Partition Exchange Loading
In Figure 5–8, data from a source table Source
is inserted into a target table consisting of four partitions (Target_P1
, Target_P2
, Target_P3
, and Target_P4
). If the new data needs to be loaded into Target_P3
, the partition exchange operation only exchanges the names on the data objects without moving the actual data. After the exchange, the formerly labeled Source
is renamed to Target_P3
, and the former Target_P3
is now labeled as Source
. The target table still contains four partitions: Target_P1
, Target_P2
, Target_P3
, and Target_P4
. The partition exchange operation available in Oracle 9i completes the loading process without data movement.
To configure a mapping for partition exchange loading, complete the following steps:
In the Project Explorer, right-click a mapping and select Configure.
Warehouse Builder displays the Configuration Properties window.
By default, Partition Exchange Loading is disabled for all mappings. Select PEL Enabled to use Partition Exchange Loading.
Use Data Collection Frequency to specify the amount of new data to be collected for each run of the mapping. Set this parameter to specify if you want the data collected by Year, Quarter, Month, Day, Hour, or Minute. This determines the number of partitions.
Select Direct if you want to create a temporary table to stage the collected data before performing the partition exchange. If you do not select this parameter, Warehouse Builder directly swaps the source table into the target table as a partition without creating a temporary table.
If you select Replace Data, Warehouse Builder replaces the existing data in the target partition with the newly collected data. If you do not select it, Warehouse Builder preserves the existing data in the target partition. The new data is inserted into a non-empty partition. This parameter affects the local partition and can be used to remove or swap a partition out of a target table. At the table level, you can set Truncate/Insert properties.
When you use Warehouse Builder to load a target by exchanging partitions, you can load the target indirectly or directly.
Indirect Partition Exchange Loading: By default, Warehouse Builder creates and maintains a temporary table that stages the source data before initiating the partition exchange process. For example, use Indirect PEL when the mapping includes a remote source or a join of multiple sources.
Direct Partition Exchange Loading: You design the source for the mapping to match the target structure. For example, use Direct PEL in a mapping to instantaneously publish fact tables that you loaded in a previously executed mapping.
If you design a mapping using Partition Exchange Loading and it includes remote sources or a join of multiple sources, Warehouse Builder must perform source processing and stage the data before partition exchange can proceed. Therefore, configure such mappings with Direct PEL set to False. Warehouse Builder transparently creates and maintains a temporary table that stores the results from source processing. After performing the PEL, Warehouse Builder drops the table.
Warehouse Builder creates the temporary table using the same structure as the target table with the same columns, indexes, and constraints. For the fastest performance, Warehouse Builder loads the temporary table using parallel direct-path loading INSERT. After the INSERT, Warehouse Builder indexes and constrains the temporary table in parallel.
Use Direct PEL when the source table is local and the data is of good quality. You must design the mapping such that the source and target are in the same database and have exactly the same structure. The source and target must have the same indexes and constraints, the same number of columns, and the same column types and lengths.
For example, assume that you have the same mapping from Figure 5–10 but would like greater control on when data is loaded into the target. Depending on the amount of data, it could take hours to load and you would not know precisely when the target table would be updated.
To instantly load data to a target using Direct PEL:
Design one mapping to join source data, if necessary, transform data, ensure data validity, and load it to a staging table. Do not configure this mapping to use PEL.
Design the staging table to exactly match the structure of the final target that you will load in a separate mapping. For example, the staging table in Figure 5–10 is ORDER_SUMMARY and should be of the same structure as the final target, ORDER_CUBE in Figure 5–11.
Create a second mapping that loads data from the staging table to the final target such as shown in Figure 5–11. Configure this mapping to use Direct PEL.
Use either the Warehouse Builder Process Flow Editor or Oracle Workflow to launch the second mapping after the completion of the first.
You can use Partition Exchange Loading effectively for scalable loading performance if the following conditions are true:
Table partitioning and tablespace: The target table must be Range partitioned by one DATE column. All partitions must be created in the same tablespace. All tables are created in the same tablespace.
Existing historical data: The target table must contain a huge amount of historical data. An example use for PEL is for a click stream application where the target collects data every day from an OLTP database or Web log files. New data is transformed and loaded into the target that already contains historical data.
New data: All new data must to be loaded into the same partition in a target table. For example, if the target table is partitioned by day, then the daily data should be loaded into one partition.
Loading Frequency: The loading frequency should be equal to or less than the data collection frequency.
No global indexes: There must be no global indexes on the target table.
To configure targets in a mapping for Partition Exchange Loading:
Step 1: Create All Partitions
Step 2: Create All Indexes Using the LOCAL Option
Step 3: Primary/Unique Keys Use "USING INDEX" Option
Warehouse Builder does not automatically create partitions during runtime. Before you can use Partition Exchange Loading, you must create all partitions.
For example, if you select Month as the frequency of new data collection, you need to create all the required partitions for each month of new data. Use the Data Object Editor to create partitions for a table, dimension, or cube.
Figure 6–5 shows the property inspector window for table ORDER_SUMMARY. This figure shows six partitions that have been added for this table.
To use Partition Exchange Loading, all partition names must follow a naming convention. For example, for a partition that will hold data for May 2002, the partition name must be in the format Y2002_Q2_M05.
For PEL to recognize a partition, its name must fit one of the following formats.
Y
dddd
Y
dddd_
Q
d
Y
dddd_
Q
d_
M
dd
Y
dddd_
Q
d_
M
dd_
D
dd
Y
dddd_
Q
d_
M
dd_
D
dd_
H
dd
Y
dddd_
Q
d_
M
dd_
D
dd_
H
dd_
M
dd
Where d
represents a decimal digit. All the letters must be in upper case. Lower case is not recognized.
Figure 6-5 Configuration Properties for Table ORDER_SUMMARY
If you correctly name each partition, Warehouse Builder automatically computes the Value Less Than property for each partition. Otherwise, you must manually configure Value Less Than for each partition for Warehouse Builder to generate a DDL statement. The following is an example of a DDL statement generated by Warehouse Builder:
. . . PARTITION A_PARTITION_NAME VALUES LESS THAN (TO_DATE('01-06-2002','DD-MM-YYYY')), . . .
Figure 5–13 shows automatically generated configuration values for the Value Less Than parameter.
Figure 6-6 Automatically Generated "Value Less Than" Setting
Figure 5–14 shows an index (ORDER_SUMMARY_PK_IDX) added to the ORDER_SUMMARY table. This index has two columns, ORDER_DATE and ITEM_ID. Configure the following:
Set the Index Type parameter to UNIQUE.
Set the Local Index parameter to True.
Now Warehouse Builder can generate a DDL statement for a unique local index on table ORDER_SUMMARY.
Using local indexes provides the most important PEL performance benefit. Local indexes require all indexes to be partitioned in the same way as the table. When the temporary table is swapped into the target table using PEL, so are the identities of the index segments.
Figure 6-7 Configure an Index as a Local Index
If an index is created as a local index, the Oracle server requires that the partition key column must be the leading column of the index. In the preceding example, the partition key is ORDER_DATE and it is the leading column in the index ORDER_SUMMARY_PK_IDX.
In this step you must specify that all primary key and unique key constraints are created with the USING INDEX option. Figure 5–15 shows an example where the primary key constraint ORDER_SUMMARY_PK on the ORDER_SUMMARY table is specified with the USING INDEX option.
With the USING INDEX option, a constraint will not trigger automatic index creation when it is added to the table. The server will search existing indexes for an index with same column list as that of the constraint. Thus, each primary or unique key constraint must be backed by a user-defined unique local index. The index required by the constraint ORDER_SUMMARY_PK is ORDER_SUMMARY_PK_IDX which was created in "Step 2: Create All Indexes Using the LOCAL Option".
Figure 6-8 Specify a Constraint with USING INDEX option
These are the restrictions for using PEL in Warehouse Builder:
Only One Date Partition Key: Only one partition key column of DATE data type is allowed. Numeric partition keys are not supported in Warehouse Builder.
Only Natural Calendar System: The current PEL method supports only the natural calendar system adopted worldwide. Specific business calendar systems with user-defined fiscal and quarter endings are currently not supported.
All Data Partitions Must Be In The Same Tablespace: All partitions of a target (table, dimension, or cube) must be created in the same tablespace.
All Index Partitions Must Be In The Same Tablespace: All indexes of a target (table, dimension, or cube) must be created in the same tablespace. However, the index tablespace can be different from the data tablespace.