Oracle® Data Mining Administrator's Guide 11g Release 1 (11.1) Part Number B28130-01 |
|
|
View PDF |
A number of sample programs are available with Oracle Data Mining. These programs illustrate the many features of the PL/SQL and Java APIs.
The sample programs create a set of models in the database. You can examine the sample source code, which includes numerous comments, to familiarize yourself with the Oracle Data Mining APIs, and you can create your own models by modifying the samples.
Note:
This chapter provides general information about the sample programs and the sample data.Instructions for installing and running the sample programs are provided in Chapter 1.
This chapter includes the following sections:
The PL/SQL sample programs illustrate the use of the DBMS_DATA_MINING
package for creating models and the DBMS_DATA_MINING_TRANSFORM
package for performing transformations on the mining data.
See Also:
Oracle Database PL/SQL Packages and Types Reference for syntax of the PL/SQL API
Oracle Data Mining Application Developer's Guide for information on the use of the APIs
Oracle Database SQL Language Reference for syntax of the SQL functions for model scoring
Summary descriptions of the PL/SQL sample programs are provided in Table 7-1. For detailed descriptions of the sample programs, see the comments in the source code.
Table 7-1 Overview of the PL/SQL Sample Programs
Most of the PL/SQL sample programs use the Data Mining SQL functions for scoring. The Data Mining scoring functions can be used to apply models created with the PL/SQL API or with the Java API.
Note:
The SQL functions for Data Mining are documented in Oracle Database SQL Language Reference. Information about these functions is also provided in Oracle Data Mining Application Developer's Guide.The programs that demonstrate the Data Mining functions are listed in Table 7-2.
The Java demos illustrate the features of the Oracle Data Mining Java API, which implements Oracle-specific extensions to the Java Data Mining (JDM) 1.0.1.1 standard.The Java programs demonstrate data preprocessing and the basic mining functions. Additional Java samples demonstrate predictive analytics, import/export, and text mining.
See Also:
Oracle Data Mining Java API Reference (javadoc) and the Oracle Data Mining Application Developer's Guide for information on the Java API.Summary descriptions of the Java sample programs are provided in Table 7-3. For detailed descriptions, see the comments in the source code.
Table 7-3 Overview of the Java Sample Programs
Oracle Data Mining can mine text columns that have undergone pre-processing by Oracle Text routines.
Oracle Text is a technology for building text query and document classification applications. It provides indexing, word and theme searching, and viewing capabilities for text. Oracle Text is included in a general installation of Oracle Database Enterprise Edition, and therefore is already present in a database installed according to the instructions in Chapter 1.
The pre-processing steps for text mining create nested table columns of type DM_NESTED_NUMERICALS
from columns of type VARCHAR2
or CLOB
. Each row of the nested table specifies an attribute name and a value. The type definition is as follows.
CREATE OR REPLACE TYPE dm_nested_numerical AS OBJECT (attribute_name VARCHAR2(4000), value NUMBER) / CREATE OR REPLACE TYPE dm_nested_numericals AS TABLE OF dm_nested_numerical
Terms extracted from text documents into nested tables can become generic attributes in training or scoring data.
Sample text mining programs in both PL/SQL and Java illustrate classification and feature extraction of a pre-processed text column.
Three PL/SQL sample programs illustrate the process of text mining. One program illustrates the pre-processing that is required to prepare the data for mining. The other two programs build models that use the transformed text.
To prepare a column for text mining using the PL/SQL API, you must use Oracle Text routines to perform the following general steps:
Create a domain index on the column.
Use the index to extract terms from the column to a temporary table.
Populate a column of type DM_NESTED_NUMERICALS
with the terms in the temporary table.
The process of term extraction using Oracle Text is illustrated in the sample program dmtxtfe.sql
. The source code contains extensive comments that explain the steps involved in transforming text into a set of features that can be mined using Oracle Data Mining.
More details about text transformation are provided in the Oracle Data Mining Application Developer's Guide.
The dmsh.sql
script performs the text transformation required by the PL/SQL text mining demos. There are two such sample programs: dmtxtnmf.sql
, which builds a feature extraction model using Non-Negative Matrix Factorization, and dmtxtsvm.sql
, which builds a classification model using Support Vector Machine. Both of these programs use the following tables, which have a nested table column of comment data:
MINING_BUILD_NESTED_TEXT MINING_TEST_NESTED_TEXT MINING_APPLY_NESTED_TEXT
You can run the PL/SQL text mining demo programs, dmtxtnmf.sql
and dmtxtsvm.sql
, like the other PL/SQL programs. The models created by these programs are listed in the following example.
SQL> @ %ORACLE_HOME%\rdbms\demo\dmtxtnmf.sql SQL> @ %ORACLE_HOME%\rdbms\demo\dmtxtsvm.sql SQL> select MODEL_NAME, MINING_FUNCTION, ALGORITHM from user_mining_models;MODEL_NAME MINING_FUNCTION ALGORITHM ---------------- ------------------ ------------------------ T_NMF_SAMPLE FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR T_SVM_CLAS_SAMPLE CLASSIFICATION SUPPORT_VECTOR_MACHINES
Two Java sample programs illustrate the process of text mining. One builds a feature extraction model, the other builds a classification model.
The Oracle Data Mining Java API provides an interface that handles the term extraction process. If you are developing data mining applications in Java, you do not need to use Oracle Text directly. However, you must ensure that Oracle Text is present in the database.
The OraTextTransform
interface is used to perform text transformation within the Java text mining demos. There are two such sample programs: dmtxtnmfdemo.java
, which builds a feature extraction model using Non-Negative Matrix Factorization, and dmtxtsvmdemo.java
, which builds a classification model using Support Vector Machine. Both of these programs create build, test, and apply data sets from the following tables, which have a text column of comment data:
MINING_BUILD_TEXT MINING_TEST_TEXT MINING_APPLY_TEXT
You can run the Java text mining sample programs, dmtxtnmfdemo.java
and dmtxtsvmdemo.java
, like the other Java programs. The models created by these programs are shown in the following example.
> java dmtxtnmfdemo host:port:SID dmuser3 dmuser3_password > java dmtxtsvmdemo host:port:SID dmuser3 dmuser3_password >sqlplus dmuser/dmuser_password SQL> select MODEL_NAME, MINING_FUNCTION, ALGORITHM from user_mining_models;MODEL_NAME MINING_FUNCTION ALGORITHM ---------------- ------------------ ------------------------ txtnmfModel_jdm FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR txtsvmModel_jdm CLASSIFICATION SUPPORT_VECTOR_MACHINES
The dmsh.sql
script creates views, tables, and indexes in the user's schema. The views define columns of customer data from tables in the SH
schema. This data is used by the Data Mining sample programs. The tables reference the same columns in SH
, but they include an extra COMMENTS
column for text mining. The indexes are used to extract terms from the text in the COMMENTS
column and build a nested table column.
Views in the data mining user's schema define columns of data from the CUSTOMERS
, SALES
, PRODUCTS
, COUNTRIES
, and SUPPLEMENTARY_DEMOGRAPHICS
tables in the SH
schema. You can list these views with the following SQL statements.
SQL>connect dmuser
Enter password: dmuser_password
SQL>select view_name from user_views;
The views are listed in Table 7-4.
Table 7-4 Views Used by the Data Mining Sample Programs
View Name | Description |
---|---|
|
Scoring data for o-cluster |
|
Training data for o-cluster |
|
Scoring data for data mining (not text mining) |
|
Training data for data mining (not text mining) |
|
Test data for data mining (not text mining) |
|
Data for association rules |
|
Data for one-class SVM |
You can see the references to tables in SH
by listing the view definitions. The definition of the view MINING_DATA_BUILD_V
is shown as follows.
SQL> set long 1000000 SQL> set longc 100000 SQL> set pagesize 100 SQL> select text from all_views where owner='DMUSER3'and view_name='MINING_DATA_BUILD_V'; SELECT a.CUST_ID, a.CUST_GENDER, 2003-a.CUST_YEAR_OF_BIRTH AGE, a.CUST_MARITAL_STATUS, c.COUNTRY_NAME, a.CUST_INCOME_LEVEL, b.EDUCATION, b.OCCUPATION, b.HOUSEHOLD_SIZE, b.YRS_RESIDENCE, b.AFFINITY_CARD, b.BULK_PACK_DISKETTES, b.FLAT_PANEL_MONITOR, b.HOME_THEATER_PACKAGE, b.BOOKKEEPING_APPLICATION, b.PRINTER_SUPPLIES, b.Y_BOX_GAMES, b.OS_DOC_SET_KANJI FROM sh.customers a, sh.supplementary_demographics b, sh.countries c WHERE a.CUST_ID = b.CUST_ID AND a.country_id = c.country_id AND a.cust_id between 101501 and 103000
The views are used to build, test, and score the sample models. Each view has a CUSTOMER_ID
column, which is the case ID, and an AFFINITY_CARD
column, which is the target used by the predictive models. Most of the views provide data for 1500 customers (1500 rows). The view used by the One-Class SVM model has data for 940 customers.
The columns of training data in the MINING_DATA_BUILD_V
view are listed in the following example.
SQL> describe MINING_DATA_BUILD_V CUST_ID NOT NULL NUMBER CUST_GENDER NOT NULL CHAR(1) AGE NUMBER CUST_MARITAL_STATUS VARCHAR2(20) COUNTRY_NAME NOT NULL VARCHAR2(40) CUST_INCOME_LEVEL VARCHAR2(30) EDUCATION VARCHAR2(21) OCCUPATION VARCHAR2(21) HOUSEHOLD_SIZE VARCHAR2(21) YRS_RESIDENCE NUMBER AFFINITY_CARD NUMBER(10) BULK_PACK_DISKETTES NUMBER(10) FLAT_PANEL_MONITOR NUMBER(10) HOME_THEATER_PACKAGE NUMBER(10) BOOKKEEPING_APPLICATION NUMBER(10) PRINTER_SUPPLIES NUMBER(10) Y_BOX_GAMES NUMBER(10) OS_DOC_SET_KANJI NUMBER(10)
The association demos use the MARKET_BASKET_V
data set, which includes columns of products from the PRODUCTS
table and the CUSTOMER_ID
column from the CUSTOMERS
table in SH
. The columns of the MARKET_BASKET_V
view are listed in the following example.
SQL> describe MARKET_BASKET_V CUST_ID NOT NULL NUMBER EXTENSION_CABLE NUMBER FLAT_PANEL_MONITOR NUMBER CD_RW_HIGH_SPEED_5_PACK NUMBER ENVOY_256MB_40GB NUMBER ENVOY_AMBASSADOR NUMBER EXTERNAL_8X_CD_ROM NUMBER KEYBOARD_WRIST_REST NUMBER SM26273_BLACK_INK_CARTRIDGE NUMBER MOUSE_PAD NUMBER MULTIMEDIA_SPEAKERS_3INCH NUMBER OS_DOC_SET_ENGLISH NUMBER SIMM_16MB_PCMCIAII_CARD NUMBER STANDARD_MOUSE NUMBER
The text mining demos use the same customer data from tables in SH
, but they include either an extra text column or a collection type column. The collection type is a nested table of type DM_NESTED_NUMERICALS
.
You can list these tables with the following SQL statements.
SQL>connect dmuser3
Enter password: dmuser3_password
SQL>select table_name from user_tables where table_name like '%MINING%';
The text mining tables are listed in Table 7-5.
Table 7-5 Tables Used by the Text Mining Sample Programs
Table Name | Description |
---|---|
|
Apply table with |
|
Build table with |
|
Test table with |
|
Apply table with |
|
Build table with |
|
Test table with |
In the MINING_BUILD_TEXT
, MINING_TEST_TEXT
, and MINING_APPLY_TEXT
tables, the COMMENTS
column is of type VARCHAR2(4000)
.
SQL> describe MINING_BUILD_TEXT Name Null? Type ----------------------------------------- -------- ---------------------------- CUST_ID NOT NULL NUMBER CUST_GENDER NOT NULL CHAR(1) AGE NUMBER CUST_MARITAL_STATUS VARCHAR2(20) COUNTRY_NAME NOT NULL VARCHAR2(40) CUST_INCOME_LEVEL VARCHAR2(30) EDUCATION VARCHAR2(21) OCCUPATION VARCHAR2(21) HOUSEHOLD_SIZE VARCHAR2(21) YRS_RESIDENCE NUMBER AFFINITY_CARD NUMBER(10) BULK_PACK_DISKETTES NUMBER(10) FLAT_PANEL_MONITOR NUMBER(10) HOME_THEATER_PACKAGE NUMBER(10) BOOKKEEPING_APPLICATION NUMBER(10) PRINTER_SUPPLIES NUMBER(10) Y_BOX_GAMES NUMBER(10) OS_DOC_SET_KANJI NUMBER(10) COMMENTS VARCHAR2(4000)
In the MINING_*_NESTED_TEXT
tables, the COMMENTS
column is of type DM_NESTED_NUMERICALS
.
SQL> describe MINING_BUILD_NESTED_TEXT Name Null? Type ----------------------------------------- -------- ---------------------------- CUST_ID NOT NULL NUMBER CUST_GENDER NOT NULL CHAR(1) AGE NUMBER CUST_MARITAL_STATUS VARCHAR2(20) COUNTRY_NAME NOT NULL VARCHAR2(40) CUST_INCOME_LEVEL VARCHAR2(30) EDUCATION VARCHAR2(21) OCCUPATION VARCHAR2(21) HOUSEHOLD_SIZE VARCHAR2(21) YRS_RESIDENCE NUMBER AFFINITY_CARD NUMBER(10) BULK_PACK_DISKETTES NUMBER(10) FLAT_PANEL_MONITOR NUMBER(10) HOME_THEATER_PACKAGE NUMBER(10) BOOKKEEPING_APPLICATION NUMBER(10) PRINTER_SUPPLIES NUMBER(10) Y_BOX_GAMES NUMBER(10) OS_DOC_SET_KANJI NUMBER(10) COMMENTS DM_NESTED_NUMERICALS
The process of extracting terms from a text column into a nested table column is described in "Text Mining Programs" and in Oracle Data Mining Application Developer's Guide.