1 Installing Oracle Database for Data Mining

Oracle Data Mining is part of Oracle Database. To perform data mining activities, you must be able to log on to an Oracle database, and your user ID must have the appropriate database privileges. You can install Oracle Database yourself, or you can connect to a database installed on a remote computer.

This chapter is intended for anyone who wants to install Oracle Database on a laptop or personal computer running Microsoft Windows. It includes instructions for creating a Data Mining demo user and running the Data Mining sample programs. To connect to a remote database and run the programs remotely, see the instructions in Chapter 2.

Tip:

If you have questions at any point during the installation, refer to "Installing Oracle Database and Creating a Database" in Oracle Database 2 Day DBA.

When opened from the Oracle Database Online Documentation Library, Oracle Database 2 Day DBA contains direct links to the Oracle By Example (OBE) series on Database Installation.

This chapter contains the following sections. Complete the instructions in each section before proceeding to the next section.

Install Oracle Database
Install Database Companion
Create a Data Mining Demo User
Run the Sample Programs

Install Oracle Database

The instructions in this section explain how to install Oracle Database with the Data Mining option and the sample schemas on your personal computer.

Note:

These instructions assume that this is a fresh installation of Oracle Database 11g.

If you already have Oracle components installed on your computer, refer to Oracle Database Installation Guide for Microsoft Windows.

From the Database installation directory, run SETUP.EXE.

Oracle Universal Installer opens and displays the Select a Product to Install dialog. Choose Oracle Database 11g.

Choose Next.
The Installer displays the Select Installation Method page.

- Choose Basic Installation.
- Specify the Oracle Base and Home directories. Oracle Home is a subdirectory of the Oracle Base directory. You can accept the default paths provided by the Installer, as long as they do not already exist on your computer.
- Choose Enterprise Edition as the Installation Type.
- Check the Create Starter Database box.
- Specify a unique name for Global Database Name. You can use the default global database name provided by the Installer, as long as it does not already exist on your computer.
- Specify a password for the database accounts. The password must have at least eight characters and include both alphabetic and numeric characters.
  
  You will have the opportunity to change the passwords for the database accounts at a later time.
- Click Next.
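
You can check the password rule stated above before typing a password into the Installer. The following Java sketch is illustrative only. The class name PasswordRule is hypothetical, and the sketch assumes that "alphabetic" and "numeric" mean ASCII letters and ASCII digits. The Installer may enforce further rules that are not modeled here.

```java
final class PasswordRule {
    /**
     * Returns true if the candidate has at least eight characters and contains
     * at least one ASCII letter and at least one ASCII digit.
     * Assumption: this models only the rule stated in this chapter.
     */
    static boolean isAcceptable(String candidate) {
        if (candidate == null || candidate.length() < 8) {
            return false;
        }
        boolean hasLetter = false;
        boolean hasDigit = false;
        for (char c : candidate.toCharArray()) {
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                hasLetter = true;
            } else if (c >= '0' && c <= '9') {
                hasDigit = true;
            }
        }
        return hasLetter && hasDigit;
    }

    public static void main(String[] args) {
        // Print the verdict for a few sample candidates.
        for (String p : new String[] {"oracle", "oracle11", "12345678", "Welcome1"}) {
            System.out.println(p + " -> " + isAcceptable(p));
        }
    }
}
```
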
On the Oracle Configuration Manager Registration page, you can choose to register your installation with your MetaLink account.

This page is optional. You can simply choose Next.
The Summary page displays the settings and components for the installation.

Click Install.
The Installer proceeds with the installation.

The Installer invokes the Configuration Assistants to configure and start the starter database.

If the Configuration Assistants encounter an error, check the logs to determine the problem. You can choose to continue the installation and start the assistants manually later, or you can restart the installation. To continue the installation, click Install.
Database Configuration Assistant creates the starter database.

The Database Configuration Assistant page displays information about the starter database.

Click the Password Management button.
Unlock the SYS, SYSTEM, and SH accounts. Specify a password for SH. You can also change the passwords for SYS and SYSTEM if you wish. The password must have at least eight characters and include both alphabetic and numeric characters.

Click OK to return to the Database Configuration Assistant page.

On the Database Configuration Assistant page, click OK.
Click Exit to exit the Installer.

Install Database Companion

The Oracle Data Mining sample programs are installed with Oracle Database Companion.

The Database Companion installation process copies the Oracle Data Mining sample programs, along with examples and demonstrations of other database features, to the \rdbms\demo subdirectory of the Oracle home directory.

To install the Database Companion, perform these steps:

From the Companion installation directory, run SETUP.EXE.

Oracle Universal Installer opens and displays the Welcome page. Click Next to advance to the next page.

On the Specify Home Details page, specify the Oracle home directory in which you installed Oracle Database. Do not assume that the directory displayed by the Installer is correct.

On the Summary page, review the information and settings for your installation, then click Install.

The Installer proceeds with the installation.

On the End of Installation page, confirm that the installation was successful.

Click Exit to exit the Installer.

Create a Data Mining Demo User

To build and score Data Mining models, you must have an Oracle user ID with the appropriate privileges. Follow these instructions to create a demo user that has the privileges required to run the sample programs and to create and score models in the user's schema.

Run the Sample Programs

To locate the sample programs on your computer, navigate to the rdbms\demo subdirectory of the Oracle home directory.

To display the Data Mining PL/SQL sample programs, search for the files that start with dm and end with .sql. (The list will include dmsh.sql and dmshgrants.sql, which are used to configure the Data Mining demo user ID.) The PL/SQL sample programs are listed in Table 1-1.
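
You can also script the file search described above. The following Java sketch lists the files in a directory whose names start with dm and end with a given suffix. The same helper therefore works for the .sql files here and for the .java files in the same directory. The sketch assumes that ORACLE_HOME is defined as an environment variable. It matches names case-insensitively because Windows file names are not case-sensitive. The class name DemoFinder is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class DemoFinder {
    /**
     * Returns the names of files in dir that start with "dm" and end with suffix,
     * sorted alphabetically. Matching ignores case, as Windows file names do.
     */
    static List<String> find(File dir, String suffix) {
        List<String> result = new ArrayList<>();
        String[] names = dir.list();
        if (names == null) {
            return result; // dir does not exist or is not a directory
        }
        String wantedSuffix = suffix.toLowerCase(Locale.ROOT);
        for (String name : names) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.startsWith("dm") && lower.endsWith(wantedSuffix)) {
                result.add(name);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        String home = System.getenv("ORACLE_HOME"); // assumed to be set
        if (home == null) {
            System.err.println("ORACLE_HOME is not set");
            return;
        }
        File demoDir = new File(new File(home, "rdbms"), "demo");
        String suffix = args.length > 0 ? args[0] : ".sql"; // pass ".java" for the Java samples
        for (String name : find(demoDir, suffix)) {
            System.out.println(name);
        }
    }
}
```
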

Table 1-1 Sample PL/SQL Data Mining Programs

Program File	Algorithm	Mining Function or Task
dmaidemo.sql	Minimum Description Length	Attribute Importance
dmardemo.sql	Apriori	Association
dmdtdemo.sql	Decision Tree	Classification
dmdtxvlddemo.sql	Decision Tree (cross validation)	Classification
dmglcdem.sql	Binary Logistic Regression (GLM)	Classification
dmglrdem.sql	Multivariate Linear Regression (GLM)	Regression
dmkmdemo.sql	k-Means	Clustering
dmnbdemo.sql	Naive Bayes	Classification
dmnmdemo.sql	Non-Negative Matrix Factorization	Feature Extraction
dmocdemo.sql	O-Cluster	Clustering
dmsvcdem.sql	Support Vector Machine	Classification
dmsvodem.sql	Support Vector Machine	Anomaly Detection
dmsvrdem.sql	Support Vector Machine	Regression
dmtxtfe.sql	Term extraction using Oracle Text	Text transformation for mining
dmtxtnmf.sql	Non-Negative Matrix Factorization	Text mining using NMF
dmtxtsvm.sql	Support Vector Machine	Text mining using SVM

In the same directory, search for the files that start with dm and end with .java to display the Java samples. The Java sample programs are listed in Table 1-2.

Table 1-2 Sample Java Data Mining Programs

Program File	Algorithm	Mining Function or Task
dmaidemo.java	Minimum Description Length	Attribute importance
dmapplydemo.java	Naive Bayes	Illustrate scoring methods
dmardemo.java	Apriori	Association
dmexpimpdemo.java	NA	Model Export/Import
dmglcdemo.java	Binary Logistic Regression (GLM)	Classification
dmglrdemo.java	Multivariate Linear Regression (GLM)	Regression
dmkmdemo.java	k-Means	Clustering
dmnbdemo.java	Naive Bayes	Classification
dmnmdemo.java	Non-Negative Matrix Factorization	Feature extraction
dmocdemo.java	O-Cluster	Clustering
dmpademo.java	Automated predict and explain	Predictive Analytics
dmsvcdemo.java	Support Vector Machine	Classification
dmsvodemo.java	Support Vector Machine (one class)	Classification
dmsvrdemo.java	Support Vector Machine	Regression
dmtreedemo.java	Decision Tree	Classification
dmtxtnmfdemo.java	Non-Negative Matrix Factorization	Text mining with NMF
dmtxtsvmdemo.java	Support Vector Machine	Text mining with SVM classification
dmxfdemo.java	Binning, clipping, and normalization	Data Transformations

View the Source Code

You will learn a great deal about the Data Mining APIs by investigating the source code of the sample programs. The programs illustrate typical approaches to data preparation, algorithm selection, algorithm tuning, testing, and scoring. All the programs include extensive comments to help you understand what the code is doing.

You can view the source code simply by opening the files in a text editor.

Run the PL/SQL Sample Programs

Now that you have a user ID with the required privileges and a schema populated with the required objects, you can run the sample programs. Each program creates a Data Mining model.

While the program is running, it displays the code and the program output.

You can run the sample programs as many times as you wish. The programs clean up the results of the previous run before executing the current run.

To run the PL/SQL programs:

Start SQL*Plus and log in as the Data Mining user.

Enter user-name: dmuser
Enter password: dmuser_password

Run the program by specifying "@" followed by the fully qualified path of the program. This example executes the program dmnbdemo.sql, which creates a Naive Bayes model.
```
SQL>@ %ORACLE_HOME%\rdbms\demo\dmnbdemo
```

Prepare to Run the Java Programs

Before you can run the Java programs, you must set up your Java environment and compile the programs. You can do this in an integrated development environment (IDE) such as Oracle JDeveloper, or you can execute the following commands at the operating system prompt.

Check that the version of Java you are using is 1.5 or higher. You can execute the following in a command window to check the version of Java.
```
>java -version
```
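
To check the requirement from code instead of reading the `java -version` output, a program can parse the `java.version` system property. Java 1.5 through Java 8 report versions in the form `1.x...`, and Java 9 and later report versions such as `9`, `11.0.2`, or `17-ea`. The following Java sketch therefore normalizes both schemes to a single feature-release number. It inspects the JVM that runs it, which is not necessarily the first `java` on your PATH. The class name JavaVersionCheck is hypothetical.

```java
final class JavaVersionCheck {
    /** Returns the feature release: 5 for "1.5.0_22", 8 for "1.8.0_292", 11 for "11.0.2". */
    static int featureRelease(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]); // legacy "1.x" scheme used through Java 8
        }
        return first; // Java 9 and later put the feature release first
    }

    /** True if the version satisfies the "1.5 or higher" requirement. */
    static boolean isAtLeast15(String version) {
        return featureRelease(version) >= 5;
    }

    public static void main(String[] args) {
        String v = System.getProperty("java.version");
        System.out.println("java.version = " + v + ", meets 1.5+ requirement: " + isAtLeast15(v));
    }
}
```
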
Add %ORACLE_HOME%\jdk\bin\ to your PATH variable before the paths of any other Java versions.

Add the following Data Mining JAR files to your Windows CLASSPATH:

%ORACLE_HOME%\rdbms\jlib\jdm.jar
%ORACLE_HOME%\rdbms\jlib\ojdm_api.jar
%ORACLE_HOME%\rdbms\jlib\xdb.jar
%ORACLE_HOME%\jdbc\lib\ojdbc5.jar
%ORACLE_HOME%\oc4j\j2ee\home\lib\connector.jar
%ORACLE_HOME%\jlib\orai18n.jar
%ORACLE_HOME%\jlib\orai18n-mapping.jar
%ORACLE_HOME%\lib\xmlparserv2.jar
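
If you prefer not to type the CLASSPATH by hand, a program can assemble the entries listed above. The following Java sketch assumes that ORACLE_HOME is defined as an environment variable. It joins the entries with the platform path separator, which is `;` on Windows. The class name DataMiningClasspath is hypothetical.

```java
import java.io.File;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class DataMiningClasspath {
    // JAR locations relative to ORACLE_HOME, in the order listed above.
    static final String[][] JARS = {
        {"rdbms", "jlib", "jdm.jar"},
        {"rdbms", "jlib", "ojdm_api.jar"},
        {"rdbms", "jlib", "xdb.jar"},
        {"jdbc", "lib", "ojdbc5.jar"},
        {"oc4j", "j2ee", "home", "lib", "connector.jar"},
        {"jlib", "orai18n.jar"},
        {"jlib", "orai18n-mapping.jar"},
        {"lib", "xmlparserv2.jar"},
    };

    /** Joins the JAR paths with the platform path separator (";" on Windows). */
    static String build(String oracleHome) {
        List<String> entries = new ArrayList<>();
        for (String[] relative : JARS) {
            entries.add(Paths.get(oracleHome, relative).toString());
        }
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        String home = System.getenv("ORACLE_HOME"); // assumed to be set
        if (home == null) {
            System.err.println("ORACLE_HOME is not set");
            return;
        }
        System.out.println(build(home));
    }
}
```

In a Windows command window, you could paste the printed value into `set CLASSPATH=...`, or pass it to `javac -cp` and `java -cp`. Note that `set CLASSPATH=...` replaces any existing value.
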

Compile the programs listed in Table 1-2. To use the JAVAC executable, open a command window and change to the rdbms\demo subdirectory of the Oracle home directory.
```
>javac program_name.java
```
For example:
```
>javac dmnbdemo.java
```
If JAVAC is not found, then check the value of the PATH variable.

Run the Java Programs

You can run a Java program from the operating system prompt with a command like this:

>java program_name host_name:port_number:database_identifier user password

For example:

>java dmnbdemo mypc:1521:orcl dmuser dmuser_password

This command runs the dmnbdemo Java program against the database on the computer named mypc. The database uses the default identifier (orcl) and the default listener port (1521).
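
The first argument packs the host name, listener port, and database identifier (SID) into one colon-separated string. A JDBC program would typically convert that string into an Oracle thin-driver URL of the form jdbc:oracle:thin:@host:port:sid. The following sketch shows that mapping. The class and method names are hypothetical, and the sample programs' own argument handling may differ.

```java
final class DemoConnectArgs {
    /**
     * Converts "host:port:sid" into an Oracle thin-driver URL,
     * for example "mypc:1521:orcl" -> "jdbc:oracle:thin:@mypc:1521:orcl".
     */
    static String toThinUrl(String hostPortSid) {
        String[] parts = hostPortSid.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected host:port:sid, got " + hostPortSid);
        }
        int port = Integer.parseInt(parts[1]); // fails fast on a non-numeric port
        return "jdbc:oracle:thin:@" + parts[0] + ":" + port + ":" + parts[2];
    }

    public static void main(String[] args) {
        String arg = args.length > 0 ? args[0] : "mypc:1521:orcl";
        System.out.println(toThinUrl(arg));
    }
}
```

For the example above, toThinUrl("mypc:1521:orcl") returns jdbc:oracle:thin:@mypc:1521:orcl.
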

View the Models Created by the Sample Programs

You can query the USER_MINING_MODELS view to list the models in your schema.

SQL> set linesize 100
SQL> select model_name, mining_function, algorithm from user_mining_models;
 
MODEL_NAME               MINING_FUNCTION            ALGORITHM
------------------------ -------------------------- ------------------------------
AI_SH_SAMPLE             ATTRIBUTE_IMPORTANCE       MINIMUM_DESCRIPTION_LENGTH
AR_SH_SAMPLE             ASSOCIATION_RULES          APRIORI_ASSOCIATION_RULES

This example shows that there are two mining models in your schema. The model name, mining function, and algorithm are displayed. To find all the columns defined in a view, use a DESCRIBE command.

SQL> DESCRIBE user_mining_models
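
You can also run the USER_MINING_MODELS query from a Java program through JDBC. The following sketch assumes that an Oracle JDBC driver (for example, ojdbc5.jar from the CLASSPATH list) is available at run time and that Java 7 or later is used for try-with-resources. It takes a thin-driver URL such as jdbc:oracle:thin:@mypc:1521:orcl. The class name ListMiningModels is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class ListMiningModels {
    // The same query as the SQL*Plus example above.
    static final String QUERY =
        "select model_name, mining_function, algorithm from user_mining_models";

    /** Prints one line per model in the connected user's schema. */
    static void printModels(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(QUERY)) {
            while (rs.next()) {
                System.out.printf("%-24s %-26s %s%n",
                    rs.getString("MODEL_NAME"),
                    rs.getString("MINING_FUNCTION"),
                    rs.getString("ALGORITHM"));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 3) {
            System.err.println("usage: java ListMiningModels <jdbc_url> <user> <password>");
            return;
        }
        try {
            // Older Oracle drivers may not register themselves automatically.
            Class.forName("oracle.jdbc.OracleDriver");
        } catch (ClassNotFoundException e) {
            System.err.println("Oracle JDBC driver not found on the classpath");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            printModels(conn);
        }
    }
}
```
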

You can query the USER_MINING_MODEL_ATTRIBUTES and USER_MINING_MODEL_SETTINGS views to obtain information about the attributes and settings for the models in your schema.
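
As an illustration, the following Java sketch reads the settings of one model through JDBC and passes the model name as a bind variable. It assumes that USER_MINING_MODEL_SETTINGS exposes SETTING_NAME and SETTING_VALUE columns; you can confirm the column names with DESCRIBE, as shown above for USER_MINING_MODELS. It also assumes a JDBC driver that registers itself with DriverManager. Pass the model name as it appears in USER_MINING_MODELS, for example AI_SH_SAMPLE. The class name ModelSettings is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

final class ModelSettings {
    // Assumed column names; confirm with DESCRIBE user_mining_model_settings.
    static final String QUERY =
        "select setting_name, setting_value from user_mining_model_settings"
        + " where model_name = ? order by setting_name";

    /** Returns setting name -> value for one model, in setting-name order. */
    static Map<String, String> settingsFor(Connection conn, String modelName)
            throws SQLException {
        Map<String, String> settings = new LinkedHashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setString(1, modelName); // bind variable, not string concatenation
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    settings.put(rs.getString(1), rs.getString(2));
                }
            }
        }
        return settings;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 4) {
            System.err.println("usage: java ModelSettings <jdbc_url> <user> <password> <MODEL_NAME>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            for (Map.Entry<String, String> e : settingsFor(conn, args[3]).entrySet()) {
                System.out.println(e.getKey() + " = " + e.getValue());
            }
        }
    }
}
```
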