Oracle9i SQL Reference Release 2 (9.2) Part Number A96540-02 |
|
|
View PDF |
The linear regression functions are:
linear_regr::=
See Also:
"Analytic Functions" for information on syntax, semantics, and restrictions |
The linear regression functions fit an ordinary-least-squares regression line to a set of number pairs. You can use them as both aggregate and analytic functions.
See Also:
|
Oracle applies the function to the set of (expr1
, expr2
) pairs after eliminating all pairs for which either expr1
or expr2
is null. Oracle computes all the regression functions simultaneously during a single pass through the data.
expr1
is interpreted as a value of the dependent variable (a "y value"), and expr2
is interpreted as a value of the independent variable (an "x value"). Both expressions must be numbers.
REGR_SLOPE
returns the slope of the line. The return value is a number and can be null. After the elimination of null (expr1
, expr2
) pairs, it makes the following computation:
COVAR_POP(expr1, expr2) / VAR_POP(expr2)
REGR_INTERCEPT
returns the y-intercept of the regression line. The return value is a number and can be null. After the elimination of null (expr1
, expr2
) pairs, it makes the following computation:
AVG(expr1) - REGR_SLOPE(expr1, expr2) * AVG(expr2)
REGR_COUNT
returns an integer that is the number of non-null number pairs used to fit the regression line.REGR_R2
returns the coefficient of determination (also called "R-squared" or "goodness of fit") for the regression. The return value is a number and can be null. VAR_POP
(expr1
) and VAR_POP
(expr2
) are evaluated after the elimination of null pairs. The return values are:
NULL if VAR_POP(expr2) = 0 1 if VAR_POP(expr1) = 0 and VAR_POP(expr2) != 0 POWER(CORR(expr1,expr),2) if VAR_POP(expr1) > 0 and VAR_POP(expr2 != 0
All of the remaining regression functions return a number and can be null:
REGR_AVGX
evaluates the average of the independent variable (expr2
) of the regression line. It makes the following computation after the elimination of null (expr1
, expr2
) pairs:
AVG(expr2)
REGR_AVGY
evaluates the average of the dependent variable (expr1
) of the regression line. It makes the following computation after the elimination of null (expr1
, expr2
) pairs:
AVG(expr1)
REGR_SXY
, REGR_SXX
, REGR_SYY
are auxiliary functions that are used to compute various diagnostic statistics.
REGR_SXX
makes the following computation after the elimination of null (expr1
, expr2
) pairs:
REGR_COUNT(expr1, expr2) * VAR_POP(expr2)
REGR_SYY
makes the following computation after the elimination of null (expr1
, expr2
) pairs:
REGR_COUNT(expr1, expr2) * VAR_POP(expr1)
REGR_SXY
makes the following computation after the elimination of null (expr1
, expr2
) pairs:
REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2)
The following examples are based on the sample tables sh.sales
and sh.products
.
The following example provides a comparison of the various linear regression functions:
SELECT s.channel_id, REGR_SLOPE(s.quantity_sold, p.prod_list_price) SLOPE , REGR_INTERCEPT(s.quantity_sold, p.prod_list_price) INTCPT , REGR_R2(s.quantity_sold, p.prod_list_price) RSQR , REGR_COUNT(s.quantity_sold, p.prod_list_price) COUNT , REGR_AVGX(s.quantity_sold, p.prod_list_price) AVGLISTP , REGR_AVGY(s.quantity_sold, p.prod_list_price) AVGQSOLD FROM sales s, products p WHERE s.prod_id=p.prod_id AND p.prod_category='Men' AND s.time_id=to_DATE('10-OCT-2000') GROUP BY s.channel_id ; C SLOPE INTCPT RSQR COUNT AVGLISTP AVGQSOLD - ---------- ---------- ---------- ---------- ---------- ---------- C -.03529838 16.4548382 .217277422 17 87.8764706 13.3529412 I -.0108044 13.3082392 .028398018 43 116.77907 12.0465116 P -.01729665 11.3634927 .026191191 33 80.5818182 9.96969697 S -.01277499 13.488506 .000473089 71 52.571831 12.8169014 T -.01026734 5.01019929 .064283727 21 75.2 4.23809524
The following example determines the slope and intercept of the regression line for the amount of sales and sale profits for each fiscal year:
SELECT t.fiscal_year, REGR_SLOPE(s.amount_sold, s.quantity_sold) "Slope", REGR_INTERCEPT(s.amount_sold, s.quantity_sold) "Intercept" FROM sales s, times t WHERE s.time_id = t.time_id GROUP BY t.fiscal_year; FISCAL_YEAR Slope Intercept ----------- ---------- ---------- 1998 49.3934247 71.6015479 1999 49.3443482 70.1502601 2000 49.2262135 75.0287476
The following example determines the cumulative slope and cumulative intercept of the regression line for the amount of and quantity of sales for two products (270 and 260) for weekend transactions (day_number_in_week
= 6 or 7) during the last three weeks (fiscal_week_number
of 50, 51, or 52) of 1998:
SELECT t.fiscal_month_number "Month", t.day_number_in_month "Day", REGR_SLOPE(s.amount_sold, s.quantity_sold) OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) AS CUM_SLOPE, REGR_INTERCEPT(s.amount_sold, s.quantity_sold) OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) AS CUM_ICPT FROM sales s, times t WHERE s.time_id = t.time_id AND s.prod_id IN (270, 260) AND t.fiscal_year=1998 AND t.fiscal_week_number IN (50, 51, 52) AND t.day_number_in_week IN (6,7) ORDER BY t.fiscal_month_desc, t.day_number_in_month;
Month Day CUM_SLOPE CUM_ICPT ---------- ---------- ---------- ---------- 12 12 -68 1872 12 12 -68 1872 12 13 -20.244898 1254.36735 12 13 -20.244898 1254.36735 12 19 -18.826087 1287 12 20 62.4561404 125.28655 12 20 62.4561404 125.28655 12 20 62.4561404 125.28655 12 20 62.4561404 125.28655 12 26 67.2658228 58.9712313 12 26 67.2658228 58.9712313 12 27 37.5245541 284.958221 12 27 37.5245541 284.958221 12 27 37.5245541 284.958221
The following example returns the number of customers in the customers
table (out of a total of 319) who have account managers.
SELECT REGR_COUNT(customer_id, account_mgr_id) FROM customers; REGR_COUNT(CUSTOMER_ID,ACCOUNT_MGR_ID) -------------------------------------- 231
The following example computes the cumulative number of transactions for each day in April of 1998:
SELECT UNIQUE t.day_number_in_month, REGR_COUNT(s.amount_sold, s.quantity_sold) OVER (PARTITION BY t.fiscal_month_number ORDER BY t.day_number_in_month) "Regr_Count" FROM sales s, times t WHERE s.time_id = t.time_id AND t.fiscal_year = 1998 AND t.fiscal_month_number = 4; DAY_NUMBER_IN_MONTH Regr_Count ------------------- ---------- 1 825 2 1650 3 2475 4 3300 . . . 26 21450 30 22200
The following example computes the coefficient of determination of the regression line for amount of sales greater than 5000 and quantity sold:
SELECT REGR_R2(amount_sold, quantity_sold) FROM sales WHERE amount_sold > 5000; REGR_R2(AMOUNT_SOLD,QUANTITY_SOLD) ---------------------------------- .024087453
The following example computes the cumulative coefficient of determination of the regression line for monthly sales amounts and quantities for each month during 1998:
SELECT t.fiscal_month_number, REGR_R2(SUM(s.amount_sold), SUM(s.quantity_sold)) OVER (ORDER BY t.fiscal_month_number) "Regr_R2" FROM sales s, times t WHERE s.time_id = t.time_id AND t.fiscal_year = 1998 GROUP BY t.fiscal_month_number ORDER BY t.fiscal_month_number; FISCAL_MONTH_NUMBER Regr_R2 ------------------- ---------- 1 2 1 3 .927372984 4 .807019972 5 .932745567 6 .94682861 7 .965342011 8 .955768075 9 .959542618 10 .938618575 11 .880931415 12 .882769189
The following example calculates the regression average for the amount and quantity of sales for each year:
SELECT t.fiscal_year, REGR_AVGY(s.amount_sold, s.quantity_sold) "Regr_AvgY", REGR_AVGX(s.amount_sold, s.quantity_sold) "Regr_AvgX" FROM sales s, times t WHERE s.time_id = t.time_id GROUP BY t.fiscal_year; FISCAL_YEAR Regr_AvgY Regr_AvgX ----------- ---------- ---------- 1998 716.602044 13.0584283 1999 714.910831 13.0665536 2000 717.331304 13.0479781
The following example calculates the cumulative averages for the amount and quantity of sales profits for product 260 during the last two weeks of December 1998:
SELECT t.day_number_in_month, REGR_AVGY(s.amount_sold, s.quantity_sold) OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) "Regr_AvgY", REGR_AVGX(s.amount_sold, s.quantity_sold) OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) "Regr_AvgX" FROM sales s, times t WHERE s.time_id = t.time_id AND s.prod_id = 260 AND t.fiscal_month_desc = '1998-12' AND t.fiscal_week_number IN (51, 52) ORDER BY t.day_number_in_month; DAY_NUMBER_IN_MONTH Regr_AvgY Regr_AvgX ------------------- ---------- ---------- 14 882 24.5 14 882 24.5 15 801 22.25 15 801 22.25 16 777.6 21.6 18 642.857143 17.8571429 18 642.857143 17.8571429 20 589.5 16.375 21 544 15.1111111 22 592.363636 16.4545455 22 592.363636 16.4545455 24 553.846154 15.3846154 24 553.846154 15.3846154 26 522 14.5 27 578.4 16.0666667
The following example computes the REGR_SXY
, REGR_SXX
, and REGR_SYY
values for the regression analysis of amount and quantity of sales for each year in the sample sh.sales
table:
SELECT t.fiscal_year, REGR_SXY(s.amount_sold, s.quantity_sold) "Regr_sxy", REGR_SYY(s.amount_sold, s.quantity_sold) "Regr_syy", REGR_SXX(s.amount_sold, s.quantity_sold) "Regr_sxx" FROM sales s, times t WHERE s.time_id = t.time_id GROUP BY t.fiscal_year; FISCAL_YEAR Regr_sxy Regr_syy Regr_sxx ----------- ---------- ---------- ---------- 1998 1620591607 2.3328E+11 32809865.2 1999 1955866724 2.7695E+11 39637097.2 2000 2127877398 3.0630E+11 43226509.7
The following example computes the cumulative REGR_SXY
, REGR_SXX
, and REGR_SYY
statistics for amount and quantity of weekend sales for products 270 and 260 for each year-month value in 1998:
SELECT t.day_number_in_month, REGR_SXY(s.amount_sold, s.quantity_sold) OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_sxy", REGR_SYY(s.amount_sold, s.quantity_sold) OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_syy", REGR_SXX(s.amount_sold, s.quantity_sold) OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_sxx" FROM sales s, times t WHERE s.time_id = t.time_id AND prod_id IN (270, 260) AND t.fiscal_month_desc = '1998-02' AND t.day_number_in_week IN (6,7) ORDER BY t.day_number_in_month; DAY_NUMBER_IN_MONTH Regr_sxy Regr_syy Regr_sxx ------------------- ---------- ---------- ---------- 1 130973783 1.8916E+10 2577797.94 . . . 30 130973783 1.8916E+10 2577797.94