Optimal Binary Search Trees(최적이진탐색트리)

Basic Concept

Prerequisite Knowledge: Binary Search Tree

binary search tree: A binary tree of items (ordinarily called keys), that come from an ordered set, such that
1. Each node contain one key.
2. The keys in the right subtree of a given node are greater than or equal to the key in that tree.
3. The keys in the left subtree of a given node are less than or equal to the key in that tree.

left subtree: For any node in a binary tree, the subtree whose root is the left child of the node.
right subtree: For any node in a binary tree, the subtree whose root is the right child of the node.
depth: The number of edges in the unique path from the root to the node. This is also called the level of the node in the tree.
balanced: A binary tree is called balanced if the depth of the 2 subtrees of every node never differ by more than 1.

Example Two binary search trees

Data types

Struct nodetype {
		keytype key
		nodetype* left
		nodetype* right
}
Typedef nodetype* node_pointer;

Ordinarily, a binary search tree contains records that are retrieved according to the values of the keys. Our goal is to organize the keys in a binary search tree so that the average time it takes to locate a key is minimized. A tree that is organized in this fashion is called optimal. It is not hard to see that, if all keys have the same probability of being the search key, the tree image on the right is optimal. We are concerned with the case where the keys do not have the smae probability.

Example Shows the five different trees when n = 3. The actual values of the keys are not important. The only requirement is that they be ordered. $p_i$ is the probability that $Key_i$ is the search key. If

$p_1 = 0.7$ ($Key_1$ 찾을 확률), $p_2 = 0.2$ ($Key_2$ 찾을 확률), and $p_3 = 0.1$ ($Key_3$ 찾을 확률),

the average search times for the below trees are:

1번 트리의 평균검색시간: 3(0.7) + 2(0.2) + 1(0.1) = 2.6

$Key_1$을 세 번 비교 후 찾을 확률 +
$Key_2$를 두 번 비교 후 찾을 확률 +
$Key_3$를 한 번 비교 후 찾을 확률 = 2.6

2번 트리의 평균검색시간: 2(0.7) + 3(0.2) + 1(0.1) = 2.1

$Key_1$을 두 번 비교 후 찾을 확률 +
$Key_2$를 세 번 비교 후 찾을 확률 +
$Key_3$를 한 번 비교 후 찾을 확률 = 2.1

3번 트리의 평균검색시간: 2(0.7) + 1(0.2) + 2(0.1) = 1.8

$Key_1$을 두 번 비교 후 찾을 확률 +
$Key_2$를 한 번 비교 후 찾을 확률 +
$Key_3$를 두 번 비교 후 찾을 확률 = 1.8

4번 트리의 평균검색시간: 1(0.7) + 3(0.2) + 2(0.1) = 1.5

$Key_1$을 한 번 비교 후 찾을 확률 +
$Key_2$를 세 번 비교 후 찾을 확률 +
$Key_3$를 두 번 비교 후 찾을 확률 = 1.5

5번 트리의 평균검색시간: 1(0.7) + 2(0.2) + 3(0.1) = 1.4

$Key_1$을 한 번 비교 후 찾을 확률 +
$Key_2$를 두 번 비교 후 찾을 확률 +
$Key_3$를 세 번 비교 후 찾을 확률 = 1.4
The fifth tree is optimal.

👉 세 개의 $Key_1$, $Key_2$, $Key_3$로($Key$ 값들 간에는 오름차순 정렬, 각각의 $Key$ 값을 검색할 확률은 70%, 20%, 10%) 이진탐색트리를 구성하는 방법은 총 다섯 가지이다.

5번의 트리모양이 평균검색시간을 고려했을 때 최적화된 검색시간을 갖는 이진탐색트리이다. $Key_1$ 을 찾을 확률이 높기 때문에 트리의 균형이 깨지더라도 이 값을 루트로 둔다.

Optimal Binary Search Tree Algorithm

Algorithm Design

The average search time for tree k is given by,

$\underbrace{A[1][k-1]}_{\text{Average time in left subtree}} + \underbrace{p_1 + \cdots + p_{k-1}}_{\text{Additional time comparing at root}} + \underbrace{p_k}_{\text{Average time searching for root}}+ \underbrace{A[k+1][n]}_{\text{Average time in right subtree}} + \underbrace{p_{k+1} + \cdots + p_n}_{\text{Additional time comparing at root}}$

which equals

$$A[1][k-1] + A[k+1][n] + \sum_{m=1}^np_m$$

Because one of the k trees must be optimal, the average search time for optimal tree is given by

$$A[1][n] = minimum_{1 \le k \le n}(A[1][k-1] + A[k+1][n]) + \sum_{m=1}^np_m \text{ ,}$$
$$\begin{cases}A[i][j] = minimum_{i \le k \le j}(A[i][k-1] + A[k+1][j]) + \sum_{m=i}^jp_m \quad i < j \\\\A[i][i] = p_i \\\\A[i][i-1] \text{ and } A[j+1][j] \text{ are defined to be 0}.\end{cases}$$

👉 $A[1][k-1]$은 $Key_1, ⋯, Key_{k−1}$로 구성된 왼쪽 부트리의 최적화된 평균검색시간이고, $A[k+1][n]$은 $Key_{k+1}, ⋯, Key_{n}$으로 구성된 오른쪽 부트리의 최적화된 평균검색시간이다.

식이 길고 복잡해 보이지만 (왼쪽 부트리에서의 평균검색시간) + (루트에서의 검색시간) + (오른쪽 부트리에서의 평균검색시간)으로 이루어진 직관적인 수식이다.

Example Suppose we have three keys and the probabilities in this above Example. To determine A[2[3] we must consider the two trees. For those two trees we have the following:

1번 트리의 평균검색시간: 1($p_2$) + 2($p_3$) = 1(0.2) + 2(0.1) = 0.4

$Key_2$을 한 번 비교 후 찾을 확률 +
$Key_3$를 두 번 비교 후 찾을 확률 = 0.4
The first tree is optimal.

2번 트리의 평균검색시간: 2($p_2$) + 1($p_3$) = 2(0.2) + 1(0.1) = 0.5

$Key_2$을 두 번 비교 후 찾을 확률 +
$Key_3$를 한 번 비교 후 찾을 확률 = 0.5

👉 A[2][3]에는 $Key_2$와 $Key_3$으로 구성된 이진탐색트리의 최적화된 평균검색시간이 저장된다. 예제에서 보았듯이 그 값은 0.4이다. 위의 1번 트리처럼 $Key_2$를 루트에 놓아야 최적화된 평균검색시간을 갖는 트리가 구성된다.

Pseudo Code

Optimal Binary Search Tree

void optsearchtree(int n, const float p[], float& minavg, index R[][])
{
  index i, j, k, diagonal;
  float A[1...n+1][0...n];
  for (i = 1; i <= n; i++)
  {
    A[i][i-1] = 0;
    A[i][i] = p[i];
    R[i][i] = i;
    R[i][i-1] = 0;
  }
  A[n][n+1] = 0;
  R[n+1][n] = 0;
  for (diagonal = 1; diagonal <= n-1; diagonal++)
  {
    for (i = 1; i <= n-diagonal; i++)
    {
      j = i + diagonal;
      A[i][j] = minimum(A[i][k-1] + A[k+1][j]) +  (pi + ... + pj)
      R[i][j] = a value of k that gave the minimum;
    }
  }
  minavg = A[1][n];
}

Source Code

optbintree.h

// File: optbintree.h
#ifndef OPTBIN_TREE_H
#define OPTBIN_TREE_H
#include <iomanip> // Provides setw
#include <iostream>

namespace algorithms
{
  void optsearchtree(const int n, const float p[], float &minavg, int **R);
  // Problem: Determine an optimal binary search tree for a set of keys,
  // each with a given pr of being the search key.
  // Inputs: n, the number of keys, and an array of real numbers p indexed
  // from 1 to n, where p[i] is the pr of searching for the ith key.
  // Outputs: A variable minavg, whose value is the average search time
  // for an optimal binary search tree; and a two-dimensional array R from
  // which an optimal tree can be constructed. R has its rows indexed from
  // 1 to n+1 and its columns indexed from 0 to n. R[i][j] is the index of
  // the key in the root of an optimal tree containing the ith through
  // the jth keys.

  float prsum(const float p[], const int i, const int j);
  // Postcondition: Calculate the sum of p from i to j

  template <typename Item>
  Item *get_vector_space(const int n)
  // Postcondition: Return the array containing n spaces
  {
    Item *v = new Item[n];

    --v; // offset the pointer
    for (int i = 1; i <= n; ++i)
      v[i] = Item();

    return v;
  }

  template <typename Item>
  Item **get_matrix_space(const int m, const int n)
  // Postcondition: Return the array containing m*n spaces
  {
    Item **a;
    a = new Item *[m];
    --a; // Offset the pointer to get row index from 1 to m
    for (int i = 1; i <= m; ++i)
      a[i] = new Item[n];

    // Initialize the matrix
    for (int i = 1; i <= m; ++i)
      for (int j = 1; j <= n; ++j)
        a[i][j] = Item();

    return a;
  }

  template <typename Item>
  void release_vector_space(Item* d)
  {
    delete (d);
  }

  template <typename Item>
  void release_matrix_space(Item **d, int n)
  {
    for (int i = 1; i <= n; ++i)
    	delete[](d[i] + 1);
    delete (d + 1);
  }
} // namespace algorithms

#endif

optbintree.cpp

// File: optbintree.cpp
#include <cstdlib> // Provides EXIT_SUCCESS
#include <climits> // Provides INT_MAX
#include "optbintree.h"

namespace algorithms
{
  void optsearchtree(const int n, const float p[], float &minavg, int **R)
  {
    int i, j, k, diagonal;
    // Make sure that we have to allocate n+1 instead of n
    float **A = get_matrix_space<float>(n + 1, n + 1);

    for (i = 1; i <= n; ++i)
    {
      A[i][i - 1] = 0;
      A[i][i] = p[i];
      R[i][i] = i;
      R[i][i - 1] = 0;
    }
    A[n + 1][n] = 0;
    R[n + 1][n] = 0;

    for (diagonal = 1; diagonal <= n - 1; ++diagonal)
    {
      for (i = 1; i <= n - diagonal; ++i)
      {
        j = i + diagonal;
        float min = static_cast<float>(INT_MAX); // casting
        for (k = i; k <= j; ++k)
        {
          if ((A[i][k - 1] + A[k + 1][j]) < min)
          {
            A[i][j] = A[i][k - 1] + A[k + 1][j] + prsum(p, i, j);
            R[i][j] = k;           // A value of k that gave the minimum
            min = A[i][k - 1] + A[k + 1][j]; // Update min value
          }
        }
      }
    }

    minavg = A[1][n];
  }

  float prsum(const float p[], const int i, const int j)
  {
    float answer = float();
    // Calculate the sum of pm from i to j
    for (int m = i; m <= j; ++m)
      answer += p[m];

    return answer;
  }
} // namespace algorithms

optbintreetest.cpp

// File: optbintreetest.cpp
#include <cstdlib>  // Provides EXIT_SUCCESS
#include <string>
#include "binarytree.h"
#include "optbintree.h"
using namespace std;
using namespace data_structures;
using namespace algorithms;

int main( )
{
  const int MATRIX_SIZE = 6; // the number of keys
  string* Key; // an array Key containing the MATRIX_SIZE keys in order.
  // R is the index of the key in the root of an optimal tree containing
  // the ith through the jth keys.
  int** R;

  // Exercise 24
  float* pr = get_vector_space<float>(MATRIX_SIZE);
  pr[1] = (float)0.05;
  pr[2] = (float)0.15;
  pr[3] = (float)0.05;
  pr[4] = (float)0.35;
  pr[5] = (float)0.05;
  pr[6] = (float)0.35;

  Key = get_vector_space<string>(MATRIX_SIZE);
  Key[1] = "CASE";
  Key[2] = "ELSE";
  Key[3] = "END";
  Key[4] = "IF";
  Key[5] = "OF";
  Key[6] = "THEN";

  // Make sure that we have to allocate MATRIX_SIZE+1 instead of MATRIX_SIZE
  R = get_matrix_space<int>(MATRIX_SIZE+1, MATRIX_SIZE+1);
  float minavg = float( ); // Initialize minavg

  optsearchtree(MATRIX_SIZE, pr, minavg, R);

  // Make an optimal binary tree
  binary_tree_node* root = tree(R, Key, 1, MATRIX_SIZE);

  // Print optimal binary tree
  cout << "The Optimal Binary Tree is.." << endl << endl;
  print_bintree(root, 2);

  release_vector_space(pr);
  release_vector_space(Key);
  release_matrix_space(R, MATRIX_SIZE);
  return EXIT_SUCCESS;
}

Time Complexity Analysis

Basic operation

The instructions executed for each value of k. They include a comparison to test for the minimum.

Input size

n, the number of keys.

Every-Case Time Complexity

The control of this algorithm is almost identical to that Chained Matrix Multiplication. The only difference is that, for given values of diagonal and i, the basic operation is done diagonal+1 times.

$\displaystyle T(n) = \sum_{diagonal=1}^{n-1}[(n-diagonal) \times (diagonal+1)] = \frac{n(n-1)(n+4)}{6} \in \Theta(n^3)$

구분	수행 조건	수행 횟수
diagonal-루프 수행 횟수	1 to n - 1	n - 1
i-루프 수행 횟수	1 to n - diagonal	n - diagonal
k-루프 수행 횟수	i to j	j - i + 1 = (i + diagonal) - i + 1 = diagonal + 1 ※ j = i + diagonal

Principle of Optimality

Optimal Binary Search Trees problem is an optimization problem. Therefore, the optimal solution to the instance contains optimal solutions to all subinstances, and the principle of optimality applies.

🔗 Dynamic Programming and Optimization Problems(동적계획법과 최적화 문제)

Exercies

Create the optimal binary search tree for the following items, where the probability occurrence of each word is given in parentheses:

CASE (0.05), ELSE (0.15), END (0.05), IF (0.35), OF (0.05), THEN (0.35).

My Answer

index      1      2      3      4      5      6
     ___________________________________________
  1 │   0.05   0.25   0.35   0.95   1.05    1.8
  2 │      0   0.15   0.25    0.8    0.9   1.65
  3 │      0      0   0.05   0.45   0.55    1.3
  4 │      0      0      0   0.35   0.45    1.2
  5 │      0      0      0      0   0.05   0.45
  6 │      0      0      0      0      0   0.35
  7 │      0      0      0      0      0      0

index      1      2      3      4      5      6
     ___________________________________________
  1 │      1      2      2      4      4      4
  2 │      0      2      2      4      4      4
  3 │      0      0      3      4      4      4
  4 │      0      0      0      4      4      4
  5 │      0      0      0      0      5      6
  6 │      0      0      0      0      0      6
  7 │      0      0      0      0      0      0

The Optimal Binary Tree is..

            THEN
                OF
        IF
                END
            ELSE
                CASE

👉 다음은 책에 있는 연습문제를 알고리즘에 적용한 결과이다. 생성된 이진탐색트리의 표현은 재귀를 이용하여 출력하였다. IF가 루트 ELSE는 IF의 왼쪽자식, TEHN이 IF의 오른쪽 자식이다.

'Algorithms > Dynamic Programming' 카테고리의 다른 글

Chained Matrix Multiplication(연쇄행렬곱셈) (2)	2011.01.14
Floyd's Algorithm for Shortest Paths(플로이드 알고리즘) (0)	2011.01.12
Binomial Coefficient(이항계수) (0)	2011.01.11
Dynamic Programming(동적계획법) (0)	2011.01.10

Basic Concept

Prerequisite Knowledge: Binary Search Tree

Optimal Binary Search Tree Algorithm

Algorithm Design

Pseudo Code

Source Code

Time Complexity Analysis

Basic operation

Input size

Every-Case Time Complexity

Principle of Optimality

Exercies

My Answer

'Algorithms > Dynamic Programming' 카테고리의 다른 글

티스토리툴바