Basic Concept
We will investigate two different greedy algorithms for minimum spanning trees, Prim’s algorithm and Kruskal’s algorithm. Each uses a different locally optimal property.
๐ ์์ ์ดํด ๋ณธ ํ๋ฆผ ์๊ณ ๋ฆฌ์ฆ ๋์ ํฌ๋ฌ์ค์ปฌ ์๊ณ ๋ฆฌ์ฆ์ ์ด์ฉํด๋ ์ต์์ ์ฅํธ๋ฆฌ๋ฅผ ๊ตฌํ ์ ์๋ค.
Example etermine a minimum spanning tree.
Step 1 ๊ฐ์ค์น๋ฅผ ๊ธฐ์ค์ผ๋ก ์ ์ฒด ์ด์์ ๋ค์ ์ค๋ฆ์ฐจ์์ผ๋ก ์ ๋ ฌํ๋ค.

Step 2 5๊ฐ์ ์ ์ ๋ค์ด ์์ผ๋ฏ๋ก ์ด 5๊ฐ์ ์๋ก์ ์งํฉ, ์ฆ {$v_1$}, {$v_2$}, {$v_3$}, {$v_4$}, {$v_5$}์ด ๋ง๋ค์ด์ง๊ฒ ๋๋ค.

Step 3 ์ค๋ฆ์ฐจ์์ผ๋ก ์ ๋ ฌ๋ ์ด์์ ์ค ์ต์๊ฐ์ค์น ๊ฐ์ธ 1์ด ์ ํ๋๋ค. 1์ $v_1$์ ์ ๊ณผ $v_2$์ ์ ์ ์ฐ๊ฒฐํด์ฃผ๋ ์ด์์ ๊ฐ์ด๋ค. ๊ทธ๋ฆฌ๊ณ $v_1$๊ณผ $v_2$๋ ์๋ก์ ์งํฉ์ด๋ฏ๋ก ๋ ์งํฉ์ ํ๋์ ๋ถ๋ถ์งํฉ์ผ๋ก ํฉ์ง๋ค. ํ์ฌ๊น์ง์ ๋ถ๋ถ์งํฉ์ {$v_1$, $v_2$}, {$v_3$}, {$v_4$}, {$v_5$} ์ด๋ ๊ฒ 4๊ฐ์ด๋ค.

Step 4 ์ต์๊ฐ์ค์น๋ฅผ ๊ฐ๋ ์ด์์ ์ผ๋ก $v_3$์ ์ ๊ณผ $v_5$์ ์ ์ ์ฐ๊ฒฐํด์ฃผ๋ 2๊ฐ ์ ํ๋๋ค. $v_3$๊ณผ $v_5$์ญ์ ์๋ก์ ์งํฉ์ด๋ฏ๋ก ๋ ์งํฉ์ ํ๋์ ๋ถ๋ถ์งํฉ์ผ๋ก ํฉ์น๋ค. ํ์ฌ๊น์ง์ ๋ถ๋ถ์งํฉ์ {$v_1$, $v_2$}, {$v_3$, $v_5$}, {$v_4$} ์ด๋ ๊ฒ 3๊ฐ์ด๋ค.

Step 5 ์ต์๊ฐ์ค์น๋ฅผ ๊ฐ๋ ์ด์์ ์ผ๋ก $v_1$์ ์ ๊ณผ $v_3$์ ์ ์ ์ฐ๊ฒฐํด์ฃผ๋ 3์ด ์ ํ๋๋ฉฐ, ์์ ๊ณผ์ ์ด ๋๊ฐ์ด ์ํ๋๋ค. ํ์ฌ๊น์ง์ ๋ถ๋ถ์งํฉ์ {$v_1$, $v_2$, $v_3$, $v_5$}๊ณผ {$v_4$} ์ด๋ ๊ฒ ์ด 2๊ฐ์ด๋ค.

Step 6 ๋ค์์ผ๋ก $v_2$์ ์ ๊ณผ $v_3$์ ์ ์ ์ฐ๊ฒฐํด์ฃผ๋ ์ด์์ 3์ด ์ ํ๋๋ค. ๊ทธ๋ฌ๋ $v_2$์ $v_3$์ ๊ฒฝ์ฐ ์ด๋ฏธ ๊ฐ์ ์งํฉ์ ํฌํจ๋์ด ์๋ค. ๊ฐ์ ์งํฉ์ ์ ์ ๋ค๋ผ๋ฆฌ ์ฐ๊ฒฐํ๋ ๊ฒฝ์ฐ์๋ ์ํ๊ฒฝ๋ก๊ฐ ์๊ธฐ๋ฏ๋ก $v_2$์ ์ ๊ณผ $v_3$์ ์ ์ ์ฐ๊ฒฐํ๋ ์ด์์ ์ ๋ฒ๋ฆฐ๋ค.

Step 7 ๋ง์ง๋ง์ผ๋ก $v_3$์ ์ ๊ณผ $v_4$์ ์ ์ ์ฐ๊ฒฐํ๋ ์ด์์ ์ด ์ ํ๋๊ณ , 5๊ฐ๋ก ์ถ๋ฐํ๋ ์๋ก์ ์งํฉ๋ค์ด ํ๋์ ์งํฉ์ผ๋ก ํฉ์ณ์ก๊ธฐ ๋๋ฌธ์ ์ต์์ ์ฅํธ๋ฆฌ๊ฐ ์์ฑ๋๋ค.

Kruskal’s Algorithm
Algorithm Design
To write a formal version of Kruskal’s algorithm, we need a disjoint set abstract data type. The disjoint set abstract data type consists of data types index and set pointer, and routines initial, find, merge, and equal.
๐ ํฌ๋ฌ์ค์ปฌ ์๊ณ ๋ฆฌ์ฆ์์๋ ์๋ก์ ์งํฉ(disjoint set)์ ์๋ฃ๊ตฌ์กฐ๊ฐ ์ฌ์ฉ๋๋ค.
Kruskal’s algorithm for the Minimum Spanning Tree problem starts by creating disjoint subsets of V, one for each vertex and containing only that vertex. It then inspects the edges according to nondecreasing weight (ties are broken arbitrarily). If an edge connects two vertices in disjoint subsets, the edge is added and the subsets are merged into one set. This process is repeated until all the subsets are merged into one set. The following is a high-level algorithm for this procedure.
๐ ์ ์ ๋ค์ ๊ฐ์ ๋งํผ ์๋ก์ ๋ถ๋ถ์งํฉ์ ์์ฑํ๊ณ , ๊ฐ์ค์น๋ฅผ ๊ธฐ์ค์ผ๋ก ์ ์ฒด ์ด์์ ๋ค์ ์ค๋ฆ์ฐจ์์ผ๋ก ์ ๋ ฌํ๋ค. ๊ทธ ๋ค์ ์ค๋ฆ์ฐจ์์ผ๋ก ์ ๋ ฌ๋ ์ด์์ ๋ค์ ํ๋์ฉ ์กฐ์ฌํ๋ฉด์, ํด๋น ์ด์์ ์ด ์ฐ๊ฒฐํ๊ณ ์๋ ์ ์ ๋ค์ด ์๋ก์ ๋ถ๋ถ์งํฉ์ด๋ฉด ๋ ๊ฐ์ ๋ถ๋ถ์งํฉ์ ํ๋๋ก ํฉ์น๋ค. ์ด ๊ณผ์ ์ ๋ชจ๋ ์๋ก์ ๋ถ๋ถ์งํฉ๋ค์ด ํ๋์ ์งํฉ์ผ๋ก ํฉ์ณ์ง ๋๊น์ง ๊ณ์ ์ํํ๋ค.
Pseudo Code
- Minimum Spanning Trees - Kruskal's Algorithm
void kruskal(int n, int m, set_of_edges E, set_of_edges& F)
{
index i, j;
set_pointer p, q;
edge e;
Sort the m edges in E by weight in nondecreasing order;
F = "empty";
initial(n);
while (number of edges in F is less than n-1)
{
e = edges with least weight not yet considered;
i, j = indices of vertices connected by e;
p = find(i);
q = find(j);
if (!equal(p,q))
{
merge(p,q);
add e to F;
}
}
Source Code
- kruskal.h
// File: kruskal.h
#ifndef KRUSKAL_H
#define KRUSKAL_H
#include <string>
#include "graph.h"
using namespace data_structures;
namespace algorithms
{
struct edge
{
std::string vertex[2];
int weight;
};
typedef struct edge edge;
typedef edge* set_of_edges;
void kruskal(int n, int m, set_of_edges& E, set_of_edges& F);
// Problem: Determine a minimum spanning tree.
// Inputs: integer n >= 2, positive integer m, and a connected, weighted,
// undirected graph containing n vertices and m edges. The graph is
// represented b y a set E that contains the edges in the graph along
// with their weights.
// Outputs: F, a set of edges in a minimum spanning tree.
void make_edgeset(const graph g, set_of_edges& E, int& num_of_edges);
void init_edgeset(set_of_edges& F, const int n);
void mergesort2(set_of_edges& data, int low, int high);
void merge2(set_of_edges& data, int low, int mid, int high);
}
#endif
- kruskal.cpp
// File: kruskal.cpp
#include <string>
#include "disjointset.h"
#include "kruskal.h"
namespace algorithms
{
void kruskal(int n, int m, set_of_edges &E, set_of_edges &F)
{
disjointset dsjset(n); // disjoint set
disjointset::index i, j;
disjointset::set_pointer p, q;
edge e;
// Sort the m edges in E by weight in nondecreasing order
mergesort2(E, 0, m);
dsjset.initialize(n); // Initialize n disjoint subsets.
int k = 0;
int repeat = 0;
while (repeat < n)
{
// edge with least weight not yet considered
e.weight = E[repeat].weight;
e.vertex[0] = E[repeat].vertex[0];
e.vertex[1] = E[repeat].vertex[1];
// indices of vertices connected by e
i = atoi(e.vertex[0].c_str());
j = atoi(e.vertex[1].c_str());
p = dsjset.find(i);
q = dsjset.find(j);
if (!dsjset.equal(p, q))
{
dsjset.merge(p, q);
F[k++] = e;
}
++repeat;
}
}
void make_edgeset(const graph g, set_of_edges &E, int &num_of_edges)
{
edge e;
int cnt = 0;
int k = 0;
for (int i = 1; i <= g.get_size(); ++i)
{
for (int j = i; j <= g.get_size(); ++j)
{
if (g.get_edge(i, j) != graph::INFINITE && g.get_edge(i, j) != 0)
{
// edge connecting vertices index by i and j
e.vertex[0] = std::to_string(i);
e.vertex[1] = std::to_string(j);
e.weight = g.get_edge(i, j);
E[k++] = e; // add e to E
++cnt; // count valid edges
}
}
}
num_of_edges = cnt;
}
void init_edgeset(set_of_edges &F, const int n)
{
for (int i = 0; i < n; ++i)
{
F[i].vertex[0] = "";
F[i].vertex[1] = "";
F[i].weight = graph::INFINITE;
}
}
void mergesort2(set_of_edges &data, int low, int high)
// Problem: Sort n keys in nondecreasing sequence.
// Inputs: positive integer n, array of key S indexed from 1 to n.
// the array S containing the keys in nondecreasing order.
{
int mid;
if (low < high)
{
mid = (low + high) / 2;
mergesort2(data, low, mid);
mergesort2(data, mid + 1, high);
merge2(data, low, mid, high);
}
}
void merge2(set_of_edges &data, int low, int mid, int high)
{
int i, j, k;
// A local array needed for the merging
set_of_edges temp = new edge[(high - low) + 1];
i = low;
j = mid + 1;
k = 0;
while (i <= mid && j <= high)
{
if (data[i].weight < data[j].weight)
temp[k] = data[i++]; // Copy from first half
else
temp[k] = data[j++]; // Copy from second half
k++;
}
// Copy any remaining entries in the left and right subarrays.
if (i > mid)
{
while (j <= high)
temp[k++] = data[j++];
}
else
{
while (i <= mid)
temp[k++] = data[i++];
}
// Copy from temp back to the data array, and release temp's memory.
i = low;
for (k = 0; k < (high - low) + 1; ++k)
data[i++] = temp[k];
delete[] temp;
}
} // namespace algorithms
- minspantree.cpp
// File: minspantree.cpp
#include <iostream>
#include <cstdlib> // Provides EXIT_SUCCESS;
#include <iomanip> // Provides setw
#include "graph.h" // Provides graph
#include "kruskal.h" // Provides Kruskal's algorithm
#include "disjointset.h" // Provides disjoint set
using namespace std;
using namespace data_structures;
using namespace algorithms;
int main()
{
const int GRAPH_SIZE = 5;
int num_of_edges = 0;
set_of_edges E, F;
graph example(GRAPH_SIZE);
example.set_vertex("V1");
example.set_vertex("V2");
example.set_vertex("V3");
example.set_vertex("V4");
example.set_vertex("V5");
example.set_edge(1, 2, 1);
example.set_edge(1, 3, 3);
example.set_edge(2, 1, 1);
example.set_edge(2, 3, 3);
example.set_edge(2, 4, 6);
example.set_edge(3, 1, 3);
example.set_edge(3, 2, 3);
example.set_edge(3, 4, 4);
example.set_edge(3, 5, 2);
example.set_edge(4, 2, 6);
example.set_edge(4, 3, 4);
example.set_edge(4, 5, 5);
example.set_edge(5, 3, 2);
example.set_edge(5, 4, 5);
E = new edge[GRAPH_SIZE * GRAPH_SIZE];
make_edgeset(example, E, num_of_edges); // Copy edges from graph
F = new edge[num_of_edges];
init_edgeset(F, num_of_edges);
for (int i = 0; i < num_of_edges; ++i)
cout << "V" << E[i].vertex[0] << " to "
<< "V" << E[i].vertex[1] << " with weight " << E[i].weight << endl;
kruskal(example.get_size(), num_of_edges, E, F); // Kruskal algorithm
for (int i = 0; i < num_of_edges; ++i)
{
if (F[i].weight != 0 && F[i].weight != graph::INFINITE)
{
cout << setw(7) << "From" << setw(3) << "V" << F[i].vertex[0] << setw(3)
<< "to" << setw(3) << "V" << F[i].vertex[1]
<< ": " << F[i].weight << endl;
}
}
delete[](E);
delete[](F);
return EXIT_SUCCESS;
}
Time Complexity Analysis
Basic operation
A comparison instruction.
Input size
n, the number of vertices, and m, the number of edges.
Worst-Case Time Complexity
$W(m,n) \in \Theta(n^2lgn^2) = \Theta(n^22lgn) = \Theta(n^2lgn)$
| W(m) = The time to sort the edges(Mergesort) | $W(m) \in \Theta(mlgn)$ |
| W(m) = The time in the while loop | $W(m) \in \Theta(mlgn)$ |
| T(n)= The time to initialize n disjoint sets | $T(n) \in \Theta(n)$ |
Optimality Proof
Recall that there is no guarantee that a given greedy algorithm always yields an optimal solution. One must prove whether or not this is the case. We will prove that Kruskal’s algorithms always produce minimum spanning trees. We show that the following proposition P is true by induction.
Proposition P
If F is the set of edges chosen at any stage of the algorithm, then there is some minimum spanning tree that contains F.
๐ F๊ฐ ์๊ณ ๋ฆฌ์ฆ์ ์ด๋ ๋จ๊ณ์์ ์ ํ๋ ์ด์์ ์ ์งํฉ์ด๋ผ๋ฉด F๋ฅผ ํฌํจํ๋ ์ต์์ ์ฅํธ๋ฆฌ๊ฐ ์กด์ฌํ๋ค.
Clearly P is true at the beginning, when F is empty: any minimum spanning tree will do, and there exists one because a weighted connected graph always has a minimum spanning tree.
๐ ์ฒ์ ์์์ F๊ฐ ๊ณต์งํฉ์ธ ๊ฒฝ์ฐ๋ค. ์ฐ๊ฒฐ๋๊ณ , ๊ฐ์ค์น๊ฐ ์๋ ๋น๋ฐฉํฅ์ฑ ๊ทธ๋ํ์์๋ ํญ์ ์ต์์ ์ฅํธ๋ฆฌ๊ฐ ์กด์ฌํ๋ฏ๋ก, ๊ณต์งํฉ F๋ฅผ ํฌํจํ๋ ์ต์์ ์ฅํธ๋ฆฌ๊ฐ ์กด์ฌํ๋ค. ๋ฐ๋ผ์ ๋ช ์ P๋ ์ฐธ์ด๋ค.
Now assume P is true for some non-final edge set F and let T be a minimum spanning tree that contains F.
๐ P๊ฐ ๊ณต์งํฉ์ด ์๋ ์ด์์ ์งํฉ F์ ๋ํด์๋ ์ฐธ์ด๋ฉฐ, T๊ฐ F๋ฅผ ํฌํจํ๋ ์ต์์ ์ฅํธ๋ฆฌ๋ผ๊ณ ๊ฐ์ ํ๋ค.
Case 1
If the next chosen edge e is also in T, then P is true for F U {e}.
๐ ๋ค์์ ์ ํ๋ ์ด์์ e๊ฐ e ∈ T๋ผ๋ฉด, F U {e}์ ๋ํด ๋ช ์ P๊ฐ ์ฐธ์ด๋ค.
Case 2
Otherwise, e is not in T, and T U {e} has a cycle C and there is another edge e’ that is in C but not F. (Since if all the e’ are in the F, then the algorithm won’t choose e, otherwise it will be a cycle.) (If there was no such edge e’, then e could not have been added to F, since doing so would have created the cycle C.) Then T − {e’} U {e} is a tree, and it has the same weight as T, since T has minimum weight and the weight of e’ cannot be less than the weight of e, otherwise the algorithm would have chosen e’ instead of e. So T − {e’} U {e} is a minimum spanning tree containing F U {e} and again P holds.

๐ ๋ง์ผ e ∉ T๋ผ๋ฉด T U {e}๋ ์ํ๊ฒฝ๋ก C๋ฅผ ๊ฐ์ง๋ฉฐ ๊ทธ ์ํ๊ฒฝ๋ก์์๋ F์ ์ํ์ง ์์ ์ด์์ e’๊ฐ ์๋ค. ๋ชจ๋ e’๊ฐ F์ ์ํ๊ฒ ๋๋ฉด ์๊ณ ๋ฆฌ์ฆ์ e๋ฅผ ์ ํํ์ง ์๋๋ค. e๋ฅผ ์ ํํ๋ ์๊ฐ ์ํ๊ฒฝ๋ก๊ฐ ์๊ธฐ๊ธฐ ๋๋ฌธ์ด๋ค. ๊ทธ๋ฐ ์ด์์ e’๊ฐ ์๋ค๋ฉด, e๋ฅผ F์ ์ถ๊ฐํ ์ ์๋ค. e๋ฅผ ์ถ๊ฐํ๋ฉด ์ํ๊ฒฝ๋ก C๊ฐ ์์ฑ๋๊ธฐ ๋๋ฌธ์ด๋ค.
๊ทธ๋ฌ๋ฉด T - {e’} U {e}๋ ํธ๋ฆฌ์ด๊ณ T์ ๋์ผํ ๊ฐ์ค์น๋ฅผ ๊ฐ์ง๋ค. T๋ ์ต์๊ฐ์ค์น๋ฅผ ๊ฐ์ง๋ฉฐ e’์ ๊ฐ์ค์น๋ e์ ๊ฐ์ค์น๋ณด๋ค ์์ ์ ์๋ค. ๊ทธ๋ ์ง ์์ผ๋ฉด ํฌ๋ฌ์ค์ปฌ ์๊ณ ๋ฆฌ์ฆ์ e๋์ e’๋ฅผ ์ ํํ๊ฒ ๋๋ค. ๋ฐ๋ผ์ T - {e’} U {e}๋ F U {e}๋ฅผ ํฌํจํ๋ ์ต์์ ์ฅํธ๋ฆฌ์ด๋ฉฐ ๋ช ์ P๊ฐ ์ ์ง๋๋ค.
Conclusion
Therefore, by the principle of induction, P holds when F has become a spanning tree, which is only possible if F is a minimum spanning tree itself.
๐ ๊ทธ๋ฌ๋ฏ๋ก ๊ท๋ฉ๋ฒ ์๋ฆฌ์ ์ํด, ๋ช ์ P(F๊ฐ ์๊ณ ๋ฆฌ์ฆ์ ์ด๋ ๋จ๊ณ์์ ์ ํ๋ ์ด์์ ์ ์งํฉ์ด๋ผ๋ฉด F๋ฅผ ํฌํจํ๋ ์ต์์ ์ฅํธ๋ฆฌ๊ฐ ์กด์ฌ)๋ F๊ฐ ์ ์ฅํธ๋ฆฌ๊ฐ ๋ ๋ ์ฑ๋ฆฝํ๋ฉฐ, ์ด๋ F๊ฐ ์ต์์ ์ฅํธ๋ฆฌ ์์ฒด์ธ ๊ฒฝ์ฐ์๋ง ๊ฐ๋ฅํ๋ค.
