Skip to content

Commit

Permalink
Translation of "Prüfer code" article
Browse files Browse the repository at this point in the history
  • Loading branch information
jakobkogler committed Apr 11, 2018
1 parent 3bb1d94 commit ead499d
Show file tree
Hide file tree
Showing 4 changed files with 371 additions and 1 deletion.
2 changes: 1 addition & 1 deletion src/combinatorics/catalan-numbers.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ The first few numbers Catalan numbers, $C_n$ (starting from zero):
The Catalan number $C_n$ is the solution for

- Number of correct bracket sequence consisting of $n$ opening and $n$ closing brackets.
- The number of rooted full binary trees with $n + 1$ leaves (tops are not numbered). A rooted binary tree is full if every vertex has either two children or no children.
- The number of rooted full binary trees with $n + 1$ leaves (vertices are not numbered). A rooted binary tree is full if every vertex has either two children or no children.
- The number of ways to completely parenthesize $n + 1$ factors.
- The number of triangulations of a convex polygon with $n + 2$ sides (i.e. the number of partitions of polygon into disjoint triangles by using the diagonals).
- The number of ways to connect the $2n$ points on a circle to form $n$ disjoint chords.
Expand Down
307 changes: 307 additions & 0 deletions src/graph/pruefer_code.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,307 @@
<!--?title Prüfer code -->
# Prüfer code

In this article we will look at the so-called **Prüfer code** (or Prüfer sequence), which is a way of encoding a labeled tree into a sequence of numbers in a unique way.

With the help of the Prüfer code we will prove **Cayley's formula** (which specified the number of spanning trees in a complete graph).
Also we show the solution to the problem of counting the number of ways of adding edges to a graph to make it connected.

**Note**, we will not consider trees consisting of a single vertex - this is a special case in which multiple statements clash.

## Prüfer code

The Prüfer code is a way of encoding a labeled tree with $n$ vertices using a sequence of $n - 2$ integers in the interval $[0; n-1]$.
This encoding also acts as a **bijection** between all spanning trees of a complete graph and the numerical sequences.

Although using the Prüfer code for storing and operating on tree is impractical due the specification of the representation, the Prüfer codes are used frequently: mostly in solving combinatorial problems.

The inventor - Heinz Prüfer - proposed this code in 1918 as a proof for Cayley's formula.

### Building the Prüfer code for a given tree

The Prüfer code is constructed as follows.
We will repeat the following procedure $n - 2$ times:
we select the leaf of the tree with the smallest number, remove it from the tree, and write down the number of the vertex that was connected to it.
After $n - 2$ iterations there will only remain $2$ vertices, and the algorithm ends.

Thus the Prüfer code for a given tree is a sequence of $n - 2$ numbers, where each number is the number of the connected vertex, i.e. this number is in the interval $[0, n-1]$.

The algorithm for computing the Prüfer code can be implemented easily with $O(n \log n)$ time complexity, simply by using a data structure to extract the minimum (for instance `set` or `priority_queue` in C++), which contains a list of all the current leafs.

```cpp pruefer_code_slow
vector<vector<int>> adj;

vector<int> pruefer_code() {
int n = adj.size();
set<int> leafs;
vector<int> degree(n);
vector<bool> killed(n, false);
for (int i = 0; i < n; i++) {
degree[i] = adj[i].size();
if (degree[i] == 1)
leafs.insert(i);
}

vector<int> code(n - 2);
for (int i = 0; i < n - 2; i++) {
int leaf = *leafs.begin();
leafs.erase(leafs.begin());
killed[leaf] = true;

int v;
for (int u : adj[leaf]) {
if (!killed[u])
v = u;
}

code[i] = v;
if (--degree[v] == 1)
leafs.insert(v);
}

return code;
}
```

However the construction can also be implemented in linear time.
Such an approach is described in the next section.

### Building the Prüfer code for a given tree in linear time

The essence of the algorithm is to use a **moving pointer**, which will always point to the current leaf vertex that we want to remove.

At first glance this seems impossible, because during the process of constructing the Prüfer code the leaf number can increase and decrease.
However after a closer look, this is actually not true.
The number of leafs will not increase. Either the number decreases by one (we remove one leaf vertex and don't gain a new one), or it stay the same (we remove one leaf vertex and gain another one).
In the first case there is no other way than searching for the next smallest leaf vertex.
In the second case, however, we can decide in $O(1)$ time, if we can continue using the vertex that became a new leaf vertex, or if we have to search for the next smallest leaf vertex.
And in quite a lot of times we can continue with the new leaf vertex.

To do this we will use a variable $\text{ptr}$, which will indicate that in the set of vertices between $0$ and $\text{ptr}$ is at most one leaf vertex, namely the current one.
All other vertices in that range are either already removed from the tree, or have still more than one adjacent vertices.
At the same time we say, that we haven't removed any leaf vertices bigger than $\text{ptr}$ yet.

This variable is already very helpful in the first case.
After removing the current leaf node, we know that there cannot be a leaf node between $0$ and $\text{ptr}$, therefore we can start the search for the next one directly at $\text{ptr} + 1$, and we don't have to start the search back at vertex $0$.
And in the second case, we can further distinguish two cases:
Either the newly gained leaf vertex is smaller than $\text{ptr}$, then this must be the next leaf vertex, since we know that there are no other vertices smaller than $\text{ptr}$.
Or the newly gained leaf vertex is bigger.
But then we also know that it has to be bigger than $\text{ptr}$, and can start the search again at $\text{ptr} + 1$.

Even though we might have to perform multiple linear searches for the next leaf vertex, the pointer $\text{ptr}$ only increases and therefore the time complexity in total is $O(n)$.

```cpp pruefer_code_fast
vector<vector<int>> adj;
vector<int> parent;

void dfs(int v) {
for (int u : adj[v]) {
if (u != parent[v]) {
parent[u] = v;
dfs(u);
}
}
}

vector<int> pruefer_code() {
int n = adj.size();
parent.resize(n);
parent[n-1] = -1;
dfs(n-1);

int ptr = -1;
vector<int> degree(n);
for (int i = 0; i < n; i++) {
degree[i] = adj[i].size();
if (degree[i] == 1 && ptr == -1)
ptr = i;
}

vector<int> code(n - 2);
int leaf = ptr;
for (int i = 0; i < n - 2; i++) {
int next = parent[leaf];
code[i] = next;
if (--degree[next] == 1 && next < ptr) {
leaf = next;
} else {
ptr++;
while (degree[ptr] != 1)
ptr++;
leaf = ptr;
}
}

return code;
}
```
In the code we first find for each its ancestor `parent[i]`, i.e. the ancestor that this vertex will have once we remove it from the tree.
We can find this ancestor by rooting the tree at the vertex $n-1$.
This is possible because the vertex $n-1$ will never be removed from the tree.
We also compute the degree for each vertex.
`ptr` is the pointer that indicates the minimum size of the remaining leaf vertices (except the current one `leaf`).
We will either assign the current leaf vertex with `next`, if this one is also a leaf vertex and it is smaller than `ptr`, or we start a linear search for the smallest leaf vertex by increasing the pointer.
It can be easily seen, that this code has the complexity $O(n)$.
### Some properties of the Prüfer code
- After constructing the Prüfer code two vertices will remain.
One of them is the highest vertex $n-1$, but nothing else can be said about the other one.
- Each vertex appears in the Prüfer code exactly a fixed number of times - its degree minus one.
This can be easily checked, since the degree will get smaller every time we record its label in the code, and we remove it once the degree is $1$.
For the two remaining vertices this fact is also true.
### Restoring the tree using the Prüfer code
To restore the tree it suffice to only focus on the property discussed in the last section.
We already know the degree of all the vertices in the desired tree.
Therefore we can find all leaf vertices, and also the first leaf that was removed in the first step (it has to be the smallest leaf).
This leaf vertex was connected to the vertex corresponding to the number in the first cell of the Prüfer code.
Thus we found the first edge removed by when then the Prüfer code was generated.
We can add this edge to the answer and reduce the degrees at both ends of the edge.
We will repeat this operation until we have used all numbers of the Prüfer code:
we look for the minimum vertex with degree equal to $1$, connect it with the next vertex from the Prüfer code, and reduce the degree.
In the end we only have two vertices left with degree equal to $1$.
These are the vertices that didn't got removed by the Prüfer code process.
We connect them to get the last edge of the tree.
One of them will always be the vertex $n-1$.
This algorithm can be **implemented** easily in $O(n \log n)$: we use a data structure that supports extracting the minimum (for example `set<>` or `priority_queue<>` in C++) to store all the leaf vertices.
The following implementation returns the list of edges corresponding to the tree.
```cpp pruefer_decode_slow
vector<pair<int, int>> pruefer_decode(vector<int> const& code) {
int n = code.size() + 2;
vector<int> degree(n, 1);
for (int i : code)
degree[i]++;
set<int> leaves;
for (int i = 0; i < n; i++) {
if (degree[i] == 1)
leaves.insert(i);
}
vector<pair<int, int>> edges;
for (int v : code) {
int leaf = *leaves.begin();
leaves.erase(leaves.begin());
edges.emplace_back(leaf, v);
if (--degree[v] == 1)
leaves.insert(v);
}
edges.emplace_back(*leaves.begin(), n-1);
return edges;
}
```

### Restoring the tree using the Prüfer code in linear time

To obtain the tree in linear time we can apply the same technique used to obtain the Prüfer code in linear time.

We don't need a data structure to extract the minimum.
Instead we can notice that, after processing the current edge, only one vertex becomes a leaf.
Therefore we can either continue with this vertex, or we find a smaller one with a linear search by moving a pointer.

```cpp pruefer_decode_fast
vector<pair<int, int>> pruefer_decode(vector<int> const& code) {
int n = code.size() + 2;
vector<int> degree(n, 1);
for (int i : code)
degree[i]++;

int ptr = 0;
while (degree[ptr] != 1)
ptr++;
int leaf = ptr;

vector<pair<int, int>> edges;
for (int v : code) {
edges.emplace_back(leaf, v);
if (--degree[v] == 1 && v < ptr) {
leaf = v;
} else {
ptr++;
while (degree[ptr] != 1)
ptr++;
leaf = ptr;
}
}
edges.emplace_back(leaf, n-1);
return edges;
}
```
### Bijection between trees and Prüfer codes
For each tree there exists a Prüfer code corresponding to it.
And for each Prüfer code we can restore the original tree.
It follows that also every Prüfer code (i.e. a sequence of $n-2$ numbers in the range $[0; n - 1]$) corresponds to a tree.
Therefore all trees and all Prüfer codes form a bijection (a **one-to-one correspondence**).
## Cayley's formula
Cayley's formula states that the **number of spanning trees in a complete labeled graph** with $n$ vertices is equal to:
$$n^{n-2}$$
There are multiple proofs for this formula.
Using the Prüfer code concept this statement comes without any surprise.
In fact any Prüfer code with $n-2$ numbers from the interval $[0; n-1]$ corresponds to some tree with $n$ vertices.
So we have $n^{n-2}$ different such Prüfer codes.
Since each such tree is a spanning tree of a complete graph with $n$ vertices, the number of such spanning trees is also $n^{n-2}$.
## Number of ways to make a graph connected
The concept of Prüfer codes are even more powerful.
It allows to create a lot more general formulas than Cayley's formula.
In this problem we are given a graph with $n$ vertices and $m$ edges.
The graph currently has $k$ components.
We want to compute the number of ways of adding $k-1$ edges so that the graph becomes connected (obviously $k-1$ is the minimum number necessary to make the graph connected).
Let us derive a formula for solving this problem.
We use $s_1, \dots, s_k$ for the sizes of the connected components in the graph.
We cannot add edges within a connected component.
Therefore it turns out that this problem is very similar to the search for the number of spanning trees of a complete graph with $k$ vertices.
The only difference is that each vertex has actually the size $s_i$: each edge connecting the vertex $i$, actually multiplies the answer by $s_i$.
Thus in order to calculate the number of possible ways it is important to count how often each of the $k$ vertices is used in the connecting tree.
To obtain a formula for the problem it is necessary to sum the answer over all possible degrees.
Let $d_1, \dots, d_k$ be the degrees of the vertices in the tree after connecting the vertices.
The sum of the degrees is twice the number of edges:
$$\sum_{i=1}^k d_i = 2k - 2$$
If the vertex $i$ has degree $d_i$, then it appears $d_i - 1$ times in the Prüfer code.
The Prüfer code for a tree with $k$ vertices has length $k-2$.
So the number of ways to choose a code with $k-2$ numbers where the number $i$ appears exactly $d_i - 1$ times is equal to the **multinomial coefficient**
$$\binom{k-2}{d_1-1, d_2-1, \dots, d_k-1} = \frac{(k-2)!}{(d_1-1)! (d_2-1)! \cdots (d_k-1)!}.$$
The fact that each edge adjacent to the vertex $i$ multiplies the answer by $s_i$ we receive the answer, assuming that the degrees of the vertices are $d_1, \dots, d_k$:
$$s_1^{d_1} \cdot s_2^{d_2} \cdots s_k^{d_k} \cdot \binom{k-2}{d_1-1, d_2-1, \dots, d_k-1}$$
To get the final answer we need to sum this for all possible ways to choose the degrees:
$$\sum_{\substack{d_i \ge 1 \\\\ \sum_{i=1}^k d_i = 2k -2}} s_1^{d_1} \cdot s_2^{d_2} \cdots s_k^{d_k} \cdot \binom{k-2}{d_1-1, d_2-1, \dots, d_k-1}$$
Currently this looks like a really horrible answer, however we can use the **multinomial theorem**, which says:
$$(x_1 + \dots + x_m)^p = \sum_{\substack{c_i \ge 0 \\\\ \sum_{i=1}^m c_i = p}} x_1^{c_1} \cdot x_2^{c_2} \cdots x_m^{c_m} \cdot \binom{p}{c_1, c_2, \dots c_m}$$
This look already pretty similar.
To use it we only need to substitute with $e_i = d_i - 1$:
$$\sum_{\substack{e_i \ge 0 \\\\ \sum_{i=1}^k e_i = k - 2}} s_1^{e_1+1} \cdot s_2^{e_2+1} \cdots s_k^{e_k+1} \cdot \binom{k-2}{e_1, e_2, \dots, e_k}$$
After applying the multinomial theorem we get the **answer to the problem**:
$$s_1 \cdot s_2 \cdots s_k \cdot (s_1 + s_2 + \dots + s_k)^{k-2} = s_1 \cdot s_2 \cdots s_k \cdot n^{k-2}$$
By accident this formula also holds for $k = 1$.
## Practice problems
- [UVA #10843 - Anne's game](https://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=20&page=show_problem&problem=1784)
- [Timus #1069 - Prufer Code](http://acm.timus.ru/problem.aspx?space=1&num=1069)
- [Codeforces - Clues](http://codeforces.com/contest/156/problem/D)
- [Topcoder - TheCitiesAndRoadsDivTwo](https://community.topcoder.com/stat?c=problem_statement&pm=10774&rd=14146)
1 change: 1 addition & 0 deletions src/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,7 @@ especially popular in field of competitive programming.*
- [Minimum Spanning Tree - Prim's Algorithm](./graph/mst_prim.html)
- [Minimum Spanning Tree - Kruskal](./graph/mst_kruskal.html)
- [Minimum Spanning Tree - Kruskal with Disjoint Set Union](./graph/mst_kruskal_with_dsu.html)
- [Prüfer code](./graph/pruefer_code.html)
- [Kirchhoff Theorem](./graph/kirchhoff-theorem.html)
- [Strongly Connected Components and Condensation Graph](./graph/strongly-connected-components.html)
- [Maximum flow - Ford-Fulkerson and Edmonds-Karp](./graph/edmonds_karp.html)
Expand Down
62 changes: 62 additions & 0 deletions test/test_pruefer_code.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
#include <cassert>
#include <vector>
#include <set>
using namespace std;

namespace SLOW {
#include "pruefer_code_slow.h"
#include "pruefer_decode_slow.h"
}

namespace FAST {
#include "pruefer_code_fast.h"
#include "pruefer_decode_fast.h"
}

vector<vector<int>> get_adj(vector<pair<int, int>> edges) {
vector<vector<int>> adj(edges.size() + 1);
for (auto edge : edges) {
adj[edge.first].push_back(edge.second);
adj[edge.second].push_back(edge.first);
}
return adj;
}

int main() {
{
using namespace SLOW;
adj = {
{3},
{3},
{3},
{0, 1, 2, 4},
{3, 5},
{4}
};
vector<int> exp_code = {3, 3, 3, 4};
assert(pruefer_code() == exp_code);

vector<int> code2 = {5, 3, 2, 4, 4};
auto edges = pruefer_decode(code2);
adj = get_adj(edges);
assert(pruefer_code() == code2);
}
{
using namespace FAST;
adj = {
{3},
{3},
{3},
{0, 1, 2, 4},
{3, 5},
{4}
};
vector<int> exp_code = {3, 3, 3, 4};
assert(pruefer_code() == exp_code);

vector<int> code2 = {5, 3, 2, 4, 4};
auto edges = pruefer_decode(code2);
adj = get_adj(edges);
assert(pruefer_code() == code2);
}
}

0 comments on commit ead499d

Please sign in to comment.