Minimum Spanning Trees

Recall that trees are connected and acyclic, meaning there exists exactly one path between any two vertices.

For a graph , a spanning tree is a tree on all vertices in . A minimum spanning tree is a spanning tree with the least total weight, out of all spanning trees for a graph.


The proof for this relies on the fact that adding an edge to a tree creates a cycle.

Cut Property
For any cut of , each edge of minimum weight in the cut must be part of some MST.

A result of this is that if there is only one edge of minimum weight in the cut then it is in every MST.

Cycle Property
For every cycle, an edge with the greatest weight of all edges in the cycle is not part of any MST.


Here are two algorithms that produce a minimum spanning tree for a given graph.

Prim’s Algorithm

  1. Select an arbitrary vertex
  2. Initialize a tree to be just
  3. While does not span
    • Find the vertex not in that is closest to (connected by some edge )
    • Add and to

Kruskal’s Algorithm

  1. Initialize tree
  2. For each edge , in ascending order by weight
    • If adding to does not create a cycle, add and its vertices to

Resistance to Transformations

Let , and let be the graph with each edge weight replaced with .

If is an increasing function, then a MST of is also a MST of .

Proof: Since is increasing, if then . This means the relative ordering of edge weights is preserved. Appealing to the correctness of Prim’s or Kruskal’s algorithms means the edges selected in are the same as those selected in during the execution of these algorithms.

A similar result holds for maximum spanning trees. However, the result does not necessarily hold for any spanning tree. Consider a function and a spanning tree of consisting of edge weights and another spanning tree consisting of weights . Then has a total weight of , which is less than than that of , which as a total weight of . However, the spanning tree has weights for a total weight of , while has weights for a total weight of .

This means if you were to take all the spanning trees of and place them in order by total weight, transforming into does not shift the spanning trees at the ends of this list, but may reorder those in the middle.


The mutual information between two random variables and is a measure of the amount of information (often in bits) gained about after observing . Mutual information is symmetric so .

One application of MSTs is in constructing Bayesian networks. We can model random variables as vertices and mutual information between random variables as edges. We can get a tree with that maximizes the total mutual information by applying Prim’s algorithm or Kruskal’s algorithm. This is known as the Chow-Liu algorithm and produces the tree-structured network that best approximates the true joint distribution of the random variables.