
How to Draw a Dendrogram by Hand

Fig 1. scatterplot of values (left) and dendrogram of clustered data points (right).

Dendrograms are easy to create with code but can be challenging to draw by hand. In this post, I’ll show you how to draw a dendrogram using pen and paper.

This will be our dummy problem statement:

Draw a dendrogram for the following scatterplot. Use single linkage and Euclidean (L2) distance.

Fig 2. scatterplot of 6 randomly generated points.

There are a few definitions to get out of the way. The Euclidean distance between points \(P = (p_1, p_2)\) and \(Q = (q_1, q_2)\) in two dimensions is \(d(P, Q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2}\).
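As a quick illustration, the formula translates almost line for line into Python. This is a minimal sketch; the function name `euclidean` is just for illustration, and the standard library's `math.dist` (Python 3.8+) computes the same value, which makes it a convenient cross-check for hand calculations.

```python
import math

def euclidean(p, q):
    """Euclidean (L2) distance between two 2-D points p and q."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# a 3-4-5 right triangle keeps the arithmetic easy to verify by hand
print(euclidean((1, 2), (4, 6)))   # 5.0
print(math.dist((1, 2), (4, 6)))   # 5.0
```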

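Single linkage defines the distance between two clusters as the smallest distance between any point in one cluster and any point in the other; at each step, the two closest clusters are merged. The same rule applies when merging a cluster with a single point. Merging two individual points just uses their Euclidean distance without any special considerations.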
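The single-linkage rule can be sketched in a few lines as well. Here a cluster is assumed to be a plain list of (x, y) tuples, and `single_linkage` is a hypothetical helper name. A lone point is simply a cluster of size one, so two single points reduce to their ordinary Euclidean distance.

```python
import math
from itertools import product

def single_linkage(cluster_a, cluster_b):
    """Smallest Euclidean distance between any point in cluster_a
    and any point in cluster_b (the single-linkage distance)."""
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

a = [(0, 0), (1, 0)]   # a two-point cluster
b = [(4, 0), (9, 9)]   # another two-point cluster

print(single_linkage(a, b))                # 3.0, from (1, 0) to (4, 0)
print(single_linkage([(1, 2)], a))         # 2.0, a single point vs. a cluster
print(single_linkage([(0, 0)], [(3, 4)]))  # 5.0, two single points
```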

It is usually easiest to sort the x-axis labels by eyeballing which points look like they will be merged last. In Figure 3, data point 0 is visually farthest away from everything else in the scatterplot, so it goes on the far right of the dendrogram's axis. The starter template is shown below.

Fig 3. empty dendrogram with no branches.
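If you want to check a guessed axis order, SciPy's `dendrogram` can compute the layout without drawing anything when you pass `no_plot=True`; the `'ivl'` entry of the returned dictionary lists the leaf labels from left to right. This sketch assumes the same seeded points as the Figure 1 code later in this post. A hand-drawn order may legitimately differ, since the two children of any branch can be swapped without changing the tree.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# same seed and shape as the Figure 1 code, so the points are identical
np.random.seed(25)
points = np.random.randint(0, 10, (6, 2))
linked = linkage(points, method='single', metric='euclidean')

# compute the dendrogram layout only; 'ivl' holds leaf labels left to right
layout = dendrogram(linked, labels=list(range(6)),
                    distance_sort='descending', no_plot=True)
print(layout['ivl'])
```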

Let’s begin merging. With single linkage, the strategy is to repeatedly merge the two closest points that are not already in the same cluster; the height of each new branch in the dendrogram is the distance between those two points.

Fig 4. merging the closest pairs of points. We'll re-evaluate the ones that are farther away after merging the obvious ones (magenta).
Fig 5. the next closest pair of non-merged points is between 5 and 2 (red).
Fig 6. continuing to build the dendrogram with the closest non-merged points (green).
Fig 7. the finished dendrogram (grey).
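The merging procedure in Figures 4 to 7 can be written as a naive loop: repeatedly find the two clusters whose closest members are nearest, merge them, and record that distance as the branch height. This is a sketch for checking small hand-drawn examples, not a replacement for a library: it recomputes every pairwise distance each round, so it takes roughly O(n³) time. The function name `single_linkage_merges` and the five example points are made up for illustration.

```python
import math
from itertools import combinations

def single_linkage_merges(points):
    """Naive agglomerative clustering with single linkage.

    Returns (cluster_a, cluster_b, height) tuples in merge order, where each
    cluster is a sorted list of point indices and height is the branch height.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters whose closest members are nearest
        best = None
        for a, b in combinations(clusters, 2):
            d = min(math.dist(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        merges.append((sorted(a), sorted(b), d))
        # replace the two merged clusters with their union
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 7), (20, 20)]
for a, b, height in single_linkage_merges(points):
    print(a, b, round(height, 3))
# [0] [1] 1.0
# [2] [3] 2.0
# [0, 1] [2, 3] 6.403
# [4] [0, 1, 2, 3] 19.849
```

Each printed height is the y-coordinate at which that branch should be drawn.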

If you aren’t doing this in an exam setting, it is a good idea to sanity-check your drawing, for example by recreating the plot with code. You can compare your drawing with the code-generated dendrogram in Figure 1.

This is the code to make Figure 1:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# generate 6 reproducible random points with integer coordinates in [0, 10)
np.random.seed(25)
num_points = 6
data_points = pd.DataFrame(np.random.randint(0, 10, (num_points, 2)),
                           columns=['x', 'y'])

# hierarchical clustering with single linkage and Euclidean distance
linked = linkage(data_points, method='single', metric='euclidean')

fig, ax = plt.subplots(1, 2, figsize=(10, 5))

# left panel: scatterplot of the raw points
data_points.plot.scatter(x='x', y='y', s=150,
                         title='Randomly Generated Data Points', ax=ax[0])
ax[0].grid()
ax[0].set_axisbelow(True)

# add numbers to the points based on their index
for i, point in data_points.iterrows():
    ax[0].annotate(str(i), (point['x'] - .07, point['y'] - .07), color='w')

label_list = list(range(num_points))

# right panel: draw the dendrogram directly onto the second axes
dendrogram(linked,
           labels=label_list,
           distance_sort='descending',
           ax=ax[1])
# add grid for the y axis only
ax[1].grid(axis='y')
ax[1].spines['left'].set_visible(False)
ax[1].set_axisbelow(True)

ax[1].set_title('Dendrogram for Randomly Generated Data Points')
ax[1].set_xlabel('Data Point Label')
ax[1].set_ylabel('Euclidean Distance (Single Linkage)')
plt.show()
```
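To compare merge heights rather than just the overall shape, you can print the linkage matrix that the Figure 1 code computes. In SciPy, each row of the matrix returned by `linkage` describes one merge as `[cluster_i, cluster_j, distance, number_of_points]`, where indices below `n` are original points and index `n + k` refers to the cluster created in row `k`. This sketch regenerates the same seeded points; each printed height should match the height at which you drew the corresponding branch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# same seed and shape as the Figure 1 code, so the points are identical
np.random.seed(25)
points = np.random.randint(0, 10, (6, 2))
Z = linkage(points, method='single', metric='euclidean')

n = len(points)
# each row of Z is one merge: [cluster_i, cluster_j, height, size];
# indices < n are original points, index n + k is the cluster formed in row k
for k, (i, j, height, size) in enumerate(Z):
    print(f"step {k}: merge {int(i)} and {int(j)} -> cluster {n + k} "
          f"at height {height:.3f} ({int(size)} points)")
```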

This was my pyproject.toml:

```toml
[tool.poetry]
name = "medium-articles"
version = "0.1.0"
description = ""
authors = ["Coulton Theuer <>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "3.11.2"
pandas = "^2.1.4"
numpy = "^1.26.3"
scikit-learn = "^1.3.2"
SciPy = "^1.11.4"
nltk = "^3.8.1"
statsmodels = "^0.14.1"
ipykernel = "^6.28.0"
jupyter = "^1.0.0"
matplotlib = "^3.8.2"
seaborn = "^0.13.1"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

Resources:

This post is licensed under CC BY 4.0 by the author.
