Chapter 7: SciPy Graphs
SciPy Graphs, which almost everyone means scipy.sparse.csgraph when they say that.
This is not a general-purpose graph library like NetworkX (which is for more flexible, higher-level graph stuff). Instead, scipy.sparse.csgraph is a very efficient, low-level collection of graph algorithms that work directly on sparse matrices (usually in CSR or CSC format).
It’s designed for speed and memory efficiency when your graph is:
- Large (thousands to millions of nodes)
- Sparse (most possible edges don’t exist → adjacency matrix has mostly zeros)
Think road networks, social connections (sparse!), pixel neighborhoods in image segmentation, finite element meshes, recommender systems (users × items), etc.
Current status (February 2026): Still lives in scipy.sparse.csgraph Latest SciPy version around 1.17.0 (released early 2026) No major breaking changes in recent years — very stable
Core Idea — Graphs as Sparse Matrices
In csgraph, a graph with N nodes is represented as an N × N sparse adjacency matrix:
- G[i,j] = weight/cost/distance from node i to node j
- If no edge → G[i,j] = 0 (or very large number for some algos)
- Undirected graph → symmetric matrix
- Directed graph → asymmetric
The module only accepts sparse formats (csr_array, csc_array, coo_array, etc.) or sometimes dense (but converts internally).
Most Important Functions — What You’ll Actually Use
Here’s the practical shortlist (from docs & real usage in 2026):
| What you want to do | Main function(s) | Input graph directed? | Returns what? | Typical use case |
|---|---|---|---|---|
| Check connected components | connected_components | Yes/No | n_components, labels | Find clusters/islands in data |
| Shortest paths (all pairs) | floyd_warshall, johnson | Yes | Distance matrix (dense!) | All-pairs distances (small–medium graphs) |
| Shortest paths from single source | dijkstra, bellman_ford | Yes | distances, predecessors | Navigation, routing (Dijkstra most common) |
| Minimum spanning tree | minimum_spanning_tree | No | MST as csr sparse matrix | Network design, clustering |
| Depth-first / breadth-first traversal tree | depth_first_tree, breadth_first_tree | Yes | Tree as sparse matrix | Traversal order, spanning tree |
| Shortest path tree | shortest_path (convenience wrapper) | Yes | Method choice + distances/predecessors | One function for most shortest-path needs |
| Laplacian matrix | laplacian | No | Sparse Laplacian | Spectral clustering, graph cuts |
Let’s Do Real Examples — Hands-On (Jupyter style)
Always start like this:
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
import numpy as np from scipy.sparse import csr_array # preferred in modern SciPy from scipy.sparse.csgraph import ( connected_components, shortest_path, dijkstra, minimum_spanning_tree, laplacian ) import matplotlib.pyplot as plt |
Example 1 — Connected Components (very common first step)
Imagine we have 6 cities, some connected by roads.
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
# Adjacency matrix (undirected → symmetric) # 0–1, 0–2, 1–3, 4–5 are connected; 4–5 is separate component rows = [0,0,1,1,2,3,4,5] cols = [1,2,3,0,0,1,5,4] data = [1,1,1,1,1,1,1,1] # unweighted G = csr_array((data, (rows, cols)), shape=(6,6)) n_components, labels = connected_components(csgraph=G, directed=False, return_labels=True) print("Number of connected components:", n_components) # → 2 print("Component labels for nodes 0–5:", labels) # → e.g. [0 0 0 0 1 1] |
→ Nodes 0,1,2,3 in one group; 4,5 in another.
Example 2 — Shortest Path from One City to All Others (Dijkstra)
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
# Weighted graph: distances in km dist_data = [40, 25, 90, 35, 60, 55, 70, 80] G_weighted = csr_array((dist_data, (rows, cols)), shape=(6,6)) # Make undirected (add reverse edges) G_weighted = G_weighted + G_weighted.T distances, predecessors = dijkstra( csgraph=G_weighted, directed=False, indices=0, # start from node 0 return_predecessors=True ) print("Shortest distances from node 0:", distances) # e.g. [ 0. 40. 25. 65. inf inf] print("Predecessor array (to reconstruct path):", predecessors) # Use to build path: follow back from target to 0 |
For all-pairs (small graph):
|
0 1 2 3 4 5 6 7 |
all_dist = shortest_path(G_weighted, method='auto') # chooses dijkstra/floyd/johnson automatically print(all_dist) |
Example 3 — Minimum Spanning Tree (MST)
Useful for wiring networks with minimal cable, clustering, etc.
|
0 1 2 3 4 5 6 7 8 |
mst = minimum_spanning_tree(G_weighted) print(mst) # sparse matrix of the MST edges print(mst.toarray()) # dense view (only for small graphs!) |
→ Returns a tree (acyclic connected graph) with minimal total weight.
Example 4 — Graph Laplacian (spectral things)
|
0 1 2 3 4 5 6 7 |
L = laplacian(G_weighted, normed=False) # unnormalized print(L.toarray()) |
→ Used in spectral clustering, graph partitioning, diffusion, etc.
When to Use csgraph vs NetworkX?
| Criterion | Use scipy.sparse.csgraph | Use NetworkX |
|---|---|---|
| Graph size | Thousands–millions of nodes | Hundreds–thousands (slower on big) |
| Memory efficiency | Excellent (sparse only) | Higher overhead |
| Speed of core algos | Very fast (compiled Cython/Fortran) | Slower (pure Python mostly) |
| You already have sparse matrix | Yes → perfect fit | Need to convert |
| Need rich graph features | No (just algos) | Yes (attributes, drawing, centrality…) |
| Visualization | No built-in | Yes (draw, spring_layout…) |
Most scientific/ML people use csgraph for heavy lifting (shortest paths, connectivity on huge sparse data) and NetworkX for exploration/visualization/smaller graphs.
Teacher’s Final Tips (2026 edition)
- Always use CSR format for most csgraph functions (fastest)
- For directed graphs → set directed=True
- If graph has no weights → use 1 for edges or just connectivity
- Missing edge = 0 (but some algos like Dijkstra treat 0 as no edge — watch out!)
- For huge graphs → prefer dijkstra with single source over Floyd-Warshall
- Docs are excellent: https://docs.scipy.org/doc/scipy/reference/sparse.csgraph.html Tutorial bits: https://docs.scipy.org/doc/scipy/tutorial/csgraph.html (word ladder example is classic)
What kind of graph problem are you thinking about?
- Road/city shortest paths?
- Social network components?
- Building MST for clustering?
- Image segmentation (pixel graph)?
- Something from your data?
Tell me and we’ll build a more specific, realistic example together (maybe even with fake data that looks like yours). 🚀
