Hierarchical Data cont’d – Path Enumeration and Closure Tables

Subscribe: Apple Podcasts | Google Podcasts | Spotify | TuneIn | RSS

It’s that time again.

This week we answer a question, Allen registers for school, Joe reads some numbers, Michael breaks out the survey results, and Joe cringes at the thought of bidets. It’s time for episode 29! And we thought, what better to talk about than to continue our discussion on hierarchical data solutions.

We were so excited to discuss the survey results, that we forgot to mention a new survey. So, this episode’s survey is:

News & Updates

Sincere thanks for the reviews: ShonnLy, Mike Myke, and DEADIRON
The survey results are in, but what do they mean? Science!
It’s never too late to become a developer.
Join Allen at Stanford’s Algorithms: Design and Analysis, Part 1 online course! He’ll need a shoulder to cry on.
NDepend 6 is out! Use offer code NDependV6CodingBlocks before July 2nd to save 20%.

Path Enumeration

Also known as materialized path, in this structure, one column contains each row’s entire heritage. Ways to visualize this include any directory path you’ve ever seen like C:\Windows\System32\drivers\etc or any bread crumbing you’ve see around the web such as Home > Computer Hardware > Hard Drives > SSDs.

Consider three records with ids 1, 2, 3. Record 2 is a child of 1 and record 3 is a child of 2. Assuming the / is used to separate each id, the data would look like:

ID	PATH
1	1/
2	1/2/
3	1/2/3/

Pros

Each record knows it’s entire heritage without additional queries.
Allows multiple parents, but requires being stored in a separate table.

Cons

Depth of heritage constrained to column width.
Queries to get descendants require string parsing, which can be a performance hinderance, especially for larger data sets.
- SELECT * FROM mydata WHERE path LIKE ‘windows\system32\%’
No referential integrity nor normalization.
Hierarchy changes can be expensive.
Doesn’t scale well for larger data sets.

Closure Tables

Similar to the nested set model, closure tables are a separate table like the name implies. Every node of the tree knows *all* of it’s descendants.

Considering the same three records as before, the data would look like:

Ancestor	Descendant
1	1
1	2
1	3
2	2
2	3
3	3

Pros

Supports referential integrity.
Indexes well.
Easy to manipulate “middle” nodes: add/move/delete.
Easy query sub-trees.
Allows multiple parents.
Multiple hierarchies can co-exist.

Cons

Total row count can be high for large data sets. Worse case, O(n²).

Tips & Tricks

Launch a browser in Incognito Mode directly from Visual Studio
Need lots of data to play with? Amazon has you covered with Public Data Sets.
Be nice to your co-workers and provide smaller merge bubbles. Use git pull‘s rebase option. WARNING: Not recommended for history that has already been published.

Tell A Friend. Or Three.

And if you haven’t already, be sure to visit www.codingblocks.net/review to share your opinion with anonymous strangers. Meh, it’s what people do. And if you already have, who’s better than you? No one, that’s who. You. Rock.

Code, compile, commit. Sigh, repeat.

Share the joy

Navigation

Should this episode have a survey?

News & Updates

Path Enumeration

Pros

Cons

Closure Tables

Pros

Cons

Tips & Tricks

Tell A Friend. Or Three.

About Michael Outlaw