Cyclic Graph Query in Sqlite/Python
A look at a WITH RECURSIVE query, how a cycle in the data breaks it, and how we can fix it via the Python and Sqlite dynamic duo.
Problem
We want to use WITH RECURSIVE to query an edge list table in a Sqlite table, as well returning the level of the edges, but there is a cycle. Read on to see one way we can address this via a Python wrapper for our Sqlite call.
This article walks through the basic recursive query; how we can add recursion level to that basic query; what happens when there is a cycle in the data; a first try at a fix, and a decent (if hackish) solution.
Basic Query
The basic table and data are right out of Celko or the SQLite documentation:
CREATE TABLE emp(emp_name,boss_emp_name);
INSERT INTO emp(emp_name,boss_emp_name) VALUES(?,?);
[ ('Albert', '')
, ('Bert' , 'Albert')
, ('Chuck' , 'Albert')
, ('Donna' , 'Bert')
, ('Eddie' , 'Bert')
, ('Fred ' , 'Chuck')
, ('Gail' , 'Chuck') ]
(Skipping a bunch of pytest clutter here, except to say thanks to those who made it.)
All is fine and a basic query returns what we INSERTed:
WITH RECURSIVE bosses(emp_name, boss_emp_name) AS (
VALUES( 'Albert', NULL)
UNION
SELECT A.emp_name
, A.boss_emp_name
FROM emp A
INNER JOIN bosses B
WHERE A.boss_emp_name=B.emp_name
ORDER BY 2 DESC
LIMIT 10
)
SELECT * FROM bosses;
('Albert', None)
('Bert' , 'Albert')
('Donna' , 'Bert')
('Eddie' , 'Bert')
('Chuck' , 'Albert')
('Fred ' , 'Chuck')
('Gail' , 'Chuck')The Sqlite documentation notes that UNION means that “UNION is used instead of UNION ALL to prevent the recursion from entering an infinite loop if the graph contains cycles.“
Levels
We can have Sqlite track the level of the nodes, which could be useful for displaying the results:
WITH RECURSIVE bosses(emp_name, boss_emp_name,level) AS (
VALUES( 'Albert', NULL, 0)
UNION
SELECT A.emp_name
, A.boss_emp_name
, level + 1
FROM emp A INNER JOIN bosses B
WHERE A.boss_emp_name=B.emp_name
ORDER BY 3 DESC
LIMIT 20
)
SELECT * FROM bosses;
('Albert', None , 0)
('Bert' , 'Albert', 1)
('Donna' , 'Bert' , 2)
('Eddie' , 'Bert' , 2)
('Chuck' , 'Albert', 1)
('Fred ' , 'Chuck' , 2)
('Gail' , 'Chuck' , 2)We’ve added a level field to the query output that gets incremented every time the engine descends to another level in the hierarchy.
Note that we stop at level 2 and that I doubled the LIMIT clause value to 20. The motive for that will be come apparent as we move to the next test. It will include a line revealing Gail as the power behind the throne of this imaginary organization.
Cycle Me
Pytest makes this sort of development straightforward. This test function
Uses the ‘:memory:’ connection conn.
Instantiates the data table and loads it, as shown above, with data.
Then adds an extra edge connecting Albert to Gail.
We enumerate() the output, which will highlight the cycle problem.
def test_break_the_query(conn, sql_create, sql_insert, data, sql_query_level):
cur = conn.cursor()
cur.execute(sql_create)
for d in data:
cur.execute(sql_insert, d)
cur.execute(sql_insert, ("Albert", "Gail"))
conn.commit()
for i, r in enumerate(cur.execute(sql_query_level)):
print(f"{i=}. {r}")
Results:
i=0. ('Albert', None, 0)
i=1. ('Bert', 'Albert', 1)
i=2. ('Donna', 'Bert', 2)
i=3. ('Eddie', 'Bert', 2)
i=4. ('Chuck', 'Albert', 1)
i=5. ('Fred ', 'Chuck', 2)
i=6. ('Gail', 'Chuck', 2)
i=7. ('Albert', 'Gail', 3)
i=8. ('Bert', 'Albert', 4)
i=9. ('Donna', 'Bert', 5)
i=10. ('Eddie', 'Bert', 5)
i=11. ('Chuck', 'Albert', 4)
i=12. ('Fred ', 'Chuck', 5)
i=13. ('Gail', 'Chuck', 5)
i=14. ('Albert', 'Gail', 6)
i=15. ('Bert', 'Albert', 7)
i=16. ('Donna', 'Bert', 8)
i=17. ('Eddie', 'Bert', 8)
i=18. ('Chuck', 'Albert', 7)
i=19. ('Fred ', 'Chuck', 8)
Note the cycle appearing when i=7. This would be an infinite loop in code if not for the handy LIMIT 20 clause in the SQL query.
First Try At A Fix
The Fundamental Theorem of Software Engineering: "Any problem in computer science can be solved with another layer of indirection"—David Wheeler
Define a python set() called unique_tuples to judge the uniqueness of the Sqlite return.
Define a nested function to decide if we want to increment the level value based upon checking unique_tuples. Note that if a tuple has been seen, we’ll see “maybe redundant” printed in the output.
Register this user-defined function within the Sqlite connection.
Carry on as before, inserting the cycle and then querying.
def test_fix_the_broken_query(
conn, sql_create, sql_insert, data, sql_query_maybe_level
):
unique_tuples = set()
def maybe_level(a_emp_name, a_boss_emp_name, level):
maybe_tuple = (a_emp_name, a_boss_emp_name)
if maybe_tuple in unique_tuples:
print("maybe redundant:")
return level
else:
unique_tuples.add(maybe_tuple)
return level + 1
conn.create_function("maybe_level", 3, maybe_level)
cur = conn.cursor()
cur.execute(sql_create)
for d in data:
cur.execute(sql_insert, d)
cur.execute(sql_insert, ("Albert", "Gail"))
conn.commit()
for i, r in enumerate(cur.execute(sql_query_maybe_level)):
print(f"{i=}. {r}")We need to improve the query to use the maybe_level() function:
WITH RECURSIVE bosses(emp_name, boss_emp_name,level) AS (
VALUES( 'Albert', NULL, 0)
UNION
SELECT A.emp_name
, A.boss_emp_name
, maybe_level( A.emp_name
, A.boss_emp_name
, level )
FROM emp A INNER JOIN bosses B
WHERE A.boss_emp_name=B.emp_name
ORDER BY 3 DESC
LIMIT 20
)
SELECT * FROM bosses;…and we’re almost there:
i=0. ('Albert', None , 0)
i=1. ('Bert' , 'Albert', 1)
i=2. ('Donna' , 'Bert' , 2)
i=3. ('Eddie' , 'Bert' , 2)
i=4. ('Chuck' , 'Albert', 1)
i=5. ('Fred ' , 'Chuck' , 2)
i=6. ('Gail' , 'Chuck' , 2)
maybe redundant:
maybe redundant:
i=7. ('Albert', 'Gail' , 3)
maybe redundant:
maybe redundant:
i=8. ('Bert' , 'Albert', 3)
maybe redundant:
maybe redundant:
i=9. ('Chuck' , 'Albert', 3)
i=10. ('Donna' , 'Bert' , 3)
i=11. ('Eddie' , 'Bert' , 3)
i=12. ('Fred ' , 'Chuck' , 3)
maybe redundant:
i=13. ('Gail' , 'Chuck' , 3)The code noted that we had “maybe redundant” output, and the second time that the Sqlite engine encountered (’Gail’ , ‘Chuck’ , 3) it stopped recursing. The query halted ahead of reaching the LIMIT, but we actually duplicated the data.
A Decent (if Hackish) Solution
Cutting to the chase, the next Python try is a generator function that is stand-alone, and then invoked from pytest.
Here we declare out_list and have maybe_level() report findings there. We can’t just throw a StopIteration exception from maybe_level(), as the Sqlite engine is…unexceptional…at least from a Python vantage.
We can then check out_list for StopIteration() at the bottom of the function and exit gracefully when we encounter a dupe.
def generate_unique_nodes_despite_cycle(
conn, sql_create, sql_insert, data, sql_query_maybe_level
):
unique_tuples = set()
out_list = []
def maybe_level(a_emp_name, a_boss_emp_name, level):
maybe_tuple = (a_emp_name, a_boss_emp_name)
if maybe_tuple in unique_tuples:
out_list.append("StopIteration")
return level
else:
unique_tuples.add(maybe_tuple)
return level + 1
conn.create_function("maybe_level", 3, maybe_level)
cur = conn.cursor()
cur.execute(sql_create)
for d in data:
cur.execute(sql_insert, d)
cur.execute(sql_insert, ("Albert", "Gail"))
conn.commit()
for i, r in enumerate(cur.execute(sql_query_maybe_level)):
if out_list != []:
return None
else:
yield f"{i=}. {r}"
def test_final_offer(conn, sql_create, sql_insert, data, sql_query_maybe_level):
for x in generate_unique_nodes_despite_cycle(
conn, sql_create, sql_insert, data, sql_query_maybe_level
):
print(x)Let’s see:
i=0. ('Albert', None, 0)
i=1. ('Bert', 'Albert', 1)
i=2. ('Donna', 'Bert', 2)
i=3. ('Eddie', 'Bert', 2)
i=4. ('Chuck', 'Albert', 1)
i=5. ('Fred ', 'Chuck', 2)
i=6. ('Gail', 'Chuck', 2)Brilliant! The test exited when the cycle was first encountered.
Summary
Thus if we’re willing to
duplicate the resulting data,
jump through setup hoops, and
lose some performance with the extra if.. logic,
…we can safely query an edge list with cycles.
I haven’t done much beside write this example code. In particular, if the cycle crops up early in the resulting data, vast swaths of the graph could be omitted.
The “StopIteration” play in the list is cheeky. Real production code would probably want to return more detail about the duplicate itself, and make this generator function part of a maintenance system that notes the problem, and then lets someone review the cause, before taking maintenance action.
Maybe you do automate the cycle removal, and then restart the query. Wash, rinse, repeat until a complete result obtains. Mileage can and will vary.

