Making a note of speed-up findings while they are fresh in my mind.
- Performed some tests on IsAncestorOf on the Example tree. If I run the filter on every person, it takes 2 minutes and 10 seconds.
- If I avoid all unpickling, it comes in at 2 or 3 seconds.
- Operating in batches is faster, since it issues fewer queries.
- Unpickling can be completely avoided, even when we need info from inside the blobs
- This involves saving the unpickled data in a new column and using JSON_EXTRACT() on the JSON text.
- We could implement such methods in a business-logic layer on top of the database. The current methods that return fully unpickled, deserialized objects can still be the fallback.
- Another enhancement for regular operation is to avoid unpickling the same object repeatedly: we could keep a cache of unpickled, deserialized objects.
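The cache idea could be as simple as memoizing the handle-to-object lookup. A minimal sketch, assuming a hypothetical fetch function standing in for the real unpickle step (the handles and the fetch function below are illustrative, not Gramps API):

```python
class ObjectCache:
    """Cache deserialized objects by handle so repeat lookups skip unpickling."""

    def __init__(self, fetch):
        self._fetch = fetch   # function: handle -> deserialized object
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, handle):
        if handle in self._cache:
            self.hits += 1
            return self._cache[handle]
        self.misses += 1
        obj = self._fetch(handle)
        self._cache[handle] = obj
        return obj


# Usage: wrap an expensive deserialization step.
calls = []

def expensive_fetch(handle):
    calls.append(handle)          # stands in for unpickling a blob
    return {"handle": handle}

cache = ObjectCache(expensive_fetch)
cache.get("p1")
cache.get("p1")                   # served from cache, no unpickle
cache.get("p2")
```

An eviction policy (e.g. LRU) would be needed before using something like this on a large tree, so the cache does not grow without bound.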
What does a completely optimized low-level business logic function look like?
import json

def get_parent_handles_from_main_family(db, person_handle):
    db.dbapi.execute(
        "SELECT JSON_EXTRACT(unblob, '$[9]') FROM person WHERE handle = ? LIMIT 1;",
        [person_handle],
    )
    row = db.dbapi.fetchone()
    parent_family_list = json.loads(row[0])
    if parent_family_list:
        return parent_family_list[0]
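The same query shape can be exercised against a throwaway SQLite database. This sketch fakes the person table and the unblob column, with index 9 holding the parent-family handle list as in the function above; it assumes the bundled SQLite has the JSON functions available (the default in recent builds), and the handles are made up for illustration:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT, unblob TEXT)")

# Fake serialized Person: index 9 holds the parent-family handle list.
fake_person = [None] * 9 + [["F0001", "F0002"]]
con.execute("INSERT INTO person VALUES (?, ?)",
            ["I0001", json.dumps(fake_person)])

cur = con.execute(
    "SELECT JSON_EXTRACT(unblob, '$[9]') FROM person WHERE handle = ? LIMIT 1;",
    ["I0001"],
)
row = cur.fetchone()
# JSON_EXTRACT returns arrays as JSON text, so one json.loads recovers the list.
parent_family_list = json.loads(row[0])
main_parents = parent_family_list[0]
```

No pickle module is touched anywhere: the only deserialization is one json.loads on a small extracted fragment, not the whole blob.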
It is low-level, but not any more so than the unserialize methods.
The above replaces the following, which requires an unpickled, unserialized Person to begin with, and then unpickles and unserializes a Family too:
fam_id = person.get_main_parents_family_handle()
if fam_id:
    fam = db.get_family_from_handle(fam_id)
    if fam:
        f_id = fam.get_father_handle()
        m_id = fam.get_mother_handle()
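The batching observation applies here too: one query with an IN clause replaces a round trip per person. A hedged sketch (the function name is hypothetical, and the demo runs against an in-memory table shaped like the person table above rather than a real Gramps database):

```python
import json
import sqlite3

def get_parent_handles_batch(cur, person_handles):
    """Fetch the main-parent-family handle for many people in one query."""
    placeholders = ",".join("?" * len(person_handles))
    cur.execute(
        f"SELECT handle, JSON_EXTRACT(unblob, '$[9]') FROM person "
        f"WHERE handle IN ({placeholders});",
        person_handles,
    )
    result = {}
    for handle, raw in cur.fetchall():
        family_list = json.loads(raw) if raw is not None else []
        result[handle] = family_list[0] if family_list else None
    return result


# Demo: two fake people, one with a parent family and one without.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT, unblob TEXT)")
rows = [("I0001", json.dumps([None] * 9 + [["F0001"]])),
        ("I0002", json.dumps([None] * 9 + [[]]))]
con.executemany("INSERT INTO person VALUES (?, ?)", rows)

parents = get_parent_handles_batch(con.cursor(), ["I0001", "I0002"])
```

For the IsAncestorOf case, a filter walking the whole tree could collect a generation of handles and resolve them in one such query per generation instead of one per person.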