DBI API and SQLite optimizations

I’ve been using AI-assisted integrated development environments for the last couple of months, and they have gotten much better. I thought I would see what one recommended for our code base regarding SQLite. It gave some really good recommendations, including bulk DBI API methods and tweaks to get the most out of SQLite.

On top of this, we could also provide dynamic analysis to see how one’s database is performing and recommend (and make) database changes. I didn’t prompt the AI to do that here, but esoteric settings documented only in the wiki probably won’t be seen by many people.

Here is the PR, complete with summaries, all written by AI:

Interesting though SQLite Optimizations #2098 is, I thought it had been decided that AI could not be used for contributions. “The last couple of months” is not really much time to form a good appreciation.

Besides, in the context of SQL (and the switch from pickled BLOBs to JSON), I would much prefer that the code base be improved in this area before going all out on further work. I have commented on this before, but to recap:

  • Clarity and understandability of code and static type checking
  • Is it still the case that some DB methods can return either the desired object or None? Wasn’t there a comment that this gave rise to lots of hidden errors, and that the code should be changed (and type-checked) so that only the desired object is returned, making these errors impossible?

@DavidMStraub, one complication is that when these methods are overridden in a proxy (like LivingProxyDb or PrivateProxyDb), a method might return None. This is the number-one source of bugs throughout the code base: we don’t check whether the return value is None, and we end up with crashes.

I think with the correct type hints, we could catch such errors.
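As a hedged illustration of that point (the class and method bodies here are simplified stand-ins, not the actual Gramps API), annotating the return type as `Optional[Person]` forces callers to handle the None case before a static checker such as mypy will pass the code:

```python
from typing import Optional


class Person:
    """Simplified stand-in for the Gramps Person object."""

    def __init__(self, handle: str) -> None:
        self.handle = handle


class ProxyDb:
    """Stand-in for a filtering proxy such as PrivateProxyDb."""

    def __init__(self, people: dict) -> None:
        self._people = people

    def get_person_from_handle(self, handle: str) -> Optional[Person]:
        # A proxy may legitimately filter the record out, so the
        # annotated return type makes the None case explicit.
        return self._people.get(handle)


def describe(db: ProxyDb, handle: str) -> str:
    person = db.get_person_from_handle(handle)
    # Without this check, mypy reports an error along the lines of:
    #   Item "None" of "Optional[Person]" has no attribute "handle"
    if person is None:
        return "not found"
    return person.handle
```

With the hint in place, any call site that uses the result without a None check is flagged at type-check time rather than crashing at runtime.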

  • Has static type checking been implemented for all the DB methods and calls?
  • There is a lot of code for JSON with long hierarchies of methods, because a small change was introduced piecemeal rather than through a clear abstraction.
  • Some of the details of the changes for JSON (not the overall concept) achieved small performance improvements at the expense of added complexity and reduced understandability. (Interestingly, that is probably not the case for the SQLite improvements here.)

Code is read much more than it is written, and whereas quick changes are ideal for prototypes, the code really needs to be cleaned up for production and maintenance (I’m talking about JSON code, not the SQLite optimisations here).

@DavidMStraub I know you had been looking at this sort of thing - e.g. your comment on Add type hints to the generic database handler by DavidMStraub · Pull Request #1858 · gramps-project/gramps · GitHub

What I am trying to achieve is clarity, explicitness and type checking in the code, like there is in get_person_from_handle in gen/db/generic. I would much prefer that get_person_from_data be handled consistently, like all the other get_foo_from_bar methods in gen/db/generic.

I am with you on this. Explicit is better than implicit, and if something is difficult to type-hint, it probably means that it’s not explicit enough.

[BTW, I had also given AI a go. It was interesting how much it got right and ‘understood’. It wasn’t much good at producing a gramplet from a description: the result ran but didn’t do what I wanted, and further prompts produced something closer to what I wanted but that failed to run. I also tried a code improvement, and was surprised at what it refactored into a subroutine. It ran, but duplicated some of the output, due to a complicated misunderstanding on its part about the properties of a particular type. So, I wouldn’t trust it to produce good, clear and correct code.]

sqlite.py

self.__connection.execute("PRAGMA cache_size = -64000;")  # 64MB cache
self.__connection.execute("PRAGMA temp_store = MEMORY;")
self.__connection.execute("PRAGMA mmap_size = 268435456;")  # 256MB mmap

Instead of hard-coding these values, they should be in an .ini file. Ideally, the values should also be changeable from the settings in Gramps.
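As a minimal sketch of that idea (the `[sqlite]` section, key names and file layout below are hypothetical, invented for illustration rather than taken from the Gramps code base), the values could be read with configparser and fall back to built-in defaults:

```python
import configparser
import sqlite3

# Hypothetical defaults mirroring the hard-coded values in the PR;
# the key names and .ini layout are made up for illustration.
DEFAULTS = {
    "cache_size": "-64000",    # 64 MB page cache (negative = KiB)
    "temp_store": "MEMORY",
    "mmap_size": "268435456",  # 256 MB memory-mapped I/O
}


def load_pragmas(path: str) -> dict:
    """Read PRAGMA values from an .ini file, falling back to defaults."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently ignored
    settings = dict(DEFAULTS)
    if parser.has_section("sqlite"):
        settings.update(parser["sqlite"])
    return settings


def apply_pragmas(connection: sqlite3.Connection, settings: dict) -> None:
    """Apply each configured PRAGMA to an open connection."""
    for name, value in settings.items():
        connection.execute(f"PRAGMA {name} = {value};")
```

Expert users could then tweak the file directly, much like the old dbconst.py settings mentioned further down, while a Gramps settings dialog could edit the same file.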

I don’t see any ANALYZE statement in the code. Although ANALYZE is not required, it is recommended: it creates statistics on tables and indices, which the optimizer uses to choose the best query plan.
Rather than running ANALYZE directly, SQLite recommends issuing “PRAGMA optimize;”, typically just before closing the connection, which runs ANALYZE only where it would be beneficial.
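A minimal sketch of that pattern (the wrapper class is hypothetical; only the PRAGMA itself comes from the SQLite documentation):

```python
import sqlite3


class AnalyzedConnection:
    """Hypothetical wrapper that runs PRAGMA optimize before closing.

    SQLite's documentation suggests issuing "PRAGMA optimize;" just
    before closing a long-lived connection; it runs ANALYZE only on
    tables whose statistics look stale, so it is usually cheap.
    """

    def __init__(self, path: str) -> None:
        self.connection = sqlite3.connect(path)

    def close(self) -> None:
        self.connection.execute("PRAGMA optimize;")
        self.connection.close()
```

Hooking this into the database close path would keep the planner statistics fresh without any user intervention.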

Hi Doug, how interesting that you posted this, as I’m just finishing up testing on a very much alpha version of a modern postgresql addon using psycopg3, including several enhancements (vectordb, graphdb, full-text search, etc.) not yet activated, for projects I intend to pursue. I’m currently working on one project for the Clan Henderson Society that already has 100,000 GEDCOM persons to integrate, and I’m not even 5% done with my imports (we have more than 400 GEDCOM trees we own). Thus I need a real postgres solution (and I tend to loathe SQLite anyway).

What I found during this fun project was that I really wish GRAMPS would push back its abstraction level a bit, since so many limiting factors would be eliminated if hewing to the SQLite implementation were just slightly less fervent. It’s still workable, but it would be so much easier to implement amazing features if this were so.

I’ll push out some version of this enhanced postgresql addon probably in the next hour or two, so I hope you will give it a look. Be gentle with me when you do! I’m sure you’ll find something dumb, but I want to hear about it.

Thanks,

Greg Lamberson

It would be great if we could add methods to the DBI-API interface that both postgresql and sqlite (and others of course) could implement. Looking forward to seeing your code. Thanks for posting!

I’ll be glad to. I’m finishing up testing and formatting of my postgres addon, but I’ve been keeping a steady report during my work, so I’ll have ideas to share and some practical solutions that hopefully won’t be too painful. Thanks!

Greg

Our old BSDDB backend had settings of this type in a dbconst.py file. They were fairly easy for expert users to tweak. However, I would caution against encouraging novices to make database configuration changes. We should choose some good defaults.

Some options may also have disadvantages. For example, the SQLite documentation warns that write-ahead logging “does not work over a network filesystem”.

If it throws an exception, then we can instruct those users on what to do (or do it automatically).