Hmm, I have 32 GB RAM in my machine. But of course, this doesn’t say very much about the RAM available to Gramps at a specific moment …
Good to hear that! I’ve always pointed out that, in my humble opinion, Gramps is by far the best and most powerful genealogy software on the market, since it allows one to go far beyond the old GEDCOM philosophy deep into historical and sociological territory, but that it needs to be able to work much faster with large datasets by supporting the native power of modern database backends. I have not always had the impression that my discussion partners share this understanding.
@Nick-Hall has suggested in the past that the memory allocated to caching might be insufficient for high performance.
Although he recommended experimenting to validate that theory, a test process was not specified. And we probably need a standard set of tests so that statistics can be gathered.
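For what it’s worth, a small timing harness along these lines could make such measurements repeatable across machines and backends. This is only a sketch, not anything that exists in Gramps: `benchmark()` and `run_filter()` are placeholders I made up, and `run_filter()` stands in for whatever filter or view query you actually want to measure.

```python
# Minimal, hypothetical timing harness: run a callable several times and
# report basic statistics, so different machines/backends can be compared
# with the same procedure.
import statistics
import time


def benchmark(func, repeats=5):
    """Run func() `repeats` times and return (mean, stdev) in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    def run_filter():
        # Placeholder workload; replace with the Gramps filter or view
        # query you actually want to time.
        sum(i * i for i in range(1_000_000))

    mean, stdev = benchmark(run_filter)
    print(f"mean {mean:.3f}s  stdev {stdev:.3f}s over {5} runs")
```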
The timer that @kku added to the Filter+ addon gramplet will make stats much more accurate. (The “define filter” scraper made it into the 6.0 Filter but I do not recall if the timer also did so.)
Just a thought: is this really necessary? Of course it’s always a good idea to support as many database backends as possible, but this usually comes with the disadvantage of making things much more complicated. So wouldn’t a more pragmatic approach be to look for the famous 80 % solution (e.g. only backends that understand SQL) instead of a 100 % one? Having spent long years in CIO roles, I know very well that this discussion is always a hot topic, so I guess you’ve already gone through it in the past. If you find some time, I’d very much appreciate it if you shared your thoughts and conclusions.
Keeping the API abstract isn’t necessary, but it is part of the philosophical foundation of Gramps. It isn’t that complicated to allow any database backend (there is one class that you must implement for any database use). But it is limiting, in that you can’t (in general) reach down below that level to get additional power.
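To illustrate the idea (every class name, method, and the table/column layout below are made up for the example, not the real Gramps API): the core codes against one abstract interface, and any backend that implements that interface can be plugged in, which is exactly why the core can’t reach below that layer for backend-specific power.

```python
# Hypothetical sketch of an abstract database layer with one pluggable backend.
from abc import ABC, abstractmethod


class GenericBackend(ABC):
    """Abstract interface the application codes against (illustrative only)."""

    @abstractmethod
    def get_person(self, handle):
        """Return one person record by its handle."""

    @abstractmethod
    def iter_people(self):
        """Yield every person record, one at a time."""


class SQLiteBackend(GenericBackend):
    """One possible concrete backend; the core never sees the SQL below."""

    def __init__(self, connection):
        # `connection` is assumed to be a standard sqlite3 connection.
        self.connection = connection

    def get_person(self, handle):
        row = self.connection.execute(
            "SELECT json_data FROM person WHERE handle = ?", (handle,)
        ).fetchone()
        return row[0] if row else None

    def iter_people(self):
        for (json_data,) in self.connection.execute(
            "SELECT json_data FROM person"
        ):
            yield json_data
```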
But it is possible to have specific backends do things in a fast manner. The current plan (if I can speak for the developers) is to do one of the following:
1. Create db.select_from_TABLE() methods that can be implemented very quickly in databases that support SQL, and fall back to a slower method for those that don’t (a sketch of this pattern follows the list).
2. Create a collection of “business logic” methods (such as db.get_all_ancestors()) that could be implemented in a fast manner for backends that support it, and use general access methods for those that don’t.
3. Don’t do anything. Adding anything more is too complicated and not worth it.
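To make the first two options concrete, here is a rough sketch of the fast-path/fallback pattern they share. Every name in it (`execute_sql`, `iter_people`, `get_parents`, the `person` table with a `json_data` column) is hypothetical and chosen only to show the shape of the idea; it is not the code in the prototype PR.

```python
def select_from_person(db, where_sql, where_pred):
    """Option 1 style: a general select with a per-backend fast path.

    `where_sql` is a SQL fragment for SQL-capable backends; `where_pred`
    is the equivalent Python predicate used by the generic fallback.
    """
    if hasattr(db, "execute_sql"):  # hypothetical capability check
        return list(db.execute_sql(
            "SELECT json_data FROM person WHERE " + where_sql))
    # Fallback: works on every backend, but has to touch every record.
    return [p for p in db.iter_people() if where_pred(p)]


def get_all_ancestors(db, handle):
    """Option 2 style: one named business-logic call, overridable per backend."""
    if hasattr(db, "get_all_ancestors"):  # backend ships a fast version
        return db.get_all_ancestors(handle)
    # Generic fallback: walk parent links through the abstract API.
    ancestors, todo = set(), [handle]
    while todo:
        current = todo.pop()
        for parent in db.get_parents(current):  # hypothetical accessor
            if parent not in ancestors:
                ancestors.add(parent)
                todo.append(parent)
    return ancestors
```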
Options 1 and 2 are similar, but the first is general and can accommodate new queries without having to change any code. The second is very specific and can be tested directly, but it does require adding things to the code, and you are stuck until those additions are made.
Option 3 is what at least a few developers would choose. I suspect that they have smaller family trees and don’t see the need.
I’m in favor of Option 1, and have even written a prototype of it. The PR has the code and some good discussion of the ideas. (This was written when the JSON conversion was first being introduced, so it has some mentions of things that are now settled.)
In any event, my opinion is that the complication of Option 1 is worth it, and it isn’t that complicated (about 250 lines of core code). But it needs more testing and discussion.
Hope that helps! We’re a little bit off topic from @Kat’s original question. Sorry!
Thanks – I now understand the discussion much better. I have my thoughts regarding the philosophical foundation and the resulting need for an abstract API but I see your point. And yes, we are off topic now, sorry from my side as well!
And having apologized for getting off topic, I’d like to point out that this off-topic discussion has clarified that there is not much one can do when running into speed problems with large databases. You can always play around with faster hard disks or SSDs, more RAM, or a faster machine, but at the end of the day this will not speed things up significantly. The problem boils down to the current code base and how it leverages (or rather: does not leverage) the database backend. In my experience, searches in the person or event views are slow with a large database (ca. 90k individuals, ca. 230k events) and become even much slower (run time x2 or x3) when upgrading from v5.1.2 on BSDDB to v6.0.x on SQLite. All other searches are reasonably fast (i.e. 5 seconds or less). So for the time being, there is no alternative to v5.1.2 on BSDDB for a large database.
It’s probably inspired by Pareto, but the 80 % solution is famous among CIOs and other management guys: don’t try to be perfect, just make it as good as we need it. Spending the additional time and money to go from 80 % to 100 % will never pay off.