Storage Abstraction: Lessons from PostgreSQL Enhanced + GrampsWeb Integration

Dear Gramps developers,

I recently completed integrating PostgreSQL Enhanced with GrampsWeb. The exercise required 12 (!!!) workarounds to handle architectural assumptions about storage. I’m sharing this as a concrete example of why we need to revisit the Gramps storage layer architecture.

Technical Findings

Filesystem Coupling in Core

gramps/gen/db/dbconst.py:73

DBBACKEND = "database.txt"  # Hardcoded assumption

This constant propagates through the entire stack. GrampsWeb’s is_tree() function literally checks for this file to determine database existence. For a pure-database backend, I had to create fake filesystem structures.

Missing Base Class Methods

GrampsWeb calls these methods not defined in DbGeneric:

  • get_transactions(page, pagesize, **kwargs)
  • set_metadata(key, value)
  • get_metadata(key, default)

Every backend implementing web support has to add these independently. There’s no interface contract.

Static Backend Registration

gramps_webapi/dbmanager.py:44

ALLOWED_DB_BACKENDS = ["sqlite", "postgresql", "sharedpostgresql"]

Adding a backend requires patching source files. We should be using entry points or a registry pattern.

SQL Dialect Issues

psycopg2 vs psycopg3 incompatibility in PostgreSQL backends:

-- psycopg2
WHERE x.name IN %s

-- psycopg3  
WHERE x.name = ANY(%s)

Each PostgreSQL variant handles this differently. We need dialect abstraction. Personally I LOATHE psycopg2 due to many long years of pain in implementing its use. You won’t find it in any code I use. It’s a database driver from 2008 replaced in 2019. There’s frankly no reason to use it. And getting it to work in the real world is usually painful due to its slightly different syntax and its age, making finding drivers a pain with most implementations.

Architectural Impact

But this isn’t PostgreSQL-specific. Any non-filesystem backend faces these issues:

  1. Backend proliferation: We have 3+ PostgreSQL variants because each developer solves these problems differently
  2. Code duplication: Same features reimplemented in each backend
  3. Testing complexity: No interface contract means no meaningful mocks
  4. Feature gaps: Backends implement different subsets of functionality

The Deeper Issue: DBAPI Architecture

These problems stem from fundamental issues with the DBAPI layer:

  1. Not Actually DB-API 2.0 Compliant: Despite its name, DBAPI doesn’t follow Python’s DB-API 2.0 specification (PEP 249)
  2. SQLite-Centric Design: The entire abstraction assumes SQLite patterns
  3. Filesystem Coupling: Directory-based initialization (_initialize(directory, username, password))
  4. No Connection Abstraction: Single connection architecture prevents pooling, replicas, or scaling

As documented in previous analyses, DBAPI is actually three things mixed together:

  • SQLite-specific implementation details (should be in SQLite class)
  • SQL database abstractions (should be in a SQL base class)
  • General database operations (already in DbGeneric)

The Storage Layer Challenge

The current inheritance hierarchy creates cascading problems:

DbReadBase + DbWriteBase
           ↓
      DbGeneric
           ↓
        DBAPI  ← [SQLite assumptions baked in]
           ↓
    SQLite / PostgreSQL (forced to work around DBAPI)

Meanwhile, sophisticated patterns exist but are disconnected:

  • Proxy pattern (PrivateProxy, LivingProxy, FilterProxy)
  • Transaction management (DbTxn)
  • Multi-tenant support (SharedDBAPI)

These need to be preserved while fixing the storage layer.

Architectural Direction

What we need:

1. Clean Separation of Concerns

  • Move SQLite specifics out of DBAPI into SQLite class
  • Create proper SQL abstraction base class
  • Enable non-filesystem backends without workarounds

2. Modern Storage Patterns

  • Connection string support (not just directory paths)
  • Connection pooling capabilities
  • Support for cloud databases
  • Native JSON types where available

3. Backward Compatibility Through Layering

Following the “Layered Core with Compatibility Shell” pattern:

  • Existing code continues to work unchanged
  • New capabilities available through opt-in
  • Progressive migration without breaking changes

The goal isn’t to prescribe a specific solution, but to enable the flexibility that modern genealogy software needs while preserving Gramps’s stability and existing patterns.

Why This Matters Now

The genealogy landscape is evolving:

  1. Multi-user Collaboration: Families want to work on shared trees simultaneously
  2. Cloud Storage: Users expect data accessible from anywhere
  3. Scale: Large trees need real database features (indexing, query optimization)
  4. Modern Backends: Graph databases for relationships, time-series for events

The current architecture blocks all of these use cases.

Evidence from the Field

My PostgreSQL Enhanced integration is just one example. We’re seeing:

  • Multiple PostgreSQL variants (postgresql, sharedpostgresql, postgresqlenhanced)
  • Each solving the same problems differently
  • No standard way to support cloud databases
  • Repeated workarounds for the same architectural issues

Moving Forward

This isn’t about one backend or implementation preference. It’s about enabling Gramps to support the storage systems users need, whether that’s:

  • Local SQLite for individual users
  • PostgreSQL for concurrent access
  • Cloud databases for web deployment
  • Graph databases for relationship analysis

The 12 workarounds I needed demonstrate that current Gramps architecture is blocking innovation. We can fix this while maintaining full backward compatibility through careful architectural evolution.

For Discussion

Rather than proposing a specific implementation, I’d like to discuss:

  1. Do we agree that storage layer abstraction is needed?
  2. What are the non-negotiable compatibility requirements?
  3. How can we enable modern backends without disrupting existing ones?
  4. What’s the right timeline for architectural evolution?

The PostgreSQL Enhanced integration works - but it shouldn’t require 12 hacks. Let’s discuss how to remove these barriers for all backend developers.

Working code and detailed analysis available for review

8 Likes

Thanks. I agree that work needs to be done in this area.

You should create a Gramps Enhancement Proposal GEP. No additional work is required - the content of this post is a good starting point.

This also needs to be discussed on the gramps-devel mailing list, which is our primary method of communication between developers. Not all of our developers use this forum.

There is a 2014 GEPS that remains open. Should it be used or a new one opened?

Gramps Enhancement Proposals (GEPS) - Summary

GEPS #
Title
Proposed Status Released Note Revision Control
GEPS 032:
Database Backend API
2013 Finished Gramps 5.0 Plug-in replacements for BSDDB. This allows the use of other databases as a backend.
GEPS 033:
Abstract Database API
2014 In Progress Working on for Gramps 4.1 WIP on SourceForge

We should open a new one. They are not the same.

1 Like

Can discussion of the Host/Port preference be added to this GEPS?

I’ve never found an entry point for user testing the (mostly missing) documentation for that Database Location option.

Is it for Cloud or Local Network databases?

Yes, this is covered in the “Filesystem Coupling in Core” section.

Some years ago now, I wrote a prototype that replaced the database.txt, name.txt and metadata.db files in database directories with a connections file. This removed the need for a local directory for a remote database and allowed a SQLite database to consist of a single file.

There wasn’t much interest back then, but maybe now is the time.

2 Likes

Thanks. I will create a GEP. I had oral surgery yesterday so I was out of commission but I’l, get caught up now.

1 Like

When you create the GEP, could you please give some specific examples, for example of

  • Move SQLite specifics out of DBAPI into SQLite class

  • Create proper SQL abstraction base class

  • Enable non-filesystem backends without workarounds

and

As documented in previous analyses, DBAPI is actually three things mixed together:

  • SQLite-specific implementation details (should be in SQLite class)

  • SQL database abstractions (should be in a SQL base class)

  • General database operations (already in DbGeneric)

Which are the SQLite specifics that should be moved out of DBAPI? Which should be in an SQL base class (and examples of what it should contain), and examples of the general database operations in DbGeneric (and so where should they be instead)?

As a separate issue, should there be a separate abstraction layer for JSON and serializer and strings and raw data and dicts, and where would this fit into the inheritance hierarchy?

One of the unfortunate realities of the Gramps information sharing ecosystem is that it is piecemeal.

The different systems (GitHub, SourceForge mailing lists, MantisBT, MediaWiki, Sphinx, WordPress, Discourse, Reddit, Facebook) have different … and incompatible… markdown formatting options. They have separate login account systems.

Only GitHub and Discourse include any WYSIWYG formatting editors.

And the MediaWiki markdown system (where the GEPS articles are collaboratively edited) has an enormous number of peccadillos and customizations. Requiring specialized expertise.

I suggest that “reformatting markdown between information systems” is one place that the gramps-project should loosen the restrictions on AI assistance.

That a developer should not be penalized for writing documentation in a familiar Editing Suite and allowing an AI tool to recast the markdown for the target system. (Note that this requires prompting the AI that that target markdown is formatted as needed. Such as : MediaWiki articles comforming to the style of a specific wiki article. And that the response be delivered as “code”, not with the rendered markdown.)