Discussion: minimal, anonymized, opt-out telemetry for Gramps Web API

DavidMStraub · May 27, 2025, 9:30am

This is a question to existing Gramps Web users:

How would you feel about Gramps Web (API) adding anonymized telemetry that would send a unique identifier of the instance/tree (not based on any of the data in the tree or the installation) plus the current API version number to a statistics endpoint by default, unless you opt out by changing a flag in the configuration?

Here is the rationale for this idea:

We currently have no idea how many users are actively using Gramps Web, but the number is clearly increasing. From docker pull counts, I would estimate we currently have well over 5000 installations, but it’s very hard to tell. For comparison, Webtrees has detailed statistics from their telemetry (which AFAIK is not even opt-out).
Apart from just being nice to know, a strong reason why knowing the rough number of active installations is important is map tiles. I want to keep having free map tiles as a default, but we don’t want to overload a free tile service with a huge number of users. (By the way, Grampshub uses the subscription fees to pay for tiles from MapTiler.)
Opt-out rather than opt-in is purely for statistical reasons: I assume only a small fraction of users would actively enable statistics (out of mere inertia), and the fraction of users that enable it would be impossible to determine, rendering the statistics useless.

The reason for starting this thread is to get an impression of whether users think this is reasonable.

Technically, it would mean that, roughly every 24 hours, a JSON of this form would be sent to a statistics server:

{
    "tree_uuid": "86f9fe91-3d5d-4ba8-acd2-d9ebaedf563e",
    "server_uuid": "dd5a8089-e3d4-4208-a75a-5a7e4afb39eb",
    "api_version": "3.1.0"
}

The UUIDs would be stored in a cache directory, which would make it possible, for example, to configure them to be reset every x days. Disabling telemetry completely would also be possible.
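
A minimal sketch of how such a payload could be built, assuming the UUIDs live in plain text files inside the cache directory. The function names, the file names and the `max_age_days` parameter are hypothetical and only illustrate the idea of random, resettable identifiers; they are not taken from the Gramps Web API code base.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path
from typing import Optional


def load_or_create_uuid(path: Path, max_age_days: Optional[float] = None) -> str:
    """Return the UUID stored at `path`, creating or rotating it if needed."""
    if path.exists():
        age_days = (time.time() - path.stat().st_mtime) / 86400
        # Keep the stored value unless a maximum age is configured and exceeded.
        if max_age_days is None or age_days < max_age_days:
            return path.read_text().strip()
    # uuid4 is random, so it is not derived from tree data or the installation.
    new_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_id)
    return new_id


def build_payload(
    cache_dir: Path, api_version: str, max_age_days: Optional[float] = None
) -> dict:
    """Assemble the three fields shown in the JSON example above."""
    return {
        "tree_uuid": load_or_create_uuid(cache_dir / "tree_uuid", max_age_days),
        "server_uuid": load_or_create_uuid(cache_dir / "server_uuid", max_age_days),
        "api_version": api_version,
    }


if __name__ == "__main__":
    # A temporary directory stands in for the real cache directory.
    cache_dir = Path(tempfile.mkdtemp())
    print(json.dumps(build_payload(cache_dir, "3.1.0"), indent=4))
```

Calling `build_payload` twice with the same cache directory returns the same UUIDs, while deleting the two files (or letting them exceed `max_age_days`) yields fresh ones.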

bgee · May 27, 2025, 11:38am

My thoughts:

It would be very interesting and useful to get some sort of idea how
Gramps is used. A bit of telemetry should not be a problem. I think
there are a few requirements:

1. Make the opt-out easy to find.
2. Run in the background. No impact to performance.
3. Fail silently.
4. Log the attempts.
5. It would be useful to include operating system.
6. Include whether running as a docker/container/snap/flatpak/appimage
   (etc) or as a native executable.
7. Perhaps have several levels of reporting, similar to what the KDE
   project has. Default would be the basics. The next level might include
   my items 5 and 6, and another level with additional data such as size of
   the active database.
8. If opt-out is chosen, then send one last item requesting database purge.
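
The tiered reporting in item 7 could look roughly like the sketch below. The level names, the `build_report` function and the field names `os`, `packaging` and `db_size_bytes` are all hypothetical; the sketch only assumes that each level adds fields on top of the previous one.

```python
import platform
from enum import IntEnum
from typing import Optional


class ReportLevel(IntEnum):
    OFF = 0          # opted out: nothing is sent
    BASIC = 1        # default: UUIDs and API version only
    ENVIRONMENT = 2  # adds operating system and packaging (items 5 and 6)
    DETAILED = 3     # adds the size of the active database


def build_report(
    level: ReportLevel, basic: dict, packaging: str, db_size_bytes: int
) -> Optional[dict]:
    """Return the report for the chosen level, or None if reporting is off."""
    if level == ReportLevel.OFF:
        return None
    report = dict(basic)  # copy so the caller's dict is not modified
    if level >= ReportLevel.ENVIRONMENT:
        report["os"] = platform.system()
        report["packaging"] = packaging
    if level >= ReportLevel.DETAILED:
        report["db_size_bytes"] = db_size_bytes
    return report
```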

codefarmer · May 27, 2025, 12:05pm

As a hobbyist, I have no concerns about sharing the data you mentioned. I imagine that more usage data may be collected in the future to learn about usage of the product (I understand the value of that), so it would be important to publish the data collection policy on a web site reassuring users of anonymity, privacy and security of the data. Also, do you know whether GDPR or other regulations apply to such data collection?

Finally, would it be better to send data only if one of the three collected values has changed?

DavidMStraub · May 27, 2025, 12:33pm

My naive understanding is that, as long as no personally identifiable information is collected, GDPR does not apply.

I don’t think sending data only when a value has changed makes sense. The UUIDs are supposed to be constant, so they will not change by definition. And the version might be constant for a long time if there is no new backend release. Since the point is to report active installations, I think it makes sense to do it at a fixed interval.

My idea to implement it in practice without having to set up a cronjob (or Celery beat) would be:

1. On every request, check in before_request when the last ping was sent. If that was less than 24 hours ago, or the disable flag is set → do nothing.
2. If it was more than 24 hours ago, dispatch a telemetry background task to Celery and continue with the request → the request duration is not increased noticeably, as the dispatch should only take milliseconds and does not have to wait for a remote server to respond.
3. Telemetry background task: POST the above JSON to the telemetry endpoint.
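
The per-request check described above can be sketched in plain Python. The class `TelemetryThrottle` and its parameters are hypothetical, and the `dispatch` callable is a stand-in for whatever would enqueue the background task (for example a Celery task's `delay`). The sketch also assumes a single process; with several workers the timestamp would have to live in a shared cache instead of in memory.

```python
import time
from typing import Callable, Optional

PING_INTERVAL_SECONDS = 24 * 60 * 60  # roughly every 24 hours


class TelemetryThrottle:
    """Decide on each request whether a telemetry ping is due."""

    def __init__(
        self,
        dispatch: Callable[[], None],
        disabled: bool = False,
        interval: float = PING_INTERVAL_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._dispatch = dispatch
        self._disabled = disabled
        self._interval = interval
        self._clock = clock
        self._last_ping: Optional[float] = None

    def maybe_ping(self) -> bool:
        """Dispatch a ping if one is due; return True if it was attempted."""
        if self._disabled:
            return False
        now = self._clock()
        if self._last_ping is not None and now - self._last_ping < self._interval:
            return False
        # Record the time before dispatching, so that a failing dispatch
        # is not retried on every subsequent request.
        self._last_ping = now
        try:
            self._dispatch()
        except Exception:
            pass  # telemetry must never break the actual request
        return True
```

In a Flask application, `maybe_ping()` would be called from a `before_request` hook. The hook itself should return `None`, since Flask treats any other return value as the response.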

ahs3 · May 27, 2025, 5:07pm

Any sort of telemetry tends to make me squeamish; as the saying goes,
“it’s not paranoia if they really are out to get you” :).

That being said, this info seems pretty innocuous and I would possibly
even opt-in. Before doing so, however, I think there would need to be
some things done:

1. Provide a very clear, easily found policy and description of
   exactly how this info is generated, collected, and used.
2. The opt-in/opt-out question must be clear and asked for upon
   installation, and on subsequent upgrades as a reminder that it is
   happening.
3. Once a day seems far too often to me. Once a week seems better,
   like Debian popcon, but that’s a personal opinion. It kind of depends
   on the next item, I think …
4. What does “active” mean in this case? If you’re just trying to
   count the number of installations, sending the JSON once after an
   installation is running (with approval) seems to be sufficient. If
   you’re trying to see how often that particular site is actually being
   used (logins, changes to data, queries to the db, and so on), then I
   don’t see how this will tell you that. Just as an example, my site
   can go for days, even weeks, without any activity as other research
   gets done (sometimes I even shut down the cloud instance). Is that
   “active” or not?

Just my $0.02 worth …

PLegoux · May 27, 2025, 5:11pm

Maybe GDPR applies: telemetry data plus the IP address the telemetry data is coming from.

DavidMStraub · May 27, 2025, 5:25pm

That’s why I think the IP must not be stored.

I think this doesn’t work, because (as can be seen on the forum) people often wipe their installs and start again - in fact being able to do that and being able to port the data is an important feature. If we only ping once after install, the number of installations would rise monotonically, but would be completely unrealistic. To get a realistic estimate of the instances actually running, we need some “keepalive” signal to also detect if instances disappear.

That being said, I am not too attached to the daily rhythm, weekly could indeed work as well.
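
To illustrate why a keepalive gives a realistic count, here is a minimal sketch of how the statistics server might tally instances. The functions `record_ping` and `count_active` and the in-memory dictionary are hypothetical. The sketch assumes the server keeps only the last time each tree UUID was seen and never stores the sender's IP address.

```python
from datetime import datetime, timedelta, timezone


def record_ping(last_seen: dict, payload: dict, now: datetime) -> None:
    """Remember when a tree was last seen; nothing about the sender is kept."""
    last_seen[payload["tree_uuid"]] = now


def count_active(last_seen: dict, now: datetime, window: timedelta) -> int:
    """Count trees that sent a ping within `window` before `now`."""
    return sum(1 for seen in last_seen.values() if now - seen <= window)


if __name__ == "__main__":
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    last_seen = {}
    # One instance pinged yesterday, another was wiped a month ago.
    record_ping(last_seen, {"tree_uuid": "a"}, now - timedelta(days=1))
    record_ping(last_seen, {"tree_uuid": "b"}, now - timedelta(days=30))
    print(len(last_seen))                                   # prints 2
    print(count_active(last_seen, now, timedelta(days=7)))  # prints 1
```

The total number of UUIDs ever seen only grows, whereas the windowed count drops once an instance stops pinging. The window should be somewhat longer than the ping interval, whether that ends up being daily or weekly.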

ahs3 · May 28, 2025, 4:55pm

Ah, okay. I see what you’re trying to capture now. That makes sense, then. “Active” means “still running” vs “I’ve just installed it recently.”

hdholm · June 3, 2025, 2:02am

I don’t have very strong feelings one way or the other, but I do think a number of good points have been raised. Having a potential opt-out via a flag in the configuration is good, but I really do like the ideas of @bgee of potentially collecting more information (OS, etc.) by opt-in, together with a clear description of what’s happening. So maybe another page under administration, or a part of the system information page, that includes the options (obviously including opt-out) and a clear description of the service and its default and current state.
