URL management strategy

I am facing an upsetting problem with “Permalinks”. I don’t keep screen copies of records I retrieve from digital public archive services (for the obvious reason of disk space). Instead, I save a Permalink in a Link-type Note.

However, while checking data I collected years ago, I discovered that permalinks are not as “permanent” as the theory claims. Several public archives renewed their hosting contracts, often with a change of subcontractor and a consequent change of software.

Since there is no “standard” for issuing permalinks, software vendors are free to define their own rules. Thus, although the primary data is still the same, the permalinks to access it are different. I must edit several thousand such permalinks. In some cases, the software assigned 3 different permalinks to the same source, depending on the semantics of the final record (this is the case when the source contains birth, marriage and death records in the same “book”).

Schematically, from my observations, there are four permalink “styles”:

  • Permalinks designate the source book and you must manually navigate inside it
  • Permalinks designate the source book and you can extend it with “parameters”, either as a URL query string or with deeper ark: components
  • Permalinks designate the page
  • There is no permalink per se and the link is in fact a query (probably the most stable permalink because query arguments are intimately related to the record)

The review process is extremely tedious, time-consuming and error-prone.

To simplify it and improve its reliability, I suggest providing a “hierarchical” link management, because permalinks, at least those based on the ark: specification, are themselves “hierarchical”.
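
To illustrate why ark: permalinks lend themselves to a hierarchy, here is a minimal Python sketch that splits one into its host, Name Assigning Authority Number (NAAN), object name and qualifiers. The URL in the example is made up, and the sketch only splits on "/", so dot-separated variant qualifiers stay attached to their component.

```python
from urllib.parse import urlsplit


def ark_components(url: str) -> dict:
    """Split an ark: permalink into host, NAAN, name, qualifiers and query."""
    parts = urlsplit(url)
    path = parts.path
    idx = path.find("ark:")
    if idx < 0:
        raise ValueError(f"not an ark: URL: {url}")
    # Both the classic 'ark:/NAAN/...' and the newer 'ark:NAAN/...' forms exist.
    ark = path[idx + len("ark:"):].lstrip("/")
    naan, _, rest = ark.partition("/")
    name, _, qualifier = rest.partition("/")
    return {
        "host": parts.netloc,
        "naan": naan,
        "name": name,
        # Everything below the object name, e.g. a volume, then a page.
        "qualifiers": qualifier.split("/") if qualifier else [],
        "query": parts.query,
    }
```

For `https://archives.example.org/ark:/12345/a011/v0003/p12?zoom=2` this yields host `archives.example.org`, NAAN `12345`, name `a011`, qualifiers `["v0003", "p12"]` and query `zoom=2`: each level could in principle live in a different record (Repository, Source, Citation).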

A Citation is a “pointer” into a Source which is stored in a Repository. It should then be possible to “chain” the URL designations to synthesize a final permalink.

The Repository record contains an array of URLs, presently with types WEB_HOME and WEB_SEARCH. For a reason which will become clear later, I suggest adding a new ARK_PREFIX type.

Usually, the host name in these URLs is the same, and it would be quite nice to have it recorded only once, but managing 2 or 3 independent URLs is not a big deal.

I attach a Link-type Note to a Source. This Note contains a link-formatted, user-readable description associated with the link. Several Link-Notes can be attached to a Source (because several permalinks can be assigned to it), or the Note can contain several lines, each with link information. The Note can also host comments. When there are several links, I prefix a “key” to the link, serving both as a user reminder and as a machine-readable identifier:

<key> <link_data> <optional_comment>
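
A minimal Python sketch of how such lines could be read back, assuming keys are made of letters, digits, "_" or "-" and that the key is optional on a line. Because a descriptor such as `{REPO ARK_PREFIX}` contains a space, a plain whitespace split is not enough; the hypothetical `parse_link_note()` treats a `{...}` group as part of the link data.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional

# One piece of link data: a {...} descriptor (which may contain a space,
# as in "{REPO WEB_HOME}"), a backslash escape, or any non-blank character.
LINK_DATA = r"(?:\{[^}]*\}|\\.|\S)+"
LINE = re.compile(
    rf"(?:(?P<key>[\w-]+)\s+)?(?P<link>{LINK_DATA})(?:\s+(?P<comment>.*))?"
)


class LinkEntry(NamedTuple):
    key: Optional[str]
    link: str
    comment: Optional[str]


def parse_link_note(text: str) -> list[LinkEntry]:
    """Parse '<key> <link_data> <optional_comment>' lines, skipping blanks."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINE.fullmatch(line)
        if m is None:  # defensive check
            raise ValueError(f"cannot parse link line: {raw!r}")
        entries.append(LinkEntry(m["key"], m["link"], m["comment"]))
    return entries
```

With this sketch, `birth https://ex.org/ark:/1/a/v1 baptism page` gives key `birth`, link `https://ex.org/ark:/1/a/v1` and comment `baptism page`, while a bare `https://ex.org/x` has no key (the word before `://` is not followed by a space, so it is not taken as a key).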

The Link-type Note attached to the Citation only contains “link-data”.

My idea is to add “descriptors” in the link to be replaced by the corresponding value in the parent chain.

In a Citation Note, {SRC <key>} would return the full URL corresponding to the optional key when the link is Control-clicked. If the key is not given, the first hit wins. If the key does not exist, an error is reported. The Source is determined from fields in the Citation record.

In a Source Note, {REPO <key>} would return the full URL corresponding to the optional key when the link is Control-clicked. The key is WEB_HOME, WEB_SEARCH or the new ARK_PREFIX. Again, the Repository is determined from fields in the Source record.
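
The chaining described above can be sketched in Python as follows. `RepoRecord`, `SourceRecord`, `CitationRecord`, `expand()` and `citation_url()` are hypothetical stand-ins, not Gramps classes or functions; the sketch ignores the escape mechanism and only shows the lookup rules: first hit wins without a key, unknown key is an error, and a Source link may itself contain `{REPO ...}`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# {SRC}, {SRC key}, {REPO} or {REPO key}; the key is optional.
DESCRIPTOR = re.compile(r"\{(SRC|REPO)(?:\s+([\w-]+))?\}")


@dataclass
class RepoRecord:
    # URL type -> URL, e.g. WEB_HOME, WEB_SEARCH and the proposed ARK_PREFIX.
    urls: dict[str, str]


@dataclass
class SourceRecord:
    repository: RepoRecord
    # Ordered (key, link_data) pairs read from the Source's Link-type Notes.
    links: list[tuple[str, str]]


@dataclass
class CitationRecord:
    source: SourceRecord
    link: str  # link_data from the Citation's Link-type Note


def lookup(entries: list[tuple[str, str]], key: str | None, owner: str) -> str:
    """Return the value for key; without a key, the first entry wins."""
    if key is None:
        if not entries:
            raise LookupError(f"{owner} has no link")
        return entries[0][1]
    for k, value in entries:
        if k == key:
            return value
    raise LookupError(f"{owner} has no link with key {key!r}")


def expand(text: str, source: SourceRecord | None = None,
           repository: RepoRecord | None = None) -> str:
    """Replace descriptors, walking up Citation -> Source -> Repository."""

    def substitute(match: re.Match) -> str:
        kind, key = match.group(1), match.group(2)
        if kind == "REPO":
            if repository is None:
                raise LookupError("{REPO} is only valid in a Source Note")
            return lookup(list(repository.urls.items()), key, "Repository")
        if source is None:
            raise LookupError("{SRC} is only valid in a Citation Note")
        # The Source's own link data may contain {REPO ...}: expand it too.
        raw = lookup(source.links, key, "Source")
        return expand(raw, repository=source.repository)

    return DESCRIPTOR.sub(substitute, text)


def citation_url(citation: CitationRecord) -> str:
    """Build the final URL, e.g. at the moment the link is Control-clicked."""
    return expand(citation.link, source=citation.source)
```

With a Repository whose ARK_PREFIX is `https://archives.example.org/ark:/12345` and a Source holding `death {REPO ARK_PREFIX}/a011/v0007`, a Citation link `{SRC death}/p12` expands to `https://archives.example.org/ark:/12345/a011/v0007/p12`. When the archive changes host, only `RepoRecord.urls` would need editing.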

Of course, an escape mechanism must be provided in case the real URL contains a left curly bracket. This can be done with the reverse solidus \, with the semantics “take the next character unconditionally”.
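
A minimal Python sketch of that escape rule, as a left-to-right scanner. The `resolve` callback, which receives the text between the braces (e.g. `REPO WEB_HOME`), is a hypothetical hook; in the proposal it would perform the Repository or Source lookup. Note that a literal backslash in a URL then has to be doubled as well.

```python
from typing import Callable


def expand_escaped(text: str, resolve: Callable[[str], str]) -> str:
    r"""Expand {...} descriptors with resolve(); a backslash makes the next
    character literal, so \{ gives a plain '{' and \\ a plain backslash."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Take the next character unconditionally.
            if i + 1 >= len(text):
                raise ValueError("dangling escape at end of link")
            out.append(text[i + 1])
            i += 2
        elif ch == "{":
            end = text.find("}", i + 1)
            if end < 0:
                raise ValueError(f"unterminated descriptor at position {i}")
            out.append(resolve(text[i + 1:end]))
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, with a resolver mapping `REPO WEB_HOME` to `https://archives.example.org`, the link text `{REPO WEB_HOME}/search?q=\{1\}` becomes `https://archives.example.org/search?q={1}`.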

What my specification does not cover is the case where a Source is available in several repositories, which I describe as a single Source record referencing several Repositories. If the software is not the same, we can end up with different permalinks and possibly different ways to designate the page or record.

I mentioned above that the substitution occurs only when the link is Control-clicked (or generated for NarrativeWeb). If the substitution were done when the Note is edited, the expanded URL would be stored and every Note would again carry its own copy of the host name. That would defeat the goal of easy maintenance, where the host name is written exclusively in Repository records, the entity ark: designation in Repository records, the source id only in Source Note records, and the arguments in Citation Note records. Thus only a limited number of Notes need to be modified when the archive services make a change.

Deferring replacement means that either get_path() or display_url() must be adapted.

Before I experiment with this idea, I’d like to receive comments on possible shortcomings, generalisations, improvements or misconceptions on my part.

This is why I’ll never store URLs in my data

In most cases I copy the transcribed source into a note. I’ll copy and store the URL, but it is only a helper to get back to the source. I’ll likely never use it, but I have it if necessary. Therefore this would be irrelevant to me.


Absolutely agree: URLs and “permalinks” are for short-term storage, and it is no problem if/when they disappear. Everything with an internet base is fluid and not suitable for long-term storage. I help out an admin on a Family History website; they have a suggested links page with about 900 links, and every time I check I find a failure rate of about 5%. So forget them and use an alternative storage system.
phil

Recently, the departmental archives, where I have many sources/citations, updated their sites, and as a result, hundreds of my permalinks were broken.

Faced with user protests, service providers are gradually trying to revive them. As a result, the URL syntax has become particularly long.

My strategy for all citations:

1 - Indicate the source reference code (call number) as precisely as possible

2 - A screenshot if the individual is a Sosa

3 - The permalink in a citation Note: while it works, the permalink provides quick access to the source.

Consultation is also easy in the Relationship/Combined view.

Furthermore, unlike an attribute, this link appears in the Individual Record report. Exporting to Geneanet causes no problems, except for excessively heavy records due to the repetition of these long URLs.

Before exporting to Geneanet, I replace all of these with a 🔗 glyph.
