Adding mimetypes to notes: few questions

Hi!

I’m back to working on adding mimetypes to notes.

I think the update script should add a text/html mimetype to notes of type “Html code”. What should it do with the others? Should it add a custom mimetype (text/x-gramps-styledtext, for instance) or leave the field empty? Or should it check whether a note has any formatting and add text/plain if none is found, or should we leave that decision to the GEDCOM export?
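The options above can be written as a single decision function. This is a minimal sketch: the function name, its arguments and the `has_styles` flag (meaning "the note has at least one formatting range") are hypothetical and not part of the Gramps API, and `text/x-gramps-styledtext` is only the custom type proposed above, not a registered mimetype.

```python
from typing import Optional

# Custom mimetype floated in the question above; not a registered type.
STYLED_TEXT_MIME = "text/x-gramps-styledtext"


def guess_note_mimetype(note_type: str, has_styles: bool,
                        use_custom_type: bool = True) -> Optional[str]:
    """Hypothetical upgrade helper: pick a mimetype for an existing note.

    Returns None when the field should be left empty.
    """
    if note_type == "Html code":
        # Raw HTML intended for Narrative Web inclusion.
        return "text/html"
    if not has_styles:
        # No formatting ranges at all, so the text is plain.
        return "text/plain"
    # Formatted note: tag it with the custom type, or leave it empty and
    # let exporters such as GEDCOM decide later.
    return STYLED_TEXT_MIME if use_custom_type else None


print(guess_note_mimetype("Html code", has_styles=False))  # text/html
print(guess_note_mimetype("General", has_styles=False))    # text/plain
print(guess_note_mimetype("General", has_styles=True))     # text/x-gramps-styledtext
```

The `use_custom_type` switch captures the open question: with it off, formatted notes keep an empty mimetype and the choice is deferred to the exporter.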

Would it make sense to add a text/csv mimetype for the DNA Segment Map Gramplet notes?

Totally unrelated: gramps_upgrade_21 keeps some blobs in the metadata table “if someone tries to downgrade the db”. Is there actually any downgrade procedure? Should I also add a downgrade function?

Notes are designed to be output in a variety of formats. Document generators process the notes and determine the mime type of the output. The notes themselves don’t really have a mime type.

In Gedcom 7.0 only two mime types are supported: text/plain and text/html.

Currently we only export plain text to Gedcom. Formatting information is lost. In the future we could write html.

Special note types such as HTML code or DNA segment maps should probably be exported as text/plain so that the structure is preserved.
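Why text/plain preserves that structure: in a GEDCOM 7 file each line of the note becomes its own CONT line, so a CSV table or a block of HTML code survives line by line. A minimal sketch, assuming that GEDCOM 7.0 uses only CONT (no CONC) and that a missing MIME substructure means text/plain; the function name is hypothetical and the spec's other escaping rules are not handled.

```python
GEDCOM7_NOTE_MIMES = ("text/plain", "text/html")


def gedcom7_note_lines(text: str, mime: str = "text/plain",
                       level: int = 1) -> list[str]:
    """Hypothetical exporter helper: serialize a note as a GEDCOM 7 NOTE."""
    if mime not in GEDCOM7_NOTE_MIMES:
        raise ValueError(f"GEDCOM 7.0 notes cannot use {mime}")
    first, *rest = text.replace("\r\n", "\n").split("\n")
    lines = [f"{level} NOTE {first}" if first else f"{level} NOTE"]
    for part in rest:
        # Every further line of the note becomes a CONT one level deeper;
        # an empty line becomes a bare CONT.
        lines.append(f"{level + 1} CONT {part}" if part else f"{level + 1} CONT")
    if mime != "text/plain":
        # text/plain is assumed to be the default, so MIME is only written for HTML.
        lines.append(f"{level + 1} MIME {mime}")
    return lines


for line in gedcom7_note_lines("Chr,Start,End\n1,100,200"):
    print(line)
# 1 NOTE Chr,Start,End
# 2 CONT 1,100,200
```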

Gramps has HTML markup notes for Narrative Web page inclusions.

And the native styling of Notes uses out-of-band (positional) markup when writing Notes to XML.
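A minimal sketch of what out-of-band (positional) markup means: the text is stored unformatted and each style is a separate (tag, start, end) range, so the same note can be rendered as plain text or as HTML. The class is illustrative, not Gramps' own StyledText; it uses HTML tag names directly as style names and assumes ranges nest properly (no partial overlaps).

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class StyledNote:
    """Illustrative note: plain text plus out-of-band (tag, start, end) ranges."""
    text: str
    ranges: list[tuple[str, int, int]] = field(default_factory=list)

    def to_plain(self) -> str:
        # Dropping the ranges loses the formatting but not the text.
        return self.text

    def to_html(self) -> str:
        """Insert tags at range boundaries; assumes ranges nest properly."""
        # Keep each range's index so identical ranges still nest correctly.
        spans = [(i, tag, start, end)
                 for i, (tag, start, end) in enumerate(self.ranges) if start < end]
        out = []
        for pos in range(len(self.text) + 1):
            # Close ranges ending here: innermost (latest start) first.
            ending = sorted((s for s in spans if s[3] == pos),
                            key=lambda s: (-s[2], -s[0]))
            out.extend(f"</{s[1]}>" for s in ending)
            # Open ranges starting here: outermost (latest end) first.
            starting = sorted((s for s in spans if s[2] == pos),
                              key=lambda s: (-s[3], s[0]))
            out.extend(f"<{s[1]}>" for s in starting)
            if pos < len(self.text):
                out.append(escape(self.text[pos], quote=False))
        return "".join(out)


note = StyledNote("bold italic", [("b", 0, 11), ("i", 5, 11)])
print(note.to_plain())  # bold italic
print(note.to_html())   # <b>bold <i>italic</i></b>
```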

Can we assume that, sooner or later, Notes will include some other formats too? We already use GitHub Markdown for README.md files, MediaWiki markup, Discourse styling (a mix of Markdown, BBCode and HTML) and Sphinx (reStructuredText, or the MyST flavor of Markdown). These seem to be the most likely candidates.

So should note mimetypes anticipate this?

(And there seems to be a SimpleDoc format too? Is that what is used in Gramplets like the “Welcome to Gramps” plugin?)

| Language/System | Type | Syntax Style | Typical Use Case | Example: Bold Text | Notes |
|---|---|---|---|---|---|
| HTML | Markup | Tag-based | Web pages, emails | `<strong>text</strong>` | Very expressive, but verbose |
| MediaWiki Markup | Markup | Wiki-specific | Wikis (e.g., Wikipedia) | `'''text'''` | Some HTML allowed, unique wiki syntax |
| GitHub Markdown (.md) | Markdown | Lightweight, minimal | README files, docs, comments | `**text**` or `__text__` | Converts to HTML, easy to read/write |
| Discourse Styling | Markdown, BBCode, limited HTML | Mixed (Markdown, BBCode, HTML) | Forum posts, discussions | `**text**` (Markdown), `[b]text[/b]` (BBCode), `<strong>text</strong>` (HTML) | Supports Markdown, BBCode, and safe HTML |
| Sphinx MyST Markdown | Markdown | CommonMark + Sphinx extensions | Technical docs, Sphinx projects | `**bold**` | Extends standard Markdown with roles, directives, and cross-references for Sphinx |

Yes. Document generators are plugins, so adding a new format involves writing a new plugin. This could be in the form of a third-party addon.

That is not necessary. Only the document generator needs to know about the mime type, not the note.
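The division of work described here can be sketched as follows: the note supplies only text and formatting ranges, and each generator chooses its own syntax and declares its own output mimetype. The class names and the `mime_type` attribute are hypothetical and do not mirror the actual Gramps docgen plugin interface; bold is the only style handled, and Markdown escaping is ignored for brevity.

```python
import html
from abc import ABC, abstractmethod


class DocGenerator(ABC):
    """Hypothetical output plugin: it, not the note, knows the mimetype."""
    mime_type = "text/plain"

    def escape(self, text: str) -> str:
        return text

    @abstractmethod
    def bold(self, text: str) -> str: ...

    def render(self, text: str, bold_ranges: list[tuple[int, int]]) -> str:
        """Wrap each non-overlapping (start, end) range of text in bold."""
        out, pos = [], 0
        for start, end in sorted(bold_ranges):
            out.append(self.escape(text[pos:start]))
            out.append(self.bold(self.escape(text[start:end])))
            pos = end
        out.append(self.escape(text[pos:]))
        return "".join(out)


class HtmlGenerator(DocGenerator):
    mime_type = "text/html"

    def escape(self, text: str) -> str:
        return html.escape(text, quote=False)

    def bold(self, text: str) -> str:
        return f"<strong>{text}</strong>"


class MarkdownGenerator(DocGenerator):
    mime_type = "text/markdown"

    def bold(self, text: str) -> str:
        return f"**{text}**"


for gen in (HtmlGenerator(), MarkdownGenerator()):
    print(gen.mime_type, gen.render("see this note", [(4, 8)]))
# text/html see <strong>this</strong> note
# text/markdown see **this** note
```

Supporting another format means adding one more subclass; the stored note does not change.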


I recall that the CSV data for the DNA Segment Map Gramplet is stored using “DNA” type Association Notes. Doesn’t it seem like Media Files are a better place to link CSV data?

There are other addons that can leverage user-customized CSV files (such as the new Historical Context and WebSearch addons). Once they are linked as Media Objects, the OS’s application for viewing CSV files is used to open them instead of the Note Editor. (Although Gramps only supports opening with the default OS application. An “Open With…” context menu would be very welcome.)

For DNA, there are several “standard” data exchange formats:

| Format | Type | Syntax Style | Typical Use Case | Notes |
|---|---|---|---|---|
| VCF | Markup | Tab-delimited | Variant data exchange, analysis pipelines | Industry standard; supports SNPs and other variants |
| PML (OMG) | Markup | XML-based | Interoperability, database exchange | Standardized by OMG; platform-independent |
| Tab-delimited | Plain text | Tab-separated | Simple data exchange, spreadsheets | Supported by many tools; less standardized than VCF |
| XML | Markup | XML | Interoperability, programmatic exchange | Flexible; used in tools like SNPper |

Not sure I am following the top-level issue here. The data used by the DNA Segment Map gramplet can be either CSV or TSV. It is generated by copying and pasting from the various external apps that provide the segment info (FamilyTreeDNA, GEDmatch, MyHeritage, LivingDNA, …).

The goal was to make importing from these apps easy. Editing, if needed, can currently be done in the Note itself. Since the format provided by these apps sometimes changes and is language-dependent (decimal separator, thousands separator), some editing may be required. There is no markup in these notes.
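As an illustration of that editing step, the locale differences could also be normalized in code. This is a minimal sketch, not how the gramplet actually parses its notes: the column names `Chr`, `Start`, `End` and `cM`, and the rule "a tab in the header means TSV, otherwise CSV", are assumptions, and the decimal and thousands separators must be supplied by the caller.

```python
import csv


def to_number(text: str, decimal: str = ".", thousands: str = ",") -> float:
    """Parse a locale-formatted number such as '1.234.567' or '12,5'."""
    if thousands:
        text = text.replace(thousands, "")
    return float(text.replace(decimal, ".").strip())


def parse_segments(raw: str, decimal: str = ".", thousands: str = ",",
                   numeric_columns=("Start", "End", "cM")) -> list[dict]:
    """Hypothetical parser for CSV or TSV segment data pasted into a note."""
    lines = [line for line in raw.splitlines() if line.strip()]
    # Assumption: a tab in the header line means TSV, otherwise CSV.
    delimiter = "\t" if "\t" in lines[0] else ","
    rows = []
    for record in csv.DictReader(lines, delimiter=delimiter):
        for column in numeric_columns:
            if record.get(column):
                record[column] = to_number(record[column], decimal, thousands)
        rows.append(record)
    return rows


# European-style export: tab-separated, '.' for thousands, ',' for decimals.
tsv = "Chr\tStart\tEnd\tcM\n1\t1.234.567\t2.345.678\t12,5\n"
print(parse_segments(tsv, decimal=",", thousands="."))
```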

I don’t think switching from an AssociationNote to a Media File has any advantage, and it does have disadvantages:

  1. It requires an external editor.
  2. The Person Ref editor would need a Gallery tab.

Yes. The DNA Segment Map gramplet uses notes as a convenient place to store data. These notes are not really intended to be read directly by the user.

I believe this is working toward supporting a GEDCOM 7 feature.

See

  • 0013176: [GEDCOM 7] Support Mime-types for notes
  • 0012226: [GEDCOM 7] Support Import & Export of New (June 2021) version
  • Pull Request #2047: Add mimetypes to notes by olivierberten

GEDCOM 7.0 modernizes media handling throughout the specification, including requiring valid MIME types for multimedia objects. However, for notes specifically, GEDCOM 7.0 does not require a MIME type tag for plain text or markdown notes, as the format is implied by context (plain text or markdown). The introduction of markdown as an allowed format is the main change relevant to the “type” of note content, but not as a formal MIME type tag.

As I said in my previous post, Gedcom 7.0 only supports two mime types: text/plain and text/html. The minimum requirement for text/html is that we recognise the p, br, b, i, u, s, sup and sub tags, together with the &amp;, &lt;, &gt;, &quot; and &apos; entities. Since superscript and subscript were added to the editor, this is no longer a problem.

All that is needed is a simple HTML parser to read Gedcom files containing notes with text/html content.
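Such a parser can be built on Python's standard `html.parser` module, which also decodes the five required entities when `convert_charrefs=True`. The sketch below turns a text/html note into plain text plus (tag, start, end) formatting ranges. The class and function names are hypothetical; other tags are dropped while their text is kept, whitespace is not collapsed as a browser would, and separating paragraphs with a blank line is an assumption.

```python
from html.parser import HTMLParser

# Inline tags that readers of GEDCOM 7 text/html notes must recognise.
STYLE_TAGS = {"b", "i", "u", "s", "sup", "sub"}


class GedcomHtmlNoteParser(HTMLParser):
    """Hypothetical reader: text/html note -> plain text + style ranges."""

    def __init__(self):
        # convert_charrefs=True decodes &amp; &lt; &gt; &quot; &apos; for us.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.length = 0
        self.open_tags: dict[str, list[int]] = {}
        self.ranges: list[tuple[str, int, int]] = []

    def _append(self, text: str) -> None:
        self.parts.append(text)
        self.length += len(text)

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self._append("\n")
        elif tag == "p" and self.length:
            self._append("\n\n")  # assumption: blank line between paragraphs
        elif tag in STYLE_TAGS:
            self.open_tags.setdefault(tag, []).append(self.length)

    def handle_endtag(self, tag):
        if tag in STYLE_TAGS and self.open_tags.get(tag):
            start = self.open_tags[tag].pop()
            if start < self.length:
                self.ranges.append((tag, start, self.length))

    def handle_data(self, data):
        self._append(data)


def parse_html_note(markup: str) -> tuple[str, list[tuple[str, int, int]]]:
    parser = GedcomHtmlNoteParser()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    ranges = sorted(parser.ranges, key=lambda r: (r[1], -r[2]))
    return "".join(parser.parts), ranges


print(parse_html_note("<p>H<sub>2</sub>O &amp; <b>CO<sub>2</sub></b></p>"))
# ('H2O & CO2', [('sub', 1, 2), ('b', 6, 9), ('sub', 8, 9)])
```

The resulting ranges map naturally onto positional styling, so the importer would only need to translate tag names into the editor's own style types.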