What samples of statistical analysis does Gramps have?

A user on Reddit r/gramps wants to explore creating an occupational statistics bar chart diagram.

What sample code can a prospective developer review? Here’s a partial list of Gramps features that provide comparative statistical information:

  1. Top Surnames Gramplet: Text list of the 10 most frequent surnames, with percentage and count
  2. Surname Cloud gramplet: Surname font size scaled by frequency
  3. Age Stats Gramplet: Distribution bar chart of lifespans in the family tree.
  4. Pedigree Gramplet: Includes a completeness percentage for each generation at the bottom
  5. DNA Segment Map Gramplet: Chromosome map comparing DNA samples
  6. DNA Matches Gramplet: Compares shared cM, segment count, and segment size
  7. Statistics Chart (Graphical Report): Attempts to display comparative bar charts with selectable criteria
  8. Database Summary report: Text list displays the overall statistics concerning number of individuals of each gender, various incomplete entries statistics, as well as family and media statistics
  9. Compare Individual Events Tool: Allows analysis and comparison of events across the database
  10. Number of Ancestors quickview: count of ancestors per generation and completeness for a selected person
  11. Number of ancestors report: count of ancestors per generation and completeness for a selected person
  12. Ancestor Fill report:
  13. Descendant Count gramplet/quickview: count of all descendants for each person in the tree
  14. Relationship Calculator: Provides statistical analysis of family connections, adaptable to language-specific terminologies
  15. Research and Analysis Tools: Include features like duplicate people finder and interactive descendant browser, which can provide comparative data
  16. Heatmap web report: filtered Event frequency on a geographical heatmap
  17. Verify the Data: Identifies and reports on data inconsistencies, providing a statistical overview of database quality
  18. Information Map: spidered connections GraphViz graph of tree object connections

Example target chart:

  • Purpose: To visually compare the prevalence of various occupations across different time periods.
  • Chart Title: Occupations Over Time
  • X-Axis: Timeline spanning centuries at quarter-century increments.
  • Y-Axis: Quantity of occurrences for each occupation.
  • Bars: Each bar represents a specific occupation, with height proportional to its occurrences. Bars are grouped by time period and colored differently to distinguish between occupations.
  • Legend: A key is included to map each color to its corresponding occupation.

The problem isn’t the graphic; it is normalizing the occupation names. In my tree there are, for example: farmer, Bauer, Bäuerin, Landwirt, … They are all variants of farmer.

And I have organ builders, organ builder master, organ pipe makers,…

And I have “teachers wife”. Or “retired XYZ”. Or…

So it is necessary to exclude some of them, like “teachers wife”, to translate the rest to a unique occupation code (there are several standards for that), and to group the codes hierarchically before starting to visualize them.
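Those three steps could be sketched roughly as follows. The alias table, the codes (`AGRI.FARMER`, `CRAFT.ORGAN`), and the exclusion and prefix patterns are all invented for illustration; a real project would substitute one of the published occupation classification standards. Stripping “retired” rather than excluding such entries is also just an assumption.

```python
import re

# Hypothetical alias table: lowercase raw title -> made-up hierarchical code.
ALIASES = {
    "farmer": "AGRI.FARMER",
    "bauer": "AGRI.FARMER",
    "bäuerin": "AGRI.FARMER",
    "landwirt": "AGRI.FARMER",
    "organ builder": "CRAFT.ORGAN",
    "organ builder master": "CRAFT.ORGAN",
    "organ pipe maker": "CRAFT.ORGAN",
}

# Titles that describe a relative's occupation rather than the person's own.
EXCLUDE = re.compile(r"\b(wife|widow|son|daughter)\b")

# Status prefixes that are stripped before the lookup (an assumption).
PREFIX = re.compile(r"^(retired|former)\s+")


def normalize(raw):
    """Return the code for a raw title, or None if excluded or unknown."""
    title = raw.strip().lower()
    if EXCLUDE.search(title):
        return None
    title = PREFIX.sub("", title)
    return ALIASES.get(title)


def top_level(code):
    """Return the top level of a dotted code, e.g. 'AGRI' for 'AGRI.FARMER'."""
    return code.split(".", 1)[0]


for raw in ["Bäuerin", "retired Landwirt", "teachers wife", "organ builder master"]:
    code = normalize(raw)
    print(raw, "->", code, "/", top_level(code) if code else None)
```

Titles that are missing from the table also come back as `None`, so in practice it would be worth logging them separately to find aliases that still need to be added.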

A form of hierarchical dictionary of aliases for occupations, types, and other lists used…?

A colleague of mine is working on that topic. Dr. Katrin Moeller has contributed significantly to the standardization of job titles through her research. You can find some papers in her publication list: Publikationen - Dr. Katrin Moeller

Here are some of her pertinent publications on this subject:

  1. “Standards für die Geschichtswissenschaft! Zu differenzierten Funktionen von Normdaten, Standards und Klassifikationen für die Geisteswissenschaften am Beispiel von Berufsklassifikationen”

In this paper, Moeller discusses the importance of authority data and standards in the humanities, particularly concerning job classifications.

  2. “Automatisierte Identifikation und Lemmatisierung historischer Berufsbezeichnungen in deutschsprachigen Datenbeständen” (co-authored with Jan Michael Goldberg)

This article addresses the development of an algorithm for the automated lemmatization of historical job titles, aiming to reduce manual effort in data cleansing.

  3. “Die Ontologie historischer deutschsprachiger Berufs- und Amtsbezeichnungen. Interoperationalität und Berufsklassifizierung durch semantisches Topic Modeling” (co-authored with Robert Nasarek)

This work presents the development of an ontology for standardizing and classifying historical German-language job and official titles.

I have also used Supertool to generate some sample charts, see

These include an “occupational statistics” chart similar to what the user requested.

Those charts are very elegant and would make great additions to reports.

Have you given thought to the process of making a report (or book of reports) from such a script?