In genealogy, a rose by any other name may not smell sweet. A feud broils over what is acceptable, when it comes to naming conventions. Do you use question marks for unknown portions of a name? Do you write helpful information in the suffix field? Congratulations, we’ll call you a Montague! Do you get annoyed when you see people doing the above, fearing trashy data transfers — a messed-up GEDCOM? You, friend, we’ll call a Capulet. In determining how to use the name fields in our software, we find ourselves having to choose the house of Montague or Capulet — expedient practicality or clean data sharing. Some want both, and we call ourselves GEGs. Starry-eyed GEG I may be, but with the right tools and rules, I think Romeo and Juliet can have a future together.
Get comfortable, friends. This subject can’t be swallowed in a quick gulp….
A plague on both your houses
Montagues want to be efficient and effective in building their family trees — and use names as a tool to that end. Capulets are those who want us to follow a style that ensures our trees can be pulled into other people’s databases with minimal errors. The Montagues are fast-moving and effective; the Capulets are stable and see the long-term. But the two ideas have long been at war. Both require us to sacrifice something of value.
Capulets: The name field must be cleanly transferable, above all
The Capulets place a high value on the collaborative nature of genealogy. They share information with one another. They upload it into online trees and vitally important databases. They export data for publication.
Capulets want us to follow rules because:
- They’ve seen the horrors of GEDCOMs imported from “creative” trees, with thousands of names having to be corrected one by one.
- They want us to be able to use tools like Ancestry.com’s “Lifestory” or Legacy’s “Descendants Report,” without having trash populating the prefabricated sentences.
- They love structure and rules, and don’t mind taking their time to do things “by the book.”
Capulets say we’re never to use a question mark, other symbols, or a derivative of “Unknown” in the name fields in our software. Never to use multiple terms in a last name. Never to put anything but the pieces of the name we have proven, and then only the name given at birth. If it doesn’t fit the rule, hide it in the notes. Period.
To a Capulet, the Montagues are unschooled in proper technique and an annoying impediment to the ideal genealogical environment.
Montagues: The name field must be useful, above all
Montagues say that the rules create more problems than they solve, and make simple tasks far too difficult. They want to be able to distinguish one ancestor from another quickly and easily. To a Montague, an ancestor’s name is the key identifier in all our genealogical software usage. It is what they use to find someone in indexes, searches, or drop-down lists. Everything they do ties to a name.
With names as their key path to finding people, they need to use the name fields in whatever way will expedite that end. Here are some reasons they use creative license in naming:
- A researcher often knows only a single name or even a nickname long before they can get to the official birth name of an ancestor.
- They often know a woman only by her married name, but our software tends to put that name out of sight when needed most.
- Many genealogy software packages and online trees blend all the parts of a name together into a single name field, when displaying it. They do not parse out first, middle, last, suffix, and so forth, making it impossible to tell a first name from a last when the name displayed is a single term, for example, Reid or Anderson.
- In censuses we often know there were others in a man’s household, but have no names — only tick marks that give an age range, gender, and ethnicity. How are they to be reflected with Capulet naming rules?
- Millions of ancestors had their names stripped to seal their enslavement. The researcher might only know the person as “Male mulatto born c. 1820.” How would they then account for these until they have proper names, if they follow Capulet rules?
Montagues want to work rapidly and efficiently. Rules that impede this goal are rules to be broken. They’d rather clean up a GEDCOM someday than slow their progress in their genealogical work day after day.
To a Montague, Capulets are mired in old-school rules that hinder effective research and are slowing genealogy to a crawl.
GEGs: The name field must be useful and cleanly transferable
We GEGs are eager to be the best genealogists we can be. Choosing Montague or Capulet forces a compromise we are unwilling to make. Our nature, then, compels us to seek techniques that draw the best of both houses and eliminate the limitations.
When it comes to handling names in genealogy, a GEG:
- Seeks the full birth name of the ancestor as the long-term goal, applying reasonable and thorough effort in that pursuit, but refuses to be limited unnecessarily by its absence in the early stages of research.
- Recognizes the vital importance of rapidly distinguishing one ancestor from another in the trees and lists displayed in our software and online services and uses the name field, as necessary, to make those distinctions.
- Takes a smart approach to unconventional naming techniques, with the goal of being able to remove unacceptable name data quickly and easily before transferring information to others.
- Keeps software up-to-date to prevent the name limitations inherent in older technology.
- Creates new and better rules to ensure consistency, even as old rules are circumvented.
- Thoroughly tests new naming techniques and their transferability via GEDCOM before making widespread changes in trees.
The solutions I am proposing below work for me, with my particular set of tools and settings. Only you can determine what works for you. Test on a “play tree” or a copy of your tree before you implement any wide-scale changes to your data.
The Golden Key: the power of wildcard search-and-replace tools
Ultimately, the key to being simultaneously a fast-moving Montague and a collaboration-ready Capulet lies in the use of search-and-replace tools, usually with wildcard capabilities. A wildcard lets you find all occurrences of a string of characters with some of the characters unspecified. Smartly designed notations let you select and delete, with minimal effort, the text you have determined to be removable.
The problems and potential solutions described below assume you do have robust search-and-replace capabilities — either within your genealogical software or in a document tool that can manipulate your GEDCOM file after it is exported. Search-and-replace tools do a scatter-shot surgery on a large amount of data, so it is vital that you back up your data before you use such a tool. Let me put this in big bold red letters:
BACK UP YOUR TREE OR GEDCOM BEFORE DOING ANY SEARCH-AND-REPLACE OPERATIONS!!!
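If your clean-up happens on an exported GEDCOM file, the backup can be as simple as a timestamped copy sitting next to the original. Here is a minimal sketch in Python; the function name back_up and the file-naming pattern are my own assumptions, not a feature of any genealogy package.

```python
import shutil
import time
from pathlib import Path


def back_up(path):
    """Copy a tree or GEDCOM file next to the original, with a timestamp in the name."""
    source = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming pattern: family.ged -> family.backup-20240101-120000.ged
    target = source.with_name(f"{source.stem}.backup-{stamp}{source.suffix}")
    shutil.copy2(source, target)  # copy2 also preserves the file's timestamps
    return target
```

Call back_up("family.ged") (a made-up file name) before each search-and-replace session, and you always have the untouched file to return to.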
GEG-style creative naming conventions
The problems and solutions described later in this post all depend on a smart approach to choosing how you get creative with the name fields in your software. You want to begin with the end in mind — the end being the ability to share your data with others without creating a mess in the transfer. When you run a search-and-replace operation on your data, the tool needs to be able to tell exactly which text needs to be removed from the copy of your tree data or from the GEDCOM after you’ve exported it. You want to make sure your creative name notations have something unique about them that prevents you from deleting something you should have kept, when you attempt to delete the rule-breaking notations.
There are two different types of search/replace operations:
Static: You have used a standard notation that means the same thing on every record on which it appears. For example, you use “-?-” to represent a missing name. Most software will easily do a search/replace on a term like this, replacing it with nothing, as a means to remove it from your record.
Complex: You have put meaningful text about the specific record within some piece of the name field. These terms require wildcard capabilities to find and remove them. To flag these complex notations for the search/replace tool, you encase every such piece of text in a consistent manner, like “(wo: John Doe)” on one record and “(wo: Tom)” on another. [See explanation of “wo:” below.] Using wildcards, you can tell your tool to remove every phrase beginning “(wo:” and ending with the next “)”.
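For readers comfortable with a little scripting, the two kinds of operations can be sketched in Python. This is only an illustration of the idea, assuming the “-?-” and “(wo: …)” notations described above; the function names are mine, and your own software's wildcard syntax will look different.

```python
import re

STATIC_MARK = "-?-"

# Wildcard pattern: "(wo:" followed by anything up to the next ")".
# [^)]* stops the match at the first closing parenthesis.
# The leading \s* also removes the space in front of the note.
WO_NOTE = re.compile(r"\s*\(wo:[^)]*\)")


def remove_static(text):
    """Static operation: the same notation is removed wherever it appears."""
    without_mark = text.replace(STATIC_MARK, "")
    return " ".join(without_mark.split())  # tidy leftover spaces


def remove_wo_notes(text):
    """Complex operation: remove every phrase from "(wo:" to the next ")"."""
    return WO_NOTE.sub("", text)


print(remove_wo_notes("Elizabeth (wo: John Doe)"))  # Elizabeth
print(remove_wo_notes("Mary T. (wo: Tom)"))         # Mary T.
print(remove_static("Reid -?-"))                    # Reid
```

If a note itself contains a placeholder, as in “(wo: Ransom -?-)”, the wildcard pass removes the whole note in one go, so it makes sense to run the complex pass before the static one.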
Do you have full wildcard search-and-replace capabilities?
Search the help file of your genealogy software for the term “wildcard” to see if it allows you to search and replace text with wildcards. Some allow you to search with wildcards, but not to replace. I’ve checked the Big Three:
- If you use Family Tree Maker, you have built-in wildcard capabilities, and all the examples below can be handled from within the software, always working on a copy of your main data — unless you intend to be forever rid of the notations for your own use.
- Legacy Family Tree software has wildcard capabilities for searching, not for replacing. You can easily use it to remove static terms. Or you can let its wildcard search tool take you to each occurrence of complex text for you to clean up manually, one by one, which is sometimes a desirable method. If you want an automated tool to replace complex text, you’ll need to do that clean-up from your GEDCOM after export, using a tool like Microsoft Word (desktop edition only) if you have it, or a free word processor like WPS Office Free 2016.
- RootsMagic does not offer wildcard searching at all — a limitation I hope it is working on. Its basic search-and-replace tool can handle static situations, but the more complex clean-up of text will require you to first export to GEDCOM and to clean the data, as described for Legacy.
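If you clean up after export, a word processor is not the only option. The sketch below shows one way a short Python script might strip the notations from the text of an exported GEDCOM while touching only the name lines. It assumes the standard name tags (NAME, GIVN, SURN, NSFX), my “-?-” placeholder, and notes written as “(abbreviation: text)”. The function names are hypothetical, and how you read and write the file, including its character encoding, is left to you.

```python
import re

NAME_TAGS = {"NAME", "GIVN", "SURN", "NSFX"}

# Any parenthesized note that starts with a lowercase abbreviation and a colon,
# e.g. "(wo: John Stone)" or "(aka: Bess)".
NOTE = re.compile(r"\s*\([a-z]+:[^)]*\)")
PLACEHOLDER = "-?-"


def clean_line(line):
    """Strip notations from one GEDCOM line, but only if it is a name line."""
    parts = line.split(" ", 2)  # level, tag, value
    if len(parts) < 3 or parts[1] not in NAME_TAGS:
        return line  # notes, sources, URLs and record headers are left alone
    level, tag, value = parts
    value = NOTE.sub("", value).replace(PLACEHOLDER, "")
    value = " ".join(value.split())  # collapse leftover spaces
    return f"{level} {tag} {value}".rstrip()


def clean_gedcom(text):
    """Apply clean_line to every line of the exported GEDCOM text."""
    return "\n".join(clean_line(line) for line in text.splitlines())


print(clean_line("1 NAME Elizabeth /-?-/ (wo: John Stone)"))  # 1 NAME Elizabeth //
```

Because only name lines are touched, a question mark or a parenthesis in a note or source citation survives. A sub-line such as “2 NSFX (wo: John Stone)” ends up with an empty value; whether your destination software minds that is something to test on a copy first.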
One irrefutable rule for all: never use a slash (/) in a surname field
Before you start getting creative with your software naming conventions, let me warn you about one symbol that will eventually cause you trouble. The slash (/) is the standard symbol used by the current GEDCOM format (5.5) to separate surnames from given names. I’ve learned by bitter experience that a slash typed into a surname field will not be exported properly to a GEDCOM.
If you open a GEDCOM file in a text file reader (Notepad, Wordpad, MS Word, etc.), you’ll find a line holding the given name(s) and surname for each ancestral record. The line begins “1 NAME ” followed by the name terms. GEDCOM encloses the surname in slashes, like “/Avery/” in this example:
1 NAME Henry William /Avery/
Anything that precedes the surname will become the given name(s) when this GEDCOM file is imported into software.
If you have used a slash anywhere in the surname before you exported the GEDCOM file, this name line will now have three or more slashes, instead of two. GEDCOM will interpret anything between the final two slashes as the surname.
Let’s say, for example, you want to show various spellings you’ve found for Henry William Avery’s surname. In the surname field (if your software offers such a thing), you type, “Avery/Avary/Avrey.” The GEDCOM will choose the final piece — “Avrey” — as the surname and deem all else a given name. So if you export your data via GEDCOM and then pull it into a different software program, the name will appear:
Given Name: Henry William Avery/Avary
Surname: Avrey
And your post-GEDCOM data cleanup will be multiplied for any place you put a slash in a surname. So I implore you, don’t do it.
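To see why the extra slashes cause trouble, here is a rough Python sketch of the rule as I described it: whatever sits between the final two slashes becomes the surname. It is only my approximation of what an importing program does, not the code of any real product, and the function name is made up. Real importers may tidy the leftover slashes differently.

```python
def split_name_value(value):
    """Split the text after "1 NAME " into (given, surname, suffix).

    Assumption: the surname is whatever lies between the final two slashes.
    """
    last = value.rfind("/")
    first = value.rfind("/", 0, last) if last != -1 else -1
    if first == -1:
        return value.strip(), "", ""  # fewer than two slashes: no surname found
    given = value[:first].strip()
    surname = value[first + 1:last]
    suffix = value[last + 1:].strip()
    return given, surname, suffix


def has_extra_slashes(value):
    """A well-formed name value has exactly two slashes (or none)."""
    return value.count("/") > 2


print(split_name_value("Henry William /Avery/"))
# ('Henry William', 'Avery', '')
print(split_name_value("Henry William /Avery/Avary/Avrey/"))
# ('Henry William /Avery/Avary', 'Avrey', '')
```

In the second case the other spellings land in the given name, leftover slash and all, which is exactly the kind of mess you would then have to fix by hand. Running has_extra_slashes over the name lines of a test export is a quick way to find the records that will misbehave.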
Other symbols you should avoid… @, brackets, and commas
Your GEDCOM files contain the “at” sign (@) many times throughout. I haven’t confirmed that the @ in a name field will create problems, but I think you’re taking a risk to use a symbol so significant in the coding of GEDCOMs.
I love brackets, but Ancestry.com won’t allow them in the surname. Even if you are not an Ancestry user and never plan to be, you’re creating a problem for Ancestry users who share your data. Also, since brackets might be used in the wildcard search-and-replace to enclose terms, this is a risky symbol. I would avoid it in any name fields.
Commas tend to be used in views that give last name first, so the use of a comma will create ugly data and possibly create other problems with the software.
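If you would like a quick way to check a name for the symbols above before it goes into your tree, something like this Python sketch would do. The list of symbols comes from this post; treating “brackets” as square brackets is my assumption, and the function name is hypothetical.

```python
# Symbols this post recommends keeping out of name fields.
# Assumption: "brackets" means square brackets.
RISKY = {
    "/": "slash",
    "@": "at sign",
    "[": "bracket",
    "]": "bracket",
    ",": "comma",
}


def risky_symbols(name_part):
    """Return the kinds of risky symbol found in a name field, in order of appearance."""
    found = []
    for character in name_part:
        label = RISKY.get(character)
        if label and label not in found:
            found.append(label)
    return found


print(risky_symbols("Avery/Avary"))  # ['slash']
print(risky_symbols("-?-"))          # []
```

Note that the encased question mark and the parentheses I use for suffix notes pass this check. They are rule-breakers too, but of the kind that can be stripped out cleanly later.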
Sign up to receive notices of new content from the Golden Egg Genealogist.
Now, after all that preliminary yakking, let’s get down to business…
Problem #1: Dealing with the unknowns
As we build a family history, we only have partial names for many ancestors. We might have a first, we might have a last, and sometimes we just know there was a person — but have no clue what their first or last name was.
The rules I see on many genealogical sites say, if you haven’t proven the name, leave it blank. Use the notes fields to explain things. But you have to put something in at least one field of the name, with most software tools, so most of us are already breaking that rule. But blank name fields also create inefficiencies in processing and viewing some names, as I’ll describe below.
So how do we handle them?
Don’t use “Unk” or “Unknown”
I do agree with the rule on this one thing: Don’t put “Unknown” or “Unk” in a blank name field. Why? Because our very helpful software tools will try to turn “Unknown” or “Unk” into a name when you are searching for records.
It will shape the “Hints” your software shows you, when it thinks it has found a match. If you’ve typed in the name “Elizabeth Unknown,” you’ll be seeing records show up for Elizabeth Un and Elizabeth Underwood. You’ll be weeding out a lot of bad data. So don’t make this trouble for yourself.
Do use properly encased question marks
On the other hand, I think there can be a very good reason to use a question mark to reflect that a name is missing, if you package it properly. When that day comes that you clean up data for transfer, you don’t want to accidentally remove question marks that belong in your data permanently.
So how do you “package it properly”? You encase it with other symbols in such a way that you can easily run a search/replace to eliminate the set of marks, rather than just the question mark by itself. Here’s what I recommend: a question mark between two hyphens, “-?-”.
Later, you can eliminate all occurrences of “-?-” without accidentally removing valid question marks from your notes, sources, URLs, or GEDCOM coding.
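A tiny Python illustration shows the difference the packaging makes. The record text and the web address below are invented for the example.

```python
# One made-up record containing a placeholder, an uncertain date, and a URL.
record = "Mary -?- | note: born 1793? see https://example.org/search?name=Mary"

# Removing every bare question mark damages the note and breaks the URL.
naive = record.replace("?", "")

# Removing only the encased mark leaves the valid question marks alone.
safe = record.replace("-?-", "")

print(naive)
print(safe)
```

The naive version loses the “?” after 1793 and the one inside the web address; the encased version removes only the placeholder.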
Family Tree Maker and empty name fields
The encased question mark serves a useful purpose if you are moving data from one software product to another, particularly if you are using or plan to use Family Tree Maker (FTM). This tool makes you type all of the elements of a name into one long blank, and then the software, in its own wisdom, parses the terms into name parts for you. While it does have a way for you to open a name to see how it was parsed, most of us don’t realize we need to check behind the software.
In adding a new person, you type:
Donna Leigh Cox
and it correctly assumes the following:
Given: Donna Leigh
Surname: Cox
While it gets it right many times, as it would in the above situation, it has the potential to make a mess of names. And it might be a mess you are unaware of until you move 20,000 names to another software package.
Let’s assume you only know a first name, and you type:
Reid
The FTM software is programmed to turn the last term it finds into a last name, unless the final term is a traditional suffix, like “Sr.” So the software turns your term “Reid” into this:
Surname: Reid
On the other hand, had you typed
Reid -?-
it would have handled it correctly, making Reid the given name and putting “-?-” in the last name field.
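The behavior I have described can be imitated in a few lines of Python. To be clear, this is my own guess at the logic based on what I see on screen, not Family Tree Maker's actual code, and the list of suffixes is just a sample I made up.

```python
# Illustrative sample only; the real list of "traditional suffixes" is unknown to me.
SUFFIXES = {"Sr.", "Jr.", "II", "III"}


def parse_single_field(full_name):
    """Guess name parts from one long blank: the last term becomes the surname,
    unless it is a traditional suffix."""
    terms = full_name.split()
    suffix = ""
    if terms and terms[-1] in SUFFIXES:
        suffix = terms.pop()
    surname = terms.pop() if terms else ""
    given = " ".join(terms)
    return {"given": given, "surname": surname, "suffix": suffix}


print(parse_single_field("Donna Leigh Cox"))
# {'given': 'Donna Leigh', 'surname': 'Cox', 'suffix': ''}
print(parse_single_field("Reid"))
# {'given': '', 'surname': 'Reid', 'suffix': ''}
print(parse_single_field("Reid -?-"))
# {'given': 'Reid', 'surname': '-?-', 'suffix': ''}
```

The placeholder works simply because it occupies the last position, so the term you care about is pushed into the given name where it belongs.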
The problem with displaying incomplete names
Most of the other software packages let you key terms into separate Given, Surname, and Suffix fields. However, when you are looking at the names in various reports or trees, you can’t tell what you are seeing. If a person has been keyed into your software with the name Mitchell in the first or last name field, he will appear in the tree like this:
Mitchell
Is Mitchell his first name or his last? The software knows, but you have to dig in to learn the answer. However, if you use a question mark to flag the blank field, you know whether you’re seeing given or surnames:
- Patrick -?-
- -?- Reid
- Ryan -?-
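The same idea, written as a small Python sketch: when a display blends the fields into one string, filling each blank with the placeholder keeps the position of every part visible. The function is hypothetical and only illustrates the convention.

```python
def display_name(given, surname, placeholder="-?-"):
    """Blend given name and surname into one string, marking any blank part."""
    return f"{given or placeholder} {surname or placeholder}"


print(display_name("Patrick", ""))  # Patrick -?-
print(display_name("", "Reid"))     # -?- Reid
```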
Problem #2: Women with unknown maiden names
The Capulet rules say that the main name field should contain ONLY names you have proven. And you are aiming for the name as it appeared on initial birth documents. If you find discrepancies or have questions, you address them in the notes, NOT in the name. It seems like solid counsel, until you put it into practice.
As you’re using your software, you will need to find your way back to people you’re working on. But if you follow the rules above, your name list will include lengthy sections that likely look something like this (if you’re using encased question marks for blanks):
|-?-, Elizabeth||Abt. 1809|
|-?-, Elizabeth||1 Dec 1872|
|-?-, Elizabeth A.|
|-?-, Mary||23 Sep 1793|
|-?-, Mary T.|
If you’re looking for a particular Elizabeth, Capulet rules will tell you to open each one, see who they’re married to and read the notes. You can then figure out which Elizabeth is the one you want. Ridiculously time-consuming? You bet. Worse, you’ll have to do it all over again the next time you’re looking for an Elizabeth whose birth surname is unknown. And the list of other Elizabeths will likely keep growing.
I use the suffix field of names to put something that distinguishes this person — usually tying her to another family member. While the goal will always be to find Elizabeth’s own maiden name and get rid of the suffix, it helps to distinguish her for the time being.
I put the temporary information in parentheses and start with an abbreviation and colon, followed by a related name, someone who readily helps me to know which Elizabeth or Mary is which. Here are my standard abbreviations:
- wo: = “wife of” or “widow of”
- ho: = “husband of”
- fo: = “father of”
- mo: = “mother of”
- so: = “son of”
- do: = “daughter of”
We can create our own abbreviations to make this most useful and meaningful. If you plan to dump your data to a document for publication, the parentheses, abbreviations, and colons make it relatively easy to search for all occurrences of these helpful suffixes for deletion before publication.
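Because the notation is consistent, a script can read it as easily as it can delete it. Here is a hedged Python sketch that turns one of my suffixes back into plain words; the table repeats the abbreviations listed above, and the function name is my own invention.

```python
import re

ABBREVIATIONS = {
    "wo": "wife or widow of",
    "ho": "husband of",
    "fo": "father of",
    "mo": "mother of",
    "so": "son of",
    "do": "daughter of",
}

# "(" + abbreviation + ":" + related name + ")"
NOTE = re.compile(r"\((\w+):\s*([^)]*)\)")


def read_suffix(suffix):
    """Return (relationship, related name) for a suffix note, or None if there is none."""
    match = NOTE.search(suffix)
    if match is None:
        return None
    code = match.group(1)
    related = match.group(2).strip()
    # Abbreviations you invent yourself are passed through unchanged.
    return ABBREVIATIONS.get(code, code), related


print(read_suffix("(wo: John Stone)"))    # ('wife or widow of', 'John Stone')
print(read_suffix("(do: Terrence -?-)"))  # ('daughter of', 'Terrence -?-')
print(read_suffix("Jr."))                 # None
```

A genuine suffix such as “Jr.” is left alone because it does not follow the parentheses-abbreviation-colon pattern, which is the whole point of choosing a distinctive notation.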
Meanwhile, online, surely you’d have an easier time finding the Elizabeth you’re looking for, if your search window looked more like this:
|-?-, Elizabeth (wo: John Stone)||1809|
|-?-, Elizabeth (mo: Allison Smith)||1 Dec 1872|
|-?-, Elizabeth (wo: Ransom -?-)|
|-?-, Elizabeth (do: Terrence -?-)|
|-?-, Elizabeth A. (wo: -?- Bond then -?- Frazer)|
|-?-, Mary (wo: -?- Prischkin)|
|-?-, Mary (do: -?- Prishkin)||23 Sep 1793|
|-?-, Mary (wo: John Roe)|
|-?-, Mary T. (wo: Philip)|
It might not look quite as clean, but it is immensely more valuable to you in your work.
Ancestry’s automated family building and creative names
I have found this naming technique particularly helpful when Ancestry.com brings me a set of family members to automatically add into my tree, generated from a source. Depending upon what sort of source generated the names, it might be giving me a father’s daughters with their married names or it might give me a nickname for someone I had under a formal name in my database. In these cases, I’ll use the suffix field and my abbreviations (including one that starts with “aka:”) to temporarily capture these clues until Ancestry can add the record to or modify the record in my tree.
Turning off warning messages
Ancestry and FamilySearch offer us little to no resistance as we attempt to key into the name field characters that other software packages call “illegal.” (Exception: brackets in the surname in Ancestry.) Our desktop software packages put up more protections, but also allow you to turn off the warnings.
RootsMagic is very forgiving as you get creative with naming conventions. It does offer a NameClean tool that searches for illegal characters and offers you the chance to fix them. But if you are going Montague, you will want to think long and hard before you use this tool.
Family Tree Maker will put an error message in a banner across the top of your screen when you get creative with names. It gives you the option to turn the message off. You will want to do that, by the way, because this error message also tries to “fix” your bad data. Don’t let that happen. (I’ve reported the bug to MacKiev.)
Legacy will also flag the errors and give you the chance to select which issues you want to be warned about. Deselect anything that will prevent you from doing your names the best way for your needs.
Last words on the subject (until you give me more to say)
When I started working on this topic weeks ago, I expected it to be a quick topic. But the more I tested, the more complex it became. It’s been a very worthwhile exercise for me because, for the first time, I feel confident and consistent in my choices. I hope I’ve raised options and warnings and questions for you. Understand that I don’t encourage anyone to break the rules until you …
- Create an intelligently conceived rule.
- Test your new rule thoroughly.
- Consistently follow your intelligently conceived and thoroughly tested rule.
Let me know about issues you encounter, so I can refine this document as we learn together.