GEDCOM Explained

By Dick Eastman

I frequently mention the acronym “GEDCOM”. This week a reader wrote to me with an excellent question: “What is GEDCOM?” I realized that I had not explained that buzzword in a long, long time. So here is a brief, non-technical explanation of the term for the newer subscribers to this publication.

GEDCOM is an acronym for GEnealogical Data COMmunication. In short, GEDCOM is the language by which different genealogy software programs talk to one another. Its purpose is to exchange data between dissimilar programs without having to manually re-enter all the data on a keyboard.
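For the curious, a GEDCOM file is plain text. Each line begins with a level number, optionally followed by a record identifier such as @I1@, then a tag such as NAME or BIRT and an optional value. The short Python sketch below is only an illustration: the sample file is a simplified, made-up example, and the names parse_gedcom and GedcomLine are hypothetical, not taken from any real genealogy program.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A simplified, made-up GEDCOM file. Slashes in NAME mark the surname.
SAMPLE = """0 HEAD
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE BET 1847 AND 1852
0 TRLR"""

# level, optional @xref@, tag, optional value
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


@dataclass
class GedcomLine:
    level: int
    xref: Optional[str]
    tag: str
    value: str


def parse_gedcom(text: str) -> List[GedcomLine]:
    """Split GEDCOM text into (level, xref, tag, value) lines."""
    lines: List[GedcomLine] = []
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        match = LINE_RE.match(raw)
        if match is None:
            raise ValueError(f"not a GEDCOM line: {raw!r}")
        level, xref, tag, value = match.groups()
        lines.append(GedcomLine(int(level), xref, tag, value or ""))
    return lines


if __name__ == "__main__":
    # Indent each tag by its level to show the nesting.
    for line in parse_gedcom(SAMPLE):
        print(f"{'  ' * line.level}{line.tag} {line.value}".rstrip())
```

The level numbers are what let a receiving program know that the DATE line belongs to the BIRT (birth) event, which in turn belongs to the person record @I1@.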

To illustrate the importance of GEDCOM, step back in time with me for a moment. Before the invention of GEDCOM, and before the invention of the home computer, I entered the names and some limited information about 200 or so of my ancestors onto 80-column punch cards. I did this after hours in my employer’s data center. I then used the employer’s mainframe computer, which cost hundreds of thousands of dollars, to sort the data and to print a few crude reports. Luckily for me, my employer allowed me to use all the mainframe time I wanted during the evening, after the company finished its daily work.

Around 1980, I built my own home computer. I decided to put my genealogy database onto the new system, but it could not read 80-column punch cards, so I manually re-typed every bit of data into a dBASE-II program that I wrote. My database had grown; I had to enter data on 400 or so individuals. I stored the information on 8-inch floppy disks attached to my homemade 8-bit CP/M computer, which had 64 kilobytes of memory.

Some time later I discovered a CP/M genealogy program that would operate on my system. Unlike my crude, homemade program, this new genealogy program printed pedigree charts, family group sheets, and other reports. I decided to convert to the new, more powerful program (although I must say that it was rather elementary when compared to today’s powerful programs). My database had grown to about 600 individuals, and I could not find any method of easily copying that data into the new program. I first printed out the information from the dBASE-II database. Then I sat at my computer for several evenings, reading the information on paper and re-typing every bit of it into my new program.

I bet you can guess the next step: in 1984 I purchased an IBM clone and decided to move my data to this new powerhouse. After all, it had 640 kilobytes of memory and a 20-megabyte hard drive that I was certain I could never fill. I had been rather active in my genealogy research, so I now had information about 1,200 people to re-enter. I printed out the entire database from the old system onto paper and then manually re-typed it into the new PC. That effort took weeks, and I promised myself, “Never again!”

Newer genealogy programs appeared in the following years, each with new features that I found enticing. However, I continued to use the same program simply because I didn’t want to go through the keyboard effort again. Then the Church of Jesus Christ of Latter-day Saints announced something new: a file format called GEDCOM. This proposed standard file format was designed to allow different genealogy programs to exchange data. There was only one problem at the time: the only program that could read and write GEDCOM data was the one written by the Church of Jesus Christ of Latter-day Saints.

GEDCOM is a standard, not a program. Each genealogy program that is going to use GEDCOM data has to be written by its programmers to read and write GEDCOM files. If you try to transfer data from one program to another, only to discover that just one of the two programs supports GEDCOM, you are out of luck. Both programs have to support GEDCOM.

Slowly, over a period of several years, other genealogy programs began to add the ability to read and write GEDCOM files. It was now possible to move data from one genealogy program to another without manually re-typing everything. The author of the genealogy program that I used never did add GEDCOM capability. Fortunately, someone else eventually wrote a small routine that would export data from this program in GEDCOM format, and I was then able to move my data to newer, more powerful programs.

By 1990, I was writing articles on CompuServe, advising everyone never to use a genealogy program that lacked GEDCOM capabilities. Luckily, that is not much of an issue this year: all of today’s major genealogy programs will import and export GEDCOM data. Data transfer is still a problem for those using older genealogy programs without GEDCOM capability; many people still find their data trapped in these “islands.” For them, there is no easy solution.

Unlike in the “dark ages” of the 1980s, it is now common for people to use two or three or even more genealogy programs. You may prefer one program for storing all the bits of information that you encounter in your research efforts, yet prefer the printed reports or multimedia scrapbook features of a different program. Thanks to GEDCOM, you can easily move your data from one program to another. You can also share information with distant cousins who use yet other genealogy programs by e-mailing GEDCOM files to each other.

The instructions for creating or reading GEDCOM files will vary from one program to another. You need to consult the program’s HELP files to find the exact sequence of instructions your genealogy program requires.

You need to be aware that the GEDCOM standard is not perfect. For one thing, not all the data fields are specified precisely in the GEDCOM specification. For another, not all the programmers of the various genealogy programs interpreted the specification in exactly the same manner. For instance, your present genealogy program might be perfectly happy with a birth date listed as “after 1847 but before 1852.” However, once that information is exported in a GEDCOM file and then imported into a different program, the birth date may read differently. Typically, it is simply left blank.
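In GEDCOM 5.5, a date like “after 1847 but before 1852” can be written as the range BET 1847 AND 1852; the standard also defines BEF (before) and AFT (after). The Python sketch below is a hypothetical illustration, not code from any real program. It contrasts an importer that only understands exact dates, and so leaves the range blank, with one that keeps the earliest and latest bounds.

```python
import re
from typing import Optional, Tuple

MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"
# Exact dates such as "1850", "JAN 1850" or "12 JAN 1850".
EXACT_RE = re.compile(rf"^(?:(\d{{1,2}}) )?(?:({MONTHS}) )?(\d{{3,4}})$")
# Ranges such as "BEF 1852", "AFT 1847" or "BET 1847 AND 1852".
RANGE_RE = re.compile(r"^(BEF|AFT|BET) (.+?)(?: AND (.+))?$")

Bounds = Tuple[Optional[str], Optional[str]]


def simple_import(date: str) -> Optional[str]:
    """Hypothetical importer that stores only exact dates; anything else is dropped."""
    return date if EXACT_RE.match(date) else None


def range_aware_import(date: str) -> Optional[Bounds]:
    """Hypothetical importer returning (earliest, latest); None if not understood."""
    if EXACT_RE.match(date):
        return (date, date)
    match = RANGE_RE.match(date)
    if match is None:
        # Other forms, such as ABT (about), are not handled in this sketch.
        return None
    kind, first, second = match.groups()
    # Only BET takes an AND clause.
    if (kind == "BET") != (second is not None):
        return None
    if kind == "BEF":
        return (None, first)
    if kind == "AFT":
        return (first, None)
    return (first, second)


if __name__ == "__main__":
    date = "BET 1847 AND 1852"
    print("simple importer:", simple_import(date))
    print("range-aware importer:", range_aware_import(date))
```

Neither importer is wrong by the letter of the specification; they simply support different subsets of it, which is exactly how the same GEDCOM file can arrive with a date in one program and a blank in another.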

Another problem is that genealogy programs do not all organize their databases the same way. One program may have only one field for “occupation,” assuming that every person on the face of the earth never, ever changed careers. Another genealogy program may be able to record multiple occupations during a person’s lifetime. When you transfer data via GEDCOM from the more powerful program to the simpler one, some of those occupations will be lost. These are just two simple examples; you can find numerous other inconsistencies when moving data between dissimilar programs.
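GEDCOM itself can carry several occupations: a person’s record may simply contain more than one OCCU line. The loss happens inside the receiving program. This Python sketch uses hypothetical function names and assumes a simple program with one occupation field in which each new value overwrites the previous one; a real program might just as well keep the first value instead.

```python
from typing import List

# A made-up person record with two occupations.
RECORD = """0 @I1@ INDI
1 NAME John /Smith/
1 OCCU Farmer
1 OCCU Blacksmith"""


def occupations(record: str) -> List[str]:
    """Collect every level-1 OCCU value in a GEDCOM record, in order."""
    found: List[str] = []
    for raw in record.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "1" and parts[1] == "OCCU":
            found.append(parts[2])
    return found


def single_field_import(record: str) -> str:
    """Hypothetical program with one occupation field: later values overwrite earlier ones."""
    occupation = ""
    for value in occupations(record):
        occupation = value
    return occupation


if __name__ == "__main__":
    everything = occupations(RECORD)
    kept = single_field_import(RECORD)
    lost = [value for value in everything if value != kept]
    print("in the GEDCOM file:", everything)
    print("kept by the simpler program:", kept)
    print("lost in transfer:", lost)
```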

Another limitation is that the present GEDCOM standard was created before multimedia became popular. You can transfer textual data, such as names, dates and locations, rather well in GEDCOM. However, transferring scanned images, sound clips and movies from one genealogy program to another via GEDCOM files is almost impossible.

There is another problem with translating from one format to another: data integrity. Translating from one program’s database to GEDCOM is much like translating from one spoken language to another. The basics work, but subtleties and details sometimes do not translate well. Then, when translating into the third language (the receiving genealogy program’s database), more translation losses creep in. I well remember reading a technical manual some years ago that had been written in Japanese and then translated into Chinese. At a later date, the Chinese version was translated into English. The resultant English manual was barely readable. The same may happen when translating a database from Program A into GEDCOM and then from GEDCOM into Program B.

A new method of transferring data between different genealogy programs was announced some time ago by Wholly Genes Software. Their GenBridge technology reads data from one program directly into a second program without requiring a “double translation” via GEDCOM. The result is a much more accurate transfer process. However, other genealogy developers have yet to adopt GenBridge. To date, this technology is only available in software produced by Wholly Genes: The Master Genealogist and Family Tree Super Tools.

Despite all its shortcomings, GEDCOM is still a simple and somewhat effective method of transferring genealogy data from one program to another. Most of the data will transfer properly, and there are easy ways to review the data for errors. Names, dates and locations normally transfer correctly. Text, events, notes and source citations may not always transfer perfectly. The exact problems encountered will depend upon the two genealogy programs involved.

Most modern genealogy programs will create an error log of GEDCOM data imported but not understood by the receiving program. You can read that log file to see what the program detected as inconsistent, then manually go in and fix the errors. While tedious, this is still a lot better than re-keying everything!
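The idea behind such an error log can be sketched in a few lines of Python. This is a minimal, hypothetical example: the set of tags the receiving program understands is made up for illustration, and _MILT stands in for the kind of program-specific tag (GEDCOM reserves a leading underscore for user-defined tags) that another program may not recognize.

```python
from typing import List, Set

# Hypothetical: the tags this made-up receiving program knows how to store.
KNOWN_TAGS: Set[str] = {"HEAD", "GEDC", "VERS", "INDI", "NAME", "BIRT", "DATE", "TRLR"}

SAMPLE = """0 HEAD
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME John /Smith/
1 _MILT Union Army
0 TRLR"""


def import_with_log(text: str, known: Set[str]) -> List[str]:
    """Return one log message for each line whose tag is not in `known`."""
    log: List[str] = []
    for number, raw in enumerate(text.splitlines(), start=1):
        tokens = raw.split()
        if len(tokens) < 2:
            log.append(f"Line {number}: malformed line {raw!r}")
            continue
        # The tag follows the level number, or the @xref@ if the line has one.
        has_xref = tokens[1].startswith("@") and len(tokens) > 2
        tag = tokens[2] if has_xref else tokens[1]
        if tag not in known:
            log.append(f"Line {number}: tag {tag} not understood: {raw.strip()!r}")
    return log


if __name__ == "__main__":
    for message in import_with_log(SAMPLE, KNOWN_TAGS):
        print(message)
```

Reading such a log tells you which pieces of information need to be re-entered by hand, which is far less work than re-keying the whole database.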

A few weeks ago a new GEDCOM standard was proposed that is to be based upon XML (Extensible Markup Language), a data format that is popular on the World Wide Web. This new standard should greatly improve data transfer accuracy. See my article at http://www.ancestry.com/library/view/columns/eastman/5626.asp for details. However, don’t look for this new GEDCOM 6.0 any time soon. It is still a proposal and probably will not appear in genealogy programs for another couple of years.

I offer this as a non-technical explanation of GEDCOM plus some commentary on its use. For more details and for technical explanations of the inner workings of GEDCOM, I would suggest that you read the following:

The GEDCOM Standard Release 5.5:
http://homepages.rootsweb.com/~pmcbride/gedcom/55gctoc.htm

GEDCOM 6.0 XML proposal:
http://www.familysearch.org/GEDCOM/GedXML60.pdf

Introduction to GEDCOM:
http://web.ukonline.co.uk/nigel.battysmith/gedinfo.html

GENTECH’s GEDCOM TestBook Project:
http://www.ancestry.com/library/view/gencomp/5645.asp and
http://www.gentech.org/gedtest.htm

GEDCOM 101 by Jan McClintock:
http://www.leisterpro.com/doc/Articles/GEDCOM101.html

GEDCOM Usage Guide:
http://www.cmis.csiro.au/Graham.Williams/personal/gedcom.html

Ancestry World Tree:
http://www.ancestry.com/share/awt/main.htm

Is GEDCOM Dead? by Beau Sharbrough:
http://www.sharbrough.net/genealogy/genart13.htm

SOURCE: This article is from Eastman’s Online Genealogy Newsletter published May 16, 2002 and is copyright 2002 by Richard W. Eastman. It is re-published here with Dick’s permission.