parser

package
v1.2.0
Published: Feb 9, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package parser provides low-level GEDCOM line parsing functionality.

This package handles the tokenization and parsing of individual GEDCOM lines, converting them into Line structures with level, tag, value, and cross-reference information. It supports all standard GEDCOM formats and provides detailed error reporting with line numbers.

Example usage:

p := parser.NewParser()
lines, err := p.Parse(reader)
if err != nil {
    log.Fatal(err)
}
for _, line := range lines {
    fmt.Printf("Level %d: %s = %s\n", line.Level, line.Tag, line.Value)
}
Example

Example demonstrates basic GEDCOM line parsing.

package main

import (
	"fmt"
	"strings"

	"github.com/cacack/gedcom-go/parser"
)

func main() {
	// GEDCOM data as a string (typically read from a file)
	gedcomData := `0 HEAD
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME John /Smith/
0 TRLR`

	// Create a parser and parse the content
	p := parser.NewParser()
	lines, err := p.Parse(strings.NewReader(gedcomData))
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		return
	}

	fmt.Printf("Parsed %d lines\n", len(lines))

}
Output:

Parsed 6 lines

Constants

const IndexVersion byte = 1

IndexVersion is the current version of the index format. Increment when making incompatible changes to the index structure.

const MaxNestingDepth = 100

MaxNestingDepth is the maximum allowed nesting depth to prevent stack overflow.

Variables

var ErrIndexVersionMismatch = errors.New("index version mismatch")

ErrIndexVersionMismatch is returned when loading an index with an incompatible version.

var ErrNoIndex = errors.New("no index available: call BuildIndex or LoadIndex first")

ErrNoIndex is returned when attempting to use indexed operations without an index.

var ErrRecordNotFound = errors.New("record not found")

ErrRecordNotFound is returned when a record cannot be found.

Functions

func Records added in v1.2.0

func Records(r io.Reader) iter.Seq2[*RawRecord, error]

Records returns an iterator over GEDCOM records using Go 1.23 range-over-func. It yields (*RawRecord, nil) for each successfully parsed record. On parse error, it yields (nil, error) and stops iteration.

This function provides a modern, idiomatic alternative to RecordIterator for streaming GEDCOM record processing. The reader should already be wrapped with charset.NewReader() for encoding normalization.

Usage:

for record, err := range parser.Records(reader) {
    if err != nil {
        return err
    }
    // process record
}

Early termination is supported - breaking from the loop will stop iteration:

for record, err := range parser.Records(reader) {
    if err != nil {
        return err
    }
    if record.Type == "TRLR" {
        break // stop at trailer
    }
}

func RecordsWithOffset added in v1.2.0

func RecordsWithOffset(r io.Reader) iter.Seq2[*RawRecord, error]

RecordsWithOffset returns an iterator over GEDCOM records with accurate byte offset tracking. It yields (*RawRecord, nil) for each successfully parsed record. On parse error, it yields (nil, error) and stops iteration.

This is the range-over-func equivalent of RecordIteratorWithOffset, providing precise ByteOffset and ByteLength values suitable for building file indexes. The reader should already be wrapped with charset.NewReader() for encoding normalization.

Usage:

for record, err := range parser.RecordsWithOffset(reader) {
    if err != nil {
        return err
    }
    fmt.Printf("Record %s at offset %d, length %d\n",
        record.Type, record.ByteOffset, record.ByteLength)
}
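
To make ByteOffset and ByteLength concrete, the following standard-library sketch computes the same kind of values for a small input. It is an illustration under assumptions, not the package's code: span, recordSpans, and spansOfString are hypothetical names, and the sketch only recognizes LF and CRLF line endings and assumes no byte-order mark.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// span is a hypothetical stand-in for a record's ByteOffset/ByteLength pair.
type span struct {
	Header string // the level-0 line without its terminator
	Offset int64
	Length int64
}

// recordSpans reads r once and reports where each level-0 record starts
// and how many bytes it covers, terminators included.
func recordSpans(r io.Reader) ([]span, error) {
	br := bufio.NewReader(r)
	var spans []span
	var pos int64
	for {
		line, err := br.ReadString('\n')
		if len(line) > 0 {
			if strings.HasPrefix(line, "0 ") {
				spans = append(spans, span{Header: strings.TrimRight(line, "\r\n"), Offset: pos})
			}
			if n := len(spans); n > 0 {
				spans[n-1].Length += int64(len(line))
			}
			pos += int64(len(line))
		}
		if err == io.EOF {
			return spans, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

// spansOfString is a convenience wrapper for in-memory data.
func spansOfString(data string) ([]span, error) {
	return recordSpans(strings.NewReader(data))
}

func main() {
	data := "0 HEAD\n1 GEDC\n0 @I1@ INDI\n1 NAME John /Smith/\n0 TRLR\n"
	spans, err := spansOfString(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range spans {
		fmt.Printf("%s at offset %d, length %d\n", s.Header, s.Offset, s.Length)
	}
}
```

Counting raw bytes, rather than characters or trimmed lines, is what makes the offsets usable with Seek later.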

Types

type IndexEntry added in v0.8.0

type IndexEntry struct {
	// XRef is the cross-reference identifier (e.g., "@I1@")
	// Empty for records without XRefs (like HEAD, TRLR)
	XRef string

	// Type is the record type tag (e.g., "INDI", "FAM", "HEAD")
	Type string

	// ByteOffset is the starting byte position of this record in the file
	ByteOffset int64

	// ByteLength is the total number of bytes for this record
	ByteLength int64
}

IndexEntry represents a single record's location in a GEDCOM file.

type LazyParser added in v0.8.0

type LazyParser struct {
	// contains filtered or unexported fields
}

LazyParser provides lazy/incremental parsing of GEDCOM files. It combines streaming iteration with indexed random access for efficient partial file processing.

func NewLazyParser added in v0.8.0

func NewLazyParser(rs io.ReadSeeker) *LazyParser

NewLazyParser creates a new LazyParser from an io.ReadSeeker. The reader must support seeking for indexed access operations.

func (*LazyParser) AllRecords added in v1.2.0

func (p *LazyParser) AllRecords() iter.Seq2[*RawRecord, error]

AllRecords seeks to the beginning and returns an iterator over all records using Go 1.23 range-over-func. This is the range-over-func equivalent of IterateAll.

Usage:

for record, err := range lp.AllRecords() {
    if err != nil {
        return err
    }
    // process record
}

func (*LazyParser) BuildIndex added in v0.8.0

func (p *LazyParser) BuildIndex() error

BuildIndex scans the entire file to build an index for O(1) record lookup. After calling BuildIndex, use FindRecord for efficient random access. The reader is rewound to the beginning after building the index.

func (*LazyParser) FindRecord added in v0.8.0

func (p *LazyParser) FindRecord(xref string) (*RawRecord, error)

FindRecord locates and parses a specific record by XRef. Requires an index to be built or loaded first. Returns ErrNoIndex if no index is available. Returns ErrRecordNotFound if the XRef is not in the index.
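
Indexed lookup amounts to a map access followed by a seek and a bounded read. The sketch below shows that idea with the standard library only. It does not use or reproduce LazyParser; entry, readRecord, findInString, and errNotFound are hypothetical stand-ins, and the index is written by hand for this input.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errNotFound is a local stand-in for a "record not found" sentinel.
var errNotFound = errors.New("record not found")

// entry is a hypothetical index entry: where a record starts and its size.
type entry struct{ Offset, Length int64 }

// readRecord seeks to an indexed record and reads exactly its bytes.
func readRecord(rs io.ReadSeeker, index map[string]entry, xref string) (string, error) {
	e, ok := index[xref]
	if !ok {
		return "", errNotFound
	}
	if _, err := rs.Seek(e.Offset, io.SeekStart); err != nil {
		return "", err
	}
	buf := make([]byte, e.Length)
	if _, err := io.ReadFull(rs, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

// findInString is a convenience wrapper for in-memory data.
func findInString(data string, index map[string]entry, xref string) (string, error) {
	return readRecord(strings.NewReader(data), index, xref)
}

func main() {
	data := "0 HEAD\n1 GEDC\n0 @I1@ INDI\n1 NAME John /Smith/\n0 TRLR\n"
	// Offsets computed by hand for this input.
	index := map[string]entry{"@I1@": {Offset: 14, Length: 32}}

	rec, err := findInString(data, index, "@I1@")
	fmt.Printf("%q %v\n", rec, err)

	_, err = findInString(data, index, "@I9@")
	fmt.Println(errors.Is(err, errNotFound))
}
```

Only the requested record's bytes are read, which is why the reader has to support seeking.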

func (*LazyParser) FindRecordByType added in v0.8.0

func (p *LazyParser) FindRecordByType(recordType string) (*RawRecord, error)

FindRecordByType locates and parses a record by type (for records without XRef). This is useful for finding HEAD or TRLR records. Requires an index to be built or loaded first.

func (*LazyParser) HasIndex added in v0.8.0

func (p *LazyParser) HasIndex() bool

HasIndex returns true if an index is available.

func (*LazyParser) Index added in v0.8.0

func (p *LazyParser) Index() *RecordIndex

Index returns the current index, or nil if none is loaded.

func (*LazyParser) Iterate added in v0.8.0

func (p *LazyParser) Iterate() *RecordIterator

Iterate returns a RecordIterator for streaming through records. The iterator starts from the current position of the reader. For full file iteration, seek to the beginning first.

func (*LazyParser) IterateAll added in v0.8.0

func (p *LazyParser) IterateAll() (*RecordIterator, error)

IterateAll seeks to the beginning and returns an iterator for all records.

func (*LazyParser) IterateFrom added in v0.8.0

func (p *LazyParser) IterateFrom(offset int64) (*RecordIterator, error)

IterateFrom seeks to the given byte offset and returns an iterator. This allows resuming iteration from a known position.

func (*LazyParser) LoadIndex added in v0.8.0

func (p *LazyParser) LoadIndex(r io.Reader) error

LoadIndex loads a pre-built index from the given reader. Using a pre-built index avoids the O(n) scan of BuildIndex.

func (*LazyParser) RecordCount added in v0.8.0

func (p *LazyParser) RecordCount() int

RecordCount returns the total number of indexed records. Returns 0 if no index is available.

func (*LazyParser) Records added in v1.2.0

func (p *LazyParser) Records() iter.Seq2[*RawRecord, error]

Records returns an iterator over records from the current position using Go 1.23 range-over-func. This is the range-over-func equivalent of Iterate.

Usage:

for record, err := range lp.Records() {
    if err != nil {
        return err
    }
    // process record
}

func (*LazyParser) RecordsFrom added in v1.2.0

func (p *LazyParser) RecordsFrom(offset int64) iter.Seq2[*RawRecord, error]

RecordsFrom seeks to the given byte offset and returns an iterator over records using Go 1.23 range-over-func. This is the range-over-func equivalent of IterateFrom.

If seeking fails, the error is yielded as the first iteration result.

Usage:

for record, err := range lp.RecordsFrom(offset) {
    if err != nil {
        return err
    }
    // process record
}

func (*LazyParser) SaveIndex added in v0.8.0

func (p *LazyParser) SaveIndex(w io.Writer) error

SaveIndex writes the current index to the given writer. Returns ErrNoIndex if no index has been built or loaded.

func (*LazyParser) XRefs added in v0.8.0

func (p *LazyParser) XRefs() []string

XRefs returns all XRefs in the index. Returns nil if no index is available.

type Line

type Line struct {
	// Level indicates the hierarchical depth (0, 1, 2, etc.)
	Level int

	// Tag is the GEDCOM tag (e.g., HEAD, INDI, NAME, BIRT)
	Tag string

	// Value is the optional value associated with the tag
	Value string

	// XRef is the optional cross-reference identifier (e.g., @I1@)
	XRef string

	// LineNumber is the line number in the source file (1-based)
	// Used for error reporting
	LineNumber int
}

Line represents a single parsed line from a GEDCOM file. GEDCOM files use a line-based format with hierarchical levels. Each line has the form LEVEL [XREF] TAG [VALUE], where the bracketed parts are optional.
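
As an illustration of the LEVEL [XREF] TAG [VALUE] grammar, here is a minimal standard-library sketch that splits a line into those four parts. It is not the package's tokenizer: rawLine and splitLine are hypothetical names, and validation is deliberately light.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rawLine is a hypothetical stand-in for the fields a parsed line carries.
type rawLine struct {
	Level int
	XRef  string
	Tag   string
	Value string
}

// splitLine breaks "LEVEL [XREF] TAG [VALUE]" into its parts.
// Illustrative sketch only; it does not validate tag or xref characters.
func splitLine(s string) (rawLine, error) {
	var out rawLine
	levelStr, rest, _ := strings.Cut(strings.TrimSpace(s), " ")
	level, err := strconv.Atoi(levelStr)
	if err != nil || level < 0 {
		return out, fmt.Errorf("invalid level %q", levelStr)
	}
	out.Level = level

	// An optional cross-reference identifier is delimited by '@' characters.
	if strings.HasPrefix(rest, "@") {
		out.XRef, rest, _ = strings.Cut(rest, " ")
	}

	// The tag is the next token; everything after it is the value.
	out.Tag, out.Value, _ = strings.Cut(rest, " ")
	if out.Tag == "" {
		return out, fmt.Errorf("missing tag in %q", s)
	}
	return out, nil
}

func main() {
	for _, s := range []string{"0 @I1@ INDI", "1 NAME John /Smith/", "2 GIVN John"} {
		l, err := splitLine(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("level=%d xref=%q tag=%q value=%q\n", l.Level, l.XRef, l.Tag, l.Value)
	}
}
```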

Example

ExampleLine demonstrates accessing parsed Line fields.

package main

import (
	"fmt"

	"github.com/cacack/gedcom-go/parser"
)

func main() {
	p := parser.NewParser()

	// Parse a line with XRef (cross-reference identifier)
	line1, _ := p.ParseLine("0 @I1@ INDI")
	fmt.Printf("XRef: %s, Tag: %s\n", line1.XRef, line1.Tag)

	// Parse a line with a value
	line2, _ := p.ParseLine("1 NAME John /Smith/")
	fmt.Printf("Tag: %s, Value: %s\n", line2.Tag, line2.Value)

	// Parse a nested line
	line3, _ := p.ParseLine("2 GIVN John")
	fmt.Printf("Level: %d, Tag: %s, Value: %s\n", line3.Level, line3.Tag, line3.Value)

}
Output:

XRef: @I1@, Tag: INDI
Tag: NAME, Value: John /Smith/
Level: 2, Tag: GIVN, Value: John
Example (LineNumber)

ExampleLine_lineNumber shows how line numbers are tracked for error reporting.

package main

import (
	"fmt"

	"github.com/cacack/gedcom-go/parser"
)

func main() {
	p := parser.NewParser()

	// Parse multiple lines - line numbers are tracked automatically
	lines := []string{
		"0 HEAD",
		"1 SOUR MyApp",
		"0 @I1@ INDI",
	}

	for _, input := range lines {
		line, _ := p.ParseLine(input)
		fmt.Printf("Line %d: %s\n", line.LineNumber, line.Tag)
	}

}
Output:

Line 1: HEAD
Line 2: SOUR
Line 3: INDI

type ParseError

type ParseError struct {
	// Line is the line number where the error occurred (1-based)
	Line int

	// Message describes what went wrong
	Message string

	// Context provides the actual line content that caused the error
	Context string

	// Err is the underlying error, if any
	Err error
}

ParseError represents an error that occurred during parsing. It includes line number and context for better error reporting.

func (*ParseError) Error

func (e *ParseError) Error() string

func (*ParseError) Unwrap

func (e *ParseError) Unwrap() error
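
Because ParseError exposes Unwrap, callers can inspect it with errors.As and reach the underlying cause with errors.Is. The sketch below shows that pattern on a local type that mirrors the documented fields. lineError, checkLevel, and describe are hypothetical, and the message format is an assumption, since the real Error output is not documented here.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// lineError mirrors the documented fields of ParseError; it is a local stand-in.
type lineError struct {
	Line    int
	Message string
	Context string
	Err     error
}

// Error formats the error. The format used here is an assumption.
func (e *lineError) Error() string {
	return fmt.Sprintf("line %d: %s (%q)", e.Line, e.Message, e.Context)
}

// Unwrap exposes the underlying cause to errors.Is and errors.As.
func (e *lineError) Unwrap() error { return e.Err }

// checkLevel returns a *lineError when level is not a number.
func checkLevel(lineNo int, text, level string) error {
	if _, err := strconv.Atoi(level); err != nil {
		return &lineError{Line: lineNo, Message: "invalid level", Context: text, Err: err}
	}
	return nil
}

// describe extracts the line number and reports whether the root cause
// is a strconv syntax error.
func describe(err error) (line int, isSyntax bool) {
	var le *lineError
	if errors.As(err, &le) {
		line = le.Line
	}
	return line, errors.Is(err, strconv.ErrSyntax)
}

func main() {
	// Wrapping with %w keeps the chain intact for errors.As and errors.Is.
	err := fmt.Errorf("loading file: %w", checkLevel(3, "X HEAD", "X"))
	line, isSyntax := describe(err)
	fmt.Println(line, isSyntax)
}
```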

type ParseOptions added in v1.1.0

type ParseOptions struct {
	// Lenient controls error handling behavior.
	// If true, the parser collects errors and continues parsing.
	// If false (default), the parser fails on the first error.
	Lenient bool

	// MaxErrors is the maximum number of errors to collect in lenient mode.
	// When reached, parsing continues but errors are no longer collected.
	// A value of 0 means unlimited errors will be collected.
	MaxErrors int
}

ParseOptions configures the behavior of ParseWithOptions.

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser parses GEDCOM files into Line structures.

func NewParser

func NewParser() *Parser

NewParser creates a new Parser instance.

func (*Parser) Parse

func (p *Parser) Parse(r io.Reader) ([]*Line, error)

Parse reads a GEDCOM file from a reader and returns all parsed lines. Supports all line ending styles: LF (Unix), CRLF (Windows), CR (old Macintosh).
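
One way to accept all three line-ending styles is a custom bufio.SplitFunc. The sketch below is a standard-library illustration of that technique, not necessarily how Parse does it; scanAnyEOL and splitLines are hypothetical names.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanAnyEOL is a bufio.SplitFunc that ends a line at LF, CRLF, or a lone CR.
func scanAnyEOL(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			return i + 1, data[:i], nil
		}
		// data[i] is CR: look one byte ahead to tell CRLF from a lone CR.
		if i+1 < len(data) {
			if data[i+1] == '\n' {
				return i + 2, data[:i], nil
			}
			return i + 1, data[:i], nil
		}
		if atEOF {
			return i + 1, data[:i], nil
		}
		// CR is the last buffered byte: request more data before deciding.
		return 0, nil, nil
	}
	if atEOF {
		// Final line without a terminator.
		return len(data), data, nil
	}
	return 0, nil, nil
}

// splitLines returns the lines of s using scanAnyEOL.
func splitLines(s string) []string {
	sc := bufio.NewScanner(strings.NewReader(s))
	sc.Split(scanAnyEOL)
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines
}

func main() {
	// Mixed endings: CRLF, lone CR, LF, and no terminator on the last line.
	fmt.Printf("%q\n", splitLines("0 HEAD\r\n1 GEDC\r2 VERS 5.5\n0 TRLR"))
}
```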

Example

ExampleParse shows how to parse complete GEDCOM content from an io.Reader.

package main

import (
	"fmt"
	"strings"

	"github.com/cacack/gedcom-go/parser"
)

func main() {
	gedcomData := `0 HEAD
1 GEDC
2 VERS 5.5.1
0 @I1@ INDI
1 NAME Alice /Johnson/
1 SEX F
0 @I2@ INDI
1 NAME Bob /Johnson/
1 SEX M
0 TRLR`

	p := parser.NewParser()
	lines, err := p.Parse(strings.NewReader(gedcomData))
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		return
	}

	// Count records at level 0 (top-level records)
	records := 0
	for _, line := range lines {
		if line.Level == 0 {
			records++
		}
	}

	fmt.Printf("Total lines: %d\n", len(lines))
	fmt.Printf("Top-level records: %d\n", records)

}
Output:

Total lines: 10
Top-level records: 4

func (*Parser) ParseLine

func (p *Parser) ParseLine(input string) (*Line, error)

ParseLine parses a single GEDCOM line. The line format is LEVEL [XREF] TAG [VALUE]. Examples:

0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
2 GIVN John
Example

ExampleParser_ParseLine shows line-by-line parsing for streaming scenarios.

package main

import (
	"fmt"

	"github.com/cacack/gedcom-go/parser"
)

func main() {
	// Line-by-line parsing is useful for streaming or custom parsing logic
	p := parser.NewParser()

	inputLines := []string{
		"0 HEAD",
		"1 GEDC",
		"2 VERS 5.5",
		"0 @I1@ INDI",
		"1 NAME John /Smith/",
	}

	for _, input := range inputLines {
		line, err := p.ParseLine(input)
		if err != nil {
			fmt.Printf("Error: %v\n", err)
			return
		}
		fmt.Printf("Level %d: %s\n", line.Level, line.Tag)
	}

}
Output:

Level 0: HEAD
Level 1: GEDC
Level 2: VERS
Level 0: INDI
Level 1: NAME

func (*Parser) ParseWithOptions added in v1.1.0

func (p *Parser) ParseWithOptions(r io.Reader, opts *ParseOptions) (
	lines []*Line,
	parseErrors []*ParseError,
	fatalErr error,
)

ParseWithOptions reads a GEDCOM file with configurable error handling. In lenient mode, it collects parse errors and continues parsing. Returns:

  • lines: successfully parsed lines (may be partial in lenient mode)
  • parseErrors: syntax errors encountered (only populated in lenient mode)
  • fatalErr: unrecoverable errors like I/O failures
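
The three return values can be easier to follow with a small model of the collection logic. The sketch below reimplements only that logic on a toy input using the standard library; options and parseLevels are hypothetical names. Returning the first syntax error as the fatal error in strict mode is an assumption about behavior that the documentation above does not spell out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// options mirrors the two documented ParseOptions fields; it is a local stand-in.
type options struct {
	Lenient   bool
	MaxErrors int
}

// parseLevels extracts the level of each line. In lenient mode it records
// bad lines and keeps going; otherwise it stops at the first bad line.
func parseLevels(lines []string, opts options) (levels []int, collected []error, fatal error) {
	for i, s := range lines {
		field, _, _ := strings.Cut(s, " ")
		n, err := strconv.Atoi(field)
		if err != nil {
			err = fmt.Errorf("line %d: invalid level %q", i+1, field)
			if !opts.Lenient {
				// Assumption: strict mode reports the first error as fatal.
				return levels, nil, err
			}
			// MaxErrors == 0 means no cap; past the cap, errors are dropped
			// but parsing continues.
			if opts.MaxErrors == 0 || len(collected) < opts.MaxErrors {
				collected = append(collected, err)
			}
			continue
		}
		levels = append(levels, n)
	}
	return levels, collected, nil
}

func main() {
	input := []string{"0 HEAD", "x BAD", "1 GEDC", "y BAD", "z BAD"}

	levels, errs, fatal := parseLevels(input, options{Lenient: true, MaxErrors: 2})
	fmt.Println(levels, len(errs), fatal)

	levels, errs, fatal = parseLevels(input, options{})
	fmt.Println(levels, len(errs), fatal)
}
```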

func (*Parser) Reset

func (p *Parser) Reset()

Reset resets the parser state for reuse.

type RawRecord added in v0.8.0

type RawRecord struct {
	// XRef is the optional cross-reference identifier (e.g., "@I1@")
	XRef string

	// Type is the tag at level 0 (e.g., "INDI", "FAM", "HEAD", "TRLR")
	Type string

	// Lines contains all parsed lines belonging to this record, including the level-0 line
	Lines []*Line

	// ByteOffset is the starting byte position of this record in the file
	ByteOffset int64

	// ByteLength is the total number of bytes for this record
	ByteLength int64
}

RawRecord represents a complete GEDCOM record with all its subordinate lines. A record starts at level 0 and includes all following lines until the next level 0.

type RecordIndex added in v0.8.0

type RecordIndex struct {
	// contains filtered or unexported fields
}

RecordIndex provides O(1) lookup of records by XRef after an O(n) build phase. The index maps XRef strings to their byte offsets in the original file.

func BuildIndex added in v0.8.0

func BuildIndex(r io.Reader) (*RecordIndex, error)

BuildIndex builds an index by scanning the entire file once. The reader should be positioned at the start of the file. After building, the reader position is at EOF.

func LoadIndex added in v0.8.0

func LoadIndex(r io.Reader) (*RecordIndex, error)

LoadIndex reads an index from the given reader. Returns an error if the index version is incompatible.

func NewRecordIndex added in v0.8.0

func NewRecordIndex() *RecordIndex

NewRecordIndex creates an empty RecordIndex.

func (*RecordIndex) Encoding added in v0.8.0

func (idx *RecordIndex) Encoding() string

Encoding returns the encoding that was detected when the index was built.

func (*RecordIndex) Len added in v0.8.0

func (idx *RecordIndex) Len() int

Len returns the total number of indexed entries (both XRef and type-based).

func (*RecordIndex) Lookup added in v0.8.0

func (idx *RecordIndex) Lookup(xref string) (IndexEntry, bool)

Lookup returns the index entry for a given XRef. Returns the entry and true if found, or zero entry and false if not found.

func (*RecordIndex) LookupByType added in v0.8.0

func (idx *RecordIndex) LookupByType(recordType string) (IndexEntry, bool)

LookupByType returns the index entry for a record type without XRef. This is useful for finding HEAD or TRLR records. Returns the entry and true if found, or zero entry and false if not found.

func (*RecordIndex) Save added in v0.8.0

func (idx *RecordIndex) Save(w io.Writer) error

Save writes the index to the given writer in gob format. The format includes a version byte for future compatibility.
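
A version byte followed by a gob payload can be sketched as follows with the standard library. The exact on-disk layout used by Save is not documented here, so treat this as one plausible shape only; entry, saveEntries, loadEntries, roundTrip, formatVersion, and errVersionMismatch are hypothetical stand-ins.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"errors"
	"fmt"
	"io"
)

// formatVersion and errVersionMismatch are local stand-ins for a format
// version constant and its mismatch error.
const formatVersion byte = 1

var errVersionMismatch = errors.New("index version mismatch")

// entry is a hypothetical index entry.
type entry struct {
	XRef       string
	Type       string
	ByteOffset int64
	ByteLength int64
}

// saveEntries writes one version byte, then the gob-encoded entries.
func saveEntries(w io.Writer, entries []entry) error {
	if _, err := w.Write([]byte{formatVersion}); err != nil {
		return err
	}
	return gob.NewEncoder(w).Encode(entries)
}

// loadEntries checks the version byte before decoding the payload.
func loadEntries(r io.Reader) ([]entry, error) {
	var v [1]byte
	if _, err := io.ReadFull(r, v[:]); err != nil {
		return nil, err
	}
	if v[0] != formatVersion {
		return nil, errVersionMismatch
	}
	var entries []entry
	if err := gob.NewDecoder(r).Decode(&entries); err != nil {
		return nil, err
	}
	return entries, nil
}

// roundTrip saves entries, overwrites the leading version byte with
// version, and loads the result back.
func roundTrip(entries []entry, version byte) ([]entry, error) {
	var buf bytes.Buffer
	if err := saveEntries(&buf, entries); err != nil {
		return nil, err
	}
	raw := buf.Bytes()
	raw[0] = version
	return loadEntries(bytes.NewReader(raw))
}

func main() {
	in := []entry{{XRef: "@I1@", Type: "INDI", ByteOffset: 14, ByteLength: 32}}

	out, err := roundTrip(in, formatVersion)
	fmt.Println(out, err)

	_, err = roundTrip(in, formatVersion+1)
	fmt.Println(err)
}
```

Checking the version before decoding lets a reader reject an incompatible file without attempting to interpret its payload.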

func (*RecordIndex) SetEncoding added in v0.8.0

func (idx *RecordIndex) SetEncoding(enc string)

SetEncoding sets the encoding string for this index. This should be set during index building to record the detected encoding.

func (*RecordIndex) Types added in v0.8.0

func (idx *RecordIndex) Types() []string

Types returns all record types without XRefs in the index.

func (*RecordIndex) XRefs added in v0.8.0

func (idx *RecordIndex) XRefs() []string

XRefs returns all XRefs in the index.

type RecordIterator added in v0.8.0

type RecordIterator struct {
	// contains filtered or unexported fields
}

RecordIterator provides streaming access to GEDCOM records. It groups lines into records (level-0 boundaries) without loading the entire file into memory.

func NewRecordIterator added in v0.8.0

func NewRecordIterator(r io.Reader) *RecordIterator

NewRecordIterator creates a new RecordIterator that reads from the given reader. The reader should already be wrapped with charset.NewReader() for encoding normalization.

func (*RecordIterator) Err added in v0.8.0

func (it *RecordIterator) Err() error

Err returns any error encountered during iteration. Should be checked after Next() returns false.

func (*RecordIterator) Next added in v0.8.0

func (it *RecordIterator) Next() bool

Next advances the iterator to the next record. Returns true if a record is available, false when iteration is complete or on error.

func (*RecordIterator) Record added in v0.8.0

func (it *RecordIterator) Record() *RawRecord

Record returns the current record. Returns nil if Next() has not been called or returned false.
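
The Next, Record, and Err methods follow the same scanner-style pattern as bufio.Scanner. The sketch below shows how such an iterator can group lines at level-0 boundaries while holding only one record in memory. It is a simplified standard-library illustration, not the package's RecordIterator; recordScanner and recordSizes are hypothetical names, and records are plain string slices.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// recordScanner is a hypothetical scanner-style iterator over records.
type recordScanner struct {
	sc      *bufio.Scanner
	pending string   // level-0 line already read that starts the next record
	current []string // lines of the record returned by Record
}

func newRecordScanner(r io.Reader) *recordScanner {
	return &recordScanner{sc: bufio.NewScanner(r)}
}

// Next advances to the next record. It returns false at end of input
// or when reading fails.
func (it *recordScanner) Next() bool {
	it.current = nil
	if it.pending != "" {
		it.current = append(it.current, it.pending)
		it.pending = ""
	}
	for it.sc.Scan() {
		line := it.sc.Text()
		if strings.HasPrefix(line, "0 ") && len(it.current) > 0 {
			it.pending = line // belongs to the next record
			return true
		}
		it.current = append(it.current, line)
	}
	return it.sc.Err() == nil && len(it.current) > 0
}

// Record returns the lines of the current record.
func (it *recordScanner) Record() []string { return it.current }

// Err returns the first read error, if any.
func (it *recordScanner) Err() error { return it.sc.Err() }

// recordSizes returns the number of lines in each record of data.
func recordSizes(data string) ([]int, error) {
	it := newRecordScanner(strings.NewReader(data))
	var sizes []int
	for it.Next() {
		sizes = append(sizes, len(it.Record()))
	}
	return sizes, it.Err()
}

func main() {
	data := "0 HEAD\n1 GEDC\n2 VERS 5.5\n0 @I1@ INDI\n1 NAME John /Smith/\n0 TRLR"
	sizes, err := recordSizes(data)
	fmt.Println(sizes, err)
}
```

As with bufio.Scanner, the loop ends on both end of input and failure, so Err has to be checked once the loop exits.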

type RecordIteratorWithOffset added in v0.8.0

type RecordIteratorWithOffset struct {
	// contains filtered or unexported fields
}

RecordIteratorWithOffset is a record iterator that tracks accurate byte offsets. It is intended for building an index, where precise offsets are required.

func NewRecordIteratorWithOffset added in v0.8.0

func NewRecordIteratorWithOffset(r io.Reader) *RecordIteratorWithOffset

NewRecordIteratorWithOffset creates an iterator with accurate byte offset tracking.

func (*RecordIteratorWithOffset) Err added in v0.8.0

func (it *RecordIteratorWithOffset) Err() error

Err returns any error encountered during iteration.

func (*RecordIteratorWithOffset) Next added in v0.8.0

func (it *RecordIteratorWithOffset) Next() bool

Next advances the iterator to the next record.

func (*RecordIteratorWithOffset) Record added in v0.8.0

func (it *RecordIteratorWithOffset) Record() *RawRecord

Record returns the current record.
