Documentation ¶
Overview ¶
Package parser provides low-level GEDCOM line parsing functionality.
This package handles the tokenization and parsing of individual GEDCOM lines, converting them into Line structures with level, tag, value, and cross-reference information. It supports all standard GEDCOM formats and provides detailed error reporting with line numbers.
Example usage:
	p := parser.NewParser()
	lines, err := p.Parse(reader)
	if err != nil {
		log.Fatal(err)
	}
	for _, line := range lines {
		fmt.Printf("Level %d: %s = %s\n", line.Level, line.Tag, line.Value)
	}
Example ¶
Example demonstrates basic GEDCOM line parsing.
package main

import (
	"fmt"
	"strings"

	"github.com/cacack/gedcom-go/parser"
)

func main() {
	// GEDCOM data as a string (typically read from a file)
	gedcomData := `0 HEAD
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME John /Smith/
0 TRLR`

	// Create a parser and parse the content
	p := parser.NewParser()
	lines, err := p.Parse(strings.NewReader(gedcomData))
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		return
	}
	fmt.Printf("Parsed %d lines\n", len(lines))
}
Output: Parsed 6 lines
Index ¶
- Constants
- Variables
- func Records(r io.Reader) iter.Seq2[*RawRecord, error]
- func RecordsWithOffset(r io.Reader) iter.Seq2[*RawRecord, error]
- type IndexEntry
- type LazyParser
- func (p *LazyParser) AllRecords() iter.Seq2[*RawRecord, error]
- func (p *LazyParser) BuildIndex() error
- func (p *LazyParser) FindRecord(xref string) (*RawRecord, error)
- func (p *LazyParser) FindRecordByType(recordType string) (*RawRecord, error)
- func (p *LazyParser) HasIndex() bool
- func (p *LazyParser) Index() *RecordIndex
- func (p *LazyParser) Iterate() *RecordIterator
- func (p *LazyParser) IterateAll() (*RecordIterator, error)
- func (p *LazyParser) IterateFrom(offset int64) (*RecordIterator, error)
- func (p *LazyParser) LoadIndex(r io.Reader) error
- func (p *LazyParser) RecordCount() int
- func (p *LazyParser) Records() iter.Seq2[*RawRecord, error]
- func (p *LazyParser) RecordsFrom(offset int64) iter.Seq2[*RawRecord, error]
- func (p *LazyParser) SaveIndex(w io.Writer) error
- func (p *LazyParser) XRefs() []string
- type Line
- type ParseError
- type ParseOptions
- type Parser
- type RawRecord
- type RecordIndex
- func (idx *RecordIndex) Encoding() string
- func (idx *RecordIndex) Len() int
- func (idx *RecordIndex) Lookup(xref string) (IndexEntry, bool)
- func (idx *RecordIndex) LookupByType(recordType string) (IndexEntry, bool)
- func (idx *RecordIndex) Save(w io.Writer) error
- func (idx *RecordIndex) SetEncoding(enc string)
- func (idx *RecordIndex) Types() []string
- func (idx *RecordIndex) XRefs() []string
- type RecordIterator
- type RecordIteratorWithOffset
Constants ¶
const IndexVersion byte = 1
IndexVersion is the current version of the index format. Increment when making incompatible changes to the index structure.
const MaxNestingDepth = 100
MaxNestingDepth is the maximum allowed nesting depth to prevent stack overflow.
Variables ¶
var ErrIndexVersionMismatch = errors.New("index version mismatch")
ErrIndexVersionMismatch is returned when loading an index with an incompatible version.
var ErrNoIndex = errors.New("no index available: call BuildIndex or LoadIndex first")
ErrNoIndex is returned when attempting to use indexed operations without an index.
var ErrRecordNotFound = errors.New("record not found")
ErrRecordNotFound is returned when a record cannot be found.
Functions ¶
func Records ¶ added in v1.2.0
func Records(r io.Reader) iter.Seq2[*RawRecord, error]
Records returns an iterator over GEDCOM records using Go 1.23 range-over-func. It yields (*RawRecord, nil) for each successfully parsed record. On parse error, it yields (nil, error) and stops iteration.
This function provides a modern, idiomatic alternative to RecordIterator for streaming GEDCOM record processing. The reader should already be wrapped with charset.NewReader() for encoding normalization.
Usage:
	for record, err := range parser.Records(reader) {
		if err != nil {
			return err
		}
		// process record
	}
Early termination is supported - breaking from the loop will stop iteration:
	for record, err := range parser.Records(reader) {
		if err != nil {
			return err
		}
		if record.Type == "TRLR" {
			break // stop at trailer
		}
	}
func RecordsWithOffset ¶ added in v1.2.0
func RecordsWithOffset(r io.Reader) iter.Seq2[*RawRecord, error]
RecordsWithOffset returns an iterator over GEDCOM records with accurate byte offset tracking. It yields (*RawRecord, nil) for each successfully parsed record. On parse error, it yields (nil, error) and stops iteration.
This is the range-over-func equivalent of RecordIteratorWithOffset, providing precise ByteOffset and ByteLength values suitable for building file indexes. The reader should already be wrapped with charset.NewReader() for encoding normalization.
Usage:
	for record, err := range parser.RecordsWithOffset(reader) {
		if err != nil {
			return err
		}
		fmt.Printf("Record %s at offset %d, length %d\n",
			record.Type, record.ByteOffset, record.ByteLength)
	}
Types ¶
type IndexEntry ¶ added in v0.8.0
type IndexEntry struct {
	// XRef is the cross-reference identifier (e.g., "@I1@")
	// Empty for records without XRefs (like HEAD, TRLR)
	XRef string
	// Type is the record type tag (e.g., "INDI", "FAM", "HEAD")
	Type string
	// ByteOffset is the starting byte position of this record in the file
	ByteOffset int64
	// ByteLength is the total number of bytes for this record
	ByteLength int64
}
IndexEntry represents a single record's location in a GEDCOM file.
type LazyParser ¶ added in v0.8.0
type LazyParser struct {
	// contains filtered or unexported fields
}
LazyParser provides lazy/incremental parsing of GEDCOM files. It combines streaming iteration with indexed random access for efficient partial file processing.
func NewLazyParser ¶ added in v0.8.0
func NewLazyParser(rs io.ReadSeeker) *LazyParser
NewLazyParser creates a new LazyParser from an io.ReadSeeker. The reader must support seeking for indexed access operations.
func (*LazyParser) AllRecords ¶ added in v1.2.0
func (p *LazyParser) AllRecords() iter.Seq2[*RawRecord, error]
AllRecords seeks to the beginning and returns an iterator over all records using Go 1.23 range-over-func. This is the range-over-func equivalent of LazyParser.IterateAll.
Usage:
	for record, err := range lp.AllRecords() {
		if err != nil {
			return err
		}
		// process record
	}
func (*LazyParser) BuildIndex ¶ added in v0.8.0
func (p *LazyParser) BuildIndex() error
BuildIndex scans the entire file to build an index for O(1) record lookup. After calling BuildIndex, use FindRecord for efficient random access. The reader is rewound to the beginning after building the index.
func (*LazyParser) FindRecord ¶ added in v0.8.0
func (p *LazyParser) FindRecord(xref string) (*RawRecord, error)
FindRecord locates and parses a specific record by XRef. Requires an index to be built or loaded first. Returns ErrNoIndex if no index is available. Returns ErrRecordNotFound if the XRef is not in the index.
func (*LazyParser) FindRecordByType ¶ added in v0.8.0
func (p *LazyParser) FindRecordByType(recordType string) (*RawRecord, error)
FindRecordByType locates and parses a record by type (for records without XRef). This is useful for finding HEAD or TRLR records. Requires an index to be built or loaded first.
func (*LazyParser) HasIndex ¶ added in v0.8.0
func (p *LazyParser) HasIndex() bool
HasIndex returns true if an index is available.
func (*LazyParser) Index ¶ added in v0.8.0
func (p *LazyParser) Index() *RecordIndex
Index returns the current index, or nil if none is loaded.
func (*LazyParser) Iterate ¶ added in v0.8.0
func (p *LazyParser) Iterate() *RecordIterator
Iterate returns a RecordIterator for streaming through records. The iterator starts from the current position of the reader. For full file iteration, seek to the beginning first.
func (*LazyParser) IterateAll ¶ added in v0.8.0
func (p *LazyParser) IterateAll() (*RecordIterator, error)
IterateAll seeks to the beginning and returns an iterator for all records.
func (*LazyParser) IterateFrom ¶ added in v0.8.0
func (p *LazyParser) IterateFrom(offset int64) (*RecordIterator, error)
IterateFrom seeks to the given byte offset and returns an iterator. This allows resuming iteration from a known position.
func (*LazyParser) LoadIndex ¶ added in v0.8.0
func (p *LazyParser) LoadIndex(r io.Reader) error
LoadIndex loads a pre-built index from the given reader. Using a pre-built index avoids the O(n) scan of BuildIndex.
func (*LazyParser) RecordCount ¶ added in v0.8.0
func (p *LazyParser) RecordCount() int
RecordCount returns the total number of indexed records. Returns 0 if no index is available.
func (*LazyParser) Records ¶ added in v1.2.0
func (p *LazyParser) Records() iter.Seq2[*RawRecord, error]
Records returns an iterator over records from the current position using Go 1.23 range-over-func. This is the range-over-func equivalent of LazyParser.Iterate.
Usage:
	for record, err := range lp.Records() {
		if err != nil {
			return err
		}
		// process record
	}
func (*LazyParser) RecordsFrom ¶ added in v1.2.0
func (p *LazyParser) RecordsFrom(offset int64) iter.Seq2[*RawRecord, error]
RecordsFrom seeks to the given byte offset and returns an iterator over records using Go 1.23 range-over-func. This is the range-over-func equivalent of LazyParser.IterateFrom.
If seeking fails, the error is yielded as the first iteration result.
Usage:
	for record, err := range lp.RecordsFrom(offset) {
		if err != nil {
			return err
		}
		// process record
	}
func (*LazyParser) SaveIndex ¶ added in v0.8.0
func (p *LazyParser) SaveIndex(w io.Writer) error
SaveIndex writes the current index to the given writer. Returns ErrNoIndex if no index has been built or loaded.
func (*LazyParser) XRefs ¶ added in v0.8.0
func (p *LazyParser) XRefs() []string
XRefs returns all XRefs in the index. Returns nil if no index is available.
type Line ¶
type Line struct {
	// Level indicates the hierarchical depth (0, 1, 2, etc.)
	Level int
	// Tag is the GEDCOM tag (e.g., HEAD, INDI, NAME, BIRT)
	Tag string
	// Value is the optional value associated with the tag
	Value string
	// XRef is the optional cross-reference identifier (e.g., @I1@)
	XRef string
	// LineNumber is the line number in the source file (1-based)
	// Used for error reporting
	LineNumber int
}
Line represents a single parsed line from a GEDCOM file. GEDCOM files use a line-based format with hierarchical levels. Each line has the form LEVEL [XREF] TAG [VALUE], where the bracketed parts are optional.
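To make the line format concrete, the following stand-alone sketch tokenizes one line into level, optional XRef, tag, and optional value. It only illustrates the format: splitLine and the local line type are hypothetical and are not the package's ParseLine, and the validation rules shown (for example, treating an XRef without a following tag as an error) are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// line mirrors the fields described above; it is a local stand-in,
// not the package's Line type.
type line struct {
	Level int
	XRef  string
	Tag   string
	Value string
}

// splitLine tokenizes one GEDCOM line of the form LEVEL [XREF] TAG [VALUE].
func splitLine(s string) (line, error) {
	var out line
	s = strings.TrimRight(s, "\r\n")

	// LEVEL is everything up to the first space.
	levelStr, rest, ok := strings.Cut(s, " ")
	if !ok {
		return out, errors.New("missing tag")
	}
	level, err := strconv.Atoi(levelStr)
	if err != nil || level < 0 {
		return out, fmt.Errorf("invalid level %q", levelStr)
	}
	out.Level = level

	// An optional XREF is delimited by @ signs and precedes the tag.
	if strings.HasPrefix(rest, "@") {
		xref, after, found := strings.Cut(rest, " ")
		if !found || len(xref) < 3 || !strings.HasSuffix(xref, "@") {
			return out, fmt.Errorf("malformed xref in %q", s)
		}
		out.XRef, rest = xref, after
	}

	// TAG is the next token; whatever follows is the value, kept verbatim.
	out.Tag, out.Value, _ = strings.Cut(rest, " ")
	if out.Tag == "" {
		return out, errors.New("missing tag")
	}
	return out, nil
}

func main() {
	for _, s := range []string{"0 @I1@ INDI", "1 NAME John /Smith/", "0 TRLR"} {
		l, err := splitLine(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d %q %q %q\n", l.Level, l.XRef, l.Tag, l.Value)
	}
}
```

The value is cut at the first space after the tag and otherwise left untouched, so embedded spaces and slashes such as those in "John /Smith/" survive.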
Example ¶
ExampleLine demonstrates accessing parsed Line fields.
package main

import (
	"fmt"

	"github.com/cacack/gedcom-go/parser"
)

func main() {
	p := parser.NewParser()

	// Parse a line with XRef (cross-reference identifier)
	line1, _ := p.ParseLine("0 @I1@ INDI")
	fmt.Printf("XRef: %s, Tag: %s\n", line1.XRef, line1.Tag)

	// Parse a line with a value
	line2, _ := p.ParseLine("1 NAME John /Smith/")
	fmt.Printf("Tag: %s, Value: %s\n", line2.Tag, line2.Value)

	// Parse a nested line
	line3, _ := p.ParseLine("2 GIVN John")
	fmt.Printf("Level: %d, Tag: %s, Value: %s\n", line3.Level, line3.Tag, line3.Value)
}
Output:
XRef: @I1@, Tag: INDI
Tag: NAME, Value: John /Smith/
Level: 2, Tag: GIVN, Value: John
Example (LineNumber) ¶
ExampleLine_lineNumber shows how line numbers are tracked for error reporting.
package main

import (
	"fmt"

	"github.com/cacack/gedcom-go/parser"
)

func main() {
	p := parser.NewParser()

	// Parse multiple lines - line numbers are tracked automatically
	lines := []string{
		"0 HEAD",
		"1 SOUR MyApp",
		"0 @I1@ INDI",
	}
	for _, input := range lines {
		line, _ := p.ParseLine(input)
		fmt.Printf("Line %d: %s\n", line.LineNumber, line.Tag)
	}
}
Output:
Line 1: HEAD
Line 2: SOUR
Line 3: INDI
type ParseError ¶
type ParseError struct {
	// Line is the line number where the error occurred (1-based)
	Line int
	// Message describes what went wrong
	Message string
	// Context provides the actual line content that caused the error
	Context string
	// Err is the underlying error, if any
	Err error
}
ParseError represents an error that occurred during parsing. It includes line number and context for better error reporting.
func (*ParseError) Error ¶
func (e *ParseError) Error() string
func (*ParseError) Unwrap ¶
func (e *ParseError) Unwrap() error
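Because ParseError implements both Error and Unwrap, callers can use errors.As to recover the line number and errors.Is to test for the underlying cause. The sketch below shows that pattern with a local stand-in type. The message format and the helpers checkLevel, lineOf, and isSyntaxError are assumptions made for illustration; the package's actual error text is not documented here.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parseError is a local stand-in with the same fields as ParseError above.
type parseError struct {
	Line    int
	Message string
	Context string
	Err     error
}

// Error formats the line number, message, and offending content.
// The exact wording is an assumption, not the package's format.
func (e *parseError) Error() string {
	msg := fmt.Sprintf("line %d: %s", e.Line, e.Message)
	if e.Context != "" {
		msg += fmt.Sprintf(" (context: %q)", e.Context)
	}
	return msg
}

// Unwrap exposes the underlying error to errors.Is and errors.As.
func (e *parseError) Unwrap() error { return e.Err }

// checkLevel wraps a strconv failure in a parseError (hypothetical helper).
func checkLevel(lineNo int, raw, levelStr string) error {
	if _, err := strconv.Atoi(levelStr); err != nil {
		return &parseError{Line: lineNo, Message: "invalid level", Context: raw, Err: err}
	}
	return nil
}

// lineOf returns the line number carried by a parseError, or 0 if err is not one.
func lineOf(err error) int {
	var pe *parseError
	if errors.As(err, &pe) {
		return pe.Line
	}
	return 0
}

// isSyntaxError reports whether the chain reaches strconv.ErrSyntax via Unwrap.
func isSyntaxError(err error) bool {
	return errors.Is(err, strconv.ErrSyntax)
}

func main() {
	err := checkLevel(3, "x NAME John", "x")
	fmt.Println(err)
	fmt.Println("line:", lineOf(err), "syntax error:", isSyntaxError(err))
}
```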
type ParseOptions ¶ added in v1.1.0
type ParseOptions struct {
	// Lenient controls error handling behavior.
	// If true, the parser collects errors and continues parsing.
	// If false (default), the parser fails on the first error.
	Lenient bool
	// MaxErrors is the maximum number of errors to collect in lenient mode.
	// When reached, parsing continues but errors are no longer collected.
	// A value of 0 means unlimited errors will be collected.
	MaxErrors int
}
ParseOptions configures the behavior of ParseWithOptions.
type Parser ¶
type Parser struct {
	// contains filtered or unexported fields
}
Parser parses GEDCOM files into Line structures.
func (*Parser) Parse ¶
Parse reads a GEDCOM file from a reader and returns all parsed lines. Supports all line ending styles: LF (Unix), CRLF (Windows), CR (old Macintosh).
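One way to accept all three line-ending styles with the standard library is a custom bufio.SplitFunc, since the default bufio.ScanLines does not split on a lone CR. The sketch below is an illustration under that assumption; scanAnyEOL and splitLines are hypothetical names and this is not necessarily how Parse is implemented.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanAnyEOL is a bufio.SplitFunc that ends a line at LF, CRLF, or a lone CR.
func scanAnyEOL(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			return i + 1, data[:i], nil
		}
		// data[i] is CR: it may be the first half of CRLF.
		if i+1 < len(data) {
			if data[i+1] == '\n' {
				return i + 2, data[:i], nil
			}
			return i + 1, data[:i], nil
		}
		// CR is the last byte seen so far; wait for more input unless at EOF.
		if atEOF {
			return i + 1, data[:i], nil
		}
		return 0, nil, nil
	}
	if atEOF {
		return len(data), data, nil // final line without a terminator
	}
	return 0, nil, nil // request more data
}

// splitLines returns the lines of s regardless of line-ending style.
func splitLines(s string) []string {
	sc := bufio.NewScanner(strings.NewReader(s))
	sc.Split(scanAnyEOL)
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines
}

func main() {
	// Mixed CRLF, CR, and LF endings in one input.
	fmt.Printf("%q\n", splitLines("0 HEAD\r\n1 GEDC\r2 VERS 5.5\n0 TRLR"))
	// prints ["0 HEAD" "1 GEDC" "2 VERS 5.5" "0 TRLR"]
}
```

The only subtle case is a CR at the very end of the buffered data: the function asks for more input before deciding, so a CRLF pair split across two reads is still treated as one terminator.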
Example ¶
ExampleParser_Parse shows how to parse complete GEDCOM content from an io.Reader.
package main

import (
	"fmt"
	"strings"

	"github.com/cacack/gedcom-go/parser"
)

func main() {
	gedcomData := `0 HEAD
1 GEDC
2 VERS 5.5.1
0 @I1@ INDI
1 NAME Alice /Johnson/
1 SEX F
0 @I2@ INDI
1 NAME Bob /Johnson/
1 SEX M
0 TRLR`

	p := parser.NewParser()
	lines, err := p.Parse(strings.NewReader(gedcomData))
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		return
	}

	// Count records at level 0 (top-level records)
	records := 0
	for _, line := range lines {
		if line.Level == 0 {
			records++
		}
	}
	fmt.Printf("Total lines: %d\n", len(lines))
	fmt.Printf("Top-level records: %d\n", records)
}
Output:
Total lines: 10
Top-level records: 4
func (*Parser) ParseLine ¶
ParseLine parses a single GEDCOM line. GEDCOM line format: LEVEL [XREF] TAG [VALUE]
Examples:
	0 HEAD
	0 @I1@ INDI
	1 NAME John /Smith/
	2 GIVN John
Example ¶
ExampleParser_ParseLine shows line-by-line parsing for streaming scenarios.
package main

import (
	"fmt"

	"github.com/cacack/gedcom-go/parser"
)

func main() {
	// Line-by-line parsing is useful for streaming or custom parsing logic
	p := parser.NewParser()
	inputLines := []string{
		"0 HEAD",
		"1 GEDC",
		"2 VERS 5.5",
		"0 @I1@ INDI",
		"1 NAME John /Smith/",
	}
	for _, input := range inputLines {
		line, err := p.ParseLine(input)
		if err != nil {
			fmt.Printf("Error: %v\n", err)
			return
		}
		fmt.Printf("Level %d: %s\n", line.Level, line.Tag)
	}
}
Output:
Level 0: HEAD
Level 1: GEDC
Level 2: VERS
Level 0: INDI
Level 1: NAME
func (*Parser) ParseWithOptions ¶ added in v1.1.0
func (p *Parser) ParseWithOptions(r io.Reader, opts *ParseOptions) (
	lines []*Line,
	parseErrors []*ParseError,
	fatalErr error,
)
ParseWithOptions reads a GEDCOM file with configurable error handling. In lenient mode, it collects parse errors and continues parsing. Returns:
- lines: successfully parsed lines (may be partial in lenient mode)
- parseErrors: syntax errors encountered (only populated in lenient mode)
- fatalErr: unrecoverable errors like I/O failures
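The three return values and the MaxErrors cap can be illustrated with a small stand-alone sketch that only validates the level of each line. The names options and parseLevels are hypothetical, and the assumption that strict mode reports the first bad line through the fatal error is mine; the text above does not state how strict-mode syntax errors are returned.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// options mirrors the two ParseOptions fields described above (local stand-in).
type options struct {
	Lenient   bool
	MaxErrors int
}

// parseLevels extracts the level of each line. It returns the levels parsed so far,
// the per-line errors collected in lenient mode, and a fatal error in strict mode.
func parseLevels(input []string, opts options) (levels []int, lineErrs []error, fatal error) {
	for i, raw := range input {
		levelStr, _, _ := strings.Cut(raw, " ")
		level, err := strconv.Atoi(levelStr)
		if err != nil {
			err = fmt.Errorf("line %d: invalid level %q", i+1, levelStr)
			if !opts.Lenient {
				return levels, nil, err // strict: stop at the first bad line (assumed)
			}
			// Lenient: record the error unless the cap is reached, then keep going.
			if opts.MaxErrors == 0 || len(lineErrs) < opts.MaxErrors {
				lineErrs = append(lineErrs, err)
			}
			continue
		}
		levels = append(levels, level)
	}
	return levels, lineErrs, nil
}

func main() {
	input := []string{"0 HEAD", "x NAME", "y DATE", "0 TRLR"}

	// Lenient with a cap of one: both bad lines are skipped, one error is kept.
	levels, errs, fatal := parseLevels(input, options{Lenient: true, MaxErrors: 1})
	fmt.Println(levels, len(errs), fatal)

	// Strict (zero value): parsing stops at line 2.
	_, _, fatal = parseLevels(input, options{})
	fmt.Println(fatal)
}
```

Note that reaching MaxErrors does not stop parsing in this sketch; it only stops collecting, which matches the field comment on ParseOptions.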
type RawRecord ¶ added in v0.8.0
type RawRecord struct {
	// XRef is the optional cross-reference identifier (e.g., "@I1@")
	XRef string
	// Type is the tag at level 0 (e.g., "INDI", "FAM", "HEAD", "TRLR")
	Type string
	// Lines contains all parsed lines belonging to this record, including the level-0 line
	Lines []*Line
	// ByteOffset is the starting byte position of this record in the file
	ByteOffset int64
	// ByteLength is the total number of bytes for this record
	ByteLength int64
}
RawRecord represents a complete GEDCOM record with all its subordinate lines. A record starts at level 0 and includes all following lines until the next level 0.
type RecordIndex ¶ added in v0.8.0
type RecordIndex struct {
	// contains filtered or unexported fields
}
RecordIndex provides O(1) lookup of records by XRef after an O(n) build phase. The index maps XRef strings to their byte offsets in the original file.
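The idea behind the index can be sketched in a few lines: one pass over the data records the byte offset and length of every level-0 record, and a later lookup seeks straight to that range. Everything here is a hypothetical stand-in (entry, buildIndex, readRecord, lookupText), and the sketch assumes LF or CRLF line endings and an input that fits in memory, which the real LazyParser does not require.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// entry is a local stand-in for IndexEntry.
type entry struct {
	XRef, Type             string
	ByteOffset, ByteLength int64
}

// buildIndex scans data once and records where each level-0 record starts and
// how long it is. Records with an XRef are keyed by XRef, the others by type.
func buildIndex(data string) (byXRef, byType map[string]entry) {
	byXRef, byType = map[string]entry{}, map[string]entry{}
	var cur *entry
	flush := func(end int64) {
		if cur == nil {
			return
		}
		cur.ByteLength = end - cur.ByteOffset
		if cur.XRef != "" {
			byXRef[cur.XRef] = *cur
		} else {
			byType[cur.Type] = *cur
		}
	}
	var offset int64
	for _, ln := range strings.SplitAfter(data, "\n") {
		if strings.HasPrefix(ln, "0 ") {
			flush(offset) // the previous record ends where this one starts
			f := strings.Fields(ln)
			cur = &entry{ByteOffset: offset}
			if len(f) >= 3 && strings.HasPrefix(f[1], "@") {
				cur.XRef, cur.Type = f[1], f[2]
			} else if len(f) >= 2 {
				cur.Type = f[1]
			}
		}
		offset += int64(len(ln))
	}
	flush(offset)
	return byXRef, byType
}

// readRecord seeks to the entry's offset and reads exactly its bytes.
func readRecord(rs io.ReadSeeker, e entry) (string, error) {
	if _, err := rs.Seek(e.ByteOffset, io.SeekStart); err != nil {
		return "", err
	}
	buf := make([]byte, e.ByteLength)
	if _, err := io.ReadFull(rs, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

// lookupText builds the index and returns the raw text of the record with the given XRef.
func lookupText(data, xref string) (string, bool) {
	byXRef, _ := buildIndex(data)
	e, ok := byXRef[xref]
	if !ok {
		return "", false
	}
	text, err := readRecord(strings.NewReader(data), e)
	return text, err == nil
}

func main() {
	data := "0 HEAD\n1 GEDC\n0 @I1@ INDI\n1 NAME John /Smith/\n0 TRLR\n"
	text, ok := lookupText(data, "@I1@")
	fmt.Printf("%q %v\n", text, ok)
}
```

In real use the index would be built once and reused across lookups; lookupText rebuilds it on every call only to keep the example short.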
func BuildIndex ¶ added in v0.8.0
func BuildIndex(r io.Reader) (*RecordIndex, error)
BuildIndex builds an index by scanning the entire file once. The reader should be positioned at the start of the file. After building, the reader position is at EOF.
func LoadIndex ¶ added in v0.8.0
func LoadIndex(r io.Reader) (*RecordIndex, error)
LoadIndex reads an index from the given reader. Returns an error if the index version is incompatible.
func NewRecordIndex ¶ added in v0.8.0
func NewRecordIndex() *RecordIndex
NewRecordIndex creates an empty RecordIndex.
func (*RecordIndex) Encoding ¶ added in v0.8.0
func (idx *RecordIndex) Encoding() string
Encoding returns the encoding that was detected when the index was built.
func (*RecordIndex) Len ¶ added in v0.8.0
func (idx *RecordIndex) Len() int
Len returns the total number of indexed entries (both XRef and type-based).
func (*RecordIndex) Lookup ¶ added in v0.8.0
func (idx *RecordIndex) Lookup(xref string) (IndexEntry, bool)
Lookup returns the index entry for a given XRef. Returns the entry and true if found, or zero entry and false if not found.
func (*RecordIndex) LookupByType ¶ added in v0.8.0
func (idx *RecordIndex) LookupByType(recordType string) (IndexEntry, bool)
LookupByType returns the index entry for a record type without XRef. This is useful for finding HEAD or TRLR records. Returns the entry and true if found, or zero entry and false if not found.
func (*RecordIndex) Save ¶ added in v0.8.0
func (idx *RecordIndex) Save(w io.Writer) error
Save writes the index to the given writer in gob format. The format includes a version byte for future compatibility.
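A versioned gob stream can be sketched as a single leading version byte followed by the gob payload, with the loader rejecting any other version before decoding. The byte layout, the map payload, and the names saveIndex, loadIndex, and roundTrip are assumptions for illustration; the actual on-disk layout used by Save is not specified above.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"errors"
	"fmt"
	"io"
)

const indexVersion byte = 1

var errVersionMismatch = errors.New("index version mismatch")

// saveIndex writes one version byte followed by the gob-encoded offsets.
func saveIndex(w io.Writer, offsets map[string]int64) error {
	if _, err := w.Write([]byte{indexVersion}); err != nil {
		return err
	}
	return gob.NewEncoder(w).Encode(offsets)
}

// loadIndex checks the version byte before decoding the payload.
func loadIndex(r io.Reader) (map[string]int64, error) {
	var version [1]byte
	if _, err := io.ReadFull(r, version[:]); err != nil {
		return nil, err
	}
	if version[0] != indexVersion {
		return nil, fmt.Errorf("%w: got %d, want %d", errVersionMismatch, version[0], indexVersion)
	}
	var offsets map[string]int64
	if err := gob.NewDecoder(r).Decode(&offsets); err != nil {
		return nil, err
	}
	return offsets, nil
}

// roundTrip saves to memory and loads back; bumpVersion simulates an index
// written by an incompatible format version.
func roundTrip(offsets map[string]int64, bumpVersion bool) (map[string]int64, error) {
	var buf bytes.Buffer
	if err := saveIndex(&buf, offsets); err != nil {
		return nil, err
	}
	raw := buf.Bytes()
	if bumpVersion {
		raw[0]++
	}
	return loadIndex(bytes.NewReader(raw))
}

func main() {
	got, err := roundTrip(map[string]int64{"@I1@": 14}, false)
	fmt.Println(got, err)

	_, err = roundTrip(map[string]int64{"@I1@": 14}, true)
	fmt.Println(errors.Is(err, errVersionMismatch))
}
```

Wrapping the sentinel with %w keeps errors.Is working while still reporting which version was found.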
func (*RecordIndex) SetEncoding ¶ added in v0.8.0
func (idx *RecordIndex) SetEncoding(enc string)
SetEncoding sets the encoding string for this index. This should be set during index building to record the detected encoding.
func (*RecordIndex) Types ¶ added in v0.8.0
func (idx *RecordIndex) Types() []string
Types returns all record types without XRefs in the index.
func (*RecordIndex) XRefs ¶ added in v0.8.0
func (idx *RecordIndex) XRefs() []string
XRefs returns all XRefs in the index.
type RecordIterator ¶ added in v0.8.0
type RecordIterator struct {
	// contains filtered or unexported fields
}
RecordIterator provides streaming access to GEDCOM records. It groups lines into records (level-0 boundaries) without loading the entire file into memory.
func NewRecordIterator ¶ added in v0.8.0
func NewRecordIterator(r io.Reader) *RecordIterator
NewRecordIterator creates a new RecordIterator that reads from the given reader. The reader should already be wrapped with charset.NewReader() for encoding normalization.
func (*RecordIterator) Err ¶ added in v0.8.0
func (it *RecordIterator) Err() error
Err returns any error encountered during iteration. Should be checked after Next() returns false.
func (*RecordIterator) Next ¶ added in v0.8.0
func (it *RecordIterator) Next() bool
Next advances the iterator to the next record. Returns true if a record is available, false when iteration is complete or on error.
func (*RecordIterator) Record ¶ added in v0.8.0
func (it *RecordIterator) Record() *RawRecord
Record returns the current record. Returns nil if Next() has not been called or returned false.
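The Next, Record, and Err methods follow the same shape as bufio.Scanner: loop while Next returns true, read the current item with Record, and check Err once the loop ends. The sketch below reproduces that shape with a local type that groups lines at level-0 boundaries. recordScanner and recordSizes are hypothetical, records are plain string slices rather than RawRecord values, and no line validation is performed.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// recordScanner is a local stand-in showing the Next/Record/Err shape.
type recordScanner struct {
	sc      *bufio.Scanner
	pending string   // level-0 line already read that starts the next record
	cur     []string // lines of the current record
	err     error
}

func newRecordScanner(r io.Reader) *recordScanner {
	return &recordScanner{sc: bufio.NewScanner(r)}
}

// Next reads lines up to the following level-0 line and reports whether a record is ready.
func (it *recordScanner) Next() bool {
	it.cur = nil
	if it.pending != "" {
		it.cur = append(it.cur, it.pending)
		it.pending = ""
	}
	for it.sc.Scan() {
		text := it.sc.Text()
		if strings.HasPrefix(text, "0 ") && len(it.cur) > 0 {
			it.pending = text // belongs to the next record
			return true
		}
		if text != "" {
			it.cur = append(it.cur, text)
		}
	}
	if it.err = it.sc.Err(); it.err != nil {
		it.cur = nil
		return false
	}
	return len(it.cur) > 0 // the final record has no following level-0 line
}

// Record returns the lines of the current record, or nil when none is available.
func (it *recordScanner) Record() []string { return it.cur }

// Err returns the first read error, if any.
func (it *recordScanner) Err() error { return it.err }

// recordSizes returns the number of lines in each record.
func recordSizes(data string) ([]int, error) {
	it := newRecordScanner(strings.NewReader(data))
	var sizes []int
	for it.Next() {
		sizes = append(sizes, len(it.Record()))
	}
	return sizes, it.Err() // check Err after Next returns false
}

func main() {
	sizes, err := recordSizes("0 HEAD\n1 GEDC\n2 VERS 5.5\n0 @I1@ INDI\n1 NAME John /Smith/\n0 TRLR\n")
	fmt.Println(sizes, err) // prints [3 2 1] <nil>
}
```

Only one record is held at a time, which is what keeps memory use independent of file size.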
type RecordIteratorWithOffset ¶ added in v0.8.0
type RecordIteratorWithOffset struct {
	// contains filtered or unexported fields
}
RecordIteratorWithOffset is a record iterator that tracks accurate byte offsets. It is intended for building an index, where precise ByteOffset and ByteLength values are required.
func NewRecordIteratorWithOffset ¶ added in v0.8.0
func NewRecordIteratorWithOffset(r io.Reader) *RecordIteratorWithOffset
NewRecordIteratorWithOffset creates an iterator with accurate byte offset tracking.
func (*RecordIteratorWithOffset) Err ¶ added in v0.8.0
func (it *RecordIteratorWithOffset) Err() error
Err returns any error encountered during iteration.
func (*RecordIteratorWithOffset) Next ¶ added in v0.8.0
func (it *RecordIteratorWithOffset) Next() bool
Next advances the iterator to the next record.
func (*RecordIteratorWithOffset) Record ¶ added in v0.8.0
func (it *RecordIteratorWithOffset) Record() *RawRecord
Record returns the current record.