Processing Files Line by Line in Go
Processing files line by line is a common task in Go. Here are the practical approaches, from simplest to most flexible.
Using bufio.Scanner
The most straightforward method uses bufio.Scanner, which handles line splitting automatically:
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

func main() {
	file, err := os.Open("input.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text()
		fmt.Println(line)
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
Scanner handles most cases well. By default it splits on newlines, but you can customize the split function if needed.
Setting Buffer Size
By default, Scanner refuses tokens longer than 64KB (bufio.MaxScanTokenSize); a longer line stops the scan with bufio.ErrTooLong. For larger lines, give it a bigger buffer:
scanner := bufio.NewScanner(file)
buf := make([]byte, 0, 64*1024)
scanner.Buffer(buf, 1024*1024) // 1MB max line length
for scanner.Scan() {
	line := scanner.Text()
	// process line
}
Custom Split Function
Process by delimiter other than newlines:
scanner := bufio.NewScanner(file)
scanner.Split(bufio.ScanWords) // splits on whitespace
for scanner.Scan() {
	word := scanner.Text()
	// process word
}
Or define your own split function for comma-separated values or other formats:
scanner := bufio.NewScanner(file)
scanner.Split(func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil // nothing left to emit
	}
	for i := 0; i < len(data); i++ {
		if data[i] == ',' {
			return i + 1, data[:i], nil
		}
	}
	if !atEOF {
		return 0, nil, nil // request more data
	}
	return len(data), data, bufio.ErrFinalToken
})
for scanner.Scan() {
	field := scanner.Text()
	// process field
}
Using bufio.Reader
For more control, use bufio.Reader directly with ReadString or ReadLine:
reader := bufio.NewReader(file)
for {
	line, err := reader.ReadString('\n')
	// A final line with no trailing newline arrives together with io.EOF,
	// so process the data before acting on the error.
	if len(line) > 0 {
		fmt.Print(line)
	}
	if err != nil {
		if err == io.EOF {
			break
		}
		log.Fatal(err)
	}
}
ReadString includes the delimiter. Use strings.TrimSuffix(line, "\n") to remove it if needed. This approach gives you more granular error handling per line.
Handling Different Line Endings
Windows files use \r\n. Scanner handles this automatically, but if using ReadString, strip both:
import "strings"
line = strings.TrimRight(line, "\r\n")
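Note that TrimRight with the cutset "\r\n" strips any trailing run of those characters, not just one line ending. A more precise sketch uses TrimSuffix twice (trimEOL is an illustrative helper name):

```go
package main

import (
	"fmt"
	"strings"
)

// trimEOL strips a single trailing "\r\n" or "\n", leaving other
// trailing characters untouched.
func trimEOL(line string) string {
	line = strings.TrimSuffix(line, "\n")
	return strings.TrimSuffix(line, "\r")
}

func main() {
	fmt.Println(trimEOL("unix line\n"))      // "unix line"
	fmt.Println(trimEOL("windows line\r\n")) // "windows line"
	fmt.Println(trimEOL("no newline"))       // unchanged
}
```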
Processing Large Files
For multi-gigabyte files, consider concurrent processing:
type lineJob struct {
	num  int
	text string
}

scanner := bufio.NewScanner(file)
jobs := make(chan lineJob, 100)

go func() {
	lineNum := 0
	for scanner.Scan() {
		jobs <- lineJob{lineNum, scanner.Text()}
		lineNum++
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	close(jobs)
}()

for job := range jobs {
	// process job.text
}
Performance Considerations
- Scanner is convenient but slightly slower than Reader for very tight loops
- Avoid repeated string allocations; use scanner.Bytes() instead of scanner.Text() if you don't need strings
- Pre-allocate buffers when possible
- For CSV/structured data, consider the encoding/csv package instead
Error Handling
Always check scanner.Err() after the loop. Scan returns false both at end of file and on error, and only Err distinguishes the two: it returns nil on a clean EOF and the underlying error otherwise:
for scanner.Scan() {
	// process line
}
if err := scanner.Err(); err != nil {
	log.Fatal(err)
}
Most line-by-line processing tasks fit well with Scanner. Reach for Reader when you need per-line error handling or custom buffering behavior.