Reading Files Line by Line in Bash
Bash provides several methods to process files line by line, each with different performance characteristics and use cases.
Using a While Loop with IFS
The most common approach uses a while loop with the read builtin:
while IFS= read -r line; do
echo "Processing: $line"
done < filename.txt
Breaking this down:
- IFS= prevents leading/trailing whitespace from being trimmed
- read -r reads raw input without interpreting backslash escapes
- < filename.txt redirects the file to the loop's standard input
For files with quoted fields or special characters:
while IFS= read -r line; do
echo "$line"
done < "$input_file"
Always quote variables like "$input_file" to handle filenames with spaces.
Processing Specific Fields
If you need to split each line into fields:
while IFS=',' read -r field1 field2 field3; do
echo "Field 1: $field1"
echo "Field 2: $field2"
done < data.csv
The IFS (Internal Field Separator) here is set to comma. Use multiple characters if needed: IFS=':,'.
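As a quick illustration, here is a colon-delimited, passwd-style record split with a per-command IFS (the account data is made up for the example):

```shell
#!/usr/bin/env bash
# Split a passwd-style record on ':' (sample data, not a real account).
line="alice:x:1001:1001:Alice:/home/alice:/bin/bash"
IFS=':' read -r user _ uid _ _ home shell <<< "$line"
echo "$user $uid $home"   # alice 1001 /home/alice
```

Because IFS is set only as a prefix to the read command, the global IFS is left untouched for the rest of the script.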
Reading into Arrays
For lines with variable field counts, read each line into an array. Leave IFS at its default here, so fields split on whitespace; setting IFS= would disable splitting and put the entire line into fields[0]:
while read -r -a fields; do
echo "First field: ${fields[0]}"
echo "Total fields: ${#fields[@]}"
done < file.txt
Handling Different Line Endings
Files from Windows systems may contain carriage returns (\r). Strip them with:
while IFS= read -r line; do
line="${line%$'\r'}"
echo "$line"
done < windows_file.txt
Or convert the file once with dos2unix (if it's installed):
dos2unix filename.txt
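If dos2unix isn't available, the standard tr utility can strip carriage returns as a one-off conversion. A small sketch (the filenames are examples):

```shell
#!/usr/bin/env bash
# Create a sample CRLF file, then delete every carriage return with tr.
printf 'hello\r\nworld\r\n' > /tmp/crlf_sample.txt
tr -d '\r' < /tmp/crlf_sample.txt > /tmp/unix_sample.txt
```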
Performance Considerations
The while read approach is reasonably fast for most files, but reading large files (>1GB) line by line in pure Bash is slow. For better performance:
mapfile -t lines < filename.txt
for line in "${lines[@]}"; do
echo "Processing: $line"
done
mapfile (a Bash 4+ builtin, also available as readarray) reads the entire file into an array in a single operation. The -t flag strips the trailing newline from each stored line. This is faster than a read loop but holds the whole file in memory.
Common Pitfalls
Subshell issues: Piping to a loop creates a subshell, losing variables:
# Wrong - count will be 0 after loop
count=0
cat file.txt | while read -r line; do
((count++))
done
echo $count # Outputs 0
Use input redirection instead:
# Correct - count persists
count=0
while read -r line; do
((count++))
done < file.txt
echo $count # Outputs actual line count
Missing final newline: read returns a nonzero status when it hits EOF, even if it has read a partial last line, so a file without a trailing newline silently drops its final line. Guard against this explicitly:
while IFS= read -r line || [[ -n "$line" ]]; do
echo "Line: $line"
done < file.txt
Real-World Example
Processing a log file and extracting timestamps:
#!/bin/bash
logfile="$1"
[[ -f "$logfile" ]] || { echo "File not found: $logfile" >&2; exit 1; }
while IFS= read -r line; do
# Extract timestamp (assuming format: [YYYY-MM-DD HH:MM:SS])
# Note: -P (PCRE) requires GNU grep; \K discards the leading '[' from the match
timestamp=$(echo "$line" | grep -oP '\[\K[^\]]+')
if [[ -n "$timestamp" ]]; then
echo "Found: $timestamp"
fi
done < "$logfile"
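The command substitution above forks echo and grep for every line, which adds up on large logs. A pure-Bash variant using [[ =~ ]] and BASH_REMATCH avoids per-line subprocesses; this is a sketch under the same assumed timestamp format (extract_timestamps is a name invented for the example):

```shell
#!/usr/bin/env bash
# Pure-Bash timestamp extraction: no subprocess spawned per line.
# Assumes the same bracketed [YYYY-MM-DD HH:MM:SS] format as above.
extract_timestamps() {
  local line
  while IFS= read -r line; do
    # BASH_REMATCH[1] holds the first capture group: the text inside [...]
    if [[ $line =~ \[([^]]+)\] ]]; then
      echo "Found: ${BASH_REMATCH[1]}"
    fi
  done < "$1"
}
```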
Using grep with Filtering
If you only need lines matching a pattern:
while IFS= read -r line; do
echo "Matched: $line"
done < <(grep "pattern" filename.txt)
Process substitution <(...) avoids subshell issues while filtering efficiently.
Debugging
Use set -x to trace execution:
set -x
while read -r line; do
echo "$line"
done < file.txt
set +x
Or trace the entire script without editing it by running bash -x script.sh.
Always validate input files exist and contain expected data before processing in production scripts. Use [[ -f "$file" ]] to check file existence and wc -l to preview line counts.
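A minimal sketch of that pre-flight check (validate_input is a hypothetical helper name, not a standard utility):

```shell
#!/usr/bin/env bash
# Verify a file exists, then report its line count before processing.
validate_input() {
  local file="$1"
  if [[ ! -f "$file" ]]; then
    echo "Error: $file not found" >&2
    return 1
  fi
  # wc -l counts newlines; its output may carry leading spaces on some systems
  wc -l < "$file"
}
```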
