Reading Files Line by Line in Bash
Bash provides several methods to process files line by line, each with different performance characteristics and use cases.
Using a While Loop with IFS
The most common approach uses a while loop with the read builtin:
while IFS= read -r line; do
echo "Processing: $line"
done < filename.txt
Breaking this down:
- IFS= prevents leading/trailing whitespace from being trimmed
- read -r reads raw input without interpreting backslash escapes
- < filename.txt redirects the file to the loop's standard input
For files with quoted fields or special characters:
while IFS= read -r line; do
echo "$line"
done < "$input_file"
Always quote variables like "$input_file" to handle filenames with spaces.
Processing Specific Fields
If you need to split each line into fields:
while IFS=',' read -r field1 field2 field3; do
echo "Field 1: $field1"
echo "Field 2: $field2"
done < data.csv
The IFS (Internal Field Separator) here is set to comma. Use multiple characters if needed: IFS=':,'.
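As a quick illustration, here is a colon-delimited, passwd-style record split with a per-command IFS (the account data is made up for the example):

```shell
#!/usr/bin/env bash
# Split a passwd-style record on ':' (sample data, not a real account).
line="alice:x:1001:1001:Alice:/home/alice:/bin/bash"
IFS=':' read -r user _ uid _ _ home shell <<< "$line"
echo "$user $uid $home"   # alice 1001 /home/alice
```

Because IFS is set only as a prefix to the read command, the global IFS is left untouched for the rest of the script.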
Reading into Arrays
For lines with variable field counts, read each line into an array. Leave IFS at its default here, so fields split on whitespace; setting IFS= would disable splitting and put the entire line into fields[0]:
while read -r -a fields; do
echo "First field: ${fields[0]}"
echo "Total fields: ${#fields[@]}"
done < file.txt
Handling Different Line Endings
Files from Windows systems may contain carriage returns (\r). Strip them with:
while IFS= read -r line; do
line="${line%$'\r'}"
echo "$line"
done < windows_file.txt
Or convert the file once with dos2unix (if it's installed):
dos2unix filename.txt
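If dos2unix isn't available, the standard tr utility can strip carriage returns as a one-off conversion. A small sketch (the filenames are examples):

```shell
#!/usr/bin/env bash
# Create a sample CRLF file, then delete every carriage return with tr.
printf 'hello\r\nworld\r\n' > /tmp/crlf_sample.txt
tr -d '\r' < /tmp/crlf_sample.txt > /tmp/unix_sample.txt
```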
Performance Considerations
The while read approach is reasonably fast for most files, but reading large files (>1GB) line by line in pure Bash is slow. For better performance:
mapfile -t lines < filename.txt
for line in "${lines[@]}"; do
echo "Processing: $line"
done
mapfile (a Bash 4+ builtin, also available as readarray) reads the entire file into an array in a single operation. The -t flag strips the trailing newline from each stored line. This is faster than a read loop but holds the whole file in memory.
Common Pitfalls
Subshell issues: Piping to a loop creates a subshell, losing variables:
# Wrong - count will be 0 after loop
count=0
cat file.txt | while read -r line; do
((count++))
done
echo $count # Outputs 0
Use input redirection instead:
# Correct - count persists
count=0
while read -r line; do
((count++))
done < file.txt
echo $count # Outputs actual line count
Missing final newline: read returns a nonzero status when it hits EOF, even if it has read a partial last line, so a file without a trailing newline silently drops its final line. Guard against this explicitly:
while IFS= read -r line || [[ -n "$line" ]]; do
echo "Line: $line"
done < file.txt
Real-World Example
Processing a log file and extracting timestamps:
#!/bin/bash
logfile="$1"
[[ -f "$logfile" ]] || { echo "File not found: $logfile" >&2; exit 1; }
while IFS= read -r line; do
# Extract timestamp (assuming format: [YYYY-MM-DD HH:MM:SS])
# Note: -P (PCRE) requires GNU grep; \K discards the leading '[' from the match
timestamp=$(echo "$line" | grep -oP '\[\K[^\]]+')
if [[ -n "$timestamp" ]]; then
echo "Found: $timestamp"
fi
done < "$logfile"
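The command substitution above forks echo and grep for every line, which adds up on large logs. A pure-Bash variant using [[ =~ ]] and BASH_REMATCH avoids per-line subprocesses; this is a sketch under the same assumed timestamp format (extract_timestamps is a name invented for the example):

```shell
#!/usr/bin/env bash
# Pure-Bash timestamp extraction: no subprocess spawned per line.
# Assumes the same bracketed [YYYY-MM-DD HH:MM:SS] format as above.
extract_timestamps() {
  local line
  while IFS= read -r line; do
    # BASH_REMATCH[1] holds the first capture group: the text inside [...]
    if [[ $line =~ \[([^]]+)\] ]]; then
      echo "Found: ${BASH_REMATCH[1]}"
    fi
  done < "$1"
}
```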
Using grep with Filtering
If you only need lines matching a pattern:
while IFS= read -r line; do
echo "Matched: $line"
done < <(grep "pattern" filename.txt)
Process substitution <(...) avoids subshell issues while filtering efficiently.
Debugging
Use set -x to trace execution:
set -x
while read -r line; do
echo "$line"
done < file.txt
set +x
Or trace the entire script without editing it by running bash -x script.sh.
Always validate input files exist and contain expected data before processing in production scripts. Use [[ -f "$file" ]] to check file existence and wc -l to preview line counts.
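A minimal sketch of that pre-flight check (validate_input is a hypothetical helper name, not a standard utility):

```shell
#!/usr/bin/env bash
# Verify a file exists, then report its line count before processing.
validate_input() {
  local file="$1"
  if [[ ! -f "$file" ]]; then
    echo "Error: $file not found" >&2
    return 1
  fi
  # wc -l counts newlines; its output may carry leading spaces on some systems
  wc -l < "$file"
}
```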
