Reading Files Line By Line In Python

Reading a file line by line is memory-efficient and the standard approach for handling large text files in Python. The key is choosing the right method for your use case.

The Iterator Approach (Recommended for Large Files)

For most scenarios, especially with large files, use the file object as an iterator:

with open('file.txt', 'r') as f:
    for line in f:
        print(line.strip())

This is efficient because the file object is buffered: Python reads the file in chunks and yields one line per iteration rather than loading the entire contents into memory. Each line includes its trailing newline character, which is why the example calls strip(). Note that strip() also removes any other leading and trailing whitespace.
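
As an illustration of the iterator approach, here is a minimal sketch that finds the longest line in a file while holding only one line in memory at a time. The function name longest_line and the temporary sample file are assumptions made for this example, not part of any library.

```python
import tempfile
from pathlib import Path


def longest_line(path):
    """Return (line_number, text) for the longest line, reading lazily."""
    best_no, best_text = 0, ""
    with open(path, "r", encoding="utf-8") as f:
        # enumerate(f, start=1) pairs each line with its 1-based line number.
        for line_no, line in enumerate(f, start=1):
            text = line.rstrip("\n")
            if len(text) > len(best_text):
                best_no, best_text = line_no, text
    return best_no, best_text


# Demo on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("short\na much longer line\nmid\n", encoding="utf-8")
    result = longest_line(sample)

print(result)
```

Because only the current line and the best candidate are kept, memory use stays flat regardless of file size.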

Using pathlib (Modern Alternative)

pathlib has been part of the standard library since Python 3.4 and offers cleaner path handling:

from pathlib import Path

for line in Path('file.txt').read_text().splitlines():
    print(line)

This approach is readable and works well for small to medium files. However, it loads the entire file into memory first, so avoid it for multi-gigabyte files.
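
If you want pathlib's path handling without loading the whole file, Path.open() returns an ordinary file object that can be iterated lazily. The sketch below assumes a UTF-8 file; iter_lines is a hypothetical helper name chosen for this example.

```python
import tempfile
from pathlib import Path


def iter_lines(path):
    """Yield lines one at a time, without the trailing newline."""
    # Path.open() behaves like the built-in open(), so iteration is lazy.
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


# Demo on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("alpha\nbeta\ngamma\n", encoding="utf-8")
    lines = list(iter_lines(sample))

print(lines)
```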

Handling Different Encodings

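When no encoding is given, open() falls back to a platform-dependent default, so it is safest to name the encoding explicitly, especially for files that are not UTF-8: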

with open('file.txt', 'r', encoding='latin-1') as f:
    for line in f:
        # process() is a placeholder for your own per-line handling
        process(line.strip())
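
To show what happens when the encoding is wrong, the following sketch writes a small Latin-1 file and reads it three ways: as strict UTF-8 (which fails), as UTF-8 with errors='replace', and with the correct encoding. The file name and contents are made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "latin1.txt"
    # "é" is the single byte 0xE9 in Latin-1, which is not valid UTF-8 here.
    sample.write_bytes("café\n".encode("latin-1"))

    # 1. Strict UTF-8 decoding raises UnicodeDecodeError during iteration.
    try:
        with open(sample, "r", encoding="utf-8") as f:
            for _ in f:
                pass
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # 2. errors="replace" substitutes U+FFFD for undecodable bytes.
    with open(sample, "r", encoding="utf-8", errors="replace") as f:
        replaced = [line.rstrip("\n") for line in f]

    # 3. The correct encoding decodes the text as intended.
    with open(sample, "r", encoding="latin-1") as f:
        correct = [line.rstrip("\n") for line in f]

print(strict_failed, replaced, correct)
```

errors='replace' keeps the loop running but loses information, so it is best reserved for cases where the true encoding cannot be determined.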

Stripping Specific Characters

Sometimes you only want to remove the newline, not all whitespace:

with open('file.txt', 'r') as f:
    for line in f:
        line = line.rstrip('\n')
        print(line)

Use rstrip('\n') to preserve intentional leading/trailing spaces in your data.
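
The difference is easiest to see side by side. This small sketch uses a made-up sample string; the last line covers the case where a file was opened in a way that preserves Windows-style \r\n endings (for example with newline=''), which is an assumption about your input rather than the default behavior.

```python
raw_line = "  name:\tvalue  \n"

# strip() removes all leading and trailing whitespace, including the indent.
stripped = raw_line.strip()

# rstrip("\n") removes only trailing newline characters.
newline_only = raw_line.rstrip("\n")

# rstrip("\r\n") removes any trailing mix of carriage returns and newlines.
crlf_removed = "data\r\n".rstrip("\r\n")

print(repr(stripped))
print(repr(newline_only))
print(repr(crlf_removed))
```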

Processing with a Buffer

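Plain iteration already handles very large files. When you need to process several lines at once, readlines() accepts a size hint and returns batches of complete lines:

with open('file.txt', 'r') as f:
    while True:
        # Returns whole lines totalling roughly 8 KB; the hint is approximate
        lines = f.readlines(8192)
        if not lines:
            break
        for line in lines:
            # process() is a placeholder for your own per-line handling
            process(line.strip())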
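
If you would rather batch by line count than by approximate size, itertools.islice can pull a fixed number of lines per step. This is a sketch of one alternative, not the method used above; batched_lines is a hypothetical helper, and io.StringIO stands in for a real file.

```python
import io
from itertools import islice


def batched_lines(file_obj, batch_size):
    """Yield lists of up to batch_size lines from an open file object."""
    while True:
        # islice consumes at most batch_size lines from the iterator.
        batch = list(islice(file_obj, batch_size))
        if not batch:
            return
        yield batch


# io.StringIO behaves like an open text file for this demonstration.
fake_file = io.StringIO("a\nb\nc\nd\ne\n")
batches = list(batched_lines(fake_file, 2))

print(batches)
```

The final batch may be shorter than batch_size when the line count is not an exact multiple.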

Skipping Lines and Filtering

Use itertools.islice to skip lines lazily, without loading the file into memory:

from itertools import islice

with open('file.txt', 'r') as f:
    # Skip the first 5 lines
    for line in islice(f, 5, None):
        print(line.strip())

Or filter based on conditions:

with open('file.txt', 'r') as f:
    for line in f:
        if not line.startswith('#'):  # Skip comment lines
            print(line.strip())
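
Filters like this compose well as generators, so nothing is materialized until you consume the result. The sketch below skips both blank lines and comment lines; meaningful_lines and the sample config text are assumptions made for illustration.

```python
import io


def meaningful_lines(lines, comment_prefix="#"):
    """Yield stripped lines, skipping blanks and comment lines."""
    for line in lines:
        text = line.strip()
        # Skip empty lines and lines whose first non-space text is a comment.
        if not text or text.startswith(comment_prefix):
            continue
        yield text


# io.StringIO stands in for an open text file.
fake_file = io.StringIO(
    "# header\n\nhost = a\n  # indented comment\nport = 1\n"
)
kept = list(meaningful_lines(fake_file))

print(kept)
```

Unlike the startswith('#') check on the raw line, this version strips first, so indented comments are skipped as well.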

Performance Considerations

The file iterator is implemented in C in CPython and reads through an internal buffer (commonly 8 KB, though the exact size depends on the Python version and the file system). It is usually faster than manually calling readline() in a loop. For most real-world use cases, stick with the iterator approach.

If you’re processing millions of lines, benchmark the candidate approaches on your actual data: I/O patterns and the CPU work done per line usually matter more than the reading method itself.
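
A rough way to run such a comparison is with the standard timeit module. The sketch below times the iterator against a readline() loop on a generated file; the function names, file contents, and run count are arbitrary choices for this example, and the numbers will vary by machine and data.

```python
import tempfile
import timeit
from pathlib import Path


def count_with_iterator(path):
    """Count lines by iterating over the file object."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


def count_with_readline(path):
    """Count lines by calling readline() until it returns ''."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        while f.readline():
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "bench.txt"
    sample.write_text("some sample text\n" * 50_000, encoding="utf-8")

    # Confirm both approaches agree before comparing their speed.
    iterator_count = count_with_iterator(sample)
    readline_count = count_with_readline(sample)

    for func in (count_with_iterator, count_with_readline):
        seconds = timeit.timeit(lambda: func(sample), number=5)
        print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

Treat the output as indicative only: repeated runs on a small file are served largely from the operating system's cache, which real workloads may not be.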

Common Pitfalls and Best Practices

When working with Python on Linux systems, keep these considerations in mind. Always use virtual environments to avoid polluting the system Python installation. Python 2 reached end-of-life in 2020, so ensure you are using Python 3 for all new projects.

For system scripting, prefer the subprocess module over os.system for better control over process execution. Use pathlib instead of os.path for cleaner file path handling in modern Python.

Related Commands and Tools

These complementary Python tools and commands are useful for daily development workflows:

python3 -m venv myenv – Create an isolated virtual environment
pip list --outdated – Check which packages need updating
python3 -m py_compile script.py – Check syntax without running
black script.py – Auto-format code to PEP 8 standards
mypy script.py – Static type checking for Python code

