How to `cut` a String Using a String as the Delimiter in Bash?

The Problem: cut Only Accepts Single-Character Delimiters

The cut command is designed for simple field splitting, but it has a hard limitation: the delimiter must be a single character.

cut --delimiter="delim" -f 1,3
# Error: cut: the delimiter must be a single character

This is a deliberate design choice. cut prioritizes speed and simplicity for common use cases like splitting on tabs, spaces, or colons. If you need multi-character delimiters, you need a different tool.

Solution: Use awk for Multi-Character Delimiters

The most practical solution is awk. When the -F argument is longer than one character, awk treats it as an extended regular expression rather than a single character, so a whole string can act as the delimiter.

awk -F 'delim' '{print $1, $3}' input.txt

This splits each line on the string delim and prints fields 1 and 3.
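As a quick check, the sketch below feeds one made-up line to awk through printf instead of reading input.txt. The sample text and the variable names are illustrative only.

```shell
# Sample line is invented for illustration; "delim" is the literal separator.
line='alphadelimbetadelimgamma'

# -F 'delim' makes the whole string the field separator.
# The comma in print joins the fields with OFS (a single space by default).
result=$(printf '%s\n' "$line" | awk -F 'delim' '{print $1, $3}')
echo "$result"   # prints: alpha gamma
```

Piping a known line through the command like this is a cheap way to confirm the field numbering before running it on a real file.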

More Realistic Example

If you’re parsing a file with ::|:: as a delimiter:

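awk -F '::[|]::' '{print $1; print $3}' data.txt

Note the bracket expression around the pipe. When the -F argument is longer than one character, awk treats it as an extended regular expression, so a bare | would mean alternation. Writing [|] matches a literal pipe and avoids relying on a backslash escape: awk also processes escape sequences in the -F string, and gawk, for example, reduces \| to a plain | (with a warning) before the regex is compiled.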
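To see the literal-pipe handling in action, the sketch below splits an invented record on ::|:: using a bracket expression, [|], for the pipe. The record contents and variable names are assumptions made up for the example.

```shell
# Invented sample record using ::|:: as the separator.
record='id42::|::Ada Lovelace::|::London'

# [|] is a bracket expression that matches one literal pipe,
# so no backslash has to survive shell or awk string processing.
third=$(printf '%s\n' "$record" | awk -F '::[|]::' '{print $3}')
count=$(printf '%s\n' "$record" | awk -F '::[|]::' '{print NF}')
echo "$third ($count fields)"   # prints: London (3 fields)
```

Printing NF alongside the field is a simple way to catch a separator that is being interpreted differently than intended: an unexpected field count usually means the regex is not matching the whole delimiter.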

Using FPAT for Complex Patterns

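For more sophisticated field matching, GNU awk (gawk) provides the FPAT variable, which defines what a field looks like rather than what separates fields. This is cleaner when dealing with quoted fields or nested delimiters. FPAT is a gawk extension; other awk implementations such as mawk treat it as an ordinary variable and keep splitting on FS, so run these examples with gawk: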

awk 'BEGIN {FPAT = "[a-zA-Z0-9_]+"} {print $1, $3}' input.txt

This extracts contiguous alphanumeric sequences as fields, ignoring any delimiter structure entirely.

FPAT is especially useful for CSV-like data with quoted fields:

awk 'BEGIN {FPAT = "([^,]+)|(\"[^\"]+\")"} {print $1, $3}' data.csv
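A small sketch of the CSV pattern, assuming gawk is installed (FPAT is not available in other awk implementations). The sample row is invented, and the script skips the example when gawk is missing.

```shell
# Invented CSV line; the second field contains a comma inside quotes.
row='Robbins,"1234 A Pretty Street, NE",MyTown'

# FPAT is gawk-only, so call gawk explicitly and skip when it is missing.
if command -v gawk >/dev/null 2>&1; then
    # A field is either a run of non-commas or a double-quoted string.
    third=$(printf '%s\n' "$row" |
        gawk 'BEGIN {FPAT = "([^,]+)|(\"[^\"]+\")"} {print $3}')
    echo "$third"   # prints: MyTown
else
    echo "gawk not found; FPAT example skipped" >&2
fi
```

With -F ',' the quoted address would be cut at its inner comma and MyTown would land in $4. Note that the quoted field keeps its surrounding double quotes, and this pattern does not match empty fields.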

sed as an Alternative

For simpler cases, sed can work too:

sed 's/delim/\n/g' input.txt | head -n 3 | tail -n 1

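This converts each delimiter to a newline and then picks the third resulting line, which is the third field. It only works as intended when the input is a single line, because head and tail count lines across the whole stream, and the \n in the replacement is a GNU sed feature. It is also awkward when you need more than one field.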
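The sketch below runs the same pipeline on a single invented line, assuming GNU sed (other sed implementations may not expand \n in the replacement).

```shell
# Invented single-line input; with several input lines, head and tail
# would count lines across the whole stream, not fields per line.
line='alphadelimbetadelimgamma'

# Each "delim" becomes a newline, so field N ends up on line N.
# head keeps the first three lines, tail keeps the last of those.
third=$(printf '%s\n' "$line" | sed 's/delim/\n/g' | head -n 3 | tail -n 1)
echo "$third"   # prints: gamma
```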

perl for Maximum Flexibility

If you’re already using perl in your scripts, it handles multi-character delimiters natively:

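perl -F'delim' -lane 'print "$F[0] $F[2]"' input.txt

The -a flag autosplits each line into @F, using the pattern given with -F. The pattern must be attached directly to -F, with no space in between. Like awk, perl treats the pattern as a regular expression, so escape metacharacters such as | with a backslash.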

Practical Comparison

Tool  Multi-char delimiter  Speed  Complexity
cut   No                    Fast   Low
awk   Yes (regex)           Good   Low-Medium
sed   Yes (via regex)       Good   Medium
perl  Yes                   Good   Medium-High

For most cases, awk is the right choice: it’s universally available, handles complex delimiters, and the syntax is straightforward for field extraction.
