How to Match Multiple Lines using Regex in Perl One-liners

Perl one-liners that use Perl's regular expressions can be very powerful text-processing tools, run as commands in a terminal or from a script. By default, a Perl one-liner run with the -p or -n option reads its input line by line, which gets in the way when a pattern has to match across multiple lines. In this post we look at a technique for matching multiple lines with a Perl one-liner.

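As an example, let's find and remove everything between <PRE> and </PRE> (tags included) when the content consists only of newlines (\n), spaces, and <BR>/<HR> tags. A simple regex like <PRE>[\s{<BR>}{<HR>}]*</PRE> matches such content. Note that the square brackets form a character class, so the pattern actually accepts any run of whitespace and the characters {, }, <, >, B, R, and H, which is looser than the stated criteria. By default it still does not match across multiple lines: \s does match \n, but perl hands the regex one line at a time, so a <PRE> and a </PRE> on different lines are never seen in the same string. The trick is to add the option -0777. The -0 option sets the input record separator as an octal number, and 0777 is a special value that makes perl read the whole input as a single record, because no byte has that value.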

perl -0777 -pe 's|<PRE>[\s{<BR>}{<HR>}]*</PRE>||g' 

The perlrun manual page describes the perl options used here.

Here is an example of the one-liner in use.

$ echo -e "text\n<PRE>\n<BR>\n<HR><HR>\n \n</PRE>more text"
text
<PRE>
<BR>
<HR><HR>

</PRE>more text
$ echo -e "text\n<PRE>\n<BR>\n<HR><HR>\n \n</PRE>more text" |
perl -0777 -pe 's|<PRE>[\s{<BR>}{<HR>}]*</PRE>||g'
text
more text

The same technique can be used for grep too: How to Grep 2 Lines using grep in Linux.
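As a rough illustration of the idea with grep, the sketch below assumes GNU grep built with PCRE support, which provides the -P, -z, and -o options. The linked article may take a different approach, and the sample words and the variable name two_lines are made up for this example.

```shell
# -z treats the input as NUL-separated, so text without NUL bytes is read
# as a single record, much like perl -0777.
# -P enables Perl-compatible regexes; -o prints only the matched part.
# tr turns the trailing NUL that grep -z emits back into a newline.
two_lines=$(printf 'alpha\nbeta\ngamma\n' | grep -Pzo 'alpha\nbeta' | tr '\0' '\n')

printf '%s\n' "$two_lines"
```

Other grep implementations may not support -P or -z, in which case the perl one-liner is the more portable choice.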

Eric Ma

Eric is a systems guy. Eric is interested in building high-performance and scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.
