How to convert PDF to text with format kept on Linux?

Much of a PDF's formatting cannot be represented in plain text, but the relative positions of the text should be preserved. For example, table columns should stay aligned.

The pdftotext tool can convert PDF to text pretty well:

pdftotext – Portable Document Format (PDF) to text converter

Use it with the -layout option, which the man page describes as follows:

-layout

Maintain (as best as possible) the original physical layout of the text. The default is to 'undo' physical layout (columns, hyphenation, etc.) and output the text in reading order.

$ pdftotext -layout file.pdf file.txt

After this command runs, file.txt contains the main text content of the PDF, with the layout preserved as closely as possible.
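To convert several PDFs at once, the same command can be wrapped in a small loop. This is only a minimal sketch: convert_pdfs is a hypothetical helper name made up for illustration, not part of pdftotext, and it assumes the files end in a lowercase .pdf extension.

```shell
#!/bin/sh
# convert_pdfs is a hypothetical helper (not part of pdftotext).
# It converts every *.pdf in a directory to a .txt file next to it,
# using the -layout option to keep the physical layout.
convert_pdfs() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDFs.
        [ -e "$pdf" ] || continue
        # Derive the output name: report.pdf -> report.txt
        txt="${pdf%.pdf}.txt"
        pdftotext -layout "$pdf" "$txt" || echo "failed: $pdf" >&2
    done
}
```

After sourcing the function in a shell, calling it with a directory (for example: convert_pdfs ./docs) should write one .txt file beside each PDF. With no argument it works on the current directory.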

Answered by Eric Z Ma.
