How to convert PDF to text with format kept on Linux?

Much of a PDF's formatting cannot be represented in plain text, but the relative positions of the text should be preserved. For example, table columns should stay aligned.

asked Oct 28, 2015 by anonymous

1 Answer

The pdftotext tool can convert PDF to text pretty well:

pdftotext - Portable Document Format (PDF) to text converter

with the -layout option:

-layout

Maintain (as best as possible) the original physical layout of the text. The default is to 'undo' physical layout (columns, hyphenation, etc.) and output the text in reading order.

$ pdftotext -layout file.pdf file.txt

and file.txt will contain the text of the PDF's main content, with the layout kept as closely as possible.
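
If there are several PDFs to convert, the same command can be wrapped in a small loop. The following is only a sketch, assuming pdftotext is installed and on the PATH; the function names pdf_to_txt_name and convert_dir are made up for illustration and are not part of pdftotext.

```shell
#!/bin/sh
# Sketch: run "pdftotext -layout" on every PDF in a directory.
# Assumption: pdftotext is installed. The function names below are
# hypothetical helpers, not something pdftotext provides.

pdf_to_txt_name() {
    # Strip a trailing ".pdf" and append ".txt": report.pdf -> report.txt
    printf '%s.txt\n' "${1%.pdf}"
}

convert_dir() {
    # Directory to scan; defaults to the current directory.
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # If no PDF matched, the glob stays literal; skip it.
        [ -e "$pdf" ] || continue
        pdftotext -layout "$pdf" "$(pdf_to_txt_name "$pdf")"
    done
}
```

After sourcing the file, running convert_dir with a directory path should write a .txt file next to each PDF in that directory, each one produced with the -layout option.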

answered Nov 8, 2015 by Eric Z Ma (44,280 points)

Copyright © SysTutorials. User contributions licensed under cc-wiki with attribution required.

...