3 Ways of .odt to .txt File Conversion in Command Line in Linux

Open Document (.odt) files can contain richly formatted content. Sometimes, however, a plain text file is handier, so we may want to convert .odt files to plain text. In this post, we discuss 3 ways to convert .odt files to .txt files on the command line in Linux. Each of them can easily be put into a Bash script to batch-process a set of files. Combined with the methods in .docx/.doc to .odt File Conversion in Command Line in Linux, they can also be used to convert .docx/.doc files to plain text.

We use LibreOffice and pandoc, so make sure both packages are installed on the Linux system. As an example, we use a sample .odt file that contains a title, a level-1 heading, a line of bold, italic and underlined text, and a short C++ code snippet.

As the following examples show, each way has its own pros and cons. In practice, we may pick whichever suits the files and the purpose, or combine the results of several ways.
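Since all three ways depend on external programs, a script that uses them may want to stop early when one of them is missing. Below is a minimal sketch; `require_cmds` is a hypothetical helper name (not part of LibreOffice or pandoc), and the install hint in the comment assumes a Debian/Ubuntu-style system where the packages are named `libreoffice` and `pandoc`.

```shell
# Hypothetical helper: report every command named in "$@" that is not on the PATH.
# Returns 0 when all of them are found and 1 otherwise.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (the install hint assumes Debian/Ubuntu package names):
# require_cmds libreoffice pandoc || echo "try: sudo apt install libreoffice pandoc" >&2
```

`command -v` is used instead of `which` because it is specified by POSIX and is a shell builtin.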

.odt to .txt file conversion using LibreOffice

We can use LibreOffice's --convert-to option to convert the .odt file to a .txt file. The command is as follows.

$ libreoffice --convert-to txt input.odt 
convert /home/davidy/Downloads/input.odt -> /home/davidy/Downloads/input.txt using filter : Text

The converted .txt file looks like this.

$ cat input.txt 
This is an example docx file.

This is a title

This is a heading level 1

This is some text with format: bold, italic and underline.

Also some code below.

int main() {
  std::cout << “Hello World!\n”;
  return 0;
}

Here, we can see that all the text, including the indentation spaces in the code section, is kept. However, the formatting (bold, italic, underline, and the title and heading styles) is lost.
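LibreOffice accepts several input files in one call, so the batch processing mentioned in the introduction can be a single command. The sketch below assumes LibreOffice's `--headless` option (run without opening a window) and `--outdir` option (choose where the .txt files go); `lo_odt_to_txt` is a hypothetical function name.

```shell
# Hypothetical wrapper: convert one or more .odt files to .txt with LibreOffice.
# Usage: lo_odt_to_txt OUTDIR FILE.odt [FILE.odt ...]
lo_odt_to_txt() {
  if [ "$#" -lt 2 ]; then
    echo "usage: lo_odt_to_txt OUTDIR FILE.odt [FILE.odt ...]" >&2
    return 2
  fi
  local outdir=$1
  shift
  mkdir -p -- "$outdir" || return 1
  # --headless runs LibreOffice without a GUI; each FILE.odt becomes OUTDIR/FILE.txt.
  libreoffice --headless --convert-to txt --outdir "$outdir" "$@"
}

# Example: convert every .odt file in the current directory into ./txt/
# lo_odt_to_txt txt ./*.odt
```

If non-ASCII characters come out in an unexpected encoding, the text filter can also be given an explicit encoding, for example `--convert-to 'txt:Text (encoded):UTF8'`; the exact filter options may vary between LibreOffice versions.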

Convert .odt to .txt file using pandoc

The pandoc tool can convert many file formats. It can also read .odt files and generate .txt files.

The command to convert the .odt file to a .txt file is as follows.

$ pandoc -t plain input.odt > input-pandoc.txt

The generated .txt file looks like this.

$ cat input-pandoc.txt 
This is an example docx file.

This is a title

THIS IS A HEADING LEVEL 1

This is some text with format: BOLD, _italic_ and _underline_.

Also some code below.

int main() {
std::cout << “Hello World!\n”;
return 0;
}

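Here, we can see that pandoc keeps some of the formatting: the heading and the bold text are written in upper case (HEADING, BOLD), and italic and underlined text are wrapped in underscores (_italic_, _underline_). However, it removes the leading indentation in the code section.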
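pandoc converts one input file per run, so batch processing needs a loop. The sketch below is one way to write it; `pandoc_odt_to` and `pandoc_out_name` are hypothetical helper names, and the `-pandoc` suffix simply follows the file names used in this post. It also adds `--wrap=none`: by default pandoc wraps plain-text output at 72 columns, and this option keeps each paragraph on one line instead. The same function can write Markdown, as in the next section, by passing `markdown md`.

```shell
# Hypothetical helper: derive the output name, e.g. docs/a.odt -> docs/a-pandoc.txt
pandoc_out_name() {
  printf '%s\n' "${1%.odt}-pandoc.$2"
}

# Hypothetical wrapper: convert each .odt file with pandoc.
# Usage: pandoc_odt_to FORMAT EXT FILE.odt [FILE.odt ...]
#   e.g. FORMAT=plain EXT=txt, or FORMAT=markdown EXT=md
pandoc_odt_to() {
  if [ "$#" -lt 3 ]; then
    echo "usage: pandoc_odt_to FORMAT EXT FILE.odt [FILE.odt ...]" >&2
    return 2
  fi
  local format=$1 ext=$2 f status=0
  shift 2
  for f in "$@"; do
    # --wrap=none keeps each paragraph on a single line instead of wrapping it.
    pandoc -f odt -t "$format" --wrap=none -o "$(pandoc_out_name "$f" "$ext")" "$f" || status=1
  done
  return "$status"
}

# Example: pandoc_odt_to plain txt ./*.odt
```

The loop keeps going after a failed file and reports the failure through the return status, so one broken document does not stop the whole batch.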

Convert .odt to Markdown .txt file using pandoc

Markdown is a plain text format that marks formatting with special markup elements embedded in the text. Because the markup is itself plain, readable text, Markdown can be a good alternative plain text format.

Here is the command to convert the .odt file to Markdown format.

$ pandoc -t markdown input.odt > input-pandoc.md

The converted Markdown file is as follows.

$ cat input-pandoc.md
This is an example docx file.

This is a title

This is a heading level 1
=========================

This is some text with format: **bold**, *italic* and *underline*.

Also some code below.

int main() {\
std::cout \<\< "Hello World!\\n";\
return 0;\
}

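Here, we can see that the formatting is marked with Markdown syntax (**bold**, *italic*, and the ===== line under the level-1 heading), although underlined text also becomes *underline*, the same as italic. This is much better, but the code section is still not handled well: its indentation is lost, each line ends with a backslash (a Markdown hard line break), and characters such as < and \ are escaped.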
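One way to tidy the code section is a small `sed` pass over just those lines: remove the trailing `\` that pandoc emits for hard line breaks, then turn backslash-escaped punctuation such as `\<` or `\\` back into the plain character. This is a hedged sketch: `md_unescape_code` is a hypothetical name, the line range has to be found by hand (for example with `cat -n`), and it does not restore the lost indentation.

```shell
# Hypothetical helper: clean up pandoc's Markdown escaping on lines FROM..TO
# of FILE and print the whole file to stdout (FILE itself is not modified).
# Usage: md_unescape_code FROM TO FILE
md_unescape_code() {
  if [ "$#" -ne 3 ]; then
    echo "usage: md_unescape_code FROM TO FILE" >&2
    return 2
  fi
  # 1) Remove the trailing backslash that marks a hard line break.
  # 2) Turn escaped punctuation such as \< or \\ back into the plain character.
  sed -e "$1,$2"'s/\\$//' \
      -e "$1,$2"'s/\\\([[:punct:]]\)/\1/g' \
      "$3"
}

# Example (line numbers are illustrative; check them with "cat -n"):
# md_unescape_code 12 15 input-pandoc.md > input-clean.md
```

Lines outside the given range are printed unchanged, so ordinary Markdown escapes elsewhere in the file are left alone.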

Summary

This post introduces 3 ways to convert .odt files to .txt files on the command line in Linux. Each has its pros and cons, but together they can do most of the conversion work. For the example document in this post, by manually adjusting the Markdown output from pandoc and taking the code section from LibreOffice's output, we get a good plain text version of the document:

This is an example docx file.

This is a title

This is a heading level 1
=========================

This is some text with format: **bold**, *italic* and *underline*.

Also some code below.

```
int main() {
  std::cout << "Hello World!\n";
  return 0;
}
```
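The manual merge described above (Markdown from pandoc, code lines from LibreOffice's output) can also be scripted once the line ranges are known. Below is a minimal sketch under that assumption; `splice_code` is a hypothetical helper, and it wraps the LibreOffice lines in a ``` fence as in the result shown.

```shell
# Hypothetical helper: print MD with its lines MD_FROM..MD_TO replaced by
# lines TXT_FROM..TXT_TO of TXT, wrapped in a ``` code fence.
# Usage: splice_code MD MD_FROM MD_TO TXT TXT_FROM TXT_TO
splice_code() {
  if [ "$#" -ne 6 ]; then
    echo "usage: splice_code MD MD_FROM MD_TO TXT TXT_FROM TXT_TO" >&2
    return 2
  fi
  head -n "$(($2 - 1))" "$1"     # Markdown before the code section
  echo '```'
  sed -n "$5,$6p" "$4"           # code lines from the LibreOffice output, indentation intact
  echo '```'
  tail -n "+$(($3 + 1))" "$1"    # Markdown after the code section
}

# Example (line numbers are illustrative):
# splice_code input-pandoc.md 12 15 input.txt 14 17 > input-final.md
```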

The Markdown, if converted to HTML, will look like this:

[Figure: the Markdown above rendered as HTML]

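An HTML rendering of the Markdown can be produced with pandoc itself. The sketch below assumes the final Markdown has been saved to a file; `md_to_html` and `html_name` are hypothetical helper names, while `-f`, `-t`, `-s` and `-o` are standard pandoc options.

```shell
# Hypothetical helper: derive the HTML name, e.g. notes/doc.md -> notes/doc.html
html_name() {
  printf '%s\n' "${1%.md}.html"
}

# Render a Markdown file as a standalone HTML page with pandoc.
# Usage: md_to_html FILE.md
md_to_html() {
  if [ "$#" -ne 1 ]; then
    echo "usage: md_to_html FILE.md" >&2
    return 2
  fi
  # -s (--standalone) wraps the converted content in <html>, <head> and <body>.
  pandoc -f markdown -t html -s -o "$(html_name "$1")" "$1"
}

# Example: md_to_html input-final.md   # writes input-final.html
```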