Converting documents (ODT, DOC to PDF) on PHP with Unoconv / LibreOffice

The problem

Convert .doc, .odt or .docx files to .pdf, or any other combination (e.g. .docx to .odt), from PHP.

The solution

To solve this problem we'll install the LibreOffice command-line tools and Unoconv, then build a small PHP class.

This is an old article (2013), so some of the content may be outdated.

Installing

Unoconv is a Python tool that uses the LibreOffice libraries (pyuno).

Installing the LibreOffice command-line tools

On a server you don't need a full LibreOffice install, only the command-line tools and the converters found in the core packages.

For Ubuntu/Debian:

apt-get install openjdk-6-jdk libreoffice-core libreoffice-common libreoffice-writer python-uno

Important

Installing libreoffice-writer adds support for converting text documents. To convert other formats (spreadsheets, presentations, images, etc.), install the corresponding LibreOffice package.

For images, consider lighter, well-known tools such as ImageMagick.

For converting PDF to text, consider pdftotext, which you can find in the poppler-utils package.
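The routing suggested above can be sketched as a small shell helper that picks a tool from the file extension. This is only an illustration: the function name suggest_tool and the list of image extensions are assumptions, and "convert" refers to ImageMagick's classic command (newer ImageMagick releases name it magick).

```shell
# Echo the tool suggested for a given input file, based on its extension.
# suggest_tool is a hypothetical helper, not part of unoconv or LibreOffice.
suggest_tool() {
    # Lower-case the extension so "PHOTO.JPG" and "photo.jpg" are treated alike.
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        jpg|jpeg|png|gif|tif|tiff) echo "convert" ;;    # ImageMagick
        pdf)                       echo "pdftotext" ;;  # poppler-utils
        *)                         echo "unoconv" ;;    # everything else
    esac
}
```

For example, suggest_tool report.odt prints unoconv, while suggest_tool scan.pdf prints pdftotext.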

Installing Unoconv

As root:

cd /tmp
git clone https://github.com/dagwieers/unoconv
cd unoconv/
make install
cd ../
rm -rf unoconv/
unoconv --listener &

This starts LibreOffice/OpenOffice as a service listening on a local port. You can check that it is running with ps aux | grep soffice.
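If you want to script that check, for instance to wait until the listener is ready before sending the first job, something like the following may help. It is a minimal sketch that assumes pgrep is available and that the listener process is named soffice.bin; the function names listener_running and wait_for_listener are hypothetical.

```shell
# Return 0 when a process with exactly this name is running.
# Defaults to soffice.bin, the assumed name of the listener process.
listener_running() {
    pgrep -x "${1:-soffice.bin}" >/dev/null 2>&1
}

# Poll once per second until the listener shows up or the tries run out.
# Usage: wait_for_listener [process_name] [tries]
wait_for_listener() {
    name=${1:-soffice.bin}
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        listener_running "$name" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

A start-up script could then run wait_for_listener || echo "listener did not start" >&2 right after launching unoconv --listener.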

Some notes:

  • Although you can convert with a plain LibreOffice/OpenOffice install alone, running the service with unoconv is better for bulk conversions, because every job reuses an instance that is already in memory.

  • A unoconv package is already in the Debian repositories, but it is an old version.

Showing supported formats

unoconv --show
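To test for one particular format from a script, the list can be filtered with grep. The sketch below assumes that each format is printed on its own line, starting with the format name followed by a space, and it merges stderr into stdout in case the list is written there. The function name format_listed and the UNOCONV override variable are hypothetical.

```shell
# Return 0 when the given format name appears in the "unoconv --show" list.
# UNOCONV can point at another binary; it defaults to "unoconv".
format_listed() {
    # Merge stderr into stdout, then look for a line that starts with the name.
    ${UNOCONV:-unoconv} --show 2>&1 | grep -q "^ *$1 "
}
```

For example: format_listed pdf && echo "PDF export is available".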

Creating a daemon

To daemonize unoconv (better for server use), create the file /etc/init.d/unoconvd with the following content:

#!/bin/sh
### BEGIN INIT INFO
# Provides: unoconvd
# Required-Start: $network
# Required-Stop: $network
# Default-Start: 2 3 5
# Default-Stop:
# Description: unoconvd - Converting documents to PDF by unoconv
### END INIT INFO
case "$1" in
    start)
        /usr/bin/unoconv --listener &
        ;;
    stop)
        killall soffice.bin
        ;;
    restart)
        killall soffice.bin
        sleep 1
        /usr/bin/unoconv --listener &
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}" >&2
        exit 1
        ;;
esac

Then adjust the permissions, enable the script at boot and start the daemon:

chmod 755 /etc/init.d/unoconvd
update-rc.d unoconvd defaults
service unoconvd start

Basic use

Whether you started unoconv manually or as a daemon, you can convert files as shown below:

unoconv --format pdf --output /OUTPUT_DIR/ file.odt

This converts file.odt to file.pdf in the given output directory.
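The same command can be wrapped in a loop to convert a whole directory. The following is a sketch rather than a finished tool: the function name convert_dir, the UNOCONV override variable and the choice of extensions are assumptions, and it uses only the --format and --output options shown above.

```shell
# Convert every .odt, .doc and .docx file in a directory.
# Usage: convert_dir SRC_DIR OUT_DIR [FORMAT]   (FORMAT defaults to pdf)
# UNOCONV can point at another binary; it defaults to "unoconv".
convert_dir() {
    src_dir=$1
    out_dir=$2
    format=${3:-pdf}
    failed=0
    mkdir -p "$out_dir"
    for file in "$src_dir"/*.odt "$src_dir"/*.doc "$src_dir"/*.docx; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$file" ] || continue
        # Quote the paths so file names with spaces stay a single argument.
        if ! ${UNOCONV:-unoconv} --format "$format" --output "$out_dir" "$file"; then
            echo "failed: $file" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only when every conversion succeeded.
    [ "$failed" -eq 0 ]
}
```

For example, convert_dir /var/uploads /var/converted pdf would convert all matching documents in /var/uploads (both paths are placeholders).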

PHP Class

A simple PHP wrapper could look like this:

<?php

namespace Unoconv;

/**
 * Unoconv class wrapper
 *
 * @author Rafael Goulart <rafaelgou@gmail.com>
 * @see http://tech.rgou.net/
 */
class Unoconv {

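    /**
     * Basic converter method
     *
     * @param string $originFilePath Origin File Path
     * @param string $outputDirPath  Output directory path
     * @param string $toFormat       Format to export To
     *
     * @return int Exit status of the unoconv command (0 on success)
     */
    public static function convert($originFilePath, $outputDirPath, $toFormat)
    {
        // Escape every argument so spaces and shell metacharacters in paths are safe.
        $command = sprintf(
            'unoconv --format %s --output %s %s',
            escapeshellarg($toFormat),
            escapeshellarg($outputDirPath),
            escapeshellarg($originFilePath)
        );
        // The second argument of system() receives the command's exit status.
        system($command, $exitCode);

        return $exitCode;
    }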

    /**
     * Convert to PDF
     *
     * @param string $originFilePath Origin File Path
     * @param string $outputDirPath  Output directory path
     */
    public static function convertToPdf($originFilePath, $outputDirPath)
    {
        return self::convert($originFilePath, $outputDirPath, 'pdf');
    }

    /**
     * Convert to TXT
     *
     * @param string $originFilePath Origin File Path
     * @param string $outputDirPath  Output directory path
     */
    public static function convertToTxt($originFilePath, $outputDirPath)
    {
        return self::convert($originFilePath, $outputDirPath, 'txt');
    }

}
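Because the wrapper hands a command string to a shell through system(), the way paths are quoted matters. The short shell sketch below shows what happens to a path that contains a space when it is passed unquoted and quoted. The helper count_args and the file name are made up for the illustration.

```shell
# Print how many separate arguments a command receives.
count_args() {
    echo "$#"
}

path='my report.odt'

# Unquoted: the shell splits on the space, so a command sees two arguments.
unquoted=$(count_args $path)

# Quoted: the path is passed through as a single argument.
quoted=$(count_args "$path")

echo "unquoted=$unquoted quoted=$quoted"
```

On the PHP side, escapeshellarg() is the usual way to get the quoted behaviour for each argument before the command string is built.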

Sample use:

<?php
/**
 * Sample use of Unoconv class
 *
 */
require 'Unoconv.php';

use Unoconv\Unoconv;

// Converting to PDF
$originFilePath = 'test.odt';
$outputDirPath  = './';
Unoconv::convertToPdf($originFilePath, $outputDirPath);

// Converting to DOCX
$originFilePath = 'test.odt';
$outputDirPath  = './';
Unoconv::convert($originFilePath, $outputDirPath, 'docx');