Converting documents (ODT, DOC to PDF) on PHP with Unoconv / LibreOffice

The problem

Convert .doc, .odt, and .docx files to .pdf, or any other combination (e.g. .docx to .odt),
from PHP.

To solve this problem we’ll install Unoconv and the
LibreOffice command-line tools, and build a PHP class.

Installing

Unoconv is a Python tool that uses the LibreOffice libraries (pyuno).

Installing the LibreOffice command-line tools

On a server you don’t need a full LibreOffice install,
just the command-line tools and converters found in the core packages.

For Ubuntu/Debian:

apt-get install openjdk-6-jdk libreoffice-core libreoffice-common libreoffice-writer python-uno

Important:

Installing libreoffice-writer gives you support to convert TEXT documents.
To convert other formats (spreadsheets, presentations, images, etc), install the related LibreOffice package.

For images, consider lighter and well-known tools such as ImageMagick.

For converting PDF to text, consider pdftotext, which you can find in the poppler-utils package.

Installing Unoconv

As root:

cd /tmp
git clone https://github.com/dagwieers/unoconv
cd unoconv/
make install
cd ../
rm -rf unoconv/
unoconv --listener &

LibreOffice/OpenOffice is now running as a service listening on a local port,
and you can check that with ps aux | grep soffice.
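
A plain grep for soffice also matches the grep process itself and any LibreOffice instance that is not a listener. As a rough sketch, the filter below keeps only processes whose command line carries the accept=socket argument that the listener is started with. The function name listener_pids is made up for this example, and it assumes ps ax output, where the PID is the first column.

```shell
#!/bin/sh
# Hypothetical helper: read `ps ax`-style lines on stdin and print the PIDs
# of soffice processes that were started as socket listeners.
listener_pids() {
    # Skip awk's own process line, whose arguments contain both patterns.
    awk '/soffice/ && /accept=socket/ && !/awk/ { print $1 }'
}

# Usage (prints nothing when no listener is running):
#   ps ax | listener_pids
```

An empty result right after starting the listener usually just means soffice has not finished starting yet, so retry after a second or two before assuming it failed.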

Some warnings:

  • Although you can convert with only a LibreOffice/OpenOffice install, using
    the listener service with unoconv is better for bulk conversions, because
    it reuses a single instance that stays in memory.

  • A unoconv package is already in the Debian repositories, but it is an old version.

Showing supported formats

unoconv --show

Creating a daemon

To daemonize unoconv (better for server mode), create a file /etc/init.d/unoconvd with the following content:

( Source )

#!/bin/sh
### BEGIN INIT INFO
# Provides: unoconvd
# Required-Start: $network
# Required-Stop: $network
# Default-Start: 2 3 5
# Default-Stop:
# Description: unoconvd - Converting documents to PDF by unoconv
### END INIT INFO
case "$1" in
    start)
        /usr/bin/unoconv --listener &
        ;;
    stop)
        killall soffice.bin
        ;;
    restart)
        killall soffice.bin
        sleep 1
        /usr/bin/unoconv --listener &
        ;;
esac

Then adjust the permissions, enable the script at boot, and start the daemon:

chmod 755 /etc/init.d/unoconvd
update-rc.d  unoconvd defaults
service unoconvd start

Basic use

Whether you started unoconv manually or as a daemon, you can
convert files as shown below:

unoconv --format pdf --output /OUTPUT_DIR/ file.odt

This converts file.odt to file.pdf in the specified output directory.
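
Because a failed conversion can go unnoticed, it helps to confirm that the output file really exists afterwards. The sketch below wraps the command above in a small shell function. The name convert_to_pdf and the UNOCONV_BIN override are assumptions made for this example, not part of unoconv, and the expected file name relies on the behaviour described above (same base name, .pdf extension, inside the output directory).

```shell
#!/bin/sh
# Hypothetical helper: convert one file to PDF and verify the result exists.
# UNOCONV_BIN is an assumption of this sketch; it lets you point at another binary.
convert_to_pdf() {
    input=$1
    outdir=$2
    unoconv_bin=${UNOCONV_BIN:-unoconv}

    # Fail early with distinct exit codes so callers can tell the cases apart.
    [ -f "$input" ] || { echo "input not found: $input" >&2; return 2; }
    mkdir -p "$outdir" || return 3

    "$unoconv_bin" --format pdf --output "$outdir/" "$input" || return 4

    # Expected output: the input's base name with its extension replaced by .pdf
    base=$(basename "$input")
    expected="$outdir/${base%.*}.pdf"
    [ -f "$expected" ] || { echo "no output at $expected" >&2; return 5; }

    printf '%s\n' "$expected"
}

# Usage:
#   convert_to_pdf file.odt /OUTPUT_DIR
```

On success the function prints the path of the generated PDF, so a calling script can pick the file up directly.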

PHP Class

A simple PHP wrapper could look like this:

<?php

namespace Unoconv;

/**
 * Unoconv class wrapper
 *
 * @author Rafael Goulart <rafaelgou@gmail.com>
 * @see http://tech.rgou.net/
 */
class Unoconv {

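    /**
     * Basic converter method
     *
     * @param string $originFilePath Origin File Path
     * @param string $outputDirPath  Output directory path
     * @param string $toFormat       Format to export To
     *
     * @return int Exit status of the unoconv command (0 on success)
     */
    public static function convert($originFilePath, $outputDirPath, $toFormat)
    {
        // Escape every argument so paths with spaces or shell metacharacters are safe
        $command = sprintf(
            'unoconv --format %s --output %s %s',
            escapeshellarg($toFormat),
            escapeshellarg($outputDirPath),
            escapeshellarg($originFilePath)
        );
        // system() stores the command's exit status in its second argument
        system($command, $status);

        return $status;
    }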

    /**
     * Convert to PDF
     * 
     * @param string $originFilePath Origin File Path
     * @param string $outputDirPath  Output directory path
     */
    public static function convertToPdf($originFilePath, $outputDirPath)
    {
        return self::convert($originFilePath, $outputDirPath, 'pdf');
    }

    /**
     * Convert to TXT
     * 
     * @param string $originFilePath Origin File Path
     * @param string $outputDirPath  Output directory path
     */
    public static function convertToTxt($originFilePath, $outputDirPath)
    {
        return self::convert($originFilePath, $outputDirPath, 'txt');
    }

}

Sample use:

<?php
/**
 * Sample use of Unoconv class
 *  
 */
require 'Unoconv.php';

use Unoconv\Unoconv;

// Converting to PDF
$originFilePath = 'test.odt';
$outputDirPath  = './';
Unoconv::convertToPdf($originFilePath, $outputDirPath);

// Converting to DOCX
$originFilePath = 'test.odt';
$outputDirPath  = './';
Unoconv::convert($originFilePath, $outputDirPath, 'docx');

Comments

  • I have tried your code. I get no errors, but I also do not get any PDF file back.

    I am on Ubuntu 12.10 and installed unoconv exactly the way you described.

    But I can only run unoconv successfully in the terminal, not from a PHP script.

    Please advise.

    • Hi, @kimsia,

      The soffice command line dies silently on errors, which makes debugging really hard.

      In my tests, I ran into two major problems:

      – Missing LibreOffice requirements.
      – Permission issues.

      Maybe you can try:
      apt-get install libreoffice-java-common

      since the conversion is Java based. I don’t know whether it is already installed as a dependency.

      []’
      Rafael

      • My unoconv works in the terminal but not from the PHP script.

        I have already installed libreoffice-java-common.

        When I log in at the terminal, I use the same user, www-data, as nginx.

        Please advise.

        • If it helps anyone: in PHP you can just use the command "unoconv --format pdf /path/and/file.odt" and it will save the result in the same directory.
          The destination parameter seems to have an issue; I haven’t tested it yet 😀

  • If it’s of any use to anyone, I also had an issue where I could not execute unoconv via PHP. In my case, it was because I was running the Nginx web server with PHP-FPM.

    Nginx doesn’t declare a PATH for executing scripts, so you need to declare it manually.

    In this case, I added the following right before the sprintf call and it worked:

    putenv('PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/opt/node/bin');

    You may of course also declare it in your FastCGI params in Nginx.

    • lark1n

      Thank you, it works for me 🙂
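
The PATH problem described in the comment above can be checked from a shell before changing any PHP code. This is only a sketch: the function name resolves_under is invented for the example, and the PATH value to test is whatever your PHP process actually reports (for instance via getenv('PATH')).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a command can be found under a given PATH value.
resolves_under() {
    cmd=$1
    path_value=$2
    # Do the lookup in a subshell so the caller's own PATH is left untouched.
    ( PATH=$path_value; command -v "$cmd" >/dev/null 2>&1 )
}

# Usage: test the PATH seen by PHP against the one from your terminal, e.g.
#   resolves_under unoconv "/usr/bin:/bin" && echo "found" || echo "not found"
```

If the command is not found under the web server's PATH, either extend the PATH as shown above or call unoconv by its absolute path.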

  • A better script:

    #!/bin/sh
    ### BEGIN INIT INFO
    # Provides: unoconvd
    # Required-Start: $network
    # Required-Stop: $network
    # Default-Start: 2 3 4 5
    # Default-Stop: 0 1 6
    # Description: unoconvd - Converting documents to PDF by unoconv
    ### END INIT INFO

    case "$1" in
        status)
            if [ `ps ax | grep "/usr/lib/libreoffice/program/soffice.bin" | grep "accept=socket,host=127.0.0.1,port=2002;urp;StarOffice.ComponentContext" | wc -l` -gt 0 ]; then
                echo "Unoconv listener active"
            else
                echo "Unoconv listener inactive"
            fi
            ;;
        start)
            if [ `ps ax | grep "/usr/lib/libreoffice/program/soffice.bin" | grep "accept=socket,host=127.0.0.1,port=2002;urp;StarOffice.ComponentContext" | wc -l` -gt 0 ]; then
                echo "Unoconv listener already started."
            else
                /usr/bin/unoconv --listener &
                echo "Unoconv listener started."
            fi
            ;;
        stop)
            if [ `ps ax | grep "/usr/lib/libreoffice/program/soffice.bin" | grep "accept=socket,host=127.0.0.1,port=2002;urp;StarOffice.ComponentContext" | wc -l` -gt 0 ]; then
                killall soffice.bin
                echo "Unoconv listener stopped."
            else
                echo "Unoconv isn't running."
            fi
            ;;
        restart)
            $0 stop
            sleep 1
            $0 start
            ;;
        *)
            echo "Usage: /etc/init.d/unoconvd {start|stop|restart|status}"
            exit 1
            ;;
    esac

  • Hi Rafael. Could you please tell me whether I can check the conversion progress, and how?

  • Hi! Can you help me, Rafael Goulart? I did what you described. When I use the terminal everything runs as I want, but from the web with PHP code nothing works and there is no error. I don’t know how to fix it.

    • Hi Hung,
      check the working directory with var_dump(getcwd()); that should give you an idea.

  • I turned this into an Ansible task. https://gist.github.com/jdewit/9038743

    Thanks Rafael.

  • This is what I was searching for. Thanks for sharing; I will try it.

  • G. Plante

    Hi,

    I edited Paolo’s daemon script so that it works fine with CentOS 6.5 and OpenOffice. I thought it could help other people!

    #!/bin/sh
    ### BEGIN INIT INFO
    # Provides: unoconvd
    # Required-Start: $network
    # Required-Stop: $network
    # Default-Start: 2 3 5
    # Default-Stop:
    # Description: unoconvd - Converting documents to PDF by unoconv
    ### END INIT INFO
    MYPID=$(pidof /usr/lib64/openoffice.org3/program/soffice.bin)
    case "$1" in
        status)
            if [ `ps ax | grep "/usr/lib64/openoffice.org3/program/soffice.bin" | grep "port=2002" | wc -l` -gt 0 ]; then
                echo "Unoconv listener active. pid=$MYPID"
            else
                echo "Unoconv listener inactive"
            fi
            ;;
        start)
            if [ `ps ax | grep "/usr/lib64/openoffice.org3/program/soffice.bin" | grep "port=2002" | wc -l` -gt 0 ]; then
                echo "Unoconv listener already started."
            else
                /usr/bin/unoconv --listener &
                sleep 1
                # Look the PID up again; the value read at the top predates the start
                MYPID=$(pidof /usr/lib64/openoffice.org3/program/soffice.bin)
                echo "Unoconv listener started. pid=$MYPID"
            fi
            ;;
        stop)
            killall soffice.bin
            ;;
        restart)
            killall soffice.bin
            sleep 1
            /usr/bin/unoconv --listener &
            sleep 1
            MYPID=$(pidof /usr/lib64/openoffice.org3/program/soffice.bin)
            echo "Unoconv listener re-started. pid=$MYPID"
            ;;
    esac

  • Ravinesh

    I want to convert PDF to DOC. Is that possible with unoconv?

  • jagmohan

    Hi,

    I want to install unoconv + libreoffice on Heroku. Do you have some suggestions on how I go about it?

    Thanks!

  • Jaderson

    Does it convert complex documents with images and tables easily?

    • Nabil Aldhaleai

      Is that a question or a statement ?