Converting documents (ODT, DOC to PDF) in PHP with Unoconv / LibreOffice

The problem

Convert .doc, .odt and .docx files to .pdf, or between one another (.docx to .odt, for example),
with PHP.

To solve this we will install Unoconv and the LibreOffice
command-line tools, then build a class in PHP.

Installing unoconv

Unoconv is a Python tool that uses the LibreOffice libraries (pyuno).

Installing the LibreOffice command line

On the server there is no need to install all of LibreOffice,
only the command line and the converters that live in the core packages.

For Ubuntu/Debian:

apt-get install openjdk-6-jdk libreoffice-core libreoffice-common libreoffice-writer python-uno

Important:

Installing libreoffice-writer gives you support for converting TEXT documents.
To convert other formats (spreadsheets, presentations, images, etc.), install the package for the corresponding
LibreOffice application.

For images, consider lighter and more established tools such as ImageMagick.

To convert PDF to text, consider pdftotext, found in the poppler-utils package.

Installing Unoconv

As root:

cd /tmp
git clone https://github.com/dagwieers/unoconv
cd unoconv/
make install
cd ../
rm -rf unoconv/
unoconv --listener &

This starts LibreOffice/OpenOffice as a service listening on a local port,
which we can confirm with ps aux | grep soffice.

Notes:

  • For bulk operations this mode is preferable because it reuses
    the LibreOffice/OpenOffice instance that is already open.

  • Unoconv is available in the Debian repository, but in an old version. It is a Python script
    that uses the pyuno library provided by LibreOffice/OpenOffice.

To see the supported formats:

unoconv --show

Creating a daemon

To daemonize unoconv (more practical on a server), create the file /etc/init.d/unoconvd
with the content below:

( Source )

#!/bin/sh
### BEGIN INIT INFO
# Provides: unoconvd
# Required-Start: $network
# Required-Stop: $network
# Default-Start: 2 3 5
# Default-Stop:
# Description: unoconvd - Converting documents to PDF by unoconv
### END INIT INFO
case "$1" in
    start)
        /usr/bin/unoconv --listener &
        ;;
    stop)
        killall soffice.bin
        ;;
    restart)
        killall soffice.bin
        sleep 1
        /usr/bin/unoconv --listener &
        ;;
esac

Adjust the permissions, register the script to load at boot,
and start the daemon:

chmod 755 /etc/init.d/unoconvd
update-rc.d  unoconvd defaults
service unoconvd start

Basic usage

Whether the listener was started manually or through the daemon, unoconv can be used as follows:

unoconv --format pdf --output /OUTPUT_DIR/ file.odt

This converts file.odt to file.pdf in the given output directory.
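
Because a failed conversion may not print anything, it can help to check that the output file actually appeared. The sketch below is one possible way to convert every .odt file in a directory and report the ones that produced no PDF. The function name convert_dir and the UNOCONV_BIN override are assumptions made for this example; only the --format and --output options come from the command above.

```shell
#!/bin/sh
# Sketch: convert every .odt file in a directory to PDF and report the
# files for which no PDF appeared. convert_dir and UNOCONV_BIN are
# hypothetical names introduced for this example.
convert_dir() {
    src_dir=$1
    out_dir=$2
    failed=0
    mkdir -p "$out_dir" || return 1
    for src in "$src_dir"/*.odt; do
        [ -e "$src" ] || continue        # the glob matched nothing
        name=$(basename "$src" .odt)
        "${UNOCONV_BIN:-unoconv}" --format pdf --output "$out_dir/" "$src"
        # Do not trust the converter to complain: check for a non-empty result.
        if [ ! -s "$out_dir/$name.pdf" ]; then
            echo "no output for: $src" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every input produced a PDF.
    [ "$failed" -eq 0 ]
}
```

Called as convert_dir ./incoming ./pdf, it returns a non-zero status if any input file is missing its PDF.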

PHP class

A simple wrapper in PHP can be written as follows:

<?php

namespace Unoconv;

/**
 * Unoconv class wrapper
 *
 * @author Rafael Goulart <rafaelgou@gmail.com>
 * @see http://tech.rgou.net/
 */
class Unoconv {

    /**
     * Basic converter method
     *
     * @param string $originFilePath Origin File Path
     * @param string $outputDirPath  Output directory path
     * @param string $toFormat       Format to export To
     *
     * @return int Exit status of the unoconv command (conventionally 0 on success)
     */
    public static function convert($originFilePath, $outputDirPath, $toFormat)
    {
        // Escape every argument so that paths with spaces or shell
        // metacharacters cannot break or alter the command.
        $command = sprintf(
            'unoconv --format %s --output %s %s',
            escapeshellarg($toFormat),
            escapeshellarg($outputDirPath),
            escapeshellarg($originFilePath)
        );
        // system() stores the command's exit status in its second argument.
        system($command, $exitStatus);

        return $exitStatus;
    }

    /**
     * Convert to PDF
     *
     * @param string $originFilePath Origin File Path
     * @param string $outputDirPath  Output directory path
     *
     * @return int Exit status of the unoconv command
     */
    public static function convertToPdf($originFilePath, $outputDirPath)
    {
        return self::convert($originFilePath, $outputDirPath, 'pdf');
    }

    /**
     * Convert to TXT
     *
     * @param string $originFilePath Origin File Path
     * @param string $outputDirPath  Output directory path
     *
     * @return int Exit status of the unoconv command
     */
    public static function convertToTxt($originFilePath, $outputDirPath)
    {
        return self::convert($originFilePath, $outputDirPath, 'txt');
    }

}

Example of use:

<?php
/**
 * Sample use of Unoconv class
 *  
 */
require 'Unoconv.php';

use Unoconv\Unoconv;

// Converting to PDF
$originFilePath = 'test.odt';
$outputDirPath  = './';
Unoconv::convertToPdf($originFilePath, $outputDirPath);

// Converting to DOCX
$originFilePath = 'test.odt';
$outputDirPath  = './';
Unoconv::convert($originFilePath, $outputDirPath, 'docx');
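
If the class works from a terminal but produces nothing under the web server, one thing worth checking is whether unoconv can be found with the PATH that the PHP process receives. The sketch below is a small shell helper for that check. resolves_with_path is a hypothetical name, and the PATH value to test is whatever your PHP process actually reports (for example via getenv('PATH')).

```shell
#!/bin/sh
# Sketch: report whether a command can be found using only the given PATH.
# resolves_with_path is a hypothetical helper name for this example.
resolves_with_path() {
    cmd_name=$1
    path_value=$2
    # The subshell keeps the caller's own PATH untouched.
    # Note: shell builtins and functions resolve regardless of PATH,
    # so this check is only meaningful for external programs.
    ( PATH=$path_value; command -v "$cmd_name" >/dev/null 2>&1 )
}
```

For example, resolves_with_path unoconv "/usr/bin:/bin" || echo "unoconv not found with this PATH" prints the message when the lookup fails.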
  • I have tried your code. While I have no errors, I also do not get back any pdf file.

    My ubuntu is 12.10. I installed unoconv exactly the way you described.

    But I can only execute the unoconv successfully in terminal and not in PHP script.

    Please advise.

    • Hi, @kimsia,

      The soffice command line dies silently on errors. That makes debugging really hard.

      In my tests, there were two major problems that I faced:

      – Missing LibreOffice requirements.
      – Permission issues.

      Maybe you can try:
      apt-get install libreoffice-java-common

      since conversion is Java based. I don’t know if it’s already installed as a dependency.

      []’
      Rafael

      • My unoconv works in the terminal but not from a PHP script.

        I have already installed libreoffice-java-common.

        When I log in on the terminal, I am using the same user, www-data, as the nginx user.

        Please advise.

        • If it helps anyone: in PHP you can just use the command "unoconv --format /path/and/file.odt pdf" and it will save the result in the same directory.
          The destination parameter seems to have an issue; I haven't tested it yet 😀

  • If it’s of any use to anyone, I also had an issue where I could not execute unoconv via PHP. In my case, it was because I was running the Nginx web server with PHP-FPM.

    Nginx doesn’t declare a PATH for executing scripts, so you need to declare it manually.

    In this case, I added the following right before the sprintf command and it worked.

    putenv('PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/opt/node/bin');

    You may of course also declare it in your FastCGI params in Nginx.

    • lark1n

      Thank you, it work for me 🙂

  • A better script:

    #!/bin/sh
    ### BEGIN INIT INFO
    # Provides: unoconvd
    # Required-Start: $network
    # Required-Stop: $network
    # Default-Start: 2 3 4 5
    # Default-Stop: 0 1 6
    # Description: unoconvd - Converting documents to PDF by unoconv
    ### END INIT INFO

    case "$1" in
        status)
            if [ `ps ax | grep "/usr/lib/libreoffice/program/soffice.bin" | grep "accept=socket,host=127.0.0.1,port=2002;urp;StarOffice.ComponentContext" | wc -l` -gt 0 ]; then
                echo "Unoconv listener active"
            else
                echo "Unoconv listener inactive"
            fi
            ;;
        start)
            if [ `ps ax | grep "/usr/lib/libreoffice/program/soffice.bin" | grep "accept=socket,host=127.0.0.1,port=2002;urp;StarOffice.ComponentContext" | wc -l` -gt 0 ]; then
                echo "Unoconv listener already started."
            else
                /usr/bin/unoconv --listener &
                echo "Unoconv listener started."
            fi
            ;;
        stop)
            if [ `ps ax | grep "/usr/lib/libreoffice/program/soffice.bin" | grep "accept=socket,host=127.0.0.1,port=2002;urp;StarOffice.ComponentContext" | wc -l` -gt 0 ]; then
                killall soffice.bin
                echo "Unoconv listener stopped."
            else
                echo "Unoconv isn't running."
            fi
            ;;
        restart)
            $0 stop
            sleep 1
            $0 start
            ;;
        *)
            echo "Usage: /etc/init.d/unoconvd {start|stop|restart|status}"
            exit 1
            ;;
    esac

  • Hi Rafael. Could you please tell me if I can check converting progress and how?

  • Hi! Can you help me, Rafael Goulart? I did as you described. From the terminal everything runs as I want, but from the web with PHP code nothing works and there is no error. I don’t know how to fix it.

    • Hi Hung,
      check the path with var_dump(getcwd()), then you will get an idea.

  • I turned this into an Ansible task. https://gist.github.com/jdewit/9038743

    Thanks Rafael.

  • This is what I was searching for. Thanks for sharing, I will try it.

  • G. Plante

    Hi,

    I edited Paolo's daemon script so that it works fine with CentOS 6.5 and OpenOffice. I thought it could help other people!

    #!/bin/sh
    ### BEGIN INIT INFO
    # Provides: unoconvd
    # Required-Start: $network
    # Required-Stop: $network
    # Default-Start: 2 3 5
    # Default-Stop:
    # Description: unoconvd - Converting documents to PDF by unoconv
    ### END INIT INFO
    MYPID=$(pidof /usr/lib64/openoffice.org3/program/soffice.bin)
    case "$1" in
        status)
            if [ `ps ax | grep "/usr/lib64/openoffice.org3/program/soffice.bin" | grep "port=2002" | wc -l` -gt 0 ]; then
                echo "Unoconv listener active. pid=$MYPID"
            else
                echo "Unoconv listener inactive"
            fi
            ;;
        start)
            if [ `ps ax | grep "/usr/lib64/openoffice.org3/program/soffice.bin" | grep "port=2002" | wc -l` -gt 0 ]; then
                echo "Unoconv listener already started."
            else
                /usr/bin/unoconv --listener &
                echo "Unoconv listener started. pid=$MYPID"
            fi
            ;;
        stop)
            killall soffice.bin
            ;;
        restart)
            killall soffice.bin
            sleep 1
            /usr/bin/unoconv --listener &
            sleep 1
            MYPID=$(pidof /usr/lib64/openoffice.org3/program/soffice.bin)
            echo "Unoconv listener re-started. pid=$MYPID"
            ;;
    esac

  • Ravinesh

    I want to convert PDF to DOC. Is it possible with unoconv?

  • jagmohan

    Hi,

    I want to install unoconv + libreoffice on Heroku. Do you have some suggestions on how I go about it?

    Thanks!

  • Jaderson

    Does it convert complex documents with images and tables easily?