<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blog of a LAMP Developer based near Guildford, Surrey &#187; convert DOC to PDF</title>
	<atom:link href="http://www.lampdeveloper.co.uk/tag/convert-doc-to-pdf/feed" rel="self" type="application/rss+xml" />
	<link>http://www.lampdeveloper.co.uk</link>
	<description>A day in the life of a Lamp Developer</description>
	<lastBuildDate>Wed, 18 Aug 2010 14:44:13 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Converting a Doc to PDF, txt or HTML using PHP and Linux</title>
		<link>http://www.lampdeveloper.co.uk/linux/converting-doc-to-pdf-txt-or-html-using-php-and-linux.html</link>
		<comments>http://www.lampdeveloper.co.uk/linux/converting-doc-to-pdf-txt-or-html-using-php-and-linux.html#comments</comments>
		<pubDate>Fri, 06 Mar 2009 12:46:15 +0000</pubDate>
		<dc:creator>Jamie</dc:creator>
				<category><![CDATA[Cakephp]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[cakephp component]]></category>
		<category><![CDATA[convert DOC to PDF]]></category>
		<category><![CDATA[unoconv]]></category>

		<guid isPermaLink="false">http://www.lampdeveloper.co.uk/?p=16</guid>
		<description><![CDATA[This has been an issue that has bothered me for a while. I finally found a solution that worked and doesn&#8217;t kill your server in the process.
I give you two words: OpenOffice. Or is that one?
This is what I&#8217;m running for this test:
OS: CentOS release 5.2 (Final)
PHP: PHP 5.2.8
OpenOffice 1.2.3
Firstly I installed several programs [...]]]></description>
			<content:encoded><![CDATA[<p>This has been an issue that has bothered me for a while. I finally found a solution that worked and doesn&#8217;t kill your server in the process.</p>
<p>I give you two words: OpenOffice. Or is that one?</p>
<p>This is what I&#8217;m running for this test:</p>
<p>OS: CentOS release 5.2 (Final)<br />
PHP: PHP 5.2.8<br />
OpenOffice 1.2.3</p>
<p>First, I installed several packages using yum. You will need DAG&#8217;s (RPMforge) repository:</p>
<p>rpm -Uhv <a href="http://apt.sw.be/redhat/el5/en/x86_64/rpmforge/RPMS//rpmforge-release-0.3.6-1.el5.rf.x86_64.rpm">http://apt.sw.be/redhat/el5/en/x86_64/rpmforge/RPMS//rpmforge-release-0.3.6-1.el5.rf.x86_64.rpm</a></p>
<p><code>yum install unoconv openoffice.org-headless openoffice.org-writer</code></p>
<p>unoconv is a handy command-line tool that can run as a daemon and talk to the OpenOffice binary.</p>
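<p>Before going further it can help to confirm that the packages put their binaries where PHP will look for them; the component later in this post hard-codes <code>/usr/bin/unoconv</code>. This is a minimal sketch, written for PHP 7.1 or later rather than the PHP 5.2 used here, with a hypothetical helper name <code>find_binary()</code> that searches the directories in <code>PATH</code> the way a shell would:</p>

```php
<?php
// Hypothetical helper: return the full path of an executable found on PATH,
// or null if no directory in PATH contains an executable with that name.
function find_binary(string $name, ?string $path = null): ?string
{
    // Fall back to the current process's PATH when none is given
    $path = $path ?? (string) getenv('PATH');

    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty entries such as a trailing ':'
        }
        $candidate = rtrim($dir, '/') . '/' . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Usage: report where unoconv lives, if anywhere
$unoconv = find_binary('unoconv');
echo $unoconv === null ? "unoconv not found on PATH\n" : "unoconv found at $unoconv\n";
```

<p>If this reports a different location, adjust <code>UNOCONV_PATH</code> in the component to match.</p>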
<p>To run the commands from Apache, you need to change the apache user&#8217;s home directory and make it writable.</p>
<p>mkdir /home/apache<br />
chown apache:apache /home/apache<br />
usermod -d /home/apache apache<br />
chmod 755 /home/apache</p>
<p>Now the apache user can create the hidden .openoffice.org2.0 directory.</p>
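<p>Because OpenOffice writes that profile directory under <code>$HOME</code>, a conversion from PHP will fail if <code>HOME</code> is missing or read-only. A minimal sketch, assuming PHP 7 or later and a hypothetical helper <code>home_is_usable()</code>, that checks this before calling unoconv; it reads <code>HOME</code> the same way the component below sets it with <code>putenv()</code>:</p>

```php
<?php
// Hypothetical helper: check that $HOME points at an existing, writable
// directory, since OpenOffice needs to create its profile directory there.
function home_is_usable(): bool
{
    $home = getenv('HOME');
    return $home !== false && $home !== '' && is_dir($home) && is_writable($home);
}

// Usage: mirror the component, which points HOME at /home/apache first
putenv('HOME=/home/apache');
if (!home_is_usable()) {
    error_log('HOME is missing or not writable; OpenOffice cannot create its profile');
}
```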
<p>With the setup out of the way, we need to start the OpenOffice daemon.</p>
<p>I did this as root, but you could start it as the apache user.</p>
<p><code>unoconv --listener &amp;</code></p>
<p>This starts the following daemon:</p>
<p><code>soffice.bin -nologo -nodefault -accept=socket,host=localhost,port=2002;urp;StarOffice.ComponentContext</code></p>
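<p>The <code>-accept=socket,host=localhost,port=2002</code> part means the daemon listens on TCP port 2002. A minimal sketch, assuming PHP 7.1 or later and a hypothetical helper <code>listener_is_up()</code>, that checks whether anything is accepting connections on that port before a conversion is attempted. It only proves the port is open, not that the process behind it is OpenOffice:</p>

```php
<?php
// Hypothetical helper: return true if a TCP connection to $host:$port
// succeeds within $timeout seconds. This only shows the port is open;
// it does not speak OpenOffice's UNO protocol.
function listener_is_up(string $host = 'localhost', int $port = 2002, float $timeout = 1.0): bool
{
    // @ suppresses the warning fsockopen() emits when the connection is refused
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Usage: fail early instead of letting unoconv error out
if (!listener_is_up()) {
    echo "No listener on localhost:2002; start it with: unoconv --listener &\n";
}
```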
<p>You can now send requests to port 2002 using unoconv:</p>
<p><code>/usr/bin/unoconv --server localhost --port 2002 --stdout -f pdf input.doc</code></p>
<p>This writes the PDF to standard output (stdout).</p>
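<p>Since the PDF arrives on stdout, a PHP caller has to capture it and check that it really is a PDF. A minimal sketch, assuming PHP 7 or later and hypothetical helpers <code>unoconv_command()</code> and <code>looks_like_pdf()</code>: the first builds the command above with every argument passed through <code>escapeshellarg()</code>, so file names with spaces or shell metacharacters stay safe, and the second checks for the <code>%PDF-</code> signature a PDF file starts with (a cheap sanity check, not full validation):</p>

```php
<?php
// Hypothetical helper: build the unoconv call shown above for a given
// output format, quoting each argument so the shell treats it as one word.
function unoconv_command(string $input, string $format = 'pdf', string $unoconv = '/usr/bin/unoconv'): string
{
    return escapeshellarg($unoconv)
        . ' --server localhost --port 2002 --stdout -f ' . escapeshellarg($format)
        . ' ' . escapeshellarg($input);
}

// Hypothetical helper: a PDF file begins with the bytes "%PDF-".
// shell_exec() can return null or false, so anything but a string fails.
function looks_like_pdf($data): bool
{
    return is_string($data) && strncmp($data, '%PDF-', 5) === 0;
}

// Usage: only run the conversion when unoconv is actually installed
if (is_executable('/usr/bin/unoconv')) {
    $pdf = shell_exec(unoconv_command('/tmp/input.doc'));
    if (!looks_like_pdf($pdf)) {
        echo "unoconv did not return a PDF\n";
    }
}
```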
<p>Here is a CakePHP component I wrote to talk to unoconv. Please note this is very alpha and has had only a small amount of testing, but it works <img src='http://www.lampdeveloper.co.uk/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  To use it, you must create these directories in your CakePHP install:</p>
<p><code>TMP . 'filegenerator/'</code> (base of <code>TMP_FOLDER</code>)<br />
<code>ROOT . '/uploads/generatedpdfs/'</code> (<code>PDFSTORE</code>)<br />
<code>ROOT . '/uploads/docfiles/'</code> (<code>DOCSTORE</code>)</p>
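<p>These folders can be created in one go with <code>mkdir()</code>'s recursive flag. A minimal sketch, assuming PHP 7 or later, a hypothetical helper <code>ensure_dirs()</code>, and that <code>TMP</code> and <code>ROOT</code> are the constants CakePHP defines (the usage only runs when they exist):</p>

```php
<?php
// Hypothetical helper: create each directory (and any missing parents) if it
// does not exist yet. Returns the paths that could not be created or written.
function ensure_dirs(array $dirs, int $mode = 0755): array
{
    $problems = [];
    foreach ($dirs as $dir) {
        if (!is_dir($dir) && !@mkdir($dir, $mode, true) && !is_dir($dir)) {
            $problems[] = $dir; // could not create it
        } elseif (!is_writable($dir)) {
            $problems[] = $dir; // exists, but this process cannot write to it
        }
    }
    return $problems;
}

// Usage inside a CakePHP app, where TMP and ROOT are defined by the framework
if (defined('TMP') && defined('ROOT')) {
    $missing = ensure_dirs([
        TMP . 'filegenerator/',
        ROOT . '/uploads/generatedpdfs/',
        ROOT . '/uploads/docfiles/',
    ]);
    foreach ($missing as $dir) {
        error_log("Cannot create or write to $dir");
    }
}
```

<p>Run it as the apache user (for example from a controller action) so the folders end up owned by the account that will write to them.</p>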
<p>It can be used with a form upload:</p>
<pre name="code" class="php">

$this-&gt;Filegenerator = new FilegeneratorComponent ($this-&gt;params[&quot;form&quot;][&#039;uploaddocfile&#039;]);
// a constructor cannot return false (the object is always truthy), so check
// the results instead: each convert method returns null if validation failed

// returns the text version of the Doc
$text = $this-&gt;Filegenerator-&gt;convertDocToTxt();
if ($text !== null) {

// returns the HTML version of the Doc
$html = $this-&gt;Filegenerator-&gt;convertDocToHtml();
// saves the PDF under PDFSTORE/$doc_id/ and returns the generated file name
$pdf = $this-&gt;Filegenerator-&gt;convertDocToPdf($doc_id);

}
</pre>
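<p>A constructor cannot report failure through its return value: PHP discards it and <code>new</code> always yields an object. A common alternative is to throw an exception from the constructor so the caller can catch a bad upload. A minimal sketch of that pattern, assuming PHP 7.4 or later and a hypothetical <code>UploadedDoc</code> class that is not part of the component:</p>

```php
<?php
// Hypothetical class showing the pattern: validate in the constructor and
// throw, rather than returning false (PHP ignores a constructor's return value).
class UploadedDoc
{
    private const ALLOWED_TYPES = ['application/msword' => 'doc'];

    private array $fileinfo;

    public function __construct(array $fileinfo)
    {
        // note: 'type' is supplied by the browser, so treat it only as a hint
        if (!isset($fileinfo['tmp_name'], $fileinfo['type'])
            || !is_file($fileinfo['tmp_name'])
            || !array_key_exists($fileinfo['type'], self::ALLOWED_TYPES)) {
            throw new InvalidArgumentException('Not a valid uploaded .doc file');
        }
        $this->fileinfo = $fileinfo;
    }

    public function path(): string
    {
        return $this->fileinfo['tmp_name'];
    }
}

// Usage: the failure case is now visible to the caller
try {
    $doc = new UploadedDoc(['tmp_name' => '/nonexistent', 'type' => 'application/msword']);
} catch (InvalidArgumentException $e) {
    echo 'Upload rejected: ' . $e->getMessage() . "\n";
}
```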
<p>The component, called filegenerator.php:</p>
<pre name="code" class="php">

&lt;?php
/**
* Class Used to convert files.
*@author jamiescott.net
*/
class FilegeneratorComponent extends Object {

// allowed input MIME types, mapped to their file extension
private $allowable_files = array (&#039;application/msword&#039; =&gt; &#039;doc&#039; );
// set to true if the constructor validated the upload
private $pass = false;
// the uploaded file info passed to the constructor
private $fileinfo;

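/**
* Validate the uploaded file and create a working folder for it.
*
* @param array $fileinfo the uploaded file entry from the form data, e.g.
* (
*   [name] =&gt; test.doc
*   [type] =&gt; application/msword
*   [tmp_name] =&gt; /Applications/MAMP/tmp/php/php09PYNO
*   [error] =&gt; 0
*   [size] =&gt; 79360
* )
*
* Constructors cannot return a value to the caller, so check the result
* of the convert methods instead: they return null if validation failed.
*/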
function __construct($fileinfo) {

// folder to process all the files etc
define ( &#039;TMP_FOLDER&#039;, TMP . &#039;filegenerator/&#039; . $this-&gt;generatefoldername () . &#039;/&#039; );

// where unoconv is installed
define ( &#039;UNOCONV_PATH&#039;, &#039;/usr/bin/unoconv&#039; );
// where to store pdf files
define ( &#039;PDFSTORE&#039;, ROOT . &#039;/uploads/generatedpdfs/&#039; );
// where to store doc files
define ( &#039;DOCSTORE&#039;, ROOT . &#039;/uploads/docfiles/&#039; );
// apache home dir
define ( &#039;APACHEHOME&#039;, &#039;/home/apache&#039; );
// set some shell environment vars (OpenOffice needs a writable HOME)
putenv ( &quot;HOME=&quot;.APACHEHOME );
putenv ( &quot;PWD=&quot;.APACHEHOME );

// check that the file info was passed, the tmp file exists, the file type
// is allowed and the tmp folder could be created
if (is_array ( $fileinfo ) &amp;&amp; file_exists ( $fileinfo [&#039;tmp_name&#039;] ) &amp;&amp; in_array ( $fileinfo [&#039;type&#039;], array_keys ( $this-&gt;allowable_files ) ) &amp;&amp; $this-&gt;createtmp ()) {

// keep the upload info for the convert methods
$this-&gt;fileinfo = $fileinfo;
// the constructor ran ok
$this-&gt;pass = true;

}
// no return value here: PHP ignores whatever a constructor returns,
// so $this-&gt;pass is what records success or failure

}

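/**
* Takes the file set in the constructor, turns it into a PDF,
* stores it in PDFSTORE (/uploads/generatedpdfs/) and returns the filename
*
* @param string|false $foldername optional subfolder of PDFSTORE to save into
* @return string the filename if the PDF was generated, null if the constructor failed
*/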
function convertDocToPdf($foldername=false) {

if ($this-&gt;pass) {

// generate a random name
$output_pdf_name = $this-&gt;generatefoldername () . &#039;.pdf&#039;;

// move it to the tmp folder for processing
if (! copy ( $this-&gt;fileinfo [&#039;tmp_name&#039;], TMP_FOLDER . &#039;input.doc&#039; ))
die ( &#039;Error copying the doc file&#039; );

$command = UNOCONV_PATH;
$args = &#039; --server localhost --port 2002 --stdout -f pdf &#039; . escapeshellarg ( TMP_FOLDER . &#039;input.doc&#039; );

$run = $command . $args;

//echo $run; die;
$pdf = shell_exec ( $run );
$end_of_line = strpos ( $pdf, &quot;\n&quot; );
$start_of_file = substr ( $pdf, 0, $end_of_line );

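// eregi() is deprecated (removed in PHP 7); stripos() does the same case-insensitive check
if (stripos ( $start_of_file, &#039;%PDF&#039; ) === false)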
die ( &#039;Error Generating the PDF file&#039; );

if(!file_exists(PDFSTORE.$foldername)){
mkdir(PDFSTORE.$foldername);
}

// file saved
if(!$this-&gt;_createandsave($pdf, PDFSTORE.&#039;/&#039;.$foldername.&#039;/&#039;, $output_pdf_name)){
die(&#039;Error Saving The PDF&#039;);
}

return $output_pdf_name;

}

}

/**
* Return a text version of the Doc
*
* @return string the text, or null if the constructor failed
*/
function convertDocToTxt() {

if ($this-&gt;pass) {

// move it to the tmp folder for processing
if (! copy ( $this-&gt;fileinfo [&#039;tmp_name&#039;], TMP_FOLDER . &#039;input.doc&#039; ))
die ( &#039;Error copying the doc file&#039; );

$command = UNOCONV_PATH;
$args = &#039; --server localhost --port 2002 --stdout -f txt &#039; . escapeshellarg ( TMP_FOLDER . &#039;input.doc&#039; );

$run = $command . $args;

//echo $run; die;
$txt = shell_exec ( $run );

// assume something went wrong if fewer than 10 characters came back
if (strlen($txt) &lt; 10)
die ( &#039;Error Generating the TXT&#039; );

// return the text of the Doc
return $txt;

}

}

/**
* Convert the Doc to HTML and return the HTML
*
* @return string the HTML, or null if the constructor failed
*/
function convertDocToHtml() {

if ($this-&gt;pass) {

// move it to the tmp folder for processing
if (! copy ( $this-&gt;fileinfo [&#039;tmp_name&#039;], TMP_FOLDER . &#039;input.doc&#039; ))
die ( &#039;Error copying the doc file&#039; );

$command = UNOCONV_PATH;
$args = &#039; --server localhost --port 2002 --stdout -f html &#039; . escapeshellarg ( TMP_FOLDER . &#039;input.doc&#039; );

$run = $command . $args;

//echo $run; die;
$html= shell_exec ( $run );
$end_of_line = strpos ( $html, &quot;\n&quot; );
$start_of_file = substr ( $html, 0, $end_of_line );

if (stripos ( $start_of_file, &#039;HTML&#039; ) === false)
die ( &#039;Error Generating the HTML&#039; );

// return the HTML of the Doc
return $html;

}

}
/**
* Create a file and store data in it
*
* @param string $data the content to write
* @param string $location directory to write into, with a trailing slash
* @param string $file name of the file to create
* @return bool true if the file was written
*/
function _createandsave($data, $location, $file) {

if (is_writable ( $location )) {

// open in write mode: the file is created, or truncated if it already exists
if (! $handle = fopen ( $location.$file, &#039;w&#039; )) {
trigger_error(&quot;Cannot open file ($location$file)&quot;);
return false;
}

// write the data to the opened file
if (fwrite ( $handle, $data ) === FALSE) {
fclose ( $handle );
trigger_error(&quot;Cannot write to file ($location$file)&quot;);
return false;
}

fclose ( $handle );
return true;

} else {
trigger_error(&quot;The directory $location is not writable&quot;);
return false;
}

}

function __destruct() {

// remove the tmp folder

if (file_exists ( TMP_FOLDER ) &amp;&amp; strlen ( TMP_FOLDER ) &gt; 4)
$this-&gt;removetmp ();

}

/**
* Create the tmp directory to hold and process the files
*
* @return bool true if the folder was created
*/
function createtmp() {

if (is_writable ( TMP )) {

if (mkdir ( TMP_FOLDER ))
return true;

} else {

return false;
}

return false;

}

/**
* Delete the tmp dir
*
* @return bool true if the folder was removed
*/
function removetmp() {

if (strlen ( TMP_FOLDER ) &gt; 3 &amp;&amp; file_exists ( TMP_FOLDER )) {

if ($this-&gt;recursive_remove_directory ( TMP_FOLDER ))
return true;

}

return false;
}

/**
* Return a random string for the folder name
*
* @return string
*/
function generatefoldername() {

return md5 ( microtime () );

}

/**
* Recursively delete a directory, or just empty it
*
* @param string $directory the directory to delete
* @param bool $empty if true, remove the contents but keep the directory itself
* @return bool
*/
function recursive_remove_directory($directory, $empty = FALSE) {
// if the path has a slash at the end we remove it here
if (substr ( $directory, - 1 ) == &#039;/&#039;) {
$directory = substr ( $directory, 0, - 1 );
}

// if the path is not valid or is not a directory ...
if (! file_exists ( $directory ) || ! is_dir ( $directory )) {
// ... we return false and exit the function
return FALSE;

// ... if the path is not readable
} elseif (! is_readable ( $directory )) {
// ... we return false and exit the function
return FALSE;

// ... else if the path is readable
} else {

// we open the directory
$handle = opendir ( $directory );

// and scan through the items inside
while ( FALSE !== ($item = readdir ( $handle )) ) {
// if the filepointer is not the current directory
// or the parent directory
if ($item != &#039;.&#039; &amp;&amp; $item != &#039;..&#039;) {
// we build the new path to delete
$path = $directory . &#039;/&#039; . $item;

// if the new path is a directory
if (is_dir ( $path )) {
// we call this function with the new path
$this-&gt;recursive_remove_directory ( $path );

// if the new path is a file
} else {
// we remove the file
unlink ( $path );
}
}
}
// close the directory
closedir ( $handle );

// if the option to empty is not set to true
if ($empty == FALSE) {
// try to delete the now empty directory
if (! rmdir ( $directory )) {
// return false if not possible
return FALSE;
}
}
// return success
return TRUE;
}
}
}
?&gt;
</pre>
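<p>Two parts of the component, <code>recursive_remove_directory()</code> and <code>_createandsave()</code>, can be written more compactly with the standard library. A minimal sketch, assuming PHP 7 or later and hypothetical function names <code>remove_tree()</code> and <code>save_file()</code>: the first walks the tree children-first with <code>RecursiveIteratorIterator::CHILD_FIRST</code>, so every directory is already empty when <code>rmdir()</code> reaches it, and the second replaces the <code>fopen()</code>/<code>fwrite()</code>/<code>fclose()</code> sequence with <code>file_put_contents()</code>:</p>

```php
<?php
// Hypothetical replacement for recursive_remove_directory(): delete a
// directory and everything in it. Returns false if the path is not a directory.
function remove_tree(string $dir): bool
{
    if (!is_dir($dir)) {
        return false;
    }
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST // visit contents before their parent
    );
    foreach ($items as $item) {
        // symlinks are unlinked rather than passed to rmdir()
        if ($item->isDir() && !$item->isLink()) {
            rmdir($item->getPathname());
        } else {
            unlink($item->getPathname());
        }
    }
    return rmdir($dir);
}

// Hypothetical replacement for _createandsave(): write $data to $location.$file,
// creating or truncating the file. Returns false on failure.
function save_file(string $data, string $location, string $file): bool
{
    if (!is_dir($location) || !is_writable($location)) {
        trigger_error("The directory $location is not writable", E_USER_WARNING);
        return false;
    }
    return file_put_contents($location . $file, $data) !== false;
}

// Usage: write a file into a scratch folder, then remove the folder
$scratch = sys_get_temp_dir() . '/filegenerator-demo-' . bin2hex(random_bytes(4)) . '/';
mkdir($scratch);
save_file('%PDF-1.4 demo', $scratch, 'out.pdf');
remove_tree($scratch);
```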
]]></content:encoded>
			<wfw:commentRss>http://www.lampdeveloper.co.uk/linux/converting-doc-to-pdf-txt-or-html-using-php-and-linux.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
