Help me avoid MS-Office lock-in

Forum: Linux
Total Replies: 29
Sander_Marechal

May 06, 2007
1:06 PM EDT
I need some help. I'm fairly average in online discussions about the virtues of Free Software, but I suck at it in real life because I have less time to think before I respond. To cut a long story short, at my new job they asked me to design and implement an automated report-generating toolchain. Basically it sucks in data from various sources, processes it and spits out PDF reports at the other end. The trick is this: sometimes people need to add an extra bit of comment to a report, explaining a figure or graph, whatever.

I had it all figured out. The report templates were going to be LaTeX, processed by Smarty (which did all the data processing), and pdflatex would create the final PDFs. I'd give everyone LyX so they could easily change the LaTeX reports and create a customized PDF report.
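
A minimal sketch of that template-to-PDF step, written in Python rather than the Smarty/PHP setup described here. The class name `ReportTemplate`, the `@{...}` placeholder syntax and both function names are assumptions made for illustration; only the `pdflatex` call reflects the toolchain named above, and it needs `pdflatex` on the PATH.

```python
import string
import subprocess
from pathlib import Path


class ReportTemplate(string.Template):
    # '$' starts math mode in LaTeX, so placeholders use '@' instead (assumption).
    delimiter = "@"


def render_report(template_text, values):
    """Fill @{name} placeholders in a LaTeX template with report data."""
    return ReportTemplate(template_text).substitute(values)


def build_pdf(tex_source, workdir, jobname="report"):
    """Write the .tex file and run pdflatex on it; returns the expected PDF path."""
    workdir = Path(workdir)
    tex_path = workdir / f"{jobname}.tex"
    tex_path.write_text(tex_source, encoding="utf-8")
    subprocess.run(
        [
            "pdflatex",
            "-interaction=nonstopmode",
            f"-output-directory={workdir}",
            str(tex_path),
        ],
        check=True,
    )
    return workdir / f"{jobname}.pdf"
```

Keeping the rendering separate from the `pdflatex` call means a hand-edited `.tex` file (from LyX, say) can go through `build_pdf` unchanged.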

My bosses, however, thought this was a bad idea. Why give everyone LyX and spend a whole 10 minutes telling people how to edit a LaTeX document when they all have MS-Office 2000 already? So I'm supposed to find a way to create MS-Word files instead. Their first great idea was to have Smarty generate ODF templates, then run everything through OpenOffice.org into MS-Office and finally through a doc2pdf conversion utility. After showing them the many incompatibilities, I tried using HTML, RTF and a bunch of other formats to get stuff into Word, but none suffices. Alternatively, they want something with a web interface. Yes, let's spend 80-100 hours developing a web interface to edit LaTeX or whatever so we don't have to install LyX on 5 computers.

Anyway, I need some arguments, soundbites, articles, etcetera to throw at these people. I'm pretty bad at discussing this in real life so next time I want to go in prepared. I have a hard time convincing them that using MS-Office in a high volume processing chain is a bad idea. That designing something like this and redesigning it every time MS-Office changes is a bad idea. That Word's crappy output can't match the professional typesetting of LaTeX, etcetera, etcetera, etcetera.

PS: I've tried other formats besides LaTeX too. DocBook was scrapped because it's too inflexible (try putting an image (graph) next to a table). HTML gives me no control over headers, footers or pagination. ODF is too complicated to generate by hand in a simple, flexible way, as is XSL-FO, etcetera.
jimf

May 06, 2007
2:29 PM EDT
> they all have MS-Office 2000 already

Well there's your problem. They've already made up their minds. What I'm very much afraid of is that, if you push your solution against the obvious bias, you may be the one losing in a very personal way. Sorry to have to say it, but, have your resume current...

If it were me, I'd be looking to develop reports in MS Access, as it will do the job and later can be transferred to mysql and knoda, without much work at all, when they finally decide to give up on the M$. Heck, with knoda even the forms, reports, and such would look the same.

The other thing you might want to implement is the use of ghostscript as a means to transfer any reports into pdf/ps. It's GPL on both platforms and the routine is simple.
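
A rough sketch of that Ghostscript routine, wrapped in Python. The function names are made up for illustration; the switches are standard Ghostscript options, and the executable is assumed to be called `gs` (on Windows it usually has a different name, such as `gswin32c`).

```python
import subprocess


def ghostscript_pdf_command(ps_path, pdf_path, executable="gs"):
    """Build the Ghostscript command line that turns a PostScript file into a PDF."""
    return [
        executable,
        "-dBATCH",            # exit when the input has been processed
        "-dNOPAUSE",          # do not wait for a keypress between pages
        "-dSAFER",            # restrict file access from within the PostScript
        "-sDEVICE=pdfwrite",  # select the PDF output device
        f"-sOutputFile={pdf_path}",
        str(ps_path),
    ]


def ps_to_pdf(ps_path, pdf_path):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(ghostscript_pdf_command(ps_path, pdf_path), check=True)
```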
Sander_Marechal

May 06, 2007
2:53 PM EDT
Quoting: Well there's your problem. They've already made up their minds.


That's the thing. I'm not too sure they have. I just think I promoted my solution quite badly.

MS Access isn't the solution. Apart from the fact that I can't program VB, the thing is supposed to run on a Debian server. And the data is being sucked in through some very application-specific XMLRPC APIs (currently I'm doing that with a combo of PHP and Python: PHP to parse the templates and figure out what data it needs for the reports, Python to run a multi-threaded XMLRPC daemon that does the actual collecting). I don't think Access is able to handle things like that.
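
The thread does not show the collecting daemon itself, so the following is only an illustration of the general idea: several XML-RPC calls issued concurrently from worker threads. The request tuple layout, `fetch_via_xmlrpc` and `collect_all` are hypothetical names; only the standard-library `xmlrpc.client` and `concurrent.futures` modules are real.

```python
from concurrent.futures import ThreadPoolExecutor
from xmlrpc.client import ServerProxy


def fetch_via_xmlrpc(request):
    """Perform one XML-RPC call; `request` is (url, method_name, args)."""
    url, method, args = request
    with ServerProxy(url) as proxy:
        return getattr(proxy, method)(*args)


def collect_all(requests, fetch=fetch_via_xmlrpc, max_workers=4):
    """Run all requests on a thread pool and return results in request order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, requests))
```

Passing `fetch` as a parameter lets the collector be exercised with a fake function, without any server running.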
jimf

May 06, 2007
3:20 PM EDT
> I can't program VB

Lol, I 'don't program' and I can still hack VB...

Seriously, then the problem is still the intersection, or lack of one, between Linux and MS. The obvious solution is to rid yourself of those pesky MS desktops and go for a Linux KDE desktop, but that presents other problems. Mainly resistance to change, but, that's a big one.

Despite the reluctance to change, most businesses really dislike the MS Vista upgrade path for a multitude of reasons. You might point out to them that Linux is the obvious future, and, better to move on a complete solution now than wait till the situation becomes untenable or even critical.

Sander_Marechal

May 06, 2007
3:48 PM EDT
Well, I can hack VB too. Make Excel do some nice pony trick or some such, but not enough to do any real programming in it.

As it turns out, there actually is a way to get good Word docs from PHP: use a COM component from PHP on a Windows server. But even MS says that it's a risky approach, likely to crash your box if you're not careful (imagine loading 20 totally separate instances of Word... ugh).
jimf

May 06, 2007
3:58 PM EDT
I'm sure you can get a Word doc from OO too... without all that messing around.
NoDough

May 06, 2007
5:45 PM EDT
Sander,

Here's the trick. Instead of focusing on how to do this without Office 2k, give them what they are asking for. Give them a solution using Office 2k. However, make it obvious that this is going to be PAINFUL!

"Yes, sir. We can do this is office 2k. However, the users will have to save the documents to this specific directory, with this specific naming convention, in this specific format. Oh, and if they accidentally change any options before saving, all bets are off. Although I would like to give them an instantaneous PDF file, I'm afraid that option won't work in this configuration. But the directory will be polled every 15minutes for new files, so they won't have to wait long."

"Oh! Yes, sir! I still have the Lyx routines."
Sander_Marechal

May 06, 2007
10:09 PM EDT
Well, it's probably true that if I figure out a way to do this in Office 2K, I can no longer export to PDF fully automatically. So in order to let the users edit the 1-5% of documents manually in Office, they would need to do all the PDF exporting themselves as well. There aren't any doc2pdf converters that run on Linux that I am aware of (well.... OpenOffice.org maybe).
jimf

May 07, 2007
12:03 AM EDT
> OpenOffice.org maybe

maybe??? It's been File > 'Export to PDF' on the menu for as long as I can remember....
Sander_Marechal

May 07, 2007
1:20 AM EDT
Yes, but I wonder if scripting OOo to become an automated batch converter is the smartest thing to do. It didn't seem to work out too well for doc->odf converters. Not to mention layout fidelity when importing docs into OOo. It usually works, but not always.

Is there perhaps some other document format that's not DOC or LaTeX that would be good for automated reports and that exports well to PDF (with things like page headers, page breaks and all)?
jimf

May 07, 2007
1:32 AM EDT
I was doing doc to pdf conversions in OO for a while. It seemed to do an excellent job with page headers, page breaks and all. Perhaps even better than with the native odt format, but, I don't claim to be an expert on this, and, I wasn't using many graphics and such.

I know that Don uses OO extensively. Perhaps he has some input?
jimf

May 07, 2007
3:15 AM EDT
I don't know if this pertains, but reading the latest from Distrowatch they mention a word processor named LyX. Being curious, I looked in the Debian repos and sure'nough, it was there.

I haven't really even begun to play with it, but this thing looks to be nothing short of 'amazing'. All the advantages of LaTeX and none of the pain. At the very least, Sander, it's a great tool for you. I also think that ALL who write for LXer 'MUST' take a look at it.
Sander_Marechal

May 07, 2007
3:52 AM EDT
jimf: Look back at my first post. I was already using LyX. It's a great tool, though it does not have all of LaTeX's flexibility (try putting a table next to an image. It's quite a pain). The PDFs it creates are nothing short of stunning though.

In the meantime I've discovered JOOReports (http://jooreports.sourceforge.net/). I'm reading up about it, but it looks nice and is reasonably flexible. I'm trying to figure out how I would fit it in the workflow. Their demo mentions that you can run OOo in "service mode" in which it can act as a server-side document translator. Interesting!
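
For readers trying the server-side conversion idea today, here is a hedged sketch. It assumes a current LibreOffice (the descendant of OOo), whose `soffice` binary accepts `--headless` and `--convert-to`; as far as I know the OOo 2.x discussed in this thread did not offer that switch, which is why tools like JOOReports talked to a running OOo service instead. The function names are illustrative.

```python
import subprocess


def convert_command(source, target_format="pdf", outdir=".", executable="soffice"):
    """Build a headless LibreOffice command that converts one document."""
    return [
        executable,
        "--headless",                  # no GUI; suitable for a server
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert(source, target_format="pdf", outdir="."):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.run(convert_command(source, target_format, outdir), check=True)
```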
dcparris

May 07, 2007
7:37 AM EDT
Sander, I published "Penguin in the Pew" using OpenOffice.org. I did some of the typing in Word during down time at work. The formatting and bulk of the typing was all OOWriter. I exported to PDF, and published through Lulu.com. I only ran into one problem with a font that didn't work with PDF format for some reason - a fellow OOo user ran my doc through a perl script and found the text using that font. I had been experimenting with different fonts, and couldn't find the instance of this particular one anywhere. After he found it, I was able to change it.

That was the only hiccup. Everything else worked like a charm. Going against the grain of what others feel about OOo, I very much enjoy Writer, and its document navigator is teh cool when working with long docs.
jimf

May 07, 2007
8:13 AM EDT
> I was already using LyX

sorry, missed that :(

> Going against the grain of what others feel about OOo

Lol, at least it now looks good enough so I don't just run from it.
Sander_Marechal

May 07, 2007
2:09 PM EDT
dcparris: My main beef with OOo after playing with it all day is that I cannot export nested tables to MS-Office. I still have to figure out some way of getting my API hooks in there as well.

Currently I use the Smarty templating engine with some custom plugins. For example, writing the following piece of code generates a PNG or SVG graph for me:

{graph type="SalesTotal" from="last week" to="today"}

I simply put a placeholder image in the document and put the Smarty code in the "alternative text" box. I haven't figured out how to do the same for tables yet. I don't like JOOReport's convoluted way of doing that. Ideally I'd create a table in OOo with a header row and a data row, style it whatever way I want and then add the Smarty code somewhere. But there's no table equivalent to an image's "alternative text" to put the code in.
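
To make the placeholder idea concrete, here is a small sketch of how a tag of that shape could be picked apart once it has been read from an image's alternative text. This is not Smarty's own parser; the regular expressions and the `parse_tag` name are assumptions for illustration, and they only handle double-quoted attribute values.

```python
import re

# {name attr="value" attr="value" ...} with double-quoted values only (assumption).
TAG_RE = re.compile(r'\{(\w+)((?:\s+\w+="[^"]*")*)\s*\}')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_tag(text):
    """Return (tag_name, attributes) for a Smarty-style tag, or None if it is plain text."""
    match = TAG_RE.fullmatch(text.strip())
    if match is None:
        return None
    name, attr_text = match.groups()
    return name, dict(ATTR_RE.findall(attr_text))
```

Alternative text that is not a tag simply yields `None`, so ordinary images pass through untouched.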

Anyway, more fiddling with that tomorrow. I'm happier than I was yesterday. If this pans out then it might become a nice vehicle to peddle OOo in my company because I will of course be keeping the report archives in ODF and not DOC.
dcparris

May 07, 2007
2:18 PM EDT
Sorry. I just can't help with nested tables. Someone on the OOo users list may be able to offer advice on that.
Sander_Marechal

May 07, 2007
2:40 PM EDT
I searched. It simply can't be done.... yet. OOo only recently gained nested tables itself and can now sort-of import them from Word, but exporting them is still an issue as of 2.2.
dcparris

May 07, 2007
2:54 PM EDT
Oh well.
robntina

May 07, 2007
7:53 PM EDT
Here are my thoughts.

Mr (Boss), Obviously how your business is presented is very important. Being efficient, productive, and profitable are also important.

You want every aspect of your business to function predictably, efficiently, and consistently. This should include any documents that are produced by your company. Formatting (the design and the presentation) should be professionally done and can be automated, allowing the business to provide a consistent experience to your clients, employees, and suppliers; everyone with whom your business communicates, both inside the company and out. The results need to be predictable, consistent, and reproducible by anybody who writes the documents without them having to think about it much.

You should only have to pay people to write content. Our experience with using Word to write our documents is that in practice, each person often spends the majority of their time formatting each document they write to look how they want it to look instead of being focused on the content. This not only affects productivity, it also produces an inconsistent result as each writer has their own idea of what each document should look like. Because of these observations, and considering what we are trying to accomplish, Word may be our least effective option even if already paid for.

I believe you have an opportunity to eliminate or at least greatly reduce this loss of productivity, and at the same time produce a more consistent, professional result. The highest cost you pay is not the price of the programs or even the training to use them; the highest cost is the time you pay for the guy using it. Knowing this, using a program like LyX is a good choice for us because by design the content and the presentation of that content are separated. The writers focus on content; everything else, including the presentation and formatting, is automated, resulting in a higher quality, more consistent, more professional, and more profitable result.
devnet

May 07, 2007
9:24 PM EDT
make sure you let them know that it will cost them to get it done through MS Office...they love hearing that.

Then let them know that it won't cost them anything to do it the way you want to do it. Money speaks to them more than anything else in the world.
Sander_Marechal

May 08, 2007
2:32 PM EDT
devnet, they already know that. Why do you think they gave me a Debian Etch server instead of a Windows server? Money.

Anyway, I did send my boss on a nice wild goose chase. After I showed the initial OOo->Word quality (table-in-table problems, etcetera) he blamed OOo. I told him to blame MS because they keep the specs closed. He vehemently disagreed and said that the specs were open. I said "Okay. Get them to me and I'll implement 100% file fidelity in OOo". I think he has stopped searching now :-)

Nevertheless, the OOo route is actually looking quite promising. Some things are definitely easier to do this way as opposed to the LaTeX/LyX route (report layout for example). Others are harder (template parsing through the XML DOM). Plus I found a way to keep MS Office out of it (go ODF all the way and let the users sort out conversion themselves through a separate conversion service). Might be a good vehicle to get OOo on the company's desktops.
dcparris

May 08, 2007
2:46 PM EDT
It just might. Keep us posted!
devnet

May 08, 2007
3:06 PM EDT
Sander,

Even when they know, they don't know. You have to keep reminding them. :)

Always good to remind them as much as possible.
henke54

May 12, 2007
3:25 AM EDT
Quoting: However, Microsoft criticised ODF for not having backwards compatibility with Microsoft products. "ODF is not coming from the standpoint of preserving previous documents, which is a bit like saying there's no room for PDF if you have .doc," said Strange.

But the ODF Alliance's Marcich said that OXML itself is not compatible with some Microsoft products. "Even for Microsoft, there's no such thing as 100 percent backwards compatibility — it's not guaranteed in Office 2007," said Marcich. "So to say backwards compatibility is a weakness of ODF is somewhat odd."
http://news.zdnet.co.uk/software/0,1000000121,39287024,00.ht...

;-P
Sander_Marechal

May 12, 2007
4:11 AM EDT
henke: Nice article. Post it to the queue already! We've said so multiple times. Else I will start posting them and take credit for it ;-)

Anyway, to give you guys an update: The project looks quite promising. My original attempt to create Smarty templates out of ODF files failed due to a nasty circular loop. In order to parse the Smarty tags I needed to load the ODF file into an XML DOM. But Smarty doesn't store its files as text or XML but as executable PHP that produces the XML when run. Of course running the PHP means it's parsing the tags. **error: infinite loop detected**.
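
As background for the "load the ODF file into an XML DOM" step: an ODF document is a ZIP archive whose body lives in a member called content.xml. A minimal Python sketch of that step follows; the function name `load_content_dom` is illustrative, and this is not the PHP code used in the project.

```python
import zipfile
from xml.dom import minidom


def load_content_dom(odf_file):
    """Open an ODF package (path or file-like object) and parse content.xml into a DOM."""
    with zipfile.ZipFile(odf_file) as package:
        return minidom.parseString(package.read("content.xml"))
```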

I threw out Smarty and went with XSLT. I found a way to create XSLT documents out of ODF documents from within OOo by specifying commands in a way very similar to JODReports but more generic and extensible (see http://www.artofsolving.com/opensource/jodreports). So if I now open up an ODF file and look at content.xml I get something like:

<text:p>some text</text:p>
<table:table>
    <xslt:for-each select="/table/row">
        <table:table-row>
            <table:table-cell>
                <text:p><xslt:value-of select="col1"/></text:p>
            </table:table-cell>
            <table:table-cell>
                <text:p><xslt:value-of select="col2"/></text:p>
            </table:table-cell>
        </table:table-row>
    </xslt:for-each>
</table:table>

And the best thing is that I can still edit the template with OOo! Just combine that with an XSLT processor and an XML containing data and you're done! It's pretty sweet and works well so far. When I'm done I hope to convince my boss to release it under an open source license, just like my previous boss let me. It's all pretty generic stuff, nothing specific to our branch of business that a competitor could directly profit from.
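
To illustrate what such a template does when it meets a data file, here is a toy expansion in Python. It is deliberately not a real XSLT processor: it understands only `for-each` and `value-of`, uses ElementTree's limited path support (so the `select` is the relative path `row` rather than an absolute one), drops whitespace after elements, and uses plain `table`/`tr`/`td` tags instead of the ODF ones. The names `expand`, `render`, `TEMPLATE` and `DATA` are made up for this sketch; in practice a proper XSLT processor would do this work.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"


def expand(template, context):
    """Expand one template element against `context`; returns a list of elements."""
    if template.tag == XSL + "for-each":
        produced = []
        # Repeat the body of the for-each once per selected data node.
        for item in context.findall(template.get("select")):
            for child in template:
                produced.extend(expand(child, item))
        return produced
    copy = ET.Element(template.tag, dict(template.attrib))
    copy.text = template.text
    for child in template:
        if child.tag == XSL + "value-of":
            # Simplification: the selected value becomes text of the parent element.
            copy.text = (copy.text or "") + (context.findtext(child.get("select")) or "")
        else:
            copy.extend(expand(child, context))
    return [copy]


TEMPLATE = """<table xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:for-each select="row">
    <tr>
      <td><xsl:value-of select="col1"/></td>
      <td><xsl:value-of select="col2"/></td>
    </tr>
  </xsl:for-each>
</table>"""

DATA = """<table>
  <row><col1>apples</col1><col2>12</col2></row>
  <row><col1>pears</col1><col2>7</col2></row>
</table>"""


def render(template_xml, data_xml):
    """Apply the toy expansion and return the resulting root element."""
    (result,) = expand(ET.fromstring(template_xml), ET.fromstring(data_xml))
    return result
```

The styled row from the template is repeated once per data row, which is the whole trick: the layout stays editable in the office suite while the data comes from elsewhere.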
henke54

May 12, 2007
6:55 AM EDT
>Else I will start posting them and take credit for it ;-)

no problemo... i even would like that ;-)
Sander_Marechal

May 14, 2007
12:52 PM EDT
I have a request for you knowledgeable folk. I plan to draw the boundaries of my application at the processing level. When there's an ODF report, my app is done and there will be an archive full of ODFs. I want to set up a separate conversion service so the users can do all the converting they need. Something like right-clicking on an ODT or DOC in Windows and getting the options "convert to DOC", "convert to PDF", etcetera. Do any of you know of an application that does this? Or a conversion server/service for it?

If I can't find any I'll wrap up headless OOo with JODConverter inside an XML-RPC API and see how rusty my Win32 API skills have become. An ugly prospect :-/
Geekosaur

May 24, 2007
8:46 AM EDT
I was just reading the thread for a small bit, but something caught my eye. Is there not an application on Sourceforge called Doc2pdf? Now it may have been tried and turfed by you, but it does exist and appears to work through email, which may do it for you.
Sander_Marechal

May 24, 2007
3:14 PM EDT
Geekosaur: Thanks for that. doc2pdf could be one of the many gears that my application needs. It's probably better to do doc->pdf instead of doc->odf->pdf.
