adventures in cross-disciplinary collaboration, part 27: typesetting

One challenging thing about building bridges from the computer science world to the sociolinguistics community is that publication methods in sociolinguistics are… traditional. The open-access movement hasn’t made many inroads (language@internet is a great exception; please comment if I’m missing others), and many journals require Microsoft Word format for submissions. I find this pretty surprising in a discipline that involves a considerable amount of formal notation, both linguistic and mathematical.

Anyway, I convinced my co-authors to write the document in LaTeX and convert it right before submission, by promising that I would manage the conversion. And now the chickens have come home to roost. We’ve got a fairly complicated 45-page document, with all the usual stuff: equations, figures, tables, references, etc. I’ve spent the morning tracking down various forum posts about how to get as many of these features as possible to survive the conversion, with mixed success. Here’s what I’ve figured out so far:

latex2rtf is the current winner. It did a good job with citations and got some of the references, but messed up all the math. Make sure to update to version 2.3.3, not the 1.9.19 that ships by default with Ubuntu.

my command line: latex2rtf main

pandoc lost all document-level formatting, citations, and references. But it did a nice job on equations. I may create the main document from latex2rtf and then copy in the equations from pandoc.

my command line: pandoc -f latex -t odt -o main.odt main.tex
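For anyone repeating this, here is a minimal sketch of running the two converters above back to back and flagging any that leave an empty or missing file (the blank-output problem is easy to miss). It assumes main.tex sits in the current directory and that latex2rtf writes main.rtf by default; the helper names check_output and run_if_installed are made up for illustration.

```shell
#!/bin/sh
# Sketch only: run latex2rtf and pandoc on main.tex and flag empty results.
# Assumptions: main.tex is in the current directory; latex2rtf writes main.rtf.
# check_output and run_if_installed are hypothetical helper names.

# Report whether a converter left a non-empty file behind.
# $1 = label for the tool, $2 = file it was expected to write
check_output() {
    if [ -s "$2" ]; then
        echo "$1: ok ($2)"
    else
        echo "$1: empty or missing ($2)"
        return 1
    fi
}

# Run a command only if its tool is installed, so a missing tool is not fatal.
# $1 = tool name; all arguments together form the command line to run
run_if_installed() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "$1: not installed, skipping"
    fi
}

# Only attempt the conversions when there is a document to convert.
if [ -f main.tex ]; then
    run_if_installed latex2rtf main
    check_output latex2rtf main.rtf
    run_if_installed pandoc -f latex -t odt -o main.odt main.tex
    check_output pandoc main.odt
fi
```

The check only tests for a non-empty file; it says nothing about whether the equations or citations inside survived, so the results still need to be opened and eyeballed.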

tex4ht gets a good recommendation here, but for me it generates blank output.

my command line: mk4ht oolatex main.tex

latex2html was advertised here, but I can’t get its output into ODT, and anyway it doesn’t convert any of the equations for me.

my command line: latex2html main.tex -split 0 -no_navigation -info "" -address "" -html_version 4.0,unicode