Good Word doc -> plain text conversion
Jerry Natowitz
j.natowitz-KealBaEQdz4 at public.gmane.org
Sun Sep 19 13:14:23 EDT 2010

I've never used it, but there is Text::Extract::Word on CPAN.
	Jerry Natowitz
	j.natowitz (at) rcn.com
David Kramer wrote:
> On 09/19/2010 03:38 PM, jc-8FIgwK2HfyJMuWfdjsoA/w at public.gmane.org wrote:
>> Anyone here have advice on programs (scriptable and usable
>> on linux) that convert Word docs to plain text?
>>
>> I've been googling, of course, but most of the things I'm
>> finding start with "1. Load the file into Word". This is a
>> good clue that the scheme probably can't be used in a
>> script that's running on a linux system.  ;-)
> 
> If you want an automated solution, how about writing it in Java?
> 
> http://poi.apache.org/
> The Apache POI Project's mission is to create and maintain Java APIs for
> manipulating various file formats based upon the Office Open XML
> standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2).
> In short, you can read and write MS Excel files using Java. In addition,
> you can read and write MS Word and MS PowerPoint files using Java.
> Apache POI is your Java Excel solution (for Excel 97-2008). We have a
> complete API for porting other OOXML and OLE2 formats and welcome others
> to participate.
> 
> OLE2 files include most Microsoft Office files such as XLS, DOC, and PPT
> as well as MFC serialization API based file formats. The project
> provides APIs for the OLE2 Filesystem (POIFS) and OLE2 Document
> Properties (HPSF).
> 
> Here are some other solutions:
> http://www.linux.com/archive/feed/52385
> 
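For the OOXML (.docx) case mentioned in the POI blurb, a script can get
surprisingly far without any library, because a .docx file is just a zip
archive whose body text lives in word/document.xml. What follows is only a
rough sketch using the JDK alone; it does not use POI or Text::Extract::Word,
and the class name DocxToText is made up for illustration. It assumes the
usual (transitional) WordprocessingML namespace, ignores headers, footers,
footnotes, and tables layout, and does nothing for the old binary .doc
format, which still needs a real library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: dumps the body text of a .docx file as plain text.
public class DocxToText {
    // Namespace of the w: elements in transitional OOXML documents (assumed).
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, one paragraph per line. */
    public static String extractText(InputStream docx)
            throws IOException, XMLStreamException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return readBody(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found; not a .docx file?");
    }

    private static String readBody(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTDs are not needed for OOXML, so switch them off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        StringBuilder out = new StringBuilder();
        boolean inText = false; // true while inside a <w:t> element
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && W_NS.equals(reader.getNamespaceURI())) {
                String name = reader.getLocalName();
                if (name.equals("t")) {
                    inText = true;
                } else if (name.equals("tab")) {
                    out.append('\t');
                } else if (name.equals("br")) {
                    out.append('\n');
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && W_NS.equals(reader.getNamespaceURI())) {
                String name = reader.getLocalName();
                if (name.equals("t")) {
                    inText = false;
                } else if (name.equals("p")) {
                    out.append('\n'); // end of paragraph
                }
            } else if (inText && (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA)) {
                out.append(reader.getText());
            }
        }
        return out.toString();
    }

    // Usage from a script: java DocxToText report.docx > report.txt
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extractText(in));
        }
    }
}
```

Since it reads from a stream and writes to stdout, it should drop into a
shell pipeline easily enough. For anything beyond simple documents, a proper
library is probably the safer bet.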