External converters

This document explains the dependencies on external converters that are used to convert binary formats such as Word, Excel, etc. to plain text.

TextIndexNG maintains a registry of external converters. Each converter is wrapped in a Python class and converts a document or object to plain text before the text is passed to the splitter. The converter is selected based on the MIME type and the file extension of the object.
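The registry idea can be illustrated with a minimal sketch. All class and method names here (Converter, WordConverter, ConverterRegistry, lookup) are hypothetical and are not TextIndexNG's actual API. The sketch also assumes that an exact MIME-type match takes precedence over the file extension, and that the external tool (catdoc in this example) writes plain text to stdout.

```python
import os.path
import subprocess


class Converter:
    """Hypothetical base class: wraps one external command-line converter."""

    mime_types = ()  # MIME types this converter handles
    extensions = ()  # lowercase file extensions (with dot) it handles
    command = ()     # argv prefix; the input file path is appended

    def convert(self, path):
        # Run the external tool and return its stdout decoded as text.
        result = subprocess.run(
            list(self.command) + [path],
            capture_output=True,
            check=True,
        )
        return result.stdout.decode("utf-8", errors="replace")


class WordConverter(Converter):
    # Hypothetical example converter that delegates to catdoc.
    mime_types = ("application/msword",)
    extensions = (".doc",)
    command = ("catdoc",)


class ConverterRegistry:
    """Hypothetical registry mapping MIME types and extensions to converters."""

    def __init__(self):
        self._by_mime = {}
        self._by_ext = {}

    def register(self, converter):
        for mime_type in converter.mime_types:
            self._by_mime[mime_type] = converter
        for ext in converter.extensions:
            self._by_ext[ext] = converter

    def lookup(self, mime_type=None, filename=None):
        # Assumption: an exact MIME-type match wins; the extension is a fallback.
        if mime_type in self._by_mime:
            return self._by_mime[mime_type]
        if filename:
            ext = os.path.splitext(filename)[1].lower()
            return self._by_ext.get(ext)
        return None
```

For example, registry.lookup("application/octet-stream", "report.doc") finds no converter for the generic MIME type, falls back to the .doc extension and returns the registered WordConverter instance.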

Supported formats

On Linux, most converters can be installed with your distribution's package manager, e.g. apt-get install catdoc ppthtml.

Important: all converters must be installed in a directory on the executable search path ($PATH on Unix, or your platform's equivalent). They must be callable through Python's os.system() or os.popen() call.
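Before indexing, it may help to verify that the required tools are actually reachable. The following Python 3 sketch uses shutil.which() to look each tool up on $PATH and runs a command through os.popen(), which hands the command to the shell and therefore relies on the same search path. The helper names missing_converters and run_converter are hypothetical, and catdoc and ppthtml are used only as example tool names.

```python
import os
import shutil


def missing_converters(names):
    """Return the converter executables that cannot be found on $PATH."""
    return [name for name in names if shutil.which(name) is None]


def run_converter(command):
    # os.popen() runs the command through the shell and returns a file-like
    # object connected to its stdout, so the tool must be found via $PATH.
    with os.popen(command) as pipe:
        return pipe.read()


if __name__ == "__main__":
    missing = missing_converters(["catdoc", "ppthtml"])
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
```

An empty result from missing_converters() means every listed tool would be found by os.popen() as well.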

If you upload files to Zope, CMF or Plone, you must ensure that the content_type property of each object is set to the correct MIME type, e.g. application/pdf for PDF content. This setting is essential: otherwise TextIndexNG may be unable to determine the type of your file and cannot choose the appropriate converter.
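A minimal sketch of deriving a MIME type from the uploaded file name with Python's standard mimetypes module. The UploadedFile class and the guess_content_type helper are hypothetical stand-ins, not Zope, CMF or Plone APIs; in a real site the content_type property is set through the framework's own mechanisms.

```python
import mimetypes


def guess_content_type(filename, default="application/octet-stream"):
    """Guess a MIME type from the file name, falling back to a generic type."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or default


class UploadedFile:
    """Hypothetical stand-in for a Zope/CMF/Plone file object."""

    def __init__(self, filename, data, content_type=None):
        self.filename = filename
        self.data = data
        # Always set content_type so the indexer can choose a converter.
        self.content_type = content_type or guess_content_type(filename)
```

Falling back to application/octet-stream keeps the property populated while signalling that no specific converter is known for the file.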

ZOPYX Ltd., Charlottenstr. 37/1, D-72070 Tübingen, Germany
Phone +49(0)70 71/79 33 76, Fax +49(0)70 71/7 93 68 40, Email: info@zopyx.com