External converters

This document explains the dependencies on external converters, which are used to convert binary formats such as Word and Excel to plain text.

TextIndexNG supports a registry of external converters, each wrapped in a Python class, that convert a document or an object to text before it is passed to the splitter. The converter is selected based on the MIME type and the extension of the object.
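The selection by MIME type and extension can be pictured with a minimal sketch. This is not TextIndexNG's actual registry API: the names BaseConverter, PlainTextConverter, ConverterRegistry, register and lookup are hypothetical and only illustrate the idea of a lookup keyed on MIME type first, with the file extension as a fallback.

```python
class BaseConverter:
    """Hypothetical converter interface: turn raw document bytes into text."""

    mime_types = ()   # MIME types this converter handles
    extensions = ()   # file extensions (without the dot) it handles

    def convert(self, data):
        raise NotImplementedError


class PlainTextConverter(BaseConverter):
    """Trivial converter used here only to make the sketch runnable."""

    mime_types = ("text/plain",)
    extensions = ("txt",)

    def convert(self, data):
        return data.decode("utf-8", errors="replace")


class ConverterRegistry:
    """Hypothetical registry: MIME type is checked first, then the extension."""

    def __init__(self):
        self._by_mime = {}
        self._by_ext = {}

    def register(self, converter):
        for mime_type in converter.mime_types:
            self._by_mime[mime_type.lower()] = converter
        for extension in converter.extensions:
            self._by_ext[extension.lower()] = converter

    def lookup(self, mime_type=None, filename=None):
        # Prefer the declared MIME type of the object.
        if mime_type and mime_type.lower() in self._by_mime:
            return self._by_mime[mime_type.lower()]
        # Fall back to the file extension, if there is one.
        if filename and "." in filename:
            extension = filename.rsplit(".", 1)[1].lower()
            return self._by_ext.get(extension)
        return None


registry = ConverterRegistry()
registry.register(PlainTextConverter())
```

A caller would ask the registry for a converter and, if one is found, pass the resulting text on to the splitter; an object with an unknown type yields None and would be skipped or indexed as-is.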

Supported formats

On Linux, most converters can be installed with the distribution's package manager, e.g. apt-get install catdoc ppthtml.

All converters must be on the executable search path ($PATH) and must be callable from Python through os.system() or os.popen().
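A quick way to verify the search path requirement is to ask Python which executables it can locate. The following is a minimal sketch using the standard library's shutil.which(); the helper name find_missing_converters is hypothetical, and the list of executables (catdoc and ppthtml, taken from the example above) should be adapted to the converters you actually need.

```python
import shutil


def find_missing_converters(executables):
    """Return the names that cannot be found on the executable search path."""
    return [name for name in executables if shutil.which(name) is None]


if __name__ == "__main__":
    # Example list only; extend it with the converters your setup requires.
    missing = find_missing_converters(["catdoc", "ppthtml"])
    if missing:
        print("Not found on $PATH: " + ", ".join(missing))
    else:
        print("All converters found.")
```

Run the check as the same user and with the same environment as the Zope process, since $PATH may differ between an interactive shell and a service.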

If you upload files to Zope, CMF or Plone, you must ensure that the content_type property of the object is set to the correct MIME type, e.g. application/pdf if your content is PDF. This setting is essential: without it, TextIndexNG may not be able to determine the type of your file and cannot choose the required converter.
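When the uploading client does not send a reliable type, one option is to derive the MIME type from the filename before the object is stored. The sketch below uses the standard library's mimetypes module; the helper name guess_content_type and the fallback value are assumptions, and how the result is assigned to the object's content_type depends on your content type and upload code.

```python
import mimetypes


def guess_content_type(filename, default="application/octet-stream"):
    """Guess a MIME type from the filename, falling back to a generic default."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or default
```

For a file named report.pdf this yields application/pdf, which is the value the content_type property should carry for PDF content.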
