API

This part of the documentation covers all the interfaces of the docx-utils library.

Docx-utils Library

This library allows you to:

  • Convert Open XML documents to the flat OPC format.

Exceptions

Exception hierarchy for the docx-utils package.

exception docx_utils.exceptions.DocxUtilsException

Base exception of the docx-utils package.

exception docx_utils.exceptions.UnknownContentTypeError(opc_path, uri)

Exception raised during Microsoft Office document parsing when the content type of a part can’t be resolved.

fmt = "Cannot parse the Microsoft Office document '{opc_path}': the content-type of the part '{uri}' is unknown"
opc_path
uri
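The hierarchy above means that catching DocxUtilsException also catches UnknownContentTypeError. The following is a minimal sketch, not the library's source, assuming that the error message is produced by formatting fmt with the opc_path and uri attributes; the file and part names in the example are made up.

```python
# Minimal sketch of the documented exception hierarchy (not the library's source).
# Assumption: the message is built by formatting `fmt` with `opc_path` and `uri`.


class DocxUtilsException(Exception):
    """Base exception of the docx-utils package."""


class UnknownContentTypeError(DocxUtilsException):
    """Raised when the content type of a package part cannot be resolved."""

    fmt = (
        "Cannot parse the Microsoft Office document '{opc_path}': "
        "the content-type of the part '{uri}' is unknown"
    )

    def __init__(self, opc_path, uri):
        self.opc_path = opc_path
        self.uri = uri
        super().__init__(self.fmt.format(opc_path=opc_path, uri=uri))


# Catching the base class is enough to handle the more specific error.
try:
    raise UnknownContentTypeError("report.docx", "/word/media/image1.xyz")
except DocxUtilsException as exc:
    message = str(exc)
    print(message)
```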

Command line interface (CLI)

Overview

This module defines the main command line interface (CLI).

Docx to flat XML converter

This converter is inspired by Eric White’s article: Transforming Open XML Documents to Flat OPC Format.

That article describes how to convert an Open XML (OPC) document into a Flat OPC document and presents the C# function OpcToFlat.

The function opc_to_flat_opc() converts an Open XML document (.docx, .xlsx, .pptx) into the flat OPC format (.xml).

class docx_utils.flatten.ContentTypes

Content types declared in a “[Content_Types].xml” file.

NS = {'ct': u'http://schemas.openxmlformats.org/package/2006/content-types'}
parse_xml_data(data)
resolve(part_name)
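In an OPC package, “[Content_Types].xml” holds Default elements (content type by file extension) and Override elements (content type for one exact part name). The sketch below shows one way parse_xml_data() and resolve() could work, assuming that an Override wins over a Default, that names are compared case-insensitively, and that resolve() returns None for an unknown part. ContentTypesSketch is a hypothetical name, not the library's class.

```python
import xml.etree.ElementTree as ET

# Namespace of "[Content_Types].xml", as documented in ContentTypes.NS.
NS = {"ct": "http://schemas.openxmlformats.org/package/2006/content-types"}


class ContentTypesSketch:
    """Hypothetical re-implementation of ContentTypes, for illustration only."""

    def __init__(self):
        self.defaults = {}   # lower-case extension -> content type
        self.overrides = {}  # lower-case part name -> content type

    def parse_xml_data(self, data):
        root = ET.fromstring(data)
        for node in root.findall("ct:Default", NS):
            self.defaults[node.get("Extension").lower()] = node.get("ContentType")
        for node in root.findall("ct:Override", NS):
            self.overrides[node.get("PartName").lower()] = node.get("ContentType")

    def resolve(self, part_name):
        key = part_name.lower()
        # An Override for the exact part name takes precedence.
        if key in self.overrides:
            return self.overrides[key]
        # Otherwise fall back to the Default registered for the extension.
        # (posixpath.splitext is avoided: it returns no extension for ".rels".)
        base = key.rsplit("/", 1)[-1]
        ext = base.rsplit(".", 1)[1] if "." in base else ""
        return self.defaults.get(ext)  # None when the content type is unknown


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

ct = ContentTypesSketch()
ct.parse_xml_data(sample)
print(ct.resolve("/word/document.xml"))  # resolved by the Override
print(ct.resolve("/_rels/.rels"))         # resolved by the "rels" Default
```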
class docx_utils.flatten.PackagePart(uri, content_type, data)
content_type

Alias for field number 1

data

Alias for field number 2

uri

Alias for field number 0
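“Alias for field number N” is the docstring that collections.namedtuple generates for each field, so PackagePart behaves like a three-field tuple. A minimal sketch reproducing the documented field order; the sample values are made up.

```python
from collections import namedtuple

# Same field order as the documented PackagePart(uri, content_type, data).
PackagePart = namedtuple("PackagePart", ["uri", "content_type", "data"])

part = PackagePart(
    uri="/word/document.xml",
    content_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
    data=b"<w:document/>",
)

# Fields are reachable by name or by index (uri is field 0, data is field 2),
# and the tuple unpacks in field order.
uri, content_type, data = part
print(part[0], part.content_type)
```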

docx_utils.flatten.iter_package(opc_path, on_error='ignore')

Iterate over an Open XML document and yield its package parts.

Parameters:
  • opc_path (str) – Microsoft Office document to read (.docx, .xlsx, .pptx)
  • on_error (str) –

    controls how errors are handled when a part URI cannot be resolved:

    • 'ignore': ignore the part,
    • 'strict': raise an exception.
Returns:

Iterator that yields package parts

Raises:

UnknownContentTypeError – if a part URI cannot be resolved.
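An Open XML document is a ZIP archive, so iterating over its parts amounts to walking the ZIP entries and resolving each entry's content type. The sketch below is a hypothetical re-implementation (iter_package_sketch is not the library's function) built only on the standard library; it assumes the same Override-then-Default lookup as the ContentTypes description, defines a local stand-in for UnknownContentTypeError, and, as an extra assumption, rejects on_error values other than 'ignore' and 'strict'.

```python
import xml.etree.ElementTree as ET
import zipfile
from collections import namedtuple

# Field order follows the documented PackagePart(uri, content_type, data).
PackagePart = namedtuple("PackagePart", ["uri", "content_type", "data"])
CT_NS = {"ct": "http://schemas.openxmlformats.org/package/2006/content-types"}


class UnknownContentTypeError(Exception):
    """Local stand-in for docx_utils.exceptions.UnknownContentTypeError."""


def _load_content_types(zf):
    """Return (defaults, overrides) parsed from [Content_Types].xml."""
    root = ET.fromstring(zf.read("[Content_Types].xml"))
    defaults = {node.get("Extension").lower(): node.get("ContentType")
                for node in root.findall("ct:Default", CT_NS)}
    overrides = {node.get("PartName").lower(): node.get("ContentType")
                 for node in root.findall("ct:Override", CT_NS)}
    return defaults, overrides


def iter_package_sketch(opc_path, on_error="ignore"):
    """Hypothetical sketch: yield one PackagePart per resolvable ZIP entry."""
    # Assumption: unknown on_error values are rejected.
    if on_error not in ("ignore", "strict"):
        raise ValueError(f"on_error must be 'ignore' or 'strict', not {on_error!r}")
    with zipfile.ZipFile(opc_path) as zf:
        defaults, overrides = _load_content_types(zf)
        for name in zf.namelist():
            if name == "[Content_Types].xml" or name.endswith("/"):
                continue  # the content-types stream and directories are not parts
            uri = "/" + name
            base = uri.rsplit("/", 1)[-1]
            ext = base.rsplit(".", 1)[1].lower() if "." in base else ""
            # An Override for the exact part name wins over the extension Default.
            content_type = overrides.get(uri.lower()) or defaults.get(ext)
            if content_type is None:
                if on_error == "strict":
                    raise UnknownContentTypeError(opc_path, uri)
                continue  # 'ignore': skip the part
            yield PackagePart(uri, content_type, zf.read(name))
```

Typical use is a plain loop such as `for part in iter_package_sketch("report.docx"): print(part.uri, part.content_type)`. Because it is a generator, the archive is only opened once iteration starts.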

docx_utils.flatten.opc_to_flat_opc(src_path, dst_path, on_error='ignore')

Convert an Open XML document into a flat OPC format.

Parameters:
  • src_path (str) – Microsoft Office document to convert (.docx, .xlsx, .pptx)
  • dst_path (str) – Destination path of the converted document, in flat OPC format (.xml)
  • on_error (str) –

    controls how errors are handled when a part URI cannot be resolved:

    • 'ignore': ignore the part,
    • 'strict': raise an exception.
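In the flat OPC format described in Eric White's article, the whole package becomes one pkg:package element with one pkg:part per part: XML parts are embedded inline in pkg:xmlData, and other parts are base64-encoded in pkg:binaryData. The sketch below (opc_to_flat_opc_sketch is a hypothetical name, not the library's function) follows that layout with the standard library only. It assumes, as in the article, that a part is XML when its content type ends with "xml", and that the mso-application progid values are Word.Document, Excel.Sheet and PowerPoint.Show. Unresolvable parts are skipped, which mirrors on_error='ignore'.

```python
import base64
import os
import xml.etree.ElementTree as ET
import zipfile

PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"
CT_NS = {"ct": "http://schemas.openxmlformats.org/package/2006/content-types"}
# Assumed progid values, following Eric White's article.
PROGIDS = {".docx": "Word.Document", ".xlsx": "Excel.Sheet", ".pptx": "PowerPoint.Show"}

ET.register_namespace("pkg", PKG_NS)


def _pkg(tag):
    """Qualify a tag or attribute name with the pkg namespace."""
    return f"{{{PKG_NS}}}{tag}"


def opc_to_flat_opc_sketch(src_path, dst_path):
    """Hypothetical sketch: pack every resolvable part into one pkg:package."""
    package = ET.Element(_pkg("package"))
    with zipfile.ZipFile(src_path) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
        defaults = {n.get("Extension").lower(): n.get("ContentType")
                    for n in root.findall("ct:Default", CT_NS)}
        overrides = {n.get("PartName").lower(): n.get("ContentType")
                     for n in root.findall("ct:Override", CT_NS)}
        for name in zf.namelist():
            if name == "[Content_Types].xml" or name.endswith("/"):
                continue
            uri = "/" + name
            base = uri.rsplit("/", 1)[-1]
            ext = base.rsplit(".", 1)[1].lower() if "." in base else ""
            content_type = overrides.get(uri.lower()) or defaults.get(ext)
            if content_type is None:
                continue  # behaves like on_error='ignore'
            part = ET.SubElement(package, _pkg("part"),
                                 {_pkg("name"): uri, _pkg("contentType"): content_type})
            data = zf.read(name)
            if content_type.endswith("xml"):
                # XML parts are embedded as child elements of pkg:xmlData.
                ET.SubElement(part, _pkg("xmlData")).append(ET.fromstring(data))
            else:
                # Any other part is stored base64-encoded in pkg:binaryData.
                binary = ET.SubElement(part, _pkg("binaryData"))
                binary.text = base64.b64encode(data).decode("ascii")

    progid = PROGIDS.get(os.path.splitext(src_path)[1].lower())
    with open(dst_path, "w", encoding="utf-8") as fh:
        fh.write('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n')
        if progid:
            fh.write(f'<?mso-application progid="{progid}"?>\n')
        fh.write(ET.tostring(package, encoding="unicode"))
```

One design caveat: xml.etree.ElementTree renames unregistered namespace prefixes (for example to ns0, ns1) when it serializes the embedded parts. The result is equivalent XML, but a converter that must keep the original prefixes would typically use lxml instead.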