API
This part of the documentation covers all the interfaces of the Docx Utils Library.
Exceptions
Exception hierarchy for the docx-utils package.
Docx to flat XML converter
This converter is inspired by Eric White’s article “Transforming Open XML Documents to Flat OPC Format”.
That article describes how to convert an Open XML (OPC) document into a Flat OPC document and presents the C# function OpcToFlat.
The function opc_to_flat_opc() converts an Open XML document (.docx, .xlsx, .pptx) into the flat OPC format (.xml).
class docx_utils.flatten.ContentTypes
    Content types contained in a “[Content_Types].xml” file.

    NS = {'ct': u'http://schemas.openxmlformats.org/package/2006/content-types'}
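To illustrate what the NS mapping is for, here is a minimal sketch that resolves a part's content type from a “[Content_Types].xml” document using only the standard library. The helper resolve_content_type and the sample XML are hypothetical and written for this example; they are not part of docx_utils, and the ContentTypes class may resolve URIs differently.

```python
import xml.etree.ElementTree as ET

# Same prefix-to-namespace mapping as the NS attribute documented above.
NS = {'ct': 'http://schemas.openxmlformats.org/package/2006/content-types'}

# Small hand-written "[Content_Types].xml" used only for illustration.
CONTENT_TYPES_XML = (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels"'
    ' ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml"'
    ' ContentType="application/vnd.openxmlformats-officedocument'
    '.wordprocessingml.document.main+xml"/>'
    '</Types>'
)


def resolve_content_type(root, part_name):
    """Hypothetical helper: return the content type of part_name, or None."""
    # An Override entry names one exact part and takes precedence.
    for node in root.findall('ct:Override', NS):
        if node.get('PartName') == part_name:
            return node.get('ContentType')
    # Otherwise fall back to the Default entry for the file extension.
    extension = part_name.rpartition('.')[2].lower()
    for node in root.findall('ct:Default', NS):
        if node.get('Extension', '').lower() == extension:
            return node.get('ContentType')
    # The part URI cannot be resolved.
    return None


root = ET.fromstring(CONTENT_TYPES_XML)
document_type = resolve_content_type(root, '/word/document.xml')
rels_type = resolve_content_type(root, '/_rels/.rels')
image_type = resolve_content_type(root, '/word/media/image1.png')
```

In this sample the image part has neither an Override nor a matching Default entry, so it is the kind of unresolved part URI that the on_error parameter below is concerned with.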
class docx_utils.flatten.PackagePart(uri, content_type, data)
    content_type
        Alias for field number 1
    data
        Alias for field number 2
    uri
        Alias for field number 0
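The “Alias for field number N” entries are what Python generates for named tuple fields, so a part can probably be read by name, by index, or by unpacking. The sketch below uses a stand-in tuple with the same field order rather than the real class; the sample values, including the use of bytes for data, are assumptions.

```python
from collections import namedtuple

# Stand-in with the documented field order (assumption: the real
# PackagePart behaves like a plain named tuple).
PackagePart = namedtuple('PackagePart', ['uri', 'content_type', 'data'])

# Sample values are invented for illustration.
part = PackagePart(
    uri='/word/document.xml',
    content_type='application/xml',
    data=b'<document/>',
)

# Fields are available by name, by index, or by unpacking.
uri, content_type, data = part
```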
docx_utils.flatten.iter_package(opc_path, on_error='ignore')
    Iterate over an Open XML document and yield its package parts.

    Parameters:
        - opc_path (str) – Microsoft Office document to read (.docx, .xlsx, .pptx)
        - on_error (str) – controls how errors are handled when a part URI cannot be resolved:
            - 'ignore': ignore the part,
            - 'strict': raise an exception.
    Returns: Iterator which yields package parts
    Raises: UnknownContentTypeError – if a part URI cannot be resolved.
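As a usage sketch, the parts yielded by iter_package() could be summarised by content type. The helper names below are hypothetical; only iter_package() and its parameters come from the documentation above. The import is deferred so that the counting helper can be tried with stand-in tuples when docx_utils is not installed.

```python
from collections import Counter, namedtuple


def count_content_types(parts):
    """Count parts per content type; parts need a content_type attribute."""
    return Counter(part.content_type for part in parts)


def count_document_content_types(opc_path):
    """Hypothetical wrapper: summarise the parts of a real document."""
    # Imported here so the rest of the sketch runs without docx_utils.
    from docx_utils.flatten import iter_package
    # 'strict' raises instead of silently skipping unresolved parts.
    return count_content_types(iter_package(opc_path, on_error='strict'))


# Stand-in parts, invented for illustration only.
Part = namedtuple('Part', ['uri', 'content_type', 'data'])
sample_parts = [
    Part('/word/document.xml', 'application/xml', b''),
    Part('/word/styles.xml', 'application/xml', b''),
    Part('/word/media/image1.png', 'image/png', b''),
]
sample_counts = count_content_types(sample_parts)
```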
docx_utils.flatten.opc_to_flat_opc(src_path, dst_path, on_error='ignore')
    Convert an Open XML document into the flat OPC format.

    Parameters:
        - src_path (str) – Microsoft Office document to convert (.docx, .xlsx, .pptx)
        - dst_path (str) – Microsoft Office document converted into flat OPC format (.xml)
        - on_error (str) – controls how errors are handled when a part URI cannot be resolved:
            - 'ignore': ignore the part,
            - 'strict': raise an exception.
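A typical call might look like the sketch below, which writes the flat OPC file next to the source document. The helpers flat_opc_path and convert_to_flat_opc are hypothetical conveniences written for this example; only opc_to_flat_opc() and its three parameters come from the documentation above.

```python
import os


def flat_opc_path(src_path):
    """Hypothetical helper: replace the extension of src_path with '.xml'."""
    base, _extension = os.path.splitext(src_path)
    return base + '.xml'


def convert_to_flat_opc(src_path, on_error='ignore'):
    """Hypothetical wrapper: convert src_path and return the output path."""
    # Imported here so flat_opc_path can be used without docx_utils.
    from docx_utils.flatten import opc_to_flat_opc
    dst_path = flat_opc_path(src_path)
    opc_to_flat_opc(src_path, dst_path, on_error=on_error)
    return dst_path


# Path derivation only; no document is read or written here.
example_dst = flat_opc_path('sample.docx')
```

Passing on_error='strict' to the wrapper makes an unresolved part URI raise an exception rather than being skipped, as described in the parameter list.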