cardinal_pythonlib.pdf


Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).

This file is part of cardinal_pythonlib.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Support functions to generate (and serve) PDFs.

class cardinal_pythonlib.pdf.PdfPlan(is_html: bool = False, html: str | None = None, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, is_filename: bool = False, filename: str | None = None)

Class to describe a PDF on disk or the information required to create the PDF from HTML.

Parameters:
  • is_html – use HTML mode?

  • html – for HTML mode, the main HTML

  • header_html – for HTML mode, an optional page header (in HTML)

  • footer_html – for HTML mode, an optional page footer (in HTML)

  • wkhtmltopdf_filename – filename of the wkhtmltopdf executable

  • wkhtmltopdf_options – options for wkhtmltopdf

  • is_filename – use file mode?

  • filename – for file mode, the filename of the existing PDF on disk

Use either is_html or is_filename, not both.

add_to_writer(writer: PdfWriter, start_recto: bool = True) → None

Add the PDF described by this class to a PDF writer.

Parameters:
  • writer – a pypdf.PdfWriter

  • start_recto – start a new right-hand page?

class cardinal_pythonlib.pdf.Processors

Class to enumerate possible PDF processors.

cardinal_pythonlib.pdf.append_memory_pdf_to_writer(input_pdf: bytes, writer: PdfWriter, start_recto: bool = True) → None

Appends a PDF (as bytes in memory) to a pypdf writer.

Parameters:
  • input_pdf – the PDF, as bytes

  • writer – the writer

  • start_recto – start a new right-hand page?
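
The effect of start_recto can be pictured as simple page-count arithmetic. The sketch below is an assumption about the logic, not the library's code: it treats page 1 as a right-hand (recto) page, so a newly appended document starts on a recto page only if the writer currently holds an even number of pages. The function name blank_pages_needed is hypothetical.

```python
def blank_pages_needed(pages_so_far: int, start_recto: bool = True) -> int:
    """
    Return how many blank pages to insert before appending a new PDF.

    Assumes page 1 is recto (right-hand), so odd-numbered pages are recto.
    """
    if pages_so_far < 0:
        raise ValueError("pages_so_far must be non-negative")
    # An odd page count means the next page would be even-numbered (verso),
    # so one blank page is needed to push the new document onto a recto page.
    if start_recto and pages_so_far % 2 != 0:
        return 1
    return 0


# A 3-page document followed by a 2-page document, both starting recto:
total = 0
for n_pages in (3, 2):
    total += blank_pages_needed(total) + n_pages
print(total)  # prints 6: 3 pages, 1 blank page, 2 pages
```

With start_recto=False the same two documents would occupy five pages, with the second document beginning on a left-hand page.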

cardinal_pythonlib.pdf.append_pdf(input_pdf: bytes, output_writer: PdfWriter)

Appends a PDF to a pypdf writer. Legacy interface.

cardinal_pythonlib.pdf.assert_processor_available(processor: str) → None

Assert that a specific PDF processor is available.

Parameters:

processor – a PDF processor type from Processors

Raises:
  • AssertionError – if bad processor

  • RuntimeError – if requested processor is unavailable
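
One way to use this check is to probe for a usable processor before generating anything. The sketch below is illustrative only: first_available_processor and pick_processor are hypothetical helpers, and the checker is passed in as a parameter so that the fallback logic can be exercised without any PDF library installed. Only the 'pdfkit' name appears in the signatures documented here; any other candidate names would have to come from Processors.

```python
from typing import Callable, Iterable, Optional


def first_available_processor(
    candidates: Iterable[str],
    check: Callable[[str], None],
) -> Optional[str]:
    """
    Return the first candidate for which ``check`` does not raise
    RuntimeError, or None if every candidate is unavailable.

    AssertionError (an unrecognized processor name) is deliberately not
    caught, since it signals a programming error rather than a missing
    dependency.
    """
    for name in candidates:
        try:
            check(name)
        except RuntimeError:
            continue  # recognized processor, but unavailable; try the next
        return name
    return None


def pick_processor() -> Optional[str]:
    """Real-world usage; requires cardinal_pythonlib to be installed."""
    from cardinal_pythonlib.pdf import assert_processor_available

    return first_available_processor(["pdfkit"], assert_processor_available)
```

Letting AssertionError propagate keeps a misspelled processor name from being silently treated as "not installed".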

cardinal_pythonlib.pdf.get_concatenated_pdf_from_disk(filenames: Iterable[str], start_recto: bool = True) → bytes

Concatenates PDFs from disk and returns them as an in-memory binary PDF.

Parameters:
  • filenames – iterable of filenames of PDFs to concatenate

  • start_recto – start a new right-hand page for each new PDF?

Returns:

concatenated PDF, as bytes

cardinal_pythonlib.pdf.get_concatenated_pdf_in_memory(pdf_plans: Iterable[PdfPlan], start_recto: bool = True) → bytes

Concatenates PDFs and returns them as an in-memory binary PDF.

Parameters:
  • pdf_plans – iterable of PdfPlan objects

  • start_recto – start a new right-hand page for each new PDF?

Returns:

concatenated PDF, as bytes
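
A possible way to assemble plans for this function is sketched below. The helpers plan_kwargs and concatenate are hypothetical; plan_kwargs only builds keyword arguments for PdfPlan and enforces the "either is_html or is_filename, not both" rule up front (whether PdfPlan itself validates this is not stated here). The HTML and the file name are placeholders.

```python
from typing import Any, Dict, List, Optional


def plan_kwargs(html: Optional[str] = None,
                filename: Optional[str] = None) -> Dict[str, Any]:
    """Build PdfPlan keyword arguments for exactly one of the two modes."""
    if (html is None) == (filename is None):
        raise ValueError("Specify exactly one of html or filename")
    if html is not None:
        return {"is_html": True, "html": html}
    return {"is_filename": True, "filename": filename}


def concatenate(plan_specs: List[Dict[str, Any]]) -> bytes:
    """Requires cardinal_pythonlib (and a PDF processor for HTML plans)."""
    from cardinal_pythonlib.pdf import PdfPlan, get_concatenated_pdf_in_memory

    plans = [PdfPlan(**spec) for spec in plan_specs]
    return get_concatenated_pdf_in_memory(plans, start_recto=True)


specs = [
    plan_kwargs(html="<html><body><p>Cover page</p></body></html>"),
    plan_kwargs(filename="appendix.pdf"),  # placeholder path
]
# With the library installed and appendix.pdf present:
# pdf_bytes = concatenate(specs)
```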

cardinal_pythonlib.pdf.get_default_fix_pdfkit_encoding_bug() → bool

Should we be trying to fix a pdfkit encoding bug, by default?

Returns:

True if the installed pdfkit is the specific version known to have the bug.

cardinal_pythonlib.pdf.get_pdf_from_html(html: str, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, debug_wkhtmltopdf_args: bool = True, fix_pdfkit_encoding_bug: bool | None = None, processor: str = 'pdfkit') → bytes

Takes HTML and returns a PDF.

See the arguments to make_pdf_from_html() (except on_disk).

Returns:

the PDF binary as a bytes object

cardinal_pythonlib.pdf.make_pdf_from_html(on_disk: bool, html: str, output_path: str | None = None, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, debug_wkhtmltopdf_args: bool = True, fix_pdfkit_encoding_bug: bool | None = None, processor: str = 'pdfkit') → bytes | bool

Takes HTML and either returns a PDF in memory or makes one on disk.

For preference, uses wkhtmltopdf (with pdfkit):

  • faster than xhtml2pdf

  • tables are not buggy, unlike with Weasyprint

  • however, doesn’t support CSS Paged Media, so we have the header_html and footer_html options to allow you to pass appropriate HTML content to serve as the header/footer (rather than passing it within the main HTML).

Parameters:
  • on_disk – make file on disk (rather than returning it in memory)?

  • html – main HTML

  • output_path – if on_disk, the output filename

  • header_html – optional page header, as HTML

  • footer_html – optional page footer, as HTML

  • wkhtmltopdf_filename – filename of the wkhtmltopdf executable

  • wkhtmltopdf_options – options for wkhtmltopdf

  • file_encoding – encoding to use when writing the header/footer to disk

  • debug_options – log wkhtmltopdf config/options passed to pdfkit?

  • debug_content – log the main/header/footer HTML?

  • debug_wkhtmltopdf_args – log the final command-line arguments that will be used by pdfkit when it calls wkhtmltopdf?

  • fix_pdfkit_encoding_bug – attempt to work around bug in e.g. pdfkit==0.5.0 by encoding wkhtmltopdf_filename to UTF-8 before passing it to pdfkit? If you pass None here, then a default value is used, from get_default_fix_pdfkit_encoding_bug().

  • processor – a PDF processor type from Processors

Returns:

the PDF binary as a bytes object (if on_disk is False), or a bool indicating success (if on_disk is True)

Raises:
  • AssertionError – if bad processor

  • RuntimeError – if requested processor is unavailable
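
The separate header and footer arguments might be used roughly as follows. This is a sketch under stated assumptions, not a recommended recipe: report_kwargs and render_report are hypothetical helpers, the HTML content is a placeholder, and the option keys are wkhtmltopdf command-line options written pdfkit-style (without leading dashes), with illustrative values.

```python
from html import escape
from typing import Any, Dict


def report_kwargs(title: str) -> Dict[str, Any]:
    """Keyword arguments for get_pdf_from_html(); content is illustrative."""
    safe_title = escape(title)  # avoid injecting raw markup into the HTML
    return {
        "html": (
            f"<html><body><h1>{safe_title}</h1>"
            "<p>Body text.</p></body></html>"
        ),
        # Header and footer are passed separately because wkhtmltopdf does
        # not support CSS Paged Media.
        "header_html": f"<html><body><small>{safe_title}</small></body></html>",
        "footer_html": "<html><body><small>Confidential</small></body></html>",
        "wkhtmltopdf_options": {
            "page-size": "A4",
            "margin-top": "20mm",
            "margin-bottom": "20mm",
        },
    }


def render_report(title: str) -> bytes:
    """Requires cardinal_pythonlib, pdfkit and a wkhtmltopdf executable."""
    from cardinal_pythonlib.pdf import get_pdf_from_html

    return get_pdf_from_html(**report_kwargs(title))
```

Building the arguments in one place makes it straightforward to pass the same dictionary to make_pdf_on_disk_from_html() with an added output_path when a file on disk is wanted instead.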

cardinal_pythonlib.pdf.make_pdf_on_disk_from_html(html: str, output_path: str, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, debug_wkhtmltopdf_args: bool = True, fix_pdfkit_encoding_bug: bool | None = None, processor: str = 'pdfkit') → bool

Takes HTML and writes a PDF to the file specified by output_path.

See the arguments to make_pdf_from_html() (except on_disk).

Returns:

True if the PDF was created successfully

cardinal_pythonlib.pdf.make_pdf_writer() → PdfWriter

Creates and returns a pypdf writer.

cardinal_pythonlib.pdf.pdf_from_html(html: str, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, fix_pdfkit_encoding_bug: bool = True, processor: str = 'pdfkit') → bytes

Older function name for get_pdf_from_html() (q.v.).

cardinal_pythonlib.pdf.pdf_from_writer(writer: PdfWriter) → bytes

Extracts a PDF (as binary data) from a pypdf writer object.

cardinal_pythonlib.pdf.serve_pdf_to_stdout(pdf: bytes) → None

Serves a PDF to stdout (for web servers).

Writes a Content-Type: application/pdf header and then the PDF to stdout.
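
A minimal sketch of the described output format follows. It is not the library's implementation: write_pdf_response is a hypothetical function that writes to any binary stream so that the result can be inspected, and the exact header capitalization and line endings used by serve_pdf_to_stdout() are assumptions.

```python
import io
from typing import BinaryIO


def write_pdf_response(pdf: bytes, stream: BinaryIO) -> None:
    """Write a CGI-style response: header, blank line, then the PDF body."""
    stream.write(b"Content-Type: application/pdf\r\n\r\n")
    stream.write(pdf)
    stream.flush()


# Demonstration with an in-memory stream and placeholder bytes.
# For a real CGI script, the stream would be sys.stdout.buffer, because
# the PDF body is binary and must not pass through a text encoder.
buffer = io.BytesIO()
write_pdf_response(b"%PDF-1.4 (placeholder bytes)", buffer)
```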

See: