cardinal_pythonlib.pdf
Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).
This file is part of cardinal_pythonlib.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Support functions to generate (and serve) PDFs.
- class cardinal_pythonlib.pdf.PdfPlan(is_html: bool = False, html: str | None = None, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, is_filename: bool = False, filename: str | None = None)
Class to describe a PDF on disk or the information required to create the PDF from HTML.
- Parameters:
is_html – use HTML mode?
html – for HTML mode, the main HTML
header_html – for HTML mode, an optional page header (in HTML)
footer_html – for HTML mode, an optional page footer (in HTML)
wkhtmltopdf_filename – filename of the wkhtmltopdf executable
wkhtmltopdf_options – options for wkhtmltopdf
is_filename – use file mode?
filename – for file mode, the filename of the existing PDF on disk
Use either is_html or is_filename, not both.
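As a rough illustration of the two modes, the sketch below builds keyword arguments for one HTML-mode plan and one file-mode plan. The helper exactly_one_mode() and the path report.pdf are hypothetical; they only mirror the "either is_html or is_filename, not both" rule, and the real class may validate its arguments differently.

```python
from typing import Any, Dict


def exactly_one_mode(plan_kwargs: Dict[str, Any]) -> bool:
    """Hypothetical check: True if exactly one of is_html / is_filename is set."""
    is_html = bool(plan_kwargs.get("is_html", False))
    is_filename = bool(plan_kwargs.get("is_filename", False))
    return is_html != is_filename


# HTML mode: the PDF would be rendered from this HTML when it is needed.
html_plan_kwargs = dict(
    is_html=True,
    html="<html><body><p>Report body</p></body></html>",
    header_html="<p>Header</p>",
    footer_html="<p>Footer</p>",
)

# File mode: the PDF already exists on disk (this path is made up).
file_plan_kwargs = dict(is_filename=True, filename="report.pdf")

# Setting both flags contradicts the documented rule.
bad_plan_kwargs = dict(
    is_html=True, html="<p>x</p>", is_filename=True, filename="report.pdf"
)

for kwargs in (html_plan_kwargs, file_plan_kwargs, bad_plan_kwargs):
    status = "ok" if exactly_one_mode(kwargs) else "ambiguous"
    print(sorted(kwargs), "->", status)
```

With cardinal_pythonlib installed, either valid dictionary could be passed on as PdfPlan(**html_plan_kwargs) or PdfPlan(**file_plan_kwargs).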
- cardinal_pythonlib.pdf.append_memory_pdf_to_writer(input_pdf: bytes, writer: PdfWriter, start_recto: bool = True) -> None
Appends a PDF (as bytes in memory) to a pypdf writer.
- cardinal_pythonlib.pdf.append_pdf(input_pdf: bytes, output_writer: PdfWriter)
Appends a PDF to a pyPDF writer. Legacy interface.
- cardinal_pythonlib.pdf.assert_processor_available(processor: str) -> None
Assert that a specific PDF processor is available.
- Parameters:
processor – a PDF processor type from Processors
- Raises:
AssertionError – if bad processor
RuntimeError – if requested processor is unavailable
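The two failure modes above can be mimicked with a small stand-in. This is a sketch only: check_processor(), its finder parameter and the KNOWN_PROCESSORS tuple are hypothetical, and of the names listed only 'pdfkit' is confirmed as a processor value by the signatures on this page (the other two are guesses based on the libraries mentioned further down).

```python
import importlib.util
from typing import Callable, Optional

# Assumed processor names; only "pdfkit" is confirmed by the documented defaults.
KNOWN_PROCESSORS = ("pdfkit", "weasyprint", "xhtml2pdf")


def check_processor(
    processor: str,
    finder: Callable[[str], Optional[object]] = importlib.util.find_spec,
) -> None:
    """Hypothetical stand-in for an availability check.

    Raises AssertionError for an unknown name, and RuntimeError if the
    name is known but the corresponding module cannot be found.
    """
    assert processor in KNOWN_PROCESSORS, f"Unknown PDF processor: {processor!r}"
    if finder(processor) is None:
        raise RuntimeError(f"PDF processor {processor!r} is not installed")


# Report which of the assumed processors can be imported in this environment.
for name in KNOWN_PROCESSORS:
    try:
        check_processor(name)
        print(f"{name}: available")
    except RuntimeError as exc:
        print(f"{name}: {exc}")
```

Separating the "bad name" case from the "not installed" case lets callers treat the first as a programming error and the second as a deployment problem.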
- cardinal_pythonlib.pdf.get_concatenated_pdf_from_disk(filenames: Iterable[str], start_recto: bool = True) -> bytes
Concatenates PDFs from disk and returns them as an in-memory binary PDF.
- cardinal_pythonlib.pdf.get_concatenated_pdf_in_memory(pdf_plans: Iterable[PdfPlan], start_recto: bool = True) -> bytes
Concatenates PDFs and returns them as an in-memory binary PDF.
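The start_recto flag is not explained in the text above. Assuming it carries the usual print meaning (each appended document begins on a right-hand, odd-numbered page, padded with a blank page where necessary), the page arithmetic can be sketched as follows. The function plan_blank_pages() is hypothetical and works on page counts only, not on real PDFs.

```python
from typing import List, Sequence, Tuple


def plan_blank_pages(
    page_counts: Sequence[int], start_recto: bool = True
) -> Tuple[List[int], int]:
    """Hypothetical helper: work out blank-page padding for a concatenation.

    Returns (blanks_before, total_pages), where blanks_before[i] is the
    number of blank pages inserted before document i.
    """
    total = 0
    blanks_before: List[int] = []
    for count in page_counts:
        # An odd running total means the next page would be a left-hand
        # (verso) page, so one blank page is needed to start on a recto.
        pad = 1 if (start_recto and total % 2 != 0) else 0
        blanks_before.append(pad)
        total += pad + count
    return blanks_before, total


print(plan_blank_pages([3, 2, 1]))                     # ([0, 1, 0], 7)
print(plan_blank_pages([3, 2, 1], start_recto=False))  # ([0, 0, 0], 6)
```

Under this reading, only documents that follow an odd running page total receive padding, so a run of even-length documents is concatenated without any blanks.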
- cardinal_pythonlib.pdf.get_default_fix_pdfkit_encoding_bug() -> bool
Should we be trying to fix a pdfkit encoding bug, by default?
- Returns:
should we? Yes if we have the specific buggy version of pdfkit.
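A version check of this kind might look like the sketch below. It is an assumption about the approach, not the library's code: BUGGY_VERSIONS is seeded with 0.5.0 only because that version is given as an example elsewhere on this page, and both function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption: 0.5.0 is treated as "the" buggy version for illustration.
BUGGY_VERSIONS = {"0.5.0"}


def installed_pdfkit_version() -> Optional[str]:
    """Return the installed pdfkit version string, or None if absent."""
    try:
        return version("pdfkit")
    except PackageNotFoundError:
        return None


def default_fix_needed(installed: Optional[str]) -> bool:
    """Hypothetical rule: apply the workaround only for known-buggy versions."""
    return installed in BUGGY_VERSIONS


current = installed_pdfkit_version()
print(f"pdfkit version: {current!r}, apply workaround: {default_fix_needed(current)}")
```

Keeping the version lookup separate from the decision makes the rule easy to test without installing any particular pdfkit release.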
- cardinal_pythonlib.pdf.get_pdf_from_html(html: str, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, debug_wkhtmltopdf_args: bool = True, fix_pdfkit_encoding_bug: bool | None = None, processor: str = 'pdfkit') -> bytes
Takes HTML and returns a PDF.
See the arguments to make_pdf_from_html() (except on_disk).
- Returns:
the PDF binary as a bytes object
- cardinal_pythonlib.pdf.make_pdf_from_html(on_disk: bool, html: str, output_path: str | None = None, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, debug_wkhtmltopdf_args: bool = True, fix_pdfkit_encoding_bug: bool | None = None, processor: str = 'pdfkit') -> bytes | bool
Takes HTML and either returns a PDF in memory or makes one on disk.
For preference, uses wkhtmltopdf (with pdfkit): it is faster than xhtml2pdf, and its tables are not buggy like those of Weasyprint. However, it doesn’t support CSS Paged Media, so we have the header_html and footer_html options to allow you to pass appropriate HTML content to serve as the header/footer (rather than passing it within the main HTML).
- Parameters:
on_disk – make file on disk (rather than returning it in memory)?
html – main HTML
output_path – if on_disk, the output filename
header_html – optional page header, as HTML
footer_html – optional page footer, as HTML
wkhtmltopdf_filename – filename of the wkhtmltopdf executable
wkhtmltopdf_options – options for wkhtmltopdf
file_encoding – encoding to use when writing the header/footer to disk
debug_options – log wkhtmltopdf config/options passed to pdfkit?
debug_content – log the main/header/footer HTML?
debug_wkhtmltopdf_args – log the final command-line arguments that will be used by pdfkit when it calls wkhtmltopdf?
fix_pdfkit_encoding_bug – attempt to work around bug in e.g. pdfkit==0.5.0 by encoding wkhtmltopdf_filename to UTF-8 before passing it to pdfkit? If you pass None here, then a default value is used, from get_default_fix_pdfkit_encoding_bug().
processor – a PDF processor type from Processors
- Returns:
the PDF binary as a bytes object; when on_disk is set, the bytes | bool return annotation and make_pdf_on_disk_from_html() suggest that a success flag is returned instead
- Raises:
AssertionError – if bad processor
RuntimeError – if requested processor is unavailable
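A call might be assembled as in the sketch below. The helpers build_pdf_kwargs() and render() are hypothetical; the option names page-size and margin-top are genuine wkhtmltopdf options, but passing them as a pdfkit-style dictionary through wkhtmltopdf_options is an assumption about how this library forwards them.

```python
import shutil
from typing import Any, Dict, Optional, Union


def build_pdf_kwargs(
    html: str,
    output_path: Optional[str] = None,
    header_html: Optional[str] = None,
    footer_html: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for make_pdf_from_html()."""
    return {
        # Write to disk only when an output path was supplied.
        "on_disk": output_path is not None,
        "html": html,
        "output_path": output_path,
        "header_html": header_html,
        "footer_html": footer_html,
        # None (the documented default) if the executable is not on PATH.
        "wkhtmltopdf_filename": shutil.which("wkhtmltopdf"),
        # Assumed to be forwarded to pdfkit as an options dictionary.
        "wkhtmltopdf_options": {"page-size": "A4", "margin-top": "20mm"},
    }


def render(html: str, output_path: Optional[str] = None) -> Union[bytes, bool, None]:
    """Hypothetical wrapper: returns None if the library or executable is missing."""
    try:
        from cardinal_pythonlib.pdf import make_pdf_from_html
    except ImportError:
        return None
    if shutil.which("wkhtmltopdf") is None:
        return None
    return make_pdf_from_html(**build_pdf_kwargs(html, output_path))


print(build_pdf_kwargs("<p>Hello</p>", output_path="hello.pdf")["on_disk"])  # True
```

Calling render("<p>Hello</p>") would request the PDF in memory, while render("<p>Hello</p>", "hello.pdf") would request a file on disk; in both cases the header and footer HTML travel as separate arguments rather than inside the main HTML.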
- cardinal_pythonlib.pdf.make_pdf_on_disk_from_html(html: str, output_path: str, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, debug_wkhtmltopdf_args: bool = True, fix_pdfkit_encoding_bug: bool | None = None, processor: str = 'pdfkit') -> bool
Takes HTML and writes a PDF to the file specified by output_path.
See the arguments to make_pdf_from_html() (except on_disk).
- Returns:
success?
- cardinal_pythonlib.pdf.pdf_from_html(html: str, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, fix_pdfkit_encoding_bug: bool = True, processor: str = 'pdfkit') -> bytes
Older function name for get_pdf_from_html() (q.v.).
- cardinal_pythonlib.pdf.pdf_from_writer(writer: PdfWriter) -> bytes
Extracts a PDF (as binary data) from a pypdf writer object.
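For orientation, getting bytes out of a pypdf writer generally means writing it to an in-memory buffer. The sketch below shows that pattern with pypdf directly; it is presumably close to what pdf_from_writer() does, but that is an assumption, and blank_pdf_bytes() is a hypothetical demo function.

```python
import io
from typing import Optional


def blank_pdf_bytes(num_pages: int = 2) -> Optional[bytes]:
    """Hypothetical demo: build blank pages and serialise the writer to bytes.

    Returns None if pypdf is not installed.
    """
    try:
        from pypdf import PdfWriter
    except ImportError:
        return None
    writer = PdfWriter()
    for _ in range(num_pages):
        # 595 x 842 points is approximately A4.
        writer.add_blank_page(width=595, height=842)
    buffer = io.BytesIO()
    writer.write(buffer)  # serialise the whole document into the buffer
    return buffer.getvalue()


data = blank_pdf_bytes()
print("pypdf not installed" if data is None else f"{len(data)} bytes of PDF")
```

The resulting bytes object can be written to a file or returned from a web view without touching the disk.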