jmespath vs xhtml2pdf
JMESPath (pronounced “james path”) allows you to declaratively specify how to extract elements from a JSON document.
In web scraping, JMESPath is a powerful tool for parsing and reshaping large JSON datasets. It is fast and easily extensible, and it is built around its own expressive query language.
For more, see the JSON parsing introduction section.
xhtml2pdf is a Python library that allows you to convert HTML and CSS documents to PDF files. It is built on top of ReportLab, a powerful PDF generation library for Python.
xhtml2pdf makes it easy to convert HTML and CSS documents to PDF by using ReportLab's powerful layout engine to handle the rendering of the document.
The library supports HTML5 and CSS 2.1 (plus some CSS 3), including tables, lists, images, and links. Pages styled with CSS frameworks such as Bootstrap or Foundation can be converted, though rules outside the supported CSS subset may not render the way they do in a browser.
To use xhtml2pdf, first install it via pip by running `pip install xhtml2pdf`. Once it is installed, you can use the `xhtml2pdf.pisa.pisaDocument()` function to convert an HTML file to a PDF.
Example Use
import jmespath

data = {
    "data": {
        "info": {
            "products": [
                {"price": {"usd": 1}, "_type": "product", "id": "123"},
                {"price": {"usd": 2}, "_type": "product", "id": "345"}
            ]
        }
    }
}
# easily reshape nested dataset to flat structure:
jmespath.search("data.info.products[*].{id:id, price:price.usd}", data)
# [{'id': '123', 'price': 1}, {'id': '345', 'price': 2}]
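# xhtml2pdf: read an HTML file and write it out as a PDF
from xhtml2pdf import pisa

with open('input.html', 'r') as html_file:
    html = html_file.read()

# the destination file must be opened in binary mode
with open('output.pdf', 'wb') as pdf_file:
    pisa.pisaDocument(html, pdf_file)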
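HTML does not have to come from a file. As a hedged sketch, the snippet below builds a small page in memory and converts it with `pisa.CreatePDF`, checking the `err` attribute of the returned status object. The helpers `build_report_html` and `html_to_pdf` are hypothetical names introduced here for illustration, and the sample rows are made up.

```python
import html


def build_report_html(title, rows):
    """Build a small HTML page with a two-column table (hypothetical helper)."""
    body_rows = "".join(
        f"<tr><td>{html.escape(str(pid))}</td><td>{html.escape(str(price))}</td></tr>"
        for pid, price in rows
    )
    return (
        "<html><head><style>"
        "td, th { padding: 4px; border: 1px solid #333333; }"
        "</style></head><body>"
        f"<h1>{html.escape(title)}</h1>"
        "<table><tr><th>id</th><th>price (USD)</th></tr>"
        f"{body_rows}</table></body></html>"
    )


def html_to_pdf(source_html, dest):
    """Render source_html into the binary file-like object dest.

    Returns True when xhtml2pdf reported no errors.
    """
    # Imported lazily so build_report_html stays usable without xhtml2pdf.
    from xhtml2pdf import pisa

    status = pisa.CreatePDF(source_html, dest=dest)
    # status.err is non-zero when the conversion reported errors.
    return not status.err


page = build_report_html("Products", [("123", 1), ("345", 2)])
```

To write the result to disk, open the target with `open('output.pdf', 'wb')` and pass the file object as `dest`, for example `html_to_pdf(page, pdf_file)`. Checking the return value lets a script fail loudly rather than leave behind a broken PDF.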