jmespath vs nested-lookup
JMESPath (pronounced “james path”) allows you to declaratively specify how to extract elements from a JSON document.
In web scraping, JMESPath is a powerful tool for parsing and reshaping large JSON datasets. It is fast, easily extensible, and built around its own expressive query language.
For more, see the JSON parsing introduction section.
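To show what "declaratively" buys you, here is a hand-written plain-Python equivalent of the JMESPath query `data.info.products[*].{id:id, price:price.usd}` used in the example further down. The function name `reshape_products` is a hypothetical helper for illustration only; it is not part of jmespath.

```python
def reshape_products(data: dict) -> list:
    """Hand-written equivalent of the JMESPath query
    data.info.products[*].{id:id, price:price.usd}"""
    # .get() with defaults avoids a KeyError when part of the path is missing.
    # JMESPath similarly yields None instead of raising, although this sketch
    # returns an empty list in that case.
    products = data.get("data", {}).get("info", {}).get("products", [])
    # Project every product into a new, flat dictionary.
    return [
        {"id": product.get("id"), "price": product.get("price", {}).get("usd")}
        for product in products
    ]


data = {
    "data": {
        "info": {
            "products": [
                {"price": {"usd": 1}, "_type": "product", "id": "123"},
                {"price": {"usd": 2}, "_type": "product", "id": "345"},
            ]
        }
    }
}
print(reshape_products(data))
# [{'id': '123', 'price': 1}, {'id': '345', 'price': 2}]
```

The query-string version keeps the whole path and output shape in one line that can be edited without touching surrounding Python code.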
nested-lookup is a convenient way to parse multi-depth JSON documents, which are often encountered in web scraping. With nested-lookup we can extract deeply nested data fields just by providing the key name.
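The core idea behind key-based lookup is a recursive walk over dictionaries and lists that collects the value of every matching key, wherever it sits. The sketch below illustrates that idea with the standard library only; `find_key` is a hypothetical helper and not nested-lookup's own implementation.

```python
from typing import Any, Iterator


def find_key(key: str, document: Any) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth."""
    if isinstance(document, dict):
        for k, v in document.items():
            if k == key:
                yield v
            # Recurse into the value whether or not the key matched,
            # since the same key can also appear inside it.
            yield from find_key(key, v)
    elif isinstance(document, list):
        for item in document:
            yield from find_key(key, item)


doc = {
    "email_address": "test1@example.com",
    "other": {
        "email_address": "test4@example.com",
        "tags": [{"email_address": "x@example.com"}],
    },
}
print(list(find_key("email_address", doc)))
# ['test1@example.com', 'test4@example.com', 'x@example.com']
```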
The library provides a number of functions for searching and extracting data from nested dictionaries, including:
nested_lookup: searches for a key within a nested dictionary and returns a list of all values associated with it.
nested_update: updates a key-value pair within a nested dictionary.
nested_has: checks if a key exists within a nested dictionary.
nested_values: returns all the values within a nested dictionary, including values within nested dictionaries.
The library is designed to be flexible and can work with dictionaries of any size and structure, making it a useful tool for working with complex and nested data structures.
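The update and key-existence operations listed above can be approximated with plain recursion. This is a stdlib sketch with hypothetical helpers `has_key` and `update_key`; it illustrates the idea only and does not reproduce nested-lookup's actual implementation or function signatures.

```python
import copy
from typing import Any


def has_key(key: str, document: Any) -> bool:
    """Return True if `key` appears in any dict, at any depth."""
    if isinstance(document, dict):
        return key in document or any(has_key(key, v) for v in document.values())
    if isinstance(document, list):
        return any(has_key(key, item) for item in document)
    return False  # scalars cannot contain keys


def _set_in_place(key: str, new_value: Any, node: Any) -> None:
    """Overwrite every `key` found in `node` (mutates `node`)."""
    if isinstance(node, dict):
        for k in node:
            if k == key:
                # A matched value is replaced wholesale, so it is not searched.
                # Reassigning an existing key does not resize the dict,
                # so doing it while iterating is safe.
                node[k] = new_value
            else:
                _set_in_place(key, new_value, node[k])
    elif isinstance(node, list):
        for item in node:
            _set_in_place(key, new_value, item)


def update_key(key: str, new_value: Any, document: Any) -> Any:
    """Return a deep copy of `document` with every `key` set to `new_value`."""
    result = copy.deepcopy(document)  # leave the caller's data untouched
    _set_in_place(key, new_value, result)
    return result


profile = {"user": {"email": "old@example.com", "backup": {"email": "old2@example.com"}}}
print(has_key("email", profile))  # True
print(has_key("phone", profile))  # False
print(update_key("email", "new@example.com", profile))
# {'user': {'email': 'new@example.com', 'backup': {'email': 'new@example.com'}}}
```

Returning a copy rather than mutating the input is a design choice of this sketch; it keeps the original scraped data available for comparison.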
Example Use
import jmespath

data = {
    "data": {
        "info": {
            "products": [
                {"price": {"usd": 1}, "_type": "product", "id": "123"},
                {"price": {"usd": 2}, "_type": "product", "id": "345"},
            ]
        }
    }
}
# easily reshape a nested dataset into a flat structure:
jmespath.search("data.info.products[*].{id:id, price:price.usd}", data)
# [{'id': '123', 'price': 1}, {'id': '345', 'price': 2}]
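from nested_lookup import nested_lookup

my_document = {
    "name": "Rocko Ballestrini",
    "email_address": "test1@example.com",
    "other": {
        "secondary_email": "test2@example.com",
        "EMAIL_RECOVERY": "test3@example.com",
        "email_address": "test4@example.com",
    },
}
# find every value of a key, no matter how deeply it is nested:
nested_lookup("email_address", my_document)
# ['test1@example.com', 'test4@example.com']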
# retrieving all keys can be useful for a dataset overview:
from nested_lookup import get_all_keys
get_all_keys(my_document)
# ['name', 'email_address', 'other', 'secondary_email', 'EMAIL_RECOVERY', 'email_address']
# key/value stats can also be useful for a data overview:
from nested_lookup import get_occurrence_of_key, get_occurrence_of_value, get_occurrences_and_values
data = {"products": [{"category": "t-shirt"}, {"category": "underwear"}, {"category": "t-shirt"}]}
get_occurrence_of_key(data, key="category")
# 3
get_occurrence_of_value(data, value="t-shirt")
# 2
get_occurrences_and_values([data], "t-shirt")  # count t-shirt products
# {
#     't-shirt': {
#         'occurrences': 2,
#         'values': [{'category': 't-shirt'}, {'category': 't-shirt'}]
#     }
# }
# it can also be used to alter or delete values:
from nested_lookup import nested_alter, nested_delete
data = {"products": [{"price": 10}, {"price": 14}]}
nested_alter(data, "price", lambda price: price * 1.4)
# {'products': [{'price': 14.0}, {'price': 19.599999999999998}]}
nested_delete(data, "price")
# {'products': [{}, {}]}
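The occurrence helpers above count one key or one value at a time. For a quick distribution of all values under a key, a traversal combined with `collections.Counter` works well. This is a stdlib sketch with a hypothetical `iter_values` helper (not part of nested-lookup); it uses an explicit stack instead of recursion, so very deep documents cannot hit Python's recursion limit.

```python
from collections import Counter
from typing import Any, Iterator


def iter_values(key: str, document: Any) -> Iterator[Any]:
    """Yield every value stored under `key` in nested dicts/lists (order not preserved)."""
    stack = [document]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key:
                    yield v
                # Push every value, matched or not, so keys nested
                # inside it are found as well.
                stack.append(v)
        elif isinstance(node, list):
            stack.extend(node)


data = {"products": [{"category": "t-shirt"}, {"category": "underwear"}, {"category": "t-shirt"}]}
category_counts = Counter(iter_values("category", data))
print(category_counts.most_common())
# [('t-shirt', 2), ('underwear', 1)]
```

With nested-lookup installed, `Counter(nested_lookup("category", data))` should give the same counts, since `nested_lookup` returns a list of the matching values.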