jmespath vs nested-lookup

jmespath: MIT license, 219.0 million downloads per month, version 1.0.1

nested-lookup: Public Domain license, 485.5 thousand downloads per month, version 0.2.25

JMESPath (pronounced “james path”) allows you to declaratively specify how to extract elements from a JSON document.
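To make "declaratively" concrete, the sketch below evaluates a path expression with one generic function instead of hand-written lookups. `tiny_search` is a hypothetical, stdlib-only illustration that supports only dotted field names and the `[*]` wildcard projection. It is not the jmespath library, which also handles filters, slices, functions and multiselects.

```python
from typing import Any


def tiny_search(path: str, doc: Any) -> Any:
    """Hypothetical helper: evaluate a tiny subset of JMESPath syntax.

    Supports dotted field names ("a.b.c") and a trailing "[*]" on a
    segment, which projects the rest of the path over every list element.
    Missing fields evaluate to None, much like JMESPath's null.
    """
    current = doc
    segments = path.split(".")
    for i, segment in enumerate(segments):
        project = segment.endswith("[*]")
        name = segment[:-3] if project else segment
        if name:
            # plain field access only works on objects (dicts)
            if not isinstance(current, dict):
                return None
            current = current.get(name)
        if project:
            if not isinstance(current, list):
                return None
            rest = ".".join(segments[i + 1:])
            if not rest:
                return current
            # apply the remaining path to each element; like JMESPath
            # projections, drop elements that evaluate to None
            results = (tiny_search(rest, item) for item in current)
            return [r for r in results if r is not None]
    return current


data = {"data": {"info": {"products": [
    {"price": {"usd": 1}, "id": "123"},
    {"price": {"usd": 2}, "id": "345"},
]}}}
print(tiny_search("data.info.products[*].price.usd", data))  # [1, 2]
```

The query is just a string, so the same function can answer any question that fits the supported syntax.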

In web scraping, JMESPath is a powerful tool for parsing and reshaping large JSON datasets. It is fast and easily extensible, and it uses its own expressive query language.
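Reshaping here means turning nested records into flat ones. The following stdlib-only sketch mimics what a JMESPath multiselect hash such as `{id: id, price: price.usd}` does for each element of a list. `pluck` and `reshape` are hypothetical names chosen for illustration, and the sketch ignores the rest of the query language.

```python
from typing import Any


def pluck(doc: Any, path: str) -> Any:
    """Follow a dotted path of dict keys; return None if a step is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict):
            return None
        doc = doc.get(key)
    return doc


def reshape(items: list, spec: dict) -> list:
    """Map each item to a flat dict: output name -> value at a dotted path."""
    return [{name: pluck(item, path) for name, path in spec.items()} for item in items]


products = [
    {"price": {"usd": 1}, "_type": "product", "id": "123"},
    {"price": {"usd": 2}, "_type": "product", "id": "345"},
]
print(reshape(products, {"id": "id", "price": "price.usd"}))
# [{'id': '123', 'price': 1}, {'id': '345', 'price': 2}]
```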

For more, see the JSON parsing introduction section.

nested-lookup is a convenient way to parse multi-level JSON documents, which are common in web scraping. With nested-lookup, we can extract deeply nested data fields just by providing the key name.
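The core idea behind key-based lookup is a recursive walk that descends into every dict value and list element. This stdlib-only sketch uses a hypothetical `find_key` generator to illustrate the technique. It is not nested-lookup's own code, which offers more options (for example wildcard key matching).

```python
from typing import Any, Iterator


def find_key(key: str, doc: Any) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth (hypothetical helper)."""
    if isinstance(doc, dict):
        for k, v in doc.items():
            if k == key:
                yield v
            # keep searching inside the value, even if its key matched
            yield from find_key(key, v)
    elif isinstance(doc, list):
        for item in doc:
            yield from find_key(key, item)


doc = {"a": {"id": 1, "b": [{"id": 2}, {"c": {"id": 3}}]}, "id": 0}
print(list(find_key("id", doc)))  # [1, 2, 3, 0]
```

Values come out in document order, so the nested matches under "a" appear before the top-level "id".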

The library provides a number of functions for searching and modifying nested dictionaries (and the lists inside them), including:

nested_lookup: searches for a key at any depth and returns a list of all associated values.
nested_update: updates the value of a key wherever it occurs.
nested_delete: deletes a key wherever it occurs.
nested_alter: transforms the value of a key with a callback function.
get_all_keys: returns every key in the document, including the keys of nested dictionaries.
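Update-style helpers follow the same recursive walk but write instead of read. Below is a stdlib-only sketch of the idea; `update_key` is hypothetical, and returning a modified deep copy (rather than mutating the input) is a design choice of this sketch, not a statement about nested-lookup's defaults.

```python
import copy
from typing import Any


def update_key(doc: Any, key: str, value: Any) -> Any:
    """Return a deep copy of `doc` with every `key` (at any depth) set to `value`."""
    result = copy.deepcopy(doc)

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for k in node:
                if k == key:
                    # copy per occurrence so matches don't share one mutable object
                    node[k] = copy.deepcopy(value)
                else:
                    walk(node[k])
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(result)
    return result


original = {"a": {"id": 1}, "b": [{"id": 2}, {"x": 3}]}
print(update_key(original, "id", 0))  # {'a': {'id': 0}, 'b': [{'id': 0}, {'x': 3}]}
```

Working on a copy keeps the original scraped document available for comparison.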

Because these functions walk dictionaries and lists recursively, they work with documents of any shape, which makes the library a useful tool for complex, deeply nested data structures.

Example Use

import jmespath

data = {
    "data": {
        "info": {
            "products": [
                {"price": {"usd": 1}, "_type": "product", "id": "123"}, 
                {"price": {"usd": 2}, "_type": "product", "id": "345"}
            ]
        }
    }
}

# easily reshape nested dataset to flat structure:
jmespath.search("data.info.products[*].{id:id, price:price.usd}", data)
[{'id': '123', 'price': 1}, {'id': '345', 'price': 2}]

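from nested_lookup import nested_lookup

my_document = {
    "name": "Rocko Ballestrini",
    "email_address": "test1@example.com",
    "other": {
        "secondary_email": "test2@example.com",
        "EMAIL_RECOVERY": "test3@example.com",
        "email_address": "test4@example.com",
    },
}

# find every value stored under a key, at any depth:
nested_lookup("email_address", my_document)
['test1@example.com', 'test4@example.com']

# wild=True matches keys by case-insensitive substring:
nested_lookup("mail", my_document, wild=True)
['test1@example.com', 'test2@example.com', 'test3@example.com', 'test4@example.com']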

# retrieving all keys gives a quick overview of a dataset's structure:
from nested_lookup import get_all_keys
get_all_keys(my_document)
['name', 'email_address', 'other', 'secondary_email', 'EMAIL_RECOVERY', 'email_address']

# key and value counts are also useful for a dataset overview:
from nested_lookup import get_occurrence_of_key, get_occurrence_of_value, get_occurrences_and_values
data = {"products": [{"category": "t-shirt"},{"category": "underwear"},{"category": "t-shirt"}]}

get_occurrence_of_key(data, key='category')
3
get_occurrence_of_value(data, value='t-shirt')
2
get_occurrences_and_values([data], "t-shirt")  # count t-shirt products
{
  't-shirt': {
    'occurrences': 2,
    'values': [{'category': 't-shirt'}, {'category': 't-shirt'}]
  }
}

# nested-lookup can also alter or delete values:
from nested_lookup import nested_alter, nested_delete
data = {"products": [{"price": 10}, {"price": 14}]}

nested_alter(data, "price", lambda price: price * 1.4)
{'products': [{'price': 14.0}, {'price': 19.599999999999998}]}

nested_delete(data, "price")
{'products': [{}, {}]}