nested-lookup vs jsonpath-ng

nested-lookup: Public Domain | - 2 208 | 474.9 thousand (month) | Feb 09 2022 | 0.2.25 (1 year, 7 months ago)
jsonpath-ng: Apache 2.0 | 517 5 63 | 30.8 million (month) | Feb 09 2022 | 1.6.1 (a month ago)

nested-lookup is a convenient way to parse multi-depth JSON documents, which are often encountered in web scraping. With nested-lookup, a deeply nested data field can be extracted just by providing its key.

The library provides a number of functions for searching and extracting data from nested dictionaries, including:

  • nested_lookup: searches for a key anywhere in a nested document and returns a list of all matching values.
  • nested_update: updates the value of a key wherever it occurs in a nested document.
  • nested_delete: removes a key wherever it occurs in a nested document.
  • nested_alter: applies a callback function to the values stored under a key.
  • get_all_keys: returns every key in a nested document, including keys inside nested dictionaries.

The library is designed to be flexible and can work with dictionaries of any size and structure, making it a useful tool for working with complex and nested data structures.
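
The idea behind this kind of lookup can be sketched in plain Python. The snippet below is a minimal illustration, not the nested-lookup implementation: find_values and alter_values are hypothetical helpers that walk dictionaries and lists recursively, collecting or rewriting every value stored under a given key.

```python
def find_values(document, key):
    """Yield every value stored under `key` at any depth (hypothetical helper)."""
    if isinstance(document, dict):
        for k, v in document.items():
            if k == key:
                yield v
            # keep descending: the value itself may contain the key again
            yield from find_values(v, key)
    elif isinstance(document, list):
        for item in document:
            yield from find_values(item, key)


def alter_values(document, key, func):
    """Return a copy of `document` with `func` applied to every value under `key`."""
    if isinstance(document, dict):
        return {
            k: func(v) if k == key else alter_values(v, key, func)
            for k, v in document.items()
        }
    if isinstance(document, list):
        return [alter_values(item, key, func) for item in document]
    return document  # scalars are returned unchanged


doc = {
    "email_address": "test1@example.com",
    "other": {"email_address": "test4@example.com"},
}
emails = list(find_values(doc, "email_address"))
masked = alter_values(doc, "email_address", lambda _: "***")
```

Because a key can occur more than once at different depths, the lookup yields a sequence of values rather than a single one.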

jsonpath-ng is a Python library for parsing and querying JSON data.
It is a powerful tool for extracting and manipulating data from JSON structures.

The library uses a syntax similar to XPath, a well-known language for querying and manipulating XML data. This makes it familiar and intuitive for developers who have worked with XML.

JSONPath is a query language for JSON inspired by XPath (a path language for querying XML/HTML). For more, see the initial syntax proposal.

jsonpath-ng is implemented in pure Python and can be easily extended with additional Python functions if needed. The most commonly used JSONPath feature in web scraping is the recursive key lookup ($..key), a convenient way to find specific data fields in large datasets.
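
To make the recursive lookup concrete, here is a plain-Python sketch of what an expression such as $..price selects. It does not use jsonpath-ng: descend is a hypothetical helper, and the path strings it builds only loosely imitate JSONPath notation.

```python
def descend(node, key, path="$"):
    """Yield (path, value) for every `key` found at any depth below `node`."""
    if isinstance(node, dict):
        for k, v in node.items():
            child_path = f"{path}.{k}"
            if k == key:
                yield child_path, v
            # a matching value may itself contain further matches
            yield from descend(v, key, child_path)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from descend(item, key, f"{path}[{i}]")


data = {"products": [{"name": "shirt", "price": 10}, {"name": "socks", "price": 14}]}
matches = list(descend(data, "price"))
# matches pairs each value with the location where it was found
```

Keeping the path alongside each value is what lets a query result be traced back to its place in the original document.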

Example Use


from nested_lookup import nested_lookup

my_document = {
   "name" : "Rocko Ballestrini",
   "email_address" : "test1@example.com",
   "other" : {
       "secondary_email" : "test2@example.com",
       "EMAIL_RECOVERY" : "test3@example.com",
       "email_address" : "test4@example.com",
    },
}

# retrieving all keys can be useful for a dataset overview
from nested_lookup import get_all_keys
get_all_keys(my_document)
# ['name', 'email_address', 'other', 'secondary_email', 'EMAIL_RECOVERY', 'email_address']

# key/value stats can also be useful for data overview: 
from nested_lookup import get_occurrence_of_key, get_occurrence_of_value, get_occurrences_and_values
data = {"products": [{"category": "t-shirt"},{"category": "underwear"},{"category": "t-shirt"}]}

get_occurrence_of_key(data, key='category')
# 3
get_occurrence_of_value(data, value='t-shirt')
# 2
get_occurrences_and_values([data], "t-shirt")  # count t-shirt products
# {
#   't-shirt': {
#     'occurrences': 2,
#     'values': [{'category': 't-shirt'}, {'category': 't-shirt'}]
#   }
# }

# it can also be used to alter or delete values:
from nested_lookup import nested_alter, nested_delete
data = {"products": [{"price": 10}, {"price": 14}]}

nested_alter(data, "price", lambda price: price * 1.4)
# {'products': [{'price': 14.0}, {'price': 19.599999999999998}]}

nested_delete(data, "price")
# {'products': [{}, {}]}

from jsonpath_ng import jsonpath, parse

# A robust parser, not just a regex. (Makes powerful extensions possible; see below)
jsonpath_expr = parse('foo[*].baz')

# Extracting values is easy
[match.value for match in jsonpath_expr.find({'foo': [{'baz': 1}, {'baz': 2}]})]
# [1, 2]

# Matches remember where they came from
[str(match.full_path) for match in jsonpath_expr.find({'foo': [{'baz': 1}, {'baz': 2}]})]
# ['foo.[0].baz', 'foo.[1].baz']

# And this can be useful for automatically providing ids for bits of data that do not have them (currently a global switch)
jsonpath.auto_id_field = 'id'
[match.value for match in parse('foo[*].id').find({'foo': [{'id': 'bizzle'}, {'baz': 3}]})]
# ['foo.bizzle', 'foo.[1]']

# A handy extension: named operators like `parent`
[match.value for match in parse('a.*.b.`parent`.c').find({'a': {'x': {'b': 1, 'c': 'number one'}, 'y': {'b': 2, 'c': 'number two'}}})]
# ['number two', 'number one']

# You can also build expressions directly quite easily
from jsonpath_ng.jsonpath import Fields
from jsonpath_ng.jsonpath import Slice

jsonpath_expr_direct = Fields('foo').child(Slice('*')).child(Fields('baz'))  # This is equivalent

Alternatives / Similar