object-scan vs nested-lookup
object-scan traverses complex JavaScript objects to find keys that match a search pattern.
In web scraping, it's useful for searching large, nested JSON datasets for specific data fields. Patterns are dot-separated keys with wildcards (* matches any key name at one level, ** matches any number of levels), so object-scan can find a key anywhere in an object structure:
import objectScan from 'object-scan';
const haystack = { a: { b: { c: 'd' }, e: { f: 'g' } } };
objectScan(['a.*.f'], { joined: true })(haystack);
// => [ 'a.e.f' ]
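To show what a pattern such as a.*.f means, here is a minimal plain-Python sketch. The helper match_paths is hypothetical: it handles only literal keys and a bare * (exactly one key at that level), and it is not object-scan's implementation, which supports many more pattern features.

```python
from typing import Any


def match_paths(pattern: str, obj: Any) -> list[str]:
    """Return dotted paths in obj that match a pattern such as 'a.*.f'.

    Hypothetical helper: supports literal keys and '*' (any single key at
    that level). It is a sketch, not object-scan's matching engine.
    """
    parts = pattern.split(".")
    results: list[str] = []

    def walk(node: Any, depth: int, path: list[str]) -> None:
        if depth == len(parts):
            # every pattern segment matched: record the joined path
            results.append(".".join(path))
            return
        if not isinstance(node, dict):
            return  # cannot descend further, so this branch does not match
        for key, value in node.items():
            if parts[depth] in ("*", key):
                walk(value, depth + 1, path + [key])

    walk(obj, 0, [])
    return results


haystack = {"a": {"b": {"c": "d"}, "e": {"f": "g"}}}
print(match_paths("a.*.f", haystack))  # ['a.e.f']
```

Each pattern segment consumes exactly one level of nesting, which is why a.*.f skips a.b (it has no f child) and matches only a.e.f.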
nested-lookup is a Python library for searching multi-level JSON documents, which are often encountered in web scraping. With nested-lookup, a deeply nested data field can be extracted just by providing its key.
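The core idea, collecting every value stored under a key at any depth, can be sketched in plain Python. The generator find_values is a hypothetical stand-in written for illustration, not nested-lookup's source; it recurses into both dicts and lists, since parsed JSON mixes the two.

```python
from typing import Any, Iterator


def find_values(key: str, document: Any) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth (hypothetical helper)."""
    if isinstance(document, dict):
        for k, v in document.items():
            if k == key:
                yield v
            # keep descending: the matched value may itself contain the key
            yield from find_values(key, v)
    elif isinstance(document, list):
        for item in document:
            yield from find_values(key, item)


doc = {"user": {"name": "a", "friends": [{"name": "b"}, {"name": "c"}]}}
print(list(find_values("name", doc)))  # ['a', 'b', 'c']
```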
The library provides a number of functions for searching and extracting data from nested dictionaries, including:
nested_lookup: searches for a key within a nested dictionary and returns a list of all associated values.
nested_update: updates the value stored under a key within a nested dictionary.
nested_has: checks whether a key exists within a nested dictionary.
nested_values: returns all the values within a nested dictionary, including values within nested dictionaries.
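The existence check and value collection described for nested_has and nested_values can be approximated in plain Python. Both helpers below are hypothetical and are not part of nested-lookup's API; all_leaf_values also makes an assumed design choice of returning only non-container values.

```python
from typing import Any


def has_key(key: str, document: Any) -> bool:
    """True if `key` appears anywhere in nested dicts/lists (hypothetical helper)."""
    if isinstance(document, dict):
        return key in document or any(has_key(key, v) for v in document.values())
    if isinstance(document, list):
        return any(has_key(key, item) for item in document)
    return False


def all_leaf_values(document: Any) -> list[Any]:
    """Collect every non-container value, depth first (hypothetical helper)."""
    if isinstance(document, dict):
        return [x for v in document.values() for x in all_leaf_values(v)]
    if isinstance(document, list):
        return [x for item in document for x in all_leaf_values(item)]
    return [document]


doc = {"a": 1, "b": {"c": [2, {"d": 3}]}}
print(has_key("d", doc))     # True
print(has_key("z", doc))     # False
print(all_leaf_values(doc))  # [1, 2, 3]
```

Using any() lets has_key stop at the first match instead of walking the whole document.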
The library handles dictionaries of any depth, including dictionaries nested inside lists, so it works on complex JSON structures without custom traversal code.
Example Use
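// object-scan (JavaScript): find a key at any depth
import objectScan from 'object-scan';
const myNestedObject = {
  level1: {
    level2: {
      level3: {
        myTargetKey: 'value',
      },
    },
  },
};
const searchTerm = 'myTargetKey';
// '**' matches any number of levels; joined: false returns each key path as an array
const result = objectScan([`**.${searchTerm}`], { joined: false })(myNestedObject);
console.log(result);
// => [ [ 'level1', 'level2', 'level3', 'myTargetKey' ] ]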
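The two libraries answer different questions: object-scan reports where a key is (its path), while nested_lookup reports what is stored there. A minimal Python sketch of the path-reporting side, roughly what the **.myTargetKey search with joined: false returns, follows. key_paths is a hypothetical helper, and recording list positions as integer indices is an assumption of this sketch.

```python
from typing import Any


def key_paths(key: str, obj: Any, prefix: tuple = ()) -> list[list[Any]]:
    """Return the path (keys and list indices) to every occurrence of `key`.

    Hypothetical helper mirroring a '**.key' search: the key may sit at any depth.
    """
    paths: list[list[Any]] = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                paths.append([*prefix, k])
            paths.extend(key_paths(key, v, (*prefix, k)))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            paths.extend(key_paths(key, item, (*prefix, i)))
    return paths


nested = {"level1": {"level2": {"level3": {"myTargetKey": "value"}}}}
print(key_paths("myTargetKey", nested))
# [['level1', 'level2', 'level3', 'myTargetKey']]
```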
# nested-lookup (Python): find every value stored under a key
from nested_lookup import nested_lookup
my_document = {
    "name": "Rocko Ballestrini",
    "email_address": "test1@example.com",
    "other": {
        "secondary_email": "test2@example.com",
        "EMAIL_RECOVERY": "test3@example.com",
        "email_address": "test4@example.com",
    },
}
nested_lookup("email_address", my_document)
# => ['test1@example.com', 'test4@example.com']
# retrieving all keys can be useful for a dataset overview
from nested_lookup import get_all_keys
get_all_keys(my_document)
# => ['name', 'email_address', 'other', 'secondary_email', 'EMAIL_RECOVERY', 'email_address']
# key/value stats can also be useful for data overview:
from nested_lookup import get_occurrence_of_key, get_occurrence_of_value, get_occurrences_and_values
data = {"products": [{"category": "t-shirt"}, {"category": "underwear"}, {"category": "t-shirt"}]}
get_occurrence_of_key(data, key='category')
# => 3
get_occurrence_of_value(data, value='t-shirt')
# => 2
get_occurrences_and_values([data], "t-shirt")  # count t-shirt products
# => {
#     't-shirt': {
#         'occurrences': 2,
#         'values': [{'category': 't-shirt'}, {'category': 't-shirt'}]
#     }
# }
# it can also be used to alter/delete values:
from nested_lookup import nested_alter, nested_delete
data = {"products": [{"price": 10}, {"price": 14}]}
nested_alter(data, "price", lambda price: price * 1.4)
# => {'products': [{'price': 14.0}, {'price': 19.599999999999998}]}
nested_delete(data, "price")
# => {'products': [{}, {}]}
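Altering and deleting by key can also be sketched without the library. The helpers alter_key and delete_key are hypothetical; as an assumed design choice, both return new data and leave the input untouched, and alter_key does not descend into a value it has just transformed.

```python
import copy
from typing import Any, Callable


def alter_key(document: Any, key: str, fn: Callable[[Any], Any]) -> Any:
    """Return a deep copy with fn applied to every value stored under `key`."""
    result = copy.deepcopy(document)  # leave the caller's data untouched

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key:
                    node[k] = fn(v)  # replacing an existing key's value is safe mid-iteration
                else:
                    walk(v)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(result)
    return result


def delete_key(document: Any, key: str) -> Any:
    """Return a rebuilt copy with every `key` entry removed, at any depth."""
    if isinstance(document, dict):
        return {k: delete_key(v, key) for k, v in document.items() if k != key}
    if isinstance(document, list):
        return [delete_key(item, key) for item in document]
    return document


data = {"products": [{"price": 10}, {"price": 14}]}
print(alter_key(data, "price", lambda price: price * 2))  # {'products': [{'price': 20}, {'price': 28}]}
print(delete_key(data, "price"))  # {'products': [{}, {}]}
print(data)  # {'products': [{'price': 10}, {'price': 14}]}
```

Returning copies keeps the original scraped payload available for comparison after a transformation.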