Python implementation notes¶

The Python implementation of JSON-delta consists of a package json_delta, whose top-level namespace is documented in The JSON-delta API. The implementation is divided into five sub-modules of the package, whose names all begin with an underscore to highlight the fact that they are not part of the API: the way the functions documented in The JSON-delta API are implemented is subject to refactoring at any time. Nevertheless, the sub-modules are documented here.

json_delta._diff¶

Functions for computing JSON-format diffs.

json_delta._diff.diff(left_struc, right_struc, array_align=True, compare_lengths=True, common_key_threshold=0.0, verbose=True, key=None)¶

Compose a sequence of diff stanzas sufficient to convert the structure left_struc into the structure right_struc. (Whether you can add ‘necessary and’ to ‘sufficient to’ depends on the setting of the other parms, and how many cycles you want to burn; see below).

Optional parameters:

array_align: Use needle_diff() to compute deltas between arrays. Computationally expensive, but likely to produce shorter diffs. If this parm is set to the string 'udiff', needle_diff() will optimize for the shortest udiff, instead of the shortest JSON-format diff. Otherwise, set to any value that is true in a Boolean context to enable.

compare_lengths: If [[key, right_struc]] can be encoded as a shorter JSON-string, return it instead of examining the internal structure of left_struc and right_struc. It involves calling json.dumps() twice for every node in the structure, but may result in smaller diffs.

common_key_threshold: Skip recursion into left_struc and right_struc if the fraction of keys they have in common (as computed by commonality(), which see) is less than this parm (which should be a float between 0.0 and 1.0).

verbose: Print compression statistics will be to stderr.

The parameter key is present because this function is mutually recursive with needle_diff() and keyset_diff(). If set to a list, it will be prefixed to every keypath in the output.

json_delta._diff.append_key(stanzas, left_struc, keypath=())¶

Get the appropriate key for appending to the sequence left_struc.

stanzas should be a diff, some of whose stanzas may modify a sequence left_struc that appears at path keypath. If any of the stanzas append to left_struc, the return value is the largest index in left_struc they address, plus one. Otherwise, the return value is len(left_struc) (i.e. the index that a value would have if it was appended to left_struc).

>>> append_key([], [])
0
>>> append_key([[[2], 'Baz']], ['Foo', 'Bar'])
3
>>> append_key([[[2], 'Baz'], [['Quux', 0], 'Foo']], [], ['Quux'])
1

json_delta._diff.commonality(left_struc, right_struc)¶

Return a float between 0.0 and 1.0 representing the amount that the structures left_struc and right_struc have in common.

Return value is computed as the fraction (elements in common) / (total elements).

json_delta._diff.compute_diff_stats(target, diff, percent=True)¶

Calculate the size of a minimal JSON dump of target and diff, and the ratio of the two sizes.

The ratio is expressed as a percentage if percent is True in a Boolean context , or as a float otherwise.

Return value is a tuple of the form ({ratio}, {size of target}, {size of diff})

>>> compute_diff_stats([{}, 'foo', 'bar'], [], False)
(0.125, 16, 2)
>>> compute_diff_stats([{}, 'foo', 'bar'], [[0], {}])
(50.0, 16, 8)

json_delta._diff.compute_keysets(left_seq, right_seq)¶

Compare the keys of left_seq vs. right_seq.

Determines which keys left_seq and right_seq have in common, and which are unique to each of the structures. Arguments should be instances of the same basic type, which must be a non-terminal: i.e. list or dict. If they are lists, the keys compared will be integer indices.

Returns:: Return value is a 3-tuple of sets ({overlap}, {left_only}, {right_only}). As their names suggest, overlap is a set of keys left_seq have in common, left_only represents keys only found in left_seq, and right_only holds keys only found in right_seq.
Raises:: AssertionError if left_seq is not an instance of type(right_seq), or if they are not of a non-terminal type.

>>> (compute_keysets({'foo': None}, {'bar': None})
...  == (set([]), {'foo'}, {'bar'}))
True
>>> (compute_keysets({'foo': None, 'baz': None},
...                  {'bar': None, 'baz': None})
...  == ({'baz'}, {'foo'}, {'bar'}))
True
>>> (compute_keysets(['foo', 'baz'], ['bar', 'baz'])
...  == ({0, 1}, set([]), set([])))
True
>>> compute_keysets(['foo'], ['bar', 'baz']) == ({0}, set([]), {1})
True
>>> compute_keysets([], ['bar', 'baz']) == (set([]), set([]), {0, 1})
True

json_delta._diff.diff(left_struc, right_struc, array_align=True, compare_lengths=True, common_key_threshold=0.0, verbose=True, key=None)

Compose a sequence of diff stanzas sufficient to convert the structure left_struc into the structure right_struc. (Whether you can add ‘necessary and’ to ‘sufficient to’ depends on the setting of the other parms, and how many cycles you want to burn; see below).

Optional parameters:

array_align: Use needle_diff() to compute deltas between arrays. Computationally expensive, but likely to produce shorter diffs. If this parm is set to the string 'udiff', needle_diff() will optimize for the shortest udiff, instead of the shortest JSON-format diff. Otherwise, set to any value that is true in a Boolean context to enable.

compare_lengths: If [[key, right_struc]] can be encoded as a shorter JSON-string, return it instead of examining the internal structure of left_struc and right_struc. It involves calling json.dumps() twice for every node in the structure, but may result in smaller diffs.

common_key_threshold: Skip recursion into left_struc and right_struc if the fraction of keys they have in common (as computed by commonality(), which see) is less than this parm (which should be a float between 0.0 and 1.0).

verbose: Print compression statistics will be to stderr.

The parameter key is present because this function is mutually recursive with needle_diff() and keyset_diff(). If set to a list, it will be prefixed to every keypath in the output.

json_delta._diff.keyset_diff(left_struc, right_struc, key, options={})¶

Return a diff between left_struc and right_struc.

It is assumed that left_struc and right_struc are both non-terminal types (serializable as arrays or objects). Sequences are treated just like mappings by this function, so the diffs will be correct but not necessarily minimal. For a minimal diff between two sequences, use needle_diff().

This function probably shouldn’t be called directly. Instead, use diff(), which will call keyset_diff() if appropriate anyway.

json_delta._diff.needle_diff(left_struc, right_struc, key, options={})¶

Returns a diff between left_struc and right_struc.

If left_struc and right_struc are both serializable as arrays, this function will use a Needleman-Wunsch sequence alignment to find a minimal diff between them. Otherwise, the inputs are passed on to keyset_diff().

This function probably shouldn’t be called directly. Instead, use diff(), which is mutually recursive with this function and keyset_diff() anyway.

json_delta._diff.sort_stanzas(stanzas)¶

Sort the stanzas in a diff.

Object changes can occur in any order, but deletions from arrays have to happen last node first: ['foo', 'bar', 'baz'] → ['foo', 'bar'] → ['foo'] → []; additions to arrays have to happen leftmost-node-first: [] → ['foo'] → ['foo', 'bar'] → ['foo', 'bar', 'baz'], and insert-and-shift alterations to arrays must happen last: ['foo', 'quux'] → ['foo', 'bar', 'quux'] → ['foo', 'bar', 'baz', 'quux'].

Finally, stanzas are sorted in descending order of length of keypath, so that the most deeply-nested structures are altered before alterations which might change their keypaths take place.

Note that this will also sort changes to objects (dicts) so that they occur first of all.

json_delta._diff.split_diff(stanzas)¶

Split a diff into modifications, deletions and insertions.

Return value is a 4-tuple of lists: the first is a list of stanzas from stanzas that modify JSON objects, the second is a list of stanzas that add or change elements in JSON arrays, the third is a list of stanzas which delete elements from arrays, and the fourth is a list of stanzas which insert elements into arrays (stanzas ending in "i").

json_delta._diff.structure_comparable(left_struc, right_struc)¶: Test if left_struc and right_struc can be efficiently diffed.

json_delta._diff.this_level_diff(left_struc, right_struc, key=None, common=None)¶

Return a sequence of diff stanzas between the structures left_struc and right_struc, assuming that they are each at the key-path key within the overall structure.

>>> (this_level_diff({'foo': 'bar', 'baz': 'quux'},
...                 {'foo': 'bar'})
...  == [[['baz']]])
True
>>> (this_level_diff({'foo': 'bar', 'baz': 'quux'},
...                 {'foo': 'bar'}, ['quordle'])
...  == [[['quordle', 'baz']]])
True

json_delta._patch¶

Functions for applying JSON-format patches.

json_delta._patch.patch(struc, diff, in_place=True)¶

Apply the sequence of diff stanzas diff to the structure struc.

By default, this function modifies struc in place; set in_place to False to return a patched copy of struc instead:

>>> will_change = [16]
>>> wont_change = [16]
>>> patch(will_change, [[[0]]])
[]
>>> will_change
[]
>>> patch(wont_change, [[[0]]], False)
[]
>>> wont_change
[16]

json_delta._patch.patch(struc, diff, in_place=True)

Apply the sequence of diff stanzas diff to the structure struc.

By default, this function modifies struc in place; set in_place to False to return a patched copy of struc instead:

>>> will_change = [16]
>>> wont_change = [16]
>>> patch(will_change, [[[0]]])
[]
>>> will_change
[]
>>> patch(wont_change, [[[0]]], False)
[]
>>> wont_change
[16]

json_delta._patch.patch_stanza(struc, stanza)¶

Applies the stanza stanza to the structure struc as a patch.

Note that this function modifies struc in-place into the target of stanza. If struc is a tuple(), you get a new tuple with the appropriate modification made:

>>> patch_stanza((17, 3.141593, None), [[1], 3.14159265])
(17, 3.14159265, None)

json_delta._udiff¶

Functions for computing udiffs. Main entry point: udiff().

The data structure representing a udiff that these functions all manipulate is a pair of lists of iterators (left_lines, right_lines). These lists are expected (principally by generate_udiff_lines(), which processes them), to be of the same length. A pair of iterators (left_lines[i], right_lines[i]) may yield exactly the same sequence of output lines, each with ' ' as the first character (representing parts of the structure the input and output have in common). Alternatively, they may each yield zero or more lines (referring to parts of the structure that are unique to the inputs they represent). In this case, all lines yielded by left_lines[i] should begin with '-', and all lines yielded by right_lines[i] should begin with '+'.

json_delta._udiff.udiff(left, right, patch=None, indent=0, use_ellipses=True, entry=True)¶

Render the difference between the structures left and right as a string in a fashion inspired by diff -u.

Generating a udiff is strictly slower than generating a normal diff with the same option parameters, since the udiff is computed on the basis of a normal diff between left and right. If such a diff has already been computed (e.g. by calling diff()), pass it as the patch parameter:

>>> (next(udiff({"foo": None}, {"foo": None}, patch=[])) ==
...  ' {...}')
True

As you can see above, structures that are identical in left and right are abbreviated using '...' by default. To disable this behavior, set use_ellipses to False.

>>> ('\n'.join(udiff({"foo": None}, {"foo": None},
...            patch=[], use_ellipses=False)) ==
... """ {
...  "foo":
...    null
...  }""")
True

>>> ('\n'.join(udiff([None, None, None], [None, None, None],
...             patch=[], use_ellipses=False)) ==
... """ [
...   null,
...   null,
...   null
...  ]""")
True

class json_delta._udiff.Gap¶: Class to represent gaps introduced by sequence alignment.

json_delta._udiff.add_matter(seq, matter, indent)¶

Add material to seq, treating it appropriately for its type.

matter may be an iterator, in which case it is appended to seq. If it is a sequence, it is assumed to be a sequence of iterators, the sequence is concatenated onto seq. If matter is a string, it is turned into a patch band using single_patch_band(), which is appended. Finally, if matter is None, an empty iterable is appended to seq.

This function is a udiff-forming primitive, called by more specific functions defined within udiff_dict() and udiff_list().

json_delta._udiff.commafy(gen, comma=True)¶

Yield from gen, ensuring that the final result ends with a comma iff comma is True.

>>> gen = ['Example line']
>>> next(commafy(iter(gen))) == 'Example line,'
True
>>> next(commafy(iter(gen), False)) == 'Example line'
True
>>> gen = ['Line with a comma at the end,']
>>> (next(commafy(iter(gen), comma=True))
...  == next(commafy(iter(gen), comma=False))
...  == 'Line with a comma at the end,')
True

json_delta._udiff.curry_functions(local_ns)¶

Create partials of _add_common_matter(), _add_differing_matter() and _commafy_last(), with values for left_lines, right_lines and (where appropriate) indent taken from the dictionary local_ns.

Appropriate defaults are also included in the partials, namely left=None and right=None for _add_differing_matter() and left_comma=True and right_comma=None for _commafy_last().

json_delta._udiff.generate_udiff_lines(left, right)¶: Combine the diff lines from left and right, and generate the lines of the resulting udiff.

json_delta._udiff.patch_bands(indent, material, sigil=u' ')¶: Generate appropriately indented patch bands, with sigil as the first character.

json_delta._udiff.reconstruct_alignment(left, right, stanzas)¶: Reconstruct the sequence alignment between the lists left and right implied by stanzas.

json_delta._udiff.single_patch_band(indent, line, sigil=u' ')¶: Convenience function returning an iterable that generates a single patch band.

json_delta._udiff.udiff(left, right, patch=None, indent=0, use_ellipses=True, entry=True)

Render the difference between the structures left and right as a string in a fashion inspired by diff -u.

Generating a udiff is strictly slower than generating a normal diff with the same option parameters, since the udiff is computed on the basis of a normal diff between left and right. If such a diff has already been computed (e.g. by calling diff()), pass it as the patch parameter:

>>> (next(udiff({"foo": None}, {"foo": None}, patch=[])) ==
...  ' {...}')
True

As you can see above, structures that are identical in left and right are abbreviated using '...' by default. To disable this behavior, set use_ellipses to False.

>>> ('\n'.join(udiff({"foo": None}, {"foo": None},
...            patch=[], use_ellipses=False)) ==
... """ {
...  "foo":
...    null
...  }""")
True

>>> ('\n'.join(udiff([None, None, None], [None, None, None],
...             patch=[], use_ellipses=False)) ==
... """ [
...   null,
...   null,
...   null
...  ]""")
True

json_delta._udiff.udiff_dict(left, right, stanzas, indent=0, use_ellipses=True)¶

Construct a human-readable delta between left and right.

This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_dict() are mutually recursive, anyway.

json_delta._udiff.udiff_list(left, right, stanzas, indent=0, use_ellipses=True)¶

Construct a human-readable delta between left and right.

This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_list() are mutually recursive, anyway.

json_delta._upatch¶

json_delta._upatch.upatch(struc, udiff, reverse=False, in_place=True)¶

Apply a patch as output by json_delta.udiff() to struc.

As with json_delta.patch(), struc is modified in place by default. Set the parm in_place to False if this is not the desired behaviour.

The udiff format has enough information in it that this transformation can be applied in reverse: i.e. if udiff is the output of udiff(left, right), you can reconstruct right given left and udiff (by running upatch(left, udiff)), or you can also reconstruct left given right and udiff (by running upatch(right, udiff, reverse=True)). This is not possible for JSON-format diffs, since a [keypath] stanza (meaning “delete the structure at keypath”) does not record what the deleted structure was.

json_delta._upatch.ellipsis_handler(jstring, point, key)¶: Extends key_tracker() to handle the … construction.

json_delta._upatch.is_none_key(key)¶: Is the last element of key None?

json_delta._upatch.reconstruct_diff(udiff, reverse=False)¶

Turn a udiff back into a JSON-format diff.

Set reverse to True to generate a reverse diff (i.e. swap the significance of line-initial + and -).

Header lines (if present) are ignored:

>>> udiff = """--- <stdin>
... +++ <stdin>
... -false
... +true"""
>>> reconstruct_diff(udiff)
[[[], True]]
>>> reconstruct_diff(udiff, reverse=True)
[[[], False]]

json_delta._upatch.skip_key(point, key, origin, keys, predicate)¶

Find the next result in keys for which predicate(key) is False.

If none is found, or if key is already such a result, the return value is (point, key).

json_delta._upatch.sort_stanzas(stanzas)¶

Sorts the stanzas in a diff.

reconstruct_diff() works on different assumptions from json_delta._diff.needle_diff() when it comes to stanzas altering arrays: keys in such stanzas relate to the element’s position within the array’s longest intermediate representation during the transformation (that is after all insert-and-shifts, after all appends, but before any deletions). This function sorts stanzas to reflect that order of operations.

As with json_delta._diff.sort_stanzas() (which see), stanzas are sorted for length so the most deeply-nested structures get their modifications first.

json_delta._upatch.udiff_key_tracker(udiff, point=0, start_key=None)¶: Find points within the udiff where the active keypath changes.

json_delta._upatch.upatch(struc, udiff, reverse=False, in_place=True)

Apply a patch as output by json_delta.udiff() to struc.

As with json_delta.patch(), struc is modified in place by default. Set the parm in_place to False if this is not the desired behaviour.

The udiff format has enough information in it that this transformation can be applied in reverse: i.e. if udiff is the output of udiff(left, right), you can reconstruct right given left and udiff (by running upatch(left, udiff)), or you can also reconstruct left given right and udiff (by running upatch(right, udiff, reverse=True)). This is not possible for JSON-format diffs, since a [keypath] stanza (meaning “delete the structure at keypath”) does not record what the deleted structure was.

json_delta._util¶

Utility functions and constants used by more than one submodule.

The majority of python 2/3 compatibility shims also appear in this module.

json_delta._util.predicate_count(iterable, predicate=lambda x: True)¶

Count items x in iterable such that predicate(x).

The default predicate is lambda x: True, so predicate_count(iterable) will count the values generated by iterable. Note that if the iterable is a generator, this function will exhaust it, and if it is an infinite generator, this function will never return!

>>> predicate_count([True] * 16)
16
>>> predicate_count([True, True, False, True, True], lambda x: x)
4

json_delta._util.uniquify(bytestring, key=lambda x: x)¶

Remove duplicate elements from a list while preserving order.

key works as for min(), max(), etc. in the standard library.

json_delta._util.sniff_encoding(bytestring, starts=JSON_STARTS, complete=True)¶

Determine the encoding of a UTF-x encoded string.

The argument starts must be a mapping of bytestrings the input can begin with onto the encoding that such a beginning would represent (see licit_starts() for a function that can build such a mapping).

The complete flag signifies whether the input represents the entire string: if it is set False, the function will attempt to determine the encoding, but will raise a UnicodeError if it is ambiguous. For example, an input of b'\xff\xfe' could be the UTF-16 little-endian byte-order mark, or, if the input is incomplete, it could be the first two characters of the UTF-32-LE BOM:

>>> sniff_encoding(b'\xff\xfe') == 'utf_16'
True
>>> sniff_encoding(b'\xff\xfe', complete=False)
Traceback (most recent call last):
    ...
UnicodeError: String encoding is ambiguous.

json_delta._util._load_and_func(func, parm1=None, parm2=None, both=None, **flags)¶: Decode JSON-serialized parameters and apply func to them.

json_delta._util.all_paths(struc)¶

Generate key-paths to every node in struc.

Both terminal and non-terminal nodes are visited, like so:

>>> paths = [x for x in all_paths({'foo': None, 'bar': ['baz', 'quux']})]
>>> [] in paths # ([] is the path to ``struc`` itself.)
True
>>> ['foo'] in paths
True
>>> ['bar'] in paths
True
>>> ['bar', 0] in paths
True
>>> ['bar', 1] in paths
True
>>> len(paths)
5

json_delta._util.check_diff_structure(diff)¶

Return diff (or True) if it is structured as a sequence of diff stanzas. Otherwise return False.

[] is a valid diff, so if it is passed to this function, the return value is True, so that the return value is always true in a Boolean context if diff is valid.

>>> check_diff_structure('This is certainly not a diff!')
False
>>> check_diff_structure([])
True
>>> check_diff_structure([None])
False
>>> example_valid_diff = [[["foo", 6, 12815316313, "bar"], None]]
>>> check_diff_structure(example_valid_diff) == example_valid_diff
True
>>> check_diff_structure([[["foo", 6, 12815316313, "bar"], None],
...                       [["foo", False], True]])
False

json_delta._util.compact_json_dumps(obj)¶

Compute the most compact possible JSON representation of obj.

>>> test = {
...             'foo': 'bar',
...             'baz':
...                ['quux', 'spam',
...       'eggs']
... }
>>> compact_json_dumps(test) in (
...     '{"foo":"bar","baz":["quux","spam","eggs"]}',
...     '{"baz":["quux","spam","eggs"],"foo":"bar"}'
... )
True
>>>

json_delta._util.decode_json(file_or_str)¶

Decode a JSON file-like object or string.

The following doctest is probably pointless as documentation. It is here so json-delta can claim 100% code coverage for its test suite!

>>> try:
...     from StringIO import StringIO
... except ImportError:
...     from io import StringIO
>>> foo = '[]'
>>> decode_json(foo)
[]
>>> decode_json(StringIO(foo))
[]

json_delta._util.decode_udiff(file_or_str)¶

Decode a file-like object or bytestring udiff into a unicode string.

The udiff may be encoded in UTF-8, -16 or -32 (with or without BOM):

>>> udiff = u'- true\n+ false'
>>> decode_udiff(udiff.encode('utf_32_be')) == udiff
True
>>> try:
...     from StringIO import StringIO
... except ImportError:
...     from io import BytesIO as StringIO
>>> decode_udiff(StringIO(udiff.encode('utf-8-sig'))) == udiff
True

An empty string is a valid udiff; this function will convert it to a unicode string:

>>> decode_udiff(b'') == u''
True

The function is idempotent: if you pass it a unicode string, it will be returned unmodified:

>>> decode_udiff(udiff) is udiff
True

If you pass it a non-empty bytestring that cannot be interpreted as beginning with ' ', '+', '-' or a BOM in any encoding, a ValueError is raised:

>>> decode_udiff(b':-)')
Traceback (most recent call last):
    ...
ValueError: String does not begin with any of the specified start chars.

json_delta._util.follow_path(struc, path)¶: Retrieve the value found at the key-path path within struc.

json_delta._util.in_array(key, accept_None=False)¶

Should the keypath key point at a JSON array ([])?

Works by testing whether key[-1] is an int or (where appropriate) long:

>>> in_array([u'bar', 16])
True
>>> import sys
>>> sys.version >= '3' or eval("in_array([u'foo', 94L])")
True

Returns False if key addresses a non-array object…

>>> in_array(["foo"])
False
>>> in_array([u'bar'])
False

…or if key == [] (as in that case there’s no way of knowing whether key addresses an object or an array).

>>> in_array([])
False

If the accept_None flag is set, this function will not raise a ValueError if key[-1] is None (keypaths of this form are used by key_tracker(), to signal points within a JSON string where a new object key is expected, but not yet found).

>>> in_array([None])
Traceback (most recent call last):
    ...
ValueError: keypath elements must be instances of str, unicode, int or long,
    not NoneType (key[0] == None)

>>> in_array([None], True)
False
>>> in_array([None], accept_None=True)
False

Otherwise, a ValueError is raised if key is not a valid keypath:

>>> keypath = [{str("spam"): str("spam")}, "pickled eggs and spam", 7]
>>> in_array(keypath)
Traceback (most recent call last):
    ...
ValueError: keypath elements must be instances of str, unicode, int or long,
    not dict (key[0] == {'spam': 'spam'})

json_delta._util.in_object(key, accept_None=False)¶

Should the keypath key point at a JSON object ({})?

Works by testing whether key[-1] is a string or (where appropriate) unicode():

>>> in_object(["foo"])
True
>>> in_object([u'bar'])
True

Returns False if key addresses an array…

>>> in_object([u'bar', 16])
False
>>> import sys
>>> False if sys.version >= '3' else eval("in_object([u'bar', 16L])")
False

…if key == []…

>>> in_object([])
False

If the accept_None flag is set, this function will also return True if key[-1] is None (this functionality is used by key_tracker(), to signal points within a JSON string where a new object key is expected, but not yet found).

>>> in_object([None])
Traceback (most recent call last):
    ...
ValueError: keypath elements must be instances of str, unicode, int or long,
    not NoneType (key[0] == None)

>>> in_object([None], True)
True
>>> in_object([None], accept_None=True)
True

Raises a ValueError if key is not a valid keypath:

>>> in_object(['foo', {}])
Traceback (most recent call last):
    ...
ValueError: keypath elements must be instances of str, unicode, int or long,
    not dict (key[1] == {})

>>> in_object([False, u'foo'])
Traceback (most recent call last):
    ...
ValueError: keypath elements must be instances of str, unicode, int or long,
    not bool (key[0] == False)

json_delta._util.in_x_error(key, offender)¶: Build the instance of ValueError in_object() and in_array() raise if keypath is invalid.

json_delta._util.key_tracker(jstring, point=0, start_key=None, special_handler=None)¶

Generate points within jstring where the keypath changes.

This function also identifies points within objects where a new key: value pair is expected, by yielding a pseudo-keypath with None as the final element.

Parameters:

jstring: The JSON string to search.
point: The point to start at.
start_key: The starting keypath.
special_handler: A function for handling extensions to JSON syntax (e.g. _upatch.ellipsis_handler(), used to handle the ... construction in udiffs).

>>> next(key_tracker('{}'))
(1, (None,))

json_delta._util.licit_starts(start_chars=u'{}[]"-0123456789tfn \t\n\r')¶

Compute the bytestrings a UTF-x encoded string can begin with.

This function is intended for encoding detection when the beginning of the encoded string must be one of a limited set of characters, as for JSON or the udiff format. The argument start_chars must be an iterable of valid beginnings.

json_delta._util.nearest_of(string, *subs)¶: Find the index of the substring in subs that occurs earliest in string, or len(string) if none of them do.

json_delta._util.predicate_count(iterable, predicate=<function <lambda>>)

Count items x in iterable such that predicate(x).

The default predicate is lambda x: True, so predicate_count(iterable) will count the values generated by iterable. Note that if the iterable is a generator, this function will exhaust it, and if it is an infinite generator, this function will never return!

>>> predicate_count([True] * 16)
16
>>> predicate_count([True, True, False, True, True], lambda x: x)
4

json_delta._util.read_bytestring(file)¶: Read the contents of file as a bytes object.

json_delta._util.skip_string(jstring, point)¶

Assuming jstring is a string, and jstring[point] is a " that starts a JSON string, return x such that jstring[x-1] is the " that terminates the string.

When a " is found, it is necessary to check that it is not escaped by a preceding backslash. As a backslash may itself be escaped, this amounts to checking that the number of backslashes immediately preceding the " is even (counting 0 as an even number):

>>> test_string = r'"Fred \"Foonly\" McQuux"'
>>> skip_string(test_string, 0) == len(test_string)
True
>>> backslash = chr(0x5c)
>>> dbl_quote = chr(0x22)
>>> even_slashes = ((r'"\\\\\\"', json.dumps(backslash * 3)),
...                 (r'"\\\\"',   json.dumps(backslash * 2)),
...                 (r'"\\"',     json.dumps(backslash)))
>>> all((json.loads(L) == json.loads(R) for (L, R) in even_slashes))
True
>>> all((skip_string(L, 0) == len(L) for (L, R) in even_slashes))
True
>>> def cat_dump(*args): return json.dumps(''.join(args))
>>> odd_slashes = (
...     (r'"\\\\\\\"  "', cat_dump(backslash * 3, dbl_quote, ' ' * 2)),
...     (r'"\\\\\"    "', cat_dump(backslash * 2, dbl_quote, ' ' * 4)),
...     (r'"\\\"      "', cat_dump(backslash * 1, dbl_quote, ' ' * 6)),
...     (r'"\"        "', cat_dump(dbl_quote, ' ' * 8)),
... )
>>> all((json.loads(L) == json.loads(R) for (L, R) in odd_slashes))
True
>>> all((skip_string(L, 0) == 12 for (L, R) in odd_slashes))
True

json_delta._util.sniff_encoding(bytestring, starts={'\x00\x00\x007': u'utf_32_be', '\x00\n': u'utf_16_be', '\x00\x00\x00\r': u'utf_32_be', '\x00\t': u'utf_16_be', '\x00\x00\x00\t': u'utf_32_be', '\x00\x00\x00\n': u'utf_32_be', '\x00\r': u'utf_16_be', '"\x00\x00\x00': u'utf_32_le', '2\x00': u'utf_16_le', '\x00\x00\x00]': u'utf_32_be', '\xef\xbb\xbf': u'utf_8_sig', '\x00"': u'utf_16_be', ' ': u'utf_8', '\x00 ': u'utf_16_be', '\x00\x00\x00 ': u'utf_32_be', '\x00\x00\x00"': u'utf_32_be', '\x00\x00\x00-': u'utf_32_be', '\x00-': u'utf_16_be', '\x002': u'utf_16_be', '0': u'utf_8', '\x000': u'utf_16_be', '\x001': u'utf_16_be', '\x006': u'utf_16_be', '4': u'utf_8', '\x004': u'utf_16_be', '\x005': u'utf_16_be', '8': u'utf_8', '\x008': u'utf_16_be', '\xff\xfe\x00\x00': u'utf_32', '\x00\x00\x008': u'utf_32_be', '\x00\x00\x001': u'utf_32_be', ']\x00\x00\x00': u'utf_32_le', '-\x00': u'utf_16_le', 'f\x00\x00\x00': u'utf_32_le', '\x00\x00\x00f': u'utf_32_be', '\x00[': u'utf_16_be', '5\x00': u'utf_16_le', 't\x00': u'utf_16_le', '\x00]': u'utf_16_be', ' \x00': u'utf_16_le', '\x00f': u'utf_16_be', '\x00\x00\x00n': u'utf_32_be', '\x00n': u'utf_16_be', '1\x00\x00\x00': u'utf_32_le', '\x00\x00\x00t': u'utf_32_be', 't': u'utf_8', '\x00t': u'utf_16_be', '4\x00\x00\x00': u'utf_32_le', '\x00{': u'utf_16_be', '\x00}': u'utf_16_be', '\x00\x00\xfe\xff': u'utf_32', '7\x00\x00\x00': u'utf_32_le', '0\x00': u'utf_16_le', '8\x00': u'utf_16_le', 'f\x00': u'utf_16_le', '3': u'utf_8', '7': u'utf_8', '{\x00\x00\x00': u'utf_32_le', ']\x00': u'utf_16_le', '\x00\x00\x00}': u'utf_32_be', '\t\x00': u'utf_16_le', '[': u'utf_8', '3\x00': u'utf_16_le', '\x00\x00\x00{': u'utf_32_be', '{': u'utf_8', '-\x00\x00\x00': u'utf_32_le', '\n': u'utf_8', '0\x00\x00\x00': u'utf_32_le', 'n\x00\x00\x00': u'utf_32_le', '6\x00': u'utf_16_le', '\x00\x00\x004': u'utf_32_be', '"': u'utf_8', '3\x00\x00\x00': u'utf_32_le', '\x003': u'utf_16_be', '\x00\x00\x00[': u'utf_32_be', '\x00\x00\x006': u'utf_32_be', '2': u'utf_8', '}\x00': u'utf_16_le', '6\x00\x00\x00': u'utf_32_le', '6': u'utf_8', 't\x00\x00\x00': u'utf_32_le', '\x00\x00\x000': u'utf_32_be', '\x007': u'utf_16_be', '\x00\x00\x002': u'utf_32_be', '9\x00\x00\x00': u'utf_32_le', '\t\x00\x00\x00': u'utf_32_le', '1\x00': u'utf_16_le', '[\x00': u'utf_16_le', '[\x00\x00\x00': u'utf_32_le', '\x009': u'utf_16_be', ' \x00\x00\x00': u'utf_32_le', 'f': u'utf_8', '9\x00': u'utf_16_le', '}\x00\x00\x00': u'utf_32_le', 'n': u'utf_8', '\xfe\xff': u'utf_16', '\t': u'utf_8', '\n\x00\x00\x00': u'utf_32_le', '\r': u'utf_8', '\r\x00\x00\x00': u'utf_32_le', '\n\x00': u'utf_16_le', '4\x00': u'utf_16_le', '-': u'utf_8', '1': u'utf_8', '{\x00': u'utf_16_le', '5': u'utf_8', '9': u'utf_8', '\xff\xfe': u'utf_16', '2\x00\x00\x00': u'utf_32_le', '\x00\x00\x005': u'utf_32_be', 'n\x00': u'utf_16_le', '5\x00\x00\x00': u'utf_32_le', '\x00\x00\x003': u'utf_32_be', ']': u'utf_8', '\x00\x00\x009': u'utf_32_be', '"\x00': u'utf_16_le', '\r\x00': u'utf_16_le', '7\x00': u'utf_16_le', '8\x00\x00\x00': u'utf_32_le', '}': u'utf_8'}, complete=True)

Determine the encoding of a UTF-x encoded string.

The argument starts must be a mapping of bytestrings the input can begin with onto the encoding that such a beginning would represent (see licit_starts() for a function that can build such a mapping).

The complete flag signifies whether the input represents the entire string: if it is set False, the function will attempt to determine the encoding, but will raise a UnicodeError if it is ambiguous. For example, an input of b'\xff\xfe' could be the UTF-16 little-endian byte-order mark, or, if the input is incomplete, it could be the first two characters of the UTF-32-LE BOM:

>>> sniff_encoding(b'\xff\xfe') == 'utf_16'
True
>>> sniff_encoding(b'\xff\xfe', complete=False)
Traceback (most recent call last):
    ...
UnicodeError: String encoding is ambiguous.

json_delta._util.stanzas_addressing(stanzas, keypath)¶

Find diff stanzas modifying the structure at keypath.

The purpose of this function is to keep track of changes made to the overall structure by stanzas earlier in the sequence, e.g.:

>>> struc = [
...     'foo',
...     'bar', [
...         'baz'
...     ]
... ]
>>> stanzas = [
...     [ [2, 1], 'quux'],
...     [ [0] ],
...     [ [1, 2], 'quordle']
... ]
>>> (stanzas_addressing(stanzas, [2])
...  == [
...   [ [1], 'quux' ],
...   [ [2], 'quordle' ]
...  ])
True

stanzas[0] and stanzas[2] both address the same element of struc — the list that starts off as ['baz'], even though their keypaths are completely different, because the diff stanza [[0]] moves the list ['baz'] from index 2 of struc to index 1.

The return value is a sub-diff: a list of stanzas fit to modify the element at keypath within the overall structure.

json_delta._util.uniquify(obj, key=<function <lambda>>)