Python implementation notes

The Python implementation of JSON-delta consists of a package json_delta, whose top-level namespace is documented in The JSON-delta API. The implementation is divided into five sub-modules of the package, whose names all begin with an underscore to highlight the fact that they are not part of the API: the way the functions documented in The JSON-delta API are implemented is subject to refactoring at any time. Nevertheless, the sub-modules are documented here.


Functions for computing JSON-format diffs.

json_delta._diff.append_key(stanzas, left_struc, keypath=None)

Get the appropriate key for appending to the sequence left_struc.

stanzas should be a diff, some of whose stanzas may modify a sequence left_struc that appears at path keypath. If any of the stanzas append to left_struc, the return value is the largest index in left_struc they address, plus one. Otherwise, the return value is len(left_struc) (i.e. the index that a value would have if it was appended to left_struc).

>>> append_key([], [])
>>> append_key([[[2], 'Baz']], ['Foo', 'Bar'])
>>> append_key([[[2], 'Baz'], [['Quux', 0], 'Foo']], [], ['Quux'])
json_delta._diff.commonality(left_struc, right_struc)

Return a float between 0.0 and 1.0 representing the amount that the structures left_struc and right_struc have in common.

It is assumed (and assert``ed!) that ``left_struc and right_struc are of the same type, and non-empty (check this using structure_worth_investigating()). Return value is computed as the fraction (elements in common) / (total elements).

json_delta._diff.compute_keysets(left_seq, right_seq)

Compare the keys of left_seq vs. right_seq.

Determines which keys left_seq and right_seq have in common, and which are unique to each of the structures. Arguments should be instances of the same basic type, which must be a non-terminal: i.e. list or dict. If they are lists, the keys compared will be integer indices.

Return value is a 3-tuple of sets ({overlap}, {left_only}, {right_only}). As their names suggest, overlap is a set of keys left_seq have in common, left_only represents keys only found in left_seq, and right_only holds keys only found in right_seq.
AssertionError if left_seq is not an instance of type(right_seq), or if they are not of a non-terminal type.
>>> compute_keysets({'foo': None}, {'bar': None}) == (set([]), {'foo'}, {'bar'})
>>> (compute_keysets({'foo': None, 'baz': None}, {'bar': None, 'baz': None})
...  == ({'baz'}, {'foo'}, {'bar'}))
>>> compute_keysets(['foo', 'baz'], ['bar', 'baz']) == ({0, 1}, set([]), set([]))
>>> compute_keysets(['foo'], ['bar', 'baz']) == ({0}, set([]), {1})
>>> compute_keysets([], ['bar', 'baz']) == (set([]), set([]), {0, 1})
json_delta._diff.diff(left_struc, right_struc, minimal=True, verbose=True, key=None)

Compose a sequence of diff stanzas sufficient to convert the structure left_struc into the structure right_struc. (The goal is to add ‘necessary and’ to ‘sufficient’ above!).


verbose: if this is set True (the default), a line of compression statistics will be printed to stderr.

minimal: if True, the function will try harder to find the diff that encodes as the shortest possible JSON string, at the expense of using more of both memory and processor time (as alternatives are computed and compared).

The parameter key is present because this function is mutually recursive with needle_diff() and keyset_diff(). If set to a list, it will be prefixed to every keypath in the output.

json_delta._diff.keyset_diff(left_struc, right_struc, key, minimal=True)

Return a diff between left_struc and right_struc.

It is assumed that left_struc and right_struc are both non-terminal types (serializable as arrays or objects). Sequences are treated just like mappings by this function, so the diffs will be correct but not necessarily minimal. For a minimal diff between two sequences, use needle_diff().

This function probably shouldn’t be called directly. Instead, use udiff(), which will call keyset_diff() if appropriate anyway.

json_delta._diff.needle_diff(left_struc, right_struc, key, minimal=True)

Returns a diff between left_struc and right_struc.

If left_struc and right_struc are both serializable as arrays, this function will use Needleman-Wunsch sequence alignment to find a minimal diff between them. Otherwise, the inputs are passed on to keyset_diff().

This function probably shouldn’t be called directly. Instead, use diff(), which is mutually recursive with this function and keyset_diff() anyway.


Sort the stanzas in diff.

Object changes can occur in any order, but deletions from arrays have to happen last node first: [‘foo’, ‘bar’, ‘baz’] -> [‘foo’, ‘bar’] -> [‘foo’] -> []; and additions to arrays have to happen leftmost-node-first: [] -> [‘foo’] -> [‘foo’, ‘bar’] -> [‘foo’, ‘bar’, ‘baz’].

Note that this will also sort changes to objects (dicts) so that they occur first of all, then modifications/additions on arrays, followed by deletions from arrays.


Split a diff into modifications and deletions.

Return value is a 3-tuple of lists: the first is a list of stanzas from stanzas that modify JSON objects, the second is a list of stanzas that add or change elements in JSON arrays, and the second is a list of stanzas which delete elements from arrays.

json_delta._diff.structure_worth_investigating(left_struc, right_struc)

Test whether it is worth looking at the internal structure of left_struc and right_struc to see if they can be efficiently diffed.

json_delta._diff.this_level_diff(left_struc, right_struc, key=None, common=None)

Return a sequence of diff stanzas between the structures left_struc and right_struc, assuming that they are each at the key-path key within the overall structure.

>>> (this_level_diff({'foo': 'bar', 'baz': 'quux'},
...                 {'foo': 'bar'})
...  == [[['baz']]])
>>> (this_level_diff({'foo': 'bar', 'baz': 'quux'},
...                 {'foo': 'bar'}, ['quordle'])
...  == [[['quordle', 'baz']]])


Functions for applying JSON-format patches.

json_delta._patch.patch(struc, diff, in_place=True)

Apply the sequence of diff stanzas diff to the structure struc.

By default, this function modifies struc in place; set in_place to False to return a patched copy of struc instead:

>>> will_change = [16]
>>> wont_change = [16]
>>> patch(will_change, [[[0]]])
>>> will_change
>>> patch(wont_change, [[[0]]], False)
>>> wont_change
json_delta._patch.patch_stanza(struc, diff)

Applies the diff stanza diff to the structure struc as a patch.

Note that this function modifies struc in-place into the target of diff. If struc is a tuple, you get a new tuple with the appropriate modification made:

>>> patch_stanza((17, 3.141593, None), [[1], 3.14159265])
(17, 3.14159265, None)


Functions for computing udiffs.

The data structure representing a udiff that these functions all manipulate is a pair of lists of iterators (left_lines, right_lines). These lists are expected (principally by generate_udiff_lines(), which processes them), to be of the same length. A pair of iterators (left_lines[i], right_lines[i]) may yield exactly the same sequence of output lines, each with ' ' as the first character (representing parts of the structure the input and output have in common). Alternatively, they may each yield zero or more lines (referring to parts of the structure that are unique to the inputs they represent). In this case, all lines yielded by left_lines[i] should begin with '-', and all lines yielded by right_lines[i] should begin with '+'.

class json_delta._udiff.Gap

Class to represent gaps introduced by sequence alignment.

json_delta._udiff.add_matter(seq, matter, indent)

Add material to seq, treating it appropriately for its type.

matter may be an iterator, in which case it is appended to seq. If it is a sequence, it is assumed to be a sequence of iterators, the sequence is concatenated onto seq. If matter is a string, it is turned into a patch band using single_patch_band(), which is appended. Finally, if matter is None, an empty iterable is appended to seq.

This function is a udiff-forming primitive, called by more specific functions defined within udiff_dict() and udiff_list().

json_delta._udiff.commafy(gen, comma=True)

Yields every result of gen, with a comma concatenated onto the final result iff comma is True.


Create partials of _add_common_matter, _add_differing_matter and _commafy_last, with values for left_lines, right_lines and (where appropriate) indent taken from the dictionary local_ns.

Appropriate defaults are also included in the partials, namely left=None and right=None for _add_differing_matter and left_comma=True and right_comma=None for _commafy_last.

json_delta._udiff.generate_udiff_lines(left, right)

Combine the diff lines from left and right, and generate the lines of the resulting udiff.

json_delta._udiff.patch_bands(indent, material, sigil=u' ')

Generate appropriately indented patch bands, with sigil as the first character.

json_delta._udiff.reconstruct_alignment(left, right, stanzas)

Reconstruct the sequence alignment between the lists left and right implied by stanzas.

json_delta._udiff.single_patch_band(indent, line, sigil=u' ')

Convenience function returning an iterable that generates a single patch band.

json_delta._udiff.udiff(left, right, patch=None, indent=0, entry=True)

Render the difference between the structures left and right as a string in a fashion inspired by diff -u.

Generating a udiff is strictly slower than generating a normal diff, since the udiff is computed on the basis of a normal diff between left and right. If such a diff has already been computed (e.g. by calling diff()), pass it as the patch parameter.

json_delta._udiff.udiff_dict(left, right, stanzas, indent=0)

Construct a pair of iterator sequences representing stanzas as a human-readable delta between dictionaries left and right.

This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_dict() are mutually recursive, anyway.

json_delta._udiff.udiff_list(left, right, stanzas, indent=0)

Construct a pair of iterator sequences representing stanzas as a delta between the lists left and right.

This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_list() are mutually recursive, anyway.


json_delta._upatch.ellipsis_handler(jstring, point, key)

Extends _util.key_tracker() to handle the ... construction.

json_delta._upatch.reconstruct_diff(udiff, reverse=False)

Turn a udiff back into a JSON-format diff.

Set reverse to True to generate a reverse diff (i.e. swap the significance of line-initial + and -).

Header lines (if present) are ignored: >>> udiff = “”“— <stdin> ... +++ <stdin> ... -false ... +true”“” >>> reconstruct_diff(udiff) [[[], True]] >>> reconstruct_diff(udiff, reverse=True) [[[], False]]

json_delta._upatch.udiff_key_tracker(udiff, point=0, start_key=None)

Find points within the udiff where the active keypath changes.

json_delta._upatch.upatch(struc, udiff, reverse=False)

Apply a patch of the form output by json_delta.udiff() to the structure struc.


Utility functions and constants used by all submodules.

json_delta._util._load_and_func(func, parm1=None, parm2=None, both=None, **flags)

Decode JSON-serialized parameters and apply func to them.


Generate key-paths to every node in struc.

Both terminal and non-terminal nodes are visited, like so:

>>> paths = [x for x in all_paths({'foo': None, 'bar': ['baz', 'quux']})]
>>> [] in paths # ([] is the path to struc itself.)
>>> ['foo'] in paths
>>> ['bar'] in paths
>>> ['bar', 0] in paths
>>> ['bar', 1] in paths
>>> len(paths)

Return diff (or True) if it is structured as a sequence of diff stanzas. Otherwise return False.

[] is a valid diff, so if it is passed to this function, the return value is True, so that the return value is always true in a Boolean context if diff is valid.

>>> check_diff_structure('This is certainly not a diff!')
>>> check_diff_structure([])
>>> check_diff_structure([None])
>>> example_valid_diff = [[["foo", 6, 12815316313, "bar"], None]]
>>> check_diff_structure(example_valid_diff) == example_valid_diff
>>> check_diff_structure([[["foo", 6, 12815316313, "bar"], None],
...                       [["foo", False], True]])

Compute the most compact possible JSON representation of the serializable object obj.

>>> test = {
...             'foo': 'bar',
...             'baz':
...                ['quux', 'spam',
...       'eggs']
... }
>>> compact_json_dumps(test) in (
...     '{"foo":"bar","baz":["quux","spam","eggs"]}',
...     '{"baz":["quux","spam","eggs"],"foo":"bar"}'
... )

Decode a JSON file-like object or string.

The following doctest is probably pointless as documentation. It is here so json_delta can claim 100% code coverage for its test suite!

>>> try:
...     from StringIO import StringIO
... except ImportError:
...     from io import StringIO
>>> foo = '[]'
>>> decode_json(foo)
>>> decode_json(StringIO(foo))
json_delta._util.follow_path(struc, path)

Return the value found at the key-path path within struc.


Should the keypath key point at a JSON array ([])?


Should the keypath key point at a JSON object ({})?

json_delta._util.in_one_level(diff, key)

Return the subset of diff whose key-paths begin with key, expressed relative to the structure at [key] (i.e. with the first element of each key-path stripped off).

>>> diff = [ [['bar'], None],
...          [['baz', 3], 'cheese'],
...          [['baz', 4, 'quux'], 'foo'] ]
>>> in_one_level(diff, 'baz') == [[[3], 'cheese'], [[4, 'quux'], 'foo']]
json_delta._util.key_tracker(jstring, point=0, start_key=None, special_handler=None)

Generate points within jstring where the keypath changes.

This function also identifies points within objects where a new key: value pair is expected, by yielding a pseudo-keypath with None as the final element.

  • jstring: The JSON string to search.
  • point: The point to start at.
  • start_key: The starting keypath.
  • special_handler: A function for handling extensions to JSON syntax (e.g. _upatch.ellipsis_handler(), used to handle the ... construction in udiffs).
>>> next(key_tracker('{}'))
(1, (None,))
json_delta._util.nearest_of(string, *subs)

Find the index of the substring in subs that occurs earliest in string, or len(string) if none of them do.

json_delta._util.skip_string(jstring, point)

Assuming jstring is a string, and jstring[point] is a " that starts a JSON string, return x such that jstring[x-1] is the " that terminates the string.

json_delta._util.uniquify(obj, key=<function <lambda>>)

Remove duplicate elements from a list while preserving order.