Python implementation notes
The Python implementation of JSON-delta consists of a package
json_delta, whose top-level namespace is documented in
The JSON-delta API. The implementation is divided into five sub-modules of
the package, whose names all begin with an underscore to highlight the
fact that they are not part of the API: the way the functions
documented in The JSON-delta API are implemented is subject to refactoring
at any time. Nevertheless, the sub-modules are documented here.
json_delta._diff
Functions for computing JSON-format diffs.
-
json_delta._diff.diff(left_struc, right_struc, array_align=True, compare_lengths=True, common_key_threshold=0.0, verbose=True, key=None)
Compose a sequence of diff stanzas sufficient to convert the structure left_struc into the structure right_struc. (Whether you can add ‘necessary and’ to ‘sufficient to’ depends on the setting of the other parms, and how many cycles you want to burn; see below.)
Optional parameters:
array_align: Use needle_diff() to compute deltas between arrays. Computationally expensive, but likely to produce shorter diffs. If this parm is set to the string 'udiff', needle_diff() will optimize for the shortest udiff instead of the shortest JSON-format diff. Otherwise, set it to any value that is true in a Boolean context to enable.
compare_lengths: If [[key, right_struc]] can be encoded as a shorter JSON string, return it instead of examining the internal structure of left_struc and right_struc. This involves calling json.dumps() twice for every node in the structure, but may result in smaller diffs.
common_key_threshold: Skip recursion into left_struc and right_struc if the fraction of keys they have in common (as computed by commonality(), which see) is less than this parm (which should be a float between 0.0 and 1.0).
verbose: Print compression statistics to stderr.
The parameter key is present because this function is mutually recursive with needle_diff() and keyset_diff(). If set to a list, it will be prefixed to every keypath in the output.
-
json_delta._diff.append_key(stanzas, left_struc, keypath=())
Get the appropriate key for appending to the sequence left_struc.
stanzas should be a diff, some of whose stanzas may modify a sequence left_struc that appears at path keypath. If any of the stanzas append to left_struc, the return value is the largest index in left_struc they address, plus one. Otherwise, the return value is len(left_struc) (i.e. the index that a value would have if it was appended to left_struc).
>>> append_key([], [])
0
>>> append_key([[[2], 'Baz']], ['Foo', 'Bar'])
3
>>> append_key([[[2], 'Baz'], [['Quux', 0], 'Foo']], [], ['Quux'])
1
-
json_delta._diff.commonality(left_struc, right_struc)
Return a float between 0.0 and 1.0 representing the amount that the structures left_struc and right_struc have in common.
Return value is computed as the fraction (elements in common) / (total elements).
-
json_delta._diff.compute_diff_stats(target, diff, percent=True)
Calculate the size of a minimal JSON dump of target and diff, and the ratio of the two sizes.
The ratio is expressed as a percentage if percent is true in a Boolean context, or as a float otherwise.
Return value is a tuple of the form ({ratio}, {size of target}, {size of diff}).
>>> compute_diff_stats([{}, 'foo', 'bar'], [], False)
(0.125, 16, 2)
>>> compute_diff_stats([{}, 'foo', 'bar'], [[0], {}])
(50.0, 16, 8)
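The figures in the doctests above can be reproduced with a few lines of standard-library code. The following is only a sketch, assuming that "minimal JSON dump" means json.dumps() with whitespace-free separators; sketch_diff_stats is a hypothetical name, not part of json_delta, and the division assumes Python 3.

```python
import json


def sketch_diff_stats(target, diff, percent=True):
    """Hypothetical stand-in for compute_diff_stats(); not the library's code."""
    # Assumption: the "minimal" dump is the one without optional whitespace.
    target_size = len(json.dumps(target, separators=(',', ':')))
    diff_size = len(json.dumps(diff, separators=(',', ':')))
    ratio = diff_size / target_size  # true division (Python 3)
    if percent:
        ratio *= 100
    return (ratio, target_size, diff_size)


print(sketch_diff_stats([{}, 'foo', 'bar'], [], False))
print(sketch_diff_stats([{}, 'foo', 'bar'], [[0], {}]))
```

The target serializes as [{},"foo","bar"] (16 characters), which is where the 16 in both results comes from.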
-
json_delta._diff.compute_keysets(left_seq, right_seq)
Compare the keys of left_seq vs. right_seq.
Determines which keys left_seq and right_seq have in common, and which are unique to each of the structures. Arguments should be instances of the same basic type, which must be a non-terminal: i.e. list or dict. If they are lists, the keys compared will be integer indices.
Returns:
Return value is a 3-tuple of sets ({overlap}, {left_only}, {right_only}). As their names suggest, overlap is a set of keys left_seq and right_seq have in common, left_only represents keys only found in left_seq, and right_only holds keys only found in right_seq.
Raises:
AssertionError if left_seq is not an instance of type(right_seq), or if they are not of a non-terminal type.
>>> (compute_keysets({'foo': None}, {'bar': None})
...  == (set([]), {'foo'}, {'bar'}))
True
>>> (compute_keysets({'foo': None, 'baz': None},
...                  {'bar': None, 'baz': None})
...  == ({'baz'}, {'foo'}, {'bar'}))
True
>>> (compute_keysets(['foo', 'baz'], ['bar', 'baz'])
...  == ({0, 1}, set([]), set([])))
True
>>> compute_keysets(['foo'], ['bar', 'baz']) == ({0}, set([]), {1})
True
>>> compute_keysets([], ['bar', 'baz']) == (set([]), set([]), {0, 1})
True
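The behaviour described above amounts to three set operations. The following is a minimal sketch under that reading; sketch_keysets is a hypothetical name and the real function may be implemented differently.

```python
def sketch_keysets(left_seq, right_seq):
    """Hypothetical re-creation of the documented compute_keysets() behaviour."""
    # Mirror the documented preconditions.
    assert isinstance(left_seq, type(right_seq))
    assert isinstance(left_seq, (list, dict))
    if isinstance(left_seq, dict):
        left_keys, right_keys = set(left_seq), set(right_seq)
    else:
        # For lists, the "keys" are the integer indices.
        left_keys = set(range(len(left_seq)))
        right_keys = set(range(len(right_seq)))
    return (left_keys & right_keys,
            left_keys - right_keys,
            right_keys - left_keys)


print(sketch_keysets(['foo'], ['bar', 'baz']))
```

Note that for lists only the positions are compared, not the values, which is why ['foo', 'baz'] and ['bar', 'baz'] overlap completely.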
-
json_delta._diff.keyset_diff(left_struc, right_struc, key, options={})
Return a diff between left_struc and right_struc.
It is assumed that left_struc and right_struc are both non-terminal types (serializable as arrays or objects). Sequences are treated just like mappings by this function, so the diffs will be correct but not necessarily minimal. For a minimal diff between two sequences, use needle_diff().
This function probably shouldn’t be called directly. Instead, use diff(), which will call keyset_diff() if appropriate anyway.
-
json_delta._diff.needle_diff(left_struc, right_struc, key, options={})
Return a diff between left_struc and right_struc.
If left_struc and right_struc are both serializable as arrays, this function will use a Needleman-Wunsch sequence alignment to find a minimal diff between them. Otherwise, the inputs are passed on to keyset_diff().
This function probably shouldn’t be called directly. Instead, use diff(), which is mutually recursive with this function and keyset_diff() anyway.
-
json_delta._diff.sort_stanzas(stanzas)
Sort the stanzas in a diff.
Object changes can occur in any order, but deletions from arrays have to happen last-node-first: ['foo', 'bar', 'baz'] → ['foo', 'bar'] → ['foo'] → []; additions to arrays have to happen leftmost-node-first: [] → ['foo'] → ['foo', 'bar'] → ['foo', 'bar', 'baz']; and insert-and-shift alterations to arrays must happen last: ['foo', 'quux'] → ['foo', 'bar', 'quux'] → ['foo', 'bar', 'baz', 'quux'].
Finally, stanzas are sorted in descending order of length of keypath, so that the most deeply-nested structures are altered before alterations which might change their keypaths take place.
Note that this will also sort changes to objects (dicts) so that they occur first of all.
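Why array deletions must run last-node-first can be seen without json_delta at all. The sketch below uses two hypothetical helpers (neither is part of the package) to delete the same original indices from a list in descending and in ascending order.

```python
def delete_last_first(seq, indices):
    """Delete the given original indices, highest index first."""
    result = list(seq)
    for index in sorted(indices, reverse=True):
        del result[index]
    return result


def delete_first_first(seq, indices):
    """Same deletions in ascending order: later indices have shifted by then."""
    result = list(seq)
    for index in sorted(indices):
        if index < len(result):
            del result[index]
    return result


original = ['foo', 'bar', 'baz', 'quux']
print(delete_last_first(original, [1, 2]))   # removes 'bar' and 'baz'
print(delete_first_first(original, [1, 2]))  # removes 'bar' and 'quux'
```

Deleting index 1 first moves 'baz' to index 1 and 'quux' to index 2, so the second deletion hits the wrong element; processing the highest index first leaves the lower indices valid.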
-
json_delta._diff.split_diff(stanzas)
Split a diff into modifications, deletions and insertions.
Return value is a 4-tuple of lists: the first is a list of stanzas from stanzas that modify JSON objects, the second is a list of stanzas that add or change elements in JSON arrays, the third is a list of stanzas which delete elements from arrays, and the fourth is a list of stanzas which insert elements into arrays (stanzas ending in "i").
-
json_delta._diff.structure_comparable(left_struc, right_struc)
Test if left_struc and right_struc can be efficiently diffed.
-
json_delta._diff.this_level_diff(left_struc, right_struc, key=None, common=None)
Return a sequence of diff stanzas between the structures left_struc and right_struc, assuming that they are each at the key-path key within the overall structure.
>>> (this_level_diff({'foo': 'bar', 'baz': 'quux'},
...                  {'foo': 'bar'})
...  == [[['baz']]])
True
>>> (this_level_diff({'foo': 'bar', 'baz': 'quux'},
...                  {'foo': 'bar'}, ['quordle'])
...  == [[['quordle', 'baz']]])
True
json_delta._patch
Functions for applying JSON-format patches.
-
json_delta._patch.patch(struc, diff, in_place=True)
Apply the sequence of diff stanzas diff to the structure struc.
By default, this function modifies struc in place; set in_place to False to return a patched copy of struc instead:
>>> will_change = [16]
>>> wont_change = [16]
>>> patch(will_change, [[[0]]])
[]
>>> will_change
[]
>>> patch(wont_change, [[[0]]], False)
[]
>>> wont_change
[16]
-
json_delta._patch.patch_stanza(struc, stanza)
Apply the stanza stanza to the structure struc as a patch.
Note that this function modifies struc in place into the target of stanza. If struc is a tuple(), you get a new tuple with the appropriate modification made:
>>> patch_stanza((17, 3.141593, None), [[1], 3.14159265])
(17, 3.14159265, None)
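To make the stanza shapes used on this page concrete, here is a rough sketch of applying one stanza to nested lists and dicts. It assumes the three shapes that appear in these notes: [keypath] deletes, [keypath, value] sets or appends, and [keypath, value, "i"] inserts and shifts. sketch_patch_stanza is a hypothetical name; it ignores tuples and error handling, which the real patch_stanza() deals with.

```python
def sketch_patch_stanza(struc, stanza):
    """Hypothetical, simplified stanza application for lists and dicts only."""
    keypath = stanza[0]
    if not keypath:
        # An empty keypath addresses the whole structure: replace it.
        return stanza[1]
    parent = struc
    for key in keypath[:-1]:
        parent = parent[key]          # walk down to the container to modify
    last = keypath[-1]
    if len(stanza) == 1:
        del parent[last]              # [keypath]: delete
    elif len(stanza) == 3 and stanza[2] == 'i':
        parent.insert(last, stanza[1])  # [keypath, value, "i"]: insert and shift
    elif isinstance(parent, list) and last == len(parent):
        parent.append(stanza[1])      # index one past the end: append
    else:
        parent[last] = stanza[1]      # [keypath, value]: set
    return struc


doc = {'foo': ['bar']}
sketch_patch_stanza(doc, [['foo', 1], 'baz'])
sketch_patch_stanza(doc, [['foo', 0], 'quux', 'i'])
print(doc)
```

As in the documented function, the structure is modified in place; the return value only matters when the whole structure is replaced.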
json_delta._udiff
Functions for computing udiffs. Main entry point: udiff().
The data structure representing a udiff that these functions all
manipulate is a pair of lists of iterators (left_lines,
right_lines). These lists are expected (principally by
generate_udiff_lines(), which processes them), to be of the
same length. A pair of iterators (left_lines[i], right_lines[i])
may yield exactly the same sequence of output lines, each with ' '
as the first character (representing parts of the structure the input
and output have in common). Alternatively, they may each yield zero
or more lines (referring to parts of the structure that are unique to
the inputs they represent). In this case, all lines yielded by
left_lines[i] should begin with '-', and all lines yielded by
right_lines[i] should begin with '+'.
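The (left_lines, right_lines) structure can be illustrated with a small merging generator. This is only a sketch of the idea, assuming that common bands are emitted once and differing bands are emitted as all '-' lines followed by all '+' lines; sketch_merge_bands is a hypothetical name and is not claimed to match generate_udiff_lines().

```python
def sketch_merge_bands(left_lines, right_lines):
    """Hypothetical merge of paired line iterators into udiff lines."""
    assert len(left_lines) == len(right_lines)
    for left_band, right_band in zip(left_lines, right_lines):
        left, right = list(left_band), list(right_band)
        if left == right:
            # Common material: both sides yield the same ' '-prefixed lines.
            for line in left:
                yield line
        else:
            # Differing material: '-' lines first, then '+' lines.
            for line in left:
                yield line
            for line in right:
                yield line


left_lines = [iter([' [']), iter(['-  1,']), iter(['   2']), iter([' ]'])]
right_lines = [iter([' [']), iter(['+  3,']), iter(['   2']), iter([' ]'])]
print('\n'.join(sketch_merge_bands(left_lines, right_lines)))
```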
-
json_delta._udiff.udiff(left, right, patch=None, indent=0, use_ellipses=True, entry=True)
Render the difference between the structures left and right as a string in a fashion inspired by diff -u.
Generating a udiff is strictly slower than generating a normal diff with the same option parameters, since the udiff is computed on the basis of a normal diff between left and right. If such a diff has already been computed (e.g. by calling diff()), pass it as the patch parameter:
>>> (next(udiff({"foo": None}, {"foo": None}, patch=[])) ==
...  ' {...}')
True
As you can see above, structures that are identical in left and right are abbreviated using '...' by default. To disable this behavior, set use_ellipses to False.
>>> ('\n'.join(udiff({"foo": None}, {"foo": None}, ... patch=[], use_ellipses=False)) == ... """ { ... "foo": ... null ... }""") True
>>> ('\n'.join(udiff([None, None, None], [None, None, None], ... patch=[], use_ellipses=False)) == ... """ [ ... null, ... null, ... null ... ]""") True
-
class json_delta._udiff.Gap
Class to represent gaps introduced by sequence alignment.
-
json_delta._udiff.add_matter(seq, matter, indent)
Add material to seq, treating it appropriately for its type.
matter may be an iterator, in which case it is appended to seq. If it is a sequence, it is assumed to be a sequence of iterators, and the sequence is concatenated onto seq. If matter is a string, it is turned into a patch band using single_patch_band(), which is appended. Finally, if matter is None, an empty iterable is appended to seq.
This function is a udiff-forming primitive, called by more specific functions defined within udiff_dict() and udiff_list().
-
json_delta._udiff.commafy(gen, comma=True)
Yield from gen, ensuring that the final result ends with a comma iff comma is True.
>>> gen = ['Example line']
>>> next(commafy(iter(gen))) == 'Example line,'
True
>>> next(commafy(iter(gen), False)) == 'Example line'
True
>>> gen = ['Line with a comma at the end,']
>>> (next(commafy(iter(gen), comma=True))
...  == next(commafy(iter(gen), comma=False))
...  == 'Line with a comma at the end,')
True
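A generator with this behaviour has to hold back one line, because it cannot know a line is the last until the input is exhausted. The sketch below shows one way to do that; sketch_commafy is a hypothetical name, and, in line with the doctests above, it only ever adds a comma and never strips one that is already present.

```python
def sketch_commafy(gen, comma=True):
    """Hypothetical one-line-lookahead version of the documented behaviour."""
    previous = None
    have_line = False
    for line in gen:
        if have_line:
            yield previous            # not the last line: pass through as-is
        previous, have_line = line, True
    if have_line:
        if comma and not previous.endswith(','):
            previous += ','           # add a trailing comma to the last line
        yield previous


print(list(sketch_commafy(iter(['{', '  "foo": null', '}']))))
```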
-
json_delta._udiff.curry_functions(local_ns)
Create partials of _add_common_matter(), _add_differing_matter() and _commafy_last(), with values for left_lines, right_lines and (where appropriate) indent taken from the dictionary local_ns.
Appropriate defaults are also included in the partials, namely left=None and right=None for _add_differing_matter() and left_comma=True and right_comma=None for _commafy_last().
-
json_delta._udiff.generate_udiff_lines(left, right)
Combine the diff lines from left and right, and generate the lines of the resulting udiff.
-
json_delta._udiff.patch_bands(indent, material, sigil=u' ')
Generate appropriately indented patch bands, with sigil as the first character.
-
json_delta._udiff.reconstruct_alignment(left, right, stanzas)
Reconstruct the sequence alignment between the lists left and right implied by stanzas.
-
json_delta._udiff.single_patch_band(indent, line, sigil=u' ')
Convenience function returning an iterable that generates a single patch band.
-
json_delta._udiff.udiff_dict(left, right, stanzas, indent=0, use_ellipses=True)
Construct a human-readable delta between left and right.
This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_dict() are mutually recursive, anyway.
-
json_delta._udiff.udiff_list(left, right, stanzas, indent=0, use_ellipses=True)
Construct a human-readable delta between left and right.
This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_list() are mutually recursive, anyway.
json_delta._upatch
-
json_delta._upatch.upatch(struc, udiff, reverse=False, in_place=True)
Apply a patch as output by json_delta.udiff() to struc.
As with json_delta.patch(), struc is modified in place by default. Set the parm in_place to False if this is not the desired behaviour.
The udiff format has enough information in it that this transformation can be applied in reverse: i.e. if udiff is the output of udiff(left, right), you can reconstruct right given left and udiff (by running upatch(left, udiff)), or you can also reconstruct left given right and udiff (by running upatch(right, udiff, reverse=True)). This is not possible for JSON-format diffs, since a [keypath] stanza (meaning “delete the structure at keypath”) does not record what the deleted structure was.
-
json_delta._upatch.ellipsis_handler(jstring, point, key)
Extends key_tracker() to handle the … construction.
-
json_delta._upatch.is_none_key(key)
Is the last element of key None?
-
json_delta._upatch.reconstruct_diff(udiff, reverse=False)
Turn a udiff back into a JSON-format diff.
Set reverse to True to generate a reverse diff (i.e. swap the significance of line-initial + and -).
Header lines (if present) are ignored:
>>> udiff = """--- <stdin>
... +++ <stdin>
... -false
... +true"""
>>> reconstruct_diff(udiff)
[[[], True]]
>>> reconstruct_diff(udiff, reverse=True)
[[[], False]]
-
json_delta._upatch.skip_key(point, key, origin, keys, predicate)
Find the next result in keys for which predicate(key) is False.
If none is found, or if key is already such a result, the return value is (point, key).
-
json_delta._upatch.sort_stanzas(stanzas)
Sort the stanzas in a diff.
reconstruct_diff() works on different assumptions from json_delta._diff.needle_diff() when it comes to stanzas altering arrays: keys in such stanzas relate to the element’s position within the array’s longest intermediate representation during the transformation (that is, after all insert-and-shifts and all appends, but before any deletions). This function sorts stanzas to reflect that order of operations.
As with json_delta._diff.sort_stanzas() (which see), stanzas are sorted for length so the most deeply-nested structures get their modifications first.
-
json_delta._upatch.udiff_key_tracker(udiff, point=0, start_key=None)
Find points within the udiff where the active keypath changes.
json_delta._util
Utility functions and constants used by more than one submodule.
The majority of python 2/3 compatibility shims also appear in this module.
-
json_delta._util.predicate_count(iterable, predicate=lambda x: True)
Count items x in iterable such that predicate(x).
The default predicate is lambda x: True, so predicate_count(iterable) will count the values generated by iterable. Note that if the iterable is a generator, this function will exhaust it, and if it is an infinite generator, this function will never return!
>>> predicate_count([True] * 16)
16
>>> predicate_count([True, True, False, True, True], lambda x: x)
4
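The documented behaviour, including the warning about generators, can be reproduced with a one-line sum. This is a sketch only; sketch_predicate_count is a hypothetical name.

```python
def sketch_predicate_count(iterable, predicate=lambda x: True):
    """Hypothetical equivalent: count the items for which predicate(x) is true."""
    return sum(1 for x in iterable if predicate(x))


evens = (n for n in range(10) if n % 2 == 0)
print(sketch_predicate_count(evens))  # consumes the generator
print(sketch_predicate_count(evens))  # nothing left to count
```

The second call shows the exhaustion the note above warns about: once counted, a generator yields nothing more.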
-
json_delta._util.uniquify(bytestring, key=lambda x: x)
Remove duplicate elements from a list while preserving order.
key works as for min(), max(), etc. in the standard library.
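Order-preserving de-duplication with a key function is commonly written with a "seen" set. The sketch below assumes that the values returned by key are hashable and that the first of several duplicates is the one kept; sketch_uniquify is a hypothetical name, not the library's code.

```python
def sketch_uniquify(items, key=lambda x: x):
    """Hypothetical order-preserving de-duplication; keeps first occurrences."""
    seen = set()
    result = []
    for item in items:
        marker = key(item)      # assumption: key(item) is hashable
        if marker not in seen:
            seen.add(marker)
            result.append(item)
    return result


print(sketch_uniquify([3, 1, 3, 2, 1]))
print(sketch_uniquify(['a', 'A', 'b'], key=str.lower))
```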
-
json_delta._util.sniff_encoding(bytestring, starts=JSON_STARTS, complete=True)
Determine the encoding of a UTF-x encoded string.
The argument starts must be a mapping of bytestrings the input can begin with onto the encoding that such a beginning would represent (see licit_starts() for a function that can build such a mapping).
The complete flag signifies whether the input represents the entire string: if it is set to False, the function will attempt to determine the encoding, but will raise a UnicodeError if it is ambiguous. For example, an input of b'\xff\xfe' could be the UTF-16 little-endian byte-order mark, or, if the input is incomplete, it could be the first two bytes of the UTF-32-LE BOM:
>>> sniff_encoding(b'\xff\xfe') == 'utf_16'
True
>>> sniff_encoding(b'\xff\xfe', complete=False)
Traceback (most recent call last):
...
UnicodeError: String encoding is ambiguous.
-
json_delta._util._load_and_func(func, parm1=None, parm2=None, both=None, **flags)
Decode JSON-serialized parameters and apply func to them.
-
json_delta._util.all_paths(struc)
Generate key-paths to every node in struc.
Both terminal and non-terminal nodes are visited, like so:
>>> paths = [x for x in all_paths({'foo': None, 'bar': ['baz', 'quux']})]
>>> [] in paths # ([] is the path to ``struc`` itself.)
True
>>> ['foo'] in paths
True
>>> ['bar'] in paths
True
>>> ['bar', 0] in paths
True
>>> ['bar', 1] in paths
True
>>> len(paths)
5
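A traversal like this is naturally written as a recursive generator. The following sketch (Python 3, hypothetical name sketch_all_paths) yields each node's path before descending into it; the order in which the real function yields paths is not documented here, so only membership and count are comparable.

```python
def sketch_all_paths(struc, prefix=None):
    """Hypothetical recursive generator over key-paths in nested lists/dicts."""
    path = [] if prefix is None else prefix
    yield path                                   # the node itself
    if isinstance(struc, dict):
        for key, value in struc.items():
            yield from sketch_all_paths(value, path + [key])
    elif isinstance(struc, list):
        for index, value in enumerate(struc):
            yield from sketch_all_paths(value, path + [index])


paths = list(sketch_all_paths({'foo': None, 'bar': ['baz', 'quux']}))
print(len(paths))
```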
-
json_delta._util.check_diff_structure(diff)
Return diff (or True) if it is structured as a sequence of diff stanzas. Otherwise return False.
[] is a valid diff, so if it is passed to this function, the return value is True, so that the return value is always true in a Boolean context if diff is valid.
>>> check_diff_structure('This is certainly not a diff!')
False
>>> check_diff_structure([])
True
>>> check_diff_structure([None])
False
>>> example_valid_diff = [[["foo", 6, 12815316313, "bar"], None]]
>>> check_diff_structure(example_valid_diff) == example_valid_diff
True
>>> check_diff_structure([[["foo", 6, 12815316313, "bar"], None],
...                       [["foo", False], True]])
False
-
json_delta._util.compact_json_dumps(obj)
Compute the most compact possible JSON representation of obj.
>>> test = {
...     'foo': 'bar',
...     'baz':
...         ['quux', 'spam',
...          'eggs']
... }
>>> compact_json_dumps(test) in (
...     '{"foo":"bar","baz":["quux","spam","eggs"]}',
...     '{"baz":["quux","spam","eggs"],"foo":"bar"}'
... )
True
-
json_delta._util.decode_json(file_or_str)
Decode a JSON file-like object or string.
The following doctest is probably pointless as documentation. It is here so json-delta can claim 100% code coverage for its test suite!
>>> try:
...     from StringIO import StringIO
... except ImportError:
...     from io import StringIO
>>> foo = '[]'
>>> decode_json(foo)
[]
>>> decode_json(StringIO(foo))
[]
-
json_delta._util.decode_udiff(file_or_str)
Decode a file-like object or bytestring udiff into a unicode string.
The udiff may be encoded in UTF-8, -16 or -32 (with or without BOM):
>>> udiff = u'- true\n+ false'
>>> decode_udiff(udiff.encode('utf_32_be')) == udiff
True
>>> try:
...     from StringIO import StringIO
... except ImportError:
...     from io import BytesIO as StringIO
>>> decode_udiff(StringIO(udiff.encode('utf-8-sig'))) == udiff
True
An empty string is a valid udiff; this function will convert it to a unicode string:
>>> decode_udiff(b'') == u''
True
The function is idempotent: if you pass it a unicode string, it will be returned unmodified:
>>> decode_udiff(udiff) is udiff
True
If you pass it a non-empty bytestring that cannot be interpreted as beginning with ' ', '+', '-' or a BOM in any encoding, a ValueError is raised:
>>> decode_udiff(b':-)')
Traceback (most recent call last):
...
ValueError: String does not begin with any of the specified start chars.
-
json_delta._util.follow_path(struc, path)
Retrieve the value found at the key-path path within struc.
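Following a key-path is repeated subscripting, as the sketch below shows. sketch_follow_path is a hypothetical name; what the real function does for a path that does not exist is not documented here, so the sketch simply lets the underlying KeyError or IndexError propagate.

```python
def sketch_follow_path(struc, path):
    """Hypothetical key-path lookup: subscript once per path element."""
    for key in path:
        struc = struc[key]   # dict key or list index, depending on the node
    return struc


doc = {'foo': None, 'bar': ['baz', 'quux']}
print(sketch_follow_path(doc, ['bar', 1]))
print(sketch_follow_path(doc, []) is doc)   # the empty path is the root
```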
-
json_delta._util.in_array(key, accept_None=False)
Should the keypath key point at a JSON array ([])?
Works by testing whether key[-1] is an int or (where appropriate) long:
>>> in_array([u'bar', 16])
True
>>> import sys
>>> sys.version >= '3' or eval("in_array([u'foo', 94L])")
True
Returns False if key addresses a non-array object…
>>> in_array(["foo"])
False
>>> in_array([u'bar'])
False
…or if key == [] (as in that case there’s no way of knowing whether key addresses an object or an array).
>>> in_array([])
False
If the accept_None flag is set, this function will not raise a ValueError if key[-1] is None (keypaths of this form are used by key_tracker(), to signal points within a JSON string where a new object key is expected, but not yet found).
>>> in_array([None])
Traceback (most recent call last):
...
ValueError: keypath elements must be instances of str, unicode, int or long, not NoneType (key[0] == None)
>>> in_array([None], True)
False
>>> in_array([None], accept_None=True)
False
Otherwise, a ValueError is raised if key is not a valid keypath:
>>> keypath = [{str("spam"): str("spam")}, "pickled eggs and spam", 7]
>>> in_array(keypath)
Traceback (most recent call last):
...
ValueError: keypath elements must be instances of str, unicode, int or long, not dict (key[0] == {'spam': 'spam'})
-
json_delta._util.in_object(key, accept_None=False)
Should the keypath key point at a JSON object ({})?
Works by testing whether key[-1] is a string or (where appropriate) unicode():
>>> in_object(["foo"])
True
>>> in_object([u'bar'])
True
Returns False if key addresses an array…
>>> in_object([u'bar', 16])
False
>>> import sys
>>> False if sys.version >= '3' else eval("in_object([u'bar', 16L])")
False
…or if key == []:
>>> in_object([])
False
If the accept_None flag is set, this function will also return True if key[-1] is None (this functionality is used by key_tracker(), to signal points within a JSON string where a new object key is expected, but not yet found).
>>> in_object([None])
Traceback (most recent call last):
...
ValueError: keypath elements must be instances of str, unicode, int or long, not NoneType (key[0] == None)
>>> in_object([None], True)
True
>>> in_object([None], accept_None=True)
True
Raises a ValueError if key is not a valid keypath:
>>> in_object(['foo', {}])
Traceback (most recent call last):
...
ValueError: keypath elements must be instances of str, unicode, int or long, not dict (key[1] == {})
>>> in_object([False, u'foo'])
Traceback (most recent call last):
...
ValueError: keypath elements must be instances of str, unicode, int or long, not bool (key[0] == False)
-
json_delta._util.in_x_error(key, offender)
Build the instance of ValueError that in_object() and in_array() raise if the keypath key is invalid.
-
json_delta._util.key_tracker(jstring, point=0, start_key=None, special_handler=None)
Generate points within jstring where the keypath changes.
This function also identifies points within objects where a new key: value pair is expected, by yielding a pseudo-keypath with None as the final element.
Parameters:
jstring: The JSON string to search.
point: The point to start at.
start_key: The starting keypath.
special_handler: A function for handling extensions to JSON syntax (e.g. _upatch.ellipsis_handler(), used to handle the ... construction in udiffs).
>>> next(key_tracker('{}'))
(1, (None,))
-
json_delta._util.licit_starts(start_chars=u'{}[]"-0123456789tfn \t\n\r')
Compute the bytestrings a UTF-x encoded string can begin with.
This function is intended for encoding detection when the beginning of the encoded string must be one of a limited set of characters, as for JSON or the udiff format. The argument start_chars must be an iterable of valid beginnings.
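One way to build such a table is to encode every permitted first character in each BOM-less UTF codec, then add the byte-order marks themselves. The sketch below does that with the standard codecs module. The name sketch_licit_starts, the choice of five codecs and the codec names used for BOM entries are assumptions made for illustration, not a description of the library's implementation.

```python
import codecs


def sketch_licit_starts(start_chars=u'{}[]"-0123456789tfn \t\n\r'):
    """Hypothetical table of leading byte sequences mapped to codec names."""
    encodings = ('utf_8', 'utf_16_le', 'utf_16_be', 'utf_32_le', 'utf_32_be')
    starts = {}
    for char in start_chars:
        for encoding in encodings:
            # These codecs emit no BOM, so the result is the bare character.
            starts[char.encode(encoding)] = encoding
    # A leading BOM selects the BOM-aware variant of the codec.
    starts[codecs.BOM_UTF8] = 'utf_8_sig'
    starts[codecs.BOM_UTF16_LE] = 'utf_16'
    starts[codecs.BOM_UTF16_BE] = 'utf_16'
    starts[codecs.BOM_UTF32_LE] = 'utf_32'
    starts[codecs.BOM_UTF32_BE] = 'utf_32'
    return starts


table = sketch_licit_starts()
print(table[b'\x00['], table[b'{\x00\x00\x00'], table[b'\xff\xfe'])
```

Because the UTF-16-LE BOM (b'\xff\xfe') is a prefix of the UTF-32-LE BOM (b'\xff\xfe\x00\x00'), a lookup on truncated input can be ambiguous.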
-
json_delta._util.nearest_of(string, *subs)
Find the index of the substring in subs that occurs earliest in string, or len(string) if none of them do.
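Read as "the position in string of whichever substring comes first", this behaviour can be sketched with str.find(). That reading is an assumption, and sketch_nearest_of is a hypothetical name.

```python
def sketch_nearest_of(string, *subs):
    """Hypothetical: earliest position in string of any of subs, else len(string)."""
    positions = [string.find(sub) for sub in subs]
    found = [pos for pos in positions if pos != -1]   # find() gives -1 on a miss
    return min(found) if found else len(string)


print(sketch_nearest_of('foo, bar; baz', ';', ','))
print(sketch_nearest_of('foo', ';', ','))
```

Returning len(string) on a miss lets the result be used directly as a slice bound: string[:result] is then the whole string.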
-
json_delta._util.read_bytestring(file)
Read the contents of file as a bytes object.
-
json_delta._util.skip_string(jstring, point)
Assuming jstring is a string, and jstring[point] is a " that starts a JSON string, return x such that jstring[x-1] is the " that terminates the string.
When a " is found, it is necessary to check that it is not escaped by a preceding backslash. As a backslash may itself be escaped, this amounts to checking that the number of backslashes immediately preceding the " is even (counting 0 as an even number):
>>> test_string = r'"Fred \"Foonly\" McQuux"'
>>> skip_string(test_string, 0) == len(test_string)
True
>>> backslash = chr(0x5c)
>>> dbl_quote = chr(0x22)
>>> even_slashes = ((r'"\\\\\\"', json.dumps(backslash * 3)),
...                 (r'"\\\\"', json.dumps(backslash * 2)),
...                 (r'"\\"', json.dumps(backslash)))
>>> all((json.loads(L) == json.loads(R) for (L, R) in even_slashes))
True
>>> all((skip_string(L, 0) == len(L) for (L, R) in even_slashes))
True
>>> def cat_dump(*args): return json.dumps(''.join(args))
>>> odd_slashes = (
...     (r'"\\\\\\\"  "', cat_dump(backslash * 3, dbl_quote, ' ' * 2)),
...     (r'"\\\\\"    "', cat_dump(backslash * 2, dbl_quote, ' ' * 4)),
...     (r'"\\\"      "', cat_dump(backslash * 1, dbl_quote, ' ' * 6)),
...     (r'"\"        "', cat_dump(dbl_quote, ' ' * 8)),
... )
>>> all((json.loads(L) == json.loads(R) for (L, R) in odd_slashes))
True
>>> all((skip_string(L, 0) == 12 for (L, R) in odd_slashes))
True
-
json_delta._util.stanzas_addressing(stanzas, keypath)
Find diff stanzas modifying the structure at keypath.
The purpose of this function is to keep track of changes made to the overall structure by stanzas earlier in the sequence, e.g.:
>>> struc = [
...     'foo',
...     'bar', [
...         'baz'
...     ]
... ]
>>> stanzas = [
...     [ [2, 1], 'quux'],
...     [ [0] ],
...     [ [1, 2], 'quordle']
... ]
>>> (stanzas_addressing(stanzas, [2])
...  == [
...      [ [1], 'quux' ],
...      [ [2], 'quordle' ]
...  ])
True
stanzas[0] and stanzas[2] both address the same element of struc (the list that starts off as ['baz']), even though their keypaths are completely different, because the diff stanza [[0]] moves the list ['baz'] from index 2 of struc to index 1.
The return value is a sub-diff: a list of stanzas fit to modify the element at keypath within the overall structure.