JSON-delta: a diff/patch pair for JSON-serialized data structures¶
JSON-delta is a multi-language software suite for computing deltas between JSON-serialized data structures, and applying those deltas as patches. It enables separate programs at either end of a communications channel (e.g. client and server over HTTP, or two processes using IPC) to manipulate the same data structure while minimizing communications overhead.
If you’re not clear on what this means, or you simply can’t see the point, see JSON-delta by example for a (somewhat whimsical) exposition of the basic idea.
If, on the other hand, you’re fully sold and want to know how to use one of the available implementations of JSON-delta, see The JSON-delta API, which documents the functions every implementation makes available. This is something like a “standard” for the JSON-delta “API” (I’ll consider that the software has outgrown these quotes when it gets more users than just me…). Since the Python implementation is the most developed, if you’re thinking in terms of standards, you can consider it the reference implementation.
Further documentation of individual implementations is also available, along with manpages for the CLI programs json_diff(1) and json_patch(1).
Donations to support the continuing development of JSON-delta will be gratefully received via gratipay, PayPal (himself@phil-roberts.name) or Bitcoin: 1HPJHRpVSm1Y4zrgppd2c6LysjxeabbQN4
News¶
- Bugfix release 1.1.3 is out. Serious bugs are addressed in this release, so Javascript users and anyone who relies on the upatch functionality should upgrade.
- Bugfix release 1.1.2 is out, featuring a sharper distinction between minimal and non-minimal diffs. Non-minimal diffs can now be called for on the command line by running json_diff --fast.
- Bugfix release 1.1.1 is out. Javascript users and anyone who uses udiffs should upgrade.
- The Heisenbug referred to below has been fixed, along with a more serious bug: in v1.0 of both the Python and Javascript implementations, diffs computed with the minimal flag set to True did not correctly encode more than one addition to a non-top-level array (that is, an array nested within one or more other arrays or objects). There was no official release of v1.0 for Python, but for ease of reference I’m calling both the fixed versions 1.1. It is recommended that all users upgrade.
- There is a Heisenbug in the udiff part of what I’m retroactively calling v1.0 of the Python implementation: it shows up some of the time for one test case, and then only when running python3. Due to this and other bugfixes, I recommend upgrading to v1.1.
- v1.0 of the Javascript implementation has now been released. For JSON-format diffs it is now every bit as capable as the Python version, and it has been extensively tested, not only against the JSON-delta test suite, but also with JSLint and JSHint.
Downloads¶
The Python implementation (compatible with Python 2.7 and later, including Python 3) is available from PyPI.
You can download the Javascript versions here (full, minified).
(Please, for the sake of my bandwidth bills, serve your own copy rather than hot-linking mine!)
The Racket and Perl implementations are alpha-quality at best. If you want to check them out, I recommend looking at the source repo (no pun intended): development of JSON-delta takes place against a master repository containing all implementations.
Implementations¶
The following table summarizes the feature-completeness of the available implementations of JSON-delta, with links to implementation-specific notes for each:
Language | Patch | Diff | Compact | U-patch | U-diff |
Python (2.7 and newer) | ✓ | ✓ | ✓ | ✓ | ✓ |
Javascript | ✓ | ✓ | ✓ | ✗ | ✗ |
Racket | ✓ | ✓ | ✗ | ✗ | ✗ |
The features described are as follows:
- Patch: The implementation can modify data structures according to a diff in JSON-delta format.
- Diff: The implementation can compute deltas between two data structures in JSON-delta format.
- Compact: Diffs produced by the implementation are as small as I can possibly make them, using a variant of Needleman-Wunsch sequence alignment to optimize stanzas modifying JSON arrays.
- U-diff: The implementation can emit diffs in a format reminiscent of the output of diff -u, which is designed to be more human-readable than the JSON format, to facilitate debugging.
- U-patch: The implementation can apply U-format patches.
The beginnings of a patch implementation in Perl can also be found in the source repo, but, because Perl doesn’t have as ramified a type ontology as Javascript, numeric values do not round-trip cleanly, so work on the Perl implementation is stalled for now.
Further Reading¶
JSON-delta by example¶
Consider the example JSON-LD entry for John Lennon from http://json-ld.org/:
{
"@context": "http://json-ld.org/contexts/person.jsonld",
"@id": "http://dbpedia.org/resource/John_Lennon",
"name": "John Lennon",
"born": "1940-10-09",
"spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
Suppose we have a piece of software that updates this record to show his date of death, like so:
{
"@context": "http://json-ld.org/contexts/person.jsonld",
"@id": "http://dbpedia.org/resource/John_Lennon",
"name": "John Lennon",
"born": "1940-10-09",
"died": "1980-12-07",
"spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
Further suppose that we wish to communicate this update to another piece of software whose only job is to store information about John Lennon in JSON-LD format. (Yes, I know this is getting unlikely, but stay with me.) If this Lennon-record-keeper accepts updates in JSON-delta format, all you have to do is send the following over the wire:
[[["died"],"1980-12-07"]]
This is a complete diff in JSON-delta format. It is itself a JSON-serializable data structure: specifically, it is a sequence of what I refer to as diff stanzas, for some reason. The format for a diff stanza is [<key path>, (<update>)] (the parentheses mean that the <update> part is optional; I’ll get to that in a minute). A key path is a sequence of keys specifying where in the data structure the node you want to alter is found, much like those emitted by JSON.sh. The stanza may be thought of as an instruction to update the node found at that path so that its content is equal to <update>.
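To make the key-path idea concrete, here is a minimal Python sketch of resolving a key path against a deserialized structure. The function name get_at_path is hypothetical and not part of the JSON-delta API; it simply indexes dicts by string keys and lists by integer indices, one level per key.

```python
from functools import reduce
from operator import getitem
from typing import Any, Sequence, Union

Key = Union[str, int]

def get_at_path(struc: Any, keypath: Sequence[Key]) -> Any:
    """Return the node found at keypath (hypothetical helper).

    Each key indexes one level deeper: strings index JSON objects
    (dicts), integers index JSON arrays (lists). An empty key path
    addresses the whole structure."""
    return reduce(getitem, keypath, struc)

record = {"name": "John Lennon", "born": "1940-10-09"}
print(get_at_path(record, ["born"]))                  # 1940-10-09
print(get_at_path([{}, {"foo": "bar"}], [1, "foo"]))  # bar
```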
Now, let’s do some more supposing. Suppose the software we’re communicating with is dedicated to storing information about the Beatles in general. Also, suppose we’ve remembered that it was actually on the 8th of December 1980 that John Lennon died, not the 7th. Finally, suppose we live in an Orwellian dystopia, and Cynthia Lennon has been declared a non-person who must be expunged from all records. Unfortunately, json-delta is incapable of overthrowing corrupt and despotic governments, so let’s make one last supposition, that what we’re interested in is updating the record kept by the software on the other end of the wire, which looks like this:
[
{
"@context": "http://json-ld.org/contexts/person.jsonld",
"@id": "http://dbpedia.org/resource/John_Lennon",
"name": "John Lennon",
"born": "1940-10-09",
"died": "1980-12-07",
"spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
},
{"name": "Paul McCartney"},
{"name": "George Harrison"},
{"name": "Ringo Starr"}
]
(Allegations of bias in favor of specific Beatles on the part of the maintainer of this record are punished by the aforementioned despotic government. All glory to Arstotzka!)
To make the changes we’ve decided on (correcting John’s date of death, and expunging Cynthia Lennon from the record), we need to send the following sequence:
[
[[0, "died"], "1980-12-08"],
[[0, "spouse"]]
]
Now, of course, you see what I meant when I said I’d tell you why <update> is optional later. If a stanza includes no update material, it is interpreted as an instruction to delete the node the key path points to.
Note also that there is no difference between a stanza that adds a node, and one that changes one.
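Putting the two kinds of stanza together, a patch can be applied by walking each key path down to the parent of the target node and then either deleting or assigning. The following is a minimal sketch under stated assumptions: the name apply_stanzas is hypothetical, stanzas are applied in the order given (the real implementations also reorder array operations; see sort_stanzas() in the Python implementation notes), and the input is deep-copied rather than modified in place.

```python
import copy
from typing import Any, List

def apply_stanzas(struc: Any, stanzas: List[list]) -> Any:
    """Return a patched deep copy of struc (minimal sketch, hypothetical name).

    A stanza [keypath, update] adds or replaces the node at keypath;
    a stanza [keypath] deletes it. Stanzas are applied in the order given."""
    result = copy.deepcopy(struc)
    for stanza in stanzas:
        keypath = stanza[0]
        if not keypath:
            # [[], target] substitutes target for the whole structure.
            result = copy.deepcopy(stanza[1])
            continue
        parent = result
        for key in keypath[:-1]:      # walk down to the node's parent
            parent = parent[key]
        last = keypath[-1]
        if len(stanza) == 1:
            del parent[last]          # no update material: delete the node
        elif isinstance(parent, list) and last == len(parent):
            parent.append(copy.deepcopy(stanza[1]))  # add just past the end of an array
        else:
            parent[last] = copy.deepcopy(stanza[1])  # add or change: same operation
    return result

beatles = [{"name": "John Lennon", "died": "1980-12-07",
            "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"},
           {"name": "Paul McCartney"}]
fixed = apply_stanzas(beatles, [[[0, "died"], "1980-12-08"], [[0, "spouse"]]])
print(fixed[0])  # {'name': 'John Lennon', 'died': '1980-12-08'}
```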
The intention is to save as much communications bandwidth as possible without sacrificing the ability to communicate arbitrary modifications to the data structure (this format can be used to describe a change from any JSON-serialized object into any other). The worst-case scenario, where there is no commonality between the two structures, is that the protocol adds seven octets of overhead, because a diff can always be expressed as [[[],<target>]], meaning “substitute <target> for the data structure that is to be modified”.
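The seven-octet figure is easy to check: the prefix [[[], is five characters and the suffix ]] is two, wrapped around the serialized target. The sketch below assumes compact serialization (no whitespace after separators) and UTF-8 output; the function name worst_case_overhead is hypothetical.

```python
import json
from typing import Any

COMPACT = (",", ":")  # no whitespace after item or key separators

def worst_case_overhead(target: Any) -> int:
    """Octets added by wrapping target as the diff [[[], target]]."""
    wrapped = json.dumps([[[], target]], separators=COMPACT)
    bare = json.dumps(target, separators=COMPACT)
    return len(wrapped.encode("utf-8")) - len(bare.encode("utf-8"))

print(worst_case_overhead({"name": "John Lennon"}))  # 7
print(worst_case_overhead([]))                       # 7
```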
The JSON-delta API¶
This document is intended to describe the behaviour of the main entry points for every implementation of JSON-delta. For now, it effectively documents the top-level namespace of the Python implementation, as that is the most fully-developed implementation in the suite.
Core functions¶
- json_delta.diff(left_struc, right_struc, minimal=None, verbose=True, key=None, array_align=True, compare_lengths=True, common_key_threshold=0.0)
  Compose a sequence of diff stanzas sufficient to convert the structure left_struc into the structure right_struc. (The goal is to add ‘necessary and’ to ‘sufficient’ above!)
  Optional parameters:
  - verbose: Print compression statistics to stderr, and warn if the setting of minimal contradicts the other parms.
  - array_align: Use _diff.needle_diff() to compute deltas between arrays. Relatively computationally expensive, but likely to produce shorter diffs. Defaults to True.
  - compare_lengths: If [[key, right_struc]] can be encoded as a shorter JSON string, return it instead of examining the internal structure of left_struc and right_struc. This involves calling json.dumps() twice for every node in the structure, but may result in smaller diffs. Defaults to True.
  - common_key_threshold: Skip recursion into left_struc and right_struc if the fraction of keys they have in common (with the same value) is less than this parm (which should be a float between 0.0 and 1.0). Defaults to 0.0.
  - minimal: Included for backwards compatibility. True is equivalent to (array_align=True, compare_lengths=True, common_key_threshold=0.0); False is equivalent to (array_align=False, compare_lengths=False, common_key_threshold=0.5). Specific settings of array_align, compare_lengths or common_key_threshold will supersede this parm, warning on stderr if verbose and minimal are both set.
  - key: Also included for backwards compatibility. If set, it will be prepended to the key in each stanza of the output.
  The parameter key is present because this function is mutually recursive with _diff.needle_diff() and _diff.keyset_diff(). If set to a list, it will be prefixed to every keypath in the output.
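The interaction between minimal and the three specific parameters can be summarized in code. This is a sketch under assumptions, not the library's logic: it uses None to mean "not set" (as the load_and_diff() signature does), treats an unset minimal as the documented defaults, and the names resolve_diff_options and PRESETS are hypothetical.

```python
import sys
from typing import Any, Dict, Optional

# Equivalences documented for the legacy minimal flag.
PRESETS: Dict[bool, Dict[str, Any]] = {
    True: {"array_align": True, "compare_lengths": True,
           "common_key_threshold": 0.0},
    False: {"array_align": False, "compare_lengths": False,
            "common_key_threshold": 0.5},
}

def resolve_diff_options(minimal: Optional[bool] = None, verbose: bool = True,
                         **explicit: Any) -> Dict[str, Any]:
    """Expand minimal into explicit settings; explicit values win (sketch)."""
    # An unset minimal falls back to the documented defaults (the True preset).
    options = dict(PRESETS[True if minimal is None else bool(minimal)])
    for name, value in explicit.items():
        if name not in options:
            raise TypeError(f"unexpected option {name!r}")
        if value is None:  # None is treated as "not set"
            continue
        if minimal is not None and verbose and value != options[name]:
            print(f"warning: {name}={value!r} overrides minimal={minimal!r}",
                  file=sys.stderr)
        options[name] = value
    return options

print(resolve_diff_options(minimal=False, verbose=False, array_align=True))
# {'array_align': True, 'compare_lengths': False, 'common_key_threshold': 0.5}
```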
- json_delta.patch(struc, diff, in_place=True)
  Apply the sequence of diff stanzas diff to the structure struc.
  By default, this function modifies struc in place; set in_place to False to return a patched copy of struc instead:
  >>> will_change = [16]
  >>> wont_change = [16]
  >>> patch(will_change, [[[0]]])
  []
  >>> will_change
  []
  >>> patch(wont_change, [[[0]]], False)
  []
  >>> wont_change
  [16]
- json_delta.udiff(left, right, patch=None, indent=0, use_ellipses=True, entry=True)
  Render the difference between the structures left and right as a string in a fashion inspired by diff -u.
  Generating a udiff is strictly slower than generating a normal diff with the same option parameters, since the udiff is computed on the basis of a normal diff between left and right. If such a diff has already been computed (e.g. by calling diff()), pass it as the patch parameter:
  >>> (next(udiff({"foo": None}, {"foo": None}, patch=[])) ==
  ... ' {...}')
  True
  As you can see above, structures that are identical in left and right are abbreviated using '...' by default. To disable this behavior, set use_ellipses to False.
  >>> ('\n'.join(udiff({"foo": None}, {"foo": None},
  ... patch=[], use_ellipses=False)) ==
  ... """ {
  ... "foo":
  ... null
  ... }""")
  True
  >>> ('\n'.join(udiff([None, None, None], [None, None, None],
  ... patch=[], use_ellipses=False)) ==
  ... """ [
  ... null,
  ... null,
  ... null
  ... ]""")
  True
- json_delta.upatch(struc, udiff, reverse=False, in_place=True)
  Apply a patch as output by json_delta.udiff() to struc.
  As with json_delta.patch(), struc is modified in place by default. Set the parm in_place to False if this is not the desired behaviour.
  The udiff format has enough information in it that this transformation can be applied in reverse: i.e. if udiff is the output of udiff(left, right), you can reconstruct right given left and udiff (by running upatch(left, udiff)), or you can also reconstruct left given right and udiff (by running upatch(right, udiff, reverse=True)). This is not possible for JSON-format diffs, since a [keypath] stanza (meaning “delete the structure at keypath”) does not record what the deleted structure was.
load_and_*¶
For convenience when handling input that is already JSON-serialized, implementations should offer entry points named load_and_{FUNC}, which deserialize their input and then apply {FUNC} to it.
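The load_and_{FUNC} pattern is essentially a thin deserializing wrapper. The sketch below shows one way to build such wrappers generically; load_and and toy_diff are hypothetical names, and toy_diff compares only the top level of two objects, standing in for the real diff().

```python
import io
import json
from typing import Any, Callable

def _load(source: Any) -> Any:
    """Deserialize a JSON string, or read and deserialize a file-like object."""
    if hasattr(source, "read"):
        return json.load(source)
    return json.loads(source)

def load_and(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap func so that it accepts serialized inputs (hypothetical helper).

    Pass either the two inputs separately, or a single JSON array of both
    via the keyword argument both; other keyword arguments go to func."""
    def wrapper(first: Any = None, second: Any = None, both: Any = None,
                **kwargs: Any) -> Any:
        if both is not None:
            first, second = _load(both)
        else:
            first, second = _load(first), _load(second)
        return func(first, second, **kwargs)
    return wrapper

def toy_diff(left: dict, right: dict) -> list:
    """Top-level-only stand-in for diff(): changed/added keys, then deletions."""
    stanzas = [[[k], v] for k, v in right.items() if k not in left or left[k] != v]
    stanzas += [[[k]] for k in left if k not in right]
    return stanzas

load_and_toy_diff = load_and(toy_diff)
print(load_and_toy_diff('{"foo":"bar"}', io.StringIO('{"foo":"baz"}')))
# [[['foo'], 'baz']]
print(load_and_toy_diff(both='[{"foo":"bar"},{"foo":"baz"}]'))
# [[['foo'], 'baz']]
```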
- json_delta.load_and_diff(left=None, right=None, both=None, array_align=None, compare_lengths=None, common_key_threshold=None, minimal=None, verbose=True)
  Apply diff() to strings or files representing JSON-serialized structures.
  Specify either left and right, or both, like so:
  >>> (load_and_diff('{"foo":"bar"}', '{"foo":"baz"}', verbose=False)
  ...  == [[["foo"],"baz"]])
  True
  >>> (load_and_diff(both='[{"foo":"bar"},{"foo":"baz"}]', verbose=False)
  ...  == [[["foo"],"baz"]])
  True
  left, right and both may be either strings (instances of basestring in 2.7) or file-like objects. minimal and verbose are passed through to diff(), which see.
  A call to this function with string arguments is strictly equivalent to calling diff(json.loads(left), json.loads(right), minimal=minimal, verbose=verbose) or diff(*json.loads(both), minimal=minimal, verbose=verbose), as appropriate.
- json_delta.load_and_patch(struc=None, stanzas=None, both=None)
  Apply patch() to strings or files representing JSON-serialized structures.
  Specify either struc and stanzas, or both, like so:
  >>> (load_and_patch('{"foo":"bar"}', '[[["foo"],"baz"]]') ==
  ...  {"foo": "baz"})
  True
  >>> (load_and_patch(both='[{"foo":"bar"},[[["foo"],"baz"]]]') ==
  ...  {"foo": "baz"})
  True
  struc, stanzas and both may be either strings (instances of basestring in 2.7) or file-like objects.
  A call to this function with string arguments is strictly equivalent to calling patch(json.loads(struc), json.loads(stanzas), in_place=in_place) or patch(*json.loads(both), in_place=in_place), as appropriate.
- json_delta.load_and_udiff(left=None, right=None, both=None, stanzas=None, indent=0)
  Apply udiff() to strings representing JSON-serialized structures.
  Specify either left and right, or both, like so:
  >>> udiff = """ {
  ... "foo":
  ... - "bar"
  ... + "baz"
  ... }"""
  >>> test = load_and_udiff('{"foo":"bar"}', '{"foo":"baz"}')
  >>> '\n'.join(test) == udiff
  True
  >>> test = load_and_udiff(both='[{"foo":"bar"},{"foo":"baz"}]')
  >>> '\n'.join(test) == udiff
  True
  left, right and both may be either strings (instances of basestring in 2.7) or file-like objects. stanzas and indent are passed through to udiff(), which see.
  A call to this function with string arguments is strictly equivalent to calling udiff(json.loads(left), json.loads(right), stanzas=stanzas, indent=indent) or udiff(*json.loads(both), stanzas=stanzas, indent=indent), as appropriate.
- json_delta.load_and_upatch(struc=None, json_udiff=None, both=None, reverse=False)
  Apply upatch() to strings representing JSON-serialized structures.
  Specify either struc and json_udiff, or both, like so:
  >>> struc = '{"foo":"bar"}'
  >>> json_udiff = r'" {\n \"foo\":\n- \"bar\"\n+ \"baz\"\n }"'
  >>> both = r'[{"foo":"baz"}," '\
  ...        r'{\n \"foo\":\n- \"bar\"\n+ \"baz\"\n }"]'
  >>> load_and_upatch(struc, json_udiff) == {"foo": "baz"}
  True
  >>> load_and_upatch(both=both, reverse=True) == {"foo": "bar"}
  True
  struc, json_udiff and both may be either strings (instances of basestring in 2.7) or file-like objects. Note that json_udiff is so named because it must be a JSON-serialized representation of the udiff string, not the udiff string itself. reverse is passed through to upatch(), which see.
  A call to this function with string arguments is strictly equivalent to calling upatch(json.loads(struc), json.loads(json_udiff), reverse=reverse, in_place=in_place) or upatch(*json.loads(both), reverse=reverse, in_place=in_place), as appropriate.
json_diff¶
Synopsis¶
json_diff [--output FILE] [--verbose] [--unified] [left] [right]
json_diff [--version]
json_diff [--help]
Description¶
json_diff produces deltas between JSON-serialized data structures. If no arguments are specified, stdin will be expected to be a JSON array [left, right], and the output will be written to stdout.
The default output is itself a JSON data structure, specifically an array of arrays of the form [<keypath>] or [<keypath>, <replacement>]. The companion program json_patch(1) can be used to apply such a diff.
A keypath is an array of string or integer tokens specifying a path to a terminal node in the data structure. For example, in the structure [{}, {"foo": "bar"}], the string "bar" appears at the node addressed by the key sequence [1, "foo"], and the empty object {} appears at key sequence [0].
If a diff stanza is an array of length 1, consisting only of a key sequence, json_patch(1) interprets it as an instruction to delete the node the key sequence points to. If a stanza is of length 2, the node is replaced by the last element of the stanza.
An alternative output format for json_diff is accessed using the --unified / -u option. This is designed to be more legible to the human eye, inspired by unified diffs as output by diff(1). json_patch(1) can read either format, and, since there is enough information in the format, can apply --unified patches in reverse.
json_diff will accept input in any of the encodings specified in RFC 7159, namely UTF-8, -16 or -32, with or without byte-order marks. The default encoding for output is UTF-8 with no BOM, but this can be changed using the --encoding option.
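Accepting all three encodings without being told which one is in use requires sniffing the input. A common approach, described in RFC 4627 (the predecessor of RFC 7159), checks for a byte-order mark and otherwise looks at the pattern of zero bytes at the start of the text. The sketch below illustrates that technique; it is not taken from the json_diff source, and detect_json_encoding is a hypothetical name.

```python
import codecs
import json

def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a serialized JSON text (hypothetical helper).

    A byte-order mark wins if present. Otherwise, because a JSON text
    starts with an ASCII character and never contains a raw NUL, the
    positions of zero bytes at the start identify the encoding."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Test the UTF-32 BOMs before UTF-16: FF FE 00 00 begins with FF FE.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"  # this codec reads and strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # likewise
    if len(data) >= 4:
        if data[0] == data[1] == data[2] == 0:
            return "utf-32-be"  # 00 00 00 xx
        if data[1] == data[2] == data[3] == 0:
            return "utf-32-le"  # xx 00 00 00
    if len(data) >= 2:
        if data[0] == 0:
            return "utf-16-be"  # 00 xx
        if data[1] == 0:
            return "utf-16-le"  # xx 00
    return "utf-8"

raw = '{"foo": "bar"}'.encode("utf-16-le")
print(json.loads(raw.decode(detect_json_encoding(raw))))  # {'foo': 'bar'}
```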
Options¶
--output FILE, -o FILE | Write output to FILE instead of stdout. |
--unified, -u | Write diffs in a more legible format, inspired by the output of diff -u. |
--encoding ENCODING | Select the encoding for the output. |
--verbose | Print compression statistics on stderr. |
--version | Show the program’s version number and exit. |
--help, -h | Show a brief help message and exit. |
Examples¶
$ json_diff << 'EOF'
> [{"foo": "bar"},
> {"foo": "bar",
> "baz": ["quux"]}]
> EOF
[[["baz"],["quux"]]]
$ cat > foofile << 'EOF'
> {"foods": ["spam", "spam", "spam", "spam"],
> "weaponry": "Mainly battleaxes.",
> "spanish inquisition expected": false,
> "drinks": "Delicious mead!",
> "other supplies": null}
> EOF
$ cat > barfile << 'EOF'
> {"foods": ["spam", "spam", "spam", "pickled eggs", "spam"],
> "weaponry": "Mainly battleaxes.",
> "spanish inquisition expected": false,
> "drinks": "Soda water."}
> EOF
$ json_diff -u foofile barfile
--- foofile 2014-04-14 21:32:00 BST
+++ barfile 2014-04-14 21:32:17 BST
{
"foods":
...
"weaponry": "Mainly battleaxes.",
["spam",
...(2),
+ "pickled eggs",
"spam"]
"drinks":
- "Delicious mead!",
+ "Soda water.",
- "other supplies": null
}
Implementation Notes¶
The value of the --encoding option in the Python implementation of json_diff is fed straight to the encode() function, so it is possible to get output in any encoding supported by the Python implementation used to run the script. This makes various mildly interesting things possible, like getting compressed output using --encoding bz2 or --encoding zlib, or even --encoding rot-13 (Furrfu!)
json_patch¶
Synopsis¶
json_patch [--output FILE] [--unified | --normal]
[--strip [NUM]] [--reverse] [originalfile] [patchfile]
json_patch [--version]
json_patch [--help]
Description¶
json_patch applies diffs in the format produced by json_diff(1) to JSON-serialized data structures.
The program attempts to mimic the interface of the patch(1) utility as far as possible, while also remaining compatible with the script functionality of the json_delta.py library on which it relies. There are, therefore, at least four different ways its input can be specified.
- The simplest, of course, is if the filenames are both specified as positional arguments.
- Closely following in terms of simplicity, the inputs can be fed as a JSON array [<structure>,<patch>] to standard input.
- If only one positional argument is specified, it is read as the filename of the original data structure, and the patch is expected to appear on stdin.
- Finally, if there are no positional arguments, and stdin cannot be parsed as JSON, it can alternatively be a udiff, as output by json_diff -u. In this case, json_patch will read the name of the file containing the structure to modify out of the first header line of the udiff (the one beginning with ---).
The most salient departure from the behavior of patch(1) is that, by default, json_patch will not modify files in place. Instead, the patched structure is written as JSON to stdout. Frankly, this is to save having to implement backup filename options, getting it wrong, and having angry hackers blame me for their lost data.
However, the input structure is read into memory before the output file handle is opened, so an in-place modification can be accomplished by setting the option --output to point to <originalfile>.
Also, note that json_diff and json_patch can only manipulate a single file at a time: even the output of json_diff -u is not a “unified” diff sensu stricto.
json_patch will accept input in any of the encodings specified in RFC 7159, namely UTF-8, -16 or -32, with or without byte-order marks. The default encoding for output is UTF-8 with no BOM, but this can be changed using the --encoding option.
Options¶
--output FILE, -o FILE | Write output to FILE instead of stdout. |
--unified, -u | Force the patch to be interpreted as a udiff. |
--normal, -n | Force the patch to be interpreted as a normal (i.e. JSON-format) patch. |
--reverse, -R | Assume the patch was created with old and new files swapped. |
--strip NUM, -p NUM | Strip NUM leading components from file names read out of udiff headers. |
--encoding ENCODING | Select the encoding for the output. |
--version | Show the program’s version number and exit. |
--help, -h | Show a brief help message and exit. |
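When the structure's filename is read out of a udiff header, --strip presumably behaves like the -p option of patch(1). The following sketch is an assumption modelled on patch(1), not json_patch's actual code: source_filename is a hypothetical name, and it takes the first whitespace-delimited token after --- as the filename, so names containing spaces are not handled.

```python
def source_filename(udiff_text: str, strip: int = 0) -> str:
    """Return the filename from the first '--- ' header line (hypothetical).

    strip removes that many leading '/'-separated components, in the
    manner of patch -p NUM."""
    for line in udiff_text.splitlines():
        if line.startswith("--- "):
            # The name is assumed to be the first token; a timestamp may follow.
            name = line[4:].split()[0]
            return "/".join(name.split("/")[strip:])
    raise ValueError("no '--- ' header line found")

header = "--- foofile 2014-04-14 21:32:00 BST\n+++ barfile 2014-04-14 21:32:17 BST\n"
print(source_filename(header))                               # foofile
print(source_filename("--- a/data/foo.json\t2014\n", strip=1))  # data/foo.json
```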
Udiff Format¶
The program has strict requirements for the format of “unified” diffs. It works by discarding header lines, then creating two strings: one by discarding every line beginning with -, then discarding the first character of every remaining line, and one following the same procedure, but with lines beginning with + discarded. For json_patch to function, these strings must be interpretable according to the following superset of the JSON spec:
- Within objects, the string ... may appear in any context where a "property": <object> construction would be valid JSON. This indicates that one or more properties have been omitted from the representation of the object.
- Within arrays, the string ... may appear as an array element. It may optionally be followed by an integer in parentheses, e.g. (1), (15). This indicates that that number of elements have been omitted from the array, or that one element has, if no parenthesized number is present.
The program reconstructs the JSON-format diff on the basis of these strings, and then applies it to the input structure.
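The two-string procedure described above is simple enough to sketch directly. This minimal version (split_udiff is a hypothetical name) assumes header lines start with '--- ' or '+++ ' and handles plain JSON only; a udiff containing ... elisions would additionally need a parser for the superset described above.

```python
import json
from typing import Tuple

def split_udiff(udiff_text: str) -> Tuple[str, str]:
    """Rebuild the 'before' and 'after' texts of a udiff (hypothetical name)."""
    lines = udiff_text.splitlines()
    # Discard the header lines, e.g. '--- foofile ...' and '+++ barfile ...'.
    while lines and lines[0].startswith(("--- ", "+++ ")):
        lines.pop(0)
    # 'before' drops added lines; 'after' drops removed lines. Every kept
    # line then loses its first character (' ', '-' or '+').
    before = [line[1:] for line in lines if not line.startswith("+")]
    after = [line[1:] for line in lines if not line.startswith("-")]
    return "\n".join(before), "\n".join(after)

udiff = ' {\n  "foo":\n-  "bar"\n+  "baz"\n }'
before, after = split_udiff(udiff)
print(json.loads(before), json.loads(after))  # {'foo': 'bar'} {'foo': 'baz'}
```

In this picture, applying a udiff in reverse (as --reverse does) amounts to swapping the roles of the two strings.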
Implementation Notes¶
The value of the --encoding option in the Python implementation of json_patch is fed straight to the encode() function, so it is possible to get output in any encoding supported by the Python implementation used to run the script. This makes various mildly interesting things possible, like getting compressed output using --encoding bz2 or --encoding zlib, or even --encoding rot-13 (Furrfu!)
json_cat¶
Synopsis¶
json_cat [FILE]...
Description¶
Concatenate FILE(s), or standard input, and write them to standard output as a JSON array.
Input streams are parsed as JSON if possible, otherwise they are added to the array as strings.
Output is always UTF-8 encoded.
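The parse-or-fall-back-to-string rule can be sketched in a few lines of Python. This is an illustration only: json_cat_texts is a hypothetical name, it takes already-read texts rather than filenames, and it strips just a single trailing newline from unparseable input; it does not attempt to reproduce how the real program treats newlines embedded in text (as in the second example under Examples).

```python
import json
from typing import Iterable

def json_cat_texts(texts: Iterable[str]) -> str:
    """Combine input texts into one JSON array (sketch of json_cat's rule)."""
    items = []
    for text in texts:
        try:
            items.append(json.loads(text))  # JSON input keeps its structure
        except ValueError:                  # json.JSONDecodeError subclasses ValueError
            items.append(text[:-1] if text.endswith("\n") else text)
    return json.dumps(items)

foofile = '{"foo": true, "bar": false,\n "baz": null}\n'
print(json_cat_texts([foofile, "This text cannot be parsed as JSON.\n"]))
# [{"foo": true, "bar": false, "baz": null}, "This text cannot be parsed as JSON."]
```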
Examples¶
$ echo '{"foo": true, "bar": false,
> "baz": null}' > foofile
$ json_cat foofile - << 'EOF'
> This text cannot be parsed as JSON.
> EOF
[{"foo": true, "bar": false, "baz": null}, "This text cannot be parsed as JSON."]
$ echo 'You can use json_cat to create 1-element JSON arrays of text,
> if that'\''s something you like to do...' | json_cat
["You can use json_cat to create 1-element JSON arrays of text, if that's something you like to do..."]
Python implementation notes¶
The Python implementation of JSON-delta consists of a package json_delta, whose top-level namespace is documented in The JSON-delta API. The implementation is divided into five sub-modules of the package, whose names all begin with an underscore to highlight the fact that they are not part of the API: the way the functions documented in The JSON-delta API are implemented is subject to refactoring at any time. Nevertheless, the sub-modules are documented here.
json_delta._diff¶
Functions for computing JSON-format diffs.
- json_delta._diff.diff(left_struc, right_struc, array_align=True, compare_lengths=True, common_key_threshold=0.0, verbose=True, key=None)
  Compose a sequence of diff stanzas sufficient to convert the structure left_struc into the structure right_struc. (Whether you can add ‘necessary and’ to ‘sufficient to’ depends on the setting of the other parms, and how many cycles you want to burn; see below.)
  Optional parameters:
  - array_align: Use needle_diff() to compute deltas between arrays. Computationally expensive, but likely to produce shorter diffs. If this parm is set to the string 'udiff', needle_diff() will optimize for the shortest udiff, instead of the shortest JSON-format diff. Otherwise, set to any value that is true in a Boolean context to enable.
  - compare_lengths: If [[key, right_struc]] can be encoded as a shorter JSON string, return it instead of examining the internal structure of left_struc and right_struc. This involves calling json.dumps() twice for every node in the structure, but may result in smaller diffs.
  - common_key_threshold: Skip recursion into left_struc and right_struc if the fraction of keys they have in common (as computed by commonality(), which see) is less than this parm (which should be a float between 0.0 and 1.0).
  - verbose: Print compression statistics to stderr.
  The parameter key is present because this function is mutually recursive with needle_diff() and keyset_diff(). If set to a list, it will be prefixed to every keypath in the output.
- json_delta._diff.append_key(stanzas, left_struc, keypath=())
  Get the appropriate key for appending to the sequence left_struc.
  stanzas should be a diff, some of whose stanzas may modify a sequence left_struc that appears at path keypath. If any of the stanzas append to left_struc, the return value is the largest index in left_struc they address, plus one. Otherwise, the return value is len(left_struc) (i.e. the index that a value would have if it was appended to left_struc).
  >>> append_key([], [])
  0
  >>> append_key([[[2], 'Baz']], ['Foo', 'Bar'])
  3
  >>> append_key([[[2], 'Baz'], [['Quux', 0], 'Foo']], [], ['Quux'])
  1
- json_delta._diff.commonality(left_struc, right_struc)
  Return a float between 0.0 and 1.0 representing the amount that the structures left_struc and right_struc have in common.
  Return value is computed as the fraction (elements in common) / (total elements).
- json_delta._diff.compute_diff_stats(target, diff, percent=True)
  Calculate the size of a minimal JSON dump of target and diff, and the ratio of the two sizes.
  The ratio is expressed as a percentage if percent is True in a Boolean context, or as a float otherwise.
  Return value is a tuple of the form ({ratio}, {size of target}, {size of diff}).
  >>> compute_diff_stats([{}, 'foo', 'bar'], [], False)
  (0.125, 16, 2)
  >>> compute_diff_stats([{}, 'foo', 'bar'], [[0], {}])
  (50.0, 16, 8)
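The doctest figures above can be reproduced by measuring compact JSON dumps. This sketch assumes that a "minimal JSON dump" means json.dumps() with separators=(',', ':') and that the ratio is the diff size over the target size; diff_stats is a hypothetical name, not the library function.

```python
import json
from typing import Any, Tuple

def diff_stats(target: Any, diff: Any, percent: bool = True) -> Tuple[float, int, int]:
    """Return (ratio, size of target, size of diff) using compact dumps."""
    compact = (",", ":")  # no whitespace after separators
    target_size = len(json.dumps(target, separators=compact))
    diff_size = len(json.dumps(diff, separators=compact))
    ratio = diff_size / target_size
    return (ratio * 100 if percent else ratio, target_size, diff_size)

print(diff_stats([{}, "foo", "bar"], [], False))  # (0.125, 16, 2)
print(diff_stats([{}, "foo", "bar"], [[0], {}]))  # (50.0, 16, 8)
```

Here '[{},"foo","bar"]' is 16 characters, '[]' is 2 and '[[0],{}]' is 8, which gives the two ratios.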
- json_delta._diff.compute_keysets(left_seq, right_seq)
  Compare the keys of left_seq vs. right_seq.
  Determines which keys left_seq and right_seq have in common, and which are unique to each of the structures. Arguments should be instances of the same basic type, which must be a non-terminal: i.e. list or dict. If they are lists, the keys compared will be integer indices.
  - Returns: a 3-tuple of sets ({overlap}, {left_only}, {right_only}). As their names suggest, overlap is a set of keys left_seq and right_seq have in common, left_only represents keys only found in left_seq, and right_only holds keys only found in right_seq.
  - Raises: AssertionError if left_seq is not an instance of type(right_seq), or if they are not of a non-terminal type.
  >>> (compute_keysets({'foo': None}, {'bar': None})
  ...  == (set([]), {'foo'}, {'bar'}))
  True
  >>> (compute_keysets({'foo': None, 'baz': None},
  ...                  {'bar': None, 'baz': None})
  ...  == ({'baz'}, {'foo'}, {'bar'}))
  True
  >>> (compute_keysets(['foo', 'baz'], ['bar', 'baz'])
  ...  == ({0, 1}, set([]), set([])))
  True
  >>> compute_keysets(['foo'], ['bar', 'baz']) == ({0}, set([]), {1})
  True
  >>> compute_keysets([], ['bar', 'baz']) == (set([]), set([]), {0, 1})
  True
- json_delta._diff.keyset_diff(left_struc, right_struc, key, options={})
  Return a diff between left_struc and right_struc.
  It is assumed that left_struc and right_struc are both non-terminal types (serializable as arrays or objects). Sequences are treated just like mappings by this function, so the diffs will be correct but not necessarily minimal. For a minimal diff between two sequences, use needle_diff().
  This function probably shouldn’t be called directly. Instead, use diff(), which will call keyset_diff() if appropriate anyway.
- json_delta._diff.needle_diff(left_struc, right_struc, key, options={})
  Returns a diff between left_struc and right_struc.
  If left_struc and right_struc are both serializable as arrays, this function will use a Needleman-Wunsch sequence alignment to find a minimal diff between them. Otherwise, the inputs are passed on to keyset_diff().
  This function probably shouldn’t be called directly. Instead, use diff(), which is mutually recursive with this function and keyset_diff() anyway.
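For readers unfamiliar with the technique, the following is a generic Needleman-Wunsch-style global alignment between two Python lists, written as the equivalent minimum-cost dynamic program with unit costs for substitutions, deletions and insertions. It is a sketch of the general technique only: the scoring that needle_diff() actually uses is not documented here, and align is a hypothetical name.

```python
from typing import Any, List, Optional, Tuple

Op = Tuple[str, Optional[int], Optional[int]]

def align(left: List[Any], right: List[Any]) -> List[Op]:
    """Globally align two lists (hypothetical helper, unit costs).

    Returns operations in left-to-right order: ("keep", i, j),
    ("change", i, j), ("delete", i, None) or ("insert", None, j),
    where i indexes left and j indexes right."""
    n, m = len(left), len(right)
    # cost[i][j] is the cheapest alignment of left[:i] with right[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if left[i - 1] == right[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # keep or change
                             cost[i - 1][j] + 1,        # delete left[i-1]
                             cost[i][j - 1] + 1)        # insert right[j-1]
    # Trace back from the bottom-right corner to recover one optimal path.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if left[i - 1] == right[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                ops.append(("keep" if sub == 0 else "change", i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("delete", i - 1, None))
            i -= 1
        else:
            ops.append(("insert", None, j - 1))
            j -= 1
    ops.reverse()
    return ops

left = ["spam", "spam", "spam", "spam"]
right = ["spam", "spam", "spam", "pickled eggs", "spam"]
print([op for op in align(left, right) if op[0] != "keep"])
# [('insert', None, 3)]
```

The point of aligning first is that the change can then be described as a single insertion, rather than as a rewrite of every array element from index 3 onward.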
- json_delta._diff.sort_stanzas(stanzas)¶
Sort the stanzas in a diff.
Object changes can occur in any order, but deletions from arrays have to happen last node first: ['foo', 'bar', 'baz'] → ['foo', 'bar'] → ['foo'] → []; additions to arrays have to happen leftmost node first: [] → ['foo'] → ['foo', 'bar'] → ['foo', 'bar', 'baz']; and insert-and-shift alterations to arrays must happen last: ['foo', 'quux'] → ['foo', 'bar', 'quux'] → ['foo', 'bar', 'baz', 'quux'].
Finally, stanzas are sorted in descending order of keypath length, so that the most deeply nested structures are altered before any alterations that might change their keypaths take place.
Note that this will also sort changes to objects (dicts) so that they occur first of all.
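Why these orderings matter can be checked with two throwaway helpers (hypothetical, not part of JSON-delta) that apply array deletions and additions one at a time, in the order given.

```python
def apply_deletions(seq, indices):
    """Delete result[i] for each i in indices, one at a time, in order."""
    result = list(seq)
    for i in indices:
        del result[i]
    return result


def apply_additions(seq, additions):
    """For each (i, value), set result[i] = value, appending when i is
    exactly one past the end; any larger index raises IndexError."""
    result = list(seq)
    for i, value in additions:
        if i == len(result):
            result.append(value)
        else:
            result[i] = value
    return result


# Deleting 'foo' and 'bar': last node first gives the intended result...
print(apply_deletions(['foo', 'bar', 'baz'], [1, 0]))   # ['baz']
# ...whereas first node first removes 'foo' and then 'baz'.
print(apply_deletions(['foo', 'bar', 'baz'], [0, 1]))   # ['bar']
# Additions work leftmost first; starting at index 2 on [] would raise IndexError.
print(apply_additions([], [(0, 'foo'), (1, 'bar'), (2, 'baz')]))
```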
- json_delta._diff.split_diff(stanzas)¶
Split a diff into modifications, deletions and insertions.
The return value is a 4-tuple of lists: the first is a list of stanzas from stanzas that modify JSON objects, the second is a list of stanzas that add or change elements in JSON arrays, the third is a list of stanzas which delete elements from arrays, and the fourth is a list of stanzas which insert elements into arrays (stanzas ending in "i").
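A minimal sketch of that classification, assuming the stanza shapes [keypath] (delete), [keypath, value] (add or change) and [keypath, value, "i"] (insert). The three-element insert shape is read from the phrase "stanzas ending in "i"" above, and split_diff_sketch() is a hypothetical name.

```python
def split_diff_sketch(stanzas):
    """Hypothetical sketch: return (object changes, array additions/changes,
    array deletions, array insertions) as four lists."""
    def addresses_array(keypath):
        # A final int key (but not a bool, which is an int subclass)
        # points into a JSON array.
        last = keypath[-1] if keypath else None
        return isinstance(last, int) and not isinstance(last, bool)

    objects, array_changes, array_deletions, array_inserts = [], [], [], []
    for stanza in stanzas:
        if not addresses_array(stanza[0]):
            objects.append(stanza)
        elif len(stanza) == 1:
            array_deletions.append(stanza)
        elif len(stanza) == 3 and stanza[2] == 'i':
            array_inserts.append(stanza)
        else:
            # Checking the length means a plain change whose value is "i"
            # is not mistaken for an insertion.
            array_changes.append(stanza)
    return objects, array_changes, array_deletions, array_inserts


print(split_diff_sketch([[['a'], 1], [[0], 'x'], [[3]], [[1], 'y', 'i']]))
```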
- json_delta._diff.structure_comparable(left_struc, right_struc)¶
Test if left_struc and right_struc can be efficiently diffed.
- json_delta._diff.this_level_diff(left_struc, right_struc, key=None, common=None)¶
Return a sequence of diff stanzas between the structures left_struc and right_struc, assuming that they are each at the key-path key within the overall structure.
>>> (this_level_diff({'foo': 'bar', 'baz': 'quux'},
...                  {'foo': 'bar'})
...  == [[['baz']]])
True
>>> (this_level_diff({'foo': 'bar', 'baz': 'quux'},
...                  {'foo': 'bar'}, ['quordle'])
...  == [[['quordle', 'baz']]])
True
json_delta._patch¶
Functions for applying JSON-format patches.
- json_delta._patch.patch(struc, diff, in_place=True)¶
Apply the sequence of diff stanzas diff to the structure struc.
By default, this function modifies struc in place; set in_place to False to return a patched copy of struc instead:
>>> will_change = [16]
>>> wont_change = [16]
>>> patch(will_change, [[[0]]])
[]
>>> will_change
[]
>>> patch(wont_change, [[[0]]], False)
[]
>>> wont_change
[16]
- json_delta._patch.patch_stanza(struc, stanza)¶
Apply the stanza stanza to the structure struc as a patch.
Note that this function modifies struc in place into the target of stanza. If struc is a tuple, you get a new tuple with the appropriate modification made:
>>> patch_stanza((17, 3.141593, None), [[1], 3.14159265])
(17, 3.14159265, None)
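To make the stanza semantics concrete, here is a hedged sketch of applying stanzas in sequence. apply_stanza() and apply_diff() are hypothetical names, the insert shape [keypath, value, "i"] is an assumption, and neither tuples nor in_place=False are handled (copying the input with copy.deepcopy() first would approximate the latter).

```python
def apply_stanza(struc, stanza):
    """Hypothetical sketch: apply one stanza in place and return the result.
    Assumed shapes: [keypath] deletes, [keypath, value] sets or appends,
    [keypath, value, "i"] inserts and shifts."""
    keypath = stanza[0]
    if not keypath:
        return stanza[1]              # an empty keypath replaces the whole structure
    parent = struc
    for k in keypath[:-1]:
        parent = parent[k]            # walk down to the container being changed
    last = keypath[-1]
    if len(stanza) == 1:
        del parent[last]
    elif len(stanza) == 3 and stanza[2] == 'i':
        parent.insert(last, stanza[1])
    elif isinstance(parent, list) and last == len(parent):
        parent.append(stanza[1])
    else:
        parent[last] = stanza[1]      # change an element, or add/change an object key
    return struc


def apply_diff(struc, diff):
    # Stanzas are applied strictly in sequence, so their order matters.
    for stanza in diff:
        struc = apply_stanza(struc, stanza)
    return struc


print(apply_diff(['foo', 'quux'], [[[1], 'bar', 'i'], [[2], 'baz', 'i']]))
# -> ['foo', 'bar', 'baz', 'quux']
```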
json_delta._udiff¶
Functions for computing udiffs. Main entry point: udiff().
The data structure representing a udiff that these functions all manipulate is a pair of lists of iterators (left_lines, right_lines). These lists are expected (principally by generate_udiff_lines(), which processes them) to be of the same length. A pair of iterators (left_lines[i], right_lines[i]) may yield exactly the same sequence of output lines, each with ' ' as the first character (representing parts of the structure the input and output have in common). Alternatively, they may each yield zero or more lines (referring to parts of the structure that are unique to the inputs they represent). In this case, all lines yielded by left_lines[i] should begin with '-', and all lines yielded by right_lines[i] should begin with '+'.
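The (left_lines, right_lines) convention can be sketched as follows. combine_bands() is a hypothetical stand-in for generate_udiff_lines(), not its actual code, and the line contents in the example are illustrative rather than real udiff layout.

```python
def combine_bands(left_lines, right_lines):
    """Hypothetical sketch: a shared band (identical ' ' lines on both
    sides) is emitted once; a differing band contributes its '-' lines
    followed by its '+' lines."""
    if len(left_lines) != len(right_lines):
        raise ValueError('left_lines and right_lines must be the same length')
    for left_band, right_band in zip(left_lines, right_lines):
        left_band, right_band = list(left_band), list(right_band)
        shared = (left_band == right_band
                  and all(line.startswith(' ') for line in left_band))
        yield from left_band
        if not shared:
            yield from right_band


left = [iter([' {']), iter(['-  "foo": 1']), iter([' }'])]
right = [iter([' {']), iter(['+  "foo": 2']), iter([' }'])]
print('\n'.join(combine_bands(left, right)))
```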
- json_delta._udiff.udiff(left, right, patch=None, indent=0, use_ellipses=True, entry=True)¶
Render the difference between the structures left and right as a string in a fashion inspired by diff -u.
Generating a udiff is strictly slower than generating a normal diff with the same option parameters, since the udiff is computed on the basis of a normal diff between left and right. If such a diff has already been computed (e.g. by calling diff()), pass it as the patch parameter:
>>> (next(udiff({"foo": None}, {"foo": None}, patch=[])) ==
...  ' {...}')
True
As you can see above, structures that are identical in left and right are abbreviated using '...' by default. To disable this behavior, set use_ellipses to False.
>>> ('\n'.join(udiff({"foo": None}, {"foo": None}, ... patch=[], use_ellipses=False)) == ... """ { ... "foo": ... null ... }""") True
>>> ('\n'.join(udiff([None, None, None], [None, None, None], ... patch=[], use_ellipses=False)) == ... """ [ ... null, ... null, ... null ... ]""") True
- class json_delta._udiff.Gap¶
Class to represent gaps introduced by sequence alignment.
- json_delta._udiff.add_matter(seq, matter, indent)¶
Add material to seq, treating it appropriately for its type.
matter may be an iterator, in which case it is appended to seq. If it is a sequence, it is assumed to be a sequence of iterators, and the sequence is concatenated onto seq. If matter is a string, it is turned into a patch band using single_patch_band(), which is appended. Finally, if matter is None, an empty iterable is appended to seq.
This function is a udiff-forming primitive, called by more specific functions defined within udiff_dict() and udiff_list().
- json_delta._udiff.commafy(gen, comma=True)¶
Yield from gen, ensuring that the final line ends with a comma if comma is True. A comma that is already present is never removed, even if comma is False:
>>> gen = ['Example line']
>>> next(commafy(iter(gen))) == 'Example line,'
True
>>> next(commafy(iter(gen), False)) == 'Example line'
True
>>> gen = ['Line with a comma at the end,']
>>> (next(commafy(iter(gen), comma=True))
...  == next(commafy(iter(gen), comma=False))
...  == 'Line with a comma at the end,')
True
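Because the last line is only known once the input is exhausted, a commafy-style generator has to look one line ahead. A hedged sketch with a hypothetical name, following the behaviour the doctests show:

```python
def commafy_sketch(lines, comma=True):
    """Hypothetical sketch: yield every line unchanged, except that the
    last one gets a trailing ',' if comma is true and it lacks one."""
    sentinel = object()
    previous = sentinel
    for line in lines:
        if previous is not sentinel:
            yield previous            # safe to emit: it is not the last line
        previous = line
    if previous is not sentinel:
        if comma and not previous.endswith(','):
            previous += ','
        yield previous


print(list(commafy_sketch(iter(['"foo": null', '"bar": true']))))
# -> ['"foo": null', '"bar": true,']
```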
- json_delta._udiff.curry_functions(local_ns)¶
Create partials of _add_common_matter(), _add_differing_matter() and _commafy_last(), with values for left_lines, right_lines and (where appropriate) indent taken from the dictionary local_ns.
Appropriate defaults are also included in the partials, namely left=None and right=None for _add_differing_matter(), and left_comma=True and right_comma=None for _commafy_last().
- json_delta._udiff.generate_udiff_lines(left, right)¶
Combine the diff lines from left and right, and generate the lines of the resulting udiff.
- json_delta._udiff.patch_bands(indent, material, sigil=u' ')¶
Generate appropriately indented patch bands, with sigil as the first character.
- json_delta._udiff.reconstruct_alignment(left, right, stanzas)¶
Reconstruct the sequence alignment between the lists left and right implied by stanzas.
- json_delta._udiff.single_patch_band(indent, line, sigil=u' ')¶
Convenience function returning an iterable that generates a single patch band.
- json_delta._udiff.udiff_dict(left, right, stanzas, indent=0, use_ellipses=True)¶
Construct a human-readable delta between left and right.
This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_dict() are mutually recursive, anyway.
- json_delta._udiff.udiff_list(left, right, stanzas, indent=0, use_ellipses=True)¶
Construct a human-readable delta between left and right.
This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_list() are mutually recursive, anyway.
json_delta._upatch¶
- json_delta._upatch.upatch(struc, udiff, reverse=False, in_place=True)¶
Apply a patch as output by json_delta.udiff() to struc.
As with json_delta.patch(), struc is modified in place by default. Set the parm in_place to False if this is not the desired behaviour.
The udiff format has enough information in it that this transformation can be applied in reverse: i.e. if udiff is the output of udiff(left, right), you can reconstruct right given left and udiff (by running upatch(left, udiff)), or you can also reconstruct left given right and udiff (by running upatch(right, udiff, reverse=True)). This is not possible for JSON-format diffs, since a [keypath] stanza (meaning “delete the structure at keypath”) does not record what the deleted structure was.
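The reversibility argument can be made concrete with a small sketch (a hypothetical helper, not upatch() itself). It assumes header lines have been stripped and that the udiff was generated with use_ellipses=False, since an elided '...' region carries no content to recover; the line layout in the example is illustrative.

```python
import json


def sides_of_udiff(udiff_lines):
    """Hypothetical sketch: rebuild the text of both inputs from udiff body
    lines. ' ' lines belong to both sides, '-' lines to the left only and
    '+' lines to the right only."""
    left, right = [], []
    for line in udiff_lines:
        sigil, body = line[:1], line[1:]
        if sigil in (' ', '-'):
            left.append(body)
        if sigil in (' ', '+'):
            right.append(body)
    return '\n'.join(left), '\n'.join(right)


lines = [' {', '  "foo":', '-   1', '+   2', ' }']
left_text, right_text = sides_of_udiff(lines)
print(json.loads(left_text), json.loads(right_text))  # {'foo': 1} {'foo': 2}
```

A JSON-format diff for the same change, [[["foo"], 2]], keeps only the new value, so the left-hand side cannot be recovered from it.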
- json_delta._upatch.ellipsis_handler(jstring, point, key)¶
Extends key_tracker() to handle the ... construction.
- json_delta._upatch.is_none_key(key)¶
Is the last element of key None?
- json_delta._upatch.reconstruct_diff(udiff, reverse=False)¶
Turn a udiff back into a JSON-format diff.
Set reverse to True to generate a reverse diff (i.e. swap the significance of line-initial + and -).
Header lines (if present) are ignored:
>>> udiff = """--- <stdin>
... +++ <stdin>
... -false
... +true"""
>>> reconstruct_diff(udiff)
[[[], True]]
>>> reconstruct_diff(udiff, reverse=True)
[[[], False]]
- json_delta._upatch.skip_key(point, key, origin, keys, predicate)¶
Find the next result in keys for which predicate(key) is False.
If none is found, or if key is already such a result, the return value is (point, key).
- json_delta._upatch.sort_stanzas(stanzas)¶
Sort the stanzas in a diff.
reconstruct_diff() works on different assumptions from json_delta._diff.needle_diff() when it comes to stanzas altering arrays: keys in such stanzas relate to the element’s position within the array’s longest intermediate representation during the transformation (that is, after all insert-and-shifts and all appends, but before any deletions). This function sorts stanzas to reflect that order of operations.
As with json_delta._diff.sort_stanzas() (which see), stanzas are sorted by keypath length so that the most deeply nested structures get their modifications first.
- json_delta._upatch.udiff_key_tracker(udiff, point=0, start_key=None)¶
Find points within the udiff where the active keypath changes.
json_delta._util¶
Utility functions and constants used by more than one submodule.
Most of the Python 2/3 compatibility shims also appear in this module.
- json_delta._util.predicate_count(iterable, predicate=lambda x: True)¶
Count items x in iterable such that predicate(x) is true.
The default predicate is lambda x: True, so predicate_count(iterable) will count the values generated by iterable. Note that if the iterable is a generator, this function will exhaust it, and if it is an infinite generator, this function will never return!
>>> predicate_count([True] * 16)
16
>>> predicate_count([True, True, False, True, True], lambda x: x)
4
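A one-line equivalent (hypothetical name) makes the exhaustion caveat concrete, since sum() has to consume every item:

```python
def predicate_count_sketch(iterable, predicate=lambda x: True):
    # sum() consumes the iterable, so a generator is exhausted afterwards
    # (and an infinite generator would never finish).
    return sum(1 for x in iterable if predicate(x))


evens = (n for n in range(10) if n % 2 == 0)
print(predicate_count_sketch(evens))   # 5
print(predicate_count_sketch(evens))   # 0: the generator is already exhausted
```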
- json_delta._util.uniquify(bytestring, key=lambda x: x)¶
Remove duplicate elements from a list while preserving order.
key works as for min(), max(), etc. in the standard library.
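An order-preserving de-duplication with a key function might look like this. It is a sketch with a hypothetical name; it keeps seen keys in a list rather than a set so that unhashable keys such as JSON arrays still work, at quadratic cost.

```python
def uniquify_sketch(items, key=lambda x: x):
    """Hypothetical sketch: drop later duplicates (as judged by key),
    keeping the first occurrence of each and the original order."""
    seen_keys = []                    # a list, so unhashable keys still work
    result = []
    for item in items:
        k = key(item)
        if k not in seen_keys:
            seen_keys.append(k)
            result.append(item)
    return result


print(uniquify_sketch([3, 1, 3, 2, 1]))                  # [3, 1, 2]
print(uniquify_sketch(['a', 'B', 'A', 'b'], str.lower))  # ['a', 'B']
print(uniquify_sketch([[1], [1], [2]]))                  # [[1], [2]]
```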
- json_delta._util.sniff_encoding(bytestring, starts=JSON_STARTS, complete=True)¶
Determine the encoding of a UTF-x encoded string.
The argument starts must be a mapping of bytestrings the input can begin with onto the encoding that such a beginning would represent (see licit_starts() for a function that can build such a mapping).
The complete flag signifies whether the input represents the entire string: if it is set to False, the function will attempt to determine the encoding, but will raise a UnicodeError if it is ambiguous. For example, an input of b'\xff\xfe' could be the UTF-16 little-endian byte-order mark, or, if the input is incomplete, it could be the first two bytes of the UTF-32-LE BOM:
>>> sniff_encoding(b'\xff\xfe') == 'utf_16'
True
>>> sniff_encoding(b'\xff\xfe', complete=False)
Traceback (most recent call last):
    ...
UnicodeError: String encoding is ambiguous.
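The longest-prefix idea behind sniff_encoding() can be sketched as follows. sniff_sketch() and its small STARTS table are hypothetical, and the ambiguity rule (an incomplete input that is a proper prefix of a longer known start) is an assumption consistent with the b'\xff\xfe' example above.

```python
def sniff_sketch(data, starts, complete=True):
    """Hypothetical sketch: return the encoding for the longest key of
    starts that data begins with. If data may be incomplete and is a proper
    prefix of some longer key, more bytes could change the answer."""
    matches = [prefix for prefix in starts if data.startswith(prefix)]
    if not matches:
        raise ValueError('no known start sequence matches')
    if not complete and any(len(p) > len(data) and p.startswith(data) for p in starts):
        raise UnicodeError('String encoding is ambiguous.')
    return starts[max(matches, key=len)]


# A hand-picked subset of a start table: BOMs plus '[' in each UTF-x codec.
STARTS = {
    b'\xef\xbb\xbf': 'utf_8_sig',
    b'\xff\xfe': 'utf_16', b'\xfe\xff': 'utf_16',
    b'\xff\xfe\x00\x00': 'utf_32', b'\x00\x00\xfe\xff': 'utf_32',
    b'[': 'utf_8',
    b'[\x00': 'utf_16_le', b'\x00[': 'utf_16_be',
    b'[\x00\x00\x00': 'utf_32_le', b'\x00\x00\x00[': 'utf_32_be',
}

print(sniff_sketch('[1]'.encode('utf_32_le'), STARTS))  # utf_32_le
print(sniff_sketch(b'\xff\xfe', STARTS))                # utf_16
```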
- json_delta._util._load_and_func(func, parm1=None, parm2=None, both=None, **flags)¶
Decode JSON-serialized parameters and apply func to them.
- json_delta._util.all_paths(struc)¶
Generate key-paths to every node in struc.
Both terminal and non-terminal nodes are visited, like so:
>>> paths = [x for x in all_paths({'foo': None, 'bar': ['baz', 'quux']})]
>>> [] in paths  # ([] is the path to struc itself.)
True
>>> ['foo'] in paths
True
>>> ['bar'] in paths
True
>>> ['bar', 0] in paths
True
>>> ['bar', 1] in paths
True
>>> len(paths)
5
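A recursive generator producing the same five paths as the doctest above. The name is hypothetical, and the depth-first order is an assumption, since the doctest only checks membership.

```python
def all_paths_sketch(struc, prefix=None):
    """Hypothetical sketch: yield the keypath of every node in struc,
    depth first, beginning with [] for struc itself."""
    path = list(prefix or [])
    yield path
    if isinstance(struc, dict):
        children = struc.items()
    elif isinstance(struc, list):
        children = enumerate(struc)
    else:
        return                        # terminal node: no children
    for key, child in children:
        # path + [key] builds a new list, so earlier yields are never mutated.
        yield from all_paths_sketch(child, path + [key])


print(list(all_paths_sketch({'foo': None, 'bar': ['baz', 'quux']})))
# -> [[], ['foo'], ['bar'], ['bar', 0], ['bar', 1]]
```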
- json_delta._util.check_diff_structure(diff)¶
Return diff (or True) if it is structured as a sequence of diff stanzas. Otherwise return False.
[] is a valid diff, but it is false in a Boolean context, so if it is passed to this function the return value is True instead; this way, the return value is always true in a Boolean context if diff is valid.
>>> check_diff_structure('This is certainly not a diff!')
False
>>> check_diff_structure([])
True
>>> check_diff_structure([None])
False
>>> example_valid_diff = [[["foo", 6, 12815316313, "bar"], None]]
>>> check_diff_structure(example_valid_diff) == example_valid_diff
True
>>> check_diff_structure([[["foo", 6, 12815316313, "bar"], None],
...                       [["foo", False], True]])
False
- json_delta._util.compact_json_dumps(obj)¶
Compute the most compact possible JSON representation of obj.
>>> test = {
...     'foo': 'bar',
...     'baz':
...         ['quux', 'spam',
...          'eggs']
... }
>>> compact_json_dumps(test) in (
...     '{"foo":"bar","baz":["quux","spam","eggs"]}',
...     '{"baz":["quux","spam","eggs"],"foo":"bar"}'
... )
True
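In Python, the compact form can be obtained from the standard json module by dropping the separator whitespace. This is a sketch with a hypothetical name, not necessarily how compact_json_dumps() is written.

```python
import json


def compact_dumps_sketch(obj):
    # separators=(',', ':') removes the spaces json.dumps() otherwise
    # emits after ',' and ':'.
    return json.dumps(obj, separators=(',', ':'))


print(compact_dumps_sketch({'foo': 'bar', 'baz': ['quux', 'spam', 'eggs']}))
# -> {"foo":"bar","baz":["quux","spam","eggs"]}
```

One design choice to be aware of: json.dumps() escapes non-ASCII characters by default, so 'é' takes six characters between the quotes; passing ensure_ascii=False would shorten such output, but whether JSON-delta does so is not stated here.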
- json_delta._util.decode_json(file_or_str)¶
Decode a JSON file-like object or string.
The following doctest is probably pointless as documentation. It is here so json-delta can claim 100% code coverage for its test suite!
>>> try:
...     from StringIO import StringIO
... except ImportError:
...     from io import StringIO
>>> foo = '[]'
>>> decode_json(foo)
[]
>>> decode_json(StringIO(foo))
[]
- json_delta._util.decode_udiff(file_or_str)¶
Decode a file-like object or bytestring udiff into a unicode string.
The udiff may be encoded in UTF-8, -16 or -32 (with or without BOM):
>>> udiff = u'- true\n+ false'
>>> decode_udiff(udiff.encode('utf_32_be')) == udiff
True
>>> try:
...     from StringIO import StringIO
... except ImportError:
...     from io import BytesIO as StringIO
>>> decode_udiff(StringIO(udiff.encode('utf-8-sig'))) == udiff
True
An empty string is a valid udiff; this function will convert it to a unicode string:
>>> decode_udiff(b'') == u''
True
The function is idempotent: if you pass it a unicode string, it will be returned unmodified:
>>> decode_udiff(udiff) is udiff
True
If you pass it a non-empty bytestring that cannot be interpreted as beginning with ' ', '+', '-' or a BOM in any encoding, a ValueError is raised:
>>> decode_udiff(b':-)')
Traceback (most recent call last):
    ...
ValueError: String does not begin with any of the specified start chars.
- json_delta._util.follow_path(struc, path)¶
Retrieve the value found at the key-path path within struc.
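Following a key-path amounts to indexing one key at a time, which functools.reduce() expresses directly. A sketch with a hypothetical name:

```python
import operator
from functools import reduce


def follow_path_sketch(struc, path):
    # Index one key at a time: struc[path[0]][path[1]]...;
    # an empty path returns struc itself.
    return reduce(operator.getitem, path, struc)


print(follow_path_sketch({'bar': ['baz', 'quux']}, ['bar', 1]))  # quux
```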
- json_delta._util.in_array(key, accept_None=False)¶
Should the keypath key point at a JSON array ([])?
Works by testing whether key[-1] is an int or (where appropriate) long:
>>> in_array([u'bar', 16])
True
>>> import sys
>>> sys.version >= '3' or eval("in_array([u'foo', 94L])")
True
Returns False if key addresses a non-array object…
>>> in_array(["foo"])
False
>>> in_array([u'bar'])
False
…or if key == [] (as in that case there’s no way of knowing whether key addresses an object or an array).
>>> in_array([])
False
If the accept_None flag is set, this function will not raise a ValueError if key[-1] is None (keypaths of this form are used by key_tracker() to signal points within a JSON string where a new object key is expected, but not yet found).
>>> in_array([None])
Traceback (most recent call last):
    ...
ValueError: keypath elements must be instances of str, unicode, int or long, not NoneType (key[0] == None)
>>> in_array([None], True)
False
>>> in_array([None], accept_None=True)
False
Otherwise, a ValueError is raised if key is not a valid keypath:
>>> keypath = [{str("spam"): str("spam")}, "pickled eggs and spam", 7]
>>> in_array(keypath)
Traceback (most recent call last):
    ...
ValueError: keypath elements must be instances of str, unicode, int or long, not dict (key[0] == {'spam': 'spam'})
- json_delta._util.in_object(key, accept_None=False)¶
Should the keypath key point at a JSON object ({})?
Works by testing whether key[-1] is a string or (where appropriate) unicode:
>>> in_object(["foo"])
True
>>> in_object([u'bar'])
True
Returns False if key addresses an array…
>>> in_object([u'bar', 16])
False
>>> import sys
>>> False if sys.version >= '3' else eval("in_object([u'bar', 16L])")
False
…or if key == []…
>>> in_object([])
False
If the accept_None flag is set, this function will also return True if key[-1] is None (this functionality is used by key_tracker() to signal points within a JSON string where a new object key is expected, but not yet found).
>>> in_object([None])
Traceback (most recent call last):
    ...
ValueError: keypath elements must be instances of str, unicode, int or long, not NoneType (key[0] == None)
>>> in_object([None], True)
True
>>> in_object([None], accept_None=True)
True
Raises a ValueError if key is not a valid keypath:
>>> in_object(['foo', {}])
Traceback (most recent call last):
    ...
ValueError: keypath elements must be instances of str, unicode, int or long, not dict (key[1] == {})
>>> in_object([False, u'foo'])
Traceback (most recent call last):
    ...
ValueError: keypath elements must be instances of str, unicode, int or long, not bool (key[0] == False)
- json_delta._util.in_x_error(key, offender)¶
Build the instance of ValueError that in_object() and in_array() raise if the keypath is invalid.
- json_delta._util.key_tracker(jstring, point=0, start_key=None, special_handler=None)¶
Generate points within jstring where the keypath changes.
This function also identifies points within objects where a new key: value pair is expected, by yielding a pseudo-keypath with None as the final element.
Parameters:
jstring: The JSON string to search.
point: The point to start at.
start_key: The starting keypath.
special_handler: A function for handling extensions to JSON syntax (e.g. _upatch.ellipsis_handler(), used to handle the ... construction in udiffs).
>>> next(key_tracker('{}'))
(1, (None,))
- json_delta._util.licit_starts(start_chars=u'{}[]"-0123456789tfn \t\n\r')¶
Compute the bytestrings a UTF-x encoded string can begin with.
This function is intended for encoding detection when the beginning of the encoded string must be one of a limited set of characters, as for JSON or the udiff format. The argument start_chars must be an iterable of valid beginnings.
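The mapping can be approximated by encoding each start character with each UTF codec. licit_starts_sketch() is hypothetical, and including the byte-order marks in the same table is an assumption about where BOMs are handled.

```python
BOMS = {
    b'\xef\xbb\xbf': 'utf_8_sig',
    b'\xff\xfe': 'utf_16', b'\xfe\xff': 'utf_16',
    b'\xff\xfe\x00\x00': 'utf_32', b'\x00\x00\xfe\xff': 'utf_32',
}
CODECS = ('utf_8', 'utf_16_le', 'utf_16_be', 'utf_32_le', 'utf_32_be')


def licit_starts_sketch(start_chars='{}[]"-0123456789tfn \t\n\r'):
    """Hypothetical sketch: map every byte string that can begin a
    UTF-8/16/32 text starting with one of start_chars onto its codec."""
    starts = dict(BOMS)               # a byte-order mark may also begin the text
    for char in start_chars:
        for codec in CODECS:
            starts[char.encode(codec)] = codec
    return starts


starts = licit_starts_sketch()
print(starts[b'[\x00'], starts[b'\x00\x00\x00{'])  # utf_16_le utf_32_be
```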
- json_delta._util.nearest_of(string, *subs)¶
Find the position in string at which the earliest-occurring substring in subs begins, or len(string) if none of them occur.
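A sketch (hypothetical name) of the earliest-occurrence search, returning a position within string:

```python
def nearest_of_sketch(string, *subs):
    """Hypothetical sketch: position of the earliest occurrence in string
    of any substring in subs, or len(string) if none occurs."""
    positions = (string.find(sub) for sub in subs)
    return min((p for p in positions if p != -1), default=len(string))


print(nearest_of_sketch('{"a": [1]}', '[', ':'))  # 4
print(nearest_of_sketch('abc', 'x', 'y'))         # 3
```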
- json_delta._util.read_bytestring(file)¶
Read the contents of file as a bytes object.
- json_delta._util.skip_string(jstring, point)¶
Assuming jstring is a string, and jstring[point] is a " that starts a JSON string, return x such that jstring[x-1] is the " that terminates the string.
When a " is found, it is necessary to check that it is not escaped by a preceding backslash. As a backslash may itself be escaped, this amounts to checking that the number of backslashes immediately preceding the " is even (counting 0 as an even number):
>>> test_string = r'"Fred \"Foonly\" McQuux"'
>>> skip_string(test_string, 0) == len(test_string)
True
>>> backslash = chr(0x5c)
>>> dbl_quote = chr(0x22)
>>> even_slashes = ((r'"\\\\\\"', json.dumps(backslash * 3)),
...                 (r'"\\\\"', json.dumps(backslash * 2)),
...                 (r'"\\"', json.dumps(backslash)))
>>> all((json.loads(L) == json.loads(R) for (L, R) in even_slashes))
True
>>> all((skip_string(L, 0) == len(L) for (L, R) in even_slashes))
True
>>> def cat_dump(*args): return json.dumps(''.join(args))
>>> odd_slashes = (
...     (r'"\\\\\\\"  "', cat_dump(backslash * 3, dbl_quote, ' ' * 2)),
...     (r'"\\\\\"    "', cat_dump(backslash * 2, dbl_quote, ' ' * 4)),
...     (r'"\\\"      "', cat_dump(backslash * 1, dbl_quote, ' ' * 6)),
...     (r'"\"        "', cat_dump(dbl_quote, ' ' * 8)),
... )
>>> all((json.loads(L) == json.loads(R) for (L, R) in odd_slashes))
True
>>> all((skip_string(L, 0) == 12 for (L, R) in odd_slashes))
True
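The backslash-parity rule translates into a short loop. skip_string_sketch() is a hypothetical name; like the description above, it assumes jstring[point] is the opening quote, and it raises ValueError (from str.index()) if the string is unterminated.

```python
def skip_string_sketch(jstring, point):
    """Hypothetical sketch: jstring[point] is the opening '"' of a JSON
    string; return x such that jstring[x - 1] is the closing '"'."""
    i = point + 1
    while True:
        i = jstring.index('"', i)     # next candidate quote
        # Count the backslashes immediately preceding this quote; the loop
        # stops at the opening quote at the latest.
        backslashes = 0
        while jstring[i - 1 - backslashes] == '\\':
            backslashes += 1
        if backslashes % 2 == 0:      # even (including 0): not escaped
            return i + 1
        i += 1                        # escaped quote: keep searching


s = r'"Fred \"Foonly\" McQuux"'
print(skip_string_sketch(s, 0) == len(s))  # True
```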
- json_delta._util.stanzas_addressing(stanzas, keypath)¶
Find diff stanzas modifying the structure at keypath.
The purpose of this function is to keep track of changes made to the overall structure by stanzas earlier in the sequence, e.g.:
>>> struc = [
...     'foo',
...     'bar', [
...         'baz'
...     ]
... ]
>>> stanzas = [
...     [ [2, 1], 'quux'],
...     [ [0] ],
...     [ [1, 2], 'quordle']
... ]
>>> (stanzas_addressing(stanzas, [2])
...  == [
...     [ [1], 'quux' ],
...     [ [2], 'quordle' ]
... ])
True
stanzas[0] and stanzas[2] both address the same element of struc (the list that starts off as ['baz']), even though their keypaths are completely different, because the diff stanza [[0]] moves the list ['baz'] from index 2 of struc to index 1.
The return value is a sub-diff: a list of stanzas fit to modify the element at keypath within the overall structure.
Javascript implementation notes¶
The Javascript implementation provides an object JSON_delta that encapsulates the principal functions (use JSON_delta.patch and JSON_delta.diff). JSON-format diffs and patches are supported, and the diffs can be made compact (set the minimal parm to JSON_delta.diff to true). Udiffs and upatching are not yet supported: email me if you’d like to see them!
Racket implementation notes¶
The Racket implementation passes the test suite, but is about as slow as a wet weekend. Refactoring for speed will be gotten around to Real Soon Now...
Perl implementation notes¶
Development of the Perl implementation stalled when it was discovered that, because Perl doesn’t have as ramified a type ontology as Javascript, numeric values do not round-trip cleanly. This makes it impossible to produce an implementation that passes the test suite using available JSON libraries.
Licenses¶
Source¶
The JSON-delta source code is copyright 2012-2015 Philip J. Roberts. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
This documentation¶
This documentation is copyright 2014-2015 Philip J. Roberts. All rights reserved.
Redistribution and use in source (ReST/Sphinx) and ‘compiled’ forms (SGML, HTML, PDF, PostScript, RTF and so forth) with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code (ReST/Sphinx) must retain the above copyright notice, this list of conditions and the following disclaimer as the first lines of this file unmodified.
Redistributions in compiled form (transformed to other DTDs, converted to PDF, PostScript, RTF and other formats) must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FREEBSD DOCUMENTATION PROJECT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.