Notes from a Medium-Sized Island
Jason

[Aug. 15th, 2015|09:57 pm]
Jason

Have I mentioned lately how much I love jq? I love it so much. I've seen it billed as "sed for JSON", but to a PL nerd like me it goes a couple of steps beyond that.

One obvious thing is that it lets you program in the list monad with very simple syntax. I can do things like
$ echo "[[1,2,3],[4,5,6],[7,8,9]]" | jq ".[][]"
1
2
3
4
5
6
7
8
9

and
$ echo '[{"foo":[4,5],"bar":9},{"foo":[6,7,8]}]' | jq "[.[].foo[]]"
[
  4,
  5,
  6,
  7,
  8
]
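Here's the list-monad reading of both examples as a Python sketch. The helper names bind, each and key are my own (hypothetical), and each/key only approximate jq's .[] and .foo: every step of a jq path is one more bind, and wrapping a filter in [...] just collects all of its outputs into one list.

```python
import json
from typing import Any, Callable, Iterable, List

def bind(xs: Iterable[Any], f: Callable[[Any], Iterable[Any]]) -> List[Any]:
    """List-monad bind: run f on every input and concatenate the outputs."""
    return [y for x in xs for y in f(x)]

def each(v: Any) -> List[Any]:
    """Approximates jq's .[]: one output per array element or object value."""
    return list(v.values()) if isinstance(v, dict) else list(v)

def key(name: str) -> Callable[[Any], List[Any]]:
    """Approximates jq's .name: exactly one output (None if the key is missing)."""
    return lambda obj: [obj.get(name)]

# .[][]: jq starts from a single input value, hence the [nested]
nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = bind(bind([nested], each), each)
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# [.[].foo[]]
objs = json.loads('[{"foo":[4,5],"bar":9},{"foo":[6,7,8]}]')
foos = bind(bind(bind([objs], each), key("foo")), each)
print(foos)  # [4, 5, 6, 7, 8]

# ...which is exactly the nested comprehension you'd write by hand
assert foos == [x for obj in objs for x in obj["foo"]]
```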


But that's no big deal, you say; you're used to list comprehensions from Python and Ruby and whatever. But can you automatically capture the provenance of the data loci used at some point during nested list comprehensions, and apply functions to them in place in a complex, heterogeneous data structure?
$ echo '[{"foo":[4,5],"bar":9},{"foo":[6,7,8]}]' | jq ".[].foo[] |= (. + 100)"
[
  {
    "foo": [
      104,
      105
    ],
    "bar": 9
  },
  {
    "foo": [
      106,
      107,
      108
    ]
  }
]
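For what it's worth, here's one way to fake this in Python, as a sketch. The trick (which, as far as I can tell, is also what jq does internally) is to work with paths to the data rather than the data itself, so every value you pick out still knows where it came from. The names getpath, setpath, foo_elements and modify are my own hypothetical helpers, and foo_elements is written by hand, where jq derives the paths from the filter automatically.

```python
import copy
import json
from typing import Any, Callable, Iterator, Tuple

Path = Tuple[Any, ...]  # e.g. (0, "foo", 1) means data[0]["foo"][1]

def getpath(v: Any, path: Path) -> Any:
    for key in path:
        v = v[key]
    return v

def setpath(v: Any, path: Path, new: Any) -> Any:
    """Return a copy of v with the value at path replaced by new.
    Only the containers along the path are copied; the input is untouched."""
    if not path:
        return new
    out = copy.copy(v)
    out[path[0]] = setpath(v[path[0]], path[1:], new)
    return out

def foo_elements(v: Any) -> Iterator[Path]:
    """Hand-written stand-in for the paths that .[].foo[] visits."""
    for i, obj in enumerate(v):
        for j in range(len(obj["foo"])):
            yield (i, "foo", j)

def modify(v: Any, paths: Callable[[Any], Iterator[Path]],
           f: Callable[[Any], Any]) -> Any:
    """Rough analogue of `paths |= f`: find the foci, then update each one."""
    for p in list(paths(v)):
        v = setpath(v, p, f(getpath(v, p)))
    return v

data = json.loads('[{"foo":[4,5],"bar":9},{"foo":[6,7,8]}]')
result = modify(data, foo_elements, lambda x: x + 100)
print(json.dumps(result))
# [{"foo": [104, 105], "bar": 9}, {"foo": [106, 107, 108]}]
```

The hand-written foo_elements is the weak spot: jq gets the paths for free from any path expression, while here every query has to be rewritten by hand as a path generator.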

Like, actually, if you can do that in any reasonable way in Python or Ruby or whatever, I would be interested to know. This sort of feature is novel to me, at least: jq's the first place I've encountered it. You can even do arbitrarily nested searches like
$ echo '[{"foo":[4,5],"bar":9},{"foo":[[[[{"baz":"here is what I am looking for"}]]],7,8]}]' | \
  jq '(..|select(type=="object" and has("baz"))|.baz) |= (. + " and now I have changed it")'
[
  {
    "foo": [
      4,
      5
    ],
    "bar": 9
  },
  {
    "foo": [
      [
        [
          [
            {
              "baz": "here is what I am looking for and now I have changed it"
            }
          ]
        ]
      ],
      7,
      8
    ]
  }
]
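The nested search fits the same path-based sketch if the path generator walks the whole document. Again the helper names are hypothetical: recurse_paths yields paths in pre-order, roughly what jq's path(..) gives, and baz_paths keeps only the ones that point at a "baz" field of an object.

```python
import copy
import json
from typing import Any, Iterator, Tuple

Path = Tuple[Any, ...]

def recurse_paths(v: Any, prefix: Path = ()) -> Iterator[Path]:
    """Pre-order paths to every node, roughly what jq's path(..) yields."""
    yield prefix
    if isinstance(v, list):
        for i, child in enumerate(v):
            yield from recurse_paths(child, prefix + (i,))
    elif isinstance(v, dict):
        for k, child in v.items():
            yield from recurse_paths(child, prefix + (k,))

def getpath(v: Any, path: Path) -> Any:
    for key in path:
        v = v[key]
    return v

def setpath(v: Any, path: Path, new: Any) -> Any:
    """Copy of v with the value at path replaced; only the spine is copied."""
    if not path:
        return new
    out = copy.copy(v)
    out[path[0]] = setpath(v[path[0]], path[1:], new)
    return out

def baz_paths(v: Any) -> Iterator[Path]:
    """Paths for (.. | select(type=="object" and has("baz")) | .baz)."""
    for p in recurse_paths(v):
        node = getpath(v, p)
        if isinstance(node, dict) and "baz" in node:
            yield p + ("baz",)

data = json.loads('[{"foo":[4,5],"bar":9},'
                  '{"foo":[[[[{"baz":"here is what I am looking for"}]]],7,8]}]')
result = data
for p in list(baz_paths(data)):  # find every focus first, then update each one
    result = setpath(result, p, getpath(result, p) + " and now I have changed it")

print(result[1]["foo"][0][0][0][0]["baz"])
# here is what I am looking for and now I have changed it
```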

Comments:
From: jccw
2015-08-16 10:14 pm (UTC)
Reminds me a little of Flux, which was originally based on some ideas from the late 1990s about complex-object query and update languages by Hartmut Liefke and Susan Davidson (ideas whose value doesn't seem to have been appreciated at the time). Coincidentally, we also used a similar update language in some later work on provenance.

jq seems worth a look; it may be straightforward to adapt ideas from the above to make a typed version.
From: jcreed
2015-08-17 01:25 am (UTC)
Neat, I should read your stuff more carefully.
From: jcreed
2015-08-18 02:23 am (UTC)
To help me understand the core language in the Flux paper:

If I have
<a>
 <b><c/></b>
 <b><c/></b>
 <b><c/></b>
 ...
 <b><c/></b>
</a>

and I want to rename all the c's to d's, how do I do that? I feel like I want to do something like let x = a :: b :: c in ... to grab all the c's, and then I want to snapshot something or other, but I seem to end up with more bound variables than I think I should have.
From: jccw
2015-08-18 08:55 am (UTC)
In the core language in section 3, you would do something like this:

children[iter[children[iter[c? replace d]]]]

(modulo an off-by-one problem) if you just want to rename all the c's in a document with schema a[b[c[]]*].

I think in the source language this would look like:

RENAME */*/c TO d

If you want to rename all of the c's (anywhere in a document), this can't be done with a single update, because the path language excludes XPath's descendant axis (//). That restriction is important for keeping typechecking sane, though it may be possible to generalize it.

Snapshot is only needed if we want to copy or rearrange data in complex ways; the snapshotted variable gets assigned a pure (immutable) value. Likewise, Let is only useful for evaluating and reusing expressions (the result is a pure value that cannot be updated).
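A toy Python rendering of the shape of children[iter[children[iter[c? replace d]]]], assuming trees are (label, children) pairs. The combinator names children, iterate, if_label and rename are my own hypothetical stand-ins, not Flux's actual definitions, and renaming stands in for "replace d". Each combinator moves a fixed number of levels down; this toy version has no descendant step at all, so "every c, at any depth" isn't expressible in it.

```python
from typing import Callable, List, Tuple

Tree = Tuple[str, list]                 # (label, list of child trees)
TreeUpdate = Callable[[Tree], Tree]
ForestUpdate = Callable[[List[Tree]], List[Tree]]

def children(u: ForestUpdate) -> TreeUpdate:
    """Apply a forest update to a node's children, keeping its label."""
    return lambda t: (t[0], u(t[1]))

def iterate(u: TreeUpdate) -> ForestUpdate:
    """Apply a tree update to each tree of a forest independently."""
    return lambda forest: [u(t) for t in forest]

def if_label(label: str, u: TreeUpdate) -> TreeUpdate:
    """Apply u only to nodes with the given label; leave others unchanged."""
    return lambda t: u(t) if t[0] == label else t

def rename(new_label: str) -> TreeUpdate:
    return lambda t: (new_label, t[1])

# <a><b><c/></b><b><c/></b><b><c/></b></a>
doc: Tree = ("a", [("b", [("c", [])]) for _ in range(3)])

# Rough analogue of children[iter[children[iter[c? replace d]]]],
# applied directly to the root a
rename_c = children(iterate(children(iterate(if_label("c", rename("d"))))))
print(rename_c(doc))
# ('a', [('b', [('d', [])]), ('b', [('d', [])]), ('b', [('d', [])])])
```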
From: jcreed
2015-08-18 01:05 pm (UTC)
Interesting. So the thing that appealed to me about jq is exactly what you say Flux chooses not to support:

Unlike most other proposals, in FLUX, arbitrary [query] expressions cannot be used to select foci. If this were allowed, it would be easy to construct examples for which the result of an update depends on the order in which the focus selection is processed.

I sort of see why this is potentially dangerous. Yet if I try an example like
echo '[[4, 5]]' | jq -c '.. |= if type == "array" then . + [100] else . + 10 end'
I get [[14,15,100],100], which makes sense to me as picking a bottom-up traversal of the tree. I'm struggling to imitate your exact example since JSON isn't quite the same as XML, but I do see that "replace every node of the tree with 'c'", i.e.
echo '{"a": [{"b": []}]}' | jq '.. = "c"'
gives me the error
jq: error (at <stdin>:1): Cannot index string with string "a"
so something weird is going on there, I think because it's trying to do overlapping updates concurrently.


From: jccw
2015-08-18 01:48 pm (UTC)


Yes, overlapping updates is a big reason for Flux being more limited. Aside from typechecking, nondeterministic updates seem undesirable, and imposing a disambiguation strategy for re-determinizing things seems to require a global analysis of what the update is trying to do.
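To see the order-dependence concretely, here's a small Python sketch with hypothetical helpers (not jq's or Flux's code): two overlapping foci, an array and its own first element, get the same update applied in both possible orders.

```python
import copy
from itertools import permutations
from typing import Any, Dict, Tuple

Path = Tuple[Any, ...]

def getpath(v: Any, path: Path) -> Any:
    for key in path:
        v = v[key]
    return v

def setpath(v: Any, path: Path, new: Any) -> Any:
    if not path:
        return new
    out = copy.copy(v)  # copy only the containers along the path
    out[path[0]] = setpath(v[path[0]], path[1:], new)
    return out

def update(x: Any) -> Any:
    # Hypothetical update: reverse arrays, add 10 to numbers.
    return x[::-1] if isinstance(x, list) else x + 10

doc = {"a": [1, 2]}
overlapping = [("a",), ("a", 0)]  # an array and one of its own elements

results: Dict[Tuple[Path, ...], Any] = {}
for order in permutations(overlapping):
    v = doc
    for p in order:
        v = setpath(v, p, update(getpath(v, p)))
    results[order] = v
    print(order, "->", v)
# array first: {'a': [12, 1]};  element first: {'a': [2, 11]}
```

Neither order is obviously the right one, which is the kind of ambiguity the restriction quoted from the Flux paper rules out.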

As an example of the pitfalls of a richer strategy, there's also the W3C's XML update extension to XQuery, which allows selecting and updating in a much more flexible (and harder to typecheck) way.

Anyway, I'll try to take a closer look at jq when I have time...