Jason (jcreed) wrote,

Here's a cute puzzle:

Define a edit distance function d(x,y) between strings x and y to be the minimum number of steps to get from x to y allowing only insertions and deletions of characters at the beginning of the string. For example, d("foo", "baroo") is 4: there's four hops in the sequence "foo", "oo", "roo", "aroo", "baroo".

Does there exist a bijection
 f : string + string -> string

and a number n such that
d(f(inl(x)), x) < n
d(f(inr(x)), x) < n

? That is, can you embed string + string into string with bounded prefix-edit-distance?
Finding such a function that's merely an injection is trivial. Just prefix with one character in one branch, another character in the other branch.
fun f s = case s of inl(x) => "!" ^ x | inr(x) => "#" ^ x

But how can you make it a bijection?

Despite Cantor-Schroeder-Bernstein being generally nonconstructive, you can walk through its reasoning with a suitably chosen lexicographic ordering on the types string and string * string, and arrive at the following function: (in fact, that's exactly how I did arrive at it, after puzzling about it unsuccessfully by the seat of my pants for a while)

datatype ('a,'b) sum = inl of 'a | inr of 'b;

fun split s = (substring(s, 0, 1), substring(s, 1, size s - 1));

fun escape s = (case split s of ("#", x) => "#" ^ escape x
                              | ("!", x) => "#!" ^ x
                              | _ => s);

fun f s = (case s of inl(x) => "!" ^ x 
                  | inr(x) => escape x);

In plain english, the strings coming from the left injection are marked with a "!" in the beginning, and strings coming from the right injection are left unmolested as we can get away with --- we prefix them with a single "#" only if they begin with "#...#!" where "#...#" indicates 0 or more "#"s.

- f (inl "hello");
val it = "!hello" : string
- f (inr "hello");
val it = "hello" : string
- f (inr "!hello");
val it = "#!hello" : string
- f (inr "#hello");
val it = "#hello" : string
- f (inr "#!hello");
val it = "##!hello" : string
- f (inr "!asdfoaisdfu");
val it = "#!asdfoaisdfu" : string
- f (inr "####asdf");
val it = "####asdf" : string

Variation 1: is it possible to write such a bijection (still prefix-edit-distance-bounded) that doesn't have to inspect an unbounded amount of the input string? I don't actually know off the top of my head, and I suspect not, since it sounds like a strong sort of continuity requirement, and I'm not sure that string is so "topologically" isomorphic to string + string.

Variation 2: is it possible to write such a bijection (still prefix-edit-distance-bounded) with only one "escape" character? That is, the edit-distance function only permits insertions or deletions of a single character, rather than the two characters "!" and "#" that are used above? I also suspect not.
Tags: math, strings

  • (no subject)

    Guy from Seattle team we've been working with showed up today at work; no matter how much I'm generally comfortable working with remote teams (and I…

  • (no subject)

    Sean's back in town --- good fun working with nonremote teammates.

  • (no subject)

    Sean's in town at work, good times.

  • Post a new comment


    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded