Define a edit distance function d(x,y) between strings x and y to be the minimum number of steps to get from x to y allowing only insertions and deletions of characters at the beginning of the string. For example, d("foo", "baroo") is 4: there's four hops in the sequence "foo", "oo", "roo", "aroo", "baroo".
Does there exist a bijection
f : string + string -> string
and a number n such that
d(f(inl(x)), x) < n d(f(inr(x)), x) < n
? That is, can you embed string + string into string with bounded prefix-edit-distance?
Finding such a function that's merely an injection is trivial. Just prefix with one character in one branch, another character in the other branch.
fun f s = case s of inl(x) => "!" ^ x | inr(x) => "#" ^ x
But how can you make it a bijection?
Despite Cantor-Schroeder-Bernstein being generally nonconstructive, you can walk through its reasoning with a suitably chosen lexicographic ordering on the types string and string * string, and arrive at the following function: (in fact, that's exactly how I did arrive at it, after puzzling about it unsuccessfully by the seat of my pants for a while)
datatype ('a,'b) sum = inl of 'a | inr of 'b; fun split s = (substring(s, 0, 1), substring(s, 1, size s - 1)); fun escape s = (case split s of ("#", x) => "#" ^ escape x | ("!", x) => "#!" ^ x | _ => s); fun f s = (case s of inl(x) => "!" ^ x | inr(x) => escape x);
In plain english, the strings coming from the left injection are marked with a "!" in the beginning, and strings coming from the right injection are left unmolested as we can get away with --- we prefix them with a single "#" only if they begin with "#...#!" where "#...#" indicates 0 or more "#"s.
Examples:
- f (inl "hello"); val it = "!hello" : string - f (inr "hello"); val it = "hello" : string - f (inr "!hello"); val it = "#!hello" : string - f (inr "#hello"); val it = "#hello" : string - f (inr "#!hello"); val it = "##!hello" : string - f (inr "!asdfoaisdfu"); val it = "#!asdfoaisdfu" : string - f (inr "####asdf"); val it = "####asdf" : string
Variation 1: is it possible to write such a bijection (still prefix-edit-distance-bounded) that doesn't have to inspect an unbounded amount of the input string? I don't actually know off the top of my head, and I suspect not, since it sounds like a strong sort of continuity requirement, and I'm not sure that string is so "topologically" isomorphic to string + string.
Variation 2: is it possible to write such a bijection (still prefix-edit-distance-bounded) with only one "escape" character? That is, the edit-distance function only permits insertions or deletions of a single character, rather than the two characters "!" and "#" that are used above? I also suspect not.