Jason (jcreed) wrote,

My latest exercise in pure programming futility has descended into picking apart TrueType font files in order to get the metrics and encoding information out of them. This is necessary, apparently, to support Unicode at all, since the PDF format wasn't exactly designed with it in mind.

Of course these are, again, arcane and all-but-underspecified formats with lots of little "version number of this thingy goes here, and then the data that follows obeys one of the following five formats" patterns. I have just got used to this bullshit by now, though. A PDF can be version 1.3, 1.4, 1.5, or 1.6, a font embedded in a PDF can be Type1, TrueType, Type0, or OpenType, a Type0 can contain a descendant CIDFont that is Type0 (which, by the way, really means "Type1 font") or Type2 (which really means "TrueType"), and a TrueType font embedded inside a CIDFont (which is presently called Type2, right, and also it is called a composite font, since it is composed out of one subfont (because composite fonts that are composites of more than one font are illegal) to distinguish it from fonts which are one font apiece, got it?) and now this TrueType ("Type2") font has a cmap table in its giant list of tables, and the cmap table has a version, which must always be zero, but each of its subtables in addition to a platform ID (either 0 which means "Unicode", 1 which means "Macintosh", 2 which is reserved and must not be used, or 3, which means "Microsoft" --- good luck thinking about that in a well-typed way) and a platform-specific encoding number (which may mean things such as "Unicode ISO 10646 1993 semantics" or "Gujarati") also has a format number, which is an entirely different concept. It may be 0, 2, 4, 6, 8, 10, or 12, each one of these things being a different way of, er, encoding encodings. So far I have only seen 0 and 4 in the wild, so who knows how common these other things are.
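
For anyone who wants to poke at the same layers, here is a minimal Python sketch of walking the cmap table described above. It is not the SML code from this project. It assumes you have already pulled the raw cmap table bytes out of the font's table directory, and the names `list_cmap_subtables` and `PLATFORM_NAMES` are made up for illustration. Everything in the table is big-endian: a 16-bit version, a 16-bit subtable count, then one 8-byte record per subtable (platform ID, encoding ID, 32-bit offset), with the format number sitting in the first two bytes at that offset.

```python
import struct

# Platform IDs as listed in the paragraph above.
PLATFORM_NAMES = {0: "Unicode", 1: "Macintosh", 2: "reserved", 3: "Microsoft"}


def list_cmap_subtables(cmap: bytes):
    """Return (platform name, encoding ID, format) for each cmap subtable.

    `cmap` is assumed to hold the raw bytes of the cmap table only,
    starting at the table's own version field.
    """
    version, num_tables = struct.unpack_from(">HH", cmap, 0)
    if version != 0:
        raise ValueError(f"cmap version must be 0, got {version}")

    subtables = []
    for i in range(num_tables):
        # Each encoding record is 8 bytes and follows the 4-byte header.
        platform_id, encoding_id, offset = struct.unpack_from(
            ">HHL", cmap, 4 + 8 * i
        )
        # The offset is relative to the start of the cmap table; the
        # subtable itself begins with its 16-bit format number.
        (fmt,) = struct.unpack_from(">H", cmap, offset)
        name = PLATFORM_NAMES.get(platform_id, "unknown")
        subtables.append((name, encoding_id, fmt))
    return subtables
```

Only the subtable headers are read here; decoding the actual mappings depends on the format number and is a separate job for each format.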

Now would be a good time to question why I wanted to generate PDFs from a program in the first place. I can't remember the exact reason, but since I feel the PDF and TrueType formats respectively snickering behind my back at my lack of success so far, I feel it's nearly a matter of honor, here, that I at least finish what I started. The whole thing is definitely more a matter of wanting to believe that I understand these formats well enough to generate them from any ol' language, not necessarily a drive to implement it in SML specifically....

---

Got it implemented now, but I am getting sporadic off-by-1024 errors. At least it's off-by-nice-round-power-of-two. That gives me hope for a rational explanation.

---

No, no rational explanation so far, but I do have a completely irrational fix that seems to work.

Why, I just apply the function (at some point in the crazy pipeline already set up) that takes a fifteen-bit integer and shifts the seven most significant bits left, producing a sixteen-bit integer where the 256-bit is always zero. Then everything works perfectly. Why didn't I think of that earlier?
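
Spelled out, one reading of that function looks like the Python sketch below. The name `spread_high_bits` is invented here, and this is an interpretation of the description rather than the code from the pipeline: keep the low eight bits where they are, move the upper seven bits (bits 8 through 14) up by one position, and leave bit 8, the one worth 256, empty.

```python
def spread_high_bits(n: int) -> int:
    """Map a 15-bit integer to a 16-bit one whose 256 bit is always zero."""
    if not 0 <= n < (1 << 15):
        raise ValueError("expected a fifteen-bit integer")
    low = n & 0x00FF    # low eight bits stay put
    high = n & 0x7F00   # seven most significant bits of the 15
    return (high << 1) | low


# A few spot checks of the bit shuffling.
assert spread_high_bits(0x00FF) == 0x00FF
assert spread_high_bits(0x0100) == 0x0200
assert spread_high_bits(0x7FFF) == 0xFEFF
```

Why this makes the off-by-1024 errors go away is not something the sketch explains.
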
Tags: angst, fonts, pdf, programming, truetype