Unravelling 'nativized' XML into a readable string format

I'm hoping for ideas or help with unravelling what is probably a simple obfuscation of XML files. It takes some intuition for string handling, and I’m just stuck.

Background:
Mathcad is one of those classic engineering tools we love for good reasons. At the entry level one builds worksheet reports declaratively, but you seamlessly progress into using it as a functional programming language. It has symbolic solvers with their own strengths, and other nice things like memoization. The speed is ~1/200 of Julia's, but since results are stored in the worksheet that’s normally not an issue. In combination with Julia and our proprietary function library it covers most of my professional software needs.

I have been playing with an open source user interface, leveraging the strengths of both approaches (mock-up here). But it’s too ambitious for a hobby project, and in the short term (i.e. the next ten years), we’re going to continue adapting Mathcad for our needs.

Mathcad uses a transparent worksheet format, where functions and definitions are stored in a subset of XML, a MathML variant. Conversion to and from Julia expressions is pretty straightforward.

Now comes the issue. Some years ago the business moved into the phase of milking a declining user base. They moved from 32-bit to 64-bit across the board, and dropped all transparency. As part of this improvement, the file format is now something ‘native’ which we’re unable to adapt. But we will eventually have to use these new versions.

Problem:

Here is an example of a short file, and here is an example of my failed attempts at making this readable using Julia. The output is truncated after a while:

```
julia> String(transcode(UInt8, (reinterpret(UInt8, read("file.mcdx")[2:end]))))

"K\x03\x04\x14\0\0\0\b\0\x17f\x82N\xdd[(\xdf\xde\x01\0\0\x8f\x06\0\0\x15\0\x1c\0mathcad/worksheet.xml \xa2\x18\0(\xa0\x14\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xad\x94Qo\x9b0\x14\x85\xff\n\xe2=\x06\xdb\xc1K\xa2$R\xa7Mڤ\ue953\xb6\xbe:p\tV\rF\xb6\tٿ\x9fq\n\xc9(J\xd3v\xbc.\xe7\x1e\xdf\xf3a\xb1n\x95~2\x05\x80\r\x8e\xa5\xac\xcc\xeah\xc4&,\xac\xadWQԶ-j)Rz\x1f\x918\xc6\xd1\xe3\x8f\xfb\x9fi\x01%\x9f\x89\xcaX^\xa5\x10>w\x1dh2^a\x90\xaa\xa1r/s\xa5Kn\x8d7)\xb9~j\xeaY\xaaʚ[\xb1\x13R\xd8?\x9d3\xebm\xf4-.\xcfE\n_TڔPY\xdf\x1fi\x90\xceQU\xa6\x10\xb5\xe9\xddZ\xf3\xc2Ι\x14F\xe5\x16\xb9\x19\xa2!z\x12\xf7-\xa5\xbc\xde\xd2=\x9c\xd5\xcduqS\tk𠮯\xabk\xad\x0ePuP\x87\x96\xdb\xc7߮5\xec\xbb\xfc\xfd"8\xddf"ۄΎ\xa7\xb6\xe1\xf2\xb7\xc8l\xb1\t\x13\x86p_\xfa\x06b_\xd8M\x88c\xa7\xb2\xca͈q\x82\x1c\xd3\xf3E\xc2@B\xdei\x96\x04]\xbe\x88\xa9۷\e\xc9mf\x1ai\x1f \xef6s5\xb9\xca \x17\x15\xf8\xa5\xc8\x02\xc9w ]\x9c_w\x0f\xdf\xef>\xdf\x7f\xf5\xf9V\xa6\xe6\xa9;7\xb5\xeb\x06}\x80p\xcbב\xd7\xfb6g\xac\xc51Ъ\xfb\x8cI\x18\xa4\xaas\xc0’{\r\n\xa9\x97\xfb\xe5P\xc3\x13\xb5\xf9D\x8dLԒ\x8bZ4LpZ\xf7y\xfc\tp\xb7\x13\xde)\xdaxL;A\x94\xb2\xf3\U149f87=Y,\xfee\xcf&ؓ\x0f\xb0߽\x8d\xfd\x14\xbf)\xf6\xb7~\xa3\xffÞ\x8c\xd8\xd3%Zė\xecɘ=I\x10{\x86O\xe6KĦ\xe1\xd3\x18\xcd_;\xf8\xf3\x0f\xc0O/\xe1\xbf\xf46\x06t\xc4\x003\x94\$\xef\x8aM\t{=6\v\xfb\xac\x93ɲ!\xd9x\xf2h\xf8c\x9d\xffdۿPK\x03\x04\n\0\0\0\0\0\x17f\x82N\xf9\xf6Yr\n\x03\0\0\n\x03\0\0\v\0\x1c\0_rels/.rels \xa2\x18\0(\xa0\x14\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\ufeff<?xml version=\"1.0\" encoding=\"utf-8\"?><Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\"><Relationship Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"/mathcad/worksheet.xml\" Id=\"Rb87cec5b4080430e\" /><Relationship Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"/mathcad/header.xml\" Id=\"R303d863a9cd94749\" /><Relationship 
Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"/mathcad/footer.xml\" Id=\"R9cf8fed2c4de426e\" /><Relationship Type=\"http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties\" Target=\"/docProps/core.xml\" Id=\"R2d7574fcb21b417d\" /></Relationships>PK\x03\x04\x14\0\0\0\b\0\x17f\x82N\xe1ʇ~\xcc\0\0\0\xd9\x01\0\0\x12\0\x1c\0mathcad/header.xml \xa2\x18\0(\xa0\x14\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x8d\x91\xb1n\xc20\x10@\x7f\x05e'\x0e\xa02\xa0\x90\xa9c\x99XX\x8d{\xc1\x16\xb1Ϻ\xbb\xf8\xfb&i\x93\x0eHQF\xcb\xef=\x9dϥ\x05\xfd\r\xb4z\xfa&\xf0\xe1\xc9\xee\x98Y\x91xP\xa5\x94\xa7]\x8etSۢب\xcb\xe9\xebl,x\xbdv\x81E\a\x03ٟ\xf5\x80I\xe2\x81\xe0\x1c#\x84\xee\xb2F\xf2Zx\x88xM\xf76\xae\r\xfa\xa8\xc5]]\xe3\xe4\u557\xf7c\
```

Att.: @CrashBurnRepeat

It looks like the .mcdx is actually a zip-compressed XML file format, similar to docx. After unzipping it, there are a bunch of XML files that are plainly readable and could be parsed using e.g. LightXML.jl.
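For what it's worth, the dump above already hints at this: the recurring `PK\x03\x04` is the ZIP local file header signature, and the output only starts with `K` because `[2:end]` dropped the leading `P`. Here is a minimal sketch of the unzip-and-parse route, assuming ZipFile.jl and LightXML.jl are installed. The helper names `is_zip`, `read_entry` and `root_name` are made up for illustration, and the entry name `mathcad/worksheet.xml` is simply the one visible in the dump:

```julia
using ZipFile, LightXML

# ZIP local file header signature: 'P' 'K' 0x03 0x04
const ZIP_MAGIC = UInt8[0x50, 0x4b, 0x03, 0x04]

# True if the file starts with the ZIP signature (hypothetical helper).
is_zip(path) = open(io -> read(io, 4), path) == ZIP_MAGIC

# Return the decompressed contents of one archive entry as a String.
function read_entry(path, entry)
    reader = ZipFile.Reader(path)
    try
        for f in reader.files
            f.name == entry && return read(f, String)
        end
        error("no entry named $entry in $path")
    finally
        close(reader)
    end
end

# Parse an XML string and return the name of its root element.
# A leading byte-order mark (seen in the dump as \ufeff) is stripped first.
function root_name(xml)
    doc = parse_string(String(lstrip(xml, '\ufeff')))
    try
        return name(root(doc))
    finally
        free(doc)
    end
end
```

Something like `root_name(read_entry("file.mcdx", "mathcad/worksheet.xml"))` should then give the root element of the worksheet, and iterating over `ZipFile.Reader("file.mcdx").files` lists the other parts of the package.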


Wow, thank you! As simple as that, of course! They didn’t even bother to actually compress much. I’m so glad I asked, and modifying those files is just another piece of cake.
