Avoiding conversion of (an invalid) string returned from PyCall when passing back to Python

question

#1

In Python

>>> import base64
>>> base64.urlsafe_b64decode('08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv')

returns the following (byte) string:
'\xd3\xc9Y\x04\xe7\x90\xad\xb2\xcf\xac\x82\xbcwM\xbd\x8e\xdb\xa3\xde\xb9w\xa0.\x83\x0f\xfe\x0cPB\xae\xff\xc4\xc0\x95\x92\x0eJ\xe6\xf8k\xe4I\xd8\xc7\xbc\xcf\x90\xef'
This is not valid UTF-8, so Julia understandably complains when printing it:

using PyCall
@pyimport base64
input = "08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv"
base64.urlsafe_b64decode(input)
"ԉY\x04琭Error showing value of type String:
ERROR: UnicodeError: invalid character index
...

But the real problem is that I can’t pass it back to Python through PyCall:

@pyimport base64
str = base64.urlsafe_b64decode(input)
base64.urlsafe_b64encode(str)
ERROR: PyError (PyUnicode_DecodeUTF8) <type 'exceptions.UnicodeDecodeError'>
UnicodeDecodeError('utf8', '\xd3\xc9Y\x04\xe7\x90\xad\xb2\xcf\xac\x82\xbcwM\xbd\x8e\xdb\xa3\xde\xb9w\xa0.\x83\x0f\xfe\x0cPB\xae\xff\xc4\xc0\x95\x92\x0eJ\xe6\xf8k\xe4I\xd8\xc7\xbc\xcf\x90\xef', 0, 1, 'invalid continuation byte')

I assume this happens because PyCall decodes the Julia string as UTF-8 (hence PyUnicode_DecodeUTF8) to build a Python unicode object before passing it to Python.
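To confirm the diagnosis on the Python side, here is a minimal Python 3 sketch (in Python 3 the decoded value is a `bytes` object; the traceback above comes from Python 2, where it is a `str`). Decoding the bytes as UTF-8 fails at the very first byte, matching the PyCall error: 0xd3 starts a two-byte sequence, but the following byte, 0xc9, is not a continuation byte. The helper name `try_utf8` is purely illustrative.

```python
import base64

INPUT = "08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv"

# Decode the URL-safe base64 text into raw binary data (64 chars -> 48 bytes).
raw = base64.urlsafe_b64decode(INPUT)


def try_utf8(data: bytes):
    """Return the decoded text, or the UnicodeDecodeError if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc


err = try_utf8(raw)
# Same position and reason as in the PyCall traceback.
print(type(err).__name__, err.start, err.end, err.reason)
# UnicodeDecodeError 0 1 invalid continuation byte
```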

I could wrap the Python functionality in a class defined with @pydef to avoid these conversions, but is there an alternative approach that avoids this problem?


#2

One option is to use pycall to retain the underlying Python object:

strobj = pycall(base64.urlsafe_b64decode, PyObject, "08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv")

which can then be passed unchanged to the next Python call, e.g. base64.urlsafe_b64encode(strobj).
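In plain Python terms, retaining the PyObject means the next call receives the original bytes untouched, so nothing is ever decoded as UTF-8. A minimal Python 3 sketch of that round trip (an illustration of the idea, not PyCall itself), using the input from the question:

```python
import base64

INPUT = "08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv"

# Keep the decoded value as a bytes object; never turn it into text.
raw = base64.urlsafe_b64decode(INPUT)

# Pass the untouched bytes straight to the next call.
encoded = base64.urlsafe_b64encode(raw)

# urlsafe_b64encode returns bytes drawn from the ASCII base64 alphabet,
# so decoding this result (unlike raw) as ASCII is safe.
print(encoded.decode("ascii") == INPUT)  # True
```

Because 48 bytes is a multiple of 3, the encoder emits no `=` padding, which is why the result matches the original input exactly.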


#3

Julia doesn’t really have a binary string type (if you use the string macro b"..." it returns a Vector{UInt8}).
What happens if you convert the return value immediately: binstr = convert(Vector{UInt8}, base64.urlsafe_b64decode(input))?
Does base64.urlsafe_b64encode(binstr) then work correctly?
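Why a byte vector works where a string does not can be sketched in Python 3, using a list of integers as a rough analogue of Vector{UInt8} (the analogy is an assumption for illustration, not a description of how PyCall converts values). Rebuilding bytes from the integers is lossless, whereas forcing the data through text replaces invalid sequences with U+FFFD and cannot be undone.

```python
import base64

INPUT = "08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv"
raw = base64.urlsafe_b64decode(INPUT)

# Rough analogue of Julia's Vector{UInt8}: a plain list of integers in 0..255.
byte_values = list(raw)

# Rebuilding bytes from those integers gives back exactly the same data.
rebuilt = bytes(byte_values)
print(rebuilt == raw)  # True

# Forcing the data through text is lossy: invalid sequences become U+FFFD,
# so re-encoding the text does not reproduce the original bytes.
as_text = raw.decode("utf-8", errors="replace")
print(as_text.encode("utf-8") == raw)  # False

# The byte-level copy still round-trips through base64.
print(base64.urlsafe_b64encode(rebuilt).decode("ascii") == INPUT)  # True
```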


#4

Thanks Scott, your solution also works well.

julia> using PyCall
julia> @pyimport base64
julia> input = "08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv";
julia> base64.urlsafe_b64encode(convert(Vector{UInt8},base64.urlsafe_b64decode(input)))
"08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv"