Avoiding conversion of (an invalid) string returned from PyCall when passing back to Python

samuelpowell · December 15, 2016, 6:01pm

In Python

>>> import base64
>>> base64.urlsafe_b64decode('08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv')

produces the following string:
'\xd3\xc9Y\x04\xe7\x90\xad\xb2\xcf\xac\x82\xbcwM\xbd\x8e\xdb\xa3\xde\xb9w\xa0.\x83\x0f\xfe\x0cPB\xae\xff\xc4\xc0\x95\x92\x0eJ\xe6\xf8k\xe4I\xd8\xc7\xbc\xcf\x90\xef'
These bytes are not valid UTF-8, so Julia understandably complains when printing the result…

@pyimport base64
base64.urlsafe_b64decode(input)
"ԉY\x04琭Error showing value of type String:
ERROR: UnicodeError: invalid character index
...

but the real problem is that I can’t pass it back to Python through PyCall

@pyimport base64
str = base64.urlsafe_b64decode(input)
base64.urlsafe_b64encode(str)
ERROR: PyError (PyUnicode_DecodeUTF8) <type 'exceptions.UnicodeDecodeError'>
UnicodeDecodeError('utf8', '\xd3\xc9Y\x04\xe7\x90\xad\xb2\xcf\xac\x82\xbcwM\xbd\x8e\xdb\xa3\xde\xb9w\xa0.\x83\x0f\xfe\x0cPB\xae\xff\xc4\xc0\x95\x92\x0eJ\xe6\xf8k\xe4I\xd8\xc7\xbc\xcf\x90\xef', 0, 1, 'invalid continuation byte')

I assume this happens because the Julia string is decoded as UTF-8 when it is converted to a Python string for the call.
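The UTF-8 assumption can be checked on the Python side alone. Below is a minimal Python 3 sketch; the thread itself used Python 2, where the decoder returns a `str` rather than `bytes`, and the helper name `is_valid_utf8` is made up for illustration.

```python
import base64

encoded = "08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv"

# In Python 3 the decoder returns raw bytes, not text.
raw = base64.urlsafe_b64decode(encoded)


def is_valid_utf8(data):
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xd3 opens a two-byte sequence, but the next byte (0xc9) is not a
# continuation byte, so UTF-8 decoding fails on the very first character.
print(is_valid_utf8(raw))
```

If this prints `False`, any layer that insists on treating the payload as UTF-8 text will fail, whichever side of the language boundary it runs on.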

I could wrap the Python functionality in a class defined with @pydef to avoid these conversions, but is there an alternative approach that avoids the problem altogether?

samuelpowell · December 15, 2016, 6:36pm

One option is to use pycall to retain the underlying Python object:

strobj = pycall(base64.urlsafe_b64decode, PyObject, "08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv")

which one can then use in the following call to Python.

ScottPJones · May 28, 2017, 12:27am

Julia doesn’t really have a binary string type (if you use the string macro b"..." it returns a Vector{UInt8}).
What happens if you convert the return value immediately: binstr = convert(Vector{UInt8}, base64.urlsafe_b64decode(input))?
Does base64.urlsafe_b64encode(binstr) then work correctly?

samuelpowell · May 28, 2017, 10:13am

Thanks Scott, your solution also works well.

julia> using PyCall
julia> @pyimport base64
julia> input = "08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv";
julia> base64.urlsafe_b64encode(convert(Vector{UInt8},base64.urlsafe_b64decode(input)))
"08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv"
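One plausible reading of why the byte-vector route works, sketched in Python 3: the base64 encoder accepts any bytes-like object, so nothing ever has to be interpreted as text. How PyCall actually maps a `Vector{UInt8}` onto a Python object is not shown in this thread; the `bytearray` below is only a stand-in for it.

```python
import base64

encoded = "08lZBOeQrbLPrIK8d029jtuj3rl3oC6DD_4MUEKu_8TAlZIOSub4a-RJ2Me8z5Dv"
raw = base64.urlsafe_b64decode(encoded)

# Stand-in (assumption) for the byte vector handed over from Julia.
byte_vector = bytearray(raw)

# The encoder works on raw bytes and returns ASCII bytes, so the
# payload is never decoded as UTF-8 along the way.
round_tripped = base64.urlsafe_b64encode(byte_vector).decode("ascii")
print(round_tripped == encoded)
```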
