Julia called from Python 3 runs on a single core

Hi all,

I am calling Julia from Python 3 through the pyjulia interface to fit a mixed model. I have 2.8 M records, and the model takes 2-3 hours to complete, which is rather expensive. So I loaded the dataset directly in Julia and ran the model there, which took 800 seconds (v0.3.2).

While analyzing this in depth, I found that when I run Julia standalone, all cores of my server are used and it is fast. But when I call it from Python, only one core, the one running the Python process, is utilized. Can someone help me force Julia to use more cores when it is called from pyjulia/Python?

Without more information it’s hard to say, but if you are calling addprocs in the pure-Julia version, you need to do the same from pyjulia.

In other words, nprocs() in pure Julia must match j.nprocs() in pyjulia.

ps: please don’t double-post


Thank you.
Sorry, I was not aware that this is linked to Stack Overflow.

  1. I didn’t set anything on the standalone Julia process.
  2. Here is my Python and Julia code. Please suggest what is wrong here:
import sys
import julia
import time
from itertools import combinations
start_time = time.time()
print("asdasdasda")
import csv
import pandas as pd
df = pd.read_csv('/home/hpcuser/test.csv')

schema = list(df.columns)
inputData = [schema]+df.values.tolist()

inputFormula = 'volume ~ 1 + logprice + col1 + col2 + col3 + price1 + price2 + price3 + price4 + ( ( 0 + logprice ) | employee ) + ( ( 0 + col1 ) | employee ) + ( ( 0 + col2 ) | employee ) + ( ( 0 + col3 ) | employee )'
#result = juliaCall(inputData, inputFormula, allRand=0,resid= 0)

def juliaCall(inputData, inputFormula,  allRand=0,resid= 0):
  j = julia.Julia()
  # Create the final Julia code to be run
  juliaCode = """
    using DataFrames
    using MixedModels
     function calc_lmm(raw_data::Array)
      addprocs(5)
      dataset = convert(DataFrame, Dict(raw_data[1,:], [raw_data[2:end,i] for i in 1:size(raw_data,2)]))
      id = convert(Array, dataset[:id])
      delete!(dataset, :id)
      modelREML = lmm({formula}, dataset)
      reml!(modelREML,true)
      lmeModel = fit(modelREML)
      fixedDF = DataFrame(fixedEffVar = coeftable(lmeModel).rownms,estimate = coeftable(lmeModel).mat[:,1],
                     stdError = coeftable(lmeModel).mat[:,2],zVal = coeftable(lmeModel).mat[:,3])
      if ({resid} == 1 || {allRand} == 1)
        randomEffectTerms = map(string, filter(x -> contains(string(x), "("), lmeModel.mf.terms.terms))
        byVar = strip(string(randomEffectTerms[1])[search(string(randomEffectTerms[1]), "|")[1] + 1:end])
        byVarValues = unique(lmeModel.mf.df[symbol(byVar)])
        randomDF = convert(DataFrame, ranef(lmeModel)[1])
        rename!(randomDF, randomDF.colindex.names, map(x -> symbol(string(byVar,x)), byVarValues))
        randomDF[:randomEffVar] = randomEffectTerms
        randomDF = stack(randomDF, [1:length(randomDF)-1], :randomEffVar)
        rename!(randomDF, [:variable, :value], [symbol(byVar), :estimate])
      end
      if {allRand} == 1
        result = ["fixedEffEstimates" => convert(Array, fixedDF), "randomEffEstimates" => convert(Array, randomDF), "residuals" => hcat(id, lmeModel.resid)]
      elseif {resid} == 1
        result = ["fixedEffEstimates" => convert(Array, fixedDF), "randomEffEstimates" => convert(Array, randomDF), "residuals" => ""]
      else
        result = ["fixedEffEstimates" => convert(Array, fixedDF), "randomEffEstimates" => "", "residuals" => ""]
      end  
      return result
    end""".format(formula = inputFormula, allRand = allRand, resid = resid)

  # Evaluate the code; this defines calc_lmm and returns it, but does not run the model yet
  calcLME = j.eval(juliaCode)
  result = calcLME(inputData)
  del inputData
  return result


result = juliaCall(inputData, inputFormula, allRand=0,resid= 0)
print(result)

I tried the code below and it shows np as 1. How can I make this equal to the number of cores (v0.3.2)?

j = julia.Julia()
np = j.nprocs()
print("Julia Process:", np)

The extra workers may not be added, even though you appear to call addprocs in the (julia) script. I’m surprised it works at all: when I call addprocs through pyjulia I get a segfault in both 0.5 and recent master due to GC corruption (GC error (probable corruption), when it is feeling nice; usually a nasty backtrace).

edit: oh, you are using Julia 0.3. That may be why. At this point 0.3 is not really supported so I’m not going to try to debug there. Please try with 0.5.

They’re not linked, but most of the people who answer SO are also on here, so please only post in one place at a time.


If I execute my code in v0.5, are you saying it will add the extra processes (addprocs(5))? Is there anything else I need to modify inside or outside the code?

My only suggestion for 0.3 is to try calling j.addprocs(5) directly, before executing your script.
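As a minimal sketch of that ordering: assuming the pyjulia object forwards nprocs() and addprocs(n) to Julia the same way j.nprocs() is used earlier in this thread, workers could be requested before the model code is evaluated. The helper name ensure_nprocs is hypothetical, and the addprocs call may still fail under pyjulia on some Julia versions.

```python
def ensure_nprocs(j, target_nprocs):
    """Add Julia workers until nprocs() reaches target_nprocs.

    `j` is assumed to be a julia.Julia() instance that forwards
    nprocs() and addprocs(n) to Julia (an assumption based on the
    j.nprocs() call shown in this thread). Returns the final count.
    """
    current = int(j.nprocs())
    missing = target_nprocs - current
    if missing > 0:
        # addprocs(n) asks Julia to start n additional worker processes.
        j.addprocs(missing)
    # Re-query instead of trusting arithmetic, in case workers failed to start.
    return int(j.nprocs())
```

With j = julia.Julia(), calling ensure_nprocs(j, 8) before j.eval(juliaCode) would aim for one master process plus seven workers.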

I don’t know what will happen for you with 0.5; as I mentioned, j.addprocs crashes for me (on OS X). I am currently debugging that, and will open an issue if I don’t find a local reason. If it crashes for you on a different platform than OS X then please go ahead and file an issue on github.

I added it like this:
calcLME = j.eval(juliaCode)
j.addprocs(7)
result = calcLME(inputData)

Error message:
j.addprocs(7)
RuntimeError: Julia exception: UndefRefError()

This is on v0.3.2.

I made a system call from Python to Julia (julia -p 7 script.jl), which used 7+1 cores, and execution is fast. Looking at it, I feel that when we trigger Julia from Python using pyjulia, Python does not allow Julia to use the available free cores; instead it forces the Julia interpreter to run inside the Python process that triggered it. Is my assumption right?
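The system call described above can be sketched with Python's subprocess module. This is only an illustration: the helper names are hypothetical, script.jl stands for whatever Julia script loads the data and fits the model, and a julia executable is assumed to be on the PATH.

```python
import subprocess


def build_julia_command(script, workers, julia="julia"):
    """Return the argv list for `julia -p <workers> <script>`."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [julia, "-p", str(workers), script]


def run_julia_script(script, workers, julia="julia"):
    """Run the script in a separate Julia process and return its stdout.

    Raises subprocess.CalledProcessError if Julia exits with an error.
    """
    completed = subprocess.run(
        build_julia_command(script, workers, julia),
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return completed.stdout
```

For example, run_julia_script("script.jl", 7) would launch julia -p 7 script.jl. Since Julia then runs as its own process, the data has to reach it through a file (such as the CSV) rather than through Python objects.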

https://github.com/JuliaPy/pyjulia/blob/master/julia/core.py – the first comment in this code states: "Bridge Python and Julia by initializing the Julia interpreter inside Python."

What about passing the “init_julia” argument in this function? j = julia.Julia(init_julia=False)

No, -p is a command-line option that automatically calls addprocs to spawn worker processes. You can always just do that manually.

Unfortunately that doesn’t seem to work via pyjulia. OP tried 0.3, and I tested several versions:

Start Julia manually, then call the above function with init_julia=False. Will this work?
j = julia.Julia(init_julia=False)

If you are calling from Python (pyjulia), you don’t have the option to pass -p. Only when you run standalone Julia will -p work. We are looking into pyjulia.

Well, addprocs also works with standalone Julia, and it seems that the linked bug report already includes a workaround.

If you don’t mind, can you point me to the workaround please?

Thanks. But I am still not clear what change I have to make in my code (2nd post in this discussion). Will this work in 0.3.2 or 0.5.0? Are any changes required in my code?

j = julia.Julia()
np = j.addprocs()

Regards,
Harish

Based on the stack trace, I thought there might be some issue with task switching because of ctypes (libffi) stack. However, running the following (equivalent?) code does not segfault:

JLPATH="/Users/inorton/git/julia"
import ctypes
jl = ctypes.PyDLL(JLPATH+"/usr/lib/libjulia.dylib", ctypes.RTLD_GLOBAL)
jl.jl_init(JLPATH+"/usr/bin/")
jl.jl_eval_string(""" addprocs(1) """)
jl.jl_eval_string(""" println(nprocs()) """)
# 2

Thank you. But I don’t find libjulia.dylib in my Julia folder (binary 0.3.2). Is it the julia/lib/julia/libjulia.so file? Do I need to rename or link it? If so, how can I do that?

OSError: julia/usr/lib/libjulia.dylib: cannot open shared object file: No such file or directory

usr/lib/julia/libjulia.so
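Because the library name and subdirectory differ between platforms and installs (libjulia.dylib on OS X, libjulia.so on Linux), one option is to search the Julia folder instead of hard-coding the path. A minimal sketch, where find_libjulia is a hypothetical helper and julia_root is wherever Julia was unpacked:

```python
from pathlib import Path


def find_libjulia(julia_root):
    """Return files under julia_root whose name starts with 'libjulia.'.

    Matches e.g. libjulia.so (Linux) and libjulia.dylib (OS X). The
    search is recursive because the subdirectory varies between installs.
    """
    root = Path(julia_root)
    return sorted(p for p in root.rglob("libjulia.*") if p.is_file())
```

The path found this way can be passed to ctypes.PyDLL in place of the .dylib path; no renaming or linking should be needed for the library to load. Note that under Python 3, ctypes generally expects bytes rather than str for C string arguments such as the one given to jl_init.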