Julia call from Python3 running in single core

Harish_Kumar · November 23, 2016, 12:09am

Hi all,

I am calling Julia from Python 3 through the pyjulia interface to fit a mixed model. I have 2.8 M records and the model takes 2-3 hours to complete, which is rather expensive. So I took the dataset, loaded it directly in Julia and ran the model there, which took 800 sec (V0.3.2).

While analyzing in depth, I found that when I run Julia standalone, all cores of my server are used by Julia and it is fast. But when I call it from Python, only one core, the one running the Python process, is utilized. Can someone help me force Julia to use more cores when it is called from pyjulia/Python?

ihnorton · November 23, 2016, 12:41am

Without more information it’s hard to say, but if you are calling addprocs in the pure-Julia version, you need to do the same from pyjulia.

In other words, nprocs() in pure Julia must match j.nprocs() in pyjulia.

ps: please don’t double-post

Harish_Kumar · November 23, 2016, 1:04am

Thank you.
Sorry, I was not aware that this is linked to Stack Overflow.

I didn't set anything on the standalone Julia process.
Here is my Python and Julia code; please suggest what is wrong here.

import sys
import julia
import time
from itertools import combinations
start_time = time.time()
print("asdasdasda")
import csv
import pandas as pd
df = pd.read_csv('/home/hpcuser/test.csv')

schema = list(df.columns)
inputData = [schema]+df.values.tolist()

inputFormula = 'volume ~ 1 + logprice + col1 + col2 + col3 + price1 + price2 + price3 + price4 + ( ( 0 + logprice ) | employee ) + ( ( 0 + col1 ) | employee ) + ( ( 0 + col2 ) | employee ) + ( ( 0 + col3 ) | employee )'
#result = juliaCall(inputData, inputFormula, allRand=0,resid= 0)

def juliaCall(inputData, inputFormula,  allRand=0,resid= 0):
  j = julia.Julia()
  # Create the final Julia code to be run
  juliaCode = """
    using DataFrames
    using MixedModels
     function calc_lmm(raw_data::Array)
      addprocs(5)
      dataset = convert(DataFrame, Dict(raw_data[1,:], [raw_data[2:end,i] for i in 1:size(raw_data,2)]))
      id = convert(Array, dataset[:id])
      delete!(dataset, :id)
      modelREML = lmm({formula}, dataset)
      reml!(modelREML,true)
      lmeModel = fit(modelREML)
      fixedDF = DataFrame(fixedEffVar = coeftable(lmeModel).rownms,estimate = coeftable(lmeModel).mat[:,1],
                     stdError = coeftable(lmeModel).mat[:,2],zVal = coeftable(lmeModel).mat[:,3])
      if ({resid} == 1 || {allRand} == 1)
        randomEffectTerms = map(string, filter(x -> contains(string(x), "("), lmeModel.mf.terms.terms))
        byVar = strip(string(randomEffectTerms[1])[search(string(randomEffectTerms[1]), "|")[1] + 1:end])
        byVarValues = unique(lmeModel.mf.df[symbol(byVar)])
        randomDF = convert(DataFrame, ranef(lmeModel)[1])
        rename!(randomDF, randomDF.colindex.names, map(x -> symbol(string(byVar,x)), byVarValues))
        randomDF[:randomEffVar] = randomEffectTerms
        randomDF = stack(randomDF, [1:length(randomDF)-1], :randomEffVar)
        rename!(randomDF, [:variable, :value], [symbol(byVar), :estimate])
      end
      if {allRand} == 1
        result = ["fixedEffEstimates" => convert(Array, fixedDF), "randomEffEstimates" => convert(Array, randomDF), "residuals" => hcat(id, lmeModel.resid)]
      elseif {resid} == 1
        result = ["fixedEffEstimates" => convert(Array, fixedDF), "randomEffEstimates" => convert(Array, randomDF), "residuals" => ""]
      else
        result = ["fixedEffEstimates" => convert(Array, fixedDF), "randomEffEstimates" => "", "residuals" => ""]
      end  
      return result
    end""".format(formula = inputFormula, allRand = allRand, resid = resid)

  # Evaluate the code, note this is not running the code
  calcLME = j.eval(juliaCode)
  result = calcLME(inputData)
  del inputData
  return result


result = juliaCall(inputData, inputFormula, allRand=0,resid= 0)
print(result)

Harish_Kumar · November 23, 2016, 1:47am

I tried the code below and it shows np as 1. How can I make this equal to the number of cores (V0.3.2)?

j = julia.Julia()
np = j.nprocs()
print("Julia Process:", np)

ihnorton · November 23, 2016, 5:51am

The extra workers may not be added, even though you appear to call addprocs in the (julia) script. I’m surprised it works at all: when I call addprocs through pyjulia I get a segfault in both 0.5 and recent master due to GC corruption (GC error (probable corruption), when it is feeling nice; usually a nasty backtrace).

edit: oh, you are using Julia 0.3. That may be why. At this point 0.3 is not really supported so I’m not going to try to debug there. Please try with 0.5.

They’re not linked, but most of the people who answer SO are also on here, so please only post in one place at a time.

Harish_Kumar · November 23, 2016, 5:56am

Are you saying that if I execute my code in V0.5 it will add the extra processes (addprocs(5))? Is there anything else I need to modify inside or outside the code?

ihnorton · November 23, 2016, 2:50pm

My only suggestion for 0.3 is to try calling j.addprocs(5) directly, before executing your script.

I don’t know what will happen for you with 0.5; as I mentioned, j.addprocs crashes for me (on OS X). I am currently debugging that, and will open an issue if I don’t find a local reason. If it crashes for you on a different platform than OS X then please go ahead and file an issue on github.

Harish_Kumar · November 23, 2016, 8:43pm

I added it like this:
calcLME = j.eval(juliaCode)
j.addprocs(7)
result = calcLME(inputData)

Error message:
j.addprocs(7)
RuntimeError: Julia exception: UndefRefError()

This is on V0.3.2.

Harish_Kumar · November 26, 2016, 7:49pm

I did a system call from Python to Julia (julia -p 7 script.jl), which used 7+1 cores, and execution is fast. Looking at this, I feel that when we trigger Julia from Python using pyjulia, Python does not allow Julia to use the available free cores; instead it forces the Julia interpreter to run inside the Python process that triggered it. Is my assumption right?

The first comment in https://github.com/JuliaPy/pyjulia/blob/master/julia/core.py states: "Bridge Python and Julia by initializing the Julia interpreter inside Python."

What about passing the “init_julia” argument in this function? j = julia.Julia(init_julia=False)
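
For reference, the system call described above (`julia -p 7 script.jl`) can be sketched with the standard `subprocess` module. This is only an illustration: the helper names `julia_command` and `run_julia_script` are hypothetical, and it assumes a `julia` executable is on the PATH and that the script prints whatever result Python needs.

```python
import subprocess


def julia_command(script, workers, julia_exe="julia"):
    """Build the argv for `julia -p <workers> <script>`."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [julia_exe, "-p", str(workers), script]


def run_julia_script(script, workers, julia_exe="julia"):
    """Run the script in a standalone Julia process and return its stdout."""
    completed = subprocess.run(
        julia_command(script, workers, julia_exe),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


if __name__ == "__main__":
    # Only builds and prints the command; call run_julia_script() to execute it.
    print(julia_command("script.jl", 7))
```

Because Julia runs as its own process here, the worker count is set by `-p` and not by anything in the Python process.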

yuyichao · November 26, 2016, 10:32pm

No, -p is a command line option that automatically calls addprocs to spawn worker processes. You can always just do that manually.

ihnorton · November 26, 2016, 11:19pm

Unfortunately that doesn’t seem to work via pyjulia. OP tried 0.3, and I tested several versions:

github.com/JuliaPy/pyjulia

`addprocs` leads to segfault

opened 04:39AM - 24 Nov 16 UTC

closed 07:51PM - 06 Dec 16 UTC

ihnorton

```
import julia
j = julia.Julia(jl_runtime_path="/Users/inorton/git/julia/usr…/bin/julia", jl_init_path="/Users/inorton/git/julia/usr/bin")
j.addprocs(1) #segfault
```

(segfault stack [here](https://gist.github.com/ihnorton/63dabfa2316d2e29eb0941eaa70ab7ce))

Based on the stack trace, I thought there might be some issue with task switching because of ctypes (libffi) stack. However, running the following (equivalent?) code does not segfault:

```
JLPATH="/Users/inorton/git/julia"
import ctypes
jl = ctypes.PyDLL(JLPATH+"/usr/lib/libjulia.dylib", ctypes.RTLD_GLOBAL)
jl.jl_init(JLPATH+"/usr/bin/")
jl.jl_eval_string(""" addprocs(1) """)
jl.jl_eval_string(""" println(nprocs()) """)
# 2
```

Tested with 0.4.6, 0.5, and a several-day-old trunk build. (cross-ref: https://discourse.julialang.org/t/julia-call-from-python3-running-in-single-core/508)

Harish_Kumar · November 26, 2016, 11:54pm

What if I start Julia manually and then call the above function with init_julia=False? Will this work?
j = julia.Julia(init_julia=False)

Harish_Kumar · November 26, 2016, 11:55pm

If you are calling from Python (pyjulia), you don't have the option to pass -p. Only when you run standalone Julia will -p work. We are looking into pyjulia.

yuyichao · November 27, 2016, 11:12pm

Well, addprocs also works with standalone Julia, and it seems that the linked bug report already includes a workaround.

Harish_Kumar · November 27, 2016, 11:55pm

If you don't mind, can you point me to the workaround please?

yuyichao · November 27, 2016, 11:57pm

github.com/JuliaPy/pyjulia

`addprocs` leads to segfault

Harish_Kumar · November 28, 2016, 12:09am

Thanks. But I am still not clear what change I have to make in my code (2nd post in this discussion). Will this work in 0.3.2 or 0.5.0? Or are any changes required in my code?

j = julia.Julia()
np = j.addprocs()

Regards,
Harish

yuyichao · November 28, 2016, 12:29am

Based on the stack trace, I thought there might be some issue with task switching because of ctypes (libffi) stack. However, running the following (equivalent?) code does not segfault:

JLPATH="/Users/inorton/git/julia"
import ctypes
jl = ctypes.PyDLL(JLPATH+"/usr/lib/libjulia.dylib", ctypes.RTLD_GLOBAL)
jl.jl_init(JLPATH+"/usr/bin/")
jl.jl_eval_string(""" addprocs(1) """)
jl.jl_eval_string(""" println(nprocs()) """)
# 2

Harish_Kumar · November 28, 2016, 1:13am

Thank you. But I can't find libjulia.dylib in my Julia folder (binary 0.3.2). Is it the julia/lib/julia/libjulia.so file? Do I need to rename or link it? If so, how can I do that?

OSError: julia/usr/lib/libjulia.dylib: cannot open shared object file: No such file or directory

yuyichao · November 28, 2016, 1:20am

usr/lib/julia/libjulia.so
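
A small sketch for locating the library without hard-coding one path: it probes the relative locations mentioned in this thread under a given Julia root and returns the first one that exists. The helper name `find_libjulia` is hypothetical, and other installs may keep the library somewhere else entirely.

```python
import tempfile
from pathlib import Path

# Relative locations mentioned in this thread; other installs may differ.
CANDIDATES = (
    "usr/lib/libjulia.dylib",
    "usr/lib/julia/libjulia.so",
    "lib/julia/libjulia.so",
)


def find_libjulia(julia_root, candidates=CANDIDATES):
    """Return the first existing candidate path under julia_root, or None."""
    root = Path(julia_root)
    for relative in candidates:
        path = root / relative
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    # Demonstrate on a throwaway directory tree instead of a real install.
    with tempfile.TemporaryDirectory() as root:
        fake = Path(root) / "lib" / "julia" / "libjulia.so"
        fake.parent.mkdir(parents=True)
        fake.touch()
        print(find_libjulia(root))
```

The returned path can then be handed to `ctypes.PyDLL(str(path), ctypes.RTLD_GLOBAL)` in place of the `.dylib` path from the snippet above.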
