I have downloaded Julia to my directory /home/harish/julia.
I see the above ".so" file in /home/harish/julia/lib/julia; I don't see it anywhere else.
Now I have changed your code like this; correct me if I am wrong.
import ctypes
JLPATH = "/home/harish/julia"
jl = ctypes.PyDLL(JLPATH + "/lib/julia/libjulia.so", ctypes.RTLD_GLOBAL)
jl.jl_init(JLPATH + "/bin/")
jl.jl_eval_string(""" addprocs(1) """)
jl.jl_eval_string(""" println(nprocs()) """)
ERROR: System image file "//…/lib/julia/sys.ji" not found
I don't see the above-mentioned file in /usr/lib/ (I am not sure whether the path means /usr inside the Julia directory or the system directory).
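A plausible cause of that mangled "//…" path (a guess, based on how ctypes handles strings under Python 3): jl_init takes a C char*, and ctypes marshals a Python 3 str as a wide wchar_t* buffer, so libjulia sees garbage after the first character. A minimal sketch of the difference using libc's strlen; the path is only illustrative:

```python
import ctypes

# On Linux, CDLL(None) exposes symbols from the already-loaded C library.
libc = ctypes.CDLL(None)
libc.strlen.restype = ctypes.c_size_t

path = "/home/harish/julia/bin/"
# bytes map directly onto the C char* the function expects
print(libc.strlen(path.encode("utf-8")))  # 23, the real length
# a str is passed as a wchar_t* buffer; a char*-expecting function reads it
# byte by byte and stops at the first embedded NUL (typically after 1 char)
print(libc.strlen(path))
```

So passing `(JLPATH + "/bin/").encode("utf-8")` to jl_init, or declaring `jl.jl_init.argtypes = [ctypes.c_char_p]`, may fix the sys.ji lookup under Python 3.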
Finding out how to add processes when calling Julia from Python could be of interest by itself, but I would reiterate that doing so will not help in fitting a mixed-effects model using the MixedModels package. The package only uses one process.
I did point out in one of the forums to which you posted that much of the work in fitting such a model is dense numeric linear algebra. Each case differs a little in exactly which BLAS or LAPACK routines are called, which is why I asked if you could post the formula for the model and the characteristics of the data. I would still appreciate it if you could do that. The inputFormula in your original post would fail with invalid syntax, I think, and I haven't been able to see where it gets modified, if it does.
The benefit from having multiple threads is only in the BLAS/LAPACK calls in the evaluation of the objective function to be optimized.
Here is a draft of the formula. It's not the same as what I have, but it is similar, with changed column names. I am fitting mixed models with random and fixed variables.
My question is: this takes 820 sec when I execute it in Julia alone, but if I call the same code from pyjulia it takes 2-3 hours. I don't think it's due to an internal parallelism issue (in any mixed-model package); it may be due to Python-to-Julia communication.
This is an important point. Looks like I was on the wrong track here (though it did reveal a different bug). There might be a few ways BLAS could end up performing differently in the embedded pyjulia situation.
For example, if NumPy is also compiled against OpenBLAS and loads it first, then Julia might pick up the wrong shared library. From what I can tell, Julia’s BLAS ccalls don’t use a hard path, so dlopen could be defaulting to the existing handle because everything is in the same address space. If that library was previously initialized by NumPy with a lower number of threads, then performance would be different.
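If that is what is happening, one workaround sketch (assuming OpenBLAS; the variable names are the standard OpenBLAS/OpenMP ones) is to pin the thread count via environment variables before anything loads the library:

```python
import os

# OpenBLAS reads these at load time, so they must be set before the first
# import/dlopen that pulls the shared library into the process
os.environ["OPENBLAS_NUM_THREADS"] = "4"
os.environ["OMP_NUM_THREADS"] = "4"

import numpy as np  # NumPy now initializes its BLAS with the settings above
```

The same ordering constraint would apply to loading libjulia: whichever side loads OpenBLAS first fixes the configuration for the whole address space.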
@Harish_Kumar what Python distribution are you using? Anaconda comes with MKL so it shouldn’t be an issue there, but other Python distributions probably compile against OpenBLAS.
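To check which BLAS a given Python environment's NumPy was built against (a sketch; the exact output format varies across NumPy versions):

```python
import numpy as np

# Prints the BLAS/LAPACK libraries NumPy was compiled and linked against,
# e.g. MKL for Anaconda builds or OpenBLAS for many pip wheels.
np.__config__.show()
```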
Yes, we tried with Python 2.7 and addprocs works fine (with a workaround). Currently we are using Python 3.4, where it fails with the error ERROR: System image file "//…/lib/julia/sys.ji" not found. Any help on this?
You may be right that the bottleneck is in the communication of data between Python and Julia.
One way to move a pandas dataframe to Julia is to write a feather file in Python/pandas and read it in Julia using the Feather package. The Python code is something like
import feather
import pandas as pd
# df is a pandas DataFrame (in your pipeline it comes from a Spark df)
df = pd.read_csv('/home/hpcuser/test.csv')
feather.write_dataframe(df, 'test.feather')
If you know that there are no missing data values in the data frame I recommend calling Feather.read in Julia as
using Feather, DataFrames, MixedModels
df = Feather.read("test.feather", nullable = false)
It happens that the particular formula you use is not handled as efficiently in the current (0.7.0) release of MixedModels. In the numeric representation of the model, the random-effects terms should be amalgamated into a single term with a special structure. I know how to do the algebra, I just haven’t worked out a good way of specifying the model. It is possible to work backwards reassembling the terms that have been specified separately, but it may be more effective to allow for another argument instead so the model is specified as
I am having some trouble collecting the data back after the Julia call. Can I know what the return type of jl.jl_eval_string() is? I always get an integer back, but in my code I return a dictionary.
calcLME = jl.jl_eval_string(juliaCode)
result = calcLME(inputData)
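On the return type: ctypes assumes every foreign function returns a C int unless you set its restype, so the integer you are seeing is a raw (possibly truncated) view of the jl_value_t* pointer that jl_eval_string returns, not your dictionary; the pointer still has to be unboxed or converted on the Python side, which is much of what pyjulia does for you. A sketch of the same pitfall with libc's strdup, which also returns a pointer:

```python
import ctypes

libc = ctypes.CDLL(None)  # Linux: symbols from the already-loaded C library

# Without this line, ctypes would hand back the 64-bit pointer as a plain
# int, just as jl_eval_string does until its restype is declared.
libc.strdup.restype = ctypes.c_char_p
print(libc.strdup(b"hello"))  # b'hello'
```

For jl_eval_string you would declare `jl.jl_eval_string.restype = ctypes.c_void_p` and then pass the pointer to an appropriate jl_unbox_* routine for scalar results; for a dictionary, letting pyjulia do the conversion is far simpler than hand-rolling it with ctypes.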
We’ve provided several suggestions: number of processors, number of BLAS threads, and communication between Python and Julia. Without a minimal example demonstrating the problem, including data (generated or otherwise), we are all just guessing, so continuing this thread is not productive.