I am getting slow performance with JuliaDB when I run optimization problems. I store the data for all simulations in one CSV file per data set, and in a loop I filter out the rows for the relevant simulation and call the optimization function.
If a CSV file contains the data for only one simulation, the run time is about 4 seconds. However, when the CSV file contains data for six simulations and I do a run for all the scenarios, the total run time is 55 seconds, even if I use the same set of data for each simulation.
I suspect the reason for this is that the program spends a lot of time filtering data for each scenario in the loop. I have given a simple example below to explain how my code works, but the actual program obviously has a lot more data.
I have tried to parallelize it using Distributed with addprocs(4), but I did not have any success. Even if the code were perfectly parallelized, the maximum speed-up on a 4-core processor is a factor of 4, i.e. 55/4 = 13.75 seconds in total, which is still high compared to the 4-second run time when the CSV files contain data for only one scenario.
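Roughly, this is the kind of thing I tried; the solve_scenario function is just a placeholder standing in for "filter the tables and call the optimizer", so treat it as a sketch rather than my actual code:

```julia
using Distributed
addprocs(4)

# Any function or package used inside pmap must be defined on every worker.
@everywhere function solve_scenario(nsim)
    # In the real program: filter the tables for simulation nsim and
    # run the optimization. Here just a placeholder return value.
    return nsim
end

# Distribute the six scenarios across the worker processes.
results = pmap(solve_scenario, 1:6)
```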
To improve the run time, I reorganised the data for one particular simulation so that it can be read directly as a matrix with Matrix{Float64}(CSV.read("…", DataFrame)). This reduced the run time to 0.4 seconds, but I am not sure how to set up a loop over the optimization if the data for each simulation is stored in a separate CSV file.
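Concretely, the fast single-simulation version looks roughly like this (the file name is just an example, not my actual path):

```julia
using CSV, DataFrames, LinearAlgebra

# Reading a CSV that contains only one simulation straight into a matrix
# avoids the table-filtering step entirely.
Assets_CF = Matrix{Float64}(CSV.read("cashflows_sim1.csv", DataFrame))
```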
I would appreciate it if someone could guide me on:
1. How can I improve the run time, even if that means reorganising the input data, when I have to run the program for all simulations? For example, if a separate CSV file for each scenario is best, how can I make the program read the relevant files and store the values in a matrix?
2. Is it possible to parallelize the operation in question 1?
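For question 1, this is the kind of per-file loop I have in mind, with one CSV file per scenario (the file naming scheme is hypothetical):

```julia
using CSV, DataFrames, LinearAlgebra

for nsim in 1:6
    # Read this scenario's data straight into matrices/vectors,
    # skipping the JuliaDB filtering step entirely.
    Prices       = Matrix{Float64}(CSV.read("prices_sim$nsim.csv", DataFrame))
    Assets_CF    = Matrix{Float64}(CSV.read("cashflows_sim$nsim.csv", DataFrame))
    Liability_CF = vec(Matrix{Float64}(CSV.read("liabilities_sim$nsim.csv", DataFrame)))
    # ... then call Optimizer(Prices, Liability_CF, Assets_CF) as in the code below
end
```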
The code is given below. Data similar to that in the CSV files is constructed using the table function.
using JuliaDB, JuMP, Clp, LinearAlgebra, NPFinancial, Tables, DataFrames, CSV
function Optimizer(Prices, Liability_CF, Assets_CF)
    # Minimize portfolio cost subject to asset cash flows covering
    # the liability cash flows in every period.
    model = Model(Clp.Optimizer)
    @variable(model, x[1:5] >= 0)
    @objective(model, Min, sum(Prices[i] * x[i] for i = 1:5))
    @constraint(model, Assets_CF * x .>= Liability_CF)
    optimize!(model)
    return JuMP.value.(x)
end
function Yield(Bonds, Assets_CF, Prices)
    # Cash-flow vector: initial outlay (negative), then the portfolio's
    # cash flows in each period; irr is from NPFinancial.
    Assets = Float64.(vcat(-dot(Prices, Bonds), vec(Assets_CF * Bonds)))
    return irr(Assets)
end
# Data is loaded using loadtable function of JuliaDB.
# Each of these three are loaded as separate CSV files
prices = table([1,1,1,1,1,2,2,2,2,2],[99.74, 91.22, 98.71, 103.75, 97.15,100,93,94,105,101];names = [:Simulation, :Prices])
cashFlowLiab = table([1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],[5000, 7000, 7000, 6000, 8000, 7000, 2000, 1000, 555.5, 655.5, 100, 555.5, 655.5,1000,500,1500];names = [:Simulation, :Liability_CFs])
cashFlows = table([1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],[4,4,4,4,4,4,4,104,3.8,3.8,3.8,3.8,3.8,3.8,3.8,98.8],[5,5,5,5,5,5,105,0,4.75,4.75,4.75,4.75,4.75,4.75,99.75,0],[2.5,2.5,2.5,2.5,102.5,0,0,0,2.375,2.375,2.375,2.375,97.375,0,0,0],[5,5,5,5,5,105,0,0,4.75,4.75,4.75,4.75,4.75,99.75,0,0],[4,4,4,4,4,104,0,0,3.8,3.8,3.8,3.8,3.8,98.8,0,0];names = [:Simulation, :T1,:T2,:T3,:T4,:T5])
for nsim = 1:2
    # Filter each table down to the rows for this simulation.
    Prices = (JuliaDB.select(filter(i -> i.Simulation == nsim, prices), :Prices))'
    ACF = JuliaDB.select(filter(i -> i.Simulation == nsim, cashFlows), All(Between(:T1, :T5)))
    Assets_CF = Tables.matrix(ACF)
    Liability_CF = JuliaDB.select(filter(i -> i.Simulation == nsim, cashFlowLiab), :Liability_CFs)

    Bonds = Optimizer(Prices, Liability_CF, Assets_CF)
    CSV.write("noOfBonds.csv", Tables.table(Bonds'), append = nsim != 1, writeheader = false)

    PortfolioYield = Yield(Bonds, Assets_CF, Prices)
    CSV.write("PortfolioReturn.csv", DataFrame(Simulation = nsim, Yield = PortfolioYield), append = nsim != 1)
end