I am having difficulty in understanding on how to link up two dataframes to create another dataframe based on multiple criteria. Apologies, if the code below seem too long-most of it is to create the two dataframes in question.
I have two dataframes, namely df1 and df2. In df2, I have some numerical data that has column names as dates. I want to create another dataframe (df3) that will have all the data as in df2 but numerical data updated by applyling a formula. This should be based on:
- columns RANK and LEVEl in the two dataframes matching up
- GRADE in df2 corresponds to columns named βOneβ, βTwoβ and βThreeβ in df1
- Formula is simply: the relevant factor in df1 (based on RANK, LEVEL, GRADE, Time matching up with df2)value in df2df_Time calculated (based on dates in df2)
- if df_Time > 5, then it should apply the factor for Time 5 (as df1 only contains data up to Time 5). For example, for the data item in the last column and first row, this will give the right result
0.678235743 470.935.498630 (where 5.49 is the time corresponding to the date in that column, 470.93 is the numerical data in df2 and 0.67 is the factor from df1)
Similarly, if time in df2 is 3.5 then this requires apply the factor corresponding to time 4 in df1. - if VEHICLE in df2 is BIKE then the formula wouldnβt apply and the original data in df2 would go in df3
The code below generates the two dataframes, but I am not sure how to produce df3 based on the criteria above. Any help will be appreciated.
using DataFrames, Dates
function create_data()
Time = repeat(collect(1:5),outer =4)
CAT1= Vector{String}(undef,20)
for i in 1:20
i <= 10 ? CAT1[i] = "A" : CAT1[i] = "B"
end
CAT2 = Vector{String}(undef,20)
for i in 1:20
i <= 10 ? CAT2[i] = "EASY" : CAT2[i] = "HARD"
end
temp_data = rand(20,3)
df1 = DataFrame(Time = Time,RANK=CAT1,LEVEL=CAT2,One=temp_data[1],Two=temp_data[2],Three=temp_data[3])
Date_1 = Vector{Date}(undef,6)
for i in 1:6
Date_1[i] = Date(2020+i,06,30)
end
TYPE = Vector{String}(undef,20)
for i in 1:20
i <= 17 ? TYPE[i] = "CAR" : TYPE[i] = "BIKE"
end
GRADE = Vector{String}(undef,20)
for i in 1:20
i <= 10 ? GRADE[i] = "One" : GRADE[i] = "Two"
end
df2_Data= rand(20,6)*1000
df2_temp = DataFrame(Symbol("30/06/2020") =>df2_Data[:,1],Symbol("30/06/2021") =>df2_Data[:,2],
Symbol("30/06/2022") =>df2_Data[:,3],Symbol("30/06/2023") =>df2_Data[:,4],
Symbol("30/06/2024") =>df2_Data[:,5],Symbol("30/06/2025") =>df2_Data[:,6])
df2_temp2= DataFrame(ID=collect(1:20),GRADE=GRADE,RANK=CAT1,LEVEL=CAT2,VEHICLE=TYPE)
df2 = hcat(df2_temp2,df2_temp)
return df1, df2
end
df1,df2= create_data()
function calc_time(df2)
# Calculate time as year fractions based on dates in df2
dates_string = names(select(df2, Not(Between(:ID, :VEHICLE))))
dates_formatted = Date.(dates_string, Dates.DateFormat("dd/mm/yyyy"))
start_date = Date(2020,01,01)
n_days = dates_formatted .- start_date
df2_Time = Dates.value.(n_days) / 365
return df2_Time
end
df2_Time = calc_time(df2)
'''