I use a DataFrame to hold data in which a set of columns is treated as the primary key, so only one record should exist for each key. As far as I can tell, a DataFrame cannot enforce such a condition. For example, in the following code person and town together represent the primary key and time is the queried value:
using DataFrames, DataFramesMeta
using Dates
df2 = DataFrame(person = ["John", "Nick", "Mary", "Mary"],
                time = [DateTime("20150101", "yyyymmdd"),
                        DateTime("20150101", "yyyymmdd"),
                        DateTime("20150201", "yyyymmdd"),
                        DateTime("20150601", "yyyymmdd")],
                town = ["Brisbane", "Perth", "Wollongong", "Wollongong"]);
e = df2[(df2.person .== "Mary") .& (df2.town .== "Wollongong"), :time]
show(e)
Suppose that a person visiting the same town twice is impossible. I ask when Mary visited Wollongong and expect a single date, but I get an array because more than one record with that key was put into df2.
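To make the problem concrete, here is a minimal sketch of how the duplicate key can be detected after the fact, assuming a recent DataFrames.jl (1.x). The names key, dup and n_matches are only illustrative. This detects the violation; it does not stop the duplicate row from being added in the first place, which is what I am really after.

```julia
using DataFrames, Dates

df2 = DataFrame(person = ["John", "Nick", "Mary", "Mary"],
                time = [DateTime("20150101", "yyyymmdd"),
                        DateTime("20150101", "yyyymmdd"),
                        DateTime("20150201", "yyyymmdd"),
                        DateTime("20150601", "yyyymmdd")],
                town = ["Brisbane", "Perth", "Wollongong", "Wollongong"])

# Columns treated as the primary key (an assumption of this example).
key = [:person, :town]

# nonunique marks each row whose key repeats that of an earlier row.
dup = nonunique(df2, key)

# The same query as above; it returns a vector, not a single value.
e = df2[(df2.person .== "Mary") .& (df2.town .== "Wollongong"), :time]

# only(e) would return the single element, but it throws an ArgumentError
# when e does not hold exactly one element, which is the case here.
n_matches = length(e)
```

With this data, dup flags the fourth row and the query returns two dates, so I would have to run such a check by hand after every insertion.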
Is it better to use a NamedArray or Dict instead of a DataFrame to enforce records with unique primary keys? In Pandas there is MultiIndex, which I suppose can achieve something like this.