Changing file paths in structs


#1

This is probably more about how to write clean code:

I have a struct containing a reference to a local file and some additional annotations (in my case sequence alignments and patient information):

struct FooData <: AbstractFooData
    filepath::String
    annotation::Vector{Annotation}
    # some other fields...
end

Several objects of type FooData are combined into a dataset type:

struct FooDataset <: AbstractFooDataset
    name::String
    data::Vector{AbstractFooData}
end

I then have an analysis type for which I define a run method.

struct FooAnalysis
    name::String
    dataset::FooDataset
end

function run(analysis::FooAnalysis, dir::AbstractString)
    # some long calculations...
end

The run method saves some files into the directory dir and also copies every file referenced by a FooData.filepath there, so that everything can later be replicated. For this it would be nice to save a copy of FooAnalysis (most likely as some .jld2 file), but the saved copy should point to the copied files, so I need to update the FooData.filepath fields for that. What is the best option to do this?

The possibilities I came up with are:

1.) just declare FooData as mutable. This seems a bit awkward because I don’t want the end user to change any of the other fields (but maybe it isn’t?).

2.) create a mutable FilePath struct and just update that.

3.) recreate the FooAnalysis object with the new filepaths. This is the option I like best, but I don’t know how to do that, because FooDataset can contain objects of any subtype of AbstractFooData, and all they have in common are the filepath and annotation fields.

I hope this isn’t too vague, but maybe someone has a nice idea on how to structure this kind of problem.


#2

For this it would be nice to save a copy of FooAnalysis (most likely as some .jld2 file), but I need to update the FooData.filepath fields for that.

If by that you mean that the data files are going to be saved in another folder and “packaged” with the .jld2 file in a specific file structure, then why not use a relative path in the filepath field, which doesn’t change, and have another data_dir field in the FooAnalysis or FooDataset type that you can more easily change without digging deep into your data structure? The problem with this approach is that it assumes the data files are always in a specific hierarchy, so it is less flexible if you want to re-arrange the files into subfolders or move them around.
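A minimal sketch of the relative-path idea, just to make it concrete. The names RelData, RelDataset, fullpath and relocate are made up for illustration and are not part of the original code:

```julia
# Hypothetical, simplified types: each data object stores a path that is
# relative to a root directory kept on the dataset.
struct RelData
    relpath::String          # relative to the dataset's data_dir
end

struct RelDataset
    name::String
    data_dir::String         # the only path that changes when files move
    data::Vector{RelData}
end

# Resolve the actual location of one data file on demand.
fullpath(ds::RelDataset, d::RelData) = joinpath(ds.data_dir, d.relpath)

# "Moving" the dataset only rebuilds the outer struct; the data objects
# themselves are reused untouched.
relocate(ds::RelDataset, new_dir::AbstractString) =
    RelDataset(ds.name, new_dir, ds.data)

ds = RelDataset("cohort", "/home/user/project", [RelData("a.bam"), RelData("b.bam")])
moved = relocate(ds, "/mnt/results")
```

Here fullpath(moved, moved.data[1]) resolves against the new root, while ds still resolves against the old one.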

1.) just declare FooData as mutable. This seems a bit awkward because I don’t want the end user to change any of the other fields (but maybe it isn’t?).

You can declare the first field as a Base.RefValue{String} type. A RefValue type is a mutable type with special syntax, so you can do dataobj.filepath[] = new_filepath. Alternatively, you can define your own mutable filepath type.

mutable struct FilePath
    p::String
end

and make dataobj.filepath of type FilePath.
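For comparison, the Base.RefValue variant could look roughly like this. RefFooData is a made-up name, and Vector{String} is only a stand-in for the real Vector{Annotation}:

```julia
# The struct itself stays immutable; only the contents of the Ref can change.
struct RefFooData
    filepath::Base.RefValue{String}
    annotation::Vector{String}   # stand-in for Vector{Annotation}
end

# Ref("...") on a String produces a Base.RefValue{String}.
d = RefFooData(Ref("/home/user/sample.bam"), ["patient 1"])

# Read and write the path through the [] syntax.
d.filepath[] = "/mnt/results/sample.bam"
```

Rebinding a field directly (for example d.annotation = String[]) still raises an error, because the struct is not mutable. Note that the contents of the annotation vector can be modified either way, since a Vector is itself mutable.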

IIUC, my preferred option would be this: while looping over the objects in analysis.dataset.data to copy their files, use something like https://github.com/jw3126/Setfield.jl to make a new FooData struct with the updated filepath for each one, and replace the current dataset.data[i] with that new struct. Of course, since this modifies the dataset.data vector in place, you will need to deepcopy the FooAnalysis object passed to run first if you don’t want to modify the input.
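Here is a rough sketch of the same idea without any package, rebuilding everything instead of replacing elements in place (so no deepcopy is needed). It assumes every concrete subtype has a filepath field and keeps its default positional constructor; with_filepath, relocate and BarData are made-up names, and Vector{String} stands in for Vector{Annotation}. Setfield.jl does this kind of field replacement more robustly.

```julia
abstract type AbstractFooData end

struct FooData <: AbstractFooData
    filepath::String
    annotation::Vector{String}   # stand-in for Vector{Annotation}
end

# A second, hypothetical subtype with an extra field.
struct BarData <: AbstractFooData
    filepath::String
    annotation::Vector{String}
    depth::Int
end

struct FooDataset
    name::String
    data::Vector{AbstractFooData}
end

struct FooAnalysis
    name::String
    dataset::FooDataset
end

# Rebuild an object of any concrete subtype with only `filepath` replaced.
# Assumption: T has a `filepath` field and a default positional constructor.
function with_filepath(d::T, newpath::AbstractString) where {T<:AbstractFooData}
    vals = map(fieldnames(T)) do f
        f === :filepath ? String(newpath) : getfield(d, f)
    end
    return T(vals...)
end

# Build a new analysis whose data objects all point into `newdir`.
function relocate(a::FooAnalysis, newdir::AbstractString)
    newdata = AbstractFooData[
        with_filepath(d, joinpath(newdir, basename(d.filepath)))
        for d in a.dataset.data
    ]
    return FooAnalysis(a.name, FooDataset(a.dataset.name, newdata))
end

a = FooAnalysis("run1", FooDataset("cohort",
    AbstractFooData[FooData("/x/a.bam", ["p1"]), BarData("/y/b.bam", ["p2"], 3)]))
b = relocate(a, "results")
```

The original a is left untouched, and each element of b.dataset.data keeps its concrete type and its other fields.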

Good luck!


#3

Great, thank you!

The reason I can’t use relative file paths is that the data objects are usually provided by the user of this future package and are probably scattered across different locations, so I thought it would be best to treat the results directory as a single entity that can be moved across different computers.

For now I went for the FilePath approach and just provided an outer constructor for FooData that takes a string and this seems to work just fine.
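Roughly, this is what that looks like (simplified: Vector{String} stands in for Vector{Annotation}, the other fields are left out, and the example paths are made up):

```julia
abstract type AbstractFooData end

# Small mutable wrapper, so only the path can be changed after construction.
mutable struct FilePath
    p::String
end

struct FooData <: AbstractFooData
    filepath::FilePath
    annotation::Vector{String}   # stand-in for Vector{Annotation}
end

# Outer constructor: users pass a plain string and never see FilePath.
FooData(path::AbstractString, annotation) = FooData(FilePath(path), annotation)

d = FooData("/home/user/sample.bam", ["patient 1"])

# After copying the file into the results directory, update the path in place.
d.filepath.p = joinpath("results", "data", "sample.bam")
```

The FooData struct itself stays immutable, so its fields cannot be reassigned; only the path inside the wrapper changes.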

Thank you again