Reimagining C++ project in Julia: arrays and structs vs data frames

I delayed a response because I was reluctant to post any code. I found a chunk that’s not too long but has a few complicated parts; help with those would benefit me greatly.

Comments:

  1. I used a few types from the Standard Template Library: map, multimap and vector. Multimap has the unique property of handling duplicate keys. In my program, gravity stations have a large number of data fields that I can model as a vector of structs. The station names are the keys. Of course, stations are repeated, so I have duplicate entries for some stations, which are eventually removed after some processing to leave a unique set. I haven’t found a suitable substitute for multimap in Julia yet.
  2. C++ methods are member functions of the class; in Julia I can move them outside the struct that replaces the class.
  3. The data members fall into two types: primitive data types like string, int and double; and references to other objects plus a vector of station objects (to be converted to Julia structs).
  4. So, it looks like I have to tackle the problem of building a struct that’s composed of other structs. From what I’ve seen so far, that is not so easy.
  5. It pretty much has to be a mutable struct. The constructor is called with the filename and references to other objects. The constructor then reads the data file in two parts: the header, which fills the primitive data fields, and the table of station data. As each one is read in, a station object is constructed and added to the vector.
  6. Quite a lot is going on inside each constructor. The main program manages a list of these loop files and processes each one until they are all added to the station multimap. Further processing is done to build out the fields in each station struct.
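
#include <map>
#include <string>
#include <vector>
using namespace std;

// GravityStation, GravityMeter and CalDate are defined elsewhere in the project (not shown).
typedef map <const string, double, less<string> > refBaseMapType;
typedef map <const string, GravityMeter, less<string> > meterMapType;
typedef multimap <const string, GravityStation, less<string> > gravityStationMMapType;
typedef vector <GravityStation> GSVector;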

class GravityLoop {
public:
	// constructor: takes the filename and references to the shared maps
	GravityLoop(string, gravityStationMMapType &, refBaseMapType &, meterMapType & );
	
	~GravityLoop();					// destructor, releases heap memory
	void readFile();				// read input file into job
	void printHeader();				// print contents of loop to standard output
	void writeObsFile(string);		// Write observed gravity file
	void writeReport();				// Write station data report file

	string FileName() const {return fileName;}
	void setFileName(string fn){fileName = fn;}
	int StationCount() const {return nStations;}
	double TotalDrift() const {return totalDrift;}
	
private:
	string fileName;		// loop filename
	string title;			// Loop Title
	string meterName;		// Meter Name
	string operatorName;	// Operator Name
	CalDate loopDate;		// Day of the survey
	double timeZoneOffset;	// Time zone offset to UTC
	double latitude;		// Loop Latitude
	double longitude;		// Loop Longitude
	
	GSVector gsVec;			// Vector of loop gravity stations
	
	gravityStationMMapType &staMMap;	// MultiMap of station names, station objects
	refBaseMapType &refBaseMap;	// Map of reference base names, values
	meterMapType &meterMap;		// Map of meter names, meter objects
	
	int nStations;			// Number of Stations in loop
	double totalDrift;		// Total loop drift
	
};
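
One way the class above might translate to Julia, as a minimal sketch rather than a finished port: a `Dict` whose values are vectors stands in for the multimap, the struct holds the shared maps directly (a `Dict` field is a reference, so it behaves like the C++ reference members), and the methods become ordinary functions outside the struct. The field set is trimmed, and `GravityStation`'s `reading` field, `add_station!`, `station_count` and the two-column file layout are all assumptions made for illustration.

```julia
# Hypothetical stand-ins for types defined elsewhere in the C++ project.
struct GravityMeter
    name::String
end

mutable struct GravityStation
    name::String
    reading::Float64          # assumed field, for illustration only
end

# Stand-in for the multimap: each key maps to a vector of stations,
# so duplicate station names simply accumulate in that vector.
const StationMultiMap = Dict{String, Vector{GravityStation}}

# Rough equivalent of multimap::insert: append under the key,
# creating the empty vector the first time the key is seen.
function add_station!(mm::StationMultiMap, sta::GravityStation)
    push!(get!(() -> GravityStation[], mm, sta.name), sta)
    return mm
end

# A struct composed of other structs and containers (trimmed field list).
mutable struct GravityLoop
    file_name::String
    title::String
    stations::Vector{GravityStation}          # GSVector
    sta_mmap::StationMultiMap                 # shared with the caller
    ref_base_map::Dict{String, Float64}       # shared with the caller
    meter_map::Dict{String, GravityMeter}     # shared with the caller
    total_drift::Float64
end

# Outer constructor that reads the loop from an IO stream.
# Assumed layout: first line is the title, then one "name reading" pair per line.
function GravityLoop(io::IO, file_name::AbstractString, sta_mmap, ref_base_map, meter_map)
    title = readline(io)
    loop = GravityLoop(file_name, title, GravityStation[],
                       sta_mmap, ref_base_map, meter_map, 0.0)
    for line in eachline(io)
        isempty(strip(line)) && continue
        name, reading = split(line)
        sta = GravityStation(name, parse(Float64, reading))
        push!(loop.stations, sta)        # the loop's own vector
        add_station!(sta_mmap, sta)      # the shared multimap
    end
    return loop
end

# Former member functions become plain functions taking the struct.
station_count(loop::GravityLoop) = length(loop.stations)

# Usage with an in-memory "file"; open(path) would supply a real one.
io = IOBuffer("Loop A\nBASE 1.5\nSTA1 2.0\nBASE 1.7\n")
mm = StationMultiMap()
loop = GravityLoop(io, "loopA.dat", mm,
                   Dict{String, Float64}(), Dict{String, GravityMeter}())
station_count(loop)     # 3
length(mm["BASE"])      # 2, the repeated station is kept under one key
```

Because the stations are mutable and the same object is pushed to both the loop's vector and the multimap, later processing that fills in station fields is visible through either container.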

Part of the solution may be replacing “array of structs”

struct Person
    name::String
    age::Int
end

persons = [Person("Alice", 20), Person("Bob", 30), ...]

with “struct of arrays”, which conceptually operates like this:

struct PersonVector
    name::Vector{String}
    age::Vector{Int}
end

persons = PersonVector(["Alice", "Bob"], [20,30])
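
To make the relationship between the two layouts concrete, here is a small sketch in plain Julia (no packages) that converts the row layout into the column layout and back. The extra `PersonVector(rows)` constructor and the `row` helper are hypothetical names added for illustration.

```julia
struct Person
    name::String
    age::Int
end

struct PersonVector
    name::Vector{String}
    age::Vector{Int}
end

# Build the column layout from the row layout: one vector per field.
PersonVector(rows::Vector{Person}) =
    PersonVector([p.name for p in rows], [p.age for p in rows])

# Reassemble row i on demand (hypothetical helper).
row(pv::PersonVector, i::Integer) = Person(pv.name[i], pv.age[i])

people = [Person("Alice", 20), Person("Bob", 30)]
pv = PersonVector(people)

sum(pv.age) / length(pv.age)   # 25.0, a whole-column operation
row(pv, 2).name                # "Bob"
```

The column layout makes whole-field operations cheap and simple, while the row layout is more natural when each record is processed on its own.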

In Julia, “struct of arrays” (column-oriented) containers typically expose the Tables.jl interface, which is implemented by a number of different packages, e.g. DataFrames, StructArrays, TypedTables, TupleVectors, etc.


More progress - I’ve been studying the JuliaGeo project. Geodesy.jl has a lot I can relate to, and there’s a lot to learn from it. I retract what I said about not including many ellipsoids. The ellipsoid constructor allows you to add the definitions of any ellipsoids you wish to use. I have a lot to digest here. Thanks for all the help and comments.

And there is also PROJ, which is integrated into packages like GDAL and GMT.