How to fill dataframe from an array and date range?

So let’s see if I can convey what I’m trying to do:

I have a list (array actually) of 931 technology names, and a known minimum date and maximum date. I want to create a dataframe in the most Julia-esque way which has every date (between the minimum and maximum dates) associated with every technology name, and missing for any additional columns I want in the dataframe. How do I do this?

Thank you in advance!

Perhaps you need join, but I don’t fully understand what you are trying to do from this description. Perhaps provide a simplified example of an input and the expected output, or an MWE of working code that you want to simplify.

I’m also unsure of what you want - when you say “every date”, do you mean year, month, day? There are a number of ways I can imagine structuring the DataFrame, and I don’t think any of them are more or less Julian than others.

Once you explain the form your data takes (you said you have an array of technologies, but how are they situated with dates?), and more about the form you want it in, I think we can be of more help.
In particular, if you can make an MWE as Tamas suggested, maybe just take two or three of the technologies and dates and manually build the DataFrame you want.

Y’all gave me some ideas, so here’s what I came up with:

```julia
unique_reddit_dates = DataFrame(reddit_date = unique(comment_df.reddit_date))
comment_df_empty = DataFrame(audience = reddit_search.audience, subdomain = reddit_search.subdomain, comment_count = 0)
comment_df_empty = join(comment_df_empty, unique_reddit_dates, kind = :cross, makeunique = true)
comment_df_empty[:subdomain_all_freq_comment_count] = 0

# then remove empty records which we have actual values for (roughly thirty thousand right now)
not_in_comment_df = join(comment_df_empty, comment_df, on = [:audience, :subdomain, :reddit_date], kind = :anti)

# and add in the blank records
comment_df = vcat(comment_df, not_in_comment_df)
```

Which works. Is there a better/more Julia-ish way?
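
For anyone reading this on a recent DataFrames.jl release, where `join(...; kind = ...)` and `df[:col]` indexing are no longer available, the same three steps could be sketched roughly as below. The toy tables and the `filled` name are made up for illustration, and the extra `subdomain_all_freq_comment_count` column is left out to keep the example small.

```julia
using Dates
using DataFrames

# Toy stand-ins for the real tables; column names follow the code above.
comment_df = DataFrame(audience = ["dev"], subdomain = ["julia"],
                       reddit_date = [Date(2019, 1, 1)], comment_count = [5])
reddit_search = DataFrame(audience = ["dev", "dev"], subdomain = ["julia", "rust"])
unique_reddit_dates = DataFrame(reddit_date = [Date(2019, 1, 1), Date(2019, 1, 2)])

# Every (audience, subdomain) pair combined with every date, with a zero count.
comment_df_empty = crossjoin(reddit_search, unique_reddit_dates)
comment_df_empty[!, :comment_count] .= 0

# Keep only the combinations that have no real record yet.
not_in_comment_df = antijoin(comment_df_empty, comment_df,
                             on = [:audience, :subdomain, :reddit_date])

# Append the zero-filled rows to the real ones.
filled = vcat(comment_df, not_in_comment_df)
```

With the toy data, `filled` holds the one real record plus three zero-count rows, one for each combination that was missing from `comment_df`.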

I would create a function that takes mindate and maxdate and returns a vector of all of the days you’d want to include that fall between them.

Then I’d create a loop that goes over your existing list, calls this function, creates a DataFrame from the result, and sequentially appends these DataFrames together.
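
A minimal sketch of that idea, assuming daily granularity and a recent DataFrames.jl release. The names `all_days`, `expand_by_date`, and the `comment_count` column are hypothetical, and the three technology names stand in for the real list of 931.

```julia
using Dates
using DataFrames

# Hypothetical helper: every day from mindate through maxdate, inclusive.
all_days(mindate::Date, maxdate::Date) = collect(mindate:Day(1):maxdate)

# Build one block of rows per technology and append it to the result.
function expand_by_date(technologies, mindate::Date, maxdate::Date)
    days = all_days(mindate, maxdate)
    result = DataFrame(technology = String[], date = Date[])
    for tech in technologies
        block = DataFrame(technology = fill(tech, length(days)), date = days)
        append!(result, block)
    end
    # Extra column that starts out entirely missing but can hold Ints later.
    result.comment_count = Vector{Union{Missing, Int}}(missing, nrow(result))
    return result
end

technologies = ["julia", "python", "rust"]   # stand-in for the real list
df = expand_by_date(technologies, Date(2019, 1, 1), Date(2019, 1, 3))
```

Here `df` has one row per technology per day (nine rows for this toy input). If the dates should be monthly or yearly instead, swapping `Day(1)` for `Month(1)` or `Year(1)` in the helper would be the only change needed.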
