[ANN] A new lightning fast package for data manipulation in pure Julia

Julia is generally very memory efficient. It has around 200 MB of overhead to launch, but it gives you very good tools to write memory-efficient data manipulation. (The one exception is that String currently has unfortunately high overhead for small strings.)

6 Likes

I want to point out to observers of this thread that there seems to be some funny business going on here and in other discussions related to InMemoryDatasets. Specifically, there appear to be a sizable number of people in these threads who are masking their location and there are indications that they may be coming from a common location/network. This does not appear to be straight-up sock puppetry (a la Henning Rousseau); posters seem to (mostly) actually be different people, but there does seem to be some sort of hidden agenda here. My guess is that the goal is to make it appear that there is more widespread dissatisfaction with DataFrames in the Julia community than there actually is. I feel I have to post this warning so that participants in the conversations are not taken in by any deception.

To those people in the thread who are doing this: first of all, welcome! You mostly appear to be new to the Julia community and you come bearing code, which is great! However, please consider taking a different approach here. First, stop trying to start flame wars with DataFrames developers; that is not cool. Also, please stop trying to appear to be independent people who just happen to all be fed up with DataFrames. It's fine if you all work together and are collectively frustrated with DataFrames. That's absolutely ok; just don't be deceptive about it. It is also totally fine to have developed a fork of DataFrames and compete with it. The MIT license very much allows that and if you think you can do better, by all means, give it a try.

(One thing that does need to be fixed is the license copyright notice: InMemoryDatasets appears to be derived from the DataFrames code base and the MIT license does require keeping the copyright notice intact, so if you could fix that, that would put the project on legally sound footing.)

Assuming that I'm correct about the "company that is collectively interested in improving on DataFrames" interpretation of what's going on, my suggestion would be: take a beat, reset the conversation, be direct about working together and that you have created and are promoting InMemoryDatasets as an open source alternative to DataFrames. Maybe I'm wrong about my interpretation and if so, feel free to let me know here or privately what's actually going on.

43 Likes

I think it would be good to give people a shot to give alternative explanations since I can imagine several hypothetical reasons for this beyond a desire to seem larger in size than appropriate.

5 Likes

Yeah, I'm absolutely open to something else being up here (feel free to DM me), but something odd is going on and it seemed like people ought to be aware.

4 Likes

A post was split to a new topic: Julia PR team?

The new InMemoryDatasets users have used anything but "subtle ways".

I really want to make sure that this does not become accusatory and that we don't pile on. Nothing that has been done is terrible and it's really exciting to have people interested enough in data wrangling in Julia to take a crack at a new package like this. It's a lot of work and it's generously contributed for anyone to use. Who amongst us hasn't gotten a little vehement in our defense of Julia against its alternatives? Pointing out differences between similar packages can surely easily go the same way without ill intentions. We're not sure what the motivation is for the funky accounts, but let's please, please, please give people the benefit of the doubt.

15 Likes

Perhaps it's best to leave it at that (just the facts) and avoid speculating about intentions. That way anybody from the relevant group has an opportunity to explain if they choose, without feeling defensive.

Yeah, but allow me one remark: I don't want to read any more posts about the benefits of competition vs. cooperation (which we could discuss elsewhere). If we are going Darwinian in an open source forum, I'm out.

3 Likes

I think these kinds of posts shouldn't be included in a public discussion, because they do more harm to the community than good. Including the author, there are fewer than 30 people involved in this topic, and calling that sizable is a little rash. I also searched for topics about InMemoryDatasets and found 7 so far, one of which is this announcement and two of which are also mine.

Sorry to object: I'm thankful for this kind of transparency.

On the one hand, I can imagine the frustration of developers like @sl-solution who get improvement PRs rejected by mature libraries for compatibility reasons, which leads to these kinds of new developments.

On the other hand this is a bad sign, which needs to be addressed:

3 Likes

This is a very commonly misunderstood detail of the MIT license. Seems like a mistake.

4 Likes

Yeah, that's a very common mistake that I've made myself before. I think so long as people are gracious about fixing these things when brought to their attention and it's not some clear pattern of bad behaviour, it's best that we all assume that MIT license violations are accidental.

3 Likes

Awesome! Right package in the wrong language.

If you don't mind: could you help me understand why you joined a Discourse forum for a language you think is bad exactly 1 minute before posting a comment about a very specific package? What was the specific chain of events that led to that happening? It seems like a remarkable coincidence.

23 Likes

Sincerely, the most insulting thing about this comment is not the puerile attack on the language but how it genuinely underestimates the community's efficiency at spotting a troll on sight.

7 Likes

This comment is intended for the original author of this post, within a very specific context which I'm not obligated to share with you. Please don't make any further assumptions.

I would appreciate it if you could point me to specific guidelines that ban such an "insult" to the language.

Who said it was banned?

Calling someone out for poor behavior can be done regardless of whether that behavior is specifically banned.

And poor behavior is not ok just because you have a hidden agenda, as you yourself admit.

10 Likes

https://discourse.julialang.org/guidelines

2 Likes

From the first reactions here I indeed conclude this to be a significant improvement. I'd also expect that the author put some thought into which language to use. So where do you think he misjudged?

Granted. But if you are not interested in discussing this publicly you could use Discourse PM instead.

6 Likes