Gadfly boxplot is getting plotted like a vertical straight line (missing the box part)

Hello,

I am trying to make a boxplot using Gadfly, but all I see are straight vertical lines. This seems to be a tricky issue: it only happens for a specific combination of data points, and I cannot figure out what is wrong. I have replicated the issue with fake data below. Could you please help me understand what I may be missing?

This dataset actually causes two issues.

  1. The major issue is that the boxplots are drawn as vertical lines.
  2. If I set up the plot differently, I do see boxes, but the leftmost and rightmost ends of the plot get trimmed away.

Fake Data

using DataFrames, Gadfly

df_plot = vcat(
    DataFrame(model = repeat(["one", "two", "three"], inner=242),
              type = repeat(["type1"], outer=726),
              data = rand(726)),

    DataFrame(model = repeat(["four", "five", "six"], inner=242),
              type = repeat(["type2"], outer=726),
              data = rand(726)))

[Image: some more information about the data]

Plot (issue 1)

Gadfly.plot(df_plot,
x = :type,
y=:data,
color=:model,
Geom.boxplot
)

Plot (issue 2)
Gadfly.plot(df_plot,
x=:model,
y=:data,
color = :type, # swapped color and x aesthetics
Geom.boxplot,
)

I tried to increase the width of the plot using

set_default_plot_size(10inch, 4inch)

but the issue still remains

Thank you for all the help!

I would really appreciate it if anyone could help with a workaround for issue 2. It is fairly important for me right now. By workaround I mean any adjustment, such as spacing settings in Theme, that fixes the trimming. I cannot really change the format of the plot, because my actual work requires this kind of plot.

Thanks a lot for the help!

gadfly is getting confused because the models for type1 (one, two, three) are not the same as the models for type2 (four, five, six). is this really the nature of your data? if i change your example so that they are the same (i.e., type2 models are also one, two, three), then the output looks reasonable:

df_plot = vcat(
       DataFrame(model = repeat(["one","two", "three"],inner=242),
       type = repeat(["type1"],outer=726),
       data=rand(726)),

       DataFrame(model = repeat([ "one", "two", "three"],inner=242),
       type = repeat(["type2"],outer=726),
       data=rand(726)))
plot(df_plot,
       x = :model,
       y=:data,
       color=:type,
       Geom.boxplot
       )

plot(df_plot,
       x = :type,
       y=:data,
       color=:model,
       Geom.boxplot
       )
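To check whether your real data has the same mismatch, here is a minimal sketch that lists which models occur under each type. It assumes the column names from the fake data above; `counts`, `models_by_type`, and `shared` are just illustrative names.

```julia
using DataFrames

# Same layout as the fake data in the question (values are random).
df_plot = vcat(
    DataFrame(model = repeat(["one", "two", "three"], inner = 242),
              type  = repeat(["type1"], outer = 726),
              data  = rand(726)),
    DataFrame(model = repeat(["four", "five", "six"], inner = 242),
              type  = repeat(["type2"], outer = 726),
              data  = rand(726)))

# Count rows for every (type, model) pair that actually occurs.
counts = combine(groupby(df_plot, [:type, :model]), nrow => :count)

# Collect the set of models seen under each type.
models_by_type = Dict(key.type => Set(sub.model)
                      for (key, sub) in pairs(groupby(df_plot, :type)))

# Models shared by both types; empty for the data in the question.
shared = intersect(models_by_type["type1"], models_by_type["type2"])
```

If `shared` is empty, every model belongs to exactly one type, which is the situation that produces the plots in the question.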

as a workaround, if that is truly the nature of your data, i would just make two plots using different slices of the dataframe:

julia> type1 = Gadfly.plot(df_plot[df_plot.type.=="type1", :],
       x = :model,
       y=:data,
       Geom.boxplot,
       Guide.title("type1"));

julia> type2 = Gadfly.plot(df_plot[df_plot.type.=="type2", :],
       x = :model,
       y=:data,
       Geom.boxplot,
       Guide.title("type2"));

julia> hstack(type1, type2)
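If the two types should also stand out by color, one possible variant of the same idea gives each plot its own fill color through `Theme(default_color = ...)`. This is only a sketch: `type_boxplot` is a hypothetical helper, the color names are arbitrary, and it assumes the Colors package is available in your environment for the `colorant"..."` string macro.

```julia
using DataFrames, Gadfly, Colors

# Same layout as the fake data in the question (values are random).
df_plot = vcat(
    DataFrame(model = repeat(["one", "two", "three"], inner = 242),
              type  = repeat(["type1"], outer = 726),
              data  = rand(726)),
    DataFrame(model = repeat(["four", "five", "six"], inner = 242),
              type  = repeat(["type2"], outer = 726),
              data  = rand(726)))

# Hypothetical helper: boxplot of the rows belonging to one type,
# drawn in a single color so that the type is visible at a glance.
function type_boxplot(df, typename, color)
    sub = df[df.type .== typename, :]
    return Gadfly.plot(sub,
                       x = :model,
                       y = :data,
                       Geom.boxplot,
                       Guide.title(typename),
                       Theme(default_color = color))
end

# Place the two single-type plots side by side.
both = hstack(type_boxplot(df_plot, "type1", colorant"steelblue"),
              type_boxplot(df_plot, "type2", colorant"darkorange"))
```

Because each plot only sees the models of its own type, the x axis of each panel contains just three categories.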

Thanks a lot @bjarthur for the workaround suggestion.

Yes, that’s the nature of the data. I have two different categories that define my models. I want the x axis to show a property of the model (which I represented as one, two, three here). Since the two categories have no properties in common, the x-axis values are unique to each category.

I wanted readers to clearly see the two categories (it’s not obvious from the x axis alone). Hence, I was experimenting with differentiating the categories either by grids or by color, hoping that the categories would stand out to readers.

I am also putting the property of the model on the x axis because the y-axis metric has a roughly linear relationship with it. Interestingly, the strength of that linear relationship changes from one category to the other, and I wanted the graph to highlight this as well.

Thanks again!