How to send an array from Splunk to a dataframe?

Hi

I have used Splunk API (in Python) to retrive data. Then I am sending that data to a Dataframe. This is the code:

results=pyimport("splunklib.results")

kwargs_oneshot = (earliest_time= "2019-09-07T12:00:00.000-07:00",
                  latest_time= "2019-09-09T12:00:00.000-07:00",
                  count=0)

searchquery_oneshot = "search index=iis | lookup geo_BST_ONT longitude as sLongitude, latitude as sLatitude | stats count by featureId | eval log=ln(count)| table featureId log | geom geo_BST_ONT allFeatures=True"

oneshotsearch_results = service.jobs.oneshot(searchquery_oneshot; kwargs_oneshot...)

# Get the results
reader = results.ResultsReader(oneshotsearch_results)

# collect them into an array
Dv = collect(reader);

## Sending data to a dataframe
using DataFrames
df_splunk=vcat(DataFrame.(Dv)...)

The problem arises whenever I say to Splunk to calculate a β€œlog” (see: β€œsearchquery_oneshot”) . Because Julia does not want to β€œeat” the data.

The error that I get is:

ArgumentError: column(s) count are missing from argument(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637 and 638, and column(s) log are missing from argument(s) 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814 and 815

Stacktrace:
 [1] #_vcat#82(::Symbol, ::typeof(DataFrames._vcat), ::Array{DataFrame,1}) at /home/juliana/.julia/packages/DataFrames/uPgZV/src/abstractdataframe/abstractdataframe.jl:1226
 [2] #reduce#77 at ./none:0 [inlined]
 [3] (::Base.var"#kw##reduce")(::NamedTuple{(:cols,),Tuple{Symbol}}, ::typeof(reduce), ::typeof(vcat), ::NTuple{815,DataFrame}) at ./none:0
 [4] #vcat#76(::Symbol, ::typeof(vcat), ::DataFrame, ::Vararg{DataFrame,N} where N) at /home/juliana/.julia/packages/DataFrames/uPgZV/src/abstractdataframe/abstractdataframe.jl:1180
 [5] vcat(::DataFrame, ::DataFrame, ::DataFrame, ::Vararg{DataFrame,N} where N) at /home/juliana/.julia/packages/DataFrames/uPgZV/src/abstractdataframe/abstractdataframe.jl:1180
 [6] top-level scope at In[5]:3

**This is related to: https://discourse.julialang.org/t/how-to-send-data-to-a-dataframe-when-some-values-are-nan-or-empty/34653

Can you post the error message? There is no issue creating a column with NaN because NaN is a standard Float64 value. Therefore the column is just seen as a column of floating point numbers.

add to your example

x = randn(8)
x[2:5] .= NaN
df[!,:x] = x

This is the error.

ArgumentError: column(s) count are missing from argument(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637 and 638, and column(s) log are missing from argument(s) 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814 and 815

Stacktrace:
 [1] #_vcat#82(::Symbol, ::typeof(DataFrames._vcat), ::Array{DataFrame,1}) at /home/juliana/.julia/packages/DataFrames/uPgZV/src/abstractdataframe/abstractdataframe.jl:1226
 [2] #reduce#77 at ./none:0 [inlined]
 [3] (::Base.var"#kw##reduce")(::NamedTuple{(:cols,),Tuple{Symbol}}, ::typeof(reduce), ::typeof(vcat), ::NTuple{815,DataFrame}) at ./none:0
 [4] #vcat#76(::Symbol, ::typeof(vcat), ::DataFrame, ::Vararg{DataFrame,N} where N) at /home/juliana/.julia/packages/DataFrames/uPgZV/src/abstractdataframe/abstractdataframe.jl:1180
 [5] vcat(::DataFrame, ::DataFrame, ::DataFrame, ::Vararg{DataFrame,N} where N) at /home/juliana/.julia/packages/DataFrames/uPgZV/src/abstractdataframe/abstractdataframe.jl:1180
 [6] top-level scope at In[5]:3

Seems to me the issue is that the column count is not on all of the DataFrame.(Dv) results.

If you do something like dfvec = DataFrame.(Dv) you should get a vector of the data frames that you can inspect to see if this is indeed the case.

There is always this message:

815-element Array{DataFrame,1}:
 1Γ—5 DataFrame. Omitted printing of 2 columns

But when I just retrieve data from Splunk without a β€œlog” and then send it to the dataframe I do not get that error.

julia> using DataFrames

julia> df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
                  b = repeat([2, 1], outer=[4]),
                  c = randn(8))
8Γ—3 DataFrame
β”‚ Row β”‚ a     β”‚ b     β”‚ c         β”‚
β”‚     β”‚ Int64 β”‚ Int64 β”‚ Float64   β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 1     β”‚ 2     β”‚ 0.425327  β”‚
β”‚ 2   β”‚ 2     β”‚ 1     β”‚ 2.003     β”‚
β”‚ 3   β”‚ 3     β”‚ 2     β”‚ 0.0297242 β”‚
β”‚ 4   β”‚ 4     β”‚ 1     β”‚ -0.137706 β”‚
β”‚ 5   β”‚ 1     β”‚ 2     β”‚ 0.499277  β”‚
β”‚ 6   β”‚ 2     β”‚ 1     β”‚ -0.490561 β”‚
β”‚ 7   β”‚ 3     β”‚ 2     β”‚ -0.434039 β”‚
β”‚ 8   β”‚ 4     β”‚ 1     β”‚ -0.243863 β”‚

julia> df2 = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
                  b = repeat([2, 1], outer=[4]),
                  c = randn(8),x=ones(8))
8Γ—4 DataFrame
β”‚ Row β”‚ a     β”‚ b     β”‚ c         β”‚ x       β”‚
β”‚     β”‚ Int64 β”‚ Int64 β”‚ Float64   β”‚ Float64 β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 1     β”‚ 2     β”‚ 0.191768  β”‚ 1.0     β”‚
β”‚ 2   β”‚ 2     β”‚ 1     β”‚ 0.455636  β”‚ 1.0     β”‚
β”‚ 3   β”‚ 3     β”‚ 2     β”‚ 1.77497   β”‚ 1.0     β”‚
β”‚ 4   β”‚ 4     β”‚ 1     β”‚ -0.173877 β”‚ 1.0     β”‚
β”‚ 5   β”‚ 1     β”‚ 2     β”‚ 0.652813  β”‚ 1.0     β”‚
β”‚ 6   β”‚ 2     β”‚ 1     β”‚ 0.153738  β”‚ 1.0     β”‚
β”‚ 7   β”‚ 3     β”‚ 2     β”‚ -0.617698 β”‚ 1.0     β”‚
β”‚ 8   β”‚ 4     β”‚ 1     β”‚ 0.654069  β”‚ 1.0     β”‚

julia> vcat(df,df2)
ERROR: ArgumentError: column(s) x are missing from argument(s) 1
Stacktrace:
 [1] #_vcat#75(::Symbol, ::typeof(DataFrames._vcat), ::Array{DataFrame,1}) at C:\Users\tbeason\.julia\packages\DataFrames\yH0f6\src\abstractdataframe\abstractdataframe.jl:1117
 [2] #reduce#70 at .\none:0 [inlined]
 [3] #reduce at .\none:0 [inlined]
 [4] #vcat#69 at C:\Users\tbeason\.julia\packages\DataFrames\yH0f6\src\abstractdataframe\abstractdataframe.jl:1082 [inlined]
 [5] vcat(::DataFrame, ::DataFrame) at C:\Users\tbeason\.julia\packages\DataFrames\yH0f6\src\abstractdataframe\abstractdataframe.jl:1082
 [6] top-level scope at REPL[4]:1

That error message has nothing to do with the log function. It comes from trying to vcat two data frames that have different columns. You can instead take the union by using vcat(df,df2,cols=:union), but that may not be what you are trying to do.

2 Likes

This is some of the data that I want to send to a dataframe. As you can see, there is a β€œlog”. Whenever I retrieve it from Splunk and try to send it to a Dataframe I get the error before mentionned. Otherwise all the data is sent to a Dataframe.

There is a β€œlog”. It is the log of β€œcount”. the calculation has been made in Splunk. I can never send this to a dataframe.

This is the dataframe when I do not retrieve β€œlog” from Splunk.

Your first post seems to complain about a DomainError, which is something that you would encounter if you try log(-1) for example. But the error message that you gave me is strictly related to attempting to combine non-identical DataFrames. So, I’m not sure how else to help you. Perhaps you should break the code up into smaller chunks to make sure you understand what each piece is doing.

1 Like

What I am thinking right now is:

One solution would be: is there other way to send my Splunk’s array to a Julia’s dataframe?

The other solution is to just retrieve the data without the β€œlog” and once it is in Julia’s dataframe…multiply the β€œcount” column with a log. β€”> THIS IS WHAT i AM DOING RIGHT NOW

My guess is that log is not present in all the Dicts coming from Splunk. Maybe you can sanitize your data first by setting log to missing when Dicts don’t have them.

But please read, and post, your error messages. You have gotten three different errors from three different things. And remember to do vcat with union.

As another poster said earlier, the NaNs shouldn’t be an issue.

1 Like

I have improved my question. How do you do vcat with union?

Check out the documentation for vcat here. Remember you can also look up documentation by doing ?vcat in your jupyter notebook.

# make an array of dataframes
dfs = [DataFrame(d) for d in vec_of_dicts]
df_new = DataFrame()
for df in dfs
    df_new = vcat(df_new, df, cols = :union)
end

You can also do

reduce(vcat, dfs, cols = :union)
2 Likes