How to get the value attribute from an `<li>` element of HTML in Julia?

I want to get the `value` attribute from all `<li>` elements of an HTML page (e.g. 34123, 34122, etc.) as an array, so that I can use them for identification.


This is related to the Gumbo documentation.

I'm writing this code without running it, so take it with a grain of salt:

```julia
# ol_element is the parent <ol> element you located earlier
li_elements = children(ol_element)
li_vals = [getattr(li, "value") for li in li_elements]
```

Your li_vals should be a vector holding all the values from the li elements at this point.
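Because the snippet above was not run, here is a minimal offline sketch of the same idea, assuming Gumbo.jl is installed. The HTML string is made up (it only reuses the kind of values mentioned in the question), and the `isa HTMLElement{:li}` filter is an added precaution that skips any text nodes `children` might return.

```julia
using Gumbo  # assumes Gumbo.jl is installed

# Made-up page with the same shape as the real one:
# an <ol> whose <li> children carry a `value` attribute.
html = "<html><body><ol>" *
       "<li value=\"34128\">first</li>" *
       "<li value=\"34127\">second</li>" *
       "<li value=\"34126\">third</li>" *
       "</ol></body></html>"

doc = parsehtml(html)
body = doc.root[2]       # children of <html>: 1 = <head>, 2 = <body>
ol_element = body[1]     # the first child of <body> is the <ol>

# Keep only <li> elements, then read their "value" attribute.
li_elements = [c for c in children(ol_element) if c isa HTMLElement{:li}]
li_vals = [getattr(li, "value") for li in li_elements]

println(li_vals)
```

The values come back as strings; convert them with `parse(Int, ...)` if you need numbers.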

Please take a look at the convenience methods provided by Gumbo.jl.

Please see this: I am not able to extract the value of the `<li>` element. 🙂

In your input line, after `Gumbo.HTMLElement{:li}` you’ve put a colon (`:`), which in Julia usually builds a range:

```julia
1:10
```

so I’m guessing that the message is saying that you started a range expression, used a colon, but then didn’t supply the last part(s) of the range.

I’d have to read the Gumbo docs to find out whether ranges are OK here.
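To see why the colon causes trouble, here is a small sketch: `a:b` is shorthand for calling `(:)(a, b)`, which builds a range from numbers. Below, `Int` is only a stand-in for `Gumbo.HTMLElement{:li}` so the example runs without Gumbo; using a type as the start of a range has no matching method and raises a `MethodError` of the same kind as the one quoted in this thread.

```julia
# `start:stop` calls the Colon function, (:)(start, stop).
r = 1:5
@show collect(r)   # a UnitRange from 1 to 5, collected into a Vector

# A type as the range start (Int here, standing in for
# Gumbo.HTMLElement{:li}) has no matching (:) method.
err = try
    Int:5
catch e
    e
end
println(err isa MethodError)   # prints true
```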

```
MethodError: no method matching (::Colon)(::Type{Gumbo.HTMLElement{:li}}, ::Int64)

Closest candidates are:
  (::Colon)(::T, ::Any, !Matched::T) where T
   @ Base range.jl:44
  (::Colon)(!Matched::T, ::Real, !Matched::T) where T<:AbstractFloat
   @ Base range.jl:18
  (::Colon)(!Matched::T, ::T) where T<:Real
   @ Base range.jl:5

 [1] top-level scope
   @ Local: 1
```

```julia
getattr(Gumbo.HTMLElement{:li}:5, value)
```

What are you hoping to do, first with the `:` and now with the `:5`? That does not make sense to me: `:` indicates a range, but the start of your range is the type `Gumbo.HTMLElement{:li}` rather than a number. That is exactly what the error message tells you.

I want to extract the value of each `<li>` element and put the values in an array, so that I can get a list of the circulars' web pages.

Yes, you wrote that above, but I was asking specifically about the `:` and later the `:5`; those do not make sense. In principle, you are writing something like “please count from li to nothing” in the first case and “from li to 5” in the second. That does not make sense and is the origin of your error message.

Have you tried the approach mentioned above?

I tried, but it gives an error. I am stuck, and the Gumbo documentation is not detailed enough for me to understand it. I want to get values like “34128”.

```html
<li value="34128">
```


@raman_kumar,

There are multiple things that are wrong with your code above:

  1. As others pointed out, you are misusing Julia syntax: mainly, you use `:` in the wrong place. This has nothing to do with Gumbo or any specific library. I won't discuss it further, since others have already done so.
  2. `getattr` cannot be used with a sequence of `HTMLElement`s as input: you need to call `getattr` with an instance of an `HTMLElement` (not the element's type, such as `Gumbo.HTMLElement{:li}`) and an attribute name (which is a string).
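A minimal sketch of point 2, assuming Gumbo.jl is installed and using a made-up HTML string: `getattr` takes one element instance and an attribute name, so to cover several elements you call it once per element, for example with a comprehension or with Julia's dot broadcasting. The call with the type itself is included only to show that it fails.

```julia
using Gumbo  # assumes Gumbo.jl is installed

doc = parsehtml("<html><body><ol><li value=\"34128\">a</li>" *
                "<li value=\"34127\">b</li></ol></body></html>")
lis = children(doc.root[2][1])   # the two <li> elements under <ol>

# One element instance plus an attribute name (a String): this works.
first_val = getattr(lis[1], "value")

# A collection is not an HTMLElement, so apply getattr to each element;
# the dot broadcasts over `lis` and treats the string "value" as a scalar.
all_vals = getattr.(lis, "value")

# Passing the type HTMLElement{:li} instead of an instance has no
# matching method, so this throws a MethodError.
type_err = try
    getattr(HTMLElement{:li}, "value")
catch e
    e
end
```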

Also, if you share your code as a minimal reproducible example that we can run, you will enable everybody to offer better help.

Please share your code starting with the document parsing and make sure to include the snippet that gives you a hard time (a screenshot is not a good idea).


My code looks like this:

```julia
using HTTP, Gumbo, Cascadia, DataFrames, ParserCombinator, Profile
mnurl = "https://gcn.nasa.gov/circulars"
q = HTTP.get(mnurl)
g = parsehtml(String(q.body))
bd = g.root[2]
dv = eachmatch(Selector(".grid-container.usa-section ol"), bd)
dv[1]
children(dv[1])
```

After that, I want to get the value of each `<li>` element in the output of `children(dv[1])`.

This code does not reproduce any error.


How do I get the value of each `<li>` element in the output of `children(dv[1])`? I want to get values like “34128”, etc.

@raman_kumar,

But I answered this right after you posted the original question.

Please see the first answer and compare it with the following code:

```julia
using HTTP, Gumbo, Cascadia, DataFrames, ParserCombinator, Profile

mnurl = "https://gcn.nasa.gov/circulars"
q = HTTP.get(mnurl)                # download the page
g = parsehtml(String(q.body))      # parse the HTML into a Gumbo document
bd = g.root[2]                     # <body> is child 2 of <html>
dv = eachmatch(Selector(".grid-container.usa-section ol"), bd)
dv[1]
children(dv[1])

li_elements = children(dv[1])      # the <li> elements of the first <ol>
li_vals = [getattr(li, "value") for li in li_elements]

# li_vals will contain the following values...
# "34128"
# "34127"
# "34126"
# and so on
```

Why did you not try the code above? If I do

```julia
li_elements = children(dv[1])
li_vals = [getattr(li, "value") for li in li_elements]
```

I get exactly what you asked for and I only had to rename the variable containing the parent.
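If you only need the `<li>` elements, a variation (not the code posted above) is to let Cascadia select them directly and then convert the strings to integers. This sketch runs offline on a made-up HTML string that mimics the class names used in the thread; whether the real page keeps that structure is an assumption.

```julia
using Gumbo, Cascadia  # assumes both packages are installed

# Made-up HTML mimicking the structure targeted in the thread.
html = "<html><body><div class=\"grid-container usa-section\"><ol>" *
       "<li value=\"34128\">a</li><li value=\"34127\">b</li>" *
       "<li value=\"34126\">c</li></ol></div></body></html>"

doc = parsehtml(html)

# Selecting the <li> elements directly skips the separate children() step.
lis = eachmatch(Selector(".grid-container.usa-section ol li"), doc.root)

# Attribute values are strings; parse them if you need integer IDs.
li_vals = [getattr(li, "value") for li in lis]
ids = parse.(Int, li_vals)
```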


Thank you both. It gives the circular numbers on the first page. Can I get the list of all circulars on that GCN website?

The answer is most likely yes, you can.

However, a detailed answer would require those willing to help to inspect the site map and review the HTML source of various pages.

I think the best way to get help is to create specific posts about a sufficiently narrow issue, and to always provide enough context and code (where applicable).
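As a starting point for multiple pages, here is a sketch assuming Gumbo.jl and Cascadia.jl are installed: the hypothetical helper `circular_ids` extracts the numbers from one page's HTML, and the results are concatenated across pages. The pages are made-up strings here; how the real site paginates (and therefore which URLs to fetch with `HTTP.get`) is not covered in this thread and would need to be checked on the site itself.

```julia
using Gumbo, Cascadia  # assumes both packages are installed

# Hypothetical helper: the integer `value` of every <li> in a page's
# circular list, using the selector from this thread.
function circular_ids(page_html::AbstractString)
    doc = parsehtml(page_html)
    lis = eachmatch(Selector(".grid-container.usa-section ol li"), doc.root)
    return [parse(Int, getattr(li, "value")) for li in lis]
end

# Stand-ins for pages you would download once the pagination URLs are known.
pages = [
    "<html><body><div class=\"grid-container usa-section\"><ol>" *
    "<li value=\"34128\">a</li><li value=\"34127\">b</li></ol></div></body></html>",
    "<html><body><div class=\"grid-container usa-section\"><ol>" *
    "<li value=\"34126\">c</li></ol></div></body></html>",
]

# Apply the helper to each page and join the per-page vectors.
all_ids = reduce(vcat, circular_ids.(pages))
```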
