Julia-Gumbo-webscraping

Is it possible to convert a struct type to a Julia String?
I have scraped some data from a website with Gumbo.jl and I want to push this data to an Array, but when I do that I get this error: Cannot convert an object of type Array{Any,1} to an object of type HTMLNode

It should be possible. Can you post a snippet of code? It’ll be easier to help you.

It has been resolved.
But could you please tell me what HTMLNode is among Gumbo.jl’s HTML types?

It’s an abstract type; you can see its subtypes using:

subtypes(HTMLNode)

3-element Array{Any,1}:
 HTMLElement
 HTMLText   
 NullNode
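Outside the REPL, `subtypes` comes from the `InteractiveUtils` standard library and has to be loaded explicitly (the REPL loads it for you). A small sketch, assuming Gumbo.jl is installed, that inspects the same hierarchy with Base’s `isabstracttype` and `<:`:

```julia
using InteractiveUtils  # provides subtypes() in scripts; the REPL loads it automatically
using Gumbo

# HTMLNode is abstract: no value is ever exactly of this type,
# every node in a parsed tree is one of its concrete subtypes
println(isabstracttype(HTMLNode))

# list the node kinds a Gumbo tree can contain
for T in subtypes(HTMLNode)
    println(T, " <: HTMLNode")
end
```

Because `HTMLNode` is abstract, a `Vector{HTMLNode}` can hold a mix of `HTMLElement` and `HTMLText` values.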

You might also find this link on types useful

Sir, I have scraped some data with Gumbo.jl and I want to convert the data to a string. I am not able to do that. How can I convert it?

using HTTP
using Gumbo
using Cascadia
using DataFrames

Base_Url="https://www.moneycontrol.com/india/stockpricequote/A"
Active_Url=Base_Url
println(Active_Url)
h=HTTP.request("GET",Active_Url)
#println(h)
html=parsehtml(String(h))
From_Website=eachmatch(sel".bl_12",html.root)
#println(From_Website)
Name=From_Website[5][1]
println(Name)
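a=String(Name)

using HTTP, Gumbo, Cascadia

url = "https://www.moneycontrol.com/india/stockpricequote/A"
response = HTTP.get(url)                         # download the page
html = response.body |> String |> parsehtml      # raw bytes -> String -> parsed HTML document
elements = eachmatch(sel".bl_12", html.root)     # every node with the CSS class bl_12
elements[5][1].text                              # first child of the 5th match; its .text field is a String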
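`.text` only works when the selected child is an `HTMLText`; if it is an `HTMLElement` (for example a link wrapping the name), there is no `text` field and the access fails. A minimal sketch that dispatches on the `HTMLNode` subtypes listed above to collect all text under a node into one `String`. The function name `nodetext` is hypothetical, and the HTML is an inline sample rather than the real page:

```julia
using Gumbo, Cascadia

# Hypothetical helper: concatenate every text node below `node` into one String,
# with one method per HTMLNode subtype
nodetext(t::HTMLText) = t.text                                      # leaf: already a String
nodetext(::NullNode) = ""                                           # empty placeholder node
nodetext(el::HTMLElement) = join(nodetext(c) for c in el.children)  # recurse into children

# Inline sample HTML standing in for the downloaded page
doc = parsehtml("""<html><body><div class="bl_12"><a href="#">Acme<b>Ltd</b></a></div></body></html>""")
elements = eachmatch(sel".bl_12", doc.root)

# works even though the first child of the match is an <a> element, not text
name = nodetext(elements[1])
println(name)
```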

Could you please explain this line?
html = response.body |> String |> parsehtml

Hi @Anil_Mathews, please take a look at this post, which in particular explains how to get the code in your posts formatted (it makes them easier to read). There’s some other useful information there about how to write questions so that it’s easier to help you.

This is pipe syntax, and is equivalent to html = parsehtml(String(response.body)). Here’s the relevant section of the manual.
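A minimal pure-Julia illustration of the pipe operator `|>`, using `sort` and `reverse` as stand-ins for `String` and `parsehtml`:

```julia
x = [3, 1, 2]

piped  = x |> sort |> reverse   # x |> f |> g feeds x into f, then the result into g
nested = reverse(sort(x))       # the same computation written as g(f(x))

println(piped == nested)
```

One side effect worth knowing for the scraping code: when `String` is given a `Vector{UInt8}` (which `response.body` typically is), it takes over that memory and empties the vector, so the body can only be converted once.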


Sir, I want to scrape all company names from a site. The second for loop is not working, and .text shows an error saying there is no field text.
using HTTP
using Gumbo
using Cascadia
using DataFrames
using StatsBase

letters=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
#print(letters)
Base_Url="https://www.moneycontrol.com/india/stockpricequote/"
lst=Array{String,1}()
for letter in letters
    url=Base_Url*letter
    response = HTTP.get(url)
    html = response.body |> String |> parsehtml
    elements= eachmatch(sel".bl_12", html.root)

    for ele in [3:length(elements)]
        print(ele)
        a=elements[ele][1].text

        push!(lst,a)
    end
end
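The inner loop iterates over `[3:length(elements)]`, which is a one-element `Vector` whose only entry is the whole range. So `ele` is the range `3:n` itself, `elements[ele]` returns a vector of nodes, and a vector has no `text` field. Dropping the brackets iterates over the integers. A pure-Julia sketch of the difference, with `n = 5` standing in for `length(elements)`:

```julia
n = 5                      # stands in for length(elements)

wrapped = [3:n]            # a Vector containing ONE element: the range 3:5
plain   = 3:n              # the range itself, which yields 3, 4, 5

for ele in wrapped
    println(typeof(ele))   # a UnitRange, not an Int
end

indices = Int[]
for ele in plain           # ele is 3, then 4, then 5
    push!(indices, ele)
end
println(indices)
```

In the scraper that means writing `for ele in 3:length(elements)`; `.text` can still fail for any match whose first child is not an `HTMLText`, or that has no children at all.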

Please quote your code.

Normally this is a matter of etiquette and of getting help more effectively, but in this case, because you link to an external site, it’s getting you flagged as spam.


I want to create a struct to store my details:

```julia
struct Anil
    name::String
    age::Int
    Father_Name::String
end
```

Then I want to store this data in an Array of type Anil:

```julia
data = Array{Anil,1}("anil", 22, "Mathew")
```

Unfortunately, it's wrong. How should I code it?
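`Array{Anil,1}(...)` is the array constructor, which expects an initializer such as `undef` plus a length, not field values. The fields go to the struct’s own constructor, and the resulting values go into the array. A minimal sketch, reusing the field names from the question:

```julia
struct Anil
    name::String
    age::Int              # Int, not int: Julia type names are capitalized
    Father_Name::String
end

# Build one value by calling the type like a function, arguments in field order
person = Anil("anil", 22, "Mathew")

# Start an empty Array{Anil,1} (the same type as Vector{Anil}) and push values into it
data = Array{Anil,1}()
push!(data, person)

# Or write the vector literally; the Anil[...] prefix fixes the element type
literal = Anil[Anil("anil", 22, "Mathew")]

println(data[1].name, " is ", data[1].age)
```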

2) How do I find elements of a particular type in an array?
From my data Array, I want to find the Strings only. How do I do that?
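A minimal sketch of selecting elements by type with `filter` and an `isa` test, assuming the data sits in a mixed-type `Vector{Any}` (for example the struct’s field values collected together):

```julia
# A Vector{Any} mixing types
data = Any["anil", 22, "Mathew"]

# keep only the String entries; the result is still a Vector{Any}
strings = filter(x -> x isa String, data)
println(strings)

# a comprehension with a typed prefix also narrows the element type to String
only_strings = String[x for x in data if x isa String]
```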

Again, please quote your code.

I was scraping company names from this page:

```julia
using HTTP, Gumbo, Cascadia

url = "https://www.moneycontrol.com/india/stockpricequote/A"
# getting the data
response = HTTP.get(url)
# parsing the data
html = response.body |> String |> parsehtml
# the page contains 747 elements
elements = eachmatch(sel".bl_12", html.root)
# iterating over each to get the company name
for ele in Array{Int,1}(range(3,length(elements)))
    try
        println(ele, "========", elements[ele][1].text)
    finally
        break
    end
end
```

But the problem is that the array has 747 elements; however, after element 745 the remaining two have no children, so the loop breaks. I want to handle this: if element 746 is empty, break and move on to the next page. Please help me.
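Two things are going on here. First, a `finally` block runs every time its `try` block is left, whether or not an error happened, so `finally break` ends the loop after the very first iteration. Second, the trailing matches have no children, so `elements[ele][1]` throws a `BoundsError` there. Checking `isempty(el.children)` before indexing lets the loop stop cleanly instead. (Also, `3:length(elements)` is already an integer range, so the `Array{Int,1}(range(...))` wrapper is unnecessary.) A minimal sketch; the function name `company_names`, its `first_index` keyword, and the inline HTML are hypothetical stand-ins for the real page:

```julia
using Gumbo, Cascadia

# Part 1: `finally` runs on every pass through the loop, not only after an error,
# so a `break` placed in it ends the loop on the first iteration
trace = String[]
for i in 1:3
    try
        push!(trace, "try $i")
    finally
        push!(trace, "finally $i")
    end
end
println(trace)

# Part 2 (hypothetical helper): read the company names on one parsed page,
# starting at match `first_index`, and stop at the first match with no children
function company_names(doc; first_index = 3)
    elements = eachmatch(sel".bl_12", doc.root)
    result = String[]
    for i in first_index:length(elements)
        el = elements[i]
        isempty(el.children) && break                      # empty trailing match: this page is done
        child = el.children[1]
        child isa HTMLText && push!(result, child.text)    # skip children that are not plain text
    end
    return result
end

# Inline sample standing in for one downloaded page: two leading matches to skip,
# two names, then two empty matches at the end
sample = parsehtml("<html><body>" *
    "<span class=\"bl_12\">skip</span><span class=\"bl_12\">skip</span>" *
    "<span class=\"bl_12\">Acme</span><span class=\"bl_12\">Beta</span>" *
    "<span class=\"bl_12\"></span><span class=\"bl_12\"></span>" *
    "</body></html>")

found = company_names(sample)
println(found)
```

To cover every letter, the helper could replace the inner loop of the earlier scraper, e.g. `append!(lst, company_names(parsehtml(String(HTTP.get(Base_Url * letter).body))))` inside `for letter in letters`; an empty match then ends only the current page, and the outer loop moves on to the next letter.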

That’s still not properly quoted, I’m afraid, and the system is telling us (mods) that you’re still spamming. Try backticks: it’s the character in the upper left corner of most keyboards.

“Quote” the code doesn’t mean to use the quotation symbol ". If you insert code in-line, “quote the code” means to put the code between a pair of back-ticks (`` ` ``), e.g., `` `using Gumbo;` `` leading to `using Gumbo;`.

If you have longer code that you want to display, “quote the code” means to insert the code between a pair of lines with triple back-ticks, e.g.:

using Gumbo;
x = range(0,2pi,length=100)

I don’t know whether it is mandatory, but you can specify the programming language at the end of the first line of triple back-ticks, e.g., you add julia just after the triple back-ticks of the first line.

Such “quotation of the code” with back-ticks makes your questions much more readable, and increases the chance of someone deciphering your code.

To be more precise, here is the back-ticking: the in-line `using Gumbo;` is set up by:
[image: screenshot of the Markdown source]
while the displayed code,

using Gumbo;
x = range(0,2pi,length=100)

is set up by:
[image: screenshot of the Markdown source]

Meta help with code formatting: in Discourse (though not in every Markdown-based system), you can use 4 backticks to display triple backticks literally:

````
```
using Gumbo;
x = range(0,2pi,length=100)
```
````

(note: in the source, the above is surrounded by 5 backticks… one can do this ad infinitum)

For in-line formatting, you can surround single backticks with double backticks: “This is a `test`” is achieved with "This is a `` `test` ``".

EDIT: I didn’t realize, but Tamas made a detailed post about this, linked in the PSA above

Please do read the post that Stefan and I linked you to - I know when you’re trying to solve a particular problem, it can be frustrating to be sent down seemingly unrelated paths, but taking a little time now to learn how to ask questions in a way that will make it easier for people on this forum to help you will pay off in the long run, I promise.

A couple of other points of etiquette in addition to quoting code:

  1. it’s also good practice to ask separate questions in separate threads. If your original question was answered, you should ask follow-ups in separate threads.
  2. Use a title that reflects the question you’re asking (you can actually edit the title after the fact if you wish). Your first question is not really about web scraping, but rather about type conversion.
  3. Only one post can be marked as the solution. Since my post was the answer to a follow-up, you really shouldn’t mark it as the answer. One of @onetonfoot’s answers would be more appropriate.