[ANN] TableScraper.jl - an easy way to scrape WELL-FORMED tables from webpages

xiaodai · May 22, 2021, 1:43pm

rafael.guerra · May 22, 2021, 2:23pm

Can also be scraped here.

TheLateKronos · June 9, 2021, 9:39am

Nice. Is it possible to use this to get wikipedia tables as LaTeX?

Specifically this one would look great in my LaTeX equation sheet, as opposed to my current screenshot.

xiaodai · June 9, 2021, 11:18am

The table looks well-formed, so you can scrape it. But you need to know some CSS and HTML to do it properly, I’d say.

TheLateKronos · June 9, 2021, 1:40pm

I don’t, unfortunately… Oh well.

jules · June 9, 2021, 2:10pm

Just wanted to check out the package quickly; something like this seemed to kind of work:

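using TableScraper, DataFrames, Chain
using Cascadia: nodeText  # extracts the text of an HTML node

@chain begin
    # identity as the cell transform keeps each cell as an HTML node
    scrape_tables("https://en.wikipedia.org/wiki/Z-transform", identity)
    _[8]  # pick the table of interest from the scraped tables
    DataFrame
    transform(1 => ByRow(nodeText) => :number)
    # for columns 2:4, read the "alt" attribute of the nested node
    transform(2:4 .=> ByRow(function(x)
        try
            x.children[1].children[2].attributes["alt"]
        catch
            missing  # cell does not have the expected structure
        end
    end) .=> ["Signal", "Z-Transform", "ROC"])
    select(Not(1:4))  # drop the original raw columns
end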

xiaodai · June 9, 2021, 10:22pm

Well done. I guess it’s not that hard.

Marc.Cox · September 30, 2021, 11:58pm

Just noticed this announcement for TableScraper.jl; it looks good.

BTW, about an easy way to scrape "WELL-FORMED tables" from webpages:
I believe you can (mostly) eliminate the caveat/requirement/limitation
of "WELL-FORMED tables" by using tidy-html5, as per:

Tidy tidies HTML, XML and tidy-HTML5. GitHub code repo: https://github.com/htacg/tidy-html5

About “regular” Tidy: https://www.html-tidy.org/

It can tidy your documents by itself, and developers can easily integrate its
features into even more powerful tools.

And upon reflection, I believe you might even guess at the existence of
something like “tidy-html5” from the fact that browsers
can display almost all tables, be they WELL-FORMED or ILL-FORMED.

HTH,
Marc

xiaodai · October 1, 2021, 12:14am

Interesting. TIL about these tidy tools. I might look at them if the need arises. Currently, the scraper works pretty well for my basic needs.
