Can also be scraped here.
Nice. Is it possible to use this to get Wikipedia tables as LaTeX?
Specifically this one would look great in my LaTeX equation sheet, as opposed to my current screenshot.
The table looks well-formed, so you can scrape it. But you need to know some CSS and HTML to do it properly, I'd say.
I don't, unfortunately… Oh well.
Just wanted to check out the package quickly; something like this seemed to kind of work:
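using Chain, DataFrames, TableScraper
using Cascadia: nodeText   # text content of an HTML node

@chain begin
    # `identity` keeps the raw HTML nodes instead of extracting cell text
    scrape_tables("https://en.wikipedia.org/wiki/Z-transform", identity)
    _[8]                   # the 8th table on the page
    DataFrame
    transform(1 => ByRow(nodeText) => :number)
    # the formulas are images; their `alt` attribute holds the formula text
    transform(2:4 .=> ByRow(function (x)
        try
            x.children[1].children[2].attributes["alt"]
        catch
            missing        # no formula image at that position in the cell
        end
    end) .=> ["Signal", "Z-Transform", "ROC"])
    select(Not(1:4))       # drop the original raw columns
end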
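To get from there to something you can paste into a LaTeX document, here is a minimal sketch of turning the extracted strings into a `tabular`. `to_latex_tabular` is a hypothetical helper I made up for illustration, not part of TableScraper.jl, and it assumes every cell already holds LaTeX math source (which is what the `alt` text appeared to be) with no `missing` values left.

```julia
# Hypothetical helper (not part of TableScraper.jl): build a LaTeX tabular
# from a header and a vector of rows, each row a vector of cell strings.
function to_latex_tabular(header::Vector{String}, rows::Vector{Vector{String}})
    lines = String[]
    # One left-aligned column per header entry
    push!(lines, "\\begin{tabular}{" * "l"^length(header) * "}")
    push!(lines, "\\hline")
    push!(lines, join(header, " & ") * " \\\\")
    push!(lines, "\\hline")
    for row in rows
        # Assumption: cells are LaTeX math source, so wrap them in $...$
        push!(lines, join(("\$" * c * "\$" for c in row), " & ") * " \\\\")
    end
    push!(lines, "\\hline")
    push!(lines, "\\end{tabular}")
    return join(lines, "\n")
end

# Hand-written sample rows standing in for the scraped columns
header = ["Signal", "Z-Transform", "ROC"]
rows = [["\\delta[n]", "1", "\\text{all } z"],
        ["u[n]", "\\frac{1}{1 - z^{-1}}", "|z| > 1"]]
println(to_latex_tabular(header, rows))
```

The rows would have to be built from the columns of the scraped data frame after replacing or dropping the `missing` cells.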
Well done! I guess it's not that hard.
Just noticed this announcement for TableScraper.jl; it looks good.
BTW, about an easy way to scrape "well-formed tables" from web pages:
I believe you can (mostly) eliminate the requirement that the tables be
well-formed by running the page through tidy-html5 first.
Tidy tidies HTML and XML (tidy-html5). GitHub code repo: https://github.com/htacg/tidy-html5
About "regular" Tidy: https://www.html-tidy.org/
It can tidy your documents by itself, and developers can easily integrate its
features into even more powerful tools.
On reflection, you might even have guessed that something like tidy-html5
exists, because browsers can display almost all tables, whether they are
well-formed or ill-formed.
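As a rough sketch of what I mean, something like the following could clean a page before it is handed to a scraper. It assumes the `tidy` executable from tidy-html5 is installed and on your PATH; `tidy_html` is just a name I picked for illustration, and I have not tried it together with TableScraper.jl.

```julia
# Hypothetical wrapper around the `tidy` command-line tool (tidy-html5).
# Assumption: `tidy` is installed and available on PATH.
function tidy_html(html::AbstractString)
    # Write the input to a temporary file so tidy can read it by path
    path, io = mktemp()
    write(io, html)
    close(io)
    # -q: quiet, -asxhtml: emit well-formed XHTML,
    # --force-output yes: produce output even if errors were found
    cmd = `tidy -q -asxhtml --show-warnings no --force-output yes $path`
    # tidy exits with 1 on warnings and 2 on errors, so ignore the status
    cleaned = read(ignorestatus(cmd), String)
    rm(path)
    return cleaned
end

# Unclosed <tr> and <td> tags, the kind of markup browsers silently repair
messy = "<table><tr><td>1<td>2<tr><td>3<td>4</table>"
if Sys.which("tidy") !== nothing
    println(tidy_html(messy))
else
    println("tidy not found on PATH; skipping")
end
```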
HTH,
Marc
Interesting. TIL about these tidy tools. I might look at them if the need arises. Currently, the scraper works pretty well for my basic needs.