How can I download the HTML file of a website from Julia, e.g. https://page.com, through a browser like Chrome, if the website blocks automated (robot) access? Is there a package for this?
Paul
A package alone can’t solve this problem. If the website requires some user interaction before it serves the page, you can’t get around that.
What you can try: check whether the site sets some cookies, or whether you can log in or register, so that no interaction is needed anymore before you get the page. If that is possible, the next step would be to send the cookie, or to log in, from your script, as in the sketch below.
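For illustration, a minimal sketch with HTTP.jl, assuming the site accepts a session cookie; the cookie name and value here are placeholders you would copy from your browser’s dev tools:

using HTTP

# Placeholder cookie; replace with the real name/value from your browser session.
headers = ["Cookie" => "session=your-session-id"]

# Request the page with the cookie attached, as a logged-in browser would.
resp = HTTP.get("https://page.com", headers)
html = String(resp.body)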
Another possibility: the website in question may provide an API exactly for getting content via script.
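If such an API exists, fetching it is usually a plain HTTP request; here is a sketch assuming a hypothetical JSON endpoint (check the site’s API documentation for the real one):

using HTTP, JSON3

# Hypothetical endpoint; the path is an assumption for illustration only.
resp = HTTP.get("https://page.com/api/v1/content")
data = JSON3.read(String(resp.body))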
Once you have solved the problem of a “user interaction test”, you can check out headless Chrome. Here is an example (Windows):
url = "https://page.com"
chrome_bin = "C:/Program Files/Google/Chrome/Application/chrome.exe"
# Headless Chrome prints the DOM, as rendered after JavaScript ran, to stdout.
chrome = `$(chrome_bin) --headless --disable-gpu --dump-dom $(url)`
# Run the command and capture its output as a String.
content = read(chrome, String)
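(On Linux the binary is typically google-chrome or chromium, and on macOS it usually lives under /Applications/Google Chrome.app/Contents/MacOS/Google Chrome; adjust chrome_bin accordingly.)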
Here is a starting point for headless Chrome: https://developer.chrome.com/blog/headless-chrome
Thanks! No user interaction is required. The site does not send the full HTML when I query it with Julia; it only responds fully to Chrome, Firefox, etc.
Paul
So there is probably some JavaScript filling in the content. Headless Chrome is the way to go.
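For convenience, the snippet above can be wrapped in a small helper; this is just a sketch, with the Chrome path and URL as assumptions to adapt to your system:

# Fetch the JavaScript-rendered DOM of a page via headless Chrome.
function fetch_dom(url::AbstractString;
                   chrome_bin = "C:/Program Files/Google/Chrome/Application/chrome.exe")
    # --dump-dom prints the DOM after JavaScript has run, unlike a raw HTTP GET.
    cmd = `$(chrome_bin) --headless --disable-gpu --dump-dom $(url)`
    return read(cmd, String)
end

content = fetch_dom("https://page.com")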