Directory Path -> URL Mining; output a Tree

Doing this turns out to be a bit more work than I originally understood: there doesn't appear to be a simple equivalent in URL-land to walkdir in filesystem-land. I'll need to build the parent-child relationships with a graph structure, perhaps recursively. While that's more work, it's also well understood and documented, so I'll come back to it in time…
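
For illustration only, here is the rough shape that a walkdir-style recursion over an open directory index could take. This is a sketch, not a worked-out solution: it assumes HTTP.jl is available, that the server returns plain HTML listings with child entries in href attributes, and the function name, regex, and starting URL are placeholders.

using HTTP

# Sketch: recurse through an HTML directory index the way walkdir recurses
# through the filesystem, printing each child indented under its parent.
function walkurl(url::AbstractString; depth::Int = 0, maxdepth::Int = 3)
    depth > maxdepth && return
    html = String(HTTP.get(url).body)
    for m in eachmatch(r"href=\"([^\"?#]+)\"", html)
        child = m.captures[1]
        # skip the parent link and absolute links pointing elsewhere
        (child == "../" || startswith(child, "http")) && continue
        println("   "^depth, child)
        # directories conventionally end in "/", so recurse into those
        endswith(child, "/") && walkurl(url * child; depth = depth + 1, maxdepth)
    end
end

walkurl("https://data.nber.org/tax-stats/"; maxdepth = 2)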

For now, I have a hack that works: mirror the URL structure in the local filesystem and then run the fstree.jl script above. On open directories, wget can do the mirroring:

$ wget -r --spider -l <depth> www.your-target-website.tld
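
Once a local skeleton exists, walking it is ordinary filesystem recursion. The fstree.jl script above does the real work; what follows is only a minimal, self-contained stand-in with a hypothetical function name, printing directories only:

# Recursively print a directory-only tree of the local mirror.
function print_dir_tree(path::AbstractString, depth::Int = 0)
    println("   "^depth, basename(abspath(path)))
    for entry in sort(readdir(path))
        full = joinpath(path, entry)
        isdir(full) && print_dir_tree(full, depth + 1)
    end
end

print_dir_tree("IRS_nber2")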

For regular directory structures, I can use the command-line sitemap crawler I've been using, process the output with EzXML to extract all the directory paths, loop over them to recreate the structure locally in the filesystem, and then walk that.

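As a sketch of that loop (assuming a sitemap already saved locally as sitemap.xml, the standard sitemap namespace, and illustrative function and path names), EzXML can pull out the <loc> entries and mkpath can recreate the directory skeleton, which fstree.jl then walks to produce trees like the ones below:

using EzXML

const SITEMAP_NS = ["sm" => "http://www.sitemaps.org/schemas/sitemap/0.9"]

# Recreate the directory part of every sitemap URL under a local root,
# so the URL hierarchy can be walked like a filesystem.
function mirror_dirs(sitemap_path::AbstractString, local_root::AbstractString)
    doc = readxml(sitemap_path)
    for loc in findall("//sm:loc", root(doc), SITEMAP_NS)
        rel = replace(nodecontent(loc), r"^https?://" => "")   # keep host + path
        rel = endswith(rel, "/") ? rel : dirname(rel)          # drop any filename
        isempty(rel) || mkpath(joinpath(local_root, rel))
    end
end

mirror_dirs("sitemap.xml", "IRS_nber2")
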
IRS_nber2
└─ data.nber.org
   └─ tax-stats
      ├─ 990
      ├─ county
      │  ├─ 1989
      │  │  ├─ 89xls
      │  │  ├─ CI89
      │  │  └─ desc
      │  ├─ 1990
      │  │  ├─ 1990CountyIncome
      │  │  └─ desc
...

This doesn't provide the option of showing filenames in the structure; a full mirror (rsync) would be needed for that, but I can move on…

bls-timeseries
└─ download.bls.gov
   └─ pub
      └─ time.series
         ├─ ap
         │  ├─ ap.area
         │  ├─ ap.contacts
         │  ├─ ap.data.0.Current
         │  ├─ ap.data.1.HouseholdFuels
         │  ├─ ap.data.2.Gasoline
         │  ├─ ap.data.3.Food
         │  ├─ ap.footnote
...

Also, lesson learned: ask for more focused help with a succinct OP.