Directory Path -> URL Mining; output a Tree

Doing this turns out to be a bit more work than I originally understood: there doesn't appear to be a simple equivalent in URL-land to walkdir in filesystem-land. I'll need to build the parent-child relationships with a graph structure, perhaps recursively. While that's more work, it's also well understood and documented, so I'll come back to it in time…
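
For illustration only, here is the rough shape that a walkdir-style recursion over an open directory index could take. This is a sketch, not a worked-out solution: it assumes HTTP.jl is available, that the server returns plain HTML listings with child entries in href attributes, and the function name, regex, and starting URL are placeholders.

using HTTP

# Sketch: recurse through an HTML directory index the way walkdir recurses
# through the filesystem, printing each child indented under its parent.
function walkurl(url::AbstractString; depth::Int = 0, maxdepth::Int = 3)
    depth > maxdepth && return
    html = String(HTTP.get(url).body)
    for m in eachmatch(r"href=\"([^\"?#]+)\"", html)
        child = m.captures[1]
        # skip the parent link and absolute links pointing elsewhere
        (child == "../" || startswith(child, "http")) && continue
        println("   "^depth, child)
        # directories conventionally end in "/", so recurse into those
        endswith(child, "/") && walkurl(url * child; depth = depth + 1, maxdepth)
    end
end

walkurl("https://data.nber.org/tax-stats/"; maxdepth = 2)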

For now, I have a hack that works: mirror the URL structure in the local filesystem and then run the fstree.jl script above. On open directories, wget can do the mirroring:

$ wget -r --spider -l <depth> www.your-target-website.tld
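
Once a local skeleton exists, walking it is ordinary filesystem recursion. The fstree.jl script above does the real work; what follows is only a minimal, self-contained stand-in with a hypothetical function name, printing directories only:

# Recursively print a directory-only tree of the local mirror.
function print_dir_tree(path::AbstractString, depth::Int = 0)
    println("   "^depth, basename(abspath(path)))
    for entry in sort(readdir(path))
        full = joinpath(path, entry)
        isdir(full) && print_dir_tree(full, depth + 1)
    end
end

print_dir_tree("IRS_nber2")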

For regular directory structures, I can use the command-line sitemap crawler I've been using, process the output with EzXML to extract all the directory paths, loop over them to recreate the structure locally in the filesystem, and then walk that.

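As a sketch of that loop (assuming a sitemap already saved locally as sitemap.xml, the standard sitemap namespace, and illustrative function and path names), EzXML can pull out the <loc> entries and mkpath can recreate the directory skeleton, which fstree.jl then walks to produce trees like the ones below:

using EzXML

const SITEMAP_NS = ["sm" => "http://www.sitemaps.org/schemas/sitemap/0.9"]

# Recreate the directory part of every sitemap URL under a local root,
# so the URL hierarchy can be walked like a filesystem.
function mirror_dirs(sitemap_path::AbstractString, local_root::AbstractString)
    doc = readxml(sitemap_path)
    for loc in findall("//sm:loc", root(doc), SITEMAP_NS)
        rel = replace(nodecontent(loc), r"^https?://" => "")   # keep host + path
        rel = endswith(rel, "/") ? rel : dirname(rel)          # drop any filename
        isempty(rel) || mkpath(joinpath(local_root, rel))
    end
end

mirror_dirs("sitemap.xml", "IRS_nber2")
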
IRS_nber2
└─ data.nber.org
   └─ tax-stats
      ├─ 990
      ├─ county
      │  ├─ 1989
      │  │  ├─ 89xls
      │  │  ├─ CI89
      │  │  └─ desc
      │  ├─ 1990
      │  │  ├─ 1990CountyIncome
      │  │  └─ desc
...

This doesn't provide the option of showing filenames in the structure; a full mirror (rsync) would be needed for that, but I can move on…

bls-timeseries
└─ download.bls.gov
   └─ pub
      └─ time.series
         ├─ ap
         │  ├─ ap.area
         │  ├─ ap.contacts
         │  ├─ ap.data.0.Current
         │  ├─ ap.data.1.HouseholdFuels
         │  ├─ ap.data.2.Gasoline
         │  ├─ ap.data.3.Food
         │  ├─ ap.footnote
...

Also, lesson learned: ask for more focused help with a succinct OP.