Various cloud and local LLM services allow users to upload documents before asking questions about them. If a package manual can be provided as a single webpage or PDF file (small enough to fit in the LLM's context window), then you can ask questions about using the package even when the up-to-date package is otherwise absent from the LLM's training data.
P.S. Maybe I just need a tool that crawls through doc pages and combines them into a single document.
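For illustration, here is a minimal sketch of such a crawler, assuming a hand-picked list of page URLs (the URLs and output file name are made up) and using only the Downloads standard library with naive regex-based tag stripping; a real tool would parse the HTML properly and discover pages from the site's navigation:

using Downloads

# Pages to combine; in practice these would be discovered by crawling.
pages = [
    "https://example.org/MyPackage/stable/",
    "https://example.org/MyPackage/stable/api/",
]

open("combined_docs.txt", "w") do out
    for url in pages
        html = read(Downloads.download(url), String)
        # Drop scripts and styles, then strip the remaining tags (naive).
        text = replace(html,
            r"<script.*?</script>"s => "",
            r"<style.*?</style>"s => "",
            r"<[^>]+>" => " ")
        println(out, "# Source: ", url, "\n")
        println(out, text, "\n")
    end
end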
I don’t think you are likely to find many, if any, authors willing to do it for you. I definitely wouldn’t, just so a user can use an LLM. If you want a PDF, you can generate one yourself by playing with the Other Output Formats page of the Documenter.jl manual and editing the package’s docs/make.jl file. How good that PDF will turn out without any extra effort is a different question.
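For the curious, a sketch of what that change to docs/make.jl looks like; MyPackage is a placeholder, and the real sitename, modules, and pages come from the package's existing make.jl. The LaTeX backend needs a local LaTeX toolchain (Documenter also accepts Documenter.LaTeX(platform = "docker") to avoid installing one):

using Documenter
using MyPackage

makedocs(
    sitename = "MyPackage",
    modules  = [MyPackage],
    format   = Documenter.LaTeX(),  # LaTeX/PDF output instead of the default HTML
)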
IMO this could be a pretty good idea (though of course package authors are under no obligation whatsoever if they don’t want to do it),
but I suspect some of the tone of the responses so far is driven by a generic hostility to AI-assisted coding workflows, and, like it or not, I don’t think these workflows are going to become any less popular anytime soon.
I've just been playing with LLMs for fun lately; I definitely didn’t mean to sound entitled. I tried running a downsized version of DeepSeek-R1 with 14B parameters on my laptop. It was able to produce correct Julia code for this simple question on the first try: “Write Julia code to print out the total size of the 10 largest files in the current directory.” Surely a very basic task, but still impressive that it can be solved by a local model on my meager GPU with 8GB of RAM. The code produced is here:
files = []
for f in readdir(".")
    if isfile(f)
        try
            size = stat(f).size
            push!(files, (f, size))
        catch e
            continue
        end
    end
end
sorted_files = sort(files, by = x -> -x[2])
top_10 = first(sorted_files, 10)
total_size = sum(x[2] for x in top_10)
println("Total size of the top 10 largest files: $(total_size) bytes")
I think that making documentation in a format that AI can easily digest, and accompanying it with data sets for fine-tuning AI models, will soon become good practice. Packages without this stuff will not survive the competition.
Something’s amiss here. If an artificial intelligence requires specially crafted data, it doesn’t sound as intelligent as it’s made out to be. Or put another way, AI with such requirements will not survive the competition.
Looks like I had missed the web search features available in both ChatGPT and local LLMs, which can easily descend into the subsections of doc websites. I just found out how to enable web searches in my local LLM deployment (Ollama + OpenWebUI). Here’s a screenshot, running Meta’s Llama3.2 3B model to inquire about a recently created Julia package. Both the doc site and the Discourse announcement page were read by the LLM before an answer was produced.
I don’t think it read beyond the top-level pages of the 6 websites listed explicitly in the screenshot. I can probably tweak the settings to increase the number of pages the LLM reads, which will help if the individual doc pages appear among the top results returned by the search engine. I can also ask specific questions, e.g. about a particular function in the package, to guide the LLM to search for the specific doc page.
The way to do this is not to have every package author create specialized documents to upload to LLMs, but to make a separate tool that takes the typical form in which package authors publish their documentation and generates a single file from it.
Exactly. Not only that, but this doesn’t sound too far off either. Documenter.jl is the standard tool most Julia packages use for their docs. The Documenter.jl team has put a lot of effort into making it modular so that it can accommodate various plug-ins, many of which are already available “on the market”, like DocumenterCitations.jl. As the whole infrastructure is well standardized, it really isn’t much of a stretch for someone to make a conversion tool.
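As a proof of concept, here is a minimal sketch of such a conversion tool, under the assumption that it runs on a checked-out package repository and that simply concatenating the Markdown sources under docs/src is good enough for an LLM; the function name and paths are made up, and a real tool would follow the page order defined in make.jl:

# Combine all Markdown sources of a Documenter-based docs folder into one file.
function combine_docs(pkgdir::AbstractString, outfile::AbstractString)
    srcdir = joinpath(pkgdir, "docs", "src")
    open(outfile, "w") do out
        for (root, _, files) in walkdir(srcdir)
            for f in sort(files)
                endswith(f, ".md") || continue
                path = joinpath(root, f)
                # Record which page the following text came from.
                println(out, "\n<!-- ", relpath(path, srcdir), " -->\n")
                write(out, read(path, String))
            end
        end
    end
end

combine_docs("MyPackage.jl", "MyPackage-docs.md")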
Most likely, the people who use AI the most can come together and make a nice package!