These are extremely common these days. Here are a few I've collected over the past few months:
- [files-to-prompt](https://github.com/simonw/files-to-prompt) (from the GOAT simonw)
- [code2prompt](https://github.com/mufeedvh/code2prompt)
- https://gh-repo-dl.cottonash.com/
- [1filellm](https://github.com/jimmc414/1filellm)
- [repopack](https://github.com/yamadashy/repopack)
- [ingest](https://github.com/sammcj/ingest)
What makes yours better?
I found this tool (repo2file) helpful for my workflow: it quickly gives my local LLM context for questions about my (small) working repo, right from the terminal. Until I saw this post, I wasn't aware of any of those.
What makes his better? Since you're asking, I tried these and here's my verdict:
- [files-to-prompt](https://github.com/simonw/files-to-prompt) (from the GOAT simonw) --> There's no option to specify files to include; you must work backwards with the ignore option
- [code2prompt](https://github.com/mufeedvh/code2prompt) --> It always puts the output in the paste buffer, even if you specify an output file
- https://gh-repo-dl.cottonash.com/ --> There's no CLI
- [1filellm](https://github.com/jimmc414/1filellm) --> Many dependencies and a complicated setup (you have to set up a GitHub access token, which I've never done)
- [repopack](https://github.com/yamadashy/repopack) - [ingest](https://github.com/sammcj/ingest) --> haven't tried these yet, but they actually look promising...
I tried [repopack] and it did a very good job (y) Simple installation too.
I tried code2prompt, but the interface has a quirk: even if you specify an output file, it always puts the output in the paste buffer.
Which one of these do you find the best? It’s quite tempting to write one myself for something as simple as this.
I think most of these projects are a bit overkill; the OG poster's repo is about what I'd do myself.
How did you compare the above? I'd love to see a "clear winner".
sammcj of sammcj/ingest reporting in! Funny to see my little tool pop up in comment threads.
Take a look at what aider does to create a repo map using treesitter; https://aider.chat/docs/repomap.html https://aider.chat/2023/10/22/repomap.html
I guess the difference is that your script produces a complete copy, whereas aider uses a concise summary, necessary for when the context window is full
This is a similar tool I wrote for myself called "ingest". It ingests files/directories to LLM friendly markdown, estimates token usage, and can estimate vRAM usage for different models and quantisations and shows you a table highlighting which quantisation, context size and k/v cache quantisation will fit in a given (v)RAM size. - https://github.com/sammcj/ingest
That's cool. I've used it. I'd add:
- treat '-' as stdout
- named arguments
- don't filter ignore files by checking whether they start with '.', because that causes a local .gitignore to not be found and to be treated as an extension :)
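A minimal sketch of how these three suggestions could look, assuming a Python CLI built on `argparse`. The flag names `--output` and `--ignore-file` and the helpers `open_output`, `build_parser`, and `main` are hypothetical, not repo2file's actual interface. Passing the ignore file through a named flag means nothing has to be guessed from a leading '.'.

```python
import argparse
import sys
from contextlib import contextmanager


@contextmanager
def open_output(path):
    """Yield a writable text stream; '-' means stdout (hypothetical helper)."""
    if path == "-":
        # stdout is left open for the rest of the program.
        yield sys.stdout
    else:
        with open(path, "w", encoding="utf-8") as handle:
            yield handle


def build_parser():
    # Named arguments; all flag names here are assumptions for illustration.
    parser = argparse.ArgumentParser(description="Dump files into one text file")
    parser.add_argument("paths", nargs="+", help="files to include")
    parser.add_argument("-o", "--output", default="-",
                        help="output file, or '-' for stdout")
    parser.add_argument("--ignore-file", default=".gitignore",
                        help="ignore file, named explicitly so a leading '.' "
                             "is never mistaken for an extension")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Reading args.ignore_file is left out of this sketch.
    with open_output(args.output) as out:
        for path in args.paths:
            with open(path, encoding="utf-8") as src:
                out.write(src.read())
```

Calling `main(["README.md", "--output", "-"])` would print the file to the terminal, while `--output dump.txt` would write it to disk.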
I skimmed the README but did not see support for prefixing each line with its line number. This is an absolute must-have for people like me whose workflow is centered around generating git patches. In my experience, generated patches are much more likely to be incorrect without line numbers.
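For illustration, line-number prefixing could be as small as the sketch below. The function name `number_lines` and the `N | ` separator are assumptions, not a feature of any tool mentioned in this thread.

```python
def number_lines(text, start=1):
    """Prefix each line with a right-aligned line number and ' | '."""
    lines = text.splitlines()
    # Pad every number to the width of the largest one so columns line up.
    width = len(str(start + len(lines) - 1)) if lines else 1
    return "\n".join(
        f"{number:>{width}} | {line}"
        for number, line in enumerate(lines, start)
    )
```

The numbers are padded to a common width so the code column stays aligned, which keeps hunks easy to locate when the model refers back to them.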
Nice. I have a few suggestions:
Put code blocks inside 3 ticks in the beginning and 3 ticks in the end since it's the default for each file.
Remove the dashes to save tokens.
In the title for the code blocks put the full relative path to the file since some projects have many files with the same name.
I asked ChatGPT and it said to use a language code, too, e.g.:
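One way these suggestions might be sketched in Python: the full relative path as the title, a three-backtick fence, and a language code after the opening fence. `format_file`, `LANGUAGES`, and `FENCE` are hypothetical names, and the extension map is a small illustrative sample rather than a complete list.

```python
from pathlib import Path

# Illustrative extension-to-language map (an assumption); extend as needed.
LANGUAGES = {
    ".py": "python",
    ".js": "javascript",
    ".go": "go",
    ".rs": "rust",
    ".md": "markdown",
}

# Built programmatically so this example does not contain a literal fence.
FENCE = "`" * 3


def format_file(path, root):
    """Render one file as: relative path, fenced block with a language code."""
    path, root = Path(path), Path(root)
    # Full relative path, so files with the same name stay distinguishable.
    relative = path.relative_to(root).as_posix()
    language = LANGUAGES.get(path.suffix, "")
    body = path.read_text(encoding="utf-8").rstrip()
    return f"{relative}\n{FENCE}{language}\n{body}\n{FENCE}\n"
```

Unknown extensions fall back to an empty language code, which still yields a valid fence.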
Made a similar one that's not super polished - https://github.com/VVoruganti/repo-to-prompt
Interesting! There was another Show HN that did this same thing earlier in the day!
https://news.ycombinator.com/item?id=41480373
Something like this that could automatically scrape a set of URLs into a file would also be useful for trying to learn how to use various terrible enterprise software applications (SAP).
made one as well with interactive selection and token counting https://github.com/3rd/promptpack
There is an API for this at https://txtrepo.com. I used it with n8n to create PRs on issues.
Can you explain your n8n workflow a little bit if you don't mind? txtrepo looks interesting. I had thought about somehow combining Aider with n8n/nodered to automate a few small things, but haven't looked too much into it yet.
How does this (or similar tools) differ from just a simple `cat foo bar > out`?
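One difference, going by the formatting suggestions elsewhere in this thread, seems to be that these tools label each file, whereas `cat` joins the contents with no boundaries. A tiny Python illustration, where the helper names and the `File:` header format are assumptions:

```python
def cat(files):
    """Mimic `cat foo bar > out`: contents back to back, no labels."""
    return "".join(files.values())


def labelled(files):
    """Same contents, but each file is introduced by its path."""
    return "".join(f"File: {path}\n{text}" for path, text in files.items())


files = {"foo": "x = 1\n", "bar": "x = 2\n"}
plain = cat(files)          # the reader cannot tell where foo ends
marked = labelled(files)    # each chunk is attributed to a file
```

With the plain version, a model has no way to know which file a given line came from, which matters once it is asked to edit a specific file.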
Great, I didn't know about this type of tool, thanks
Another approach is to just tar up the files, without compression. Works well with Claude via API.
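A minimal sketch of the uncompressed tar idea using Python's standard `tarfile` module. The helper name `tar_files` and the in-memory `{name: text}` input are assumptions made to keep the example self-contained. Mode `"w"` writes a plain tar, so file names and text contents stay readable in the byte stream.

```python
import io
import tarfile


def tar_files(files):
    """Pack {name: text} into an uncompressed tar archive and return its bytes."""
    buffer = io.BytesIO()
    # Mode "w" means no compression; "w:gz" would gzip the archive instead.
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

For files on disk, `tarfile.open("repo.tar", "w")` followed by `archive.add("src")` would do the same for a whole directory.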
Seems like a common itch to scratch and a good tool to scratch it with. I created 'linusfiles' and 'grabout' as tools along these lines. Grabout copies the last input and error message or other output to the clipboard, and linusfiles copies the tracked files to the clipboard.
But I like the idea of tarballing it, as ndr_ suggested. I'm thinking that could be the move here.
In case anyone wants to see my workflows: https://github.com/atxtechbro/shell-tooling
that's a cool project.