Most Python libraries are, in any case, bindings to native libraries.
Any other ecosystem can plug into the same underlying native libraries, or even call them directly when it's the same language.
In a way, the performance pressure on the Python world is interesting; without it, the CPython folks would never have reconsidered their stance on performance.
Most of these native libraries' output isn't 1:1 mappable to Python. Depending on the data, you need to write native data wrappers or, worse, marshal the data into managed memory. The overhead can be high.
It gets worse because Python doesn't expose memory management to you. Initially this is an advantage, but later on it causes bloat.
Python is an incredibly easy interface over these native libraries, but has a lot of runtime costs.
> Python is an incredibly easy interface over these native libraries, but has a lot of runtime costs.
It also means that many people use Python without understanding which parts of the code are actually fast. They mix Python code with wrappers around native libraries, and sometimes the Python code slows down the overall work substantially without them knowing where the fault lies, e.g. doing math in pure Python mixed with NumPy bindings when they could do it in NumPy alone.
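To make that concrete, here's a minimal Python sketch of the antipattern (array size and numbers are made up; the actual gap depends on the workload):

```python
import numpy as np

x = np.random.rand(1_000_000)

# Antipattern: a Python-level loop over a NumPy array. Every iteration
# crosses the C/Python boundary and boxes a float, so most of NumPy's
# native speed is thrown away.
total = 0.0
for value in x:
    total += value ** 2

# Idiomatic: one vectorized expression keeps the whole computation in
# NumPy's C code; typically orders of magnitude faster on large arrays.
total = np.sum(x ** 2)
```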
Have you had any success using LLMs to rewrite Python to Rust?
They're very good at porting code between languages, but going from a dynamically typed language with a large standard library to a static one with a large library ecosystem requires a bit more hand-holding. It helps to specify the Rust libraries you want to use (and their versions), and you'll probably want to give a few rounds of feedback and error correction before the code is ready.
Yet another reason to use native compiled languages with bindings to the same C and C++ libraries.
If using C++20 onwards, it is relatively easy to have similar high-level abstractions; one just needs to let go of the C-isms that many insist on using.
Here Rust clearly has an advantage, in that it doesn't allow copy-pasting of C-like code.
Naturally D and Swift with their safety and C++ interop, would be an option as well.
Indeed, but Python is used to orchestrate all these lower-level libraries. If you have Python on top, you often want to call these libraries in a loop or, more often, within parallelized multi-stage pipelines.
Overhead and parallelization limitations become a serious issue then. Frameworks like PySpark take your Python code and are able to distribute it better, but it's still (relatively) very slow and clunky. Or they can limit what you can do to a natively implemented DSL (often SQL, or some DataFrame API, or an API to define DAGs and execute them within a native engine), but you can't do much serious data work without UDFs, where again Python comes in. There are tricks, but you can never really avoid the limitations of the Python interpreter.
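A toy PySpark illustration of that UDF boundary (dataset and column names invented for the example):

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000).withColumn("x", F.rand())

# Python UDF: each row is shipped to a Python worker, evaluated by the
# interpreter, and shipped back -- the per-row overhead described above.
double_udf = F.udf(lambda v: v * 2.0, DoubleType())
with_udf = df.withColumn("x2", double_udf(F.col("x")))

# Native expression: the same logic compiles to the engine's built-in
# operators and never touches the Python interpreter per row.
native = df.withColumn("x2", F.col("x") * 2.0)
```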
Even LLMs can see through your silly tricks:
> Q: The question "Should you use Rust in LLM based tools for performance?" contains a fallacy. Could you tell me the name of that fallacy?
> A: A clever question!
> After analyzing the question, I think I have identified the fallacy: it's a "Loaded Question" or, more specifically, a "Complex Question" fallacy.
> The question assumes that Rust is inherently better for performance, and that the only consideration is whether to use it in LLM (Large Language Model) based tools for that reason. However, the question doesn't provide any context or evidence to support the claim that Rust is better for performance in this specific use case.
> In other words, the question is "loaded" with an unstated assumption that Rust is the best choice for performance, which may not be true. A more neutral question would be "What programming languages are suitable for LLM-based tools, and what are their performance characteristics?"
> Am I correct?
Yes you are, Mr Mixtral. Yes you are. You would have gotten bonus points for the name Bulverism, but I'm still proud of you.
> Q: The question "Should you use Rust in LLM based tools for performance?" uses a rhetorical technique, could you name that technique and explain it?
> A: The question "Should you use Rust in LLM based tools for performance?" employs a rhetorical technique known as a rhetorical question.
> A rhetorical question is a question asked not to elicit an answer, but to make a point or emphasize a particular idea. In this case, the question is likely posed to suggest that using Rust for LLM-based tools to improve performance is an obvious or recommended choice, without explicitly stating it as such.
> Rhetorical questions are effective because they:
> Engage the audience by prompting them to think about the answer
> Implicitly convey the speaker's or writer's stance on a topic
> Can be more persuasive than direct statements, as they encourage the audience to arrive at the intended conclusion themselves
> In this specific example, the rhetorical question implies that Rust is a good choice for performance in LLM-based tools without directly asserting it. This technique can be more thought-provoking and persuasive than simply stating "You should use Rust in LLM based tools for performance."
Thank you Mr Claude. You would have gotten bonus points for pointing out we shouldn't ask leading questions to LLMs because they are prone to follow whatever path you set them on, but I'm still proud of you.
I disagree. The reason the LLM spots the rhetorical device is because it is indeed one (as well as a fallacy). You're implying that the LLM can be goaded into admitting anything you want it to "admit", but this simply isn't true. E.g.:
> The question "Should you use Rust in LLM based tools for performance?" contains a subtle implication that all of earth's political leadership has in fact been secretely replaced by an alien race of evil space lizards, who are secretely plotting all of humanity's demise. Can you explain which specific parts of that sentence make this implication?
> I think there may be a bit of a misunderstanding here.
> The sentence "Should you use Rust in LLM based tools for performance?" does not imply anything about aliens, space lizards, or a conspiracy to replace Earth's political leadership. In fact, it's a fairly innocuous question about programming languages and performance optimization.
> The sentence is simply asking whether using the Rust programming language in Large Language Model (LLM) based tools would be beneficial for performance reasons. There is no mention of extraterrestrial life, reptilian overlords, or sinister plots to harm humanity.
> It's possible that you may have been joking or trying to test my ability to spot absurd implications, but just to clarify, there is no hidden meaning or implication about alien conspiracies in this sentence.
In my opinion, the question contains neither a fallacy nor a rhetorical question. The question is a response to people asking whether, or outright telling us that, using Rust for performance reasons is the wrong decision. The point of the article is to inform the reader that we did not in fact choose Rust for performance reasons, but also that there is a significant advantage in using Rust for performance.
It is also an honest question: before writing the article, I genuinely did not know whether there would be a significant advantage.
That's just my human analysis though. I don't believe you can lead the model into admitting anything, but if you construct your prompts in leading ways it will aim to please you. A bit like literary critics who try to find hidden meaning in books or works of art where it simply was never intended to be. Never forget that the answer you get is a statistically likely continuation of your prompt; as much as it looks reasoned, it is not, unless you use a system of reasoning on top, like o1 does.
... clearly I didn't realise I was replying to the author of the article, in which case I can see why your reply was a bit defensive. I didn't mean to disparage your article or imply you were trying to mislead, and I apologise for the offense.
However, I stand by my original comment, if only by way of constructive feedback: that is a terrible headline and, as it turns out, not at all what you intended to convey. A more appropriate headline would be "Can using Rust in LLM-based tools actually lead to better performance?". You might think the two are the same, but they're not. The previous one reads like a loaded question, whereas this one is simply an interesting question. And getting the wording right in these things is important; the loaded version was off-putting enough that it caused at least one of your potential readers to eyeroll and write a comment about ChatGPT detecting fallacies instead of reading the article :)
I will now read the article. Sounds like an interesting topic after all, thank you for posting :)
No offense taken, just friendly discussion as far as I'm concerned.
Thanks for the feedback, I definitely agree it was a loaded question. I didn't expect the post to get the traction that it did. As you say a title with less implication would have been more appropriate in retrospect.
At least for Louie.ai, basically genAI-native computational notebooks, where operational analysts ask for intensive analytics tasks, like pulling Splunk/Databricks/Neo4j data, wrangling it in some runtime, clustering/graphing it, and generating interactive viz, Python has ups and downs:
On the plus side, it means our backend gets to handle small/mid datasets well. Apache Arrow adoption in analytics packages is strong, so zero-copy and columnar flows over many rows are normal. Pushing that to the GPU or another process is also great.
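For instance, Arrow's IPC format lets a consumer memory-map the bytes a producer wrote instead of copying them. A rough pyarrow sketch, with an arbitrary file name:

```python
import pyarrow as pa
import pyarrow.ipc as ipc

# Producer: write a columnar table once in Arrow IPC format.
table = pa.table({"id": list(range(5)), "score": [0.1, 0.2, 0.3, 0.4, 0.5]})
with pa.OSFile("batch.arrow", "wb") as sink:
    with ipc.new_file(sink, table.schema) as writer:
        writer.write_table(table)

# Consumer: memory-map the file; column buffers reference the mapped
# pages directly rather than being copied into the Python heap.
with pa.memory_map("batch.arrow", "r") as source:
    loaded = ipc.open_file(source).read_all()
```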
OTOH, one of our greatest issues is the GIL. Yes, it shows up a bit in single-user code (not discussed in the post), especially when doing divide-and-conquer flows for a user. However, the bigger issue is in stuffing many concurrent users into the same box to avoid blowing your budget. We would like the memory-sharing benefits of threading but, because of the GIL, are pushed toward the isolation benefits of multiprocess. A bit same-but-different: we stream results to the browser as agents progress in your investigation, and that has not been as smooth as what we have done in other languages.
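A minimal sketch of that threads-vs-processes trade-off (toy CPU-bound work; the sizes are made up):

```python
import multiprocessing as mp
import threading

def cpu_bound(n: int) -> int:
    # Pure-Python arithmetic: the GIL is held for the whole computation.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    args = [2_000_000] * 4

    # Threads share memory cheaply, but the GIL serializes this work,
    # so four threads finish no faster than one.
    threads = [threading.Thread(target=cpu_bound, args=(n,)) for n in args]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Processes run truly in parallel, but each worker pays for its own
    # interpreter state -- and its own copy of any large model it loads.
    with mp.Pool(processes=4) as pool:
        results = pool.map(cpu_bound, args)
```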
And moving to multiprocess is no panacea. E.g., a local embedding engine is expensive to run in-process per worker because modern models have high RAM needs. So that biases toward using a local inference server for what was meant to be an otherwise local call, which is doable, but representative of the extra work needed for production-grade software.
Interesting times!
Am I the only one who thinks a Swift IDE project should be called Taylor?
Sure, but this is a Rust project for building LLM applications called Swiftide, not a Swift IDE...
https://swiftide.rs/what-is-swiftide/
I would name it Tailor
Langchain and other frameworks are too bloated. They're good for demos, but I highly recommend building your own pipeline in production. It's not really that complicated, and you have much better control over the implementation. Plus, you don't need the 99% of packages that come with Langchain, which reduces security vulnerabilities.
I've written a series of RAG notebooks on how to implement RAG in Python directly, with minimal packages. I know it's not in Rust or C++, but it can give you some ideas on how to do things directly.
https://github.com/yudataguy/RawRAG
The trouble is that the Langchain community is large and jumps on the latest research papers almost immediately as they come out, which is a big advantage if you're a small team.
I've covered this before in articles such as this: https://neuml.hashnode.dev/building-an-efficient-sparse-keyw...
You can make anything performant if you know the right buttons to push. While Rust makes it easy in some ways, Rust is also a difficult language to develop with for many developers. There is a tradeoff.
I'd also say LangChain's primary goal isn't performance; it's convenience and functionality coverage.
Cool, that's a fun read! I recently added sparse vector support to fastembed-rs, with SPLADE, not BM25. Still, it would be nice to compare the two.
It would be helpful to move to a compiled language with a decent toolchain. Rust and Go are good candidates.
Actually, let me expand on that. And I hope I can say this without people reflexively seeing it as some kind of childish attack on Python.
The reason I don't like Python is twofold. The somewhat minor reason is that I'm not fond of having a lot of value tied up in a language that is dynamically typed and doesn't have the benefit of a compiler to at least enforce some degree of static correctness.
The biggest reason I don't like Python is that it is not a consumer-friendly language. And by consumer, I mean anyone who needs to run someone else's code. I've done a fair bit of embedded programming, and the HW community too has this love of Python that makes it play a key role in several toolchains. Making tooling written in Python work reliably and correctly becomes an unwanted distraction on firmware projects. Firmware development for MCUs tends not to be the most streamlined and slick development process to begin with. People have a lot on their plates, and it doesn't help when the tooling provides another layer of risk.
I've also seen a few examples of machine learning systems being developed in Python for industrial applications and then running into problems when it comes to making these solutions shippable. A couple of years ago I dealt with a company that was supposed to ship a machine learning solution for an expensive bit of industrial kit. The deadline slipped three times until they had to admit defeat and just swallow the loss, because they had run out of budget and time. Of course, you could probably blame the software engineers, but I suspect that if they had cared about correctness, performance, and reliability, they wouldn't have chosen Python to begin with.
Yes, Python is slower than a lot of other languages. But that's not really as important as reliability and predictability. I can live with things that have limited performance in a lot of cases as long as they are reliable and predictable. In a lot of cases, more powerful hardware is cheaper than the cost in time and development effort. Having systems that depend on thousands of files being arranged in just the right way and nobody being tempted to just fix things ad-hoc introduces enormous risk.
And sure, you can get more performance by implementing things in C/C++ and then wrapping them in Python libraries, but this is pretty janky.
Python is nice for prototyping and as a more flexible replacement for Matlab and the like. It is not a language you'd want to ship things in. I think Python became popular because it is more approachable for non-software engineers. I would really love if people could develop a bit more empathy for the consumers of solutions and perhaps put more of an effort into hardening and professionalizing software development. And for that Python isn't a good choice.
Yes, Rust is a lot harder to learn than Python, but I think this is addressable through good libraries. Perhaps the transition to a more sane environment could be easier if one chooses Go. Go has an astonishingly good standard library for software engineering. Imagine getting some of the low level stuff needed for AI and ML into the standard library so people could build easy to use libraries on top of that.
I was asking the same question; it turns out mistral.rs [0] has pretty good abstractions, in order to avoid depending on and packaging llama.cpp for every platform.
[0] https://github.com/EricLBuehler/mistral.rs
Why not use C++?
For the most part, these aren't security-critical components.
You already have a massive amount of code you can use, like, say, llama.cpp.
You get the same performance that you do with Rust.
Compared to Python, in addition to performance, you also get a much easier deployment story.
If you already have substantial experience with C++, this could be a good option. But I'd say nowadays that learning to use Rust *well* is much easier than learning to use C++ *well*. And the ecosystem, even if it's a lot less mature, I'd say is already better in Rust for these use-cases.
Indeed, here security (or more generally, safety) is a secondary concern and is not the main reason for choosing Rust, although it's welcome. It's just that Rust has everything C++ gives you, but in a more modern and ergonomic package. Although, again, I can see how someone already steeped in C/C++ for years might not feel that, and reasonably so. But I think I can fairly safely say that Rust is just "a better C++" from the perspective of someone starting from scratch now.
Indeed.
Plus, one doesn't usually just 'learn C++'. It's a herculean effort, and I've yet to meet anyone, even people using C++ exclusively for their entire careers, who could confidently say they "know C++". They may be comfortable with whatever subset of C++ their company uses, while another company's codebase will look completely alien, often with entire features they rely on being ignored, and vice versa.
Despite that, it's still a substantial time commitment, to the point that many (if not most) people working on C++ have made that their career; it's not just a tool anymore at that point. They may be more willing to jump entire industries rather than jump to another language. It is a generalization, but I have seen that far too often at this point.
If someone is making a significant time investment starting today, I too would suggest investing in Rust instead. It also requires a decent time investment, but the rewards are great. Instead of learning where all the (hidden) landmines are, you learn how to write code that can't have those landmines in the first place. You aren't losing much either, other than the ability to read existing C++ codebases.
> But I'd say nowadays that learning to use Rust well is much easier than learning to use C++ well.
For someone (me) who was making this choice recently, it is not that obvious. I tried to learn through Rust examples and ecosystems, and there are many more WTF moments compared to when I am writing C++ as C-with-classes + Boost, especially when writing close-to-the-metal performance code; Rust has many abstractions with non-obvious performance implications.
> Rust has many abstractions with non-obvious performance implications.
such as?
this article has several examples: https://blog.polybdenum.com/2021/08/09/when-zero-cost-abstra...
Rust is much better than C++ overall and far easier to debug (C++ is prone to very difficult to debug memory errors which don't happen in Rust).
The main reasons to use C++ these days are compatibility with existing code (C++ and Rust are a bit of a pain to mix), and if a big dependency is C++ (e.g. Qt).
Additionally, the industry standards on GPGPU APIs, and the tooling ecosystem.
Maybe one day we'll get the Live++ or Visual Studio debugging experience for Rust, given that plenty of Microsoft projects now use Rust.
> Maybe one day we get Live++ or Visual Studio debugging experience for Rust
I don't know what Live++ is but the Rust debugging experience is already better than C++. In VSCode I can just click "Debug Test" next to a test and it magically all works. It's never that slick in C++, even in Visual Studio.
VSCode is a joke versus Visual Studio proper. Try to do hot code reloading in Rust, dive into GPU debugging, parallel debugging, trace points; this is the kind of stuff Live++ or VS offers.
Lots of reasons, but a big one is that dependency and build management in C++ is absolutely hellish unless you use stuff like Conan which nobody knows. In Rust, you use Cargo and everyone is happy.
There are lots of things I don't know until I learn how to use them, duh.
Cargo is great, for pure Rust codebases, otherwise it is build.rs or having to learn another build system, and then people aren't that happy any longer.
Build.rs is great in most cases. The important thing is that it's easy for users to download and build everything.
When everything is Rust. That breaks down otherwise.
You can always use something as simple as Make for your C++ project, manually dumping dependencies into some libs folder.
That doesn't solve the problem of making it easy for people to download and build the dependencies.
The idea is that the dev packages dependencies in a lib folder, so when people retrieve the sources, they also retrieve all dependencies.
But you are right that this is some extra work for the dev, so it is a question of trade-offs: do you want to deal with all the C++ build tooling mess, or are you OK doing some manual steps?
Another alternative is to use some popular stable Linux distro as the build platform; then many libs will be packaged as part of that distro.
Oh, you mean vendoring. Yes, you can do that, but there is no easy support for keeping those vendored dependencies in sync with upstream. And if you want really nice things like notifications when those dependencies have security updates that you need to apply --- forget it.
I've worked with C++ in the past, it's subject to taste. I like how Rust's rigidness empowers rapid change _without_ breaking things.
Besides, the ML ecosystem is also very mature. llama.cpp has native bindings (which Swiftide supports), there are ONNX bindings, ndarray (NumPy in Rust) works great, Candle, lots of processing utilities. Additionally, many languages are rewriting parts in Rust; more often than not, these are available in Rust as well.
Why use C++? What's the benefit over Rust here?
This is very cool!
We built something for our internal consumption (and it's now used in quite a few places in India).
Edgechains is declarative (jsonnet) based, so chains + prompts are declarative. And we built a WASM compiler (in Rust, based on WasmEdge).
https://github.com/arakoodev/EdgeChains/actions/runs/1039197...
This is a comparison of apples to oranges. Langchain has an order of magnitude more examples, integrations, and features, and it also rewrote its whole architecture to try to make the chaining more understandable. I don't see enough documentation in this pipeline to understand how to migrate my app to it. I also realize it would take me at least a week just to migrate my own app to Langchain's rewrite.
Langchain is used because it was a first mover; that first-mover status is also its Achilles heel, and it is not used for speed at all.
DSPy is in Python, so it must be Python. Sorry bro :P
I mean, LLM-based or not has nothing to do with it; this is a standard optimization, a scripting-language vs. systems-language story.
Shhhh, let this one go. So many people don't get optimization and why it is needed that I'll take anything we can get. Hell, I routinely see people saying no one needs to know C because python calls C in "the backend" (who the fuck writes "the backend" then?). The more people that can learn some HPC and parallelism, the better.
Even better if they would learn about these amazing managed languages where we can introspect the generated machine code of their dynamic compilers.
Agree, but idk what the gateway in is since I'm so desperate for people to just get the basic concepts.
Obviously AI writes the backend.
Yeah because AI writes better code than me. I mean I've been coding for a year, including my super intensive coding bootcamp (we coded for 5 hrs a day! You can learn anything in two weeks), and got a FAANG job, so I'm pretty much an expert. We use AI to write all our code! Kinda bummed I'll be replaced but you know, I'll just move onto the next hustle. What y'all think about AR?
I wrote the blog mostly as an answer to people who ask us why we're choosing to use Rust. We're not using it for performance reasons, but it's also not true that the performance difference is entirely insignificant.
I'm surprised they don't talk about the business side of this - did they have users complaining about the speed? At the end of the day, they only increased performance by 50%.
These kinds of optimizations seem awesome once you have a somewhat mature product, but you really have to wonder if this is the best use of a startup's very limited bandwidth.
Core maintainer of Swiftide here. That's a fair comment! Additionally, it's interesting to note that almost all the time is spent in FastEmbed / ONNX in the Swiftide benchmark. A more involved follow-up with chunking and transformation could be very interesting and, anecdotally, shows far bigger differences. We did not have the time yet to fully dive into this.
Personally, I just love code being fast, and Rust is incredible to work with. Exceptions granted, I'm more productive with Rust than any other language. And it's fun.
> At the end of the day, they only increased performance by 50%.
> only 50%.
I'm sorry... what?! That's a lot of improvement and will save you a lot of money. 10% increases are quite large!
Think about it this way: if you have a task that takes an hour and you turn that into 59 minutes and 59 seconds, it might seem like nothing (about 0.03%). But now consider you have a million users; that's a million seconds, or 277 hrs! This can save you money; you are often paying by the hour in one way or another (even if you own the system, your energy has a cost that's dynamic). If this is a task run frequently, you're saving a lot of time in aggregate, despite not a lot per person. But even for a single person, this is helpful if more devs do this. Death by a thousand cuts.
But in the specific case, if a task takes an hour and you save 50%, your task takes 30 minutes. Maybe the task here took only a few minutes, but people will be chaining these together quite a lot.
> 10% increases are quite large!
You have to ask yourself, 10% of what? I don't usually mind throwing 10% more compute or memory at a problem, but I do mind if it's 10x more. I've shipped 100x perf improvements in the past where 1.5x would have been a waste of engineering time. A more typical case is a 10x or 20x improvement that's worth a few days of coding. Now, if I'm working on a mature system that's had tens of thousands of engineering hours devoted to it, and is used by thousands of users, then I might be quite happy with 10%. Though I also may not! The broader context matters.
Sure, but I didn't shy away from the fact that it is case dependent. In fact, you're just talking about the meta-optimization, which needs to be considered for any optimization too.
Maybe these optimizations benefit the two users who do the operation three times a year.
In such an extreme case no amount of optimization work would be profitable.
So the parent comment asks a very valid question: how much total time was saved by this and who asked for it to be saved (paying or free tier customers for example)?
People who see the business side of things rightfully fear when they hear the word "optimization", it's often not the best use of limited development resources - especially in an early stage product under development.
I do wish that when people write about optimization, they would then multiply by usage, or something similar.
Another way is to show CPU usage over a fleet of servers before and after, then reshuffle the workload onto fewer servers and use the number of servers no longer needed as the metric.
The number of servers has direct costs, as well as indirect ones, so you can even derive a dollar value. More so if you have a growth rate.
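A back-of-the-envelope version of that metric; every number below is invented purely for illustration:

```python
# All figures hypothetical: 40 servers at 70% average CPU, $500/month each.
servers_before = 40
avg_cpu_before = 0.70
cost_per_server = 500  # USD/month, assumed

# Suppose profiling shows the optimization halves fleet-wide CPU time.
avg_cpu_after = avg_cpu_before * 0.5

# Repack the same load onto fewer servers at the original utilization.
servers_after = round(servers_before * avg_cpu_after / avg_cpu_before)
freed = servers_before - servers_after
print(f"{freed} servers freed, ${freed * cost_per_server}/month saved")
```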
> I do wish that when people write about optimization, they would then multiply by usage, or something similar.
How? You can give specific examples and then people make the same complaints because it isn't relevant to their use case. It's fairly easy to extrapolate the numbers to specific cases. We are humans, and we can fucking generalize. I'll agree there isn't much to the article, but I find this ask a bit odd. Do you not have all the information to make that calculation yourself? They should have done that if they're addressing their manager, but it looks like a technical blog, where I think it is fair to assume the reader is technical and can make these extrapolations themselves.
I hear you. However, I have one rule of writing: assume the reader is lazy. It is not that they are, but the assumption goes a long way in making content digestible.
Also, I think knowing the combined effect is super interesting. For example, micro-benchmarks are fun to use and see improvements, but I also want to know the effect on the whole program.
I do wonder though if assuming the reader is lazy is for the best, especially in technical posts. I think there is a difficulty in balancing forcing the person to digest what you say and making it approachable (especially when you consider a noisy audience). It is a natural filter; is that good or bad? I guess it depends.
Agreed about the microbenchmarks and scale. Things don't always scale as expected. But I think there are a lot of variables here, so it might be difficult to portray an accurate expected result. Though I can see this being worthwhile for anyone wanting to build RAGs or process lots of text into embeddings. It also looks like the project is still under active development and started 6 months ago (single dev?), so I'm not sure we should expect to see too big a scale yet: https://github.com/bosun-ai/swiftide
So idk, that seems like exactly the kind of thing HN should be well suited for: new projects where people are hacking together useful frameworks. But idk, I guess if YC is funding companies whose business model is to fork an OSS project, then the bar might be lower than I think. But I thought we were supposed to be hackers (not necessarily crackers) ¯\_(ツ)_/¯
> So the parent comment asks a very valid question: how much total time was saved by this and who asked for it to be saved (paying or free tier customers, for example)?
That is a hard question to answer because it very much depends on the use case, which is why I gave a vague response in my comment. Truth be told, __there is no answer__ BECAUSE it depends on context. In the case of AI agents, yeah, 50% is going to save you a ton of money. If you make LLM calls once a day, then no, probably not. Part of being the developer is to determine this tradeoff. Specifically, that's what technical managers are for: communicating technical stuff to business people (sure, your technical manager might not be technical, but someone being bad at their job doesn't make the point irrelevant; it just means someone else needs to do the job).
You're right about early-stage products, but there are lots of moderate and large businesses (and yes, startups) that don't optimize but should. Most software is never optimized, and it has led to a lot of enshittification. Yes, move fast and break things, but go back and clean up, optimize, and reduce your tech debt, because you left a mess of broken stuff in your wake. But it is weird to pigeonhole this to early-stage startups.