Hey all, I'm excited to be the new CEO of DBOS! I'm coming up on my one month anniversary. I joined because I truly believe DBOS is solving a lot of the main issues with serverless deployments. I still believe that Serverless is the way of the future for most applications and I'm excited to make it a reality.
Would it be correct to say the these client libraries provide the functionality (eg ease of transactions, once only, recovery) whereas your cloud offering solves the scaling / performance issues you’d hit trying to do this with a regular pg compatible DB?
I do a lot of consulting on Kafka-related architectures and really like the concept of DBOS.
Customers tend to hit a wall of complexity when they want to actually use their streaming data (as distinct from simply piping it into a DWH).. being able to delegate a lot of that complexity to the lower layers is very appealing.
Would DBOS align with / complement these types of Kafka streaming pipelines or are you addressing a different need?
Yeah exactly! The Kafka use case is a great one--specifically writing consumers that perform real-world processing on events from Kafka.
In fact, one of our first customers used DBOS to build an event processing pipeline from Kafka. They hit the "wall of complexity" you described trying to persist events from Kafka to multiple backend data stores and services. DBOS made it much simpler because they could just write (and serverlessly deploy) durable workflows that ran exactly-once per Kafka message.
Recently I came to know about https://www.membrane.io/, which also follows similar approach, but it looks like that is more for internal apps and small projects.
From a high level what we offer is similar -- durable and reliable compute.
There isn't a lot of public information about how they are built, but from what I can tell you're right -- their architecture is more oriented for small projects.
It looks like they store the entire JS heap in a SQLLite database. We store schematized state checkpoints in Postgres compatible database, which makes it so that we can scale up and allow interesting things like querying the previous states and time travel debugging, where you can actually step though previously run workflows.
I've been using Temporal recently for some long-running multi-step AI workflows -- helps me get around API flakiness, manage rate limits for hosted models, and manage load on local models. It's pretty cool to write workers in different languages and run them on different infra and have them all orchestrate together nicely. How does DBOS compare -- what are the core differences?
From what I can tell, the programming model seems to be pretty similar but DBOS doesn't require a centralized workflow server, just serverless functions?
Great question! Yeah, the biggest difference is that DBOS doesn't require a centralized workflow server, but does all orchestration directly in your functions (through the decorators), storing your program's execution state in Postgres. Implications are:
1. Performance. A state transition in DBOS requires only a database write (~1 ms) whereas in Temporal it requires a roundtrip and dispatch from the workflow servers (tens of ms -- https://community.temporal.io/t/low-latency-stateless-workfl...).
2. Simplicity. All you need to run DBOS is Postgres. You can run locally or serverlessly deploy your app to our hosted cloud offering.
it might be interesting to look at some standard for workflows like CACAO to express what a workflow is. that way, workflows can ultimately become shareable between such workflow execution engines, and have common workflow editors. its (in cyber) a big problem that workflows cannot be shared between different systems which adds great costs to implementing such a system (need to redesign or design all workflows from the ground up). I think workflows and easy editors to assemble and connect steps are a good step ahead in any automation domain, but everywhere people want to reinvent the wheel of expressing what a workflow is.
definitely a fan of what these types of systems can do in replay/recovering and retying steps etc. as well as centralizing a lot of didferent workloads to a common execution engine.
Looks like an interesting abstraction. I can see the usefulness because I had to create a poor man's version of this when I built a CD. Sorry, because I don't have time to watch the 50 minute video, but how are you guaranteeing durability? Are you basically opening Postgres transactions for each step, or is there something else going on to persist state?
Yeah! Under the hood, DBOS wraps each function (step) to log its output in the database. This ensures that workflows can be safely re-executed if they're interrupted, guaranteeing durability.
Co-founder here! No current plans to support SQLite. We picked Postgres because of its huge ecosystem--you can use DBOS with any PostgreSQL-compatible database (Supabase, Neon, Aurora, Cockroach...) and with any Postgres extension (here's an example app using pgvector: https://github.com/dbos-inc/dbos-demo-apps/tree/main/python/...)
(After a quick look at the code) Is this due to concurrency (writes)? It looks like this architecture supports multiple executors, and I would imagine you require transactional guards to ensure consistency. I really like this interface btw, the complexity is hidden very well and from reading your docs it remains accessible if you need to dig deeper than a decorator.
And how the heck are you maintaining Typescript and Python copies? lol
Thanks for your kind words! We're focusing on Postgres because an important scenario for durable execution is serverless computing, which won't work with an embedded database.
I am sure you are aware of this, but if not: there are some emerging technologies around embedded database scale-out using CRDT and other replication protocols that would support various “serverless” (as in decentralized) topologies. PGLite, sqlite-cr, libSQL et al. I am informally looking at for serverless executor agents that do not need to coalesce around a central database instance (“server-full”).
I am sure you tested something like this, I would guess that classic/CDC replication lag would throw a big wrench into an attempt to orchestrate disconnected remote executors, I am hoping that in a peer to peer topology this new tech will have low enough sync latencies to be useful.
Best of luck with DBOS! You have an amazing team.
Step functions are an "external orchestrator". With DBOS your orchestration runs in the same process, so you can use normal testing frameworks like pytest for unit tests. Its super easy to test locally.
We have a time travel debugger that makes it super easy to test workflows. You could set them up in test and then time travel them, or even time travel completed workflows in production.
This looks quite cool. If anyone from DBOS is still around: does this handle more complex dependency relations between workflow steps (e.g. directed graphs), or is it only suitable for linear workflows?
I'd recommend using child workflows for directed graphs. The building blocks are start_workflow to split off a child, and then you can wait for the result at a later time. Workflows can also send events / messages to communicate back and forth with each other.
One neat thing about starting a child workflow is you can assign an idemopotency ID, which might be intentionally calculated in a way such that multiple parents will only start one run of the child workflow.
Very funny to me how much everyone went away from the db to achieve idempotent behavior, and now we're back to just using a db as a complicated queue with state.
It turns out that DBs were invented to solve hard problems with state management. When people moved away from DBs (at least transactional relational DBs) they had to rediscover all the old problems. Tech is cyclical.
One of the motivation for DBOS is that OSes were designed with orders of magnitude less state to manage than today. (e.g. linux >30 years ago). What's made to manage tons of state? A DBMS! :)
Recovering the application from failures especially when updating multiple data sources, once and only once execution and such things are in the application domain. They have never been done by relational databases. That is the problem solved by the Python SDK of DBOS ( and typescript SDK)
It's sort of a combination of both. The library solves those problems by storing specific data in the database and then taking advantage of the database's ACID properties and transactions to make the guarantees.
Then the DBOS cloud platform optimizes those interactions between the database and code so that you get a superior experience to running locally.
Hey all, I'm excited to be the new CEO of DBOS! I'm coming up on my one month anniversary. I joined because I truly believe DBOS is solving a lot of the main issues with serverless deployments. I still believe that Serverless is the way of the future for most applications and I'm excited to make it a reality.
Ask me anything!
Would it be correct to say the these client libraries provide the functionality (eg ease of transactions, once only, recovery) whereas your cloud offering solves the scaling / performance issues you’d hit trying to do this with a regular pg compatible DB?
I do a lot of consulting on Kafka-related architectures and really like the concept of DBOS.
Customers tend to hit a wall of complexity when they want to actually use their streaming data (as distinct from simply piping it into a DWH).. being able to delegate a lot of that complexity to the lower layers is very appealing.
Would DBOS align with / complement these types of Kafka streaming pipelines or are you addressing a different need?
Yeah exactly! The Kafka use case is a great one--specifically writing consumers that perform real-world processing on events from Kafka.
In fact, one of our first customers used DBOS to build an event processing pipeline from Kafka. They hit the "wall of complexity" you described trying to persist events from Kafka to multiple backend data stores and services. DBOS made it much simpler because they could just write (and serverlessly deploy) durable workflows that ran exactly-once per Kafka message.
Recently I came to know about https://www.membrane.io/, which also follows similar approach, but it looks like that is more for internal apps and small projects.
How would you compare DBOS with that ?
From a high level what we offer is similar -- durable and reliable compute.
There isn't a lot of public information about how they are built, but from what I can tell you're right -- their architecture is more oriented for small projects.
It looks like they store the entire JS heap in a SQLLite database. We store schematized state checkpoints in Postgres compatible database, which makes it so that we can scale up and allow interesting things like querying the previous states and time travel debugging, where you can actually step though previously run workflows.
I've been using Temporal recently for some long-running multi-step AI workflows -- helps me get around API flakiness, manage rate limits for hosted models, and manage load on local models. It's pretty cool to write workers in different languages and run them on different infra and have them all orchestrate together nicely. How does DBOS compare -- what are the core differences?
From what I can tell, the programming model seems to be pretty similar but DBOS doesn't require a centralized workflow server, just serverless functions?
Co-founder here:
Great question! Yeah, the biggest difference is that DBOS doesn't require a centralized workflow server, but does all orchestration directly in your functions (through the decorators), storing your program's execution state in Postgres. Implications are:
1. Performance. A state transition in DBOS requires only a database write (~1 ms) whereas in Temporal it requires a roundtrip and dispatch from the workflow servers (tens of ms -- https://community.temporal.io/t/low-latency-stateless-workfl...). 2. Simplicity. All you need to run DBOS is Postgres. You can run locally or serverlessly deploy your app to our hosted cloud offering.
it might be interesting to look at some standard for workflows like CACAO to express what a workflow is. that way, workflows can ultimately become shareable between such workflow execution engines, and have common workflow editors. its (in cyber) a big problem that workflows cannot be shared between different systems which adds great costs to implementing such a system (need to redesign or design all workflows from the ground up). I think workflows and easy editors to assemble and connect steps are a good step ahead in any automation domain, but everywhere people want to reinvent the wheel of expressing what a workflow is.
definitely a fan of what these types of systems can do in replay/recovering and retying steps etc. as well as centralizing a lot of didferent workloads to a common execution engine.
Hello! I’m here to answer any questions. I’d love to hear your feedback, comments, and anything!
Looks like an interesting abstraction. I can see the usefulness because I had to create a poor man's version of this when I built a CD. Sorry, because I don't have time to watch the 50 minute video, but how are you guaranteeing durability? Are you basically opening Postgres transactions for each step, or is there something else going on to persist state?
Yeah! Under the hood, DBOS wraps each function (step) to log its output in the database. This ensures that workflows can be safely re-executed if they're interrupted, guaranteeing durability.
More info here: https://docs.dbos.dev/explanations/how-workflows-work
Can you give us some more details about the CD pipeline you built? :)
Is it possible not to use a PostgreSQL database? For example would it run with SQLite? The goal is to improve developer experience.
You can now run Wasm builds of PostgreSQL that will get you everything you like about SQLite.
https://github.com/electric-sql/pglite
Co-founder here! No current plans to support SQLite. We picked Postgres because of its huge ecosystem--you can use DBOS with any PostgreSQL-compatible database (Supabase, Neon, Aurora, Cockroach...) and with any Postgres extension (here's an example app using pgvector: https://github.com/dbos-inc/dbos-demo-apps/tree/main/python/...)
(After a quick look at the code) Is this due to concurrency (writes)? It looks like this architecture supports multiple executors, and I would imagine you require transactional guards to ensure consistency. I really like this interface btw, the complexity is hidden very well and from reading your docs it remains accessible if you need to dig deeper than a decorator.
And how the heck are you maintaining Typescript and Python copies? lol
Thanks for your kind words! We're focusing on Postgres because an important scenario for durable execution is serverless computing, which won't work with an embedded database.
I am sure you are aware of this, but if not: there are some emerging technologies around embedded database scale-out using CRDT and other replication protocols that would support various “serverless” (as in decentralized) topologies. PGLite, sqlite-cr, libSQL et al. I am informally looking at for serverless executor agents that do not need to coalesce around a central database instance (“server-full”). I am sure you tested something like this, I would guess that classic/CDC replication lag would throw a big wrench into an attempt to orchestrate disconnected remote executors, I am hoping that in a peer to peer topology this new tech will have low enough sync latencies to be useful. Best of luck with DBOS! You have an amazing team.
How does the DX compare against AWS step functions? My experience is that it is very difficult to “unit test” step workflows.
Step functions are an "external orchestrator". With DBOS your orchestration runs in the same process, so you can use normal testing frameworks like pytest for unit tests. Its super easy to test locally.
We have a time travel debugger that makes it super easy to test workflows. You could set them up in test and then time travel them, or even time travel completed workflows in production.
https://docs.dbos.dev/cloud-tutorials/timetravel-debugging
This looks quite cool. If anyone from DBOS is still around: does this handle more complex dependency relations between workflow steps (e.g. directed graphs), or is it only suitable for linear workflows?
I'd recommend using child workflows for directed graphs. The building blocks are start_workflow to split off a child, and then you can wait for the result at a later time. Workflows can also send events / messages to communicate back and forth with each other.
One neat thing about starting a child workflow is you can assign an idemopotency ID, which might be intentionally calculated in a way such that multiple parents will only start one run of the child workflow.
Very funny to me how much everyone went away from the db to achieve idempotent behavior, and now we're back to just using a db as a complicated queue with state.
It turns out that DBs were invented to solve hard problems with state management. When people moved away from DBs (at least transactional relational DBs) they had to rediscover all the old problems. Tech is cyclical.
One of the motivation for DBOS is that OSes were designed with orders of magnitude less state to manage than today. (e.g. linux >30 years ago). What's made to manage tons of state? A DBMS! :)
Recovering the application from failures especially when updating multiple data sources, once and only once execution and such things are in the application domain. They have never been done by relational databases. That is the problem solved by the Python SDK of DBOS ( and typescript SDK)
It's sort of a combination of both. The library solves those problems by storing specific data in the database and then taking advantage of the database's ACID properties and transactions to make the guarantees.
Then the DBOS cloud platform optimizes those interactions between the database and code so that you get a superior experience to running locally.
Would I be able to use all the python and npm packages with it. Would something opening a headless browser to scrap data work with DBOS ?
Yes, its normal Python/node.js so you an use all their packages.
We know of users running puppeeter to scrap data.
[dead]