Note: StarVector models will not work on natural images or illustrations, as they have not been trained on that kind of image. They excel at vectorizing icons, logotypes, technical diagrams, graphs, and charts.
No, it won't (most likely). VTracer (which the authors compare against) is fast, runs in the browser via WASM, consumes far fewer resources, and can even convert natural images very decently.
But the model seems cool for the use case of prompt-to-logo or prompt-to-icon (compared to my current workflow of getting a JPG from Flux and passing it through VTracer). I hope someone over at llama.cpp notices this (at least for the text-to-SVG use case, if not the multimodal one).
Author of VTracer here. Finally managed to comment on Hacker News before the thread got locked.
Would be interested in learning about your workflow. Is it a logo generation app?
I feel like this is an example of "machine learning is eating software". Raster-to-vector conversion is a perfect problem for it, because we can generate datasets of effectively unlimited size and easily validate the results with vectorize-rasterize round trips.
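For instance, the round-trip check can be as simple as re-rasterizing the traced SVG and measuring pixel error against the source bitmap. A minimal sketch, assuming cairosvg, Pillow, and numpy are installed (file names and the metric are placeholders, nothing VTracer-specific):

    # Re-rasterize a traced SVG and compare it to the original bitmap.
    import io

    import cairosvg
    import numpy as np
    from PIL import Image

    def roundtrip_error(svg_path: str, original_png: str) -> float:
        """Mean squared pixel error between the source raster and the re-rendered SVG."""
        target = Image.open(original_png).convert("RGB")
        png_bytes = cairosvg.svg2png(
            url=svg_path, output_width=target.width, output_height=target.height
        )
        rendered = Image.open(io.BytesIO(png_bytes)).convert("RGB")
        a = np.asarray(target, dtype=np.float32)
        b = np.asarray(rendered, dtype=np.float32)
        return float(np.mean((a - b) ** 2))

    print(roundtrip_error("traced.svg", "original.png"))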
I did have an idea of performing tracing iteratively: adjusting the output SVG bit by bit until it matches the original image within a certain margin of error, and shrinking the output by simplifying curves whenever that does not degrade quality. But VTracer in its current state is one-shot and probably uses 1/100th of the computational resources.
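To be clear, something like this toy accept/reject loop is all I mean, not VTracer's actual algorithm: represent the output as flat-colored polygons, perturb vertices at random, and keep a change only when it reduces the round-trip error (names and values are illustrative):

    import random

    import numpy as np
    from PIL import Image, ImageDraw

    def render(polygons, size):
        """Rasterize a list of (points, fill) polygons onto a white canvas."""
        img = Image.new("RGB", size, "white")
        draw = ImageDraw.Draw(img)
        for points, fill in polygons:
            draw.polygon(points, fill=fill)
        return np.asarray(img, dtype=np.float32)

    def mse(a, b):
        return float(np.mean((a - b) ** 2))

    def refine(polygons, target, steps=1000, jitter=2.0):
        size = (target.shape[1], target.shape[0])
        best = mse(render(polygons, size), target)
        for _ in range(steps):
            i = random.randrange(len(polygons))
            points, fill = polygons[i]
            j = random.randrange(len(points))
            moved = list(points)
            moved[j] = (points[j][0] + random.uniform(-jitter, jitter),
                        points[j][1] + random.uniform(-jitter, jitter))
            trial = list(polygons)
            trial[i] = (moved, fill)
            err = mse(render(trial, size), target)
            if err < best:  # keep only changes that improve the match
                polygons, best = trial, err
        return polygons, best

    target = np.asarray(Image.open("original.png").convert("RGB"), dtype=np.float32)
    start = [([(10.0, 10.0), (90.0, 15.0), (50.0, 80.0)], (200, 30, 30))]
    shapes, err = refine(start, target, steps=500)
    print(f"final error: {err:.1f}")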
VTracer seems to perform badly on all the examples. I suspect it could be drastically improved simply by upscaling the image (via traditional interpolation or machine-learning-based upscaling) and picking different parameters. But I am glad it was cited!
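A rough sketch of what I mean: upscale first, then trace with tweaked parameters. The vtracer call below uses the Python binding's function and keyword names as listed in its README, so double-check them against your installed version:

    from PIL import Image
    import vtracer

    # Upscale 4x with plain Lanczos interpolation before tracing.
    src = Image.open("icon_small.png").convert("RGB")
    upscaled = src.resize((src.width * 4, src.height * 4), Image.LANCZOS)
    upscaled.save("icon_upscaled.png")

    # Parameter values are illustrative, not tuned.
    vtracer.convert_image_to_svg_py(
        "icon_upscaled.png",
        "icon.svg",
        colormode="color",      # "color" or "binary"
        filter_speckle=4,       # discard tiny patches
        corner_threshold=60,    # how aggressively corners are kept
        path_precision=3,       # decimal places in path coordinates
    )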
Thanks for noticing this. Yes, I have also noticed what you're pointing out, but it's workable for many use cases. I use this workflow for making images for marketing or the web (so the images are more artistic than photorealistic to begin with). Think of the kind of illustrations you find on unDraw, but generated by image models from prompts. Then I run them through VTracer. The reproductions are not perfect, but often good enough (it can be slow depending on how sharp you want the curves, and file sizes are often very large, as you mentioned). Then I make any changes in Inkscape and convert back to raster for publishing.
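The scripted part of the pipeline is roughly this (a sketch; the vtracer and Inkscape CLI flags are taken from their docs and may differ by version), with the hand-editing in Inkscape happening between the two steps:

    import subprocess

    raster_in = "flux_output.jpg"       # image generated from a prompt
    svg_out = "illustration.svg"
    png_out = "illustration_final.png"

    # 1. Raster -> SVG with the VTracer CLI.
    subprocess.run(["vtracer", "--input", raster_in, "--output", svg_out], check=True)

    # 2. (Manual) open illustration.svg in Inkscape and make edits.

    # 3. SVG -> raster for publishing, using Inkscape 1.x's CLI exporter.
    subprocess.run(
        ["inkscape", svg_out, "--export-type=png", f"--export-filename={png_out}"],
        check=True,
    )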
> logo generation app
For logo generation, I would actually prefer code gen. I thought about this problem recently when reading about diffusion language models (provided there is enough training data available in the form of text-vector-raster triplets).
Did anyone else notice that the molecule it generated did not match the source image?
Seems like this could be incredibly valuable, but I'd argue there need to be validation steps in place to confirm it's actually generating the right thing, at least for the image -> vector use case.
This would be absolutely GREAT for generating icons for applications!
(Also would make a great SaaS... for $X/month ($9.95, $19.95, ??.??) generate unlimited icons...)
Congrats to the team for their pioneering hard work in this nascent area of LLM/Transformer research!
Well done!
...you are the validator
I've been waiting for someone to make something like this! Perfect.
Hooray, the code is released!