(I'm not a LEGO expert.)
It looks like LEGO provides 3D models for their components (e.g., via Bricklink [1]). Wouldn't it be easier to generate the training data with a rendering pipeline of some sort that randomizes the brick position in a 3D scene with different lighting?
Of course, you can still collect the user submissions for the test set.
[1]: https://www.bricklink.com/v3/studio/download.page
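For the curious, here's a minimal sketch of what such a pipeline could look like as a Blender (bpy) script. It assumes the brick mesh has already been imported (e.g., via an LDraw import add-on) as an object named "brick", and that the scene has a camera and a light named "Light"; the object names, paths, and parameter ranges are all made up for illustration.

```python
import math
import random
import bpy

brick = bpy.data.objects["brick"]   # pre-imported brick mesh (assumed name)
light = bpy.data.objects["Light"]   # default point light (assumed name)
scene = bpy.context.scene

for i in range(1000):
    # Randomize the brick's pose on the "table" (the z = 0 plane).
    brick.location = (random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5), 0)
    brick.rotation_euler = tuple(random.uniform(0, math.tau) for _ in range(3))

    # Randomize lighting position and intensity.
    light.location = (random.uniform(-2, 2), random.uniform(-2, 2),
                      random.uniform(1, 3))
    light.data.energy = random.uniform(200, 2000)

    # Render one labeled sample; the part ID comes for free via the filename.
    scene.render.filepath = f"//renders/3001_{i:04d}.png"
    bpy.ops.render.render(write_still=True)
```

The labels coming for free is the main appeal; as the replies below note, the hard part is closing the gap between renders like these and real photos.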
There is some code here that generates Lego images from Lego 3D files and uses them to train models:
https://github.com/jtheiner/LegoBrickClassification
It is based on this post (which others have mentioned):
https://jacquesmattheij.com/sorting-lego-the-software-side/
> Wouldn't it be easier to generate the training data with a rendering pipeline of some sort that randomizes the brick position in a 3D scene with different lighting?
You might enjoy the 2023 paper "Brickognize: Applying Photo-Realistic Image Synthesis for Lego Bricks Recognition with Limited Data" https://www.mdpi.com/1424-8220/23/4/1898
Wow. Exactly my idea. Thanks for linking it, I'm glad it's a successful approach!
From personal experience, it doesn't work that well. The available 3D models [1] are not detailed enough, and the effort to make realistic and diverse renders is huge. It will kinda work, but it won't generalize well to real images.
[1] https://www.ldraw.org/
That's something I don't really understand. A 3D model provides, in one dataset, the whole geometry of each part; wouldn't that make it easier to learn and recognize that part in the real world?
Is it because these ML algos lack a way to internally interpret 3D models for learning?
On top of that, you reduce the effort in data labeling, as each model would come with the relevant part ID, shape, and color.
Of course, finetuning can be done afterwards with real-world photos to increase robustness.
I think it's because we classify single images, which lose the 3D information. We expect models to work from a single image, but even humans avoid doing that (we rely on stereoscopic vision) if we can help it.
Right, I would assume we'd render multiple 2D images from different angles and train on that.
But as I'm writing this, I wonder why we can't do ML directly on the 3D models...
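Rendering the multiple views is straightforward to script, for what it's worth. A rough bpy sketch that orbits the camera around a brick at the origin (the camera distance, height, and view count are arbitrary):

```python
import math
import bpy
from mathutils import Vector

cam = bpy.context.scene.camera
views = 12

for i in range(views):
    # Place the camera on a circle around the brick at the origin.
    angle = 2 * math.pi * i / views
    cam.location = Vector((3 * math.cos(angle), 3 * math.sin(angle), 1.5))

    # Aim the camera at the origin (a camera looks down its local -Z axis).
    cam.rotation_euler = (-cam.location).to_track_quat('-Z', 'Y').to_euler()

    bpy.context.scene.render.filepath = f"//views/view_{i:02d}.png"
    bpy.ops.render.render(write_still=True)
```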
Computer vision has become very good, but whatever it is exactly that it is doing, it is still not the same as human vision, and this is one of the places where that really sticks out. A human can learn from a bunch of super-perfectly-pristine 3D renders to recognize a real-world object. Heck, we can do it from a single pristine 3D render, and not even necessarily a high-quality/resolution one. Whatever it is that computer vision is doing, it is something less "powerful" than human, which we then make up for by throwing a lot more computation and resources at it, which covers over a lot of the problems.
If you can figure out exactly why that is, you will at the very least get a very well-cited paper out of it, if not win some sort of award. It's not a complete mystery what the problem is but the solution is unknown.
Because we don't know what the difference is, we can't fix up the 3D renders. "Just make it noisy" certainly isn't it. (We in fact have a lot of experience throwing noise at things; the whole "stable diffusion" AI image generation is based on that principle at its core.) It has to be the right sort of noise, or distortions, to represent the real world, and nobody can tell you exactly what that is.
It's really hard to do in practice.
Yes, you can train an ML model on rendered data, but the model tends to fixate on rendering artifacts, and performance doesn't transfer to real-world images. Plus, it's very difficult to make generated scenes with the variety and complexity of the real world. Your trained model will fail to generalize to all the distractions in natural scenes.
Yes, there are techniques for all these problems, but none of them are good, reliable, or easy to get right.
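To give one concrete example of such a technique: heavy, randomized photometric augmentation (a crude form of domain randomization) is usually the first thing people try. A minimal torchvision sketch, with arbitrary parameter values:

```python
import torchvision.transforms as T

# The values aren't magic; the point is to span more visual variation
# than the renderer produces, so the model can't latch onto render quirks.
train_transforms = T.Compose([
    T.RandomResizedCrop(224, scale=(0.6, 1.0)),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    T.RandomGrayscale(p=0.05),
    T.ToTensor(),
])
```

Even with all of this, actually closing the sim-to-real gap is trial and error, which is the parent's point.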
Yes, I think it should be possible without any technological hurdles. It's just some work to set it up.
(DL models are trained with one type of camera and used with another all the time, and that's sort of similar; plus, renderers such as Blender's are pretty good, and they should work well with LEGO bricks, which are relatively simple objects. If despite all this it doesn't work, you could degrade both the real and the generated images, e.g. by quantization, with the aim of bringing them closer together; see the sketch below.)
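A sketch of that degradation idea, using Pillow to downscale and JPEG-recompress an image (the resolution and quality settings are guesses you'd have to tune):

```python
from io import BytesIO
from PIL import Image

def degrade(path: str, size: int = 96, quality: int = 30) -> Image.Image:
    """Downscale and JPEG-compress an image so renders and real photos
    lose the fine detail where the two domains differ the most."""
    img = Image.open(path).convert("RGB")
    img = img.resize((size, size), Image.BILINEAR)
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=quality)  # lossy re-encode
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```

The hope is that, applied to both the renders and the real photos, this throws away exactly the cues that let the model tell the domains apart.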
Because the input is a 2D picture, not a 3D scene with all the coordinates and geometry information.
Another neat app, if you have an iPhone, is Brickit, which scans a large pile of your Lego pieces and gives you build ideas.
https://brickit.app/
This one actually works :)
Source: I'm responsible for ML development at Brickit
One question about Brickit: the main use case that I and many, many dads of Lego kids have is that our kids want to reassemble sets, and we spend ages searching for those pieces. Yet Brickit works by identifying and recommending its own mini-set lists. Is this use case out of scope (because of the business model), or is it technically difficult or unsatisfactory in execution (colors, accuracy)?
The use case "let me reassemble all the parts that I have" is out of scope for the Brickit app. It works explicitly with the contents of a single scan; that's why you see small build ideas: they're just what fits the parts you scanned.
But! We recognize the intent, and we have something in the works which will be released very soon, stay tuned!
Great! I will follow your updates!
Would love to hear more about the architecture choices you made.
Do your models run on device? What's the general CV backbone?
Everything runs on device in TFLite, and it gives us some headaches, especially in the Android ecosystem.
We don't use anything fancy: the detector is SSD-like, and the classifier is either a ResNet or an EfficientNet, depending on your device's capability.
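For anyone curious what an on-device inference loop looks like in rough terms, here's a sketch using TensorFlow's Python TFLite interpreter (the mobile APIs differ, and the model path, input shape, and pipeline details here are my own guesses, not Brickit's actual code):

```python
import numpy as np
import tensorflow as tf

# Load a TFLite classifier; the model file name is hypothetical.
interpreter = tf.lite.Interpreter(model_path="brick_classifier.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def classify(crop: np.ndarray) -> int:
    """crop: HxWx3 float32 image already resized to the model's input size."""
    interpreter.set_tensor(inp["index"], crop[np.newaxis, ...])
    interpreter.invoke()
    scores = interpreter.get_tensor(out["index"])[0]
    return int(np.argmax(scores))  # index of the predicted part class
```

On device you'd run the SSD-style detector first to get per-part crops, then feed each crop to the classifier like this.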
I built a rudimentary brick sorting machine with the Lego color sensor many years ago, and I've seen some awesome brick sorting machines pop up over the years [1]. I started planning a sorting machine with a larger array of buckets in CAD but never got around to building it. Hopefully, intrepid Lego fans will keep advancing the state of the art.
1. https://www.youtube.com/watch?v=04JkdHEX3Yk
Nice.
Reminds me of this: https://jacquesmattheij.com/sorting-two-metric-tons-of-lego/
I've tried different part detectors, and nothing has been as good as Google Image search (download the Google app if you're on an iPhone). I've found it surprisingly good at identifying LEGO parts.
I'm curious how far away we are from a robot that can pick up LEGO pieces and stack them together according to the assembly booklet.
And what kind of DL models would it use?
I tried to do brick sorting (because we have great detection and classification models at https://brickit.app/)
It turned out to be much more complex than I expected.
The biggest issue was grabbing. The typical approach for this type of task is a vacuum suction actuator, but it doesn't work for Lego parts, because their studs prevent the suction from sealing.
There are also issues with part separation.
We abandoned this idea, but I still hope that we can return and achieve something working some day.
But what would be the point of that, beyond the thrill of building a robot? The joy of LEGO is finding the pieces and building it yourself, not in owning an assembled model.
We're a ways away from a hand and arm that dexterous, though there have been promising developments given all the money going into AI lately. We're also a ways away from having a machine "read" a PDF of LEGO instructions and then model it as a digital twin. It's being worked on, though, so maybe in our lifetimes.
That's an AI I'd rather not see though - that's the fun part of LEGO. If you really want a model completed, pay a local teenager (or laid-off software dev) to do it.
> We're a ways away from a hand and arm that dexterous
So you are saying it is not possible to stack LEGO pieces by a remotely controlled arm (with a human operator)?
> That's an AI I'd rather not see though - that's the fun part OF LEGOs
AI is already claiming all the fun tasks like writing poems and drawing kittens ;)
Have you ever assembled a Lego set rated 8+ or older? It's not about stacking bricks, it's about applying just enough pressure at exactly the right angle, whilst supporting the assembly in the right places. (Something humans can do remarkably well.)
This is something you could probably program an expensive assembly robot to do if it has to repeat the same procedure thousands of times, but doing it human-operated for a whole set? That only makes sense if you haven't got a set of nimble appendages and opposable thumbs, and you have the time to spare and the inclination to train with the remotely controlled arm. Having an LLM do it seems fruitless at this point, and why bother?
Obviously, the reason for wanting a LEGO assembly robot is not to assemble LEGO projects, but to learn from the technology that makes it possible...
No mention of a license for the contributed data, or for the model.
Related: see HN user jacquesm's work on a LEGO sorting machine:
https://jacquesmattheij.com/sorting-lego-many-questions-and-...
Neat. I happened to rediscover and sort my childhood Lego collection two weeks ago, and had good success on the not-so-common parts with the BrickOwl camera search (I don't know what kind of engine is behind it).
Does it use Brickognize underneath?
RebrickNet is an abandoned project that supports only 300 parts. I recommend Brickognize [1], which recognizes all 85k Lego parts, minifigures, and sets.
Disclosure: I'm the creator.
[1] https://brickognize.com/
Are you willing to share some info on architecture and data set?
That's a weirdly minimalistic site. I mean, it's nice that the front page is exactly what you want to see, but I wish the About page showed some more information about what the site can do: some examples, technology bits, etc.
Sometimes a hammer can just be a hammer. I wish the whole web was so clean.
Piotr has an amazing video on the inner workings of his model https://www.youtube.com/watch?v=bzyG4Wf1Nkc
Which works really well! Thanks for a great project!
Is there any more information available on how it works?