Draw white square with bottom right corner black: https://pbs.twimg.com/media/GaBaeutWAAAJj6F?format=png&name=...
Draw white square with bottom right corner green: https://pbs.twimg.com/media/GaKerXDWEAA76J4?format=png&name=...
Draw white square with top left corner orange: https://pbs.twimg.com/media/GaOz1mcXMAAIepd?format=png&name=...
It reminded me of this conversation where DALL·E 3 refused to generate a picture of just water.
https://mastodon.social/@sibilant/113340784251650338
It's funny, their current landing page reads: "DALL·E 3 understands significantly more nuance and detail than our previous systems, allowing you to easily translate your ideas into exceptionally accurate images."
I'm still impressed most of the time, don't get me wrong.
I gave it a shot using phrases that avoid negations like “without”:
https://chatgpt.com/share/67153500-767c-800f-b3a2-bad706935c...
If you really want to get what you want, want what you get.
It also can't draw a Middle Eastern man without a beard...
What about clean-shaven?
Not really surprising. Vector embeddings are really not that great at conveying arbitrary "without"s. The words the model sees are "alligator", "tail", and "without", but without means nothing. If something is in the prompt, it should be drawn, so it's going to make extra sure there is a tail in the image.
The exception is when it's common to refer to something that has an element removed, for example, a French king without a head.
Some prompting software lets you specify negative prompts, which is useful if, for example, you want a picture of a mustang, the horse. You can specify negative: car, and the model will avoid diffusing into anything that looks like a car, but you can't get that level of control from ChatGPT.
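For anyone curious what "negative: car" does under the hood: in Stable Diffusion-style tools, as far as I know, the negative prompt replaces the empty prompt as the baseline in classifier-free guidance, so each step is pushed away from it. Here is a toy sketch of that arithmetic. The 2-D vectors and the names (`guide`, `eps_mustang`, and so on) are made up for illustration; real noise predictions are image-sized tensors, and this is not any particular library's API.

```python
def guide(eps_positive, eps_negative, scale):
    """Classifier-free guidance: start at the baseline (negative) prediction
    and move `scale` times the difference toward the positive prediction."""
    return [n + scale * (p - n) for p, n in zip(eps_positive, eps_negative)]

# Hypothetical 2-D "noise predictions": axis 0 ~ horse-like, axis 1 ~ car-like.
eps_mustang = [0.6, 0.5]  # "mustang" alone is ambiguous between horse and car
eps_empty = [0.3, 0.3]    # empty prompt, the usual default baseline
eps_car = [0.0, 1.0]      # negative prompt "car"

default_guided = guide(eps_mustang, eps_empty, scale=7.5)
negative_guided = guide(eps_mustang, eps_car, scale=7.5)

print("default baseline:", default_guided)
print("negative 'car':  ", negative_guided)
```

With the empty baseline, the car-like component stays positive. With "car" as the baseline, it is driven negative while the horse-like component grows, which is the "avoid diffusing into a car" effect.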
> The words the model sees are "alligator", "tail", and "without", but without means nothing.
That's an old approach used in SD 1/2-level solutions. For GPT, that answer is incorrect/outdated; we've moved past that approach. Newer models use sentence embeddings, which can represent meaning beyond individual words. Flux, for example, uses T5. OpenAI has been using some form of that for quite a while.
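To make the word-level versus sentence-level distinction concrete, here is a toy sketch. It does not show how DALL·E, Flux, or T5 actually encode text. It only compares an order-blind bag of words with a crude order-aware stand-in (word bigrams), and all function names are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(prompt):
    """Order-blind representation: just word counts."""
    return Counter(prompt.lower().split())

def bigrams(prompt):
    """Crude order-aware representation: counts of adjacent word pairs."""
    words = prompt.lower().split()
    return Counter(zip(words, words[1:]))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

pairs = [
    ("a crocodile with a tail", "a crocodile without a tail"),
    ("a woman chasing a bear", "a bear chasing a woman"),
]
for left, right in pairs:
    bow = cosine(bag_of_words(left), bag_of_words(right))
    big = cosine(bigrams(left), bigrams(right))
    print(f"{left!r} vs {right!r}: bag-of-words={bow:.2f}, bigrams={big:.2f}")
```

The bag of words sees the two crocodile prompts as nearly identical and the two bear prompts as exactly identical, while even this crude order-aware version separates them. A real sentence encoder can in principle do far better, though whether the image model then acts on that difference is a separate question.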
Asking it to "draw a tailless crocodile" returns a similar result.
BTW, crocodile, not alligator.
Interesting: if I ask it to draw a crocodile tail without a body, it does just fine. Why is it that it can’t draw just the body? Even a prompt asking it to draw a crocodile with just the body ends with a full image of a crocodile.
"Crocodile tail" is a thing. "Body" is very generic: what kind of body? "Without a body" might not do anything for the prompt; try it without.
"Crocodile with just the body" is probably understood as just "crocodile", because "crocodile without a body" probably isn't in the training data, so the model never learned the nuance of the negation.
Try to get a picture of a woman chasing a bear. I have not succeeded.
I don’t think Da Vinci ever sat there arguing with his paintbrushes.