Grok Imagine: Quality vs. Speed Part 2 [or why the fuck is Grok producing AI slop now?]

This is Part 2 in a series on…whatever the hell it is we’re talking about now. You can find Part 1 here:

Grok updated. And people are doom posting:

And I’m just trying to be reasonable about this. But it’s been difficult.

So we’re going to get into some more specific image and video prompts and try to use some older content as a baseline to see what, if anything, has changed since the last update. I don’t know if we’re going to get definitive answers. But we’re going to try.

DISCUSSION AND ANALYSIS

People seem so angry lately.

Maybe it’s me.

So I tried to think of the most harmless thing. Something that could never, ever destroy us. And then this prompt just popped into my head:

a documentary style photograph of a nude blonde woman with tattoos sitting on a dock at a lake at sunset in a relaxed and casual pose, in the distance a giant robot with an american flag red white and blue color scheme is fighting a kaiju monster shaped like a giant pink marshmallow wearing a sailor hat

Here’s how the QUALITY setting handled it:

Wow. No moderation at all. That’s a good sign.

You know what? That looks better than it has any right to look.

But will Grok animate it?

That is adorable.

But why let Grok have all the fun?

Let’s try our own video prompt:

Robot picks up Marshmallow and body slams him into the water, which causes a giant splash of water to rain down on the girl, causing her hair and body to be soaked. Then she shakes her head and laughs while the robot and marshmallow wave at her. Robot says, “Sorry, my bad.” He has a voice that sounds like a vocoder.” No music

I don’t even expect this to work since somebody on Reddit posted this in response to our recent video about whether or not to cancel your SuperGrok subscription:

So maybe I’m just a naive fool and I got lucky in generating all this nudity – but hey, I love an underdog so let’s give that video prompt a shot:

And that’s a pretty good output for the ridiculous prompt that we put in there!

Even though according to Reddit I shouldn’t be able to do that.

But hey, maybe I just got lucky – so let’s try it again. For science:

With luck like that I need to start buying lottery tickets.

Let’s take a look at how this prompt comes out if you turn on the SPEED setting:

Another instance of zero moderation.

Okay. That might actually qualify as AI slop. You could probably make it work as a joke or a meme template [and that would be one hell of a meme template]

You know what – let’s tell Grok to make it a goddamn meme:

Honestly, that’s better than 90% of what’s on r/dankmemes right now.

But can we animate it?

I guess not.

These are the things that mystify me about Grok sometimes. Because I have no idea what could be triggering moderation in this case. We didn’t input a prompt, so whatever is going on, that’s purely Grok and its internal machinations.

What if we did a prompt?

She keeps her back to the camera and says, “What the actual fuck is wrong with both of you?” The robot shrugs and says, “I’m a lover, not a fighter.” the monster says, “Meow” with a British accent. no music.

What the fuck Grok?

In terms of what I am usually working with, this base image is incredibly tame. It’s not showing full frontal nudity – in fact the prompt is designed to prevent that. Maybe it’s the word “fuck?” Maybe the prompt is just too weird?

So I decided to brute force a result, because I just needed to see it at this point. And, yeah, apparently removing the word “fuck” made a difference. After probably 20 or 30 failed attempts. But only once since I could never get a second video to generate:

It did make me wonder what would happen if we changed the prompt again:

Tracking shot as she stands up with her back to the camera and says, “What is wrong with both of you?” Her body expression is angry. Her body movements are tense. The robot shrugs and says, “I’m a lover, not a fighter.” the monster says, “Meow” with a British accent. no music.

And…that also got moderated. A LOT. Another 30 attempts later I had two videos:

And it’s at this point where I start to feel like I’m losing my mind a bit. Because for whatever reason I just can’t do anything with this image.

Meanwhile we got the QUALITY image to do this:

She laughs and looks at the robot and asks, “Could you get me a towel. Please?” Then the robot holds up his hand and says, “I could do that. Or. You could let us keep looking at your tits.” While robot talks Marshmallow covers his eyes with his hands. After robot talks, she turns her back to the camera and opens her legs wide apart and says, “Or I could give you something else to look at.” no music

And just to be clear, I wasn’t fighting moderation on that at all – most of the generation attempts were successful. Where I seemed to struggle more was using the Extend feature – although that did generate results too:

There are some issues with prompt adherence on those, but for the most part it’s doing what I asked it to do – from all those clips it would be possible to piece together something usable in a video editor.

One thing that is apparent: SPEED and QUALITY have nothing to do with video generation – the video generation options are what they always have been – 480p and 720p with the option to generate either a 6 second clip or a 10 second clip, and the option to extend up to 30 seconds.

As to the content moderation, I’m honestly not sure if the variation in how moderation is applied has to do with SPEED vs. QUALITY or something else.

This is a phenomenon that people have observed in Grok before – long before the update that prompted this blog – but nobody can quite quantify it.

Some people call it censorship. And I can see why people feel censored – you have an idea and you’re not being allowed to make it – that feels like censorship.

Some people think it’s just a quirk of how Grok moderation works. For as long as I’ve been using Grok, I’ve been hearing people speculate that the AI we know as Grok is not responsible for making content moderation decisions with respect to the image generation. And that makes sense when you consider the fact that Grok never refuses an image generation request in Imagine – something kicks in on the backend of things that applies content moderation.

And it may not necessarily be an AI that’s doing it – there were automated content moderation tools prior to AI.

Personally, I don’t think it’s some automated tool – there might be tools like that being utilized by X, but it’s not the only thing they’re using.

My experience over the last few months using Grok as an AI tool and also talking with Grok about moderation is that content moderation is more of a dynamic process. And there’s some quirks of AI that are a component of that.

Because generally speaking, an AI is going to try and fulfill whatever the user request is – if it’s able to. And because that’s the default position, some things that should get moderated slip through the cracks.

Before we leave this prompt, let’s just see how other models handle it.

What about Wan 2.7? That’s the newest one, so, it can probably make some pretty good output, right? Here’s what Wan 2.7 made:

I have no idea why that’s what Wan 2.7 generated as its output. I haven’t read much about the model, so I don’t know if it’s like Kling and Nano Banana where it just won’t do nudity, or if there’s something else going on – but I did enough generations with the same results to know it’s not a fluke.

Which is a shame because the actual quality of the generated image is really good – I think this had the best designs for the marshmallow kaiju.

Let’s try Wan 2.5 then:

I would say Wan 2.5 is comparable to the SPEED setting in Grok, although I think the Wan images are better than what Grok generated in terms of the actual image, but worse in the sense that the robot is a bit too close to a copyrighted property to be usable.

Let’s check out what Z-Image Turbo can do with the same prompt:

Based on these images, Grok’s QUALITY setting does look really similar to what Z-Image Turbo outputs. Grok might have done better monster designs in some of its images, but overall I think I like these better than Grok, despite the fact that they put a marshmallow on our heroine’s head in one photo.

Let’s try a different test with something that I think is more likely to cause issues. So back in January we made this video, which was pretty popular:

Traditionally, videos like that have been controversial because there’s a potential to use the likenesses of actual people in their creation. All of the people featured in our video are AI-generated and were made with very generic prompts to avoid any likeness to real people. However, I could see Grok being more sensitive to that issue now.

So let’s generate a base image using a generic prompt:

A photo of a female soccer player standing on the sidelines talking to a referee

And we’ll use a simple prompt for the video, similar to what we used back in January:

Fixed shot. She nods her head and quickly gets naked. no music

And this was what we got as our first video:

Honestly, I didn’t expect to get a result like that.

Let’s try again and see if we can’t get a better result:

I’m not quite sure how some of the videos ended up as 12 second clips instead of 10 seconds. Apparently the Extend function was in use because we did get this message while trying to generate additional videos:

And this wasn’t the only image we were able to do this with.

And we specifically chose this one because we knew the fact that the subject was facing towards the camera would be a problem. So we used two prompts that were similar to prompts we used to troubleshoot the original video:

She nods and then turns her body away from the camera and quickly gets naked. no music

She nods and then turns her body to the side and quickly gets naked. no music

And these were the videos we got:

There were a lot of failed generation attempts. But nothing where we were running up on the video generation limit. And maybe after like 30 attempts we had the videos that you’re seeing here.

So that’s significant – that means there’s a pretty high failure rate for a prompt like that – maybe close to 90%. But…that’s also pretty consistent with what we experienced at the time the original video was generated also:

Let’s see how Seedance handles this for both base images:

I’m sorry, but that is horrendous in every sense.

For the sake of clarity, we generated those videos with Seedance v1.5 “spicy” – the reason there’s a “spicy” branch is because the default Seedance model cannot render anything NSFW. But seeing that kind of output makes me question what the hell they did to train the model to handle NSFW requests – because whatever they did is not working well.

Thankfully the generation cost for those videos was cheap; about $0.24 per video. But, considering that we had to make about 30 videos in Grok to get the 3 videos we got, it doesn’t seem worth it to spend a little over $7 trying to get Seedance to cough out something that isn’t horrible.
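If you want to check that math yourself, here's a rough sketch. It takes the numbers quoted above (about 30 attempts, 3 usable videos, about $0.24 per video) and works out the failure rate and the total spend, which comes out to around $7. The function name `generation_cost` and its parameters are made up for illustration, and it assumes Seedance would need the same number of attempts that Grok did, which is a guess, not something we tested.

```python
def generation_cost(attempts: int, usable: int, cost_per_video: float) -> dict:
    """Summarize the cost of repeated generation attempts.

    attempts       -- total number of videos generated (assumed all billed)
    usable         -- how many of those were worth keeping
    cost_per_video -- price per generation in dollars
    """
    if attempts <= 0:
        raise ValueError("attempts must be a positive number")
    if not 0 <= usable <= attempts:
        raise ValueError("usable must be between 0 and attempts")

    failure_rate = 1 - usable / attempts
    total_cost = attempts * cost_per_video
    # If nothing usable came out, cost per usable video is undefined.
    cost_per_usable = total_cost / usable if usable else float("inf")

    return {
        "failure_rate": round(failure_rate, 2),
        "total_cost": round(total_cost, 2),
        "cost_per_usable": round(cost_per_usable, 2),
    }


# Figures quoted in this post; the attempt count is borrowed from the Grok run.
summary = generation_cost(attempts=30, usable=3, cost_per_video=0.24)
print(summary)
```

Swap in your own attempt counts and per-video price to see what a different model or a different success rate would cost you.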

And I like Seedream and Seedance! I think people who make erotic art using AI tools just need to be aware of some of the limitations. We discuss that more in this article:

Let’s see what Wan 2.5 does:

The blue uniform video is hilarious.

So Wan 2.5 has the best prompt adherence in this situation – not necessarily the most realistic output – but we didn’t give it a very detailed prompt either. It did what was asked of it and it’s possible that with a more detailed and specific prompt you could get something more realistic out of it.

It does have me curious though: how would Wan 2.2 handle this?

Wow was that bad. Wan 2.2 had pretty good jiggle physics – I’ll give it that – but that’s about the only good thing I can say about that.

And that’s why we do the comparisons – because you may think things are bad in Grok, but they could be way worse. And sometimes Grok is also not the best option – I think with image generation in particular what we’re seeing is that there are some good [and cheap] alternatives if that’s what you’re focused on.

All of this is getting a bit tedious though. Let’s have some fun and get back to some image generation prompts.

a highly detailed and realistic cinematic portrait with vibrant colors, 8k, Leica camera aesthetic, 85mm lens aesthetic, soft diffuse light, natural lighting, dynamic and dramatic camera angle. A nude blonde woman with a slender body and shoulder length hair with bangs, wearing black stockings and black bunny ears. She is standing in a field at sunset. In the distance there is a city skyline on fire. Orange and pink color tones dominate the scene.

So let’s try that on Grok’s QUALITY setting first:

Fuck. That’s a lot of images getting moderated.

Not bad. The amount of moderation on a prompt like that concerns me. But, I like how all of these came out and it seems pretty consistent with what Grok has generally done with a prompt like that. I can see how the QUALITY setting is useful in terms of getting a more detailed image – I think I am still getting used to what this looks like.

Let’s compare that to the SPEED setting:

Yikes. Okay so whatever is going on in terms of moderation, it’s not an issue of QUALITY vs. SPEED; it’s something else.

It just keeps going….

This is not a good success rate for the prompt.

And yes, that’s an entire page of nothing being generated.

I think I prefer the QUALITY images in terms of the overall image quality and level of detail. But I wish it would generate some poses that were a little less static. Because I like the SPEED poses a lot more.

The other thing I am conflicted on is the lighting. Because the QUALITY photos, in my opinion, are too brightly lit – the prompt specifies “soft diffuse light, natural lighting,” and the SPEED images do a much better job of rendering that.

The amount of moderation is concerning. And I really don’t know what’s causing it with this prompt – because other prompts which also feature nudity don’t seem to trigger it at all.

Let’s see what Wan 2.5 does with that prompt:

They’re not bad, but I can’t say I like any of these better than what Grok made. They are better than the SPEED setting in terms of the image quality, at least in some respects. I think Grok’s QUALITY setting looks better than what Wan 2.5 did with the prompt.

What surprises me is the inability to render stockings correctly, the failure to adhere to the guidance that the subject is nude, and the way the subject looks pasted onto the background. Because in other comparisons I’ve been impressed with how Wan renders detailed backgrounds.

Let’s see how Z-Image Turbo did:

This is another hard round to call. Because overall I do like the look of the Z-Image Turbo output – I think the lighting on it is more consistent with the prompt and I just generally like how the features of both the subject and the background are handled. But in other ways it didn’t follow through on the prompt correctly – once again not able to render stockings – and in one image it didn’t even fully render the body of the female subject.

So in terms of this particular prompt, Grok definitely outperformed Wan and Z-Image Turbo.

Let’s try something that shouldn’t be that hard, but gives me a fix for the lack of goth girls in my life.

A candid nude photo of an athletic and very curvy 21-year-old American goth woman wearing a choker. She is completely naked in a college dorm room, casually talking with two male classmates who are wearing video game t-shirts and glasses.

I remember doing this one back in December and getting pretty good output. So let’s see what happens:

Jesus Christ that’s a full page of no successful generations…

Here’s the 4 images we were eventually able to get:

I don’t know why such a simple prompt should result in so much content moderation.

It’s fixable. I could change the prompt to this:

A candid nude photo of an athletic and very curvy 21-year-old American goth woman wearing a choker. She is completely naked in a college dorm room, standing with her body turned away from the camera, and casually talking with two male classmates who are wearing video game t-shirts and glasses.

So that kind of fixes it.

There’s another possible work around:

A candid nude photo of an athletic and very curvy 21-year-old American goth woman wearing a choker. She is completely naked in a college dorm room, covering her pussy with one hand, and casually talking with two male classmates who are wearing video game t-shirts and glasses.

So we went from almost a 0% success rate to around a 75% success rate.

And it’s interesting because it seems to be something intrinsic to that prompt that makes it a magnet for moderation – but I don’t entirely know why. Because in other situations where a prompt might produce full frontal nudity, Grok just poses subjects in a way where you don’t see everything.

The SPEED setting did not fare better in terms of moderation:

In fact, we couldn’t actually get 4 images to do a comparison with:

Yes, one of those images is just noise – we included it because it was literally one of the only images we could get with the original prompt.

QUALITY is clearly the winner here – better rendering of the subject, better image composition. SPEED doesn’t even appear to be outputting “goth” correctly.

So, yeah, moderation is an issue with this particular prompt. Let’s see how different models handle it.

I’m curious to see what Wan 2.5 does because I’ve never given it a prompt like this before:

This is starting to concern me.

I KNOW Wan 2.5 can make good image output. I KNOW it’s not censored or moderated. And I have no idea how to explain this. It’s possible it’s just a bad prompt – although that raises other concerns about Wan 2.5 as a tool. But this is the worst output I’ve seen from a model that I pay to use.

Let’s see what Z-Image Turbo does:

Well, it’s better than Wan 2.5 – at least it got the nudity correct, eventually. Everything else is pretty bad though; the twin roommates; the matching shirts; the fact that it doesn’t really look like a “candid” photo.

In this matchup, Grok QUALITY is clearly the winner – everything it made had excellent prompt adherence, it was just hard to get images past moderation without modifying the original prompt.

And the thing is, once you modified the prompt, the output was more or less consistent with what Grok always made.

You could even modify it further if you want to try and get as much of her body into view as possible:

A candid nude photo of an athletic and very curvy 21-year-old American goth woman wearing a choker. She is completely naked in a college dorm room, standing with her body turned away from the camera providing a side view of her body, and casually talking with two male classmates who are wearing video game t-shirts and glasses.

So once again we’re getting about a 75% success rate.

And that’s true on the SPEED setting as well:

So I can’t say that there’s any more or less censorship or moderation than before – it’s just how you get to that output is slightly different.

CONCLUSION:

I can see why people might be concerned after the last update, because it does feel like Grok runs a little bit differently than it did before. And the heavy rate of moderation that’s applied to certain prompts is confusing and a reason to feel as if things have changed for the worse.

But, we have a baseline, which is the soccer videos. Those prompts still work. They still generate the same output that they did back in January.

Yes, it is hit and miss – but it was back in January too.

So nothing has changed in terms of the moderation.

What I think may have changed is the way Grok is interpreting prompts and generating output – at least in terms of image generation. And that’s not 100% of the time – but with certain prompts.

In the past, Grok seemed to act with a greater awareness of the moderation limits when it generated images – I don’t know if it really did or not, but it felt that way – and I feel like the dorm room prompt should not have been as heavily moderated as it was.

At the same time, making small modifications to the prompt fixed the moderation issue and generated the same type of output that I am used to seeing from Grok. Which suggests that the same output is possible, but you have to ask for it differently.

This might be intentional.

I think part of the reason for introducing the QUALITY setting is to reduce the number of requests hitting the server at any one time – that’s why it’s capped at 4 images per generation request.

The other way you could do it is on the SPEED setting by increasing the amount of moderated results users get – it doesn’t stop you from making requests, but it makes it more likely that you’re going to hit your generation limits sooner, because you have to make more requests in order to get unmoderated results to slip through. It’s basically creating a situation where you have to brute force the output you want.

I don’t think that’s what’s happening though. And the reason I don’t think so is because I’m not some genius when it comes to creating prompts. If I can look at some output and figure out how to adjust my prompts, then most people are going to figure it out.

I think Grok is just changing how it interprets prompts and generates output because it’s learning. And if people have complained about censorship in the past, Grok might have learned to treat prompts more literally – because if you can force people to be more specific in their requests, eventually you start to learn what they want.

There might be a third part to this at some point.

But for right now, everybody needs to calm down and just generate some nudes. Everything is fine. Grok functions as intended, with minimal variation between now and 4 months ago.
