While it is impressive and I like to follow the advancements in this field, it is incredibly frustrating to listen to. I can't put my finger on why exactly. It's definitely closer to human-sounding, but the uncanny valley is so deep here that I find myself thinking "I just want the point, not the fake personality that is coming with it". I can't make it through a 30s demo.
We're used to hearing some kind of identity behind voices -- we unconsciously sense clusters of vocabulary, intonation patterns, tics, frequent interruption vs. quiet patience, silence tolerance, response patterns to various triggers, etc., that communicate a coherent person of some kind.
We may not know that a given speaker is a GenX Methodist from Wisconsin who grew up at skate parks in the suburbs, but we hear clusters of speech behavior that let our brain go "yeah, I'm used to things fitting together in this way sometimes".
These don't have that.
Instead, they seem to mostly smudge together behaviors that are just generally common in aggregate across the training data. The speakers all voice interrupting acknowledgements eagerly, they all use bright and enunciated podcaster tone, they all draw on similar word choice, etc -- they distinguish gender and each have a stable overall vocal tone, but no identity.
I don't doubt that this'll improve quickly though, by training specific "AI celebrity" voices narrowed to sound more coherent, natural, identifiable, and consistent. (And then, probably, leasing out those voices for $$$.)
As a tech demo for "render some vague sense of life behind this generated dialog" this is pretty good, though.
To be fair, the majority of podcasts are from a group of generic white guys, and they sound almost identical to these AI-generated ones. The AI actually seems to do a better job, too.
Citation absolutely needed. You call this fair?
> the majority of podcasts are from a group of generic white guys
https://podcastcharts.byspotify.com/ keep the Pareto distribution in mind
I did the best fast research I could, given that I didn't want to spend more than 20 minutes on it, and came to this result (approx.):
- Mixed/Diverse: 48.0%
- White Men: 35.0%
- Women: 8.0%
- Non-White: 6.0%
- White Woman: 2.0%
- Non-White Woman: 1.0%
Whether this stops at the uncanny valley or progresses to specific "AI celebrity" voices, I'm left thinking the engineers involved in this never stopped to think carefully about whether this ought to be done in the first place.
"Surely my genAI product won't be used to spam zero-effort slop all over the internet!"
- guy whose genAI product will definitely be used to spam zero-effort slop all over the internet.
I think their main target is corporate creative jobs: background music for ads, videos, etc. And just like with all AI, it will eat the jobs that support the rest of the system, making it a one-and-done. It will give a one-time boost and then be stuck at that level, because creatives won't have the jobs that allowed them to add to the domain: new music styles, new techniques. It's literally eating the seed corn, where the sprouts are the creatives working the boring commercial jobs that let them practice, become experts in the tools, and eventually build the whole field up. The goal is to cut the very jobs that create the training data and the ecosystem that builds up and expands the domain. Everywhere AI touches will basically be 'stuck using Cobol', because AI will be frozen at the point in time where the energy-infusing 'sprouts' all had their jobs replaced by AI; without them creating new output for AI to train on, it all ossifies.
We are witnessing in real time the answer to why 'The Matrix' was set when it was. Once AI takes over there is no future culture.
Assuming you are right that we will miss a generation of creatives and AI keeps making crap, why can't the creative field regrow? AI won't remove creativity from human genes.
As people get fed up with AI-generated crap, companies will start to pay very good money to the few remaining good human creatives in order to differentiate themselves. The field will then be seen as desirable, people will start working hard to get these jobs, companies will take apprentices hoping they will become masters later, etc. We may lose a generation, but certainly not the entire future.
Of course, it is just one of many possible futures, but I think it's the most likely one if you take your assumptions as a postulate. It may turn out that AIs don't displace creative jobs much at all, or, going the other way, that AIs end up being truly creative, building their own culture together with humans. Or not.
> It's literally eating the seed corn where the sprouts are the creatives working in the boring commercial jobs that allow them to practice/become experts in the tools/etc that they then build up it all.
This is a big problem that needs to be talked about more; the end goal of AI seems quite grim for jobs and for humans generally. Where will all this pure profit lead? If all advertising is generated, who will want to have anything to do with the products being advertised?
This is very spot on. There are tons of artists who have a day job so they can sustain their own personal creativity.
Agreed. To me it sounds like bad voice-over actors reading from a script. So the natural parts of a conversation where you might say the wrong thing and step back to correct yourself are all gone. Impressive for sure.
Every step of technological advancement builds on top of the previous one.
Now it's bad voice actors; in two years it'll be great ones.
Totally agree. Maybe it’s just the clips they chose, but it feels overfit on the weird conversational elements that make it impressive? Like the “oh yeahs” from the other person when someone is speaking. It is cool to see that natural flow in a conversation generated by a model, but there’s waaaay too much of it in these examples to sound natural.
And I say all that completely slackjawed that this is possible.
I'd love to see stats on disfluency rate in conversation, podcasts, and this sample to get an idea of where it lies. It seems like they could have cranked it up, but there's also the chance that it's just the frequency illusion because we were primed to pay attention to it.
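For what it's worth, a crude way to put a number on it, given a transcript: count filled pauses per 100 words. The filler word list and the normalization below are my own ad-hoc choices, not a standard linguistics metric:

```python
import re

# Illustrative set of filled-pause markers; real disfluency studies
# distinguish many more categories (repairs, repetitions, restarts).
FILLERS = {"um", "uh", "er", "ah", "hmm", "mm"}

def fillers_per_100_words(transcript: str) -> float:
    """Count filler words per 100 words of transcript text."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    return 100 * sum(w in FILLERS for w in words) / len(words)

# Comparing this number across real conversation, podcast transcripts,
# and the demo clips would show whether they really "cranked it up".
print(fillers_per_100_words("So, um, I think, uh, the demo is, um, impressive"))  # → 30.0
```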
I love the technology, but I really don't want AI to sound like this.
Imagine being stuck on a call with this.
> "Hey, so like, is there anything I can help you with today?"
> "Talk to a person."
> "Oh wow, right. (chuckle) You got it. Well, before I connect you, can you maybe tell me a little bit more about what problem you're having? For example, maybe it's something to do with..."
That's how the DJ feature of Spotify talks and it's pretty jarring.
"How's it going. We're gonna start by taking you back to your 2022 favorites, starting with the sweet sounds of XYZ". There's very little you can tweak about it, the suggestions kinda suck, but you're getting a fake friend to introduce them to you. Yay, I guess..
Reminds me of the robots from the Sirius cybernetics corporation. “Your plastic pal who’s fun to be with.”
> Like the “oh yeahs” from the other person when someone is speaking.
I bet that if you select a British accent you will get fewer of them.
I'm hoping it will be a lot of Ok Guv'ner and right you ares in the style of Dick Van Dyke.
> a British accent
Hmm.... Scottish, Welsh, Irish (Nor'n) or English? If English, North or South? If North, which city? Brummie? Scouse? If South, London? Cockney or Multicultural London English [0]?
[0] https://en.wikipedia.org/wiki/Multicultural_London_English
Need to increase your granularity a bit. I live in Wexford Town, Ireland, and the other day I was chatting to a person who told me their old schoolmates from Castlebridge are making fun of how their accent has changed since moving from their hometown.
Castlebridge is 10 minutes away by car. Madness!
Yeah, totally agree. Here's a useful link for non-Brits, that goes into a bit more detail:
https://accentbiasbritain.org/accents-in-britain/
Also, we have yet to precisely define what is meant by 'British'. This probably needs a "20 falsehoods people believe about..."-type article.
When people outside the British isles (esp. Americans) say "British accent", they almost invariably mean (British) English, and usually the "received pronunciation" accent that British media generally uses.
They do not mean Irish or Scottish accents; if they did, they would have said exactly that, because those accents are quite different from standard (British) English accents. So different, in fact, that even Americans can readily tell the difference, when they frequently have some trouble telling English and Australian accents apart.
Also, to most English speakers, "English accent" doesn't make much sense, because "English" is the language. It sounds like saying a German speaker, speaking German, has a "German accent". Saying "British accent" differentiates the language (English, spoken by people worldwide) from the accent (which refers to one part of one country that uses that language).
Gor blimey lad, that's the problem now innit???
Right mate
Cheeky bugger, you are
ee by gum
When I got to the bit where they referred to the smaller training set of paid voice actors, that hit it for me. It certainly sounds like they are throwing the 'um's and 'ah's into a script - not naturally.
This is good, but certainly not yet great.
It's like their training set was made up entirely of awkward podcaster banter.
At least 83% Leo Laporte.
If I turn the volume down to the point that I only hear the cadence/rhythm of the voices, but can no longer make out the words, it sounds like any, “This Week in…” podcast.
Agreed. To be fair, I also get annoyed by fake/exaggerated expression from human podcasters.
That could just be the context though. Listening to a clip that's a demo of what the model can produce is very different to listening to a YouTube video that's using the model to generate speech about something you'd actually want to watch a video of.
Probably because you're expecting it and looking at a demo page. Put these voices behind a real video or advertisement and I would imagine most people wouldn't be able to tell that it's AI generated at all.
It'd be annoying to me whether it was AI or human. The faux-excitement and pseudo-bonhomie is grating. They should focus on how people actually talk, not on copying the vocal intonation of coked-up public radio presenters just back from a positive affirmation seminar.
It sounds like every sentence is an ad read.
Yeah... It isn't that it doesn't sound like human speech... it just sounds like how humans speak when they are uncomfortable, or are reading prepared remarks and aren't good at it.
> Example of a multi-speaker dialogue generated by NotebookLM Audio Overview, based on a few potato-related documents.
Listening to this at 1.75x speed is excellent. I think the generated speaking speed is kept slow for audio quality, because it'd be much harder to slow down the generated audio while retaining quality than to speed it up.
I suppose it doesn't matter if it is a human, or a bot delivering the message, if the message is boring
While it is impressive and I like to follow the advancements in this field...
Please don't think that I'm trying to suggest... anything. It's just that I'm getting used to reading this pattern in the output of LLMs: "While this and that is great...". Maybe we're mimicking them now? I catch myself using these disclaimers even in spoken language.
I like to preface negativity with a positive note. Maybe I am influenced in my word choice but my intent was to point out that this is a very, very impressive feat and I don't want to undermine it.
It's probably because it was trained on "professional audio" -- ads, movies, audiobooks -- and not on "normal people talking". Like the effect when diffusion models were mostly trained on stock photos.
they all sound like valley-people, complete with the raspy voice and everything
I get the feeling that this is useful for something that someone half-listens to.
It's due to the histrionic mental epidemic that we are going through.
A lot of people are just like that IRL.
They cannot just say "the food was fine", it's usually some crap like "What on earth! These are the best cheese sticks I've had IN MY EN TI R E LIFE!".
“I’m OBSESSED with the dipping sauce. So good.”
Try it out in the demo https://cloud.google.com/text-to-speech/?hl=en and in the API https://cloud.google.com/text-to-speech/docs/create-dialogue...
If I change the language in the demo, it removes all my text and replaces it with a template text. That's bad.
I think I put my finger on exactly why it sounds a bit uncanny-valley: it sounds like humans who are reading from a prepared 'bit' or 'script'.
We've all been on those webinars where it's clear -- despite the infusions (on cue) of "enthusiasm" from the speaker attempting to make it sound more natural and off-the-cuff -- that they are reading from a script.
It's a difficult-to-mask phenomenon for humans.
That all said, I actually have more grace for an AI sounding like this than I do for a human presenter reading from a script. Like, if I'm here "live" and paying attention to what you're saying, at least do me the service of truly being "here" with me and authentically communicating vs. simply reading something.
If you're going to simply read something, then just send it to me to read too - don't pretend it's a spontaneously synchronous communication.
But what's the end goal and audience here? I don't believe people will resonate with robots making "um"s and "oh"s, because people usually resonate with an artist, a producer, a writer, a singer, etc. A human layer people can empathize with is essential. This can only work as long as people are deceived and don't know there is no human behind it. If, however, I find out that a video is AI-generated, I instantly lose interest in it. There are, e.g., a lot of AI-generated architecture videos on YouTube at the moment; I have never wanted to listen to one, because I know the emotions will be fake.
It looks like a lot of progress has been made lately in audio generation / audio understanding (everything related to speech, I mean).
Is this related to LLM, or is this a completely different branch of AI, and is it just a coincidence? I am curious.
It's very related to LLMs, except that instead of text tokens you are working with audio tokens (e.g. from SoundStream), and you train on an audio corpus instead of a text corpus.
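To make that concrete, here's a toy sketch. Everything below is illustrative: a real codec like SoundStream uses learned residual vector quantization, not the mean-amplitude bucketing used here.

```python
# Stand-in for a neural audio codec: map each short frame of samples in
# [-1, 1] to one of k discrete "audio token" IDs. However the codec works,
# the language model downstream just sees a sequence of integers,
# exactly as it would see text token IDs.
def toy_audio_tokenizer(samples: list[float], frame: int = 4, k: int = 256) -> list[int]:
    tokens = []
    for i in range(0, len(samples) - frame + 1, frame):
        mean = sum(samples[i:i + frame]) / frame       # crude per-frame feature
        level = int((mean + 1.0) / 2.0 * (k - 1))      # bucket [-1, 1] into k levels
        tokens.append(min(max(level, 0), k - 1))
    return tokens

# Training is then ordinary next-token prediction over these IDs,
# with an audio corpus standing in for a text corpus:
tokens = toy_audio_tokenizer([0.0, 0.5, -0.5, 1.0, -1.0, 0.25, 0.75, -0.25])
inputs, targets = tokens[:-1], tokens[1:]  # same objective as a text LLM
```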
Is there a free (ad supported?) online tool without login that reads text that you paste into it?
I often would like to listen to a blog post instead of reading it, but haven't found an easy, quick solution yet.
I tried piping text through OpenAI's tts-1-hd model, and it is the first one I ever found that is human-like enough for me to enjoy listening to. So I could write a tool for my own use case that pipes the text to tts-1-hd and plays the audio. But maybe there is already something with a public web interface out there?
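A minimal sketch of such a tool, assuming the openai Python package, an OPENAI_API_KEY in the environment, and the 4096-character input limit the speech endpoint documents; the helper names and the "alloy" voice choice are just illustrative:

```python
def chunk_text(text: str, limit: int = 4096) -> list[str]:
    """Split text at paragraph boundaries so each chunk fits the TTS input limit.

    Note: a single paragraph longer than `limit` is passed through unsplit.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > limit:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def read_aloud(text: str, out_prefix: str = "post") -> None:
    from openai import OpenAI  # pip install openai
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    for i, chunk in enumerate(chunk_text(text)):
        resp = client.audio.speech.create(model="tts-1-hd", voice="alloy", input=chunk)
        resp.write_to_file(f"{out_prefix}-{i:03d}.mp3")
```

Playing or concatenating the resulting files is left to whatever player you prefer; verify the current model and voice names against OpenAI's docs before relying on this.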
Both Windows and macOS (the operating systems) have this built in under accessibility. It's worth a try; I use it sometimes when I want to read something while cooking.
I use MS Edge for this exact use case. Works well enough on any platform.
Firefox does this directly: Reader mode has a headphones symbol to read the webpage text aloud.
There is on iOS. No ads. "Reader" by Eleven Labs. I haven't used it that much, but I have listened to some white papers and blogs (some of which were like 45 minutes) and it "just worked". It even lets you click the text you want to jump to.
And it's Eleven Labs quality, which, unless I've fallen behind the times, is the highest-quality TTS by a margin.
There's also the built-in "Speak Selection" feature you can enable in the accessibility settings.
Reader is on a pretty good path to a monthly subscription model. Great audio quality, large selection of voices, and support for long-form input text.
Good old Microsoft Sam? It'll sound like Stephen Hawking is reading it to you!
We've been using this at work to get inside of our customer's perspective. It's helpful to throw eg a bunch of point-of-sale data sync challenges into Notebook LM and eg pass a 10 minute audio to the team so they can understand where our work fits in.
I’ve cut and pasted weeks of Slack conversations into NotebookLM and it was quite entertaining to then listen to a Podcast talking humorously about all the arguments in the #management channel.
The voices are impressive (I can't tell the difference as a non native speaker) but their "personality" sounds extremely annoying lmao
I know. Can they do anything other than obnoxious Californian? The vocal fry is off the charts.
ah.. so "frontier" is the new buzzword that keeps the corporate board invested in this dead end?
frontier garbage.
> This means it generates audio over 40-times faster than real time.
Astounding
Is this another fake like the Google bot that made reservations at a restaurant?
YouTube videos are already infested with insufferable AI elevator background "music". Even some channels that were previously good are using it.
On the bright side, you can stop watching these channels and have more time for serious things.
> AI elevator background "music".
What are some examples? I haven't encountered this.
Just search 'jazz' on YouTube.
Almost all of the results will not consist of 'jazz' in any real sense, but instead a collection of uncanny melodies and chord progressions that wander around going nowhere, traditionally accompanied by an obscenely eye-offending, diffusion-model-generated mishmash of seasonal tropes and incongruent interior design choices. Often it's MIDI bossa nova, presumably written by either a machine or someone who's only ever heard a few bars of music at a time and has no idea that 'feel' or 'soul' are a thing.
Can you post a link to some like that?
Because when I search "jazz" on YT I'm just getting legit music videos and jazz playlists -- stuff like Norah Jones, top 100 jazz classics playlists, etc.
But I assume that search results are personalized.
Lucky you!
Sure. I just tried in private browsing mode, and got mostly the same. Here are a few of the very first results I get for 'jazz':
https://www.youtube.com/watch?v=xhL3Cb740VY
https://www.youtube.com/watch?v=8UXFapv_kFI
https://www.youtube.com/watch?v=nKNnzbi-v9E
https://www.youtube.com/watch?v=ABmQvH5K75w
https://www.youtube.com/watch?v=-jgEswq9ZlI
Some are worse than others.
That's wild. Thanks for the links. Infinite AI Muzak I guess? I suppose it was inevitable.
Seems to be genuinely popular. I can see why, but when there's so much 'real music' out there, why not just listen to that and enrich yourself instead of bathing in fake nonsense? If you want jazz, just put on a jazz album — hell, even Kind of Blue will do.
I'm not sure all (or any) of it actually is AI though. I assume that's coming very soon, but I suspect this stuff is cynically and methodically hand-composed.
By the way: I have nothing against generative composition! Brian Eno has been doing this stuff longer than anyone else, and it's very cool. I'm sure you could make some 'generative jazz' that's actually distinctive and artistic, but this isn't it.
To paraphrase the great Bertram Gilfoyle, computers don't need to produce fake vocal tics.