DeepSunk Costs

Craig Dalzell

I don’t know how closely you’ve been following the developments in Generative Artificial Intelligence (GAI) lately, but it’s been The Next Big Thing in the tech sector for the past few years. Even if you’ve gone out of your way to try to avoid it, it’s being crammed into your apps just as soon as the companies behind them can update them. You’ll have noticed your internet search engines adding “AI summaries” instead of giving you links to the websites you want. The call centres you’re trying to navigate have replaced overworked and underpaid scut-workers with AI chatbots that, on a good day, eventually pass you through to one of the remaining scut-workers and, on a bad day, break in ways that would be hilarious if not for the fact that these bots are being pushed into mission-critical roles too. You might even have noticed the news that Meta wanted to introduce GAI bots that would set up fake profile pages, publish fake posts and then respond to them with fake comments from other bots – all in the name of harvesting ad revenue. That plan lasted less than a week, but it will be back as soon as we’re distracted by something else.

Paradigm Shift

The big environmental problem (I’ll get back to the big social problems with these bots in a bit) is that the process of creating one of these GAIs involves a huge amount of both data and computational power. Data on the order not just of “everything on Wikipedia” but more like “every single written word humanity has ever produced, and even then we could eat more”. And computational power on the order of 30,000 graphics cards to train the current crop of bots and over a million GPU cards to train the next generation. The power demand of running all of those chips is so huge that some of these companies are seriously considering spending billions to buy entire nuclear power plants to run them.

That is, until last week, when the entire business model of these companies went off a cliff. Chinese company DeepSeek released its first GAI model, which promises similar performance to the current best bots but was purportedly trained on just a few thousand chips and for a fraction of the cost. The worst part of this – from the techbro point of view – isn’t even that DeepSeek likely has data security implications even worse than TikTok, or that the programme already shows signs of having a Government-approved, sanitised view of the world, but that the model itself was released open source, which kicked the legs out from under any possibility of the techbros profiting from their own AIs. It is already the most downloaded app on the Apple store.

The market has... reacted. Over $500 billion was wiped from the value of Nvidia – the company with a near monopoly on the graphics cards companies use to train their bots – and similar amounts were lost throughout the techbro sector. Energy company stocks and the price of oil fell too, as energy companies saw their massive future bills slide away. There were geopolitical impacts as well, with the US – host of most of the techbros – spurred even faster towards its Trumpist trade war with China.

From what I can see at this still-early stage, there will be two main implications. One is that AI training will simply get more efficient and the tech companies will now try to do ten times the computing with their million chips. The other is that the business model behind these AIs may have been fatally holed below the waterline. Part of the reason for DeepSeek’s efficiency appears to be that it trained itself on the output of the other AIs – essentially copying their work (a rough sketch of the general technique is below). If it becomes easy or even trivial to copy a billion-dollar AI and release it for free, then whoever owns the billion-dollar version faces some serious disincentives to even trying. If only there was some legal protection that provided rights to these companies to prevent such copying. Some kind of copy right.
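For the curious, the “copying” described above is generally known as knowledge distillation: a smaller “student” model is trained to match the answers of an existing “teacher” model rather than learning everything from raw data. The snippet below is a minimal, hypothetical sketch of that idea in PyTorch – it is not DeepSeek’s actual training code, and the tiny stand-in models and random data are purely illustrative.

```python
# Minimal knowledge-distillation sketch (hypothetical models and data).
# A small "student" learns to imitate the output distribution of a big "teacher".
import torch
import torch.nn.functional as F

teacher = torch.nn.Linear(128, 10)   # stand-in for a large, expensive pre-trained model
student = torch.nn.Linear(128, 10)   # stand-in for a much smaller, cheaper model
optimiser = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0                     # softens the teacher's probabilities

for step in range(1000):
    x = torch.randn(32, 128)          # in practice: batches of real text or images
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    # KL divergence pulls the student's predictions towards the teacher's
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```

The point is less the code than the economics: the student never needs the teacher’s training data or its million GPUs – only its answers.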

Copying Our Rights

The social problem with the training of these AIs is that they soak up such vast amounts of data (again, on the order of “everything ever produced by every human”) that they eventually run out of material to eat, and in particular they run out of material that they have PERMISSION to access, especially if they don’t want to limit themselves to public domain works whose authors died 70+ years ago. At least two companies – OpenAI and Meta – have openly said that they have stolen the copyrighted works of creators and fed them into their AI training models. Others almost certainly have as well.

In fact, I can point at the header image to this article as evidence. I generated it with a locally installed copy of Stability AI’s Stable Diffusion (i.e. I used my own PC to generate the image rather than relying on a server elsewhere as many AI apps do – training an AI to generate images is extremely expensive, but running a trained one isn’t much different from running a graphically intensive process like a modern video game; a rough sketch of that local workflow appears at the end of this passage). Notice the little black bar in the bottom left. I can’t decipher the thing that looks a bit like an artist’s name (GAI image generators tend to be poor at generating text) but the mark on the very left looks very much like a copyright mark (see the blown-up version below). Others have noted similar phenomena, as if this particular AI was fed the watermarked versions of entire commercial photograph databases. I have no idea if the image I generated is a direct rip-off of a particular living artist (indeed, my “prompt” specifically asked for an image “in the style of” an artist I knew to be dead long enough for their works to be public domain), but the AI has evidently decided that because so many of the images it has “seen” before included a copyright notice in the bottom left corner, so should the image it served to me.

This is a serious problem for creatives. Not just images, but music and words are being fed into the maws of these AI models. It’s even an existential risk for organisations like ours. Take, for example, those search engine summaries. Let’s say you ask your search engine “What does Common Weal think about rent controls?”. A traditional search engine might serve you a link to one of our policy papers on the topic. Maybe you like the paper so much that you decide you should support us to create more like it. However, an AI search engine replacement might have eaten that paper and, instead of giving you a link, will just regurgitate a summary of it. Maybe it’ll “hallucinate” and you’ll get a very wrong idea about our thoughts on rent controls. Or maybe it’ll be accurate, but it still only gives you a summary and not a link back to the original. Either way, you don’t then see our other papers or our newsletter or, crucially, our donate button. In that way, the AI steals not just our work but our future revenue stream. If the AI served you an advert along with the summary, the company may even have profited from it.

So what is the UK Government planning to do about AI companies who have profited from this unprecedented scale of theft of humanity’s creative endeavour? According to their active consultation on the topic – they’re planning to legalise it. The UK Government is currently running a public consultation (deadline 25th February) asking for views on its proposal to get rid of the problem of AIs training on copyrighted material by simply making it legal for them to do so.
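For anyone wondering what “locally installed” means in practice, the sketch below shows roughly how an image is generated on your own graphics card using the open-source Hugging Face diffusers library. It is a generic illustration rather than the exact setup or prompt used for the header image, and the model name shown is simply one publicly released Stability AI checkpoint.

```python
# Rough sketch of local image generation with an open Stable Diffusion checkpoint.
# Assumes the Hugging Face `diffusers` library and an ordinary consumer GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",   # one publicly released checkpoint
    torch_dtype=torch.float16,            # half precision to fit consumer VRAM
)
pipe = pipe.to("cuda")                    # everything runs on the local graphics card

# The prompt here is illustrative, not the one used for the article's header image.
image = pipe("a coastal town at dusk, in the style of a long-dead painter").images[0]
image.save("generated.png")
```

Training the model behind that checkpoint took a data centre’s worth of GPUs; running it takes seconds on the kind of card sold for gaming – which is exactly the point about how cheaply a trained model can be put to work.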
Under their scheme, if you create something (an artistic image, some music, an opinion piece for a think tank) and claim copyright on it, and an AI developer wants to take that work to feed their AI, then you will have the right to approach the company and negotiate a licence agreement. The small flaw in this plan is that if they just take the work without asking (as they currently do) then you need to find out that they have (not all developers disclose the sources of their data, though I have found Common Weal material in the datasets of at least a couple of the ones who have) and then you, a poor individual (or not-for-profit think tank), have to negotiate with a billionaire’s megacorporation. This proposal clearly favours the largest media companies (it’s relatively easy for Disney or Bloomberg to slap up a “no AI scraping” notice and then enforce it) but even in the best case scenario, individual creators get screwed. We get doubly screwed if the AI developers are also our social media platforms – Meta, X and others have already quietly changed their user agreements to say that using their platform IS permission to harvest your work. We get screwed if someone else shares our work to one of those platforms (Meta has already trained its AI on a database of pirated books). There are also no regulations in the proposal for how much a “fair” licence should be – ask any musician on Spotify how well that works for them.

We’ll try to get our response to the consultation published before the deadline, but I would like the UK Government to stop pandering to these billionaires and simply enforce copyright law when they break it on a mass scale. I’m old enough to remember the early 2000s, when music pirates who uploaded copyrighted tracks at a commercial scale (even the largest of those sites will be smaller than these AI databases) were sued for sums of around $10,000 per file. That might be a good start for those companies who have already breached copyright. We also need a commitment from the companies that they will actually respect copyright notices in full (the consultation asks if it’s enough for artists to include a robots.txt file on their website to ward off scrapers – an example of such a file appears at the end of this passage – which of course it isn’t) and that they publish not just a full list of all of their data sources with creators, but a copy of the licence that gives them permission to use each work (this could be as complicated as a bespoke contract or as simple as noting that the work is released under Creative Commons Attribution and can therefore be used and manipulated even for commercial purposes).

It also raises the question of how creators share their own work at all. Common Weal has always been pretty permissive about our work being used by others because our core principle of “All of Us First” means that we think our work should be available to all equally (this is why we don’t use subscription paywalls on our newsletter or promise perks like early access for donors – the absolute closest we have to any of that is an occasional email that goes out to donors but not non-donating subscribers to thank them for their support, and even that never contains a link to anything that no-one else could see). But it was never our intent that our work should be fed to an AI and regurgitated in a way that would actively harm us and other creatives. So do we need to pull our work under a more restrictive copyright, or even just make a statement denying permission for AI training?
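For reference, this is roughly what such a notice looks like. The crawler names below are the published identifiers of OpenAI’s and Common Crawl’s scrapers; the site itself is hypothetical, and – as noted above – nothing currently compels any scraper to honour a word of it.

```
# Hypothetical robots.txt placed at the root of a creator's website.
# Each entry asks a named crawler not to fetch any page; compliance is entirely voluntary.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```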
I’ve already changed the copyright statement on my personal blog (where my work is licensed CC-BY-NC) to say that I do not grant permission even for non-commercial AI training, but under the current system I know I have no practical way to enforce my rights, and under the Government’s proposals those rights would be weakened even further.

The only note of solace in all of this is that DeepSeek has not just challenged the AI companies and potentially popped their investment bubble; because it appears to have breached their rights too by copying their pre-trained models, it has suddenly got the other AI companies interested in copyright law. But unless the Government decides that the rights of All of Us are worth just as much as the rights of half a dozen multi-billionaires, I can see us all getting screwed again anyway. The UK Labour Government must decide who they’re for. The millions of underpaid, unsung and exploited creatives who enrich the collective soul of humanity with their work? Or a few billionaires who are trying to legally steal that work to enrich themselves?
