> In 2021, the Canadian agency ISED (Innovation, Science and Economic Development Canada) recommended three approaches to the question:
> 1. Ownership belongs to the person who arranged for the work to be created.
> 2. Ownership and copyright are only applicable to works produced by humans, and thus, the resultant code would not be eligible for copyright protection.
> 3. A new "authorless" set of rights should be created for AI-generated works.
It seems obvious to me that the answer should be #1. An artist who creates works out of random paint splatters (modern art!) didn't purposefully choose the locations of their paint marks. However, they still own the copyright because they arranged for the creation of the work. You would never argue that a piece like this is uncopyrightable.
Using that analogy, you are stochastically sampling from other people's splat mark art, and guiding it towards something that resembles the prior work.
I'd like to take a moment to emphasize that whether something is copyrightable and whether it is infringing are actually separate considerations. Both are social constructs the same as which side of the road one drives on, and it makes sense for us to define them in some way that is "fair", whatever that means.
Copyright implies creativity. The creator owns the copyright, except for works made for hire. But the thing has to be created. An AI doesn't have creativity, so it can't own a copyright. You could argue that the person who created the prompt owns the copyright. And you may have a valid point, depending on how much creativity is required.
TBH I do not see "prompts" as copyrightable. For one, they're incredibly short (e.g. most image generators will output something for "a green field" - who owns that prompt?). For another, they're incredibly reliant on pretty much everything else around the system: the same prompt can produce wildly different results based on different algorithms, models, weights, etc. And let's not forget that not everything uses a "prompt".
Prompts are really just a UI (with a bad UX IMO, it just happens to be impressive at a first glance to "talk with the computer"), trying to copyright a prompt is akin to trying to copyright a series of clicks and keypresses in an image editor to produce some effect (and just like a "prompt", you'd easily get different results depending on where you apply that effect, what program/version you are using, etc).
> trying to copyright a prompt is akin to trying to copyright a series of clicks and keypresses in an image editor to produce some effect
But, isn't that copyrightable? Like, if I draw a pixel art sprite, that's just a series of clicks.
No, the copyrightable part is the pixel art sprite, not the way you made it.
...I don't see how this is different from saying "the copyrightable part is the AI-generated image, not the way you made it".
You might argue "well you didn't make the AI image", but if not for me doing something it wouldn't exist, so we're back to discussing process.
I already explained why it is different in my original post: the prompts are UI (and IMO a bad one at that). UI interactions are not copyrightable - you could copyright some sort of program/script akin to automation, but what would be copyrightable is the program/script itself, not the actions it takes.
You doing "something" doesn't mean that "something" is copyrightable - or that it should be (if anything, less things should be).
I don't mean to go in circles, but I think that applies to pixel art! You made the pixel art via a series of UI interactions, nothing more and nothing less.
You said that for pixel art "the copyrightable part is the pixel art sprite," but then couldn't I say the copyrightable part of an AI-generated image is the generated image? Again, both of these creations were made via UI interactions!
If UI interactions aren't copyrightable, then practically no digital art is copyrightable, because all of it can be recreated by a series of UI interactions. I'm not understanding how a digital drawing and an AI drawing are different under your framework.
> The creator owns the copyright, except for works made for hire.
The exception is actually narrower than it sounds, too. Work for hire requires that the work be made as part of employment, or that the work was specially commissioned, falls into one of a handful of statutory categories, and there was a written agreement in place that it is considered a work for hire.
So if you commission a piece of art for yourself, the owner of the copyright is the artist. Also, my understanding is that while the artist can transfer the copyright, they can later terminate that transfer and take it back (in the US, after a statutory waiting period).
[deleted]
1) is an utter minefield, though. If you create an artwork using Adobe Photoshop, you have the copyright. If you use Minecraft, apparently Microsoft own it. So where does the prompt engineer lie? No-one knows.
Microsoft owns the copyright to everything created in Minecraft?! That's news to me. Is it in the Terms of Service or something?
I would very much hope that wouldn't hold up in court.
It’s mentioned in the article, there’s case law.
I read the article again, what am I missing? I'm sorry, I know I must be skipping over something somehow but I don't know what!
They completely forgot a 4th option: it belongs to the people who have copyright on the training data.
If I take someone's book, painstakingly replace all words with their synonyms and publish it as mine, I am getting sued. Same if I do some other mechanical transformation meant to obscure the true original work.
There's no AI, there's just statistical models of existing work.
If you take a bunch of books, read them, then write your own story with hints from each of the books, who owns the copyright? Yes, if you take a single work and replace every word with a synonym, that would be infringement. But that isn't what an AI does.
Let me reiterate, there is no AI, there are only statistical models. It doesn't matter if you make the math too complex for a human to make sense of or not, it's just a remix of existing copyrighted material.
Your argument forgets why copyright even exists: it's not a law of nature, it's a rule created to protect people who invest time and effort into creating. AI, even if it existed, would not be a person; it would be code created by a person (or more likely a corporation) to serve its goals.
I disagree, it does matter if the math is indistinguishable from human transformative work.
What matters is the result, not the process.
It’s hard to swallow with AI, but there is no other solution in my view. If we start using the process as a criterion for copyrighting, then we have chaos. Indeed, you cannot prove which process was used after the fact, especially not with AI.
And the reverse is particularly problematic: anyone could then allege some human work is in fact AI and hence cannot be copyrighted. We’re breaking the copyright system if we focus on the process.
Another example: if you ask an AI to create something “in the style of xxx”, then the result may be something that infringes copyright, but so would human work producing the same output.
In the end, what matters is the output, not the fact that it’s math or a human. We’re also just a bunch of atoms in the end, one could argue, very similar to a very complex mathematical model…
> Indeed, you cannot prove which process was used after the fact, especially not with AI.
Citation needed.
> anyone could then allege some human work is in fact AI and hence cannot be copyrighted
I didn't say that. I said the copyright should belong to the original author. If some new work is provably based on old work to a substantial degree, it's plagiarism. Same as now. Indeed, the tool used does not matter in this case.
And we know today's LLMs produce code based on copyrighted original work because they readily admit they scraped everything available to them, regardless of whether it was proprietary, copyleft or public domain. Older versions literally produced entire functions copy-pasted from GPL-licensed Quake code. They "patched" that now to be less blatant (mix more) but that does not make it right or legal.
---
Now imagine a rogue employee of Google trains an LLM on all of Google's proprietary code (and only Google's code) and releases it publicly under, say, Apache 2.0. Can you imagine Google saying it's OK and not suing?
What if that person also got his friends from Microsoft, Oracle, Apple and Amazon to pool their companies' source codes? Would that be enough mixing? Clearly the LLM would only be regurgitating code that belongs to one of these companies.
What if instead somebody scraped only AGPL code and released the model under Apache 2.0? Clearly whatever comes out of the model is in its entirety based on work that is licensed under AGPL and therefore also has to be licensed under AGPL.
> Citation needed.
How do you prove whether or not someone has used AI?
> I didn't say that. I said the copyright should belong to the original author. If some new work is provably based on old work to a substantial degree, it's plagiarism. Same as now. Indeed, the tool used does not matter in this case.
I am a human. I have been shaped by the world around me. Everything I have seen and read has shaped who I am today, and it has shaped the type of work which I produce myself.
Yes, AI is trained on copyrighted work—but so am I! Nothing I produce is ever truly original. But it's different enough that it would not, and should not, be considered copyright infringement.
Now, if I accidentally reproduced an entire function from GPL-licensed Quake code, that would be copyright infringement. And humans do make mistakes like that, when they've seen the same code many times!
(Just to be clear, I am not one of those people who thinks LLMs are recreations of the human brain, I think they work differently. But we do both create output based on copyrighted input.)
---
> Now imagine a rogue employee of Google trains an LLM on all of Google's proprietary code (and only Google's code) and releases it publicly under, say, Apache 2.0. Can you imagine Google saying it's OK and not suing?
This is, in fact, one reason companies are always a bit worried when employees go to work for competitors. But luckily, we don't let Google say "you aren't allowed to work for anyone else after you've seen our source code."
> How do you prove whether or not someone has used AI?
In this case it's rather easy, they admit it officially and publicly. In other cases it might be harder, we might need a whistleblower. I am sure in some cases where it happened, it'll be impossible to prove.
Difficulty of proving it is not a valid reason for making something harmful legal.
> But it's different enough that it would not, and should not, be considered copyright infringement.
Yes, the line has to be drawn somewhere. The (typical) human brain, for example, has limits on how much code it can memorize and reproduce verbatim. LLMs don't have those limits (theirs are orders of magnitude higher on current hardware and models, and essentially unlimited in principle).
> But luckily, we don't let Google say "you aren't allowed to work for anyone else after you've seen our source code."
But we also don't expect the new employee to build a competitor for one of google's services or internal tools in record time that can be measured in LOC per minute. An appropriately trained LLM certainly could.
---
Ultimately we strayed from the main point. Copyright exists to protect authors and their intentions against parasites.
If somebody's intention is to offer code for free as long as people who build on top of it also release their work for free (a slight simplification and misinterpretation of the GPL), then copyright exists to make sure that happens. That for example somebody who focuses solely on advertising (non productive zero-sum work) can't take that code and profit from it leaving the original author to eat dirt.
There's also the question of income per unit of work vs "passive" income. Many builder professions get paid only as long as they're actively putting in work. We as programmers are privileged in that our product can often run without constant work and produce positive value automatically (but we only get the continuous value out of it if we own the product, not if we produced it for somebody else).
Now, I believe that giving somebody a fixed amount of compensation for building something that produces continuous value is fundamentally unfair, exploitative and abusive. Many people will no doubt disagree; most of the world has probably never thought about it, by the looks of it.
LLMs take this to a whole new level. They can bring in enormous amounts of money (both in subscriptions to their services and in value produced by their output). Yet the people who built the training data that made it all possible don't get _any_ compensation at all.
[deleted]
No one. At least in the US, copyright requires human authorship.[1]
There are several ongoing cases, probably most prominently the Thaler v. Perlmutter case. [2]
[1] https://www.copyright.gov/ai/ai_policy_guidance.pdf
[2] https://www.copyright.gov/ai/docs/us-brief-for-appellees.pdf
My understanding of the linked report is it's saying you can't register a work as being "Copyright ChatGPT" or what have you. This is obviously stupid on a bunch of levels—just to begin with, how would "ChatGPT" sue someone for infringement? If it can't, then its copyright is meaningless.
The article is about whether the human using ChatGPT can claim copyright.
Isn’t this article trying to heavily overcomplicate matters? OpenAI grants you ownership of output. Why do we even need to discuss authorless rights?
> OpenAI grants you ownership of output
That’s only meaningful if it’s legally established that the ownership of the output was theirs to grant in the first place. There’s no law directly on point and precious few legal decisions in this area, so it’s still very much an open question.
My understanding from a talk by an attorney at HOPE 2024 was that AI-generated materials cannot be defended/owned under copyright.
Yes, the copyright office has already published guidance on the issue, but journalists continue to skip over this primary source.
This varies widely by country. The US does not have "database copyright" or "sweat of the brow" copyright. See Feist vs. Rural Telephone, which was about telephone directories. This restriction comes directly from the U.S. Constitution and would require a constitutional amendment to change.[1]
The UK and EU are different. The EU allows copyrights on databases.
[1] https://constitution.congress.gov/browse/essay/artI-S8-C8-3-...
Plot twist: Nobody who is in charge should care.
Leave the no to the naysayers.
Ship your app, generate traffic, usage, income. Leave the discussions to other people.
Do that at $BigCorp and Legal will eat you alive, if not fired.
Long ago I went through the company-approved process to link to SQLite and they had such a long list of caveats and concerns that we just gave up. It gave me a new understanding of how much legal risk a company takes when they use a third-party library, even if it's popular and the license is not copyleft.
Unless you are now involved in a lawsuit that asks for a hypothetical 50% of your income for using a tech very similar to their and they speculate its been stolen and not permitted by their license and even if you know you are going to win/or that it doesn't affect you still have to spend money on the lawyers fighting it.
Commenting on this to mark it in my feed for later reference. Well said!
Ownership of software never made any sense to begin with. We should abandon such a concept as belonging to the dark ages.
On one hand I agree with you, if I can build the same software you can, I should be able to sell it all the same. On the other, if there's no copyright or similar, what stops theft of source code and an identical program with fresh branding?
> I should be able to sell it all the same
No, you should be compensated for your labor. This does not entail a market product.
How will your system compensate people who write useful code if that code isn't allowed to be an excludable market product?
So only the first company to make a search engine should be able to sell a search engine? I don't see how this stance makes any sense.
I assume op means you should be compensated through a method other than pretending software is scarce and trying to assign value to it through a system that relies on equivocating scarcity with value.
No, selling a search engine never made any sense to begin with. Services should be publicly funded and freely accessible. As a bonus we wouldn't have to put up with spam on every site on the internet.
Kagi sells a search engine and it's hugely popular. They're supported by the people who buy the software, not by ads. Not selling the software is exactly what leads to the spam you hate. In similar vein, would you say all software should be given away freely?
I'm saying services should be publicly funded and freely accessible.
Does ownership of books make sense? What is the difference between code and books? Code is translated to machine code, just like books can be translated to other languages.
You own the medium the content is contained in (for books, the paper). Not the content itself. You do not get to copy the books content and place it in another container and sell the new container as though the content within it was yours to distribute in the first place.
I own my hard drive too, not sure how that's any different than paper.
And we can't ignore that ebooks exist.
You do own your hard-drive.
That doesn't mean you own any software or ebooks obtained by ill-gotten means. Nor does it mean you can replicate and sell an application, as though it was your own, that you paid a licenses or rental fee for. But you still own your hard-drive and are free to erase it and sell it.
I fully agree. Limiting the amount of copies of software to sell them like a finite good has so many downsides:
1. There may be people who cannot use/afford some software, although there is technically an infinite supply.
2. Collaboration becomes awkward. Either all contributors give up their rights (Open Source), or one contributor holds all the rights and the rest is being treated unfairly. The latter decreases the incentives to make software modular and reusable.
3. The resulting software typically gets worse due to some copyright enforcement mechanisms. For example, no closed source software will ever have a good debugger, because that would allow viewing and changing the source code.
4. It creates a power imbalance between software owners and software users. Nearly all software has to be adapted over time, but the software owner has a monopoly on performing such adaptations. The result is enshittification, surveillance, and basically a return to feudalism where daily life is governed by a small number of overlords.
5. It is not clear how to price software fairly, and there is also little incentive to do so.
6. My impression is that high-quality software converges to formal proof, which is AFAIK not copyrightable.
For all these reasons, I think it is time to consider a world without copyright on software.
To those that worry about salaries in such a world: Negotiate payment in advance (contracts, crowdfunding, bounties, ...), or get a job where software is created as a byproduct (consultant, researcher, tester, ...).
Nothing particularly new, since none of the cases around this have concluded.
If I were to guess, I'd say the output of an LLM isn't copyrightable (it's not the creation of a human), unless it's a verbatim copy of some copyrighted training data in which case it belongs to the authors of the work(s) used in training. This creates the most annoying combination of legal problems around using it, so by Murphy's Law it must be correct!
As far as I know, the code I write for my employer is theirs - it would be really funny if ChatGPT had more rights than me to ownership of the product of its work. This is, I believe, a generally accepted principle (even extending beyond what I code during working hours, as long as it's connected with the main purpose of my job).
This felt like a way bigger topic (LLM copyright in general) when ChatGPT first dropped and now no one cares it seems. Are there ongoing cases for this or did something get settled that set a copyright-free precedent that I missed?
There are a bunch of ongoing cases, and the US copyright office has made its position clear. We're waiting for American courts, and for any European authority at all, to say something.
This is the wrong question in my mind.
The question we need a legal determination on (ideally globally consistent) is more on training data side.
Or put differently, this is a "fruit of the poisonous tree" type of legal issue. Focusing on the output first is back to front.
[deleted]
I already mentioned in another thread (which didn't get much discussion), but the recent EU AI Act takes into account the source material for training by essentially saying that you can train on copyrighted data unless the author opts out. The text from the AI Act is:
> General-purpose AI models, in particular large generative AI models, capable of generating text, images, and other content, present unique innovation opportunities but also challenges to artists, authors, and other creators and the way their creative content is created, distributed, used and consumed. The development and training of such models require access to vast amounts of text, images, videos, and other data. Text and data mining techniques may be used extensively in this context for the retrieval and analysis of such content, which may be protected by copyright and related rights.
> Any use of copyright protected content requires the authorisation of the rightsholder concerned unless relevant copyright exceptions and limitations apply.
> Directive (EU) 2019/790 introduced exceptions and limitations allowing reproductions and extractions of works or other subject matter, for the purpose of text and data mining, under certain conditions. Under these rules, rightsholders may choose to reserve their rights over their works or other subject matter to prevent text and data mining, unless this is done for the purposes of scientific research. Where the rights to opt out has been expressly reserved in an appropriate manner, providers of general-purpose AI models need to obtain an authorisation from rightsholders if they want to carry out text and data mining over such works.
(note that the "appropriate manner" is meant to be some machine readable way, AFAIK the way this will happen is still in works - the Act wont become law until 2026 anyway)
Under EU copyright law the machine generated output (like code, etc) cannot be copyrighted. Essentially this means that:
1. ChatGPT et al. can train on copyrighted code, text, etc. unless the authors opt out in some (machine-readable) way.
2. ChatGPT et al. can then reproduce a bunch of code from whatever it was trained on, that code by itself is not copyrightable (but it can be modified and become part of a copyrighted work - think of it as combining public domain code with some other project).
AFAIK the only muddy aspect is what happens when ChatGPT (or really any AI generative algorithm) reproduces already copyrighted works without the knowledge of the user. Again AFAIK this is something that is currently being worked on.
There was an AMA on Reddit recently[0] by someone who worked on the act and answered a bunch of questions. IMO it is a great AMA on the topic (at least if you ignore the trolls that ask "why do you want to destroy EU", etc).
Also (unrelated to the above AMA) i think both UK and US are likely going towards a similar direction.
[0] https://www.reddit.com/r/ArtificialInteligence/comments/1fqm...
If I pay a consultant to write code, it generally belongs to me. Why would a tool like an LLM be any different if you are a paying customer? If you are on the free model, shrug…
Incorrect: it clearly belongs to you only if there is an agreement transferring those rights; otherwise you are in murky waters.
LLMs are not individuals automatically covered by copyright laws, as they are simply tools based off other (often) copyrighted work. This means that the initial copyright infringement is still a valid concern, hence these discussions.
If it was as easy and clear-cut as just shrugging, the conversation wouldn't be so prevalent.
What if the contractor used an LLM?
Generally, if they never had the rights to something, they cannot transfer them to someone else.
Unless you have a work-for-hire agreement, it belongs to them. I once explained this to a client (that still owed me money) and he got pretty angry.