Attribution Required

06-25-09 by . 37 comments

All the content contributed to Stack Overflow, Stack Overflow Meta, Server Fault, and Super User is cc-wiki (aka cc-by-sa) licensed, intended to be shared and remixed. We even provide all our data as a convenient data dump, seeded by us.

But our cc-wiki licensing, while intentionally permissive, does require attribution.

Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

I thought it was pretty clear what “attribution” meant, but given the semi-scammy way the content is popping up in some seedier areas of the internet, maybe not:

  • http://hiveminds.se/vote/framed/story.php?id=23472
  • http://programmingfaq.w3ec.com/

(there may be others; these are just the ones I know about)

So let me clarify what we mean by attribution. If you republish this content, we require that you:

  1. Visually indicate that the content is from Stack Overflow, Meta Stack Overflow, Server Fault, or Super User in some way. It doesn’t have to be obnoxious; a discreet text blurb is fine.
  2. Hyperlink directly to the original question on the source site (e.g., http://stackoverflow.com/questions/12345)
  3. Show the author names for every question and answer
  4. Hyperlink each author name directly back to their user profile page on the source site (e.g., http://stackoverflow.com/users/12345/username)

By “directly”, I mean each hyperlink must point directly to our domain in standard HTML visible even with JavaScript disabled, and not use a tinyurl or any other form of obfuscation or redirection. Furthermore, the links must not be nofollowed.
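For republishers who want to sanity-check their own pages, the link requirements above can be roughly sketched in Python. This is an illustrative check, not an official tool: the host list, the `AttributionLinkChecker` class, and the `check_attribution` function are all assumptions made for this example. Because it parses static HTML only, links injected by JavaScript are never seen, which matches the "visible even with JavaScript disabled" condition. It does not verify the visual blurb (item 1) or that every author is named (item 3).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed host names for the four sites named in the post (hypothetical list).
SOURCE_HOSTS = {
    "stackoverflow.com",
    "meta.stackoverflow.com",
    "serverfault.com",
    "superuser.com",
}


class AttributionLinkChecker(HTMLParser):
    """Collects <a> tags that point straight at a source site and are not nofollowed."""

    def __init__(self):
        super().__init__()
        self.question_links = []
        self.profile_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        rel = (attr.get("rel") or "").lower().split()
        url = urlparse(href)
        host = (url.hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        # Shorteners and redirectors fail the host test; nofollowed links are skipped.
        if host not in SOURCE_HOSTS or "nofollow" in rel:
            return
        if url.path.startswith("/questions/"):
            self.question_links.append(href)
        elif url.path.startswith("/users/"):
            self.profile_links.append(href)


def check_attribution(html):
    """Report whether the HTML has at least one direct question link and profile link."""
    checker = AttributionLinkChecker()
    checker.feed(html)
    return {
        "links_to_question": bool(checker.question_links),
        "links_to_author_profile": bool(checker.profile_links),
    }


if __name__ == "__main__":
    good = (
        '<p>From <a href="http://stackoverflow.com/questions/12345">Stack Overflow</a>, '
        'by <a href="http://stackoverflow.com/users/12345/username">username</a></p>'
    )
    bad = '<a rel="nofollow" href="http://tinyurl.com/abc">source</a>'
    print(check_attribution(good))
    print(check_attribution(bad))
```

The first sample uses the example URLs from the list above and passes both checks; the second, a nofollowed short link, passes neither.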

This is about the spirit of fair attribution. Attribution to the website, and more importantly, to the individuals who so generously contributed their time to create that content in the first place!

Anyway, I hope that clears up any confusion — feel free to remix and reuse to your heart’s content, as long as a good faith effort is made to attribute the content!

Filed under cc-wiki-dump, community, legal

37 Comments

Also, it looks like Google has put some serious smack down on that programmingfaq site. They barely appear in the Google indexes at all from what I can see (via the site: operator).

And if you aren’t in Google’s index, you cease to exist.

http://www.codinghorror.com/blog/archives/000767.html

nobody_ Jun 25 2009

Jeff, how do you suggest attribution of the data as a whole? I have distributed SQLite versions of the data dump with the following under a “copyright” section in the README file:

copyright:
the data in the dump, as well as all other files in this distribution, are distributed under the creative commons “Attribution-Share Alike 2.5 Generic” license (http://creativecommons.org/licenses/by-sa/2.5/). attribution should indicate the creator of the files is the StackOverflow user nobody_, and that the data is attributed to StackOverflow’s users and owners in whatever form they wish to be attributed.

in compliance with this license, the data contained in so-export-sqlite-2009-06.db.gz is hereby attributed to the users and owners of StackOverflow, but not in such a way as to suggest that they endorse me or my use of the data.

Is this acceptable? (Apologies in advance if the blog messes up the formatting)

nobody_ hmm, well, I think there are two cases there

1) publishing the data

2) analysis of the data in aggregate

if #1, the blog post applies. if #2, then a single link to the site and a mention of where the data came from is probably all that is necessary.

I think most of this is irrelevant until you begin publishing the underlying data.

Wow!

The ‘latest post’ on hiveminds.se is one of my questions!!!

And apparently, the people who answered my question are 1 and 2.

Exactly… I could complain about incorrect attribution at these sites, but I basically have to issue a DMCA takedown notice if I can’t find an email address to complain to, or cannot get responses to my emails.

Unpleasant all the way around.

In the end, stopping this kind of thing is impossible, because you can’t serve a DMCA notice outside the United States. What do you do when the site is hosted on some server in China? Nothing. Not only that, but you’ve put the data up on BitTorrent for everyone to download in a nice, easy (almost as easy as CSV) file format for anybody to import into their own database and do as they please with. It’s a losing battle, and I think the best we can hope for is that Google doesn’t index those sites.

>>must attribute the work in the manner specified
>>by the author or licensor

Please post your ‘manner specified by the author or licensor’ somewhere: on Stack Overflow, with the dump, or on the blog post where you announced the download.

Since you have not specified the manner of attribution, any format of attribution, however slim, would suffice.

Oscar Reyes Jun 25 2009

What about adding the attribution in the content itself? (I haven’t seen the SO dump schema, but my guess is they are different fields.)

By adding the attribution to the content (prepending or appending, I don’t know which), most of the autopublished content will have it already.

:-/

======

In 100 years, people will ask: what was the most frequently asked programming question in 2008? And the answer will be:

http://stackoverflow.com/questions/84556/whats-your-favorite-programmer-cartoon

:)

Ryan Fox Jun 26 2009

Why not make it easier to include this? Make a button that will generate some code that people can paste into their site that includes all of the information that you’d like them to use.

>2) analysis of the data in aggregate
>
>…if #2, then a single link to the site and a mention of where the data came from is probably all that is necessary.

Whew! I was starting to feel horrible for a second there when I read this post’s title, with all my analysis and graphs and such. :P

Anyway, can you divulge how you are finding out about these copy-and-no-attribute sites?

Jeff:

I have to agree with Dave above. The Attribution portion of the cc-wiki license is reasonably clear that you needed to make the attribution style clear.

Since you did not do this up front, any style of attribution will be sufficient until it is made clear — perhaps the next dump file should have a license.txt in it?

Boofus McGoofus Jun 26 2009

If it were my site, I’d have the cc-wiki link take you to a page summarizing the cc-wiki rules (and linking to the cc site) along with your attribution requirements. I usually want to find out how people want to be attributed around the same time that I’m trying to find out how the content is licensed.

I have been thinking of posting some of my better answers in an edited form to a blog. Since I am the author of the post I assume I don’t have to worry about the license, but I am slightly worried since other users can edit my posts and have some claim to the edited content. Would a link back to the question/answer be enough (I was planning on it anyway because I want more people to use the site)?

I agree with Dave et al, that you need to tell people clearly what copyright you’re asserting before you expect them to comply. It needs to be on every page of Stackoverflow, just as you currently have a link to the meaningless page at creativecommons.org/licenses/by-sa/2.5.

Also I think you should make it clear whether contributors have the right to republish material they have submitted.

You have built a reputation as a super nice guy who would not dream of using peoples’ freely given contributions for personal profit.

Can’t wait to see what you do next.

BobbyShaftoe Jun 28 2009

I expected that to happen. I always wondered why being able to get a data dump of the questions was such a “hot” feature. I mean unless you are going to build one of these “scammy” sites, I don’t see why you’d want it. Sure, there might be a couple people trying to do some sort of analysis but let’s be honest, we know why most people would want it.

ha! the ‘latest post’ is still my question! I wonder if they’ll merge the new data dump. And perhaps they might put correct attribution on it now.

Hi, I am a fan of Stack Overflow and purchased stackoverflow.mobi in order to make your data dump available on the mobile internet.
I will be complying with your rules here and will make mobile site source code publicly available.
I hope you’ll be ok with my idea.

Please let me know your thoughts.

Thanks

Is it all right if we import our own user RSS feed to Facebook? Our answers and questions will get posted there with a link back to Stack Overflow, but I don’t think “Stack Overflow” will be included anywhere, since it’s not in each RSS entry.

I’m gearing up for a new StackQL.net release in anticipation of the October data dump. Coming back to and reading this again raised a question regarding StackQL.net (and by extension StatOverflow.com/sandbox as well).

Reading over this again, it looks like StackQL technically violates item #4 of your required attribution list. We mention where the content comes from, try to link back to original questions by matching any column name in the result set that ends with PostID, QuestionID, or AnswerID, and make author names available in the results unless a user specifically excludes them from a result set by not selecting * or not selecting OwnerDisplayName.

But it doesn’t link those author names back to profiles, and that sounds like a violation. StackQL does create a link for any item in query results where the column name ends with “UserID”, so links are created; they’re just not attached to the name.

Aside from this one (minor) point I think the site does everything reasonable to follow the attribution guidelines. Even here I think I’m following the spirit of what you’re asking for, but it does fail when held up to the letter of law, so to speak. StatOverflow does even less, but I doubt you have any problems with it either.

Could you clarify the public attribution requirements in a way that makes us less legally murky? I don’t expect you’ll sue me or Ian over the issue, but I’d like things set up in a way that would enable others to do the same kind of thing if they want.

Carol Jul 14 2010

I agree. If someone takes the time to write something, then it is only fair to use them as a source versus stealing their work.

Jeff, I don’t see any positive examples of attributions mentioned in your post. This will be a good easy guide for users!

(Stumbled upon this while being frustrated looking for guidelines on CC-BY attributions from Flickr!)

Anabel Dec 13 2010

I have found that a lot of Chinese splogs, blogs, and autoblogs use my content without any attribution. However, there is a solution, as they mostly use RSS feeds for content: the RSS Footer plugin, and it is really great.

I want to find out how people get attributed around the same time I’m trying to find out how the content is licensed.

Anone Mouse May 13 2011

http://stackmobile.com/, both the current version and current beta, aren’t following points #2 and #4 above.

Keith Nicholas Jun 9 2011

So does this mean for every little code snippet you use off stackoverflow, you need to provide attribution? because, if so, that sucks!

Hey, that’s very helpful for me! Many thanks.
Regards, Lara

Thanks! That sounds good and was very helpful! Greetings, Cara

Thom Priemon Oct 11 2011

I was adding friends to my account. I am a jazz photographer, and I received a warning for adding people I know. How can this be possible? These are people I speak to often and are friends and associates in the jazz world, mostly performers I have written books on. Can you help me? I received a 2-day ban from sending requests. How else do I add my friends and associates?

So can I sell a printed version of Stack Overflow with all the attribution you have mentioned here?! :)

ChrisLamont Jan 23 2012

There are several great posts on SO that don’t fit the Q&A format and have been closed and deleted. I hear that these questions are available to users with 10K rep.

How do I remix (with attribution) these questions?

If this is the license, I don’t see where it says anything about usage in my code. If I use this stuff in my code, will it conflict with the GPLv3 license?

Great job detailing about the attribution. I always wonder how the attribution should be as most places don’t add this information.

How does VMWare protect passwords?

Thanks for the information on attribution.

Question: what if I just use an idea seen on Stack Overflow? For instance, I noticed that one solution for a given problem (how to prevent ‘killed’ messages) was to use ‘wait’ in Linux after sending a ‘kill’. If I do not copy/paste the exact same solution, but instead simply use the wait command in my code (i.e., the idea is used but not the exact content), then I assume there is no need for attribution. Correct?

