I am a statistician interested in detecting potentially problematic research such as data fabrication, which results in unreliable findings and can harm policy-making, confound funding decisions, and hamper research progress.
To this end, I am content mining results reported in the psychology literature. Content mining the literature is a valuable avenue for investigating research questions with innovative methods. For example, our research group has written an automated program to mine research papers for errors in the reported results and found that 1 in 8 papers (of 30,000) contains at least one result that could directly influence the substantive conclusion [1].
In new research, I am trying to extract test results, figures, tables, and other information reported in papers throughout the majority of the psychology literature. As such, I need the research papers published in psychology that I can mine for these data. To this end, I started ‘bulk’ downloading research papers from, for instance, ScienceDirect. I was doing this for scholarly purposes and took into account potential server load by limiting the number of papers I downloaded per minute to 9. I had no intention to redistribute the downloaded materials, had legal access to them because my university pays a subscription, and I only wanted to extract facts from these papers.
Full disclosure: I downloaded approximately 30GB of data from ScienceDirect in approximately 10 days. This boils down to a server load of 35KB/s, 0.0021GB/min, 0.125GB/h, 3GB/day.
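For anyone who wants to check these figures, here is a minimal sketch of the arithmetic in Python. It assumes decimal units (1 GB = 1,000,000 KB) and a constant download rate over the full 10 days; the function name is only illustrative.

```python
# Figures from the post: roughly 30 GB downloaded in roughly 10 days.
TOTAL_GB = 30
DAYS = 10
KB_PER_GB = 1_000_000  # assumption: decimal units, not binary (GiB/KiB)


def average_rates(total_gb: float, days: float) -> dict:
    """Express the average transfer rate in several units."""
    gb_per_day = total_gb / days
    gb_per_hour = gb_per_day / 24
    gb_per_min = gb_per_hour / 60
    kb_per_sec = gb_per_min / 60 * KB_PER_GB
    return {
        "GB/day": gb_per_day,
        "GB/h": gb_per_hour,
        "GB/min": gb_per_min,
        "KB/s": kb_per_sec,
    }


rates = average_rates(TOTAL_GB, DAYS)
for unit, value in rates.items():
    print(f"{unit}: {value:.4f}")
```

Rounded, these values agree with the 3GB/day, 0.125GB/h, 0.0021GB/min and 35KB/s quoted above.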
Approximately two weeks after I started downloading psychology research papers, Elsevier notified my university that this was a violation of the access contract, that this could be considered stealing of content, and that they wanted it to stop. My librarian explicitly instructed me to stop downloading (which I did immediately); otherwise Elsevier would cut all access to ScienceDirect for my university.
I am now not able to mine a substantial part of the literature, and because of this Elsevier is directly hampering me in my research.
[1] Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2015). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 1–22. doi: 10.3758/s13428-015-0664-2
[MINOR EDITS: the link to the article was broken, should be fixed now. Also, I made the mistake of using "0.0021GB/s" which is now changed into "0.0021GB/min"; I also added "35KB/s" for completeness. One last thing: I am aware of Elsevier's TDM License agreement, and I nonetheless thank those who directed me towards it.]
Dear Chris,
We are happy for you to text mind content that we publish via the ScienceDirect API, but not via screen scraping. You can get access to an API key via our developer’s portal (http://dev.elsevier.com/myapikey.html). If you have any questions or problems, do please let me know. If helpful, I am also happy to engage with the librarian who is helping you.
With kind wishes,
Alicia
Dr Alicia Wise
Director of Access & Policy
Elsevier
a.wise@elsevier.com
@wisealic
Alicia, can you explain why you think downloading should use the API?
In my case, I can’t accept the Elsevier TDM license since its provisions are unenforceable under the UK copyright exception.
Quoting the UK government’s guidance on the TDM copyright exception:
Elsevier’s API is unworkable in my experience, often failing to work, and certainly counts as an ‘unreasonable’ restriction. In many cases the API returns only metadata in the XML, compared to the fulltext PDF I can access on the website. Simply downloading the paper via the normal web service for readers is easy – much easier than using the API.
Beyond that, you need to consider that the content served by the API is not exactly the same as that served by the web server. Under UK law I have the right to perform non-commercial TDM on anything I can read – and I can read the website.
In addition, the license agreement requires a restrictive statement about reuse of the products of TDM to be attached to any output, but the statement restricts behaviours which are permissible under UK law.
Hi Richard,
The reason that we require miners to use the API is so that we can meet their needs AND ALSO the needs of our human users who can continue to read, search and download articles and not have their service interrupted in any way. Under UK legislation, publishers can use “reasonable measures to maintain the stability and security” of their networks, and so the requirement to use this API is fully compatible with the copyright exception.
Other text miners regularly use the APIs, and I don’t believe we have received reports of the APIs only returning metadata before. How frustrating this must have been for you. I would be very happy to connect you with technical support colleagues who can provide you with assistance or answer any questions you may have.
You might find our text and data mining page and FAQs of interest: https://www.elsevier.com/about/company-information/policies/text-and-data-mining
And also this article which explains how our text and data mining services work with the UK copyright exception: https://www.elsevier.com/connect/how-does-elseviers-text-mining-policy-work-with-new-uk-tdm-law
With kind wishes,
Alicia
Alicia,
Thank you for your reply.
My interpretation as a software engineer with 15 years’ experience of running web services, and that of legal scholars we have consulted, is that Elsevier’s API use requirement does not satisfy the condition of being a “reasonable measure to maintain the stability and security” of their networks. There are simpler alternatives that are less obstructive than what Elsevier has in place – rate limiting can be applied easily. Moreover, using the web interface with reasonable rate limits could not possibly impact the user experience of a site with the traffic that Elsevier’s network enjoys. If Elsevier believes that scraping with rate-limits applied impacts the experience of their other users, I challenge you to prove that it does.
Whilst it was frustrating to receive metadata-only XML, I do not consider it my responsibility to pursue improvement of your system. I have hundreds of content providers to interface with in my work, and the only commonality is that they all have a web presence that can be accessed in a browser.
By far the easiest way to address this is to use cross-publisher APIs (like Crossref, PubMed, and Europe PMC) in the first instance. If any of those fails (as in the case of Elsevier), or if a content provider does not provide material via any of those APIs, I fall back to the web interface download alternative. If publishers would like to encourage use of APIs, they should make their content available through the existing systems with as few limitations as technically possible, and without requiring extra publisher-specific steps to be taken.
It is a simple reality that if your API makes it harder for researchers to do their work, they will make use of their legal right to mine via responsible web scraping.
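To make the order of operations concrete, here is a minimal sketch in Python of the fallback I describe: cross-publisher APIs first, then a rate-limited download from the website. It is an illustration under assumptions, not my actual tooling. The fetcher functions are hypothetical stubs rather than real clients for any API, the DOI is a placeholder, and the limit of 9 requests per minute is borrowed from the post above.

```python
import time
from typing import Callable, Optional, Sequence

# A fetcher takes a DOI and returns full text, or None when it fails
# or only metadata comes back. All fetchers here are hypothetical stubs.
Fetcher = Callable[[str], Optional[str]]


class RateLimiter:
    """Space calls evenly so that at most `per_minute` happen each minute."""

    def __init__(self, per_minute: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_with_fallback(doi: str,
                        api_fetchers: Sequence[Fetcher],
                        web_fetcher: Fetcher,
                        limiter: RateLimiter) -> Optional[str]:
    """Try each cross-publisher API in order, then the rate-limited website."""
    for fetch in api_fetchers:
        text = fetch(doi)
        if text:  # None or an empty string counts as a failure
            return text
    limiter.wait()  # only the web fallback is throttled in this sketch
    return web_fetcher(doi)


# Stub fetchers standing in for real clients (assumptions, no network access).
def api_metadata_only(doi: str) -> Optional[str]:
    return None  # pretend the API returned metadata without full text


def web_download(doi: str) -> Optional[str]:
    return f"full text of {doi} from the website"


limiter = RateLimiter(per_minute=9)
result = fetch_with_fallback("10.1000/example", [api_metadata_only],
                             web_download, limiter)
print(result)
```

The limiter takes its clock and sleep functions as parameters so that the spacing can be checked without actually waiting; a real harvester would also need retries and error handling, which are left out here.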
My reading of the UK law is that it says nothing about reuse of the products of TDM. This makes it weak but it also means that requiring a statement about reuse (however restrictive) cannot restrict behaviours that the law permits.
If the XML provided by the API falls short of the content in a PDF then that is a shame and I would urge TDM researchers to feed this back and urge Elsevier to fix it. Analysing PDFs scraped from web sites strikes me as a poor use of time and energy that would be better invested in advancing research. Just because you can read a PDF (or a web page) doesn’t mean it is the best foundation for TDM if better alternatives such as XML are available.
Anthony, thank you for your reply.
Under UK law, copyright and all other intellectual property rights do not apply to facts. Collections of facts might enjoy protection under sui generis database rights, but that rarely applies to the output of mining scientific papers.
You are absolutely right that adding a statement about reuse cannot legally restrict behaviours that the law permits, but in practice it does exactly that. Most potential users of scientific data are not intellectual property law experts, and on sight of such a statement will simply avoid the data. To add such a statement to my own work would be against the public interest, and unethical.
You are quite right that XML falling short of the PDF content is a shame. However, especially in the case of older material, PDFs are often the only archive of content available. We have an array of technological approaches to extracting and cleaning data from PDFs, and if they are the only choice, we can work with them quite well. XML is preferable, but not if it means taking a lot of time out to debug APIs with each individual content provider.
Best,
Richard
Hi Alicia,
Does this mean that, if you go through the API, you’re allowed to mine the full text of all Elsevier articles that you also have access to via ScienceDirect? Unlimited text mining, in other words, as long as you go through the API.
If so, then what’s the logic behind not allowing text mining through ScienceDirect? What difference does it make to Elsevier if a researcher chooses to be inefficient in the way he/she mines text? (Assuming that the API is more efficient, which I imagine it is.)
Cheers,
Sebastiaan
Hi Sebastiaan,
The reason that we require miners to use the API is so that we can meet their needs AND ALSO the needs of our human users who can continue to read, search and download articles and not have their service interrupted in any way. ScienceDirect holds 11 million pieces of content, shares infrastructure with Scopus, ClinicalKey, and other Elsevier products, and serves millions of researchers. I am told we are not alone in providing an API for this sort of high-volume access and that APIs are also used by others including Wikipedia and Twitter. We appreciate that users might wish to text mine across publisher platforms, and this is why we also participate in the multi-publisher cross-platform text and data mining service offered by CrossRef http://tdmsupport.crossref.org/
With kind wishes,
Alicia
In response to Sebastiaan, I think there are extremely good reasons not to use the Elsevier API, not least those mentioned by Richard Smith-Unna. For instance they have rate-limits and restrictive terms & conditions on usage. It is not in any way “unlimited”.
“Elsevier has chosen to provisionally limit researchers to 10,000 articles per week” — Nature News
http://www.nature.com/news/elsevier-opens-its-papers-to-text-mining-1.14659
This is far too restrictive to be useful. I support Chris in his decision not to use Elsevier’s API. I have also done mining work at the Natural History Museum, London on ScienceDirect content and I did not use the Elsevier API. Researchers should be free to choose which tools and methods they use to do research.
Hi Ross,
This is incorrect, and there is no hard limit on the number of articles that can be mined per week. We do have some rate limits in place to ensure equal access to the API for all users, but feedback from researchers suggests these are reasonable. You can access up-to-date information about our TDM services here: https://www.elsevier.com/about/company-information/policies/text-and-data-mining/text-and-data-mining-faq
With kind wishes,
Alicia
Dear Alicia,
Thank you for your comment. At the moment, Elsevier’s API policy is terribly unclear. You state “there is no hard limit on the number of articles that can be mined per week” – thank you for being so specific. However, I am intrigued by your next sentence, which is not so specific: “We do have some rate limits…”
If these unspecified limits are not on number of articles, perhaps they are on bandwidth (or some other property)? It would be extremely helpful if Elsevier were clearer about what its rate limits actually are. Publish this information, clearly! Both on the Elsevier site you linked to and in your comments here, the information given appears to be purposefully vague and unhelpful. I cannot use a service whose limits I honestly still don’t understand.
@Alicia, what is “text mind”?
So if it’s only 9 a minute, what’s stopping 20 of my colleagues downloading an article from ScienceDirect every two minutes for our shared reading group? On the other hand, there could even be hundreds of people at my university alone simultaneously accessing ScienceDirect, thousands across the country, tens of thousands or hundreds of thousands globally. I hope the SD servers can stand up to that. I’m getting worried, given the statements above…
Hi Alicia,
(I cannot seem to re-reply directly to your comment, so I’ll post it like this.)
First, thanks for taking the time to reply, and giving Elsevier’s point of view. However, I would like to press you a bit on my main question, which you didn’t answer:
Does this mean that, if you go through the API, you’re allowed to mine the full text of all Elsevier articles that you also have access to via ScienceDirect? Unlimited text mining, in other words, as long as you go through the API.
If no, then I feel that your reply is disingenuous, suggesting that all researchers need to do is use the API, while this is in fact restricted. On the other hand, if yes, then you have a point. So …? It’s a simple yes/no question.
Cheers,
Sebastiaan
Yes!
With kind regards,
Alicia
Blog post just out by Glyn Moody https://www.techdirt.com/articles/20151117/09383132839/elsevier-says-downloading-content-mining-licensed-copies-research-papers-could-be-considered-stealing.shtml
Alicia Wise writes:
“I am told we are not alone in providing an API for this sort of high-volume access and that APIs also are used by others including Wikipedia and Twitter. ”
While Wikipedia supports access through an API, they don’t use it as a way to limit access, as Elsevier apparently does. First of all, the Wikimedia API doesn’t have hard limits on access; the documentation simply says “There is no hard and fast limit on read requests, but we ask that you be considerate and try not to take a site down.” (See https://www.mediawiki.org/wiki/API:Etiquette . Some Wikimedia instances can add rate limits, but they’re not built into the API and I’m not aware of Wikipedia imposing a hard limit.)
Second, Wikipedia regularly makes their full content set available for analysis as well, via direct FTP download or BitTorrent. I use this myself: every month, I download a dump file with all the articles in English Wikipedia, in order to run programs over them that derive data for my Forward to Libraries service. That’s over 5 million articles I get every month, or over 100 times as many articles per month as Elsevier lets researchers download, if Ross Mounce’s figures above are correct.
In other words, a nonprofit with an annual budget of under $70 million supports full data downloads and still allows its users to “continue to read, search and download articles and not have their service interrupted in any way.” If a company with over $3 billion in annual revenue won’t do the same, it’s not for service-continuity or other technical reasons.
I hate to be the devil’s advocate here, but it seems like Alicia is correct: The API indeed allows full access to subscribed content in a way that doesn’t seem much more restrictive than usual. (Although ‘usual’ is very restrictive, of course.) You can see the registration form here:
– https://www.elsevier.com/__data/assets/pdf_file/0012/102234/TDM-sign-up-short-form.pdf
That’s my understanding of the terms, anyway. And, of course, I have no idea whether the API works well enough technically to be useful.
There are many reasons why the API is problematic. The main ones at present are:
* I have to agree to Elsevier’s terms and conditions (even to look at it)
* I have to disclose personal details about myself and my research to Elsevier.
That is before I even know whether the API does what I want it to do.
In July 2014, the Association of European Research Libraries (LIBER) and others called on Elsevier to withdraw its TDM policy:
http://libereurope.eu/blog/2014/07/01/european-research-organisations-call-on-elsevier-to-withdraw-tdm-policy/
And here is our open letter in response: https://www.elsevier.com/__data/assets/pdf_file/0008/84464/TDM_openletter.pdf
With kind wishes,
Alicia
So, the purpose of this blog post is to paint Chris H.J. Hartgerink as the victim of Elsevier and therefore an open-access hero. Nicely done, Chris. In reality, it’s just a solipsistic essay that reveals the author’s ignorance about data mining. Fail.
Solipsism:
2. Extreme preoccupation with and indulgence of one’s feelings, desires etc; egoistic self-absorption
Jeffrey, would you mind enlightening us all on the API so we might share your vision?
Yes, it’s a real shame that content-mining specialist Chris Hartgerink is so ignorant about data mining compared with anti-OA trolling specialist Jeffrey Beall. If only Chris could have had Jeffrey’s skills and experience, all this would have been so much better. Elsevier would never have cut off Jeffrey’s access! Silly Chris.