
Wikipedia:Database download


All Wikipedia content is licensed under the GNU Free Documentation License; see Wikipedia:Copyrights for more info.

See also Wikipedia:MediaWiki for the software that runs the wiki. If you're just looking for the database schema, it's described in schema.doc (a plain-text file despite the .doc extension, not a Microsoft Word document, so Internet Explorer users beware).

Database dumps, updated approximately weekly

See http://download.wikipedia.org/ for the backup dumps of the database. These can be read into a MySQL relational database for leisurely analysis, testing of the Wikipedia software, and, with appropriate preprocessing, perhaps offline reading.
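As a minimal sketch, here is one way to stream a compressed dump straight into a local MySQL server with Python, without first writing the much larger uncompressed file to disk. The file name is a placeholder for whichever dump you downloaded, and it assumes an empty database named "wikipedia" already exists and that your MySQL credentials are configured (e.g. in ~/.my.cnf):

  import bz2
  import subprocess

  DUMP_FILE = "cur_table.sql.bz2"  # placeholder; use your actual dump file

  # Feed the decompressed SQL into the mysql command-line client a
  # chunk at a time, so the full uncompressed dump never has to be
  # stored on disk or held in memory.
  mysql = subprocess.Popen(["mysql", "wikipedia"], stdin=subprocess.PIPE)
  with bz2.BZ2File(DUMP_FILE) as dump:
      while True:
          chunk = dump.read(1 << 20)  # 1 MB at a time
          if not chunk:
              break
          mysql.stdin.write(chunk)
  mysql.stdin.close()
  mysql.wait()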

The database schema is explained in schema.doc (see above). The cur tables contain the current revisions of all pages; the old tables contain the prior edit history. Approximate file sizes are given for the compressed dumps; uncompressed, they will be significantly larger.
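Once loaded, the tables can be queried like any other MySQL data. A small sketch using the MySQLdb driver, assuming the dump was imported into a database named "wikipedia" as above; the column names (cur_namespace, cur_title, cur_text) follow the schema described in schema.doc, but verify them against your copy:

  import MySQLdb

  # Assumes a local MySQL server and default credentials.
  conn = MySQLdb.connect(db="wikipedia")
  cursor = conn.cursor()

  # Namespace 0 holds ordinary articles; other namespaces hold talk
  # pages, user pages, and so on.
  cursor.execute(
      "SELECT cur_text FROM cur"
      " WHERE cur_namespace = 0 AND cur_title = %s",
      ("Database",),
  )
  row = cursor.fetchone()
  if row is not None:
      print(row[0])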

Windows users may not have a bzip2 decompressor on hand; a command-line Windows version of bzip2 is available for free under a BSD license. 7-Zip, a free GUI file archiver, can also open .bz2 files. Mac OS X ships with the command-line bzip2 tool as well.
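If neither tool is convenient, Python's standard-library bz2 module can decompress a dump portably on any platform; a minimal sketch (file names are placeholders):

  import bz2

  # Stream-decompress so the large uncompressed file is written out
  # incrementally rather than held in memory all at once.
  with bz2.BZ2File("cur_table.sql.bz2") as src:
      with open("cur_table.sql", "wb") as dst:
          while True:
              chunk = src.read(1 << 20)  # 1 MB at a time
              if not chunk:
                  break
              dst.write(chunk)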

Static HTML tree dumps for mirroring or CD distribution

I have dumped Wikipedia to HTML. The dump is beta quality and the dumping code is alpha: wikipedia-terodump-0.1.tar.bz (Helia mirror). - Tero

The wiki2static script is an experimental program that generates HTML dumps, including images and a search function. Here are some examples: English (text only, 126 MB), German (complete, 167 MB), Spanish (complete, 50 MB). More information is available here. - Alfio

If you'd like to help set up an automatic dump-to-static function, please drop us a note on the developers' mailing list.

Daily tarballs of older non-English Wikipedias

These have not yet been upgraded and are still running UseModWiki. The software and data are included together in a single tarball.

See also Wikipedia:TomeRaider database


Please do not use a web crawler to download large numbers of articles. Aggressive crawling can dramatically slow Wikipedia down for everyone.

