Wikipedia:Database download
All Wikipedia content is licensed under the GNU Free Documentation License; see Wikipedia:Copyrights for more info.
See also Wikipedia:MediaWiki to get the software to run the wiki. If you're just looking for the database schema, it's described in schema.doc (a text file, not Microsoft Word; IE users beware).
Database dumps, updated approx. weekly
The backup dumps of the database are available at http://download.wikipedia.org/. They can be loaded into a MySQL relational database for leisurely analysis, for testing the Wikipedia software, and, with appropriate preprocessing, perhaps for offline reading.
The database schema is explained in schema.doc. The cur tables contain the current revisions of all pages; the old tables contain the prior edit history. Approximate file sizes are given for the compressed dumps; uncompressed, they will be significantly larger.
Windows users may not have a bzip2 decompressor on hand; a command-line Windows version of bzip2 is available for free under a BSD license. 7-Zip, a GUI file archiver that can also open bz2-compressed files, is available for free as well. Mac OS X ships with the command-line bzip2 tool.
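Where no bzip2 tool is installed, Python's standard bz2 module can also unpack a dump. The following is only a minimal sketch: the function name decompress_bz2 and the file name in the usage note are hypothetical and not part of the dump distribution. It streams the data in chunks, so a large dump does not need to fit in memory.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src, dest=None, chunk_size=1024 * 1024):
    """Stream-decompress a .bz2 file to disk and return the output path.

    decompress_bz2 is a hypothetical helper name used for illustration.
    """
    src = Path(src)
    # Default output name: drop the final suffix (e.g. ".bz2").
    dest = Path(dest) if dest is not None else src.with_suffix("")
    if dest == src:
        # Happens when src has no suffix to strip; refuse to overwrite the input.
        raise ValueError("output path would overwrite the input file")
    # Copy in fixed-size chunks so the whole dump is never held in memory.
    with bz2.open(src, "rb") as compressed, open(dest, "wb") as out:
        shutil.copyfileobj(compressed, out, chunk_size)
    return dest
```

For example, calling decompress_bz2("cur_table.sql.bz2") (an assumed file name) would write cur_table.sql next to the compressed file. The resulting SQL file can then be loaded into MySQL as described above.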
Static HTML tree dumps for mirroring or CD distribution
I have dumped Wikipedia to HTML. The dump is beta; the dumping code is alpha. wikipedia-terodump-0.1.tar.bz. (Helia mirror) - Tero
The wiki2static script is an experimental program that generates HTML dumps, including images and a search function. Some examples: English (text only) (126MB), German (complete) (167MB), Spanish (complete) (50MB). - Alfio
If you'd like to help set up an automatic dump-to-static function, please drop us a note on the developers' mailing list.
Daily tarballs of older non-English Wikipedias
These wikis have not yet been upgraded and still run on UseModWiki. The software and data are included together in a single tarball.
- ca - Catalan (approx. 7.5M)
- et - Estonian (approx. 0.3M) (dead link)
- eu - Euskara (approx. 0.3M) (dead link)
- fi - Finnish (approx. 0.3M) (dead link)
- fy - Frisian (approx. 0.3M) (dead link)
- he - Hebrew (approx. 44k)
- ia - Interlingua (approx. 0.3M) (dead link)
- id - Indonesian (approx. 107k)
- it - Italian (approx. 13M)
- pt - Portuguese (approx. 5M)
See also Wikipedia:TomeRaider database.
Please do not use a web crawler to download large numbers of articles. Aggressive crawling of the server can slow Wikipedia down dramatically.