Small file performance
From Codawiki
Word of Caution
The Coda statistics below do not represent real Coda capabilities. I re-ran the same test scripts in March 2007 on an AMD Athlon(TM) XP 2500+, 10Mbit HUB/half-duplex link, Linux kernel 2.6.18, Ext3, Coda version 6.9.0-CVS, and the results were more favorable for Coda.
I do not know if this is the result of improvements in Coda code, kernel, network link, hardware, or in all of them. In any case, the numbers were better even on an old 10Mbit/Half-duplex/HUB LAN link (and the network is the biggest impact factor, as you can see from the 3rd column).
Here is the quick comparison of the results. Column "OLD" are numbers as given in the initial benchmark test, "NEW" are my tests from March 2007, and "LOCAL" are March 2007 statistics for client and server on the same machine.
| CODA OPERATION | OLD | NEW | LOCAL |
| File creation | real 0m49.017s | real 0m40.309s | real 0m12.399s |
| File read, first | real 0m1.973s | real 0m2.145s | real 0m0.913s |
| File read, subsequent | real 0m1.111s | real 0m0.592s | real 0m0.200s (± 0.100s) |
| File deletion | real 1m10.704s | real 0m19.330s | real 0m3.987s |
So, consider all information below in the context of the above, updated statistics.
Simplified comparison against NFS
Setup: a small server (Athlon 700MHz, Debian with a 2.6.6 kernel) running both NFS and Coda, and a notebook with Debian/2.6.5 as a client. 1000 small PHP files are created in one directory, then read, then deleted. The Coda setup keeps all metadata on disk (XFS filesystem), one volume. Write-back caching isn't enabled (it doesn't seem to work currently; cfs wbstart dir hangs venus).
File creation
make.php:
<?
for ($i = 0; $i < 1000; $i++) {
    $f = fopen($i . ".test.php", 'w');
    fwrite($f, "<? /* some test comment */ ?>");
    fclose($f);
}
?>
Coda:
time php4 make.php
real 0m49.017s user 0m0.263s sys 0m0.383s
NFS:
time php4 make.php
real 0m2.891s user 0m0.212s sys 0m0.729s
Native file system (xfs):
real 0m2.088s user 0m0.239s sys 0m0.958s
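For repeating the creation benchmark on a machine without PHP, the same workload can be reproduced as a plain POSIX shell loop (a sketch; `/tmp/coda-bench` is an arbitrary target directory, which on a real test would be a directory inside the Coda or NFS mount):

```shell
# Create 1000 small files with the same contents make.php writes.
# Run under `time` to reproduce the measurement.
mkdir -p /tmp/coda-bench && cd /tmp/coda-bench
i=0
while [ "$i" -lt 1000 ]; do
    printf '<? /* some test comment */ ?>' > "$i.test.php"
    i=$((i + 1))
done
```

Timing this loop on the local filesystem first gives a baseline before pointing it at the network mount.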
During file creation and deletion, block output (the `bo` column in vmstat) on the server was extremely high (maxed out). Metadata and log are stored on an XFS filesystem; could this interfere?
Reading
run.php:
<?
for ($i = 0; $i < 1000; $i++) {
    include_once($i . '.test.php');
}
?>
Coda, first run:
real 0m1.973s user 0m0.276s sys 0m0.204s
Coda subsequent runs, average (zero load on server):
real 0m1.111s user 0m0.247s sys 0m0.197s
NFS, first run:
real 0m1.842s user 0m0.262s sys 0m0.378s
NFS subsequent runs, average (some load on server):
real 0m0.960s user 0m0.226s sys 0m0.271s
Local file system (xfs):
real 0m0.407s user 0m0.117s sys 0m0.080s
Deletion
Coda:
time rm *.test.php
real 1m10.704s user 0m0.045s sys 0m0.090s
NFS:
time rm *.test.php
real 0m2.067s user 0m0.034s sys 0m0.262s
Local file system (xfs):
real 0m1.114s user 0m0.033s sys 0m0.723s
Summary
Coda seems to have very poor write/unlink performance; read performance is pretty good (especially since the server isn't loaded up).
Possible reasons/improvements
Not really sure how to appropriately add comments, but here is a try.
If run.php was run right after make.php, everything is most likely still cached by the client, which is why the times of the first and second runs are so close. A useful command to run before the first run is cfs flushcache ., which tries to flush local copies of the files from the Coda cache.
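The cold-cache measurement suggested above looks like this (a sketch; CFS defaults to `echo cfs` here so the sequence can be previewed on a machine without a Coda client — drop the stub on a real setup):

```shell
# Flush locally cached copies, then the next read run measures a real
# fetch from the server instead of the client's venus cache.
CFS="${CFS:-echo cfs}"        # stub: prints the commands instead of running them
$CFS flushcache .             # evict cached copies of the files in this directory
# time php4 run.php           # this first run now fetches every file from the server
```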
I tried the same tests from my desktop (P4 3GHz/1GB RAM/IDE drive) against a replicated volume on two servers (PII 266MHz/256MB RAM). My timings are as follows:
make.php | 1m1.257s |
cfs flushcache . | (not timed) |
run.php | 0m5.552s |
run.php | 0m0.157s |
rm *.test.php | 0m36.673s |
The remove timings are a lot better in this case because I'm running a development version of RVM on the servers which improves deallocation of data structures in RVM. The result shows as a significant improvement on the server-side performance of operations such as unlink and rmdir.
The main difference between Coda and NFS/XFS at this point is that in 'connected mode' Coda actually synchronously commits each operation and makes sure that all data and meta-data is committed to disk on the server before returning success on an operation. However we can switch the client to 'write-disconnected' operation in which case all operations are logged locally on the client, similar to how ext3 and xfs commit their meta-data updates to a journal instead of seeking all over the disk.
This log is sent back to the server at some later point in time. Not only does this dramatically improve response time for the user, it also allows the server to commit multiple operations in each transaction which is more efficient for the server.
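Switching a volume between the two modes can be sketched as the following command sequence (CFS defaults to `echo cfs` so it can be previewed without a Coda client; `cfs strong`, which forces the client back to connected mode, is an assumption based on the Coda 6.x client utility):

```shell
CFS="${CFS:-echo cfs}"        # stub: prints the commands instead of running them
run() { $CFS "$@"; }

run wd .                      # write-disconnected: log operations locally
# ... run the write-heavy workload here, e.g. php4 make.php ...
run forcereintegrate .        # optionally push the accumulated log to the server now
run strong .                  # return to fully connected, synchronous operation
```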
Here are some timings in write-disconnected mode (cfs wd .). I also added the time it took for 'cfs forcereintegrate' to return; this is normally not necessary, but it gives some idea of how much overhead we save on the server side. Total time went from 62 seconds to 45 seconds, but as far as the application is concerned the operation was done in 4 seconds.
make.php | 0m4.034s |
cfs forcereintegrate . | 0m41.406s |
run.php | (unchanged) |
rm *.test.php | 0m1.376s |
cfs forcereintegrate . | 0m15.186s |
Now, if we don't call forcereintegrate, we make use of the fact that venus performs 'log optimizations', in which the remove operations end up cancelling out the create operations, and a whole create/remove commit cycle takes a little over 5 seconds.
make.php | 0m3.715s |
rm *.test.php | 0m1.373s |
cfs forcereintegrate . | 0m0.033s |
More performance information
The following was in a post on Apr 27, 2005 by Jan to the Coda mailing list. It explains slow write results when unpacking tar files.
> Also, there were questions about why unpacking a tar file seemed so
> slow. I speculated that coda, which is connected strongly, was
> uploading each file to the server before letting the next one unpack.
> Is this true? I was also asked if it waited for the updates to be sent
> out between the servers and I'm pretty sure that it doesn't, but I
> wanted to double-check. Would it be quicker or is there any benefit to
> disconnecting before unpacking the files?
Correct, in connected mode we don't return to the application until the file is stored on all replicas. So if you're using 2 replicas it ends up transferring twice the amount of data. On top of that, the Coda server will force all the changes to disk (and probably even flush/truncate the RVM log) before it returns to the client. In addition, the client probably performs at least 4 operations for every file (create, store, chown, chmod, possibly utimes), and there is no coalescing: every operation becomes a separate RVM transaction, along with a bunch of RVM flushing/truncating/fsyncing.
Write-disconnected mode is in this respect a lot faster. The 4-5 operations will get optimized down to just 2 or 3 (create/store/setattr). Although it could, I'm not sure if the setattr will merge with the store operation. It will also send the operations in batches of up to 100, which all get committed within a single transaction, so the server essentially has to perform about 1/167th the number of transactions.
However... in write-disconnected mode the client tries to predict what the version vectors on the server will look like, is sometimes wrong, and gets a reintegration conflict. Also, the client often sends the updates to only a single server and then triggers resolution to propagate them to all other replicas. If it sends the next batch to another server before the resolution has completed, we get yet another type of reintegration conflict. So write-disconnected mode is not really the best solution, especially for new users who are just taking their first steps with Coda.