Small file performance

From Codawiki

Word of Caution

The Coda statistics below do not represent real Coda capabilities. I re-ran the same test scripts in March 2007 on an AMD Athlon(TM) XP 2500+, 10Mbit HUB/half-duplex link, Linux kernel 2.6.18, Ext3, Coda version 6.9.0-CVS, and the results were more favorable for Coda.

I do not know whether this is the result of improvements in the Coda code, the kernel, the network link, the hardware, or all of them. In any case, the numbers were better even on an old 10Mbit/half-duplex/HUB LAN link (and the network is the biggest factor, as the third, "LOCAL", column of the table below shows).

Here is a quick comparison of the results. Column "OLD" holds the numbers given in the initial benchmark test, "NEW" holds my results from March 2007, and "LOCAL" holds the March 2007 results with client and server on the same machine.

CODA OPERATION             OLD                NEW               LOCAL
File creation    :  real 0m49.017s  ,  real 0m40.309s  ,  real 0m12.399s
File read first  :  real 0m1.973s   ,  real 0m2.145s   ,  real 0m0.913s
File read subseq.:  real 0m1.111s   ,  real 0m0.592s   ,  real 0m0.200s (+- 0.100s)
File deletion    :  real 1m10.704s  ,  real 0m19.330s  ,  real 0m3.987s

So, consider all information below in the context of the above, updated statistics.
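
Because each test touches 1000 files, the totals in the table translate directly into a per-file cost. The following is only a sketch of that arithmetic: `parse_time`, `ms_per_file` and the `results` dictionary are names made up for this illustration, and the durations are copied from the creation and deletion rows of the table.

```python
import re

def parse_time(text):
    """Convert a `time`-style duration such as '1m10.704s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

FILES = 1000  # each test creates, reads or deletes 1000 small files

# "real" times copied from the comparison table
results = {
    "creation": {"OLD": "0m49.017s", "NEW": "0m40.309s", "LOCAL": "0m12.399s"},
    "deletion": {"OLD": "1m10.704s", "NEW": "0m19.330s", "LOCAL": "0m3.987s"},
}

def ms_per_file(duration):
    """Average wall-clock cost of one file operation, in milliseconds."""
    return parse_time(duration) / FILES * 1000.0

for operation, columns in results.items():
    for column, duration in columns.items():
        print(f"{operation:9s} {column:5s} {ms_per_file(duration):7.2f} ms/file")
```

With 1000 files, the total in seconds and the per-file cost in milliseconds are numerically the same, so the OLD deletion run works out to roughly 70 ms per unlink.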

Simplified comparison against NFS

Setup: a small server (Athlon 700MHz, Debian with a 2.6.6 kernel) running both NFS and Coda, and a notebook with Debian/2.6.5 as the client. 1000 small PHP files are created in one directory, then read, then deleted. The Coda setup has all metadata on disk (XFS filesystem) and a single volume. Writeback caching isn't enabled (it doesn't seem to work currently; cfs wbstart dir hangs venus).

File creation

make.php:

<?
for($i=0;$i<1000;$i++) {
        $f = fopen($i.".test.php",'w');
        fwrite($f, "<? /* some test comment */ ?>");
        fclose($f);
}
?>

Coda:

time php4 make.php
 
real    0m49.017s
user    0m0.263s
sys     0m0.383s

NFS:

time php4 make.php
 
real    0m2.891s
user    0m0.212s
sys     0m0.729s

Native file system (xfs):

real    0m2.088s
user    0m0.239s
sys     0m0.958s

During file creation and deletion, bo (blocks written out, presumably as reported by vmstat) on the server was extremely high (maxed out). Metadata and log are stored on an XFS filesystem; could this interfere?

Reading

run.php:

<?
for($i=0;$i<1000;$i++) {
        include_once($i.'.test.php');
}
?>

Coda, first run:

real    0m1.973s
user    0m0.276s
sys     0m0.204s

Coda subsequent runs, average (zero load on server):

real    0m1.111s
user    0m0.247s
sys     0m0.197s

NFS, first run:

real    0m1.842s
user    0m0.262s
sys     0m0.378s

NFS subsequent runs, average (some load on server):

real    0m0.960s
user    0m0.226s
sys     0m0.271s

Local file system (xfs):

real    0m0.407s
user    0m0.117s
sys     0m0.080s

Deletion

Coda:

time rm *.test.php
 
real    1m10.704s
user    0m0.045s
sys     0m0.090s

NFS:

time rm *.test.php
 
real    0m2.067s
user    0m0.034s
sys     0m0.262s

Local file system (xfs):

real    0m1.114s
user    0m0.033s
sys     0m0.723s

Summary

Coda seems to have very poor write/unlink performance; read performance is pretty good (especially as the server isn't loaded up).
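
To put numbers on that summary, the Coda and NFS timings can be divided directly. This is just a sketch of the arithmetic using the "real" times from the runs above; the dictionary keys and the `slowdown` helper are illustrative names.

```python
# "real" times in seconds, copied from the Coda and NFS runs above
coda = {"create": 49.017, "read (first)": 1.973, "read (cached)": 1.111, "delete": 70.704}
nfs = {"create": 2.891, "read (first)": 1.842, "read (cached)": 0.960, "delete": 2.067}

def slowdown(operation):
    """How many times longer Coda took than NFS for the same operation."""
    return coda[operation] / nfs[operation]

for operation in coda:
    print(f"{operation:14s} Coda took {slowdown(operation):5.1f}x the NFS time")
```

Creation comes out at roughly 17 times the NFS time and deletion at roughly 34 times, while both read cases stay within about 20% of NFS.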

Possible reasons/improvements


Not really sure how to appropriately add comments, but here is a try.

If run.php was run right after make.php, everything is most likely still cached by the client, which is why the times taken by the first and second runs are so close. A useful command to run before the first run is cfs flushcache . which tries to flush local copies of the files from the Coda cache.

I ran the same tests from my desktop (P4 3GHz/1GB RAM/IDE drive) against a volume replicated on 2 servers (PII 266MHz/256MB RAM). My timings are as follows:

make.php 1m1.257s
cfs flushcache . (not timed)
run.php 0m5.552s
run.php 0m0.157s
rm *.test.php 0m36.673s

The remove timings are a lot better in this case because I'm running a development version of RVM on the servers, which improves deallocation of data structures in RVM. The result shows up as a significant improvement in the server-side performance of operations such as unlink and rmdir.

The main difference between Coda and NFS/XFS at this point is that in 'connected mode' Coda actually commits each operation synchronously and makes sure that all data and meta-data is committed to disk on the server before returning success on an operation. However, we can switch the client to 'write-disconnected' operation, in which case all operations are logged locally on the client, similar to how ext3 and xfs commit their meta-data updates to a journal instead of seeking all over the disk.

This log is sent back to the server at some later point in time. Not only does this dramatically improve response time for the user, it also allows the server to commit multiple operations in each transaction which is more efficient for the server.

Here are some timings in write-disconnected mode (cfs wd .). I also added the time it took for 'cfs forcereintegrate' to return; this is normally not necessary, but it gives some idea of how much overhead we save on the server side. For make.php the total time went from about 61 seconds to 45 seconds, but as far as the application is concerned the operation was done in 4 seconds.

make.php 0m4.034s
cfs forcereintegrate . 0m41.406s
run.php (unchanged)
rm *.test.php 0m1.376s
cfs forcereintegrate . 0m15.186s
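
The comparison between the two modes can be rechecked from the listed timings. The sketch below only adds up numbers already given on this page (the connected-mode figures are the 1m1.257s and 0m36.673s from the earlier run); the variable names are made up for the illustration.

```python
def seconds(minutes, secs):
    """Convert a minutes/seconds pair from `time` output to seconds."""
    return minutes * 60 + secs

# Connected mode: the application waits for the server on every operation
connected_make = seconds(1, 1.257)
connected_rm = seconds(0, 36.673)

# Write-disconnected mode: the application only waits for the local log;
# the reintegration cost is paid later by cfs forcereintegrate
wd_make, wd_make_reintegrate = seconds(0, 4.034), seconds(0, 41.406)
wd_rm, wd_rm_reintegrate = seconds(0, 1.376), seconds(0, 15.186)

wd_make_total = wd_make + wd_make_reintegrate
wd_rm_total = wd_rm + wd_rm_reintegrate

print(f"make.php: {connected_make:.1f}s connected, "
      f"{wd_make_total:.1f}s total, {wd_make:.1f}s seen by the application")
print(f"rm:       {connected_rm:.1f}s connected, "
      f"{wd_rm_total:.1f}s total, {wd_rm:.1f}s seen by the application")
```

The total for make.php drops by about a quarter, but the time the application itself waits drops by a factor of about 15.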

Now if we don't call forcereintegrate between the create and remove steps, we make use of the fact that venus performs 'log optimizations', in which the remove operations end up cancelling out the create operations, and a whole create/remove commit cycle takes a little over 5 seconds.

make.php 0m3.715s
rm *.test.php 0m1.373s
cfs forcereintegrate . 0m0.033s
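
The cancelling effect can be pictured with a toy model. This is not venus's actual data structure or algorithm, only a minimal sketch under the assumption that the client log is a list of (operation, filename) records and that each file written by make.php is logged as a create followed by a store. The function name `optimize` is made up for this example.

```python
def optimize(log):
    """Drop all records for files that are created and later removed in the log.

    Toy model only: a file created inside the log never existed on the
    server, so removing it makes its whole logged history unnecessary.
    """
    kept = []      # records that still have to reach the server
    pending = {}   # filename -> indices in `kept` since that file's create
    for op, name in log:
        if op == "create":
            pending[name] = [len(kept)]
            kept.append((op, name))
        elif op == "remove" and name in pending:
            # Cancel the create and everything logged for the file since
            for index in pending.pop(name):
                kept[index] = None
        else:
            if name in pending:
                pending[name].append(len(kept))
            kept.append((op, name))
    return [record for record in kept if record is not None]

# Replay the benchmark: create and fill 1000 files, then remove them all
log = []
for i in range(1000):
    log.append(("create", f"{i}.test.php"))
    log.append(("store", f"{i}.test.php"))
for i in range(1000):
    log.append(("remove", f"{i}.test.php"))

print(len(log), "records logged,", len(optimize(log)), "left to reintegrate")
```

In this model nothing is left to send, which fits the near-zero forcereintegrate time above. A remove of a file that already existed on the server is kept, because there is no create in the log to cancel it against.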

More performance information

The following was in a post on Apr 27, 2005 by Jan to the Coda mailing list. It explains slow write results when unpacking tar files.

>       Also, there were questions about why unpacking a tar file seemed so
> slow.  I speculated that coda, which is connected strongly, was
> uploading each file to the server before letting the next one unpack.
> Is this true?  I was also asked if it waited for the updates to be sent
> out between the servers and I'm pretty sure that it doesn't, but I
> wanted to double-check.  Would it be quicker or is there any benefit to
> disconnecting before unpacking the files?

Correct, in connected mode we don't return to the application until the file is stored on all replicas. So if you're using 2 replicas it will end up transferring twice the amount of data. On top of that, the Coda server will force all the changes to disk (and probably even flush/truncate the RVM) before it returns to the client. In addition, the client probably performs at least 4 operations for every file (create, store, chown, chmod, possibly utimes), and there is no coalescing: every operation becomes a separate RVM transaction, along with a bunch of RVM flushing/truncating/fsyncing.

Write-disconnected mode is a lot faster in this respect. The 4-5 operations will get optimized to just 2 or 3 (create/store/setattr). I'm not sure whether the setattr will merge with the store operation, although it could. It will also send the operations in batches of up to 100, which all get committed within a single transaction, so the server essentially has to perform about 1/167th the number of transactions.
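
The 1/167th figure is not derived in the post. One reading that reproduces it takes the upper ends of the ranges given: 5 operations per file in connected mode, each its own transaction, against 3 operations per file in write-disconnected mode, committed 100 at a time. The sketch below works that out; the file count of 1000 and the helper name `transactions` are assumptions made for the illustration.

```python
import math

def transactions(files, ops_per_file, ops_per_transaction):
    """Number of server transactions needed to commit all operations."""
    return math.ceil(files * ops_per_file / ops_per_transaction)

files = 1000  # hypothetical size of the unpacked tar file

# Connected mode: create, store, chown, chmod, utimes; one transaction each
connected = transactions(files, ops_per_file=5, ops_per_transaction=1)

# Write-disconnected: create/store/setattr, batches of up to 100 operations
write_disconnected = transactions(files, ops_per_file=3, ops_per_transaction=100)

ratio = connected / write_disconnected
print(f"{connected} vs {write_disconnected} transactions, ratio about {ratio:.0f}")
```

With 4 operations per file optimized down to 2, the same calculation gives a ratio of 200, so the quoted figure is best read as an order-of-magnitude estimate.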

However... in write-disconnected mode the client tries to predict what the version-vectors on the server will look like, and it is sometimes wrong and gets a reintegration conflict. Also, the client often sends the updates to only a single server and then triggers resolution to propagate them to all other replicas. If it sends the next batch to another server before the resolution has completed, we get yet another type of reintegration conflict. So write-disconnected mode is not really the best solution, especially for new users who are just taking their first steps with Coda.