RCF provides two basic performance guarantees for remote calls: zero copy and zero allocation.
RCF makes no internal copies of data while sending or receiving, either on the server or the client. However, serialization often forces copies to be made. For instance, deserializing a std::string can't be done without copying the string contents, because std::string always allocates its own storage. The same goes for std::vector<>. RCF provides a workaround for this issue, via the RCF::ByteBuffer class. The contents of a ByteBuffer are not copied upon serialization or deserialization; instead, RCF uses scatter/gather style semantics to send and receive the contents directly.
To improve performance, you should use RCF::ByteBuffer whenever transferring large chunks of untyped data. On typical hardware, transferring multiple megabytes of data in a single call is not a problem.
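As a sketch of how this looks in practice, the snippet below allocates data directly in a ByteBuffer and passes it to a remote call. The I_FileTransfer interface and its upload() method are hypothetical, and the ByteBuffer length constructor and getPtr()/getLength() accessors are assumed here:

#include <cstring>

// Hypothetical interface for sending a chunk of untyped data.
RCF_BEGIN(I_FileTransfer, "I_FileTransfer")
    RCF_METHOD_V1(void, upload, RCF::ByteBuffer)
RCF_END(I_FileTransfer)

void uploadChunk(RcfClient<I_FileTransfer> & client)
{
    // Allocate a 1 MB buffer directly in a ByteBuffer, and fill it in
    // place, rather than filling a std::string or std::vector<> first.
    RCF::ByteBuffer byteBuffer(1024*1024);
    memset(byteBuffer.getPtr(), 0, byteBuffer.getLength());

    // The buffer contents are sent as-is, without being copied into
    // any intermediate serialization buffers.
    client.upload(byteBuffer);
}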
Zero allocation means that RCF, whether on the server or the client, will not make any heap allocations, except when processing the first call on a connection, or when processing a call carrying more data than any previous call on the same connection. In particular, if a remote call is made twice with identical parameters, on the same connection, RCF guarantees that it will not make any heap allocations on the second call, either on the client or on the server:
#include <cstdlib>
#include <stdexcept>

bool gExpectAllocations = true;

// Override global operator new so we can intercept heap allocations.
void *operator new(size_t bytes)
{
    if (!gExpectAllocations)
    {
        throw std::runtime_error("Unexpected heap allocation!");
    }
    return malloc(bytes);
}

void operator delete(void *pv) throw()
{
    free(pv);
}

// Override global operator new[] so we can intercept heap allocations.
void *operator new[](size_t bytes)
{
    if (!gExpectAllocations)
    {
        throw std::runtime_error("Unexpected heap allocation!");
    }
    return malloc(bytes);
}

void operator delete[](void *pv) throw()
{
    free(pv);
}
RCF_BEGIN(I_Echo, "I_Echo")
    RCF_METHOD_R1(RCF::ByteBuffer, echo, RCF::ByteBuffer)
RCF_END(I_Echo)

class Echo
{
public:
    RCF::ByteBuffer echo(RCF::ByteBuffer byteBuffer)
    {
        return byteBuffer;
    }
};
RcfClient<I_Echo> client(( RCF::TcpEndpoint(port) ));

// Payload to send back and forth.
RCF::ByteBuffer byteBuffer(1024);

// First call will trigger some heap allocations.
gExpectAllocations = true;
client.echo(byteBuffer);

// These calls won't trigger any client-side or server-side heap allocations.
gExpectAllocations = false;
for (std::size_t i=0; i<10; ++i)
{
    RCF::ByteBuffer byteBuffer2 = client.echo(byteBuffer);
}
However, the serialization functions of the data types involved in a remote call may make heap allocations of their own. To eliminate allocations associated with deserialization, RCF provides server-side object caching.
Serialization and deserialization of remote call parameters can become a performance bottleneck. In particular, deserialization of a complex datatype involves not only constructing the object to begin with, but also a number of memory allocations (and CPU cycles) when deserializing all the fields and subfields of the object.
To improve performance in these circumstances, RCF provides a server-side cache of objects used during remote calls. Objects used as parameters in one remote call can be transparently reused in subsequent calls, so the construction overhead and memory allocations due to deserialization can be eliminated in those calls.
For example:
RCF_BEGIN(I_Echo, "I_Echo")
    RCF_METHOD_R1(std::string, echo, std::string)
RCF_END(I_Echo)

class Echo
{
public:
    std::string echo(const std::string & s)
    {
        return s;
    }
};
Echo echo;
RCF::RcfServer server( RCF::TcpEndpoint(0) );
server.bind<I_Echo>(echo);
server.start();
int port = server.getIpServerTransport().getPort();

RCF::ObjectPool & cache = RCF::getObjectPool();

// Enable caching for std::string.
// * Don't cache more than 10 std::string objects.
// * Call std::string::clear() before putting a string into the cache.
cache.enableCaching<std::string>(10, boost::bind(&std::string::clear, _1));

std::string s1 = "123456789012345678901234567890";
std::string s2;

RcfClient<I_Echo> client(( RCF::TcpEndpoint(port) ));

// First call.
s2 = client.echo(s1);

// Subsequent calls - no memory allocations at all, in RCF runtime, or
// in std::string serialization/deserialization, on client or server.
for (std::size_t i=0; i<100; ++i)
{
    s2 = client.echo(s1);
}

// Disable caching for std::string.
cache.disableCaching<std::string>();
In this example, the first call to echo() will cause several server-side deserialization-related memory allocations - one to construct a std::string, and another to expand the internal buffer of the string to fit the incoming data.
With server-side object caching enabled, after the call returns, the server-side string is cleared and then held in a cache, rather than being destroyed. On the next call, instead of constructing a new std::string, RCF reuses the std::string in the cache. Upon deserialization, std::string::resize() is called to fit the incoming data. As this particular string object has already held the requested amount of data earlier, the resize() request does not result in any memory allocation.
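The reason the resize() call is allocation-free comes down to standard std::string behavior: clear() removes the contents but, on mainstream implementations, leaves the allocated capacity in place. A minimal standalone sketch of that behavior:

#include <cassert>
#include <string>

int main()
{
    std::string s(1000, 'x');       // Capacity grows to hold 1000 chars.
    std::size_t cap = s.capacity();

    s.clear();                      // Contents removed; capacity typically retained.
    assert(s.capacity() == cap);    // Holds on mainstream implementations.

    s.resize(1000);                 // Fits within existing capacity - no allocation.
    return 0;
}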
The server-side object cache is configured on a per-type basis, using the ObjectPool::enableCaching<>() and ObjectPool::disableCaching<>() functions. For each cached datatype, you can specify the maximum number of objects to cache, and which function to call to put the objects in a reusable state.
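As an illustrative sketch, assuming remote calls that take std::vector<char> parameters and the same enableCaching<>() signature as in the example above, caching could be configured for a second type like this:

RCF::ObjectPool & cache = RCF::getObjectPool();

// Cache up to 25 std::vector<char> objects, clearing each one before
// it is returned to the cache. clear() removes the elements but
// typically retains the vector's allocated capacity.
cache.enableCaching< std::vector<char> >(
    25,
    boost::bind(&std::vector<char>::clear, _1));

// ... remote calls taking std::vector<char> parameters ...

cache.disableCaching< std::vector<char> >();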
The zero copy and zero allocation guarantees affect the execution of a single remote call. Another performance factor is the maximum number of concurrent clients a server can support. RCF's server transport implementation is based on Boost.Asio, which leverages native network APIs when they are available (I/O completion ports on Windows, epoll() on Linux, /dev/poll on Solaris, kqueue() on FreeBSD), and less performant APIs when they aren't (BSD sockets). The number of clients a server can support is essentially determined by how much memory the server has, and by the application-specific resources required by each client.
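As a sketch of how server-side concurrency can be tuned, a server can be given a dynamically sized thread pool. The RCF::ThreadPool class and RcfServer::setThreadPool() function used here are assumptions - they appear in later RCF releases, and the threading API may differ in the version these examples target:

RCF::RcfServer server( RCF::TcpEndpoint(50001) );

// Service a large number of concurrent clients with a dynamically
// sized thread pool: at least 1 thread, growing to at most 25 under load.
RCF::ThreadPoolPtr threadPoolPtr( new RCF::ThreadPool(1, 25) );
server.setThreadPool(threadPoolPtr);

server.start();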
RCF has been designed to yield minimal performance overhead, with network-intensive, high-throughput applications in mind. Keep in mind that bottlenecks in distributed systems tend to be determined by the overall design of the system - a poorly designed distributed system will hit its performance ceiling well before the communications layer reaches its limits.