Improving bulk insert performance

admin

Administrator
#1
How do I load large numbers of objects into the in-memory data grid as quickly as possible from my .NET client application?
 

admin

Administrator
#1
A number of strategies can be employed to improve bulk load operations:
  • ScaleOut StateServer was designed to handle a large number of concurrent operations across a farm, so consider using .NET’s Task Parallel Library to insert and read the objects in parallel (using Parallel.ForEach loops, for example). This will take advantage of all of the CPU cores on your client system.
  • If the Task Parallel Library is not an option (perhaps your application is running in a version of .NET prior to 4.0), use .NET's ThreadPool to queue load operations as work items, or consider using the NamedCache.Add overload that takes an IDictionary as a parameter. This method takes a collection of objects and inserts them into the in-memory data store in parallel as individual objects.
  • To take better advantage of parallelism in the client performing the load, increase the size of the connection pool in ScaleOut StateServer's client libraries by editing the soss_client_params.txt file and increasing the max_svr_conn setting to a higher value (up to 32). Also, in your application, set the DataAccessor.ConcurrentRequests static property to a matching value when your application first starts up to increase the size of the managed pool of connections.
  • To improve insert performance, consider disabling the in-process client cache that is maintained by the ScaleOut client libraries (NamedCache.AllowClientCaching = false). The client cache can improve object retrieval performance for large objects, but it does not help for applications that only perform bulk load operations. Since the client cache can add a little overhead during heavy, parallel insert operations, you may opt to disable it.
  • Avoid using strings as keys. String keys are a convenient way to identify your objects in the in-memory data grid, but, behind the scenes, they must be encoded, hashed, and sent to the ScaleOut service for storage. Byte arrays (up to 32 bytes) or Guids are much more efficient as keys than arbitrary-length strings, so switching to them can give a noticeable performance improvement.
  • Consider using custom serialization. By default, the ScaleOut APIs use .NET’s standard BinaryFormatter to serialize your cached objects. While convenient, the BinaryFormatter may not offer the best performance for your object types. The NamedCache.SetCustomSerialization method allows you to plug in custom serialization/deserialization methods.
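
To make the first and fifth suggestions concrete, here is a minimal sketch of a parallel load that uses Guid keys. It is not ScaleOut-specific code: the `store` dictionary is a stand-in for the data grid so that the example runs on its own, and the item count, payload strings, and degree of parallelism are illustrative assumptions. It uses C# top-level statements (C# 9 or later). In a real client, the line marked in the loop body would be replaced with the insert call on your named cache.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Stand-in for the in-memory data grid (assumption): a thread-safe
// in-process dictionary, used only so this sketch is self-contained.
var store = new ConcurrentDictionary<Guid, string>();

// Hypothetical payload: 10,000 objects, each identified by a Guid key
// instead of an arbitrary-length string key.
List<KeyValuePair<Guid, string>> items = Enumerable.Range(0, 10_000)
    .Select(i => new KeyValuePair<Guid, string>(Guid.NewGuid(), $"object-{i}"))
    .ToList();

// Cap the number of concurrent inserts. The value 16 is illustrative;
// keeping it close to the client's connection pool size is an assumption,
// not a documented requirement.
var options = new ParallelOptions { MaxDegreeOfParallelism = 16 };

// Insert the items in parallel across the available CPU cores.
Parallel.ForEach(items, options, item =>
{
    // Replace this line with the grid insert call in a real client.
    store[item.Key] = item.Value;
});

Console.WriteLine($"Loaded {store.Count} objects");
```

The body of the loop must be safe to call from multiple threads at once, which is why the stand-in uses ConcurrentDictionary rather than a plain Dictionary.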
 