Is 100 ms latency achievable with hServer?

#1
Assume my computation involves finding the nearest neighbours of a query point within an existing set of 500,000 data points. Each data point carries 10 independent features, and the distance metric is Euclidean distance. What rough latency can I expect with hServer, assuming the cluster has around 50 servers?
 

admin

Administrator
#2
Your question cannot be answered as written. The algorithm needs to be implemented as a MapReduce computation and then measured. In general, ScaleOut hServer can complete MapReduce computations in as little as 100 msec.

We suggest that you first test your MapReduce implementation on a small data set running on a small cluster (for example, four hosts, which can be licensed at no cost using the Community Edition). Measure the latency as you increase the data set size until you hit either a resource constraint or a latency of 100 msec. Then, if your algorithm demonstrates scalable speedup, you can add hosts and proportionally increase the data set size until you reach the desired data set size; that will determine the number of hosts you need to complete the computation in 100 msec.
It may be the case that your algorithm’s computational complexity does not allow it to demonstrate scalable speedup, i.e., throughput (measured as data points processed per second) that grows linearly as you increase both the number of data points and the number of hosts running the MapReduce implementation. In that case, the latency may simply grow too fast with the data set size to meet your 100 msec constraint.
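As a concrete starting point, here is a minimal sketch of the nearest-neighbour search expressed as a standard Hadoop MapReduce job, which hServer can execute. The record layout ("id,f1,...,f10"), the configuration key "knn.query", and the neighbour count K are assumptions for illustration; the driver that sets the query point and submits the job is not shown.

```java
import java.io.IOException;
import java.util.PriorityQueue;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class NearestNeighboursJob {

    // Mapper: computes the squared Euclidean distance from each stored point
    // to the query point and emits "distance,id" under a single key so that
    // one reducer can select the k nearest candidates.
    public static class DistanceMapper
            extends Mapper<Object, Text, NullWritable, Text> {

        private double[] query;

        @Override
        protected void setup(Context context) {
            // "knn.query" is a hypothetical key set by the driver, e.g. "0.1,0.5,...".
            String[] parts = context.getConfiguration().get("knn.query").split(",");
            query = new double[parts.length];
            for (int i = 0; i < parts.length; i++) {
                query[i] = Double.parseDouble(parts[i]);
            }
        }

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed record layout: "id,f1,f2,...,f10".
            String[] fields = value.toString().split(",");
            double d2 = 0.0;
            for (int i = 0; i < query.length; i++) {
                double diff = Double.parseDouble(fields[i + 1]) - query[i];
                d2 += diff * diff;   // squared distance preserves ordering; sqrt not needed
            }
            context.write(NullWritable.get(), new Text(d2 + "," + fields[0]));
        }
    }

    // Reducer: keeps only the k smallest distances using a bounded max-heap.
    public static class TopKReducer
            extends Reducer<NullWritable, Text, Text, DoubleWritable> {

        private static final int K = 10;   // assumed neighbour count

        @Override
        protected void reduce(NullWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            PriorityQueue<String> heap = new PriorityQueue<>(K + 1, (a, b) ->
                    Double.compare(Double.parseDouble(b.split(",", 2)[0]),
                                   Double.parseDouble(a.split(",", 2)[0])));
            for (Text v : values) {
                heap.add(v.toString());
                if (heap.size() > K) {
                    heap.poll();   // evict the farthest candidate so far
                }
            }
            for (String entry : heap) {
                String[] parts = entry.split(",", 2);
                context.write(new Text(parts[1]),
                              new DoubleWritable(Double.parseDouble(parts[0])));
            }
        }
    }
}
```

Note that with a single key, all 500,000 candidate distances funnel through one reducer. In practice you would add a combiner (with output types matching the mapper's) that pre-selects the k nearest candidates on each host, shrinking the shuffle, which is where most of the latency in this pattern tends to come from.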
 
#3
Thank you for your answer.
What kind of overhead is involved in invoking this computation with hServer? We found that with standard Hadoop MapReduce, the job invocation overhead alone is so large that 100 ms latency is impossible to achieve.
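One way to isolate that fixed invocation overhead is to time an essentially empty job. A minimal sketch, assuming a standard Hadoop installation and a tiny input file passed as args[0]:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Times an essentially empty MapReduce job to separate the engine's
// fixed invocation overhead from the cost of the computation itself.
public class OverheadProbe {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "overhead-probe");
        job.setJarByClass(OverheadProbe.class);
        job.setMapperClass(Mapper.class);                  // identity mapper, no real work
        job.setNumReduceTasks(0);                          // map-only: no shuffle or reduce
        job.setOutputFormatClass(NullOutputFormat.class);  // discard all output
        FileInputFormat.addInputPath(job, new Path(args[0])); // tiny input file

        long start = System.nanoTime();
        job.waitForCompletion(false);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Invocation overhead: " + elapsedMs + " ms");
    }
}
```

The same timing approach applies when the job is submitted through hServer instead of a standard Hadoop cluster, which is presumably where the overhead comparison matters.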
 