Hello,
I am currently evaluating hServer with my Hadoop installation. The first thing I tried was to modify the standard Hadoop TeraGen step of the TeraSort benchmark to use GridOutputFormat, so that it skips writing data to disk and puts it directly into hServer. Following the tutorial, I just changed the task output format to GridOutputFormat and started TeraGen to generate 10 GB of input.
The results were pretty unexpected to me: in the soss server statistics I see a throughput of ~10k creates/sec, which is way slower than HDFS. This number does not change significantly with the number of reduce tasks (i.e. the number of threads inserting records per server).
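To put that rate in perspective, here is a rough back-of-the-envelope estimate. It assumes TeraGen's standard 100-byte rows, that 10 GB means 10^10 bytes, and that ~10k creates/sec is the aggregate rate across all servers; the statistics output does not confirm that last point, so treat the result as an order-of-magnitude figure only.

```python
# Rough estimate of how long the 10 GB TeraGen run takes at the observed rate.
# Assumptions (not confirmed by the soss statistics): 100-byte TeraGen rows,
# 10 GB = 10**10 bytes, and 10k creates/sec as the aggregate rate.
ROW_BYTES = 100
TOTAL_BYTES = 10 * 10**9
CREATES_PER_SEC = 10_000

rows = TOTAL_BYTES // ROW_BYTES                 # number of records to insert
seconds = rows / CREATES_PER_SEC                # total insertion time
mb_per_sec = CREATES_PER_SEC * ROW_BYTES / 1e6  # effective payload throughput

print(f"rows to insert:       {rows:,}")
print(f"estimated time:       {seconds:,.0f} s (~{seconds / 3600:.1f} h)")
print(f"effective throughput: {mb_per_sec:.1f} MB/s")
```

Under these assumptions the run needs about 10^8 creates, which works out to roughly 1 MB/s of payload, far below what a 10 Gbit network can carry.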
I suspect there is something wrong with my configuration. Are there any configuration parameters I should set, or verify are set, to speed up the insertion rate?
I am running 3 x 8-core boxes with 96 GB RAM and a 10 Gbit network, on Linux.
Thanks.