Tagged Questions

The advantage of MapReduce is that it allows for distributed processing of the map and reduction operations. Provided each mapping operation is independent of the others, all maps can be performed in parallel, though in practice this is limited by the data source and/or the number of CPUs near that ...
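
As a rough illustration of the description above, here is a minimal word-count sketch in plain Python. It is not Hadoop's or MongoDB's API; the names `map_chunk`, `merge_counts`, and `word_count` are hypothetical, and a thread pool stands in for the cluster nodes that a real framework would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk):
    """Map step: count the words in one chunk, independently of every other chunk."""
    return Counter(chunk.split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(chunks, workers=4):
    """Run the independent map steps concurrently, then reduce the partial results.

    `workers` is an illustrative parameter; it caps how many maps run at once,
    much as the number of available CPUs does in the description above.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

Because each call to `map_chunk` touches only its own chunk, the maps can run in any order or at the same time; only the reduce step needs to see all the partial results.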


2
votes
0 answers
20 views

Mongo-Hadoop simple test failing with NPE

This is an open issue posted in the support forum here but since I didn't get any response, I thought I should try asking here. I have an existing application that uses MongoDB as the data layer. ...
0
votes
1 answer
33 views

Graph Clustering for almost Clustered Graph by removing nodes (vertices)

I want to carry out Graph Clustering in a huge undirected graph with millions of edges and nodes. Graph is almost clustered with different clusters joined together only by some nodes(kind of ambiguous ...
-1
votes
0 answers
6 views

mapreduce exception

I am running a Java MapReduce job. The map() does only very light processing. I get the exception below in the log as a warning/error. After this, I see lot of ...
0
votes
0 answers
13 views

zookeeper jar not found in HBase MR job

I have a web UI that tries to spawn a MR job on HBase table. I keep getting this error though: java.io.FileNotFoundException: File ...
0
votes
0 answers
12 views

Pig Script Knocking Data Nodes Offline

I'm running the following Pig script on a 12 node Hadoop cluster with 30 map/reduce tasks per node, each task having 2GB of memory: A = LOAD '/path/to/gzipped/logs' USING PigStorage('\t'); B = ...
0
votes
1 answer
18 views

Data Versioning (Hadoop, HDFS, Hbase backends)

I wonder how to version Data in Hadoop/HDFS/Hbase. It should be part of your model as changes are very likely (big-data is collected over a long time). Main Example for HDFS (file based backend). ...
0
votes
0 answers
12 views

EOFException at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)

I was trying to run a matrix multiplication example presented by Mr. Norstadt at the following link: http://www.norstad.org/matrix-multiply/index.html. I can run it successfully with Hadoop 0.20.2 but I ...
0
votes
0 answers
14 views

hadoop streaming getting optimal number of slots

I have a streaming map-reduce job. I have some 30 slots for processing. Initially I get a single input file containing 60 records (fields are tab separated), first field of every record is a number, ...
0
votes
0 answers
13 views

Amazon MapReduce for Visual Studio Sample for Beginners

Does anyone have any sample code/ getting started tips for using Amazon Map Reduce in Visual Studio? I am new to MapReduce AND the Amazon SDK for .NET, so I am looking for something that makes it ...
0
votes
0 answers
12 views

How can image indexing be done in a map-reduce framework using Lucene Image Retrieval APIs?

How can image indexing be done in a map-reduce framework using the Lucene Image Retrieval (LIRe) APIs? What algorithms can be used?
0
votes
1 answer
35 views

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

I am new to Hadoop and I am trying to execute the wordcount example. I have a cluster of 4 nodes made up of virtual machines on my computer. Every time, the job completes the map task but the reduce task ...
1
vote
1 answer
35 views

What types/classes of algorithms can be recast in the MapReduce paradigm?

A few 'quick questions': what types/classes of algorithms can be recast in the MapReduce paradigm? (e.g., k-means has an MR implementation) Are there any that can't be expressed in this way? What ...
3
votes
1 answer
35 views

Mongo map/reduce slowdown on large collections

We have a seemingly simple map/reduce job that goes through logging data on a daily basis. On the development server, we can run this job over a very large number of documents, ~1M, and it takes about ...
0
votes
1 answer
23 views

Puzzling behaviour of two seemingly identical MapReduce functions

Our MongoDB database contains a list of all user accounts, where each new registration has a 'created_at' field in the account document with the current date and time when it was created. We wanted ...
0
votes
2 answers
25 views

How to force Hadoop to process more data per map

I have a job that is running very slowly because I think Hadoop is creating too many map tasks for the size of the data. I read on some websites that it's efficient for fewer maps to process bigger ...
