Tagged Questions

info newest faq votes active unanswered

The advantage of MapReduce is that it allows for distributed processing of the map and reduce operations. Provided each mapping operation is independent of the others, all maps can be performed in parallel, though in practice this is limited by the data source and/or the number of CPUs near that ...
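The sketch below is a rough, in-memory illustration of why independent map operations can run in parallel. It is written in Python. The names `map_reduce`, `word_count_mapper` and `word_count_reducer` are hypothetical and do not belong to Hadoop or any other framework. A thread pool stands in for a cluster. Because of CPython's GIL, this shows the structure rather than a real speed-up. CPU-bound work would need a `ProcessPoolExecutor` with top-level (picklable) functions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


# Hypothetical helper: a single-process stand-in for a MapReduce framework.
def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[Hashable, int]]],
    reducer: Callable[[Hashable, List[int]], int],
    workers: int = 4,
) -> Dict[Hashable, int]:
    # Map phase: each record is processed independently of the others,
    # so the mapper calls can be handed to a pool and run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda record: list(mapper(record)), records))

    # Shuffle phase: group every intermediate value under its key.
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: each key's values are combined independently.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line: str) -> Iterable[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word, case-folded.
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    # Total occurrences of one word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The end"]
    print(map_reduce(lines, word_count_mapper, word_count_reducer))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}
```

The shuffle step is the only point where results from different map calls meet. That is why the maps themselves never need to coordinate.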

learn more… | top users | synonyms

votes

0 answers

20 views

Mongo-Hadoop simple test failing with NPE

This is an open issue that I posted in the support forum, but since I didn't get any response there, I thought I should try asking here. I have an existing application that uses MongoDB as the data layer. ...

asked 3 hours ago

maxsap
2141031

votes

1 answer

33 views

Graph clustering for an almost-clustered graph by removing nodes (vertices)

I want to carry out graph clustering on a huge undirected graph with millions of edges and nodes. The graph is almost clustered, with different clusters joined together only by some nodes (kind of ambiguous ...

asked 5 hours ago

Shatu
155110

-1

votes

0 answers

6 views

MapReduce exception

I am running a Java MapReduce job. The map() does some very small processing. I get the following exception (see the log warning/error below). After this, I see a lot of ...

google-app-engine mapreduce

asked 5 hours ago

aswath
13

votes

0 answers

13 views

zookeeper jar not found in HBase MR job

I have a web UI that tries to spawn an MR job on an HBase table. I keep getting this error, though: java.io.FileNotFoundException: File ...

mapreduce hbase zookeeper

asked 16 hours ago

figaro
1196

votes

0 answers

12 views

Pig Script Knocking Data Nodes Offline

I'm running the following Pig script on a 12-node Hadoop cluster with 30 map/reduce tasks per node, each task having 2GB of memory: A = LOAD '/path/to/gzipped/logs' USING PigStorage('\t'); B = ...

hadoop mapreduce pig

asked 18 hours ago

Lucas
335

votes

1 answer

18 views

Data Versioning (Hadoop, HDFS, Hbase backends)

I wonder how to version data in Hadoop/HDFS/HBase. It should be part of your model, as changes are very likely (big data is collected over a long time). Main example for HDFS (file-based backend). ...

hadoop mapreduce versioning hbase bigdata

asked 22 hours ago

manuel aldana
2,626517

votes

0 answers

12 views

EOFException at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)

I was trying to run a matrix multiplication example presented by Mr. Norstad at the following link: http://www.norstad.org/matrix-multiply/index.html. I can run it successfully with Hadoop 0.20.2 but I ...

java hadoop mapreduce

asked yesterday

waqas
626

votes

0 answers

14 views

Hadoop streaming: getting the optimal number of slots

I have a streaming MapReduce job. I have about 30 slots for processing. Initially I get a single input file containing 60 records (fields are tab-separated); the first field of every record is a number, ...

hadoop mapreduce hadoop-streaming

asked yesterday

sunillp
414

votes

0 answers

13 views

Amazon MapReduce for Visual Studio Sample for Beginners

Does anyone have any sample code or getting-started tips for using Amazon MapReduce in Visual Studio? I am new to MapReduce AND the Amazon SDK for .NET, so I am looking for something that makes it ...

visual-studio-2010 amazon-web-services mapreduce amazon-emr

asked yesterday

user1416088
1

votes

0 answers

12 views

How can image indexing be done in map-reduce framework using Lucene Image REtrieval APIs?

How can image indexing be done in map-reduce framework using Lucene Image Retrieval (LIRe) APIs? What algorithms can be used?

java hadoop mapreduce

asked 2 days ago

the_silent_lord
144

votes

1 answer

35 views

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

I am new to Hadoop and I am trying to run the WordCount example. I have a cluster of 4 nodes made of virtual machines on my computer. Every time, the job completes the map task but the reduce task ...

hadoop mapreduce

asked 2 days ago

DB cooper
104

vote

1 answer

35 views

What types/classes of algorithms can be recast in the MapReduce paradigm?

A few quick questions: what types/classes of algorithms can be recast in the MapReduce paradigm? (For example, k-means has an MR implementation.) Are there any that can't be expressed in this way? What ...

algorithm parallel-processing hadoop mapreduce

asked 2 days ago

user7289
462519
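The k-means case mentioned in this question fits the paradigm because one iteration splits into two parts. The first is an independent step per point, and the second is an aggregation per cluster. The Python sketch below runs a single iteration, with hypothetical function names, in-memory data only, and plain Euclidean distance assumed. The map step keys each point by its nearest centroid, and the reduce step averages each cluster into a new centroid. A real job would repeat the iteration until the centroids stop moving.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_map(point: Point, centroids: Sequence[Point]) -> Tuple[int, Tuple[Point, int]]:
    # Map: key = index of the nearest centroid, value = (point, count of 1).
    nearest = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values: List[Tuple[Point, int]]) -> Point:
    # Reduce: sum coordinates and counts, then divide to get the new centroid.
    # Summing (coordinates, count) pairs is associative, so a combiner could
    # pre-aggregate partial sums before they reach this reducer.
    dim = len(values[0][0])
    sums = [0.0] * dim
    total = 0
    for point, count in values:
        for d in range(dim):
            sums[d] += point[d]
        total += count
    return tuple(s / total for s in sums)


def kmeans_iteration(points: Sequence[Point], centroids: Sequence[Point]) -> List[Point]:
    groups: Dict[int, List[Tuple[Point, int]]] = defaultdict(list)
    for point in points:  # each map call is independent, so this loop parallelizes
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)
    new_centroids = list(centroids)  # a centroid with no points keeps its position
    for key, values in groups.items():
        new_centroids[key] = kmeans_reduce(values)
    return new_centroids


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_iteration(points, [(0.0, 0.0), (10.0, 10.0)]))
    # [(0.0, 0.5), (10.0, 10.5)]
```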

votes

1 answer

35 views

Mongo map/reduce slowdown on large collections

We have a seemingly simple map/reduce job that goes through logging data on a daily basis. On the development server, we can run this job over a very large number of documents, ~1M, and it takes about ...

mongodb mapreduce

asked 2 days ago

Spencer
1425

votes

1 answer

23 views

Puzzling behaviour of two seemingly identical MapReduce functions

Our MongoDB database contains a list of all user accounts, where each new registration has a 'created_at' field in the account document with the current date and time when it was created. We wanted ...

javascript mongodb mapreduce

asked 2 days ago

Dave
2,651616

votes

2 answers

25 views

How to force Hadoop to process more data per map

I have a job which is running very slowly because I think Hadoop is creating too many map tasks for the size of the data. I read on some websites that it's efficient for fewer maps to process bigger ...

hadoop mapreduce

asked May 23 at 4:59

hiroprotagonist
264

15 30 50 per page

newest mapreduce questions feed

Tagged Questions

Related Tags