Friday, 24 October 2014

How to load a file in DistributedCache in Hadoop MapReduce



We can make an extra file available to every task by using the Distributed Cache. To do that, we register the file with the Distributed Cache in the Driver class:


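Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// The path may be a glob pattern; every match is added to the cache.
Path cachefile = new Path("path/to/file");
FileStatus[] list = fs.globStatus(cachefile);
// globStatus() returns null when a non-glob path does not exist.
if (list != null) {
  for (FileStatus status : list) {
    DistributedCache.addCacheFile(status.getPath().toUri(), conf);
  }
}
// Create the Job from this conf only after the cache files have been added.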
Then, in the Reducer's or Mapper's setup() method, we can read this file:
@Override
public void setup(Context context) throws IOException {
  Configuration conf = context.getConfiguration();
  FileSystem fs = FileSystem.get(conf);
  URI[] cacheFiles = DistributedCache.getCacheFiles(conf);
  // getCacheFiles() returns null when no cache file was configured.
  if (cacheFiles == null || cacheFiles.length == 0) {
    return;
  }
  Path getPath = new Path(cacheFiles[0].getPath());
  // try-with-resources closes the reader even if reading fails.
  try (BufferedReader bf = new BufferedReader(new InputStreamReader(fs.open(getPath)))) {
    String setupData;
    while ((setupData = bf.readLine()) != null) {
      System.out.println("Setup Line in reducer " + setupData);
    }
  }
}
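The setup() method above only prints each line. In practice the cached file is often loaded into memory so that map() or reduce() can look values up. Here is a minimal sketch of that step, assuming the cached file holds tab-separated key/value lines. The class name CacheFileLookup, the method parseLookup and the file format are all assumptions made for illustration; none of them come from Hadoop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: turns cached-file lines into a lookup table.
class CacheFileLookup {

    // Reads "key<TAB>value" lines from the given reader into a map.
    // Lines without a tab (including blank lines) are skipped.
    static Map<String, String> parseLookup(Reader in) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // no separator found, ignore this line
            }
            lookup.put(line.substring(0, tab).trim(), line.substring(tab + 1).trim());
        }
        return lookup;
    }
}
```

Inside setup() you could pass new InputStreamReader(fs.open(getPath)) to parseLookup and keep the returned map in a field of the Mapper or Reducer. This only makes sense when the cached file is small enough to fit in the task's memory.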
If you added more than one cache file, use the matching index (0, 1, ...) to pick the one you want. For example, to read the second file:
Path getPath = new Path(cacheFiles[1].getPath());
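Picking a file by index means you have to remember the order in which the files were added in the Driver. A possible alternative is to pick the file by its name. The sketch below uses only java.net.URI; the class CacheFileSelector and the method findByName are hypothetical helpers written for this post, not part of the Hadoop API.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop: selects a cache file by its file name.
class CacheFileSelector {

    // Returns the first URI whose last path segment equals fileName,
    // or null if the array is null or nothing matches.
    static URI findByName(URI[] cacheFiles, String fileName) {
        if (cacheFiles == null) {
            return null;
        }
        for (URI uri : cacheFiles) {
            String path = uri.getPath();
            if (path == null) {
                continue; // opaque URIs have no path component
            }
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (name.equals(fileName)) {
                return uri;
            }
        }
        return null;
    }
}
```

In setup() you could call CacheFileSelector.findByName(cacheFiles, "stopwords.txt") (the file name here is just an example) and build the Path from the returned URI, after checking that it is not null.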

Happy Hadooping ....
