Friday 24 October 2014

How to load a file in DistributedCache in Hadoop MapReduce



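We can ship an extra (side) file to every task using the Distributed Cache. To do that, register the file with the Distributed Cache in the driver class, on the Configuration that the Job will be created from. The Job copies the Configuration when it is constructed, so cache files added to conf after that point are not seen by the job.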


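Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// "path/to/file" may also be a glob (e.g. "path/to/lookup-*"); every match is registered
Path cachefile = new Path("path/to/file");
FileStatus[] list = fs.globStatus(cachefile);
// globStatus returns null for a missing non-glob path and an empty array for a glob with no matches
if (list == null || list.length == 0) {
 throw new IOException("No cache file found at " + cachefile);
}
for (FileStatus status : list) {
 DistributedCache.addCacheFile(status.getPath().toUri(), conf);
}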
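When the path is a glob, each matching file becomes its own cache entry. The sketch below is a local-filesystem analogue of that loop, not Hadoop code: it uses java.nio.file's glob matching instead of FileSystem.globStatus, and the class GlobToCacheUris and its expand method are hypothetical names. It returns the URIs that would each be passed to DistributedCache.addCacheFile.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical local analogue of the driver loop (no Hadoop involved):
// expand a glob such as "lookup-*.txt" inside one directory into one URI
// per matching file, i.e. one future cache entry per match.
class GlobToCacheUris {

    static List<URI> expand(Path dir, String glob) throws IOException {
        List<URI> uris = new ArrayList<>();
        // newDirectoryStream(dir, glob) matches file names in dir against the glob
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path p : matches) {
                uris.add(p.toUri());
            }
        }
        // Sorted here so the registration order (and thus the index of each
        // file in getCacheFiles()) is deterministic in this sketch
        uris.sort(null);
        return uris;
    }
}
```

Sorting is a choice made for this sketch: directory listings come back in no guaranteed order, and the position of each file matters later when it is fetched by index.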
Then, in the Mapper's or Reducer's setup() method, we can read the file back.
@Override
protected void setup(Context context) throws IOException, InterruptedException {
 Configuration conf = context.getConfiguration();
 FileSystem fs = FileSystem.get(conf);
 URI[] cacheFiles = DistributedCache.getCacheFiles(conf);
 // index 0 is the first file registered in the driver
 Path getPath = new Path(cacheFiles[0].getPath());
 // try-with-resources closes the reader and the underlying stream when done
 try (BufferedReader bf = new BufferedReader(new InputStreamReader(fs.open(getPath)))) {
  String setupData;
  while ((setupData = bf.readLine()) != null) {
   // printed to the task's stdout log
   System.out.println("Setup line in reducer: " + setupData);
  }
 }
}
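Printing the lines only proves the file arrived; usually setup() loads it into memory once so that every map() or reduce() call can consult it. Here is a minimal sketch, assuming a hypothetical "key<TAB>value" file format and a hypothetical helper class CacheLookup; it is plain Java, so it only needs the BufferedReader opened in setup().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: parses "key<TAB>value" lines (an assumed format)
// into an in-memory lookup table. Inside setup() it could be called as
//   try (BufferedReader bf = new BufferedReader(new InputStreamReader(fs.open(getPath)))) {
//       lookup = CacheLookup.load(bf);
//   }
// where "lookup" is a field that map()/reduce() then read from.
class CacheLookup {

    static Map<String, String> load(BufferedReader reader) throws IOException {
        Map<String, String> table = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip blank lines and lines without a tab
            }
            table.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return table;
    }
}
```

Loading once in setup() keeps the cost at one read per task rather than one per record, which only pays off if the file fits comfortably in the task's heap.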
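If you added more than one cache file, pick the one you need by its index (0, 1, ...) in cacheFiles, in the order the files were added in the driver. For example, the second file: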
Path getPath = new Path(cacheFiles[1].getPath());  
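Indexes silently point at the wrong file if the driver's registration order changes. One alternative is to look the file up by name. The sketch below assumes nothing beyond java.net.URI; CacheFiles and byName are hypothetical names, and it treats a null array (no cache files registered) as "not found".

```java
import java.net.URI;

// Hypothetical helper: find a cache file by its file name instead of its
// position in the URI[] returned by DistributedCache.getCacheFiles(conf).
class CacheFiles {

    static URI byName(URI[] cacheFiles, String fileName) {
        if (cacheFiles == null) {
            return null; // no cache files were registered
        }
        for (URI uri : cacheFiles) {
            String path = uri.getPath();
            // compare only the last path component, e.g. "countries.txt"
            String last = path.substring(path.lastIndexOf('/') + 1);
            if (last.equals(fileName)) {
                return uri;
            }
        }
        return null;
    }
}
```

In setup() this would replace the index lookup, e.g. new Path(CacheFiles.byName(cacheFiles, "countries.txt").getPath()), with "countries.txt" standing in for whatever file name the driver registered.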

Happy Hadooping ....