Wednesday 21 January 2015

K-Nearest Neighbors Algorithm - KNN

The KNN algorithm is a classification algorithm that can be used in many applications, such as image processing, statistical pattern recognition, and data mining.

As with any classification algorithm, KNN has a model part and a prediction part. Here the model is simply the input dataset, and the predicted output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small).
1.  If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
2.  If k = 3 and the three nearest neighbors have the class labels Good = 2 and Bad = 1, then the predicted class label will be Good, which holds the majority vote.

Let's see how the KNN algorithm handles a sample dataset.


We have data from a questionnaire survey and objective testing, with two attributes, to classify whether a special paper tissue is good or not.

Here is the training sample:







Let this be the test sample:





1. Determine the parameter k, the number of nearest neighbors.
      Say k = 3.
2. Calculate the distance between the query instance and all the training samples.

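The coordinates of the query instance are (3, 7). Instead of calculating the distance, we compute the squared distance, which is faster to calculate because it skips the square root. The ranking of the neighbors is unchanged, since the square root preserves order.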
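
As a rough illustration of why the squared distance is enough, here is a minimal Python sketch. The training points below are hypothetical values chosen for illustration, since the original training table is not reproduced here; only the query instance (3, 7) comes from the text. The function names are also my own.

```python
import math

# Hypothetical training points (x1, x2); illustrative values only,
# not taken from the original training table.
training_points = [(7, 7), (7, 4), (3, 4), (1, 4)]
query = (3, 7)  # query instance from the text


def squared_distance(a, b):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def euclidean_distance(a, b):
    """True Euclidean distance: square root of the squared distance."""
    return math.sqrt(squared_distance(a, b))


# Squared distance from the query to each training point.
squared = [squared_distance(p, query) for p in training_points]

# Sorting by either measure gives the same order, because the square
# root is monotonically increasing on non-negative numbers.
by_squared = sorted(training_points, key=lambda p: squared_distance(p, query))
by_euclidean = sorted(training_points, key=lambda p: euclidean_distance(p, query))

print(squared)                     # [16, 25, 9, 13]
print(by_squared == by_euclidean)  # True
```

Since only the order of the distances matters for picking neighbors, dropping the square root changes nothing in the result.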


3. Sort the distances and determine the nearest neighbors based on the k-th minimum distance.


4. Gather the category Y of the nearest neighbors.



-> In the second row, the last column (the category Y of the nearest neighbor) is left empty because the rank of this data point is greater than 3 (= k).

5. Use the simple majority of the categories of the nearest neighbors as the prediction value for the query instance.

We have 2 Good and 1 Bad. Since 2 > 1, we conclude that a new paper tissue that passes the laboratory test with x1 = 3 and x2 = 7 belongs to the Good category.
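
The five steps above can be put together in a short Python sketch. This is only an illustration under assumptions: the training sample below is hypothetical (the original table is not reproduced here) and was chosen so that the vote comes out as 2 Good and 1 Bad, as described above. The function name `knn_predict` is my own.

```python
from collections import Counter

# Hypothetical training sample: ((x1, x2), category). Illustrative values
# only, picked to match the 2 Good / 1 Bad vote described in the text.
training_sample = [
    ((7, 7), "Bad"),
    ((7, 4), "Bad"),
    ((3, 4), "Good"),
    ((1, 4), "Good"),
]


def knn_predict(training, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: squared distance from the query to every training point.
    distances = [
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in training
    ]
    # Step 3: sort by distance and keep the k closest.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: gather the categories of those neighbors.
    votes = Counter(label for _, label in nearest)
    # Step 5: the most common category is the prediction.
    return votes.most_common(1)[0][0], dict(votes)


# Step 1: choose k = 3, then classify the query instance (3, 7).
prediction, votes = knn_predict(training_sample, query=(3, 7), k=3)
print(prediction, votes)  # Good {'Good': 2, 'Bad': 1}
```

With two classes, an odd k such as 3 avoids tied votes. This sketch does not handle ties in any special way.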



Thursday 1 January 2015

Cloudera Certified Hadoop Developer (CCD - 410)


I cleared the Cloudera Certified Hadoop Developer (CCD-410) examination on 31st December 2014 and received the certificate from Cloudera the very next day.

If you are planning to do this certification, you need to know Hadoop in depth and have hands-on experience too.

I started my Hadoop career with my MCA (2010 - 2013) final-year major project, and it paved the way to my current job. Around 6 months after joining, I planned to take the Cloudera certification exam. I had a little over 1 year of experience in Hadoop, having learned and practised it on my own. I thought it would be nice to attend the training session to see if I had missed any pointers. I registered with Cloudera and attended the Cloudera Hadoop training in Bangalore. It ran from 27th to 30th March 2014 at the Ibis Hotel, Bangalore. It was my first trip to Bangalore and I was a little nervous.

There were 13 trainees including me, and I was the only lady among them. Allan Schweitz was our trainer, and Vipin Nahal from OssCube assisted him.

It was a 4-day training that helps you get to know the Hadoop framework in depth; it also covers Hadoop ecosystem projects and hands-on assignments. The 4-day class was really good and informative. If you already know Hadoop, it may feel like a waste of time, but you can still clarify your doubts.

At the end of the 4th day we received a training certificate and had a group picture taken.



After the training we received a 180-day subscription to the Cloudera Official Developer Practice Test for CCDH. This self-assessment helps you discover strengths and weaknesses in your understanding of and skills with Apache Hadoop, and prepares you across the entire range of topics covered in a Cloudera certification exam.

Finally, on 31st December 2014, I took the Hadoop certification exam and cleared CCD-410 successfully.

There were around 52 questions in total. The options looked easy, but the answers were similar to one another and tricky. Here is my certificate.



Advice for passing the certification examination
  1. Please don't depend on cheating sites; most of their answers are wrong. See some sample dumps from a cheating site.
  2. Go through Hadoop: The Definitive Guide.
  3. Gain a good knowledge of the Hadoop ecosystem projects.
  4. If you attend the Cloudera training, you will receive the 180-day practice test subscription mentioned above. You can practise with it. If you are getting an overall grade greater than 75%, you will surely pass the examination.
  5. You will also get questions on the ecosystem projects (Hive, Sqoop, Flume, ...) and programming questions related to MapReduce. All of them are output-prediction questions.
Details For Cloudera Certification

1. Exam Code: CCD-410
    Number of Questions: 50 - 55 live questions
    Time Limit: 90 minutes
    Passing Score: 70%
    Language: English, Japanese
    Price: USD $295, AUD $300, EUR €215, GBP £185, JPY ¥28,500

2. You can log on to the Cloudera Pearson VUE site to register for your certification test.
    First you need to set up your profile on the Cloudera Pearson VUE site. Once you have registered, you will see a link to register for the exam, and you can then choose the date and location. It will then take you to the payment options, where you pay for your certification exam.

All the very best!!

For further information, queries, or doubts regarding Hadoop, you can contact me.