#3 - Measuring Data Similarity or Dissimilarity

by Atul Singh on May 12, 2017 in Attribute, Dissimilarity, Distance, Measures, Ordinal, Similarity, Statistics

Continued from:
'Measuring Data Similarity or Dissimilarity #1'
'Measuring Data Similarity or Dissimilarity #2'

3. For Ordinal Attributes:

An ordinal attribute is an attribute whose possible values have a meaningful order or ranking, but the magnitude of the difference between successive values is not known. Ordinal values are like categorical values, but with an order.

For example, a "Performance" column might take the values Best, Better, Good, Average, Below Average, Bad.

These are categorical values with an order or rank, which is why they are called ordinal values. Ordinal attributes can also be derived by discretizing numeric attributes, that is, by splitting the value range into a finite number of ordered categories.
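As a rough illustration of the discretization idea, the sketch below maps numeric scores to ordered categories. The cut points, the shortened label list, and the `discretize` helper are all assumptions made up for this example; they are not part of the original post.

```python
from bisect import bisect_right

# Hypothetical cut points and labels, chosen only for illustration.
CUT_POINTS = [40, 60, 80]                     # boundaries between categories
LABELS = ["Bad", "Average", "Good", "Best"]   # ordered from lowest to highest


def discretize(score):
    """Map a numeric score to one of the ordered categories.

    A score equal to a cut point falls into the higher category.
    """
    return LABELS[bisect_right(CUT_POINTS, score)]


scores = [35, 40, 72, 95]
categories = [discretize(s) for s in scores]
print(categories)  # ['Bad', 'Average', 'Good', 'Best']
```

The number of categories is always one more than the number of cut points, which is why `LABELS` has four entries for three boundaries.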

To calculate similarity or dissimilarity, we assign a rank to each category: an attribute f with N possible states has the ranks `1, 2, 3, ..., N`.
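A minimal sketch of the rank assignment for the "Performance" example might look like the following. Which end of the scale receives rank 1 is an assumption here (Bad = 1, Best = 6); the reverse direction works just as well as long as it is applied consistently. The names `ORDER`, `RANK`, and `column` are illustrative.

```python
# Categories from the "Performance" example, listed from lowest to highest.
# Assumption: the lowest category gets rank 1 and the highest gets rank N.
ORDER = ["Bad", "Below Average", "Average", "Good", "Better", "Best"]

# Build a lookup table: category label -> rank in 1..N.
RANK = {label: rank for rank, label in enumerate(ORDER, start=1)}

column = ["Good", "Best", "Bad"]
ranks = [RANK[value] for value in column]
print(ranks)  # [4, 6, 1]
```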

Measuring Data Similarity or Dissimilarity for Ordinal Attributes

How to Calculate Similarity or Dissimilarity:

1. Assign a rank `R_if` to each category of attribute f, which has N possible states.
2. Normalize the rank to the range [0.0, 1.0] so that each attribute has equal weight. The normalized rank can be calculated as

`R_in = \frac{R_if - 1}{N - 1}`

3. Similarity or dissimilarity can now be calculated with any distance measure, treating the normalized ranks as numeric values (see 'Measuring Data Similarity or Dissimilarity #2').
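The three steps above can be sketched in a few lines of Python. This is only one possible reading: it assumes a single ordinal attribute, ranks the "Performance" categories from Bad = 1 to Best = 6, and uses the absolute difference (Manhattan distance in one dimension) as the distance measure. Defining similarity as `1 - dissimilarity` is a common convention assumed here, not something the steps prescribe, and the function names are hypothetical.

```python
# Step 1: an ordered list of categories; position + 1 is the rank R_if.
# Assumption: lowest category first, so Bad has rank 1 and Best has rank 6.
ORDER = ["Bad", "Below Average", "Average", "Good", "Better", "Best"]


def normalized_rank(value, order):
    """Step 2: map a category to (R_if - 1) / (N - 1), a value in [0.0, 1.0]."""
    n = len(order)
    if n < 2:
        raise ValueError("need at least two states to normalize ranks")
    rank = order.index(value) + 1  # ranks run from 1 to N
    return (rank - 1) / (n - 1)


def dissimilarity(a, b, order):
    """Step 3: absolute difference of the normalized ranks."""
    return abs(normalized_rank(a, order) - normalized_rank(b, order))


def similarity(a, b, order):
    """Assumed convention: similarity is one minus dissimilarity."""
    return 1.0 - dissimilarity(a, b, order)


print(dissimilarity("Best", "Bad", ORDER))      # extremes of the scale
print(dissimilarity("Good", "Average", ORDER))  # neighbouring categories
print(similarity("Good", "Good", ORDER))        # identical values
```

With several ordinal attributes, each one would be normalized separately and the resulting numeric vectors compared with a distance such as Euclidean or Manhattan.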


About Atul Singh
I am a Data Consultant at a Canadian financial firm. My interests range from Data Analytics, ML, Kubernetes, and NLP to ETL. I love to blog and travel in my spare time. If you'd like to get in touch, feel free to say hello through any of the social links.

Disclaimer

The postings on this site are my own and don't necessarily represent IBM's or other companies' positions, strategies or opinions. All content provided on this blog is for informational purposes and knowledge sharing only.

The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site. The owner will not be liable for any errors or omissions in this information nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of this information.
