Channel: Cloud BI – MSSQLDUDE Blog

Median function in Azure Data Factory


To compute a median (the middle value of a sorted list) in ADF, you need to chain a few transformations together. Below are the steps for finding a median with data flows.

  1. Sort your data by the field whose median you want to find
  2. Collect the values into an array
  3. Count the number of values
  4. Find the midpoint

In my demo, I’m using the movies database CSV source, and I want to find the median movie rating for each year. My final result will be a single value per year: the median rating of that year’s movies.

The Sort transformation sorts Ratings so that I know they are in ascending order for my median calculation. Next is the Aggregate transformation, which I use to group the data by year. Inside the aggregate, I use collect() to gather each year’s ratings into an array, so that every value has an index I can use to find the middle, and count() to get the total number of values.
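If it helps to see the Sort and Aggregate steps as code, here is a rough sketch of the same logic in plain Python. It only illustrates the idea and is not how ADF runs it. The sample rows and the function name `collect_by_year` are made up for this example, and the keys `ratingsCollection` and `ratingsCount` simply mirror the field names used in the formula later in this post.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the movies CSV: (year, rating).
movies = [
    (1999, 7), (1999, 3), (1999, 9),
    (2000, 5), (2000, 8), (2000, 6), (2000, 2),
]

def collect_by_year(rows):
    """Sort by rating, then group by year, keeping a list and a count."""
    # Step 1: the Sort transformation (ascending by rating).
    ordered = sorted(rows, key=lambda row: row[1])

    # Steps 2 and 3: the Aggregate grouped by year, like collect() and count().
    grouped = defaultdict(list)
    for year, rating in ordered:
        grouped[year].append(rating)

    return {
        year: {"ratingsCollection": ratings, "ratingsCount": len(ratings)}
        for year, ratings in grouped.items()
    }

result = collect_by_year(movies)
print(result[1999])  # {'ratingsCollection': [3, 7, 9], 'ratingsCount': 3}
print(result[2000])  # {'ratingsCollection': [2, 5, 6, 8], 'ratingsCount': 4}
```

Each year ends up with its ratings in ascending order plus a count, which is everything the final step needs.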

The last thing I need to do is find the middle. I do that with a calculation inside a Derived Column transformation. I call the new field “median” and apply this formula:

ratingsCollection[toInteger(round(ratingsCount/2)+1)]

The field ratingsCount was created in the aggregation, so I divide it by 2, round the result to an integer, and then add 1. Adding 1 means I won’t ever end up with a 0 index value, and when the count is even I simply pick the higher of the two middle values.
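To sanity-check the index arithmetic, here is a small Python sketch of the same formula. It rests on two assumptions that are worth verifying in your own data flow: that array indexes are 1-based, and that dividing the integer count by 2 truncates (so 5/2 gives 2). The helper names `median_index` and `pick_median` are made up for this illustration.

```python
def median_index(count):
    """1-based index of the (upper) middle element for `count` values.

    Assumes count / 2 truncates, which is what // does in Python.
    """
    return count // 2 + 1

def pick_median(sorted_values):
    """Return the middle value of an already sorted list."""
    index = median_index(len(sorted_values))
    # Python lists are 0-based, so shift the 1-based index down by one.
    return sorted_values[index - 1]

print(median_index(1))            # 1
print(median_index(4))            # 3
print(pick_median([3, 7, 9]))     # 7
print(pick_median([2, 5, 6, 8]))  # 6
```

If the division returned 2.5 rather than 2 for a count of 5, rounding would push the index to 4, one past the true middle, so it is worth testing the expression on a group with an odd number of values.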

