PATHITEM vs Parsing. 2 ways to obtain an array ele…


It’s not the best practice when some structured data is stored as split by delimiter text in one column. But real life does not always equal ideal life, and I see such data sources quite often.

I’ve Python-ed a random 500k-lines csv-file (see attached) with the only column that has a structure like this:

 

 

GRP441871;Category667057;xtxtkjpdtwiazuszitjgabrz;AddInfo
GRP601591;Category807428;mweemtqqdsdinhwpfqrubqzbowju;AddInfo
GRP330108;Category994;dyitjeffwvhwzlpksjjo;AddInfo

 

 

Let’s have a look at a very common task – to get a value from a certain position in each row (and from each cell, to be more correct).

First, for example, we need to extract the third value from each row:

  • xtxtkjpdtwiazuszitjgabrz
  • mweemtqqdsdinhwpfqrubqzbowju
  • dyitjeffwvhwzlpksjjo

The first choice to solve it for any Power BI specialists is to split a column by delimiter in Power Query. Unfortunately, this option is not always available or comfortable (e.g. due to variable number of delimited columns or DirectQuery usage).

Let’s try to do it with DAX.

Note: For further analysis and methods comparison, I will use a Measure.

 

We can execute a simple parse: find substring between the second and the third delimiters, like this:

 

 

Third Value Parsing Measure = 
var _val = MAX([Column1])    
var _firstDelimiter = SEARCH(";", _val)    
var _secondDelimiter = SEARCH(";", _val, _firstDelimiter + 1)    
var _thirdDelimiter = SEARCH(";", _val, _secondDelimiter + 1)

RETURN
MID(_val, _secondDelimiter + 1, _thirdDelimiter - _secondDelimiter - 1)

 

 

 

Or, a much more easy-looking Measure with PATHITEM:

 

 

Third Value PATHITEM Measure = 
var _val = MAX([Column1])

RETURN
PATHITEM(SUBSTITUTE(_val, ";", "|"), 3)

 

 

OK. PATHITEM looks pretty. What about performance?

 

I created 2 different visuals with measures that created above

third measure.PNG

 

Now, it’s time to check Performance Analyzer. Start recording, refresh visuals, voilà!

third measure performance.PNG

PATHITEM is not only easy-looking, it’s also an easy-performed operation. 10% effective for DAX query execution – not bad at all.

 

Conclusion

We can draw the conclusion that the PATHITEM trick should be the first option over a string. But as data pros we have to have a critical mindset. What if it is not the greatness of PATHITEM but just a lack of parsing method?

 

Conclusion

Let’s create a new test.

I will try to extract the very first value for a structured column:

 

 

First Value Parsing Measure =
var _val = MAX([Column1])
    var _firstDelimiter = SEARCH(";", _val)

RETURN
LEFT(_val, _firstDelimiter - 1)

 

 

 

And the same as before for PATHITEM

 

 

First Value PATHITEM Measure =
var _val = MAX([Column1])
RETURN
PATHITEM(SUBSTITUTE(_val, ";", "|"), 1)

 

 

 

OK. See the Performance Analyzer:

first measure performance.PNG

 

Oops. PATHITEM takes twice as long as the only LEFT parsing method!

 

Final conclusion

PATHITEM is a good alternative solution that can be used instead of parsing statements that are too complex.

But as with many other programming algorithms, the choice of the appropriate method is always a compromise between coding, debugging and memory consumption.



Source link

Be the first to comment

Leave a Reply

Your email address will not be published.


*