Matching words in multiple columns


Scenario:

Suppose I want to confirm if there are some same or similar words in these multiple columns. Then mark ‘not match’ if they are distinct and reflect the same words if there is any.

 

Sample data:

1.png

 

Expected output:

2.png

 

Guide Line:

To achieve this requirement, we have two main steps:

  1. Create a ‘Word Dictionary’ in power query which is prepared to match words
  2. Create a calculated column using DAX to get the matched result

 

Operations:

  1. Create a ‘Word Dictionary’ in power query:

To create a dictionary, we need to split these columns by space, combine them into a single table without duplicated values.

1) Split column by space, for example, column [Category], create it as a new tableA, rename the new column as [Word]:

TableA = Table.RenameColumns(Table.ExpandListColumn(Table.TransformColumns(Table.SelectColumns(OriginTable,{"Category"}), {{"Category", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Category"),{"Category","Word"})

 

Let me explain this query code:

This query combine filter rows, split columns, rename column into one single query:

Filter rows: Table.SelectColumns(…), which is used to filter column [Category]

Split columns:  Table.ExpandListColumn(…), when using split columns feature only, you can find this part code

3.png

Rename column: Table.RenameColumns(…), which is used to rename [Category] to [Word]

This query will return a table like this:

4.png

 

If the individual query looks complex, I will split them into a table with each function so that you can see more clearly.

Use Table A as an example:

 

let
    OriginTable = Table,
    #"Removed Other Columns" = Table.SelectColumns(OriginTable,{"Category"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Removed Other Columns", {{"Category", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Category"),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Category", type text}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Category", "Word"}})
in
    #"Renamed Columns"

 

Table B and Table C are the similar as Table A.

 

2) Through the above query, we’ve got three separate tables in power query, then create a blank query to combine them as a Word Dictionary Table:

                

Source = Table.Combine({#"Table A",#"Table B",#"Table C"})

3) Remove duplicated values:

               

#"Removed Duplicates" = Table.Distinct(Source)

               

Now we have got a word dictionary table (a part of it in the below), we can disable load Table A, Table B and Table C before shutting it down and applying it to power query to improve performance.

5.png

6.png

 

  1. Create a calculated column using DAX to get the matched result

In step1 we have created a word dictionary in power query, now we will use this dictionary to ‘search’:

 

Create this calculated column:

Match =
COALESCE (
    CALCULATE (
        MAX ( 'Word Dictionary'[Word] ),
        FILTER (
            ALL ( 'Word Dictionary' ),
            CONTAINSSTRING ( 'Table'[Category], 'Word Dictionary'[Word] )
                && CONTAINSSTRING ( 'Table'[Class], 'Word Dictionary'[Word] )
                && CONTAINSSTRING ( 'Table'[Name], 'Word Dictionary'[Word] )
                && CONTAINSSTRING ( 'Table'[Code], 'Word Dictionary'[Word] )
        )
    ),
    "Not match"
)

 

In this DAX formula, CONTAINSSTRING() will return TRUE or FALSE indicating whether one string contains another string and it is not case-sensitive. In this sample, it compares each column value with ‘Word Dictionary’. COALESCE() will return the first expression that does not evaluate to BLANK. In this sample, it will return the result from ‘Word Dictionary’ which matches all the columns.

 

Now we will get the final expected result:

7.png

 

We can hide the Word Dictionary table in the Fields tab at last, only show the source table.

8.png

 

This is about how we can return the same word from multiple columns. Hope this article helps everyone with similar questions.

 

 

Author:  Yingjie Li

Reviewer: Ula Huang, Kerry Wang

 

 

 

 

 

 

 



Source link

Be the first to comment

Leave a Reply

Your email address will not be published.


*