Extract Domain from URLs in Power Query Language M

Use Case – Sometimes there will be a need to extract domain names from given URLs. Hence http://www.microsoft.com should give microsoft. Below are some sample cases and the results expected.



Exclusion – Subdomains are excluded from the scope of this. Hence, https://powerbi.microsoft.com which has subdomain powerbi.microsoft is excluded.


Solution – Following formula can be used to extract domain from a given URL


= Text.Lower(List.RemoveMatchingItems(Text.Split(Text.Replace([URL],"://","."),"."),{"https","http","www"},Comparer.OrdinalIgnoreCase){0})


To see its working on sample data – Open a blank query – Home – Advanced Editor – Remove everything from there and paste the below code to test (later on when you use the query on your dataset, you will have to change the source appropriately)


    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ZY8xT8QwDIX/S2eUnIRAp5NOLAyICaYOVYdcSZSIujaxe4F/f4kLJxBb/N7z++Jh6KII8cHaUoohLD6fkpkQuvFm6HSmdEYxwKq01EtTX1efv0xaVO37vgaC6ABsJqcvyW7hgPlXmZab9V2Vp4quZGAMcmU2AqQp47f6w4hbWFvAUcsf9rur9ecEQZy5JWxE8CYKzBrUnzuizUvLm/9s5nZDXcYQ0uRv7+/Upw/LDmj2/I9S6i6WrYbIimdRysPKPh+fMS6P6LtxvAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [URL = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Domain", each Text.Lower(List.RemoveMatchingItems(Text.Split(Text.Replace([URL],"://","."),"."),{"https","http","www"},Comparer.OrdinalIgnoreCase){0}))
    #"Added Custom"



Source link

Be the first to comment

Leave a Reply

Your email address will not be published.