Scenario:
Sometimes we encounter a bunch of mixed strings, which contain various types of information, such as dates, numbers, currencies, etc. How to extract date/currency amount from such text strings?
Sample data:
Expected result:
Part1: How to extract dates from text strings
1. Add a new column that splits each cell into multiple elements by delimeter “ ” (Space) and stores them in lists
2. Iterate over the elements in lists and convert them to dates.If the element can’t be converted , errors will be returned and we need to use ‘try…otherwise..’ to replace these ‘error’ values with ‘null’ values.
3. Remove the empty elements in each list and extract each first element by index value {0}.
If there is no element in the list, an error will be reported.
(We still use ‘try…otherwise…’ to remove error)
4. Heretofore, we have realized the extraction of date. If you still need to convert it into a specified text date format, please try function ‘Date.toText’
The final formula for the date column is as follows:
=Date.ToText(
try
List.RemoveNulls(
List.Transform(
Text.Split([Source]," "
),each try Date.FromText(_) otherwise null
)
){0}
otherwise null,"YYYY-MM-DD"
)
Part2: How to extract currency amount from text strings
1. Add a new column that splits each cell into multiple elements by delimeter “$” and stores them in lists. e.g. row1:{xxx 2021-12-17 11:01:20 xx},{45 x}
2. Iterate over the elements in lists, and sequentially get the first y={1..20} values of each element, and store the the returned result in secondary lists.
3. Try to convert the elements in secondary lists to numbers, if error, return “null“
e.g. row1->list1-> list1.1{null,null……}, list1.2 {4,45, null,null……}
4. Extract the largest value in each secondary
e.g. row1->list1->list1.1 {null} ,list1.2{45} ->list1{null,45}
5. Remove the empty element in each list and extract the first element by index value {0}.
The final formula for the currency column is as follows:
=List.RemoveNulls(
List.Transform(
Text.Split([Source],"$"),(x)=>List.Max(
List.Transform({1..20},(y)=>try Number.From(
Text.Start(x,y)
)
otherwise null
)
)
)
){0}
It’s expected to get a result as below:
Related links:
List.Transform:Performs the function on each item in the list and returns the new list.
List.RemoveNulls:Removes null values from a list.
Date.FromText: Returns a Date value from a set of date formats and culture value.
Text.Split:Returns a list containing parts of a text value that are delimited by a separator text value.
Text.Start:Returns the count of characters from the start of a text value.
Number.From: Returns a number value from a value.
Author: Eason Fang
Reviewer: Ula Huang, Kerry Wang
Be the first to comment