Duplicate Detection rules in Dynamics 365 Customer Engagement (CE) are great for preventing duplicates, but aren’t helpful once you already have thousands of duplicate records in your system. Luckily you can use Power Automate to automatically delete duplicate records based on your criteria rather than manually paging through each and every record.
In this example we will be deleting duplicate Accounts, where a duplicate is defined as an Account with the same Account Number. We will keep the Account that was created first, i.e. the one with the oldest Created On value.
To begin, create a Flow with a Manual Trigger. Then create a List Records step on your target entity, in this case Accounts. Since we are checking all Accounts for duplicates, we don’t need any filter criteria. However, since we want to review as many records as possible, we do need to click the “…” in the top right of the step, click “Settings”, activate “Pagination” and set the “Threshold” to the maximum allowed value of 100,000. If you don’t take this step, your Flow will only check 512 records by default and probably won’t catch all of your duplicates.
The next step is to add another List Records step. This one will run in an Apply to Each loop to iterate through each Account from the first List Records step and check for duplicates of that record. To apply our criteria regarding Account Numbers, use the filter expression:
accountnumber eq '[Account Number]'

with a Dynamic Value from the first List Records step inserted as the Account Number. Note that because Account Number is a text field, the OData filter requires the value to be wrapped in single quotes. Then add an Order By value for our chosen criteria, the Created On date:

createdon asc
This will return the Accounts with this Account Number in order from oldest to newest. Note that the ascending order “asc” is the default so we don’t actually need to specify it, but I included it for clarity.
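Under the hood, the List Records step translates these inputs into an OData query against the Dataverse Web API. With a hypothetical literal account number of 1001, the relevant query options would look roughly like:

```
$filter=accountnumber eq '1001'&$orderby=createdon asc
```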
The result should be a List Records step nested in an Apply to Each loop, as shown below. This second List Records step will return all Accounts with the same Account Number, including the one returned from the first List Records step.
The next step is a Condition. It checks if each Account is the oldest version from this set of Accounts with the same Account Number, so we can then keep the original and delete the duplicates.
There are a couple of ways to perform this check. One is to create a variable, increment it for each result, delete every Account returned after the first increment, then reset the variable for the next Account. However, in this example I will use an expression to get the Account GUID of the first result directly out of the List Records 2 output, then compare each result's Account GUID to it and delete all records that don't match. This approach has the advantage of letting you set the loops to process concurrently, if needed, without causing errors.
To take the first Account's GUID directly from this List Records step, use the expression:

body('List_records_2')?['value'][0]['accountid']
More generally, use the formula
body('List Records Step Name with spaces replaced with underscores')?['value'][number of the result you want, counting from 0]['schema name of field']
Make sure to enter this as an Expression rather than typing it directly as text. Note that when you enter this value, the Condition won't immediately be wrapped in an Apply to Each loop, since this expression refers to a fixed result from the List Records 2 step. Leave the condition operator as "is equal to" and add the unique identifier of Account from List Records 2 on the right-hand side, which will then create an Apply to Each 2 loop.
The condition will now return True for the oldest Account with this Account Number and False for the rest. The final step is to add a Delete a record step in the False branch for any Accounts that reach that branch. Your complete bulk delete Flow should look like this:
Overall the Flow works by listing every Account, then for each record listing all Accounts with the same Account Number. It keeps the oldest result from that list and deletes the rest. Note that the first List Records step will return all Accounts, even those that are later deleted, so the second List Records step will run multiple times for the same Account Number. That doesn't cause a problem: regardless of how many duplicates there are, after the first pass over a given Account Number only one Account with that number remains, and it will always be kept as the first result in future executions.
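The end-to-end logic is easier to see in ordinary code. Here is a small Python sketch that simulates the Flow against an in-memory list of hypothetical Accounts; the dictionary keys mirror the Dataverse schema names used above, and the sample data is invented for illustration:

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the Accounts table.
accounts = [
    {"accountid": "A1", "accountnumber": "1001", "createdon": datetime(2020, 1, 5)},
    {"accountid": "A2", "accountnumber": "1001", "createdon": datetime(2021, 3, 9)},
    {"accountid": "A3", "accountnumber": "2002", "createdon": datetime(2019, 7, 1)},
]

def find_duplicates_to_delete(records):
    """Replicate the Flow's logic: for each Account, list all Accounts with
    the same Account Number ordered by Created On ascending, keep the first
    (oldest), and mark the rest for deletion."""
    to_delete = set()
    for record in records:  # outer Apply to Each over List Records 1
        matches = sorted(   # List Records 2 with filter + orderby createdon asc
            (r for r in records if r["accountnumber"] == record["accountnumber"]),
            key=lambda r: r["createdon"],
        )
        oldest_id = matches[0]["accountid"]  # body(...)?['value'][0]['accountid']
        for match in matches:                # Apply to Each 2 + Condition
            if match["accountid"] != oldest_id:
                to_delete.add(match["accountid"])  # Delete a record step
    return to_delete

print(find_duplicates_to_delete(accounts))  # -> {'A2'}
```

In the real Flow the deletes happen during iteration, so later outer iterations simply find fewer matches; the end result is the same set of deletions shown here.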
With this design you can set the concurrency control on both loops to process records in parallel and improve runtime. I won't go into detail on that aspect, as performance likely isn't a priority for a bulk deletion job that will run infrequently or only once.