
Accountant and fraud investigator Mark Nigrini, Ph.D., popularized Benford's Law in his book "Digital Analysis Using Benford's Law," first published in 2001. You've probably used Benford's Law to analyze accounts payable amounts, purchasing card data and journal entries in your search for irregularities or risk areas. By searching for cases where the first digit (as well as the first two, or even first three, digits) of a payment or transaction stream doesn't appear in the expected proportions, you'll find indications that someone might be overriding a control or manipulating the numbers, disrupting the digit patterns.
For this Innovation Update column, I'm going to describe how to take Benford's Law to another level using artificial intelligence (AI) and automation through a technique called Benford Subset Divergence Analysis (BSDA). Using BSDA, we can identify the subsets (business units, expense categories, vendor categories or any other filtering criteria) that generate the most significant nonconformity with Benford's Law, instead of working through an entire dataset one time-consuming filter at a time. I was fortunate to collaborate with Nigrini to explore this concept further and test a few scenarios under his supervision.
A Scientific American article published last year tells the story of how Benford's Law was originally identified in 1881 by astronomer Simon Newcomb. (See "What Is Benford's Law? Why This Unexpected Pattern of Numbers Is Everywhere," by Jack Murtagh, Scientific American, May 8, 2023.) Physicist Frank Benford made the same observation in 1938 and popularized the law, attaching his name to it. Some references attribute both names to the model, referring to it as the "Newcomb-Benford Law." You may also see it referred to as the "law of anomalous numbers." In many real-life sets of naturally occurring numbers, the first digit is likely to be small, starting with a 1, 2 or 3, for example. In sets that obey the law, the digit 1 appears as the first digit about 30% of the time, while 9 appears as the first digit less than 5% of the time. The first digit probabilities for 1 through 9 are as follows. (Zero isn't admissible as a first digit; for a number such as 0.07, the leading zeros are skipped, so the first digit is 7.)
Figure 1: Benford’s first-digit probabilities
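The probabilities in Figure 1 follow directly from the formula behind Benford's Law: the expected proportion of values whose first digit is d equals log10(1 + 1/d). A minimal sketch that reproduces the first-digit table:

```python
import math

def benford_first_digit(d: int) -> float:
    """Expected proportion of values whose first digit is d (1-9),
    per Benford's Law: P(d) = log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)

# Print the expected first-digit distribution (as in Figure 1).
for d in range(1, 10):
    print(f"First digit {d}: {benford_first_digit(d):.1%}")
```

Running this shows the familiar shape: the digit 1 leads about 30.1% of the time, while 9 leads only about 4.6% of the time, and the nine probabilities sum to exactly 1.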
The most effective Benford-related test for financial data is the first-two digits test because it results in smaller samples of notable items. This test works especially well in identifying large groups of transactions just below internal control thresholds or perceived thresholds, such as when a company's vendor policy stipulates that any invoice over $5,000 requires a second approver or additional documentation for payment. The incentive for fraudsters is to keep an invoice just under that amount, perhaps maxing invoices out at $4,900 or even $4,999 to stay under the threshold. In Figure 2, you'll see a spike in invoices starting with the digits "49" and "48." Each of those digit combinations should occur around 0.89% of the time, but in fact each occurred around 1.1% of the time (hence the spike). This is an indicator of someone artificially keeping invoices just under the $5,000 threshold.
Figure 2: Two-digit example
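The first-two digits test generalizes the same formula: the expected proportion of values beginning with the two-digit combination n (10 through 99) is log10(1 + 1/n), which for "49" works out to roughly 0.89%, matching the expected rate cited above. A short sketch of the expected rate plus a helper for extracting the first two digits of an amount:

```python
import math

def benford_first_two(n: int) -> float:
    """Expected proportion of values whose first two digits are n (10-99)."""
    return math.log10(1 + 1 / n)

def first_two_digits(amount: float):
    """Return the first two significant digits of an amount, skipping
    leading zeros (e.g., 4,925.00 -> 49; 0.07 -> 70)."""
    digits = "".join(c for c in f"{abs(amount):.6f}" if c.isdigit()).lstrip("0")
    return int(digits[:2]) if len(digits) >= 2 else None

print(f"Expected rate for '49': {benford_first_two(49):.2%}")  # about 0.89%
print(first_two_digits(4925.00))  # 49
```

An observed rate near 1.1% for "49," against an expected 0.89%, is exactly the kind of spike Figure 2 depicts.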
In my interview with Nigrini, he pointed out that Benford’s Law can be used to help fraud examiners or auditors identify:
As simple and powerful as Benford's Law is to run on datasets, it does have a limitation, which stems from looking at large datasets in an aggregate, linear fashion. Nigrini recommends that fraud examiners use datasets of no fewer than 2,500, and ideally more than 5,000, transactions when running the first-two digits test. In large global companies, the 5,000 threshold is easy to meet, as journal entries can span hundreds of thousands, if not millions, of transactions. The challenge lies in spotting rogue behavior within a small business unit, a group of individuals within a geographic area or some other classification when looking at transactions in the aggregate. Risky transactions that don't match Benford's Law can easily be "washed out" by the sheer volume of today's transactional activity.
Data visualization tools like Tableau or Power BI can be helpful for filtering down large bodies of data on a case-by-case basis, using geography, expense type or other criteria. The goal is to home in on potential rogue activities within a subset of the aggregate data in a way that targets and detects anomalies across thresholds. However, this approach can still be time-consuming given the hundreds, if not thousands, of possible filtering combinations between geographies, business units, expense types, payment types and other factors used to drill into data.
But what if we could get a machine to run all these scenarios, instead of manually filtering one at a time?
In speaking with Nigrini about the challenge of running multiple filtering scenarios manually to find anomalies, we brainstormed ideas for automating the process so that it surfaces only the subsets driving the largest deviations from Benford's Law. Working with Kona AI's head of data science, Roopak K. Prajapat, we analyzed a vast dataset spanning several hundred thousand invoice payments totaling more than $3 billion in spend. The aggregate data seemed to follow Benford's Law rather nicely, with a few immediately noticeable deviations, such as in payments starting with the digits 20 and 21.
We then used BSDA and automation with elements of robotic process automation (RPA) to run the data across five distinct classifications to identify the subsets with the largest deviations from Benford’s Law, subject to the 5,000 minimum sample size constraint. The variables we ran during our testing included:
The model ran more than 17,808 combinations in about 21 minutes of data processing on commodity hardware to identify the subsets with the largest divergence. (Compare that to the time it would take to do it manually.)
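The article doesn't publish Kona AI's implementation, but the core BSDA loop can be sketched as follows: enumerate every combination of classification attributes, score each resulting subset's divergence from the expected first-two-digits distribution, enforce the 5,000-transaction minimum, and rank the subsets by divergence. This sketch uses Nigrini's mean absolute deviation (MAD) statistic as the divergence score; the function and field names are illustrative assumptions, not the production system.

```python
import math
from collections import Counter
from itertools import combinations

# Expected first-two-digits proportions under Benford's Law.
BENFORD_F2 = {n: math.log10(1 + 1 / n) for n in range(10, 100)}
MIN_SAMPLE = 5_000  # Nigrini's recommended minimum for the first-two digits test

def first_two(amount: float):
    """First two significant digits of an amount (leading zeros skipped)."""
    digits = "".join(c for c in f"{abs(amount):.6f}" if c.isdigit()).lstrip("0")
    return int(digits[:2]) if len(digits) >= 2 else None

def mad_score(amounts) -> float:
    """Mean absolute deviation between observed and expected proportions
    across the 90 first-two-digit bins (larger = more divergent)."""
    counts = Counter(ft for a in amounts if (ft := first_two(a)) is not None)
    total = sum(counts.values())
    return sum(abs(counts.get(n, 0) / total - p) for n, p in BENFORD_F2.items()) / 90

def bsda(rows, attributes):
    """Score every subset defined by each combination of attributes.

    rows: dicts with an 'amount' key plus categorical attribute keys.
    Returns (subset definition, size, MAD) tuples, most divergent first.
    """
    results = []
    for r in range(1, len(attributes) + 1):
        for combo in combinations(attributes, r):
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[a] for a in combo), []).append(row["amount"])
            for key, amounts in groups.items():
                if len(amounts) >= MIN_SAMPLE:  # skip subsets too small to test
                    results.append((dict(zip(combo, key)), len(amounts), mad_score(amounts)))
    return sorted(results, key=lambda t: t[2], reverse=True)
```

With five classification attributes, the nested loops above are what produce the thousands of subset combinations the model evaluated; automating the loop is what collapses hundreds of manual filtering passes into minutes of processing.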
The most anomalous subset turned out to be a certain vendor — we won’t name them here — with the attributes listed in the table below.
We then fed those transactions to a predictive machine-learning model to find statistically similar, "more-like-this" transactions and further enhance the results. The revised Benford's analysis lit up with all sorts of anomalies within this subset.
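The column doesn't specify the model behind the "more-like-this" step; one common approach to this kind of similarity search is to rank candidate transactions by their distance to the flagged transactions in a numeric feature space. A heavily simplified sketch (feature names and the centroid-distance ranking are illustrative assumptions):

```python
import math

def more_like_this(flagged_rows, candidates, features, k=10):
    """Rank candidate transactions by Euclidean distance to the centroid
    of the flagged transactions, over the given numeric features, and
    return the k most similar candidates."""
    centroid = [sum(r[f] for r in flagged_rows) / len(flagged_rows) for f in features]
    return sorted(
        candidates,
        key=lambda row: math.dist([row[f] for f in features], centroid),
    )[:k]
```

In practice the features would be scaled and a richer model (e.g., nearest neighbors over many engineered features) would be used, but the idea is the same: expand a small set of anomalous transactions into a larger review population of look-alikes.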
We provided those transactions to our customer, who deemed them anomalous enough to launch an investigation, which is ongoing. The investigators should review not only vendor CCUS10 but also the related subsets, such as transactions in 2023 and 2022, or expenses posted to the same expense or asset account.
Benford’s Law has been, and always will be, a useful fraud detection tool. Now, with the use of better technologies and automation, techniques like BSDA allow investigators to identify the most anomalous subsets quickly and efficiently, turning what was once a time-consuming process (seldom even attempted) into a powerful automated tool for uncovering hidden fraud patterns in big data.
Vincent M. Walden, CFE, CPA, is the CEO of Kona AI, whose company mission is to empower compliance, audit, and investigative professionals with research-driven, innovative, and effective analytics to measurably reduce global fraud, corruption and enterprise risk. He works closely with CFEs, internal auditors, compliance, audit, legal, and finance professionals and welcomes your feedback and ideas. Contact Walden at vwalden@konaai.com.