Fraud Basics

Fraud examiners have a plethora of data analytics tools

Data analytics techniques play a crucial role in fraud prevention, detection and investigation. According to the 2020 Report to the Nations (ACFE.com/RTTN), proactive data monitoring and analysis are among the most effective anti-fraud controls. Organizations that undertake proactive data analysis techniques experience frauds that are 33% less costly and detect frauds 1.5 times as quickly as organizations that don’t monitor and analyze data for signs of fraud. And as stated in the 2022 ACFE Fraud Examiners Manual, “When properly used, data analysis processes and techniques are powerful resources for uncovering fraud. They can systematically identify red flags and perform predictive modeling, detecting a fraudulent situation long before many traditional fraud investigation techniques would be able to do so.” (See ACFE.com/FEM, Section 3: Investigation/Understanding the Need for Data Analysis.)

The sophistication and complexity of fraud schemes are growing and outpacing conventional fraud prevention, detection and investigation techniques. Fraudsters are developing new strategies to commit such crimes as vendor fraud, employee expense fraud, financial statement fraud, bribery and asset misappropriation.

Global data volumes continue to grow exponentially, but we can quickly harness this data to identify unusual patterns or red flags that may have previously gone undetected. In this column, we draw on ACFE resources to discuss ways that management and fraud investigators can use data analytics to prevent, detect and investigate fraud.

Fraud data analytics is the science and art of discovering and analyzing patterns, identifying anomalies, and extracting other useful information in data underlying or related to fraud. Fraud data analytics isn’t a new topic to readers of Fraud Magazine. Indeed, the ACFE, through the years, has published many reports and articles on the role of data analytics in our jobs. For example, the Anti-Fraud Technology Benchmarking Report (ACFE.com/techreport), which the ACFE developed in partnership with SAS in 2019, found that using data analytics techniques, such as data visualization, predictive analytics, artificial intelligence and machine learning, is expected to grow considerably.

The Anti-Fraud Playbook (ACFE.com/fraudrisktools), developed in partnership with Grant Thornton, provides practical guidance to begin or advance fraud risk management programs and benchmark them against industry best practices. The playbook includes 10 plays, which are organized into five phases based on the fraud risk management principles in the ACFE/COSO Fraud Risk Management Guide (ACFE.com/fraudrisktools): fraud risk governance, fraud risk assessment, fraud control activities, fraud investigation and corrective action, and fraud risk management monitoring activities. Play 5 in the playbook says that data analytics is an important part of an effective and holistic fraud risk management program. We can easily implement many anti-fraud analytics tests with basic spreadsheet software. But the most advanced organizations are leveraging robotics, machine learning and artificial intelligence to enhance their anti-fraud analytics programs.

The Anti-Fraud Playbook encourages us to use the Anti-Fraud Data Analytics Tests interactive tool (ACFE.com/fraudrisktools) to help identify the red flags of various occupational fraud schemes. The tool is based on the structure of the ACFE Occupational Fraud and Abuse Classification System, also called the Fraud Tree (ACFE.com/fraudtree). Since the inception of the Report to the Nations (ACFE.com/RTTN) in 1996, the taxonomy of fraud schemes provided in the Fraud Tree has been an excellent way to study and practice fraud investigation. Accordingly, tying data analytics tests to the taxonomy benefits anyone responsible for preventing, detecting or investigating fraud.

The tool allows users to drill down to a specific scheme type and see data analytics tests that are relevant to that fraud risk. For example, if you were interested in data analytics techniques related to purchasing schemes, you’d click on “Corruption,” drill down to “Purchasing Schemes” and see the corresponding tests. (See the following figure.)


ACFE Anti-Fraud Data Analytics Tests — interactive tool example

Deploying data analytics in an anti-fraud program

So, how can management and fraud investigators specifically use data analytics to prevent, detect and investigate fraud? Here’s a suggested approach:

  1. Identify fraud risk factors.
  2. Identify areas susceptible to fraud schemes and potential fraud scenarios.
  3. Understand relevant data sources.
  4. Analyze data.
  5. Draw insights from data.
  6. Act on insights.

The ACFE Fraud Risk Assessment Tool (ACFE.com/frat) can help with the first two steps, as well as identify clients’ or employers’ vulnerabilities to internal fraud and develop fraud risk responses. A fraud scenario, not mining for data errors, drives fraud data analytics. The fraud scenario acts as the design plan for the fraud examiner, who uses it to create routines that search databases for transactions that meet the data profile for each fraud scenario. The red flags associated with each fraud scenario provide the basis for selecting transactions for investigation.
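As a minimal sketch of such a routine (pandas here; the vendor-fraud scenario, the approval threshold and the column names are all hypothetical assumptions, not ACFE-prescribed tests):

```python
import pandas as pd

# Hypothetical invoice extract; in practice this comes from the
# accounts payable system identified in steps 1 and 2.
invoices = pd.DataFrame({
    "vendor_id": [101, 101, 102, 103, 103],
    "invoice_no": ["A-17", "A-17", "B-02", "C-44", "C-45"],
    "amount": [4990.00, 4990.00, 1200.00, 4995.00, 350.00],
})

APPROVAL_THRESHOLD = 5000.00  # assumed approval limit in the scenario

# Red flag 1: the same vendor submits the same invoice number twice.
duplicates = invoices[invoices.duplicated(["vendor_id", "invoice_no"], keep=False)]

# Red flag 2: amounts just under the approval threshold (within 1%).
near_threshold = invoices[
    (invoices["amount"] >= APPROVAL_THRESHOLD * 0.99)
    & (invoices["amount"] < APPROVAL_THRESHOLD)
]

# Transactions matching the scenario's data profile are candidates
# for investigation, not proof of fraud.
flagged = invoices.loc[duplicates.index.union(near_threshold.index)]
print(flagged)
```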

Data analytics techniques are particularly applicable in steps 3 to 5 in our list above. Again, use the interactive ACFE Anti-Fraud Data Analytics Tests tool to drill down to a specific scheme type and see the data analytics tests relevant to that fraud risk.

Completing the data analytics tests requires specific procedures. Here’s a more detailed discussion of steps 3 to 5.

Step 3: Understand relevant data sources

Data consists of numbers, letters, words, images, voice recordings and more as measurements of a set of variables (characteristics of the subject or event that we’re interested in analyzing). Data — classified as structured or unstructured — is often described as the lowest level of abstraction from which we derive the most detailed information and then knowledge.

Raw data is often “dirty,” misaligned, overly complex and inaccurate. Ultimately, any data that detracts from the integrity of the entire dataset is considered dirty.

Use these preprocessing steps to organize data for analysis:

Data consolidation: Collect data, select data and integrate data. (See “What is data consolidation?” Stitch.) In this step, the relevant data is collected from the identified sources, the necessary records and variables are selected (based on an understanding of the data, unnecessary information is filtered out), and the records coming from multiple data sources are integrated/merged.

Data cleaning: Impute values, reduce “noise” and eliminate duplicates. (See “Understanding Data Cleaning,” by Krina Pajwani, Great Learning, April 26, 2021, and “Defining, Analysing, and Implementing Imputation Techniques,” by Shashank Singhal, Analytics Vidhya, Data Science Blogathon, June 21, 2021.)
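A brief pandas sketch of these two steps, assuming two hypothetical extracts that share a vendor_id key (the sources, columns and values are illustrative):

```python
import pandas as pd

# Hypothetical extracts from two source systems sharing a vendor_id key.
payments = pd.DataFrame({
    "vendor_id": [1, 2, 2, 3],
    "amount": [500.0, None, 750.0, 750.0],
})
vendors = pd.DataFrame({
    "vendor_id": [1, 2, 3],
    "vendor_name": ["Acme", "Beta Supply", "Gamma LLC"],
})

# Consolidation: integrate/merge records from the two sources.
merged = payments.merge(vendors, on="vendor_id", how="left")

# Cleaning: impute missing amounts with the median and drop exact duplicates.
merged["amount"] = merged["amount"].fillna(merged["amount"].median())
cleaned = merged.drop_duplicates()
print(cleaned)
```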

Data transformation: Normalize data, discretize data and create attributes. (See “Data Transformation in Data Mining,” by Prakash Chandra Patel, GeeksforGeeks, Feb. 3, 2020.) In this step, for example, the data is normalized between a certain minimum and maximum for all variables to mitigate the potential bias of one variable having large numeric values dominating other variables having smaller values.
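A minimal sketch of min-max normalization (the two variables and their values are hypothetical):

```python
import pandas as pd

# Hypothetical variables on very different scales.
df = pd.DataFrame({"amount": [100.0, 4990.0, 12500.0],
                   "days_to_pay": [5, 30, 90]})

# Min-max normalization rescales each variable to [0, 1] so that a
# variable with large values (amount) doesn't dominate one with
# smaller values (days_to_pay).
normalized = (df - df.min()) / (df.max() - df.min())
print(normalized)
```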

Data reduction: Reduce dimension and volume, and balance data. (See “Data Reduction,” Neha T, Binary Terms, Sept. 14, 2020.) Investigators generally like to have large datasets, but too much data can be a problem. In the simplest sense, we can visualize the data commonly used in predictive analytics projects as a flat file consisting of two dimensions: variables (the number of columns) and cases/records (the number of rows). In some cases (e.g., image processing and genome projects with complex microarray data), the number of variables can be rather large, and the analyst must reduce the number to a manageable size. Because the variables are treated as different dimensions that describe the phenomenon from different perspectives, this process is commonly called dimensional reduction (or variable selection).
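A minimal dimensional-reduction sketch using principal component analysis, one common technique (scikit-learn, with synthetic stand-in data; the dataset sizes and component count are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 200 cases described by 50 variables (columns).
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 50))

# PCA projects the 50 dimensions onto the 5 components that retain
# the most variance, reducing the data to a manageable size.
pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (200, 5)
print(pca.explained_variance_ratio_.sum())  # share of variance retained
```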

The value of data preprocessing is huge. It takes time, but the investment is worth it because your efforts will result in datasets that are ready for analysis.

Step 4: Analyze data

You’ll use statistics to analyze data with familiar tools, such as Excel, ACL, IDEA and SAS. Indeed, using statistics to characterize and interpret data is at the heart of many data analytics procedures. What tends to be different today is that more data is available for analysis, and data isn’t limited to single structured files. For example, you can now combine structured and unstructured data sources to create a single dataset that you can analyze with time-tested statistical procedures.
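For instance, here’s a short sketch (pandas/NumPy, with synthetic stand-in amounts) that characterizes a payment file with familiar summary statistics and flags values more than three standard deviations from the mean; the cutoff is illustrative, not a standard:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
# Hypothetical payment amounts: routine payments near $100 plus one outlier.
amounts = np.append(rng.normal(loc=100, scale=10, size=50), 2400.0)
payments = pd.DataFrame({"amount": amounts})

print(payments["amount"].describe())  # count, mean, std, quartiles, etc.

# Flag amounts more than three standard deviations from the mean
# as candidates for review.
z = (payments["amount"] - payments["amount"].mean()) / payments["amount"].std()
print(payments[z.abs() > 3])
```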

Data visualization software such as Power BI and Tableau can help you explore, understand and communicate data. (Data visualization involves transforming data into a visual format such as a map, graph or picture to better convey difficult concepts. See “Data Visualization. What it is and why it matters,” SAS Insights.) The use of visualization to analyze data isn’t new.

However, current visualization capabilities are more robust. For example, the ability to create dynamic visualizations that change as parameters and underlying data change facilitates real-time analysis. Analyzing data may also include relatively new fraud investigation techniques, such as supervised or unsupervised machine-learning techniques.

In supervised learning, the computer program (written in a language such as Python) is presented with sample inputs (such as loan application records) and their associated outputs (whether the loan application record relates to fraud). The goal is to devise a general rule that maps those inputs to outputs, such as treating customers with a related chargeback record as an indicator of fraud. You “train” a supervised model for fraud detection by presenting it with records associated with both fraudulent and legitimate activities. The model will then seek to define a function or instruction set that can predict the presence of fraud when applied to new data.
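A minimal supervised-learning sketch along these lines (scikit-learn; the features, labels and their relationship are synthetic stand-ins, not a real fraud model):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=2)

# Synthetic stand-in for labeled loan application records:
# columns = [amount, applicant_age, related_chargeback_score].
X = rng.normal(size=(500, 3))
# Label: 1 = fraud, 0 = legitimate; here fraud is loosely tied to the
# chargeback feature so the model has a pattern to learn.
y = (X[:, 2] + rng.normal(scale=0.5, size=500) > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training" presents the model with both fraudulent and legitimate
# records; it learns a function that maps inputs to the fraud label.
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

# The trained model predicts the presence of fraud on new data.
print(model.score(X_test, y_test))
```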

In unsupervised learning, the learning algorithm isn’t given any labels. The algorithm is on its own to find structure in the input, such as discovering hidden patterns in the data. Because you don’t know which data represents fraudulent activities, you want the model to create a function that describes the structure of the data, flags anything that doesn’t fit the norm as an anomaly, and then applies this knowledge to new and unseen data.
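A minimal unsupervised sketch using an isolation forest, one common anomaly-detection technique (scikit-learn, with synthetic stand-in data; the contamination rate is an assumption):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=3)

# Unlabeled transaction features: mostly routine activity plus a few
# records that don't fit the norm.
normal = rng.normal(loc=0, scale=1, size=(300, 2))
odd = rng.normal(loc=6, scale=1, size=(5, 2))
X = np.vstack([normal, odd])

# The model learns the structure of the data without labels and
# flags records that don't fit the norm as anomalies (-1).
model = IsolationForest(contamination=0.02, random_state=0)
labels = model.fit_predict(X)

print(np.where(labels == -1)[0])  # indices of flagged transactions
```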

As described in the Anti-Fraud Playbook, as you execute your analytic techniques and tests, it’s important that you iterate and modify them based on data quality, user feedback and test results. This ongoing process will require refining your models as needed to ensure the effectiveness of the techniques and the accuracy and relevance of the results.

Step 5: Draw insights from data

We analyze data to draw insights. The insights usually will relate to the questions we initially planned to answer. For example, because fraud scenarios drive fraud data analytics, insights will usually relate to the likelihood that these scenarios are occurring or could occur. However, in some cases, insights won’t relate to the initial questions. For example, you might use data analytics in a fraud investigation to determine whether predication exists, only to discover that the data indicates another fraud scenario.

Step 6: Act on insights

The findings from the fraud data analytics process will rarely be sufficient on their own to conclude that fraud is or isn’t occurring. Indeed, the fraud data analytics process is usually carried out to focus on areas that require additional testing, such as interviews and document review.

We hope our suggested approaches, in conjunction with these ACFE resources, will help management and fraud investigators use data analytics to prevent, detect and investigate fraud.

Martin J. Coe, DBA, Educator Associate, CPA, is a professor and chair of the data analytics program at Augustana College in Rock Island, Illinois. Contact him at martycoe@augustana.edu.

Olivia Melton, Educator Associate, CPA, is an assistant professor at Augustana College in Rock Island, Illinois. Contact her at oliviamelton@augustana.edu.
