Read Time: 5 Mins
Written By:
Anna Brahce
These three predictive analytical models can help you pinpoint fraud risk areas so you can make wise detection and deterrence decisions with available resources.

This article is excerpted and adapted from "Fraud Analytics: Strategies and Methods for Detection and Prevention," by Delena D. Spann; Copyright 2014 (John Wiley & Sons). Reprinted with permission of John Wiley & Sons Inc.
Sylvia, a fraud examiner with PIC Insurance Inc., has a challenge. As director of the company's special investigative unit, she's operating on a limited budget. Her boss wants her to come up with the company's biggest fraud risks so they can conduct a "triage" of the most serious problems. PIC has seen an increase in hit-and-run losses with injuries. The National Insurance Crime Bureau tells Sylvia that it has identified many organized auto accident rings around the U.S. that are perpetrating this fraud. How can Sylvia know if that and other schemes will increase, and, if so, how many resources should the company allocate to its anti-fraud program? The solution is predictive analytics and modeling.
Sylvia hires Team Stats LLC, which uses predictive analytics to help identify insurance claims that need further investigation versus those that PIC can process through the normal claims handling system. Team Stats begins by analyzing a historical data set of previously investigated hit-and-run losses with injuries. From that analysis, Team Stats selects 15 specific attributes to build a predictive model, which indicates that this fraud will increase in the next few years. PIC is convinced that it should devote more resources to combat hit-and-run losses.
This fictitious (but accurate) case shows the worth of predictive analytics in modeling future fraud risks.
THREE POPULAR PREDICTIVE MODELING METHODOLOGIES
For years, fraud examiners have used both fraud data analytics and predictive analysis to detect and predict fraud or suspicious activity. Fraud data analysis (see Fraud EDge) helps to identify past behavior and predictive analysis — or modeling — helps determine future behavior.
Fraud examiners can use predictive analytics to detect potential security threats, duplicate payments, insurance fraud and credit card fraud, and to establish patterns in high-crime areas, among other applications. Predictive analytics also reinforces a basic truth: fraud is always changing, so fraud-fighting methods should change with it.
In this article, I'll compare and contrast three popular predictive modeling methodologies: CRISP-DM 1.0, SAS with SEMMA, and the 13 Step Score Development. (See Figures 1 and 2 at left, and Table 1 below, respectively.) I don't have room here to delve into all the intricacies of the models, but I'll hit some of the major points.
OVERVIEW OF FRAUD ANALYTICS AND PREDICTIVE ANALYTICS
Fraud analytics examines historical evidence — data — to determine if and how fraud occurred, who was involved and when it occurred. In the past, the basic spreadsheet was the workhorse of fraud analytics. But the field has been transformed: we now have constantly evolving strategies, data-mining techniques and powerful software.
Conversely, with predictive analytics, fraud examiners take selected sets of variables known to have been involved in past fraud events and place those variables into processes to determine the likelihood that future outcomes or events will or won't be fraud.
Fraud examiners must use fraud analysis throughout predictive analytics: in development, in the collection of information, in deployment, and in the evaluation and assessment of results. Data analysis for fraud, however, doesn't require the use of predictive models.
Of course, incomplete or inaccurate data can create havoc with both processes. We could argue that predictive modeling is more dependent on data quality because it derives one of its greatest benefits from quick action based on results; if the results are tainted, time is lost. Fraud data analysis also values economical use of time, but repeating or fine-tuning a step doesn't disrupt that process as severely. Efficiency and efficacy are requisite factors in deterring, detecting, preventing, investigating and prosecuting fraudulent activity.
Data analysis has a standard linear process while predictive models have a non-linear design.
COMPARING AND CONTRASTING METHODOLOGIES
Each of the three predictive models has a different number of steps, and each begins with some type of objective. The SAS model (developed by SAS Institute Inc., a producer of statistics and business intelligence software) is an incomplete method unless and until it is combined with SEMMA (sample, explore, modify, model and assess). The steps in the SAS model are simpler than those in CRISP-DM, which are detailed and complex. (CRISP-DM — Cross Industry Standard Process for Data Mining — was developed by five companies as a European Union project.)
Another obvious difference among the three is that 13 Step uses a scoring strategy throughout the model, whereas CRISP-DM and SAS don't. (The 13 Step method was developed by Wesley Wilhelm and Alan Jost.)
Step 1 in SAS is to establish a business objective; step 1 in 13 Step is to create a model design plan to solve a business problem; and step 1 in CRISP-DM is to obtain a "business understanding." The first stage of both 13 Step and CRISP-DM specifically includes a directive to determine a business/project objective.
Understanding the business and objective seems critical to the success of a predictive model. Yet, SAS seems to miss much of the detail needed to adequately complete the first step.
Both CRISP-DM and 13 Step dive into understanding the organization and identifying what resources (including personnel, data, equipment and facilities) are available throughout the course of the project. These two models take the objective a step further to design a plan or strategy that outlines a timeline and the tasks that must be completed to achieve success. (See CRISP-DM 1.0, Step-by-Step Data Mining, by P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer and R. Wirth, CRISP-DM Consortium, 2008.)

A surprising difference that occurs at the beginning of these models is the amount of planning and preparation required for the process. SAS keeps step 1 extremely simple by merely requiring the modeler to determine the type of business decision that needs to be automated and which modeling techniques are appropriate for the problem. Only CRISP-DM takes into account risk and contingency planning. With such a challenging task ahead, it seems unthinkable to proceed without having a back-up plan in place. Additionally, for any organization to agree to execute some type of predictive modeling experiment, the executives will want to see a cost-benefit analysis prior to making any decisions; again, CRISP-DM is the lone model to provide this information.
The next couple of steps of each model correlate to multiple steps in the other models, each involving aspects of the data. For instance, step 2 of SAS is data management, but step 3 — model development — also includes stages for handling the data. Steps 2 through 7 of 13 Step all involve the data. Steps 2 and 3 of CRISP-DM play an important role with the data.
Obviously, one of the most important parts of predictive modeling is selecting and collecting the data. This task is performed in the second stage of each of these methodologies. Another highly important task involving the data is the act of cleaning it and ensuring it's of high quality. This vital step corrects any data-entry errors, removes discrepancies from third parties and eliminates any missing values. (See "Operationalizing Analytic Intelligence: A Solution for Effective Strategies in Model Deployment," SAS Institute, 2005.)
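The cleaning step described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the field names, the duplicate record and the data-entry typo are all invented for the example, not drawn from any of the three methodologies.

```python
# Hypothetical claims records; field names and values are illustrative only.
raw_claims = [
    {"claim_id": "C1", "amount": "1200.00", "state": "IL"},
    {"claim_id": "C1", "amount": "1200.00", "state": "IL"},  # duplicate entry
    {"claim_id": "C2", "amount": "950.5O", "state": "il"},   # typo: letter O for zero
    {"claim_id": "C3", "amount": "", "state": "WI"},         # missing value
]

def clean(records):
    seen, cleaned = set(), []
    for r in records:
        if r["claim_id"] in seen:                # drop verbatim duplicates
            continue
        seen.add(r["claim_id"])
        amt = r["amount"].replace("O", "0")      # correct a known data-entry error
        cleaned.append({
            "claim_id": r["claim_id"],
            "amount": float(amt) if amt else None,  # flag missing values, don't guess
            "state": r["state"].upper(),            # normalize formatting
        })
    return cleaned

claims = clean(raw_claims)
```

In a real engagement this logic would be driven by data-quality rules agreed on with the business, but the shape of the work — deduplicate, correct, normalize, flag what's missing — is the same.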
According to "Operationalizing Analytic Intelligence," both SAS and CRISP-DM seem to hold the original data to a higher standard. SAS suggests that the modeler explore the data to find patterns, segments, anomalies and abnormalities.
During the data understanding phase of CRISP-DM, the modeler is also tasked with exploring the data "using querying, visualization and reporting." These two methodologies also take into consideration integrating data from several different sources.
Meanwhile, the 13 Step model places more emphasis on existing variables and creating derived variables. Granted, the other models also utilize derived variables (in the modify phase of SEMMA in SAS and during the construct data phase of step 3, "data preparation," in CRISP-DM). However, these are small pieces of the respective modeling steps. In 13 Step, the derived variable gets attention in four separate steps: 3, 4, 6 and 7.
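As a rough illustration of what a derived variable looks like in practice, the sketch below computes two insurance-style indicators from a hypothetical claim record. The field names and the specific ratios are assumptions for the example, not variables prescribed by the 13 Step model.

```python
from datetime import date

# Hypothetical claim record; fields are invented for the example.
claim = {
    "policy_start": date(2013, 1, 15),
    "loss_date": date(2013, 2, 1),
    "claim_amount": 9000.0,
    "annual_premium": 600.0,
}

def derive(c):
    """Build derived variables from existing fields."""
    return {
        # A loss shortly after policy inception can be a fraud indicator.
        "days_to_loss": (c["loss_date"] - c["policy_start"]).days,
        # A claim that dwarfs the premium may merit a closer look.
        "amount_to_premium": c["claim_amount"] / c["annual_premium"],
    }

features = derive(claim)
```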
When it comes to the data, each of these models also allows for the creation, formatting and substitution of missing values, as long as it can be explained why and how those variables were created.

Once the data has been prepared, each methodology moves into a modeling stage. Step 3 of SAS is model development, step 4 of CRISP-DM is modeling, and steps 8 and 9 of 13 Step are "define outcome variable and modeling technique," and "build statistical model," respectively. Upon closer examination, the similarities between these modeling steps become visible. The first step for each model during this phase is to select the appropriate techniques and tools. Some of the common techniques employed include linear regression, decision trees, neural networks and traditional statistics. It's not uncommon for a modeler to select several techniques or tools to use simultaneously during this period — primarily because not all techniques are suitable for all types of data, especially when various constraints are present.
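To make the modeling stage concrete, here is a minimal sketch of one of the traditional techniques mentioned above: a one-variable logistic regression fit by gradient descent. The feature values and fraud labels are fabricated for illustration; a real project would apply one or more of the techniques and tools the methodologies describe to the prepared data.

```python
import math

# Toy training data: (derived feature value, 1 = claim previously confirmed fraudulent).
# All values are fabricated.
data = [(0.2, 0), (0.4, 0), (0.5, 0), (1.5, 1), (2.0, 1), (2.5, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit a one-variable logistic regression by stochastic gradient descent.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    for x, y in data:
        p = sigmoid(w * x + b)   # current predicted fraud probability
        w -= lr * (p - y) * x    # gradient step on the weight
        b -= lr * (p - y)        # gradient step on the bias

def fraud_score(x):
    """Estimated probability that a claim with feature value x is fraudulent."""
    return sigmoid(w * x + b)
```

Claims scoring high would be routed to the special investigative unit; the rest flow through normal claims handling, which is exactly the triage Sylvia's scenario calls for.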
After the fraud examiner has chosen techniques and tools, he or she is ready to build the predictive model. Each of the three methodologies implements a step to validate or test the model. Although CRISP-DM seems to duplicate this process by creating a "test design" of the model to confirm the quality and validity of the model, it then requires an assessment after the model is built. Meanwhile, the other methodologies just experiment with the model in a test environment.
The next phase in the modeling stage is to assess and/or evaluate the model.
Again, each of the three methodologies performs some type of an assessment of the model. SAS incorporates this action into step 3, CRISP-DM places it in both step 4 and step 5 (evaluation), and 13 Step summarizes and documents the process at steps 10 and 11. During this assessment, the modeler might confront newly discovered issues, such as in SAS; review the qualities of the model, make necessary revisions and ensure that it meets the stated business objective, such as in CRISP-DM; or summarize the results to determine if the model should even be deployed, as done in 13 Step.
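A simple way to picture the assessment step is to compute precision and recall on a holdout set, as sketched below. The holdout results are fabricated, and these two metrics are one reasonable choice among many; none of the three methodologies mandates specific measures.

```python
# Hypothetical holdout results: (model flagged the claim?, claim actually fraudulent?)
holdout = [(True, True), (True, False), (False, False),
           (True, True), (False, True), (False, False)]

def assess(results):
    """Precision and recall: two common criteria when assessing a fraud model."""
    tp = sum(1 for flagged, fraud in results if flagged and fraud)
    fp = sum(1 for flagged, fraud in results if flagged and not fraud)
    fn = sum(1 for flagged, fraud in results if not flagged and fraud)
    precision = tp / (tp + fp)  # of the claims flagged, how many were fraud
    recall = tp / (tp + fn)     # of the fraudulent claims, how many were flagged
    return precision, recall

precision, recall = assess(holdout)
```

Whether these numbers are good enough to justify deployment is the business judgment the assessment step exists to inform.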
If appropriate, the next step in the process would be to move the model into a live environment. This stage is identified at step 4, "model deployment," of SAS; step 6, "deployment," of CRISP-DM; and step 12, "implement the model in production," of 13 Step. According to some, deployment can be the most time-consuming stage in predictive modeling. The modeler must create a plan and strategy to deploy the model and ensure that it's operating as expected. (See "Operationalizing Analytic Intelligence.")
While deployment sounds like the last step, there's still one more piece that completes the methodologies. The final steps are "model management," step 5 of SAS; "track model performance," step 13 of 13 Step; and another piece of deployment, step 6 in CRISP-DM. These concluding steps provide for maintenance of the models, which ensures that they are making the right decisions.
The fraud examiner may prepare a final report discussing the results: what went right and which areas need improvement. In both SAS and CRISP-DM, it seems that if, over time, new data becomes available or new variables are identified, the fraud examiner can alter the model to accommodate them. In fact, with reviews in virtually every step of CRISP-DM, the process could start over at any time. In SAS, the fraud examiner wouldn't go all the way back to the beginning but rather start over at the second step. This same cyclical process isn't apparent in step 13 or any other step of the 13 Step model.
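The model-management idea in these concluding steps can be pictured as a simple performance monitor: if the deployed model's detection rate stays below an agreed baseline for several periods, flag it for review or retraining. The threshold logic and the numbers below are hypothetical.

```python
# Hypothetical monthly detection rates for the deployed model.
# A sustained drop suggests fraud patterns have shifted and the model needs rework.
history = [0.82, 0.81, 0.80, 0.71, 0.65]

def needs_retraining(rates, baseline=0.80, tolerance=0.05, window=2):
    """Flag the model if performance stays below baseline for `window` periods."""
    recent = rates[-window:]
    return all(r < baseline - tolerance for r in recent)

flag = needs_retraining(history)
```

Requiring a sustained drop (rather than a single bad month) avoids retraining in response to ordinary noise, which mirrors the maintenance-and-review spirit of these final steps.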
Overall, the SEMMA portion of the SAS model is an easier and faster model to utilize than CRISP-DM. SAS is more concise and less costly, yet CRISP-DM provides more finality. While the tasks of each step differ across the three models, they basically cover very similar activities and ultimately strive to accomplish the same goal: predicting future occurrences of a particular incident.
The 13 Step model is a tedious, time-consuming and costly model. It will be successful for certain business objectives, such as estimating the likelihood of a person's debt from monthly expenditures, or predicting whether and when a credit card account will become delinquent. Success depends on the operational definition and variables the fraud examiner chooses earlier in the model.
It's not surprising that we have an abundance of predictive models given our need for different options of the ordering of the steps, items that need additional focus and varying techniques. Some are better than others, but none are perfect all the time. (I've borrowed steps from several predictive analytic models to create a hybrid predictive analytic model. Table 1 is a high-level overview of the steps contained in the various models. Table 2 is an overview of comparisons between fraud analytics and predictive analytics.)
Any of these models, or a combination of the three, will help you set goals and write budgets as you decide where to place your major anti-fraud efforts.
Delena D. Spann, CCA, is employed with a U.S. federal law enforcement agency. She's a former member of the ACFE Board of Regents.