U.S. seeks to ban Huione Group for money laundering and more
Read Time: 4 mins
Written By:
Crystal Zuzek
In the last column, we began a discussion of leading-edge technologies that can provide significant amounts of useful information to fraud examiners. We also introduced the collaboration between data mining and digital forensics, which is driven by the increasing volume of both structured and unstructured data; unstructured data alone can account for as much as 80 percent of the total data in an organization. Now we address text mining (sometimes called text analytics) and discuss its characteristics and possible applications in fraud cases. We look at the components of text mining and how practitioners might use these methods to analyze large data sets and produce information that achieves the fraud examiner's goals.
Some of the more commonly used sources of unstructured data in an examination include:
While there are many other potential sources, experience has shown these to be the most common in corporate examinations.
Email, chief among unstructured data sources in fraud examinations, contains not only word-for-word communications but also a date/time element, metadata and even emotional tones expressed through idioms, phrases and adjectives. Fraud examiners can use these components to analyze the tone of the communications and the personalities of the communicators.
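To make those components concrete, here is a minimal Python sketch, assuming messages are available as raw RFC 5322 text (for example, an exported .eml file). It uses only the standard-library `email` package to pull out the sender, recipients, timestamp, subject, message ID and plain-text body. The sample message, its addresses and the function name `extract_email_components` are hypothetical.

```python
from email import policy
from email.parser import Parser

# Hypothetical sample message, for illustration only.
RAW_MESSAGE = """\
From: Pat Buyer <pat.buyer@example.com>
To: Vendor Sales <sales@vendor.example>, Lee Audit <lee@example.com>
Date: Mon, 03 Jun 2019 16:55:00 -0500
Subject: RFP 42 pricing
Message-ID: <abc123@example.com>

Don't worry, I'll take care of it before the deadline.
"""


def extract_email_components(raw):
    """Split a raw RFC 5322 message into the elements an examiner reviews."""
    # policy.default gives structured header objects (addresses, datetimes).
    msg = Parser(policy=policy.default).parsestr(raw)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "sender": msg["From"].addresses[0].addr_spec,
        "recipients": [a.addr_spec for a in msg["To"].addresses],
        # Timezone-aware datetime, so messages can be placed on one timeline.
        "sent_at": msg["Date"].datetime,
        "subject": str(msg["Subject"]),
        "message_id": str(msg["Message-ID"]),
        "body": body.get_content() if body is not None else "",
    }
```

Keeping the timezone-aware timestamp, rather than a local clock string, makes it possible to line messages up with other events later in the analysis.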
The contents of computer hard drives include not just email but also documents, audio and video, caches of Internet activity, discarded instant messaging and chat sessions, deleted content and overlooked backup and temporary copies of items. Digital forensics technologies can preserve, identify and produce these obscure items.
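One common way to preserve and identify such items is to record a cryptographic hash for each file, so any later copy can be checked against the original. The Python sketch below is illustrative only, not a forensic tool: it assumes read access to a copied directory tree (a real acquisition would work from a verified forensic image), and the `LEFTOVER_SUFFIXES` set and the `~$`/trailing-`~` name checks are hypothetical heuristics for spotting backup and temporary copies.

```python
import hashlib
from pathlib import Path

# Hypothetical naming patterns for overlooked backup/temporary copies;
# an engagement would tune these to the applications in scope.
LEFTOVER_SUFFIXES = {".bak", ".tmp", ".old", ".swp"}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large evidence files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(root):
    """Record a SHA-256 hash for every file under root and flag likely leftovers."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        name = path.name
        records.append({
            "path": str(path.relative_to(root)),
            "sha256": sha256_of(path),
            "size": path.stat().st_size,
            # Office owner files start with "~$"; many editors append "~".
            "leftover": path.suffix.lower() in LEFTOVER_SUFFIXES
            or name.startswith("~$")
            or name.endswith("~"),
        })
    return records
```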
Handling the sheer volume and complexity of unstructured data requires special tools and processes. Much of the useful, relevant material is often human communication, so analysis shouldn't be limited to keyword searches. Extracting meanings and topics, gauging the emotional tones of conversations, and building relationship networks that visualize how key players and topics interact, influence one another and evolve over time can give fraud examiners critical information that isn't otherwise apparent.
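The relationship-network idea can be sketched from email metadata alone. This minimal Python example assumes each message has already been reduced to a hypothetical `(sender, recipients, date)` tuple; it counts directed sender-to-recipient edges overall and per month (so changes over time are visible) and ranks people by how many distinct counterparts they exchange mail with. A full analysis would typically hand these edges to a graph or visualization tool.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical messages reduced to (sender, recipients, date sent).
SAMPLE = [
    ("pat", ["vendor_a", "lee"], date(2019, 5, 2)),
    ("pat", ["vendor_a"], date(2019, 6, 3)),
    ("vendor_a", ["pat"], date(2019, 6, 3)),
    ("lee", ["kim"], date(2019, 6, 4)),
]


def build_network(messages):
    """Count directed sender->recipient edges, overall and per (year, month)."""
    edges = Counter()
    by_month = defaultdict(Counter)
    for sender, recipients, sent in messages:
        for recipient in recipients:
            if recipient == sender:
                continue  # ignore messages people copy to themselves
            edges[(sender, recipient)] += 1
            by_month[(sent.year, sent.month)][(sender, recipient)] += 1
    return edges, by_month


def most_connected(edges, top=3):
    """Rank people by number of distinct counterparts, ties broken by name."""
    partners = defaultdict(set)
    for sender, recipient in edges:
        partners[sender].add(recipient)
        partners[recipient].add(sender)
    return sorted(partners, key=lambda p: (-len(partners[p]), p))[:top]
```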
Figure 1 provides a conceptual overview of the family of processes that make up the core of text mining. These components encompass the science of "natural language processing" and the related concepts of "latent semantic analysis" and "concept searching," among others. Experience has shown us that these components, working together, can be an effective toolset for identifying relevant evidence in a fraud examination.
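To illustrate how latent semantic analysis supports concept searching, here is a minimal Python sketch using NumPy's SVD. It builds a term-document count matrix, keeps the top `k` singular dimensions, folds the query into that reduced "concept" space and ranks documents by cosine similarity. The whitespace tokenizer, the raw counts (practical systems usually apply a weighting such as TF-IDF) and the toy corpus in the usage note are assumptions for illustration.

```python
import numpy as np


def _tokens(text):
    """Naive tokenizer: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()


def _cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def lsa_concept_search(docs, query, k=2):
    """Rank docs against a query in a k-dimensional latent semantic space."""
    vocab = sorted({w for d in docs for w in _tokens(d)})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document count matrix A: rows are terms, columns are documents.
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in _tokens(doc):
            A[index[w], j] += 1

    # Truncated SVD: A is approximated by U_k diag(s_k) Vt_k.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    # Column j of diag(s_k) @ Vt_k equals U_k.T @ A[:, j], i.e. document j
    # projected into concept space; transpose so each row is one document.
    doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T

    # Fold the query into the same space; unknown words are ignored.
    q = np.zeros(len(vocab))
    for w in _tokens(query):
        if w in index:
            q[index[w]] += 1
    q_vec = U_k.T @ q

    scores = [_cosine(q_vec, d) for d in doc_vecs]
    ranking = sorted(range(len(docs)), key=lambda j: -scores[j])
    return ranking, scores
```

With the four hypothetical documents "kickback vendor payment", "bribe vendor payment", "holiday party schedule" and "party schedule lunch", a query for "kickback" with `k=2` scores the "bribe" document about as high as the one that literally contains "kickback", because the two share context; a plain keyword search would miss it.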
Figure 1: Text-mining overview
Figure 2: Topic maps
The data and information gathered, analyzed and produced through text mining provide even more value to a fraud examiner when combined with data analytics procedures applied to structured data. Named entities, email senders/recipients and relationships provide further insight into how employees, customers and vendors interact. The relationship map in Figure 3 becomes even more robust when it also includes relationships identified in structured data (employee-vendor, employee-customer, vendor-customer, etc.) based on common attributes such as name, address, phone number or tax identification number. In some instances, analyzing structured and unstructured data independently produces two interesting but possibly incomplete results. Combining both into a single analysis provides a more complete picture.
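Matching on common attributes can be sketched in a few lines. This Python example assumes employee and vendor master files have been loaded as lists of dictionaries with hypothetical `id`, `address`, `phone` and `tin` keys. It normalizes each value (digits only for phone and tax identification numbers; lowercase, punctuation and extra spaces removed for addresses) and reports every employee-vendor pair that shares one. Real address matching usually needs more robust standardization than this.

```python
import re
from collections import defaultdict

DIGIT_FIELDS = {"phone", "tin"}


def normalize(field, value):
    """Normalize a value so trivially different spellings still match."""
    if field in DIGIT_FIELDS:
        return re.sub(r"\D", "", value)  # keep digits only
    cleaned = re.sub(r"[^\w\s]", "", value.lower())  # drop punctuation
    return " ".join(cleaned.split())  # collapse runs of whitespace


def shared_attributes(employees, vendors, fields=("address", "phone", "tin")):
    """Return (employee_id, vendor_id, field) for every attribute they share."""
    index = defaultdict(list)
    for vendor in vendors:
        for field in fields:
            key = normalize(field, vendor.get(field) or "")
            if key:  # skip blank values so empty fields never "match"
                index[(field, key)].append(vendor["id"])

    hits = []
    for employee in employees:
        for field in fields:
            key = normalize(field, employee.get(field) or "")
            if not key:
                continue
            for vendor_id in index.get((field, key), []):
                hits.append((employee["id"], vendor_id, field))
    return hits
```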
Incorporating the dates/times of email, document creation, social media postings and computer-based activities (downloads, uploads, deletions, etc.) produces a chronology of events that can help examiners analyze transactions further and identify possible correlations or causal links. For example, analyzing a purchasing director's communications during a request for proposal (RFP) process can reveal indications of potential bid rigging. Red flags may include email or phone communications with the winning vendor minutes before the submission deadline, or a specific vendor winning each time a particular individual sits on the RFP evaluation committee.
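Both red flags in the bid-rigging example can be expressed as simple rules once the chronology exists. The Python sketch below assumes hypothetical record shapes: messages as dictionaries with `from`, `to` and `sent` keys, and RFPs with `deadline`, `committee` (a set of names) and `winner` (the winning vendor's email domain). The 30-minute window and the rule that a pattern requires at least two RFPs are illustrative thresholds, not established standards. Phone records could be screened the same way if available.

```python
from datetime import timedelta


def late_contacts(messages, rfp, sender, window=timedelta(minutes=30)):
    """Messages from `sender` to the winning vendor shortly before the deadline."""
    start = rfp["deadline"] - window
    return [
        m for m in messages
        if m["from"] == sender
        # Compare the recipient's email domain with the winner's domain.
        and m["to"].rsplit("@", 1)[-1].lower() == rfp["winner"]
        and start <= m["sent"] <= rfp["deadline"]
    ]


def always_wins_with(rfps, person, min_rfps=2):
    """Return the vendor that won every RFP `person` evaluated, else None."""
    seated = [r for r in rfps if person in r["committee"]]
    winners = {r["winner"] for r in seated}
    # Require several RFPs so a single award is not reported as a pattern.
    if len(seated) >= min_rfps and len(winners) == 1:
        return winners.pop()
    return None
```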
Figure 3: Complex relationship mapping
The growth of unstructured data is a key driver of the need for collaboration between data analytics and digital forensics. Text mining is the overarching name for the family of functions used to analyze unstructured data and isolate the useful data elements for inclusion in data analytics processes. At first, incorporating unstructured data into investigations is a daunting task. The volume and complexity of the data to be analyzed pose a challenge, one best met through the collaboration of data analytics and digital forensics. The result of this collaboration is a comprehensive analysis that fully integrates both structured and unstructured data. It's a process well worth the effort. Eric Berlow, ecologist, network scientist and Technology, Entertainment and Design (TED) Senior Fellow, summarized the need to embrace complexity in his July 2010 TED Talk: "We're discovering in nature that simplicity often lies on the other side of complexity. So for any problem, the more you can zoom out and embrace complexity, the better chance you have of zooming in on the simple details that matter most."
In the next issue of Fraud Magazine, we'll focus on two methods that address the need to leverage technology in an investigation. Augmented intelligence uses machine learning to analyze unstructured data more effectively and efficiently. Data visualization uses visual analytics to analyze data and communicate results to end users.
Les Heitger, Ph.D., Educator Associate, is BKD Distinguished Professor of Forensic Accounting in the School of Accountancy at Missouri State University in Springfield. He's chair of the ACFE Higher Education Advisory Committee.
Jeremy Clopton, CFE, CPA, ACDA, is senior managing consultant in the Forensics Practice of BKD, LLP.
Lanny Morrow, EnCE, is a managing consultant in the Forensics Practice of BKD, LLP.