You may not take the lead in a computer forensic investigation, but knowledge of binary files and metadata will help you understand some methods of catching fraudsters.
On March 26, 1999, Melissa began her worldwide rampage.
Computer cracker David L. Smith unleashed the virus with the feminine name, which crippled network systems and cost businesses an estimated $80 million or more.
I want to teach you some of the same forensic computing investigation techniques that helped catch Smith and convict him of interruption of public communication, theft of computer service, and wrongful access to computer systems. These are advanced techniques that you can use to find the evidence you need to convict fraudsters in your business or those of your clients.
Mystery Dispelled
For many fraud examiners, the gathering of electronic evidence from a suspect's hard drive can appear to be an arcane business. Though many turn to specialists to search computers, those in charge of investigations should understand the methodology of evidence gathering and presentation.
Computer forensics isn't as mystical as it may appear. I'd like to walk you through a case I recently worked on to examine the types of information electronic documents can contain. We'll focus on the software package you probably use - Microsoft Office.
Market research showed two things. Until the middle of 2001, Office 97 was still the most popular application package in its class, with about half of the Microsoft Office market. Microsoft Office versions together held about 90 percent of the market for office software.1
Electronic Evidence: An Example
An international German construction company, which we'll rename Baugeschäft AG,2 suspected that one of its managers was conducting illegal or corrupt business practices during a large building project in a Latin American country.
The construction company called in the Big 5 firm for which I was working. We took an image of the company laptop used by the manager.3 We examined the paper and electronic evidence and found "soft" copies of invoices from three service providers. This alone was suspicious. Invoices are submitted for payment by post and should never be found on a hard drive in "soft" form, because editable copies are an invitation to fraud.

The suspect in this case had altered or created new invoices on his company laptop. He routed payments to these companies via an alias account with an offshore bank. We also found that the manager had dealings with a consultant whose company had a dummy address in a Latin American country, although the consultant lived and worked in Germany. This consultant was paid through the same offshore bank and accompanied the manager on two trips to the bank. About $1,000,000 was in the alias account at the time of these visits. Total obligations to the service providers under suspicion amounted to $2,600,000.
Using the electronic evidence in the form of the binary files of Word documents,4 we proved that the manager had altered certain invoices to add extra payments. We recovered the evidence in the form of deleted documents or scraps of documents in free space from the manager's home directory on the company server and the image of the company laptop.5 One incriminating invoice interested us in particular; we'll focus on it now.
Three Versions of the Same Invoice
[Figures referenced in this section are no longer available. — Ed.]
We found three versions of the same invoice in different stages of development from one of Baugeschäft AG's service providers, Claude Bouffant of B. Utility S.A. The invoice, submitted by the manager, Johann Dachs, contained two requests to transfer funds to two different accounts for a total of $990,000. (See Figure 1.)
We used a popular forensic software package to do a name search of the captured image. The search revealed an intermediary version of this invoice in the free space of the computer's hard drive. (See Figure 2.) The file was a data scrap on the hard drive and had no file name or other identifying property.6 This document differed from the submitted invoice in two ways. The bank and recipient details were incomplete, and the location had not yet been inserted.
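A name search like the one above can be sketched in a few lines. This is a minimal illustration, not the method of the commercial package we used. It rests on four assumptions:

* the evidence is a raw bit-for-bit image file;
* sectors are 512 bytes;
* the search is case-sensitive;
* the term may be stored either as 8-bit Windows text or as UTF-16LE, since Word 97 can store text either way.

The function name `find_term` is hypothetical. The sector number is reported in hexadecimal, like the sector number in note 6.

```python
SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


def find_term(image_path, term, chunk_size=1 << 20):
    """Return sorted (byte_offset, sector_hex, encoding) hits for `term` in a raw image."""
    patterns = {
        "cp1252": term.encode("cp1252"),        # 8-bit Windows text
        "utf-16-le": term.encode("utf-16-le"),  # Unicode text as Word 97 may store it
    }
    keep = max(len(p) for p in patterns.values()) - 1  # tail carried between chunks
    hits = set()  # a set, because matches inside the carried tail are seen twice
    with open(image_path, "rb") as image:  # the image is only ever read
        base, buf = 0, b""  # base = offset in the image of buf[0]
        while chunk := image.read(chunk_size):
            buf += chunk
            for encoding, pattern in patterns.items():
                start = buf.find(pattern)
                while start != -1:
                    offset = base + start
                    hits.add((offset, format(offset // SECTOR_SIZE, "X"), encoding))
                    start = buf.find(pattern, start + 1)
            if len(buf) > keep:  # keep only the tail so boundary-spanning hits survive
                base += len(buf) - keep
                buf = buf[len(buf) - keep:]
    return sorted(hits)
```

For example, `find_term("laptop.img", "Bouffant")` would list every place the provider's name occurs, whether inside a live file or in free space. The file name is hypothetical. Reading in chunks keeps memory use flat even for a whole-disk image.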
The file had probably been stored as a temporary file and then "forgotten" by the operating system. Temporary data is typically stored in ".TMP" files. It is created when the program needs to store data temporarily on disk or when the program is interrupted. When the user then saves the changes, the temporary file is usually deleted. However, this deletion merely removes the name of the file. The information remains on the hard drive until it is overwritten by more recent data. This "scrap" data is unreachable and unseen by the normal operating system.
In this example, it appears as if Herr Dachs left the incomplete document open, possibly went on a lunch break, and returned to complete the document changes later. (Another possible reason for the incomplete document was a temporary instability in the operating system.) Dachs probably wasn't aware of this temporary storage.
Fortunately, he wasn't an expert user of his laptop. He hadn't emptied his Recycle Bin, so we found his "deleted" documents there. (Even if he had emptied the Recycle Bin, the data would remain on his hard drive as a scrap and could be recovered if not overwritten.)

Under the Windows operating systems, a normal deletion doesn't physically destroy documents. It renames them and moves them to a special folder. If this folder isn't "emptied," the user can easily restore the documents.

While searching for the service provider's name, we found what was probably the earliest version of the submitted invoice on the image of the hard drive. (See Figure 3.) In this version, the wording was different and the total billed was less.7 This document was probably the original invoice, which Dachs had deleted.
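To see what the Recycle Bin keeps about each deleted file, the sketch below reads the INFO2 index file used by Windows 95 through XP. The record layout it assumes is the widely published, unofficial one, not a Microsoft specification:

* a 20-byte header with the record size at offset 12;
* 280-byte records on Windows 9x;
* 800-byte records on NT/2000, which add a Unicode copy of the path.

The helper name `parse_info2` is hypothetical.

```python
import ntpath
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)  # zero point of Windows FILETIME


def parse_info2(data):
    """Yield one dict per record of a Recycle Bin INFO2 file (assumed Win95-XP layout)."""
    record_size = struct.unpack_from("<I", data, 12)[0]
    if record_size not in (280, 800):  # 280 = Windows 9x, 800 = Windows NT/2000/XP
        raise ValueError(f"unexpected INFO2 record size {record_size}")
    for pos in range(20, len(data) - record_size + 1, record_size):
        ansi = data[pos:pos + 260].split(b"\x00", 1)[0].decode("cp1252", "replace")
        # record index, drive number (0 = A:, 2 = C:), deletion time, file size
        index, drive, filetime, size = struct.unpack_from("<IIQI", data, pos + 260)
        path = ansi
        if record_size == 800:  # NT-family records add a 520-byte Unicode path
            wide = data[pos + 280:pos + 800].decode("utf-16-le", "replace")
            path = wide.split("\x00", 1)[0] or ansi
        yield {
            "original_path": path,
            # the file now sits in the Recycle Bin folder as D<drive letter><index><extension>
            "recycled_as": f"D{chr(ord('a') + drive)}{index}{ntpath.splitext(path)[1]}",
            "deleted_utc": FILETIME_EPOCH + timedelta(microseconds=filetime // 10),
            "size": size,
        }
```

Each record links a renamed file such as "Dc1.doc" back to its original path and deletion time. That is the kind of context an examiner needs when arguing which version of a document a suspect discarded.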
Dachs' submitted invoice from Bouffant asked for $990,000 to be transferred in two payments to two different banks: one for $640,000 and another for $350,000. The version that we found, however, asked for only one transfer of $640,000 to one bank. We assumed that Bouffant had transmitted the invoice to Dachs electronically, via e-mail or on diskette. Johann Dachs was promptly dismissed.
Metadata Within Office Documents
As many computer users know, office documents contain hidden information. Some of it concerns formatting. Some of it describes the file itself, such as:

* the document size;
* when it was created or modified;
* the name of the author and his or her company.

Some of this "data about data," called metadata, can be seen and edited within Microsoft Office through the File > Properties menu option. Not all of it, however, can be extracted and viewed with the standard Office programs. The most thorough way to examine this metadata is to inspect the binary file, the "raw" data.
Figure 4 shows how Dachs' doctored invoice appears when viewed in a text viewer such as Notepad.
Around the text are characters that Word uses to describe structure and format. (At the bottom of the screen is the word "Standard," the German name for the "Normal" style.)

Toward the end of this file were the previous save paths of the document (Figure 5). They revealed the original author (Claude Bouffant) and the original name of the document. (I renamed it "Eingereichte Rechnung," German for "submitted invoice.") Bouffant had originally saved the document under the name of a different client. He later saved it under the name of the German client, Baugeschäft AG. This indicated that he had used another document as a template, a normal practice for a reusable invoice.

The document had been transmitted to Johann Dachs, who then saved it onto the company's local server under a new name. Here the path appears as SSSSS\\LOCAL SERVER\DACHSJOH03.... The new name included the provider's name, which was rather unusual, because one wouldn't normally send a document named after oneself.
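The save paths in Figure 5 look spaced out in a text viewer because each character is stored as two bytes (UTF-16LE), the second byte being zero. The minimal sketch below pulls such strings out of a document's binary file. It makes three assumptions:

* the text falls in the Latin-1 range, which covers German umlauts;
* any string containing a backslash is a candidate path. This is a heuristic, not a parser of Word's internal save-history structure;
* the minimum string length of six characters is arbitrary.

The function names are hypothetical.

```python
import re

MIN_CHARS = 6  # arbitrary minimum run length, to suppress noise
# A printable Latin-1 character stored as UTF-16LE is one byte in 0x20-0x7E or
# 0xA0-0xFF followed by a zero byte; a "string" is a run of such pairs.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e\xa0-\xff]\x00){%d,}" % MIN_CHARS)


def utf16_strings(data):
    """Return the UTF-16LE text runs found in the raw bytes of a file."""
    return [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]


def candidate_save_paths(data):
    """Heuristic: UTF-16LE strings containing a backslash may be save paths."""
    return [s for s in utf16_strings(data) if "\\" in s]
```

Run it on the isolated working copy, for example:

```python
with open("copy_of_invoice.doc", "rb") as f:
    paths = candidate_save_paths(f.read())
```

The file name `copy_of_invoice.doc` is hypothetical. Then read the resulting paths in order, as we did with Figure 5.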
At some point the document was saved automatically, possibly as a result of a temporary instability in the computer. This is recorded in the last line of the highlighted section: "Autowiederherstellen - Speicherung von...," meaning "Automatic recovery - save of...." This might also account for the intermediary version we found.

Although the document had been altered, the details of the creator remained, and the save paths reveal something of the document's history. Peripheral details, such as the printers attached, could also help identify the office where the document was created or printed.

Further down in the binary file, we again found the name of the file's creator and the German client's company name. The company name had the same form as the standard setting for the MS Word File Information "Company" field, so Word had probably inserted it into the document later. The company information was also present in the document found in the Recycle Bin folder. However, Dachs had opened that document in Word before deleting it. The field was therefore probably empty in the document as originally received and was filled in later. This shows why the investigator needs to be cautious when interpreting digital evidence.8
Office documents, especially in Office 97 format, also contain identifying codes. These codes are generated automatically, mostly without the user's knowledge, and can be useful in establishing authorship. They are designed to be unique, hence their name: Globally Unique IDentifiers or Universally Unique IDentifiers (GUID or UUID; the terms are interchangeable).
GUID and its Usefulness
In most cases the text or file information of a document can be changed easily through the application's menus. The Office 97 GUID is different. It is created automatically, as part of the program code "behind the scenes" of the document, and is largely unknown to all but the most expert users. The GUID is implanted into a document's metadata when the document is created, and its fixed structure makes it easy to identify.

Figure 6 shows the code from a different invoice document than the previous example. Only one author and one save path were found for it on the company server. Using our notation for this case, the path read: "S S S S\ \ L o k a l s e r v e r\ D a c h s 0 2 \ D A T A \C Utility SA \ I n v o i c e N r . 1 . d o c." The author was the default company setting for the file information tab: "BGK." The file information tab holds information about a Word document that can be viewed and altered under the menu File > Properties. "BGK" is my rendering of the company's three-letter reference. No other versions of this document were found.
In this example, the GUID code read: {1 5 D E 4 9 8 B - 5 3 0 F - 1 1 D 4 - 8 0 7 0 - A A A 9 9 9 1 1 1 G G G}. The other GUID components weren't of immediate interest in the investigation; they are discussed in Appendix Two of this article on www.thewhitepaper.com. The last 12 characters, however, matched the "Media Access Control" (MAC) address code of the Ethernet card in Dachs' laptop. (Ethernet is the most common network communications protocol.)
The MAC code is part of most networked hardware. For computers, this includes the Ethernet card, or network interface card (NIC), which allows the computer to connect to the company network. The code is used to address and identify individual hardware components within a network and is designed to be unique. It's built into the card and is extremely hard to alter. An Internet Protocol (IP) address, by contrast, is interchangeable and isn't built into the hardware by the manufacturer.

Because the MAC code is a 12-digit hexadecimal number, there are 16 to the 12th power, or 281,474,976,710,656, possible combinations. It's therefore highly improbable that the same number will crop up more than once. When the MAC address code is implanted into a document GUID, it is useful in finding the computer on which the document was created. Used carefully, it can help establish authorship. The company network administrator confirmed that this code was the same as the MAC code of the Ethernet card allotted to the suspect's laptop.
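The comparison the network administrator confirmed can also be checked mechanically, using Python's standard `uuid` module. The GUID and MAC codes in this article were deliberately disguised and contain the non-hexadecimal letter G, so the sketch uses the genuine-format example GUID from Appendix Two. The helper names are hypothetical. The multicast-bit test is also an assumption about older software. It follows the UUID standard's convention that a randomly generated node has that bit set.

```python
import uuid


def normalize_mac(mac):
    """Reduce '00:aa:00:3d:73:52' or '00-AA-00-3D-73-52' to '00AA003D7352'."""
    digits = mac.replace("-", "").replace(":", "").upper()
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError(f"not a 12-digit hexadecimal MAC code: {mac!r}")
    return digits


def guid_node_matches_mac(guid, mac):
    """True if `guid` is time-based (version 1) and its node field equals `mac`."""
    u = uuid.UUID(guid)  # braces and hyphens are accepted
    if u.version != 1:
        return False  # only time-based GUIDs carry a node that may be a card address
    if (u.node >> 40) & 0x01:
        return False  # multicast bit set: by convention a random node, not a card address
    return format(u.node, "012X") == normalize_mac(mac)
```

A match shows only that the GUID was generated on a machine carrying that network card at the time. As discussed below, it corroborates authorship rather than proving it.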
The GUID is placed into certain types of Word documents during the creation of new documents. In the case example, the GUID with the suspect's MAC address was present not only in documents that were clearly created by him, but also in invoices purportedly from third parties.
Identifying a MAC Address Code
A fraud examiner should discover the MAC address code early in an investigation, because it's so useful for tracing the trails of documents. The 12-digit hexadecimal code is displayed as six pairs of characters separated by colons or hyphens (in our example, AA-A9-99-11-1G-GG). The MAC code is often stamped directly on the Ethernet card, next to or as part of the serial number.

On laptops, however, the Ethernet card is frequently a PCMCIA (Personal Computer Memory Card International Association) card inserted in a slot on the side of the laptop. Its code and serial labeling often isn't as detailed as on a desktop PC's card. If you can't find the MAC code on the card, remove the card and insert it into a host PC, where you can view the code through operating system commands. Here are some methods for finding the MAC code of an Ethernet card or NIC:
| Operating System | Method |
| --- | --- |
| Windows 95 / 98 | From the Start menu select Run, type WINIPCFG and press Enter. The adapter address is the MAC code. |
| Windows NT / 2000 | From the Start menu select Run, type CMD in the window and press Enter, then type “ipconfig /all” at the prompt and press Enter. The “physical address” is the MAC address code. |
| Windows NT / 2000 (alternative) | From the Start menu select Programs > Accessories > System Tools > System Information. Within this program, select the Components branch, then the Network branch, then the Adapter branch. Scroll down to find the MAC address of the NIC / Ethernet adapter. If more than one adapter is installed, identify the correct one by manufacturer (Name or Product Name information). |
The System Information program in Windows 2000 allows the user to save the information viewed as a text file (select the Actions menu, Save As Text File option). For the other two methods, you can save the information by highlighting the text displayed in a maximized window and copying it into an empty text editor such as Notepad. It may be useful to record the time and date first by entering the DATE and TIME commands at the command prompt.
Viewing Metadata in Documents
Viewing electronic documents risks altering the data. An examiner should never work on original or evidence data unless he or she is using special software packages that protect it. Any data identified as being of interest to an investigation should first be copied and isolated from the evidence data. Only then can the document be examined safely.
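One common way to show that an extracted document was copied without alteration is to compare cryptographic hashes of the evidence file and the working copy. The article itself does not describe this step. The minimal sketch below uses SHA-256 from Python's standard library; the function names are hypothetical. Hashing supplements, rather than replaces, write-protected access to the evidence.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks; the file is opened read-only."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(evidence_path, work_path):
    """Copy an extracted file for examination and prove the copy is bit-identical."""
    before = sha256_of(evidence_path)
    shutil.copyfile(evidence_path, work_path)
    if sha256_of(work_path) != before:
        raise IOError("working copy does not match the evidence file")
    return before  # record this value in the case notes
```

Re-hashing the evidence file at the end of the examination and comparing it with the recorded value shows that the examination did not change it.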
Microsoft offers software developers a number of tools that might help in viewing binary metadata, but these weren't designed for forensic work and should be used with caution. The simplest way to view all the information within a binary file is with a text or programming editor. For this article, I displayed documents using Windows Notepad. However, this tool can't open documents larger than a few pages. Because standard editors have problems with large files, I advise using viewing tools such as GWD Text Editor, EHex, or UltraEdit.9
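Hex editors such as those named above display each line as an offset, hexadecimal bytes and text. That layout is easy to reproduce, which can help when documenting a particular byte range for a report. The sketch below is minimal and read-only. The names `hexdump` and `dump_file` are hypothetical, and only printable ASCII is shown as text; other bytes appear as dots.

```python
def hexdump(data, start=0, width=16):
    """Return offset / hex / text lines for `data`, numbering offsets from `start`."""
    lines = []
    for pos in range(0, len(data), width):
        row = data[pos:pos + width]
        hex_part = " ".join(f"{b:02X}" for b in row)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in row)
        lines.append(f"{start + pos:08X}  {hex_part:<{width * 3 - 1}}  {text}")
    return lines


def dump_file(path, offset=0, length=256):
    """Dump `length` bytes of a file from `offset`; the file is opened read-only."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hexdump(f.read(length), start=offset)
```

Because offsets are printed as absolute file positions, a quoted extract can be located again by anyone re-examining the same working copy.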
First examine the binary data of an isolated, extracted document in a text editor. After that, you might also view the document in the software it was created with, because you might find extra evidence not readily discernible from the binary data. For example, MS Word might show tracked changes or previous versions of the document if the Tools > Track Changes option was enabled while it was being written. If the Allow Fast Save option was enabled,10 remnants of other versions, such as save paths, may also be present in the document. However, this method won't uncover codes such as the GUID, so a full-text examination is essential.
Begin by searching the full text of the whole image. Then extract the files of interest and examine them fully in a text viewer to study the metadata. You can then view them with the Office software, isolated from the evidence data.
Melissa Revisited: Metadata Use
Perhaps the most famous example of investigators examining metadata was the Melissa virus case. (The Melissa example is still germane because the teenage authors of the December 2001 Goner virus used it as a model.11)
The Melissa case showed how crucial the examination of the internal "history" of files, including their metadata, is becoming in investigations. On March 26, 1999, Smith launched the Melissa virus by posting it to the alt.sex newsgroup.12 When a user opened the attached document, the virus sent copies of itself to the first 50 entries in every address book it found in the MS Outlook mail program. It also altered the Word security settings and infected any newly created documents with copies of itself.13

Within nine hours of the first reports, a solution had been found and posted on the Web site of the Computer Emergency Response Team (CERT) at the Software Engineering Institute of Carnegie Mellon University. However, the crisis continued for several weeks.14 ICSA.net (ICSA Labs of the International Computer Security Association, a division of TruSecure Corporation) and other groups gathered evidence and performed a detailed analysis of the virus. They identified a suspect using historical data.15
At approximately the same time, two other investigators traced the virus writer by more focused means. They were Richard M. Smith, president of the software company Phar Lap, and Fredrick Bjorck, then a doctoral student at Stockholm University. They found three GUID codes implanted in the virus-infected document attachment that David Smith had posted under an alias. They compared these with the GUID codes inside other Word documents that David Smith had placed on a Web site he owned under a different alias. The GUIDs all contained the same MAC address, leading the investigators back to a single computer.16

Smith was arrested after investigators found evidence that he had used a stolen AOL account to spread the virus. An AOL representative told investigators that Smith had stolen someone's e-mail account to post the infected document illegally. He initially pleaded not guilty to the April 1999 charges. However, the GUID evidence probably persuaded him to change his plea and admit authorship of the virus. Smith is due for sentencing on April 8, 2002, following several delays. He faces a maximum of 40 years in prison and a $480,000 fine.17 The damage estimate is probably conservative, because his virus has had many imitators.
During and after the Melissa crisis, media commentators were quick to emphasize the potential problems of relying on implanted GUID codes as evidence.18 These points apply even more to metadata in general. If you use metadata in an investigation and your case goes to court, you may be grilled on the following areas.
First, and most significantly, any information that can be read in a text editor can also be edited. This is particularly true of metadata that the user can manipulate within the software application itself, for example through the Office File Information menu. There was a remote possibility that someone had framed David Smith by altering the GUID. Because of Melissa, would-be crackers now know about the GUID.
Secondly, the investigator must understand the precise circumstances under which the metadata is implanted into the document. The GUID "PropertySet ID" ("_PID_GUID" as it appears in the binary files) is implanted when the document is created, particularly in the Office 97 file format.
The code does not record later changes to the text. The GUID can therefore be used only as corroborating evidence in establishing authorship. The document could have been altered at any later stage. You must pay careful attention to the chain of evidence, ownership, and use of the document, and attempt to differentiate between versions if necessary.
Thirdly, you should be particularly stringent about:
- matching a network interface card to a suspect's computer;
- establishing sole possession and use of a laptop and its peripheral equipment (such as PCMCIA cards);
- precisely recording the circumstances of the evidence capture; and
- maintaining the integrity of the evidence against any risk of inadvertent alterations.
You may not take the lead in a computer forensic investigation, but knowledge of binary files, metadata, and GUID will help you understand the most critical methods of catching fraudsters.
The author thanks his colleagues Bettina Schober and John Weisweiler of PricewaterhouseCoopers Switzerland, for their help in researching the question of MS Office market share. He also thanks colleagues from Forensic Services in Germany, United States, and Switzerland: Marcel Meyer and Stefan Wieland for researching information, and Al Lakhani, Raim Mustafi, and Ludwig Düwel for technical advice.
Mark Furner, Ph.D., trained as a historian and programmer, worked as an associate in forensic computing for the Forensic Services Department of PricewaterhouseCoopers Ltd in Zurich, Switzerland until March 2002. He is an Associate Member of the Association.
Recommended Reading
“Forensic Computing: A Practitioner's Guide” (Practitioner Series),
by Tony (A.J.) Sammes and Brian Jenkinson
(Springer Verlag: October 2000); ISBN 1852332999
“Handbook of Computer Crime Investigation: Forensic Tools & Technology,”
by Eoghan Casey (Editor)
(Academic Press: October 2001); ISBN 0121631036
“Computer Forensics,”
by Warren G. Kruse II and Jay G. Heiser
(Addison-Wesley Publishing Co: September 2001); ISBN 0201707195
“Digital Evidence and Computer Crime: Forensic Science, Computers, and the Internet,”
by Eoghan Casey
(Academic Press: March 2000); ISBN 012162885X
(Because the accompanying CD has interactive exercises, this is a good introduction and training resource.)
Appendix One: Identifying the Media Access Control (MAC) Address Code
The MAC code of the network card can provide useful evidence about a suspect computer, so it should be identified early in an investigation. The MAC code is a 12-character hexadecimal number. It is normally displayed in bytes (pairs of hexadecimal characters) separated by colons or hyphens.19 However, the code is not always printed on the hardware. The network administrator at a large or medium-sized firm would know the code, because it identifies the computer within the company network. If no one knows the MAC code, remove the network card and install it in a different computer. You can then inspect the settings without starting the suspect machine. The methods for identifying the code are as follows:
- Under Windows 95 or 98 select Run from the Start menu, and type WINIPCFG. This will produce the MAC code of a computer with a Network Interface or Ethernet card. The adapter address will be the MAC code.
- Under Windows NT or Windows 2000, the code can be found from the command prompt (Start > Run > type “cmd” or “command” in the box) by typing IPCONFIG /ALL. Alternatively, the System Information option among the System Tools in the Accessories submenu lists the system information, which can be saved later. To find the MAC address, open the Components, Network, and Adapter branches and scroll down to the relevant card. You can take screenshots of this information for later reference. [To take a screenshot, press the Print Screen key (Print Scrn), open a graphics program such as MS Paint, and paste the image (Edit > Paste).]
Appendix Two: The Globally Unique IDentifier (GUID) code
The case example mentioned the Microsoft GUID, which is inserted into MS Word 97 documents. This code is part of the Word 97 document summary information, stored between 6000 and 6400 bytes from the end of the binary file.20 Microsoft based these codes on a standard format described by the Open Group in its specifications for the Distributed Computing Environment (DCE), which defined methods for identifying data across networks.21 The DCE Universally Unique IDentifier (UUID) is a 128-bit (16-byte) code written as 32 hexadecimal characters. They form one group of 8 characters, followed by three groups of 4 characters, followed by a group of 12 characters. (UUID and GUID are interchangeable.) An example of a GUID could look like this:
{3FAD3020-16B7-11CE-80EB-00AA003D7352}.
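Word 97 stores the GUID as UTF-16LE text, which is why it appears spaced out in Figure 6. It can therefore be found with a byte-level pattern. The sketch below reports each braced GUID together with its distance from the end of the file, so the result can be compared with the 6000 to 6400 byte range given above. It also checks separately whether the property name "_PID_GUID" occurs. It is a heuristic scan, not a parser of the OLE property-set format. The function names are hypothetical.

```python
import re
import uuid

_HEX = rb"(?:[0-9A-Fa-f]\x00)"  # one hexadecimal digit stored as UTF-16LE
# "{8-4-4-4-12}" with every character followed by a zero byte
GUID_UTF16 = re.compile(
    rb"\{\x00" + _HEX + rb"{8}-\x00" + _HEX + rb"{4}-\x00" + _HEX + rb"{4}-\x00"
    + _HEX + rb"{4}-\x00" + _HEX + rb"{12}\}\x00"
)
# The property name may be stored as 8-bit or as UTF-16LE text.
PID_NAME = re.compile(b"_PID_GUID|" + "_PID_GUID".encode("utf-16-le"))


def embedded_guids(data):
    """Return (bytes_from_end, GUID) for every braced UTF-16LE GUID in the data."""
    return [
        (len(data) - m.start(), str(uuid.UUID(m.group().decode("utf-16-le"))).upper())
        for m in GUID_UTF16.finditer(data)
    ]


def has_pid_guid_property(data):
    """True if the '_PID_GUID' property name occurs anywhere in the data."""
    return PID_NAME.search(data) is not None
```

A document can contain other GUIDs, for example those of embedded objects. The ones located near "_PID_GUID" at the end of the file are therefore the first to examine.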
The GUID / UUID code is used to keep track of data objects or to check the compatibility of object versions between servers and clients.22 Microsoft has acknowledged the usefulness of the GUID in this respect by using it in several places:

* as an important component of the Registry, the database that keeps track of the MS Windows operating system's settings and installations;
* within MS Access, to track the replication of data (changes to data across networks or versions of a database);
* in general, as part of its data exchange programming, including the object linking and embedding methodology.23

Unfortunately for the investigator, this development has led to variations of the GUID. Below I describe two forms with useful evidentiary content.
Original DCE format of the UUID
The original form of the GUID/UUID is used in Microsoft Office 97 and in “component object model” programming. The component object model is the part of Microsoft's data exchange programming that allows programs to communicate with one another. This form contains time and date information based on Coordinated Universal Time (UTC).24 UTC is the international reference time standard, and the timestamp counts 100-nanosecond intervals since October 15, 1582.

The UUID is made up as follows:

* the first three sections hold the time-based values; the third also carries the version number;
* the fourth section holds the clock sequence, which is initialized to a random value;
* the last component, called the “node,” is normally, but not always, the MAC address of the network card.25

A UUID created according to Open Group specifications therefore contains information that may be useful in an investigation, especially when it's implanted without the user's knowledge.
Below is a table interpreting the DCE format for the GUID, based on a definition on the Open Group Web site.26 The clock sequence is initialized to a random value, so it has less value to an investigator. The node needs no further interpretation once it's established that it is the MAC code of the Ethernet card in the computer being investigated.
| Field | Data Type | Bytes | Note / Explanation |
| --- | --- | --- | --- |
| time_low | unsigned long | 0-3 | The low field of the timestamp (minutes, seconds and fractions of seconds). |
| time_mid | unsigned short | 4-5 | The middle field of the timestamp. |
| time_hi_and_version | unsigned short | 6-7 | The high field of the timestamp multiplexed with the version number. |
| clock_seq_hi_and_reserved | unsigned small | 8 | The high field of the clock sequence multiplexed with the variant. |
| clock_seq_low | unsigned small | 9 | The low field of the clock sequence. |
| node | character | 10-15 | The spatially unique node identifier. |

Table 1: UUID Structure
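Table 1 can be turned into a small decoder. The sketch assumes a version-1 (time-based) GUID with the DCE timestamp: a 60-bit count of 100-nanosecond intervals since October 15, 1582. That count is assembled from time_low, time_mid and the low 12 bits of time_hi_and_version. Whether a particular Office build wrote UTC or local clock time should be verified rather than assumed. The helper name `decode_dce_guid` is hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Zero point of DCE/UUID timestamps: the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def decode_dce_guid(guid):
    """Split a GUID into the Table 1 fields and interpret them."""
    u = uuid.UUID(guid)  # accepts the braced form as well
    version = u.time_hi_version >> 12  # top 4 bits of time_hi_and_version
    timestamp = ((u.time_hi_version & 0x0FFF) << 48) | (u.time_mid << 32) | u.time_low
    clock_seq = ((u.clock_seq_hi_variant & 0x3F) << 8) | u.clock_seq_low  # variant bits dropped
    return {
        "version": version,
        "created_utc": GREGORIAN_EPOCH + timedelta(microseconds=timestamp // 10),
        "clock_seq": clock_seq,
        "node": "-".join(f"{b:02X}" for b in u.node.to_bytes(6, "big")),
    }
```

For the example GUID {3FAD3020-16B7-11CE-80EB-00AA003D7352}, this yields version 1, node 00-AA-00-3D-73-52 and a creation time on December 13, 1994 (UTC). A date that is impossible for the suspect document would suggest that the GUID is not the DCE form, or that it was copied from elsewhere.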
Microsoft Versions of the GUID
Microsoft uses various methods for creating a GUID, and we can expect these to keep changing. Its Visual Studio 6.0 Standard Developers' Kit includes programs called UUIDGEN.EXE and GUIDGEN.EXE, which create these codes in a number of formats, including the DCE-compatible format.27 More recent versions of the GUID may, however, refer to a product rather than to a time of creation and a particular computer.28 These forms of the code are of less interest to an investigator. To establish whether a given code is useful in an investigation, consider the context of the GUID and compare its last component with the MAC code.
Office 2000 format for the GUID
The GUID format used by Microsoft Office programs after Office 97 differs from the DCE version in that it includes product information in the first byte sets. Office 2000 documents also don't appear to contain the GUID by default when created, which makes the code less interesting from an investigative vantage point. A partial description of the Office 2000 numbering scheme, summarized in Tables 2 and 3 below, was published in the Microsoft Knowledge Base but has since been removed.29
| Position in GUID | Meaning |
| --- | --- |
| 1-2 | Product version code |
| 3-4 | Stock Keeping Unit (SKU) number (see Table 3 below) |
| 5-8 | Language identifier of the product (hexadecimal). E.g. 0409 in hexadecimal equals 1033 in decimal, the code for the English version; 0407 in hexadecimal equals 1031 in decimal, the German version. |
| 9-20 | The code positions from 9 onwards were noted with the remark that they “do not provide any easily categorizable information.” |
| 21-32 | Possibly the node / Ethernet card MAC address. |

Table 2: Office 2000 GUID format
| SKU No. | Product |
| --- | --- |
| 00 | Microsoft Office 2000 Premium Edition CD1 |
| 01 | Microsoft Office 2000 Professional Edition |
| 02 | Microsoft Office 2000 Standard Edition |
| 03 | Microsoft Office 2000 Small Business Edition |
| 04 | Microsoft Office 2000 Premium CD2 |
| 05 | Office CD2 SMALL |
| 06-0F | (reserved) |
| 10 | Microsoft Access 2000 (standalone) |
| 11 | Microsoft Excel 2000 (standalone) |
| 12 | Microsoft FrontPage 2000 (standalone) |
| 13 | Microsoft PowerPoint 2000 (standalone) |
| 14 | Microsoft Publisher 2000 (standalone) |
| 15 | Office Server Extensions |
| 16 | Microsoft Outlook 2000 (standalone) |
| 17 | Microsoft Word 2000 (standalone) |
| 18 | Microsoft Access 2000 runtime version |
| 19 | FrontPage Server Extensions |
| 1A | Publisher Standalone OEM |
| 1B | DMMWeb |
| 1C | FP WECCOM |
| 1D-1F | (reserved standalone SKUs) |
| 20-2F | Office Language Packs |
| 30-3F | Proofing Tools Kit(s) |
| 40 | Publisher Trial CD |
| 41 | Publisher Trial Web |
| 42 | SBB |
| 43 | SBT |
| 44 | SBT CD2 |
| 45 | SBTART |
| 46 | Web Components |
| 47 | VP Office CD2 with LVP |
| 48 | VP PUB with LVP |
| 49 | VP PUB with LVP OEM |

Table 3: Stock Keeping Unit (SKU)
The elements missing from the description in Table 2 (positions 9 to 20) could be date and time components in UTC hexadecimal format, followed by the MAC address of the computer on which the original installation or creation took place. This is conjecture.
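Tables 2 and 3 can be applied to a GUID as in the sketch below. Keep in mind that they come from a withdrawn Knowledge Base article and are partly conjecture, so treat the output as a lead, not as evidence. The sketch makes three assumptions:

* positions count the 32 hexadecimal digits, ignoring braces and hyphens;
* only a subset of Table 3 is included;
* the example GUID in the usage note is hypothetical.

The helper name `decode_office2000_guid` is also hypothetical.

```python
# Subset of Table 3; extend with the remaining rows as needed.
SKU_NAMES = {
    "00": "Microsoft Office 2000 Premium Edition CD1",
    "01": "Microsoft Office 2000 Professional Edition",
    "02": "Microsoft Office 2000 Standard Edition",
    "03": "Microsoft Office 2000 Small Business Edition",
    "04": "Microsoft Office 2000 Premium CD2",
    "10": "Microsoft Access 2000 (standalone)",
    "11": "Microsoft Excel 2000 (standalone)",
    "12": "Microsoft FrontPage 2000 (standalone)",
    "13": "Microsoft PowerPoint 2000 (standalone)",
    "14": "Microsoft Publisher 2000 (standalone)",
    "16": "Microsoft Outlook 2000 (standalone)",
    "17": "Microsoft Word 2000 (standalone)",
}


def decode_office2000_guid(guid):
    """Read a GUID using the Office 2000 positions of Table 2 (1-based, 32 hex digits)."""
    digits = guid.strip("{}").replace("-", "").upper()
    if len(digits) != 32 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError("expected 32 hexadecimal digits")
    sku = digits[2:4]  # positions 3-4
    return {
        "product_version": digits[0:2],               # positions 1-2
        "sku": sku,
        "sku_name": SKU_NAMES.get(sku, "not in this table"),
        "language_id": int(digits[4:8], 16),          # positions 5-8, e.g. 0407 -> 1031 (German)
        "possible_node": digits[20:32],               # positions 21-32, possibly a MAC address
    }
```

For the hypothetical GUID {00110407-0000-0000-0000-00AA003D7352}, this reports SKU 11 (Excel 2000 standalone), language 1031 (German) and a possible node of 00AA003D7352. That node could then be compared with the suspect's MAC code.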
Endnotes
[Some links may no longer be available. —Ed.]
1 Precise figures are hard to find. Joe Wilcox in an article for CNET News.com cited the Gartner analyst Chris LeTocq ("New Microsoft Office faces dual obstacles," March 8, 2001, http://news.cnet.com/news/0-1003-200-5067906.html). Also see the article "Microsoft banks on Office XP," by James Middleton, May 31, 2001 (http://www.vnunet.com/News/1122220). An article by Kenneth Smiley on Microsoft's German Web site cited a study by the Giga Information Group. According to that study, Office 97 gained 90 percent of the market for office software after its release. Before the introduction of Office XP, it held about 52 percent of the MS Office market, against about 40 percent for Office 2000 and 8 percent for Office 95. ("Planung der Microsoft Office XP-Migration: Position von Giga," by Kenneth Smiley, no date [the Web site code points to a creation date of May 22, 2001], http://www.microsoft.com/germany/ms/businessloesungen/newsofficexp.htm).
2 I've altered all names, illustrations, MAC numbers, and figures for this article. It may amuse readers to note that I've gone to considerable lengths to re-forge the forgeries for their benefit.
3 Regular readers of The White Paper will be familiar with this process, which involves the taking of one (or better, two) exact bit-for-bit copies of the hard drive being investigated without altering the original evidence in any way. One copy then can be used for processing (for example, searching for deleted files) while another is kept aside for use as evidence.
4 Binary files are the raw data format in which electronic documents are physically stored on a hard drive, that is, the form the data takes before and after the user opens them.
5 Free space is the part of a hard drive that appears to be empty to the Windows operating systems but that contains the remnants of deleted and "forgotten" data (such as temporary files). Windows overwrites this space with newly created files in no predictable order. Hence documents that have recently been deleted from the Windows Recycle Bin, for example, and are no longer "visible" to the operating system, can frequently be recovered intact from this part of a hard drive. The chances of recovering older data in this manner depend on how heavily the disk has been used in the interim.
6 The result of the search is seen here in the byte stream viewer provided by the Vogon forensic software we used. This viewer attempts to recreate the formatting of the data remnants where possible. The software allocates to the data scrap a file name that is based upon the physical location of the data on the drive (the physical sector number, here 19FADC, in hexadecimal) and the type of data scrap (.FRE for free space). For more information on Vogon forensic software and hardware, see their Web site: http://www.vogon-international.com.
7 As seen in the letter, the invoice sum of $640,000 is explained by an increase of $200,000 agreed between the company and the provider on May 21, 1999. (The later version doesn't mention this and gives a later date.)
8 The designation of this deleted document as an "original" is based on the analysis of the differing text contents of the binary files, as described above. Since Dachs had opened the "original," viewed it in Word, and then deleted it (i.e., moved it into the Recycle Bin folder), the metadata reflected these actions and certain elements had changed. The MAC code in the original document's GUID, however, differed both from the MAC address of Dachs' Ethernet card and from the MAC codes in the documents he had created.
9 Programming or text editors also often have sophisticated search tools and display methods (for example, column-mode editing) and accept files of almost any size, which makes them more practical for investigative work, for example, when examining memory dump files, which can be several megabytes in size. UltraEdit has a hexadecimal viewing mode that lets the user see the document as it is stored on the hard drive, using offset numbers to describe exactly where, from the beginning of the document, the data is found. It also has a function that locks the examined file into read-only mode, a useful safety option. (Text editing tools can be found by searching Internet shareware repositories such as www.ZDNet.com. The tools mentioned here can be found under www.gwdsoft.com, www.ultraedit.com, and www.etree.com.)
10 This setting is found on the Tools > Options > Save tab. The Fast Save file format differs slightly from other Office file formats in that it appends changes to the end of the file rather than incorporating them into the main body of text. If Fast Save has been enabled, the binary file extracted from the image may therefore still show recent changes separately from the earlier text.
11 Prosecution and plea-bargaining documents, together with a press release from the U.S. Department of Justice following Smith's indictment, can be found on the Cybercrime.gov Web site (http://www.usdoj.gov/criminal/cybercrime/ccpolicy.html).
12 See, for example, the "ecommercetimes" article by Tim McDonald of the NewsFactor Network, "Israeli Teens Confess to Launching 'Goner' Worm," December 10, 2001, http://www.ecommercetimes.com/perl/story/15211.html, or "How Goner suspects were tracked down," by John Leyden, The Register, Dec. 10, 2001, http://www.theregister.co.uk/content/56/23292.html.
13 The virus prompted considerable activity at Microsoft. Some traces of this activity can be found under the support site reference: http://support.microsoft.com/support/kb/articles/Q266/0/45.ASP.
14 CERT's advisory note of March 31, 1999, gives further details and includes a FAQ file: http://www.cert.org/advisories/CA-1999-04.html. The testimony of Richard Pethia, a director of CERT, before the U.S. Congress in April 1999 on the significance of the Melissa incident can be found under http://www.cert.org/congressional_testimony/pethia9904.html.
15 The Department of Justice press release following Smith's indictment, Dec. 9, 1999, acknowledged the technical help of ICSA (http://cybercrime.gov/melissa.htm). ICSA published its evidence in a research report under http://www.icsalabs.com/html/communities/antivirus/alerts_melissa.shtml. Interestingly, ICSA was aware of the GUID codes but stressed other evidence linking David Smith to the virus, such as online interviews with the virus author. Because ICSA was the more influential voice in the work of the U.S. authorities, the full significance of the GUID evidence emerged only later.
16 Among the numerous news postings of the time, the ZDNet News reports of Robert Lemos are among the best: March 29, 1999, "Melissa creator may be uncovered," http://www.zdnet.com/zdnn/stories/news/0,4586,2233931,00.html and March 30, 1999, "How GUID tracking technology works," http://www.zdnet.com/zdnn/stories/news/0,4586,2234550-2,00.html. (These sites have since been archived and may no longer be available on the Internet.) Lemos was in direct contact with Richard Smith of Phar Lap. Casey (2000) also mentions the case on page 61. (See "Recommended Reading" on page 41 in this article.) The privacy concerns raised by the case may have contributed to Microsoft's decision to alter subsequent versions of Word so that they no longer contain the GUID by default. (See Appendix Two in this article on www.thewhitepaper.com.)
17 His plea agreement, dated Dec. 8, 1999, significantly reduced this: http://cybercrime.gov/meliplea.htm. The Web site run by the Public Affairs Office, U.S. Attorney's Office, District of New Jersey, no longer mentions Smith. (See: http://www.njusao.org/break.html). My thanks to Michael Drewniak of the Public Affairs Office, U.S. Attorney's Office, District of New Jersey, for confirming the date of sentencing as April 8, 2002.
18 See, for example, the articles by Robert Lemos of March 29 and 30, 1999, cited above in footnote 16. Since these articles may no longer be available on the Internet, readers can find some of the background on the Melissa case and investigation by searching sites such as http://news.bbc.co.uk/hi/english/world/ and www.theregister.co.uk/. An article on the delays in sentencing Smith was written by Kevin Poulsen of The Register on Aug. 1, 2001: "Justice mysteriously delayed for 'Melissa' author." See http://www.theregister.co.uk/content/archive/20751.html.
19 Source: Microsoft Knowledgebase article ID Q230848, May 25, 1999. This article no longer appears on the Microsoft Knowledgebase.
20 Microsoft file formats are proprietary and can change with software versions. A more precise analysis of document property information is beyond the scope of this article.
21 The relevant online document is “CAE [Common Applications Environment] DCE 1.1: Remote Procedure Call” (1997), document no. C706, section “Universal Unique Identifier”, found under: http://www.opengroup.org/onlinepubs/009629399/apdxa.htm#tagcjh_20.
22 The Microsoft Developers’ Network has a general article on GUID within an online document on Remote Procedure Calls, found under: www.msdn.microsoft.com/library/en-us/rpc/pr-dtype_3s84.asp.
23 See for example, Microsoft OLE DB 2.0 Programmer’s Reference and Data Access SDK (Microsoft Press: Redmond, 1998), Chapter 11 Properties, pp. 171-196, especially the section on Property Sets and Property Groups. Each property is identified by a GUID. The object linking and embedding methods appear relatively stable. Some information on the formats of the property sets at the end of Word files can be found in the OLE 2 Programmer’s Reference (Microsoft Press: Redmond, 1994), vol. 1, Working with Windows Objects, appendix B, OLE Property Sets, pp. 857-879. This is now out of print and superseded by the reference above. This literature describes the information from a programming and not from an evidentiary point of view; the investigator is still obliged to search through the end of the binary data when examining an image.
24 Time counters based on UTC use various starting points. According to the Open Group Web site mentioned above, the UUID time counter counts 100-nanosecond intervals since midnight at the start of Oct. 15, 1582, the date on which the current Gregorian calendar was introduced. It is based upon ISO 8601 (Data elements and interchange formats – Information interchange – Representation of dates and times, latest revision Dec. 21, 2000).
25 Further technical details, including the format of the date and time information, are found in the online document mentioned in note 21 above.
26 http://www.opengroup.org/onlinepubs/009629399/apdxa.htm#tagcjh_20.
27 There are online descriptions on the Microsoft Developers’ Network – MSDN. See the Microsoft online articles “GUIDGEN: Generates Globally Unique Identifiers (GUIDs)” (http://msdn.microsoft.com/library/en-us/vcsample98/html/_sample_mfc_guidgen.asp) and ‘GUID’, part of the online article on “Platform SDK: Remote Procedure Call,” http://www.msdn.microsoft.com/library/en-us/rpc/pr-dtype_3s84.asp.
28 This can be seen in the way error codes identify individual programs within Microsoft Office: see the online article “Identify the Product for Custom Error Messages” (dated Sept. 17, 1999), http://www.microsoft.com/Office/ORK/2000/Journ/CustAlertGUID.htm. Microsoft may use these codes to verify the legitimacy of software installations as part of the online registration of its products (“Unique ID is built into WinXP final build,” by John Lettice, Aug. 28, 2001; http://www.theregister.co.uk/content/4/21307.html).
29 Source: Microsoft Knowledgebase article ID Q230848, May 25, 1999. This article no longer appears on the Microsoft Knowledgebase.
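Notes 5, 6, and 9 describe three everyday operations on an image: searching free space for a keyword, locating the hit by its physical sector number (reported in hexadecimal, such as 19FADC in note 6), and viewing the raw bytes against their offsets in read-only mode, as UltraEdit's hexadecimal view does. The following Python sketch illustrates all three under stated assumptions: a sector size of 512 bytes (common on drives of the period, but to be confirmed for the drive examined), an image that is a raw, sector-for-sector copy, and hypothetical helper names that are not part of the Vogon or UltraEdit products.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors; confirm for the drive examined


def sector_to_offset(sector_hex, sector_size=SECTOR_SIZE):
    """Convert a hexadecimal physical sector number to a byte offset in a raw image."""
    return int(sector_hex, 16) * sector_size


def read_region(image_path, offset, length):
    """Read bytes from an image file; mode 'rb' gives read-only access to the copy."""
    with open(image_path, "rb") as image:
        image.seek(offset)
        return image.read(length)


def find_keyword(data, keyword, base_offset=0, sector_size=SECTOR_SIZE):
    """Yield (byte_offset, sector_hex) for every occurrence of keyword in data."""
    start = data.find(keyword)
    while start != -1:
        absolute = base_offset + start
        yield absolute, f"{absolute // sector_size:X}"
        start = data.find(keyword, start + 1)


def hexdump(data, base_offset=0, width=16):
    """Format bytes as 'offset  hex  ASCII' lines, in the style of a hex viewer."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is; every other byte is shown as a dot
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base_offset + i:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines
```

For example, `sector_to_offset("19FADC")` returns 871741440, which is where `read_region` would start reading in a raw image of the whole drive; proprietary or compressed image formats would need their own offset mapping. A plain byte search such as `find_keyword` may also miss text that Word stores as 16-bit characters, so a careful search would be repeated with the keyword encoded as UTF-16LE, for example `"Invoice".encode("utf-16-le")`.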
The Association of Certified Fraud Examiners assumes sole copyright of any article published on www.Fraud-Magazine.com or ACFE.com. Permission of the publisher is required before an article can be copied or reproduced.