Understanding how people, organizations and transactions relate to one another is essential in fraud detection and investigations, as fraud rarely happens in isolation. Fraud schemes typically involve a network of actors, hidden relationships and patterns of behavior that may individually appear harmless but, when connected, reveal a coordinated scheme.
Unfortunately, modern-day fraud investigation still relies on the manual identification of relationships and seldom uses artificial intelligence tools such as natural language processing, which can assist in mapping the relationships critical to a fraud case.
From Words to Visual Networks
Think of this process as translating the logic of language into the geometry of networks. A sentence such as the following expresses a set of relationships in text form: “The Earth orbits the Sun, while the Moon revolves around the Earth.”
But when visualized, this information becomes a network diagram consisting of:
- Nodes or entities representing the nouns or key subjects (Earth, Sun and Moon).
- Edges or relationships representing the connections between them (“orbits” and “revolves around”).
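As a minimal sketch, the example sentence can be written down by hand as subject, relationship, object triples. The variable names are illustrative, and no language processing is involved yet.

```python
# Hand-written triples for the example sentence (no NLP involved yet).
triples = [
    ("Earth", "orbits", "Sun"),
    ("Moon", "revolves around", "Earth"),
]

# Nodes are the distinct entities; edges keep the relationship as a label.
nodes = sorted({name for subj, _, obj in triples for name in (subj, obj)})
edges = {(subj, obj): rel for subj, rel, obj in triples}

print(nodes)
print(edges)
```

Keeping the relationship text as an edge label preserves the verb, which a plain list of node pairs would lose.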
Now, consider the expanded text below in Figure 1 on the topic of the Solar System:

Figure 1
By organizing the information contained in the above paragraphs into nodes and edges, the knowledge graph in Figure 2 below establishes clear relationships between the entities:
Figure 2
Information extraction is the critical foundation for generating the relationship map above, and every subsequent step depends on it. This first step converts unstructured text into entities (or nodes), relationships (or edges) and the attributes associated with the entities.
In the relationship example above:
- Entities include texts such as “the solar system,” “the sun,” “the five dwarf planets,” “the largest planet,” “the inner planets,” and “Earth’s moon.”
- Relationships include texts such as “orbit.”
- Attributes include texts such as “small” and “largest.”
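One way to hold these three kinds of extracted information is a pair of small record types. This is only a sketch: the class names Entity and Relationship are hypothetical, and because the source text in Figure 1 is not reproduced here, the pairing of attributes with entities and the endpoints of the “orbit” relationship are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    source: str
    label: str
    target: str


# Entity and attribute texts come from the lists above. Which attribute
# belongs to which entity, and which entities "orbit" connects, are assumptions.
entities = [
    Entity("the sun"),
    Entity("the inner planets", attributes=["small"]),
    Entity("the largest planet", attributes=["largest"]),
]
relationships = [Relationship("the inner planets", "orbit", "the sun")]

for rel in relationships:
    print(f"{rel.source} --{rel.label}--> {rel.target}")
```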
Applying this visualization through the lens of a fraud case, entities could be people, organizations, locations or dates, among others. Relationships could be familial connections or organizational associations. Attributes could be the age of people, the size of organizations, the jurisdiction of locations and the similarity of dates.
Using spaCy for Linguistic Analysis
The above transformation is carried out with spaCy, a natural language processing library for Python. It provides essential tools to process and interpret language computationally, including tokenization, part-of-speech tagging, named entity recognition and dependency parsing.
Below is the code used for extracting nouns from the text block on the solar system example above:
Figure 3
Note in the code shown in Figure 3 how the second line adds “it” and “that” to the list of stop words. This is necessary because stop words are common words that usually carry little meaning by themselves, including terms such as “the,” “is,” “in” or “and.” They need to be excluded from text processing, as they often add noise instead of meaningful context.
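The effect of stop-word filtering can be illustrated without any library. The sketch below is not the code in Figure 3: it assumes a tiny hand-picked stop-word set and a naive regular-expression tokenizer, whereas spaCy ships a much longer default list and a far more robust tokenizer.

```python
import re

# A small illustrative stop-word set (an assumption, not spaCy's list).
stop_words = {"the", "is", "in", "and", "while"}

# Mirror the article's step of adding two extra stop words.
stop_words |= {"it", "that"}

text = "The Earth orbits the Sun, while the Moon revolves around the Earth."

# Naive tokenization: keep runs of letters only.
tokens = re.findall(r"[A-Za-z]+", text)

# Drop any token whose lowercase form is a stop word.
kept = [t for t in tokens if t.lower() not in stop_words]

print(kept)  # ['Earth', 'orbits', 'Sun', 'Moon', 'revolves', 'around', 'Earth']
```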
Once entities are extracted from the text, the next step uses spaCy's output to identify the relationships between them.

Figure 4
In Figure 4 above, spaCy finds the nouns in the given text and connects each noun to the next noun in the same sentence, storing their relationship in a list called “edges”.
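The pairing rule described above, connecting each noun to the next noun in the same sentence, can be sketched as follows. The nouns are typed in by hand here and are assumed to have already been extracted (for example, by spaCy's part-of-speech tagger); the second sentence's nouns are purely illustrative.

```python
# Nouns per sentence, assumed to be already extracted and listed in reading order.
nouns_by_sentence = [
    ["Earth", "Sun", "Moon", "Earth"],  # from the example sentence
    ["Jupiter", "planet"],              # illustrative second sentence
]

edges = []
for nouns in nouns_by_sentence:
    # zip pairs each noun with its successor, so edges never cross sentences.
    edges.extend(zip(nouns, nouns[1:]))

print(edges)
# [('Earth', 'Sun'), ('Sun', 'Moon'), ('Moon', 'Earth'), ('Jupiter', 'planet')]
```

Because adjacency is only a heuristic, it can link nouns that merely sit next to each other, such as “Sun” and “Moon” here. An investigator would still need to review the resulting edges.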
Constructing and Visualizing the Plot
With the “edges” list compiled from the previous step, the NetworkX library constructs the knowledge graph with the following code:
Figure 5
In Figure 5, the call G.add_edges_from(edges) loads the relationships stored in the “edges” list from the previous step into the graph. The graph is then split into its connected subgraphs by the following line: graphs = [G.subgraph(c).copy() for c in nx.connected_components(G)]
These subgraphs are placed side by side, as defined by fig, axs = plt.subplots(1, len(graphs), figsize=(30, 10)). Each one is then laid out and drawn to create the final visualization shown in Figure 2, in a loop that begins as follows:
for i, graph in enumerate(graphs):
    pos = nx.spring_layout(graph, k=1, iterations=50, seed=42)
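Put together, the steps quoted above might look like the sketch below. It is not a reproduction of Figure 5: the edge list is illustrative, the nx.draw call and its arguments are an assumption about how each subgraph is drawn, and squeeze=False is added so that indexing the axes also works when there is only one subgraph.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import networkx as nx

# Illustrative edges; in the article these come from the spaCy step.
edges = [("Earth", "Sun"), ("Moon", "Earth"), ("Jupiter", "planet")]

G = nx.Graph()
G.add_edges_from(edges)

# One subgraph per group of mutually reachable nodes.
graphs = [G.subgraph(c).copy() for c in nx.connected_components(G)]

# squeeze=False keeps axs two-dimensional even with a single subgraph.
fig, axs = plt.subplots(1, len(graphs), figsize=(30, 10), squeeze=False)

for i, graph in enumerate(graphs):
    # A fixed seed makes the spring layout reproducible between runs.
    pos = nx.spring_layout(graph, k=1, iterations=50, seed=42)
    # Assumed drawing call: one subgraph per subplot, with node labels.
    nx.draw(graph, pos, ax=axs[0][i], with_labels=True)

print(len(graphs))  # 2
```

Calling fig.savefig with a file name, or plt.show() in an interactive session, would then display the result.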
When the above entities and their relationships are plotted using a visualization tool such as matplotlib, the text is transformed into a living diagram that reveals the interconnected relationships buried in the original text.
Further Modifying for Unique Situations
The above is a basic example to show how natural language processing can be used to assist the text analysis part of a fraud investigation. Further modifications could be made to introduce entity disambiguation, which differentiates between uses of the same word in different contexts; this might include clarifying whether “Paris” refers to the city or to a person’s name.
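One simple way to approach disambiguation, sketched below under stated assumptions, is to key each node by both its text and its entity type rather than by text alone. The label names GPE (geopolitical entity) and PERSON follow the scheme used by spaCy's English models, but the mentions here are typed in by hand rather than produced by a model.

```python
from collections import Counter

# Hypothetical mentions, each paired with the entity type a named entity
# recognizer might assign to it in context.
mentions = [
    ("Paris", "GPE"),
    ("Paris", "PERSON"),
    ("Paris", "GPE"),
]

# Keying nodes by (text, label) keeps the city and the person apart.
node_counts = Counter(mentions)

for (text, label), count in node_counts.items():
    print(f"{text} [{label}]: {count} mention(s)")
```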
Relationship weight could be included to add numerical strength to connections, allowing thicker or bolder edges for stronger relationships, while lighter edges apply to all others, so the relationship plot could show more than one layer of relationship significance.
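As a minimal sketch of relationship weighting, the number of times a pair of entities is connected can serve as the weight and then be scaled to a line width. Counting co-occurrences is only one possible definition of strength, and the edge list and the 1-to-5 width range are illustrative assumptions.

```python
from collections import Counter

# Illustrative edges, including repeats of the same relationship.
edges = [("Earth", "Sun"), ("Moon", "Earth"), ("Earth", "Sun"), ("Sun", "Earth")]

# Treat edges as undirected: sort each pair so (A, B) and (B, A) count together.
weights = Counter(tuple(sorted(pair)) for pair in edges)

# Scale counts to line widths between 1 and 5 (the strongest edge gets 5).
max_weight = max(weights.values())
widths = {pair: 1 + 4 * count / max_weight for pair, count in weights.items()}

print(weights)
print(widths)
```

These widths could then be supplied through the width parameter of NetworkX's nx.draw_networkx_edges, so that stronger relationships are drawn with thicker lines.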
Machine learning models could also be added to perform more context-aware extraction tasks. Below is an example using perplexity scores to detect phrases that differ from what the model, based on its training data, expects the next words in a sequence to be.

Figure 6
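For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-probability that a language model assigns to each token, so unexpected wording produces a higher score. The sketch below shows only this arithmetic; the function name and the probability values are hypothetical, and in practice the probabilities would come from a trained language model.

```python
import math


def perplexity(token_probs):
    """Return exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Hypothetical per-token probabilities assigned by a language model.
expected_phrase = [0.5, 0.4, 0.5]    # wording the model finds likely
unusual_phrase = [0.5, 0.01, 0.02]   # wording the model finds surprising

print(perplexity(expected_phrase))
print(perplexity(unusual_phrase))
```

A phrase whose score sits well above that of the surrounding text is a candidate for closer review, not proof of anything on its own.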
Ultimately, transforming text into visuals is about seeing knowledge, not just reading it. These tools allow readers, analysts and researchers to trace the flow of ideas and interconnections, identify themes or underexplored concepts, and most importantly, transform the invisible logic of language into something tangible and explorable.
For fraud examiners, putting these concepts into practice means distilling huge amounts of information into organized, concise relationships. This can help surface previously unseen patterns that open the door to further investigation and, ultimately, can decrease the time and effort it takes to bring a case to resolution.