Finding Needles in a Haystack With Graph Databases and Machine Learning

You know a technology has reached a tipping point when your kids ask about it. This happened recently when my eighth-grade daughter asked, "What is machine learning and why is it so important?"

Answering her question, I explained how machine learning is part of AI, where we teach machines to reason and learn like human beings. I used the example of fraud detection. In many ways, catching fraud is like finding needles in a haystack — you must sort and make sense of massive amounts of data in order to find your "needles" or, in this case, your fraudsters.

Consider a phone company that has billions of calls occurring in its network on a weekly basis. How can we identify signs of fraudulent activity from a mountain — or haystack — of calls? This is where machine learning comes in.

Of course, my daughter was ready with a solution to the problem: "Why not use a powerful magnet to draw out the needles from the haystack?"

She's right. When it comes to training a machine to spot fraudsters, we need to provide it with a more powerful magnet for drawing them out. Our magnet, in this case, is the ability to identify behaviors and patterns of likely fraudsters. Using this, a machine is more adept at recognizing suspicious phone call patterns and is able to separate them from the billions of calls made by regular people who comprise our haystack of data.

Let's use this example to consider current approaches for identifying fraudsters with machine learning. Supervised machine learning algorithms need training data: in this case, phone calls identified as calls from confirmed fraudsters. The current approach has two problems, one with the quantity and one with the quality of that training data.

Confirmed fraudulent activity in phone networks currently constitutes less than 0.01% of total call volume. So, the volume or the quantity of training data with confirmed fraud activity is tiny. Having a small quantity of training data, in turn, results in poor accuracy for the machine learning algorithms.
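To make the scale of the imbalance concrete, here is a minimal sketch of the arithmetic. It treats 0.01% as the fraud rate (the text gives it as an upper bound), and the function names and the one-million-call sample size are illustrative assumptions, not figures from the operator.

```python
# Assumption: use the quoted upper bound of 0.01% as the fraud rate.
FRAUD_RATE = 0.0001


def expected_fraud_calls(total_calls, fraud_rate=FRAUD_RATE):
    """Number of confirmed-fraud calls expected in a sample of calls."""
    return round(total_calls * fraud_rate)


def baseline_metrics(total_calls, fraud_calls):
    """Metrics for a trivial model that labels every call as legitimate."""
    accuracy = (total_calls - fraud_calls) / total_calls
    recall = 0.0  # the trivial model never flags a fraudulent call
    return accuracy, recall


sample_size = 1_000_000  # hypothetical sample of calls
fraud_in_sample = expected_fraud_calls(sample_size)
accuracy, recall = baseline_metrics(sample_size, fraud_in_sample)
print(fraud_in_sample, accuracy, recall)
```

With so few positive examples per million calls, a model can look right almost all of the time while catching no fraudsters at all, which is why the small quantity of labeled fraud is such a handicap.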

Features, or attributes, for finding a fraudster are based on simple analyses. In this case, they include the calling history of a particular phone to other phones that may be in or out of the network, the age of a pre-paid SIM card, the percentage of one-directional calls made (cases where the call recipient did not return a phone call), and the percentage of rejected calls. These simplistic features tend to result in a lot of false positives. That is no wonder when you consider that, in addition to a fraudster, these features may also fit the behavior of a salesperson or a prankster!
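As a rough illustration of how simple these features are, the sketch below computes two of them from a list of call records. The record layout `(caller, callee, status)`, the status values, and the function name are assumptions made for this example; they are not the operator's actual schema.

```python
# Hypothetical call records: (caller, callee, status),
# where status is "answered" or "rejected".
calls = [
    ("A", "B", "answered"),
    ("B", "A", "answered"),
    ("A", "C", "rejected"),
    ("A", "D", "answered"),
    ("S", "X", "rejected"),
    ("S", "Y", "rejected"),
    ("S", "Z", "answered"),
    ("S", "W", "answered"),
]


def simple_features(calls, phone):
    """Percentage of one-directional and rejected calls made by `phone`."""
    outgoing = [c for c in calls if c[0] == phone]
    if not outgoing:
        return {"one_directional_pct": 0.0, "rejected_pct": 0.0}
    # Phones that have called `phone` at least once.
    called_back = {caller for caller, callee, _ in calls if callee == phone}
    one_directional = sum(1 for _, callee, _ in outgoing if callee not in called_back)
    rejected = sum(1 for _, _, status in outgoing if status == "rejected")
    total = len(outgoing)
    return {
        "one_directional_pct": 100.0 * one_directional / total,
        "rejected_pct": 100.0 * rejected / total,
    }


print(simple_features(calls, "S"))
print(simple_features(calls, "A"))
```

Phone "S" scores high on both features, but nothing here distinguishes a fraudster from a salesperson making cold calls, which is the false-positive problem described above.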

A large mobile operator uses TigerGraph, the next-generation graph database with real-time deep link analytics, to address the deficiencies of current approaches for training machine learning algorithms. The solution analyzes over ten billion calls for 460 million mobile phones and generates 118 features for each mobile phone. These are based on deeper analysis of calling history and go beyond immediate recipients for calls.

The diagram below illustrates how the graph database identifies a phone as a "good" or a "bad" phone. A bad phone requires further investigation to determine whether it belongs to a fraudster.

A customer with a good phone calls other subscribers, and the majority of their calls are returned. This indicates familiarity or trusted relationships between the users. A good phone also regularly calls a set of other phones (say, every week or month), and this group of phones is fairly stable over a period of time (a "stable group").
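The two signals in this paragraph can be sketched in a few lines. This is an approximation under stated assumptions, not TigerGraph's implementation: calls are plain `(caller, callee)` pairs, "returned" is measured per contact rather than per call, and stability is measured as the average Jaccard overlap between consecutive periods. All names are hypothetical.

```python
def returned_ratio(calls, phone):
    """Fraction of the distinct phones called by `phone` that called it back."""
    called = {callee for caller, callee in calls if caller == phone}
    if not called:
        return 0.0
    called_back = {caller for caller, callee in calls if callee == phone}
    return len(called & called_back) / len(called)


def group_stability(contacts_per_period):
    """Mean Jaccard overlap between contact sets of consecutive periods."""
    overlaps = []
    for prev, curr in zip(contacts_per_period, contacts_per_period[1:]):
        union = prev | curr
        overlaps.append(len(prev & curr) / len(union) if union else 0.0)
    return sum(overlaps) / len(overlaps) if overlaps else 0.0


# Hypothetical data: A calls B, C and D; only B and C call back.
calls = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A"), ("A", "D")]
# Hypothetical weekly contact sets for phone A.
weeks = [{"B", "C", "D"}, {"B", "C", "D"}, {"B", "C", "E"}]

print(returned_ratio(calls, "A"))
print(group_stability(weeks))
```

A stability value close to 1 means the phone keeps calling the same group of people, while a value near 0 suggests a constantly changing set of recipients.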

Another feature indicating good phone behavior is when a phone calls another that has been in the network for many months or years and receives calls back. We also see a high number of calls between the good phone, the long-term phone contact, and other phones within a network calling both these numbers frequently. This indicates many in-group connections for our good phone.
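The in-group idea can be illustrated by looking for third phones that frequently call both the phone and its long-term contact. The sketch below is an assumption-laden example: the `min_calls` threshold, the data, and the function name are hypothetical, and a graph database would express this as a traversal rather than a scan over a list.

```python
from collections import Counter


def in_group_connections(calls, phone, contact, min_calls=2):
    """Phones that called both `phone` and `contact` at least `min_calls` times."""
    counts = Counter(calls)  # (caller, callee) -> number of calls
    callers = {caller for caller, _ in counts}
    return {
        p
        for p in callers
        if p not in (phone, contact)
        and counts[(p, phone)] >= min_calls
        and counts[(p, contact)] >= min_calls
    }


# Hypothetical data: X calls both A and B often; Y calls B only once.
calls = (
    [("X", "A")] * 3
    + [("X", "B")] * 2
    + [("Y", "A")] * 2
    + [("Y", "B")]
    + [("A", "B"), ("B", "A")]
)

print(in_group_connections(calls, "A", "B"))
```

The more phones that appear in this set, the more in-group connections the pair shares. This kind of feature looks beyond a phone's immediate recipients, which the simple features cannot do.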
