Precision & Recall: When Conventional Fraud Metrics Fall Short

Ramin Madarshahian | Thursday, April 4th, 2024 | 14 minutes

Machine learning is a powerful tool in the ever-evolving battle against fraud, but its effectiveness is contingent on a delicate balancing act. You want to accept as many safe transactions as possible while simultaneously declining as much fraud as possible.

A lot of businesses turn to conventional fraud metrics to set that all-important approval-decline threshold. But a one-size-fits-all approach is not always the most profitable.

Let’s look at how a generic fraud strategy comes up short when compared to a customized option.

How Imbalanced Data Impacts Machine Learning

When it comes to fraud detection, machine learning and data go hand-in-hand. The value of each individual component is dependent on the quality of the other.

And one factor that can negatively impact machine learning outcomes is imbalanced data.

Imbalanced data means that there are far more legitimate, safe transactions than fraudulent ones. This causes a machine learning model to lean heavily toward the majority — which is the safe class.

To put it simply, it's like having a big basket of flour with just a tiny pinch of salt hidden in it. 

It’s pretty easy to grab a cup of flour from the basket but a lot harder to find the salt!

This skewed data means you could label every transaction as safe, and you’d be correct more than 90% of the time. On the surface, that might seem fantastic.

But here's the catch: the goal is not to miss those rare grains of salt — the fraudulent transactions.
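A minimal Python sketch makes the problem concrete. The counts below are hypothetical (1,000 transactions, 50 of them fraudulent) and are chosen only for illustration; they are not taken from any real dataset.

```python
# Hypothetical, illustrative data: 1 = fraud, 0 = safe.
labels = [1] * 50 + [0] * 950

# A "model" that labels every transaction as safe.
predictions = [0] * len(labels)

# Share of all transactions labeled correctly.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Share of the fraudulent transactions that were actually caught.
caught_fraud = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
fraud_caught_share = caught_fraud / sum(labels)

print(f"accuracy: {accuracy:.0%}")
print(f"fraud caught: {fraud_caught_share:.0%}")
```

The accuracy looks excellent while not a single grain of salt is found, which is why accuracy alone is a poor guide on imbalanced data.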

How Precision and Recall Help Overcome Imbalanced Data

The challenge is to strike a balance. The aim is to accurately catch the fraudulent transactions — those elusive grains of salt — while also ensuring there aren't too many genuine transactions mistakenly classified as fraudulent — effectively confusing flour for salt.

This balance is where precision and recall come into play, acting as our guides through the world of imbalanced fraud datasets.

In simple terms, the two can be defined in the following way.


Precision helps ensure safe transactions are not mistakenly classified as fraudulent. 

It's like ensuring that when you try to pick out the grains of salt from the basket, they are actually salt and not flour.


Recall ensures that actual instances of fraud are not missed amongst the safe transactions. 

It's akin to ensuring you don't miss any grains of salt in the flour.

Before we look at the calculations for each measurement, let’s first understand the variables involved and their place in our flour vs. salt analogy.

Term | Meaning | In the analogy
TRUE POSITIVE | A correctly identified fraudulent transaction | Salt identified as salt
TRUE NEGATIVE | A correctly identified legitimate transaction | Flour identified as flour
FALSE POSITIVE | A legitimate transaction incorrectly identified as fraudulent | Flour confused for salt
FALSE NEGATIVE | A fraudulent transaction incorrectly identified as legitimate | Salt overlooked amongst the flour
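As a rough sketch, the four outcomes can be counted in Python from a list of true labels and a list of model decisions. The function name `confusion_counts`, the 1 = fraud / 0 = safe encoding, and the eight sample transactions are all assumptions made for this illustration.

```python
from collections import Counter

def confusion_counts(labels, predictions):
    """Count TP, TN, FP, FN where 1 means fraud (salt) and 0 means safe (flour)."""
    names = {(1, 1): "tp", (0, 0): "tn", (0, 1): "fp", (1, 0): "fn"}
    counts = Counter(names[(y, p)] for y, p in zip(labels, predictions))
    return {key: counts.get(key, 0) for key in ("tp", "tn", "fp", "fn")}

# Hypothetical sample: what really happened vs. what the model decided.
labels      = [1, 0, 0, 1, 0, 0, 1, 0]
predictions = [1, 0, 1, 0, 0, 0, 1, 0]

counts = confusion_counts(labels, predictions)
print(counts)
```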

Now, let’s look at the calculations for precision and recall — and how they impact the salt vs. flour analysis.

Precision is calculated using the following formula:

Precision = True positives / (True positives + False positives)

Precision tells you the percentage of correctly identified fraud cases out of all the cases predicted as fraud. A high precision value indicates a low rate of false positives, reducing the chances of wrongly flagging legitimate transactions as fraudulent.

Recall is calculated using the following formula:

Recall = True positives / (True positives + False negatives)

Recall informs you about the percentage of correctly identified fraud cases out of all the actual fraud cases. A high recall value indicates a low rate of false negatives — meaning you capture as many fraudulent transactions as possible.
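The two formulas translate directly into Python. In this sketch the counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the formulas themselves dictate.

```python
def precision(tp, fp):
    """Share of transactions flagged as fraud that really were fraud."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0

def recall(tp, fn):
    """Share of actual fraud that was flagged."""
    actual_fraud = tp + fn
    return tp / actual_fraud if actual_fraud else 0.0

# Hypothetical counts: 400 transactions flagged, 300 actual fraud cases.
tp, fp, fn = 240, 160, 60

p = precision(tp, fp)  # 240 / (240 + 160)
r = recall(tp, fn)     # 240 / (240 + 60)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

With these counts, 60% of the flagged transactions are truly fraudulent (precision), and 80% of all fraud is caught (recall).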

Considering the specific business challenges and costs associated with each type of error, it’s essential for a fraud detection system to strike the right balance between precision and recall. And there are two ways to do that — with one approach being more profitable than the other.

Incorporating Precision and Recall with the F1 Score

Some fraud detection systems address precision and recall with the F1 score.

The F1 score is calculated with the following formula:

F1 score = 2 × (Precision × Recall) / (Precision + Recall)

The F1 score does provide a balanced measure of performance.

However, it is a one-size-fits-all approach. The F1 score does not account for the unique business challenges and costs associated with false positives and false negatives.
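One way to see the limitation is that the formula is symmetric: swapping precision and recall leaves the F1 score unchanged. The short Python sketch below uses two hypothetical operating points to show this; the values 0.9 and 0.6 are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Two hypothetical models with opposite strengths.
high_precision = f1_score(precision=0.9, recall=0.6)
high_recall = f1_score(precision=0.6, recall=0.9)

print(f"high-precision model F1: {high_precision:.2f}")
print(f"high-recall model F1:    {high_recall:.2f}")
```

Both models receive the same score, even though one declines far fewer legitimate customers and the other lets far less fraud through. Which of the two is better depends on what each kind of error costs the business, and the F1 score cannot express that.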

And that’s a problem.

If you ignore the unique features of your business and rely solely on the F1 score, you risk making suboptimal decisions that harm your business in the long term.

A BETTER APPROACH: To make informed decisions, you need to consider a more holistic approach that takes into account the specific costs and challenges your business faces. This includes evaluating trade-offs between not just precision and recall but also factors like customer retention, reputation management, and potential financial losses. By considering the broader context and aligning the threshold selection with your business objectives, you can develop a fraud detection system that not only maintains accuracy but also addresses the specific challenges of your industry.


Incorporating Precision and Recall into a Customized Strategy

To introduce a better, more profitable alternative to the F1 score, we’ll look at the risk management priorities for two different businesses.

To create a level playing field for our comparison, we'll use the same set of example transactions for both cases. The common dataset comprises 2,300 transactions — including 300 fraudulent ones.

With identical recall, precision, and F1 score in both scenarios, we can direct our focus toward the specific challenges that each business faces when setting their threshold values.

[Figure: Exploring evaluation metrics: recall, precision, and F1 score. Panels: recall example, precision example, F1 score example.]

The Predominance of Precision for Pizza

Imagine a bustling pizza shop.

Surviving as a restaurant in a highly competitive industry is challenging.

The customer is always right, and the owner is accustomed to "eating" the cost of a pizza now and then to keep the clientele happy. Whether it's due to a cold pizza or bad customer service, they prefer to err on the side of keeping their customers satisfied.

Dealing with fraudulent transactions adds another layer of complexity.

Because a primary concern for the pizza shop is to maintain the loyalty of its returning customers, incorrectly flagging a legitimate customer's transaction as fraudulent can have severe consequences. After all, a customer who gets declined for purchasing a pizza can easily walk away and buy from a competitor instead.

In data science speak, this translates to a strong requirement for high-precision fraud detection.

We can put some numbers to this by exploring the specific costs associated with different types of fraud detection errors.

In this scenario, we assume an average transaction value of $50 with a profit of $25 per transaction.

We estimate the average loss for false positives to be -$150. This takes into account not only the potential loss of a loyal customer but also the associated merchant risks and the negative impact on the shop's reputation.

On the other hand, false negatives occur when the fraud detection system fails to identify a fraudulent transaction. For the pizza shop, the estimated cost of false negatives is -$60. This includes the financial loss associated with the fraudulent transaction itself and potential risks such as chargebacks or legal complications.

While true positives and true negatives do not directly incur any costs in this simplified example, they still play significant roles.

True positives represent the successful identification of fraudulent transactions, safeguarding the pizza shop from potential financial losses and ensuring the security of its operations. True negatives denote legitimate transactions correctly identified as non-fraudulent, contributing to the overall profitability of the pizza seller.
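The article does not spell out the exact profit formula, so the following Python sketch is one plausible reading of the costs above and should be treated as an assumption: each true negative earns the $25 profit, each false positive costs $150, each false negative costs $60, and true positives contribute nothing. The confusion counts are hypothetical; they only respect the dataset's totals of 2,000 legitimate and 300 fraudulent transactions.

```python
def net_profit(tn, fp, fn, profit_per_sale, fp_cost, fn_cost):
    """Assumed cost model: profit on approved legitimate sales minus error costs."""
    return tn * profit_per_sale - fp * fp_cost - fn * fn_cost

# Pizza shop figures taken from the text above.
PIZZA = dict(profit_per_sale=25, fp_cost=150, fn_cost=60)

# Hypothetical outcome at a single threshold (2,000 legitimate, 300 fraudulent).
tn, fp = 1900, 100   # legitimate transactions: approved vs. wrongly declined
tp, fn = 240, 60     # fraudulent transactions: caught vs. missed

profit = net_profit(tn, fp, fn, **PIZZA)
print(f"net profit: ${profit:,}")  # 47,500 - 15,000 - 3,600 = 28,900
```

Note how the 100 wrongly declined customers cost the shop roughly four times as much as the 60 missed fraud cases under these assumptions.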

Using these costs — along with the standardized precision and recall values established for this test — we can build a net profit curve for the pizza business.

[Figure: Net profit curve for the pizza seller]

The optimum threshold is the point at which the net profit is at a maximum (i.e., normalized net profit = 1.0). Given our proposed business impacts, this value is 0.83 for the pizza shop. Moving the threshold away from this point in either direction decreases the overall profit for the business: lowering it flags more transactions and so increases the number of false positives, while raising it lets more fraud through and so increases the number of false negatives.
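The mechanics of building such a curve can be sketched in Python. Everything below is an illustration under stated assumptions: the histogram of fraud scores is synthetic (it is not the data behind the figure, so it will not reproduce the 0.83 optimum exactly), a transaction is flagged when its score is at or above the threshold, and the profit formula is the same assumed one, in which approved legitimate sales earn $25, false positives cost $150, and false negatives cost $60.

```python
# Synthetic histogram, NOT the article's data. For each fraud score:
# (number of legitimate transactions, number of fraudulent transactions).
HISTOGRAM = {
    0.1: (1200, 6), 0.2: (500, 10), 0.3: (120, 42),
    0.4: (70, 35),  0.5: (45, 32),  0.6: (30, 30),
    0.7: (20, 35),  0.8: (10, 40),  0.9: (5, 70),
}

def confusion_at(threshold, histogram):
    """Flag a transaction as fraud when its score is >= threshold."""
    tp = tn = fp = fn = 0
    for score, (legit, fraud) in histogram.items():
        if score >= threshold:
            fp += legit   # legitimate but flagged
            tp += fraud   # fraud and flagged
        else:
            tn += legit   # legitimate and approved
            fn += fraud   # fraud but approved
    return tp, tn, fp, fn

def net_profit(tn, fp, fn, profit_per_sale, fp_cost, fn_cost):
    """Assumed cost model: profit on approved legitimate sales minus error costs."""
    return tn * profit_per_sale - fp * fp_cost - fn * fn_cost

def profit_curve(histogram, **costs):
    """Net profit at each candidate threshold, normalized so the peak is 1.0."""
    curve = {}
    for threshold in sorted(histogram):
        _tp, tn, fp, fn = confusion_at(threshold, histogram)
        curve[threshold] = net_profit(tn, fp, fn, **costs)
    peak = max(curve.values())  # assumes the best threshold is profitable
    return {t: value / peak for t, value in curve.items()}

pizza_curve = profit_curve(HISTOGRAM, profit_per_sale=25, fp_cost=150, fn_cost=60)
best_threshold = max(pizza_curve, key=pizza_curve.get)

for threshold, value in pizza_curve.items():
    print(f"threshold {threshold:.1f}: normalized net profit {value:+.2f}")
print("best threshold:", best_threshold)
```

On this synthetic data the peak also lands at a high threshold, because each wrongly declined customer is so much more expensive to the pizza shop than a missed fraud case.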

The peak of the profit curve is significantly skewed towards higher precision than the F1 score would lead us to believe.

In fact, using the F1-predicted threshold of 0.5 would result in a normalized net profit of 0.94, representing a 6% loss in net profit for this business.

This means that for the pizza shop, prioritizing precision over recall is crucial to maximizing profit.

The Relevance of Recall for Rubies


Now, picture a high-end ruby emporium — a business with a very different story.

This industry operates with a smaller number of high-value transactions, catering to an elite group of customers. For the ruby seller, the consequences of allowing even a single fraudulent transaction to pass can be catastrophic, resulting in significant financial losses and irreparable damage to their reputation.

Maximizing the identification of fraudulent transactions is paramount for mitigating risks, even if it means rejecting a higher number of legitimate customers.

Unlike the pizza shop, where losing a customer can be particularly detrimental due to the presence of numerous competitors, the ruby emporium operates in a niche market with limited alternatives. A declined customer is more likely to reach out to the business directly to seek a resolution rather than immediately switching to a competitor.

As before, let’s explore the specific costs associated with different types of errors in the ruby industry.

In our simplified example, each transaction has an average value of $2,000 with a profit of $700. The average loss for false positives is -$100. This includes the costs associated with investigating flagged transactions and potential delays in processing legitimate transactions.

On the other hand, the estimated cost of false negatives is -$2,500. This includes the financial loss incurred from the fraudulent transaction itself, potential business damages, and associated risks. The higher cost compared to the average transaction value reflects the potential impact on the seller's reputation and additional consequences.

Similar to the pizza shop example, true positives and true negatives do not directly incur any costs in this simplified scenario. However, they are crucial for the successful identification of fraudulent and non-fraudulent transactions, respectively, ensuring the financial security and integrity of the ruby emporium.

Using these costs we can build a net profit curve for the ruby emporium.

[Figure: Net profit curve for the ruby seller]

In this case, the optimum threshold is significantly skewed towards lower thresholds, reflecting the increased importance of high recall in the business.

If the F1-predicted threshold of 0.5 were used in this case, the normalized net profit would be 0.96, representing a 4% decrease in net profit. Given the price of the individual items sold, that lost profit could quickly become game-changing for this business!

The analysis indicates that the ruby emporium's priority should be to maximize the identification of fraudulent transactions, even if it means rejecting some legitimate customers. By emphasizing recall, the ruby seller can effectively mitigate risks and protect their business from significant financial losses and reputational damage.
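To illustrate how the same model outcomes can rank differently for the two businesses, here is a small Python comparison. The two operating points are hypothetical confusion counts on a 2,300-transaction dataset (2,000 legitimate, 300 fraudulent), and the profit formula is an assumed reading of the costs quoted for each business, not a formula stated in the article.

```python
def net_profit(tn, fp, fn, profit_per_sale, fp_cost, fn_cost):
    """Assumed cost model: profit on approved legitimate sales minus error costs."""
    return tn * profit_per_sale - fp * fp_cost - fn * fn_cost

# Per-transaction figures quoted for each business.
COSTS = {
    "pizza": dict(profit_per_sale=25, fp_cost=150, fn_cost=60),
    "ruby": dict(profit_per_sale=700, fp_cost=100, fn_cost=2500),
}

# Hypothetical outcomes of one model at two different thresholds.
OPERATING_POINTS = {
    "low threshold (high recall)": dict(tn=1700, fp=300, fn=16),
    "high threshold (high precision)": dict(tn=1985, fp=15, fn=190),
}

def best_operating_point(costs):
    """Name of the operating point with the highest net profit for these costs."""
    return max(
        OPERATING_POINTS,
        key=lambda name: net_profit(**OPERATING_POINTS[name], **costs),
    )

for business, costs in COSTS.items():
    print(f"{business}: {best_operating_point(costs)}")
```

Under these assumptions the pizza shop is better off at the high threshold and the ruby emporium at the low one, even though the underlying model and its precision, recall, and F1 values are identical for both.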


Customized vs. Standardized

The presented figure captures the essence of our analysis:

[Figure panels: recall for rubies and pizza; precision for rubies and pizza]

Our analysis showcases the importance of a customized strategy. It is essential for businesses to go beyond the simplistic evaluation provided by the F1 score to strike the right balance between precision and recall.

This approach enables businesses to:

  • Safeguard customer trust
  • Protect the business’s reputation
  • Mitigate financial risks
  • Optimize profitability
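As a back-of-the-envelope check on why the two optima sit on opposite sides of 0.5, consider the following Python sketch. It rests on two assumptions that the article does not state: the model's score is a well-calibrated fraud probability, and wrongly declining a legitimate customer forfeits the sale's profit in addition to the quoted false positive cost. Under those assumptions, declining beats approving once the fraud probability exceeds (profit + false positive cost) / (profit + false positive cost + false negative cost).

```python
def break_even_threshold(profit_per_sale, fp_cost, fn_cost):
    """Fraud probability above which declining is expected to beat approving.

    Assumes a calibrated score and the simplified cost model described above.
    """
    cost_of_wrong_decline = profit_per_sale + fp_cost
    return cost_of_wrong_decline / (cost_of_wrong_decline + fn_cost)

pizza = break_even_threshold(profit_per_sale=25, fp_cost=150, fn_cost=60)
ruby = break_even_threshold(profit_per_sale=700, fp_cost=100, fn_cost=2500)

print(f"pizza break-even threshold: {pizza:.2f}")
print(f"ruby break-even threshold:  {ruby:.2f}")
```

These figures will not match thresholds read off a real profit curve, since real scores are rarely perfectly calibrated and the true cost formula may differ. The direction is the useful part: expensive false positives push the threshold up, and expensive false negatives push it down.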

Taking Control of Your Fraud Detection

In the dynamic world of business, fraud can be a lurking threat. The need for precision and recall in fraud detection is clear, but it's even more important to understand how these metrics relate to your specific industry.

Here are some practical steps to take control of your fraud detection strategy.

  1. Assess your environment. Delve deep into your business landscape. Understand your industry, customer behavior, and the potential costs tied to fraud.
  2. Tailor your approach. Find the right balance between precision and recall based on your unique business priorities. Consider whether you operate in a highly competitive market or in a niche with limited alternatives.
  3. Analyze cost. Examine your transaction data to identify the financial implications of different types of errors, such as false positives and false negatives.
  4. Create a custom solution. Collaborate with experts to create a customized fraud detection solution that aligns perfectly with your business requirements.
  5. Implement and train. Ensure your team is well-versed in using the chosen fraud detection system and implement it effectively.
  6. Monitor regularly. Keep a close eye on your system's performance and be prepared to fine-tune your thresholds as your business evolves.
  7. Stay informed. Stay updated on the latest developments in fraud detection to adapt to emerging threats.

With the right strategy, precision, and recall, you can protect your business from fraud while maintaining customer trust.

And Kount can help.

Whether you have a restrictive recall requirement or prioritize perfect precision, we can customize our machine learning to align with your business’s goals. Reach out to the team today to explore how we can customize a fraud detection solution tailored to your business.

Don't let fraud undermine your success — take control today and secure your business's future.

Ramin Madarshahian

Data Scientist

Dr. Ramin Madarshahian is a seasoned data scientist specializing in fintech and fraud detection. With a PhD in Structural Engineering and a Master's in Statistics, Ramin brings over seven years of industry experience, including a prestigious postdoctorate at UC San Diego.

Currently serving as a staff data scientist at Equifax, Ramin is at the forefront of combating fraud, particularly in the dynamic landscape of fintech. His expertise in data-driven solutions has led to groundbreaking innovations, including patented methods for enhancing fraud detection models.

As a thought leader in the field, Ramin's dedication to advancing financial security is unparalleled. His contributions extend beyond Equifax, as the former chair and founder of the Data Science Technical Division for the Society for Experimental Mechanics (SEM), where he pioneered the application of data science in engineering.