As Security Engineers and SOC Analysts, one of our main pain points is the sheer volume of false positive incidents and alerts we have to sift through every day. An exercise I like to do every couple of months is going through the entire history of alerts and incidents that triggered in my environment. I then identify trends and create exclusions for alerts triggered by normal day-to-day operations within my organization.
By following this high-level workflow every quarter or so, I have been able to reduce the number of false positives significantly. The number of alerts displayed on my board is now manageable, and I can go through everything every day without feeling overwhelmed or worrying that I may have missed something critical.
High-Level Process
- Log in to security.microsoft.com and navigate to Alerts.
- Expand the timeframe to the desired period. I usually do 6 months' worth of alerts.
- Remove any filters to make sure the list of alerts includes everything, including resolved alerts.
- Next, click Export and wait for the .csv file to be ready for download, then download it.
- Once downloaded, open the .csv file and select the entire list of alerts (including all of the columns).
- Click Analyze Data under the Home tab.
- One of the PivotCharts should be Count of Alert Name; click Insert PivotChart if you see it. If you can't see it, type "Count of unique alert name" in the "Ask a question about your data" search bar and click Insert PivotChart on the Alert name answer that is returned.
- A new Sheet1 will automatically be added to the spreadsheet. It should look something similar to the following:
- From this point on, I like to address the alerts from top to bottom, prioritizing the ones with higher counts. This is my approach:
- Select the Alert name.
- Find out its source (custom detection, Analytics Rule, native Defender alert, etc.).
- If it’s based on a Custom Detection or Analytics Rule, I like to manually run the corresponding KQL query and analyze the results.
- Identify trends, work with different teams, and decide whether or not the detections are false positives.
- Manually amend the KQL queries to exclude the false positives and save.
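If you prefer to skip the Excel pivot, a similar top-alert count can be produced directly in Advanced hunting. A minimal sketch, assuming your alerts live in the `AlertInfo` table (note that Advanced hunting retention is typically much shorter than the portal's alert history, so the exported .csv may still be the more complete source):

```kql
// Count alerts by title over the chosen period, most frequent first.
AlertInfo
| where Timestamp > ago(180d)
| summarize AlertCount = count() by Title
| order by AlertCount desc
```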
Example:
For the sake of this example, I am going to pick on a custom detection called Detection of Powershell making a registry change.
First, in Microsoft Defender, I navigate to Investigation & Response > Hunting > Custom detection rules, then search for and click on the custom rule in question.

Next, I click the three dots and select Modify query.

This should bring up Advanced hunting and the exact query being used for the Custom Detection rule. I usually copy this KQL query into a new tab so I can tweak it without having to worry about messing something up.
In our setup, these Custom Detection rules usually have "| where Timestamp > ago(90d)", which I like to modify to 180d. How much data actually comes back will depend on your retention policies for the relevant data tables.
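As a sketch, the tweak looks like this. The query below is a hypothetical reconstruction of a PowerShell-registry detection, not the actual rule from my environment:

```kql
// Hypothetical reconstruction of the custom detection, widened from 90d to 180d.
DeviceRegistryEvents
| where Timestamp > ago(180d)   // originally ago(90d)
| where ActionType == "RegistryValueSet"
| where InitiatingProcessFileName in~ ("powershell.exe", "pwsh.exe")
```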
Next, I run the query and export the results to a .csv file so I can analyze them easily.

Detailed Analysis and Trend Detection
In the exported spreadsheet, I inspect the various columns (DeviceName, the registry-related columns, initiating accounts, command lines, and so on). My goal here is to identify what caused the alert to trigger and see if I can spot certain patterns. I find myself working with the Endpoints and Servers team very often, as they are the ones running the patching and automation initiatives that sometimes trigger these alerts.
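Before (or instead of) exporting to Excel, a quick aggregation over the suspect columns often surfaces the noisy sources on its own. A sketch, reusing the same hypothetical PowerShell-registry query from above:

```kql
// Group the detection's results to see which devices, accounts,
// and command lines dominate the hits.
DeviceRegistryEvents
| where Timestamp > ago(180d)
| where ActionType == "RegistryValueSet"
| where InitiatingProcessFileName in~ ("powershell.exe", "pwsh.exe")
| summarize Hits = count()
    by DeviceName, InitiatingProcessAccountName, InitiatingProcessCommandLine
| order by Hits desc
```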
Depending on my findings, I tweak the KQL query to exclude certain things as I confirm they can safely be ignored.
The rule of thumb here is to be as explicit as possible and make the exclusions as tight as possible; otherwise, we could end up exposing our environment unnecessarily.
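For example, rather than excluding powershell.exe wholesale, scope the exclusion to the exact account, script, and registry key you have verified. A sketch of a clause appended to the detection query, where every name is a hypothetical placeholder:

```kql
// Tight exclusion: all three conditions must match before a row is dropped.
| where not (InitiatingProcessAccountName =~ "svc_patching"                       // hypothetical service account
    and InitiatingProcessCommandLine has @"C:\Scripts\patch-rollout.ps1"          // hypothetical script path
    and RegistryKey startswith @"HKEY_LOCAL_MACHINE\SOFTWARE\Contoso\Patching")   // hypothetical key
```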
I had already revisited the rule I picked as an example for this post a few months ago, so I did not find anything new that needed to be excluded, but I hope you get the idea. Your environment will have completely different findings, and you will need to put in some effort to understand the different behaviors.
Thoughts
I have to say, this exercise consistently helps me understand what happens in my environment. It makes it easy to build mental maps and benchmarks that help with other exercises like Incident Response and Threat Hunts.
Do you have a different way of approaching false positives and minimizing alert fatigue as a Cybersecurity professional?
