

Workshop 2.1: R Demo Notebook

Anomaly detection and threat hunting on Windows logon data using anomalize

Reference: original blog post by Russ McRee:

Quick List of Useful Packages

# Suppress R warnings
options(warn = -1)
# Load R packages
pkgs <- c("tibbletime", "tidyverse", "anomalize", "jsonlite", "curl", "httr", "lubridate", "dplyr")
sapply(pkgs, function(x) suppressPackageStartupMessages(require(x, character.only = TRUE)))
tibbletime
TRUE
tidyverse
TRUE
anomalize
TRUE
jsonlite
TRUE
curl
TRUE
httr
TRUE
lubridate
TRUE
dplyr
TRUE

Install Missing packages

install.packages("tibbletime")
Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done
install.packages("anomalize")
Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done
# Suppress R warnings
options(warn = -1)
# Reload R packages after installing the missing ones
pkgs <- c("tibbletime", "tidyverse", "anomalize", "jsonlite", "curl", "httr", "lubridate", "dplyr")
sapply(pkgs, function(x) suppressPackageStartupMessages(require(x, character.only = TRUE)))
tibbletime
TRUE
tidyverse
TRUE
anomalize
TRUE
jsonlite
TRUE
curl
TRUE
httr
TRUE
lubridate
TRUE
dplyr
TRUE

Read csv using R

# Read CSV into R
urlfile<-'https://raw.githubusercontent.com/ashwin-patil/threat-hunting-with-notebooks/master/rawdata/UserLogons-demo.csv'
userlogondemo<-read.csv(urlfile)

Printing the structure of the data

str(userlogondemo)
'data.frame':	2844 obs. of  6 variables:
 $ Date           : chr  "1/3/2018" "1/3/2018" "1/3/2018" "1/3/2018" ...
 $ EventId        : chr  "Microsoft-Windows-Security-Auditing:4624" "Microsoft-Windows-Security-Auditing:4624" "Microsoft-Windows-Security-Auditing:4624" "Microsoft-Windows-Security-Auditing:4624" ...
 $ AccountNtDomain: chr  "LABDOMAIN.LOCAL" "LABDOMAIN.LOCAL" "LABDOMAIN.LOCAL" "LABDOMAIN.LOCAL" ...
 $ AccountName    : chr  "SRVACCNT-01" "SRVACCNT-01" "SRVACCNT-01" "SRVACCNT-01" ...
 $ logontype      : int  10 11 2 4 7 10 4 3 3 5 ...
 $ TotalLogons    : int  28 2 15592 259 9 1 23 65 7 60 ...

Printing Sample Rows from the dataset

head(userlogondemo)
A data.frame: 6 × 6
  Date      EventId                                   AccountNtDomain  AccountName  logontype  TotalLogons
  <chr>     <chr>                                     <chr>            <chr>        <int>      <int>
1 1/3/2018  Microsoft-Windows-Security-Auditing:4624  LABDOMAIN.LOCAL  SRVACCNT-01  10         28
2 1/3/2018  Microsoft-Windows-Security-Auditing:4624  LABDOMAIN.LOCAL  SRVACCNT-01  11         2
3 1/3/2018  Microsoft-Windows-Security-Auditing:4624  LABDOMAIN.LOCAL  SRVACCNT-01  2          15592
4 1/3/2018  Microsoft-Windows-Security-Auditing:4624  LABDOMAIN.LOCAL  SRVACCNT-01  4          259
5 1/3/2018  Microsoft-Windows-Security-Auditing:4624  LABDOMAIN.LOCAL  SRVACCNT-01  7          9
6 1/3/2018  Microsoft-Windows-Security-Auditing:4624  LABDOMAIN.LOCAL  SRVACCNT-01  10         1

# Sort the downloaded data by account name, domain, and date
userlogonsummary <- userlogondemo %>%
                    arrange(AccountName,AccountNtDomain,Date)
# Aggregate By User Logon View
byuser <- userlogonsummary %>%
          mutate(Date = as.Date(Date, "%m/%d/%Y")) %>% 
          group_by(Date, AccountName) %>%
          summarise(logoncount=sum(TotalLogons)) %>% 
          ungroup() %>%
          arrange(AccountName, Date)

head(byuser)
`summarise()` has grouped output by 'Date'. You can override using the `.groups` argument.
A tibble: 6 × 3
Date        AccountName  logoncount
<date>      <chr>        <int>
2018-01-01  ASHWIN       563
2018-01-03  ASHWIN       1462
2018-01-04  ASHWIN       1416
2018-01-05  ASHWIN       2241
2018-01-06  ASHWIN       1830
2018-01-08  ASHWIN       278
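
As a cross-check on the aggregation step, the following is a minimal base R sketch of the same idea: parse the month/day/year date strings, then sum the logon counts per account per day. The `toy` data frame and its values are hypothetical and only mimic the column names of the demo dataset.

```r
# Hypothetical miniature of the logon data (values are made up for illustration)
toy <- data.frame(
  Date        = c("1/3/2018", "1/3/2018", "1/4/2018", "1/3/2018"),
  AccountName = c("SRVACCNT-01", "SRVACCNT-01", "SRVACCNT-01", "ASHWIN"),
  TotalLogons = c(28L, 2L, 15L, 7L)
)

# Dates are stored as month/day/year strings, so parse them explicitly
toy$Date <- as.Date(toy$Date, format = "%m/%d/%Y")

# Sum logons per account per day, then sort by account and date
daily <- aggregate(TotalLogons ~ Date + AccountName, data = toy, FUN = sum)
daily <- daily[order(daily$AccountName, daily$Date), ]

# SRVACCNT-01 on 2018-01-03 collapses two rows into one: 28 + 2 = 30
print(daily)
```

Collapsing the per-logon-type rows into one count per account per day is what gives the later time series step a single observation per date.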

Time Series anomaly detection

anomalize (https://github.com/business-science/anomalize) enables a tidy workflow for detecting anomalies in data. Its main functions are time_decompose(), anomalize(), and time_recompose(). Combined, they make it straightforward to decompose a time series, detect anomalies, and create bands separating the “normal” data from the anomalous data.

anomalize has three main functions:

  • time_decompose(): Separates the time series into seasonal, trend, and remainder components.

  • anomalize(): Applies anomaly detection methods to the remainder component.

  • time_recompose(): Calculates limits that separate the “normal” data from the anomalies.
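
To make the three steps above concrete, here is a minimal base R sketch of the same idea. It is not anomalize's implementation: it assumes a synthetic daily series (the values, `spike_day`, and `iqr_factor` are hypothetical), uses `stats::stl()` for the decomposition, and applies a simple IQR rule to the remainder as a stand-in for the package's detection methods.

```r
# Synthetic daily logon counts: 12 weeks with a weekly cycle and one injected spike
set.seed(42)
n_days <- 84
weekday_effect <- rep(c(20, 25, 22, 24, 18, -50, -55), length.out = n_days)
logons <- 200 + weekday_effect + rnorm(n_days, sd = 5)
spike_day <- 40                                  # hypothetical anomalous day
logons[spike_day] <- logons[spike_day] + 500

# Step 1 (decompose): split into seasonal, trend, and remainder components
fit <- stl(ts(logons, frequency = 7), s.window = "periodic", robust = TRUE)
comp <- fit$time.series
remainder <- as.numeric(comp[, "remainder"])

# Step 2 (detect): flag remainders far outside the interquartile range
iqr_factor <- 3                                  # assumed width of the "normal" band
q <- unname(quantile(remainder, c(0.25, 0.75)))
lower <- q[1] - iqr_factor * (q[2] - q[1])
upper <- q[2] + iqr_factor * (q[2] - q[1])
is_anomaly <- remainder < lower | remainder > upper

# Step 3 (recompose): add seasonal + trend back to get bands around the observed data
baseline <- as.numeric(comp[, "seasonal"] + comp[, "trend"])
result <- data.frame(
  day        = seq_len(n_days),
  observed   = logons,
  lower_band = baseline + lower,
  upper_band = baseline + upper,
  anomaly    = is_anomaly
)

# Rows flagged as anomalous
print(result[result$anomaly, ])
```

Detecting on the remainder rather than on the raw counts keeps an ordinary weekday/weekend swing from being reported as an anomaly.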

# Filtering dataset for specific User
FilteredAccount = "SRVACCNT-01"

# Ungroup the dataset, run the time series decomposition, and plot anomalies
graphUser <- byuser %>%
  filter(AccountName == FilteredAccount) %>%
  ungroup() %>%
  time_decompose(logoncount, method = "twitter", trend = "3 months") %>%
  anomalize(remainder, method = "gesd") %>%
  time_recompose() %>%
  # Anomaly visualization
  plot_anomalies(time_recomposed = TRUE) +
  labs(title = paste0("User Anomaly: ", FilteredAccount), subtitle = "Twitter + GESD Methods")

plot(graphUser)
Registered S3 method overwritten by 'tune':
  method                   from   
  required_pkgs.model_spec parsnip

Converting from tbl_df to tbl_time.
Auto-index message: index = Date

frequency = 6 days

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 

median_span = 53 days
[Figure: anomaly plot titled "User Anomaly: SRVACCNT-01", subtitle "Twitter + GESD Methods"]
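
The `gesd` method passed to anomalize() above refers to the generalized extreme Studentized deviate (ESD) test. The sketch below is the textbook form of that test in base R, using the mean and standard deviation. The function name `gesd_outliers`, its arguments, and the sample vector are hypothetical, and anomalize's own implementation may differ in detail.

```r
# Textbook generalized ESD test: repeatedly remove the most extreme point and
# compare its test statistic with a critical value from the t distribution.
gesd_outliers <- function(x, max_outliers = 5, alpha = 0.05) {
  n <- length(x)
  idx <- seq_len(n)        # original positions still under consideration
  removed <- integer(0)    # positions removed so far, in order of removal
  exceeds <- logical(0)    # whether each removal's statistic beat its critical value
  for (i in seq_len(max_outliers)) {
    vals <- x[idx]
    dev <- abs(vals - mean(vals))
    j <- which.max(dev)
    r_i <- dev[j] / sd(vals)
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda_i <- (n - i) * t_crit / sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
    removed <- c(removed, idx[j])
    exceeds <- c(exceeds, r_i > lambda_i)
    idx <- idx[-j]
  }
  # The number of outliers is the largest i whose statistic exceeded its critical value
  n_out <- if (any(exceeds)) max(which(exceeds)) else 0
  removed[seq_len(n_out)]
}

# Hypothetical remainder values with two obvious extremes at positions 16 and 20
x <- c(10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
       12, 10, 11, 13, 12, 95, 11, 12, 10, 140)
gesd_outliers(x)   # flags positions 20 and 16 for this sample
```

Because the test removes one candidate at a time and re-estimates the spread, a very large outlier does not mask a smaller one, which is useful when a single account shows several unusual days.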
# Plot the time series decomposition components separately
byuser %>%
    filter(AccountName == FilteredAccount) %>% 
    ungroup()%>%
    time_decompose(logoncount, method = "twitter", trend = "3 months") %>%
    anomalize(remainder, method = "gesd") %>%
    plot_anomaly_decomposition() +
    labs(title = "Decomposition of Anomalized Logons")
Converting from tbl_df to tbl_time.
Auto-index message: index = Date

frequency = 6 days

median_span = 53 days
[Figure: decomposition plot titled "Decomposition of Anomalized Logons"]