
Workshop 2.1: R Demo Notebook#
- Contributors:
  - Ashwin Patil (@ashwinpatil)
  - Jose Rodriguez (@Cyb3rPandah)
  - Ian Hellen (@ianhellen)
- Agenda:
  - Jupyter is not just Python
- Notebook: https://aka.ms/Jupyterthon-ws-2-1
- License: Creative Commons Attribution-ShareAlike 4.0 International
- Q&A - OTR Discord #Jupyterthon #WORKSHOP DAY 2 - JUPYTER ADVANCED
Anomaly detection and threat hunting on Windows logon data using anomalize#
Reference: original blog post by Russ McRee.

# Suppress R warnings
options(warn = -1)
# Load R packages
pkgs <- c("tibbletime", "tidyverse", "anomalize", "jsonlite", "curl", "httr", "lubridate", "dplyr")
sapply(pkgs, function(x) suppressPackageStartupMessages(require(x, character.only = TRUE)))
- tibbletime: TRUE
- tidyverse: TRUE
- anomalize: TRUE
- jsonlite: TRUE
- curl: TRUE
- httr: TRUE
- lubridate: TRUE
- dplyr: TRUE
Install Missing packages#
install.packages("tibbletime")
Updating HTML index of packages in '.Library'
Making 'packages.html' ...
 done
install.packages("anomalize")
Updating HTML index of packages in '.Library'
Making 'packages.html' ...
 done
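Rather than installing each package by hand, a short helper like the sketch below (an addition for illustration; it assumes the `pkgs` vector defined in the loading cell above) installs only whatever is missing:
# Install only the packages from pkgs that are not already present
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}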
Read CSV using R#
# Read CSV into R
urlfile <- 'https://raw.githubusercontent.com/ashwin-patil/threat-hunting-with-notebooks/master/rawdata/UserLogons-demo.csv'
userlogondemo <- read.csv(urlfile)
Printing the structure of the data#
str(userlogondemo)
'data.frame':	2844 obs. of  6 variables:
 $ Date           : chr  "1/3/2018" "1/3/2018" "1/3/2018" "1/3/2018" ...
 $ EventId        : chr  "Microsoft-Windows-Security-Auditing:4624" "Microsoft-Windows-Security-Auditing:4624" "Microsoft-Windows-Security-Auditing:4624" "Microsoft-Windows-Security-Auditing:4624" ...
 $ AccountNtDomain: chr  "LABDOMAIN.LOCAL" "LABDOMAIN.LOCAL" "LABDOMAIN.LOCAL" "LABDOMAIN.LOCAL" ...
 $ AccountName    : chr  "SRVACCNT-01" "SRVACCNT-01" "SRVACCNT-01" "SRVACCNT-01" ...
 $ logontype      : int  10 11 2 4 7 10 4 3 3 5 ...
 $ TotalLogons    : int  28 2 15592 259 9 1 23 65 7 60 ...
Printing Sample Rows from the dataset#
head(userlogondemo)
|   | Date | EventId | AccountNtDomain | AccountName | logontype | TotalLogons |
|---|---|---|---|---|---|---|
|   | <chr> | <chr> | <chr> | <chr> | <int> | <int> |
| 1 | 1/3/2018 | Microsoft-Windows-Security-Auditing:4624 | LABDOMAIN.LOCAL | SRVACCNT-01 | 10 | 28 |
| 2 | 1/3/2018 | Microsoft-Windows-Security-Auditing:4624 | LABDOMAIN.LOCAL | SRVACCNT-01 | 11 | 2 |
| 3 | 1/3/2018 | Microsoft-Windows-Security-Auditing:4624 | LABDOMAIN.LOCAL | SRVACCNT-01 | 2 | 15592 |
| 4 | 1/3/2018 | Microsoft-Windows-Security-Auditing:4624 | LABDOMAIN.LOCAL | SRVACCNT-01 | 4 | 259 |
| 5 | 1/3/2018 | Microsoft-Windows-Security-Auditing:4624 | LABDOMAIN.LOCAL | SRVACCNT-01 | 7 | 9 |
| 6 | 1/3/2018 | Microsoft-Windows-Security-Auditing:4624 | LABDOMAIN.LOCAL | SRVACCNT-01 | 10 | 1 |
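The str() output above shows Date stored as a character column in month/day/year format. Before aggregating, a quick check (a sketch added here, not part of the original notebook) can confirm every value parses cleanly with that format:
# Parse the character Date column with the m/d/Y format used in this file
parsed_dates <- as.Date(userlogondemo$Date, format = "%m/%d/%Y")
# Count of rows that failed to parse (expected to be 0) and the date range covered
sum(is.na(parsed_dates))
range(parsed_dates, na.rm = TRUE)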
# Sort the downloaded data by account, domain, and date
userlogonsummary <- userlogondemo %>%
                    arrange(AccountName, AccountNtDomain, Date)
# Aggregate into a per-user, per-day logon count view
byuser <- userlogonsummary %>%
          mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
          group_by(Date, AccountName) %>%
          summarise(logoncount = sum(TotalLogons)) %>%
          ungroup() %>%
          arrange(AccountName, Date)
head(byuser)
`summarise()` has grouped output by 'Date'. You can override using the `.groups` argument.
| Date | AccountName | logoncount | 
|---|---|---|
| <date> | <chr> | <int> | 
| 2018-01-01 | ASHWIN | 563 | 
| 2018-01-03 | ASHWIN | 1462 | 
| 2018-01-04 | ASHWIN | 1416 | 
| 2018-01-05 | ASHWIN | 2241 | 
| 2018-01-06 | ASHWIN | 1830 | 
| 2018-01-08 | ASHWIN | 278 | 
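Before modeling, it can help to see how many days of data each account has and how its logon volume compares to the others. A small profiling sketch (an addition, using only columns already present in byuser):
# Profile the aggregated view: days observed and total logons per account,
# which helps pick candidate accounts for anomaly detection
byuser %>%
  group_by(AccountName) %>%
  summarise(days = n_distinct(Date),
            total_logons = sum(logoncount),
            .groups = "drop") %>%
  arrange(desc(total_logons))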
Time Series anomaly detection#
Anomalize (business-science/anomalize) enables a tidy workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). Combined, they make it straightforward to decompose a time series, detect anomalies, and create bands separating the "normal" data from the anomalous data.
anomalize has three main functions:
- time_decompose(): Separates the time series into seasonal, trend, and remainder components.
- anomalize(): Applies anomaly detection methods to the remainder component.
- time_recompose(): Calculates limits that separate the "normal" data from the anomalies.
# Filter dataset for a specific user
FilteredAccount <- "SRVACCNT-01"
# Ungroup dataset, run time series decomposition, and plot anomalies
graphUser <- byuser %>%
  filter(AccountName == FilteredAccount) %>%
  ungroup() %>%
  time_decompose(logoncount, method = "twitter", trend = "3 months") %>%
  anomalize(remainder, method = "gesd") %>%
  time_recompose() %>%
  # Anomaly visualization
  plot_anomalies(time_recomposed = TRUE) +
  labs(title = paste0("User Anomaly: ", FilteredAccount), subtitle = "Twitter + GESD Methods")
plot(graphUser)
Registered S3 method overwritten by 'tune':
  method                   from   
  required_pkgs.model_spec parsnip
Converting from tbl_df to tbl_time.
Auto-index message: index = Date
frequency = 6 days
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
median_span = 53 days
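The plot highlights the days that fall outside the recomposed bands. To list those flagged dates as a table instead of a chart, the same pipeline can be filtered on the anomaly column it produces (a sketch added for illustration; it reuses byuser and FilteredAccount from the cell above):
# Tabulate the dates flagged as anomalous for the selected account,
# using the same Twitter decomposition and GESD detection as the plot above
byuser %>%
  filter(AccountName == FilteredAccount) %>%
  ungroup() %>%
  time_decompose(logoncount, method = "twitter", trend = "3 months") %>%
  anomalize(remainder, method = "gesd") %>%
  time_recompose() %>%
  filter(anomaly == "Yes") %>%
  select(Date, observed, anomaly)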
 
# Plot the time series decomposition components separately
byuser %>%
    filter(AccountName == FilteredAccount) %>%
    ungroup() %>%
    time_decompose(logoncount, method = "twitter", trend = "3 months") %>%
    anomalize(remainder, method = "gesd") %>%
    plot_anomaly_decomposition() +
    labs(title = "Decomposition of Anomalized Logons")
Converting from tbl_df to tbl_time.
Auto-index message: index = Date
frequency = 6 days
median_span = 53 days
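anomalize() also exposes alpha and max_anoms arguments that control how aggressively points are flagged: a larger alpha makes it easier for a point to be marked anomalous, while max_anoms caps the share of points that can be flagged. The sketch below (an addition, not part of the original notebook) re-runs the plot with explicit values to show the effect:
# Re-run the anomaly plot with explicit sensitivity settings:
# a larger alpha flags borderline days more readily; max_anoms limits
# the maximum proportion of days that can be marked anomalous
byuser %>%
  filter(AccountName == FilteredAccount) %>%
  ungroup() %>%
  time_decompose(logoncount, method = "twitter", trend = "3 months") %>%
  anomalize(remainder, method = "gesd", alpha = 0.1, max_anoms = 0.1) %>%
  time_recompose() %>%
  plot_anomalies(time_recomposed = TRUE) +
  labs(title = paste0("User Anomaly (alpha = 0.1): ", FilteredAccount),
       subtitle = "Twitter + GESD Methods")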
 
