Agenda (Eastern Time)

Friday, May 8th




Welcome

Roberto Rodriguez @Cyb3rWard0g, Threat Researcher, Microsoft MSTIC


What About Jupyter Notebooks?

An introduction to Jupyter Notebooks to kick off the main event! We are going to talk about the basics of notebooks, their architecture, how to deploy your own at home, and how to use open infrastructure to deploy one in the cloud and access it through your favorite browser. In addition, we are going to go over a few ways to connect from a notebook to popular SIEMs such as ELK, Splunk and Azure Sentinel and return a dataframe to perform additional analysis.

Roberto Rodriguez @Cyb3rWard0g, Threat Researcher, Microsoft MSTIC


Basic Data Analysis techniques with PySpark over the ATT&CK APT29 Evals datasets

In this session, we will go over a few data analysis techniques leveraging PySpark and visualizations to generate additional context around security event logs and get a better understanding of adversarial techniques through data. We are going to be using the open dataset created by the Mordor project’s team after following the MITRE ATT&CK evaluations methodology and executing the public emulation plans.

Jose Luis Rodriguez @Cyb3rPandaH, Threat Researcher, NOVA Community College




The forever journey of making hard technical concepts/activities approachable, repeatable and explainable

A Security Operations Center can be a hard thing to explain, often involving complex technical activities like managing detection rules and threat hunting. At Expel, we want our service to be as transparent as possible. For the past 2 1/2 years we’ve been using Jupyter notebooks to help make these kinds of difficult tasks approachable, explainable, and repeatable. In this presentation we will take you on our journey from simple beginnings (naively thinking storing notebooks on GitHub was good enough) to our current CI/CD pipeline with JupyterHub and beyond! We’ll demo a few notebooks we’ve built that are used daily for various tasks including triaging hunting results, evaluating detection settings, and gathering metrics on the state of our security operations.

Andrew Prichett, Senior Detection & Response Engineer, Expel
Dan Whalen @vac4n7, Principal Detection & Response Engineer, Expel
Jon Hencinski @jhencinski, Director of Global Security Operations, Expel
Peter Silberman @petersilberman, CTO, Expel


The Basics of Forensics/IR in a Gsuite Environment

The Basics of Forensics/IR in a Gsuite Environment.

Jeff Bryner @0x7eff, CISO, Vacasa


Jupytering your security operations.

This talk will not focus on a particular notebook. Instead, it focuses on our approach to solving some common security operations problems, such as:
- Low efficiency, such as the number of ‘clicks’ before the analyst can obtain the information they need for an event
- High turnover, which leads to operation centers not running to a consistent standard
- Lack of knowledge sharing between analysts
- Lack of adequate documentation of work
- Different technology user interfaces to work with

We will share our approach to the problem using a combination of a containerized solution (Kubernetes), Jupyter notebooks and API wrappers. With that, we built an environment that attempts to solve some of these problems. The Jupyter environment allows an analyst to perform different external and internal queries, such as virustotal and passivetotal, or internal tooling such as a ticketing system, SIEM and sandbox solution, through the use of different API wrappers. The analyst can stay in the same environment to perform their investigation without pivoting to different platforms. And since Jupyter Notebook supports Python, an analyst can use Python to work with the data and also leverage data science libraries such as pandas, numpy, and hvplot to help them investigate or hunt. The analyst can also share their investigation thought process by sharing their notebooks with other analysts without much additional documentation work.

The environment is managed using container technology. This provides a consistent environment that analysts do not need to maintain themselves. It also allows us to patch or upgrade the environment efficiently. We also included personal and shared persistent storage, which allows analysts to store their own notebooks and share notebooks with other analysts easily.

Wei Chea @77_6a, Security Engineer, Grab


OSQuery Table Visualizer

This notebook was created out of curiosity about how OSQuery tables relate to each other for possible JOINs. The audience can learn how to extract specific data from a source and transform it into a graph to display relationships.

Sevickson @SKwid345, Security Engineer, DICTU
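
As a rough illustration of the idea behind the OSQuery Table Visualizer session above (not the presenter's notebook), the following minimal sketch links tables that share a column name, which is one simple heuristic for spotting JOIN candidates. The `SCHEMAS` dictionary is a hypothetical, heavily trimmed stand-in for real osquery schema data, and `join_candidates` is a name invented for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, heavily trimmed schemas used only for illustration;
# real osquery tables expose many more columns than shown here.
SCHEMAS = {
    "processes": ["pid", "name", "path", "uid"],
    "process_open_sockets": ["pid", "remote_address", "remote_port"],
    "users": ["uid", "username"],
    "listening_ports": ["pid", "port", "address"],
}


def join_candidates(schemas):
    """Return {(table_a, table_b): [shared columns]} for tables sharing a column name."""
    # Invert the schema: column name -> set of tables that contain it.
    tables_by_column = defaultdict(set)
    for table, columns in schemas.items():
        for column in columns:
            tables_by_column[column].add(table)

    # Every pair of tables sharing a column becomes an edge labelled with that column.
    edges = defaultdict(set)
    for column, tables in tables_by_column.items():
        for table_a, table_b in combinations(sorted(tables), 2):
            edges[(table_a, table_b)].add(column)
    return {pair: sorted(columns) for pair, columns in edges.items()}


if __name__ == "__main__":
    for (table_a, table_b), columns in sorted(join_candidates(SCHEMAS).items()):
        print(f"{table_a} -- {table_b} via {', '.join(columns)}")
```

The resulting edge dictionary could then be handed to a graph library of your choice for drawing. Matching on column names alone is an assumption that will produce some false links, so treat the edges as hints rather than guaranteed JOIN keys.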




MSTICPy and Notebooklets

Jupyter notebooks are an amazing medium for developing repeatable data analysis tasks but have a few drawbacks:
1. Code re-use is limited to copy/paste and your ability to remember just where that cool bit of code you once wrote was.
2. Big code blocks detract from the results you are trying to present and can be intimidating to non-programmers.
3. Very limited testability.

We put a lot of our reusable InfoSec code into an open source Python package, msticpy. This allows us to condense complex visualization and analysis routines into one or two lines of code. However, we were still writing common sequences of code cells in many notebooks, most of them very specific to the data environment/schema we were working in. Notebooklets (msticnb) aims to capture platform/SIEM-specific notebook patterns in a reusable format. You can import notebooklets such as host_summary and network_flow_summary into your notebook and execute the equivalent of several cells with just a couple of lines of code. In this session, I’ll look at both msticpy and msticnb and demo some of their capabilities.

Ian Hellen @ianhellen, Principal Software Engineer, Microsoft MSTIC


GPU-accelerated network mapping

A basic capability is mapping a collection of network logs - Zeek, CloudTrails, FireEye/Palo Alto firewall, VPN, etc. - and especially mapping them together. Visual graph analysis of log files and SIEM queries is a natural fit. The trick is handling log volume and avoiding UI coding. This surprisingly tiny notebook shows how to go from a large log file, through the emerging pip-installable open source GPU Python analytics ecosystem (think Pandas, SQL, SciKit, Spark, NetworkX, …), and into the point-and-click GPU visual graph analysis tool Graphistry. The result is that, with a few lines of code, you can quickly go from a log query to graph insights, even when there are billions of nodes and edges!

Leo Meyerovich @lmeyerov, CEO, Graphistry
Rodrigo Aramburu @rodaramburu, CEO, BlazingSQL
Winston Robson @winstonarobson, Data Scientist, BlazingSQL
Brad Rees @BradReesWork, Sr. Eng. Manager, Nvidia
Bartley Richardson @BartleyR, AI Infra Manager, Nvidia


Distinguishing Human from Machine Interaction

Use Jupyter, Pandas and some basic statistics to distinguish between humans and machines.

Thomas Patzke @blubbfiction, Threat Researcher, NA


Making notebooks work for everyone.

Rather than focus on a single notebook, I will cover some of the highlights of the Azure Sentinel Notebooks we have published, from the perspective of making them usable and resilient for a wide range of users and skill sets. This will focus less on the specific technology and features (although we will look at some) and more on how we have taken Notebooks from something you personally develop and use at an individual or team level and developed generic versions that can in theory be used by any Sentinel customer regardless of skill, and largely agnostic of what data you have. All the Notebooks we have are open sourced, and I will pick elements from across them to demo and talk about.

Pete Bryan @MSSPete, Senior Software Engineer, Microsoft MSTIC


Collecting IOCs to Detect Encrypted DNS

There has been some attention given lately to different protocols used to encrypt DNS traffic and their pros and cons with respect to privacy and security. Since encrypted DNS on our networks could lessen the visibility of several of the tools in our security stacks, I looked into ways to detect and prevent its use in an enterprise network setting. One of the simplest options is to collect and use domains and IPs publicly known to be used for DNS encryption. There is a good source for that information, but unfortunately the format is not ideal. This notebook will take you through the process of decoding DNSStamps into IOCs that can be used to detect and/or block the use of DNS encryption on your network. This code can be used to ensure that the lists are always up to date based on the configuration files that DNS encryption software uses.

Troy Kent @SonicTheHexHog, Threat Researcher, Awake Security
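
To make the decoding step described in the Encrypted DNS session above more concrete, here is a minimal sketch of a DNS stamp decoder. It is not the presenter's notebook. It assumes the publicly documented stamp layout as I understand it: an `sdns://` prefix, unpadded URL-safe base64, a protocol byte, 8 bytes of properties, then length-prefixed fields. The function names (`decode_stamp`, `build_doh_stamp`) and the sample address and hostname are hypothetical.

```python
import base64

# Protocol identifiers, as assumed from the public DNS stamp specification.
PROTOCOLS = {0x00: "plain", 0x01: "DNSCrypt", 0x02: "DoH", 0x03: "DoT"}


def _read_lp(buf, pos):
    """Read one length-prefixed field; return (field bytes, next position)."""
    length = buf[pos]
    end = pos + 1 + length
    if end > len(buf):
        raise ValueError("truncated stamp")
    return buf[pos + 1:end], end


def _read_vlp(buf, pos):
    """Read a list of length-prefixed items; a set high bit means more items follow."""
    items = []
    while True:
        prefix = buf[pos]
        end = pos + 1 + (prefix & 0x7F)
        if end > len(buf):
            raise ValueError("truncated stamp")
        items.append(buf[pos + 1:end])
        pos = end
        if not prefix & 0x80:
            return items, pos


def decode_stamp(stamp):
    """Extract address/hostname indicators from an sdns:// stamp."""
    if not stamp.startswith("sdns://"):
        raise ValueError("not a DNS stamp")
    payload = stamp[len("sdns://"):]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    raw = base64.urlsafe_b64decode(payload)

    proto = raw[0]
    pos = 1 + 8  # skip the protocol byte and the 8-byte properties field
    addr, pos = _read_lp(raw, pos)
    result = {"protocol": PROTOCOLS.get(proto, hex(proto)), "address": addr.decode("ascii")}

    if proto == 0x01:  # DNSCrypt: public key, then provider name
        _public_key, pos = _read_lp(raw, pos)
        provider, pos = _read_lp(raw, pos)
        result["provider"] = provider.decode("ascii")
    elif proto in (0x02, 0x03):  # DoH / DoT: certificate hashes, then hostname
        _hashes, pos = _read_vlp(raw, pos)
        hostname, pos = _read_lp(raw, pos)
        result["hostname"] = hostname.decode("ascii")
        if proto == 0x02:  # DoH additionally carries the URL path
            path, pos = _read_lp(raw, pos)
            result["path"] = path.decode("ascii")
    return result


def build_doh_stamp(address, hostname, path):
    """Hypothetical helper that builds a DoH stamp without hashes, for demonstration only."""
    def lp(text):
        data = text.encode("ascii")
        return bytes([len(data)]) + data

    raw = bytes([0x02]) + bytes(8) + lp(address) + bytes([0]) + lp(hostname) + lp(path)
    return "sdns://" + base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


if __name__ == "__main__":
    sample = build_doh_stamp("192.0.2.1", "doh.example.com", "/dns-query")
    print(decode_stamp(sample))
```

The `address` and `hostname` values are the parts that would feed an IP or domain blocklist. Optional trailing fields such as bootstrap addresses are ignored in this sketch.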


Anomaly detection and visualization using Time Series Decomposition

As part of security monitoring and incident response, analysts often develop detections based on static thresholds within a specified time window; for example, a brute force detection may trigger on 50 logon failures in 1 minute. Traditionally this threshold value is static and is chosen either by manually analyzing the historical trend of events or by taking an average over a longer period. In typical enterprise environments, these detections flag false positives or miss true positives, since a static threshold does not account for time intervals such as after hours, weekends, and holidays, which affect its value. In addition, when a static threshold is reached or only slightly exceeded, the results are often uninteresting and generate false positives for analysts. As part of triage, analysts keep improving detections through whitelisting to reduce the false positive rate. This approach is not scalable, and time series analysis can help in such cases. It identifies time-based patterns (e.g. seasonality, trend) by extracting meaningful statistics to correctly flag outliers. The outliers are generally robust to false positives because seasonality and historical trend are considered before flagging. In this session, we will look at practical use cases on security event logs along with a walkthrough of an example implementation notebook. We will start with loading and preparing data using Pandas, then use the Seasonal-Trend decomposition using LOESS (STL) approach available in the stats library to create a baseline pattern from seasonality, trend, and residuals, and then use a z-score to calculate a score that can be used to flag point anomalies deviating far from the baseline. Lastly, we will visualize the results as an interactive chart using the Bokeh library. We will conclude the session with how these functionalities are natively available in the msticpy library, so they can be used on any type of data.

Ashwin Patil @ashwinpatil, Senior Program Manager, Microsoft MSTIC
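
The contrast between a static threshold and a seasonality-aware baseline, as described in the time series session above, can be sketched in a few lines. This is a simplified stand-in and not STL: instead of a LOESS-based decomposition, it uses the mean and standard deviation of each hour-of-day slot as the seasonal baseline, ignores trend, and scores each point with a z-score against its own slot. The function names, the `period` and `threshold` defaults, and the synthetic logon counts are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def seasonal_zscores(counts, period=24):
    """Score each point against the mean/std of its own slot in the cycle (e.g. hour of day)."""
    # Group observations by their position in the seasonal cycle.
    by_slot = defaultdict(list)
    for index, value in enumerate(counts):
        by_slot[index % period].append(value)

    # Per-slot baseline: a crude replacement for the seasonal component of a decomposition.
    baseline = {slot: (mean(values), pstdev(values)) for slot, values in by_slot.items()}

    scores = []
    for index, value in enumerate(counts):
        mu, sigma = baseline[index % period]
        # A slot with no variation cannot produce a meaningful z-score; report 0.
        scores.append(0.0 if sigma == 0 else (value - mu) / sigma)
    return scores


def flag_anomalies(counts, period=24, threshold=3.0):
    """Return the indices whose absolute z-score exceeds the threshold."""
    scores = seasonal_zscores(counts, period)
    return [index for index, score in enumerate(scores) if abs(score) > threshold]


if __name__ == "__main__":
    # Synthetic hourly logon-failure counts for 14 days: busy from 09:00 to 17:00, quiet otherwise.
    counts = [40 if 9 <= hour < 17 else 5 for _day in range(14) for hour in range(24)]
    # Inject 30 failures at 03:00 on day 10: below the daytime norm, unusual for that hour.
    counts[10 * 24 + 3] = 30
    print(flag_anomalies(counts))
```

In this synthetic series, a static threshold set above the daytime level of 40 would not fire on the injected value of 30, while the per-slot score does flag it. A real STL decomposition would additionally separate out trend and be more robust to noise, which this sketch does not attempt.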


Transience LightGBM Binary Classifier

The audience will learn how to build a binary classifier to predict whether new security assets will be transient/ephemeral using a Databricks notebook. Using PySpark, the notebook ingests curated data from Azure Data Lake and then builds features to censure the data model. ML pipelines are used to vectorize and one-hot encode 10 categorical variables, then over 40 additional numeric variables are added to create a final data model. The data model is randomly sampled into 70/30 training and test data sets, then LightGBM is used to create a classifier with the training data. The training and test data are evaluated with area under the precision/recall curve, and model stats are preserved using MLflow. Final predictions on new data are written back to Azure Data Lake.

Daniel Tetrick, Data Scientist, Microsoft CDG Security


Closing Remarks

Roberto Rodriguez @Cyb3rWard0g, Threat Researcher, Microsoft MSTIC