Agenda (Eastern Timezone)

Time

Session

10:30

Opening Remarks

Roberto Rodriguez @Cyb3rWard0g, Principal Threat Researcher, Microsoft

10:45

Keynote - The MSTICPy Journey - the road to creating functionality to support notebooks in Infosec

The talk charts the journey from raw code in notebooks to a PyPI package: why we did this, how we decided on features to implement, the challenges of a continually growing package. We’ll finish by looking at current priorities - primarily the goal to refactor functionality to make it easier to find and use.

Ian Hellen @ianhellen, Principal Developer and Security Engineer, Microsoft

11:25

Break

11:35

Your first graph neural network: Detecting suspicious logins with link prediction

Graph neural networks are Science’s official breakthrough of the year, and we can now use them to automatically answer basic behavior questions like which user login events are suspicious. Especially exciting, new graph autoML libraries let us use these techniques with just a bit of knowledge of Python and Pandas. We will use winlogs data as an example to walk through, and describe how to use the same idea on other user logs:
- What a graph neural network is and what problems they are good for
- How to automatically transform logs to graphs with hypergraphs
- Automatically visualizing the event graph, and as a bonus, as UMAP embeddings auto-tuned to whatever features we are examining
- Preparing the graph for AI using automatic feature engineering… in one line of code
- Training a GNN link prediction model on the graph using a prebuilt popular architecture
- Scoring historic and new log events based on the model
- Using the scores to visually understand typical behavior and visually audit surprises
- How the same GNN process can be applied to other common cybersecurity and anti-fraud scenarios like account takeovers and suspicious transactions
The notebook and workflow will use free OSS tools (Pandas, DGL/PyG, PyGraphistry[AI]) on free data. For participants with access to GPU notebooks, the workflow is accelerated on GPUs, and CPU-only will work too.

Leo Meyerovich @lmeyerov, CEO & Founder, Graphistry

12:15

Exploratory Threat Analytics using Jupyter Notebooks

Using Jupyter notebooks to conduct data analysis and tie different actions and tools together. This talk aims to demonstrate how notebooks can be used as a canvas to paint your analytics and research over data and complete the lifecycle of alert investigation or threat analytics. Objectives:
1. Calling alerts/data from a SIEM (Elasticsearch)
2. Utilizing data analysis frameworks and libraries like PySpark/Pandas
3. Utilizing APIs of different tools (EDR/case management), all from a single place
4. Calling VirusTotal/OSINT APIs for reputation and other data enrichment functions
5. Exploratory analysis and visualization capabilities using frameworks like Plotly
6. Creating a case for escalated incident-response action
The end result shows how this cycle produces an investigation report, an exploratory hunt notebook, and a repeatable notebook as a reusable component.

Saksham Tushar, Threat Detection Engineer, CRED India

12:55

Break

13:30

Magic Tricks Revealed: Demystifying IPython Magics

Jupyter notebooks support a feature called “magic commands” (aka “IPython magics”) that use a special syntax (%) for calling utility functions. This talk explains how magics work and how to write custom magics that make security analysis and research tasks more efficient. The live demo is a proof-of-concept magic command that integrates the Azure CLI with pandas for convenient evidence collection and analysis in Jupyter notebooks. The PoC code will be open-sourced under the MIT license prior to the talk.

Ryan Marcotte Cobb @detectdotdev, Principal Security Researcher, SecureWorks

14:10

Establish attacker footprint with Microsoft Defender Threat Intelligence

Microsoft Defender Threat Intelligence, formerly RiskIQ, is a threat intelligence engine that collects masses of data from the internet and crawls the internet to derive additional data, creating a dense, interconnected web of internet observations that can be used as signals in the art of threat intelligence. This gives the ability to see a lot of how the attacker operates. Investigations on this data are incredibly powerful, allowing the discovery of infrastructure connections that could not have been seen or considered before. This visibility makes it harder for the attacker to go unidentified, theoretically making the playing field of the attacker bigger and the goal smaller! So how do you limit this scope and classify what belongs to an attacker or not, especially when the attacker is using the same infrastructure that a normal company may use to support its needs? How do we make this large, interconnected web of threat intelligence actionable to an analyst? In this talk, we will present a notebook that addresses this challenge: Walking in the footsteps of attackers - Profiling compromised infrastructure using Machine Learning.

Chi Nguyen, Product Manager, Microsoft
Amritpal Singh, Security Data Scientist, Microsoft
James Duncan, Security Cloud Solutions Architect, Microsoft

14:50

Break

15:05

Cybersecurity: data holds the answer

The talk is about our new book “Cybersecurity: data holds the answer”, which presents our latest findings in cybersecurity data science. The book contains a set of eight research projects that seek to guide the reader on how to use artificial intelligence from defensive and offensive perspectives. On the defensive side, the book includes precise experiments to develop models for detecting malware on Android devices, cryptojacking, deepfakes, and malicious botnets. On the offensive side, models for the generation of non-legitimate multimedia content are explored, and the concept of secure learning and the application of adversarial machine learning is introduced as an approach to find the values that allow biasing the results of a machine learning model. Each project was done using Jupyter notebooks and a research methodology that allows the reader to replicate the results. Finally, the code and the results can be found at i2tResearch/Ciberseguridad_web. Book URL: https://www.icesi.edu.co/editorial/ciberseguridad-datos/

Christian Camilo Urcuqui Lopez, Data Scientist

15:30

Extending MSTICPy - building your own queries and pivot functions

Alongside public/common data sources like VirusTotal, MS Sentinel, etc., SOC analysts use their own crafted queries, Python scripts and external packages to support their workflows. This session gives a quick introduction to extending MSTICPy to support this by adding custom queries, and then adding these queries, as well as custom and third-party Python functions, as entity-based Pivot functions, for example: mp.IpAddress.your_function_here()

Ian Hellen @ianhellen, Principal Developer and Security Engineer, Microsoft

15:55

Data Mining Threat Intelligence Reports

There is an ever-increasing set of threat intelligence reports produced by an ever wider number of providers. Keeping up with these reports can be a full-time job; understanding the value they provide, extracting the relevant elements, and applying those to the defence of an organization is an even bigger ask. In this talk we will cover how we can use notebooks to help automate this task for us - from parsing reports, to extracting data of value, to validating it, and operationalizing it.

Pete Bryan @MSSPete, Something about security, Microsoft
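
The "extract" and "validate" steps named in the abstract above can be illustrated with a minimal, standard-library sketch. This is not the speaker's notebook code: the function names (`refang`, `extract_iocs`), the regular expressions, and the sample report text are all assumptions made for illustration, and the domain pattern is deliberately simple (it would, for instance, also match file names such as `payload.exe`).

```python
import ipaddress
import re

# Deliberately simple patterns for a sketch; real reports need more robust handling.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def refang(text: str) -> str:
    """Undo common 'defanging' (hxxp, [.]) that reports use to neutralize links."""
    return text.replace("[.]", ".").replace("hxxp", "http")


def extract_iocs(report_text: str) -> dict:
    """Extract candidate indicators and keep only IPv4 addresses that validate."""
    text = refang(report_text)
    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.ip_address(candidate)  # rejects values such as 999.1.1.1
        except ValueError:
            continue
        ips.add(candidate)
    hashes = {h.lower() for h in SHA256_RE.findall(text)}
    domains = {d.lower() for d in DOMAIN_RE.findall(text)}
    return {"ipv4": sorted(ips), "sha256": sorted(hashes), "domain": sorted(domains)}


# Hypothetical report excerpt (documentation-range IP, made-up domain and hash).
sample_report = (
    "The actor used hxxp://evil-example[.]com/payload from 203.0.113.7 "
    "and 999.1.1.1. Hash: " + "A" * 64
)
iocs = extract_iocs(sample_report)
print(iocs)
```

In a notebook, the resulting dictionary could be loaded into a DataFrame and passed to enrichment or lookup steps; that operationalization step is outside the scope of this sketch.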

16:20

Break

16:35

Analyzing billions of passwords from Breach compilation dataset using Jupyter notebook

Data breaches that dump sensitive information, including passwords, social security numbers, and other business-critical information, are becoming the norm in today's digital world. This stolen information is often sold in underground community forums. Buyers can further abuse it in several ways, such as buying goods with stolen credit card information or holding social media account data for ransom. In this presentation, we will take a look at an old database dump of 1.4 billion clear-text credentials, a compilation of several data breaches including sites such as yahoo.com and LinkedIn. The complete database is a 41 GB archive of around 1980 files. We will look at how this dataset can be loaded for scalable data analysis using Python and perform common data cleaning and preparation steps. We will then analyze the dataset to extract interesting insights and password trends. We will also visualize the summarized dataset. The goal of this presentation is to demonstrate how one can take a large real-world dataset, then clean, enrich, and analyze it at scale to extract interesting insights using a Jupyter notebook.

Ashwin Patil @ashwinpatil, Senior Program Manager, Microsoft
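
As a rough illustration of the cleaning and summarizing steps described in the abstract above, the sketch below streams credential lines one at a time instead of loading a whole file into memory. It is not the speaker's code. The `account:password` line format, the function names, and the sample data are assumptions; the real dump may use other separators and layouts.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_credentials(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (account, password) pairs, skipping malformed lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Split on the first ':' only, because passwords may contain ':' themselves.
        account, sep, password = line.partition(":")
        if sep and account and password:
            yield account, password


def summarize(lines: Iterable[str], top_n: int = 3):
    """Return the most common passwords and a histogram of password lengths."""
    password_counts = Counter()
    length_counts = Counter()
    for _, password in parse_credentials(lines):
        password_counts[password] += 1
        length_counts[len(password)] += 1
    return password_counts.most_common(top_n), dict(length_counts)


# Made-up sample data standing in for one file of the dump.
sample = io.StringIO(
    "alice@example.com:123456\n"
    "bob@example.com:password\n"
    "carol@example.com:123456\n"
    "broken line without separator\n"
    "dave@example.com:pa:ss\n"
)
top_passwords, length_histogram = summarize(sample)
print(top_passwords[0], length_histogram)
```

Any open file handle can be passed in place of the `io.StringIO` sample. At the scale described in the abstract, an in-memory `Counter` of every distinct password would likely be too large, so a real analysis would need a distributed or out-of-core framework; this sketch only shows the shape of the parse, clean, and aggregate steps.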

17:00

Fun with securitydatasets.com and the Kestrel PowerShell Deobfuscator

The Kestrel Threat Hunting Language provides a composable way for conceiving and sharing threat hunts. We will walk the audience through the procedure to create huntbooks in a public Kestrel cloud sandbox (binder environment) and conduct threat hunting on data from securitydatasets.com. In this session, we will first introduce the basic Kestrel concepts and how to use the Kestrel cloud sandbox. Then we will start a Jupyter notebook using the Kestrel language kernel and exercise the Kestrel PowerShell Deobfuscator using adversarial emulation logs from securitydatasets.com. We will hunt step-by-step and explain the analytics used in the hunt. Next, we will (maybe another huntbook?). Lastly, we will show how to integrate Kestrel into existing workflows by using its Python API in a Python notebook.
References:
- Kestrel home repo: opencybersecurityalliance/kestrel-lang
- Kestrel huntbooks: opencybersecurityalliance/kestrel-huntbook
- OCA blog on Kestrel cloud sandbox: https://opencybersecurityalliance.org/try-kestrel-in-a-cloud-sandbox/
- OCA blog on PowerShell Deobfuscator: https://opencybersecurityalliance.org/fun-with-securitydatasets-com-and-the-kestrel-powershell-deobfuscator/

Paul Coccoli, Senior Software Engineer, IBM

17:35

Closing Remarks

Roberto Rodriguez @Cyb3rWard0g, Principal Threat Researcher, Microsoft

Time

Session

10:30

Opening Remarks

Roberto Rodriguez @Cyb3rWard0g, Principal Threat Researcher, Microsoft

10:45

Keynote - Sharing is Caring: Sharing Threat Intelligence Notebook Edition

You can derive value from threat intelligence in a number of ways, including a complete threat intelligence platform, ingesting threat feeds, or simply leveraging threat intelligence capabilities found in popular security tools. One less common way to leverage threat intelligence is by sharing practical information with others. Sharing the methods of analysis and derived conclusions is of value for the infosec community as it demonstrates a practical and reproducible procedure. Notebooks are all about sharing tools and workflows; they provide a perfect way to exchange knowledge and to improve the capabilities of your team. Share your hunting and defense techniques - the more you share, the harder it is for the bad guys.

Thomas Roccia @fr0gger_, Senior Security Researcher, Microsoft

11:25

Break

11:35

Becoming a Cyber Data Analyst: My First Machine Learning Model

When developing detection analytics, we usually follow a reactive approach after the public disclosure of a vulnerability related to technology used by our organization. We understand the adversary behavior and the impact on the resources, and we finally run queries against our telemetry database to validate whether an adversary has abused a vulnerability in our systems. But how can we complement this approach? How can we move to more proactive detection analytics? One way to do this is by adding data science knowledge to our data analysis tool kit, especially predictive algorithms. In fact, the use of predictive analytics can help us add more security insights and support our decisions when tagging events as malicious or not malicious. The next question would be, what do we need to learn about predictive analytics? The National Initiative for Cybersecurity Education (NICE) workforce framework provides us with a good reference of the data science knowledge, skills, and abilities that we need in order to analyze data and provide security and privacy insights to our organization, such as developing data models, data encoding, performing regression and classification analysis, and identifying hidden patterns or relationships. In this presentation, we will use a sample dataset to guide you through the thought process of developing and evaluating predictive models to identify malicious network behavior following the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology. We will use R and Jupyter Notebooks to tell the story behind the whole process and understand how the data collected from our networks will define the activities that we need to perform and the proper predictive model to use. Some of the models we will discuss are Logistic Regression, Support Vector Machines, and Neural Networks. Don’t be afraid of the math required for these techniques; R will help us with that.
We will focus more on the advantages and disadvantages of each model and what metrics we can use to pick the best one.

Jose Rodriguez @Cyb3rPandah, Researcher, MITRE ATT&CK

12:15

Will it embed?: A UMAP-based introduction to making and visualizing embeddings for event and entity data

The lingua franca for working with modern AI is embeddings, which are low-dimensional representations of data that are used to make many AI tasks fast, easy, and composable. Especially effective in problems like cybersecurity and fraud log data, UMAP (Uniform Manifold Approximation and Projection) tools can automatically generate high-quality embeddings. This session looks at the problem of alert fatigue and shows how to treat alert tasks like grouping, visualization, outlier detection, and prioritization as embedding problems. While this may sound complicated, we will show that, with modern tools like Pandas and the PyData ecosystem, working with embeddings can be quite simple yet powerful. We will walk through:
- What an embedding is and where UMAP fits in the historical timeline
- How to quickly run UMAP on any event or entity data, including examples from our own investigations in areas like anti-fraud, human trafficking, alerts, and tickets
- Automatically visualizing UMAP - and going well beyond the typical scatterplot!
- Quickly improving embeddings through automatic feature engineering and supervision
- Some of our favorite parameters
- Using the embedding for tasks like visually summarizing the day’s incidents, bucketing alerts into prioritized incident groups, automatically finding related incidents, and detecting outliers
- Emerging patterns for how to go from notebook experiments to embedding into team tools & pipelines
The notebook and workflow will use free OSS tools (Pandas, umap_learn/cuML, PyGraphistry[AI]) on free data. For participants with access to GPU notebooks, the workflow can also run on GPUs, and CPU-only will work too.

Alex Morrise @silkspaceships, Head of Graph Data Science, Graphistry

12:55

Break

13:30

Take your analysis to the next level with interactive dashboarding libraries easily

Interactive dashboarding in Jupyter Notebook is a relatively new area. It makes data analysis interactive and easy even for people without the skills. Dashboarding libraries used to be relatively low-level and probably difficult for infosec people. However, with the recent developments in the area, there are new libraries that make dashboarding quite easy. In this talk, I will showcase hvPlot and Panel libraries and cover some potential use cases in security data analysis, threat hunting, and DFIR.

Mehmet Ergene @Cyb3rMonk, Security Researcher & Data Scientist, Binalyze

14:10

Shell Language Processing: Modelling Linux Commands with Machine Learning

In this talk, we will discuss techniques that can be used to efficiently model Linux auditd telemetry with machine learning techniques. We will investigate how security specialists can use execve process creation events with machine learning modeling to perform both (a) supervised malicious activity detection and (b) unsupervised baseline definition with consequent anomaly detection. We will evaluate different preprocessing methods like TF-IDF and hashing trick, as well as various machine learning models like gradient-boosted decision trees and isolation forest. Additionally, we will evaluate the adversarial robustness of such machine learning solutions for detection heuristics and consider how attackers may bypass these techniques.

Dmitrijs Trizna @ditrizna, Senior Software Engineer, Microsoft
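
The two preprocessing methods named in the abstract above, the hashing trick and TF-IDF, can be sketched with the standard library alone. This is an illustrative sketch rather than the speaker's pipeline: the tokenizer regex, the bucket count `N_BUCKETS`, the choice of SHA-256 as the hash, the unsmoothed `tf * log(N / df)` weighting, and the sample commands are all assumptions.

```python
import hashlib
import math
import re
from collections import Counter

N_BUCKETS = 32  # hypothetical size, kept tiny for illustration


def tokenize(command: str) -> list:
    """Split a command line into lowercase tokens (flags and paths stay whole)."""
    return re.findall(r"[A-Za-z0-9_./-]+", command.lower())


def bucket(token: str, n_buckets: int = N_BUCKETS) -> int:
    """Map a token to a fixed bucket index with a stable hash."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets


def hashed_counts(command: str, n_buckets: int = N_BUCKETS) -> list:
    """Hashing trick: a fixed-length count vector that needs no vocabulary."""
    vec = [0] * n_buckets
    for tok in tokenize(command):
        vec[bucket(tok, n_buckets)] += 1
    return vec


def tfidf(commands: list) -> list:
    """Per-command token weights using tf * log(N / df), without smoothing."""
    docs = [Counter(tokenize(c)) for c in commands]
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())  # count each token once per command
    weights = []
    for doc in docs:
        total = sum(doc.values())
        weights.append({
            tok: (cnt / total) * math.log(n_docs / doc_freq[tok])
            for tok, cnt in doc.items()
        })
    return weights


# Made-up commands standing in for execve telemetry.
commands = ["ls -la /tmp", "ls /home", "curl http://example.test/x.sh"]
vector = hashed_counts(commands[0])
weights = tfidf(commands)
print(vector, weights[0])
```

The hashed vector has a fixed length regardless of how many distinct tokens appear, at the cost of possible collisions, while TF-IDF gives rarer tokens (here `-la`) more weight than tokens shared across commands (here `ls`). Either representation could then be fed to the supervised or unsupervised models the abstract mentions.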

15:00

Break

15:20

Visualization

Ian Hellen @ianhellen, Principal Developer and Security Engineer, Microsoft

15:55

Windows Memory Triage

Carrying out forensic analysis of memory images is a challenging task, especially when you need built-in enrichment. A Jupyter notebook named “memOptix” (blueteam0ps/memOptix) was created so that triage can be carried out in a consistent manner. It also harnesses the power of msticpy for data enrichment and visualisation to aid analysts.

Janantha Marasinghe @blueteam0ps_, Senior Incident Response Specialist, Deloitte

16:20

Break

16:35

CVEData - Data Scientist Pretender Dives into CVEs for Infosec Insight

Cvedata is a Python package caught somewhere between a data collection tool and a CVE data API. It is much more the former than the latter. https://clearbluejar.github.io/cvedata/README.html
The quick talk will cover how cvedata leverages Jupyter notebooks and books to:
- Automate package documentation
- Auto create source data Jupyter notebooks and display results within Javascript Datatables
- Explore CVE public data
- Analyze MSRC CVEs, updates, and file info
- Rank security researchers by CVEs

John Mac @clearbluejar, Data Scientist Pretender / Infosec Enthusiast

17:00

Speakers Panel

17:35

Closing Remarks

Roberto Rodriguez @Cyb3rWard0g, Principal Threat Researcher, Microsoft