
Event-Driven Network Automation with Nokia SR OS and Ansible EDA

Event-driven automation is redefining how networks respond to the unexpected. Here is what EDA is and why it is a game-changer.


The Problem with Polling

Every network operations team knows the scenario: a ticket arrives at 3 a.m. reporting that a BGP session has been down for 45 minutes. The monitoring system, polling every five minutes, only flagged the failure on its third check and then took another 10 minutes to alert someone. By the time an engineer acted on it, the network had been degraded for nearly an hour.

This is the core weakness of scheduled, polling-based automation. State is checked at fixed intervals, so anything that fails between checks stays broken, and unnoticed, until the next cycle catches it.

Event-driven automation completely transforms this approach. Rather than periodically asking the infrastructure "are you OK?", the infrastructure immediately notifies you of any change, triggering an instant response.

"Instead of asking the network if it's OK every five minutes, the network tells you the moment something changes, and a response fires within seconds."

Ansible Event-Driven Automation (EDA) is Red Hat's answer to this challenge, built directly on top of the Ansible ecosystem most network teams already know. It is not the first tool in this space, though: StackStorm has been doing event-driven automation since 2014.

Ansible EDA Explained

Ansible EDA was launched in 2022 and became generally available with Ansible Automation Platform 2.4 in 2023. It introduces a dedicated event-processing layer to the existing Ansible ecosystem, capable of simultaneously listening to numerous event sources and responding to them in real time.

The three core concepts are simple:

Sources — where events come from: webhooks, Kafka topics, syslog streams, cloud alerts, Git commits, SNMP traps, monitoring system callbacks.

Rules — conditions you define in YAML. If the event payload matches, an action fires. Rules can include throttling, grouping, and logical operators.

Actions — what happens: run a playbook, post to Slack, open a ticket, set a variable, or simply log the event.

These are tied together in a Rulebook — a YAML file that looks like this:

---
- name: React to BGP session down
  hosts: network_devices
  sources:
    - ansible.eda.syslog:
        host: 0.0.0.0
        port: 5514

  rules:
    - name: BGP neighbor down
      condition: >
        event.message is search("bgpSessionDown", ignorecase=true)
      throttle:
        once_within: 60 seconds
        group_by_attributes:
          - event.host
      actions:
        - run_playbook:
            name: playbooks/bgp_remediate.yml
            extra_vars:
              source_host: "{{ event.host }}"

That is the complete loop: source, condition, action. If you already know Ansible, you already know most of EDA; the rulebook is the only genuinely new concept, and it reads almost like plain English.
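To make the source, condition, action loop concrete, here is a toy Python model of how a ruleset hands an event to its rules. It is an illustration only, not ansible-rulebook's implementation (which evaluates conditions in the Drools rules engine); the helpers `search` and `dispatch` are hypothetical. It assumes ansible-rulebook's default of giving each event to the first matching rule only (the ruleset option `match_multiple_rules` defaults to false), which is why a catch-all rule placed last only sees events no earlier rule claimed.

```python
import re
from typing import Callable, Optional

# A rule is (name, condition, action); conditions and actions take the event dict.
Rule = tuple[str, Callable[[dict], bool], Callable[[dict], str]]


def search(pattern: str, ignorecase: bool = False) -> Callable[[dict], bool]:
    """Rough analogue of `event.message is search(pattern, ignorecase=true)`:
    an unanchored regex search anywhere in event["message"]."""
    compiled = re.compile(pattern, re.IGNORECASE if ignorecase else 0)
    return lambda event: bool(compiled.search(event.get("message", "")))


def dispatch(event: dict, rules: list[Rule]) -> Optional[str]:
    """Give the event to the first rule whose condition matches (modelling
    match_multiple_rules: false) and return that rule's action result."""
    for _name, condition, action in rules:
        if condition(event):
            return action(event)
    return None  # no rule matched, so nothing fires


rules: list[Rule] = [
    ("BGP neighbor down", search("bgpSessionDown", ignorecase=True),
     lambda e: f"run_playbook playbooks/bgp_remediate.yml for {e['host']}"),
    ("Catch all", lambda e: "message" in e,
     lambda e: f"debug: {e['message']}"),
]

print(dispatch({"host": "192.0.2.10", "message": "BGPSESSIONDOWN peer 10.10.10.2"}, rules))
# → run_playbook playbooks/bgp_remediate.yml for 192.0.2.10
print(dispatch({"host": "192.0.2.10", "message": "link up"}, rules))
# → debug: link up
```

Because `search` is unanchored and case-insensitive here, the keyword can sit anywhere in the syslog line, which suits free-form device messages.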

What EDA is not

EDA is not a monitoring system; it doesn't generate events but responds to them. You still require a source, such as a syslog stream, a gNMI telemetry collector feeding Kafka, a Prometheus Alertmanager webhook, or a custom script. EDA serves as the reaction layer, not the observability layer. It is also not (yet) a fully stateful workflow engine. For long-running, multi-step orchestration with complex branching logic, Ansible Automation Platform's Workflows feature or a dedicated orchestrator is more suitable. EDA is optimized for quick, reactive, event-triggered automation.

Why EDA Matters Now

  1. Networks Are Too Dynamic for Polling - Modern network environments, such as SD-WAN fabrics, containerized services, and cloud-native workloads, change state much faster than traditional infrastructure. A BGP session can flap and recover in less than 30 seconds. An interface might suffer a transient optical issue and recover on its own. Polling-based tools capture the aftermath, not the event itself. EDA captures both.

  2. Operator Time Is the Bottleneck - The network operations talent gap is real and growing. Every alert that requires a human to log in, assess the situation, and manually run a CLI command consumes scarce capacity. Closed-loop automation, where EDA detects an event, fires a playbook, and resolves the issue without paging anyone, frees operators to work on architecture, capacity, and projects instead of reactive fire-fighting.

  3. The Ansible Ecosystem Is Already There - This is EDA's single biggest advantage over purpose-built event-driven tools. If your team already has Ansible playbooks for network configuration, NETCONF-based remediation, or cloud provisioning, those playbooks work unchanged in EDA. There is no rewrite, no new DSL to learn, no separate execution engine to operate. The activation cost is dramatically lower than adopting an entirely new platform.

  4. Kafka and gNMI Are Now Mainstream - Event-driven automation requires event streams. A couple of years ago, getting gNMI telemetry out of a network device and into an automation system was a niche skill. Today, gNMIc, Telegraf, and vendor-native streaming telemetry are standard practice, and Kafka is ubiquitous. The infrastructure for event-driven automation is mature; EDA's role is to act on those streams, and it does so natively.

  5. The Throttle and Group Model Solves the Alert Storm Problem - Anyone who has tried to build event-driven network automation before has hit the alert storm problem: one event generates hundreds of syslog messages, triggers hundreds of playbook runs simultaneously, and overwhelms both the automation system and the network device it is trying to fix. EDA's built-in throttle block — once_within: 60 seconds, group_by_attributes: [event.host] — handles this correctly at the rulebook level, without requiring custom deduplication code.
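The `once_within` plus `group_by_attributes` behaviour is easy to model. The class below is a hypothetical illustration of the semantics described above (the first event per group fires, the rest are dropped until the window expires); it is not EDA's own code, and the class name and the injectable clock are assumptions made so the behaviour can be tested deterministically. The key `"host"` stands in for `event.host`.

```python
import time
from typing import Callable, Iterable


class OnceWithinThrottle:
    """Model of `throttle: once_within` + `group_by_attributes`: the first
    matching event of each group fires; later events for the same group are
    suppressed until `window_seconds` have passed since that group last fired."""

    def __init__(self, window_seconds: float, group_by: Iterable[str],
                 clock: Callable[[], float] = time.monotonic):
        self.window = window_seconds
        self.group_by = tuple(group_by)
        self.clock = clock  # injectable so tests can control time
        self._last_fired: dict[tuple, float] = {}

    def should_fire(self, event: dict) -> bool:
        key = tuple(event.get(attr) for attr in self.group_by)
        now = self.clock()
        last = self._last_fired.get(key)
        if last is not None and now - last < self.window:
            return False  # still inside this group's window: suppress
        self._last_fired[key] = now
        return True


# Simulate an alert storm: 100 events from sros1 and one from sros2 at t=0.
fake_now = [0.0]
throttle = OnceWithinThrottle(60, ["host"], clock=lambda: fake_now[0])
storm = [{"host": "sros1"}] * 100 + [{"host": "sros2"}]
fired = [e["host"] for e in storm if throttle.should_fire(e)]
print(fired)  # → ['sros1', 'sros2']

fake_now[0] = 61.0  # the window for sros1 has expired
print(throttle.should_fire({"host": "sros1"}))  # → True
```

Grouping is what keeps one noisy router from silencing remediation for every other router: each host gets its own window.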

EDA vs Alternatives

Ansible EDA is not the only event-driven automation tool available. StackStorm and Rundeck (whose commercial edition is now PagerDuty Process Automation) have both been in production environments for years, and each solves a slightly different problem. Here is an honest look at where each one excels and where it falls short.

Head-to-Head Comparison

| Feature                      | Ansible EDA | StackStorm | Rundeck |
|------------------------------|-------------|------------|---------|
| YAML-native config           | Strong      | Partial    | Partial |
| Reuse existing automation    | Strong      | Absent     | Partial |
| Kafka / streaming source     | Strong      | Strong     | Absent  |
| Built-in syslog source       | Strong      | Partial    | Absent  |
| gNMI / telemetry native      | Partial     | Partial    | Absent  |
| NETCONF / nokia.sros support | Strong      | Partial    | Absent  |
| Low learning curve           | Strong      | Partial    | Partial |
| Production maturity          | Partial     | Strong     | Strong  |

Choose Ansible EDA

• If your team already uses Ansible playbooks for any automation tasks

• If you need event-driven reactions to network syslog, gNMI, Kafka, or webhook events

• If time-to-value matters: you want something running in hours, not weeks

• If you are working with Nokia SR OS, Cisco, Arista, or Juniper via NETCONF

Ansible EDA meets Nokia SR OS

Overview

This section walks step by step through setting up Ansible Event-Driven Automation (EDA) with Nokia SR OS nodes running in Containerlab. By the end, you will have:

  • A Python virtual environment with Ansible EDA and the nokia.sros collection

  • SR OS nodes configured to stream syslog to EDA

  • An EDA rulebook reacting to BGP session changes

  • NETCONF-based playbooks that automatically remediate those events on the SR OS nodes

Directory Structure

Create the working directory and subdirectories first:

mkdir -p sros-eda/{rulebooks,playbooks,inventory,logs}
cd sros-eda

Step 1 — Python Virtual Environment

Create an isolated environment so EDA dependencies do not conflict with system packages:

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

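Install Ansible EDA and its dependencies. ansible-rulebook runs its rules engine (Drools) on the JVM, so a Java 17 or newer JDK must also be installed, with JAVA_HOME pointing at it: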

pip install ansible ansible-rulebook ansible-runner aiohttp ncclient

Install the required Ansible collections:

ansible-galaxy collection install ansible.eda
ansible-galaxy collection install nokia.sros
ansible-galaxy collection install ansible.netcommon

Verify everything installed correctly:

ansible-rulebook --version
ansible-galaxy collection list | grep -E "eda|sros|netcommon"

Step 2 — SR OS Node Configuration

First, find the IP address of the machine running EDA; this is where the SR OS nodes will send syslog:

ip route get 8.8.8.8 | awk '{print $7; exit}'

Apply the following on each SR OS node from an MD-CLI candidate configuration session (for example, after edit-config private), replacing <HOST_IP>, then commit:

/configure log syslog "eda" address <HOST_IP>
/configure log syslog "eda" severity warning
/configure log syslog "eda" port 5514
/configure log log-id "20" admin-state enable
/configure log log-id "20" description "EDA event stream"
/configure log log-id "20" source main true
/configure log log-id "20" destination syslog "eda"
commit

Verify syslog is functioning

Start a listener on your host:

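python3 -c "
import socket
# Bind the same UDP port the SR OS nodes send to (5514)
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(('0.0.0.0', 5514))
print('Listening on UDP 5514...')
while True:
    data, addr = s.recvfrom(4096)  # SR OS lines can exceed 1024 bytes
    msg = data.decode('utf-8', 'replace')  # do not crash on non-UTF-8 bytes
    print(f'{addr[0]}: {msg}')
"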

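If the configuration is working, the listener prints events from the SR OS node, similar to the following. Stop it with Ctrl+C before starting EDA in Step 6, because both bind UDP port 5514: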

Listening on UDP 5514...
172.20.20.13: <186>May 12 23:07:05 172.31.255.30 TMNX: 766 Base LOGGER-MAJOR-tmnxLogFileRollover-2008 [acct-log-id 9 file-id 19]:  Log file cf3:\act\act0919-20260512-223705.xml.gz on compact flash cf3 has been rolled over

172.20.20.13: <187>May 12 23:08:08 172.31.255.30 TMNX: 767 Base LOGGER-MINOR-tmnxLogFileDeleted-2009 [acct-log-id 9 file-id 19]:  Log file cf3:\act\act0919-20260512-200705.xml.gz on compact flash cf3 has been deleted

172.20.20.13: <187>May 12 23:08:08 172.31.255.30 TMNX: 768 Base LOGGER-MINOR-tmnxLogFileDeleted-2009 [acct-log-id 9 file-id 19]:  Log file cf3:\act\act0919-20260512-203705.xml.gz on compact flash cf3 has been deleted
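To route on SR OS event names rather than free-text keywords, it helps to see how these lines break down. The PRI value in angle brackets encodes facility * 8 + severity, so <186> is facility 23 (local7) with syslog severity 2 (crit), and <187> is local7 with severity 3 (err). The parser below is a sketch whose regular expression is inferred only from the three sample lines above; `parse_sros_syslog` and its field names are hypothetical, and other SR OS event formats may need adjustments. Note that the listener prefixes the sender address; the raw datagram itself starts at <PRI>.

```python
import re
from typing import Optional

# Syslog severity names indexed by (PRI % 8), as defined in RFC 5424.
SEVERITY_NAMES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# Layout inferred from the sample lines:
# <PRI>Mon DD HH:MM:SS <hostname> TMNX: <seq> <router> <APP>-<SEVERITY>-<event>-<id> [subject]:  text
SROS_SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) TMNX: "
    r"(?P<seq>\d+) (?P<router>\S+) "
    r"(?P<app>[A-Z0-9_]+)-(?P<severity>[A-Z]+)-(?P<event>\w+)-(?P<event_id>\d+)"
    r"(?: \[(?P<subject>[^\]]*)\])?:\s*(?P<text>.*)$"
)


def parse_sros_syslog(line: str) -> Optional[dict]:
    """Split one raw SR OS syslog datagram into fields; None if it doesn't fit."""
    m = SROS_SYSLOG_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    pri = int(fields.pop("pri"))
    fields["facility"] = pri // 8  # e.g. 23 = local7
    fields["syslog_severity"] = SEVERITY_NAMES[pri % 8]
    fields["seq"] = int(fields["seq"])
    fields["event_id"] = int(fields["event_id"])
    return fields


sample = (r"<186>May 12 23:07:05 172.31.255.30 TMNX: 766 Base "
          r"LOGGER-MAJOR-tmnxLogFileRollover-2008 [acct-log-id 9 file-id 19]:  "
          r"Log file cf3:\act\act0919-20260512-223705.xml.gz on compact flash cf3 has been rolled over")
event = parse_sros_syslog(sample)
print(event["app"], event["severity"], event["event"], event["facility"], event["syslog_severity"])
# → LOGGER MAJOR tmnxLogFileRollover 23 crit
```

A rule keyed on the parsed `event` name (for example a BGP notification name) is far less likely to fire on unrelated text than a substring match on the whole line.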

Step 3 — Ansible Inventory

Create inventory/hosts.yml. It connects to SR OS via NETCONF using the nokia.sros.md network OS:

---
all:
  vars:
    ansible_user: <user-name>
    ansible_password: <password>
    ansible_connection: ansible.netcommon.netconf
    ansible_network_os: nokia.sros.md
    ansible_netconf_port: 830
    ansible_netconf_username: <user-name>
    ansible_netconf_password: <password>

  children:
    sros_nodes:
      hosts:

        sros1:
          ansible_host: 172.20.20.13
          ansible_netconf_host: 172.20.20.13
          bgp_neighbor: "10.10.10.2"
          bgp_peer_as: 65000

        sros2:
          ansible_host: 172.20.20.14
          ansible_netconf_host: 172.20.20.14
          bgp_neighbor: "10.10.10.1"
          bgp_peer_as: 65000

Step 4 — EDA Rulebooks

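Rulebook 1 — BGP Events

Save the following as rulebooks/01_bgp_and_interface.yml, the file that Step 6 passes to ansible-rulebook:
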
---
- name: "SR OS EDA: BGP and Interface Event Handler"
  hosts: sros_nodes
  sources:
    - ansible.eda.syslog:
        host: 0.0.0.0
        port: 5514

  rules:

    - name: BGP session down
      condition: >
        event.message is search(
          "bgpBackwardTransNotification|bgpSessionDown|bgpPeerNotFound|ADMIN_SHUT|BGP-WARNING",
          ignorecase=true)
      throttle:
        once_within: 60 seconds
        group_by_attributes:
          - event.host
      actions:
        - run_playbook:
            name: playbooks/bgp_session_down.yml
            extra_vars:
              source_host: "{{ event.host }}"
              syslog_message: "{{ event.message }}"

    - name: BGP session up
      condition: >
        event.message is search(
          "bgpEstablishedNotification|ESTABLISHED",
          ignorecase=true)
      actions:
        - debug:
            msg: "BGP UP on {{ event.host }} — no action required"

    - name: Catch all — print every event received
      condition: event.message is defined
      actions:
        - debug:
            msg: "EVENT from {{ event.host }} | MSG: {{ event.message }}"
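If ansible-rulebook reports that it cannot find the ansible.eda.syslog source in your installed collection version, you can supply a small one yourself. Below is a sketch of a UDP syslog source that follows ansible-rulebook's source-plugin convention of an async `main(queue, args)` coroutine that puts event dicts on the queue. The file name syslog.py, the `host_map` argument and the `source_ip` key are assumptions of this sketch, not part of any published collection. `host_map` addresses a subtle point: the remediation playbook uses `hosts: "{{ source_host }}"`, and a host pattern matches inventory names such as sros1, not the sender IP, so translating the address at the source keeps that pattern working.

```python
"""syslog.py: a hypothetical minimal UDP syslog event source for ansible-rulebook.

ansible-rulebook runs a source plugin's `main(queue, args)` coroutine; `args`
holds the keys written under the source in the rulebook (here: host, port
and an optional host_map of sender IP -> inventory hostname).
"""
import asyncio
from typing import Any


class _SyslogProtocol(asyncio.DatagramProtocol):
    def __init__(self, inbox: asyncio.Queue, host_map: dict):
        self.inbox = inbox
        self.host_map = host_map

    def datagram_received(self, data: bytes, addr) -> None:
        sender_ip = addr[0]
        self.inbox.put_nowait({
            # Prefer the inventory name so `hosts: "{{ source_host }}"` can match it.
            "host": self.host_map.get(sender_ip, sender_ip),
            "source_ip": sender_ip,
            "message": data.decode("utf-8", errors="replace").strip(),
        })


async def main(queue: asyncio.Queue, args: dict[str, Any]) -> None:
    host = args.get("host", "0.0.0.0")
    port = int(args.get("port", 5514))
    host_map = args.get("host_map") or {}
    # Datagram callbacks are synchronous, so buffer in an unbounded local queue
    # and forward with `await queue.put(...)`, which respects back-pressure.
    inbox: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()
    transport, _ = await loop.create_datagram_endpoint(
        lambda: _SyslogProtocol(inbox, host_map), local_addr=(host, port)
    )
    try:
        while True:
            await queue.put(await inbox.get())
    finally:
        transport.close()  # runs when ansible-rulebook cancels the source task
```

With syslog.py in a local directory passed to ansible-rulebook via --source-dir (-S), the rulebook would name the source `syslog:` instead of `ansible.eda.syslog:` and could add a `host_map` such as 172.20.20.13: sros1 and 172.20.20.14: sros2 alongside host and port.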

Step 5 — Playbooks

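BGP Session Down

Save the following as playbooks/bgp_session_down.yml, the playbook called by the rulebook's run_playbook action: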

---
- name: BGP Session Down — Remediate via NETCONF
  hosts: "{{ source_host | default('sros_nodes') }}"
  gather_facts: false

  tasks:

    # Reads the configured BGP neighbors (address and admin-state) for reference
    - name: Collect BGP neighbor state via NETCONF
      ansible.netcommon.netconf_get:
        filter: |
          <configure xmlns="urn:nokia.com:sros:ns:yang:sr:conf">
            <router>
              <bgp>
                <neighbor>
                  <ip-address/>
                  <admin-state/>
                </neighbor>
              </bgp>
            </router>
          </configure>
      register: bgp_state

    - name: Get current timestamp
      delegate_to: localhost
      ansible.builtin.command: date '+%Y-%m-%dT%H:%M:%S'
      register: timestamp

    - name: Log BGP down event
      delegate_to: localhost
      ansible.builtin.lineinfile:
        path: logs/bgp_events.log
        line: "[{{ timestamp.stdout }}] BGP DOWN on {{ inventory_hostname }} | {{ syslog_message | default('no message') }}"
        create: true

    # Bounce the neighbor: admin-disable it, pause, then admin-enable it again
    - name: Disable BGP neighbor
      ansible.netcommon.netconf_config:
        lock: never
        content: |
          <config xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
            <configure xmlns="urn:nokia.com:sros:ns:yang:sr:conf">
              <router>
                <router-name>Base</router-name>
                <bgp>
                  <neighbor>
                    <ip-address>{{ bgp_neighbor }}</ip-address>
                    <admin-state>disable</admin-state>
                  </neighbor>
                </bgp>
              </router>
            </configure>
          </config>
      when: bgp_neighbor is defined
      ignore_errors: true
      register: disable_result

    - name: Wait 3 seconds
      ansible.builtin.pause:
        seconds: 3

    - name: Enable BGP neighbor
      ansible.netcommon.netconf_config:
        lock: never
        content: |
          <config xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
            <configure xmlns="urn:nokia.com:sros:ns:yang:sr:conf">
              <router>
                <router-name>Base</router-name>
                <bgp>
                  <neighbor>
                    <ip-address>{{ bgp_neighbor }}</ip-address>
                    <admin-state>enable</admin-state>
                  </neighbor>
                </bgp>
              </router>
            </configure>
          </config>
      when: bgp_neighbor is defined
      ignore_errors: true
      register: enable_result

    - name: Wait for BGP to re-establish
      ansible.builtin.pause:
        seconds: 15

    - name: Log remediation result
      delegate_to: localhost
      ansible.builtin.lineinfile:
        path: logs/bgp_events.log
        line: "[{{ timestamp.stdout }}] BGP REMEDIATION COMPLETE on {{ inventory_hostname }} | neighbor {{ bgp_neighbor | default('unknown') }}"
        create: true
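The two netconf_config tasks differ only in the admin-state value, so hand-edited copies can drift apart. As a way to check the payload's structure (and to get XML escaping for free), here is a standard-library sketch that builds the same `<config>` document. `build_neighbor_admin_state` is a hypothetical helper, not part of the nokia.sros or ansible.netcommon collections.

```python
import xml.etree.ElementTree as ET

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
SROS_CONF_NS = "urn:nokia.com:sros:ns:yang:sr:conf"


def _q(ns: str, tag: str) -> str:
    """Clark-notation qualified name, e.g. {urn:...}bgp."""
    return f"{{{ns}}}{tag}"


def build_neighbor_admin_state(neighbor: str, state: str, router: str = "Base") -> str:
    """Build the <config> payload that sets one BGP neighbor's admin-state."""
    if state not in ("enable", "disable"):
        raise ValueError(f"admin-state must be 'enable' or 'disable', not {state!r}")
    config = ET.Element(_q(NC_NS, "config"))
    configure = ET.SubElement(config, _q(SROS_CONF_NS, "configure"))
    router_el = ET.SubElement(configure, _q(SROS_CONF_NS, "router"))
    ET.SubElement(router_el, _q(SROS_CONF_NS, "router-name")).text = router
    bgp_el = ET.SubElement(router_el, _q(SROS_CONF_NS, "bgp"))
    neighbor_el = ET.SubElement(bgp_el, _q(SROS_CONF_NS, "neighbor"))
    ET.SubElement(neighbor_el, _q(SROS_CONF_NS, "ip-address")).text = neighbor
    ET.SubElement(neighbor_el, _q(SROS_CONF_NS, "admin-state")).text = state
    # ElementTree emits ns0:/ns1: prefixes; that is namespace-equivalent to the
    # default-namespace form written out by hand in the playbook.
    return ET.tostring(config, encoding="unicode")


print(build_neighbor_admin_state("10.10.10.2", "disable"))
```

The returned string is the kind of document ncclient's Manager.edit_config() accepts as its config argument, which can be handy for reproducing a failing task outside Ansible.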

Step 6 — Running EDA

Terminal 1 — Start EDA

cd sros-eda
source venv/bin/activate

ansible-rulebook \
--rulebook rulebooks/01_bgp_and_interface.yml \
--inventory inventory/hosts.yml \
--verbose 

EDA starts and listens on 0.0.0.0:5514. It is now waiting for syslog events from your SR OS nodes.

Output:

(venv) [kleburu@devplayground]$ ansible-rulebook   --rulebook rulebooks/01_bgp_and_interface.yml   --inventory inventory/hosts.yml --verbose
2026-05-13 12:05:43,531 - ansible_rulebook.app - INFO - Starting sources
2026-05-13 12:05:43,531 - ansible_rulebook.app - INFO - Starting rules
2026-05-13 12:05:43,532 - drools.ruleset - INFO - Using jar: /labs/kleburu/python-projects/Ansible_Automation/sros-eda/venv/lib/python3.9/site-packages/drools/jars/drools-ansible-rulebook-integration-runtime-1.0.11-SNAPSHOT.jar
2026-05-13 12:05:44 171 [main] INFO org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation threshold set to 90%
2026-05-13 12:05:44 172 [main] INFO org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory check event count threshold set to 64
2026-05-13 12:05:44 172 [main] INFO org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Exit above memory occupation threshold set to false
2026-05-13 12:05:44 177 [main] INFO org.drools.ansible.rulebook.integration.api.rulesengine.AbstractRulesEvaluator - Start automatic pseudo clock with a tick every 100 milliseconds
2026-05-13 12:05:44,199 - ansible_rulebook.engine - INFO - load source ansible.eda.syslog
2026-05-13 12:05:44,201 - ansible_rulebook.engine - INFO - loading source filter eda.builtin.insert_meta_info
Syslog listener started on 0.0.0.0:5514
2026-05-13 12:05:44,519 - ansible_rulebook.engine - INFO - Waiting for all ruleset tasks to end
2026-05-13 12:05:44,520 - ansible_rulebook.rule_set_runner - INFO - Waiting for actions on events from SR OS EDA: BGP and Interface Event Handler
2026-05-13 12:05:44,520 - ansible_rulebook.rule_set_runner - INFO - Waiting for events, ruleset: SR OS EDA: BGP and Interface Event Handler
2026-05-13 12:05:44 520 [drools-async-evaluator-thread] INFO org.drools.ansible.rulebook.integration.api.io.RuleExecutorChannel - Async channel connected

Terminal 2 — Test BGP event

On the SR OS node:

/configure router bgp neighbor 10.10.10.2 admin-state disable
commit

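EDA picks up the syslog message within a second or two and runs bgp_session_down.yml. Because the playbook finishes by setting the neighbor back to admin-state enable, you do not need to re-enable it by hand after the test; if the run fails, re-enable it with the same command using admin-state enable.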

Output:

PLAY [BGP Session Down — Remediate via NETCONF] ********************************

TASK [Collect BGP neighbor state via NETCONF] **********************************
ok: [sros1]

TASK [Get current timestamp] ***************************************************
changed: [sros1 -> localhost]

TASK [Log BGP down event] ******************************************************
changed: [sros1 -> localhost]

TASK [Disable BGP neighbor] ****************************************************
ok: [sros1]

TASK [Wait 3 seconds] **********************************************************
Pausing for 3 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [sros1]

TASK [Enable BGP neighbor] *****************************************************
changed: [sros1]

TASK [Wait for BGP to re-establish] ********************************************
Pausing for 15 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [sros1]

TASK [Log remediation result] **************************************************
changed: [sros1 -> localhost]

PLAY RECAP *********************************************************************
sros1                      : ok=8    changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
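When iterating on rule conditions, it can be quicker to inject a synthetic event than to bounce a real BGP session. Below is a sketch of a test sender that emits one BSD-style syslog datagram (<PRI>TIMESTAMP HOSTNAME MSG) over UDP. The function name `send_test_syslog`, its defaults and any message you pass are hypothetical; the message only needs to contain a keyword one of the rule conditions searches for.

```python
import socket
from datetime import datetime


def send_test_syslog(message: str, host: str = "127.0.0.1", port: int = 5514,
                     pri: int = 187, hostname: str = "eda-test") -> bytes:
    """Send one BSD-style syslog datagram over UDP and return the bytes sent."""
    now = datetime.now()
    # BSD syslog pads single-digit days with a space, e.g. "May  2".
    timestamp = f"{now:%b} {now.day:2d} {now:%H:%M:%S}"
    payload = f"<{pri}>{timestamp} {hostname} {message}".encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

For example, running send_test_syslog("TEST bgpEstablishedNotification (synthetic)") on the EDA host should trigger only the "BGP session up" debug rule, which confirms the source-to-rule path without running any remediation. Avoid the BGP-down keywords in such tests: the event will carry the test machine's address rather than an SR OS node's, so the remediation playbook is unlikely to target the intended router.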

Quick Reference

| Command | Purpose |
|---------|---------|
| source venv/bin/activate | Activate the virtual environment |
| ansible-rulebook --rulebook rulebooks/01_bgp_and_interface.yml --inventory inventory/hosts.yml --verbose | Start EDA |
| tail -f logs/bgp_events.log | Watch the BGP event log |
| show log syslog "name" | Verify the syslog configuration on SR OS |
| show system netconf | Verify NETCONF on SR OS |
| show router bgp summary | Check BGP state on SR OS |

The Bottom Line

Event-driven automation is no longer optional. Networks are highly dynamic, operator time is limited, and the delay between an event and the response is too costly to leave to manual, reactive processes.

Ansible EDA earns its place in this landscape not because of its power, but because of its accessibility. For the vast majority of network automation teams that already run Ansible, EDA is the shortest path from a polling-based world to a truly event-driven one.

Start with syslog. React to BGP events and interface flaps. Watch a playbook fire in under two seconds after a real network event. Then ask yourself what else you want to connect to it. The answer is usually: everything.

Further Reading

  • Ansible EDA Documentation: ansible.readthedocs.io/projects/rulebook

  • ansible-rulebook on GitHub: github.com/ansible/ansible-rulebook

  • nokia.sros Ansible Collection: galaxy.ansible.com/nokia/sros