
NMS: How To Handle Faults and Alarms

Handling Faults and Alarms
  • Last Update: 2024-10-02
  • Version: 001
  • Language: en

Agenda

  • Access Monitor and Logs
  • Alarms Explanation
  • Examples

This lecture is designed to help you understand how to use SlapOS Master for managing Baseband Units (BBUs) and Remote Units (RUs) via the Network Management System (NMS) integrated with the ors-amarisoft software release.

 

Access Monitor and Logs

Click the monitor-setup-url of the bbu0-health and bbu0-ENB instances.

 


Alarms Explanation

The following entries explain promises that may be in error, with a focus on those related to RUs:

RU*_config_log

Promise source code: check_lopcomm_config_log.py

Common causes: Netconf connection lost; cu_config.xml is improperly configured (e.g., setting out-of-range frequencies).

Solution: Refer to the related RU*-config.log (see "Access ORS log" section) in the private log to debug the causes. If the logs indicate a connection loss, check for CPRI locks and ensure the RU gets an IPv6 address from the BBU. If the edit-config RPC request is unsuccessful, adjust the user input on the panel accordingly. If you need further assistance, contact the Rapid.Space Team.

RU*_cpri_lock

Promise source code: check_cpri_lock.py

Common causes: Hardware issues (e.g., disconnected hardware, unplugged cables, RU powered off); software issues (e.g., failed frame synchronization).

Solution: If "HW Lock is missing", check the physical connection between the RU and BBU. If "SW Lock is missing", which rarely happens, it indicates frame loss. Contact Rapid.Space for assistance.
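
The triage described above can be sketched as a small helper. This is only an illustration: the function name and the returned hints are hypothetical, and the only strings taken from this lecture are the two messages "HW Lock is missing" and "SW Lock is missing".

```python
def cpri_lock_next_step(promise_message: str) -> str:
    """Map a cpri_lock promise message to the next step described in this lecture.

    Hypothetical helper: only the two quoted lock messages come from the lecture.
    """
    text = promise_message.lower()
    if "hw lock is missing" in text:
        # Hardware lock lost: cabling, power, or other physical problem.
        return "check the physical connection between the RU and BBU"
    if "sw lock is missing" in text:
        # Software lock lost: rare, indicates frame loss.
        return "frame loss suspected: contact Rapid.Space"
    return "no CPRI lock message recognized"
```

Checking the hardware message first reflects that a missing hardware lock is the more common case and is the one an operator can fix on site.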

RU*_firmware

Promise source code: Unavailable

Common causes: The RU is running unverified firmware.

Solution: Provide the correct SSH key to the RU to enable firmware download and upgrade from the BBU.

RU*_lof

Promise source code: check_lopcomm_lof.py

Common causes: Loss of frame.

Solution: Follow the same steps as for RU*_cpri_lock when "SW Lock is missing".

RU*_netconf_connection

Promise source code: Unavailable

Common causes: Netconf connection lost.

Solution: Check for CPRI locks and ensure the RU gets an IPv6 address from the BBU.

RU*_netconf_socket

Promise source code: Unavailable

Common causes: Netconf connection lost; the RU is not listening for Netconf.

Solution: Check for CPRI locks and ensure the RU gets an IPv6 address from the BBU.

RU*_pa_current

Promise source code: check_lopcomm_pa_current.py

Common causes: The RU's power amplifier (PA) is drawing too much current.

Solution: Contact Rapid.Space support.

RU*_pa_output_power

Promise source code: check_lopcomm_pa_output_power.py

Common causes: The RU's PA output power is too high.

Solution: Contact Rapid.Space support.

RU*_rssi

Promise source code: test_check_lopcomm_rssi.py

Common causes: RU's RSSI imbalance; RX diversity lost.

Solution: Contact Rapid.Space support.

RU*_rx_saturated

Promise source code: check_rx_saturated.py

Common causes: RU's RX antennas saturated.

Solution: Contact Rapid.Space support.

RU*_sdr_busy

Promise source code: check_sdr_busy.py

Common causes: ENB doesn't properly use the CPRI card.

Solution: Refer to the related enb-output.log (see "Access ORS log" section) in the private log to debug the causes. Possible issues include an uninitialized trx_sdr kernel, a CPRI card in use by another process, an incorrect SFP port, or a missing license to launch LTEENB. Contact Rapid.Space for assistance if needed.
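
As a rough sketch of how enb-output.log can be scanned for known failures, the snippet below matches the two error messages quoted in the examples of this lecture (a duplicate cell_id and a GTP-U bind failure). The function name, the pattern table and the hint texts are assumptions made for illustration; other failures listed above, such as a missing license, are not covered because their log wording is not shown in this lecture.

```python
import re

# Hint texts are illustrative; the matched messages are quoted from this lecture.
CELL_HINT = "duplicate cell_id: review cell_id in BBU.ENB.CELL"
GTP_HINT = "gtp_address not usable: review gtp_address in the BBU.ENB instance tree"

KNOWN_PATTERNS = [
    (re.compile(r"cell_id \S+ is already used by another cell"), CELL_HINT),
    (re.compile(r"gtp: bind: Cannot assign requested address"
                r"|Could not open GTP-U socket"), GTP_HINT),
]


def diagnose(log_text: str) -> list:
    """Return the hints whose pattern appears in the log, in order, without repeats."""
    hints = []
    for line in log_text.splitlines():
        for pattern, hint in KNOWN_PATTERNS:
            if pattern.search(line) and hint not in hints:
                hints.append(hint)
    return hints
```

Calling diagnose() on the content of enb-output.log returns an empty list when none of the known messages is present, in which case the log still has to be read manually.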

RU*_stats_log

Promise source code: check_lopcomm_stats_log.py

Common causes: Netconf connection lost; subscription for notification from RU failed.

Solution: Refer to the related RU*-stats.log (see "Access ORS log" section) in the private log to debug the causes. Check for CPRI locks and ensure the RU gets an IPv6 address from the BBU. Contact Rapid.Space for further assistance if needed.

RU*_sync

Promise source code: check_lopcomm_sync.py

Common causes: Similar to RU*_cpri_lock.

Solution: Similar to RU*_cpri_lock.

RU*_vswr

Promise source code: check_lopcomm_sync.py

Common causes: RU's VSWR alarm.

Solution: Ensure antennas are connected. Try rebooting the RU to clear the alarm. Contact Lopcomm for help if necessary.

amarisoft_stats_log

Promise source code: check_sdr_busy.py

Common causes: ENB doesn't properly use the CPRI card.

Solution: Refer to the related enb-output.log (see "Access ORS log" section) in the private log to debug the causes. Potential issues include an uninitialized trx_sdr kernel, a CPRI card in use by another process, an incorrect SFP port, or a missing license to launch LTEENB. Contact Rapid.Space for assistance if needed.

buildout_slappart*_status

Promise source code: Unavailable

Common causes: Fault in the software's buildout.

Solution: Contact Rapid.Space for a patch.

check_baseband_latency

Promise source code: check_baseband_latency.py

Common causes: Insufficient processing time for LTEENB due to other processes consuming too much CPU on the server (ORS/BBU).

Solution: If you have access to the server, identify and resolve the disturbing process. Otherwise, contact Rapid.Space for help.
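
One possible way to identify the disturbing process is to rank processes by CPU usage. The sketch below assumes a Linux server where the standard ps command is available; the function names are hypothetical. Note that ps reports CPU usage averaged over each process's lifetime, so a short recent burst may be better observed with an interactive tool.

```python
import subprocess


def parse_ps(output: str, top: int = 5) -> list:
    """Parse `ps -eo pid,pcpu,comm` output into (pid, cpu_percent, command) tuples,
    sorted by CPU usage in descending order and limited to `top` entries."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, pcpu, comm = parts
        try:
            rows.append((int(pid), float(pcpu), comm.strip()))
        except ValueError:
            continue  # ignore rows that do not look like process entries
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


def top_cpu_processes(top: int = 5) -> list:
    """Run ps on the local machine and return the heaviest CPU consumers."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout, top)
```

Running top_cpu_processes() on the ORS/BBU gives a short list to compare against the processes expected on that server.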

check_monitor_frontend_password

Promise source code: Unavailable

Common causes: The monitor-setup-url cannot be accessed with the configured username and password.

Solution: Ensure the server is online. Contact Rapid.Space for assistance.

monitor_bootstrap_status

Promise source code: monitor_bootstrap_status.py

Common causes: Fault in the software or request parameters.

Solution: Contact Rapid.Space for help.

sshd

Promise source code: Unavailable

Common causes: The sshd service on the BBU, which the RU uses to download the firmware, is unavailable.

Solution: Contact Rapid.Space for help.

monitor_httpd_listening_on_tcp

Promise source code: Unavailable

Common causes: Server's IPv6 is not accessible.

Solution: Check the server's connection.

monitor_http_frontend

Promise source code: Unavailable

Common causes: Monitor frontend URL is not ready.

Solution: Check the frontend server.

The health instance checks the computer's health (CPU, memory, internet connection, etc.).

check_cpu_temperature

Promise source code: check_cpu_temperature.py

Common causes: CPU temperature too high.

Solution: Check the device's environment.

check_cpu_load

Promise source code: check_server_cpu_load.py

Common causes: CPU overload.

Solution: Check running processes on the server.

check_free_disk_space

Promise source code: check_free_disk_space.py

Common causes: Insufficient disk space on the server.

Solution: If you have access to the server, free up some space. Otherwise, contact Rapid.Space for help.
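
Before freeing space, it can help to measure how much is left. The following is a minimal sketch using Python's standard library; the 10% threshold and the function names are assumptions for illustration and are not the values used by the promise itself.

```python
import shutil


def free_space_ratio(path: str = "/") -> float:
    """Return the fraction of free space (0.0 to 1.0) on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # defensive: some pseudo-filesystems report a zero size
    return usage.free / usage.total


def is_low_on_space(path: str = "/", min_free_ratio: float = 0.10) -> bool:
    """Return True when less than `min_free_ratio` of the filesystem is free.

    The 10% default is an arbitrary example threshold.
    """
    return free_space_ratio(path) < min_free_ratio
```

Passing the mount point of the partition that hosts the SlapOS data, instead of "/", checks the filesystem that actually matters for the instance.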

check_network_errors

Promise source code: check_network_errors_packets.py

Common causes: Network packet loss.

Solution: Check the server's connection.

check_partition_space

Promise source code: Unavailable

Common causes: Server's IPv6 is not accessible.

Solution: Check the server's partition usage. Contact Rapid.Space for help.

check_ram_usage

Promise source code: check_ram_usage.py

Common causes: High RAM usage.

Solution: Check the server's RAM usage. Contact Rapid.Space for help.
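
On a Linux server, RAM usage can be read from /proc/meminfo. The sketch below is an illustration with hypothetical function names; it assumes the kernel exposes the MemTotal and MemAvailable fields, and it does not reproduce the promise's own thresholds.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into a {field: value} dict (values in kB on Linux)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def ram_used_percent(meminfo: dict) -> float:
    """Return the percentage of RAM in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total
```

A typical call reads the file once, for example with open("/proc/meminfo") as f: ram_used_percent(parse_meminfo(f.read())).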

check_re6stnet_certificate

Promise source code: Unavailable

Common causes: Re6stnet certificate expired.

Solution: Contact Rapid.Space for help.

check_network_transit

Promise source code: check_network_transit.py

Common causes: Network congestion.

Solution: Check the server's connection. Contact Rapid.Space for help if needed.

check_disk_space

Promise source code: Unavailable

Common causes: Same as check_free_disk_space.

Solution: Same as check_free_disk_space.

Example: eNB doesn't start

RU*_sdr_busy
Promise source code: check_sdr_busy.py
Common causes: ENB doesn't properly use the CPRI card.
Solution: Refer to the related enb-output.log (see "Access ORS log" section) in the private log to debug the causes. Possible issues include an uninitialized trx_sdr kernel, a CPRI card in use by another process, an incorrect SFP port, or a missing license to launch LTEENB. Contact Rapid.Space for assistance if needed.

As indicated in the ticket, the error is "sdr_busy". We need to identify the cause.

Check the enb-output.log:

[2024/09/25 11:21:21.718190745] Starting eNB software...
/srv/slapgrid/slappart19/etc/enb.cfg:648: cell_id 11 is already used by another cell
Base Station version 2024-03-15, Copyright (C) 2012-2024 Amarisoft
This software is licensed to rapid.space.
Support and software update available until 2025-03-25.

The log shows that cell_id 11 is already used by another cell. Verify the cell_id in BBU.ENB.CELL. The input error arises because the panel expects the cell_id to begin with "0x". After correcting the cell_id, check enb-output.log again; the eNB should now start properly.
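
A pre-check of the values before entering them in the panel could look like the sketch below. It is a hypothetical helper, not the panel's own validation: it only encodes the rule stated above (the value begins with "0x") and reports identifiers that are used more than once.

```python
from collections import Counter


def parse_cell_id(value: str) -> int:
    """Return the numeric cell_id, accepting only '0x'-prefixed hexadecimal strings."""
    text = value.strip()
    if not text.lower().startswith("0x"):
        raise ValueError(f"cell_id {value!r} must begin with '0x'")
    return int(text, 16)  # raises ValueError if the digits are not hexadecimal


def duplicated_cell_ids(values: list) -> list:
    """Return the sorted numeric cell_ids that appear more than once in `values`."""
    counts = Counter(parse_cell_id(value) for value in values)
    return sorted(cell_id for cell_id, count in counts.items() if count > 1)
```

For instance, duplicated_cell_ids(["0x01", "0x0B", "0x0b"]) reports 11, because "0x0B" and "0x0b" denote the same cell_id.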

 

 

 

Example: eNB doesn't start

RU*_sdr_busy
Promise source code: check_sdr_busy.py
Common causes: ENB doesn't properly use the CPRI card.
Solution: Refer to the related enb-output.log (see "Access ORS log" section) in the private log to debug the causes. Possible issues include an uninitialized trx_sdr kernel, a CPRI card in use by another process, an incorrect SFP port, or a missing license to launch LTEENB. Contact Rapid.Space for assistance if needed.

As indicated in the ticket, the error is "sdr_busy". We need to identify the cause.

Check the enb-output.log:

[2024/09/25 15:57:01.694437726 ] Starting eNB software...

[2024-09-25 15:57:07.496] gtp: bind: Cannot assign requested address
Could not open GTP-U socket
Base Station version 2024-03-15, Copyright (C) 2012-2024 Amarisoft
This software is licensed to rapid.space.
Support and software update available until 2025-03-25.

RF0: sample_rate=30.720 MHz dl_freq=1865.200 MHz ul_freq=1770.200 MHz (band 3) dl_ant=4 ul_ant=4
RF1: sample_rate=30.720 MHz dl_freq=1865.200 MHz ul_freq=1770.200 MHz (band 3) dl_ant=4 ul_ant=4
RF2: sample_rate=30.720 MHz dl_freq=1865.200 MHz ul_freq=1770.200 MHz (band 3) dl_ant=4 ul_ant=4

The log shows that the GTP-U socket could not be bound to the configured gtp_address. Verify the gtp_address in the BBU.ENB instance tree: it must be an address that is actually configured on the BBU. After correcting the gtp_address, check enb-output.log again; the eNB should now start properly.
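
Whether an address is usable as gtp_address can be tested on the BBU itself by trying to bind a UDP socket to it, since GTP-U runs over UDP. The following is a minimal sketch with a hypothetical function name; it reproduces the condition behind "Cannot assign requested address" (EADDRNOTAVAIL) without starting the eNB. IPv6 link-local addresses, which need a scope identifier, are not handled.

```python
import errno
import socket


def can_bind_udp(address: str) -> bool:
    """Return True if this host can bind a UDP socket to `address`."""
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((address, 0))  # port 0 lets the OS pick any free port
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                # Same condition as "Cannot assign requested address" in the log.
                return False
            raise  # any other error (e.g. malformed address) is reported as-is
    return True
```

If can_bind_udp() returns False for the configured value, the address is not assigned to any interface of the BBU and gtp_address has to be changed.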

 

 

 

Example: RU not connected

RU*_cpri_lock

Promise source code: check_cpri_lock.py

Common causes: Hardware issues (e.g., disconnected hardware, unplugged cables, RU powered off); software issues (e.g., failed frame synchronization).

Solution: If "HW Lock is missing", check the physical connection between the RU and BBU. If "SW Lock is missing", which rarely happens, it indicates frame loss. Contact Rapid.Space for assistance.

This failure commonly occurs when the RU is not properly connected. Check the physical connection of the RU; once it is restored, the promise should turn green.