Sound Event Detection Algorithm


REVISION HISTORY

Revision No. | Description | Date
1.0 | First version | 05/30/2024
1.1 | Added loud sound detection (LSD) | 09/13/2024

1. Overview

1.1. Algorithm Description

Sound Event Detection (SED) detects whether specific sound events occur in the input audio. It currently supports babycry detection, cough detection, and glass-shatter detection. The events that can be detected, and the event_index assigned to each, depend on the model that is loaded:

Model | Function | Classes
sed_tbs.img | Babycry detection | negative (event_index=0); babycry (event_index=1)
sed_tbl.img | Babycry detection (large model) | negative (event_index=0); babycry (event_index=1)
sed_tcs.img | Cough detection | negative (event_index=0); cough (event_index=1)
sed_tcbs.img | Cough and babycry detection | negative (event_index=0); cough (event_index=1); babycry (event_index=2)
sed_tbgl.img | Babycry and glass-shatter detection (large model) | negative (event_index=0); babycry (event_index=1); glass (event_index=2)
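
The index-to-class mapping in the table above can be captured in a small lookup helper. The following is a minimal sketch, not part of the library: the `SedModelKind` enum and the `sed_event_name` function are hypothetical names introduced for illustration, and only the mapping itself is taken from the table.

```c
#include <stddef.h>

/* Hypothetical enum naming the model files listed above (not a library type). */
typedef enum {
    SED_MODEL_TBS,   /* sed_tbs.img  */
    SED_MODEL_TBL,   /* sed_tbl.img  */
    SED_MODEL_TCS,   /* sed_tcs.img  */
    SED_MODEL_TCBS,  /* sed_tcbs.img */
    SED_MODEL_TBGL   /* sed_tbgl.img */
} SedModelKind;

/* Returns the class label for event_index under the given model,
 * or NULL when the model does not define that index. */
const char *sed_event_name(SedModelKind model, int event_index)
{
    static const char *const names[][3] = {
        [SED_MODEL_TBS]  = { "negative", "babycry", NULL },
        [SED_MODEL_TBL]  = { "negative", "babycry", NULL },
        [SED_MODEL_TCS]  = { "negative", "cough",   NULL },
        [SED_MODEL_TCBS] = { "negative", "cough",   "babycry" },
        [SED_MODEL_TBGL] = { "negative", "babycry", "glass" },
    };

    if ((int)model < 0 || (int)model > (int)SED_MODEL_TBGL) {
        return NULL;
    }
    if (event_index < 0 || event_index > 2) {
        return NULL;
    }
    return names[model][event_index];
}
```

Note that index 1 means cough for sed_tcbs.img but babycry for the other multi-class model, so a label lookup should always take the loaded model into account.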

1.2. Notes

The algorithm operates at a sampling rate of 16kHz with an input length of 256 samples (16ms).

The models are trained on audio sampled at 16kHz. When testing with played-back audio, use source audio with a sampling rate higher than 16kHz.
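
The relationship between the 256-sample input length and its 16ms duration is plain arithmetic, sketched below. The constant and function names are hypothetical and only illustrate the calculation; they are not provided by the library.

```c
#include <stdint.h>

/* Values taken from the notes above; the names are illustrative only. */
enum { SED_SAMPLE_RATE_HZ = 16000, SED_FRAME_SAMPLES = 256 };

/* Duration of `samples` samples in microseconds at the given sample rate. */
uint32_t sed_samples_to_us(uint32_t samples, uint32_t sample_rate_hz)
{
    if (sample_rate_hz == 0) {
        return 0;
    }
    return (uint32_t)(((uint64_t)samples * 1000000u) / sample_rate_hz);
}

/* Number of whole frames contained in `samples` samples. */
uint32_t sed_whole_frames(uint32_t samples, uint32_t frame_samples)
{
    if (frame_samples == 0) {
        return 0;
    }
    return samples / frame_samples;
}
```

With these helpers, one frame of 256 samples at 16kHz lasts 16000 microseconds, and one second of audio (16000 samples) contains 62 whole frames plus a partial one.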

2. API Reference

The function module provides the following APIs:

API Name | Function
ALGO_SED_CreateHandle | Creates the algorithm handle
ALGO_SED_InitHandle | Initializes the algorithm handle
ALGO_SED_SetParams | Sets the configurable parameters of the algorithm
ALGO_SED_GetInputAttr | Gets the input attribute information of the model
ALGO_SED_Run | Runs the algorithm
ALGO_SED_DeinitHandle | Deinitializes the algorithm handle
ALGO_SED_ReleaseHandle | Releases the resources occupied by the algorithm handle
ALGO_SED_GetParams | Gets the current configuration parameters of the algorithm
ALGO_SED_Lsd | Runs the loud sound detection algorithm

2.1 ALGO_SED_CreateHandle

  • Function

    Creates algo handle

  • Syntax

    MI_S32 ALGO_SED_CreateHandle(void **handle);
    
  • Parameters

    Parameters | Description | Input/Output
    handle | Pointer to the algorithm handle (receives the created handle) | Output

  • Return Values

    Return Values | Description
    0 | Success
    Others | Failure (see Error Code for details)

  • Dependencies

    • Header File: sgs_sed_api.h

    • Library File: libsgsalgo_sed.a / libsgsalgo_sed.so

2.2 ALGO_SED_InitHandle

  • Function

    Initializes algo handle

  • Syntax

    MI_S32 ALGO_SED_InitHandle(void *handle, const SedInit_t *init_info);
    
  • Parameters

    Parameters | Descriptions | Input/Output
    handle | Algorithm handle | Input
    init_info | Initialization parameters for the algorithm (see SedInit_t for details) | Input

  • Return Values

    Return Values | Descriptions
    0 | Success
    Others | Failure (see Error Code for details)

  • Dependencies

    • Header File: sgs_sed_api.h

    • Library File: libsgsalgo_sed.a / libsgsalgo_sed.so

2.3 ALGO_SED_SetParams

  • Function

    Sets the configurable parameters of the algorithm

  • Syntax

    MI_S32 ALGO_SED_SetParams(void *handle, const SedParams_t *params);
    
  • Parameters

    Parameters | Descriptions | Input/Output
    handle | Algorithm handle | Input
    params | Configurable parameters (see SedParams_t for details) | Input

  • Return Values

    Return Values | Descriptions
    0 | Success
    Others | Failure (see Error Code for details)

  • Dependencies

    • Header File: sgs_sed_api.h

    • Library File: libsgsalgo_sed.a / libsgsalgo_sed.so

2.4 ALGO_SED_GetInputAttr

  • Function

    Gets the input attribute information of the model, that is, the frame length of the input audio (see SedInputAttr_t)

  • Syntax

    MI_S32 ALGO_SED_GetInputAttr(void *handle, SedInputAttr_t *input_attr);
    
  • Parameters

    Parameters | Descriptions | Input/Output
    handle | Algorithm handle | Input
    input_attr | Pointer to the structure that receives the attribute information (see SedInputAttr_t for details) | Output

  • Return Values

    Return Values | Descriptions
    0 | Success
    Others | Failure (see Error Code for details)

  • Dependencies

    • Header File: sgs_sed_api.h

    • Library File: libsgsalgo_sed.a / libsgsalgo_sed.so

2.5 ALGO_SED_Run

  • Function

    Runs the algorithm and gets the output result

  • Syntax

    MI_S32 ALGO_SED_Run(void *handle, const SedInput_t *input, SedOutput_t *output);
    
  • Parameters

    Parameters | Descriptions | Input/Output
    handle | Algorithm handle | Input
    input | Input audio data (see SedInput_t for details). If the input audio is longer than 2 seconds, only the last 2 seconds are used for detection. | Input
    output | Output result of the algorithm (see SedOutput_t for details) | Output

  • Return Values

    Return Values | Descriptions
    0 | Success
    Others | Failure (see Error Code for details)

  • Dependencies

    • Header File: sgs_sed_api.h

    • Library File: libsgsalgo_sed.a / libsgsalgo_sed.so
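
The 2-second limit on the input parameter can be expressed in bytes. The sketch below computes where the last 2 seconds of a buffer begin, which is the portion the note says is used for detection. The function name is hypothetical, and the calculation assumes single-channel PCM, which the source does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset at which the last `seconds` of audio start inside a buffer of
 * `buffer_len` bytes. Returns 0 when the buffer is not longer than the window.
 * Assumes single-channel PCM (an assumption, not stated by the library). */
size_t sed_tail_offset(size_t buffer_len, uint32_t sample_rate,
                       uint32_t bit_width, uint32_t seconds)
{
    size_t bytes_per_sample = bit_width / 8u;
    size_t window = (size_t)sample_rate * bytes_per_sample * seconds;

    return buffer_len > window ? buffer_len - window : 0;
}
```

For 16kHz, 16-bit audio the 2-second window is 64000 bytes, so a 3-second buffer of 96000 bytes has its relevant portion starting at byte 32000.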

2.6 ALGO_SED_DeinitHandle

  • Function

    Deinitializes the algorithm handle

  • Syntax

    MI_S32 ALGO_SED_DeinitHandle(void *handle);
    
  • Parameters

    Parameters | Description | Input/Output
    handle | Algorithm handle | Input

  • Return Values

    Return Values | Descriptions
    0 | Success
    Others | Failure (see Error Code for details)

  • Dependencies

    • Header File: sgs_sed_api.h

    • Library File: libsgsalgo_sed.a / libsgsalgo_sed.so

2.7 ALGO_SED_ReleaseHandle

  • Function

    Releases the resources occupied by algo handle

  • Syntax

    MI_S32 ALGO_SED_ReleaseHandle(void *handle);
    
  • Parameters

    Parameters | Description | Input/Output
    handle | Algorithm handle | Input

  • Return Values

    Return Values | Descriptions
    0 | Success
    Others | Failure (see Error Code for details)

2.8 ALGO_SED_GetParams

  • Function

    Gets the current configuration parameters of the algorithm

  • Syntax

    MI_S32 ALGO_SED_GetParams(void *handle, SedParams_t *params);
    
  • Parameters

    Parameters | Description | Input/Output
    handle | Algorithm handle | Input
    params | Pointer to the returned parameter structure (see SedParams_t for details) | Output

  • Return Values

    Return Values | Descriptions
    0 | Success
    Others | Failure (see Error Code for details)

  • Dependencies

    • Header File: sgs_sed_api.h

    • Library File: libsgsalgo_sed.a / libsgsalgo_sed.so

2.9 ALGO_SED_Lsd

  • Function

    Runs the loud sound detection algorithm and obtains the output result

  • Syntax

    MI_S32 ALGO_SED_Lsd(void *handle, const SedInput_t *input, SedLsdOutput_t* output);
    
  • Parameters

    Parameters | Descriptions | Input/Output
    handle | Algorithm handle | Input
    input | Input audio data (see SedInput_t for details). If the input audio is longer than 2 seconds, only the last 2 seconds are used for detection. | Input
    output | Output result of the loud sound detection (see SedLsdOutput_t for details) | Output

  • Return Values

    Return Values | Descriptions
    0 | Success
    Others | Failure (see Error Code for details)

  • Dependencies

    • Header File: sgs_sed_api.h

    • Library File: libsgsalgo_sed.a / libsgsalgo_sed.so

3. Structure Definitions

The relevant structures for the algorithm are defined as follows:

Data Type | Definition
SedInputAttr_t | Algorithm input information structure
SedInput_t | Algorithm input audio data structure
SedInit_t | Algorithm initialization parameter structure
SedOutput_t | Algorithm output structure
SedParams_t | Algorithm configurable parameters structure
SedLsdOutput_t | Loud sound detection output data structure

3.1 SedInputAttr_t

  • Description

    Defines the related information of the model's input data

  • Definition

    typedef struct
    {
        MI_S32 frame_len;
    } SedInputAttr_t;
    
  • Members

    Member Name | Descriptions
    frame_len | Frame length of the input audio data (number of samples per input)

  • Related data types and interfaces

    ALGO_SED_GetInputAttr

3.2 SedInput_t

  • Description

    Defines the structure for the algorithm's input data

  • Definition

    typedef struct
    {
        MI_S32 sample_rate;
        MI_S32 bit_width;
        void *buffer;
        MI_U32 buffer_len;
    } SedInput_t;
    
  • Members

    Member Name | Descriptions
    sample_rate | Sampling rate of the input audio
    bit_width | Bit width of the input audio samples (bits per sample)
    buffer | Pointer to the input audio buffer
    buffer_len | Length of the input audio buffer (in bytes)

  • Related data types and interfaces

    ALGO_SED_Run

    ALGO_SED_Lsd
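
Because buffer_len is given in bytes rather than samples, it is easy to pass the wrong value. The sketch below fills an input descriptor for a block of 16-bit samples. `SedInputSketch` is a local stand-in that mirrors the fields of SedInput_t with standard C types so the example compiles without the SDK header; it and `sed_wrap_pcm16` are hypothetical names, and the choice of 16-bit single-channel PCM is an assumption, since the source does not list the supported bit widths.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the fields of SedInput_t (illustration only). */
typedef struct {
    int32_t  sample_rate;
    int32_t  bit_width;
    void    *buffer;
    uint32_t buffer_len;   /* length in bytes, not in samples */
} SedInputSketch;

/* Describes `sample_count` 16-bit samples captured at 16kHz. */
SedInputSketch sed_wrap_pcm16(int16_t *samples, size_t sample_count)
{
    SedInputSketch in;

    in.sample_rate = 16000;
    in.bit_width   = 16;
    in.buffer      = samples;
    in.buffer_len  = (uint32_t)(sample_count * sizeof(int16_t));
    return in;
}
```

For one 256-sample frame this yields a buffer_len of 512 bytes.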

3.3 SedInit_t

  • Description

    Defines the structure for the algorithm's initialization parameters

  • Definition

    typedef struct
    {
        char ipu_firmware_path[MAX_SED_STRLEN]; // ipu_firmware.bin path
        char model_path[MAX_SED_STRLEN];        // model path
        MI_BOOL create_device;                  // set false to create ipu device outside algo lib
        MI_BOOL destroy_device;                 // set false to destroy ipu device outside algo lib
        void *model_buffer;                     // set it when load model from memory
        MI_U32 model_buffer_len;                // set it when load model from memory
    } SedInit_t;
    
  • Members

    Member Name | Descriptions
    ipu_firmware_path | Path to the IPU firmware
    model_path | Path to the model
    create_device | Whether to create the IPU device within the algorithm library (default true; set to false if the device is created outside the library)
    destroy_device | Whether to destroy the IPU device within the algorithm library (default true; set to false if the device is destroyed outside the library)
    model_buffer | Pointer to the model data when loading the model from memory (model_path should then be empty); set to NULL when loading the model from model_path
    model_buffer_len | Length of the model buffer; set only when loading the model from memory

  • Related data types and interfaces

    ALGO_SED_InitHandle
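
The members above describe two mutually exclusive ways to load a model: from model_path, or from model_buffer. The sketch below prepares the initialization data for the path-based case. `SedInitSketch` is a local stand-in that mirrors SedInit_t with standard C types, `SKETCH_STRLEN` is an assumed placeholder for MAX_SED_STRLEN (whose real value is defined by the SDK header and is not given here), and `sed_init_from_path` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_STRLEN 256  /* assumed stand-in for MAX_SED_STRLEN */

/* Local stand-in mirroring the fields of SedInit_t (illustration only). */
typedef struct {
    char     ipu_firmware_path[SKETCH_STRLEN];
    char     model_path[SKETCH_STRLEN];
    bool     create_device;
    bool     destroy_device;
    void    *model_buffer;
    uint32_t model_buffer_len;
} SedInitSketch;

/* Fills `init` for loading the model from a file path.
 * Returns 0 on success, -1 if an argument is NULL or a path does not fit. */
int sed_init_from_path(SedInitSketch *init, const char *firmware, const char *model)
{
    if (init == NULL || firmware == NULL || model == NULL) {
        return -1;
    }
    if (strlen(firmware) >= SKETCH_STRLEN || strlen(model) >= SKETCH_STRLEN) {
        return -1;
    }

    memset(init, 0, sizeof(*init));
    snprintf(init->ipu_firmware_path, sizeof(init->ipu_firmware_path), "%s", firmware);
    snprintf(init->model_path, sizeof(init->model_path), "%s", model);
    init->create_device    = true;   /* library creates the IPU device */
    init->destroy_device   = true;   /* library destroys the IPU device */
    init->model_buffer     = NULL;   /* NULL because the model comes from model_path */
    init->model_buffer_len = 0;
    return 0;
}
```

Rejecting paths that do not fit avoids silently truncating a path into the fixed-size arrays, which would otherwise surface later as a model loading failure.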

3.4 SedOutput_t

  • Description

    Defines the structure for the algorithm's output

  • Definition

    typedef struct
    {
        MI_BOOL is_valid;
        MI_S32 event_index;
        MI_FLOAT event_score;
    } SedOutput_t;
    
  • Members

    Member Name | Descriptions
    is_valid | Indicates whether the output result is valid. The model is invoked at a certain interval rather than on every call; is_valid = true means that the model was invoked on this call and the result is valid
    event_index | Index of the detected audio event. The mapping from index to event type depends on the model (see Algorithm Description for details)
    event_score | Score of the detected audio event (range 0.0 to 1.0)

  • Related data types and interfaces

    ALGO_SED_Run
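
Since a result is meaningful only when is_valid is true, and event_index 0 is the negative class for every model listed in the Algorithm Description, callers typically need a two-step check before acting on an output. The sketch below shows one way to write it. `SedOutputSketch` is a local stand-in that mirrors SedOutput_t with standard C types, and `sed_positive_event` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in mirroring the fields of SedOutput_t (illustration only). */
typedef struct {
    bool  is_valid;
    int   event_index;
    float event_score;
} SedOutputSketch;

/* Returns the index of the detected event, or -1 when there is nothing to
 * report (result not valid on this call, or the negative class). */
int sed_positive_event(const SedOutputSketch *out)
{
    if (out == NULL || !out->is_valid) {
        return -1;                /* model was not invoked on this call */
    }
    if (out->event_index <= 0) {
        return -1;                /* index 0 is the negative class */
    }
    return out->event_index;
}
```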

3.5 SedParams_t

  • Description

    Defines the configurable parameters of the algorithm

  • Definition

    typedef struct
    {
        MI_S32 smooth_length;                         // output result smooth len; default 0
        MI_FLOAT vad_threshold;                       // vad threshold; default -45
        MI_FLOAT event_threshold[MAX_SED_EVENT_NUM];  // event score threshold; default [0.5, 0.5]
        MI_U32 min_trigger_times[MAX_SED_EVENT_NUM];  // return a positive event when detected more than min_trigger_times, default 1
        MI_FLOAT lsd_threshold;                       // loud sound detection RMS threshold (unit: dB); default 0 disables LSD
    } SedParams_t;
    
  • Members

    Member Name | Descriptions
    smooth_length | Length of the smoothing window for output results (default 0, no smoothing)
    vad_threshold | VAD threshold for sound event detection (range -50.0 to 0.0, default -45.0)
    event_threshold | Score threshold for each audio event (default 0.5; suggested 0.35 with front-end, 0.7 without front-end)
    min_trigger_times | Minimum trigger count for an event: the event is reported only when its score exceeds the threshold min_trigger_times times within the smooth_length time window (default 1)
    lsd_threshold | RMS threshold for loud sound detection (in dB; default 0, which disables LSD)

  • Related data types and interfaces

    ALGO_SED_SetParams

    ALGO_SED_GetParams
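
The interaction between event_threshold, min_trigger_times and smooth_length can be illustrated with a small simulation. This is only one reading of the descriptions above, not the library's implementation: it assumes the window is measured in results, that "at least min_trigger_times" hits are required, and that a smooth_length of 0 behaves like a window of one result. The function name is hypothetical.

```c
#include <stddef.h>

/* Looks at the most recent `window` scores and reports a trigger (1) when at
 * least `min_trigger_times` of them exceed `threshold`, otherwise 0.
 * Illustrative interpretation only; the library's internal logic may differ. */
int sed_should_trigger(const float *scores, size_t n, size_t window,
                       float threshold, unsigned min_trigger_times)
{
    unsigned hits = 0;
    size_t i;

    if (scores == NULL || n == 0) {
        return 0;
    }
    if (window == 0) {
        window = 1;               /* assumed meaning of "no smoothing" */
    }
    if (window > n) {
        window = n;
    }
    for (i = n - window; i < n; ++i) {
        if (scores[i] > threshold) {
            ++hits;
        }
    }
    return hits >= min_trigger_times;
}
```

Under this reading, raising min_trigger_times trades detection latency for fewer false alarms, because a single high score inside the window is no longer enough to report an event.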

3.6 SedLsdOutput_t

  • Description

    Defines the structure for the output data of the loud sound detection

  • Definition

    typedef struct
    {
        MI_BOOL is_large_sound;
        MI_FLOAT max_rms;
    } SedLsdOutput_t;
    
  • Members

    Member Name | Descriptions
    is_large_sound | Indicates whether the current audio is a loud sound
    max_rms | Maximum RMS of the current audio (in dB)

  • Related data types and interfaces

    ALGO_SED_Lsd

4. Error Code

Error Code | Value | Description
E_ALGO_SUCCESS | 0 | Operation successful
E_ALGO_HANDLE_NULL | 1 | Algorithm handle is null
E_ALGO_INVALID_PARAM | 2 | Invalid input parameter
E_ALGO_DEVICE_FAULT | 3 | Hardware error
E_ALGO_LOADMODEL_FAIL | 4 | Model loading failed
E_ALGO_INIT_FAIL | 5 | Algorithm initialization failed
E_ALGO_NOT_INIT | 6 | Algorithm has not been initialized
E_ALGO_INPUT_DATA_NULL | 7 | Algorithm input data is null
E_ALGO_INVALID_INPUT_SIZE | 8 | Invalid dimensions of the algorithm input data
E_ALGO_INVALID_LICENSE | 9 | Invalid license permission
E_ALGO_MEMORY_OUT | 10 | Insufficient memory
E_ALGO_FILEIO_ERROR | 11 | File read/write operation error
E_ALGO_INVALID_OUTPUT_SIZE | 12 | Invalid dimensions of the algorithm output data
E_ALGO_INVALID_DECODE_MODE | 13 | Invalid decode mode
E_ALGO_MODEL_INVOKE_ERROR | 14 | Model invoke error
E_ALGO_INVALID_FILE | 15 | Invalid file
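
When logging failures it helps to print the symbolic name rather than the raw number. The sketch below maps a return value to the name listed in the table above. `sed_error_string` is a hypothetical helper, not a library function, and the string table is a local copy of this document's table; real code should rely on the definitions in the SDK header.

```c
#include <stddef.h>

/* Maps a return value to the error-code name from the table above.
 * Values outside 0..15 yield a generic message. */
const char *sed_error_string(int code)
{
    static const char *const names[] = {
        "E_ALGO_SUCCESS",             /*  0 */
        "E_ALGO_HANDLE_NULL",         /*  1 */
        "E_ALGO_INVALID_PARAM",       /*  2 */
        "E_ALGO_DEVICE_FAULT",        /*  3 */
        "E_ALGO_LOADMODEL_FAIL",      /*  4 */
        "E_ALGO_INIT_FAIL",           /*  5 */
        "E_ALGO_NOT_INIT",            /*  6 */
        "E_ALGO_INPUT_DATA_NULL",     /*  7 */
        "E_ALGO_INVALID_INPUT_SIZE",  /*  8 */
        "E_ALGO_INVALID_LICENSE",     /*  9 */
        "E_ALGO_MEMORY_OUT",          /* 10 */
        "E_ALGO_FILEIO_ERROR",        /* 11 */
        "E_ALGO_INVALID_OUTPUT_SIZE", /* 12 */
        "E_ALGO_INVALID_DECODE_MODE", /* 13 */
        "E_ALGO_MODEL_INVOKE_ERROR",  /* 14 */
        "E_ALGO_INVALID_FILE"         /* 15 */
    };
    const int count = (int)(sizeof(names) / sizeof(names[0]));

    if (code < 0 || code >= count) {
        return "unknown error code";
    }
    return names[code];
}
```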