Sound Event Detection Algorithm
REVISION HISTORY
| Revision No. | Description | Date |
|---|---|---|
| 1.0 | First version | 05/30/2024 |
| 1.1 | Added loud sound detection (LSD) | 09/13/2024 |
1. Overview
1.1. Algorithm Description
Sound Event Detection (SED) detects the presence of specific sound events in audio input. It currently supports baby-cry, cough, and glass-shatter detection. Each model file detects a fixed set of classes, and each detected class is reported by its event_index:
| Model | Function | Classes |
|---|---|---|
| sed_tbs.img | Baby-cry detection | negative (event_index=0); babycry (event_index=1) |
| sed_tbl.img | Baby-cry detection (large model) | negative (event_index=0); babycry (event_index=1) |
| sed_tcs.img | Cough detection | negative (event_index=0); cough (event_index=1) |
| sed_tcbs.img | Cough and baby-cry detection | negative (event_index=0); cough (event_index=1); babycry (event_index=2) |
| sed_tbgl.img | Baby-cry and glass-shatter detection (large model) | negative (event_index=0); babycry (event_index=1); glass (event_index=2) |
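Because the meaning of event_index depends on which model file is loaded, an application usually keeps its own mapping from index to label. The following is a minimal sketch of such a lookup, built only from the table above. sed_class_name, kSedModels, and SED_MAX_CLASSES are hypothetical names, not part of the SDK, and the sketch assumes the application knows the model file name it loaded.

```c
#include <stddef.h>
#include <string.h>

#define SED_MAX_CLASSES 3  /* hypothetical: largest class count in the model table */

/* One row of the model table: the label at position i has event_index == i. */
typedef struct {
    const char *model;                   /* model file name */
    const char *labels[SED_MAX_CLASSES]; /* NULL marks an unused index */
} SedModelClasses;

static const SedModelClasses kSedModels[] = {
    {"sed_tbs.img",  {"negative", "babycry", NULL}},
    {"sed_tbl.img",  {"negative", "babycry", NULL}},
    {"sed_tcs.img",  {"negative", "cough",   NULL}},
    {"sed_tcbs.img", {"negative", "cough",   "babycry"}},
    {"sed_tbgl.img", {"negative", "babycry", "glass"}},
};

/* Returns the class label for event_index under the named model,
 * or NULL if the model name or the index is not in the table. */
const char *sed_class_name(const char *model, int event_index)
{
    if (model == NULL || event_index < 0 || event_index >= SED_MAX_CLASSES)
        return NULL;
    for (size_t i = 0; i < sizeof kSedModels / sizeof kSedModels[0]; ++i) {
        if (strcmp(kSedModels[i].model, model) == 0)
            return kSedModels[i].labels[event_index];
    }
    return NULL;
}
```

For example, sed_class_name("sed_tcbs.img", 2) returns "babycry", while sed_class_name("sed_tbs.img", 2) returns NULL because that model has only two classes.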
1.2. Notes
The algorithm operates at a sampling rate of 16 kHz, and each input frame is 256 samples (16 ms).
The models are trained on 16 kHz audio. When testing by playing audio back into the microphone, use source audio with a sampling rate of 16 kHz or higher.
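The frame arithmetic in these notes can be checked with two small helpers. This is a sketch: the helper names are hypothetical, and standard C integer types stand in for the SDK's MI_* types.

```c
#include <stdint.h>

/* Duration of one input frame in milliseconds (integer division). */
uint32_t sed_frame_ms(uint32_t frame_len, uint32_t sample_rate)
{
    return (uint32_t)((uint64_t)frame_len * 1000u / sample_rate);
}

/* Number of whole frames of frame_len samples in `seconds` of audio. */
uint32_t sed_frames_in(uint32_t seconds, uint32_t frame_len, uint32_t sample_rate)
{
    return (uint32_t)((uint64_t)seconds * sample_rate / frame_len);
}
```

With the documented values, sed_frame_ms(256, 16000) gives 16, and sed_frames_in(2, 256, 16000) gives 125, the number of 16 ms frames in the 2-second window that ALGO_SED_Run analyzes.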
2. API Reference
The module provides the following APIs:

| API | Description |
|---|---|
| ALGO_SED_CreateHandle | Creates the algorithm handle |
| ALGO_SED_InitHandle | Initializes the algorithm handle |
| ALGO_SED_SetParams | Sets the configurable parameters of the algorithm |
| ALGO_SED_GetInputAttr | Gets the input attributes of the model |
| ALGO_SED_Run | Runs the algorithm |
| ALGO_SED_DeinitHandle | Deinitializes the algorithm handle |
| ALGO_SED_ReleaseHandle | Releases the resources occupied by the algorithm handle |
| ALGO_SED_GetParams | Gets the current configuration parameters of the algorithm |
| ALGO_SED_Lsd | Runs the loud sound detection (LSD) algorithm |
2.1 ALGO_SED_CreateHandle

Function

Creates an algorithm handle.

Syntax

MI_S32 ALGO_SED_CreateHandle(void **handle);

Parameters

| Parameter | Description | Input/Output |
|---|---|---|
| handle | Pointer to the location that receives the algorithm handle | Output |

Return Values

| Return Value | Description |
|---|---|
| 0 | Success |
| Others | Failure (see Error Code for details) |

Dependencies

Header File: sgs_sed_api.h

Library File: libsgsalgo_sed.a / libsgsalgo_sed.so
2.2 ALGO_SED_InitHandle

Function

Initializes the algorithm handle.

Syntax

MI_S32 ALGO_SED_InitHandle(void *handle, const SedInit_t *init_info);

Parameters

| Parameter | Description | Input/Output |
|---|---|---|
| handle | Algorithm handle | Input |
| init_info | Initialization parameters for the algorithm (see SedInit_t for details) | Input |

Return Values

| Return Value | Description |
|---|---|
| 0 | Success |
| Others | Failure (see Error Code for details) |

Dependencies

Header File: sgs_sed_api.h

Library File: libsgsalgo_sed.a / libsgsalgo_sed.so
2.3 ALGO_SED_SetParams

Function

Sets the configurable parameters of the algorithm.

Syntax

MI_S32 ALGO_SED_SetParams(void *handle, const SedParams_t *params);

Parameters

| Parameter | Description | Input/Output |
|---|---|---|
| handle | Algorithm handle | Input |
| params | Configurable parameters (see SedParams_t for details) | Input |

Return Values

| Return Value | Description |
|---|---|
| 0 | Success |
| Others | Failure (see Error Code for details) |

Dependencies

Header File: sgs_sed_api.h

Library File: libsgsalgo_sed.a / libsgsalgo_sed.so
2.4 ALGO_SED_GetInputAttr

Function

Gets the input attributes of the model, namely the frame length of the input audio data (see SedInputAttr_t).

Syntax

MI_S32 ALGO_SED_GetInputAttr(void *handle, SedInputAttr_t *input_attr);

Parameters

| Parameter | Description | Input/Output |
|---|---|---|
| handle | Algorithm handle | Input |
| input_attr | Pointer to a structure that receives the attribute information (see SedInputAttr_t for details) | Output |

Return Values

| Return Value | Description |
|---|---|
| 0 | Success |
| Others | Failure (see Error Code for details) |

Dependencies

Header File: sgs_sed_api.h

Library File: libsgsalgo_sed.a / libsgsalgo_sed.so
2.5 ALGO_SED_Run

Function

Runs the algorithm and returns the detection result.

Syntax

MI_S32 ALGO_SED_Run(void *handle, const SedInput_t *input, SedOutput_t *output);

Parameters

| Parameter | Description | Input/Output |
|---|---|---|
| handle | Algorithm handle | Input |
| input | Input audio data (see SedInput_t for details). If the input audio is longer than 2 seconds, only the last 2 seconds are analyzed. | Input |
| output | Output result of the algorithm (see SedOutput_t for details) | Output |

Return Values

| Return Value | Description |
|---|---|
| 0 | Success |
| Others | Failure (see Error Code for details) |

Dependencies

Header File: sgs_sed_api.h

Library File: libsgsalgo_sed.a / libsgsalgo_sed.so
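Since only the last 2 seconds of the input are analyzed, a caller that keeps a longer recording can pass just its tail. The helper below is a hypothetical sketch that computes the byte offset of that tail. It assumes mono PCM with whole-byte samples (for example bit_width = 16), and sed_tail_offset is not an SDK function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset at which the last max_seconds of
 * mono PCM audio begin. Returns 0 when the buffer already holds
 * max_seconds or less, or when bit_width is below one byte. */
size_t sed_tail_offset(size_t buffer_len, uint32_t sample_rate,
                       uint32_t bit_width, uint32_t max_seconds)
{
    size_t bytes_per_sample = bit_width / 8;
    size_t keep_bytes = (size_t)sample_rate * max_seconds * bytes_per_sample;

    if (bytes_per_sample == 0 || buffer_len <= keep_bytes)
        return 0;

    /* Round down to a sample boundary so the tail starts on a whole sample. */
    size_t offset = buffer_len - keep_bytes;
    return offset - offset % bytes_per_sample;
}
```

For 16 kHz, 16-bit audio the 2-second tail is 64000 bytes, so a 100000-byte buffer yields an offset of 36000. The caller would then point input.buffer at the buffer start plus that offset and set input.buffer_len to buffer_len minus the offset.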
2.6 ALGO_SED_DeinitHandle

Function

Deinitializes the algorithm handle.

Syntax

MI_S32 ALGO_SED_DeinitHandle(void *handle);

Parameters

| Parameter | Description | Input/Output |
|---|---|---|
| handle | Algorithm handle | Input |

Return Values

| Return Value | Description |
|---|---|
| 0 | Success |
| Others | Failure (see Error Code for details) |

Dependencies

Header File: sgs_sed_api.h

Library File: libsgsalgo_sed.a / libsgsalgo_sed.so
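2.7 ALGO_SED_ReleaseHandle

Function

Releases the resources occupied by the algorithm handle.

Syntax

MI_S32 ALGO_SED_ReleaseHandle(void *handle);

Parameters

| Parameter | Description | Input/Output |
|---|---|---|
| handle | Algorithm handle | Input |

Return Values

| Return Value | Description |
|---|---|
| 0 | Success |
| Others | Failure (see Error Code for details) |

Dependencies

Header File: sgs_sed_api.h

Library File: libsgsalgo_sed.a / libsgsalgo_sed.so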
2.8 ALGO_SED_GetParams

Function

Gets the current configuration parameters of the algorithm.

Syntax

MI_S32 ALGO_SED_GetParams(void *handle, SedParams_t *params);

Parameters

| Parameter | Description | Input/Output |
|---|---|---|
| handle | Algorithm handle | Input |
| params | Pointer to the structure that receives the current parameters (see SedParams_t for details) | Output |

Return Values

| Return Value | Description |
|---|---|
| 0 | Success |
| Others | Failure (see Error Code for details) |

Dependencies

Header File: sgs_sed_api.h

Library File: libsgsalgo_sed.a / libsgsalgo_sed.so
2.9 ALGO_SED_Lsd

Function

Runs the loud sound detection (LSD) algorithm and returns the result. LSD is disabled while lsd_threshold in SedParams_t is 0 (the default).

Syntax

MI_S32 ALGO_SED_Lsd(void *handle, const SedInput_t *input, SedLsdOutput_t *output);

Parameters

| Parameter | Description | Input/Output |
|---|---|---|
| handle | Algorithm handle | Input |
| input | Input audio data (see SedInput_t for details). If the input audio is longer than 2 seconds, only the last 2 seconds are analyzed. | Input |
| output | Output result of loud sound detection (see SedLsdOutput_t for details) | Output |

Return Values

| Return Value | Description |
|---|---|
| 0 | Success |
| Others | Failure (see Error Code for details) |

Dependencies

Header File: sgs_sed_api.h

Library File: libsgsalgo_sed.a / libsgsalgo_sed.so
3. Structure Definitions
The relevant structures for the algorithm are defined as follows:
| Data Type | Definition |
|---|---|
| SedInputAttr_t | Algorithm Input Information Structure |
| SedInput_t | Algorithm Input Audio Data Structure |
| SedInit_t | Algorithm Initialization Parameter Structure |
| SedOutput_t | Algorithm Output Structure |
| SedParams_t | Algorithm Configurable Parameters Structure |
| SedLsdOutput_t | Loud Sound Detection Output Data Structure |
3.1 SedInputAttr_t

Description

Defines the attributes of the model's input data.

Definition

typedef struct {
    MI_S32 frame_len;
} SedInputAttr_t;

Members

| Member | Description |
|---|---|
| frame_len | Frame length of the input audio data (number of samples per input) |

Related Data Types and Interfaces

ALGO_SED_GetInputAttr
3.2 SedInput_t

Description

Defines the structure of the algorithm's input data.

Definition

typedef struct {
    MI_S32 sample_rate;
    MI_S32 bit_width;
    void *buffer;
    MI_U32 buffer_len;
} SedInput_t;

Members

| Member | Description |
|---|---|
| sample_rate | Sampling rate of the input audio |
| bit_width | Bit depth of the input audio (bits per sample) |
| buffer | Pointer to the input audio buffer |
| buffer_len | Length of the input audio buffer (in bytes) |

Related Data Types and Interfaces

ALGO_SED_Run, ALGO_SED_Lsd
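Note that buffer_len counts bytes, not samples. A minimal sketch for computing it, assuming mono PCM whose bit_width is a whole number of bytes; sed_buffer_len is a hypothetical helper, and uint32_t stands in for MI_U32.

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed for sample_count mono samples of
 * bit_width bits each. Returns 0 for bit widths that are not a whole
 * number of bytes. Example use:
 *   input.buffer_len = sed_buffer_len(attr.frame_len, input.bit_width); */
uint32_t sed_buffer_len(uint32_t sample_count, uint32_t bit_width)
{
    if (bit_width == 0 || bit_width % 8 != 0)
        return 0;
    return sample_count * (bit_width / 8);
}
```

For one 256-sample frame at 16 bits, sed_buffer_len(256, 16) returns 512.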
3.3 SedInit_t

Description

Defines the structure of the algorithm's initialization parameters.

Definition

typedef struct {
    char ipu_firmware_path[MAX_SED_STRLEN]; // ipu_firmware.bin path
    char model_path[MAX_SED_STRLEN];        // model path
    MI_BOOL create_device;                  // set false to create the IPU device outside the algo lib
    MI_BOOL destroy_device;                 // set false to destroy the IPU device outside the algo lib
    void *model_buffer;                     // set when loading the model from memory
    MI_U32 model_buffer_len;                // set when loading the model from memory
} SedInit_t;

Members

| Member | Description |
|---|---|
| ipu_firmware_path | Path to the IPU firmware |
| model_path | Path to the model; leave empty when loading the model from memory |
| create_device | Whether the algorithm library creates the IPU device (default true; set false to create it outside the library) |
| destroy_device | Whether the algorithm library destroys the IPU device (default true; set false to destroy it outside the library) |
| model_buffer | Model data, set when loading the model from memory (model_path should then be empty); set to NULL when loading the model from model_path |
| model_buffer_len | Length of model_buffer, set when loading the model from memory |

Related Data Types and Interfaces

ALGO_SED_InitHandle
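The model_path and model_buffer members select between two ways of loading the model. The check below is a hypothetical sketch of that rule as described above, which an application might run before calling ALGO_SED_InitHandle. sed_model_source_ok is not an SDK function, and standard C types replace MI_U32.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-flight check of the SedInit_t model-source rule:
 * load from file   -> model_path non-empty and model_buffer == NULL;
 * load from memory -> model_path empty, model_buffer set, length > 0. */
bool sed_model_source_ok(const char *model_path, const void *model_buffer,
                         unsigned int model_buffer_len)
{
    bool has_path = model_path != NULL && model_path[0] != '\0';
    bool from_file = has_path && model_buffer == NULL;
    bool from_memory = !has_path && model_buffer != NULL && model_buffer_len > 0;
    return from_file || from_memory;  /* exactly one source is configured */
}
```

Setting both a path and a buffer, or neither, fails the check.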
3.4 SedOutput_t

Description

Defines the structure of the algorithm's output.

Definition

typedef struct {
    MI_BOOL is_valid;
    MI_S32 event_index;
    MI_FLOAT event_score;
} SedOutput_t;

Members

| Member | Description |
|---|---|
| is_valid | Whether the output is valid. The model is invoked at intervals rather than on every call; is_valid = true means the model was invoked on this call and the result is valid. |
| event_index | Index of the detected audio event; its mapping to event types depends on the model (see Algorithm Description for details) |
| event_score | Score of the detected audio event (0.0 to 1.0) |

Related Data Types and Interfaces

ALGO_SED_Run
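Since is_valid is false on calls where the model was not invoked, event_index and event_score should only be read when is_valid is true. A minimal sketch of that decision, assuming event_index 0 is the negative class as in every model listed in the Algorithm Description. SedResultKind and sed_classify are hypothetical names, and bool/int stand in for MI_BOOL/MI_S32.

```c
#include <stdbool.h>

/* Hypothetical classification of one SedOutput_t result. */
typedef enum {
    SED_RESULT_NONE,      /* model not invoked on this call; ignore the other fields */
    SED_RESULT_NEGATIVE,  /* model ran and reported event_index 0 (negative) */
    SED_RESULT_EVENT      /* model ran and reported an event class (index > 0) */
} SedResultKind;

SedResultKind sed_classify(bool is_valid, int event_index)
{
    if (!is_valid)
        return SED_RESULT_NONE;
    return event_index > 0 ? SED_RESULT_EVENT : SED_RESULT_NEGATIVE;
}
```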
3.5 SedParams_t

Description

Defines the configurable parameters of the algorithm.

Definition

typedef struct {
    MI_S32 smooth_length;                        // output result smoothing length; default 0
    MI_FLOAT vad_threshold;                      // VAD threshold; default -45
    MI_FLOAT event_threshold[MAX_SED_EVENT_NUM]; // event score threshold; default [0.5, 0.5]
    MI_U32 min_trigger_times[MAX_SED_EVENT_NUM]; // report a positive event once it is detected min_trigger_times times; default 1
    MI_FLOAT lsd_threshold;                      // loud sound detection RMS threshold (unit: dB); default 0 means LSD is disabled
} SedParams_t;

Members

| Member | Description |
|---|---|
| smooth_length | Length of the smoothing window for output results (default 0, no smoothing) |
| vad_threshold | VAD threshold for sound event detection (range -50.0 to 0.0, default -45.0) |
| event_threshold | Score threshold for each audio event (default 0.5; suggested 0.35 with front-end processing, 0.7 without) |
| min_trigger_times | Minimum trigger count for each event: an event is reported only when its score exceeds the threshold min_trigger_times times within the smooth_length window (default 1) |
| lsd_threshold | RMS threshold for loud sound detection, in dB (default 0, which disables LSD) |

Related Data Types and Interfaces

ALGO_SED_SetParams, ALGO_SED_GetParams
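The library's smoothing is documented only by the descriptions above. The sketch below is one plausible reading of how smooth_length, event_threshold, and min_trigger_times interact, offered to help reason about tuning; it is not the SDK implementation. It assumes an event is reported when at least min_trigger_times of the last smooth_length scores exceed the threshold, treats smooth_length 0 as a window of one score, and caps the window at a hypothetical SED_WINDOW_MAX. All names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define SED_WINDOW_MAX 32  /* hypothetical cap for this sketch */

/* Hypothetical sliding-window trigger over recent event scores. */
typedef struct {
    float scores[SED_WINDOW_MAX];
    int window;  /* smooth_length, with 0 treated as 1 */
    int count;   /* scores stored so far (<= window) */
    int next;    /* ring-buffer write position */
} SedTrigger;

void sed_trigger_init(SedTrigger *t, int smooth_length)
{
    memset(t, 0, sizeof *t);
    if (smooth_length < 1) smooth_length = 1;
    if (smooth_length > SED_WINDOW_MAX) smooth_length = SED_WINDOW_MAX;
    t->window = smooth_length;
}

/* Records one valid score; returns true if at least min_trigger of the
 * scores in the window exceed threshold. */
bool sed_trigger_push(SedTrigger *t, float score, float threshold, int min_trigger)
{
    t->scores[t->next] = score;
    t->next = (t->next + 1) % t->window;
    if (t->count < t->window)
        t->count++;

    int hits = 0;
    for (int i = 0; i < t->count; ++i)
        if (t->scores[i] > threshold)
            hits++;
    return hits >= min_trigger;
}
```

Under this reading, raising min_trigger_times trades detection latency for fewer one-off false positives, and the defaults (smooth_length 0, min_trigger_times 1) reduce to a single-score threshold check.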
3.6 SedLsdOutput_t

Description

Defines the structure of the loud sound detection output.

Definition

typedef struct {
    MI_BOOL is_large_sound;
    MI_FLOAT max_rms;
} SedLsdOutput_t;

Members

| Member | Description |
|---|---|
| is_large_sound | Whether the current audio is a loud sound |
| max_rms | Maximum RMS level of the current audio (in dB) |

Related Data Types and Interfaces

ALGO_SED_Lsd
4. Error Code
| Error Code | Value | Description |
|---|---|---|
| E_ALGO_SUCCESS | 0 | Operation successful |
| E_ALGO_HANDLE_NULL | 1 | Algorithm handle is null |
| E_ALGO_INVALID_PARAM | 2 | Invalid input parameter |
| E_ALGO_DEVICE_FAULT | 3 | Hardware error |
| E_ALGO_LOADMODEL_FAIL | 4 | Model loading failed |
| E_ALGO_INIT_FAIL | 5 | Algorithm initialization failed |
| E_ALGO_NOT_INIT | 6 | Algorithm has not been initialized |
| E_ALGO_INPUT_DATA_NULL | 7 | Algorithm input data is null |
| E_ALGO_INVALID_INPUT_SIZE | 8 | Invalid dimensions of the algorithm input data |
| E_ALGO_INVALID_LICENSE | 9 | Invalid license permission |
| E_ALGO_MEMORY_OUT | 10 | Insufficient memory |
| E_ALGO_FILEIO_ERROR | 11 | File read/write operation error |
| E_ALGO_INVALID_OUTPUT_SIZE | 12 | Invalid dimensions of the algorithm output data |
| E_ALGO_INVALID_DECODE_MODE | 13 | Invalid decode mode |
| E_ALGO_MODEL_INVOKE_ERROR | 14 | Model invoke error |
| E_ALGO_INVALID_FILE | 15 | Invalid file |
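For logging, it helps to print the symbolic name of a failed return value. A minimal sketch built from the table above; sed_error_name is a hypothetical helper, and the "unknown" string for out-of-range values is this sketch's own choice, not an SDK constant.

```c
#include <stddef.h>

/* Hypothetical helper: symbolic name of an ALGO_SED_* return value.
 * Array position == numeric value, per the Error Code table. */
const char *sed_error_name(int code)
{
    static const char *const names[] = {
        "E_ALGO_SUCCESS",             /* 0 */
        "E_ALGO_HANDLE_NULL",         /* 1 */
        "E_ALGO_INVALID_PARAM",       /* 2 */
        "E_ALGO_DEVICE_FAULT",        /* 3 */
        "E_ALGO_LOADMODEL_FAIL",      /* 4 */
        "E_ALGO_INIT_FAIL",           /* 5 */
        "E_ALGO_NOT_INIT",            /* 6 */
        "E_ALGO_INPUT_DATA_NULL",     /* 7 */
        "E_ALGO_INVALID_INPUT_SIZE",  /* 8 */
        "E_ALGO_INVALID_LICENSE",     /* 9 */
        "E_ALGO_MEMORY_OUT",          /* 10 */
        "E_ALGO_FILEIO_ERROR",        /* 11 */
        "E_ALGO_INVALID_OUTPUT_SIZE", /* 12 */
        "E_ALGO_INVALID_DECODE_MODE", /* 13 */
        "E_ALGO_MODEL_INVOKE_ERROR",  /* 14 */
        "E_ALGO_INVALID_FILE",        /* 15 */
    };
    if (code < 0 || (size_t)code >= sizeof names / sizeof names[0])
        return "unknown"; /* not an SDK code */
    return names[code];
}
```

A caller might log a failure with printf("ALGO_SED_Run failed: %s\n", sed_error_name(ret)).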