Non-Contact Health Monitoring During Daily Personal Care Routines
This repository presents the LADH dataset, collected by Qinghai University. LADH contains 240 synchronized non-contact facial videos (in both RGB and IR modalities) across five scenarios: sitting, sitting while brushing teeth and combing hair, standing, standing while brushing teeth and combing hair, and post-exercise, with 11 participants recorded continuously over 10 days. The dataset captures ground-truth PPG, respiration rate (RR), and SpO2, and is designed to validate the accuracy and advantages of rPPG in daily personal care scenarios.
TABLE (DATASET COMPARISON)
| Dataset | Videos | Camera-Position | Vitals | Long-term | Obscured |
|---|---|---|---|---|---|
| PURE | 40 | Face | PPG/SpO₂ | ✗ | ✗ |
| UBFC-PPG | 42 | Face | PPG | ✗ | ✗ |
| MMPD | 660 | Face | PPG | ✗ | ✗ |
| SUMS | 80 | Face+Finger | PPG/SpO₂/RR | ✗ | ✗ |
| LADH | 240 | Face(RGB+IR) | PPG/SpO₂/RR | ✓ | ✓ |
We recruited 21 participants to collect data across the five daily scenarios. Data collection used a camera module to capture participants' facial videos in both RGB and IR modalities. Physiological ground-truth signals were recorded using a CMS50E pulse oximeter for PPG and SpO2 and an HKH-11C respiratory sensor to monitor breathing patterns. Video was recorded at a resolution of 640×480 pixels and a frame rate of 30 frames per second (FPS). PPG signals were sampled at 20 Hz, while respiratory waves were captured at 50 Hz. The experimental setup is shown below.
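Because the video (30 FPS), PPG (20 Hz), and respiratory (50 Hz) streams are sampled at different rates, the ground-truth signals must be resampled onto the video timeline before training. The snippet below is a minimal sketch of that alignment using linear interpolation; the array shapes and durations are assumptions for illustration, and the toolbox's own data loaders handle this preprocessing in practice.

```python
# Minimal sketch: resample ground-truth signals onto the 30 FPS video timeline.
# Shapes and durations below are illustrative; the rPPG-Toolbox data loaders
# perform the actual preprocessing.
import numpy as np

def resample_to_frames(signal: np.ndarray, signal_hz: float,
                       n_frames: int, fps: float = 30.0) -> np.ndarray:
    """Linearly interpolate a 1-D signal onto per-frame timestamps."""
    t_signal = np.arange(len(signal)) / signal_hz  # signal timestamps (seconds)
    t_frames = np.arange(n_frames) / fps           # video frame timestamps (seconds)
    return np.interp(t_frames, t_signal, signal)

# Example: a 2-minute recording
ppg = np.random.randn(120 * 20)    # PPG sampled at 20 Hz
resp = np.random.randn(120 * 50)   # respiratory wave sampled at 50 Hz
n_frames = 120 * 30                # video frames at 30 FPS
ppg_per_frame = resample_to_frames(ppg, 20, n_frames)
resp_per_frame = resample_to_frames(resp, 50, n_frames)
print(ppg_per_frame.shape, resp_per_frame.shape)  # (3600,) (3600,)
```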
(Schematic illustration of the experimental setup of data collection while participants are brushing teeth.)
This study divided data collection into two groups: the first dataset was collected from 10 subjects performing five scenarios in a single day, while the second dataset was obtained from 11 subjects who conducted the same five scenarios daily over 10 consecutive days. During the seated resting condition (station 1), subjects wore an HKH-11C respiratory sensor on their abdomen and a CMS50E pulse oximeter on their left index finger while sitting upright facing the camera. They were instructed to remain motionless, maintain a fixed gaze at the camera, and undergo two minutes of physiological data recording. Subsequently, while maintaining the same equipment setup in a seated position, subjects performed toothbrushing and hair-combing actions for two minutes, which was labeled as station 2. Station 3 represented a standing resting state, where subjects stood in front of the camera with the same sensor configuration as station 1, and data were collected for two minutes. Station 4 repeated the actions of station 2 but in a standing posture, also lasting two minutes. Following the completion of the first four conditions, subjects engaged in physical exercise (e.g., squats, high knees, or breath-holding) to induce physiological changes. Post-exercise, while maintaining the same sensor setup as station 1, subjects underwent an additional two-minute recording period, designated as station 5.
(A visual illustration of our daily data collection protocol.)
| | state-1 | state-2 | state-3 | state-4 | state-5 |
|---|---|---|---|---|---|
| face-rgb | ![]() | ![]() | ![]() | ![]() | ![]() |
| face-ir | ![]() | ![]() | ![]() | ![]() | ![]() |
| bvp | ![]() | ![]() | ![]() | ![]() | ![]() |
| rr | ![]() | ![]() | ![]() | ![]() | ![]() |
We introduce a novel design in the FusionNet module by incorporating a modality-aware fusion mechanism. Specifically, a gated feature selection strategy is employed to adaptively modulate the contribution of each modality based on its global contextual representation, thereby effectively integrating information from both facial RGB and facial IR video streams. This design enables the model to dynamically emphasize the more informative modality under varying environmental conditions (e.g., changes in illumination), significantly enhancing the robustness and generalizability of the physiological signal estimation framework.
(FusionPhys model with input frames of facial RGB and facial IR. PPG, RR, and SpO2 estimation tasks are trained simultaneously with a combined loss.)
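The gated feature selection described above can be sketched as follows. This is a minimal, illustrative PyTorch example: the class name, tensor shapes, and gating network are assumptions and do not reproduce the actual FusionPhys implementation.

```python
# Illustrative sketch of a gated, modality-aware fusion block (not the actual
# FusionPhys code). Each modality's global context drives a softmax gate that
# weights the RGB and IR feature maps before they are merged.
import torch
import torch.nn as nn

class GatedModalityFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2),
            nn.Softmax(dim=-1),
        )

    def forward(self, rgb_feat: torch.Tensor, ir_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, ir_feat: (B, C, T, H, W) feature maps from the two branches
        rgb_ctx = rgb_feat.mean(dim=(2, 3, 4))  # (B, C) global context per modality
        ir_ctx = ir_feat.mean(dim=(2, 3, 4))
        weights = self.gate(torch.cat([rgb_ctx, ir_ctx], dim=-1))  # (B, 2)
        w_rgb = weights[:, 0].view(-1, 1, 1, 1, 1)
        w_ir = weights[:, 1].view(-1, 1, 1, 1, 1)
        # Emphasize the more informative modality, e.g. IR under poor illumination
        return w_rgb * rgb_feat + w_ir * ir_feat

# Usage: fuse = GatedModalityFusion(channels=64); fused = fuse(rgb_feat, ir_feat)
```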
STEP1: bash setup.sh
STEP2: conda activate rppg-toolbox
STEP3: pip install -r requirements.txt
Please use config files under ./configs/train_configs/LADH_PHYSNET_*
STEP1: Download the LADH raw data by asking the paper authors.
STEP2: Modify ./configs/train_configs/LADH_PHYSNET_face_RGB_IR_both.yaml
STEP3: Run python main.py --config_file ./configs/train_configs/LADH_PHYSNET_face_RGB_IR_both.yaml --r_lr 9e-3 --epochs 30 --path res_30_9e-3/face_RGB_IR_both
Note1: Preprocessing only needs to run once; turn it off in the yaml file when training the network after the first run.
Note2: The example yaml setting uses 70% of LADH (states 1, 2, 3, 4, 5) for training, 20% for validation, and 20% for testing. After training, the best model (the one with the lowest validation loss) is used to test on LADH. (This is the day-wise partitioning experiment.)
Note3: You can set the learning rate, epochs, and save path via the --r_lr, --epochs, and --path arguments.
The rPPG-Toolbox uses yaml files to control all parameters for training and evaluation. You can modify the existing yaml files to meet your own training and testing requirements.
Here is an explanation of some parameters:
- TOOLBOX_MODE:
  - train_and_test: train on the dataset and use the newly trained model to test.
  - only_test: you need to set INFERENCE.MODEL_PATH, and the pre-trained model initialized with MODEL_PATH will be used to test.
- Task setting:
  - bvp: only BVP => HR.
  - spo2: only SpO2.
  - rr: only RR.
  - both: BVP => HR, plus SpO2 and RR.
- Video modality:
  - face: only RGB video.
  - face_IR: only IR video.
  - both: RGB and IR video.
- DATA.INFO.STATE: filter the dataset by the 5 states, e.g., [1, 2, 3, 4, 5].
- DATA.INFO.TYPE: 1 stands for face (RGB), 2 stands for face_IR, e.g., [1, 2].
- DATA.DATASET_TYPE: the type of dataset: face, face_IR, or both.
- DATA_PATH: the input path of the raw data.
- CACHED_PATH: the output path for preprocessed data. This path also houses a directory of .csv files containing data paths to files loaded by the dataloader. This file list (found by default at CACHED_PATH/DataFileLists) can be viewed to see which files are used in each data split (train/val/test).
- EXP_DATA_NAME: if it is "", the toolbox generates an EXP_DATA_NAME based on other defined parameters; otherwise, it uses the user-defined EXP_DATA_NAME.
- BEGIN & END: the portion of the dataset used for training/validation/testing. For example, if DATASET is PURE, BEGIN is 0.0, and END is 0.8 under TRAIN, the first 80% of PURE is used for training the network. If DATASET is PURE, BEGIN is 0.8, and END is 1.0 under VALID, the last 20% of PURE is used as the validation set. Note that the validation and training sets do not have overlapping subjects. (See the sketch after this list.)
- DATA_TYPE: how to preprocess the video data.
- LABEL_TYPE: how to preprocess the label data.
- DO_CHUNK: whether to split the raw data into smaller chunks.
- CHUNK_LENGTH: the length of each chunk (number of frames).
- CROP_FACE: whether to perform face detection.
- DYNAMIC_DETECTION: if False, face detection is only performed on the first frame, and the detected box is used to crop the video for all subsequent frames. If True, face detection is performed at a frequency defined by DYNAMIC_DETECTION_FREQUENCY.
- DYNAMIC_DETECTION_FREQUENCY: the frequency of face detection (number of frames) if DYNAMIC_DETECTION is True.
- LARGE_FACE_BOX: whether to enlarge the rectangle of the detected face region in case the detected box is not large enough for some special cases (e.g., motion videos).
- LARGE_BOX_COEF: the coefficient of enlargement. See more details at https://github.com/ubicomplab/rPPG-Toolbox/blob/main/dataset/data_loader/BaseLoader.py#L162-L165.
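As referenced in the BEGIN & END item above, the fractions select a contiguous portion of the ordered file list for each split. The sketch below is illustrative only; the actual splitting logic lives in the toolbox's dataset loaders, and the file names are hypothetical.

```python
# Illustrative only: how BEGIN/END fractions map to train/valid/test portions.
# The real splitting logic is implemented in the toolbox's dataset loaders.
def select_split(files, begin: float, end: float):
    n = len(files)
    return files[int(n * begin):int(n * end)]

days = [f"day_{i:02d}" for i in range(1, 11)]   # hypothetical 10 recording days
train = select_split(days, 0.0, 0.7)            # first 70% -> 7 days
valid = select_split(days, 0.7, 0.9)            # next 20%  -> 2 days
test = select_split(days, 0.9, 1.0)             # last 10%  -> 1 day
print(len(train), len(valid), len(test))        # 7 2 1
```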
- The toolbox supports the LADH dataset. Please cite the corresponding papers when using it.
- In order to use this dataset in a deep model, you should organize the files as follows:
    data/LADH/
    |-- 12_05/
    |   |-- p_12_05_caip
    |   |   |-- v01
    |   |   |   |-- BVP.csv
    |   |   |   |-- HR.csv
    |   |   |   |-- RR.csv
    |   |   |   |-- SpO2.csv
    |   |   |   |-- frames_timestamp_IR.csv
    |   |   |   |-- frames_timestamp_RGB.csv
    |   |   |   |-- video_RGB_H264.avi
    |   |   |   |-- video_IR_H264.avi
    |   |   |-- v02
    |   |   |-- v03
    |   |   |-- v04
    |   |   |-- v05
    |   |-- p_12_05_huangxj
    |   |   |-- v01
    |   |   |   |-- ...
    |   |   |-- v02
    |   |   |-- v03
    |   |   |-- v04
    |   |   |-- v05
    |   |-- p_12_05_liutj
    |   |-- p_12_05_lujg
    |   |-- ...
    |-- 12_06/
    |   |-- p_12_06_caip
    |   |-- p_12_06_huangxj
    |   |-- p_12_06_liutj
    |   |-- p_12_06_lujg
    |   |-- ...
    |-- ...
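As a quick sanity check of the layout above, one could enumerate all recording sessions like this. This is a minimal sketch assuming the directory structure shown; the helper is illustrative and not part of the toolbox.

```python
# Minimal sketch: list every day/participant/session folder under data/LADH,
# assuming the directory layout shown above (not part of the rPPG-Toolbox API).
from pathlib import Path

def list_sessions(root: str = "data/LADH"):
    sessions = []
    for day_dir in sorted(Path(root).iterdir()):          # e.g. 12_05/
        if not day_dir.is_dir():
            continue
        for subject_dir in sorted(day_dir.iterdir()):      # e.g. p_12_05_caip/
            if not subject_dir.is_dir():
                continue
            for session_dir in sorted(subject_dir.glob("v*")):  # v01 ... v05
                sessions.append(session_dir)
    return sessions

if __name__ == "__main__":
    for session in list_sessions():
        print(session)
```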
In the subject-wise partitioning experiment, multimodal fusion with joint training outperforms single-modality and single-task approaches, particularly for HR and RR estimation. The dataset was partitioned such that data from 8 subjects were used for training, 3 subjects for validation, and an additional dataset from 10 individuals was reserved for testing. The results indicated significant improvements in the MAE for HR, which decreased from 9.02 to 7.12, reflecting a 21.06% error reduction, and for RR, which decreased from 2.25 to 1.43, reflecting a 36.44% error reduction. This suggests that multimodal fusion and joint training are more effective for periodic tasks like HR and RR, while SpO2 does not exhibit clear periodic fluctuations and is inferred through indirect signals.
TABLE 1: RESULTS OF HR-SpO₂-RR MULTI-TASK TRAINING BY SUBJECT
| Modality | HR MAE↓ | HR MAPE↓ | SpO₂ MAE↓ | SpO₂ MAPE↓ | RR MAE↓ | RR MAPE↓ |
|---|---|---|---|---|---|---|
| Both (Single Task) | 9.02 | 10.99 | 1.10 | 1.19 | 2.25 | 10.16 |
| RGB (Multi Task) | 9.34 | 12.08 | 1.29 | 1.39 | 3.08 | 13.78 |
| IR (Multi Task) | 12.99 | 15.73 | 1.23 | 1.33 | 2.41 | 11.20 |
| Both (Multi Task) | 7.12 | 8.93 | 1.14 | 1.23 | 1.43 | 6.53 |
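The error-reduction percentages quoted above follow directly from the MAE values in the table; as a quick, purely arithmetic check:

```python
# Relative error reduction of multimodal multi-task training vs. the
# single-task baseline, computed from the MAE values in the table above.
def error_reduction(baseline: float, improved: float) -> float:
    return (baseline - improved) / baseline * 100

print(f"HR: {error_reduction(9.02, 7.12):.2f}% reduction")  # ~21.06%
print(f"RR: {error_reduction(2.25, 1.43):.2f}% reduction")  # ~36.44%
```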
In the day-wise partitioning experiment, multimodal fusion with joint training improves HR estimation, and multi-task learning benefits SpO2 and RR estimation. In this experiment, data collected over 10 days were split into 7 days for training, 2 days for validation, and 1 day for testing. The results showed that for HR estimation, multimodal fusion with joint training outperformed single-modality and single-task approaches, reducing MAE from 5.23 to 4.99 (a 4.59% error reduction). In IR-based joint training, errors for SpO2 and RR were reduced by 2.29% and 41.25%, respectively. This highlights the effectiveness of multimodal fusion for HR and multi-task learning for SpO2 and RR.
TABLE 2: RESULTS OF HR-SpO₂-RR MULTI-TASK TRAINING BY DAY
| Modality | HR MAE↓ | HR MAPE↓ | SpO₂ MAE↓ | SpO₂ MAPE↓ | RR MAE↓ | RR MAPE↓ |
|---|---|---|---|---|---|---|
| Both (Single Task) | 5.23 | 5.44 | 1.31 | 1.38 | 2.57 | 13.45 |
| RGB (Multi Task) | 5.73 | 5.77 | 1.35 | 1.43 | 1.99 | 9.12 |
| IR (Multi Task) | 8.35 | 8.98 | 1.28 | 1.36 | 1.51 | 6.74 |
| Both (Multi Task) | 4.99 | 5.21 | 1.29 | 1.37 | 2.24 | 11.38 |
Comparison of the subject-wise and day-wise experiments illustrates how day-wise analysis can improve the adaptability of models to individual user data. While the subject-wise experiment shows strong performance for periodic tasks through multimodal fusion and joint training, the day-wise experiment emphasizes the ability of the model to adapt more closely to individual data. This could indicate that, in future personalized health monitoring systems, such as a health mirror, models can better accommodate daily variations and offer more tailored results to users, enhancing the accuracy of HR, RR, and SpO2 estimation on an individual level.
The table below shows the Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) of classical unsupervised rPPG methods on the LADH dataset.
Test set: LADH
| Method | ICA | POS | CHROM | GREEN | LGI | PBV | OMIT |
|---|---|---|---|---|---|---|---|
| MAE↓ | 22.09 | 11.27 | 12.34 | 26.67 | 19.54 | 21.73 | 19.52 |
| MAPE↓ | 23.99 | 12.17 | 13.34 | 29.21 | 21.21 | 23.71 | 21.19 |
The table below shows the cross-dataset results of the PhysNet model trained on LADH, SUMS, and PURE (columns) and tested on each dataset (rows).
Model: PhysNet
| Test Set | Train LADH MAE↓ | Train LADH MAPE↓ | Train SUMS MAE↓ | Train SUMS MAPE↓ | Train PURE MAE↓ | Train PURE MAPE↓ |
|---|---|---|---|---|---|---|
| LADH | 8.15 | 9.19 | 16.93 | 18.20 | 17.00 | 18.78 |
| SUMS | 11.23 | 15.45 | 3.36 | 3.84 | 14.95 | 17.11 |
| PURE | 8.10 | 8.83 | 7.97 | 8.87 | 0.59 | 0.77 |
The dataset contains 240 videos from 21 subjects and is 133.22 GB in total.
The dataset can be downloaded via OneDrive or Baidu Netdisk.
To access the dataset, please download and complete the data release agreement.
Please scan and send the completed agreement from your institutional email to tjk24@mails.tsinghua.edu.cn and cc yuntaowang@tsinghua.edu.cn, using the subject line 'LADH Access Request - your institution'. In the email, include your institution's website and relevant publications, and describe the intended use of LADH in your specific research project. The email should be sent by a faculty member rather than a student.
Title: Non-Contact Health Monitoring During Daily Personal Care Routines
Xulin Ma, Jiankai Tang, Zhang Jiang, Songqin Cheng, Yuanchun Shi, Dong Li, Xin Liu, Daniel McDuff, Xiaojing Liu, Yuntao Wang, "Non-Contact Health Monitoring During Daily Personal Care Routines", IEEE BSN, 2025.
@inproceedings{ma2025non,
title={Non-Contact Health Monitoring During Daily Personal Care Routines},
author={Ma*, Xulin and Tang*, Jiankai and Jiang, Zhang and Cheng, Songqin and Shi, Yuanchun and Li, Dong and Liu, Xin and McDuff, Daniel and Liu, Xiaojing and Wang, Yuntao},
booktitle={IEEE BSN 2025},
year={2025}
}
@inproceedings{tang2023mmpd,
title={MMPD: Multi-Domain Mobile Video Physiology Dataset},
author={Tang, Jiankai and Chen, Kequan and Wang, Yuntao and Shi, Yuanchun and Patel, Shwetak and McDuff, Daniel and Liu, Xin},
booktitle={2023 45th Annual International Conference of the IEEE Engineering in Medicine \& Biology Society (EMBC)},
pages={1--5},
year={2023},
organization={IEEE}
}
@inproceedings{liu2024rppg,
title={rPPG-Toolbox: Deep Remote PPG Toolbox},
author={Liu, Xin and Narayanswamy, Girish and Paruchuri, Akshay and Zhang, Xiaoyu and Tang, Jiankai and Zhang, Yuzhe and Sengupta, Roni and Patel, Shwetak and Wang, Yuntao and McDuff, Daniel},
booktitle={Advances in Neural Information Processing Systems},
volume={36},
year={2024}
}
@inproceedings{liu2024summit,
title={Summit Vitals: Multi-Camera and Multi-Signal Biosensing at High Altitudes},
author={Liu*, Ke and Tang*, Jiankai and Jiang, Zhang and Wang, Yuntao and Liu, Xiaojing and Li, Dong and Shi, Yuanchun},
booktitle={2024 IEEE Smart World Congress (SWC)},
pages={284--291},
year={2024}
}