IRIDIA-AF, a large paroxysmal atrial fibrillation long-term electrocardiogram monitoring database

The database (ref. 14) is available on Zenodo. The IRIDIA-AF database is composed of a general metadata file and 167 folders, one for each record in the database. Each record folder contains the ECG waveform from the Holter recording and the associated annotations, as well as the RR intervals and their annotations. This section describes the composition of the data repository, which is illustrated in Fig. 4.

Fig. 4

File composition of the IRIDIA-AF database.

General metadata

We provide the general metadata for all records in a single table, stored in CSV format, as shown in Fig. 5. The file contains multiple columns with information about the patient and the record. The first columns contain information about the patient:

  1. patient_id: the identifier of the patient;

  2. patient_sex: the sex of the patient, i.e. male or female;

  3. patient_age: the age of the patient on the day of the record, or on the first day of recording if the record spans multiple days.

The following columns contain general information about the record itself:

  4. record_id: the identifier of the record;

  5. record_date: the (shifted) date of the record;

  6. record_start_time: the start time of the record, in ISO 8601 format;

  7. record_end_time: the initial end time of the record, i.e. before any end-of-record correction, in ISO 8601 format;

  8. record_timedelta: the time delta, in seconds, between the start and the end of the record.

Finally, the following columns contain information about the files:

  9. record_files: the number of ECG files for the record;

  10. record_seconds: the actual number of seconds in the record; this can differ from record_timedelta because the end of the record was corrected when noise was present;

  11. record_samples: the actual number of samples across all ECG files after the end-of-file correction.
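As an illustration, the metadata file can be read with Python's standard csv module. The sketch below is a minimal example, assuming a comma-separated file with a header row whose names match the columns listed above, and assuming the numeric columns are stored as plain integers or decimals. The helper names (read_metadata, trimmed_seconds, samples_consistent) are hypothetical, and samples_consistent relies on the assumption that record_samples equals record_seconds × 200 Hz, which the text does not state explicitly.

```python
import csv

# Assumption: these columns are numeric; every other column is kept as a string.
INT_COLUMNS = ("record_files", "record_samples")
FLOAT_COLUMNS = ("patient_age", "record_timedelta", "record_seconds")
FS_HZ = 200  # ECG sampling frequency given in the text


def read_metadata(lines):
    """Parse the general metadata CSV into one dict per record."""
    rows = []
    for row in csv.DictReader(lines):
        for col in INT_COLUMNS:
            row[col] = int(row[col])
        for col in FLOAT_COLUMNS:
            row[col] = float(row[col])
        rows.append(row)
    return rows


def trimmed_seconds(row):
    """Seconds removed by the end-of-record correction (0.0 if nothing was removed)."""
    return row["record_timedelta"] - row["record_seconds"]


def samples_consistent(row, tol=FS_HZ):
    """Sanity check (assumption): record_samples is about record_seconds x 200 Hz."""
    return abs(row["record_samples"] - row["record_seconds"] * FS_HZ) <= tol
```

With a real file, pass an open handle such as `open(path, newline="")`, as the csv module recommends.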

Fig. 5

Content of the first and last lines in the general metadata file.

Patient ages range from 41 to 99 years, with a mean age of 72 ± 11 years; the distribution is presented in Fig. 6. 53.2% are male and 46.7% are female. The mean CHA2DS2-VASc score is 3.16, ranging from 1 to 9. Holter recordings are split into 24-hour files, and most records (n = 103) contain only one day of recording, as shown in Fig. 7. In total, 388 AF episodes were recorded. Most records contain only one (n = 96) or two (n = 31) AF episodes, but some contain up to 12 episodes, as shown in Fig. 8.
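The summary statistics above can be recomputed from the parsed metadata rows. This is a minimal sketch, assuming rows are dicts keyed by the metadata column names (patient_id, patient_sex, patient_age). cohort_summary is a hypothetical helper, and whether the reported figures were computed per patient or per record is not stated, so that choice is exposed as an assumption through the per_patient flag.

```python
from collections import Counter
from statistics import mean, stdev


def cohort_summary(rows, per_patient=True):
    """Age range, mean and SD, and sex percentages from metadata rows.

    With per_patient=True, only the first row of each patient_id is kept, so a
    patient contributing several records is counted once (an assumption).
    """
    if per_patient:
        first_by_patient = {}
        for row in rows:
            first_by_patient.setdefault(row["patient_id"], row)
        rows = list(first_by_patient.values())
    if len(rows) < 2:
        raise ValueError("need at least two rows to compute a standard deviation")
    ages = [float(r["patient_age"]) for r in rows]
    sex_counts = Counter(r["patient_sex"] for r in rows)
    return {
        "age_min": min(ages),
        "age_max": max(ages),
        "age_mean": mean(ages),
        "age_sd": stdev(ages),  # sample standard deviation
        "sex_pct": {sex: 100.0 * n / len(rows) for sex, n in sex_counts.items()},
    }
```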

Fig. 6

Distribution of patient age.

Fig. 7

Distribution of the number of record days (continuous periods of 24 hours) per record.

Fig. 8

Distribution of the number of AF episodes per record.

ECG waveform data

The ECG waveform data are stored in HDF5 format as an array of shape L × 2, where the two columns correspond to the two leads (lead I and lead II) and L is the number of samples, i.e. the number of seconds × the sampling frequency (200 Hz). HDF5 is designed for data storage and is supported by a wide variety of programming languages. In addition, compression reduces the dataset size without any loss of information, and the data can be loaded in slices rather than loading the whole file into memory. Each record is split into multiple 24-hour parts, each stored in a separate HDF5 file named after the record identifier and a part index, e.g. record_000_ecg_00.h5 for the first 24 hours of the record and record_000_ecg_01.h5 for the second 24 hours. The number of available ECG files is given in the record_files column of the general metadata file. Note that the first 30 seconds of the record, i.e. indices 0 to 5999 (30 s × 200 Hz = 6000 samples), correspond to the calibration phase of the recording device, as shown in Fig. 9.
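The file naming and index arithmetic described above can be sketched as follows. This is a minimal example, assuming that every file except the last holds exactly 24 hours of signal and that the waveform is the first dataset inside each HDF5 file (its name is not given in the text, so inspect `f.keys()` on a real file). The helper names are hypothetical; load_ecg_window uses h5py, which is imported only when that function is called.

```python
from __future__ import annotations

FS_HZ = 200                       # sampling frequency stated in the text
SECONDS_PER_FILE = 24 * 3600      # assumption: each file except the last holds 24 h
SAMPLES_PER_FILE = SECONDS_PER_FILE * FS_HZ
CALIBRATION_SAMPLES = 30 * FS_HZ  # 30 s x 200 Hz = 6000 samples (indices 0..5999)


def ecg_file_name(record_id: str, part: int) -> str:
    """Name of the part-th 24-hour ECG file, e.g. record_000_ecg_01.h5 for part 1."""
    return f"{record_id}_ecg_{part:02d}.h5"


def locate_sample(seconds_from_start: float) -> tuple[int, int]:
    """Map an offset (s) from the record start to (file index, sample index in that file)."""
    global_index = round(seconds_from_start * FS_HZ)
    return divmod(global_index, SAMPLES_PER_FILE)


def load_ecg_window(path: str, start: int, stop: int):
    """Read samples [start, stop) of both leads without loading the whole file.

    Assumption: the waveform is the first dataset in the file.
    """
    import h5py  # imported lazily so the helpers above work without h5py installed

    with h5py.File(path, "r") as f:
        dataset = f[next(iter(f.keys()))]
        return dataset[start:stop, :]  # NumPy array of shape (stop - start, 2)
```

For example, `load_ecg_window(ecg_file_name("record_077", 0), 0, CALIBRATION_SAMPLES)` would return the calibration segment at the start of a record, read as a slice rather than as the full 24-hour array.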

Fig. 9

Calibration phase over the first 30 seconds of the ECG record record_077.

ECG waveform annotations

For each record, an ECG annotation file lists every AF crisis, each consisting of one AF onset, i.e. a transition from normal sinus rhythm (NSR) to AF, and one AF termination, i.e. a transition from AF to NSR. Each line describes one crisis with the following fields:

  1. start_datetime: the date and time of the AF onset, in ISO 8601 format;

  2. start_file_index: the index of the ECG file in which the AF starts;

  3. start_qrs_index: the index of the QRS complex at which the AF starts, i.e. the first beat in AF;

  4. end_datetime: the date and time of the AF termination, in ISO 8601 format;

  5. end_file_index: the index of the ECG file in which the AF ends;

  6. end_qrs_index: the index of the QRS complex at the AF termination, i.e. the first NSR beat after the AF termination;

  7. af_duration: the duration of the AF crisis, in seconds;

  8. nsr_before_duration: the duration of NSR before the AF onset, i.e. the time between the previous AF crisis (or the start of the record) and this AF onset.

An example is presented in Fig. 10. We chose the keywords start and end to denote AF onset and AF termination to make the file content as easy as possible to understand. Because records are split into 24-hour files that need not begin at midnight, an AF crisis can start on calendar day d, end on calendar day d + 1, and still lie within the same 24-hour file. AF can also extend over several days of recording, e.g. an AF crisis can start in file 0 and end in file 1.
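A minimal sketch of reading one ECG annotation file with the standard library and flagging the two situations described above: a crisis that crosses midnight, and a crisis that spans several 24-hour files. It assumes a comma-separated file with a header row using the field names listed above, and timestamps that `datetime.fromisoformat` accepts. The helper names are hypothetical, and comparing timestamp_duration_s with af_duration is a simple consistency check, not a documented property of the dataset.

```python
import csv
from datetime import datetime


def read_af_episodes(lines):
    """Parse an ECG annotation CSV (one AF crisis per line) into typed dicts."""
    episodes = []
    for row in csv.DictReader(lines):
        episodes.append({
            "start": datetime.fromisoformat(row["start_datetime"]),
            "end": datetime.fromisoformat(row["end_datetime"]),
            "start_file_index": int(row["start_file_index"]),
            "end_file_index": int(row["end_file_index"]),
            "af_duration": float(row["af_duration"]),
        })
    return episodes


def crosses_file_boundary(episode):
    """True if the crisis starts in one 24-hour file and ends in a later one."""
    return episode["end_file_index"] > episode["start_file_index"]


def crosses_midnight(episode):
    """True if onset and termination fall on different calendar days."""
    return episode["end"].date() > episode["start"].date()


def timestamp_duration_s(episode):
    """AF duration recomputed from the timestamps, to compare with af_duration."""
    return (episode["end"] - episode["start"]).total_seconds()
```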

Fig. 10

Content of the ECG annotation file of record record_001.

RR intervals data

The RR interval data files contain RR intervals derived from the automatic QRS annotations produced by Microport Syneview. The RR intervals are expressed in milliseconds. The data are stored in HDF5 format as an array of length L, where L is the number of RR intervals. As for the ECG, the first 30 seconds of RR intervals correspond to the calibration phase; the first 30 RR intervals are therefore equal to 1000 ms. Note that this number may vary slightly from one file to another, as the Microport automatic annotation does not always analyse this phase identically. As for the ECG waveform data, each record day is stored in a separate file.
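Because the number of calibration intervals varies slightly between files, it can be measured rather than hard-coded. The sketch below is a minimal example, assuming the RR array of one file has already been loaded as a sequence of values in milliseconds. The helper names are hypothetical, and a genuine 1000 ms interval immediately after calibration would also be counted.

```python
from __future__ import annotations

from itertools import accumulate

CALIBRATION_RR_MS = 1000.0  # RR value reported during the 30 s calibration phase


def calibration_length(rr_ms, value: float = CALIBRATION_RR_MS, tol: float = 0.0) -> int:
    """Count the leading RR intervals equal to the calibration value (within tol ms)."""
    n = 0
    for rr in rr_ms:
        if abs(rr - value) > tol:
            break
        n += 1
    return n


def beat_times_s(rr_ms) -> list[float]:
    """Times (s) of beats 1..L relative to beat 0, from cumulative RR intervals in ms."""
    return [t / 1000.0 for t in accumulate(rr_ms)]
```

For example, `rr[calibration_length(rr):]` drops the calibration intervals before computing heart-rate statistics.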

RR intervals annotations

The RR interval annotation files map each AF crisis onto the automatically detected QRS complexes. The data are stored in a CSV file containing one line for each AF crisis in the record, as shown in Fig. 11, with the following fields:

  1. start_file_index: the index of the file (record day) containing the AF onset;

  2. start_rr_index: the index, in the corresponding file, at which the AF starts, i.e. the RR interval whose first beat is in NSR and whose second beat is in AF;

  3. end_file_index: the index of the file (record day) containing the AF termination;

  4. end_rr_index: the index, in the corresponding file, at which the AF ends, i.e. the RR interval whose first beat is in AF and whose second beat is in NSR.
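Since a crisis can start in one file and end in another, extracting its RR intervals means concatenating slices from consecutive files. This is a minimal sketch, assuming rr_files[i] holds the RR intervals of file i and assuming a half-open range [start_rr_index, end_rr_index), i.e. the AF-to-NSR interval at end_rr_index is excluded; the text does not specify this boundary convention, so pass end_rr + 1 to include that interval. af_rr_segment is a hypothetical helper.

```python
from __future__ import annotations

from typing import Sequence


def af_rr_segment(rr_files: Sequence[Sequence[float]], start_file: int, start_rr: int,
                  end_file: int, end_rr: int) -> list[float]:
    """Concatenate the RR intervals of one AF crisis, which may span several files.

    Assumption: half-open range, start_rr included and end_rr excluded.
    """
    if (end_file, end_rr) < (start_file, start_rr):
        raise ValueError("AF termination precedes AF onset")
    segment: list[float] = []
    for i in range(start_file, end_file + 1):
        rr = rr_files[i]
        lo = start_rr if i == start_file else 0      # onset file starts at start_rr
        hi = end_rr if i == end_file else len(rr)    # intermediate files are taken whole
        segment.extend(rr[lo:hi])
    return segment
```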

Fig. 11

Content of the RR interval annotation file of record record_001.
