Skip to main content Link Menu Expand (external link) Document Search Copy Copied

Collect Data

In our experiments, we use Prometheus to monitor the performance of the database clusters. Then, we use prometheus_api_client to collect the data from Prometheus. Please make sure you have finished the setup before you start.

You can find the code for collecting data in collect_data.py. You can directly execute the script to collect and pre-process the data.

Usage:

python collect_data.py [-h] [--database {mysql,tidb}]
                       [--start_time START_TIME]
                       [--duration DURATION]
                       [--prometheus_url PROMETHEUS_URL]
                       [--exporter_url EXPORTER_URL]
                       [--query_list QUERY_LIST]
                       [--query_info QUERY_INFO]
                       [--drop_list DROP_LIST]
                       [--raw_data_output RAW_DATA_OUTPUT]
                       [--output_file_dr OUTPUT_FILE_DIR]

Here is the arguments table for the script:

Argument Description Special Remark
database Select which database to be collected. str, only two choice: mysql and tidb.
start_time Start time for data collection. int, should be Unix timestamp.
duration Duration (hours) for data collection, i.e., the time range of collected data. int.
prometheus_url The url to access Prometheus service. str, e.g., http://localhost:9090.
exporter_url The url of database exporter service. str, use kubectl -n ${namespace} svc to find it, or you can find it in Grafana’s dashboard.
query_list Path to query list file (should be txt file). str, we have provided our query list file in our repo.
query_info Path to query_csv file (should be csv file). str, we have provided our query info file in our repo.
drop_list Path to dropped table list file (should be csv file). str, we have provided our drop list file in our repo. Note that only MySQL has the drop list.
raw_data_output Path to output directory of raw data. str.
output_file_dir Path to the dir of output file. str, two file, combined.csv and blip.csv will be outputted in this directory. The latter will be fed to BLIP to learn the causal Graph.