Collect Data
In our experiments, we use Prometheus to monitor the performance of the database clusters. We then use prometheus_api_client to pull the collected metrics from Prometheus. Please make sure you have finished the setup before you start.
The data collection code is in collect_data.py; you can execute the script directly to collect and pre-process the data.
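For reference, the core of the collection step looks roughly like the following. This is a minimal sketch of how prometheus_api_client can pull a metric over a time range; the URL and the PromQL query here are placeholders, not the exact queries issued by collect_data.py:

```python
from datetime import datetime, timedelta

from prometheus_api_client import MetricRangeDataFrame, PrometheusConnect

# Placeholder URL; use your own --prometheus_url value.
prom = PrometheusConnect(url="http://localhost:9090", disable_ssl=True)

end = datetime.now()
start = end - timedelta(hours=2)  # corresponds to a --duration of 2

# Example PromQL query; collect_data.py issues its own set of queries.
samples = prom.custom_query_range(
    query="rate(mysql_global_status_queries[1m])",
    start_time=start,
    end_time=end,
    step="15s",
)

# Flatten the Prometheus response into a pandas DataFrame for pre-processing.
df = MetricRangeDataFrame(samples)
print(df.head())
```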
Usage:

```
python collect_data.py [-h] [--database {mysql,tidb}]
                       [--start_time START_TIME]
                       [--duration DURATION]
                       [--prometheus_url PROMETHEUS_URL]
                       [--exporter_url EXPORTER_URL]
                       [--query_list QUERY_LIST]
                       [--query_info QUERY_INFO]
                       [--drop_list DROP_LIST]
                       [--raw_data_output RAW_DATA_OUTPUT]
                       [--output_file_dir OUTPUT_FILE_DIR]
```
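For example, an invocation for MySQL might look like the following (all paths, URLs, and the timestamp are placeholders for your own setup; port 9104 is mysqld_exporter's default):

```
python collect_data.py \
    --database mysql \
    --start_time 1700000000 \
    --duration 2 \
    --prometheus_url http://localhost:9090 \
    --exporter_url http://localhost:9104 \
    --query_list ./query_list.txt \
    --query_info ./query_info.csv \
    --drop_list ./drop_list.csv \
    --raw_data_output ./raw_data \
    --output_file_dir ./output
```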
The script's arguments are described in the table below:
| Argument | Description | Special Remark |
|---|---|---|
| database | Select which database to collect from. | str, only two choices: mysql and tidb. |
| start_time | Start time for data collection. | int, a Unix timestamp (see the example after this table). |
| duration | Duration (in hours) of data collection, i.e., the time range of the collected data. | int. |
| prometheus_url | The URL of the Prometheus service. | str, e.g., http://localhost:9090. |
| exporter_url | The URL of the database exporter service. | str, use kubectl -n ${namespace} get svc to find it, or look it up in Grafana's dashboard. |
| query_list | Path to the query list file (a .txt file). | str, we have provided our query list file in our repo. |
| query_info | Path to the query info file (a .csv file). | str, we have provided our query info file in our repo. |
| drop_list | Path to the dropped-table list file (a .csv file). | str, we have provided our drop list file in our repo. Note that only MySQL has a drop list. |
| raw_data_output | Path to the output directory for raw data. | str. |
| output_file_dir | Path to the output file directory. | str, two files, combined.csv and blip.csv, will be written to this directory. The latter is fed to BLIP to learn the causal graph. |
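As noted in the start_time row, the script expects a Unix timestamp. A quick way to compute one, e.g., for a window beginning two hours ago (adjust the offset to your own collection window):

```python
import time

# Unix timestamp for "two hours ago"; pass this value to --start_time.
start_time = int(time.time()) - 2 * 3600
print(start_time)
```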