Using strace in Linux to troubleshoot database performance issues

Using strace in Linux to troubleshoot database performance issues

Use Case:

Using top and iostat, we know that the database is slow due to IO constraints. What if we want to narrow down to the exact file causing the performance issues? What if we want to know what is the DB process waiting on?

Find the PID of the process

[root@hostname ~]# ps -aef | grep mysqld
mysql 16013 15513 23 Dec08 ? 1-00:17:41 /usr/sbin/mysqld --basedir=/usr --datadir=/data/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsrep-provider=/usr/lib64/libgalera_smm.so --log-error=/var/log/mysql/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --wsrep_start_position=a09ed4ce-79bc-11e6-9b2e-63d190df3dd7:42170895

Find out which system calls took the longest time

[root@hostname ~]# strace -cp 16013
Process 16013 attached
^CProcess 16013 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.63 28.601334 147430 194 poll
 1.33 0.387070 387070 1 restart_syscall
 0.03 0.008057 22 359 futex
 0.00 0.000891 5 195 accept
 0.00 0.000369 2 195 fcntl
------ ----------- ----------- --------- --------- ----------------
100.00 28.997721 944 total

Attach strace to the PID to find out the file descriptor it is waiting for.

[root@hostname ~]# strace -p 16013
Process 16013 attached
restart_syscall(<... resuming interrupted call ...>) = 1
accept(34, {sa_family=AF_LOCAL, NULL}, [2]) = 128
fcntl(128, F_SETFD, FD_CLOEXEC) = 0
futex(0x1e937c4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1e937c0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1e93800, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=50, events=POLLIN}, {fd=34, events=POLLIN}], 2, 4294967295) = 1 ([{fd=34, revents=POLLIN}])
accept(34, {sa_family=AF_LOCAL, NULL}, [2]) = 128
fcntl(128, F_SETFD, FD_CLOEXEC) = 0
futex(0x1e937c4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1e937c0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1e93800, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=50, events=POLLIN}, {fd=34, events=POLLIN}], 2, 4294967295) = 1 ([{fd=34, revents=POLLIN}])
accept(34, {sa_family=AF_LOCAL, NULL}, [2]) = 128
fcntl(128, F_SETFD, FD_CLOEXEC) = 0

Find the file that the file descriptor is pointing to!

ls -altr /proc/16013
lrwx------. 1 root root 64 Dec 12 15:46 50 -> /data/mysql/zabbix/history_uint#P#p201612160000.ibd
lrwx------. 1 root root 64 Dec 12 15:46 51 -> /data/mysql/zabbix/history_uint#P#p201612170000.ibd

Now we know mysqld process has been waiting for partitioned data file history_uint#P#p201612160000.ibd!

Hope this helps.

Regards,
Wei Shan

Advertisements
  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: