Archive for category Linux

PostgreSQL 9.6 on RHEL 7.2 -FATAL: semctl(SETVAL) failed: Invalid argument

PostgreSQL 9.6 on RHEL 7.2 -FATAL: semctl(SETVAL) failed: Invalid argument

The error below started occurring once we upgraded to PostgreSQL 9.6 on RHEL 7.2. It was strange because all we did was had a user to login, check the status of pgsql and logout.

 Dec 7 22:50:01 hostname systemd: Created slice user-1004.slice.
 Dec 7 22:50:01 hostname systemd: Starting user-1004.slice.
 Dec 7 22:50:01 hostname systemd: Started Session 2243 of user postgres.
 Dec 7 22:50:01 hostname systemd: Starting Session 2243 of user postgres.
 Dec 7 22:50:01 hostname systemd: Created slice user-0.slice.
 Dec 7 22:50:01 hostname systemd: Starting user-0.slice.
 Dec 7 22:50:01 hostname systemd: Started Session 2244 of user root.
 Dec 7 22:50:01 hostname systemd: Starting Session 2244 of user root.
 Dec 7 22:50:01 hostname postgres[12064]: [432-1] [unknown]: [local]: 58481389.2f20: LOG: connection received: host=[local]
 Dec 7 22:50:01 hostname postgres[12064]: [433-1] postgres: [local]: 58481389.2f20: FATAL: semctl(153976863, 2, SETVAL, 0) failed: Invalid argument
 Dec 7 22:50:01 hostname postgres[21799]: [432-1] : : 58444abd.5527: LOG: server process (PID 12064) exited with exit code 1
 Dec 7 22:50:01 hostname postgres[21799]: [433-1] : : 58444abd.5527: LOG: terminating any other active server processes
 Dec 7 22:50:01 hostname postgres[11514]: [434-1] repmgr: <IP_ADDRESS>(41171): 58480edb.2cfa: WARNING: terminating connection because of crash of another server process
 Dec 7 22:50:01 hostname postgres[11514]: [434-2] repmgr: <IP_ADDRESS>(41171): 58480edb.2cfa: DETAIL: The postmasteIP_ADDRESSr has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
 Dec 7 22:50:01 hostname postgres[11514]: [434-3] repmgr: <IP_ADDRESS>(41171): 58480edb.2cfa: HINT: In a moment you should be able to reconnect to the database and repeat your command.
 Dec 7 22:50:01 hostname postgres[12074]: [434-1] [unknown]: ::1(48077): 58481389.2f2a: LOG: connection received: host=::1 port=48077
 Dec 7 22:50:01 hostname postgres[12074]: [435-1] repmgr: ::1(48077): 58481389.2f2a: FATAL: the database system is in recovery mode
 Dec 7 22:50:01 hostname Backup: [INFO]DB check start.
 Dec 7 22:50:01 hostname postgres[21799]: [434-1] : : 58444abd.5527: LOG: all server processes terminated; reinitializing
 Dec 7 22:50:01 hostname postgres[21799]: [435-1] : : 58444abd.5527: LOG: could not remove shared memory segment "/PostgreSQL.1721076088": No such file or directory
 Dec 7 22:50:01 hostname postgres[21799]: [436-1] : : 58444abd.5527: LOG: semctl(152961024, 0, IPC_RMID, ...) failed: Invalid argument
 Dec 7 22:50:01 hostname postgres[21799]: [437-1] : : 58444abd.5527: LOG: semctl(152993793, 0, IPC_RMID, ...) failed: Invalid argument

After researching on the issue, we found this bug :).

Resolution

1) Set RemoveIPC=no in /etc/systemd/logind.conf

2) Reboot the server or restart systemd-logind as follows:
 # systemctl daemon-reload
 # systemctl restart systemd-logind

Reference:

  1. https://bugzilla.redhat.com/show_bug.cgi?id=1287618
  2. https://bugzilla.redhat.com/show_bug.cgi?id=1264533

Note:

  • Please note that I have seen this RHEL 7 issue with Oracle database too. It’s a RHEL 7 bug so it should applies to most database using IPC calls.

Hope this helps.

Regards,
Wei Shan

Advertisements

Leave a comment

Using strace in Linux to troubleshoot database performance issues

Using strace in Linux to troubleshoot database performance issues

Use Case:

Using top and iostat, we know that the database is slow due to IO constraints. What if we want to narrow down to the exact file causing the performance issues? What if we want to know what is the DB process waiting on?

Find the PID of the process

[root@hostname ~]# ps -aef | grep mysqld
mysql 16013 15513 23 Dec08 ? 1-00:17:41 /usr/sbin/mysqld --basedir=/usr --datadir=/data/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsrep-provider=/usr/lib64/libgalera_smm.so --log-error=/var/log/mysql/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --wsrep_start_position=a09ed4ce-79bc-11e6-9b2e-63d190df3dd7:42170895

Find out which system calls took the longest time

[root@hostname ~]# strace -cp 16013
Process 16013 attached
^CProcess 16013 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.63 28.601334 147430 194 poll
 1.33 0.387070 387070 1 restart_syscall
 0.03 0.008057 22 359 futex
 0.00 0.000891 5 195 accept
 0.00 0.000369 2 195 fcntl
------ ----------- ----------- --------- --------- ----------------
100.00 28.997721 944 total

Attach strace to the PID to find out the file descriptor it is waiting for.

[root@hostname ~]# strace -p 16013
Process 16013 attached
restart_syscall(<... resuming interrupted call ...>) = 1
accept(34, {sa_family=AF_LOCAL, NULL}, [2]) = 128
fcntl(128, F_SETFD, FD_CLOEXEC) = 0
futex(0x1e937c4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1e937c0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1e93800, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=50, events=POLLIN}, {fd=34, events=POLLIN}], 2, 4294967295) = 1 ([{fd=34, revents=POLLIN}])
accept(34, {sa_family=AF_LOCAL, NULL}, [2]) = 128
fcntl(128, F_SETFD, FD_CLOEXEC) = 0
futex(0x1e937c4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1e937c0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1e93800, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=50, events=POLLIN}, {fd=34, events=POLLIN}], 2, 4294967295) = 1 ([{fd=34, revents=POLLIN}])
accept(34, {sa_family=AF_LOCAL, NULL}, [2]) = 128
fcntl(128, F_SETFD, FD_CLOEXEC) = 0

Find the file that the file descriptor is pointing to!

ls -altr /proc/16013
lrwx------. 1 root root 64 Dec 12 15:46 50 -> /data/mysql/zabbix/history_uint#P#p201612160000.ibd
lrwx------. 1 root root 64 Dec 12 15:46 51 -> /data/mysql/zabbix/history_uint#P#p201612170000.ibd

Now we know mysqld process has been waiting for partitioned data file history_uint#P#p201612160000.ibd!

Hope this helps.

Regards,
Wei Shan

Leave a comment

Linux YUM Best Kept Secret – whatprovides

Linux YUM Best Kept Secret – whatprovides

Most of the time, when we are installing a software like Oracle or DB2 on a Linux system, there will be required RPMs that has not been installed yet. If the RPM required is unfamiliar, we will be forced to research and find out what is the RPM about or where to download it.

From Linux Man Pages (whatprovides argument):

Is used to find out which package provides some feature or file. Just use a specific name or a file-glob-syntax wildcards to list the packages available or installed that provide that feature or file.

Usage Example:

When I tried to install DB2, it complains the following error:

DBT3514W  The db2prereqcheck utility failed to find the following 32-bit library file: “libstdc++.so.6”.

So let’s try the argument out!

[root@centos68 server_t]# yum whatprovides libstdc++.so.6
libstdc++-4.4.7-17.el6.i686 : GNU Standard C++ Library
Repo : base
Matched from:
Other : libstdc++.so.6

[root@centos68 server_t]# yum install -y libstdc++-4.4.7-17.el6.i686

[root@centos68 server_t]# yum list libstdc++.i686 0:4.4.7-17.el6
Installed Packages
libstdc++.i686 4.4.7-17.el6 @base

Regards,
Wei Shan

Leave a comment

Investigating Oracle Database Memory Pressure Behavior

Investigating Oracle Database Memory Pressure Behavior

Oracle database will grab all and release as the O/S sees memory pressure.

I came across the above statement recently during a discussion., which led to this blog post. I wasn’t very sure if it was true so I did not reply to it. I’m aware of the behavior when it face CPU pressure but not memory pressure. So, this has definitely piqued my curiosity to find out more.


In Oracle 11gR2 and later, we can choose to use have the following options:

  • Automatic Memory Management (AMM)
    • We configure MEMORY_TARGET and MEMORY_MAX _TARGET and let Oracle database manage the SGA and PGA for us
  • Automatic Shared Memory Management (ASMM)
    • We configure SGA_TARGET and SGA_MAX_SIZE and let Oracle manage SGA for us.
  • Automatic PGA Memory (Usually used together with ASMM)
    • We configure PGA_AGGREGATE_TARGET and let Oracle manage PGA for us.

So, we can be sure that the parameters below affect Oracle database memory usage on a system. (Of course, we are not referring to manual memory tuning.)

  1. MEMORY_MAX _TARGET
  2. MEMORY_TARGET
  3. SGA_TARGET
  4. SGA_MAX_SIZE
  5. PGA_AGGREGATE_TARGET

It is widely understood that the above parameters are merely a a target. It may exceed the optimum size due to other circumstances. For example, the number of users increases beyond limit or insane SQL queries that exceed PGA value. PGA size is the unknown factor here, since we do not know how many users or what kind of queries will be running in the database.

The question here is, when it uses more memory than the target value, does it start to swap or does it grab the free memory from the non-allocated memory from the O/S. I would think that it will use free memory then swap space since it will ask the memory from O/S until it has memory pressure which then, it will start to swap.

Let’s test it out!

Let’s login and check all the baseline numbers.

[oracle@primary ~]$ sqlplus / as sysdba

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> show parameter pga_aggregate_target

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 200M

#free -m

total used free shared buffers cached
Mem: 2002 721 1281 183 24 451
-/+ buffers/cache: 245 1757
Swap: 4031 0 4031


select a.name, to_char(b.value, '999,999,999') bytes,
to_char(round(b.value/1024/1024,1), '99,999.9' ) mbytes
from v$statname a, v$mystat b
where a.statistic# = b.statistic#
and a.name like '%ga memory%';

NAME BYTES MBYTES
-------- ------------ ---------
session uga memory 2,515,088 2.4
session uga memory max 2,515,088 2.4
session pga memory 3,555,880 3.4
session pga memory max 22,889,000 21.8

The PGA_AGGREGATE_TARGET is configure to use 200MB. Currently it is only using 2.4MB. We have 4GB of swap space configured as well.

Let’s create a package and generate the workload

SQL> create or replace package demo_pkg
 as
 type array is table of char(2000) index by binary_integer;
 g_data array;
 end;
 /

SQL> begin
 for i in 1 .. 2000000
 loop
 demo_pkg.g_data(i) := 'x';
 end loop;
 end;
 /
 ERROR at line 1:
 ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pl/sql vc2)

The above SQL script definitely uses more than 2GB of memory for sorting, which lead to ORA-04030 error.

From another separate SSH session, let’s monitor the memory usage via vmstat

[root@primary ~]# vmstat 1 6000000
 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 0 0 1311424 24952 462624 0 0 1611 58 303 486 6 10 83 1 0
 0 0 0 1311392 24952 462628 0 0 0 16 151 282 0 1 99 0 0
 0 0 0 1311384 24952 462628 0 0 0 0 118 228 0 0 100 0 0
 0 0 0 1311384 24952 462628 0 0 0 32 151 287 0 0 100 0 0
 0 0 0 1311384 24952 462628 0 0 0 0 112 220 0 0 100 0 0
 0 0 0 1311384 24952 462628 0 0 0 0 141 263 0 1 99 0 0
 0 0 0 1311384 24960 462628 0 0 0 52 135 267 0 0 99 1 0
 0 0 0 1311384 24960 462628 0 0 0 0 141 271 0 0 100 0 0
 0 0 0 1311384 24960 462628 0 0 0 0 109 215 0 0 100 0 0
 0 0 0 1311384 24960 462628 0 0 0 32 150 283 0 0 100 0 0
 0 0 0 1311384 24960 462628 0 0 0 0 123 232 0 0 100 0 0
 0 0 0 1311384 24968 462624 0 0 0 20 144 276 0 1 99 0 0
 0 0 0 1311392 24968 462628 0 0 0 32 121 250 0 0 100 0 0
 0 0 0 1311136 24968 462628 0 0 0 0 167 329 1 1 98 0 0
 0 0 0 1311136 24968 462628 0 0 0 0 109 211 0 0 100 0 0
 1 0 0 226260 24968 463068 0 0 280 32 998 310 22 67 10 0 0
 2 4 86096 13744 60 105252 0 93124 12636 93144 2101 1698 13 86 0 1 0
 3 2 314548 14296 60 82612 1596 226304 29708 226304 2234 1695 9 91 0 0 0
 3 4 552488 13732 60 83128 9568 244640 35048 244664 2549 2140 8 90 0 2 0
 2 0 806904 13888 60 82228 1920 267312 34484 267344 2438 1904 9 89 0 2 0

The highlighted values above are the period where the SQL query executed. Noticed that the free memory was at 1.3GB until it drop to 200MB before it actually start to swap.

Summary

From the above, I have highlighted the important points:

  1. It is not possible to “overuse” SGA as it uses LRU algorithm to remove unnecessary objects under memory pressure.
  2. Although we configure PGA to only use a maximum of 200MB, it is possible to exceed this value due to user queries or sessions.
  3. Under memory pressure, the database will try to use available memory until it doesn’t have a choice, which it will choose to use swap space.
  4. When space space run out, the database will hit ORA-04030 error.

Regards,
Wei Shan

References: Expert Oracle Database Architecture -Thomas Kyte

 

 

 

 

Leave a comment

WARNING – Granting of psql/sqlplus executables to sudoers in Linux/Unix

WARNING – Granting of psql/sqlplus executables to sudoers in Linux/Unix

I discovered this behavior accidentally during my project work recently. I needed psql access to a RHEL 6 system. However, the company policy enforces a minimum access policy (Well done!). Thus, my account was given a sudoer permission to only use psql executable. I needed to perform DBA tasks on this system.

Let me show you the issue.

[weishan]$ sudo su -
[sudo] password for weishan:
ユーザー user は'/bin/su -' を root として  上で実行することは許可されていません。すみません。
(This means user not allowed to execute su -)

[weishan]$ tail -n 10 /var/log/messages
tail: `messages' を 読み込み用でオープンできません: 許可がありません
(This means access denied)

[weishan]$ sudo /home/postgres/pgsql/bin/psql -h localhost -p 5432 -U postgres postgres
[sudo] password for weishan:
psql (9.2.4)
Type "help" for help.

Looking at the output above, the sudoer was configured correctly! Now, let’s see..

psql> \!
[root@gsdb03s log]# tail -n 5 /var/log/messages
May 10 15:34:47  snmpd[22564]: Connection from UDP: [ip]:63484->[ip]
May 10 15:34:47 snmpd[22564]: Connection from UDP: [ip]:63484-> [ip]
May 10 15:34:47 snmpd[22564]: Connection from UDP: [ip]:63484-> [ip]
May 10 15:34:47 snmpd[22564]: Connection from UDP: [ip]:63484-> [ip]

From the above, we can show that it is technically possible for unintended DBAs to get root access even if you did not allow them access explicitly. Thus, we should be mindful while granting psql or sqlplus executable.

I did not test this on sqlplus, but the behaviour should be the same since both psql and sqlplus allow the user to “temporarily” exit the console to OS level access.

Moral of the story:

  1. Do not configure sudo for user to access psql/sqlplus to login to postgres as root user directly.
  2. Configure sudo to only allow weishan user to “sudo su -” to postgres user.
  3. Configure logging for sudo access

Regards,
Wei Shan

 

 

Leave a comment

Pacemaker/Corosync/PostgreSQL Issue

Pacemaker/Corosync/PostgreSQL Issue

The moment I sat on my desk, there were tons of tickets complaining that the database were down. It was 2 node PostgreSQL database HA cluster running the following stack.

  • RedHat Linux 6.x
  • Pacemaker/Corosync
  • PostgreSQL 9.2.x
  • Master-Slave synchronous streaming replication between the 2 PostgreSQL nodes

I ran the crm_mon command immediately and found out that there was something weird with the PostgreSQL. The Pacemaker/Corosync HA wasn’t working as intended. The PostgreSQL database was down. Both nodes are online but only node1 was the slave but PostgreSQL wasn’t running on node2. I started going through the logs.

On Node1

Sep 21 04:10:19 node1 postgres[32318]: [5-1] FATAL: could not connect to the primary server: could not connect to server: No route to host
Sep 21 04:10:19 node1 postgres[32318]: [5-2] #011#011Is the server running on host "<IP address>" and accepting
Sep 21 04:10:19 node1 postgres[32318]: [5-3] #011#011TCP/IP connections on port 5432?
Sep 21 04:10:19 node1 postgres[32318]: [5-4] #011
Sep 21 04:10:22 node1 lrmd[3185]: notice: operation_finished: pgsql_monitor_7000:32319 [ 2015/09/21_04:10:22 INFO: Master does not exist. ]
Sep 21 04:10:22 node1 lrmd[3185]: notice: operation_finished: pgsql_monitor_7000:32319 [ 2015/09/21_04:10:22 WARNING: My data is out-of-date. sta
tus=DISCONNECT ]

On Node2

Sep 21 15:28:41 node2 pengine[3194]: notice: LogActions: Start pgsql:1#011(node2)
Sep 21 15:28:41 node2 pengine[3194]: notice: process_pe_message: Calculated Transition 1120: /var/lib/pacemaker/pengine/pe-input-234.bz2
Sep 21 15:28:42 node2 postgres[61043]: [1-1] LOG: database system was shut down in recovery at 2015-09-19 23:36:56 SGT
Sep 21 15:28:42 node2 postgres[61044]: [1-1] FATAL: the database system is starting up
Sep 21 15:28:42 node2 postgres[61043]: [2-1] LOG: entering standby mode
Sep 21 15:28:42 node2 postgres[61043]: [3-1] LOG: could not read from log file 59, segment 114, offset 0: No such file or directory
Sep 21 15:28:42 node2 postgres[61043]: [4-1] LOG: invalid primary checkpoint record
Sep 21 15:28:42 node2 postgres[61043]: [5-1] LOG: could not read from log file 59, segment 114, offset 0: No such file or directory
Sep 21 15:28:42 node2 postgres[61043]: [6-1] LOG: invalid secondary checkpoint record
Sep 21 15:28:42 node2 postgres[61043]: [7-1] PANIC: could not locate a valid checkpoint record

Explanation

  • Node1 thinks that it is the slave and hence was trying to connect to the master database on node2
  • Node1 wasn’t able to connect to the IP address because that IP address was managed by Pacemaker/Corosync and since PostgreSQL was down, the “IP address” resource won’t be running!
  • Node2 couldn’t start the database because it was corrupted

What I need to do now is to promote the non-corrupted database(node1) to master and re-setup the slave on node2!

Resolution

Execute on node2

Stop the pacemaker/corosync services on node2

root#crm node standby node2
root#"/etc/init.d/pacemaker stop
root#/etc/init.d/cman stop

Execute on node1

Promote node1 to master in both pacemaker/corosync and database level. You don’t have to reboot the server but I just did it because I wanted a “clean” state in node1

root#"crm resource migrate master-group node1
postgres#pg_ctl promote -D $PGDATA
root# init 6

Execute on node2

Re-setup node2 as slave by seeding it from node1. Start pacemaker/corosync services on node2 once the database is re-seeded

root# rm -fr $PGDATA
postgres#pg_basebackup -D /var/lib/pgsql/9.2/data -h 192.168.17.221 --xlog -P -v
root#crm node online node2
root#"/etc/init.d/pacemaker stop
root#/etc/init.d/cman stop
root# crm node online node2

Cheers,
Wei Shan

2 Comments

DRBD – not defined in your config (for this host)

DRBD – <resource> not defined in your config (for this host)

I encountered the following error while creating a new DRBD resource for the HA cluster.

# drbdadm create-md r0
--== Thank you for participating in the global usage survey ==--
The server's response is:
'r0' not defined in your config (for this host).

Troubleshooting process….

Hostname is linuxzfs1.local

[root@linuxzfs1 drbd.d]# uname -a
Linux linuxzfs1.local 3.10.0-229.11.1.el7.x86_64 #1 SMP Thu Aug 6 01:06:18 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

However, resource is defined as linuxzfs1.(Note the highlighted)

[root@linuxzfs1 drbd.d]# cat /etc/drbd.dr0.res
resource r0 {
 device /dev/drbd1;
 disk /dev/sdb;
 meta-disk internal;
 on linuxzfs1 {
 address 192.168.56.181:7789;
 }
 on linuxzfs2 {
 address 192.168.56.182:7789;
 }
}

Modify the r0.res to ensure the hostname is defined correctly both in r0.res and /etc/hostname

[root@linuxzfs1 drbd.d]# cat r0.res
resource r0 {
 device /dev/drbd1;
 disk /dev/sdb;
 meta-disk internal;
 on linuxzfs1.local {
 address 192.168.56.181:7789;
 }
 on linuxzfs2.local {
 address 192.168.56.182:7789;
 }
}

After the file have been modified correctly, the drbdadm command executed successfully now.

# drbdadm create-md r0
initializing activity log
NOT initializing bitmap
Writing meta data...
New drbd meta data block successfully created.

Regards,
Wei Shan

Leave a comment

Disabling Transparent Huge Pages in CentOS 7.x

Disabling Transparent Huge Pages in CentOS 7.x

Transparent Huge Pages(THP) have been introduced since RedHat/CentOS 6. In CentOS 7,this feature has been turned on by default. Even THP is supposed to increased to memory performance, various database vendors(Oracle,MariaDB) are recommending to turn off THP. It seems to cause performance degradation when THP is enabled.

To verify if THP is enabled.

# cat /sys/kernel/mm/transparent_hugepage/defrag
[always] madvise never
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

Edit the rc.local file

Add the following to the bottom of /etc/rc.d/rc.local

if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

Make rc.local file executable

chmod u+x /etc/rc.d/rc.local

Reboot

# init 6

To verfiy if THP is disabled.

# cat /sys/kernel/mm/transparent_hugepage/defrag
always madvise [never]
# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

Regards,
Wei Shan

Leave a comment