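Troubleshooting ORA-27300 ‘fork failed with status: 11’ on SLES12 (SUSE / Linux 7)
A recently installed Oracle 12.2 RAC on SLES 12 sometimes failed on srvctl start database, with ORA-27300, ORA-27301, and ORA-27302 in the alert log. The errors clearly point to an OS resource limit. This is likely to become a common problem, because it comes from a default setting in SLES 12, and Oracle's installation documentation and best practices do not mention the setting (at least not at the time of writing).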
alert log file
2018-11-19 09:29:03.940000 +08:00
Process P01E died, see its trace file
Process startup failed, error stack:
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_psp0_32198.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
2018-11-19 09:29:04.940000 +08:00
Process P01F died, see its trace file
Process startup failed, error stack:
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_psp0_32198.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
Note:
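From ‘fork failed with status: 11’ (errno 11 is EAGAIN on Linux, which matches the ‘Resource temporarily unavailable’ text in ORA-27301) we can guess that a limit on the maximum number of processes caused fork() to fail. The other clue is that the failure occurred while the PSP0 process was forking Pnnn processes.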
PSP trace file
oracle@kdanbob01:/home/oracle> vi /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_psp0_32198.trc
Trace file /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_psp0_32198.trc
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
Build label:RDBMS_12.2.0.1.0_LINUX.X64_170125
ORACLE_HOME:/oracle/app/oracle/product/12.2.0/db_1
System name:Linux
Node name:kdanbob01
Release:4.4.21-69-default
Version:#1 SMP Tue Oct 25 10:58:20 UTC 2016 (9464f67)
Machine:x86_64
Instance name: anbob1
Redo thread mounted by this instance: 0
Oracle process number: 4
Unix process pid: 32198, image: oracle@kdanbob01 (PSP0)
*** 2018-11-19T09:15:26.133868+08:00
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
OS - DIAGNOSTICS
----------------
loadavg : 42.40 27.56 11.09
Memory (Avail / Total) = 176425.47M / 515676.77M
Swap (Avail / Total) = 32768.00M /32768.00M
Max user processes limits(s / h) =65536 / 65536
oracle@kdanbob01:/home/oracle> ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2062629
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 16384
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
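Note that ulimit -a reports the limits of the current login shell, while the trace shows the limits PSP0 itself was running with, so the two need not agree. The limits of an already-running process can be read from /proc/<pid>/limits. The following is a minimal sketch; max_processes_limits is a made-up helper name, and the live example reads the current shell only.

```shell
#!/bin/sh
# Print the soft and hard "Max processes" limits from /proc/<pid>/limits
# style input on stdin. max_processes_limits is a hypothetical helper name.
max_processes_limits() {
    awk '/^Max processes/ { print $3, $4 }'
}

# Limits of the current shell, for comparison with "ulimit -u":
if [ -r "/proc/$$/limits" ]; then
    max_processes_limits < "/proc/$$/limits"
fi
```

To compare against a running background process, redirect from that process's limits file instead, for example /proc/32198/limits for the PSP0 pid shown in the trace above (while that process is still alive).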
What is the PSP process?
PSP0, the process spawner, is one of the first background processes started when an instance starts. It was introduced in 10g and is responsible for creating and managing the other Oracle background processes.
Pnnn processes are parallel execution servers, so the next step was to check the parallel-related parameters.
SQL> show parameter parallel

PARAMETER_NAME                   TYPE     VALUE
-------------------------------- -------- -----
containers_parallel_degree       integer  65535
fast_start_parallel_rollback     string   LOW
parallel_adaptive_multi_user     boolean  FALSE
parallel_degree_limit            string   CPU
parallel_degree_policy           string   MANUAL
parallel_execution_message_size  integer  16384
parallel_force_local             boolean  TRUE
parallel_instance_group          string
parallel_max_servers             integer  60
parallel_min_percent             integer  0
parallel_min_servers             integer  60
parallel_min_time_threshold      string   AUTO
parallel_servers_target          integer  60
parallel_threads_per_cpu         integer  2
recovery_parallelism             integer  8

SQL> @pd containers_parallel_degree
Show all parameters and session values from x$ksppi/x$ksppcv...

      INDX I_HEX NAME                       VALUE DESCRIPTION
---------- ----- -------------------------- ----- ----------------------------------------
      4749  128D containers_parallel_degree 65535 Parallel degree for a CONTAINERS() query
Tip:
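At first I suspected a 12cR2 new feature was behind the problem: containers_parallel_degree limits the degree of parallelism for CONTAINERS() queries or when opening PDBs. The current environment, however, is a non-CDB database. Personally I would still recommend using multitenant in 12.2, even with a single PDB, since the non-CDB architecture is deprecated and is desupported from Oracle Database 21c. If the only reason for running multiple schemas is to avoid a CDB, I think the CDB has more advantages.
Back to the problem. The trace file reports a max user processes limit of 65536, while the oracle user's current shell shows 16384. A manual ps showed that the user had nowhere near that many processes, only some background processes, so this limit should not have been the issue. The cause turned up in MOS: SLES 12: Database Startup Error with ORA-27300 ORA-27301 ORA-27303 While Starting using Srvctl (Doc ID 2340986.1)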
From SLES12 onwards, systemd is used instead of initd and the OHASD server is only allowed to open a maximum of 512 tasks.
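Starting with SLES 12, systemd replaces the old SysV init. systemd is controversial (there is a good write-up about it on Zhihu), but it is now the mainstream init system on Linux.
On Linux 6 and earlier, ‘max user processes’ from ‘ulimit -a’ was the limit to check; from Linux 7 (SLES 12) onward you may also need to check DefaultTasksMax (default value is 512).
DefaultTasksMax sets the default TasksMax for units: the maximum number of tasks that systemd allows to be created in a unit. Tasks include threads as well as processes, and the limit is applied per unit through the pids.max value of the unit's cgroup rather than per user like ‘max user processes’.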
$ > systemctl status ohasd
● ohasd.service - LSB: Start and Stop Oracle High Availability Service
Loaded: loaded (/etc/init.d/ohasd; bad; vendor preset: disabled)
Active: active (exited) since Mon 2018-11-19 11:36:59 CST; 23h ago
Docs: man:systemd-sysv-generator(8)
Process: 12385 ExecStart=/etc/init.d/ohasd start (code=exited, status=0/SUCCESS)
Tasks: 726 (limit: 65535)
(The limit shown here would have been 512 before the change.)
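The task count and limit can also be queried as properties instead of being read off the status screen. The sketch below assumes a systemd version that exposes DefaultTasksMax, TasksCurrent and TasksMax through systemctl show; tasks_headroom is a made-up helper name that only formats the KEY=VALUE lines it is given.

```shell
#!/bin/sh
# Read "TasksCurrent=N" and "TasksMax=M" lines (systemctl show format) on stdin
# and report how close the unit is to its task limit.
# tasks_headroom is a hypothetical helper name.
tasks_headroom() {
    awk -F= '
        $1 == "TasksCurrent" { cur = $2 }
        $1 == "TasksMax"     { max = $2 }
        END {
            if (max ~ /^[0-9]+$/ && cur ~ /^[0-9]+$/)
                printf "%s of %s tasks used, %d left\n", cur, max, max - cur
            else
                printf "tasks=%s limit=%s\n", cur, max   # e.g. limit=infinity
        }'
}

# Live query, only where systemctl exists:
if command -v systemctl >/dev/null 2>&1; then
    systemctl show --property DefaultTasksMax 2>/dev/null
    systemctl show ohasd.service --property TasksCurrent --property TasksMax 2>/dev/null | tasks_headroom
fi
```

With the values from the status output above (726 tasks, limit 65535) the helper reports 64809 tasks left; with the original limit of 512 the same workload would already be over the limit.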
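Solution
Configure the value of DefaultTasksMax to 65535 in the file /etc/systemd/system.conf, or set the TasksMax value properly for the ohasd systemd service.
Here we fixed it by editing /etc/systemd/system.conf and changing DefaultTasksMax to 65535; it can also be set to ‘infinity’ for no limit.
Before installing on SLES 12 or later, or on Linux 7 and later, it is worth getting familiar with what has changed in the system. At the very least, when installing RAC, add the DefaultTasksMax change to your installation procedure. Oracle may add this to its installation documentation or best practices in the future.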
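For the second option, setting TasksMax on the ohasd service only, one possible approach is a systemd drop-in file. This is a sketch under assumptions, not the procedure from the MOS note: write_tasksmax_dropin and the file name tasksmax.conf are invented for illustration, and the example writes into a scratch directory so nothing under /etc is touched.

```shell
#!/bin/sh
# Write a systemd drop-in that sets TasksMax for one unit.
# write_tasksmax_dropin and the file name tasksmax.conf are hypothetical;
# root defaults to /etc/systemd/system but can point elsewhere for a dry run.
write_tasksmax_dropin() {
    unit="$1"; value="$2"; root="${3:-/etc/systemd/system}"
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nTasksMax=%s\n' "$value" > "$dir/tasksmax.conf"
}

# Dry run into a scratch directory; inspect the result before using it for real.
scratch=$(mktemp -d)
write_tasksmax_dropin ohasd.service 65535 "$scratch"
cat "$scratch/ohasd.service.d/tasksmax.conf"
```

A real drop-in under /etc/systemd/system would need systemctl daemon-reload and a restart of the service to take effect. A change to DefaultTasksMax in /etc/systemd/system.conf needs systemctl daemon-reexec or a reboot, and a reboot is the simplest way to be sure already-running services pick it up.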