Hi,
A couple of months back, we updated our local OBS server to OBS 2.5.4. Ever since this update, we've had a lot of issues with clockwork and delayed_job.
As soon as we updated OBS, /srv/www/obs/api/tmp/clockworkd.clock.output said:
I, [2014-09-12T20:30:55.163170 #29041] INFO -- : Triggering 'fetch notifications'
E, [2014-09-12T20:31:00.163931 #29041] ERROR -- : could not obtain a database connection within 5.000 seconds (waited 5.000 seconds) (ActiveRecord::ConnectionTimeoutError)
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:190:in `block in wait_poll'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:181:in `loop'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:181:in `wait_poll'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:136:in `block in poll'
/usr/lib64/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:146:in `synchronize'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:134:in `poll'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:423:in `acquire_connection'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:356:in `block in checkout'
/usr/lib64/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:355:in `checkout'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:265:in `block in connection'
/usr/lib64/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:264:in `connection'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:294:in `with_connection'
/srv/www/obs/api/config/clock.rb:39:in `block in <module:Clockwork>'
/usr/lib64/ruby/gems/2.0.0/gems/clockwork-0.7.0/lib/clockwork/event.rb:51:in `call'
/usr/lib64/ruby/gems/2.0.0/gems/clockwork-0.7.0/lib/clockwork/event.rb:51:in `execute'
/usr/lib64/ruby/gems/2.0.0/gems/clockwork-0.7.0/lib/clockwork/event.rb:41:in `block in run'
This continued for a few days until I realized we weren't receiving any build failure notifications. Since I couldn't debug the issue - and I wasn't aware of the fact that clockwork log file was in "tmp" directory instead of "log" - I simply restarted /etc/init.d/obsapidelayed service. Immediately, our users were bombarded with a build failure mails!
Running "mysqladmin processlist", showed that there were more than 34-36 database connections open - almost each and every one of them had "Sleep" mentioned in the "Command" column.
The same story repeated a couple of weeks back which triggered me to make the following changes in database.yml:
production:
..
- pool: 30
+ pool: 50
+ checkout_timeout: 15
After this change, I managed to keep delayed_job happy for almost 2 weeks. Last week, things once again went to bad state:
I, [2014-11-12T16:09:31.940219 #6089] INFO -- : Triggering 'status.refresh'
E, [2014-11-12T16:09:36.938688 #6089] ERROR -- : could not obtain a database connection within 15.000 seconds (waited 15.000 seconds) (ActiveRecord::ConnectionTimeoutError)
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:190:in `block in wait_poll'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:181:in `loop'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:181:in `wait_poll'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:136:in `block in poll'
/usr/lib64/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:146:in `synchronize'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:134:in `poll'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:423:in `acquire_connection'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:356:in `block in checkout'
/usr/lib64/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:355:in `checkout'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:265:in `block in connection'
/usr/lib64/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:264:in `connection'
/usr/lib64/ruby/gems/2.0.0/gems/activerecord-4.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:294:in `with_connection'
/srv/www/obs/api/config/clock.rb:39:in `block in <module:Clockwork>'
/usr/lib64/ruby/gems/2.0.0/gems/clockwork-0.7.0/lib/clockwork/event.rb:51:in `call'
/usr/lib64/ruby/gems/2.0.0/gems/clockwork-0.7.0/lib/clockwork/event.rb:51:in `execute'
/usr/lib64/ruby/gems/2.0.0/gems/clockwork-0.7.0/lib/clockwork/event.rb:41:in `block in run'
As of now, I've 57 connections to the database:
# mysqladmin processlist | grep frontend | wc -l
57
(Ignore the "grep", I'm just being lazy for skipping the table borders and the table header. There is just one database served by the MySQL server)
Regarding mysql's max connections:
# mysqladmin -p variables | grep max_conn
Enter password:
| max_connect_errors | 10 |
| max_connections | 151 |
Apparently, we keep filling up ActiveRecord's connection pool without freeing any of the connections. Not sure if the issue occurs because we have set "fetch_notifications" job to be run in a different thread:
every(27.seconds, 'fetch notifications', thread: true) do
ActiveRecord::Base.connection_pool.with_connection do |sql|
# this will return if there is already a thread running
UpdateNotificationEvents.new.perform
end
end
(Yes, I changed the duration from 17 to 27 seconds after I increased the timeout in database.yml from 5 to 15)
Has someone else found this issue? I can't find any solution other than setting a daily cron job to restart obsapidelayed :-/
Regards,
Srinidhi.
--
To unsubscribe, e-mail: opensuse-buildservice+unsubscribe(a)opensuse.org
To contact the owner, e-mail: opensuse-buildservice+owner(a)opensuse.org