Fixing frequent GitLab Sidekiq queue crashes

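At my previous company I upgraded a source-built GitLab from 8.x to 11.x, and two major problems followed. First, merges in some repositories would spin forever; the database showed a row that was locked, and upgrading MySQL from 5.5 to 5.7 fixed that. Second, some CI jobs crashed Sidekiq while matching a regular expression; at the time, the cause I tracked down appeared to be a compatibility problem in the re2 dependency. I left the company before resolving it. A former colleague recently told me it has been fixed, which is great, so here is the general approach.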


1. Crash symptoms

The logs show:

1. Every crash happened while the PostReceive worker was running.
2. The call graph shows that the root cause of every crash was a segmentation fault at untrusted_regexp.rb:25.

2020-02-24T20:21:34.562Z 30348 TID-grh05pnog UpdateMergeRequestsWorker JID-71d9b9c116cc958d619206d8 INFO: start
/home/gitlab/gitlab/lib/gitlab/untrusted_regexp.rb:25: [BUG] Segmentation fault at 0x0000000000000000
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0119 p:---- s:0627 e:000626 CFUNC  :error
c:0118 p:0068 s:0623 e:000620 METHOD /home/gitlab/gitlab/lib/gitlab/untrusted_regexp.rb:25 [FINISH]
c:0117 p:---- s:0614 e:000613 CFUNC  :new
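
To see which workers were running when the process died, the log can be scanned for jobs that had started but not yet finished at the point of the segfault. The following is only a minimal sketch under stated assumptions: `in_flight_at_crash` is a hypothetical helper, it assumes one log entry per line in the Sidekiq layout seen in this post, and the sample lines are copied from this post's logs.

```ruby
# Hypothetical helper, not part of GitLab or Sidekiq.
# Assumed line layout: "<time> <pid> TID-<id> <Worker> JID-<id> INFO: start|done|fail ..."
JOB_LINE   = /\bTID-\S+ (?<worker>\S+) JID-(?<jid>\h+) INFO: (?<event>start|done|fail)/
CRASH_LINE = /\[BUG\] Segmentation fault/

# Returns the worker class names of all jobs still running at the first crash line.
def in_flight_at_crash(lines)
  running = {} # jid => worker class name
  lines.each do |line|
    if (m = JOB_LINE.match(line))
      if m[:event] == 'start'
        running[m[:jid]] = m[:worker]
      else
        running.delete(m[:jid]) # job finished (done or fail)
      end
    elsif CRASH_LINE.match?(line)
      return running.values
    end
  end
  [] # no crash line found
end

sample = [
  '2020-02-25T22:42:34.787Z 144464 TID-ov0hhbaas UpdateMergeRequestsWorker JID-da8363d5018135858fd6ee16 INFO: done: 0.187 sec',
  '2020-02-25T22:42:34.791Z 144464 TID-ov0hhbaas PostReceive JID-8e046a6ff89a71a5223db273 INFO: start',
  '2020-02-25T22:42:35.199Z 144464 TID-ov0hhbakc PostReceive JID-818e393f097f60bd15565e32 INFO: start',
  '2020-02-25T22:42:35.209Z 144464 TID-ov0hhb88g PostReceive JID-6b9b40108350752d3874c283 INFO: start',
  '/home/gitlab/gitlab/lib/gitlab/untrusted_regexp.rb:25: [BUG] Segmentation fault at 0x0000000000000000'
]

puts in_flight_at_crash(sample).uniq.inspect # prints ["PostReceive"]
```

Tracking by JID rather than by "last line before the crash" matters because Sidekiq runs several threads in one process, so the line directly above the segfault may belong to an unrelated job.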



2. Analysis and fix

2.1 Add logging before the crashing code to see what is actually being matched

# [root@xxx log]# cat /home/gitlab/gitlab/lib/gitlab/untrusted_regexp/ruby_syntax.rb

# frozen_string_literal: true
module Gitlab
  class UntrustedRegexp
    # This class implements support for Ruby syntax of regexps
    # and converts that to RE2 representation:
    # /<regexp>/<flags>
    class RubySyntax
      PATTERN = %r{^/(?<regexp>.*)/(?<flags>[ismU]*)$}.freeze

      # Checks if pattern matches a regexp pattern
      # but does not enforce it's validity
      def self.matches_syntax?(pattern)
        pattern.is_a?(String) && pattern.match(PATTERN).present?
      end

      # The regexp can match the pattern `/.../`, but may not be fabricatable:
      # it can be invalid or incomplete: `/match ( string/`
      def self.valid?(pattern, fallback: false)
        puts "xudy pattern is: \"#{pattern}\""
        !!self.fabricate(pattern, fallback: fallback)
      end

      def self.fabricate(pattern, fallback: false)
        self.fabricate!(pattern, fallback: fallback)
      rescue RegexpError
        nil
      end

      def self.fabricate!(pattern, fallback: false)
        raise RegexpError, 'Pattern is not string!' unless pattern.is_a?(String)

        matches = pattern.match(PATTERN)
        raise RegexpError, 'Invalid regular expression!' if matches.nil?

        begin
          create_untrusted_regexp(matches[:regexp], matches[:flags])
        rescue RegexpError
          raise unless fallback &&
              Feature.enabled?(:allow_unsafe_ruby_regexp, default_enabled: false)

          create_ruby_regexp(matches[:regexp], matches[:flags])
        end
      end

      def self.create_untrusted_regexp(pattern, flags)
        pattern.prepend("(?#{flags})") if flags.present?
        puts "xudy flaged pattern is: \"#{pattern}\" "
        UntrustedRegexp.new(pattern, multiline: false)
      end
      private_class_method :create_untrusted_regexp

      def self.create_ruby_regexp(pattern, flags)
        options = 0
        options += Regexp::IGNORECASE if flags&.include?('i')
        options += Regexp::MULTILINE if flags&.include?('m')
        Regexp.new(pattern, options)
      end
      private_class_method :create_ruby_regexp
    end
  end
end

Every crash was preceded by log output like this:
 2020-02-25T22:42:34.787Z 144464 TID-ov0hhbaas UpdateMergeRequestsWorker JID-da8363d5018135858fd6ee16 INFO: done: 0.187 sec
 2020-02-25T22:42:34.791Z 144464 TID-ov0hhbaas PostReceive JID-8e046a6ff89a71a5223db273 INFO: start
 2020-02-25T22:42:34.805Z 144464 TID-ov0hhbaas PostReceive JID-8e046a6ff89a71a5223db273 ERROR: xudy post receive worker: "fos"  "key-11733"  "{}"
 xudy post receive worker: "fos"  "key-11733"  "{}"
 2020-02-25T22:42:35.194Z 144464 TID-ov0hhbakc ProcessCommitWorker JID-e46ac41e6469409194fde0cb INFO: done: 0.538 sec
 2020-02-25T22:42:35.199Z 144464 TID-ov0hhbakc PostReceive JID-818e393f097f60bd15565e32 INFO: start
 2020-02-25T22:42:35.200Z 144464 TID-ov0hhb88g ProcessCommitWorker JID-c32e2c1fd8f0edcc4f10397a INFO: done: 0.588 sec
 2020-02-25T22:42:35.209Z 144464 TID-ov0hhb88g PostReceive JID-6b9b40108350752d3874c283 INFO: start
 2020-02-25T22:42:35.214Z 144464 TID-ov0hhbakc PostReceive JID-818e393f097f60bd15565e32 ERROR: xudy post receive worker: "office-company-api"  "key-16326"  "{}"
 xudy post receive worker: "office-company-api"  "key-16326"  "{}"
 2020-02-25T22:42:35.241Z 144464 TID-ov0hhb88g PostReceive JID-6b9b40108350752d3874c283 ERROR: xudy post receive worker: "fs-workbench-web"  "key-15813"  "{}"
 xudy post receive worker: "fs-workbench-web"  "key-15813"  "{}"
 xudy pattern is: "/^fds-cn-.*$/"
 xudy flaged pattern is: "^fds-cn-.*$"
 /home/gitlab/gitlab/lib/gitlab/untrusted_regexp.rb:25: [BUG] Segmentation fault at 0x0000000000000000
 ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
 -- Control frame information -----------------------------------------------

Judging from the stack context, the crash is most likely triggered while parsing the CI YAML file and regex-matching keywords such as only/except.
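
The two debug lines in the log fit the way a `/regexp/flags` value from the CI file is split before it reaches RE2. The following is a minimal sketch in plain Ruby that reuses only the `PATTERN` constant from ruby_syntax.rb. `re2_source` is a hypothetical helper, and `/^release/i` and `master` are made-up inputs for illustration. It does not call re2 at all, so it runs even on a host where the gem is broken.

```ruby
# Same PATTERN as in ruby_syntax.rb; everything else here is illustrative.
PATTERN = %r{^/(?<regexp>.*)/(?<flags>[ismU]*)$}.freeze

# Hypothetical helper: return the string that would be handed to RE2,
# or nil when the value is not written in /regexp/flags form.
def re2_source(pattern)
  matches = pattern.is_a?(String) ? pattern.match(PATTERN) : nil
  return nil if matches.nil?

  flags = matches[:flags]
  # Flags are carried into the pattern as an inline group such as (?i)
  flags.empty? ? matches[:regexp] : "(?#{flags})#{matches[:regexp]}"
end

puts re2_source('/^fds-cn-.*$/') # prints ^fds-cn-.*$
puts re2_source('/^release/i')   # prints (?i)^release
p re2_source('master')           # prints nil
```

The first call reproduces the pair seen in the log: the input `/^fds-cn-.*$/` becomes `^fds-cn-.*$`. The pattern itself is ordinary, which points at the library rather than at the regular expression.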


2.2 Check whether re2 works on the host

xudy-re2.rb

require 're2'
r = RE2::Regexp.new('w(\d)(\d+)')
m = r.match("w1234")
puts m.string

Run the script: bundle exec ruby xudy-re2.rb

ruby: symbol lookup error: /home/gitlab/gitlab/vendor/bundle/ruby/2.5.0/gems/re2-1.1.1/lib/re2.so: undefined symbol: _2xxxx

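re2 itself fails to run. From what I found online, the cause is a mismatch between the gem version and the shared library (.so) version. https://github.com/google/re2/issues/196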
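
A symbol lookup error or segfault kills the whole Ruby process and cannot be caught with `rescue`, so a repeatable check has to run in a child process. The following is a minimal sketch of that idea, not something from the original investigation. `probe` and `RE2_CHECK` are hypothetical names, and it assumes the script is started with `bundle exec ruby` from the GitLab directory so that the child process inherits the same gem environment.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: run a Ruby snippet in a child process so that a
# segfault or symbol lookup error cannot take down the caller.
# Returns [ok, combined stdout/stderr].
def probe(snippet)
  output, status = Open3.capture2e(RbConfig.ruby, '-e', snippet)
  # success? is nil when the child was killed by a signal, so compare to true
  [status.success? == true, output]
end

# Same check as xudy-re2.rb, kept as a string so it only runs in the child.
RE2_CHECK = <<~'RUBY'
  require 're2'
  m = RE2::Regexp.new('w(\d)(\d+)').match('w1234')
  puts m.string
RUBY

ok, output = probe(RE2_CHECK)
puts ok ? "re2 OK: #{output}" : "re2 broken: #{output}"
```

Because the parent only inspects the exit status, the same script can be run before and after reinstalling the gem to confirm that the result changed.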


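2.3 Reinstall re2

cd /home/gitlab/gitlab/bin
bundle exec gem uninstall re2
bundle install

Running the script again prints w1234, so re2 now works.

After the reinstall, the logs show that the PostReceive worker no longer crashes when it reaches this code.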


See also: monitoring the number of stalled GitLab CI jobs with a shell + Python script

anzhihe (安志合) personal blog. All rights reserved. Original content unless otherwise noted. When reposting, please credit the source: https://chegva.com/3712.html
