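A while ago at my company I upgraded GitLab from 8.x to 11.x by building from source. Two major problems came up. First, merging code in some repositories would spin forever; the database showed a row that was locked, and upgrading MySQL from 5.5 to 5.7 fixed it. Second, some CI jobs crashed sidekiq while matching a regular expression; at the time, the cause I found appeared to be a compatibility problem in the re2 dependency. I stopped following the issue after I left the company. A former colleague recently told me it has been solved, which is great, so here is a rough outline of the approach.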
1. Crash symptoms
The logs show:
1. All crashes happen while the PostReceive worker is running.
2. The call graph shows that the root cause of every crash is a segmentation fault at untrusted_regexp.rb:25.
2020-02-24T20:21:34.562Z 30348 TID-grh05pnog UpdateMergeRequestsWorker JID-71d9b9c116cc958d619206d8 INFO: start
/home/gitlab/gitlab/lib/gitlab/untrusted_regexp.rb:25: [BUG] Segmentation fault at 0x0000000000000000
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0119 p:---- s:0627 e:000626 CFUNC :error
c:0118 p:0068 s:0623 e:000620 METHOD /home/gitlab/gitlab/lib/gitlab/untrusted_regexp.rb:25 [FINISH]
c:0117 p:---- s:0614 e:000613 CFUNC :new
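The two observations above can be checked mechanically across a large log file. The following is a minimal sketch, not part of GitLab: the names SEGFAULT_RE, WORKER_RE and find_segfaults are made up for illustration, and it assumes each log record sits on a single line. Because sidekiq runs several threads, the "last started" worker is only a heuristic hint about which job crashed.

```ruby
# Sketch: find "[BUG] Segmentation fault" lines in a sidekiq log and report
# where each one happened, plus the last worker that logged "start" before it.
# All names here are hypothetical helpers, not GitLab or sidekiq APIs.

SEGFAULT_RE = /^(?<file>\S+?):(?<line>\d+): \[BUG\] Segmentation fault/
WORKER_RE   = /TID-\S+ (?<worker>\w+) JID-\S+ INFO:\s*start/

def find_segfaults(log_text)
  last_worker = nil
  crashes = []
  log_text.each_line do |line|
    if (m = WORKER_RE.match(line))
      # Remember the most recent worker that announced "start".
      last_worker = m[:worker]
    elsif (m = SEGFAULT_RE.match(line))
      crashes << { file: m[:file], line: m[:line].to_i, last_started: last_worker }
    end
  end
  crashes
end

# Two lines taken from the logs in this post.
sample = <<~LOG
  2020-02-25T22:42:35.209Z 144464 TID-ov0hhb88g PostReceive JID-6b9b40108350752d3874c283 INFO: start
  /home/gitlab/gitlab/lib/gitlab/untrusted_regexp.rb:25: [BUG] Segmentation fault at 0x0000000000000000
LOG

find_segfaults(sample).each do |c|
  puts "#{c[:file]}:#{c[:line]} (last started: #{c[:last_started]})"
end
```

Grouping the results by file and line is a quick way to confirm that every crash points at the same location.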
2. Analysis and fix
2.1 Add logging just before the crashing code to see what kind of match is being processed.
# [root@xxx log]# cat /home/gitlab/gitlab/lib/gitlab/untrusted_regexp/ruby_syntax.rb
# frozen_string_literal: true

module Gitlab
  class UntrustedRegexp
    # This class implements support for Ruby syntax of regexps
    # and converts that to RE2 representation:
    # /<regexp>/<flags>
    class RubySyntax
      PATTERN = %r{^/(?<regexp>.*)/(?<flags>[ismU]*)$}.freeze

      # Checks if pattern matches a regexp pattern
      # but does not enforce it's validity
      def self.matches_syntax?(pattern)
        pattern.is_a?(String) && pattern.match(PATTERN).present?
      end

      # The regexp can match the pattern `/.../`, but may not be fabricatable:
      # it can be invalid or incomplete: `/match ( string/`
      def self.valid?(pattern, fallback: false)
        puts "xudy pattern is: \"#{pattern}\""
        !!self.fabricate(pattern, fallback: fallback)
      end

      def self.fabricate(pattern, fallback: false)
        self.fabricate!(pattern, fallback: fallback)
      rescue RegexpError
        nil
      end

      def self.fabricate!(pattern, fallback: false)
        raise RegexpError, 'Pattern is not string!' unless pattern.is_a?(String)

        matches = pattern.match(PATTERN)
        raise RegexpError, 'Invalid regular expression!' if matches.nil?

        begin
          create_untrusted_regexp(matches[:regexp], matches[:flags])
        rescue RegexpError
          raise unless fallback &&
              Feature.enabled?(:allow_unsafe_ruby_regexp, default_enabled: false)

          create_ruby_regexp(matches[:regexp], matches[:flags])
        end
      end

      def self.create_untrusted_regexp(pattern, flags)
        pattern.prepend("(?#{flags})") if flags.present?
        puts "xudy flaged pattern is: \"#{pattern}\" "
        UntrustedRegexp.new(pattern, multiline: false)
      end
      private_class_method :create_untrusted_regexp

      def self.create_ruby_regexp(pattern, flags)
        options = 0
        options += Regexp::IGNORECASE if flags&.include?('i')
        options += Regexp::MULTILINE if flags&.include?('m')

        Regexp.new(pattern, options)
      end
      private_class_method :create_ruby_regexp
    end
  end
end
Output like the following appears before every crash:
2020-02-25T22:42:34.787Z 144464 TID-ov0hhbaas UpdateMergeRequestsWorker JID-da8363d5018135858fd6ee16 INFO: done: 0.187 sec
2020-02-25T22:42:34.791Z 144464 TID-ov0hhbaas PostReceive JID-8e046a6ff89a71a5223db273 INFO: start
2020-02-25T22:42:34.805Z 144464 TID-ov0hhbaas PostReceive JID-8e046a6ff89a71a5223db273 ERROR: xudy post receive worker: "fos" "key-11733" "{}"
xudy post receive worker: "fos" "key-11733" "{}"
2020-02-25T22:42:35.194Z 144464 TID-ov0hhbakc ProcessCommitWorker JID-e46ac41e6469409194fde0cb INFO: done: 0.538 sec
2020-02-25T22:42:35.199Z 144464 TID-ov0hhbakc PostReceive JID-818e393f097f60bd15565e32 INFO: start
2020-02-25T22:42:35.200Z 144464 TID-ov0hhb88g ProcessCommitWorker JID-c32e2c1fd8f0edcc4f10397a INFO: done: 0.588 sec
2020-02-25T22:42:35.209Z 144464 TID-ov0hhb88g PostReceive JID-6b9b40108350752d3874c283 INFO: start
2020-02-25T22:42:35.214Z 144464 TID-ov0hhbakc PostReceive JID-818e393f097f60bd15565e32 ERROR: xudy post receive worker: "office-company-api" "key-16326" "{}"
xudy post receive worker: "office-company-api" "key-16326" "{}"
2020-02-25T22:42:35.241Z 144464 TID-ov0hhb88g PostReceive JID-6b9b40108350752d3874c283 ERROR: xudy post receive worker: "fs-workbench-web" "key-15813" "{}"
xudy post receive worker: "fs-workbench-web" "key-15813" "{}"
xudy pattern is: "/^fds-cn-.*$/"
xudy flaged pattern is: "^fds-cn-.*$"
/home/gitlab/gitlab/lib/gitlab/untrusted_regexp.rb:25: [BUG] Segmentation fault at 0x0000000000000000
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
-- Control frame information -----------------------------------------------
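Judging from the stack context, the crash is most likely triggered while the CI YAML file is being parsed and regular expressions under keys such as only/except are being matched.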
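To make the two "xudy" log lines easier to read, here is a small sketch of how a Ruby-syntax pattern such as /^fds-cn-.*$/ is split into a body and flags before it is handed to RE2. It reuses the PATTERN constant from ruby_syntax.rb, but re2_source is a hypothetical helper written for this post, and the final check uses Ruby's built-in Regexp as a stand-in for RE2 so the example runs without the re2 gem. The branch names are invented.

```ruby
# Same PATTERN as in ruby_syntax.rb.
PATTERN = %r{^/(?<regexp>.*)/(?<flags>[ismU]*)$}.freeze

# Hypothetical helper: return the string that would be passed to RE2,
# or nil when the input is not written in /.../flags syntax.
def re2_source(pattern)
  m = pattern.match(PATTERN)
  return nil if m.nil?

  body  = m[:regexp]
  flags = m[:flags]
  # Flags become an inline "(?flags)" prefix, as in create_untrusted_regexp.
  flags.empty? ? body : "(?#{flags})#{body}"
end

puts re2_source("/^fds-cn-.*$/")   # the pattern seen in the log, no flags
puts re2_source("/release/i")      # a pattern with a flag
p    re2_source("master")          # plain string, not regexp syntax

# Stand-in match with Ruby's own engine (NOT RE2); branch names are made up.
source = re2_source("/^fds-cn-.*$/")
puts Regexp.new(source).match?("fds-cn-web")
puts Regexp.new(source).match?("master")
```

The pattern itself is ordinary, which suggests the crash comes from the regexp engine binding rather than from anything unusual in the CI configuration.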
2.2 Verify that re2 works on the host
xudy-re2.rb
require 're2'
r = RE2::Regexp.new('w(\d)(\d+)')
m = r.match("w1234")
puts m.string
Run the script: bundle exec ruby xudy-re2.rb
ruby: symbol lookup error: /home/gitlab/gitlab/vendor/bundle/ruby/2.5.0/gems/re2-1.1.1/lib/re2.so: undefined symbol: _2xxxx
re2 fails to run. According to what I found online, the cause is a mismatch between the gem version and the .so version. See https://github.com/google/re2/issues/196
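A symbol lookup error of this kind is reported by the dynamic linker and, as far as I understand, ends the process instead of raising a Ruby exception that could be rescued. A probe is therefore safer in a child process. The following is a minimal sketch under that assumption; runs_cleanly? is a hypothetical helper, and in the GitLab directory it would need to be run through bundle exec so the child sees the same gems.

```ruby
require 'rbconfig'

# Hypothetical helper: run a Ruby snippet in a child process and report
# whether it exited with status 0. A crash or fatal loader error in the
# child cannot take down the calling process.
def runs_cleanly?(code)
  system(RbConfig.ruby, '-e', code, out: File::NULL, err: File::NULL) == true
end

# Same check as xudy-re2.rb, passed to the child as a string.
RE2_PROBE = <<~'RUBY'
  require 're2'
  r = RE2::Regexp.new('w(\d)(\d+)')
  m = r.match("w1234")
  exit(m ? 0 : 1)
RUBY

puts "json loads: #{runs_cleanly?("require 'json'")}"
puts "re2 works:  #{runs_cleanly?(RE2_PROBE)}"
```

The second line prints false both when the gem is missing and when its native extension is broken, so it is only a yes/no health check; the error text from running the script directly is still needed to tell the two cases apart.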
2.3 Reinstall re2
cd /home/gitlab/gitlab/bin
bundle exec gem uninstall re2
bundle install
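Running the script again prints w1234, so re2 now works.
After the reinstall, the logs show that the PostReceive worker no longer crashes when it reaches this code.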