Gangmax Blog

How to Count Words for Multi-language Text in Ruby

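The code below came from here and here. It splits the text on whitespace and counts each resulting token as one word, except tokens containing Chinese, Japanese, or Korean characters, where every character counts as a word, because Chinese and Japanese text is written without spaces between words.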

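#! /usr/bin/env ruby
# coding: utf-8

class String
  # True if the string contains at least one Han, Katakana, Hiragana or Hangul character.
  def contains_cjk?
    !!(self =~ /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/)
  end
end

def countword(text)
  # The running total starts at 0: a token containing CJK characters
  # adds its character count, any other token adds 1.
  text.split.inject(0) do |sum, word|
    word.contains_cjk? ? sum + word.length : sum + 1
  end
end

s = 'The last Olympics was held in 北京'
puts "word count = #{countword(s)}"  # prints "word count = 8"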
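One limitation of splitting on whitespace: a token that mixes scripts, such as 北京2008年, contributes its full character length (7), so the four digits count as four words. A minimal alternative sketch, assuming you want each CJK character and each run of other non-space characters to count as one word, scans the text with a single regular expression instead of splitting it. The names `CJK` and `count_words_scan` are hypothetical and do not come from the original sources.

```ruby
# coding: utf-8

# Hypothetical helper regex: one Chinese, Japanese (kana) or Korean character.
CJK = /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/

# Hypothetical alternative to countword: every CJK character is one word,
# and every run of non-whitespace, non-CJK characters is one word.
def count_words_scan(text)
  text.scan(/#{CJK}|(?:(?!#{CJK})\S)+/).length
end

puts count_words_scan('The last Olympics was held in 北京')  # prints 8
puts count_words_scan('北京2008年')                          # prints 4
```

Punctuation written directly against CJK text, like the `!` in `世界!` or a full-width `。`, becomes a token of its own in this sketch, so you may want to strip punctuation before counting.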
