Gangmax Blog

Freedom of thought, independence of spirit

How to Count Words for Multi-language Text in Ruby

The code below came from here and here.

#! /usr/bin/env ruby
# coding: utf-8
class String
  # True if the string contains at least one Chinese, Japanese, or Korean character.
  def contains_cjk?
    !!(self =~ /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/)
  end
end

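# Count words in multi-language text: each whitespace-separated token counts
# as one word, except tokens containing CJK characters, which count as one
# word per character.
def countword(text)
  # Start the accumulator at 0; starting at 1 makes every non-empty result
  # one too high.
  text.split.inject(0) do |sum, word|
    sum + (word.contains_cjk? ? word.length : 1)
  end
end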

s = 'The last Olympics was held in 北京'
puts "word count = #{countword(s)}"
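
One limitation of the function above: a token that mixes scripts, such as 北京2008, is counted character by character, digits included. The following is a minimal sketch of one possible alternative, not something taken from the sources of this post. It assumes that each CJK character should count as one word and each run of other non-space characters as one word. The names `WORD_TOKEN` and `count_words_mixed` are made up for illustration.

```ruby
# Hypothetical alternative (not from the original post).
# Matches either a single CJK character, or a run of characters that are
# neither whitespace nor CJK.
WORD_TOKEN = /[\p{Han}\p{Katakana}\p{Hiragana}\p{Hangul}]|[^\s\p{Han}\p{Katakana}\p{Hiragana}\p{Hangul}]+/

def count_words_mixed(text)
  # String#scan returns every non-overlapping match; the number of matches
  # is the word count under the assumption stated above.
  text.scan(WORD_TOKEN).length
end

puts "word count = #{count_words_mixed('北京2008奥运会')}"
```

Punctuation is still treated as part of a word, or as a word of its own when it follows a CJK character, so this is only a rough count.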