I wanted to do this because my high school class has a lot of messages there (over 4,000 items from the last 10 years). Here is a brief introduction to how I did it.
I needed a way to access the web page directly from Ruby, which I did with the following steps:
Copy the cookie contents from the Firebug console after logging in to “chinaren.com”;
In the Ruby code, use the following libraries to get the web page content:
require 'net/http' # Open http connection.
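As a rough sketch of the two steps above, the cookie string copied from Firebug can be sent as a "Cookie" request header so the site treats the request as coming from the logged-in session. The method names (build_request, fetch_page) and the example cookie value are made up for illustration, and the real page URL is not shown here, so treat this as one possible way to do it rather than the exact code I used:

```ruby
require 'net/http'
require 'uri'

# Hypothetical placeholder: paste the cookie string copied from Firebug here.
COOKIE = 'name1=value1; name2=value2'

# Build a GET request that carries the cookies of the logged-in session.
def build_request(url, cookie)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Cookie'] = cookie
  request
end

# Open an HTTP connection, send the cookie-bearing request, return the page body.
def fetch_page(url, cookie)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_request(url, cookie)).body
  end
end
```

Calling fetch_page with the URL of the message page and COOKIE should return the HTML as a string, as long as the copied cookies have not expired.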
To get the message content out of the web page, ‘nokogiri’ alone is not enough; regular expressions are also needed to extract specific content.
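To illustrate the regular expression part, here is a minimal sketch that pulls the author, date and time out of the text of one message header. The header layout ("author, then date, then time") and the names HEADER_PATTERN and parse_header are assumptions made up for this example; the real page layout is different and needs its own pattern. The input string is assumed to be the text of a node already selected with nokogiri:

```ruby
# Assumed (hypothetical) header layout: "<author> 2009-05-01 12:30"
HEADER_PATTERN = /\A(?<author>.+?)\s+(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2})\z/

# Take the text of one header node (for example, the result of calling
# `text` on a nokogiri node) and return its parts, or nil if the text
# does not match the assumed layout.
def parse_header(text)
  match = HEADER_PATTERN.match(text.strip)
  return nil unless match

  { author: match[:author], date: match[:date], time: match[:time] }
end
```

Returning nil for text that does not match makes it easy to skip page elements that are not messages while looping over the selected nodes.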
After getting the message content, the next step is to persist it, either to a file or to a database. For later data mining, a database is the better destination. I used ‘sqlite3’, which also has very good Ruby support. To work with an sqlite3 database from Ruby, you have to install the “sqlite3” gem; to install this gem on Debian/Ubuntu, you first need to install the following packages (this solution comes from here):
sudo apt-get install sqlite3 sqlite3-doc sqliteman libsqlite3-dev # The error below is displayed when running "gem install sqlite3" if the last package in the list is not installed.