I wanted to do this because my high school class has a lot of messages there (over 4,000 items from the last 10 years). Here is a brief introduction to how I did it.
I needed a way to access the web page directly from Ruby, which I did in the following steps:
Copy the cookie content from the Firebug console after logging in to “chinaren.com”;
In the Ruby code, use the following libraries to get the web page content:
require 'net/http' # Open http connection.
require 'open-uri' # As above.
require 'json'     # Chinaren.com is using JSON as the data format.
require 'nokogiri' # Parse the html/content.
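With those libraries loaded, a request that carries the copied cookie could be sketched roughly as below. This is only an illustration, not the exact code I ran: the URL path, the cookie string, and the helper names `build_request` and `fetch_page` are placeholders made up for the example.

```ruby
require 'net/http'
require 'uri'

# Placeholder values (assumptions): substitute the page you want and the
# cookie string copied from the browser after logging in.
PAGE_URL      = 'http://chinaren.com/placeholder/path'
COOKIE_STRING = 'name1=value1; name2=value2'

# Build a GET request that carries the browser's session cookie.
# Returns the parsed URI together with the request object.
def build_request(url, cookie)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Cookie'] = cookie
  [uri, request]
end

# Send the request and return the response body as a string.
def fetch_page(url, cookie)
  uri, request = build_request(url, cookie)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  response.body
end

# Usage (performs a real network call, so it is left commented out):
# html = fetch_page(PAGE_URL, COOKIE_STRING)
```

Sending the cookie as a plain header keeps the script simple: the site sees the same session as the browser, so no login form has to be automated.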
To get the message content out of the web page, ‘nokogiri’ alone is not enough; regular expressions are also needed to pick out specific content.
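The regular-expression step might look something like the sketch below. It assumes, purely for illustration, that the messages are embedded in the page as a JSON array assigned to a JavaScript variable called `messages`; the real markup on the site may well differ, and `SAMPLE_HTML` and `extract_messages` are names invented for this example. The Nokogiri part is left out to keep the sketch free of extra gems.

```ruby
require 'json'

# Hypothetical page fragment (an assumption, not the real chinaren.com markup).
SAMPLE_HTML = <<~HTML
  <html><body>
  <script>var messages = [{"author":"Li","text":"Hello"},{"author":"Wang","text":"Hi"}];</script>
  </body></html>
HTML

# Pull the JSON array out of the page with a regular expression and parse it.
# Returns an empty array when the pattern is not found.
def extract_messages(html)
  match = html.match(/var messages = (\[.*?\]);/m)
  return [] unless match
  JSON.parse(match[1])
end

extract_messages(SAMPLE_HTML).each do |message|
  puts "#{message['author']}: #{message['text']}"
end
```

The non-greedy `.*?` stops at the first `];`, which is good enough for a sketch but would need tightening if a message body could contain that sequence.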
After getting the message content, the next step is to persist it, either to a file or to a database. For later data mining, a database is the better destination. I used ‘sqlite3’, which also has very good Ruby support. To work with a sqlite3 database from Ruby, you have to install the “sqlite3” gem; to install this gem on Debian/Ubuntu, you first need to install the following packages (this solution comes from here):
sudo apt-get install sqlite3 sqlite3-doc sqliteman libsqlite3-dev
# The error below will be displayed when running "gem install sqlite3" if the last package in the list was not included.
Fetching: sqlite3-1.3.6.gem (100%)
Building native extensions. This could take a while...
ERROR: Error installing sqlite3:
ERROR: Failed to build gem native extension.

/home/user/.rvm/rubies/ruby-1.9.2-p290/bin/ruby extconf.rb
checking for sqlite3.h... no
sqlite3.h is missing. Try 'port install sqlite3 +universal'
or 'yum install sqlite-devel' and check your shared library search path (the
location where your sqlite3 shared library is located).
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers. Check the mkmf.log file for more
details. You may need configuration options.

Gem files will remain installed in /home/user/.rvm/gems/ruby-1.9.2-p290/gems/sqlite3-1.3.6 for inspection.
Results logged to /home/user/.rvm/gems/ruby-1.9.2-p290/gems/sqlite3-1.3.6/ext/sqlite3/gem_make.out