The Readerware Newsletter
Welcome to The Readerware Newsletter.
Readerware 1.95 - Important Announcement
Readerware 1.95 is an important maintenance release, and all users should upgrade.
The Library of Congress (LOC) recently contacted Readerware about
traffic to its web site. A recent crash of the LOC site
may have been caused inadvertently by a Readerware user flooding the
site with requests.
For the average Readerware user, this is not an issue. If you use a dial-up connection
to the internet, your connection is too slow to generate enough traffic to cause
problems. Even with a home-based high-speed connection like DSL or cable, it is
difficult to generate the kind of traffic that caused problems at the LOC site.
But if you are using Readerware at the office with a direct internet connection,
it is possible to flood a site with requests, especially if you are getting a lot
of "not found" errors.
To avoid this problem, Readerware 1.95 now incorporates a traffic "governor" that
ensures no web site is flooded with requests. Before sending a request to a web site,
Readerware checks the time elapsed since the last request was sent. If it is below a
certain threshold, Readerware waits before sending the next request.
For those of you with slow dial-up connections it probably won't make any difference;
you will already naturally be above the threshold, so the governor will not make
an already slow connection slower still. But if you do have a high-speed connection,
you may notice that some batches take longer to process. None of this affects
manual lookups using the Readerware browser.
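For readers curious about the technique, the governor described above can be sketched in Python as follows. This is an illustration of the general idea, not Readerware's actual code: the class name RequestGovernor, the per-site bookkeeping, and the threshold value are all assumptions, since the actual threshold is not stated here.

```python
import time


class RequestGovernor:
    """Hypothetical sketch: enforce a minimum gap between requests to a site.

    Not Readerware's real implementation. The clock and sleep functions are
    injectable so the behavior can be tested without actually waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; an assumed threshold
        self.clock = clock
        self.sleep = sleep
        self.last_sent = {}  # site name -> time the last request was sent

    def wait_for(self, site):
        """Wait, if necessary, so requests to `site` are spaced out."""
        now = self.clock()
        last = self.last_sent.get(site)
        if last is not None:
            elapsed = now - last
            if elapsed < self.min_interval:
                # Too soon since the previous request: wait out the remainder.
                self.sleep(self.min_interval - elapsed)
                now = self.clock()
        # Record when this request is being sent.
        self.last_sent[site] = now
```

A batch loop would call wait_for() immediately before each request. On a slow connection the elapsed time is usually already above the threshold, so the sleep branch never runs, which matches the behavior described above.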
I hope this will resolve the problems with The Library of Congress and prevent similar
problems from occurring at other sites. I would just like to point out that The
Library of Congress has been very supportive and worked with us to resolve these
issues. According to the LOC, "Our system is still somewhat new and we limit
access to 250 simultaneous users. We are at that maximum number from
9:00 am until 4:00 pm Monday through Friday. Batch search facilities make
it difficult for the average user to compete."
So, please upgrade to Readerware 1.95 as soon as possible. If you do want to
run a large batch at the Library of Congress site, consider doing it outside
their peak hours. Thank you for your cooperation.
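If you want to check whether a batch would run during the busy period quoted above, a small helper can do it. This is a sketch only: the function name is hypothetical, and it assumes the LOC's 9:00 am to 4:00 pm window is US Eastern time and that you pass in a time already converted to Eastern.

```python
from datetime import datetime


def is_loc_peak(when):
    """Return True if `when` falls inside the LOC busy window quoted above.

    Assumes `when` is a naive datetime already expressed in US Eastern time;
    converting from your own time zone is left to you.
    """
    is_weekday = when.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and 9 <= when.hour < 16  # 9:00 am up to 4:00 pm
```

For example, a Wednesday at 10:00 am counts as peak, while a Saturday morning or any weekday evening does not.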
Other Readerware 1.95 Features
There are other reasons to upgrade to Readerware 1.95. Some URL handling
changes at Borders and 1BookStreet required a fix, so if you access those
sites you will want to upgrade. There are also new sites, other minor
enhancements, and changes to support the new extraction customization
facility. You can read all about this powerful new feature below.
Readerware gets Wired!
Did you catch the piece on Readerware in the latest issue of
Wired magazine? Check out page 102, in the article "Cat Hack Fever". It
was a small mention, but many of you have sharp eyes, so welcome
to all the new Wired readers who now also subscribe to the Readerware
Newsletter!
Readerware Privacy Policy
There is no change to the Readerware privacy policy, but I felt it was
time to spell things out. It seems like a lot of new programs these days
are tracking your every move on the net. Ever read the small print
on some of those license agreements? Did you realize that your favorite
MP3 player may be dutifully reporting back on every song you play?
Readerware does none of this and never will, so rest easy. I have tried
to put it all in plain English. The privacy policy link is on every web page,
and the policy can be read at:
Readerware CD
Christmas is coming, so don't forget the Readerware CD.
It makes a great gift for the book lovers and collectors
on your gift list.
The Readerware CD includes not just Readerware for
Windows but all the other versions too. It also includes
the Java runtimes for Linux, Solaris, etc., so you have everything
you need to run Readerware on all supported platforms.
The Readerware for the Palm Pilot CD includes all the
above plus Palm support.
You will receive the Readerware CD packaged in a
virtually indestructible ClamShell TrimPak. Durable but
flexible, the clear polyethylene case stores the CD
securely. It is much tougher than a jewel case.
CD orders will be shipped via the US Postal Service. Order now to ensure
delivery in time for Christmas.
Sorry, but I can only ship CDs to US addresses. You
will receive a confirmation e-mail before your CD is
shipped, so you will be able to review all the information
before shipment. You will also receive your license key immediately
by e-mail, so you won't have to wait for the CD to arrive before
using Readerware.
You can find out more about the Readerware CD, including
a picture, plus place an order, on the web site:
Customizing Readerware Extraction
First off, this is not for everyone. Customizing the Readerware extraction
process means getting your hands dirty and writing some scripting code to
massage the extracted data. If this is not for you, don't worry: many of
these features will be added to the GUI in upcoming releases.
But for those of you who have experience writing scripts, this is an
extremely powerful feature.
So why would you want to customize the extraction? There are a number of things you
can do. One user wanted to strip leading A's from titles. For example,
instead of "A Small Deceit", the title becomes "Small Deceit". Some sites already do
this; some don't. Another user didn't like Readerware extracting categories; he
wanted to use his own. You can even substitute your own category.
Readerware calls a Python script after it extracts data from a web site and
before it adds a book to the database. You can use this script to customize
the data. You will find a basic copy of this script on the web site at:
Download it and copy the script into your scrapers directory. If you used the
default install location, this is C:\readerware\scrapers. The file must
be called userexit.py. Here is what it looks like:
# Scraper user exit.
#
# If this file exists it is called immediately
# before the scraper process returns. You can change
# any of the global variables to customize the extraction process
#
import string
global title,author,format,bookclub,first,signed,read
global date,publisher,place,isbn,source,image
global value,category,copies,condition,rating,comments
global fullDateFormat
By itself this script does nothing, but it is the starting
point for developing your own scripts. Note the global statements.
These identify the global variable names that Readerware uses; in other words, the
variable "title" contains the extracted title, and so on. This is really all you need to know about
how the process works: you set or change the contents of these variables to the
required data. So, for example, if you don't want Readerware to extract categories
from a web site, you could add the following line at the end of the script:
category = ""
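One way to try out a change like this before running a batch is to execute the script yourself against sample values. The sketch below assumes that Readerware runs userexit.py with the extracted fields as module-level globals, which is what the global statements suggest. The function name run_user_exit is hypothetical, and the code is written for a current version of Python, which may differ from the interpreter bundled with Readerware.

```python
def run_user_exit(source, fields):
    """Hypothetical test harness: run user exit code against sample fields.

    `source` is the text of userexit.py and `fields` maps variable names
    such as "title" and "category" to sample values. Returns the values
    of those same names after the script has run.
    """
    namespace = dict(fields)  # the script sees these as its globals
    exec(compile(source, "userexit.py", "exec"), namespace)
    return {name: namespace[name] for name in fields}


# Sample values taken from real Readerware output for one book.
sample = {"title": "The Scold's Bridle", "category": "Books : Subjects : Mystery"}
print(run_user_exit('category = ""\n', sample))
```

To test your real script, pass in the contents of scrapers\userexit.py instead of the one-line string.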
For something a little trickier, suppose you wanted to map the categories extracted
from a web site to your own categories:
# string.find returns -1 if "Mystery" does not appear in the category
if (string.find(category, "Mystery") != -1):
    category = "My Mystery Category"
You would need to add statements like these for every category and every web site.
You can probably see the basic idea: check for a string in the extracted category and,
if it is found, replace the category with another.
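Rather than writing a separate if statement for every category, you could keep the mappings in a table and loop over it. In this sketch the names CATEGORY_MAP and map_category and the "Science Fiction" entry are hypothetical; the first matching entry wins, and unmatched categories are left unchanged. It uses Python's `in` operator, which does the same job as checking that string.find does not return -1. If the interpreter bundled with Readerware is an older Python, you may need to fall back to string.find as in the example above.

```python
# Hypothetical mapping: text found in the extracted category -> your category.
# Entries are checked in order, so put more specific strings first.
CATEGORY_MAP = [
    ("Mystery", "My Mystery Category"),
    ("Science Fiction", "My SF Category"),
]


def map_category(extracted, table=CATEGORY_MAP):
    """Return your own category for `extracted`, or `extracted` unchanged."""
    for fragment, mine in table:
        if fragment in extracted:
            return mine
    return extracted
```

In the user exit you would then write `category = map_category(category)`. Because the table matches on text rather than on a particular site, one table may cover several web sites, as long as their category names share the same words.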
If you want to change the title as described earlier:
# Remove a leading "A " from the title
if (title[0:2] == "A "):
    title = title[2:]
This may all look very strange; the script is written in the Python language. If
you know Python, you're all set. If you know another scripting language like Perl,
it shouldn't be much of a challenge.
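The title check shown earlier can be generalized to other leading words. In this sketch the function name and the LEADING_ARTICLES list are hypothetical; the original example strips only "A ", so keep "An " and "The " only if you want those removed too. As with the other sketches, it is written for a current version of Python.

```python
# Hypothetical list of leading words to remove. The trailing space matters:
# it stops "A " from matching titles such as "Another Country".
LEADING_ARTICLES = ("A ", "An ", "The ")


def strip_leading_article(title, articles=LEADING_ARTICLES):
    """Remove the first matching leading article from `title`, if any."""
    for article in articles:
        if title.startswith(article):
            return title[len(article):]
    return title
```

In the user exit this becomes `title = strip_leading_article(title)`. Passing `articles=("A ",)` reproduces the behavior of the original example.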
Learning Python
There are a lot of resources available on the web
to help you with Python and a lot of books available too. Just fire up Readerware,
open the browser and search for Python titles at your favorite book retailer.
A good place to start your web search is at the official Python site:
Note that you don't have to install Python; all necessary libraries are included
with the Readerware distribution.
Python is a very powerful language and fairly easy to learn. If you're wondering
about the name, yes, it was named after Monty Python. Unfortunately I
cannot offer support on Python itself. You will need to discover the power of
Python for yourself.
A book I really like is "Learning Python". It has a very readable approach and covers both the basics
and advanced topics. The "Python Pocket Reference" is a handy thing to keep by your keyboard.
A friend recommends "Python Programming on Win32", which covers Python with particular
emphasis on using it with Windows.
Click on the links below to learn more about these books and buy them at Amazon.
Debugging Your Script
Even the best Python programmer is going to make a mistake once in a while.
Fortunately it is very easy to debug your scripts with Readerware. First, start
Readerware, go to the UI Options Dialog and check the debug flag. Now shut down
Readerware.
To debug your scripts you need to start Readerware from the command line so
that you can see the output. CD into your Readerware directory and type:
readerware_debug
Use Readerware as normal. When extracting data you will see the output in the command
window. Here is some normal output with no errors; you can see the data Readerware
extracted from the web site. You would also see any print statements you added
to your script:
D:\Win32App\Readerware>readerware_debug
[12:12:55] Running: scrapers/amazon.py
[12:12:55] title=The Scold's Bridle
[12:12:55] author=Walters, Minette
[12:12:55] format=Mass Market Paperback
[12:12:55] bookclub=null
[12:12:55] first=null
[12:12:55] signed=null
[12:12:55] read=null
[12:12:55] date=October 1995
[12:12:55] publisher=St Martins Mass Market Paper
[12:12:55] place=United States
[12:12:55] isbn=0312956126
[12:12:55] value=$6.29
[12:12:55] category=Books : Subjects : Mystery
[12:12:55] copies=null
[12:12:55] condition=Excellent
[12:12:55] rating=To Be Determined
[12:12:55] comments=
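The field lines in this output follow a simple "[hh:mm:ss] name=value" pattern, so you can capture them as sample data, for example to try out your script outside Readerware. A sketch, assuming every field line looks exactly like the sample above; the function name parse_debug_output is hypothetical, and values such as null are kept as plain text rather than converted.

```python
import re

# Matches field lines such as "[12:12:55] title=The Scold's Bridle".
FIELD_LINE = re.compile(r"^\[\d\d:\d\d:\d\d\] (\w+)=(.*)$")


def parse_debug_output(text):
    """Return a dict of field name -> value from Readerware debug output.

    Lines that are not field lines (such as "Running: ...") are skipped,
    and surrounding whitespace on each line is ignored.
    """
    fields = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields
```

Paste the text from the command window into a string and pass it in; the resulting dictionary has one entry per field, such as "title" and "category".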
If something is wrong in your script, you will see an error like the following. This one
is fabricated; obviously I would never make this error myself :-)
[12:14:38] Running: scrapers/amazon.py
Traceback (innermost last):
File "scrapers/amazon.py", line 207, in ?
File "scrapers/userexit.py", line 14
if (title[0:2] == "A ":
^
SyntaxError: invalid syntax
[12:14:38] title=The Scold's Bridle
[12:14:38] author=Walters, Minette
[12:14:40] format=Mass Market Paperback
[12:14:40] bookclub=null
[12:14:40] first=null
[12:14:40] signed=null
[12:14:40] read=null
[12:14:40] date=October 1995
[12:14:40] publisher=St Martins Mass Market Paper
[12:14:40] place=United States
[12:14:40] isbn=0312956126
[12:14:40] value=$6.29
[12:14:40] category=Books : Subjects : Mystery
[12:14:40] copies=null
[12:14:40] condition=Excellent
[12:14:40] rating=To Be Determined
[12:14:40] comments=
Also, with debug on, Readerware writes the HTML page it retrieved from the web
site to the Readerware directory as trace.html. This can sometimes be useful when
debugging scripts.
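You can also catch syntax errors like the one in the traceback above before starting Readerware by asking Python to compile the script. A sketch, written for a current version of Python, whose error messages may be worded differently from Readerware's output; the function name check_syntax is hypothetical.

```python
def check_syntax(source, filename="userexit.py"):
    """Return None if `source` compiles, else a one-line error description.

    Compiling only checks syntax; the script is not run, so misspelled
    variable names and other run-time mistakes are not caught.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return "%s, line %s: %s" % (err.filename, err.lineno, err.msg)
    return None
```

Pass it the contents of scrapers\userexit.py; a result of None means Python found no syntax errors.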
Trading Scripts
If you write some useful scripts, how about sharing them? Send them to
support@readerware.com. I will
publish them on the web site for all to download. You will even get your name
in lights, unless of course you would rather remain anonymous.
Support Readerware
You can support Readerware and ensure that the new features keep on coming. Use
Readerware when making your online purchases. If you order using the Readerware
browser or shopping cart, Readerware normally receives a commission from the vendor.
The Readerware browser uses the same encryption technology as the major browsers so
you can order online, safely and securely using Readerware. Plus it ensures continued
development of Readerware.
Thank you.
Your Feedback
Your feedback is always welcome. Please send us your feedback.
E-Mail feedback@readerware.com.
Thanks for your support.