Customizing Readerware Extraction

First off, this is not for everyone. Customizing the Readerware extraction process means getting your hands dirty and writing some Python scripting code to massage the extracted data. But for those of you who have experience writing scripts, this is an extremely powerful feature.

So why would you want to customize the extraction? With the release of Readerware 3.0, there is a lot less need for custom extraction. Readerware auto-catalog now lets you specify what fields are extracted and supply default values for any field. There are a lot more standard fields in Readerware 3.0 which are extracted automatically. So if you have come to this page to learn how to migrate your Readerware 2.0 scripts to Readerware 3.0, the answer might be to throw them away and forget about custom extraction.

One use for custom extraction is to standardize fields like category or publisher. Using custom extraction you can look at the current contents of a field and change it.
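
As a rough illustration of standardizing a field, the sketch below maps a few publisher spellings to one preferred name. The FakeRW class and the publisher names are made up for this example so that it can run on its own. Inside Readerware the rw object is supplied for you, and only the statements after the stand-in would go in your script.

```python
# Hypothetical stand-in for the rw object so this sketch runs outside
# Readerware. It is not part of Readerware itself.
class FakeRW:
    def __init__(self, publisher):
        self._publisher = publisher

    def getPublisher(self):
        return self._publisher

    def setPublisher(self, data):
        self._publisher = data


rw = FakeRW("Penguin Books Ltd.")

# Spellings seen on web sites (lower case) mapped to the name you prefer.
# These entries are examples only; use your own.
PUBLISHER_NAMES = {
    "penguin books ltd.": "Penguin",
    "penguin books": "Penguin",
    "harpercollins publishers": "HarperCollins",
}

# Look at the current contents of the field and change it if it is known.
key = rw.getPublisher().strip().lower()
if key in PUBLISHER_NAMES:
    rw.setPublisher(PUBLISHER_NAMES[key])
```

Lower-casing the extracted value before the lookup means that differences in capitalization do not need their own entries in the table.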

To start using custom extraction you need to install the script file and then customize it. You can download the base script file from
http://www.readerware.com/misc/rwuserexit.py
Save this file into your Documents->Readerware folder. You can download it directly to that folder, or copy and paste its contents into a text editor and save the result there.

By itself this script does nothing, but it is the starting point for developing your own scripts. To edit this script file use a text editor like Notepad or TextEdit. Add your statements to the script at the point indicated. You should not otherwise change the script, just add your statements.

Readerware uses the Python language for the extraction scripts. Please note that indentation is very important in Python. Python uses indentation to delimit code blocks. So when you add your statements to the supplied script you should add them after the placeholder line and start them in the same column.
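
Here is a small sketch of how indentation groups statements. The FakeRW class is a hypothetical stand-in so the example can run outside Readerware, and the category, keyword and location values are invented. It does not show the layout of the supplied script, only how Python reads indented lines.

```python
# Hypothetical stand-in for the rw object; not part of Readerware.
class FakeRW:
    def __init__(self, category):
        self._category = category
        self._keywords = ""
        self._location = ""

    def getCategory1(self):
        return self._category

    def setCategory1(self, data):
        self._category = data

    def getKeywords(self):
        return self._keywords

    def setKeywords(self, data):
        self._keywords = data

    def getLocation(self):
        return self._location

    def setLocation(self, data):
        self._location = data


rw = FakeRW("Mystery & Thrillers")

# Add your statements here
if rw.getCategory1().find("Mystery") != -1:
    # Both lines below are indented by the same amount, so they form one
    # block and run only when the category contains "Mystery".
    rw.setCategory1("My Mystery Category")
    rw.setKeywords("mystery")
# This line is back in the starting column, so it is outside the block
# and runs for every item.
rw.setLocation("Study")
```

Mixing tabs and spaces within one block is a common cause of indentation errors, so it is safest to stick to one or the other.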

As an example, suppose you wanted to map the categories extracted from a web site to your own categories:
# Add your statements here
if (string.find(rw.getCategory1(), "Mystery") != -1):
    rw.setCategory1("My Mystery Category")
You can probably see the basic idea: check for a string in the extracted category and, if it is found, replace the category with another. The full list of methods available on the rw object is included at the end of this document.
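
If you have more than one category to map, the same idea can be sketched with a table of search strings and replacements. The FakeRW class is a hypothetical stand-in so the example runs on its own, and the category names are examples only.

```python
# Hypothetical stand-in for the rw object; not part of Readerware.
class FakeRW:
    def __init__(self, category):
        self._category = category

    def getCategory1(self):
        return self._category

    def setCategory1(self, data):
        self._category = data


rw = FakeRW("Fiction / Science Fiction")

# Each pair is (text to look for, category to use instead).
# The first pair that matches wins, so put the most specific ones first.
CATEGORY_MAP = [
    ("Mystery", "My Mystery Category"),
    ("Science Fiction", "My SF Category"),
]

category = rw.getCategory1()
for search_text, replacement in CATEGORY_MAP:
    if category.find(search_text) != -1:
        rw.setCategory1(replacement)
        break
```

A category that matches none of the search strings is left exactly as it was extracted.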

Learning Python

There are a lot of resources available on the web to help you with Python and a lot of books available too. Just fire up your browser and search for Python titles at your favorite book retailer. A good place to start your web search is at the official Python site: http://www.python.org.

Note that you don't have to install Python; all necessary libraries are included with the Readerware distribution.

Python is a very powerful language and fairly easy to learn. If you're wondering about the name, yes, it was named after Monty Python. Unfortunately Readerware cannot offer support on Python itself. You will need to discover the power of Python for yourself.

A good choice is "Learning Python" by Mark Lutz; it has a very readable approach and covers both the basics and advanced topics. The "Python Pocket Reference" by Mark Lutz is a handy thing to keep by your keyboard. Another good one is "Text Processing in Python" by David Mertz. A friend recommends "Python Programming on Win32" by Mark Hammond, which covers Python with particular emphasis on using it with Windows.

Debugging Your Script

Even the best Python programmer is going to make a mistake once in a while. Fortunately it is easy to debug your scripts with Readerware. First, start Readerware, go to General Preferences and ensure the Enable Readerware logging option is checked. You must restart Readerware when you change this option.

Use Readerware as normal. When extracting data Readerware will output debugging information and any error messages to a log file.

You will find this log file in your Documents->Readerware->Logs folder. You can view it in any text editor.

Also with logging on, Readerware will write the HTML file it retrieved from the web site to the logs folder as trace.html. This can be useful sometimes when debugging scripts.
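
When the Readerware log does not tell you enough, one option is to have your script write its own notes to a file. The sketch below assumes that getLogsDir() returns the path of the logs folder as a string, which its name suggests but this page does not spell out. The FakeRW class and the file name myscript.log are hypothetical and exist only so the example can run outside Readerware.

```python
import os
import tempfile


# Hypothetical stand-in for the rw object; not part of Readerware.
# Here getLogsDir() simply returns a temporary folder.
class FakeRW:
    def __init__(self):
        self._logs_dir = tempfile.mkdtemp()

    def getLogsDir(self):
        return self._logs_dir

    def getTitle(self):
        return "Example Title"


rw = FakeRW()

# Assumption: getLogsDir() gives the folder where log files are kept.
log_path = os.path.join(rw.getLogsDir(), "myscript.log")

# Append one line describing what the script saw for this item.
log_file = open(log_path, "a")
try:
    log_file.write("title=%r\n" % rw.getTitle())
finally:
    log_file.close()
```

Opening the file in append mode keeps the notes from earlier items, so you can compare several extractions side by side.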

Available Methods

The following methods are available to get and set extracted data. They are accessed from the rw object, for example rw.setTitle("My Title").

getTitle()
setTitle(data)
getSubtitle()
setSubtitle(data)
setAuthors(authorList)
getAuthor()
setAuthor(data)
getAuthor2()
setAuthor2(data)
getAuthor3()
setAuthor3(data)
getAuthor4()
setAuthor4(data)
getAuthor5()
setAuthor5(data)
getAuthor6()
setAuthor6(data)
getTranslator()
setTranslator(data)
getIllustrator()
setIllustrator(data)
getEditor()
setEditor(data)
getPublisher()
setPublisher(data)
getPublicationPlace()
setPublicationPlace(data)
getReleaseDate()
setReleaseDate(data)
getCopyrightDate()
setCopyrightDate(data)
getPages()
setPages(data)
getEdition()
setEdition(data)
getLanguage()
setLanguage(data)
getSigned()
setSigned(data)
getDimensions()
setDimensions(data)
getReadingLevel()
setReadingLevel(data)
getLexileLevel()
setLexileLevel(data)
getCopies()
setCopies(data)
getBarcode()
setBarcode(data)
getISBN()
setISBN(data)
getISSN()
setISSN(data)
getLCCN()
setLCCN(data)
getDewey()
setDewey(data)
getCallNumber()
setCallNumber(data)
getUserNumber()
setUserNumber(data)
getType()
setType(data)
getFormat()
setFormat(data)
getSeries()
setSeries(data)
getSeriesNumber()
setSeriesNumber(data)
getMyRating()
setMyRating(data)
getCondition()
setCondition(data)
getCoverCondition()
setCoverCondition(data)
getCategory1()
setCategory1(data)
getCategory2()
setCategory2(data)
getCategory3()
setCategory3(data)
getLocation()
setLocation(data)
getKeywords()
setKeywords(data)
getReadCount()
setReadCount(data)
getLastReadDate()
setLastReadDate(data)
getProductInfo()
setProductInfo(data)
getMyComments()
setMyComments(data)
getSource()
setSource(data)
getCatalogNumber()
setCatalogNumber(data)
getPurchasePrice()
setPurchasePrice(data)
getPurchaseDate()
setPurchaseDate(data)
getPurchasePlace()
setPurchasePlace(data)
getListPrice()
setListPrice(data)
getItemValue()
setItemValue(data)
getValuationDate()
setValuationDate(data)
getCurrencySymbol()
setCurrencySymbol(data)
getFavorite()
setFavorite(data)
getOutOfPrint()
setOutOfPrint(data)
getMediaURL()
setMediaURL(data)
getOwner()
setOwner(data)
getStatus()
setStatus(data)
getExternalID()
setExternalID(data)
getASIN()
setASIN(data)
getSalePrice()
setSalePrice(data)
getSaleDate()
setSaleDate(data)
getNewPrice()
setNewPrice(data)
getNewCount()
setNewCount(data)
getUsedPrice()
setUsedPrice(data)
getUsedCount()
setUsedCount(data)
getCollectiblePrice()
setCollectiblePrice(data)
getCollectibleCount()
setCollectibleCount(data)
getBuyerWaiting()
setBuyerWaiting(data)
getWeight()
setWeight(data)
getSalesRank()
setSalesRank(data)
getImage1()
setImage1(data)
setRefImage1(data, ref)
getImage2()
setImage2(data)
setRefImage2(data, ref)
getLargeImage1()
setLargeImage1(data)
setRefLargeImage1(data, ref)
getLargeImage2()
setLargeImage2(data)
setRefLargeImage2(data, ref)
getUser1()
setUser1(data)
getUser2()
setUser2(data)
getUser3()
setUser3(data)
getUser4()
setUser4(data)
getUser5()
setUser5(data)
getUser6()
setUser6(data)
getUser7()
setUser7(data)
getUser8()
setUser8(data)
getUser9()
setUser9(data)
getUser10()
setUser10(data)
setChapter(volume, chapter, title, author=None)
convertName(data)
getLogsDir()
getDocsDir()
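
Because most of the methods above come in matching get and set pairs, one statement pattern can be applied to several fields. The sketch below trims stray spaces from a few text fields. It assumes the getters return strings, and the FakeRW class is a hypothetical stand-in that returns an empty string for fields that were never set, which may not match what Readerware does.

```python
# Hypothetical stand-in for the rw object; not part of Readerware.
# It answers any getXxx() or setXxx(data) call using a dictionary.
class FakeRW:
    def __init__(self, **fields):
        self._fields = fields

    def __getattr__(self, name):
        if name.startswith("get"):
            return lambda: self._fields.get(name[3:], "")
        if name.startswith("set"):
            def setter(data):
                self._fields[name[3:]] = data
            return setter
        raise AttributeError(name)


rw = FakeRW(Title="  Dune ", Publisher="Ace  ")

# Field names taken from the get/set pairs in the list above.
for field in ("Title", "Subtitle", "Publisher", "Series"):
    value = getattr(rw, "get" + field)()
    if value:
        # Only write the field back when there is something to clean up.
        getattr(rw, "set" + field)(value.strip())
```

Methods without a matching partner, such as setAuthors(authorList) or convertName(data), do not fit this pattern and have to be called directly.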


Copyright © 1999-2017 Readerware Corporation