As with my first and second posts, I developed a few tools to scrape self-publishing data from the Amazon and Barnes & Noble sites; the data covers new titles released within the past few months. This time, I focused on self-published titles that are part of a series. The patterns are starting to emerge…and once again, you wound me, sci-fi authors. But who knew that photo essays had series?
On a side note, here are the top genres, ranked by the percentage of titles that were part of a series:
- Women Investigators/Cops (63.11%)
- Dark Fantasy (60.1%)
- Paranormal Romance (58.3%)
- Photo Essay (57.6%)
- Fantasy (56.1%)
Peter Bolton is the author of Blowing the Bridge: A Software Story and has also been known to be a grumpy bastard on occasion.
I don’t suppose you’d care to do a writeup of your, ahem, ‘tool’ development for us?
‘Tool’…you said ‘Tool’…ah, huh, huh, huh…
Was Beavis and Butt-Head ever popular in the UK? Otherwise, that joke might have just been a massive failure…
In any case, I suppose that I could write a whole post on that someday. Even though there are various open source packages and solutions to scrape web sites, I decided that I wanted something a bit more flexible. So, I wrote my own solution using mainly C# for the scraping (which was repurposed code from a previous project), a little Java for miscellaneous stuff (data aggregation, loading into tables, etc.), and some SQL queries to probe the data. It’s ugly as sin, but it does work.
Maybe one day I’ll go back over it, make it pretty, and unveil it to the world along with its story. For now, though, this heinous beast should probably stay in its cage.
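In the meantime, here is a rough illustration of the kind of per-genre aggregation behind the series percentages. It is a minimal sketch in Java, not the actual code: the `SeriesShare` class, the `Title` record, its field names, and the sample rows are all hypothetical, and it assumes each scraped title has already been tagged with a genre and an "in a series" flag.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SeriesShare {
    // Hypothetical shape of one scraped title; the field names are assumptions.
    record Title(String genre, boolean inSeries) {}

    // Returns, for each genre, the percentage of its titles flagged as part of a series.
    static Map<String, Double> percentInSeries(List<Title> titles) {
        // Per genre: index 0 counts titles in a series, index 1 counts all titles.
        Map<String, int[]> counts = new LinkedHashMap<>();
        for (Title t : titles) {
            int[] c = counts.computeIfAbsent(t.genre(), g -> new int[2]);
            if (t.inSeries()) {
                c[0]++;
            }
            c[1]++;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        counts.forEach((genre, c) -> result.put(genre, 100.0 * c[0] / c[1]));
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows, purely for illustration.
        List<Title> sample = List.of(
                new Title("Fantasy", true),
                new Title("Fantasy", false),
                new Title("Photo Essay", true));
        percentInSeries(sample)
                .forEach((genre, pct) -> System.out.printf("%s: %.1f%%%n", genre, pct));
    }
}
```

The same figure could just as easily come out of a SQL `GROUP BY` over the loaded tables; the Java version only makes the arithmetic explicit.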