With the proliferation of content providers in the children’s digital publishing industry comes a growing need for children’s librarians OUTSIDE of library organizations. For me this means that the painful hours I spent slogging through cataloging classes, fiddling with reports in Millennium that sometimes went missing, and wrestling with the harrowing intricacies of bibliographic control are becoming marketable skills that are relevant to developers and entrepreneurs in the private sector.
I’ve been asked to help a start-up, Bookboard, with the classification of their small but growing collection of ebooks for children. They are software engineers and parents who understand the value of reading to their young children, and they eschew the use of interactivity within the books themselves. One of the founders, Fang Chang, even said to me: “Books aren’t broken” (i.e., there is no need to add games, bells, whistles and other annoying things that app developers sometimes add to their kids’ apps).
What Bookboard needs a librarian’s help to build is essentially a recommendation and classification engine that can offer an automatic equivalent of reader’s advisory within the app. We need to combine machine learning (an algorithm) with human expertise; specifically, librarian expertise. While this may cause some of my colleagues to consider throwing in the librarianship towel and streaming Desk Set on their Rokus, keep in mind that the collection still needs to be maintained and developed. The junk has to be weeded out. New high-quality material needs to be produced or acquired, and added to the collection in an intelligent way.
The first part of this project is assigning some kind of level to the books. Most children’s libraries have a physical mechanism for separating books by reading level: board books, picture books, readers, paperbacks, novels etc. What we are mostly limited by is space, so for this project we take away the physical limitations of housing an entire children’s collection. How do we organize it when we are not guided by physical format, and “findability” is no longer relevant? (Seriously! If you have ideas, ping me! Let’s talk!)
An interesting excerpt from the Principles of Readability (2004) states the following with regard to the limitations of reading level formulas:
Readability researchers have long taken pains to recommend that, because of their limitations, formulas are best used in conjunction with other methods of grading and writing texts. Ojemann (1934) warned that the formulas are not to be applied mechanically, a caution expressed throughout readability literature. Other investigators concerned with the difficulty and density of concepts were Morriss and Holversen (1938) and Dolch (1939). E. Horn (1937) warned against the mechanical use of the word lists in the re-writing of books for social studies.
George Klare and colleagues (1969) stated, “For these reasons, formula scores are better thought of as rough guides than as highly accurate values. Used as rough guides, however, scores derived from readability formulas provide quick, easy help in the analysis and placement of educational material.”
I need to design a system by which we can calculate a book’s “reading level,” which feels gross even to type, but serves a valuable purpose when it comes to knowing what books are at the same level of difficulty. Most of my librarian colleagues shudder in revulsion at the mention of AR or Lexile levels, but it is these very tools that I am examining to determine what formula we can use to accurately group books in the collection according to reading level. We’ll combine some kind of metric with age appropriateness to create a formula by which we can offer books in a recommendation line-up that are similar to other books that reader has already enjoyed.
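As a rough sketch of what "a metric combined with age appropriateness" could look like in practice, the Python below filters a catalog by the reader's age and then ranks the remaining books by how close their level is to a book the reader already liked. Everything here is hypothetical: the `Book` fields, the `similar_books` function, the `tolerance` value and the sample titles are illustrations, not Bookboard's actual design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    level: float   # any single readability score (assumed scale)
    min_age: int   # youngest age the book suits (assumed field)
    max_age: int   # oldest age the book suits (assumed field)


def similar_books(liked, catalog, reader_age, tolerance=0.5):
    """Return age-appropriate books whose level is within `tolerance` of `liked`."""
    candidates = [
        book for book in catalog
        if book != liked
        and book.min_age <= reader_age <= book.max_age
        and abs(book.level - liked.level) <= tolerance
    ]
    # Closest reading level first.
    return sorted(candidates, key=lambda book: abs(book.level - liked.level))


# Made-up catalog for illustration only.
liked = Book("Book A", 1.0, 3, 6)
catalog = [
    liked,
    Book("Book B", 1.4, 3, 6),
    Book("Book C", 1.2, 4, 7),
    Book("Book D", 3.0, 3, 6),   # too hard
    Book("Book E", 1.1, 8, 10),  # right level, wrong age
]
lineup = [book.title for book in similar_books(liked, catalog, reader_age=5)]
```

With this made-up catalog, `lineup` comes out as Book C followed by Book B. Keeping age and level as separate fields means a book can be easy to read and still be held back from younger readers.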
So how do we do this?
Many of the books in the collection are from Orca Publishers, which uses the Fry Readability Formula. Here’s a handy-dandy guide on how to use it. My initial thought was to go with this method and try to get pre-calculated numbers directly from the publishers if they have them, but not all the publishers will have these, and even if they did, we’d be relying on their data/calculations. The graph (below) makes my brain hurt, but may come in handy. The problem with this method is that you need to have 100 words to count; some of our books don’t even have 100 words. You can do the calculation, still, I suppose, but the results may not be statistically useful.
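The two numbers that get plotted on the Fry graph are the number of sentences and the number of syllables per 100 words. The Python sketch below computes both for one sample. It rests on several assumptions that are not part of Fry's procedure: syllables are estimated with a crude vowel-run heuristic, only complete sentences are counted, and texts shorter than 100 words are scaled up proportionally, which is exactly the case where the result is shakiest. The names `rough_syllables` and `fry_coordinates` are made up for this example, and the graph lookup itself is not included.

```python
import re


def rough_syllables(word):
    # Heuristic only: each run of vowels counts as one syllable.
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)


def fry_coordinates(text, sample_size=100):
    """Return (sentences per 100 words, syllables per 100 words, words used)."""
    tokens = re.findall(r"[A-Za-z']+|[.!?]+", text)
    words = sentences = syllables = 0
    in_sentence = False
    for token in tokens:
        if token[0] in ".!?":
            # Count a sentence only if it contained at least one word.
            if in_sentence:
                sentences += 1
                in_sentence = False
        else:
            if words == sample_size:
                break
            words += 1
            syllables += rough_syllables(token)
            in_sentence = True
    if words == 0:
        raise ValueError("no words found in text")
    # Assumption: scale short samples up to a per-100-words figure.
    scale = 100 / words
    return round(sentences * scale, 1), round(syllables * scale, 1), words


sample = "The cat sat. The dog ran! Little birds sing."
coordinates = fry_coordinates(sample)
```

For this nine-word sample, `coordinates` is `(33.3, 111.1, 9)`. The third value shows how few words the estimate rests on, which is a useful flag for deciding whether to trust the result at all.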
We need to calculate reading levels to some degree of accuracy, but the calculation process has to be reasonable from a time/trouble perspective. I have to do all the data entry myself, so I’m motivated to find the most effective means of punching in bibliographic data and having useable information spat back out to me. There is a tool within Microsoft Word that determines the Flesch-Kincaid reading grade level of any given passage, which may mean that we can do all of our own calculations and not rely on external tools or data. Unfortunately, many of the books that I tried only came up with a score of 100 on the Flesch Reading Ease scale that Word reports alongside the grade level; they are too simple, or too “readable,” to register usefully on the scale at all.
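For anyone who wants to run the numbers outside of Word, here is a minimal Python sketch of the two standard Flesch formulas: Reading Ease is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), and the Flesch-Kincaid grade level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter is a rough heuristic and an assumption of this example, so the results will not match Word's exactly, and the function names are invented for illustration.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel runs, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text):
    """Return (sentence count, word count, syllable count) for a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text):
    sentences, words, syllables = text_stats(text)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(text):
    sentences, words, syllables = text_stats(text)
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


passage = "The cat sat. The dog ran."
ease = flesch_reading_ease(passage)
grade = flesch_kincaid_grade(passage)
```

For this passage (two sentences, six one-syllable words) the raw Reading Ease comes out at about 119.2 and the grade level at about −2.6. Very simple texts fall off both ends of the intended scales, which may be why so many picture books end up pinned at 100.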
Finally, I signed up for the Lexile Analyzer, uploaded some text and got some decent results. Some of the results my text came up with seemed a little off, and I assumed I’d have to tinker with them to take into consideration oddities (especially in picture books) like concept or lexicon-style books, poetry, non-standard layout or text, and discrepancies between age appropriateness and reading level. Then I did a little more digging and found out that Lexile already takes these factors (and more!) into consideration and has a set of codes in place to deal with oddities, so I might still have to tinker, but I won’t have to devise an entirely new system on my own. The codes are:
- AD: Adult Directed
- NC: Non-Conforming
- HL: High-Low
- IG: Illustrated Guide
- GN: Graphic Novel
- BR: Beginning Reader
- NP: Non-Prose
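If the leveled data arrives as labels with the code in front of the number, the code and the measure need to be split apart before anything can be sorted or compared. The Python sketch below does that under one assumption: labels look like `AD580L`, `870L` or a bare `NP`, with an optional two-letter code, an optional number and an optional trailing `L`. The `parse_lexile` function and `LexileLabel` type are invented for this example and are not part of any Lexile tool.

```python
import re
from typing import NamedTuple, Optional

# The seven codes listed above.
LEXILE_CODES = ("AD", "NC", "HL", "IG", "GN", "BR", "NP")

# Assumed label shape: optional code, optional digits, optional trailing "L".
_PATTERN = re.compile(
    r"^(?P<code>" + "|".join(LEXILE_CODES) + r")?(?P<measure>\d+)?L?$"
)


class LexileLabel(NamedTuple):
    code: Optional[str]      # None when the label has no code
    measure: Optional[int]   # None when the label has no number (e.g. "NP")


def parse_lexile(label):
    """Split a label such as 'AD580L' into its code and numeric measure."""
    cleaned = label.strip().upper().replace(" ", "")
    match = _PATTERN.match(cleaned)
    if cleaned in ("", "L") or match is None:
        raise ValueError(f"unrecognised Lexile label: {label!r}")
    measure = match.group("measure")
    return LexileLabel(match.group("code"), int(measure) if measure else None)


adult_directed = parse_lexile("AD580L")
plain = parse_lexile("870L")
non_prose = parse_lexile("NP")
```

Here `adult_directed` is `("AD", 580)`, `plain` is `(None, 870)` and `non_prose` is `("NP", None)`. Keeping the code separate from the number makes it possible to group, say, all Adult Directed picture books together regardless of their measure.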
ATOS (the readability tool used by AR) is also a possibility, and they have a similar system to Lexile in that they offer pre-leveled books and a text-upload analyser. There are fewer books from the Bookboard collection in the AR Bookfinder than there are for Lexile, though, so developing our own reading level standards will be a little more difficult.
I have to run a test set of books to see what produces the most useable data and then see how the reading level affects the recommendations that people get when they use Bookboard. I’m excited to see what we come up with; I’m also excited to walk my talk about how children’s librarians need to work with developers to produce high-quality book-based content that could be available through public libraries. More children’s librarians need to get out of their libraries and into the content creation and management space, in a similar vein as the librarians who are developing eBook distribution systems like Califa and Douglas County. There’s no other way to have a say in the development of the services that public libraries offer to their communities than to work with the people who are developing those services.