Google has responded to my question about potential product plans related to public records. The official statement:
"While we're always working on new products to better serve our users, advertisers and publishers, we have nothing to announce at this time."
However, the Google spokesperson did note that the company already has one initiative that addresses public records: Google Patent Search. The low-profile service launched in 2006 and uses the same technology that powers Google Book Search to archive 7 million patents and more than a million patent applications in a searchable online database.
I had posed the question to Google earlier this week after considering the hypothetical impact of a Google service that indexed public documents such as census forms (see What if Google's mission extended to public records?).
These documents are of huge importance to anyone researching their family trees. Indeed, several companies have built respectable online businesses digitizing census forms, shipping lists, and other public documents. The best-known service is Ancestry.com, which has approximately 890,000 paid subscribers and 750 million page views per month. The company has built an online archive of all U.S. census documents from 1790 to 1930, including the interview forms that census takers used to record data for individual families during this period.
But Google -- or any other company -- might be wise to consider the challenges associated with digitizing these records. I talked with Tim Sullivan, CEO of The Generations Network, which operates Ancestry.com, Genealogy.com, and many other international genealogy research websites. Sullivan said The Generations Network was concerned about Google's potential plans, pointing out that the company had been very successful at indexing text documents: "Any printed material will end up on Google or the Web," Sullivan declared.
However, Sullivan described a major drawback related to census forms and many other public records: Handwritten documents are notoriously difficult to read using optical character recognition (OCR) software.
Sullivan described old census forms from the 18th, 19th, and early 20th centuries as "a hugely diverse collection of handwritten records," in the sense that both the handwriting and the quality of the source documents vary greatly. Remember, census forms from 1930 and earlier were filled out by a multitude of individuals going door to door across America, each with his or her own handwriting style. OCR tools, which are used to convert books and other printed documents into online text that can be searched and indexed, are "not even close" to being able to read handwritten records, Sullivan says.
So how did The Generations Network import the data from millions of old census forms into its online database? Sullivan says the company spent about $75 million over 10 years to build its "content assets," including the census data, and much of that cost went into partnering with Chinese firms whose employees read the data and entered it into Ancestry.com's database. The Chinese staff are specially trained to read cursive and other handwriting styles from digitized paper records and microfilm. The work continues with other handwritten records, at a cost of approximately $10 million per year, he adds.
When asked about Google, Sullivan said, "We view that their mission and ours is quite complementary." However, he declined to discuss the nature of communications between the two companies.
Sullivan also told the Standard that The Generations Network will be opening up the Ancestry.com platform to outside developers.
Image: 1860 U.S. Census form (Source: Ancestry.com)
Comments
The Mormons are taking a different approach - enlisting thousands of volunteers to read and type the records.