There is a lot of work being done in Bible translation to reach speakers of different languages. One language we rarely think about is the digital language read by computers. Many digital editions are available to read and download, but, as I have written before, these formats have limits. MetaV is a new digital version that makes it easier for programmers and non-programmers alike to link each word to useful metadata and to perform a wider range of analysis.
Metadata is information that describes other information. Take a photograph, for example. The picture captures a scene that might take 1,000 words or more to describe in prose. Information describing that photograph (its metadata) might include the photographer, date, film type, camera settings, and location. Modern software uses this kind of information to organize large collections of digital photos efficiently. MetaV organizes a large collection of words in a similar way.
The main limitation of freely available digital copies of the Bible is that each line contains a full verse. With MetaV, I have broken the text down into individual words, one per row, with columns describing each one. Currently, those columns record whether the word is italicized, what punctuation follows it, whether it begins or ends a parenthetical statement, and whether it begins a new paragraph. Of course, each row also stores the book, chapter, verse, and the word's position within the verse.
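To make the word-per-row layout concrete, here is a minimal Python sketch of one row and of splitting a verse into rows. It is not MetaV's actual code or file format: the `WordRow` field names, the `verse_to_rows` function, and the input markup (italic words wrapped in [brackets], a leading ¶ marking a new paragraph) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class WordRow:
    book: str
    chapter: int
    verse: int
    position: int          # 1-based position of the word within the verse
    word: str              # the bare word, markup and punctuation removed
    italic: bool           # word was printed in italics (supplied text)
    punctuation: str       # punctuation immediately following the word
    paren_start: bool      # word opens a parenthetical statement
    paren_end: bool        # word closes a parenthetical statement
    paragraph_start: bool  # first word of a new paragraph

TRAILING = ".,;:?!"  # assumed set of trailing punctuation marks

def verse_to_rows(book, chapter, verse, text):
    """Split one verse of marked-up text into one WordRow per word.

    Hypothetical input markup: italic words wrapped in [brackets],
    a leading pilcrow (¶) marking the start of a paragraph.
    """
    paragraph = text.startswith("¶")
    rows, in_italic = [], False
    for token in text.lstrip("¶ ").split():
        paren_start = token.startswith("(")
        token = token.lstrip("(")
        # Peel trailing punctuation and any closing parenthesis off the end.
        punctuation, paren_end = "", False
        while token and token[-1] in TRAILING + ")":
            if token[-1] == ")":
                paren_end = True
            else:
                punctuation = token[-1] + punctuation
            token = token[:-1]
        # An italic span may cover several words, e.g. "[it is]".
        opens, closes = token.startswith("["), token.endswith("]")
        italic = in_italic or opens
        in_italic = (in_italic or opens) and not closes
        token = token.strip("[]")
        if not token:
            continue  # nothing left but markup
        rows.append(WordRow(book, chapter, verse, len(rows) + 1, token, italic,
                            punctuation, paren_start, paren_end,
                            paragraph and not rows))
    return rows

# Usage with a sample clause in the hypothetical markup.
rows = verse_to_rows("Genesis", 1, 2,
                     "And the earth was without form, and void; "
                     "and darkness [was] upon the face of the deep.")
for r in rows[9:12]:
    print(r.position, r.word, r.italic, repr(r.punctuation))
```

Keeping punctuation and italics in their own columns, rather than inside the word, is what lets a plain query count or filter on them later.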
This serves as a foundational building block for efficiently adding more metadata, for both simple searches and advanced analysis. First, I’ll add Strong’s numbers. Then come location information, genealogical relationships, speakers, timelines, and nearly anything else that can be linked back to the root text. The diagram below illustrates how this information will be joined together.
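The joining idea can be sketched with Python's built-in `sqlite3` module. The table and column names (`words`, `word_strongs`) are hypothetical, and only four sample Strong's numbers for Genesis 1:1 are loaded; the point is that any module keyed on book, chapter, verse, and position can be joined back to the word table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (
    book TEXT, chapter INTEGER, verse INTEGER, position INTEGER,
    word TEXT, italic INTEGER,
    PRIMARY KEY (book, chapter, verse, position)
);
-- Hypothetical metadata module: one row per (word, Strong's number).
CREATE TABLE word_strongs (
    book TEXT, chapter INTEGER, verse INTEGER, position INTEGER,
    strongs TEXT
);
""")

verse = "In the beginning God created the heaven and the earth".split()
conn.executemany(
    "INSERT INTO words VALUES ('Genesis', 1, 1, ?, ?, 0)",
    list(enumerate(verse, start=1)),  # (position, word) pairs
)
conn.executemany(
    "INSERT INTO word_strongs VALUES ('Genesis', 1, 1, ?, ?)",
    [(4, "H430"), (5, "H1254"), (7, "H8064"), (10, "H776")],
)

# Join the module back to the words on the shared location key.
rows = conn.execute("""
SELECT w.position, w.word, s.strongs
FROM words AS w
LEFT JOIN word_strongs AS s
  ON s.book = w.book AND s.chapter = w.chapter
 AND s.verse = w.verse AND s.position = w.position
ORDER BY w.position
""").fetchall()
for row in rows:
    print(row)
```

A LEFT JOIN keeps words that have no number attached (such as "the"), and a word tagged with two numbers would simply appear twice. That is one reason to keep each module in its own table rather than adding columns to the word table.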
Even before any new modules are included, some useful analysis can be performed: readability statistics for individual books (or any subset of your choosing), writing-style analysis (How long are the sentences and paragraphs? What words does the author favor?), or simple word counts (How many italicized words are there? How many unique words are there?).
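Several of these counts fall straight out of the word rows. This minimal sketch assumes each row has been reduced to a hypothetical `(word, italic, punctuation)` tuple and treats a word followed by ".", "?", or "!" as the end of a sentence; the sample rows are a toy example, not Bible text.

```python
from collections import Counter

SENTENCE_END = {".", "?", "!"}  # assumed sentence-ending marks

def word_stats(rows):
    """rows: iterable of (word, italic, punctuation) tuples in text order."""
    rows = list(rows)
    words = [w.lower() for w, _, _ in rows]  # case-insensitive counting
    sentences = sum(1 for _, _, p in rows if p in SENTENCE_END)
    return {
        "total_words": len(rows),
        "unique_words": len(set(words)),
        "italic_words": sum(1 for _, italic, _ in rows if italic),
        "sentences": sentences,
        "avg_sentence_length": len(rows) / sentences if sentences else 0.0,
        "top_words": Counter(words).most_common(2),
    }

# Toy rows; the italic flag on "was" is illustrative only.
sample = [
    ("The", False, ""), ("word", False, ""), ("was", True, ""),
    ("with", False, ""), ("the", False, ""), ("people", False, "."),
    ("The", False, ""), ("people", False, ""), ("heard", False, ""),
    ("it", False, "?"),
]
print(word_stats(sample))
```

Readability formulas such as Flesch Reading Ease combine words per sentence with syllables per word, so they would need an extra syllable-counting step beyond these columns.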
MetaV is a new translation in the true sense of the word: it “slides” the words to a new position to make them easier for a database language to read. I have done nothing to remove words or change their meaning (as too many modern translations do), and I have taken great care to ensure that every programming detail is correct. In the coming weeks and months I will publish the results of some analysis made simpler by this new tool, so stay tuned!
Update 6-11-2011: Version 1 has been deprecated. You can download MetaV 2.0 here.