Answering Big Questions With Big Data

The words of scripture create a tightly knit fabric; exciting pictures emerge when we weave them together with silicon and electrons.  The field of Big Data is rapidly expanding the possibilities for quantitatively and visually analyzing text as complex and rich as that of the Bible.  With it we can more easily study language structures, compare writing styles, or search for hidden codes.

Textual Analysis

One of the more difficult areas of big data is text mining.  Text is “unstructured” in the sense that it isn’t arranged in a way a computer can easily understand.  Machines have a very difficult time with natural language, though major search engines and a number of startups are making great strides in that area.  For the most part, language is analyzed according to word frequency or proximity to other words of a known type.  I know of at least two practical examples in biblical studies.

First is Steven Boyd’s work in the RATE project.  He presented a statistical approach to determining whether a passage is prose or poetry.  Specifically, he looked at the distribution of four types of finite verbs in sections that are indisputably poetic and in those that are indisputably prose.  We can then take a text whose genre is disputed (Genesis 1:1-2:3 in this case) and compare its distribution of verb forms against both groups to categorize it appropriately.  Boyd’s study was limited enough that it wouldn’t be put in the big data category, but the techniques would be similar with a much larger set of passages.
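To make the idea concrete, here is a minimal sketch of that kind of comparison.  It is not Boyd’s actual statistical model: it simply turns verb counts into proportions and assigns a passage to whichever group average it sits closer to.  The four verb-form labels, the function names, and every count in the usage example are assumptions made up for illustration.

```python
import math

# Placeholder labels for four finite verb forms (an assumption for
# illustration; substitute whatever categories the study actually uses).
VERB_FORMS = ("preterite", "imperfect", "perfect", "waw_perfect")


def proportions(counts):
    """Convert raw verb-form counts for one passage into proportions."""
    total = sum(counts.get(form, 0) for form in VERB_FORMS)
    if total == 0:
        raise ValueError("passage has no finite verbs counted")
    return [counts.get(form, 0) / total for form in VERB_FORMS]


def centroid(passages):
    """Average the proportion vectors of a group of passages."""
    vectors = [proportions(p) for p in passages]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(counts, prose_passages, poetry_passages):
    """Label a passage by the nearer group average (Euclidean distance)."""
    vector = proportions(counts)
    to_prose = math.dist(vector, centroid(prose_passages))
    to_poetry = math.dist(vector, centroid(poetry_passages))
    return "prose" if to_prose <= to_poetry else "poetry"


# Invented toy counts, NOT real data from any biblical passage.
known_prose = [{"preterite": 8, "imperfect": 1, "perfect": 1}]
known_poetry = [{"preterite": 1, "imperfect": 6, "perfect": 3}]
disputed = {"preterite": 7, "perfect": 3}
label = classify(disputed, known_prose, known_poetry)
```

With real counts from many passages, a proper statistical model would also report how confident the classification is, which a nearest-average rule like this one cannot do.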

Another project, published at openbible.info, explores the “sentiment” of every biblical event.  In basic terms, a program calculates the frequency of words generally considered to convey a positive sentiment vs. those that are more negative.  This approach is more useful to marketers studying customer reaction to their brand than to serious biblical analysis, but I do think it’s a good starting point and will prove more useful as language processing algorithms become more advanced and widespread.
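The word-counting idea behind this kind of sentiment analysis can be sketched in a few lines.  This is only an illustration of the general technique, not the openbible.info implementation; the two word lists and the function name are made up, and a real analysis would use a published sentiment lexicon.

```python
import re

# Tiny made-up word lists, for illustration only.
POSITIVE = {"blessed", "joy", "love", "peace", "rejoice"}
NEGATIVE = {"wrath", "curse", "death", "fear", "weep"}


def sentiment_score(text):
    """Score text from -1.0 (all negative words) to 1.0 (all positive).

    Returns 0.0 when no word from either list appears.
    """
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(1 for word in words if word in POSITIVE)
    negative = sum(1 for word in words if word in NEGATIVE)
    matched = positive + negative
    return (positive - negative) / matched if matched else 0.0
```

Note that a score like this ignores negation and context entirely (“no fear” counts as negative), which is one reason the approach is only a starting point.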

Bible Codes

A far better known and more controversial field is that of Bible codes.  To even approach a debate on the significance or meaning of messages some claim God encrypted in the Bible, we must have good data to back it up, and lots of it.  Consider a well-known example: taking every 50th letter of either Genesis or Exodus spells out the word “Torah.”  To argue for or against the notion that this is evidence of divine cryptography, we must know how likely we are to find the same phenomenon elsewhere.  That means gathering writings in the same language from the same time period as well as books from other languages and periods.  In other words, big data.
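A search like this is simple to automate, which is what makes the comparison across many texts feasible.  The sketch below looks for a word spelled out at a fixed letter interval and reports how often it turns up relative to the number of places it could start, so that texts of different lengths can be compared.  The function names are my own, and the toy string stands in for a real text; serious work would run on the Hebrew letters themselves.

```python
def find_els(text, word, skip):
    """Return start positions where `word` is spelled by every `skip`-th letter.

    Only alphabetic characters are counted; spaces and punctuation are ignored.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    word = word.lower()
    span = (len(word) - 1) * skip  # distance from first to last letter
    hits = []
    for start in range(len(letters) - span):
        if all(letters[start + i * skip] == ch for i, ch in enumerate(word)):
            hits.append(start)
    return hits


def els_rate(text, word, skip):
    """Hits divided by the number of possible starting positions."""
    letter_count = sum(1 for c in text if c.isalpha())
    possible = letter_count - (len(word) - 1) * skip
    if possible <= 0:
        return 0.0
    return len(find_els(text, word, skip)) / possible


# Toy string built so that every 3rd letter spells the word.
toy_hits = find_els("txx oxx rxx axx h", "torah", 3)
```

Running `els_rate` over a large collection of control texts gives a rough baseline for how often the same pattern appears by chance.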

Books, software, and videos abound with claims of similar discoveries, ranging from simple words to more complex and unlikely phrases.  I have not done the rigorous statistical work of verifying or refuting the claims myself, but some seem quite compelling.  In any case, newer technologies and mathematical discoveries are sure to shed new light on this subject as time passes.

Other Big Data Applications

Fresh possibilities abound, from authorship analysis to readability, n-grams, and much more.  It is an exciting time to be involved in big data programming and visualization.  Big data won’t answer questions about where we come from, why we’re here, or where we’re going any better than God’s words already have, but it does have some potential to expand our understanding of those words.  In what ways do you think big data could aid Bible studies?