cougarguard.com — unofficial BYU Cougars / LDS sports, football, basketball forum and message board  

Old 08-18-2007, 05:31 PM   #1
Solon
Senior Member
 
Join Date: Mar 2007
Location: Happy Valley, PA
Posts: 1,866
Stylometric Analysis of Scripture

Can any of you stats folks figure this out?

http://links.jstor.org/sici?sici=096...3E2.0.CO%3B2-Z

I could only handle the introduction and conclusions.

If you would like a pdf of the entire article and don't have access to JSTOR, send me a Private Message with an e-mail address. Just make sure you observe "fair use" guidelines.
__________________
I hope for nothing. I fear nothing. I am free. - Epitaph of Nikos Kazantzakis (1883-1957)
Old 08-18-2007, 06:43 PM   #2
pelagius
Senior Member
 
Join Date: Nov 2006
Posts: 1,431

Quote:
Originally Posted by Solon View Post
Can any of you stats folks figure this out?

http://links.jstor.org/sici?sici=096...3E2.0.CO%3B2-Z

I could only handle the introduction and conclusions.

If you would like a pdf of the entire article and don't have access to JSTOR, send me a Private Message with an e-mail address. Just make sure you observe "fair use" guidelines.
Solon, do you have specific questions? I don't do stylometrics in my professional work but I am generally familiar with it. I read this paper once a long time ago and I don't keep up with the developments in the literature. I have little doubt that BYU or FARMS responded to this study, but I am unaware of the response.

If you are looking for a kind of summary of what he finds then look at figure 1. Can you see how the same-author samples don't cluster together (for example, look at the Mormon samples)? They often cluster closer to other authors (except for the Joseph Smith writing samples). I think that may be the most important point he makes. There appears to be a fair amount of variation within author (as identified by the Book of Mormon). The within-author variation looks at least as big as the between-author variation. (I think that study uses measures of vocabulary richness, as compared to the early BYU stuff that looked at the frequency of non-contextual words.) In summary, the original stuff said that Mormon and Nephi don't write the same, and basically this study says that Mormon doesn't write the same as Mormon, and the difference is as big as the Mormon/Nephi difference.
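A rough way to put numbers on the within-author versus between-author comparison is to average the distances between samples that share an author label and between samples that do not. The sketch below is only an illustration: the author labels and feature vectors are made up, and plain Euclidean distance is an assumption, not the paper's method.

```python
import math
from itertools import combinations

def mean_distances(samples):
    """samples: list of (author, feature_vector) pairs.

    Returns (mean within-author distance, mean between-author distance).
    """
    within, between = [], []
    for (a1, x1), (a2, x2) in combinations(samples, 2):
        bucket = within if a1 == a2 else between
        bucket.append(math.dist(x1, x2))
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical two-feature summaries of four writing samples (made-up numbers).
samples = [
    ("Author A", (0.0, 0.0)),
    ("Author A", (3.0, 4.0)),
    ("Author B", (0.0, 4.0)),
    ("Author B", (3.0, 0.0)),
]
within, between = mean_distances(samples)
print(f"within-author: {within:.2f}, between-author: {between:.2f}")
```

With these invented numbers the within-author average comes out larger than the between-author average, which is the kind of pattern that makes author labels hard to recover from the measures.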

Implications

I think Mormons should be wary of relying on or turning to stylometric results for support of multiple authorship. At best the empirical evidence in favor of the result isn't robust, and probably should be described as mixed.

Second, I can't for the life of me figure out why, even if it is an ancient document, one would expect there to be evidence of multiple authorship given what we know of the translation process. Also, I think you can construct reasonable hypotheses where it is a 19th-century document with multiple authorship. I just don't see how a sharp hypothesis with regard to multiple authorship can be generated (in either direction).

Last edited by pelagius; 08-18-2007 at 06:54 PM.
Old 08-18-2007, 06:50 PM   #3
Solon
Senior Member
 
Join Date: Mar 2007
Location: Happy Valley, PA
Posts: 1,866

Quote:
Originally Posted by pelagius View Post
Solon, do you have specific questions? I don't do stylometrics in my professional work but I am generally familiar with it. I read this paper once a long time ago and I don't keep up with the developments in the literature. I am sure that BYU/FARMS responded to this study, but I am unaware of the response.

If you are looking for a kind of summary of what he finds then look at figure 1. Can you see how the same-author samples don't cluster together (for example, look at the Mormon samples)? They often cluster closer to other authors (except for the Joseph Smith writing samples). I think that may be the most important point he makes. There appears to be a fair amount of variation within author (as identified by the Book of Mormon). The within-author variation looks at least as big as the between-author variation. (I think that study uses measures of vocabulary richness, as compared to the early BYU stuff that looked at the frequency of non-contextual words.)

Implications

I think Mormons should be wary of relying on or turning to stylometric results for support of multiple authorship. At best the empirical evidence in favor of the result isn't robust, and probably should be described as mixed.

Second, I can't for the life of me figure out why, even if it is an ancient document, one would expect there to be evidence of multiple authorship given what we know of the translation process. Also, I think you can construct reasonable hypotheses where it is a fraud with multiple authorship.
I don't have any specific questions - just stumbled across it looking for something else and was wondering if anyone knew about the stats involved. I'm reluctant to give much credence to measuring something like style, but then again, what do I know?

I've come across stylometry in Classics with people trying to prove/disprove Aeschylean authorship of Prometheus Bound, but that was small potatoes compared to the algorithms in this article.

I once heard Rick Majerus say something along the lines of, "Statistics are like bathing suits: they reveal a lot but conceal the most important parts."

Is anyone familiar with this line of research?
__________________
I hope for nothing. I fear nothing. I am free. - Epitaph of Nikos Kazantzakis (1883-1957)
Old 08-18-2007, 06:58 PM   #4
Archaea
Assistant to the Regional Manager
 
Join Date: Aug 2005
Location: The Orgasmatron
Posts: 24,338

That study is just a mid study, and Pelagius has stated the general conclusion about stylometrics, namely that the results are interesting but so far removed from proving much that emphasis on them has been mostly abandoned.

Another term coined for them is "wordprint", but stylometrics is the academic term. In the end, the style of this study and others could be considered "Much Ado About Nothing."
__________________
Ἓν οἶδα ὅτι οὐδὲν οἶδα
Old 08-18-2007, 07:12 PM   #5
ChinoCoug
Senior Member
 
Join Date: Jan 2006
Location: NOVA
Posts: 3,005

I should be able to access this article from work on Monday. I'll look into it.
__________________
太初有道
Old 08-18-2007, 07:31 PM   #6
pelagius
Senior Member
 
Join Date: Nov 2006
Posts: 1,431

Quote:
Originally Posted by Solon View Post
I once heard Rick Majerus say something along the lines of, "Statistics are like bathing suits: they reveal a lot but conceal the most important parts."

Is anyone familiar with this line of research?
I guess I am still unsure of what you are looking for, Solon. (I promise I am not trying to give you a hard time about this. I hope it doesn't come across that way. I am honestly never quite sure what level of detail people want when they ask statistical questions.) Did you want me to explain the methodology used in the paper? If you want, I can do that (I would be surprised if you really want me to). For example, one of his core measures is the Sichel distribution, which measures the probability that a word (he only uses nouns) appears X times in an N-word sample (N=1000 in the paper). The distribution is a two-parameter distribution: alpha and theta (they jointly describe the shape of the distribution the way the mean and standard deviation do for the normal distribution). The author doesn't specify, but it makes the most sense to estimate the distribution using maximum likelihood. You can then compare the different writing samples by testing whether the alphas and thetas differ across the writing samples.
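To make the Sichel step a little more concrete, here is a rough Python sketch. It assumes the zero-truncated form of the Sichel distribution with its third shape parameter fixed at -1/2, which is my understanding of the usual stylometric setup and is not confirmed against the paper. It also swaps in a crude grid search for a proper maximum likelihood routine. The frequency counts and grid ranges are made up for illustration.

```python
import math

def sichel_pmf(alpha, theta, r_max):
    """Zero-truncated Sichel probabilities for r = 1..r_max (third parameter fixed at -1/2)."""
    norm = math.expm1(alpha * (1.0 - math.sqrt(1.0 - theta)))
    half = alpha * theta / 2.0
    # Scaled Bessel terms for r = 0 and r = 1; later terms follow from the
    # three-term recurrence for half-integer modified Bessel functions.
    s_prev, s_curr = 1.0, half
    pmf = [s_curr / norm]
    for r in range(1, r_max):
        s_next = (half * half / (r * (r + 1))) * s_prev \
                 + (theta * (2 * r - 1) / (2 * (r + 1))) * s_curr
        s_prev, s_curr = s_curr, s_next
        pmf.append(s_curr / norm)
    return pmf

def log_likelihood(spectrum, alpha, theta):
    """spectrum maps r -> number of distinct nouns seen exactly r times."""
    pmf = sichel_pmf(alpha, theta, max(spectrum))
    return sum(n * math.log(pmf[r - 1]) for r, n in spectrum.items())

def fit_sichel(spectrum, alphas, thetas):
    """Crude maximum likelihood: keep the grid point with the highest log-likelihood."""
    grid = [(a, t) for a in alphas for t in thetas]
    return max(grid, key=lambda p: log_likelihood(spectrum, p[0], p[1]))

# Made-up frequency spectrum for one hypothetical 1000-word sample.
spectrum = {1: 120, 2: 30, 3: 12, 4: 6, 5: 3, 6: 2, 8: 1}
alphas = [0.1 * k for k in range(1, 101)]   # hypothetical search range for alpha
thetas = [0.01 * k for k in range(5, 100)]  # theta must stay strictly between 0 and 1
alpha_hat, theta_hat = fit_sichel(spectrum, alphas, thetas)
print(alpha_hat, theta_hat)
```

Fitting each writing sample this way gives one (alpha, theta) pair per sample, and those pairs are what would then be compared across samples. A real analysis would use a numerical optimizer and report standard errors instead of a grid.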

He then combines this measure with four other similar measures and explores commonality using two different approaches: cluster analysis and principal component analysis.
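For anyone who wants to see what those two approaches look like mechanically, here is a small Python sketch. Everything in it is an assumption made for illustration: the sample labels and feature values are invented, average-linkage clustering stands in for whatever clustering method the paper uses, and the first principal component is found by simple power iteration.

```python
import math

def standardize(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def first_component(rows, iterations=200):
    """Leading principal direction of centered rows, via power iteration."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def average_linkage(rows, labels, k):
    """Repeatedly merge the two clusters with the smallest average distance until k remain."""
    clusters = [[i] for i in range(len(rows))]

    def gap(a, b):
        return sum(math.dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        a, b = min(pairs, key=lambda p: gap(p[0], p[1]))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return [sorted(labels[i] for i in c) for c in clusters]

# Invented stylometric features for four hypothetical writing samples.
labels = ["A1", "A2", "B1", "B2"]
features = [[1.0, 2.0, 0.5], [1.1, 2.1, 0.4], [3.0, 0.5, 2.0], [3.2, 0.4, 2.1]]
z = standardize(features)
groups = average_linkage(z, labels, k=2)
pc1 = first_component(z)
scores = [sum(a * b for a, b in zip(row, pc1)) for row in z]
print(groups)
print([round(s, 2) for s in scores])
```

In this toy data the samples from the same source land in the same cluster and sit on the same side of the first component. The point made about figure 1 is that the real same-author samples do not separate this cleanly.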

Are you asking for a comment on whether this approach is subjective?

There is some truth to that charge. The stylometrician has a fair number of degrees of freedom in terms of the design of the test.

Last edited by pelagius; 08-18-2007 at 08:03 PM.
Old 08-19-2007, 02:15 PM   #7
Solon
Senior Member
 
Join Date: Mar 2007
Location: Happy Valley, PA
Posts: 1,866

Quote:
Originally Posted by pelagius View Post
I guess I am still unsure of what you are looking for, Solon. (I promise I am not trying to give you a hard time about this. I hope it doesn't come across that way. I am honestly never quite sure what level of detail people want when they ask statistical questions.) Did you want me to explain the methodology used in the paper? If you want, I can do that (I would be surprised if you really want me to). For example, one of his core measures is the Sichel distribution, which measures the probability that a word (he only uses nouns) appears X times in an N-word sample (N=1000 in the paper). The distribution is a two-parameter distribution: alpha and theta (they jointly describe the shape of the distribution the way the mean and standard deviation do for the normal distribution). The author doesn't specify, but it makes the most sense to estimate the distribution using maximum likelihood. You can then compare the different writing samples by testing whether the alphas and thetas differ across the writing samples.

He then combines this measure with four other similar measures and explores commonality using two different approaches: cluster analysis and principal component analysis.

Are you asking for a comment on whether this approach is subjective?

There is some truth to that charge. The stylometrician has a fair number of degrees of freedom in terms of the design of the test.
No, you're not giving me a hard time, and thanks for all the details you provide. I have nothing other than passing curiosity, and wondered if anyone was familiar with this type of endeavor and what he/she thought of it. Has stylometry been accepted in other fields, or is this just a statistical game? I'm not looking for hardcore answers, just curious about the size of the iceberg beneath this tip.
__________________
I hope for nothing. I fear nothing. I am free. - Epitaph of Nikos Kazantzakis (1883-1957)
Old 08-19-2007, 03:47 PM   #8
Archaea
Assistant to the Regional Manager
 
Join Date: Aug 2005
Location: The Orgasmatron
Posts: 24,338

Quote:
Originally Posted by Solon View Post
No, you're not giving me a hard time, and thanks for all the details you provide. I have nothing other than passing curiosity, and wondered if anyone was familiar with this type of endeavor and what he/she thought of it. Has stylometry been accepted in other fields, or is this just a statistical game? I'm not looking for hardcore answers, just curious about the size of the iceberg beneath this tip.
I believe it's been attempted in literature but not with much acclaim or success.
__________________
Ἓν οἶδα ὅτι οὐδὲν οἶδα
Old 08-19-2007, 04:51 PM   #9
pelagius
Senior Member
 
Join Date: Nov 2006
Posts: 1,431

Quote:
Originally Posted by Solon View Post
No, you're not giving me a hard time, and thanks for all the details you provide. I have nothing other than passing curiosity, and wondered if anyone was familiar with this type of endeavor and what he/she thought of it. Has stylometry been accepted in other fields, or is this just a statistical game? I'm not looking for hardcore answers, just curious about the size of the iceberg beneath this tip.
I think Arch has it about right. I have seen it used in various disciplines. For example, you see it pop up in the 70s in Biblical Studies in terms of trying to establish authorship of things like Isaiah. I think the study you link to underscores some of the problems with the measures that are used. Namely, how stable are these measures within an author? If within-author variation is large, then you can't make meaningful inferences in terms of identifying authors.

I don't know the literature well enough to give a sense for the advantages or disadvantages of various measures in general. The article does talk about these issues a little bit. I will say that I don't like the original stuff by BYU that relied on non-contextual word use patterns ("and it came to pass") because it seems likely that non-contextual word use is affected by translator choice or preference.
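In case it helps to see what a non-contextual word measure amounts to, here is a minimal Python sketch that counts how often a handful of function words appear per 1000 words. The word list and the sample sentence are hypothetical, chosen only for illustration, and are not the list or text used in the BYU work.

```python
import re
from collections import Counter

# Hypothetical list of non-contextual (function) words.
FUNCTION_WORDS = ("and", "the", "of", "that", "to", "in")

def function_word_rates(text, words=FUNCTION_WORDS, per=1000):
    """Occurrences of each listed word per `per` words of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {w: per * counts[w] / total for w in words}

# Invented sample sentence, not a quotation from any source text.
sample = ("and it came to pass that the king went to the city "
          "and spoke to the people of the land")
rates = function_word_rates(sample)
print(rates)
```

Rates like these depend only on small connective words, which is exactly why a translator's habits could shift them without any change in the underlying author.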

However, I think stylometrics could be useful, but one needs to have a pretty sharp hypothesis about how the different proposed authors wrote. In such a case I think the results could be quite compelling.