Well, the title may be a bit too grandiose, but my little investigation may surprise you too.

When I read a paper stating “We were able to show…”, I cringe (and I delete the phrase from any paper I co-author). To simplify, there are two ways to do science. Approach A is to “… perceive whatever holds / The world together in its inmost folds”; approach B is “look how much work/ingenuity I have invested / impact factor harvested / publications put out / recognition earned” (yes, I said it’s simplified ;-). So for me approach A is typified by “We found…”, while approach B is revealed by “We were able to show…”. Now, I know myself: I can go overboard with my opinions. So during my last sleepless night it occurred to me to check whether my feeling that everyone these days is “able to show / demonstrate…” is just my obsession or based on actual fact. So I went to Google Scholar, where it is easy to narrow searches to time ranges, and compared the number of occurrences of “we found” versus “we were able to show”.

Based on a quick web query I learned the term “web scraper” and built a simple one to extract the number of results. [This is “iffed out” in the code below, since Google banned me after a few tries. See the comments on “scraping” Google Scholar; ideas?]

Amazingly, the ratio of “we could show” over “we found” changed from 0.5–0.7% in 1980–2000 to 6% in 2011; an 8th(!) order polynomial was necessary to model the steep rise by a factor of 10 in the last ≈5 years!
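As a quick check of the arithmetic, here is a minimal sketch that recomputes the ratio for the two end points. It assumes that the rows of the data table in the script run in order from 1980 to 2011, so the first and last rows are taken as 1980 and 2011; the names `hits`, `ratioPercent` and `foldChange` are only for this illustration.

```r
# Hit counts copied from the first and last rows of the data table in the script
# (assumed to be 1980 and 2011): nStyle1 is "we found", nStyle2 the boasting phrase.
hits <- data.frame(
  year    = c(1980, 2011),
  nStyle1 = c(39200, 64200),
  nStyle2 = c(282, 3950)
)

# Ratio of style 2 over style 1, in percent
hits$ratioPercent <- 100 * hits$nStyle2 / hits$nStyle1
print(round(hits$ratioPercent, 2))   # 0.72 6.15

# Fold change of the ratio between the two years
foldChange <- hits$ratioPercent[2] / hits$ratioPercent[1]
print(round(foldChange, 1))          # 8.6
```

The end-to-end change comes out a little below 10; relative to the lowest ratios of about 0.5% it is larger.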

There’s lots to be argued: is the wording I chose adequate, is it occasionally OK to say “we could show…” (yes), why did I start in 1980, and so on… This post is already too long. Finally, here’s the code, and I would love to learn how to improve it (e.g. get rid of the for loop) and how to avoid a Google ban (which shows up as “this page has moved”).

Code Begin

# looking for a style change in scientific papers over time

# © 2012-08-05 Michael Bach

# <https://michaelbach.de> bach@uni-freiburg.de

# a simple scholar scraper

googleScholarHits <- function(searchString, yearStart, yearEnd) {

require(RCurl)

url = paste0("http://scholar.google.com/scholar?as_ylo=", 

	as.character(yearStart), "&as_yhi=",  as.character(yearEnd), 

	"&q=%22", searchString, "%22&hl=en&num=1&as_sdt=0")

print(url)

webpage = as.character(getURL(url))

print(substr(webpage, 1, 500))

require(stringr)

start = str_locate(webpage, fixed(">About "))[2]

end = str_locate(webpage, fixed(" results ("))[1]

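# take the text between ">About " and " results (", then strip all thousands separators

number = substr(webpage, start + 1, end - 1)

return(as.numeric(gsub(pattern=",", replacement="", x=number, fixed=TRUE)))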

}

style1string="we+found"; style2string="we+have+shown"

googleScholarHits(style1string, 2009, 2009) # a manual test

years = seq(from=1980, to=2011, by=1)

if (FALSE) {

nStyle1 = array();  nStyle2 = array();  i = 1

for (year in years) {

	nStyle1[i] = googleScholarHits(style1string, year, year)

	nStyle2[i] = googleScholarHits(style2string, year, year)

	i = i+1;  

}

d = data.frame(years, nStyle1, nStyle2)

} else {

# since the scraper caused Google to "ban" me, here are the literal findings

dRaw = "year	nStyle1	nStyle2
1980	39200	282
1981	44400	268
1982	49400	292
1983	56000	321
1984	61600	352
1985	66300	359
1986	72300	397
1987	79000	448
1988	87300	466
1989	96500	528
1990	109000	558
1991	115000	583
1992	123000	657
1993	135000	739
1994	146000	737
1995	158000	880
1996	168000	907
1997	176000	1010
1998	185000	1110
1999	196000	1190
2000	218000	1600
2001	214000	1440
2002	218000	1600
2003	225000	1860
2004	229000	2260
2005	215000	2360
2006	204000	2690
2007	187000	2950
2008	170000	3170
2009	146000	3520
2010	102000	3580
2011	64200	3950"

d = read.table(textConnection(dRaw), header=TRUE)

}

d$nStylesRatio = d$nStyle2 / d$nStyle1

library(ggplot2); library(scales)

ggplot(data=d, aes(x=years, y=nStylesRatio)) +

geom_point(size=5) +

stat_smooth(method = "lm", formula=y ~ poly(x, 8), size=2) +

scale_y_continuous(labels=percent) +

coord_cartesian(xlim=c(1980, max(years)+1), ylim=c(0, 1.05*max(d$nStylesRatio, na.rm=T))) +

labs(x="Time [year]", y = "Style Ratio [%]") +

ggtitle(paste0("Paper style change: \"", style1string, "\"/\"", style2string, "\""))
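On getting rid of the for loop: a minimal sketch, not a definitive answer. The helper `collectHits`, its `pauseSeconds` argument and the offline stand-in `fakeFetch` are hypothetical names introduced here; `fetch` can be any function with the same three arguments as `googleScholarHits`. The pause between requests is only a courtesy, and there is no guarantee that it prevents a block.

```r
# Collect one hit count per year without an explicit for loop.
# `fetch` is any function(searchString, yearStart, yearEnd) returning one number;
# `pauseSeconds` is a hypothetical delay inserted before each request.
collectHits <- function(years, searchString, fetch, pauseSeconds = 0) {
  vapply(years, function(year) {
    Sys.sleep(pauseSeconds)
    as.numeric(fetch(searchString, year, year))
  }, numeric(1))
}

# Offline stand-in for the scraper, so the sketch runs without touching Google.
# It returns made-up numbers, not real hit counts.
fakeFetch <- function(searchString, yearStart, yearEnd) {
  nchar(searchString) * 1000 + (yearStart - 1980)
}

years <- 1980:1983
d <- data.frame(
  years,
  nStyle1 = collectHits(years, "we+found", fakeFetch),
  nStyle2 = collectHits(years, "we+have+shown", fakeFetch)
)
d$nStylesRatio <- d$nStyle2 / d$nStyle1
print(d)
```

With the real scraper the call would look like `collectHits(years, style1string, googleScholarHits, pauseSeconds = 5)`, where the value 5 is an arbitrary choice. `vapply` also checks that every request yields exactly one number, so a failed page parse surfaces as an error instead of silently producing a shorter vector.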

2012-08-05 (tags: science, r)

Scientific Style Change: Boasting exploded in the last 5 years – with R code to prove it