When do we expect conformal prediction sets to be helpful? 

This is Jessica. Over on substack, Ben Recht has been posing some questions about the value of prediction bands with marginal guarantees, such as one gets from conformal prediction. It’s an interesting discussion that caught my attention since I have also been musing about where conformal prediction might be helpful. 

To briefly review, given a training data set (X1, Y1), … ,(Xn, Yn), and a test point (Xn+1, Yn+1) drawn from the same distribution, conformal prediction returns a subset of the label space for which we can make coverage guarantees about the probability of containing the test point’s true label Yn+1. A prediction set Cn achieves distribution-free marginal coverage at level 1 − alpha when P(Yn+1 ∈ Cn(Xn+1)) >= 1 − alpha for all joint distributions P on (X, Y). The commonly used split conformal prediction process attains this by adding a couple of steps to the typical modeling workflow: you first split the data into a training set and a calibration set, fitting the model on the training set. You choose a heuristic notion of uncertainty from the trained model, such as the softmax values (pseudo-probabilities from the last layer of a neural network), to create a score function s(x,y) that encodes disagreement between x and y (in a regression setting these are typically the absolute residuals). You compute q_hat, the ceil((n+1)(1-alpha))/n quantile of the scores on the calibration set, where n here is the size of the calibration set. Then given a new instance x_n+1, you construct a prediction set for y_n+1 by including all y’s for which the score is less than or equal to q_hat. There are various ways to achieve slightly better performance, such as using cumulative summed scores and regularization instead.
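The split conformal steps above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the discussion: the "model" is a synthetic stand-in that emits random softmax rows, the labels are sampled from those same rows, the score is one minus the softmax value of the candidate label, and all function names are made up for the sketch.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score (inf if that rank exceeds n)."""
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1] if 1 <= k <= n else np.inf

def prediction_sets(probs, q_hat):
    """For each row of softmax values, keep every label whose score 1 - p is at most q_hat."""
    return [np.flatnonzero(1.0 - row <= q_hat) for row in probs]

def fake_model_outputs(rng, n, n_classes):
    """Hypothetical stand-in for a trained classifier: random softmax rows plus labels drawn from them."""
    logits = 2.0 * rng.normal(size=(n, n_classes))
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    u = rng.random((n, 1))
    labels = np.minimum((u > probs.cumsum(axis=1)).sum(axis=1), n_classes - 1)
    return probs, labels

rng = np.random.default_rng(0)
alpha = 0.1
cal_probs, cal_labels = fake_model_outputs(rng, 1000, 5)    # calibration set
test_probs, test_labels = fake_model_outputs(rng, 2000, 5)  # new instances

# Score of each calibration point: 1 - softmax value of its true label.
cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
q_hat = conformal_quantile(cal_scores, alpha)

sets = prediction_sets(test_probs, q_hat)
coverage = np.mean([y in s for y, s in zip(test_labels, sets)])
avg_size = np.mean([len(s) for s in sets])
print(f"q_hat={q_hat:.3f}  empirical coverage={coverage:.3f}  average set size={avg_size:.2f}")
```

Because the calibration and test points here are exchangeable by construction, the empirical coverage should land close to 1 − alpha; nothing in the sketch says how large or useful the sets are.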

Recht makes several good points about limitations of conformal prediction, including:

—The marginal coverage guarantees are often not very useful. Instead we want conditional coverage guarantees that hold conditional on the value of Xn+1 we observe. But you can’t get true conditional coverage guarantees (i.e., P(Yn+1 ∈ Cn(Xn+1)|Xn+1 = x) >= 1 − alpha for all P and almost all x) if you also want the approach to be distribution free (see e.g., here), and in general you need a very large calibration set to be able to say with high confidence that there is a high probability that your specific interval contains the true Yn+1.

—When we talk about needing prediction bands for decisions, we are often talking about scenarios where the decisions we want to make from the uncertainty quantification are going to change the distribution and violate the exchangeability criterion. 

—Additionally, in many of the settings where we might imagine using prediction sets there is potential for recourse. If the prediction is bad, resulting in a bad action being chosen, the action can be corrected, i.e., “If you have multiple stages of recourse, it almost doesn’t matter if your prediction bands were correct. What matters is whether you can do something when your predictions are wrong.”
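To make the first point above concrete, here is a small simulation in which marginal coverage holds but conditional coverage does not. Everything in it is a made-up assumption for illustration: half the cases are "hard" (noise standard deviation 3) and half "easy" (standard deviation 1), and the "model" always predicts 0, so the conformal score is simply |y|.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.1

def draw(n):
    """Hypothetical data: a hard/easy flag and an outcome whose noise depends on it."""
    hard = rng.random(n) < 0.5
    y = rng.normal(size=n) * np.where(hard, 3.0, 1.0)
    return hard, y

# Calibrate one threshold on pooled data, ignoring the hard/easy flag.
_, y_cal = draw(2000)
scores = np.sort(np.abs(y_cal))
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q_hat = scores[k - 1]

# Evaluate the interval [-q_hat, q_hat] on fresh data, overall and by group.
hard, y_test = draw(20000)
covered = np.abs(y_test) <= q_hat
marginal = covered.mean()
cov_easy = covered[~hard].mean()
cov_hard = covered[hard].mean()
print(f"marginal={marginal:.3f}  easy={cov_easy:.3f}  hard={cov_hard:.3f}")
```

In runs like this the overall coverage sits near 90%, while the hard cases are covered noticeably less often and the easy cases almost always. A doctor looking at a hard case would be getting less than the advertised guarantee.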

Recht also criticizes research on conformal prediction as being fixated on the ability to make guarantees, irrespective of how useful the resulting intervals are. E.g., we can produce sets with 95% coverage with only two points, and the guarantees are always about coverage instead of the width of the interval.

These are valid points, worth discussing given how much attention conformal prediction has gotten lately. Some of the concerns remind me of the same complaints we often hear about traditional confidence intervals we put on parameter estimates, where the guarantees we get (about the method) are also generally not what we want (about the interval itself) and only actually summarize our uncertainty when the assumptions we made in inference are all good, which we usually can’t verify. A conformal prediction interval is about uncertainty in a model’s prediction on a specific instance, which perhaps makes it more misleading to some people given that it might not be conditional on anything specific to the instance. Still, even if the guarantees don’t stand as stated, I think it’s difficult to rule out an approach without evaluating how it gets used. In some sense, the meaning of an uncertainty quantification depends on its use, and what the alternatives considered in a given situation are. So I guess I disagree that one can answer the question “Can conformal prediction achieve the uncertainty quantification we need for decision-making?” without considering the specific decision at hand, how we are constructing the prediction set exactly (since there are ways to condition the guarantees on some instance-specific information), and how it would be made without a prediction set.

There are various scenarios where prediction sets are used without a human in the loop, such as to get better predictions or to directly calibrate decisions, where it seems hard to argue that they add no value over not incorporating any uncertainty quantification. Conformal prediction for alignment purposes (e.g., controlling the factuality or toxicity of LLM outputs) seems to be on the rise. However, I want to focus here on a scenario where we are directly presenting a human with the sets. One type of setting where I’m curious whether conformal prediction sets could be useful is one where 1) models are trained offline and used to inform people’s decisions, and 2) it’s hard to rigorously quantify the uncertainty in the predictions using anything the model produces internally, like softmax values, which can be overfit to the training sample.

For example, a doctor needs to diagnose a skin condition and has access to a deep neural net trained on images of skin conditions for which the illness has been confirmed. Even if this model appears to be more accurate than the doctor on evaluation data, the hospital may not be comfortable deploying the model in place of the doctor. Maybe the doctor has access to additional patient information that may in some cases allow them to make a better prediction, e.g., because they can decide when to seek more information through further interaction or monitoring of the patient. This means the distribution does change upon acting on the prediction, and I think Recht would say there is potential for recourse here, since the doctor can revise the treatment plan over time (he lists a similar example here). But still, at any given point in time, there’s a model and there’s a decision to be made by a human.    

Giving the doctor information about the model’s confidence in its prediction seems like it should be useful in helping them appraise the prediction in light of their own knowledge. Similarly, giving them a prediction set over a single top-1 prediction seems potentially preferable so they don’t anchor too heavily on a single prediction. Deep neural nets for medical diagnoses can do better than many humans in certain domains while still having relatively low top-1 accuracy (e.g., here). 

A naive thing to do would be to just choose some number k of predictions from the model we think a doctor can handle seeing at once, and show the top-k with softmax scores. But an adaptive conformal prediction set seems like an improvement in that at least you get some kind of guarantee, even if it’s not specific to your interval like you want. Set size conveys information about the level of uncertainty like the width of a traditional confidence interval does, which seems more likely to be helpful for conveying relative uncertainty than holding set size constant and letting the coverage guarantee change (I’ve heard from at least one colleague who works extensively with doctors that many are pretty comfortable with confidence intervals). We can also take steps toward the conditional coverage that we actually want by using an algorithm that calibrates the guarantees over different labels, or maybe over certain covariates if data allows. 
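One way to calibrate over labels, as mentioned above, is to compute a separate threshold for each label from the calibration points that carry that label. The sketch below assumes the same score as in the review (one minus the softmax value) and uses a four-point toy calibration set; the function names and numbers are invented for illustration and are not taken from any particular paper or library.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest score, or inf when there are too few points."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1] if 1 <= k <= n else np.inf

def classwise_thresholds(cal_probs, cal_labels, alpha):
    """One threshold per label, using only calibration points whose true label is that label."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n_classes = cal_probs.shape[1]
    return np.array([conformal_quantile(scores[cal_labels == k], alpha)
                     for k in range(n_classes)])

def classwise_sets(probs, thresholds):
    """Include label k whenever its score 1 - p_k is at most that label's own threshold."""
    return [np.flatnonzero(1.0 - row <= thresholds) for row in probs]

# Toy calibration data (made up): two labels, two calibration points each.
cal_probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
cal_labels = np.array([0, 0, 1, 1])
thresholds = classwise_thresholds(cal_probs, cal_labels, alpha=0.5)
new_sets = classwise_sets(np.array([[0.65, 0.35]]), thresholds)
print(thresholds, [s.tolist() for s in new_sets])
```

The cost is that each label needs enough calibration points of its own: a rare label with too few gets an infinite threshold and is always included in the set.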

So while I agree with all the limitations, I don’t necessarily agree with Recht’s concluding sentence here:

“If you have multiple stages of recourse, it almost doesn’t matter if your prediction bands were correct. What matters is whether you can do something when your predictions are wrong. If you can, point predictions coupled with subsequent action are enough to achieve nearly optimal decisions.” 

It seems possible that seeing a prediction set (rather than just a single top prediction) will encourage a doctor to consider other diagnoses that they may not have thought of. Presenting uncertainty often has _some_ effect on a person’s reasoning process, even if they can revise their behavior later. The effect of seeing more alternatives could be bad in some cases (they get distracted by labels that don’t apply), or it could be good (a hurried doctor recognizes a potentially relevant condition they might have otherwise overlooked). If we allow for the possibility that seeing a set of alternatives helps, it makes sense to have a way to generate them that gives us some kind of coverage guarantee we can make sense of, even if it gets violated sometimes. 

This doesn’t mean I’m not skeptical of how much prediction sets might change things over more naively constructed sets of possible labels. I’ve spent a bit of time thinking about how, from the human perspective, prediction sets could or could not add value, and I suspect it’s going to be nuanced, and the real value probably depends on how the coverage responds under realistic changes in distribution. There are lots of questions that seem worth trying to answer in particular domains where models are being deployed to assist decisions. Does it actually matter in practice, such as in a given medical decision setting, for the quality of decisions that are made if the decision-makers are given a set of predictions with coverage guarantees versus a top-k display without any guarantees? And what happens when you give someone a prediction set with some guarantee but there are distribution shifts such that the guarantees you give are not quite right? Are they still better off with the prediction set or is this worse than just providing the model’s top prediction or top-k with no guarantees? Again, many of the questions could also be asked of other uncertainty quantification approaches; conformal prediction is just easier to implement in many cases. I have more to say on some of these questions based on a recent study we did on decisions from prediction sets, where we compared how accurately people labeled images using them versus other displays of model predictions, but I’ll save that for another post since this is already long. 

Of course, it’s possible that in many settings we would be better using some inherently interpretable model for which we no longer need a distribution-free approach. And ultimately we might be better off if we can better understand the decision problem the human decision-maker faces and apply decision theory to try to find better strategies  rather than leaving it up to the human how to combine their knowledge with what they get from a model prediction. I think we still barely understand how this occurs even in high stakes settings that people often talk about.

Stabbers gonna stab — fraud edition

One of the themes of Dan Davies’s book, Lying for Money, was that fraudsters typically do their crimes over and over again, until they get caught. And then, when they are released from prison, they do it again. This relates to something I noticed in the Theranos story, which was that the fraud was in open sight for many years and the fraudsters continued to operate in the open.

Also regarding that interesting overlap of science and business fraud, I noted:

There seem to have been two ingredients that allowed Theranos to work. And neither of these ingredients involved technology or medicine. No, the two things were:

1. Control of the narrative.

2. Powerful friends.

Neither of these came for free. Theranos’s leaders had to work hard, for long hours, for years and years, to maintain control of the story and to attract and maintain powerful friends. And they needed to be willing to lie.

The newest story

Ben Mathis-Lilley writes:

On Wednesday, the Department of Justice announced that it has arrested a 48-year-old Lakewood, New Jersey, man named Eliyahu “Eli” Weinstein on charges of operating, quote, “a Ponzi scheme.” . . . How did authorities know that Weinstein was operating a Ponzi scheme? For one thing, he allegedly told associates, while being secretly recorded, that he had “Ponzied” the money they were using to repay investors. . . . Weinstein is further said to have admitted while being recorded that he had hidden assets from federal prosecutors. (“I hid money,” he is said to have told his conspirators, warning them that they would “go to jail” if anyone else found out.) . . .

These stories of “least competent criminals” are always fun, especially when the crime is nonviolent so you don’t have to think too hard about the victims.

What brings this one to the next level is the extreme repeat-offender nature of the criminal:

There was also one particular element of Weinstein’s background that may have alerted the DOJ that he was someone to keep an eye on—namely, that he had just been released from prison after serving eight years of a 24-year sentence for operating Ponzi schemes. More specifically, Weinstein was sentenced to prison for operating a Ponzi scheme involving pretend real estate transactions, then given a subsequent additional sentence for operating a second Ponzi scheme, involving pretend Facebook stock purchases, that he conducted after being released from custody while awaiting trial on the original charges.

Kinda like when a speeding driver runs over some kid and then it turns out the driver had 842 speeding tickets and the cops had never taken away his car, except in this case there’s no dead kid and the perp had already received a 24-year prison sentence.

How is it that he got out after serving only 8 years, anyway?

In January 2021, Weinstein was granted clemency by President Donald Trump at the recommendation of, among others, “the lawyer Alan Dershowitz,” who has frequently been the subject of news coverage in recent years for his work representing Trump and his relationship with the late Jeffrey Epstein.

Ahhhhh.

This all connects to my items #1 and 2 above.

The way Weinstein succeeded (to the extent he could be considered a success) at fraud was control of the narrative. And he got his get-out-of-jail-free card from his powerful friends. “Finding your roots,” indeed.

Stabbers gonna stab

This all reminded me of a story that came out in the newspaper a few decades ago. Jack Henry Abbott was a convicted killer who published a book while in prison. Abbott’s book was supposed to be very good, and he was subsequently released on parole with the support of various literary celebrities including Norman Mailer. Shortly after his release, Abbott murdered someone else and returned to prison, where he spent the rest of his life.

The whole story was very sad, but what made it particularly bizarre was that Abbott’s first murder was a stabbing, his second murder was a stabbing, and his most prominent supporter, Mailer, was notorious for . . . stabbing someone.

A gathering of the literary critics: Louis Menand and Thomas Mallon, meet Jeet Heer

Marshall McLuhan: The environment is not visible. It’s information. It’s electronic.

Norman Mailer: Well, nonetheless, nature still exhibits manifestations which defy all methods of collecting information and data. For example, an earthquake may occur, or a tidal wave may come in, or a hurricane may strike. And the information will lag critically behind our ability to control it.

Regular readers will know that I’m a big fan of literary criticism.  See, for example,

“End of novel. Beginning of job.”: That point at which you make the decision to stop thinking and start finishing

Contingency and alternative history (followup here)

Kazin to Birstein to a more general question of how we evaluate people’s character based on traits that might, at least at first glance, appear to be independent of character (followup here)

“Readability” as freedom from the actual sensation of reading

Things that I like that almost nobody else is interested in

Anthony West’s literary essays

I recently came across a book called “Sweet Lechery: Reviews, Essays and Profiles,” by literary journalist Jeet Heer. The “Lechery” in the title is a bit misleading, but, yes, Heer is open about sexual politics. In any case, like the best literary critics, he engages with the literary works and the authors in the context of politics and society. He has some of the overconfidence of youth—the book came out ten years ago, and some of its essays are from ten or more years before that—, and there’s a bunch of obscure Canadian stuff that doesn’t interest me, but overall I found the writing fun and the topics interesting.

One good thing about the book was its breadth of cultural concerns, including genre and non-genre literature, political writing, and comic books, with the latter taken as of interest in themselves, not merely as some sort of cultural symbol.

I also appreciated that he didn’t talk about movies or pop music. I love movies and pop music, but they’re also such quintessential topics for Boomer critics who want to show their common touch. There are enough other places where I can read about how Stevie Wonder and Brian Wilson are geniuses, that Alex Chilton is over- or under-rated, appreciation of obscure records and gritty films from the 1970s, etc.

My comparison point here is Louis Menand’s book on U.S. cold war culture from 1945-1965, which made me wonder how he decided what to leave in and what to leave out. I’m a big fan of Menand—as far as I’m concerned, he can write about whatever he wants to write about—; it was just interesting to consider all the major cultural figures he left out, even while considering the range of characters he included in that book. Heer writes about Philip Roth but also about John Maynard Keynes; he’s not ashamed to write about, and take seriously, high-middlebrow authors such as John Updike and Alice Munro, while also finding time to write thoughtfully about Robert Heinlein and Philip K. Dick. I was less thrilled with his writing about comics, not because of anything he said that struck me as wrong, exactly, but rather because he edged into a boosterish tone, promotion as much as criticism.

Another comparison from the New Yorker stable of writers is Thomas Mallon, who notoriously wrote this:

[Screenshot of the passage by Mallon, captured 2015-06-14; image not reproduced here]

Thus displaying his [Mallon’s] ignorance of Barry Malzberg, who has similarities with Mailer both in style and subject matter. I guess that Malzberg was influenced by Mailer.

And, speaking of Mailer, who’s written some good things but I think was way way overrated by literary critics during his lifetime—I’m not talking about sexism here, I just think there were lots of other writers of his time who had just as much to say and could say it better, with more lively characters, better stories, more memorable turns of phrase, etc.—; anyway, even though I’m not the world’s biggest Mailer fan, I did appreciate the following anecdote which appeared, appropriately enough, in an essay by Heer about Canadian icon Marshall McLuhan:

Connoisseurs of Canadian television should track down a 1968 episode of a CBC program called The Summer Way, a highbrow cultural and political show that once featured a half-hour debate about technology between McLuhan and the novelist Norman Mailer. . . .

McLuhan: We live in a time when we have put a man-made satellite environment around the planet. The planet is no longer nature. It’s no longer the external world. It’s now the content of an artwork. Nature has ceased to exist.

Mailer: Well, I think you’re anticipating a century, perhaps.

McLuhan: But when you put a man-made environment around the planet, you have in a sense abolished nature. Nature from now on has to be programmed.

Mailer: Marshall, I think you’re begging a few tremendously serious questions. One of them is that we have not yet put a man-made environment around this planet, totally. We have not abolished nature yet. We may be in the process of abolishing nature forever.

McLuhan: The environment is not visible. It’s information. It’s electronic.

Mailer: Well, nonetheless, nature still exhibits manifestations which defy all methods of collecting information and data. For example, an earthquake may occur, or a tidal wave may come in, or a hurricane may strike. And the information will lag critically behind our ability to control it.

McLuhan: The experience of that event, that disaster, is felt everywhere at once, under a single dateline.

Mailer: But that’s not the same thing as controlling nature, dominating nature, or superseding nature. It’s far from that. Nature still does exist as a protagonist on this planet.

McLuhan: Oh, yes, but it’s like our Victorian mechanical environment. It’s a rear-view mirror image. Every age creates as a utopian image a nostalgic rear-view mirror image of itself, which puts it thoroughly out of touch with the present. The present is the enemy.

That’s great! I love how McLuhan keeps saying these extreme but reasonable-sounding things and then, each time, Mailer brings him down to Earth. Norman Mailer, who built much of a career on bloviating philosophizing, is the voice of reason here. The snippet that I put at the top of this post is my favorite: McLuhan as glib Bitcoin bro, Mailer as the grizzled dad who has to pay the bills and fix the roof after the next climate-induced hurricane.

Heer gets it too, writing:

It’s a measure of McLuhan’s ability to recalibrate the intellectual universe that in this debate, Mailer—a Charlie Sheen–style roughneck with a history of substance abuse, domestic violence, and public mental breakdowns—comes across as the voice of sobriety and sweet reason.

Also, Heer’s a fan of Uncle Woody!

Lefty Driesell and Bobby Knight

This obit of the legendary Maryland basketball coach reminded me of a discussion we had a few years ago. It started with a remark in a published article by political scientist Diana Mutz identifying herself as “a Hoosier by birth and upbringing, the daughter of a former Republican officeholder, and someone who still owns a home in Mike Pence’s hometown.”

That’s interesting: I don’t know so many children of political officeholders! Actually, I can’t think of anyone I know, other than Mutz, who is a child of a political officeholder, but perhaps there are some such people in my social network. I don’t know the occupations of most of my friends’ parents.

Anyway, following up on that bit from Mutz, sociologist Steve Morgan added some background of his own:

I was also born in Indiana, and in fact my best friend in the 1st grade, before I left the state, was Pat Knight. To me, his father, Bobby Knight was a pleasant and generally kind man (who used to give us candy bars, etc.). He turned out to be a Trump supporter, and probably his son too. So, in addition to not appreciating his full basketball personality when I was 6 years old, I also did not see his potential to find a demagogue inspiring. We moved to Ohio, where I received a lot of education in swing-state politics and Midwestern resentment of coastal elites.

And then I threw in my two cents:

I was not born in Indiana, but I grew up in suburban Maryland (about 10 miles from Brett Kavanaugh, but I went to a public school in a different part of the county and so had zero social overlap with his group). One of the kids in my school was Chuck Driesell, son of Lefty Driesell, former basketball coach at the University of Maryland. Lefty is unfortunately now most famous for his association with Len Bias, but Chuck and I were in high school before that all happened, when Lefty was famous for being a good coach who couldn’t ever quite beat North Carolina. Once I remember the Terps decided to beat Dean Smith at his own game by doing the four corners offense themselves. But it didn’t work; I think Maryland ended up losing 21-18 or some other ping-pong-like score. Chuck was in my economics class. I have no idea if he’s now a Trump supporter. I guess it’s possible. One of the other kids in that econ class was an outspoken conservative, one of the few Reagan supporters of our group of friends back in 1980. Chuck grew up and became a basketball coach; the other kid grew up and became an economist.

I never went to a Maryland basketball game all the time I lived there, even when I was a student at the university. I wish I’d gone; I bet it would’ve been a lot of fun. My friends and I played some pickup soccer and basketball, and I watched lots of sports on TV, but for whatever reason we never even considered the idea of going to a game. We didn’t attend any of our high school football games either, even though our school’s team was the state champions. This was not out of any matter of principle; we just never thought of going. Our loss.

Here’s some academic advice for you: Never put your name on a paper you haven’t read.

Success has many fathers, but failure is an orphan.

Jonathan Falk points to this news article by Tom Bartlett which has this hilarious bit:

What at first had appeared to be a landmark study . . . seemed more like an embarrassment . . .

[The second-to-last author of the paper,] Armando Solar-Lezama, a professor in the electrical-engineering and computer-science department at MIT and associate director of the university’s computer-science and artificial-intelligence laboratory, says he didn’t realize that the paper was going to be posted as a preprint. . . .

The driving force behind the paper, according to Solar-Lezama and other co-authors, was Iddo Drori, [the last author of the paper and] an associate professor of the practice of computer science at Boston University. . . . The two usually met once a week or so. . . .

Solar-Lezama says he was unaware of the sentence in the abstract that claimed ChatGPT could master MIT’s courses. “There was sloppy methodology that went into making a wild research claim,” he says. While he says he never signed off on the paper being posted, Drori insisted when they later spoke about the situation that Solar-Lezama had, in fact, signed off. . . .

Solar-Lezama and two other MIT professors who were co-authors on the paper put out a statement insisting that they hadn’t approved the paper’s posting . . . Drori didn’t agree to an interview for this story, but he did email a 500-word statement providing a timeline of how and when he says the paper was prepared and posted online. In that statement, Drori writes that “we all took active part in preparing and editing the paper” . . . The revised version doesn’t appear to be available online and the original version has been withdrawn. . . .

This reminds me of a piece of advice that someone once gave me: Never put your name on a paper you haven’t read.

The Lakatos soccer training

Alex Lax writes:

While searching the Internet for references to Lakatos, I noticed your comment about Lakatos being a Stalinist. I met Imre Lakatos shortly after his arrival in the UK. My parents spoke Hungarian and helped to settle the refugees of 1956. Imre Lakatos was one of those refugees. I remember him playing football with me at a time when Hungarian football was seen as far superior to English football, and I also remember once when we met him at Cambridge railway station with his latest girlfriend who was very tall. She had managed to lose some contact lenses and I was grovelling around on the road trying to find them. During his visits he would often complain about his treatment in prison which destroyed his stomach and he would rant against the Communists. However after his death, I was told that a book by a well known French Communist was dedicated to Imre. I have not found this dedication but if true would suggest that he was a Communist of some flavour while pretending otherwise.

I hope this might be of interest to you.

He adds:

By the way, the Lakatos soccer training consisted of two players on a small pitch with two smallish opposing goals, with each player protecting their own goal. Each player was only allowed to touch the ball once.

I’m interested in Lakatos because his writing has been very influential to my work; see for example here and here. He was said to be a very difficult person, but perhaps that was connected in some way to his uncompromising intellectual nature, which served him well as an innovator in the philosophy of science.

Uncertainty in games: How to get that balance so that there’s a motivation to play well, but you can still have a chance to come back from behind?

I just read the short book, “Uncertainty in games,” by Greg Costikyan. It was interesting. His main point, which makes sense to me, is that uncertainty is a key part of the appeal of any game. He gives interesting examples of different sources of uncertainty. For example, if you’re playing a video game such as Pong, the uncertainty is in your own reflexes and reactions. With Diplomacy, there’s uncertainty in what the other players will do. With poker, there’s uncertainty about all the hole cards. With chess, there’s uncertainty in what the other player will do and also uncertainty in the logical implications of any position, in the same way that I am uncertain about what is the 200th digit of the decimal expansion of pi, even though that number exists. I agree with Costikyan that uncertainty is a helpful concept for thinking about games.

There’s one thing he didn’t discuss in his book, though, that I wanted to hear more about, and that’s the way that time and uncertainty interact in games, and how this factors into game design. I’ve been thinking a lot about time lately, and this is another example, especially relevant to me as we’re in the process of finishing up the design of a board game, and we want to improve its playability.

To fix ideas, consider a multi-player tabletop game with a single winner, and suppose the game takes somewhere between a half hour and two hours to play. As a player, I want to have a real chance of winning until close to the end, and when the game reaches the point at which I pretty much know I can’t win, I still want it to be fun: I want some intermediate goal such as the possibility of being a spoiler, or of being able to capitalize on my opponents’ mistakes. At the same time, I don’t want the outcome to be entirely random.

Consider two extremes:
1. One player gets ahead early and then can relentlessly exploit the advantage to get a certain win.
2. Nobody is ever ahead by much; there’s a very equal balance, and the winner is decided only at the very end by some random event.

Option #1 actually isn’t so bad—as long as the player in the lead can compound the advantage and force the win quickly. For example, in chess, if you have a decisive lead you can use your pieces together to increase your advantage. This is to be distinguished from how we played as kids, which was that once you’re in the lead you’d just try to trade pieces until the opposing player had nothing left: that got pretty boring. If you can use your pieces together, the game is more interesting even during the period where the winning player is clinching it.

Option #2 would not be so much fun. Sure, sometimes you will have a close game that’s decided at the very end, and that’s fine, but I’d like for victory to be some reflection of cumulative game play, as otherwise it’s meaningless.

Sometimes this isn’t so important. In Scrabble, for example, the play itself is enjoyable. The competition can also be good—it’s fun to be in a tight game where you’re counting the letters, blocking out the other player, and strategizing to get that final word on the board—but even if you’re way behind, you can still try to get the most out of your rack.

In some other games, though, once you’re behind and you don’t have a chance to win, it’s just a chore to keep playing. Monopoly and Risk handle this by creating a positive incentive for players to wipe out weak opponents, so that once you’re down, you’ll soon be out.

And yet another approach is to have cumulative scoring. In poker it’s all about the money. Whether you’re ahead or behind for the night, you’re still motivated to improve your bankroll.

One thing I don’t have a good grip on regarding game design is how to get that balance between all these possibilities, so that how you play matters throughout the game, while at the same time keeping the possibility of winning alive for as long as is feasible.

I remember my dad saying that he preferred tennis scoring (each game is played to 4 points, each set is 6 games, you need to win 2 or 3 sets) as compared to old-style ping-pong scoring (whoever reaches 21 points first, wins), because in tennis, even if you’re way behind, you always have a chance to come back. Which makes sense, and is related to Costikyan’s point about uncertainty, but is hard for me to formalize.
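One rough way to start formalizing the comparison is to compute the chance of a comeback under each scoring system. The sketch below is only an illustration under strong assumptions that are mine, not from Costikyan or anyone else: every point is won independently with the same probability p, there are no win-by-two or tiebreak rules, and the tennis match is best of three sets. The function names (`race`, `after_current`, `tennis_match`) are made up for this example.

```python
from functools import lru_cache


def race(p, target, a=0, b=0):
    """P(A reaches `target` units first) from score a-b.

    Assumes each unit is won by A independently with probability p,
    and ignores win-by-two rules (a simplification).
    """
    @lru_cache(maxsize=None)
    def win(x, y):
        if x >= target:
            return 1.0
        if y >= target:
            return 0.0
        return p * win(x + 1, y) + (1 - p) * win(x, y + 1)

    return win(a, b)


def after_current(p_cur, p_fresh, target, a, b):
    """P(A wins a race to `target`) when the unit now in progress is won
    with probability p_cur and every later unit with probability p_fresh."""
    return (p_cur * race(p_fresh, target, a + 1, b)
            + (1 - p_cur) * race(p_fresh, target, a, b + 1))


def tennis_match(p, points=(0, 0), games=(0, 0), sets=(0, 0)):
    """Simplified tennis: game = race to 4 points, set = race to 6 games,
    match = race to 2 sets. Scores are given as (A, B) pairs."""
    game_cur = race(p, 4, *points)      # game currently in progress
    game_fresh = race(p, 4)             # any later game, starting 0-0
    set_cur = after_current(game_cur, game_fresh, 6, *games)
    set_fresh = race(game_fresh, 6)     # any later set, starting 0-0
    return after_current(set_cur, set_fresh, 2, *sets)


# Evenly matched players; A has lost the first 18 points of the match.
p = 0.5
# Tennis: 18 straight lost points = down 0-4 in games and 0-2 in points.
print("tennis comeback chance:   ", tennis_match(p, points=(0, 2), games=(0, 4)))
# Old-style ping-pong to 21: down 0-18.
print("ping-pong comeback chance:", race(p, 21, 0, 18))
```

Under these assumptions, the player who has dropped the first 18 points still wins the tennis match roughly a quarter of the time, because losing a set resets the score, while the ping-pong comeback is a matter of a few chances in 100,000. The fixed p is exactly the part that is unrealistic in a real match.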

A key idea here, I think, is that the relative skill of the players during the course of a match is a nonstationary process. For example, if player A is winning, perhaps up 2 sets to 0 and up 5 games to 2 in the third set, but then player B comes from behind to catch up and then maybe win in the fifth set, yes, this is an instance of uncertainty in action, but it won’t be happening at random. What will happen is that A gets tired, or B figures out a new plan of action, or some other factor that affects the relative balance of skill. And that itself is part of the game.

In summary, we’d like the game to balance three aspects:

1. Some positive feedback mechanism so that when you’re ahead you can use this advantage to increase your lead.

2. Some responsiveness to changes in effort and skill during the game, so that by pushing really hard or coming up with a clever new strategy you can come back from behind.

3. Uncertainty, as emphasized by Costikyan.

I’m sure that game designers have thought systematically about such things; I just don’t know where to look.

Those annoying people-are-stupid narratives in journalism

Palko writes:

Journalists love people-are-stupid narratives, but, while I believe cognitive dissonance is real, I think the lesson here is not “To an enthusiastically trusting public, his failure only made his gifts seem more real” and is instead that we should all be more skeptical of simplistic and overused pop psychology.

It’s easier for me to just give the link above than to explain all the background. The story is interesting on its own, but here I just wanted to highlight this point that Palko makes. Yes, people can be stupid, but it’s frustrating to see journalists take a story of a lawsuit-slinging celebrity and try to twist it into a conventional pop-psychology narrative.

I love this paper but it’s barely been noticed.

Econ Journal Watch asked me and some others to contribute to an article, “What are your most underappreciated works?,” where each of us wrote 200 words or less about an article of ours that had received few citations.

Here’s what I wrote:

What happens when you drop a rock into a pond and it produces no ripples?

My 2004 article, Treatment Effects in Before-After Data, has only 23 citations and this goes down to 16 after removing duplicates and citations from me. But it’s one of my favorite papers. What happened?

It is standard practice to fit regressions using an indicator variable for treatment or control; the coefficient represents the causal effect, which can be elaborated using interactions. My article from 2004 argues that this default class of models is fundamentally flawed in considering treatment and control conditions symmetrically. To the extent that a treatment “does something” and the control “leaves you alone,” we should expect before-after correlation to be higher in the control group than in the treatment group. But this is not implied by the usual models.

My article presents three empirical examples from political science and policy analysis demonstrating the point. The article also proposes some statistical models. Unfortunately, these models are complicated and can be noisy to fit with small datasets. It would help to have robust tools for fitting them, along with evidence from theory or simulation of improved statistical properties. I still hope to do such work in the future, in which case perhaps this work will have the influence I hope it deserves.
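To make the before-after correlation point concrete, here is a small simulation. It is a sketch under assumptions chosen for illustration, not the model from the 2004 article: the control outcome is the "before" measurement plus noise, and the treated outcome adds a treatment effect that varies across units. All parameter values and names are hypothetical.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n=5000, effect_mean=1.0, effect_sd=1.0, noise_sd=0.5, seed=1):
    """Return (before-after correlation in control, in treatment).

    Assumption: control "leaves you alone" (after = before + noise), while
    treatment "does something" that differs by unit (effect ~ N(mean, sd)).
    """
    rng = random.Random(seed)
    before_c = [rng.gauss(0, 1) for _ in range(n)]
    after_c = [x + rng.gauss(0, noise_sd) for x in before_c]
    before_t = [rng.gauss(0, 1) for _ in range(n)]
    after_t = [x + rng.gauss(effect_mean, effect_sd) + rng.gauss(0, noise_sd)
               for x in before_t]
    return correlation(before_c, after_c), correlation(before_t, after_t)


corr_control, corr_treated = simulate()
print("control before-after correlation:", round(corr_control, 2))
print("treated before-after correlation:", round(corr_treated, 2))
```

Setting `effect_sd=0` recovers the usual constant-effect regression, in which the two correlations are the same up to sampling noise; the asymmetry appears only once the treatment effect is allowed to vary.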

Here’s the whole collection. The other contributors were Robert Kaestner, Robert A. Lawson, George Selgin, Ilya Somin, and Alexander Tabarrok.

My contribution got edited! I prefer my original version shown above; if you’re curious about the edited version, just follow the link and you can compare for yourself.

Others of my barely-noticed articles

Most of my published articles have very few citations; it’s your usual Zipf or long-tailed thing. Some of those have narrow appeal and so, even if I personally like the work, it is understandable that they haven’t been cited much. For example, “Bayesian hierarchical classes analysis” (16 citations) took a lot of effort on our part and appeared in a good journal, but ultimately it’s on a topic that not many researchers are interested in. For another example, I enjoyed writing “Teaching Bayes to Graduate Students in Political Science, Sociology, Public Health, Education, Economics, . . .” (17 citations) and I think if it reached the right audience of educators it could have a real influence, but it’s not the kind of paper that gets built upon or cited very often. A couple of my ethics and statistics papers from my Chance column only have 14 citations each; no surprise given that nobody reads Chance. At one point I was thinking of collecting them into a book, as this could get more notice.

Some papers are great but only take you part of the way there. I really like my morphing paper with Cavan and Phil, “Using image and curve registration for measuring the goodness of fit of spatial and temporal predictions” (12 citations) and, again, it appeared in a solid journal, but it was more of a start than a finish to a research project. We didn’t follow it up, and it seems that nobody else did either.

Sometimes we go to the trouble of writing a paper and going through the review process, but then it gets so little notice that I ask myself in retrospect, why did we bother? For example, “Objective Randomised Blinded Investigation With Optimal Medical Therapy of Angioplasty in Stable Angina (ORBITA) and coronary stents: A case study in the analysis and reporting of clinical trials” has been cited only 5 times since its publication in 2019—and three of those citations were from me. It seems safe to say that this particular dropped rock produced few ripples.

What happened? That paper had a good statistical message and a good applied story, but we didn’t frame it in a general-enough way. Or . . . it wasn’t quite that, exactly. It’s not a problem of framing so much as of context.

Here’s what would’ve made the ORBITA paper work, in the sense of being impactful (i.e., useful): either a substantive recommendation regarding heart stents or a general recommendation (a “method”) regarding summarizing and reporting clinical studies. We didn’t have either of these. Rather than just getting the paper published, we should’ve done the hard work to move forward in one of those two directions. Or, maybe our strategy was ok if we can use this example in some future article. The article presented a great self-contained story that could be part of larger recommendations. But the story on its own didn’t have impact.

This is a good reminder that what typically makes a paper useful is if it can get used by people. A starting point is the title. We should figure out who might find the contents of the article useful and design the title from there.

Or, for another example, consider “Extension of the Isobolographic Approach to Interactions Studies Between More than Two Drugs: Illustration with the Convulsant Interaction between Pefloxacin, Norfloxacin, and Theophylline in Rats” (5 citations). I don’t remember this one at all, and maybe it doesn’t deserve to be read—but if it does, maybe it should’ve been more focused on the general approach so it could’ve been more directly useful to people working in that field.

“Information, incentives, and goals in election forecasts” (21 citations). I don’t know what to say about this one. I like the article, it’s on a topic that lots of people care about, the title seems fine, but not much impact. Maybe more people will look at it in 2024? “Accounting for uncertainty during a pandemic” is another one with only 21 citations. For that one, maybe people are just sick of reading about the goddam pandemic. I dunno; I think uncertainty is an important topic.

The other issue with citations is that people have to find your paper before they would consider citing it. I guess that many people in the target audiences for our articles never even knew they existed. From that perspective, it’s impressive that anything new ever gets cited at all.

Here’s an example of a good title: “A simple explanation for declining temperature sensitivity with warming.” Only 25 citations so far, but I have some hopes for this one: the title really nails the message, so once enough people happen to come across this article one way or another, I think they’ll read it and get the point, and this will eventually show up in citations.

“Tables as graphs: The Ramanujan principle” (4 citations). OK, I love this paper too, but realistically it’s not useful to anyone! So, fair enough. Similarly with “‘How many zombies do you know?’ Using indirect survey methods to measure alien attacks and outbreaks of the undead” (6 citations). An inspired, hilarious effort in my opinion, truly a modern classic, but there’s no real reason for anyone to actually cite it.

“Should we take measurements at an intermediate design point?” (3 citations). This is the one that really bugs me. Crisp title, clean example, innovative ideas . . . it’s got it all. But it’s sunk nearly without a trace. I think the only thing to do here is to pursue the research further, get new results, and publish those. Maybe also set up the procedure more explicitly as a method, rather than just the solution to a particular applied problem.

Torment executioners in Reno, Nevada, keep tormenting us with their publications.

The above figures come from this article which is listed on this Orcid page (with further background here):

Horrifying as all this is, at least from the standpoint of students and faculty at the University of Nevada, not to mention the taxpayers of that state, I actually want to look into a different bizarre corner of the story.

Let me point you to a quote from a recent article in Retraction Watch:

The current editor-in-chief [of the journal that featured the above two images, along with lots lots more] . . . published a statement about the criticism on the journal’s website, where he took full responsibility for the journal’s shortcomings. “While you can argue on the merits, quality, or impact of the work it is all original and we vehemently disagree with anyone who says otherwise,” he wrote.

I don’t think that claim is true. In particular, I don’t think it’s correct to state, vehemently or otherwise, that the work published in that journal is “all original.” I say this on the evidence of this paragraph from this article that appeared there, an article we associate with the phrase, “torment executioners”:

It appears that the original source of this material was an article that had appeared the year before in an obscure and perhaps iffy outlet called The Professional Medical Journal. From the abstract of the paper in that journal:

The scary thing is that if you google the current editor of the journal where the apparent bit of incompetent plagiarism was published, you’ll see that this is his first listed publication:

Just in case you were wondering: no, “Cambridge Scholars Publishing” is not the same as Cambridge University Press.

Kinda creepy that someone who “vehemently” makes a false statement about plagiarism published in his own journal has published a book on “Guidelines for academic researchers.”

We seem to have entered a funhouse-mirror version of academia with entire journals and subfields of fake articles, advisers training new students to enter fake academic careers, and, in a Gresham’s law sort of way, crowding out legitimate teaching and research.

Not written by a chatbot

The published article from the above-discussed journal that got this all started was called “Using Science to Minimize Sleep Deprivation that May Reduce Train Accidents.” It’s two paragraphs long, includes a mislabeled figure that was a stock image of a fly, and has no content.

I pointed that article out to a colleague, who asked whether it was written by ChatGPT. I said, no, I didn’t think so because it was too badly written to be by a chatbot. I was not joking! Chatbot text is coherent at some level, often following something like the format of the standard five-paragraph high school essay, while this article did not make any sense at all. I think it’s more likely that it was a really bad student paper, maybe something written in desperation in the dwindling hours before the assignment was due, and then they published it in this fake journal. On the other hand, it was published in 2022, and chatbots were not so good back in 2022, so maybe it really is the product of an incompetent chatbot. Or maybe it was put together from plagiarized material, as in the “torment executioners” paper, and we just don’t have the original source to demonstrate it. My guess remains that it was a human-constructed bit of nonsense, but I’m guessing that anyone who would do this sort of thing today would use a chatbot. So in that sense these articles are a precious artifact of the past.

Back to the torment executioners

That apparently plagiarized article was still bugging me. One weird part of the story is that even the originally-published study seems a bit off, with statements such as “42% dentist preferred both standing and sitting position.” Maybe the authors of the “torment executioners” paper purposely picked something from a very obscure source, under the belief that then nobody would catch the copying?

What the authors of the “torment executioners” paper seem to have done is to take material from the paper that had been published earlier in a different journal and run it through a computer program that changed some of the words, perhaps to make it less easily caught by plagiarism detectors? Here’s the map of transformations:

"acquired" -> "procured"
"vision" -> "perception"
"incidence" -> "effect"
"involvement" -> "association"
"followed" -> "taken after"
"Majority of them" -> "The larger part of the dental practitioner"
"intensity of pain" -> "concentration of torment"

Ha! Now we’re getting somewhere. “Concentration of torment,” indeed.

OK, let’s continue:

"discomfort" -> "inconvenience"
"aching" -> "hurting"
"paracetamol" -> "drugs"
"pain killer" -> "torment executioners"

Bingo! We found it. It’s interesting that this last word was made plural in translation. This suggests that the computer program that did these word swaps also had some sort of grammar and usage checker, so as a side benefit it fixed a few errors in the writing of the original article. The result is to take an already difficult-to-read passage and make it nearly incomprehensible.

But we’re not yet done with this paragraph. We also see:

"agreed to the fact" -> "concurred to the truth"

This is a funny one, because “concurred” is a reasonable synonym for “agreed,” and “truth” is not a bad replacement for “fact,” but when you put it together you get “concurred to the truth,” which doesn’t work here at all.
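To see how word-by-word swapping produces this kind of result, here is a toy version. It is a guess at the general mechanism, not a reconstruction of whatever program was actually used: the lookup table is hypothetical, with word-level entries inferred from the phrase pairs listed in this post, and the function name is made up.

```python
import re

# Hypothetical word-level synonym table, inferred from the phrase pairs above.
SYNONYMS = {
    "agreed": "concurred",
    "fact": "truth",
    "pain": "torment",
    "killer": "executioner",
    "intensity": "concentration",
}


def swap_words(text, table=SYNONYMS):
    """Replace each word found in `table`, one word at a time, ignoring context."""
    def replace(match):
        word = match.group(0)
        new = table.get(word.lower())
        if new is None:
            return word  # no entry: keep the original word
        # Preserve a leading capital letter if the original word had one.
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"[A-Za-z]+", replace, text)


for phrase in ["agreed to the fact", "pain killer", "intensity of pain"]:
    print(phrase, "->", swap_words(phrase))
```

Each swap is defensible on its own, but because the table knows nothing about idioms or context, the combined phrases come out wrong in just the way seen here.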

And more:

"pain" -> "torment level"
"aggravates" -> "bothers"
"repetitive movements" -> "tedious developments"

Whoa! That makes no sense at all. A modern chatbot would do it much better, I guess.

Here are a few more fun ones, still from this same paragraph of Ferguson et al. (2019):

"Conclusions:" -> "To conclude"
"The present study" -> "the display consideration"

“Display consideration”? Huh?

"high prevalence" -> "tall predominance"

This reminded me of Lucius Shepard’s classic story, “Barnacle Bill the Spacer,” which featured a gang called the Strange Magnificence. Maybe the computer program was having some fun here!

"disorders" -> "disarrangement"
"dentist" -> "dental specialists"
"so there should be" -> "in this manner"
"preventing" -> "avoiding"
"delivered" -> "conveyed"
"during" -> "amid"
"undergraduate curriculum" -> "undergrad educational programs"
"should be programmed" -> "ought to be put up"
"explain" -> "clarify"
"prolonged" -> "drawn out"

Finally, “bed posture density” becomes “bed pose density.” I don’t know about this whole “bed posture” thing . . . maybe someone could call up the Dean of Engineering at the University of Nevada and find out what’s up with that.

The whole article is hilarious, not just that paragraph. It’s a fun game, to try to figure out the original source of phrases such as, “indigent body movements” (indigent = poor) and “There are some signs when it comes to musculoskeletal as well” (I confess to being baffled by this one), and, my personal favorite, “Several studies have shown that overweight children are an actual thing.”

Whaddya say, president and provost of the University of Nevada, Reno? Are you happy that your dean of engineering is running a journal that publishes a paper like that? “Overweight children are an actual thing.”

Oh, it’s ok, that paper was never read from beginning to end by anybody—authors included.

Actually, this sentence might be my absolute favorite:

Having consolation in their shoes, having vigor in their shoes, and having quality in their shoes come to play within the behavioral design of youthful and talented kids with respect to the footwear they select to wear.

“Having vigor in their shoes” . . . that’s what it’s all about!

There’s “confidential dental clinics”: I guess “confidential” is being used as a “synonym” for private. And this:

Dental practitioners and other wellbeing callings in fact cannot dodge inactive stances for an awfully long time.

Exactly what you’d expect to see in a legitimate journal such as the International Supply Chain Technology Journal.

I think the authors of this article are well qualified to teach in the USC medical school. They just need to work in some crazy giraffe facts and they’ll be just fine.

With the existence of chatbots, there will never be a need for this sort of ham-fisted plagiarism. End of an era. Kinda makes me sad.

P.S. As always, we laugh only to avoid crying. I remain furious on behalf of the hardworking students and faculty at UNR, not to mention the taxpayers of the state of Nevada, who are paying for this sort of thing. The phrase “torment executioners” has entered the lexicon.

P.P.S. Regarding the figures at the top of the post: I’ve coauthored papers with students. That’s fine; it’s a way that students can learn. I’m not at all trying to mock the students who made those pictures, if indeed that’s who drew them. I am criticizing whoever thought it was a good idea to publish this, not to mention to include it on professional C.V.’s. As a teacher, when you work with students, you try to help them do their best; you don’t stick your name on their crude drawings, plagiarized work, etc., which can’t be anyone’s best. I feel bad for any students who got sucked into this endeavor and were told that this sort of thing is acceptable work.

P.P.P.S. It looks like there may be yet more plagiarism going on; see here.

Clinical trials that are designed to fail

Mark Palko points us to a recent update by Robert Yeh et al. of the famous randomized parachute-jumping trial:

Palko writes:

I also love the way they dot all the i’s and cross all the t’s. The whole thing is played absolutely straight.

I recently came across another (not meant as satire) study where the raw data was complete crap but the authors had this ridiculously detailed methods section, as if throwing in a graduate level stats course worth of terminology would somehow spin this shitty straw into gold.

Yeh et al. conclude:

This reminded me of my zombies paper. I forwarded the discussion to Kaiser Fung, who wrote:

Another recent example from Covid is this Scottish study. They did so much to the data that it is impossible for any reader to judge whether they did the right things or not. The data are all locked down for “privacy.”

Getting back to the original topic, Joseph Delaney had some thoughts:

I think the parachute study makes a good and widely misunderstood point. Our randomized controlled trial infrastructure is designed for the drug development world, where there is a huge (literally life altering) benefit to proving the efficacy of a new agent. Conservative errors are being cautious and nobody seriously considers a trial designed to fail as a plausible scenario.

But you see new issues with trials designed to find side effects (e.g., RECORD has a lot more LTFU than I saw in a drug study, when I did trials we studied how to improve adherence to improve the results—but a trial looking for side effects that cost the company money would do the reverse). We teach in pharmacy that conservative design is actually a problem in safety trials.

Even worse are trials which are aliased with a political agenda. It’s easy-peasy to design a trial to fail (the parachute trial was jumping from a height of 2 feet). That makes me a lot more critical when you see trials where the failure of the trial would be seen as an upside, because it is just so easy to botch a trial. Designing good trials is very hard (smarter people than I spend entire careers doing a handful of them). It’s a tough issue.

Lots to chew on here.

If school funding doesn’t really matter, why do people want their kid’s school to be well funded?

A question came up about the effects of school funding on student performance, and we were referred to this review article from a few years ago by Larry Hedges, Terri Pigott, Joshua Polanin, Ann Marie Ryan, Charles Tocci, and Ryan Williams:

One question posed continually over the past century of education research is to what extent school resources affect student outcomes. From the turn of the century to the present, a diverse set of actors, including politicians, physicians, and researchers from a number of disciplines, have studied whether and how money that is provided for schools translates into increased student achievement. The authors discuss the historical origins of the question of whether school resources relate to student achievement, and report the results of a meta-analysis of studies examining that relationship. They find that policymakers, researchers, and other stakeholders have addressed this question using diverse strategies. The way the question is asked, and the methods used to answer it, is shaped by history, as well as by the scholarly, social, and political concerns of any given time. The diversity of methods has resulted in a body of literature too diverse and too inconsistent to yield reliable inferences through meta-analysis. The authors suggest that a collaborative approach addressing the question from a variety of disciplinary and practice perspectives may lead to more effective interventions to meet the needs of all students.

I haven’t followed this literature carefully. It was my vague impression that studies have found effects of schools on students’ test scores to be small. So, not clear that improving schools will do very much. On the other hand, everyone wants their kid to go to a good school. Just for example, all the people who go around saying that school funding doesn’t matter, they don’t ask to reduce the funding of their own kids’ schools. And I teach at an expensive school myself. So lots of pieces here, hard for me to put together.

I asked education statistics expert Beth Tipton what she thought, and she wrote:

I think the effect of money depends upon the educational context. For example, in higher education at selective universities, the selection process itself is what ensures success of students – the school matters far less. But in K-12, and particularly in under resourced areas, schools and finances can matter a lot – thus the focus on charter schools in urban locales.

I guess the problem here is that I’m acting like the typical uninformed consumer of research. The world is complicated, and any literature will be a mess, full of claims and counter-claims, but here I am expecting there to be a simple coherent story that I can summarize in a short sentence (“Schools matter” or “Schools don’t matter” or, maybe, “Schools matter but only a little”).

Given how frustrated I get when others come into a topic with this attitude, I guess it’s good for me to recognize when I do it.

Hey, here’s some free money for you! Just lend your name to this university and they’ll pay you $1000 for every article you publish!

Remember that absolutely ridiculous claim that scientific citations are worth $100,000 each?

It appears that someone is taking this literally. Or, nearly so. Nick Wise has the story:

A couple of months ago a professor received the following email, which they forwarded to me.

Dear esteemed colleagues,

We are delighted to extend an invitation to apply for our prestigious remote research fellowships at the University of Religions and Denominations (URD) . . . These fellowships offer substantial financial support to researchers with papers currently in press, accepted or under review by Scopus-indexed journals. . . .

Fellowship Type: Remote Short-term Research Fellowship. . . .

Affiliation: Encouragement for researchers to acknowledge URD as their additional affiliation in published articles.

Remuneration: Project-based compensation for each research article.

Payment Range: Up to $1000 USD per article (based on SJR journal ranking). . . .

Why would the institution pay researchers to say that they are affiliated with them? It could be that funding for the university is related to the number of papers published in indexed journals. More articles associated with the university can also improve their placing in national or international university rankings, which could lead directly to more funding, or to more students wanting to attend and bringing in more money.

The University of Religions and Denominations is a private Iranian university . . . Until recently the institution had very few published papers associated with it . . . and their subject matter was all related to religion. . . . However, last year there was a substantial increase to 103 published papers, and so far this year there are already 35. This suggests that some academics have taken them up on the offer in the advert to include URD as an affiliation.

Surbhi Bhatia Khan is a lecturer in data science at the University of Salford in the UK since March 2023 and a top 2% scientist in the world according to Stanford University’s rankings. She published 29 research articles last year according to Dimensions, an impressive output, in which she was primarily affiliated to the University of Salford. In addition though, 5 of those submitted in the 2nd half of last year had an additional affiliation at the Department of Engineering and Environment at URD, which is not listed as one of the departments on the university website. Additionally, 19 of the 29 state that she’s affiliated to the Lebanese American University in Beirut, which she was not affiliated with before 2023. She is yet to mention her role at either of these additional affiliations on her LinkedIn profile.

Looking at the Lebanese American University, another private university, its publication numbers have shot up from 201 in 2015 to 503 in 2021 and 2,842 in 2023, according to Dimensions. So far in 2024 they have published 525, on track for over 6,000 publications for the year. By contrast, according to the university website, the faculty consisted of 547 full-time staff members in 2021 but had shrunk to 423 in 2023. It is hard to imagine how such growth in publication numbers could occur without a similar growth in the faculty, let alone with a reduction.

Wise writes:

How many other institutions are seeing incredible increases in publication numbers? Last year we saw gaming of the system on a grand scale by various Saudi Arabian universities, but how many offers like the one above are going around, whether by email or sent through Whatsapp groups or similar?

It’s bad news when universities in England, Iran, Saudi Arabia, and Lebanon start imitating the corrupt citation practices that we have previously associated with nearby Cornell University.

But I can see where Dr. Khan is coming from: if someone’s gonna send you free money, why not take it? Even if the “someone” is a University of Religions and Denominations, and none of your published research relates to religion, and you list an affiliation with an apparently nonexistent department.

The only thing that’s bugging me is that, according to an esteemed professor at Northeastern University, citations are worth $100,000 each—indeed, we are told that it is possible to calculate “exactly how much a single citation is worth.” In that case, Dr. Khan is getting ripped off by University of Religions and Denominations, who are offering a paltry “up to $1000”—and that’s per article, not per citation! I know about transaction costs etc. but maybe she could at least negotiate them up to $2000 per.

I can’t imagine this scam going on for long, but while it lasts you might as well get in on it. Why should professors at Salford University have all the fun?

Parting advice

Just one piece of advice for anyone who’s read this far down into the post: if you apply for the “Remote Short-term Research Fellowship” and you get it, and you send them the publication notice for your article that includes your affiliation with the university, and then they tell you that they’ll be happy to send you a check for $1000, you just have to wire them a $10 processing fee . . . don’t do it!!!

Listen to those residuals

This is Jessica. Speaking of data sonification (or sensification), Hyeok, Yea Seul Kim, and I write:

Data sonification (mapping data variables to auditory variables, such as pitch or volume) is used for data accessibility, scientific exploration, and data-driven art (e.g., museum exhibitions) among others. While a substantial amount of research has been made on effective and intuitive sonification design, software support is not commensurate, limiting researchers from fully exploring its capabilities. We contribute Erie, a declarative grammar for data sonification, that enables abstractly expressing auditory mappings. Erie supports specifying extensible tone designs (e.g., periodic wave, sampling, frequency/amplitude modulation synthesizers), various encoding channels, auditory legends, and composition options like sequencing and overlaying. Using standard Web Audio and Web Speech APIs, we provide an Erie compiler for web environments. We demonstrate the expressiveness and feasibility of Erie by replicating research prototypes presented by prior work and provide a sonification design gallery. We discuss future steps to extend Erie toward other audio computing environments and support interactive data sonification.

Have you ever wanted to listen to your model fit? I haven’t, but I think it’s worth exploring how one would do so effectively, either for purposes of making data representations accessible to blind and visually impaired users, or for other purposes like data journalism or creating “immersive” experiences of data like you might find in museums.

But it turns out it’s really hard to create data sonifications with existing tools! You have to learn low-level audio programming and use multiple tools to do things like combine several sonifications into a single design. Other tools only offer the ability to make sonifications corresponding to a narrow range of chart types, perhaps as a result of a bias toward thinking about sonifications only from the perspective of how they map to existing visualizations.

Hyeok noticed some of these issues and decided to do something about it. Erie provides a flexible specification format where you can define a sonification design in terms of tone (the overall quality of a sound) and encodings (mappings from data variables to auditory features). You can compose more complex sonifications by repeating, sequencing, and overlaying sonifications, and it interfaces with standard web audio APIs. 

Documentation on how to install and use Erie is here. There’s also an online editor you can use to try out the grammar. But first I recommend playing some of the examples, which include some simple charts and recreations of data journalism examples. My favorites are the residuals from a poorly fit model and a better fitting one. Especially if you play just the data series of these back to back, the better fit should sound more consistent and slightly more harmonious.
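To make the idea of "listening to residuals" concrete, here is a minimal sketch in plain Python. It is not Erie and does not use Erie's grammar; the function names, the pitch range of 220 to 880 Hz, and the `max_abs` scaling parameter are all assumptions made for illustration. Each residual is mapped linearly to a pitch on a scale shared across fits, so a better fit should produce tones that stay closer to the middle pitch.

```python
import math
import struct
import wave


def residuals_to_frequencies(residuals, max_abs, low_hz=220.0, high_hz=880.0):
    """Map residuals in [-max_abs, max_abs] linearly onto [low_hz, high_hz].

    Residuals outside the range are clipped. Using the same max_abs for two
    fits keeps their pitch scales comparable.
    """
    freqs = []
    for r in residuals:
        clipped = max(-max_abs, min(max_abs, r))
        fraction = (clipped + max_abs) / (2.0 * max_abs)
        freqs.append(low_hz + fraction * (high_hz - low_hz))
    return freqs


def write_tones(path, frequencies, note_seconds=0.25, sample_rate=8000):
    """Write one sine tone per frequency to a mono 16-bit WAV file."""
    samples_per_note = round(note_seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        for freq in frequencies:
            frames = bytearray()
            for i in range(samples_per_note):
                value = math.sin(2.0 * math.pi * freq * i / sample_rate)
                # Scale to half of the 16-bit range to leave headroom.
                frames += struct.pack("<h", int(value * 0.5 * 32767))
            wav.writeframes(bytes(frames))


if __name__ == "__main__":
    poor_fit = [-1.8, 1.5, -0.9, 2.0, -1.2]   # made-up residuals
    better_fit = [-0.2, 0.1, -0.1, 0.3, 0.0]  # made-up residuals
    write_tones("poor_fit.wav", residuals_to_frequencies(poor_fit, max_abs=2.0))
    write_tones("better_fit.wav", residuals_to_frequencies(better_fit, max_abs=2.0))
```

Even this toy version shows why a declarative grammar is attractive: the mapping is one line of arithmetic, and everything else is low-level audio plumbing.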

This was really Hyeok’s vision; I can’t claim to have contributed very much to this work. But it was interesting to watch it come together. During our meetings about the project, it was initially very unfamiliar to me, trying to interpret audio variables like pitch as carrying information about data values, and I can’t really say it’s gotten easier. I guess this gets at how hard it is to make data easily consumable in a serial format like audio, at least for users who are accustomed to all the benefits of parallel visual processing. 

Social penumbras predict political attitudes (my talk at Harvard on Monday Feb 12 at noon)

Monday, February 12, 2024, 12:00pm to 1:15pm

Social penumbras predict political attitudes

The political influence of a group is typically explained in terms of its size, geographic concentration, or the wealth and power of the group’s members. This article introduces another dimension, the penumbra, defined as the set of individuals in the population who are personally familiar with someone in that group. Distinct from the concept of an individual’s social network, penumbra refers to the circle of close contacts and acquaintances of a given social group. Using original panel data, the article provides a systematic study of various groups’ penumbras, focusing on politically relevant characteristics of the penumbras (e.g., size, geographic concentration, sociodemographics). Furthermore, we show the connection between changes in penumbra membership and public attitudes on policies related to the group.

This is based on a paper with Yotam Margalit from 2021.

Bayesian Analysis with Python

Osvaldo Martin writes:

The third edition of Bayesian Analysis with Python serves as an introduction to the basic concepts of applied Bayesian modeling. It adopts a hands-on approach, guiding you through the process of building, exploring and expanding models using PyMC and ArviZ. The field of probabilistic programming is in a different place today than it was when the first edition was devised in the middle of the last decade. The journey from its first publication to this current edition mirrors the evolution of Bayesian modeling itself – a path marked by significant advancements, growing community involvement, and an increasing presence in both academia and industry. Consequently, this updated edition also includes coverage of additional topics and libraries such as Bambi, for flexible and easy hierarchical linear modeling; PyMC-BART, for flexible non-parametric regression; PreliZ, for prior elicitation; and Kulprit, for variable selection.

Whether you’re a student, data scientist, researcher, or developer aiming to initiate Bayesian data analysis and delve into probabilistic programming, this book provides an excellent starting point. The content is introductory, requiring little to no prior statistical knowledge, although familiarity with Python and scientific libraries like NumPy is advisable.

By the end of this book, you will possess a functional understanding of probabilistic modeling, enabling you to design and implement Bayesian models for your data science challenges. You’ll be well-prepared to delve into more advanced material or specialized statistical modeling if the need arises.

See more at the book website

Osvaldo spent one year at Aalto in Finland (unfortunately, during the pandemic) so I know he knows what he writes. Bambi is an rstanarm / brms style interface for building models with PyMC in the Python ecosystem, and Kulprit is the Python version of projpred (in R) for projective predictive model selection (which is one of my favorite research topics).

When all else fails, add a code comment

Another way of saying this is that you should treat inline code comments as a last resort when there is no other way to make your intentions clear.

I used to teach a session of Andrew’s statistical communication class once a year and I’d focus on communicating a computational API. Most of the students hated it because they signed up for the class to hear Andrew talk about stats, not me talk about API design. At least one student just up and walked out every year! So if you’re that student, now’s your chance to bail.

Comments considered harmful

Most academics, before they will share code with me, tell me they have to “clean it up.” I invariably tell them not to bother, and at best, they will dilly dally and shilly shally and apologize for lack of comments. What they don’t realize is that they were on the right track in the first place. The best number of inline code comments is zero. Nada. Zilch. Nil. Naught.

Why are comments so harmful? They lie! Even with the best of intent, they might not match the actual implementation. They often go stale over time. You can write whatever you want in a comment and there’s no consistency checking with the code.

You know what doesn’t lie? Code. Code doesn’t lie. So what do professional programmers do? They don’t trust comments and read the code instead. At this point, comments just get in the way.

What’s a bajillion times better than comments?

Readable code. Why? It’s self documenting. To be self documenting, code needs to be relatively simple and modular. The biggest mistake beginners make in writing code is lack of modularity. Without modularity, it’s impossible to build code bottom up, testing as you go.

It’s really hard to debug a huge program. It’s really easy to debug modules built up piece by piece on top of already-tested modules. So design top down, but build code bottom up. This is why we again and again stress in our writing on Bayesian workflow and in our replies to user questions on forums, that it helps immensely to scaffold up a complicated model one piece at a time. This lets you know when you add something and it causes a failure.
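Here is a minimal sketch of what building bottom up looks like in Python. The functions are toy examples chosen for illustration, not code from any real project. Each layer is checked before the next one is built on top of it, so a failure points at the piece that was just added.

```python
def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    # Sample variance, built on the already-tested mean().
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def standardize(xs):
    # Built on mean() and variance(), both tested below.
    m = mean(xs)
    sd = variance(xs) ** 0.5
    return [(x - m) / sd for x in xs]


# Test each layer as soon as it exists, before writing the next one.
assert mean([1.0, 2.0, 3.0]) == 2.0
assert variance([1.0, 2.0, 3.0]) == 1.0
assert standardize([1.0, 2.0, 3.0]) == [-1.0, 0.0, 1.0]
```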

Knowing where to draw lines between modules is, unfortunately, a matter of experience. The best way to get that experience? Read code. In the production coding world, code is read much more often than it’s written. That means much more effort typically goes into production code to make it readable. This is very unlike research code which might be written once and never read again.

There is a tradeoff here. Code is more readable with short variable names and short function names. It’s easier to apprehend the structure of the expression a * b + c**2 than meeting_time * number_of_meetings + participants**2. We need to strike a balance with not too long, but still informative variable names.

And why are beginners so afraid of wasting horizontal space while being spendthrifts on the much more valuable vertical space? I have no explanation. But I see a lot of code from math-oriented researchers that looks like this, ((a*b)/c)+3*9**2+cos(x-y). Please use spaces around operators and no more parens than are necessary to disambiguate given attachment binding.
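As a quick illustration in Python, here is that cramped expression next to a spaced version with the redundant parentheses removed. The variable values are arbitrary. Operator precedence makes the two expressions identical, so the only thing that changes is how easy the structure is to see.

```python
from math import cos

a, b, c, x, y = 2.0, 3.0, 4.0, 1.0, 0.5

cramped = ((a*b)/c)+3*9**2+cos(x-y)
spaced = a * b / c + 3 * 9**2 + cos(x - y)

# Multiplication and division associate left to right, and both bind
# more tightly than addition, so the parentheses were never needed.
assert cramped == spaced
```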

When should I comment?

Sometimes you’re left with no choice and have to drop in a comment as a last resort. This should be done if you’re doing something non-idiomatic with the language or coding an unusual algorithm or something very involved. In this case, a little note inline about intent and/or algebra can be helpful. That’s why commenting is sometimes called a method of last resort.

But whatever you do, comment for people who know the language better than you. Don’t write a comment that explains what a NumPy function does—that’s what the NumPy doc is for. Nobody wants to see this:

int num_observations = 513;  // declare num_observations as an integer and set equal to 513

But people who feel compelled to comment will write just this kind of thing, thinking it makes their code more professional. If you think this is a caricature, you don’t read enough code.

The other thing you don’t want to do is this:

#####################################################
################## INFERENCE CODE ###################
#####################################################
...
...
...

This is what functions are for. Write a function called inference() and call it. It will also help prevent accidental reuse of global variables, which is always a problem in scripting languages like R and Python. Don’t try to fix hundreds or thousands of lines of unstructured code with structured comments.
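A minimal sketch of the alternative in Python, assuming a toy analysis. The names `load_data` and `main` and the data values are hypothetical. The banner comment is replaced by a function whose name says the same thing, and nothing lives in global scope to be reused by accident.

```python
def load_data():
    """Return toy observations (a stand-in for real data loading)."""
    return [2.1, 1.9, 2.4, 2.0]


def inference(y):
    """Return the sample mean of y and its standard error."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    return mean, (var / n) ** 0.5


def main():
    y = load_data()
    estimate, std_err = inference(y)
    print(f"estimate = {estimate:.2f}, std err = {std_err:.2f}")


if __name__ == "__main__":
    main()
```

Because `n`, `mean`, and `var` are local to `inference()`, a later block of code cannot silently pick up a stale value with the same name.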

Another thing to keep in mind is that vertical space is very precious in coding, because we want to be able to see as much of the code as we can at a time without scrolling. Don’t waste vertical space with useless or even harmful comments.

Do not, and I repeat, do not use /* ... */ style comments inline with code. It’s too easy to get confused when it’s a lot of lines and it’s doubly confusing when nested. Instead, use line comments (// in C++ and Stan, # in Python and R). Use the comment-region command in emacs or whatever does the same in your IDE. With line comments, the commented out code will be very visible, as in the following example.

for (int n = 0; n < N; ++n) {
  // int x = 5;
  // int y = x * x * 3;
  // int z = normal_rng(y, 1);
  z = n * 3;
}

Compare that to what I often see, which is some version of the following.

for (int n = 0; n < N; ++n) {
  /* int x = 5;
  int y = x * x * 3;
  int z = normal_rng(y, 1); */
  z = n * 3;
}

In the first case, it's easy to just scan down the margin and see what's commented out.

After commenting out and fixing everything, please be a good and respectful citizen and just delete all the edited out code before merging or releasing. Dead code makes the live code hard to find and one always wonders why it's still there---was it a mistake or some future plan or what? When I first showed up at Bell Labs in the mid 1990s, I was handed a 100+ page Tcl/Tk script for running a speech recognizer and told only a few lines were active, but I'd have to figure out which ones. Don't do that!

The golden triangle

What I stressed in Andrew's class is the tight interconnection between three aspects of production code:


$\textrm{API Documentation} \leftrightarrow \textrm{Unit tests} \leftrightarrow \textrm{Code}$

 

The API documentation should be functionally oriented and say what the code does. It might include a note as to how it does it if that is relevant to its use. An example might be different algorithms to compute the same thing that are widely known by name and useful in different situations. The API doc should ideally be specific enough to be the basis of both unit testing and coding. So I'm not saying don't document. I'm saying don't document how inline code works, document your API's intent.

The reason I call this the "golden" triangle is the virtuous cycle it imposes. If the API doc is hard to write, you know there's a problem with the way the function has been specified or modularized. With R and Python programmers, that's often because the code is trying to do too many things for a single function and the input types and output types become a mess of dependencies. This leads to what programmers identify as a "bad smell" in the code. If the code or the unit tests are hard to write, you know there's a problem with the API specification.
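Here is a small sketch of the three corners of the triangle for one function in Python. The choice of `log_sum_exp` is just an illustrative example. The docstring says what the function returns, mentions how only because numerical stability matters to callers, and is specific enough that the tests can be written from it without reading the body.

```python
import math


def log_sum_exp(xs):
    """Return log(sum(exp(x) for x in xs)).

    The result is computed stably by subtracting the maximum before
    exponentiating, so large inputs do not overflow. Returns -inf for
    an empty sequence.
    """
    if len(xs) == 0:
        return -math.inf
    m = max(xs)
    if math.isinf(m):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def test_log_sum_exp():
    # Each assertion corresponds to a sentence in the docstring.
    assert log_sum_exp([]) == -math.inf
    assert math.isclose(log_sum_exp([0.0, 0.0]), math.log(2.0))
    assert math.isclose(log_sum_exp([1000.0, 1000.0]), 1000.0 + math.log(2.0))


test_log_sum_exp()
```

If a sentence in the doc has no matching test, or a test checks something the doc does not promise, that mismatch is a hint that one of the three corners needs revisiting.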

Clients (human and computational) are going to see and "feel" the API. That's where the "touch" is that designers like to talk about in physical object design. Things need to feel natural for the application, or in the words of UI/UX designers, the API needs to offer affordances (in the past, we might have said it should be intuitive). Design the API first from the client perspective. Sometimes you have to suffer on the testing side to maintain a clean and documentable API, but that clean API is your objective.

What about research code?

Research code is different. It doesn't have to be robust. It doesn't have to be written to be read by multiple people in the future. You're usually writing end-to-end tests rather than unit tests, though that can be dangerous. It still helps to develop bottom-up with testing.

What research code should be is reproducible. There should be a single script to run that generates all the output for a paper. That way, even if the code's ugly, at least the output's reproducible and someone with enough interest can work through it.
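A minimal sketch of such a script in Python, assuming a toy simulation study. The file name `results.json`, the `output` directory, and the seed are hypothetical choices. The point is that one entry point fixes the seed, runs every step, and writes every output, so running it twice gives the same files.

```python
import json
import random
from pathlib import Path


def simulate(seed, n=100):
    """Draw n standard normal variates from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def summarize(draws):
    return {"n": len(draws), "mean": sum(draws) / len(draws)}


def main(out_dir="output", seed=1234):
    """Regenerate every output for the paper from scratch."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = summarize(simulate(seed))
    (out / "results.json").write_text(json.dumps(results, indent=2))
    return results


if __name__ == "__main__":
    main()
```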

And of course, research code needs to be tested that it's doing what it's supposed to be doing. And it needs to be audited to make sure it's not "cheating" (like cross-validating a time-series, etc.).
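To illustrate the time-series case, here is a sketch in Python of expanding-window splits, in which every test index comes after every training index. The function name is hypothetical. Shuffled K-fold cross-validation would instead let the model train on observations from the future of its test points, which is the kind of "cheating" an audit should catch.

```python
def expanding_window_splits(n_obs, n_folds):
    """Yield (train_indices, test_indices) pairs that respect time order.

    Observations are assumed to be sorted by time. Each fold trains on an
    initial segment and tests on the block that immediately follows it.
    """
    fold_size = n_obs // (n_folds + 1)
    if fold_size < 1:
        raise ValueError("not enough observations for this many folds")
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover observations.
        test_end = n_obs if k == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


for train, test in expanding_window_splits(n_obs=10, n_folds=4):
    # No training observation comes after a test observation.
    assert max(train) < min(test)
```

scikit-learn provides `TimeSeriesSplit` in `sklearn.model_selection` for the same purpose.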

Notebooks, Quarto, and other things that get in the way of coding and documenting

With all due respect to Donald Knuth (never a good start), literate programming is a terrible way to develop code. (On a related topic, I would totally recommend at least the beginning part of Knuth's notes on how to write math.)

I don't love them, but I use Quarto and Jupyter (née IPython) notebooks for writing reproducible tutorial material. But only after I've sorted out the code. These tools mix text and code and make too many compromises along the way to be good at either task. Arguably the worst sin is that they wind up obfuscating the code with a bunch of text. Jupyter also makes it possible to get into inconsistent states because it doesn't automatically re-run everything. Quarto is just a terrible typesetting platform, inheriting all the flaws of pandoc and citeproc, with the added joy of HTML and LaTeX interoperability and R and Python interoperability. We use it for Stan docs so that we can easily generate HTML and LaTeX, but it always feels like there should be a better way to do this as it's a lot of trial and error due to the lack of specs for markdown.

“Replicability & Generalisability”: Applying a discount factor to cost-effectiveness estimates.

This one’s important.

Matt Lerner points us to this report by Rosie Bettle, Replicability & Generalisability: A Guide to CEA discounts.

“CEA” is cost-effectiveness analysis, and by “discounts” they mean what we’ve called the Edlin factor. “Discount” is a better name than “factor” because it’s a number that should be between 0 and 1: it’s what you should multiply a point estimate by to adjust for inevitable upward biases in reported effect-size estimates, issues discussed here and here, for example.

It’s pleasant to see some of my ideas being used for a practical purpose. I would just add that type M and type S errors should be lower for Bayesian inferences than for raw inferences that have not been partially pooled toward a reasonable prior model.
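The arithmetic is simple enough to sketch in Python. The function names and the numbers below are illustrative assumptions, not values from the report. The first function applies a discount directly. The second shows where such a factor can come from in the simplest Bayesian setting: with a normal estimate and a normal prior centered at zero, the posterior mean is the raw estimate multiplied by prior_sd² / (prior_sd² + std_err²), which is always between 0 and 1.

```python
def discounted_estimate(estimate, edlin_factor):
    """Multiply a reported point estimate by a discount between 0 and 1."""
    if not 0.0 <= edlin_factor <= 1.0:
        raise ValueError("edlin_factor should be between 0 and 1")
    return edlin_factor * estimate


def normal_shrinkage_factor(std_err, prior_sd):
    """Posterior-mean multiplier under a normal(0, prior_sd) prior."""
    return prior_sd**2 / (prior_sd**2 + std_err**2)


# A noisy estimate (standard error as large as the prior scale) is halved.
factor = normal_shrinkage_factor(std_err=1.0, prior_sd=1.0)
assert factor == 0.5
assert discounted_estimate(10.0, factor) == 5.0
```

The noisier the estimate relative to the prior scale, the smaller the factor, which matches the intuition that underpowered studies need the heaviest discounting.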

Also, regarding empirical estimation of adjustment factors, I recommend looking at the work of Erik van Zwet et al; here are some links:
What’s a good default prior for regression coefficients? A default Edlin factor of 1/2?
How large is the underlying coefficient? An application of the Edlin factor to that claim that “Cash Aid to Poor Mothers Increases Brain Activity in Babies”
The Shrinkage Trilogy: How to be Bayesian when analyzing simple experiments
Erik van Zwet explains the Shrinkage Trilogy
The significance filter, the winner’s curse and the need to shrink
Bayesians moving from defense to offense: “I really think it’s kind of irresponsible now not to use the information from all those thousands of medical trials that came before. Is that very radical?”
Explaining that line, “Bayesians moving from defense to offense”

I’m excited about the application of these ideas to policy analysis.

I’ve been mistaken for a chatbot

… Or not, according to what language is allowed.

At the start of the year I mentioned that I am on a bad roll with AI just now, and the start of that roll began in late November when I received reviews back on a paper. One reviewer sent in a 150 word review saying it was written by chatGPT. The editor echoed, “One reviewer asserted that the work was created with ChatGPT. I don’t know if this is the case, but I did find the writing style unusual ….” What exactly was unusual was not explained.

That was November 20th. By November 22nd my computer shows a file created named ‘tryingtoproveIamnotchatbot,’ which is just a txt where I pasted in the GitHub commits showing progress on the paper. I figured maybe this would prove to the editors that I did not submit any work by chatGPT.

I didn’t. There are many reasons for this. One is I don’t think that I should. Further, I suspect chatGPT is not so good at this (rather specific) subject and between me and my author team, I actually thought we were pretty good at this subject. And I had met with each of the authors to build the paper, its treatise, data and figures. We had a cool new meta-analysis of rootstock x scion experiments and a number of interesting points. Some of the points I might even call exciting, though I am biased. But, no matter, the paper was the product of lots of work and I was initially embarrassed, then gutted, about the reviews.

Once I was less embarrassed I started talking timidly about it. I called Andrew. I told folks in my lab. I got some fun replies. Undergrads in my lab (and others later) thought the review itself may have been written by chatGPT. Someone suggested I rewrite the paper with chatGPT and resubmit. Another that I just write back one line: I’m Bing.

What I took away from this was myriad, but I came up with a couple of next steps. One was deciding that this was not a great peer review process and that I should reach out to the editor (and, as one co-author suggested, cc the editorial board). Another was to not be so mortified as to not talk about this.

What I took away from these steps were two things:

1) chatGPT could now control my language.

I connected with a senior editor on the journal. No one is in a good position here, and the editor and reviewers are volunteering their time in a rapidly changing situation. I feel for them and for me and my co-authors. The editor and I tried to bridge our perspectives. It seems he could not have imagined that I or my co-authors would be so offended. And I could not have imagined that the journal already had a policy of allowing manuscripts to use chatGPT, as long as it was clearly stated.

I was also given some language changes to consider, so I might sound less like chatGPT to reviewers. These included some phrases I wrote in the manuscript (e.g. `the tyranny of terroir’). Huh. So where does that end? Say I start writing so I sound less to the editor and others ‘like chatGPT’ (and I never figured out what that means), then chatGPT digests that and then what? I adapt again? Do I eventually come back around to those phrases once they have rinsed out of the large language model?

2) Editors are shaping the language around chatGPT.

Motivated by a co-author’s suggestion, I wrote a short reflection which recently came out in a careers column. I much appreciate the journal recognizing this as an important topic and that they have editorial guidelines to follow for clear and consistent writing. But I was surprised by the concerns from the subeditors on my language. (I had no idea my language was such a problem!)

This problem was that I wrote: I’ve been mistaken for a chatbot (and similar language). The argument was that I had not been mistaken — my writing had been. The debate that ensued was fascinating. If I had been in a chatroom and this happened, then I could write `I’ve been mistaken for a chatbot’ but since my co-authors and I wrote this up and submitted it to a journal, it was not part of our identities. So I was over-reaching in my complaint. I started to wonder: if I could not say ‘I was mistaken for an AI bot’ — why does the chatbot get ‘to write’? I went down an existential hole, from which I have not fully recovered.

And since then I am still mostly existing there. On the upbeat side, writing the reflection was cathartic and the back and forth with the editors (who I know are just trying to do their jobs too) gave me more perspectives and thoughts, however muddled. And my partner recently said to me, “perhaps one day it will be seen as a compliment to be mistaken for a chatbot, just not today!”

Also, since I don’t know an archive that takes such things, I will paste the original unedited version below.

I have just been accused of scientific fraud. It’s not data fraud (which, I guess, is a relief because my lab works hard at data transparency, data sharing and reproducibility). What I have just been accused of is writing fraud. This hurts, because—like many people—I find writing a paper a somewhat painful process.

Like some people, I comfort myself by reading books on how to write—both to be comforted by how much the authors of such books stress that writing is generally slow and difficult, and to find ways to improve my writing. My current writing strategy involves willing myself to write, multiple outlines, then a first draft, followed by much revising. I try to force this approach on my students, even though I know it is not easy, because I think it’s important we try to communicate well.

Imagine my surprise then when I received reviews back that declared a recently submitted paper of mine a chatGPT creation. One reviewer wrote that it was `obviously Chat GPT’ and the handling editor vaguely agreed, saying that they found `the writing style unusual.’ Surprise was just one emotion I had, so was shock, dismay and a flood of confusion and alarm. Given how much work goes into writing a paper, it was quite a hit to be accused of being a chatbot—especially in short order without any evidence, and given the efforts that accompany the writing of almost all my manuscripts.

I hadn’t written a word of the manuscript with chatGPT and I rapidly tried to think through how to prove my case. I could show my commits on GitHub (with commit messages including `finally writing!’ and `Another 25 mins of writing progress!’ that I never thought I would share), I could try to figure out how to compare the writing style of my pre-chatGPT papers on this topic to the current submission, maybe I could ask chatGPT if it thought it wrote the paper…. But then I realized I would be spending my time trying to prove I am not a chatbot, which seemed a bad outcome to the whole situation. Eventually, like all mature adults, I decided what I most wanted to do was pick up my ball (manuscript) and march off the playground in a small fury. How dare they?

Before I did this, I decided to get some perspectives from others—researchers who work on data fraud, co-authors on the paper and colleagues, and I found most agreed with my alarm. One put it most succinctly to me: `All scientific criticism is admissible, but this is a different matter.’

I realized these reviews captured both something inherently broken about the peer review process and—more importantly to me—about how AI could corrupt science without even trying. We’re paranoid about AI taking over us weak humans and we’re trying to put in structures so it doesn’t. But we’re also trying to develop AI so it helps where it should, and maybe that will be writing parts of papers. Here, chatGPT was not part of my work and yet it had prejudiced the whole process simply by its existential presence in the world. I was at once annoyed at being mistaken for a chatbot and horrified that reviewers and editors were not more outraged at the idea that someone had submitted AI generated text.

So much of science is built on trust and faith in the scientific ethics and integrity of our colleagues. We mostly trust others did not fabricate their data, and I trust people do not (yet) write their papers or grants using large language models without telling me. I wouldn’t accuse someone of data fraud or p-hacking without some evidence, but a reviewer felt it was easy enough to accuse me of writing fraud. Indeed, the reviewer wrote, `It is obviously [a] Chat GPT creation, there is nothing wrong using help ….’ So it seems, perhaps, that they did not see this as a harsh accusation, and the editor thought nothing of passing it along and echoing it, but they had effectively accused me of lying and fraud in deliberately presenting AI generated text as my own. They also felt confident that they could discern my writing from AI—but they couldn’t.

We need to be able to call out fraud and misconduct in science. Currently, the costs to the people who call out data fraud seem too high to me, and the consequences for being caught too low (people should lose tenure for egregious data fraud in my book). But I am worried about a world in which a reviewer can casually declare my work AI-generated, and the editors and journal editor simply shuffle along the review and invite a resubmission if I so choose. It suggests not only a world in which the reviewers and editors have no faith in the scientific integrity of submitting authors—me—but also an acceptance of a world where ethics are negotiable. Such a world seems easy for chatGPT to corrupt without even trying—unless we raise our standards.

Side note: Don’t forget to submit your entry to the International Cherry Blossom Prediction Competition!