3/15: Final predictions posted! https://mrphilipslibrary.wordpress.com/2024/03/14/final-hugo-predictions-2024-but-first-some-final-thoughts-on-the-2023-hugos/
EDIT 2/1/24: I have decided that I will continue to use the 2023 Hugo finalists as announced in my data. There’s no sure way to know exactly what happened, so I’m not going to waste any more time or energy speculating on it. Unless there is an official decision to renounce the 2023 Hugo Awards, I’m gonna roll with what I’ve got. The nominating data is, of course, unusable, but it has a pretty negligible effect on the prediction model anyway.
EDIT 1/21/24: With the reveal of the fiasco that is the 2023 Hugo nominations, as explained in my new post, I have to decide what that means for these stats. Should I continue to use the stats from last year despite the question of their legitimacy? Do I omit them and use stats for only the previous four years instead of my typical five? Or go back an extra year to make sure I have all five? I’ll figure it out over the next couple weeks as I update for February.
UPDATED 2/2/24: A lot happened in January – Locus RR, Alex Awards, RUSA Reading List, etc. But not much changed beyond a few flip-flops. Things are pretty solid by this point and the last thing that will have the potential to make a big impact is the Nebula Awards announcement, probably sometime early next month.
Novel:
Starling House | Alix E. Harrow |
The Adventures of Amina al-Sirafi | S.A. Chakraborty |
Some Desperate Glory | Emily Tesh |
System Collapse | Martha Wells |
Witch King | Martha Wells |
Ink Blood Sister Scribe | Emma Törzs |
Translation State | Ann Leckie |
Starling House takes over the top spot for the first time, and The Adventures of Amina al-Sirafi holds strong at second place despite its surprising omission from the Locus Recommended Reading List. Ink Blood Sister Scribe muscles its way onto the list proper, switching with Translation State in the bonus slot. IBSS doesn’t feel to me like a serious contender, but neither did Light From Uncommon Stars a couple years ago, so you never know. A few books bubbling under the surface include familiar names like John Scalzi and Shelley Parker-Chan, with a sequel to her previous finalist, as well as Chain-Gang All-Stars by Nana Kwame Adjei-Brenyah, although that might carry with it the dreaded perception of being too Literary.
Novella:
Mammoths at the Gate | Nghi Vo |
Thornhedge | T. Kingfisher |
The Crane Husband | Kelly Barnhill |
The Mimicking of Known Successes | Malka Older |
The Lies of the Ajungo | Moses Ose Utomi |
Lost in the Moment and Found | Seanan McGuire |
Untethered Sky | Fonda Lee |
The only change this month is the appearance of Untethered Sky in the bonus slot, bumping off Feed Them Silence by Lee Mandelo. Rose/House by Arkady Martine is not far behind. I have to wonder if at some point Seanan McGuire and/or Nghi Vo will start declining nominations for their individual series, both of which have won previously (McGuire’s Wayward Children three times now). The listed books are #8 and #4 in their respective series.
—————————–
I’ll continue to update monthly-ish until voting closes next year. My plan is to edit this post throughout the year. I’ll carry over this methodology explanation from my previous post.
I’m always open to answering questions and discussing.
Methodology:
The model I use to make the predictions is a continual work in progress, and I regularly retrain it to make it as accurate as possible (although at this point it’s all just very minor tweaks), which is why it sometimes changes seemingly without any new information. I’ll explain a specific example of this as well as my general methodology for further context.
As a disclaimer, I’m not a coder and do not use any sophisticated programming. I’m a pseudo-statistician who has researched predictive modeling to design a formula for something that interests me. I first noticed certain patterns among Hugo finalists that made me think it would be cool to try to compile those patterns into an actual working formula. I use a discriminant function analysis (DFA), which uses predictors (independent variables) to predict membership in a group (dependent variable). In this case the group is whether a book will be a Hugo finalist.
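For anyone unfamiliar with the idea, a toy version of this kind of classification can be sketched in Python with scikit-learn’s LinearDiscriminantAnalysis. To be clear, this is not my actual model; the predictors and data below are invented purely for illustration:

```python
# Toy discriminant-style classifier: predict finalist status from a few
# binary predictors. NOT the real model; features and data are invented.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Columns (hypothetical): [major_publisher, nebula_nominee,
#                          prior_hugo_finalist_author, starred_review]
X_train = np.array([
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
])
y_train = np.array([1, 1, 1, 0, 0, 0])  # 1 = became a Hugo finalist

dfa = LinearDiscriminantAnalysis()
dfa.fit(X_train, y_train)

# Estimated probability that a current-year book lands in the finalist group
candidate = np.array([[1, 1, 0, 1]])
print(dfa.predict_proba(candidate)[0, 1])
```

The real model works on far more books, more years, and 28 predictors, but the shape of the problem is the same: past finalists and non-finalists train the function, and current-year books get scored by it.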
I’ve compiled a database of past Hugo finalists that currently goes back to 2008. Each year I use a dataset that includes information from the previous 5 years, to reflect current trends that are more indicative of the final outcome than many years of past data (Pre-Puppy era data is vastly different from current Post-Puppy era data, despite not being that long ago). I also compile a database of books that have been or are being published during the current eligibility year. Analyzing those databases generates a structure matrix that provides function values for different variables/predictors. For these 2023 predictions, 28 total predictors were used. Each predictor is assigned a value based on how it presented in previous finalists and how it presents in the database of current books. My rankings are simply sums of the values each book receives based on which predictors are present.
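The ranking step itself is just addition. Here’s a tiny sketch of the idea (the predictor names, weights, and books are made up, not my real 28-predictor set):

```python
# Hypothetical sketch of the ranking step: each book's score is the sum of
# the function values of whichever predictors it exhibits. Values invented.
predictor_values = {
    "nebula_finalist": 0.42,
    "locus_recommended": 0.31,
    "prior_hugo_finalist_author": 0.27,
    "starred_review": 0.18,
    "sequel": -0.09,
}

# Which (invented) predictors each book exhibits
books = {
    "Book A": ["nebula_finalist", "locus_recommended", "starred_review"],
    "Book B": ["prior_hugo_finalist_author", "sequel"],
    "Book C": ["locus_recommended"],
}

scores = {
    title: sum(predictor_values[p] for p in present)
    for title, present in books.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # Book A ranks first with a score of 0.91
```

The top six or seven scores become the predicted finalist slate above.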
Predictors cover four general areas: “Specs” such as genre, publisher, and standalone/sequel; “Awards” meaning performance in other awards leading up to the Hugos; “History” meaning an author’s past Hugo history; and “Buzz” such as inclusion on various reader lists, bestseller performance, and whether a book receives a starred review from a prominent publication.
Sometimes I’ll consider a new variable and evaluate whether I have enough previous data to use it, and whether its predictive power makes it worth including. Sometimes I’ll re-evaluate the data I’m already using and determine whether I’m utilizing it effectively. Here’s an example regarding Hugo history as a predictor. Previously I assigned separate predictor values to books written by authors who were previous winners, previous finalists, and previously longlisted, and included each as a separate variable in the model. I’ve since determined this made the variables redundant, leading the model to overfit the data, so I transitioned to assigning works value based on a single consolidated question: whether they were written by previous Hugo winners/finalists or by previously longlisted authors.
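That consolidation can be sketched like this (a toy illustration of collapsing three overlapping binary predictors into one variable, not my actual encoding):

```python
# Toy version of consolidating redundant Hugo-history predictors into a
# single variable. The record format and values are invented.
def hugo_history(author_record):
    """Map an author's past Hugo history to one consolidated predictor."""
    if author_record.get("winner") or author_record.get("finalist"):
        return 2  # previous winner or finalist
    if author_record.get("longlisted"):
        return 1  # previously longlisted only
    return 0      # no prior Hugo history

print(hugo_history({"winner": True}))      # 2
print(hugo_history({"longlisted": True}))  # 1
print(hugo_history({}))                    # 0
```

One variable with mutually exclusive levels carries the same information as the three overlapping flags did, without letting the model double-count an author’s history.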