> In this case, I included the post title, author, date, and content. All of those factors could be relevant to the chance a story gets voted up.
> Even if the model gets extremely good at predicting final_score_if_it_hits_front_page, there’s still the inherent randomness of probability_of_hitting_front_page that is fundamentally unpredictable.
In addition to date, you might want to include three fields:
- day of week (categorical)
- is weekend/holiday (boolean)
- hour or time of the day (categorical, you can have 24 of them or morning/afternoon/etc.).
The probability of a post hitting the front page is usually affected by these things, so including them can really help the model.
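For concreteness, here's a rough sketch of how those fields could be derived from the post timestamp (pandas; the DataFrame and column names are just placeholders, not anything from the post):

```python
import pandas as pd

# Toy example; in practice `time` would be each story's Unix timestamp
# from the HN dataset.
df = pd.DataFrame({"title": ["Ask HN: example"], "time": [1700000000]})

df["posted_at"] = pd.to_datetime(df["time"], unit="s", utc=True)
df["day_of_week"] = df["posted_at"].dt.day_name()     # categorical: Monday..Sunday
df["is_weekend"] = df["posted_at"].dt.dayofweek >= 5  # boolean: Saturday/Sunday
df["hour_of_day"] = df["posted_at"].dt.hour           # categorical: 0..23
# is-holiday would need an external calendar (e.g. the `holidays` package)
# plus a decision about which country's holidays matter for HN's audience.
```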
I find that the best stories get posted by folks in EU time zones, as well as on the weekend (more of a hacker ethos). The flame-bait startup drama is M-F Pacific.
The data is massively interconnected too: if Apple releases a new M-series chip, people flood here to see if there's a thread on it, and while browsing they may be more or less likely to notice other threads because of that.
I haven't run the data, but anecdotally I can tell you that those things probably don't affect hitting the front page. They do affect the total score, but that is not what is being optimized here.
It's counterintuitive, but if you post at a really popular time, you're competing with a lot of other submissions. If you post at a really slow time, you'll get fewer votes, but it will take fewer to reach the front page and you'll have less competition.
In the end, it kinda evens out. The number of votes it takes to get to the front page and the number of competing submissions are both correlated to your fields above.
I think this assumes a uniform distribution of "interestingness" in the competing posts across all of those dimensions, and I wouldn't be surprised if that isn't the case.
It may not be even, but I don't think interestingness is correlated with time of day. But I could be wrong!
Interestingness is subjective, and I would imagine people in different timezones have different preferences. Interesting thing to ponder for a bit.
Popular times for voting vs. posting are not the same.
> is weekend/holiday
Somehow this reminded me of someone datamining spiegel.de (German news site) and using the timestamps of the posted articles to extrapolate the writers' religion (holidays) and relationships (shared vacations), among dozens of other data points, from several years of publicly available data. I think no AI was involved back then.
For anyone interested, it was this CCC talk by David Kriesel (sadly German only).
https://media.ccc.de/v/33c3-7912-spiegelmining_reverse_engin...
There is an English-translated audio track, actually (sound quality is not fantastic, though).
I wonder if hour of day would benefit from being combined with HN's visitors' location data to be truly relevant. I think the location is embedded in the time somehow, if the visitors' origins are stable over time. If 9am PT is a popular time and most of the visitors are in the PT timezone, then even if this 9am PT is encoded as UTC the model will pick it up (I think). Now, if over time visitors get more diverse and a big chunk is coming from Europe, this original 9am will make less sense to the model. Adding visitor-origin stats at the time of the post would probably even help surface regional trends. But I guess this historical data isn't public.
Yep that makes sense. Would be interesting to do a follow-up that explicitly includes these variables and see if it meaningfully improves the results.
I would replace author with a boolean for whether the author's account is new (the green marker HN shows on new users' posts and comments).
> might want to include three fields:
This has been studied multiple times in HN posts; most seem to have link-rotted. Web Archive them if you're looking for insights: https://hn.algolia.com/?q=best+time+to+post
I don't get the conclusion the author is trying to draw. If you look at the data presented, it seems that the model was actually pretty bad at guessing the real-world behavior of the posts listed. Out of the top ten it picked:
* 1 had a score that was reasonably close (8.4%) to what the model predicted
* 4 had scores wildly lower than the model predicted
* 2 had scores wildly higher than the model predicted
* the remaining 3 were not wildly off, but weren't really that close either (25%-42% off)
Then there's a list of 10 submissions that the model predicted would have scores ranging from 33 to 135, but they all only received a score of 1 in reality.
The graph shown paints a bit of a better picture, I guess, but it's still not all that compelling to me.
This is a fair point. The reason why I think "correlation" is a better metric than "predicts the exact correct score" is because of how I'll be using this model in the next post.
Broadly, the main use case for this model (in the RL context) will be to take two different versions of the same post, and predict which of the two is more likely to be upvoted. So what matters isn't that it gets the exact number of upvotes correctly, but that it correctly predicts the relative difference in likely upvote count between two variants.
Now it still doesn't do a great job at that (the correlation is only 0.53 after all) but it still does a good enough job to provide some useful signal.
That makes me wonder, though, what the best loss function would be. I assume you used MSE on the log score. I wonder if a sigmoid loss over which of two articles has the higher score would yield better results for the downstream RLHF task.
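For what it's worth, the pairwise alternative I'm imagining is a Bradley-Terry style ranking loss rather than MSE on the log score; a minimal sketch (PyTorch, made-up tensors, not the author's code):

```python
import torch
import torch.nn.functional as F

def mse_on_log_score(pred, score):
    # The regression objective I'm assuming the post used: fit log(1 + score).
    return F.mse_loss(pred, torch.log1p(score))

def pairwise_ranking_loss(pred_a, pred_b, score_a, score_b):
    # Bradley-Terry style: only cares that the higher-scoring variant
    # gets the higher predicted reward, not about the absolute values.
    sign = torch.sign(score_a - score_b)  # +1 if a should win, -1 if b, 0 if tied
    return -F.logsigmoid(sign * (pred_a - pred_b)).mean()

# Dummy usage:
pred_a, pred_b = torch.tensor([2.1]), torch.tensor([1.4])
score_a, score_b = torch.tensor([120.0]), torch.tensor([3.0])
loss = pairwise_ranking_loss(pred_a, pred_b, score_a, score_b)
```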
Scores are not a good metric to be compared. I did some data analysis and wrote about it here: https://felx.me/2021/08/29/improving-the-hacker-news-ranking...
The score divergence is likely because if a story makes the front page then it almost certainly gets comments and each comment adds one to the score.
But the number of comments depends more on the time posted than on the story itself, and that information isn't in the model.
Thanks for sharing! Very interesting.
> The correlation is actually not bad (0.53), but our model is very consistently over-estimating the score at the low end, and underestimating it at the high end. This is surprising; some variation on any given data point is expected, but such a consistent mis-estimation trend isn’t what we’d expect.
This is a consequence of the model objective. If the model can't tell what is really going to happen for a given story, a good way of reducing the overall error is to pull predictions toward the middle, exactly as described. If you instead try to exactly predict the very highs and very lows, you will get very high errors on those, resulting in a bigger overall error.
Apart from that, I want to comment on AI alignment here. For me, the objective of "most upvotes" is not fully correlated with where I get the most value on HN. Most of the time, the most upvoted stories are ones I would have found anyway on other platforms. It's the middle range that I really like. So be careful implementing this algorithm at scale; it could turn the website into another platform with shitty AI recommendations.
> For me, the objective of "most upvotes" is not fully correlated with where I get the most value on HN. Most of the time, the most upvoted stories are ones I would have found anyway on other platforms.
Yes, this is a fantastic point. I'm curious if there's some other measurable proxy metric for "things I get the most value out of on HN"? Upvotes seem like the most natural, but optimizing for them too strongly would definitely take HN down a dark path.
Perhaps selecting for posts with the highest quality reply engagement? If many different people were drawn to lengthy discussions, that suggests the content sparks thoughts that others then feel compelled to engage with. Or select for the emotional content of replies, awe/empathy/anger, depending on what one wants from HN?
Lots of platforms optimize for engagement, but all that does is encourage ragebait.
Ohh, I really like that as a potential proxy metric!
Perhaps number of comments, or number of non-flamewar comments, or proportion of flamewar comments together with number of comments?
If you withhold a small amount of data, or even retrain on a sample of your training data, then isotonic regression is a good way to solve many calibration problems.
https://scikit-learn.org/dev/modules/generated/sklearn.isoto...
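For anyone who hasn't used it, the calibration step is only a few lines with scikit-learn (toy numbers below; in practice you'd fit on held-out predictions vs. actual scores):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy data: raw model predictions on a held-out set, and the scores
# those stories actually received.
raw_preds = np.array([2.0, 5.0, 9.0, 20.0, 80.0])
actual = np.array([1.0, 3.0, 15.0, 40.0, 300.0])

# Learn a monotonic mapping from raw prediction to calibrated prediction.
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(raw_preds, actual)

calibrated = iso.predict(np.array([4.0, 50.0]))
```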
I also agree with your intuition that if your output is censored at 0, with a large mass there, it's good to create two models: one for the likelihood of zero karma, and another for expected karma, conditional on it being non-zero.
I hadn't heard of isotonic regression before, but I like it!
> it's good to create two models: one for the likelihood of zero karma, and another for expected karma, conditional on it being non-zero.
Another way to do this is to keep a single model but have it predict two outputs: (1) likelihood of zero karma, and (2) expected karma if non-zero. This would require writing a custom loss function which sounds intimidating but actually isn't too bad.
If I were actually putting a model like this into production at HN I'd likely try modeling the problem in that way.
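In case it helps anyone reading along, that custom loss is only a handful of lines; a sketch (PyTorch, assuming the model emits a zero-karma logit and a log-karma prediction per example, which is my framing rather than the author's):

```python
import torch
import torch.nn.functional as F

def two_part_loss(outputs, scores):
    # outputs: (batch, 2) -> column 0 is the zero-karma logit,
    #          column 1 is the predicted log karma.
    # scores:  (batch,) observed karma.
    zero_logit, log_karma_pred = outputs[:, 0], outputs[:, 1]
    is_zero = (scores == 0).float()

    # Head 1: did this post get zero karma?
    clf_loss = F.binary_cross_entropy_with_logits(zero_logit, is_zero)

    # Head 2: MSE on log karma, computed only on the non-zero examples.
    nonzero = scores > 0
    reg_loss = (
        F.mse_loss(log_karma_pred[nonzero], torch.log(scores[nonzero]))
        if nonzero.any()
        else torch.zeros((), device=scores.device)
    )
    return clf_loss + reg_loss
```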
Did you dictate this? It looks like you typo'd/brain-o'd "centered" into "censored", but even allowing for phonetic mistakes (of which I make many) and predictive text flubs, I still can't understand how this happened.
I was thinking of censoring; maybe I should have used another word, like "floored".
The reason I think of this as censoring is that there are some classical statistical models that model a distribution with a large mass at a minimum threshold, e.g. "tobit" censored regression.
https://en.wikipedia.org/wiki/Censoring_(statistics)
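For the curious, the tobit idea is just a likelihood that treats observations sitting at the floor as "the latent value is at or below the floor"; a rough sketch of that log-likelihood (plain scipy, simulated data, left-censored at 0 — not anything from the post):

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def tobit_negloglik(params, X, y, c=0.0):
    # params = [coefficients..., log_sigma]; y is left-censored at c.
    beta, sigma = params[:-1], np.exp(params[-1])
    mu = X @ beta
    censored = y <= c
    ll = np.where(
        censored,
        stats.norm.logcdf((c - mu) / sigma),                  # P(latent value <= floor)
        stats.norm.logpdf((y - mu) / sigma) - np.log(sigma),  # usual normal density
    )
    return -ll.sum()

# Simulated usage:
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = np.maximum(X @ np.array([1.0, 2.0]) + rng.normal(size=200), 0.0)
fit = minimize(tobit_negloglik, x0=np.zeros(3), args=(X, y))
```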
Thanks for the explanation. I never paid much attention in my stats lectures so I deserve to have missed out on that term-of-art. I think the physics lingo would be to call it "capped" or "bounded" or "constrained".
Thanks, it's very understandable that you thought I was mistyping 'centred'.
I'm not the parent commenter, but Whisper-based dictation is getting pretty awesome nowadays. It's almost as good as sci-fi.
(Fully dictated, no edits except for this)
I also thought that the commenter spoke "centered" and the speech recognition model output "censored".
> > This query took 17 seconds to load the dataset into RAM and then aggregating by type was almost instant. It is absolutely incredible to me that I can load every HN post and comment ever into RAM in a few seconds on my (admittedly beefy) dev laptop, and analyze them at will. What an age of abundance!
https://motherduck.com/blog/big-data-is-dead/
> But in 2015 there is a stark discontinuity, where the number of stories (with text) shoots up by >10x, and the average score drops by 5x! Is this some kind of eternal September?
Based on the later analysis in the post (which I agree with), the total score of a post is disproportionately tied to whether it hits the front page, and of course how long it stays there. Regardless of the quality of the average post starting in 2015, the sheer quantity would make it impossible for all but a few to stay on the front page for very long. Hacker News got more popular, so each story got less prime time.
Hey all, this project was a labor of love I worked on in my spare time over the last couple of weeks. Happy to answer any questions!
I think it is interesting, but I can't help but feel that things like this result in the homogenizing and blandifying of content. It is like training a model to predict which movies will be successful at the box office -- the result will be the same kinds of movies over and over. No one knows what the breakthrough success is until it shows up, and no model can predict those. Essentially this is teaching people how to make HN full of nothing but complaints and indie success stories.
What is your take on this?
It's interesting that service complaints are so popular on HN. I always feel a bit bad that my most popular HN contribution was me complaining about a popular service.
I flag most complaint posts, unless the complaint actually brings to light or discusses something surprising or unique that can be generalized and discussed.
I generally find these posts pretty boring, and most comments on them are people recounting their own stories about how that (or a similar) service screwed them over. I suppose they can be a decent way to warn people off of a particular product (scammy, terrible customer support, whatever), but that's not what I come to HN for.
A popular theory in techie parts of the web is that engagement-optimizing sites create this negativity loop, but I disagree. I think negativity is something people naturally seek no matter what the algorithm is. On an upvote-based site, outrage rises to the top. I also think text-based platforms suffer from negative engagement much more so than multimedia platforms.
Model correlation is decent here but there's certainly more to do to use its outputs predictively.
I don't really agree with this. I go and hang out with my friends, and we don't all end up getting outraged about stuff. I go for a walk in the park and no one is shouting at me; I go to a restaurant and people are sitting around normally discussing whatever. If you start quoting outrage bait that you read online, people might look at you strangely.
My point is I don't think people seek out outrage. Social media's algorithms may not explicitly reward it as transparently as `if (post.outrage > 100) post.boost()`, but outrage isn't some default rule of interaction.
As a mastodon user, I can definitely confirm this.
Give people a way to repost / retweet / boost, and your feed suddenly turns into mostly negativity, even if your algorithm is "show posts from my followers only, newest to oldest".
Yeah, the people I follow on Bluesky are carefully curated to stop my feed from swelling into negativity. I've been playing around with a labeller that filters followed posts down to those I find emotionally pleasant, which I've been training on my own labeling of followed accounts' posts. The goal is to follow more people and have the labeller (or feed generator, depending on how I go) hide the posts I don't care for.
If that theory were true, then what about every website on the internet pre-2010? What about 4chan?
See also https://en.wikipedia.org/wiki/Negativity_bias
We're just built like that.
Regarding text platforms suffering more than non-text platforms, I think it's because of the lack of social cues that are otherwise there. You can infer a lot from the way someone talks, or from their body language. You can't infer much from text, which is partly why Poe's law exists -- sarcasm doesn't translate well.
> what about every website on the internet pre-2010
It was definitely there. Plenty of forums had "rant threads" that were efforts to quarantine shitty reactionary behavior like this. Also, a lot of the healthier forums were smaller forums. I was on plenty of forums that had 10-20 folks on them that today would just be a Telegram group chat or a small Discord "server". These small spaces tend to be a lot lower on toxicity than larger fora. I was part of a few large fora like Gaia Online and they were just as toxic as today's large platforms. Managing large communities with chronological posting is really difficult, and upvote-based social networks were the first real networks able to scale to larger userbases without having hundreds of moderators (like Gaia or the large MUDs).
> What about 4chan?
4chan is immune because the default emotional register there is indignant dismissal. Because of this, it's just a matter of choosing what else to layer on top of the indignant dismissal, like sarcasm or anger or whatnot.
> Regarding text platforms suffering more than non-text platforms, I think it's because of the lack of social cues that are otherwise there. You can infer a lot from the way someone talks, or from their body language. You can't infer much from text, which is partly why Poe's law exists.
That's an interesting theory actually. My theory was that in the age of multimedia platforms, text platforms tend to attract folks who specifically want to use text over multimedia. Generally text forums will select for folks with social or self-esteem issues. These folks are the least likely to healthily deal with their emotions or disengage positively. This leads to higher toxicity on text based platforms.
> My theory was that in the age of multimedia platforms, text platforms tend to attract folks who specifically want to use text over multimedia. Generally text forums will select for folks with social or self-esteem issues. These folks are the least likely to healthily deal with their emotions or disengage positively. This leads to higher toxicity on text based platforms.
Some people like to take time to compose thoughts in written form because that is generally the best way to communicate thoughtfully. You can say what you will about a lack of body language, but plenty of people get into verbal fights in person and it doesn't help that they end up talking over each other.
I think that your assertion that people who communicate via text have social issues is without evidence and is reductive.
You could say that people who enjoy looking at themselves and hearing themselves enough to edit their footage and post it online have ego issues and are less likely to listen to what others have to say.
My reading of your response is that you identify as a person who prefers written-form communication because you feel that it is the best way to communicate thoughtfully, and you felt personally attacked by my response. I think that's reductive and not really relevant to this train of thought, and your response feels like a defense of your identity. I personally prefer communicating in text as well, because I like to take my time to compose my thoughts, but I know that presents a weakness for me: I'm much less able to articulate my thoughts in fast-moving situations such as work meetings, community emergency planning, or other things. I am, indeed, less capable in social situations than others, and it's a deficiency I've tried to grow past my entire life.
The direction of my implication comes from observation: text communities all tend to descend into toxicity (observation) -> why does this happen in text communities more so than non-text communities? (question) -> a higher proportion of socially maladapted people (theory). You might well be correct that people who enjoy looking at and hearing themselves and have ego issues are the ones that prefer (compose a higher proportion of) multimedia social networks. I don't disagree with you, either. That's beside the point. The point is that most text communities tend to descend into toxicity.
Humans aren't perfect, and if I'm in a positive community of high egos, I'd much prefer that to a toxic community with "normal" egos.
So I want to zoom in on this:
> Some people like to take time to compose thoughts in written form because that is generally the best way to communicate thoughtfully. You can say what you will about a lack of body language, but plenty of people get into verbal fights in person and it doesn't help that they end up talking over each other.
We're talking about social networks here, not real life, because social networks deal with a fundamentally different problem. In a social network (yes, this includes IRC) you are interacting with a number of people with whom you do not share any real-world context, with whom you do not share any physical space, and who generally have a much lower stake in their relationships because of the lack of shared context.
In my experience, all textual social networks that grow beyond a certain number of users descend into toxicity: Usenet, IRC (old Freenode and Rizon), Slashdot, Digg, Reddit, HN, YouTube comments, Nextdoor, local news comments, Twitter/X, etc. I think "algorithms" (including counting upvotes) have reduced the moderation burden and allowed social sites to scale much higher than they could before algorithms.
Text communities all eventually collapse into ranting, bullying, hot takes, moral outrage, zealotry, and negativity. I'm open to any and all theories about why this is, but I find it specific to text-based communities: Twitch, Instagram, and TikTok have so much less of it, for example. I think the idea that text leads to thoughtful communication was a hypothesis advanced first during the Usenet era and later during the blogging era, but it ended up being disproven. I think there's a nostalgia for the pre-media web that pervades these discussions and prevents text fans from realizing, at a macro level, that the toxicity that was on comp.lang.lisp is the same toxicity in HN comments, and is toxicity that just isn't there on most of Instagram, for better or for worse.
I actually think this identity around being a "text person" is part of the problem. The moment you wrap your identity around something you become both proud and protective of it. For some things this is fine, but if your preferred media itself becomes part of your identity, then you're going to have a blind spot around what makes your preferred social media different from the others.
Excuse me but you are the one who made the 'text identity' distinction and called people who don't prefer posting videos of themselves 'toxic'.
What exactly is a 'multimedia community' anyway? You haven't defined it. Is it just TikTok?
I don't think you're really engaging with my comment. I feel that you're offended at me calling text-only users of the internet toxic, and that you're responding in defense. If that's the case then there's no value in our discussion. You're just going to reply with charged comments until I recant.
If you want another perspective on my point, take a look at https://www.reddit.com/r/slatestarcodex/comments/9rvroo/most...
Have a nice day.
I think that you are not only incredibly patronizing, but you use faux psychology 'active listening' tactics to pretend to engage when you are really just shoving your point through while making yourself think that you are listening to people.
The fact that you cannot even engage to answer what a multimedia community is without claiming that I am acting in bad faith in order to jump out of an escape hatch is telling.
Your lack of self-awareness is astonishing.
>Twitch, Instagram and TikTok have so much less of it for example.
I'd be interested in any sort of evidence that supports this
> My theory was that in the age of multimedia platforms, text platforms tend to attract folks who specifically want to use text over multimedia. Generally text forums will select for folks with social or self-esteem issues. These folks are the least likely to healthily deal with their emotions or disengage positively. This leads to higher toxicity on text based platforms.
Yeah that's very plausible indeed
This video will make you angry: https://www.youtube.com/watch?v=rE3j_RHkqJc
Humans love having something to be righteously indignant about.
I don't like it, but it seems the internet always reacts more to inherently negative posts. That seems to be common across the entire internet; I think that's why the internet doesn't seem as fun as it did 10 years ago.
I'm sure it's just human psychology, but I'm trying to overcome it and make my life more positive again.
I suspect a large percentage of Dan's work moderating HN is downweighting posts that incite engagement through frustration. On at least one occasion I've had the top comment in a thread by over 100 upvotes that purely echoed the sentiment of several readers but did not contribute to the curated voice of the community.
Very interesting! Identifying great new content is a big unsolved problem for HN IMHO. Unfortunately, scores are not a good metric to predict, because they are not comparable (see https://felx.me/2021/08/29/improving-the-hacker-news-ranking...). A better metric might be "upvoterate", defined as how much more or less likely users are to upvote a story compared to the average story. More about that here: https://github.com/social-protocols/quality-news?tab=readme-...
There is a timing factor that you need to consider, too. Anecdotally, Sunday morning is the best time to get onto the front page, while Tuesday or Wednesday morning gets you the most views.
Yep, that's why I included the post date in the information available to the model; in theory (if it's smart enough) it should be able to take that into account. That said I didn't include time-of-day; it would be interesting to see whether adding that information would be able to make the model more accurate!
If the reward model is indeed smart enough to be able to take that into account you could actually use it to plan the optimal time of day to post a specific story! You could just use the reward model to compute a predicted score for 8 different versions of your content, holding the post title/text constant across them all and just changing the date. Based on the differences in scores, you can determine which posting time the RM thinks is most likely to make your post successful!
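Something like this, where `predict_score` is a hypothetical wrapper around the trained reward model (not code from the post):

```python
from datetime import datetime, timedelta, timezone

def predict_score(title: str, body: str, posted_at: datetime) -> float:
    # Hypothetical: format (title, body, posted_at) the same way the
    # training data was formatted and run it through the reward model.
    return 0.0  # placeholder

title, body = "Ask HN: Example title", "Example body text"
base = datetime(2025, 1, 5, 0, 0, tzinfo=timezone.utc)

# Score the same post at 8 candidate posting times, 3 hours apart,
# and pick the one the reward model likes best.
candidates = [base + timedelta(hours=3 * i) for i in range(8)]
best_time = max(candidates, key=lambda t: predict_score(title, body, t))
```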
>you could actually use it to plan the optimal time of day to post a specific story!
You see this on Reddit pretty commonly.
Someone posts original content at an off time and gets a small/moderate number of upvotes. Then some time later (could be hours, days, or weeks) a bot/karma account will post the content at an optimal time to farm upvotes.
> It’s super important that your training inputs includes all the information your model will need to make predictions. In this case, I included the post title, author, date, and content. All of those factors could be relevant to the chance a story gets voted up.
You would do better to leave out dates and authors.
Do you really want the model to hone in on dates & authors? If you just trained on those would it create anything useful?
It can’t for dates, since it isn’t getting any future date examples to prepare for future dates. I suppose you could argue that month & day matter. But surely that would be a much lower quality discriminator than forcing the model to stay focused on title & content.
Similarly with author. You can find out which authors produce content with the most upvotes with a simple calculation.
But again, is that the discriminator you want the model to use? Or the title & content? Because it will use the easiest discriminator it can.
Take note HN, this is what great content marketing looks like.
Why use RL for this instead of plain old supervised learning?
I am trying to understand this too.
In supervised learning you train on pairs (x, y), where x is your input (title/post text/metadata) and y is the output score.
Naively, it's a linear regression model, Y = b0 + b1x1 + b2x2 + b3x3, where b0 is your bias ("a floor for score points") and b1, b2, and b3 are the coefficients on the actual data of the post. You can solve this in closed form and find the b1/b2/b3 that minimize the error of fitting Y.
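For concreteness, that closed-form fit is just least squares; a sketch with toy numbers (in a real setup x would be text embeddings or other numeric features):

```python
import numpy as np

# Toy design matrix: a column of ones for b0, then three post features.
X = np.array([
    [1.0, 0.2, 1.0, 5.0],
    [1.0, 0.9, 0.0, 2.0],
    [1.0, 0.5, 1.0, 9.0],
    [1.0, 0.1, 0.0, 1.0],
])
y = np.array([3.0, 1.0, 7.0, 0.5])  # observed scores

# Least-squares solution to Y = b0 + b1x1 + b2x2 + b3x3,
# equivalent to b = (X^T X)^{-1} X^T y when X^T X is invertible.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
```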
How do these equations change with RL? I always assumed RL was a multi-step process where actions are taken to get to a reward. If there is only 1 step/decision, to produce a "random" score, it feels much like supervised learning.
The post is not doing RL. It's just regression as you thought.
This post is using regression to build a reward model. The reward model will then be used (in a future post) to build the overall RL system.
Here's the relevant text from the article:
>In this post we’ll discuss how to build a reward model that can predict the upvote count that a specific HN story will get. And in follow-up posts in this series, we’ll use that reward model along with reinforcement learning to create a model that can write high-value HN stories!
The title is misleading. The $4.80 is spent on supervised learning to find the best post.
The post is interesting and I'll be sure to check out the next parts too. It's just that people, as evidenced by this thread, clearly misunderstood or were confused about what was done.
It is just plain old supervised learning. A regression from the post features to vote count. The RL discussion in TFA is a bit confusing.
Such a model can be used as the "reward model" for the "reinforcement learning from human feedback" (RLHF) method.
Nice write up.
Did you ever figure out what happened in 2016?
Nope. I was actually planning on asking dang if he has any insights there. If he sees this thread hopefully he can chime in!
Given that Google Trends doesn't show that bump, I'd assume it has to do with how the data was collected. Maybe all stories with < X votes/comments older than 2015 are not included, or deleted from whatever index you used?
In case he doesn't, you might as well email him about it. He's a very responsive guy and might find it interesting.
I think text vs. link used to be XOR, but isn’t any longer.
It’s still outside the HN mainstream to use both in the same submission, so that might be biasing the model in strange ways.
From the post:
> But to simplify, instead I’ll just limit to stories that have only text bodies, instead of links.
This line implies that both pre- and post-2016 stories are text-only, so this change should not affect the data much.
Now do it again, and this time see where your post on ranking posts ranks. Personally, I find lauding the dead, and the dead past, to be somehow objectionable. Though I suppose that it is the business of our so-called AI, mining the dead past, hoping to come up with something better than Frankenstein's zombie corpse. It is an insurmountable limitation, and dangerous. I think as well, the past is that ultimately perfect thing, its absolute immutability, and totality, as it is all there; to pick and choose from such a thing is brazen indeed. I can't help but imagine a picture of your $4.80 actually being consumed in a bed of fluidised coal, which in fact it was.
A suggestion would be to try to correlate the best time to post on HN to get it noticed. A good post won't catch fire if it doesn't overcome the initial low visibility. I've posted items that were later posted by others and gained traction.
Maybe the reputation of the poster is also a factor?
Is my understanding correct that the reward model is also similar to an LLM (with the difference being that it predicts a score instead of the next token)?
Yes! The architecture is almost identical. The only difference is in the final layer. In an LLM used for text generation, the final layer has a separate output for every potential token the model could produce, and we decide which token to generate by choosing the one with the highest likelihood at each generation step (at least that's what the simplest sampling methods do). In an LLM used as a reward model, we only have one output in the final layer, and we interpret its value as the predicted reward.
Everything else in the model before that final layer is exactly identical, architecture-wise.
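As a sketch of what that single-output final layer looks like in code (Hugging Face backbone plus a small value head; the model name and pooling choice are just examples, not the author's exact setup):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class RewardModel(nn.Module):
    def __init__(self, base_name: str = "gpt2"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_name)  # transformer without the LM head
        self.value_head = nn.Linear(self.backbone.config.hidden_size, 1)  # the single output

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # Pool by taking the hidden state of the last non-padding token,
        # then map it to one scalar: the predicted reward.
        last_idx = attention_mask.sum(dim=1) - 1
        pooled = hidden[torch.arange(hidden.size(0)), last_idx]
        return self.value_head(pooled).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = RewardModel("gpt2")
batch = tokenizer(["Ask HN: example story"], return_tensors="pt", padding=True)
predicted_reward = model(batch["input_ids"], batch["attention_mask"])
```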
But a typical LLM has a feedback loop: it looks at the last token it generated and then decides, given the N tokens before that, which token to output next.
In the case of a reward model, are you streaming in the list of tokens; if so, what is the output after each token? Or are you feeding in all of the tokens in one shot, with the predicted reward as the output?
There are multiple ways to model reward. You can have it be fine-grained, such that every token gets its own reward, but by far the most common is to feed in the whole sequence and generate a single reward at the end.
I guess I'm not sure how the "feed in the whole sequence" works, if there's a single reward at the end.
It depends on the model and the problem. As an example, BERT-based models have a special [CLS] token that was pre-trained to encode information about the whole sequence. A reward model based on BERT would take the output embedding of that token from the last layer and feed it through a classification head, which would depend on your problem. You could then train this classification head on your alignment dataset like a classification problem.
You can check the examples from the TRL library for more information.
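Concretely, with the transformers library that pattern is just a sequence-classification head with a single label (the model choice here is only an example):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# num_labels=1 gives a single-output (regression-style) head on top of the
# pooled [CLS] representation, which is then trained as the reward.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

inputs = tokenizer("Ask HN: example story", return_tensors="pt")
reward = model(**inputs).logits  # shape (1, 1): the predicted reward
```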
What does the model say about this post?
Haha great question. Since it's only trained on on-platform HN content and not external links, this post is a little bit out of distribution for it unfortunately. I'm thinking about scraping a corpus of external links and running the same analysis though, in which case I'd definitely run it on this story because I'm also curious about that. :)
I would be very interested in the results of that as well
> And in follow-up posts in this series, we’ll use that reward model along with reinforcement learning to create a model that can write high-value HN stories!
Well, thanks HN, you were good while it lasted...
Very interesting project; I would love to read a more technical write-up on how the model was architected and trained. Any pointers?
I link to it from the post, but all the code is open source! You can find the specific training script here: https://github.com/OpenPipe/best-hn/blob/main/stories_train_...
And all the graphs for the blog are from this notebook: https://github.com/OpenPipe/best-hn/blob/main/blog-figures.i...
Lots of other good stuff in that repo, although it's only organized to a "working researcher" standard I'm afraid.
Even the AIs don't read the content before up/down voting.
Maybe it would help to use a Box-Cox transform on the score distribution?
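For reference, that's a one-liner with scipy (scores have to be strictly positive, which HN scores are since they start at 1); toy numbers below:

```python
import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox

scores = np.array([1, 1, 2, 3, 5, 12, 48, 250, 900], dtype=float)  # toy scores

# Box-Cox fits a power transform that makes the target distribution more
# Gaussian; you'd train on `transformed` and map predictions back afterwards.
transformed, lam = boxcox(scores)
back = inv_boxcox(transformed, lam)
```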
> That’s not much time for a model that (hopefully) understands all of HN!
This is dangerous talk.
It doesn't understand anything at all.
Reminder: we are more prone to anthropomorphizing LLMs than to humanizing suffering humans.
This is very cool. Have you tried DPO?
The first problem with the submissions that supposedly 'would do well on HN' is that, other than the Ask HN, they're misusing the submission form by putting it in a text post instead of sharing it as a link post directly. And sketchy new/inactive accounts. C'mon. Not gonna keep reading a grifty post after that opening.