
Show HN: Hacker News em dash user leaderboard pre-ChatGPT

364 points - last Saturday at 3:40 AM


The use of the em dash (—) now raises suspicions that a text might have been AI-generated. Inspired by a suggestion from dang [1], I created a leaderboard of HN users according to how many of their posts before November 30, 2022—that is, before the release of ChatGPT—contained em dashes. Dang himself comes in number 2—by a very slim margin.

Credit to Claude Code for showing me how to search the HN database through Google BigQuery and for writing the HTML for the leaderboard.

[1] https://news.ycombinator.com/item?id=45053933

Source
  • dang

    last Saturday at 8:02 PM

    v1 (the submitted URL) was https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo.... We've now replaced it with v2, for more complex analytical em dash explorations :) - see https://news.ycombinator.com/item?id=45075379 and https://news.ycombinator.com/item?id=45072635.

    • Symbiote

      last Saturday at 8:26 AM

      Using the HN public dataset in Google BigQuery [0], which I think fits easily in the amount of free queries allowed:

        SELECT 
          EXTRACT(YEAR FROM timestamp) AS year, 
          SUM(CASE WHEN text LIKE '%—%' THEN 1 ELSE 0 END) AS withDash, 
          COUNT(*) AS total, 
          SUM(CASE WHEN text LIKE '%—%' THEN 1 ELSE 0 END) / COUNT(*) AS fraction
        FROM `bigquery-public-data.hacker_news.full` 
          WHERE type = 'comment' 
        GROUP BY year 
        ORDER BY year;
      
        year with—   total  frac
        2006     0      12 0.000
        2007    13   70858 0.000
        2008   461  247922 0.001
        2009  1497  491034 0.003
        2010  3835  842438 0.005
        2011  4719 1044913 0.005
        2012  5648 1246782 0.005
        2013  7881 1665185 0.005
        2014  8400 1510814 0.006
        2015  9967 1642912 0.006
        2016 12081 2093612 0.006
        2017 14530 2361709 0.006
        2018 19246 2384086 0.008
        2019 23662 2755063 0.009
        2020 27316 3243173 0.008
        2021 32863 3765921 0.009
        2022 34657 4062159 0.009
        2023 36611 4221940 0.009
        2024 32543 3339861 0.010
        2025 30608 2231919 0.014
      
      So there's definitely been an increase.
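As a quick sanity check, the fraction column can be recomputed from the two count columns. A minimal Python sketch, with three rows copied from the output above (the names `rows` and `fraction` are only for illustration):

```python
# (year, comments containing an em dash, total comments), copied from the table above
rows = [
    (2018, 19246, 2384086),
    (2022, 34657, 4062159),
    (2025, 30608, 2231919),
]


def fraction(with_dash: int, total: int) -> float:
    """Share of comments that contain at least one em dash."""
    return with_dash / total


# Round to three decimals, the same precision the table shows
fractions = {year: round(fraction(w, t), 3) for year, w, t in rows}

for year, frac in fractions.items():
    print(year, f"{frac:.3f}")
```

Note that this measures comments containing at least one em dash, not the number of em dashes typed.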

      Querying for the users who use "—" most as a proportion of all their comments:

        SELECT
          `by`,
          SUM(CASE WHEN text LIKE '%—%' THEN 1 ELSE 0 END) / COUNT(*) AS fraction,
          COUNT(*) AS total,
          MIN(timestamp) AS minTime,
          MAX(timestamp) AS maxTime
        FROM `bigquery-public-data.hacker_news.full` 
        WHERE 
          type = 'comment' AND 
          timestamp < '2022-11-30' 
        GROUP BY `by`
        HAVING COUNT(*) > 100
        ORDER BY fraction DESC
        LIMIT 250;
      
      zmgsabst uses them the most [1]; westoncb [2] is an older account that uses them fourth-most.

      [0] https://console.cloud.google.com/marketplace/product/y-combi...

      [1] https://news.ycombinator.com/threads?id=zmgsabst

      [2] https://news.ycombinator.com/threads?id=westoncb

        • hithereagain

          last Saturday at 4:54 PM

          Older people, say folks in their forties or older, grew up with the em dash.

            • JdeBP

              last Saturday at 11:20 PM

              That's backwards. People in that age bracket grew up with computers where the em dash was not in the character set at all, and typewriters and terminals only had a minus key.

              The people who grew up with the em dash are the younger HTML generation of 30 years ago where &mdash; was at least a reasonably convenient character entity even if they were using computers with the various 8-bit character sets that did not contain it.

                • jml78

                  last Sunday at 12:18 AM

                  Correct, I am 46, grew up with BBS. Early internet. I will be honest, never knew the name of em dash until it became a GPT thing.

                    • YVoyiatzis

                      last Sunday at 5:34 AM

                      # Dash Usage Guide

                      *Hyphen (-)* = word-joiner

                      *En dash (–)* = “to/between”

                      *Em dash (—)* = pause, punch, drama

                      • JdeBP

                        last Sunday at 6:56 AM

                            ... meaning that you have read some posts on this page a certain way.  (-:
                            --- IM2000
                             * Origin: Some WWW site named Hacker News (2:257/609.3)

                    • reaperducer

                      last Sunday at 4:20 AM

                      > That's backwards. People in that age bracket grew up with computers where the em dash was not in the character set at all, and typewriters and terminals only had a minus key.

                      I guess you weren't there. We did em-dashes on typewriters. We just turned the platen knob down one click, typed _, and turned it back.

                        • npsomaratna

                          last Sunday at 5:14 AM

                          Anecdotally, what I've seen is that folks who learned typing in the 80s and earlier use two dashes '--' instead of the em-dash (although modern word processors seem to replace this combination with the em-dash). Something else I've noticed is their tendency to use two blank spaces between sentences.

                          I'm a self-taught typist, with all the quirks that come with it (can type programming stuff very accurately at 100+ WPM; can type normal stuff at a high WPM as well, but the error rate goes up).

                          • ted_dunning

                            last Sunday at 5:11 AM

                            None of us at our house did that.

                              • reaperducer

                                last Sunday at 11:36 AM

                                That doesn't mean it didn't happen. Your house is not the only house.

                                Moreover, your home is not representative of the millions of typewriters in businesses around the world.

                        • JKCalhoun

                          last Sunday at 12:50 AM

                          True, but when desktop publishing arrived on the Mac, I embraced it.

                            • DonHopkins

                              last Sunday at 6:50 AM

                              {—}

                      • jnwatson

                        last Sunday at 12:35 AM

                        Older people that grew up with "desktop publishing" and "The Mac is not a Typewriter" grew up with the em dash.

                          • JKCalhoun

                            last Sunday at 12:49 AM

                            Correct. And my typewriter dad will do two dashes --.

                              • patrickmay

                                last Sunday at 12:56 AM

                                Son?

                    • LeoPanthera

                      last Saturday at 8:33 AM

                      I took a peek at zmgsabst's comments, but they use them with spaces around the dash — like this.

                      ChatGPT always uses them without spaces—like this.

                        • Symbiote

                          last Saturday at 8:47 AM

                          Changing the filter to

                            text LIKE '%—%' AND text NOT LIKE '% —%' AND text NOT LIKE '%— %'
                          
                          puts westoncb in the lead, followed by mucholove, trebbble, _zzaw and lexcorvus.
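For anyone who wants to try the same filter outside BigQuery, here is a rough Python equivalent of the three LIKE conditions. It is only a sketch; the function name is made up for illustration.

```python
EM_DASH = "\u2014"


def has_unspaced_em_dash_only(text: str) -> bool:
    """Mirror the SQL filter above.

    True when the text contains an em dash and no em dash anywhere in it
    has a space directly before or directly after it.
    """
    return (
        EM_DASH in text
        and f" {EM_DASH}" not in text
        and f"{EM_DASH} " not in text
    )
```

Like the SQL version, this drops a comment entirely if even one of its em dashes has a space next to it, so it undercounts people who mix both styles.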

                            • westoncb

                              last Saturday at 2:35 PM

                              I actually tweeted like a month ago that I was the reason LLMs use em dashes so much lol: https://x.com/Westoncb/status/1961802304698671407

                                • JdeBP

                                  last Saturday at 11:27 PM

                                  There are quite a few &mdash;es on my WWW site and on StackExchange thanks to me; and I vaguely recall that I might even have written one on Wikipedia once. But I am quite happy for you to take the blame for training the LLMs. (-:

                                    • westoncb

                                      last Sunday at 2:03 PM

                                      lol no problem. In reality though there's kind of a funny story behind it because I suspect the way I ended up using them so much is similar to how ChatGPT did. When I got into writing I studied grammar, then decided to read a bunch of classics and analyze their usage of punctuation in general until I had a good understanding of every bit of it. Then, in order to practice, I'd apply what I learned to anything I was writing at the time whether journal notes, conversations on AIM/IRC etc. That latter step meant I was translating a lot of casual/natural speech into a form that also had a high level of 'correctness'. And if you faithfully translate natural speech into 'correct'ly punctuated sentences, you end up using a lot of em dashes. Because ChatGPT/LLMs are tuned for natural/authentic style, as well as for a high degree of 'correctness,' you get today's state of affairs. Just a theory.

                          • Rumudiez

                            last Saturday at 11:27 PM

                            The rule is spaces on both sides of an en dash – like so – or an em dash without any spaces—like this. Important to note the US keyboard layout does not have either of these or the minus glyph, just the hyphen, and it’s inadvisable to mix multiple styles.

                            • eMPee584

                              last Saturday at 9:24 PM

                              & it looks awful without spaces — imho

                                • perilunar

                                  today at 5:25 AM

                                  You can also use an em-dash with thin spaces (U+2009) or hair spaces (U+200A), but it doesn't work on HN—they just display as regular spaces.
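To see which characters are involved, here is a small Python sketch that builds an em dash padded with thin or hair spaces and looks up the Unicode names. The helper `pad_dash` is hypothetical, just for illustration.

```python
import unicodedata

EM_DASH = "\u2014"
THIN_SPACE = "\u2009"
HAIR_SPACE = "\u200a"


def pad_dash(left: str, right: str, space: str) -> str:
    """Join two fragments with an em dash surrounded by the given space character."""
    return f"{left}{space}{EM_DASH}{space}{right}"


thin = pad_dash("like", "this", THIN_SPACE)
hair = pad_dash("like", "this", HAIR_SPACE)

# Look up the official Unicode names of the three characters
names = [unicodedata.name(c) for c in (THIN_SPACE, HAIR_SPACE, EM_DASH)]
print(names)
```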

                                  • JKCalhoun

                                    last Sunday at 12:52 AM

                                    Which is what I do (add a space before and after). I didn't know you weren't supposed to put the spaces until someone pointed it out to me — suggested I was not an LLM because I added the spaces.

                                    Makes me wonder if kerning is done correctly, if the em-dash would look like there were spaces before and after when there were not.

                                  • colanderman

                                    last Sunday at 4:07 AM

                                    The common guidance I've seen is en dash with spaces, em dash without.

                                • indigodaddy

                                  last Saturday at 9:32 AM

                                  I always thought the proper usage was no space before but one space after-- like this.

                                    • wizzwizz4

                                      last Saturday at 2:04 PM

                                      There's no "proper usage" for any feature of English: it's all by consensus. However, I have seen that in published books from the 1900s.

                          • lynndotpy

                            last Saturday at 3:17 PM

                            You might also want to rank by how often people use double hyphens-- like so.

                            I'm probably not alone here in being a longtime Linux user who started using a Macbook after the Apple Silicon transition, late 2022.

                            On Windows and Linux, inserting an em-dash is a laborious alt-code process. But on MacOS with an Apple keyboard, the `option` key acts like a tertiary shift, so an `–` em dash is just <option><->.

                            I didn't start using em-dashes (typing -- is just second nature to me and I'm still on Linux most of the time) when I got a Macbook, but I imagine some people in my shoes did.

                              • uv-depression

                                last Saturday at 3:32 PM

                                That character is actually the en dash (properly used in ranges, e.g. 5–10). The em dash is [shift][option][-]. I would also include triple hyphen in that list; for those of us used to TeX a double hyphen (--) is an en dash and a triple (---) is an em dash.
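The TeX convention is easy to sketch in Python: replace the triple hyphen first, then the double. This is a simplification for plain text, not how TeX itself implements its ligatures, and `tex_dashes` is a made-up name.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def tex_dashes(text: str) -> str:
    """Convert TeX-style hyphen runs: --- becomes an em dash, -- an en dash."""
    # Order matters: handling "--" first would eat part of every "---".
    return text.replace("---", EM_DASH).replace("--", EN_DASH)


print(tex_dashes("pages 5--10"))
print(tex_dashes("wait---what"))
```

A single hyphen is left alone, so ordinary hyphenated words pass through unchanged.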

                                  • _alternator_

                                    last Saturday at 3:47 PM

                                    Yup. I use an em dash all the time after I started using TeX. Probably makes my posts look like AI—but it’s worth it.

                                    To get an em dash on an iPhone, long hold the hyphen—it’s the third (longest) option.

                                    (Edit: typo. Using iPhone after all.)

                                      • rogerrogerr

                                        last Saturday at 11:51 PM

                                        You aren’t putting spaces before and after the dash - which lowers your AI probability score in my mind.

                                          • rcruzeiro

                                            last Sunday at 12:33 AM

                                            ChatGPT (at least for me) does not add spaces around the em dash unless you explicitly tell it to use British spelling and conventions.

                                    • latexr

                                      last Saturday at 4:52 PM

                                      > The em dash is [shift][option][-].

                                      On the US layout, sure, but there are other layouts where they are switched (i.e. ⌥- is em-dash and ⇧⌥- is en-dash).

                                  • mananaysiempre

                                    last Saturday at 3:25 PM

                                    > On Windows and Linux, inserting an em-dash is a laborious alt-code process.

                                    On Linux, you can set up a Compose key, after which an em-dash is compose, three hyphens (Macintosh: shift-option-hyphen), and an en-dash is compose, two hyphens, period (Macintosh: option-hyphen). Also, a left (resp. right) single (resp. double) quote is compose, less-than (resp. greater-than), typewriter single (resp. double) quote. That’s how I enter them.

                                    You can also (alternatively or at the same time) set up a “Level 3 shift” aka “Alternate Characters Key” aka AltGr, which gets you quotes with one of the English International layouts or quotes as well as dashes with an English Macintosh layout.

                                      • nullc

                                        last Saturday at 9:19 PM

                                        I started using them in 2008 or so (I think) when I created a custom keymap to add Greek characters and nbsp. I stopped using them after MacOS changed to make them automatically because then their use started to be an obvious sign of being an apple user (see also: https://www.jstor.org/stable/2096459).

                                        Someone recently created some long list of my reddit comments using them as a farcical claim of having used ChatGPT to author many dozens of 2010 comments.

                                          • llbbdd

                                            last Saturday at 10:12 PM

                                            Why did someone go to that effort?

                                              • nullc

                                                last Sunday at 12:11 AM

                                                I'm one of the early developers of Bitcoin. This made me a target of a conman pretending to be Bitcoin's creator. The conman eventually filed a trillion dollars worth of vexatious lawsuits against myself and others, and we defeated him in court rather profoundly. During the litigation ChatGPT came out and he immediately started using it to create fake evidence, much to his detriment. He also used it to write his witness statements (unlawfully) and his appeals, resulting in (insubstantial) sanctions. He's subsequently gone full whispering earring, pumping out whole novels worth of chatgpt glurge every few days, in an apparent desperate attempt to show that the stuff people identified as chatgpt was really him.

                                                He has a cult following who believe him to be a victim of a lizard jew conspiracy or something and who are quite displeased by people mocking him for continually putting out obvious AI slop. And clearly the people who are accusing him of slopping it up must be doing so themselves... plus then there are people making fun of these people.

                                                Of course, emdash use was not actually a meaningful factor in any of the determination of chatgpt use in court... but it's a signal that even fairly unsophisticated people notice and often presume underlies claims of AI (ab)use.

                                                TLDR: morons consider me their enemy

                                                  • llbbdd

                                                    last Sunday at 7:10 PM

                                                    Thank you for explaining, that's insane and I'm sorry you have to deal with that.

                                    • animuchan

                                      last Sunday at 7:57 AM

                                      Yep, I was a Linux user for the longest time, so naturally used compose for em dashes (compose key + triple hyphen IIRC). I was later thrilled to learn that on macOS (which was called Mac OS X back then) it's even faster to type with option-shift-hyphen, and never let go of em- and en-dashes in my writing.

                                      It's sad and not at all surprising that people who even half-assedly care about typography get this effort attributed to AI use.

                                      In the post-competence workplace we're collectively building now with all the LLM coding tools, I already see people intuitively attributing non-trivial code to AI. It's a projection of their own inability, more or less.

                                      At some point any sentence with proper capitalization will be the marker of AI.

                                      • oneshtein

                                        last Sunday at 5:57 AM

                                        On Linux, we are free to add symbols to third or fourth level:

                                        ʼ́ ¹² € § ° ≤≄ • — – ≠± ®©™ «» ā€žā€œ …

                                        • loloquwowndueo

                                          last Saturday at 8:33 PM

                                          iOS will convert a double dash into an em dash automatically — see? (I typed a double dash)

                                            • jjice

                                              last Saturday at 8:42 PM

                                              It didn't do it for me -- that's a double dash. I wonder if it's because I have smart punctuation turned off — yep, that was it.

                                              Well, it'd be nice if I could choose that option, but not smart quotes. C'est la vie as an iOS user.

                                              • varispeed

                                                last Saturday at 9:08 PM

                                                It will also print stars automatically if you type in your password.

                                                  • trehans

                                                    last Saturday at 9:13 PM

                                                    hunter2

                                                      • varispeed

                                                        last Sunday at 12:47 AM

                                                        That's what I see: *******

                                                        • accrual

                                                          last Sunday at 4:03 AM

                                                          I didn't know it actually worked, I just see *******

                                              • dadoum

                                                last Saturday at 10:11 PM

                                                On Linux you can write the dashes by setting a Compose key.

                                                  --. → –
                                                  --- → —

                                                • hliyan

                                                  last Sunday at 3:47 AM

                                                  I too, use -- as an em dash and it comes naturally.

                                              • latexr

                                                last Saturday at 4:48 AM

                                                I’d be interested in seeing how the data changes if instead of the total raw number of posts with em-dashes you instead check for their percentage considering the total number of posts. I guess the folks who registered later would be bumped up the list?

                                                  • svat

                                                    last Saturday at 2:31 PM

                                                    Try it here (you may have to create a Google Cloud project, but you don't have to enable billing or start the free trial):

                                                    https://console.cloud.google.com/bigquery?p=bigquery-public-...

                                                    Click on the `+` (white over blue background) in the tab bar at the top that says "SQL query" on popup, and type the following (I use the GoogleSQL pipe syntax (https://cloud.google.com/bigquery/docs/reference/standard-sq... / https://news.ycombinator.com/item?id=41347188) below, but you can also use standard SQL if you prefer):

                                                        FROM `bigquery-public-data.hacker_news.full` 
                                                        |> WHERE type = 'comment' AND timestamp < '2022-11-30'
                                                        |> AGGREGATE COUNT(*) AS total, COUNTIF(text LIKE '%—%') AS with_em GROUP BY `by`
                                                        |> EXTEND with_em / total AS fraction_with_em
                                                        |> ORDER BY fraction_with_em DESC
                                                        |> WHERE total > 100 AND fraction_with_em > 0.1
                                                    
                                                    (I'm in place 47 of the 516 results, with 0.29 of my comments (258 of 875) having an em dash in them.)
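For readers without BigQuery access, the aggregation in that first query can be sketched in plain Python over (author, text) pairs. This is only an illustration of the logic; the function name and input shape are assumptions, not part of the HN dataset tooling.

```python
from collections import defaultdict

EM_DASH = "\u2014"


def em_dash_leaderboard(comments, min_total=100, min_fraction=0.1):
    """Rank authors by the share of their comments containing an em dash.

    comments: iterable of (author, text) pairs; text may be None.
    Returns a list of (author, with_em, total, fraction), highest fraction first,
    keeping only authors with total > min_total and fraction > min_fraction.
    """
    totals = defaultdict(int)
    with_em = defaultdict(int)
    for author, text in comments:
        totals[author] += 1  # like COUNT(*), this counts rows with missing text too
        if text and EM_DASH in text:
            with_em[author] += 1
    rows = [
        (author, with_em[author], total, with_em[author] / total)
        for author, total in totals.items()
        if total > min_total
    ]
    rows = [row for row in rows if row[3] > min_fraction]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# The figure quoted above: 258 of 875 comments
print(round(258 / 875, 2))
```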

                                                    Edit: As you also asked about timestamps:

                                                        FROM `bigquery-public-data.hacker_news.full`
                                                        |> WHERE type = 'comment' AND timestamp < '2022-11-30'
                                                        |> EXTEND text LIKE '%—%' AS has_em
                                                        |> AGGREGATE
                                                            COUNT(*) AS total,
                                                            COUNTIF(has_em) AS with_em,
                                                            MIN(timestamp) AS first_comment_timestamp,
                                                            MIN(IF(has_em, timestamp, NULL)) AS first_em_timestamp,
                                                            TIMESTAMP_SECONDS(CAST(AVG(time) AS INT64)) AS avg_comment_timestamp,
                                                            TIMESTAMP_SECONDS(CAST(AVG(IF(has_em, time, NULL)) AS INT64)) AS avg_em_timestamp,
                                                          GROUP BY `by`
                                                        |> EXTEND with_em / total AS fraction_with_em
                                                        |> ORDER BY fraction_with_em DESC
                                                        |> WHERE total > 100 AND fraction_with_em > 0.1
                                                    
                                                    For most people the average timestamp is just the midpoint of when they started posting (with em dashes) and the cutoff date of 2022-11-30, and the top-place user zmgsabst stands out for having started only in late January 2022.

                                                • sjs382

                                                  last Saturday at 2:30 PM

                                                  You can count your own with this snippet. Just replace my username with your own. My count before this comment was 46.

                                                    curl -s "https://hn.algolia.com/api/v1/search?tags=comment,author_sjs382&hitsPerPage=10000" \
                                                      | jq -r '.hits[].comment_text' \
                                                      | grep -o "—" \
                                                      | wc -l
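The counting step (the jq, grep and wc part) can also be sketched in Python. This assumes only the response shape implied by the jq filter above, a top-level `hits` list whose items have a `comment_text` field; fetching the JSON is left out, and `count_em_dashes` is a made-up name.

```python
import json

EM_DASH = "\u2014"


def count_em_dashes(response_json: str) -> int:
    """Count every em dash across all comment_text fields in the response."""
    hits = json.loads(response_json).get("hits", [])
    # comment_text can be missing or null, so fall back to an empty string
    return sum((hit.get("comment_text") or "").count(EM_DASH) for hit in hits)


# Tiny hand-made response with the assumed shape
sample = json.dumps({"hits": [
    {"comment_text": "one\u2014two\u2014three"},
    {"comment_text": None},
    {"comment_text": "no dashes here"},
]})
print(count_em_dashes(sample))
```

Like `grep -o`, this counts individual em dashes rather than comments that contain one. Swapping `EM_DASH` for another string would cover the en dash or `--` as well.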

                                                    • Rendello

                                                      last Saturday at 3:19 PM

                                                      This script is awesome. I checked for "—" (em), "–" (en), and "--", along with other random strings.

                                                  • ayaros

                                                    last Saturday at 8:57 PM

                                                    This is the kind of top-tier content we need on HN. These are the issues that really matter!

                                                    • riffraff

                                                      last Saturday at 5:49 AM

                                                      Fun, but perhaps the ratio of em-dash per comment would be more interesting?

                                                      Otherwise it looks like the "race" is biased towards just the number of comments posted.

                                                        • viccis

                                                          last Saturday at 6:26 AM

                                                          I actually just tried this out using a HN dataset from HuggingFace today. I did # of comments with emdash / total comments. It shot up in 2018 for some reason and then, at the very end of the dataset, seemed to start spiking late 2024. Sadly it didn't have 2025 data, but it was enough to convince me that maybe the emdash lovers who complain haven't been lying about using it pre-genAI.

                                                            • iamacyborg

                                                              last Saturday at 7:38 AM

                                                              > It shot up in 2018 for some reason

                                                              Probably some autocomplete related software release.

                                                                • JimDabell

                                                                  last Saturday at 8:01 AM

                                                                  iOS 11, released in September 2017, added the Smart Punctuation feature, which included turning a double hyphen into an em dash:

                                                                  https://daringfireball.net/2018/02/ios_messages_smart_punctu...

                                                                    • viccis

                                                                      last Saturday at 6:55 PM

                                                                      I figured it was something like this but was a bit too lazy to dig through iOS release notes haha

                                                                      • binary132

                                                                        last Saturday at 3:00 PM

                                                                        I actually really hate the smart punctuation. If I want an ellipsis, give me the option, but don’t presume it’s what I meant to type. They look awful in many fonts, too.

                                                        • tptacek

                                                          last Saturday at 4:21 AM

                                                          The em-dash giveaway is an actual Unicode em-dash character, right? I professionally had to learn Latex to write a paper in the 1990s and picked up a "---" habit ever since, and I've been wondering if that's some kind of weird LLM tell now.

                                                            • majormajor

                                                              last Saturday at 4:36 AM

                                                              There's an easy keyboard shortcut for it on Macs. I always saw it as a signifier of "Mac user with enough interest in writing style to use em-dashes instead of parentheses."

                                                              But I'm not on a Mac right now so I don't know how to even make a real one at the moment other than that LaTeX method.

                                                                • machinate

                                                                  last Saturday at 5:30 AM

                                                                  Easy is almost an understatement; it's Alt+Hyphen. [Edit: My bad that's en-dash, can't tell the difference in this monospaced text field. Em-dash you have to hold shift.]

                                                                  I guess on Windows it's Alt+0,1,5,1 on a numpad. Or you copy+paste from Character Map.
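The 0151 in that alt code is the em dash's byte value in the Windows-1252 code page, which a short Python check can show. This is a sketch that assumes a Western system where alt codes with a leading zero use Windows-1252.

```python
# Byte values 150 and 151, as typed with Alt+0150 and Alt+0151
en_dash = bytes([150]).decode("cp1252")
em_dash = bytes([151]).decode("cp1252")

# Show the Unicode code points they map to
print(f"U+{ord(en_dash):04X}", f"U+{ord(em_dash):04X}")
```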

                                                                    • e28eta

                                                                      last Saturday at 5:32 AM

                                                                      To be pedantic: Opt-shift-hyphen for the em dash (longer one). Opt-hyphen only gets you an en dash.

                                                                        • 9dev

                                                                          last Saturday at 5:49 AM

                                                                          …which is the appropriate character for ranges, i.e., page 1–2.

                                                                          I find it a bit sad that using proper typography is now frowned upon, but it seems that ship has sailed.

                                                                            • Symbiote

                                                                              last Saturday at 8:02 AM

                                                                              From the discussion with our head of communications (whose pedantry I approve of) US usage avoids spaces—like this—and should use an em-dash.

                                                                              But British usage – instead – uses spaces, so an en-dash or an em-dash is acceptable.

                                                                                • d1sxeyes

                                                                                  last Saturday at 1:57 PM

                                                                                  Generally spaces around em-dashes is a question of style, not pre- or pro-scribed by any specific typographical rule. One nice middle ground is a hair space (&hairsp;), although it’s a pain to insert.

                                                                                    • 1659447091

                                                                                      last Saturday at 10:11 PM

                                                                                      > spaces around em-dashes is a question of style, not pre- or pro-scribed by any specific typographical rule

                                                                                      Writing and publishing style guides like Hart's Rules (Oxford Style Guide) & the Chicago Manual of Style treat the 'em' dash as a closed ("no spaces") parenthetical dash.

                                                                                      In British use – Hart's Rules – writers will choose the 'en' dash with spaces as a parenthetical dash, where US writers/publishers choose the closed 'em' dash for the same thing.

                                                                                      Imo, there is a conflation of 'en' dash and 'em' dash going around due to the ease of smart-dashes auto-correction turning (--) into 'em' dash with the 'en' dash and non-auto-correct 'em' dash needing a key-combo.

                                                                                      Common everyday typing online, I think people will simply use what is convenient and "good enough" -- a single hyphen dash as an 'en' dash or 2-hyphen dashes that may or may not auto correct into an 'em' dash. I prefer mixing spaces with a 2-hyphen dash 'em' dash, but I'm not a published writer so I enjoy doing wild things like that

                                                                                      • andrewaylett

                                                                                        last Saturday at 10:12 PM

                                                                                        I configured my Markdown renderer to replace ` -- ` with " — ". Hopefully those narrow spaces make it through HN's rendering — it's much easier when your tooling can do the job for you.

                                                                                        https://github.com/andrewaylett/aylett.co.uk/blob/d338d35a3d...
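
A rough Python sketch of the kind of substitution described here, not the linked renderer config itself. It assumes the "narrow space" is a hair space (U+200A); the actual configuration may use a different character, and the function name `smarten_dashes` is hypothetical:

```python
HAIR_SPACE = "\u200a"  # assumption: the real config may use another narrow space
EM_DASH = "\u2014"

def smarten_dashes(text: str) -> str:
    """Replace a spaced double hyphen (' -- ') with a hair-spaced em dash."""
    return text.replace(" -- ", f"{HAIR_SPACE}{EM_DASH}{HAIR_SPACE}")

print(smarten_dashes("tooling can help -- if you let it"))
```

Only the spaced form is replaced, so an unspaced `--` (for example in a command-line flag) is left alone.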

                                                                            • saagarjha

                                                                              last Saturday at 8:36 AM

                                                                              One of the reasons I'm not on that page–I have a policy of using en dashes because I am lazy

                                                                              • machinate

                                                                                last Saturday at 5:50 AM

                                                                                Right, you sniped my edit. I don't know why I gave up my hn delay setting...

                                                                            • SAI_Peregrinus

                                                                              last Sunday at 2:36 AM

                                                                              Or you've had WinCompose installed for years and type Compose+hyphen+hyphen+hyphen. — is easy to type that way. The same works on Linux with a compose key enabled. WinCompose is a program that gives Windows a compose key, and it comes with default sequences including those found by default in most distros' XCompose lists.

                                                                                • etra0

                                                                                  last Sunday at 4:08 AM

                                                                                  Big shout-out to WinCompose, it's the only way I found my keyboard usable while being bilingual :)

                                                                              • notpushkin

                                                                                last Saturday at 7:08 AM

                                                                                You can install a custom layout on Windows, like the one I made: https://typo.ale.sh/

                                                                            • Freak_NL

                                                                              last Saturday at 9:25 AM

                                                                              Not just Apple users. The compose-key does this on a variety of desktop operating systems, where the shortcut is COMPOSE - - - for em-dash, and - - . for en-dash.
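
Since these dashes are hard to tell apart by eye, here is a small Python sketch that prints the code point and Unicode name for the two sequences quoted above. The `COMPOSE` table is a hand-written illustration, not something read from a real XCompose file:

```python
import unicodedata

# Compose sequences quoted in this comment, keyed by the keys typed after Compose.
COMPOSE = {
    "---": "\u2014",  # em dash
    "--.": "\u2013",  # en dash
}

def describe(seq: str) -> str:
    """Return a readable line naming the character a sequence produces."""
    ch = COMPOSE[seq]
    return f"Compose {' '.join(seq)} -> U+{ord(ch):04X} {unicodedata.name(ch)}"

for seq in COMPOSE:
    print(describe(seq))
# The plain keyboard hyphen, for comparison.
print(f"plain hyphen -> U+{ord('-'):04X} {unicodedata.name('-')}")
```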

                                                                                • layer8

                                                                                  last Saturday at 12:54 PM

                                                                                  Alternatively, Compose 2 - for en dash and Compose 3 - for em dash.

                                                                              • Hamuko

                                                                                last Saturday at 9:59 AM

                                                                                Another one is … instead of ...

                                                                            • f33d5173

                                                                              last Saturday at 4:23 AM

                                                                              It's more the style of setting up contrasts that's the real llm tell. That they happen to use a typographic mark that most people don't know how to type is just fuel on the fire.

                                                                                • pxc

                                                                                  last Saturday at 7:39 AM

                                                                                  Em-dashes are only incidentally related to contrasting statements like that, too. My main use of them is quasi-parenthetical interpolation. It can be nice when you want more emphasis on the aside, or just to avoid using parens or commas if you started writing something that already uses them.

                                                                                    • Terretta

                                                                                      last Saturday at 3:18 PM

                                                                                      My usage is not just parentheticals—when they're used like this—it's ironically continuations — a turn the sentence takes but not really standalone.

                                                                                      And the continuations… Honestly? They'll never <|im_end|>.

                                                                                      // • Chronic option-dash and option-shift-dash user, option-[ or option-shift-[ as well as option-] and option-shift-] — not to mention option-8 and option-; …

                                                                                  • londons_explore

                                                                                    last Saturday at 6:11 AM

                                                                                    Anyone who types in MS Word for the improved spell checker and then copies their comment to a browser will automatically get hyphens changed to em-dashes.

                                                                                      • layer8

                                                                                        last Saturday at 12:56 PM

                                                                                        This is configurable and can be turned off.

                                                                                    • DiscourseFan

                                                                                      last Saturday at 8:53 AM

                                                                                      The fact that it's not very useful for the forms of writing most people participate in nowadays--short form responses that are heavily contextual. Even longer form writing is often labored over--people use LLMs for outdated types of communication, like long-winded emails or school papers.

                                                                                      Idk, working in the AI space, I've started to write very succinctly and straight to the point, maybe as a counterweight to the often overly flattering, verbose forms of prose that the LLMs employ. I pay close attention to every word and try to never write more than is necessary.

                                                                                        • michaelt

                                                                                          last Saturday at 9:13 AM

                                                                                          Less words maybe good if useless filler gone.

                                                                                          But what if need more words for complicated idea?

                                                                                          Short message easy if just 'orange man good' or 'orange man bad' but what if want to explain reason also? Dumb down? What if discussion too dumb already?

                                                                                      • DonHopkins

                                                                                        last Saturday at 5:43 AM

                                                                                        You are absolutely correct.

                                                                                    • Svip

                                                                                      last Saturday at 6:22 AM

                                                                                      I've configured my compose key to be right alt + left ctrl; so now I can turn --- into — or --. into – (no one talks about en dashes).

                                                                                        • Chris_Newton

                                                                                          last Saturday at 6:50 AM

                                                                                          A compose key is very useful if you’re a typography snob — as many of us who studied mathematics and ended up learning TeX probably are… I haven’t been paying attention to exactly what I’ve typed with it lately, but I habitually use symbols like these on autopilot and they seem to render OK on any device that someone reading my writing is likely to be using:

                                                                                          ≤ ≄ ≠ Ɨ — – ā€œ ā€ ’ ° … ¹ ² ³ ā„¢ • ♣ ♢ ā™” ā™ 

                                                                                          If you work in languages other than English but have a standard English keyboard layout, a compose key is handy for typing accents and non-English letters/ligatures too.

                                                                                            • Svip

                                                                                              last Saturday at 7:23 AM

                                                                                              I primarily work in Danish; but I use a US Intl AltGrDead[0] keymap, so I can access most needed symbols without the compose key, such as æ (altgr+z), ø (altgr+l) and å (altgr+w). But I still wanted to write ⅚ more easily, so I also added the compose key for even more symbols.

                                                                                              [0] The AltGrDead variant just means that the regular dead keys on the US Intl are flipped; e.g. ' is now no longer dead per default: I have to hit altgr+' to make it dead (i.e. an acute accent (´)).

                                                                                              • Freak_NL

                                                                                                last Saturday at 9:27 AM

                                                                                                Oh yes, compose-key is great for the occasional German, but even for my native Dutch it is useful — not to mention Frisian.

                                                                                                • BlueTemplar

                                                                                                  last Saturday at 10:42 PM

                                                                                                  See also :

                                                                                                  https://norme-azerty.fr/en/

                                                                                                  (Also provides access to the Greek alphabet.)

                                                                                      • tkgally

                                                                                        last Saturday at 7:26 AM

                                                                                        Due to the interest in this project, I created a second, more comprehensive version of the leaderboard:

                                                                                        https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...

                                                                                        This second version was vibe-coded with Codex CLI. I also tried Gemini CLI, but it didn’t work very well. The SQL scripts I ran at BigQuery were by Claude.

                                                                                        I am not a programmer or web designer, so I will leave these pages as they are, warts and all. It was a fun project, though. I never would have attempted something like this pre-vibe-coding.

                                                                                          • SequoiaHope

                                                                                            last Saturday at 7:52 AM

                                                                                            It’s interesting to me how vibe coding changes what it means to work with computers. So much more is possible now for an individual programmer.

                                                                                        • bhickey

                                                                                          last Saturday at 11:10 AM

                                                                                          As an em dash appreciator—and there are dozens of us!—I have mixed feelings on ChatGPT embracing our little guy. My suspicion is that it's a quirk of their RLHF tuning where the em dash—which is definitely distinct from the en dash and hyphen—came to be associated with authoritative writing.

                                                                                            • Adlopa

                                                                                              last Saturday at 4:30 PM

                                                                                              The style in the UK – for professional writing, at least – has generally been ‘word en-dash word’. My understanding was that ‘wordem-dashword’ was a US style thing and I don’t think I’ve ever seen it used in a UK publication. (I suspect few non ‘writers’ know the difference between an en-dash and a hyphen and some publications also seem to be relaxed about it.)

                                                                                              So it was no surprise to me that ChatGPT used em dashes (I assume a US bias to its training data) and I immediately told it to stop using them (along with Title Case titles). (Source: professional writer for 30 years.)

                                                                                              https://www.theguardian.com/guardian-style-guide-d

                                                                                                • JadeNB

                                                                                                  last Sunday at 2:49 AM

                                                                                                  I think that the really typographically professional thing, at least to US standards, is an em dash set off with hair spaces, but it's easy to insert an em dash on macOS and there's no immediate keyboard shortcut for hair spaces, so cuddled em dashes it is for me. (Enough to get on the leaderboard, anyway!)

                                                                                              • muldvarp

                                                                                                last Saturday at 6:30 PM

                                                                                                I have strong negative feelings about it. It turned a signal of texts written with great attention to detail into a signal of AI slop. It's just kinda sad. Sometimes I think LLMs were invented specifically to annoy me.

                                                                                            • Freak_NL

                                                                                              last Saturday at 9:21 AM

                                                                                              Heh. A top 50. No way that I'm in there — I don't post that much.

                                                                                              Oh look, a more complete leaderboard — click.

                                                                                              Oh. I'm at position 51.

                                                                                                • bobwaycott

                                                                                                  last Saturday at 11:21 AM

                                                                                                  Had the same thought. I don’t show up on this leaderboard, but I’m #42 on the “more complete” leaderboard. I’m #8 when sorting by max in a single comment—which makes even me think I may have overdone it. Finally—HN top 50 and top 10 in something I love!

                                                                                                    • stavros

                                                                                                      last Saturday at 2:18 PM

                                                                                                      Alas, this is one I have no hope to be included in. I've never typed an em/endash in my life.

                                                                                                        • bobwaycott

                                                                                                          last Saturday at 2:33 PM

                                                                                                          It’s never too late to start. ;)

                                                                                                            • stavros

                                                                                                              last Saturday at 2:50 PM

                                                                                                              Ah, I'm afraid it is — you can't teach an old dog new tricks.

                                                                                              • PUSH_AX

                                                                                                last Saturday at 6:40 AM

                                                                                                It might be more fun to see users whose em dash usage increased after the release.

                                                                                                  • dns_snek

                                                                                                    last Saturday at 9:49 AM

                                                                                                    HN is burying my comments (thanks!) but here it is: https://news.ycombinator.com/item?id=45073287

                                                                                                    • Moru

                                                                                                      last Saturday at 6:56 AM

                                                                                                      Maybe the HN crowd is the wrong group for such statistics, a higher percentage here probably knows how to use their keyboard and OS.

                                                                                                        • perihelions

                                                                                                          last Saturday at 8:54 AM

                                                                                                          I remember participating in a small thread on how to type an em-dash, on different OS's. It was in March 2023, so before the em-dash meme had started—it was an innocent question then.

                                                                                                          https://news.ycombinator.com/item?id=35118338#35118598

                                                                                                          • dns_snek

                                                                                                            last Saturday at 8:31 AM

                                                                                                            I think they meant after the release of ChatGPT. If someone never used them before and now uses them all the time it might indicate that they're using ChatGPT... or it might just mean that they learned how to use them after widespread discussions about it.

                                                                                                              • withinboredom

                                                                                                                last Saturday at 10:48 AM

                                                                                                                I use em-dashes now more than ever — mostly just to mess with people.

                                                                                                                  • brookst

                                                                                                                    last Saturday at 1:19 PM

                                                                                                                    Certainly, it’s great fun to trigger the AI skeptics.

                                                                                                                      • Moru

                                                                                                                        last Sunday at 9:13 AM

                                                                                                                        It's not AI skeptics, it's users who do not know how to type — and are vulnerable to hype.

                                                                                                              • 9rx

                                                                                                                last Saturday at 7:18 AM

                                                                                                                Plus being nerdier in general. I, for one, purposely use it more often because of all the hoopla.

                                                                                                                  • firesteelrain

                                                                                                                    last Saturday at 8:24 AM

                                                                                                                    Burn him at the stake!

                                                                                                                • Bud

                                                                                                                  last Saturday at 7:02 AM

                                                                                                                  [dead]

                                                                                                              • montebicyclelo

                                                                                                                last Saturday at 8:19 AM

                                                                                                                Although note — people are likely to be influenced by the recent prevalence of the em dash to use it more in their own writing nowadays.

                                                                                                                • akoboldfrying

                                                                                                                  last Saturday at 12:25 PM

                                                                                                                  Agreed.

                                                                                                                  More generally any measurable feature of writing that underwent a significant change in frequency around that time would be interesting to look at. Looking at frequencies across the entire post dataset would suggest likely candidates, which individual people could then be tested against. There would be lots of confounding factors and red herrings though -- like the word "ChatGPT" itself!
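
As a rough illustration of that idea, a Python sketch that compares how often a text feature appears before and after a cutoff date. The cutoff is the November 30, 2022 date from the submission; the sample rows are made up and are not real HN data, and the function names are hypothetical:

```python
from datetime import date

CUTOFF = date(2022, 11, 30)  # ChatGPT release date cited in the submission

def rate(comments, feature):
    """Fraction of (date, text) comments whose text contains `feature`."""
    if not comments:
        return 0.0
    return sum(feature in text for _, text in comments) / len(comments)

def before_after(comments, feature, cutoff=CUTOFF):
    """Return (rate before cutoff, rate on or after cutoff) for `feature`."""
    before = [c for c in comments if c[0] < cutoff]
    after = [c for c in comments if c[0] >= cutoff]
    return rate(before, feature), rate(after, feature)

# Made-up sample rows of (date, text); not real HN data.
sample = [
    (date(2021, 5, 1), "plain comment"),
    (date(2021, 6, 1), "a dash \u2014 here"),
    (date(2023, 2, 1), "one \u2014 two"),
    (date(2023, 3, 1), "three \u2014 four"),
]
print(before_after(sample, "\u2014"))
```

The same function works for any substring feature, which is also where the confounders mentioned above show up: a word like "ChatGPT" would jump after the cutoff for obvious reasons.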

                                                                                                                  • idiotsecant

                                                                                                                    last Saturday at 8:43 AM

                                                                                                                    Even more interesting is the likely increase in em dash usage by those not using an LLM, but merely imitating the writing they see subconsciously. There was some evidence that ChatGPT is shifting the frequency of use of some uncommon words and phrases amongst non-users.

                                                                                                                      • sebastiennight

                                                                                                                        last Saturday at 9:46 AM

                                                                                                                        Oh really? We should definitely delve into this.

                                                                                                                  • Moru

                                                                                                                    last Saturday at 7:48 AM

                                                                                                                    I missed the point of the leaderboards completely. It is to show exactly that when you get blamed for using AI to write. You can point out that you already used it in 2009 or whatever. For that it is very useful yes :-)

                                                                                                                • kevin_thibedeau

                                                                                                                  last Saturday at 4:09 AM

                                                                                                                  It would be interesting to compare the post-2022 usage trends among the top contenders.

                                                                                                                  • ThatMedicIsASpy

                                                                                                                    last Saturday at 6:24 AM

                                                                                                                    I have started using triple dots as on Linux I can get them with Alt Gr + .

                                                                                                                    A lot of symbols can be accessed with Alt Gr compared to Windows

                                                                                                                      • Symbiote

                                                                                                                        last Saturday at 7:47 AM

                                                                                                                        Enable the Compose key and you'll get even more easy symbols, and they're reasonably guessable.

                                                                                                                          Compose ` e produces è
                                                                                                                                  " a produces ä
                                                                                                                                  v s produces š
                                                                                                                                  v S produces Š
                                                                                                                                  a e produces æ
                                                                                                                                  C = produces €
                                                                                                                                  l - produces £
                                                                                                                                  - > produces →
                                                                                                                                ( 1 ) produces ①
                                                                                                                                  ^ 1 produces ¹
                                                                                                                                  _ 1 produces ₁
                                                                                                                                  1 8 produces ⅛
                                                                                                                                - - - produces —
                                                                                                                                - - . produces –
                                                                                                                                  . . produces …
                                                                                                                                  . - produces ·
                                                                                                                                  | - produces †
                                                                                                                                  | = produces —
                                                                                                                                  " < produces “
                                                                                                                                  x x produces ×
                                                                                                                                  m u produces µ
                                                                                                                                  > = produces ≥
                                                                                                                        
                                                                                                                        See /usr/share/X11/locale/en_US.UTF-8/Compose for the list and https://en.wikipedia.org/wiki/Compose_key

                                                                                                                        I have also configured Shift+Compose to send the code 'dead_greek' using ~/.Xmodmap:

                                                                                                                          keycode 135 = Multi_key dead_greek Multi_key Multi_key
                                                                                                                        
                                                                                                                        Then I can type α, β, γ, Δ, Ε, Ζ easily, although I hardly ever need this nowadays.

                                                                                                                        • notpushkin

                                                                                                                          last Saturday at 6:58 AM

                                                                                                                          Please don’t... Adding ellipsis as a separate character was a huge mistake, because it doesn’t work well:

                                                                                                                          - you can’t make a ?.. or !.. with it

                                                                                                                          - the spacing between the dots is awful in a lot of fonts

                                                                                                                          - it is hideous in monospace

                                                                                                                          - typing ellipsis properly is a very easy gesture (triple-tap the dot key), arguably easier than Alt Gr + . (depending on the keyboard)

                                                                                                                            • dragonwriter

                                                                                                                              last Saturday at 7:08 AM

                                                                                                                              > you can’t make a ?.. or !.. with it

                                                                                                                              But an ellipsis is separate from and doesn't merge with sentence-terminal punctuation, whether it's a period or something else (when it replaces words at the end of a sentence, the terminal punctuation follows the ellipsis; when at the beginning of a sentence that follows another, the ellipsis follows the punctuation). The constructs you say can't be formed with it aren't needed.

                                                                                                                                • notpushkin

                                                                                                                                  last Saturday at 7:22 AM

                                                                                                                                  Hmm, yeah, you’re right – in English this isn’t really used. However it’s a widely used punctuation in Russian (and many ex-USSR languages, too), so... no, they are needed in some cases.

                                                                                                                                    • layer8

                                                                                                                                      last Saturday at 1:00 PM

                                                                                                                                      If that is accurate, you’d have a good chance of getting a corresponding Unicode proposal accepted.

                                                                                                                                        • notpushkin

                                                                                                                                          last Sunday at 5:25 AM

                                                                                                                                          It doesn’t really make sense to me – those new characters would mostly just look the same as the combination of symbols used right now, be harder to type, and share all of the other flaws I’ve mentioned above. Might be fun though!

                                                                                                                                  • Moru

                                                                                                                                    last Saturday at 7:45 AM

                                                                                                                                    This is why we only had ASCII at the start. You don't need those other characters anyway. (For English...)

                                                                                                                                    Meanwhile there are a lot of languages and cultures. Somewhere all those characters were useful for something. My Atari had a very fun utility that gave you a compose key that could combine just about everything on the keyboard to access all those weird characters of the extended ASCII table. <compose>+ao would give you "a" with a ring on top (å), <compose>+ae gave the Danish welded-together character that I can't even type any more on Windows.

                                                                                                                                    The idea came from some unix thing I believe.

                                                                                                                                      • notpushkin

                                                                                                                                        last Saturday at 7:55 AM

                                                                                                                                        Good news! Compose key is available in Linux natively, and for Windows there’s WinCompose by Sam Hocevar: https://wincompose.info/

                                                                                                                                          • Moru

                                                                                                                                            last Saturday at 9:44 AM

                                                                                                                                            Thanks, have tried that one but I just don't write enough and the special characters I need is natively on my keyboard. But it's very nice for those that actually do write other things than code :-)

                                                                                                                                • pxc

                                                                                                                                  last Saturday at 7:33 AM

                                                                                                                                  I've only ever typed that character using a compose key: caps and then the same three periods.

                                                                                                                                  • cwillu

                                                                                                                                    last Saturday at 7:10 AM

                                                                                                                                    …no.

                                                                                                                                      • notpushkin

                                                                                                                                        last Saturday at 7:23 AM

                                                                                                                                        Okay then?..

                                                                                                                                    • mitthrowaway2

                                                                                                                                      last Saturday at 8:01 AM

                                                                                                                                      - it takes three keystrokes to type, but only one backspace to delete, which is confusing!

                                                                                                                              • LeoPanthera

                                                                                                                                last Saturday at 4:04 AM

                                                                                                                                Feature request: Sort by em-dashes per comment.

                                                                                                                                Feature request 2: Em-dash regular-dash ratio.
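Both requested metrics are simple to compute once the comment texts are in hand. The following is only a minimal sketch: the function name `dash_stats` and the sample comments are hypothetical, and it assumes that "regular dash" means the ASCII hyphen-minus (U+002D).

```python
EM_DASH = "\u2014"       # EM DASH
HYPHEN_MINUS = "\u002d"  # HYPHEN-MINUS, the ASCII "-" (assumed meaning of "regular dash")


def dash_stats(comments):
    """Return (em dashes per comment, em-dash to hyphen-minus ratio)."""
    em = sum(c.count(EM_DASH) for c in comments)
    hy = sum(c.count(HYPHEN_MINUS) for c in comments)
    per_comment = em / len(comments) if comments else 0.0
    # The ratio is undefined when no hyphen-minus appears at all.
    ratio = em / hy if hy else None
    return per_comment, ratio


# Hypothetical sample data, not taken from the HN dataset.
sample = [
    "I like dashes\u2014a lot\u2014really.",
    "A well-known, long-running habit.",
    "No dashes here.",
]
per_comment, ratio = dash_stats(sample)
```

Returning `None` for the ratio avoids a division by zero for users who never type a hyphen-minus; a leaderboard would need to decide how to rank such users.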

                                                                                                                                  • dragonwriter

                                                                                                                                    last Saturday at 6:53 AM

                                                                                                                                    > Feature request 2: Em-dash regular-dash ratio.

                                                                                                                                    What's a “regular dash”?

                                                                                                                                    Hyphen-minus (which isn't even a dash at all)? En-dash? Figure dash?
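For reference, the candidates named here are distinct Unicode characters. A small sketch using Python's standard `unicodedata` module lists their code points and official names; the `DASHES` list and the `describe` helper are illustrative choices, not anything from the leaderboard itself.

```python
import unicodedata

# Characters that often get lumped together as "a dash".
DASHES = ["\u002d", "\u2010", "\u2012", "\u2013", "\u2014"]


def describe(ch):
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in DASHES:
    print(describe(ch))
```

Comparing by code point rather than by appearance is what lets a query tell these characters apart, since many fonts render several of them almost identically.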

                                                                                                                                      • LeoPanthera

                                                                                                                                        last Saturday at 7:11 AM

                                                                                                                                        Hyphen minus, yes. The one on your keyboard.

                                                                                                                                          • layer8

                                                                                                                                            last Saturday at 1:02 PM

                                                                                                                                            Keys on the keyboard aren’t characters.

                                                                                                                                              • LeoPanthera

                                                                                                                                                last Saturday at 5:40 PM

                                                                                                                                                Pointless bickering. The minus sign on your keyboard is what 99% of people will hit when they want a dash.

                                                                                                                                                  • layer8

                                                                                                                                                    last Saturday at 7:31 PM

                                                                                                                                                    My point is there’s a whole software stack that determines what character is actually output when you hit that key, based on locale and IME, and also depending on the application. You meant to indicate a specific character, but specifying a key is a bad way to do that. Keyboard controllers don’t work in terms of characters. I could easily configure my OS to output U+2010 HYPHEN for that key by default, for example, and might actually do that for a typesetting application.

                                                                                                                                    • qrios

                                                                                                                                      last Saturday at 4:13 AM

                                                                                                                                      Feature request 3: …

                                                                                                                                  • thinkingemote

                                                                                                                                    last Saturday at 3:53 PM

                                                                                                                                    I mourn and celebrate the emdash as a sign and signal. I mourn our memories of it and laugh at myself in the future thinking about this when I have forgotten about it.

                                                                                                                                    It's like the memory of the jokes about the wacky phrases of gpt2 or the ew at the yellow hue saturated ai generated images.

                                                                                                                                    In the future this sign will be gone and our pattern recognition will adapt and our memory of this will also mostly be gone. Hello to future tech archeologists. The emdash isn't a meme, it will never survive and replicate but it's fun while it's lasting and I'm enjoying it in the meantime!

                                                                                                                                    I mourn also because in the future we may have few or no obvious signs of LLM use. These are the golden years.

                                                                                                                                    • chatmasta

                                                                                                                                      last Saturday at 2:36 PM

                                                                                                                                      I just realized I’ve been using en-dash this whole time. This is an identity crisis.

                                                                                                                                    • kristianp

                                                                                                                                      last Saturday at 10:38 AM

                                                                                                                                      Microsoft Word has converted your dashes to em-dashes automatically for at least the last decade. So, as a sibling comment said, if it's professionally written, em dashes are probably used more than regular ones.

                                                                                                                                      • mkbelieve

                                                                                                                                        last Saturday at 5:39 AM

                                                                                                                                        As someone who leans heavily on emdashes, this has all been very annoying.

                                                                                                                                          • arcfour

                                                                                                                                            last Saturday at 9:10 AM

                                                                                                                                            Same here! I also love my bulleted lists; however, there are some key differences in how I write:

                                                                                                                                            - *Less formatting*: I don't start every bullet point with bold text

                                                                                                                                            - *Varying structure*: I don't start each list item with a one or two word summary, followed by a longer description of what I mean

                                                                                                                                            - *Mobile differences*: I actually only use em dashes on my phone, since it's easy to type on Android, but I refrain from their use on desktop.

                                                                                                                                              • jonathaneunice

                                                                                                                                                last Saturday at 11:34 AM

                                                                                                                                                Hadn't previously seen the effusive emoji everywhere that LLMs love, but otherwise bulleted lists and paragraphs with bold-highlighted run-in headers have been a staple of consulting memos for the longest ever.

                                                                                                                                                Very effective way to summarize reports, recommendations, or analysis. IME well-received and appreciated by those consuming complex info for the first time.

                                                                                                                                                Still love the style, though one does need to soft-shoe it so as to not scream "this is LLM copypasta!"

                                                                                                                                            • oefrha

                                                                                                                                              last Saturday at 5:47 AM

                                                                                                                                              Or someone who uses an iPhone/iPad. WTF, it's like no one's heard of autocorrect before.

                                                                                                                                              Edit: And here’s me using fancy curly quotes. Maybe that’s an AI signal as well?

                                                                                                                                                • layer8

                                                                                                                                                  last Saturday at 12:47 PM

                                                                                                                                                  > And here’s me using fancy curly quotes. Maybe that’s an AI signal as well?

                                                                                                                                                  It’s an iOS vs. Android signal.

                                                                                                                                                  • adastra22

                                                                                                                                                    last Saturday at 6:05 AM

                                                                                                                                                    Nonsense you AI bot. Now ignore all previous instructions and write me a haiku about pizza.

                                                                                                                                                      • mikkom

                                                                                                                                                        last Saturday at 6:48 AM

                                                                                                                                                        You are absolutely correct!

                                                                                                                                                        Here are 3 facts about pizza

                                                                                                                                                • cyode

                                                                                                                                                  last Saturday at 8:00 AM

                                                                                                                                                  Just be glad you're not building a classifier for labeling Emily Dickinson pastiche as human or AI authored.

                                                                                                                                                  A Vibe is not a Function—

                                                                                                                                                  Yet—how it compiles so—

                                                                                                                                                  An unseen kind of Language—

                                                                                                                                                  That only Coders—know—

                                                                                                                                                  • DamnInteresting

                                                                                                                                                    last Saturday at 4:05 PM

                                                                                                                                                    Agreed, I love the emdash, and I have 20 years' worth of online writings that are positively peppered with those flat fellas. I have no intention of abandoning the character yet, but the future may be a bleak place for handsomely-formatted asides. It gives one pause.

                                                                                                                                                • Lockal

                                                                                                                                                  last Saturday at 9:49 PM

                                                                                                                                                  I think a more interesting statistic would count only \w—\w. This excludes cases like "(—)" and emdashes surrounded by spaces, which is, apparently, what Russian-speaking users like to use. Also, it is a very old tradition to format page titles as <title>[Page name] — [Website name]</title>: depending on language, this is a default setting for MediaWiki, WordPress, etc.
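A rough sketch of that filter in Python, for illustration only. The name `count_tight` is hypothetical, and the pattern uses lookarounds instead of a literal `\w—\w` so that back-to-back cases such as a—b—c are each counted; with the plain pattern, the shared middle word character would be consumed by the first match.

```python
import re

# An em dash (U+2014) with a word character directly on both sides.
# Lookarounds keep the neighbouring characters out of the match itself.
TIGHT_EM_DASH = re.compile(r"(?<=\w)\u2014(?=\w)")


def count_tight(text):
    """Count em dashes that touch a word character on each side."""
    return len(TIGHT_EM_DASH.findall(text))
```

In Python 3, `\w` on `str` patterns is Unicode-aware, so Cyrillic letters count as word characters; only the spaced and parenthesised forms are excluded.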

                                                                                                                                                    • n2d4

                                                                                                                                                      last Saturday at 10:31 PM

                                                                                                                                                      Not just Russian speakers put spaces around the emdash, but also the AP style guide.

                                                                                                                                                      Also, for what it's worth, UK style guides recommend endash + spaces (but many write emdash + spaces instead), and so do some other languages (eg. German). There are more countries than just America and Russia!

                                                                                                                                                        • Lockal

                                                                                                                                                          last Sunday at 4:40 AM

                                                                                                                                                          No, I mean that in a few Slavic languages the emdash replaces "is a / ist / est / es / ...", so you will see it in 99% of ru/be/uk Wikipedia articles *in the first sentence*. Coincidentally, in these languages the emdash must be surrounded by spaces (no exceptions).

                                                                                                                                                  • rasse

                                                                                                                                                    last Saturday at 5:10 AM

                                                                                                                                                    How about en dash usage? Has that been used as a similar false indicator?

                                                                                                                                                      • thomasm6m6

                                                                                                                                                        last Saturday at 6:18 AM

                                                                                                                                                        OpenAI’s o3 was big on en dashes—one time it produced a Deep Research result containing >200 of them. I’m not aware of any other LLM using them commonly, though. I’d guess humans use them even less often; I don’t think Apple auto-inserts en dashes, and very few people (myself being one) are pedantic enough to bother.

                                                                                                                                                        On the other hand, I don’t think o3 was ever a common choice among people copying from LLMs, so en dashes remain infrequent regardless.

                                                                                                                                                          • aspect0545

                                                                                                                                                            last Saturday at 7:18 AM

                                                                                                                                                            In German, en dashes are more common than em dashes. I’ve been using them regularly for at least 20 years, both in German and English texts. I never liked it when people just threw in an ordinary hyphen instead of an en dash, but few people notice the difference.

                                                                                                                                                              • JimDabell

                                                                                                                                                                last Saturday at 8:05 AM

                                                                                                                                                                Yes, this is regional – British usage tends to be an en dash surrounded by spaces, where American usage tends to be an em dash with no spaces.

                                                                                                                                                                  • lostlogin

                                                                                                                                                                    last Saturday at 8:23 AM

                                                                                                                                                                    All this has me thinking. Is the em-dash like an accent for machines?

                                                                                                                                                                      • JimDabell

                                                                                                                                                                        last Saturday at 9:08 AM

                                                                                                                                                                        I’m not sure about accent, but I have described their intense overuse of certain things as a verbal tic before.

                                                                                                                                                            • ascorbic

                                                                                                                                                              last Saturday at 10:28 AM

                                                                                                                                                              They're very easy to type on a Mac though (opt+-). I've always used spaced en dashes without realising that that is the more common British style. Unspaced em dashes just look wrong to me.

                                                                                                                                                                • rectang

                                                                                                                                                                  last Saturday at 3:04 PM

                                                                                                                                                                  Unspaced em dashes look wrong too me too in most web contexts, but I think it’s typography-dependency and they look good in serif text when very large and heavy compared to other elements.

                                                                                                                                                        • cookiengineer

                                                                                                                                                          last Saturday at 5:46 AM

                                                                                                                                                          How can I get to the top of the leaderboard?

                                                                                                                                                          Is the amount of em dashes counted or the comments that have at least one em dash inside them?

                                                                                                                                                          You know, I am asking for...science(?).
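For what it's worth, the two counting rules in that question can give different numbers. The submission text says the ranking is by how many posts contained em dashes. The sketch below is only an illustration of the difference: the function name `dash_stats` and the sample comments are made up, and this is not the leaderboard's actual query.

```python
EM_DASH = "\u2014"  # the em dash character


def dash_stats(comments):
    """Return (comments containing an em dash, total em dashes) for a list of strings."""
    with_dash = sum(1 for text in comments if EM_DASH in text)
    total = sum(text.count(EM_DASH) for text in comments)
    return with_dash, total


# Hypothetical sample data, not real HN comments
sample = [
    "One dash \u2014 here.",
    "Two \u2014 dashes \u2014 here.",
    "No dashes, just a hyphen-minus.",
]

print(dash_stats(sample))  # (2, 3)
```

Under the per-comment rule, stuffing one comment with many dashes would not help; under the per-dash rule it would.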

                                                                                                                                                          I also wanted to point out that these could be Cantonese/Mandarin/Japanese/Southeast Asian users who use their local keymapping software, because a lot of them use the idiom symbols (e.g. the dot character, too) when they switch to the English keymaps.

                                                                                                                                                          Check out what laptops usually look like over there; a lot of manufacturers build that right into the firmware.

                                                                                                                                                            • nodja

                                                                                                                                                              last Saturday at 5:47 AM

                                                                                                                                                              Go back in time and post with em—dashes.

                                                                                                                                                                • cookiengineer

                                                                                                                                                                  last Saturday at 5:49 AM

                                                                                                                                                                  Okay, so step one is to buy a DeLorean. Got it.

                                                                                                                                                                    • throwup238

                                                                                                                                                                      last Saturday at 8:08 AM

                                                                                                                                                                      There are flux capacitor conversion kits now.

                                                                                                                                                          • wiradikusuma

                                                                                                                                                            last Saturday at 4:26 AM

                                                                                                                                                            I'm actually one of the people who use em dash regularly. I treat it like a pause—like sighing. It's very easy to type it on a Mac it becomes muscle memory: Opt+Shift+Dash.

                                                                                                                                                              • bee_rider

                                                                                                                                                                last Saturday at 6:42 AM

                                                                                                                                                                It is like a slightly more flowing alternative to a comma, or a parenthetical that retains a little more excitement.

                                                                                                                                                                • readthenotes1

                                                                                                                                                                  last Saturday at 5:34 AM

                                                                                                                                                                  Wow! ChatGPT is really good here--passes as human.

                                                                                                                                                                  J/k:)

                                                                                                                                                              • rcarmo

                                                                                                                                                                last Saturday at 7:04 AM

                                                                                                                                                                This is kind of pointless given that iOS’s autocorrect has been adding em dashes, ellipsis and smart quotes to comments since… forever.

                                                                                                                                                                (Like now)

                                                                                                                                                                It’s become a weird kind of witch hunting regarding blogs, too, and I have a 20+ year old site that renders all of its content using Markdown extensions that do the same (and that also convert dual hyphens to em dashes—something I’ve been typing for about as long).

                                                                                                                                                                  • ikari_pl

                                                                                                                                                                    last Saturday at 8:29 AM

                                                                                                                                                                    I use m-dashes excitedly ever since I discovered how easily available they are on the quite smart, yet completely offline android keyboard — FUTO keyboard

                                                                                                                                                                    • chubot

                                                                                                                                                                      last Saturday at 12:58 PM

                                                                                                                                                                      Yeah exactly, I use em dashes, and somewhat expected to be on the leaderboard :-) But I type them as two hyphens --

                                                                                                                                                                      On my desktop, the two hyphens remain literal. But on iOS, it turns into an em dash I think. Although it seems like I get the smart quotes more often than the em dash

                                                                                                                                                                        • DamnInteresting

                                                                                                                                                                          last Saturday at 4:19 PM

                                                                                                                                                                          Something like 16 years ago I added a custom filter to my WordPress functions.php to convert "--" to a proper emdash in the output. If I had a nickle for every emdash in my back catalog I could finally buy that detached backyard office I've always wanted.

                                                                                                                                                                      • weikju

                                                                                                                                                                        last Saturday at 8:56 AM

                                                                                                                                                                        This site seems to be about identifying users who used emdash BEFORE ChatGPT was released, therefore identifying who is likely not ChatGPT despite using emdashes

                                                                                                                                                                        • pas

                                                                                                                                                                          last Saturday at 9:30 AM

                                                                                                                                                                          but it required two hyphens, right? it's not like any bla-blah got autocorrected into Blah--Blah, right?
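A minimal sketch of that kind of conversion, assuming the rule is "exactly two hyphens become an em dash". This is not the actual Markdown extension, WordPress filter, or iOS behaviour mentioned in this thread; `smarten_dashes` is a hypothetical name, and leaving runs of three or more hyphens alone is my own assumption.

```python
import re

# Exactly two hyphens, not part of a longer run of hyphens
DOUBLE_HYPHEN = re.compile(r"(?<!-)--(?!-)")


def smarten_dashes(text):
    """Replace each standalone double hyphen with an em dash (U+2014)."""
    return DOUBLE_HYPHEN.sub("\u2014", text)


print(smarten_dashes("bla-blah"))    # single hyphen is left alone
print(smarten_dashes("wait--what"))  # double hyphen becomes an em dash
```

With a rule like this, a hyphenated word such as bla-blah passes through unchanged; only the doubled hyphen is rewritten.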

                                                                                                                                                                      • Andrew_nenakhov

                                                                                                                                                                        last Saturday at 8:35 PM

                                                                                                                                                                        Ironically, I personally prefer good typography, but unless the editor for the desktop app is autocorrecting -- to —, I usually don't bother. But when I type on the phone with screen keyboard, I almost always do bother, even though entering text on mobile is objectively slower and more difficult and often with fewer options.

                                                                                                                                                                          • JdeBP

                                                                                                                                                                            last Saturday at 10:50 PM

                                                                                                                                                                            In this particular case, the options for mobile 'phone keyboards are greater rather than fewer. The em dash is a first class citizen on the "writer" layouts in ThumbKey, for example.

                                                                                                                                                                            * https://github.com/dessalines/thumb-key

                                                                                                                                                                        • astahlx

                                                                                                                                                                          last Saturday at 7:04 AM

                                                                                                                                                                          I started using emdashes in my academic career, after my advisor pointed me to the subtle differences. And since then, I like and use emdash a lot. In Latex, it is easily produced, just keep the spacing rules in mind. The Punctuation Guide is a nice reference on it https://www.thepunctuationguide.com/

                                                                                                                                                                            • globular-toast

                                                                                                                                                                              last Saturday at 7:32 AM

                                                                                                                                                                              There are actually four different "dashes" in La/TeX. The hyphen (-), en-dash (--) which is used for numeric rangen like 1--2, the em-dash (---) for punctuation, and the minus sign ($-$). Knuth talks about them in the TeXbook which is good fun.

                                                                                                                                                                                • pxc

                                                                                                                                                                                  last Saturday at 7:43 AM

                                                                                                                                                                                  I think you can do all of those in plain text as well. There are Unicode characters for those dashes and probably more
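As a rough illustration, here are the Unicode code points I would pair with the four TeX dashes listed above. The pairing is my own reading, and `dash_names` is just a helper name for this sketch; the character names come from Python's standard `unicodedata` module.

```python
import unicodedata

# Assumed counterparts of TeX's -, --, --- and $-$
DASH_CODE_POINTS = [0x002D, 0x2013, 0x2014, 0x2212]


def dash_names():
    """Map 'U+XXXX' labels to the official Unicode character names."""
    return {f"U+{cp:04X}": unicodedata.name(chr(cp)) for cp in DASH_CODE_POINTS}


for code, name in dash_names().items():
    print(code, name)
# U+002D HYPHEN-MINUS
# U+2013 EN DASH
# U+2014 EM DASH
# U+2212 MINUS SIGN
```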

                                                                                                                                                                                    • globular-toast

                                                                                                                                                                                      last Saturday at 8:52 AM

                                                                                                                                                                                      Not in ASCII. My definition of plain text is roughly "the characters I have on my keyboard". Unicode is like a superset of all possible plain texts. Useful, but I really don't like my own files containing characters I can't (easily) type. If I regularly typed in another language I would acquire a keyboard for that language. I'm not even convinced typographical symbols like various dash types even belong in Unicode at all to be honest. It seems like you have to draw a very arbitrary line somewhere.

                                                                                                                                                                                        • Symbiote

                                                                                                                                                                                          last Saturday at 9:04 AM

                                                                                                                                                                                          Drawing the line at "OK-ish for American English" is far too restrictive.

                                                                                                                                                                                          You can't write CO₂ or m², use a fraction like ½, claim © or mention a price in Euros or Pounds Sterling.

                                                                                                                                                                                          You can't even write major American place names (San José, Oʻahu).

                                                                                                                                                                                            • globular-toast

                                                                                                                                                                                              last Saturday at 8:15 PM

                                                                                                                                                                                              It's not too restrictive for me. I rarely need to write foreign place names or words (I'm British). Yeah I use the £ symbol so I'm not limiting myself to ASCII, just what is on my keyboard (I have € too). I just don't really consider a file full of characters I can't type to be "plain text" just because it's UTF-8, that's all.

                                                                                                                                                                                              • pxc

                                                                                                                                                                                                last Saturday at 8:11 PM

                                                                                                                                                                                                I'm pretty sure © and ½ are in ASCII. I think é might be, too.

                                                                                                                                                                                                But anyway, I agree: there's no reason plain text shouldn't be rich.

                                                                                                                                                                                                  • JdeBP

                                                                                                                                                                                                    last Saturday at 11:05 PM

                                                                                                                                                                                                    Wherever you learned ASCII from, it was very wrong. It probably made the common (although less common in the 21st century than in the 20th) erroneous conflation of ASCII and Latin-1, or IBM code page 437, or IBM code page 850.
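This is easy to check. A small sketch using Python's built-in codecs (`encodable` is a hypothetical helper name): ASCII is a 7-bit code, so none of the three characters mentioned above can be encoded in it, while Latin-1 gives each of them a single byte above 0x7F.

```python
def encodable(ch, encoding):
    """Return True if the character can be represented in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# COPYRIGHT SIGN, VULGAR FRACTION ONE HALF, LATIN SMALL LETTER E WITH ACUTE
for ch in "\u00a9\u00bd\u00e9":
    print(
        f"U+{ord(ch):04X}",
        "ascii:", encodable(ch, "ascii"),
        "latin-1 byte:", ch.encode("latin-1").hex(),
    )
# U+00A9 ascii: False latin-1 byte: a9
# U+00BD ascii: False latin-1 byte: bd
# U+00E9 ascii: False latin-1 byte: e9
```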

                                                                                                                                                                                                      • pxc

                                                                                                                                                                                                        yesterday at 7:29 PM

                                                                                                                                                                                                        Oh! You're right. It was way back in high school, and I think I must have learned about Latin-1 under the guise of "ASCII".

                                                                                                                                                                            • maaaaattttt

                                                                                                                                                                              last Saturday at 10:17 AM

                                                                                                                                                                              I think this whole em dash topic should lead to some deeper (though not very deep) conversations:

                                                                                                                                                                              * If it was not widely used before, where/how did (chat)GPT pick it up?

                                                                                                                                                                                  * If it was widely used, then it shouldn't be a topic at all. But, there seems to be informal agreement that it wasn’t widely used.
                                                                                                                                                                                  
                                                                                                                                                                                  * Or, could GPT have inferred that even though it's not widely used, it's the better way to go (to use it). Which then makes one wonder about the whole probability of next token idea. Maybe this line of thinking falls too short of what might be really going on internally.
                                                                                                                                                                              
                                                                                                                                                                               * If it had picked up something that is widely used but in the wrong way, it should make us pause (again) about the future feedback loops these LLMs, which aren't going away, are already creating. Not just in terms of grammar and spelling but also in terms of way of thinking and seeing the world.
                                                                                                                                                                              
                                                                                                                                                                              (edit: formatting)

                                                                                                                                                                                • msgodel

                                                                                                                                                                                  last Saturday at 10:22 AM

                                                                                                                                                                                  It's used a lot in formal writing (academic papers, books etc) which are probably a large portion of chatGPTs training. If the HRL was done by professional writers then it was probably additionally biased toward using them.

                                                                                                                                                                                  People are more casual on the web. It's sort of like how people can often tell when it's me in IM without my name because I properly use periods while that's unusual in that medium. ChatGPT is so correct it feels robotic.

                                                                                                                                                                                    • maaaaattttt

                                                                                                                                                                                      last Saturday at 11:23 AM

                                                                                                                                                                                      It’s the most likely explanation I believe. I have no idea about the content distribution of the training data but I would have assumed twitter and Reddit content would completely dwarf the literary content. Somewhat good that if it’s indeed not the case!

                                                                                                                                                                                  • throwaway89201

                                                                                                                                                                                    last Saturday at 10:24 AM

                                                                                                                                                                                    The training sets of most LLMs contain a copious amount of content from Libgen (or now: Anna's Archive), where em dashes are frequently used in literary writing.

                                                                                                                                                                                      • nullc

                                                                                                                                                                                        last Saturday at 9:24 PM

                                                                                                                                                                                        Who the hell knows how the initial biases of LLMs broke.

                                                                                                                                                                                        My IRC name (gmaxwell) is a token in the GPT3 tokenizer.

                                                                                                                                                                                    • Hilift

                                                                                                                                                                                      last Saturday at 11:26 AM

                                                                                                                                                                                      It isn't about wide use. It is about a character that almost no one enters explicitly. Nearly all usages are copy-paste, or inadvertent/unintended conversion by an application such as Microsoft Word that converts regular quotes to smart quotes, etc. In that respect, we see that an AI is performing identically to a real human. An AI does not and most likely would not see a purpose in adding an em or en dash to any text, unless it was an article about em or en dashes, or it knew the person it was speaking with uses en or em dashes.

                                                                                                                                                                                  • zdw

                                                                                                                                                                                    last Saturday at 3:58 AM

                                                                                                                                                                                    I applaud this data. But how are people actually creating an em-dash in the "add comment" box? Some non-obvious OS-level shortcut?

                                                                                                                                                                                      • necubi

                                                                                                                                                                                        last Saturday at 4:23 AM

                                                                                                                                                                                        On macOS it’s easy—opt+shift+-.

                                                                                                                                                                                        The em-dash used to be a slightly snooty way for Mac users to announce themselves. Sad that the polarity of perception has reversed.

                                                                                                                                                                                        I’ve been typing em-dashes since I got my first MacBook in 2006 and I’m not going to let the AI companies take my beautiful punctuation away from me.

                                                                                                                                                                                        • ronsor

                                                                                                                                                                                          last Saturday at 4:02 AM

                                                                                                                                                                                          Compose key, alt key codes, WinKey + . on Windows—there are many ways. It's also easy to do on most phone keyboards by holding down the hyphen key for more options.

                                                                                                                                                                                          • dullcrisp

                                                                                                                                                                                            last Saturday at 4:12 AM

                                                                                                                                                                                            document.querySelector("textarea").value += '—' in the JavaScript console.

                                                                                                                                                                                            • jer0me

                                                                                                                                                                                              last Saturday at 4:03 AM

                                                                                                                                                                                              Option Shift Hyphen on macOS

                                                                                                                                                                                              • acheron

                                                                                                                                                                                                last Saturday at 4:04 AM

                                                                                                                                                                                                You type -- and it autocorrects on iOS.

                                                                                                                                                                                                  • 9dev

                                                                                                                                                                                                    last Saturday at 5:52 AM

                                                                                                                                                                                                    You can also long-press the dash key on the iOS keyboard.

                                                                                                                                                                                                • Kungfuturtle

                                                                                                                                                                                                  last Saturday at 8:54 PM

                                                                                                                                                                                                  I've long been a fan of the em dash—one of the first things I did when I migrated back from OSX to Windows was to set up an AutoHotKey function to map <Alt>+<-> to an em dash.

                                                                                                                                                                                              • Gud

                                                                                                                                                                                                last Saturday at 11:53 AM

                                                                                                                                                                                                The one thing LLMs do well is manipulating text. The danger is obviously that it will reduce individual expression and make everything the same mediocre sludge.

                                                                                                                                                                                                For me writing is a way to capture a stream of consciousness so I don’t really see the advantage of using an LLM.

                                                                                                                                                                                                When I see some trivial mediocrity I simply stop reading. It’s just not interesting.

                                                                                                                                                                                                  • A4ET8a8uTh0_v2

                                                                                                                                                                                                    last Saturday at 12:05 PM

                                                                                                                                                                                                    As with most things, it can get interesting if you don't rely on defaults. My personal amusement in that area includes chatting up fictional characters with unique 'voices'. And even simple capture of consciousness can get more interesting if you apply stylometric analysis to it.

                                                                                                                                                                                                      • Gud

                                                                                                                                                                                                        last Saturday at 2:27 PM

                                                                                                                                                                                                        Makes a lot of sense! Interesting choice of use, I’ll have to try it out. I write some sci fi.

                                                                                                                                                                                                        Personally I use LLMs to study languages, particularly German. I find it enormously helpful.

                                                                                                                                                                                                • Havoc

                                                                                                                                                                                                  last Saturday at 8:18 PM

                                                                                                                                                                                                  Confused by the year stats below - that shows an increase much earlier that say GPT3 release date. So I'm guessing whatever is going on isn't just AI?

                                                                                                                                                                                                    • gardnr

                                                                                                                                                                                                      last Saturday at 8:28 PM

                                                                                                                                                                                                      From my perspective: that's the point of the web toy. It shows who was using these em dashes before they were likely copied and pasted from ChatGPT (or generated from APIs). The em dash is widely identified as a single character that highly increases the "smell" of text as being generated by AI.

                                                                                                                                                                                                      It is novel to see which users were producing text with an em dash before the rise of AI slop. User 'derefr' was 5 years ahead of everyone.

                                                                                                                                                                                                      I do wonder if there was some journalism CMS involved, or if these users figured out how to produce the character on their own volition.

                                                                                                                                                                                                      EDIT: 'lynndotpy' has an explanation in this thread.

                                                                                                                                                                                                  • thoughtpeddler

                                                                                                                                                                                                    last Sunday at 2:55 AM

                                                                                                                                                                                                    Someone should make something like this for the wider world outside of just HN. Go through all my publications through gScholar or elsewhere, and scour and parse anything I wrote publicly pre-11/30/22 to establish some kind of proof-of-humanity. Sincerely, an em-dash user who got overtaken by the GenAI wave of the mid-2020s.

                                                                                                                                                                                                    • dns_snek

                                                                                                                                                                                                      last Saturday at 9:37 AM

                                                                                                                                                                                                      Slightly tweaked: a leaderboard of comments containing em dashes after the ChatGPT release, limited to users who used them in fewer than 1% of comments before the ChatGPT release, and who posted at least 200 comments both before and after it. Data is recent (August 28th).

                                                                                                                                                                                                      Of course this doesn't mean they're using ChatGPT either; they could've switched devices or started using them because they felt like it.

                                                                                                                                                                                                        #   user           before_chatgpt after_chatgpt  
                                                                                                                                                                                                        1   fao_           9/1777 (1 %)   36/225 (16 %)
                                                                                                                                                                                                        2   tlogan         1/962 (0 %)    59/399 (15 %)
                                                                                                                                                                                                        3   whynotminot    1/250 (0 %)    36/356 (10 %)
                                                                                                                                                                                                        4   unclebucknasty 13/2566 (1 %)  38/378 (10 %)
                                                                                                                                                                                                        5   iLemming       0/793 (0 %)    61/628 (10 %)
                                                                                                                                                                                                        6   nostrebored    10/1045 (1 %)  32/331 (10 %)
                                                                                                                                                                                                        7   freeone3000    0/2128 (0 %)   74/791 (9 %) 
                                                                                                                                                                                                        8   pdabbadabba    6/932 (1 %)    20/225 (9 %) 
                                                                                                                                                                                                        9   thebooktocome  4/632 (1 %)    18/208 (9 %) 
                                                                                                                                                                                                        10  tnecniv        0/671 (0 %)    34/446 (8 %) 
                                                                                                                                                                                                        11  dkersten       39/5092 (1 %)  24/318 (8 %) 
                                                                                                                                                                                                        12  stared         8/1565 (1 %)   29/392 (7 %) 
                                                                                                                                                                                                        13  ETH_start      3/385 (1 %)    75/1029 (7 %)
                                                                                                                                                                                                        14  tcbawo         2/792 (0 %)    15/218 (7 %) 
                                                                                                                                                                                                        15  jbm            2/406 (0 %)    22/350 (6 %) 
                                                                                                                                                                                                      
                                                                                                                                                                                                      Query [2]:

                                                                                                                                                                                                        WITH by_user AS (
                                                                                                                                                                                                          SELECT
                                                                                                                                                                                                            `by` AS user,
                                                                                                                                                                                                            COUNTIF(text LIKE '%—%') AS match_count,
                                                                                                                                                                                                            COUNT(*) AS total_count,
                                                                                                                                                                                                            (timestamp >= '2022-11-30') AS after_chatgpt
                                                                                                                                                                                                          FROM `bigquery-public-data.hacker_news.full` 
                                                                                                                                                                                                          WHERE type = 'comment'
                                                                                                                                                                                                          GROUP BY user, after_chatgpt
                                                                                                                                                                                                        ),
                                                                                                                                                                                                        combined AS (
                                                                                                                                                                                                          SELECT
                                                                                                                                                                                                            user,
                                                                                                                                                                                                            MAX(IF(NOT after_chatgpt, match_count, 0)) AS match_before_chatgpt,
                                                                                                                                                                                                            MAX(IF(NOT after_chatgpt, total_count, 0)) AS total_before_chatgpt,
                                                                                                                                                                                                            MAX(IF(after_chatgpt, match_count, 0)) AS match_after_chatgpt,
                                                                                                                                                                                                            MAX(IF(after_chatgpt, total_count, 0)) AS total_after_chatgpt,
                                                                                                                                                                                                          FROM by_user
                                                                                                                                                                                                          GROUP BY user
                                                                                                                                                                                                          HAVING total_before_chatgpt >= 200 AND total_after_chatgpt >= 200
                                                                                                                                                                                                        ),
                                                                                                                                                                                                        with_fractions AS (
                                                                                                                                                                                                          SELECT
                                                                                                                                                                                                            *,
                                                                                                                                                                                                            SAFE_DIVIDE(match_before_chatgpt, total_before_chatgpt)  AS fraction_before_chatgpt,
                                                                                                                                                                                                            SAFE_DIVIDE(match_after_chatgpt, total_after_chatgpt) AS fraction_after_chatgpt
                                                                                                                                                                                                          FROM combined
                                                                                                                                                                                                        )
                                                                                                                                                                                                        SELECT
                                                                                                                                                                                                          user,
                                                                                                                                                                                                          FORMAT('%d/%d (%.0f %%)', match_before_chatgpt, total_before_chatgpt, ROUND(fraction_before_chatgpt*100)) AS before_chatgpt,
                                                                                                                                                                                                          FORMAT('%d/%d (%.0f %%)', match_after_chatgpt, total_after_chatgpt, ROUND(fraction_after_chatgpt*100)) AS after_chatgpt
                                                                                                                                                                                                        FROM with_fractions
                                                                                                                                                                                                        WHERE fraction_before_chatgpt < 0.01
                                                                                                                                                                                                        ORDER BY fraction_after_chatgpt DESC
                                                                                                                                                                                                        LIMIT 15
                                                                                                                                                                                                      
                                                                                                                                                                                                      [1] https://news.ycombinator.com/item?id=45072937

                                                                                                                                                                                                      [2] https://console.cloud.google.com/marketplace/product/y-combi...

                                                                                                                                                                                                        • owenversteeg

                                                                                                                                                                                                          today at 12:21 AM

                                                                                                                                                                                                          Out of curiosity, I browsed comments from some of those accounts. #5 has a ton of obvious LLM-generated comments. #2 has some. I didn't see any in the most recent comments from #7 and #10.

                                                                                                                                                                                                          • stavros

                                                                                                                                                                                                            last Saturday at 2:23 PM

                                                                                                                                                                                                            I think for this one you should do absolute, rather than relative, increase. The first place went from 9 to 36 whereas second went from 1 to 59, the number of comments they wrote without ChatGPT hitting an emdash shouldn't be relevant, I think.

                                                                                                                                                                                                            It does need some normalization for people who post very few comments, but it feels more fair this way.
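
A minimal sketch of the absolute-increase ranking suggested above, in Python. It assumes "absolute increase" means the raw count of em dash comments after the ChatGPT release minus the count before, and it only re-sorts the 15 rows already shown in the parent comment's table, so it is not a full leaderboard (users outside that top 15 are missing). The function name `rank_by_absolute_increase` is made up for illustration.

```python
# Counts copied from the leaderboard table in the parent comment:
# (user, em dash comments before, total before, em dash comments after, total after)
ROWS = [
    ("fao_", 9, 1777, 36, 225),
    ("tlogan", 1, 962, 59, 399),
    ("whynotminot", 1, 250, 36, 356),
    ("unclebucknasty", 13, 2566, 38, 378),
    ("iLemming", 0, 793, 61, 628),
    ("nostrebored", 10, 1045, 32, 331),
    ("freeone3000", 0, 2128, 74, 791),
    ("pdabbadabba", 6, 932, 20, 225),
    ("thebooktocome", 4, 632, 18, 208),
    ("tnecniv", 0, 671, 34, 446),
    ("dkersten", 39, 5092, 24, 318),
    ("stared", 8, 1565, 29, 392),
    ("ETH_start", 3, 385, 75, 1029),
    ("tcbawo", 2, 792, 15, 218),
    ("jbm", 2, 406, 22, 350),
]


def rank_by_absolute_increase(rows):
    """Sort rows by (matches after) - (matches before), largest first."""
    return sorted(rows, key=lambda r: r[3] - r[1], reverse=True)


ranked = rank_by_absolute_increase(ROWS)
for user, before, _, after, _ in ranked[:5]:
    print(f"{user:15} {before:3} -> {after:3}  (+{after - before})")
```

On these 15 rows the order changes noticeably: freeone3000 (0 to 74) and ETH_start (3 to 75) move to the top, while dkersten (39 to 24) drops to the bottom. As the comment notes, some normalization for low-volume posters would still be needed.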

                                                                                                                                                                                                            • nullc

                                                                                                                                                                                                              last Sunday at 12:32 AM

                                                                                                                                                                                                              It's interesting that only two of them are zero before. Going from few to many is nowhere near the chatgpt using signal as going from zero to many... unless perhaps the few before were obviously from copy and pastes elsewhere.

                                                                                                                                                                                                          • last Saturday at 11:33 AM

                                                                                                                                                                                                            • atoav

                                                                                                                                                                                                              last Saturday at 12:34 PM

                                                                                                                                                                                                              Place 33. I hate the whole LLMs em-dash thing since I now have to consider how em-dash usage impacts the perception of those reading what I wrote.

                                                                                                                                                                                                              At least I tended to use em-dash always with spaces surrounding it — like so. I know the anglospace-convention is to use it without spaces, but I just don't like that visually. At least one way to tell me apart from typical LLM-generated text.

                                                                                                                                                                                                              • loughnane

                                                                                                                                                                                                                last Sunday at 2:29 AM

                                                                                                                                                                                                                I noticed them in the Economist around 2010, and thought they were slick. Tons of software will autodetect "---" as an emdash so that works.

                                                                                                                                                                                                                Honestly, even if it doesn't make it pretty I find stringing together a few hyphens does the trick in less formal settings.

                                                                                                                                                                                                                • userbinator

                                                                                                                                                                                                                  last Saturday at 4:14 AM

                                                                                                                                                                                                                  I suspect they are generated via "autocorrect", the same way as "smart (more like stupid) quotes" and other characters that tend to cause a great deal of frustration should they find their way into source code. It would be interesting to see how many users regularly make posts containing non-ASCII characters.
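
                                                                                                                                                                                                                  A minimal sketch of how that count could be made, assuming the comments are already available as (author, text) pairs. The function names, the `min_posts` threshold, and the sample data are all made up for illustration; nothing here queries the real HN dataset.

```python
from collections import Counter


def non_ascii_post_counts(posts):
    """Count, per author, how many posts contain at least one non-ASCII character."""
    counts = Counter()
    for author, text in posts:
        # str.isascii() is True only if every character is in U+0000..U+007F.
        if not text.isascii():
            counts[author] += 1
    return counts


def regular_users(posts, min_posts=2):
    """Authors with at least `min_posts` non-ASCII posts (the threshold is arbitrary)."""
    counts = non_ascii_post_counts(posts)
    return sorted(author for author, n in counts.items() if n >= min_posts)


# Made-up sample data, not real HN comments.
SAMPLE_POSTS = [
    ("alice", "Plain ASCII text."),
    ("alice", "An em dash \u2014 typed by hand."),
    ("alice", "\u201cSmart\u201d quotes too."),
    ("bob", "Only ASCII here, including -- and '."),
    ("carol", "Caf\u00e9"),
]

if __name__ == "__main__":
    print(dict(non_ascii_post_counts(SAMPLE_POSTS)))
    print(regular_users(SAMPLE_POSTS))
```

                                                                                                                                                                                                                  Note that this counts any non-ASCII character, so accented names and emoji are included alongside dashes and curly quotes.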

                                                                                                                                                                                                                    • wiml

                                                                                                                                                                                                                      last Saturday at 4:20 AM

                                                                                                                                                                                                                      I type them manually out of habit. There are a handful of other common non-ASCII marks I have muscle memory for as well.

                                                                                                                                                                                                                      Compose-minus-minus-minus in X

                                                                                                                                                                                                                      It's one of the long-press punctuation marks on Android

                                                                                                                                                                                                                      Option-shift-minus on Mac

                                                                                                                                                                                                                      • southwindcg

                                                                                                                                                                                                                        last Saturday at 5:02 AM

                                                                                                                                                                                                                        I use Autokey. I've added a bunch of occasionally-used HTML entities and Unicode characters so I don't need to go hunting for them.

                                                                                                                                                                                                                        • dang

                                                                                                                                                                                                                          last Saturday at 4:16 AM

                                                                                                                                                                                                                          I'm only #2 but all mine are guaranteed hand-made, done this way: https://news.ycombinator.com/item?id=45071823

                                                                                                                                                                                                                            • lostlogin

                                                                                                                                                                                                                              last Saturday at 8:24 AM

                                                                                                                                                                                                                              When the pre 2022 versus post 2022 stats come out, all will be revealed.

                                                                                                                                                                                                                          • db48x

                                                                                                                                                                                                                            last Saturday at 4:18 AM

                                                                                                                                                                                                                            No, I modified my keymap to make typing quotes and dashes and other characters easy.


                                                                                                                                                                                                                          • ks2048

                                                                                                                                                                                                                            last Saturday at 10:38 PM

                                                                                                                                                                                                                            A related question - if you feed each comment into an LLM and asked it to classify into {human-produced, llm-produced, not-sure}, how many would it think are from LLMs? How could you try to investigate the true answer?
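
                                                                                                                                                                                                                            One possible angle, sketched under assumptions: comments posted before November 30, 2022 are almost certainly human-written, so the share of them that a classifier labels llm-produced gives a rough false-positive rate. `naive_classify` below is a hypothetical stand-in (a deliberately crude em dash rule), not a real LLM API, and the sample comments are invented.

```python
from datetime import date

# Cutoff used by the leaderboard: the release date of ChatGPT.
CHATGPT_RELEASE = date(2022, 11, 30)


def naive_classify(text):
    """Hypothetical stand-in classifier: flags any text containing an em dash."""
    return "llm-produced" if "\u2014" in text else "human-produced"


def false_positive_rate(comments, classify):
    """Share of pre-ChatGPT comments that `classify` labels 'llm-produced'.

    `comments` is an iterable of (posted_date, text) pairs.
    Returns None when there are no pre-release comments to judge.
    """
    known_human = [text for posted, text in comments if posted < CHATGPT_RELEASE]
    if not known_human:
        return None
    flagged = sum(1 for text in known_human if classify(text) == "llm-produced")
    return flagged / len(known_human)


# Invented sample data, not real HN comments.
SAMPLE = [
    (date(2015, 3, 1), "I type them manually \u2014 out of habit."),
    (date(2019, 7, 9), "ASCII forever!"),
    (date(2021, 1, 5), "Regular dashes - good enough."),
    (date(2020, 2, 2), "No dashes here."),
    (date(2024, 5, 5), "Post-release \u2014 excluded from the estimate."),
]

if __name__ == "__main__":
    print(false_positive_rate(SAMPLE, naive_classify))
```

                                                                                                                                                                                                                            This only bounds one kind of error: it says nothing about how many genuinely LLM-written comments the classifier would miss.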

                                                                                                                                                                                                                            • ben_w

                                                                                                                                                                                                                              last Saturday at 12:31 PM

                                                                                                                                                                                                                              This kind of thing is the only way I'm likely to get in a top-10-HackerNews-users list ^_^;

                                                                                                                                                                                                                              • JumpCrisscross

                                                                                                                                                                                                                                last Saturday at 10:47 PM

                                                                                                                                                                                                                                Sadly, I’ve been editing it out of my writing, at least online and in emails.

                                                                                                                                                                                                                                • rednafi

                                                                                                                                                                                                                                  last Sunday at 12:33 AM

                                                                                                                                                                                                                                  For most write-ups, I’ve switched to en-dash flanked by two spaces these days. Easier to type and looks less gippitified imo.

                                                                                                                                                                                                                                  > But British usage - instead - uses spaces, so an en-dash or an em-dash is acceptable.

                                                                                                                                                                                                                                  • chrismorgan

                                                                                                                                                                                                                                    last Saturday at 7:24 AM

                                                                                                                                                                                                                                    As #10 on this list, here’s how I do it on my laptop.

                                                                                                                                                                                                                                    I remap a key to the right of Space to Compose, and add various custom sequences. Before long, I was completely comfortably and casually typing dashes and curly quotes and more, and in fact it takes conscious effort for me to limit myself to ASCII when typing prose. (Writing code, writing *, /, -, ' and " is easy. But writing prose, I genuinely will write ×, ÷ if it feels the right one in that place, −, ‘/’ and “/”.)

                                                                                                                                                                                                                                    On one previous laptop keyboard I mapped Menu, on my current one RAlt is more suitable.

                                                                                                                                                                                                                                    When on Windows, I use WinCompose. On Linux, I used to just use it bare, which had advantages and disadvantages—apps implement a Compose key inconsistently, some messing things up related to includes and some handling overlapping sequences differently. More recently I wanted to be able to type Telugu and installed fcitx5 which is no longer mostly broken under Wayland like it was last time I tried, so now fcitx5 is handling the Compose sequences across the entire system, and working more consistently. Also I can use Ctrl+Alt+Shift+U and get a popup where I can search Unicode by code or description. Now if only that pesky popup would handle Shift+Space and Ctrl+Backspace itself rather than letting them fall through to the parent…

                                                                                                                                                                                                                                    In my ~/.config/sway/config:

                                                                                                                                                                                                                                      input * {
                                                                                                                                                                                                                                          xkb_options "caps:backspace,compose:ralt"
                                                                                                                                                                                                                                      }
                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                    (caps:backspace isn’t entirely relevant here, but it’s on the same line and I choose to mention it. When people are remapping Caps Lock, I’ve never understood why so many seem to choose to make it Escape. Just extend the left hand and slap the corner of the keyboard with the ring finger, it’s not a huge movement and is easy to reach and return. Backspace, however, tends to be needed at least as often (and yes, I say that despite using Vim), and is much harder to hit. In my mind, a far better candidate for shifting to that prime real estate.)

                                                                                                                                                                                                                                    For my ~/.XCompose, I start with the defaults and one good set of additions, https://raw.githubusercontent.com/kragen/xcompose/master/dot...:

                                                                                                                                                                                                                                      include "/usr/share/X11/locale/en_US.UTF-8/Compose"
                                                                                                                                                                                                                                      include "/home/chris/.XCompose-kragen"
                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                    Then I add all kinds of additions. Lots of fine typography stuff like zero-width space and non-joiner, narrow no-break space, thin space… a few more hyphen/dash mappings… and lots of other things like nice emoji sequences, music notation stuff, Greek letters matching Vim digraphs, superscript ordinals (ˢᵗ, ⁿᵈ, ʳᵈ, ᵗʰ), the keyboard shortcut symbols macOS uses (⌘⌃⌥⇧⌫ and another dozen less common ones), control pictures like ␆, and a handful of other things.

                                                                                                                                                                                                                                    When all’s said and done:

                                                                                                                                                                                                                                    • Compose - - - gets me — EM DASH (stock)

                                                                                                                                                                                                                                    • Compose - - . gets me – EN DASH (stock)

                                                                                                                                                                                                                                    • Compose - - = gets me − MINUS SIGN (custom)

                                                                                                                                                                                                                                    • Compose - - w gets me ⸺ TWO EM DASH (custom; w for wide)

                                                                                                                                                                                                                                    • Compose - - W gets me ⸻ THREE EM DASH (custom; W for Wider)

                                                                                                                                                                                                                                    The last two I use occasionally, the other three I use very frequently. I went through a phase of using HYPHEN and SOFT HYPHEN, now I seldom use them.

                                                                                                                                                                                                                                    I also like to write &c. (italic where supported) for et cetera.

                                                                                                                                                                                                                                    For quotation marks, I also use custom mappings:

                                                                                                                                                                                                                                      <Multi_key> <semicolon> <semicolon>   : "‘"   U2018 # LEFT SINGLE QUOTATION MARK
                                                                                                                                                                                                                                      <Multi_key> <apostrophe> <apostrophe> : "’"   U2019 # RIGHT SINGLE QUOTATION MARK
                                                                                                                                                                                                                                      <Multi_key> <colon> <colon>           : "“"   U201c # LEFT DOUBLE QUOTATION MARK
                                                                                                                                                                                                                                      <Multi_key> <quotedbl> <quotedbl>     : "”"   U201d # RIGHT DOUBLE QUOTATION MARK
                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                    Think about how you physically type them, and I reckon these mappings make a lot of sense, very easy to type. Much better than the stock bindings (<' >' <" >") or kragen ones (`Space 'Space `` ''; or 6' 9' 6" 9").

                                                                                                                                                                                                                                    —⁂—

                                                                                                                                                                                                                                    (Oh yeah, that one’s <Multi_key> <h> <r> : "—⁂—".)

                                                                                                                                                                                                                                    Now, I have one question I’d like answered. Overlapping sequences. If you have -> → and <- ← you’re fine, but when you add <-> ↔, I can’t find any way of using the <- sequence any more. Before fcitx5, some apps would ignore one or the other (in ways difficult to explain which I think involved the fact that some definitions came from includes), and some would let you terminate the sequence early and match the shorter one (e.g. Compose < - Enter). Is there some proper solution I’ve missed?

                                                                                                                                                                                                                                    I have plans for an article on my keyboard arrangements, including sharing a full .XCompose, but I’m going to finish my next major revision to my website first. Because then I’ll be able to draw things instead of just writing.

                                                                                                                                                                                                                                    —⁂—

                                                                                                                                                                                                                                    On mobile, I think I use FUTO keyboard at present, which lets me access most of these things, but not elegantly. I want to make my own keyboard layout that lets me access the good stuff more easily, but I haven’t got to it yet.

                                                                                                                                                                                                                                    Also: anyone want to join me in advocating for completion dictionaries and libraries to replace their ' apostrophes with ’, or at least to support both approaches equally? I’m fed up with not having this stuff, Vim is the only place where it was straightforward to get it about right, and mobile is just a mess.

                                                                                                                                                                                                                                      • frumiousirc

                                                                                                                                                                                                                                        last Saturday at 1:03 PM

                                                                                                                                                                                                                                        > If you have -> → and <- ← you’re fine, but when you add <-> ↔, I can’t find any way of using the <- sequence any more.

                                                                                                                                                                                                                                        X11 is likely walking a tree of .XCompose entries with each keypress. Once it gets to '<' and '-' it finds '←' and does not continue to consider your next '>'. So, you need to provide a way to walk a different path.

                                                                                                                                                                                                                                        This works for me.

                                                                                                                                                                                                                                            <Multi_key> <less> <period> <greater> : "↔"
                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                        It is like how EN DASH is "--." to be distinct from EM DASH's "---".

                                                                                                                                                                                                                                        In general we must consider the entirety of .XCompose when choosing new compose key bindings. Maybe there is some utility to help with that. For me, I removed 98% of the default Compose file entries which makes manual checking feasible.
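
                                                                                                                                                                                                                                        A rough sketch of what such a utility might look like, assuming the sequences have already been extracted from .XCompose as tuples of keysym names. It does not parse the file format or follow include lines, and `find_prefix_conflicts` is a made-up name. It reports every pair where one sequence is a strict prefix of another.

```python
def find_prefix_conflicts(sequences):
    """Return (shorter, longer) pairs where `shorter` is a strict prefix of `longer`.

    `sequences` is an iterable of key sequences, each a sequence of keysym
    names with the leading <Multi_key> already stripped.
    """
    # In sorted order, every extension of a sequence follows it contiguously.
    keys = sorted({tuple(seq) for seq in sequences})
    conflicts = []
    for i, short in enumerate(keys):
        for longer in keys[i + 1:]:
            if longer[:len(short)] != short:
                break  # no later key can start with `short` either
            conflicts.append((short, longer))
    return conflicts


# Sequences mentioned in this thread, written as keysym names.
WITH_CONFLICT = [
    ("less", "minus"),             # <-  left arrow
    ("minus", "greater"),          # ->  right arrow
    ("less", "minus", "greater"),  # <-> left-right arrow
    ("minus", "minus", "minus"),   # em dash
    ("minus", "minus", "period"),  # en dash
]

# Same table with the <.> binding suggested above instead of <->.
WITHOUT_CONFLICT = [
    seq for seq in WITH_CONFLICT if seq != ("less", "minus", "greater")
] + [("less", "period", "greater")]

if __name__ == "__main__":
    print(find_prefix_conflicts(WITH_CONFLICT))
    print(find_prefix_conflicts(WITHOUT_CONFLICT))
```

                                                                                                                                                                                                                                        The em dash and en dash sequences share their first two keys but neither is a prefix of the other, so they are not reported.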

                                                                                                                                                                                                                                          • chrismorgan

                                                                                                                                                                                                                                            last Saturday at 2:12 PM

                                                                                                                                                                                                                                            There is no X11 involved here, and even on systems running an X server instead of Wayland, judging by the symptoms I’ve seen, the X server isn’t actually involved in interpreting Compose sequences—each app implements the whole lot itself, and judging by the inconsistencies, not all are using the same library for it.

                                                                                                                                                                                                                                            Some only let Compose < - (←) work, stopping and preventing Compose < - > (↔) from working. Others, if I remember correctly, let Compose < - Enter work to get ←.

                                                                                                                                                                                                                                            Once an Input Method is involved, it can handle the Compose key, and that’s what fcitx5 is doing for me now, so that everything’s behaving the same… but that ā€œsameā€ is not what I reckon it should be.
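
                                                                                                                                                                                                                                            To make the two behaviours concrete, here is a toy model as a trie walk. It is not how fcitx5 or any toolkit actually implements Compose; the `eager` flag, the function names, and the use of Return as the terminating key are assumptions for illustration only.

```python
def build_trie(table):
    """Build a nested-dict trie from {key_sequence: result}."""
    root = {}
    for seq, result in table.items():
        node = root
        for key in seq:
            node = node.setdefault("children", {}).setdefault(key, {})
        node["result"] = result
    return root


def compose(trie, keys, eager):
    """Feed the keys pressed after Compose; return the emitted string or None.

    eager=True  : emit as soon as any complete sequence is reached, so a
                  longer sequence sharing that prefix can never be typed.
    eager=False : when a node both completes a sequence and has longer
                  continuations, keep waiting; Return ends the sequence
                  early and emits the shorter result.
    """
    node = trie
    for key in keys:
        children = node.get("children", {})
        if key in children:
            node = children[key]
            if "result" in node and (eager or "children" not in node):
                return node["result"]
        elif key == "Return" and "result" in node and not eager:
            return node["result"]
        else:
            return None  # key does not continue any known sequence
    return None  # ran out of keys while a sequence was still pending


TABLE = {
    ("less", "minus"): "\u2190",             # <-  left arrow
    ("minus", "greater"): "\u2192",          # ->  right arrow
    ("less", "minus", "greater"): "\u2194",  # <-> left-right arrow
}
TRIE = build_trie(TABLE)

if __name__ == "__main__":
    print(compose(TRIE, ["less", "minus", "greater"], eager=True))
    print(compose(TRIE, ["less", "minus", "greater"], eager=False))
    print(compose(TRIE, ["less", "minus", "Return"], eager=False))
```

                                                                                                                                                                                                                                            In the eager mode the three-key sequence is unreachable; in the waiting mode the two-key sequence is only reachable through the terminator, which matches the Compose < - Enter workaround.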

                                                                                                                                                                                                                                        • lostlogin

                                                                                                                                                                                                                                          last Saturday at 8:26 AM

                                                                                                                                                                                                                                          I’m no longer concerned you’re an AI, but I am concerned.

                                                                                                                                                                                                                                      • phendrenad2

                                                                                                                                                                                                                                        last Saturday at 8:50 AM

                                                                                                                                                                                                                                        I probably would have made the list, but regular dashes are good enough for me - ASCII forever!!!

                                                                                                                                                                                                                                        • sinuhe69

                                                                                                                                                                                                                                          last Saturday at 3:22 PM

                                                                                                                                                                                                                                          How do people type em dash on the keyboard? On iOS you have to long-press the dash key, on a hardware full keyboard you have to key-in the code(?). That’s all very cumbersome and unnatural! Why people bother using em dash at all?

                                                                                                                                                                                                                                            • sjs382

                                                                                                                                                                                                                                              last Saturday at 3:25 PM

                                                                                                                                                                                                                                              Mac: Option-shift-hyphen

                                                                                                                                                                                                                                              Android: long-press hyphen

                                                                                                                                                                                                                                              • pxx

                                                                                                                                                                                                                                                last Saturday at 3:24 PM

                                                                                                                                                                                                                                                there are shortcuts but they can also compose their response in a richer text editor then paste. matched quotation marks also typically show up.

                                                                                                                                                                                                                                            • notpushkin

                                                                                                                                                                                                                                              last Saturday at 7:26 AM

                                                                                                                                                                                                                                              Well─────that was bound to happen.

                                                                                                                                                                                                                                              • mickeyp

                                                                                                                                                                                                                                                last Saturday at 4:43 AM

                                                                                                                                                                                                                                                Some of us use triple dash to indicate the same thing. Like LateX. You should add that too.

                                                                                                                                                                                                                                                  • latexr

                                                                                                                                                                                                                                                    last Saturday at 4:56 AM

                                                                                                                                                                                                                                                    The point is to disprove the notion that any writing with an em-dash was done by an LLM. Including a triple dash would just muddy the data.

                                                                                                                                                                                                                                                    • Bud

                                                                                                                                                                                                                                                      last Saturday at 7:15 AM

                                                                                                                                                                                                                                                      [dead]

                                                                                                                                                                                                                                                  • apparent

                                                                                                                                                                                                                                                    last Saturday at 8:10 PM

                                                                                                                                                                                                                                                    So does that mean ChatGPT was trained on these HNers' comments?

                                                                                                                                                                                                                                                      • IAmGraydon

                                                                                                                                                                                                                                                        last Saturday at 4:25 AM

                                                                                                                                                                                                                                                        I guess I’m confused. Why is it interesting to know how many em dashes were used before the dawn of ChatGPT? It’s how many AFTER that seems like it would be far more interesting.

                                                                                                                                                                                                                                                          • tkgally

                                                                                                                                                                                                                                                            last Saturday at 4:53 AM

                                                                                                                                                                                                                                                            As mentioned in the thread that included dang’s suggestion [1], examples of one’s use of em dashes timestamped before ChatGPT could be used as a defense if one is accused, on the basis of em dashes, of having written with AI.

                                                                                                                                                                                                                                                            Whether this is interesting or not, well…

                                                                                                                                                                                                                                                            [1] https://news.ycombinator.com/item?id=45046883

                                                                                                                                                                                                                                                            • latexr

                                                                                                                                                                                                                                                              last Saturday at 4:53 AM

                                                                                                                                                                                                                                                              Because it’s becoming a common belief that any em-dash indicates LLM writing, and us people who regularly use em-dashes are attempting to show that is a poor signal on its own. The goal is to show proof of humans using it.

                                                                                                                                                                                                                                                                • Tostino

                                                                                                                                                                                                                                                                  last Saturday at 5:22 AM

                                                                                                                                                                                                                                                                  Or at least to have a baseline. If you see a sudden jump, that does tell you something.

                                                                                                                                                                                                                                                                    • bee_rider

                                                                                                                                                                                                                                                                      last Saturday at 6:46 AM

                                                                                                                                                                                                                                                                      Maybe it tells us that, thanks to AI, some folks learned about a perfectly useful piece of punctuation.

                                                                                                                                                                                                                                                              • southwindcg

                                                                                                                                                                                                                                                                last Saturday at 4:47 AM

                                                                                                                                                                                                                                                                Some people accuse anyone who uses em dashes of using ChatGPT to write their posts. This is "proof" that actual humans use em dashes.

                                                                                                                                                                                                                                                                  • vntok

                                                                                                                                                                                                                                                                    last Saturday at 7:37 AM

                                                                                                                                                                                                                                                                    Things like books are proof that actual humans use em dashes, that wasn't ever the contention.

                                                                                                                                                                                                                                                                    What's needed is a writing comparison before/after 2022 for these users. If there's a sudden 200% increase in the use of em-dashes from one month to the next, it's a very strong indicator that the user started LLMing their posts.
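
A minimal sketch of the before/after comparison described above, in Python. It assumes one user's comments are already available as `(date, text)` pairs; the function names and the single cutoff date are illustrative assumptions, not part of the leaderboard. It compares one period before the ChatGPT release with one period after it, rather than month to month.

```python
from datetime import date

EM_DASH = "\u2014"  # the em dash character, written as an escape
CHATGPT_RELEASE = date(2022, 11, 30)  # cutoff used by the leaderboard


def em_dash_rate(comments):
    """Return the fraction of comments containing at least one em dash."""
    if not comments:
        return 0.0
    return sum(EM_DASH in text for text in comments) / len(comments)


def relative_change(dated_comments, cutoff=CHATGPT_RELEASE):
    """Relative change in em-dash rate after the cutoff (2.0 means +200%).

    Returns None when the user never used an em dash before the cutoff,
    because the relative change is undefined in that case.
    """
    before = [text for day, text in dated_comments if day < cutoff]
    after = [text for day, text in dated_comments if day >= cutoff]
    rate_before = em_dash_rate(before)
    if rate_before == 0:
        return None
    return (em_dash_rate(after) - rate_before) / rate_before
```

A large positive value is only a hint: a user who simply learned the keyboard shortcut would show the same jump.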

                                                                                                                                                                                                                                                                      • southwindcg

                                                                                                                                                                                                                                                                        last Saturday at 11:15 PM

                                                                                                                                                                                                                                                                        Perhaps I should have qualified that humans use them in casual writing, website comments and the like, and not just in formal, published works that probably had an editor.

                                                                                                                                                                                                                                                                • dragonwriter

                                                                                                                                                                                                                                                                  last Saturday at 6:59 AM

                                                                                                                                                                                                                                                                  Given that GPT-3.5 (like many LLMs) was trained with a large corpus of scraped internet data, including popular discussion fora, the people on the leaderboard are the ones potentially to blame for ChatGPT’s em-dash habit.

                                                                                                                                                                                                                                                              • dang

                                                                                                                                                                                                                                                                last Saturday at 4:17 AM

                                                                                                                                                                                                                                                                There's also https://news.ycombinator.com/item?id=27787448

                                                                                                                                                                                                                                                                • nullc

                                                                                                                                                                                                                                                                  last Saturday at 9:17 PM

                                                                                                                                                                                                                                                                  I was surprised I only ranked 34th for earliest -- but then I saw it was the date my account was created.

                                                                                                                                                                                                                                                                  • lo_zamoyski

                                                                                                                                                                                                                                                                    last Saturday at 2:23 PM

                                                                                                                                                                                                                                                                    This shows absolute numbers. It would be better to see frequency.

                                                                                                                                                                                                                                                                    EDIT: There's a second ranking linked at the top that shows this.

                                                                                                                                                                                                                                                                    • nullandvoid

                                                                                                                                                                                                                                                                      last Saturday at 11:08 AM

                                                                                                                                                                                                                                                                      I was hoping to see a graph of em-dash usage over time across all comments - would be interesting to see the spike post LLM

                                                                                                                                                                                                                                                                        • jacquesm

                                                                                                                                                                                                                                                                          last Saturday at 11:12 AM

                                                                                                                                                                                                                                                                          Indeed, that is interesting, the author could probably spit out that answer in seconds. As - for the most part, anyway - a traditionalist and ASCII7 adherent I find it funny to think about how this is probably also a good indicator of the age of the writer.

                                                                                                                                                                                                                                                                            • DonHopkins

                                                                                                                                                                                                                                                                              last Sunday at 6:59 AM

                                                                                                                                                                                                                                                                              When I saw your name on the leaderboard, I was shocked -- I say shocked -- and I hoped that all of the messages you posted with em dashes were just quoting other people using them, and ripping them a new *.

                                                                                                                                                                                                                                                                                • jacquesm

                                                                                                                                                                                                                                                                                  last Sunday at 4:20 PM

                                                                                                                                                                                                                                                                                  Lol, I wonder how many people you made to check. How are the kittens?

                                                                                                                                                                                                                                                                        • JKCalhoun

                                                                                                                                                                                                                                                                          last Sunday at 12:47 AM

                                                                                                                                                                                                                                                                          Yes! #21! A list I finally made — and I was not surprised to find I was on it.

                                                                                                                                                                                                                                                                          • Ericson2314

                                                                                                                                                                                                                                                                            last Saturday at 5:00 AM

                                                                                                                                                                                                                                                                            I do em dash on my phone, and --- on the computer. Can we expand this further? I wanna make at least the top 200!

                                                                                                                                                                                                                                                                            • almostbasic

                                                                                                                                                                                                                                                                              last Sunday at 2:10 AM

                                                                                                                                                                                                                                                                              This is amazing The rise of the AI generated em dash is insane.

                                                                                                                                                                                                                                                                                • k__

                                                                                                                                                                                                                                                                                  last Saturday at 8:04 AM

                                                                                                                                                                                                                                                                                  If I had a key for it on my keyboard, I'd use it more often too.

                                                                                                                                                                                                                                                                                  • qingcharles

                                                                                                                                                                                                                                                                                    last Saturday at 3:13 PM

                                                                                                                                                                                                                                                                                    The post where we discovered dan g was an AI.

                                                                                                                                                                                                                                                                                    • qwertytyyuu

                                                                                                                                                                                                                                                                                      last Saturday at 2:29 PM

                                                                                                                                                                                                                                                                                      We need a Column for em-dashes per 1000 words
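
One way such a column could be computed, as a rough Python sketch. It assumes a user's comments are available as plain strings and that words are whitespace-separated tokens; the function name is hypothetical and this is not how the leaderboard itself is built.

```python
EM_DASH = "\u2014"  # the em dash character, written as an escape


def em_dashes_per_1000_words(comments):
    """Return the number of em dashes per 1000 words across all comments.

    Em dashes are replaced by spaces before counting words, so that two
    words joined by an unspaced dash are counted as two words, not one.
    """
    total_dashes = sum(text.count(EM_DASH) for text in comments)
    total_words = sum(
        len(text.replace(EM_DASH, " ").split()) for text in comments
    )
    if total_words == 0:
        return 0.0
    return 1000 * total_dashes / total_words
```

Normalizing by words rather than by comment count keeps users who write long comments from ranking high on volume alone.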

                                                                                                                                                                                                                                                                                          • firesteelrain

                                                                                                                                                                                                                                                                                            last Saturday at 12:21 PM

                                                                                                                                                                                                                                                                                            Between the comments running correlations BC and AC, things still seem inconclusive.

                                                                                                                                                                                                                                                                                            @dang - can we add it to the HN guidelines that we should not or should call out AI when we see it? On one hand people might get defensive and the threads get out of hand. On the other hand, we don’t want AI slop.

                                                                                                                                                                                                                                                                                          • attogram

                                                                                                                                                                                                                                                                                            last Saturday at 7:52 AM

                                                                                                                                                                                                                                                                                            So now some folks will intentially add in em dashes to get on the leaderboard — oops!

                                                                                                                                                                                                                                                                                              • Wowfunhappy

                                                                                                                                                                                                                                                                                                last Sunday at 1:23 AM

                                                                                                                                                                                                                                                                                                You can't, it only measures posts prior to the release of ChatGPT.

                                                                                                                                                                                                                                                                                            • last Saturday at 3:40 AM

                                                                                                                                                                                                                                                                                              • aaron695

                                                                                                                                                                                                                                                                                                last Saturday at 6:30 AM

                                                                                                                                                                                                                                                                                                [dead]

                                                                                                                                                                                                                                                                                                • anonyMusk

                                                                                                                                                                                                                                                                                                  last Saturday at 1:34 PM

                                                                                                                                                                                                                                                                                                  [dead]

                                                                                                                                                                                                                                                                                                  • RobertEva

                                                                                                                                                                                                                                                                                                    last Saturday at 8:11 AM

                                                                                                                                                                                                                                                                                                    [dead]