Fighting spam with Haskell at Meta (2015)

I worked on this project for a week during the FB team matching process.

I had previous Haskell experience, but what struck me was how irrelevant that was.

To their immense credit, they had managed to use Haskell to write an entirely enterprise-focused, "boring" business-rule backend serving millions of QPS, and people who were not Haskell fans could easily use its sensible DSL to accomplish their goals.

Way too many Haskell projects are about Haskell itself, so this was really fascinating to see.

Haskell could be a great practical language if some constraints were introduced, e.g. limiting the language extensions used. https://www.simplehaskell.org attempted to do this and, currently, https://neohaskell.org is going in the same direction. After all, Haskell '98 is not that hard.

Personally, I think Haskell, or something like Haskell, is going to be reasonably popular in the near future. Functional programming and an expressive type system are great for ML-powered synthesis. You provide the type signature, and the machine fills in the function body. Furthermore, with dependent or refinement types, the solution can be verified to be correct.

I'm not fully convinced. FP is such a different paradigm than the leading imperative/OO design that most are comfortable with. Languages like Lisp, Forth, APL, Haskell, and Prolog are just too different for the average person IMO. I've given Haskell/OCaml/F# a go a few times, and although I enjoy learning new paradigms, it certainly didn't click for me. I have a feeling it'll be even harder for more typical people who aren't into this as a hobby.

Here is how to read a text file in Haskell (I assume a standard way):

https://stackoverflow.com/questions/7867723/haskell-file-rea...

From a Python tutorial:

https://www.pythontutorial.net/python-basics/python-read-tex...

I could be biased, but it certainly seems like Python has far fewer conceptual hurdles here. You basically specify a filepath, then loop through each line with a for loop and print it. The Haskell solution requires more steps and theory.

I know Haskell is a very awesome and cool language and super powerful for all kinds of uses that Python may be inferior at (compilers, for example). You'll get no argument there. I'm just pointing out that I think wide adoption may be difficult. I drank the Kool-Aid, read dozens of blog posts and a couple of Haskell books, and really wanted to make it work. I'm an engineer and like math and puzzles and learning new programming languages... and yet I couldn't get that far (without sinking a whole lot more time into it than I had) and ultimately gave up.

It's not really that Haskell is "powerful", but the type system will catch a lot of errors for you and make refactoring much easier.

Also, I wouldn't recommend starting FP with Haskell. It's hard, mostly because of the monads (and laziness can make things more confusing too). The syntax is also confusing, because indentation is meaningful in some places but not in others.

On the other hand, a language like Scheme is really super easy. Even OCaml is super simple if you stick to the classic ML subset of the language, which can take you a very long way. These languages have been used to teach programming to complete beginners. Seriously, OCaml is arguably simpler than Python, and without doubt an order of magnitude simpler than C++.

If you're familiar with things like lambda, closures and functions like map or filter in Python, you already know most of what you need to write OCaml code.
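
To make that concrete, here is a minimal Python sketch of the features just named (a lambda, a closure, `map`, and `filter`). The names `make_adder`, `add_ten`, `evens`, and `shifted` are made up for illustration. The rough OCaml counterparts would be `fun x -> ...`, `List.filter`, and `List.map`.

```python
# Illustrative only: the Python features named above, with hypothetical names.

def make_adder(n):
    """Return a closure that remembers n."""
    return lambda x: x + n

add_ten = make_adder(10)

numbers = [1, 2, 3, 4, 5, 6]
evens = list(filter(lambda x: x % 2 == 0, numbers))  # keep the even numbers
shifted = list(map(add_ten, evens))                  # apply the closure to each

print(evens)    # [2, 4, 6]
print(shifted)  # [12, 14, 16]
```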

> FP is such a different paradigm than the leading imperative/OO design that most are comfortable with.

I don't think most of FP is at all hard for most programmers to understand. Hell, I interviewed plenty of JavaScript devs that couldn't solve a simple fizzbuzz-level question because they didn't know how to write for loops - they only knew how to use `.forEach()`!

The hard bits in Haskell are:

1. Purity. I get it, but it definitely makes things harder.

2. Weird syntax: pervasive currying / single argument functions / lack of grouping syntax / waaaay too many operators, etc. E.g. from your link:

        let singlewords = words contents
            list = f singlewords

This is just a collection of words. No clue of what is a function call, what is a variable name, etc. Allow me to recreate the experience of reading Haskell, if you think the above is reasonable:

    file read lines put
    lines iter while cbr empty
    for do <~> x IO::print

3. General unfriendliness to people who aren't hardcore Haskellers. E.g. suppose I want to match a regex. In Go, Python, Rust, JavaScript, etc. there's one obvious way to do it. Haskell? How about you pick between 8 alternatives... https://wiki.haskell.org/index.php?title=Regular_expressions

There are other flaws, like global type inference & lazy IO, but I think those are the main ones.

From the Haskell snippet, f is just "map read" specialized for lists of Integers, so inlining it would read like this:

  let list = map read (words contents)

The equivalent Python is basically this:

  list = [int(word) for word in contents.split()]

I'm just writing this for the benefit of others here.

> waaaay too many operators, etc.

Haskell doesn't have operators, it's based on expressions consisting of function compositions, some of which can be infix and look like operators.

> This is just a collection of words. No clue of what is a function call, what is a variable name, etc.

That's by design, because all of them are expressions that can either reduce immediately or require runtime data to reduce fully.

> How about you pick between 8 alternatives...

How about you pick any one of those and start using it for real, until either it works well or you hit inefficiencies that send you looking for specific alternatives? It doesn't take much.

The Haskell version parses the contents of the file. One answer also explains how to lazily read the file to process it as it's read, using the ByteString package. I think part of the overhead here is due to this lazy processing, plus the fact that there are basically three string types in Haskell: String, ByteString and Text.

The Python version simply loads the whole file into memory. The equivalent Haskell would be to call Data.Text.IO.readFile. What would the Python version of lazy parsing / processing of a file look like?

There were several Python versions shown at the link and the latter one that I was referring to does not read it all into memory. It loops through each line in only three lines of code, not like 1/4 of a page.

I tried just pasting the code into the window, but never learned how to format HN on mobile. Sorry for the confusion. After the colon there are indents, so 3 separate lines.

    with open('testfile.txt') as f:
        for line in f:
            print(line.strip())
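
As a rough answer to the lazy-parsing question upthread, here is a minimal Python sketch that parses integers as the file is read instead of loading it all first. `iter_ints` is a hypothetical helper, and the temp file exists only so the sketch is self-contained.

```python
import os
import tempfile

def iter_ints(path):
    """Lazily yield every whitespace-separated integer in the file at `path`."""
    with open(path) as f:
        for line in f:                 # the file object yields one line at a time
            for word in line.split():
                yield int(word)

# Demo with a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 3\n4 5\n")
    demo_path = tmp.name

total = sum(iter_ints(demo_path))      # pulls numbers from the generator one by one
os.remove(demo_path)
print(total)  # 15
```

Only one line is held in memory at a time, because the generator pulls from the file iterator on demand.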

I think Haskell seems harder because it's built on another set of abstractions (laziness, typeclasses) where Python abstractions (iterators in this case, plus the with statement for resource acquisition/cleanup) are more common, but I could be wrong (I've been working with Scala for close to a decade now, so functional style looks more familiar than imperative now.)

(I also always forget how to format comments, the ref is here: https://news.ycombinator.com/formatdoc )

> The Haskell solution requires more steps and theory.

Really? The suggested Haskell solution is

    main = do
        contents <- readFile "test.txt"
        print . map readInt . words $ contents

which is no more complex than the Python code. The entire file contents is read as a String into contents. words breaks it into a list of words, and readInt parses each word into an Int. Finally the list of ints is printed.

Yes, it looks all good, until you want to change it. For instance, to make the file name a CLI parameter, you need to understand things like type classes, specifically Monad, Functor and Applicative. You may need a long time to be at ease with these things, and a superficial understanding will not take you very far. In Python, it'll take you a few minutes to do the same thing.
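
For reference, a minimal sketch of what that Python change might look like, assuming the same whitespace-separated integers as the snippet upthread. The helper names `path_from_argv` and `read_ints` are made up here.

```python
import sys

def path_from_argv(argv, default="test.txt"):
    """Return the first CLI argument, or `default` when none is given."""
    return argv[1] if len(argv) > 1 else default

def read_ints(path):
    """Read the whole file and parse its whitespace-separated words as ints."""
    with open(path) as f:
        return [int(word) for word in f.read().split()]

def main(argv):
    print(read_ints(path_from_argv(argv)))

# In a real script you would finish with: main(sys.argv)
```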

What do you mean by "long time"? Why wouldn't it apply just the same to some Python file that defines a Maybe with a bit of unwrap and bind and orElse or whatever?

Instead of things going boom and spewing a stack sometimes they just go Nothing instead, you don't need much more understanding than that. There is a lot of syntax in Haskell that is somewhat hard to understand at first but you don't actually need it to write stuff yourself.
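
A minimal Python sketch of the kind of Maybe described above, to show how little is involved. The class and its methods (`bind`, `or_else`, `nothing`) are hypothetical and come from no particular library.

```python
class Maybe:
    """Tiny Maybe: wraps a value, or nothing at all (illustrative helper)."""

    _NOTHING = object()   # private sentinel so that None can still be a real value

    def __init__(self, value=_NOTHING):
        self._value = value

    @classmethod
    def nothing(cls):
        return cls()

    def is_nothing(self):
        return self._value is Maybe._NOTHING

    def bind(self, f):
        """Apply f (which itself returns a Maybe) unless we are already Nothing."""
        return self if self.is_nothing() else f(self._value)

    def or_else(self, default):
        """Unwrap the value, or fall back to `default` for Nothing."""
        return default if self.is_nothing() else self._value

def parse_int(text):
    try:
        return Maybe(int(text))
    except ValueError:
        return Maybe.nothing()

def reciprocal(n):
    return Maybe.nothing() if n == 0 else Maybe(1 / n)

print(parse_int("4").bind(reciprocal).or_else(None))     # 0.25
print(parse_int("0").bind(reciprocal).or_else(None))     # None
print(parse_int("oops").bind(reciprocal).or_else(None))  # None
```

Failures propagate as Nothing through the chain instead of raising, which is the whole trick.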

This idea that you need to talk a particular way and be able to teach CS theory at uni and whatnot to build stuff in Haskell is probably not very healthy.

By a long time, I mean possibly several weeks of full-time study for an intermediate Python or C++ programmer.

I'm not saying you need any CS theory to write Haskell or that it's super hard. But I think the learning curve is pretty steep, and it's hard to write code without a good understanding of the concepts. Just tweaking until it type checks isn't going to cut it.

Consider this code, generated by ChatGPT. It is supposed to be simple command-line option parsing. I don't think it's obvious what all the operators mean.

Sure, you can try to reuse this and extend it. But I think sooner rather than later you'll be stuck with some undecipherable error messages and you'll have to sit down and understand what it all means.

  -- Assuming the optparse-applicative package, which provides these combinators.
  import Options.Applicative

  data Options = Options
    { optVerbose :: Bool
    , optInput   :: String
    } deriving Show

  optionsParser :: Parser Options
  optionsParser = Options
    <$> switch
        ( long "verbose"
       <> short 'v'
       <> help "Enable verbose mode" )
    <*> strOption
        ( long "input"
       <> short 'i'
       <> metavar "FILENAME"
       <> help "Input file name" )

This is a really good point. I have coworkers that don't really code, but can use ChatGPT to help them put together a Python app that does what they need with some common-sense changes. I don't think I could do the same with Haskell, even with a fair amount of coding experience plus a lot of reading up on Haskell over the years. It may be obvious to those select few who are drawn to Haskell, but I think they greatly underestimate the challenges for the average person. That's the essence of what I've been saying to the parent thread that believes a subset of Haskell will become popular some day. I could be wrong, but I just can't see it.

It is obvious, as long as you know what Functors and Semigroups for custom data types are. If you don't know it, you can still use it freely without fully understanding the meaning of `<>` and `<$>`, because they are written almost as plain bullet points of components to treat as a single whole.
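
For comparison, a rough Python counterpart of the option parser upthread, using the standard `argparse` module. It assumes the Haskell snippet is built on optparse-applicative, where `strOption` without a default is mandatory (hence `required=True`). The description string and `data.txt` are placeholders.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical demo tool")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose mode")
    parser.add_argument("-i", "--input", metavar="FILENAME", required=True,
                        help="Input file name")
    return parser

# Parse an explicit list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-v", "--input", "data.txt"])
print(args.verbose, args.input)  # True data.txt
```

The two versions declare the same things (long name, short name, help text, metavar). The difference is mostly whether the pieces are combined with method calls or with `<>` and `<*>`.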

I'd say a lot more is going on there conceptually than the Python code. Imagine through the eyes of a beginner. You have some main do thing, then you appear to read it all into memory and assign a variable, then you do a bunch of composition to words and readInt and map it to print?

With the python version, you see the file handle as a variable and then for loop on that iterable and you can print each line or parse it however you want. Even when I was learning to program and had no clue what an iterable object was, there seemed to be an obvious three line idiom for processing text files that was easy to use.

I think the issue that many people have with Haskell is the order of expression evaluation is not structured left-to-right and top-to-bottom. At least it's what makes it difficult to read Haskell for me. Compare it with F# (and OCaml family in general):

    open System.IO

    (File.ReadAllText "text.txt").Split()
    |> Seq.map int
    |> Seq.iter (printfn "%d")

It doesn't really matter for simple expressions, but as you keep chaining operations the reverse order gets more difficult to follow.
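
The same reading-order contrast can be sketched in Python, assuming a small hypothetical `pipe` helper (not a standard-library function) that applies steps left to right, much like a chain of `|>`:

```python
from functools import reduce

def pipe(value, *steps):
    """Feed `value` through `steps` left to right, like a chain of |>."""
    return reduce(lambda acc, step: step(acc), steps, value)

text = "3 1 2"

# Nested calls read inside-out: split runs first, sum runs last.
nested = sum(map(int, text.split()))

# The same computation written in the order it executes.
piped = pipe(text, str.split, lambda words: map(int, words), sum)

print(nested, piped)  # 6 6
```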

Maybe I'm completely wrong (as an outsider who has never touched anything ML other than OCaml), but Idris 2 does seem like a "clean Haskell" minus the laziness.

Every time I try to work on someone's Haskell code, I'm confronted with a slew of custom infix operators, and I find my previous experience is completely irrelevant as I learn the new DSL the authors chose to write (which more often than not also involves flexing their brain as hard as they can).

But that's like half of computing.. every new tool the world inflicts on you, configured in Jojo's Awesome Configuration Language, with some arbitrary made up grammar punctuated by Tourette outbursts of special character line-noise.

That, or YAML, then a tool to template and generate the YAML, and then a stack of tools wrapped around that, ad infinitum.

A little learning is the cost of not having to write the thing yourself. On the other hand, hell is the non-composable tower of babel that is computing. Total employment by exponentially compounding incidental complexity.

It's one of many reasons I love Go.

The amount of arbitrary glue, configuration and "cool things" I need to learn-and-forget for every new project is an order of magnitude less than any other language I've used in anger.

I don't know that I really exactly like working with Go but I do have to hand it to them, they did nail this aspect of it.

I don't see how that is language specific. I think it's entirely based on the project size.

Pick up docker or kubernetes and you'll have to learn plenty of stuff. Pick up ory/hydra to do auth in go, same thing.

Check out the age (from FiloSottile) or the rage (Rust port) codebase and they require a similar level of understanding.

Right, now imagine the templated YAML mess implemented in Haskell together with the app, by someone smart enough to liberally use Haskell's most complicated features.

I think it's a sliding scale and Haskell is definitely on the extreme end.

The fact that people do it in the ordinary language, so you can click through to the operator definition and see immediately what it does, makes it a lot less taxing IMO. Even if the code is quite complex, figuring out what an actual function does is 10x easier than doing the same with a magic annotation, which is what you have to do in that situation in most other languages.

That's interesting, because Haskell is the only language in which I can easily read code written by others (and in "others" I include "myself three years ago").

Same! Well, Rust fits the bill too.

Generally, any language with a type system advanced enough.

I'm sure it is super cool and immediately comprehensible for an independent project. I'm actually really fascinated by programming languages like APL that take it even further, I'd like to try some of those out but I don't have a great fit for a project to use with them.

But all I can think of when I see custom operators professionally is how much of a nightmare it would be the second one has to onboard a junior developer onto the project.

It's super cool, but I'm not sure immediately comprehensible. I can't even read my own Haskell code after a few months away from it. You end up using other people's infix operators from the libraries and it's easy to forget what they are.

> Way too many Haskell projects are about Haskell itself, so this was really fascinating to see.

It seems like Haskell is a good match for the requirements in this case, too. Implicit concurrency and enough robustness from the type system that new filters can be deployed directly to production plays to the strengths of the language. It doesn't feel like the usual solution in search of a problem.

I’m very sad that this service is basically on life support and got moved into PHP with everything else.

Haha I was checking the comments precisely to see if this was the case. This happens nearly everywhere a language that isn't Java, Python, Ruby, Go, PHP, or JavaScript is used. IMO it has more to do with tech labor arbitrage[1] than anything technical. Even if a system is punching above its weight, over time, the Weird Language Choice spooks people enough that they get the rewrite bug.

Bleacher Report is a funny example: it used to be a darling example of Elixir, where a migration from Ruby -> Elixir claimed a move from "150 Ruby servers to 5 (probably overprovisioned) Elixir servers."[2] But then management and politics got scared, moved it all to more conventional tech, and the whole system suffered (see this legendary post[3]).

Fred Hebert describes a similar thing happening with a migration from Erlang deployments to Go/Docker/immutable, where you lose some pretty valuable capabilities by migrating to more conventional tech.[4]

I don't see this changing anytime soon -- we came of age when it was viable to attract investment with the promise of tech innovation. These days, those are liabilities because managers misunderstood the "Use Boring Technology" post the way consultants bastardized "Agile" (taking decent advice and misunderstanding it into something wholly different and horrifying). The result is you've got companies with customers in the 1000s using k8s, calling it "simple" and "Boring," whereas that same company would be called amateur if they did things like stateful deploys on-prem.[5]

At least we'll always have WhatsApp.

[1]: https://www.baldurbjarnason.com/2024/react-electron-llms-lab...

[2]: https://web.archive.org/web/20170204160005/http://www.techwo...

[3]: https://www.reddit.com/r/erlang/comments/18f3kl3/comment/kct...

[4]: https://ferd.ca/a-pipeline-made-of-airbags.html

[5]: https://morepablo.com/2023/05/where-have-all-the-hackers-gon...

Love your take on it. From a fellow prog languages enthusiast that matured, I can safely say that you are right.

I've coded in so many languages in my life, but the job market pull from Ruby always drags me back. With time, I began to really love and appreciate what Ruby is.

I still find Ruby a bit more niche than mainstream languages like Java and Python. I bet that if I had >5 years of Java, the Java market pull would be stronger than Ruby's and I'd be doing Java.

I recall hearing about this a while back, and how the team at Meta had implemented hotswapping to deploy new rules, without redeploying the entire executable, even though the rules were expressed in a Haskell DSL.

Unfortunately, I have never seen hotswapping of Haskell code anywhere else. All I can find is this library [0] by Simon Marlow.

Together with Cloud Haskell [1], hotswapping would make it possible to build pretty flexible distributed systems à la Erlang/OTP.

[0]: https://hackage.haskell.org/package/ghc-hotswap

[1]: https://github.com/haskell-distributed/distributed-process

It's worth noting that Simon Marlow is at Meta, and indeed is also the author of the present article.

Wonder if Haskell is still around at Meta these days, does anybody know?

Yes (and so is OCaml), but less than in the past, mostly for specialized and older projects. Most engineers at Meta will not encounter these languages. It's costly to integrate them with the internal tools and arguably not worth the effort. On the other hand, Rust is increasingly popular, well-supported, and approved by the company.

I think one of the most promising pitches for Rust adoption is that it borrowed as much as possible from Haskell/ML without treading into the territory where it becomes "scary" for broadly-C-family language monoglots. A C++ or even Java programmer can look at Rust and think to themselves "this text has a comfortably familiar shape".

You aren't hiring good programmers if they are intimidated by syntax.

The differences between Haskell/OCaml and C-family languages are much more significant than syntax. Rust brings a lot (but not all) of the richness of these languages while staying within a broadly imperative paradigm.

That's neither my point nor OP's point. The soothing feeling of "this text has a comfortably familiar shape" is entirely about syntax, and indeed is only a soothing feeling because such programmer has not seen anything beyond it and is unwilling to get outside that comfort zone.

I do not hire anything who does not get outside of their comfort zone or who is intimidated by syntax they haven't seen. If a programmer rejects Haskell or OCaml by giving me a good critique of, say, Hindley-Milner style type systems†, that's a good programmer I'm willing to hire even though I don't necessarily agree.

†: I've been asked exactly this question during an interview: to critique HM style type systems. As a candidate I felt this style of questions gave me a far better way to display my knowledge and experience than Leetcode style questions.

Do you put any weight on any factors beside the technical capabilities of the tools you're working with? Like, say, how easy they are to pick up or use, etc.?

The way your comment reads, it's like you're saying you'll never hire someone who rejects speaking to you in Esperanto unless they offer a good critique of its grammar.

You’re on Hacker News, so Occam’s razor would pick the simplest explanation, which is the latter.

> I do not hire anything

Today's Freudian slip award goes to...

You do not need that many good programmers.

I've encountered many otherwise solid engineers who are intimidated and/or put off by Nix (the language).

I count myself among them. And I wouldn't say I am "otherwise solid". I am solid. Nix is just a weird custom footgun-laden language in a place where I don't want to have to learn a weird custom footgun-laden language. It has other serious issues too, like the fact that it is declarative makes discoverability extreeemely difficult. What does setting `foo: true` do? You can't go-to-definition on it. You can only hope it is well documented (it isn't) or try to find the place that reads that key ... somewhere... in all of Nix... Good luck.

Hell even if it were a perfectly designed language (it isn't) I still would be put off by it because I don't really want to be a professional full-time packaging engineer.

I think Bazel got a lot right by using a subset of Python. Basically nothing to learn language-wise.

This shallow dismissal pretty much proves OP's point.

For starters, you compared Nix the ecosystem to Starlark, perhaps the smallest aspect of Bazel. But Bazel (the ecosystem) has a horrendous documentation problem as well.

I grant that "Nix" is a very overloaded term, but it seems like you don't know which part of Nix you are even referring to. "Somewhere... in all of Nix" is not something that makes any sense.

I fully admit that Nix has a steep learning curve. Very steep. But I don't think you know enough to give a thoughtful critique.

You're not solid, judging by what you've just written here and in the threads above.

Nix is really weird and ugly tbf.

It’s great at being a declarative configuration language though…. Even if it’s ugly.

I agree, but most programmers unfortunately are not good, so this is the reality we must deal with

Mostly discouraged in new projects.

Looks like the author is now working on Glean (the AI docs/search tool) at Meta, which is also Haskell-based.

Simon Marlow is a famous Haskell developer, alongside the likes of Simon Peyton Jones or Philip Wadler.

Wow, there is a startup named Glean doing something very similar. Founded by Arvind Jain who co-founded Rubrik.

This article was written about the same time when I interviewed with this group. I was very enthusiastic about Haskell but a senior Haskell lead told me I needed much more Haskell experience but I was otherwise an interesting candidate and he offered to put me in touch with the other FB team doing a similar function, but in Java. I was only interested in learning Haskell, so nothing happened. Still it was very cool to be able to talk with a couple of famous Haskell folks.

From my experience as a user, I don’t think “anti-spam” is the best part of FB engineering.

If this article was about "fighting spam with C++ at Meta", would it make the front page? I doubt it. I learnt Haskell thinking that it would give me super-powers no other language has, but in the end my impression of the language was "meh". I like the syntax a lot, and I have adopted the monadic approach to error handling when I do Python, but Haskell itself doesn't give me an edge over other programmers.
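
One way the "monadic approach to error handling" in Python can look is a pair of result types that either continue a chain or short-circuit it. This is a minimal sketch with hypothetical names (`Ok`, `Err`, `and_then`, `parse_age`, `check_adult`), not the commenter's actual code or any library's API.

```python
from dataclasses import dataclass

@dataclass
class Ok:
    value: object

    def and_then(self, f):
        return f(self.value)   # success: hand the value to the next step

@dataclass
class Err:
    message: str

    def and_then(self, f):
        return self            # failure: skip every remaining step

def parse_age(text):
    try:
        return Ok(int(text))
    except ValueError:
        return Err(f"not a number: {text!r}")

def check_adult(age):
    return Ok(age) if age >= 18 else Err(f"too young: {age}")

print(parse_age("42").and_then(check_adult))   # Ok(value=42)
print(parse_age("7").and_then(check_adult))    # Err(message='too young: 7')
print(parse_age("abc").and_then(check_adult))  # Err(message="not a number: 'abc'")
```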

Haskellers are proud of a few projects that use Haskell (pandoc, xmonad, etc.) and they boast about how they can achieve the same thing as i3 but in far fewer LoC. But frankly I don't think #LoC is a good metric for gauging a language's value.

> If this article was about "fighting spam with C++ at Meta"

That would not be news. Haskell is famous for being an academic language without many real-world applications. C++ is famous for being practical.

I do think it's interesting how I've been hearing about how much people love Haskell and how elegant it is since middle school. Now I'm 10 years out of university and it's still notable when something is actually written in Haskell. That's got to tell you something about the language, the tooling, the docs, the community, or maybe some combination of those things.

Probably a combination of those things, but it also says something about the staying power of perception and stereotype in the broader programming community. (I’m not saying that Haskell is super pragmatic and industry-ready, only that the perception of it often lags behind its real usefulness — admittedly in part because other languages work well enough, so why bother to engage with one that requires thinking differently?)

The edge that functional languages give you is expressiveness; the more complexity you have in the business logic, the more that compounds.