News | drihu.com

By jandrewrogers, 10 months ago

Something that has also worked well for me historically is aggressively updating tool chains as an integral part of the development process. I've seen far too many companies de-prioritize upgrading their tool chains until it got to the point where they were so out-of-date that things broke through decay and obsolescence, and often had security vulnerabilities with no easy path toward remedying them. At this point, it usually requires heroic efforts to bring everything up to a vaguely modern standard.

I've adopted a policy that for every significant compiler or build system release, we create a branch that verifies a good build. If it doesn't build without warnings or errors, or fails in testing, that is treated the same as a bug in the code to be dealt with ASAP. Prioritizing this also tends to enforce good build configuration hygiene and automation. Not being able to do this is a bit of an architecture/code smell anyway. This also makes it much easier to incrementally modernize and refactor the code base with new language features -- the tool chain supports the latest and greatest even if the code base doesn't yet.

I also find it difficult to rationalize third-party dependencies these days, they almost all become disappointments over the long-term. The benefits rarely justify the risks. Anecdotally, this policy has not materially reduced development velocity; the scope of useful functionality can usually be replaced with a small amount of code that has good mechanical sympathy with the rest of the code base because it was designed with the code base in mind. I've found in many cases you'll write just as much code trying to interface a third-party library.

That said, for new projects it is often useful to use a few third-party dependencies to plug holes with a clear plan to replace them in the near- to mid-term. Stopgap dependencies have been a reasonable practice in my experience.

By Lutger, 10 months ago

Fully agree, though I feel there are some nuances with third party dependencies. Consider the following:

Some third party dependencies are so widely adopted and supported, they are on-par with the main framework you are using in terms of support and development. They often don't lag the main framework in supporting new versions for long. I don't think these kinds of dependencies have the same drawbacks. You only have a couple at this level anyway, and should choose them carefully.

There are also dependencies that solve one particular problem well and very narrowly couple to a specific feature in your product. If they become problematic over time, it is easy to just replace them or even rewrite it yourself. Specifically, you could select these judging how well you could maintain the sourcecode yourself in such an event. Often, these are easy to extend as the focus is so specific. You can consider just adding it to your codebase and thinking of these dependencies as your code, except that it is written by someone else.

By wutwutwat, 10 months ago

> I also find it difficult to rationalize third-party dependencies these days

Your home grown auth code doesn’t have the minds and the scrutiny of security experts like the go-to third party auth lib everyone uses does, nor does it have the free pen testing, security fixes, maintenance, optimizations, refactors, ever collaborative maintainers, or corporate sponsorships.

Auth is a solved problem by minds much smarter than my own. I’m glad to outsource things like that to third party deps.

By izacus, 10 months ago

Well, it also doesn't have LDAP dependencies in the Log stack to cause bunch of those issues in the first place.

And while I kind of agree with you on principle, I've also seen way too many shoddy OSS libraries from "experts" that I had to rewrite to really make it a no-brainer statement.

Just like OP, I'm leaning against shoveling in dependencies if I can help it as I age.

By mattgreenrocks, 10 months ago

The issue I take with this stance is that people reading will put auth in the "too scary to implement" bucket mentally, and think it's fine to pay the Auth0 tax on every single project, even one with a few users.

This is a great resource to learn how to write it yourself, and things to take into consideration: https://lucia-auth.com/

IMO, this is one thing to implement carefully if need be (vs avoid altogether, as is the case of cryptography).

By wutwutwat, 10 months ago

I should have clarified, I wasn't talking about third party auth SaaS providers, I was talking about auth libraries you'd use in your codebase.

I also left "auth" ambiguous and didn't specify "authentication" or "authorization" because my original comment applies to both imo.

By bsder, 10 months ago

> Auth is a solved problem by minds much smarter than my own.

Is it? It seems like all the outsourced auth has lots and lots and lots of problems, issues, etc. along with being remarkably expensive.

I'm not saying I, personally, can do better, but it seems like auth is far from a solved problem. Especially since it seems like the difficult part of "auth" is actually "customer support" rather than any technical issue.

(I would argue that the lack of an open-source "Auth in a Box" seems to also argue that auth is far from "solved".)

By vinckr, 10 months ago

There are many open source "auth in a box" projects that you can self-host, such as Ory (https://github.com/ory/), Zitadel, Keycloak, and many other small projects. They all have small differences but for small to mid scale projects its definitely manageable.

By wutwutwat, 10 months ago

I was talking about third party libraries (library code), not providers (okta)

By sam_lowry_, 10 months ago

The evil in the details:

Spring Security is complex and brittle, I have seen a push to use underlying libraries e.g. nimbus in real-life projects.

By cowsandmilk, 10 months ago

I would go beyond auth to areas where there historically have been security bugs. Parsing JPEG files for instance, I can write a parser and can probably choose a modern safe language to avoid many classes of bugs, but a popular open source library is more likely to have been fuzzed by dozens of techniques from multiple companies and universities. The code is more resilient because it has been out there to be experimented on in public.

By hombre_fatal, 10 months ago

All of those auth abstractions do a lot more than what I want. And because of that, they are far more complicated. And they can involve so much abstraction that you don't even know exactly how your own system works. And the lack of understanding can lead to worse practices.

Probably because auth abstractions are so complicated is why developers started thinking auth is hard when all it was doing was putting a uuidv4 in a session_id cookie.

Like writing SQL without an ORM, developers have been psyop'ed into thinking things are so hard that they shouldn't even try.

By pdhborges, 10 months ago

> Auth is a solved problem

It is solved until you have to integrate with third party home grown implementations and providers that implement the specs except for a little bit of behavior that is not in the spec.

By wutwutwat, 10 months ago

I see you've worked with oauth2 providers, my friend. Welcome to the circus :)

By skydhash, 10 months ago

Is auth that difficult? The truly difficult problem is cryptography and that's the code you really don't want to write. But most authentication schemes are well-known. No need to import a library, and if you do, you should be prepared to vendor it in.

By wutwutwat, 10 months ago

> most authentication schemes are well-known

so why maintain your own implementation of the scheme/spec when a community already does, as well as handles the vulns as they pop up and generally has more eyes looking (and fixing) issues?

Here's how I look at it; Trust my own NIH implementation of a spec and hope a user reports a vuln instead of it going undisclosed/unnoticed in the wild for years, or use a community lib that is always being scrutinized and vetted and has security professionals involved. It's a no brainer for me.

By lifeisstillgood, 10 months ago

Honestly I’m not convinced it is a solved problem.

I have been trying to get my head around OAuth, zero trust, Fido, SAML et al - and I am not sure I get it.

It’s not that say “JWT tokens might be dangerous” it’s how to hang it all together in a sane manner.

For example I was listening to Security Cryptography Whatever podcast and one of the guests (Chrome dev?) said obviously don’t trust everything that comes down an established TLS connection but verify / auth each request.

That’s a pretty tall order and seems to nullify quite a lot of established frameworks

But realistically I think a lot of frameworks and standards serve different needs than “smallish number high value clients”

By jollyllama, 10 months ago

It all depends on the attack surface/vectors. In some cases, YAGNI. It all comes down to what is more likely to be your biggest PITA: the steady time expense of updating all your deps to accomdate that patched auth lib OR the cost of dealing with whatever breech you're likely to incur is.

By quonn, 10 months ago

> Auth is a solved problem by minds much smarter than my own.

Imagine a doctor making such statement about their field of work. Do you have a degree?

By raincole, 10 months ago

This is exactly what a responsible doctor is supposed to do.

By xtreme, 10 months ago

Doctors refer patients to specialists all the time.

By quonn, 10 months ago

Yes: to specialists that have typically another 5 years of formal education on top. They do not refer patients for relatively trivial topics covered in the first year of a bachelor‘s degree.

By wutwutwat, 9 months ago

Ah, so the cryptography involved in authentication schemes is a first year CS trivial task. Hashing, salting, optimization, defending against side-channel attacks, looking for backdoors introduced by state actors, developing standards used by the strongest militaries on the planet that protects literally enough nuclear weapons to destroy all life on the planet, that's old news by week 3 of the course at any community college. The folks who went overboard with those Phds in cryptography or became specialists in elite OpSec or post-quantum cryptography really did waste their time and money, and someone should tell the CIA they don't need to go hunting for the best cryptography hackers on the planet, just grab a year one CS brogrammer and they can totally make sure all is secure in the ToR network.

Tell you what. I'll never become a specialist if you never become a doctor. I'd rather you not refuse to admit that there are better doctors for a task than you if my health is in your hands.

By szvsw, 10 months ago

> I also find it difficult to rationalize third-party dependencies these days, they almost all become disappointments over the long-term. The benefits rarely justify the risks.

What kind of software development do you do? As someone who is in ML/data science and works primarily in python, occasionally TS/React, at a small company (about 5 FTE, one round of funding), that sentence is so completely foreign to me. Again, this might just be because I am working primarily on ML systems in Python on relatively small teams (also in academia at the same time), where the most basic building blocks like Pandas (or Polars), PyTorch, XGBoost, SKLearn, etc are so essential and so it’s pretty hard to imagine ever being at a scale where they would be feasible or worth it to replace with our own code… and anyways if support for libraries like those collapses, the only explanation is that the entire landscape has shifted so much that it’s hard to imagine the underlying tech of the company not also having naturally shifted with it…

Maybe I am misunderstanding how you are using “3rd party,” or maybe you are just operating in a completely different staffing and financial context which I don’t have experience with.

Just curious for more details about what kind of world you operate in where that philosophy is viable!

By jandrewrogers, 10 months ago

I primarily work on high-performance analytics data infrastructure, mostly written in C++ with a bit of Python and Rust at the periphery. Classic performance-engineered systems software. The only real dependencies are the compiler tool chain and Linux. The biggest issue with most popular libraries is some combination of poor performance, poor scalability on large hardware, and/or a software architecture that is not well-suited to performance-engineered systems. While there is a complaint that external libraries tend to significantly increase build times, that is not the motivating factor here.

I've lost count of the number of times runtime behavior and performance issues have been traced back to (for us) far-from-optimal design choices in third-party libraries that probably seemed like reasonable choices to the library authors. Particularly in open source, you see remarkably little investment in performance engineering or scalability, so it is always straightforward to write your own implementation that is many times faster on the same hardware. Even basic things like popular data format conversion libraries are surprisingly consistently suboptimal. In most cases, the kinds of changes we'd want to make cannot be upstreamed to that library; most open source libraries heavily optimize for simplicity and the widest set of users to the exclusion of maximum performance. In an age when hardware can deliver >100 GB/s of throughput, many popular software libraries deliver <1 GB/s per core even though far better can be achieved with a modicum of diligence.

Another issue is the impracticality of threading fine-grained metrics and observability through a hodgepodge of libraries that were never designed to be used that way or to work together. This is a big deal if you want to heavily automate operations.

There is also a real issue with supply chain risk in terms of both quality and security. The vast majority of open source libraries are implemented to a lower standard of quality and security than the rest of the code base. And those libraries are often not really designed to be testable to the extent that would give similar confidence in the implementation quality. Combine this with the reality that the software needs to be deployable in environments that require higher than average assurance makes this a business risk.

As for the manpower cost, it has been low in practice. An engineer-month here, a few engineer-months there, with the benefit of better performance, more optionality, and much easier maintainability. In most cases, we would only use a tiny subset of library functionality regardless, so it adds a large surface area for a narrow benefit. Much of this code is also reusable. The existing open source libraries are very useful here because they provide something to measure against.

I also like that this forces a discussion of actual requirements for that functionality. Far too much software makes their requirements whatever a library offers.

I arrived at this point gradually over many, many years. The practical maintainability and opportunity for optimization have been key. In far too many cases, it turns out that a library dependency could be replaced with a couple hundred lines of code that actually worked better and which matched the design and style of the rest of the code base. There is a learning curve but once you've written a common class of library once, it is straightforward to do it again.

By szvsw, 10 months ago

Thank you for the detailed feedback! Great insight and makes a lot of sense.

> Particularly in open source, you see remarkably little investment in performance engineering or scalability, so it is always straightforward to write your own implementation that is many times faster on the same hardware. Even basic things like popular data format conversion libraries are surprisingly consistently suboptimal. In most cases, the kinds of changes we'd want to make cannot be upstreamed to that library; most open source libraries heavily optimize for simplicity and the widest set of users to the exclusion of maximum performance.

Yeah, I have even encountered this myself once or twice - i.e. re-implementing some optimization/search algos, sampling processes, or numerical simulation methods to be more tightly coupled to our requirements/context for great performance gain - despite not being in a context where heavy performance engineering has become necessary yet. This obviously overlaps with your point about using only a tiny subset of library functionality.

> Another issue is the impracticality of threading fine-grained metrics and observability through a hodgepodge of libraries that were never designed to be used that way or to work together. This is a big deal if you want to heavily automate operations.

Yes, I personally have not ever run in to this because of the relatively small scales I am operating at but makes perfect sense. Same goes for the security concerns. Obviously these are more serious concerns for larger and longer timescale projects (as discussed in the original blog post).

> engineer-month here, a few engineer-months there

I've only worked in contexts where there are no more than 5 simultaneous active devs on a project, so the long-term investment for longer term gains is probably in higher tension with the short term needs. obvious red flag for increasing technical debt/deferred maintenance/downstream complexity, but still can be difficult to justify diverting 20% of team's total productivity for a long term investment when there are short term demands. But always good to be reminded that those long term investments shouldn't be rejected outright and should be more seriously considered.

> I also like that this forces a discussion of actual requirements for that functionality. Far too much software makes their requirements whatever a library offers. I arrived at this point gradually over many, many years. The practical maintainability and opportunity for optimization have been key. In far too many cases, it turns out that a library dependency could be replaced with a couple hundred lines of code that actually worked better and which matched the design and style of the rest of the code base. There is a learning curve but once you've written a common class of library once, it is straightforward to do it again.

This is a really excellent and concise way of summarizing it. I'm definitely going to try to push this with my team more often and now have a great resource for how to articulate it. Thanks again for taking the time to explain your perspective.

By crabbone, 10 months ago

You chose the field with one of the worst rates of churn and brokeness of third-party libraries, and by some kind of magic managed to avoid it? You should be buying lottery tickets by a truckload!

All the projects you listed (Pandas, PyTorch, SKLearn) have a huge list of compatibility issues, on top of compatibility issues of Python itself. And I know this because I have to support various research projects, which typically use this stuff. My estimate is that the shelf-life of a project using just these three will not exceed 3 years in >90% of cases. My typical situation with these projects is that I'm called in because some former PhD student, who graduated a year or two ago left a project that now another student is picking up. And nothing works. Results cannot be reproduced. The dependencies cannot be installed. And so on.

And then, depending of the amount of effort I'm allowed to put into fixing the project, I end up either crafting some sort of a spec that tries to find a set of versions of used libraries that seem to make the project work again. Or, more commonly, have to vendor some code and add fixes on top to make the packages fit together at least somehow. Sometimes I even end up repackaging someone else's code (a dependency), where the version used before cannot be found anymore, or I cannot figure out the combination of version that could possibly make the program work.

I hear that it's worse in JavaScript world. And, my experience with LaTeX packages has also been marred by similar problems (but not anywhere near the extent of how bad it is in Python).

On the other hand, I had to build absolutely atrocious (math operations on a mix of signed and unsigned ints, modifying const char etc.) C code (pre-K&R, the kind where they write function argument types after parenthesis), and it was much easier to deal with than any "modern" Python (i.e. 3-4 years old).

If anything, Python is the poster child of the problem of unreliability of third-party dependencies.

By szvsw, 10 months ago

Oh I have 100% run into dependency hell many times in the Python ecosystem! I am well aware. But the overall force multiplication of that ecosystem has far outweighed the cumulative day or two per year spent in dependency hell.

As someone who works in both academia (currently a PhD student!) and industry (at the aforementioned startup), it just seems obvious to me that the situation you are describing is less of an innate issue with incorporating dependencies and more an issue with the lack of training on environment management / containerization and lack of experience with software development in general that you see in academic projects. In a relatively competent professional shop, these sorts of issues in my experience crop up occasionally but there are so many mechanisms available to (a) mitigate them from appearing in the first place and (b) have rough plans or strategies in place ahead of time if they do crop up and (c) continuously address them when they do appear, which prevents it from ever blowing up into a true dependency hell. It seems like being judicious about when you decide to use dependencies and how you manage them is sufficient, as opposed to actively avoiding them altogether...

Anyways to me it definitely is the casethat Python has loads of footguns/risks in re: dependencies when someone is developing without thinking about dependency management/long term planning, which is usually the case with academic projects, but I don't think these dangers are so bad when the user is actively aware of them and plans for them - or not bad enough to make the risks outweigh the rewards of adopting the dependencies... it definitely means that the ecosystem is elevating risk levels, but I would also think that there are organizational/management issues which contribute significantly to (and compound with) the issues you've described in an academic context.

> My estimate is that the shelf-life of a project using just these three will not exceed 3 years in >90% of cases.

That seems reasonable; it's problematic from the perspective of the replication crisis in the sciences, but from a different perspective - if the code/project has no users or active developers for 3 years, at what point does it just become cruft? Is it okay to let code die? I don't love this view, but my PI strongly feels that no matter how excellently or poorly engineered a GitHub repository is, it doesn't matter from the perspective of sciences - the only thing that matters is if (a) the methodology is explained thoroughly enough in the literature you produce (in which case the repo can be trash and unrunnable in a few years - if the method is useful enough it will be copied and expanded on etc by others naturally or just completely independently, and 3-5 year old methods are out of date anyways) or (b) it goes on to be actively used by you, your peers, or people in industry, in which case the literature does not matter (but by definition the repo is not trash as it is in continuous use/active development/evolution). I think their are some flaws in this view but I also see merit in it.

Anyways, having said all of this, I hate it when one of my peers at the startup I work with wants to pull in some dependency for a relatively straightforward search algorithm they don't understand when I think it would be easier to just implement it ourselves, so I'm contradicting myself :)

By crabbone, 10 months ago

> if the code/project has no users or active developers for 3 years, at what point does it just become cruft?

At the time the hardware cannot possibly support it. So, we are talking about something like 50 years. For more high-level languages the deadline should extend further (i.e. languages like C should, in principle, have shorter shelf life because they have stronger dependency on the hardware they run on). This is how, ideally it should work. I.e. there's no reason any Python code should be dead by now: Python simply hadn't been around long enough for that to become a real, impossible to solve issue. Ironically, C programs have far better shelf life than Python. About an order of magnitude better.

Second deadline is when businesses will stop using project's code if it doesn't have enough shelf life. This is somewhere between 10 and 5 years. Government / military will want the longer shelf life (and usually have enough bargaining power to compel the developers), while the rest of the industry will suck it up and accept the inevitable death of their software. But, at some point, it no longer makes sense to sacrifice the profits just to keep up with the trends. Eg. you can still find businesses that haven't upgraded from CentOS 7 (released in 2014), and are doing OK. A lot of military software uses 90s or even 80s tech. Banking is also a special case here, as well as some embedded systems that can be many decades old eg. railway networks.

I wish that anything below that threshold wasn't commercially viable... but, obviously, a lot of businesses find enough resources to upgrade more frequently. I feel like these businesses make a disservice to the broader community of users depending on the same technology. In some ways, their ability to upgrade more often is also a way to beat the competition. Unfortunately, more often, it's not just the competition that suffers from this, but also businesses in completely unrelated fields.

----

Now, if talking about Python specifically, there's quite a bit of difference based on particular area the code is written for. Web and data science are the areas with the worst shelf life, and require the most effort to simply stand still. These are the areas where projects typically have more dependencies, attract a lot of attention (and so there's more drive to iterate faster on those dependencies).

But, on Python's side, there are plenty of bad things happening regardless of what individual teams can do. For example, when talking about support of software like operating systems, this includes support of a large number of individual (optional) components that need to work together. I.e. the LTS Linux distribution will have a set of packages that are (allegedly) tested to work together, and if you want to have a reliable baseline for your own software, you just have to draw from a particular distribution and you should be fine. PyPI, or more broadly, Python community doesn't have that kind of concept. There's no set of dependencies that can be guaranteed for any length of time at any time. Anaconda tried to be more like that, but they were overpowered by the users' desire to get new things out of bound.

Another problem with how Python versions packages is that it imposes virtually no restrictions on publishing. If a package author wants to upgrade ten major package versions in a day, there's nobody to tell it to stop. Want to publish an older version than the last one published? -- sure, go ahead. And, of course, on top of it, interface breakage in versions where that's not supposed to happen etc. The poor stewardship of PyPI is a self-fulfilling prophecy: package maintainers realize that the versioning is, essentially, broken in Python, and in order to prevent their packages from breaking, they narrow down the range of acceptable dependency versions as much as possible (and often only accept the single version of the dependency they tested with). This, in turn, results in a situation where upgrades are impossible without massive breakage: incremental upgrades become impossible, because no part of the dependency tree can be upgraded independently. On the other hand, if a single dependency in this tree goes stale, the whole tree is stale.

Another factor to this is that dependencies which accept a wider range of transitive dependencies are harder to install using currently available package managers / installers. I.e. because pip / conda will spend a lot more time trying to find a solution for dependency graph with more potential candidates (due to more loose version requirements), developers are prone to try and expedite the process by narrowing down version constrains. This is especially painful in conda world, where the solver can take many hours (or days in pathological cases) to solve such installations. And so developers have to choose between being more friendly to the user (and supporting multiple transitive dependency versions) or being more hostile to the user (by narrowing down supported versions) while speeding up their deployment.

Yet another factor is the design of Python module system that is unaware of module versions. I.e. if only it was possible to specify the version of the imported module at import time, a lot of problems would've gone away. There would've been no need for virtual environments. Subsequently, there'd be more incentive to ensure compatibility of various independent libraries and programs, which would put pressure on package maintainers to dedicate more time to integration testing and slow down the release cycle.

By jvans, 10 months ago

It would obviously be crazy to roll your own numpy or lightgbm. But there's plenty of bloat in the library space where the cost of integration and upkeep is far more expensive than a simpler implementation of your own

By all2, 10 months ago

I do SW in a hardware world that restricts a lot of dependencies. We vendor some, but mostly roll or own libraries. For time intensive things peripheral to our work we will rely on 3rd party modules (ie, network stack or UI).

By t-writescode, 10 months ago

> I also find it difficult to rationalize third-party dependencies these days

I would love a deeper insight into what you mean by this. I can't imagine you're rolling your own http server, database communication layer (as in like, postgresql's jdbc driver), json parsing engine, etc.

By perlgeek, 10 months ago

The original piece was on developing with a long-scale time horizon (10 years), so you could go with software that has been very stable over comparable time scales.

For example, there are corners of the Perl ecosystem that have been really stable for a long time. Perl itself is very conservative with backwards-incompatible changes, and DBI, the module that abstracts away database interactions, has been around for a long time (the oldest changelog entry is from 1995, but that was for version 0.58, I'd be surprised if it weren't even older).

Of course, then you don't get the fancy, modern libraries, but going with DBI plus DBD::Pg is probably more efficient than coding your own, and stable enough for a ten-year horizon.

Another approach might be to move the really critical pieces of your software to a much simpler interface. For example, HTTP/2 looks much more complicated than HTTP/1.1, so your software might only implement (a subset of) a HTTP/1.1 server, and then you can use an off-the-shelf reverse proxy to expose it to the "modern" internet. This reverse proxy then is easier to replace, because it doesn't handle the core logic of your application.

Or you go really old-school and use CGI, which is an even simpler interface than writing a HTTP server, and any HTTP server that implements CGI would do as a frontend.

By paulryanrogers, 10 months ago

> Perl itself is very conservative with backwards-incompatible changes

So much so that Perl 6 became a whole separate (doomed?) language after trying to change too much.

By throwaway290, 10 months ago

JSON parser, http server are provided by many runtimes and if a runtime does not provide something it provides primitives that make it often trivial to build

By lmz, 10 months ago

I'm not saying it's impossible, but writing your own HTTP server ("trivial to build"?) to avoid bugs in other implementations doesn't sound like a good idea to me.

By rurban, 10 months ago

I had to write one recently. One third page of code, multi-threaded. Really trivial, and no security nightmares as with overblown 3rdparty deps.

By throwaway290, 10 months ago

Read more carefully. We talk about "third-party dependencies" not about "other implementations". Using HTTP server from your runtime is not against the recommendation of avoiding third-party dependencies. Your runtime is usually not counted as "third-party dependency", normal runtimes don't have breaking changes every other moon

(Even if you need to implement something like that from scratch it may be OK, your requirements don't need all of the functionality a third package includes because someone else needed it)

By lmz, 10 months ago

It's the "trivial to build" thing I have issues with - especially for the HTTP server. Not runtime vs third party.

By pdhborges, 10 months ago

Web dev context here. Just from the top of my head: django-constace breaks storage format with no built-in no downtime transition, django-modeltranslations, want to add translations to a heavily used field and you are using a database without transactional ddl good luck with that, django-importexport used a savepoint per row making it useless if you want to import csvs with a few thousand rows, requests doesn't officialy declare thread safety, I could go on and on ...

By graemep, 10 months ago

Most of my work is Django and I think its fairly easy to keep dependencies low because Django is very batteries included.

A good case where people use an unnecessary dependency is calling REST (or other http) APIs. If there is a wrapper, people tend to use the wrapper, which is not really needed if the in many/most cases. AT the most use something like requests which is what a lot of things use anyway.

> django-importexport used a savepoint per row making it useless

That is pretty crazy.

It is also another example of something that is easily avoided.

By cpeterso, 10 months ago

> I've adopted a policy that for every significant compiler or build system release, we create a branch that verifies a good build.

Mozilla's Firefox CI builds with multiple compilers, including clang trunk and (IIRC) Rust Nightly compilers to identify code issues that will need to be fixed before Firefox can update to the new compiler versions and to catch regressions in the compilers themselves.

By MetaWhirledPeas, 10 months ago

> I also find it difficult to rationalize third-party dependencies these days, they almost all become disappointments over the long-term. The benefits rarely justify the risks.

I think I understand your sentiment here. For me the risk is that you are not just trusting the original author, you're trusting all future maintainers, especially if you follow your first (and good) suggestion of "aggressively updating tool chains".

The other risk is what you are mitigating by aggressively updating: things break through decay and obsolescence. That can usually be mitigated through updates, but if you introduce a dependency that is eventually abandoned and not given a quality fork then you are suddenly in need of solving this dependency in an entirely new way.

By vv_, 10 months ago

> I also find it difficult to rationalize third-party dependencies these days

This depends on what kind of dependency you're talking about. For example using Zephyr RTOS significantly reduces time to market and it brings in a lot of 3rd party libraries that are written specifically for it. In other fields this might differ, but fields like Web development are a nightmare to begin with.

By Olreich, 10 months ago

Tool-chain dependencies are still dependencies. If you need autoconf, cmake, python, perl, and gcc to build, you’ve made yourself a giant pile of garbage to deal with compared to just having a bash script that you spent two days getting into shape to send incantations to gcc.

By jcarrano, 10 months ago

I had the same experience with Buildroot. The project was stuck at an older release and it was kind of a time-bomb: as time went on it became harder to update because customizations kept piling on but at the same time there were obsolete packages (some with CVEs) which had to be upgraded ASAP.

By rramadass, 10 months ago

> Something that has also worked well for me historically is aggressively updating tool chains as an integral part of the development process. ... I've adopted a policy that for every significant compiler or build system release, we create a branch that verifies a good build. If it doesn't build without warnings or errors, or fails in testing, that is treated the same as a bug in the code to be dealt with ASAP. ... This also makes it much easier to incrementally modernize and refactor the code base with new language features -- the tool chain supports the latest and greatest even if the code base doesn't yet.

Good Point.

By withinboredom, 10 months ago

I worked at a place that required vendoring dependencies and had to be code reviewed just like it was your own. In fact, we were responsible for fixing any bugs in them ourselves. Sometimes this was just a matter of opening an issue, sometimes it required backporting the fix to our version, and sometimes it meant we took over a fork because the dependency was no longer maintained.

There were too many cases where I looked at the PR preview to do my own code review and decided to just write it myself. There is a ton of shitty code out there. More than you'd expect.

By db48x, 10 months ago

That is the only sensible way, in the end. Ultimately you have to fix the bug in your product, even if the bug comes from a library. Users won’t generally like to hear you make excuses of that kind.

By BobbyTables2, 10 months ago

I’m amazed at the “big name” open source projects that effectively accept PR requests without review nor testing.

Open-source feels like a cheap buffet at times…

By dartos, 10 months ago

I mean… you still pay at a cheap buffet.

By bigfatkitten, 10 months ago

You're either paying at the counter or you're paying in the bathroom.

By dartos, 10 months ago

At a cheap buffet, probably both

By kidneystereotyp, 10 months ago

[dead]

By jcelerier, 10 months ago

We went with Qt, CMake and modern C++ for https://ossia.io in 2013 knowing that it would be a long term effort for an extensively extensible linux/mac/windows desktop software aiming to do real-time audio, visuals and networking and so far this "classic" stack keeps on giving and allowing me to ship regular features and improvements, here's to the next ten years :) in the meantime I can't count how many techs and frameworks I've seen come and go but these are here to stay.

By sroerick, 10 months ago

Coming out of python land and spending a bit of time working in Emacs Lisp was kind of a breath of fresh air. Working with the org roam library, I’d see a lot of complaints about how it hasn’t been updated in two years. This was of course, a feature and not a bug. I really did feel a prevailing sense of calm that libraries would not drop out from under me, that I was just running basically all my own code or borrowed code, and if it works, it would continue to work.

This contrasted wonderfully with my experiments with Gatsby and Node, which I foolishly deployed to client websites not knowing that as a result I would be doomed to years of deployments breaking and nightmare library updates.

Of course, the trade off here is that with the exception of a few wonderful libraries like org-roam, basically everything, my own code included, shipped basically broken out of the box

By suzzer99, 10 months ago

> Write super boring code. Write naive but obvious code. “Premature optimization is the root of all evil”. If it was too simple, you can always make it more complex later. And that moment might never arrive. Don’t write clever code until you simply have to. You will not ever regret writing code that was simple.

Hang this in the Louvre.

By globular-toast, 10 months ago

But be very careful not to mistake easy for simple. Too many people think that simple means easy when in fact easy often leads to the most complex code.

I don't like that clever is thrown in here. Your software architecture should be clever. It should be clever in that it allows you to write simple code.

Three simple functions in three different layers of an architecture is better than one function that couples various layers.

See Rich Hickey for the difference between simple and easy: https://m.youtube.com/watch?v=SxdOUGdseq4

By suzzer99, 10 months ago

Yeah for sure there's an art to it that only comes with experience. But you better be 100% sure you need those three layers. My first big project as a developer was on an over-designed reporting/query-builder application that had like 7 layers of of abstraction between the browser and the database - stuff like just in case we ever want to switch from Oracle to Sybase, unnecessary connection factories, Java controller classes for every type of html form field, etc (still have no idea why we needed a back-end CheckboxController).

I ended up maintaining that thing and it was a great lesson in premature optimization. The bottom 3 or 4 layers remained a grey box that I did everything to avoid touching. Just changing the front-end query-builder flow from a series of forms to a flat panel that could do everything in one step involved ripping out half the back-end code. All those layers of abstraction and none of them considered a form-flow change, which is exactly why I never assume anything now.

It's so much easier to add a layer than remove an unneeded one, which can be practically impossible on a complex app.

By globular-toast, 10 months ago

Nobody is ever 100% sure. The only thing you can really be sure about is the software will be required to change in ways you have not yet imagined.

I've become wary of the "you won't need it" attitude. In my experience, tomorrow rarely arrives. If you don't put work into your architecture up front then you'll just end up building a ball of mud. It's even worse if you've got junior developers who very much code by example.

By kmoser, 10 months ago

I agree 100% with the section on documenting your system and code. It's sobering to consider how many HN posts about software development contain comments to the contrary, indicating that documentation (and heavily commenting your code) is useless.

It seems the longer you've been developing software, and/or the bigger the project, the more you become a fan of documentation.

By elric, 10 months ago

My previous customer insisted that documentation (Javadoc in particular) was Bad and should be Avoided. Because "code should be self documenting". It's such a stupid fallacy. I could not convince them to even add the bare minimum of class-level documentation (explaining what the purpose is of a given class and how it fits into the bigger picture). This is all fine and well today, when you're working on that code base. But three years down the line when you have to go back to fix a bug ... good luck remembering how the ConfabulatingFooService relates to the ChristmasLights system.

By OtomotO, 10 months ago

Documentation can be utterly useless, if taken to the extreme.

Like e.g. in Java it was common to have a comment on a simple (!) setter, telling you "this sets X"

No shit, captain obvious! Never would have guessed that.

I like a combination of "literate programming" (in a light form at least), that leads to readable, self explanatory code that's still fast and well (not prematurely) optimized.

But reading docs about the most trivial things: Brrr!

By elric, 10 months ago

> Like e.g. in Java it was common to have a comment on a simple (!) setter, telling you "this sets X"

This has never been common in any sensible environment. Only on projects with project leads who had zero clue about anything, and with silly catch-all rules such as "EVERY METHOD MUST BE DOCUMENTED!". This predictably leads to shitty documentation.

   /**
    * Sets foo to <code>foo</code>
    * @param foo new foo value
    */
   public void setFoo(int foo) {
     this.foo = foo;
   }

No one sensible does this. Some IDEs might auto generate this garbage. In which case that should be disabled.

By OtomotO, 10 months ago

Wholeheartedly agree, but there is a lot of non sensible software out there ;-)

By rcxdude, 10 months ago

It's probably the largest fraction of javadoc by volume I've seen.

By elric, 10 months ago

I'm sorry to hear that you've been plagued by shitty codebases. Have you considered trying to improve them? Or maybe try looking for greener pastures.

By kmoser, 10 months ago

I'll gladly deal with a few (or even many) useless "this sets X" comments if they come along with helpful comments for other non-trivial methods.

By medo-bear, 10 months ago

> Like e.g. in Java it was common to have a comment on a simple (!) setter, telling you "this sets X"

Given the prevelance of mutating functions in Java this is welcome

By OtomotO, 10 months ago

No, it's not. Because it doesn't add anything.

Just document if something is mutating and what it mutates.

Or simply name your functions accordingly.

Like of your setter is doing a HTTP call, maybe name it so that it's obvious.

By ChrisMarshallNY, 10 months ago

> I’ve personally been burned on Python by the last bullet point where one of the dependencies required version 3.14 or less of module such and such, and another dependency needed 3.15 or higher.

This is what people used to call “DLL Hell,” in Microsoft Windows.

COM was supposed to fix that, but I don’t think it worked especially well.

By pjmlp, 10 months ago

COM has nothing to do with fixing DLL hell, it is as old as OLE, and uses DLLs as well.

COM fixes having a cross language OOP ABI, that is all.

The things that were supposed to fix that were:

1- Version resources, which hardly matter, because the app loading them has to validate the versions itself

2- .NET, hence the Global Assembly Cache and having the version as part of the Assembly (DLL) lookup. Still has issues if the lookup rules were badly configured on the app.

3- Application manifests, bringing the .NET ideas to Win32

4- Registry Free COM, extension of app manifests, allowing direct lookup of desired COM libraries without going through the registry for the version

5- UWP sandbox, didn't took off, only what is inside the sandbox is searched for

6- Easiest one, don't put stuff all over the place, only search inside an application specific directory

By db48x, 10 months ago

Rust fixes the problem at the language level by allowing your dependencies to use different versions of their dependencies. It can be a little rough on the size of your binaries, but you can end up with as many different versions of the common dependencies as you need.

By Animats, 10 months ago

That only works if the dependency doesn't expose types used by multiple other dependencies.

I keep running into this with the Rust 3D graphics stack. That's winit (windowing), wgpu (wrapper for Vulkan, Metal, WebGPU, etc.), egui (2D menu overlay), and support crates such as wgpu-egui, wgpu-profiling, tracy-profiler, and glam (vector and matrix math). Every time one of these has a breaking change, it takes 4 to 6 weeks before the whole stack works without patching. Then I have to fix three more levels of my own.

By db48x, 10 months ago

Yep. In principle you can examine the crates before you use them and pick ones that won’t ever expose you to that pitfall. In practice, of course, things are not always so ideal.

But at least if you end up with incompatible versions then the compiler will step in and prevent your program from compiling, instead of allowing it to crash mysteriously later on. It’s frustrating, but not half so frustrating as it could be.

By Animats, 10 months ago

Yes. Cargo and the Rust compiler do catch clashes.

A big advantage of that is that you seldom have to do a clean build in Rust. Make-based systems seem to require regular "make clean" cleanups, but Cargo has enough smarts to really know when it can avoid recompiling.

By ChrisMarshallNY, 10 months ago

The issue is probably the same as with COM. COM introduced versioned “packages,” inside of the library, but that meant that library providers needed to package up multiple versions, as part of their CD system.

In practice, a lot of orgs just stopped including deprecated versions.

By db48x, 10 months ago

Yea, that was a mistake. Rust handles it transparently; if the program you write happens to depend on two versions of something, it just downloads them both for you.

By ChrisMarshallNY, 10 months ago

How does this work with linking?

I would assume that both versions publish the same API, so does Rust add a mangling to the link reference?

One of the things that made DLL Hell so bad, was that you would have no idea that you were calling a deprecated/new version of the function, until it started misbehaving.

By db48x, 10 months ago

Rust does name mangling, and it basically just adds the crate version to the mix when it mangles the names. So if crates A and B both depend on X but different version, then A can only call functions from its chosen version of X while B can only call functions from its version. There is an optimization pass that discards duplicate functions, so functions in X that haven’t actually changed between versions will be deduplicated. This all happens automatically, so nobody ever has to think about it. Not the crate authors, and not you.

What does get complicated is when the crates A and B both deal in types from X. If a function from A returns a type from X and a function from B takes that same type as an argument, then the compiler will step in with an error that the types don’t match. It’ll tell you that type Foo (from crate X version 1.0) doesn’t match Foo (from crate X version 2.0). This prevents all the possible runtime errors that could occur if you were really mixing an matching between both version. In that situation you will likely need to constrain your chosen versions of A and/or B such that they can agree on a single version of X, instead of allowing cargo to simply pick the latest available version.

By norir, 10 months ago

I'm sure that cargo does a clever job of all of this but this kind of functionality is precisely why I find rust so off putting. It encourages you to take on huge amounts of unnecessary complexity (including complicated dependency trees) and then tries to hide that complexity in abstraction. But in practice these are always leaky abstractions that _someone_ (and likely you) will have to pay for. At a baseline, the poor compilation times and byzantine of rust are to me the most obvious symptoms of this embrace of complexity.

By db48x, 10 months ago

If the language you use don’t have any way of dealing with this problem then one day you will be dealing with heap corruption because a library you (transitively) depend on changed the size of a struct. This is the fate of every C and C++ programmer.

Or you could use a safe dynamic language like Lisp, Javascript, Python, and many others. Programmers using these languages are so productive because they never have to put up with any nonsense like random heap corruption.

Or you could use Rust, which gives you the safety without any of the runtime costs incurred by the dynamic languages. What you call Byzantine complexity is in fact a very simple rule that you can teach to anybody, and which in practice most developers never need to deal with. It doesn’t even slow down the compiler.

Engineering is all about tradeoffs, so choose wisely.

By PartiallyTyped, 10 months ago

There is no dynamic linking in rust when it comes to other rust packages.

By int_19h, 10 months ago

Not quite. DLL Hell was the situation when apps installed their dynamically linked dependencies globally, so that when two apps shipped different versions of the same library they could conflict. The situation was much rarer for dependencies because dependency trees simply weren't that deep in that era.

It had also been solved since the introduction of assembly versioning over 20 years ago, which provided a standard way to have multiple versions of the same dependency installed and loaded side by side. Later Windows versions also added the ability to do the same for app-local DLLs, to the point where you can have different parts of your code get different versions of the DLL when they ask for it within the same process. So, for all practical purposes, it has been a solved problem for a very long time now.

By NoZZz, 10 months ago

It required the re-implementation of stable interfaces as the components updated in version number; not everyone did that. I guess. A case of good plan, lazy execution.

By mardifoufs, 10 months ago

DLL hell hasn't been a thing for more than a decade at this point though, so maybe it did. I genuinely never encounter any issues with DLLs anymore.

By bregma, 10 months ago

Most Windows apps ship with copies of all their dependencies. The DLL Hell problem was solved by disk and bandwidth getting cheaper to the point where many dozens or even hundreds of copies of MSCVRT.DLL and friends on your system pass unnoticed.

By int_19h, 10 months ago

If you absolutely need a way to install dependencies globally such that they can be shared between apps, you can still do that without version conflicts if you write proper manifests for them and use https://en.wikipedia.org/wiki/Side-by-side_assembly.

But it's easier to just package a local copy and not worry about any of this, so that's what people mostly do now.

By buzzin_, 10 months ago

The cost of storage space these days is about $0.01 per 100 megabytes.

By mardifoufs, 10 months ago

But isn't that a good thing? I don't mind having more DLLs if stuff just works. It avoids the similar, though harder to solve issues on Linux. I'd rather be able to have a copy of the MSCVRT.DLL than to have to pray that my system has the right glibc to be honest.

But maybe I'm not understanding the trade offs correctly.

By cesarb, 10 months ago

> I don't mind having more DLLs if stuff just works.

The counterpart is when a remotely exploitable security issue is found in that DLL, as has happened with zlib in the past and log4j more recently; then you have to chase down and update every single copy of that DLL.

By maccard, 10 months ago

An awful lot of it went away in my experience in the c++11 world when everyone all Of a sudden wanted to upgrade. Then another time in 2015 when MSVC decided their standard library would be backwards compatible

By francisofascii, 10 months ago

Then later on NuGet was supposed to fix that, but even with a package manager you encounter "dll hell" like scenarios.

By a1o, 10 months ago

What I found mostly curious is I didn't even knew phyton 3.15 had already released

By lambda, 10 months ago

The example was talking about versions of another package that both of the given packages depend on. Not Python itself. The version numbers were made up.

By ChrisMarshallNY, 10 months ago

I’ve found the best way to write software that lasts, is to be “boring.”

Avoid the buzzwords, stick to the language basics. Do things by hand. Defintely avoid dependencies.

I’ve written libraries in C, that were still in use, 25 years later.

By dtrav, 10 months ago

This post starts with " Software that controls (nuclear) power plants, elections, pacemakers, airplanes, bridges, heavy machinery " and proceeds to list pracitises you should apply to, say, shopify or facebook. Those practises are no where near robust enough for industrial software. Such is the degree of risk adversity that the use cases above can run on out of date hardware and ancient software, and change is deplored in favour of workarounds. Why is that ? Well engineers see risk differently to us - they will attempt to remediate all risk, whereas we will mitigate that risk. That mitigation is the substance of the post, and I have no quibble with that, but to assume that is best practise for my ICD is naive. The author proposes that doing what we do well is sufficient, whereas we need to shift our thinking to a world in which we are in complete control.

By rramadass, 10 months ago

Very good point particularly w.r.t. "remediate vs. mitigate". Do you have any articles/resources/books that you can point to on how it is done for "industrial software"?

By crabbone, 10 months ago

Here's my controversial idea on one of the aspects mentioned in OP. Here's my approach to dealing with dependencies in Python:

* When setting up the project, in the initial stage, install dependencies with pip or conda, whichever was used for the project.

* Examine what was installed and figure out why it was needed.

* Remove dependencies that are only touched by code paths that aren't executed by the project I'm working on.

* Write a script that downloads the thinned out list of dependencies, skipping pip or conda. Unpacks and installs it w/o the help of either. Then, possibly post-process the installation (a bunch of Python packages come with a lot of junk, like unit tests, scripts to call library functions etc.)

* Finally, use this script, and on a side install the same combination of packages with pip or conda once every so many months (maybe about half a year) to see if there are any changes. If there are worthwhile changes: incorporate them back into my script.

On the face of it, this is more work than writing some configuration file that pip or conda will understand. On the other hand, for so many times I was burned by Python's dependency problems, this approach proved to work better for me in terms of how many times CI was broken, and the time I had to spend investigating CI errors went noticeably down.

So, to restate it: I see package management software, at least in Python world, as one of, if not the most important contributors to the failure of ensuring project's longevity. This software prioritizes the ability to fetch and to adjust for the new stuff to the ability to support the old stuff. This software prioritizes information and wishes of the third-party developers over the wishes of the user installing the software. I don't think it does this out of malice. It's probably natural for most people to want new things than to try to preserve the old.

By rramadass, 10 months ago

Relevant:

Lehman's Laws of Software Evolution - https://en.wikipedia.org/wiki/Lehman%27s_laws_of_software_ev...

Studying the laws of software evolution in a long-lived FLOSS project - https://pmc.ncbi.nlm.nih.gov/articles/PMC4375964/

By ahubert, 10 months ago

Added this to https://berthub.eu/articles/posts/on-long-term-software-deve..., thanks!

By rramadass, 10 months ago

Software Evolution - https://en.wikipedia.org/wiki/Software_evolution

By tasuki, 10 months ago

> What might not be a great idea is to have 1600 dependencies in 2024, dependencies which already change at such a rapid clip your code base is effectively a moving target.

Has the author not heard about reproducible builds? Why would you care that your dependencies change?

> Dependencies...

> Drift away, leading to adjustments in your code or, worse, silent changes in behaviour

> Shift to new major versions with semantic changes, requiring rewrites on your part

> Get abandoned or simply disappear, or start to decay

Really, these aren't things that can affect you at all if you have reproducible builds!

> Tests are always a good idea, especially if you have many dependencies which shift and drift all the time.

Not a thing! If your dependencies "shift" or "drift", you're doing it wrong!

By perlgeek, 10 months ago

Even when you have reproducible builds and local copies of all recursive dependencies, you have to be able to react when e.g. a security vulnerability is found in one of them.

If you have 1600 of them, you probably don't know them all very well, so you might already be in trouble in the assessment stage. If there's a fix upstream, you might need to update to the newest version, and that in turn might force you to upgrade more dependencies, or introduce new ones etc.

Reproducible builds are great, but they aren't a panacea for dependency hell.

By tasuki, 10 months ago

Yes, security vulnerabilities are about the only case when you have to react. The kinds of libraries that can introduce security vulnerabilities (crypto, networking, etc) are hard to write by yourself. If you do happen to write that code by yourself -- to minimize dependencies -- chances are it's full of security holes anyway. Security is hard!

By devjab, 10 months ago

I think that it is interesting that testing plays such a part in these plans. I guess it shouldn’t be a surprise after so many years of TDD, Clean Code and whatever nonsense the pseudo industry of “best practices” has been successfully selling. Odd on its own considering software seems to be just as broken as it was 20 years ago despite all these efforts. Anyway, if it was me I would look closer at how NASA build things. Which includes testing, but the key tool for finding actual programming errors was assertions.

I’ve worked in medial software for a short while. The key features of development when lives are at stake are, assertions, avoid interpretation and no dynamic memory allocation. You should test things, but your assertions should catch any programming errors without a test/debug suite. You should never use an interpreted language, because that is a head ache you don’t want, but you also shouldn’t parse data like JSON. If you need to send data you send bytecode. You do this because you really don’t want dynamic memory allocation in software that will kill people if it fails.

Now a little anecdote. In Denmark we have digital elections, not the voting but the voter registration and the system where each municipality reports the results. These systems run on COBOL and are ancient. There has been multiple attempts to replace them with “modern” software, because the old system can only be run by one private company and that is a monopoly. This is because we privatised the sector 20ish years ago. Anyway, every attempt at replacing it with modern long term software has failed, and a big part of the reason is because people have forgotten how to write code which isn’t infected with all sorts of OOP bullshit. So even the best suppliers and large companies like IBM have failed to make something with the same resilience. It’ll be interesting to see if we’re also going to infect embedded software with these things as the public sector decides to enter and hand out “best practices”.

By rramadass, 10 months ago

Great points! Regarding assertions what techniques/practices did you guys follow? What did you do when an assertion failed? I myself am a fan of DbC and hence am interested in knowing more about how others use/modify/extend the technique.

> If you need to send data you send bytecode.

I presume you mean some encoding of binary-to-text; eg. base64 etc. ?

By devjab, 10 months ago

We used runtime assertions both as executable documentation and to avoid system corrution by failing on any false. Since you can't just stop a critical system we would use triple modular redundancy ultimately leading to devices running in a "safe state". If this happened you would need to get your device recalibrated but you wouldn't die.

Base64 would require dynmaic memory, you would likely use a packed struct. How you would do it depends on your compiler, but we would use #pragma.

By boomlinde, 9 months ago

You can parse and encode JSON without dynamic memory allocation. You can encode base64 without dynamic memory allocation, and you can even decode it in-place. You just need to know the appropriate buffer sizes for a valid payload. For base64 this is a trivial function of the input size. For a streaming decoder you don't even need more than a fixed few bytes of state.

I am not arguing that either of these are appropriate for your application, particularly not base64, which doesn't address the basic problem of representing structured data, but it isn't true that they require dynamic allocation, which is good to keep in mind if you ever find yourself in need of these encoding schemes in a resource constrained system.

By cess11, 10 months ago

Ten years is not a long period, unless you're a kid or adolescent. Aim for a century.

By int_19h, 10 months ago

Software engineering hasn't even been around for a century, so we don't know how to write software that lasts that long. We can only make educated guesses.

But, in any case, given the current state of things, 10 years is already a massive improvement.

By cess11, 10 months ago

We don't? There are computer systems running since the seventies in areas like banking and aerospace, why wouldn't they last another fifty years?

Most of my professional work over the last fifteen years has been on applications that were ten years or older when I was recruited. Currently I do work in public sector digital archiving, where it's common to retire systems that are still being updated and have been going since the nineties. This kind of long term storage aims for century or more.

By tpoacher, 10 months ago

> Keeping it simple requires periodic refactoring / code deletion

The equivalent in medicine is "Always question the established diagnoses / management plan during a new admission".

And just like the periodic refactoring mentioned here, it's a lofty idea, but without incentives, investment, and/or infrastructure to support or enforce such reviews, it almost never actually happens in practice while one is busy with the daily grind of things.

Which is how you end up with people taking drug A to counteract drug B which was given years ago to alleviate the symptoms of drug C which it turns out you didn't even really need in the first place.

By inSenCite, 10 months ago

I sincerely hope nobody is taking advice from linkedIn posts. Software development or otherwise...

By bruce511, 10 months ago

>> There are many databases around, but most of them work more or less the same. If you built on MySQL and you come to your senses and shift to something else

Im spending a bit of time on databases lately, mostly SQLite, PostgreSQL and MS Sql Server. With a bit of Firebird.

I never went down the MySql path, but statements like the above intrigue me (because I like to learn from others experience. )

So my question is - why not MySql? (And does the same argument apply to MariaDB?

By sjamaan, 10 months ago

MySQL doesn't care about your data. There are many places where it silently truncates, "helpfully" tries to "do the right thing" and so on.

It used to have very impoverished data types. At least that has been mostly amended, but AFAIK you still can't make an index on a TEXT column without a limit.

The "utf8" text type, despite the name, can't hold all of unicode. You need to use utf8mb4. This is merely one example of the general attitude. Another fine example: you can't use mysql_escape_string() in the C API to safely escape strings. Instead, use mysql_real_escape_string(). Instead of hiding the unsafe version, PHP exposes both these functions as-is. PHP and MySQL are truly a match made in hell.

The MySQL issue queue is also littered with unaddressed corruption issues which have been open for many years.

MariaDB is just a fork, most criticisms to MySQL apply to it as well.

By brabel, 10 months ago

MySQL is like the JS of databases, it does some truly bizarre shit in order to try to "help" you.

https://sql-info.de/mysql/gotchas.html

By bruce511, 10 months ago

thanks

By valyala, 10 months ago

Very good post! We at VictoriaMetrics try following similar practices:

- Constantly think on how to simplify the code.

- Keep external dependencies to the minimum - https://valyala.medium.com/stripping-dependency-bloat-in-vic... .

We have public development goals, which outline these practices - https://docs.victoriametrics.com/goals/

Sometimes we have to complicate the code:

- when this simplifies the usage of VictoriaMetrics products

- when this significantly reduces resource usage (RAM, CPU, disk IO, disk space, network) for typical production workloads

By liampulles, 10 months ago

> One of the easiest hacks for successful software longevity is keeping people around for a decade.

That doesn't sound like an easy hack at all, but I can appreciate the value of achieving it. Are there any good books or studies on this subject?

By perlgeek, 10 months ago

I don't think it's too complicated, especially if you start in a culture where people don't job-hop all the time. But, you have to make it your priority, which means:

* pay them more than competitors, pay them more than if they changed jobs

* actually treat them well, which includes listening to and acting on their feedback

* celebrate people with long tenure

* give them options to grow inside the company

* don't just hire fresh graduates, a 40yo might be tired of job-hopping and still has 20+ years of employment in them

* give them long-term incentives, like stock options

By dijksterhuis, 10 months ago

> Keep it simple. Simpler than that. Yes, even simpler. You can always add the complexity later if needed!

This. A thousand times this. Be grug brained, not big brained.

By imrejonk, 10 months ago

https://grugbrain.dev/

By BobbyTables2, 10 months ago

But if we don’t require an entire Kubernetes cluster with dozens of beefy nodes to deploy and manage a trivial app, how can we have a scalable enterprise application? /s

By maccard, 10 months ago

I know you’re kidding but a container is one of the simplest easiest most portable ways to deploy a web app.

By sjamaan, 10 months ago

Only if you consider web apps have become so complex that it requires a container to wrap up the complexity.

By maccard, 10 months ago

Disagree. If I have a container with pho, I can run it in any hosting platform in any underlying architecture without needing to worry if my host has the right version underneath. That’s a huge win. I can run it on AWS, on digital ocean for $5/mo , on lambda, or GCR. Completely portable. And I can run the exact same setup at home.

By dijksterhuis, 10 months ago

wanted to echo your sentiments on portability with containers. sometimes simplicity is hidden in the tradeoff of what you get for doing the work — i.e. simplest way to do the most “stuff”.

like i was last working on a job orchestrator thing. and containers would have meant we could run jobs on azure/aws/do/wherever. writing one job container, a customer can have that job run almost wherever they want to.

containers added some complexity, but were the simplest way of getting that amount of “stuff”. the project stalled horrendously but that’s another matter.

but also important to remember that containers add a framework/dependency at quite a fundamental level.

there are maintenance costs. although i have personal containers from like 6 years ago that still run fine!

By maccard, 10 months ago

> remember that containers add a framework/dependency at quite a fundamental level

I don’t mind this fundamental dependency. If the only artefact you have is a container, and no source code, that’s no worse (arguably it’s better) than the “we only have the compiled binary of the production service from a decade ago, and no idea what’s installed on the machine”. That complexity lives somewhere, and if your application is tied to it (e.g. a specific python version) then it makes sense to have that part of your application. Containers have been bread and butter for a long time now, I don’t think they’re a super scary new thing that we should worry about disappearing off the face of the earth tomorrow (like the latest js framework)

> there are maintenance costs

Those maintenance costs exist with or without containers. Either you have a virtualenv or whatever, a container, or a global toolchain installed and you just ignore it. I’ve got containers that I wrote in 2019 that I built last week and they just worked out of the box, no faffing around with versions of anything. In my experience, the thing that’s most likely to fail or rot is external URLs that you grab the dependencies of the container from - which will happen with a shell script or virtualenv either

Edit: https://github.com/dotnet/core/issues/9671 here’s a perfect example

By n_ary, 10 months ago

And of course, if you didn’t touch all of cloud stuff, how will you explain to the junior randomly promoted to tech lead(sometimes CTO!) that you are worth hiring when your current corp decides to lay you off? /s

By liampulles, 10 months ago

An important learning for me was to not be so concerned with different styles and ways of doing things in the same codebase. Fighting against this is ultimately a big waste of time and results in team fragmentation and "grammar Nazism" IMO.

Better to learn to read and understand regardless of the style, and micromanage each other less. Save the sweat for the big stuff.

This is something I constantly need to remind myself.

By MH15, 10 months ago

Good vibe-check on my day to day work in web tech. Sometimes I wonder how web deployments would work if we still deployed quarterly, yearly, etc.

By withinboredom, 10 months ago

Where I work now basically does quarterly deployments. We service enterprises that generally outsource their development work. So we deploy pretty regularly to a staging environment and deliver documentation weeks/months before we actually deploy it. The exception to that rule is for hotfixes, which is also extremely rare. Like 2-5 times per year.

By guytv, 10 months ago

This is why doing long-term development on Android feels like trying to build a sandcastle during high tide.

Google forces you to use their ever-changing, drama-prone OS-level components through their “official” libraries. These libraries evolve faster than a teenager’s TikTok algorithm, so good luck keeping up. Oh, and if you don’t update your app to comply with their shiny new toolchains and runtime? The Play Store will ghost you faster than a bad Tinder match.

By cpeterso, 10 months ago

On this subject, I recommend Marianne Bellotti‘s “Kill It with Fire: Manage Aging Computer Systems (and Future Proof Modern Ones)”. She has managed the maintenance and modernization of large legacy systems at the United States Digital Service. https://nostarch.com/kill-it-fire

By ahubert, 10 months ago

Added this book to https://berthub.eu/articles/posts/on-long-term-software-deve..., thanks!

By revskill, 10 months ago

To me it is testable and debuggable.

By mentalgear, 10 months ago

KISS

By exac, 10 months ago

> Ever tighter security standards. JavaScript over http is dying, for example.

Bert, I can assure you that this is not the case.

By vaylian, 10 months ago

> Bert, I can assure you that this is not the case.

Can you give some examples?

There are still legacy websites that don't support HTTPS. But these are by definition a dying breed.

By ahubert, 10 months ago

There are already some practical restrictions on non-127.0.0.1 http cookies. I fully expect more of this in the future. And a good thing too.

By mrbluecoat, 10 months ago

> Write boring simple code. Even more simple than that. Even more boring. Write super boring code. Write naive but obvious code.

That's the rub, isn't it? Boring code tends to bore developers so we stop doing it, regret it, return to boredom, only to find ourselves listless once again..

By MantisShrimp90, 10 months ago

I think as you go on it becomes a out being sure that you are solving problems in the simplest way possible. Chances are even with the simplest solution possible you will still need to do some creative stuff.

However, too many programmers in my experience (including myself) can frett about using tools and features that are cool little puzzles, but don't really contribute to the core solution and can in the worst case even make less manageable code.

As in all things, it's about finding the balance for the context. Let your personal projects be where you play with new tools while you keep solutions for your job dead simple due to all the job-related reasons to do that.

By sam_lowry_, 10 months ago

Remember the Google 20% rule?

There are ways to deal with boredom without jeopardizing business.

By int_19h, 10 months ago

Google effectively killed it off though as part of becoming a "proper enterprise".

By kidneystereotyp, 10 months ago

[dead]

By cjblomqvist, 10 months ago

Would be awesome if we had some actual proof (science) backing this up. Anyone?

By cratermoon, 10 months ago

Lehman, M. M. “Laws of Software Evolution Revisited.” In Software Process Technology, edited by Carlo Montangero, 108–24. Berlin, Heidelberg: Springer Berlin Heidelberg, 1996.

Lehman, M.M. “On Understanding Laws, Evolution, and Conservation in the Large-Program Life Cycle.” Journal of Systems and Software 1 (1979): 213–21. https://doi.org/10.1016/0164-1212(79)90022-0.

———. “Programs, Life Cycles, and Laws of Software Evolution.” Proceedings of the IEEE 68, no. 9 (September 1980): 1060–76. https://doi.org/10.1109/PROC.1980.11805.

Lehman, M.M., J.F. Ramil, P.D. Wernick, D.E. Perry, and W.M. Turski. “Metrics and Laws of Software Evolution-the Nineties View.” In Proceedings Fourth International Software Metrics Symposium, 20–32, 1997. https://doi.org/10.1109/METRIC.1997.637156.

By ahubert, 10 months ago

Added these references to https://berthub.eu/articles/posts/on-long-term-software-deve..., thanks!

By elric, 10 months ago

If you're interested in some scientific background to Software Engineering, I can recommend the book "Making Software" (O'reilly) by Andy Oram & Greg Wilson. It's a bit old now, but addresses and challenges many common beliefs about Software Engineering.

https://www.oreilly.com/library/view/making-software/9780596...

By TZubiri, 10 months ago

I understand the strategic advantage for offline software for elections and nuclear power plants. For security reasons you can't depend on outsiders, and you've got long term budget to support in house teams.

However in terms of longevity, in the consumer space and corporate space, third party SaaS is much more stable and reliable than offline software. How much of the data from offline software do you keep from the 2000s or 2010s? You changed computers and the data is gone. Yes, many SaaS providers pull the rug, but usually it's fine and you get a chance to migrate the data?

If I had to place a bet on whether the files on my desktop will outlive the files on my Google Drive, I wouldn't take it.

By cesarb, 10 months ago

> If I had to place a bet on whether the files on my desktop will outlive the files on my Google Drive, I wouldn't take it.

I have files on my desktop which have lived on it (and its predecessors) for longer than Google Drive existed. Some might be older than Google itself. That's probably not an uncommon case. You say "you changed computers and the data is gone", but it's far too easy to copy your documents folder (or equivalent) from one computer to the next, either directly or through an intermediate burned CD or external disk (which ends up also becoming an informal backup of that old data).

Meanwhile, I have already lost access to at least one Google account (the one which was originally my Orkut account); if I had any files on it (instead of just using it to log into Orkut), I would have lost them.

By atoav, 10 months ago

You are sure about that? The oldest machine I have seen at my University IT was a server with an uptime of 16 years. Judging be the look it was probably 30 years old. I could name you numerous services that came and went within three decades, granted that old machine did something simple (can't remember, probably a NTP server), but I am pretty sure an external service would have been more unreliable throughout that timeframe.

By int_19h, 10 months ago

I would totally take that bet. As far as personal files that I care about go, there are some that predate this whole "cloud" thing even before it was called that.

And these days, given how dirt cheap storage is and how fast transfers are, it's even easier to copy data from system to system and to maintain backups. Heck, we have 8-terabyte USB drives these days.

Long Term Software Development