Is StackOverflow Losing Its Way?

I find myself rather surprised to be writing this post. I’ve long been a supporter of StackOverflow as a far more effective means of getting answers to programming questions than the more traditional forum-based approach, with its extremely low signal-to-noise ratio.

Gaming the System

StackOverflow, like many sites, uses a point-based system, in its case called reputation (karma and other names elsewhere). You get points for having your posts voted up by other users. You lose points for having them voted down but, unlike on many other sites, casting a downvote also costs you a token amount, and that token payment is important. It does a lot to stop the spurious downvoting you’ll witness elsewhere.

6 Simple Tips to Get Stackoverflow Reputation Fast was written by a user who managed to reach 25,000 reputation in ~4 months. That’s a little over 200 per day (the soft cap), so it is noteworthy but not extraordinary.

As a long-time user with almost 50,000 rep, I feel both compelled and qualified to comment on some of the points.

1. Be the First to Answer. Even at the cost of quality.

Being first is (or, rather, was, but we’ll get to that) important, but that’s not a bad thing. A user getting a quick answer is a good thing as long as it’s useful. If it’s wrong it’ll probably get downvoted. So you can’t just post anything, or any benefit of being first will be lost.

2. Use Downvotes and Comments Strategically

This is basically true. One negative behaviour this encourages is what’s usually termed “tactical downvoting”, where you vote down another answer simply to improve the relative position of your own answer. Various fixes for this have been suggested, such as disallowing downvotes on other answers to a question you have answered. I don’t agree with that, because you often end up answering something precisely because another answer is wrong.

My personal preference, in the interests of full disclosure, is either to not have anonymous downvoting at all or at least not to have it on questions you have also answered.

3. Use obnoxious in-your-face formatting and lists.

I don’t know why the author chose the term “obnoxious”. It’s true that an answer with some combination of images, links, lists, block quotes and other content visually different from text that breaks up that text will, in general, be more pleasing to the eye and more easily noticed on casual inspection.

4. Be Aware of the 200 rep/day Limit

True enough if your sole goal is to increase your points. The one negative thing about this (which again has met with stubborn resistance to fixing or changing) is that the point gain is inconsistent. Accepted answers, for example, can take you above the 200/day limit, but only if they arrive when you’re at a score that the additional points will push over it. This means things like the order of upvotes and accepted answers can change the end number. There are lots of inconsistencies around this but they’ve all been written off as being “by design”, which in my view is a failing.
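To make the order-dependence concrete, here is a minimal Java sketch of one hypothetical rule set. The point values (10 per upvote, 15 for an accept) are assumptions for illustration and not StackOverflow’s actual implementation. So is the rule that accept points are never capped but still count towards the day’s running total. Under those assumptions, the same 20 upvotes and one accept produce different totals depending on the order they arrive in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a daily reputation cap. The point values and the
// accept rule are assumptions for illustration, not StackOverflow's code.
final class RepCapModel {
    enum Event { UPVOTE, ACCEPT }

    static final int DAILY_CAP = 200;
    static final int UPVOTE_POINTS = 10;
    static final int ACCEPT_POINTS = 15;

    /** Reputation gained in one day, processing the events in order. */
    static int dailyGain(List<Event> events) {
        int total = 0;
        for (Event event : events) {
            if (event == Event.ACCEPT) {
                // Assumed rule: accept points are never capped, but they
                // still count towards the day's running total.
                total += ACCEPT_POINTS;
            } else {
                // Upvotes only award whatever room is left under the cap.
                int room = Math.max(0, DAILY_CAP - total);
                total += Math.min(UPVOTE_POINTS, room);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Event> acceptLast = new ArrayList<>(Collections.nCopies(20, Event.UPVOTE));
        acceptLast.add(Event.ACCEPT);

        List<Event> acceptFirst = new ArrayList<>();
        acceptFirst.add(Event.ACCEPT);
        acceptFirst.addAll(Collections.nCopies(20, Event.UPVOTE));

        System.out.println(dailyGain(acceptLast));  // 215
        System.out.println(dailyGain(acceptFirst)); // 200
    }
}
```

With the accept arriving last, the model awards 215 points. With it arriving first, the later upvotes hit the cap sooner and the day ends at 200.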

5. Edit, But Don’t Edit Too Much

The author is correct about this. If you edit your own post ~6-7 times, it becomes a form of wiki (“community wiki” is the official term) and is no longer owned by anyone. Any future upvotes will gain you no points. This was done to stop people endlessly bumping their posts but, in my opinion, it fails miserably at that. There are many ways of bumping your own post without editing it:

  • Add another answer as yourself or a so-called “sock puppet”;
  • Edit another post on that same question (requires 3,000+ reputation); and
  • Retag the question (500+ reputation).

I’m told that a sufficient number of any kind of edit will force wiki status on it anyway but I’ve never witnessed this personally.

My suggestion is that edits to your post after the 6th or so should simply not bump the question. People shouldn’t be disincentivized from maintaining content they wrote.

So basically the author is correct and, barring some qualification, I have no problem with the post itself, but this brings us to…

Interestingly, this thread prompted Jeff Atwood to respond on Meta to 6 Simple Tips to Get Stackoverflow Reputation Fast. But what is it about this post that gets the attention of the powers that be, when these issues are months old and cold in the ground, having been declined or declared to be “by design”?

The Reddit Cesspool

The aforementioned post was posted to reddit as Why StackOverflow sucks, which to me is a gross misrepresentation of the author’s intent. Obviously the submitter simply took it as a case in point of why he thinks StackOverflow sucks but that’s his axe to grind.

The comments (over 300 at last count) make for a sad and in many ways predictable read. My favourite is from an urchin named relix:

I've answered exactly 1 question. It was the only correct answer to that question but it didn't get any votes and didn't get selected as the answer. Instead, the wrong answer was selected and got all the brownie points.


That was the moment I decided never to waste time on StackOverflow again. Why even bother.

The only sad part about it is that for some reason 27 people have seen fit to upvote this drivel. A neophyte picked a bad alternative answer. Get over it. It’s worth noting the answer in question currently has 49 upvotes (which is a lot in SO terms).

One has to wonder how many of the reddit commenters actually read the linked article and how many just took the title as an invitation to grind their particular axes.

Yet this idea that the system is worthless if it isn’t perfect is unfortunately rather pervasive among many reddit regulars. My opinion is that this betrays the likely age of frequent redditors, as all-or-nothing, black-and-white thinking is a common malaise of those under 25, particularly amongst programmers, who have a tendency towards that kind of thinking anyway.

But I Just Want To Talk!

The other common complaint about StackOverflow in these comments was that the chosen format makes it hard to have a conversation. Yes, it does. That’s the point.

Take the comments on reddit, slashdot or any other reasonably popular site. It’s not hard to get 500-1000+ comments on an article. What purpose do they serve? Much like the answers on Stackoverflow, you’ll find that people will scan the first few, quickly get bored and move on. There are probably a half dozen people on the planet that read every one of those comments.

These people are the East German judges of the internet.

To put it in statistical terms, they are outliers that will distort your sample. You simply throw them out as bad data and look at the rest.

In a programming context, it’s easy for forums (which are designed for idle chit-chat) to degenerate into noise. The example I used is the difference between being able to ask a question about Haskell and get an answer versus having that same forum topic degenerate into 15 pages of debate over whether the Death Star uses Haskell or C#.

If you want to have a chat, Stackoverflow is not the place for you. This is a Good Thing.

Some see value in such endless discussions (“evolve” was used on more than one occasion). You’re kidding yourselves if you think that. Only 5 other people will read them all and you’re all so closed-minded that you simply won’t convince anyone who disagrees with you anyway.

And yes that’s a generalization but where there’s smoke there’s fire.

I’ll Take Any Answer

A recent change was to display each user’s acceptance rate: the percentage of the questions they have asked for which they have accepted an answer (with some caveats).

The idea is to encourage people to accept answers. Often someone will come to the site, ask their question, get an answer and never be heard from again. For those users, displaying their acceptance rate isn’t going to change anything.

So is the target more experienced users? If so, should we really be encouraging people to accept answers to questions that have no satisfactory answer yet, just to avoid the appearance of not using the site right?

To me this is tinsel.

Fastest Gun In The West

Returning to the first point of codexon’s post: yes, being first is useful from a pure points perspective. I consider this a good thing. Getting an answer within minutes is good. Waiting a day and hoping someone checks the forum for your post (let alone bothers to answer it) is not.

The question “What has happened to the sorting of answers on stack overflow?” has caused a bit of a stir. Previously the default view sorted answers by net votes first (highest to lowest) and by time of posting second (oldest to newest). This has changed so that answers with the same net votes are sorted randomly on each view.
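The difference between the two orderings can be sketched in Java. The Answer record and its fields are hypothetical. The old order is a comparator on net votes (descending), then post time (ascending). The new one shuffles the answers and then applies a stable sort on net votes alone, so answers with equal votes come out in a random order on each call. This is one way to get that behaviour, not necessarily how the site implements it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Sketch of the old and new default answer orderings described above.
// The Answer record and its fields are hypothetical.
final class AnswerOrdering {
    record Answer(String author, int netVotes, long postedAtMillis) {}

    /** Old default: highest net votes first, ties broken oldest first. */
    static final Comparator<Answer> OLD_ORDER =
            Comparator.<Answer>comparingInt(Answer::netVotes).reversed()
                      .thenComparingLong(Answer::postedAtMillis);

    /** Net votes only, highest first; says nothing about ties. */
    static final Comparator<Answer> BY_VOTES =
            Comparator.<Answer>comparingInt(Answer::netVotes).reversed();

    static List<Answer> oldOrder(List<Answer> answers) {
        List<Answer> sorted = new ArrayList<>(answers);
        sorted.sort(OLD_ORDER);
        return sorted;
    }

    /** New default: net votes first, ties in a random order on every call. */
    static List<Answer> newOrder(List<Answer> answers, Random random) {
        List<Answer> sorted = new ArrayList<>(answers);
        Collections.shuffle(sorted, random); // randomise everything first...
        sorted.sort(BY_VOTES);               // ...then a stable sort keeps ties shuffled
        return sorted;
    }

    public static void main(String[] args) {
        Answer first = new Answer("first", 3, 1_000L);
        Answer meToo = new Answer("me-too", 3, 5_000L);
        Answer top = new Answer("top", 7, 9_000L);
        List<Answer> answers = List.of(meToo, top, first);

        System.out.println(oldOrder(answers).stream().map(Answer::author).toList()); // [top, first, me-too]
        // "top" always leads; "first" and "me-too" may come out in either order.
        System.out.println(newOrder(answers, new Random()).stream().map(Answer::author).toList());
    }
}
```

Because List.sort is guaranteed to be stable, shuffling first is enough to randomise the order within each group of tied answers. It also shows why the earlier “first” answer loses its positional advantage.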

There are many reasons I hate this change.

  1. Until users figure this out, there will be a much greater tendency to pick a random answer even if it was a “me too” answer that came much later;
  2. It makes it easier to clone other answers because you now have to rely on people looking at timestamps to figure out who posted first rather than the natural order;
  3. It’s even less worthwhile putting extra effort into particularly low-vote questions because upvotes are a far more hit-and-miss proposition;
  4. The Fastest Gun In the West “problem” is not a problem; and
  5. This change is huge. To trade the devil you know for the devil you don’t for such small apparent (potential) upside is a huge risk.

This most definitely isn’t tinsel.

The old system clearly got questions answered. Isn’t that the most important thing? So what’s the problem?

Conclusion

What I’ve been seeing over the last month or two are changes that have moved on from the cosmetic (and/or irrelevant) to the downright undesirable. Combine this with the obstinate refusal to even acknowledge the downsides of the current system (let alone do anything about them), and the situation is now bordering on the bewildering.

What’s more, I’m disappointed that it takes this kind of external scrutiny to grab attention while the Stackoverflow community, which has been vocal on these issues for a long time, is seemingly dismissed.

One hopes that the community is being listened to on the site created specifically to give it a voice. But given the recent changes, and given that external scrutiny rather than internal dialogue is what provokes a response, one has to question how much it really is being listened to.

Test-Driven Development: I Finally Get It

I’ve never been a big fan of TDD (“Test-Driven Development”). Some things are just plain hard to test. Some people end up doing end-to-end tests with tear-down temporary databases and calling them “unit tests”. Recently I’ve seen the light and realized my mistake. I think I’ve misjudged TDD, but at the same time I think it has made one serious error.

Let’s back up. Kent Beck’s 2002 book Test-Driven Development By Example ignited this particular movement (although, to take that metaphor further, it seems to be more of a lit brazier than a raging bonfire). As Beck describes it:

My goal is for you to see the rhythm of test-driven development:

  1. Quickly add a test
  2. Run all tests and see the new one fail
  3. Make a little change
  4. Run all tests and see them all succeed
  5. Refactor to remove duplication

This is the part I’ve always had a problem with.

Not Everything Needs To Be Tested

This is my first criticism. Now I don’t think any TDD advocate will argue that absolutely everything needs to be tested. After all, load a controller in your Spring MVC application and it’s either there or it isn’t.

But my point is that the premise of testing everything strikes me as naive.

Not Everything Can Be Tested

Wikipedia has a good summary of TDD Limitations. The one that jumps out at me is UIs. That’s a bit of a problem since a lot of software has UIs. What’s worse is that they’re increasingly Web UIs that are even harder to test because the functionality can rely on server calls and once you start getting into mocking Ajax calls it’s not long before you’re banging your head against the wall hoping the pain goes away.

Unit Tests Are Fragile

Often if you get a decent suite of unit tests you are only a few requirements changes or change requests away from half of them breaking. Unit tests are software. That may sound trite but it’s worth remembering. The larger and more complex your unit tests, the more code you’ve written to implement a particular feature (the tests count in that metric) and that’s more code you have to change.

I’ve lost count of the number of times I’ve seen build instructions that have quickly degenerated into:

mvn -Dmaven.test.skip package

I See The Light

The problem with Beck’s methodology is that it is prescriptive. Write a test, fail it, write some code. I see this as a useful learning technique but I don’t think it should be the end goal.

Let me give you an analogy: if you describe Spring to a novice Java programmer, or to one who is simply unfamiliar with dependency injection or inversion of control, they just don’t get it.

That’s completely understandable. I can still remember when I first heard about Spring back in 2004 and I didn’t get it. It just sounded like a way of standardizing configuration for, say, EJB containers (EJB 2.x having different config files and behaviour depending on which application server you used). In that context it sounded potentially useful but nothing to get too excited about. Statements such as “Spring is a lightweight container” just didn’t sink in.

Of course, that just misses the point completely.

Having learnt the error of my ways, I now firmly believe that DI/IoC are right up there with object-oriented programming and managed code as key turning points in software development. The point is that it changes how you design software, how you construct your object model and how you put it all together. In doing so you should realize (if you hadn’t worked it out already) just how evil an anti-pattern the static Service Locator, common in J2EE applications 5+ years ago, really is.

One of the benefits of Spring is that an application designed with Spring in mind will in all likelihood have much greater testability as anything with injected behaviour can just as easily have mock objects injected instead.
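Here is a minimal sketch of that testability benefit in plain Java. No Spring is required, and the class and interface names are hypothetical. The Greeter receives its HourSource through its constructor, so production code can pass a real implementation and a test can pass a stub. A static Service Locator, by contrast, would have the Greeter fetch its dependency itself, which leaves a test no clean way to substitute it.

```java
// Plain-Java sketch of constructor injection; the names are hypothetical and
// no Spring API is used. A DI container would perform the same wiring.
final class DependencyInjectionDemo {

    /** The behaviour the service depends on. */
    interface HourSource {
        int currentHour();
    }

    /** Production implementation backed by the system clock. */
    static final class SystemHourSource implements HourSource {
        @Override
        public int currentHour() {
            return java.time.LocalTime.now().getHour();
        }
    }

    /**
     * Receives its dependency through the constructor instead of fetching it
     * from a static Service Locator, so callers (including tests) choose it.
     */
    static final class Greeter {
        private final HourSource hours;

        Greeter(HourSource hours) {
            this.hours = hours;
        }

        String greeting() {
            return hours.currentHour() < 12 ? "Good morning" : "Good afternoon";
        }
    }

    public static void main(String[] args) {
        // Production wiring: the real implementation.
        Greeter live = new Greeter(new SystemHourSource());
        System.out.println(live.greeting());

        // Test wiring: a stub (a lambda, since HourSource has one method)
        // makes the result deterministic.
        Greeter stubbed = new Greeter(() -> 9);
        System.out.println(stubbed.greeting()); // prints "Good morning"
    }
}
```

In a Spring application, the container would do the production wiring from configuration. The Greeter itself doesn’t need to know about Spring at all, which is what makes the stub so easy to slot in.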

What about TDD?

The conclusion I’ve come to is this: TDD is descriptive not prescriptive.

Put another way, TDD is first and foremost a set of principles rather than a rigid methodology. Just like Spring (and somewhat related to it) TDD for me is about changing your thinking.

I’m currently in the process of writing a Java library I hope to release as open source soon. As I wrote in It’s Time We Stopped Rewarding Projects with Crappy Documentation: Open Source is No Excuse, I firmly believe that any publicly released software—open source or otherwise—must be held to a higher standard than software you may just write for yourself. For my upcoming library this means having extensive unit tests with high coverage.

That’s the key point: before I wrote a line of code, while I was designing it (in my head), I was asking myself “How do I test this?” For some of it I wrote unit tests first. For some of it I didn’t. But the point is that I’ve thought about it from the beginning.

You can contrast that with software I’ve written previously where unit testing has been nothing but an afterthought at the end of the project, something to do to check off the “unit testing” box on the project plan. In my experience, that approach works incredibly badly. The longer you put off thinking about how to write automated tests for something, the harder it will be to graft in later.

Conclusion

My view of TDD can be summarized as follows:

  1. Think about how to write automated tests for something before you start; and
  2. TDD is descriptive not prescriptive.

Purists may argue this isn’t TDD at all. Even if it isn’t, it is at least the essence of TDD. There’s no need to be a fanatic about unit testing. Like so many other things, it’s just another tool in your hopefully extensive toolbox.

My CSS3 Wish List

Whether you frame the argument as “divs vs tables”, “tables vs pure CSS” or whatever, there are some strong opinions out there. I’m all for the idea of semantic markup. The problem? There are some things you can do (sometimes trivially) with tables that with CSS are:

  • not compatible with IE6 (or even IE7 potentially);
  • doable with a fixed or partially fixed width layout but not with a liquid layout;
  • approximate facsimiles but have cases where they break down (eg if two side-by-side floated divs are too wide for their container one will drop below the other); or
  • just downright impossible.

But rather than rehash that debate I simply want to put up my list of what I think CSS3 needs to be able to do. More importantly, it needs to be done trivially.

1. Vertical Centering in Fixed Height

Vertical Centering in CSS, with its relative+absolute+relative positioning across three nested divs, is a prime example of a non-trivial equivalent to “vertical-align: middle” on a table cell.

2. Vertical Centering in Variable Height

The previous technique can be applied to this problem too but things start getting messy and some trivial cases start to get hard.

3. Fixed Column Layout

Any number of fixed-width columns can be done trivially with “pure” CSS. The standard technique is to use floats. There are two problems with this technique (compared to tables):

  1. If the content in one of the columns stretches it, your “columns” can drop off below; and
  2. There is no automatic equalization of height. You can give each column a fixed height but one of the beauties of tables is that all cells in a row automatically have the same height.

CSS3 needs to do better.

4. Liquid Column Layout

A frequent question on the Web is people wanting a “pure” CSS solution that will have a fixed column (on the left or right) with the other div taking up the remaining space, making this a liquid layout.

The standard alternative to a table with one fixed column is to use floats and negative margins, which is rather unintuitive.

5. Full Window Height

Again this can be done but to get cross-browser support you have to use the right combination of CSS attributes on the right elements. It just needs to be easier.

6. Fixed Height Header with Full Window Height

The standard technique for this seems to be absolute positioning of the header and adding padding to the top of the content or columns so they’re properly 100% height. We need a more intuitive solution.

7. Variable Height Header with Full Window Height

For example, How can I convert this table based layout to CSS? demonstrates a trivial table-based solution to the problem of full height where the header height is unknown. I’ve seen no adequate CSS solution to this problem to date.

8. Sensible Numbered List Styling

It still astounds me that HTML, and then CSS, only provided one way of styling numbered list items (ie “1.”), particularly because other variants are arguably more common (eg “1)”, “(1)”). I’ve seen so many pseudo-lists rendered with tables to get around this problem (floats and negative margins should also work). This one I consider to be a particularly bad use of tables but there it is.

Oddly, the solution to this seems to be Generated content, automatic numbering, and lists. Counters are, I guess, useful for chapter headings and the like, but they’re complete overkill when all I want to do is style a list item. We need better.

9. Sensible Column Spacing

This is another one where it just astounds me that it’s still a problem. From HTML 3.2 to the present day, there have been various methods of putting space around, say, table cells. The problem is that all these solutions ignore the most common (in my experience) use case for tables.

Let’s say I need a table with 7 columns. I want that table to be the full width of the container. I just want a gap between each cell in each row. Using margins or padding I’ll end up with extra space on the left and/or right where the cell doesn’t go quite to the edge. The current solution for this is to treat the first or last cell differently by adding a class to it or using inline styles.
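The arithmetic behind the complaint is simple enough to sketch, using hypothetical numbers: a 700px container, 7 columns and a 14px gap. Gaps that sit only between cells consume n - 1 gaps of width. Applying the same margin or padding to every cell consumes n gaps, which leaves one gap’s worth of dead space at an edge. The Java below just performs that calculation; it is not a CSS technique.

```java
// Arithmetic sketch with hypothetical numbers: how wide each of n equal cells
// can be in a container of a given width, depending on where the gaps go.
final class ColumnSpacing {

    /** Gaps only between cells: n - 1 gaps, none at the outer edges. */
    static double widthWithGapsBetween(double containerWidth, int columns, double gap) {
        return (containerWidth - (columns - 1) * gap) / columns;
    }

    /**
     * The same gap applied to every cell (e.g. as right padding or margin):
     * n gaps, so one gap's worth of space ends up past the last cell.
     */
    static double widthWithGapOnEveryCell(double containerWidth, int columns, double gap) {
        return (containerWidth - columns * gap) / columns;
    }

    public static void main(String[] args) {
        System.out.println(widthWithGapsBetween(700, 7, 14));    // 88.0
        System.out.println(widthWithGapOnEveryCell(700, 7, 14)); // 86.0
    }
}
```

The 2px-per-cell difference is the space that ends up wasted at the edge, and it is why the first or last cell currently has to be special-cased.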

Now with CSS3 this will get easier thanks to the :last-child and :first-child pseudo-classes but honestly, why can’t we just have a CSS property on tables that controls the space between cells without putting space at the far left and far right?

10. Related Variable Height

Tables give you for free the ability for cells in the same row to have the same height. We need to be able to tell the browser that two (or more) cells will have exactly the same height without specifying a fixed height.

Suggestions of using display: table-cell (when supported widely enough) raise the inevitable question: well if you’re using table layout anyway, how is a div with table-cell display any better than a td?

11. Related Variable Width

Imagine you’re producing a report of some kind that has a number of tables on it. Those tables show essentially the same type of data but you might have one table per state, for example. Generally speaking you want those tables to have the same cell widths. The only way to do that currently is to give each cell a fixed width. This isn’t particularly flexible and if content in one table stretches the cell width the whole system breaks down anyway.

One (nasty) workaround to this is to generate the entire report as one table with, say, 7 columns (assuming the table per state has 7 columns). The table has no borders. The content in between the “tables” is merely a single row of that table with a colspan of (in this case) 7. That way the tables will lay out exactly the same and can be variable width too.

CSS3 needs to provide a better solution for “related” layout like this.

12. Anchoring Blocks to the Bottom of Containers

Placing content (particularly floats) at, say, the bottom right of a container is a real problem.

The Elephant in the Room

There is a lot of talk lately about CSS3 and HTML5. What we seem to be ignoring is that we’re still supporting IE6 and probably will be for at least a year or two more. To get HTML5/CSS3 support we’re looking at Chrome 4+ (3 partial), FF 4+ (3.5/3.6 partial) and IE9+.

How long will it be before those browsers (which mostly don’t exist yet) account for over 90% of users? It’s no wonder the HTML5 roadmap goes out to 2022. So does any of this matter? Won’t we simply be developing against HTML4 and CSS2(ish) for years to come?

Conclusion

I see a lot of talk about “cool” features like animations in CSS3. Personally I wish the W3C CSS working group would spend a bit more time dealing with real problems that cover common usage patterns.

But then again, even if they do, will it even matter?

It’s Time We Stopped Rewarding Projects with Crappy Documentation: Open Source is No Excuse

In the last month, I shared my experiences with Spring Batch—not all positive—which led me to ponder the political correctness of criticizing open source software. Time to move on.

This week I’ve been writing unit tests for Java. In the past I’ve always used JUnit for no other reason than it’s the first such unit testing framework I learned. When doing repeated tests, you have a couple of options.

Firstly, you can create a single test that loops across a list of inputs and expected results. For example:

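import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.List;

import org.junit.Test;

public class RepeatTest {
  // TestData and doSomething() are not shown; the case list is elided.
  private static final List<TestData> TESTS = Arrays.asList(...);

  @Test
  public void testData() {
    // One test method loops over every case; JUnit's expected value comes first.
    for (TestData data : TESTS) {
      assertEquals(data.getExpectedResult(), doSomething(data));
    }
  }
}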

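You get the idea. This, of course, is coarse-grained: all the cases pass or fail together as a single test, and the first failing assertion hides any failures after it.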
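JUnit 4’s Parameterized runner and TestNG’s data providers address this by turning each row of data into its own test. The plain-Java sketch below uses hypothetical names and is not either framework’s API. It shows only the reporting side of that idea: every case is run and each failure is reported individually, rather than the first failing assertion aborting the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Plain-Java sketch (not the JUnit or TestNG API): run every case and report
// each failure separately. Names are hypothetical.
final class PerCaseRunner {
    record Case(int input, int expected) {}

    /** Returns one message per failing case; an empty list means all passed. */
    static List<String> run(IntUnaryOperator subject, List<Case> cases) {
        List<String> failures = new ArrayList<>();
        for (Case c : cases) {
            int actual = subject.applyAsInt(c.input());
            if (actual != c.expected()) {
                failures.add("input " + c.input() + ": expected " + c.expected()
                        + " but was " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // A deliberately buggy "square" that is wrong for negative inputs.
        IntUnaryOperator buggySquare = x -> x < 0 ? -x * x : x * x;
        List<Case> cases = List.of(
                new Case(2, 4), new Case(-3, 9), new Case(0, 0), new Case(-1, 1));

        // Both failures (-3 and -1) are reported; an assertEquals loop would stop at -3.
        run(buggySquare, cases).forEach(System.out::println);
    }
}
```

In a real project you would let the framework do this, but the sketch shows why per-case reporting matters: you see the whole pattern of failures at once.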

Wherefore Art Thou, Documentation?

I had a recollection from years past that you could set up parameterized tests in JUnit (although I had to google to figure out that’s what they were called) so I went looking for info on it.

Just take one look at http://www.junit.org/. Seriously, take it all in. I have never in my life seen a site so woeful for a framework that is as commonly-used and allegedly mature as JUnit. Considering I’ve seen Apache projects whose documentation borders on the criminally negligent, that’s really saying something.

Take Log4J as a prime example. I will say that the introduction to Log4J is quite reasonable. The problem? That’s all there is. Sorry, let me retract that: there is documentation. You just have to pay for it. I’m not sure how much influence Ceki Gülcü, the author of that book and the person who started Log4J, still has (he has since moved on). But if anyone running a project (and thus deciding, to a large degree, what the priorities are and who does what) were also selling documentation for that same project, it would at the very least have the appearance of a conflict of interest.

But back to JUnit. Consider the documentation. The “cookbook” is only a cookbook in the sense that “tear lid and microwave on HIGH for 8 minutes” on the back of a frozen dinner can be considered a recipe. There is an example of what I was looking for in the Javadoc for the Parameterized runner, but the real answer came from the blogosphere, notably Writing JUnit 4 parameterized tests with varargs data and a few others.

I’ve previously evaluated TestNG as a unit testing framework. Although there didn’t seem to be anything wrong with it, there wasn’t anything really compelling either. Test dependencies and the like just seemed complicated to me and not how I tend to use unit testing. Plus, people generally know that JUnit works and know how to use it. Well, the basics anyway: @Before, @Test and @After, which cover 95%+ of cases.

It’s worth noting that TestNG has extensive support for parameterized tests.

It’s the Documentation, Stupid

That alone isn’t a compelling reason to switch, but I’ve now found one: documentation. I am gushing like a schoolgirl over TestNG’s documentation. There are examples of how to integrate it with different IDEs, not just Eclipse (a pet peeve of mine as a diehard IntelliJ IDEA user). The user guide is detailed and extensive. There is even integration documentation for Ant and Maven as well as a migration guide for JUnit (here is a comparison).

Documentation alone is reason enough that, in future, I will choose TestNG over JUnit without exception wherever I have a choice.

Some open source projects have excellent documentation. I consider Spring’s documentation to be the gold standard of OSS documentation. Even Spring Batch, which I criticized, has good documentation. It’s not perfect. Its coverage in some areas is a little light and it could use some more examples, but overall it’s pretty good. TestNG documentation is excellent.

This will inevitably invite several counterarguments.

RTFS

This one is predictable. Some will argue that the source code is sufficient documentation. Bollocks to that. While at some point it is inevitable you will end up reading or stepping through the source code of any framework or library you use for any non-trivial purpose, the fact that you have the source code is no substitute for high-level documentation that describes the overall architecture and design principles as well as how to get started and how to do common tasks.

DIY

Another popular defence of poor OSS practices is that you can get involved and do it yourself. While theoretically true, it is typically completely impractical. For one thing, before you can document something you have to know it. How do you learn it without reading the (non-existent) documentation?

Depending on the size and complexity of the project in question, it might take anything from days to months or even years to get up to speed (eg the Linux kernel). Even spending days getting up to speed on something that isn’t documented is typically time you don’t have, because you have a job you’re trying to get done.

It’s also unlikely to happen. Once someone figures out how to do something, what are the chances that they will then turn around and spend a few more days writing up some half-decent documentation? Chances are they’ll be so frustrated by their efforts to date that it’ll simply be time to move along to the next problem.

It’s Free

This is the one that bothers me the most. It comes up in any criticism of open source and is little more than a justification for laziness. The idea seems to be that you can’t complain if it didn’t cost you anything.

Oh but you can.

You see, many people approach open source projects as a means of self-promotion. They simply want to make a name for themselves. That’s fine. I have no problem with that (I am after all writing a blog). The problem is it’s not about you.

Depending on your job, your company and your jurisdiction, if you are a consultant of some kind and thus dealing with people who don’t know a lot about software development, you will typically have a duty of care to that party. You are making representations regarding your professionalism, capabilities, skills and/or experience. That other party has legal recourse for negligence (or even fraud) should you misrepresent yourself or fail to fulfil your duties.

When you create a library or framework or just some significant piece of code that you decide to share, you are making representations, implicit and/or explicit, regarding its efficacy. What’s more, your users will typically know a lot less than you.

What you should remember when you hand out “free” software is this: if that software doesn’t work as advertised, is unreliable, or has non-existent or misleading documentation, then you are doing its users harm. While you may not be legally liable for such deficiencies, you have an ethical responsibility to ensure what you deliver does what it says it does.

Conclusion

Open source is no excuse. It costing nothing is no excuse. Like doctors, we programmers should first do no harm. We don’t have a Hippocratic Oath but that doesn’t mean we shouldn’t do what’s right.

I for one am tired of accepting mediocre libraries with documentation that is coming Real Soon Now [tm] (“Under Construction”). We should not tolerate second-rate offerings. Reward projects like TestNG that have done the right thing by us.

I don’t care if writing documentation is not fun or sexy. I care that you can explain how it works and provide me with some confidence that it does what you claim it does. Otherwise I’m just not interested anymore.