Building Software Development Expertise – Using The Dreyfus Model

01/08/2009 1250 words 6 min read

I’ve recently been thinking about how we build our skills when we work in teams and, more to the point, how we as software developers become expert at what we do. Is it an organic process, or do most who gain expertise in particular areas of software development actively pursue the knowledge? Is it mostly about self-study, the environment, or some combination of both? The point is, if we can somehow get to the root and distill the special magic of how developers become experts, we can try to build this expertise within our teams in a much more targeted way, as opposed to waiting for it to happen without any direct intervention. Sounds intriguing, doesn’t it: ‘building’ experts rather than waiting for them to emerge? Most places will have some kind of training program, or ad-hoc training courses that you can go on, but I have never seen any team directly try to build up an expert (in a particular skill) within the team over the course of time. It would surely be a chancy undertaking. How can you tell if a person is even capable of gaining expertise and, even if they are, how do you measure the extent of this expertise? I am not the first person to ask these questions :), so let’s look at some of the existing body of knowledge on the subject.

The Dreyfus Model

In case you didn’t know, the Dreyfus model deals with how people attain and master skills. The Dreyfus model is applicable on a per-skill basis and recognizes five stages of development on the way to expert level.

Novice – at this stage a person doesn’t really know how to be successful at what they are doing, they don’t want the big picture, but instead want unambiguous rules that will allow them to successfully accomplish their tasks. They need quick successes and will be confused when anything out of the ordinary happens.
Advanced Beginner – want to find the information they need quickly. They have some context regarding what they are doing but are still not really interested in the big picture. They can use the skill at a more competent level than a novice but will still have trouble when things go wrong.
Competent – have the ability to see the big picture and how it would affect the outcomes. They can deal with problems when something unexpected occurs and are able to draw on their experience when necessary. However, their ability to reflect on what they do is still somewhat limited.
Proficient – need the big picture and will not function well without it. They can reflect on how well they are doing and draw upon their experience to improve. They can draw relevant information from the experience of others rather than taking it in as a whole.
Expert – no longer require rules and guidelines to function, will intuitively know what to do in a given situation. They will also constantly seek to find better ways of doing things.

If you want to know more about it you can always go to our second most favorite website (the first being Google :)). Basically, what the Dreyfus model postulates is that novices need strict rules and guidelines to succeed, while experts will be stifled by strict rules and will not be able to apply their expertise/intuition to the fullest. In a software development context this means that a strict process benefits beginners and makes them more effective at what they do, while more expert people need a more organic environment that lets them take advantage of their expertise. Andy Hunt examines the Dreyfus model in quite a bit of detail in his book, “Pragmatic Thinking And Learning: Refactor Your Wetware”, and he presents it in a very favorable light regarding its use in software development. While I find nothing wrong with the model itself, I do believe it misses a vital facet of the picture.

Does It Hold Up For Software Development

Applying the Dreyfus model, to software development especially, is potentially a great way to let the more expert people on a software team use their abilities to the fullest. However, it does nothing to help us build experts from the more junior people on the team! More than that, I believe that rigidly applying the Dreyfus model in the first two stages of development (novice and advanced beginner) will actually hurt a person’s chances of attaining expert-level knowledge in a particular skill.

Things are never that clear-cut when it comes to software. We may be lucky enough to have experts on our team, but on the other hand we may not. With the vast array of skills that are necessary in modern software development, most of the time the majority of people will be at most advanced beginners in a particular skill, and sometimes not even that. And anyway, how do you define a skill in the first place? There are no clear boundaries. Being an expert in one skill might make you competent or proficient in a dozen others, and you would not be well served by being treated as a novice when it comes to those skills.

I don’t believe you can build a healthy agile team if you attempt to treat your junior people like Dreyfus novices or advanced beginners, especially when you take the typical developer mindset into account. Ever looked at a piece of code and thought that you could do better, and were itching to try? Well, most developers have. It is just how we think, regardless of what our skill level is in that particular area. I believe the best thing you can do regarding the more junior people on your team is to treat them in a similar fashion to your experts. Show them the big picture; it is patronizing to think that they don’t want to know it or that it would confuse them. Put a mentoring safety net in place but allow your junior people room to grow and experiment. The people who truly have potential will bloom in this environment. The rule is guidance with a light hand, NOT a clear set of hard and fast rules. Rigid processes and clear, unambiguous guidelines are a crutch that no developer should allow themselves to get used to. No modern software project will have the luxury of complete clarity and zero ambiguity, and the sooner everyone gets used to that, the sooner they will become more effective and productive.

So Can We BUILD Experts

It is still very fuzzy. I don’t believe the Dreyfus model helps in any significant way to build expertise within a team. It will, however, do us good to keep this model in mind when creating an environment where a software team can thrive. Apart from relying on a developer’s personal drive to learn and get better, and providing support where needed, I don’t know of any other way to ensure that expertise develops within a software team. As usual, it comes down to trying to hire great people and trying to create the right kind of atmosphere where those people can succeed. The rest you leave to nature.

I would like to come back to this topic at some point, so if anyone has any tips or ideas regarding building expertise within individuals in a software team, I would love to hear what you have to say.

Image by Marco Bellucci

Stopping People From Switching Off During Standups

30/07/2009 906 words 5 min read

I sometimes find that as people work together in a team for a long period of time they tend to start attending stand-ups on autopilot. Instead of being a laser-focused status update for everyone regarding what is going on in the team, the standup becomes part of the daily routine BEFORE the real work starts (i.e. morning coffee, standup, check e-mail, then the REAL work starts). People stop engaging, they’re not fully there, and as a consequence the standup loses part of its value as people are simply waiting for it to be over. When you notice this happening in your team, it is time to do something different: change the format, change the location, change the time. The point is to find some creative way to snap people out of their groove and get them actively listening and participating in the standup.

The Traditional Standup

The traditional standup is when each member of the team in turn answers the 3 questions while everyone else listens.

What have they been doing since the previous standup?
What will they be doing before the next one?
Is there anything impeding them in their work?

While those questions are great, they do nothing to get people engaged and snap their mind to attention. If this is the format the team has been using for a while it may be partly responsible for putting everyone to sleep (this is the mind’s natural response to routine). This doesn’t mean we need to abandon the traditional standup format, but we do need to do something to liven things up a little. Here are some things that I would suggest.

If you don’t already use it, try getting a speaking token. Rather than going around the room, throw a ball to each other, whoever has the ball – speaks. People need to focus in case they are next to get the ball, so the mind becomes more active as a result. This is good, but can quickly become routine as well, especially if you allow people to call for the token rather than getting the previous speaker to pick the next one.
Introduce ‘homework’. I don’t mean anything fancy, but something like starting every standup with a joke is another idea. Every day a different person has to tell a joke to start the standup. It forces people to think about the standup outside the standup itself (they need to prepare a joke after all). The joke also snaps everyone else’s mind out of their groove and gets them listening, you will be able to complete the standup before this effect wears off. As a side-effect it also promotes team bonding.
Get people’s minds working by introducing a random twist into the standup order every day. For example, one day ask people to talk in alphabetical order by last name; they will need to work this out and are engaged as a result. The next day get them to do it by height. After that it might be in order of arrival time at work, etc. The point is to get people’s minds active rather than letting them coast on autopilot.
Lastly as I mentioned previously, who says the standup has to be in the same place, or even at the same time all the time. You can, for example, decide on a different standup time or place at the start of every iteration. One iteration is not long enough to get people into a routine. Try having a standup to end the day, rather than to start one, or maybe right before lunch, nobody said it HAD to be in the morning (of course changing standup time and place every day is not a good idea for obvious reasons).

Of course you don’t need to follow the traditional format at all if you find that you get little value from it or if people are simply bored with it.

Doing It Another Way (Story Focused Standups)

One of the different ways to have standups that has been suggested is to make them story focused rather than people focused (see this post by Dave Nicolette and this infoq post). Here, rather than each person reporting on what they did, what they will do, etc., the team gathers around the task board and examines each of the cards that are in play at the moment. Dave calls this “walking the board”. One of the ways to do this is to designate a ‘champion’ for each card. It would be their job to report on the progress of that card while it is in play. Of course you could also just try to make this organic, where anyone who has something relevant to say about the card just pipes in. The danger to guard against here is lack of focus (trying to solve the issues then and there). Remember that it is still a standup, so keep it short and to the point. Regardless of how you do it, it is certainly a different way to have your daily standup, which can revitalize the experience and raise the level of engagement if you find that your team is lacking energy during standups.

Do you know of/use other ways to hold a standup that are different from the traditional approach? Or maybe you have more tips to make a traditional standup more fun. If you do, please share your thoughts in the comments.

Image by tskdesign

How To Write A Simple Web Crawler In Ruby

28/07/2009 1249 words 6 min read

I had an idea the other day, to write a basic search engine – in Ruby (did I mention I’ve been playing around with Ruby lately?). I am well aware that there are perfectly adequate ruby crawlers available to use, such as RDig or Mechanize. But I don’t want to use any libraries for the higher level functions of my search engine (crawling, indexing, querying etc.), at least not for the first version, since the main idea is to learn (while doing something fun and interesting) and the best way to learn is to sometimes do things the hard way. Now that I have ensured I don’t get eaten alive for not reusing existing code, we can move on :).

I will examine all the different aspects of what makes a search engine (the anatomy) in a later post. In the meantime I believe doing something like this gives you an opportunity to experience first-hand all the different things you have to keep in mind when writing a search engine. It gives you a chance to learn why we do SEO the way we do; it lets you play with different ruby language features, database access and ranking algorithms, not to mention simply cut some code for the experience. And you get all this without touching Rails. Nothing against Rails, but I prefer to get comfortable with Ruby (by itself) first.

Well, to dive right into it, I decided to write the crawler first. After all, you can’t have a search engine without a nice web crawler. First things first, some basic features:

should be able to crawl the web (basic general functionality)
must be able to limit the depth of the crawl (otherwise it will potentially keep going forever and will also eventually run out of memory because of the way it’s written)
must be able to limit number of pages to crawl (even with a depth limited crawl, might still have too many pages to get through depending on the starting set of domains)
must be able to crawl just a single domain (you would also be able to limit this by number of pages)
the only output it will produce is to print out the fact that it is crawling a url

If you want to dive right into the code and explore you can download it here. For the rest, here is how it works. Firstly to run it do the following:

ruby search-engine-main.rb -c web -d 3 -p 100 -f 'urls.txt'

where:

-c (is either ‘web’ or ‘domain’)

-d (is the depth of the crawl, it will only look at links this many levels below the initial urls)

-p (is the page limit, will not crawl more than this many pages regardless of other parameters)

-f (the file to use as input, simply contains a list of starting urls on separate lines, will use the first one from this file for domain crawl)
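The CommandLineArgumentParser class itself isn’t shown in this post, so here is a minimal sketch of how those four flags could be parsed with Ruby’s standard optparse library. Treat it as an illustration only: the method name, the long option names and the defaults for -c and -f are assumptions (the depth and page limit defaults simply mirror the crawl_web signature further down).

```ruby
require 'optparse'

# Hypothetical stand-in for the post's CommandLineArgumentParser, whose source
# is not shown. It only mirrors the four flags described above.
def parse_crawler_options(argv)
  # Assumed defaults: 'web' and 'urls.txt' are guesses made for this sketch.
  options = { crawl_type: 'web', crawl_depth: 2, page_limit: 100, url_file: 'urls.txt' }

  OptionParser.new do |opts|
    opts.banner = 'Usage: ruby search-engine-main.rb [options]'
    # The array restricts -c to the two crawl types; anything else raises an error.
    opts.on('-c', '--crawl TYPE', %w[web domain], "Crawl type: 'web' or 'domain'") do |type|
      options[:crawl_type] = type
    end
    opts.on('-d', '--depth N', Integer, 'Depth of the crawl') { |n| options[:crawl_depth] = n }
    opts.on('-p', '--pages N', Integer, 'Maximum number of pages') { |n| options[:page_limit] = n }
    opts.on('-f', '--file PATH', 'File with one starting url per line') { |path| options[:url_file] = path }
  end.parse(argv)

  options
end

# Usage: the same arguments as the command line shown above.
options = parse_crawler_options(%w[-c web -d 3 -p 100 -f urls.txt])
```

Passing Integer as the argument type makes optparse reject non-numeric values for -d and -p, which saves writing that validation by hand.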

Our nice and clean entry point looks like this :):

    argument_parser = CommandLineArgumentParser.new
    argument_parser.parse_arguments
    spider = Spider.new
    url_store = UrlStore.new(argument_parser.url_file)
    spider.crawl_web(url_store.get_urls, argument_parser.crawl_depth, argument_parser.page_limit) if argument_parser.crawl_type == CommandLineArgumentParser::WEB_CRAWLER
    spider.crawl_domain(url_store.get_url, argument_parser.page_limit) if argument_parser.crawl_type == CommandLineArgumentParser::DOMAIN_CRAWLER

We don’t really care about this too much since this is not where the real fun bits are, so let’s move on to those.

The Spider

The main worker class here is the Spider class. It contains 2 public methods, crawl_web and crawl_domain. Crawling the web looks like this:

    def crawl_web(urls, depth = 2, page_limit = 100)
      # each pass of this loop crawls one level of links
      depth.times do
        next_urls = []
        urls.each do |url|
          url_object = open_url(url)
          next if url_object.nil?
          # if the server redirected us, continue with the url we ended up on
          url = update_url_if_redirected(url, url_object)
          parsed_url = parse_url(url_object)
          next if parsed_url.nil?
          @already_visited[url] = true if @already_visited[url].nil?
          # stop the whole crawl once the page limit is reached
          return if @already_visited.size >= page_limit
          # queue the links on this page that we have not visited yet
          next_urls += (find_urls_on_page(parsed_url, url) - @already_visited.keys)
          next_urls.uniq!
        end
        urls = next_urls
      end
    end

As you can see it is not recursive as this makes it easier to limit the depth of the crawl. You can also tell that we take special care to handle server side redirects. The class also keeps the urls already visited in memory, so as to guard against us getting into loops visiting the same several pages over and over. This is not the most efficient way, obviously, and will not scale anywhere past a few thousand pages, but for our simple crawler this is fine. We can improve this later.
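To see the depth and page limits in isolation, here is a small self-contained sketch of the same level-by-level idea, run against a made-up in-memory link graph instead of the real web. The LINKS table and the crawl_levels method are hypothetical and exist only for this illustration; a Set stands in for the @already_visited hash.

```ruby
require 'set'

# Hypothetical in-memory "web": each page maps to the pages it links to.
# The single-letter names are placeholders for real urls.
LINKS = {
  'a' => %w[b c],
  'b' => %w[a d],
  'c' => %w[d],
  'd' => %w[e],
  'e' => []
}.freeze

# Level-by-level crawl: each pass of the outer loop goes one level deeper.
# Returns the visited pages in the order they were crawled.
def crawl_levels(start_urls, links, depth = 2, page_limit = 100)
  visited = Set.new
  urls = start_urls
  depth.times do
    next_urls = []
    urls.each do |url|
      next if visited.include?(url)
      # stop as soon as the page limit has been reached
      return visited.to_a if visited.size >= page_limit
      visited << url
      next_urls.concat(links.fetch(url, []))
    end
    # only pages we have not seen yet go into the next level
    urls = next_urls.uniq - visited.to_a
  end
  visited.to_a
end

# Usage: start at 'a' and go two levels deep.
pages = crawl_levels(['a'], LINKS, 2)
```

Because each level is collected into next_urls before the loop moves on, the depth limit is just the number of passes, which is the reason a non-recursive version is convenient here.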

Crawling a domain looks like this:

    def crawl_domain(url, page_limit = 100)
      return if @already_visited.size >= page_limit
      url_object = open_url(url)
      return if url_object.nil?
      parsed_url = parse_url(url_object)
      return if parsed_url.nil?
      @already_visited[url] = true if @already_visited[url].nil?
      page_urls = find_urls_on_page(parsed_url, url)
      page_urls.each do |page_url|
        if urls_on_same_domain?(url, page_url) && @already_visited[page_url].nil?
          # pass the limit along, otherwise recursive calls fall back to the default of 100
          crawl_domain(page_url, page_limit)
        end
      end
    end

This time the algorithm is recursive as it is easier to crawl a domain this way and we don’t need to limit the depth. Everything else is pretty much the same except we take special care to only crawl links on the same domain and we no longer need to care about redirection.
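The urls_on_same_domain? check is not shown in the post, so the following is only a guess at what such a helper could look like, using the standard uri library to compare host names. Note that this sketch treats www.example.com and example.com as different domains, which may or may not be what you want.

```ruby
require 'uri'

# Hypothetical version of the same-domain check (the post's own helper is not shown).
# Two urls count as "same domain" when their host names match, ignoring case.
def urls_on_same_domain?(url1, url2)
  host1 = URI.parse(url1).host
  host2 = URI.parse(url2).host
  # a url without a host (e.g. a relative link) never matches
  !host1.nil? && host1.casecmp(host2.to_s).zero?
rescue URI::InvalidURIError
  # anything the uri library cannot parse is treated as "not the same domain"
  false
end

# Usage
same = urls_on_same_domain?('http://example.com/a', 'http://example.com/b/c.html')
```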

To get the links from the page I use Hpricot. I know I said I didn’t want to use too many libraries, but parsing html by hand would just be torture :). Here is how I find all the links:

    def find_urls_on_page(parsed_url, current_url)
      urls_list = []
      parsed_url.search('a[@href]').each do |x|
        # drop any #fragment; a bare fragment link leaves nil or an empty string
        new_url = x['href'].split('#')[0]
        next if new_url.nil? || new_url.empty?
        new_url = make_absolute(current_url, new_url) if relative?(new_url)
        urls_list.push(new_url)
      end
      urls_list
    end

The challenging part here is handling the relative urls and making them absolute. I didn’t do anything fancy here. There is a helper module that I created (UrlUtils – yeah I know, great name :)), this doesn’t have anything too interesting, just blood, sweat and string handling code. That’s pretty much all there is to it. Now there are a few points that we need to note about this crawler.
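On the relative url handling: the UrlUtils module does this with hand-written string code that isn’t reproduced here. As a point of comparison, and not as the approach used in the download, the standard uri library can do the same job with URI.join. The absolute_link name below is made up for this sketch.

```ruby
require 'uri'

# Hypothetical alternative to hand-rolled relative url handling.
# Resolves href against the url of the page it was found on and drops
# any #fragment. Returns nil when there is nothing usable to crawl.
def absolute_link(current_url, href)
  target = href.to_s.split('#').first
  return nil if target.nil? || target.empty?
  URI.join(current_url, target).to_s
rescue URI::Error
  # malformed links are simply skipped
  nil
end

# Usage: a link found on a blog post page.
link = absolute_link('http://example.com/blog/post.html', '../index.html')
```

URI.join leaves links that are already absolute untouched, so there is no need for a separate relative? check before calling it.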

It is not an ‘enterprise strength’ type of crawler, so don’t go trying to unleash it on the whole of the web, and do make liberal use of the depth and page limiters. I wouldn’t try to get it to handle more than a few thousand pages at a time (for reasons I noted above). Some other limitations are as follows:

won’t work behind proxy
won’t handle ftp, https(maybe) etc. properly
client side redirects – you’re out of luck
if link not in href attribute, i.e. javascript – no good
relative url handling could probably be improved some more
will not respect no-follow and similar rules that other bots will e.g. robots.txt
if list of starting urls too big, can grind it to a halt
probably a whole bunch of other limitations that I haven’t even begun thinking of
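To give a feel for what respecting robots.txt would involve, here is a deliberately naive sketch. It is not part of the crawler, both method names are made up, and it only understands Disallow lines that directly follow a single “User-agent: *” line, treating each one as a plain path prefix. Real robots.txt files can also contain Allow rules, wildcards and groups that list several user agents, none of which are handled here.

```ruby
# Hypothetical helper: collects the Disallow prefixes that apply to every
# crawler ("User-agent: *"). Everything else in the file is ignored.
def disallowed_prefixes(robots_txt)
  prefixes = []
  applies = false
  robots_txt.each_line do |line|
    line = line.sub(/#.*/, '').strip # remove comments and surrounding whitespace
    next if line.empty?
    field, value = line.split(':', 2).map(&:strip)
    case field.downcase
    when 'user-agent'
      applies = (value == '*')
    when 'disallow'
      # an empty Disallow means "nothing is disallowed", so skip it
      prefixes << value if applies && !value.to_s.empty?
    end
  end
  prefixes
end

# Hypothetical helper: a path is allowed when no disallowed prefix matches it.
def allowed_by_robots?(path, prefixes)
  prefixes.none? { |prefix| path.start_with?(prefix) }
end

# Usage with a small robots.txt given as a string.
rules = disallowed_prefixes("User-agent: *\nDisallow: /private/\n")
ok = allowed_by_robots?('/public/page.html', rules)
```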

Of course there is an easy way to make this guy a little bit more scalable and useful – introduce indexing. This would be the next thing to do, because even a simple little search engine needs some indexing. It would allow us to stop storing so much stuff in memory and also retain more info about pages, which would potentially let us make better decisions, but that’s a story for another time :). Finally, it probably goes without saying that I didn’t test this guy exhaustively and I only ran it on XP (too lazy to boot into my Linux :)). Anyway, have a play with it if you like, and feel free to suggest improvements and point out issues (or just say hello in general) while I start thinking about an indexer.

You can download all the code here.

Images by jpctalbot and mkreyness