Distributed Software Development
Posted by pablo santos, Jun 15 2008
Distributed Software Development is all about enabling teams and individuals to seamlessly work through the Internet as if they were sitting close to each other, even when they’re worlds apart.
Software Configuration Management (SCM) systems play a key role as the cornerstones of asset distribution and sharing among the team members.
A simple picture of the software development tool stack shows the SCM sitting very deep in the chain. It is the cornerstone around which the whole development process is built. The SCM distributes the code, diagrams, documents, design material, help files: everything that makes up a software project. Tools like compilers, build systems, debuggers, IDEs and profilers then take these assets to help developers create the software.
Any change made with these tools is controlled by the SCM and made available to the rest of the team on demand, subject to certain constraints.
Even when the whole team sits in the same room, it works on a network. Documents, sources, images, HTML files and design diagrams all have to travel through the network from one computer to another.
All changes are tracked by the SCM server, which resides on a separate machine on the network. It is the centerpiece that keeps the development team flowing.
And then the Internet enters the scene:
- Different teams can be in very distant locations, no longer in the same room or even the same building, but they still need to work on the same code base, sharing changes and evolving the software.
- Developers can be located at the client's site, making specific changes based on detailed feedback given on-site.
The SCM server still has to provide the same range of services to the developers, but now they're not on the same network, not even in the same network domain.
The first option is simple: if the team relies on network facilities, it can keep working over a virtual private network (VPN) even when its members are worlds apart.
Unfortunately, the story is not that simple.
- Network connections can be unreliable or slow, making remote team members' work slow, error-prone and unproductive.
- Direct connection to the SCM server can be discouraged by security restrictions, or simply be unavailable.
Of course, developers can still work on their local copies of the code, but then we can't claim they benefit from the same facilities they had when they sat in the office. They don't even have the basics!
DSD
Beyond methodologies, best practices or the programming language of choice, there's something that really makes a difference: you need a tool to collaborate. You modify some code at your office at
DSD is all about versioning. There are other related issues, of course, from project management challenges to dealing with different time zones and a variety of cultures all around the world. But primarily, you have to create an environment where people can work almost as if they were all sitting together in the same room, at least code-wise, even when they're worlds apart.
DSD approaches
So far the industry has provided a range of solutions to the distributed software development problem, from VPN-based setups to fully distributed SCM.
- Centralized SCM: this is the conventional approach. A single server hosts all the changes and coordinates all the developers' efforts. When multiple separate locations enter the scene, the system relies on the networking facilities to keep providing service to the users. VPNs are a common choice in this scenario; another option is an Internet-facing server, where the client connects directly to the server over the Internet without a VPN. This centralized approach inherits all the drawbacks of the underlying network infrastructure: if the network goes down, developers are directly affected. SVN, SourceSafe and CVS are clear examples of this alternative.
- Proxy-based multi-site: the central server is helped by a set of proxy servers that act as data caches. When the network goes down, clients can still access copies of the data through their local caches. The benefit is an enhanced ability to work in disconnected scenarios. The downside is that proxies aren't full servers, so they normally support read access only, not write operations. If they implement a delayed operation mode they can also cache write operations (like check-ins), but concurrent changes aren't allowed. It is a clear step ahead in terms of resilience to network problems, but it still doesn't let developers work seamlessly when the connection is down, and change reconciliation is not supported. Proxy servers are designed to be deployed per site, not per developer, so roaming users working on laptops aren't supported. Systems like Perforce, Team Foundation Server and AccuRev are examples of this proxy-based approach.
- Mastership-based multi-site: a server is installed at each remote location and changes are replicated back and forth with some restrictions. The replication unit is usually the branch, and the strong restriction is that one site is set as the owner (master) of a given branch or set of branches at a time. Developers are free to make changes at their site as long as that site masters the branch they're using. This avoids concurrent change conflicts between sites. A file or directory on a given branch can only be modified at one site at a time, which greatly reduces the problem. It is an advantage over proxy-based replication because it creates the illusion of unrestricted changes at distant sites, as long as a set of mastership (or ownership) rules is enforced. Teams at separate locations can work on the same code base and set up their servers to replicate regularly, which ensures every site has the right sources. Network problems are no longer an issue, as teams can keep working even when the connection is down. The downside is that replicated servers are designed to be deployed one per site. Developers can't each run their own server because the servers are resource-heavy, so roaming developers are still not supported. ClearCase MultiSite is a clear example of this approach.
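To make the proxy-based behavior described above more concrete, here is a minimal, hypothetical sketch in Python. The class and method names (`CentralServer`, `CachingProxy`, `flush`) and the per-file revision counter are illustrative assumptions. They do not reflect the API of Perforce, Team Foundation Server or AccuRev. The proxy serves reads from its cache when the server is unreachable and queues check-ins in a delayed operation mode. When the queue is replayed it refuses any change whose base revision is stale, because no reconciliation is supported.

```python
from dataclasses import dataclass, field


class NetworkDown(Exception):
    """Raised when the central server cannot be reached."""


class ConcurrentChange(Exception):
    """Raised when a check-in is based on a revision that is no longer current."""


@dataclass
class CentralServer:
    # Hypothetical toy server: path -> (revision, content)
    files: dict = field(default_factory=dict)
    online: bool = True

    def read(self, path):
        if not self.online:
            raise NetworkDown(path)
        return self.files[path]

    def checkin(self, path, content, base_rev):
        if not self.online:
            raise NetworkDown(path)
        current_rev = self.files.get(path, (0, None))[0]
        if current_rev != base_rev:
            # Someone else changed the file meanwhile; there is no merge support.
            raise ConcurrentChange(path)
        self.files[path] = (current_rev + 1, content)
        return current_rev + 1


@dataclass
class CachingProxy:
    server: CentralServer
    cache: dict = field(default_factory=dict)    # path -> (revision, content)
    pending: list = field(default_factory=list)  # queued (path, content, base_rev)

    def read(self, path):
        try:
            # Refresh the local copy whenever the server is reachable.
            self.cache[path] = self.server.read(path)
        except NetworkDown:
            # Disconnected: fall back to the cached copy, if there is one.
            if path not in self.cache:
                raise
        return self.cache[path][1]

    def checkin(self, path, content):
        base_rev = self.cache.get(path, (0, None))[0]
        try:
            rev = self.server.checkin(path, content, base_rev)
            self.cache[path] = (rev, content)
        except NetworkDown:
            # Delayed operation mode: remember the write for later.
            self.pending.append((path, content, base_rev))

    def flush(self):
        # Replay queued check-ins in order once the server is back.
        # A stale base revision raises ConcurrentChange and stops the replay.
        while self.pending:
            path, content, base_rev = self.pending[0]
            rev = self.server.checkin(path, content, base_rev)
            self.cache[path] = (rev, content)
            self.pending.pop(0)
```

In this sketch a queued check-in succeeds only if nobody else touched the file while the link was down. Any real concurrent change surfaces as an error rather than a merge, which mirrors the limitation described for proxy-based systems.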
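The mastership rule described above can be illustrated with another small, hypothetical Python sketch. The names (`MultiSite`, `checkin`, `transfer`, `replicate`) and the site names are assumptions for illustration only. This is not how ClearCase MultiSite is implemented. Each branch has exactly one master site, only that site may add changes, and replication copies the master's history to every other site. Because only one site writes to a branch, the copies never diverge and no merge is needed.

```python
from dataclasses import dataclass, field


class NotMaster(Exception):
    """Raised when a site tries to change a branch it does not master."""


@dataclass
class Site:
    name: str
    # branch -> ordered list of changesets (this site's replica)
    branches: dict = field(default_factory=dict)


@dataclass
class MultiSite:
    sites: dict = field(default_factory=dict)       # site name -> Site
    mastership: dict = field(default_factory=dict)  # branch -> master site name

    def add_site(self, name):
        self.sites[name] = Site(name)

    def create_branch(self, branch, owner):
        self.mastership[branch] = owner
        for site in self.sites.values():
            site.branches.setdefault(branch, [])

    def checkin(self, site_name, branch, change):
        # Core rule: only the site that masters the branch may write to it.
        if self.mastership[branch] != site_name:
            raise NotMaster(f"{site_name} does not master {branch}")
        self.sites[site_name].branches[branch].append(change)

    def transfer(self, branch, new_owner):
        # Hand over the branch together with its latest history, so the
        # new master never writes on top of a stale replica.
        old_owner = self.mastership[branch]
        latest = self.sites[old_owner].branches[branch]
        self.sites[new_owner].branches[branch] = list(latest)
        self.mastership[branch] = new_owner

    def replicate(self):
        # Push every branch from its master to all other sites.
        for branch, owner in self.mastership.items():
            master_copy = self.sites[owner].branches[branch]
            for site in self.sites.values():
                site.branches[branch] = list(master_copy)
```

For example, if "madrid" masters `main`, a check-in attempted from "boston" raises `NotMaster` until mastership is transferred. After the transfer, "boston" can commit, and the next replication brings its changes back to "madrid".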