Always nice to read a new retelling of this old story.
TFA throws some shade at how "a single get of the office repo took some hours" then elides the fact that such an operation was practically impossible to do on git at all without creating a new file system (VFS). Perforce let users check out just the parts of a repo that they needed, so I assume most SD users did that instead of getting every app in the Office suite every time. VFS basically closes that gap on git ("VFS for Git only downloads objects as they are needed").
Perforce/SD were great for the time and for the centralised VCS use case, but the world has moved on I guess.
Some companies have developed their own VFS-like technology for use with Perforce, so you can check out the entire suite of applications but only pull the files when you try to access them in a specific way. This is a lot more important in game development, where massive binary source assets are stored alongside text files.
It uses the same technology built into Windows that remote-drive programs (probably) use.
Personally I kind of still want some sort of server-based VCS which can store your entire company's source without needing to keep the entire history locally when you check something out. But unfortunately git is still good enough to use on an ad-hoc basis between machines for me that I don't feel the need to set up a central server and CI/CD pipeline yet.
Also being able to stash, stage hunks, and interactively rebase commits are features that I like and work well with the way I work.
Doesn't SVN let you check out and commit any folder or file at any depth of a project you choose? Maybe not the checkouts and commits, but the log history for a single subtree is something I miss from the SVN tooling.
You can indeed. The problem with this strategy is that now you need to maintain the list of directories that need to be checked out to build each project. And unless this is automated somehow, the documentation will gradually diverge from reality.
Can you not achieve the log history on a subtree with `git log my/subfolder/`? Tools like TortoiseGit let you right click on a folder and view the log of changes to it.
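For example, something like this (paths are hypothetical, just to illustrate):

$ git log --oneline -- src/excel/
# only commits that touched that subtree
$ git log -p --follow -- src/excel/charts.c
# per-file history, following renames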
Yes, you can, but the point is that in a git repo you store the entire history locally, so whenever you clone a repo, you clone its history on at least one branch.
So when you have a repo that's hundreds of GB in size, the entire history can be massive.
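For what it's worth, newer git versions can avoid that up-front cost; a rough sketch (the URL is a placeholder):

$ git clone --depth 1 https://example.com/huge-repo.git
# shallow clone: only the most recent commit's history
$ git clone --filter=blob:none https://example.com/huge-repo.git
# "blobless" partial clone: full commit history, but file contents are fetched on demand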
My firm still uses Perforce and I can't say anyone likes it at this point. You can almost see the light leave the eyes of new hires when you tell them we don't use git like the rest of the world.
I moved to Google from Microsoft and back when employee orientation involved going to Mountain View and going into labs to learn the basics, it was amusing to see fresh college hires confused at not-git while I sat down and said "It's Source Depot, I know this!"[1]
Yeah it's an issue for new devs for sure. TFA even makes the point, "A lot of people felt refreshed by having better transferable skills to the industry. Our onboarding times were slashed by half".
Interesting to hear it was so much of a problem in terms of onboarding time. Maybe Source Depot was particularly weird, and/or MS were using it in a way that made things particularly complicated? Perforce has never felt especially difficult to use to me, and programmers never seem to have any difficulty with it. Artists and designers seem to pick it up quite quickly too. (By and large, in contrast to programmers, they are less in the habit of putting up with the git style of shit.)
> Interesting to hear it was so much of a problem in terms of onboarding time. Maybe Source Depot was particularly weird, and/or MS were using it in a way that made things particularly complicated?
It was not. It was literally a fork of Perforce with the executable renamed from p4 to sd.exe. The command line was pretty much identical.
It's a fairly typical Unix tool, but: worse! It's got a rather cryptic and unhelpful command line UI that's often fiddly to use, rather requires you to understand its internals in order to work with it, and has lots of perplexing failure modes. There are GUIs for it, which do help, but people moan at you if you use them, and/because they're often opinionated and oriented towards some specific workflow or other. And they don't really stop it being fiddly to use, you still need to bear in mind how it works internally to use it, and the perplexing failure modes remain.
It's a generalisation, but, by and large, artists and designers don't enjoy using these sorts of tools. Also, they are more likely to be working with unmergeable files, something that git isn't really designed to work with.
(Programmers often don't like these sorts of tools either, but - again, a generalisation - they're harder to avoid if you're a programmer, so the average one is typically a bit more practised at putting up with this crap.)
I cannot believe that new hires would be upset by the choice of version control software. They joined a new company after jumping through so many hoops; it's on them to keep an open mind towards the processes and tools at the new company.
I feel like I’ve got an open mind towards processes and tools; the problem with a company using anything other than Git at this point is that unless they have a good explanation for it, it’s not going to be an indicator that the company compared the relative merits of VCS systems and chose something other than Git - it’s going to be an indicator that the company doesn’t have the bandwidth or political will to modernize legacy processes.
Yeah but as a new hire, one doesn't yet know whether there is a good explanation for using a non-git tool. It takes time to figure that out.
A legacy tool might be bad, or it might be very good but just unpopular. A company that devotes political will to modernize for the sake of modernizing is the kind of craziness we get in the JS ecosystem.
> cried of happiness when we moved to git from SVN
You bring back some real memories with this phrase.
I recall moving from RCS and CVS and thinking: "Oh, this is a real improvement." (Forgive me; I was young.)
Then, I moved from CVS to SVN and thought: "This is revolutionary. I can rename and merge." (Again, please withhold throwing your tomatoes until the end of the story!)
Later, projects grew large enough that SVN became horrible because of how I/O worked on my PCs/networks (file by file, long before 1G networking to developer PCs, and on horribly slow spinning disks).
The upgrade to Git was epic. Sure, the commands are weird and complex. More than 10 years later, I still need to look up a command once a month, but it beats the pants off anything before. Hat tip to all the people (Hamano-san, et al.) who have contributed to Git in the last 20 years. My life as a developer is so much better for it.
> I cannot believe that new hires would be upset by the choice of version control software.
I can, if the version control software is just not up to standards.
I absolutely didn't mind using Mercurial/hg, even though I literally hadn't touched it until that point and knew nothing about it, because it is actually pretty good. I like it more than git now.
Git is a decent option that most people would be familiar with, cannot be upset about it either.
On another hand, Source Depot sucked badly, it felt like I had to fight against it the entire time. I wasn’t upset because it was unfamiliar to me. In fact, the more familiar I got with it, the more I disliked it.
You missed the point. The point is, a new hire being asked to use a new version control system likely doesn't know whether the tool is more like hg or more like Source Depot. It takes time to figure that out. So they need to reserve judgement.
The problem is that you come to a prestigious place like Microsoft and end up using horrible outdated software.
Credit where credit is due at my time at Excel we did improve things a lot (migration from Script# to TypeScript, migration from SourceDepot to git, shorter dev loop and better tooling etc) and a large chunk of development time was spent on developer tooling/happiness.
But it does suck to have to go to one of the old places and use Source Depot and `osubmit` (the "make a change" tool), and then go through 16 popups in the "happy path" to submit your patch for review (also done in a weird Windows GUI review tool).
Perforce is sufficiently idiosyncratic that it's kinda annoying even when you remember the likes of SVN. Coming to it from Git is a whole world of pain.
But they are Analysts and know corporate speak and are really good at filling their schedules with meetings! They must be so busy doing very meaningful work!
Discord I get, at least from a community or network effect, but Bitbucket? I can’t figure out why anyone but a CTO looking to save a buck would prefer Bitbucket.
We use BitBucket where I work. Due to certain export regulations it's simpler for us to keep as many services as possible on-prem if they're going to contain any of our intellectual property, so BitBucket Server it is. There are other options of course, but all of the cloud solutions were off the table.
I wouldn't know, the choice was made years before I joined the company, and I haven't come across anyone who told me what considerations went in back then.
Nothing prevents them from running a GPU locally or on their own infra.
I was asking because I wonder what the enterprises that want to both use AI (like LLMs) in their workflows and keep their data and pipelines 100% air-gapped and owned are doing right now.
Feels to me like one of the few areas where you can compete with the big labs; I might be wrong.
External AI are banned, local or otherwise on-prem models are allowed. We're currently experimenting with some kind of llama instance running on one of our servers, but I personally don't use it much.
VFS does not replace Perforce. Most AAA game companies still use Perforce. In particular, they need locks on assets so two people don't edit them at the same time, which would leave an unmergeable change and waste time as one artist has to throw their work away.
I'm a bit surprised git doesn't offer a way to checkout only specific parts of the git tree to be honest. It seems like it'd be pretty easy to graft on with an intermediate service that understands object files, etc.
I spent nearly a week of my Microsoft internship in 2016 adding support for Source Depot to the automated code reviewer that I was building (https://austinhenley.com/blog/featurestheywanted.html) despite having no idea what Source Depot was!
Quite a few devs were still using it even then. I wonder if everything has been migrated to git yet.
With a product like this that spans many decades, would the source repo contain all of these versions and the changes over time? For instance Word 97, 2000, 2003, 2007, etc.
I would hope they forked the repo for each new version, to keep the same core but being free to refactor huge parts without affecting previous versions.
VSS was picked up via the acquisition of One Tree Software in Raleigh. Their product was SourceSafe, and the "Visual" part was added when it was bundled with their other developer tools (Visual C, Visual Basic, etc). Prior to that Microsoft sold a version control product called "Microsoft Delta" which was expensive and awful and wasn't supported on NT.
One of the people who joined Microsoft via the acquisition was Brian Harry, who led the development of Team Foundation Version Control (part of Team Foundation Server - TFS) which used SQL Server for its storage. A huge improvement in manageability and reliability over VSS. I think Brian is retired now - his blog at Microsoft is no longer being updated.
From my time using VSS, I seem to recall a big source of corruption was its use of network file locking over SMB. If there were a network glitch (common in the day) you'd have to repair your repository. We set up an overnight batch job to run the repair so we could be productive in the mornings.
I used VSS in the 90s as well, it was a nightmare when working in a team. As I recall, Microsoft themselves did not use VSS internally, at least not for the majority of things.
Yes, I used VSS as a solo developer in the 90s. It was a revelation at the time. I met other VCS systems at grad school (RCS, CVS).
I started a job at MSFT in 2004 and I recall someone explaining that VSS was unsafe and prone to corruption. No idea if that was true, or just lore, but it wasn't an option for work anyway.
The integration between SourceSafe and all of the tools was pretty cool back then. Nothing else really had that level of integration at the time. However, VSS was seriously flaky. It would corrupt randomly for no real reason. Daily backups were always being restored in my workplace. Then they picked PVCS. At least it didn't corrupt itself.
I think VSS was fine if you used it on a local machine. If you put it on a network drive things would just flake out. It also got progressively worse as newer versions came out. Nice GUI, very straightforward to teach someone how to use (check out a file, change it, check it in, like a library book), random corruptions: that about sums up VSS. That check-out/check-in model seems simpler for people to grasp. The virtual/branch model most of the other systems use is kind of a mental block for many until they grok it.
My memory is fuzzy on this but I remember VSS trusting the client for its timestamps and everything getting corrupted when someone's clock was out of sync. Which happened regularly because NTP didn't work very well on Windows back in the early 2000s.
It's an absurd understatement. The only people that seriously used VSS and didn't see any corruption were the people that didn't look at their code history.
I used VSS for a few years back in the late 90's and early 2000's. It was better than nothing - barely - but it was very slow, very network intensive (think MS Access rather than SQL), it had very poor merge primitives (when you checked out a file, nobody else could change it), and yes, it was exceedingly prone to corruption. A couple times we just had to throw away history and start over.
SourceSafe had a great visual merge tool. You could enable multiple checkouts. VSS had tons of real issues but not enabling multiple checkouts was a pain that companies inflicted on themselves. I still miss SourceSafe's merge tool sometimes.
Have you used Visual Studio's git integration? (Note that you could just kick off the merge elsewhere and use VS to manage the conflicts, then commit from back outside. Etc.)
As I recall, one problem was you got silent corruption if you ran out of disk space during certain operations, and there were things that took significantly more disk space while in flight than when finished, so you wouldn’t even know.
When I was at Microsoft, Source Depot was the nicer of the two version control systems I had to use. The other, Source Library Manager, was much worse.
Is that what inspired the "Exchange: The Most Feared and Loathed Team in Microsoft" license plate frames? I'm probably getting a bit of the wording wrong. It's been nearly 20 years since I saw one.
And sometimes they loved MSMAIL for the weirdest reasons...
MSMAIL was designed for Win3.x. Apps didn't have multiple threads. The MSMAIL client app that everyone used would create the email to be sent and store the email file on the system.
An invisible app, the Mail Pump, would check for email to be sent and received during idle time (N.B. Other apps could create/send emails via APIs, so you couldn't have the email processing logic in only the MSMAIL client app).
So the user could hit the Send button and the email would be moved to the Outbox to be sent. The mail pump wouldn't get a chance to process the outgoing email for a few seconds, so during that small window, if the user decided that they had been too quick to reply, they could retract that outgoing email. Career-limiting move averted.
Exchange used a client-server architecture for email. Email client would save the email in the outbox and the server would notice the email almost instantly and send it on its way before the user blinked in most cases.
A few users complained that Exchange, in essence, was too fast. They couldn't retract a misguided email reply, even if they had reflexes as quick as the Flash.
I re-wrote MSPAGER for Exchange. Hoo boy what a hack that was! A VB3 app running as a service, essentially. I don't know if you remember romeo and juliet; those were PCs pulled from pc-recycle by a co-worker to serve install images.
My neurons are firing on "MSPAGER" in recognition, but romeo and juliet draw blanks - which is a good thing. Because if you know the infrastructure, but you weren't involved in the implementation, then there's typically a bad reason why you know how the sausage is getting made. :)
Also, I survived Bedlam DL3 (but I didn't get the t-shirt).
I wouldn't call either Mail Pump's slow email processing a "benevolent deception", nor Exchange's quick email processing an attempt to be perceived as a fast email server.
In MSMail/Exchange Client/Outlook, the presence of email in the Outbox folder signifies that an email is to be sent, but that the code for sending the email hasn't processed that particular email.
MSMail being slower than Exchange to send email is a leaky abstraction due to software architecture.
Win3.x doesn't support multithreaded apps, using a cooperative multitasking system. Any app doing real work would prevent the user from accessing the system since no user interface events would be processed.
So the Mail Pump would check to see if the system was idle. There are no public or secret/private Windows APIs (despite all the MS detractors) for code to determine if the system is idle - you, the developer, had to fall back to using heuristics. These heuristics aren't fast - you didn't want to declare that the system was idle only to discover the user was in the middle of an operation. So the Mail Pump had to be patient. That meant the email could sit in the outbox for more than a second.
Exchange Server was a server process running on a separate box. When an email client notified the Exchange Server that an email was to be sent (whether via RPC or SMTP command), Exchange Server didn't have to wait in case it would block the user's interaction with the computer. Exchange Server could process the email almost immediately.
But there was a happy resolution to the conundrum - no, the Exchange Server didn't add a delay.
Some architect/program manager had added a "delay before sending/processing" property to the list of supported properties of an email message. The Exchange/Win95/Capone email client didn't use/set this property. But a developer could write an email extension, allow the user to specify a default delay for each outgoing email and the extension could get notified when an email was sent and set this "delay before sending/processing" property, such that Exchange Server would wait at least the specified delay time before processing the email message.
The user who desired an extended delay before their sent email was processed by the Exchange Server, could install this client extension, and specify a desired delay interval.
Outlook eventually added support for this property a few years later.
I notice that Gmail has added support for a delay to enable the user to undo sent emails.
Ha, maybe my old memory is rusty, but I feel like I recognize this name and you had an old blog with some quotable Raymond Chen -- one bit I remember was something like
"How do you write code so that it compiles differently from the IDE vs the command line?"
to which the answer was
"If you do this your colleagues will burn you in effigy when they try to debug against the local build and it works fine"
> Authenticity mattered more than production value.
Thanks for sharing this authentic story! As an ex-MSFT in a relatively small product line that only started switching to Git from SourceDepot in 2015, right before I left, I can truly empathize with how incredible a job you guys have done!
Thanks for the recommendation! I was just about to reread "Soul Of A New Machine", but will try Showstopper instead, since it sounds to be the same genre.
I want to thank the dev leads who trained this green-behind-the-ears engineer in the mysteries of Source Depot. Once I understood it, it was quite illuminating. I am glad we only had a dependency on WinCE and IE, so the clone only took 20 minutes instead of days. I don't remember your names, but I remember your willingness to step up and help onboard a new person so they could start being productive. I pay this attitude forward with new hires on my team no matter where I go.
I spent a couple years at Microsoft and our team used Source Depot because a lot of people thought that our products were special and even Microsoft's own source control (TFS at the time) wasn't good enough.
I had used TFS at a previous job and didn't like it much, but I really missed it after having to use Source Depot.
I was surprised that TFS was not mentioned in the story (at least not as far as I have read).
It should have existed around the same time and other parts of MS were using it. I think it was released around 2005 but MS probably had it internally earlier.
SLM (aka slime, shared file-system source code control system) was used in most of MS, aka systems & apps.
NT created (well, not NT itself; IIRC there was some MS-internal developer tools group in charge)/moved to Source Depot, since a shared file system doesn't scale well to thousands of users. Especially if some file gets locked and you DoS the whole division.
Source depot became the SCCS of choice (outside of Dev Division).
Then git took over, and MS had to scale git to NT-size scale, and upstream many of the changes to git mainline.
TFS was used heavily by DevDiv, but as far as I know they never got perf to the point where Windows folk were satisfied with it on their monorepo.
It wasn't too bad for a centralized source control system tbh. Felt a lot like SVN reimagined through the prism of Microsoft's infamous NIH syndrome. I'm honestly not sure why anyone would use it over SVN unless you wanted their deep integration with Visual Studio.
After the initial TFS 1.0 hiccups, merging was way, way better than SVN. SVN didn't track anything about merges until 1.5. Even today git's handling of file renames has nothing on TFS.
We used it. We knew no better. It was different then, you might not hear about alternatives unless you went looking for them. Source Safe was integrated with Visual Studio so was an obvious choice for small teams.
Get this: if you wanted to change a file you had to check it out. It was then locked and no one else could change it. Files were literally read-only on your machine unless you checked them out. The 'one at a time please' approach to source control (the other approach being 'let's figure out how to merge this later').
Which is exactly how RCS and SCCS (CVS's predecessors) worked.
They were file-based revision control, not repository-based.
SVN added folders like trunk/branches/tags that overlaid the file based versioning by basically creating copies of the files under each folder.
Which is why branch creation/merging was such a complicated process: if any of the files didn't merge, you had a half-merged branch source and a half-merged branch destination that you had to roll back.
Perforce does not lock files on checkout unless you have the file specifically configured to enforce exclusive locking in the file's metadata or depot typemap.
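If I remember the syntax right, exclusive open is set per file type/path in the typemap, roughly like this (depot paths made up):

$ p4 typemap
# opens the typemap spec in your editor; entries look something like:
#   TypeMap:
#       binary+l //depot/art/....psd
#       binary+l //depot/art/....fbx
# the +l modifier means exclusive open: only one workspace can check the file out at a time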
I am quite sure that you can edit files in an svn repo to your heart’s content regardless of whether anyone else is editing them on their machine at the same time.
Yep, svn has a lock feature but it is opt-in per file (possibly filetype?)
A pretty good tradeoff, because you can set it on complex structured files (e.g. PSDs and the like) to avoid the ballache of getting a conflict in an unmergeable file, but it does not block code editing.
And importantly anyone can steal locks by default. So a colleague forgetting to unlock and going on holidays does not require finding a repo admin.
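Roughly, from memory (paths made up):

$ svn propset svn:needs-lock '*' assets/mockup.psd   # working copies become read-only until locked
$ svn lock assets/mockup.psd -m "editing the mockup"
$ svn unlock --force assets/mockup.psd               # break a colleague's forgotten lock
# (svn auto-props can apply svn:needs-lock per file extension, e.g. for all *.psd)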
The file lock was a fun feature when a developer forgot to unlock it and went on holidays. Don't forget the black hole feature that made files randomly disappear for no reason. It may have been the worst piece of software I have ever used.
The lock approach is still used in IC design for some of the Cadence/Synopsys data files, which are unmergeable binaries. Not precisely sure of the details but I've heard it from other parts of the org.
I remember a big commercial SCM at the time that had this as an option, when you wanted to make sure you wouldn’t need to merge. Can’t remember what it was called, you could “sync to file system” a bit like dropbox and it required teams of full time admins to build releases and cut branches and stuff . Think it was bought by IBM?
Lucky you. Definitely one of the worst tools I’ve had the displeasure of working with. Made worse by people building on top of it for some insane reason.
I remember when we migrated from Visual Source Safe to TFS at my place of work. I was in charge of the migration and we hit errors and opened a ticket with Microsoft Premier Support. The ticket ended up being assigned to one of creators of Source Safe who replied "What you are seeing is not possible". He did manage to solve it in the end after a lot of head scratching.
> Agreed. It had a funny habit of corrupting its own data store also. That's absolutely what you want in a source control system.
I still ‘member articles calling it a source destruction system. Good times.
> It sucked; but honestly, not using anything is even worse than SourceSafe.
There have always been alternatives. And even when you didn’t use anything, at least you knew what to expect. Files didn’t magically disappear from old tarballs.
It was at least a little better than CVS, but with SVN available at the same time, I never understood the mentality of the offices I worked at that used SourceSafe instead of SVN.
CVS has a horrendous UI, but didn’t have a tendency to corrupt itself at the drop of a hat and didn’t require locking files to edit them by default (and then require a repository admin to come in and unlock files when a colleague went on holidays with files checked out). Also didn’t require shared write access to an SMB share (one of the reasons it corrupted itself so regularly).
Uhh, CVS definitely regularly corrupted itself. I nearly lost my senior research thesis in undergrad because of it. The only thing that saved me was understanding professors and the fact that I could drive back home to my parents' house and get back to my review in 30 minutes, where I had a good copy of my code I could put on an Iomega Zip disk from my desktop instead of the corrupted copy we couldn't pull from CVS in the CS lab.
funny how most folks remember the git migration as a tech win
but honestly the real unlock was devs finally having control over their own flow
no more waiting on sync windows, no more asking leads for branch access
suddenly everyone could move fast without stepping on each other
that shift did more for morale than any productivity dashboard ever could
git didn’t just fix tooling, it fixed trust in the dev loop
> In the early 2000s, Microsoft faced a dilemma. Windows was growing enormously complex, with millions of lines of code that needed versioning. Git? Didn’t exist. SVN? Barely crawling out of CVS’s shadow.
I wonder if Microsoft ever considered using BitKeeper, a commercial product that began development in 1998 and had its public release in 2000. Maybe centralized systems like Perforce were the norm and a DVCS like BitKeeper was considered strange or unproven?
I feel like we're well into the long tail now. Are there other SCM systems, or is it the end of history for source control, with git as the one-and-done solution?
Mercurial still has some life to it (excluding Meta’s fork of it), jj is slowly gaining, fossil exists.
And afaik P4 still does good business, because DVCS in general and git in particular remain pretty poor at dealing with large binary assets so it’s really not great for e.g. large gamedev. Unity actually purchased PlasticSCM a few years back, and has it as part of their cloud offering.
Google uses its own VCS called Piper which they developed when they outgrew P4.
I've heard this about game dev before. My (probably only somewhat correct) understanding is it's more than just source code--are they checking in assets/textures etc? Is perforce more appropriate for this than, say, git lfs?
I'm not sure about the current state of affairs, but I've been told that git-lfs performance was still not on par with Perforce on those kinds of repos a few years ago. Microsoft was investing a lot of effort in making it work for their large repos though so maybe it's different now.
But yeah, it's basically all about having binaries in source control. It's not just game dev, either - hardware folk also like this for their artifacts.
Interesting. Seems antithetical to the 'git centered' view of being for source code only (mostly)
I think I read somewhere that game dev teams would also check in the actual compiler binary and things of that nature into version control.
Usually it's considered "bad practice" when you see, like, an entire sysroot of shared libs in a git repository.
I don't even have any feeling one way or another. Even today "vendoring" cpp libraries (typically as source) isn't exactly rare. I'm not even sure if this is always a "bad" thing in other languages. Everyone just seems to have decided that relying on a/the package manager and some sort of external store is the Right Way. In some sense, it's harder to make the case for that.
It's only considered a bad idea because git handles it poorly. You're already putting all your code in version control - why would you not include the compiler binaries and system libraries too? Now everybody that gets the code has the right compiler to build it with as well!
The better-organised projects I've worked on have done this, and included all relevant SDKs too, so you can just install roughly the right version of Visual Studio and you're good to go. It doesn't matter if you're not on quite the right point revision or haven't got round to doing the latest update (or had it forced upon you); the project will still build with the compiler and libraries you got from Perforce, same as for everybody else.
> It's only considered a bad idea because git handles it poorly. You're already putting all your code in version control - why would you not include the compiler binaries and system libraries too? Now everybody that gets the code has the right compiler to build it with as well!
No arguments here, it makes perfect sense to me as a practice. It's shortsighted to consider only "libraries" (as source, typically) to be "dependencies"--implicitly you're relying on a compatible compiler version/runtime/interpreter (and whatever those depend on) and etc
What was the nature of this project? Was this something related to game development?
That seems to be the only domain where this approach is used (from what I've heard).
I've been checking in large (10s to 100s MBs) tarballs into one git repo that I use for managing a website archive for a few years, and it can be made to work but it's very painful.
I think there are three main issues:
1. Since it's a distributed VCS, everyone must have a whole copy of the entire repo. But that means anyone cloning the repo or pulling significant commits is going to end up downloading vast amounts of binaries. If you can directly copy the .git dir to the other machine first instead of using git's normal cloning mechanism then it's not as bad, but you're still fundamentally copying everything:
$ du -sh .git
55G .git
2. git doesn't "know" that something is a binary (although it seems to in some circumstances), so some common operations try to search them or operate on them in other ways as if they were text. (I just ran git log -S on that repo and git ran out of memory and crashed, on a machine with 64GB of RAM.) A partial mitigation via .gitattributes is sketched after this list.
3. The cure for this (git lfs) is worse than the disease. LFS is so bad/strange that I stopped using it and went back to putting the tarballs in git.
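For issue 2 you can at least tell git explicitly which paths are binary via .gitattributes; a minimal sketch (the patterns are just examples), though it doesn't fix everything:

$ cat .gitattributes
*.tar.gz -diff -merge -text
*.zip    -diff -merge -text
# -diff/-merge make git treat the files as opaque binaries for diffing and merging,
# -text disables any line-ending conversion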
Source control for large data.
Currently our biggest repository is 17 TB.
Would love for you to try it out. It's open source, so you can self-host as well.
Why would someone check binaries into a repo? The only time I came across checked-in binaries in a repo was because that particular dev could not be bothered to learn NuGet / Maven. (The dev that approved that PR did not understand it either.)
Because it's way easier if you don't require every level designer to spend 5 hours recompiling everything before they can get to work in the morning, because it's way easier to just check in that weird DLL than provide weird instructions to retrieve it, because onboarding is much simpler if all the tools are in the project, …
Hmm, I do not get it... "The binaries are checked into the repo so that the designer would not spend 5 hours recompiling" vs "the binaries come from a NuGet site so that the designer would not spend 5 hours recompiling".
In both cases the designer does not recompile, but in the second case there are no checked in binaries in the repo... I still think nuget / MAVEN would be more appropriate for this task...
Everything is in P4: you checkout the project to work on it, you have everything. You update, you have everything up to date. All the tools are there, so any part of the pipeline can rely on anything that's checked in. You need an older version, you just check that out and off you go. And you have a single repository to maintain.
VCS + Nuget: half the things are in the VCS, you checkout the project and then you have to hunt down a bunch of packages from a separate thing (or five), when you update the repo you have to update the things, hopefully you don't forget any of the ones you use, scripts run on a prayer that you have fetched the right things or they crash, version sync is a crapshoot, hope you're not working on multiple projects at the same time needing different versions of a utility either. Now you need 15 layers of syncing and version management on top of each project to replicate half of what just checking everything into P4 gives you for free.
> VCS + Nuget: half the things are in the VCS, you checkout the project and then you have to hunt down a bunch of packages from a separate thing
Oh, and there's things like x509/proxy/whatever errors when on a corpo machine that has ZScaler or some such, so you have to use internal Artifactory/thing but that doesn't have the version you need or you need permissions to access so.. and etc etc.
I have no idea what environment / team you worked on, but NuGet is pretty much rock solid. There are no scripts running on a prayer that everything is fetched. Version sync is not a crapshoot, because NuGet versions are updated during merges, and with proper merge procedures (PR build + tests) NuGet versions are always correct on the main branch.
One does not forget which NuGet packages are used: VS projects do that bookkeeping for you. You update the VS project with the new packages your task requires, and this bookkeeping carries over when you merge your PR.
I have seen this model work with no issues in large codebases: VS solutions with upwards of 500,000 lines of code and 20-30 engineers.
But if you have to do this via Visual Studio, it's no good for the people that don't use Visual Studio.
Also, where does nuget get this stuff from? It doesn't build this stuff for you, presumably, and so the binaries must come from somewhere. So, you just got latest from version control to get the info for nuget - and now nuget has to use that info to download that stuff?
And that presumably means that somebody had to commit the info for nuget, and then separately upload the stuff somewhere that nuget can find it. But wait a minute - why not put that stuff in the version control you're using already? Now you don't need nuget at all.
Because it's (part of) a website that hosts the tarballs, and we want to keep the whole site under version control. Not saying it's a good reason, but it is a reason.
There are some other solutions (like Jujutsu which, while using git as a storage medium, has some differences in how it handles commits). But I do believe we've reached a critical point where git is the one-stop shop for all source control needs, despite its flaws/complexity.
git by itself is often unsuitable for XL codebases. Facebook, Google, and many other companies / projects had to augment git to make it suitable or go with a custom solution.
AOSP, with 50M LoC, uses a manifest-based, depth=1 tool called repo to glue together a repository of repositories. If you’re thinking “why not just use git submodules?”, it’s because git submodules have a rough UX and would require so much wrangling that a custom tool is more favorable.
In general, the philosophy of distributed VCS being better than centralized is actually quite questionable. I want to know what my coworkers are up to and what they’re working on to avoid merge conflicts. DVCS without constant out-of-VCS synchronization causes more merge hell. Git’s default packfile settings are nightmarish — most checkouts should be depth==1, and they should be dynamic only when that file is accessed locally. Deeper integrations of VCS with build systems and file systems can make things even better. I think there’s still tons of room for innovation in the VCS space. The domain naturally opposes change because people don’t want to break their core workflows.
It's interesting to point out that almost all of Microsoft's "augmentations" to git have been open source, and many of them have made it into git upstream already and come "ready to configure" in git today (cone-mode sparse checkouts, a lot of steady improvements to sparse checkouts, git commit-graph, subtle and not-so-subtle packfile improvements, reflog improvements, more). A lot of it is opt-in stuff because of backwards compatibility or extra overhead that small/medium-sized repos won't need, but so much of it is there to be used by anyone, not just the big corporations.
I think it is neat that at least one company with mega-repos is trying to lift all boats, not just their own.
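A rough idea of what some of those opt-ins look like in a recent git (the repo URL and directory names are placeholders):

$ git clone --filter=blob:none --sparse https://example.com/monorepo.git
$ cd monorepo
$ git sparse-checkout set word/ shared/   # cone-mode sparse checkout of just the parts you work on
$ git commit-graph write --reachable      # precompute the commit graph to speed up history walks
$ git maintenance start                   # schedule background prefetch/repack/commit-graph updates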
git submodules have a bad ux but it's certainly not worse than Android's custom tooling. I understand why they did it but in retrospect that seems like an obvious mistake to me.
We did migrate from Perforce to Git for some fairly large repositories, and I can relate to some of the issues. Luckily we did not have to invent a VFS, although git-lfs was useful for large files.
> We communicated the same information through multiple channels: weekly emails, Teams, wiki docs, team presentations, and office hours. The rule: if something was important, people heard it at least 3 times through different mediums.
If only this were standard. Last week I received the only notification that a bunch of internal systems were being deleted in two weeks. No scream test, no archiving, just straight deletion. Sucks to be you if you missed the email for any reason.
Every month or two, we get notifications along the FINAL WARNING lines, telling us about some critical system about to be deleted, or some new system that needs to be set up Right Now, because it is a Corporate Standard (that was never rolled out properly), and by golly we have had enough of teams ignoring us, the all powerful Board has got its eyes on you now.
It's a full time job to keep up with the never-ending churn. We could probably just spend all our engineering effort being compliant and never delivering features :)
Company name withheld to preserve my anonymity (100,000+ employees).
Even with this, there were many surprised people. I'm still amazed at all of the people that can ignore everything and just open their IDE and code (and maybe never see teams or email)
If you read all the notifications you'll never do your actual job. People who just open their IDE and code are to be commended in some respects - but it's a balance of course.
Alternatively, communications fatigue. How many emails does the average employee get with nonsense that doesn't apply to them? Oh cool, we have a new VP. Oh cool, that department had a charity drive. Oh cool, system I've never heard of is getting replaced by a new one, favourite of this guy I've never heard of.
Add in the various spam (be it attacks or just random vendors trying to sell something).
At some point, people start to zone out and barely skim, if that, most of their work emails. Same with work chats, which are also more prone to people sharing random memes or photos from their picnic last week or their latest lego set.
Everybody gets important emails, and it's literally part of their job to filter the wheat from the chaff. One of my benchmarks for someone's competency is their ability to manage information. With a combination of email filters and mental discipline, even the most busy inbox can be manageable. But this is an acquired skill, akin to not getting lost in social media, and some people are far better at it than others.
Our HR lady took personal offence when I asked to be unsubscribed from the emails about “deals” that employees have access to from corporate partners. :(
Yes, the last filter is always the human being who has to deal with whatever the computer couldn't automate. But even then, you should be able to skim an email and quickly determine its relevancy, and decide whether you need to take action immediately, can leave it for the future, or can just delete it. Unless you're getting thousands of emails a day, this should be manageable.
No kidding. The amount of things that change in important environments without anyone telling people outside their teams in some organizations can be maddening.
What we do is scream the day before, all of us, get told that we should have read the memo, reply that we have real work to do, and the thing gets cancelled at the last minute. This happens a few times a year, until nobody gives a fuck anymore.
Source Depot was based on Perforce. Microsoft bought a license for the Perforce source code and made changes to work at Microsoft scale (Windows, Office).
TFS was developed in the Studio team. It was designed to work at Microsoft scale and some teams moved over to it (SQL Server). It was also available as a fairly decent product (leagues better than SourceSafe).
We had a similar setup, also with a homegrown VCS developed internally in our company, where I sometimes acted as branch admin. I’m not sure it worked exactly like Source Depot, but I can try to explain it.
Basically, instead of everyone creating their own short-lived branches (an expensive operation), you would have long-lived branches that a larger group of people would commit to (several product areas). The branch admin's job was then to get the work of all of these people forward-integrated to a branch upwards in the hierarchy. This was attempted a few times per day, but if tests failed you would have to reach out to the responsible people to get those tests fixed. Then later, when you got the changes merged upwards, some other changes had also been made to the main integration branch, and now you needed to pull those down into your long-lived branch (reverse integration) so that your branch stayed up to date with everyone else in the company.
At least in the Windows group, we use ri and fi oppositely from how you describe. RI = sharing code with a broader group of people toward trunk. FI = absorbing code created by the larger group of people on the dev team. Eventually we do a set of release forks that are isolated after a final set of FIs, so really outside customers get code via FI and then cherry pick style development.
RI/FI is similar to having long-lived branches in Git. Imagine you have a "develop-word" branch in git. The admins for that branch would merge all of the changes of their code to "main" and from "main" to their long lived branches. It was a little bit different than long-lived git branches as they also had a file filter (my private branch only had onenote code and it was the "onenote" branch)
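In plain git terms the mechanics would look something like this (branch names made up):

# FI: absorb what the wider org has done into the long-lived branch
$ git checkout develop-word
$ git merge main        # fix conflicts, run tests, chase down owners of failures
# RI: once green, push the branch's accumulated work back toward trunk
$ git checkout main
$ git merge develop-word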
I've long wanted a hosted Git service that would help me maintain long lived fork branches. I know there's some necessary manual work that is occasionally required to integrate patches, but the existing tooling that I'm familiar with for this kind of thing is overly focused on Debian packaging (quilt, git-buildpackage) and has horrifyingly poor ergonomics.
I'd love a system that would essentially be a source control of my patches, while also allowing a first class view of the upstream source + patches applied, giving me clear controls to see exactly when in the upstream history the breakages were introduced, so that I'm less locking in precise upstream versions that can accept the patches, and more actively engaging with ranges of upstream commits/tags.
I can't imagine how such a thing would actually be commercially useful, but darned if it wouldn't be an obvious fit for AI to automatically examine the upstream and patch history and propose migrations.
Perforce is broadly similar to SVN in semantics, and the same branching logic applies to both. Basically if you have the notion of long-lived main branch and feature branches (and possibly an hierarchy in between, e.g. product- or component-specific branches), you need to flow code between them in an organized way. Forward/reverse integration simply describes the direction in which this is done - FI for main -> feature, RI for feature -> main.
What were the biggest hurdles?
Where did Git fall short?
How did you structure the repo(s)?
Were there many artifacts that went into integration with Git LFS?
One thing I find annoying about these Perforce hate stories: yes it's awkward to branch in Perforce. It is also the case that there is no need to ever create a branch for feature development when you use Perforce. It's like complaining that it is hard to grate cheese with a trumpet. That just isn't applicable.
I actually remember using Perforce back in like 2010 or something. And I can't remember why or for which client or employer. I just remember it was stupid.
I used Perforce a lot in the 90s, when it was simple (just p4, p4d, and p4merge!), super fast, and
never crashed or corrupted itself. Way simpler, and easier to train newbies on, than any of the alternatives.
Subdirectories-as-branches (like bare repo + workspace-per-branch practices w/git) is so much easier for average computer users to grok, too.
Very easy to admin too.
No idea what the current "enterprisey" offering is like, though.
For corporate teams, it was a game changer. So much better than any alternative at the time.
We're all so used to git that we've become used to its terribleness and see every other system as deficient. Training and supporting a bunch of SWE-adjacent users (hw eng, EE, quality, managers, etc.) is a really, really good reality check on how horrible the git UX and data model are (e.g. obliterating secrets--security, trade, or PII/PHI--that get accidentally checked in is a stop-the-world moment).
For the record, I happily use git, jj, and Gitea all day every day now (and selected them for my current $employer). However, also FTR, I've used SCCS, CVS, SVN, VSS, TFS and MKS SI professionally, each for years at a time.
All of the comments dismissing tools that are significantly better for most use cases other than distributed OSS, but that lost the popularity contest, are shortsighted.
Git has a loooong way to go before it's as good in other ways as many of its "competitors". Learning about their benefits is very enlightening.
And, IIRC, p4 now integrates with git, though I've never used it.
I've used CVS, SVN, TFS, Mercurial, and Git in the past, so I have plenty of exposure to different options. I have to deal with Perforce in my current workplace and I have to say that even from this perspective it's honestly pretty bad in terms of how convoluted things are.
I don't disagree at all--p4 was kick-ass back in the day but the world, and our expectations, have moved on. Plus, they went all high-street enterprisey.
What makes it convoluted? Where did it lose the beat?
Perforce is really nice if you need to source control 16k textures next to code without thinking too much about it. Git LFS absolutely works but it's more complicated and has less support in industry tooling. Perforce also makes it easier to purge (obliterate) old revisions of files without breaking history for everyone. This can be invaluable if your p4 server starts to run out of disk space.
The ability to lock files centrally might seem outdated by the branching and PR model, but for some organizations the centralized solution works way better because they have built viable business processes around it. Centralized can absolutely smoke distributed in terms of iteration latency if the loop is tight enough and the team is cooperating well.
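For comparison, the Git LFS side of that is roughly (the patterns are just examples):

$ git lfs install
$ git lfs track "*.psd" "*.fbx" "*.wav"   # writes the patterns into .gitattributes
$ git add .gitattributes
$ git commit -m "Track large binaries with LFS"
# tracked files are stored as small pointers in git; the real content lives on the LFS server
# git lfs lock path/to/file also exists for centralized-style locking, given server support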
I agree with everything you say except the part about git-lfs working. For modern game dev (where a full checkout is around 1 TB of data) git-lfs is too slow, too error-prone, and too wasteful of disk space.
Perforce is a complete PITA to work with, too expensive, and outdated/flawed for modern dev, BUT for binary files it's really the only game in town (closely followed by SVN, but people have forgotten how good SVN was and only remember how bad it was at tracking branch merges).
I would say it's no more convoluted and confusing than git. I used Perforce professionally for quite a few years in gamedev, and found that a bit confusing at first. Then I was self-employed and used git, and coming to git from Perforce I found it very confusing at first. But then I grew to love it. Now I'm back to working for a big gamedev company and we use Perforce and I feel very proficient in both.
> Microsoft had to collaborate with GitHub to invent the Virtual File System for Git (VFS for Git) just to make this migration possible. Without VFS, a fresh clone of the Office repository (a shallow git clone would take 200 GB of disk space) would take days and consume hundreds of gigabytes.
It takes less than an hour on my third-world apartment wifi to download the Call of Duty: Modern Warfare remake, which is over 200 gigabytes. Since we're not talking about remote work here, I think Microsoft offices and servers (probably on a local network) might have managed similar bandwidth back then.
There is a lot more to it than that. Check out "The largest Git repo on the planet" by Brian Harry, who was in charge of the git migration and Azure DevOps (Microsoft's counterpart to GitHub).
> For context, if we tried this with “vanilla Git”, before we started our work, many of the commands would take 30 minutes up to hours and a few would never complete. The fact that most of them are less than 20 seconds is a huge step but it still sucks if you have to wait 10-15 seconds for everything. When we first rolled it out, the results were much better. That’s been one of our key learnings. If you read my post that introduced GVFS, you’ll see I talked about how we did work in Git and GVFS to change many operations from being proportional to the number of files in the repo to instead be proportional to the number of files “read”. It turns out that, over time, engineers crawl across the code base and touch more and more stuff leading to a problem we call “over hydration”. Basically, you end up with a bunch of files that were touched at some point but aren’t really used any longer and certainly never modified. This leads to a gradual degradation in performance. Individuals can “clean up” their enlistment but that’s a hassle and people don’t, so the system gets slower and slower.
Having had the dubious pleasure yesterday of using MS Word for the first time in a decade, I can safely affirm that they could have just piped the whole Office repo to the Windows equivalent of /dev/null and nothing of value would have been lost.
The worst part about Word is that it has been feature complete since Office 97, except they've made the UI worse with each and every version since then. I wish I could get excited about a new version of Office or WordPerfect, but neither Microsoft nor Corel has figured out how to innovate in the past three decades. And no, slapping """AI""" in there isn't the solution. There are so many possibilities, but they just sort of do nothing with it now that they make a few billion a month on Microsoft 365 subscriptions.
If it were that simple, would 100s of engineers spend so much time and effort? They did what they have to and spent the time and energy to maintain some semblance of commit and change history.
GP has a valid point. We had a Git repo managed in Bitbucket that was gigantic because it contained binary files and the team didn't know about LFS or about storing them in an external tool like Artifactory. So checkouts took forever, even shallow clones. With a CI/CD system running constantly, tests needing constant full coverage, and hundreds of developers, it eats into developers' time. We can't just prune all the branches, because of compliance rules.
So we ended up removing all the binary artifacts before cloning into a new repo then making the old repo as read only.
Microsoft seemed to want to mirror everything rather than keep source depot alive.
We had another case where we had a subversion system that went out of security compliance that we simply ported to our git systems and abandoned it.
So my guess is they wanted everything to look the same and not just importing the code.
In about 2010, I briefly had a contract with a security firm with one dev, and there was no source control, and everything written was in low quality PHP. I quit after a week.
php_final_final_v2.zip shipped to production. A classic. I had a similar experience with https://www.ioncube.com/ php encryption. Everything encrypted and no source control.
> Today, as I type these words, I work at Snowflake. Snowflake has around ~2,000 engineers. When I was in Office, Office alone was around ~4,000 engineers.
Excel turns 40 this year and has changed very little in those four decades. I can't imagine you need 4,000 engineers just to keep it backwards compatible.
In the meantime we've seen entire companies built with a ragtag team of hungry devs.
This article makes it sound like there are thousands of engineers who are good enough to qualify at Microsoft and work on Office but haven't used git yet? That sounds a bit overplayed tbh; if you haven't used git you must live under a rock. You can't use Source Depot at home.
Something not touched on by others: the standard Microsoft contract outlawed any moonlighting for years, and any code you created was potentially going to be claimed by Microsoft, so you didn't feel safe working on side projects or contributing to open source.
Open source code was a pariah - you were warned unless you had an exception to never look at any open source code even vaguely related to your projects, including in personal time, for fear of opening up Microsoft to legal trouble.
In the context of this, when and why would the average dev get time to properly use git: not just get a shallow understanding, but use it at the complexity level needed for a large internal monorepo ported to it?
I've used git at Microsoft for years, but using git with Office client is totally different. I believe it's used differently, with very different expectations, in Windows.
You’d be surprised at the amount of people at Microsoft that their entire career have been at Microsoft (pre-git-creation) that never used Git. Git is relatively new (2005) but source control systems are not.
I believe it. If you are a die-hard Microsoft person, your view of computing would be radically different from even the average developer today, let alone devs who are used to using FOSS.
Turn it around: If I were to apply for a job at Microsoft, they would probably find that my not using Windows for over twenty years is a gap on my CV (not one I would care to fill, mind).
It would very much depend on the team. There's no shortage of those that ship products for macOS and Linux, and sometimes that can even be the dominant platform.
Yes? If it's in your field, like a webdev who has never touched WordPress, it can be surprising. An automated tester who has never tried containers also has a problem.
These are young industries. So most hiring teams expect that you take the time to learn new technologies as they become established.
It's entirely plausible that a long-term engineer at Microsoft wouldn't have used git. I'm sure a considerable number of software engineers don't program as a hobby.
It only takes a week to learn enough git to get by, and only a month or two to become every-day use proficient. Especially if one is already familiar with perforce, or svn, or other VCS.
Yes, there is a transition, no it isn't really that hard.
Anyone who views lack of git experience as a gap in a CV is selecting for the wrong thing.
It's oddly fascinating that Microsoft has managed to survive for so long with ancient/bad tools for software engineering. Almost like "life finds a way", but for software dev. From the outside it seems like they are doing better now after embracing OSS/generic dev tools.
At one point Source Depot was incredibly advanced, and there are still features it had that git doesn't. Directory mapping was a standout feature! Being able to pull down only certain directories from a depot, remap where they live locally, and even have the same file in multiple places makes sharing dependencies across multiple projects really easy, and a lot of complicated tooling around "monorepos" wouldn't need to exist if git supported directory mapping.
(You can get 80% of the way there with symlinks but in my experience they eventually break in git when too many different platforms making commits)
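From memory, the Perforce client view syntax for this kind of remapping looks roughly like this (depot/client paths invented; Source Depot's version presumably differed in the details):

$ p4 client
# opens the client spec in your editor; the view might look like:
#   View:
#       //depot/office/word/...         //my-workspace/word/...
#       //depot/shared/build-tools/...  //my-workspace/word/tools/...
# only the mapped subtrees get synced, and each one can be remapped to
# wherever you want it in the local tree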
Also at one point I maintained an obscenely advanced test tool at MS, it pounded through millions of test cases across a slew of CPU architectures, intermingling emulators and physical machines that were connected to dev boxes hosting test code over a network controlled USB switch. (See: https://meanderingthoughts.hashnode.dev/how-microsoft-tested... for more details!)
Microsoft had some of the first code coverage tools for C/C++, spun out of a project from Microsoft Research.
Their debuggers are still some of the best in the world. NodeJS debugging in 2025 is dog shit compared to C# debugging in 2005.
I never understood the value of directory mapping when we used Perforce. It only seemed to add complexity when one team checked out code in different hierarchies and then some builds worked, some didn’t. Git was wonderful for having a simple layout.
I'm in exactly this situation with Perforce today, and I still hate it. The same problem OP described applies - you need to know which exact directories to check out to build, run tests etc successfully. You end up with wikis filled with obscure lists of mappings, many of them outdated, some still working but including a lot of cruft because people just copy it around. Sometimes the required directories change over time and your existing workspaces just stop working.
As always, git's answer to the problem is "stop being afraid of `git submodule`."
Cross-repo commits are not a problem as long as you understand "it only counts as truly committed if the child repo's commit is referenced from the parent repo".
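A minimal sketch of that workflow (URL and paths hypothetical):

$ git submodule add https://example.com/shared-lib.git libs/shared
$ git -C libs/shared checkout <commit-to-pin>   # choose the exact revision the parent should reference
$ git add libs/shared
$ git commit -m "Pin shared-lib"
# the parent records only the submodule's commit hash, so it only "counts" once that
# commit is pushed to the child's remote; consumers then clone with --recurse-submodules
# or run: git submodule update --init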
Git submodules are awful. Using Subversion's own equivalent (svn:externals) should be mandatory for anyone claiming Git's implementation is somehow worthwhile or good.
Is this a "git" failure or a "Linux filesystems suck" failure?
It seems like "Linux filesystems" are starting to creak under pressure from several directions (Nix needing binary patching, atomic desktops having poor deduplication, containers being unable to do smart things with home directories or too many overlays).
Would Linux simply sucking it up and adopting ZFS solve this or am I missing something?
How is that related? I don’t think anyone would suggest ntfs is a better fit for these applications. It worked because it was a feature of the version control software, not because of file system features.
What would ZFS do for those issues? I guess maybe deduplication, but otherwise I'm not thinking of anything that you can't do with mount --bind and overlays (and I'm not even sure ZFS would replace overlays)
Snapshots seem to be a cheap feature in ZFS but are expensive everywhere else, for example.
OverlayFS has had performance issues on Linux for a while (once you start composing a bunch of overlays, the performance drops dramatically as well as you start hitting limits on number of overlays).
Google used Perforce for years and I think Piper still has basically the same interface? So no, MSFT wasn’t ridiculously behind the times by using Source Depot for so long.
Let’s not forget that Microsoft developed a lot of tools in the first place, as in, they were one of the companies that created things that didn’t really exist before Microsoft created them.
Git isn’t even very old, it came out in 2005. Microsoft Office first came out in 1990. Of course Office wasn’t using git.
Some examples would be useful here. I'm not knocking MS tools in general, but are there any that were industry firsts? Source code control, for example, existed at least since SCCS, which in turn predates Microsoft itself.
Microsoft rarely did or does anything first. They are typically second or third to the post and VC is no different.
Most people don’t know or realize that Git is where it is because of Microsoft. About 1/2 of the TFS core team spun out to a foundation where they spent several years doing things like making submodules actually work, writing git-lfs, and generally making git scale.
You can look for yourself at the libgit2 repo back in the 2012-2015 timeframe. Nearly the whole thing was rewritten by Microsoft employees as the earliest stages of moving the company off source depot.
It was a really cool time that I’m still amazed to have been a small part of.
Of course that's only half the story - Microsoft invents amazing things, and promptly fails to capitalize on them.
AJAX, that venerable piece of kit that enabled every dynamic web-app ever, was a Microsoft invention. It didn't really take off, though, until Google made some maps with it.