is another good one to know: it also makes a bare repository that is an exact clone (including all branches, tags, notes, etc.) of a remote repo, unlike a normal clone, which is set up with local tracking branches of the remote.
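For reference, a minimal sketch, assuming the command in question is `git clone --mirror` (the URL below is a placeholder):

```sh
# Make a bare, exact mirror of the remote (all branches, tags, notes):
git clone --mirror https://example.com/some/repo.git repo.git

# Later, refresh the mirror to match the remote again:
cd repo.git
git remote update
```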
It doesn't include pull requests when cloning from GitHub, though.
> It doesn't include pull requests when cloning from GitHub, though.
Because GitHub pull requests are a proprietary, centralized, cloud-dependent reimplementation of `git request-pull`.
How the "free software" world slid head first into a proprietary cloud-based "open source" world still boils my blood. Congrats, Microsoft loves and owns it all, isn't that what what we always wanted?
When this kind of “sliding” happens it’s usually because the base implementation was missing functionality. Turns out CLI interfaces by themselves are (from a usability perspective) incomplete for the kind of collaboration git was designed to facilitate.
In another post discussion, someone suggested git as an alternative to Overleaf, a Google Docs for LaTeX... I guess plenty of people have a blind spot for the difference between what is technically possible and usable by experts, and UI that actually empowers much broader classes of users to wield the feature.
If you actually use the live collaboration features of Overleaf, sure, it’s not a replacement. But lots of people use Overleaf to write LaTeX by themselves. The experience is just so much worse than developing locally and tracking changes with git.
> Turns out CLI interfaces by themselves are (from a usability perspective) incomplete for the kind of collaboration git was designed to facilitate.
git was designed to facilitate the collaboration scheme of the Linux Kernel Mailing List, which is, as you might guess... a mailing list.
Rather than a pull-request (which tries to repurpose git's branching infrastructure to support collaboration), the intended unit of in-the-large contribution / collaboration in git is supposed to be the patch.
The patch contribution workflow is entirely CLI-based... if you use a CLI mail client (like Linus Torvalds did at the time git was designed.)
The core "technology" of this is, on the contributor side:
1. "trailer" fields on commits (for things like `Fixes`, `Link`, `Reported-By`, etc)
2. `git format-patch`, with flags like `--cover-letter` (this is where the thing you'd think of as the "PR description" goes), `--reroll-count`, etc.
3. a codebase-specific script like Linux's `./scripts/get_maintainer.pl`, to parse out (from source-file-embedded headers) the set of people to notify explicitly about the patch — this is analogous to a PR's concept of "Assignees" + "Reviewers"
4. `git send-email`, feeding in the patch-series generated in step 2, and targeting the recipients list from step 3. (This sends out a separate email for each patch in the series, but in such a way that the messages get threaded to appear as a single conversation thread in modern email clients.)
And on the maintainer side:
5. `s ~/patches/patch-foo.mbox` (i.e. a command in a CLI email client like mutt(1), in the context of the patch-series thread, to save the thread to an .mbox file)
6. `git am -3 --scissors ~/patches/patch-foo.mbox` to split the patch-series mbox file back into individual patches, convert them back into an annotated commit-series, and build that into a topic branch for testing and merging.
Subsystem maintainers, meanwhile, didn't use patches to get topic branches "upstream" [= in Linus's git repo]. Linus just had the subsystem maintainers as git-remotes, and then, when nudged, fetched their integration branches, reviewed them, and merged them, with any communication about this occurring informally out-of-band. In other words, the patch flow was for low-trust collaboration, while direct fetch was for high-trust collaboration.
Interestingly, in the LKML context, `git request-pull` is simply a formalization of the high-trust collaboration workflow (specifically, the out-of-band "hey, fetch my branches and review them" nudge email). It's not used for contribution, only integration; and it doesn't really do anything you can't do with an email — its only real advantages are in keeping the history of those requests within the repo itself, and for forcing requests to be specified in terms of exact git refs to prevent any confusion.
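For reference, the command just generates the text of that nudge email: a diffstat plus the exact refs to fetch (the URL and refs below are made up):

```sh
# "Please pull my-feature, built on top of v1.0, from my public clone":
git request-pull v1.0 https://example.com/my-repo.git my-feature
# The output is plain text, meant to be pasted into an email.
```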
There's basically 2 major schools of thought for submitting patches under git:
* Pile of commits - each individual commit doesn't matter as much as what they all do combined. As a general rule, the only requirement for a valid patch is that the final version does what you say it does. Either the final result is squashed together entirely and then merged onto "master" (or whatever branch you've set up to be the "stable" one) or it's all piled together. Keeping the commit history one linear sequence of events is the single most important element here - if you submit a patch, you will not be updating the git hashes, because that could force people to re-clone your version of the code, and that makes things complicated. This is pretty easy to wrap your head around for a small project, but for larger projects it quickly fills the organizational tools git gives you with junk commits that you have to filter through. Most git forges encourage this PR system because it is, again, newbie-friendly.
* Patch series - here, a patch isn't so much a series of commits you keep adding onto, but a much smaller set of commits that you curate into its "most perfect form" - each individual commit has its own purpose and they don't/shouldn't bleed into each other. It's totally okay to change the contents of a patch series, because until it's merged, the history of the patch series is irrelevant as far as git is concerned. This is basically how the LKML (and other mailing-list-based) software development works, but it can be difficult to wrap your head around (plus years of advice that "changing history" is the biggest sin you can commit with git, so don't you dare!). It tends to work best with larger projects, while being completely overkill for a smaller tool. Most forges offer poor support for patch-series-based development, unless the forge is completely aimed at doing it that way.
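A sketch of what that curation typically looks like in practice (branch names and revision number are made up):

```sh
# Rewrite the series in place: reorder, squash, reword, split commits
# (history changes are fine here; nothing is merged yet):
git rebase -i origin/master

# Then regenerate and resend the series as the next revision (v3, v4, ...):
git format-patch --cover-letter -v3 -o outgoing/ origin/master..HEAD
```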
> It's totally okay to change the contents of a patch series, because until it's merged, the history of the patch series is irrelevant as far as git is concerned.
Under the original paradigm, the email list itself — and a (pretty much expected/required) public archive of such, e.g. https://lore.kernel.org for LKML — serves the same history-preserving function for the patch series themselves (and all the other emails that go back and forth discussing them!) that the upstream git repo does for the final patches-turned-commits. The commits that make it into the repo reference URLs of threads on the public mailing-list archive, and vice-versa.
Fun fact: in the modern era where ~nobody uses CLI email clients any more, a tool called b4 (https://b4.docs.kernel.org/) is used to facilitate the parts of the git workflow that interact with the mailing list. The subcommand that pulls patches out of the list (`b4 mbox`) actually relies on the public web archive of the mailing list, rather than relying on you to have an email account with a subscription to the mailing list yourself (let alone a locally-synced mail database for such an account.)
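A minimal sketch of that flow, from memory of b4's CLI (the Message-ID is a placeholder):

```sh
# Pull the whole thread out of the public lore archive as an mbox:
b4 mbox some-message-id@example.com

# Or have b4 prepare an am-ready mbox (it also folds in trailers like
# Reviewed-by collected from follow-up emails), then apply it:
b4 am some-message-id@example.com
git am -3 ./*.mbx
```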
That makes sense. The first one sounds like basically any PR workflow on GitHub/GitLab/whatever. Though I don't really care if people squash/reorder their commits. The only time it's annoying is if someone else branched off your branch and the commits get rebased out from under them. Though I think `rebase --onto` helps resolve that problem.
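For anyone who hasn't used it, the general shape (branch names made up) is:

```sh
# my-feature was cut from old-base, which has since been rewritten;
# replay only my-feature's own commits onto the rewritten branch:
git rebase --onto new-base old-base my-feature
```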
The second one makes sense, but I can't imagine actually working that way on any of the projects I've been in. The amount of work it would take just doesn't make sense. Can totally understand why it would be useful on something like the Linux Kernel though.
You normally have one patch per commit. The patch is the diff between that commit and its parent. (I forget how git format-patch handles the case where there are two parents.)
> (I forget how git format-patch handles the case where there are two parents.)
As per [0], merge commits are dropped:

> Note that format-patch will omit merge commits from the output, even if they are part of the requested range. A simple "patch" does not include enough information for the receiving end to reproduce the same merge commit.
I originally thought it would use --first-parent (so just diff vs the first parent, which is what I would want) but apparently no! It is possible to get this behaviour using git log as detailed in this great write-up [1].
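To get that first-parent view by hand (with `<merge>` as a placeholder for the commit in question):

```sh
# Diff a single merge commit against its first parent only:
git diff <merge>^1 <merge>

# Or show first-parent diffs across a whole range of history:
git log --first-parent -m -p
```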
If that's the case I'm assuming the commit itself is quite large then? Or maybe it would be more accurate to say it can be large if all the changes logically go together?
I'm thinking in terms of what I often see from people I work with, where a PR is normally made up of lots of small commits.
The idea is that you divide a large change into a series of small commits that each make sense in isolation, so that Linus or Greg Kroah-Hartman or whoever is looking at your proposed change can understand it as quickly as possible—hopefully in order to accept it, rather than to reject it.
I think the point I always get stuck on is how small is "small" when we're talking about commits/patches. Like if you're adding a new feature (to anything, not necessarily the Linux Kernel), should the entire feature be a single commit or several smaller commits? I go back and forth on this all the time, and if you research it you're gonna see a ton of different opinions. I've seen some people argue a commit should basically only be a couple lines of code changed, and others argue it should be the entire feature.
You commonly hear Linus talk about commits/patches having very detailed descriptions attached to them. I have trouble believing people would have time for that if each commit was only a few lines, and larger features were spread out over hundreds of commits.
When I'm reviewing commits, I find it useful to see refactoring, which doesn't change behavior, separated from functional changes, and for each commit to leave the tree in a working, testable state. This is also helpful for git bisect.
Often, a change to a new working state is necessarily bigger than a couple of lines, or one of the lines has to get removed later.
I don't want to have to say, "Hmm, I wonder if this will work at the end of the file?" and spend a long time figuring out that it won't, then see that the problem is fixed later in the patch series.
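That property is exactly what makes automated bisecting viable; a sketch, assuming a test command like `make test`:

```sh
# Every commit builds and passes tests, so bisect can walk the series:
git bisect start
git bisect bad HEAD        # known-broken commit
git bisect good v1.0       # known-good tag (placeholder)
git bisect run make test   # exit 0 marks a commit good, non-zero bad
git bisect reset
```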
It still blows my mind how git has lost its original ideas of decentralized development because of GitHub, and how GitHub, a for-profit, centralized, closed-source forge, became the center for lots of important open source projects. We need Radicle, Forgejo, and Gitea to catch up even more!
It didn't really lose the original ideas. It just never learned that people don't want to use it the way kernel devs want to use it. Git never provided an easy GitHub-like experience, so GitHub took over. Turns out devs in general are not into the "set up completely independent public mailing lists for projects" idea.
> Turns out devs in general are not into the "set up completely independent public mailing lists for projects" idea.
My feeling is that devs in general are not into the "learning how to use tools" idea.
They don't want to learn the git basics, they don't want to learn the cmake basics, ...
I mean that as an observation more than a criticism. But to me, the fact that git was designed for those who want to learn powerful tools is a feature. Those who don't can use Microsoft. It all works in the end.
Fun fact: if I want to open-source my code but not get contributions (or rather feature requests from people who probably won't ever contribute), I put my git repo on anything that is not GitHub. It feels like most professional devs don't know how to handle anything that is not on GitHub :-). Bonus points for SourceHut: if someone manages to send a proper patch on the mailing list, it usually means that they know what they are doing.
> My feeling is that devs in general are not into the "learning how to use tools" idea
Well, the devs learnt how to use GitHub, didn't they? Seems like people CAN learn things that are useful. I can also make the argument that GitHub pull requests are actually more powerful than git request-pull, in addition to having a nicer UI/UX.
Being upset that people aren't using git request-pull is like the creator of Brainfuck being upset that scientists aren't using Brainfuck instead of something more powerful with a better UI/UX, like Python. It's kinda obvious which one is better to use...
> My feeling is that devs in general are not into the "learning how to use tools" idea.
Given the number of vim, emacs, nix, git, i3, etc. users who are proud of it and all the customisations they do, I don't think so. Like, there will be a decent group, but not generalisable to "devs".
For me the largest advantage of Git was being able to easily switch branches. Previously I'd have to keep multiple copies of an entire source repo on my machine if I wanted to work on multiple things at the same time. Likewise a patch set going through CR meant an entire folder on my machine was frozen until I got feedback.
Not having to email complex patches was another huge plus. I was at Microsoft at the time and they had home-made scripts (probably Perl or VBS, but I forget what) that applied patches to a repo.
It sucked.
Git branch alone was worth the cost of changing over.
I once worked at a smaller company that didn't want to shell out for GitHub, and we just hosted repos on some VM and used the ssh method. It worked. I just found it to be kind of clunky, having come from a bigger place that was doing enterprise source control management with Perforce of all things. GitHub as a product was fairly new back then, but everyone there was trying to switch over to Git for resume reasons. So then I go to this smaller place using git in the classic manner.
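For anyone curious, the "classic manner" is just a bare repo reachable over ssh; a minimal sketch (host and paths made up):

```sh
# On the VM:
git init --bare /srv/git/project.git

# On each developer machine:
git clone user@vm.example.com:/srv/git/project.git
# ...commit as usual, then:
git push origin master
```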
I don’t think it’s really that surprising. git didn’t become popular because it was decentralised; it just happened to be. So it stands to reason that part doesn’t get emphasised a ton.
It did become popular because it was decentralized, but the specific features that this enabled were less about not depending on a central server, and more about being able to work with the same repo locally with ease without having to be online for most operations (as was the case with Subversion etc). Git lets me have a complete local copy of the source with all the history, branches etc in it, so anything that doesn't require looking at the issues can be done offline if you did a sync recently.
The other big point was local branches. Before DVCS, the concept of a "local branch" was generally not a thing. But now you could suddenly create a branch for each separate issue and easily switch between them while isolating unrelated changes.
It's not the interface, it's the web hosting. People want a free destination server that's up 24/7 to store their repository.
If it was only the web interface, people could locally install GitLab or Gitea to get a web-browser UI. (Or use whatever modern IDE or code editor to get a GUI instead of a CLI for git commands.) But doing that still doesn't solve what GitHub solves: a public server to host the files, issue tracking, etc.
Before git & GitHub, people put source code for public access on SourceForge and CodeProject. The reason was the same: a zero-cost way to share code with everybody.
GitLab is around a decade old, is a solid enterprise product and has always had a very similar interface to GitHub, at times even drawing criticism for being too similar. There's more to it than that.
I've used this method to make a backup of our 80+ repos at a previous company: just grab the URLs from the API and run the git clone in a for loop. Works great!
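Something like the following, presumably; this sketch assumes the GitHub org-repos endpoint and jq, and real use would need auth and pagination:

```sh
# List clone URLs for an org (name is a placeholder) and mirror each:
curl -s "https://api.github.com/orgs/example-org/repos?per_page=100" \
  | jq -r '.[].clone_url' \
  | while read -r url; do
      git clone --mirror "$url"
    done
```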