`git clone --bare` will give you a git repo without a working tree (just the contents typically in the .git directory). This allows you to create things like `foo.git` instead of `foo/.git`.
“origin” is also just the default name for the cloned remote. It could be called anything, and you can have as many remotes as you’d like. You can even namespace what you push back to the same remote by changing the fetch and push refspecs. At one company it was common to push back to `$user/$feature` to avoid polluting the root namespace with personal branches. It was also common to have `backup/$user` for pushing a backup of an entire local repo.
I often add a hostname namespace when I’m working from multiple hosts and then push directly between them instead of going back to a central server.
For a small static site repo that has documents and server config, I have a remote like:
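Something along these lines (host and path made up here); the push refspec is what keeps my branches in their own namespace:

    # everything I push lands under laptop/* on the server, as if the server
    # had run `git fetch laptop` itself, so its own branches are untouched
    git remote add site ssh://me@static-site.example/srv/git/site.git
    git config remote.site.push '+refs/heads/*:refs/remotes/laptop/*'
    git push site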
So I can push from my computer directly to that server, but those branches won’t overwrite the server’s branches. It acts like a reverse `git pull`, which can be useful for firewalls and other situations where my laptop wouldn’t be routable.
`git clone --mirror` is another good one to know: it also makes a bare repository, but one that is an exact clone (including all branches, tags, notes, etc.) of the remote repo, unlike a normal clone, which is set up with remote-tracking branches of the remote.
It doesn't include pull requests when cloning from GitHub, though.
I've used this method to make a backup of our 80+ repos at a previous company: just grab the URLs from the API and run the git clone in a for loop. Works great!
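Roughly this, with a made-up org name:

    # list clone URLs from the API, then mirror-clone each one
    curl -s 'https://api.github.com/orgs/example-org/repos?per_page=100' \
      | jq -r '.[].clone_url' \
      | while read -r url; do
          git clone --mirror "$url"
        done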
> It doesn't include pull requests when cloning from GitHub, though.
Because GitHub pull requests are a proprietary, centralized, cloud-dependent reimplementation of `git request-pull`.
How the "free software" world slid head first into a proprietary cloud-based "open source" world still boils my blood. Congrats, Microsoft loves and owns it all, isn't that what we always wanted?
When this kind of “sliding” happens it’s usually because the base implementation was missing functionality. Turns out CLI interfaces by themselves are (from a usability perspective) incomplete for the kind of collaboration git was designed to facilitate.
In another post discussion, someone suggested git as an alternative to overleaf, a Google Docs for latex... I guess there are plenty of people with a blind spot for the difference between things that are technically possible and usable by experts, and UI that actually empowers much broader classes of users to wield the feature.
If you actually use the live collaboration features of overleaf, sure, it’s not a replacement. But lots of people use overleaf to write latex by themselves. The experience is just so much worse than developing locally and tracking changes with git.
> Turns out CLI interfaces by themselves are (from a usability perspective) incomplete for the kind of collaboration git was designed to facilitate.
git was designed to facilitate the collaboration scheme of the Linux Kernel Mailing List, which is, as you might guess... a mailing list.
Rather than a pull-request (which tries to repurpose git's branching infrastructure to support collaboration), the intended unit of in-the-large contribution / collaboration in git is supposed to be the patch.
The patch contribution workflow is entirely CLI-based... if you use a CLI mail client (like Linus Torvalds did at the time git was designed.)
The core "technology" of this is, on the contributor side:
1. "trailer" fields on commits (for things like `Fixes`, `Link`, `Reported-By`, etc)
2. `git format-patch`, with flags like `--cover-letter` (this is where the thing you'd think of as the "PR description" goes), `--reroll-count`, etc.
3. a codebase-specific script like Linux's `./scripts/get_maintainer.pl`, to parse out (from source-file-embedded headers) the set of people to notify explicitly about the patch — this is analogous to a PR's concept of "Assignees" + "Reviewers"
4. `git send-email`, feeding in the patch-series generated in step 2, and targeting the recipients list from step 3. (This sends out a separate email for each patch in the series, but in such a way that the messages get threaded to appear as a single conversation thread in modern email clients.)
And on the maintainer side:
5. `s ~/patches/patch-foo.mbox` (i.e. a command in a CLI email client like mutt(1), in the context of the patch-series thread, to save the thread to an .mbox file)
6. `git am -3 --scissors ~/patches/patch-foo.mbox` to split the patch-series mbox file back into individual patches, convert them back into an annotated commit-series, and build that into a topic branch for testing and merging.
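Strung together, with made-up addresses and branch names, the whole flow is roughly:

    # contributor side: last 3 commits as a v2 series with a cover letter
    git format-patch -3 --cover-letter --reroll-count=2 -o outgoing/
    ./scripts/get_maintainer.pl outgoing/*.patch    # who to notify (Linux-specific)
    git send-email --to=maintainer@example.org \
        --cc=subsystem@lists.example.org outgoing/*.patch

    # maintainer side: after saving the thread to an mbox from the mail client
    git checkout -b topic/foo origin/master
    git am -3 --scissors ~/patches/patch-foo.mbox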
Subsystem maintainers, meanwhile, didn't use patches to get topic branches "upstream" [= in Linus's git repo]. Linus just had the subsystem maintainers as git-remotes, and then, when nudged, fetched their integration branches, reviewed them, and merged them, with any communication about this occurring informally out-of-band. In other words, the patch flow was for low-trust collaboration, while direct fetch was for high-trust collaboration.
Interestingly, in the LKML context, `git request-pull` is simply a formalization of the high-trust collaboration workflow (specifically, the out-of-band "hey, fetch my branches and review them" nudge email). It's not used for contribution, only integration; and it doesn't really do anything you can't do with an email — its only real advantages are in keeping the history of those requests within the repo itself, and for forcing requests to be specified in terms of exact git refs to prevent any confusion.
There are basically 2 major schools of thought for submitting patches under git:
* Pile of commits - each individual commit doesn't matter as much as they all work combined. As a general rule, the only requirement for a valid patch is that the final version does what you say it does. Either the final result is squashed together entirely and then merged onto "master" (or whatever branch you've set up to be the "stable" one) or it's all piled together. Keeping the commit history one linear sequence of events is the single most important element here - if you submit a patch, you will not be updating the git hashes, because that could force people to re-clone your version of the code and that makes it complicated. This is pretty easy to wrap your head around for a small project, but for larger projects it quickly fills the organizational tools git gives you with junk commits that you have to filter through. Most git forges encourage this PR system because it's, again, newbie-friendly.
* Patch series. Here, a patch isn't so much a series of commits you keep adding onto, but is instead a much smaller set of commits that you curate into its "most perfect form" - each individual commit has its own purpose and they don't/shouldn't bleed into each other. It's totally okay to change the contents of a patch series, because until it's merged, the history of the patch series is irrelevant as far as git is concerned. This is basically how the LKML (and other mailing list based) software development works, but it can be difficult to wrap your head around (+years of advice that "changing history" is the biggest sin you can do with git, so don't you dare!). It tends to work the best with larger projects, while being completely overkill for a smaller tool. Most forges usually offer poor support for patch series based development, unless the forge is completely aimed at doing it that way.
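For the patch-series style, the curation step is mostly interactive rebase plus a re-send; roughly:

    # reshape the series until each commit stands on its own...
    git rebase -i origin/master
    # ...then regenerate and resend it as v3; the old v1/v2 history stops mattering
    git format-patch -v3 --cover-letter origin/master..HEAD -o outgoing/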
> It's totally okay to change the contents of a patch series, because until it's merged, the history of the patch series is irrelevant as far as git is concerned.
Under the original paradigm, the email list itself — and a (pretty much expected/required) public archive of such, e.g. https://lore.kernel.org for LKML — serves the same history-preserving function for the patch series themselves (and all the other emails that go back and forth discussing them!) that the upstream git repo does for the final patches-turned-commits. The commits that make it into the repo reference URLs of threads on the public mailing-list archive, and vice-versa.
Fun fact: in the modern era where ~nobody uses CLI email clients any more, a tool called b4 (https://b4.docs.kernel.org/) is used to facilitate the parts of the git workflow that interact with the mailing list. The subcommand that pulls patches out of the list (`b4 mbox`) actually relies on the public web archive of the mailing list, rather than relying on you to have an email account with a subscription to the mailing list yourself (let alone a locally-synced mail database for such an account.)
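So pulling a series down looks something like this (message-id made up):

    # b4 pulls the whole thread from the public lore.kernel.org archive
    b4 mbox 20240101120000.1234-1-someone@example.org
    # or fetch, verify, and apply the series in one step
    b4 shazam 20240101120000.1234-1-someone@example.org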
That makes sense. The first one sounds like basically any PR workflow on GitHub/GitLab whatever. Though I don't really care if people squash/reorder their commits. The only time it's annoying is if someone else branched off your branch and the commit gets rebased out from under them. Though I think rebase --onto helps resolve that problem.
The second one makes sense, but I can't imagine actually working that way on any of the projects I've been in. The amount of work it would take just doesn't make sense. Can totally understand why it would be useful on something like the Linux Kernel though.
You normally have one patch per commit. The patch is the diff between that commit and its parent. (I forget how git format-patch handles the case where there are two parents.)
> (I forget how git format-patch handles the case where there are two parents.)
As per [0] merge commits are dropped:
Note that format-patch will omit merge commits from the output, even if they are part of the requested range. A simple "patch" does not include enough information for the receiving end to reproduce the same merge commit.
I originally thought it would use --first-parent (so just diff vs the first parent, which is what I would want) but apparently no! It is possible to get this behaviour using git log as detailed in this great write-up [1].
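For reference, getting the first-parent diff of a merge by hand is easy, and newer git (2.31+) can emit it from log directly:

    # diff of one merge commit against its first parent (sha made up)
    git diff abc1234^1 abc1234
    # first-parent patches for merges straight out of log
    git log -p --first-parent --diff-merges=first-parent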
If that's the case I'm assuming the commit itself is quite large then? Or maybe it would be more accurate to say it can be large if all the changes logically go together?
I'm thinking in terms of what I often see from people I work with, where a PR is normally made up of lots of small commits.
The idea is that you divide a large change into a series of small commits that each make sense in isolation, so that Linus or Greg Kroah-Hartman or whoever is looking at your proposed change can understand it as quickly as possible—hopefully in order to accept it, rather than to reject it.
I think the point I always get stuck on is how small is "small" when we're talking about commits/patches. Like if you're adding a new feature (to anything, not necessarily the Linux Kernel), should the entire feature be a single commit or several smaller commits? I go back and forth on this all the time, and if you research it you're gonna see a ton of different opinions. I've seen some people argue a commit should basically only be a couple lines of code changed, and others argue it should be the entire feature.
You commonly hear Linus talk about commits/patches having very detailed descriptions attached to them. I have trouble believing people would have time for that if each commit was only a few lines, and larger features were spread out over hundreds of commits.
When I'm reviewing commits, I find it useful to see refactoring, which doesn't change behavior, separated from functional changes, and for each commit to leave the tree in a working, testable state. This is also helpful for git bisect.
Often, a change to a new working state is necessarily bigger than a couple of lines, or one of the lines has to get removed later.
I don't want to have to say, "Hmm, I wonder if this will work at the end of the file?" and spend a long time figuring out that it won't, then see that the problem is fixed later in the patch series.
It still blows my mind how git has lost its original ideas of decentralized development because of GitHub, and how GitHub, a for-profit, centralized, closed-source forge, became the center for lots of important open source projects. We need Radicle, Forgejo, and Gitea to catch up even more!
It didn't really lose the original ideas. It just never learned that people don't want to use it the way kernel devs want to use it. Git never provided an easy github-like experience, so GitHub took over. Turns out devs in general are not into the "setup completely independent public mailing lists for projects" idea.
> Turns out devs in general are not into the "setup completely independent public mailing lists for projects" idea.
My feeling is that devs in general are not into the "learning how to use tools" idea.
They don't want to learn the git basics, they don't want to learn the cmake basics, ...
I mean that as an observation more than a criticism. But to me, the fact that git was designed for those who want to learn powerful tools is a feature. Those who don't can use Microsoft. It all works in the end.
Fun fact: if I want to open source my code but not get contributions (or rather feature requests by people who probably won't ever contribute), I put my git repo on anything that is not GitHub. It feels like most professional devs don't know how to handle anything that is not on GitHub :-). Bonus point for SourceHut: if someone manages to send a proper patch on the mailing list, it usually means that they know what they are doing.
> My feeling is that devs in general are not into the "learning how to use tools" idea
Well, the devs learnt how to use Github, didn't they? Seems like people CAN learn things that are useful. I can also make the argument that Github pull requests are actually more powerful than git request-pull in addition to having a nicer UI/UX.
Being upset that people aren't using git request-pull is like the creator of Brainfuck being upset that scientists aren't using Brainfuck instead of something more powerful with a better UI/UX, like Python. It's kinda obvious which one is better to use...
> My feeling is that devs in general are not into the "learning how to use tools" idea.
Given the number of vim, emacs, nix, git, i3, etc. users who are proud of it and all the customisations they do, I don't think so. Like, there will be a decent group, but not generalisable to "devs".
For me the largest advantage of Git was being able to easily switch branches. Previously I'd have to have multiple copies of an entire source repo on my machine if I wanted to work on multiple things at the same time. Likewise a patch set going through CR meant an entire folder on my machine was frozen until I got feedback.
Not having to email complex patches was another huge plus. I was at Microsoft at the time and they had home made scripts (probably Perl or VBS, but I forget what) that applied patches to a repo.
It sucked.
Git branch alone was worth the cost of changing over.
I once worked at a smaller company that didn't want to shell out for github and we just hosted repos on some VM and used the ssh method. It worked. I just found it to be kind of clunky having come from a bigger place that was doing enterprise source control management with Perforce of all things. Github as a product was fairly new back then, but everyone was trying to switch over to Git for resume reasons there. So then I go to this smaller place using git in the classic manner.
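For anyone who hasn't seen "the ssh method": it's roughly this (host and path made up):

    # on the VM: create a bare repo to push to
    ssh dev@build-vm.internal 'git init --bare /srv/git/project.git'
    # on each workstation: plain ssh is the transport, no forge needed
    git remote add origin dev@build-vm.internal:/srv/git/project.git
    git push -u origin main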
I don’t think it’s really that surprising. git didn’t become popular because it was decentralised, it just happened to be. So it stands to reason that part doesn’t get emphasised a ton.
It did become popular because it was decentralized, but the specific features that this enabled were less about not depending on a central server, and more about being able to work with the same repo locally with ease without having to be online for most operations (as was the case with Subversion etc). Git lets me have a complete local copy of the source with all the history, branches etc in it, so anything that doesn't require looking at the issues can be done offline if you did a sync recently.
The other big point was local branches. Before DVCS, the concept of a "local branch" was generally not a thing. But now you could suddenly create a branch for each separate issue and easily switch between them while isolating unrelated changes.
It's not the interface, it's the web hosting. People want a free destination server that's up 24/7 to store their repository.
If it was only the web interface, people could locally install GitLab or Gitea to get a web browser UI. (Or use whatever modern IDE or code editor to have a GUI instead of a CLI for git commands.) But doing that still doesn't solve what GitHub solves: a public server to host the files, issue tracking, etc.
Before git & Github, people put source code for public access on SourceForge and CodeProject. The reason was the same: a zero-cost way to share code with everybody.
GitLab is around a decade old, is a solid enterprise product and has always had a very similar interface to GitHub, at times even drawing criticism for being too similar. There's more to it than that.
> “origin” is also just the default name for the cloned remote. It could be called anything, and you can have as many remotes as you’d like.
One remote can also hold more URLs! This is arguably more obscure (Eclipse's EGit doesn't even support it), but works wonders for my workflow, since I want to push to multiple mirrors at the same time.
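For example (mirror URLs made up):

    # one logical remote, two push destinations (fetch keeps using the original URL);
    # once any pushurl is set, only the listed pushurls get pushed to
    git remote set-url --add --push origin git@github.com:me/project.git
    git remote set-url --add --push origin git@codeberg.org:me/project.git
    git push origin        # goes to both mirrors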
Whenever I fork a repo I rename origin to “fork” and then add the parent repo as a remote named “upstream” so I can pull from that, rebase any of my own changes on top of it, and push to fork as needed.
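i.e. roughly (URLs made up):

    git clone git@github.com:me/project.git && cd project
    git remote rename origin fork
    git remote add upstream https://github.com/original/project.git
    git fetch upstream
    git rebase upstream/main       # carry my branches on top of the parent repo
    git push fork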
Multiple remotes is also how you can combine multiple repos into one monorepo by just fetching and pulling from each one, maybe into different subdirectories to avoid path collisions.
This sounds like submodules, but I'm guessing it's completely orthogonal ... multiple distinct remotes for the same _repository_, all of them checked out to different sub-paths; does that result in all the remotes winding up with all the commits for all the shared repositories when you push, or can you "subset" the changes as well?
Yeah, it's different, I was thinking about a time I needed to combine two separate repos into one. To do that, you clone one of them, then add a remote to the other one, fetch that, and `pull --rebase` or similar, and you'll replay all of the first's commits on top of the second's. I can't remember what I was thinking about the subdirectories, I guess they'd already have to be organized that way in the various repos to avoid conflicts or smushing separate sources together.
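Roughly (repo names made up):

    git clone git@example.com:repo-a.git combined && cd combined
    git remote add repo-b git@example.com:repo-b.git
    git fetch repo-b
    git rebase repo-b/main    # replay repo-a's commits on top of repo-b's
    # (or `git merge --allow-unrelated-histories repo-b/main` to keep both histories)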
I always thought it would have been better, and less confusing for newcomers, if GitHub had named the default remote “github”, instead of origin, in the examples.
Requiring a fork to open pull requests as an outsider to a project is in itself an idiosyncrasy of GitHub that could be done without. Gitea and Forgejo for example support AGit: https://forgejo.org/docs/latest/user/agit-support/.
Nevertheless, to avoid ambiguity I usually name my personal forks on GitHub gh-<username>.
No, it's a normal feature of Git. If I want you to pull my changes, I need to host those changes somewhere that you can access. If you and I are both just using ssh access to our separate Apache servers, for example, I am going to have to push my changes to a fork on my server before you can pull them.
And of course in Git every clone is a fork.
AGit seems to be a new alternative where apparently you can push a new branch to someone else's repository that you don't normally have access to, but that's never guaranteed to be possible, and is certainly very idiosyncratic.
That's backwards. In GitHub every fork is just a git clone. Before GitHub commandeered it, the term "fork" was already in common use and it had a completely different meaning.
Arguably the OG workflow to submit your code is `git send-email`, and that also doesn't require an additional third clone on the same hosting platform as the target repository.
All those workflows are just as valid as the others, I was just pointing out that the way github does it is not the only way it can be done.
> Requiring a fork to open pull requests as an outsider to a project is in itself an idiosyncrasy of GitHub that could be done without. Gitea and Forgejo for example support AGit: https://forgejo.org/docs/latest/user/agit-support/.
Ah yes, I'm sure the remote being called "origin" is what confuses people when they have to push to a refspec with push options. That's so much more straightforward than a button "create pull request".
As far as I'm concerned the problem isn't that one is easier than the other.
It's that in the github case it completely routes around the git client.
With AGit+gitea or forgejo you can either click your "create pull request" button,
or make a pull request right from the git client. One is necessarily going to require more information than the other to reach the destination...
It's like arguing that instead of having salad or fries on the menu with your entree they should only serve fries.
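Concretely, the "more information" version per the Forgejo AGit docs is something like this (branch and title made up):

    # push the current branch as a pull request against main, no fork involved
    git push origin HEAD:refs/for/main \
        -o topic=my-feature \
        -o title="Add my feature" \
        -o description="First pass, comments welcome"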
agreed, you'd need a second name anyway. and probably "origin" and "upstream" is nicer than "github" and "my-fork" because.. the convention seems like it should apply to all the other git hosts too: codeberg, sourcehut, tfs, etc
Git was always explicitly a decentralized, "peer to peer" version control system, as opposed to centralized ones like SVN, with nothing in the protocol itself that makes a distinction between a "server" and a "client". Using it in a centralized fashion is just a workflow that you choose to use (or, realistically, one that somebody else chose for you). Any clone of a repository can be a remote to any other clone, and you can easily have a "git server" (ie. just another directory) in your local filesystem, which is a perfectly reasonable workflow in some cases.
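For example, a plain directory works fine as a remote:

    # a "server" that is just another directory on the same machine
    git init --bare ~/srv/project.git
    git remote add localsrv ~/srv/project.git
    git push localsrv main
    # any other checkout can clone/pull from the same path
    git clone ~/srv/project.git ~/review/project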
There was a thread not too long ago where people were conflating git with GitHub. Git is an incredible tool (after coming from SVN/CVS/p4/source safe) that stands on its own apart from hosting providers.
And GitHub naturally has done nothing to disabuse people of the interpretation that git = GitHub. Meanwhile, the actual raison d'etre for the existence of git of course doesn't use GitHub, or the "pull request" based workflow that GitHub invented and is also not anything intrinsic to git in any way.
I have a use case just for this. Sometimes my internet goes down while I'm working on my desktop computer. I'll put my work in a branch and push it to my laptop, then go to a coffee shop to continue my work.
It's a little more complex than that. Yes git can work in a peer-to-peer fashion, but the porcelain is definitely set up for a hub-and-spoke model, given how cloning a remote repo only gives you a partial copy of the remote history.
There's other stuff too, like git submodules can't be configured to reference another branch on the local repository and then be cloned correctly, only another remote.
> given how cloning a remote repo only gives you a partial copy of the remote history
When you clone you get the full remote history and all remote branches (by default). That’s painfully true when you have a repo with large binary blobs (and the reason git-lfs and others exist).
You're right, I got that part wrong, git actually fetches all of the remote commits (but not all of the refs, many things are missing, for instance notes).
But a clone of your clone is not going to work the same way, since remote branches are not cloned by default, either. So it'll only have partial history. This is what I was thinking about.
> given how cloning a remote repo only gives you a partial copy of the remote history
You may be thinking of the optional `--depth` switch, which allows you to create shallow clones that don't have the full history. If you don't include that, you'll get the full history when cloning.
I'd say git submodules have such an awkward UX that should probably not be used except in very rare and organized cases. I've done it before but it has to be worth it.
I can't get over my fear of subtrees after accidentally nuking one of my repos by doing a rebase across the subtree commit. I've found that using worktrees, with a script in the main branch to set up the worktrees, works pretty well to split history across multiple branches, like what you might want in a monorepo.
Sadly doing a monorepo this way with pnpm doesn't work, since pnpm doesn't enforce package version requirements inside of a pnpm workspace. And it doesn't record installed version information for linked packages either.
> “origin” is also just the default name for the cloned remote
I don't have a central dotfiles repo anymore (that I would always forget to push to); I have SSH access to my devices - via tailscale - anyway so I'm doing
git remote add $hostname $hostname:.config
and can cd ~/.config && git fetch/pull/rebase $hostname anytime from anywhere.
I've been considering a bare repo + setting $GIT_DIR (e.g. via direnv) but somehow the dead-simple simplicity has trumped the lack of push ability
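For reference, the bare-repo version is usually set up something like this:

    # one-time setup: a bare repo whose work tree is $HOME
    git init --bare "$HOME/.dotfiles.git"
    alias cfg='git --git-dir=$HOME/.dotfiles.git --work-tree=$HOME'
    cfg config status.showUntrackedFiles no
    # then: cfg add ~/.config/foo && cfg commit -m "..." && cfg push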
What's the benefit of this compared to rsync or scp $hostname:.config/<TAB>?
I put my whole home folder in git and that has its benefits (being able to see changes to files as they happen) but if I'm just copying a file or two of config I'll just cat or scp it--introducing git seems needlessly complex if the branches are divergent
Yes, I encourage my co-workers, when pushing to a common repo, to use `$user/$whatever` exactly to have their own namespace. The main selling point I'm making is that it makes cleanup of old branches easier, and less conflict-prone.
Tangentially related: when you have multiple local checkouts, often `git worktree` is more convenient than having completely independent local repository. See https://git-scm.com/docs/git-worktree
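For example (branch name made up):

    # second working copy of the same repo, different branch, no re-clone
    git worktree add ../project-hotfix hotfix/issue-123
    git worktree list
    git worktree remove ../project-hotfix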