diff options
| author | nicole mazzuca <mazzucan@outlook.com> | 2020-09-02 09:14:24 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2020-09-02 09:14:24 -0700 |
| commit | d73ead568b00063a060d3e026d083e2730fff809 (patch) | |
| tree | 5b25e4b961ce7f03e270b68cef0d36d3c0c75d89 /docs | |
| parent | 9740611cab8faa9d9ada57265471e8e4786d37bc (diff) | |
| download | vcpkg-d73ead568b00063a060d3e026d083e2730fff809.tar.gz vcpkg-d73ead568b00063a060d3e026d083e2730fff809.zip | |
[vcpkg RFC] initial registries RFC (#12881)
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/specifications/registries.md | 285 |
1 files changed, 285 insertions, 0 deletions
diff --git a/docs/specifications/registries.md b/docs/specifications/registries.md new file mode 100644 index 000000000..48b392a5c --- /dev/null +++ b/docs/specifications/registries.md @@ -0,0 +1,285 @@ +# Package Federation: Custom Registries + +As it is now, vcpkg has over 1400 ports in the default registry (the `/ports` directory). +For the majority of users, this repository of packages is enough. However, many enterprises +need to more closely control their dependencies for one reason or another, and this document +lays out a method which we will build into vcpkg for exactly that reason. + +## Background + +A registry is simply a set of packages. In fact, there is already a registry in vcpkg: the default one. +Package federation, implemented via custom registries, allows one to add new packages, +edit existing packages, and have as much or as little control as one likes over the dependencies that one uses. +It gives the control over dependencies that an enterprise requires. + +### How Does the Current Default Registry Work? + +Of course, the existing vcpkg tool does have packages in the official, +default registry. The way we describe these packages is in the ports tree – +at the base of the vcpkg install directory, there is a directory named ports, +which contains on the order of 1300 directories, one for each package. Then, +in each package directory, there are at least two files: a CONTROL or +vcpkg.json file, which contains the name, version, description, and features +of the package; and a portfile.cmake file which contains the information on +how to download and build the package. There may be other files in this +registry, like patches or usage instructions, but only those two files are +needed. + +### Existing vcpkg Registry-like Features + +There are some existing features in vcpkg that act somewhat like a custom +registry. The most obvious feature that we have is overlay ports – this +feature allows you to specify any number of directories as "overlays", which +either contain a package definition directly, or which contain some number of +package directories; these overlays will be used instead of the ports tree +for packages that exist in both places, and are specified exclusively on the +command line. Additionally, unfortunately, if one installs a package from +overlay ports that does not exist in the ports tree, one must pass these +overlays to every vcpkg installation command. + +There is also the less obvious "feature" which works by virtue of the ports +tree being user-editable: one can always edit the ports tree on their own +machine, and can even fork vcpkg and publish their own ports tree. +Unfortunately, this then means that any updates to the source tree require +merges, as opposed to being able to fast-forward to the newest sources. + +### Why Registries? + +There are many reasons to want custom registries; however, the most important reasons are: + +* Legal requirements – a company like Microsoft or Google + needs the ability to strictly control the code that goes into their products, + making certain that they are following the licenses strictly. + * There have been examples in the past where a library which is licensed under certain terms contains code + which is not legally allowed to be licensed under those terms (see [this example][legal-example], + where a person tried to merge Microsoft-owned, Apache-licensed code into the GPL-licensed libstdc++). +* Technical requirements – a company may wish to run their own tests on the packages they ship, + such as [fuzzing]. +* Other requirements – an organization may wish to strictly control its dependencies for a myriad of other reasons. +* Newer versions – vcpkg may not necessarily always be up to date for all libraries in our registry, + and an organization may require a newer version than we ship; + they can very easily update this package and have the version that they want. +* Port modifications – vcpkg has somewhat strict policies on port modifications, + and an organization may wish to make different modifications than we do. + It may allow that organization to make certain that the package works on triplets + that our team does not test as extensively. +* Testing – just like port modifications, if a team wants to do specific testing on triplets they care about, + they can do so via their custom registry. + +Then, there is the question of why vcpkg needs a new solution for custom registries, +beyond the existing overlay ports feature. There are two big reasons – +the first is to allow a project to define the registries that they use for their dependencies, +and the second is the clear advantage in the user experience of the vcpkg tool. +If a project requires specific packages to come from specific registries, +they can do so without worrying that a user accidentally misses the overlay ports part of a command. +Additionally, beyond a feature which makes overlay ports easier to use, +custom registries allow for more complex and useful infrastructure around registries. +In the initial custom registry implementation, we will allow overlay ports style paths, +as well as git repositories, which means that people can run and use custom registries +without writing their own infrastructure around getting people that registry. + +It is the intention of vcpkg to be the most user-friendly package manager for C++, +and this allows us to fulfill on that intention even further. +As opposed to having to write `--overlay-ports=path/to/overlay` for every command one runs, +or adding an environment variable `VCPKG_OVERLAY_PORTS`, +one can simply write vcpkg install and the registries will be taken care of for you. +As opposed to having to use git submodules, or custom registry code for every project, +one can write and run the infrastructure in one place, +and every project that uses that registry requires only a few lines of JSON. + +[legal-example]: https://gcc.gnu.org/legacy-ml/libstdc++/2019-09/msg00054.html +[fuzzing]: https://en.wikipedia.org/wiki/Fuzzing + +## Specification + +We will be adding a new file that vcpkg understands - `vcpkg-configuration.json`. +The way that vcpkg will find this file is different depending on what mode vcpkg is in: +in classic mode, vcpkg finds this file alongside the vcpkg binary, in the root directory. +In manifest mode, vcpkg finds this file alongside the manifest. For the initial implementation, +this is all vcpkg will look for; however, in the future, vcpkg will walk the tree and include +configuration all along the way: this allows for overriding defaults. +The specific algorithm for applying this is not yet defined, since currently only one +`vcpkg-configuration.json` is allowed. + +The only thing allowed in a `vcpkg-configuration.json` is a `<configuration>` object. + +A `<configuration>` is an object: +* Optionally, `"default-registry"`: A `<registry-implementation>` or `null` +* Optionally, `"registries"`: An array of `<registry>`s + +Since this is the first RFC that adds anything to this field, +as of now the only properties that can live in that object will be +these. + +A `<registry-implementation>` is an object matching one of the following: +* `<registry-implementation.builtin>`: + * `"kind"`: The string `"builtin"` +* `<registry-implementation.directory>`: + * `"kind"`: The string `"directory"` + * `"path"`: A path +* `<registry-implementation.git>`: + * `"kind"`: The string `"git"` + * `"repository"`: A URI + * Optionally, `"path"`: An absolute path into the git repository + * Optionally, `"ref"`: A git reference + +A `<registry>` is a `<registry-implementation>` object, plus the following properties: +* Optionally, `"scopes"`: An array of `<package-name>`s +* Optionally, `"packages"`: An array of `<package-name>`s + +The `"packages"` and `"scopes"` fields of distinct registries must be disjoint, +and each `<registry>` must have at least one of the `"scopes"` and `"packages"` property, +since otherwise there's no point. + +As an example, a package which uses a different default registry, and a different registry for boost, +might look like the following: + +```json +{ + "default-registry": { + "kind": "directory", + "path": "vcpkg-ports" + }, + "registries": [ + { + "kind": "git", + "repository": "https://github.com/boostorg/vcpkg-ports", + "ref": "v1.73.0", + "scopes": [ "boost" ] + }, + { + "kind": "builtin", + "packages": [ "cppitertools" ] + } + ] +} +``` + +This will install `fmt` from `<directory-of-vcpkg.json>/vcpkg-ports`, +`cppitertools` from the registry that ships with vcpkg, +and any `boost` dependencies from `https://github.com/boostorg/vcpkg-ports`. +Notably, this does not replace behavior up the tree -- only the `vcpkg-configuration.json`s +for the current invocation do anything. + +### Behavior + +When a vcpkg command requires the installation of dependencies, +it will generate the initial list of dependencies from the package, +and then run the following algorithm on each dependency: + +1. Figure out which registry the package should come from by doing the following: + 1. If there is a registry in the registry set which contains the dependency name in the `"packages"` array, + then use that registry. + 2. For every scope, in order from most specific to least, + if there is a registry in the registry set which contains that scope in the `"scopes"` array, + then use that registry. + (For example, for `"cat.meow.cute"`, check first for `"cat.meow.cute"`, then `"cat.meow"`, then `"cat"`). + 3. If the default registry is not `null`, use that registry. + 4. Else, error. +2. Then, add that package's dependencies to the list of packages to find, and repeat for the next dependency. + +vcpkg will also rerun this algorithm whenever an install is run with different configuration. + +### How Registries are Layed Out + +There are three kinds of registries, but they only differ in how the registry gets onto one's filesystem. +Once the registry is there, package lookup runs the same, with each kind having it's own way of defining its +own root. + +In order to find a port `meow` in a registry with root `R`, vcpkg first sees if `R/meow` exists; +if it does, then the port root is `R/meow`. Otherwise, see if `R/m-` exists; if it does, +then the port root is `R/m-/meow`. (note: this algorithm may be extended further in the future). + +For example, given the following port root: + +``` +R/ + abseil/... + b-/ + boost/... + boost-build/... + banana/... + banana/... +``` + +The port root for `abseil` is `R/abseil`; the port root for `boost` is `R/b-/boost`; +the port root for `banana` is `R/banana` (although this duplication is not recommended). + +The reason we are making this change to allow more levels in the ports tree is that ~1300 +ports are hard to look through in a tree view, and this allows us to see only the ports we're +interested in. Additionally, no port name may end in a `-`, so this means that these port subdirectories +will never intersect with actual ports. Additionally, since we use only ASCII for port names, +we don't have to worry about graphemes vs. code units vs. code points -- in ASCII, they are equivalent. + +Let's now look at how different registry kinds work: + +#### `<registry.builtin>` + +For a `<registry.builtin>`, there is no configuration required. +The registry root is simply `<vcpkg-root>/ports`. + +#### `<registry.directory>` + +For a `<registry.directory>`, it is again fairly simple. +Given `$path` the value of the `"path"` property, the registry root is either: + +* If `$path` is absolute, then the registry root is `$path`. +* If `$path` is drive-relative (only important on Windows), the registry root is + `(drive of vcpkg.json)/$path` +* If `$path` is relative, the registry root is `(directory of vcpkg.json)/$path` + +Note that the path to vcpkg.json is _not_ canonicalized; it is used exactly as it is seen by vcpkg. + +#### `<registry.git>` + +This registry is the most complex. We would like to cache existing registries, +but we don't want to ignore new updates to the registry. +It is the opinion of the author that we want to find more updates than not, +so we will update the registry whenever the `vcpkg.json` or `vcpkg-configuration.json` +is modified. We will do so by keeping a sha512 of the `vcpkg.json` and `vcpkg-configuration.json` +inside the `vcpkg-installed` directory. + +We will download the specific ref of the repository to a central location (and update as needed), +and the root will be either: `<path to repository>`, if the `"path"` property is not defined, +or else `<path to repository>/<path property>` if it is defined. +The `"path"` property must be absolute, without a drive, and will be treated as relative to +the path to the repository. For example: + +```json +{ + "kind": "git", + "repository": "https://github.com/microsoft/vcpkg", + "path": "/ports" +} +``` + +is the correct way to refer to the registry built in to vcpkg, at the latest version. + +The following are all incorrect: + +```json +{ + "$reason": "path can't be drive-absolute", + "kind": "git", + "repository": "https://github.com/microsoft/vcpkg", + "path": "F:/ports" +} +``` + +```json +{ + "$reason": "path can't be relative", + "kind": "git", + "repository": "https://github.com/microsoft/vcpkg", + "path": "ports" +} +``` + +```json +{ + "$reason": "path _really_ can't be relative like that", + "kind": "git", + "repository": "https://github.com/microsoft/vcpkg", + "path": "../../meow/ports" +} +``` |
