Providing external libraries for mw-core on Jenkins and production
Bryan Davis
2014-05-29 14:58:34 UTC
I thought I'd start this discussion in earnest here on the mw-core
list and then take it to a larger list if needed once we have a
reasonable plan.

My logging changes [0][1][2][3] are getting closer to being mergeable
(the first has already been merged). Tony Thomas' Swift Mailer change
[4] is also progressing. Both sets of changes introduce the concept of
specifying external library dependencies, both required and suggested,
to mediawiki/core.git via composer.json. Composer can be used by
people directly consuming the git repository to install and manage
these dependencies. I gave an example set of usage instructions in the
commit message for my patch that introduced the dependency on PSR-3
[0]. In the production cluster, on Jenkins job runners and in the
tarball releases prepared by M&M we will want a different solution.

My idea of how to deal with this is to create a new gerrit repository
(mediawiki/core/vendor.git?) that contains a composer.json file
similar to the one I had in patch set 7 of my first logging patch [5].
This composer.json file would be used to tell Composer the exact
versions of libraries to download. Someone would manually run Composer
in a checkout of this repository and then commit the downloaded
content, composer.lock file and generated autoload.php to the
repository for review. We would then be able to branch and use this
repository as a git submodule in the wmf/1.2XwmfY branches that are
deployed to production and ensure that it is checked out along with
mw-core on the Jenkins nodes. By placing this submodule at $IP/vendor
in mw-core we would be mimicking the configuration that direct users
of Composer will experience. WebStart.php already includes
$IP/vendor/autoload.php when present so integration with the rest of
mw-core should follow from that. It would also be possible for M&M to
add this repo to their tarballs for distribution.
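
As a rough illustration of the "exact versions" requirement above, the
following Python sketch flags any entry in a composer.json "require"
section that is a range rather than a single pinned version. The
function name, the example manifest, and the monolog/monolog entry are
assumptions made up for this example; they are not taken from the
patches under review.

```python
import json
import re

# Matches an exact version such as "1.0.0" or "v2.1.3-beta1". Constraints
# that use range operators (^, ~, *, >=, |, spaces) do not match.
EXACT_VERSION = re.compile(r"^v?\d+(\.\d+){1,3}(-[0-9A-Za-z.]+)?$")


def unpinned_requirements(manifest_text):
    """Return {package: constraint} for requirements that are not exact."""
    manifest = json.loads(manifest_text)
    loose = {}
    for package, constraint in manifest.get("require", {}).items():
        # Platform requirements (php, ext-*) describe the runtime and are
        # not downloaded, so they are skipped here.
        if package == "php" or package.startswith("ext-"):
            continue
        if not EXACT_VERSION.match(constraint):
            loose[package] = constraint
    return loose


# Hypothetical manifest, for illustration only.
EXAMPLE_MANIFEST = """
{
    "require": {
        "php": ">=5.3.2",
        "psr/log": "1.0.0",
        "monolog/monolog": "~1.9"
    }
}
"""

print(unpinned_requirements(EXAMPLE_MANIFEST))
```

A check like this could run before the downloaded content is committed
for review, so that a loose constraint is caught before it reaches the
submodule.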

I think Ori has a slightly different idea about how to approach this
issue. I'd like to hear his idea in this thread and then reach
consensus on how to move forward here or take both ideas (and any
other credible alternatives) to a larger list for a final decision.

[0]: https://gerrit.wikimedia.org/r/#/c/119939/
[1]: https://gerrit.wikimedia.org/r/#/c/119940/
[2]: https://gerrit.wikimedia.org/r/#/c/119941/
[3]: https://gerrit.wikimedia.org/r/#/c/119942/
[4]: https://gerrit.wikimedia.org/r/#/c/135290/
[5]: https://gerrit.wikimedia.org/r/#/c/119939/7/libs/composer.json,unified

Bryan Davis Wikimedia Foundation <***@wikimedia.org>
[[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA
irc: bd808 v:415.839.6885 x6855
Bryan Davis
2014-05-29 16:14:42 UTC
I was just talking about this email with RobLa and he brought up a use
case that my current description doesn't fully explain, and I
remembered one that Ori gave on irc that is similar but slightly
different.

RobLa's example is that of an external library that we need to patch
for WMF usage and upstream the change. To keep from blocking things
for our production cluster we would want to fork the upstream, add our
patch for local use and upstream the patch. During the time that the
patch was pending review in the upstream we would want to use our
locally patched version in production and Jenkins.

Composer provides a solution for this with its "repositories" setting
for package sources. The Composer documentation actually gives this exact example
in their discussion of the "vcs" repository type [6]. We would create
a gerrit repository tracking the external library, add our patch(es),
adjust the composer.json file in mediawiki/core/vendor.git to
reference our fork, and finally run Composer in
mediawiki/core/vendor.git to pull in our patched version.
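
The manifest adjustment described above might look roughly like the
Python sketch below, which adds a "vcs" entry to the "repositories"
list and points the requirement at a branch of the fork. The fork URL,
the branch name, and the package name are hypothetical placeholders.
The sketch also assumes the fork keeps the upstream package name in
its own composer.json, which Composer needs in order to substitute it
for the upstream package.

```python
import copy
import json


def use_fork(manifest, package, fork_url, branch):
    """Return a copy of manifest that resolves package from a patched fork."""
    patched = copy.deepcopy(manifest)
    repos = patched.setdefault("repositories", [])
    entry = {"type": "vcs", "url": fork_url}
    if entry not in repos:
        repos.append(entry)
    # Composer addresses a VCS branch named "foo" as version "dev-foo".
    patched.setdefault("require", {})[package] = "dev-" + branch
    return patched


# Hypothetical values, for illustration only.
original = {"require": {"monolog/monolog": "1.9.1"}}
forked = use_fork(
    original,
    "monolog/monolog",
    "https://gerrit.example.org/r/forks/monolog",
    "wmf-patches",
)
print(json.dumps(forked, indent=4, sort_keys=True))
```

Once the upstream accepts the patch, the "repositories" entry can be
dropped and the requirement pinned back to a released version.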

The example that Ori gave on irc was for libraries that we are
extracting from mw-core and/or extensions to be published externally.
This may be done for any and all of the current $IP/includes/libs
classes and possibly other content from core such as FormatJson.

My idea for this would be to create a new gerrit repository for each
exported project. The project repo would contain a composer.json
manifest describing the project so that it can be published at
packagist.org like most Composer-installable libraries. In the
mediawiki/core/vendor.git composer.json file we would pull in these
libraries just like any third-party library. This isn't functionally
much different from the way that we use git submodules today, but
there is one extra level of indirection when a library is changed: the
mediawiki/core/vendor.git repository will have to be updated with
the new library version before the hash for the git submodule of
mediawiki/core/vendor.git is updated in a deploy or release branch.
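
Because of that extra step, a reviewer of a vendor repository update
would probably want to see which library versions actually moved. A
minimal sketch of that comparison, assuming two composer.lock files
are available as text, is shown below. It reads only the "packages"
list and ignores "packages-dev" for brevity. The package name
example/format-json is invented for this example.

```python
import json


def locked_versions(lock_text):
    """Map package name to locked version from the text of a composer.lock."""
    lock = json.loads(lock_text)
    return {pkg["name"]: pkg["version"] for pkg in lock.get("packages", [])}


def version_changes(old_lock_text, new_lock_text):
    """Return {package: (old_version, new_version)} for every difference.

    None in either position marks a package that was added or removed.
    """
    old = locked_versions(old_lock_text)
    new = locked_versions(new_lock_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


# Hypothetical lock file contents, for illustration only.
OLD_LOCK = json.dumps({"packages": [{"name": "psr/log", "version": "1.0.0"}]})
NEW_LOCK = json.dumps({
    "packages": [
        {"name": "psr/log", "version": "1.0.0"},
        {"name": "example/format-json", "version": "0.1.0"},
    ]
})

print(version_changes(OLD_LOCK, NEW_LOCK))
```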

[6]: https://getcomposer.org/doc/05-repositories.md#vcs

Chris Steipp
2014-05-29 17:05:37 UTC
I'm assuming we'll eventually branch the project repo for each MediaWiki
release, so if MediaWiki 1.24 relies on one version of a library and
1.25 on another, that will all get handled?

Obligatory security questions:
* Who is going to approve what libraries we use, since we're basically
blessing the version we use? And are we going to require code reviews for
all of them?
* Who is going to remain responsible for making sure that security updates
in those dependencies are merged with our repos and new versions of
mediawiki tarballs released?

(/me yells "Not it!")

As long as we have strong, ongoing, internal commitment to this, then I
don't see a problem.
Bryan Davis
2014-05-29 17:42:35 UTC
I just rewrote and sent this email to wikitech-l with the
encouragement of Ori. It would probably be good for Chris to share his
concerns publicly there.

As to these questions, yeah we need to figure this out. I think the
cat is already out of the bag on using external libraries. Short of a
veto of the concept by this group I think it's down to a question of
"how" rather than "when" or "if".

I think review should certainly be required to get a new library
approved. We don't want to open up the floodgates to allow any random
code into use by mediawiki/core. As to the level of review needed for
any particular library, I'm not sure that I'm qualified to answer this
definitively. Maybe any new external library should be subject to the
RFC process to plead the case for why it is needed?

Perhaps for the tracking of security issues each library would have an
"owner" (probably the original importer) who would be answerable to
Chris for tracking and updating the library? Do we have any process
today for the various JavaScript libraries we rely on? /me bets it's
something like "Krinkle takes care of that."

Ori Livneh
2014-05-29 16:52:09 UTC
What you propose seems sensible. +1 from my perspective.