Problem/Motivation

According to the repo page, the repo size is 1.1 GB, which is huge for a repo.

Even the repo size of the Drupal core is just 174.5 MB.

I think cloning the repo is very slow in some countries, including China, where I am.

At commit 4d6578f3:

$ du -sh * | sort -rh
337M    ebooks
87M     source
1.8M    assets
928K    scripts
444K    guidelines
16K     templates
8.0K    README.uk.txt
4.0K    README.txt
4.0K    ASSETS.yml

The ebooks folder is by far the largest and takes up the most space.

I also found that the tag/release page of the GitLab instance that d.o. uses can have files attached, for example: https://git.drupalcode.org/project/user_guide/tags/8.x-7.1/release/edit

Proposed resolution

  • Attach the files under the ebooks folder to the tag/release page, and remove the ebooks folder from the repo. (I don't know the maximum file size allowed.)
  • If possible, remove all ebook-related commits or rewrite the git commit history to decrease the repo size further.

Remaining tasks

Needs discussion

Comments

jungle created an issue. See original summary.

jhodgdon’s picture

Component: User Guide content » Project management
Category: Support request » Task

Those are good suggestions. I will talk to the drupal.org infrastructure team this coming week to see what we can do. Thanks for making an issue!

jungle’s picture

Slightly updated the description. And thanks, Jennifer, for your quick response.

andypost’s picture

eBooks are build artefacts, and I see no reason to store them in the repo, so it is better to attach them to a release.

The other huge question is how to remove the files from the repo history...

PS: the workaround is to use the depth argument for fetch, but discarding history is not very useful.
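
For reference, here is a minimal sketch of the shallow-clone workaround mentioned above. The helper name shallow_clone is hypothetical, and the clone URL in the comment is an assumption based on the project page address. A shallow clone fetches only the most recent commit, so older history is not available locally.

```shell
# Hypothetical helper: clone only the most recent commit of a repository.
shallow_clone() {
  repo_url="$1"    # any URL git can fetch from
  target_dir="$2"  # directory to create
  git clone --quiet --depth 1 "$repo_url" "$target_dir"
}

# Example (needs network access; URL assumed from the project page):
# shallow_clone https://git.drupalcode.org/project/user_guide.git user_guide
```

Note that git ignores --depth for plain local paths, so a local test needs a file:// URL.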

jhodgdon’s picture

The reason the eBooks are in the repo is so that people who download the zip file from the project page get them in the download. The point is that people who would want to read the User Guide are new to Drupal, and may not be sophisticated users of drupal.org, so we want to have an easy way for them to find and download the ebooks. If they are not in the repo, I still need a way to tell people where to find them, and I am not sure this idea of attaching them to a release in GitLab is the right answer. How will people find them?

And by the way, I asked in the infrastructure Slack channel yesterday about this but didn't get any answers...

jhodgdon’s picture

Sorry for the delay on this! I talked with @mixologic and @drumm today, and we have a plan, which we're working on:

a) I will create a doc page where we can attach the ebooks zip files for download. [working on that now]

b) I will attach the current ebooks zip files to that page. [working on that now]

c) I will create new branches for all existing branches in the repo, with everything as it was except no ebooks. The branches will be called, for example, 8.x-7.x-new.

d) I will create a new release tag for a new release 8.x-7.2 on the new branch.

e) @drumm will update the existing release nodes and tags, so that they point to the new branches.

f) @drumm will delete the old branches, and the ebooks directory and its git history.

g) Anyone with a git clone will need to reclone. I will send out email to our User Guide email list with details.

@drumm may have some modifications of steps (e) and (f)... but that's the general plan. I'll update here as steps are completed.
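
The thread does not say how the -new branches in step (c) were produced. One possible approach, shown here only as a sketch, is to copy a branch and rewrite the copy with git filter-branch so that no commit contains the ebooks directory. The helper name make_branch_without_ebooks is hypothetical, and the choice of filter-branch is an assumption, not the method confirmed in this issue.

```shell
# Hypothetical helper: create "<branch>-new" as a copy of <branch> whose
# history has the ebooks/ directory removed from every commit.
# Assumes a clean working tree; the original branch is left untouched.
make_branch_without_ebooks() {
  branch="$1"
  git branch "${branch}-new" "$branch" &&
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
    --index-filter 'git rm -r --cached --quiet --ignore-unmatch ebooks' \
    --prune-empty -- "${branch}-new"
}

# Example: make_branch_without_ebooks 8.x-7.x
```

Commits that only touched ebooks become empty after the rewrite, and --prune-empty drops them. Because every rewritten commit gets a new hash, existing clones cannot simply pull the new branch, which matches step (g).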

  • jhodgdon committed 60aa7e5 on 8.x-7.x
    Issue #3074020 by jhodgdon: Update gitignore and add script to make zips...
jhodgdon’s picture

I've completed (a) and (b) -- the page is: https://www.drupal.org/docs/8/understanding-drupal-8/user-guide-e-book-d...

I've also updated the User Guide home page to tell people to go to that page to download ebooks.

For (c)... The branch 8.x-0.x did not have ebooks, so we don't need to update it. There was not an 8.x-1.x.
So we will need these new branches:
8.x-2.x-new
8.x-3.x-new
8.x-4.x-new
8.x-5.x-new
8.x-6.x-new
8.x-7.x-new

And there are 11 release tags on those branches.

Anyway I'm working on (c).

jhodgdon’s picture

(c) is done -- the -new branches have all been created.

jungle’s picture

@jhodgdon++

jhodgdon’s picture

(d) is done -- 8.x-7.2 tag has been created on the 8.x-7.x-new branch. I've also sent notice to the email list asking people not to commit today, and letting them know that when the changeover is done, they will need to re-clone. @drumm says he will do his part of this (thanks!!) later this afternoon.

As a note, the download for 8.x-7.2 is 73 MB, whereas 8.x-7.1 was 368 MB. On the e-book download page, I had to separate the downloads by language to make smaller files... I wrote a quick script to package them up (that's the earlier commit you can see on this issue).

drumm’s picture

The branches are now swapped. The *-old branches are copies of the old branches; they can be deleted when we think everything is working well. The regular, unsuffixed branches are now copies of *-new, and I deleted *-new.

Once the *-old branches are deleted, we can run garbage collection on the GitLab server, and the “1.5 GB Files” listed at https://git.drupalcode.org/project/user_guide should go down a little.

I made a fresh clone of the repository; it is still 1.14 GiB for the 8.x-7.x branch, which I suspect is due to all the revisions of all the screenshots in every language.
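
A rough way to check the size of a local clone is sketched below. The helper name packed_size_kib is hypothetical. It only measures the clone it is pointed at; the figure on the GitLab project page is computed by the server and may differ.

```shell
# Hypothetical helper: repack a local clone, prune unreachable objects,
# and print the size of its pack files in KiB.
packed_size_kib() {
  repo_dir="$1"
  git -C "$repo_dir" gc --quiet --prune=now &&
  git -C "$repo_dir" count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# Example: packed_size_kib ./user_guide
```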

jhodgdon’s picture

Status: Active » Needs review

I posted a note to the email list letting everyone know they can clone now. I also suggested doing a clone with --depth, which greatly reduces the download needed.

Anyway... the repo looks good to me so far... I'll leave this issue at Needs Review until we decide it is Fixed. I'll build ebooks locally as a test... do you have a job that imports the repo to Staging periodically? If not, maybe we should pull the trigger on that, with the latest 8.x-7.x or that 8.x-7.2 tag?

drumm’s picture

I walked through all the Drupal.org www & batch servers which maintain clones of this repo. Two had gotten into a weird state; the other six were fine. They are all now clean, and the clones were made shallow again, since this repo size has been an occasional issue for those servers, too.

Imports are running on staging, in response to the Git activity, and look okay so far.

jhodgdon’s picture

Local ebook build completed with no problems.

baluertl’s picture

Thanks, Jennifer and drumm, for the hard work towards cleaning up the repo.

I have now recloned, checked the [user] section of my Git config, and pushed a fresh commit to test whether everything works as before. It does!

jhodgdon’s picture

I think we can go ahead and do the steps in comment #12 to clean up and garbage-collect. Thanks!

guiu.rocafort.ferrer’s picture

I also tested in a clean git clone and completed a local ebook build without problems.

I also think we can go ahead with the steps in #12.

jhodgdon’s picture

@drumm -- It seems fine to go ahead and remove the -old branches and garbage collect. Thanks!

jhodgdon’s picture

OK, I did a git push origin --delete for all of the -old branches.
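
The same cleanup can be scripted as a loop. This is only a sketch: the helper name delete_old_branches is hypothetical, and it assumes that every remote branch ending in -old should be removed.

```shell
# Hypothetical helper: delete every branch on the given remote whose
# name ends in "-old". Run it from inside a clone of the repository.
delete_old_branches() {
  remote="${1:-origin}"
  git ls-remote --heads "$remote" |
    awk '{ sub(/^refs\/heads\//, "", $2); print $2 }' |
    grep -- '-old$' |
    while read -r branch; do
      # Redirect stdin so git cannot consume the loop's input.
      git push "$remote" --delete "$branch" </dev/null
    done
}

# Example: delete_old_branches origin
```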

jhodgdon’s picture

Status: Needs review » Fixed

It looks like the garbage collection is done. The repo size is now 1.1 GB on https://git.drupalcode.org/project/user_guide, down from the 1.5 GB reported in comment #12.

There are a lot of images... Most of them are each under 100KB, with just 4 in the English directory in the 100-200KB range, and the other languages have similar sizes... but they add up. We currently have 13 languages, and each one has about 100 images.
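
A count like the one above can be reproduced with find. The sketch below is an assumption-laden example: the helper name image_stats is hypothetical, and it assumes the screenshots are PNG files under a per-language directory.

```shell
# Hypothetical helper: report how many PNG images a directory tree holds
# and how many of them are larger than 100 KiB.
image_stats() {
  dir="$1"
  total=$(find "$dir" -type f -name '*.png' | wc -l | tr -d ' ')
  large=$(find "$dir" -type f -name '*.png' -size +100k | wc -l | tr -d ' ')
  echo "$dir: $total images, $large over 100 KiB"
}

# Example (directory name is an assumption): image_stats source/en
```

With 13 languages at about 100 images each, that is roughly 1,300 images per revision, which is why the history adds up even though each file is small.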

I think we're done here.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.