Problem/Motivation

  1. Code is baked in to locale module. Drupal core has some support for parsing and generating Gettext Portable Object (.po) files. This functionality was improved with Drupal 7 to support text context, that was part of the spec but not implemented in Drupal before. However, the actual code that is used to process and generate Gettext files is so baked in, there is no practical way to reuse at all. You can only import all strings from a filename (tied together all through from opening the file to parsing and writing to the database); you can only export direct from the database for a HTTP header equipped download. Everything is vey tightly coupled.
  2. Duplicate code done with other modules As a consequence, contrib modules such as l10n_client, l10n_server, l10n_update and gettextapi need to duplicate this code.
  3. Parser is bound to fixed sources (memory, db) The current .po file parser for example takes modes like memory or database level storage and it has fixed callbacks invoked to store the data. It also takes a filename which it opens and reads itself. This makes it impossible to reuse, read in chunks, read from other sources, store at other places, etc. The above mentioned modules have examples of reworked versions of this API.
  4. It should be independent code Ultimately the gettext .po file parsing and generation code should be entirely independent of the locale module UI and/or database.
  5. Memory and runtime limits due to all in imports of possibly big files see #1637348: Import Gettext .po files in progressive batches to avoid time limits

Proposed resolution

Create a new Component Drupal/Component/Gettext

By introducing a generic Gettext component we get independent code. This will move gettext.inc functions into their new classes. The component must be capable of reading from/writing to *.po files, reading/writing to database, 'writing' to memory. Code involving reading and writing to the database is locale module specific, so those implementations should live with the locale module. More generic implementations for file parsing and generation and memory based loading (for the installer) should reside with the component.

Proposal is to have separate components for each direction (read, write) and each source type. This leads to:

  1. PoStreamReader
  2. PoStreamWriter
  3. PoDatabaseReader (implemented inside Drupal\locale\GetText, because it is locale database specific)
  4. PoDatabaseWriter (implemented inside Drupal\locale\GetText, because it is locale database specific)
  5. PoMemoryWriter

We can then chain a Reader to a Writer (core common implementations in Drupal\locale\GetText) to get a job done:

  • PoStreamReader -> PoDatabaseWriter (importing a translation)
  • PoStreamReader -> PoMemoryWriter (installing Drupal in ones own language)
  • PoDatabaseReader -> PoStreamWriter (exporting a translation, creating a *.POT file)

The architectural overview of the interfaces and classes is proposed is the following:

This solves all the problems explained above except the memory/runtime limit issue. We have been working hard to solve that as part of this patch, but after 200+ comments and lots of existing work, we decided to move that to a followup at #1637348: Import Gettext .po files in progressive batches to avoid time limits.

Related issues

Original report by Gábor Hojtsy

I've proposed this as a GSoC project at http://groups.drupal.org/node/138369 but nobody was interested. It is probably not interesting enough, but still needs to be done. Here is my existing writeup.

Drupal core has some support for parsing and generating Gettext Portable Object (.po) files. This functionality was improved with Drupal 7 to support text context, that was part of the spec but not implemented in Drupal before. However, the actual code that is used to process and generate Gettext files is so baked in, there is no practical way to reuse at all.

As a result, for example Translation template extractor and Localization server duplicate code for generating .po files themselves while Localization update and Localization server duplicate code to process .po files. While these implementations are more pluggable, there are already too many copies and the bigger problems are not solved. A major problem with .po file parsing is our one-pass approach, which bumps into memory and runtime limits on basic hosting. (That issue is being discussed at #569004: Add support for seek based batch import of .po files).

The current .po file parser for example takes modes like memory or database level storage and it has fixed callbacks invoked to store the data. It also takes a filename which it opens and reads itself. This makes it impossible to reuse, read in chunks, read from other sources, store at other places, etc. The above mentioned modules have examples of reworked versions of this API.

Ultimately the gettext .po file parsing and generation code should be entirely independent of the locale module UI and/or database.

CommentFileSizeAuthor
#268 1189184-locale-oop-268.patch610 bytesandypost
#265 locale-catch-265.patch1.29 KBKristen Pol
#264 locale-catch-264.patch1.3 KBKristen Pol
#259 locale-catch-259.patch1.28 KBpenyaskito
#257 locale-catch.patch1.18 KBGábor Hojtsy
#252 gettext-changelog.patch490 bytesGábor Hojtsy
#251 gettext-changelog.txt490 bytesGábor Hojtsy
#245 i1189184-245-interdiff.txt3.99 KBattiks
#245 i1189184-245.patch117.24 KBattiks
#244 interdiff.txt2.18 KBpenyaskito
#244 i1189184-244.patch116.73 KBpenyaskito
#243 interdiff.txt3.37 KBGábor Hojtsy
#243 i1189184-243.patch116.53 KBGábor Hojtsy
#241 i1189184-241.patch116.43 KBGábor Hojtsy
#241 interdiff.txt4.81 KBGábor Hojtsy
#240 interdiff.txt19.93 KBGábor Hojtsy
#240 i1189184-240.patch117.06 KBGábor Hojtsy
#237 interdiff.txt10.52 KBGábor Hojtsy
#237 i1189184-237.patch117.21 KBGábor Hojtsy
#236 i1189184-236.patch116.87 KBGábor Hojtsy
#235 interdiff.txt45.89 KBGábor Hojtsy
#234 i1189184-234.patch113.34 KBattiks
#232 i1189184-232.patch113.29 KBattiks
#230 i1189184-230.patch113.29 KBattiks
#225 i1189184-225.patch114.15 KBattiks
#223 i1189184-223.patch113.29 KBattiks
#212 i1189184-212.patch112.14 KBattiks
#209 i1189184-209.patch112.1 KBattiks
#207 i1189184-207.patch111.73 KBattiks
#205 i1189184-195.patch111.9 KBattiks
#202 1189184-gettext-parsing-202.patch679.61 KBpenyaskito
#200 GettextOOP.png63.15 KBGábor Hojtsy
#194 i1189184-194.patch107.72 KBattiks
#192 i1189184-192.patch107.71 KBattiks
#189 i1189184-188.patch107.5 KBattiks
#186 i1189184-186.patch113.74 KBattiks
#182 i1189184-180.patch133.43 KBattiks
#180 i1189184-180.patch133.49 KBattiks
#175 1189184-gettext-parsing-175.patch133.56 KBpenyaskito
#170 i1189184-170.patch143.02 KBattiks
#169 1189184-gettext-parsing-164.patch143.36 KBpenyaskito
#166 1189184-gettext-parsing-164.patch1.26 MBpenyaskito
#164 1189184-gettext-parsing-154.patch146.26 KBpenyaskito
#154 i1189184.patch142.62 KBattiks
#148 1189184-gettext-parsing-133.patch132.71 KBpenyaskito
#138 i1189184-138-4-do-not-test.patch0 bytesattiks
#133 1189184-gettext-parsing-131.patch125.07 KBpenyaskito
#131 1189184-gettext-parsing-129.patch110.96 KBpenyaskito
#131 interim.txt1.65 KBpenyaskito
#129 1189184-gettext-parsing-127.patch110.97 KBpenyaskito
#127 1189184-gettext-parsing-125.patch111.47 KBpenyaskito
#125 1189184-gettext-parsing-124.patch111.09 KBpenyaskito
#125 interim.txt40.36 KBpenyaskito
#123 1189184-gettext-parsing-123.patch111.06 KBclemens.tolboom
#119 1189184-gettext-parsing-119.patch111.32 KBclemens.tolboom
#116 1189184-gettext-parsing-116.patch111.19 KBclemens.tolboom
#100 1189184-gettext-parsing-100.patch111.66 KBclemens.tolboom
#95 1189184-gettext-parsing-95.patch111.65 KBSutharsan
#90 1189184-gettext-parsing-90.patch111.65 KBclemens.tolboom
#86 UML-gettext.png46.15 KBclemens.tolboom
#84 1189184-gettext-parsing-84.patch111.49 KBclemens.tolboom
#79 1189184-gettext-parsing-79.patch108.13 KBclemens.tolboom
#76 1189184-gettext-parsing-76.patch109.75 KBclemens.tolboom
#70 1189184-gettext-parsing-70.patch92.27 KBclemens.tolboom
#66 1189184-gettext-parsing-66.patch75.12 KBclemens.tolboom
#64 1189184-gettext-parsing-64.patch69.58 KBclemens.tolboom
#59 1189184-gettext-parsing-58.patch69.92 KBclemens.tolboom
#56 1189184-gettext-parsing-56.patch56.06 KBclemens.tolboom
#53 1189184-gettext-parsing-53.patch50.48 KBclemens.tolboom
#51 1189184-gettext-parsing-51.patch49.6 KBclemens.tolboom
#33 1189184-gettext-parsing-33.patch42.55 KBclemens.tolboom
#32 1189184-gettext-parsing-32.patch37.32 KBSutharsan
#29 20120323-gbntg1y73fycp3hhi8wkhw253t.jpeg84.45 KBGábor Hojtsy
#26 1189184-gettext-parsing-26.patch24.65 KBSutharsan
#25 gettext.sketch.txt7.61 KBSutharsan
#19 1189184-gettext-parsing-18.patch5.32 KBSutharsan
#13 gettextWriter.patch2.5 KBGábor Hojtsy
#8 incorporate-gettextapi.patch51.64 KBGábor Hojtsy
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Gábor Hojtsy’s picture

Gábor Hojtsy’s picture

#361597: CRUD API for locale source and locale target strings is a requirement to make this work well.

Gábor Hojtsy’s picture

#420510: Possibly misleading documentation/func naming for developers on locale.inc functions is not a duplicate clearly but makes a great point that the function naming and responsibility separation is broken. It is going to be resolved here and in #361597: CRUD API for locale source and locale target strings, so marked it duplicate.

Gábor Hojtsy’s picture

Assigned: Unassigned » Gábor Hojtsy

I'm hard at work on this as well as #361597: CRUD API for locale source and locale target strings at http://drupal.org/project/gettextapi for Drupal 7 so we can work out our solution in the wild, get a lot of real life use and then integrate into Drupal 8. The import refactoring is already very well ahead there.

Gábor Hojtsy’s picture

I've documented how this fits to overall Druapl 8 plans at http://groups.drupal.org/node/161589.

Gábor Hojtsy’s picture

Gábor Hojtsy’s picture

Title: Decouple gettext format handling from locale database and API » Decouple gettext format handling from locale database and API, integrate gettext_api module in core
Gábor Hojtsy’s picture

Status: Postponed » Needs review
FileSize
51.64 KB

#1219236: Locale module includes 1350+ lines of unneeded code on all page loads was committed to Drupal 8, so here is a quick stab at just taking the D7 gettextapi approach and applying it to current D8 core. You'll see that there are many callbacks and states passed around, so these are clear signs of possibilities to improve this much better with an OOP approach. Anyway, here is a direct port of the module to core for now to start this off. Lots to do here.

Gábor Hojtsy’s picture

#8: incorporate-gettextapi.patch queued for re-testing.

Gábor Hojtsy’s picture

#1260586: Consolidate .po file import to one directory touches the very same code and I consider that more important at the moment (also useful for installer usability improvements) and generally a gateway for starting integration work for l10n_update, so putting this one a bit on hold for now.

Gábor Hojtsy’s picture

Status: Needs review » Postponed

Adding UI language translation tag.

Also marking postponed for now on ongoing tasks like #532512: Plural string storage is broken, editing UI is missing

Gábor Hojtsy’s picture

Issue tags: +language-ui

Add interface language tag.

Gábor Hojtsy’s picture

Title: Decouple gettext format handling from locale database and API, integrate gettext_api module in core » Make gettext parsing/generation its own abstracted functionality
FileSize
2.5 KB

Retitling, since gettextapi in contrib is all procedural but we'll obviously want to rework this to be all around buzzword compliant like PSR-0 named/placed, etc. We want to likely attack the export functionality first once #532512: Plural string storage is broken, editing UI is missing lands, since export is much easier to do. I've in fact started out working on this and have a rough version here... It will not work, but I started to minimize/modernize the code proper. The idea is that an instance of this would get a string iterator that would feed the export with strings and it would format the individual strings with its formatString(). This is just a start. Not even setting for review, since we need #532512: Plural string storage is broken, editing UI is missing to land first.

Sutharsan’s picture

Gábor Hojtsy’s picture

The CodeIgniter code uses PHP's built-in gettext extension that is a no-go for Drupal (eg. it needs to kill the PHP processes to refresh the gettext info from the files last time I tried). Also I could not find actual .po parsing or generation code when skimming through the WP code.

Sutharsan’s picture

Sutharsan’s picture

formatString() in #13 works fine, except it removes the trailing space from every string part. An additional parameter for wordwrap() restores the space:

    $parts = explode("\n", wordwrap(str_replace('\n', "\\n\n", $string), 70, " \n"));
Sutharsan’s picture

My first wobbly steps in Drupal OOP. Be kind on me, but hard on the code ;) Added a translations database iterator to accompany to Gábor's (#13) Writer class.
I sure hope it is worth to use an iterator here because processing db records takes twice as much time as the usual db_query + foreach. Yes I see the benefits of seek based batch imports, but do we need all this flexibility when reading translations from the database?
The patch requires #532512: Plural string storage is broken, editing UI is missing.

Gábor Hojtsy’s picture

Is dbtng not providing us with ways to ask if we can continue to fetch? That is in itself iterator based, so I assume we can just ask, and that might mitigate most of the overhead.

Gábor Hojtsy’s picture

Status: Postponed » Needs review

Btw patch looks good. #532512: Plural string storage is broken, editing UI is missing landed, so thtat unblock this issue. Yay!

Gábor Hojtsy’s picture

Assigned: Gábor Hojtsy » Unassigned
Issue tags: +sprint

On that iterator, in other words, if we define our API to expect objects like the ones we are getting from the query result, we can forego doing are own iterator and just depend on the query result iterator, which is already there. If we are fine with the data directly received from the query.

Gábor Hojtsy’s picture

Title: Make gettext parsing/generation its own abstracted functionality » Make gettext .po generation its own abstracted functionality

Retitling for the export we are working on.

Sutharsan’s picture

Assigned: Unassigned » Sutharsan
Status: Needs review » Needs work

One more reference: Gettext and locale API project.
I've started to sketch out the code and expect to have some results in the following days. Just in time for the sprint ;)

Sutharsan’s picture

FileSize
7.61 KB

The sketch code I'd like to share. It covers the procedural part of the gettext import/export handling, the batch processing and the required methods from a GettextData class. No executable code yet, but big steps. Leaving out all fault handling, exceptions and user experience. Next is to sketch out the Gettext class(es).

Sutharsan’s picture

The patch contains:

  • abstract class Reader
  • abstract class Writer
  • class PoWriter extends Writer
  • interface GettextInterface
  • class GettextFileInterface implements GettextInterface

I need review and advise on how to connect a (generic) data source/target to the reader/writer. And whether to use an abstract class, use an traversable or interator, use a steam wrapper.

Gábor Hojtsy’s picture

I started reviewing this as a patch, but it looks like would need more of a review as an "architecture", ie. do you have a drawing of how different components fit here? It looked like pretty well organized on a quick pass, but it would be great if I'd not need to recreate a figure on how it works, if you have one or can produce one quick :) Ie. the different classes introduced, and how they are related. I hope it should be simple to produce :) Thanks.

Sutharsan’s picture

A quick UML of the Writer + GettextInterface classes: https://skitch.com/erikstielstra/8csg9/gettext-uml.graffle

Gábor Hojtsy’s picture

Posting here so it is sure available later.

20120323-gbntg1y73fycp3hhi8wkhw253t.jpeg

Gábor Hojtsy’s picture

Ok, so the patch has GettextPoFile, GettextDatabase, etc. that are not reflected here. Also it has both import and export which is not reflected on this figure. Can you elaborate a bit on it?

Sutharsan’s picture

For the architecture, you may ignore the gettext.sketch.inc file in the beginning of the patch. This was written before I started working on the classes. It should read $file = new GettextFileInterface('public://test-1.po.txt'); Below you find a piece of testing code I used for the PoWrite class:

    use Drupal\Core\Gettext\GettextFileInterface;
    use Drupal\Core\Gettext\PoWriter;
     
    function test_gettext() {
     
      $langcode = 'nl';
      $translations = array(
        (object) array(
          'source' => 'home',
          'translation' => 'thuis',
          'plural' => 0,
          'context' => '',
        ),
        (object) array(
          'source' => 'delete',
          'translation' => 'verwijderen',
          'plural' => 0,
          'context' => '',
        ),
        (object) array(
          'source' => array('1 day', '@count days'),
          'translation' => array('1 dag', '@count dagen'),
          'plural' => 1,
          'context' => '',
        ),
      );
     
      $file = new GettextFileInterface('public://test-1.po.txt');
      $target = new PoWriter($file, $langcode);
      $header = array(
          'authors' => array('me', 'you'),
          'po_date' => date("Y-m-d H:iO"),
          'plurals' => 'nplurals=2; plural=(n > 1);',
        );
      $target->setMetaData($header, $langcode);
      foreach ($translations as $translation) {
        $target->write($translation);
      }
    }

For simplicity I only focused on the Write side. Architecturally the Read side should be similar. Write is an abstract class, PoWrite, DatabaseWrite are derived from that and handle the syntax specific Writing. Translations are transferred between Read and Write as an (translation) object. PoWrite recieves a translations object and passes a set of formatted strings to the data interface. But this is where my problems begin. I thought to use the stream wrapper interface for file handling. But could not make a smooth connection with it. In the patch the GettextFileInterface class uses the PublicStream. This should be replaced by a TranslationStream class which uses the translation directory instead of the files directory.

Sutharsan’s picture

Reader and PoReader class added.
high level functions in gettext.sketch.inc modified to match the current architecture.

clemens.tolboom’s picture

Assigned: Unassigned » Sutharsan
FileSize
42.55 KB

Attached is a tiny update of some classes + inclusion of a test-script + a naive Iterator example of a PoInfiniteProducer which never fails to produce yet another translation structure.

I'm still puzzled by the use-case of Stream versus Iterator. And waiting for fuhby's lexer :)

To test run

drush @drupal.d8 php-script core/lib/Drupal/Core/Gettext/testGettext.php

Next check the public file system for new files.

Sutharsan’s picture

Assigned: Sutharsan » Unassigned

I'll jump back on it when I can contribute.

clemens.tolboom’s picture

Just some notes:

As we are now using symfony I checked their code. A simple parser: http://trac.symfony-project.org/ticket/9582 lead to inclusion into https://github.com/symfony/Translation/tree/master/Loader (unfortunately master).

Symfony PoDumper https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Tra... has some nifty escaping

Also the plural processing is nice: https://github.com/symfony/Translation/blob/master/PluralizationRules.php

Should we spent time on these?

Gábor Hojtsy’s picture

@Clemens: all right, seems like we did not look hard enough in Symfony. Looking at their .po parsed code, it is definitely pretty simplistic, and does not cover many of our needs, such as being able to read a .po file in multiple HTTP requests. That does not mean we should do our stuff all from scratch. Question is if we can merge efforts somehow (and get them put their code on a release branch instead of it sitting on master only :).

clemens.tolboom’s picture

It looks like the Symfony implementation is not stream aware thus excluding it's use for us right now.

But the other bits (escaping + plural) could help us I guess.

I'm waiting for fuhby implementation of his stream PO parser to review (or match my thought against :p)

fubhy’s picture

The entire symfony translator library looks really interesting and I think we could do many really awesome things with that. It basically covers most of what I talked to you, Gabor, so enthusiastically about during our sprint. Please take a couple of minutes to look at this https://github.com/symfony/Translation in general .... The Parser, Dumper, etc. in this library are indeed very simplistic and we are going to need more (and I already have more) but the other stuff there looks really sweet at a first glance.

I've got a lot of work to catch up with due to DCon and won't be able to finish this before the weekend. I am looking forward to digging even deeper into this... Sorry for the delay.

clemens.tolboom’s picture

Assigned: Sutharsan » Unassigned

I'm more confused after reading #569004: Add support for seek based batch import of .po files and project http://drupal.org/project/gettextapi referred in that and this issue.

From @Sutharsan diagram and @fubhy intention to write a parser to _read_ .po files the title (and history) of this issue suggest something different: _write_ .po files.

Considering the Symfony way (Having a Dumper, Writer, Parser all in one Class tree) I still expect something similar in this patch.

So am I missing something important here?

(I'm still grasping StreamWrapper which are renamed through #1475020: Convert stream_wrappers.inc to PSR-0)

clemens.tolboom’s picture

Concerning Symfony translation component:

Translation
- we cannot get access to the read entries so reading and then stuff items into the database is a nogo. The translation is done by echo $translator->trans('Hello World!') . "\n"; .

PoFileLoader is missing features we probably need:
- it cannot parse our files ... ours are missing a white-space before each entry (http://www.gnu.org/software/gettext/manual/gettext.html#PO-Files). I checked ie drupal-7.11.ca.po and commerce_kickstart-7.x-1.3.da.po

- as documented by PoFileLoader::parse()

     * - No support for comments spanning multiple lines.
     * - Translator and extracted comments are treated as being the same type.
     * - Message IDs are allowed to have other encodings as just US-ASCII.

The following example is translation in action with a whitespace fixed po-file. And a prepared vendor tree with two added components:

ls -l core/vendor/Symfony/Component/
...
Config -> ../../../../../symfony/src/Symfony/Component/Config
Translation -> ../../../../../../symfony/Translation
drush php-script ../../code/translation-test.php 
Home : Inici
Enable !title : Habilita !title
Fields type(s) in use : Tipus de camp(s) en ús

use Symfony\Component\Translation\Translator;
use Symfony\Component\Translation\MessageSelector;
use Symfony\Component\Translation\Loader\ArrayLoader;

$filename = "drupal-7.11.ca.po";
$fullpath = dirname(__FILE__) . '/' . $filename;

$translator = new Translator('ca', new MessageSelector());
$translator->addLoader('po', new Symfony\Component\Translation\Loader\PoFileLoader());
$translator->addResource('po', $fullpath, 'ca');

$lookup = array(
  'Home',
  'Enable !title',
  'Fields type(s) in use',
);
foreach ($lookup as $english) {
  echo "$english : " . $translator->trans($english) . "\n";
}

My conclusion for now:

  1. I do not have enough knowledge about resource handling by Symfony thus cannot compare that with our Drupal needs. The resources are loaded when needed (see the Translator::trans code)
  2. All different file formats do look interesting enough if and when Drupal needs this.
  3. The PoFileLoader cannot load Drupal .po files
  4. The FileLoaders are not Batch aware as I guess there is no needs for this within Symfony context (?).
  5. We must write our own Batch aware readers and writers. (duh)
Gábor Hojtsy’s picture

@Cleemns: I'm not aware of a requirement for whitespace between different source/translation pairs in .po files. The best description of .po file format from http://www.gnu.org/software/gettext/manual/html_node/PO-Files.html says Entries begin with some optional white space. Usually, when generated through GNU gettext tools, there is exactly one blank line between entries.. So the GNU gettext tools add a blank line, but this is entirely optional.

clemens.tolboom’s picture

After puzzling a lot how to manage state (and streams work) I want to propose to build the following together with fuhby:

1. BatchStreamManager class

- This manage any Stream for read/write operation.
- Knows how to recover a stream for the next batch op.

2. POStreamHandler class

- Implements get/setHeader to read or write the PO Header
- read/write PO items
- Uses the stream managed by BatchStreamManager

3. POItem class

4. POHeader class

An small code example would look like the following.

//Batch setup:

// The file from either public:// private:// temporary:// or even http://
$uri = 'public://text-file.txt.po';

// $mode = read / write
$bsm = new BatchStreamManager()
$bsm->open($uri, $mode, $options);
// uses file_stream_wrapper_get_instance_by_uri($uri);

// The stream to work on.
$my_s = $bsm->getStream();
$po = new POStreamProcessor($my_s);

// read header
$po->getHeader();

// write mode
$po->setHeader($my_header);

// Read next item
$po_item = $po->read();
// read 10 items
$po_items = $po->read(10);

// Write next item(s)
$po->write($po_item);

// before ending batch iteration
$state = $bsm->getBatchState();
// Store state
batch_set_state($state);

next batch iteration

// Recove state
$state = batch_get_state();
$bsm = new BatchStreamManager()
$bsm->setBatchState($state);

$my_s = $b->getStream();
$po = new POStreamProcessor($my_s);
$po->read(20);

For the DB side of this issue we should have an abstraction for reading/writing POItems to the database as well.

For this to work POStreamProcessor must set the seek position to the end of the last read item and not just read chunks of bytes.

What we gain is:

  1. We decoupled Streams from managing Batched (the batch just uses the stream)
  2. We decoupled POProcessor from Streams (just uses the stream)
  3. A unified BatchStreamManager for whatever stream we can get.
  4. A PO reader/writer
  5. A clear definition of what a POItem / POHeader is.

@fubhy: What do you think of this decoupling? You need to provide for POItem and POHeader (sorry about that :p)

Gábor Hojtsy’s picture

Ok, that looks like way over the abstraction level I've ever thought of. Question is if we need/want that level of abstraction or it is going to kill this issue because of no agreement on the various abstractions :) if we can get this defined relatively soon and well, and then get it done, that would be great. I'd hate if we over-plan it and then die in the implementation :)

Sutharsan’s picture

I agree on the separation between stream interface and po lexer/generator. The stream object could be swapped for a database interface when translations come from the database, and the po generator can be swapped for a pot generator when we download a translation template. But I doubt the usefulness of a header and a po object.

The po header is almost empty. The plural formula is the only valuable part. The rest of the po header consist of static data and the header properties are not used in our context.
I have thought about a po object too. It can be usefull as a definition of translation data, but I can't think of any methods in this object. It is just transferred from the reader to the writer. Thats all. Further I have concerns about the performance impact of instantiating this object thousends of times.

andypost’s picture

The plural formula is the only valuable part.

Don't forget about number of plurals

clemens.tolboom’s picture

Rereading #42 - #45 and trying to build the BatchStreamManager I think we can do it with just a new Interface and follow the original thoughts of @Sutharsan having a POReader / POWriter as having a POProcessor would require different state management.

Thus a StreamStateInterface allowing for a POReader / POWriter to implement.

interface StreamStateInterface {
  public function setStreamState( array state);
  public function getStreamState();
}

Each POReader / POWriter implements StreamStateInterface and are capable to open a Stream by using

class POReader implements StreamStateInterface {
  public function open($uri, $mode, $options, &$opened_url) {
    $s = file_stream_wrapper_get_instance_by_uri($uri);
    if ($s) {
...
  }

  /**
   * Read a PO Item by reading severals lines from the file.
   *
   * @return a structured array | object?
   */
  public function readItem() {
  }
}

fubhy can now build this POReader and the Batch can set the state cleanly.

This seems way cleaner then #42 or #29

What do you think?

Sutharsan’s picture

@clemens, your proposal looks good. Indeed more straight forward that our earlier ideas. Remember that besides POReader and POWriter we need and a DBReader and DBWriter. Readers can have the same base class or interface and the Writers likewise.

fubhy’s picture

Sorry for disappearing from this issue for so long... Coming back from Denver I found an overwhelming amount of work on my desk and it took all of my focus to get that stuff over with as soon as possible. Anyways, here are my conclusions and opinions from the last couple of comments on this issue:

Even if this might be obvious: Let's first clarify what "Reading" / "Writing" / "Parsing" means.

Reading: Reading means (in my mind): "Find a msgid in the database / file (in general "source") to identify the respective translated strings". This is more like a "query" behavior (query for a specific condition "msgid" or even something else) to retrieve a subset of the available translated strings from whatever source we are querying. In case of normal t() invocations this would be a query for a "msgid" for example with just one return value.

In a more complex scenario (and if we had that data available in the database) we could also query for the context of a translated string or whatever else might be of interest. Consequently "Reading" is, by default, not a line-by-line reading ("batchable") process. It is really more like a single query (I believe that this is very important).

Hypothetically speaking "reading" from a .po file (querying) could, in a custom environment, replace the currently used db query for t() invocations and instead directly query for specific msgid's from a .po file (really just hypothetically speaking). So in order to make the Interface very solid and clean the "read" operation should just provide a "query for one, return one" functionality.

Parsing: This is basically the same as "reading" but doesn't have a conditional query component. It is a plain iterational procedure where we extract one "Gettext 'node' object" after the other from the data source until it is depleated (EOF is reached or all database record have been returned).

When parsing a .po file (exporting from a .po file) or exporting from the database we could have a special case operation where we don't query / read for a specific msgid but instead repeatedly ask for the next "Gettext 'node' object" in the source (file stream vs. db connection). This operation relies on an internal pointer in the data source (record id in the database vs. current file pointer / position in a file stream). This operation is batchable!

$nodes = array();

$stream = new DrupalPublicStreamWrapper();
$stream->setUri('some-po-file.po');

$pohandler= new GettextPoHandler($stream);

// Or maybe even set the data source in a separate method.
$pohandler->setDataSource($stream); 

// The current position (position of last node object that was parsed).
$position = 5 // E.g. the number of the node that we parsed in the last iteration.
$nodes += $pohandler->getNext($position);

// Or parse a specific number of nodes with a start and offset argument.
$offset = 10;
$nodes += $pohandler->getNodes($position, $offset); // Parses node 6-15.

// After we got a couple of nodes we can write them into the database.
$dbhandler = new GettextDbHandler(); // Defaults to the current db connection by default but also accepts a custom db connection as an argument in the constructor?!
$dbhandler ->writeNodes($nodes);

// Now that we inserted all these nodes into the database we can query for them in the db.
$result = $dbhandler->query('This is a msgid that I want to have translated', 'fr'); // 'fr' to override the currently active language in our environment to french (hypothetically) :).

So let's please make a distinction between "reading" and "parsing" :).

Looking at the bigger picture we can probably get to a point where importing / exporting easily works both ways (export from .po file => import into database // export from database => import into .po file).

If you ask me we dont even need two separate classes for reading / writing (the source & destination are the same (db <=> db / .po file <=> .po file) and both operate on that one source (file stream vs. database connection). So why make a disctinction? Just because the "direction" of the operations are different (get vs. set)? So.. Let's get to the "writing"

Writing: This is pretty obvious. Either write a "Gettext 'node' object" to a .po file or create a record for it in the database.

While "exporting" is a repeated "read" operation "importing" basically is a repeated "write" operation.

Practically speaking this is what we could do:

  1. Extract node objects from your environment (export from db) => Iterate over the generated "Gettext 'node' objects" and write (import) them into a (newly generated) .po file via our famous stream wrapper.
  2. Parse a whole .po file (export data from the .po file in a node-by-node manner) => Iterate over the generated "Gettext 'node' objects" and write (import) each one of them into the database.

On a side note: I think the title of this issue is indeed slightly outdated if we start to discuss complex proposals like the stuff we have been discussing in the last couple of replies. We should either rename the title of this issue or split this discussion into multiple issues (although the whole architecture [as I would like to see it] is so tightly integrated with each other that we cant really separate it if we really want to build it like this [yes, lets do it!]). :)

Sutharsan’s picture

Looks to me that we all have the same architecture in mind now. A source interface class for database or po file and a reader/parser/writer which works with the source to move data in and out the source interface.

I have no problem with combining the read and the write functionality into one class, theoretically it would be sane. It is that I ran into difficulties when building the class that I decided to split it. Also I saw no use case where we need both reading and writing in the same context. But if you manage to get the two into one without too much hassle, by all means go ahead.

But I have serious concerns of using the gettext classes for t(). t() should be light weight. Using a gettext interface class is theoretically correct and may therefore be preferable. But we risk performance as t() is called very often. If any I would give it the lowest priority in the development.

Time for code.

Gábor Hojtsy’s picture

Any progress on this? It has been a week since the last comment. :/ How can we make this going faster. It was not intended to be such a huge task, so we can scale it back if you are lost in the maze.

clemens.tolboom’s picture

Well for one this is not trivial :p

To run this from drupal root.

php core/lib/Drupal/Core/Gettext/testGettext.php

I did some more testing on drupal-7.11.ca.po file which is now partly parse by POReader in a crappy way.

Things to fix are:
- it does not stop reading.
- some plural lines are wrong parsed .. see TODO in POReader (set breakpoint in saveOneString on ksort line)
- the header is now an array ... I really like to have the full header structure
- POReader::readLine() has state related variables lying around. These must be defined in the class header.
- GettextFileInterface is opening a public stream ... this should not be its responsabiliy ... it should only manage a stream. We can test for is_local when assigning it a stream.

This to refactor/implement
- state management
- class structure

clemens.tolboom’s picture

Failures from the drupal-7.11.ca.po file.

clemens$ php core/lib/Drupal/Core/Gettext/testGettext.php | grep Array | grep source | sort -u
    [source] => Array
    [source] => Array%archive_file contains versions of modules or themes that are not compatible with Drupal !version: %names
    [source] => Array%type is used by @count pieces of content on your site. If you remove %type, you will not be able to edit the %type content and it may not display correctly.
    [source] => Array<strong>Warning:</strong> There are currently @count menu links in %title. They will be deleted (system-defined links will be reset).
    [source] => Array@count pending updates (@number_applied to be applied, @number_incompatible skipped)
    [source] => Array@count translation strings were skipped because they contain disallowed HTML.
    [source] => ArrayInstalling modules and themes requires access to your server via one of the following methods: <strong>@backends</strong>. See the <a href="@handbook_url">handbook</a> for other installation methods.
    [source] => ArraySorry, there have been more than @count failed login attempts for this account. It is temporarily blocked. Try again later or <a href="@url">request a new password</a>.
    [source] => ArrayThe @module module is missing, so the following modules will be disabled: @depends.
    [source] => ArrayTo uninstall @module, the following modules must be uninstalled first: @required_modules
    [source] => ArrayUpdating modules and themes requires access to your server via one of the following methods: <strong>@backends</strong>. See the <a href="@handbook_url">handbook</a> for other update methods.
    [source] => ArrayYou must include at least one positive keyword with @count characters or more.
clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
50.48 KB

I fixed the parser for multi-line plural source lines as that was the problem dumped in #52 and mentioned in #51. I'm puzzled whether the parsing code was that buggy?!? Should we report that elsewhere?

The command now produces a nice dump of public://test.po.txt which was a copy of drupal-7.11.ca.po ... I'd like to test for more complex po files.

php core/lib/Drupal/Core/Gettext/testGettext.php

seems to work with code like this

function testPoReader() {
  $uri = 'public://test.po.txt';
  echo "========================================\n";
  echo "Reading : $uri\n";
  echo "------------- PROCESSING ---------------\n";
  echo "File contents first 500 bytes\n";
  $contents = file_get_contents($uri);
  echo substr($contents, 0, 500) . "\n";
  echo "------------- PROCESSING ---------------\n";
  $file = new GettextFileInterface($uri);

  $reader = new Drupal\Core\Gettext\PoReader($file);

  // Reading the first items result in reading the header first
  // TODO : add method readHeader
  $translation = $reader->read();
  print_r($reader->header);
  print_r($translation);

  while ($translation = $reader->read()) {
    echo "...\n";
    print_r($translation);
  }

  echo "$line\n";
  echo "---------------- DONE ------------------\n";
}

so we can use this through a batch as the context+state variabled are now stored outside the PoReader::read() method. This needs some cleanup.

The code must now get converted to something mentioned in #42.

clemens.tolboom’s picture

For our db read/writer we need to read an item

(Locale module, line 520):

   $translation = db_query("SELECT s.lid, t.translation, s.version FROM {locales_source} s LEFT JOIN {locales_target} t ON s.lid = t.lid AND t.language = :language WHERE s.source = :source AND s.context = :context", array(
     ':language' => $langcode,
     ':source' => $string,
     ':context' => (string) $context,
   ))->fetchObject();

Or the full monty

_locale_export_get_strings($language = NULL, $options = array())

Writing a translation to DB (locales.pages.inc):

locale_translate_edit_form_submit()

Write the full monty

_locale_import_one_string_db(&$report, $langcode, $context, $source, $translation, $location, $overwrite_options, $customized = LOCALE_NOT_CUSTOMIZED)

Nice

Thanks @Sutharsan

Sutharsan’s picture

@clemens,

I'm puzzled whether the parsing code was that buggy?

You worked from my quick and dirty migration of _locale_import_read_po(). You should compare your findings with that code.

clemens.tolboom’s picture

Assigned: Unassigned » clemens.tolboom
FileSize
56.06 KB

@Sutharsan I worked from your version in POReader.

"that buggy" was to hard. Sorry for that. But I had to fix it's ending somehow. I'll check for the mentioned function later on.

Can you add some PO files to the issue summary with known(?) issues or with complex plural forms.

Attached patch is a little improved version. But it still has some issues. It introduces a POItem and POHeader class as I think that's the interface we should deal with. (if garbage collections is an issue we can refactor into a pool of unused objects)
But one benefit is we can now do

echo $myPOHeader . "\n";
// not yet
echo $myPOItem . "\n";

I don't like the names POHeader / POItem so better names are welcome.

I hope to get a Writer in place tomorrow and a new UML diagram. Next refactor into that model.

Just run php core/lib/Drupal/Core/Gettext/testGettext.php from drupal root.

Sutharsan’s picture

Sutharsan’s picture

Tested with Arabic and Russian, the strings '1 hour' and '1 day'. The plural string are presented in the right order and all translations in l.d.o are present in the parsed po data. For reference: '1 day' in Arabic at l.d.o and '1 day' in Russian at l.d.o

clemens.tolboom’s picture

Issue summary: View changes

Add parent

clemens.tolboom’s picture

Running the test script I now manage to import ar, ca, nl from the commandline :)

What it does is Remotely download and write these to public:// next read from public:// and write to Db

function pumpAround($langcode) {
  $uri = getRemoteUri($langcode);

  $reader = new PoFileReader($uri, $langcode);
  $header = $reader->getHeader();

  $uri = getPublicUri(__FUNCTION__, $langcode);
  $writer = new PoFileWriter($uri, $header);
  processN($writer, $reader, -1);

  $reader = new PoFileReader($uri, $langcode);
  $writer = new PODbWriter($langcode);
  processN($writer, $reader, -1);
}

I moved around a lot of code snippets from @Sutharsan effort ... not sure whether I missed ... guess I did :(

I created new classes POHeader and POItem and moved code into these clases.

This still needs a lot of work:

  • Interface or abstract class: I created an interface for POReader and POWriter but am in doubt.
  • When to open a POReader, POWriter
  • When to close a POReader, POWriter
  • DBReader state management fails somehow ... the prepared statement needs some feedback. @see DBReader::buildQuery()
  • The is Drupal code in POHeader
  • Documentation
  • Test : the tests from testGettext.php need to go into Unit and or Drupal test cases.
  • A new UML diagram as promised from #56 is still to come
clemens.tolboom’s picture

Status: Needs review » Needs work
+++ b/core/lib/Drupal/Core/Gettext/PODbReader.php
@@ -0,0 +1,169 @@
+    $query->orderBy('s.lid');
+    $query->condition('s.lid', ':lid', '>');
+
+    $this->_result = $query->execute(array('lid' => $this->_lid));

This seems not to work.

Calling getState() after some reading shows the lid increasing but it is not set for the next 'batch'

(I cannot find a test in testGettext.php .. darn.

Sutharsan’s picture

Do we need to read the DB in batches? Batch handling was introduced to cope with performance bottlenecks and unusual situations like installing Drupal. Does this apply to DB'es? Also the po export to file is no bottleneck, although I don't know about requirements for export at l.d.o. As far as I can see there is no need for batch handling the db read and file write.

Gábor Hojtsy’s picture

@Sutharsan: re #61, you are right.

clemens.tolboom’s picture

@Sutharsan I'm not sure what you mean:

- this issue is about making gettext abstract thus PODbReader should follow an interface. But this does not mean PODbWriter can not finish it's job in one go.

- In #39 I said I was confused about #569004: Add support for seek based batch import of .po files. That why I want you on the phone :)

- Having POFileReader batch aware means PODbWriter must be batch aware too. Ie whether or not writing the header or zapping all db entries. I guess we don't have these features now but shouldn't we?

- The other way around PODbReader + POFileWriter one could argue we don't need this batch but that would make the design asymmetric making PODbReader special which I don't like.

The -1 below is about writing ALL to whichever writer.

  processN($writer, $reader, -1);

and the result from a db_select->execute() below is an Iterator too.

class PODbReader implements BatchStateInterface, PoReaderInterface {
...
  private function buildQuery() {
...
    $query->condition('s.lid', $this->_lid, '>');
...
    $this->_result = $query->execute();
}

Would it better be adding readItems()?

interface PoReaderInterface {
...
  function readItems($count =  10);
}

Regarding Symfony Translation (not for this issue but the near future):
- Do we need it's rich file format support like .mo .csv .xliff to name a few?
- I intend to contribute our POFileParsing POFileLoader
- Should we adhere to their naming: Dumper / Loader
- As Symfony lacks a Batch (afaik) right now I (maybe) want to start a feature request over there.

Get @Sutharsan on the Phone :p

(to be continued)

clemens.tolboom’s picture

Attached patch fixed the get/setState of PODBReader mentioned in #59 and makes

pumpAround('nl); 

download from http:// to public:// into database next out into a new public://

I'll try to:

- get rid of gettext.inc
- have tests Unit + Web TestCases
- add UML diagram.
- ?

Gábor Hojtsy’s picture

Woot, woot, this is progressing great.

Gábor Hojtsy’s picture

Issue summary: View changes

Updated issue summary.

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
75.12 KB

I talked with @Sutharsan about the context of this issue and that helped much :)

Note to self: read #48 and update issue summary.

Attached patch is only for the testbot ... to see what more I missed :p Strategy I choose for now is to rewrite gettext.inc gradually as @Sutharsan sketched in #32.

This patch only has

_locale_import_po()

rewritten. It imported nicely the nl-po file. But the rewrite needs a lot of header related fixes.
It showed me we have also a POMemoryReader|Writer sort of.

clemens.tolboom’s picture

Status: Needs review » Needs work
andypost’s picture

I think we need to leave ability for batch import like #569004: Add support for seek based batch import of .po files

clemens.tolboom’s picture

@andypost ? ... this issue has (in theory) batch support.

$reader = new POFileReader();
$writer = new POBDWriter();
$writer->writeItems($reader, 10);

$states(
  'reader' => $reader->getState(),
  'writer' => $writer->getState(),
);
batch-save-state($states);

next batch

$states = batch-get-state();

$reader = new POFileReader();
$reader->setState($states['reader']);

$writer = new POBDWriter();
$writer->setState($states['writer']);

$writer->writeItems($reader, 10);

But I first want to make the testbot happy.

clemens.tolboom’s picture

Issue summary: View changes

Added all functions from gettext.inc

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
92.27 KB

Not all import errors are fixed.
- I'm puzzled with why importing a French .po should change the Plural-forms? Plural numbers changed. locale.test 1011
- How should this work : Language file automatically imported. locale.test 1087
- Other 2 errs are prob related to the above?

More gettext.inc functions are moved. See issue summary for details.

Next steps are:
PO Exports : Fix test errors to begin with.
POMemoryWriter (?+ POMemoryReader) : To obsolete function _locale_import_read_po()

Note: the PoDbWriter relates to #851362: Add hash column to {locales_source} to query faster locale strings

clemens.tolboom’s picture

Status: Needs review » Needs work

Still lots of work.

clemens.tolboom’s picture

That's weird. On my laptop I have when testing throught UI I get only failures related to 'Translation import'.

135 passes, 4 fails, 0 exceptions, and 45 debug messages.

From the testbot log

Path alias functionality 132 passes, 0 fails, and 0 exceptions
exception 'PDOException' with message 'SQLSTATE[42S02]: Base table or view not found: 1146 Table 'drupaltestbotmysql.locales_source' doesn't exist' in /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Statement.php:58
Stack trace:
#0 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Statement.php(58): PDOStatement->execute(Array)
#1 /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Connection.php(506): Drupal\Core\Database\Statement->execute(Array, Array)
#2 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/database.inc(215): Drupal\Core\Database\Connection->query('SELECT s.lid, t...', Array, Array)
#3 /var/lib/drupaltestbot/sites/default/files/checkout/core/modules/locale/locale.module(524): db_query('SELECT s.lid, t...', Array)
#4 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/bootstrap.inc(1380): locale('1 pass?@count p...', '', 'fr')
#5 /var/lib/drupaltestbot/sites/default/files/checkout/core/includes/common.inc(1779): t('1 pass?@count p...', Array, Array)
#6 /var/lib/drupaltestbot/sites/default/files/checkout/core/modules/simpletest/simpletest.module(107): format_plural(37, '1 pass', '@count passes')
#7 /var/lib/drupaltestbot/sites/default/files/checkout/core/scripts/run-tests.sh(373): _simpletest_format_summary_line(Array)
#8 /var/lib/drupaltestbot/sites/default/files/checkout/core/scripts/run-tests.sh(22): simpletest_script_run_one_test('1', 'PathMonolingual...')
#9 {main}FATAL PathTaxonomyTermTestCase: test runner returned a non-zero error code (1).
OpenID helper functions 45 passes, 0 fails, and 0 exceptions

Any idea?

clemens.tolboom’s picture

Status: Needs work » Needs review
Issue tags: -D8MI, -sprint, -language-ui

#70: 1189184-gettext-parsing-70.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +D8MI, +sprint, +language-ui

The last submitted patch, 1189184-gettext-parsing-70.patch, failed testing.

Gábor Hojtsy’s picture

Clemens: if an overwrite mode was selected, new plural forms should overwrite old plural forms.

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
109.75 KB

I added #1497230: Use Dependency Injection to handle object definitions to the patch from #70 as the test definitely will not work.

I refactored a lot by adding
- Interfaces: PoInterface, PoStreamInterface, PoReaderInterface, PoWriterInterface
- Class : POMemoryWriter
- Cleaned gettext.inc more

I changed locale.bulk.inc to use _locale_import_po straightaway ... I hope understood the batch intention enough.

Changed install.inc to use _locale_files_to_memory() as an initial test. But where is that tests?

This still needs lots of work.

I intend to do much more but it takes time :(

Let's see what the testbot thinks now.

clemens.tolboom’s picture

Status: Needs review » Needs work
clemens.tolboom’s picture

Hmmm ... I ran the tests locally on Mac OSX
- 17005 passes, 18 fails, 2 exceptions, and 3681 debug messages

Page cache test
Enable the page cache and test it with various HTTP requests.
73 passes, 6 fails, 2 exceptions, and 19 debug messages

drupal_render()
Performs functional tests on drupal_render().
50 passes, 2 fails, 0 exceptions, and 14 debug messages

Image dimensions
Tests that images have correct dimensions when styled.
47 passes, 9 fails, 0 exceptions, and 8 debug messages

Language dependency injection
Compares the default language from $GLOBALS against the dependency injected language object.
17 passes, 1 fail, and 0 exceptions

Testbot reports (from #76 )
- 35,271 pass(es), 64 fail(s), and 3 exception(s)

ExpandPlural handling (LocalePluralFormatTest) [Locale] 73 39 0
ExpandTranslation export (LocaleExportFunctionalTest) [Locale] 43 2 0
ExpandTranslation import (LocaleImportFunctionalTest) [Locale] 106 23 3

This is too weird for me ... I zap my drupal install then start all over with the patch from #76 .

clemens.tolboom’s picture

Can someone please test the patch locally? I suspect something messing with this new Classes.

This patch fixes the installer ... just did an Afrikaans installation with the PoMemoryWriter :-)

The POFileReader had some bugs too with plural which are hopefully fixed now too.

What bothers me still are the exceptions from the testbot.

clemens.tolboom’s picture

Status: Needs work » Needs review

(sorry)

clemens.tolboom’s picture

Status: Needs review » Needs work
clemens.tolboom’s picture

Status: Needs work » Needs review

[edit]Test bot missed #79
I missed browser refresh :/ [/edit]

clemens.tolboom’s picture

Running locally (output shortened)

sudo -u _www php ./core/scripts/run-tests.sh --url http://drupal.d8 --php /usr/bin/php Locale

Drupal test run
---------------

Tests to be run:
 - Browser language detection (LocaleBrowserDetectionTest)
...
 - Translation export (LocaleExportFunctionalTest)
 - Translation import (LocaleImportFunctionalTest)
...
 - Plural handling (LocalePluralFormatTest)
...

Test run started:
 Wednesday, April 25, 2012 - 11:23

Test summary
------------
...
Translation import 139 passes, 2 fails, 1 exception, and 46 debug messages
...

As suggested by @Gábor Hojtsy on irc i should solve the local problems first as the testbot could be picky about NOTICES and others.

clemens.tolboom’s picture

Wow ... some subtle changes were needed.

$writer->setLangcode($langcode);
-  $writer->setHeader($header);
   $options = array(
     'overwrite_options' => $overwrite_options,
     'customized' => $customized,
   );
+  // It's vital options are set first.
   $writer->setOptions($options);
+  // TODO: is this a design problem?
+  $writer->setHeader($header);

The order of setters is imported for parsing the header. On could say the design of PODbWriter is bad. But this is necessary for the batch interface. We cannot have constructors with arguments as the next batch does not know how to recreate the object right?

Next the Plural-Forms was empty when writing a POHeader. Fixed that.

function locale_translate_batch_import($filepath, &$context) {
@@ -298,6 +301,8 @@ function locale_translate_batch_import($filepath, &$context) {
   // we can extract the language code to use for the import from the end.
   if (preg_match('!(/|\.)([^\./]+)\.po$!', $filepath, $langcode)) {
     $file = (object) array('filename' => drupal_basename($filepath), 'uri' => $filepath);
+    // We need only the last match
+    $langcode = array_pop($langcode);
     _locale_import_po($file, $langcode, array(), LOCALE_NOT_CUSTOMIZED, TRUE);
     $context['results'][] = $filepath;
   }

This was tricky ... the $langcode is an array ... not sure why other tests did not fail.

Lastly I reordered the 'Plural number changed' to match the po upload better. There was a unrelated drupalPost in between.

Status: Needs review » Needs work

The last submitted patch, 1189184-gettext-parsing-84.patch, failed testing.

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
46.15 KB

Attached current UML.

UML-gettext.png

Note the three main Interfaces: BatchState, POReader|Writer, POStream

What is missing is a POMemoryReader. I haven't found a usecase for it. POMemoryWriter has a getData method which breaks symmetry.

I haven't got time to implement the batch due to the fails _ALL_ of the testbot. I suspect the testsuite not to find the new Gettext components somehow.

clemens.tolboom’s picture

From irc ([edit:with some additions])

jthorson: ClemensTolboom:  Something to try ... run the tests locally using run-tests.sh with --concurrency set to something other than 1 ... this could help better mimic the testbot operation.
...
jthorson: ClemensTolboom:  I think http://drupal.org/node/346844#comment-1258161 is related to what I was looking for.
...
ClemensTolboom : jthorson: I guess so for that particular exception … but does that explain the import/export failures too?
jthorson : notes that the recurring locale-related issues are one item that he consistently hasn't been able to get a true grasp of the underlying problem.
...
jthorson : ClemensTolboom:  Will see if randy has any thoughts ... otherwise, I think manually testing on a testbot and then inspecting the state of the 'site-under-test' may be our next step.  That approach also allows us to manually start adding debug code to the particular test and re-triggering  the testbot run to see where it's getting hung up.
...
ClemensTolboom : jthorson: I don't want to waste your time if I can do better
...
jthorson : 	ClemensTolboom:  Another random thought ... have you run the tests locally on a site with locale not enabled to start? 19:12

ClemensTolboom : jthorson: I have a script in the patch that does a lot of test and that failed when locale.module was not enabled but apart from that no? 19:14

jthorson : ClemensTolboom:  Just thinking that could be one more difference between your local run-tests.sh execution and the testbot's. 19:15
ClemensTolboom : jthorson: ic … ok I'll start run-tests later on with that one disabled 19:15
jthorson : If that triggers the fails, let me know.  

So need to

clemens.tolboom’s picture

jthorson’s picture

Unable to troubleshoot tonight, as my manual bot is running into APC errors ... if you could open an issue in the https://drupal.org/project/issues/testbot queue, that will help remind me to come back and re-visit this once I've got it behaving again.

clemens.tolboom’s picture

I created #1548366: Locale related Exception in Review log of test. as requested by @jthorson

http://drupal.org/project/testbot states testbot runs with clean url turned off so when testing locally one should test without clean url too.

In #1548366: Locale related Exception in Review log of test. I added a comment #1539072: Support for disabled languages broken, drop it has the same exception.

I have (now) no time to run the full test-suite as that takes ages (it ran for 4+ hours :/ )... will do after May 5th.

Attached patch has some documentation added.

clemens.tolboom’s picture

Why I think the classloader could be causing this err is when developing and debugging with Netbeans the classloader does not load the file.

...
namespace Symfony\Component\ClassLoader;
...
class UniversalClassLoader
{
...
    public function loadClass($class)
    {
        if ($file = $this->findFile($class)) {
            require $file;
        }
    }
...
}

which assigned nada (NULL, FALSE?) to my $reader = new PoFileReader(); when not properly declared. I have to retry that after May 5th.

Status: Needs review » Needs work

The last submitted patch, 1189184-gettext-parsing-90.patch, failed testing.

Anonymous’s picture

Issue summary: View changes

Updated the gettext.inc function moves.

sun’s picture

Please make sure that all your namespaces and classes are using the same and sane letter-casing.

Some classes/namespaces in this patch are using "PO" while some others are using "Po".

PSR-0 is case-sensitive. Only use "Po" everywhere.

jthorson’s picture

Yup ... the PDO Exception is a red herring, introduced by http://drupal.org/node/1497230 (and present in all D8 tests ever since, despite them passing).

Sutharsan’s picture

Status: Needs work » Needs review
FileSize
111.65 KB

Patch modified to use "Po" everywhere as suggested by sun.

sun’s picture

Status: Needs review » Needs work

Thanks - #95 passed the testbot now.

The patch still contains plenty of notes and @todos, is missing comments, phpDoc, etc and thus, overall doesn't seem to be ready for reviews yet.

However, some early feedback:

+++ b/core/includes/gettext.inc
@@ -6,7 +6,28 @@
+use Drupal\Core\Gettext\PoDbReader;
+use Drupal\Core\Gettext\PoDbWriter;
+use Drupal\Core\Gettext\PoFileReader;
+use Drupal\Core\Gettext\PoFileWriter;
+use Drupal\Core\Gettext\PoMemoryWriter;

1) That looks like a crazy level of abstraction... I'd rather expect:

Gettext\DatabaseController
Gettext\FileController
Gettext\MemoryController

but if that level of abstraction is required, I'll let you guys roll...

2) Even when not using controllers but individual reader/writer classes, the class names could be simplified. They also should not contain abbreviations ("Db" vs. "Database").

clemens.tolboom’s picture

Just a short reply @sun about class names.

I'm looking @ http://symfony.com/doc/current/book/translation.html and esp. https://github.com/symfony/symfony/tree/master/src/Symfony/Component/Tra... their implementation where there are Loaders and Dumpers.

I like the design of Symfony but still don't get their whole picture regarding translation.

@sun: I can install drupal in ie af / ar etc through the PoMemoryWriter but I'm not sure whether we test this. Do you have some hints?

@others : Should we target using Symfony translation framework? What impact could that have on our way of handling translations? See links mentioned above.

sun’s picture

Oh, wow. I wasn't aware of the Translation component in Symfony. I quickly glanced through it and couldn't find anything that would look problematic.

Seems to revolve around reading and writing gettext strings in various formats (storage is separate), and also translating strings at runtime (with plural support, though hard-coded plural rules). I don't think we have any intention to change our runtime translation t() for D8, but I wonder whether we could only leverage the reader and writer parts of it?

If we want to explore that, we might want to create a parallel issue to this (so this one isn't hi-jacked in case it doesn't work out).

Gábor Hojtsy’s picture

@sun: there is some discussion about the Symfony components for .po reading above. The main issue as I've gathered it was that they did not support chunked reading (so you could import a huge file without your memory being blown away).

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
111.66 KB

Why is patch from #95 pass tests now. Is my local env doing different?

I couldn't apply patch from #95 due to one line in gettext.inc concerning menu_router_rebuild() (committed May 3) ... renaming files in a patch made some more work too for comparing changes. How should one take care of that when working on issue branch?

(so this patch is just to make sure testbot does not fail again)

clemens.tolboom’s picture

Status: Needs review » Needs work
clemens.tolboom’s picture

I just did an symfony patch to see how our parser behaves in their environment. See https://github.com/symfony/symfony/pull/4235

Some notes:

Regarding PoItem + PoHeader these classes are nicer then just arrays but they are tied to the Gettext interface / file format and thus cluttering the PoDbWriter. I think we should talk about 'LC_MESSAGES' instead.

Symfony uses a message structure between all their loaders. I was bothered by the fact all our Classes needed to know what a POHeader/POItem is which felt bad.

We should define a datastructure and document that. PoMemoryWriter somehow uses one.

// Singular
$messages[$context][$source] = $translation
// plural
$messages[implode(LOCALE_PLURAL_DELIMITER, source)] = implode(LOCALE_PLURAL_DELIMITER, $translation);

Symfony uses

// For singural
$messages[$source] = $translation

// For plural two entries are added: the singular form and the plural containing an index + | concat
$messages["foo"]= "bar";
$messages["foos"]= "{0} bar|{1} bars"

I'll create a new issue regarding symfony the moment that pull request is pushed to their master. And thus try to stop talking about symfony within this issue :p

clemens.tolboom’s picture

From our office hours @irc #drupal-i18n http://groups.drupal.org/node/229008 :

We could try to write a StreamWrapper mimicking multiple files served one by one on each batch iteration to the symfony Translation\PoFileLoader component. That sounds like a nice solution.

But for this to work we need to adhere to symfony's message structure (see #102) ... I don't like the '|' separator used by symfony so we need to convince them to use our LOCALE_PLURAL_DELIMITER.

Symfony's plural handling http://symfony.com/doc/current/book/translation.html

$this->get('translator')->transChoice($source, $count, $args)

construct is lacking our http://api.drupal.org/api/drupal/includes!bootstrap.inc/function/format_... options.

I need to chew on this a little.

sun’s picture

Do the trans() and transChoice() methods matter at all?

We're not going to replace our actual t() string translation with Symfony (yet). The idea is to merely leverage the Translation component for gettext import/export/parsing operations.

EDIT: That said, I'd still recommend to create a new issue for the Symfony based approach. 'cos if that doesn't work out, this issue won't be totally hi-jacked.

clemens.tolboom’s picture

Assigned: clemens.tolboom » Unassigned
Status: Needs work » Postponed

@sun Agreed

I start working on #1570346: Let Symfony Translation component handle language messages from now to see how that could work for Drupal.

This issue is now postponed in favor off #1570346: Let Symfony Translation component handle language messages.

Gábor Hojtsy’s picture

Status: Postponed » Closed (duplicate)
Issue tags: -sprint

This is in practice duplicate of #1570346: Let Symfony Translation component handle language messages. Should be reopened if that one is not feasible. Moving off sprint.

clemens.tolboom’s picture

Status: Closed (duplicate) » Needs work
Issue tags: +sprint

As #1570346: Let Symfony Translation component handle language messages is now closed as it was to farfetched to implement within Symfony framework yet.

As I understood from @Sutharsan by a phone call he is sort of blocked by this issue as there is no API to code against. So the issue title does not match that goal either.

(I'll try to re-roll if needed the patch and get familiar with this issue before the sprint.)

Gábor Hojtsy’s picture

I don't understand what would Sutharsan be blocked about in regards to this issue, that would be good to know.

I think to successfully move forward with this patch, we should get this down to a level of abstraction that is not over the top (see Sun's comments) Can we simplify this based on what you learned in Symfony. What we have in D7 is no API, what this patch proposes seems like a very convoluted set of classes and interfaces. Is there a good middle ground?

clemens.tolboom’s picture

This patch needs a controller which stored translations in memory (with a batch interface?!?)

We should not name it Gettext or something but rather MessageCatalogController using a MessageCatalog (aka Symfony). I'm not sure about this but let's make the controller our new API.

The controller can then load/dump the messages into:
- load from file (*.po)
- dump to memory for the installer
- load from db
- dump to cache (weird thing)
- dump to file (*.po)

Needs feedback!

Gábor Hojtsy’s picture

@Clemens: in earlier Drupal major releases, we found that loading in a whole .po file to memory is huge and might not fit under limits. Now that sounds like a good possibility again, especially if you want to represent components with the above documented wrapper classes (PoHeader, PoItem). So I'm not sure everything should evolve around an all-in-memory storage system.

clemens.tolboom’s picture

@Gábor Hojtsy

The memory constraint puzzles me:
- Is ie install.nl.po supposed to be small? That would mean we get the Symfony domain 'problem'.
- As this is not about gettext msgctxt but a separate PO file (Symfony calls it domain) for the installer?
- Is http://localize.drupal.org taking care of that in some way?
- How should the message extract work?

You have a good point about PoItem if object memory is much bigger then an array.

Gábor Hojtsy’s picture

@Clemens: your comment made it sound like you want to base *everything* on a memory storage based solution, which is what I commented on. If that was not the case, maybe I misunderstood your proposal.

clemens.tolboom’s picture

Status: Needs work » Needs review
Issue tags: -D8MI, -sprint, -language-ui

#100: 1189184-gettext-parsing-100.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +D8MI, +sprint, +language-ui

The last submitted patch, 1189184-gettext-parsing-100.patch, failed testing.

clemens.tolboom’s picture

@Gábor Hojtsy could you please give some feedback on the installer questions from #111 ... guess you missed it as we commented within the minute :)

Note to self
... cannot merge with head
... not sure what git tries to tell here.
See #100

<<<<<<< HEAD
  // Rebuild the menu, strings may have changed.
  menu_router_rebuild();
=======
  if (!$is_batch) {
    // Clear cache and force refresh of JavaScript translations.
    _locale_invalidate_js($langcode);
    cache()->deletePrefix('locale:');
>>>>>>> Fixed POMemoryReader and POFileReader parsing problems.

    // Rebuild the menu, strings may have changed.
    menu_rebuild();
clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
111.19 KB

Just a rebase patch to feed the bot

clemens.tolboom’s picture

Status: Needs review » Needs work
clemens.tolboom’s picture

Assigned: Unassigned » clemens.tolboom

I've reread the whole issue and noted a lot of issues for this issue.
- the issue summary sucks
- a lot of comment from others are not addressed
- some question I asked are not answered.
So these thinks needs to be taken care of before moving this forward.

This issue was postponed May 10 #105 so I need to learn it again :(

So my battleplan for today is:
- read the code and hopefully fix the tests (morning).
- summarize all open comments and questions (afternoon).

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
111.32 KB

There was an old menu_rebuild() ... so testbot be nice :)

Status: Needs review » Needs work

The last submitted patch, 1189184-gettext-parsing-119.patch, failed testing.

Gábor Hojtsy’s picture

This is being worked on at http://drupalcode.org/sandbox/goba/1624820.git/shortlog/refs/heads/gette... in a sandbox branch for anybody who wants to follow the progress close.

Gábor Hojtsy’s picture

Issue summary: View changes

Updated issue summary.

penyaskito’s picture

Changed summary for pointing out that downloading the translations is required on installing.

Download to sites/default/files/translations the Dutch (nl), Catalan (ca) and Arabic (ar) translations from l.d.o and be sure that they are named install.{lang}.po.

clemens.tolboom’s picture

Status: Needs work » Needs review
FileSize
111.06 KB

The patch removed nplural knowledge from POItem which took me too much time to discover.

It used variable_get to iterate over its translation which is so against Separation of concern.

We must make sure somewhere higher up the chain POItem contains the correct data.

Thanks @penyaskito for your help :)

clemens.tolboom’s picture

Issue summary: View changes

Download install po files before installing the site on testing section

clemens.tolboom’s picture

I've updated the issue summary to a v0.2 (that means more to come)

penyaskito’s picture

Changed POHeader and POItem to PoHeader and PoItem for consistency. Let's see if it's already green.

Status: Needs review » Needs work

The last submitted patch, 1189184-gettext-parsing-124.patch, failed testing.

penyaskito’s picture

Status: Needs work » Needs review
FileSize
111.47 KB

Don't panic, I think that I used wrong revisions when creating the patch.

Status: Needs review » Needs work

The last submitted patch, 1189184-gettext-parsing-125.patch, failed testing.

penyaskito’s picture

This is the last attempt, if it fails I get back to @clemens.tolboom tomorrow.

penyaskito’s picture

Status: Needs work » Needs review

Feed the testbot.

penyaskito’s picture

Removed the references to PoWriter, that was renamed to PoWriterInterface.

penyaskito’s picture

Per the quotes at

The component must be capable of reading from/writing to *.po files, reading/writing to database, 'writing' to memory.

, looking that Symfony uses Loader and Dumper (#1189184-35: OOP & PSR-0-ify gettext .po file parsing and generation by @clemens.tolboom) and the terminology that @fubhy explains at #1189184-48: OOP & PSR-0-ify gettext .po file parsing and generation, maybe we should consider replacing the suffixes Writer and Reader to Loader and Dumper. Eventually, we could split Loader in Loader and Reader for sequential loads if needed.

penyaskito’s picture

Feeding the testbot. Added Gettext class used by the installer by Clemens and some refactoring done by Clemens and me.

clemens.tolboom’s picture

(notes)
When we start working on the Batch interface we could try to do it for testing manually
- install_import_translations which uses "locale_translate_batch_import"
This code below is probably obsolete? (related to: Why are files imported twice?)

  $batch = locale_translate_batch_import_files($langcode);
  if (!empty($batch)) {
    return $batch;
  }

as install_import_translations_remaining does something similar.

function install_import_translations_remaining(&$install_state) {
  include_once drupal_get_path('module', 'locale') . '/locale.bulk.inc';
  return locale_translate_batch_import_files($install_state['parameters']['langcode']);
}

But for debugging we can better use the UI interface for importing.

clemens.tolboom’s picture

Issue summary: View changes

(v0.2) A rewrite with problems mapped to solutions.

clemens.tolboom’s picture

Issue summary: View changes

Deleted code snippets which cluttered the text.

clemens.tolboom’s picture

Issue summary: View changes

Some layout improvements

clemens.tolboom’s picture

I updated the issue summary a little:
- removed irrelevant code snippets
- changed some styling
- added BatchStateInterface interface (does it help?!?)

clemens.tolboom’s picture

Issue summary: View changes

Rephrased BatchStateInterface a little (not sure whether that helps)

clemens.tolboom’s picture

I'm not sure the Batch interpretation currently done is what we need:
- Say we have 10 *.po files to load by a module x
- Now each .po file should be loaded into smaller chunks say 5
The module does not care the files are loaded in batches and we don't care about multiple files.

That doesn't sound to good. So any help is welcome. What should we do?

Gábor Hojtsy’s picture

@clemens.tolboom: I think the original idea and AFAIR in the contrib gettextapi module that was done to prototype this in D7 you get one batch item for each file. Then the batch items themselves can be progressive, so you can return from a batch item and ask for it to be rerun, at which point you get back to the same batch task with an updated batch sandbox. You can use the sandbox to store the batch state per file. See the gettextapi project in contrib.

attiks’s picture

Summary of my talk with Clemens in random order:

  1. POFileReader checks for BOM, but doesn't check if the file is UTF8, does it have to do it?
  2. Try to get rid of gettext.inc
  3. Don't export if there's nothing to export
  4. Show message if import succeeded
attiks’s picture

clemens.tolboom’s picture

@atttiks thanks for the summary.

1. The BOM / Byte Order Marker is copy/paste code. (for me for now: won't fix :/)

2. I hope today with @attiks and @penyaskito to remove all export function. And hopefully the _locale_import_po too. Then it _is_ empty :)

To make that work we need a way to get 'rid' of the drupal_set_message stuff. Yesterday I tried to create a Gettext::fileToDatabase() to discover the caller needs both the PoFileReader (for failures on reading) as PoDatabaseWriter (for failure and statistics). So we should discuss this a little with the three of us.

3. Did we talk about that? Is this solved now?

4. See the statistics from 2

Your last patch is empty. Have you asked @Gábor Hojtsy for git access?

attiks’s picture

1/ see #1630568: Validate that uploaded .po files are UTF8
2/ todo
3/ fixed
4/ is a temp fix, will have a look

attiks’s picture

trying to import the same po file twice:
Drupal\Core\Database\DatabaseExceptionWrapper: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'temporary://modules-block.nl_.po' for key 'uri': INSERT INTO {file_managed} (fid, uid, filename, uri, langcode, filemime, filesize, status, timestamp) VALUES (default, :db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7); Array ( [:db_insert_placeholder_0] => 1 [:db_insert_placeholder_1] => modules-block.nl_.po [:db_insert_placeholder_2] => temporary://modules-block.nl_.po [:db_insert_placeholder_3] => und [:db_insert_placeholder_4] => application/octet-stream [:db_insert_placeholder_5] => 15250 [:db_insert_placeholder_6] => 0 [:db_insert_placeholder_7] => 1339578176 ) in drupal_write_record() (line 483 of /var/aegir/platforms/i18n/core/includes/schema.inc).

attiks’s picture

ignore #142, file permission problem

clemens.tolboom’s picture

Remove hasResults from PoDbReader as 'emptiness' is a business rule like

$r = new PoDbReader();
....

$item = $r->getItem();
// We don't want to write an empty reader
if (!empty($item)) {
  $w = new PoFileWriter();
  $w->setX();

  $w->writeItem($item);
  $w->writeItems($r);
}

[edit] Think of PoFile|Database|Memory as Iterators which 'just' dry out.

clemens.tolboom’s picture

My idea for the batch is adding the following to Gettext.php


  static function getStates($reader, $writer) {
    return array(
      'reader' => $reader->getState(),
      'writer' => $writer->getState(),
    );
  }
  
  static function createFromStates($states) {
    $reader = new $state['reader']['__CLASS__'];
    $reader->setState($state['reader']['__CLASS__']);
    $writer = new $state['writer']['__CLASS__'];
    $writer->setState($state['writer']['__CLASS__']);
    
    return array($reader, $writer);
  }
attiks’s picture

Clemens: the current batch is working, why do you want to change it?

clemens.tolboom’s picture

Things to check for are:

What _is_ the new API

I somehow expect to have a single point in the code with a factory method to produce a reader or writer.

Issues named in this issue

- #1 #147915: Refactor locale import/export API to support non-local files and non-gettext formats (non-gettext formats) (we can load from remote)
- #2 #361597: CRUD API for locale source and locale target strings (crud would be cool especially when save multiple at once aka $writer->writeItems())
- #3 #420510: Possibly misleading documentation/func naming for developers on locale.inc functions (we have still a documentation problem over here)
- #6 #1219236: Locale module includes 1350+ lines of unneeded code on all page loads
- #21 #532512: Plural string storage is broken, editing UI is missing (I haven't seen that interface)

Possible design and bugs

- #18 Do we have a test for multilines
- #55 Our parser is from a refactoring try. We should check with the current code version.
- #69 Batch support is underway.
- Code is now located in code/lib/Drupal/Core but it should be in core/lib/Drupal/Component I guess

That's it for now.

clemens.tolboom’s picture

Issue summary: View changes

Fixed into {language}

penyaskito’s picture

Let's see some green here in the meantime.

penyaskito’s picture

One thing that bothers me is that for exporting a PO/POT file we are now creating the physical file on the server and then reading and flushing it to the client.

Should we implement a PoStringWriter for this instead?

clemens.tolboom’s picture

@penyaskito #149 that's performance related a good idea.

But why don't we do a redirect to the file written that is use a public:// file? Then it's up to apache to serve it.

penyaskito’s picture

Ok, seems that having all the string in memory is not a good idea. We can continue the clean up and consider that later if needed.

penyaskito’s picture

Saving the file in public:// doesn't look good to me, it could be using disk space that can be restricted in some environments, and the file will always be regenerated. But the way, you reminded me that we should delete it once it is downloaded, I will commit this to the sandbox.

attiks’s picture

tests added for multiline and allowed/blocked tags

attiks’s picture

FileSize
142.62 KB
penyaskito’s picture

I started the refactoring _locale_import_one_string_db. We need to get rid of the dependency from Gettext to locale module.

Gábor Hojtsy’s picture

I would guess at first that the locale module should implement the DB dumper / reader at the first place? If there is no DB definition to work with, how else could the dumper/reader work?

Status: Needs review » Needs work

The last submitted patch, i1189184.patch, failed testing.

penyaskito’s picture

Note to self:

* Answer Gábor.
* Check with attiks why the 2 new tests are failing. I fixed them with

diff --git a/core/modules/locale/lib/Drupal/locale/Tests/LocaleImportFunctionalTest.php b/core/modules/locale/lib/Drupal/locale/Tests/LocaleImportFunctionalTest.php
index e321be3..8415572 100644
--- a/core/modules/locale/lib/Drupal/locale/Tests/LocaleImportFunctionalTest.php
+++ b/core/modules/locale/lib/Drupal/locale/Tests/LocaleImportFunctionalTest.php
@@ -212,7 +212,7 @@ class LocaleImportFunctionalTest extends WebTestBase {
     $search = array(
       'string' => 'HTTP Result Code: !status',
       'language' => $langcode,
-      'translation' => 'translated',
+      'translation' => 'all',
     );
     $this->drupalPost('admin/config/regional/translate/translate', $search, t('Filter'));
     $this->assertNoText(t('No strings available.'), t('String successfully imported.'));
@@ -221,7 +221,7 @@ class LocaleImportFunctionalTest extends WebTestBase {
     $search = array(
       'string' => 'xyzzy',
       'language' => $langcode,
-      'translation' => 'translated',
+      'translation' => 'all',
     );
     $this->drupalPost('admin/config/regional/translate/translate', $search, t('Filter'));
     $this->assertNoText(t('No strings available.'), t('String successfully imported.'));

but not sure if the strings contained at files imported automatically should be imported marked as 'translated'. If the answer is not, the problem is at the test.

Gábor Hojtsy’s picture

Yeah, if you import translations, they should appear with the translations :) Unless you import empty translations, which removes them, just like deleting them on the UI and submitting the form. I think the purpose of this whole issue generally is to replace what we have in an underlying layer **with the same functionality**, so the tests should equally pass as before, no?

penyaskito’s picture

Some pending tasks:

#156: We have to move PoDbReader and PoDbWriter to the Locale module.

#159: We should take a look at the importing functionality when adding a new language, I think that we are not flagging the strings as 'translation'.

#159: If I'm right with my previous affirmation, I'm scared that this issue didn't appeared before. Maybe we need more tests?

#*: We should git mv core/lib/Drupal/Core/Gettext to core/lib/Drupal/Component/Gettext

attiks’s picture

#158 is right
#18 test for multilines added
#55 ok
#69 Batch support is working, batch inside batch is handled in a follow-up
#6 ok
#1 is out of scope for this issue

so list in #160 wraps it up (for now ;p)

attiks’s picture

slightly off topic: i created an overview off the current behaviour of the translation interface so we know which tests can pass: https://docs.google.com/spreadsheet/ccc?key=0Ar95pTGHJsoxdDMzbUl1Uk5aTGh...

penyaskito’s picture

Ok, so all tests at green now. Those failing tests were new, and the behavior is not what @attiks nor I expected to be. @attiks gathered together that list, that maybe we can handle in a DX follow-up task. @attiks, could you create it and attach the document?

So now, #160 is fixed

penyaskito’s picture

Status: Needs work » Needs review
FileSize
146.26 KB

Let's see a green. Maybe?

Status: Needs review » Needs work

The last submitted patch, 1189184-gettext-parsing-154.patch, failed testing.

penyaskito’s picture

Rebased on branch 8.x.

penyaskito’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 1189184-gettext-parsing-164.patch, failed testing.

penyaskito’s picture

Status: Needs work » Needs review
FileSize
143.36 KB

There is a problem with d8mi sandbox 8.x branch (unwanted commits), I did the patch now against drupal/8.x and the size makes sense.

attiks’s picture

FileSize
143.02 KB

minor changes:
* eb937ed - (HEAD, origin/gettext-1189184, gettext-1189184) Fixed some phpdocs and made array handling consiste
* 2f9c934 - Added some phpdocs. (14 hours ago)
* e8d69b6 - Removed params in importString. (14 hours ago)

attiks’s picture

Random design thoughts:

PO file without a header:
Notice: Undefined offset: 1 in Drupal\Component\Gettext\PoHeader->_locale_import_parse_header() (line 216 of /var/aegir/platforms/i18n/core/lib/Drupal/Component/Gettext/PoHeader.php).

PO file with only plural in header:
Notice: Undefined offset: 1 in Drupal\Component\Gettext\PoHeader->_locale_import_parse_header() (line 216 of /var/aegir/platforms/i18n/core/lib/Drupal/Component/Gettext/PoHeader.php).

PoHeader::compileHeader calls language_list()
should be passed as part of the header
only things needed are $language_name and $site_name

PoItem uses LOCALE_PLURAL_DELIMITER, should be a property

PoMemoryWriter uses LOCALE_PLURAL_DELIMITER, should be a property
isn't MemoryReader/Writer part of locale since it is Drupal specific?

attiks’s picture

Discussed with Christian: PoMemoryWriter can stay

vasi1186’s picture

After a pretty short round of code review (the patch and issue are pretty big...), I found a few things that I think should require some attention (besides the overall code style that is probably not 100% Drupal compliant, and there are still lots of todos in code).

+  private $translation;
+  private $finished;
+
+  /**
+   * @see BatchStateInterface
+   */
+  public function __construct() {
+    // empty
+  }
+

Do we need the empty constructor here?

 * @return boolean FALSE or NULL
+   */
+  private function readLine() {
+    // a string or boolean FALSE
+    $line = fgets($this->_fd);
+    $this->finished = ($line === FALSE);
.....

This function is pretty big, and it has lots of if-else statements. Maybe some refactoring, to make it a bit more extensible, flexible, etc...?

class PoFileWriter implements PoStreamInterface, PoWriterInterface, BatchStateInterface {
+
+  private $_uri;
+  private $_header;
+  private $_fd;
+  private $_seekpos;
+  private $_open = FALSE;
+
+  /**
+   * @see BatchStateInterface
+   */
+  function __construct() {
+    // empty
+  }

Empty constructor again.

/**
+   * Return a translation object (singular or plural)
+   *
+   * @todo Define a translation object for this purpose?
+   *       Or use a standard class for better performance?
+   */
+  public function read() {
+  }

In the abstract Reader class. I think we should avoid having empty methods, or methods in abstract classes that return just empty strings, or empty arrays. Probably in most of the cases, if they do not make much sense in the abstract class, then they should be forced to be implemented by a real class.
Same thing for the abstract Writer class.

penyaskito’s picture

About the empty constructors: We'll add batch support in a follow up issue and we need to set the state in the constructor. It isn't already there, but it will be. We need to remove them and add them again? I think that we're fine with having them there.

Reader and Writer are being removed. We had them there as a reference, but I will remove them now. If we need to check them later we can do git checkout e9af636 in the sandbox.

I'll take a look at the readLine() method.

penyaskito’s picture

Rebased patch.

penyaskito’s picture

To get rid of the circular reference between Drupal\Component\Gettext\Gettext class and the locale module, we need to move it back to the locale module, or the static methods there don't make sense at all. The Drupal\Component\Gettext\Gettext is some kind of a facade of the subsystem, but references locale classes PoDbWriter and PoDbReader (assuming a valid Db connection, which is undesirable by comment #156) and locale constants.

I'd like a second opinion before doing this move.

Status: Needs review » Needs work

The last submitted patch, 1189184-gettext-parsing-175.patch, failed testing.

attiks’s picture

yes for moving gettext.php

penyaskito’s picture

@attiks created two follow ups that shouldn't be missed if you are interested on this:

* #1637348: Import Gettext .po files in progressive batches to avoid time limits
* #1637334: PoDatabaseWriter needs Options set before Header

attiks’s picture

Status: Needs work » Needs review
FileSize
133.49 KB

let's go

Status: Needs review » Needs work

The last submitted patch, i1189184-180.patch, failed testing.

attiks’s picture

Status: Needs work » Needs review
FileSize
133.43 KB

try again

Gábor Hojtsy’s picture

Status: Needs review » Needs work

Just looked at the start, infer the rest from here :)

+++ /dev/nullundefined
@@ -1,1131 +0,0 @@
diff --git a/core/includes/gettext.sketch.inc b/core/includes/gettext.sketch.inc

diff --git a/core/includes/gettext.sketch.inc b/core/includes/gettext.sketch.inc
new file mode 100644
index 0000000..fdaa2c5

index 0000000..fdaa2c5
--- /dev/null

--- /dev/null
+++ b/core/includes/gettext.sketch.incundefined

+++ b/core/includes/gettext.sketch.incundefined
+++ b/core/includes/gettext.sketch.incundefined
@@ -0,0 +1,195 @@

@@ -0,0 +1,195 @@
+<?php
+
+/**
+ * @file
+ * Experimental "sketchy" code for gettext parsing and generating API.
+ * For a world without failures, exceptions and certainly without users ;)
+ */

This whole file should not be here, right? It is not for core.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+ * TODOs
+ * - constructor needs a state

This todo is mentioned above too? At least link the issue and change to @todo if want to keep here.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+ * TODOs:
+ * Current implementation has a simple white-space fall-thru. This should
+ * be improved

What?!? What does this mean? Is this still true?

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+ * Current implementation misses the special comments
+ * #.
+ * #:
+ * #|

I think you can mention this, but let's not use this much space for it. You don't support flags either do you (#,).

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+  private $_size;
+  private $_fd;

Missing docs.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+  private $translation;
+  private $finished;

Missing docs.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+    // empty

At least document why is this.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+  public function getURI() {
+    return $this->_uri;
+  }
+
+  public function setURI($uri) {
+    $this->_uri = $uri;
+  }
+
+  public function getLangcode() {
+    return $this->_langcode;
+  }
+
+  public function setLangcode($langcode) {
+    $this->_langcode = $langcode;
+  }

Add docs. Also all other methods.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+      // We immediately read the header as we are at BOF

"as we are at BOF"? Explain this a bit more.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+      throw new \Exception("Cannot open without URI set");

\Expection ?! Also single quotes.

vasi1186’s picture

Status: Needs work » Needs review

Some very small observations:

...
+  /**
+   * Language code.
+   *
+   * @var string
+   */
+  private $_langcode = NULL;
+  private $_size;
+  private $_fd;
+
+  /**
+   * The PO file header.
....

minor thing, $_size and $_fd are not documented... not sure if we must document every single class variable... Also, the PoHeader class has not documentation for its attributes.

+      // We immediately read the header as we are at BOF
...
+    // Make sure to (re)read the PoHeader

MInor thing, the "." should be at the end of the sentences.

+
+  public function setHeader(PoHeader $header) {
+    // TODO : throw exception?
+  }
+

I think there is some code missing here, that should have in the end set the $_header attribute of the class to $header (at least this is what I would expect from this function...)

The class PoMemoryWriter class has lots of methods that are not yet implemented. Should this class be removed entirely, or maybe just those methods? As I talked with attiks, they are actually not needed in this class....

....
+
+  public function getHeader() {
+    // TODO: what
+  }
+
+  public function getLangcode() {
+    // TODO: what
+  }
+
+  public function setHeader(PoHeader $header) {
+    // TODO: what
+  }
+
+  public function setLangcode($langcode) {
+    // TODO: what
+  }
+
...

The main issues I found are the missing documentation and code that is not used yet...

attiks’s picture

Status: Needs review » Needs work
attiks’s picture

Status: Needs work » Needs review
FileSize
113.74 KB

docs added, interface changed, ....

Gábor Hojtsy’s picture

Status: Needs review » Needs work
+++ b/core/includes/install.incundefined
--- /dev/null
+++ b/core/lib/Drupal/Component/Gettext/BatchStateInterface.phpundefined

+++ b/core/lib/Drupal/Component/Gettext/BatchStateInterface.phpundefined
+++ b/core/lib/Drupal/Component/Gettext/BatchStateInterface.phpundefined
@@ -0,0 +1,54 @@

@@ -0,0 +1,54 @@
+<?php
+
+// @TODO: BatchApi related code will need to be fixed by https://drupal.org/node/1637348.
+// The following code isn't used for the moment.
+

Can we push this to a separate branch? Its easier to focus on stuff that we have than try to ignore stuff that we don't need. :/

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+ * @TODO: This code needs love

@TODO should be @todo (applies everywhere where you use TODO or @TODO).

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+
+  private $translation;
+  private $finished;
+

Not documented yet.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+  /**
+   * @see BatchStateInterface
+   */
+  public function __construct() {
+    // empty
+  }

It does not seem hard to remove this unused code to me. Is it?

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+
+  public function getURI() {
+    return $this->_uri;
+  }
+
+  public function setURI($uri) {
+    $this->_uri = $uri;
+  }
+
+  public function getLangcode() {
+    return $this->_langcode;
+  }
+
+  public function setLangcode($langcode) {
+    $this->_langcode = $langcode;
+  }
+
+
+  public function open() {

Lacks docs. Also other methods like this under there.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+      // We immediately read the header as we are at BOF

// Start reading the file with the header.

Sounds like a trivial comment :)

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+      throw new \Exception("Cannot open without URI set");

Remove \, single quotes.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+
+  public function setState(array $state) {
+    $this->setURI($state['uri']);
+    $this->setLangcode($state['langcode']);
+    // Make sure to (re)read the PoHeader
+    $this->open();
+    // Move to last read position.
+    if (isset($state['seekpos'])) {
+      fseek($this->_fd, $state['seekpos']);
+    }
+    if (isset($state['lineno'])) {
+      $this->lineno = $state['lineno'];
+    }
+  }
+
+  public function getState() {
+    return array(
+      'class' => __CLASS__,
+      'uri' => $this->_uri,
+      'langcode' => $this->_langcode,
+      'seekpos' => ftell($this->_fd),
+      'lineno' => $this->lineno,
+    );
+  }

We don't use this.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+   * Return a translation object (singular or plural)
+   *
+   * @todo Define a translation object for this purpose?
+   *       Or use a standard class for better performance?

First line should be complete sentence with .

Text should be on new line, indented.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+    // TODO : throw exception?

If todo, move up to phpdoc and @todo, otherwise throw an exception :D

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.phpundefined
@@ -0,0 +1,556 @@
+   * Reads the header from the given input stream.
+   *
+   * We need to read the optional first COMMENT
+   * Next read a MSGID and a MSGSTR
+   */

This is not what the code does here.

+++ b/core/lib/Drupal/Component/Gettext/PoFileWriter.phpundefined
@@ -0,0 +1,181 @@
+  /**
+   * @see BatchStateInterface
+   * @link https://drupal.org/node/1637348
+   */
+  function __construct() {
+    // empty
+  }

Not used.

+++ b/core/lib/Drupal/Component/Gettext/PoFileWriter.phpundefined
@@ -0,0 +1,181 @@
+    // Open in append mode
+    $this->_fd = fopen($this->getURI(), 'a');
+    $this->_seekpos = ftell($this->_fd);
+    if ($this->_seekpos == 0) {
+      // If file is new position == 0
+      $this->writeHeader();
+    }
+    else {
+      $reader = new PoFileReader($this->uri);
+      $this->_header = $reader->getHeader();
+    }

Dots at end of lines.

The current code does not support batched writing, and we don't seem to have a problem with that at all. We do have a problem with lack of batched reading, but not writing. So let's not focus on solving batched writing here, it complicates this quite a bit.

+++ b/core/lib/Drupal/Component/Gettext/PoFileWriter.phpundefined
@@ -0,0 +1,181 @@
+  /*
+   * Write data to the file and advance the pointer.
+   */
+  private function write($data) {
+    $result = fputs($this->_fd, $data);
+    if ($result === FALSE) {
+      throw new Exception("Unable to write data : " . substr($data, 0, 20));
+    }
+    $this->_seekpos = ftell($this->_fd);
+  }

No need to keep a pointer, see above.

+++ b/core/lib/Drupal/Component/Gettext/PoFileWriter.phpundefined
@@ -0,0 +1,181 @@
+  /*
+   * Writes the po header to the file.
+   */
+  private function writeHeader() {
+    $this->write($this->_header);
+  }
+
+  /*
+   * Writes one PoItem to the file.
+   */
+  public function writeItem(PoItem $item) {
+    $this->write($item);
+  }
+
+  /*
+   * Writes one or more PoItem's to the file.
+   */

Gimme more stars! (At start of phpdoc). And later.

+++ b/core/lib/Drupal/Component/Gettext/PoFileWriter.phpundefined
@@ -0,0 +1,181 @@
+      throw new Exception("Empty URI");

Single quotes.

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+ * @TODO: This code needs love
+ * @see https://drupal.org/node/1637662

Meaning?

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+
+ * @author clemens
+ */

Nice try :) We don't have author names in core like that. You'll be on the commit message and your name will be huge in the contributor cloud for D8MI :)

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+  private $_langcode;
+  private $_projectIdVersion;
+  private $_potCreationDate;
+  private $_poRevisionDate;
+  private $_languageTeam;
+  private $_mimeVersion;
+  private $_contentType;
+  private $_contentTransferEncoding;
+  private $_pluralForms;
+  private $_authors;
+  private $_po_date;
+  private $_languageName;
+  private $_projectName;

Lacks docs. Also, most of these are NEVER used in the code. I think we should limit this to the properties that Drupal uses. I *know* this does not sound like a nice reusable library then, but we found Symfony's not that one either. We are also allowed to focus on what we need :)

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+  /**
+   * Creates a PoHeader with default values set.
+   *
+   * @param type $langcode
+   */

@see setDefaults(). Document langcode (yeah, I know trivial, yeah I know :)

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+
+  static public function mapping() {
...
+
+  function getPlural() {
+    return $this->_pluralForms;
+  }
+
+  function setLanguageName($languageName) {
+    $this->_languageName = $languageName;
+  }
+
+  function getLanguageName() {
+    return $this->_languageName;
+  }
+
+  function setProjectName($projectName) {
+    $this->_projectName = $projectName;
+  }

Dox.

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+      'Project-Id-Version' => '_projectIdVersion',
+      // * Report-Msgid-Bugs-To
+      'POT-Creation-Date' => '_potCreationDate',
+      'PO-Revision-Date' => '_poRevisionDate',
+      // * Last-Translator
+      'Language-Team' => '_languageTeam',
+      'MIME-Version' => '_mimeVersion',
+      // * Language
+      'Content-Type' => '_contentType',
+      'Content-Transfer-Encoding' => '_contentTransferEncoding',
+      'Plural-Forms' => '_pluralForms',

Only deal with what we support . Less code, happier reviewers :)

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+  /**
+   * Stores a given PO Header string
+   *
+   * TODO: the header string is cleaned by the parser :(
+   *   we need to accept unclean version too
+   *
+   * @param type $header
+   */

Dot at end of function sentence.

@todo if its really a todo, not TODO. Why do you need this? Is this a general wish, or a failure in the implementation? To me as a core reviewer, especially with ":(" it says "this is broken, DO NOT commit".

Also document $header is a PO format header, or something like that.

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+   * @TODO: compare with Symfony::setDefaults()
+   *
+   * @param type $values
+   */

Document the function, if to do, then @todo. Document which keys are supported in $values. Or if you'd need to repeat that docs all the time at other places, figure out one place for it and @see to that.

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+   * @return
+   *   An array containing the number of plurals and a
+   *   formula in PHP for computing the plural form.

Make the line longer as possible, not over the limit :) Too short lines are bad, too long lines are bad.

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+    // First, delete all whitespace
+    $pluralforms = strtr($pluralforms, array(" " => "", "\t" => ""));
+
+    // Select the parts that define nplurals and plural
+    $nplurals = strstr($pluralforms, "nplurals=");
+    if (strpos($nplurals, ";")) {
+      $nplurals = substr($nplurals, 9, strpos($nplurals, ";") - 9);
+    }
+    else {
+      return FALSE;
+    }
+    $plural = strstr($pluralforms, "plural=");
+    if (strpos($plural, ";")) {
+      $plural = substr($plural, 7, strpos($plural, ";") - 7);
+    }
+    else {
+      return FALSE;
+    }
+
+    // Get PHP version of the plural formula

Let's put dots at the end of lines.

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+      // Numbers and the $n variable are simply pushed into $element_stack
...
+    // Flush operator stack
...
+    // Now extract formula from stack

Put dots at comment line ends :) All the time, all the time :)

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,405 @@
+  /**
+   * Provides backward-compatible formula parsing for token_get_all().
+   *

backwards compatible? token_get_all()? I think this is a pre-existing comment, but it sounds totally out of place.

+++ b/core/lib/Drupal/Component/Gettext/PoInterface.phpundefined
@@ -0,0 +1,30 @@
+ * TODOs
+ * - constructor needs a state
+ * - add getState
+ * - gettextInterface should have a readLine method

Are these still stand? Batch related stuff should go away for now.

+++ b/core/lib/Drupal/Component/Gettext/PoInterface.phpundefined
@@ -0,0 +1,30 @@
+/**
+ * Defines PO / gettext related must haves.
+ *

Maybe figure out a fancier doc for this :) Like "Methods required for both reader and writer implementations." if it fits on one line :)

+++ b/core/lib/Drupal/Component/Gettext/PoItem.phpundefined
@@ -0,0 +1,138 @@
+/**
+ * PoItem allows for some transformation on it's data.
+ *
+ * @TODO: This code needs love
+ * @see https://drupal.org/node/1637662
+ *
+ * @author clemens
+ */

"some transformation"? This should be more specific. Needs love should be more specific, @TODO should be @todo, @author should not be in core patches.

+++ b/core/lib/Drupal/Component/Gettext/PoItem.phpundefined
@@ -0,0 +1,138 @@
+  /**
+   * The context this translation belongs to.
+   *
+   * The default context should be an empty string
+   * @see PoMemoryWriter.writeItem()
+   * @var string
+   */
+  public $context = '';
+  public $source;
+  public $plural;
+  public $comment;
+  public $translation;
+

Most not documented. Clean up docs by adding dot on end of line, empty line before @see, etc.

+++ b/core/lib/Drupal/Component/Gettext/PoItem.phpundefined
@@ -0,0 +1,138 @@
+  static public function mapping() {
+    return array(
+      'msgctxt' => 'context',
+      'msgid' => 'source',
+      'msgstr' => 'translation',
+      '#' => 'comment',
+    );
+  }

phpdoc missing. This is accidentally named the same as the header one, or is it just a pattern? :)

+++ b/core/lib/Drupal/Component/Gettext/PoItem.phpundefined
@@ -0,0 +1,138 @@
+  public function fromArray(array $values = array()) {
+    foreach ($values as $key => $value) {
+      $this->{$key} = $value;
+    }
+    if (isset($this->source) && strpos($this->source, LOCALE_PLURAL_DELIMITER) !== FALSE) {
+      $this->source = explode(LOCALE_PLURAL_DELIMITER, $this->source);
+      $this->translation = explode(LOCALE_PLURAL_DELIMITER, $this->translation);
+      $this->plural = count($this->translation);
+    }
+  }

For consistency, above we had setFromString, this one is fromArray. Should it be setFromArray()? Consistency is key.

+++ b/core/lib/Drupal/Component/Gettext/PoItem.phpundefined
@@ -0,0 +1,138 @@
+  public function __toString() {
+    return $this->compileTranslation();
+  }

phpdoc.

+++ b/core/lib/Drupal/Component/Gettext/PoItem.phpundefined
@@ -0,0 +1,138 @@
+  /**
+   * Compile PO translations strings from a translation object.
+   *
+   * Translation object consists of:
+   *   source       string (singular) or array of strings (plural)
+   *   translation  string (singular) or array of strings (plural)
+   *   plural       TRUE: source and translation are plurals
+   *   context      source context string
+   */

While this looks cool in fixed width, the supposed formatting for a list I think is - (hyphen) at each item's start and : (colon) inbetween the name of the item and the description. That is parsed better for api.d.o.

+++ b/core/lib/Drupal/Component/Gettext/PoItem.phpundefined
@@ -0,0 +1,138 @@
+  private function compileTranslation() {
+    $output = '';
+

compileTranslation sounds like strange terminology (see also phpdoc for it). Instead of bikeshedding the terminology, since this is private, only called from __toString(), why not inline the whole thing there.

If we want to keep it as separate method, name it maybe formatItem() which would be consistent with the rest of the "formatters".

+++ b/core/lib/Drupal/Component/Gettext/PoItem.phpundefined
@@ -0,0 +1,138 @@
+  /**
+   * Formats a string for output on multiple lines.
+   */
+  private function formatString($string) {
+    // Escape characters for processing.
+    $string = addcslashes($string, "\0..\37\\\"");
+
+    // Always include a line break after the explicit \n line breaks from
+    // the source string. Otherwise wrap at 70 chars to accommodate the extra
+    // format overhead too.
+    $parts = explode("\n", wordwrap(str_replace('\n', "\\n\n", $string), 70, " \n"));
+
+    // Multiline string should be exported starting with a "" and newline to
+    // have all lines aligned on the same column.
+    if (count($parts) > 1) {
+      return "\"\"\n\"" . implode("\"\n\"", $parts) . "\"\n";
+    }
+    // Single line strings are output on the same line.
+    else {
+      return "\"$parts[0]\"\n";
+    }
+  }

We can leave this here for the sake of not refactoring. But.... But, I have this condensed down to 2 lines in a comment above. This is pretty old PHP 3 code really.

+++ b/core/lib/Drupal/Component/Gettext/PoMemoryWriter.phpundefined
@@ -0,0 +1,63 @@
+ *
+ * @TODO: do we need a BatchStateInterface? Yes https://drupal.org/node/1637348

Yes we need, later. Later.

+++ b/core/lib/Drupal/Component/Gettext/PoMemoryWriter.phpundefined
@@ -0,0 +1,63 @@
+  private $_items;
+

Document it.

+++ b/core/lib/Drupal/Component/Gettext/PoMemoryWriter.phpundefined
@@ -0,0 +1,63 @@
+
+  public function setState(array $state) {
+    // nothing to do
+  }
+
+  public function getState() {
+    return array();
+  }

Yesss, remove it, remove it :)

+++ b/core/lib/Drupal/Component/Gettext/PoMemoryWriter.phpundefined
@@ -0,0 +1,63 @@
+   * Stores values into memory.
+   *
+   * The structure is context dependent.
+   * TODO: where is this structure documented?
+   * - array[context][source] = translation
+   *

Stores items in memory?

Of course it is context dependent, we imply that.

Yeah, would be cool to document it somewhere, like up there where $_items are defined for the class.

+++ b/core/lib/Drupal/Component/Gettext/PoMemoryWriter.phpundefined
@@ -0,0 +1,63 @@
+
+  public function writeItems(PoReaderInterface $reader, $count = -1) {
+    $forever = $count == -1;
+    while (($count-- > 0 || $forever) && ($item = $reader->readItem())) {
+      $this->writeItem($item);
+    }
+  }
+
+  public function getData() {
+    return $this->_items;

phpdoc.

+++ b/core/lib/Drupal/Component/Gettext/PoReaderInterface.phpundefined
@@ -0,0 +1,17 @@
+
+/**
+ * @file
+ * Definition of Drupal\Component\Gettext\PoReader.

+++ b/core/lib/Drupal/Component/Gettext/PoStreamInterface.phpundefined
@@ -0,0 +1,22 @@
+
+/**
+ * @file
+ * Definition of Drupal\Component\Gettext\PoReader.

doc does not match filename/classname.

+++ b/core/lib/Drupal/Component/Gettext/PoWriterInterface.phpundefined
@@ -0,0 +1,32 @@
+interface PoWriterInterface {
+  /**
+   * Writes the given item.
+   *
+   * @param PoItem $item

Minor: leave a line between the { and the doc.

+++ b/core/lib/Drupal/Component/Gettext/PoWriterInterface.phpundefined
@@ -0,0 +1,32 @@
+   * @param PoReaderInterface $reader
+   * @param $count

Document $reader. I know, I know it sounds trivial, but its useful to help when you code against it.

+++ b/core/modules/locale/lib/Drupal/locale/Gettext.phpundefined
@@ -0,0 +1,86 @@
+/**
+ * The Gettext class provides for Gettext specific functionality.

The gettext class is a gettext class :) Did you mean "Drupal specific functionality"? @attiks says you did :)

+++ b/core/modules/locale/lib/Drupal/locale/Gettext.phpundefined
@@ -0,0 +1,86 @@
+ * Most operations related to Gettext PO file handling are related to pumping
+ * data from a source to a destination:
+ * - Remote file http://*.po to memory
+ * - Database to public://*.po

to a destination, *for example*:

+++ b/core/modules/locale/lib/Drupal/locale/Gettext.phpundefined
@@ -0,0 +1,86 @@
+ *
+ * This class adds static methods mostly.

Remove.

+++ b/core/modules/locale/lib/Drupal/locale/Gettext.phpundefined
@@ -0,0 +1,86 @@
+   *
+   * @param type $langcode
+   * @param array $files
+   * @return array
...
+   *
+   * @param type $file
+   * @param type $langcode
+   * @param array $files
+   * @return array

type => string?
Document them, document return values.

+++ b/core/modules/locale/lib/Drupal/locale/Gettext.phpundefined
@@ -0,0 +1,86 @@
+    try {
+      // When opening header is parsed immediately
+      $reader->open();

Remove comment IMHO.

+++ b/core/modules/locale/lib/Drupal/locale/Gettext.phpundefined
@@ -0,0 +1,86 @@
+    }
+    catch (Exception $exc) {
+      throw new $exc;
...
+    catch (Exception $exc) {
+      throw new $exc;

Do we abbreviate exceptions like that?

+++ b/core/modules/locale/lib/Drupal/locale/Gettext.phpundefined
@@ -0,0 +1,86 @@
+    if (!$header) {
+      throw new Exception('missing or malformed header.');

Make it nicer by starting off with an M (vs. m).

+++ b/core/modules/locale/lib/Drupal/locale/Gettext.phpundefined
@@ -0,0 +1,86 @@
+    // It's vital options are set first.
+    // @TODO: this has to be fixed in https://drupal.org/node/1637334

IMHO remove the first comment line, leave the todo, rename to @todo :) And document what is "this" that has to be fixed.

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseReader.phpundefined
@@ -0,0 +1,229 @@
+
+// @TODO: BatchApi related code will need to be fixed by https://drupal.org/node/1637348.

Remove.

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseReader.phpundefined
@@ -0,0 +1,229 @@
+  /*
+   * @param $options
+   *   An associative array indicating what data should be overwritten, if any.
+   *   - not_customized: not customized strings should be overwritten.
+   *   - customized: customized strings should be overwritten.
+   * @param $customized
+   *   (optional) Whether the strings being imported should be saved as customized.
+   *   Use LOCALE_CUSTOMIZED or LOCALE_NOT_CUSTOMIZED.
+   */

Copy paste docs? Segment as needed per properties. reformat phpdoc as appropriate (no @param).

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseReader.phpundefined
@@ -0,0 +1,229 @@
+   * lid of last read record

Dot at end. Document that this is lid as in locales_source. (I guess).

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseReader.phpundefined
@@ -0,0 +1,229 @@
+    if (!isset($options['override_options'])) {
+      $options['override_options'] = array();
+    }
+    if (!isset($options['customized'])) {
+      $options['customized'] = LOCALE_NOT_CUSTOMIZED;
+    }
+    $this->_options = array(
+      'override_options' => $options['override_options'],
+    );
+    $this->_options += $options;

Sounds like what you are doing is:

$this->_options = $options + array(
'override_options' => array(),
'customized' => LOCALE_NOT_CUSTOMIZED,
);

Is += not doing deep merge??!

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseReader.phpundefined
@@ -0,0 +1,229 @@
+  /**
+   * Set the state for the BatchApi support.
+   */
+  function setState(array $state) {
+    $this->_lid = $state['lid'];
+    $this->setOptions($state['options']);
+    $this->buildQuery();
+  }
+
+  /**
+   * Get the state for the BatchApi support.
+   */
+  function getState() {
+    return array(
+      '__CLASS__' => __CLASS__,
+      'lid' => $this->_lid,
+      'options' => $this->_options,
+    );

Dropit.

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseReader.phpundefined
@@ -0,0 +1,229 @@
+  /**
+   * Set the header in PO format in a reader is not allowed but mandatory by ¶
+   * BatchStateInterface, so calling this will throw an exception.
+   *
+   * @throws Exception
+   *  Always, because you cannot set the PO header of a reader, but this method ¶
+   *  it's required by the BatchStateInterface.
+   */
+  function setHeader(PoHeader $header) {
+    throw new \Exception('You cannot set the PO header in a reader.');

Line ending spaces.

2-line function comment.

"this method it's required" => "is required". AND since this is going away, it is not required WOOOOO. Method can be removed altogether?

\Exception (again?)

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseReader.phpundefined
@@ -0,0 +1,229 @@
+   *
+   * @param $language
+   *   Language object to generate the output for, or NULL if generating
+   *   translation template.
+   * @param $options
+   *   (optional) An associative array specifying what to include in the output:
+   *   - customized: include customized strings (if TRUE)
+   *   - uncustomized: include non-customized string (if TRUE)
+   *   - untranslated: include untranslated source strings (if TRUE)
+   *   Ignored if $language is NULL.
+   *
+   * @return
+   *   An array of translated strings that can be used to generate an export.
+   */
+  private function buildQuery() {

These parameters don't exist on the method :) :)

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseReader.phpundefined
@@ -0,0 +1,229 @@
+    // Assume FALSE for all options if not provided by the API.
+    $options += array(
+      'customized' => FALSE,
+      'not_customized' => FALSE,
+      'not_translated' => FALSE,

Why not do these in setOptions like the 'customized' key is? Or in the constructor if we don't assume setOptions() is invoked, or in the variable def above?

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseReader.phpundefined
@@ -0,0 +1,229 @@
+
+    // TODO: we need to order by lid
+    // This does not seem to work
+    $query->orderBy('s.lid');
+    $query->condition('s.lid', $this->_lid, '>');

It is already ordered by lid, why @todo? What does not work?! We don't need the batching stuff here.

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseReader.phpundefined
@@ -0,0 +1,229 @@
+  /**
+   * Get the full result for the given langauge and options.

langauge => language

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseReader.phpundefined
@@ -0,0 +1,229 @@
+  /**
+   * Read one item at the time from the results and stores the state.

Read one item at *a* time from the results and *store* the state.

BTW we read it all at once, and don't need to store the state (as in the lid) as per no batch support in here. The database query results are there and *are the state*, since we don't need to continue this in multiple requests, we can rely just on that.

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.phpundefined
@@ -0,0 +1,347 @@
+
+// @TODO: BatchApi related code will need to be fixed by https://drupal.org/node/1637348.
+

Deleteit.

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.phpundefined
@@ -0,0 +1,347 @@
+class PoDatabaseWriter implements PoWriterInterface, BatchStateInterface {
+  /*
+   * @param $overwrite_options
+   *   An associative array indicating what data should be overwritten, if any.
+   *   - not_customized: not customized strings should be overwritten.
+   *   - customized: customized strings should be overwritten.
+   * @param $customized
+   *   (optional) Whether the strings being imported should be saved as customized.
+   *   Use LOCALE_CUSTOMIZED or LOCALE_NOT_CUSTOMIZED.
+   */
+
+  /*
+   * @param $options
+   *   An associative array indicating what data should be overwritten, if any.
+   *   - not_customized: not customized strings should be overwritten.
+   *   - customized: customized strings should be overwritten.
+   * @param $customized
+   *   (optional) Whether the strings being imported should be saved as customized.
+   *   Use LOCALE_CUSTOMIZED or LOCALE_NOT_CUSTOMIZED.
+   */
+  private $_options;

Temp code copy-paste?!

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.phpundefined
@@ -0,0 +1,347 @@
+  /**
+   * Default state need for BatchApi implementation
+   * @link https://drupal.org/node/1637348
+   */
+  static function getDefaultState() {
+    return array(
+      'langcode' => NULL,
+      'report' => array(
+        'additions' => 0,
+        'updates' => 0,
+        'deletes' => 0,
+        'skips' => 0,
+        'ignored' => 0,
+      ),
+      'options' => array(
+        'overwrite_options' => array(
+          'not_customized' => FALSE,
+          'customized' => FALSE,
+        ),
+        'customized' => LOCALE_NOT_CUSTOMIZED,
+      ),
+    );

Well, we need this for the in-page reading process, not only for the batch. Is this an in-file batch or inbetween file batch that you need this for?

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.phpundefined
@@ -0,0 +1,347 @@
+  /**
+   * Report array summarizing the number of changes done in the form:
+   * array(additions, updates, deletes, skips, ignored).
+   */
+  private $_report;

2 line comment => 1 line comment.

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.phpundefined
@@ -0,0 +1,347 @@
+  /**
+   * Implementation of BatchInterface::setState
+   *
+   * @param array $state
+   * @link https://drupal.org/node/1637348
+   */
+  public function setState(array $state) {
+    $state += self::getDefaultState();
+    $this->_report = $state['report'];
+    $this->setLangcode($state['langcode']);
+    $this->setOptions($state['options']);
+  }
+
+  public function getState() {
+    return array(
+      'class' => __CLASS__,
+      'report' => $this->getReport(),
+      'langcode' => $this->getLangcode(),
+      'options' => $this->getOptions(),
+    );

Do we need this for inbetween file batches or only in-file batches?

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.phpundefined
@@ -0,0 +1,347 @@
+  /**
+   * Get the header.
+   * @see setHeader
+   */
+  function getHeader() {
+    return $this->_header;
+  }

Which is text, or array, or...

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.phpundefined
@@ -0,0 +1,347 @@
+  /**
+   * Sets the header and configure drupal accordingly.

drupal => Drupal

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.phpundefined
@@ -0,0 +1,347 @@
+    // Check for options
...
+    // Check for langcode
...
+      // TODO: this is a sloppy way to create a file name
+      // but parsePluralForms is also weird to me still

dots at end of line.
"weird to me still" is not the kind of comment we want to have in core :D Its pre-existing code, so we can work on refactoring that one later, no need to mention it is crap, we know it is crap. At least I feel with you.

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.phpundefined
@@ -0,0 +1,347 @@
+  /**
+   * Write next items from a reader to the database.
+   * If no number of items is specified, all of theme are being written.

Put empty line between two lines. Rethink first doc line.

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.phpundefined
@@ -0,0 +1,347 @@
+   * @param $count
+   *  The number of items that will be wrote.
...
+   * @param PoItem $item
+   *  The item being imported.
+   * @param $location

Not enough whitespace in indent.

+++ b/core/modules/locale/locale.bulk.incundefined
@@ -107,7 +110,39 @@ function locale_translate_import_form_submit($form, &$form_state) {
+    } catch (Exception $exc) {
+      drupal_set_message(print_r($exc, TRUE));

Dadadadada!

clemens.tolboom’s picture

(Maybe I missed something the last 3 days then sorry for dups :p)

1. Just a quick comment on batches. Both reader and writer need to know whether they are 'in batch mode'. If the seekpos of PoFileWriter is not == 0 there is no need to write the header.

2. PoInterface should be GettextInterface as both PO and MO understand (afaik) header and language if we ever want to support MO files.

3. We now have a Component/Gettext but all files in it are named Po* (there is no README.txt explaining this :)

4. PoFileReader::readHeader() should have used PoFileReader::setHeader which makes the comment go away not using setHeader :)

5. All interface implementation method should have a @see as it's documentation right?

6. PoFileWriter opens in append mode. But there is no documentation the user should truncate the output file first if needed.

7. Is PoItem::mapping ever used?

8. Is moving PoDatabase* to locale.module not better implemented by adding an abstracted PoDatabaseReader then implemented by a LocalePoDatabaseReader. Currently modules cannot implement easily a similar storage. By only changing the table name we could gain a lot for l.d.o I guess.

9. Gettext.php should be in core/lib/Drupal/Component/Gettext as that is still abstract enough.

attiks’s picture

Status: Needs work » Needs review
FileSize
107.5 KB

Most of #187 is fixed, going for green

Status: Needs review » Needs work

The last submitted patch, i1189184-188.patch, failed testing.

penyaskito’s picture

Status: Needs work » Needs review

#188:

1. Just a quick comment on batches. Both reader and writer need to know whether they are 'in batch mode'. If the seekpos of PoFileWriter is not == 0 there is no need to write the header.

This will be handled in follow-up #1637348: Import Gettext .po files in progressive batches to avoid time limits.
Gábor and Webchick thoughts are that we should refactor in this issue, and implement new stuff in a new one.

8. Is moving PoDatabase* to locale.module not better implemented by adding an abstracted PoDatabaseReader then implemented by a LocalePoDatabaseReader. Currently modules cannot implement easily a similar storage. By only changing the table name we could gain a lot for l.d.o I guess.

My thought was that Component\Gettext shouldn't know much about Drupal. That's why we moved PoDatabase* to Locale. l10n_server can have it's own classes that extend PoDatabaseReader and overwrite whatever method we have, none of them are marked as final.

9. Gettext.php should be in core/lib/Drupal/Component/Gettext as that is still abstract enough.

See my answer to 8. Gettext.php has specific methods for locale and has knowledge about locale. BTW, moving it again there will reproduce the circular dependency between Gettext component and locale module.

attiks’s picture

FileSize
107.71 KB

Go bot

attiks’s picture

Status: Needs review » Needs work
+++ b/core/lib/Drupal/Component/Gettext/PoFileWriter.phpundefined
@@ -0,0 +1,139 @@
+  /**
+   * Gets the URI of the current file.

Gets -> Get

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,422 @@
+  /**
+   * Creates a PoHeader with default values set.
+   *
+   * @param string $langcode
+   * @see setDefaults()

Constructor, create ...

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.phpundefined
@@ -0,0 +1,422 @@
+  /**
+   * Store a given PO Header string.
+   *
+   * @param string $header
+   * @see mapping()
+   */

Create ...

+++ b/core/lib/Drupal/Component/Gettext/PoInterface.phpundefined
@@ -0,0 +1,25 @@
+/**
+ * @file
+ * Definition of Drupal\Component\Gettext\PoReader.

PoInterface, not Reader

+++ b/core/lib/Drupal/Component/Gettext/PoItem.phpundefined
@@ -0,0 +1,139 @@
+
+  /**
+   * Build the PoItem from a structured array.
+   */
+  public function setFromArray(array $values = array()) {

Create ....

+++ b/core/lib/Drupal/Component/Gettext/PoStreamInterface.phpundefined
@@ -0,0 +1,22 @@
+/**
+ * @file
+ * Definition of Drupal\Component\Gettext\PoReader.

PoStreamInterface, not PoReader

+++ b/core/lib/Drupal/Component/Gettext/PoWriterInterface.phpundefined
@@ -0,0 +1,34 @@
+  ¶

trailing space

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.phpundefined
@@ -0,0 +1,340 @@
+    ¶

trailing space

attiks’s picture

Status: Needs work » Needs review
FileSize
107.72 KB

coding style fixes

clemens.tolboom’s picture

Status: Needs review » Needs work

(Just a bigger reminder)

+++ /dev/null
@@ -1,1131 +0,0 @@
-function _locale_import_read_po($op, $file, $overwrite_options = NULL, $lang = NULL, $customized = LOCALE_NOT_CUSTOMIZED) {

As we are removing this function we need to check our new code landed in PoFileReader. I'm sure I fixed something over there (PoFileReader) but as @Sutharsan said in #55 I was using refactored code already :( :) ?!?

attiks’s picture

Status: Needs work » Needs review

@clemens, I assume you compared the code your self? In #56 you admitted you didn't use Erik's code?

Back to NR

penyaskito’s picture

Status: Needs review » Needs work

(Just a bigger reminder)

+++ /dev/null
@@ -1,1131 +0,0 @@
-function _locale_import_read_po($op, $file, $overwrite_options = NULL, $lang = NULL, $customized = LOCALE_NOT_CUSTOMIZED) {

As we are removing this function we need to check our new code landed in PoFileReader. I'm sure I fixed something over there (PoFileReader) but as @Sutharsan said in #55 I was using refactored code already :( :) ?!?

This code is at Drupal\Component\Gettext\PoFileReader::readLine AFAIK.

penyaskito’s picture

Status: Needs work » Needs review
Artusamak’s picture

Why all the private attributes in the classes are not written with $_name and with a getter/setter function? Only some are, has this been done on purpose?. The comment is not always describing the type either. Arguments /return values are not always documented.
Doxygen blocks around function should use "Sets", "Gets" instead of "Set" / "Get".

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.php
@@ -0,0 +1,549 @@
+    // a string or boolean FALSE

UCfirst + period at the end of the line.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.php
@@ -0,0 +1,549 @@
+          // Message strings must come after msgid, msgxtxt, msgid_plural, or other msgstr[] entries.

Comment is too long, 80 chars max.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.php
@@ -0,0 +1,549 @@
+        // Skip to the next whitespace and trim away any further whitespace, bringing $line to the message data.

Comment is too long, 80 chars max.

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.php
@@ -0,0 +1,549 @@
+        // Anything that is not a token may be a continuation of a previous token.

Comment is too long, 80 chars max.

+++ b/core/lib/Drupal/Component/Gettext/PoHeader.php
@@ -0,0 +1,422 @@
+   * Set defaults

Missing end of line period.

+++ b/core/lib/Drupal/Component/Gettext/PoItem.php
@@ -0,0 +1,139 @@
+   * Formats a string for output on multiple lines.

Describe param.

+++ b/core/lib/Drupal/Component/Gettext/PoMemoryWriter.php
@@ -0,0 +1,65 @@
+  public function getData() {

Should be called getItems()?

+++ b/core/modules/locale/locale.bulk.inc
@@ -298,8 +371,16 @@ function locale_translate_batch_import($filepath, &$context) {
+    // We need only the last match

Missing ending period.

+++ b/core/includes/install.inc
--- /dev/null
+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.php

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.php
@@ -0,0 +1,549 @@
+/**
+ * Defines a Gettext file reader for PO format.
...
+ * @see https://drupal.org/node/1637662
...
+class PoFileReader implements PoStreamInterface, PoReaderInterface {
...
+  public function setHeader(PoHeader $header) {
...
+    $this->_header = $header;

Could use $this->setHeader($header) instead if implemented?

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.php
@@ -0,0 +1,549 @@
+          $this->log('The translation file %filename contains an error: "msgid_plural" was expected but not found on line %line.', $this->lineno);

Should we throw an exception?

+++ b/core/lib/Drupal/Component/Gettext/PoFileReader.php
@@ -0,0 +1,549 @@
+        // A string for the an id or context.

Typo in comment.

Gábor Hojtsy’s picture

FileSize
63.15 KB

Talked through the class and interface names with @attiks, with their purposes, etc. The inheritance/implementation tree was not very clean to avoid some empty method implementations. We need to experiment with this a little, here is the proposed naming for now.

- rename poItem to poMetadataInterface (it has nothing to do with .po per say)
- rename poFileReader/Writer to poStreamReader/Writer since the interface is for streams, not files (if it would be for files, we'd need to rename the other way around)
- go back to extending the poMetadataInterface with poWriterInterface (it looked ugly that only the db and file writers implemented it)

GettextOOP.png

clemens.tolboom’s picture

@attiks and @penyaskito (yes I used the in-between code :/)

I had the will but haven't had the time to do a line by line comparison between the old gettext.inc and new PoFileReader::readlIne so #55 and #195,6,7 are now reminders.

We should either update the issue summary by removing the reminder or do a line by line code review between the mentioned functions. It's not a big deal but has to be done.

The issue summary was update by me through #147.

(hope this makes things clearer :)

penyaskito’s picture

Status: Needs review » Needs work
FileSize
679.61 KB

Needs work yet.

penyaskito’s picture

Status: Needs work » Needs review
penyaskito’s picture

Status: Needs review » Needs work
attiks’s picture

Status: Needs work » Needs review
FileSize
111.9 KB

one for the testbot, rebased and conflicts removed

Status: Needs review » Needs work

The last submitted patch, i1189184-195.patch, failed testing.

attiks’s picture

Status: Needs work » Needs review
FileSize
111.73 KB

Tests should be ok

Status: Needs review » Needs work

The last submitted patch, i1189184-207.patch, failed testing.

attiks’s picture

FileSize
112.1 KB

Interface renamed

attiks’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, i1189184-209.patch, failed testing.

attiks’s picture

Status: Needs work » Needs review
FileSize
112.14 KB

for the bot, we'll need to wait for #1642966: Typo: coddition instead of condition

Status: Needs review » Needs work

The last submitted patch, i1189184-212.patch, failed testing.

Anonymous’s picture

attiks’s picture

Issue summary: View changes

Summary updated to reflect reality

attiks’s picture

Status: Needs work » Needs review
Gábor Hojtsy’s picture

I started helping out with cleaning up more phpdoc. Found still several missing pieces, etc. Committed http://drupalcode.org/sandbox/goba/1624820.git/commitdiff/7f7fed55987648...

Figured out readItem() returns an array while writeItem() takes PoItem. Looks very-very strange. Why is the difference?

clemens.tolboom’s picture

Assigned: clemens.tolboom » Unassigned

PoStreamReader::readItem calls PoStreamReader::readLine calls PoStreamReader::saveOneString() which created a PoItem so the documentation of PoReaderInterface is wrong.

Gábor Hojtsy’s picture

Well, I just carried that over from PoStreamReader's readItem:

  /**
   * Return a translation object (singular or plural).
   *
   * @return array
   *  A structured array that represent the translation.
   */
  public function readItem() {
    $this->readTranslation();
    return $this->_translation;
  }

Which definitely carries it over from the private property:

  /**
   * Structured array to store the current translation.
   *
   * @var array
   */
  private $_translation;

Are we looking at the same code?

clemens.tolboom’s picture

Another note on performance writing items through PoDatabaseWriter

I discussed this briefly with @penyaskito that when reading a PO file all items are in memory those need to be written to the database.

- is there an equivalent for db_update (see code below)?
- should we delete then insert multiple items like done below?

Taken from #932814: Error out when trying to masquerading as blocked user (#6)

+      $query = db_insert('masquerade_users')->fields(array('uid_from', 'uid_to'));
+      foreach ($users as $username) {
+        if ($to_user = _masquerade_user_load($username)) {
+          $query->values(array(
+            'uid_from' => $account->uid,
+            'uid_to' => $to_user->uid,
+          ));
+        }
+      }
+      $query->execute();
clemens.tolboom’s picture

In #216 I tried to say the documentation is wrong.

Both items $_translation and function readItem should return a PoItem.

Gábor Hojtsy’s picture

Issue tags: -D8MI, -sprint, -language-ui

#212: i1189184-212.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +D8MI, +sprint, +language-ui

The last submitted patch, i1189184-212.patch, failed testing.

attiks’s picture

Assigned: Unassigned » attiks

i will have a look later today

attiks’s picture

Status: Needs work » Needs review
FileSize
113.29 KB

try again bot

Status: Needs review » Needs work

The last submitted patch, i1189184-223.patch, failed testing.

attiks’s picture

Status: Needs work » Needs review
FileSize
114.15 KB

Timestamp wasn't set on batch import, should be fixed.

Gábor Hojtsy’s picture

Superb, thanks for the update! Let's keep cleaning up the code, eg. the misdocumentation discussed above and get it closer to done :) Thanks for sticking to this issue, its a huge change but its going to be great when it lands, opening up flexibility for so many things.

attiks’s picture

Assigned: attiks » Unassigned
attiks’s picture

Issue summary: View changes

Batch in separate issue

Gábor Hojtsy’s picture

Issue tags: -D8MI, -sprint, -language-ui

#225: i1189184-225.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +D8MI, +sprint, +language-ui

The last submitted patch, i1189184-225.patch, failed testing.

attiks’s picture

Status: Needs work » Needs review
FileSize
113.29 KB

rebased patch

Status: Needs review » Needs work

The last submitted patch, i1189184-230.patch, failed testing.

attiks’s picture

Status: Needs work » Needs review
FileSize
113.29 KB

Status: Needs review » Needs work

The last submitted patch, i1189184-232.patch, failed testing.

attiks’s picture

Status: Needs work » Needs review
FileSize
113.34 KB
Gábor Hojtsy’s picture

FileSize
45.89 KB

Ok I took several hours skipping some sleep to work on improving code style considerably, getting closer to making the patch committable. Attached interdiff, explained below. I went through all classes and interfaces in the Drupal gettext lib. I did not yet get to the locale implementations, those need some cleanup as well. Hopefully the code should still pass with my changes. @attiks can you help with rerolling the patch itself with my commit? (I committed this in the sandbox).

In this code style cleanup:

PoHeader:

- added @file comment
- shortened master comment
- added type explanations to properties
- got rid of mapping() and setDefaults() because they were overly complicated for just tracking the plural forms
- renamed getPlural to getPluralForms
- collapsed compileHeader() and __toString() into one, since there seemed to be no use for both being wrappers of each other
- dropped $filename from parsePluralForms() because it was not used there at all
- fixed several more type and function phpdocs

PoMemoryWriter

- fixed use statements
- grammar and phpdoc standards for interface implementation methods applied
- added phpdoc to all purposefully not implemented methods for api parser

PoMetadataInterface

- fixed use statements
- added plenty phpdoc for methods (these are referenced from "Implements" phpdocs)

PoReaderInterface

- fix readItem() docs, not an array but PoItem as discussed above

PoStreamInterface

- added plenty phpdoc for methods (these are referenced from "Implements" phpdocs)

PoStreamReader

- fixed use statements
- explained context better on the class property
- defined missing $this->plural as $this->_current_plural_index instead,
- renamed _current to _current_item and _translation to _last_item (although _current_item is not a PoItem per say)
- renamed file to stream in phpdoc as appropriate
- phpdoc standards for interface implementation methods applied
- streamlined readItem(), removed intermediary readTranslation()
- explained header reading in phpdoc
- explained readLine() state machine in phpdoc
- several code style fixes, minor code comment fixes and error message text improvements in readLine()
- add missing file name token in log(), automate line number handling since it was always provided
- renamed saveOneString() to setItemFromArray() for nicer API
- minor code style improvements in parseQuoted()

PoStreamWriter

- fixed use statements
- minor phpdoc tweaks, like above
- _seekpos was not a defined property, not used for anything, so removed
- hopefully explained write() better
- phpdoc standards for interface implementation methods applied

PoWriterInterface

- very minor phpdoc fixes

PoDatabaseWriter

- just apply the getPlural() => getPluralForms() rename and parsePluralForms() not getting a filepath

Gábor Hojtsy’s picture

FileSize
116.87 KB

Rerolled with these code style changes.

Gábor Hojtsy’s picture

FileSize
117.21 KB
10.52 KB

Three more fixes:

- PoStreamWriter did not declare that it implemented the PoStreamInterface (hah), although it implemented all methods proper; fix that
- log() in PoStreamReader is incompatible with string extraction (and only used from places where we can do that directly), so do that
- PoStreamReader did not have the errors property defined, define it now

Now to let the testbot do its magic :)

Status: Needs review » Needs work

The last submitted patch, i1189184-237.patch, failed testing.

attiks’s picture

Strange failing on Drupal\search\Tests\SearchCommentTest

Gábor Hojtsy’s picture

Status: Needs work » Needs review
FileSize
117.06 KB
19.93 KB

I doubt that is related to this patch (hopefully :). Worked more on cleanup of the locale defined code as well. Just cosmetic changes again and removing some unused code cruft:

PoItem:
- missing @file comment added

Gettext:
- missing @file comment added
- fixed missing use statement
- fixed missing explanations and improper types in phpdoc

PoDatabaseReader:
- missing @file comment added
- fixed $options documentation; this is *different* to the writer options
- also fixed setOptions() accordingly
- used proper "implements" phpdoc references to base interface
- buildQuery said it would return strings, which is not really what it does, fixed doc

PoDatabaseWriter:
- missing @file comment added
- explained options better
- explained _report better
- removed "ignored" key as it was not used for anything (and not present before the patch)
- removed hardwired location updates; the drupal.org provided files do not provide any location info anyway
- fixed indentation issues with queries
- removed batch related getState/setState that was not used

Gábor Hojtsy’s picture

Title: Make gettext .po generation its own abstracted functionality » OOP & PSR-0-ify gettext .po file parsing and generation
Issue tags: +PSR-0
FileSize
4.81 KB
116.43 KB

And some readability / understandability fixes in the tests (plus a minor fix in install.inc where we added a CMI related line that should not have been part of this patch), so we don't have xyzz and xyzz2 type of values :) I did not actually run the tests locally, crossing fingers :)

The code definitely has some rough edges still, but after 240 comments, let's focus on getting it in, since it is leaps and bounds ahead of what core has now. Buzzword-enhanced the issue title for that.

Status: Needs review » Needs work

The last submitted patch, i1189184-241.patch, failed testing.

Gábor Hojtsy’s picture

Status: Needs work » Needs review
FileSize
116.53 KB
3.37 KB

Found these (minor) reasons for all of the test fails:

- Gettext.php wanted to use the database writer from the wrong location (it is located with locale module, since it is reliant on its DB)
- the database writer did not save plural formulas properly since I forgot to rename one more $lang to $langcode, hah
- the database writer did not initialize its report array properly (it was indirectly initialized before via the batch related code that we don't need and was removed); added a direct constructor to init the reporting array
- I integrated the invalida HTML and the multiline text checks into the one test .po file but did not update the import success text check properly
- The multiline translation string was not properly set up for what was being tested

These are just side effects of minor changes. This patch now fixes all tests on my machine that I ran. Hopefully it will work here too.

Gábor Hojtsy’s picture

Issue summary: View changes

Fixed remaining tasks which got removed through http://drupal.org/node/1189184/revisions/2150050/view

Gábor Hojtsy’s picture

Issue summary: View changes

Serious update to the summary, include architecture picture.

penyaskito’s picture

FileSize
116.73 KB
2.18 KB

Reviewed all functional code (I have not been able of reviewing tests, sorry).

* Added two comments.
* Fixed one long line. There are some more at core/lib/Drupal/Component/Gettext/PoStreamReader.php, but they are long strings so I'm fine with it.
* Fixed a phpdoc typo.

For me is RTBC. Thanks everybody for your very hard work.

penyaskito’s picture

Issue summary: View changes

Updated issue summary.

tobiasb’s picture

Issue summary: View changes

Updated issue summary.

attiks’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
117.24 KB
3.99 KB

Some minor comments and one code style issue, but this should be good to go.

attiks’s picture

FYI the interdiff is the other way around

Dries’s picture

I skimmed this patch for 15 minutes and it looks sound. Let's wait for the testbot to come back with the results for the latest patch.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Ok, so the test bot came back green. Committed it to 8.x. Thanks.

sun’s picture

Gábor Hojtsy’s picture

@Dries: woot, thanks!

@sun: Good tags :) Will follow up on this later this week. As long as the sprint tag is on it, its not off our board :)

Gábor Hojtsy’s picture

Title: OOP & PSR-0-ify gettext .po file parsing and generation » CHANGELOG: OOP & PSR-0-ify gettext .po file parsing and generation
Component: locale.module » documentation
Status: Fixed » Needs review
FileSize
490 bytes

I think this definitely deserves a CHANGELOG.txt patch. Attached.

Added changenotice node at http://drupal.org/node/1703490, anybody missing details from there?

Gábor Hojtsy’s picture

FileSize
490 bytes

Helps if I name it patch, although pretty unlikely it ever fails tests :D

webchick’s picture

Status: Needs review » Fixed

Committed that follow-up as well. Thanks!

Gábor Hojtsy’s picture

Title: CHANGELOG: OOP & PSR-0-ify gettext .po file parsing and generation » OOP & PSR-0-ify gettext .po file parsing and generation
Component: documentation » locale.module

Thanks! Restore title.

sun’s picture

Status: Fixed » Needs work

On the change notice (http://drupal.org/node/1703490), that rather looks like a handbook page to me.

Normally, change notices have the purpose of telling the affected developer/implementor of particular change "how to recover" ;) from the API change - i.e., explaining the change in a minimal way, but focusing on the important part; D7 code vs. D8 code -- what do I need to change in my code to upgrade it correctly?

It looks like most of this affected rather "low-level" functions. But since there never really was a proper API for performing such operations, it is possible that there might be modules/code out there, which (re-)used Locale module's functions.

The essential changes seem to be these:

+++ b/core/includes/install.inc
@@ -700,13 +701,7 @@ function st($string, array $args = array(), array $options = array()) {
       $files = install_find_translation_files($install_state['parameters']['langcode']);
       if (!empty($files)) {
-        // Include cross-dependent code from locale module and gettext.inc.
-        require_once DRUPAL_ROOT . '/core/modules/locale/locale.module';
-        require_once DRUPAL_ROOT . '/core/includes/gettext.inc';
-        foreach ($files as $file) {
-          _locale_import_read_po('mem-store', $file);
-        }
-        $strings = _locale_import_one_string('mem-report');
+        $strings = Gettext::filesToArray($install_state['parameters']['langcode'], $files);
       }

+++ b/core/modules/locale/locale.bulk.inc
@@ -107,7 +110,39 @@ function locale_translate_import_form_submit($form, &$form_state) {
     // Now import strings into the language
-    if ($return = _locale_import_po($file, $language->langcode, $form_state['values']['overwrite_options'], $customized) == FALSE) {
+    try {
+      // Try to allocate enough time to parse and import the data.
+      drupal_set_time_limit(240);
+
+      $report = GetText::fileToDatabase($file, $language->langcode, $form_state['values']['overwrite_options'], $customized);
...
+    } catch (Exception $exc) {
+      drupal_set_message(print_r($exc, TRUE));
       $variables = array('%filename' => $file->filename);
       drupal_set_message(t('The translation import of %filename failed.', $variables), 'error');

@@ -207,7 +242,44 @@ function locale_translate_export_form_submit($form, &$form_state) {
-  _locale_export_po($language, _locale_export_po_generate($language, _locale_export_get_strings($language, $content_options)));
+  $reader = new PoDatabaseReader();
+  $languageName = '';
+  if ($language != NULL) {
+    $reader->setLangcode($language->langcode);
+    $reader->setOptions($content_options);
+    $languages = language_list();
+    $languageName = isset($languages[$language->langcode]) ? $languages[$language->langcode]->name : '';
+    $filename = $language->langcode .'.po';
+  }
+  else {
+    // Template required.
+    $filename = 'drupal.pot';
+  }
+
+  $item = $reader->readItem();
+  if (!empty($item)) {
+    $uri = tempnam('temporary://', 'po_');
+    $header = $reader->getHeader();
+    $header->setProjectName(variable_get('site_name', 'Drupal'));
+    $header->setLanguageName($languageName);
+
+    $writer = new PoStreamWriter;
+    $writer->setUri($uri);
+    $writer->setHeader($header);
+
+    $writer->open();
+    $writer->writeItem($item);
+    $writer->writeItems($reader);
+    $writer->close();
+
+    header("Content-Disposition: attachment; filename=$filename");
+    header("Content-Type: text/plain; charset=utf-8");
+    print file_get_contents($uri);
+    drupal_exit();
+  }
+  else {
+    drupal_set_message('Nothing to export.');
+  }

@@ -318,22 +390,32 @@ function locale_translate_batch_build($files, $finish_feedback = FALSE) {
 function locale_translate_batch_import($filepath, &$context) {
...
-    $file = locale_translate_file_create($filepath, $langcode[2]);
-    $success = _locale_import_read_po('db-store', $file, array(), $langcode[2]);
-    if ($success == NULL) {
-      $file->langcode = $langcode[2];
+    $file = entity_create('file', array('filename' => drupal_basename($filepath), 'uri' => $filepath));
+    // We need only the last match
+    $langcode = array_pop($langcode);
+    try {
+      $report = GetText::fileToDatabase($file, $langcode, array(), LOCALE_NOT_CUSTOMIZED);
+      $file->langcode = $langcode;
+      $file->timestamp = filemtime($file->uri);
       locale_translate_update_file_history($file);
+      $context['results']['files'][$filepath] = $filepath;
+      $context['results']['stats'][$filepath] = $report;
+    } catch (Exception $exception) {
+      $context['results']['files'][$filepath] = $filepath;
+      $context['results']['failed_files'][$filepath] = $filepath;
     }

PS: The catch {} line of try...catch statements in this code apparently violates coding standards ;) (should be on its own line)

Gábor Hojtsy’s picture

Updated the change notice with an import example and links to export changes. Since the "before" code is scattered among 3 functions its just not that I could throw up a few lines of standalone code and show it off.

I'll also submit a patch later to fix the try/catch code style issue.

Gábor Hojtsy’s picture

Status: Needs work » Reviewed & tested by the community
Issue tags: -Needs change record
FileSize
1.18 KB

Trivial patch.

attiks’s picture

+++ b/core/modules/locale/locale.bulk.incundefined
@@ -141,7 +141,8 @@ function locale_translate_import_form_submit($form, &$form_state) {
       drupal_set_message(print_r($exc, TRUE));

do we need this?

penyaskito’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
1.28 KB

@attiks Nice catch, I don't think so so new patch without the dumped exception.

Gábor Hojtsy’s picture

Status: Needs review » Reviewed & tested by the community

Ha, cool :)

webchick’s picture

Status: Reviewed & tested by the community » Needs work

While we're at it, can we make the $exc variable into $exception like the hunk below? We don't tend to abbreviate variable names.

I would just do this myself but I'm about to take off on an airplane and I'm not 100% positive I can safely rename that variable without testbot confirmation.

Gábor Hojtsy’s picture

Status: Needs work » Fixed
Issue tags: -sprint

All right, I hoped to have this tiny fix in by the D8MI sprint, so we can clear this out of the sprint board and not confuse people. They don't need to figure out from a 262 comment thread that only some minor code style fixes are relevant at the end as of now. So in the interest of better understanding, I've moved this to #1737186: Fix code style issues in new OOP Gettext parser and closing this down as fixed as per the original goals. The remaining code style issues will be fixed there. It is tagged Novice and is on the sprint.

webchick’s picture

Cool, I'm following that issue so whenever it gets bumped to RTBC I'm happy to commit it.

Kristen Pol’s picture

Status: Fixed » Needs review
FileSize
1.3 KB

All I did was rename the $exc to $exception per @webchick. Note that I simply edited the original patch directly and didn't create this patch via diff.

Kristen Pol’s picture

FileSize
1.29 KB

Whoops... got a little crazy with the last patch (too many substitutions). This one should be correct.

penyaskito’s picture

Status: Needs review » Fixed
Kristen Pol’s picture

Ah! Cool... sorry for the noise :)

andypost’s picture

Status: Fixed » Needs review
FileSize
610 bytes

Site info already in CMI, probably needs test

Gábor Hojtsy’s picture

Status: Needs review » Fixed

Can you please open this in a followup. Appreciated. Don't want to see comment #300.

andypost’s picture

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Updated issue summary.