Hi,
I want to use a pre-filter on a big XML file (more than 5000 entries), on a field that can contain up to 3 values per entry.
For example, I've got 1100, 2456 and 3402 as values of this field.
I don't want to import entries with the 3402 value, so I've written a pre-filter like this:

  public static function filtreClil($field) {
    $clil = array(3402,3304);
    if (is_array($field)) {
      foreach ($field as &$f) {
        $f = self::filtreClil($f);
      }
      return $field;
    }else{
      if(in_array($field,$clil)){
        return NULL;
      }else{
        //return false;
        return $field;
      }
    }
  }

Then, in the pre-filter settings, I call it with ::filtreClil and [field] as the parameter.
What am I doing wrong?
Thanks for your help.

I've also tried applying some simple functions such as prepend or replace, just to test whether the pre-filter works at all, but neither behaves as expected.

Attachment (comment #2): 2prod-xml.txt, 37.75 KB, by LaFabrik

Comments

Sorin Sarca:

Hi,
Pre-filters do not change the data; they are used to provide an alternative xpath. But in your case a pre-filter is what you need, or (depending on your XML structure) you can skip those items directly with an XPath query.

I'll show the pre-filter way first; if you post a sample of your XML file, I'll show you how to do it with XPath only (if possible).

public static function checkBadIds($field) {
  static $bad = array(3402, 3304);
  if (is_array($field)) {
    return count(array_intersect($bad, $field)) ? NULL : $field;
  }
  return in_array($field, $bad) ? NULL : $field;
}
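A standalone way to see why the two functions behave differently on a multi-valued field: filtreClil() filters each value separately and returns the array itself with NULL in place of 3402, while checkBadIds() returns NULL for the whole field as soon as one bad value is present. The class name ClilFilters is hypothetical, the original filter is lightly condensed (same behaviour), and the sketch only shows the return values, not how Feed Import handles them.

```php
<?php
// Hypothetical class name; both methods are adapted from the thread above.
class ClilFilters
{
    // The original pre-filter from the question (condensed, same behaviour).
    public static function filtreClil($field)
    {
        $clil = array(3402, 3304);
        if (is_array($field)) {
            // Each value is filtered on its own, so the array survives
            // with NULL in place of the bad value.
            foreach ($field as &$f) {
                $f = self::filtreClil($f);
            }
            unset($f); // drop the reference left over from foreach
            return $field;
        }
        return in_array($field, $clil) ? NULL : $field;
    }

    // The replacement from the answer.
    public static function checkBadIds($field)
    {
        static $bad = array(3402, 3304);
        if (is_array($field)) {
            // A single bad value rejects the whole multi-valued field.
            return count(array_intersect($bad, $field)) ? NULL : $field;
        }
        return in_array($field, $bad) ? NULL : $field;
    }
}

// A multi-valued field as it might be read from the XML (string values).
$values = array('1100', '2456', '3402');

$original = ClilFilters::filtreClil($values);               // still an array, last element NULL
$fixed    = ClilFilters::checkBadIds($values);              // NULL for the whole field
$clean    = ClilFilters::checkBadIds(array('1100', '2456')); // returned unchanged
```

Mixing int bad values with string field values is fine here: array_intersect() compares string representations, and in_array() compares loosely.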

If you want the function to be generic, with the bad values passed as params, do the following:

public static function checkBadIds() {
  $bad = func_get_args();
  $field = array_shift($bad);
  if (is_array($field)) {
    return count(array_intersect($bad, $field)) ? NULL : $field;
  }
  return in_array($field, $bad) ? NULL : $field;
}

And for the pre-filter params, use:

[field]
3402
3304
other number
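Read as a direct call, that parameter list presumably amounts to something like the sketch below: the first argument is the field value and every following argument is a bad value, which func_get_args() and array_shift() split apart. The class name GenericClilFilters is hypothetical, and passing the numbers as strings is an assumption about how the params arrive; ints would work the same way here.

```php
<?php
// Hypothetical class name; the method is the generic checkBadIds() from the answer,
// with the missing semicolon added.
class GenericClilFilters
{
    public static function checkBadIds()
    {
        $bad = func_get_args();      // [field value, bad value 1, bad value 2, ...]
        $field = array_shift($bad);  // first argument is the field; $bad keeps the rest
        if (is_array($field)) {
            // Reject the whole field if any of its values is a bad value.
            return count(array_intersect($bad, $field)) ? NULL : $field;
        }
        return in_array($field, $bad) ? NULL : $field;
    }
}

// Roughly what the params "[field]", "3402", "3304" amount to
// (assumption: the bad values arrive as strings).
$rejected = GenericClilFilters::checkBadIds(array('1100', '3402'), '3402', '3304');
$kept     = GenericClilFilters::checkBadIds(array('1100', '2456'), '3402', '3304');
$single   = GenericClilFilters::checkBadIds('3304', '3402', '3304');
```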

Hope it helps.

LaFabrik:


Thanks for your detailed answer.
Here is a sample (one product) of the XML feed.
As you can see, there are a lot of fields; it's an ONIX3 message (ebook).

<Product>
...
<Subject>
<MainSubject/>
<SubjectSchemeIdentifier>10</SubjectSchemeIdentifier>
<SubjectCode>FIC000000</SubjectCode>
</Subject>
<Subject>
<MainSubject/>
<SubjectSchemeIdentifier>29</SubjectSchemeIdentifier>
<SubjectCode>2312</SubjectCode>
<SubjectHeadingText>Romans contemporains</SubjectHeadingText>
</Subject>
<Subject>
<SubjectSchemeIdentifier>20</SubjectSchemeIdentifier>
<SubjectHeadingText>peuls, roman, seuil</SubjectHeadingText>
</Subject>
<Subject>
<SubjectSchemeIdentifier>29</SubjectSchemeIdentifier>
<SubjectSchemeVersion>DILICOM20</SubjectSchemeVersion>
<SubjectCode>3442</SubjectCode>
<SubjectHeadingText>Romans</SubjectHeadingText>
</Subject>
....

The whole sample file with 2 products is attached (as .txt because of the upload restriction).
The part I'm trying to filter is in the Subject tag: only when SubjectSchemeIdentifier is "29", I want to make sure that SubjectCode is not "3402".
So I'm not sure both conditions can be checked in a single XPath query.
Thank you for your reply.
Have a good day.

Sorin Sarca:

I don't know how big your file is (in MB), but if it's about 50 MB or less you should use "processXML" as the process function, and for the parent xpath you can use something like this:
//Product[DescriptiveDetail/Subject[SubjectSchemeIdentifier!="29" and SubjectCode!="3402"]]

I don't know your exact case, so I wonder whether this should be imported:

<Subject>
  <SubjectSchemeIdentifier>29</SubjectSchemeIdentifier>
  <SubjectCode>5</SubjectCode>
</Subject>
<Subject>
  <SubjectSchemeIdentifier>45</SubjectSchemeIdentifier>
  <SubjectCode>3402</SubjectCode>
</Subject>
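To check which products a parent XPath actually selects, the query can be tried outside the module with PHP's DOM extension (DOMDocument and DOMXPath, not Feed Import itself). The small feed below is made up for the test: product A has a Subject with scheme 29 and code 3402, B has only scheme 29 with code 2312, and C is the case from the example above. The sketch assumes the elements carry no default namespace, as in the excerpt in this thread; a real ONIX 3.0 file may declare one, in which case DOMXPath::registerNamespace() and prefixed names would be needed. A not(...) variant, which skips a product whenever any Subject has scheme 29 and code 3402, is run next to the posted query for comparison.

```php
<?php
// Made-up feed for testing; assumes no default XML namespace.
$xml = <<<XML
<ONIXMessage>
  <Product>
    <RecordReference>A</RecordReference>
    <DescriptiveDetail>
      <Subject><SubjectSchemeIdentifier>10</SubjectSchemeIdentifier><SubjectCode>FIC000000</SubjectCode></Subject>
      <Subject><SubjectSchemeIdentifier>29</SubjectSchemeIdentifier><SubjectCode>3402</SubjectCode></Subject>
    </DescriptiveDetail>
  </Product>
  <Product>
    <RecordReference>B</RecordReference>
    <DescriptiveDetail>
      <Subject><SubjectSchemeIdentifier>29</SubjectSchemeIdentifier><SubjectCode>2312</SubjectCode></Subject>
    </DescriptiveDetail>
  </Product>
  <Product>
    <RecordReference>C</RecordReference>
    <DescriptiveDetail>
      <Subject><SubjectSchemeIdentifier>29</SubjectSchemeIdentifier><SubjectCode>5</SubjectCode></Subject>
      <Subject><SubjectSchemeIdentifier>45</SubjectSchemeIdentifier><SubjectCode>3402</SubjectCode></Subject>
    </DescriptiveDetail>
  </Product>
</ONIXMessage>
XML;

// Returns the RecordReference of every Product matched by $query.
function selectedRefs(DOMXPath $xp, string $query): array
{
    $refs = array();
    foreach ($xp->query($query) as $product) {
        $refs[] = $xp->evaluate('string(RecordReference)', $product);
    }
    return $refs;
}

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);

// Query as posted: a Product with at least one Subject where both inequalities hold.
$asPosted = selectedRefs($xp, '//Product[DescriptiveDetail/Subject[SubjectSchemeIdentifier!="29" and SubjectCode!="3402"]]');

// Variant: keep a Product only if no Subject has scheme 29 AND code 3402.
$negated = selectedRefs($xp, '//Product[not(DescriptiveDetail/Subject[SubjectSchemeIdentifier="29" and SubjectCode="3402"])]');
```

With this made-up feed, the posted query selects only A (its scheme-10 Subject satisfies both inequalities), while the not(...) variant selects B and C.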
LaFabrik:

Hi Sorin,
sorry for the late reply, I was busy with other things.
Yes, the first file I have to import is 100 MB, so I'm very interested in what you call the "process function".
Can you tell me more about it?
Also, I don't quite understand where I should put the XPath you describe. In the parent xpath at the top, is that it?

To answer your last question: I import all the fields of each product into my "ebook" node, except for the products I can filter out based on the SubjectCode.

Sorin Sarca:

Status: Active » Closed (works as designed)