Problem/Motivation
I am trying to create a field type that leverages the JSON and JSONB data types in Postgres. My goal is to have a field that can store JSON objects who's properties can be indexed. This is a Postgres specific feature.
- http://www.postgresql.org/docs/9.4/static/datatype-json.html
- http://pgeoghegan.blogspot.com/2014/03/what-i-think-of-jsonb.html
- http://nandovieira.com/using-postgresql-and-jsonb-with-ruby-on-rails
Proposed resolution
Add json:normal and jsonb:normal mapped to json and jsonb Drupal field types for pgsql driver, Schema::getFieldTypeMap().
'json:normal' => 'json',
'jsonb:normal' => 'jsonb',
);
return $map;
}
json vs. jsonb
The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.
https://www.postgresql.org/docs/current/datatype-json.html
Remaining tasks
- Review field type map and approach
Data model changes
json and jsonb field types are available for use in schema for pgsql driver.
Release notes snippet
Basically I want to be able to paste JSON or parse a CSV with another custom module and store the object in this field with the intent of being able to query this data, treating it like a NoSQL store. Since Mongo isn't fully implemented, and would require two databases running instead of one, Postgres seems like a natural fit.
| Comment | File | Size | Author |
|---|---|---|---|
| #56 | 2472709-56.patch | 2.62 KB | andypost |
| #56 | interdiff.txt | 622 bytes | andypost |
| #52 | interdiff.txt | 5.26 KB | andypost |
Comments
Comment #1
kevinquillen commentedComment #2
larowlanComment #4
kevinquillen commentedLooks like this feature is coming to MySQL as well:
http://mysqlserverteam.com/json-labs-release-native-json-data-type-and-b...
Comment #5
bzrudi71 commentedTagging...
Comment #6
bzrudi71 commentedComment #7
kevinquillen commentedRelated... ATEN just posted this and it feels relevant.
http://atendesigngroup.com/blog/speeding-complex-drupal-data-loads-custo...
If they were able to store as JSON in MYSQL or JSONB/JSON in Postgres it would save them a lot of steps.
I was able to do some queries, like this one, with some example data:
jsonb_array_elements is a Postgres function, one of which makes this field type worth using. The field in question stored 2351 JSON objects, and this query executed and returned a response to me in less than a second (~250 records).
I am working at posting my module that supports this field type this week.
Comment #8
bzrudi71 commentedThe only problem I see here is that we are currently going to bump PG min version to 9.1.2 but JSON support starts with 9.2 ;-) Please see #2477413: Increase minimum version requirement for Postgres to 9.1.2.
Comment #9
kevinquillen commentedCorrect - JSON starts at 9.2, JSONB introduced with 9.4.
As long as the minimum version is getting bumped, it would be nice if at least JSON was supported (9.2). How does this affect testing?
Comment #10
andypostSuppose this no more issue, we just need to define "pgsql_type" for field
Fix for contrib module #2480201: Clean schema implementation for field schema
Comment #13
andypostComment #14
andypost9.1 no more supported https://www.postgresql.org/support/versioning/ so looks it's time to up suggested version to 9.2
This needs check across main distrosbut anyway this issue can have BC implementation for this types
Comment #15
andypostSort of that may work
Comment #16
mradcliffeIt might be good to add a method/property to the abstract Connection class for detecting that capability with a default of FALSE. Then it could be added to PostgreSQL and MySQL drivers. Since that class is abstract we should be able to add that to the API, right?
Comment #17
andypost@mradcliffe Good idea,
getCapabilities()makes sense! But then we need some flags to return.For example this may be helpful for #2839683: Add support distinct on construction
Comment #18
daffie commentedCreated #2846994: Increase minimum version requirement for Postgres to 10 and require the pg_trgm extension. In PostgreSQL 9.2 are JSON and JSONB data types supported.
Comment #24
mradcliffeJotting down some notes from a sandbox project of mine.
We can check JSON capabilities with something like the following. I wrote this for Drupal 7. :-)
'SELECT JSON_VALID(:value)', [':value' => 'null']'SELECT pg_typeof(:value::jsonb)', [':value' => '{}']And if an exception is thrown, then we can set/cache false or something.
Comment #25
mradcliffeImplementing this should be unblocked now, but might be duplicated by work done in #3046696: Move from serialized columns to JSON encoded data wherever possible, or use allowed_classes.
Since minimum version is still 9.1, some sort of capabilities detection needs to be added to DBTNG.
Comment #27
andypostComment #30
andypostComment #31
anmolgoyal74 commentedRe-rolled for 9.2.x
Comment #32
andypostIt just needs tests which could be extended for mysql (sqlite not clear)
This condition no longer needed for 9.0+
https://www.drupal.org/node/2847322
Comment #33
mradcliffeI'm working on adding some tests for this today.
Comment #34
mradcliffeI uploaded a test-only patch, and then a patch that makes the suggested changes from @andypost in #32.
I removed the Needs tests tag.
I updated the issue summary with the template so the current proposed resolution is clear.
Comment #35
mradcliffeResolved coding standard issue in this patch. No major changes.
Comment #36
mradcliffeThank you, interdiff. I need to look at it before uploading. :(
I created the patch from the test-only codebase. I re-applied the patch and fixed the code standard issue.
Comment #37
andypostLooks ready for commiter
Comment #38
mradcliffeRe-running tests. I think the patch is failing because json and jsonb fail spell check and need exceptions.
Comment #39
daffie commentedIt is the word
jsonb. See: https://www.drupal.org/pift-ci-job/1923087Comment #40
mradcliffeAdds jsonb to dictionary. Stays at RTBC if it passes.
Comment #41
alexpottfwiw https://www.drupal.org/project/json_field exists and implements a field type for Postgres's jsonb format as well as the json format.
I think for something to get core inclusion we need to implement it for all core drivers - ie. #3109340: [policy] Decide whether to require json support for all database drivers for Drupal 10 needs to get agreed first and then we need to decide how this is implemented. Going to discuss with other framework managers.
Comment #42
catchI think we should at least postpone this on #3109340: [policy] Decide whether to require json support for all database drivers for Drupal 10. I don't think we have to implement support for each database driver simultaneously in the same issue, but we should have viable patches for each so that we can see what it looks like - i.e. whether we need an abstraction layer or not to enable modules to use json/jsonb regardless of driver.
Comment #44
andypostPolicy landed
Comment #48
anybodyComment #49
gappleI commented in the MySQL issue, that MySQL has a single
JSONtype that is most similar to Postgres'jsonb, which could possibly cause some confusion or unexpected behaviour if Drupal'sjson:normalfield type acts differently between database engines.The META issue might be the place to discuss the expected DB behaviour (raw vs. normalized), and whether Postgres' convention should be adopted by Drupal. (The Postgres documentation does recommend that
jsonbbe used as the default, unless legacy needs require the json data to not be normalized).Comment #51
andypostre-roll and changed
json:binaryforjsonbAlso it could use
pgsql_typein spec to provide this variation to keep it inline with Mysql - Re #49Patch also removes try/catch and ref to policy issue, added type-hints
Comment #52
andypostI removed
jsonbas it could be used providingpgsql_typeand to keep the driver inline with othersAlso copied the test to test every returned datatype from #3373370: Support JSON fields in Sqlite schema
The difference is that
boolreturned as "true" when sqlite returns 1 (so pgsql results are typed)Comment #53
andypostComment #54
andypostfix CS
Comment #55
andypostsimplified test
Comment #56
andypostfix cs
Comment #57
anybodyThanks @andypost for pushing this forward an unifying on
json:normalThe PostgreSQL documentation says about
jsonvs.jsonb:(https://www.postgresql.org/docs/current/datatype-json.html)
I added that to the issue summary.
As the implemented data type here is now the same as for the other database engines (see meta issue), I'll set this RTBC.
But we should discuss, how to proceed with
jsonband potentiallyjsonpath(see https://www.postgresql.org/docs/current/datatype-json.html#DATATYPE-JSON...) for PostgreSQL.Should that simply be provided by a contrib module? Or as a follow-up here?
Comment #58
anybodyMoving "Needs framework manager review" tag to the parent meta issue: #3343634: Add "json" as core data type to Schema and Database API
Comment #59
daffie commentedLets use the JSONB field instead of the older JSON field.
Can we rewrite this to
foreach ($test_data as $path => $expected) {Can we change this to: $query->addExpression('test_field -> 'list' -> 1');
$test_datais different. Can we make the testing values the same?Comment #60
andypostI don't think we can use
jsonbout of box because it doing normalization and removing duplicate keys (even in arrays)So if someone willing to use it they always can set it by
pgsql_typeStill not clear if we can provide alternative to
addExpression()method++ to move 2 arguments case out of loop to make it more clear
but the biggest question is the last case where 1 (array index) can't be passed as argument and that's why I did hardcoding it into query
Comment #61
daffie commentedI do not have a problem with removing duplicate keys. For me that is a plus. Doing normalization is also a plus for me. Far more important for me is the fact that you can do indexing on JSONB fields. Without an index we will be doing full table scans and that is a no-go for me.
Comment #62
bradjones1I would agree that indexing is a huge consideration here and is a strong +1 for
jsonb. It is also more read performant, per the docs, despite being slightly slower on initial input.The normalization and de-duplicating array keys doesn't feel like a blocker; is there a practical example of where this would be an issue for current code?
The point about being able to opt to the alternative implementation (
jsonvs.jsonb) is a good one but I think whatever we go with needs to be a sensible default for the 90+% usecase. It wouldn't hurt also to add in some assertions for input data that could be triggered at least in development, e.g. if you attempt to store data that will be truncated (e.g., duplicate keys) if the default implementation isjsonb.Comment #63
gapple+1 for
jsonbas I noted in #49:
- the postgres docs recommend
jsonb- mysql's json type doesn't preserve duplicate keys, does order object keys, and performs whitespace normalization
I don't think removing duplicate keys should be a concern - any value in a PHP data type won't have duplicate keys, and if an input to
json_decodehas a duplicate key only the last value will be preserved (same behaviour as postgres & mysql). So, anything currently usingjson_encode/json_decodeand a text field is already not able to have duplicate object keys.JSON objects are defined as being unordered, so any implementation using it shouldn't rely on order being preserved. It's probably only worth a documentation note that order preservation shouldn't be expected when storing and retrieving from json database fields. The options to use a text field and rely on PHP's encode/decode behaviour, or set
pgsql_typetojsonare available if key order is an issue for a particular case.Comment #64
bradjones1Per #3343634-14: Add "json" as core data type to Schema and Database API this is being consolidated into the parent issue for a single change.
Also appears the jsonb change was made on the parent, resolving #63.