[ACCEPTED]-Entity Attribute Value Database vs. strict Relational Model Ecommerce-key-value
There's a few general pros and cons I can 6 think of, there are situations where one 5 is better than the other:
Option 1, EAV Model:
- Pro: less time to design and develop a simple application
- Pro: new entities easy to add (might even be added by users?)
- Pro: "generic" interface components
- Con: complex code required to validate simple data types
- Con: much more complex SQL for simple reports
- Con: complex reports can become almost impossible
- Con: poor performance for large data sets
Option 2, Modelling each entity separately:
- Con: more time required to gather requirements and design
- Con: new entities must be modelled and designed by a professional
- Con: custom interface components for each entity
- Pro: data type constraints and validation simple to implement
- Pro: SQL is easy to write, easy to understand and debug
- Pro: even the most complex reports are relatively simple
- Pro: best performance for large data sets
Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)
- Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
- Con: new entities must be modelled and designed by a professional
- Pro: new attributes might be easily added later on
- Con: complex code required to validate simple data types (for the custom attributes)
- Con: custom interface components still required, but generic interface components may be possible for the custom attributes
- Con: SQL becomes complex as soon as any custom attribute is included in a report
- Con: good performance generally, unless you start need to search by or report by the custom attributes
* I'm not sure if Option 3 would necessarily save any time in the design phase.
Personally I 4 would lean toward option 2, and avoid EAV 3 wherever possible. However, for some scenarios 2 the users need the flexibility that comes 1 with EAV; but this comes with a great cost.
It is safe to say that the EAV/CR database 14 model is bad.
No, it's not. It's just that 13 they're an inefficient usage of relational 12 databases. A purely key/value store works 11 great with this model.
Now, to your real 10 question: How to store various attributes 9 and keep them searchable?
Just use EAV. In 8 your case it would be a single extra table. index 7 it on both attribute name and value, most 6 RDBMs would use prefix-compression to on 5 the attribute name repetitions, making it 4 really fast and compact.
EAV/CR gets ugly 3 when you use it to replace 'real' fields. As 2 with every tool, overusing it is 'bad', and 1 gives it a bad image.
// At this point, I'd like to take a moment to speak to you about the Magento/
Adobe PSD format. // Magento/ PSDis not a good ecommerce platform/ format. Magento/ PSDis not even a bad ecommerce platform/ format. Calling it such would be an // insult to other bad ecommerce platform/ formats, such as Zencart or OsCommerce. No, Magento/ PSDis an abysmal ecommerce platform/ format. Having // worked on this code for several weeks now, my hate for Magento/ PSDhas grown to a raging fire // that burns with the fierce passion of a million suns.
The internal models are wacky at best, like 7 someone put the schema into a boggle game, sealed 6 that and put it in a paint shacker...
Real 5 world: I'm working on a midware fulfilment 4 app and here are one the queries to get 3 address information.
CREATE OR REPLACE VIEW sales_flat_addresses AS SELECT sales_order_entity.parent_id AS order_id, sales_order_entity.entity_id, CONCAT(CONCAT(UCASE(MID(sales_order_entity_varchar.value,1,1)),MID(sales_order_entity_varchar.value,2)), "Address") as type, GROUP_CONCAT( CONCAT( eav_attribute.attribute_code," ::::: ", sales_order_entity_varchar.value ) ORDER BY sales_order_entity_varchar.value DESC SEPARATOR '!!!!!' ) as data FROM sales_order_entity INNER JOIN sales_order_entity_varchar ON sales_order_entity_varchar.entity_id = sales_order_entity.entity_id INNER JOIN eav_attribute ON eav_attribute.attribute_id = sales_order_entity_varchar.attribute_id AND sales_order_entity.entity_type_id =12 GROUP BY sales_order_entity.entity_id ORDER BY eav_attribute.attribute_code = 'address_type'
Exacts address information 2 for an order, lazily
Summary: Only use Magento 1 if:
- You are being given large sacks of money
- You must
- Enjoy pain
I'm surprised nobody mentioned NoSQL databases.
I've 5 never practiced NoSQL in a production context 4 (just tested MongoDB and was impressed) but 3 the whole point of NoSQL is being able to 2 save items with varying attributes in the 1 same "document".
Where performance is not a major requirement, as 26 in an ETL type of application, EAV has another 25 distinct advantage: differential saves.
I've 24 implemented a number of applications where 23 an over-arching requirement was the ability 22 to see the history of a domain object from 21 its first "version" to it's current state. If 20 that domain object has a large number of 19 attributes, that means each change requires 18 a new row be inserted into it's corresponding 17 table (not an update because the history 16 would be lost, but an insert). Let's say 15 this domain object is a Person, and I have 14 500k Persons to track with an average of 13 100+ changes over the Persons life-cycle 12 to various attributes. Couple that with 11 the fact that rare is the application that 10 has only 1 major domain object and you'll 9 quickly surmize that the size of the database 8 would quickly grow out of control.
An easy 7 solution is to save only the differential 6 changes to the major domain objects rather 5 than repeatedly saving redundant information.
All 4 models change over time to reflect new business 3 needs. Period. Using EAV is but one of the 2 tools in our box to use; but it should never 1 be automatically classified as "bad".
I'm struggling with the same issue. It may 19 be interesting for you to check out the 18 following discussion on two existing ecommerce 17 solutions: Magento (EAV) and Joomla (regular 16 relational structure): https://forum.virtuemart.net/index.php?topic=58686.0
It seems, that Magento's 15 EAV performance is a real showstopper.
That's 14 why I'm leaning towards a normalized structure. To 13 overcome the lack of flexibility I'm thinking 12 about adding some separate data dictionary 11 in the future (XML or separate DB tables) that 10 could be edited, and based on that, application 9 code for displaying and comparing product 8 categories with new attributes set would 7 be generated, together with SQL scripts.
Such 6 architecture seems to be the sweetspot in 5 this case - flexible and performant at the 4 same time.
The problem could be frequent 3 use of ALTER TABLE in live environment. I'm 2 using Postgres, so its MVCC and transactional 1 DDL will hopefully ease the pain.
I still vote for modeling at the lowest-meaningful 4 atomic-level for EAV. Let standards, technologies 3 and applications that gear toward certain 2 user community to decide content models, repetition 1 needs of attributes, grains, etc.
If it's just about the product catalog attributes 20 and hence validation requirements for those 19 attributes are rather limited, the only 18 real downside to EAV is query performance 17 and even that is only a problem when your 16 query deals with multiple "things" (products) with 15 attributes, the performance for the query 14 "give me all attributes for the product 13 with id 234" while not optimal is still 12 plenty fast.
One solution is to use the 11 SQL database / EAV model only for the admin 10 / edit side of the product catalog and have 9 some process that denormalizes the products 8 into something that makes it searchable. Since 7 you already have attributes and hence it's 6 rather likely that you want faceting, this 5 something could be Solr or ElasticSearch. This 4 approach avoids basically all downsides 3 to the EAV model and the added complexity 2 is limited to serializing a complete product 1 to JSON on update.
EAV has many drawbacks:
- Performance degradation over time Once the amount of data in the application grows beyond a certain size, the retrieval and manipulation of that data is likely to become less and less efficient.
- The SQL queries are very complex and difficult to write.
- Data Integrity problems. You can't define foreign keys for all the fields needed.
- You have to define and maintain your own metadata.
I have a slightly different problem: instead 15 of many attributes with sparse values (which 14 is possibly a good reason to use EAV), I 13 want to store something more like a spreadsheet. The 12 columns in the sheet can change, but within 11 a sheet all cells will contain data (not 10 sparse).
I made a small set of tests to benchmark two designs: one 9 using EAV, and the other using a Postgres 8 ARRAY to store cell data.
Both schemas 7 have indexes on appropriate columns, and 6 the indexes are used by the planner.
It turned 5 out the array-based schema was an order of magnitude faster for both inserts and queries. From 4 quick tests, it seemed that both scaled 3 linearly. The tests aren't very thorough, though. Suggestions 2 and forks welcome - they're under an MIT 1 licence.
More Related questions