I am planning to build a community-driven (always growing) site like stackoverflow, where people can post questions and other users can post replies (with comments on questions/replies and an up/down voting system), plus a badge and points system.

Such a website has the potential to become very large, as it's community-driven.

My concern is the performance of Drupal for such a website.

Firstly, what version should I use? 6 or 7?

I am concerned about the field storage of d7, where each field = 1 or 2 tables. In a site like stackoverflow, I would think that I may have a lot of custom fields to add (and as it evolves with new features, even more fields would be required). So, wouldn't the field storage of d7 greatly impact performance and scalability if I expect traffic like stackoverflow's?

But d8's release is imminent and d6 will no longer be maintained, so I do not think d6 is appropriate to start using now.

I have never used d7 before, only d6. I plan to use content types and CCK (Field API in d7), but again I am concerned about the performance. Are there techniques or other ways to create fields that are stored in the node table itself?

If I create an entity, will I be able to put all my fields in one table, and at the same time leverage the benefits of using modules, just like with content types?

Thanking you in advance for reading my long post and awaiting feedback, ideas, suggestions and recommendations from you guys. (Btw, I love drupal 6 lol but i think it's time to move on to d7)

edit: I know drupal.org runs on D7 now, and it does not seem to have any problem handling its huge traffic (I would think it has more, or more or less the same, traffic as stackoverflow). I am interested to know if they followed any special steps for good performance (they have lots of content types, or maybe entities, like forums, modules, etc.).

Comments

Jaypan’s picture

Drupal 6 is not so scalable, and nearing the end of its life cycle, so it wouldn't be a great idea to use it.

Drupal 7 is more scalable, and as you can see with Drupal.org, can handle large amounts of traffic.

Drupal 8 will be the most scalable, but is probably near a year away from release, and at least another year after that from really being usable.

So D7 is what you would want to use.

gunjack07’s picture

thx for the reply but do you think d7 is scalable enough for a site similar to stackoverflow?

Jaypan’s picture

Yes.

But not if you overload it with a million modules.

gunjack07’s picture

Do you guys have suggestions/solutions to overcome the overhead of having each field in a separate table? Is there a way to store all of them in a single table per content type, like in d6?

nevets’s picture

With a properly tuned database, the number of tables is not that big of an issue. The very small gain you might make would be lost in the effort and extra maintenance required. You are better off using appropriate caching mechanisms. (There is one for entities.)

Jaypan’s picture

The Entity Cache module covers this. The fact that the data is stored across multiple tables becomes irrelevant.

nevets’s picture

Also, scalability is tied to the web hosting you choose. Shared hosting will not cut it, and you are likely to need more than a single server.

gunjack07’s picture

Can you recommend a hosting and budget plan to begin with that can be upgraded quickly if needed? I don't want to invest in a dedicated plan immediately. I need to build my user base first.

gunjack07’s picture

will a VPS server be enough as a start? I can upgrade as traffic increases? What hosting do you recommend?

gunjack07’s picture

The expected site traffic is about 80,000 to 100,000 visits per day. Can Drupal handle such a load? I have read some case studies saying Drupal is slow and does not scale well, especially with non-anonymous visits (as caching is not very efficient in that situation), unless you hack it like hell.

Jaypan’s picture

1) You are looking at a Drupal site right now that has very high traffic.
2) You don't necessarily have to hack it like hell, but you will need to know what you are doing. Adding a few modules and expecting it to scale isn't going to happen. You'll need to know how to code custom modules, work with servers, and have a very deep understanding of how Drupal works.

gunjack07’s picture

So I would be discarding one of the main advantages of Drupal, which is leveraging its huge module base, if I am just not using them. It's like I am only using Drupal's framework. Wouldn't it be better to go with a non-bloated framework then?

nevets’s picture

You can use modules (most sites do). As the number of modules used grows, the amount of server resources needed also grows (memory in particular, though in D7 well-written modules have a small impact).

Jaypan’s picture

wouldn't it be better to go with a non-bloated framework then?

First, Drupal core is not bloated. It's actually quite slim, as it is more a framework than a CMS. Next, it depends on what you are comparing it to. As far as the common CMSs are concerned, I'd say none of them scale as well as Drupal does. If you are going to use a framework such as CakePHP or Symfony or something, these will scale better, but then you are using a framework, same as Drupal is a framework. And if you use Drupal as a framework, it will (or rather 'can', if you code it right) scale well.

Also, there is no need to avoid Drupal's modules altogether; you just have to strongly consider which ones you are going to use. Many developers add on any module they can find to add new functionality. Sometimes 2-3 modules are needed to add one bit of functionality. The more bloat you add to your site, the more overhead it's going to require, and the less well your site will perform with large access numbers. But some modules are well worth the overhead they require, as they provide a lot of strong functionality with that overhead, and they are well maintained, meaning they will be secure and receive lots of updates. The key is finding a balance. Don't go willy-nilly with the modules, and be prepared to do some custom coding to get some functionality rather than adding modules to get it.

WorldFallz’s picture

brilliantly stated... bookmarking this one!

gunjack07’s picture

thx for the advice. I am going to limit myself to only the essential modules like views.

Do you use entity API and entity construction kit?

Jaypan’s picture

I don't use Views, Entity API or Entity Construction Kit, though I do use my own custom module that does the same thing as ECK.

Edit - though that's just me. If you are evaluating modules to use on a site, these three are good modules as they all provide powerful functionality and are well maintained.

gunjack07’s picture

wow no views? I can't live without views on drupal 6 lol (maybe not necessarily for frontoffice but to generate friendly customised lists in backoffice for my clients' needs).

Do you usually create a new content type or a new entity type? I am thinking that for very light/small entities linked to a node, it's preferable and makes more sense to create a new entity type. Have you personally used Entity Cache?

Anyway, thx for your views and insights on my questions; you have convinced me to stick with drupal :) hehe I am a big d6 fan; hope I will have a similar experience with my first real d7 development project. I am already getting more familiar with the new features of d7, and although the entity concepts are a bit hard to understand (haven't completely understood them yet), it is looking great and fun.

Jaypan’s picture

The Views module is basically just a graphic interface for creating SQL queries. It also adds a lot of formatting to the results of those SQL queries, so that the output can be styled. I prefer to bypass it altogether and write my own SQL queries, which gives me more flexibility and avoids both the overhead of loading the Views module and the need for additional modules to create queries that Views cannot create by itself. I then use more core methods of styling the output, usually View Modes.

The only time I really use Views is to create calendars, as the Calendar module seems to require views, though one of these days I'm going to dig through the code and see if I can't use some of the functions in the module to output calendars using the results of my own query.

I always create entity types. I haven't used nodes in over two years now in D7. Content types are based on nodes. Nodes were made essentially as a type of article, with an author, creation date, updated date, revisions, published status etc. If you are creating articles, then nodes are great. If you are creating anything else then while they work, I prefer to use something specifically made to that data type.

And yes, I use the Entity Cache module. It's one of the most important modules I figure, as without it, generating each entity requires pulling from many tables for every load. This will add a lot of overhead to a site.
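To make the trade-off concrete, here is a rough, language-neutral sketch in Python with sqlite3. It is not Drupal code and not how the Entity Cache module is implemented. The `question` base table, the `field_data_*` table layout (simplified to two columns), and the `EntityLoader` class are all assumptions made for illustration, and a plain dict stands in for whatever cache backend a real site would use. It only shows the general idea: assembling one entity from per-field tables costs several queries, and a cache means that cost is paid once.

```python
import sqlite3


def build_db():
    # Hypothetical schema: one base table plus one table per field.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE field_data_field_body (entity_id INTEGER, field_body_value TEXT);
        CREATE TABLE field_data_field_score (entity_id INTEGER, field_score_value INTEGER);
        INSERT INTO question VALUES (1, 'Is D7 scalable?');
        INSERT INTO field_data_field_body VALUES (1, 'Asking about field storage.');
        INSERT INTO field_data_field_score VALUES (1, 12);
    """)
    return db


class EntityLoader:
    """Assembles an entity from several tables and caches the result."""

    def __init__(self, db):
        self.db = db
        self.cache = {}    # stand-in for a real cache backend
        self.queries = 0   # counts queries actually sent to the database

    def _one(self, sql, entity_id):
        self.queries += 1
        return self.db.execute(sql, (entity_id,)).fetchone()[0]

    def load(self, entity_id):
        if entity_id in self.cache:
            return self.cache[entity_id]  # cache hit: no queries at all
        entity = {
            "id": entity_id,
            "title": self._one(
                "SELECT title FROM question WHERE id = ?", entity_id),
            "body": self._one(
                "SELECT field_body_value FROM field_data_field_body "
                "WHERE entity_id = ?", entity_id),
            "score": self._one(
                "SELECT field_score_value FROM field_data_field_score "
                "WHERE entity_id = ?", entity_id),
        }
        self.cache[entity_id] = entity
        return entity


loader = EntityLoader(build_db())
first = loader.load(1)   # three queries: base table plus two field tables
second = loader.load(1)  # served from the cache
```

A real cache also has to be invalidated whenever the entity is saved or deleted, which this sketch leaves out.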

gunjack07’s picture

yup i know views is just a gui to create queries but it helps create them very quickly and the import/export function is very useful when migrating a view from dev to production.

When you say you write your queries, do you use EntityFieldQuery or make direct queries with db_query?

Jaypan’s picture

yup i know views is just a gui to create queries but it helps create them very quickly and the import/export function is very useful when migrating a view from dev to production.

That's useful for development, but when you are considering building an enterprise-level site with tens of thousands of hits per day, then you need to consider the overhead on page requests ahead of convenience of development.

When you say you write your queries, do you use EntityFieldQuery or make direct queries with db_query?

I don't usually use EntityFieldQuery, as I've found it's not as flexible as I'd like at times. I usually use db_query() to select entity IDs, then use a _load_multiple() to load them. This is essentially the same process as using EntityFieldQuery() to get the entity IDs then loading the objects with those IDs.
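The two-step pattern described here (select IDs first, then bulk-load the full objects) can be sketched outside Drupal as follows. This is an illustrative Python/sqlite3 sketch, not Drupal API code: the `question` table and the `select_ids` and `load_multiple` helpers are hypothetical names chosen for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT, created INTEGER);
    INSERT INTO question VALUES (1, 'First', 100), (2, 'Second', 200), (3, 'Third', 300);
""")


def select_ids(db, since):
    """Step 1: a cheap query that returns only the matching IDs, newest first."""
    rows = db.execute(
        "SELECT id FROM question WHERE created >= ? ORDER BY created DESC",
        (since,),
    )
    return [row[0] for row in rows]


def load_multiple(db, ids):
    """Step 2: load the full records for those IDs in a single query."""
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = db.execute(
        f"SELECT id, title, created FROM question WHERE id IN ({placeholders})",
        ids,
    )
    by_id = {r[0]: {"id": r[0], "title": r[1], "created": r[2]} for r in rows}
    # Preserve the ordering chosen by the ID query.
    return {i: by_id[i] for i in ids if i in by_id}


ids = select_ids(db, 200)
questions = load_multiple(db, ids)
```

Keeping the two steps separate means the loading step can be swapped for a cached loader without touching the selection query.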

For insert, update and delete queries however I use db_insert(), db_update() and db_delete(), as db_query() should only be used for select queries.

gunjack07’s picture

What if you need to query a field attached to an entity? Do you use db_query for that also? If the storage type of the fields changes, will this still work if you use db_query?

Jaypan’s picture

What if you need to query a field attached to an entity? Do you use db_query for that also? If the storage type of the fields changes, will this still work if you use db_query?

If I need to query a field, I join the relevant tables and write my query accordingly. If the storage engine changes, then likely the query would stop working.
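As a rough illustration of both halves of that answer, the Python/sqlite3 sketch below joins a base table to a per-field table and then simulates the field moving out of SQL storage. It is not Drupal code: the table names, the simplified two-column field table, and the `ids_with_min_score` helper are assumptions made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE field_data_field_score (entity_id INTEGER, field_score_value INTEGER);
    INSERT INTO question VALUES (1, 'Low'), (2, 'High');
    INSERT INTO field_data_field_score VALUES (1, 2), (2, 40);
""")

# Hand-written join between the base table and one field table.
SQL = """
    SELECT q.id
    FROM question q
    JOIN field_data_field_score s ON s.entity_id = q.id
    WHERE s.field_score_value >= ?
    ORDER BY q.id
"""


def ids_with_min_score(db, minimum):
    return [row[0] for row in db.execute(SQL, (minimum,))]


popular = ids_with_min_score(db, 10)

# Simulate the field being moved to a different storage backend:
# its SQL table no longer exists, so the hand-written join breaks.
db.execute("DROP TABLE field_data_field_score")
try:
    ids_with_min_score(db, 10)
    still_works = True
except sqlite3.OperationalError:
    still_works = False
```

The query is tied to the physical table layout, which is the flexibility-versus-portability trade-off being discussed.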

question: when do you use property instead of field? why not just use property? (can you apply the formatters of field on properties? e.g. image?)

I rarely use properties unless I know for a fact it's something that won't need translation and it's something that won't need to tie in with other APIs. An example is the 'created' timestamp on an entity.

These are starting to get off the original topic, so if you have more questions it's probably better to start a new thread in the module development and code questions forum.

gunjack07’s picture

haha yup you are right. Will start a new topic. I usually use db_query in d6 for complex queries tbh, but I read it's no longer recommended in d7 as the field storage can vary. I did not know properties can't be translated.

gunjack07’s picture

question: when do you use property instead of field? why not just use property? (can you apply the formatters of field on properties? e.g. image?)