Heya! I'm working on a project for a client who is using Drupal 7. Among other things, we want to perform an audit of the content currently stored in their Drupal instance. Specifically, I want to create a list of every page / link on the site and any related files / downloads that Drupal is managing for them. 

Looking at the APIs, I honestly have no idea where to start here. I am also digging around the database and, while I see the nodes and various field_ tables that seem to have the content stored in them, I see no obvious way they're connected. 

Can someone please point me in the right direction to get started here? I assumed every URL in Drupal mapped back to a node, and that seems to pan out looking at the database, but it's not at all clear how the rest of the content of a page gets built for that path, nor is it obvious how to build the URL from the node data. 

How would you build a list of all links and files like the one I'm describing?

Thanks in advance!

Comments

vm’s picture

Every node has a nid (node ID), which generates a yoursite.com/node/[nid] path.

If the Path or Pathauto modules are in use, they add aliases on top of that to produce readable URLs, based on parameters set in the configuration of those modules. Note that Pathauto depends on the Token module, which allows tokens to be used to build out readable paths.
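
To make the path logic concrete, here is a minimal sketch of how a URL list could be pulled from the database. It assumes the standard Drupal 7 `node` and `url_alias` tables. The in-memory SQLite database, the trimmed-down columns, the sample rows, the `BASE_URL` value and the `list_page_urls` helper are all stand-ins for illustration, not anything from the client site.

```python
import sqlite3

BASE_URL = "https://yoursite.com"  # assumption: replace with the real site root

# Stand-in database: trimmed-down versions of Drupal 7's node and url_alias
# tables, filled with made-up sample rows.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT, status INTEGER);
CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, source TEXT, alias TEXT);
INSERT INTO node VALUES (1, 'page', 'About us', 1), (2, 'article', 'News item', 1);
INSERT INTO url_alias VALUES (1, 'node/1', 'about-us');
""")

# Every node has the system path node/<nid>. When Path or Pathauto has created
# a readable alias, url_alias.source holds that system path.
QUERY = """
SELECT n.nid, n.title, 'node/' || n.nid AS system_path, a.alias
FROM node n
LEFT JOIN url_alias a ON a.source = 'node/' || n.nid
WHERE n.status = 1          -- published nodes only
ORDER BY n.nid
"""

def list_page_urls(conn, base_url=BASE_URL):
    """Return (nid, title, url) rows, preferring the alias when one exists."""
    rows = []
    for nid, title, system_path, alias in conn.execute(QUERY):
        rows.append((nid, title, f"{base_url}/{alias or system_path}"))
    return rows

pages = list_page_urls(db)
for nid, title, url in pages:
    print(nid, title, url)
```

On a real Drupal 7 site the database is usually MySQL, where the string concatenation would be written `CONCAT('node/', n.nid)` instead of `||`. A node with more than one alias (for example one per language) would show up as several rows with this join.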

rzazueta’s picture

OK, that makes sense in terms of how the paths are created - thank you!

So, how does Drupal get all of the content to build the page for that node? Looking in the DB, I'm not seeing how the keys align at all. 

vm’s picture

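My understanding is that the fields on a node are tied to it by the node ID: each field_data_* table has entity_type and entity_id columns, and for nodes entity_id holds the nid.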

Edit: the response to the question at https://stackoverflow.com/questions/7773025/drupal-7-node-fields-mapping... may provide the information you are seeking.
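
For the files side of the audit, here is a minimal sketch assuming the standard Drupal 7 `file_managed` and `file_usage` tables, where `file_usage.type` and `file_usage.id` point at the entity (here a node) that uses the file. The in-memory SQLite database, the sample rows, the `PUBLIC_FILES` value (based on the default `sites/default/files` location, which a site can change) and the `public_url` helper are assumptions for illustration.

```python
import sqlite3

# Assumption: the site uses the default public files directory.
PUBLIC_FILES = "https://yoursite.com/sites/default/files/"

# Stand-in database with trimmed-down Drupal 7 tables and made-up sample rows.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE file_managed (fid INTEGER PRIMARY KEY, filename TEXT, uri TEXT);
CREATE TABLE file_usage (fid INTEGER, module TEXT, type TEXT, id INTEGER, count INTEGER);
INSERT INTO node VALUES (1, 'About us'), (2, 'News item');
INSERT INTO file_managed VALUES (10, 'brochure.pdf', 'public://docs/brochure.pdf');
INSERT INTO file_usage VALUES (10, 'file', 'node', 1, 1);
""")

# file_usage links a managed file (fid) to the entity that references it.
QUERY = """
SELECT n.nid, n.title, f.filename, f.uri
FROM file_usage u
JOIN file_managed f ON f.fid = u.fid
JOIN node n ON n.nid = u.id
WHERE u.type = 'node'
ORDER BY n.nid, f.fid
"""

def public_url(uri):
    """Turn a public:// stream URI into a web URL; leave other schemes as-is."""
    scheme = "public://"
    if uri.startswith(scheme):
        return PUBLIC_FILES + uri[len(scheme):]
    return uri  # e.g. private:// files have no direct public URL

files = [
    (nid, title, filename, public_url(uri))
    for nid, title, filename, uri in db.execute(QUERY)
]
for row in files:
    print(*row)
```

This only covers files Drupal tracks as managed files. Files linked by hand inside body text may not appear in `file_usage`, so a crawler-based check is still a useful cross-reference.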

jaypan’s picture

You can try one of the various sitemap modules to get a list of page URLs on the site. As for file downloads, that would be harder.

I would actually probably use an external link checker that goes through the entire site and builds a list of all links. If you are on a Mac, the free Integrity program is what I use. It will crawl your entire site for a list of links. However, I'm not sure whether that program can collect links that are only visible to authenticated users; it may only cover what anonymous users see.


ressa’s picture

For collecting links Fink is also pretty cool:

Fink (pronounced "Phpink") is a command line tool, written in PHP, for checking HTTP links

You can download it, and run it like this, resulting in a report (report.json):

$ wget https://github.com/dantleech/fink/releases/download/0.10.3/fink.phar
$ php fink.phar https://example.org -x0 -oreport.json