Creating Drupal content using Python
I wanted to import data into Drupal from elsewhere - in my case, from XML exported by another system.
Perhaps the easiest way to interface with Drupal at a simple level is to use the Services module. This lets you, for example, create applications in other languages which can create, list, or update Drupal content. There is a mix of documentation for it, some of which is a bit dated.
So here's a quick example of how you can use a Python program to create a node in Drupal. Hope this helps to get you started if you need to do something similar.
#! /usr/bin/python
#
# Creating nodes in Drupal 6 via the Services module using
# Python and XML-RPC.
#
# To use this, get and install the Services module from
# http://drupal.org/project/services
#
# Enable the XMLRPC Server module, the System Service module and the
# Node Service module at least.
#
#
# Note that this is intended to demo quick importing of data from other
# systems. It makes no use of authentication, in fact you need to enable
# various permissions for anonymous users to be able to use it.
# Enable things like 'administer nodes', 'create page content',
# 'create url aliases' and 'administer url aliases'.
#
# You should remember, therefore, to DISABLE them before you go live
# with a production site!
#
# This assumes you have 'Use sessid' but not 'Use keys' enabled in the
# Services settings at http://yoursite/admin/build/services/settings.

import xmlrpclib, time

# Put the URL for your Service in here. See admin/build/services.
s = xmlrpclib.ServerProxy('http://myhost.com/services/xmlrpc')

class node:
    # You need to set uid and username appropriately for your site if you don't want
    # everything to be posted by Anonymous.
    def __init__(self, title, body, path, ntype='page', uid=1, username='qsf'):
        self.title = title
        self.body = body
        self.path = path
        self.type = ntype
        self.uid = uid
        self.name = username
        self.promote = False

try:
    sessid, user = s.system.connect()

    # Here you could read in the content for each node from some other source and do
    #   n = node(title, body, path)
    # but for now we'll just do:
    n = node('A test node', 'This is an interesting page', 'interesting')

    # and then save it into Drupal
    s.node.save(sessid, n)
    # where it should appear at /interesting.

except xmlrpclib.Fault, err:
    print "A fault occurred"
    print "Fault code: %d" % err.faultCode
    print "Fault string: %s" % err.faultString
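If you're on Python 3, xmlrpclib has been renamed xmlrpc.client. The following is only a rough sketch of what the node.save call looks like on the wire, so that you can inspect a request without a Drupal site to hand. The build_node helper and the 'dummy-sessid' value are my own placeholders, not part of the Services module, and I've used a plain dict here where the script above uses a class.

```python
import xmlrpc.client


def build_node(title, body, path, ntype="page", uid=1, username="qsf"):
    """Return the node fields as a plain dict, which XML-RPC sends as a struct.

    Hypothetical helper: it mirrors the attributes of the 'node' class above.
    """
    return {
        "title": title,
        "body": body,
        "path": path,
        "type": ntype,
        "uid": uid,
        "name": username,
        "promote": False,
    }


node = build_node("A test node", "This is an interesting page", "interesting")

# Serialise the call locally instead of sending it. "dummy-sessid" is a
# placeholder for whatever session id a real site hands back.
request_xml = xmlrpc.client.dumps(("dummy-sessid", node), methodname="node.save")
print(request_xml)

# Decoding the request again shows that the server would receive the
# session id followed by an ordinary struct of node fields.
params, method = xmlrpc.client.loads(request_xml)
print(method, params[1]["title"])
```

Printing the request like this is a handy way to check field names and types before you point the script at a real site.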
Importing content from Plone to Drupal
I used a variant of this script to convert a Plone-based site to Drupal. Whether any of the following would work for somebody else's site depends to a great degree on the particular configurations involved, but I offer it in case it's useful as inspiration! You'll almost certainly need to tweak it.
As those familiar with Plone will know, the content is stored in an object-oriented database, and mapping that onto other things can be tricky. Most of my content was reasonably straightforward, though, and I tackled the problem by switching on the Zope options to make the Plone site accessible by FTP. I could then use the FTP tool of my choice to access what looked like a filesystem and copy the whole hierarchy onto another machine.
My Plone content was basically HTML and images, and each node in Plone came across as a simple HTML file without all the boilerplate formatting. Perfect! I could then write a script which would take, for example, the file in news/article1, read it as XML, gather the bits I wanted and create a node in Drupal which also had news/article1 as the path.
The main bit of manual tweaking was to do with relative URLs. In Drupal you really want every internal link on your site to be an absolute path (i.e. to begin with '/'), because there are so many ways of chopping and displaying your content. On the old site, news/article1 might link to article2 in the same folder, which isn't going to work if news/article1 is sometimes accessed as node/29, or recentstuff/thisweek, or whatever. You need to look for links that don't begin either with http or with / and update them appropriately to /news/article2 or whatever. I didn't do that in the script because there were few enough that I could do it using global searches in my editor.
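If you have too many relative links to fix by hand, something along these lines might do the job. It's a minimal Python 3 sketch, not what I actually ran: the absolutise_links function is hypothetical, it assumes double-quoted href and src attributes, and besides http and '/' links it also leaves in-page anchors and other scheme-prefixed links such as mailto: alone, which is my own assumption about what you'd want.

```python
import posixpath
import re

# Matches href="..." or src="..."; assumes double-quoted attribute values.
ATTR_RE = re.compile(r'(?P<attr>\b(?:href|src))="(?P<url>[^"]*)"')
# Matches a leading URL scheme such as http:, https: or mailto:
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def absolutise_links(html, page_path):
    """Rewrite relative href/src values so that they begin with '/'.

    page_path is the path the HTML came from, e.g. 'news/article1'.
    """
    folder = posixpath.dirname("/" + page_path.lstrip("/"))

    def fix(match):
        url = match.group("url")
        # Leave empty values, site-rooted paths, in-page anchors and
        # anything with a scheme untouched.
        if not url or url.startswith(("/", "#")) or SCHEME_RE.match(url):
            return match.group(0)
        fixed = posixpath.normpath(posixpath.join(folder, url))
        return '%s="%s"' % (match.group("attr"), fixed)

    return ATTR_RE.sub(fix, html)


print(absolutise_links('<a href="article2">next</a>', "news/article1"))
```

A regular expression is a blunt tool for HTML, so check the results by eye; for anything complicated, rewriting the attributes on the parsed tree would be more robust.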
OK, so here's my import.py script. Having copied the Plone content to my local drive by FTP, I changed into the folder and did, for example:
./import.py news/article1 news/article2 ...
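Here it is; I hope it's reasonably self-explanatory. I use the tidy utility to convert sometimes non-conformant HTML into valid XML, and sed to throw away the namespace declarations that tidy adds, before parsing the result with ElementTree.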
#! /usr/bin/python
#
# This assumes you have 'Use sessid' but not 'Use keys' enabled in the
# Services settings at http://yoursite/admin/build/services/settings.
# You'll need the appropriate permissions set, and the Path module enabled.

import xmlrpclib, time, sys, subprocess, os
import xml.etree.ElementTree as ET

# Where is the XML-RPC service on your Drupal site?
s = xmlrpclib.ServerProxy('http://localhost:8888/services/xmlrpc')

class node:
    # You need to set uid and username appropriately for your site if you don't want
    # everything to be posted by Anonymous.
    def __init__(self, title, body, path, ntype='page', date=None, uid=1, username='qsf'):
        self.title = title
        self.body = body
        self.path = path
        self.type = ntype
        self.uid = uid
        self.name = username
        self.promote = False
        self.format = 3
        self.comment = 0
        if date:
            # self.created = date
            # self.changed = date
            self.date = date
            print "date = ",date

sessid, user = s.system.connect()

for path in sys.argv[1:]:
    try:
        # Read the XML file, tidying it up and making valid XML.
        tidypipe = subprocess.Popen(["tidy", "-q", "-asxml", "-n", path], stdout=subprocess.PIPE)

        # The 'tidy' process will sometimes add namespace declarations.
        # ElementTree stuff below gets messy if we have to parse namespaces
        # so I'm just going to throw them away before we read the file.
        nsstrip = subprocess.Popen(["sed", "s/^<html .*>$/<html>/"], stdin = tidypipe.stdout, stdout=subprocess.PIPE)

        # Read the file into an ElementTree
        tree = ET.parse(nsstrip.stdout)
        root = tree.getroot()

        desc = None
        date = None
        ntype = 'page'

        # Some of the info we want is in the <meta> tags.
        for i in root.findall(".//meta"):
            if i.get('name')=='Description': desc = i.get('content')
            if i.get('name')=='Effective_date': date = i.get('content')
            if i.get('name')=='Type' and i.get('content')=='News Item': ntype='story'

        # This is where we build up the body of our new node.
        body = ""

        # We turn what was the 'Description' in Plone into the teaser text in Drupal
        if desc:
            body += "<p>%s</p>\n<!--break-->\n" % desc

        # The title comes from <title>
        title = root.find('.//title').text

        # and the rest from <body>
        body_outer = root.find('.//body')

        # The <body> may have text that isn't inside any element.
        # (.text is None when there isn't any, so fall back to an empty string.)
        body += body_outer.text or ""

        # then we add all the elements.
        for n in body_outer:
            body += ET.tostring(n)

        # Strip off index_html, if that's part of the path
        path = path.replace('/index_html', '')

        # Show the user what's happening
        print "Title:",title
        print "Desc: ",desc
        print "Date: ",date

        # And create a new node
        n = node(title, body, path, ntype=ntype, date=date)
        s.node.save(sessid, n)

        # On a Mac this will open it in the browser so you can check it out.
        # os.system('open %s/%s' % ('http://localhost:8888', path))
        print "------------------------------------------"

    except xmlrpclib.Fault, err:
        print "======================================"
        print "A fault occurred with",path
        print "Fault code: %d" % err.faultCode
        print "Fault string: %s" % err.faultString
        print "======================================"
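The sed step in the script is a bit crude, since it relies on the html tag sitting on a line of its own. If that doesn't hold for your files, one alternative is to let ElementTree parse the namespaces and then remove them from the tag names afterwards. Here's a minimal Python 3 sketch of that idea; the strip_namespaces helper and the SAMPLE document are made up for illustration, and I haven't run this against real tidy output from Plone.

```python
import xml.etree.ElementTree as ET

# A made-up stand-in for the kind of XHTML that 'tidy -asxml' produces.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Article one</title>
<meta name="Description" content="A short summary" />
</head>
<body><p>Hello</p></body>
</html>"""


def strip_namespaces(root):
    """Remove the '{namespace}' prefix from every element tag, in place."""
    for element in root.iter():
        # Comments and processing instructions have non-string tags.
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ET.fromstring(SAMPLE))

# After stripping, the plain './/title' style lookups work as before.
title = root.find(".//title").text
desc = None
for meta in root.findall(".//meta"):
    if meta.get("name") == "Description":
        desc = meta.get("content")

body_html = "".join(
    ET.tostring(child, encoding="unicode") for child in root.find(".//body")
)
print(title, desc, body_html)
```

This keeps everything in one process and doesn't care how the html tag is laid out, at the cost of a pass over the whole tree.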
I don't normally have comments switched on on this site, so if you'd like to add any comments/tips, it might be best to do so on the Status-Q post.
Quentin Stafford-Fraser, Dec 2008.