How to read in data

So you've run a bunch of simulations and gotten your data. Looking at it isn't hard: you can open the file in an editor and it's all right there. But you can't code with that. You want to be able to easily access atom positions, do calculations, and step through all the frames.

MP3 is designed to make this easy by handling the interface to the files for you. It uses an object-oriented design and is written entirely in python, which is easy to use and portable. To use it, you should first understand the basics of python and object-oriented design. Simply reading the python tutorial should be plenty. From here on, everything assumes that you've done that.

Everything is in the python namespace mp3. (Yes, I know what you're thinking. Yes, that's why we named it that, no, there's no reason and the acronym doesn't really mean anything. It's just funny.) Most things are directly in the mp3 namespace, but there are some sub-modules you may need to import at some point. I don't recommend doing from mp3 import *, since there is a lot of random stuff there which shouldn't be in your main namespace. (In the past, I didn't maintain that as well as I could have.)

So, let's get started.

{{{
cd tmpDir/   # some place we can do scratch work
wget
python
}}}

{{{>>> import mp3 }}}

The first task is to read in the coordinates. This is handled by what is called a cord-object. There are many variations on them, all suitable for reading from different file formats or doing different operations. We'll start out with CordDCD, which is designed to read DCD files (produced by NAMD and VMD, for example).

MP3 was designed with large files in mind. When you open a file, the data is not all available immediately. Some of the files my group deals with are huge, far larger than the available memory of most computers. Besides, loading everything at once is inefficient when only a small part of the data is needed at any one time.

So a design decision was made: MP3 operates on one data frame at a time. This minimizes the amount of data that has to be stored in memory, but slightly complicates accessing data. You have to explicitly tell mp3 to load the next frame, and once you do, the previous frame is lost. This method is suitable for a lot of analyses, but of course your code can always cache previous frames if need be.
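The caching mentioned above can be done in ordinary python. Here is a minimal sketch, assuming only the one-frame-at-a-time behaviour described here: FrameSource is a hypothetical stand-in for a cord object (it is not part of mp3), and last_frames is an illustrative helper. A collections.deque with maxlen keeps the most recent frames and drops older ones automatically.

```python
from collections import deque


class FrameSource:
    """Hypothetical stand-in for a cord object (not part of mp3).

    It hands out one frame at a time, like the reader described above.
    """

    def __init__(self, frames):
        self._frames = list(frames)
        self._index = -1          # no frame loaded yet

    def nframes(self):
        return len(self._frames)

    def nextframe(self):
        # Advance to the next frame and return it.
        self._index += 1
        return self._frames[self._index]


def last_frames(source, keep):
    """Read every frame from source, remembering only the last `keep`."""
    cache = deque(maxlen=keep)    # old frames fall off the left end
    for _ in range(source.nframes()):
        cache.append(source.nextframe())
    return list(cache)


# Five tiny "frames" of one atom each, moving along x.
source = FrameSource([[(float(i), 0.0, 0.0)] for i in range(5)])
recent = last_frames(source, keep=2)
```

If a reader reuses its internal buffer between frames, you would store a copy of each frame rather than the frame itself. Whether mp3 does this is not stated here, so check before relying on cached frames.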

So, now on to reading. The CordDCD object is initialized with the filename of the DCD file, like so:

{{{
>>> c = mp3.CordDCD("water1000.dcd")
>>> c
cord object 2478 atoms 10 frames
>>>
}}}

The string representation of the object shows the number of atoms and frames in it.

When the object is first created, no frame has been loaded yet. To load the next frame, use the nextframe() method.

{{{
>>> c.nextframe()
array([[-13.15221596, -6.29832411, 1.11053026],
       ...
}}}


This loads the next frame from the file, stores it, and returns it as well. The most recently loaded frame can be accessed again with the frame() method. It will be the same frame returned from the last nextframe() call.

{{{
>>> c.frame()
array([[-13.15221596, -6.29832411, 1.11053026],
       ...
}}}

Note that frame() is a method, not an attribute (an attribute would be accessed as c.frame, not c.frame()). Most properties are methods instead of attributes. This was a design decision made a while ago, and while it may not have been the best choice, it's being kept for backwards compatibility.
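A common slip is to forget the parentheses. The sketch below uses a hypothetical Box class (not part of mp3) to show what python does in each case: without the call you get the method object itself, not the data.

```python
class Box:
    """Hypothetical class whose data is exposed through a method."""

    def __init__(self, value):
        self._value = value

    def frame(self):
        return self._value


b = Box([1.0, 2.0, 3.0])

called = b.frame()      # the stored data
not_called = b.frame    # the bound method itself, not the data

is_data = isinstance(called, list)
is_method = callable(not_called)
```

No error is raised when the parentheses are left off, so the mistake usually only shows up later, when the code tries to index or do arithmetic on the method object.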

We might also want to know the total number of frames and the number of the frame we are on. These can be found with nframes() (number of frames) and framen() (frame number).

{{{
>>> c.nframes()
10
>>> c.framen()
0
}}}

Just like in standard python indexing, numbering starts from zero.
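Putting these methods together, a full pass over a trajectory might look like the sketch below. StubCord is a hypothetical stand-in (not part of mp3) that mimics only the four methods introduced so far, with the zero-based framen() shown above. Its value of -1 before any frame is loaded is an assumption of the stub, not documented mp3 behaviour. With a real file you would use the CordDCD object in its place.

```python
class StubCord:
    """Hypothetical stand-in mimicking nextframe/frame/nframes/framen."""

    def __init__(self, frames):
        self._frames = list(frames)
        self._n = -1                  # nothing loaded yet (stub assumption)

    def nframes(self):
        return len(self._frames)

    def framen(self):
        return self._n                # zero-based index of the loaded frame

    def nextframe(self):
        self._n += 1
        return self._frames[self._n]

    def frame(self):
        return self._frames[self._n]


def mean_x_per_frame(cord):
    """Return the mean x coordinate of every frame, in order."""
    means = []
    for _ in range(cord.nframes()):
        frame = cord.nextframe()      # previous frame is no longer needed
        xs = [atom[0] for atom in frame]
        means.append(sum(xs) / len(xs))
    return means


# Two frames of two atoms each, as (x, y, z) tuples.
cord = StubCord([
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
])
means = mean_x_per_frame(cord)
```

Only one frame is held at a time. The per-frame result (here a single number) is what gets accumulated, which is the usage pattern the one-frame-at-a-time design is meant for.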

So, just how do you use the data in the dcd? Presumably, you want to access atom positions. To do that, use the frame. Let's assign it to a variable to make the code more legible: {{{>>> frame = c.frame() }}}

frame is an array object. The mp3 tutorial will not go into the details of arrays, but these can be found in the numpy, numarray, or Numeric documentation. They are all python modules which do roughly the same task, though numpy is the most modern. The first axis indexes the atoms, and the second axis indexes the x,y,z coordinates of each atom. (Unfortunately, mp3 only works for three-dimensional data.) The atoms appear in the same order as in the file.
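The axis layout described above can be tried out on a small stand-alone numpy array. This is only a sketch with made-up coordinates. It assumes numpy is installed and that a real frame behaves like an ordinary (natoms, 3) array. The names center and dist01 are illustrative, not part of mp3.

```python
import numpy as np

# A made-up frame of three atoms; rows are atoms, columns are x, y, z.
frame = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 2.0, 6.0],
])

natoms, ndim = frame.shape          # three atoms, three coordinates

x_all = frame[:, 0]                 # x coordinate of every atom
center = frame.mean(axis=0)         # average over atoms -> one (x, y, z)

# Distance between the first and second atoms.
dist01 = float(np.linalg.norm(frame[1] - frame[0]))
```

Operations along axis 0 combine atoms (as in the center of geometry here), while indexing the second axis picks out a single coordinate for all atoms at once.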

{{{
>>> frame[0]      # x,y,z coordinates of the first atom
array([-13.15221596, -6.29832411, 1.11053026], type=Float32)

>>> frame[0,0]    # x coordinate of the first atom
-13.152215957641602
>>> frame[0,1]    # y coordinate of the first atom
-6.2983241081237793
>>> frame[0,2]    # z coordinate of the first atom
1.1105302572250366

>>> frame[:10]    # coordinates of the first ten atoms
array([[-13.15221596, -6.29832411, 1.11053026],
       ...
}}}


mp3/Tutorial-Reading (last edited 2008-03-10 01:38:34 by localhost)