Python Beginner Notes: Read & Write Data
Keynotes: This article covers what I, an ordinary physics student and a Python beginner, learned while trying to solve a real problem I got for homework. It is about how to easily read experimental data spread across a bunch of files, write it all into one file, and prepare the data in the desired format.
Motivation: My first story on Medium is about the things I’ve learned as someone using Python for data manipulation and preparation for the first time. I’m a physics student currently working in a theoretical physics group, doing some specific numerical calculations in FORTRAN. Why do we use old-school FORTRAN? The first reason is tradition, and the second is that it performs best for the calculations we are doing now. We use a lot of code written 20+ years ago.
Problem: Last week I got a task to prepare some data. We had experimental data stored in 54 .txt files (actually .dat files, but that’s just a matter of convention), and I had to put it all into one file. Doing this in FORTRAN can be quite a lot of work, so I decided to take advantage of Python.
So, I’ll try to explain what I did, as a pure beginner. The first image shows the structure of a single .txt file that needs to be read. There are three columns. The first column is energy (numerical values), and it is the same in each of the 54 files. The second and third columns contain numerical values for a physical variable (Value1 and Value2). Each of the 54 files holds the data for a different variable, so there are 54 variables in total. Every file therefore has 3 columns and 285 rows. The screenshot shows just a few rows.
The idea was to get a single file with 285 lines, where the first column contains the energy values and the remaining 54x2 columns contain the values of the physical variables. A screenshot of part of such a file is given in the next image. In other words, I had to merge the files by columns.
Solution: The first step is to read the data from multiple files in a single folder. I used the pandas package and the glob module.
The great thing about glob is that I could read multiple files and concatenate them in a single line of code. But this gave me the first column (W) multiple times (at positions 1, 4, 7, 10, …). So I decided to remove it and manually put it back only once, as the first column. I did this in the second part of the code.
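The read-and-merge step can be sketched roughly as follows. This is a minimal sketch under assumptions, not the exact code used for the homework: the file names (`var01.txt`, …), the column names (`Value1_01`, …), and the three tiny stand-in files are invented so that the example runs anywhere.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd

# Create three small stand-in files (hypothetical names and values).
# The real data set had 54 files with 285 rows each.
data_dir = tempfile.mkdtemp()
energy = np.array([1.0, 1.5, 2.0, 2.5])
for i in range(1, 4):
    table = np.column_stack([energy, energy * i, energy + i])
    np.savetxt(os.path.join(data_dir, f"var{i:02d}.txt"), table)

# Read every file, ordered by name, into its own three-column table.
paths = sorted(glob.glob(os.path.join(data_dir, "*.txt")))
frames = [
    pd.read_csv(
        path,
        sep=r"\s+",          # columns are separated by whitespace
        header=None,         # the files have no header line
        names=["W", f"Value1_{n:02d}", f"Value2_{n:02d}"],
    )
    for n, path in enumerate(paths, start=1)
]

# Place the tables side by side; W now appears once per file.
merged = pd.concat(frames, axis=1)

# Drop every copy of W, then put a single copy back as the first column.
result = merged.drop(columns="W")
result.insert(0, "W", frames[0]["W"])

print(result)
```

Dropping all copies of `W` and re-inserting one relies on the energy column being identical in every file, as it was in this data set.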
The final step was to save it all to a single file. I used NumPy’s savetxt and pandas’ to_csv functions. Both gave the same result.
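A minimal sketch of the two ways of saving, assuming a small stand-in table and hypothetical output file names. The number format (`%.6e`) and the space separator are assumptions too; they are set explicitly in both calls so that the two files hold the same numbers.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for the merged table (the real one has 285 rows, 109 columns).
merged = pd.DataFrame({
    "W": [1.0, 1.5, 2.0],
    "Value1_01": [0.11, 0.12, 0.13],
    "Value2_01": [10.5, 11.5, 12.5],
})

out_dir = tempfile.mkdtemp()
numpy_path = os.path.join(out_dir, "merged_numpy.txt")    # hypothetical name
pandas_path = os.path.join(out_dir, "merged_pandas.txt")  # hypothetical name

# Option 1: NumPy writes the bare array of numbers.
np.savetxt(numpy_path, merged.to_numpy(), fmt="%.6e", delimiter=" ")

# Option 2: pandas writes the same numbers; header and index are switched off.
merged.to_csv(pandas_path, sep=" ", header=False, index=False,
              float_format="%.6e")

# Load both files back and check that they contain the same values.
from_numpy = np.loadtxt(numpy_path)
from_pandas = np.loadtxt(pandas_path)
same = np.allclose(from_numpy, from_pandas)
print(same)
```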
So, the final result is one file with the data arranged as shown in this image.
One comment on the code used: the command glob.glob('*.txt') returns files in arbitrary order. I used sorted(glob.glob('*.txt')) to read the files ordered by name. Other options are shown in the next code snippet.
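A few ways of ordering the file list are sketched below. The file names and the `numeric_key` helper are hypothetical, chosen to show where plain sorting by name can surprise: `file10.txt` sorts before `file2.txt` unless the number is compared as a number.

```python
import glob
import os
import re
import tempfile

# Create a few empty-ish stand-in files with hypothetical names.
data_dir = tempfile.mkdtemp()
for name in ["file10.txt", "file2.txt", "file1.txt"]:
    with open(os.path.join(data_dir, name), "w") as handle:
        handle.write("0 0 0\n")

pattern = os.path.join(data_dir, "*.txt")

# Alphabetical order by name: file1, file10, file2.
by_name = sorted(glob.glob(pattern))


def numeric_key(path):
    """Return the first number in the file name, or -1 if there is none."""
    match = re.search(r"(\d+)", os.path.basename(path))
    return int(match.group(1)) if match else -1


# Order by the number inside the name: file1, file2, file10.
by_number = sorted(glob.glob(pattern), key=numeric_key)

# Order by last modification time, oldest first.
by_mtime = sorted(glob.glob(pattern), key=os.path.getmtime)

names_by_name = [os.path.basename(p) for p in by_name]
names_by_number = [os.path.basename(p) for p in by_number]
print(names_by_name)
print(names_by_number)
```

Zero-padding the numbers in the file names (`file01.txt`, `file02.txt`, …) makes plain `sorted` sufficient, because alphabetical and numerical order then agree.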
Issue to solve: I did not manage to export the header (column names) so that the names were aligned with the columns of numerical values. When I used pandas, I got the result shown.
I hope that a solution to this last issue will come in the comments from more experienced coders.
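One direction that might help with the alignment, offered as a sketch under assumptions rather than a confirmed fix: give every number and every column name the same fixed field width, and pass the padded names to savetxt as the header. The width of 14, the `%.6e` format, and the stand-in table are all assumptions; the width has to be at least as long as the longest column name and the longest formatted number.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the merged table (hypothetical column names and values).
merged = pd.DataFrame({
    "W": [1.0, 1.5, 2.0],
    "Value1_01": [0.11, 0.12, 0.13],
    "Value2_01": [10.5, 11.5, 12.5],
})

width = 14  # assumed field width for both names and numbers

# Right-justify each column name in a field of the same width as the numbers.
header = " ".join(f"{name:>{width}}" for name in merged.columns)

# A text buffer stands in for an output file path here.
buffer = io.StringIO()
np.savetxt(
    buffer,
    merged.to_numpy(),
    fmt=f"%{width}.6e",   # every number padded to the same width
    delimiter=" ",
    header=header,
    comments="",          # no leading "# " in front of the header line
)
aligned_text = buffer.getvalue()
print(aligned_text)
```

Passing a file name instead of `buffer` writes the same text to disk.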