Python Beginner Notes: Read & Write Data

Physics Student
Published in The Startup
4 min read · Nov 10, 2020

Keynotes: This article covers what an ordinary physics student, and a beginner in Python, learned while trying to solve a real problem I got for my homework. It is about how to easily read experimental data spread across a bunch of files, write it all into one file, and prepare the data in the wanted format.

Photo by Laura Kapfer on Unsplash

Motivation: My first story on Medium is about what I learned the first time I tried to use Python for data manipulation and preparation. I’m a physics student currently working in a theoretical physics group, doing some specific numerical calculations in FORTRAN. Why do we use old-school FORTRAN? The first reason is tradition, and the second is that it works best for the calculations we are doing now. We also use a lot of code written 20+ years ago.

Problem: Last week I got a task to prepare some data. We had experimental data in 54 .txt files (actually .dat files, but that is just a matter of convention), and I had to put it all into one file. Doing this in FORTRAN can be quite a lot of work, so I decided to take advantage of Python.

So, I’ll try to explain what I did, as a pure beginner. The first image shows the structure of a single .txt file that needs to be read. There are three columns. The first column is the energy (numerical values), and it is the same in each of the 54 files. The second and third columns contain numerical values for a physical variable (Value1 and Value2). Each of the 54 files gives the data for a different variable. So every file has 3 columns and 285 rows. The screenshot shows just a few rows.

Input files — numeric values sorted in three columns with 285 lines (rows)

The idea was to get a single file with 285 lines, in which the first column contains the energy values and the next 54×2 columns contain the values of the physical variables. A screenshot of part of such a file is given in the next image. So I had to merge the files by columns.

The final result is going to be as shown.

Solution: The first step is to read the data from multiple files in a single folder. I used the pandas and glob packages.
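The original code was posted as an image, so here is only a minimal sketch of how this reading step might look. It assumes the files are whitespace-separated and contain numbers only (no header row); the function name `read_data_files` and the `COLUMNS` list are illustrative, not taken from my original script.

```python
import glob
import os

import pandas as pd

# Assumed column names: "W" is the energy column, the other two are placeholders.
COLUMNS = ["W", "Value1", "Value2"]


def read_data_files(folder, pattern="*.txt"):
    """Return (paths, frames): one DataFrame per file matching `pattern`."""
    # sorted() fixes the order, since glob.glob makes no ordering promise.
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    frames = [
        # sep=r"\s+" splits on any run of spaces or tabs;
        # header=None assumes the files contain numbers only.
        pd.read_csv(path, sep=r"\s+", header=None, names=COLUMNS)
        for path in paths
    ]
    return paths, frames
```

Calling `read_data_files(".")` in the data folder would then give a list with one three-column table per file.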

The great thing about combining glob with pandas is that I could read multiple files and concatenate them in a single line of code. But this gave me the first column (W) multiple times (positions 1, 4, 7, 10, …). So I decided to remove it and manually put it back only as the first column. I did this in the second part of the code.
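A rough sketch of this merge step, assuming each file has already been read into a DataFrame with columns `W`, `Value1` and `Value2`. The helper name `merge_by_columns` and the numeric suffixes are illustrative choices. Instead of concatenating everything and deleting the repeated W columns afterwards, this version drops W from each table first and then puts a single copy in front, which gives the same layout.

```python
import pandas as pd


def merge_by_columns(frames, key="W"):
    """Place the tables side by side, keeping the `key` column only once."""
    # The energy column is assumed identical in every file, so take the first.
    energy = frames[0][key]
    # Drop the repeated key column and tag the rest with the file number,
    # e.g. Value1_1, Value2_1, Value1_2, ...
    values = [
        frame.drop(columns=key).add_suffix(f"_{number}")
        for number, frame in enumerate(frames, start=1)
    ]
    # axis=1 concatenates by columns instead of stacking rows.
    return pd.concat([energy] + values, axis=1)
```

With 54 input tables of 285 rows, the result would have 285 rows and 1 + 54×2 = 109 columns.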

Data sorted in the wanted format (W in the first column, other variables in the second, third, etc. columns).

The final step was to save it all to a single file. I used NumPy’s savetxt and pandas’ to_csv functions. Both gave the same result.
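As a sketch of the two saving options, assuming the merged table holds only floating-point numbers: the function names and the `%.6e` number format are illustrative choices, not necessarily the ones I used.

```python
import numpy as np
import pandas as pd


def save_with_numpy(df, path):
    """Write the numeric values with numpy.savetxt, one row per line."""
    np.savetxt(path, df.to_numpy(), fmt="%.6e", delimiter=" ")


def save_with_pandas(df, path):
    """Write the same values with DataFrame.to_csv, without header or index."""
    df.to_csv(path, sep=" ", header=False, index=False, float_format="%.6e")
```

Both functions write space-separated numbers without column names, so reading either file back gives the same values.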

So, the final result is one file with the data sorted as shown in this image.

Final result — data sorted in single file.

One comment on the code used: glob.glob('*.txt') returns the file names in arbitrary order. I used sorted(glob.glob('*.txt')) to read the files ordered by name. Other options are shown in the next code.
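A small sketch of two ordering options, since plain alphabetical sorting puts `data10.txt` before `data2.txt`. The helper names `natural_key` and `list_files` are made up for this example, and the "natural" ordering is just one possible alternative, not necessarily the one from my original code.

```python
import glob
import os
import re


def natural_key(path):
    """Sort key that compares the digit runs in a file name as numbers."""
    name = os.path.basename(path)
    # re.split with a capturing group keeps the digit runs in the result.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def list_files(folder, pattern="*.txt", natural=False):
    """Return matching paths sorted by name, or in natural (numeric) order."""
    paths = glob.glob(os.path.join(folder, pattern))
    return sorted(paths, key=natural_key) if natural else sorted(paths)
```

If the file names are zero-padded (`data01.txt`, `data02.txt`, …), plain `sorted` already gives the numeric order.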

Issue to solve: I did not manage to export the header (column names) so that it is aligned with the columns of numerical values. When I use pandas, I get the result as shown.

Header and columns are not properly aligned.

I hope that a solution for this last issue will come in the comments from more experienced coders.
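One possible direction, offered only as a sketch and not as a tested answer for my real data, is to write every column with the same fixed width and pad the column names to that width. The function name `save_aligned` and the default width of 16 characters are assumptions for illustration; it also assumes all columns are numeric and all column names are shorter than the chosen width.

```python
import numpy as np
import pandas as pd


def save_aligned(df, path, width=16):
    """Write `df` as fixed-width text so the header sits above its columns."""
    # Right-align every column name in a field of `width` characters.
    header = " ".join(f"{name:>{width}}" for name in df.columns)
    # The same width in the number format keeps the values under their names.
    # comments="" stops NumPy from prefixing the header line with "# ".
    np.savetxt(path, df.to_numpy(), fmt=f"%{width}.6e", delimiter=" ",
               header=header, comments="")
```

Because names and numbers are both right-aligned in fields of equal width, every line in the file has the same length and each name ends exactly where its numbers end.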
