Story Time With Penaz: The Database Project

A story of deadlines, despair and a dash of SQL. With a bitter lesson at the end.

This is a small story and a lesson I'll need to remember for the days to come. The situation: I'm doing a database for my University exam, it's about the day before the day you have to turn in the project, everything's perfect, the database works without errors, all the triggers work and functions are perfect, given the test data (about 3 rows per table) I made by hand, I receive the data to fill into the database from my project mate, everything looks great.

I open my browser and start Google SpreadSheet (an overall nice web app) and download the TSV (Tab Separated Values) file generated. I source the sql file and there seem to be no errors, I do "SELECT * FROM Sede;", everything works fine. I do a second trial with "SELECT * FROM Dipendente;" and this happens:

SQL Horror

A bit surprised I decide to take a look at the generated data, everything seems fine.

Half an hour passes without any progress, I decide to show only part of the data with "SELECT Matricola FROM Dipendente;" and the column shows fine, even more baffled I keep on going with my research and at the 60 minute mark from the beginning of this nightmare I manage to isolate the issue to the last column of the table, but it doesn't look like I'm getting closer to the solution. A bit of  sweat comes out of my forehead.

I take out my trusty editor (NeoVIM) again and use ":set list" to see if there are any weird (non visibile) characters inside the file. Nothing.

I hit the 90 minute mark, I start getting angry. I try solutions like adding "FIELDS ENCLOSED BY " ' ";  " But this makes the situation even worse, making the MariaDB engine throw me 400 warnings, all telling me that data was cropped or that there were too many columns on the file, compared to the table inside the database.

The two hour mark is hit, and I am completely clueless, I don't even do any more research on the issue, I gave up and thought that I wouldn't be able to turn in the project and miss months of work, plus months of waiting for the next exam session to come. I was in despair, hitting my desk in frustration every time a solution I tried didn't work.

Then for some reason, without even realizing, I searched on Google "Line terminators" and notice a thing: - Linux/Unix use one character (known as "\n") to tell a line is terminated and go to new line - Windows uses a series of two characters ("\r\n") as line terminators - At the 2.5 hour mark, almost in a zombie-like state I add the magic words: LINES STARTING BY ''' TERMINATED BY '\r\n';

And source the database for I think the 50th time, and scared I do the query: "SELECT * FROM Dipendente;"

Everything works. I'm in tears, no warnings for that table, I won.

I decide to take all the data and pass it through dos2unix, so to change all the line terminators.

I don't know how, but I managed to solve a crippling issue given by the fact that at the end of a line there were two invisible characters instead of one.

Was it divine intervention, something coming from the back of my memory, this probably saved me months of university.

Something to notice is, using Google SpreadSheets to download a TSV from a Linux machine, I get a file with Windows-style line terminators.

And if we assume that TSV files were originally made only for Windows, why does MariaDB on Linux lose its mind when loading them without a specific setting?

Something I won't understand honestly.

Thank you everybody for reading this, and I wish you the best. Have a good one.