Even for analysis, use cases may be so different, and the corresponding tools so diverse, that the only format in which you can store your own copy of the data is something as simple as CSV or JSON.
For example, if you want running statistics, CSV is good enough: it's a row-based format that you can read line by line without worrying about memory requirements. I don't know whether the same is possible with HDF5, but it wouldn't give any real advantage in this scenario anyway.
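To make this concrete, here is a minimal sketch of running statistics over a CSV file, reading one row at a time so memory use stays constant regardless of file size (the file name `data.csv` and the column name `value` are made up for illustration):

```python
import csv

count = 0
mean = 0.0
m2 = 0.0  # running sum of squared deviations (Welford's online algorithm)

# Read the CSV line by line; only one row is in memory at a time.
with open("data.csv", newline="") as f:
    for row in csv.DictReader(f):
        x = float(row["value"])  # hypothetical numeric column
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)

variance = m2 / count if count else 0.0
print(f"n={count}, mean={mean:.4f}, variance={variance:.4f}")
```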
When your data is really huge, Hadoop FS may be helpful (I'll write "Hadoop FS" instead of the shorter "HDFS" to avoid confusion with "HDF5"). Hadoop FS supports any kind of file, but the choice of input and output formats (something you need in order to split data into chunks and process them in parallel efficiently) is very limited, and HDF5 isn't one of them. Hadoop and Spark have their own efficient formats, like Parquet, but using them automatically makes your data a tough fruit for other tools.
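For instance, a typical Spark workflow is to ingest a CSV once and rewrite it as Parquet, which Spark can then split and scan in parallel. A minimal PySpark sketch, assuming a running Spark installation; the `hdfs://` paths are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

# Read the raw CSV from Hadoop FS and rewrite it as Parquet,
# a splittable columnar format Spark processes efficiently in parallel.
df = spark.read.csv("hdfs:///data/events.csv", header=True, inferSchema=True)
df.write.mode("overwrite").parquet("hdfs:///data/events.parquet")

# Downstream Spark jobs read the Parquet copy; tools outside the
# Hadoop/Spark ecosystem, however, may struggle with it.
parquet_df = spark.read.parquet("hdfs:///data/events.parquet")
print(parquet_df.count())

spark.stop()
```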
Finally, if you want to access individual records from your data in random order, you'll have to use some kind of indexed database (like PostgreSQL or MongoDB). In this case you don't care about the format at all; you just use the API of that database.
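As a sketch of what that looks like, here is random-order lookup through a database API, using SQLite from Python's standard library purely as a stand-in for PostgreSQL or MongoDB (table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect("records.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, payload TEXT)"
)
conn.executemany(
    "INSERT OR REPLACE INTO records (id, payload) VALUES (?, ?)",
    [(1, "first"), (42, "forty-second"), (1000, "thousandth")],
)
conn.commit()

# The primary-key index makes lookups in arbitrary order cheap;
# the on-disk storage format is the database's concern, not ours.
for record_id in (1000, 1, 42):
    row = conn.execute(
        "SELECT payload FROM records WHERE id = ?", (record_id,)
    ).fetchone()
    print(record_id, row[0])

conn.close()
```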