Results 1 to 10 of 10

  1. Post
    #1

    Spreadsheet within DB

    Hey guys,
    I'm just trying to decide on the best way to support a "database within a database" concept. Basically, the situation is we have a web application where users can upload big-ass spreadsheets. They can use a form to enter some data, and an AJAX request runs against one of the columns from their spreadsheet to find a match; if it matches, some other fields get populated with the associated row data.

    In terms of numbers we are expecting spreadsheets of up to 20 columns and maybe 30,000 rows. We could have a few hundred of them.

    Our (ex) developer performed some "tests" and decided EAV would be best. Well, his test must have been 3 columns and 10 rows, because EAV seems to suck at these volumes no matter what we do with indexes. Like almost 5 minutes to load a spreadsheet sometimes, which obviously won't work for AJAX. The ideas we have come up with:

    - Find a magical solution to make EAV work
    - Have one table per spreadsheet (leaning towards this atm)
    - Look at no-sql DB (scary)
    - Maybe store the databases as files on the filesystem and use PHP to parse and "query" them (probably bad idea)

    What are people's thoughts on this? We are using MySQL 5 for the DB server but could implement something else.
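
    Edit: for anyone curious why EAV dies at this volume, here's roughly the query shape it forces. A 30,000-row, 20-column sheet is already 600,000 attribute rows, and a few hundred sheets puts the table past 100 million. This is only a sketch; the table and column names are made up for illustration:

    Code:
    <?php
    // Hypothetical EAV table: eav(sheet_id, row_id, col_name, value).
    // One AJAX lookup = find the matching row_id, then pivot 20
    // attribute rows back into a single spreadsheet row.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $sql = "
        SELECT row_id,
               MAX(CASE WHEN col_name = 'barcode' THEN value END) AS barcode,
               MAX(CASE WHEN col_name = 'name'    THEN value END) AS name
               -- ...and so on, one MAX(CASE ...) per column, 20 times over
        FROM eav
        WHERE sheet_id = ?
          AND row_id = (SELECT row_id FROM eav
                        WHERE sheet_id = ? AND col_name = 'barcode'
                          AND value = ? LIMIT 1)
        GROUP BY row_id";
    $stmt = $pdo->prepare($sql);
    $stmt->execute(array(345, 345, 'ABC123')); // hypothetical id and needle
    $row = $stmt->fetch(PDO::FETCH_ASSOC);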

  2. Post
    #2
    My first thought is: KILL IT! KILL IT WITH FIRE!

    My second thought is: so they upload the spreadsheet, and you store it in the DB somehow, server-side, to run some queries against? Are the spreadsheets of a consistent schema? Do you always have the same columns?

    Your queries run against one column and then retrieve the rest of the row's information based on that, right?

    Lastly, spreadsheet != DB. Unless you're doing it wrong with Excel and Access glued together via COM/OLE/Satan's infernal power.

  3. Post
    #3
    Edward Diego wrote:
    Lastly, spreadsheet != DB.
    Tell DOC that. I've seen their Excel "databases".

  4. Post
    #4
    Edward Diego wrote:
    Unless you're doing it wrong with Excel and Access glued together via COM/OLE/Satan's infernal power.
    Ha. I did this once. Only once. It was horrible.

    ED speaks much truth. However, as far as I can figure out, you only have 2 real options if each spreadsheet is unique, and you've mentioned both:

    1. Table for each spreadsheet
    2. Store the file on your web server and store a link in your database along with a table with the reference column information.

    Option two seems a bit ridiculous to me though, as you will have to do a DB lookup (which will be fast), then parse the document (slowish). And how much information you need to store in the database to allow users to search (i.e. which columns they can search on) will determine whether this is even feasible.

    Option one on the other hand requires a lot more database room, but should be relatively fast if you index columns correctly. Given the limited options available I'd probably be leaning toward a table for each spreadsheet if database storage isn't an issue.
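
    If it helps, option one is only a few lines to set up. A rough sketch in PHP, assuming column names have been whitelisted upstream and $pdo is a connected PDO handle (function and table names here are invented):

    Code:
    <?php
    // Sketch: one table per uploaded spreadsheet, with an index on the
    // column the AJAX lookup matches against.
    function create_sheet_table(PDO $pdo, $sheetId, array $columns, $lookupCol)
    {
        $defs = array();
        foreach ($columns as $col) {
            $defs[] = "`$col` VARCHAR(255) NOT NULL DEFAULT ''";
        }
        $defs[] = "INDEX idx_lookup (`$lookupCol`)"; // the match column

        $sql = "CREATE TABLE `sheet_" . (int) $sheetId . "` ("
             . "row_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, "
             . implode(', ', $defs)
             . ") ENGINE=InnoDB DEFAULT CHARSET=utf8";
        $pdo->exec($sql);
    }

    With the index on the match column, a point lookup over 30,000 rows should come back in milliseconds, which is fine for AJAX.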

  5. Post
    #5
    Ah ye olde spreadsheet upload, always a pain in the rear.

    We do something similar on one of our sites, where a user can upload a list of barcodes and product names to search for matching stuff in our system. We add all the matches to their cart/basket and all the non-matches to their watch list, and let them export their watch list.

    How complicated are the spreadsheets? Are they straight data (headings, cells, rows), or are they using graphs, formulas, etc.? If you can stipulate they have to upload CSVs, for example, DO IT; it's way simpler... otherwise, the Java-based POI library is your friend for reading/writing .xls. For PHP I think you can use the PHP/Java Bridge to leverage it. For us, the site that mucks around with spreadsheets is in ASP.NET, so we're using the C# port of POI called NPOI, which works well.

    Anyway, based on what you've said so far, I would favour parsing the spreadsheet into DB tables after upload, doing all the "work" on the data while it's in the DB, and providing an export back out to spreadsheet feature.
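
    For what it's worth, the CSV-to-table load is short in PHP. A sketch, assuming the per-sheet table already exists and $pdo, $csvPath and $sheetId come from the upload handler:

    Code:
    <?php
    // Stream the uploaded CSV into its table row by row; fgetcsv()
    // keeps memory flat even on a 30,000-row file.
    $fh = fopen($csvPath, 'r');
    $header = fgetcsv($fh); // first row = column headings
    $placeholders = implode(',', array_fill(0, count($header), '?'));
    $stmt = $pdo->prepare(
        "INSERT INTO `sheet_$sheetId` (`" . implode('`,`', $header) . "`)"
        . " VALUES ($placeholders)"
    );

    $pdo->beginTransaction(); // one transaction = much faster bulk load
    while (($row = fgetcsv($fh)) !== false) {
        $stmt->execute($row);
    }
    $pdo->commit();
    fclose($fh);

    If the loads ever feel slow, MySQL's LOAD DATA INFILE will pull in the same CSV far faster than row-by-row inserts.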

  6. Post
    #6
    If you're doing this in .NET world, my simple suggestion is

    a) get a machine with some RAM
    b) load spreadsheet into RAM
    c) do LINQ to it

    If you're doing it in PHP... give http://phplinq.codeplex.com/ a shot. It will still need some RAM if the spreadsheets are large, though.
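
    I don't remember PHPLinq's exact syntax off-hand, so here's the plain-PHP equivalent of the idea (the column name and needle are made up):

    Code:
    <?php
    // Plain-PHP stand-in for the LINQ approach: parse the sheet into
    // an array of rows and filter it in memory. file() slurps the
    // whole file, so this only suits sheets that fit in RAM.
    $rows = array_map('str_getcsv', file('sheet.csv'));
    $header = array_shift($rows);
    $col = array_search('barcode', $header); // hypothetical match column

    $matches = array_filter($rows, function ($row) use ($col) {
        return $row[$col] === '9300601123456'; // hypothetical needle
    });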

  7. Post
    #7
    Thanks for the info guys,

    Luckily it's pretty straightforward, i.e. just a CSV (data) upload, so no graphs or anything like that. Just straight UTF-8 characters for each cell, probably no more than a couple of hundred characters long, but more like 10-20 per cell.

    We don't really even need to do an export of the spreadsheets, but we probably will for backup purposes. OK, so at this stage I'm getting the vibe that we should go for the individual table approach?

    That PHPLinq looks pretty cool, ED. RAM isn't too much of an issue, costing like $10 per GB these days. The only thing is that if we do it like that in PHPLinq, wouldn't it need to reload the entire spreadsheet each time an AJAX request runs = very disk/CPU/mem intensive?

  8. Post
    #8
    Zeon wrote:
    The only thing is that if we do it like that in PHPLinq, wouldn't it need to reload the entire spreadsheet each time an AJAX request runs = very disk/CPU/mem intensive?
    Holy shit I'd hope not. Surely you could keep it in memory between requests somehow? I'm really bad at PHP, but I presume there's a way to persist state to memory across requests.

  9. Post
    #9
    Edward Diego wrote:
    Holy shit I'd hope not. Surely you could keep it in memory between requests somehow? I'm really bad at PHP, but I presume there's a way to persist state to memory across requests.
    The only way I could think of achieving that with PHP is using something like Redis and serializing. PHP has no support in the language or interpreter for such behaviour AFAIK, and if it did, it would destroy your computer in a flame of memory leaks.
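
    Something along these lines, using the phpredis extension (the key name and path are invented for the example):

    Code:
    <?php
    // Cache the parsed sheet in Redis so each AJAX request skips the
    // CSV re-parse; serialize()/unserialize() round-trip the array.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    $key = 'sheet:345'; // hypothetical sheet id
    $cached = $redis->get($key);
    if ($cached !== false) {
        $rows = unserialize($cached); // warm path: no disk, no parse
    } else {
        $rows = array_map('str_getcsv', file('/uploads/sheet_345.csv'));
        $redis->set($key, serialize($rows));
        $redis->expire($key, 3600); // keep it warm for an hour
    }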

  10. Post
    #10
    te_chris wrote:
    The only way I could think of achieving that with PHP is using something like Redis and serializing. PHP has no support in the language or interpreter for such behaviour AFAIK, and if it did, it would destroy your computer in a flame of memory leaks.

    ...man, in that case... um... do you have to use PHP for it? Worst case, 200 chars * 20 cols * 30,000 rows in CSV is roughly 120 MB. You can't re-read that again and again. You could load it into memory reasonably fast, then persist it to disk and reload it as needed, but hmm...

    This screams for it to be stuck in memory and associated with a session. Otherwise, use a key-value store of your choosing. Create a new bucket for each new spreadsheet, make each column header a key, and store each column's data as a string-represented list value. Each row is also stored in a key/val entry indexed by the row's index. So if you find 'foo' in columnA at position 645, you retrieve the row with key 645 to emulate select * from sheet where columnA = 'foo'. You also have the option of storing individual rows as JSON or column-name/value pairs to make them saner to work with.
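
    A sketch of that layout on Redis via phpredis, with invented key names (note a plain hash index only handles unique cell values; duplicate values would need a list per value):

    Code:
    <?php
    // Bucket layout: one hash per indexed column mapping cell value
    // => row number, plus one JSON blob per row.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    // At upload time, for each row:
    $rowIndex = 645;                                          // hypothetical
    $row = array('columnA' => 'foo', 'columnB' => 'Monkeys'); // hypothetical
    $redis->hSet('sheet:345:idx:columnA', $row['columnA'], $rowIndex);
    $redis->set('sheet:345:row:' . $rowIndex, json_encode($row));

    // At query time, emulate: select * from sheet_345 where columnA = 'foo'
    $hit = $redis->hGet('sheet:345:idx:columnA', 'foo');
    if ($hit !== false) {
        $match = json_decode($redis->get('sheet:345:row:' . $hit), true);
    }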

    Anything's got to be better than writing queries for MySQL like

    Code:
    SELECT col_1, col_2, col_55 FROM sheet_345 WHERE col_7 LIKE 'Monkeys';
    Which is what Zeon sounds close to doing.