20170213

Collating a list in Go

GOLANG: So I am working on an admin tool for a database backend. One of the reports wanted is to extract from a table the number of messages in a given network, which sounds easier than it is. The database entry for a message has quite a few fields, but the only one I am interested in is the network ID, which is stamped into each message entry. So there are a lot of network IDs that repeat, and I have to count and collate them for the report.

In this case, it means finding which networks are near or at the limit for free use, which is to say between 40 and 50 messages in the network. And doing this without hitting the backend database any more than strictly necessary. With well over 100k individual messages in the database spread across some 5k networks, the obvious solution is to just fetch and cache the entire table column of network IDs. Each ID being 64 bits, that is a transfer of roughly 800 kB, which is not too bad.

Then sort through it all. It took a bit of head-scratching to think the problem through. And a pot of coffee.

Eventually I came up with this:

---

type collate_t struct {
  id    int64
  count int
  // whatever else you need
}

func collateList(entry []collate_t) []collate_t {

  var collated []collate_t

  for i := range entry {
    // Skip IDs that have already been counted and registered.
    if getExist(collated, entry[i].id) {
      continue
    }

    // Count this occurrence, then scan the rest of the slice for repeats.
    collected := collate_t{id: entry[i].id, count: 1}
    for j := i + 1; j < len(entry); j++ {
      if collected.id == entry[j].id {
        collected.count++
      }
    }

    collated = append(collated, collected)
  }
  return collated
}

func getExist(c []collate_t, id int64) bool {
  for i := range c {
    if id == c[i].id {
      return true
    }
  }
  return false
}


---

That is *NOT* the first version - it is, however, the *WORKING* version 😁

Oh, and I did not include the cutoff function that trims the fat down to only network IDs with between 40 and 50 messages in them - as that is rather simple.
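For completeness, here is a minimal sketch of what such a cutoff could look like - the function name `cutoff` and its bounds parameters are my own invention for illustration, not code from the actual application:

```go
package main

import "fmt"

// collate_t mirrors the struct from the post: a network ID and its message count.
type collate_t struct {
	id    int64
	count int
}

// cutoff keeps only entries whose count lies in [lo, hi] - the
// "between 40 and 50 messages" window from the report.
func cutoff(c []collate_t, lo, hi int) []collate_t {
	var out []collate_t
	for _, e := range c {
		if e.count >= lo && e.count <= hi {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	in := []collate_t{{id: 1, count: 12}, {id: 2, count: 44}, {id: 3, count: 50}}
	fmt.Println(cutoff(in, 40, 50)) // → [{2 44} {3 50}]
}
```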

The code is presented here a bit more generally than what I've used in the actual application - as that is mired in a function with some postgres SQL queries and a few other bits and bobs. Anyhow, I think the solution is quite efficient - even though it does not use any of Go's concurrency goodness.

The short of it is to iterate over the incoming entry slice and transfer the ID to a temp struct. Check if that ID has already been counted and registered in the collated list. If not, run through the remainder of the incoming slice and count the number of hits on that ID. Finally, append the newly collected ID and count to the collated list.
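To make the walkthrough concrete, here is the collation run on a tiny hand-made slice (the IDs and counts are made up for illustration); the types and functions simply reproduce the code above so the example is self-contained:

```go
package main

import "fmt"

// collate_t and the two functions below reproduce the code from the post.
type collate_t struct {
	id    int64
	count int
}

// getExist reports whether id is already registered in the collated list.
func getExist(c []collate_t, id int64) bool {
	for i := range c {
		if id == c[i].id {
			return true
		}
	}
	return false
}

// collateList returns one entry per distinct ID with its total count.
func collateList(entry []collate_t) []collate_t {
	var collated []collate_t
	for i := range entry {
		if getExist(collated, entry[i].id) {
			continue // already counted this ID
		}
		collected := collate_t{id: entry[i].id, count: 1}
		for j := i + 1; j < len(entry); j++ {
			if collected.id == entry[j].id {
				collected.count++
			}
		}
		collated = append(collated, collected)
	}
	return collated
}

func main() {
	// Five messages spread over two networks.
	msgs := []collate_t{{id: 7}, {id: 9}, {id: 7}, {id: 7}, {id: 9}}
	fmt.Println(collateList(msgs)) // → [{7 3} {9 2}]
}
```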

Pretty pleased with it, and I just had to reward myself with a Guinness for the effort - and to calm my little tummy after all the coffee it took to reach that solution!
