I have a CSV file that looks like the following:
ID L1 L2 L3 L4 X1 Y1 Z1
1 3 3 1 2 f f x
1 3 3 3 2 g f f
2 3 4 4 3 o p q
I want to use (ID, Li), where i = 1, 2, 3, 4, as the key and the frequency of occurrence as the value. I want each result as a list such as [1, 3, 5],
which represents the following:
<1, 3> appeared 5 times (i.e. for ID 1, the value 3 appeared 5 times across L1, L2, L3 and L4)
<1, 1> appeared 1 time
<1, 2> appeared 2 times
<2, 3> appeared 2 times
<2, 4> appeared 2 times
If a new (ID, value) pair turns up, it gets added; if the pair already exists, its count is incremented.
Here is what I have tried:
import csv
import sys
from collections import defaultdict
from itertools import imap
from operator import itemgetter
csv.field_size_limit(sys.maxsize)
d = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
with open(myfile, 'r') as fi:
    for item in csv.DictReader(fi):
        for count in range(1, 5):
            d[int(item['ID'])]['L'+str(count)][item['L'+str(count)]] += 1
But this creates separate counts for each of the columns L1 to L4, e.g. [1 (ID), 3 (L1), 2 (frequency)] and [1, 3 (L2), 2]. How can L1 to L4 be treated as one group, so that the frequency is counted by ID and the values found anywhere in L1 to L4?
You stated your problem well with the phrase "keeping (id, Li) as key", and Python lets you do exactly that. A tuple of hashable values (such as two ints) is itself hashable, so it is a valid dict key. Therefore, this will work:
import csv
from collections import defaultdict

counts = defaultdict(int)
# accumulate counts indexed by a tuple (id, Li); myfile is the CSV path, as in the question
with open(myfile, 'r') as fi:
    for item in csv.DictReader(fi):
        # int() here and below assumes you want the values as integers;
        # drop it if you want them as strings from the csv
        row_id = int(item["ID"])
        for col in ("L1", "L2", "L3", "L4"):
            counts[(row_id, int(item[col]))] += 1
# now, all that's left is to convert each entry in counts to the list that we want
for key, count in counts.items():
    # list() converts the tuple (id, Li) to [id, Li] so we can append the count to it
    lst = list(key) + [count]
    print(repr(lst))
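If you want to try the idea without the file on disk, here is a minimal sketch that uses collections.Counter in place of defaultdict(int). It assumes the file is comma-separated with the header row shown in the question. The SAMPLE string and the count_pairs helper are hypothetical names introduced only for illustration.

```python
import csv
import io
from collections import Counter

# Sample rows from the question, written with commas
# (an assumption about how the real file is delimited).
SAMPLE = """ID,L1,L2,L3,L4,X1,Y1,Z1
1,3,3,1,2,f,f,x
1,3,3,3,2,g,f,f
2,3,4,4,3,o,p,q
"""


def count_pairs(lines, columns=("L1", "L2", "L3", "L4")):
    """Count how often each (ID, value) pair occurs across the given columns."""
    counts = Counter()
    for row in csv.DictReader(lines):
        row_id = int(row["ID"])
        # One tuple per L column; Counter.update tallies each occurrence.
        counts.update((row_id, int(row[col])) for col in columns)
    return counts


counts = count_pairs(io.StringIO(SAMPLE))
# Flatten each ((id, value), n) entry into [id, value, n], sorted for stable output.
result = [[row_id, value, n] for (row_id, value), n in sorted(counts.items())]
print(result)
```

With the sample rows this prints [[1, 1, 1], [1, 2, 2], [1, 3, 5], [2, 3, 2], [2, 4, 2]], which matches the counts listed in the question. To read a real file, pass the open file object to count_pairs instead of the io.StringIO wrapper.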