Scanning dicts

alaraints · April 22, 2019, 11:48am

I have a function in my script that scans one file against another. I have solved it in awk:
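awk 'FNR==NR {
    a[$2] = $1    # file1: map the text in $2 to its value in $1
    next
}
{
    for (i in a)
        if ($2 ~ i) { $3 += a[i] }    # file2: add the value when $2 matches key i
    print    # write the (possibly updated) line to fileout
}' file1 file2 > fileout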

file1 and file2 both hold string and integer fields.

Can this be solved in Julia more efficiently? How?

Best regards,

alaraints

StefanKarpinski · April 22, 2019, 11:53am

I’m afraid that’s beyond my ability to understand awk. Could you explain what your script does?

alaraints · April 22, 2019, 2:41pm

In awk, $ denotes a field (column) of the current line.
The first half builds an associative awk array from file1: $2 is a short text such as ADFGT, and $1 is the corresponding value, for example 324. In Julia this would be called a dict, I guess?
The second half takes each text key i from the array a, one by one, and looks for a match in $2 of file2, which is a longer string. If a match is found, the corresponding $1 from file1 is added to $3 of the matching line.
file1 is 50k lines, file2 is 2M lines. Takes ages.
I am currently reading "Introducing Julia/Controlling the flow" on Wikibooks, and I have got as far as creating arrays. I still have to find the match function, and a couple of other things too. Just wondering if it is worth the effort. Thank you for your interest.
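
For reference, the two steps described above (build a lookup from file1, then test every key against $2 of each file2 line) might be sketched in Julia roughly as follows. This is only a sketch under assumptions: the helper names `build_lookup` and `scan_line` are made up for illustration, fields are assumed to be whitespace-separated, $1 of file1 and $3 of file2 are assumed to be integers, and `occursin` with a plain string does a literal substring test, whereas awk's `~` treats the key as a regular expression.

```julia
# Sketch only: `build_lookup` and `scan_line` are hypothetical helper names.

# Step 1 (awk: a[$2] = $1): map each short text key to its integer value.
function build_lookup(lines)
    lookup = Dict{String,Int}()
    for line in lines
        fields = split(line)
        length(fields) >= 2 || continue          # skip blank or short lines
        lookup[String(fields[2])] = parse(Int, fields[1])
    end
    return lookup
end

# Step 2 (awk: for (i in a) if ($2 ~ i) $3 += a[i]): add the value of every
# key that occurs inside field 2 to the number in field 3.
function scan_line(line, lookup)
    fields = String.(split(line))
    length(fields) >= 3 || return String(line)   # leave short lines unchanged
    total = parse(Int, fields[3])
    for (key, value) in lookup
        if occursin(key, fields[2])              # literal substring test
            total += value
        end
    end
    fields[3] = string(total)
    return join(fields, " ")
end

# Tiny in-memory example using the ADFGT / 324 pair mentioned above.
lookup = build_lookup(["324 ADFGT", "10 XYZ"])
println(scan_line("row1 QQADFGTQQ 5", lookup))   # prints "row1 QQADFGTQQ 329"
println(scan_line("row2 NOMATCH 7", lookup))     # prints "row2 NOMATCH 7"
```

With real files, `eachline("file1")` could supply the lines for `build_lookup`, and each line from `eachline("file2")` could be passed through `scan_line`. Note that this keeps the same structure as the awk version (every key is tested against every line), so on its own it does not reduce the amount of work.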

Tamas_Papp · April 22, 2019, 2:49pm

You may find the manual useful, especially the sections on strings and on using the language in general.

Depends on your payoff function. I can’t say whether it would be faster or slower than awk, especially if this is your first Julia program. But if you aim to learn Julia, programming a script like this is not a bad start.
