
http://linuxmaster.tistory.com
http://cafe.naver.com/linuxmaster.cafe


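  Java: reading a large file into a String
  I have to load an attachment into a String, and this particular attachment (plain text) is 19 MB..
  I had written the program to read the file the usual way, with reader.readLine(), and to accumulate
  each line onto a String object. I came back from my overtime dinner and it had made no progress -.-;;;
  Isn't there something like C's fread? ~.~
  My guess is that creating a new String every time a line is read, and garbage-collecting the old one,
  adds a lot of overhead (each concatenation also copies everything accumulated so far)...
  So I googled it, and the answer was: use StringBuffer~~
  I tried it, and it finished while I stepped out to the restroom~~ Searching first really should be a habit~~
  For reference, the code is attached below.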

================================================================================================
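 // txt_attach_file and env_var.m_cr_lf are defined elsewhere in the program
 // (needs java.io.BufferedReader and java.io.FileReader)
 StringBuffer buffer = new StringBuffer();
 BufferedReader reader = new BufferedReader(new FileReader(txt_attach_file));
 String inputLine;
 while ((inputLine = reader.readLine()) != null) {
     // append() copies into one growing buffer instead of building a new String per line
     buffer.append(inputLine).append(env_var.m_cr_lf);
 }
 reader.close();
 System.out.println(buffer.toString());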


================================================================================================
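
A self-contained variant of the loop above might look like the sketch below. It is a rewrite under assumptions, not the original program: the class name `BigFileToString`, the method `readAll`, and the "\r\n" separator standing in for `env_var.m_cr_lf` are made up for illustration. It uses StringBuilder (the unsynchronized sibling of StringBuffer, available since Java 5) and try-with-resources (Java 7 and later). FileReader decodes with the platform default charset here, which may or may not match the attachment.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper class; not part of the original program.
final class BigFileToString {

    // Reads the whole text file into one String, appending lineSeparator after each line.
    static String readAll(String path, String lineSeparator) throws IOException {
        StringBuilder buffer = new StringBuilder();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // append() copies the characters into one growing buffer,
                // instead of allocating a new, longer String for every line
                buffer.append(line).append(lineSeparator);
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the text file to load
        String text = readAll(args[0], "\r\n");
        System.out.println(text.length() + " characters read");
    }
}
```

With plain String concatenation every added line copies the whole text read so far, so the work grows roughly with the square of the file size. The buffer only copies each line once, plus the occasional resize.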

 Indexing from a big file
From: Jon Skeet
Date: Thu, 26 Dec 2002, 8:51 AM
Email: Jon Skeet <s...@pobox.com>
Newsgroups: comp.lang.java.programmer

a...@pandora.nospam.be wrote:
> I'll first explain my situation and then you will see what my problem
> is. I have one big text-file from around 1GB and every line in the
> file starts with a name plus some extra information.

> There are 4000 lines in this file and for adding them to a database
> I'll have to search that file about 10.000 times.
> Now I can make an index of that file with only the names and such
> index takes just 100 kb. Then I could search the index-file, find out
> what linenumber it is and pick the line in the big file. But now I'm
> wondering if this goes faster at all. This because the only manner I
> know for going to for example line 1000 in the big file is:


> LineNumberReader in = new LineNumberReader
> (new FileReader("names1.txt"));  
> for (int i=0; i<1000; i++) in.readLine();


> I just read in the 1000 lines.
> Or are there other solutions ? I really hope so ...
> Maybe putting the file into a binary file and index with the name
> where a certain line begins (in bytes). But that'll make it a lot more
> complicated.


When you write your index, instead of writing the line number, write the
position in the file in terms of bytes, then you can seek there.

--
Jon Skeet
s...@pobox.com - http://www.pobox.com/~skeet
If replying to the group, please do not mail me at the same time


From: Michiel Konstapel
Date: Thu, 26 Dec 2002, 9:26 PM
Email: "Michiel Konstapel" <a...@me.nl>
Newsgroups: comp.lang.java.programmer


> >When you write your index, instead of writing the line number, write the
> >position in the file in terms of bytes, then you can seek there.
> And how can I get from the linenumber to the byte where some line
> begins ? Thx for the help


Don't use readLine() but read the file manually, looking for the appropriate
EOL sequence (usually \n for *nix text files, \r\n for Windows and \r for
Mac). Read the file character by character. When you find the end of a line,
store the index of the start of the next one somewhere. I'd go for an array,
using the line number as the array index and the position in the file as the
value. Then, open a RandomAccessFile and you can use seek() to go to the
start of any line you want.
HTH,
Michiel
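
The approach Jon and Michiel describe could be sketched roughly as follows. This is not code from the thread: the class `LineIndex` and its methods are hypothetical names, and the sketch assumes lines end in "\n" or "\r\n" (a lone "\r" is not handled). It scans the file once, records the byte offset where each line starts, and later uses RandomAccessFile.seek() to jump straight to a line.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class and method names are not from the thread.
final class LineIndex {

    // Returns the byte offset at which each line starts (line 0 starts at offset 0).
    static List<Long> buildIndex(String path) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long position = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(position); // this byte is the first one of a line
                    atLineStart = false;
                }
                position++;
                if (b == '\n') {
                    atLineStart = true; // the next byte, if any, begins a new line
                }
            }
        }
        return offsets;
    }

    // Jumps straight to the requested line instead of reading every line before it.
    static String readLineAt(String path, List<Long> offsets, int lineNumber) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offsets.get(lineNumber));
            // RandomAccessFile.readLine() maps each byte to one char,
            // so it is only safe for ASCII / Latin-1 text
            return file.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: file path, args[1]: zero-based line number
        List<Long> offsets = buildIndex(args[0]);
        System.out.println(readLineAt(args[0], offsets, Integer.parseInt(args[1])));
    }
}
```

For a name-based index as in the original question, the same offsets could be stored next to each name instead of in a list keyed by line number.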

From: Michiel Konstapel
Date: Thu, 26 Dec 2002, 10:03 PM
Email: "Michiel Konstapel" <a...@me.nl>
Newsgroups: comp.lang.java.programmer


"@did" <a...@pandora.nospam.be> wrote in message


news:hntl0v490rn1e9jmjjk6ijutarortooukr@4ax.com...


> Thx, I think it will become something like that. Except for the array,
> I tried to put only the 4000 names in a Vector and I got the
> "OutOfMemory" exception and I have 512 mb ram.


What exactly are you putting in the vector? 4000 doesn't seem like a lot,
unless they're all big somethings. Note that by default, java only uses 64
MB of RAM. Use the -Xmx command line switch to set the maximum amount it can
use, if needed.
Michiel
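
To see what limit a given JVM is actually running with, a small check like the one below may help. It is only a sketch: `HeapInfo` and `toMiB` are made-up names. The 64 MB default mentioned above reflects JVMs of that era; later JVMs generally derive the default from the machine's physical memory, so the printed value is worth checking rather than assuming.

```java
// Hypothetical class name; prints the heap limits of the running JVM.
final class HeapInfo {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        // maxMemory() is the ceiling the heap may grow to (what -Xmx controls)
        System.out.println("max heap:   " + toMiB(runtime.maxMemory()) + " MiB");
        // totalMemory() is what the JVM has reserved so far, freeMemory() the unused part of that
        System.out.println("total heap: " + toMiB(runtime.totalMemory()) + " MiB");
        System.out.println("free heap:  " + toMiB(runtime.freeMemory()) + " MiB");
    }
}
```

Running it once as `java HeapInfo` and once as `java -Xmx512m HeapInfo` should show the effect of the switch on the first line of output.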

From: Hayek
Date: Thu, 26 Dec 2002, 10:47 PM
Email: Hayek <haye...@nospam.xs4all.nl>
Newsgroups: comp.lang.java.programmer


@did wrote:


 >
 > Any help is welcome !!

I will send you a 5 mb dir file,
and a Java program that reads it.


On a pedestrian Celeron 700 with a 66 MHz bus it reads a
txt file of 5.8 MB in 300 msecs and parses it in 300 msecs.


For your file that would extrapolate to twice 51 seconds.


If you have a 2.1 GHz processor and faster RAM, that
would be twice 20 seconds. You might add more RAM, or
read the file in more passes.


I've been doing this since the advent of Apple ][ Pascal
in 1982, because the readln()'s were as horribly slow as
they still are under Java. Not as horrible in absolute
terms, but relatively so. So we looked for ways to make it
faster: read in a chunk of the file and do the parsing
directly from the byte array. When I started with Java, I
immediately used the same box of tricks, knowing that it
was interpreted, as Apple Pascal was then (the famous
P-system). There was no HotSpot compiler then, but this
piece of code clearly shows you what it is worth.
It is candiferous. If you do not know what candiferous
means rent the movie "Moulin Rouge".


I am looking forward to 64 bit computing and 12 gig of
ram, and of course 64 bit Java running on an amd Opteron.


Hayek.


--
The small particles wave at
the big stars and get noticed.
:-)
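
Hayek's program is not included in the thread, so the following is only a guess at the general shape of "read a chunk and parse the byte array directly". The class `ChunkScanner`, the method `countNewlines`, and the 1 MiB chunk size are all assumptions chosen for illustration, and the "parsing" is reduced to counting newline bytes.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of chunked reading; not Hayek's actual program.
final class ChunkScanner {

    // Counts '\n' bytes by scanning fixed-size chunks instead of calling readLine().
    // chunkSize must be greater than zero.
    static long countNewlines(String path, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        long newlines = 0;
        try (InputStream in = new FileInputStream(path)) {
            int read;
            // read() may fill only part of the array, so loop over the returned count
            while ((read = in.read(chunk)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (chunk[i] == '\n') {
                        newlines++;
                    }
                }
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        long start = System.currentTimeMillis();
        long count = countNewlines(args[0], 1 << 20); // 1 MiB chunks, an arbitrary choice
        System.out.println(count + " newlines in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Because no String is created per line, this kind of loop produces almost no garbage. The trade-off is that any real parsing has to deal with records that straddle two chunks.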


From: Michiel Konstapel
Date: Fri, 27 Dec 2002, 1:13 AM
Email: "Michiel Konstapel" <a...@me.nl>
Newsgroups: comp.lang.java.programmer

> >"@did" <a...@pandora.nospam.be> wrote in message
> >news:hntl0v490rn1e9jmjjk6ijutarortooukr@4ax.com...
> >> Thx, I think it will become something like that. Except for the array,
> >> I tried to put only the 4000 names in a Vector and I got the
> >> "OutOfMemory" exception and I have 512 mb ram.

> >What exactly are you putting in the vector? 4000 doesn't seem like a lot,
> >unless they're all big somethings. Note that by default, java only uses
> >64 MB of RAM. Use the -Xmx command line switch to set the maximum amount it
> >can use, if needed.
> >Michiel


> A name is about 40 characters, and when adding them all(4000) to a
> vector I get the "java.lang.OutOfMemoryError" exception. I'll give it
> a try with command line setting you proposed.


40*4000 = 160K characters, or about 320KB of memory. There must be something
else using all that memory. Care to post your code?
Michiel

Subject changed: Indexing from a big file - java.zip (0/1)
From: Michiel Konstapel
Date: Fri, 27 Dec 2002, 12:16 PM
Email: "Michiel Konstapel" <a...@me.nl>
Newsgroups: comp.lang.java.programmer


"@did" <a...@pandora.nospam.be> wrote in message


news:8kum0v4ppjld4rj1ceadvuqgq4pp439ga6@4ax.com...


> http://studwww.rug.ac.be/~ssteveli/java/java.zip

> This will be better ...


Yup, got it. Let me take a look....
You're right, the cause of the OutOfMemoryError is quite non-obvious but
some things I've read on the newsgroup pointed me in the right (I hope)
direction. This is your file reading loop:

  while(line != null) {
   size = line.length();
   name = line.substring(size-40,size);
   System.out.println(name);
   all.add(name);
   line = in.readLine();
  }


Looks fine. But the line.substring() is tripping you up: your lines are very
long (about 4K characters each) and though you are only interested in 40
characters of each, substring() actually returns a String instance that
*shares* the character array of the original String. That means, that the
whole 8KB (4K characters times 16 bits) of data is actually still referenced
(instead of only 80 bytes), so it can't be garbage collected. In the end
you'll actually have the entire file (64 megs) in memory - of which by
default you only have 64 MB...


The solution is to use the seemingly useless String constructor, new
String(String). What this does is copy the character data to a new,
minimally sized character array, which isn't shared with another instance.
That means that the old 4K characters can be garbage collected. After a
minor modification,


    name = new String(line.substring(size-40,size));


your code works fine, completing in about 15 seconds on my machine (Duron
800).
HTH,
Michiel
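
The fix Michiel describes can be tried in isolation with a sketch like the one below. The class `NameCollector` and the method `lastChars` are hypothetical and not taken from the posted java.zip; a StringReader stands in for the big file. Note that the sharing behaviour applies to JVMs of that time: from Java 7 update 6 onward, substring() copies the characters itself, so the extra constructor is redundant there, though harmless.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative and not from the posted java.zip.
final class NameCollector {

    // Keeps only the last `width` characters of every line.
    static List<String> lastChars(Reader source, int width) throws IOException {
        List<String> names = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int size = line.length();
                // new String(...) forces a private, minimally sized copy, so the long
                // line can be collected even on JVMs where substring() shares storage
                names.add(new String(line.substring(Math.max(0, size - width), size)));
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Reader demo = new StringReader("0123456789abcdef\nshort\n");
        System.out.println(lastChars(demo, 4)); // prints [cdef, hort]
    }
}
```

The Math.max guard is an addition for lines shorter than the requested width; the loop in the thread assumes every line has at least 40 characters.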


From: Michiel Konstapel
Date: Sat, 28 Dec 2002, 4:34 AM
Email: "Michiel Konstapel" <a...@me.nl>
Newsgroups: comp.lang.java.programmer


"@did" <a...@pandora.nospam.be> wrote in message


news:qj8o0vkkn074gbua0kr8ffs6roaiqcnos4@4ax.com...


> It had to be something like that, that the whole string stayed in the
> memory because when I wrote only those names into a file and read it
> from that I file there wasn't any problem. But thx a lot for reaching
> out the solution.


No problem, glad I could help. It's just one of those "you just have to
know" things.
HAND,
Michiel

From: John F. O'Brien
Date: Sat, 28 Dec 2002, 10:57 AM
Email: "John F. O'Brien" <jobr...@ticon.net>
Newsgroups: comp.lang.java.programmer


Would reading into a StringBuffer object (rather than String) have worked?


"Michiel Konstapel" <a...@me.nl> wrote in message


news:5U1P9.1891$T41.2098@amstwist00...



> "@did" <a...@pandora.nospam.be> wrote in message
> news:qj8o0vkkn074gbua0kr8ffs6roaiqcnos4@4ax.com...

> > It had to be something like that, that the whole string stayed in the
> > memory because when I wrote only those names into a file and read it
> > from that I file there wasn't any problem. But thx a lot for reaching
> > out the solution.


> No problem, glad I could help. It's just one of those "you just have to
> know" things.
> HAND,
> Michiel


From: Michiel Konstapel
Date: Sat, 28 Dec 2002, 2:05 PM
Email: "Michiel Konstapel" <a...@me.nl>
Newsgroups: comp.lang.java.programmer


> "Michiel Konstapel" <a...@me.nl> wrote in message
> news:5U1P9.1891$T41.2098@amstwist00...
> > "@did" <a...@pandora.nospam.be> wrote in message
> > news:qj8o0vkkn074gbua0kr8ffs6roaiqcnos4@4ax.com...

> > > It had to be something like that, that the whole string stayed in the
> > > memory because when I wrote only those names into a file and read it
> > > from that I file there wasn't any problem. But thx a lot for reaching
> > > out the solution.


> > No problem, glad I could help. It's just one of those "you just have to
> > know" things.


"John F. O'Brien" <jobr...@ticon.net> wrote in message
news:auj0fh$gsk$1@galileo.ticon.net...


> Would reading into a StringBuffer object (rather than String) have worked?


No, because readLine() would still have returned a String. Or do you mean
appending each line to a StringBuffer, like

  StringBuffer buffer = new StringBuffer();
  String line = in.readLine();
  while(line != null) {
     size = line.length();
     name = line.substring(size-40,size);
     System.out.println(name);
     // was: all.add(name);
     buffer.append(line).append("\n");
     line = in.readLine();
  }


In that case, yes. StringBuffer.append() copies the characters it appends,
not referencing the appended String. The temporary Strings referenced by
line and name would all be eligible for garbage collection.
Michiel


Posted by mastar