A sliding window protocol is a feature of packet-based data transmission protocols. Sliding window protocols are used where reliable in-order delivery of packets is required, such as in the data link layer (OSI layer 2) as well as in the Transmission Control Protocol (TCP). Conceptually, each portion of the transmission (packets in most data link layers, but bytes in TCP) is assigned a unique consecutive sequence number, and the receiver uses the numbers to place received packets in the correct order, discarding duplicate packets and identifying missing ones.
- Data Link Layer In The Internet
- Program For Sliding Window Protocol In Java
- Sliding Window Protocol Program In Java
The problem with this scheme is that there is no limit on the size of the sequence number that may be required. By placing limits on the number of packets that can be transmitted or received at any given time, a sliding window protocol allows an unlimited number of packets to be communicated using fixed-size sequence numbers. On the transmitter side, the term 'window' represents the logical boundary of the total number of packets not yet acknowledged by the receiver. In each acknowledgment packet, the receiver informs the transmitter of its current maximum receive buffer size (the window boundary).
The TCP header uses a 16-bit field to report the receive window size to the sender. Therefore, without the window scale option, the largest window that can be advertised is $2^{16} - 1 = 65{,}535$ bytes, roughly 64 KiB. In slow-start mode, the transmitter starts with a low packet count and increases the number of packets in each transmission after receiving acknowledgment packets from the receiver.
For every acknowledgment (ack) packet received, the window slides by one packet (logically) to transmit one new packet. When the window threshold is reached, the transmitter sends one packet for each ack packet received. If the window limit is 10 packets, then in slow-start mode the transmitter may start by transmitting one packet, followed by two packets (before transmitting two packets, the ack for the first packet has to be received), followed by three packets, and so on up to 10 packets. After reaching 10 packets, however, further transmissions are restricted to one packet transmitted for each ack packet received. In a simulation, this appears as if the window moves forward by one packet for every ack packet received. On the receiver side, the window also moves by one packet for every packet received. The sliding window method helps avoid traffic congestion on the network.
The application layer can keep offering data to TCP for transmission without worrying about network congestion, because TCP on the sender and receiver sides implements sliding windows over its packet buffers. The window size may vary dynamically depending on network traffic. For the highest possible throughput, it is important that the transmitter is not forced to stop sending by the sliding window protocol earlier than one round-trip delay time (RTT). The limit on the amount of data that it can send before stopping to wait for an acknowledgment should be larger than the bandwidth-delay product of the communications link.
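To make the bandwidth-delay condition concrete, here is a small Java sketch. The link speed (100 Mbit/s) and RTT (50 ms) are assumed, illustrative numbers, not values from the text. It computes how many bytes must be in flight to keep the link busy, and the throughput ceiling that a fixed 65,535-byte window (the unscaled 16-bit TCP limit mentioned above) would impose on that link.

```java
/** Sketch: compare a link's bandwidth-delay product with a fixed window size. */
class BandwidthDelay {
    /** Bytes that must be unacknowledged in flight to keep the link busy for one RTT. */
    static long requiredWindowBytes(long bitsPerSecond, double rttSeconds) {
        return Math.round(bitsPerSecond * rttSeconds / 8.0);
    }

    /** Throughput ceiling in bits/s when at most windowBytes may be outstanding per RTT. */
    static double windowLimitedThroughput(long windowBytes, double rttSeconds) {
        return windowBytes * 8.0 / rttSeconds;
    }

    public static void main(String[] args) {
        long link = 100_000_000L; // assumed 100 Mbit/s link
        double rtt = 0.050;       // assumed 50 ms round-trip time
        System.out.println("Bandwidth-delay product: " + requiredWindowBytes(link, rtt) + " bytes");
        System.out.printf("Ceiling with a 65,535-byte window: %.2f Mbit/s%n",
                windowLimitedThroughput(65_535, rtt) / 1e6);
    }
}
```

With these assumptions the link needs about 625,000 bytes in flight, far more than 65,535, so the window rather than the link would cap throughput at roughly 10.5 Mbit/s. This is the situation TCP's window scale option was designed for.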
If the limit is smaller than the bandwidth-delay product, the protocol will limit the effective bandwidth of the link.

Motivation

In any communication protocol based on automatic repeat request (ARQ) for error control, the receiver must acknowledge received packets. If the transmitter does not receive an acknowledgment within a reasonable time, it re-sends the data.
A transmitter that does not hear an acknowledgment cannot know whether the receiver actually received the packet; the packet may have been lost or damaged in transmission. If error detection reveals corruption, the packet is ignored by the receiver and no acknowledgment is sent. Similarly, the receiver is usually uncertain whether its acknowledgments are being received. An acknowledgment may have been sent but lost or corrupted in the transmission medium. In this case, the receiver must acknowledge the retransmission to prevent the data from being continually resent, but must otherwise ignore it.

Protocol operation

The transmitter and receiver each have a current sequence number, $n_t$ and $n_r$, respectively.
They each also have a window size, $w_t$ and $w_r$. The window sizes may vary, but in simpler implementations they are fixed. The window size must be greater than zero for any progress to be made. As typically implemented, $n_t$ is the next packet to be transmitted, i.e. the sequence number of the first packet not yet transmitted. Likewise, $n_r$ is the first packet not yet received. Both numbers are monotonically increasing with time; they only ever increase. The receiver may also keep track of the highest sequence number yet received; the variable $n_s$ is one more than the highest sequence number received. For simple receivers that only accept packets in order ($w_r = 1$), this is the same as $n_r$, but it can be greater if $w_r > 1$.
Note the distinction: all packets below $n_r$ have been received, no packet numbered $n_s$ or higher has been received, and between $n_r$ and $n_s$, some packets have been received. When the receiver receives a packet, it updates its variables appropriately and transmits an acknowledgment with the new $n_r$. The transmitter keeps track of the highest acknowledgment it has received, $n_a$. The transmitter knows that all packets up to, but not including, $n_a$ have been received, but is uncertain about packets between $n_a$ and $n_s$; i.e. $n_a \le n_r \le n_s$.

The sequence numbers always obey the rule $n_a \le n_r \le n_s \le n_t \le n_a + w_t$. That is:

- $n_a \le n_r$: The highest acknowledgment received by the transmitter cannot be higher than the highest $n_r$ acknowledged by the receiver.
- $n_r \le n_s$: The span of fully received packets cannot extend beyond the end of the partially received packets.
- $n_s \le n_t$: The highest packet received cannot be higher than the highest packet sent.
- $n_t \le n_a + w_t$: The highest packet sent is limited by the highest acknowledgment received and the transmit window size.

Transmitter operation

Whenever the transmitter has data to send, it may transmit up to $w_t$ packets ahead of the latest acknowledgment $n_a$. That is, it may transmit packet number $n_t$ as long as $n_t < n_a + w_t$.

Receiver operation

Every time a packet numbered $x$ is received, the receiver checks whether it falls in the receive window, $n_r \le x < n_r + w_r$. If it does, the receiver accepts it. If $x = n_r$, the receive sequence number $n_r$ is increased by 1, and possibly more if further consecutive packets were previously received and stored. If $x > n_r$, the packet is stored until all preceding packets have been received. If $x \ge n_s$, the latter is updated to $n_s = x + 1$.
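The receiver-side rules can be sketched in Java as follows. This is a minimal illustration, not a reference implementation: it assumes unbounded (non-wrapping) sequence numbers, uses hypothetical names (`SlidingWindowReceiver`, `onPacket`), and also applies the discard-and-reacknowledge rule for packets outside the window. Each call returns the cumulative acknowledgment $n_r$ and checks the invariant $n_r \le n_s$.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of the receiver side with unbounded (non-wrapping) sequence numbers. */
class SlidingWindowReceiver {
    private final int wr;                              // receive window size w_r
    private long nr = 0;                               // every packet below n_r has been received
    private long ns = 0;                               // one more than the highest packet received
    private final Set<Long> stored = new HashSet<>();  // accepted packets not yet passed over by n_r

    SlidingWindowReceiver(int wr) {
        if (wr < 1) throw new IllegalArgumentException("w_r must be at least 1");
        this.wr = wr;
    }

    /** Processes packet x and returns the acknowledgment to send, which is always the current n_r. */
    long onPacket(long x) {
        if (x >= nr && x < nr + wr) {                  // inside the window n_r <= x < n_r + w_r
            stored.add(x);
            while (stored.remove(nr)) {                // advance n_r over consecutive stored packets
                nr++;
            }
            if (x >= ns) ns = x + 1;
        }
        // Outside the window (old duplicate or too far ahead): discard, n_r and n_s unchanged.
        if (nr > ns) throw new IllegalStateException("invariant n_r <= n_s violated");
        return nr;
    }

    long nr() { return nr; }
    long ns() { return ns; }

    public static void main(String[] args) {
        SlidingWindowReceiver rx = new SlidingWindowReceiver(2);
        System.out.println(rx.onPacket(0)); // 1: packet 0 accepted
        System.out.println(rx.onPacket(2)); // 1: packet 2 stored, packet 1 still missing
        System.out.println(rx.onPacket(1)); // 3: gap filled, n_r jumps past 1 and 2
        System.out.println(rx.onPacket(1)); // 3: duplicate, ignored but re-acknowledged
    }
}
```

With $w_r = 1$ the same class behaves like the simple in-order receiver, since only packet $n_r$ is ever inside the window.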
If the packet's number is not within the receive window, the receiver discards it and does not modify $n_r$ or $n_s$. Whether the packet was accepted or not, the receiver transmits an acknowledgment containing the current $n_r$. (The acknowledgment may also include information about additional packets received between $n_r$ and $n_s$, but that only helps efficiency.) Note that there is no point in having the receive window $w_r$ larger than the transmit window $w_t$, because there is no need to worry about receiving a packet that will never be transmitted; the useful range is $1 \le w_r \le w_t$.

Sequence number range required

[Figure: sequence numbers modulo 4, with $w_r = 1$. Initially, $n_t = n_r = 0$.]

So far, the protocol has been described as if sequence numbers are of unlimited size, ever-increasing.
However, rather than transmitting the full sequence number $x$ in messages, it is possible to transmit only $x \bmod N$, for some finite $N$. ($N$ is usually a power of 2.) For example, the transmitter will only receive acknowledgments in the range $n_a$ to $n_t$, inclusive. Since it guarantees that $n_t - n_a \le w_t$, there are at most $w_t + 1$ possible sequence numbers that could arrive at any given time. Thus, the transmitter can unambiguously decode the sequence number as long as $N > w_t$. A stronger constraint is imposed by the receiver.
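A minimal Java sketch of that transmitter-side decoding, assuming acknowledgments carry only $x \bmod N$. Because every acknowledgment lies between $n_a$ and $n_t$, and $n_t - n_a \le w_t < N$, the full value is recovered by adding the forward distance from $n_a$, taken modulo $N$. The class and method names are illustrative.

```java
/** Sketch: recover a full acknowledgment number from its value modulo N. */
class AckDecoder {
    /**
     * @param na      highest acknowledgment received so far (full, unwrapped value)
     * @param ackModN acknowledgment field as received, in [0, N)
     * @param n       sequence number modulus N; assumed to exceed the transmit window w_t
     */
    static long decode(long na, int ackModN, int n) {
        long forward = Math.floorMod(ackModN - na, (long) n); // distance ahead of n_a, in [0, N)
        return na + forward;
    }

    public static void main(String[] args) {
        int n = 8;      // 3-bit sequence numbers
        long na = 13;   // 13 mod 8 == 5
        System.out.println(decode(na, 1, n)); // 1 is 4 steps ahead of 5 (mod 8), so this prints 17
    }
}
```

The decoding is only unambiguous while $N > w_t$; if $w_t$ reached $N$, an acknowledgment equal to $n_a \bmod N$ could mean either "nothing new" or "everything new".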
The operation of the protocol depends on the receiver being able to reliably distinguish new packets (which should be accepted and processed) from retransmissions of old packets (which should be discarded, and the last acknowledgment retransmitted). This can be done given knowledge of the transmitter's window size. After receiving a packet numbered $x$, the receiver knows that $x < n_a + w_t$, so $n_a > x - w_t$. Thus, packets numbered $x - w_t$ will never again be retransmitted. The lowest sequence number we will ever receive in the future is $n_s - w_t$. The receiver also knows that the transmitter's $n_a$ cannot be higher than the highest acknowledgment ever sent, which is $n_r$. So the highest sequence number we could possibly see is $n_r + w_t \le n_s + w_t$. Thus, there are $2w_t$ different sequence numbers that the receiver can receive at any one time.
It might therefore seem that we must have $N \ge 2w_t$. However, the actual limit is lower. The additional insight is that the receiver does not need to distinguish between sequence numbers that are too low (less than $n_r$) or that are too high (greater than or equal to $n_s + w_r$). In either case, the receiver ignores the packet except to retransmit an acknowledgment.
Thus, it is only necessary that $N \ge w_t + w_r$. As it is common to have $w_r < w_t$, this permits a larger $w_t$ within a fixed $N$.

Go-Back-N ARQ is the sliding window protocol with $w_t > 1$, but a fixed $w_r = 1$. The receiver refuses to accept any packet but the next one in sequence. If a packet is lost in transit, following packets are ignored until the missing packet is retransmitted, a minimum loss of one round-trip time. For this reason, it is inefficient on links that suffer frequent packet loss.

Ambiguity example

Suppose that we are using a 3-bit sequence number, such as is typical for HDLC.
This gives $N = 2^3 = 8$. Since $w_r = 1$, we must limit $w_t \le 7$. This is because, after transmitting 7 packets, there are 8 possible results: anywhere from 0 to 7 packets could have been received successfully. That makes 8 possibilities, and the transmitter needs enough information in the acknowledgment to distinguish them all.
If the transmitter sent 8 packets without waiting for acknowledgment, it could find itself in a quandary similar to the stop-and-wait case: does the acknowledgment mean that all 8 packets were received successfully, or none of them?

Selective repeat

The most general case of the sliding window protocol is Selective Repeat ARQ. This requires a much more capable receiver, which can accept packets with sequence numbers higher than the current $n_r$ and store them until the gap is filled in. The advantage, however, is that it is not necessary to discard following correct data for one round-trip time before the transmitter can be informed that a retransmission is required. This is therefore preferred for links with low reliability and/or a high bandwidth-delay product. The window size $w_r$ need only be larger than the number of consecutive lost packets that can be tolerated. Thus, small values are popular; $w_r = 2$ is common.
Ambiguity example

The extremely popular HDLC protocol uses a 3-bit sequence number and has optional provision for selective repeat. However, if selective repeat is to be used, the requirement that $w_t + w_r \le 8$ must be maintained; if $w_r$ is increased to 2, $w_t$ must be decreased to 6. Suppose that $w_r = 2$, but an unmodified transmitter is used with $w_t = 7$, as is typically used with the go-back-N variant of HDLC.
Further suppose that the receiver begins with $n_r = n_s = 0$. Now suppose that the receiver sees the following series of packets (all modulo 8): 0 1 2 3 4 5 6 (pause) 0. Because $w_r = 2$, the receiver will accept and store the final packet 0 (thinking it is packet 8 in the series), while requesting a retransmission of packet 7. However, it is also possible that the transmitter failed to receive any acknowledgments and has retransmitted packet 0. In this latter case, the receiver would accept the wrong packet as packet 8.
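This scenario can be checked mechanically. The Java sketch below (illustrative names, sequence numbers modulo $N$) asks: once the receiver has reached $n_r$, the transmitter may still retransmit anything from $n_r - w_t$ up to $n_r - 1$; does any of those packets carry the same number, modulo $N$, as a packet inside the receive window $[n_r, n_r + w_r)$? If so, the receiver cannot tell a retransmission from a new packet.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: can an old retransmission be mistaken for a new packet under modulus N? */
class WindowAmbiguity {
    static boolean ambiguous(int n, int wt, int wr, long nr) {
        Set<Long> receiveWindow = new HashSet<>();
        for (long x = nr; x < nr + wr; x++) {
            receiveWindow.add(Math.floorMod(x, (long) n));      // numbers the receiver would accept as new
        }
        for (long x = Math.max(0, nr - wt); x < nr; x++) {      // packets the sender may still resend
            if (receiveWindow.contains(Math.floorMod(x, (long) n))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // The receiver has seen packets 0..6, so n_r = 7; 3-bit numbers give N = 8.
        System.out.println(ambiguous(8, 7, 2, 7)); // true: a resent packet 0 looks like new packet 8
        System.out.println(ambiguous(8, 6, 2, 7)); // false: w_t + w_r = 8 <= N
        System.out.println(ambiguous(8, 7, 1, 7)); // false: go-back-N with w_r = 1 is safe
    }
}
```

The last two calls match the general rule $N \ge w_t + w_r$: the retransmission range and the receive window together span $w_t + w_r$ consecutive sequence numbers, which only stay distinct modulo $N$ when that span fits.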
The solution is for the transmitter to limit $w_t \le 6$. With this restriction, the receiver knows, after receiving packet 6, that the transmitter's $n_a \ge 1$, and thus the following packet numbered 0 must be packet 8.
If all acknowledgments were lost, then the transmitter would have to stop after packet 5.

Extensions

There are many ways that the protocol can be extended. The above examples assumed that packets are never reordered in transmission; they may be lost in transit (error detection makes corruption equivalent to loss), but will never appear out of order. The protocol can be extended to support packet reordering, as long as the distance can be bounded; the sequence number modulus $N$ must be expanded by the maximum misordering distance. It is possible to not acknowledge every packet, as long as an acknowledgment is sent eventually if there is a pause; for example, TCP normally acknowledges every second packet. It is common to inform the transmitter immediately if a gap in the packet sequence is detected.
HDLC has a special REJ (reject) packet for this. The transmit and receive window sizes may be changed during communication, as long as their sum remains within the limit of $N$.

Normally, they are each assigned maximum values that respect that limit, but the working value at any given time may be less than the maximum. In particular, it is common to reduce the transmit window size to slow down transmission to match the link's speed, avoiding saturation or congestion. One common simplification of selective repeat is the so-called SREJ-REJ ARQ.
This operates with $w_r = 2$ and buffers packets following a gap, but only allows a single lost packet; while waiting for that packet, $w_r = 1$, and if a second packet is lost, no more packets are buffered. This gives most of the performance benefit of the full selective-repeat protocol, with a simpler implementation.
Networking and socket programming is one of the important areas of the Java programming language, especially for programmers who work on client-server applications. Detailed knowledge of important protocols is very important, especially if you are in the business of writing high-frequency trading applications that communicate via the FIX protocol or a native exchange protocol. In this article, we will look at some of the frequently asked questions on networking and socket programming, mostly based around the TCP/IP protocol. This article is light on NIO, though, as it doesn't include questions on multiplexing, selectors, ByteBuffer and FileChannel, but it does include classic questions like the difference between IO and NIO. The main focus of this post is to make Java developers familiar with the low-level parts, e.g. how the TCP and UDP protocols work, socket options, and writing multi-threaded servers in Java.
The questions discussed here are not really tied to the Java programming language and can be used with any programming language that allows programmers to write client-server applications. By the way, if you are going for an interview at an investment bank for a core Java developer role, you had better prepare well on Java NIO, socket programming, TCP, UDP and networking, along with other popular topics. You can also contribute any question that you have been asked, or that relates to socket programming and networking and can be useful for Java interviews.

Java Networking and Socket Programming Questions Answers

Here is my list of 15 interview questions related to networking basics, internet protocols and socket programming in Java. It doesn't contain basic questions on the API, e.g. Socket and ServerSocket; instead it focuses on the higher-level concepts of writing a scalable server in Java using NIO selectors, how to implement one, and its limitations and issues.
I will probably add a few more questions based on best practices for writing socket-based applications in Java. If you know a good question on this topic, feel free to suggest it.

1) Difference between TCP and UDP protocol?
There are many differences between TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), but the main one is that TCP is connection-oriented, while UDP is connectionless. TCP also provides guaranteed delivery of messages in the order they are sent, while UDP doesn't provide any delivery guarantee.
Because of this guarantee, TCP is slower than UDP, as it needs to perform more work. TCP is best suited for messages you can't afford to lose, e.g. order and trade messages in electronic trading, or wire transfers in banking and finance. UDP is more suited to media transmission, where the loss of an occasional packet (known as a datagram) is affordable and doesn't noticeably affect the quality of service. This answer is enough for most interviews, but you need to be more detailed when you are interviewing as a Java developer for a high-frequency trading desk.
Some points that many candidates forget to mention are ordering and data boundaries. In TCP, messages are guaranteed to be delivered in the same order as they are sent, but data boundaries are not preserved: multiple messages can be combined and sent together, or the receiver may receive one part of a message in one packet and the rest in the next. The application will receive all the bytes, in the same order, but TCP does not mark where one message ends and the next begins, so the application has to reassemble messages itself, for example with a length prefix or a delimiter.
On the other hand, UDP sends a full message in a datagram packet: if the client receives the packet, it is guaranteed to get the full message, but there is no guarantee that packets will arrive in the same order they were sent. In short, you should mention the following differences between TCP and UDP when answering in an interview:

- TCP guarantees delivery; UDP does not.
- TCP guarantees the order of messages; UDP doesn't.
- Data boundaries are not preserved in TCP, but UDP preserves them.
- TCP is slower than UDP.

For a more detailed answer, see my post.
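Because TCP delivers a byte stream rather than messages, applications usually add their own framing. A common approach, shown here as a minimal Java sketch, is a 4-byte length prefix before each message. In-memory streams stand in for a socket's input and output streams, and the class name is illustrative; with a real connection you would wrap `socket.getInputStream()` and `socket.getOutputStream()` the same way.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: length-prefixed framing so message boundaries survive a TCP byte stream. */
class LengthPrefixFraming {
    static void writeMessage(DataOutputStream out, String msg) throws IOException {
        byte[] body = msg.getBytes(StandardCharsets.UTF_8);
        out.writeInt(body.length);       // 4-byte length prefix (big-endian)
        out.write(body);
    }

    static String readMessage(DataInputStream in) throws IOException {
        byte[] body = new byte[in.readInt()];
        in.readFully(body);              // keeps reading until the whole body has arrived
        return new String(body, StandardCharsets.UTF_8);
    }

    /** Writes the messages into one byte stream, then splits them apart again. */
    static List<String> roundTrip(String... msgs) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
            DataOutputStream out = new DataOutputStream(wire);
            for (String m : msgs) writeMessage(out, m);
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
            List<String> received = new ArrayList<>();
            for (int i = 0; i < msgs.length; i++) received.add(readMessage(in));
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("NEW ORDER 42", "CANCEL 42")); // [NEW ORDER 42, CANCEL 42]
    }
}
```

`readFully` matters here: a plain `read` on a socket stream may return only part of a message, which is exactly the boundary problem described above.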
2) How does the TCP handshake work?
Three messages are exchanged as part of the TCP handshake: the initiator sends SYN, the listener replies with SYN-ACK, and finally the initiator replies with ACK, at which point the TCP connection moves to the ESTABLISHED state. This process is easy to understand by looking at the following diagram.

3) How do you implement reliable transmission in the UDP protocol?
This is usually a follow-up to the previous interview question.
Though UDP doesn't provide a delivery guarantee at the protocol level, you can introduce your own logic to maintain reliable messaging, e.g. by introducing sequence numbers and retransmission. If the receiver finds that it has missed a sequence number, it can ask the server to replay that message. TRDP, the protocol used by a popular high-speed messaging middleware, uses UDP for faster messaging and provides a reliability guarantee by using sequence numbers and retransmission.

4) What is network byte order? How do two hosts communicate if they have different byte ordering?
There are two ways to store a multi-byte value in memory: little-endian (least significant byte at the starting address) and big-endian (most significant byte at the starting address). Whichever one a given machine uses is known as its host byte order. For example, a little-endian Intel x86 processor stores a 32-bit integer as four consecutive bytes in memory in the order 4-3-2-1, where 1 is the most significant byte, while an IBM PowerPC processor running in big-endian mode would store the same integer in the order 1-2-3-4. Networking protocols such as TCP are based on a specific network byte order, which is big-endian. If two machines with different byte orderings communicate, each converts values to network byte order before sending and back to its host byte order after receiving.
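Java's `ByteBuffer` makes the difference easy to see: a new buffer uses big-endian (network) order by default, and `order(ByteOrder.LITTLE_ENDIAN)` switches it. The sketch below encodes the same 32-bit value both ways; the class name is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Sketch: the same int laid out in network (big-endian) and little-endian byte order. */
class ByteOrderDemo {
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        System.out.println(Arrays.toString(encode(value, ByteOrder.BIG_ENDIAN)));    // [1, 2, 3, 4]
        System.out.println(Arrays.toString(encode(value, ByteOrder.LITTLE_ENDIAN))); // [4, 3, 2, 1]
        System.out.println("Native order of this machine: " + ByteOrder.nativeOrder());
    }
}
```

Because Java's stream classes such as `DataOutputStream.writeInt` also write big-endian, pure-Java peers already agree on network byte order; explicit swapping only matters when talking to code that writes raw host-order bytes.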
In practice, a little-endian microcontroller sending to a UDP/IP network must swap the order of the bytes within multi-byte values before the values are sent onto the network, and must swap the bytes of multi-byte values received from the network before they are used. In short, network byte order is the standard byte order used during transmission, and it is big-endian.

5) What is Nagle's algorithm?
If the interviewer is testing your knowledge of the TCP/IP protocol, it's very rare for them not to ask this question. Nagle's algorithm is a way of improving the performance of TCP/IP networks by reducing the number of TCP packets that need to be sent over the network. It works by holding back small writes while earlier data is still unacknowledged, and sending them together once a full Maximum Segment Size (MSS) has accumulated or the outstanding data has been acknowledged. Small packets that carry only 1 or 2 bytes of data have a large overhead relative to their payload, since the TCP and IPv4 headers alone take 40 bytes.
These small packets can also lead to congestion on slow networks. Nagle's algorithm improves the efficiency of TCP by buffering them so that fewer, larger packets are sent. On the other hand, it can delay the final, partially filled segment of a larger write until earlier data is acknowledged, so applications that care about latency often disable it. In general, Nagle's algorithm is a defence against careless applications that send lots of small packets to the network; it will not benefit, and may even hurt, a well-written application that properly takes care of its own buffering.
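The core per-write decision Nagle's algorithm makes can be sketched as follows, loosely after RFC 896 and simplified: it ignores FIN, push, and timer handling. The method and parameter names are illustrative; real TCP stacks implement this logic inside the kernel, not in application code.

```java
/** Simplified sketch of Nagle's send decision for data waiting in the send buffer. */
class NagleDecision {
    /**
     * @param pendingBytes    bytes written by the application but not yet sent
     * @param mss             maximum segment size
     * @param unackedInFlight whether previously sent data is still unacknowledged
     * @return true if a segment should be sent now, false if the data should wait
     */
    static boolean sendNow(int pendingBytes, int mss, boolean unackedInFlight) {
        if (pendingBytes <= 0) return false;   // nothing to send
        if (pendingBytes >= mss) return true;  // a full-sized segment may always go out
        return !unackedInFlight;               // a small segment only when nothing is outstanding
    }

    public static void main(String[] args) {
        System.out.println(sendNow(1, 1460, false));    // true: idle connection, send at once
        System.out.println(sendNow(1, 1460, true));     // false: coalesce until an ACK arrives
        System.out.println(sendNow(1460, 1460, true));  // true: a full segment is ready
    }
}
```

The middle case is the one that interacts badly with delayed acknowledgments: the sender waits for an ACK while the receiver is deliberately delaying it.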
6) What is TCP_NODELAY?
TCP_NODELAY is an option, provided by various TCP implementations, to disable Nagle's algorithm. Since Nagle's algorithm interacts badly with TCP's delayed acknowledgement algorithm, it's better to disable it when you are doing write-write-read operations.
In that pattern, a read after two successive writes on a socket may be delayed by up to 500 milliseconds, until the second write has reached the destination. If latency is more of a concern than bandwidth usage, e.g. in a network-based multiplayer game where users want to see other players' actions immediately, it's better to bypass Nagle's delay by setting the TCP_NODELAY flag.
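In Java, the option is exposed as `Socket.setTcpNoDelay(boolean)` (and as `StandardSocketOptions.TCP_NODELAY` for NIO channels). A minimal loopback sketch using only the JDK: it opens a server on a free port, connects to it, and disables Nagle's algorithm on the client socket before writing. The class name is illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Sketch: turn off Nagle's algorithm on a client socket (loopback connection for illustration). */
class NoDelayDemo {
    static boolean connectWithNoDelay() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);   // port 0 = any free port
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setTcpNoDelay(true);       // small writes are no longer held back by Nagle
            client.getOutputStream().write(new byte[] {1, 2, 3});
            client.getOutputStream().flush();
            return client.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY enabled: " + connectWithNoDelay());
    }
}
```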
7) What is multicasting or multicast transmission? Which protocol is generally used for multicast?
Multicasting, or multicast transmission, is one-to-many distribution, where a message is delivered to a group of subscribers simultaneously in a single transmission from the publisher. Copies of messages are automatically created in other network elements, e.g. routers, but only when the topology of the network requires it. In practice, IP multicast is used with UDP, because UDP sends each message as a self-contained datagram, which the network can replicate and deliver to every subscriber.
Since TCP is a point-to-point, connection-oriented protocol, it cannot deliver messages to multiple subscribers unless it has a separate connection to each of them. However, UDP is not reliable, and messages may be lost or delivered out of order. Reliable multicast protocols such as Pragmatic General Multicast (PGM) have been developed to add loss detection and retransmission on top of IP multicast. IP multicast is widely deployed in enterprises, commercial stock exchanges, and multimedia content delivery networks. A common enterprise use of IP multicast is for IPTV applications.

8) What is the difference between a Topic and a Queue in JMS?
The main difference between a Topic and a Queue in the Java Message Service shows up when there are multiple consumers. If we set up multiple threads to consume messages from a Queue, each message is dispatched to only one of those threads, not to all of them.
In the case of a Topic, on the other hand, each subscriber gets its own copy of the message.

9) What is the difference between IO and NIO?
The main difference between NIO and IO is that NIO provides non-blocking, multiplexed I/O, which is critical for writing fast and scalable networking systems, while most of the classic IO classes are blocking.
NIO takes advantage of readiness-notification system calls in UNIX systems, such as the select system call for network sockets. Using select, an application can monitor several resources at the same time and can also poll for network activity without blocking. The select system call identifies whether data is pending; read or write can then be used knowing that they will not block.

10) How do you write a multi-threaded server in Java?
A multi-threaded server is one that can serve multiple clients without blocking.
Java provides excellent support for developing such a server. Prior to Java 1.4, you could write a multi-threaded server using traditional socket IO and threads. This had severe limitations on scalability, because it creates a new thread for each connection, and you can only create a limited number of threads, depending on the machine's and platform's capability. Though this design can be improved by using a pool of worker threads, it is still a resource-intensive design.
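A minimal sketch of the pooled design described above: one accept loop hands each connection to a fixed pool of worker threads created with `Executors.newFixedThreadPool`, so the thread count stays bounded. The line-echo behaviour, the pool size of 4, and the class name `PooledEchoServer` are assumptions for illustration; each worker still blocks on its client's reads, which is why this remains resource-intensive.

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: blocking-I/O echo server using a bounded pool of worker threads. */
class PooledEchoServer {
    private final ServerSocket server;
    private final ExecutorService workers = Executors.newFixedThreadPool(4); // assumed pool size

    PooledEchoServer() throws IOException {
        server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress()); // any free port
    }

    int port() { return server.getLocalPort(); }

    /** Accept loop on its own thread: each accepted client is handed to a worker. */
    void serve() {
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket client = server.accept();
                    workers.submit(() -> handle(client));
                } catch (IOException e) {
                    break; // server socket was closed
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    /** Echoes each line back until the client disconnects. */
    private static void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) out.println(line);
        } catch (IOException ignored) {
            // client went away; the socket is closed by try-with-resources
        }
    }

    void stop() throws IOException {
        server.close();
        workers.shutdownNow();
    }

    /** Starts a server, sends one line from a client, and returns the echoed reply. */
    static String echoOnce(String msg) {
        try {
            PooledEchoServer srv = new PooledEchoServer();
            srv.serve();
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), srv.port());
                 BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                out.println(msg);
                return in.readLine();
            } finally {
                srv.stop();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello")); // hello
    }
}
```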
After JDK 1.4 and the introduction of NIO, writing a scalable server became a bit easier. You can create one in a single thread by using a Selector, which takes advantage of the non-blocking I/O model of Java NIO.

11) What is an ephemeral port?
A TCP/IP connection is identified by four things: server IP, server port, client IP and client port. Of these four, three are usually well known; what is not known is the client port, and this is where ephemeral ports come into the picture. Ephemeral ports are dynamic ports assigned by your machine's IP stack from a specified range, known as the ephemeral port range, when a client connection doesn't explicitly specify a port number.
These are short-lived, temporary ports that can be reused once the connection is closed, although many IP stacks cycle through the range rather than immediately reusing a recently freed port. Like TCP, UDP also uses ephemeral ports when sending datagrams. On Linux, the default ephemeral port range is 32768 to 60999 (older kernels used 32768 to 61000), while Windows XP and earlier used 1025 to 5000, and Windows Vista and later use 49152 to 65535.
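You can observe ephemeral ports from Java: binding a `ServerSocket` to port 0 asks the OS for a free port, and a client `Socket` that does not specify a local port is given one automatically. This minimal loopback sketch reports both; the exact numbers depend on the operating system's configured range, so the class (an illustrative name) only prints them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Sketch: inspect the ephemeral port the OS assigns to a client connection. */
class EphemeralPortDemo {
    /** Returns {server port, client's local port} for one loopback connection. */
    static int[] observePorts() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // On the server side, the client's ephemeral port shows up as the remote port.
            if (accepted.getPort() != client.getLocalPort()) throw new IllegalStateException("port mismatch");
            return new int[] {server.getLocalPort(), client.getLocalPort()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = observePorts();
        System.out.println("server port " + ports[0] + ", client ephemeral port " + ports[1]);
    }
}
```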
Similarly, different operating systems have different ephemeral port ranges.

12) What is the sliding window protocol?
The sliding window protocol is a technique for controlling the transmission of data packets between two networked computers where reliable and sequential delivery of data packets is required, such as that provided by the Transmission Control Protocol (TCP). In the sliding window technique, each packet includes a unique consecutive sequence number, which the receiving computer uses to place data in the correct order. The objective of the sliding window technique is to use the sequence numbers to avoid duplicate data and to request missing data.

13) When do you get a 'too many files open' error?
Just like an open file, a socket connection needs a file descriptor. Since every process (and the system as a whole) has a limited number of file descriptors, a program may run out of them; when that happens, you will see this error. On UNIX-based systems, you can check how many file descriptors per process are allowed by executing the ulimit -n command, or simply count the entries in /proc/ /fd/.

14) What is the TIME_WAIT state in the TCP protocol? When does a socket connection go into the TIME_WAIT state?
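The end of a TCP connection that closes it first (the active close, typically by calling close()) goes into the TIME_WAIT state once the connection shutdown completes. Because TCP packets can be delayed and arrive late, the connection's state is kept for a while (twice the maximum segment lifetime) so that late packets from the old connection are not mistaken for part of a new one, and so that the final ACK can be resent if it was lost.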
For example, if the client closes a socket connection first, it will go into the TIME_WAIT state; similarly, if the server closes the connection first, you will see TIME_WAIT there. You can check the status of your TCP and UDP sockets by using tools such as netstat.

15) What will happen if you have too many socket connections in TIME_WAIT state on the server?
When a connection goes into the TIME_WAIT state, the application has already closed its socket, but the operating system keeps the connection's state, including its local port, until the TIME_WAIT period expires. If too many connections are in the TIME_WAIT state, the host can run short of ports and kernel memory for new connections, and new connection attempts may start failing. A 'too many files open' error, by contrast, usually means the application itself is holding on to open sockets.
That's all for this list of networking and socket programming interview questions and answers. Though I originally intended this list for Java programmers, it is equally useful for any programmer. In fact, this is the bare minimum knowledge of sockets and protocols every programmer should have.
I have found that C and C++ programmers are better at answering these questions than the average Java programmer. One reason may be that Java programmers have so many useful libraries, e.g. Apache MINA, which do all the low-level work for them. Anyway, knowledge of fundamentals is very important and everything else is just an excuse, but at the same time I also recommend using tried and tested libraries like Apache MINA for production code.
Anonymous said:
On NIO, I have seen some good questions, like:
1) Difference between StringBuilder and ByteBuffer in Java?
2) How does a Selector work?
3) Difference between Channel and Stream in Java?
4) What is loopback? What happens if your client and server are on the same host?
5) How many sockets can a Java program open without crashing?
I don't know if you have heard of them, but I found them quite interesting.

Anonymous said:
A few more questions to add:
1) What is multicasting?
2) What is the difference between broadcast and multicast? Which is more efficient, and why?
3) What are a multicast address and a multicast group?
Here you will get a sliding window protocol program in C. In computer networks, the sliding window protocol is a method to transmit data on a network. The sliding window protocol is applied at the data link layer of the OSI model. At the data link layer, data is in the form of frames. In networking, a window simply means a buffer that holds data frames that need to be transmitted. Both sender and receiver agree on some window size. If the window size is w, then after sending w frames the sender waits for the acknowledgement (ack) of the first frame.
As soon as the sender receives the acknowledgement of a frame, that frame is replaced in the window by the next frame to be transmitted. If the receiver sends a collective, or cumulative, acknowledgement, the sender understands that more than one frame has been properly received; e.g. if the ack of frame 3 is received, it understands that frames 1 and 2 were also received properly. In the sliding window protocol, the receiver needs some memory to compensate for any loss in transmission or for frames received out of order.
Efficiency of the sliding window protocol:

$$\eta = \frac{W \cdot t_x}{t_x + 2t_p}$$

where $W$ is the window size, $t_x$ the transmission time, and $t_p$ the propagation delay. Efficiency cannot exceed 1, so the formula applies while $W t_x \le t_x + 2t_p$; beyond that, the link is fully utilized.

The sliding window works in full-duplex mode. It is of two types:

1. Selective Repeat: the sender retransmits only the frame that is erroneous or lost.
2. Go-Back-N: the sender retransmits all frames in the window from the erroneous frame onward, including the erroneous frame itself.

Sliding Window Protocol Program in C

Below is the simulation of the sliding window protocol in C.
Output

Enter window size: 3
Enter number of frames to transmit: 5
Enter 5 frames: 12 5 89 4 6

With sliding window protocol the frames will be sent in the following manner (assuming no corruption of frames)

After sending 3 frames at each stage sender waits for acknowledgement sent by the receiver

12 5 89
Acknowledgement of above frames sent is received by sender

4 6
Acknowledgement of above frames sent is received by sender
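The C listing itself did not survive in this copy, only its output. As a stand-in, here is a Java sketch (not the original program) that reproduces the same behaviour under the same assumption of no lost or corrupted frames. The window size and frames are passed as arguments instead of being read from standard input, and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: error-free sliding window simulation matching the sample output above. */
class SlidingWindowSimulation {
    static List<String> simulate(int windowSize, int[] frames) {
        List<String> log = new ArrayList<>();
        log.add("After sending " + windowSize
                + " frames at each stage sender waits for acknowledgement sent by the receiver");
        for (int start = 0; start < frames.length; start += windowSize) {
            int end = Math.min(start + windowSize, frames.length); // the last group may be partial
            StringBuilder sent = new StringBuilder();
            for (int i = start; i < end; i++) {
                if (i > start) sent.append(' ');
                sent.append(frames[i]);
            }
            log.add(sent.toString());
            log.add("Acknowledgement of above frames sent is received by sender");
        }
        return log;
    }

    public static void main(String[] args) {
        simulate(3, new int[] {12, 5, 89, 4, 6}).forEach(System.out::println);
    }
}
```

Like the sample output, this waits for a whole group to be acknowledged before sending the next one; a true sliding sender can release a new frame as soon as each individual acknowledgement arrives.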
The simple answer is no, as far as I understand it. (I studied the sliding window algorithm years ago, so I remember the principles but not some of the details; correct me if you have a more insightful understanding.) As the name 'sliding window' says, the window should slide, not jump: as your quote puts it, at every position k in the file, the fingerprint of its content is computed. That is to say, the window slides one character at a time. As far as I know, the concepts of chunks and windows should be kept distinct, as should fingerprints and hashes, although they could be the same.
Given that it is too expensive to compute a full hash as the fingerprint, I think a cheaper fingerprint function is a more proper choice. A chunk is a large block of text in the document, and a window highlights a small portion of a chunk. IIRC, the sliding window algorithm works like this:

1. The text file is first chunked.
2. For each chunk, you slide the window (a 15-char block in your running case) and compute the fingerprint of each window of text.
3. You now have the fingerprint of the chunk, whose length is proportional to the length of the chunk.

The next step is how you use the fingerprints to compute the similarity between different documents, which is beyond my knowledge. Could you please give us a pointer to the article you referred to in the OP? In exchange, I recommend this paper, which introduces a variant of the sliding window algorithm to compute document similarity. Another application you can refer to is rsync, a data synchronisation tool with block-level (corresponding to your chunk) deduplication. See this short article for more.
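The "slide one character at a time" idea is usually made cheap with a rolling hash, where each window's value is derived from the previous one in constant time instead of being recomputed from scratch. The Java sketch below uses a Rabin-Karp style polynomial hash modulo a prime. The base, modulus, and names are assumptions, since the answer does not say which fingerprint function its source used; rsync, mentioned above, relies on a rolling checksum in the same spirit.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: polynomial rolling hash over every k-character window of a text. */
class RollingFingerprint {
    static final long BASE = 257;             // assumed base
    static final long MOD = 1_000_000_007L;   // assumed prime modulus

    /** Fingerprints of text[i, i+k) for every i, computed by sliding one character at a time. */
    static List<Long> windowHashes(String text, int k) {
        List<Long> out = new ArrayList<>();
        if (k <= 0 || k > text.length()) return out;
        long high = 1;                        // BASE^(k-1) mod MOD: weight of the outgoing character
        for (int i = 1; i < k; i++) high = high * BASE % MOD;
        long h = 0;
        for (int i = 0; i < k; i++) h = (h * BASE + text.charAt(i)) % MOD;
        out.add(h);
        for (int i = k; i < text.length(); i++) {
            h = (h - text.charAt(i - k) * high % MOD + MOD) % MOD; // drop the leftmost character
            h = (h * BASE + text.charAt(i)) % MOD;               // append the new character
            out.add(h);
        }
        return out;
    }

    /** Direct, non-rolling hash of one string, used to check the rolling version. */
    static long directHash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) h = (h * BASE + s.charAt(i)) % MOD;
        return h;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        List<Long> hashes = windowHashes(text, 5);
        System.out.println(hashes.size() + " windows; first matches direct hash: "
                + (hashes.get(0) == directHash(text.substring(0, 5))));
    }
}
```

A text of length L yields L - k + 1 fingerprints, one per position, which is the per-chunk fingerprint sequence whose length is proportional to the chunk length.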