File gnutella sharing

PING messages represent a significant amount of network traffic. One possible improvement would be for a servant, before disconnecting from the network, to send a disconnect message to servants that had previously received a PONG message.

Once your servant has established connections to the network, you will probably begin searching for files. The search message contains the search criteria and the preferred download speed, which is meant to prevent responses from servants on slow connections. In practice, users often set the download speed incorrectly, which effectively defeats this filtering. When a servant responds to a search message, it includes all of the information needed to retrieve the file, including the IP address and the port on which the servant is listening for connections.

The servants can even resume partial downloads if the GET request ends before the file transfer is complete. Finally, the protocol supports a specialized PUSH message for dealing with firewall issues. You can use it to retrieve a file, previously found through a search request, from a servant that sits behind a firewall. The main idea is to communicate enough information that the servant behind the firewall can establish the connection to the servant requesting the file, thus sidestepping the firewall.

Future enhancements to the Gnutella protocol will be made in the areas of scalability and spam defense. As the network has grown, network traffic has increased because of message broadcasts from individual servants. Some work has been done to create a set of guidelines for servants, in order to limit excessive message traffic without requiring a protocol modification. Each Gnutella message contains two pieces of information that can help to address these issues. The first is the Time to Live (TTL), a value that is set by the servant creating the message and is decremented each time the message is forwarded.

When the TTL reaches a value less than one, the receiving servant should drop the message rather than forward it. The second value is the Hop count, which starts at zero and is incremented each time the message is forwarded. The guidelines for servants center on dropping messages that have large TTL and Hop values.
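The TTL and Hop rules above can be sketched in a few lines of Python. This is only an illustration: the `Header` class, the `prepare_forward` function, and the limit of 7 on TTL plus Hops are assumptions made for the example, not part of any published guideline.

```python
from dataclasses import dataclass, replace
from typing import Optional

MAX_TTL = 7  # assumed local policy limit, not a protocol constant


@dataclass(frozen=True)
class Header:
    ttl: int   # forwards still allowed
    hops: int  # forwards already made


def prepare_forward(header: Header, max_ttl: int = MAX_TTL) -> Optional[Header]:
    """Return the header to forward, or None if the message should be dropped."""
    # Assumed guideline: a message whose TTL + Hops exceeds the local limit
    # was created with too large a TTL or has already traveled far enough.
    if header.ttl + header.hops > max_ttl:
        return None
    ttl = header.ttl - 1
    if ttl < 1:
        return None  # expired: drop instead of forwarding
    return replace(header, ttl=ttl, hops=header.hops + 1)
```

A freshly created message with `Header(ttl=7, hops=0)` is forwarded as `Header(ttl=6, hops=1)`, while `Header(ttl=50, hops=0)` is dropped outright.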

A message with a large TTL may be spam; a message with a large Hop value has already flowed over a large part of the network. Defending against spam is much more difficult. Spam occurs on the network when a servant responds to all or most queries with a text message, instead of an actual file that the servant is serving.

The text message usually advertises a Website, and since the spamming servant is responding to a search query, the message is also displayed on other clients that monitor searches. One could envision an alternative Gnutella-like protocol, based on XML, that could have a number of advantages over today's binary data protocol. For one, you could easily read messages, making it easier to develop software using the protocol. You could also validate messages with a validating XML parser, which would allow you to discard malformed messages.

An XML protocol would also make byte-swapping unnecessary. Today's binary protocol uses little-endian byte ordering for some of the numeric values in its messages; since Java uses network (big-endian) order, you must swap bytes for those values. One downside of an XML-based protocol is that messages would be larger than those on today's system. I haven't yet discussed anything Java-related, so you may be wondering what this article is doing in JavaWorld. Well, this article introduces JTella, an API designed to enable fast and easy development of Java applications and tools that access the Gnutella network.
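To make the byte-swapping point concrete, here is a small Python sketch (Python rather than Java, purely for brevity) that encodes the same 32-bit value in both byte orders. The value 1000 and the helper name `read_le_u32` are made up for the illustration.

```python
import struct

value = 1000  # hypothetical numeric field from a message

little = struct.pack("<I", value)   # little-endian, as on the wire
network = struct.pack(">I", value)  # big-endian (network order)

# The two encodings are byte-for-byte reversals of each other.
assert little == network[::-1]


def read_le_u32(data: bytes) -> int:
    """Decode a 4-byte little-endian unsigned integer."""
    return struct.unpack("<I", data)[0]
```

Reading the little-endian bytes with a big-endian decoder would silently produce the wrong number, which is why the swap cannot be skipped.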

JTella is still in an early stage of development version 0. First, of course, it can form and maintain connections to the network. Second, it offers a search-monitoring function that allows you to monitor searches received by a JTella servant.

Third, it can send search queries over the network and process the results. I'll now show you some code examples for using JTella.

Two example applications are shown: one with code to monitor the search requests received, the other to send new search requests over the network. See Resources for the source code. Both examples accept two command-line parameters: the first provides the name of a host, the second provides the port used by the remote Gnutella servant. See Resources for several sources of this information.

But there's a big problem with spanning trees: they're extremely brittle.

If even a single node faults or churns out of the network, entire arteries of the tree will get knocked offline and become unreachable. In a P2P setting, we have to assume that nodes will crash, packets will be dropped, and the network topology will change over time.

A spanning tree is great when the network is static, but in a P2P network, it's a non-starter. If we want the scalability of a spanning tree without the brittleness, we want to use gossip. Gossip protocols are simple in principle, and you probably already have the intuition of how they work. A node that has a new message forwards it to a random subset of its peers; the size of that subset is the infection factor. Each peer that receives the message does the same. This continues until either all reachable nodes receive the message, or the message expires.

The more aggressive the infection factor, the faster and more completely the message propagates. On the other hand, the higher the infection factor, the more noise gets created in the network and the more bandwidth gets consumed per message. If a node gossips to all of its peers, it's known as flooding. Bitcoin performs flooding rather than random infection.

Like many randomized protocols, gossip is imperfect, but it ends up approximating the properties of a minimum spanning tree with high probability. At the same time, it offers much higher fault-tolerance. So with a little bit of gossip theory under our belts, let's look at how Gnutella works beneath the hood.
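The gossip process described above is easy to simulate. The sketch below is a toy model under stated assumptions: the overlay is a hypothetical ring in which each node peers with its four nearest neighbors, and the names `gossip` and `ring_overlay` are invented for the example.

```python
import random
from collections import deque


def ring_overlay(n):
    """Toy overlay: node i peers with its two nearest neighbors on each side."""
    return {i: [(i - 2) % n, (i - 1) % n, (i + 1) % n, (i + 2) % n]
            for i in range(n)}


def gossip(graph, source, fanout, ttl, rng):
    """Return the set of nodes reached by a message gossiped from source."""
    seen = {source}
    queue = deque([(source, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue  # message expired at this node
        peers = graph[node]
        # Infect a random subset of peers; fanout >= len(peers) is flooding.
        for peer in rng.sample(peers, min(fanout, len(peers))):
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, remaining - 1))
    return seen


graph = ring_overlay(20)
flooded = gossip(graph, 0, fanout=4, ttl=20, rng=random.Random(42))
sampled = gossip(graph, 0, fanout=2, ttl=20, rng=random.Random(42))
```

With flooding, every node in this connected overlay is reached. With a fanout of 2, how many nodes are reached depends on the random choices, which is the trade-off between coverage and bandwidth discussed above.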

The design of Gnutella started with a simple idea: let's take a file sharing system like Napster, but remove the central server. Napster, you'll recall, had a central server or set of servers that operated as a search engine over all of the available files and allowed peers to find each other. In Gnutella, the P2P swarm will itself handle search requests and peer discovery.

In Gnutella, each client serves double-time as both a client and server (Gnutella calls them "servents"). The clients connect to each other directly through a P2P overlay graph. The overlay graph is the P2P network that is "overlaid" on top of the actual underlying network. The underlying network in this case is IP itself; that's how nodes have to literally send packets to each other. In the underlying network (also known as the underlay), two IPs that share a large prefix will be physically close to each other.

But the P2P overlay graph doesn't necessarily respect the underlying distance. Nearby peers in the overlay graph might be far away in the physical world, and neighbors in the real world may be far away in the P2P graph. Whenever the overlay network is insensitive to the underlying network topology, you're going to get suboptimal routing, since messages aren't taking the real world shortest path.

More advanced P2P systems try to use smarter routing models which take the underlying network into account, but the simplest possible approach is to build an unstructured network, which produces its own random topology and follows that when it comes to message routing. Gnutella is an unstructured network, as is Bitcoin (with some caveats we'll address later). By now you've gotten the high-level idea of how Gnutella works.

Let's delve into the finer details. In Gnutella, each node keeps track of a list of peers (for simplicity, let's say no more than 5 peers each). Every node periodically pings its peers to ensure they're still online. If a node notices any of its peers have gone dark for too long, the node will un-peer with them and find someone else to peer with. Gnutella has five message types: Ping, Pong, Query, QueryHit, and Push. Ping and Pong are used for peer discovery and heartbeating, so let's ignore those for now.

The Push message is only for coordinating file downloads when the file owner is behind a firewall, so ignore that as well. The meat of the protocol occurs in Query and QueryHit messages. Let's say you're looking for a Metallica song, so you construct a Query message. You want to disseminate your query widely, so you gossip this query to a random 3 of your peers. Each of those peers also gossips the query to a random 3 of its peers, and so on.

Note that if this forwarding process were to continue indefinitely, we'd have a problem: your message would loop around in the network forever. This is obviously not what we want—we need some form of memory so a node can discard messages that it's already seen. To solve this, we'll give each message a UUID. By simply keeping track of UUIDs you've already forwarded, you can ignore repeat messages and thus prevent any infinite message loops.
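The duplicate check can be as small as a set of message IDs. In this minimal sketch the `Node` class and its `should_forward` method are hypothetical names; a real servent would also need to bound or expire the set so that it does not grow forever.

```python
import uuid


class Node:
    def __init__(self, name: str):
        self.name = name
        self.seen_ids = set()  # UUIDs of messages already handled

    def should_forward(self, message_id: uuid.UUID) -> bool:
        """Return True the first time an ID is seen, False on repeats."""
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        return True


node = Node("a")
query_id = uuid.uuid4()
first = node.should_forward(query_id)   # new message: forward it
second = node.should_forward(query_id)  # same message looped back: drop it
```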

But there's still an issue: every message will propagate through the P2P network until it reaches literally everyone. This is fine if you want that, but it's overkill for file sharing. This would rapidly become a scalability bottleneck: every node would literally have to handle every single search in the whole network. To solve this, we can add a TTL (time-to-live) on every message.

The TTL is an integer that is decremented each time the message is forwarded; once the TTL hits 0, the message is discarded. This means each message will only travel so far before disappearing, like a decaying wave.

Freeloaders and parasites cannot be controlled. The freeloader gains all the benefit of the whole system and pushes the cost to those foolish enough to give away their resources. Whether or not this turns out to be a major problem for peer-to-peer systems remains to be seen, but the Mojo Nation technology provides flexible tools to reduce freeloading if it becomes a serious problem.

We argue that free riding leads to degradation of the system performance and adds vulnerability to the system. If this trend continues copyright issues might become moot compared to the possible collapse of such systems. In this paper we analyzed user traffic in Gnutella and concluded that there is a significant amount of free riding in the system.

Furthermore, we found that free riding is distributed evenly between domains, so that no one group contributes significantly more than others, and that peers that volunteer to share files are not necessarily those who have desirable ones.

These findings have serious implications for the future development of Gnutella and its many variants. In order for distributed systems with no central monitoring to succeed, a large amount of voluntary cooperation is required, a requirement that is very hard to fulfill in systems with large user populations that remain anonymous.

Sometimes, the logic behind the decision to cooperate or not changes when the interaction is ongoing, since future expected utility gains will join present ones in influencing the rational individual's decision. In particular, individual expectations concerning the future evolution of the social dilemma can play a significant role in each member's decisions[Hu96].

An interesting continuation of these experiments may lead to an understanding of how free riding changes over time.

The Gnutella protocol restores the Web's original symmetry, enabling even transient computers to effectively participate as servers.

It's far from a complete solution, and alternative systems may eclipse it. Nonetheless, this simple and idiosyncratic protocol is currently in the vanguard of the emergence of the transient Web. The transient Web has the potential to be every bit as disruptive as the conventional "permanent" Web, and possibly more so.

What do Gnutella and the Web have to do with each other? Isn't Gnutella just one of many P2P file-sharing systems? Yes, Gnutella enables P2P file sharing, but take a closer look. With Gnutella, file transfer is accomplished via HTTP, the same protocol Web browsers and servers use to transfer Web pages and other data. Under the hood, each Gnutella application contains a no-frills Web-server component for serving files and a primitive browser-like element for retrieving them.

AOL yanked the program from Nullsoft's site within hours, but dozens of reverse-engineered replacements have since been posted to the Net, many complete with source code. As shown at right, Gnutella's architecture is fully decentralized, so file-sharers' computers can find each other without soliciting a central server. Shut down any part of the network, and the rest will keep running. Gnutella's freedom to file-share, alas, isn't without trade-offs: It replaces efficient client-server transactions with a many-to-many packet flurry that can chew up bandwidth.

And the decentralization that prevents shutdowns also means there's no place to build user relationships or collect revenue. Who cares? Not the end user.

The Gnutella application on your desktop is actually a peer, acting as both client and server in interactions with a network of similar peers. Unlike Napster, Gnutella has no central servers to which it can connect for information. Before it can begin swapping files, your peer must be told (by the user or from its own database) the IP address of one other peer to which it can connect.

Your peer sends a ping request to the other peer, announcing its presence on the network. The ping request includes a TTL (time-to-live) count, which states how many times the request can be forwarded to other computers. The default for most Gnutella peers is 7. The other peer replies to your ping with a pong, which contains its IP address and file-sharing information (total files and kilobytes shared).

The other peer also forwards your ping to additional Gnutella peers it knows about, after first reducing the TTL count by 1, from 7 to 6. Each peer that receives the packet similarly subtracts 1 from the TTL and forwards the packet to others. Many peers end up forwarding your ping to one another over and over. Gnutella relies on fat bandwidth to overcome this inefficiency.

Users raising their TTLs past 7 could flood the Net with trillions of pings. To keep Gnutella efficient, other peers will adjust high TTLs before forwarding them. Each peer that receives your ping sends back a pong to your peer, routing the pong back along the path of the ping.
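A rough calculation shows why the TTL matters so much. The sketch below computes an upper bound on the number of copies of a single ping, under two simplifying assumptions that do not come from the protocol: every peer has the same number of connections, and no peer recognizes a ping it has already seen. The function name `max_messages` is invented for the example.

```python
def max_messages(connections: int, ttl: int) -> int:
    """Upper bound on copies of one broadcast message sent across the network."""
    total = 0
    senders = 1  # only the originating peer sends on the first hop
    for hop in range(ttl):
        # The originator uses every connection; forwarders skip the one
        # connection the message arrived on.
        fanout = connections if hop == 0 else connections - 1
        copies = senders * fanout
        total += copies
        senders = copies
    return total
```

With 4 connections per peer and a TTL of 7, this gives 4 + 12 + 36 + 108 + 324 + 972 + 2916 = 4372 copies of one ping. Each extra unit of TTL multiplies the size of the last wave by `connections - 1`, so the total grows exponentially.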

As pongs arrive, your hostcatcher collects the IP addresses of available peers. They may be anywhere on the Internet, but all are at most seven degrees of separation from you. This network of peers known to your own is your radius. A typical radius includes thousands of other peers, with up to 1 million files.

Gnutella's open architecture means you can also share files with users of compatible programs such as Gnotella or Gnucleus. To find a file, you enter a search term into the Gnutella interface on your screen.

Your peer then sends a query directly to every peer known to your peer. Each peer searches its local files for matches to your query. If it doesn't find any, it doesn't reply. This prevents your computer from being bombarded with "no results" messages. If there are one or more matches, a query results message is routed to your peer, containing the IP address of the sender and the matching file name.
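The reply-only-on-match behavior can be sketched as follows. The matching rule (every search word must appear in the file name, ignoring case) and the name `answer_query` are assumptions for illustration; the text does not say how peers match names.

```python
from typing import List, Optional


def answer_query(shared_files: List[str], search_term: str) -> Optional[List[str]]:
    """Return matching file names, or None when the peer should stay silent."""
    words = search_term.lower().split()
    if not words:
        return None  # nothing to search for
    matches = [name for name in shared_files
               if all(word in name.lower() for word in words)]
    # An empty result means no reply at all, not a "no results" message.
    return matches or None


shared = ["Metallica - One.mp3", "holiday-notes.txt"]
hit = answer_query(shared, "metallica one")
miss = answer_query(shared, "abba")
```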

Unlike Napster or a Web search engine, your peer doesn't know when the search process is complete: Peers that haven't replied either have found no results, or are still working on a reply. Newer implementations allow the user to set the duration of the search. When you select one of the query results for downloading, your peer creates a standard HTTP request (the kind used by browsers to request Web pages) from the IP address and filename in the results message.

It sends this request directly to the peer, which returns the file via HTTP.
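As a rough sketch, the download request could be assembled like this. The `/get/<index>/<name>/` path layout and the headers follow my recollection of the Gnutella 0.4 draft and should be treated as assumptions, as should the function name `build_download_request`. The `Range` header is the standard HTTP mechanism for resuming a partial download.

```python
def build_download_request(file_index: int, file_name: str,
                           resume_from: int = 0) -> bytes:
    """Build the raw HTTP request sent directly to the peer holding the file."""
    lines = [
        f"GET /get/{file_index}/{file_name}/ HTTP/1.0",  # assumed path layout
        "Connection: Keep-Alive",
        f"Range: bytes={resume_from}-",  # a nonzero offset resumes a download
        "",
        "",
    ]
    return "\r\n".join(lines).encode("utf-8")


# Resume a hypothetical file at byte 1024 after an interrupted transfer.
request = build_download_request(2468, "song.mp3", resume_from=1024)
```

The bytes would then be written to a TCP connection opened to the IP address and port taken from the query result.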


