client.java(org\apache\nutch\protocol\ftp)

2007-07-30 23:10  cppguy

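client.java gives Nutch two capabilities: fetching the file list from an FTP server and downloading files from that server.
  The Client class (which extends FTP) handles all the low-level details of talking to the server and exposes a convenient high-level interface.
  The class is modified from FTPClient.java in Apache commons-net, with the following design decisions: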
  (1) Use stream mode for data transfer. Block mode would be better for
      multiple-file and partial-file downloads, but not every ftpd
      supports block mode.
  (2) Use passive mode for the data connection, so Nutch still works
      when run behind a firewall.
  (3) The data connection is opened and closed per FTP command, for the
      reasons listed in (1). Some FTP servers, when a partial download is
      enforced by closing the data channel socket on the client side,
      immediately close the control channel (socket) too. The code
      deals with this bad behavior.
  (4) LIST is used to obtain remote file attributes where possible.
      MDTM and SIZE would be nice, but they are not as ubiquitously
      implemented as LIST.
  (5) Avoid using ABOR in a single thread? Do not use it at all.
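
To make point (2) concrete: in passive mode the client sends PASV and the server answers with a 227 reply that names the address and port the client should connect to for the data channel. The following is only a minimal sketch of decoding such a reply in the RFC 959 format; the class PasvReply and its fields are hypothetical names invented for this illustration and are not part of Nutch or commons-net.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Nutch or commons-net. */
final class PasvReply {
    // RFC 959 reply shape: 227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)
    private static final Pattern ADDRESS = Pattern.compile(
        "(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3})");

    final String host;
    final int port;

    private PasvReply(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Extracts the data-connection endpoint from a 227 reply line. */
    static PasvReply parse(String reply) {
        if (reply == null || !reply.startsWith("227")) {
            throw new IllegalArgumentException("Not a 227 reply: " + reply);
        }
        // Skip the reply code itself, then look for the six comma-separated numbers.
        Matcher m = ADDRESS.matcher(reply.substring(3));
        if (!m.find()) {
            throw new IllegalArgumentException("No address in reply: " + reply);
        }
        String host = m.group(1) + "." + m.group(2) + "."
            + m.group(3) + "." + m.group(4);
        // The port arrives as two bytes: high byte first, then low byte.
        int port = (Integer.parseInt(m.group(5)) << 8) | Integer.parseInt(m.group(6));
        return new PasvReply(host, port);
    }
}
```

For example, PasvReply.parse("227 Entering Passive Mode (192,168,1,10,19,137)") yields host 192.168.1.10 and port 19 * 256 + 137 = 5001. The sketch does not check that each number fits in a byte.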
The public methods are listed below.

    /***
     * Enable or disable verification that the remote host taking part
     * in a data connection is the same as the host to which the control
     * connection is attached.  The default is for verification to be
     * enabled.  You may set this value at any time, whether the
     * FTPClient is currently connected or not.
     * <p>
     * @param enable True to enable verification, false to disable verification.
     ***/
    public void setRemoteVerificationEnabled(boolean enable)
    {
        __remoteVerificationEnabled = enable;
    }

    /***
     * Return whether or not verification of the remote host participating
     * in data connections is enabled.  The default behavior is for
     * verification to be enabled.
     * <p>
     * @return True if verification is enabled, false if not.
     ***/
    public boolean isRemoteVerificationEnabled()
    {
        return __remoteVerificationEnabled;
    }

    /***
     * Login to the FTP server using the provided username and password.
     * <p>
     * @param username The username to login under.
     * @param password The password to use.
     * @return True if successfully completed, false if not.
     * @exception FTPConnectionClosedException
     *      If the FTP server prematurely closes the connection as a result
     *      of the client being idle or some other reason causing the server
     *      to send FTP reply code 421.  This exception may be caught either
     *      as an IOException or independently as itself.
     * @exception IOException  If an I/O error occurs while either sending a
     *      command to the server or receiving a reply from the server.
     ***/
    public boolean login(String username, String password) throws IOException
    {
        user(username);

        if (FTPReply.isPositiveCompletion(getReplyCode()))
            return true;

        // If we get here, we either have an error code, or an intermediate
        // reply requesting a password.
        if (!FTPReply.isPositiveIntermediate(getReplyCode()))
            return false;

        return FTPReply.isPositiveCompletion(pass(password));
    }
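
The decision logic in login() depends only on the class of the reply code: a 2xx reply to USER means no password is needed, a 3xx reply means the server wants PASS, and anything else is a failure. Below is a small stand-alone sketch of that flow. LoginFlow and its parameters are hypothetical names for this illustration; the two range checks are written out by hand on the assumption that they match what FTPReply does in commons-net.

```java
import java.util.function.IntSupplier;

/** Hypothetical stand-in for the USER/PASS exchange; not Nutch code. */
final class LoginFlow {
    // Assumed to mirror the ranges FTPReply checks in commons-net.
    static boolean isPositiveCompletion(int code) {
        return code >= 200 && code < 300;
    }

    static boolean isPositiveIntermediate(int code) {
        return code >= 300 && code < 400;
    }

    /**
     * @param userReply reply code the server sent for USER
     * @param passReply supplies the reply code for PASS; called only if needed
     */
    static boolean login(int userReply, IntSupplier passReply) {
        if (isPositiveCompletion(userReply)) {
            return true;   // e.g. 230: logged in without a password
        }
        if (!isPositiveIntermediate(userReply)) {
            return false;  // e.g. 530: rejected outright
        }
        // e.g. 331: the server asked for a password, so PASS decides the outcome
        return isPositiveCompletion(passReply.getAsInt());
    }
}
```

With this sketch, login(331, () -> 230) succeeds, login(331, () -> 530) fails, and login(230, ...) succeeds without ever asking for the PASS reply.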

    /***
     * Logout of the FTP server by sending the QUIT command.
     * <p>
     * @return True if successfully completed, false if not.
     * @exception FTPConnectionClosedException
     *      If the FTP server prematurely closes the connection as a result
     *      of the client being idle or some other reason causing the server
     *      to send FTP reply code 421.  This exception may be caught either
     *      as an IOException or independently as itself.
     * @exception IOException  If an I/O error occurs while either sending a
     *      command to the server or receiving a reply from the server.
     ***/
    public boolean logout() throws IOException
    {
        return FTPReply.isPositiveCompletion(quit());
    }

    // retrieve list reply for path
    public void retrieveList(String path, List entries, int limit,
      FTPFileEntryParser parser)
      throws IOException,
        FtpExceptionCanNotHaveDataConnection,
        FtpExceptionUnknownForcedDataClose,
        FtpExceptionControlClosedByForcedDataClose {
      Socket socket = __openPassiveDataConnection(FTPCommand.LIST, path);

      if (socket == null)
        throw new FtpExceptionCanNotHaveDataConnection("LIST "
          + ((path == null) ? "" : path));

      BufferedReader reader =
          new BufferedReader(new InputStreamReader(socket.getInputStream()));

      // force-close data channel socket, when download limit is reached
      boolean mandatory_close = false;

      //List entries = new LinkedList();
      int count = 0;
      String line = parser.readNextEntry(reader);
      while (line != null) {
        FTPFile ftpFile = parser.parseFTPEntry(line);
        // skip non-formatted lines
        if (ftpFile == null) {
          line = parser.readNextEntry(reader);
          continue;
        }
        entries.add(ftpFile);
        count += line.length();
        // impose download limit if limit >= 0, otherwise no limit
        // here, the cutoff falls after the first line that pushes the
        // running total (counted via line.length()) over the limit
        if (limit >= 0 && count > limit) {
          mandatory_close = true;
          break;
        }
        line = parser.readNextEntry(reader);
      }

      //if (mandatory_close)
      // you always close here, no matter mandatory_close or not.
      // however different ftp servers respond differently, see below.
      socket.close();

      // scenarios:
      // (1) mandatory_close is false, download limit not reached
      //     no special care here
      // (2) mandatory_close is true, download limit is reached
      //     different servers have different reply codes:

      try {
        int reply = getReply();
        if (!_notBadReply(reply))
          throw new FtpExceptionUnknownForcedDataClose(getReplyString());
      } catch (FTPConnectionClosedException e) {
        // some ftp servers will close control channel if data channel socket
        // is closed by our end before all data has been read out. Check:
        // tux414.q-tam.hp.com FTP server (hp.com version whp02)
        // so must catch FTPConnectionClosedException thrown by getReply() above
        //disconnect();
        throw new FtpExceptionControlClosedByForcedDataClose(e.getMessage());
      }

    }
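
The limit in retrieveList() is a soft cap: the line that pushes the running total over the limit is still kept, and only then does the loop stop. Here is a minimal sketch of just that cutoff rule, assuming the listing is already available as a list of strings. ListingLimit is a hypothetical name, and the sketch leaves out the parsing and the skipping of unparseable lines that the real method does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the listing cutoff; not Nutch code. */
final class ListingLimit {
    /**
     * Keeps lines until the running character count first exceeds limit.
     * A negative limit means no limit.
     */
    static List<String> take(List<String> lines, int limit) {
        List<String> kept = new ArrayList<>();
        int count = 0;
        for (String line : lines) {
            kept.add(line);            // the line is kept before the check
            count += line.length();    // characters, without line terminators
            if (limit >= 0 && count > limit) {
                break;                 // this is where the data socket would be force-closed
            }
        }
        return kept;
    }
}
```

Given three lines of four characters each, a limit of 5 keeps two lines (the total reaches 8 after the second), a limit of 0 keeps one line, and a limit of -1 keeps all three.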

    // retrieve file for path
    public void retrieveFile(String path, OutputStream os, int limit)
      throws IOException,
        FtpExceptionCanNotHaveDataConnection,
        FtpExceptionUnknownForcedDataClose,
        FtpExceptionControlClosedByForcedDataClose {

      Socket socket = __openPassiveDataConnection(FTPCommand.RETR, path);

      if (socket == null)
        throw new FtpExceptionCanNotHaveDataConnection("RETR "
          + ((path == null) ? "" : path));

      InputStream input = socket.getInputStream();

      // 20040318, xing, treat everything as BINARY_FILE_TYPE for now
      // do we ever need ASCII_FILE_TYPE?
      //if (__fileType == ASCII_FILE_TYPE)
      // input = new FromNetASCIIInputStream(input);

      // fixme, should we instruct server here for binary file type?

      // force-close data channel socket
      boolean mandatory_close = false;

      int len; int count = 0;
      byte[] buf =
        new byte[org.apache.commons.net.io.Util.DEFAULT_COPY_BUFFER_SIZE];
      while((len=input.read(buf,0,buf.length)) != -1){
        count += len;
        // impose download limit if limit >= 0, otherwise no limit
        // here, the output is cut off at exactly limit bytes
        if (limit >= 0 && count > limit) {
          os.write(buf,0,len-(count-limit));
          mandatory_close = true;
          break;
        }
        os.write(buf,0,len);
        os.flush();
      }

      //if (mandatory_close)
      // you always close here, no matter mandatory_close or not.
      // however different ftp servers respond differently, see below.
      socket.close();

      // scenarios:
      // (1) mandatory_close is false, download limit not reached
      //     no special care here
      // (2) mandatory_close is true, download limit is reached
      //     different servers have different reply codes:

      // do not need this
      //sendCommand("ABOR");

      try {
        int reply = getReply();
        if (!_notBadReply(reply))
          throw new FtpExceptionUnknownForcedDataClose(getReplyString());
      } catch (FTPConnectionClosedException e) {
        // some ftp servers will close control channel if data channel socket
        // is closed by our end before all data has been read out. Check:
        // tux414.q-tam.hp.com FTP server (hp.com version whp02)
        // so must catch FTPConnectionClosedException thrown by getReply() above
        //disconnect();
        throw new FtpExceptionControlClosedByForcedDataClose(e.getMessage());
      }

    }
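
Unlike the listing case, retrieveFile() enforces a hard cap: when a read pushes the total past the limit, only the part of the buffer that fits is written. The same arithmetic can be tried in isolation with in-memory streams. This is a sketch under the assumption of a fixed 1024-byte buffer; LimitedCopy is a hypothetical name, and the boolean return value stands in for the mandatory_close flag.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical illustration of the byte-exact download cap; not Nutch code. */
final class LimitedCopy {
    /**
     * Copies at most limit bytes from in to out (no cap if limit is negative).
     * Returns true if the copy stopped early because the limit was reached.
     */
    static boolean copy(InputStream in, OutputStream out, int limit) throws IOException {
        byte[] buf = new byte[1024];
        int count = 0;
        int len;
        while ((len = in.read(buf, 0, buf.length)) != -1) {
            count += len;
            if (limit >= 0 && count > limit) {
                // count - limit is the overshoot, so write only the part that fits
                out.write(buf, 0, len - (count - limit));
                out.flush();
                return true;
            }
            out.write(buf, 0, len);
            out.flush();
        }
        return false;
    }
}
```

Copying a 3000-byte source with a limit of 1500 writes exactly 1500 bytes and reports truncation. A limit equal to the source size copies everything and does not count as truncated, because the check is count > limit rather than count >= limit.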

    // reply check after closing data connection
    private boolean _notBadReply(int reply) {

      if (FTPReply.isPositiveCompletion(reply)) {
        // do nothing
      } else if (reply == 426) { // FTPReply.TRANSFER_ABORTED
        // some ftp servers reply 426, e.g.,
        // foggy FTP server (Version wu-2.6.2(2))
        // is there a second reply waiting? no!
        //getReply();
      } else if (reply == 450) { // FTPReply.FILE_ACTION_NOT_TAKEN
        // some ftp servers reply 450, e.g.,
        // ProFTPD [ftp.kernel.org]
        // is there a second reply waiting? no!
        //getReply();
      } else if (reply == 451) { // FTPReply.ACTION_ABORTED
        // some ftp servers reply 451, e.g.,
        // ProFTPD [ftp.kernel.org]
        // is there a second reply waiting? no!
        //getReply();
      } else {
      // what other kind of ftp server out there?
        return false;
      }

      return true;
    }
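
Read as a rule, _notBadReply() accepts any 2xx reply plus three specific 4xx codes that servers are known to send when the client closes the data socket early. A compact restatement, offered only as a sketch: DataCloseReply is a hypothetical name, and the descriptions in the comments are the RFC 959 meanings of the codes rather than anything stated in the Nutch source.

```java
/** Hypothetical restatement of the reply check; not Nutch code. */
final class DataCloseReply {
    /** Returns true if the reply is tolerable after the data socket was closed. */
    static boolean isAcceptable(int reply) {
        if (reply >= 200 && reply < 300) {
            return true; // positive completion, e.g. 226
        }
        switch (reply) {
            case 426: // connection closed; transfer aborted
            case 450: // requested file action not taken
            case 451: // requested action aborted: local error in processing
                return true;
            default:
                return false; // anything else is treated as an unknown forced close
        }
    }
}
```

Note that 421 (service closing the control connection) is not accepted here; in the listing above that case surfaces separately as an FTPConnectionClosedException from getReply().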

    /***
     * Sets the file type to be transferred.  This should be one of
     * <code> FTP.ASCII_FILE_TYPE </code>, <code> FTP.IMAGE_FILE_TYPE </code>,
     * etc.  The file type only needs to be set when you want to change the
     * type.  After changing it, the new type stays in effect until you change
     * it again.  The default file type is <code> FTP.ASCII_FILE_TYPE </code>
     * if this method is never called.
     * <p>
     * @param fileType The <code> _FILE_TYPE </code> constant indicating the
     *                 type of file.
     * @return True if successfully completed, false if not.
     * @exception FTPConnectionClosedException
     *      If the FTP server prematurely closes the connection as a result
     *      of the client being idle or some other reason causing the server
     *      to send FTP reply code 421.  This exception may be caught either
     *      as an IOException or independently as itself.
     * @exception IOException  If an I/O error occurs while either sending a
     *      command to the server or receiving a reply from the server.
     ***/
    public boolean setFileType(int fileType) throws IOException
    {
        if (FTPReply.isPositiveCompletion(type(fileType)))
        {
            __fileType = fileType;
            __fileFormat = FTP.NON_PRINT_TEXT_FORMAT;
            return true;
        }
        return false;
    }

    /***
     * Fetches the system type name from the server and returns the string.
     * This value is cached for the duration of the connection after the
     * first call to this method.  In other words, only the first time
     * that you invoke this method will it issue a SYST command to the
     * FTP server.  FTPClient will remember the value and return the
     * cached value until a call to disconnect.
     * <p>
     * @return The system type name obtained from the server.
     * @exception FtpExceptionBadSystResponse
     *      If the server does not answer SYST with a positive completion reply.
     * @exception FTPConnectionClosedException
     *      If the FTP server prematurely closes the connection as a result
     *      of the client being idle or some other reason causing the server
     *      to send FTP reply code 421.  This exception may be caught either
     *      as an IOException or independently as itself.
     * @exception IOException  If an I/O error occurs while either sending a
     *  command to the server or receiving a reply from the server.
     ***/
    public String getSystemName()
      throws IOException, FtpExceptionBadSystResponse
    {
      //if (syst() == FTPReply.NAME_SYSTEM_TYPE)
      // Technically, we should expect a NAME_SYSTEM_TYPE response, but
      // in practice FTP servers deviate, so we soften the condition to
      // a positive completion.
        if (__systemName == null) {
            // only issue SYST on the first call; later calls use the cached value
            if (FTPReply.isPositiveCompletion(syst())) {
                __systemName = (getReplyStrings()[0]).substring(4);
            } else {
                throw new FtpExceptionBadSystResponse(
                  "Bad response of SYST: " + getReplyString());
            }
        }

        return __systemName;
    }
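
The Javadoc for getSystemName() promises that SYST is sent only on the first call and that the cached value is returned afterwards. The sketch below shows that contract in isolation, along with what substring(4) does to a reply line: it drops the three-digit code and the separator that follows it. SystemNameCache, its Supplier argument, and the commandsSent counter are hypothetical names for this illustration, and an unchecked exception stands in for FtpExceptionBadSystResponse.

```java
import java.util.function.Supplier;

/** Hypothetical illustration of SYST caching; not Nutch code. */
final class SystemNameCache {
    // Assumption: the supplier returns the first line of the server's SYST reply.
    private final Supplier<String> systCommand;
    private String systemName;
    int commandsSent = 0;

    SystemNameCache(Supplier<String> systCommand) {
        this.systCommand = systCommand;
    }

    String get() {
        if (systemName == null) {
            String reply = systCommand.get();
            commandsSent++;
            // Accept any 2xx reply, matching the softened check in the listing.
            if (reply == null || reply.length() < 4 || reply.charAt(0) != '2') {
                throw new IllegalStateException("Bad response of SYST: " + reply);
            }
            // "215 UNIX Type: L8" becomes "UNIX Type: L8"
            systemName = reply.substring(4);
        }
        return systemName;
    }
}
```

Calling get() twice on a cache backed by the reply "215 UNIX Type: L8" returns "UNIX Type: L8" both times, and commandsSent stays at 1.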

    /***
     * Sends a NOOP command to the FTP server.  This is useful for preventing
     * server timeouts.
     * <p>
     * @return True if successfully completed, false if not.
     * @exception FTPConnectionClosedException
     *      If the FTP server prematurely closes the connection as a result
     *      of the client being idle or some other reason causing the server
     *      to send FTP reply code 421.  This exception may be caught either
     *      as an IOException or independently as itself.
     * @exception IOException  If an I/O error occurs while either sending a
     *      command to the server or receiving a reply from the server.
     ***/
    public boolean sendNoOp() throws IOException
    {
        return FTPReply.isPositiveCompletion(noop());
    }