
TFX Keras Component Tutorial does not run successfully on Windows/Jupyter #2156

Closed
TobiasGoerke opened this issue Jul 15, 2020 · 7 comments

@TobiasGoerke

TobiasGoerke commented Jul 15, 2020

The following error occurs when running the Transform (TFT) step using the interactive context:

RuntimeError: FileNotFoundError: [Errno 2] No such file or directory: 'D:\\TEMP\\tfx-interactive-2020-07-15T16_20_13.150570-i7rcm6cj\\Transform\\transform_graph\\5\\.temp_path\\tftransform_tmp\\beam-temp-vocab_compute_and_apply_vocabulary_vocabulary-5f1d634ac6a611ea9e7dfc7774534fa7\\efb7cd96-3da6-4041-b0e7-f205e6945118.vocab_compute_and_apply_vocabulary_vocabulary' [while running 'Analyze/VocabularyOrderAndWrite[compute_and_apply_vocabulary/vocabulary]/WriteToFile/Write/WriteImpl/WriteBundles']

Full stack trace:
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\common.py in process(self, windowed_value)
    960     try:
--> 961       return self.do_fn_invoker.invoke_process(windowed_value)
    962     except BaseException as exn:

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\common.py in invoke_process(self, windowed_value, restriction_tracker, watermark_estimator, additional_args, additional_kwargs)
    726       self._invoke_process_per_window(
--> 727           windowed_value, additional_args, additional_kwargs)
    728     return None

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\common.py in _invoke_process_per_window(self, windowed_value, additional_args, additional_kwargs)
    813           windowed_value,
--> 814           self.process_method(*args_for_process),
    815           self.threadsafe_watermark_estimator)

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\iobase.py in process(self, element, init_result)
   1078     bundle = element
-> 1079     writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
   1080     for e in bundle[1]:  # values

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\options\value_provider.py in _f(self, *args, **kwargs)
    134           raise error.RuntimeValueProviderError('%s not accessible' % obj)
--> 135       return fnc(self, *args, **kwargs)
    136 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\filebasedsink.py in open_writer(self, init_result, uid)
    195     writer_path = FileSystems.join(init_result, uid) + suffix
--> 196     return FileBasedSinkWriter(self, writer_path)
    197 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\filebasedsink.py in __init__(self, sink, temp_shard_path)
    416     self.temp_shard_path = temp_shard_path
--> 417     self.temp_handle = self.sink.open(temp_shard_path)
    418 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\textio.py in open(self, temp_path)
    400   def open(self, temp_path):
--> 401     file_handle = super(_TextSink, self).open(temp_path)
    402     if self._header is not None:

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\options\value_provider.py in _f(self, *args, **kwargs)
    134           raise error.RuntimeValueProviderError('%s not accessible' % obj)
--> 135       return fnc(self, *args, **kwargs)
    136 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\filebasedsink.py in open(self, temp_path)
    137     """
--> 138     return FileSystems.create(temp_path, self.mime_type, self.compression_type)
    139 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\filesystems.py in create(path, mime_type, compression_type)
    223     filesystem = FileSystems.get_filesystem(path)
--> 224     return filesystem.create(path, mime_type, compression_type)
    225 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\localfilesystem.py in create(self, path, mime_type, compression_type)
    167       os.makedirs(os.path.dirname(path))
--> 168     return self._path_open(path, 'wb', mime_type, compression_type)
    169 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\localfilesystem.py in _path_open(self, path, mode, mime_type, compression_type)
    142     compression_type = FileSystem._get_compression_type(path, compression_type)
--> 143     raw_file = io.open(path, mode)
    144     if compression_type == CompressionTypes.UNCOMPRESSED:

FileNotFoundError: [Errno 2] No such file or directory: 'D:\\TEMP\\tfx-interactive-2020-07-15T16_20_13.150570-i7rcm6cj\\Transform\\transform_graph\\5\\.temp_path\\tftransform_tmp\\beam-temp-vocab_compute_and_apply_vocabulary_vocabulary-5f1d634ac6a611ea9e7dfc7774534fa7\\efb7cd96-3da6-4041-b0e7-f205e6945118.vocab_compute_and_apply_vocabulary_vocabulary'

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-23-b81ece606df8> in <module>
      3     schema=schema_gen.outputs['schema'],
      4     module_file=os.path.abspath(_taxi_transform_module_file))
----> 5 context.run(transform)

c:\users\got\anaconda3\envs\test\lib\site-packages\tfx\orchestration\experimental\interactive\interactive_context.py in run_if_ipython(*args, **kwargs)
     64       # __IPYTHON__ variable is set by IPython, see
     65       # https://ipython.org/ipython-doc/rel-0.10.2/html/interactive/reference.html#embedding-ipython.
---> 66       return fn(*args, **kwargs)
     67     else:
     68       absl.logging.warning(

c:\users\got\anaconda3\envs\test\lib\site-packages\tfx\orchestration\experimental\interactive\interactive_context.py in run(self, component, enable_cache, beam_pipeline_args)
    166         component, pipeline_info, driver_args, metadata_connection,
    167         beam_pipeline_args, additional_pipeline_args)
--> 168     execution_id = launcher.launch().execution_id
    169 
    170     return execution_result.ExecutionResult(

c:\users\got\anaconda3\envs\test\lib\site-packages\tfx\orchestration\launcher\base_component_launcher.py in launch(self)
    203                          execution_decision.input_dict,
    204                          execution_decision.output_dict,
--> 205                          execution_decision.exec_properties)
    206 
    207     absl.logging.info('Running publisher for %s',

c:\users\got\anaconda3\envs\test\lib\site-packages\tfx\orchestration\launcher\in_process_component_launcher.py in _run_executor(self, execution_id, input_dict, output_dict, exec_properties)
     65         executor_context)  # type: ignore
     66 
---> 67     executor.Do(input_dict, output_dict, exec_properties)

c:\users\got\anaconda3\envs\test\lib\site-packages\tfx\components\transform\executor.py in Do(self, input_dict, output_dict, exec_properties)
    389       label_outputs[labels.CACHE_OUTPUT_PATH_LABEL] = cache_output
    390     status_file = 'status_file'  # Unused
--> 391     self.Transform(label_inputs, label_outputs, status_file)
    392     absl.logging.debug('Cleaning up temp path %s on executor success',
    393                        temp_path)

c:\users\got\anaconda3\envs\test\lib\site-packages\tfx\components\transform\executor.py in Transform(***failed resolving arguments***)
    950                       compute_statistics,
    951                       per_set_stats_output_paths,
--> 952                       materialization_format)
    953   # TODO(b/122478841): Writes status to status file.
    954 

c:\users\got\anaconda3\envs\test\lib\site-packages\tfx\components\transform\executor.py in _RunBeamImpl(self, use_tfxio, analyze_data_list, transform_data_list, preprocessing_fn, input_dataset_metadata, transform_output_path, raw_examples_data_format, temp_path, input_cache_dir, output_cache_dir, compute_statistics, per_set_stats_output_paths, materialization_format)
   1325                | 'Materialize[{}]'.format(infix) >> self._WriteExamples(
   1326                    materialization_format,
-> 1327                    dataset.materialize_output_path))
   1328 
   1329     return _Status.OK()

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\pipeline.py in __exit__(self, exc_type, exc_val, exc_tb)
    545     try:
    546       if not exc_type:
--> 547         self.run().wait_until_finish()
    548     finally:
    549       self._extra_context.__exit__(exc_type, exc_val, exc_tb)

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\pipeline.py in run(self, test_runner_api)
    524         finally:
    525           shutil.rmtree(tmpdir)
--> 526       return self.runner.run_pipeline(self, self._options)
    527     finally:
    528       shutil.rmtree(self.local_tempdir, ignore_errors=True)

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\portability\fn_api_runner\fn_runner.py in run_pipeline(self, pipeline, options)
    171 
    172     self._latest_run_result = self.run_via_runner_api(
--> 173         pipeline.to_runner_api(default_environment=self._default_environment))
    174     return self._latest_run_result
    175 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\portability\fn_api_runner\fn_runner.py in run_via_runner_api(self, pipeline_proto)
    181     # TODO(pabloem, BEAM-7514): Create a watermark manager (that has access to
    182     #   the teststream (if any), and all the stages).
--> 183     return self.run_stages(stage_context, stages)
    184 
    185   @contextlib.contextmanager

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\portability\fn_api_runner\fn_runner.py in run_stages(self, stage_context, stages)
    329           stage_results = self._run_stage(
    330               runner_execution_context,
--> 331               bundle_context_manager,
    332           )
    333           monitoring_infos_by_stage[stage.name] = (

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\portability\fn_api_runner\fn_runner.py in _run_stage(self, runner_execution_context, bundle_context_manager)
    506               input_timers,
    507               expected_timer_output,
--> 508               bundle_manager)
    509 
    510       final_result = merge_results(last_result)

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\portability\fn_api_runner\fn_runner.py in _run_bundle(self, runner_execution_context, bundle_context_manager, data_input, data_output, input_timers, expected_timer_output, bundle_manager)
    544 
    545     result, splits = bundle_manager.process_bundle(
--> 546         data_input, data_output, input_timers, expected_timer_output)
    547     # Now we collect all the deferred inputs remaining from bundle execution.
    548     # Deferred inputs can be:

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\portability\fn_api_runner\fn_runner.py in process_bundle(self, inputs, expected_outputs, fired_timers, expected_output_timers, dry_run)
    928     with UnboundedThreadPoolExecutor() as executor:
    929       for result, split_result in executor.map(execute, zip(part_inputs,  # pylint: disable=zip-builtin-not-iterating
--> 930                                                             timer_inputs)):
    931         split_result_list += split_result
    932         if merged_result is None:

c:\users\got\anaconda3\envs\test\lib\concurrent\futures\_base.py in result_iterator()
    584                     # Careful not to keep a reference to the popped future
    585                     if timeout is None:
--> 586                         yield fs.pop().result()
    587                     else:
    588                         yield fs.pop().result(end_time - time.monotonic())

c:\users\got\anaconda3\envs\test\lib\concurrent\futures\_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

c:\users\got\anaconda3\envs\test\lib\concurrent\futures\_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\utils\thread_pool_executor.py in run(self)
     42       # If the future wasn't cancelled, then attempt to execute it.
     43       try:
---> 44         self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
     45       except BaseException as exc:
     46         # Even though Python 2 futures library has #set_exection(),

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\portability\fn_api_runner\fn_runner.py in execute(part_map_input_timers)
    924           input_timers,
    925           expected_output_timers,
--> 926           dry_run)
    927 
    928     with UnboundedThreadPoolExecutor() as executor:

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\portability\fn_api_runner\fn_runner.py in process_bundle(self, inputs, expected_outputs, fired_timers, expected_output_timers, dry_run)
    824             process_bundle_descriptor.id,
    825             cache_tokens=[next(self._cache_token_generator)]))
--> 826     result_future = self._worker_handler.control_conn.push(process_bundle_req)
    827 
    828     split_results = []  # type: List[beam_fn_api_pb2.ProcessBundleSplitResponse]

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\portability\fn_api_runner\worker_handlers.py in push(self, request)
    351       self._uid_counter += 1
    352       request.instruction_id = 'control_%s' % self._uid_counter
--> 353     response = self.worker.do_instruction(request)
    354     return ControlFuture(request.instruction_id, response)
    355 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\worker\sdk_worker.py in do_instruction(self, request)
    469       # E.g. if register is set, this will call self.register(request.register))
    470       return getattr(self, request_type)(
--> 471           getattr(request, request_type), request.instruction_id)
    472     else:
    473       raise NotImplementedError

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\worker\sdk_worker.py in process_bundle(self, request, instruction_id)
    504         with self.maybe_profile(instruction_id):
    505           delayed_applications, requests_finalization = (
--> 506               bundle_processor.process_bundle(instruction_id))
    507           monitoring_infos = bundle_processor.monitoring_infos()
    508           monitoring_infos.extend(self.state_cache_metrics_fn())

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\worker\bundle_processor.py in process_bundle(self, instruction_id)
    970           elif isinstance(element, beam_fn_api_pb2.Elements.Data):
    971             input_op_by_transform_id[element.transform_id].process_encoded(
--> 972                 element.data)
    973 
    974       # Finish all operations.

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\worker\bundle_processor.py in process_encoded(self, encoded_windowed_values)
    216       decoded_value = self.windowed_coder_impl.decode_from_stream(
    217           input_stream, True)
--> 218       self.output(decoded_value)
    219 
    220   def monitoring_infos(self, transform_id, tag_to_pcollection_id):

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\worker\operations.py in output(self, windowed_value, output_index)
    330   def output(self, windowed_value, output_index=0):
    331     # type: (WindowedValue, int) -> None
--> 332     cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
    333 
    334   def add_receiver(self, operation, output_index=0):

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\worker\operations.py in receive(self, windowed_value)
    193     # type: (WindowedValue) -> None
    194     self.update_counters_start(windowed_value)
--> 195     self.consumer.process(windowed_value)
    196     self.update_counters_finish()
    197 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\worker\operations.py in process(self, o)
    669     # type: (WindowedValue) -> None
    670     with self.scoped_process_state:
--> 671       delayed_application = self.dofn_runner.process(o)
    672       if delayed_application:
    673         assert self.execution_context is not None

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\common.py in process(self, windowed_value)
    961       return self.do_fn_invoker.invoke_process(windowed_value)
    962     except BaseException as exn:
--> 963       self._reraise_augmented(exn)
    964       return None
    965 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\common.py in _reraise_augmented(self, exn)
   1043           step_annotation)
   1044       new_exn._tagged_with_step = True
-> 1045     raise_with_traceback(new_exn)
   1046 
   1047 

c:\users\got\anaconda3\envs\test\lib\site-packages\future\utils\__init__.py in raise_with_traceback(exc, traceback)
    444         if traceback == Ellipsis:
    445             _, _, traceback = sys.exc_info()
--> 446         raise exc.with_traceback(traceback)
    447 
    448 else:

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\common.py in process(self, windowed_value)
    959     # type: (WindowedValue) -> Optional[SplitResultResidual]
    960     try:
--> 961       return self.do_fn_invoker.invoke_process(windowed_value)
    962     except BaseException as exn:
    963       self._reraise_augmented(exn)

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\common.py in invoke_process(self, windowed_value, restriction_tracker, watermark_estimator, additional_args, additional_kwargs)
    725     else:
    726       self._invoke_process_per_window(
--> 727           windowed_value, additional_args, additional_kwargs)
    728     return None
    729 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\runners\common.py in _invoke_process_per_window(self, windowed_value, additional_args, additional_kwargs)
    812       self.output_processor.process_outputs(
    813           windowed_value,
--> 814           self.process_method(*args_for_process),
    815           self.threadsafe_watermark_estimator)
    816 

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\iobase.py in process(self, element, init_result)
   1077   def process(self, element, init_result):
   1078     bundle = element
-> 1079     writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
   1080     for e in bundle[1]:  # values
   1081       writer.write(e)

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\options\value_provider.py in _f(self, *args, **kwargs)
    133         if not obj.is_accessible():
    134           raise error.RuntimeValueProviderError('%s not accessible' % obj)
--> 135       return fnc(self, *args, **kwargs)
    136 
    137     return _f

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\filebasedsink.py in open_writer(self, init_result, uid)
    194     suffix = ('.' + os.path.basename(file_path_prefix) + file_name_suffix)
    195     writer_path = FileSystems.join(init_result, uid) + suffix
--> 196     return FileBasedSinkWriter(self, writer_path)
    197 
    198   @check_accessible(['file_path_prefix', 'file_name_suffix'])

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\filebasedsink.py in __init__(self, sink, temp_shard_path)
    415     self.sink = sink
    416     self.temp_shard_path = temp_shard_path
--> 417     self.temp_handle = self.sink.open(temp_shard_path)
    418 
    419   def write(self, value):

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\textio.py in open(self, temp_path)
    399 
    400   def open(self, temp_path):
--> 401     file_handle = super(_TextSink, self).open(temp_path)
    402     if self._header is not None:
    403       file_handle.write(coders.ToBytesCoder().encode(self._header))

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\options\value_provider.py in _f(self, *args, **kwargs)
    133         if not obj.is_accessible():
    134           raise error.RuntimeValueProviderError('%s not accessible' % obj)
--> 135       return fnc(self, *args, **kwargs)
    136 
    137     return _f

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\filebasedsink.py in open(self, temp_path)
    136     ``close``.
    137     """
--> 138     return FileSystems.create(temp_path, self.mime_type, self.compression_type)
    139 
    140   def write_record(self, file_handle, value):

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\filesystems.py in create(path, mime_type, compression_type)
    222     """
    223     filesystem = FileSystems.get_filesystem(path)
--> 224     return filesystem.create(path, mime_type, compression_type)
    225 
    226   @staticmethod

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\localfilesystem.py in create(self, path, mime_type, compression_type)
    166       # TODO(Py3): Add exist_ok parameter.
    167       os.makedirs(os.path.dirname(path))
--> 168     return self._path_open(path, 'wb', mime_type, compression_type)
    169 
    170   def open(

c:\users\got\anaconda3\envs\test\lib\site-packages\apache_beam\io\localfilesystem.py in _path_open(self, path, mode, mime_type, compression_type)
    141     """
    142     compression_type = FileSystem._get_compression_type(path, compression_type)
--> 143     raw_file = io.open(path, mode)
    144     if compression_type == CompressionTypes.UNCOMPRESSED:
    145       return raw_file

RuntimeError: FileNotFoundError: [Errno 2] No such file or directory: 'D:\\TEMP\\tfx-interactive-2020-07-15T16_20_13.150570-i7rcm6cj\\Transform\\transform_graph\\5\\.temp_path\\tftransform_tmp\\beam-temp-vocab_compute_and_apply_vocabulary_vocabulary-5f1d634ac6a611ea9e7dfc7774534fa7\\efb7cd96-3da6-4041-b0e7-f205e6945118.vocab_compute_and_apply_vocabulary_vocabulary' [while running 'Analyze/VocabularyOrderAndWrite[compute_and_apply_vocabulary/vocabulary]/WriteToFile/Write/WriteImpl/WriteBundles']
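Side note for anyone digging through the trace: the bottom frames show Beam's local filesystem simply calling `os.makedirs` on the parent directory and then `io.open(path, 'wb')`. A stdlib-only sketch of that pattern (an illustration of the failure mode, not Beam's actual code; the file names below are made up) shows how the same `[Errno 2]` appears if the temp tree vanishes between those two steps:

```python
import io
import os
import shutil
import tempfile


def create_shard(path):
    # Same pattern as apache_beam/io/localfilesystem.py in the trace:
    # ensure the parent directory exists, then open the file for writing.
    parent = os.path.dirname(path)
    if not os.path.exists(parent):
        os.makedirs(parent)
    return io.open(path, "wb")


root = tempfile.mkdtemp()
shard = os.path.join(root, ".temp_path", "tftransform_tmp", "bundle.vocab")

with create_shard(shard) as handle:
    handle.write(b"ok")  # normal case: parent exists, the write succeeds

# If the temp tree disappears between makedirs and io.open, io.open
# raises the same "[Errno 2] No such file or directory" as the trace.
shutil.rmtree(os.path.join(root, ".temp_path"))
err_no = None
try:
    io.open(shard, "wb")
except FileNotFoundError as exc:
    err_no = exc.errno  # 2 (ENOENT)

shutil.rmtree(root, ignore_errors=True)
```

This doesn't explain *why* the directory would go missing in the Windows run, but it narrows the failure down to that makedirs/open window.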

Steps to reproduce the error:

  • Create and activate a clean conda environment (Python 3.6)
  • pip install jupyter
  • jupyter notebook
  • Clone the .ipynb file and run all cells in Jupyter
  • The error occurs at context.run(transform)

Edit: the notebook runs correctly in Colab.

@rcrowe-google
Contributor

D:\TEMP\tfx-interactive-2020-07-15T16_20_13.150570-i7rcm6cj\Transform\transform_graph\5\.temp_path\tftransform_tmp\beam-temp-vocab_compute_and_apply_vocabulary_vocabulary-5f1d634ac6a611ea9e7dfc7774534fa7\efb7cd96-3da6-4041-b0e7-f205e6945118.vocab_compute_and_apply_vocabulary_vocabulary

How much of this path exists on your system? For example, does D:\TEMP exist?

@TobiasGoerke
Author

D:\TEMP\tfx-interactive-2020-07-15T16_20_13.150570-i7rcm6cj\Transform\transform_graph\5\.temp_path\tftransform_tmp\beam-temp-vocab_compute_and_apply_vocabulary_vocabulary-5f1d634ac6a611ea9e7dfc7774534fa7\efb7cd96-3da6-4041-b0e7-f205e6945118.vocab_compute_and_apply_vocabulary_vocabulary

How much of this path exists on your system? For example, does D:\TEMP exist?

The path exists fully. There are just no files in it.
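One Windows-specific thing that may be worth checking (an assumption on my part, not something the trace proves): the failing path is longer than the classic 260-character MAX_PATH limit, which many Win32 file APIs still enforce unless long-path support is enabled. A quick check against the path from the trace:

```python
# The failing path copied verbatim from the stack trace above.
failing_path = (
    r"D:\TEMP\tfx-interactive-2020-07-15T16_20_13.150570-i7rcm6cj"
    r"\Transform\transform_graph\5\.temp_path\tftransform_tmp"
    r"\beam-temp-vocab_compute_and_apply_vocabulary_vocabulary"
    r"-5f1d634ac6a611ea9e7dfc7774534fa7"
    r"\efb7cd96-3da6-4041-b0e7-f205e6945118"
    r".vocab_compute_and_apply_vocabulary_vocabulary"
)

# Windows' classic MAX_PATH limit is 260 characters.
print(len(failing_path), len(failing_path) > 260)
```

If the length is the culprit, moving the interactive context's pipeline root to a shorter path (or enabling Windows long-path support) would be a cheap experiment.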

@ntakouris

ntakouris commented Jul 16, 2020

@TobiasGoerke did you try running this in a non-interactive context? Just with

BEAM_ARGS = [
    '--runner=DirectRunner'
]

pipeline.Pipeline(
    pipeline_name=JOB_NAME,
    pipeline_root=PIPELINE_ROOT,
    components=[...],
    beam_pipeline_args=BEAM_ARGS
)

and BeamDagRunner().run(<pipeline>)

@TobiasGoerke
Author

TobiasGoerke commented Jul 17, 2020

I don't know if this helps, but I've written a short Beam test using the same transform code, and it works fine:

import tempfile
import unittest

import apache_beam as beam
import tensorflow_transform.beam as tft_beam

# create_test_data_and_schema and preprocessing_fn are defined elsewhere
# in my test module.


class TestStringMethods(unittest.TestCase):

    def test_transform(self):
        data, schema = create_test_data_and_schema()

        with beam.Pipeline() as pipeline:
            with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
                transformed_dataset, transform_fn = (
                        (data, schema) | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn)
                )
                transformed_data, transformed_metadata = transformed_dataset
Edit: note that this test does not use the Transform component.

@ghost

ghost commented Feb 27, 2021

Has anyone solved this? @TobiasGoerke, how did you resolve the issue? I'm facing it as well in Jupyter with Python 3.7.

RuntimeError: FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\k.mufti\\Desktop\\tfx1\\Transform\\transform_graph\\10\\.temp_path\\tftransform_tmp\\beam-temp-vocab_compute_and_apply_vocabulary_1_vocabulary-e7e34b0678f011eb8789fc7774edcabe\\5e8b473f-f071-47a7-9aa8-b419f3a6d5ff.vocab_compute_and_apply_vocabulary_1_vocabulary' [while running 'Analyze/VocabularyOrderAndWrite[compute_and_apply_vocabulary_1/vocabulary]/WriteToText/Write/WriteImpl/WriteBundles']

@rcrowe-google
Contributor

Unfortunately we don't currently test against or support Windows deployments.

@TobiasGoerke
Author

I don't think I got it to work, @Kadri-Saal.
